Jeff Fischer is an independent consultant focused on data-intensive systems, from infrastructure to machine learning. Current clients include a research laboratory, a high-performance storage company, and the Max Planck Institute for Software Systems. Jeff has a PhD in Computer Science from UCLA and is co-organizer of the Bay Area Python Interest Group (BayPiggies).
These days, DevOps folks live and breathe containers, especially Docker and related technologies. As a Data Scientist, you may have heard about Docker, but may be less interested in investing the time to become an expert, since it is not core to your job. In this talk, you will get just enough Docker knowledge to improve your data science workflow and avoid common pitfalls.
The day-to-day concerns of Data Scientists and DevOps Engineers can be very different. In this talk, we’ll show how Docker, a technology from the DevOps world, can improve the lives of Data Scientists. In particular, Docker can dramatically simplify the configuration of your programming/analysis environment, facilitate sharing your work with colleagues, and lead to reproducible workflows and results. At the same time, Docker has pitfalls that you will want to avoid.
We will cover container and Docker concepts, running Docker, using Docker with GPUs, and best practices for data science with Docker. You will see concrete examples of how to use Docker in Python-focused environments, including interactive REPL and script development using Anaconda Python and scikit-learn, web environments such as Jupyter and TensorBoard, and Nvidia’s DIGITS. Along the way, we will point out common pitfalls to avoid and better approaches.
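As a taste of the kind of setup the talk covers, here is a minimal Dockerfile sketch for a reproducible scikit-learn/Jupyter environment. It is illustrative only: the base image tag and package list are assumptions, and a real setup would pin exact versions for reproducibility.

```dockerfile
# Illustrative sketch (not from the talk): a self-contained
# Anaconda-based environment for notebook-driven data science.
FROM continuumio/miniconda3

# Install the analysis stack; pinning versions here is what makes
# the environment reproducible across machines and colleagues.
RUN conda install -y scikit-learn jupyter

WORKDIR /work
EXPOSE 8888

# Bind to all interfaces so the notebook is reachable from the host
# when the container is started with a published port.
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
```

Built with `docker build -t my-ds-env .` and run with `docker run -p 8888:8888 -v "$PWD":/work my-ds-env`, this gives colleagues the same environment from a single file.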
This talk is aimed at intermediate to advanced Data Scientists. Some familiarity with a Linux/Mac command line is assumed.