William Horton

William Horton is a Senior Backend Engineer at Compass, where he works on systems for ingesting, processing, and serving millions of real estate listings. In his spare time, he blogs and speaks about deep learning, contributes to open-source libraries like fastai and PyTorch, and enters computer vision competitions on Kaggle. When he’s not doing tech things, he enjoys powerlifting and singing a cappella.

CUDA in your Python: Parallel Programming on the GPU

ML, AI, & Data, Scale & Performance, Intermediate
8/17/2019 | 3:45 PM-4:30 PM | Robertson 2

Description

It’s 2019, and Moore’s Law is dead. CPU performance is plateauing, but GPUs provide a chance for continued hardware performance gains, if you can structure your programs to make good use of them. In this talk you will learn how to speed up your Python programs using Nvidia’s CUDA platform.

Abstract

CUDA is a platform developed by Nvidia for GPGPU (general-purpose computing on GPUs). It backs some of the most popular deep learning libraries, like TensorFlow and PyTorch, but it has broader uses in data analysis, data science, and machine learning.

There are several ways that you can start taking advantage of CUDA in your Python programs.

For some common Python libraries, there are drop-in replacements that let you start running computations on the GPU while still using APIs that you might be familiar with. For example, CuPy provides a NumPy-like API for working with multi-dimensional arrays. Another recent project is cuDF by RAPIDS AI, which mimics the pandas interface for dataframes.
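
As a rough illustration of the drop-in idea, the sketch below writes one function against a generic array module and picks CuPy when it is usable, falling back to NumPy otherwise. The names xp and normalize_rows are made up for this example, and the fallback logic is an assumption about how one might structure such code, not something prescribed by CuPy.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device is present
    xp = cp
except Exception:
    # Assumption for this sketch: fall back to NumPy when CuPy or a GPU is missing.
    xp = np


def normalize_rows(a):
    """Scale each row to unit L2 norm using whichever array module is active."""
    norms = xp.linalg.norm(a, axis=1, keepdims=True)
    return a / norms


# The same calls work for both modules because CuPy mirrors the NumPy API.
x = xp.arange(12, dtype=xp.float32).reshape(3, 4) + 1
y = normalize_rows(x)

# Copy the result back to host memory when it lives on the GPU.
y_host = cp.asnumpy(y) if xp is not np else y
```

Because the function body only touches xp, switching between CPU and GPU is a matter of which module gets bound at import time.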

If you want more control over your use of CUDA, you can use the PyCUDA library, which provides bindings for the CUDA API that you can call from your Python code. Compared with drop-in libraries, it lets you manually allocate memory on the GPU and write custom CUDA code. However, it comes with some drawbacks: you have to write your CUDA code as large strings in your Python program, and that code is compiled at runtime rather than ahead of time.
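
A minimal sketch of that workflow, assuming PyCUDA and a CUDA-capable GPU are available: the kernel lives in a Python string, is compiled when the function runs, and memory is allocated and copied by hand. The kernel name double_it and the wrapper double_on_gpu are hypothetical names chosen for this example.

```python
import numpy as np

# CUDA C source kept as a Python string; it is compiled at runtime, not beforehand.
KERNEL_SOURCE = """
__global__ void double_it(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        a[i] *= 2.0f;
    }
}
"""


def double_on_gpu(host_array):
    """Return a copy of a non-empty array with every element doubled on the GPU."""
    # Imported lazily so this module still loads on a machine without PyCUDA or a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    a = np.ascontiguousarray(host_array, dtype=np.float32)
    n = a.size
    kernel = SourceModule(KERNEL_SOURCE).get_function("double_it")

    # Manual device allocation and host-to-device copy.
    a_gpu = drv.mem_alloc(a.nbytes)
    drv.memcpy_htod(a_gpu, a)

    # One thread per element, rounded up to whole blocks of 256 threads.
    threads = 256
    blocks = (n + threads - 1) // threads
    kernel(a_gpu, np.int32(n), block=(threads, 1, 1), grid=(blocks, 1))

    # Copy the result back into a fresh host array.
    result = np.empty_like(a)
    drv.memcpy_dtoh(result, a_gpu)
    return result
```

Calling double_on_gpu(np.arange(8)) on a machine with a working CUDA setup should return the input values doubled as float32. Any mistake in the kernel string only surfaces when SourceModule compiles it, which is the runtime-compilation drawback described above.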

Finally, for the best performance you can use the Python C/C++ extension interface, the approach taken by deep learning libraries like PyTorch. One of the strengths of Python is the ability to drop down into C/C++, and libraries like NumPy take advantage of this for increased speed. If you use Nvidia’s nvcc compiler for CUDA, you can use the same extension interface to write custom programs in CUDA and then call them from your Python code.

This talk will explore each of these methods, provide examples to get started, and discuss in more detail the pros and cons of each approach.