Lisa Roach

Lisa is a Production Engineer at Facebook and a CPython Core Developer. She is passionate about Python: she has used it on networking and security teams, and now focuses on improving the language itself and enabling its users.

Extending GDB with Python

Python & Libraries, Advanced
8/17/2019 | 2:50 PM-3:20 PM | Robertson

Description

GDB is powerful, and can be extended with Python to do more than just one-off debugging. This talk will describe using Python with GDB to write tools that interact with running processes, highlighting GDB’s ability to call C functions and how this can be coupled with Python’s C-API to inject code without needing to stop the process.

Abstract

Outline

The first half of the talk will be introductory in nature. I will talk about GDB and some of the specifics of how it can be extended with Python. In the second half, I will use these pieces to build a program that performs simple memory analysis of all objects in a running Python process.

  1. Introduction- who I am, what this talk is about (1 min)
  2. GDB Introduction (3 min)
     a. What it is typically used for: debugging segfaults, etc.
     b. How it works: ptrace under the hood
     c. Can execute C code (this will be important later in the talk)
     d. Source files (also important later in the talk)
  3. GDB Python API (6 min)
     a. Accessing Python from GDB: basic commands (python [command], source, py-list, etc.)
     b. Accessing GDB from Python: inside a GDB process, you can import gdb and the whole API is available to you. The API offers extensive options for many types of analysis, but I will focus on some of the more basic ones: executing GDB CLI strings, setting GDB parameters, and creating custom GDB commands. Docs: https://sourceware.org/gdb/current/onlinedocs/gdb/Python-API.html#Python-API
     c. Show an example using gdb and Python’s C-API: gdb.execute('call PyRun_SimpleString("print(\'Hello World\')")')
  4. Problem Statement: I want to know which objects are taking up the most memory in my program, but I don’t want to (or, more realistically, can’t) add code to my process to do the data collection. Solution: Use GDB to inject some memory analysis code into my running process (we can use open-source projects like objgraph or pympler to do the memory work for us) and get that data back without killing the debugged process. (2 min)
  5. Launch a GDB subprocess and attach to the running process we are interested in debugging. (2 min)
  6. In the GDB subprocess command, point GDB at a Python script file that it can use as a ‘source file’. This file will be able to import gdb and use the GDB Python API to create custom commands. (1 min)
  7. Write a custom command by inheriting from gdb.Command and using gdb.execute with PyRun_SimpleString (3 min)
  8. Memory analysis tools can be injected the same way as print('Hello World') using PyRun_SimpleString, but PyRun_SimpleString only likes one line at a time. To get around this, I place all the code I would like executed in another file, and have PyRun_SimpleString run an exec(file.read()) on the open file (3 min)
  9. Two gotchas: locking the GIL, and making sure the running process has access to the memory analysis modules (2 min)
  10. Conclusion (2 min)
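
Steps 6 through 9 of the outline can be sketched roughly as follows. This is a minimal illustration, not the code shown in the talk: the command name inject-file, the class InjectFile, the helper functions, the convenience variable $gil_state, and the path /tmp/analysis.py are all hypothetical, and the (int) and (void) casts are an assumption that helps GDB call into an interpreter built without debug information. Only gdb.Command, gdb.execute, gdb.COMMAND_USER, PyRun_SimpleString, PyGILState_Ensure, and PyGILState_Release are real API names.

```python
# Sketch of a GDB source file that defines a custom command (hypothetical names).
try:
    import gdb  # only importable when this file is loaded inside GDB
except ImportError:
    gdb = None  # lets the string-building helpers be exercised outside GDB


def c_string_literal(text):
    """Quote text as a C string literal for use inside a GDB 'call' command."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_inject_commands(script_path):
    """Return the GDB CLI strings that run script_path inside the debuggee."""
    # One line of Python that reads and executes the whole analysis file,
    # so the file itself can span as many lines as it needs.
    py_code = "exec(open({!r}).read())".format(script_path)
    return [
        # Take the GIL before touching the interpreter (assumed int cast).
        "set $gil_state = (int) PyGILState_Ensure()",
        "call (int) PyRun_SimpleString({})".format(c_string_literal(py_code)),
        # Hand the GIL back so the process keeps running normally.
        "call (void) PyGILState_Release($gil_state)",
    ]


if gdb is not None:

    class InjectFile(gdb.Command):
        """inject-file PATH: execute the Python file PATH in the attached process."""

        def __init__(self):
            super().__init__("inject-file", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            acquire, run, release = build_inject_commands(arg.strip())
            gdb.execute(acquire)
            try:
                gdb.execute(run)
            finally:
                # Release the GIL even if the injected call raised gdb.error.
                gdb.execute(release)

    InjectFile()  # instantiating the class registers the command with GDB
```

Assuming the sketch is saved as inject.py, one way to use it would be to attach with something like gdb -p <pid> -batch -x inject.py -ex "inject-file /tmp/analysis.py". The analysis file, and any modules it imports such as objgraph or pympler, must be readable and importable by the target process, not just by GDB.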