Daniel Imberman, Seth Edwards

Daniel Imberman is a Big Data Engineer on the Spark & NoSQL Team at Bloomberg LP. He received a BS/MS in distributed Machine Learning from UC Santa Barbara in 2016.

Seth Edwards is a Staff Software Engineer at PubNub, where he is responsible for data platforms, ETL, and DevOps.


Airflow on Kubernetes: Dynamically Scaling Python-based DAG Workflows

Python & Libraries, AI & Data, DevOps & Automation, Scale & Performance, Fun & People, Intermediate
8/19/2018 | 11:35 AM-12:20 PM | Robertson

Description

Over the past year, we have developed a native integration between Apache Airflow and Kubernetes that allows for dynamic allocation of DAG-based workflows and dynamic dependency management of individual tasks.

Abstract

Apache Airflow is a highly popular workflow engine based on directed acyclic graphs (DAGs) that allows users to define and deploy complex DAGs as Python code. It is considered a natural progression of the "code as configuration" philosophy of DevOps and ETL.
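
To make the idea of a workflow expressed as Python code concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API; the task names and the `run_order` helper are hypothetical and only illustrate how a DAG of dependent tasks can be declared in code and resolved into a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# tasks that must finish before it can start.
dag = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def run_order(graph):
    """Return one valid execution order for the tasks in the graph.

    TopologicalSorter raises graphlib.CycleError if the graph contains
    a cycle, which is what enforces the "acyclic" part of a DAG.
    """
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)
```

The two transform tasks have no dependency on each other, so a scheduler is free to run them in either order or in parallel; only the constraints declared in the graph are guaranteed.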

With the addition of the native "Kubernetes Executor" and "Kubernetes Operator", we have extended Airflow's flexibility with the dynamic allocation and dynamic dependency management capabilities of Kubernetes and Docker.
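
As a rough illustration of what per-task allocation and per-task dependencies could look like, the sketch below builds a Kubernetes Pod manifest for each task as a plain Python dictionary, giving every task its own Docker image and resource requests. This is an assumption-laden example, not the code of the Kubernetes Executor or Operator: the `pod_manifest_for_task` helper, the task names, and the image names are all hypothetical. Only the manifest fields themselves (`apiVersion`, `kind`, `metadata`, `spec.containers`, `resources.requests`, `restartPolicy`) are standard Kubernetes Pod fields.

```python
import json


def pod_manifest_for_task(task_id, image, command, cpu="500m", memory="512Mi"):
    """Build a Pod manifest (as a dict) for one task. Hypothetical helper."""
    # Pod names may not contain underscores, so normalize the task id.
    name = task_id.lower().replace("_", "-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"task-id": name}},
        "spec": {
            # A task pod should run once and exit, not be restarted.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "task",
                    # Each task brings its own image, and therefore its
                    # own library dependencies.
                    "image": image,
                    "command": command,
                    "resources": {"requests": {"cpu": cpu, "memory": memory}},
                }
            ],
        },
    }


# Hypothetical tasks with different images and resource needs.
tasks = [
    ("train_model", "example.com/ml-image:1.0", ["python", "train.py"], "2", "4Gi"),
    ("send_report", "example.com/report-image:2.3", ["python", "report.py"]),
]
manifests = [pod_manifest_for_task(*task) for task in tasks]
print(json.dumps(manifests[0], indent=2))
```

Because each task maps to its own pod, a heavy training step and a lightweight reporting step can request different amounts of CPU and memory and ship different dependencies without sharing a worker environment.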