Software Engineer at Blue Apron on the Data Engineering team, working daily with Python on our data pipeline. Excited by how Python is transforming Data Engineering.
Heard of Apache Airflow? Do you work with Airflow, or want to? Ever wonder how to better test Airflow? Have you considered all the data workflow use cases Airflow can handle? Come for a refresher on the key concepts, then we will dive into Airflow’s value add, common use cases, and best practices. Some use cases: Extract Transform Load (ETL) jobs, database snapshots, and ML feature extraction.
Background - What is Airflow?
Explain Cron and how it compares to Airflow
High level explain the key concepts of Airflow
* Directed Acyclic Graph (DAG) - nodes are tasks and edges are the dependency structure
* Third Party Integrations (Slack, Google Cloud Platform, AWS, etc)
* Airflow Hooks & Operators
* What is Airflow?
    * Programmatically author workflows
    * Stateful scheduling
    * Rich CLI and UI that make development easy
    * Logging, monitoring, and alerting
    * Modularity lends itself well to testability
    * Solves common problems with batch processing
    * Open sourced by Airbnb in 2015
* What value does Airflow add?
    * Retries tasks elegantly, which handles transient network errors
    * Alerts on failure (email or Slack)
    * Can re-run specific tasks in a large DAG
    * Supports distributed execution
    * Great OSS community and momentum
    * Can be self-hosted on AWS, Azure, or GCP
    * Managed option: GCP Cloud Composer (hosted Airflow); AWS Glue and Azure Data Factory are comparable managed services rather than hosted Airflow
Common Use Cases
Extract Transform Load (ETL) Jobs
* Airflow makes moving and transforming data very easy
* Can create custom Hooks for Third Party APIs
Efficiently Snapshot Databases
Create Test Environments for QA
ML Feature Extraction
Testing Airflow
* Unit tests for library functions
* Acceptance tests that run list_dags to verify every DAG imports cleanly
Doc MD for the DAG
* Include points of contact
* What remediation/escalation steps should the on-call person take when this DAG fails?
Exciting New/New(ish) Features
* Role Based Access Control
* Airflow 2.0 Improvements