Jules Damji

Jules S. Damji is an Apache Spark Developer & Community Advocate at Databricks and an MLflow contributor. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/Loudcloud, VeriSign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc and M.Sc in Computer Science (from Oregon State University and Cal State respectively) and an MA in Political Advocacy and Communication (from Johns Hopkins University).

Writing Continuous Applications with PySpark

Intermediate
8/15/2019 | 9:15 AM-12:45 PM | Workshop Room B

Description

In this tutorial we'll explore the concepts and motivations behind the continuous application, see how the Structured Streaming Python APIs in Apache Spark™ enable writing continuous applications, examine the programming model behind Structured Streaming, and look at the APIs that support it.

Abstract

We are in the midst of the Big Data zeitgeist, an era in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has given rise to the notion of a streaming application that is continuous and that reacts to and interacts with data in real time. We call this a continuous application.

Through presentation, code examples, and notebooks, I will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historical data to perform advanced analytics using Spark SQL, DataFrames and Datasets APIs.

You’ll walk away with an understanding of what a continuous application is, an appreciation of the easy-to-use Structured Streaming APIs, and a sense of why Structured Streaming in Apache Spark is a step forward in developing new kinds of streaming applications.

This tutorial will be both an instructor-led and a hands-on, interactive session. Instructions on how to get the tutorial materials will be covered in class.

What you will learn:
– Understand the concepts and motivations behind Structured Streaming
– How to use Spark DataFrame APIs
– How to use Spark SQL and create tables on streaming data
– How to write a simple end-to-end continuous application