Deborah Hanus

Deborah is a PhD student studying machine learning at Harvard University. She graduated from MIT with an M.Eng. in Electrical Engineering & Computer Science. Her work in machine learning has ranged from developing models of human perception to exploring medical data. She has been awarded NSF, Fulbright, and ACM/Intel Computational & Data Science fellowships, and she has spoken at PyTennessee, SciPy Conf, AI With the Best, QConNY, and PyCon US.

Speaker home page

Predicting Oscar winners & box office hits using things you found on the Internet

Dealing with data, Novice
8/12/2017 | 2:00 PM-3:00 PM | Fisher East

Description

Using Jupyter notebooks and scikit-learn, you’ll predict whether a movie is likely to win an Oscar or be a box office hit. Together, we’ll step through the creation of an effective dataset: asking a question your data can answer, writing a web scraper, and answering those questions using nothing but Python libraries and data from the Internet.

Abstract

This talk is for engineers, data scientists, and movie lovers who want to learn how to scrape information from the Internet and then use Python libraries (and some domain knowledge) to answer interesting questions with that data. It should be informative for people across a wide range of skill levels, but I expect it to be especially useful for anyone getting started with data science, HTTP requests, pandas, and scikit-learn.
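To give a flavor of the scraping step, here is a minimal sketch that turns an HTML table into a list of records using only the standard library. It is not the code from the talk: the class name `TableRowParser`, the helper `parse_table`, the column names, and the sample markup are all hypothetical, and the film titles and numbers are made-up placeholders.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Placeholder markup and values, made up for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>title</th><th>budget_musd</th></tr>
  <tr><td>Example Film A</td><td>12</td></tr>
  <tr><td>Example Film B</td><td>150</td></tr>
</table>
"""

records = parse_table(SAMPLE_HTML)
```

A list of dicts like `records` can be passed directly to `pandas.DataFrame(records)`. Note that every value is still a string at this point, so numeric columns need converting before modeling.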

By the end of this talk, you should (a) understand how to scrape and manage small to medium data sets, (b) know how to overcome the most common roadblocks (e.g., dealing with timeouts or API keys), (c) understand the tools you need to use and the steps you need to take to answer interesting questions in data science, and (d) have access to a great example project in a Jupyter notebook that you can use as a template or extend.
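As an illustration of the roadblocks in (b), the sketch below shows one common way to handle timeouts (retry with exponential backoff) and API keys (read them from the environment rather than hard-coding them). It is an assumption about how such a helper might look, not the talk's own code: the function names, the default parameters, and the `MOVIE_API_KEY` variable name are hypothetical.

```python
import os
import socket
import time
import urllib.error
import urllib.request


def fetch_url(url, timeout=10.0):
    """Fetch a URL with the standard library, giving up if the server is slow."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_with_retries(url, fetch=fetch_url, max_attempts=3, backoff=1.0,
                       sleep=time.sleep):
    """Call fetch(url), retrying on timeouts and network errors.

    The wait doubles after each failure: backoff, 2 * backoff, 4 * backoff, ...
    The last failure is re-raised so the caller can see what went wrong.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except (socket.timeout, TimeoutError, urllib.error.URLError):
            if attempt == max_attempts:
                raise
            sleep(backoff * 2 ** (attempt - 1))


def get_api_key(name="MOVIE_API_KEY"):
    """Read an API key from an environment variable (hypothetical name)."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

Passing `fetch` and `sleep` as parameters keeps the retry logic easy to try out in a notebook without touching the network: a stand-in function that fails twice and then succeeds exercises the same code path as a slow server.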