Amelia Taylor

After 16 years as a mathematics professor, including at Colorado College, Taylor competed for a place in, and completed, the Insight Data Science Fellows Program, a rapid-immersion training program in which she created a data product in four weeks. She is now a data scientist at Zymergen, where she works with scientists and robots. She works on end-to-end data products, including design of experiments, deciding what data to collect and how to store it, analysis of that data, and production engineering of data products.

Robots, Biology and Unsupervised Model Selection

AI & Data, Intermediate
8/18/2018 | 2:10 PM-2:40 PM | Fisher

Description

Zymergen combines robotics, software, and biology to improve microbial strains predictably and reliably. Robots can perform hundreds of experiments in parallel, and our analytical automation cleans and processes those data in near real time. I present an end-to-end approach, in Python, to model selection for unsupervised outlier detection algorithms, with an emphasis on parameter tuning.

Abstract

At Zymergen we integrate robotics, software, and biology to bring predictability and reliability to the process of rapidly improving microbial strains through genetic engineering. One critical part of this process is rapid, robust, and useful data processing that gives scientists the information they need to make the next round of changes and decide which strains to promote. Robots can perform hundreds of experiments in parallel, and our analytical automation cleans and processes those data in near real time. A first step is to identify outliers, which arise in the data because the process has multiple opportunities for failure; with this come the challenges of modeling outliers, selecting a model, and tuning that model's parameters. In Robots, Biology and Unsupervised Model Selection, I present an end-to-end approach, in Python, for tuning the parameters of unsupervised outlier detection algorithms. This problem is well studied for supervised and even semi-supervised (where the labels come from human evaluation) anomaly and outlier detection algorithms, but few resources are readily available for unsupervised algorithms in this area.
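
The abstract does not say which algorithm or tuning criterion the talk uses, so the following is only a minimal sketch of what tuning a parameter without labels can look like, not the speaker's method. It assumes a simple detector (distance from the median in scaled-MAD units) and an illustrative stability heuristic: among candidate thresholds, prefer the start of the longest run that flags exactly the same points. The function names, the candidate grid, and the sample measurements are all hypothetical.

```python
import statistics


def robust_scores(values):
    """Score each value by its distance from the median, in scaled-MAD units."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; scores are undefined for this sample")
    return [abs(v - med) / scale for v in values]


def flag_outliers(scores, threshold):
    """Return the indices whose score exceeds the threshold."""
    return frozenset(i for i, s in enumerate(scores) if s > threshold)


def select_threshold(scores, candidates):
    """Pick the first threshold of the longest run of candidates that flag
    the same set of points (an assumed stability heuristic, no labels needed).
    """
    if not candidates:
        raise ValueError("need at least one candidate threshold")
    flagged = [flag_outliers(scores, t) for t in candidates]
    best_start, best_len, run_start = 0, 0, 0
    for i in range(1, len(candidates) + 1):
        # A run ends at the end of the list or when the flagged set changes.
        if i == len(candidates) or flagged[i] != flagged[run_start]:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = i
    return candidates[best_start], flagged[best_start]


# Hypothetical replicate measurements with two obvious process failures.
measurements = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7, 2.1, 10.0, 18.5, 10.1]
scores = robust_scores(measurements)
candidates = [1.0 + 0.5 * i for i in range(11)]  # 1.0, 1.5, ..., 6.0
chosen, outliers = select_threshold(scores, candidates)
print(chosen, sorted(outliers))
```

On this sample the heuristic selects a threshold of 1.5 and flags indices 8 and 10, the two extreme values. A run that flags nothing counts as stable too, so the choice of candidate grid matters; a real pipeline would need a more careful criterion.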