Cung is the lead architect for the product search platform at Wayfair. He has developed a variety of applications from search intent engines to visual search engines. His team’s application design focuses on long-term scalability and supports millions of requests per day. He specializes in natural language processing, information retrieval, and computer vision.
With the rise of deep learning, computer vision tasks have become far more accessible to developers. Building a custom visual search engine opens up interesting possibilities for new features. Learn to train and deploy a visual search engine that enables visual similarity search. Implementing it from scratch enables domain-specific features and customizability that third-party solutions lack.
Building a visual search engine allows users to discover items they might otherwise never be exposed to. It enables features such as querying an application for visually similar items using an image.
The field of computer vision has exploded with recent developments in deep convolutional neural networks. These models can embed input images into lower-dimensional spaces, transforming them into fixed-size vectors. This is useful because all images then reside in a unified vector space where their relative positions hold significance. With deep learning, visual similarity models are trained to minimize the distance between embeddings of similar images while pushing embeddings of dissimilar images apart. This process is called metric learning.
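To make "relative positions hold significance" concrete, here is a minimal sketch using hypothetical, hand-written 4-dimensional embeddings (a trained model would produce real ones, typically with hundreds of dimensions):

```python
import numpy as np

# Hypothetical embeddings for three catalog images: two sofas and a lamp.
# A trained metric-learning model would place them like this.
sofa_a = np.array([0.9, 0.1, 0.0, 0.2])
sofa_b = np.array([0.8, 0.2, 0.1, 0.1])
lamp   = np.array([0.0, 0.9, 0.7, 0.5])

def dist(u, v):
    # Euclidean distance between two embedding vectors
    return np.linalg.norm(u - v)

print(dist(sofa_a, sofa_b))  # small distance: visually similar items
print(dist(sofa_a, lamp))    # large distance: visually dissimilar items
```

The only property that matters for search is the ordering: similar images end up closer together than dissimilar ones.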
Python packages like Keras and Tensorflow make it simple to structure and train a model. The training data needed are pairs of similar images and pairs of dissimilar images. This could be as easy as labeling various images of the same item as similar and randomly sampling images as dissimilar. Use Keras to implement a siamese neural network that takes two images as input and embeds each into the vector space. If an image pair is labeled as similar, the model adjusts itself to push the two vectors closer together. Depending on the training set size and GPU hardware, a visual similarity model could be ready within a few hours or days.
K-nearest-neighbor search is the method for finding similar images in the vector space. Each searchable image must be embedded into the vector space by pushing it through the trained model. When all vectors are generated, build a fast knn data structure from them to enable real-time vector search. Many Python packages exist that create binary tree structures or small-world networks for knn search. Once that structure is created, it can be pickled and stored for later production use.
Models trained with Keras (on a Tensorflow backend) can be deployed to production systems using Tensorflow Serving. TF Serving is a gRPC server that allows a client to run the model on images as if the model were running locally on the client. There are many reasons to run the model outside the microservice, such as performance and independent scaling.
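As a sketch, a TF Serving instance for the exported embedding model can be pointed at the saved model with a model config file like the following (the model name and base path are illustrative):

```
model_config_list {
  config {
    name: "visual_similarity"
    base_path: "/models/visual_similarity"
    model_platform: "tensorflow"
  }
}
```

The microservice then sends a gRPC Predict request naming `visual_similarity` and receives the embedding vector back, exactly as if it had run the model itself.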
Flask is a useful tool to build a microservice that ties together all the necessary components. Upon startup, the microservice loads the pickled index into memory so it can use the index for knn search. URL routes are easy to create and can accept POST data such as images. Upon image upload, the microservice generates the image's vector via a connection to TF Serving and uses that vector for a knn search on the in-memory index. The results of the search are the most similar images in the index.
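The whole flow can be sketched as a small Flask app. For a self-contained example, the pickled index is replaced by an in-memory stand-in and the TF Serving call is stubbed out; the route name, index size, and `embed_image` helper are all illustrative:

```python
import io
import numpy as np
from flask import Flask, request, jsonify
from sklearn.neighbors import NearestNeighbors

app = Flask(__name__)

# In production the knn index would be unpickled from disk at startup;
# here we build a small stand-in index from random vectors.
rng = np.random.default_rng(0)
catalog = rng.random((100, 64)).astype("float32")
index = NearestNeighbors(n_neighbors=5).fit(catalog)

def embed_image(image_bytes):
    # Stand-in for the TF Serving call that embeds the uploaded image;
    # a real implementation would send a gRPC Predict request and
    # return the model's embedding vector.
    stub_rng = np.random.default_rng(len(image_bytes))
    return stub_rng.random(64).astype("float32")

@app.route("/search", methods=["POST"])
def search():
    # Read the uploaded image, embed it, and run knn on the index.
    image_bytes = request.files["image"].read()
    vector = embed_image(image_bytes).reshape(1, -1)
    distances, indices = index.kneighbors(vector)
    # The nearest catalog indices are the most visually similar items.
    return jsonify({"results": indices[0].tolist()})
```

A client uploads an image via multipart POST and gets back the identifiers of the most similar catalog images, which the application can then render.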