Insights

Team Leads: Kaitlyn Chen, Greg Bodik

Advisor: Prof. Jeff Rzeszotarski

About Us

The Insights team is composed of data scientists, software engineers, statisticians, and designers dedicated to these main goals:
— Gaining insights from complex data using a variety of data science methods, from Natural Language Processing to Deep Learning, with a focus on improving model interpretability.
— Creating visual interfaces that display the knowledge to both technical and non-technical audiences, and researching the best ways of doing so.

Current Projects

Food Waste Management Visualization
The project aims to understand which aspects of data visualizations prompt people to consider reducing their food waste. By examining food waste through the lens of data visualization, this research can inform education campaigns on food waste and help local grocery stores understand the role they play in the food waste management pipeline. It will also identify the key elements that data visualizations need in order to support food waste management.
Learn More
Class Visualizer
Students have to visit several different resources, such as the class roster and student center, to find information about courses. The search features in these resources make it hard to compare classes or to see the relationships between a series of classes. This project aims to develop an interface (using D3) that visualizes information about classes, such as the number of students enrolled (and the maximum enrollment), prerequisites/corequisites, professor, and median grade.
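As a rough illustration of the kind of data such an interface might consume, the sketch below turns a list of course records into the nodes-and-links shape that D3 graph layouts conventionally accept. The field names, the `build_course_graph` helper, and the placeholder course codes are all assumptions made for this example; they do not describe the project's actual data model.

```python
import json


def build_course_graph(courses):
    """Convert course records into a {"nodes": [...], "links": [...]} dict.

    Each record is a dict with hypothetical keys: code, enrolled, capacity,
    professor, median_grade, and an optional list of prerequisite codes.
    """
    known_codes = {course["code"] for course in courses}

    # One node per course, carrying the attributes the interface would display.
    nodes = [
        {
            "id": course["code"],
            "enrolled": course["enrolled"],
            "capacity": course["capacity"],
            "professor": course["professor"],
            "median_grade": course["median_grade"],
        }
        for course in courses
    ]

    # One directed link per prerequisite, skipping codes missing from the data.
    links = [
        {"source": prereq, "target": course["code"]}
        for course in courses
        for prereq in course.get("prerequisites", [])
        if prereq in known_codes
    ]
    return {"nodes": nodes, "links": links}


# Placeholder records, invented purely for illustration.
sample_courses = [
    {"code": "COURSE 101", "enrolled": 180, "capacity": 200,
     "professor": "Instructor A", "median_grade": "B+"},
    {"code": "COURSE 201", "enrolled": 95, "capacity": 100,
     "professor": "Instructor B", "median_grade": "B",
     "prerequisites": ["COURSE 101"]},
]

graph = build_course_graph(sample_courses)
graph_json = json.dumps(graph, indent=2)  # text a front end could load
```

Keeping prerequisites as explicit links, rather than as a field on each node, is one way to let the front end draw relationships between classes without extra processing.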

Past Projects

GAN Art Generation
The GAN Art Generation project experimented with Generative Adversarial Networks to create image filters that resemble the work of different artists and to generate images from scratch (from random noise). A Flask app that loads the model weights for each filter and generator has been developed so that users can upload any image and view it through our different filters.
Learn More
Mask Off
Motivated by the inconvenience of not being able to use facial recognition software to unlock our phones while wearing masks, we set out to build a model that could accurately recognize a face that is partially obstructed by a mask. This required recognizing the upper portion of the subject's face while everything below the nose and along the chin was covered. To do so, we preprocessed a dataset of 13,000 people and drew masks on all of them (coloring over the nose, mouth, and jawline in solid blue). Then, we built and trained a Siamese Network using these matched pictures. Our final product lets the user select two images from the dataset and determines whether or not they portray the same person.
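For readers unfamiliar with Siamese Networks, the sketch below shows the comparison step in isolation: two images are mapped to embedding vectors by the same network, and the distance between the vectors decides whether they match. It assumes Euclidean distance, a contrastive loss, and a fixed decision threshold, which are common choices but are not confirmed as the ones used in this project; the function names and numeric values are hypothetical.

```python
import math


def euclidean_distance(emb_a, emb_b):
    """Distance between two embedding vectors of equal length."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)))


def contrastive_loss(distance, same_person, margin=1.0):
    """Contrastive loss for one pair (an assumed training objective).

    Matching pairs are penalized for being far apart; non-matching pairs
    are penalized only when they are closer than the margin.
    """
    if same_person:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def is_match(emb_a, emb_b, threshold=0.5):
    """Decide whether two embeddings portray the same person.

    The threshold is an illustrative value; in practice it would be
    tuned on held-out pairs.
    """
    return euclidean_distance(emb_a, emb_b) < threshold


# Hypothetical embeddings standing in for the network's outputs.
masked_face = [0.10, 0.20, 0.30]
same_face = [0.12, 0.18, 0.33]
other_face = [0.90, 0.70, 0.10]

print(is_match(masked_face, same_face))
print(is_match(masked_face, other_face))
```

Because both images pass through the same network, the model only has to learn an embedding in which the visible upper part of the face is enough to keep matching pairs close together.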
Learn More
Databowl
NFL Databowl used statistical, machine learning, and deep learning techniques to predict how many yards a team will gain on rushing plays, using data provided by the NFL’s Next Gen Stats. The goal was to gain deeper insight into rushing plays and help coaches develop effective strategies to optimize their play selections. Our final accuracy scores placed in the top 1% of Kaggle’s NFL Big Data Bowl competition.
Learn More
Visualizing ML
The main idea behind the Visualizing ML project was to visualize machine learning and computer-driven processes in a way that is appealing and insightful to users, specifically by visualizing what a chess engine thinks about a given game state. The project had three main parts: getting the top moves for a position from an engine; building a GUI with features such as the chessboard, arrow drawing, and square highlighting; and connecting the two by feeding the engine's output into the GUI visualizations. The GUI can take in a FEN string, which represents a specific game state, and display the new position on the board. For this position, it generates arrows to show the best moves and responses, shows a heatmap of which squares are more active, and displays the top move sequences with their probabilities. Overall, these features are meant to give the user different visual insights into what the engine thinks about the position.
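To make the FEN input concrete, here is a minimal sketch of how the piece-placement field of a FEN string can be expanded into an 8x8 grid that a board widget could render. The `parse_fen_board` helper is written for this example and is not taken from the project's code; it reads only the first FEN field and ignores castling rights, en passant, and move counters.

```python
def parse_fen_board(fen):
    """Expand the piece-placement field of a FEN string into an 8x8 grid.

    Row 0 is rank 8 (Black's back rank) and row 7 is rank 1.
    Empty squares are represented by ".".
    """
    placement = fen.split()[0]
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("FEN piece placement must contain 8 ranks")

    board = []
    for rank in ranks:
        squares = []
        for ch in rank:
            if ch.isdigit():
                # A digit encodes that many consecutive empty squares.
                squares.extend(["."] * int(ch))
            else:
                # Uppercase letters are White pieces, lowercase are Black.
                squares.append(ch)
        if len(squares) != 8:
            raise ValueError("each rank must describe exactly 8 squares")
        board.append(squares)
    return board


# The standard starting position.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

board = parse_fen_board(START_FEN)
for row in board:
    print(" ".join(row))
```

A grid like this is one convenient intermediate form: arrows, highlights, and heatmap values can all be addressed by the same row and column indices.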
Learn More