Projects

Project Spotlight

At Cornell Data Science, our project work sits at the intersection of theory and practical application across data science disciplines. Our four subteams (Data Science, Machine Learning Engineering, Data Engineering, and Quantitative Finance) run projects spanning data analytics, machine learning models, and quantitative financial strategies. Each project supports our mission to foster an environment of learning and growth while producing data-driven solutions for real-world problems.

Fall 2024

VibeSync

VibeSync is a research project that explores the boundary between ML research and music. Inspired by recent advances in contrastive learning and joint language-audio embeddings, we aim to build a proof-of-concept system where a user specifies a playlist title and receives recommended songs. We want to see how far we can take this and what insights we can gain.
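
As a rough illustration of how a joint language-audio embedding space could drive recommendations, the sketch below ranks songs by cosine similarity to a playlist-title embedding. The vectors, song names, and the `recommend` helper are hypothetical toy values chosen for this example; a real system would obtain embeddings from a trained model, and this is not VibeSync's actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recommend(title_vec, song_vecs, k=2):
    """Return the k song names whose embeddings are closest to the title."""
    ranked = sorted(
        song_vecs,
        key=lambda name: cosine(title_vec, song_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional embeddings (assumed values, not model output).
songs = {
    "lofi_beat": [0.9, 0.1, 0.0],
    "metal_riff": [0.0, 0.2, 0.9],
    "soft_piano": [0.8, 0.3, 0.1],
}
title = [1.0, 0.2, 0.0]  # stands in for the embedding of a playlist title

top = recommend(title, songs, k=2)
print(top)
```

Cosine similarity is a common choice for contrastively trained embeddings because such models are usually trained to align the directions of matching pairs rather than their magnitudes.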

imdb

In-Memory Database (IMDb) is an implementation of the Redis protocol.
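
For readers unfamiliar with the Redis protocol (RESP), the sketch below shows its wire format: a client sends each command as an array of bulk strings, and the server answers with typed replies. The function names `encode_command` and `decode_reply` are hypothetical, only three reply types are handled, and this is an illustration of the format rather than the project's own implementation.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*" + str(len(args)).encode() + b"\r\n"]
    for arg in args:
        data = arg.encode("utf-8")
        # Each bulk string is "$<byte length>\r\n<data>\r\n".
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)

def decode_reply(buf: bytes):
    """Decode one simple-string, integer, or bulk-string reply."""
    line, _, rest = buf.partition(b"\r\n")
    kind, payload = line[:1], line[1:]
    if kind == b"+":  # simple string, e.g. +OK
        return payload.decode()
    if kind == b":":  # integer, e.g. :42
        return int(payload)
    if kind == b"$":  # bulk string, length-prefixed
        length = int(payload)
        if length == -1:
            return None  # null bulk string (missing key)
        return rest[:length].decode()
    raise ValueError(f"unsupported reply type: {kind!r}")

request = encode_command("SET", "k", "v")
print(request)
print(decode_reply(b"$5\r\nhello\r\n"))
```

Length prefixes make bulk strings binary-safe, so values may contain `\r\n` without confusing the parser.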

My Course Index v2

My Course Index v2 is a project that aims to build a search engine for edstem at Cornell University. It is a web application that lets users search course content efficiently using natural language.

Prediction Markets

Prediction markets allow users to trade contracts based on future event outcomes, using crowd wisdom to generate probability forecasts. Platforms like Kalshi demonstrate how well-calibrated predictions and skilled participants can lead to accurate forecasting of real-world events.
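
To make "probability forecasts" and "well-calibrated" concrete, the sketch below treats the price of a binary contract that pays out 100 cents as an implied probability and scores a set of forecasts with the Brier score (mean squared error against the 0/1 outcomes; lower is better). The 100-cent payout, the omission of fees, and the function names are assumptions made for this example, not a description of any platform's API.

```python
def implied_probability(price_cents: float) -> float:
    """Implied probability of a binary contract paying 100 cents (fees ignored)."""
    return price_cents / 100.0

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical prices (in cents) and resolved outcomes (1 = event happened).
prices = [50, 50]
outcomes = [1, 0]

forecasts = [implied_probability(p) for p in prices]
score = brier_score(forecasts, outcomes)
print(forecasts, score)
```

A forecaster who always answers 0.5 scores 0.25 on any set of binary outcomes, which is a useful baseline when judging whether market prices carry real information.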

Millennium X CDS - Data Quality Platform

This project is another collaboration between Millennium and CDS. It supports a qualitative investment thesis with quantitative evidence and explores different quantitative strategies in finance. The goal is to build pipelines to collect and clean data at scale, perform signal research to construct robust portfolios, and gain insight into industry tools and techniques.

CDS X arXiv: Paper Moderation

This project is a collaboration between CDS and arXiv. The goal is to build a system that can automatically moderate papers submitted to arXiv. The system will use a combination of natural language processing and machine learning techniques to identify and flag potentially problematic papers.

FileBuddy

FileBuddy is a project that aims to build a system that can automatically manage files on a user's computer. It uses ReAct Agents to interact with the user and the file system.

Spring 2024

Pocket-ML

We developed a mobile application and library that reduce the overhead of training ML models by allowing users to start, stop, train, monitor, and deploy their models remotely from their mobile device.

Caddie-AI

We are developing a mobile app that uses computer vision to analyze a user's golf swing through their smartphone camera and predict the trajectory and flight of their golf ball. On the course, the app would provide caddie-like advice to help players improve their game.

sketchify

Sketchify is a tool that converts color images into infinitely scalable vector sketches for coloring books. Users can either select a portion of an image to convert into a line drawing, or upload their own sketch to receive a similarity score compared to the AI-generated version.

Millennium X CDS

Building scalable pipelines for data collection and cleansing, and utilizing quantitative strategies for portfolio construction.

Fall 2023

Ball-101

Ball-101 builds a sports analytics and stat-tracking service for the community at large and for low-budget sports programs.

Munch

Munch is a CDS MacrosAI project that aims to build a proof-of-concept system that can classify images in real time using a mobile device.

Spring 2023

Edge-ML

Edge-ML is a project that aims to bring ML to the edge of the network, allowing for faster and more efficient processing of data. We aim to build a proof-of-concept system that can classify images in real time using a mobile device.

TRIVAI

An iOS application that generates quizzes for users based on any topic.

DIGS

DIGS is a distributed game server that allows users to play games with each other over the internet. We aim to build a proof-of-concept system that can host a variety of games and allow for multiplayer gameplay.

Fall 2022

MathSearch

MathSearch is a next-generation search engine designed to locate equations within PDFs using LaTeX math script. It addresses a major limitation of using CTRL/CMD + F with standard keyboards: the inability to directly search for mathematical symbols (like integrals and summations). Our platform enables users to search equations using LaTeX notation, and we return PDF pages with bounding boxes highlighting the closest matches to the queried equations.

Fall 2021

self-driving-car

CDS Self-Driving Car is a multi-year project undertaken by the Cornell Data Science team, intended to demonstrate autopilot for a full-size car. It aims to demonstrate tight integration of a camera-based vision algorithm to navigate a car safely. We use a SLAM system to localize the car and an additional lane recognition pipeline to see the road. This is all built upon a Python control loop and will leverage robotics to directly actuate the steering wheel, augmenting a normal car.

For more projects, you are welcome to visit our GitHub organization.