Project: West Coast Vectors Team Submission for the OpenLongevity HackAging.ai Hackathon (October 2025)
Challenge: Sequence-to-Function Track
This project addresses a bottleneck in longevity research: critical protein sequence-to-function knowledge is locked away in unstructured scientific literature. We built an AI-powered pipeline that automatically extracts these relationships, stores them in a queryable knowledge base, and synthesizes accessible summary articles for researchers. The initial focus is the C-C Chemokine Receptor family (CCR1, CCR2, CCR5, CCR7) and its role in inflammaging.
- Live Web Application: https://protein-seq-to-function.vercel.app/
- Presentation Slides: View Slides (Google Slides)
- Video Summary (2 min): Watch on YouTube
This repository contains the code and resources developed during the hackathon:
- backend/: The FastAPI application serving the extracted knowledge and providing endpoints for the pipeline.
- core_logic/: Core Python modules, including the LangChain tool definition.
- data/: Snapshots of data used or generated by the pipeline.
- deployment_test/: Scripts for testing containerized deployment to GCP Cloud Run.
- docs/: Additional documentation and diagrams.
- frontend/: The Next.js web application code.
- notebooks/: Jupyter notebooks for prototyping and exploration.
- scripts/: One-off utility scripts.
- test_pipeline/: Scripts for testing offline pipeline components.
- .github/workflows/: GitHub Actions workflows for CI/CD.
- Backend: Python, FastAPI, LlamaIndex, FAISS, Nebius AI (Embeddings & LLM)
- Frontend: Next.js
- Deployment: Google Cloud (Cloud Run, Cloud SQL), GitHub Actions
- Data: Pandas, SQLite (Corpus Index)
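As a rough illustration of how a SQLite corpus index might be laid out and queried, here is a minimal sketch using only Python's standard `sqlite3` module. The table name `corpus_index`, its columns, the helper functions `build_index` and `documents_for`, and the sample rows are all hypothetical placeholders and are not taken from this repository; the actual schema may differ.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the schema used by this project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS corpus_index (
    doc_id  TEXT PRIMARY KEY,
    protein TEXT NOT NULL,
    title   TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_corpus_protein ON corpus_index (protein);
"""


def build_index(conn, records):
    """Create the table if needed and upsert (doc_id, protein, title) rows."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO corpus_index (doc_id, protein, title) "
        "VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def documents_for(conn, protein):
    """Return the sorted doc_ids indexed under a protein (case-insensitive input)."""
    rows = conn.execute(
        "SELECT doc_id FROM corpus_index WHERE protein = ? ORDER BY doc_id",
        (protein.upper(),),
    )
    return [row[0] for row in rows]


# Placeholder rows for demonstration; these are not real corpus entries.
conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("doc-001", "CCR5", "Placeholder title A"),
    ("doc-002", "CCR2", "Placeholder title B"),
    ("doc-003", "CCR5", "Placeholder title C"),
])
print(documents_for(conn, "ccr5"))  # ['doc-001', 'doc-003']
```

An index on the `protein` column keeps per-receptor lookups cheap as the corpus grows, which is one plausible reason to keep this metadata in SQLite alongside a separate vector index.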