Hackathon 2023


The 2023 QBI hackathon was a 48-hour gathering that forged connections between the lively Bay Area developer community and scientists from UCSF, UCB, and UCSC. Teams of life scientists, computer scientists, and industry innovators presented a wide range of project concepts. The event was a testament to the power of collaboration and to the ideas that emerge when we bridge the gap between disciplines. Explore descriptions of the team projects and view links to their code below.

Team StructHunt - First place

Exploring biomolecular structures is crucial for advancing our understanding of biology and tailoring effective therapeutics. The recent impact of the SARS-CoV-2 pandemic and the success of the AlphaFold2 AI tool underscore the importance of open access to structural biology data provided by the Protein Data Bank (PDB) in accelerating global scientific research and drug development. This team's initiative utilized large language models for natural language processing to search for integrative structures of biological molecules reported in publications but not yet deposited in public data-sharing resources. Through StructHunt, their cloud-based software pipeline, they employed ChatGPT to scan preprints, generate summaries of relevant structures, and notify the RCSB PDB biocuration team, encouraging authors to deposit their findings in the emerging PDB-Dev archive. For code and more details see the team’s GitHub page.

Team Animorphs - Second place

Currently, most generative protein models focus on generating static structures with single conformations. However, in living organisms, proteins occupy many conformational states to perform their functions. This team sought to capture these nuances in protein conformation by creating ConfDiffusion: a generative model that, given a polypeptide sequence, could output multiple, diverse structural conformations for it. Their model is a modified version of Protpardelle, an open-source generative protein diffusion model. Building a dataset from CoDNaS, a database of ~30,000 protein chains and ~430,000 possible conformations, they trained Protpardelle to output new protein conformations for a given protein structure. If perfected, their model could provide greater clarity to a wide range of problems in biology and biochemistry, from helping researchers understand how motor proteins change shape as they operate, to illuminating protein conformational changes in acidic tumor microenvironments. For code and more details see the team’s GitHub page.

Team SpekL - Third place

Research over the past decade has elucidated the importance of a diverse class of organelles known as biomolecular condensates in both normal biology and pathology. This team's project aimed to make data-driven inferences on the correlation between specific condensates and cancer by comparing condensate proteomes to cancer gene expression databases. They found preliminary evidence for such a correlation and plan to conduct further analysis to identify specific condensates that can be targeted in future cancer therapies. For code and more details see the team’s GitHub page.

Team Propel - Bonus prize from Neurotech Collider Lab

This team's project aimed to develop a multi-modal machine learning model, combining a VGG16 neural network with an artificial neural network, to determine whether a given patient should be recommended for radiation therapy. They used a combination of MRI data, genetic information, and clinical data provided by the Burdenko Glioblastoma Progression (BGP) dataset on The Cancer Imaging Archive. Future directions include running downstream analysis on transcriptomic data to analyze gene expression related to MGMT promoter methylation and IDH1/2 in order to validate disease progression, as well as improving model performance and accuracy. They hope that their project can be employed by physicians, neurologists, radiologists, and other practicing medical professionals when diagnosing patients and developing effective treatment plans, and that it will ultimately improve the utility of machine learning in the biomedical sciences. For code and more details see the team’s GitHub page.

Team Alphabeta

Team alpha↔beta explored a novel approach to studying TCR-peptide recognition, a central mechanism of adaptive T-cell immunity. Existing technologies rely on learning the correspondence between TCR and peptide sequences using a supervised approach, and progress is held back by the scale of the available data. In contrast, paired TCRα and TCRβ sequences can be collected cheaply and in abundance, and the team's results demonstrate that learning to match these two parts of the TCR with deep learning can recover peptide-specific clusters. This proof of concept opens an opportunity to significantly improve models of TCR-peptide recognition, a critical step toward studying autoimmunity and allergy and improving vaccines. For code and more details see the team’s GitHub page.


View event photos here.
