Hackathon 2018

QBI fellow Kliment Verba put together a program focused on the future of proteomics, light microscopy, and cryoEM for scientists, developers, and entrepreneurs passionate about biotechnology. During our first-ever hackathon, close to 50 people from different backgrounds came together to work on cutting-edge problems in processing imaging and mass spectrometry data. Please see below for descriptions of the team projects and links to their code.

Spectranet - First Place

Modern mass spectrometers have the remarkable power to discern species whose masses differ by as little as a single proton. The resulting distribution of masses in a sample is called a spectrum. Identifying the precise masses present in a biological sample enables identification of proteins and of changes in protein expression, protein interactions, post-translational modifications, and more, all of which inform our understanding of basic and disease-relevant biology. However, the more complex and interesting the sample, the more complicated and crowded the resulting spectra, which presents a significant computational challenge in identifying the distinct species present. Classically, a major part of data analysis is comparing your spectra against existing “standard” spectra of known samples. The process of collecting such “standard” spectra is slow, laborious, and quite limited.

This team trained a neural network model on previously deposited mass spectrometry spectra to predict theoretical spectra directly from peptide sequences, bypassing the need to collect the “standard” spectra. The team also created a web interface where a user can enter a peptide sequence and immediately see and download the predicted spectrum. For code and details see the team’s GitHub page.
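
As a rough, hypothetical sketch of the general idea (not the team's actual code), the snippet below maps a peptide sequence to predicted fragment-ion intensities with a small recurrent network in PyTorch; the amino-acid encoding, network shape, and output format are all illustrative assumptions.

```python
# Illustrative sketch only; not the Spectranet model. Assumes peptides are
# encoded as integer sequences over the 20 standard amino acids and that the
# network predicts two fragment-ion intensities (e.g. b/y) per residue position.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

class SpectrumPredictor(nn.Module):
    def __init__(self, emb_dim=32, hidden=64, ions_per_position=2):
        super().__init__()
        self.embed = nn.Embedding(len(AMINO_ACIDS), emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        # One predicted intensity per ion type at each residue position.
        self.head = nn.Linear(2 * hidden, ions_per_position)

    def forward(self, seq_idx):
        x = self.embed(seq_idx)              # (batch, length, emb_dim)
        h, _ = self.rnn(x)                   # (batch, length, 2 * hidden)
        return torch.sigmoid(self.head(h))   # intensities normalized to [0, 1]

def encode(peptide):
    return torch.tensor([[AA_TO_IDX[aa] for aa in peptide]])

model = SpectrumPredictor()
intensities = model(encode("PEPTIDEK"))      # shape (1, 8, 2)
print(intensities.shape)
```

In practice such a model would be trained on large libraries of deposited spectra, and its per-position outputs converted into an m/z versus intensity spectrum for display and download.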


Scutoids - Second Place

Simply looking at your samples is still one of the cornerstones of modern biology. Scientists therefore routinely use a multitude of light microscopy methods to visualize cells and the molecules within them. A major strength of this approach is that light microscopy can be used on living cells, letting one see how cell rearrangements and divisions over time lead to the formation of tissues, or how rearrangements of subcellular structures lead to different cell behaviors. Technology now allows scientists to collect movies (often in 3D!) of cells dividing and moving over time scales of days, so scientists are often presented with terabytes of imaging data that need to be analyzed. However, because image contrast is fairly low and cells and subcellular structures usually lack regular shapes, identifying them often ends up being a manual process. Although there are computational methods that can identify cells when they are sparse on the microscope slide, as cells divide and become more and more crowded, the task of identifying individual cells becomes next to impossible.

This team utilized both neural networks and other machine learning methods to dynamically identify and track individual cells and cell nuclei as the cells crawled around microscope slides. Importantly, their code worked not only on sparse cells but also once the cells became densely crowded and abutted one another. For code please check out the team’s GitHub page.
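
For context on why crowded fields are hard, the sketch below shows one classical baseline for splitting touching nuclei, a distance-transform watershed in scikit-image; it is not the team's pipeline, and the threshold and marker choices are assumptions.

```python
# Illustrative baseline only; not the Scutoids pipeline. A classic way to
# split touching cells/nuclei: threshold, compute a distance transform, and
# run a marker-based watershed.
import numpy as np
from scipy import ndimage as ndi
from skimage import filters, segmentation

def segment_nuclei(image):
    """Segment roughly convex, possibly touching nuclei in a 2D grayscale frame."""
    mask = image > filters.threshold_otsu(image)        # foreground vs. background
    distance = ndi.distance_transform_edt(mask)          # distance to background
    # Regions near each nucleus centre become one marker per nucleus.
    markers, _ = ndi.label(distance > 0.5 * distance.max())
    return segmentation.watershed(-distance, markers, mask=mask)

# Two touching synthetic blobs get split into two labels.
yy, xx = np.mgrid[:96, :96]
blob1 = (xx - 32) ** 2 + (yy - 48) ** 2 < 16 ** 2
blob2 = (xx - 62) ** 2 + (yy - 48) ** 2 < 16 ** 2
print(segment_nuclei((blob1 | blob2).astype(float)).max())   # expected: 2
```

Tracking could then link segmented objects between consecutive frames, for example by nearest-neighbour matching of skimage.measure.regionprops centroids.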


PartiAnimals - Third Place

Proteins are the molecular workers of life, performing a variety of tasks, from sensing light or heat to digesting the proteins and sugars in our diet into smaller pieces. A fundamental tenet of modern biology is that the unique functions of proteins arise from different proteins adopting unique 3D shapes. Visualizing these protein shapes in atomic detail enables us to understand how proteins work and how their function goes awry in disease. The field tasked with such visualization is structural biology, and it has recently been revolutionized by cryo-electron microscopy (cryoEM). In cryoEM we collect millions of 2D protein images and then use computer clusters to perform image processing, classification, denoising, and 3D reconstruction on these images to obtain an atomic model of the protein of interest. Although there have been major advances in classification algorithms, image classification still often produces errors that are obvious to the human eye.

This team created software with an intuitive interface where a user can view filtered and downsampled images of individual proteins alongside their higher-contrast class averages, allowing rapid identification of misclassified protein images. Because it runs as a web interface, the task can be crowdsourced, enabling rapid manual curation where algorithms fail. To see the team’s code check out their GitHub.
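
Purely as an illustration of the kind of web interface described (not the team's application), the sketch below uses Flask to serve particle/class-average image pairs and record reviewer flags; the routes, file paths, and in-memory storage are hypothetical.

```python
# Hypothetical sketch; not the PartiAnimals code. Pairs a downsampled particle
# image with its class average so a reviewer can flag misclassified particles.
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory index: particle id -> image paths.
PAIRS = {
    0: {"particle": "/static/particles/000001.png",
        "class_average": "/static/classes/class_07.png"},
}
FLAGGED = set()

@app.route("/pair/<int:pid>")
def get_pair(pid):
    return jsonify(PAIRS[pid])

@app.route("/flag/<int:pid>", methods=["POST"])
def flag(pid):
    FLAGGED.add(pid)   # reviewer marks this particle as misclassified
    return jsonify({"flagged": sorted(FLAGGED)})

if __name__ == "__main__":
    app.run(debug=True)
```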


EMPoser

As described above, cryoEM is becoming a major structural biology technique in which we obtain atomic 3D models of proteins from their 2D images. However, the detail we can observe is often limited by the fact that each individual protein molecule is not necessarily rigid: there may be regions of it that move about. During cryoEM data processing we rely on robust classification of individual 2D protein images and then average like images together to effectively denoise our picture of the protein. This only works if the images being added are all the same. If there are significant differences between images (i.e., motion of a subregion of the protein) and we cannot characterize that motion in any regularized way, we will fail to obtain an atomic-detail 3D model of the protein. Regularizing and visualizing such protein motions in cryoEM models is one of the major areas of active and future work in the cryoEM field.

This team modified and applied the sub-region classification and refinement analysis in the Relion processing package to characterize and visualize the dynamic regions of a protein. Using quaternions, they were able to decompose the major motions into individual principal components and visualize protein motions along those components.
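
As a hedged sketch of the quaternion-plus-PCA idea (not the EMPoser implementation), the snippet below converts per-particle Euler angles into unit quaternions and extracts the dominant components of orientation variation; the ZYZ Euler convention and the input format are assumptions about what a Relion-style refinement would provide.

```python
# Illustrative sketch only; not the EMPoser code. Each particle's refined
# sub-region orientation is turned into a unit quaternion, and PCA finds the
# dominant axes of orientation variation ("motion modes").
import numpy as np
from scipy.spatial.transform import Rotation
from sklearn.decomposition import PCA

def motion_components(euler_angles_deg, n_components=3):
    """euler_angles_deg: (N, 3) per-particle Euler angles in degrees (ZYZ assumed)."""
    rot = Rotation.from_euler("ZYZ", euler_angles_deg, degrees=True)
    quats = rot.as_quat()                 # (N, 4) unit quaternions, scalar-last
    quats[quats[:, 3] < 0] *= -1          # resolve the q / -q sign ambiguity
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(quats)     # per-particle coordinate along each mode
    return pca.components_, pca.explained_variance_ratio_, scores

# Synthetic example: a swing about one axis plus a little angular noise.
rng = np.random.default_rng(0)
tilts = 30 * np.sin(np.linspace(0, 2 * np.pi, 500))
angles = np.column_stack([np.zeros(500), tilts, np.zeros(500)]) + rng.normal(0, 1, (500, 3))
_, variance_ratio, _ = motion_components(angles)
print(variance_ratio)   # the first component should dominate
```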


FFTeeth

To make well-fitting tooth crowns, a precise 3D model of the tooth is essential. This team utilized algorithms akin to those used to reconstruct 3D cryoEM models from 2D images to generate a model of a tooth from a collection of 2D X-ray images.
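
As an illustration of one standard tomographic route to such a reconstruction (not necessarily the team's exact algorithm), the sketch below uses filtered back-projection in scikit-image to recover a 2D slice from simulated 1D X-ray projections; a full tooth model would stack many such reconstructed slices.

```python
# Illustrative sketch only; demonstrates filtered back-projection on a test
# phantom rather than real dental X-ray data.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

slice_2d = shepp_logan_phantom()                 # stand-in for one tooth slice
angles = np.linspace(0.0, 180.0, 180, endpoint=False)

sinogram = radon(slice_2d, theta=angles)         # simulate projections at each angle
reconstruction = iradon(sinogram, theta=angles)  # filtered back-projection (ramp filter)

rms_error = np.sqrt(np.mean((reconstruction - slice_2d) ** 2))
print(f"per-pixel RMS reconstruction error: {rms_error:.4f}")
```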


Surgeon’t

Laparoscopic surgery is a modern, minimally invasive surgical technique in which, rather than making one large incision, doctors use specialized tools together with a laparoscope and operate through a small number of incisions 0.5-1 cm in diameter. This technique leads to less hemorrhaging, less pain, and faster recovery. When placing the incisions for the laparoscope and/or surgical tools, care must be taken to avoid damaging vital organs. This team trained a neural net to identify, locate, and segment the bladder from abdominal MRI scans and then to propose the best point of incision on the patient so as not to disrupt the bladder during laparoscopic surgery for prostate tumors. Their code can be found on GitHub.
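
As a purely illustrative sketch of a segmentation network of this general kind (not the Surgeon't model), the snippet below defines a tiny U-Net-style encoder/decoder in PyTorch that outputs a per-pixel bladder probability map for a 2D MRI slice; the architecture, channel sizes, and input shape are assumptions.

```python
# Illustrative sketch only; not the Surgeon't network.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)        # upsampled features + skip connection
        self.out = nn.Conv2d(16, 1, 1)

    def forward(self, x):                    # x: (batch, 1, H, W) MRI slice
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return torch.sigmoid(self.out(d))    # per-pixel bladder probability

model = TinyUNet()
mask = model(torch.randn(1, 1, 128, 128))    # (1, 1, 128, 128) probability map
print(mask.shape)
```

The predicted mask could then be combined with the scan geometry to score candidate incision points by their distance from the segmented bladder.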