BioVis Project: Identification of Mutations that Affect Protein Function

Overview of the Methodology

The histograms at the top of the visualization show the significant differences between a defective triosephosphate isomerase (which we call dTIM) and the mean of its family (5000 other TIMs) for each of the primary characteristics of a protein. Let:

pc1_dTIM_k_p = 1st principal component of dTIM for characteristic k and position p
ave(pc1_fTIM_k_p) = the average 1st principal component of the TIM family for characteristic k and position p
std(pc1_fTIM_k_p) = the standard deviation of the 1st principal component of the TIM family for characteristic k and position p

The score is then the standardized distance between the 1st principal component for dTIM and the average for the family:

score_k_p = (pc1_dTIM_k_p - ave(pc1_fTIM_k_p)) / std(pc1_fTIM_k_p)
for k = 1 (Alpha & Turn Propensity),..., 6 (Other Characteristics)
and p = 1,..., 248
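The score above is a per-characteristic, per-position z-score. A minimal sketch of that calculation follows, assuming the 1st principal components are already available as arrays. The function name, the array shapes, and the choice of the population standard deviation (ddof=0) are assumptions made for illustration and are not taken from the project's code.

```python
import numpy as np

def standardized_scores(pc1_dtim, pc1_family):
    """Compute score_k_p for every characteristic k and position p.

    pc1_dtim   -- hypothetical array of shape (K, P): dTIM values
    pc1_family -- hypothetical array of shape (N, K, P): one slice per family member
    """
    pc1_dtim = np.asarray(pc1_dtim, dtype=float)
    pc1_family = np.asarray(pc1_family, dtype=float)

    # ave(pc1_fTIM_k_p) and std(pc1_fTIM_k_p), taken across the family axis.
    family_mean = pc1_family.mean(axis=0)
    family_std = pc1_family.std(axis=0)  # assumption: population std (ddof=0)

    # Positions where the family has zero variance yield inf or nan here.
    return (pc1_dtim - family_mean) / family_std
```

With K = 6 characteristics and P = 248 positions, the result would be a 6 x 248 array holding one score per histogram bar.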

The 3-dimensional coordinates used to compute the distance histogram come from the PDB file provided here for the S. cerevisiae triosephosphate isomerase (which we call scTIM). We used the location of the central carbon atom of each residue as that residue's (x, y, z) coordinates when calculating pairwise distances.
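As a rough illustration of the pairwise distance step, the sketch below computes Euclidean distances between residues from one (x, y, z) coordinate per residue. Extracting those coordinates from the PDB file is left out, and the function name and input layout are assumptions rather than the project's actual code.

```python
import numpy as np

def pairwise_distances(coords):
    """Return a (P, P) matrix of Euclidean distances between residues.

    coords -- hypothetical array of shape (P, 3), one (x, y, z) row per residue,
              in the same units as the PDB file (Angstroms).
    """
    coords = np.asarray(coords, dtype=float)

    # Broadcast to shape (P, P, 3): the difference vector for every residue pair.
    diff = coords[:, None, :] - coords[None, :, :]

    # Length of each difference vector.
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The matrix is symmetric with a zero diagonal, so only the upper triangle is needed to build a histogram of distances.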


Visualization Features

The visualization offers several features built on brushing and linking:

The visualization is initialized with all of the histograms brushed. By brushing these histograms (i.e. mouse-down, drag, and then mouse-up) you can identify residues or regions of the dTIM enzyme that differ significantly from the family of 5000 other proteins. This provides a holistic view of all of the stacked residuals so that you can identify regions of interest. Furthermore, residues that are mutated relative to the functional 'parent' scTIM are encoded in red, and residues that are identical to scTIM are encoded in light grey.

If you would like to focus on a single property (e.g. hydrophobicity), you may unbrush all of the other histograms to isolate it. You may then brush or scroll to a region of the sequence in the middle to inspect the residues and their exact positions in the "zoomed in" version at the bottom. You may compare outliers in either a stacked view or an aligned view.

To enhance the analysis we have added chords above the sequence. When the distance histogram is brushed, the chords encode proximity to other residues: if a chord is present, those two residues or regions are within the extent of the brushed distance histogram (in Angstroms). To further differentiate proximity within the brush, chords are shaded in grey scale so that residues and regions that are closer together are darker.
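The chord selection described above can be sketched as a filter over the pairwise distance matrix followed by a linear grey-scale mapping. This is only an illustration under stated assumptions: the function name, the inclusive brush bounds, and the linear mapping (0.0 = darkest, 1.0 = lightest) are hypothetical and not taken from the visualization's source.

```python
import numpy as np

def chords_in_brush(dist, lo, hi):
    """List the chords to draw for a brushed distance range [lo, hi].

    dist   -- (P, P) symmetric matrix of pairwise distances in Angstroms
    lo, hi -- hypothetical brush extent on the distance histogram
    Returns (i, j, grey) tuples, where grey is 0.0 for the closest pairs
    in the brush (drawn darkest) and 1.0 for the farthest (drawn lightest).
    """
    dist = np.asarray(dist, dtype=float)

    # Visit each unordered residue pair once (upper triangle, no diagonal).
    i, j = np.triu_indices(dist.shape[0], k=1)
    d = dist[i, j]

    # Keep only the pairs whose distance falls inside the brush.
    keep = (d >= lo) & (d <= hi)

    # Assumption: grey level grows linearly with distance across the brush.
    span = hi - lo
    grey = (d[keep] - lo) / span if span > 0 else np.zeros(int(keep.sum()))

    return [(int(a), int(b), float(g)) for a, b, g in zip(i[keep], j[keep], grey)]
```

Narrowing the brush removes chords for pairs outside the new range and rescales the grey levels of the remaining ones.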

A video describing this entire process can be viewed here.

Contact Information:
Johnathan Mercer (mercer at fas.harvard.edu)
Balaji Pandian (balajipandian at college.harvard.edu)
Academic Advisors:
Alexander Lex (alex at seas.harvard.edu)
Nicolas Bonneel (nbonneel at seas.harvard.edu)