Finding detailed relationships between proteins specific to phenotypes among microbial organisms
Daniel Park
Molecular Biology Institute, UCLA
Yeates lab
SoCalBSI
August 24, 2006
OUTLINE
• Phylogenetic profiles
• Ternary logic analysis
• Building COG & phenotype profiles
• Results of logic analysis
PHYLOGENETIC PROFILES
• Turning an earlier question on its side:
• From, “What proteins are found in a genome?”
• To, “What genomes contain a given protein?”
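The change of viewpoint above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome names, family names, and the function `phylogenetic_profiles` are hypothetical and do not come from the talk.

```python
from collections import defaultdict


def phylogenetic_profiles(genome_to_families):
    """Invert genome -> protein families into family -> 0/1 presence vector.

    Column order follows the sorted genome names, so every profile
    uses the same genome ordering.
    """
    genomes = sorted(genome_to_families)
    profiles = defaultdict(lambda: [0] * len(genomes))
    for col, genome in enumerate(genomes):
        for family in genome_to_families[genome]:
            profiles[family][col] = 1
    return genomes, dict(profiles)


# Hypothetical toy input: which protein families each genome contains.
toy = {
    "genome_A": {"fam1", "fam2"},
    "genome_B": {"fam2"},
    "genome_C": {"fam1", "fam3"},
}

genomes, profiles = phylogenetic_profiles(toy)
print(genomes)           # ['genome_A', 'genome_B', 'genome_C']
print(profiles["fam1"])  # [1, 0, 1]
```

Each resulting vector answers "what genomes contain this protein?" for one family.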
VARIATIONS OF PHYLOGENETIC PROFILES
• Relationships between protein families
• Relationships between protein family profile and given target ‘phenotype’ profile
COMPLEXITY OF CELLULAR PROCESSES
HIGHER ORDER RELATIONSHIPS: TERNARY LOGIC ANALYSIS
8 LOGIC TYPES FOR PHYLOGENETIC PROFILE TRIPLETS
MEASURING MUTUAL INFORMATION BETWEEN TWO PROFILES
$U(x|y) = [H(x) + H(y) - H(x,y)] / H(x)$
where U is the uncertainty coefficient relating profiles x and y, and H is the Shannon entropy of the corresponding probability distribution.
• Range of U: [0, 1]
• Example: U = 0.88 means an 88% decrease in the uncertainty of x once y is known
• A high value of U indicates high mutual information between x and y
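As a minimal sketch, the uncertainty coefficient U(x|y) = [H(x) + H(y) - H(x,y)] / H(x) can be computed from two presence/absence profiles as follows. The function names are illustrative, the entropies are estimated from the empirical frequencies of the profile values, and returning 0.0 when H(x) = 0 is an assumption made here to avoid dividing by zero.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def uncertainty_coefficient(x, y):
    """U(x|y) = [H(x) + H(y) - H(x,y)] / H(x).

    Assumption: a constant profile x (H(x) == 0) gets a score of 0.0.
    """
    hx = entropy(x)
    if hx == 0:
        return 0.0
    hy = entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy over paired values
    return (hx + hy - hxy) / hx


identical = uncertainty_coefficient([1, 1, 0, 0], [1, 1, 0, 0])
unrelated = uncertainty_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
print(identical)  # 1.0
print(unrelated)  # 0.0
```

Identical profiles give U = 1 (knowing y removes all uncertainty about x), while statistically independent profiles give U = 0.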
MEASURING MUTUAL INFORMATION AMONG THREE PROFILES
U(c | f(a,b)), where f(a,b) is the logical combination of a and b
Constraints (x and y here are score thresholds):
• U(c|a) < x
• U(c|b) < x
• U(c|f(a,b)) > y
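The three constraints can be sketched as a small screen over candidate logic functions. This is an illustration, not the code used for the talk: the names in `LOGIC_TYPES` and the thresholds `low` and `high` (standing for x and y in the constraints) are assumptions. Because U is unchanged when f is negated, the sketch also requires f(a,b) to agree with c at more than half of the positions, which is one possible way to separate a function from its negation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def U(x, y):
    """Uncertainty coefficient U(x|y); 0.0 if x is constant (assumption)."""
    hx = entropy(x)
    if hx == 0:
        return 0.0
    return (hx + entropy(y) - entropy(list(zip(x, y)))) / hx


# Eight non-trivial ways to combine two binary inputs (labels are illustrative).
# The two asymmetric entries are evaluated as f(a, b) only; swap a and b
# in the call to test the other argument order.
LOGIC_TYPES = {
    "and": lambda a, b: a & b,
    "nand": lambda a, b: 1 - (a & b),
    "or": lambda a, b: a | b,
    "nor": lambda a, b: 1 - (a | b),
    "and_not": lambda a, b: a & (1 - b),
    "or_not": lambda a, b: a | (1 - b),
    "xor": lambda a, b: a ^ b,
    "xnor": lambda a, b: 1 - (a ^ b),
}


def ternary_hits(a, b, c, low, high):
    """Return (name, score) pairs with U(c|a) < low, U(c|b) < low, U(c|f(a,b)) > high."""
    if U(c, a) >= low or U(c, b) >= low:
        return []
    hits = []
    for name, f in LOGIC_TYPES.items():
        combined = [f(ai, bi) for ai, bi in zip(a, b)]
        agreement = sum(fi == ci for fi, ci in zip(combined, c)) / len(c)
        score = U(c, combined)
        if score > high and agreement > 0.5:
            hits.append((name, score))
    return hits


# Hypothetical toy profiles in which c is present exactly when one of a, b is.
a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [1, 1, 0, 0, 1, 1, 0, 0]
c = [0, 0, 1, 1, 1, 1, 0, 0]

hits = ternary_hits(a, b, c, low=0.3, high=0.8)
print(hits)  # [('xor', 1.0)]
```

In the toy data neither a nor b alone says anything about c, yet their combination predicts it fully, which is the kind of relationship the constraints are meant to isolate.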
COGs: CLUSTERS OF ORTHOLOGOUS GROUPS
Set of orthologous proteins from at least three different lineages
Cluster Functional group
COMBINATIONS OF COG PROFILES MATCHING A PHENOTYPE
ASSOCIATING MORE GENOMES WITH COGS
[Figure: No. of fully sequenced bacterial genomes over the last 9 years. x-axis: Years (1997, 2003, 2006); y-axis: No. of bacterial genomes (0 to 400), reaching 354 in 2006.]
BUILDING COG PROFILES
• 81,480 proteins
• 354 bacterial genomes
• 4,613 COGs
BUILDING PHENOTYPE PROFILES
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
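A phenotype profile can be built the same way as a COG profile: one 0/1 entry per genome. The sketch below assumes the organism attributes have already been loaded into a dictionary. The genome names, the field name `temperature_range`, and the choice to score a missing annotation as 0 are all hypothetical and are not taken from the NCBI table itself.

```python
def phenotype_profile(genomes, annotations, attribute, value):
    """Return a 0/1 vector: 1 where the genome's `attribute` equals `value`.

    Assumption: genomes with no annotation for `attribute` are scored 0.
    """
    return [
        1 if annotations.get(genome, {}).get(attribute) == value else 0
        for genome in genomes
    ]


# Hypothetical annotations keyed by genome name.
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
annotations = {
    "genome_A": {"temperature_range": "hyperthermophilic"},
    "genome_B": {"temperature_range": "mesophilic"},
    "genome_C": {"temperature_range": "hyperthermophilic"},
    # genome_D has no annotation
}

hyper = phenotype_profile(genomes, annotations, "temperature_range", "hyperthermophilic")
print(hyper)  # [1, 0, 1, 0]
```

Using the same genome order as the COG profiles keeps the phenotype vector directly comparable with them.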
Cumulative no. of protein triplets recovered at an uncertainty coefficient score greater than a given threshold
Frequency observed for each of the eight logic function types
CORRELATIONS WITH PHENOTYPES: TEMPERATURE RANGE
• For U > 0.8, one relationship between proteins was found:
Hyperthermophilicity = and( COG0432, !COG0225 )
U ( Hyp. | COG0432 ) = 0.26
U ( Hyp. | COG0225 ) = 0.29
U ( Hyp. | and( COG0432, !COG0225 ) ) = 0.71
[S] COG0432: Uncharacterized conserved protein
[O] COG0225: Peptide methionine sulfoxide reductase
LOGICAL COMBINATION OF COG PROFILES MATCHING A PHENOTYPE PROFILE
c = hyperthermophilicity
f = and( COG0432, !COG0225 )
a = COG0432 (Uncharacterized conserved protein)
b = !COG0225 (Peptide methionine sulfoxide reductase)
CONCLUSIONS
• Hyperthermophilicity may correlate with the presence of an uncharacterized conserved protein (COG0432) together with the absence of peptide methionine sulfoxide reductase (COG0225).
CONCLUSIONS
– Classified ~80,000 proteins from 354 bacterial genomes into ~4,600 COGs
– Built COG and phenotype profile matrices for 354 fully sequenced bacterial genomes
– Results support the biological significance of ternary relationships among COGs
– Results suggest that some logic types occur in biology more often than others: 1 (and), 5, and 7 (xor)
FUTURE DIRECTIONS
• Build a richer database of phenotype profiles
• Investigate relationships at lower cutoffs
• Experimentally characterize the unknown COG0432 by crystallography
ACKNOWLEDGEMENTS
Todd Yeates
Matteo Pellegrini
Yeates lab
Morgan Beeby
Brian O’Connor
Rest of the lab
SoCalBSI 2006
Jamil Momand
Wendie Johnston
Sandra Sharp
Nancy Warter-Perez
Ronnie Cheng
Fellow participants