What are the advantages of knowing the complete genome sequence? | • All encoded proteins can be predicted and identified.
• The missing functions can be identified and analyzed
• Peculiarities and novelties in each organism can be studied
• Predictions can be made and verified. |
What changed in protein science from the 20th century to the 21st century? | 20th century
• Few well-studied proteins
• Mostly globular with enzymatic activity
• Biased protein set
21st century
• Many “hypothetical” proteins
• Various, often with no enzymatic activity
• Natural protein set |
What are the properties of the natural protein set? | • Unexpected diversity of even common enzymes (analogous, paralogous, xenologous)
• Conservation of the reaction chemistry, but not the substrate specificity
• Functional diversity in closely related proteins
• Abundance of new structures |
What is conserved in proteins, according to comparative genomics? | • Those amino acids that are conserved in divergent proteins (archaeal and bacterial, hyperthermophilic and mesophilic) are likely to be important for catalytic activity.
• Prediction of the 3D fold and general biochemical function is much easier than prediction of exact biological (biochemical) function.
• Reaction chemistry often remains conserved even when sequence diverges almost beyond recognition. |
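The conservation criterion on this card can be illustrated with a minimal sketch that scans a multiple sequence alignment for fully conserved columns. It assumes the sequences are already aligned to equal length and that gaps are written as '-'; the function name `conserved_positions` and the toy alignment are hypothetical, and real analyses usually use graded conservation scores rather than strict identity.

```python
def conserved_positions(alignment: list[str]) -> list[int]:
    """Return 1-based alignment columns where every sequence has the same residue (no gaps)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        # A column counts as conserved only if it holds a single residue that is not a gap
        if len(column) == 1 and "-" not in column:
            positions.append(i + 1)
    return positions


# Hypothetical toy alignment (not real protein data)
toy = ["MKDEHG", "MRDEHA", "MKD-HG"]
print(conserved_positions(toy))  # → [1, 3, 5]
```

The filter becomes more informative the more divergent the input sequences are (e.g., archaeal and bacterial homologs together), because a residue that survives across large evolutionary distances is unlikely to be conserved by chance.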
What is comparative analysis? | An approach that allows us to find subtle sequence similarities in proteins that would otherwise go unnoticed. |
What do sequence database searches and sequence analysis contribute? | - Sequence database searches that use exotic or highly divergent query sequences often reveal more subtle relationships than searches using queries from humans or standard model organisms (E. coli, yeast, worm, fly).
- Sequence analysis complements structural comparisons and can greatly benefit from them. |
What is protein evolution? | • Tree of life & evolution of protein families (Dayhoff, 1978)
• A tree representing the evolution of a protein family can be built from its sequences
• Orthologous gene family: the organismal (species) tree and the sequence tree match well. |
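To make "a tree built from sequences" concrete, here is a minimal sketch that computes p-distances (the fraction of differing aligned positions) and joins clusters with UPGMA, returning a Newick string without branch lengths. UPGMA is chosen only because it is short; it assumes a roughly constant evolutionary rate (a molecular clock) and is not necessarily the method used by Dayhoff (1978). The function names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Build a rooted tree by UPGMA (average linkage) and return it in Newick format."""
    if not seqs:
        raise ValueError("need at least one sequence")
    # cluster label -> (newick string, number of leaves)
    clusters = {name: (name, 1) for name in seqs}
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    while len(clusters) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        new = f"({na},{nb})"
        # size-weighted average distance from the merged cluster to each remaining one
        for c in clusters:
            dist[frozenset((new, c))] = (dist[frozenset((a, c))] * sa
                                         + dist[frozenset((b, c))] * sb) / (sa + sb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[new] = (new, sa + sb)
    newick, _size = next(iter(clusters.values()))
    return newick + ";"


# Hypothetical aligned sequences
family = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "AAAAAATTTT", "D": "AAAATTTTTT"}
print(upgma(family))  # → ((A,B),(C,D));
```

For an orthologous family, a tree built this way from the protein sequences would be expected to mirror the species tree; mismatches can hint at paralogy or horizontal transfer.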
How do homologs, orthologs, and paralogs relate in protein evolution? | • Homolog
✓ Common ancestor
✓ Common 3D structure
✓ Usually at least some sequence similarity
(sequence motifs or closer similarity)
• Ortholog
✓ Derived from speciation
• Paralog
✓ Derived from duplication |
What is enzyme recruitment? | The adoption of an existing enzyme for a new function; for example, minor mutational changes can convert a glycerol kinase into a gluconate kinase. Such recruitment can lead to non-orthologous gene displacement. |
What are some traditional views on detecting functional divergence? | • Homologous sequences have similar functions
• Sites of greatest functional significance are under the strongest selective constraints
• Selective constraints can be measured by the dN/dS ratio, BUT...
• Most synonymous substitutions are selectively neutral and therefore accumulate at a high rate; over long periods synonymous sites become saturated, so they are inappropriate for detecting functional divergence that occurred long ago (over 150 Mya). |
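The saturation problem can be seen in a minimal dN/dS sketch. It assumes the numbers of synonymous and non-synonymous sites and differences have already been counted (e.g., by a Nei-Gojobori-type counting method, not implemented here) and applies a Jukes-Cantor correction to each proportion; the function names and counts are hypothetical.

```python
import math
from typing import Optional


def jukes_cantor(p: float) -> Optional[float]:
    """Jukes-Cantor corrected distance from an observed proportion of differences p.

    Returns None when p >= 0.75, where the correction is undefined (saturation).
    """
    if p >= 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> Optional[float]:
    """omega = dN / dS from pre-counted sites and differences; None if it cannot be estimated."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_n is None or d_s is None or d_s == 0:
        return None  # saturation or no synonymous change: the ratio cannot be estimated
    return d_n / d_s


# Hypothetical counts for two gene pairs (illustrative numbers only)
print(dn_ds(20, 300, 30, 100))  # recent split: omega < 1 (purifying selection)
print(dn_ds(40, 300, 80, 100))  # pS = 0.8, synonymous sites saturated → None
```

In the second case the synonymous proportion has passed the point where any correction can recover the true number of substitutions, which is exactly why dN/dS stops being useful for ancient duplications.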
What are some newer approaches? | • Non-synonymous (replacement) substitutions are analyzed on their own
• Substitution models that allow evolutionary rates to vary among sites in a protein-coding sequence according to a gamma distribution, which can be:
- homogeneous: the functional constraints at sites are constant over the entire evolutionary history.
- heterogeneous: some residues might be subject to changed functional constraints in different branches of the phylogenetic tree |
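What is the gamma distribution? | A distribution used to model how substitution rates r vary among sites. Its mean E(r) is the average substitution rate of the selected substitution model, and its variance V(r) describes how strongly the rates vary among sites. |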
What is the gamma distribution formula? | $V(r) = \frac{E(r)^2}{\alpha}$
where α is the shape parameter, which describes the shape of the distribution and hence how substitution rates are spread across all categories of sites
• α increases as the variation in rates among sites decreases
• as α → +∞, V(r) → 0 and the gamma model reduces to the single-rate model |
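The variance relation and the role of the shape parameter can be checked numerically with a short simulation. This is a sketch under the common convention that relative site rates have mean 1; it uses Python's `random.gammavariate(alpha, beta)`, whose mean is alpha × beta, so beta = mean / alpha. The helper names are hypothetical.

```python
import random
import statistics


def gamma_rate_variance(mean_rate: float, alpha: float) -> float:
    """Variance of gamma-distributed site rates: V(r) = E(r)^2 / alpha."""
    if alpha <= 0:
        raise ValueError("shape parameter alpha must be positive")
    return mean_rate ** 2 / alpha


def sample_site_rates(n_sites: int, alpha: float, mean_rate: float = 1.0,
                      seed: int = 0) -> list[float]:
    """Draw one relative rate per site from a gamma distribution with shape alpha."""
    rng = random.Random(seed)
    # gammavariate(alpha, beta) has mean alpha * beta, so beta = mean_rate / alpha
    return [rng.gammavariate(alpha, mean_rate / alpha) for _ in range(n_sites)]


# Smaller alpha gives a larger spread of rates among sites
for alpha in (0.5, 2.0, 50.0):
    rates = sample_site_rates(20_000, alpha)
    print(alpha, round(statistics.mean(rates), 3),
          round(statistics.pvariance(rates), 3),
          gamma_rate_variance(1.0, alpha))
```

The empirical variance should track 1/alpha: large for alpha = 0.5 (strong rate heterogeneity) and close to zero for alpha = 50, approaching the single-rate model.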
What is the difference between homogeneous and non-homogeneous gamma models? | • Homogeneous gamma model:
- the ancestral population gives rise to two descendant populations (D1 and D2), each of which has the same site-specific rates as the ancestral population.
• Non-homogeneous gamma model:
- the ancestral population gives rise to two descendant populations (D3 and D4) in which the site-specific rates can differ from those in the ancestral population.
- each descendant population (D3 and D4) contains the same numbers of slow, moderate and fast sites, so the overall rate distribution is the same in each population; what changes is which sites are slow, moderate or fast. |
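The contrast between D1/D2 and D3/D4 can be illustrated with a toy simulation. This is a deliberately crude sketch, not the likelihood model used by DIVERGE: the homogeneous descendant copies the ancestral site rates, while the non-homogeneous descendant reassigns the same set of rates to different sites (a random shuffle), which keeps the numbers of slow, moderate and fast sites fixed. The thresholds 0.5 and 1.5 and all function names are arbitrary, hypothetical choices.

```python
import random


def ancestral_rates(n_sites: int, alpha: float, rng: random.Random) -> list[float]:
    """Mean-1 gamma site rates (shape alpha, scale 1/alpha) for the ancestral population."""
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def homogeneous_descendant(ancestor: list[float]) -> list[float]:
    """Homogeneous model: every site keeps its ancestral rate."""
    return list(ancestor)


def non_homogeneous_descendant(ancestor: list[float], rng: random.Random) -> list[float]:
    """Toy non-homogeneous model: the same rates, reassigned to different sites."""
    rates = list(ancestor)
    rng.shuffle(rates)
    return rates


def rate_class_counts(rates: list[float], slow: float = 0.5,
                      fast: float = 1.5) -> tuple[int, int, int]:
    """Count (slow, moderate, fast) sites using illustrative thresholds."""
    n_slow = sum(r < slow for r in rates)
    n_fast = sum(r > fast for r in rates)
    return n_slow, len(rates) - n_slow - n_fast, n_fast


rng = random.Random(1)
anc = ancestral_rates(1000, 0.5, rng)
d1, d2 = homogeneous_descendant(anc), homogeneous_descendant(anc)
d3, d4 = non_homogeneous_descendant(anc, rng), non_homogeneous_descendant(anc, rng)
# The class counts are identical in all five populations
print([rate_class_counts(x) for x in (anc, d1, d2, d3, d4)])
# Sites whose rate is unchanged from the ancestor: all of them for D1, few for D3
print(sum(a == b for a, b in zip(anc, d1)), sum(a == b for a, b in zip(anc, d3)))
```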
What are the physiological and evolutionary meanings of functional divergence? | • Physiological meaning: the genes (and respective proteins) diverge in their actual physiological (biochemical) functions. This is tested by so-called “wet” (laboratory) experiments.
• Evolutionary meaning: the genes (and respective proteins) diverge in their evolutionary rates, which is taken to imply a corresponding divergence in their physiological functions. |
What are the types of functional divergence? | • Type I functional divergence results in altered functional constraints (i.e., a different evolutionary rate) between duplicate genes (Gu 1999).
• Type II functional divergence results in no altered functional constraints but a radical change in amino acid properties between them (e.g., charge, hydrophobicity). |
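The site patterns behind the two types can be sketched with a rule-of-thumb classifier that looks at one alignment site split into two duplicate-gene clusters. This is only a qualitative illustration, not the statistical estimation (θI, θII, site posteriors) that DIVERGE performs; it uses charge as the only "radical" property, and the `CHARGE` table and function names are assumptions for illustration.

```python
# Illustrative charge classes (assumption; histidine left out for simplicity)
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def is_conserved(residues: str) -> bool:
    """True if every sequence in the cluster has the same residue at this site."""
    return len(set(residues)) == 1


def classify_site(cluster1: str, cluster2: str) -> str:
    """Label one site from its residues in two clusters of duplicate genes."""
    c1, c2 = is_conserved(cluster1), is_conserved(cluster2)
    if c1 != c2:
        # conserved in one cluster but variable in the other: altered rate
        return "type I pattern"
    if c1 and c2:
        a, b = cluster1[0], cluster2[0]
        if a == b:
            return "conserved in both"
        if CHARGE.get(a, 0) != CHARGE.get(b, 0):
            # conserved within each cluster, but with a change in charge between them
            return "type II pattern"
        return "cluster-specific, no charge change"
    return "variable in both"


print(classify_site("LLLL", "LIVM"))  # → type I pattern
print(classify_site("DDDD", "KKKK"))  # → type II pattern
```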
What is DIVERGE 3.0? | • Software developed by Xun Gu (Iowa State University) that analyzes both types of functional divergence in a set of aligned protein sequences.
• Returns the coefficients of type I and type II functional divergence, θI and θII, respectively
• Computes a posterior probability for each individual site; a value > 0.8 indicates that the site has experienced a significant shift in evolutionary rate |
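Applying the > 0.8 rule to per-site posteriors is a simple filtering step. The sketch below assumes the posteriors have already been read into a dictionary keyed by alignment position; it does not parse DIVERGE's actual output format, and the function name and values are hypothetical.

```python
def flag_shifted_sites(posteriors: dict[int, float], cutoff: float = 0.8) -> list[int]:
    """Return alignment positions whose posterior probability exceeds the cutoff."""
    for site, p in posteriors.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"posterior for site {site} is not a probability: {p}")
    return sorted(site for site, p in posteriors.items() if p > cutoff)


# Made-up posteriors for illustration; real values would come from a DIVERGE run
example = {12: 0.95, 47: 0.81, 88: 0.80, 130: 0.42}
print(flag_shifted_sites(example))  # → [12, 47]
```

The comparison is strict, so a site sitting exactly at 0.8 (site 88 here) is not flagged.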
What other analyses does DIVERGE 3.0 perform? | • Rate Variation among Sites (RVS)
• Ancestral Sequence Inference
• Functional Distance Analysis |
What is BADASP (Burst After Duplication with Ancestral Sequence Predictions)? | • Developed by Richard Edwards in 2005
• Identifies residues under both type I and type II divergence
• In addition to residues implicated in functional divergence (i.e., those under positive selection), identifies residues under consistent purifying (negative) selection, which may be important for common functionality between all sequences/subfamilies.
• Implemented in SeqSuite software package |