Part II : Sequence Analysis

Paul Tan Thiam Joo

paul@bic.nus.edu.sg

Department of Biochemistry, Medicine Faculty, NUS

Institute for Infocomm Research

What is sequence analysis?

• Nucleic acids: DNA and RNA

• Proteins: amino acid composition, pI, molecular weight, hydrophobicity.

Why do sequence analysis?

• Assessing potential allergenicity (Gendel, 2002)

• Parkinson's disease (neurodegenerative) (Uversky and Fink, 2002)

• Human Genome Project (draft sequence published in 2001, completed in 2003)

Sequence analysis of proteins

• Backtranslation

• Amino acid composition

• Molecular weights, pIs

• Hydropathy profile

http://kr.expasy.org

Backtranslation

• Protein -> DNA

• Used for cloning a protein of interest that may be present only in low amounts.

• Beware of codon bias and degeneracy of codons.

UUU-Phe   UCU-Ser   UAU-Tyr    UGU-Cys
UUC-Phe   UCC-Ser   UAC-Tyr    UGC-Cys
UUA-Leu   UCA-Ser   UAA-Stop   UGA-Stop
UUG-Leu   UCG-Ser   UAG-Stop   UGG-Trp

CUU-Leu   CCU-Pro   CAU-His    CGU-Arg
CUC-Leu   CCC-Pro   CAC-His    CGC-Arg
CUA-Leu   CCA-Pro   CAA-Gln    CGA-Arg
CUG-Leu   CCG-Pro   CAG-Gln    CGG-Arg

AUU-Ile   ACU-Thr   AAU-Asn    AGU-Ser
AUC-Ile   ACC-Thr   AAC-Asn    AGC-Ser
AUA-Ile   ACA-Thr   AAA-Lys    AGA-Arg
AUG-Met   ACG-Thr   AAG-Lys    AGG-Arg

GUU-Val   GCU-Ala   GAU-Asp    GGU-Gly
GUC-Val   GCC-Ala   GAC-Asp    GGC-Gly
GUA-Val   GCA-Ala   GAA-Glu    GGA-Gly
GUG-Val   GCG-Ala   GAG-Glu    GGG-Gly
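The degeneracy mentioned above can be made concrete with a short sketch. It rebuilds the standard genetic code shown in the table, inverts it (amino acid -> codons) and counts how many RNA sequences could encode a given peptide. The function names and the example peptides are illustrative assumptions, not part of the original slides.

```python
from itertools import product
from math import prod

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
# '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: one-letter amino acid -> list of codons that encode it.
REVERSE_TABLE = {}
for codon, aa in CODON_TABLE.items():
    REVERSE_TABLE.setdefault(aa, []).append(codon)


def degeneracy(peptide):
    """Number of distinct RNA sequences that encode the peptide."""
    return prod(len(REVERSE_TABLE[aa]) for aa in peptide)


def back_translations(peptide):
    """Yield every RNA sequence encoding the peptide (short peptides only)."""
    for codons in product(*(REVERSE_TABLE[aa] for aa in peptide)):
        yield "".join(codons)


if __name__ == "__main__":
    print(degeneracy("MW"))    # Met and Trp each have a single codon
    print(degeneracy("MLS"))   # Leu and Ser each have six codons
    print(list(back_translations("MF")))
```

Because the count grows multiplicatively with peptide length, probes and primers are usually designed against stretches rich in low-degeneracy residues such as Met and Trp.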

Biased codon usage

Amino acid   Codon   Bacteria    Yeast       Fruit fly   Human
Leu          UUA     -           -           -           -
             UUG     -           Preferred   -           -
             CUU     -           -           -           -
             CUC     -           -           -           -
             CUA     -           -           -           -
             CUG     Preferred   -           Preferred   Preferred
Val          GUU     Preferred   Preferred   -           -
             GUC     -           -           -           -
             GUA     -           -           -           -
             GUG     -           -           Preferred   Preferred

Amino Acid Composition

• Determine the percentages of amino acid residues present in a protein molecule.

• Uses:

– infer the lifestyles of organisms: hyperthermophiles have high percentages of glutamate (negatively charged) together with lysine and arginine (positively charged), a pattern that is absent in mesophiles (Tekaia et al., 2002).

– predict structural class (Luo et al., 2002).
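As a minimal sketch of the calculation described above, the following computes the percentage of each residue in a one-letter protein sequence and then sums the charged residues highlighted for hyperthermophiles. The function names and the example sequence are hypothetical.

```python
from collections import Counter


def composition(sequence):
    """Return {residue: percentage} for a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def charged_summary(sequence):
    """Percentages of Glu (negative) and of Lys + Arg (positive)."""
    comp = composition(sequence)
    return {
        "E": comp.get("E", 0.0),
        "K+R": comp.get("K", 0.0) + comp.get("R", 0.0),
    }


if __name__ == "__main__":
    example = "MKKEERLEAGK"  # made-up sequence, for illustration only
    for aa, pct in sorted(composition(example).items()):
        print(f"{aa} {pct:5.1f}%")
    print(charged_summary(example))
```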

Nonpolar amino acids (FILMWAV)

Polar uncharged (SQTNY)

Polar charged (KHERD)

Unique Properties

Protein functions from specific residues

• C Disulphide-rich, zinc fingers

• G Collagens

• H Histidine-rich glycoprotein

• KR Nuclear proteins, nuclear localisation

• P Collagen, filaments

Molecular weights, pIs

• Aid in designing purification experiments, e.g. SDS-PAGE, IEF, 2-dimensional gel electrophoresis, column chromatography.
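Both quantities can be estimated directly from the sequence. The sketch below sums average residue masses (plus one water) for the molecular weight, and finds the pI by bisection on the net charge from the Henderson-Hasselbalch equation. The pKa values are one commonly used set and are an assumption here; other published sets give somewhat different pI estimates, and the function names are illustrative.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa set; results depend on which set is chosen.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified peptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Net charge at a given pH, treating every group as independent."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pH at which the net charge is zero, found by bisection on 0-14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    seq = "MKFFLMCLIIFPIMGVLG"
    print(f"MW = {molecular_weight(seq):.1f} Da")
    print(f"pI = {isoelectric_point(seq):.2f}")
```

Bisection works because the net charge decreases monotonically as pH rises, so it crosses zero exactly once between pH 0 and 14.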

Hydropathy Profiles

• Hydropathy describes the hydrophobicity and hydrophilicity of a protein sequence.

• A hydropathy profile is a graph in which hydropathy values are averaged over a sliding window and plotted against residue position in the protein sequence.

A sliding window

M K F F L M C L I I F P I M G V L G
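The sliding-window calculation can be sketched on the sequence above. The slides do not name a particular scale, so the Kyte-Doolittle scale and the window width of 7 are assumptions chosen for illustration, as is the function name.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of each window; one value per window position."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    sequence = "MKFFLMCLIIFPIMGVLG"
    window = 7
    for i, score in enumerate(hydropathy_profile(sequence, window)):
        center = i + window // 2  # residue the window is centred on
        print(f"{center + 1:3d} {sequence[center]} {score:6.2f}")
```

Each score is reported against the central residue of its window, so the first and last few residues have no value. Wider windows smooth the profile, and narrower ones follow local variation more closely.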

Signal region

A schematic representation of the 3-D structure of a scorpion toxin (figure labels: alpha-helix ×3, beta-sheet)

Hydropathy Profiles

• Hydropathy scale - each amino acid is assigned a value reflecting its relative hydrophobicity and hydrophilicity.

• 2 broad classes of scales:

– Environmental characteristics of protein residues.

– Experimental measurements of amino acid physicochemical properties.

Venn diagram of the physicochemical properties of the 20 amino acids

Hydropathy Profiles

• Basic ranking: internal {FILMV}, external {DEHKNQR}, ambivalent {ACGPSTWY}
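The three-way ranking above maps directly onto a lookup. The sketch below labels each residue of a sequence as internal (i), external (e) or ambivalent (a); the single-letter labels and function names are assumptions made for illustration.

```python
INTERNAL = set("FILMV")
EXTERNAL = set("DEHKNQR")
AMBIVALENT = set("ACGPSTWY")


def burial_class(residue):
    """Return 'i', 'e' or 'a' for one amino acid (one-letter code)."""
    if residue in INTERNAL:
        return "i"
    if residue in EXTERNAL:
        return "e"
    if residue in AMBIVALENT:
        return "a"
    raise ValueError(f"unknown residue: {residue!r}")


def burial_string(sequence):
    """Label every residue in the sequence with its class."""
    return "".join(burial_class(aa) for aa in sequence.upper())


if __name__ == "__main__":
    seq = "MKFFLMCLIIFPIMGVLG"
    print(seq)
    print(burial_string(seq))
```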

Hydropathy Profiles

• Detect possible transmembrane domains (runs of about 20-25 consecutive hydrophobic amino acids).

• Hydrophobic protein cores

• Predict neurotoxicity in snake venom phospholipases A2 (Kini and Iwanaga, 1986)
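Transmembrane detection can be sketched as a scan for windows whose mean hydropathy exceeds a cut-off. The Kyte-Doolittle scale, the 19-residue window and the 1.6 threshold are assumptions commonly paired with that scale, not values given in the slides. The test sequence is artificial.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_tm_segments(sequence, window=19, threshold=1.6):
    """Return (start, end) spans, 0-based and end-exclusive, covered by
    windows whose mean hydropathy is at least the threshold."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    segments = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        if mean < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous hit: extend that segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Artificial example: a 22-residue Leu run flanked by charged residues.
    seq = "D" * 10 + "L" * 22 + "K" * 10
    for start, end in candidate_tm_segments(seq):
        print(f"residues {start + 1}-{end}: {seq[start:end]}")
```

Reported spans overshoot the hydrophobic run by a few residues on each side, because a window still scores above the threshold when it contains a handful of flanking polar residues. Hits are candidates only and need confirmation by other evidence.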

References

• Kini RM, Iwanaga S. (1986) Toxicon 24(6):527-541.

• Rehm BH. (2001) Appl Microbiol Biotechnol. 57(5-6):579-92.

• Weir M, Swindells M, Overington J. (2001) Trends Biotechnol 19(10 Suppl):S61-6.

• Gendel SM. (2002) Ann N Y Acad Sci 964:87–98.

• Luo RY, Feng ZP, Liu JK. (2002) Eur J Biochem 269(17):4219-4225.

• Tekaia F, Yeramian E, Dujon B. (2002) Gene 297:51-60.

• Uversky VN, Fink AL. (2002) FEBS Lett 522(1-3):9-13.

• ExPASy http://cn.expasy.org/
