17
PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Embed Size (px)

Citation preview

Page 1: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

PROTEIN STRUCTURE CLASSIFICATION

SUMI SINGH(sxs5729)

Page 2: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

2

Levels of Protein Structure

Page 3: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

3

Traditional Architecture Molecular ArchitectureFormfits

function

Thanks to: Frank Lloyd Wright for graphics

WOOD, BRICK etc. material/building blocks AMINO ACIDS

*****************************************************************Number of Amino Acids found in Eukaryotic Proteins= 20 (found in universal genetic code)+ 2 (synthetically incorporated therefore not included in discussion)

Possible number of protein sequence of size 300 =

20300

This number is greater than the total number of atoms in the universe

Page 4: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

4

Evolution

Evolution has selected a very small subset of those protein sequences

< 30,000 in humans and an even smaller number of protein structures (1000–5000)

Ratio– 1:6

Conserved structures are expected to reflect functional similarities (interaction with other molecules)

Page 5: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

5

Sequence Structure Function

Why Compare Protein Structures?

Low sequence similarity may yield very similar structures Sometimes high sequence similarity yields different

structures

Page 6: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

FOR THE CURRENT PROJECT

Know your dataset

Page 7: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

PDB: Protein Data Bank

• The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.

• These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, other animals, and humans. Understanding the shape of a molecule helps to understand how it works.

• This knowledge can be used to help deduce a structure's role in human health and disease, and in drug development. The structures in the archive range from tiny proteins and bits of DNA to complex molecular machines like the ribosome.

• Web address: http://www.rcsb.org/pdb/home/home.do

Page 8: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

SCOP Dataset

“Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.”

• http://scop.berkeley.edu/

Page 9: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Starting at the bottom, the hierarchy of SCOP domains

comprises the following levels-- Species representing a distinct protein

sequence.-- Protein grouping together similar

sequences of essentially the same functions.

-- Family containing proteins with similar sequences but typically distinct functions.

-- Superfamily bridging together protein families with common functional and structural features inferred to be from a common evolutionary ancestor.

-- Levels above Superfamily are classified based on structual features and similarity, and do not imply homology:Folds grouping structurally similar superfamilies.

Page 10: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Structural Fingerprints/Features

Structure comparison is an NP-Hard problem.

There are no fast structural alignment algorithms that can guarantee optimality within any given similarity measure. Therefore, existing structure comparison methods employ heuristics.

There are different approaches for extracting structural features.

We use Triangular Spatial Relationship to generate keys.

Page 11: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

FOR THE CURRENT PROJECT

HUH!! LOOKS LIKE I HAVE DONE EVERYTHING..SO WHY ARE WHY AM I HERE?

Page 12: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

What do you get from me?

• A file of Keys Created Representing each Protein Structure.

• Each of these files of keys representing protein has been correctly classified into their respective Superfamilies.

• That will give you the class information for the files. It is a hypothesis that each file belonging to same class must have similar keys. You must be able to test this hypothesis.

Page 13: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Biggest Challenge

0 50 100 150 200 250 300 350 400 450 500 550 6000.00

5,000,000.00

10,000,000.00

15,000,000.00

20,000,000.00

25,000,000.00

30,000,000.00

4,410,549.00

COMBINATORIAL EXPLOSION

Number of Amino Acids per Protien

Key

Coun

t

Page 14: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

For the current project

• Develop SIGNATUREs for the PROTEIN KEYS.

• These SIGNATUREs must be used to CLASSIFY the proteins correctly into their respective SUPERFAMILIES.

• Performance and Speed are important

Page 15: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Signature for Keys

• Accurately/concisely represent the keys.

• Signatures can be simple statistics like mean, median etc. of the keys or a complex combination of features.

• What ever may be the choice of Signature/s, it/they must be able to perform extremely fast and accurate classification of the protein/s.

Page 16: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Choice of Classifier/Tool

• Criteria:

1. ACCURACY

2. SPEED

Page 17: PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)

Final Product

A software that takes in “keys” as input and classifies it correctly.

There must be a check if the “new” protein-keys already exists in the system.