T. Hamp & L. Richter Protein Prediction II Exercise

Exercise – Project Layout

General remarks (recap): Report 60 pts, exam 40 pts; weekly presentations by each group, one bad presentation allowed; groups of 3-4 students

Contact & Questions: [email protected] only!

The exercise is taken from the CAFA competition

Prediction of HPO terms

HPO: Human Phenotype Ontology

Terms – Definitions and Explanations

Amino acids (aa): Building blocks of proteins; 20 different aa are found in proteins.

Protein sequence: String of characters representing a sequence of amino acids (a string over a 20-letter alphabet).

The protein sequence defines the protein structure and the protein function (within some limits).

Protein sequences are stored in large, publicly available repositories.

One of the best-known repositories is UniProt (http://www.uniprot.org/) and its section Swiss-Prot.

Besides the sequence, these databases also hold additional information about the protein.
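The idea of a protein sequence as a string over a 20-letter alphabet can be illustrated with a minimal Python sketch. The function names `is_valid_protein` and `composition` are hypothetical helpers chosen for this example; the alphabet is the standard one-letter code for the 20 amino acids. Real database entries may contain additional symbols (for example for unknown residues), which this sketch treats as invalid.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_protein(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only the 20 standard letters."""
    return bool(sequence) and set(sequence.upper()) <= AMINO_ACIDS


def composition(sequence: str) -> dict[str, int]:
    """Count how often each amino acid letter occurs in the sequence."""
    return dict(Counter(sequence.upper()))
```

For example, `is_valid_protein("MKTAYIAKQR")` accepts the string, while a string containing a letter outside the alphabet, such as `"MKXB"`, is rejected.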

Ontology (in information science)

Ontology: An ontology represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts.

Human Phenotype Ontology (HPO): Set of concepts describing human appearance (shape, health, and so forth).

HPO concepts are hierarchically ordered, i.e. they are linked by an “is-a” relationship.

They are arranged in a tree-like fashion.
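A consequence of the “is-a” relationship is that a protein annotated with a specific term is implicitly annotated with all of that term's ancestors. The sketch below shows this on a toy hierarchy. The term identifiers (`T:root`, `T:heart`, ...) and the `PARENTS` table are invented for illustration and are not real HPO terms; the sketch also assumes that a term may have more than one parent, which is why the structure is only tree-like.

```python
# Toy "is-a" hierarchy: each term maps to its direct parents.
# All identifiers are made up for this example; they are not HPO terms.
PARENTS = {
    "T:root": [],
    "T:organ": ["T:root"],
    "T:heart": ["T:organ"],
    "T:growth": ["T:root"],
    "T:heart_growth": ["T:heart", "T:growth"],
}


def with_ancestors(terms, parents):
    """Return the given terms together with all their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue  # already visited via another path
        closed.add(term)
        stack.extend(parents.get(term, []))
    return closed
```

Calling `with_ancestors({"T:heart"}, PARENTS)` yields the term itself plus `T:organ` and `T:root`.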

Our competition

Proteins are annotated (described) with experimentally determined information.

As time goes by, proteins are associated with information about experimentally confirmed effects on the human phenotype.

The associated terms are taken from the Human Phenotype Ontology.

Experimental determination is slow and expensive.

=> We try to predict the associated HPO terms for the as yet un-annotated proteins.

More formal steps

Find a function that assigns a set of HPO terms T to a sequence s so that the number of false assignments is minimal and the number of true assignments is maximal.

Remember: The true evaluation is done after submission, when sequences that have not been annotated so far receive experimentally determined annotations.
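One way to make "few false assignments, many true assignments" measurable is to compare the predicted term set T for a sequence against its experimentally determined term set. The sketch below uses precision, recall and their harmonic mean (F1) for this purpose. This is an assumption chosen for illustration, not necessarily the measure used in the actual evaluation, and the function name `score` is hypothetical.

```python
def score(predicted: set[str], true: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one sequence.

    predicted: HPO terms assigned by the prediction function
    true:      HPO terms confirmed experimentally
    """
    true_positives = len(predicted & true)
    # Precision drops with every false assignment.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall drops with every true term that was missed.
    recall = true_positives / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With `predicted = {"a", "b"}` and `true = {"a", "c"}`, one of the two predicted terms is correct and one of the two true terms is found, so precision, recall and F1 all equal 0.5.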

Tasks

Download files from www.rostlab.org/~richter/pp2_files.tgz

Get familiar with the provided files

Especially the column names (look them up at UniProt and HPO)

Read: http://biofunctionprediction.org/sites/default/files/IntroductionCAFA_pedja.pdf
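A first look at the provided files can be scripted with Python's standard `tarfile` module. The sketch below assumes that the archive has already been downloaded to a local path and that the file of interest is a text table whose first line holds the column names, separated by tabs. Both the separator and the presence of a header line are assumptions, since the contents of the archive are not described here; the helper names are hypothetical.

```python
import tarfile


def list_members(archive_path: str) -> list[str]:
    """Return the names of all regular files in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def first_line_fields(archive_path: str, member_name: str, sep: str = "\t") -> list[str]:
    """Return the fields of the first line of one file inside the archive.

    Assumes a UTF-8 text file; the default tab separator is a guess.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file")
        with handle:
            first_line = handle.readline().decode("utf-8").rstrip("\r\n")
    return first_line.split(sep)
```

After downloading, `list_members("pp2_files.tgz")` shows which files are present, and `first_line_fields` can then be called on any member name from that list to inspect its columns.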
