Biological Sequence Analysis Spring 2008 1: Introduction


Administration

Teachers: Nimrod Rubinstein, … Burstein

Tel: 03-640-9245

Reception hours: by appointment

Course website: http://bioinfo.tau.ac.il/~intro_bioinfo/mta

Requirements

• Home assignments – 25%

• Midterm quiz – 25%

• Final project – 50%

All assignments must be submitted on time.

Do not copy!

Goals

To familiarize students with research topics in biological sequence analysis and with the relevant bioinformatics tools.

Prerequisites

• Familiarity with topics in molecular biology (cell biology and genetics)

• Basic familiarity with computers and the internet

Ask, ask, ask!

“A bashful person does not learn.” (Pirkei Avot)

What do bioinformaticians study?

• Today, bioinformatics is part of almost every molecular biology research project

• To name a few examples…


Example 1

• Compare proteins with similar sequences (for instance, kinases) and understand what their similarities and differences mean
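Comparing two sequences usually starts by aligning them. One classic way to do this is the Needleman-Wunsch dynamic-programming algorithm, which finds the best global alignment score. The sketch below assumes a deliberately simple scoring scheme (+1 per match, -1 per mismatch, -2 per gap); real protein comparisons use substitution matrices such as BLOSUM62 and more elaborate gap penalties. The function name and the example strings are made up for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score of a global alignment of a and b
    under a toy scoring scheme with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap
                score[i][j - 1] + gap,       # b[j-1] against a gap
            )
    return score[rows - 1][cols - 1]


# Two made-up "protein" strings that differ at one position
print(global_alignment_score("KINASE", "KINESE"))  # prints 4
```

Five matches and one mismatch give 5 - 1 = 4; introducing gaps here would only lower the score.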


Example 2

• Scan a genome and predict where genes and their elements are located (promoters, transcription factor binding sites, introns and exons)
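The crudest computational hint of a protein-coding gene is an open reading frame (ORF): an ATG start codon followed, in the same frame, by a stop codon. The sketch below is only a first step under stated assumptions: it scans the forward strand only, uses the standard genetic code, and applies a hypothetical length cutoff `min_codons`. It ignores introns, exons, promoters and binding sites, which real gene finders typically model statistically (for example with hidden Markov models).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs, 0-based and end-exclusive, for ORFs
    on the forward strand. An ORF runs from an ATG to the first in-frame stop
    codon (stop included); ORFs with fewer than `min_codons` codons before the
    stop are discarded."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):                       # the three forward frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                     # close it, keep scanning
    return sorted(orfs)


seq = "CCATGAAATTTTAGCC"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])  # prints: 2 14 ATGAAATTTTAG
```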

Example 3

• Predict the three-dimensional structure of a protein from its primary sequence

Ab initio prediction (from the sequence alone, without a known structural template) is extremely difficult!

Example 4

• Correlate gene expression with disease

A gene chip quantifies gene expression in different tissues under different conditions

This may be used for personalized medicine
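A simple way to look for genes whose expression tracks a disease is to correlate each gene's expression across samples with a 0/1 disease label (Pearson correlation with a binary variable, also called the point-biserial correlation). This is a minimal sketch: the gene names and expression values are hypothetical, and real gene-chip analyses also need normalization and a correction for testing thousands of genes at once.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def rank_genes(expression, disease):
    """Sort genes by |correlation| between their expression and the disease label."""
    scores = {gene: pearson(values, disease) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: 4 samples, 1 = diseased tissue, 0 = healthy tissue
disease = [0, 0, 1, 1]
expression = {
    "GENE_A": [1.0, 1.2, 3.9, 4.1],  # higher in the diseased samples
    "GENE_B": [2.0, 3.0, 2.0, 3.0],  # unrelated to disease status
}
for gene, r in rank_genes(expression, disease):
    print(f"{gene}: r = {r:.2f}")
```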


Computational biology – revolutionizing science at the turn of the century

Three studies in which bioinformatics had a major impact on science:

1. Classifying life into domains

2. Predicting drug resistance in HIV and personalizing drug administration

3. Solving the mystery of anthrax molecular biology

1. Revolutionizing the Classification of Life

In the very beginning

• Life was classified into plants and animals

• When bacteria were discovered, they were initially classified as plants

• Ernst Haeckel (1866) placed all unicellular organisms in a kingdom called Protista, separate from Plantae and Animalia

When electron microscopes were developed, it was found that Protista in fact include both cells with a nucleus and cells without one. Fungi were also found to differ from plants, since they are heterotrophs (they do not synthesize their own food).

Thus, life was classified into 5 kingdoms:

LIFE: Fungi, Plants, Animals, Protists, Prokaryotes

Later on, plants, animals, protists and fungi were grouped together as the Eucarya domain, and the prokaryotes were raised from a kingdom to a domain, Bacteria:

Domains: Eucarya, Bacteria

Kingdoms (within Eucarya): Fungi, Plants, Animals, Protists

Even later, a new domain was discovered…

• The translation apparatus is universal and probably already existed at the “beginning”

• Therefore, rRNA was sequenced from a large number of organisms to study phylogeny

Carl R. Woese and rRNA phylogeny

A distance was computed between every pair of organisms, yielding a distance matrix. In a very influential 1977 paper, Woese and colleagues showed that methanogenic bacteria are as distant from other bacteria as they are from eukaryotes.
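The idea of a distance matrix can be illustrated with the simplest distance between aligned sequences, the p-distance: the fraction of aligned positions at which two sequences differ. This is a sketch only. The sequences below are short made-up strings for hypothetical organisms, not real rRNA, and the 1977 study itself worked from rRNA oligonucleotide catalogues with a similarity coefficient rather than from fully aligned sequences.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances, in the order of the input dict."""
    names = list(seqs)
    matrix = [[p_distance(seqs[m], seqs[n]) for n in names] for m in names]
    return names, matrix


# Made-up aligned fragments (hypothetical organisms, not real rRNA data)
seqs = {
    "org1": "ACGTACGTAC",
    "org2": "ACGTACGTTC",
    "org3": "ACCTACGAAC",
}
names, matrix = distance_matrix(seqs)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.1f}" for d in row))
```

Each row lists one organism's distances to all the others; distance-based tree-building methods take a matrix like this as their input.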


On methanogenic “bacteria”

“There exists a third kingdom which, to date, is represented solely by the methanogenic bacteria, a relatively unknown class of anaerobes that possess a unique metabolism based on the reduction of carbon dioxide to methane. These ‘bacteria’ appear to be no more related to typical bacteria than they are to eucaryotic cytoplasms.”

Thus, from sequence analysis alone, it was established that life is divided into 3 domains: Bacteria, Archaea and Eucarya

The rRNA phylogenetic tree


2. Revolutionizing HIV treatment


There are very efficient drugs for AIDS treatment. However:

Many viruses in blood → DRUG, + a few days → A few viruses in blood → DRUG, + a few more days → Many viruses in blood

Explanation: the virus mutates, and some viruses become resistant to the drug

Solution 1: a combination of drugs (a “cocktail”)

Solution 2: avoid drugs to which the virus is already resistant, for example when the patient was infected by a person who receives a specific drug

The question: how does one know which drugs the virus is already resistant to?

Sequences of HIV-1 from patients who were treated with drug A:

AAGACGCATCGATCGATCGATCGTACG
ACGACGCATCGATCGATCGATCGTACG
AAGACACATCGATCGTTCGATCGTACG

Sequences of HIV-1 from patients who were never treated with drug A:

AAGACGCATCGATCGATCGATCTTACG
AAGACGCATCGATCGATCGATCTTACG
AAGACGCATCGATCGATCGATCTTACG


drug A+:

AAGACGCATCGATCGATCGATCGTACG
ACGACGCATCGATCGATCGATCGTACG
AAGACACATCGATCGTTCGATCGTACG

drug A-:

AAGACGCATCGATCGATCGATCTTACG
AAGACGCATCGATCGATCGATCTTACG
AAGACGCATCGATCGATCGATCTTACG

This is an easy example!
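In the easy example, a single alignment column separates the two groups. A minimal sketch, assuming the sequences are already aligned and of equal length, scans every column for a base shared by all treated sequences and a different base shared by all untreated ones. The function name is hypothetical; positions are reported 1-based.

```python
def separating_columns(treated, untreated):
    """Return (position, treated_base, untreated_base) for every column in which
    all treated sequences share one base and all untreated sequences share a
    different one. Positions are 1-based; sequences must be pre-aligned."""
    length = len(treated[0])
    if any(len(s) != length for s in treated + untreated):
        raise ValueError("all sequences must have the same length")
    hits = []
    for i in range(length):
        treated_bases = {s[i] for s in treated}
        untreated_bases = {s[i] for s in untreated}
        if (len(treated_bases) == 1 and len(untreated_bases) == 1
                and treated_bases != untreated_bases):
            hits.append((i + 1, treated_bases.pop(), untreated_bases.pop()))
    return hits


drug_a_plus = [
    "AAGACGCATCGATCGATCGATCGTACG",
    "ACGACGCATCGATCGATCGATCGTACG",
    "AAGACACATCGATCGTTCGATCGTACG",
]
drug_a_minus = ["AAGACGCATCGATCGATCGATCTTACG"] * 3
print(separating_columns(drug_a_plus, drug_a_minus))  # prints [(23, 'G', 'T')]
```

Applied to the harder drug A data set in these slides, the same scan returns an empty list: no single column separates the two groups there, which is why that case needs a classifier.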


drug A+:

AAGACGCATCGATCGATCGATCGTACG
ACGACGCATCGATCGATCGATCGTACG
AAGACACATCGATCATTCGATCATACG

drug A-:

AAGACGCATCGATCTATCGATCTTACG
AAGACGCATCGATCTATCGATCTTACG
AAGACGCATCGATCAATCGATCGTACG

This is NOT an easy example! It’s an example of a classification problem


2006: Five machine learning tools were compared:

• Decision trees

• Linear regression

• Linear discriminant analysis

• Neural networks

• Support vector regression

~80% accuracy
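To show what a classifier does with such data, here is a minimal sketch of a different and simpler method, not one of the five tools compared: a 1-nearest-neighbor classifier that gives a new sequence the label of the most similar training sequence, measuring similarity by Hamming distance. It assumes pre-aligned, equal-length sequences. The training set is the harder drug A example from these slides; the query sequence is made up.

```python
def hamming(a, b):
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(query, training):
    """Return the label of the training sequence closest to `query`
    (the first one in `training` wins if several are equally close)."""
    _, label = min(training, key=lambda item: hamming(query, item[0]))
    return label


training = [
    ("AAGACGCATCGATCGATCGATCGTACG", "drug A+"),
    ("ACGACGCATCGATCGATCGATCGTACG", "drug A+"),
    ("AAGACACATCGATCATTCGATCATACG", "drug A+"),
    ("AAGACGCATCGATCTATCGATCTTACG", "drug A-"),
    ("AAGACGCATCGATCTATCGATCTTACG", "drug A-"),
    ("AAGACGCATCGATCAATCGATCGTACG", "drug A-"),
]
query = "AAGACACATCGATCATTCGATCTTACG"  # hypothetical new patient sample
print(nearest_neighbor(query, training))  # prints drug A+
```

The query differs from the third drug A+ sequence at a single position and from every other training sequence at three or more, so it is labeled drug A+. With only six training sequences, such a prediction says little about accuracy on new patients.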


3. Revolutionizing our understanding of the anthrax molecular mechanism

• Anthrax is a disease caused by the Gram-positive bacterium Bacillus anthracis

• It mainly infects cattle, swine and horses, but it can also infect humans

• Humans are infected through milk or meat from infected animals

• In humans it causes skin lesions; in cattle, fatal blood poisoning

• A vaccine was developed by Pasteur

• Koch was the first to isolate the bacterium

• Airborne (inhalational) anthrax, such as that caused by the weaponized strains used for bioterrorism, is almost always fatal in humans (respiratory distress, hemorrhage)

How does Bacillus anthracis work? It secretes three proteins: protective antigen (PA), edema factor (EF) and lethal factor (LF).

A PA monomer first binds to a host-cell surface receptor. This binding triggers proteolytic cleavage (part of the N terminus is cut off).

The remaining (cleaved) PA monomers oligomerize, forming heptamers.

LF and EF bind the heptamer, and the entire complex is internalized into an endosome.

The acidity of the endosome causes a conformational change in the complex, which helps it penetrate the endosomal membrane and form a pore.

The story continues…

Researchers from David Baker's group wanted to know how LF and EF bind to the heptameric PA. They used a bioinformatics method called docking…


This is where the two proteins interact!

Once they had a prediction, they performed mutagenesis experiments: mutating residues in the predicted interface abolished the binding interaction.


How does docking work? Each 3D conformation of the two-protein complex is given a score, and the conformation with the best score is chosen.
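The score-and-select idea can be sketched in a few lines. This toy version is an illustration under heavy assumptions, not the method used by Baker's group. Each protein is reduced to a rigid set of 3D points. The search covers only integer translations of one protein, with no rotations and no flexibility. The hypothetical scoring function rewards point pairs in contact and heavily penalizes pairs that overlap. The cutoffs (`contact`, `clash`) and the penalty are made-up values.

```python
import itertools
import math


def score_pose(receptor, ligand, contact=4.0, clash=2.0, clash_penalty=10):
    """Toy score: +1 for every receptor/ligand point pair closer than `contact`,
    -clash_penalty for every pair closer than `clash` (steric overlap)."""
    score = 0
    for r_atom in receptor:
        for l_atom in ligand:
            d = math.dist(r_atom, l_atom)
            if d < clash:
                score -= clash_penalty
            elif d < contact:
                score += 1
    return score


def dock(receptor, ligand, search_range=range(-5, 6)):
    """Try every integer translation of the rigid ligand within `search_range`
    along x, y and z; return (best_score, best_translation)."""
    best = None
    for shift in itertools.product(search_range, repeat=3):
        moved = [tuple(coord + offset for coord, offset in zip(atom, shift))
                 for atom in ligand]
        score = score_pose(receptor, moved)
        if best is None or score > best[0]:  # keep the first pose with the top score
            best = (score, shift)
    return best


receptor = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0)]
best_score, best_shift = dock(receptor, ligand)
print(best_score, best_shift)
```

The best pose places the single ligand point in contact with both receptor points without clashing, for a score of 2. A real docking program would also sample rotations, use a physically or statistically derived energy function, and account for flexibility.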


Challenges:

• What is the best scoring function?

• How can one go over as many conformations as possible?

• How can one take into account that proteins are flexible?

Genome Project, 2003

(Slide from Prof. Ron Shamir)

Bioinformatics

• Organizes, stores, analyzes and visualizes genomic data

• Utilizes methods from computer science, mathematics, statistics and biology

The marriage of computer science and biology

(Slide from Prof. Ron Shamir)

Bioinformatics

• At the convergence of two revolutions: the ultra-fast growth of biological data and the information revolution

Biology is becoming an information science

22 Aug 2005: 100,000,000,000 bases in public DNA and RNA sequence collections

(Slide from Prof. Ron Shamir)

Bioinformatics – a short CV

• Born: ~1990

• Grew rapidly

• Experience: an essential part of modern biomedical sciences

• Now: a separate multidisciplinary scientific area

• One of the cornerstones of 21st-century biomedical research

(Slide from Prof. Ron Shamir)

The Bioinformatics Actors

• Academic research: where it all started

• Biotechnology companies

• Big pharma

• National and international centers

Find me gene (gin?)

Bioinformatics in Israel

• A world-class player in research

• Ranked second or third in the absolute number of papers at the most prestigious and competitive conferences

• Maintaining our competitive global position is nontrivial

(Slide from Prof. Ron Shamir)