Rec DNA II. 1 The Human Genome Project

Rec DNA II.1 The Human Genome Project Rec DNA II.2 2003 Completion of the Human Genome Programe Start of the „post-genomic era” 2001 First draft of the

Rec DNA II. 1

The Human Genome Project

Rec DNA II. 2

2003 Completion of the Human Genome Project

Start of the "post-genomic era"

2001 First draft of the Human Genome

HGP (Human Genome Project): Francis Collins

NCBI: National Center for Biotechnology Information

Celera Genomics (private sector): Craig Venter

Rec DNA II. 3

GTCCGGTCCC GGGACCCCCT GCCCAGGGTC AGAGGGGCGC CTACCTAGCT CACGGTCTTG

GGCCGGAGGG AATGGAGGAG GGAGCGGGGT CGACCGCTCA GCTGTCCGCC CAGTTTCGGA

GGCGGCCACG CGAGGATCAA CTGTGCAACG GGTGGGGCCG CGGCTGACCG TGGTGGTCGC

GGGGGCTGAG GGCCAGAGGC TGCGGGGGGG GGGCGGCGGG ATGAGCTAGG CGTCGGCGGT

TGAGTCGGGC GCGGAGTCGG GGGCAGGGGG AGCGGGCGTG GAGGGCGCGC ACGAGGTCGA

GGCGAGTCCG CGGGGGAGGC GGGCAGAGCC TGAGCTCAGG TCTTTCTGCG TCTGGCGGAA

CGGGCCTGGG AGGGAGGTTT TGCCAGATAC CAGGTGGACT AGGGTGAGCG CCCGAGGGCC

GGGACGCACG CACGGGCCGG GTAGGATGGC GCTGGCGTCG ATGCCCGCGC GCTTCAGGGC

CTGGTCTGGC CGCCCCTCCA TCCTTGTCGG TTTCTCGGGT CGCGGACCCC GCGCGGCGCC

GGGCGATGCT GGCCTGCCCG TGGCCACCAC CTCGCTTCAT TCCCGTCTCT TTGGGCCGCC

GCATTCGTCC ACGTGCCCGT CTCTCCCTGC GCAAAATTCC AAGATGAGCA AATACTGGGC

TCACGGTGGA GCGCCGCGGG GGCCCCCCTG AGCCGGGGCG GGTCGGGGGC GGGACCAGGG

TCCGGCCGGG GCGTGCCCGA GGGGAGGGAC TCCCCGGCTT GCGACCCGGC GTTGTCCGCG

J. Watson, first director of the HGP

The Human Genome Project

'Clone by clone' technique:
- Parallel construction of genetic and physical maps
- Representation of the genome in ordered libraries

Rec DNA II. 4

Mapping strategies: Physical Maps

From low to high resolution:
- Cytogenetic (chromosomal) maps: banding pattern
- Cosmid contig maps: ordered clones of overlapping libraries
- Restriction maps: sites of known restriction enzymes
- DNA sequences

Rec DNA II. 5

The 'clone by clone' technique

1st aim:
Find 30,000 markers (at an average spacing of 150,000 bp)
Marker: a unique sequence

2nd aim:
- Isolate chromosomes
- Cleave them with an endonuclease (150,000 bp fragments)
- Clone them (Bacterial Artificial Chromosome, BAC clones)

Rec DNA II. 6

The 'clone by clone' technique

3rd aim:
Map the BAC clones with restriction endonucleases
Put them in order!

Ordered BAC libraries

Rec DNA II. 7

Sequencing the BAC clones

150,000 bp (BAC)
1,500 bp fragments (overlapping)

Sequence the ends:

GCCGAATCCAATTAGAAAAT
TAGAAAATCACATTTACCAGTCTGA
CCAGTCTGACCCCGCAAACGGGTTT

Align the sequences:

GCCGAATCCAATTAGAAAAT
            TAGAAAATCACATTTACCAGTCTGA
                            CCAGTCTGACCCCGCAAACGGGTTT
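The alignment step on this slide can be sketched in a few lines of Python. This is only a minimal illustration, assuming the reads are already in the correct order and error-free; the function names `overlap` and `merge_in_order` and the `min_len` threshold are hypothetical, and real assemblers have to handle sequencing errors, repeats and unknown read order.

```python
def overlap(left, right, min_len=5):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_in_order(reads, min_len=5):
    """Join reads that are already ordered, collapsing each overlap once."""
    contig = reads[0]
    for read in reads[1:]:
        size = overlap(contig, read, min_len)
        if size == 0:
            raise ValueError(f"no overlap of at least {min_len} bases with {read}")
        contig += read[size:]
    return contig


# The three reads shown on the slide.
reads = [
    "GCCGAATCCAATTAGAAAAT",
    "TAGAAAATCACATTTACCAGTCTGA",
    "CCAGTCTGACCCCGCAAACGGGTTT",
]
contig = merge_in_order(reads)
print(contig)
```

The first two reads share the 8 bases TAGAAAAT and the last two share the 9 bases CCAGTCTGA, so the merged sequence is shorter than the sum of the three reads.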

Rec DNA II. 8

Celera: The "shotgun" method

Craig Venter

2,000 bp and 10,000 bp fragments

Sequencing of the ends and aligning by computer:

AAGGACTTATG____________________GGACACAGGTTATGG

GACTTA_____CGTTGGAGAGAGGACACA________________CGTTATATTG

Only physical maps

Rec DNA II. 9

Representation of the human genome

1. Databases ('in silico')
   HGP: http://www.ncbi.nlm.nih.gov/
   Celera: http://www.celera.com/

2. A series of bacterial colonies (BAC libraries)

Rec DNA II. 10

The ENTREZ database

http://www.ncbi.nlm.nih.gov/Entrez/

National Center for Biotechnology Information, USA

Surfing on the Net

Rec DNA II. 11

http://www.ncbi.nlm.nih.gov/mapview/

Search for Homo sapiens, DRD4 (dopamine D4 receptor gene)

Surfing on the Net

Rec DNA II. 12

http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606&query=DRD4

Surfing on the Net

Chromosomal localization of the DRD4 gene
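The search address on this slide carries the whole query in its parameters. The short sketch below only takes the URL apart with the Python standard library to show what each parameter holds; nothing is fetched, and the address itself may no longer resolve. The variable names are illustrative.

```python
from urllib.parse import urlparse, parse_qs

# URL copied from the slide; it is parsed locally, not requested.
url = "http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606&query=DRD4"

parts = urlparse(url)
params = parse_qs(parts.query)

print(parts.netloc)     # host name
print(parts.path)       # search script
print(params["taxid"])  # NCBI Taxonomy ID; 9606 is Homo sapiens
print(params["query"])  # gene symbol being searched
```

Changing the `query` value is all that is needed to look up a different gene in the same organism.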

Rec DNA II. 13

http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=9606&chr=11&MAPS=genec,ugHs,genes-r&cmd=focus&fill=40&query=uid(1641)&QSTR=DRD4

Surfing on the Net

Zoom

Search gene sequence

Rec DNA II. 14

NCBI Entrez Gene

Rec DNA II. 15

http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=60521

Online Mendelian Inheritance in Man (OMIM)

OMIM: database of mutations, diseases, and known functions of genes

Review of the literature, references

Surfing on the Net

Rec DNA II. 16

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=default&list_uids=1815

Surfing on the Net

Exon (red box) – intron (red line) structure of a gene

Direction of transcription

Rec DNA II. 17

The "useful information" of the genome

About 20,000 genes
Less than 5% of the genome ???

The 'extra' ('junk') DNA - Repeat sequences

45% of the human genome consists of "jumping genes" (transposons):
- LINEs (long interspersed elements): 6 kb, about 850,000 copies, 25% of our genome; replicate via reverse transcription; many truncated (inactive) forms
- SINEs (short interspersed elements): 100-300 bp, 1.5 million copies, 13% of our genome; replicate using the LINE machinery

Others:
- Duplicated human genes (pseudogenes)
- Simple repeats (e.g. AAAAAAAAAAAAAA…)

Rec DNA II. 18

Protein databases

Universal Protein Resource (merger of Swiss-Prot, TrEMBL, and PIR)
http://www.expasy.uniprot.org/

NCBI Entrez Protein database
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&itool=toolbar

Surfing on the Net

Rec DNA II. 19

http://www.gene-regulation.com/pub/databases.html

http://www.gene-regulation.com/pub/databases.html#transcompel

http://www.cbil.upenn.edu/tess/

Databases of transcription factors

Two transcription factors together

Surfing on the Net

Rec DNA II. 20

The polymorphic nature of the human genome

Approx. 0.5% variation (15 million base pairs)

Rec DNA II. 21

"Similarity" in terms of gene sequence

Unrelated humans: share 99.9% (the difference is about 3 x 10^6 bp)

Human & apes: share ~95%

Mutations & Polymorphisms

GAGGGAGCGC
GAGGGAGCGC
GAGGGAGCGC

GAGGGTGCGC
GAGGGTGCGC
GAGGGTGCGC
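The base-pair counts quoted on this slide and the previous one follow from the percentages if the haploid genome is taken as roughly 3 x 10^9 bp. That genome size is an assumption here; it is not stated on the slides. A minimal check:

```python
GENOME_BP = 3_000_000_000  # assumed haploid genome size, not given on the slides


def bp_for_fraction(per_thousand):
    """Base pairs covered by a fraction given in parts per thousand."""
    return GENOME_BP * per_thousand // 1000


difference_between_humans = bp_for_fraction(1)  # 0.1% differ (99.9% shared)
polymorphic_positions = bp_for_fraction(5)      # 0.5% variation

print(difference_between_humans)
print(polymorphic_positions)
```

Integer arithmetic in parts per thousand is used only to keep the results exact.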

Rec DNA II. 22

GTCCGGTCCC GGGACCCCCT GCCCAGGGTC AGAGGGGCGC CTACCTAGCT CACGGTCTTG

GGCCGGAGGG AATGGAGGAG GGAGCGGGGT CGACCGCTCA GCTGTCCGCC CAGTTTCGGA

GGCGGCCACG CGAGGATCAA CTGTGCAACG GGTGGGGCCG CGGCTGACCG TGGTGGTCGC

GGGGGCTGAG GGCCAGAGGC TGCGGGGGGG GGGCGGCGGG ATGAGCTAGG CGTCGGCGGT

TGAGTCGGGC GCGGAGTCGG GGGCAGGGGG AGCGGGCGTG GAGGGCGCGC ACGAGGTCGA

GGCGAGTCCG CGGGGGAGGC GGGCAGAGCC TGAGCTCAGG TCTTTCTGCG TCTGGCGGAA

CGGGCCTGGG AGGGAGGTTT TGCCAGATAC CAGGTGGACT AGGGTGAGCG CCCGAGGGCC

GGGACGCACG CACGGGCCGG GTAGGATGGC GCTGGCGTCG ATGCCCGCGC GCTTCAGGGC

CTGGTCTGGC CGCCCCTCCA TCCTTGTCGG TTTCTCGGGT CGCGGACCCC GCGCGGCGCC

GGGCGATGCT GGCCTGCCCG TGGCCACCAC CTCGCTTCAT TCCCGTCTCT TTGGGCCGCC

GCATTCGTCC ACGTGCCCGT CTCTCCCTGC GCAAAATTCC AAGATGAGCA AATACTGGGC

TCACGGTGGA GCGCCGCGGG GGCCCCCCTG AGCCGGGGCG GGTCGGGGGC GGGACCAGGG

TCCGGCCGGG GCGTGCCCGA GGGGAGGGAC TCCCCGGCTT GCGACCCGGC GTTGTCCGCG

Mutations: rare allelic variants (in less than 1% of the human population), usually causing monogenic disorders

when the “misprint” is fatal

Identified diseases with monogenic inheritance

Sickle cell anemia

Rec DNA II. 23

Genetic polymorphisms: variations over 1% frequency in humans

"… harmless misprints"

SNP: Single Nucleotide Polymorphism

G C A C T A C C
C G T G A T G G

G C A T T A C C
C G T A A T G G

VNTR: Variable Number of Tandem Repeats

2 repeats
3 repeats
4 repeats
5 repeats
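Finding a single-base difference between two aligned sequences of equal length is a position-by-position comparison. The sketch below uses the SNP example from this slide and the ten-base example from the "Similarity" slide; the function name `point_differences` is hypothetical, and real variant calling also has to deal with insertions, deletions and sequencing errors.

```python
def point_differences(seq_a, seq_b):
    """List (position, base_a, base_b) for every mismatch; positions start at 1."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Top strands of the two SNP alleles shown on this slide.
snp_example = point_differences("GCACTACC", "GCATTACC")

# The two variants shown on the "Similarity" slide.
variant_example = point_differences("GAGGGAGCGC", "GAGGGTGCGC")

print(snp_example)
print(variant_example)
```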

Rec DNA II. 24

Polymorphism - Mutation

            Polymorphism               Mutation
Frequency   more than 1%               less than 1%
Effect      neutral ??? risk factors   disease

Single Nucleotide Polymorphisms / SNPs (pronounced "snips")
- 90% of the known variations
- most SNPs have only two alleles

Length Polymorphism: repeat sequences

Rec DNA II. 25

What is next?

Rec DNA II. 26

What is next?

Rec DNA II. 27

"Human - ape genome: 95% similarity. What is the difference?"

Rec DNA II. 28

High-throughput methods in genome analysis: Automated DNA sequencing

'Color sequencing'

Based on dideoxy chain termination (see also: Lehninger)

...3’ C A A G T C A C C T T G

C A A G

A ddA

Terminating positions

Sequencing reaction mixture:
- All four dNTPs
- All four ddNTPs, each with a different fluorescent dye
- DNA polymerase, primer
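The logic of reading a sequence from dye-labelled, chain-terminated fragments can be sketched as follows. This is a simplified model under stated assumptions: the primer is ignored, every possible termination product is present exactly once, and the dye colors in `DYE_OF` are hypothetical labels chosen for the illustration, not those of any particular instrument.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye assigned to each ddNTP; only the one-to-one mapping matters.
DYE_OF = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_OF = {dye: base for base, dye in DYE_OF.items()}


def terminated_fragments(template_3_to_5):
    """All chain-terminated products as (length, dye of the terminal ddNTP)."""
    # The new strand grows 5'->3' and is complementary to the template.
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    return [(n, DYE_OF[new_strand[n - 1]]) for n in range(1, len(new_strand) + 1)]


def read_sequence(fragments):
    """Sort fragments by length (as electrophoresis does) and read the dyes."""
    return "".join(BASE_OF[dye] for _, dye in sorted(fragments))


# Template from the slide, written 3'->5'.
fragments = terminated_fragments("CAAGTCACCTTG")

# The fragments come out of the reaction unordered; reversing stands in for that.
sequence = read_sequence(list(reversed(fragments)))
print(sequence)
```

The result is the newly synthesized strand read 5' to 3', which is the complement of the template.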

Rec DNA II. 29

Sequencing results:

Rec DNA II. 30

DNA chip (oligonucleotide array) 1. Mutation analysis

50 µm (one position)

1.2 cm (chip): ~60,000 positions

One position: 1,000,000 molecules

Rec DNA II. 31

The oligonucleotide array

Example: mutation analysis of a 4,000 bp gene (e.g. CFTR)

Array of 20-base oligos tiled along the gene: 1–20, 2–21, 3–22, ...

4,000 bp length: 4,000 oligos

4 variations in the middle base (the original plus 3 substitutions): 12,000 additional oligos
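The probe design on this slide (overlapping 20-base windows, each with its middle base replaced by the other three bases) can be sketched like this. The function names and the choice of index 10 as the "middle" of a 20-base oligo are assumptions made for the illustration; the toy gene is simply the first 30 bases of the sequence shown earlier in the deck.

```python
def tiling_probes(gene, length=20):
    """Overlapping windows 1-20, 2-21, 3-22, ... along the gene."""
    return [gene[i:i + length] for i in range(len(gene) - length + 1)]


def middle_variants(probe):
    """The three probes that differ from `probe` only at its middle base."""
    mid = len(probe) // 2
    return [
        probe[:mid] + base + probe[mid + 1:]
        for base in "ACGT"
        if base != probe[mid]
    ]


toy_gene = "GTCCGGTCCCGGGACCCCCTGCCCAGGGTC"  # 30 bases, for illustration only
probes = tiling_probes(toy_gene)
variants = [v for probe in probes for v in middle_variants(probe)]

print(len(probes), len(variants))
```

Strictly, a gene of N bases gives N - 19 windows of 20 bases, so a 4,000 bp gene yields 3,981 probes, which the slide rounds to 4,000; three substitutions per probe then give roughly 12,000 variant oligos.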

Rec DNA II. 32

The result

Sample / Control (no mutation)

Comparison by computer

Rec DNA II. 33

DNA-chip 2: Expression Analysis by Micro-arrays