Outline of Talk

• Protein Genes

• SNPs

• Haplotypes

• Finding a Disease Locus

Size of the Genomes

[Figure: bar chart of genome size in millions of basepairs (axis 0 to 3500) for Human, Maize, Rice, Arabidopsis (flowering plant), Drosophila (fruit fly), C. elegans (round worm), S. cerevisiae (yeast), and E. coli (bacteria).]

The Human Genome

What the letters stand for

DNA has four chemical subunits, called nucleotide bases, abbreviated A, C, G, T.

GATTACA

http://en.wikipedia.org/wiki/Nucleotide

What’s in the Genome?

• Chromosomes – 23 pairs

– Genes
• Protein genes
• RNA genes
• MicroRNA genes

– Repeats
• Tandem repeats
• Inverted repeats
• Transposons
• Segmental duplications

– Regulatory regions
• Promoters
• Transcription factor binding sites

Protein Genes

A protein gene contains the genetic code for a protein. The production of protein involves transcription (copying DNA to RNA) and translation (using RNA code to produce a protein).

http://www.slic2.wsu.edu:82/hurlbert/micro101/images/TransTranscrip.gif

[Figures: Transcription and Translation]

http://nobelprize.org/medicine/educational/dna/a/translation/polysome_em.html

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Miller_Beatty3.jpg

Finding Protein Genes

Before the sequencing of genomes, protein genes were found experimentally. Now, new genes are predicted computationally using a gene model.


Building a Gene Model

Gene models for prediction are based on the structure of genes in DNA and their messenger RNAs (mRNAs). This includes exons, introns, promoters, and the polyadenylation signal.

http://xray.bmc.uu.se/Courses/Bke2/Exercises/Exercise_answers/pre_mRNA_processing.gif

Exons

In this example, EXONS are uppercase and introns are lowercase. Exons contain the code for a protein; introns interrupt the exons. Before translation, introns are removed from the messenger RNA.

DNA:

…ACTGCTACAGtctattgaGAACAACATAGtcacgaacttaacgtgcaGTTTAACAGCACGtctcgaagggca…

RNA (before removal of introns):

…ACUGCUACAGucuauugaGAACAACAUAGucacgaacuuaacgugcaGUUUAACAGCACGucucgaagggca…

RNA (after removal of introns):

…ACUGCUACAGGAACAACAUAGGUUUAACAGCACG…
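The exon example can be reproduced with a short Python sketch. It relies only on the slide's convention that exons are uppercase and introns are lowercase; the function names are illustrative and not part of the talk.

```python
# Exon/intron example from the slide: exons are uppercase, introns lowercase.
DNA = "ACTGCTACAGtctattgaGAACAACATAGtcacgaacttaacgtgcaGTTTAACAGCACGtctcgaagggca"

def transcribe(dna: str) -> str:
    """Copy DNA to RNA by writing U in place of T (letter case is preserved)."""
    return dna.replace("T", "U").replace("t", "u")

def remove_introns(pre_mrna: str) -> str:
    """Keep only the uppercase (exon) letters."""
    return "".join(base for base in pre_mrna if base.isupper())

pre_mrna = transcribe(DNA)
mrna = remove_introns(pre_mrna)
print(pre_mrna)
print(mrna)
```

A real cell recognizes introns by sequence signals, not by letter case; the case trick only works because the slide marks the introns by hand.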

Finding Exons

The sequence of an exon contains codons. Each codon is a triplet of nucleotides which codes for a single amino acid. Amino acids are the building blocks of a protein.

http://en.wikipedia.org/wiki/Genetic_code

Genetic Code

Each codon specifies one of twenty amino acids. Three codons are stop codons, which specify the end of translation.

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/code.gif
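As a rough illustration of how the genetic code is applied, the sketch below builds the standard codon table and translates an RNA string codon by codon. The one-letter amino acid codes, the `translate` helper, and the example RNA are assumptions made for illustration; they do not come from the slides.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, with the first base varying slowest (U, C, A, G).
# "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna: str) -> str:
    """Translate codon by codon, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3].upper()]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)

# Hypothetical RNA: Ala, Thr, Glu, Leu, Arg, Ser, then a stop codon.
print(translate("GCUACUGAGCUUCGUUCUUAA"))
```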

Open Reading Frame (ORF)

An open reading frame (ORF) is a sequence of codons that does not contain a stop codon.

[Figure: a run of codons labeled alanine, threonine, glutamic acid, leucine, arginine, serine, STOP!]

http://en.wikipedia.org/wiki/Genetic_code

Finding Exons

Sequence:

acggacucuagccuaaugugacgacugacauagguaaauucgcuc

Even though this sequence contains stop codons, a given stop codon is read in only one of the reading frames.

frame +1: acg gac ucu agc cua aug uga cga cug aca uag gua aau ucg cuc
frame +2: a cgg acu cua gcc uaa ugu gac gac uga cau agg uaa auu cgc uc
frame +3: ac gga cuc uag ccu aau gug acg acu gac aua ggu aaa uuc gcu c

Very short ORFs are unlikely to be exons.
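The three reading frames can be checked with a small Python sketch. It splits the slide's sequence into codons for each frame and reports the longest run of codons with no stop codon; the helper names are illustrative.

```python
SEQ = "acggacucuagccuaaugugacgacugacauagguaaauucgcuc"
STOP_CODONS = {"uaa", "uag", "uga"}

def codons(seq, frame):
    """Complete codons of seq read in frame +1, +2 or +3."""
    return [seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3)]

def longest_orf(seq, frame):
    """Longest run of consecutive non-stop codons in the given frame."""
    best, run = [], []
    for codon in codons(seq, frame):
        if codon in STOP_CODONS:
            run = []
        else:
            run.append(codon)
            if len(run) > len(best):
                best = list(run)
    return best

for frame in (1, 2, 3):
    stops = [c for c in codons(SEQ, frame) if c in STOP_CODONS]
    print(frame, stops, len(longest_orf(SEQ, frame)))
```

In this sequence frame +3 has a single stop codon near the start, so it carries the longest stop-free run.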

Finding Introns

Introns usually start with the dinucleotide GT (GU in RNA) and end with AG.

Finding Exons

Sequence:

acggacucuagccuaaugugacgacugacauagguaaauucgcuc

A gene can contain open reading frames connected across stop codons by an intron.

frame +1: acg gac ucu agc cua aug uga cga cug aca uag gua aau ucg cuc
frame +3: ac gga cuc uag ccu aau gug acg acu gac aua ggu aaa uuc gcu c
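One way to look for possible introns is to list every stretch that starts with GU and ends with AG. The sketch below does only that. The minimum length of 4 is an arbitrary assumption, and the text of the slide does not say which stretch the speaker marked as the intron, so the output is a list of candidates and nothing more.

```python
SEQ = "acggacucuagccuaaugugacgacugacauagguaaauucgcuc"

def candidate_introns(rna, min_len=4):
    """(start, end) pairs with rna[start:end] starting 'gu' and ending 'ag'."""
    rna = rna.lower()
    starts = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "gu"]
    ends = [j + 2 for j in range(len(rna) - 1) if rna[j:j + 2] == "ag"]
    return [(s, e) for s in starts for e in ends if e - s >= min_len]

def splice_out(rna, start, end):
    """Remove rna[start:end] and join the flanking pieces."""
    return rna[:start] + rna[end:]

for start, end in candidate_introns(SEQ):
    print(start, end, SEQ[start:end], splice_out(SEQ, start, end))
```

A real gene finder would also check that the spliced sequence keeps an open reading frame, and would score the splice signals instead of accepting every GU and AG.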

How many genes are there?

Estimates

• pre 2000: 100,000, based on estimates of the number of genes required to account for human complexity

• 2001: 30,000 – 40,000, based on the first draft of the human genome

• 2003: 23,000 – 24,500, based on gene prediction computer programs

Why so low?

• alternative splicing of exons

• complex regulatory mechanisms

• inability to predict genes which are unlike those seen before

http://www.ornl.gov/sci/techresources/Human_Genome/faq/genenumber.shtml

RNA Genes

RNA genes do not code for proteins. Instead, the RNA molecule itself is functional in the cell.

Examples include:

1. Ribosomal RNA – these molecules form the major component of the protein building machinery

2. Transfer RNA – work with ribosomal RNA to insert correct amino acids into growing proteins

3. MicroRNA – a newly discovered class of RNA which helps regulate gene expression.

Ribosome

http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/RNA/images/fig_rna12.jpg


RNA Genes

MicroRNAs are short and show little or no conservation of sequence.

Unlike protein genes, RNA genes do not contain codons or open reading frames. However, they do contain inverted repeats.

Inverted Repeats (IRs)

RNA

G A C U U G A U C A A G U C

[Figure labels: complemented, reversed]

Two patterns, one the reverse complement of the other
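The reverse complement can be computed in two steps: complement each base, then reverse the string. The sketch below checks the slide's RNA example, assuming the two patterns are the first and last seven bases (GACUUGA and UCAAGUC).

```python
# RNA base pairing: A with U, C with G.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna: str) -> str:
    """Complement every base, then reverse the result."""
    return rna.translate(RNA_COMPLEMENT)[::-1]

LEFT, RIGHT = "GACUUGA", "UCAAGUC"
print(reverse_complement(LEFT))           # compare with RIGHT
print(reverse_complement(LEFT) == RIGHT)
```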

IR Nomenclature

[Figure labels: Left arm, Spacer, Right arm]

RNA

G A C U U G A U C A A G U C

Stem-Loop Structure

[Figure: stem-loop with paired strands CAGUUCAG and GUCAAGUC; labels: Left arm, Spacer, Right arm]

Structure forms by pairing of complementary bases

MicroRNA

MicroRNAs come from a precursor that contains a stem-loop.

http://www.ma.uni-heidelberg.de/apps/zmf/argonaute/interface/mirna.jpeg

Detection of Approximate Inverted Repeats

Human Chr. 3 ~173,291,101

AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA
AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA
CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA
TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT
TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG
ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG
AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC
AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG
GCATTTCCCC CTACGT

Detection of Approximate Inverted Repeats

[Sequence repeated from the previous slide]

Arms are 72 nt long; the spacer is 42 nt long.

The Problem: Find the Inverted Repeat

Human Chr. 3 ~173,291,101

[Sequence repeated, with the inverted repeat highlighted: it begins at CTTTTTCATT in the second row and ends at AAAAAG in the sixth row.]
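The arm and spacer lengths can be checked against the sequence with a short Python sketch. The offset of 50 is an assumption: it is inferred from the highlighted region, which starts at the CTTTTTCATT block. The function names are illustrative.

```python
# Sequence as displayed on the slide (whitespace is removed below).
SEQ = "".join("""
AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA
AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA
CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA
TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT
TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG
ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG
AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC
AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG
GCATTTCCCC CTACGT
""".split())

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(DNA_COMPLEMENT)[::-1]

def arm_mismatches(seq, start, arm, spacer):
    """Count positions where the right arm differs from the
    reverse complement of the left arm."""
    left = seq[start:start + arm]
    right_start = start + arm + spacer
    right = seq[right_start:right_start + arm]
    return sum(a != b for a, b in zip(reverse_complement(left), right))

# Assumed offset 50; arm and spacer lengths as stated on the slide.
print(arm_mismatches(SEQ, start=50, arm=72, spacer=42))
```

Under these assumptions the two arms differ at only a few positions, which is why the repeat is called approximate. Finding such a repeat without knowing the offsets in advance is the harder problem the slide poses.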

Single Nucleotide Polymorphisms (SNPs)

A SNP is a single position in the genome (a locus) that is not the same in all people. Some people have one type of nucleotide and other people have a different nucleotide. Differences in the population at a single locus are called polymorphisms and the individual types are called alleles.

SNPs are found experimentally

acattcct
acgttatt

[Figure label: SNPs]
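If the two strings in the figure are treated as aligned sequences (an assumption about how the figure should be read), the positions where they differ can be listed with a few lines of Python.

```python
def snp_positions(sequences):
    """0-based positions where the aligned sequences do not all agree."""
    return [
        i for i, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]

print(snp_positions(["acattcct", "acgttatt"]))
```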

Haplotypes

A haplotype is a collection of SNP alleles on a single chromosome in an individual.

Shown are SNPs on two chromosomes in each individual.

acgttcat
acattcat
tcgttcat
acagatat
acattcct
acattcct
atagtcca
acagtcca
tcattcat
acattcaa

Haplotypes

[Figure repeated, highlighting: Homozygous (same alleles)]

Haplotypes

[Figure repeated, highlighting: Heterozygous (different alleles)]
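Homozygous and heterozygous loci can be found by comparing an individual's two chromosomes position by position. The sketch below assumes that consecutive rows of the figure belong to the same individual; the flattened slide text does not confirm that pairing.

```python
# Rows of the haplotype figure, one chromosome per row.
HAPLOTYPES = [
    "acgttcat", "acattcat",
    "tcgttcat", "acagatat",
    "acattcct", "acattcct",
    "atagtcca", "acagtcca",
    "tcattcat", "acattcaa",
]

# Assumption: rows 1-2 are one individual, rows 3-4 the next, and so on.
individuals = list(zip(HAPLOTYPES[0::2], HAPLOTYPES[1::2]))

def zygosity(chrom_a, chrom_b):
    """'hom' where the two chromosomes carry the same allele, else 'het'."""
    return ["hom" if a == b else "het" for a, b in zip(chrom_a, chrom_b)]

for chrom_a, chrom_b in individuals:
    print(chrom_a, chrom_b, zygosity(chrom_a, chrom_b))
```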

Haplotypes

Rare alleles

acgttcat
acattcat
tcgttcat
acagatat
acattcct
acattcct
atagtcca
acagacca
tcattcat
acattcaa
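Rare alleles can be picked out by counting how often each allele occurs at each locus. The sketch below uses the rows as they appear on this slide. The 10% cutoff is an arbitrary assumption for illustration.

```python
from collections import Counter

# Rows as they appear on the "Rare alleles" slide.
HAPLOTYPES = [
    "acgttcat", "acattcat", "tcgttcat", "acagatat", "acattcct",
    "acattcct", "atagtcca", "acagacca", "tcattcat", "acattcaa",
]

def allele_frequencies(haplotypes):
    """One dict per locus, mapping allele to its frequency."""
    n = len(haplotypes)
    return [
        {allele: count / n for allele, count in Counter(column).items()}
        for column in zip(*haplotypes)
    ]

def rare_alleles(haplotypes, max_freq=0.1):
    """(locus, allele) pairs whose frequency is at most max_freq."""
    return [
        (locus, allele)
        for locus, freqs in enumerate(allele_frequencies(haplotypes))
        for allele, freq in sorted(freqs.items())
        if freq <= max_freq
    ]

print(rare_alleles(HAPLOTYPES))
```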

Haplotypes

[Figure repeated, highlighting: Strong linkage (usually occur together)]
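How often two alleles occur together can be put into a number. The slides do not name a measure; the sketch below uses r², a common statistic for pairs of two-allele loci, applied to the ten haplotypes from the figure. It assumes both loci have exactly two alleles and would divide by zero at a locus with only one.

```python
HAPLOTYPES = [
    "acgttcat", "acattcat", "tcgttcat", "acagatat", "acattcct",
    "acattcct", "atagtcca", "acagtcca", "tcattcat", "acattcaa",
]

def r_squared(haplotypes, i, j):
    """r^2 between loci i and j (0-based), each assumed to have two alleles."""
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]   # reference alleles
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

print(r_squared(HAPLOTYPES, 4, 5))  # alleles at these loci always co-occur
print(r_squared(HAPLOTYPES, 0, 2))  # much weaker association
```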

Linkage Analysis

SNPs and haplotypes are used to identify regions of the genome that cause disease. The technique is called linkage analysis and evidence of a connection is called linkage disequilibrium (LD).

mom: acattcat / atagtcca
dad: acagatat / tcattcat
child: acagtcca / acagacat

recombination and inheritance

Linkage Analysis

[Figure repeated, highlighting: recombination in the mother’s chromosomes]

Linkage Analysis

[Figure repeated, highlighting: recombination in the father’s chromosomes]

Linkage Analysis

There are two to three crossovers per chromosome per generation.

Linkage Analysis

Key point: Alleles that are physically close together tend to be inherited together because the chance of a crossover between them is small. They exhibit strong linkage.
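The recombination in the figure can be traced with a small Python sketch. It assumes the first two parental rows are the mother's chromosomes and the next two the father's, and it looks for a single crossover point that explains each of the child's chromosomes. The helper name is illustrative.

```python
MOM = ("acattcat", "atagtcca")    # assumed: first two rows of the figure
DAD = ("acagatat", "tcattcat")    # assumed: next two rows
CHILD = ("acagtcca", "acagacat")

def crossover_point(child, first, second):
    """Smallest k with child == first[:k] + second[k:], or None if
    no single crossover from `first` to `second` explains the child."""
    for k in range(len(child) + 1):
        if child == first[:k] + second[k:]:
            return k
    return None

print(crossover_point(CHILD[0], *MOM))  # child's first chromosome, from mom
print(crossover_point(CHILD[1], *DAD))  # child's second chromosome, from dad
```

When several break points give the same child chromosome, only the smallest is reported. A single family usually cannot place a crossover more precisely than the interval between two informative SNPs.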

Finding an Unknown Disease Locus

For many diseases, the location of the responsible locus on the genome is unknown. SNPs and haplotypes are being used to search for disease loci using linkage analysis.

mom: acattcat / atagtcca
dad (has disease): acagatat / tcattcat
child (has disease): acagtcca / acagacat

Linkage Analysis – Dominant Model

Assume the disease is caused by a dominant allele, meaning one copy is enough to cause the disease.

[Figure repeated, highlighting: SNP alleles in father that are not in mother]

Linkage Analysis – Dominant Model

[Figure repeated, highlighting: SNP allele in child, inherited from father with disease]

Linkage Analysis – Dominant Model

SNP allele and disease are linked, indicating a possible disease locus.
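The dominant-model reasoning can be sketched as a filter over the SNP loci: keep alleles that the affected father carries, the unaffected mother lacks, and the affected child has. Which rows belong to which parent is an assumption read off the figure layout, and one family is far too little data for a real analysis.

```python
MOM = ("acattcat", "atagtcca")    # unaffected (assumed row assignment)
DAD = ("acagatat", "tcattcat")    # has disease
CHILD = ("acagtcca", "acagacat")  # has disease

def alleles_at(chromosomes, locus):
    """Set of alleles an individual carries at a locus."""
    return {chrom[locus] for chrom in chromosomes}

def dominant_candidates(mother, father, child):
    """(locus, allele) pairs present in father and child but not in mother."""
    hits = []
    for locus in range(len(father[0])):
        father_only = alleles_at(father, locus) - alleles_at(mother, locus)
        for allele in sorted(father_only & alleles_at(child, locus)):
            hits.append((locus, allele))
    return hits

print(dominant_candidates(MOM, DAD, CHILD))
```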

Linkage Analysis – Recessive Model

Assume the disease is caused by a recessive allele, meaning two copies are required to cause the disease.

[Figure repeated, highlighting: homozygous SNP alleles in father that are heterozygous in mother]

Linkage Analysis – Recessive Model

[Figure repeated, highlighting: homozygous SNP allele in child, identical to father’s]

Linkage Analysis – Recessive Model

SNP allele and disease are linked, indicating a possible disease locus.
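The recessive-model reasoning can be sketched the same way: keep loci where the affected father is homozygous, the unaffected mother is heterozygous, and the affected child is homozygous for the father's allele. The assignment of rows to parents is again an assumption read off the figure.

```python
MOM = ("acattcat", "atagtcca")    # unaffected (assumed row assignment)
DAD = ("acagatat", "tcattcat")    # has disease
CHILD = ("acagtcca", "acagacat")  # has disease

def recessive_candidates(mother, father, child):
    """(locus, allele) pairs: father homozygous, mother heterozygous,
    child homozygous for the same allele as the father."""
    hits = []
    for locus in range(len(father[0])):
        f = {chrom[locus] for chrom in father}
        m = {chrom[locus] for chrom in mother}
        c = {chrom[locus] for chrom in child}
        if len(f) == 1 and len(m) == 2 and c == f:
            hits.append((locus, next(iter(f))))
    return hits

print(recessive_candidates(MOM, DAD, CHILD))
```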

BMI = weight/height² in kg/m²; BMI > 25 is overweight, BMI > 30 is obese

Other Differences – Microdeletions

A microdeletion is the loss of a small piece of DNA, perhaps as small as 1000 bases. These pieces can contain genes, parts of genes or regulatory regions.

[Figure: chromosomes with SNP alleles, some interrupted by gaps labeled microdeletions. Text fragments: acattcct; atgtt, t; gc, gcat; acactcct]


heterozygous


homozygous


miscalled homozygous


Apparent Inheritance Inconsistency

SNPs and haplotypes are used to identify regions of the genome that cause disease. The technique is called linkage analysis and evidence of a connection is called linkage disequilibrium (LD).

ataagaac

ccc

ac

acatccac

ccctccac

mom, dad

ccc

ac

acatccac

child


a a + t t → a t

by Mendelian inheritance


A cluster of inconsistencies suggests a microdeletion.
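The idea can be sketched in Python. This is a minimal illustration, not the procedure used in the talk. It assumes that a genotype caller reports a SNP as homozygous when one chromosome carries a deletion at that position, and that the short fragments in the figure (ccc and ac) are the two ends of one haplotype with a three-base gap, written here as `ccc---ac`. That reading of the figure, and all function names, are assumptions.

```python
def call_genotypes(hap1, hap2):
    """Genotype calls per SNP; a deleted base ('-') is miscalled as homozygous."""
    calls = []
    for a, b in zip(hap1, hap2):
        if a == "-":
            a = b  # only the other chromosome is observed
        if b == "-":
            b = a
        calls.append((a, b))
    return calls


def is_consistent(mom, dad, child):
    """True if the child could have one allele from each parent."""
    x, y = child
    return (x in mom and y in dad) or (y in mom and x in dad)


def inconsistent_sites(mom_calls, dad_calls, child_calls):
    """Positions where the trio violates Mendelian inheritance."""
    trio = zip(mom_calls, dad_calls, child_calls)
    return [i for i, (m, d, c) in enumerate(trio) if not is_consistent(m, d, c)]


def clusters(sites, min_size=2):
    """Group adjacent inconsistent positions into runs of at least min_size."""
    runs, run = [], []
    for s in sites:
        if run and s != run[-1] + 1:
            if len(run) >= min_size:
                runs.append(run)
            run = []
        run.append(s)
    if len(run) >= min_size:
        runs.append(run)
    return runs


# Haplotypes as read from the figure; the gap placement is an assumption.
mom_calls = call_genotypes("ataagaac", "ccc---ac")
dad_calls = call_genotypes("acatccac", "ccctccac")
child_calls = call_genotypes("ccc---ac", "acatccac")

sites = inconsistent_sites(mom_calls, dad_calls, child_calls)
runs = clusters(sites)
print(sites, runs)
```

At the fourth SNP (index 3) mom is called a a and dad t t, so the child should be a t, yet the child is called t t. The same conflict appears at the next two positions, and the three adjacent inconsistencies form the cluster that points to a deletion inherited from mom.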


Microdeletions

Hundreds of microdeletion haplotypes have been discovered recently. They may be a major contributor to human differences and disease.


Resources

UCSC Human Genome Browser

http://genome.ucsc.edu/cgi-bin/hgGateway

National Center for Biotechnology Information (NCBI)

http://www.ncbi.nlm.nih.gov/

PubMed

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed