Genome structure and evolution Jan Pačes Institute of Molecular Genetics AS CR


Page 1: Genome  structure and  evolution

Genome structure and evolution

Jan Pačes, Institute of Molecular Genetics AS CR

Page 2: Genome  structure and  evolution

sizes of selected completed genomes

genome                      chromosomes   size                genes
Mycoplasma genitalium                     0.58 Mbp            521
Escherichia coli                          4.6 Mbp (5.4 Mbp)   4 377 (5 416)
Saccharomyces cerevisiae    16            12.5 Mbp            5 770
Caenorhabditis elegans      6             ~100 Mbp            19 427
Arabidopsis thaliana        5             ~115 Mbp            ~28 k
Drosophila melanogaster     5             ~122 Mbp            13 379
Homo sapiens                24            ~3.3 Gbp            ~22.5 k

Page 3: Genome  structure and  evolution

genome complexity

Page 4: Genome  structure and  evolution

genome sizes

Arabidopsis thaliana: genome size ~100 Mbp

Psilotum nudum: genome size ~250 Gbp

Page 5: Genome  structure and  evolution

irregular genome sizes?

Schizosaccharomyces pombe: fission yeast, genome smaller than that of many bacteria; genome 12 462 637 bp, 4 929 genes

Mimivirus: virus of an amoeba; genome 1 181 404 bp, 1 262 genes

Tetraodon nigroviridis (pufferfish): about the same number of genes as human, but a genome only about one tenth the size; genome 300 Mbp, 27 918 genes
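The contrast between these three genomes is easier to see as gene density. The sketch below is only an illustration: it uses the sizes and gene counts quoted on this slide, and the helper name `genes_per_mbp` is made up for the example.

```python
# Genome size (bp) and gene count exactly as quoted on the slide.
GENOMES = {
    "Schizosaccharomyces pombe": (12_462_637, 4_929),
    "Mimivirus": (1_181_404, 1_262),
    "Tetraodon nigroviridis": (300_000_000, 27_918),
}


def genes_per_mbp(size_bp: int, genes: int) -> float:
    """Gene density in genes per million base pairs."""
    return genes / (size_bp / 1e6)


for name, (size_bp, genes) in GENOMES.items():
    print(f"{name}: {genes_per_mbp(size_bp, genes):.1f} genes/Mbp")
```

By this measure the virus is the most densely packed of the three and the pufferfish the least, even though the pufferfish has by far the most genes.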

Page 6: Genome  structure and  evolution

C-value

C-value refers to the amount of DNA contained within a haploid nucleus, in picograms
among diploid organisms the terms C-value and genome size are used interchangeably
in polyploids the C-value may represent two or more genomes contained within the same nucleus
in animals, C-values range more than 3,300-fold

genome size (bp) = (0.978 x 10^9) x DNA content (pg)
DNA content (pg) = genome size (bp) / (0.978 x 10^9)
1 pg = 978 Mbp
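The two conversion formulas above can be written as a pair of small helpers. This is a minimal sketch using the factor from the slide; the function names are chosen for the example only.

```python
BP_PER_PG = 0.978e9  # base pairs per picogram of DNA, as given on the slide


def pg_to_bp(dna_pg: float) -> float:
    """Convert DNA content in picograms to genome size in base pairs."""
    return dna_pg * BP_PER_PG


def bp_to_pg(size_bp: float) -> float:
    """Convert genome size in base pairs to DNA content in picograms."""
    return size_bp / BP_PER_PG


# 1 pg corresponds to 978 Mbp.
print(pg_to_bp(1.0) / 1e6, "Mbp")
# A ~3.3 Gbp genome corresponds to roughly 3.4 pg.
print(round(bp_to_pg(3.3e9), 2), "pg")
```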

Page 7: Genome  structure and  evolution

genome sizes

0.0023 pg in the parasitic microsporidium Encephalitozoon intestinalis

1 400 pg in a protist, the free-living amoeba Chaos chaos

Gregory T http://www.genomesize.com

Page 8: Genome  structure and  evolution
Page 9: Genome  structure and  evolution

C-value enigma

What types of non-coding DNA are found in different eukaryotic genomes, and in what proportions?

From where does this non-coding DNA come, and how is it spread and/or lost from genomes over time?

What effects, or perhaps even functions, does this non-coding DNA have for chromosomes, nuclei, cells, and organisms?

Why do some species exhibit remarkably streamlined chromosomes, while others possess massive amounts of non-coding DNA?

What is the minimal genome?

Page 10: Genome  structure and  evolution

E-Cell: model and reconstruct biological phenomena in silico

http://www.e-cell.org

Page 11: Genome  structure and  evolution

Synthetic genomes

Mycoplasma laboratorium

Gibson D, et al. (2008): Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome. Science. DOI: 10.1126/science.1151721

Synthia: a synthetic bacterium whose genome, derived from that of Mycoplasma mycoides, was chemically synthesized from scratch and transplanted into a Mycoplasma capricolum cell

Gibson D, et al. (2010): Creation of a bacterial cell controlled by a chemically synthesized genome. Science. DOI: 10.1126/science.1190719

Page 12: Genome  structure and  evolution

just for fun – watermarks

"TO LIVE, TO ERR, TO FALL, TO TRIUMPH, TO RECREATE LIFE OUT OF LIFE."

"SEE THINGS NOT AS THEY ARE, BUT AS THEY MIGHT BE."

"WHAT I CANNOT BUILD, I CANNOT UNDERSTAND."

PACES

VENTERINSTITVTE, CRAIGVENTER, HAMSMITH, CINDIANDCLYDE, GLASSANDCLYDE

Page 13: Genome  structure and  evolution

Rhodobacter capsulatus, GC content
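The plot for this slide is not preserved in the transcript. As a rough sketch of how a GC-content profile along a genome is usually computed, one can slide a fixed window over the sequence. The window and step sizes are arbitrary assumptions here, not values taken from the slide.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq: str, window: int, step: int) -> list[float]:
    """GC content of successive windows of length `window`, moved by `step`."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: an AT-only stretch followed by a GC-only stretch.
print(gc_windows("AAAATTTTGGGGCCCC", window=8, step=4))
```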

Page 14: Genome  structure and  evolution

Homo sapiens, gene distribution

Saccone S, et al. (2001) Chromosome Res.

Page 15: Genome  structure and  evolution

structure of human genome

About 3,164.7 million nucleotides have been read to date.
The average gene is about 3 thousand nucleotides long; the longest gene (dystrophin) is 2.4 million nucleotides long.
The number of genes is between 20k and 30k (23k).
Less than 2% of the genome codes for protein.
The function of more than 50% of the genes is unknown.
DNA is more than 99.9% identical between all humans.
Repetitive elements, which do not code for proteins ("junk DNA"), make up more than 50% of the human genome.
The entropy rate is around 1.7 (0.9 for the Y chromosome).
Around 20% of our genome is transcribed.
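The slide does not say how the entropy rate was estimated. As a minimal sketch, assuming it is expressed in bits per nucleotide (so the maximum for a four-letter alphabet is 2), one simple estimator is the Shannon entropy of k-mer frequencies divided by k. The function name and the choice of estimator are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def kmer_entropy_per_base(seq: str, k: int = 1) -> float:
    """Shannon entropy of overlapping k-mers, in bits per nucleotide."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    if total == 0:
        return 0.0
    counts = Counter(kmers)
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    return entropy / k


# A uniform mix of the four bases reaches the 2-bit maximum for k = 1,
# while a homopolymer run carries no information at all.
print(kmer_entropy_per_base("ACGT" * 10))
print(abs(kmer_entropy_per_base("A" * 40)))
```

Highly repetitive sequence lowers such an estimate, which would be consistent with the lower value quoted for the Y chromosome.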

Page 16: Genome  structure and  evolution

importance of “junk” DNA

syncytin (adapted ancestral env polyprotein)
  Blond JL (1999): Molecular characterization and placental expression of HERV-W, a new human endogenous retrovirus family. J Virol

social behavior in rodents (and possibly humans)
  Hammock EA, Young LJ (2005): Microsatellite instability generates diversity in brain and sociobehavioral traits. Science

regulation of gene expression and promotion of genetic diversity
  Peaston A, et al (2004): Retrotransposons Regulate Host Genes in Mouse Oocytes and Preimplantation Embryos. Developmental Cell

evolution of sequences, for example, an antifreeze-protein gene in a species of fish
  DeVries AL and Cheng C-HC (2005): Antifreeze proteins in polar fishes. Fish Physiology

source of microRNAs
  Woolfe A, et al (2005): Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol

LINE-1 capable of repairing broken strands of DNA
  Morrish TA, et al (2002): DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nature Genetics

Page 17: Genome  structure and  evolution

synthesizing non-natural parts from natural genomic template

Pawan K Dhar, Chaw Su Thwin, Kyaw Tun, Yuko Tsumoto, Sebastian Maurer-Stroh, Frank Eisenhaber and Uttam Surana. Journal of Biological Engineering 2009, 3:2. doi:10.1186/1754-1611-3-2

The current knowledge of genes and proteins comes from 'naturally designed' coding and non-coding regions. It would be interesting to move beyond natural boundaries and make user-defined parts. To explore this possibility we made six non-natural proteins in E. coli. We also studied their potential tertiary structure and phenotypic outcomes.

The chosen intergenic sequences were amplified and expressed using pBAD 202/D-TOPO vector. All six proteins showed significantly low similarity to the known proteins in the NCBI protein database. The protein expression was confirmed through Western blot. The endogenous expression of one of the proteins resulted in the cell growth inhibition. The growth inhibition was completely rescued by culturing cells in the inducer-free medium. Computational structure prediction suggests globular tertiary structure for two of the six non-natural proteins synthesized.

Page 18: Genome  structure and  evolution

main events in genome evolution

mutations (SNP)
duplications
rearrangements
horizontal transfer
parasitic DNA

Page 19: Genome  structure and  evolution

how and where to find transposons

Repbase: database of repetitive elements http://www.girinst.org/repbase

RepeatMasker: search for repeats in a genome sequence http://www.repeatmasker.org

Page 20: Genome  structure and  evolution

repetitive elements in human genome

Transposons: transposon-derived repeats, interspersed repeats; 45% of the genome

Micro- and minisatellites: simple sequence repeats, repetition of simple short direct repeats; 3% of the genome

Duplications: duplications of genome segments of different length (10-300 kb), inter- and intrachromosomal; 3.3% of the genome

Other types of repeats: centromeric and telomeric repeats

IHGSC, Nature 2001

Page 21: Genome  structure and  evolution

transposons in human (vertebrate) genome

DNA transposons
retrotransposons: RNA as intermediate, reverse transcription
  LTR retrotransposons (similar to retroviruses)
  polyA retrotransposons (colinear with mRNA, polyA)

human chromosome 21

Page 22: Genome  structure and  evolution

DNA transposons

2-3 kb
terminal inverted repeats (50-100 bp)
cut-and-paste mechanism
3% of the genome
at least 7 classes, some of them unrelated
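A terminal inverted repeat means that the start of the element reads the same as the reverse complement of its end. The sketch below checks this on a toy sequence. It only finds perfect matches, whereas real repeats usually contain mismatches, so treat it as an illustration and not as a transposon finder.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element: str) -> int:
    """Length of the longest perfect inverted repeat shared by both ends."""
    element = element.upper()
    revcomp = reverse_complement(element)
    length = 0
    # The two arms cannot overlap, so stop at half the element length.
    while length < len(element) // 2 and element[length] == revcomp[length]:
        length += 1
    return length


# Toy element: 5 bp arm, 5 bp filler, and the reverse complement of the arm.
toy = "ACGTT" + "GGGGG" + reverse_complement("ACGTT")
print(terminal_inverted_repeat_length(toy))
```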

Page 23: Genome  structure and  evolution

LTR retrotransposons

LTR – long terminal repeat
Human Endogenous Retroviruses (HERVs)
RNA intermediate (RNA pol. II)
short insertional duplications (4-6 bp)
8% of the genome
100 000 elements, tens of families

Page 24: Genome  structure and  evolution

LINE1 (L1) elements

LINE – long interspersed elements
polyA (non-LTR) retrotransposons
RNA intermediate (internal promoter for RNA pol. II)
insertion duplications of different length (5-15 bp)
insertion preferences (TT | AAAA)
17% of the genome
500 000 elements, often truncated at the 5' end
30-60 active LINE1 elements in the genome

Page 25: Genome  structure and  evolution

nonautonomous elements

They do not encode the enzymes needed for their own transposition.

For each class of autonomous elements there are corresponding nonautonomous elements. These elements replicate by different mechanisms, each specific to the autonomous elements of their class.

Page 26: Genome  structure and  evolution

SINE (Alu) elements

SINE – short interspersed elements
polyA (non-LTR) retrotransposons
RNA intermediate (internal promoter for RNA pol. III)
insertion duplications (5-15 bp)
insertion preferences (TT | AAAA)
10% of the genome
1 000 000 elements, often truncated at the 5' end
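The genome fractions and copy numbers quoted for the retrotransposon families imply an average length per copy. The arithmetic below is a back-of-the-envelope sketch. It assumes a genome size of about 3.16 Gbp, and the function name is made up for the example.

```python
GENOME_BP = 3_164_700_000  # assumed human genome size (~3.16 Gbp)


def mean_copy_length(genome_fraction: float, copies: int,
                     genome_bp: int = GENOME_BP) -> float:
    """Average length in bp of one copy of a repeat family."""
    return genome_fraction * genome_bp / copies


# (fraction of genome, number of elements) as quoted on the slides.
FAMILIES = {
    "LTR retrotransposons": (0.08, 100_000),
    "LINE1": (0.17, 500_000),
    "SINE (Alu)": (0.10, 1_000_000),
}

for name, (fraction, copies) in FAMILIES.items():
    print(f"{name}: ~{mean_copy_length(fraction, copies):.0f} bp per copy")
```

For LINE1 the average comes out near 1 kb, well below the roughly 6 kb of a full-length element, which fits the remark that copies are often truncated at the 5' end.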

Page 27: Genome  structure and  evolution

processed pseudogenes

colinear with mRNA
missing introns and promoters; polyA
often 5' truncated
bordered by direct repeats of different length (4-15 bp)
insertion sites are similar to LINE1
transposition generated by L1

Page 28: Genome  structure and  evolution

coevolution of “DNA parasites”

DNA transposons

LTR retrotransposons

polyA retrotransposons

Page 29: Genome  structure and  evolution

HERV16 - example

http://hervd.img.cas.cz

Page 30: Genome  structure and  evolution

1000 Genomes Project: current status

Trio project: two families with ~42x coverage; Yoruba and Caucasian

Low-coverage project: ~5x coverage of unrelated individuals; 60 Yoruba, 60 Caucasians, 30 Han, 30 Japanese

Exon project: 8000 exons (900 genes) by capture array, >50x coverage, 700 unrelated individuals

+ 2 individual sequences (Watson and Venter)

1000GPC, Nature 2010
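Coverage here means the average number of reads overlapping each base, that is, total sequenced bases divided by genome size. The sketch below turns the quoted whole-genome coverages into sequencing volume and into the fraction of the genome expected to be covered at least once. It assumes a genome of about 3.16 Gbp and an idealized model of uniformly random reads (Poisson, as in Lander-Waterman), which real data only approximate.

```python
from math import exp

GENOME_BP = 3_164_700_000  # assumed human genome size (~3.16 Gbp)


def bases_needed(coverage: float, genome_bp: int = GENOME_BP) -> float:
    """Total sequenced bases required for a given mean coverage."""
    return coverage * genome_bp


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - exp(-coverage)


for label, cov in [("low-coverage project", 5), ("trio project", 42)]:
    gbp = bases_needed(cov) / 1e9
    frac = expected_fraction_covered(cov)
    print(f"{label}: {cov}x = ~{gbp:.0f} Gbp per genome, "
          f"~{frac:.2%} of bases covered at least once")
```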

Page 31: Genome  structure and  evolution

stability / fluidity of the genome

each individual carries ~200 to 300 loss-of-function variants in annotated genes and 50-100 variants previously implicated in inherited disorders

germline substitution rate of ~10^-8 per base per generation

1000GPC, Nature 2010
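A rate of about 10^-8 per base per generation translates directly into an expected number of new substitutions per genome. This is a rough sketch, assuming a haploid genome of about 3.16 Gbp and a uniform rate along the genome.

```python
GENOME_BP = 3_164_700_000  # assumed haploid human genome size (~3.16 Gbp)
RATE = 1e-8                # substitutions per base per generation (from the slide)


def expected_new_mutations(rate: float, genome_bp: int, ploidy: int = 1) -> float:
    """Expected number of new substitutions per generation."""
    return rate * genome_bp * ploidy


print(f"per haploid genome: ~{expected_new_mutations(RATE, GENOME_BP):.0f}")
print(f"per diploid genome: ~{expected_new_mutations(RATE, GENOME_BP, 2):.0f}")
```

Under these assumptions a child carries a few tens of new substitutions that neither parent has.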

Page 32: Genome  structure and  evolution

ENCODE: Encyclopedia Of DNA Elements

Raney, NAR 2010

Page 33: Genome  structure and  evolution

genome browsers

Golden Path http://genome.ucsc.edu

ENSEMBL http://www.ensembl.org

Page 34: Genome  structure and  evolution
Page 35: Genome  structure and  evolution
Page 36: Genome  structure and  evolution
Page 37: Genome  structure and  evolution

that’s it, thank you

Institute of Molecular Genetics AS CR

Free and Open Bioinformatics Association