Introduction to genomes & genome browsers
Content
• Introduction
• The human genome
• Human genetic variation: SNPs, CNVs, alternative splicing
• Browsing the human genome

Celia van Gelder
CMBI, UMC Radboud
December
Exponential Growth in Genomic Sequence Data
[Figure: number of sequenced genomes over time, currently 1000+ completed genomes. Milestones: first 2 bacterial genomes complete; first eukaryote complete (yeast); first metazoan complete (roundworm, C. elegans).]
©CMBI 2013
Genome projects
http://www.genomesonline.org/
The pig genome
The human genome
• Genome: the entire sequence of DNA in a cell
• 3 billion base pairs (3 Gb)
• 22 chromosome pairs + the X and Y chromosomes
• Chromosome length varies from ~50 Mb to ~250 Mb
• About 20,000 protein-coding genes (average gene length 3000 bases, but the largest known gene, dystrophin, is 2.4 Mb)
• The human genome is 99.9% identical among individuals. This means that any two people differ at about 3 million nucleotides!
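The "3 million" figure is simple arithmetic: 0.1% of roughly 3 billion base pairs. A minimal Python sketch using the rounded values from the slide (they are approximations, not exact counts):

```python
# Rounded values from the slide (approximations, not exact counts)
GENOME_SIZE_BP = 3_000_000_000   # ~3 Gb human genome
IDENTITY = 0.999                 # ~99.9% identical between two individuals

def expected_differences(genome_size_bp: int, identity: float) -> int:
    """Approximate number of differing nucleotides between two genomes."""
    return round(genome_size_bp * (1 - identity))

diffs = expected_differences(GENOME_SIZE_BP, IDENTITY)
print(f"{diffs:,} differing nucleotides")                          # 3,000,000
print(f"about 1 difference every {GENOME_SIZE_BP // diffs:,} bp")  # 1,000
```

Put differently, two people differ on average at about one position per thousand bases.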
Eukaryotic Genomes: more than collections of genes
• Genes & regulatory sequences make up 5% of the genome
  – Protein-coding genes
  – RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)
  – Structural DNA (centromeres, telomeres)
  – Regulation-related sequences (promoters, enhancers, silencers, insulators)
  – Parasite sequences (transposons)
  – Pseudogenes (non-functional gene-like sequences)
  – Simple sequence repeats
The human genome (continued)
From: Molecular Biology of the Cell (4th edition) (Alberts et al., 2002)
• Only 1.2% codes for proteins
• Long introns, short exons
• Large spaces between genes
• More than half consists of repetitive DNA
• Alu repeat: ~300 bp, > 1 million copies
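Combining the numbers on this slide gives a feel for scale: 1.2% of ~3 Gb is about 36 Mb of protein-coding sequence, while one million Alu copies of ~300 bp already cover about 10% of the genome. A back-of-the-envelope sketch using the slide's rounded figures (the one-million copy count is treated as a lower bound):

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 Gb
CODING_FRACTION = 0.012          # "Only 1.2% codes for proteins"
ALU_LENGTH_BP = 300              # ~300 bp per Alu repeat
ALU_COPIES = 1_000_000           # "> million copies", used as a lower bound

coding_bp = GENOME_SIZE_BP * CODING_FRACTION
alu_bp = ALU_LENGTH_BP * ALU_COPIES
alu_fraction = alu_bp / GENOME_SIZE_BP

print(f"protein-coding sequence: ~{coding_bp / 1e6:.0f} Mb")                    # ~36 Mb
print(f"Alu repeats: >= {alu_bp / 1e6:.0f} Mb (~{alu_fraction:.0%} of genome)")  # 300 Mb, 10%
```

So Alu elements alone occupy several times more sequence than all protein-coding exons together.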
Chromosome organisation (1)
[Figure: chromosome organisation, showing regions with genes that are OFF and regions with genes that are ON]
Human Genetic Variation
• Every human has essentially the same set of genes, but there are different forms of each gene, known as alleles
• Genetic variation explains some of the differences among people, such as:
  – Blood group
  – Eye color
  – Skin color
  – Hair color
  – Higher or lower risk for getting particular diseases:
    • Cystic fibrosis, sickle cell disease
    • Diabetes, cancer, arthritis, asthma
    • Stroke, heart disease
    • Alzheimer's disease, Parkinson's disease
    • Depression, alcoholism
Variations in the Genome
[Figure: a chromosome with a common sequence and its variations: polymorphisms, deletions, translocations and insertions]
Today’s focus
1. Single Nucleotide Polymorphisms (SNPs)
2. Copy number variations (CNV)
3. Alternative transcripts
Single Nucleotide Polymorphisms (SNPs)
• SNPs are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered.
• For a variation to be considered a SNP, it must occur in at least 1% of the population.
• SNPs make up about 90% of all human genetic variation and occur roughly every 100 to 300 bases.
• SNPs can occur in coding (gene) and noncoding regions of the genome; <1% alter the protein sequence.
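The 1% rule is usually applied as a threshold on the minor allele frequency (MAF). The sketch below counts alleles from genotype calls at one site and applies that cutoff. The genotype encoding (two-letter strings such as "AG") and the function names are hypothetical choices for illustration; real pipelines start from VCF files and must handle missing and multi-allelic calls.

```python
from collections import Counter

def minor_allele_frequency(genotypes: list[str]) -> float:
    """MAF at one biallelic site; each genotype is two allele letters, e.g. 'AG'."""
    counts = Counter(allele for gt in genotypes for allele in gt)
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    if len(counts) > 2:
        raise ValueError("this sketch only handles biallelic sites")
    return min(counts.values()) / sum(counts.values())

def is_snp(genotypes: list[str], threshold: float = 0.01) -> bool:
    """Hypothetical check: minor allele present in at least 1% of chromosomes."""
    return minor_allele_frequency(genotypes) >= threshold

# 100 people: 96 AA, 3 AG, 1 GG -> 5 G alleles out of 200 (MAF = 0.025)
sample = ["AA"] * 96 + ["AG"] * 3 + ["GG"]
print(minor_allele_frequency(sample), is_snp(sample))  # 0.025 True
```

A variant seen once in 200 people would fall below the cutoff and be called a rare variant rather than a SNP under this definition.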
SNPs
• help determine properties such as eye color, hair type (curly or straight), or whether you can taste bitter compounds
• are used for identification and forensics
• are used for estimating predisposition to disease
• can cause drug side effects and/or non-responsiveness to the drug
• influence how humans respond to environmental factors such as bacteria, viruses, toxins and chemicals
• are used to predict specific genetic traits
• are used for classifying patients in clinical trials
• are used for mapping and genome-wide association studies of complex diseases
SNP example: bitter tasting (TAS2R38)
SNP & disease: Alzheimer's disease
Alzheimer's disease (AD) & apolipoprotein E (APOE)
• Apolipoprotein E is a cholesterol carrier that is found in the brain and other organs.
• APOE is suspected to be involved in amyloid beta aggregation and clearance, influencing the onset of amyloid beta deposition.
• APOE contains 2 SNPs that result in 3 possible alleles: E2, E3, E4.
• The alleles are defined by the bases at the two variants rs429358 and rs7412:
    Allele   rs429358   rs7412
    E2       T          T
    E3       T          C
    E4       C          C
• A person who inherits at least one E4 allele has a greater risk of developing AD.
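The allele table can be read as a lookup from the pair of bases at rs429358 and rs7412 to an APOE allele. A minimal sketch, assuming phased input (one base per SNP for each chromosome copy); the function names are hypothetical, and the combination C at rs429358 with T at rs7412, which is not in the table, is reported as "unknown".

```python
# (rs429358 base, rs7412 base) -> APOE allele, as in the table above
APOE_HAPLOTYPES = {
    ("T", "T"): "E2",
    ("T", "C"): "E3",
    ("C", "C"): "E4",
}

def apoe_allele(rs429358: str, rs7412: str) -> str:
    """Return the APOE allele for one chromosome copy, or 'unknown'."""
    return APOE_HAPLOTYPES.get((rs429358.upper(), rs7412.upper()), "unknown")

def apoe_genotype(hap1: tuple[str, str], hap2: tuple[str, str]) -> str:
    """Combine two phased haplotypes into a genotype such as 'E3/E4'."""
    alleles = sorted([apoe_allele(*hap1), apoe_allele(*hap2)])
    return "/".join(alleles)

def carries_e4(genotype: str) -> bool:
    """At least one E4 allele is associated with a greater risk of AD."""
    return "E4" in genotype.split("/")

gt = apoe_genotype(("T", "C"), ("C", "C"))
print(gt, carries_e4(gt))  # E3/E4 True
```

With unphased genotype data the two SNPs cannot always be assigned to the same chromosome copy, which is why phased input is assumed here.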
Copy Number Variation
• Copy Number Variations (CNVs): gains and losses of large chunks of DNA sequence (10 kb to 5 Mb)
• When there are genes in the CNV areas, this can lead to variations in the number of gene copies between individuals
• CNVs contribute to our uniqueness.
• CNVs can also influence the susceptibility to disease.
• CNVs may either be inherited or caused by de novo mutation
Copy Number Variation
[Figure: copy number (CN) per region. A normal cell has CN=2; a deletion gives CN=0 or CN=1; an amplification gives CN=3 or CN=4.]
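One common way to estimate copy number is from sequencing read depth; this method is not described on the slide, so treat it as an assumption. In a diploid genome a normal region has CN=2, so a region with half the expected depth suggests CN=1 and one with double the depth suggests CN=4. A simplified sketch that ignores noise, GC bias and mappability (the depth values are hypothetical):

```python
def estimate_copy_number(observed_depth: float, expected_depth: float, ploidy: int = 2) -> int:
    """Scale the depth ratio by ploidy and round to the nearest integer copy number."""
    if expected_depth <= 0:
        raise ValueError("expected_depth must be positive")
    return round(ploidy * observed_depth / expected_depth)

def classify(cn: int, ploidy: int = 2) -> str:
    """Label a copy number relative to the normal (ploidy) state."""
    if cn < ploidy:
        return "deletion"
    if cn > ploidy:
        return "amplification"
    return "normal"

# Hypothetical mean depths for four regions; genome-wide expected depth is 30x
for depth in (0.5, 14.8, 31.2, 61.0):
    cn = estimate_copy_number(depth, expected_depth=30.0)
    print(f"depth {depth:>5}: CN={cn} ({classify(cn)})")
```

Real CNV callers segment depth along the chromosome and use statistical models rather than rounding one region at a time.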
CNVs & disease
• Many genetic diseases result from CNVs:
  – Cancer (gene copy number can be elevated in cancer cells)
  – Autism
  – Schizophrenia (dept. human genetics)
  – Intellectual disability (dept. human genetics)
  – Parkinson's disease
• There are CNVs that protect against HIV infection and malaria.
• The contribution of CNV to the common, complex diseases, such as diabetes and heart disease, is currently less well understood
Alternative splicing
• Defects in alternative splicing have been implicated in many diseases, including:
  – neuropathological conditions such as Alzheimer's disease
  – cystic fibrosis
  – conditions involving growth and developmental defects
  – many human cancers, e.g. BRCA1 in breast cancer
  – beta-globin in beta-thalassemia
  – Parkinson's disease
Annotating the genome
Annotation: attaching biological information to sequences. Two main steps:
• identifying elements on the genome
• attaching biological information to these elements
Basic & Advanced Genome Annotation
• Basic:
  – Genomic location
  – Gene features: exons, introns, UTRs
  – Transcript(s)
  – Pseudogenes, non-coding RNA
  – Protein(s)
  – Links to other sources of information
• Advanced:
  – Cytogenetic bands
  – Polymorphic markers
  – Genetic variation, including SNPs & CNVs
  – Repetitive sequences
  – cDNAs or mRNAs from related species
  – Genomic sequence variation
  – Regulatory sequences (enhancers, silencers, insulators)
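Basic annotation such as exons and introns is often exchanged as GFF3, a tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes). The slides do not mention GFF3, so this is an assumed illustration: the sketch parses a few hypothetical exon records for one transcript and derives the introns as the gaps between them. GFF3 coordinates are 1-based and inclusive; percent-encoding in attributes is ignored here.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    seqid: str
    ftype: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str
    attributes: dict

def parse_gff3_line(line: str) -> Feature:
    """Parse one GFF3 data line into a Feature (no percent-decoding)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)

def introns(exons: list) -> list:
    """Introns are the gaps between consecutive exons, sorted by position."""
    ordered = sorted(exons, key=lambda f: f.start)
    return [(a.end + 1, b.start - 1) for a, b in zip(ordered, ordered[1:])]

# Hypothetical exon records for one transcript (tab-separated)
GFF3 = """\
chr1\texample\texon\t1000\t1200\t.\t+\t.\tParent=transcript:T1
chr1\texample\texon\t1500\t1650\t.\t+\t.\tParent=transcript:T1
chr1\texample\texon\t2000\t2300\t.\t+\t.\tParent=transcript:T1
"""

exons = [parse_gff3_line(line) for line in GFF3.splitlines() if line and not line.startswith("#")]
print(introns(exons))  # [(1201, 1499), (1651, 1999)]
```

The same idea, grouping exons by their Parent transcript, is what lets a browser draw several alternative transcripts of one gene.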
[Human] Genome Browsers
EBI: Ensembl
NCBI: Map Viewer
UCSC Genome Browser
Not limited to human data only
Ensembl
©EMBL-EBI
Other Ensembl Installations
genes & predictions
variations & repeats
cross-species comparative data
& many more types of data, from expression & regulation to mRNAs and ESTs…
Gene X
Description, transcript data, structure, Gene Ontology, pathway data, homologous genes, expression data, etc.
Data organized by chromosome location, displayed as tracks
HGNC: a unique name and symbol for every human gene (http://www.genenames.org/)
ENSG###: Ensembl Gene ID
ENST###: Ensembl Transcript ID
ENSP###: Ensembl Peptide ID
ENSE###: Ensembl Exon ID
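The prefix of an Ensembl stable ID tells you what kind of object it refers to. A small sketch that decodes the human prefixes listed above. It assumes the usual layout of prefix plus 11 digits with an optional ".version" suffix, and it deliberately rejects the species-coded prefixes used for other organisms (such as ENSMUSG for mouse); the function name is hypothetical.

```python
import re

# Human Ensembl stable-ID prefixes, as listed on the slide
ID_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Assumed layout: 'ENS' + type letter + 11 digits + optional '.version'
ENSEMBL_ID = re.compile(r"^ENS([GTPE])(\d{11})(?:\.(\d+))?$")

def parse_ensembl_id(stable_id: str) -> dict:
    """Split a human Ensembl stable ID into its base ID, type and optional version."""
    m = ENSEMBL_ID.match(stable_id.strip())
    if m is None:
        raise ValueError(f"not a human Ensembl stable ID: {stable_id!r}")
    type_letter, digits, version = m.groups()
    return {
        "id": f"ENS{type_letter}{digits}",
        "type": ID_TYPES[type_letter],
        "version": int(version) if version else None,
    }

print(parse_ensembl_id("ENSG00000139618"))  # BRCA2 gene
# {'id': 'ENSG00000139618', 'type': 'gene', 'version': None}
```

Stripping the version is useful when matching IDs across Ensembl releases, because the version number changes when the underlying model changes while the base ID stays the same.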
Ensembl: An Example
[Figure: Ensembl region view with data shown as tracks; click on a feature for more details]
Direction of transcription:
• Above the blue line: forward strand
• Below the blue line: reverse strand
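A feature on the reverse strand is transcribed from the opposite DNA strand, so its sequence in the 5'→3' direction of transcription is the reverse complement of the forward-strand reference sequence for that region. A minimal sketch (the example sequences are made up):

```python
# Watson-Crick complements; N (unknown base) maps to itself
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def feature_sequence(forward_seq: str, strand: str) -> str:
    """Return a feature's sequence in its direction of transcription."""
    if strand == "+":
        return forward_seq
    if strand == "-":
        return reverse_complement(forward_seq)
    raise ValueError("strand must be '+' or '-'")

print(feature_sequence("ATGAAACCC", "-"))  # GGGTTTCAT
```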
Ensembl Transcripts
Synopsis: what can I do with Ensembl?
• View, examine & explore annotated information for any chromosomal region:
  – Genes
  – ESTs, mRNAs, alternative transcripts
  – Proteins
  – SNPs, and SNPs across strains (rat, mouse), populations (human), or even breeds (dog)
  – Homologues and phylogenetic trees across more than 40 species
  – Whole genome alignments
  – Conserved regions across species
  – Gene expression profiles
• Upload your own data and use BLAST/BLAT against any Ensembl genome
• Export sequence, or create a table of gene information
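Besides the web interface, Ensembl data can also be retrieved programmatically through the Ensembl REST API at https://rest.ensembl.org, which these slides do not cover. A minimal sketch using only the Python standard library; it assumes network access and uses the `lookup/id` and `sequence/id` endpoints as documented by Ensembl, so check the current API documentation before relying on the details. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"

def build_url(endpoint: str, stable_id: str, content_type: str) -> str:
    """Build a URL such as /lookup/id/ENSG...?content-type=application/json."""
    return f"{SERVER}/{endpoint}/id/{urllib.parse.quote(stable_id)}?content-type={content_type}"

def lookup_gene(stable_id: str) -> dict:
    """Fetch gene information (location, biotype, etc.) as a dict; needs network access."""
    url = build_url("lookup", stable_id, "application/json")
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def fetch_sequence(stable_id: str) -> str:
    """Fetch the sequence of a feature as plain text; needs network access."""
    url = build_url("sequence", stable_id, "text/plain")
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("ascii").strip()
```

For example, `lookup_gene("ENSG00000139618")` should return a dictionary with fields such as `display_name`, `seq_region_name`, `start` and `end`. For exporting larger tables of gene information, the web export options described above remain the simpler route.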