Genetics II (eukaryotes) IT Carlow Bioinformatics September 2006

Genetics II (eukaryotes)

IT Carlow Bioinformatics

September 2006

Homo sapiens

• That’s us.

• 3.1 Gbases, 25,000 genes

• Genetic code same as E. coli

– Hence “universal”

• DNA replication (DNApol)

• Transcription (RNApol)

• Ribosomes, translation

• So “essentially” the same?

Other Eukaryotes

• Mouse, Rat, Cow, Chimp etc.

– Chimp/human: last common ancestor 5 mya

– Mouse/rat: last common ancestor 30 mya

– Mouse/human: last common ancestor 100 mya

– Chicken/human: last common ancestor 300 mya

• C. elegans 19,000 genes, 300 cells, 97 Mbase

• Drosophila 14,000 genes, 180 Mbase

• S. cerevisiae 6,000 genes, 12 Mbase

Eukaryotes have a nucleus

• DNA bundled in discrete units – chromosomes

– Ends need capping, telomerase issues

• Bundling = additional access complications

– Histones, supercoiling

• Nucleus forces decoupling of transcription and translation

• Two-way traffic in/out of the nucleus

– NF-κB, transcriptional regulators

Operons?

• In general not.

• But yeast often has common promoters on divergent (opposite-strand) genes

Operons in Mammals?

• Singer, Lloyd, Huminiecki, Wolfe 2005

– Find tissue-specific clusters, e.g. spleen-expressed

– Chance or “design”?

– Compare human and mouse cluster breaks

Telomeres

• Eukaryotic chromosomes are linear

• Chromosomes seem to have fixed location.

• Telomeres have a characteristic number of repeats

– Human TTAGGG, Oxytricha TTTTGGGG

• Chromosomes get shorter each cell generation

– Priming for Okazaki fragments

– Telomerase adds repeats

– Telomerase fails: cancer, senescence
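The idea of a characteristic repeat count can be made concrete with a small Python sketch. It counts uninterrupted copies of a repeat unit at the 3' end of a sequence. The function name `count_terminal_repeats` and the toy sequences are illustrative assumptions, not real chromosome ends.

```python
import re


def count_terminal_repeats(seq: str, unit: str = "TTAGGG") -> int:
    """Count uninterrupted copies of `unit` at the 3' end of `seq`."""
    # Anchor the run of repeats to the end of the string with `$`.
    match = re.search(f"(?:{re.escape(unit)})+$", seq.upper())
    return len(match.group(0)) // len(unit) if match else 0


# Hypothetical toy chromosome end: unique sequence plus four human repeats.
toy_human_end = "ACGTACGT" + "TTAGGG" * 4
print(count_terminal_repeats(toy_human_end))  # 4

# Same idea with the Oxytricha repeat unit.
toy_oxytricha_end = "ACGT" + "TTTTGGGG" * 3
print(count_terminal_repeats(toy_oxytricha_end, unit="TTTTGGGG"))  # 3
```

Losing one repeat from the toy end lowers the count by one, which is the shortening that telomerase counteracts by adding repeats.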

How similar is the machinery?

• DNA polymerase size % ID

• RNA polymerase

• Ribosomes

– rRNA bigger: 5S, 5.8S, 18S and 28S

– Lengths: 120 nt, 160 nt, 1,900 nt, 4,700 nt

– Protein count: 50 rplX & 33 rpsX

tRNA

• Essential mediators of translation

• 74-90 bases in size, clover-leaf structure

• Anti-codon loop

– Curved so “wobble” is possible at third position

– One anti-codon can serve 2 or 3 codons

• XXG can pair with C or U

• XXI (inosine) can pair with A, C or U
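The wobble rules above can be sketched as a lookup in Python. Here the anticodon is written 5'→3', so its first base is the wobble base that pairs with the third codon position. The G and I rules come from the slide; the U, C and A entries are the usual ones from Crick's wobble hypothesis and are an addition to what the slide states. The name `codons_read` is illustrative.

```python
# Standard Watson-Crick pairing for the two non-wobble positions (RNA alphabet).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble base in the anticodon -> codon third-position bases it can read.
# G and I are from the slide; U, C, A follow Crick's wobble hypothesis.
WOBBLE = {"G": "CU", "I": "ACU", "U": "AG", "C": "G", "A": "U"}


def codons_read(anticodon: str) -> list[str]:
    """Return the codons (5'->3') that a 5'->3' anticodon can pair with."""
    wobble_base, middle, last = anticodon.upper()
    # Pairing is antiparallel: the anticodon's last base reads codon position 1.
    first_two = WATSON_CRICK[last] + WATSON_CRICK[middle]
    return [first_two + third for third in WOBBLE[wobble_base]]


print(codons_read("GAA"))  # ['UUC', 'UUU'], both Phe
print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU'], all Ala
```

One anticodon serving two or three codons is why a cell needs fewer tRNA species than there are sense codons.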

Introns

• About 5% of yeast genes

• Most mammalian genes

• Alternative splicing

– Could explain why we are more complex than worms

– Challenges dogma 1 gene = 1 protein

– Accounts for 80,000 different proteins

Intron splice site

Alternative splicing 1

• Splice / don’t splice

• If stop codon in frame in intron then truncated protein.

• Can be used as a genetic switch to control production of two alternative proteins
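A toy Python sketch of the splice / don't splice case: the same gene is translated with the intron removed and with it retained. The sequences `EXON1`, `INTRON` and `EXON2` are invented for illustration; the intron was designed to carry an in-frame TAA. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(coding_dna: str) -> str:
    """Translate from the first base until the first in-frame stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene (not a real sequence).
EXON1 = "ATGGCTGAA"         # M A E
INTRON = "GTAAGTTAACAG"     # starts GT, ends AG, in-frame TAA when retained
EXON2 = "AAAGGTCTGCATTGA"   # K G L H stop

spliced = translate(EXON1 + EXON2)
retained = translate(EXON1 + INTRON + EXON2)
print(spliced)   # MAEKGLH
print(retained)  # MAEVS
```

The retained-intron transcript yields a shorter protein with a different C-terminus, so the choice to splice or not acts as a switch between two products.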

Alternative splicing 2

• Competing 5' or 3' Splice Site

• Here two different 3’ splice sites

• Proximal, distal

Alternative splicing 3

• Exon skipping

• Could be more than one exon skipped

• Lots of potential for variant transcripts

• Slightly different enzymes

• Missing protein domains

Alternative splicing 4

• Mutually exclusive exons

• Here exons 1, 2, & 4 or 1, 3, & 4

• Two different forms of protein
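Exon skipping and mutually exclusive exons can both be seen as choices among internal exons. The Python sketch below assumes a simplified model in which the first and last exons are always kept and exon order is preserved. The function name `cassette_isoforms` is illustrative.

```python
from itertools import combinations


def cassette_isoforms(n_exons: int) -> list[tuple[int, ...]]:
    """All exon combinations that keep exon 1 and exon n, in genomic order."""
    internal = range(2, n_exons)
    forms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            forms.append((1, *kept, n_exons))
    return forms


all_forms = cassette_isoforms(4)
print(len(all_forms))  # 4

# Mutually exclusive exons 2 and 3: exactly one of them is present.
mutually_exclusive = [f for f in all_forms if (2 in f) != (3 in f)]
print(mutually_exclusive)  # [(1, 2, 4), (1, 3, 4)]
```

Under this model a gene with n internal exons has up to 2^n candidate transcripts, which shows how quickly the number of potential variants grows.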

Alternative splicing 5

• That’s just 1 classification

– Can you think of another?

• Bioinformatic consequences

– Gene prediction difficult in eukaryotes

– No one answer in any one case

– ESTs as a bioinformatic tool for prediction

Junk?

• Human genome 3Gb but only 25K genes

• Even when introns accounted for

• 3% genome coding for “genes”

• 1% is actual codons

• The rest?

Pseudogenes

• Defined as a gene inactivated because of mutation

– Most obviously by nonsense/stop codon mutation

– Genetic code arranged so many mutations tolerable

– Once inactivated more mutations accumulate

• Processed pseudogene

– Reverse transcriptase copy of mRNA

– Lacks introns, 5’ upstream control regions

• 1/3rd of human genome is gene and gene-related

– pseudogenes

– gene fragments, truncated genes

– introns/UTRs

Repetitive elements

• 2/3rd of genome “intergenic”

– 1,400 Mb interspersed repeats (transposable elements), 44% of genome

• 640 Mb LINEs, LINE-1

• 420 Mb SINEs, Alu, a million copies

• 250 Mb LTR, ERV, 200,000 copies

• 90 Mb DNA transposons, PiggyBac, 2,000 copies

– 600 Mb microsatellites etc.

• 90 Mb CACACA and other repeats (forensics)
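The figures for the four interspersed-repeat classes can be checked with a few lines of Python. The genome size of 3,100 Mb is taken from the 3.1 Gbases quoted on the Homo sapiens slide. Using it as the denominator is an assumption, and the percentage shifts slightly with the genome size chosen.

```python
# Assumed total genome size in Mb (3.1 Gbases from the Homo sapiens slide).
GENOME_MB = 3100

# Interspersed repeat classes and their sizes in Mb, as listed on the slide.
repeats_mb = {"LINE": 640, "SINE": 420, "LTR": 250, "DNA transposon": 90}

total_mb = sum(repeats_mb.values())
print(total_mb)  # 1400

for name, size_mb in repeats_mb.items():
    print(f"{name}: {100 * size_mb / GENOME_MB:.1f}% of genome")

print(f"All interspersed repeats: {100 * total_mb / GENOME_MB:.1f}% of genome")
```

The four classes sum to the quoted 1,400 Mb. With this denominator the share is about 45%, close to the 44% on the slide.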

A bit of history

• 1859 Darwin, Origin of Species

• 1860s Mendel sends manuscript to Darwin (ignored)

• 1909 Gene “invented”

• 1910 Genes sit on chromosomes, in order

• 1941 One gene = one enzyme

• 1944 Genes definitely DNA

• 1953 Double helix

• 1977 Splicing

• 1993 MicroRNA identified

What is a gene?

• Nature 25 May 2006 News Feature p399-401

• Plants (Hothead), now mice may hold RNA copy of gene to “correct” DNA!

• ENCODE project: Encyclopedia of DNA Elements

– Close look at 1% of human genome

• Alternative splicing (1977) can be fitted in.

• 5% of genome transcribed as read-through!

• Exons can combine with exons many genes away!

• 63% of mouse genome transcribed!

• 8/500 non-coding RNAs essential for signalling and growth

Bioinformatic consequences

• Pseudogenes a bioinformatic problem

– Transcribed? See ESTs

• Alternative splicing a gene prediction problem

– Exon prediction “easy”

– Gene prediction harder

• Careers in RNA bioinformatics.