DNA, Gene, and Genome

  • DNA, Gene, and Genome

  • Translating Machinery for Genetic Information

  • Transcription factors; mRNA levels

  • Automated DNA Sequencing

  • Data Increase (from NCBI web site)

  • Partial Display of Human Draft Sequence (Nature, 2001)

  • Human Genome Map at NCBI

  • 60-70 kDa protein interacting with prostate cancer suppressor

    MGALRPTLLPPSLPLLLLLMLGMGCWAREVLVPEGPLYRVAGTAVSISCNVTGYEGPAQQNFEWFLYRPEAPDTALGIVSTKDTQFSYAVFKSRVVAGEVQVQRLQGDAVVLKIARLQAQDQGIYECTPSTDTRYLGSYSGKVELRVLPDVLQVSAAPPGPRGRQAPTSPPRMTVHEGQELALGCLARTSTQKHTHLAVSFGRSVPEAPVGRSTLQEVVGIRSDLAVEAGAPYAERLAAGELRLGKEGTDRYRMVVGGAQAGDAGTYHCTAAEWIQDPDGSWAQIAEKRAVLAHVDVQTLSSQLAVTVGPGERRIGPGEPLELLCNVSGALPPAGRHAAYSVGWEMAPAGAPGPGRLVAQLDTEGVGSLGPGYEGRHIAMEKVASRTYRLRLEAARPGDAGTYRCLAKAYVRGSGTRLREAASARSRPLPVHVREEGVVLEAVAWLAGGTVYRGETASLLCNISVRGGPPGLRLAASWWVERPEDGELSSVPAQLVGGVGQDGVAELGVRPGGGPVSVELVGPRSHRLRLHSLGPEDEGVYHCAPSAWVQHADYSWYQAGSARSGPVTVYPYMHALDTLFVPLLVGTGVALVTGATVLGTITCCFMKRLRKR

  • Molecular biology databases

    Sequence databases (annotated, low-annotation, specialized); Structural databases; Motif databases; Genome databases; Proteome databases; RNA expression; Literature; Populations; Mutations; Polymorphisms; Organisms; Pathways

  • Promoters; ESTs; Tissues and cells; Genome maps; DNA sequences; Molecular Phylogeny; Protein sequences; Protein structures; DNA motifs; Protein motifs; Substrates; Metabolic pathways; Transcription Factors; RNA expression; Mutations/polymorphisms; Gene Family

  • Database formats

    Relational databases: GDB, GSDB, MGD etc. (vendors: Sybase, Oracle etc.)
    Flat file databases: GenBank, SWISS-PROT etc.
    Object-oriented databases: ACeDB, AtDB etc.

  • Molecular biology data types: Organisms; Genome maps

  • Molecular biology data types: Organisms; Genome maps; DNA sequences; RNA sequences

    ...AATGGTACCGATGACCTGGAGCTTGGTTCGA...

  • Molecular biology data types: Organisms; Genome maps; DNA sequences; RNA sequences; Protein sequences

    ...TRLRPLLALLALWPPPPARAFVNQHLCGSHLVEA...

  • Molecular biology data types: Organisms; Genome maps; DNA sequences; RNA sequences; Protein sequences; Protein structures; RNA structures

    PDB entry 1CIS (P. Osmark, P. Sorensen, F.M. Poulsen)

  • Molecular biology data types: Organisms; Genome maps; DNA sequences; RNA sequences; Protein sequences; Protein structures; DNA motifs; Protein motifs; RNA expression; RNA structures

  • DNA microarrays measure variations in RNA levels

    The full yeast genome on a chip
    http://cmgm.Stanford.EDU/pbrown/
    DeRisi et al., Science 278:680

    Red dots: genes whose RNA level increased
    Green dots: genes whose RNA level decreased
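The red/green reading of a two-color array can be illustrated with a log ratio per spot. The sketch below is only an illustration: the function name `classify_spots`, the two-fold cutoff, and the intensities are assumptions made for this example, not values from the slides or from DeRisi et al.

```python
import math

def classify_spots(red, green, threshold=1.0):
    """Label each spot by log2(red / green).

    threshold is a hypothetical cutoff; 1.0 corresponds to a two-fold change.
    """
    labels = []
    for r, g in zip(red, green):
        log_ratio = math.log2(r / g)
        if log_ratio >= threshold:
            labels.append("increased")    # would appear as a red dot
        elif log_ratio <= -threshold:
            labels.append("decreased")    # would appear as a green dot
        else:
            labels.append("unchanged")
    return labels

# Made-up intensities for three spots (not data from the slides).
red = [800.0, 100.0, 310.0]
green = [200.0, 400.0, 300.0]
labels = classify_spots(red, green)
print(labels)
```

Working on the log scale treats a two-fold increase and a two-fold decrease symmetrically, which is why the same cutoff can be applied in both directions.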

  • Substrates for High Throughput Arrays

    Single label: P33
    Single label: biotin/streptavidin
    Dual label: Cy3, Cy5

  • GeneChip Probe Arrays

    [Figure labels: GeneChip Probe Array (1.28 cm); hybridized probe cell (24 µm) with millions of copies of a specific oligonucleotide probe; single-stranded, labeled RNA target; image of hybridized probe array with >200,000 different complementary probes]

  • GeneChip Expression Array Design

    Gene sequence; probes designed to be Perfect Match; probes designed to be Mismatch

  • Procedures for Target Preparation

    Cells → Poly (A)+/Total RNA → cDNA → IVT (Biotin-UTP, Biotin-CTP) → Labeled transcript → Fragment (heat, Mg2+) → Labeled fragments → Hybridize (16 hours) → Wash & Stain → Scan

  • Microarray Technology

  • Printing Arrays on 50 slides

  • Ratio of expression of genes from two sources. Total or

  • GSI Lumonics

  • Cattle and Soy Controls

    Spots: Beta Actin; PKG; HPRT; Beta 2 microglobulin; Rubisco; AB binding protein; Major latex protein homologue (MSG)

    Array of cattle and soy spiking controls. 50 µg of cattle brain total RNA was labeled with Cy3 (green). 1 µl each of in vitro transcribed soy Rubisco (5 ng), AB binding protein (0.5 ng) and MSG (0.05 ng) were labeled with Cy5. The two labeled samples were cohybridized on superamine slides (Telechem, Inc.). To the right of each set of spots are five negative controls (water).

  • Fetal Spleen (Cy3) vs. Adult Spleen (Cy5)

    [Figure labels: IgM; IgM heavy chain; MYLK; COL1A2]

  • Placenta vs. Brain: 3800 Cattle Placenta Array (Cy3, Cy5). GenePix Image Analysis Software

  • Microarray Data Process

    Experimental Design → Image Analysis (raw data) → Normalization (clean data) → Data Filtering (informative data) → Model building → Data Mining (clustering, pattern recognition, etc.) → Validation

  • Scatterplot of Normalized Data (axes: Adult, Fetal)

  • >0.3
  • Complexity Levels of Microarray Experiments:

    Compare genes in a control situation versus a treatment situation
    Example: Is the level of expression (up-regulated or down-regulated) significantly different in the two situations? (drug design application)
    Methods: t-test, Bayesian approach

    Find multiple genes that share common functionalities
    Example: Find related genes that are dependent?
    Methods: Clustering (hierarchical, k-means, self-organizing maps, neural network, support vector machines)

    Infer the underlying gene and protein networks that are responsible for the patterns and functional pathways observed
    Example: What is the gene regulation at system level?
    Directions: mining regulatory regions, modeling regulatory networks on a global scale
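The first level above (control versus treatment) names the t-test as one method. A minimal sketch of the test statistic for a single gene follows, assuming a few replicate measurements per condition. It uses Welch's unequal-variance form, which is a choice made for this example (the slides do not say which variant was used), and the replicate values are invented.

```python
import math
import statistics

def welch_t(control, treated):
    """Return Welch's t statistic for two groups of replicate measurements.

    Assumes each group has at least two replicates and that the two
    sample variances are not both zero.
    """
    mean_c = statistics.mean(control)
    mean_t = statistics.mean(treated)
    var_c = statistics.variance(control)   # sample variance (n - 1 denominator)
    var_t = statistics.variance(treated)
    standard_error = math.sqrt(var_c / len(control) + var_t / len(treated))
    return (mean_t - mean_c) / standard_error

# Made-up log-expression replicates for one gene (not data from the slides).
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t = welch_t(control, treated)
print(round(t, 3))
```

Turning the statistic into a p-value needs the t distribution, which the Python standard library does not provide. With thousands of genes tested at once, some correction for multiple testing would also normally be applied.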

  • Comparing data from two experiments.

  • No drug; 1 nM drug; 1 mM drug. Statistical filters used: The genes present (Presence Call in Affymetrix) in drug treated, ANOVA p
  • Statistical filters used: The genes present (Presence Call in Affymetrix) in absence of drug, ANOVA p
  • Self Organizing Maps

  • Molecular Classification of Cancer

  • Gene Expression Profile of Aging and Its Retardation by Caloric Restriction

    Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch, Tomas A. Prolla

  • Data Mining Methods

    Classification, Regression (Predictive Modeling); Clustering (Segmentation); Association Discovery (Summarization); Change and deviation detection; Dependency Modeling; Information Visualization

    A major consideration is how to normalize between two samples to account for variation encountered at different stages. The statistical approach assumes that most genes are evenly expressed. Comparisons between tissues need spiking controls. Parallel soy project. Soy and cattle controls: the legend explains all. These all appear on the 3800 placenta array. Why placenta vs. brain? More genes are expressed in brain, but on the placenta array more genes are highly expressed in placenta. There is a set of control spots on every grid.
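The statistical assumption in the notes above (most genes are evenly expressed in the two samples) leads naturally to a global scaling step. The sketch below shows one simple version, median-ratio normalization. The function name `median_normalize` and the intensities are hypothetical, and the slides do not state which normalization the authors actually applied.

```python
import statistics

def median_normalize(cy3, cy5):
    """Scale the Cy5 channel so that the median Cy5/Cy3 ratio becomes 1.

    Relies on the assumption that most genes are evenly expressed,
    so the typical ratio should be 1 after correction.
    """
    ratios = [r / g for g, r in zip(cy3, cy5)]
    scale = statistics.median(ratios)
    return [r / scale for r in cy5], scale

# Made-up intensities: Cy5 is uniformly twice as bright, and the last
# spot is additionally four-fold up (not data from the slides).
cy3 = [100.0, 200.0, 400.0, 800.0, 100.0]
cy5 = [200.0, 400.0, 800.0, 1600.0, 800.0]
normalized, scale = median_normalize(cy3, cy5)
print(scale, normalized)
```

When the assumption fails, as it may when comparing different tissues, the scale factor could instead be estimated from the spiked control spots alone, since their input amounts are known.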