29
1 Integrated Microarray Integrated Microarray Database System Database System NHLBI-MGH-PGA

1 Integrated Microarray Database System NHLBI-MGH-PGA

  • View
    227

  • Download
    1

Embed Size (px)

Citation preview

Page 1: 1 Integrated Microarray Database System NHLBI-MGH-PGA

1

Integrated Microarray Database SystemIntegrated Microarray Database System

NHLBI-MGH-PGA

Page 2: 1 Integrated Microarray Database System NHLBI-MGH-PGA

2

Desired Features for DatabaseDesired Features for Database

Ability to accept data from MGH Core Facility and Core Facilities of remote collaborators

Ability to store both spotted array data and Affymetrix data

Web-accessibility Flexibility to accommodate various types of

experiments and the descriptions of those experiments

Tools for analyzing data and exporting data as tab-delimited files and XML (GEML)

Page 3: 1 Integrated Microarray Database System NHLBI-MGH-PGA

3

Database Users Database Users

MGH researchers (able to submit data)

Collaborators (able to submit data through MGH collaborator)

Scientific community (able to access published data through the web interface)

Page 4: 1 Integrated Microarray Database System NHLBI-MGH-PGA

4

Types of Tools for Database Types of Tools for Database

Tools for visualization of the array image (TIFF or proxy GIF file) as a clickable image map– Browse individual spots

– Evaluate the placement of the grid used during data acquisition

– Change the flag status of any of the spots

Normalization tools Clustering analysis tools

Page 5: 1 Integrated Microarray Database System NHLBI-MGH-PGA

5

Experimental designGeneral information about a

series of experiments with the goal of answering a biological

question<Submitter, related publications,

type of experiment, conditions tested, quality indicators,…>

Biological samples<Organism, genetic

variation, tissue, experimental

treatments, …>

Target preparation<RNA sample

extraction, labeling protocol, …>

Hybridization<Hybridization

conditions, multiple targets, …>

Slide elements<Information about genes represented on slide, sequences, …>

Slide manufacturing<Slide printing parameters and conditions, …>

Data acquisition<Scanning parameters,

software used, …>

Raw dataPartially password

protected data, multiple scan per slide

<Image file, fluorescence intensities, …>

Processed data<Filters, Normalized,

multi-slide averaged, …>

Expression dataA fixed expression data

format, can be published on the web

Final analyzed dataData format that will answer the question

asked in the experimental design and be published in a

scientific journal

Tools

Parameters retrieved and

presented with data

Tools

Filtering, Statistical tools, Hierarchial clustering, SOMs, Pathway analysis, data mining

software, …

Filtering, Normalization, Averaging, Extrapolation

(Maslint), Statistical tools, Quality assessment, …

Parameters stored in DBEach box contains a set of tables

Data stored in DBData to be manipulated by tools to different levels (not all data will end in a

publication). Data has to be viewed and monitored in the process to determine the necessity to continue the analysis and filter out data points. Experimental parameters

and external web resources may need to be called upon in the process.

Links to external web resources and

other software packages, data

mining tools, …

”Eric’s

lines”

Page 6: 1 Integrated Microarray Database System NHLBI-MGH-PGA

6

Background: Background: Related Software and Other ImplementationsRelated Software and Other Implementations

Stanford Microarray Database

Express DB

Array Express/Expression Profiler

MaxD

Page 7: 1 Integrated Microarray Database System NHLBI-MGH-PGA

7

Stanford Microarray DatabaseStanford Microarray Database

Strengths – Open source system– Supports spotted microarrays– Sophisticated data normalization tools

Weaknesses– Affymetrix data format not supported– RDBMS is Oracle, with Oracle-specific

functions in the source code

Page 8: 1 Integrated Microarray Database System NHLBI-MGH-PGA

8

Express DBExpress DB

Strengths – Supports both spotted microarrays and

Affymetrix dataWeaknesses– RDBMS is Sybase 11– Used as a demonstration system with

Saccharomyces, but not yet adapted for other organisms

Page 9: 1 Integrated Microarray Database System NHLBI-MGH-PGA

9

Array Express/Expression ProfilerArray Express/Expression Profiler

Strengths – Supports both spotted microarrays and

Affymetrix data– Implements the MIAME data specification

Weaknesses– No storage of raw luminosity data– RDBMS is Oracle– More tables would need to be added to contain

data pertaining to sample preparation, hybridization and other experimental details

Page 10: 1 Integrated Microarray Database System NHLBI-MGH-PGA

10

MaxDMaxD

Strengths – Implementation of Array Express table

structure suitable for SQL92-complaint databases, thus supporting MySQL

– Java based software with source code available for download on the web

– Strengths of Array Express Weaknesses– Weaknesses of Array Express– Not open source

Page 11: 1 Integrated Microarray Database System NHLBI-MGH-PGA

11

Formats of Data InputFormats of Data Input

Automatically entered when spotted arrays are scanned by the core facility– Array ID, chip layout, spot intensities, software

used by the Arrayer Directly entered by users

– Experiment names, hybridization conditions, procedures

Imported from flat files– Spot layout of chips, normalization intensities

generated by third party software packages (Affymetrix)

Page 12: 1 Integrated Microarray Database System NHLBI-MGH-PGA

12

Critical Data to Be StoredCritical Data to Be Stored

Description of each experimentInformation about the submitterDescription of the hybridizationDescription of the array designDescription of experiment info

related to Affymetrix chips or the core Axon Arrayer

Description of the sample and target

Page 13: 1 Integrated Microarray Database System NHLBI-MGH-PGA

13

Critical Data to Be Stored: ExperimentCritical Data to Be Stored: Experiment

Unique experiment IDHuman-readable experiment nameClassification of experiment typeFree text description of experimentDate of entryReferences to publicationsSubmitter ID

Page 14: 1 Integrated Microarray Database System NHLBI-MGH-PGA

14

Critical Data to Be Stored: SubmitterCritical Data to Be Stored: Submitter

Submitter ID Submitter’s name Institution Laboratory Principal Investigator Grant Email address Postal address Phone number

Page 15: 1 Integrated Microarray Database System NHLBI-MGH-PGA

15

Critical Data to Be Stored: HybridizationCritical Data to Be Stored: Hybridization

Hybridization ID Reference to the associated experiment and

arrays Free text description of a particular

hybridization Hybridization protocol Ordinal number for a particular hybridization if

the hybridization is part of a sequential set of hybridizations

Page 16: 1 Integrated Microarray Database System NHLBI-MGH-PGA

16

Critical Data to Be Stored: Array DesignCritical Data to Be Stored: Array Design

Array Design ID Human-readable name of the chip design Indication of the type of probe used (i.e., spotted vs.

synthesized, cDNA vs. oligos) Size of array (number of rows and columns and total

spots) Kind of chip used (e.g., glass, nylon) Type of Array (Affymetrix or Axon) Supplier who produced the slide (company, individual) Protocol to create the chip or provider information if

purchased

Page 17: 1 Integrated Microarray Database System NHLBI-MGH-PGA

17

Critical Data to Be Stored: AffymetrixCritical Data to Be Stored: Affymetrix

Name of chipSample applied to chipProbe used with chipExperimental information found in

Affymetrix .EXP files

Page 18: 1 Integrated Microarray Database System NHLBI-MGH-PGA

18

Critical Data to Be Stored: Axon ArrayerCritical Data to Be Stored: Axon Arrayer

Description of information from core Axon Arrayer that is also stored in the core microarray database

Page 19: 1 Integrated Microarray Database System NHLBI-MGH-PGA

19

Critical Data to Be Stored: SampleCritical Data to Be Stored: Sample

Description of the sample used to make the target that is applied to the chip

Description of the source of the sample (which may include the following information as applicable to a given sample: ID, genus, species, strain, ecotype, organism, organ, tissue, cell type, cell line, cell culture, developmental stage, sex, genetic variation)

Page 20: 1 Integrated Microarray Database System NHLBI-MGH-PGA

20

Critical Data to Be Stored: TargetCritical Data to Be Stored: Target

Description extract used to make the target

Description of the extraction protocol

Description of the labeling method (if any)

Page 21: 1 Integrated Microarray Database System NHLBI-MGH-PGA

21

Database Schema for Integrated Microarray Database SystemDatabase Schema for Integrated Microarray Database System

Page 22: 1 Integrated Microarray Database System NHLBI-MGH-PGA

22

I. Submitter Information: 

Summitter Name: (blank text field to type in name of person who is submitting the experiment (not the data entry person, if different) Organization: MGH, other Laboratory: Ausubel, Freeman, Pier, Seed, other *Grant: PGA, other *Grant Number: PI of Grant: Ausubel, Freeman, Pier, Seed, other  Email: [email protected] Address: Lipid Metabolism Unit, Massachusetts General Hospital, 32 Fruit Street, GRJ 1328, Boston, MA 02114 (blank text field) Phone: (xxx) xxx-xxxx (blank text field) Experiment name: name of experiment (blank text field) Abstract: one line description of experiment (blank text field)

Page 23: 1 Integrated Microarray Database System NHLBI-MGH-PGA

23

II. Taxonomy:Organism: Mouse (pull-down choices)Genus: Mus (pull-down choices)Species: musculus (pull-down choices)Genotype: wild type, mutant, transgenic (pull-down choices)Strain:Organ/Tissue: lungs, liver (text field)Cell type: text fieldCell line: text fieldCell culture: text fieldDevelopmental Stage: text fieldSex: Male, Female, hermaphroditeGenetic Variation: link to supplemental database if neededFree Text:Mutant Name: tlr4 (free text) *Name of mutated gene: toll-like receptor 4 (free text)Gene abbreviation: tlr4 (free text)Allele name: free textDominance: dominant, recessive, semi-dominant, other (pull-down choices)Mutant type: gain of function, loss of function, null, overexpressor, suppressor, unknown, other (pull-down choices)Description: free text

Page 24: 1 Integrated Microarray Database System NHLBI-MGH-PGA

24

III. Sample Treatment: Sample Description: free text*Is this experiment a time course? Yes or No (radio buttons)Hours after treatment: 2, 4, other (free text)Temperature: Type of Treatment: pathogen, hormone, chemical, serum, growth-factor, other (pull-down choices)Compound: name of chemical, hormone, pathogen, etc. (free text)*Dose: free text*Concentration: free textTreatment Protocol: free textRNA extraction method: free textAmount of RNA obtained: free textHybridization: free textNumber of Hybridization: (if more than one hybridization per chip) free text of a numberHybridization protocol: free textLabeling method for target: free textLabeling protocol: free textAmount of sample used to make target: free textSupplemental Database: (pull-down choice) plant

Page 25: 1 Integrated Microarray Database System NHLBI-MGH-PGA

25

Example QueriesExample Queries

1) List all experiments performed by a single user.

2) Retrieve all experiments entered into the database since October 31, 2001.

3) Retrieve normalized data for two arrays in an experiment and graph the luminosity values on a log-log scatter plot.

Page 26: 1 Integrated Microarray Database System NHLBI-MGH-PGA

26

Example QueriesExample Queries

4) List all experiments from a particular lab, or operator.

5) List all experiments using a particular protocol.

6) List all experiments performed on an extract from a particular tissue type.

Page 27: 1 Integrated Microarray Database System NHLBI-MGH-PGA

27

Example QueriesExample Queries

7) Which genes are expressed in response to pathogen A, but not pathogen B in a given host?

8) Compare the results of multiple treatments and produce a Venn diagram showing sets of genes induced or repressed by these different treatments or pathogens.

9) Calculate distance matrices to analyze the extent of differences between treatments, time points or mutants.

Page 28: 1 Integrated Microarray Database System NHLBI-MGH-PGA

28

ToolsTools

Cluster (Stanford): clustering on large datasets (hierarchical, SOMs, kmeans, PCA)

TreeView (Stanford): view cluster output

EPCLUST (EBI): hierarchical clustering of gene expression datasets

Page 29: 1 Integrated Microarray Database System NHLBI-MGH-PGA

29

IMDS Development TeamIMDS Development Team

Harry Bjorkbacka (End User/Feature Consultant) Cheri Chen (End User) Lance Davidow (Developer/User) Julia Dewdney (End User/Feature Consultant) Chen Liu (Developer) Christina Powell (Developer/End User) Sean Quinlan (Database/Program Developer) Jonathan M. Urbach (Program Developer) Eric VanHelene (Manager)