34
MAGE-TAB - a simple tab delimited format for describing microarray (and potentially other) experiments in a MIAME compliant way MAGE-TAB workshop NCI, January 24, 2008

MAGE-TAB introduction: Alvis Brazma (EBI)

Embed Size (px)

Citation preview

Page 1: MAGE-TAB introduction: Alvis Brazma (EBI)

MAGE-TAB - a simple tab delimited format for

describing microarray (and potentially other) experiments in a MIAME

compliant way

MAGE-TAB workshopNCI, January 24, 2008

Page 2: MAGE-TAB introduction: Alvis Brazma (EBI)

What is needed to describe a microarray (and potentially any)

experiment adequately (MIAME)?

1. Description of biological sample (research subject, aliquot – i.e., biomaterial) properties

2. Description of the assay (e.g., microarray design)

3. Data from the assays - raw and processed4. Material and data processing protocols5. Experiment design – which sample went to

which assay and produced which data file

Page 3: MAGE-TAB introduction: Alvis Brazma (EBI)

How to do this?

1. Description of biological sample (research subject, aliquot – i.e., biomaterial) properties – a list of properties – free text or ontology entries – table (spreadsheet)

2. Description of the assay (e.g., microarray design) – a list of features on the array and their properties – sequence, annotation, location – table (spreadsheet)

3. Data from the assays - raw and processed – files or tables

4. Material and data processing protocols – free text5. Experiment design – which sample went to which assay

and produced which data file – a graph (normally a DAG)

Page 4: MAGE-TAB introduction: Alvis Brazma (EBI)

5. Experiment design graph

Sample 1

Sample 2

Hybridisation

Cy3

Cy5

Data

Page 5: MAGE-TAB introduction: Alvis Brazma (EBI)

5. Experiment design graph

Sample 1(Homo S.,

Brain

Sample 2(Homo S.,

Kidney)

Hybridisation(SMD array 1)

Data

Protocol (Cy5)

Protocol (Cy3)

Protocol

Page 6: MAGE-TAB introduction: Alvis Brazma (EBI)

liver 1(Homo sap.) Hybridisation 1

(HG_U95A)Data1.cel

Material processing protocols

(P-XMPL-1)

Normalisation protocol

(P-XMPL-2)

liver 2(Homo sap.)

Hybridisation 2(HG_U95A) Data2.cel

Kidney 1(Homo sap.)

Hybridisation 3(HGU_95A)

Data3.cel

Brain 1(Homo sap.)

FGDM.txtKidney 2

(Homo sap.)

Brain 2(Homo sap.)

Page 7: MAGE-TAB introduction: Alvis Brazma (EBI)

liver 1(Homo sap.) Hybridisation 1

(HG_U95A)Data1.cel

Material processing protocols

(P-XMPL-1)

Normalisation protocol

(P-XMPL-2)

liver 2(Homo sap.)

Hybridisation 2(HG_U95A) Data2.cel

Kidney 1(Homo sap.)

Hybridisation 3(HGU_95A)

Data3.cel

Brain 1(Homo sap.)

FGDM.txtKidney 2

(Homo sap.)

Brain 2(Homo sap.)

Sample IDCharacteristics[Organism]

Characteristics[OrganismPart]

Protocol REFHybridization ID

ArrayDesign REF

ArrayData URI

Protocol REFDerivedArrayData URI

liver 1 Homo sapiens liver P-XMPL-1 hyb 1 HG_U95A 1.CEL P-XMPL-2 FGEM.txt

liver 2 Homo sapiens liver P-XMPL-1 hyb 1 HG_U95A 1.CEL P-XMPL-2 FGEM.txt

kidney 1 Homo sapiens kidney P-XMPL-1 hyb 2 HG_U95A 2.CEL P-XMPL-2 FGEM.txt

kidney 2 Homo sapiens kidney P-XMPL-1 hyb 2 HG_U95A 2.CEL P-XMPL-2 FGEM.txt

brain 1 Homo sapiens brain P-XMPL-1 hyb 3 HG_U95A 3.CEL P-XMPL-2 FGEM.txt

brain 2 Homo sapiens brain P-XMPL-1 hyb 3 HG_U95A 3.CEL P-XMPL-2 FGEM.txt

Page 8: MAGE-TAB introduction: Alvis Brazma (EBI)
Page 9: MAGE-TAB introduction: Alvis Brazma (EBI)
Page 10: MAGE-TAB introduction: Alvis Brazma (EBI)

Important observation

• In high throughput experiments the experiment design graphs are– Regular (similar subgraphs repeated many times)– For most nodes there is only small number of incoming

and outgoing edge – They can be presented in ‘layers’ in a natural way (in

fact any DAG can be represented in layers)

• This makes their representation as a spreadsheet simple and natural

Page 11: MAGE-TAB introduction: Alvis Brazma (EBI)

Layers in a more complex DAG

A

B C

W

FE

G H I

0 1 2 3 4 5

Page 12: MAGE-TAB introduction: Alvis Brazma (EBI)
Page 13: MAGE-TAB introduction: Alvis Brazma (EBI)

In reality things are often even simpler

Page 14: MAGE-TAB introduction: Alvis Brazma (EBI)

... or even simpler than that

Page 15: MAGE-TAB introduction: Alvis Brazma (EBI)

v11 v12 v13

(c111, ..., c11n) (c121, ..., c12m) (c113, c ...,)

source (individual) age sex ... weight

sample extraction prot. sample

sample type amount ...

date taken

RNA extr. and labl. Prot assay

ma batch

Data processing prot. Data

ID1 s1 a1 d1ID2 s2 a2 d2

Idk sk ak dk

l1 l2 l3

vk1 vk2 vk3

(ck11,..., c11n) (c121, ..., c12m) (c113, ...,)

l1 l2 l3

node layer 1

Characteristics 11

characteristics 21 ...

characteristics n1

edge label

node layer 2

characteristics 21

characteristics 22 ...

characteristics m2

edge lable

node layer 3

characteristics 31

v11 c111 c112 c11n l11 v12 c121 c122 c12m l12 v13v21 v22 v23

vk1 vk2 vk3

... ... ...

Page 16: MAGE-TAB introduction: Alvis Brazma (EBI)

Real world examples

• Simplified E-TABM-234.sdrf

Page 17: MAGE-TAB introduction: Alvis Brazma (EBI)
Page 18: MAGE-TAB introduction: Alvis Brazma (EBI)

Source ID Sample ID Extract ID LabeledExtract ID Label Hybridization IDBS_TKAC_13 BSM_TKAC_01m BSM_TKAC_22p BSM_TKAC_22p biotin HFB2002012101ABS_TKAC_13 BSM_TKAC_02m BSM_TKAC_23p BSM_TKAC_23p biotin HFB2002012102ABS_TKAC_13 BSM_TKAC_03m BSM_TKAC_24p BSM_TKAC_24p biotin HFB2002012103ABS_TKAC_13 BSM_TKAC_04m BSM_TKAC_25p BSM_TKAC_25p biotin HFB2002012104ABS_TKAC_13 BSM_TKAC_05m BSM_TKAC_26p BSM_TKAC_26p biotin HFB2002012105ABS_TKAC_13 BSM_TKAC_06m BSM_TKAC_27p BSM_TKAC_27p biotin HFB2002012106ABS_TKAC_13 BSM_TKAC_07m BSM_TKAC_28p BSM_TKAC_28p biotin HFB2002012107ABS_TKAC_13 BSM_TKAC_08m BSM_TKAC_29p BSM_TKAC_29p biotin HFB2002012108ABS_TKAC_13 BSM_TKAC_09m BSM_TKAC_30p BSM_TKAC_30p biotin HFB2002012109ABS_TKAC_13 BSM_TKAC_10m BSM_TKAC_31p BSM_TKAC_31p biotin HFB2002012110ABS_TKAC_14 BSM_TKAC_01n BSM_TKAC_22p BSM_TKAC_22p biotin HFB2002012101ABS_TKAC_14 BSM_TKAC_02n BSM_TKAC_23p BSM_TKAC_23p biotin HFB2002012102ABS_TKAC_14 BSM_TKAC_03n BSM_TKAC_24p BSM_TKAC_24p biotin HFB2002012103ABS_TKAC_14 BSM_TKAC_04n BSM_TKAC_25p BSM_TKAC_25p biotin HFB2002012104ABS_TKAC_14 BSM_TKAC_05n BSM_TKAC_26p BSM_TKAC_26p biotin HFB2002012105ABS_TKAC_14 BSM_TKAC_06n BSM_TKAC_27p BSM_TKAC_27p biotin HFB2002012106ABS_TKAC_14 BSM_TKAC_07n BSM_TKAC_28p BSM_TKAC_28p biotin HFB2002012107ABS_TKAC_14 BSM_TKAC_08n BSM_TKAC_29p BSM_TKAC_29p biotin HFB2002012108ABS_TKAC_14 BSM_TKAC_09n BSM_TKAC_30p BSM_TKAC_30p biotin HFB2002012109ABS_TKAC_14 BSM_TKAC_10n BSM_TKAC_31p BSM_TKAC_31p biotin HFB2002012110ABS_TKAC_15 BSM_TKAC_01o BSM_TKAC_22p BSM_TKAC_22p biotin HFB2002012101ABS_TKAC_15 BSM_TKAC_02o BSM_TKAC_23p BSM_TKAC_23p biotin HFB2002012102ABS_TKAC_15 BSM_TKAC_03o BSM_TKAC_24p BSM_TKAC_24p biotin HFB2002012103ABS_TKAC_15 BSM_TKAC_04o BSM_TKAC_25p BSM_TKAC_25p biotin HFB2002012104ABS_TKAC_15 BSM_TKAC_05o BSM_TKAC_26p BSM_TKAC_26p biotin HFB2002012105ABS_TKAC_15 BSM_TKAC_06o BSM_TKAC_27p BSM_TKAC_27p biotin HFB2002012106ABS_TKAC_15 BSM_TKAC_07o BSM_TKAC_28p BSM_TKAC_28p biotin HFB2002012107ABS_TKAC_15 BSM_TKAC_08o BSM_TKAC_29p BSM_TKAC_29p biotin HFB2002012108ABS_TKAC_15 BSM_TKAC_09o BSM_TKAC_30p BSM_TKAC_30p biotin HFB2002012109ABS_TKAC_15 BSM_TKAC_10o BSM_TKAC_31p BSM_TKAC_31p biotin HFB2002012110ABS_TKAC_16 BSM_TKAC_01q BSM_TKAC_22p BSM_TKAC_22p biotin HFB2002012101ABS_TKAC_16 BSM_TKAC_02q BSM_TKAC_23p BSM_TKAC_23p biotin HFB2002012102ABS_TKAC_16 BSM_TKAC_03q BSM_TKAC_24p BSM_TKAC_24p biotin HFB2002012103ABS_TKAC_16 BSM_TKAC_04q BSM_TKAC_25p BSM_TKAC_25p biotin HFB2002012104ABS_TKAC_16 BSM_TKAC_05q BSM_TKAC_26p BSM_TKAC_26p biotin HFB2002012105ABS_TKAC_16 BSM_TKAC_06q BSM_TKAC_27p BSM_TKAC_27p biotin HFB2002012106ABS_TKAC_16 BSM_TKAC_07q BSM_TKAC_28p BSM_TKAC_28p biotin HFB2002012107ABS_TKAC_16 BSM_TKAC_08q BSM_TKAC_29p BSM_TKAC_29p biotin HFB2002012108ABS_TKAC_16 BSM_TKAC_09q BSM_TKAC_30p BSM_TKAC_30p biotin HFB2002012109ABS_TKAC_16 BSM_TKAC_10q BSM_TKAC_31p BSM_TKAC_31p biotin HFB2002012110A

Page 19: MAGE-TAB introduction: Alvis Brazma (EBI)

Ontology usage

• Characteristics can be either free text or an ‘ontology entry’

• Ontology entry is identified by a ‘source’ column following it

Page 20: MAGE-TAB introduction: Alvis Brazma (EBI)

Elements to describe an experiment

1. Description of biological sample (research subject, aliquot – i.e., biomaterial) properties – a list of properties – free text or ontology entries – table (spreadsheet)

2. Description of the assay (e.g., microarray design) – a list of features on the array and their properties – sequence, annotation, location – table (spreadsheet)

3. Data from the assays - raw and processed – files or tables

4. Material and data processing protocols – free text5. Experiment design – which sample went to which assay

and produced which data file – a graph (normally a DAG) – also a spreadsheet!

Page 21: MAGE-TAB introduction: Alvis Brazma (EBI)

MAGE-TAB

• 1., 5. Sample properties and experiment design - SDRF

• 2. Array design 2 - ADF (Array design file)• 4. Protocols – IDF (Investigation design

file)• 3. Data files and data matrices

Page 22: MAGE-TAB introduction: Alvis Brazma (EBI)

Array Design (ADF)

• ADF has been there around since MAGE-ML times as a way to describe an array

• A (table) spreadsheet - one row per array feature listing feature coordinates, the sequence, biological annotation, etc

Page 23: MAGE-TAB introduction: Alvis Brazma (EBI)

ADF

Block

Column

Block

Row

Column

Row

Reporter ID

Reporter

Sequence

Role Control

Type

Composite

Element ID

CompositeElement

DatabaseEntry [refseq]

1 1 1 1 R1 ATGGTTGGTTACGTGT experimental PTEN NM_000314

1 1 1 2 R2 CCGCGTTGCCCCGCC experimental PAX2 NM_003989 1 1 1 3 R3 CGTAGCTGATCGATGA experimental WWOX NM_016373

1 1 1 4 R4 GGTTGGCTGAGATCGT experimental MAPK8 NM_139047 1 1 2 1 R1 ATGGTTGGTTACGTGT experimental PTEN NM_000314

1 1 2 2 R2 CCGCGTTGCCCCGCC experimental PAX2 NM_003989

1 1 2 3 R3 CGTAGCTGATCGATGA experimental WWOX NM_016373 1 1 2 4 R4 GGTTGGCTGAGATCGT experimental MAPK8 NM_139047

4 6 20 20 462020 TCCCTTCCGTTGTCCT control control_spike_calibration

Page 24: MAGE-TAB introduction: Alvis Brazma (EBI)

Investigation design file IDF

• Lists general information about the experiment and gives (a free text) description of all the protocols

Page 25: MAGE-TAB introduction: Alvis Brazma (EBI)

Investigation Title University of Heidelberg H sapiens TK6Experimental Designs genetic_modification_designExperimental Factors genetic_modification

Person Last Name MaierPerson First Name PatrickPerson Email [email protected] Phone +496213833773Person Address Theodor-Kutzer-Ufer 1-3Person Affiliation Department of Radiation Oncology, University of HeidelbergPerson Roles submitter investigator

Quality Control Types biological_replicateReplicate Types biological_replicateDate of Experiment 2005-02-28Public Release Date 2006-01-03PubMed ID 12345678Publication Author List

Publication Status submittedExperiment Description

Protocol Name GROWTHPRTCL10653Protocol Type growProtocol Description

Protocol Parameters media

Protocol Name EXTPRTCL10654Protocol Type nucleic_acid_extractionProtocol Description

Protocol Parameters Extracted Product Amplification

Protocol Name TRANPRTCL10656Protocol Type bioassay_data_transformationProtocol Description

EDF Files e-mexp-428_tab.txt

Patrick Maier; Katharina Fleckenstein; Li Li; Stephanie Laufs; Jens Zeller; Stefan Fruehauf; Carsten Herskind; Frederik Wenz

Gene expression of TK6 cells transduced with an oncoretrovirus expressing MDR1 (TK6MDR1) was compared to untransduced TK6 cells and to TK6 cell transduced with an oncoretrovirus expressing the Neomycin resistance gene (TK6neo). Two biological replicates of each were generated and the expression profiles were determined using Affymetrix Human Genome U133 Plus2.0 GeneChip microarrays. Comparisons between the sample groups allow the identification of genes with expression dependent on the MDR1 overexpression.

TK6 cells were grown in suspension cultures in RPMI 1640 medium supplemented with 10% horse serum (Invitrogen, Karlsruhe, Germany). The cells were routinely maintained at 37 C and 5% CO2.

Approximately 10^6 cells were lysed in RLT buffer (Qiagen).Total RNA was extracted from the cell lysate using an RNeasy kit (Qiagen).

Mixed Model Normalization with SAS Micro Array Solutions (version 1.3).

Page 26: MAGE-TAB introduction: Alvis Brazma (EBI)

Data files

• Raw data – native formats (cel, genpix)• Normalised, summarised data – columns

may be individual references

Page 27: MAGE-TAB introduction: Alvis Brazma (EBI)

liver 1(Homo sap.) Hybridisation 1

(HG_U95A)Data1.cel

Material processing protocols

(P-XMPL-1)

Normalisation protocol

(P-XMPL-2)

liver 2(Homo sap.)

Hybridisation 2(HG_U95A) Data2.cel

Kidney 1(Homo sap.)

Hybridisation 3(HGU_95A)

Data3.cel

Brain 1(Homo sap.)

FGDM.txtKidney 2

(Homo sap.)

Brain 2(Homo sap.)

Sample IDCharacteristics[Organism]

Characteristics[OrganismPart]

Protocol REFHybridization ID

ArrayDesign REF

ArrayData URI

Protocol REFDerivedArrayData URI

liver 1 Homo sapiens liver P-XMPL-1 hyb 1 HG_U95A 1.CEL P-XMPL-2 FGEM.txt

liver 2 Homo sapiens liver P-XMPL-1 hyb 1 HG_U95A 1.CEL P-XMPL-2 FGEM.txt

kidney 1 Homo sapiens kidney P-XMPL-1 hyb 2 HG_U95A 2.CEL P-XMPL-2 FGEM.txt

kidney 2 Homo sapiens kidney P-XMPL-1 hyb 2 HG_U95A 2.CEL P-XMPL-2 FGEM.txt

brain 1 Homo sapiens brain P-XMPL-1 hyb 3 HG_U95A 3.CEL P-XMPL-2 FGEM.txt

brain 2 Homo sapiens brain P-XMPL-1 hyb 3 HG_U95A 3.CEL P-XMPL-2 FGEM.txt

Page 28: MAGE-TAB introduction: Alvis Brazma (EBI)

Sample IDCharacteristics[Organism]

Characteristics[OrganismPart]

Protocol REFHybridization ID

ArrayDesign REF

ArrayData URI

Protocol REFDerivedArrayData URI

liver 1 Homo sapiens liver P-XMPL-1 hyb 1 HG_U95A 1.CEL P-XMPL-2 FGEM.txt

liver 2 Homo sapiens liver P-XMPL-1 hyb 1 HG_U95A 1.CEL P-XMPL-2 FGEM.txt

kidney 1 Homo sapiens kidney P-XMPL-1 hyb 2 HG_U95A 2.CEL P-XMPL-2 FGEM.txt

kidney 2 Homo sapiens kidney P-XMPL-1 hyb 2 HG_U95A 2.CEL P-XMPL-2 FGEM.txt

brain 1 Homo sapiens brain P-XMPL-1 hyb 3 HG_U95A 3.CEL P-XMPL-2 FGEM.txt

brain 2 Homo sapiens brain P-XMPL-1 hyb 3 HG_U95A 3.CEL P-XMPL-2 FGEM.txt

FGEM.txt:

Gene n

….

Gene 3

Gene 2

Gene 1

Hyb 3 (brain)

Hyb 2 (kidney)

Hyb 1 (liver)

Genes

Page 29: MAGE-TAB introduction: Alvis Brazma (EBI)

Experimental Factors

• Most important experimental variables (e.g., organs – liver, kidney, brain – in the examples above)

• Any column in the EDF can be marked as an experimental factor

• These can be propagated down the edges of the graph to columns in FGEM

• In FGEM they serve as concise annotation

Page 30: MAGE-TAB introduction: Alvis Brazma (EBI)

MAGE-TAB

• Investigation design file – IDF• Array design file ADF• Experiment (sample) design file SDRF• Data files and data matrices (FGEM)

Page 31: MAGE-TAB introduction: Alvis Brazma (EBI)

Standard design templates

• simple iterated design;• iterated design with technical replicates;• iterated design with pooling;• iterated designs for dual channel experiments;• dual channel iterated designs with dye swap;• dual channel iterated designs with a reference sample;• dual channel iterated design with a reference and dye

swap;• dual channel iterated design with a pooled reference;• loop design;• loop design with die swap;• time series experiments;http://www.mged.org/Workgroups/MAGE/mage.html#mage-tab

Page 32: MAGE-TAB introduction: Alvis Brazma (EBI)

MAGE-TAB

• Any experiment can be represented in MAGE-TAB in a structured way to MIAME granularity

• Large experiments with a regular design can be represented in a natural way

• It is possible to create MAGE-TAB files using generic spread-sheet software

• It is flexible – the granularity of the experiment description can vary

Page 33: MAGE-TAB introduction: Alvis Brazma (EBI)

Some points for later discussion

• What should be the level of prescription on ID column names (source, sample, extract, ..., summary data)?– For instance, it is very often confusing to me

what is source and what is sample

• Do we need the concept of Experimental Factors at all?– An alternative could be optional labelling of

variable characteristics as ‘intentional’

Page 34: MAGE-TAB introduction: Alvis Brazma (EBI)

Acknowledgements

• Tim Rayner, Helen Parkinson• Cathy Ball, Don Maier• Philippe Rocca-Sera (ADF)• Michael Miller, Ugis Sarkans, Paul Spellman,

Anna Farne• Mohammad Shojatalob – MAGE-TAB export

from ArrayExpress• MAGE working groupFunding - NHGRI/NIBIB, MGED sponsors