Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/


Microarray in general: spots, hybridization, scan

A public repository for the archiving and distribution of gene expression data submitted by the scientific community.

MIAME-compliant data (MIAME = Minimum Information About a Microarray Experiment).

http://www.mged.org/Workgroups/MIAME/miame.html

Convenient for deposition of gene expression data, as required by funding agencies and journals.

Curated, online resource for gene expression data browsing, query, analysis and retrieval.

Gene Expression Omnibus (GEO): Gene Expression and Molecular Abundance Data Repository

GEO Architecture

Platform (GPL) = the technology used and the features detected.

Sample (GSM) = preparation and description of the sample.

Series (GSE) defines a set of samples and how they are related.

DataSets (GDS) = sample data collections assembled by GEO staff.

GEO has four kinds of data records
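The four record types can be told apart by their accession prefix. The following is a minimal sketch, assuming an accession is simply one of the four prefixes followed by digits; the function name `record_type` is illustrative and not part of any GEO tool.

```python
import re

# Prefixes and record types as listed on the GEO Architecture slide.
RECORD_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

# Assumption: an accession is a three-letter prefix followed by digits only.
_ACCESSION_RE = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def record_type(accession: str) -> str:
    """Return the GEO record type for an accession such as 'GSE1234'."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]
```

For example, `record_type("GSE1234")` returns `"Series"`, while a string with an unknown prefix raises `ValueError`.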

Submitters may provide raw data: original microarray scans, raw quantification data.

GEO Architecture (continued)

GPL: Platform descriptions. Submitted by manufacturer*.

GSM: Raw/processed spot intensities from a single slide/chip. Submitted by experimentalists.

GSE: Grouping of slide/chip data, “a single experiment”. Submitted by experimentalists.

GDS: Grouping of experiments. Curated by NCBI.

GEO Home Page

Simple interface to: show status, find documentation, query data, browse data, submit data.

Basic Search: Repository Browser

Selecting the total public data or Repository Browser links on the GEO home page takes you to the Repository Browser, which lists: the number of each type of submitted file, both public and unreleased; the total number of each technology type under Platforms; the total number of each Sample type.

Basic Search: Browse Platforms

All GEO submissions need to be associated with a platform file. These describe the features on a given platform, required to understand the data.

A platform file must be submitted if one is not already present in GEO. Commercial array platform files are submitted to GEO by the manufacturer.

Basic Search: Browse Platforms

Accession: GEO ID

Title: brief description of platform

Contact: submitter

Samples: number of samples in GEO associated with platform ID

Technology: platform type

Release date: when file is publicly accessible

The table can be sorted on any field except organism by clicking on the header. Specific platform files can be found using the ‘Find Platform’ option.

Basic Search: Find Platforms

Select ‘Find Platform’; select company; select distribution; select species; enter title keyword.

Basic Search: Find Platforms (continued)

Start the platform search; select the accession for the U133 Plus 2.0 array; scroll down to find data table information.
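The ‘Find Platform’ form combines several optional filters. The sketch below mimics that logic on a local list of records. Everything in it is hypothetical: the `PlatformRecord` class, the `find_platforms` function and the placeholder records are illustrations, not GEO data or a GEO API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformRecord:
    accession: str
    title: str
    company: str
    distribution: str
    species: str


def find_platforms(records, company=None, distribution=None,
                   species=None, title_keyword=None):
    """Keep records matching every filter that was supplied (None = no filter)."""
    def matches(record):
        return (
            (company is None or record.company == company)
            and (distribution is None or record.distribution == distribution)
            and (species is None or record.species == species)
            and (title_keyword is None
                 or title_keyword.lower() in record.title.lower())
        )
    return [record for record in records if matches(record)]


# Placeholder records for illustration only; these are not real GEO entries.
EXAMPLE_PLATFORMS = [
    PlatformRecord("GPL_EXAMPLE_1", "Example human array plus 2.0",
                   "ExampleCo", "commercial", "Homo sapiens"),
    PlatformRecord("GPL_EXAMPLE_2", "Example mouse array",
                   "ExampleCo", "commercial", "Mus musculus"),
    PlatformRecord("GPL_EXAMPLE_3", "In-house human cDNA array",
                   "Example Lab", "non-commercial", "Homo sapiens"),
]

hits = find_platforms(EXAMPLE_PLATFORMS, company="ExampleCo",
                      species="Homo sapiens", title_keyword="plus 2.0")
```

Here `hits` contains only the first placeholder record, since it is the only one that satisfies all three filters.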

Data Retrieval: Browse Series

Data is submitted to GEO as a Series, which represents the experiment design.

Selecting Browse>Series brings up a list sorted by release date. Selecting a Series ID brings up the Series file summary.

Data Retrieval: Series Accession Page

GEO Accession Results Display Options

Scope controls what information is displayed: Self, Platform, Samples, Series or Family

Format controls how information is displayed: HTML, SOFT (Simple Omnibus Format in Text), MINiML (MIAME Notation in Markup Language)

Amount controls how much information is displayed: Brief, Quick, Full, Data

All GEO accession results pages have the same header that allows different views and formats for the data to be displayed.
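The same three display options can be expressed as a URL. The sketch below only builds the address and does not contact GEO. It assumes the `acc.cgi` query endpoint and the parameter names `acc`, `targ`, `form` and `view`; the mapping from on-page labels to parameter values is also an assumption that should be checked against GEO's own documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint for GEO accession display pages.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# Assumed mapping from the on-page option labels to query-string values.
SCOPE = {"Self": "self", "Platform": "gpl", "Samples": "gsm",
         "Series": "gse", "Family": "all"}
FORMAT = {"HTML": "html", "SOFT": "text", "MINiML": "xml"}
AMOUNT = {"Brief": "brief", "Quick": "quick", "Full": "full", "Data": "data"}


def accession_url(accession, scope="Self", fmt="HTML", amount="Quick"):
    """Build a display URL for one accession with the chosen options."""
    params = {
        "acc": accession,
        "targ": SCOPE[scope],
        "form": FORMAT[fmt],
        "view": AMOUNT[amount],
    }
    return f"{GEO_ACC_URL}?{urlencode(params)}"
```

An unknown label raises `KeyError`, which keeps typos from silently producing a URL with the wrong view.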

Data Retrieval: Series Accession Page

Biological sample summary

Design summary

Publication information

Platform (total)

Samples (total)

Data Retrieval: Sample File Summary

Sample preparation

Hybridization and data processing

Platform

Series

Data Retrieval: Sample File Data Table

Data table field descriptions

Truncated data table from Quick view

Total data rows and file size

Supplementary raw data file
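When a Sample record is retrieved in SOFT format, the field descriptions and the data table arrive as plain text. The sketch below reads such text, assuming the usual SOFT line prefixes (`^` for the entity, `!` for attributes, `#` for column descriptions) and a tab-delimited table between `!sample_table_begin` and `!sample_table_end`. The example record is made up for illustration and is not a real GEO sample.

```python
def parse_sample_soft(text):
    """Split SOFT sample text into entity, attributes, columns and table rows."""
    result = {"entity": None, "attributes": {}, "columns": {}, "rows": []}
    header = None
    in_table = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("!sample_table_begin"):
            in_table = True
        elif line.startswith("!sample_table_end"):
            in_table = False
        elif in_table:
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                result["rows"].append(dict(zip(header, fields)))
        else:
            key, _, value = line[1:].partition("=")
            key, value = key.strip(), value.strip()
            if line.startswith("^"):
                result["entity"] = (key, value)
            elif line.startswith("!"):
                # Attributes may repeat, so collect values in a list.
                result["attributes"].setdefault(key, []).append(value)
            elif line.startswith("#"):
                result["columns"][key] = value
    return result


# Made-up record for illustration only.
EXAMPLE_SOFT = (
    "^SAMPLE = GSM0000\n"
    "!Sample_title = illustrative sample\n"
    "#ID_REF = platform feature identifier\n"
    "#VALUE = normalized signal\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t10.5\n"
    "probe_2\t7.25\n"
    "!sample_table_end\n"
)

parsed = parse_sample_soft(EXAMPLE_SOFT)
```

Values are kept as strings; converting `VALUE` to a number is left to the caller because data tables can contain missing or non-numeric entries.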

Querying GEO with IDs from Papers

A common way to access GEO data is through accessions cited in papers. Online journals include hyperlinks to the GEO accession page. Alternatively, enter the accession into the Query>GEO accession text box on the GEO home page.

GEO Links in PubMed Search Results

One option for displaying PubMed search results is GEO DataSet links. When present, the results page is actually from Entrez GEO DataSets.

Advanced Searches

GEO data can be queried as:

Datasets: experiment-centric view using Entrez GEO DataSets

Gene profiles: gene-centric view using Entrez GEO Profiles

Selecting either takes you to a similar Entrez introduction page.
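The slides describe the web forms for these two views. A programmatic counterpart can be sketched by building an Entrez search address for each one. This assumes NCBI's E-utilities `esearch` endpoint and the database names `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles), none of which appear in the slides; the function name `esearch_url` is illustrative. The code only constructs the URL and makes no request.

```python
from urllib.parse import urlencode

# Assumed E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed Entrez database names for the two GEO views.
ENTREZ_DB = {"datasets": "gds", "profiles": "geoprofiles"}


def esearch_url(view, term, retmax=20):
    """Build a search URL for the experiment-centric or gene-centric view."""
    params = {"db": ENTREZ_DB[view], "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

Calling `esearch_url("datasets", ...)` targets the experiment-centric view and `esearch_url("profiles", ...)` the gene-centric one; the search term is URL-encoded automatically.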

Querying GEO DataSets

Start a GEO DataSets search with the Query>DataSets text box. This brings up an Entrez GEO DataSets results form.

Total results

Number of DataSets

Number of Platforms

Number of Series

DataSet Search Result

DataSet ID

Description

Platform

Reference Series

Supplementary files

Number of Samples and truncated list

Cluster image

Select the DataSet ID or click on the cluster image to go to the DataSet record.

GEO DataSet Record

Experiment design and DataSet information.

Sample and analysis information. Data retrieval.

Selecting analysis takes you to the data clustering interface.

Selecting the cluster image takes you to the clustering page

GEO Gene Profiles

GEO DataSet ID

Platform ID, Platform Feature ID

Gene description

Target sequence accession

Expression profile

GEO Gene Profiles use gene IDs from Platform files to show the expression of a gene across DataSets.

Entering a gene ID into the Query>Gene profiles text box takes you to the Entrez results page.

GEO BLAST

On the GEO BLAST page, enter sequences in FASTA format or GenBank accessions, or select sequence files on local disk, for blastn comparisons.

These are compared to GenBank sequences listed in Platform files associated with GEO DataSets.

From the BLAST result page, select the ‘E’ option to the right of an alignment to show GEO Gene Profiles for that sequence in GEO DataSets.

E button
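Before pasting sequences into the GEO BLAST page, it can help to check locally that the input is valid FASTA. The sketch below is a minimal reader, not part of GEO or BLAST; the function names are illustrative, and the nucleotide check assumes only the letters A, C, G, T, U and N, which is stricter than the full set of ambiguity codes.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            chunks.append(line.upper())  # sequences may wrap over many lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_nucleotide(sequence):
    """Assumed check: non-empty and made only of A, C, G, T, U or N."""
    return bool(sequence) and set(sequence) <= set("ACGTUN")
```

A query suitable for a blastn comparison should yield at least one record for which `is_nucleotide` is true.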