Project Goal (from the proposal) The overall goal of this two-year project is to establish a comprehensive, easily accessible public resource database

Project Goal (from the proposal)

The overall goal of this two-year project is to establish a comprehensive, easily accessible public resource database of images, videos, and animations of cells from a variety of organisms, including both cell architecture and intracellular functionalities, as well as to stimulate the economy through the creation and retention of 18 positions (7 full-time equivalents) and immediate deployment.

Team

Caroline Kane, Principal Investigator, University of California, Berkeley
John Murray, Co-Principal Investigator, University of Pennsylvania
Janet Iwasa, Co-Principal Investigator, Harvard Medical School
Joan Goldberg, Executive Director, American Society for Cell Biology
David Orloff, Manager, Image Library, American Society for Cell Biology
John Hufnagle, Scientific Informatics Developer, MBL

www.cellimagelibrary.org/pages/personnel

Expert Annotation: The Value Add

• 11 annotators
• They often solicit and upload images
• They are often in contact with the scientists who produced the images

Gregory Antipa, San Francisco State University
Carrie Baker Brachmann
Margaret I. Davis, National Institutes of Health, National Institute on Alcohol Abuse and Alcoholism
Keigi Fujiwara, University of Rochester
Catherine Galbraith, National Institutes of Health
Yu-Chen Hwang, University of California, Santa Cruz
Wallace Ip, University of Cincinnati College of Medicine
Caroline McKeown, The Scripps Research Institute
Linda Parysek, University of Cincinnati College of Medicine
Ginger Withers, Whitman College
Chris Woodcock, University of Massachusetts Amherst

Annotation Information

• Image Description
• Ontology terms
• Attribution
  1. Names
  2. PubMed IDs
  3. Citations
  4. Links
  5. Dates
• Dimensional

Multiple Categories of Ontologies

• Categories including:
  – Biological Sources: NCBI, cell type, cellular component
  – Biological Context: biological process, molecular function
  – Imaging Methods
  – Sample Preparation
• Ontologies provide a controlled vocabulary
• Useful for searching and for browse categorization

Ontologies

• NCBI Organism Classification (NCBITaxon)
• Gene Ontology (GO)
  – biological_process
  – molecular_function
  – cellular_component
• Cell Type (CL)
• Cell Line (MCC)
• Human Development (EHDA)
• Mouse Gross Anatomy (EMAP)
• Plant Growth (PO)
• Teleost Anatomy (TAO)
• Xenopus Anatomy (XAO)
• Zebrafish Anatomy (ZFA/ZFS)
• Human Disease (DOID)
• Mouse Pathology (MPATH)
• Biological Imaging Methods (BIM) …the project now controls this ontology

Image Lifecycle

[Diagram: Image Data Upload → Annotation → Publish & Index → Library, with Edit/Save and Retract actions]
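One way to read the lifecycle diagram is as a small state machine. The sketch below is illustrative only: the stage order comes from the slide, but the action names and the points where Edit/Save and Retract attach are assumptions, not the project's implementation.

```python
from enum import Enum


class Stage(Enum):
    UPLOAD = "Image Data Upload"
    ANNOTATION = "Annotation"
    PUBLISH_INDEX = "Publish & Index"
    LIBRARY = "Library"


# Allowed transitions. The forward path follows the slide; where Edit/Save
# and Retract attach is an assumption made for this sketch.
TRANSITIONS = {
    (Stage.UPLOAD, "annotate"): Stage.ANNOTATION,
    (Stage.ANNOTATION, "edit_save"): Stage.ANNOTATION,  # assumed: stays in annotation
    (Stage.ANNOTATION, "publish"): Stage.PUBLISH_INDEX,
    (Stage.PUBLISH_INDEX, "indexed"): Stage.LIBRARY,
    (Stage.LIBRARY, "retract"): Stage.ANNOTATION,       # assumed: returns for re-editing
}


def advance(stage: Stage, action: str) -> Stage:
    """Return the next stage, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(stage, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {stage.value!r}") from None
```

For example, `advance(Stage.UPLOAD, "annotate")` moves an image into annotation, while `advance(Stage.UPLOAD, "publish")` raises `ValueError` because an unannotated image cannot be published in this model.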

System Components

[Diagram: components on the Server (Harvard): OMERO Image Repository Server (www.openmicroscopy.org); PostgreSQL DB; disk storage (index, image data); Web Application; Annotation Web Application. Incoming traffic: Image Upload, Library Browser Requests, Annotation Browser Requests.]

Image Upload Submission

[Diagram: Image Data Upload → Annotation → Publish & Index → Library, with Edit/Save and Retract actions]

Image Data Upload

• Submitter downloads the Java upload application
• Raw image data files selected (105 image file formats supported)
• Submitter contact information supplied
• Submitter-supplied image description (not visible in the Library), which contains technical image details to be used by the annotators
• Choose license type

Upload Process & Components

[Diagram: Java Upload App (Submitter Machine) → HTTP → Importer Worker Process → OMERO Image Repository, on the Production Server (Harvard) with PostgreSQL DB and disk storage (index, image data)]

Image Lifecycle

[Diagram: Image Data Upload → Annotation → Publish & Index → Library, with Edit/Save and Retract actions]

Annotation Process & Components

[Diagram: components on the Server (Harvard): Apache Server; Annotation Web Application (Django); OMERO Image Repository Server; PostgreSQL DB; disk storage (index, image data)]

Image Lifecycle

[Diagram: Image Data Upload → Annotation → Publish & Index → Library, with Edit/Save and Retract actions]

Publish

[Diagram: components on the Server (Harvard): Browser Publish request; Annotation Web Application; Publish; OMERO Image Repository Server; Library Custom Indexing Plug-in; Lucene Indexer; PostgreSQL DB; disk storage (index, image data)]

Indexing

• OMERO repository provides a way for developers to add their own custom indexing step in order to generate custom search indexing fields and values.

• Custom indexing plug-in, written in Java and configured into the OMERO system.

• Each image upon modification is presented to the custom plug-in

Cell Library Custom Indexing: Generating Index Values

• Custom Lucene document index fields
  – Id
  – Ontology information for each term in each ontology category
    • term id
    • parent id
    • ancestor ids
    • term description
    • synonym description
  – attribution (names, pubmed, citations, urls)
  – is_recommended (for front page/browse poster child image)
  – is_video
  – description
  – license type
  – publish date (useful for Recent browsing)
  – dimensions
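The per-term ontology fields listed above can be sketched as follows. This is a minimal Python illustration, not the project's Java plug-in: the field-name suffixes, the sample term ids, and the single-parent simplification (real ontologies may give a term several parents) are all assumptions.

```python
# Hypothetical in-memory ontology: term id -> (parent id, description, synonyms).
ONTOLOGY = {
    "T:1": (None, "root", []),
    "T:2": ("T:1", "rodents", ["Rodentia"]),
    "T:3": ("T:2", "mouse", ["Mus"]),
}


def ancestor_ids(term_id, ontology):
    """Walk parent links up to the root; return ancestors, nearest first."""
    ancestors = []
    parent = ontology[term_id][0]
    while parent is not None:
        ancestors.append(parent)
        parent = ontology[parent][0]
    return ancestors


def index_fields(prefix, term_id, ontology):
    """Build index field names and values for one annotated ontology term."""
    parent, description, synonyms = ontology[term_id]
    return {
        f"{prefix}_term_id": term_id,
        f"{prefix}_parent_id": parent or "",
        f"{prefix}_ancestor_ids": " ".join(ancestor_ids(term_id, ontology)),
        f"{prefix}_term_description": description,
        f"{prefix}_synonym_description": " ".join(synonyms),
    }
```

Storing ancestor ids with each image lets a search on a broad term find images annotated with any of its more specific terms without walking the ontology at query time.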

Ontology Data Scripting

[Diagram: BioPortal Ontology REST services → Download latest ontology .obo file (Ruby) → Parse .obo file (custom BioJava) → JSON data → Populate PostgreSQL ontology tables (Ruby) → PostgreSQL DB]
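The parse step of this pipeline (.obo file in, JSON data out) can be illustrated with a short sketch. The project used Ruby and a custom BioJava parser; the Python below is a stand-in that handles only the `id`, `name`, and `is_a` tags of `[Term]` stanzas, and the sample terms are made up.

```python
import json

# Made-up sample in OBO flat-file syntax; the ids and names are illustrative.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example root

[Term]
id: EX:0000002
name: example child
is_a: EX:0000001 ! example root
"""


def parse_obo(text):
    """Collect id, name and is_a parents from each [Term] stanza."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"id": None, "name": None, "is_a": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0]  # drop the trailing "! comment"
            if tag == "is_a":
                current["is_a"].append(value)
            elif tag in ("id", "name"):
                current[tag] = value
    return terms


def to_json(terms):
    """Serialize parsed terms as the JSON handed to the database loader."""
    return json.dumps(terms, indent=2)
```

Calling `to_json(parse_obo(SAMPLE_OBO))` yields a list of two term records, the second pointing at the first through its `is_a` list.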

Indexing Ontology Terms

Mapping file:

    …"field_mappings" : [
      {
        "module" : "web_annotation_module",
        "namespace" : "com.glencoesoftware.ilib.ann:ncbi",
        "name" : "NCBIORGANISMALCLASSIFICATION",
        "index_field_name_prefix" : "ncbi",
        "ontologies" : [
          {
            "db_table_name" : "ncbis",
            "model_klass" : "Ncbi",
            "onto_term_regex_pattern" : "NCBITaxon:[0-9]*",
            "ontology_id" : "1023"
          }
        ]
      },….

Annotation XML fragment:

    ...
    <entry>
      <ns>com.glencoesoftware.ilib.ann:celltype</ns>
      <name>CELLTYPE</name>
      <value>Ciliated Protist</value>
    </entry>
    <entry>
      <ns>com.glencoesoftware.ilib.ann:ncbi</ns>
      <name>NCBIORGANISMALCLASSIFICATION</name>
      <value>NCBITaxon:44030</value>
    </entry>
    ...
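A rough sketch of how a mapping entry might be applied to an annotation entry: the namespace and name select the mapping, and the regex decides whether the value is an ontology term id or free text. The dictionary keys mirror the mapping file above; the `classify` function and the `_term_id` / `_free_text` field suffixes are hypothetical.

```python
import re

# One mapping entry, flattened from the keys shown in the mapping file.
FIELD_MAPPING = {
    "namespace": "com.glencoesoftware.ilib.ann:ncbi",
    "name": "NCBIORGANISMALCLASSIFICATION",
    "index_field_name_prefix": "ncbi",
    "onto_term_regex_pattern": "NCBITaxon:[0-9]*",
}


def classify(entry, mapping):
    """Return (index field name, value), or None if the mapping does not apply."""
    if entry["ns"] != mapping["namespace"] or entry["name"] != mapping["name"]:
        return None
    prefix = mapping["index_field_name_prefix"]
    # fullmatch requires the whole value to look like an ontology term id.
    if re.fullmatch(mapping["onto_term_regex_pattern"], entry["value"]):
        return (f"{prefix}_term_id", entry["value"])
    # Assumed fallback: values that are not term ids are indexed as free text.
    return (f"{prefix}_free_text", entry["value"])
```

With the NCBI entry from the XML fragment, `classify` returns `("ncbi_term_id", "NCBITaxon:44030")`; the CELLTYPE entry returns `None` because it belongs to a different namespace.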

Additional Indexing Artifacts

• Generation of db data to support efficient Library browsing
  – Entries made for each ontology term in use

Image Lifecycle

[Diagram: Image Data Upload → Annotation → Publish & Index → Library, with Edit/Save and Retract actions]

System Components

[Diagram: components on the Server (Harvard): Apache; Passenger Container; Annotation Web Application; Jetty Servlet Container; several Library Web Service instances; OMERO Server; PostgreSQL DB; disk storage (index, image data)]

Connecting to the OMERO Server

[Diagram: components on the Server (Harvard): Apache (8080 shown alongside); Passenger Container; Annotation Web Application (Django/Python) with OMERO Ice Middleware (Python); Jetty Servlet Containers (8081, 8082, 8083, 8084, 8085) running the Library Web Service (Java) with OMERO Ice Middleware (Java), reached over a REST-like interface; OMERO Server (Java)]

Library Web Service (Java) operations:

• search
• get image annotation data
• convert video-to-flash
• get raw image bytes
• get OME-TIFF image bytes

Library Basic Search

[Figure: callouts labeled Primary Weighting and Secondary Weighting]

Library Advanced Search

Advanced Search

• If the ontology search value is an exact match for an existing term, the search returns matches against that term and its descendant terms, e.g. “rodentia” will match rat, mouse, etc.

• If the ontology search value does not match an existing ontology term, a simple text-match search is run against that ontology category
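The two search rules can be sketched in a few lines of Python. The term ids, image ids, and in-memory tables below are invented for illustration; the actual service runs against the Lucene index.

```python
# Hypothetical ontology fragment: term id -> (name, parent id).
TERMS = {
    "T:1": ("rodentia", None),
    "T:2": ("rat", "T:1"),
    "T:3": ("mouse", "T:1"),
}

# Hypothetical index: image id -> (annotated term id or None, free-text value).
IMAGES = {
    101: ("T:2", "rat"),
    102: ("T:3", "mouse"),
    103: (None, "mouse-like cell culture"),
}


def descendants(term_id):
    """Return the ids of all terms below term_id in the hierarchy."""
    found = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for tid, (_, parent) in TERMS.items():
            if parent == current and tid not in found:
                found.add(tid)
                stack.append(tid)
    return found


def advanced_search(value):
    """Exact term match searches the term and its descendants; otherwise text match."""
    query = value.strip().lower()
    exact = [tid for tid, (name, _) in TERMS.items() if name == query]
    if exact:
        wanted = set(exact)
        for tid in exact:
            wanted |= descendants(tid)
        return sorted(i for i, (tid, _) in IMAGES.items() if tid in wanted)
    # No such term: fall back to a plain substring match on the text value.
    return sorted(i for i, (_, text) in IMAGES.items() if query in text.lower())
```

Here `advanced_search("rodentia")` returns the rat and mouse images through the descendant rule, while `advanced_search("culture")` matches no term and falls back to the text search.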

Library Browse

• Categories
  – Cell Process (GO biological_process)
  – Cellular Component (GO cellular_component)
  – Cell Type (cell type CL)
  – Organism (NCBITaxon)
• Sub-categories consist of all ontology terms currently annotated to images…captured during Indexing phase
• Efficiency (NCBI 500K+)
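The efficiency point can be illustrated with a small sketch: browse sub-categories are built only from terms that actually annotate images, so the browse view never has to scan an ontology with 500K+ entries. The data layout, function name, and term ids below are assumptions.

```python
from collections import Counter


def browse_counts(image_terms):
    """Count images per in-use term, grouped by browse category.

    image_terms maps image id -> {category: [term ids]}.
    """
    counts = {}
    for terms_by_category in image_terms.values():
        for category, term_ids in terms_by_category.items():
            # set() so an image counts once per term even if annotated twice.
            counts.setdefault(category, Counter()).update(set(term_ids))
    return counts


# Hypothetical annotations gathered during the indexing phase.
ANNOTATED = {
    1: {"organism": ["NCBITaxon:A"], "cell_type": ["CL:X"]},
    2: {"organism": ["NCBITaxon:A", "NCBITaxon:A"]},
    3: {"organism": ["NCBITaxon:B"]},
}
```

Calling `browse_counts(ANNOTATED)` gives one counter per category, holding only the terms that appear on at least one image.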

Some Image Sources

• Journals
  – Journal of Cell Biology
  – Molecular Biology of the Cell
  – The Plant Cell
  – Plant Physiology

Some Sources and Contributors

• Don W. Fawcett’s The Cell
• Some images from researchers with MBL ties
  – Clara Franzini-Armstrong
  – Rudolf Oldenbourg

Programmatic Access

• Jetty web service interface is externally available.
  – Search
  – Image metadata
  – Raw & OME-TIFF download formats

Statistics

• February stats
  – 6,635 Visits
  – 5,093 Absolute Unique Visitors
  – 31,609 Pageviews

Future Enhancements

• Themed collections with descriptive content
• Image tagging
• Faceted searching (Solr)

Summary

• Research tool with raw image data available for future image processing

• Image Submissions always accepted…contact David Orloff [email protected]