Interoperation between InterMines Legume Federation, June 22, 2015 Vivek Krishnakumar Chris Town J. Craig Venter Institute


Page 1: Interoperation between InterMines

Interoperation between InterMines

Legume Federation, June 22, 2015

Vivek Krishnakumar

Chris Town

J. Craig Venter Institute

Page 2: Interoperation between InterMines

InterMine in a nutshell

• Open-source data warehouse software
• Integration of complex biological data
• Parsers for common biological data formats
• Extensible framework for custom data
• Cookie-cutter interface, highly customizable
• Interact using sophisticated web query tools
• Programmatic access using web-service API

Page 3: Interoperation between InterMines

Open-source Project

• Source code available online
• Distributed with the GNU LGPL license
• GitHub Repo: https://github.com/intermine/intermine
• GitHub Organization: https://github.com/intermine

Top-level contents of intermine/intermine (from the repository screenshot): bio, biotestmine, config, flymine, humanmine, imbuild, intermine, testmodel, .gitignore, .travis.yml, LICENSE, LICENSE.LIBS, README.md, RELEASE_NOTES

Page 4: Interoperation between InterMines

Richard N. Smith et al. Bioinformatics 2012;28:3163-3165

InterMine system architecture

Page 5: Interoperation between InterMines

InterMine system architecture

Web Application
• Java Server Pages (JSP), HTML, JS, CSS
• Interfaces with Java Servlets and IM web-services

Web Server
• Tomcat 7.0.x, serves Web application ARchive (WAR) file
• ant based build system using Java SDK

Database Server
• PostgreSQL 9.2 or above
• range query, btree, gist enabled (refer to the docs: http://intermine.readthedocs.org/en/latest/system-requirements/)

Page 6: Interoperation between InterMines

Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472

InterMine web services

http://iodocs.labs.intermine.org

JBrowse

Page 7: Interoperation between InterMines

Federated Authentication

• Apart from the standard login scheme (username/password), InterMine supports industry standard OAuth2 based login flows, implemented by Google, GitHub, Agave, etc.

• ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure

• Documentation available here: http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect

Page 8: Interoperation between InterMines

Interoperability?

• Ability of InterMine instances to communicate ‘automatically’ with each other
• By way of leveraging web services
• Questions to be answered:
  - What do they say to each other?
  - How do they say it?
  - What mechanisms are used?
  - Enabling these mechanisms…

Page 9: Interoperation between InterMines

Data Model

• Data Model === Schema of InterMine instance
• Defined in XML format
• Core data model (based on the Sequence Ontology, SO) can be extended to suit requirements
• Access a mine's data model in JSON format: http://MINE_URL/service/model/?format=json
• Compatibility of data models across mines ensures interoperability
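One way to gauge how compatible two mines are is to compare the classes and attributes their models expose. The sketch below is illustrative only: it assumes the JSON returned by the model endpoint nests class definitions under `model` → `classes`, each with an `attributes` mapping, and the two inline models are made-up stand-ins rather than real mine schemas. The helper names (`model_url`, `fetch_model`, `shared_schema`) are hypothetical.

```python
import json
from urllib.request import urlopen


def model_url(mine_url):
    """Build the model endpoint URL from a mine's base URL."""
    return mine_url.rstrip("/") + "/service/model/?format=json"


def fetch_model(mine_url):
    """Download and parse a mine's data model (requires network access)."""
    with urlopen(model_url(mine_url)) as response:
        return json.load(response)


def class_attributes(model_json):
    """Map each class name to the set of its attribute names.

    Assumes the layout {"model": {"classes": {name: {"attributes": {...}}}}}.
    """
    classes = model_json["model"]["classes"]
    return {
        name: set(definition.get("attributes", {}))
        for name, definition in classes.items()
    }


def shared_schema(model_a, model_b):
    """Return classes present in both models with the attributes they share."""
    attrs_a = class_attributes(model_a)
    attrs_b = class_attributes(model_b)
    return {
        name: attrs_a[name] & attrs_b[name]
        for name in attrs_a.keys() & attrs_b.keys()
    }


# Made-up miniature models standing in for two mines' responses.
mine_a = {"model": {"classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}, "symbol": {}}},
    "Protein": {"attributes": {"name": {}}},
}}}
mine_b = {"model": {"classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}, "briefDescription": {}}},
    "Allele": {"attributes": {"name": {}}},
}}}

common = shared_schema(mine_a, mine_b)
print(common)  # {'Gene': {'primaryIdentifier'}}
```

Against live mines, `fetch_model("http://MINE_URL")` would supply the two inputs; queries restricted to the shared classes and attributes should then be portable between them.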

Page 10: Interoperation between InterMines

Advantages of common data model

• Data mining scripts developed for one mine are immediately compatible with others
• Promotes crowdsourcing: tools/widgets/parsers written by one or more groups can be easily reused by others
• Enables cross species analysis
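As a rough illustration of script portability, the sketch below builds one path query and points it at several mines by changing only the base URL. It assumes the mines share the `Gene` class with `primaryIdentifier` and `symbol` attributes, that the model is named `genomic`, and that each mine serves results at `/service/query/results`; the mine URLs and the gene symbol `ABC1` are placeholders, not real resources.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_query_xml(root_class, fields, constraint_path, value):
    """Serialize a simple one-constraint path query as XML."""
    view = " ".join(f"{root_class}.{field}" for field in fields)
    return (
        f'<query model="genomic" view={quoteattr(view)}>'
        f'<constraint path={quoteattr(constraint_path)} op="=" '
        f"value={quoteattr(value)}/>"
        "</query>"
    )


def results_url(mine_url, query_xml):
    """Build the results URL for a query against one mine."""
    params = urlencode({"query": query_xml, "format": "json"})
    return mine_url.rstrip("/") + "/service/query/results?" + params


# Hypothetical mine base URLs; substitute real ones.
MINES = {
    "MineA": "https://mine-a.example.org/minea",
    "MineB": "https://mine-b.example.org/mineb",
}

query_xml = build_query_xml(
    "Gene", ["primaryIdentifier", "symbol"], "Gene.symbol", "ABC1"
)

# The same query text is reused unchanged for every mine.
urls = {name: results_url(base, query_xml) for name, base in MINES.items()}
for name, url in urls.items():
    print(name, url)
```

The only per-mine input is the base URL, which is the practical payoff of a shared model: if a mine renamed or dropped `Gene.symbol`, the query would need a per-mine variant.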

Page 11: Interoperation between InterMines

Available tools

• Multi-mine search tool: https://github.com/alexkalderimis/multimine-search-tool
  - Based on InterMine Lucene-based search index
  - Allows for interoperation when data models are different
• Integration based on Homologs:
  - Ontology integration using `dagify`: https://github.com/intermine/dagify
  - Pathway Integration by way of collating shared pathways
• InterMine Staircase: http://staircase.herokuapp.com
  - Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services
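The idea behind a cross-mine keyword search can be sketched as a fan-out over each mine's search web service followed by a merge of the hits. This is not the multimine-search-tool's implementation. It assumes each mine exposes `/service/search?q=...` and returns JSON with a `results` list whose entries carry `type`, `relevance`, and `fields`; the function names and the injectable `fetch` parameter are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(mine_url, term):
    """Build the keyword-search URL for one mine."""
    params = urlencode({"q": term})
    return mine_url.rstrip("/") + "/service/search?" + params


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def multi_mine_search(mines, term, fetch=fetch_json):
    """Run the same keyword search on every mine and merge the hits.

    `mines` maps a display name to a base URL. `fetch` can be swapped
    for a stub so the merge logic can be exercised offline.
    """
    hits = []
    for mine_name, mine_url in mines.items():
        payload = fetch(search_url(mine_url, term))
        for result in payload.get("results", []):
            hits.append({
                "mine": mine_name,
                "type": result.get("type"),
                "relevance": result.get("relevance", 0.0),
                "fields": result.get("fields", {}),
            })
    # Highest-scoring hits first; ties keep the mines' original order.
    return sorted(hits, key=lambda hit: hit["relevance"], reverse=True)
```

Because the search works on indexed text rather than on model paths, the merge does not require the mines to share a data model. Relevance scores from different indexes are not strictly comparable, so the combined ordering is only a rough guide.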

Page 12: Interoperation between InterMines

InterMine Staircase

Page 13: Interoperation between InterMines

InterMine Staircase

Configure access to multiple mines

Page 14: Interoperation between InterMines

InterMine Staircase

Cross-mine search

Page 15: Interoperation between InterMines

InterMine Staircase

Filter results by facets

Page 16: Interoperation between InterMines

InterMine Staircase

Prepare and enrich lists

Page 17: Interoperation between InterMines

InterMine Staircase

Perform mine-to-mine list conversions

Page 18: Interoperation between InterMines

InterMine Staircase

App/tool compatibility

Page 19: Interoperation between InterMines

InterMine Staircase

Application model

MedicMine SoyMine....

Page 20: Interoperation between InterMines

Available Reference Mines

• ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
  - Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
  - Leverages both data warehousing and federation methods
  - Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
• MedicMine: https://github.com/jcvi-plant-genomics/intermine/
  - Warehouse for Medicago truncatula A17 genomic data
  - Houses a variety of data: genes, proteins, function, expression
• PhytoMine: https://github.com/JoeCarlson/intermine/
  - Warehouse for 47 different Angiosperm genomes
  - Developed on a Chado to InterMine migration path
  - Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
• FlyMine: https://github.com/intermine/intermine/

Page 21: Interoperation between InterMines

Recommendations and Challenges

• Recommendations:
  - Develop core plant InterMine model
  - Follow InterMine guidelines
  - Learn from prior initiatives - InterMOD
• Challenges:
  - Users/developers are used to current way of doing things
  - Time taken to adapt to common data model and/or software stack
  - Difficult to arrive at consensus with diverse group

Page 22: Interoperation between InterMines

Acknowledgments

• InterMine Team: Gos Micklem, Julie Sullivan, Alex Kalderimis, Richard Smith, Sergio Contrino, Josh Heimbach et al.
• Araport Team: Chris Town, Jason Miller, Matt Vaughn, Maria Kim, Svetlana Karamycheva, Erik Ferlanti, Chia-Yi Cheng, Benjamin Rosen, Irina Belyaeva