Semantic web technologies applied to bioinformatics and laboratory data management
Toni Hermoso Pulido
Bioinformatics Core Facility
http://biocore.crg.cat
THE CLASSICAL WEB
> Syntax: markup languages (HTML, XHTML, etc.)
> Content: text inside the tags (or as attributes)
> Style: the HTML tags themselves; CSS (in content or as external files)
Former WWW logo, by Robert Cailliau
Tim Berners-Lee, Robert Cailliau. CERN (1990)
WEB 2.0
> Buzzword, popularised by Tim O'Reilly.
> The term "Web 2.0" (2004-present) is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web.
> Examples of Web 2.0 include web-based communities, hosted services, web applications, social-networking sites, video-sharing sites, wikis, blogs, mashups, and folksonomies.
> AJAX, RSS, Web APIs
wikis may allow anyone to edit
wikis are intended to be easy to use
wiki content is easy to link
wikis support tracking of all changes
wikis may allow uploading different media
Wiki Wiki !
WikiWikiWeb. Ward Cunningham - 1994
MediaWiki
> Most popular wiki software
> The software behind the Wikimedia Foundation projects.
> The best-known installation is Wikipedia: http://www.wikipedia.org
> First version: 2002. Before that, Wikipedia ran on UseModWiki (a Perl wiki).
Gene Wiki: Gene annotation project in Wikipedia
http://en.wikipedia.org/wiki/Portal:Gene_Wiki
> Brings relevant information on human genes closer to end users
> Manual collaborative annotation plus automated external references added by bot software
> Wikipedia portal within Molecular and Cellular Biology Project
Published September 2009
GENE WIKI
> Example of a wiki page: Reelin
GENE WIKI
> Example of a wiki category page: Human proteins
GENE WIKI
> Example of a wiki source page: Reelin
GENE WIKI
> Example of a wiki template page: Reelin
Web parsing / scraping
> To get information from an HTML source (wikis included)
> Download tools: Lynx, Wget, Perl LWP, Perl WWW::Mechanize, Python Beautiful Soup
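A minimal sketch of the scraping step, in Python. It assumes the page has already been downloaded with one of the tools above, and it uses the standard library's `html.parser` instead of Beautiful Soup so that it runs without extra packages. The sample HTML and the EC-number pattern are illustrative assumptions, not taken from a real page.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# EC numbers have four dot-separated fields; trailing fields may be "-".
EC_PATTERN = re.compile(r"\b\d+(?:\.(?:\d+|-)){3}")


def extract_ec_numbers(html):
    """Return the EC numbers found in the text of an HTML page, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return EC_PATTERN.findall(" ".join(parser.chunks))


# Hypothetical page fragment, standing in for a downloaded wiki page.
SAMPLE_HTML = """
<html><body>
<table class="infobox">
  <tr><th>EC number</th><td><a href="#">3.4.21.-</a></td></tr>
</table>
<p>Unrelated enzyme: EC 1.1.1.1</p>
</body></html>
"""

print(extract_ec_numbers(SAMPLE_HTML))
```

Scraping of this kind depends on the page layout staying stable, which is one reason to prefer a structured interface when the site offers one.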
Web parsing / scraping
> Processing content (example, EC: 3.4.21.-)
Regular expressions: s/ …
MediaWiki API
> Endpoint: http://en.wikipedia.org/w/api.php
> Common scripting with Python or Perl: MediaWiki::Bot
> You can read information from a wiki and store information in it.
MediaWiki API
> Easier to extract data:
Retrieves wiki syntax, not direct HTML content
Useful when templates are used
Can retrieve all pages from a category
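The two operations above can be sketched in Python as follows. The request parameters (`action=query`, `list=categorymembers`, `prop=revisions`) are standard MediaWiki API parameters. The helper names, the User-Agent string and the sample response are assumptions made for this example. Nothing is downloaded unless `fetch_json` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=50):
    """Build a query listing the pages that belong to a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def page_source_url(title):
    """Build a query returning the wiki syntax (not the HTML) of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_json(url):
    """Download and decode one API response (needs network access)."""
    # The User-Agent value is a placeholder; use one that identifies your script.
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def titles_from_category_response(data):
    """Extract page titles from a decoded categorymembers response."""
    return [member["title"] for member in data["query"]["categorymembers"]]


# Hand-written sample in the shape of a categorymembers response (made-up ids).
SAMPLE_RESPONSE = json.loads(
    '{"query": {"categorymembers": ['
    '{"pageid": 1, "ns": 0, "title": "Reelin"},'
    '{"pageid": 2, "ns": 0, "title": "Insulin"}]}}'
)

print(category_members_url("Human proteins"))
print(page_source_url("Reelin"))
print(titles_from_category_response(SAMPLE_RESPONSE))
```

Large categories are returned in batches, so a real script would also need to follow the continuation values included in each response.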
SEMANTIC WEB
> The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in collaboration.
Sir Tim Berners-Lee
The Berners-Lee Semantic Web Birthday Cake
http://www.mkbergman.com/231/from-data-federation-pyramid-to-the-semantic-web-birthday-cake/
The evolution of the Web
SNPedia: a semantic wiki for human genetic studies
> http://www.snpedia.com (online since 2007)
> Semantic MediaWiki (first releases 2005)
> Database of SNPs (single nucleotide polymorphisms)
> In September 2009, the website reported 7,938 SNPs in its database.
> Predictive medicine reports built on SNPedia with Promethease, an application that compares your genotyping data against SNPedia
SNPedia
> Example of a wiki page: Rs333
SNPedia
> Example of wiki page properties: Rs333
SNPedia
> Example of a page property (disease) value: HIV
Semantic MediaWiki Data Types
* Type:Page: links to pages (the default)
* Type:String: text strings that are not longer than 250 letters
* Type:Number: integer and decimal numbers with optional exponent
* Type:Boolean: restricts the value of a property to true/false (also 1/0 and yes/no)
* Type:Date: specifies particular points in time
* Type:Text: like Type:String but can have unlimited length; the trade-off is that values of this type cannot be selection or sort criteria in queries.
* Type:Code: like Type:Text but with additional precautions to preserve special formatting as used for technical texts. The value displays as regular text everywhere else (query results, factbox, "Pages using the property", etc.).
* Type:Temperature: variation of Type:Number that supports units of temperature (cannot be user-defined since converting temperature units is more complicated than multiplying by a conversion factor).
* Type:Telephone number: validates and stores international telephone numbers based on the RFC 3966 standard
* Type:Record: type for compound property values that consist of a short list of values with fixed type and order
Semantic MediaWiki Data Types
For specifying URLs and emails, there are some special variations of the string data type:
* Type:URL: displays an external link to its URL object
* Type:Email: displays an e-mail address as a link (with mailto:)
* Type:Annotation URI: similar to Type:URL but with some technical differences in SMW's RDF export
Some extensions provide further types:
* Type:Geographic coordinate (provided by Semantic Maps): describes geographic locations. Different forms of geographic coordinates are supported.
http://semantic-mediawiki.org/wiki/Help:Properties_and_types
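Properties of these types are set inside the page text with Semantic MediaWiki's `[[Property::value]]` annotation syntax. The Python sketch below pulls such annotations out of a page source. The property names and the sample text are hypothetical, and the regular expression only covers the simple forms shown in the comment.

```python
import re

# Matches [[Property::value]] and [[Property::value|display text]].
# A plain link or category such as [[Category:SNP]] has a single colon
# and is therefore not matched.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def parse_annotations(wikitext):
    """Return {property: [values]} for the annotations in a page source."""
    properties = {}
    for name, value in ANNOTATION.findall(wikitext):
        properties.setdefault(name.strip(), []).append(value.strip())
    return properties


# Hypothetical page source; the property names are made up for illustration.
SAMPLE_PAGE = (
    "This SNP lies on chromosome [[Chromosome::3]] and has been "
    "associated with [[Disease::HIV|resistance to HIV]].\n"
    "[[Category:SNP]]"
)

print(parse_annotations(SAMPLE_PAGE))
```

The values come back as plain strings; the wiki itself interprets them according to the type declared for each property.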
SNPedia RDF behind a wiki page
RDF (Resource Description Framework)
Triple {subject, property/predicate, object}
Defining & describing data and relations among data
Suitable to attach metadata to certain resources
Understood by machines (not so much by humans)
Normally in XML format
Alternative: RDFa (in XHTML pages directly)
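To make the triple structure concrete, here is a small Python sketch that holds two triples and prints them in N-Triples, a line-based RDF serialisation chosen here instead of XML because each line is exactly one triple. The URIs under `example.org` and the property names are hypothetical. Only backslashes and double quotes are escaped in literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(triple):
    """Serialise one triple as a line of N-Triples."""
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical URIs; a real wiki would export its own.
BASE = "http://example.org/wiki/"
TRIPLES = [
    # The object is another resource (a page).
    Triple(BASE + "Rs333", BASE + "Property:Disease", BASE + "HIV"),
    # The object is a literal value.
    Triple(BASE + "Rs333", BASE + "Property:Chromosome", "3", obj_is_literal=True),
]

for triple in TRIPLES:
    print(to_ntriples(triple))
```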
RDF: Gene Ontology
OWL: Gene Ontology
SPARQL
RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language.
Example: querying Wikipedia data through DBpedia: http://dbpedia.org/sparql
Example query of biological resources:
http://www.semantic-systems-biology.org/biogateway/querying
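A sketch of what a query against the DBpedia endpoint could look like, written in Python. It only builds the request URL and sends nothing. The resource `http://dbpedia.org/resource/Reelin` is assumed to exist, and the `format` parameter is a convenience of DBpedia's endpoint software; other endpoints may expect an HTTP `Accept` header instead.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Ask for the English abstract of one resource.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Reelin> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""


def sparql_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET request URL."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",
    }
    return f"{endpoint}?{urlencode(params)}"


print(sparql_request_url(ENDPOINT, QUERY))
```

The `WHERE` block is a triple pattern: the subject and property are fixed, and `?abstract` is the variable to be filled in by the endpoint.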
Semantic MediaWiki vs MediaWiki (I)
Semantic MediaWiki (and other semantic add-ons) is an extension of MediaWiki.
You can do at least as much as with plain MediaWiki.
Better and more specific search capabilities
Not only free-text search on pages
It resembles relational database searching
SPARQL =~ SQL
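The difference between free-text search and property-based search can be illustrated with a toy in-memory model in Python. The pages, texts and property values below are hypothetical, and this is not how Semantic MediaWiki stores or queries data; it only shows why a query on a property is more specific than a word match.

```python
# Hypothetical wiki pages: free text plus semantic properties.
PAGES = {
    "Rs333": {
        "text": "Deletion in the CCR5 gene, linked to HIV resistance.",
        "properties": {"Disease": "HIV", "Chromosome": 3},
    },
    "Example SNP A": {
        "text": "No link to HIV has been reported for this variant.",
        "properties": {"Disease": "Asthma", "Chromosome": 5},
    },
    "Example SNP B": {
        "text": "Variant studied in asthma cohorts.",
        "properties": {"Disease": "Asthma", "Chromosome": 3},
    },
}


def free_text_search(pages, word):
    """Return titles of pages whose text contains the word (case-insensitive)."""
    needle = word.lower()
    return sorted(t for t, p in pages.items() if needle in p["text"].lower())


def property_query(pages, **conditions):
    """Return titles of pages whose properties match every condition."""
    return sorted(
        title
        for title, page in pages.items()
        if all(page["properties"].get(k) == v for k, v in conditions.items())
    )


# Free text also returns the page that merely mentions the word.
print(free_text_search(PAGES, "HIV"))
# Property queries behave like WHERE clauses in SQL.
print(property_query(PAGES, Disease="HIV"))
print(property_query(PAGES, Disease="Asthma", Chromosome=3))
```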
Semantic MediaWiki vs MediaWiki (II)
Better browsing interface (browsing through properties, not only categories)
Import and export of the logical mesh (pages together with their semantic relations).
Easier exchange of information with 3rd party applications (through RDF)
Protein-Wiki
Semantic wiki-based system for the management of a protein production service.
Currently in testing phase
In collaboration with the CRG Protein Service
Customisation built after studying their current workflow and actual needs.
Intended for internal use
Protein-Wiki. Advantages
> Cheaper approach than most commercial similar solutions
> Open-source technology, with a thriving community behind it.
> Avoidance of vendor lock-in and abusive licensing.
> Customisable to specific needs. Can be extended to other cases.
Protein-Wiki. Example Workflow
[Workflow diagram. Design: Guglielmo Roma]
Roles: Researcher, Lab Manager, Lab Member, Finance Controller (order management system)
Steps: access web interface, fill form, submit request, review scientific info, review financial info (?), create study, review study, assign study to core members, open study, retrieve SOP, perform all study steps, review study results, request review, prepare report, sign-off (?), send results/report, retrieve results/report, receive invoice
Decision points: Accept / Reject
Meetings are marked at several points of the workflow
Other labels: (quotation), (order number), (communication)
Protein-Wiki: User roles
> Researcher: submits requests to the service using pre-defined templates, views the status of his/her requests at any time, and retrieves the study reports when experiments are complete
> Lab member: can add and edit experimental data; cannot create or delete experiments
> Lab manager: can create, edit and delete experiments associated with submitted requests, using pre-defined templates
> Administrator: creation of new templates, user management and user training
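The role descriptions above can be read as a permission table. The Python sketch below encodes them that way. The action names are invented for this example, and because the slide does not say whether higher roles inherit the rights of lower ones, no inheritance is modelled. In the wiki itself such rules would live in the MediaWiki configuration rather than in a script.

```python
# Actions each role may perform, following the role descriptions.
# Action names are hypothetical labels chosen for this sketch.
ROLE_PERMISSIONS = {
    "researcher": {"submit_request", "view_request_status", "retrieve_report"},
    "lab_member": {"add_experiment_data", "edit_experiment_data"},
    "lab_manager": {"create_experiment", "edit_experiment", "delete_experiment"},
    "administrator": {"create_template", "manage_users", "train_users"},
}


def is_allowed(role, action):
    """Return True if the role is explicitly granted the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("lab_member", "edit_experiment_data"))  # granted
print(is_allowed("lab_member", "delete_experiment"))     # not granted
print(is_allowed("visitor", "submit_request"))           # unknown role
```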
Protein-Wiki: permissions & security
Login & role permissions. Roles are assigned automatically or by an administrator
Namespace-specific permissions:
Experiment:: (only lab members/managers)
Template:: (only administrators)
Page-specific permissions
By using user and parser functions extensions
Network?
Protein-Wiki Homepage
Protein-Wiki. Request Form
Anonymous
Protein-Wiki. Request Form
Researcher
Protein-Wiki. Request result page
Researcher
Protein-Wiki. Enable experiment
Lab manager
Protein-Wiki. Experiment form
Lab member
Logical input restrictions, linked to the data type
Protein-Wiki. Experiment page
Lab member
Protein-Wiki. Browse experiment properties
Protein-Wiki. Semantic properties
Administrator
Allowed values
Invalid value
Protein-Wiki. Conditional syntax
Enable certain experiment sections if asked by the researcher or lab manager
Input value restriction at the form level
Example: Only nucleotides allowed in Primer sequences
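The primer restriction can be expressed as a one-line pattern check. The Python sketch below assumes that only the four unambiguous DNA bases are accepted; IUPAC ambiguity codes and RNA bases are rejected. The function name is hypothetical, and in the wiki the same rule would be attached to the form field rather than run as a script.

```python
import re

# One or more unambiguous DNA bases, nothing else.
NUCLEOTIDES = re.compile(r"[ACGT]+")


def is_valid_primer(sequence):
    """Accept only non-empty sequences made of A, C, G and T (any case)."""
    return NUCLEOTIDES.fullmatch(sequence.strip().upper()) is not None


print(is_valid_primer("acgtTTGA"))   # letters from the allowed set only
print(is_valid_primer("ACGU"))       # contains U
print(is_valid_primer("ACG T"))      # contains a space inside the sequence
```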
Protein-Wiki. List of tasks
May be visible or not to researchers. Workload.
Different fields depending on the user's role.
Protein-Wiki. List of tasks
Lab member
Any kind of customised list can be created from semantic properties.
Conclusions (I)
Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments
Efficient collaboration between different users
Group- and role-specific permissions
Researchers, lab members, lab managers, administrators
Well-known interface. Everyone should have edited Wikipedia at least once!
Note-taking in wiki for future consultation
Conclusions (II)
Users can be both humans and robot script applications
Refined and specific queries
Logical connection with other semantics-enabled software
Easy setup of new environments (high-level programming)
Wiki templates, properties and forms vs coding and database design
Conclusions (III)
Tracking (page history and recent changes)
Unless performed by the wiki administrator, the workflow cannot be bypassed.
Unless performed by the system administrator, history cannot be forged.
Permits 3rd party quality check auditing
Acknowledgments
Bioinformatics Unit: Guglielmo Roma, Luca Cozzuto, Francesco Mancuso
Protein Service: Michela Bertero, Silvia Speroni, Miriam Alloza