AHM September 2004
Grid Services Supporting the Usage of Secure Federated, Distributed Biomedical Data
Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director Technical, Bioinformatics Research Centre, University of Glasgow
3rd September 2004
Overview of BRIDGES
Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
NeSC (Edinburgh and Glasgow) and IBM
www.brc.dcs.gla.ac.uk/projects/bridges
• Supporting project for CFG project
  – Generating data on hypertension
  – Rat, Mouse, Human genome databases
• Variety of tools used
  – BLAST, BLAT, Gene Prediction, visualisation, …
• Variety of data sources and formats
  – Microarray data, genome DBs, project partner research data, medical records, …
• Aim is integrated infrastructure supporting
  – Data federation
  – Security
Grids & Life Sciences
• Extensive Research Community
  – >1000 per research university
• Extensive Applications
  – Many people care about them
  – Health, Food, Environment, …
• Interacts with many disciplines
  – Physics, Chemistry, Maths/Statistics, Nano-engineering, …
• Huge and expanding number of databases relevant to bioinformatics community
  – Heterogeneity, Interdependence, Complexity, Change, Dirty…
• Linking in co-ordinated, secure manner full of open issues to be addressed
• Compute demands growing as more in-silico research undertaken
Database Growth
[Figure: PDB Content Growth]
• DBs growing exponentially!!!
• Bibliographic (MedLine, PubMed, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT, …)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)
Complexity of Biological Data
[Figure: levels of biological data] Nucleotide sequences, Nucleotide structures, Gene expressions, Protein structures, Protein functions, Protein-protein interaction (pathways), Cell, Cell signalling, Tissues, Organs, Physiology, Organisms, Populations
+ links to plant/crops, environmental, health, … information sources
More genomes …
Arabidopsis thaliana
mouse
rat
Caenorhabditis elegans
Drosophila melanogaster
Mycobacterium leprae
Vibrio cholerae
Plasmodium falciparum
Mycobacterium tuberculosis
Neisseria meningitidis Z2491
Helicobacter pylori
Xylella fastidiosa
Borrelia burgdorferi
Rickettsia prowazekii
Bacillus subtilis
Archaeoglobus fulgidus
Campylobacter jejuni
Aquifex aeolicus
Thermotoga maritima
Chlamydia pneumoniae
Pseudomonas aeruginosa
Ureaplasma urealyticum
Buchnera sp. APS
Escherichia coli
Saccharomyces cerevisiae
Yersinia pestis
Salmonella enterica
Thermoplasma acidophilum
Bio e-Science Projects
Bridges Project
[Figure: BRIDGES architecture]
Sites: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands
Private data (label repeated six times in the figure)
Publicly Curated Data: Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …
CFG Virtual Organisation: DATA HUB, VO Authorisation
Data access: Information Integrator, OGSA-DAI
Services: Synteny Grid Service, blast +
Grid Security
• OGSA security
  – Single sign-on based on (X.509) digital certificates
  – Establish credentials
    » Certification authority based (RAL in UK)
• Services (and clients) have APIs for fine-grained security
  – Based on GSS-API
  – Provides for authentication, but authorisation is still needed
• Various technologies for authorisation including PERMIS, CAS, …
  – Collaborating with PrivilEge and Role Management Infrastructure Standards Validation (PERMIS) team
  – Led by Prof David Chadwick, University of Salford (www.permis.org)
Security Authorisation
• PERMIS allows roles to be defined for who can do what on what
  – Policy = { Role x Target x Action }
  – Can user X invoke service Y and access or change data Z?
    » Policies created with PERMIS PolicyEditor (output is XML based policy)
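
The Policy = { Role x Target x Action } idea can be illustrated with a minimal Python sketch. This is not PERMIS code and does not use the PERMIS XML policy format; the role names, target names, users, and the `is_permitted` helper are all hypothetical, chosen only to show how "can user X do action A on target Y?" reduces to a lookup over permitted triples.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """One permitted (role, target, action) triple."""
    role: str
    target: str
    action: str


# Hypothetical policy: every name below is illustrative, not from BRIDGES.
POLICY = frozenset({
    Rule("cfg_scientist", "blast_service", "invoke"),
    Rule("cfg_scientist", "private_data", "read"),
    Rule("data_curator", "private_data", "read"),
    Rule("data_curator", "private_data", "write"),
})

# Hypothetical role assignments (who holds which role).
ROLE_ASSIGNMENTS = {
    "alice": {"cfg_scientist"},
    "bob": {"data_curator"},
}


def is_permitted(user: str, target: str, action: str) -> bool:
    """Return True only if one of the user's roles is explicitly
    granted this action on this target; anything else is denied."""
    roles = ROLE_ASSIGNMENTS.get(user, set())
    return any(Rule(role, target, action) in POLICY for role in roles)


if __name__ == "__main__":
    print(is_permitted("alice", "blast_service", "invoke"))
    print(is_permitted("alice", "private_data", "write"))
```

Keeping the policy as a set of explicit triples makes the default outcome a denial: a user with no role, or a role with no matching triple, is refused.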
Security Authorisation
• PERMIS Privilege Allocator then used to sign policies
  – Associates roles with specific users
  – Policies stored as attribute certificates in LDAP server
• When is authorisation done? Two main choices
  – Portal personalised for users based on their policies
    » If not allowed to invoke service then they do not get to see it
  – Actions of users (with given role) are authorised every time the service is invoked
    » They can see the service but potentially not be allowed to invoke it
    » Performance issues… but more likely scenario for authorisation
  – In both cases, if not explicitly agreed in policy then rejected and logged!
  – Both cases being explored
• Plan to exploit the GGF SAML AuthZ specification
  – Based on GT3.3 (currently have BLAST service in GT3.2 Final)
  – Identified issues with standards…
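
The two authorisation points described on this slide (personalising the portal versus checking on every invocation) can be contrasted in a short Python sketch. It is an illustration under stated assumptions, not the BRIDGES portal or the PERMIS API: the `ALLOWED` set, the service names, and the `visible_services` and `invoke` helpers are all hypothetical.

```python
import logging
from typing import Callable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger("authz_sketch")

# Hypothetical policy and service catalogue (illustrative names only).
ALLOWED = {
    ("researcher", "blast", "invoke"),
    ("researcher", "synteny_viewer", "invoke"),
    ("guest", "synteny_viewer", "invoke"),
}
SERVICES = ["blast", "synteny_viewer"]


def visible_services(role: str) -> List[str]:
    """Choice 1: portal personalisation. Services the role may not
    invoke are simply never listed."""
    return [s for s in SERVICES if (role, s, "invoke") in ALLOWED]


def invoke(role: str, service: str, run: Callable[[], T]) -> T:
    """Choice 2: check on every invocation. Anything not explicitly
    allowed is rejected and logged."""
    if (role, service, "invoke") not in ALLOWED:
        logger.warning("denied: role=%s service=%s", role, service)
        raise PermissionError(f"{role} may not invoke {service}")
    return run()


if __name__ == "__main__":
    print(visible_services("guest"))
    print(invoke("researcher", "blast", lambda: "job submitted"))
```

The per-invocation check costs a policy lookup on every call, which is the performance concern the slide raises, but it still applies if a user reaches a service without going through the portal.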
Where we are today!
• Information Integrator DB repository established and populated
  – … with public data sets (OMIM, HUGO, RGD, SWISS-PROT)
  – … linked to relevant resources (ENSEMBL: rat, human, mouse; MGI)
• GT3 based Grid services developed (BLAST) using own meta-scheduler
  – General usage of ScotGrid and local Condor pool
• Portal developed using IBM WebSphere
• Genome visualisation browsers
  – SyntenyVista: for viewing synteny between local/remote data sets
  – MagnaVista: for exploring genetic information across multiple (remote) resources
• Gaining experience with security technologies
  – Setting up policies with Grid security authorisation software etc
• Rolled out Alpha version of system to CFG group July '04
Lessons learned
• Public data resources openness
  – Often cannot query directly
  – Often not easy/possible to find schemas
  – Joint Data Standards Study investigating this
    » Started on 1st June and involves Digital Archiving Consultancy, Bioinformatics Research Centre (Glasgow), NeSC (Edinburgh and Glasgow)
    » Looks at technical, political, social, ethical etc issues involved in accessing and using public life science resources
    » Will liaise with NDCC
    » Interview relevant scientists, data curators/providers
    » 8 month project with final report in January
    » Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI
• GT3 not without pain! (… understatement!!!!)
  – Hopefully GT4 will be better?
www.nesc.ac.uk