

© OMII-UK 2008

March 2009

OMII-UK NEWS www.omii.ac.uk

By David Woolls, CEO CFL Software

In the last two years, CFL Software has been asked for everything: from checking whether UCAS applicants are plagiarising, to stopping music reviewers copying themselves on the Slicethepie website. And that’s in real time, often with tens of thousands of records to check per day. Not a simple task, so CFL are considering the use of cloud computing to help. We asked David Woolls for an overview of their latest work.

The problems that CFL’s customers face have a common theme: normal search methods don’t work, because the users don’t know what the question is, and straightforward pattern matching isn’t an option, because changed sentences need to be identified. We are asked for help because we have a strong background in collusion and plagiarism detection, but we now face the challenge of scaling up the methodology and making the programs work in real time.

CFL has helped the Universities and Colleges Admissions Service (UCAS) by revising our base program, Copycatch Investigator, to run 24/7 on a Sun T2000. The effect has been striking. By the second year, there had been a 26% drop in the number of applicants falling into the most serious copying category. No false positives have been reported, and no successful appeal has been lodged. With the advent of cloud computing, such a program could run as a service, with the cloud handling the peaks in demand that occur at application deadlines.

Slicethepie is a music discovery and review website that pays users who submit music reviews. Some users were abusing this system by pasting the same review for each song they listened to. We placed a clause-level checking system inside the Flash player used to play Slicethepie’s music. Now a review is only accepted if it passes tests for self-copying, relevance and brevity (among other factors). Client-side monitoring allows 10,000 reviews a day to be handled with minimal background checking and a 99% clean database. The most notable side effect is that the quality of all the reviews has improved, even though the original problem was only affecting about 3%.
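As an illustration of what a clause-level self-copying test can look like, here is a minimal Python sketch. It is not CFL's method: the clause splitting on punctuation, the use of the standard library's difflib for fuzzy matching, and every threshold and function name are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def clauses(text):
    """Split text into lower-cased clauses at common punctuation marks."""
    parts = re.split(r"[.,;:!?]+", text.lower())
    return [" ".join(p.split()) for p in parts if p.strip()]


def self_copy_ratio(new_review, earlier_reviews, threshold=0.8):
    """Fraction of clauses in new_review that closely match an earlier clause.

    The 0.8 similarity threshold is an assumed value, not a published one.
    """
    new_clauses = clauses(new_review)
    if not new_clauses:
        return 0.0
    old_clauses = [c for review in earlier_reviews for c in clauses(review)]
    copied = 0
    for clause in new_clauses:
        # A clause counts as copied if any earlier clause is similar enough.
        if any(SequenceMatcher(None, clause, old).ratio() >= threshold
               for old in old_clauses):
            copied += 1
    return copied / len(new_clauses)


def accept_review(new_review, earlier_reviews, min_words=5, max_copy=0.5):
    """Accept a review only if it is long enough and not mostly self-copied."""
    if len(new_review.split()) < min_words:
        return False
    return self_copy_ratio(new_review, earlier_reviews) <= max_copy
```

Comparing clauses rather than whole reviews means that a review reusing most of an earlier one is still caught when a word or two has been changed, while a genuinely new review passes.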

We are also developing a standalone Contextual Query search engine. The fuzzy search and specially designed indexing allow close comparison, at clause level, of complete documents of different lengths with a number of user-customisable parameters. This could be readily scaled to index specific areas of the web that are of interest to a particular customer: a concept which was confirmed at the Cloudscape event in Brussels. Grid technology will be required to handle the large volumes, and cloud technologies to deliver the service. CFL are exploring a strategic partnership with the UK National Grid Service to meet education needs, and looking into an enterprise-level offering of the software as a service in the future.

www.copycatchgold.com

Copying and clouds

Within two years, CFL technology had reduced the most serious copying on UCAS forms by 26%.


OMII-UK News—www.omii.ac.uk/wiki/Newsletter

Page 2 NEWS IN BRIEF

Would you like to help guide the future of e-Research software development? OMII-UK is organising a two-day Collaborations Workshop at NeSC in Edinburgh, starting on 30 April 2009. The workshop will be based on a flexible agenda, allowing you to suggest topics for discussion so that the subjects important to your research are represented.

The Collaborations Workshop will bring together OMII-UK staff, software developers, architects and users. It will focus on developing collaborative solutions, or approaches, to interesting e-Research problems and issues.

Info: www.tinyurl.com/d8sr5r Register: www.nesc.ac.uk/esi/events/958

Help us develop e-Research software OMII-UK Collaborations Workshop 2009

Professor Carole Goble, one of OMII-UK’s Principal Investigators, was the recipient of the inaugural Jim Gray award at the 2008 Microsoft eScience Conference in Indianapolis.

Carole was chosen for the award in recognition of her contributions to e-Science, in particular, the development of tools such as Taverna and myExperiment, and their application across a wide range of scientific domains.

www.tinyurl.com/clzmr8

Jim Gray award for Carole Goble

Data Integration in the Life Sciences (DILS) 2009 takes place on 20 June this year. It is the sixth workshop in a series that aims to foster discussion, exchange, and innovation in research and development in the areas of data integration and data management for the life sciences. The DILS workshop provides a forum for the discussion of emerging challenges for data integration in the life sciences, and the techniques that seek to address them.

www.tinyurl.com/boteqc

Data Integration in the Life Sciences

As this issue of our newsletter goes to press, I am delighted to announce that OMII-UK has been awarded funding by EPSRC to continue our work to cultivate and support community software for research.

The funding will continue our community building and outreach, as well as ensuring the continuation of our core development to support software. The UK National Grid Service has also received funding and we look forward to a continued collaboration with them.

Software sustainability is an issue that is being considered on both sides of the Atlantic, with the US National Science Foundation bringing together organisations from around the world (including OMII-UK) to define policy. In the UK, the newly formed Infrastructure SAT (Strategic Advisory Team) is advising EPSRC on a strategy that encompasses both e-Science and High Performance Computing, and we hope that this will lead to new opportunities for OMII-UK to support innovative research by ensuring the right software is always available.

OMII-UK awarded further funding

Carole Goble receives the Jim Gray award from Daron Green (left) and Tony Hey (right), both from Microsoft External Research.

e-Science The Changing Landscape

The e-Science landscape has changed significantly over the past few years, so it is a good time to consider these changes to ensure the effective embedding of e-Science.

The e-Science workshop will discuss the issues facing the e-Science community. The workshop is being organised by NeSC and will take place on 16-17 April 2009. The first day of the workshop is open to e-Science Directors, and the second day will be open to university IT service providers, and e-Science and HPC researchers. The second day will focus on meeting the researchers and will include: success stories, reports on user requirements from eUptake and a half-day village market where OMII-UK and others will present demonstrations.

www.tinyurl.com/bfkfnw

By Neil Chue Hong, Director, OMII-UK


Page 3

By Simon Laws, IBM

By Alan Williams

While working with the UK e-Science community, we spent time thinking about rendering service interfaces for features such as data access and job scheduling. We didn't have to think about how to describe the assembly of a set of such services into a working system, because the open-source Apache Tuscany implementation of the Service Component Architecture (SCA) did it for us.

SCA is a set of specifications currently being standardised at OASIS. The specifications provide a technology-neutral composition capability for assembling applications from services. They define the simple concepts of components, which have references and services. Existing SCA and non-SCA services can be assembled into compositions, and these can be re-used as new building blocks in their own right. SCA (and Tuscany) provide a policy-based model to uniformly describe and realise quality-of-service requirements across assembled services, irrespective of how individual services are implemented or deployed.

SCA scales from fine-grained assembly of local services through to coarse-grained assembly of distributed services. It is a convenient way of describing a network of services that are to be executed across a distributed set of hardware resources. Whether you consider these resources to be a grid, a cloud or more generally the basis for an SOA (Service-Oriented Architecture), the SCA concept still applies: a distributed application is modelled as an assembly of wired components.
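The idea of components that offer services, declare references and are then wired into an assembly can be pictured with a small Python model. This is only a conceptual sketch: it does not use the SCA composite format or any Apache Tuscany API, and the class names, component names and the "fetch" and "submit" services are all invented for the example.

```python
class Component:
    """A unit that offers named services and declares named references."""

    def __init__(self, name, services=None, references=()):
        self.name = name
        self.services = dict(services or {})              # service name -> callable
        self.references = {r: None for r in references}   # reference name -> wired target

    def call(self, reference, *args):
        """Invoke whatever service this reference has been wired to."""
        target = self.references[reference]
        if target is None:
            raise RuntimeError(f"{self.name}: reference '{reference}' is not wired")
        return target(*args)


class Composite:
    """An assembly of components plus the wires between them."""

    def __init__(self):
        self.components = {}

    def add(self, component):
        self.components[component.name] = component
        return component

    def wire(self, source, reference, target, service):
        """Connect a reference on one component to a service on another."""
        src = self.components[source]
        if reference not in src.references:
            raise KeyError(f"{source} declares no reference '{reference}'")
        src.references[reference] = self.components[target].services[service]


# Hypothetical example: a job scheduler that depends on a data-access service.
DATASETS = {"run-1": [1, 2, 3]}

assembly = Composite()
store = assembly.add(Component("DataAccess",
                               services={"fetch": lambda key: DATASETS.get(key, [])}))
scheduler = assembly.add(Component("JobScheduler", references=["data"]))
scheduler.services["submit"] = lambda key: sum(scheduler.call("data", key))
assembly.wire("JobScheduler", "data", "DataAccess", "fetch")
```

Because the scheduler only knows its reference by name, the data-access component could be replaced by a remote or differently implemented service without changing the scheduler, which is the point of describing the assembly separately from the implementations.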

The motivation of the Apache Tuscany project is to provide an easy-to-use, open-source services infrastructure based on the SCA specifications. Apache Tuscany exploits the extensibility of SCA to support a wide range of service implementation types and communication technologies. Tuscany's OSGi-based distributed SCA runtime allows the application deployer to match the infrastructure, technology and geographic needs of the application, from a full-blown JEE application container to a lightweight JSE runtime, or even an Android Virtual Machine.

www.tinyurl.com/aej5p3

Bringing order to service assembly

A new version of Taverna was released in December 2008 by the myGrid team – one of the three OMII-UK partners. The release comprises the Taverna 2.0 Workflow Engine and an experimental workbench interface, which showcases some of the engine's new capabilities. Taverna 2.0 is set to continue the success of its predecessor: more than 1200 downloads were recorded less than a month after its release.

Taverna 2.0 uses a re-engineered, scalable, high-performance and extensible enactment engine. It natively handles data-reference management so that data is shipped around the engine and can be cached to disk without intervention from the user. The Engine has been developed to solve real-world problems, such as the need to handle large amounts of data, and the need to iterate over, combine and collate data sets. These features have been designed to overcome problems faced in the diverse domains in which Taverna is used, such as the neurological data handled by the CARMEN project, the medical information processed by the caGrid project, or the resources used by text and data mining groups.

The performance of Taverna 2.0 has been improved. The overall effect on the execution time for a workflow can be significant (preliminary tests show a five-times improvement over Taverna 1). Workflows for the Taverna 2.0 Workflow Engine can be written using either the 1.7.1 Workbench or the experimental Taverna 2.0 Workbench. A Taverna 2.1 Workbench is due to be released in March 2009. It will benefit from a more user-friendly workflow interface, whilst retaining access to the Taverna 2.0 Engine’s capabilities.

www.mygrid.org.uk

New Taverna release Latest release of the highly successful Workflow Engine

Taverna 2.0 features at a glance:
• Handles large numbers of large data items.
• Extensible: it is already being extended to run services on caGrid.
• Includes implicit iteration over items in a data set.
• Allows configuration for combining and iterating over sets of data.
• Pipelines data for faster workflow execution, with services called and workflow outputs produced as soon as possible.
• Copes with erroneous data, preventing it from affecting the rest of the workflow run and allowing traceback of problems.
• Can run both Taverna 1 and Taverna 2 workflows.
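Implicit iteration and the combining of data sets in the list above can be illustrated with a short Python sketch. It is not Taverna code and does not use any Taverna API: the invoke function, its "cross" and "dot" option names and the one-level list handling are assumptions made for the example.

```python
from itertools import product


def invoke(service, *inputs, combine="cross"):
    """Call service once per combination of list inputs; scalars are reused.

    combine="cross" pairs every item with every other item,
    combine="dot" pairs items position by position (stopping at the shortest list).
    Only one level of list nesting is handled in this sketch.
    """
    list_positions = [i for i, value in enumerate(inputs) if isinstance(value, list)]
    if not list_positions:
        # No lists supplied: a plain single invocation.
        return service(*inputs)

    lists = [inputs[i] for i in list_positions]
    if combine == "cross":
        combos = product(*lists)
    elif combine == "dot":
        combos = zip(*lists)
    else:
        raise ValueError(f"unknown combine strategy: {combine}")

    results = []
    for combo in combos:
        args = list(inputs)
        for position, value in zip(list_positions, combo):
            args[position] = value
        results.append(service(*args))
    return results
```

A service written for a single item, such as str.upper, can then be handed a list and is called once per item, so the workflow author does not have to write the loop explicitly.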

caBIG is an information network used by cancer researchers and physicians. They use Taverna to perform data analysis that could lead to new discoveries about the causes and treatment of cancer.


Page 4

A detailed picture of the UK’s e-Research community has been developed by the JISC-funded Engage Initiative. The Initiative interviewed over fifty researchers to build up a knowledge base of the many ways in which computationally intensive research is being conducted across research domains.

Now that the e-Research picture has been created, the Initiative will review best practices and the lessons learned from the interviews, and the most promising technologies will be taken forward by a collection of Engage-funded projects. These projects were chosen because a significant contribution to the community could be achieved in a relatively short timeframe.

The Engage-funded projects cut across all areas of e-Research, from archaeology to climate modelling, and involve numerous applications of e-Infrastructure. A unifying aim of the projects is to deliver a new set of services and applications for the National Grid Service (NGS), the main provider of e-Infrastructure in the UK. They will also provide a user base that is hungry for the improvements that these services and applications will bring. Three of the projects are described on pages four and five of this newsletter; details of the others can be found on the Engage website and in future newsletters.

The Engage projects will provide greater functionality and an extended user base for proof-of-concept software, by strengthening it and transferring it to the NGS. The first stage of the Engage Initiative identified the projects that could benefit from funding; the future stages of the Initiative will see the community benefitting from software that will be developed to meet their needs.

www.engage.ac.uk

By Steve Brewer, Engage

Focus on the Engage Initiative

Engage provides a picture of the UK’s e-Research community

A simple, fast-running model of the Earth’s climate system was one of the end-products of the GENIE project. Making this system available to a more diverse research community is the goal of the Engage-funded Aladin2 project.

GENIE allowed users to model the climate over long timeframes – even those stretching over many thousands of years – albeit with a lesser degree of granularity than gained with other climate-modelling systems. The modular construction of GENIE enabled Open University students to acquire a stand-alone simulator that was used to create real-time models of various Earth projections. Funding from the Engage Initiative will help the GENIE project to combine their system with a MATLAB toolbox, providing users with far greater control over the climate model.

Current climate models, including GENIE, typically require users to possess a significant degree of computing expertise. Aladin2 will be considerably easier to use. The project will provide a prototype of a launchpad application that will help with the set-up and launch of the GENIE climate-modelling system. It will also facilitate the system’s use in training workshops for PhD students and other researchers, and in Masters-level teaching units at the Universities of East Anglia and Bristol. The launchpad will be updated following evaluation in these environments, and an improved version will be released. It is anticipated that an NGS-hosted version of the application will appear shortly afterwards. The new application will facilitate the simple set-up and sharing of ensemble experiments, and will remove the need for extensive computing experience, making GENIE accessible, not only to a wider range of scientists, but to the public as well.

www.genie.ac.uk

Making climate modelling accessible

An image of the Earth, developed by the GENIE project, showing sea surface temperatures.

“Now that the e-Research picture has been created… the most promising technologies will be taken forward”


Page 5 Focus on the Engage Initiative

Dose planning is fundamentally important to radiotherapy treatment of tumours: get the dose too low, and the tumour will be unharmed; get the dose too high, and healthy tissue will be damaged. With Engage funding, the Monte Carlo Treatment Planning (MCTP) project will make new techniques for quickly generating dose calculations more reliable and available to a wider user group.

A collaboration between the Cardiff School of Computer Science and the Department of Medical Physics at the Velindre Cancer Centre has been conducting ground-breaking research with distributed-computing resources. Currently, their resources are distributed only on a local scale. Funding from the Engage Initiative will make the technology available on the National Grid Service, which will open the door to a wider and more diverse community of researchers and clinicians. One of the main challenges that prevents the dose-planning application being accessed by clinicians throughout the UK is the problem of integrating it with other systems – such as the NHS firewall. In response to this problem, the Engage Initiative has assembled a team of security experts from OMII-UK, who will work with the Cardiff developers to overcome security issues.

Monte Carlo techniques help to more accurately determine the radiation dose received by a tumour and neighbouring healthy tissue, but each calculation can take days to run using standard equipment. Whilst the radiotherapy process is cost-effective, the complex and computationally intensive dose calculation makes the overall treatment cost prohibitive. The precursor to the MCTP, the RTGrid project, saw researchers at Cardiff using distributed computing to significantly reduce computation times and treatment costs.

With over 289,000* cases of cancer diagnosed each year in the UK, new techniques for cancer treatment are vital to the lives of thousands of people. The MCTP project will ensure that the new cost-effective dose-planning technology is available to clinicians all over the country.

Two notable features of e-Research are the willingness of researchers to embrace large-scale ICT, and the tendency to take up specialist technologies from different fields. The eSAD project is a good example. It is bringing together two multi-disciplinary collaborations to adapt image-processing tools for the study of ancient documents, which will then be provided to a diverse new community.

The eSAD project will offer image-processing and interpretation support tools to papyrologists and epigraphers, using the more mature VRE-SDM project, which has developed a pilot portal framework for classicists studying ancient documents.

A user working with the VRE-SDM as a front-end will have the ability to call upon the new image-processing tools developed by the eSAD project. The aim is to use the same techniques to plug in the functionality of the image-processing algorithms and, ultimately, an interpretation support system into the VRE. Image-processing algorithms will require the use of NGS resources and would ideally be offered as functionalities wrapped in Web Services and presented to the user in the VRE-SDM application within portlets. Migrating resources to the NGS also ensures access to the material by a much larger community, meaning that eSAD will enable a collaborative approach to the analysis and recording of ancient documents.

Radiotherapy to become more accurate

Images such as this X-ray showing a throat cancer overlaid with contours showing tissue density are used to determine radiotherapy doses.

*Source: Cancer Research UK

A view on ancient documents

Segolene Tarte and David Wallom using eSAD.


Page 6

The future of UK e-Science will be the focus of a review organised by EPSRC on behalf of the Research Councils. The UK e-Science programme oversaw a huge investment – £250M of funding. It is the quality and impact of this funding that the Research Councils wish to assess. EPSRC are now seeking nominations for a panel of experts who will assess UK e-Science and report back in March 2010 with recommendations for its future.

The expert panel will comprise internationally renowned researchers from outside the UK, who will benchmark the strengths of UK research activities compared to those found across the world, and highlight any gaps or missed opportunities. The panel will meet during the All Hands Meeting in December 2009, providing them with the perfect opportunity to meet UK research groups, and access a pool of experts and supporting data to help reach their conclusions.

Over its eight-year history, the UK e-Science programme has claimed many successes, among them the formation of an e-Science infrastructure in the UK, including the foundation of OMII-UK and the NGS. The e-Science review will allow the EPSRC to determine what UK e-Science does well, and what is needed to ensure that it does better in the future.

The EPSRC are now seeking nominations for panel members who work outside the UK (including UK nationals), hold senior positions, are currently active in their field of research and are highly effective in team-working situations. Nomination forms can be found at: www.tinyurl.com/cdlqoo. For further information, please contact Sarah Fulford.

[email protected] tel: +44 (0) 1793 44 4122

EPSRC focuses on the future of UK e-Science

In their daily work, researchers and information scientists are often forced to move back and forth between different digital environments. Institutional or thematic repositories have become a prevalent mechanism to manage publications, and increasingly also to manage research outputs and primary data. Integration of the user's natural work environment with repositories is improving (albeit slowly), and repository-based research environments are emerging (for example, eSciDoc). The natural habitat for many scientific users, however, is e-Infrastructure like the grid. At the same time, these users are employing repositories – often home-grown systems – to store their research outputs and publications.

A workshop series supported by OGF-Europe, DReSNeT and OMII-UK has set out to reduce this fragmentation and explore the interfaces between grid- and repository-based architectures. Four workshops have been held so far, at OGF23, DReSNeT, DCC 2008 and e-Science 2008, each attended by between 50 and 100 participants from a variety of backgrounds. Despite a tangible terminology gap between repository managers and some (research) users, the commonalities between requirements and the existing systems were astonishing.

The main issues raised – preservation, good scientific practice, metadata and collection management – emphasise the tight integration of the repository with the user's work environment where the data is created. Reliable audit trails, metadata, and suchlike, can only be created if the users' work environments and repositories connect seamlessly. The federation of distinct repositories is obviously an important factor for achieving this. Users must not be required to deposit data multiple times, for example, into their institutional repository and into a thematic repository too. Protocols such as OAI-ORE are interesting as they resemble so-called grid protocols in many ways. OAI-ORE allows for virtualisation of digital repositories just as the OGSA-DAI/DAIS protocols virtualise distributed databases.

Of course, repositories and the grid are not the only kinds of infrastructure available, but interweaving them is an essential step towards the simple and pragmatic plumbing the user is seeking. However, the time for pure experimentation is coming to an end for both repositories and the grid. The funding institutions represented at the workshops are looking for operational, trustworthy scientific repositories with many of the features described above. The next workshop of the series will be held at the Open Grid Forum 25, on 2-6 March 2009. Infrastructure may be of many kinds, but let's ensure that these infrastructures can interoperate.

Repositories and the grid come together

“the commonalities between [the users’] requirements and the existing systems were astonishing”

This is an abridged version of an article originally published in D-Lib magazine.

By Andreas Aschenbrenner, Tobias Blanke, Neil Chue Hong, Nicholas Ferguson and Mark Hedges

By Simon Hettrick


Page 7

The far-reaching impact of the Gene Ontology (GO) shows the important role a shared conceptual view of the biological world can have on data interpretation in the life sciences. Subjects such as anatomy, phenotype and clinical studies are active areas for ontology development where consensus has yet to be reached, or where ontologies have been developed, but their organisation needs to be kept under review. At the very least, any ontology that is in use will be curated and subject to extensions and alterations. In all cases, effective collaborative ontology development is essential to progress, and appropriate tools are needed.

While the GO is curated centrally, many biological ontologies are, in contrast, developed by loosely organised groups of scientists who remotely participate in the ontology development process. Until recent times, technological support for bio-ontology development relied on stand-alone ontology editors running on users’ desktops (for example, OBOedit, COBrA and Protégé), which supported the creation of new ontology versions, together with private email, email lists and perhaps wikis for the distribution of ontology files and discussions about their contents. Clearly, much better use could be made of storage, versioning and visualisation techniques being developed by the database and e-Science communities.

The BioSphere ontology portal provides a set of tools for editing ontologies and organising the community of developers. Through the use of GridSphere, tools and services can be provided securely. This system provides each user with their own view of the ontology, which is created on the fly. Each user’s view is derived from the group’s view: ontology development is a process where users create new versions from an initial common source, and then, periodically, divergent views are reconciled by a curator and a new consensus emerges. For each user, the system automatically manages the edits they make and supports versioning. That is, while a user edits their view of the ontology, the modifications are saved in the central source document as updates or deletions. Users belong to groups and have editing or curation rights, according to the views of other members of the group. The use of an XML database allows searching within one or more ontology documents, all of which are stored in the XML syntax recently developed for the Web Ontology Language (OWL) version 1.1. Viewing ontologies as XML documents allows XML methods for versioning to be applied. Additional annotations are introduced to represent timestamps for each assertion in the ontology. The eXist XML database is accessed through an OGSA-DAI API, and while currently the OWL XML API resides in the GridSphere portlet, our ultimate aim is to relocate these data-access functions to the OGSA-DAI layer so that they can be made widely available.
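The view-and-reconcile cycle described above can be pictured with a small in-memory Python model. It is a sketch under stated assumptions, not the BioSphere implementation: assertions are plain strings rather than OWL XML, timestamps are a simple counter, nothing is stored in eXist or reached through OGSA-DAI, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edit:
    user: str
    action: str      # "add" or "delete"
    assertion: str
    timestamp: int   # logical clock value, standing in for a real timestamp


@dataclass
class OntologyStore:
    """Central source: the group's consensus plus every user's pending edits."""
    consensus: set = field(default_factory=set)
    edits: list = field(default_factory=list)
    clock: int = 0

    def record(self, user, action, assertion):
        """Save a user's modification centrally, stamped with the next clock value."""
        if action not in ("add", "delete"):
            raise ValueError(f"unknown action: {action}")
        self.clock += 1
        self.edits.append(Edit(user, action, assertion, self.clock))

    @staticmethod
    def _apply(assertions, edit):
        if edit.action == "add":
            assertions.add(edit.assertion)
        else:
            assertions.discard(edit.assertion)

    def view(self, user):
        """Build a user's view on the fly: consensus plus that user's own edits."""
        result = set(self.consensus)
        for edit in self.edits:          # edits are already in timestamp order
            if edit.user == user:
                self._apply(result, edit)
        return result

    def reconcile(self, accept):
        """Curator step: fold accepted edits into a new consensus, clear the rest."""
        for edit in self.edits:
            if accept(edit):
                self._apply(self.consensus, edit)
        self.edits = []
```

Keeping the edits separate from the consensus is what lets two users hold divergent views of the same ontology at once; in this sketch the curator's accept function is the only place where a decision between them is made.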

While developing systems in the Java Enterprise style has its challenges – the multiplicity of layers being one – we believe that the potential for code reuse within the e-Science development community is a key benefit beyond the immediate goal of providing a useful tool for a specific user group.

www.geneontology.org www.biosphere-portal.org

Grid technology for collaborative ontology development

Carl Linnaeus laid the foundations for coding biological data. He was a Swedish botanist, physician, and zoologist, who is known as the father of modern taxonomy.

By Stuart Aitken, School of Informatics, University of Edinburgh


March 2009

OMII-UK News—www.omii.ac.uk/wiki/Newsletter


By Mario Antonioletti

Source: OGF-Europe

December 2008 saw the release of OGSA-DAI 3.1, an extensible framework for accessing and managing distributed heterogeneous data resources, such as databases or files, using web services. OGSA-DAI is developed by the OGSA-DAI team, one of the three OMII-UK partners.

OGSA-DAI 3.1 preserves the published APIs of OGSA-DAI 3.0. This means that OGSA-DAI 3.1 can act as a drop-in replacement for an OGSA-DAI 3.0 deployment: everything that worked with 3.0 at the client or server side should still work as before. The new release includes additional support for data-management operations, including extended operations for XML databases. It also includes numerous bug fixes and a complete rewrite of the user documentation. A presentation layer built using Globus Toolkit 4.2, which is compliant with the final WSRF specifications, is also available. Since the release of version 3.1, the OGSA-DAI project team have gone on to release an implementation of the OGF's DAIS-WG WS-DAIR candidate recommendation and an SQL views extension pack that allows SQL views to be defined over read-only relational databases.

An OGSA-DAI service uses a server-side workflow engine to execute workflows that can be composed of access, update, transform, delivery and other custom operations. These operate on data streamed from data resources before delivering the data to a client. As most of the work can occur close to the data source, this can reduce the need to move data unnecessarily. In addition, OGSA-DAI is designed to act as a layer within, or upon which, more powerful data management capabilities can be built, for example, to undertake distributed query processing.
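The idea of chaining access, transform and delivery steps over a data stream can be sketched in a few lines of Python. This does not use the OGSA-DAI API: every function name here is hypothetical, and an in-memory list stands in for a real data resource. The point it illustrates is that filtering and projection happen before delivery, so only the reduced result reaches the client.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Row = dict
Activity = Callable[[Iterator[Row]], Iterator[Row]]


def access(table: Iterable[Row]) -> Iterator[Row]:
    """Access step: stream rows from a data resource one at a time."""
    yield from table


def keep(predicate: Callable[[Row], bool]) -> Activity:
    """Transform step: pass on only the rows that satisfy the predicate."""
    def activity(rows: Iterator[Row]) -> Iterator[Row]:
        return (row for row in rows if predicate(row))
    return activity


def project(*columns: str) -> Activity:
    """Transform step: keep only the named columns of each row."""
    def activity(rows: Iterator[Row]) -> Iterator[Row]:
        return ({c: row[c] for c in columns} for row in rows)
    return activity


def run_workflow(table: Iterable[Row], activities: list[Activity]) -> list[Row]:
    """Chain the activities over the stream, then deliver the result to the caller."""
    stream = access(table)
    for activity in activities:
        stream = activity(stream)   # lazy: nothing is read until delivery
    return list(stream)             # delivery step


samples = [
    {"id": 1, "site": "A", "temp_c": 18.5},
    {"id": 2, "site": "B", "temp_c": 22.0},
    {"id": 3, "site": "A", "temp_c": 25.1},
]
warm_ids = run_workflow(samples, [keep(lambda r: r["temp_c"] > 20), project("id")])
```

Because each step is a generator, rows flow through the chain one at a time rather than being copied in bulk between steps.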

www.ogsadai.org.uk

In this age of climate change, as we seek to minimise energy consumption within the ICT industry and society as a whole, the issue of Green IT has become highly topical. Green IT is also of interest to policymakers, with discussions emerging about the potential regulation of data centres and a new Code of Conduct for Energy Efficiency aimed at data centre owners in the European Union. Green IT is a focal point in OGF.

To realise effective Green IT, OGF are bringing together key stakeholders and technologists, identifying necessary capabilities, possible common practices and potential areas of standardisation. Key issues that will be addressed by OGF include defining the metrics and tools to measure energy efficiency, such as Power Usage Effectiveness and Data Centre Efficiency, and understanding how we might better orchestrate the use of shared infrastructure, to reflect energy policy decisions and the distribution of workloads based on energy requirements and established policies.
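As a brief illustration of the two metrics named above: Power Usage Effectiveness (PUE) is commonly defined as total facility power divided by IT equipment power, and Data Centre Efficiency is its reciprocal, usually quoted as a percentage. The Python sketch below uses hypothetical function names and made-up example figures, not measurements from any real data centre.

```python
def power_usage_effectiveness(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power (1.0 is the ideal lower bound)."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power includes the IT load, so it cannot be smaller")
    return total_facility_kw / it_equipment_kw


def data_centre_efficiency_percent(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Centre Efficiency = 1 / PUE, expressed as a percentage."""
    return 100.0 / power_usage_effectiveness(total_facility_kw, it_equipment_kw)


# Made-up example: a facility drawing 1500 kW in total, of which 1000 kW reaches IT equipment.
pue = power_usage_effectiveness(1500.0, 1000.0)        # 1.5
dce = data_centre_efficiency_percent(1500.0, 1000.0)   # about 66.7
```

In this example, a third of the power drawn by the facility goes to cooling, power distribution and other overheads rather than to computing.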

Coming at a crucial time, the community-driven session led by OGF-Europe in Catania this March aims to shed light on the challenges of Green IT and arrive at an understanding of the implications from a standards perspective. Building on the outcomes of an introductory workshop at OGF23 in Barcelona, and the launch of the new Code of Conduct last November, the session is designed as a call to action for the community to work in key areas of contribution. ‘This will enable us to design more efficient infrastructures with minimal carbon impact, operate the infrastructures in line with energy policies allowing the user to devise the most efficient use of energy resources, and identify areas where standards can be developed to incorporate the necessary functionality’, says Ian Osborne, OGF’s VP of Enterprise.

The Green IT session at OGF25 will explore use cases and review advances in technology that can assist the efficient deployment of compute capability. It will also feature a discussion about a reference model for end-to-end energy management, the components of which may include monitoring and measuring, decision support tools, and enactment tools. Speakers include Liam Newcombe (Romonet/BCS Data Centre Specialist Group), the architect of the EU Code of Conduct and developer of an engineering model for energy consumption in the data centre, and Paul Strong, Distinguished Scientist at eBay and OGF Board Member.

www.ogfeurope.eu www.ogf.org

OGF help IT to reduce carbon emissions

“The community-driven session led by OGF-Europe... aims to shed light on the challenges of Green IT”

A brand new OGSA-DAI

If you have any ideas or comments, please contact the editorial team: [email protected].