e-Science and Scholarly Communication

Tony Hey
Corporate VP for Technical Computing
Microsoft Corporation


What is e-Science?

‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it’

John Taylor
Former Director General of Research Councils
Office of Science and Technology, UK

A New Science Paradigm

Thousand years ago: Experimental Science
- description of natural phenomena

Last few hundred years: Theoretical Science
- Newton’s Laws, Maxwell’s Equations …

Last few decades: Computational Science
- simulation of complex phenomena

Today: e-Science or Data-centric Science
- unify theory, experiment, and simulation
- using data exploration and data mining
• Data captured by instruments
• Data generated by simulations
• Data generated by sensor networks
Scientist analyzes databases/files

(With thanks to Jim Gray)


e-Science

e-Science is about data-driven, multidisciplinary science and the technologies to support such distributed, collaborative scientific research

Many areas of science are now being overwhelmed by a ‘data deluge’ from new high-throughput devices, sensor networks, satellite surveys …

Areas such as bioinformatics, genomics, drug design, engineering and healthcare require collaboration between different domain experts

‘e-Science’ is a shorthand for a set of technologies to support collaborative networked science

HPC and Information Management are key technologies to support this e-Science revolution

http://www.neptune.washington.edu/

Diagram: Undersea Sensor Network, Connected & Controllable Over the Internet. Surrounding elements: Visual Programming; Persistent Distributed Storage; Distributed Computation; Interoperability & Legacy Support via Web Services; Live Documents; Searching & Visualization; Reputation & Influence.

Two examples of e-Science

Astronomy – The International Virtual Observatory

Chemistry – The Comb-e-Chem Project

The Multiwavelength Crab Nebula

X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese astronomers. Slide courtesy of Robert Brunner @ CalTech.

Crab star 1053 AD

IVO: An Astronomy Data Grid

Working to build world-wide telescope
• All astronomy data and literature online and cross indexed
• Tools to analyze it

Built SkyServer.SDSS.org
Built Analysis system
• MyDB
• CasJobs (batch job)

OpenSkyQuery
• Federation of ~20 observatories.

Results:
• It works and is used every day
• Spatial extensions in SQL 2005
• A good example of Data Grid
• A good example of Web Services
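OpenSkyQuery federates catalogues from many observatories. The basic operation behind such a federation is a positional cross-match: deciding which entries in two catalogues refer to the same object on the sky. Below is a minimal, brute-force sketch of that idea. It assumes in-memory catalogues of (id, RA, Dec) in degrees, a fixed match radius, and that the nearest neighbour wins. It is not how SkyServer or OpenSkyQuery implement matching, and the catalogue entries are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# A catalogue entry: (object id, right ascension in degrees, declination in degrees).
Source = Tuple[str, float, float]


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two sky positions (haversine formula), in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a: List[Source], cat_b: List[Source],
                radius_arcsec: float) -> List[Tuple[str, str, float]]:
    """For each source in cat_a, keep the nearest cat_b source within the radius.

    Returns (id_a, id_b, separation_arcsec) tuples. Brute force: O(len(cat_a) * len(cat_b)).
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best: Optional[Tuple[str, float]] = None
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches


# Hypothetical entries from two surveys near the Crab Nebula (coordinates approximate).
optical = [("opt-1", 83.6331, 22.0145)]
radio = [("rad-7", 83.6332, 22.0146), ("rad-9", 84.0000, 22.5000)]
print(cross_match(optical, radio, radius_arcsec=1.0))
```

The double loop costs O(n·m) distance evaluations. Database-backed systems avoid this with a spatial index; SkyServer, for instance, is built on the Hierarchical Triangular Mesh.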

The Comb-e-Chem Project

Diagram components: Combinatorial Chemistry Wet Lab; National X-Ray Service; Diffractometer; Video Data Stream; HPC Simulation; Middleware; Structures Database; Data Mining and Analysis; Automatic Annotation.

National Crystallographic Service

Diagram components: X-Ray e-Laboratory; Structures Database; Computation Service. Steps:
• Send sample material to NCS service
• Search materials database and predict properties using Grid computations
• Download full data on materials of interest
• Collaborate in e-Lab experiment and obtain structure

A digital lab book replacement that chemists were able to use, and liked

Monitoring laboratory experiments using a broker delivered over GPRS on a PDA

Crystallographic e-Prints

Direct Access to Raw Data from scientific papers
• Raw data sets can be very large - stored at UK National Datastore using SRB software

Diagram (eBank Project): Entire e-Science Cycle encompassing experimentation, analysis, publication, research, learning. Components: Grid; e-Scientists; e-Experimentation; Data, Metadata & Ontologies; Certified Experimental Results & Analyses; Preprints & Metadata; Peer-Reviewed Journal & Conference Papers; Technical Reports; Reprints; Institutional Archive; Local Web Publisher; Holdings; Digital Library; Graduate Students; Undergraduate Students; Virtual Learning Environment.

Cyberinfrastructure

In the US, Europe and Asia there is a common vision for the ‘cyberinfrastructure’ required to support the e-Science revolution
• Set of Grid Middleware Services supported on top of high bandwidth academic research networks
• Opportunity for Computer Science community to provide scientists with powerful new tools to analyze their data
• Open access federation of research repositories containing full text and data

Grids for Virtual Organizations

Diagram: an Administrative Domain containing a SQL DB, Data Fileshare, Code Fileshare, Credential Srv, Computation Server, Directory and Compute Cluster, with Federated Trust between domains.
Protocols:
- Resource Discovery
- Job Scheduling & Management
- Data Transfer
- Audit

Service-Orientation for building Distributed Systems

Diagram: services in separate administrative domains, exchanging messages across network boundaries.

Web Services and Interoperability

Diagram: Company A (J2EE), Open Source (OMII) and Company C (.NET) interoperating through Web Services.
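The interoperability in the diagram rests on all parties exchanging the same platform-neutral XML messages, whichever runtime (J2EE, .NET or OMII) produced them. The sketch below assumes a SOAP 1.1 envelope and a hypothetical `GetStructure` operation in a made-up namespace, `urn:example:structures`. It builds such a message with Python's standard library, then reads the argument back out as a receiver on any platform would. A real service would describe its element names in WSDL; none of the names here come from an actual deployment.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:structures"  # hypothetical service namespace, not a real service

# Friendly prefixes for the serialized XML (purely cosmetic).
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SVC_NS)


def build_request(structure_id: str) -> bytes:
    """Wrap a hypothetical GetStructure call in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}GetStructure")
    ET.SubElement(call, f"{{{SVC_NS}}}StructureId").text = structure_id
    return ET.tostring(envelope, encoding="utf-8")


def read_request(message: bytes) -> str:
    """Receiver side, on any platform: parse the XML and locate the argument by namespace."""
    envelope = ET.fromstring(message)
    path = f"{{{SOAP_ENV}}}Body/{{{SVC_NS}}}GetStructure/{{{SVC_NS}}}StructureId"
    return envelope.findtext(path)


message = build_request("NCS-0042")
print(message.decode("utf-8"))
print(read_request(message))
```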

Microsoft Open Specification Promise (September 12 2006)

Covers Web Services specifications
• SOAP, WSDL, WS-I, WS-Security, WS-Management, WS-Eventing, WS-Addressing …

Q: How does the Open Specification Promise work? Do I have to do anything in order to get the benefit of this OSP?

A: No one needs to sign anything or even reference anything. Anyone is free to implement the specification(s), as they wish and do not need to make any mention of or reference to Microsoft. Anyone can use or implement these specification(s) with their technology, code, solution, etc. You must agree to the terms in order to benefit from the promise; however, you do not need to sign a license agreement, or otherwise communicate your agreement to Microsoft.

Progress in Grid Standards?

The GGF/EGA merger gives a great opportunity for the new Open Grid Forum (OGF) to standardize a small set of basic Grid services based on generally accepted Web Services
• Harness the power of the world-wide Grid community to develop robust open source reference implementations

The Grid research community needs to propose and explore new features in real experiments

OGF can reassure industry about progress in Grid standards and grow the market for all

Key Data Issues for e-Science

Networks
• Lambda technology

The Data Life Cycle
• From Acquisition to Preservation

Scholarly Communication
• Open Access to Data and Publications

An International e-Infrastructure

Diagram (AHM 2004): Starlight (Chicago); Netherlight (Amsterdam); UKLight; US TeraGrid; UK NGS; NCSA; SDSC; PSC; Manchester; Oxford; Leeds; UCL; RAL; Network PoP; Service Registry; Computation; Steering clients (local laptops and Manchester vncserver). All sites connected by production network (not all shown).

The Problem for the e-Scientist

Data ingest
Managing a petabyte
Common schema
How to organize it?
How to reorganize it?
How to coexist & cooperate with others?

Data Query and Visualization tools
Support/training
Performance
• Execute queries in a minute
• Batch (big) query scheduling

Diagram: facts flow in from Experiments & Instruments, Simulations, Literature and Other Archives; questions go in and answers come out.

The e-Science Data Life Cycle

Data Acquisition
Data Ingest
Metadata
Annotation
Provenance
Data Storage
Data Cleansing
Data Mining
Curation
Preservation

Publishing Data & Analysis Is Changing

Roles        Traditional    Emerging
Authors      Scientists     Collaborations
Publishers   Journals       Project web site
Curators     Libraries      Data+Doc Archives
Archives     Archives       Digital Archives
Consumers    Scientists     Scientists

Data Publishing: The Background

In some areas – notably biology – databases are replacing (paper) publications as a medium of communication
• These databases are built and maintained with a great deal of human effort
• They often do not contain source experimental data - sometimes just annotation/metadata
• They borrow extensively from, and refer to, other databases
• You are now judged by your databases as well as your (paper) publications
• Upwards of 1000 public databases in genetics

Data Publishing: The issues

Data integration
• Tying together data from various sources

Annotation
• Adding comments/observations to existing data
• Becoming a new form of communication

Provenance
• ‘Where did this data come from?’

Exporting/publishing in agreed formats
• To other programs as well as people

Security
• Specifying/enforcing read/write access to parts of your data
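Provenance (‘Where did this data come from?’) can be answered mechanically if every derived dataset records which inputs it was made from. Below is a minimal sketch under that assumption. Each record keeps a list of parent ids, and lineage is a graph walk back to records that have no parents. The `Record` fields and the example ids are hypothetical; they are not a schema from any of the projects mentioned here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """A dataset plus the links needed to answer 'Where did this data come from?'."""
    record_id: str
    produced_by: str                                        # instrument, simulation or analysis step
    derived_from: List[str] = field(default_factory=list)   # ids of input records
    annotations: List[str] = field(default_factory=list)    # comments added by later users


def lineage(record_id: str, store: Dict[str, Record]) -> List[str]:
    """Every upstream record id, found by walking derived_from links depth-first."""
    seen: List[str] = []
    stack = list(store[record_id].derived_from)
    while stack:
        current = stack.pop()
        if current not in seen:  # also guards against accidental cycles
            seen.append(current)
            stack.extend(store[current].derived_from)
    return seen


def raw_sources(record_id: str, store: Dict[str, Record]) -> List[str]:
    """Upstream records not derived from anything else, i.e. the original acquisitions."""
    return [r for r in lineage(record_id, store) if not store[r].derived_from]


# Hypothetical records: a figure built from a solved structure and a simulation run.
store = {r.record_id: r for r in [
    Record("diffraction-001", "diffractometer"),
    Record("simulation-5", "HPC simulation"),
    Record("structure-17", "structure solution", ["diffraction-001"]),
    Record("figure-3", "analysis script", ["structure-17", "simulation-5"]),
]}
print(raw_sources("figure-3", store))  # ['simulation-5', 'diffraction-001']
```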

Berlin Declaration 2003

‘To promote the Internet as a functional instrument for a global scientific knowledge base and for human reflection’

Defines open access contributions as including:
‘original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia material’

OECD Declaration on Access to Research Data from Public Funding (January 2004)

Supported by governments of Australia, Austria, Belgium, Canada, China, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Luxembourg, Mexico, the Netherlands, New Zealand, Norway, Poland, Portugal, the Russian Federation, the Slovak Republic, the Republic of South Africa, Spain, Sweden, Switzerland, Turkey, the UK and the United States

OECD Declaration recognizes:
• Optimum international exchange of data, information and knowledge contributes decisively to the advancement of scientific research and innovation
• Open access to, and unrestricted use of, data promotes scientific progress and facilitates the training of researchers
• Open access will maximise the value derived from public investments in data collection efforts
• The substantial benefits that science, the economy and society at large could gain from the opportunities that expanded use of digital data resources offers
• The risk that undue restrictions on access to and use of research data from public funding could diminish the quality and efficiency of scientific research and innovation

NIH Data Sharing

Data Sharing Policy (2003)
• ‘Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data’

Data Sharing Plan (2005)
• The reasonableness of the data sharing plan or the rationale for not sharing research data will be assessed by the reviewers
• The presence of a data sharing plan will be part of the terms and conditions of the award

Scholarly Communication

Global Movement towards permitting ‘Open Access’ to scholarly publications
• Libraries can no longer afford publisher subscriptions
• Principle that results of publicly funded research should be available to all

Mandates for Open Access

US Proposal – Cornyn-Lieberman Bill
• Supported by most top US research universities

EU Proposals
• UK, France and German initiatives

NSF ‘Atkins’ Report on Cyberinfrastructure

‘the primary access to the latest findings in a growing number of fields is through the Web, then through classic preprints and conferences, and lastly through refereed archival papers’

‘archives containing hundreds or thousands of terabytes of data will be affordable and necessary for archiving scientific and engineering information’

MIT DSpace Vision

‘Much of the material produced by faculty, such as datasets, experimental results and rich media data as well as more conventional document-based material (e.g. articles and reports) is housed on an individual’s hard drive or department Web server. Such material is often lost forever as faculty and departments change over time.’

Open Access and Scholarly Publishing

Goal is to work with the research community to assist them in developing open and interoperable frameworks for scholarly publishing

Two aspects
• ‘Community publishing’ toolset
• Service Oriented Framework for Interoperable Repositories

Community Publishing

Develop toolset for ‘self-publishing’ of workshop and conference proceedings
• Base development around existing MSR Workshop tool ‘CMT’
• Work with forward-looking publishers to develop new publishing models

Offer Microsoft as one site where such academic publications can be kept ‘in perpetuity’?
• Important that Microsoft is not the only repository – cf LOCKSS and Portico

CMT: Conference Management Tool

Currently supports a conference peer-review system (~300 conferences)
Form committee → Accept Manuscripts → Declare interest → Review → Decide → Form program → Notify → Revise

CMT++: eJournal Management Tool

Add publishing steps
Form committee → Accept Manuscripts → Declare interest → Review → Decide → Form program → Notify → Revise → Publish
• Connect to Archives
• Manage archive document versions
• Capture Workshop
  • presentations
  • proceedings
• Capture classroom ConferenceXP
• Moderated discussions of published articles
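CMT++ is described as CMT with publishing steps added. One way to picture that is a small state machine whose stage list is extended with ‘Publish’. This assumes the listed stages run strictly in order; the real tool may allow loops, for example repeated Review and Revise rounds. The class and names are illustrative only and are not CMT’s API.

```python
from typing import List

# Stage names follow the CMT and CMT++ slides; treating them as a strict
# linear sequence is an assumption made for this sketch.
CMT_STAGES = ["Form committee", "Accept manuscripts", "Declare interest", "Review",
              "Decide", "Form program", "Notify", "Revise"]
CMT_PLUS_STAGES = CMT_STAGES + ["Publish"]  # CMT++ appends a publishing step


class ReviewCycle:
    """Tracks one conference or journal issue through a fixed sequence of stages."""

    def __init__(self, name: str, stages: List[str]) -> None:
        self.name = name
        self.stages = stages
        self.position = 0

    @property
    def stage(self) -> str:
        return self.stages[self.position]

    def advance(self) -> str:
        """Move to the next stage; stages cannot be skipped or repeated."""
        if self.position + 1 >= len(self.stages):
            raise ValueError(f"{self.name} is already at its final stage: {self.stage}")
        self.position += 1
        return self.stage


issue = ReviewCycle("Hypothetical Workshop 2006", CMT_PLUS_STAGES)
while issue.stage != "Publish":
    issue.advance()
print(issue.stage)  # Publish
```

Extending the list rather than the class keeps the conference workflow and the journal workflow on the same code path. That mirrors the idea of building CMT++ on top of CMT.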

The Three Prophets of Open Access

Paul Ginsparg’s arXiv at Cornell has demonstrated a new model of scientific publishing
• Pioneered electronic version of ‘preprints’ hosted on the Web, now used routinely by the physics community

David Lipman of the NIH National Library of Medicine has developed PubMedCentral as repository for NIH funded research papers
• Microsoft funded development of ‘portable PMC’ now being deployed in UK and other countries

Stevan Harnad’s ‘self-archiving’ EPrints project in Southampton provides a basis for OAI-compliant ‘Institutional Repositories’
• JISC-funded TARDis Project at Southampton is hybrid of full-text open access and links to publisher sites

The NLM Example: Entrez-GenBank

Sequence data deposited with GenBank
Literature references GenBank ID
BLAST searches GenBank
Entrez integrates and searches
• PubMedCentral
• PubChem
• GenBank
• Proteins, SNP, Structure, ..
• Taxonomy …

Diagram: Entrez linking Nucleotide sequences, Protein sequences, Taxon, Phylogeny, MMDB 3-D Structure, PubMed abstracts and Complete Genomes. Sources: Publishers and Genome Centers, via PubMed and Entrez Genomes.
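Entrez exposes these cross-database links programmatically through NCBI’s E-utilities, which the slides do not mention. The functions below only build request URLs for ESearch (find record ids matching a query) and ELink (follow links from one Entrez database to another). They perform no network access, and the query and numeric ids are placeholders. Check NCBI’s usage policies before sending real requests.

```python
from typing import List
from urllib.parse import urlencode

# Base address of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """ESearch: ids of records in `db` matching an Entrez query."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def elink_url(dbfrom: str, db: str, ids: List[str]) -> str:
    """ELink: records in `db` that Entrez links to the given ids in `dbfrom`."""
    return EUTILS + "elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": ",".join(ids)})


# Find nucleotide records for a query, then ask which PubMed abstracts are linked to them.
print(esearch_url("nuccore", "BRCA1[gene] AND human[organism]"))
print(elink_url("nuccore", "pubmed", ["12345", "67890"]))  # placeholder ids
```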

Portable PubMedCentral

“Information at your fingertips”
Helping build PortablePubMedCentral
• Deployed US, China, England, Italy, South Africa, (Japan soon).
• Each site can accept documents
• Archives replicated
• Federate thru web services
• Working to integrate Word/Excel/… with PubmedCentral
To be clear: NCBI is doing 99% of the work.

Routes to Open Access

Stevan Harnad identifies 2 roads to OA:

(1) OA Journal publishing – ‘Gold’
• “author pays” rather than present subscription model
• E.g. PLoS journals

(2) Self-Archiving in Repository – ‘Green’
• Author provides OA by putting e-print of paper submitted to journal in repository or on own web site

94% of journals are ‘Green’ and permit self-archiving

Key results from TARDis project in UK FAIR programme

• ‘Hybrid’ research publications database building up to represent full range of types of research in all disciplines across the institution
• Embed in research recording process with institutional commitment
• Add more full text as climate improves/authors become familiar with practice
• Library checks metadata, adds DOI or other link to publisher version
• Provided feedback to EPrints software to give good citation format: providing tools for recording once with many outputs, e.g. export to research group web pages

Hey, Jessie M.N., Simpson, Pauline and Carr, Leslie A. (2005) The TARDis Route Map to Open Access: developing an Institutional Repository Model. In, Dobreva, Milena and Engelen, Jan (eds.) ELPUB2005 From Author to Reader: Challenges for the Digital Content Chain: Proceedings of the 9th ICCC International Conference on Electronic Publishing, Katholieke Universiteit Leuven, Leuven-Heverlee, Belgium, 8-10 June 2005. Leuven, Belgium, Peeters Publishing, 179-182. http://eprints.soton.ac.uk/16262/

Simpson, Pauline and Hey, Jessie (2006) Repositories for research: Southampton’s evolving role in the knowledge cycle. Program, 40, (3), 224-231. http://eprints.soton.ac.uk/41240/

http://tardis.eprints.org/

OA and Institutional Repositories

Registry of OA Repositories records:
• 213 archives using EPrints software
• 174 archives using DSpace software

OAIster records:
• ~10M records from ~700 institutions

Sources of information about ‘Green Route’ to OA
• www.jisc.ac.uk/publications
• www.eprints.org
• www.openarchives.org
• oaister.umdl.umich.edu/o/oaister
• www.OpenDOAR.org
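OAIster’s record counts come from harvesting repositories over the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). That protocol is what makes EPrints and DSpace archives ‘OAI-compliant’. Below is a minimal harvester sketch, assuming Dublin Core (`oai_dc`) metadata and a repository base URL of your choosing. It issues `ListRecords`, then keeps following `resumptionToken`s until a missing or empty one marks the last page. The `fetch` function is passed in so the paging logic can be exercised without a network. Error handling, `from`/`until` date ranges and deleted-record headers are left out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace
DC = "{http://purl.org/dc/elements/1.1/}"       # Dublin Core elements namespace


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """First request names the metadata format; follow-ups carry only the resumption token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (identifier, title) pairs and the next resumption token, or None on the last page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
        title = record.findtext(f".//{DC}title", default="")
        records.append((identifier, title))
    token = (root.findtext(f".//{OAI}resumptionToken") or "").strip()
    return records, (token or None)  # a missing or empty token element ends the list


def harvest(base_url: str, fetch: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Call ListRecords repeatedly, following resumption tokens until none is returned."""
    harvested: List[Tuple[str, str]] = []
    token: Optional[str] = None
    while True:
        page, token = parse_list_records(fetch(list_records_url(base_url, token)))
        harvested.extend(page)
        if token is None:
            return harvested
```

In practice `fetch` would perform an HTTP GET against a repository’s OAI base URL and return the response body as text.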

Augmenting interoperability

Diagram: DSpace, Fedora, aDORe, ePrints, arXiv and Nature, each with Individual Data Models and Services, connected through Obtain, Harvest and Put operations.
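The diagram suggests layering a few shared operations (Obtain, Harvest and Put) over repositories that keep their own data models. Below is a hypothetical Python rendering of that idea. It defines an abstract interface and one in-memory implementation standing in for a real platform. A `mirror` function then works with any pair of repositories, because it uses only the shared operations. The method names follow the diagram labels; their signatures and the integer `modified` timestamp are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    identifier: str
    metadata: Dict[str, str]
    modified: int  # assumed: seconds since the epoch of the last change


class Repository(ABC):
    """Hypothetical shared interface over platforms with different internal data models."""

    @abstractmethod
    def put(self, item: Item) -> None:
        """Deposit or replace an item."""

    @abstractmethod
    def obtain(self, identifier: str) -> Item:
        """Fetch one item by identifier."""

    @abstractmethod
    def harvest(self, since: int) -> List[Item]:
        """Return items changed at or after `since`, oldest first."""


class InMemoryRepository(Repository):
    """Stand-in for a real platform such as DSpace or EPrints."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def put(self, item: Item) -> None:
        self._items[item.identifier] = item

    def obtain(self, identifier: str) -> Item:
        return self._items[identifier]

    def harvest(self, since: int) -> List[Item]:
        changed = [i for i in self._items.values() if i.modified >= since]
        return sorted(changed, key=lambda i: i.modified)


def mirror(source: Repository, target: Repository, since: int) -> int:
    """Copy everything changed since `since`, using only the shared operations."""
    items = source.harvest(since)
    for item in items:
        target.put(item)
    return len(items)
```

A real adapter would translate each platform’s own records into `Item` inside `obtain` and `harvest`. Code written against `Repository` would not change.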

The Service Revolution

Web 2.0
• Social networks, tagging for sharing, e.g. Flickr, Del.icio.us, MySpace, CiteULike, Connotea …
• Wikis, Blogs, RSS, folksonomies …

Software delivered as a service
• Microsoft Live services: Office Live, Xbox Live, Windows Live Academic

Mashups
• SensorWeb + VirtualEarth
• http://mashupcamp.com

Diagram: combine services to give added value

e-Science Mashups?

‘As We May Think’, Vannevar Bush, 1945

Still grappling with the data preservation issues he raised:
• “A record if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.”

Can now realize his idea of the ‘memex’
• “a future device for individual use, which is a sort of mechanized private file and library”
• Search by following ‘trails’ through data

Now Paul Ginsparg’s ‘As We May Read’ …

Uniformity: Early De Jure Standards
• Works well for the physical world

Translatability: De Facto Standards

Microsoft Office Open XML Formats (OOXML)

Documents in Office 2007 will be based on new XML-based file formats
• Open, royalty-free file format specification will allow interoperability

OOXML submitted to ECMA International Standards Organization
• Microsoft also offering ‘Covenant Not to Sue’

OpenXML Translator Project
• Microsoft backing open source project to create translation tool between OOXML and Open Document Format (ODF)
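OOXML documents are ZIP packages of XML parts, so any platform with a ZIP reader and an XML parser can get at their content; that is the interoperability point the slide makes. The minimal sketch below assumes a word-processing (.docx) file. It reads the main `word/document.xml` part and joins the `w:t` text runs of each `w:p` paragraph in the WordprocessingML namespace. Headers, footnotes, other package parts and all formatting are ignored.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import List

# WordprocessingML main namespace used by Office 2007 .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(data: bytes) -> List[str]:
    """Return the plain text of each paragraph in a .docx package given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is split across runs; each run holds text in w:t elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs
```

For a file on disk, pass the bytes from `open(path, "rb").read()`. A complete package also contains `[Content_Types].xml` and relationship parts, which this sketch does not need.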

Technical Computing at Microsoft

Advanced Computing for Science and Engineering
• Application of new algorithms, tools and technologies to scientific and engineering problems

High Performance Computing
• Application of high performance clusters and database technologies to industrial and scientific applications

Radical Computing
• Research in potential breakthrough technologies

Summary

Microsoft wishes to work with the university research and library communities to:
• develop interoperable high-level services, work flows, tools and data services
• accelerate progress in a small number of societally important scientific applications
• assist in the development of interoperable repositories and new models of scholarly publishing
• explore radical new directions in computing and ways and applications to exploit on-chip parallelism

How can Microsoft best collaborate with the scientific community?

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.