Presentation from the lecture “Digital libraries, digital repositories, data infrastructures: laying the foundations for data-driven sciences” by Yannis Ioannidis, Professor at the Department of Informatics and Telecommunications of the University of Athens, held on Tuesday, 29 June, at the University of Nicosia. The event was organized by the Library and the Department of Computer Science of the University of Nicosia, the Library and the Department of Computer Science of the University of Cyprus, and the Cyprus Association of Librarians - Information Scientists (ΚΕΒΕΠ).
Yannis Ioannidis, MaDgIK Lab, Dept. of Informatics & Telecommunications
University of Athens
Digital Libraries,
Digital Repositories,
Data Infrastructures:
Foundations of the New Science
Projects at Work
Outline
Science Paradigms
New Scholarly Communication & Open Access
Digital Libraries & Repositories
– DRIVER → OpenAIRE
Computational & Data Challenges
eInfrastructures
– D4Science (I & II)
– GRDI2020
Conclusions
Science Paradigms
1st - Thousand years ago: science was empirical
describing natural phenomena, w/ some models, generalizations
2nd - Last few hundred years: theoretical branch
using models, generalizations
$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
Really Early Times
One scientist
One location
One discipline
One phenomenon
One pencil (… carver …)
One paper (… stone …)
Street announcements, e.g., Eureka!
Science Paradigms
3rd - Last few decades: a computational branch
simulating complex phenomena
Recent Times
One small group of scientists
One location
One discipline
One phenomenon
One file system
One local disk with custom files
Publications at refereed forums
Science Paradigms
4th - Today: data exploration (eScience)
unify theory, experiment, and simulation
Current Times
Many/large teams of scientists
Many locations
Many disciplines
Many phenomena
Many data management systems
Many data forms
Web uploads for publications, data, processes, …
New order in scholarly communication
Open access
Creator, author, publisher, curator, preserver roles mixed up
Digital libraries & repositories at centre stage
eInfrastructure Layers
Network
Processing
Data / Info / Pubs
Functionality
Users
Communities
Scholarly Communication Imperatives
1. Comprehensive, global access to any type of scientific information
2. Minimum time and resource effort to access and use this information
3. Easy search/navigation, handling, manipulation, and re-dissemination of information
4. Maximum visibility to and communication with the research community; research impact
5. Long-term access to and preservation of research results
Open Access
“Our mission of disseminating knowledge is only half complete if the information is not made widely and readily available to society. New possibilities of knowledge dissemination not only through the classical form but also and increasingly through the open access paradigm via the Internet have to be supported.”
Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities, 2003
Repository Landscape: Past-Present-Future
Universal DRs
National, Regional, and Thematic DRs
Trans-National DRs (DRIVER)
Pan-European & Inter-Thematic DRs (OpenAIRE)
DRIVER High-Level Objectives
Develop an environment for integrating existing national, regional, or thematic repositories
Create a production-quality European DR infrastructure
Prepare the future expansion and upgrade of the DR infrastructure across Europe
Identify and promote the use of a relevant set of standards
Raise awareness among user communities
D-NET eInfrastructure Software
Service-Oriented Architecture
Web Services, dynamic service registration, ...
Distributed environment
Services executed on a network of machines
D-NET components (Lego approach)
Enabling services: infrastructure middleware
Data Management services: aggregation systems
End-user Functionality services: search, community support, portals
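The D-NET bullets above describe a service-oriented architecture with dynamic service registration. As a rough illustration only, the Python sketch below models an in-memory registry in which services announce themselves with a type and an endpoint, and clients discover them by type at runtime. The names `ServiceProfile`, `ServiceRegistry`, `register`, `unregister`, `lookup`, and the example endpoints are hypothetical; this is not D-NET's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical description a service publishes when it joins the network."""
    service_type: str  # e.g. "search", "aggregation", "index"
    endpoint: str      # network address where the service can be reached


class ServiceRegistry:
    """Toy registry: services register at runtime, clients look them up by type."""

    def __init__(self) -> None:
        self._services = defaultdict(list)

    def register(self, profile: ServiceProfile) -> None:
        # Ignore duplicate registrations of the same profile.
        if profile not in self._services[profile.service_type]:
            self._services[profile.service_type].append(profile)

    def unregister(self, profile: ServiceProfile) -> None:
        # A service leaving the network is simply dropped from its type's list.
        if profile in self._services[profile.service_type]:
            self._services[profile.service_type].remove(profile)

    def lookup(self, service_type: str) -> list:
        # Return a copy so callers cannot mutate the registry's internal state.
        return list(self._services.get(service_type, []))


# Hypothetical services on two machines of the network.
registry = ServiceRegistry()
registry.register(ServiceProfile("index", "http://node-a.example.org/index"))
registry.register(ServiceProfile("search", "http://node-b.example.org/search"))
print([p.endpoint for p in registry.lookup("index")])  # ['http://node-a.example.org/index']
```

Because clients ask the registry rather than hard-coding addresses, services can be added, moved, or removed without reconfiguring their consumers, which is the property the "Lego approach" relies on.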
DRIVER production infrastructure: D-Net’s release v1.1
[Figure: layered architecture with an Enabling Layer; a Data Layer over EU Open Access Repositories; a Functionality Layer serving administrators and end users through Advanced and Light User Interfaces; built on DRIVER hardware/software resources]
DRIVER EU Repository Map
Repository Landscape
DRIVER activity
254 repositories – 31 countries
220+ harvested
1.2M documents
European repositories: ~500
World repositories: ~1,100
Story: Tales from Repository Managers
“Initially I just used the Validation tool to see if our repository is more or less on track and was reassured when the results looked good, which gave me confidence to register.”
- Louw Venter, Boloka Research Repository of the North-West University, South Africa
COAR
Confederation of Open Access Repositories
Permanent organisational backbone for European (and world) repository infrastructure
– Geographic and thematic extension
– Diffusion of DRIVER technology
– Connect established communities of practice
– Promote Open Access
– Fill repositories with Open Access publications
Mature Federations
Extended affiliations
New partners
National aggregations
D-Net’s current uptake
DRIVER European Information Space
– www.driver-community.eu
OpenAIRE EC pilot
– www.openaire.eu
European Film Gateway and other EC projects
– www.europeanfilmgateway.eu
Experimental deployment of new infrastructure instances
– China, India, Portugal, Belgium, Spain, Slovenia
OpenAIRE High-Level Objectives
Implement European policy on Open Access
“Every publication resulting from European funding under FP7 or from the ERC should be stored in a repository and be openly available”
Promote the above policy to researchers
Pilot project for full-scale implementation in the future
OpenAIRE - factsheet
Open Access Infrastructure for Research in Europe
Programme: FP7 – Research Infrastructures
Starting date: December 1, 2009
Duration: 36 months
Budget: 4.1 Million
38 partners covering all European member-states
Website: www.openaire.eu
Partners
University of Athens (coordinator)
University of Goettingen Library (scientific coordinator)
CNR-ISTI (technical coordinator)
University of Bielefeld
Spanish National Research Council (CSIC)
CERN
SURF
ICM – University of Warsaw
University of Minho
University of Gent Library
eIFL
Technical University of Denmark
Scientific Communities
Health (Life Sciences)
– EMBL-EBI
Environment
– World Data Center for Climate
– Consultative Group on International Agricultural Research (CGIAR)
Information & Communication Science
– Cognitive Interaction Technology (CITEC)
Socio-economic Sciences and Humanities
– Data Archiving and Networked Services (DANS)
Liaison Offices
Region 1 North (DTU)
Denmark (Danish Technical University)
Finland (University of Helsinki)
Sweden (National Library of Sweden)
Region 2 South (UMINHO)
Cyprus (University of Cyprus)
Greece (National Documentation Center)
Italy (CASPAR)
Malta (Malta Council for Science & Technology)
Portugal (University of Minho)
Spain (Spanish Foundation for Science & Technology)
Region 3 East (eIFL)
Bulgaria (Bulgarian Academy of Sciences)
Czech Republic (Technical University of Ostrava)
Estonia (University of Tartu)
Hungary (HUNOR)
Latvia (University of Latvia)
Lithuania (Kaunas Technical University)
Poland (ICM – University of Warsaw)
Romania (Kosson)
Slovakia (University Library of Bratislava)
Slovenia (University of Ljubljana)
Region 4 West (UGENT)
France (Couperin)
Germany (University of Konstanz)
Ireland (Trinity College)
Netherlands (Utrecht University)
UK (SHERPA)
Austria (University of Vienna)
Belgium (University of Gent)
European Helpdesk
National Open Access Liaison Offices (27 countries)
Provide OA “toolkits” for
– Researchers
– Institutions
Set up a 24/7 portal for depositing and searching OA publications
Liaison with
– Other European OA initiatives
– Publishers
– CRIS systems
Supporting repository eInfrastructure
OpenAIRE portal built on D-NET
Access to scientific publications
– Search, browse
– Visualization tools
Deposition of articles
– Set up a repository for “orphan” (better, “homeless”) researchers (CERN’s INVENIO)
– Harvest OA publications from existing repositories
Provide monitoring tools for
– Document/depositing statistics
– Usage statistics from repository infrastructure
Interoperation with other infrastructures
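Harvesting OA publications from existing repositories is commonly done over the OAI-PMH protocol. Assuming the repositories expose OAI-PMH with unqualified Dublin Core (`oai_dc`), the Python sketch below parses one `ListRecords` response page and extracts record identifiers, titles, and the `resumptionToken` used to request the next page. It runs offline on an inline sample; the sample record, identifier, and token are made up, and a real harvester would also fetch pages over HTTP and handle errors and deleted records.

```python
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up ListRecords page, trimmed to the elements this sketch reads.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An open access article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token or None) for one page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    # A missing or empty token means this was the last page.
    return records, (token or None)


records, token = parse_list_records(SAMPLE_PAGE)
print(records)  # [('oai:repo.example.org:1', 'An open access article')]
print(token)    # page-2
```

A harvester loops: request a page, store its records, and repeat with the returned token until it comes back empty.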
OpenAIRE in a nutshell
D-NET platform
DRIVER-2-OpenAIRE Take Away
Changing the culture in research publications
Open accessibility to research results
Metrics of research output vs. funding
Technology + info + people infrastructures
Data in the 4th Science Paradigm
Captured by instruments or generated by simulators
Processed by software
Stored in computer as Information/Knowledge
Analyzed while in the scientist’s database / files, using data management and statistics
Overall Data Flow
Data acquisition, reduction, analysis, visualization, storage
[Figure: data flow diagram with a Data Acquisition System producing raw data and metadata, a High Speed Network, supercomputers, remote storage, local users, and remote users w/ local computing and storage]
PAN-STARRS
PS1
– detect ‘killer asteroids’, starting in November 2008
– Hawaii + JHU + Harvard + Edinburgh + Max Planck Society
Data Volume
– >1 petabyte/year of raw data
– Over 5B celestial objects plus 250B detections in DB
– 100TB database
– PS4: 4 identical telescopes in 2012, generating 4PB/yr
Cosmological Simulations
Cosmological simulations have 10⁹ particles and produce over 30TB of data (Millennium)
Build up dark matter halos
Track merging history of halos
Use it to assign star formation history
Combination with spectral synthesis
Realistic distribution of galaxy types
Hard to analyze the data afterwards, so a DB is needed
Optimize comparison to real data
Immersive Turbulence
Unique turbulence database
– Consecutive snapshots of a 1,024³ simulation of turbulence: now 30 Terabytes
– Soon 6K³ and 300 Terabytes
– Hilbert-curve spatial index and massive mining
– Treat it as an experiment, observe the database!
– Throw test particles in from your laptop, immerse yourself into the simulation, like in the movie Twister
New paradigm for analyzing HPC simulations!
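A Hilbert-curve spatial index maps grid cells to positions along a space-filling curve, so cells that are close in space tend to be close on disk and a spatial region can be read with few range scans. The turbulence database stores a 3-D grid and the slide does not say which Hilbert variant it uses; for brevity, the sketch below is the standard 2-D `xy2d` conversion in Python, on an n × n grid with n a power of two. The function name `hilbert_index` is our own.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert-curve position."""
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant (at the current scale s) the cell falls into.
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the coordinates so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# The four cells of a 2 x 2 block on a 4 x 4 grid are visited consecutively.
print([hilbert_index(4, x, y) for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]])  # [0, 1, 2, 3]
```

Sorting stored cells by `hilbert_index` gives the on-disk order; consecutive indices always correspond to neighbouring cells, which is what makes range queries over a spatial neighbourhood efficient.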
LHC and other HEP data
[Figure: height of a CD stack holding 1 year of LHC data (~20 km), compared with Concorde (15 km), a balloon (30 km), and Mt. Blanc (4.8 km)]
Very complex data model
Will generate 1GB/s, 10 PB/y
Data: raw, calibrated, skimmed, high-level objects, physics analyses, results
Duplicated for in-silico experiments to interpret data
Dependence on grey literature: calibration constants, algorithms ... oral tradition!
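The LHC figures can be cross-checked with simple arithmetic. The sketch below assumes roughly 10⁷ seconds of effective data taking per year, 700 MB per CD, and 1.2 mm per disc; none of these three constants is stated on the slide. Under those assumptions, 1 GB/s gives about 10 PB/year, and that volume stacks to roughly 17 km of CDs, the same order of magnitude as the ~20 km quoted.

```python
RATE_BYTES_PER_S = 1e9          # 1 GB/s, from the slide
SECONDS_OF_DATA_TAKING = 1e7    # assumption: effective running time per year
CD_CAPACITY_BYTES = 700e6       # assumption: 700 MB per CD
CD_THICKNESS_M = 1.2e-3         # assumption: 1.2 mm per disc

yearly_bytes = RATE_BYTES_PER_S * SECONDS_OF_DATA_TAKING
yearly_pb = yearly_bytes / 1e15               # decimal petabytes
cds_needed = yearly_bytes / CD_CAPACITY_BYTES
stack_km = cds_needed * CD_THICKNESS_M / 1e3  # metres -> kilometres

print(f"{yearly_pb:.0f} PB/year")          # 10 PB/year
print(f"{cds_needed:,.0f} CDs")            # 14,285,714 CDs
print(f"stack height ~{stack_km:.1f} km")  # stack height ~17.1 km
```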
Other Reference Applications
SDSS: 10TB total, 3TB in DB, soon 10TB, 6 years old
SkyQuery: fast spatial joins on largest astronomy catalogs / replicate multi-TB datasets 20x for performance (1Bx1B in 3 mins)
OncoSpace: 350TB of radiation oncology images today, 1PB in two years, to be analyzed on the fly
BaBar: grows 1TB/day; 2/3 simulation information, 1/3 observational information
VLBA (NRAO): generates 1GB/s today
NCBI: “only ½ TB” but 2X each year; very rich dataset
Pixar: 100 TB/Movie
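The NCBI entry describes exponential growth: doubling each year means a size of 0.5 × 2ⁿ TB after n years. The sketch below is a projection, not data; it assumes the yearly doubling simply continues and takes 1 PB = 1,000 TB. The helper name `projected_size_tb` is our own.

```python
import math


def projected_size_tb(initial_tb: float, years: int, growth_factor: float = 2.0) -> float:
    """Size after `years` of compound growth; growth_factor=2.0 means doubling yearly."""
    return initial_tb * growth_factor ** years


for years in (0, 5, 10):
    print(years, projected_size_tb(0.5, years))
# 0 0.5
# 5 16.0
# 10 512.0

# Whole years of doubling needed to go from 0.5 TB to 1 PB (1,000 TB).
years_to_pb = math.ceil(math.log2(1000 / 0.5))
print(years_to_pb)  # 11
```

Under this assumption a "small" ½ TB collection passes the petabyte mark in about a decade, which is why modest current volumes still call for scalable infrastructure.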
D4Science: Environmental Monitoring
European Space Agency
Global environmental issues: marine environment, forest ecosystem, air quality
Sensor data analysis, integration and correlation of data sources; reasoning, information/knowledge mgmt
Large amount of information ( 1TB), added-value applications and services
Seamless workflow definition & on-demand data processing
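One common way to model a workflow of this kind is as a dependency graph whose steps run in topological order, so that each step starts only after its inputs are ready. The Python sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The step names are hypothetical placeholders loosely echoing the bullets above; this is not D4Science's actual workflow engine or API.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "fetch_sensor_data": set(),
    "fetch_model_output": set(),
    "integrate_sources": {"fetch_sensor_data", "fetch_model_output"},
    "correlate": {"integrate_sources"},
    "publish_report": {"correlate"},
}


def execute(graph):
    """Run each step once all of its dependencies have finished; return (order, results)."""
    results = {}
    order = list(TopologicalSorter(graph).static_order())
    for step in order:
        # Every dependency of `step` already has an entry in `results` at this point.
        deps = sorted(graph[step])
        # Placeholder for real processing: record which inputs the step consumed.
        results[step] = f"{step}({', '.join(deps)})"
    return order, results


order, results = execute(workflow)
print(results["integrate_sources"])  # integrate_sources(fetch_model_output, fetch_sensor_data)
```

Independent steps (here the two fetches) may come out in either order; an on-demand system could run them in parallel instead of sequentially.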
D4Science: Fishery Resources Mgmt
Fishery@FAO and WorldFish Center
Worldwide spread researchers from many disciplines (biologists, climatologists, GIS experts, socio-economists, fishery managers, etc.)
Continuous assessment for sustainable development & use of the ecosystem of world’s fisheries and aquaculture, e.g., species, aquatic resources, hydrological changes
Extreme data diversity
Conclusions
Digital Libraries & Repositories: The new way for scholarly communication (final product)
Data Infrastructures: The new libraries for all scientific documentation (intermediate and final products)
Huge technological and organizational challenges
LONG way to go
FUN way to go
Thank you!