www.bsc.es
Volume, Variety, Velocity …
and Sharing
Anna Queralt
Storage System Research Group
Looking at things from a different perspective
“Creativity is just
connecting things”
Steve Jobs
“True originality consists
not in a new manner, but
in a new vision”
Edith Wharton
“Changing the answer
is evolution. Changing the
question is revolution”
Jorge Wagensberg
“We cannot solve our
problems with the same
thinking we used when
we created them”
Albert Einstein
Big Data
sharing
Open Data
Open data are the building blocks of open knowledge.
Open knowledge is what open data becomes when it’s
useful, usable and used.
Open data is data that can be freely used, reused and
redistributed by anyone - subject only, at most, to the
requirement to attribute and sharealike.
Importance of Open Data in Europe
“Towards a thriving data driven economy”
European strategy on data, with Open Data as
a prominent element
– Infrastructure
– Analysis
– Privacy
– ...
Why?
Makes public administration more efficient and more effective
– Thanks to Open Data, the US government has reduced the annual costs of
attending citizens from 500 M$ to 34 M$
Open data portals stimulate innovation and economic growth
– Applications that can help to improve society, tackle economic problems,
generate employment and drive economic growth
– Research suggests that seven sectors alone could generate
more than $3 trillion a year in additional value as a result of
open data
Source: Open Data: Unlocking Innovation and Performance with Liquid Information (McKinsey Global Institute)
– Big Data and open data will contribute more than 200,000
M€ to the European economy by 2020
Source: Big & Open Data in Europe: a growth engine or a missed opportunity? (demosEuropa, WISE, Microsoft)
How is data shared today?
Most open data is available as downloadable files (2509 sources)
Only 27% of sources are provided in a processable format (2132 are PDF)
Downloadable files: owner decides what can be copied
– Unnecessary data movements and copies
– Stale data
– Owner loses control over data
– Flexible
Data services: owner decides what and how data is shared
– Very restrictive
– Changes imply data provider involvement
– No data movements or copies
– Owner keeps full control
A new way of sharing data
dataClay
The pillars of dataClay
Data sharing
Control
Avoiding data transfers
A single data model
Why is persistent data different from volatile data?
Today
We have a data model for volatile data
Objects and data structures
We have a different model for persistent data
Relational database, NoSQL database, files
Future
Store data in the same way as when volatile
Store objects and relations
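The "Today" situation can be made concrete with a small sketch. It assumes two hypothetical classes (`Station`, `Reading`) and uses Python's standard `sqlite3` module as the persistent store; none of these names come from the slides. The point is the hand-written `save` and `load` translation between objects and rows, which is the step the "Future" bullets propose to remove.

```python
import sqlite3


class Reading:
    """Hypothetical volatile object: one temperature measurement."""

    def __init__(self, hour, temperature_c):
        self.hour = hour
        self.temperature_c = temperature_c


class Station:
    """Hypothetical volatile object that holds a relation to its readings."""

    def __init__(self, name, readings=None):
        self.name = name
        self.readings = list(readings or [])


def create_schema(conn):
    # The persistent model: two tables linked by a station id.
    conn.execute("CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE reading (station_id INTEGER, hour INTEGER, temperature_c REAL)"
    )


def save(conn, station):
    # Objects must be flattened into rows before they can be stored.
    cursor = conn.execute("INSERT INTO station (name) VALUES (?)", (station.name,))
    station_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO reading (station_id, hour, temperature_c) VALUES (?, ?, ?)",
        [(station_id, r.hour, r.temperature_c) for r in station.readings],
    )
    return station_id


def load(conn, station_id):
    # Rows must be reassembled into objects after they are read back.
    (name,) = conn.execute(
        "SELECT name FROM station WHERE id = ?", (station_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT hour, temperature_c FROM reading WHERE station_id = ? ORDER BY hour",
        (station_id,),
    ).fetchall()
    return Station(name, [Reading(hour, temp) for hour, temp in rows])


conn = sqlite3.connect(":memory:")
create_schema(conn)
original = Station("Station A", [Reading(9, 18.5), Reading(15, 24.0)])
restored = load(conn, save(conn, original))
print(restored.name, [(r.hour, r.temperature_c) for r in restored.readings])
```

Every change to the classes has to be mirrored in the schema and in both translation functions, which is the cost of keeping two data models.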
Our vision
Create a platform that
Enables applications to easily make objects persistent
Enables users to add more data or “change” the data model
Enables users to add new computations to be shared
&
The data owner does not lose control over the data
Key idea: self-contained objects and
data enrichment by 3rd parties
Push the idea of data services to the limit
Key technology: self-contained objects
[Figure. Labels: Client App (×2); Data service (Functions; Security, Integrity, …); Data store (×2); self-contained object (Data, Functions, Security, ...)]
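A minimal sketch of what a self-contained object could look like, assuming a hypothetical `TemperatureSeries` class; this is not dataClay's API, and all names are illustrative. It bundles the three elements from the figure labels: the data, the functions that operate on it, and the security and integrity checks.

```python
class AccessDenied(Exception):
    """Raised when a user calls a function they were not granted."""


class TemperatureSeries:
    """Hypothetical self-contained object: data, functions and security together."""

    def __init__(self, owner, readings):
        self._owner = owner
        self._readings = list(readings)  # the data stays inside the object
        self._grants = {owner: {"mean", "maximum", "append"}}

    def grant(self, caller, user, function_name):
        # Security: only the owner decides who may call what.
        if caller != self._owner:
            raise AccessDenied(f"{caller} cannot grant access")
        self._grants.setdefault(user, set()).add(function_name)

    def _check(self, user, function_name):
        if function_name not in self._grants.get(user, set()):
            raise AccessDenied(f"{user} may not call {function_name}")

    def append(self, user, value):
        self._check(user, "append")
        # Integrity: reject values that cannot be a temperature in Celsius.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._readings.append(value)

    def mean(self, user):
        self._check(user, "mean")
        return sum(self._readings) / len(self._readings)

    def maximum(self, user):
        self._check(user, "maximum")
        return max(self._readings)


series = TemperatureSeries("owner", [18.0, 24.0])
series.grant("owner", "alice", "mean")
print(series.mean("alice"))  # alice gets an aggregate, never the raw list
```

Because every access goes through a function on the object, the owner keeps control over what each user can see or change.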
Key technology: 3rd-party enrichments
Self-contained objects
seem to be a new technology to offer
data services in a different way
Then…
… we need something else …
… something to make it really flexible!
3rd-party enrichment
By enrichment we understand:
Adding new information to existing datasets
Adding new code to existing datasets
This enrichment should
Be possible during the life of data
Not be limited to the data owner
Enable different views of the data to different users/clients
Be shareable again
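The enrichment properties listed above can be sketched as follows. `Dataset` and `Enrichment` are hypothetical classes, the records are illustrative data only, and the layering approach is an assumption made for this example rather than a description of how dataClay implements enrichment.

```python
class Dataset:
    """Hypothetical dataset owned by the data provider."""

    def __init__(self, records):
        self._records = records  # list of dicts, each with an "id" key

    def rows(self):
        # Hand out per-row copies so callers cannot alter the owner's data.
        for record in self._records:
            yield dict(record)

    def call(self, name, *args):
        raise KeyError(f"no function named {name!r}")


class Enrichment:
    """Hypothetical view that layers new data and code over a base dataset."""

    def __init__(self, base, extra_fields=None, functions=None):
        self._base = base                        # a Dataset or another Enrichment
        self._extra_fields = extra_fields or {}  # record id -> extra fields
        self._functions = functions or {}        # name -> callable(view, *args)

    def rows(self):
        # Base rows are read on demand, so the view does not go stale.
        for row in self._base.rows():
            row.update(self._extra_fields.get(row["id"], {}))
            yield row

    def call(self, name, *args):
        # New code first, then whatever the layers below already offer.
        if name in self._functions:
            return self._functions[name](self, *args)
        return self._base.call(name, *args)


def density(view, record_id):
    row = next(r for r in view.rows() if r["id"] == record_id)
    return row["population"] / row["area_km2"]


def densest(view):
    return max(view.rows(), key=lambda r: r["population"] / r["area_km2"])["name"]


records = [
    {"id": 1, "name": "A", "population": 1000},
    {"id": 2, "name": "B", "population": 500},
]
owner_data = Dataset(records)

# A third party adds new information (areas) and new code (density).
with_area = Enrichment(
    owner_data, {1: {"area_km2": 10}, 2: {"area_km2": 2}}, {"density": density}
)

# Another user enriches the enrichment: it is shareable again.
ranked = Enrichment(with_area, functions={"densest": densest})

print(ranked.call("density", 2), ranked.call("densest"))
```

Users of `owner_data` still see the original fields only, while users of `with_area` or `ranked` see a different view of the same underlying records.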
Data can be enriched both with data and code in provider infrastructure
Code can be executed in the provider infrastructure
Then…
[Figure. Labels: Enrichment; Client App; Data provider infrastructure]
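A small sketch of why executing code in the provider infrastructure avoids data transfers. The `Provider` class and its `values_sent` counter are hypothetical and exist only to compare the two sharing styles; a real deployment would involve a network boundary rather than a method call.

```python
class Provider:
    """Hypothetical data provider that keeps its data on its own infrastructure."""

    def __init__(self, records):
        self._records = list(records)
        self.values_sent = 0  # counts values that cross over to the client

    def download(self):
        # File-style sharing: the whole dataset is copied to the client.
        self.values_sent += len(self._records)
        return list(self._records)

    def execute(self, function):
        # Code runs next to the data; only the result travels back.
        result = function(self._records)
        self.values_sent += 1
        return result


def mean(values):
    return sum(values) / len(values)


provider = Provider(range(1000))

local_copy = provider.download()
copied = provider.values_sent        # 1000 values moved to the client
mean_from_copy = mean(local_copy)

provider.values_sent = 0
mean_at_provider = provider.execute(mean)
returned = provider.values_sent      # 1 value moved to the client
```

Both paths compute the same mean, but the second one moves a single value instead of the full dataset and leaves no copy outside the provider.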
Efficient usage of resources
Data and code can be offloaded to resources not accessible by the data provider
Moreover…
[Figure. Labels: self-contained object (Data, Functions, Security, ...); Provider Infrastructure; Client Infrastructure; Cloud]
CONCLUSIONS
Sharing (big) data is key to innovation
Build new knowledge on top, and share it
See data produced by others from
a different perspective
Credits
dataClay team
– Toni Cortés (Team leader)
– Anna Queralt (PhD)
– Jonathan Martí
– Daniel Gasull
– Juanjo Costa (PhD)
– Alex Barceló
Former team members
– Ernest Artiaga (PhD)