www.bsc.es Volum, Varietat, Velocitat i Compartició Anna Queralt Storage System Research Group [email protected]

Volum, Varietat, Velocitat... i Compartició (Volume, Variety, Velocity... and Sharing)


Page 1: Volum, Varietat, Velocitat... i Compartició

www.bsc.es

Volum, Varietat, Velocitat... i Compartició

Anna Queralt

Storage System Research Group

[email protected]

Page 2: Volum, Varietat, Velocitat... i Compartició

Looking at things from a different perspective

“Creativity is just

connecting things”

Steve Jobs

“True originality consists

not in a new manner, but

in a new vision”

Edith Wharton

“To change the answer

is evolution. To change the

question is revolution”

Jorge Wagensberg

“We cannot solve our

problems with the same

thinking we used when

we created them”

Albert Einstein

Page 3: Volum, Varietat, Velocitat... i Compartició

Big Data

sharing

Page 4: Volum, Varietat, Velocitat... i Compartició

Open Data

Open data are the building blocks of open knowledge.

Open knowledge is what open data becomes when it’s

useful, usable and used.

Open data is data that can be freely used, reused and

redistributed by anyone - subject only, at most, to the

requirement to attribute and sharealike.

Page 5: Volum, Varietat, Velocitat... i Compartició

Importance of Open Data in Europe

“Towards a thriving data driven economy”

European strategy on data, with Open Data as

a prominent element

– Infrastructure

– Analysis

– Privacy

– ...

Page 6: Volum, Varietat, Velocitat... i Compartició

Why?

Makes public administration more efficient and more effective

– Thanks to Open Data, the US government has reduced the annual cost of

attending to citizens from $500M to $34M

Open data portals stimulate innovation and economic growth

– Applications that can help to improve society, tackle economic problems,

generate employment and drive economic growth

– Research suggests that seven sectors alone could generate

more than $3 trillion a year in additional value as a result of

open data

Source: Open Data: Unlocking Innovation and Performance with Liquid Information (McKinsey Global Institute)

– Big Data and open data will contribute more than €200 billion

to the European economy by 2020

Source: Big & Open Data in Europe: A growth engine or a missed opportunity? (demosEuropa, WISE, Microsoft)

Page 7: Volum, Varietat, Velocitat... i Compartició

How is data shared today?

Most open data is available as downloadable files (2509 sources)

Page 8: Volum, Varietat, Velocitat... i Compartició

How is data shared today?

Only 27% of sources are provided in a processable format (2132 are PDF)

Page 9: Volum, Varietat, Velocitat... i Compartició

How is data shared today?

Downloadable files: owner decides what can be copied

– Unnecessary data movements and copies

– Stale data

– Owner loses control over data

– Flexible

Data services: owner decides what and how data is shared

– Very restrictive

– Changes imply data provider involvement

– No data movements or copies

– Owner keeps full control

Page 10: Volum, Varietat, Velocitat... i Compartició

A new way of sharing data

dataClay

Page 11: Volum, Varietat, Velocitat... i Compartició

The pillars of dataClay

Data sharing

Control

Avoiding data transfers

A single data model

Page 12: Volum, Varietat, Velocitat... i Compartició

Why is persistent data different from volatile data?

Today

We have a data model for volatile data

Objects and data structures

We have a different model for persistent data

Relational database, NoSQL database, files

Future

Store data in the same way as when volatile

Store objects and relations
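To make the "same model" idea concrete, here is a minimal sketch in plain Python. It is not dataClay code: the `Sensor` and `Station` classes are hypothetical, and the standard-library `pickle` module only stands in for a persistent store. The point is that the object graph, including its relations and methods, is stored and recovered without being mapped to tables or files first.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class Sensor:                      # hypothetical example class
    name: str
    readings: list = field(default_factory=list)


@dataclass
class Station:                     # hypothetical example class
    city: str
    sensors: list = field(default_factory=list)   # relation to Sensor objects

    def mean_reading(self):
        values = [r for s in self.sensors for r in s.readings]
        return sum(values) / len(values)


station = Station("Barcelona", [Sensor("temp-1", [20.0, 22.0]),
                                Sensor("temp-2", [24.0])])

# Persist the whole object graph as it is; the bytes could be written to disk.
stored = pickle.dumps(station)

# Later: the same objects and relations come back, with no mapping layer.
restored = pickle.loads(stored)
```

With a relational store, the same round trip would need two tables, a foreign key, and code to rebuild the objects from joined rows.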

Page 13: Volum, Varietat, Velocitat... i Compartició

Our vision

Create a platform that

Enables applications to easily make objects persistent

Enables users to add more data or “change” the data model

Enables users to add new computations to be shared

&

The data owner does not lose control over the data

Key idea: self-contained objects and

data enrichment by 3rd parties

Page 14: Volum, Varietat, Velocitat... i Compartició

Push the idea of data services to the limit

Key technology: self-contained objects

[Figure: two configurations, each with a client app. Labels: Data service, Data store, Data, Functions, Security, Integrity, …]
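One way to picture a self-contained object is as a unit that carries its data, the functions that operate on it, and the access rules together. The sketch below is an assumption for illustration only: the class name, the `grant`/`call` methods, and the policy format are hypothetical and are not taken from dataClay.

```python
class SelfContainedObject:
    """Hypothetical sketch: data, functions, and access policy in one unit."""

    def __init__(self, owner, records):
        self._owner = owner
        self._records = list(records)   # the data is never handed out directly
        self._allowed = {}              # user -> set of function names

    def grant(self, caller, user, function_name):
        # Only the owner decides who may call which function.
        if caller != self._owner:
            raise PermissionError("only the owner can grant access")
        self._allowed.setdefault(user, set()).add(function_name)

    def call(self, user, function_name, *args):
        # Every access goes through the object's own security check.
        permitted = self._allowed.get(user, set())
        if user != self._owner and function_name not in permitted:
            raise PermissionError(f"{user} may not call {function_name}")
        return getattr(self, "_fn_" + function_name)(*args)

    # Functions that travel with the data.
    def _fn_count(self):
        return len(self._records)

    def _fn_average(self):
        return sum(self._records) / len(self._records)


readings = SelfContainedObject("owner", [10, 20, 30])
readings.grant("owner", "alice", "count")
alice_count = readings.call("alice", "count")
```

Here `alice` can count the records but cannot compute the average until the owner grants it, so the owner keeps control without exposing the raw data.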

Page 15: Volum, Varietat, Velocitat... i Compartició

Key technology: 3rd-party enrichment

Self-contained objects

seem to be a new technology to offer

data services in a different way

Then…

… we need something else …

… something to make it really flexible!

Page 16: Volum, Varietat, Velocitat... i Compartició

3rd-party enrichment

By enrichment we understand:

Adding new information to existing datasets

Adding new code to existing datasets

This enrichment should

Be possible during the life of data

Not be limited to the data owner

Enable different views of the data to different users/clients

Be shareable again
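The requirements above can be sketched in a few lines of plain Python. Everything here is hypothetical (the `Dataset` and `Enrichment` classes, their method names, and the sample rows) and is not dataClay's interface. The sketch only shows the shape of the idea: a third party adds fields and code, the owner's rows stay untouched, each user gets their own view, and an enrichment can be shared again.

```python
class Enrichment:
    def __init__(self, author, extra_fields, functions):
        self.author = author
        self.extra_fields = extra_fields   # row index -> {field: value}
        self.functions = functions         # name -> callable taking rows
        self.visible_to = {author}


class Dataset:
    def __init__(self, owner, rows):
        self.owner = owner
        self._rows = [dict(r) for r in rows]   # owner's data, never modified
        self._enrichments = {}

    def enrich(self, author, name, extra_fields=None, functions=None):
        # Anyone, not only the owner, may add information or code.
        self._enrichments[name] = Enrichment(
            author, extra_fields or {}, functions or {})

    def share(self, author, name, user):
        enrichment = self._enrichments[name]
        if enrichment.author != author:
            raise PermissionError("only the enrichment's author can share it")
        enrichment.visible_to.add(user)

    def _visible(self, user):
        return [e for e in self._enrichments.values() if user in e.visible_to]

    def view(self, user):
        # Each user sees the original rows plus the enrichments shared with them.
        rows = [dict(r) for r in self._rows]
        for e in self._visible(user):
            for index, extra in e.extra_fields.items():
                rows[index].update(extra)
        return rows

    def run(self, user, function_name):
        for e in self._visible(user):
            if function_name in e.functions:
                return e.functions[function_name](self.view(user))
        raise PermissionError(f"{user} cannot see {function_name}")


stations = Dataset("city", [{"name": "A", "temp": 20}, {"name": "B", "temp": 30}])
stations.enrich(
    "lab", "comfort",
    extra_fields={0: {"comfort": "good"}, 1: {"comfort": "hot"}},
    functions={"hottest": lambda rows: max(rows, key=lambda r: r["temp"])["name"]},
)
stations.share("lab", "comfort", "alice")
```

In this sketch `lab` and `alice` see the extra `comfort` field and can run `hottest`, while any other user still sees only the owner's original rows.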

Page 17: Volum, Varietat, Velocitat... i Compartició

Then…

Data can be enriched both with data and code in the provider infrastructure

Code can be executed in the provider infrastructure

[Figure labels: Enrichment, Client App, Data provider infrastructure]
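The difference between copying data to the client and running code next to the data can be illustrated with a toy model. The `Provider` class and its `items_sent` counter are assumptions made for this sketch, not part of any real system. The counter simply records how many values leave the provider.

```python
class Provider:
    """Hypothetical stand-in for a data provider's infrastructure."""

    def __init__(self, readings):
        self._readings = readings
        self.items_sent = 0          # crude count of values leaving the provider

    def download(self):
        # Downloadable-file style: the whole dataset is copied to the client.
        self.items_sent += len(self._readings)
        return list(self._readings)

    def execute(self, function):
        # Code runs where the data lives; only the result is returned.
        result = function(self._readings)
        self.items_sent += 1
        return result


by_copy = Provider(list(range(1000)))
local_max = max(by_copy.download())      # client computes on its own copy

by_code = Provider(list(range(1000)))
remote_max = by_code.execute(max)        # provider computes, client gets one value
```

Both approaches give the same answer, but the first moves 1000 values and leaves a copy that can go stale, while the second moves one. A real platform would also have to restrict what submitted code is allowed to do, which this sketch does not attempt.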

Page 18: Volum, Varietat, Velocitat... i Compartició

Moreover…

Efficient usage of resources

Data and code can be offloaded to resources not accessible by the data provider

[Figure labels: Data, Functions, Security, ...; Provider Infrastructure, Client Infrastructure, Cloud]

Page 19: Volum, Varietat, Velocitat... i Compartició

CONCLUSIONS

Page 20: Volum, Varietat, Velocitat... i Compartició

Conclusions

Sharing (big) data is key to innovation

Build new knowledge on top, and share it

See data produced by others from

a different perspective

Page 22: Volum, Varietat, Velocitat... i Compartició

Credits

dataClay team

– Toni Cortés (Team leader)

– Anna Queralt (PhD)

– Jonathan Martí

– Daniel Gasull

– Juanjo Costa (PhD)

– Alex Barceló

Former team members

– Ernest Artiaga (PhD)