Visions of the Web of Open Data

Chris Davis - @cbdvs

http://enipedia.tudelft.nl

c.b.davis@tudelft.nl

Who am I?

● Postdoc Energy & Industry, TBM, TU Delft

● Focus on Industrial Ecology, Open Data, Collaborative Software, Modeling, Visualization, Analytics, etc.

Motivations

● Energy and sustainability are some of the most important topics of the 21st century

● Need both aggregated and fine-grained data

● Research can be data intensive

● There's a lot out there, but connecting it is tedious

● Researchers often duplicate effort

● It would be great to revolutionize how we deal with this data

● The energy sector is only slowly embracing the ICT & Open Data revolutions

“Information wants to be free”

Information wants to be free because it has become so cheap to distribute, copy, and recombine - too cheap to meter.

Stewart Brand

There's a Tension...

It wants to be expensive because it can be immeasurably valuable to the recipient.

Stewart Brand

That tension will not go away. It leads to endless wrenching debate about price, copyright, “intellectual property,” and the moral rightness of casual distribution, because each round of new devices makes the tension worse, not better.

Stewart Brand

If you cling blindly to the expensive part of the paradox, you miss all the action going on in the free part.

The pressure of the paradox forces information to explore incessantly.

Stewart Brand

Enipedia.tudelft.nl

A tale of one (or four?) power stations and seven data sets

How the European Commission manages data

Large Combustion Plants Directive

http://ec.europa.eu/environment/air/pollutants/stationary/lcp/legislation.htm

Coupling of Power Production to Water Consumption

"Water becoming a serious constraint for power generation" Aditi Nigam, The Hindu, July 18, 2012

Transparency?

Copyright

Unless specifically prohibited by a notice published on any page, you may make a print copy of such parts of the Web-site as you may reasonably require for your use provided that any copy has attached to it any relevant proprietary notices and/or disclaimers. You agree not for yourself or through or by way of assistance to any third party to distribute, decompile, reverse engineer, disassemble or otherwise deal in or with the Website or materials therein or otherwise commercially exploit such material or content otherwise than as permitted by law.

[...]

Costs of Access

You shall be responsible for obtaining access to the internet in order to make use of the Website and shall pay any service fees, telephone charges or other costs associated with such access.

Data for further analysing purposes are to be downloaded using the means available at the website. The use of crawlers, robots or similar tools will be seen as offensive and will lead to a temporarily or permanent disclosure of a user/company from the website.

[...]

The downloads shall be based on fair use, unproportional downloads (7.5 times more than the average user of the same category) may lead to a withdrawal of the user rights without prior notice.

http://www.gas-roads.eu/gte_tp/html/termsandconditions

http://www.entsoe.net/res/disclaimer.pdf

Officially Curated vs. Crowdsourced data

● Crowdsourcing generally OK for easily verifiable data

● Officially curated data needed for comprehensive, hard to verify data, small specialized communities

● Crowdsourced data is only possible because of revision control.

● Crowdsourced data needs an incentive

● General interests, hobbies, gamification

Data Quality as a Product

Data Quality as a Process

How to Measure Data Quality?

Data Quality = Researcher Skill/Experience × # Viewers/Editors × Ease of Independent Verification

Low Editor Diversity

High Editor Diversity
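
The multiplicative relationship on this slide can be illustrated with a small sketch. It is not a metric defined in the talk: the function name `data_quality` and the normalization of each factor to the range 0 to 1 are assumptions made only for illustration. The point it shows is that, because the factors multiply, one weak factor pulls the whole score down regardless of the others.

```python
def data_quality(researcher_skill: float,
                 viewers_editors: float,
                 ease_of_verification: float) -> float:
    """Combine three factors multiplicatively.

    Assumption (not from the slides): each factor is normalized to [0, 1].
    """
    factors = {
        "researcher_skill": researcher_skill,
        "viewers_editors": viewers_editors,
        "ease_of_verification": ease_of_verification,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return researcher_skill * viewers_editors * ease_of_verification


# Hypothetical inputs: skilled researcher, but nobody else looks at the data.
few_eyes = data_quality(1.0, 0.0, 1.0)
# Hypothetical inputs: moderate values on every factor.
balanced = data_quality(0.5, 0.5, 1.0)
print(few_eyes, balanced)
```

With these made-up inputs, a data set that no one else views or edits scores zero even if the researcher is highly skilled and the data is easy to verify.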

How to Measure Data Quality?

● Eric Raymond – “With many eyes all bugs are shallow”

● But... not all eyes are evenly distributed

Distributed Air Quality Sensors

http://airqualityegg.wikispaces.com/

Two Long Tails

Contributors

Data

Diminishing Marginal Returns

Diversity of User Knowledge

Club of 27?

Loosely Coupled

Open Data?

Linked Open Data?

enipedia.tudelft.nl/maps

http://skytruth.org/viirs/

Big Data?

Big Data

http://uncyclopedia.wikia.com/wiki/Rocket_Propelled_Chainsaw

Back to Basics

● API

● REST

● GET

● POST

● CSV

● XML

● JSON

● CC BY-SA
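
As a rough sketch of how little is needed to work with these basics, the example below fetches text with a plain HTTP GET and converts CSV into JSON using only the Python standard library. The function names `fetch_text` and `csv_to_json`, the column names, and the sample values are all made up for illustration; no real endpoint from the talk is assumed, and `fetch_text` is defined but not called here.

```python
import csv
import io
import json
import urllib.request


def fetch_text(url: str) -> str:
    """Issue a plain HTTP GET and return the response body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (first row is the header) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Made-up sample data standing in for a downloaded CSV file.
# With a real URL this would be: csv_to_json(fetch_text(url))
sample = "plant,country,capacity_mw\nExample Plant,NL,100\n"
print(csv_to_json(sample))
```

Note that `csv.DictReader` leaves every value as a string, so numeric columns such as the hypothetical `capacity_mw` would still need explicit conversion before analysis.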

Conclusions

● Data is embedded in a socio-technical system

● Co-evolution of Data, Platforms, and Communities

● Official data needs crowdsourced data & vice versa

● Data Quality as a product vs. a process

Questions?

Chris Davis - @cbdvs

http://enipedia.tudelft.nl

c.b.davis@tudelft.nl