32
YORK OCT 2011 In praise of inconsistency the long tail of small data Alan Dix Talis and Lancaster University www.hcibook.com/alan/ alandix.com/blog

In praise of inconsistency - the long tail of small data

Embed Size (px)

Citation preview

Page 1: In praise of inconsistency - the long tail of small data

YORK OCT 2011

In praise of inconsistencythe long tail of small data

Alan DixTalis and Lancaster University

www.hcibook.com/alan/alandix.com/blog

Page 2: In praise of inconsistency - the long tail of small data

YORK OCT 2011

LancasterUniversity

Tiree

TalisTiree Tech Wave3-7 Nov

Page 3: In praise of inconsistency - the long tail of small data

YORK OCT 2011

today I am not talking about …

• intelligent internet interfaces• visualisation and sampling• situated displays, eCampus,

small device – large display interactions• fun and games, virtual crackers,

artistic performance, slow time• physicality and product design• creativity and Bad Ideas• modelling dreams and regret

Page 4: In praise of inconsistency - the long tail of small data

YORK OCT 2011

… or even lots of lights

http:/www.hcibook.com/alan/projects/firefly/

Page 5: In praise of inconsistency - the long tail of small data

YORK OCT 2011

back in the 1980s ... Codd and all that

• in theory:– normalisation, atomicity– illusion of single use & strong internal consistency

• in practice– de-normalise for efficiency– maintain consistency through controlled transactions– business logic, APIs

Page 6: In praise of inconsistency - the long tail of small data

YORK OCT 2011

the IS ideal

transactionalupdate

multipleviewssingle central

repository

Page 7: In praise of inconsistency - the long tail of small data

YORK OCT 2011

the more things change ...

the cloud

APIupdate

web-basedviews

Page 8: In praise of inconsistency - the long tail of small data

YORK OCT 2011

... the more they stay the same

single central repository

transactionalupdate

multipleviews

Page 9: In praise of inconsistency - the long tail of small data

YORK OCT 2011

limits of consistency

consistency not always possible

• distribution and caching

• multi-user update (Alison and Brian)

• view-based updates

http://www.perryslingsbysystems.com/trenchers.htmlhttp://www.vfridge.com/

Page 10: In praise of inconsistency - the long tail of small data

YORK OCT 2011

ordering problems(race conditions)

Alison Brian

send send

It's a beautiful dayLet's go out afterwork.

I agree totally

It's a beautiful day. Let's go out after work.

Alison It's a beautiful day. Let's go out after work.

Alison

sendsend

perhaps not, I lookawful after thelate party

perhaps not, I look awful after the late party

Alison I agree totallyBrian

sendsendsend send

I agree totallyBrianperhaps not, I look awful after the late party

Alison

Page 11: In praise of inconsistency - the long tail of small data

YORK OCT 2011

limits of consistency

consistency not always possible

• distribution and caching

• multi-user update (Alison and Brian)

• view-based updates

http://www.perryslingsbysystems.com/trenchers.html

Page 12: In praise of inconsistency - the long tail of small data

YORK OCT 2011

view based updatecomplimentary functions

view /display

central state /data base

D

S

v v

S’f

D’v(f)

Page 13: In praise of inconsistency - the long tail of small data

YORK OCT 2011

view based updatecomplimentary functions

view /display

central state /data base

D

S

v v

S’v–1(op)

D’op

Page 14: In praise of inconsistency - the long tail of small data

YORK OCT 2011

... always

goal is eventual consistency

Page 15: In praise of inconsistency - the long tail of small data

YORK OCT 2011

sometime not possible

distributed garbage collection– various algorithms ...

aim to make sure referenced items not lost– but always storagee node can die

• options:– prevent loss of referenced item– accept loss of referenced item

• leases or “ref not found” exceptions

Page 16: In praise of inconsistency - the long tail of small data

YORK OCT 2011

what is consistent?

• conflicting updates• long-term transactions• synchronisation

... and Apple still can’t get it right!!

http://www.vfridge.com/

Page 17: In praise of inconsistency - the long tail of small data

YORK OCT 2011

internal and external consisitency

• the exam board ....

Page 18: In praise of inconsistency - the long tail of small data

YORK OCT 2011

is the world consistent anyway?

• departmental lists

Page 19: In praise of inconsistency - the long tail of small data

YORK OCT 2011

a different approach

do not enforce consistency

but highlight inconsistency

Page 20: In praise of inconsistency - the long tail of small data

YORK OCT 2011

a different approach

do not enforce consistencybut highlight inconsistency

• instead of views of central data, related yet different sources

• specify connections and automatically checkinform of updates, highlight discrepanciesbut allow divergence

Page 21: In praise of inconsistency - the long tail of small data

YORK OCT 2011

‘workspace’

concept – workspaces

department nameaccounts jane baliktddsjhasddh sdhfg askjhlktechnical alan joun dixtechnical john mariani

centralinstitutional

database

spreadsheet oncolleague’s PC

table in word docon your own PC

Page 22: In praise of inconsistency - the long tail of small data

YORK OCT 2011

fast forward ten years ...

• semantic web and RDF– open schema (but can be specified)– open world model– flexible and extensible (e.g. Volkswagen)

• individual data sets– ontology engineering – getting the model right

• linking open data– connecting web of data– shared vocabularies and URIs

Page 23: In praise of inconsistency - the long tail of small data

YORK OCT 2011

linking open data

Page 24: In praise of inconsistency - the long tail of small data

YORK OCT 2011

linking open data

linking through:• shared• dereferencable• URIs

Page 25: In praise of inconsistency - the long tail of small data

YORK OCT 2011

fast forward ten years ...

• semantic web and RDF– open schema (but can be specified)– open world model– flexible and extensible (e.g. Volkswagen)

• individual data sets– ontology engineering – getting the model right

• linking open data– connecting web of data– shared vocabularies and URIs

soundsfamiliar?

Page 26: In praise of inconsistency - the long tail of small data

YORK OCT 2011

the long tail

size ofdata set

a few very large data setse.g. Open Govt., OS, geonames, dbpedia

the small data of ordinary life:from local bus timetables to squash club league tables

Page 27: In praise of inconsistency - the long tail of small data

YORK OCT 2011

supporting small data?

• Google fusion tables• Google refine• Freebase• Talis Kasabi

• mostly for ‘middle’ sized data

Page 28: In praise of inconsistency - the long tail of small data

YORK OCT 2011

really small?

personal, but also Govt.often

tables

describe semantics rather than ‘converting’• explicit – simple description• implicit – semantics through interaction

Page 29: In praise of inconsistency - the long tail of small data

YORK OCT 2011

explicit – 3 levels

• the table as it is:– there are 5 columns

col 1 is called “name”, col 2 is ‘population

• internal semantics of table (in its dataset)– each row is the properties of a country entity

defined by the ‘name’ column

• external linkage to standard data/vocab– rules + exceptions– country is ‘sameAs’ geoname country by matching name

except ‘Wales’ is geoname administrative region ...

Page 30: In praise of inconsistency - the long tail of small data

YORK OCT 2011

implicit

action is specification:• view a table and give it a name• link items/columns from different data sources• perform calculation

semantics are emergent through use

Page 31: In praise of inconsistency - the long tail of small data

YORK OCT 2011

so ...

long history of consistency... but not always possible or desirable

do not enforce consistencybut highlight inconsistency

exploit the long tail of small data

Page 32: In praise of inconsistency - the long tail of small data

YORK OCT 2011

plus ...

come toTiree Tech Wave

3-7 Nov 2011