YORK OCT 2011
In praise of inconsistency: the long tail of small data
Alan Dix
Talis and Lancaster University
www.hcibook.com/alan/
alandix.com/blog
Lancaster University
Tiree
Talis
Tiree Tech Wave, 3-7 Nov
today I am not talking about …
• intelligent internet interfaces
• visualisation and sampling
• situated displays, eCampus, small device – large display interactions
• fun and games, virtual crackers, artistic performance, slow time
• physicality and product design
• creativity and Bad Ideas
• modelling dreams and regret
… or even lots of lights
http://www.hcibook.com/alan/projects/firefly/
back in the 1980s ... Codd and all that
• in theory:
  – normalisation, atomicity
  – illusion of single use & strong internal consistency
• in practice:
  – de-normalise for efficiency
  – maintain consistency through controlled transactions
  – business logic, APIs
the IS ideal
transactional update
multiple views
single central repository
the more things change ...
the cloud
API update
web-based views
... the more they stay the same
single central repository
transactional update
multiple views
limits of consistency
consistency not always possible
• distribution and caching
• multi-user update (Alison and Brian)
• view-based updates
http://www.perryslingsbysystems.com/trenchers.html
http://www.vfridge.com/
ordering problems (race conditions)
[diagram: Alison and Brian exchange messages; two are sent at about the same time]
Alison: It's a beautiful day. Let's go out after work.
Brian: I agree totally
Alison: perhaps not, I look awful after the late party
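The ordering problem on this slide can be sketched in a few lines of Python. This is only an illustration, assuming that the last two messages are sent concurrently and that each participant's screen may show them in either order; the function names `possible_transcripts` and `diverged` are hypothetical and not part of the talk.

```python
from itertools import permutations

# Messages taken from the slide; the split into "first" and "concurrent"
# is an assumption made for this sketch.
FIRST = ("Alison", "It's a beautiful day. Let's go out after work.")
CONCURRENT = [
    ("Brian", "I agree totally"),
    ("Alison", "perhaps not, I look awful after the late party"),
]


def possible_transcripts(first, concurrent):
    """Return every order in which the concurrent messages could be shown."""
    return [[first, *order] for order in permutations(concurrent)]


def diverged(transcript_a, transcript_b):
    """True when two screens hold the same messages in a different order."""
    same_messages = sorted(transcript_a) == sorted(transcript_b)
    return same_messages and transcript_a != transcript_b


transcripts = possible_transcripts(FIRST, CONCURRENT)
for transcript in transcripts:
    print(" | ".join(f"{who}: {text}" for who, text in transcript))
```

In one ordering Brian appears to agree with going out; in the other he appears to agree that Alison looks awful, which is why the order matters even though both screens contain the same messages.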
limits of consistency
consistency not always possible
• distribution and caching
• multi-user update (Alison and Brian)
• view-based updates
http://www.perryslingsbysystems.com/trenchers.html
view based update: complementary functions
[diagram]
view / display: D → D’ by v(f)
central state / database: S → S’ by f
the view function v maps S to D and S’ to D’
view based update: complementary functions
[diagram]
view / display: D → D’ by op
central state / database: S → S’ by v⁻¹(op)
the view function v maps S to D and S’ to D’
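One way to read the two diagrams is that an operation op on the display is translated back into a state update that leaves the hidden part of the state (the complement of the view) unchanged. The sketch below illustrates that reading in Python. It assumes a toy state in which the view hides one field and display operations only edit existing rows; the records and the names `view`, `complement`, `translate` and `move_to_accounts` are hypothetical.

```python
def view(state):
    """v: project the central state onto the displayed fields."""
    return {key: {"name": rec["name"], "department": rec["department"]}
            for key, rec in state.items()}


def complement(state):
    """The part of the state that the view hides (here: salary_band)."""
    return {key: {"salary_band": rec["salary_band"]}
            for key, rec in state.items()}


def translate(op, state):
    """Turn a display operation into a state update, keeping the hidden
    part constant. Assumes op only edits existing rows."""
    new_display = op(view(state))
    hidden = complement(state)
    return {key: {**shown, **hidden[key]} for key, shown in new_display.items()}


def move_to_accounts(display):
    """A display-level operation (op): change one visible field of row 2."""
    updated = dict(display)
    updated[2] = {**display[2], "department": "accounts"}
    return updated


# Hypothetical records, invented for this example.
state = {
    1: {"name": "Person A", "department": "technical", "salary_band": "x"},
    2: {"name": "Person B", "department": "technical", "salary_band": "y"},
}
new_state = translate(move_to_accounts, state)

# The diagram commutes: viewing the new state gives the edited display.
print(view(new_state) == move_to_accounts(view(state)))
```

Holding the complement constant is a design choice that makes the translation unambiguous here; with views that aggregate or filter rows, several state updates may fit the same display change.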
... always
goal is eventual consistency
sometimes not possible
distributed garbage collection
  – various algorithms ...
aim to make sure referenced items are not lost
  – but a storage node can always die
• options:
  – prevent loss of referenced item
  – accept loss of referenced item
• leases or “ref not found” exceptions
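The "accept loss" option can be sketched as a store that hands out time-limited leases and raises an explicit error once a lease has lapsed. This is a minimal single-process illustration rather than a distributed algorithm; the class `LeasedStore`, the exception `RefNotFound` and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class RefNotFound(Exception):
    """Raised when a referenced item is missing or its lease has expired."""


class LeasedStore:
    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock          # injectable so the example is testable
        self._items = {}             # key -> (value, expiry time)

    def put(self, key, value):
        """Store a value and start a fresh lease on it."""
        self._items[key] = (value, self._clock() + self._lease)

    def get(self, key):
        """Return the value, or raise RefNotFound if it is gone or expired."""
        entry = self._items.get(key)
        if entry is None or entry[1] <= self._clock():
            self._items.pop(key, None)   # collect the expired item
            raise RefNotFound(key)
        return entry[0]

    def renew(self, key):
        """Extend the lease; fails with RefNotFound if it already lapsed."""
        self.put(key, self.get(key))


# Usage with a fake clock so that time can be advanced by hand.
now = [0.0]
store = LeasedStore(lease_seconds=10, clock=lambda: now[0])
store.put("doc", "hello")
now[0] = 5.0
store.renew("doc")                       # lease now runs until t = 15
now[0] = 20.0
try:
    store.get("doc")
except RefNotFound:
    print("ref not found: holder must cope with the loss")
```

The holder of a reference is never guaranteed that the item survives, but it is guaranteed a clear signal when it has not.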
what is consistent?
• conflicting updates
• long-term transactions
• synchronisation
... and Apple still can’t get it right!!
http://www.vfridge.com/
internal and external consistency
• the exam board ....
is the world consistent anyway?
• departmental lists
a different approach
do not enforce consistency
but highlight inconsistency
a different approach
do not enforce consistency
but highlight inconsistency
• instead of views of central data, related yet different sources
• specify connections and automatically check
  – inform of updates, highlight discrepancies
  – but allow divergence
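A small sketch of "highlight, do not enforce": two related sources are linked on a shared key, differences are reported, and neither source is modified. The function `discrepancies`, the field names and the sample records are hypothetical, chosen only to illustrate the idea.

```python
def discrepancies(source_a, source_b, key, fields):
    """Compare two lists of records linked on `key`.

    Returns a report of differences; it never changes either source.
    """
    index_b = {row[key]: row for row in source_b}
    report = []
    for row in source_a:
        other = index_b.get(row[key])
        if other is None:
            report.append((row[key], "missing in second source", None, None))
            continue
        for field in fields:
            if row.get(field) != other.get(field):
                report.append((row[key], field, row.get(field), other.get(field)))
    return report


# Invented records standing in for a central database and a local spreadsheet.
central = [
    {"id": 1, "name": "Person A", "department": "technical"},
    {"id": 2, "name": "Person B", "department": "accounts"},
    {"id": 3, "name": "Person C", "department": "technical"},
]
spreadsheet = [
    {"id": 1, "name": "Person A", "department": "technical"},
    {"id": 2, "name": "Person B", "department": "finance"},
]

report = discrepancies(central, spreadsheet, key="id", fields=["name", "department"])
for item in report:
    print(item)
```

The report is the whole output: whether to reconcile the two lists, and in which direction, is left to the people who own them.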
‘workspace’
concept – workspaces
department    name
accounts      jane balikt
ddsjhasddh    sdhfg askjhlk
technical     alan joun dix
technical     john mariani
central institutional database
spreadsheet on colleague’s PC
table in word doc on your own PC
fast forward ten years ...
• semantic web and RDF
  – open schema (but can be specified)
  – open world model
  – flexible and extensible (e.g. Volkswagen)
• individual data sets
  – ontology engineering – getting the model right
• linking open data
  – connecting web of data
  – shared vocabularies and URIs
linking open data
linking open data
linking through:
• shared
• dereferenceable
• URIs
fast forward ten years ...
• semantic web and RDF
  – open schema (but can be specified)
  – open world model
  – flexible and extensible (e.g. Volkswagen)
• individual data sets
  – ontology engineering – getting the model right
• linking open data
  – connecting web of data
  – shared vocabularies and URIs
sounds familiar?
the long tail
size of data set
a few very large data sets
e.g. Open Govt., OS, geonames, dbpedia
the small data of ordinary life:
from local bus timetables to squash club league tables
supporting small data?
• Google fusion tables
• Google refine
• Freebase
• Talis Kasabi
• mostly for ‘middle’ sized data
really small?
personal, but also Govt.
often tables
describe semantics rather than ‘converting’
• explicit – simple description
• implicit – semantics through interaction
explicit – 3 levels
• the table as it is:
  – there are 5 columns
  – col 1 is called ‘name’, col 2 is ‘population’
• internal semantics of table (in its dataset)
  – each row is the properties of a country entity defined by the ‘name’ column
• external linkage to standard data/vocab
  – rules + exceptions
  – country is ‘sameAs’ geoname country by matching name, except ‘Wales’ is geoname administrative region ...
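The three levels above could be written down as a plain description that sits alongside the table rather than converting it. The following Python sketch is one possible shape for such a description; the dictionary layout, the key names and the helper `link_row` are assumptions for illustration, and only the two column names and the Wales exception come from the slide.

```python
# A hypothetical three-level description of the table on the slide.
DESCRIPTION = {
    # Level 1: the table as it is (only two of the five columns are named).
    "columns": {1: "name", 2: "population"},
    # Level 2: internal semantics within its own dataset.
    "row_entity": {"type": "country", "key_column": "name"},
    # Level 3: external linkage, expressed as a rule plus exceptions.
    "linkage": {
        "rule": {"relation": "sameAs", "target": "geoname country",
                 "match_on": "name"},
        "exceptions": {"Wales": "geoname administrative region"},
    },
}


def link_row(row, description):
    """Apply the linkage rule to one row, letting exceptions override it."""
    name = row[description["row_entity"]["key_column"]]
    linkage = description["linkage"]
    target = linkage["exceptions"].get(name, linkage["rule"]["target"])
    return (name, linkage["rule"]["relation"], target)


# Two illustrative rows; other columns are omitted.
rows = [{"name": "France"}, {"name": "Wales"}]
links = [link_row(row, DESCRIPTION) for row in rows]
for link in links:
    print(link)
```

Keeping the exceptions next to the rule means the description stays short for the common case while still recording where the table and the external vocabulary disagree.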
implicit
action is specification:
• view a table and give it a name
• link items/columns from different data sources
• perform calculation
semantics are emergent through use
so ...
long history of consistency
... but not always possible or desirable
do not enforce consistency
but highlight inconsistency
exploit the long tail of small data
plus ...
come to Tiree Tech Wave
3-7 Nov 2011