Document versus content: getting quality information across the web

Kate Forbes-Pitt

15th June 2006


Documents vs content

Why bother?

Users like documents

Why?

What has this got to do with quality?

Quality of understanding

Quality of dissemination

Information is in-forming: it is a change in a person brought about by an encounter with data

Getting ideas across can be a tricky process.

Human inventiveness draws on every available resource, down to the look, the feel and even the smell of a document.

Users and their documents

Users ask for documents. Why? Because this is how they get information and share it

They think they are efficient

They know them and trust them

They think documents communicate information effectively

They have them already – it is less work

What is a document?

A piece of paper with writing on it

We can scan this and ‘pdf’ it

A way of capturing / imparting information

The letter

How do we know it’s a letter?

It is not a piece of paper with writing on it

It contains no ‘information’ at all

But:

It contains the information that it is a letter

What can be said about this information?

It is not held as words

Our recognition of it is almost instant

We all agree on it

It is social, implicit, learned knowledge about our world

What can be said about documents?

They contain information separate from their content

They ‘wrap’ the content and give information about it

They are layered:

This is what makes a document a document

Documents on the web

Is the document information successfully captured and reproduced?

No (arguably)

Why?

Two principal reasons:

1. Documents’ ‘old’ social rules are masked by ‘new’ rules

2. Social context is not ‘automatable’

1. Rules of interaction

Normative and pragmatic access

Normative access – rights and obligations surrounding a document

Letter

Is it addressed to you?

Are you obliged to reply to it?

Pragmatic access – governed by relevance and opportunity

A notice pinned to a notice board

Relevance is determined by those walking past

Opportunity – walking past and being literate or tall enough to see it

Computer access

Pragmatic access

Opportunity is not immediate

Level of literacy required is different

Normative access

Rights and obligations are different on the web

Communication of it is unnecessary

Social knowledge of the same kind is unnecessary

The implicit information that makes a document a document is superfluous

It retains it as a printed document

2. What can be automated?

A document requires social knowledge in order to interpret it

The social knowledge required is not available within the document

i.e. to interpret the document, one must have access to implicit rules not contained within the system

This is arguably impossible to ‘automate’ (Dreyfus; Collins and Kusch)

Dreyfus

Four levels of intelligence

Highest level – natural language translation

First I was afraid, I was petrified

Kept thinking I could never live without you by my side
But I spent so many nights thinking how you did me wrong
I grew strong
I learned how to carry on
And so you’re back from outer space

First had I keep thinking fear I petrified
I could without to apart from my side never live but I had spent thought
so much nights how you made yourselves me wrongly I developed much I learned how and to continue in such a way you are
from return of the special atmospheric area

Natural language translation requires social knowledge

Documents are the same

Collins and Kusch

Two types of action

Mimeomorphic

Repetitive – robotic, e.g. spot welding a car

Polimorphic

Rule bound BUT impossible to write a recipe for

HSBC adverts

Concur with Dreyfus:

Where social rules are involved you cannot automate

2. What can be automated?

We know we cannot automate a social process

We know the document to be a social artefact

Therefore we know that the document will be impoverished by automating it

Where does that leave us?

The web as a new postal system?

Or way of getting content to users?

Web

The computer has its own ‘rules of engagement’

It destroys pragmatic access

It has its own set of expectations

It masks the document’s own rules

Text is never ‘naked’

It is always in context

Understand the context

At best, document rules are confused, resulting in confused users

At worst, rules are lost, resulting in lost information

Content

Content wins out

Content uses the rules of the web

Content is enriched by this environment

Documents are impoverished by it

Quality information

Users are clear about the purpose of the text

Users are able to interpret it without confusion