Digital History seminar, 4 November 2014. Live stream: http://ihrdighist.blogs.sas.ac.uk/2014/10/28/tuesday-4-november-interrogating-the-archived-uk-web-historians-and-social-scientists-research-experiences/
Interrogating the archived UK web: “RNIB”
Gareth Millward – gareth.millward@lshtm.ac.uk – Centre for History in Public Health
Improving health worldwide
http://history.lshtm.ac.uk
“The best-laid schemes o’ mice an’ men…
• Original plan: to investigate the presence of information for disabled people on the UK web
• Also to assess the accessibility of that information against the W3C Web Content Accessibility Guidelines (WCAG) 1.0 (1999)
• Search for major organisations and key disability-related terms
• Run a sample through code validation tools
Pieter Bruegel the Elder, The Tower of Babel (Vienna), Google Art Project (edited), from Wikipedia
… Gang aft agley.”
• Far too much stuff!
• Search terms such as “RADAR”, “SCOPE” and “MIND” were obviously… problematic, since they are also everyday words
• No discernible pattern from code validation
• “Experience” of using screen readers impossible (for now)*
• Defining “information” or “reach” not a simple task
• Still major problems with assessing “importance” and “relevance”
* At least within the design scope of this project!
Macintosh Performa 5200, a mid-90s Apple computer. From Wikipedia.
“RNIB”
• A simple four-letter string
• Played a key role in promoting web standards in Britain
• Just over half a million “hits”: a significant number compared with other disability organisations
RNIB logo © RNIB – RNIB.org.uk
Large number of instances relative to peers…
Search term                                              Instances
RNIB                                                       516,165
MENCAP                                                     218,439
RNID                                                       217,963
"disability alliance"                                       22,421
royal association for disability and rehabilitation         16,072
BCODP                                                       12,501
UKDPC                                                        2,348
"spinal injuries association"                               45,477
"centre for independent living"                             23,185
"disability benefits consortium"                             2,205
disability                                              12,909,868
*.* (all)                                            2,023,288,655
Chart: Instances of search terms relative to *.*, 1996-2010 (RNIB, MENCAP, RNID); y-axis: instances p.a. as a percentage of the whole p.a. (0.00% to 0.05%)
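The shares on this slide are ratios of each term's count to the *.* total. Below is a minimal Python sketch using only the whole-period counts from the table. The per-year percentages in the chart would need per-year counts, which the slides do not give. The names `COUNTS`, `share_of_all` and `ratio_to` are illustrative and not part of any archive interface.

```python
# Whole-period instance counts copied from the "Search term / Instances" table.
COUNTS = {
    "RNIB": 516_165,
    "MENCAP": 218_439,
    "RNID": 217_963,
    '"disability alliance"': 22_421,
    "royal association for disability and rehabilitation": 16_072,
    "BCODP": 12_501,
    "UKDPC": 2_348,
    '"spinal injuries association"': 45_477,
    '"centre for independent living"': 23_185,
    '"disability benefits consortium"': 2_205,
    "disability": 12_909_868,
}
ALL_INSTANCES = 2_023_288_655  # the *.* (all) row


def share_of_all(count: int, total: int = ALL_INSTANCES) -> float:
    """Return a count as a percentage of every instance in the index."""
    return 100.0 * count / total


def ratio_to(term: str, baseline: str) -> float:
    """How many times more often `term` appears than `baseline`."""
    return COUNTS[term] / COUNTS[baseline]


if __name__ == "__main__":
    # List terms from most to least frequent, with their share of the whole index.
    for term, count in sorted(COUNTS.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{term:<55} {count:>12,} {share_of_all(count):.4f}%")
    print(f"RNIB vs MENCAP: {ratio_to('RNIB', 'MENCAP'):.2f}x")
```

On these figures RNIB accounts for roughly 0.026% of all instances, about 2.4 times MENCAP's count.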
… and not all self-referential
Chart: Instances per domain as percentage of total for "RNIB" (y-axis 0% to 30%)
Predominance of .org.uk
Chart: Domains of instances as percentage of total of "RNIB" (.org.uk, .co.uk, .gov.uk, .ac.uk, .nhs.uk, .parliament.uk; y-axis 0% to 60%)
The trouble begins: links
Links to              Instances
-> rnib.org.uk          259,421
-> w3.org                71,798
-> mla.gov.uk            34,435
-> openharmonise.org     32,071
-> facebook.com          31,098
• Disaggregated statistics are basically meaningless
• The second most common link is to W3.org, which had virtually nothing to do with RNIB’s actual activities
• openharmonise.org is the CMS for mla.gov.uk, so its count reflects references on the MLA site, not RNIB’s activity
The bloody Guardian…
Commensurability goes out the window…
• Once you start filtering out the areas that aren’t “really” part of your search, it becomes impossible to compare one search term with another.
• You will lose “useful” information and keep “useless” stuff
• You can begin to build a “human-readable” corpus, but what the heck do I actually have here? Certainly not what I originally intended to look at…
xkcd: Thesis Defence
Whittling down
• REMOVED links to W3.org (usually just a mention of the Web Accessibility Initiative, WAI)
• REMOVED RNIB.org.uk (I can browse the main site – more interested in external material)
• REMOVED 2009 & 2010 (made the sample smaller, and these years used a different crawling system)
• REMOVED RNIB.co.uk
• REMOVED big-print.co.uk
• REMOVED MLA.gov.uk (mentions RNIB a lot, but becomes noise)
• The result of all this? The corpus is down to 71,112
• (Actually, after narrowing the date range further and adding a couple of extra tweaks, it is now down to 39,270)
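The whittling-down steps amount to a filter over individual hits. The sketch below assumes a hypothetical `Instance` record (URL, crawl year, hosts it links to), which is not the archive's actual export format. It reads “REMOVED links to W3.org” as dropping any hit that links to w3.org. The extra date-range tweaks behind the 39,270 figure are not specified, so they are left out.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


# Hypothetical record for one search hit; the archive's real export may differ.
@dataclass
class Instance:
    url: str  # archived page that mentions "RNIB"
    year: int  # crawl year
    link_hosts: list[str] = field(default_factory=list)  # hosts the page links to


# Exclusions taken from the "Whittling down" slide.
EXCLUDED_HOSTS = {"rnib.org.uk", "rnib.co.uk", "big-print.co.uk", "mla.gov.uk"}
EXCLUDED_LINK_HOSTS = {"w3.org"}
EXCLUDED_YEARS = {2009, 2010}


def host_matches(host: str, domains: set[str]) -> bool:
    """True if host equals one of the domains or is a subdomain of one."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def keep(inst: Instance) -> bool:
    """Apply each exclusion rule in turn; keep the hit only if none fires."""
    if inst.year in EXCLUDED_YEARS:
        return False
    host = urlsplit(inst.url).hostname or ""
    if host_matches(host, EXCLUDED_HOSTS):
        return False
    if any(host_matches(h, EXCLUDED_LINK_HOSTS) for h in inst.link_hosts):
        return False
    return True


def whittle(instances: list[Instance]) -> list[Instance]:
    """Return only the hits that survive every exclusion rule."""
    return [inst for inst in instances if keep(inst)]


if __name__ == "__main__":
    sample = [
        Instance("http://www.rnib.org.uk/about", 2005),  # RNIB's own site
        Instance("http://www.example.org.uk/access", 2004, ["www.w3.org"]),  # links to W3C
        Instance("http://www.example.ac.uk/disability", 2010),  # excluded year
        Instance("http://www.example.org.uk/news", 2006, ["rnib.org.uk"]),  # external mention
    ]
    for inst in whittle(sample):
        print(inst.url)
```

Only the last sample page survives. It links to rnib.org.uk but is hosted elsewhere, which is the kind of external material the slide says is of interest.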
What did we learn today?
• Visible evidence of RNIB’s impact on UK web standards
• Sheer presence suggests RNIB was better than its peers at establishing itself on the internet
• Google has made us (well, me) lazy
• An archive without an archivist or a catalogue is highly problematic for researchers
The British Library – from Wikimedia Commons