Utilising Webometric Data from Online Digitised Newspaper Collections
Paul Gooding
UCL Centre for Digital Humanities
The Context for Large-Scale Digitisation
Digitised Newspaper Collections: Primary Source and Research topic…
[Figure: Citations of British Library Nineteenth Century Newspapers (launch to 2012). Series: BNCN used as research tool; BNCN as a collection.]
From Gooding, P. (2014) “Search All About it”: A Mixed Methods Case Study into the Impact of Large-Scale Newspaper Digitisation.
(Thesis, not yet published)
Web Analytics: Google Analytics
• Web Analytics = “The measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage.” (http://www.digitalanalyticsassociation.org/Files/PDF_standards/WebAnalyticsDefinitions.pdf)
• Google Analytics is the leading analytics platform, and it’s great!
• Unobtrusive;
• Easy to implement;
• Rich data source.
• But it does pose a couple of problems…
Google Analytics: A Couple of Flaws…
• (http://dilbert.com/fast/2008-05-08/)
Web Log Analysis for Welsh Newspapers Online
• 3 types of server queries (in this case):
• “Search queries” – users undertake search on the collection;
• “Browser queries” – users use browse or filter functions;
• “Content queries” – users view digitised newspaper content.
• Results cover period from 12th March 2013 to 30th June 2013.
• Investigating a longer period would increase the significance of the results…
Content Log Analysis: Welsh Newspapers Online
• Server logs look like this (except for the colours…):
• 2013-06-02T12:26:50+01:00 51a5c97c3c8d3 llgc-id:3036868 llgc-id:3039814 llgc-id:3037695 Aberystwyth Observer 21 September 1872 [2] ART40
• And they tell us the following information:
• Time and date of interaction; unique user ID; server identification; newspaper title; edition date; [page number]; article number.
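As an illustration of how a line like the one above could be split into those fields, here is a minimal Python sketch. The regular expression is inferred from the single sample line on this slide, so the field layout, the function name `parse_content_log_line` and the dictionary keys are all assumptions rather than the project's actual tooling. It also assumes month names are written in English.

```python
import re
from datetime import datetime

# Pattern inferred from the one sample log line shown on the slide (an assumption):
# timestamp, user ID, one or more llgc-id tokens, title, edition date, [page], article.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<user_id>\S+)\s+"
    r"(?P<ids>(?:llgc-id:\d+\s+)+)"
    r"(?P<title>.+?)\s+"
    r"(?P<edition_date>\d{1,2} [A-Za-z]+ \d{4})\s+"
    r"\[(?P<page>\d+)\]\s+"
    r"(?P<article>\S+)$"
)


def parse_content_log_line(line):
    """Return a dict of fields for one content-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.fromisoformat(match["timestamp"]),
        "user_id": match["user_id"],
        "server_ids": match["ids"].split(),
        "title": match["title"],
        "edition_date": datetime.strptime(match["edition_date"], "%d %B %Y").date(),
        "page": int(match["page"]),
        "article": match["article"],
    }


sample = (
    "2013-06-02T12:26:50+01:00 51a5c97c3c8d3 llgc-id:3036868 llgc-id:3039814 "
    "llgc-id:3037695 Aberystwyth Observer 21 September 1872 [2] ART40"
)
record = parse_content_log_line(sample)
```

Lines that do not fit the inferred layout return `None`, so they can be counted and inspected rather than silently mis-parsed.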
Users viewed content from the 1840s more than any other decade
[Figure: Most Viewed Decades in WNO, compared to total pages per decade. Vertical axis: Popularity (0.00% to 9.00%); horizontal axis: decades from 1804-1809 to 1910-1919.]
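A decade breakdown of this kind could be derived from the edition dates in the content logs roughly as follows. This is a hedged sketch only: the function names `decade_label` and `decade_shares` are hypothetical, and it covers just the share of views per decade, not the comparison against total pages per decade. It also labels decades by calendar boundaries, so the first bucket would read 1800-1809 rather than the chart's 1804-1809.

```python
from collections import Counter


def decade_label(year):
    """Map an edition year to a calendar-decade label, e.g. 1872 -> '1870-1879'."""
    start = (year // 10) * 10
    return f"{start}-{start + 9}"


def decade_shares(edition_years):
    """Return each decade's share of all content views, as fractions that sum to 1."""
    counts = Counter(decade_label(year) for year in edition_years)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Illustrative input only: edition years of four viewed items (not real data).
shares = decade_shares([1841, 1845, 1872, 1904])
```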
They searched for personal names, place names and topics relevant to Wales
And they engaged heavily with newspaper content
[Figure: Percentage of Queries by Type. Axis labels: Percentage of Users (0% to 70%); Pageview number (0 to 160). Series: Search %, Browser %, Content %.]
“But when people are past a certain age, you sort of stop asking them why they do things. It feels dangerous. What if you say So, Mr Penumbra, why do you want to know about Mr Tyndall's coat buttons? And he pauses, and scratches his chin, and there's an uncomfortable silence-- and we both realize he can't remember?”
Robin Sloan, Mr. Penumbra’s 24 Hour Bookstore.
The Qualitative Context
Thanks for listening!
Any Questions?