NORSLIS PhD course in informetrics, Umeå University, Sweden, 18 June 2008
Webometrics 1.0: from AltaVista to Small Worlds and Genre Drift
Lennart Björneborn
Royal School of Library and Information Science
NORSLIS PhD course in informetrics Umeå 18.6.2008
outline
webometrics 1.0: birth of webometrics; early webometric research
two webometric studies:
- small-world link analysis, based on graph theory and social network analysis
- genre connectivity analysis
M.C. Escher: House of Stairs, 1951
WWW = largest network with available connectivity data
Wood et al. (1995)
WWW = collaborative weaving
= macro-level aggregations of micro-level interactions
= reflects social and cultural formations
= keeps track of “the complex web of relationships between people, programs, machines and ideas” (Tim Berners-Lee, 1997)
WWW
birth of webometrics
citation analogy: link = implicit recommendation of a web page (though links can also be negative references)
‘Webometrics’ 1997 + ‘Web Impact Factor’ 1998:
Almind & Ingwersen (1997). Informetric analyses on the World Wide Web: methodological approaches to ‘webometrics’.
Ingwersen (1998). The calculation of Web impact factors.
Google ‘PageRank’ 1998: exploits link structures: who receives many links from someone who also receives many links from someone who also … ?
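The recursive question above can be sketched as a power iteration. This is a minimal illustration, not Google's production algorithm: the damping factor 0.85, the iteration count and the toy link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page keeps a small baseline share (the 'random jump')
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its own rank on, split over its outlinks
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page (no outlinks): spread its rank over all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# hypothetical toy web: B is linked from A, C and D
links = {"A": ["B"], "C": ["B"], "D": ["B", "C"], "B": ["E"], "E": []}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

A page ranks high not just by counting inlinks but by receiving them from pages that are themselves highly ranked, which is the recursion the slide points at.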
birth of webometrics: access to link data*
linkdomain:norslis.net -site:norslis.net
link:www.norslis.net -site:norslis.net
(* cf. breakthrough of bibliometrics: access to citation data)
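The `linkdomain:norslis.net -site:norslis.net` query asks a search engine for pages outside a domain that link into it. Those operators ran against the engines' own indexes; the sketch below only mimics the same counting logic on a hypothetical list of crawled (source, target) links. The helper names and example URLs (other than norslis.net) are invented for illustration.

```python
from urllib.parse import urlsplit

def in_domain(url, domain):
    """True if the URL's host is `domain` or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)

def external_inlinks(link_pairs, domain):
    """(source, target) links pointing into `domain` from pages outside it."""
    return [(src, dst) for src, dst in link_pairs
            if in_domain(dst, domain) and not in_domain(src, domain)]

# hypothetical crawl output: (source page, target page)
pairs = [
    ("http://www.example.org/links.html", "http://www.norslis.net/"),
    ("http://www.example.org/links.html", "http://www.norslis.net/courses.html"),
    ("http://www.norslis.net/index.html", "http://www.norslis.net/courses.html"),  # internal link
    ("http://blog.example.com/post", "http://norslis.net/"),
    ("http://www.norslis.net/", "http://www.example.org/"),  # outlink
]
ext = external_inlinks(pairs, "norslis.net")
linking_pages = {src for src, _ in ext}
print(len(ext), "external inlinks from", len(linking_pages), "pages")
```

Search engines typically reported the number of linking pages, so the sketch keeps both the link count and the distinct-page count.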
(figure: example link graph with web pages A to H)
basic link terminology
- B has an inlink from A: ~ citation
- B has an outlink to C: ~ reference
- B has a selflink: ~ self-citation
- C and D have co-inlinks from B: ~ co-citation
- B and E have co-outlinks to D: ~ bibliographic coupling
co-links (co-inlinks and co-outlinks)
(Björneborn 2004)
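The link terminology above can be made concrete on a small edge list. The edges are hypothetical, chosen only to reproduce the statements on this slide (A→B, B→B, B→C, B→D, E→D); keeping selflinks apart from in- and outlinks is an assumption of this sketch.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical links matching the statements above: A->B, B->B, B->C, B->D, E->D
edges = [("A", "B"), ("B", "B"), ("B", "C"), ("B", "D"), ("E", "D")]

outlinks, inlinks, selflinks = defaultdict(set), defaultdict(set), set()
for src, dst in edges:
    if src == dst:
        selflinks.add(src)          # ~ self-citation; kept apart from in/outlinks here
    else:
        outlinks[src].add(dst)      # ~ reference
        inlinks[dst].add(src)       # ~ citation

def pair_counts(neighbour_sets):
    """Count, for each unordered pair of nodes, how many sets contain both."""
    counts = defaultdict(int)
    for members in neighbour_sets.values():
        for x, y in combinations(sorted(members), 2):
            counts[(x, y)] += 1
    return dict(counts)

co_inlinks = pair_counts(outlinks)   # both targets of one source ~ co-citation
co_outlinks = pair_counts(inlinks)   # both sources of one target ~ bibliographic coupling
print(co_inlinks, co_outlinks, selflinks)
```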
some proposed web metrics
Netometrics (Bossy, 1995): supplement bibliometrics and scientometrics in observing “science in action” on the Internet
Webometry (Abraham, 1996)
Internetometrics (Almind & Ingwersen, 1996)
Webometrics (Almind & Ingwersen, 1997)
Cybermetrics (journal started 1997 by Isidro Aguillo)
Web bibliometry (Chakrabarti et al., 2002)
some related web science
Web Mining (e.g., Etzioni, 1996; Kosala & Blockeel, 2000)
Web Ecology (e.g., Pitkow, 1997; Chi et al., 1998; Huberman, 2001)
Cyber Geography (e.g., Girardin, 1995)
Cyber Cartography (e.g., Dodge, 1999)
Web Graph Analysis (e.g., Kleinberg et al., 1999; Broder et al., 2000)
Web Dynamics (e.g., Levene & Poulovassilis, 2001)
Webology (journal started 2004 by Alireza Noruzi)
Web Science (Berners-Lee et al., 2006)
webometrics: the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web, drawing on bibliometric and informetric approaches
(figure: relationships between informetrics, bibliometrics, scientometrics, cybermetrics and webometrics)
(Björneborn 2004)
webometrics: four main research areas of webometric concern
- web page content analysis
- web link structure analysis
- web usage analysis (e.g., log files)
- web technology analysis (e.g., search engine performance)
web data collection: non-standardized, messy data, due to the diversified, distributed and dynamic web and its lack of metadata
primary data:
- own web crawler (beware: robot exclusion)
- direct access to web servers, incl. log files
- Internet Archive (www.archive.org)
- manual collection with a browser
secondary data:
- search engines (beware: deficiencies)
necessary data cleansing:
- mirror sites, variant names, typo domains and links (including misspellings)
- many file formats
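Two of the cautions above, robot exclusion and cleansing of variant names and mirrors, can be sketched with Python's standard library. `urllib.robotparser` is a real module; the normalization rules (lower-casing hosts, dropping a leading `www.`, default ports and fragments, an alias table for known mirrors) are assumptions that each study would need to decide for itself, and the host names are hypothetical.

```python
from urllib import robotparser
from urllib.parse import urlsplit, urlunsplit

def may_crawl(robots_lines, user_agent, url):
    """Check a URL against already-downloaded robots.txt lines."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(user_agent, url)

def normalize(url, aliases=None):
    """Map variant spellings of the same page onto one canonical URL.

    Assumed rules: lower-case scheme and host, drop a leading 'www.',
    drop default ports and #fragments, map known mirror hosts via `aliases`.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    host = (aliases or {}).get(host, host)
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port is not None and parts.port != default_port:
        host = f"{host}:{parts.port}"
    return urlunsplit((scheme, host, parts.path or "/", parts.query, ""))

robots = ["User-agent: *", "Disallow: /private/"]
print(may_crawl(robots, "webometrics-bot", "http://www.example.ac.uk/private/a.html"))
print(normalize("HTTP://WWW.Example.ac.uk:80/a.html#top"))
```

Stripping `www.` merges hosts that usually, but not always, serve the same content; treat it as a study-specific decision.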
examples of webometric analysis
- power-law distributions, e.g. of pages, outlinks, inlinks and visits per web site (Adamic & Huberman 2001)
- correlation between research indicators and inlinks, e.g. UK, Taiwan, Australia (several studies by Thelwall et al.); EU projects EICSTES and WISER
- co-inlink cluster analysis, analogous to co-citation analysis, e.g. EU universities (Polanco et al. 2001), Chinese IT companies (Vaughan & You 2005)
- longitudinal studies of web page change and permanence (e.g. Koehler 2004)
http://www.scit.wlv.ac.uk/~cm1993/mtpublications.html
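For power-law distributions such as inlinks per site, one common way to estimate the exponent is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009), α ≈ 1 + n / Σ ln(xᵢ / (x_min − ½)). This is a swapped-in technique, not necessarily the method of the studies cited above; the cutoff `xmin` and the counts are hypothetical.

```python
import math

def powerlaw_alpha(counts, xmin=1):
    """Approximate discrete MLE of the power-law exponent for counts >= xmin."""
    tail = [x for x in counts if x >= xmin]
    if not tail:
        raise ValueError("no counts >= xmin")
    # shifting xmin by 0.5 approximates the discrete case with a continuous one
    return 1.0 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)

inlinks_per_site = [1, 1, 2, 4]   # hypothetical inlink counts
print(powerlaw_alpha(inlinks_per_site))
```

In practice x_min should itself be estimated and the fit checked against alternatives such as a log-normal, since heavy-tailed web data often fits several models.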
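Correlating research indicators with inlink counts can be sketched with a rank correlation. Spearman's coefficient is an assumption here (it suits highly skewed link counts), not a claim about which measure each cited study used; the ratings and counts are hypothetical. Ties receive their average rank.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

research_rating = [3.1, 4.5, 2.2, 5.0, 3.8]   # hypothetical university scores
inlink_counts = [120, 900, 40, 1500, 300]     # hypothetical inlinks
print(spearman(research_rating, inlink_counts))
```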
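Co-inlink cluster analysis starts, like co-citation analysis, from a pairwise similarity matrix. The sketch counts shared inlinking sites for each pair of target sites and normalizes with Salton's cosine; the normalization choice and the site names are assumptions, not the procedure of the studies cited. The resulting matrix could then be passed to any clustering or mapping routine.

```python
import math
from collections import defaultdict
from itertools import combinations

def co_inlink_similarity(links):
    """links: iterable of (source_site, target_site). Returns {(x, y): cosine}."""
    inlinkers = defaultdict(set)
    for src, dst in links:
        if src != dst:                 # ignore site selflinks
            inlinkers[dst].add(src)
    sims = {}
    for x, y in combinations(sorted(inlinkers), 2):
        shared = len(inlinkers[x] & inlinkers[y])   # co-inlink count
        if shared:
            sims[(x, y)] = shared / math.sqrt(len(inlinkers[x]) * len(inlinkers[y]))
    return sims

# hypothetical site-level links: sources s1..s3, targets u1..u3
links = [("s1", "u1"), ("s1", "u2"), ("s2", "u1"), ("s2", "u2"), ("s3", "u3"), ("s3", "u1")]
sims = co_inlink_similarity(links)
print(sims)
```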
Björneborn (2004). Small-world link structures across an academic Web space: A library and information science approach. PhD Thesis. www.db.dk/LB
small-world link analysis: based on graph theory and social network analysis
graph theory: Leonhard Euler (1707–1783), Königsberg
(Wilson & Watkins 1990)
graph theory: graph = mathematical modeling of a network
directed graph: e.g. the WWW
- nodes (or vertices): A, B, C, D, E
- edges (if directed: arcs, links): AC, EB, ...
- degree: d(A) = 3
  - outdegree: d_O(A) = 2; indegree: d_I(A) = 1
- directed walk: ACB; path length = 2
- geodesic distance: shortest path between 2 nodes
- centrality:
  - global centrality: least sum of geodesic distances
  - betweenness centrality: most shortest paths pass through the node
(figure: directed graph with nodes A to E)
Gross & Yellen (1999). Graph theory and its applications.
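The measures listed above can be computed with breadth-first search. The edge set is hypothetical, picked only so that d(A) = 3, d_O(A) = 2, d_I(A) = 1 and the walk A→C→B has length 2; it is not the graph on the slide. Global centrality is taken here as the sum of outgoing geodesic distances (an assumption for directed graphs), and betweenness uses Brandes' algorithm.

```python
from collections import defaultdict, deque

# Hypothetical directed graph consistent with the degrees quoted above
EDGES = [("A", "C"), ("A", "D"), ("C", "B"), ("E", "B"), ("E", "A"), ("D", "E"), ("B", "D")]

def build(edges):
    succ, nodes = defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        nodes.update((u, v))
    return sorted(nodes), succ

def degrees(edges, node):
    """(degree, outdegree, indegree) of one node."""
    out_d = sum(1 for u, _ in edges if u == node)
    in_d = sum(1 for _, v in edges if v == node)
    return out_d + in_d, out_d, in_d

def geodesic(succ, source):
    """Shortest directed path length from source to every reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_sums(nodes, succ):
    """Sum of geodesic distances to reachable nodes; lowest = most central."""
    return {n: sum(geodesic(succ, n).values()) for n in nodes}

def betweenness(nodes, succ):
    """Brandes' algorithm for unweighted directed graphs (unnormalized)."""
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack, preds = [], defaultdict(list)
        sigma = defaultdict(float)
        sigma[s] = 1.0                       # number of shortest paths from s
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        while stack:                         # accumulate in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

nodes, succ = build(EDGES)
print(degrees(EDGES, "A"), distance_sums(nodes, succ), betweenness(nodes, succ))
```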
graph theory applications
graph theory is used for mathematical modeling of networks in, e.g., biology, chemistry, physics, sociology, psychology and technology
also applied in the information sciences, incl.:
- bibliometrics: citation networks (e.g., Garner, 1967; Doreian & Fararo, 1985; Hummon & Doreian, 1989; Shepherd, Watters & Cai, 1990; Egghe & Rousseau, 1990; Fang & Rousseau, 2001; Egghe & Rousseau, 2002; 2003a; 2003b)
- information systems (e.g., Korfhage, Bhat & Nance, 1972)
- hypertextual networks (e.g., Botafogo & Shneiderman, 1991; Smeaton, 1995; Furner, Ellis & Willett, 1996)
social network analysis
relations between actors in a social network
sociometry: 1930s (Moreno), sociograms
social networks: 1950s, social network analysis
makes use of mathematical graph theory
Wasserman & Faust (1994). Social network analysis: methods and applications. Cambridge University Press.
Otte & Rousseau (2002). Social network analysis: a powerful strategy, also for the information sciences. Journal of Information Science, 28(6): 441-454
small-world networks
small-world = highly clustered + short paths: short distances through shortcuts between clusters in the network
small-world = short local + short global distances: efficient diffusion of signals, contacts, ideas, viruses, etc. in networks
social network analysis in the 1960s: ‘six degrees of separation’
today: ‘small worlds’ in biological, chemical, technical and social networks: brains, epidemics, scientific collaboration, semantic networks, etc.
(Watts & Strogatz 1998)
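The "highly clustered + short paths" idea can be demonstrated deterministically: a ring lattice has high clustering but long paths, and a single shortcut shortens paths while barely lowering clustering. Watts and Strogatz used random rewiring; the single fixed shortcut and the sizes n = 20, k = 4 are simplifying assumptions for this sketch.

```python
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Undirected ring: each node linked to its k/2 nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def clustering(adj):
    """Average local clustering coefficient."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += linked / (d * (d - 1) / 2)
    return total / len(adj)

def avg_path_length(adj):
    """Mean geodesic distance over ordered pairs (graph assumed connected)."""
    total = pairs = 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(20, 4)
shortcut = {v: set(nbrs) for v, nbrs in lattice.items()}
shortcut[0].add(10)        # one hypothetical cross-cluster shortcut
shortcut[10].add(0)
print(clustering(lattice), avg_path_length(lattice))
print(clustering(shortcut), avg_path_length(shortcut))
```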
most links connect similar topics: topical clusters
small-world web: cross-topic shortcuts
main research question: what types of web links, web pages and web sites function as cross-topic connectors in small-world link structures across an academic web space?
objective: identify micro-level aspects of how small-world phenomena emerge
small-world link analysis
UK link data 2001:
- 109 UK universities (web crawler: Thelwall)
- 7669 subsites (www.hum.port.ac.uk, www.atm.ox.ac.uk ...): departments, centres, research groups, etc.
- connections between the 7669 subsites: 207 865 links; 105 817 web pages
- 1893 SCC: Strongly Connected Component
- 626 IN: traversable to SCC
- 2660 OUT: reachable from SCC
- 96 IN-Tendrils: connected from IN
- 55 OUT-Tendrils: connected to OUT
- 7 Tube: connecting IN to OUT
- 2332 Disconnected
(total: 7669 subsites)
(Björneborn 2004)
‘corona’ graph model
reachability structures
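The corona components can be derived from a directed graph by reachability. This is one plausible operationalization, assumed for the sketch: SCC = largest strongly connected component; IN/OUT = nodes reaching / reached from it; Tube = nodes reached from IN that reach OUT; tendrils = the remaining nodes attached to IN or OUT; everything else counts as Disconnected. The toy edge list is hypothetical.

```python
from collections import deque

def reach(starts, adj):
    """All nodes reachable from `starts` (inclusive) along adj."""
    seen, queue = set(starts), deque(starts)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def strongly_connected(nodes, succ, pred):
    """Kosaraju's algorithm with an explicit stack (no recursion)."""
    order, seen = [], set()
    for n in nodes:
        if n in seen:
            continue
        seen.add(n)
        stack = [(n, iter(succ[n]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:                      # all successors done: record finish order
                stack.pop()
                order.append(v)
    comps, assigned = [], set()
    for n in reversed(order):          # second pass on reversed links
        if n in assigned:
            continue
        comp, todo = {n}, [n]
        assigned.add(n)
        while todo:
            v = todo.pop()
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    comp.add(w)
                    todo.append(w)
        comps.append(comp)
    return comps

def corona(edges, nodes=()):
    all_nodes = sorted(set(nodes) | {n for edge in edges for n in edge})
    succ = {n: [] for n in all_nodes}
    pred = {n: [] for n in all_nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    core = max(strongly_connected(all_nodes, succ, pred), key=len)
    in_ = reach(core, pred) - core
    out = reach(core, succ) - core
    rest = core | in_ | out
    from_in = reach(in_, succ) - rest
    to_out = reach(out, pred) - rest
    tube = from_in & to_out
    return {"SCC": core, "IN": in_, "OUT": out, "Tube": tube,
            "IN-Tendrils": from_in - tube, "OUT-Tendrils": to_out - tube,
            "Disconnected": set(all_nodes) - rest - from_in - to_out}

edges = [("s1", "s2"), ("s2", "s1"), ("i1", "s1"), ("s2", "o1"), ("i1", "t1"),
         ("t1", "o1"), ("i1", "x1"), ("y1", "o1"), ("z1", "z2")]
parts = corona(edges, nodes=["iso"])   # 'iso': a subsite with no links at all
print({name: sorted(members) for name, members in parts.items()})
```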
10 seed nodes (stratified sampling in SCC component):
- hum.port.ac.uk: Faculty of Humanities and Social Sciences, Portsmouth
- atm.ox.ac.uk: Atmospheric, Oceanic and Planetary Physics, Oxford
- economics.soton.ac.uk: Economics Dept, Southampton
- chem.gla.ac.uk: Chemistry Dept, Glasgow
- psy.man.ac.uk: Psychology Dept, Manchester
- maths.gcal.ac.uk: Mathematics Dept, Glasgow Caledonian
- speech.essex.ac.uk: Speech Research Group, Linguistics Dept, Essex
- palaeo.gly.bris.ac.uk: Palaeontology Research Group, Earth Sciences Dept, Bristol
- geog.plym.ac.uk: Geography Dept, Plymouth
- eye.ox.ac.uk: Ophthalmology Dept [eye research], Oxford
10 path nets with all shortest link paths between five pairs of topically dissimilar subsites
(figure: path net between topically dissimilar subsites in .ac.uk and .uk, incl. cfd.me.umist.ac.uk, ercoftac.mech.surrey.ac.uk, cajun.cs.nott.ac.uk, ukoln.bath.ac.uk, cs.man.ac.uk, ashmol.ox.ac.uk, collections.ucl.ac.uk and vlmp.museophile.sbu.ac.uk; labels: shortest link path, transversal link)
path net = ‘mini’ small world
path net = all shortest link paths between two given nodes (subsites)
network analysis tool = Pajek (adjacency matrix)
(Björneborn 2006)
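A path net can be extracted with two breadth-first searches: an arc (u, v) lies on some shortest path from s to t exactly when dist(s, u) + 1 + dist(v, t) = dist(s, t). The study used Pajek; this plain-Python version is only an illustration, and the node names are hypothetical stand-ins for subsites.

```python
from collections import defaultdict, deque

def bfs(adj, start):
    dist, queue = {start: 0}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_net(edges, source, target):
    """Arcs lying on at least one shortest directed path from source to target."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    from_s = bfs(succ, source)
    to_t = bfs(pred, target)          # distances *to* target, via reversed links
    if target not in from_s:
        return set()
    d = from_s[target]
    return {(u, v) for u, v in edges
            if u in from_s and v in to_t and from_s[u] + 1 + to_t[v] == d}

def adjacency_matrix(arcs):
    """0/1 matrix with rows = link sources, columns = link targets."""
    nodes = sorted({n for arc in arcs for n in arc})
    index = {n: i for i, n in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for u, v in arcs:
        m[index[u]][index[v]] = 1
    return nodes, m

edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("s", "c"), ("c", "d"), ("d", "t")]
arcs = path_net(edges, "s", "t")      # the longer route s->c->d->t is excluded
nodes, m = adjacency_matrix(arcs)
print(nodes, m)
```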
some indicative findings
findings not generalizable: small, stratified sample
however: indicative findings may suggest
computer-science sites = academic cross-topic connectors
personal link creators = web cohesion ‘glue’ – especially link lists
researchers, PhD students, etc. are important providers of site outlinks
and important receivers of site inlinks
over 80% of cross-topic links are academic (research, teaching)
small-world web implications: small local threads in the shape of users’ links affect how cohesive the global web is and how it may be traversed, like ‘the strength of weak ties’ (Granovetter 1973):
- knowledge diffusion and social cohesion across social groups
- counteract ‘balkanization’: disconnected / unreachable subpopulations
- reachability structures: essential for web crawler harvests
webometric study: genre connectivity
what role do web page genres play for cohesion and reachability on the Web? [one of the first studies]
what types of web page genres function as link providers and link receivers between university web sites?
source pages and target pages in the 10 path nets: 281 source pages, 249 target pages, 352 links
genre connectivity analysis: meta genres; genre pairs; web of genres
(figure: genre network graph extracted with Pajek software © Björneborn)
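Once each source and target page has a genre label, genre pairs and the weighted arcs of a "web of genres" reduce to counting. The genre labels below are hypothetical placeholders, not the genre classification used in the study, and the counting replaces (rather than reproduces) the Pajek workflow.

```python
from collections import Counter

# Hypothetical genre labels for the source and target page of each link
links = [
    ("personal link list", "personal homepage"),
    ("personal link list", "research group page"),
    ("institutional homepage", "personal homepage"),
    ("personal link list", "personal homepage"),
]

def genre_pairs(links):
    """Weighted arcs of a web of genres: (source genre, target genre) -> links."""
    return Counter(links)

def provider_receiver_counts(links):
    """How often each genre provides (is the source of) and receives links."""
    providers = Counter(src for src, _ in links)
    receivers = Counter(dst for _, dst in links)
    return providers, receivers

pairs = genre_pairs(links)
providers, receivers = provider_receiver_counts(links)
print(pairs.most_common(), providers.most_common(1), receivers.most_common(1))
```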
genre connectivity: academic web spaces = rich diversity of interlinked genres = diversified link motivations
personal link creators are important web cohesion builders:
- personal link lists provide site outlinks
- personal homepages receive site inlinks
genre connectivity affects web cohesion and reachability through genre drift and topic drift
genre drift + topic drift
- topic clusters with genre diversity + genres with topical diversity
- changes in page genres and page topics along link paths: genre drift within clusters + topic drift between clusters
short link distances (small world)
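Genre drift and topic drift along a link path can be sketched as counts of changes between consecutive pages. The sketch assumes each page on the path already carries a (genre, topic) label; the labels are hypothetical.

```python
def drift(path):
    """Count genre changes and topic changes between consecutive pages.

    path: list of (genre, topic) tuples, one per page, in link order.
    """
    steps = list(zip(path, path[1:]))
    genre_changes = sum(1 for a, b in steps if a[0] != b[0])
    topic_changes = sum(1 for a, b in steps if a[1] != b[1])
    return genre_changes, topic_changes

# hypothetical link path: genre drifts at every step, topic drifts once
path = [
    ("institutional homepage", "chemistry"),
    ("personal link list", "chemistry"),
    ("personal homepage", "computer science"),
    ("research group page", "computer science"),
]
print(drift(path))
```

On such a path, genre changes while the topic stays put correspond to drift within a topical cluster, and the single topic change marks a crossing between clusters.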
questions?
read more:
Björneborn (2004). Small-world link structures across an academic Web space: A library and information science approach. PhD dissertation. www.db.dk/LB
Björneborn (2006). ‘Mini small worlds’ of shortest link paths crossing domain boundaries in an academic Web space. Scientometrics, 68(3): 395-414.
Björneborn (forthcoming). Genre connectivity and genre drift in a web of genres. In: Mehler et al. (Eds.), Genres on the Web: Corpus Studies and Computational Models.