Methods for Exploiting Academic Hyperlinks
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
The Problem
To map patterns of communication between researchers in a country, based upon university Web sites
Patterns of communication are also mapped based upon journal citations or journal title words
  Provides useful information about the structure and evolution of research fields
  Can identify previously unknown field connections
Web analysis could illustrate wider and more current patterns
Data collection
Web crawler
AltaVista advanced queries, e.g. host:wlv.ac.uk AND link:gla.ac.uk
AllTheWeb advanced queries
Google
  Does not support the same level of Boolean querying
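
The AltaVista query form shown on this slide extends naturally to every ordered pair of university sites. The following is a minimal sketch that only builds the query strings, assuming nothing beyond the host: and link: operators in the slide's example; the function name and the list of domains are illustrative, and no search engine is contacted.

```python
from itertools import permutations

def build_link_queries(domains):
    """Return {(source, target): query} for every ordered pair of distinct domains."""
    return {
        (source, target): f"host:{source} AND link:{target}"
        for source, target in permutations(domains, 2)
    }

# Illustrative site list taken from the slide's own example query.
queries = build_link_queries(["wlv.ac.uk", "gla.ac.uk"])
for pair, query in queries.items():
    print(pair, query)
```

Each query asks for pages hosted on the source site that link to the target site, so the number of results gives one direct link count per ordered pair.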
Types of link count
Direct link counts
  Inter-site links only
Co-inlink counts
  B and C are co-inlinked
Co-outlink counts
  D and E are co-outlinked
[Diagram: link graph of nodes A, B, C, D, E and F]
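
The three link counts can be illustrated with a short sketch. It assumes the usual definitions: two sites are co-inlinked when a third site links to both, and co-outlinked when both link to the same third site. The edge list is hypothetical, since the arrows of the original diagram are not recoverable from this text; it is chosen only so that B and C come out co-inlinked and D and E co-outlinked.

```python
from collections import defaultdict
from itertools import combinations

def colink_counts(links):
    """Count co-inlinks and co-outlinks for every unordered pair of sites.

    links: iterable of (source, target) inter-site links.
    """
    outlinks = defaultdict(set)  # site -> sites it links to
    inlinks = defaultdict(set)   # site -> sites that link to it
    for source, target in links:
        if source != target:     # inter-site links only
            outlinks[source].add(target)
            inlinks[target].add(source)

    co_in = defaultdict(int)
    for targets in outlinks.values():
        # Every pair of targets sharing this source gains one co-inlink.
        for pair in combinations(sorted(targets), 2):
            co_in[pair] += 1

    co_out = defaultdict(int)
    for sources in inlinks.values():
        # Every pair of sources sharing this target gains one co-outlink.
        for pair in combinations(sorted(sources), 2):
            co_out[pair] += 1

    return dict(co_in), dict(co_out)

# Hypothetical edges: A links to B and C; D and E both link to F.
example_links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]
co_in, co_out = colink_counts(example_links)
print(co_in)   # {('B', 'C'): 1}
print(co_out)  # {('D', 'E'): 1}
```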
Alternative Document Models
Domain ADM
  Counts links between domains (ignoring multiple links) instead of pages
[Diagram: pages P1 to P6 on the domains www.scit.wlv.ac.uk and www.dcs.gla.ac.uk]
Alternative Document Models
Directory ADM
  Counts links between directories
  Estimated using URL slashes
University ADM
  Counts links between entire university Web sites
  Too extreme for most purposes
ADMs reduce the impact of replicated links
  E.g. a subsite of 1000 pages linking to another university home page in its navigation bar
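
The effect of the ADMs on replicated links can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the function names are hypothetical, the directory is estimated by cutting the URL at its last slash, the university is approximated by the last three host labels (which suits names such as wlv.ac.uk but not every country), and the URLs are invented to mirror the navigation bar example.

```python
from urllib.parse import urlsplit

def adm_unit(url, model):
    """Map a page URL to the document it belongs to under the given model."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    if model == "page":
        return host + path
    if model == "directory":
        # Directory estimated from URL slashes: keep everything up to the last "/".
        return host + path[: path.rfind("/") + 1]
    if model == "domain":
        return host
    if model == "university":
        # Assumption: the last three labels identify the university (e.g. wlv.ac.uk).
        return ".".join(host.split(".")[-3:])
    raise ValueError(f"unknown model: {model}")

def count_links(page_links, model):
    """Count distinct (source, target) document pairs, ignoring multiple links."""
    pairs = set()
    for source_url, target_url in page_links:
        source, target = adm_unit(source_url, model), adm_unit(target_url, model)
        if source != target:  # skip links that stay inside one document
            pairs.add((source, target))
    return len(pairs)

# Hypothetical subsite: 1000 pages whose navigation bar links to another home page.
nav_links = [
    (f"http://www.scit.wlv.ac.uk/sub/page{i}.html", "http://www.gla.ac.uk/")
    for i in range(1000)
]
for model in ("page", "directory", "domain", "university"):
    print(model, count_links(nav_links, model))
```

Under the page model the navigation bar contributes 1000 links, while each of the three ADMs collapses it to a single link.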
Citation-Style Hyperlink Analysis
Citation counts are known to be reasonable indicators of research quality, but is the same true for inlink counts?
  Counts of links to universities within a country can correlate significantly with measures of research productivity
  The significance of this result is in giving ‘permission’ to investigate the use of inter-university links for researching scholarly communication
Most links are only loosely related to research
  90% of links between UK university sites have some connection with scholarly activity, including teaching and research
  But less than 1% are equivalent to citations
So link counts do not measure research dissemination but are more a natural by-product of scholarly activity
  Cannot use link counts to assess research
  Can use link counts to track an aspect of communication
Links to UK universities against their research productivity
The reason for the strong correlation is the quantity of Web publication, not its quality
This is different to citation analysis
Language is a factor in international interlinking
English is the dominant language for Web sites in the Western EU
In a typical country, 50% of pages are in the national language(s) and 50% in English
Non-English-speaking countries extensively interlink in English
{Research with Rong Tang & Liz Price}
Can map patterns of international communication
Counts of links between EU universities in Swedish are represented by arrow thickness.
Linking patterns vary enormously by discipline
No evidence of a significant geographic trend
Disciplinary differences in the extent of interlinking: e.g., Web use in history is very low, in chemistry very high
Individual research projects can have an enormous impact upon individual departments
  E.g. Arts web sites are often for specific exhibitions or for digital media projects
Links are not frequent enough to reliably reveal patterns of interdisciplinarity
Background: Power laws in Academic Webs
Academic Webs have a topology dominated by power laws, including
  Counts of links to pages (inlink counts)
  Counts of links from pages (outlink counts)
  Groups of interconnected pages
    Directed component sizes
    Undirected component sizes
Power laws mean that clustering connected components will not yield useful results
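
As an illustration of how an undirected component size distribution could be tabulated, the sketch below groups pages with a union-find structure and then counts how many components have each size. The function name and the toy link list are hypothetical. In a web dominated by power laws, such a table would typically show a few very large components alongside many tiny ones, which is one way to read the point that components make poor clusters.

```python
from collections import Counter

def undirected_component_sizes(links):
    """Return a Counter mapping each component's root page to its size."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in links:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two components

    return Counter(find(node) for node in list(parent))

# Hypothetical page-level links, with direction ignored.
toy_links = [("p1", "p2"), ("p2", "p3"), ("p4", "p5"), ("p6", "p7"), ("p8", "p8")]
sizes = undirected_component_sizes(toy_links)
size_frequency = Counter(sizes.values())  # component size -> number of components
print(sorted(size_frequency.items()))     # [(1, 1), (2, 2), (3, 1)]
```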
Community Identification Algorithm
Can apply to page, directory and domain models
Gives complementary results: a “layered approach”
[Figure: log-log plot of frequency against community size, directory model, k = 32]
[Figure: log-log plot of frequency against community size, page model, k = 32]
Stretching links further: co-inlinks, co-outlinks
For the UK academic Web, about 42% of pairs of domains connected by links alone host similar disciplines, and about 43% of those connected by links, co-inlinks and co-outlinks
But over 100 times more domains are colinked or coupled than are directly linked
Links in any form are less than 50% reliable as indicators of subject similarity