Bibliometric network analysis: Software tools, techniques, and an analysis of network science at Leiden University

Ludo Waltman and Nees Jan van Eck

Centre for Science and Technology Studies (CWTS), Leiden University

LCN2 Seminar

Leiden, November 27, 2015

Centre for Science and Technology Studies (CWTS)

• Research center at Leiden University focusing on science and technology studies
• About 30 staff members
• History of more than 25 years in bibliometric and scientometric research
• Contract research
• Full access to large bibliographic databases (Web of Science and Scopus)

Bibliographic databases: ‘Big data’

                Web of Science    Scopus
Journals        12,000            20,000
Publications    45 million        35 million
Citations       1 billion         0.9 billion

Bibliometric networks

Networks derived from a bibliographic database (Web of Science, Scopus):

• Citation network of publications
• Co-authorship network of authors / organizations
• Co-citation network of publications / authors / journals
• Co-occurrence network of terms
• Bibliographic coupling network of publications / authors / journals
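As a rough illustration of how two of these networks relate to raw citation data, the following sketch derives bibliographic coupling and co-citation counts from reference lists. The toy `references` data and both function names are hypothetical and are not taken from VOSviewer or CitNetExplorer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each publication maps to the publications it cites.
references = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def bibliographic_coupling(references):
    """Count the shared references for every pair of citing publications."""
    coupling = Counter()
    for p, q in combinations(sorted(references), 2):
        shared = set(references[p]) & set(references[q])
        if shared:
            coupling[(p, q)] = len(shared)
    return coupling


def co_citation(references):
    """Count how often every pair of publications is cited together."""
    cocited = Counter()
    for cited in references.values():
        for p, q in combinations(sorted(set(cited)), 2):
            cocited[(p, q)] += 1
    return cocited


coupling = bibliographic_coupling(references)  # A and B share references C and D
cocitation = co_citation(references)           # C and D are cited together by A and B
```

Bibliographic coupling links the citing publications, whereas co-citation links the cited ones, so the two networks are built from the same data but have different nodes.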

Outline

• Software tools

• Network analysis techniques

• Analysis of network science

Software tools

• VOSviewer (www.vosviewer.com)
  – Tool for constructing and visualizing bibliometric networks
• CitNetExplorer (www.citnetexplorer.nl)
  – Tool for visualizing and analyzing citation networks of publications

VOSviewer

Map of university co-authorship network

Map of journal citation network

CitNetExplorer

VOSviewer:

• Any type of bibliometric network
• Co-authorship, co-citation, and bibliographic coupling
• Time dimension is ignored
• Networks of at most ~10,000 nodes are supported

CitNetExplorer:

• Only citation networks of publications
• Direct citation relations
• Time dimension is explicitly considered
• Millions of publications are supported

Network analysis techniques

Layout:

• Visualization of similarities (VOS)

Community detection:

• Weighted modularity
• Smart local moving algorithm

Clustering can be seen as mapping in a restricted space

Unified approach to mapping and clustering

Minimize

$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m}{k_i k_j} A_{ij} d_{ij}^2 - \sum_{i<j} d_{ij}$$

where

$n$: number of nodes in the network
$m$: total weight of all edges in the network
$A_{ij}$: weight of edge between nodes $i$ and $j$
$k_i$: total weight of all edges of node $i$

Mapping

$x_i$: vector denoting the location of node $i$ in a $p$-dimensional space

$$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

Clustering

$x_i$: integer denoting the community to which node $i$ belongs
$\gamma$: resolution parameter

$$d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases}$$
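The following minimal sketch evaluates the objective Q for both choices of the distance. It assumes a symmetric weight matrix with a zero diagonal and no isolated nodes, and it assumes the clustering distance is 1/γ for nodes in different communities. The function names and the toy network are hypothetical.

```python
import math


def unified_objective(A, d):
    """Q = sum_{i<j} (2m / (k_i k_j)) * A_ij * d_ij^2 - sum_{i<j} d_ij."""
    n = len(A)
    k = [sum(row) for row in A]  # k_i: total edge weight of node i
    m = sum(k) / 2               # m: total edge weight (A is symmetric)
    q = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dij = d(i, j)
            q += (2 * m / (k[i] * k[j])) * A[i][j] * dij ** 2 - dij
    return q


def mapping_distance(positions):
    """Mapping: d_ij is the Euclidean distance between node locations."""
    return lambda i, j: math.dist(positions[i], positions[j])


def clustering_distance(communities, gamma=1.0):
    """Clustering: d_ij is 0 within a community and 1/gamma otherwise (assumed)."""
    return lambda i, j: 0.0 if communities[i] == communities[j] else 1.0 / gamma


# Hypothetical toy network: a path 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

q_two = unified_objective(A, clustering_distance([0, 0, 1, 1]))     # -2.5
q_one = unified_objective(A, clustering_distance([0, 0, 0, 0]))     # 0.0
q_single = unified_objective(A, clustering_distance([0, 1, 2, 3]))  # 1.5
q_line = unified_objective(A, mapping_distance([(0.0,), (1.0,), (2.0,), (3.0,)]))
```

Because Q is minimized, the split into {0, 1} and {2, 3} is preferred over both a single community and four singletons in this toy example.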

Unified approach: Clustering

Equivalent to a weighted variant of modularity-based community detection (Waltman et al., 2010)

Maximize

$$\hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j) \, w_{ij} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right)$$

where $\delta(x_i, x_j)$ equals 1 if $x_i = x_j$ and 0 otherwise, and

$$w_{ij} = \frac{2m}{k_i k_j}$$
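The equivalence can be checked by direct substitution. The sketch below assumes that the clustering distance equals 1/γ for nodes in different communities, so that $d_{ij} = (1 - \delta(x_i, x_j))/\gamma$, and it uses $w_{ij} = 2m/(k_i k_j)$ together with $(1 - \delta)^2 = 1 - \delta$.

```latex
\begin{align*}
Q &= \sum_{i<j} w_{ij} A_{ij} \, \frac{1 - \delta(x_i, x_j)}{\gamma^2}
     - \sum_{i<j} \frac{1 - \delta(x_i, x_j)}{\gamma} \\
  &= C - \frac{1}{\gamma^2} \sum_{i<j} \delta(x_i, x_j)
     \left( w_{ij} A_{ij} - \gamma \right) \\
  &= C - \frac{2m}{\gamma^2} \cdot \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j) \, w_{ij}
     \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right)
   = C - \frac{2m}{\gamma^2} \, \hat{Q},
\qquad
C = \sum_{i<j} \left( \frac{w_{ij} A_{ij}}{\gamma^2} - \frac{1}{\gamma} \right)
\end{align*}
```

Since $C$ does not depend on the community assignment and $2m/\gamma^2 > 0$, minimizing $Q$ over community assignments is the same as maximizing $\hat{Q}$.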

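Unified approach: Mapping

• Equivalent to the VOS (visualization of similarities) technique (Van Eck & Waltman, 2007)
• Limit case of multidimensional scaling (Van Eck et al., 2010)

VOS: minimize

$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m}{k_i k_j} A_{ij} \lVert x_i - x_j \rVert^2 - \sum_{i<j} \lVert x_i - x_j \rVert$$

MDS: minimize

$$\sum_{i<j} W_{ij} \left( \lVert x_i - x_j \rVert - D_{ij} \right)^2, \qquad W_{ij} = \frac{2m}{k_i k_j} A_{ij}, \qquad D_{ij} \propto \frac{k_i k_j}{m A_{ij}}$$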
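To see why a weighted MDS stress of this kind leads back to the VOS objective, the square can be expanded. The sketch below assumes weights $W_{ij} = 2m A_{ij} / (k_i k_j) > 0$ and target distances $D_{ij} = c / W_{ij}$ for some constant $c > 0$; the value of $c$ is an assumption that is deliberately left open.

```latex
\begin{align*}
\sum_{i<j} W_{ij} \left( \lVert x_i - x_j \rVert - D_{ij} \right)^2
  &= \sum_{i<j} W_{ij} \lVert x_i - x_j \rVert^2
     - 2c \sum_{i<j} \lVert x_i - x_j \rVert
     + c^2 \sum_{i<j} \frac{1}{W_{ij}}
\end{align*}
```

The last term does not depend on the node locations. Substituting $x_i = 2c \, y_i$ turns the first two terms into $4c^2 \left( \sum_{i<j} W_{ij} \lVert y_i - y_j \rVert^2 - \sum_{i<j} \lVert y_i - y_j \rVert \right)$, which is the VOS objective up to a positive factor, so the two solutions differ only by a uniform rescaling. Pairs with $A_{ij} = 0$ can be handled by letting $W_{ij}$ tend to 0 while $D_{ij} = c / W_{ij}$ grows without bound, which leaves the cross term unchanged. This is one possible reading of "limit case" and is offered as an interpretation, not as the authors' derivation.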

Unified approach

Commonly used clustering technique (modularity) and commonly used mapping technique (MDS) can be brought together in a unified framework

[Diagram: the unified approach covers modularity (weighted), VOS, and MDS (limit case)]

Louvain algorithm

• ‘Louvain algorithm’ (Blondel et al., 2008) is the most popular heuristic algorithm for large-scale modularity optimization

[Figure: illustration of the Louvain algorithm. The local moving heuristic is applied to the original network and then to the reduced network; the modularity values shown are Q = 0.3791 and Q = 0.4151.]
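The local moving heuristic named in the figure can be sketched as follows. This is a simplified illustration under several assumptions: it optimizes standard modularity (not the weighted variant), visits nodes in a fixed order instead of a random one, and omits the network reduction step of the Louvain algorithm. The function names and the toy network are hypothetical.

```python
def modularity(A, community, gamma=1.0):
    """Standard modularity of a partition of a symmetric weighted network."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                total += A[i][j] - gamma * k[i] * k[j] / two_m
    return total / two_m


def local_moving(A, gamma=1.0):
    """Move nodes one by one to the neighbouring community with the largest gain."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    community = list(range(n))  # start with every node in its own community
    tot = list(k)               # total degree of each community
    improved = True
    while improved:
        improved = False
        for i in range(n):      # fixed order; an assumption made for reproducibility
            old = community[i]
            links = {}          # edge weight from node i to each neighbouring community
            for j in range(n):
                if j != i and A[i][j] > 0:
                    c = community[j]
                    links[c] = links.get(c, 0.0) + A[i][j]
            tot[old] -= k[i]    # temporarily remove node i from its community

            def gain(c):
                # Proportional to the modularity gain of placing node i in c.
                return links.get(c, 0.0) - gamma * k[i] * tot[c] / two_m

            best, best_gain = old, gain(old)
            for c in links:
                g = gain(c)
                if g > best_gain + 1e-12:  # move only on a strict improvement
                    best, best_gain = c, g
            community[i] = best
            tot[best] += k[i]
            if best != old:
                improved = True
    return community


# Hypothetical toy network: two triangles joined by the edge 2 - 3.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

partition = local_moving(two_triangles)
q = modularity(two_triangles, partition)
```

On this toy network the heuristic groups each triangle into its own community. In general it stops at a local optimum that depends on the order in which nodes are visited.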

Smart local moving algorithm

• Smart local moving algorithm extends the Louvain algorithm in two ways:
  1. Multiple algorithm iterations, with output of one iteration serving as input for the next iteration
  2. Recursive application of the local moving heuristic

[Figure: illustration of the smart local moving algorithm. Steps shown: local moving heuristic on the original network, local moving heuristic in subnetworks, and the reduced network; the modularity values shown are Q = 0.3791 and Q = 0.4198.]

Empirical comparison (large networks)

• 6 networks
• Algorithms:
  – Louvain (1 iteration)
  – Louvain (10 iterations)
  – Smart local moving (10 iterations)
• 10 algorithm runs using different random numbers

Network                        Measure  Louvain  Louvain (iterative)  Smart local moving
Amazon (0.5M / 0.9M)           Qmin     0.9257   0.9293               0.9335
                               Qmax     0.9264   0.9299               0.9338
                               t        6        9                    28
DBLP (0.4M / 1.0M)             Qmin     0.8203   0.8243               0.8357
                               Qmax     0.8227   0.8271               0.8367
                               t        7        9                    26
IMDb (0.4M / 15.0M)            Qmin     0.6976   0.6994               0.7050
                               Qmax     0.7041   0.7052               0.7077
                               t        18       26                   100
LiveJournal (4.0M / 34.7M)     Qmin     0.7441   0.7578               0.7676
                               Qmax     0.7557   0.7658               0.7720
                               t        350      566                  1,549
WoS (10.6M / 104.5M)           Qmin     0.7714   0.7851               0.7918
                               Qmax     0.7786   0.7902               0.7957
                               t        6,800    8,398                19,994
Web uk-2005 (39.5M / 783.0M)   Qmin     0.9793   0.9796               0.9801
                               Qmax     0.9795   0.9797               0.9801
                               t        11,006   11,736               17,074

Large-scale analysis of the structure of science

Algorithmic classification systems of science

• Publications (not journals) are clustered into research areas based on citation relations
• Research areas are defined at different levels of granularity and are organized hierarchically
• Clustering is performed using the smart local moving algorithm (improved Louvain algorithm; Waltman & Van Eck, 2013)

Algorithmically constructed classification system of science

• 16.2 million publications from the period 2000–2014 indexed in Web of Science
• 241.7 million citation relations
• Classification system of 3 hierarchical levels:
  – 28 broad disciplines
  – 813 fields
  – 3,822 subfields

Breakdown of scientific literature into 3,822 subfields

[Figure labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Physical sciences and engineering; Mathematics and computer science]

Publications in scientometrics subfield

Time-line map of highly cited scientometrics publications

Application: Exploring the interface between physical and medical sciences

Application: Emerging research areas in physics

[Figure labels: Particle physics; Astronomy and astrophysics; Optics; Applied physics; Atomic, molecular, and chemical physics; Condensed matter physics]

CWTS Leiden Ranking

Analyzing the structure and evolution of network science

Network science according to Wikipedia

Network science is an interdisciplinary academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology.

Networks textbook by Mark Newman

The scientific study of networks, including computer networks, social networks, and biological networks, has received an enormous amount of interest in the last few years. (...) The study of networks is broadly interdisciplinary and important developments have occurred in many fields, including mathematics, physics, computer and information sciences, biology, and the social sciences.

Journal of Complex Networks

The journal covers everything from the basic mathematical, physical and computational principles needed for studying complex networks to their applications leading to predictive models in molecular, biological, ecological, informational, engineering, social, technological and other systems.

Network Science journal

Network Science is a new journal for a new discipline - one using the network paradigm, focusing on actors and relational linkages, to inform research, methodology, and applications from many fields across the natural, social, engineering and informational sciences.

Popular network terms

• neural network
• social network
• wireless sensor network
• complex network
• wireless network
• regulatory network

Network publications

• Web of Science database
• Time period 1992–2014
• Research articles and review articles
• ‘network’ or ‘graph’ in title or abstract
• 0.7 million publications
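The selection criteria above could be applied to a list of publication records along the following lines. The record fields (`title`, `abstract`, `year`, `doc_type`) are hypothetical, and matching whole words with an optional plural "s" is an assumption; the exact query used for the analysis is not stated on the slide.

```python
import re

# Assumption: match the whole words "network(s)" or "graph(s)", case-insensitively.
NETWORK_PATTERN = re.compile(r"\b(?:networks?|graphs?)\b", re.IGNORECASE)


def is_network_publication(pub):
    """Apply the time period, document type, and term criteria to one record."""
    if not 1992 <= pub["year"] <= 2014:
        return False
    if pub["doc_type"] not in {"article", "review"}:
        return False
    text = f'{pub["title"]} {pub["abstract"]}'
    return bool(NETWORK_PATTERN.search(text))


# Hypothetical example records.
publications = [
    {"title": "Community structure in social networks", "abstract": "",
     "year": 2004, "doc_type": "article"},
    {"title": "Graphene transistors", "abstract": "",
     "year": 2010, "doc_type": "article"},
    {"title": "Random graph models", "abstract": "",
     "year": 1990, "doc_type": "article"},
    {"title": "Neural network survey", "abstract": "",
     "year": 2012, "doc_type": "editorial"},
]

selected = [pub for pub in publications if is_network_publication(pub)]
```

Matching whole words keeps terms such as "graphene" out of the selection, at the cost of missing derived forms such as "networking".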

Number of network publications per year

Co-occurrence relations between terms in network publications

[Figure labels: Biology; Neuroscience; Social science; Chemistry; Mathematics; Computer science]
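A term co-occurrence network of this kind can be built by counting, for each pair of terms, the number of publications in which both occur. The sketch below assumes that terms have already been extracted per publication and that each term counts at most once per publication; the function name and the example terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(docs_terms):
    """Count the publications in which each pair of terms occurs together."""
    counts = Counter()
    for terms in docs_terms:
        # Each term counts once per publication (an assumption).
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical terms extracted from three publications.
docs_terms = [
    ["neural network", "training"],
    ["social network", "community detection"],
    ["complex network", "community detection", "social network"],
]

edges = term_cooccurrence(docs_terms)
```

The resulting counts can serve as edge weights, with terms as nodes, before a layout or clustering technique is applied.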


Network fields

• Network publications are clustered into fields
• Based on 3.1 million citation relations between network publications
• Clustering methodology of Waltman and Van Eck (2012, 2013)
• Publications in the same journal are assigned to the same cluster, except for multidisciplinary journals
• 13 main clusters, covering 97% of all 0.7 million network publications

Number of network publications per field

Citation relations between journals with ≥ 100 network publications

[Figure labels: Computer science; Mathematics; Physics; Neuroscience; Biology; Chemistry]

Convergence toward an integrated network science field?

Number of citations between network fields (× 100; 5-year citation window)

[Figure: diagrams for 2004 and 2014 linking the fields Physics, Math, CS, Biology, SS, and Neuro]

% of publications in each of two fields citing at least one publication in the other field (5-year citation window)

[Figure: diagrams for 2004 and 2014 linking the fields Physics, Math, CS, Biology, SS, and Neuro]
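The second indicator can be sketched as follows. The sketch rests on two assumptions that the slides do not spell out: the percentage is taken over the publications of both fields combined in the given year, and a citation falls within the window if the cited publication appeared at most `window - 1` years before the citing one. The data layout and the function name are hypothetical.

```python
def cross_citing_share(pubs, citations, field_a, field_b, year, window=5):
    """Percentage of publications in two fields citing the other field.

    pubs: publication id -> (field, publication year)
    citations: publication id -> list of cited publication ids
    """
    eligible = [p for p, (field, y) in pubs.items()
                if y == year and field in (field_a, field_b)]
    if not eligible:
        return 0.0
    hits = 0
    for p in eligible:
        other = field_b if pubs[p][0] == field_a else field_a
        for cited in citations.get(p, []):
            if cited not in pubs:
                continue
            cited_field, cited_year = pubs[cited]
            # Assumed window definition: cited publication is 0..window-1 years older.
            if cited_field == other and 0 <= year - cited_year < window:
                hits += 1
                break
    return 100.0 * hits / len(eligible)


# Hypothetical toy data.
pubs = {
    "p1": ("Physics", 2014), "p2": ("Physics", 2014),
    "p3": ("SS", 2014), "p4": ("SS", 2012),
    "p5": ("SS", 2005), "p6": ("Physics", 2013),
    "p7": ("Biology", 2014), "p8": ("SS", 2014),
}
citations = {
    "p1": ["p4"],  # Physics citing recent SS: counts
    "p2": ["p5"],  # Physics citing SS outside the window: does not count
    "p3": ["p6"],  # SS citing recent Physics: counts
    "p8": ["p7"],  # SS citing Biology: does not count
}

share = cross_citing_share(pubs, citations, "Physics", "SS", 2014)
```

In this toy example two of the four eligible publications cite the other field within the window, giving a share of 50%.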

Convergence of social science and physics

Citation relations between journals at the SS-physics interface (2005–2014)

[Figure labels: Scientometrics; Economics; Sociology and SNA; Physica A; PRE; PRL; PLOS ONE; PNAS; Nature; Science; Sci. Rep.; JSTAT; EPL; EPJ B]

Leiden University’s institutes with most publications on network science

• LUMC
• Leiden Institute of Advanced Computer Science (Science)
• Leiden Institute of Chemistry (Science)
• Leiden Institute of Physics (Science)
• Institute of Psychology (FSW)
• Mathematical Institute (Science)
• Leiden Observatory (Science)
• Institute of Biology Leiden (Science)
• Centre for Science and Technology Studies (FSW)

Leiden University’s publication output in network science journals

[Figure labels: CWTS; Leiden Institute of Chemistry; LIACS; Leiden Institute of Physics; Institute of Psychology; LUMC; Institute of Biology Leiden; Mathematical Institute]

Conclusions

• Network research has increased tremendously during the past 10–15 years
• Network research covers many fields of science, but there is only limited evidence of increasing integration
• Network research in social science and physics is becoming more connected
• Leiden University contributes to all major areas of network research, although the contribution in the area of computer science is somewhat modest

Do it yourself!

www.vosviewer.com
www.citnetexplorer.nl

Thank you for your attention!