The New Empiricism: Big Data, Network Science, and Psychological Inquiry
Kevin Lanning, Fla Atl U; Seth Stephens-Davidowitz, Google; Dustin Wood, Wake Forest U; Tal Yarkoni, U Texas

Network analyses of psychological science


Presented at the APS Conference, May 2014.


Page 1: Network analyses of psychological science

The New Empiricism: Big Data, Network Science, and Psychological Inquiry

Kevin Lanning, Fla Atl U
Seth Stephens-Davidowitz, Google
Dustin Wood, Wake Forest U
Tal Yarkoni, U Texas

Page 2: Network analyses of psychological science

Beyond keywords: Network analyses of psychological science

Kevin Lanning, Ryne Sherman, Xingquan Zhu, Jared Hesse, & Daniel Lopez
Florida Atlantic University

Page 3: Network analyses of psychological science

Introduction

Science as a social endeavor
Networks, citations, and meaning

Four levels of analysis

Level of analysis    Concept or parameter               Interpretation
Network              Giant component                    Overall connectedness
Community            Modularity                         Topics, subdisciplines, cliques, categories
Path                 Diameter, path length              Distance and proximity of papers, scholars
Author/paper         In-degree, PageRank, centrality    Mechanisms of influence, impact, eminence

Page 4: Network analyses of psychological science

Datasets

Annual Review articles on personality
• Author as the unit of analysis (Smith,J)
• 33 source papers published 1977-2012

Journal of Social Issues and Analyses of Social Issues and Public Policy (journals of SPSSI/APA Division 9)
• 855 source papers published 2001-2013
• By citation (Smith,J 2006)
• By author (Smith,J)

Analyses of first authors only.
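The two units of analysis above (by citation, e.g. "Smith,J 2006", and by author, e.g. "Smith,J") can be related with a short sketch. It assumes citation keys end in a four-digit year and that a self-cite means the citing and cited first author share a key; the function names and the toy data are hypothetical and are not taken from the datasets.

```python
from collections import Counter

def first_author(citation_key: str) -> str:
    """Strip a trailing year from a key such as 'Smith,J 2006' -> 'Smith,J'."""
    parts = citation_key.rsplit(" ", 1)
    if len(parts) == 2 and parts[1].isdigit():
        return parts[0]
    return citation_key

def author_edges(paper_edges):
    """Collapse paper-to-paper citations into weighted author-to-author edges."""
    weights = Counter()
    for source, target in paper_edges:
        citer, cited = first_author(source), first_author(target)
        if citer != cited:  # drop self-cites
            weights[(citer, cited)] += 1
    return weights

# Hypothetical toy data for illustration only.
paper_edges = [
    ("Smith,J 2006", "Jones,A 1999"),
    ("Smith,J 2006", "Smith,J 2001"),  # self-cite, dropped
    ("Smith,J 2010", "Jones,A 1999"),
    ("Lee,K 2008", "Smith,J 2006"),
]
edges = author_edges(paper_edges)
```

Here the two Smith-to-Jones citations merge into one author-level edge of weight 2, which is the distinction between in-degree and weighted in-degree used later in the deck.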

Page 5: Network analyses of psychological science

Big data, small world: Personality in the Annual Review

Scope
• 6,294 references by 2,803 unique authors
• Of these, 219 self-cites (3.5%) are excluded

Connectedness
• All authors are connected, and separated by no more than 5 degrees
• Average path is 3

Nodes and text are colored by community. Node size represents Eigenvector centrality.
Layout determined by Force Atlas 2 algorithm.
All analyses and visualizations done in Gephi.

Page 6: Network analyses of psychological science

Eminence and network centrality: 3 interpretations
• Simple measures (counts)
  • In-degree (ID) = cites by different sources
  • Weighted in-degree = total cites
• Recursive measures
  • PageRank (PR) and Eigenvector Centrality
  • Importance of a paper is dependent upon the importance of papers which refer to it
  • A random walk through the literature
• Betweenness Centrality (BC)
  • Tying together regions of scholarship
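The difference between the count measures and a recursive measure can be sketched in a few lines. This is an illustration under stated assumptions and not the Gephi implementation used for the analyses: the damping factor of 0.85, the fixed iteration count, the handling of nodes that cite no one, and the toy edge weights are all choices made here. Betweenness is left out to keep the sketch short.

```python
from collections import defaultdict

def degree_counts(weighted_edges):
    """Return (in_degree, weighted_in_degree) for edges {(citer, cited): count}."""
    in_degree = defaultdict(int)
    weighted = defaultdict(int)
    for (citer, cited), count in weighted_edges.items():
        in_degree[cited] += 1     # cites by different sources
        weighted[cited] += count  # total cites
    return dict(in_degree), dict(weighted)

def pagerank(weighted_edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted, directed citation graph."""
    nodes = sorted({n for edge in weighted_edges for n in edge})
    n_nodes = len(nodes)
    out_total = defaultdict(int)
    for (citer, _), count in weighted_edges.items():
        out_total[citer] += count
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes that cite no one is spread evenly (an assumption).
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = {n: base for n in nodes}
        for (citer, cited), count in weighted_edges.items():
            new_rank[cited] += damping * rank[citer] * count / out_total[citer]
        rank = new_rank
    return rank

# Hypothetical toy network: A cites C three times, B cites C and D, C cites D.
toy = {("A", "C"): 3, ("B", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
in_deg, weighted_in_deg = degree_counts(toy)
ranks = pagerank(toy)
```

In the toy network C and D have the same in-degree (2) but different weighted in-degree (4 versus 2), while PageRank additionally rewards D for being cited by the well-cited C.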

Page 7: Network analyses of psychological science

Personality in the Annual Review: Most cited authors

115 authors with 5 or more cites

Page 8: Network analyses of psychological science

Proximity and distance in citation networks
• Proximity may occur for several reasons; distance is less ambiguous
• Closeness of Block and Mischel in the personality space (right)
• Greatest distances among source papers
  • Parke (‘83, Social and Personality Development)
  • -> Rorer (‘83, Personality Structure and Assessment)
  • -> Butcher (‘96, Personality: Individual Differences and Clinical Assessment).

Page 9: Network analyses of psychological science

Personality in the Annual Review: Communities as constellations / The five factor paradigm

Between 12 and 15 communities are identified. One of the largest is anchored by source papers of Wiggins, Carson, Digman, and Ozer.

Page 10: Network analyses of psychological science

Personality in the Annual Review: Minnesota and Berkeley schools

In one analysis, the two largest communities

Page 11: Network analyses of psychological science

Personality in the Annual Review: Some thoughts on Harrison Gough (1921-2014)

The direct and indirect influence
Cited in 5 of the Annual Review papers in the dataset (1 degree), which link to 597 others (22% of scholars by 1-2 degrees), and to 2329 (83%) by 3 degrees.

The continuing legacy
“Big Data” approaches can be seen as an extension of the empirical tradition of Binet, Meehl, and Gough.

Page 12: Network analyses of psychological science

Analyses of the SPSSI journal database

• All papers published in JSI, ASAP from 2001-2013.
• N sources = 855
• 38,854 references (45.4 per source)
• - 2,042 self-references (5.3%)
• - 3,198 (8.2%) unusable: references to news articles, government institutes, or without a date
____________________________
• 33,615 usable citations (86.5%)
• 24,263 unique papers
• 14,702 unique first authors

Page 13: Network analyses of psychological science

SPSSI citation network: Connectedness

• Of the 24,263 papers, 24,075 (99.2%) are linked in a single giant component
• Papers are separated by an average of 4.2 links
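Both connectedness figures above (share of papers in the giant component, average number of links between papers) can be computed with breadth-first search. The sketch below assumes citation links are treated as undirected for this purpose; the slides do not say how Gephi was configured, and the function names and toy edges are hypothetical.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Undirected adjacency sets from (paper, paper) citation pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def bfs_distances(adj, start):
    """Shortest link counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def giant_component(adj):
    """Largest set of mutually reachable nodes."""
    seen, best = set(), set()
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best

def average_path_length(adj, component):
    """Mean shortest path over all ordered pairs within one component."""
    total = pairs = 0
    for node in component:
        for other, d in bfs_distances(adj, node).items():
            if other != node:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical toy data: a chain of four papers plus one isolated pair.
toy_edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p5", "p6")]
adj = build_adjacency(toy_edges)
giant = giant_component(adj)
share = len(giant) / len(adj)
avg_path = average_path_length(adj, giant)
```

Running one search per node is adequate for a toy example; a network of roughly 24,000 papers is the kind of case where a dedicated tool such as Gephi is the practical choice.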

Page 14: Network analyses of psychological science

PageRank is high for papers with commentary
• King (2011)
  • Second highest PR in database
• Explanation
  • Papers which are cited by papers with few references (such as commentaries) can have a disproportionate impact in a sparse network
• Two solutions
  • Omit commentaries and book reviews
  • Treat authors rather than papers as the unit of analysis
• Limitations of citation networks: sparseness, time-constraint
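The explanation above follows from the usual textbook form of PageRank. That form is an assumption here, since the slides do not state which variant or damping factor was used. The symbols are introduced for illustration: B(p) is the set of papers citing p, L(q) is the number of references in paper q, N is the number of papers, and d is the damping factor.

```latex
PR(p) = \frac{1-d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
```

Each citing paper q divides its rank equally among its L(q) references. A commentary with a single reference passes all of its rank to the target paper, whereas a paper with 45 references passes on only 1/45 of its rank to each.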

Page 15: Network analyses of psychological science

The SPSSI author network: (almost) no one is an island
• 14,703 unique authors
• All but 6 are linked to the main component
• Average path between nodes = 5.1
• 32-38 communities*
• Average author is linked 1.9 times

Whole network

Page 16: Network analyses of psychological science

The SPSSI author network: Most cited

Includes 68 authors with 20 or more citations. Nodes ranked by eigenvector centrality

Page 17: Network analyses of psychological science

The SPSSI author network: Centrality

• Content of rankings
• Betweenness vs. other measures
• On gender effects in citation networks

Page 18: Network analyses of psychological science

The SPSSI author network: Allport and Lewin communities compared

Lewin community includes authors with 5 or more cites; Allport includes authors with 13+ cites. Nodes ranked by eigenvector centrality

Page 19: Network analyses of psychological science

Summary
• Content: Influential persons and scholarly works
• Different measures of centrality have distinct interpretations
• Beyond this, eminence of communities as well as persons
• Citation networks are small worlds
• The discrete clustering approach to describing the network is not ideal
• In citation networks, distance may be more interpretable than proximity
• The work is primitive
• Bigger data and more sophisticated methods lie ahead