The Endless Gallery: Visualizations of Author Data
Howard D. White
Xia Lin
Jan Buzydlowski
College of Information Science and Technology
Drexel University
Philadelphia, PA 19104
Authors’ names have a double sense
They can designate:
• Persons (Living or Dead)
• Oeuvres
Persons or oeuvres can be mapped if they can be related with some metric.
Data gathered on persons can be linked to oeuvres and vice-versa.
What are author maps good for?
Provide intellectual overviews of specialties.
Assist in retrieval of documents.
Suggest networks for sociometric analysis.
Inputs to author maps
Multiple-author input
• Use author names from book or other publication
  • Example: Deb Stagg used Diana Crane’s list of New York artists
  • Example: White & McCain used 120 top-cited information scientists
  • Could use names in one of Randall Collins’s diagrams
• Use judgment sample (own knowledge, advisors’, etc.)
  • Example: Hinda Greenberg’s sample of 88 literary theorists
• Use author names from organization’s membership roll
  • Example: Howard White used Nazer-Wellman list for “Globenet”
Single-author input
• AuthorLink
Collins, Randall. 1998. The sociology of philosophies: A global theory of intellectual change. Cambridge, MA: Belknap Press of Harvard University Press.
Two Author Co-citation Analyses in the Humanities from Drexel
Stagg, Deborah B. 1997. Art world maps: A quantitative sociology of contemporary American art. PhD dissertation. Drexel University.
Greenberg, Hinda F. 1999. Spanning boundaries: An interdisciplinary citation study based on literary-studies author co-citation clusters. PhD dissertation. Drexel University.
ABRAMS
ADORNO
BAKER
BAKHTIN
BALDICK
BARTHES
BELSEY
BENJAMIN
BENNETT
BLEICH
BLOOM
BOOTH
BROOKS
CHASE
CROCE
CULLER
DE_MAN
DERRIDA
EAGLETON
ELIOT
FISCHER
FISH
FOUCAULT
FOWLER
FREUD
FRYE
GADAMER
GATES
GILBERT
GOODHEART
GRAFF
GREEN
GREENBLATT
GUILLORY
GUNN
HABERMAS
HARARI
HARTMAN
HERNADI
HIRSCH
HOHENDAHL
ISER
JACOBUS
JAKOBSON
JAMESON
JAUSS
JOHNSON
KREIGER
KRISTEVA
KRUPNICK
KUHN
LACAN
LACAPRA
LEAVIS
LEITCH
LENTRICCHIA
LUKACS
MARX
MCGANN
MEISEL
MOI
NORRIS
PRATT
RANSOM
RICHARDS
RIFFATERRE
RORTY
RYAN
SAID
SAUSSURE
SCHMIDT
SCHOLES
SEARLE
SHOWALTER
SIEBERS
TODOROV
TOMPKINS
TRILLING
WATKINS
WEIMANN
WELLEK
WHITE
WILLIAMS
WIMSATT
WITTGENSTEIN
WOOLF
ZIZEK
ZUMTHOR
Hinda Greenberg’s 88 literary theorists as PFNET
Co-citation is the mentioning of any two earlier documents in the bibliographic references of a third, later document.
The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.
Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.
Co-Citation Analysis
[Diagram: Doc 3 cites both Doc 1 and Doc 2, so Doc 3 co-cites Docs 1 and 2.]
Co-Citation Analysis
Lin, Xia. 1997. Map Displays for Information Retrieval. Journal of the American Society for Information Science 48: 40-54.
Chen, Chaomei. 1998. Bridging the Gap: The Use of Pathfinder Networks in Visual Navigation. Journal of Visual Languages and Computing 9: 267-286.
Document co-citation counts the number of times two papers are cited together.
Author co-citation counts the number of times two authors, e.g., Lin and Chen, are cited together.
Journal co-citation counts the number of times two journals are cited together.
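A minimal sketch of the counting described above, assuming each citing document has already been reduced to a list of the units it cites (papers, authors, or journals). The function name cocitation_counts and the three sample reference lists are hypothetical and are not taken from any ISI file.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count, for every unordered pair of cited units, how many citing
    documents mention both in their reference lists."""
    counts = Counter()
    for refs in reference_lists:
        # A pair counts once per citing document, however often a unit recurs.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citing documents, each reduced to the authors it cites.
citing_docs = [
    ["LIN X", "CHEN C", "SALTON G"],
    ["LIN X", "CHEN C"],
    ["CHEN C", "SALTON G"],
]
counts = cocitation_counts(citing_docs)
print(counts[("CHEN C", "LIN X")])  # pairs are keyed in alphabetical order
```

The same function serves for document, author, or journal co-citation; only the unit stored in the reference lists changes.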
Co-Citation Analysis
Data on co-citation are readily obtainable from databases of the Institute for Scientific Information (ISI) in Philadelphia, PA:
• Scisearch (Science Citation Index)
• Social Scisearch (Social Sciences Citation Index)
• Arts & Humanities Search (Arts & Humanities Citation Index)
These databases are searchable online through, e.g., the Dialog Corporation.
Author Co-Citation Analysis (ACA)
Detects patterns in the frequency with which any works by any two authors are jointly cited in later works. Could be called analysis of co-cited oeuvres.
Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
Author Co-Citation Analysis
If Ben Shneiderman and Shakespeare are cited together in one article, it probably means little.
If Ben Shneiderman and Stuart Card are cited together in more than 200 articles, it means a lot: their names have come to symbolize something like “interactive interfaces for digital libraries.”
In a cited-author (CA) search on Dialog,
SELECT CA=SHNEIDERMAN B AND CA=CARD SK
would retrieve the 200+ citing articles.
AuthorLink
Produces co-cited author maps in real time (a few seconds) on a Web site.
User merely has to enter name of a single author of interest as a “seed.”
• E.g., “Dickinson-E” for Emily Dickinson
System responds with the top authors co-cited with that seed—24 other names ranked by frequency of co-occurrence.
System then pairs every name with every other in a 25x25 square symmetric matrix.
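The seed-to-matrix step might be sketched as follows. This is an illustration only, not AuthorLink's actual code (which the slides say runs on BRS Search with Java and C programs). The function seed_matrix, the alphabetical tie-break among equally ranked names, and the four sample reference lists are all assumptions.

```python
from collections import Counter
from itertools import combinations

def seed_matrix(reference_lists, seed, top_n=24):
    """Return (names, matrix): the seed plus its top_n co-cited authors,
    and a square symmetric matrix of pairwise co-citation counts."""
    ref_sets = [set(refs) for refs in reference_lists]
    # Tally the authors that appear in reference lists alongside the seed.
    with_seed = Counter()
    for refs in ref_sets:
        if seed in refs:
            with_seed.update(refs - {seed})
    # Rank by count, breaking ties alphabetically (an assumption).
    ranked = sorted(with_seed.items(), key=lambda kv: (-kv[1], kv[0]))
    names = [seed] + [name for name, _ in ranked[:top_n]]
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    # Pair every retained name with every other across all citing documents.
    for refs in ref_sets:
        present = sorted(index[name] for name in refs if name in index)
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return names, matrix

# Hypothetical data; top_n=2 keeps the example small (AuthorLink uses 24).
docs = [
    ["DICKINSON-E", "WHITMAN-W", "EMERSON-RW"],
    ["DICKINSON-E", "WHITMAN-W"],
    ["WHITMAN-W", "EMERSON-RW", "POE-EA"],
    ["DICKINSON-E", "POE-EA"],
]
names, matrix = seed_matrix(docs, "DICKINSON-E", top_n=2)
print(names)
print(matrix)
```

With top_n=24 the result is the 25x25 matrix described above; the diagonal is left at zero in this sketch.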
Quick Visualizations of a Database
User can choose to display the matrix either as a Kohonen feature map (SOM, self-organizing map) or as a Pathfinder network map (PFNET).
User can use either map as
• An aid to retrieving articles that cite authors in various combinations. (Combinations are made interactively.)
• Reproducible artwork in a new study, such as a review of a literature or a commentary on the author used as “seed.”
Advantages of Maps
Ranked list of top 25 co-cited authors often contains names not previously known to user.
Both Kohonen maps and PFNETs show interconnections of the 25 authors not apparent in the one-dimensional ranking of a simple list.
Maps automatically pair authors with their biographers, editors, commentators, and critics.
AuthorLink’s Underlying Database and Software
ISI gave our college 10 years’ worth of data from the Arts & Humanities Citation Index (AHCI 1988-1997) as a research grant. Has 1.26 million bibliographic records on articles and other items from humanities journals.
For retrievals from AHCI, we bought BRS Search, an industrial-strength engine, from Dataware, Inc.
Buzydlowski and Lin have written several special programs in Java and C to implement our system on top of the BRS Search software.
Interpretation of Maps
Kohonen maps show high co-citation counts of authors by placing them closer in space.
PFNETs show highest co-citation counts of authors directly, as links between nodes bearing authors’ names.
Norman Mailer
Limitations on the Maps
AuthorLink maps are pictures of 10 years of scholarship as reflected in AHCI.
They simplify and highlight certain relationships in humanistic studies.
They do not capture all relationships in the data, nor do they present “superior” truths.
Interface Design Considerations
Link interface to valuable digital libraries (ISI citation databases and the journal literatures they lead to).
Focus on intellectual content: meaningful words, meaningfully presented.
Stress quick and flexible presentations over long-term displays.
PFNET of Plato rendered with Cortona VRML software
PFNET of Plato made with Pajek
3 Main Shapes Found in AuthorLink PFNET Displays
Dendrite
• E.g., for Virginia Woolf
Cycle
• E.g., for Herbert A. Simon
Star
• E.g., for Noam Chomsky
PFNETs
Are algorithmically connected graphs, based on finding the “minimum-cost” path between any two nodes.
In ACA, this is generally the highest single co-citation count between author pairs (all pairs are examined).
Results in useful simplification of graph.
Use spring embedder algorithm to produce layout.
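One common way to realize this pruning is the Pathfinder variant with r = infinity and q = n - 1. The slides do not state which parameters AuthorLink uses, so that choice is an assumption here. Under it, a direct link survives only if no indirect path has a stronger weakest link. The sketch below works directly on co-citation counts (higher means closer) rather than converting them to costs; pfnet_links and the four-author matrix are hypothetical.

```python
def pfnet_links(names, counts):
    """Return the links kept by a PFNET with r = infinity and q = n - 1,
    given a symmetric matrix of co-citation counts (0 means no link)."""
    n = len(names)
    # widest[i][j]: the best "weakest link" over all paths from i to j.
    widest = [row[:] for row in counts]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(widest[i][k], widest[k][j])
                if via_k > widest[i][j]:
                    widest[i][j] = via_k
    links = []
    for i in range(n):
        for j in range(i + 1, n):
            # Keep a direct link only if no indirect path beats it.
            if counts[i][j] > 0 and counts[i][j] >= widest[i][j]:
                links.append((names[i], names[j], counts[i][j]))
    return links

# Hypothetical co-citation counts among four authors.
names = ["A", "B", "C", "D"]
counts = [
    [0, 10, 3, 1],
    [10, 0, 8, 0],
    [3, 8, 0, 5],
    [1, 0, 5, 0],
]
links = pfnet_links(names, counts)
print(links)
```

Here the A-C link (3) is dropped because the path A-B-C has a weakest link of 8, and A-D (1) is dropped in favor of A-B-C-D. Laying out the surviving links with a spring embedder is a separate step not shown.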
PFNETs
Make sense as pictures of relations in databases!
Independent observers have found them highly intelligible:
• Xia Lin on Chinese philosophers
• Kate McCain on historians of science & technology
• Howard White on various literary figures and artists
Buzydlowski’s research will test interpretability of PFNETs and Kohonen maps as interfaces for domain experts and naïve users.
Vincent van Gogh
Einstein-A and Mozart
Einstein-A and Bohr
Architecture of AuthorLink
[Diagram: three-tier architecture (front tier, middle tier, back tier). Component labels: Web-based Map Interface, Java Applet, Web Server, Java Servlets, Mapping Procedures, Application Server, BRS Search Engine, Oracle Database, MySQL Database.]
Two Forms of Citation Data
Intercitation: Occurs when any member of a fixed group cites any other member of that group. Asymmetric. May or may not be reciprocal.
• Here, citing among Globenet members as dyads.
Co-citation: Occurs when any two authors are cited together in the reference list of any work. Important when recurrent above some threshold. Symmetric.
• Here, joint citation of any Globenet pair in a work by any author whose journal publications are covered by the Institute for Scientific Information.
Intercitation and Co-citation
Are produced by different processes:
• Co-citation reflects perceptions of relatedness by all citers.
• Intercitation reflects perceptions of relatedness by Globenet citers only.
• Intercitation is only a small part of co-citation.
Correlate at .58 in Globenet.
Citation Identity: An Author’s Citees Ranked High to Low
RANK: S2/1-55 Field: CA= File(s): 7,34,434
(Rank fields found in 51 records -- 456 unique terms)

RANK No.  Items  Term
--------  -----  ----
       1     25  STONE DP
       2     11  STANLEY JC
       3     10  PIAGET J
       4     10  STERNBERG RJ
       5      8  RAVEN JC
       6      6  CHI MTH
       7      6  FORD ME
       8      6  INHELDER B
       9      5  TERMAN LM
      10      4  CAMPBELL DT
      11      4  FELDMAN DH
      12      4  GOULD SJ
      13      4  GUILFORD JP
      14      4  HORN JL
      15      4  HUNT E
      16      4  JENSEN AR
      17      4  POSNER MI
      18      4  SELMAN RL
      19      4  SIEGLER RS
      20      4  STERNBERG S
Dialog Commands
1. In ISI databases, form the set of journal items by an author:
   Select AU=STONE DP
2. Rank his or her citees with:
   Rank CA Cont
   where CA stands for Cited Authors and Cont means in continuous descending order. Top 20 (of 456) shown at left. Stone cites all of them four or more times.
3. Look for Globenet members among the citees to get intercitations.
4. Can alphabetize with Rank CA Cont Alpha to speed look-ups.
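The ranking and look-up steps above can be imitated offline. The sketch assumes an author's records have been exported as lists of cited-author strings; it does not call Dialog. The functions citation_identity and intercitations, the placeholder names, and the membership set are all hypothetical.

```python
from collections import Counter

def citation_identity(records):
    """Rank an author's citees by the number of the author's items that
    cite them (the quantity RANK reports as 'Items')."""
    tally = Counter()
    for cited_authors in records:
        tally.update(set(cited_authors))  # one count per citing item
    # Descending by count, alphabetical within ties.
    return sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))

def intercitations(identity, members):
    """Keep only the citees who belong to the fixed group."""
    return [(name, n) for name, n in identity if name in members]

# Hypothetical export: each inner list is the cited authors of one item.
records = [
    ["AUTHOR A", "AUTHOR B", "AUTHOR C"],
    ["AUTHOR A", "AUTHOR C"],
    ["AUTHOR A", "AUTHOR D", "AUTHOR A"],
]
identity = citation_identity(records)
group = {"AUTHOR C", "AUTHOR D"}  # stand-in for a membership roll
print(identity)
print(intercitations(identity, group))
```

Repeating the look-up for every group member yields one row of an intercitation matrix per member.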
Intercitation: citing or being cited in a group with definite membership
                Bo Br Ca Co Cy Fr He Ke Of Pe Po Ro Sc Su Tr Wi
                -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 1 Bouchard      0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0
 2 BrooksGunn    2  0  0  3  0  0  0  6  1  0  0  0  0  1  0  0
 3 Case          0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0
 4 Coe           0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0
 5 Cynader       0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 6 Frost         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 7 Hertzman      1  0  3  2  1  0  0  0  2  0  9  0  0  1  2  1
 8 Keating       0  0  2  1  1  0  1  0  1  0  1  1  1  1  1  1
 9 Offord        1  0  0  0  0  0  0  0  0  0  0  0  0  0  7  0
10 Pence         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
11 Power         0  0  1  0  1  0  3  1  1  0  0  0  0  0  1  0
12 Rohlen        0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
13 Scardamalia   0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0
14 Suomi         0  0  0 12  0  0  0  0  2  0  1  0  0  0  1  0
15 Tremblay      1  5  0  0  0  0  0  0  8  0  3  0  0  0  0  0
16 Willms        0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Row names cite column names
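Because row names cite column names, a row total gives a member's outcitation and a column total gives incitation. The sketch below computes both from the matrix above, with values transcribed by hand from the slide; citation_totals is a hypothetical helper name.

```python
MEMBERS = ["Bouchard", "BrooksGunn", "Case", "Coe", "Cynader", "Frost",
           "Hertzman", "Keating", "Offord", "Pence", "Power", "Rohlen",
           "Scardamalia", "Suomi", "Tremblay", "Willms"]

# Rows cite columns; values transcribed from the intercitation matrix.
MATRIX = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0],   # Bouchard
    [2, 0, 0, 3, 0, 0, 0, 6, 1, 0, 0, 0, 0, 1, 0, 0],   # BrooksGunn
    [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0],   # Case
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0],   # Coe
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # Cynader
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # Frost
    [1, 0, 3, 2, 1, 0, 0, 0, 2, 0, 9, 0, 0, 1, 2, 1],   # Hertzman
    [0, 0, 2, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1],   # Keating
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0],   # Offord
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # Pence
    [0, 0, 1, 0, 1, 0, 3, 1, 1, 0, 0, 0, 0, 0, 1, 0],   # Power
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # Rohlen
    [0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # Scardamalia
    [0, 0, 0, 12, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0],  # Suomi
    [1, 5, 0, 0, 0, 0, 0, 0, 8, 0, 3, 0, 0, 0, 0, 0],   # Tremblay
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # Willms
]

def citation_totals(members, matrix):
    """Return (outcitation, incitation) totals keyed by member name."""
    out_totals = {m: sum(row) for m, row in zip(members, matrix)}
    in_totals = {m: sum(col) for m, col in zip(members, zip(*matrix))}
    return out_totals, in_totals

out_totals, in_totals = citation_totals(MEMBERS, MATRIX)
top_citer = max(out_totals, key=out_totals.get)
top_citee = max(in_totals, key=in_totals.get)
print(top_citer, out_totals[top_citer])
print(top_citee, in_totals[top_citee])
```

With the values as transcribed, Hertzman has the largest row total (22) and Coe the largest column total (18).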
Cumulative Intercitation, to 2000
High outcitation
High incitation
Cumulative Intercitation, to 2000
[Figure: network diagram. Node labels: Mann, Applebaum, Green, Martins, Jones, Wood, Stone, Hart, Cook, Hopkins, Scott, Demore, Oldfield, Brown, Grey, Smith. Numeric link labels are not recoverable from the extraction.]
Co-citation counts of Globenet authors: Line thicknesses are proportional.
Colors reflect the seven disciplines of Globenet members.
Single Author Maps
Can map an author’s
• Co-authors (those with whom she writes)
• Citees (those she cites)
• Co-citing authors (those who cite her with others)
• Co-cited authors (those others with whom she is cited)
CAMEOs:
Characterizations Automatically Made and Edited Online
• Form set of bibliographic records of writings by an author or citing an author
• Select a field from the records
  • E.g., descriptors, co-cited authors
• Rank the terms in it by frequency of occurrence across the set
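The three steps above amount to a frequency ranking over one field of a record set. A minimal sketch, assuming records are available as dictionaries whose fields hold lists of terms; the function cameo, the field names, and the sample records are hypothetical and do not reflect an actual ISI record layout.

```python
from collections import Counter

def cameo(records, field):
    """Rank the terms found in one field by the number of records that
    contain them; returns (rank, count, term) rows."""
    tally = Counter()
    for record in records:
        tally.update(set(record.get(field, [])))  # one count per record
    ranked = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, n, term) for rank, (term, n) in enumerate(ranked, start=1)]

# Hypothetical set of records by (or citing) one author.
records = [
    {"identifiers": ["IMPACT", "LISTS"], "journal": ["JOURNAL X"]},
    {"identifiers": ["IMPACT", "SERIALS"], "journal": ["JOURNAL X"]},
    {"identifiers": ["LISTS", "IMPACT"], "journal": ["JOURNAL Y"]},
]
subject_cameo = cameo(records, "identifiers")
journal_cameo = cameo(records, "journal")
for row in subject_cameo:
    print(*row)
```

Changing the field argument changes the type of CAMEO; the rows follow a rank, count, term layout.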
Some Types of CAMEOs
Descriptors or identifiers applied across an author’s works
Journals to which an author contributed
Citation identities: an author’s citees
Citation images: the authors with whom an author is cocited
Subject CAMEO for Tom Nisonger: First 50 Identifiers in 3 ISI Files
 1  4  IMPACT
 2  4  LISTS
 3  3  FACULTY
 4  3  SERIALS
 5  2  ARTICLES
 6  2  CITATIONS
 7  2  INDICATORS
 8  2  JOURNALS
 9  2  PATTERNS
10  2  PERIODICAL LITERATURE
11  2  RANKING
12  2  RELEVANCE
13  1  ACADEMIC LIBRARIANS
14  1  ACADEMIC-LIBRARY
15  1  ACCESS
16  1  ACQUISITIONS
17  1  AUTHORSHIP
18  1  BIBLIOMETRIC ANALYSIS
19  1  CITATION ANALYSIS
20  1  COLLECTION DEVELOPMENT
21  1  COLLEGE
22  1  COMMERCIAL DOCUMENT SUPPLIERS
23  1  CONSISTENCY
24  1  COST-EFFECTIVENESS
25  1  DEFINITION
26  1  DELIVERY
27  1  DENVER
28  1  DESIGN
29  1  DOCUMENTS
30  1  GENETICS
31  1  INDEX
32  1  INDEXING CONSISTENCY
33  1  INFORMATION-SCIENCE
34  1  INTERLIBRARY LOAN
35  1  JOINT COMMITTEE REPORT
36  1  JOURNAL-CITATION-REPORTS
37  1  LIBRARIANSHIP
38  1  LIBRARIES
39  1  MODEL
40  1  ONLINE
41  1  OVERLAP
42  1  PERCEIVED PRESTIGE
43  1  PHYSICS
44  1  PHYSICS JOURNALS
45  1  PLUS
46  1  PRACTITIONERS
47  1  PROFESSIONAL JOURNALS
48  1  PROFILE
49  1  PSYCHOLOGY
50  1  PUBLICATION
Co-citation Image CAMEO for Javed Mostafa: First 50 Names in 3 ISI files
 1  26  MOSTAFA J
 2  13  SALTON G
 3  11  BELKIN NJ
 4  11  MAES P
 5  10  BESSER H
 6   9  MARKEY K
 7   8  RORVIG ME
 8   7  ROBERTSON SE
 9   6  FLICKNER M
10   6  FOLTZ PW
11   6  JENNINGS A
12   6  JORGENSEN C
13   6  KONSTAN JA
14   6  LAM W
15   6  LARSON RR
16   6  LEWIS DD
17   6  MUKHOPADHYAY S
18   6  OARD DW
19   6  OCONNOR BC
20   6  RESNICK P
21   5  BARNETT PJ
22   5  BEARD DV
23   5  CHANG SK
24   5  ESTER M
25   5  FIDEL R
26   5  GECSEI J
27   5  GUPTA A
28   5  HASTINGS SK
29   5  LANG K
30   5  LYNCH CA
31   5  PAZZANI M
32   5  SELOFF GA
33   5  TURNER J
34   4  ARMS WY
35   4  BACH JR
36   4  BALABANOVIC M
37   4  BATES MJ
38   4  BRAJNIK G
39   4  CAWKELL AE
40   4  CHANG SF
41   4  ENSER PGB
42   4  HARMAN D
43   4  HOLT B
44   4  HULL DA
45   4  JACOB EK
46   4  KORFHAGE R
47   4  LAYNE SS
48   4  LOSEE RM
49   4  MOUKAS A
50   4  NARENDRA KS
Journal CAMEO for Rob Kling: Where He’s Published at least Twice
1 9 INFORMATION SOCIETY
2 6 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATIO
3 4 BULLETIN OF THE AMERICAN SOCIETY FOR INFORMATI
4 4 COMMUNICATIONS OF THE ACM
5 4 SCIENCE TECHNOLOGY & HUMAN VALUES
6 3 ASTROPHYSICAL JOURNAL (Another Kling R?)
7 3 CRYPTOGAMIE ALGOLOGIE
8 3 INFORMATION AGE
9 3 INFORMATION PRIVACY
10 3 SOCIETY
11 2 CONTEMPORARY SOCIOLOGY-A JOURNAL OF REVIEWS
12 2 JOURNAL OF QUANTITATIVE SPECTROSCOPY & RADIATI (Ditto)
13 2 PHYSICA SCRIPTA (Ditto)
14 2 SYMBOLIC INTERACTION