Analyzing Text at the Middle Distance between the Close Read and Culturomics
Marti A. Hearst, U.C. Berkeley
Joint Work with Aditi Muralidharan
Definition: “Close Read”
“Close reading describes, in literary criticism, the careful, sustained interpretation of a brief passage of text. Such a reading places great emphasis on the particular over the general, paying close attention to individual words, syntax, and the order in which sentences and ideas unfold as they are read.”
-English Wikipedia, 6/4/2012
“Power and Passion in Shakespeare’s Pronouns: Interrogating ‘you’ and ‘thou’”
Penelope Freedman, 2007, MPG Books, 280 pp.
Scene from “As you like it” by Daniel Maclise (1806-70)
Conclusions (“Power and Passion in Shakespeare’s Pronouns”)
“The subtleties of the use of ‘you’ and ‘thou’ that have emerged … can seem, at worst, random or, at best,
unfathomable. …
A set of oppositions has been revealed here: … These oppositions are complex and slippery: they may operate
in parallel, may converge or diverge. Each pronoun choice has to be seen in a highly specific context.”
Definition: “Culturomics”
Narrower than “digital humanities” and broader than “corpus linguistics”.
(Loose interpretation of definitions at culturomics.org)
Sensemaking
• A vague information need
• Iteratively refine it by
• Searching
• Reading
• Analyzing
• Reach understanding
(Pirolli and Card 2005; Pirolli and Russell 2011)
The North American Slave Narratives
• Stories of the lives of former slaves
• Published by white abolitionist sponsors
• About 3000 narratives survive
• ~300 in prototype
Do the North American slave narratives all conform to the same stereotypes?
A “Master Plan” for the slave narratives
“... conventions so early and firmly established that one can imagine a sort of master outline drawn from the
great narratives and guiding the lesser ones”
-- Olney, J. “I was born: Slave Narratives and their Status as Autobiography”, Callaloo, 1984
Our approach
• Phase 1: Support searching for instances of conventions
• Phase 2: Support visualizing their occurrence in the collection
Searching for stereotypes
• Keyword search is not enough
• Search words: “cruel”, “harsh”, “overseer”, “master”, “mistress”
• Instead: “overseer”, “master”, “mistress” described as “cruel”, “harsh”
• Also want the entire picture, for comparison
• “overseer”, “master”, “mistress” described as ____?____
• ____?____ described as “cruel”
Natural language processing
The cruel overseer beat us severely.
(automatically extracted structure: “overseer” is the subject of “beat”, “us” is its object, and “cruel” is a modifier of “overseer”)
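A search with one slot left blank, as in the examples above, can be sketched as a lookup over grammatical-relation triples. This is a minimal illustration under stated assumptions, not the authors' implementation: the triples are hand-written toy stand-ins for parser output, and the names `Triple`, `TRIPLES`, `search`, and the relation labels are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    relation: str   # hypothetical labels: "modifier", "subject", "object"
    head: str       # the word being described, or the verb governing the relation
    dependent: str  # the word attached to the head


# Hand-written toy data standing in for automatically extracted structure.
TRIPLES = [
    Triple("modifier", "overseer", "cruel"),  # "The cruel overseer beat us severely."
    Triple("subject", "beat", "overseer"),
    Triple("object", "beat", "us"),
    Triple("modifier", "master", "kind"),     # invented example
    Triple("modifier", "mistress", "cruel"),  # invented example
]


def search(
    triples: List[Triple],
    relation: str,
    head: Optional[str] = None,
    dependent: Optional[str] = None,
) -> List[Triple]:
    """Return triples with the given relation; a None argument is the blank slot."""
    return [
        t
        for t in triples
        if t.relation == relation
        and (head is None or t.head == head)
        and (dependent is None or t.dependent == dependent)
    ]


# "overseer" described as ____?
described_as = [t.dependent for t in search(TRIPLES, "modifier", head="overseer")]
# ____ described as "cruel"?
who_is_cruel = [t.head for t in search(TRIPLES, "modifier", dependent="cruel")]
```

Leaving either argument as `None` gives the two comparison queries from the slide: all descriptions of a given word, or all words given a particular description.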
Part 2: visualizing stereotypes
• Prevalence
• Position of occurrence within a document
• Across the entire collection
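The two quantities behind such a visualization, how prevalent a convention is across the collection and where it occurs within each document, could be computed along the following lines. This is only a sketch: the function names `relative_positions` and `prevalence` are hypothetical, matching is plain substring search rather than grammatical search, and the tiny collection is invented for illustration.

```python
from typing import Dict, List


def relative_positions(sentences: List[str], phrase: str) -> List[float]:
    """Positions (0.0 = first sentence, 1.0 = last) of sentences containing the phrase."""
    if not sentences:
        return []
    denom = max(len(sentences) - 1, 1)  # avoid dividing by zero for one-sentence documents
    needle = phrase.lower()
    return [i / denom for i, s in enumerate(sentences) if needle in s.lower()]


def prevalence(collection: Dict[str, List[str]], phrase: str) -> float:
    """Fraction of documents in which the phrase occurs at least once."""
    if not collection:
        return 0.0
    hits = sum(1 for sents in collection.values() if relative_positions(sents, phrase))
    return hits / len(collection)


# Invented toy collection, not taken from the actual narratives.
COLLECTION = {
    "doc_a": ["I was born in a cabin.", "We worked the fields.", "I went north."],
    "doc_b": ["The farm was large.", "I was born in spring.", "Years passed."],
    "doc_c": ["The river was wide.", "We crossed at night."],
}

positions = {name: relative_positions(s, "I was born") for name, s in COLLECTION.items()}
share = prevalence(COLLECTION, "I was born")
```

Plotting `positions` per document, one row per narrative, would show at a glance whether a convention tends to appear at the same point across the collection.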
Results (presented at MLA 2012)
• Prevalent stereotypes
• “I was born”
• Separation from parents
• Cruel treatment
• Escape
• A ‘missed’ stereotype
• Parents’ death
• Not as strictly ordered as implied by Olney’s master plan.
Problems
• Vocabulary
• Same concept expressed with many different wordings
• Needed to see synonyms, nearby words, suggestions on searches
• Comparison and curation
• Couldn’t isolate and compare results on sub-collections of documents
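One simple way to surface "nearby words" as search suggestions is to count the tokens that appear within a small window of a target word. The sketch below assumes a naive regular-expression tokenizer and an invented sample text; the function name `nearby_words` and its parameters are hypothetical, and this is not a description of how WordSeer addresses the problem.

```python
import re
from collections import Counter
from typing import List, Tuple


def nearby_words(text: str, target: str, window: int = 2, top: int = 3) -> List[Tuple[str, int]]:
    """Count words occurring within `window` tokens of each occurrence of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Count every token in the window except the target occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts.most_common(top)


# Invented sample text for illustration only.
SAMPLE = "The cruel overseer beat us. The harsh overseer whipped us. The cruel master sold us."
suggestions = nearby_words(SAMPLE, "overseer", window=1, top=10)
```

With a window of one token, the suggestions for "overseer" in this sample are the adjacent words, which include both "cruel" and "harsh": the kind of alternative wording a keyword search alone would miss.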
The complete works of Shakespeare
• 42 documents -- plays and sonnet collections
• 1589 -- 1612
Analyze Hamlet.
How does the portrayal of men and women in Shakespeare
change in different circumstances?
(CHI ’12 works in progress)
English 203: Hamlet in the Humanities Lab, Spring 2012, University of Calgary
Results
• WordSeer 1.5 being successfully used (so far) in Hamlet class
• How does the relationship between Hamlet and his mother change over the course of the play?
• How does Act 1 portray the character of Horatio?
• Investigated changing language use around men and women
• Unknowingly replicated and extended previous findings by other Shakespeare scholars
Summary
• We suggest enhancing NLP research with sensemaking tools to help with hypothesis formation
• Midway between reading the text and blind statistics.
• Helps with hypothesis formulation, verification, and refinement.
• This is clearly useful for literature analysis.
• It remains to be seen if it can help with social media analysis.