Doing Digital History
Heuristics, Hermeneutics, and Source Criticism in a Digital Age
Melvin Wevers
@melvinwevers
www.translantis.nl
February 25, 2015 UCLDH Seminar - UCL
Overview
• Consuming America: the role of the United States as a reference culture in Dutch consumer society between 1890 and 1940
• Digital Humanities Cycle: heuristics, hermeneutics, corpus creation, source criticism, and tool criticism
• Methods: Full-text search, N-gram analysis, Topic modeling, Named entity recognition
What is a Reference Culture?
• Reference culture is an analytical concept to study geopolitical formations in a transnational context.
• Reference cultures serve as a model for other countries, e.g. the Byzantine Empire, 19th-century England, the Caliphate.
• Twentieth century: The American Century - Henry Luce
• Culture of references > imagined, symbolic, and metaphysical ‘America’
• Focus on the receiving end within a wider global context of globalization, Americanization and modernization (cf. Rob Kroes, John Muthyala)
How do we research Reference Cultures?
• Reference cultures emerge in collective discussions on specific products, ideas, and practices
• Against a background of cultural, technological, and economic developments
• In other words, a reference culture is an imagined, symbolic ‘America’ grounded within actual material conditions and practices
• The project aims to use digital technologies to analyze reference cultures in digitized Dutch newspapers between 1890 and 1990
Case Study: Cigarettes 1890-1940
• Cultural icon of American entrepreneurialism
• “Product that defined America” (Allan Brandt)
• production, distribution, and consumption
• How was its symbolic connotation perceived outside of the United States?
• Geographical connotation
• Debates on technological changes: taste and packaging
• Changing consumer behavior > consumerist abundance, female smokers
Geographical connotations of the cigarette - RQ
• How have the geographic connotations of the cigarette shifted between 1890 and 1940?
• How has this informed the idea of America? In other words, the performance of America as a reference culture?
Is this Big Data Research?
The change of scale has led to a change of state. The quantitative change has led to a qualitative one. […] [B]ig data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value
Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston 2013) 13.
Distant reading
‘Distant reading’, I have once called this type of approach; where distance is however not an obstacle, but a specific form of knowledge; fewer elements, hence a sharper sense of their overall interconnection. Shapes, relations, structures. Forms. Models.
Franco Moretti, Graphs, Maps, Trees. Abstract Models for a Literary History (London and New York 2005) 1.
• The Dutch newspaper archive is not really big data (biggish data?)
• Do we want to do big data research? Big patterns? Or do we aim for more extensive searching and more complexity in our sources?
• “[D]ata does not always have to be used as evidence, but can be simply for discovering and framing research questions. […] [P]laying with data – in all its formats and forms – is more important than ever.”
Frederick W. Gibbs and Trevor J. Owens, ‘The Hermeneutics of Data and Historical Writing’, in: Kristen Nawrotzki and Jack Dougherty (eds.), Writing History in the Digital Age (Ann Arbor, MI: University of Michigan Press, 2013).
• Exploratory searching as an advance corrective against the threat of essentialism and determinism [important in case of history/Americanization]
How Big is Big Data?
Digital Humanities Cycle
Heuristics
Corpus Selection
Hermeneutics
Full-text search, text analytics, topic modeling, named entity recognition, n-gram analysis
Tool Criticism
Source criticism
Heuristics: Full-text search
• Large amounts of data
• Digital archives
• International data
• Ability to search full-text
Delpher.nl
Heuristics using metadata
“At least for research, digital history can be defined as the theory and practice of bringing technology to bear on the abundance we now confront.”
‘Interchange: The Promise of Digital History’, The Journal of American History 95 (2008) 452-491, 454.
New Way of Doing History
Bob Nicholson “The Digital Turn” Media History (2013)
Source Criticism
[T]he problem is that while we think we are searching newspapers, we are actually searching markedly inaccurate representations of text, hidden behind a poor quality image. And even more damning, by citing a hard copy of the original we are then refusing to document our research path, making it difficult for others to critique the process.
Tim Hitchcock, ‘Confronting the Digital: Or How Academic History Writing Lost the Plot’, Cultural and Social History 10 (2013) 9-23.
Corpus Selection
• Corpus Selection
• API (JSON)
• Texcavator
• Cleaning up the Corpus: Python/OpenRefine/NLTK
• Corpus analysis / Corpus Linguistics
• Topic modeling
• Named entity recognition (NER)
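The clean-up and n-gram steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Texcavator or NLTK pipeline itself: the two article strings are invented, and `clean_ocr` and `ngrams` are hypothetical helper names.

```python
import re
from collections import Counter

def clean_ocr(text):
    """Lowercase the text, rejoin words split across line breaks, keep letter runs only."""
    text = text.lower()
    # Rejoin "siga-\nret" into "sigaret". This would also merge genuine
    # hyphenated compounds that happen to break at a line end.
    text = re.sub(r"-\s*\n\s*", "", text)
    # Runs of Unicode letters; digits and punctuation are dropped.
    return re.findall(r"[^\W\d_]+", text)

def ngrams(tokens, n):
    """Return all consecutive n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Invented example texts standing in for OCR'd newspaper articles.
articles = [
    "De Virginia siga-\nret is de beste sigaret!",
    "Rookt de Egyptische sigaret, 5 cent.",
]

bigram_counts = Counter()
for article in articles:
    bigram_counts.update(ngrams(clean_ocr(article), 2))

for bigram, count in bigram_counts.most_common(5):
    print(" ".join(bigram), count)
```

On a real corpus the same counts would be computed per year or per period, so that shifts in word combinations become visible over time.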
Tool Criticism
• Tools as instrument (STS)
• Bruno Latour and Steve Woolgar - Laboratory Life: The Construction of Scientific Facts (1986)
• Steven Shapin - Never Pure: Historical Studies of Science as if It Was Produced by People with Bodies, Situated in Time, Space, Culture, and Society, and Struggling for Credibility and Authority (2010)
• Explain how the tool works
• How do we define whether the tool works?
Topic Modeling
• Method (MALLET) to discover latent structures within a collection of texts
• Words acquire meaning through context -> Topic Modeling
• Contextual comparisons between different periods or corpora
• Main goal: discover events, users, and objects > Topics > Hidden debates
• In other words: not to prove stuff, but to find more stuff
1924-1929 key topics advertisements
• sigaret virginia whip chief ardath london goud cigarettes kwaliteit olympia kurk nummer rook beste gezondheid zoo zulk vooraan punten
• sigaret sigaar pijp beter smakelijker wybert amersfoort virginia houbaer tabletten rooken oudste prijs magnums hollands nasmaak cent nemen noch
• sigaret nieuwe onze tabakken vervaardigd doosje import vraagt rookt smaak betere cents sigaretten turksche fijne kwaliteit edelste uwe proef
• sigaret club sigaretten gij army camel tabak cent sopla camels wereld kwaliteit prijs gemaakt virginia sigaren eerst rookt keel
• sigaret adamas egyptische mildste tegenwoordig stuks coupon coupons mavrides fijnste sigaretten cts geschenken gratis ste naam fijn slechts omar
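One way to read these topics against the research question is to check which geographic marker words each topic contains. The sketch below does this for the five keyword lists above. The marker list and its English glosses are assumptions chosen for illustration, not part of the MALLET output, and `markers_in` is a hypothetical helper name.

```python
from collections import Counter

# The five 1924-1929 advertisement topics, copied from the list above.
topics = [
    "sigaret virginia whip chief ardath london goud cigarettes kwaliteit "
    "olympia kurk nummer rook beste gezondheid zoo zulk vooraan punten",
    "sigaret sigaar pijp beter smakelijker wybert amersfoort virginia houbaer "
    "tabletten rooken oudste prijs magnums hollands nasmaak cent nemen noch",
    "sigaret nieuwe onze tabakken vervaardigd doosje import vraagt rookt smaak "
    "betere cents sigaretten turksche fijne kwaliteit edelste uwe proef",
    "sigaret club sigaretten gij army camel tabak cent sopla camels wereld "
    "kwaliteit prijs gemaakt virginia sigaren eerst rookt keel",
    "sigaret adamas egyptische mildste tegenwoordig stuks coupon coupons "
    "mavrides fijnste sigaretten cts geschenken gratis ste naam fijn slechts omar",
]

# Assumed marker words with English glosses (illustrative, not exhaustive).
geo_markers = {
    "virginia": "Virginia",
    "london": "London",
    "turksche": "Turkish",
    "egyptische": "Egyptian",
    "hollands": "Dutch",
}

def markers_in(topic):
    """Return the glosses of the marker words in a topic, in order of appearance."""
    return [geo_markers[word] for word in topic.split() if word in geo_markers]

per_topic = [markers_in(topic) for topic in topics]
overall = Counter(gloss for found in per_topic for gloss in found)

for number, found in enumerate(per_topic, start=1):
    print(f"topic {number}: {', '.join(found) or 'none'}")
```

A count like this only points to topics worth reading closely; what a word such as "virginia" connotes in a given advertisement still has to be established from the texts themselves.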
Named Entity Recognition
• Stanford NER is a tool that automatically detects specific entities within texts
• Locations
• Persons
• Organizations
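Once a tagger has labeled each token, counting entities is straightforward. The sketch below assumes the tagger output is available as (token, tag) pairs with the labels LOCATION, PERSON, ORGANIZATION, and O for everything else. The tagged sentence is invented for illustration and is not actual Stanford NER output, and `count_entities` is a hypothetical helper name.

```python
from collections import Counter
from itertools import groupby

def count_entities(tagged_tokens, wanted="LOCATION"):
    """Merge consecutive tokens sharing the wanted tag and count each merged name."""
    counts = Counter()
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == wanted:
            # Note: two different entities that stand directly next to each
            # other with the same tag would be merged into one name here.
            counts[" ".join(token for token, _ in group)] += 1
    return counts

# Invented (token, tag) pairs in the shape assumed above.
tagged = [
    ("Sigaretten", "O"), ("uit", "O"),
    ("New", "LOCATION"), ("York", "LOCATION"),
    ("en", "O"), ("Londen", "LOCATION"), (".", "O"),
    ("Ook", "O"), ("in", "O"),
    ("New", "LOCATION"), ("York", "LOCATION"),
    ("verkrijgbaar", "O"),
]

location_counts = count_entities(tagged)
print(location_counts.most_common())
```

Variant spellings and OCR errors (for example "Londen" and "London") end up as separate entries, so counts of this kind usually need manual normalization before they are compared.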
Named Entity Recognition - output 1890-1920
Foreign Locations (N>20) | Dutch Locations (N>20) | American Locations
The United Kingdom / London (151 / 84) | Rotterdam (496) | America (70)
Germany / Berlin / Hamburg (146 / 81 / 22) | Amsterdam (177) | New York (34)
France / Paris (139 / 154) | Tilburg (107) | Washington (11)
Russia (102) | Groningen (94) | United States (Vereenigde Staten) (11)
The United States / America / New York ( / 70 / 34) | Breda (64) | Chicago (3)
Belgium / Brussels / Antwerp ( / 57 / 46) | Haarlem (48) | Virginia (3)
Austria / Vienna (40 / 21) | Utrecht (43) | North America (Noord-Amerika) (3)
Turkey (39) | Arnhem (35) |
Holland (39) | The Hague (26) |
Europe (36) | Leeuwarden (24) |
Spain (33) | Maastricht (26) |
 | Leiden (24) |
 | Friesland (21) |
Good ‘Ole Close Reading
• Don’t say goodbye to your traditional methods or theories
• Country-of-Origin effect (branding theory)
• Theories of modernization/globalization/Americanization
• Discourse analysis > Foucault
• Conceptual history > Braudel, Koselleck, Armitage [Big history manifesto]
• DH is too often about the tools or the methods, but these can be bridged with theoretical and analytical models into a critical digital humanities [cf. David Berry, Alan Liu]
Conclusion (I): Geographical connotations
• Country of origin effect
• From actual locations to symbolic references
• Shift of geographical connotation of cigarette
• Oriental, British, European, American
• Detached from United States / United States as floating signifier
Conclusion (II): Collateral damage
• The output provided me with topics to further research in other chapters > data-driven
• These are provided by the source material and not only by secondary literature
• Technologies of Taste
• Consumer Behavior