Doing Digital History: Heuristics, Hermeneutics, and Source Criticism in a Digital Age

Melvin Wevers

@melvinwevers

www.translantis.nl

February 25, 2015 UCLDH Seminar - UCL

Overview

• Consuming America: the role of the United States as a reference culture in Dutch consumer society between 1890 and 1940

• Digital Humanities Cycle: heuristics, hermeneutics, corpus creation, source criticism, and tool criticism

• Methods: full-text search, n-gram analysis, topic modeling, named entity recognition

What is a Reference Culture?

• Reference culture is an analytical concept to study geopolitical formations in a transnational context.

• Reference cultures serve as models for other countries, e.g. the Byzantine Empire, nineteenth-century England, the Caliphate

• Twentieth century: ‘The American Century’ (Henry Luce)

• Culture of references > imagined, symbolic, and metaphysical ‘America’

• Focus on the receiving end, within the wider context of globalization, Americanization, and modernization (cf. Rob Kroes, John Muthyala)

How do we research Reference Cultures?

• Reference cultures emerge in collective discussions on specific products, ideas, and practices

• Against a background of cultural, technological, and economic developments

• In other words, a reference culture is an imagined, symbolic ‘America’ grounded within actual material conditions and practices

• The project aims to use digital technologies to analyze reference cultures in digitized Dutch newspapers between 1890 and 1990

Case Study: Cigarettes 1890-1940

• Cultural icon of American entrepreneurialism

• “Product that defined America” (Allan Brandt)

• Production, distribution, and consumption

• How was its symbolic connotation perceived outside the United States?

• Geographical connotation

• Debates on technological changes: taste and packaging

• Changing consumer behavior > consumerist abundance, female smokers

Geographical connotations of the cigarette: research questions

• How have the geographical connotations of the cigarette shifted between 1890 and 1940?

• How has this shift informed the idea of America, that is, the performance of America as a reference culture?

Is this Big Data Research?

The change of scale has led to a change of state. The quantitative change has led to a qualitative one. […]

[B]ig data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value

Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston 2013) 13.

Distant reading

‘Distant reading’, I have once called this type of approach; where distance is however not an obstacle, but a specific form of knowledge; fewer elements, hence a sharper sense of their overall interconnection. Shapes, relations, structures. Forms. Models.

Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (London and New York 2005) 1.

• The Dutch newspaper archive is not really big data (biggish data?)

• Do we want to do big data research, looking for big patterns? Or do we aim for more extensive searching and more complexity in our sources?

• “[D]ata does not always have to be used as evidence, but can be simply for discovering and framing research questions. […] [P]laying with data – in all its formats and forms – is more important than ever.”

  Frederick W. Gibbs and Trevor J. Owens, ‘The Hermeneutics of Data and Historical Writing’, in: Kristen Nawrotzki and Jack Dougherty (eds.), Writing History in the Digital Age (Ann Arbor, MI: University of Michigan Press, 2013).

• Exploratory searching as an advance corrective against essentialism and determinism (important when writing the history of Americanization)

How Big is Big Data?

Digital Humanities Cycle

• Heuristics

• Corpus selection

• Hermeneutics: full-text search, text analytics, topic modeling, named entity recognition, n-gram analysis

• Tool criticism

• Source criticism

Heuristics: Full-text search

• Large amounts of data

• Digital archives

• International data

• Ability to search full-text
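To make full-text searching concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It assumes the OCR text of each newspaper item is already available as a plain string; the `kwic` function, the document identifiers, and the Dutch snippets are invented for illustration and say nothing about how Delpher implements its search.

```python
import re

def kwic(documents, term, width=30):
    """Return (doc_id, left, match, right) for every whole-word match of `term`.

    Matching is case-insensitive; `width` is how many characters of
    context are kept on each side of the match.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for doc_id, text in documents.items():
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((doc_id, left, m.group(0), right))
    return hits

# Hypothetical OCR snippets keyed by invented identifiers.
docs = {
    "ad-1925-001": "Rookt de echte Amerikaansche sigaret, mild en fijn.",
    "ad-1925-002": "Turksche sigaretten van de edelste tabakken.",
}

hits = kwic(docs, "sigaret")
for doc_id, left, match, right in hits:
    print(f"{doc_id}: ...{left}[{match}]{right}...")
```

Note that the whole-word pattern does not match the plural "sigaretten" in the second snippet. Deciding how to treat such inflected forms is itself a heuristic choice that shapes which sources are found.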

Delpher.nl

Heuristics using metadata

“At least for research, digital history can be defined as the theory and practice of bringing technology to bear on the abundance we now confront.”

‘Interchange: The Promise of Digital History’, The Journal of American History 95 (2008) 452-491, 454.
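Heuristics using metadata can be sketched as narrowing the archive on fields such as publication date and item type before any text is read, for instance to isolate advertisements from a given period. The `Record` layout below (`title`, `published`, `kind`) is a hypothetical schema, not the metadata model of Delpher or any other archive.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    # Hypothetical metadata fields; a real archive's schema will differ.
    title: str
    published: date
    kind: str  # e.g. "advertisement" or "article"

def select(records, start, end, kind):
    """Keep records of the given kind whose publication date lies in [start, end]."""
    return [r for r in records if r.kind == kind and start <= r.published <= end]

# Invented records for illustration only.
records = [
    Record("Algemeen Handelsblad", date(1926, 3, 1), "advertisement"),
    Record("Algemeen Handelsblad", date(1926, 3, 1), "article"),
    Record("De Telegraaf", date(1918, 7, 4), "advertisement"),
]

ads_1924_1929 = select(records, date(1924, 1, 1), date(1929, 12, 31), "advertisement")
print(len(ads_1924_1929))
```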

New Way of Doing History

Bob Nicholson, ‘The Digital Turn’, Media History (2013)

Source Criticism

[T]he problem is that while we think we are searching newspapers, we are actually searching markedly inaccurate representations of text, hidden behind a poor quality image. And even more damning, by citing a hard copy of the original we are then refusing to document our research path, making it difficult for others to critique the process.

Tim Hitchcock, ‘Confronting the Digital: Or How Academic History Writing Lost the Plot’, Cultural and Social History 10 (2013) 9-23.
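One practical consequence of Hitchcock's point is that a query only finds text the OCR rendered correctly. A hedged way to gauge this is to look for near-miss spellings of a search term in the corpus vocabulary. The sketch below uses Python's standard `difflib.get_close_matches`; the sample texts are invented, and the similarity cutoff of 0.8 is an arbitrary assumption that would need tuning for a real corpus.

```python
import difflib
import re
from collections import Counter

def ocr_variants(texts, term, cutoff=0.8):
    """Return vocabulary words that closely resemble `term` but differ from it.

    Similarity is difflib's SequenceMatcher ratio; `cutoff` is an
    assumed threshold, not a calibrated one.
    """
    vocab = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    candidates = difflib.get_close_matches(term, vocab.keys(), n=10, cutoff=cutoff)
    # Drop the exact term itself; keep only likely misrecognitions.
    return {w: vocab[w] for w in candidates if w != term}

# Invented OCR output containing two plausible misreadings of "sigaret".
texts = ["de beste sigaret", "een s1garet van kwaliteit",
         "sigarct virginia", "sigaar en pijp"]
variants = ocr_variants(texts, "sigaret")
print(variants)
```

Variants found this way can be added to a query, which also makes the research path more explicit and open to critique.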

N-gram analysis

• http://kbkranten.politicalmashup.nl
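An n-gram viewer such as the one above plots how often a word or phrase occurs over time. The core computation can be sketched as counting n-grams per year and dividing by the total number of n-grams of that length in that year. The `(year, text)` input format and the sample sentences are assumptions; this is not the viewer's actual implementation.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Yield consecutive n-token tuples."""
    return zip(*(tokens[i:] for i in range(n)))

def relative_frequency(docs, phrase):
    """Per-year share of n-grams equal to `phrase` (n = number of words in phrase)."""
    target = tuple(phrase.lower().split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in docs:
        tokens = re.findall(r"\w+", text.lower())
        for gram in ngrams(tokens, n):
            totals[year] += 1
            if gram == target:
                hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Invented sentences for illustration only.
docs = [
    (1900, "Turksche sigaretten en Egyptische sigaretten"),
    (1930, "Amerikaansche sigaretten zijn mild"),
    (1930, "de Amerikaansche sigaretten"),
]
result = relative_frequency(docs, "Amerikaansche sigaretten")
print(result)
```

Normalizing by yearly totals matters because the number of digitized pages varies strongly between years; raw counts would partly reflect digitization rather than usage.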

Corpus Selection

• API (JSON)

• Texcavator

• Cleaning up the Corpus: Python/OpenRefine/NLTK

• Corpus analysis / Corpus Linguistics

• Topic modeling

• Named entity recognition (NER)
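The corpus-building steps listed above (retrieving records as JSON, then cleaning the text) might look roughly like the following. The JSON structure with `records`, `date`, and `text` keys is hypothetical and is not the schema of the KB API or Texcavator; the stopword set is a tiny illustrative stand-in for a full Dutch list such as the one in NLTK's stopwords corpus.

```python
import json
import re

# Tiny illustrative subset of Dutch stopwords (an assumption, not a real list).
STOPWORDS = {"de", "het", "een", "en", "van", "in", "is", "zijn"}

def clean(text):
    """Lowercase, rejoin line-break hyphenation, keep letter-only tokens, drop stopwords."""
    text = text.lower()
    # Assumes a hyphen at a line end is OCR line-break hyphenation.
    text = re.sub(r"-\s*\n\s*", "", text)
    tokens = re.findall(r"[^\W\d_]+", text)  # letters only: no digits or punctuation
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]

def load_corpus(raw_json):
    """Parse a hypothetical API response into (date, tokens) pairs."""
    payload = json.loads(raw_json)
    return [(rec["date"], clean(rec["text"])) for rec in payload["records"]]

# Invented response; "\\n" becomes a newline once the JSON is decoded.
response = '{"records": [{"date": "1926-03-01", "text": "De beste Virginia sigaret-\\nten van 1926!"}]}'
corpus = load_corpus(response)
print(corpus)
```

Each cleaning rule is an interpretive decision (dropping digits, for example, also drops prices), so it belongs in the documented research path.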

Tool Criticism

• Tools as instruments (STS)

• Bruno Latour and Steve Woolgar, Laboratory Life: The Construction of Scientific Facts (1986)

• Steven Shapin, Never Pure: Historical Studies of Science as if It Was Produced by People with Bodies, Situated in Time, Space, Culture, and Society, and Struggling for Credibility and Authority (2010)

• Explain how the tool works

• How do we define whether the tool works?

Topic Modeling

• A method for discovering latent structures within a collection of texts (used here via MALLET)

• Words acquire meaning through context -> Topic Modeling

• Contextual comparisons between different periods or corpora

• Main goal: discover events, users, and objects > Topics > Hidden debates

• In other words: not to prove stuff, but to find more stuff
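MALLET fits Latent Dirichlet Allocation (LDA) topic models by Gibbs sampling. To show what "discovering latent structures" means mechanically, here is a minimal collapsed Gibbs sampler for LDA in pure Python. It is a teaching sketch, not MALLET's implementation, which differs in many details (for example its faster sampling scheme and optional hyperparameter optimization). The values of `alpha`, `beta`, the iteration count, and the toy documents are all assumptions.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, n_top=5, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the top words per topic.

    docs: list of token lists; k: number of topics. alpha and beta are the
    symmetric Dirichlet priors on doc-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})   # vocabulary size
    nkw = [defaultdict(int) for _ in range(k)]  # topic -> word counts
    ndk = [[0] * k for _ in docs]               # doc -> topic counts
    nk = [0] * k                                # tokens assigned to each topic

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            nkw[t][w] += 1
            ndk[d][t] += 1
            nk[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token's current assignment from the counts ...
                nkw[t][w] -= 1
                ndk[d][t] -= 1
                nk[t] -= 1
                # ... and resample from the full conditional
                # p(z = j | rest) ∝ (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta).
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                nkw[t][w] += 1
                ndk[d][t] += 1
                nk[t] += 1

    # Describe each topic by its most frequent words (zero counts excluded).
    return [sorted((w for w in nkw[j] if nkw[j][w] > 0), key=lambda w: -nkw[j][w])[:n_top]
            for j in range(k)]

# Invented toy documents.
docs = [
    "virginia tabak amerika camel virginia".split(),
    "camel virginia amerika tabak".split(),
    "turksche egyptische orient tabak".split(),
    "egyptische turksche orient".split(),
]
topics = lda_gibbs(docs, k=2)
for j, words in enumerate(topics):
    print(j, " ".join(words))
```

The output is a list of co-occurring words per topic, like the advertisement topics below; interpreting what a topic is about remains a matter of close reading.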

Key topics in advertisements, 1924-1929

• sigaret virginia whip chief ardath london goud cigarettes kwaliteit olympia kurk nummer rook beste gezondheid zoo zulk vooraan punten

• sigaret sigaar pijp beter smakelijker wybert amersfoort virginia houbaer tabletten rooken oudste prijs magnums hollands nasmaak cent nemen noch

• sigaret nieuwe onze tabakken vervaardigd doosje import vraagt rookt smaak betere cents sigaretten turksche fijne kwaliteit edelste uwe proef

• sigaret club sigaretten gij army camel tabak cent sopla camels wereld kwaliteit prijs gemaakt virginia sigaren eerst rookt keel

• sigaret adamas egyptische mildste tegenwoordig stuks coupon coupons mavrides fijnste sigaretten cts geschenken gratis ste naam fijn slechts omar

Named Entity Recognition

• Stanford NER is a tool for automatically detecting named entities in texts, such as:

• Locations

• Persons

• Organizations
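Stanford NER's command-line classifier can write its results inline as `word/TAG` tokens, with tags such as LOCATION, PERSON, ORGANIZATION, and O (no entity) in its three-class models. Assuming output in that form, the sketch below merges consecutive tokens carrying the wanted tag into one entity (so `New/LOCATION York/LOCATION` becomes "New York") and counts them. The merging rule is a simplification that would fuse two adjacent but distinct places, and the tagged sentence is invented.

```python
from collections import Counter

def entities(tagged, wanted="LOCATION"):
    """Collect multi-token entities of type `wanted` from word/TAG output."""
    found, current = [], []
    for token in tagged.split():
        # rpartition keeps any "/" inside the word itself.
        word, _, tag = token.rpartition("/")
        if tag == wanted:
            current.append(word)
        elif current:
            found.append(" ".join(current))
            current = []
    if current:
        found.append(" ".join(current))
    return found

# Invented tagger output.
tagged = ("Sigaretten/O uit/O New/LOCATION York/LOCATION en/O Londen/LOCATION ,/O "
          "verkrijgbaar/O te/O Rotterdam/LOCATION en/O Londen/LOCATION ./O")
counts = Counter(entities(tagged))
print(counts.most_common())
```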

Named Entity Recognition: output 1890-1920

Foreign locations (N>20)

• The United Kingdom / London (151 / 84)

• Germany / Berlin / Hamburg (146 / 81 / 22)

• France / Paris (139 / 154)

• Russia (102)

• The United States / America / New York ( / 70 / 34)

• Belgium / Bruxelles / Antwerp ( / 57 / 46)

• Austria / Vienna (40 / 21)

• Turkey (39)

• Holland (39)

• Europe (36)

• Spain (33)

Dutch locations (N>20)

• Rotterdam (496)

• Amsterdam (177)

• Tilburg (107)

• Groningen (94)

• Breda (64)

• Haarlem (48)

• Utrecht (43)

• Arnhem (35)

• The Hague (26)

• Leeuwarden (24)

• Maastricht (26)

• Leiden (24)

• Friesland (21)

American locations

• America (70)

• New York (34)

• Washington (11)

• United States (Vereenigde Staten) (11)

• Chicago (3)

• Virginia (3)

• North America (Noord-Amerika) (3)
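The foreign and Dutch columns keep only locations above a frequency threshold (N>20). That filtering step, preceded by an optional merge of variant names for the same place (for example Dutch and English spellings), can be sketched as follows. The `VARIANTS` mapping and the raw counts are invented for illustration and are not the data behind the table.

```python
from collections import Counter

# Hypothetical mapping from variant spellings to one canonical name.
VARIANTS = {"Londen": "London", "Parijs": "Paris", "Vereenigde Staten": "United States"}

def normalize(counts, variants=VARIANTS):
    """Merge counts of variant names under their canonical form."""
    merged = Counter()
    for name, n in counts.items():
        merged[variants.get(name, name)] += n
    return merged

def above_threshold(counts, threshold=20):
    """Keep entities occurring more than `threshold` times, most frequent first."""
    return [(name, n) for name, n in counts.most_common() if n > threshold]

# Invented counts, for illustration only.
raw = Counter({"Londen": 40, "London": 15, "Parijs": 25,
               "Vereenigde Staten": 9, "United States": 4, "Chicago": 2})
print(above_threshold(normalize(raw)))
```

The American column carries no N>20 label and includes entries with counts as low as 3, so a cutoff of this kind applies only to the first two columns.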

Good Ol’ Close Reading

• Don’t say goodbye to your traditional methods or theories

• Country-of-Origin effect (branding theory)

• Theories of modernization/globalization/Americanization

• Discourse analysis > Foucault

• Conceptual history > Braudel, Koselleck, Armitage [The History Manifesto]

• DH is too often about the tools or the methods, but it can be bridged with theoretical and analytical models into a critical digital humanities [cf. David Berry, Alan Liu]

Conclusion (I): Geographical connotations

• Country-of-Origin effect

• From actual locations to symbolic references

• Shift of geographical connotation of cigarette

• Oriental, British, European, American

• Detached from the United States: the United States as a floating signifier

Conclusion (II): Collateral damage

• The output provided me with topics for further research in other chapters > data-driven

• These topics come from the source material and not only from the secondary literature

• Technologies of Taste

• Consumer Behavior