Data for the Humanities
February 21, 2017
Rafia Mirza, Digital Humanities [email protected] @librarianrafia
Peace Ossom Williamson, Director of Research Data Services [email protected] @123POW
Learning Outcomes
• Understand the use of data in answering humanities research questions
• Understand descriptive metadata and the rationale for its use
• Recognize areas of potential bias and ambiguous or misleading representation in reporting
What are data?
“All content in digital formats can be characterized as structured or unstructured data.”
Introduction to Digital Humanities: Concepts, Methods, and Tutorials
Examples:
• Audio
• Notes
• Geospatial
• Textual
Data are more than numbers
https://www.lib.umn.edu/datamanagement/whatdata
What is data literacy?
the ability to read, create, utilize, communicate, and criticize data.
Data Literacy
data quality
accessibility, usability, and understandability on the basis of context, provenance, and metadata
Data Literacy
data structure
of different objects in a way that works to evaluate developing hypotheses
Data Literacy
• Recognize research potential
• Be aware of research methods
• Understand context and provenance
Humanities Data Literacy
“Humanists have data, and they need data skills.”
Digital Humanities Data Curation
Data in the Humanities
Types of Humanities Data
• Scholarly editions
• Text corpora
• Text with markup
• Thematic research collections
• Data with accompanying analysis or annotation
• Finding aids and other information maps, such as bibliographies
Digital Humanities Data Curation Introduction
Big Data Digital Humanities vs. Small Data Digital Humanities
• “Research in Big Data Digital Humanities focuses on large or dense cultural datasets, which call for new processing and interpretation methods”
• “..Small Data Digital Humanities regroup more focused works that do not use massive data processing..”
• A map for big data research in digital humanities, Frédéric Kaplan
1. research the context: know the data about the data (so meta!)
How to understand data
Data versus Metadata
Big? Smart? Clean? Messy? Data in the Humanities, Christof Schöch
Metadata Metadata Metadata Metadata
data data data data
data data data data
data data data data
data data data data
About this dataset:
Title: Metadata
Date Created: Metadata
Creator: Metadata
Methods Used: Metadata
2. research who the data is about
How to understand data
What are historical contexts around their language and style?
A note on data ethics.
Zine Librarians Code of Ethics
• “Zines are not like mass-distributed books. They are often self-published and self-distributed, and sometimes printed in very small runs, intended for a small audience. In addition, perzines are by definition “personal”, and zinesters may feel different about having their zines distributed in print than they would about having them openly available on the internet or print. This can be especially true in the case of “historical” zines in library collections — for example, a teen girl writing a zine for her close friends in 1994 may not want her zine distributed online or in print 20 years later.”
• Via Zinelibraries
Ethics
• Choosing tools:
• Omeka CMS vs Mukurtu CMS
• Collecting data:
• Boston College Oral Histories
3. investigate the source
How to understand data
Recognizing uncertainty and bias
Data on killings in the Syrian conflict.
https://responsibledata.io/reflection-stories/uncertainty-statistics/
Let’s investigate the source…
Recognizing uncertainty and bias
Sources include
• Syrian government
• Syrian Center for Statistics and Research
• Syrian Network for Human Rights
• Syrian Observatory for Human Rights
and many more.
https://responsibledata.io/reflection-stories/uncertainty-statistics/
there are lots of human decisions that go into creating these statistics
without knowing how these deaths have been coded, it’s difficult to trust in the figures
4. highlight un/common data entries to gain rough insights
How to understand data
Descriptive analysis
i.e., description of the data from a sample
Quick descriptive statistics
• frequency
• rank from lowest to highest
• average (mean, median, mode)
• variability
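The four quick statistics listed above can be sketched with Python's standard library. The sample values are made up for illustration (a hypothetical count of similes in nine scenes) and do not come from any real text.

```python
from collections import Counter
import statistics

# Hypothetical sample: similes counted in each of nine scenes (invented values).
simile_counts = [3, 7, 2, 7, 5, 1, 7, 4, 9]

# Frequency: how often each value occurs.
frequency = Counter(simile_counts)

# Rank from lowest to highest.
ranked = sorted(simile_counts)

# Average: mean, median, and mode.
mean = statistics.mean(simile_counts)
median = statistics.median(simile_counts)
mode = statistics.mode(simile_counts)

# Variability: range and sample standard deviation.
value_range = max(simile_counts) - min(simile_counts)
std_dev = statistics.stdev(simile_counts)

print(frequency.most_common(1), ranked, mean, median, mode, value_range, round(std_dev, 2))
```

Looking at the mean, median, and mode side by side is a quick way to spot a skewed sample: here the mode (7) sits above the mean and median (both 5).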
Bivariate descriptive statistics
fancy way of saying we are looking at two variables at once
            Hamlet   Macbeth   Total
Similes         50         9      59
Metaphors       20        38      58
Total           70        47     117
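A minimal sketch of how the table's margins can be checked, and how the two variables (play and type of figure) can be compared. It assumes the third column holds row totals, since 50 + 9 = 59 and 20 + 38 = 58; the variable names are illustrative.

```python
# Counts copied from the table above; the third column is read as a row total (assumption).
counts = {
    "Similes": {"Hamlet": 50, "Macbeth": 9},
    "Metaphors": {"Hamlet": 20, "Macbeth": 38},
}
plays = ["Hamlet", "Macbeth"]

# Row totals: one per type of figure.
row_totals = {device: sum(by_play.values()) for device, by_play in counts.items()}

# Column totals: one per play.
column_totals = {play: sum(counts[device][play] for device in counts) for play in plays}

grand_total = sum(row_totals.values())

# Share of each play's counted figures that are similes.
simile_share = {play: counts["Similes"][play] / column_totals[play] for play in plays}

for play in plays:
    print(f"{play}: {simile_share[play]:.0%} similes")
```

Comparing shares rather than raw counts matters because the plays contribute different totals (70 versus 47).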
Evaluating Comparison Methods
Correlation
most common way to describe a relationship between two measures
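As one illustration, the Pearson correlation coefficient can be computed by hand as below. The slides do not say which correlation measure is meant, so Pearson's r is an assumption, and the two lists are invented values (text length in thousands of words and a metaphor count).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    if squares_x == 0 or squares_y == 0:
        raise ValueError("correlation is undefined when one measure never varies")
    return co_deviation / math.sqrt(squares_x * squares_y)

# Hypothetical measures for five texts (invented values).
lengths = [10, 20, 30, 40, 50]
metaphors = [12, 25, 31, 48, 52]

r = pearson_r(lengths, metaphors)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one. A strong correlation on its own does not show that one measure causes the other.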
Finding Data
What type of data are you looking for?
List of Data Repositories
DH Toychest: Data Collections and Datasets
• Texts: HathiTrust Digital Library
• Spatial or numeric datasets: Data.gov
• Images: British Library Images
• Hybrid data sets: Digital Public Library of America
Via
What if the dataset you need does not exist?
How to data
1. Determine what to say
2. Find/collect/create the data you need
3. Wrangle!
4. Clean!
5. Do it many more times.
ID      Religion      Income     Age   Q1    Q2   Q3
26371   Jewish        <$10K      19    Yes   6    20
26372   Atheist       $50-75K    24    -     4    21
26373   Catholic      $75-100K   56    Yes   3    21
26374   Withheld      $75-100K   33    No    6    21
26375   Pentecostal   withheld   49    Yes   8    20
26376   Jewish        $40-50K    29    Yes   5    19
26377   Catholic      $20-30K    37    No    4    22
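One possible cleaning pass over the rows above is sketched here. Treating "-", "Withheld", and "withheld" as missing values, and reading ID, Age, Q2, and Q3 as whole numbers, are assumptions made for illustration; a real project would record such decisions in its metadata.

```python
COLUMNS = ["ID", "Religion", "Income", "Age", "Q1", "Q2", "Q3"]

# Rows copied from the table above, kept as raw text.
raw_rows = [
    ("26371", "Jewish", "<$10K", "19", "Yes", "6", "20"),
    ("26372", "Atheist", "$50-75K", "24", "-", "4", "21"),
    ("26373", "Catholic", "$75-100K", "56", "Yes", "3", "21"),
    ("26374", "Withheld", "$75-100K", "33", "No", "6", "21"),
    ("26375", "Pentecostal", "withheld", "49", "Yes", "8", "20"),
    ("26376", "Jewish", "$40-50K", "29", "Yes", "5", "19"),
    ("26377", "Catholic", "$20-30K", "37", "No", "4", "22"),
]

# Assumption: these markers all mean "no usable answer".
MISSING_MARKERS = {"", "-", "withheld"}
NUMERIC_COLUMNS = ("ID", "Age", "Q2", "Q3")

def clean_value(value):
    """Strip whitespace and map missing-value markers (any capitalization) to None."""
    text = value.strip()
    return None if text.lower() in MISSING_MARKERS else text

def clean_row(row):
    """Turn one raw row into a dict with consistent missing values and integer fields."""
    record = {name: clean_value(value) for name, value in zip(COLUMNS, row)}
    for name in NUMERIC_COLUMNS:
        if record[name] is not None:
            record[name] = int(record[name])
    return record

records = [clean_row(row) for row in raw_rows]

# Count missing values per column to see where the gaps are.
missing_per_column = {
    name: sum(1 for record in records if record[name] is None) for name in COLUMNS
}
print(missing_per_column)
```

Collapsing "Withheld" and "-" into a single missing value loses the distinction between a refusal and a skipped question, which is itself a human decision that shapes the data.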
http://vita.had.co.nz/papers/tidy-data.pdf
Tidy Data
Most common problems
• Column headers are values, not variable names.
• Multiple variables are stored in one column.
• Variables are stored in both rows and columns.
• Multiple types of observational units are stored in the same table.
• A single observational unit is stored in multiple tables
http://vita.had.co.nz/papers/tidy-data.pdf
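The first problem in the list, column headers that are values rather than variable names, can be fixed by reshaping a wide table into a long one. The sketch below uses a hypothetical table with invented counts, and the helper name melt is illustrative rather than a reference to any particular library.

```python
# Hypothetical wide table: the income brackets are values, but they sit in the headers.
wide_rows = [
    {"religion": "Catholic", "<$10K": 4, "$10-20K": 7},
    {"religion": "Jewish", "<$10K": 2, "$10-20K": 3},
]

def melt(rows, id_column, variable_name, value_name):
    """Reshape wide rows into long rows: one row per (id, former header) pair."""
    long_rows = []
    for row in rows:
        for header, value in row.items():
            if header == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], variable_name: header, value_name: value}
            )
    return long_rows

tidy = melt(wide_rows, "religion", "income", "count")
for row in tidy:
    print(row)
```

After reshaping, every column is a variable (religion, income, count) and every row is one observation, which makes filtering and counting much simpler.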
if you torture data long enough,
it will confess to anything
How can a visualization be misleading?
What’s wrong?
A little less dramatic than you thought.
http://www.visualisingdata.com/2014/04/the-fine-line-between-confusion-and-deception/
https://thesyriacampaign.org/
Open Data: Things to Consider
http://www.slideshare.net/libereurope/humanities-data-literacy-student-perspective-on-digital-cultural-heritage-collections?qid=70bd86f2-10c5-43a6-b053-56d264ca28ab&v=&b=&from_search=1
Recommended Reading / Viewing
“Numbers are Only Human” – Brian Root
“Ethical Principles of Psychologists and Code of Conduct” – American Psychological Association
“On Not Looking: Ethics and Access in the Digital Humanities” – Kimberly Christen Withey
Upcoming Workshops and Events
library.uta.edu/scholcomm