Figures of the Many - Quantitative Concepts for Qualitative Thinking, Bernhard Rieder, Universiteit van Amsterdam, Mediastudies Department

Figures of the Many - Quantitative Concepts for Qualitative Thinking

DESCRIPTION

Slides for a talk given at the Big Data Symposium at Parnassos in Utrecht on April 25 2013.

Page 1: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Figures of the Many - Quantitative Concepts for Qualitative Thinking

Bernhard Rieder

Universiteit van Amsterdam

Mediastudies Department

Page 2: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Context

Terms like "big data", "computational social science", "digital humanities", "digital methods", etc. are receiving a lot of attention.

They point to a set of practices for knowledge production: data analysis, visualization, modeling, etc.

Instead of a totalizing search for a "logic" of data analysis, we could inquire into the vocabulary of analytical gestures that constitute the practice of data analysis.

A twofold approach to methods:

☉ Engagement, development, application => digital methods

☉ Conceptual, historical, and political analysis and critique => software studies

Page 3: Figures of the Many - Quantitative Concepts for Qualitative Thinking

This presentation

How do we talk about data? How do we analyze them? What is our frame of thought? How do we go further in terms of imagination, expressivity?

☉ 1 / Confronting "the many"

☉ 2 / Two kinds of mathematics

☉ Objects and their properties => Statistics

☉ Objects and their relations => Graph theory

Engage the theory of knowledge (epistemology) mobilized in data analysis, but through the actual techniques and not generalizing concepts.

Page 4: Figures of the Many - Quantitative Concepts for Qualitative Thinking

What styles of reasoning?

Hacking (1991) builds the concept of "style of reasoning" on A. C. Crombie’s (1994) "styles of scientific thinking":

☉ postulation and deduction

☉ experiment and empirical research

☉ reasoning by analogy

☉ ordering by comparison and taxonomy

☉ statistical analysis of regularities and probabilities

☉ genetic development

What kind of reasoning are we mobilizing in data analysis?

Is the history of styles of reasoning simply intellectual progress, or adaptation to a changing world, or co-constitutive of that world?

What is our world like?

Page 5: Figures of the Many - Quantitative Concepts for Qualitative Thinking
Page 6: Figures of the Many - Quantitative Concepts for Qualitative Thinking
Page 7: Figures of the Many - Quantitative Concepts for Qualitative Thinking

"It is hard to believe that we still have to absorb the same types of actors, the same number of entities, the same profiles of beings, and the same modes of existence into the same types of collectives as Comte, Durkheim, Weber, or Parson [sic], especially after science and technology have massively multiplied the participants to be cooked in the melting pot." (Latour 2005, 260)

Page 8: Figures of the Many - Quantitative Concepts for Qualitative Thinking

1 / The many

The proliferation of actors and facilitation of transversal connectivity have led to large and complex forms of socio-technical grouping and structuring.

Forms of organization take the shape of (multi-sided) markets based around technological platforms that facilitate transactions.

Social media use simple but flexible grammars of connectivity (combination of point to point and list forms), exchange, and aggregation that accommodate various practices and levels of scale.

The diversity of practices, contents, geographies, topologies, intensities, motivations, etc. makes it hard to generalize and theorize dynamics of use.

Page 9: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Platforms like Twitter boost opportunities for connectivity between various types of actors.

Page 10: Figures of the Many - Quantitative Concepts for Qualitative Thinking

At the same time, they produce detailed data traces that are highly centralized and searchable.

Page 11: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Quality / quantity

"One of my favorite fantasies is a dialogue between Mills and Lazarsfeld in which the former reads to the latter the first sentence of The Sociological Imagination: 'Nowadays men often feel that their private lives are a series of traps.' Lazarsfeld immediately replies: 'How many men, which men, how long have they felt this way, which aspects of their private lives bother them, do their public lives bother them, when do they feel free rather than trapped, what kinds of traps do they experience, etc., etc., etc.' If Mills succumbed, the two of them would have to apply to the National Institute of Mental Health for a million-dollar grant to check out and elaborate that first sentence. They would need a staff of hundreds, and when finished they would have written Americans View Their Mental Health rather than The Sociological Imagination, provided that they finished at all, and provided that either of them cared enough at the end to bother writing anything." (Maurice Stein, cit. in Gitlin 1978)

Theory vs. empiricism, macro vs. micro, qualitative vs. quantitative, inductive vs. deductive, associative vs. formalistic, etc.

The promise of data analysis tools, applied to exhaustive (and cheap) data, is to bridge the gap, to allow zooming, "quali-quanti" (Latour 2010).

Page 12: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Define: data

“facts and statistics collected together for reference or analysis. See also datum.

- Computing: the quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

- Philosophy: things known or assumed as facts, making the basis of reasoning or calculation.” (Oxford American Dictionary)

Reasoning (OAD): "think rationally", "use one's mind", "calculate", "make sense of", "come to the conclusion", "judge", "persuade", etc.

Reasoning as "giving reasons" – what counts as a good reason? What counts as a good argument? As a proof? What is "good" knowledge?

Reasoning as a series of techniques, e.g. science, engineering, etc.

Page 13: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Why does the astronaut step into the space shuttle?

Page 14: Figures of the Many - Quantitative Concepts for Qualitative Thinking

A short history of reasoning the "more"

Commercial Capitalism (13th +): calculating for trade, arithmetic, sharing risk and profit in long-distance commerce

Rise of the Nation State (17th +): "art of the state", mercantilism, scientific revolution

Industrialization (19th +): urbanization, scientific management, large bureaucracies

☉ Fibonacci, "Liber Abaci", Calculating with Arabic numerals (Pisa, 1202)

☉ Unknown, "Arte dell'Abbaco", Practical arithmetic (Venice, 1478)

☉ Pacioli, "Summa de arithmetica, geometria, proportioni et proportionalità", Double entry bookkeeping (Venice, 1494)

☉ William Petty & John Graunt, Political Arithmetick (17th century)

☉ Hermann Conring & Gottfried Achenwall, Statistik (17th & 18th century)

☉ Adolphe Quetelet, Statistical regularities and the "average man" (19th century)

☉ Francis Galton & Karl Pearson, Public health and eugenics (late 19th century)

Page 15: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Liber Abaci, Fibonacci, 1202

Calculation for accounting, money-changing, insurance, lending, measurement, etc.

Page 16: Figures of the Many - Quantitative Concepts for Qualitative Thinking

"Having proved that there die about 3,506 persons at Paris unnecessarily, to the damage of France, we come next to compute the value of the said damage, and of the remedy thereof, as follows, viz., the value of the said 3,506 at 60 livres sterling per head, being about the value of Algier slaves (which is less than the intrinsic value of people at Paris), the whole loss of the subjects of France in that hospital seems to be 60 times 3,506 livres sterling per annum, viz., 210,360 livres sterling, equivalent to about 2,524,320 French livres." (Petty 1655)
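Petty's computation can be retraced step by step. The sketch below uses only the figures from the quotation; the conversion factor between livres sterling and French livres is not stated in the passage and is inferred here from the two totals Petty gives.

```python
# Figures taken from the quotation above.
unnecessary_deaths = 3506   # persons per year
value_per_head = 60         # livres sterling per person

loss_sterling = unnecessary_deaths * value_per_head

# Assumption: the exchange rate is inferred from Petty's two totals;
# it is not given explicitly in the quoted passage.
french_livres_per_sterling = 2_524_320 // 210_360

loss_french = loss_sterling * french_livres_per_sterling

print(loss_sterling)               # 210360
print(french_livres_per_sterling)  # 12
print(loss_french)                 # 2524320
```

The arithmetic is trivial; what matters for the history of reasoning is the gesture of putting a price on a population.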

Page 17: Figures of the Many - Quantitative Concepts for Qualitative Thinking

The Assurance of Lives, Charles Babbage, 1826

First life tables were assembled in the 17th century by John Graunt.

Babbage builds a machine to produce tables faster.

Page 18: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Essai sur la statistique de la population française, Adolphe d'Angeville, 1836

population census, tax register, house numbers, etc.

modern statistics, large bureaucracies, quantitative social sciences, etc.

Page 19: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Some conclusions for part 1

Over the last centuries, scientific thinking has become the dominant way of producing knowledge and making decisions in most societies.

Scientific thinking implies various styles of reasoning, different ways of "giving reasons", different analytical gestures, etc.

Styles are intrinsically connected to our "lifeworld" (Husserl 1936).

Two diagnoses:

☉ Our lifeworld is changing in significant ways => "the many"

☉ We need new ways of making sense of it => data analysis

What is the style of data analysis? Its epistemology? One or many?

What are its techniques, its analytical gestures?

Page 20: Figures of the Many - Quantitative Concepts for Qualitative Thinking

2 / Two kinds of mathematics

Can there be data analysis without math? No.

Does this imply epistemological commitments? Yes.

But there are choices, e.g. between:

☉ Confirmatory data analysis => deductive

☉ Exploratory data analysis (Tukey 1962) => inductive

There is a fast-growing variety of analytical gestures focusing on large numbers of formalized and classed objects.

Page 21: Figures of the Many - Quantitative Concepts for Qualitative Thinking

2 / Two kinds of mathematics

Statistics

Observed: objects and properties

Inferred: relations

Data representation: the table

Visual representation: quantity charts

Grouping: class (similar properties)

Graph theory

Observed: objects and relations

Inferred: structure

Data representation: the matrix

Visual representation: network diagrams

Grouping: clique (dense relations)
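The two data representations can be contrasted on a toy example. This is a minimal sketch with invented users and values (all names and numbers are hypothetical): the table stores properties per object and groups objects into classes, while the matrix stores relations between objects.

```python
from collections import defaultdict

# Statistics: a table, one row per object, one column per property.
table = [
    {"user": "ann", "posts": 12, "type": "activist"},
    {"user": "bob", "posts": 3, "type": "journalist"},
    {"user": "cid", "posts": 7, "type": "activist"},
]

# Grouping by class: objects that share a property value.
classes = defaultdict(list)
for row in table:
    classes[row["type"]].append(row["user"])

# Graph theory: a matrix, one row and one column per object;
# a 1 in cell [i][j] means that object i is related to object j.
users = [row["user"] for row in table]
edges = [("ann", "bob"), ("ann", "cid")]

index = {user: i for i, user in enumerate(users)}
matrix = [[0] * len(users) for _ in users]
for a, b in edges:
    matrix[index[a]][index[b]] = 1
    matrix[index[b]][index[a]] = 1  # undirected relation

print(dict(classes))
for user, row in zip(users, matrix):
    print(user, row)
```

In the table, "ann" and "cid" belong together because they share a property; in the matrix, "ann" is central because both other users are related to her.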

Page 22: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011, (Poell / Rieder, forthcoming)

7K posts, 700K users, 3.6M comments, 10M likes (tool: netvizz), work in progress!

Page 23: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Statistics

New media platforms funnel practices into reduced and largely formal "grammars of action" (Agre 1989); data is therefore very clean, very complete, and very detailed.

Data can be imported with great ease into standard packages that come with many analytical gestures built in (R, Excel, SPSS, Rapidminer, etc.).

Tools are easy, concepts are hard.

Page 24: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter

Page 25: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter, log10 y scale

Page 26: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter, log10 y scale, likes on comments

Page 27: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: comment timeline, per day

Page 28: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: comment timeline, per month
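Moving from a per-day to a per-month timeline is a change of bin size. A minimal sketch, assuming comment timestamps are available as ISO date strings (the values below are invented for illustration and are not taken from the dataset):

```python
from collections import Counter
from datetime import datetime

# Hypothetical comment timestamps; real data would come from an export.
timestamps = [
    "2011-01-25T10:15:00",
    "2011-01-25T18:40:00",
    "2011-01-28T09:05:00",
    "2011-02-11T21:30:00",
]

parsed = [datetime.fromisoformat(ts) for ts in timestamps]

# Same data, two bin sizes.
per_day = Counter(dt.strftime("%Y-%m-%d") for dt in parsed)
per_month = Counter(dt.strftime("%Y-%m") for dt in parsed)

print(sorted(per_day.items()))
print(sorted(per_month.items()))
```

The choice of bin is itself an analytical gesture: daily bins show bursts, monthly bins show trends.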

Page 29: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: page posts by type, per month

Page 30: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: comparison timeline: comments, posts, comments per post

Page 31: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: histogram of comment lengths in characters

Page 32: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: histogram of like count

Page 33: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Calculating relationships between variables

Quetelet 1827, Galton 1885, Pearson 1901

"Erosion of determinism" (Hacking 1991)
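The product-moment correlation coefficient associated with Pearson expresses how strongly two variables vary together, on a scale from -1 to 1. A minimal sketch in plain Python, with invented comment and like counts per post (hypothetical values, not the page data):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values: comments and likes for five posts.
comments = [10, 20, 30, 40, 50]
likes = [12, 25, 31, 48, 52]

print(round(pearson_r(comments, likes), 3))
```

A coefficient close to 1 says that posts with more comments also tend to have more likes; it says nothing about why.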

Page 34: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: scatterplot comments / likes, with standard error

Page 35: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed", June 2010 – June 2011: scatterplot comments / likes, per post type

Page 36: Figures of the Many - Quantitative Concepts for Qualitative Thinking

2 / Two kinds of mathematics

Statistics

Observed: objects and properties

Inferred: relations

Data representation: the table

Visual representation: quantity charts

Grouping: class (similar properties)

Graph theory

Observed: objects and relations

Inferred: structure

Data representation: the matrix

Visual representation: network diagrams

Grouping: clique (dense relations)

Page 37: Figures of the Many - Quantitative Concepts for Qualitative Thinking

3 / The mathematics of structure

Graph theory has a long prehistory; social network analysis starts in the 1930s with Jacob Moreno's work.

Graph theory is "a mathematical model for any system involving a binary relation" (Harary 1969); it makes relational structure calculable.

Page 38: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Three different force-based layouts of my FB profile

OpenOrd, ForceAtlas, Fruchterman-Reingold

Page 39: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Non force-based layouts

Circle diagram, parallel bubble lines, arc diagram

Page 40: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Network statistics

betweenness centrality

degree

Relational elements of graphs can be represented as tables (nodes have properties) and analyzed through statistics.

Network statistics bridge the gap between individual units and the structural forms they are embedded in.

This is currently an extremely prolific field of research.
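To illustrate how network statistics turn structure into node properties, the sketch below computes degree and betweenness centrality for a small invented network. It uses Brandes' algorithm for unweighted, undirected graphs; the graph and the function names are hypothetical and not tied to any particular tool.

```python
from collections import deque

def degree(graph):
    """Number of neighbors per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from source.
        order = []
        preds = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical toy network: two triangles joined by one bridge (c-d).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

deg = degree(graph)
btw = betweenness(graph)

# One table row per node, ready for ordinary statistics.
for node in graph:
    print(node, deg[node], btw[node])
```

Nodes "c" and "d" have only one more connection than the others, yet every shortest path between the two triangles runs through them; degree and betweenness capture different aspects of position.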

Page 41: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Twitter 1% sample, 24 hours: 4.3M tweets, 3.4M users, 2M accounts mentioned, 227K unique hashtags

Page 42: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Helpful: baseline sampling

Twitter's API offers a random 1% sample through the statuses/sample endpoint, which does not require privileged access.

Provides datasets for researching certain types of questions and allows us to "contextualize" (baseline) other collections.

We (Gerlitz / Rieder 2013) explored 24 hours of the 1% sample and captured 4,376,230 tweets, sent from 3,370,796 accounts, at an average rate of 50.65 tweets per second, leading to about 1.3GB of uncompressed and unindexed MySQL tables.
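The reported rate follows directly from the two counts. A short check, using only the figures given above (the tweets-per-account ratio is derived here and does not appear on the slide):

```python
tweets = 4_376_230
accounts = 3_370_796
seconds = 24 * 60 * 60  # one day

tweets_per_second = tweets / seconds
tweets_per_account = tweets / accounts

print(round(tweets_per_second, 2))   # 50.65
print(round(tweets_per_account, 2))  # 1.3
```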

Page 43: Figures of the Many - Quantitative Concepts for Qualitative Thinking

A baseline provides reference points

Beware of averages in non-normal distributions! But the 1% sample is sufficiently large to allow representative exploration of subsamples.
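Why averages mislead in skewed distributions can be shown with a few numbers. A minimal sketch with invented tweet counts per account (hypothetical values chosen to mimic a long-tailed distribution):

```python
from statistics import mean, median

# Hypothetical tweet counts per account: most accounts tweet once,
# one account tweets very often.
tweets_per_account = [1, 1, 1, 1, 1, 1, 2, 2, 3, 187]

average = mean(tweets_per_account)
middle = median(tweets_per_account)

print(average, middle)
```

Here the mean is 20 while the median is 1: a single very active account pulls the average far away from what a typical account does.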

We can qualify structures and individual elements with the help of statistics and graph theory.

Page 44: Figures of the Many - Quantitative Concepts for Qualitative Thinking
Page 45: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Twitter 1% sample, co-hashtag analysis

227,029 unique hashtags, 1627 displayed (freq >= 50)

Size: frequency

Color: modularity
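How the co-hashtag graph was built is not spelled out on the slide. One common construction, sketched here under that assumption, links two hashtags whenever they appear in the same tweet and keeps only hashtags above a frequency threshold. The tweets and the threshold value below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tweets, reduced to their hashtags.
tweets = [
    ["egypt", "jan25"],
    ["egypt", "jan25", "tahrir"],
    ["egypt", "tahrir"],
    ["music"],
]

MIN_FREQ = 2  # illustrative; the slide uses freq >= 50 on the full sample

# Node size: how many tweets contain each hashtag.
frequency = Counter(tag for tags in tweets for tag in set(tags))

# Edge weight: how many tweets contain both hashtags.
cooccurrence = Counter()
for tags in tweets:
    for pair in combinations(sorted(set(tags)), 2):
        cooccurrence[pair] += 1

# Keep only edges between sufficiently frequent hashtags.
kept = {tag for tag, n in frequency.items() if n >= MIN_FREQ}
edges = {pair: w for pair, w in cooccurrence.items()
         if pair[0] in kept and pair[1] in kept}

print(frequency)
print(edges)
```

The resulting weighted edge list can be loaded into a graph tool, where measures such as modularity or degree are then computed on it.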

Page 46: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Size: frequency

Color: user diversity

Twitter 1% sample, co-hashtag analysis

227,029 unique hashtags, 1627 displayed (freq >= 50)

Page 47: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Size: frequency

Color: degree

Twitter 1% sample, co-hashtag analysis

227,029 unique hashtags, 1627 displayed (freq >= 50)

Page 48: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Nine measures of centrality (Freeman 1979)

Page 49: Figures of the Many - Quantitative Concepts for Qualitative Thinking
Page 50: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Twitter 1% sample

Co-hashtag analysis

Degree vs. wordFrequency

Page 51: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Degree vs. userDiversity

Twitter 1% sample

Co-hashtag analysis

Page 52: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed"

700K nodes, 11M connections

Color: type

Page 53: Figures of the Many - Quantitative Concepts for Qualitative Thinking
Page 54: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Facebook Page "ElShaheeed"

700K nodes, 11M connections

Color: outdegree

Page 55: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Conclusions

There is a lot of excitement about data analysis, but our understanding of styles and analytical gestures is still very poor.

We need interrogation and critiques of methodology that are developed from engagement and historical/conceptual investigation.

We need analytical gestures that are more closely tied to concepts from the humanities and social sciences; exploration rather than confirmation.

Visualization and simpler tools are very interesting but require technical and conceptual literacy to deliver more than illustrations.

This is probably not a fad.

Page 56: Figures of the Many - Quantitative Concepts for Qualitative Thinking

"Incite, induce, deviate, make easy or difficult, enlarge or limit, render more or less probable… These are the categories of power." (Deleuze 1986, 77)

Page 57: Figures of the Many - Quantitative Concepts for Qualitative Thinking

Thank You

https://www.digitalmethods.net

http://thepoliticsofsystems.net

"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (Tukey 1962)