Unwritten History of Literary Practice

DESCRIPTION

text mining, distant reading, macroanalysis, eighteenth- and nineteenth-century literary history, data mining, machine learning

THE UNWRITTEN HISTORY OF LITERARY PRACTICE.

TED UNDERWOOD

FEB 28, 2013

PRE- AND POST-1150 EXAMPLES

PRE: all, well, good, world, make, king, name

POST: general, country, power, state, present, interest, number

DIFFERENTIATION OF FOUR GENRES.

[Figure: chart covering 1700 to 1900, with one series per genre: poetry, drama, prose fiction, nonfiction. The y axis, ticked from 1.0 to 3.0, is a ratio: number of pre-1150 words / number of post-1150 words.]
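The y-axis ratio can be illustrated with a minimal sketch. It assumes a volume is already tokenized and that each word can be checked against a dated word list. The two small sets below reuse only the example words from the slide; a real analysis would need a full dated lexicon, which the slides do not specify. The function name `pre_post_ratio` is hypothetical.

```python
from collections import Counter

# Example words from the slide only; a real study needs a full dated lexicon.
PRE_1150 = {"all", "well", "good", "world", "make", "king", "name"}
POST_1150 = {"general", "country", "power", "state", "present", "interest", "number"}


def pre_post_ratio(tokens):
    """Return (count of pre-1150 tokens) / (count of post-1150 tokens).

    Returns None when no post-1150 words occur, since the ratio is undefined.
    """
    counts = Counter(t.lower() for t in tokens)
    pre = sum(counts[w] for w in PRE_1150)
    post = sum(counts[w] for w in POST_1150)
    if post == 0:
        return None
    return pre / post


# Invented sample sentence, for illustration only.
sample = "the king made a good name in the world but the state had power".split()
ratio = pre_post_ratio(sample)  # 4 pre-1150 tokens over 2 post-1150 tokens
```

Note that "made" is not counted here because the sketch matches exact word forms; whether inflected forms were grouped in the original analysis is not stated on the slide.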

CORRELATE WITH THE RISING PRE-1150 TREND.

CORRELATE NEGATIVELY WITH THE TREND.

HOW DO YOU FIND THE FICTION IN A COLLECTION OF 469,000 VOLUMES?

1. Tag a “training corpus” of example documents.

2. Identify features.

3. Train an ensemble.

[Diagram labels: naive Bayes (four times).]
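The three steps above can be sketched with a single, very small multinomial naive Bayes classifier. This is only an illustration of the general technique, not the ensemble used in the talk: it trains one classifier on word counts with add-one smoothing, and it does not cover titles or the logistic regression step. The training sentences and the names `train_naive_bayes` and `classify` are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Step 1 and 2: take tagged documents and count word features per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def classify(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            if t in model["vocab"]:  # words unseen in training are ignored
                score += math.log((counts[t] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training corpus, tagged by hand.
training = [
    ("she felt his eyes upon her face".split(), "fiction"),
    ("he turned pale and looked at her".split(), "fiction"),
    ("the state has power over the country".split(), "nonfiction"),
    ("the general interest of the present number".split(), "nonfiction"),
]
model = train_naive_bayes(training)
guess = classify(model, "her eyes looked pale".split())
```

An ensemble in the sense of step 3 would train several such classifiers (for example on different feature sets) and combine their outputs; how the talk's classifiers were combined beyond what the slides state is not shown here.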

NAIVE BAYES ON TEXT AND TITLES, COMBINED WITH LOGISTIC REGRESSION. (432 VOLS HELD OUT FROM CORPUS OF 1356 19C VOLS.)

Rows are actual genres; columns are predicted genres.

actual \ predicted    prose nonfiction    prose fiction    verse and drama    Recall
prose nonfiction            118                  5                 0           0.959
prose fiction                 1                143                 1           0.986
verse and drama               0                  7               157           0.957
Precision                   0.992              0.923             0.994
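As a check on the table above, the recall and precision figures can be recomputed from the counts. The sketch below assumes rows are actual genres and columns are predicted genres, which is the reading under which the slide's numbers come out consistent; the function names are illustrative.

```python
LABELS = ["prose nonfiction", "prose fiction", "verse and drama"]

# Counts from the slide: rows = actual genre, columns = predicted genre.
CONFUSION = [
    [118, 5, 0],
    [1, 143, 1],
    [0, 7, 157],
]


def recall(matrix, i):
    """Share of volumes actually in class i that were predicted as class i."""
    return matrix[i][i] / sum(matrix[i])


def precision(matrix, j):
    """Share of volumes predicted as class j that actually belong to class j."""
    return matrix[j][j] / sum(row[j] for row in matrix)


for k, label in enumerate(LABELS):
    print(f"{label:17s} recall={recall(CONFUSION, k):.3f} "
          f"precision={precision(CONFUSION, k):.3f}")
```

The row totals (123 + 145 + 164) add up to the 432 held-out volumes mentioned in the slide title.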

Weight classifiers by proximity to the date of the unknown document.

[Figure: an 18c classifier and a 19c classifier shown along a date axis, 1700 to 1900.]
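One way to weight two classifiers by date proximity is simple linear interpolation between them. The sketch below is an assumption about how such weighting could work, not the scheme used in the talk: the center years 1750 and 1850, the function name `blend_by_date`, and the choice of a linear weight are all hypothetical.

```python
def blend_by_date(year, p18, p19, center18=1750, center19=1850):
    """Blend two classifiers' probabilities according to a document's date.

    p18 and p19 are the probabilities assigned by the 18c and 19c classifiers.
    The closer `year` is to a classifier's (assumed) center year, the more
    weight that classifier receives. Dates outside the two centers are clamped.
    """
    t = (year - center18) / (center19 - center18)
    t = min(1.0, max(0.0, t))  # 0.0 = all 18c classifier, 1.0 = all 19c classifier
    return (1.0 - t) * p18 + t * p19


# A volume dated 1800 sits halfway between the two assumed centers,
# so both classifiers contribute equally.
blended = blend_by_date(1800, p18=0.2, p19=0.6)
```

A linear weight is the simplest choice; any function that decreases with distance from the classifier's period would fit the slide's description equally well.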

FEATURES CONSISTENTLY MORE COMMON (WILCOXON TEST, N = 220)

IN FIRST PERSON / IN THIRD PERSON

MEAN SIMILARITY TO FIRST-PERSON.

Mean prob. of “first-person” for all fiction vols. I’ve left out 1700-1720 here, because the sample size is so small.

[Figure: x axis "timespan", ticked 1750 to 1850; y axis "mean first", ticked 0.3 to 0.6.]

WHAT DO THIRD-PERSON NARRATORS TALK ABOUT?

herself -0.489

himself -0.475

him -0.440

had -0.369

eyes -0.312

was -0.281

face -0.274

hers -0.269

voice -0.249

remembered -0.246

lips -0.243

felt -0.242

turned -0.231

girl -0.227

pale -0.226

loved -0.226

watched -0.223

trembling -0.222

looked -0.222

conscious -0.219

smile -0.216

sudden -0.212

silent -0.209

silence -0.206

husband -0.204

daughter -0.203

“WE DIDN’T NEED FIRST PERSON. WE HAD … FACES!!” N = 47,500, R = -0.247

[Figure: x axis log(pronounratio+1), ticked 0.0 to 1.5; y axis aggregate frequency of 'facial gestures', ticked 0.000 to 0.010.]

bodily signs of emotion: eyes, face, voice, lips, smile, glance, tears, pale, trembling, sigh

FIRST-PERSON NARRATORS QUANTIFY MORE: R = 0.21 ON N = 47,500

[Figure: x axis log(+1) ratio of first to third person pronouns, ticked 0.5 to 1.5; y axis aggregate frequency of numbers, ticked 0.005 to 0.015.]
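The x-axis measure and the reported r values can be illustrated with a short sketch. It assumes the measure is log(first-person count / third-person count + 1) per volume and that r is a Pearson correlation; the pronoun lists, the use of the natural logarithm, and the function names are assumptions, since the slides do not spell them out.

```python
import math

# Assumed pronoun lists; the slides do not say which forms were counted.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
THIRD_PERSON = {"he", "him", "his", "himself", "she", "her", "hers", "herself"}


def log_pronoun_ratio(tokens):
    """Return log(first / third + 1), or None if no third-person pronouns occur."""
    first = sum(1 for t in tokens if t.lower() in FIRST_PERSON)
    third = sum(1 for t in tokens if t.lower() in THIRD_PERSON)
    if third == 0:
        return None
    return math.log(first / third + 1)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented sentence: two first-person and two third-person pronouns.
value = log_pronoun_ratio("I saw him and he saw me".split())
```

In use, `log_pronoun_ratio` would be computed once per volume and passed to `pearson_r` together with a per-volume frequency, such as the aggregate frequency of number words.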

DEFOE’S ‘QUANTIFYING NARRATOR’ IS NOT ALONE.*

“I never saw them afterwards, or any sign of them, except three of their hats, one cap, and two shoes that were not fellows.”

The Life and Strange Surprising Adventures of Robinson Crusoe

see also …

Perseverance Island, or the Robinson Crusoe of the 19c.

The Boy Tar: or, A Voyage in the Dark

A Lady’s Experiences in the Wild West in 1883

The Swiss Family Robinson

The Shipwreck and Adventures of M. Pierre Viaud

etc. etc. etc.

* ‘quantifying narrator’ h/t Brett D. Wilson.

WHAT GETS QUANTIFIED (H/T PATRICK JUOLA)

IN FIRST PERSON: inches, pistols, canoes, barrels, feet, englishmen, guns, gallons, savages, slaves

IN THIRD: centuries, figures, tears, friends, eyes
