Counts, comparisons, collocations, contestations: Towards a dictionary of the future

TOWARDS A DICTIONARY OF THE FUTURE

COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS

DICTIONARY OF THE FUTURE?

SOME OTHER PLACES TO CHECK OUT

• The Google Ngram Viewer helps you understand trends across a bazillion books that Google has digitized. It’s an amazing resource.
• So is the Corpus of Historical American English (COHA): http://corpus.byu.edu/coha/
• And the Corpus of Contemporary American English (COCA): http://corpus.byu.edu/coca/

TO COHA!

TO COCA!

TAKING CARE WITH COUNTS

• The counts in the last two slides are too small to be anything more than interesting
• The next slide shows us tracking the collocates of future
• Collocates are the words that appear near a given word: one of the chief collocates of salt is pepper, for example
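To make the idea of collocates concrete, here is a minimal sketch in Python that counts the words appearing within a few tokens of a target word. The function name `collocates`, the window size, and the toy sentence are all illustrative assumptions, not data from COHA or COCA.

```python
from collections import Counter
import re

def collocates(text, node, window=4):
    """Count words that appear within `window` tokens of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Hypothetical toy text, not corpus data.
sample = "Pass the salt and pepper. She added salt, pepper and oil. Salt the water."
top = collocates(sample, "salt", window=2)
print(top.most_common(3))
```

Raw window counts like these are dominated by function words such as "and" and "the", which is one reason corpus tools usually rank collocates with an association measure (mutual information, for instance) rather than by raw frequency.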

COUNTS COUNT

DISCUSSIONS, DEMOCRACIES AND DICTIONARIES

What’s going on in Urban Dictionary?
• Identity
• Play
• Politics

KEYWORDS

• What are the words that are most contested?
• How do they change?
• Who controls the future?
• Liberty vs. Freedom

JACK GRIEVE FINDING WOTY’S

• See also http://idibon.com/quantifying-word-year/

• p.s.—in my ideal Dictionary of the Future, we understand the geography of how a word is used

MEANING IS IN THE USE

• “For a large class of cases of the employment of the word ‘meaning’—though not for all—this word can be explained in this way: the meaning of a word is its use in the language” — Wittgenstein, Philosophical Investigations

MEANING IN THE USE

• Tumblr moms use over 4 times as many [blue heart] and [purple heart] emoji as Twitter peeps
• What are the collocates?
  • Blue: his he him
  • Purple: she’s she
  • No pink heart option!

• See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and http://idibon.com/emomji-emoji-new-moms-use/

CO-OCCURRENCES MATTER (MOVIE REVIEW RATINGS AND WORDS)

• The idea here is that if you’re writing a review and use the word wow, you’re being very positive or very negative. You don’t say Wow, I have a balanced and neutral opinion on this very often.

• If you’re using however, however, you’re likely to be in the middle of your movie review rating or travel summary—not at the very positive/negative extremes.

• See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
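One way to look at this kind of pattern is to ask, for each star rating, what share of reviews at that rating contain the word. The sketch below assumes reviews arrive as (rating, text) pairs; the function name `rating_profile` and the six toy reviews are made up for illustration and are not the data behind the slides.

```python
from collections import Counter

def rating_profile(reviews, word):
    """For each rating, the share of reviews at that rating that contain `word`."""
    totals = Counter()
    hits = Counter()
    for rating, text in reviews:
        totals[rating] += 1
        if word in text.lower().split():
            hits[rating] += 1
    # Dividing by the number of reviews per rating keeps popular ratings
    # from looking word-heavy just because there are more of them.
    return {r: hits[r] / totals[r] for r in sorted(totals)}

# Hypothetical toy reviews.
reviews = [
    (1, "wow this was awful"),
    (1, "boring and slow"),
    (3, "fine however the ending dragged"),
    (3, "good cast however weak script"),
    (5, "wow just wow"),
    (5, "loved every minute"),
]

print(rating_profile(reviews, "wow"))
print(rating_profile(reviews, "however"))
```

On this toy set, wow shows up only at the extremes and however only in the middle, which is the shape the bullets above describe. Real review data would need proper tokenization and far more reviews per rating.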

WHOLESOMENESS
http://idibon.com/wholesome-branding-campaign-effectiveness/

BRANDS LOVE WORDS

DEEP HISTORY

• The first uses of wholesome tended to be about ‘virtuous teachings’.
• In Wycliffe’s Bible way back in 1382:

The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3)

(Modern versions treat wordis as ‘words’, ‘teachings’, or ‘instructions’.)

“WHOLESOME” [NOUN] OVER TIME

HOW ABOUT IN SOCIAL MEDIA?

• You have to deal with spam (11% of data in this case; another 36% of data is “Wholesome Radio”, which is probably irrelevant)
• In 2014 tweets:
  • Food: 23% (but mostly not about Honey Maid)
  • Humans: 23% (and how they can/should live; church-related mentions are prominent)
  • Entertainment: 13% (movies, TV)

• Now let’s compare this to 2011 tweet uses:
  • Humans: 32%
  • Entertainment: 12%
  • Food: 9%

WORDS ARE CONTESTED

MORE ON CONTESTED WORDS

• In the next slide, you’ll see an image from Monroe et al. (2008)

• This work starts from something we already know: Republicans and Democrats speak about the same issue differently.

• In the next slide, they are showing methods that can pull out how the parties speak about abortion when they take the floor.

• The words at the top are the Democratic party words, the ones at the bottom are the Republican party words.

• http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
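For a rough sense of how words can be scored as belonging to one group or the other, here is a sketch of a log-odds ratio with a Dirichlet prior, standardized into a z-score, in the spirit of that paper. The uniform prior, the function name `log_odds_z`, and the toy counts are assumptions made for illustration; the paper discusses more informative priors and real congressional speech.

```python
import math

def log_odds_z(counts_a, counts_b, prior=0.01):
    """Z-scored log-odds ratio per word; positive leans toward group A."""
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = prior * len(vocab)  # total prior mass (uniform prior is an assumption)
    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + prior
        yb = counts_b.get(w, 0) + prior
        # Difference in smoothed log-odds of the word between the two groups.
        delta = math.log(ya / (n_a + a0 - ya)) - math.log(yb / (n_b + a0 - yb))
        # Approximate variance of that difference.
        var = 1.0 / ya + 1.0 / yb
        scores[w] = delta / math.sqrt(var)
    return scores

# Hypothetical word counts for two groups of speakers.
group_a = {"alpha": 30, "shared": 50, "beta": 2}
group_b = {"alpha": 2, "shared": 50, "beta": 30}

scores = log_odds_z(group_a, group_b)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 2))
```

Dividing by the standard deviation is what keeps rare words from dominating: a word seen once in one group and never in the other has a large raw log-odds difference but also a large variance.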

ENTREPRENEUR
http://idibon.com/entrepreneurs-french-spanish-english/

ENTREPRENEUR IN ENGLISH, FRENCH, SPANISH

• Tycoon, mogul, industrialist
  • A flavor of ‘ill-gotten gains’

• Entrepreneur doesn’t seem to have this, at least in English right now
• Collocates have to do with:
  • Advice
  • Success
  • Investors
  • Marketing
  • Social (media/services/topics/techniques)
  • Failure (especially fear-of)
  • Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn, Etsy)

• The people using entrepreneur identify themselves as:
  • Authors, speakers, writers, bloggers, strategists, (life) coaches, consultants, moms, wives, husbands, fathers, food-lovers, music-lovers

KEY: GET COMPARISON SETS

Group/Context A

Group/Context B

INTERCONNECTED AXES OF DIFFERENCE

• Genre (State of the Unions vs. Reddit comments)
• Time (1940s vs. the last ten years)
• Geography (hella vs. wicked)
• Traditional demographics (age, gender, education)
• Personal identity/style (nerd, goth, bro, mom)

BECAUSE X
http://idibon.com/innovating-innovation/

INNOVATIONS AND THEIR COMMUNITIES

• Because X’ers disproportionately like:
  • YouTube
  • Tumblr
  • One Direction (especially Harry)
  • Justin Bieber
  • Ariana Grande
  • “bands”
  • pizza
  • sex
  • cats
  • books

• They are decidedly less likely to talk about:
  • software
  • basketball
  • NASCAR
  • business
  • words associated with African-American Vernacular English

THE X IN BECAUSE X

Part of speech               Word counts ≥ 50
Noun (people, spoilers)      32.02%
Compressed clause (ilysm)    21.78%
Adjective (ugly, tired)      16.04%
Interjection (sweg, omg)     14.71%
Agreement (yeah, no)         12.97%
Pronoun (you, me)            2.45%

PART OF SPEECH TAGGERS ARE GOOD

• There’s even a pretty good one for Twitter POS

INNOVATIONS CLUMP

#BLACKLIVESMATTER
http://idibon.com/blacklivesmatter-events-change-conversations/

TOPIC MODELING

• In the previous sections, I’ve been noting what you can do when you have two or more comparison sets
  • How is wholesome used in time x vs. time y vs. time z?
  • What are the differences between English speakers talking about entrepreneurship vs. French speakers and Spanish speakers?
  • How are people who use the innovative Because X construction different from people who don’t use it?

• In this section, we talk about topic modeling, which is a way to automatically identify clusters within a data set, even if you don’t have a comparison set.

• We’ll use this to explore conversations around #blacklivesmatter, but we’ll also see how these conversations shift before/after a particular moment in time
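As a rough illustration of what "automatically identify clusters" can mean, here is a small sketch of one common topic model, latent Dirichlet allocation, fit with collapsed Gibbs sampling. This is an assumption about technique: the slides do not say which topic model or tool was used. The function name `lda_gibbs`, the hyperparameters, and the toy documents are all hypothetical.

```python
import random

def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Tiny collapsed Gibbs sampler for LDA over lists of tokens."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per topic in each doc
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics                       # tokens per topic overall
    assign = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample: how much the doc likes topic k times how much topic k likes w.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + beta * V)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Hypothetical toy documents, already tokenized.
docs = [
    "pizza cats books pizza cats".split(),
    "software business software basketball".split(),
    "cats books pizza books".split(),
    "business basketball software business".split(),
]
doc_topic, topic_word = lda_gibbs(docs)
for k, words in enumerate(topic_word):
    print(k, sorted(words, key=words.get, reverse=True)[:3])
```

Nothing here is told which documents belong together; the groupings come only from which words tend to share documents. Running the same model separately on data from before and after a given date is one simple way to see a conversation shift.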

TIME MATTERS

TOPICS (EVEN WHEN YOU DON’T HAVE AN A PRIORI COMPARISON SET)

UNKNOWN UNKNOWNS

• In general, topic modeling is a way of addressing the limits of our knowledge. If you’re asking a question about data, you probably know something about the data going in.
• But what we hear from people is that they are keenly aware that they don’t know what they don’t know.
• Topic modeling is meant to help with that.

• In the next slides, another use of topic modeling: identifying the themes of Martin Luther King Jr.’s major speeches and sermons

• Topic modeling Dr. King’s major speeches and sermons gets these topics
• Which change over time
• See also http://idibon.com/topic-detection-mlk/