TOWARDS A DICTIONARY OF THE FUTURE: COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS

Page 1: Towards a dictionary of the future

TOWARDS A DICTIONARY OF THE FUTURE

COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS

Page 2: Towards a dictionary of the future

DICTIONARY OF THE FUTURE?

Page 3: Towards a dictionary of the future
Page 4: Towards a dictionary of the future
Page 5: Towards a dictionary of the future

SOME OTHER PLACES TO CHECK OUT

• The Google Ngram Viewer helps you understand trends across a bazillion books that Google has digitized. It’s an amazing resource.

• So is the Corpus of Historical American English (COHA): http://corpus.byu.edu/coha/

• And the Corpus of Contemporary American English (COCA): http://corpus.byu.edu/coca/

Page 6: Towards a dictionary of the future

TO COHA!

Page 7: Towards a dictionary of the future

TO COCA!

Page 8: Towards a dictionary of the future

TAKING CARE WITH COUNTS

• The counts in the last two slides are too small to be anything more than interesting

• The next slide shows us tracking the collocates of future

• Collocates are the words that appear near a given word: one of the chief collocates of salt is pepper, for example
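
To make the idea of a collocate concrete, here is a minimal sketch that counts the words falling within a few tokens of a target word. The function name `collocates`, the `window` size, and the sample sentence are all assumptions made for illustration; real corpus tools such as COHA and COCA use much larger data and association measures beyond raw counts.

```python
import re
from collections import Counter

def collocates(text, node, window=4):
    """Count words appearing within `window` tokens of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Count every neighbour in the window except the node position itself.
        counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts

# Invented sample text, used only to exercise the function.
sample = (
    "Pass the salt and pepper. She added salt, then pepper. "
    "The future of work is uncertain."
)
salt_collocates = collocates(sample, "salt", window=2)
print(salt_collocates.most_common(3))
```

With a tiny sample like this, the caution above about small counts applies in full: the numbers only show the mechanics.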

Page 9: Towards a dictionary of the future

COUNTS COUNT

Page 10: Towards a dictionary of the future

DISCUSSIONS, DEMOCRACIES AND DICTIONARIES

Page 11: Towards a dictionary of the future

What’s going on in Urban Dictionary?

• Identity

• Play

• Politics

Page 12: Towards a dictionary of the future

KEYWORDS

• What are the words that are most contested?

• How do they change?

• Who controls the future?

• Liberty vs. Freedom

Page 13: Towards a dictionary of the future

JACK GRIEVE FINDING WOTYS

• See also http://idibon.com/quantifying-word-year/

Page 14: Towards a dictionary of the future

• P.S. In my ideal Dictionary of the Future, we understand the geography of how a word is used

Page 15: Towards a dictionary of the future

MEANING IS IN THE USE

• “For a large class of cases of the employment of the word ‘meaning’, though not for all, this word can be explained in this way: the meaning of a word is its use in the language” (Wittgenstein, Philosophical Investigations)

Page 16: Towards a dictionary of the future

MEANING IN THE USE

• Tumblr moms use over 4 times as many blue-heart and purple-heart emoji as Twitter peeps

• What are the collocates?
  • Blue: his he him
  • Purple: she’s she
  • No pink heart option!

• See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and http://idibon.com/emomji-emoji-new-moms-use/

Page 17: Towards a dictionary of the future

CO-OCCURRENCES MATTER (MOVIE REVIEW RATINGS AND WORDS)

• The idea here is that if you’re writing a review and use the word wow, you’re being very positive or very negative. You don’t say Wow, I have a balanced and neutral opinion on this very often.

• If you’re using however, however, you’re likely to be in the middle of your movie review rating or travel summary—not at the very positive/negative extremes.

• See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
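
The pattern described above can be checked by asking, for each rating, what share of reviews contain a given word. The sketch below does that on a handful of invented reviews; the data, the 1/3/5 rating scale, and the helper name `rate_by_rating` are assumptions for illustration and are not taken from the linked papers.

```python
from collections import Counter

# Toy (rating, review) pairs, invented purely for illustration.
reviews = [
    (1, "wow this was terrible"),
    (1, "awful plot and worse acting"),
    (3, "good cast however the pacing drags"),
    (3, "fine however forgettable"),
    (3, "an average film"),
    (5, "wow just wow a masterpiece"),
    (5, "loved every minute"),
]

def rate_by_rating(reviews, word):
    """Share of reviews at each rating that contain `word`."""
    totals = Counter()
    hits = Counter()
    for rating, text in reviews:
        totals[rating] += 1
        if word in text.split():
            hits[rating] += 1
    return {r: hits[r] / totals[r] for r in sorted(totals)}

wow_profile = rate_by_rating(reviews, "wow")
however_profile = rate_by_rating(reviews, "however")
print(wow_profile)
print(however_profile)
```

Dividing by the number of reviews at each rating matters because review sites usually have far more reviews at some ratings than others; raw counts alone would mostly reflect that imbalance.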

Page 18: Towards a dictionary of the future

FOUR CASE STUDIES

• Wholesomeness: http://idibon.com/wholesome-branding-campaign-effectiveness/

• Entrepreneur: http://idibon.com/entrepreneurs-french-spanish-english/

• Because X: http://idibon.com/innovating-innovation/

• #BlackLivesMatter: http://idibon.com/blacklivesmatter-events-change-conversations/

Page 19: Towards a dictionary of the future

WHOLESOMENESS

http://idibon.com/wholesome-branding-campaign-effectiveness/

Page 20: Towards a dictionary of the future

BRANDS LOVE WORDS

Page 21: Towards a dictionary of the future

DEEP HISTORY

• The first uses of wholesome tended to be about ‘virtuous teachings’.

• In Wycliffe’s Bible way back in 1382:

The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3)

(Modern versions treat wordis as ‘words’, ‘teachings’, or ‘instructions’.)

Page 22: Towards a dictionary of the future

“WHOLESOME” [NOUN] OVER TIME

Page 23: Towards a dictionary of the future

HOW ABOUT IN SOCIAL MEDIA?

• You have to deal with spam (11% of data in this case; another 36% of data is “Wholesome Radio”, which is probably irrelevant)

• In 2014 tweets:
  • Food: 23% (but mostly not about Honey Maid)
  • Humans: 23% (and how they can/should live; church-related mentions are prominent)
  • Entertainment: 13% (movies, TV)

• Now let’s compare this to 2011 tweet uses:
  • Humans: 32%
  • Entertainment: 12%
  • Food: 9%
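
The shift between the two years is easier to see as percentage-point changes. This small sketch takes the slide's figures at face value (it assumes the 2011 and 2014 percentages were computed on the same basis, which the slide does not state) and sorts the categories by how much they moved.

```python
# Category shares (in percent) as reported on the slide.
share_2011 = {"Humans": 32, "Entertainment": 12, "Food": 9}
share_2014 = {"Food": 23, "Humans": 23, "Entertainment": 13}

# Percentage-point change from 2011 to 2014 for each category.
change = {topic: share_2014[topic] - share_2011[topic] for topic in share_2011}

for topic, delta in sorted(change.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{topic}: {delta:+d} percentage points")
```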

Page 24: Towards a dictionary of the future

WORDS ARE CONTESTED

Page 25: Towards a dictionary of the future

MORE ON CONTESTED WORDS

• In the next slide, you’ll see an image from Monroe et al. (2008)

• This work starts from something basic that we know: Republicans and Democrats speak about the same issue differently.

• In the next slide, they show methods that can pull out how the parties speak about abortion when they take the floor.

• The words at the top are the Democratic party words; the ones at the bottom are the Republican party words.

• http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
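
One family of methods for this kind of two-group comparison is a log-odds ratio smoothed with a Dirichlet prior and scaled by its approximate standard error. The sketch below is my reading of that general idea, simplified to a uniform prior; the function name `log_odds_z`, the `prior` value, and the toy token lists are assumptions for illustration, not the paper's data or exact setup.

```python
import math
from collections import Counter

def log_odds_z(tokens_a, tokens_b, prior=0.01):
    """z-scored log-odds ratio of each word in A vs. B, with a uniform Dirichlet prior."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    prior_total = prior * len(vocab)
    scores = {}
    for w in vocab:
        ya = counts_a[w] + prior  # smoothed count in group A
        yb = counts_b[w] + prior  # smoothed count in group B
        delta = (math.log(ya / (n_a + prior_total - ya))
                 - math.log(yb / (n_b + prior_total - yb)))
        variance = 1.0 / ya + 1.0 / yb  # approximate variance of delta
        scores[w] = delta / math.sqrt(variance)
    return scores

# Invented toy "speeches" for two groups.
group_a = "hella good burrito hella far the the".split()
group_b = "wicked good chowder wicked far the the".split()
scores = log_odds_z(group_a, group_b)

for word, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {z:+.2f}")
```

Positive scores lean toward group A and negative scores toward group B, while words both groups use equally (such as "the" here) land at zero, which is why function words do not crowd out the distinctive vocabulary.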

Page 26: Towards a dictionary of the future
Page 27: Towards a dictionary of the future

ENTREPRENEUR

http://idibon.com/entrepreneurs-french-spanish-english/

Page 28: Towards a dictionary of the future

ENTREPRENEUR IN ENGLISH, FRENCH, SPANISH

• Tycoon, mogul, industrialist
  • A flavor of ‘ill-gotten gains’

• Entrepreneur doesn’t seem to have this, at least in English right now

• Collocates have to do with:
  • Advice
  • Success
  • Investors
  • Marketing
  • Social (media/services/topics/techniques)
  • Failure (especially fear-of)
  • Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn, Etsy)

• The people using entrepreneur identify themselves as:
  • Authors, speakers, writers, bloggers, strategists, (life) coaches, consultants, moms, wives, husbands, fathers, food-lovers, music-lovers

Page 29: Towards a dictionary of the future

KEY: GET COMPARISON SETS

Group/Context A

Group/Context B

Page 30: Towards a dictionary of the future

INTERCONNECTED AXES OF DIFFERENCE

• Genre (State of the Union addresses vs. Reddit comments)

• Time (1940s vs. the last ten years)

• Geography (hella vs. wicked)

• Traditional demographics (age, gender, education)

• Personal identity/style (nerd, goth, bro, mom)

Page 31: Towards a dictionary of the future

BECAUSE X

http://idibon.com/innovating-innovation/

Page 32: Towards a dictionary of the future

INNOVATIONS AND THEIR COMMUNITIES

• Because X’ers disproportionately like:

• YouTube

• Tumblr

• One Direction (especially Harry)

• Justin Bieber

• Ariana Grande

• “bands”

• pizza

• sex

• cats

• books

• They are decidedly less likely to talk about:

• software

• basketball

• NASCAR

• business

• words associated with African-American Vernacular English

Page 33: Towards a dictionary of the future

THE X IN BECAUSE X

Page 34: Towards a dictionary of the future

PART OF SPEECH TAGGERS ARE GOOD

Part of speech               Word counts ≥ 50
Noun (people, spoilers)      32.02%
Compressed clause (ilysm)    21.78%
Adjective (ugly, tired)      16.04%
Interjection (sweg, omg)     14.71%
Agreement (yeah, no)         12.97%
Pronoun (you, me)            2.45%

• There’s even a pretty good one for Twitter POS

Page 35: Towards a dictionary of the future

INNOVATIONS CLUMP

Page 36: Towards a dictionary of the future

#BLACKLIVESMATTER

http://idibon.com/blacklivesmatter-events-change-conversations/

Page 37: Towards a dictionary of the future

TOPIC MODELING

• In the previous sections, I’ve been noting what you can do when you have two or more comparison sets
  • How is wholesome used in time x vs. time y vs. time z?
  • What are the differences between English speakers talking about entrepreneurship vs. French speakers and Spanish speakers?
  • How are people who use the innovative Because X construction different from people who don’t use it?

• In this section, we talk about topic modeling, which is a way to automatically identify clusters within a data set, even if you don’t have a comparison set.

• We’ll use this to explore conversations around #blacklivesmatter, but we’ll also see how these conversations shift before/after a particular moment in time
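
For readers who want to see what "automatically identify clusters" can look like in practice, here is a minimal sketch of one common topic model, latent Dirichlet allocation, fitted with a tiny collapsed Gibbs sampler. The slides do not say which algorithm or settings were used, so the choice of LDA, the function name `lda_gibbs`, the hyperparameters, and the toy documents are all assumptions for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns up to 3 top words per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    # Start with a random topic for every token.
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    doc_topic = [Counter(zd) for zd in z]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, zd in zip(docs, z):
        for w, k in zip(d, zd):
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                k = z[di][wi]
                # Remove the token's current assignment from the counts.
                doc_topic[di][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[di][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = k
                doc_topic[di][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        [w for w, c in topic_word[t].most_common(3) if c > 0]
        for t in range(n_topics)
    ]

# Invented toy documents; real use would start from tokenized tweets or speeches.
docs = [
    "cat dog pet cat dog".split(),
    "dog pet cat pet".split(),
    "stock market trade stock".split(),
    "market trade stock trade".split(),
]
topics = lda_gibbs(docs)
for i, words in enumerate(topics):
    print(f"topic {i}: {' '.join(words)}")
```

Note that no comparison set is passed in: the only inputs are the documents and the number of topics, and the grouping of words emerges from which words tend to share documents. Running the same procedure separately on documents from before and after a given date is one simple way to look at shifts over time.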

Page 38: Towards a dictionary of the future

TIME MATTERS

Page 39: Towards a dictionary of the future

TOPICS (EVEN WHEN YOU DON’T HAVE AN A PRIORI COMPARISON SET)

Page 40: Towards a dictionary of the future

UNKNOWN UNKNOWNS

• In general, topic modeling is a way of addressing the limits of our knowledge. If you’re asking a question about data, you probably know something about the data going in.

• But what we hear from people is that they are keenly aware that they don’t know what they don’t know.

• Topic modeling is meant to help with that.

• In the next slides, another use of topic modeling: identifying the themes of Martin Luther King Jr.’s major speeches and sermons

Page 41: Towards a dictionary of the future

• Topic modeling Dr. King’s major speeches and sermons gets these topics

• Which change over time

• See also http://idibon.com/topic-detection-mlk/

Page 42: Towards a dictionary of the future