tyler-schnoebelen
TOWARDS A DICTIONARY OF THE FUTURE
COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS
DICTIONARY OF THE FUTURE?
SOME OTHER PLACES TO CHECK OUT
• The Google Ngram Viewer helps you understand
trends across a bazillion books that Google has
digitized. It’s an amazing resource:
• So is the Corpus of Historical American English:
http://corpus.byu.edu/coha/ (COHA)
• And the Corpus of Contemporary American English:
http://corpus.byu.edu/coca/ (COCA)
TO COHA!
TO COCA!
TAKING CARE WITH COUNTS
• The counts in the last two slides are too small to be
anything more than interesting
• The next slide shows us tracking the collocates of
future
• Collocates are the words that appear near a given
word—one of the chief collocates of salt is pepper,
for example
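To make the idea of collocates concrete, here is a minimal sketch in Python that counts the words appearing within a couple of tokens of a target word. The function name `collocates`, the window size, and the toy text are all assumptions made up for illustration; they are not from COHA or COCA.

```python
import re
from collections import Counter

def collocates(text, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target word itself
                counts[tokens[j]] += 1
    return counts

# Invented toy text, only to show the mechanics.
toy_text = (
    "Pass the salt and pepper. She added salt and pepper to the soup. "
    "The future of work is bright. Salt and vinegar chips are fine too."
)
salt_collocates = collocates(toy_text, "salt", window=2)
print(salt_collocates.most_common(3))
```

Raw counts like these favor function words such as "and", so in practice people usually rank collocates with an association measure (pointwise mutual information, for example) rather than by count alone.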
COUNTS COUNT
DISCUSSIONS, DEMOCRACIES AND DICTIONARIES
What’s going on in Urban Dictionary?
• Identity
• Play
• Politics
KEYWORDS
• What are the words that are most contested?
• How do they change?
• Who controls the future?
• Liberty vs. Freedom
JACK GRIEVE FINDING WOTY’S
• See also http://idibon.com/quantifying-word-year/
• p.s. In my ideal Dictionary of the Future, we understand the geography of how a word is used
MEANING IS IN THE USE
• “For a large class of cases of the employment of the word ‘meaning’ – though not for all – this word can be explained in this way: the meaning of a word is its use in the language” (Wittgenstein, Philosophical Investigations)
MEANING IN THE USE
• Tumblr moms use over 4 times as many [emoji image] and [emoji image] as Twitter peeps
• What are the collocates?
• Blue: his he him
• Purple: she’s she
• No pink heart option!
• See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and http://idibon.com/emomji-emoji-new-moms-use/
CO-OCCURRENCES MATTER (MOVIE REVIEW RATINGS AND WORDS)
• The idea here is that if you’re writing a review and use the word wow, you’re being very positive or very negative. You don’t say Wow, I have a balanced and neutral opinion on this very often.
• If you’re using however, however, you’re likely to be in the middle of your movie review rating or travel summary—not at the very positive/negative extremes.
• See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
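One way to check the wow/however pattern yourself is to compute, for each rating, the share of reviews that contain the word. The sketch below assumes reviews come as (rating, text) pairs; the function name `rating_profile` and the six reviews are invented purely for illustration and are not data from the linked papers.

```python
import re
from collections import Counter

def rating_profile(reviews, word):
    """Return {rating: share of reviews at that rating containing `word`}."""
    totals = Counter()
    hits = Counter()
    for rating, text in reviews:
        totals[rating] += 1
        if word in re.findall(r"[a-z']+", text.lower()):
            hits[rating] += 1
    return {r: hits[r] / totals[r] for r in sorted(totals)}

# Invented toy reviews: (star rating, text).
toy_reviews = [
    (1, "Wow, this was terrible"),
    (1, "Awful and boring"),
    (3, "Good acting, however the plot drags"),
    (3, "Fine, however forgettable"),
    (5, "Wow, just wow"),
    (5, "Loved every minute"),
]
wow_profile = rating_profile(toy_reviews, "wow")
however_profile = rating_profile(toy_reviews, "however")
print(wow_profile)
print(however_profile)
```

With real data you would expect a U shape for wow (high at both extremes) and a hump in the middle for however; the toy data is rigged to show exactly that.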
FOUR CASE STUDIES
• Wholesomeness: http://idibon.com/wholesome-branding-campaign-effectiveness/
• Entrepreneur: http://idibon.com/entrepreneurs-french-spanish-english/
• Because X: http://idibon.com/innovating-innovation/
• #BlackLivesMatter: http://idibon.com/blacklivesmatter-events-change-conversations/
WHOLESOMENESS
http://idibon.com/wholesome-branding-campaign-effectiveness/
BRANDS LOVE WORDS
DEEP HISTORY
• The first uses of wholesome tended to be about
‘virtuous teachings’.
• In Wycliffe’s Bible way back in 1382:
The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3)
(Modern versions treat wordis as ‘words’, ‘teachings’, or
‘instructions’.)
“WHOLESOME” [NOUN] OVER TIME
HOW ABOUT IN SOCIAL MEDIA?
• You have to deal with spam (11% of data in this case; another 36% of data is “Wholesome Radio”, which is probably irrelevant)
• In 2014 tweets:
• Food: 23% (but mostly not about Honey Maid)
• Humans: 23% (and how they can/should live; church-related mentions are prominent)
• Entertainment: 13% (movies, TV)
• Now let’s compare this to 2011 tweet uses:
• Humans: 32%
• Entertainment: 12%
• Food: 9%
WORDS ARE CONTESTED
MORE ON CONTESTED WORDS
• In the next slide, you’ll see an image from Monroe et al. (2008)
• This work starts from something we already know: Republicans and Democrats speak about the same issue differently.
• The image shows methods that can pull out how the parties speak about abortion when they take the floor.
• The words at the top are the Democratic party words; the ones at the bottom are the Republican party words.
• http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
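Monroe et al. walk through several ways of scoring words. One of them, as I understand the paper, is a log-odds ratio smoothed with a Dirichlet prior and then scaled by its approximate variance to give a z-score. The sketch below is a rough version of that idea with a flat prior. The function name `log_odds_z`, the `prior_strength` parameter, and the two toy word lists are assumptions for illustration, not the authors' code or data.

```python
import math
from collections import Counter

def log_odds_z(counts_a, counts_b, prior_strength=0.01):
    """z-scored log-odds of each word in group A vs. group B (flat prior)."""
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = prior_strength * len(vocab)  # total prior mass over the vocabulary
    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + prior_strength  # smoothed count in A
        yb = counts_b.get(w, 0) + prior_strength  # smoothed count in B
        delta = (math.log(ya / (n_a + a0 - ya))
                 - math.log(yb / (n_b + a0 - yb)))
        variance = 1.0 / ya + 1.0 / yb  # approximate variance of delta
        scores[w] = delta / math.sqrt(variance)
    return scores

# Invented toy word lists standing in for two groups of speakers.
group_a = Counter("hella good hella cool the the food".split())
group_b = Counter("wicked good wicked cool the the food".split())
z_scores = log_odds_z(group_a, group_b)
for word, z in sorted(z_scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {z:+.2f}")
```

Positive scores lean toward group A, negative scores toward group B, and words both groups use equally land near zero.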
ENTREPRENEUR
http://idibon.com/entrepreneurs-french-spanish-english/
ENTREPRENEUR IN ENGLISH, FRENCH, SPANISH
• Tycoon, mogul, industrialist
• A flavor of ‘ill-gotten gains’
• Entrepreneur doesn’t seem to have this, at least in English right now
• Collocates have to do with:
• Advice
• Success
• Investors
• Marketing
• Social (media/services/topics/techniques)
• Failure (especially fear-of)
• Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn, Etsy)
• The people using entrepreneur identify themselves as:
• Authors, speakers, writers, bloggers, strategists, (life) coaches, consultants, moms, wives, husbands, fathers, food-lovers, music-lovers
KEY: GET COMPARISON SETS
Group/Context A
Group/Context B
INTERCONNECTED AXES OF DIFFERENCE
• Genre (State of the Union addresses vs. Reddit comments)
• Time (1940s vs. the last ten years)
• Geography (hella vs. wicked)
• Traditional demographics (age, gender, education)
• Personal identity/style (nerd, goth, bro, mom)
INNOVATIONS AND THEIR COMMUNITIES
• Because X’ers disproportionately like:
• YouTube
• Tumblr
• One Direction (especially Harry)
• Justin Bieber
• Ariana Grande
• “bands”
• pizza
• sex
• cats
• books
• They are decidedly less likely to talk about:
• software
• basketball
• NASCAR
• business
• words associated with African-American Vernacular English
THE X IN BECAUSE X
Part of speech               Word counts ≥ 50
Noun (people, spoilers)      32.02%
Compressed clause (ilysm)    21.78%
Adjective (ugly, tired)      16.04%
Interjection (sweg, omg)     14.71%
Agreement (yeah, no)         12.97%
Pronoun (you, me)             2.45%
PART OF SPEECH TAGGERS ARE GOOD
• There’s even a pretty good one for Twitter POS
INNOVATIONS CLUMP
#BLACKLIVESMATTER
http://idibon.com/blacklivesmatter-events-change-conversations/
TOPIC MODELING
• In the previous sections, I’ve been noting what you can do when you have two or more comparison sets:
• How is wholesome used in time x vs. time y vs. time z?
• What are the differences between English speakers talking about entrepreneurship vs. French speakers and Spanish speakers?
• How are people who use the innovative Because X construction different than people who don’t use it?
• In this section, we talk about topic modeling, which is a way to automatically identify clusters within a data set, even if you don’t have a comparison set.
• We’ll use this to explore conversations around #blacklivesmatter, but we’ll also see how these conversations shift before/after a particular moment in time
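To give a feel for what "automatically identify clusters" means, here is a toy sketch of one common topic model, latent Dirichlet allocation (LDA), fit with collapsed Gibbs sampling in plain Python. The slides don't say which topic model was used, so LDA is an assumption here, as are the function name `lda_gibbs`, the hyperparameters, and the four made-up documents. For real work you would use an established library instead.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens per doc/topic
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(row)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given every other assignment.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Invented toy documents, already tokenized.
toy_docs = [
    "pizza cats books pizza cats".split(),
    "cats books pizza books".split(),
    "software business basketball software".split(),
    "business basketball software business".split(),
]
doc_topic, topic_word = lda_gibbs(toy_docs, n_topics=2)
for k, words in enumerate(topic_word):
    print(k, [w for w, c in words.most_common(3) if c > 0])
```

Nothing here tells the model which documents belong together; any grouping it finds comes only from which words co-occur, which is the point of using it when you don't have a comparison set.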
TIME MATTERS
TOPICS (EVEN WHEN YOU DON’T HAVE AN A PRIORI COMPARISON SET)
UNKNOWN UNKNOWNS
• In general, topic modeling is a way of addressing
the limits of our knowledge. If you’re asking a
question about data, you probably know
something about the data going in.
• But what we hear from people is that they are keenly aware that they don’t know what they don’t know.
• Topic modeling is meant to help with that.
• In the next slides, another use of topic modeling:
identifying the themes of Martin Luther King Jr.’s
major speeches and sermons
• Topic modeling Dr. King’s major speeches and sermons gets these topics
• Which change over time
• See also http://idibon.com/topic-detection-mlk/