Quantifying Cultural Change:
An Application to Misogyny in Music
Contribution Statement
Consumer researchers have long been interested in culture and cultural change, but
measurement has proved challenging. It is difficult to acquire data over time, and some things,
particularly relationships between concepts (e.g., brand personality or cultural stereotypes), are
difficult to measure. This paper suggests that an emerging computational linguistics method,
word embeddings, can help address some of these challenges. In particular, we demonstrate how
this approach can be used to address a longstanding question, whether music lyrics are
misogynist (i.e., exhibit a dislike of, contempt for, or ingrained prejudice against women).
Natural language processing of a quarter of a million songs over 50 years provides an empirical
test of whether music is biased against women and how these biases have changed over time.
While both genders are equally likely to be objects of aggression, subtler machine learning
approaches paint a more complex picture. Compared to men, women are less likely to be
associated with desirable traits (i.e., competence). While this bias has decreased, it persists.
Ancillary analyses suggest that lyrics have become less gendered more broadly (though remain
gendered) and that temporal changes may be driven by male artists (as female artists were less
biased initially). Overall, the results shed light on subtle measures of bias and how natural
language processing can provide deeper insight into cultural change.
Abstract
While researchers have long been interested in culture and cultural change, quantification has
proven difficult. Many have argued that music is misogynistic, for example, but is that actually
true? And have any such biases changed over time? Natural language processing of a quarter of a
million songs over 50 years tries to address these questions. While both genders are equally
likely to be objects of aggression, subtler machine learning approaches paint a more complex
picture. Compared to men, women are less likely to be associated with desirable traits (i.e.,
competence). While this bias has decreased, it persists. Ancillary analyses suggest that lyrics
have become less gendered more broadly (though remain gendered) and that temporal changes
may be driven by male artists (as female artists were less biased initially). Overall, the results
shed light on subtle measures of bias and how natural language processing can provide deeper
insight into cultural change.
Keywords: Natural language processing, cultural evolution, music, stereotypes, misogyny.
Across a range of disciplines, researchers have long been interested in cultural change.
Consumer researchers want to understand the diffusion of innovations (Bass 1969), acceptance
of consumption practices (Humphreys 2010), and shifting of brand perceptions and personalities
over time. Management scholars are interested in the adoption of management practices and the
erosion of culinary categories (Rao, Monin, and Durand 2005). Sociologists want to understand
fashion cycles (Lieberson 2000; Simmel 1957), psychologists want to understand the persistence
of stereotypes (Schaller, Conway, and Tanchuk 2002; Schaller and Crandall 2003), and
sociolinguists want to understand the abandonment of dialects (Wolfram, Reaser, and Vaughn
2008).
But while cultural change is of broad interest, quantification is a key challenge.
Acquiring data over time is often difficult. Getting data on the popularity of a given fashion style
or linguistic marker is not trivial. Further, measurement is not always straightforward. It’s one
thing to measure new product adoption, baby name popularity, or the incidence of a particular
term in the news. Such discrete units either increase in usage or do not. But tracking more
complex things like brand personalities or the persistence of stereotypes is more difficult.
Studies usually rely on surveys, which work fine at individual time points, but are tough to
collect over longer time horizons. Further, while researchers can try to manually code archival
data, this method is often subjective and difficult to scale.
We suggest that an emerging machine learning approach may provide a powerful tool to
address some of these challenges. More and more textual data is available in a range of domains,
and natural language processing provides a useful key to unlock a range of insights (Humphreys
and Wang 2018). We examine how word2vec (Mikolov, Sutskever, Chen, Corrado, and Dean
2013), a well-adopted word vector representation technique, can shed light on questions
regarding cultural change that might otherwise be difficult or impossible to address.
In particular, we apply this technique to address an area of longstanding debate:
misogyny in music. While academics and cultural critics have long argued about whether music
lyrics are prejudiced against women (e.g., Cooper, Cooper and Faragher 1985), there have been
few empirical tests. Further, the empirical work that does exist is mostly based on small samples
or one genre over a short period of time (Armstrong 2001; Harding and Nett 1984) and relies on
somewhat subjective ratings.
To address these challenges, we collect more than a quarter of a million songs from six
music genres over more than 50 years. Then, we use word embeddings, as well as some simpler
natural language processing techniques, to investigate potential gender biases and whether they
have changed over time.
We make three main contributions. First, most narrowly, we speak to the longstanding
debate about misogyny in music. We provide the first large-scale analysis in this area, allowing
us to address (1) whether music is misogynist, (2) if this has varied over time, and (3) how this
may vary by genre.
Second, we showcase how an emerging machine learning method can shed light on
cultural change more generally. While our empirical application is misogyny in music, similar
methods could provide insight into changes in brand personality, public perception of politicians,
and the universality of emotions. We illustrate how the approach can be used, and then discuss
broader applications.
Third, we highlight the value of additional computational linguistic techniques for
consumer research. While more and more researchers are starting to use Linguistic Inquiry and
Word Count (LIWC, Pennebaker, Boyd, Jordan, and Blackburn 2015) and similar dictionary-
based methods, other approaches have received less attention. We discuss a number of ways
these approaches can shed light on a range of questions.
MISOGYNY IN MUSIC
Gender bias is pervasive (Carlana 2019; Garg, Schiebinger, Jurafsky, and Zou 2018;
Hamberg 2008; Kesebir 2017; Larivière, Ni, Gingras, Cronin, and Sugimoto 2013; Lundberg and
Stearns 2019; Moss-Racusin, Dovidio, Brescoll, Graham, and Handelsman 2012; Reuben,
Sapienza, and Zingales 2014; Shor, van de Rijt, Miltsov, Kulkarni, and Skiena 2015). Across a
range of disciplines (e.g., business, science, and medicine) and outcomes (e.g., hiring, evaluation,
and recognition), women are often perceived less favorably and treated less fairly. Teachers’
gender stereotypes influence student performance (Carlana 2019), for example, and the same job
applicant is seen as more competent and offered a higher starting salary if they have a male
rather than a female name (Moss-Racusin et al. 2012).
One reason biases may be so sticky is that they are continually reinforced through culture.
Culture is often conceived of as cultural background, and indeed, the particular country or social
group people grow up in has an important impact on attitudes, cognitions, and behaviors (Markus
and Kitayama 1991). Compared to East Asians, for example, Americans tend to prefer more
unique options (Kim and Markus 1999) and are more persuaded by promotion-focused
information (Aaker and Lee 2001).
Beyond cross-cultural differences, though, individual cultural items themselves also
shape attitudes and cognitions (Kashima 2008). Songs, books, and other cultural tastes and
practices not only reflect the setting in which they were produced, but also shape the attitudes
and behaviors of the audiences that consume them (Anderson, Carnagey, and Eubanks 2003;
Berger et al. 2019; Brummert Lennings and Warburton 2011). Song lyrics that are aggressive
towards women, for example, or portray them negatively, increase anti-female attitudes and
misogynous behavior (Fischer and Greitemeyer 2006). Lyrics that espouse equality, however,
can boost attitudes towards women and encourage pro-female behavior (Greitemeyer,
Hollingdale, and Traut-Mattausch 2015). Consequently, one reason stereotypes and biases, as
well as attitudes more generally, may be so persistent is that they are continually reinforced by
the cultural items (e.g., songs, books, and advertisements) that consumers experience on an
everyday basis.
But while such cultural items clearly have impact, their actual nature is less transparent.
Consider music. Are song lyrics biased against women? Do any such biases vary across genres?
And have they changed over time?
Attempts to answer such questions have been hampered by issues of scale and
measurement. While researchers in a range of disciplines have argued about whether music lyrics
are misogynist (i.e., exhibit a dislike of, contempt for, or ingrained prejudice against women),
most perspectives are based on small samples or one genre over a short period of time (Adams
and Fuller 2006). Harding and Nett (1984), for example, examined 40 rock song covers and
Armstrong (2001) examined 13 rap artists. While a couple of papers have examined slightly larger
samples in one genre (e.g., 340 rap songs from 1979 to 1997; Herd 2009) or cross-genre samples
over a short time period (e.g., 600 songs over four years; Flynn, Craig, Anderson, and Holody
2016), without more comprehensive data it is difficult to draw strong conclusions. One genre
may not be representative of music as a whole, and without examining longer time horizons, it is
hard to know whether bias has shifted over time. Truly examining misogyny in music requires
analyzing at least tens of thousands of songs over multiple genres across multiple decades.
Further, even if one were able to compile such a dataset, measuring misogyny would be
challenging. It’s one thing to read and rate the lyrics of a few dozen songs, and multiple research
assistants could even rate a few hundred, but having people rate lyrics for tens of thousands of
songs would be prohibitive.
Finally, because existing analyses rely completely on human judgment, they are
susceptible to bias. The same lyrics, for example, may seem more or less misogynistic depending
on the gender of the person reading them.
THE CURRENT RESEARCH
To address these challenges, we take a different approach. Rather than having individuals
read song lyrics and manually rate them based on how misogynistic they seem, we use
automated textual analysis.
Recent work has begun to highlight the value of using automated textual analysis in
consumer research (Humphreys and Wang 2018; Netzer, Feldman, Goldenberg, and Fresko
2012; Netzer, Lemaire, and Herzenstein 2019; Moore and McFerran 2017; Packard, Moore, and
McFerran 2018; Rocklage and Fazio 2015; Rocklage, Rucker, and Nordgren 2018). Rather than
requiring individuals to manually read through text, these computerized approaches extract a
variety of textual features automatically. Researchers have used these approaches to measure
things like linguistic mimicry in online word of mouth (Moore and McFerran 2017), pronoun use
in customer service calls (Packard et al. 2018), and emotional language in persuasion (Rocklage
et al. 2018). Automated textual analysis is particularly useful because it can parse large quantities
of textual information, relatively quickly, and can do so in an objective manner.
Most research in this area has relied on a class of approaches that can be described as
word extraction. At a basic level, these approaches capture how often a given entity (e.g., word
or phrase) appears in a text. Dictionaries like Linguistic Inquiry and Word Count (LIWC,
Pennebaker et al. 2015), for example, allow researchers to measure how often first-person
pronouns, positive emotion words, or other types of language show up. Tools like Evaluative
Lexicon (Rocklage et al. 2018), Hedonometer (Dodds and Danforth 2010), and VADER (Hutto
and Gilbert 2014) use similar methods and researchers can even create their own customized
dictionaries to use in a given application (Humphreys and Wang 2018).
But while word extraction is useful in a variety of applications, it has some limitations.
In the context of misogyny, for example, one could certainly measure the number of times
certain pejorative terms for women show up in songs over time, but that, by itself, wouldn’t tell
you how those terms are being used. The word b*tch, for example, could be used to refer to a
man, a woman, or potentially even an object.
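To make this limitation concrete, here is a minimal word-extraction sketch (the dictionary term and lyric snippet are invented for illustration): it tallies how often a term appears, but the counts carry no information about who or what the term refers to.

```python
from collections import Counter
import re

def term_counts(text, terms):
    """Count how often each dictionary term appears in a text (word extraction)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {t: counts[t] for t in terms}

# Hypothetical dictionary and lyric snippet, for illustration only.
pejoratives = ["witch"]
lyrics = "She is a witch, he is a witch, the night is a witch"
print(term_counts(lyrics, pejoratives))  # {'witch': 3} -- but the count alone
# cannot tell us whether the term refers to a man, a woman, or an object.
```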
Some work by marketing scholars has begun to leverage another class of approaches
called topic extraction. Similar to how factor analysis might be used to identify underlying
themes among survey items, tools like topic modeling (e.g., Latent Dirichlet Allocation or LDA,
Blei, Ng, and Jordan 2003) can identify the general topics or themes discussed in a body of text,
as well as the words that make up those themes. Country songs, for example, talk a lot about the
themes of “girls and cars” (e.g., words like car, drive, girl, and kiss) and “uncertain love” (e.g.,
words like love, ain’t, and can’t, Berger and Packard 2018. Toubia, Iyengar, Bunnell, and
Lemaire (2019) use a similar approach to find the main themes discussed in movies.
But while topic extraction is useful for extracting themes from a large number of
documents, it still doesn’t say much about the relationship between entities: whether women are
talked about differently than men, for example, or whether certain brands are often described as
having particular traits.
For these types of questions, an emerging computational linguistic approach called word
embeddings is particularly useful (Devlin, Chang, Lee, and Toutanova 2018; Mikolov et al.
2013; Pennington, Socher, and Manning 2014). As discussed in more detail in the methods, this
machine learning framework represents each word by a vector, and uses a high-dimensional
space to map each word or entity based on the other words with which it frequently appears. The
relationship between these vectors then captures the semantic relationship between words.
Researchers can use this space to understand the relationship between words and the context in
which they are used.
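A toy sketch makes the geometry concrete. The two-dimensional coordinates below are invented for illustration (real models learn hundreds of dimensions from co-occurrence statistics), but they show how vector similarity captures semantic relatedness:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 for related words."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented 2-D vectors: related words are given nearby coordinates.
vec = {"apple": (0.9, 0.1), "orange": (0.8, 0.2), "truck": (0.1, 0.9)}
assert cosine_similarity(vec["apple"], vec["orange"]) > cosine_similarity(vec["apple"], vec["truck"])
```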
We use word embeddings to examine whether (and how) men and women are talked about
differently in music and whether any such biases may have shifted over time. We start by
compiling a dataset of more than a quarter of a million songs from six music genres over more
than 50 years, orders of magnitude larger than any previous dataset used to examine misogyny.
Next, we use several natural language processing approaches to examine potential biases.
Before turning to the more complex word-embeddings approach, we begin with a simpler
analysis. Since some work has measured misogyny using aggression towards women (Fischer
and Greitemeyer 2006) we start by identifying verbs that express aggression (e.g., hate) and
measuring the degree to which each gender is the recipient of aggressive thoughts and actions.
Gender bias can also be more subtle. Compared to men, for example, women are often
described as warm (e.g., kind and supportive), but less likely to be described as competent (e.g.,
smart and ambitious, Fiske, Cuddy, Glick, and Xu 2002). Consequently, we use word
embeddings to measure this subtler form of misogyny. Not whether lyrics are explicitly
aggressive towards women, but whether women are less likely to be linked with desirable traits.
Finally, to begin to shed light on what might be driving any observed patterns, we
examine artist gender. We examine whether the prevalence of female artists has shifted over
time, whether male and female artists use different language, and how any shifts in their
language are related to overall changes in misogyny.
We close with a broader discussion of how word embeddings, and natural language
processing more generally, may be used to inform a wide range of consumer research questions.
EMPIRICAL ANALYSIS OF A QUARTER OF A MILLION SONGS
Data
First, we compiled songs and their lyrics. Given copyright issues, there are limited open-
access lyrics datasets, so we compiled information from different places. We started by scraping
the Billboard website, pulling all songs on each of the major charts (i.e., pop, rock, country,
R&B, dance, and rap) every 3 months from 1965 (or whenever a given chart started) to 2018. We
matched these songs with lyrics from SongLyrics.com. To collect additional songs, we gathered
all the songs and their lyrics from datasets on kaggle.com (Agarwal 2017; Kuznetsov 2017;
Mishra 2017) and scraped all available songs and their lyrics from a major lyrics website. Then,
we used the million songs dataset (Mahieux, Ellis, Whitman, and Lamere 2011) and freeDB
(“FreeDB” 2018) to append year and genre for any song that did not already include this
information. We removed any song with missing information and used artist and title
combination to remove duplicates in the data.
The final dataset includes 258,937 songs from 1965 to 2018, spanning the pop, rock,
country, R&B, dance, electronic, rap, and hip-hop genres. Given data sparsity, and the similarity
between the genres, we combine rap with hip-hop and dance with electronic.
Second, we cleaned the lyrics. Since we are focused on English language lyrics, we used
the langdetect package (version 1.0.7) in Python to remove any non-English lyrics. We also
removed non-informative texts in brackets, such as [Verse 1] or [Pre-Chorus 1]. Finally, we
shifted all lyrics to lower case so that variations of the same word are treated similarly (e.g., Man
and man).
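The bracket-removal and lowercasing steps can be sketched with the standard library (the language-detection step, which uses the langdetect package as described above, is omitted here):

```python
import re

def clean_lyrics(text):
    """Drop bracketed section markers like [Verse 1], collapse whitespace, lowercase."""
    text = re.sub(r"\[[^\]]*\]", " ", text)   # remove [Verse 1], [Pre-Chorus 1], ...
    text = re.sub(r"\s+", " ", text).strip()  # tidy the gaps left behind
    return text.lower()

print(clean_lyrics("[Verse 1] Man and man"))  # man and man
```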
Third, to identify words associated with different genders, we rely on word lists from
prior work. To identify words related to men (e.g., he and him) and women (e.g., she and her) we
used word lists from Garg et al. (2018). While LIWC (Pennebaker et al. 2015) includes a wider
list of words (e.g., godmother, schoolgirl, grandson, and son-in-law), many show up quite
infrequently in the data, so averaging across them would lead to inaccurate estimates.
Consequently, we used Garg et al.’s (2018) more focused list.
Data Preparation. To ensure we have enough data in each period to train the models, we
use 5-year time buckets. Bucket structure was selected to ensure enough songs per data point to
train the models (including at least half a million words), as well as enough overall data points to
examine changes over time.
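Assigning songs to 5-year buckets is a one-line computation; a sketch (assuming buckets start at 1965, the first year in the data, and are labeled by their first year):

```python
def bucket(year, start=1965, width=5):
    """Assign a release year to a 5-year bucket, labeled by the bucket's first year."""
    return start + ((year - start) // width) * width

print(bucket(1967))  # 1965
print(bucket(2018))  # 2015
```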
One question is how to balance songs from different genres. Some genres happen to have
greater representation in the dataset, and we address this in two ways. First, to ensure equal
representation from each genre, most of our analyses use under-sampling, a common machine
learning practice when datasets include imbalanced class labels. Assume data from three genres
g1, g2, and g3, where g1 has the smallest number of songs. To generate a balanced sample for the
cross-genre analyses, we randomly select |g1| songs from genres g2 and g3 and perform the
analyses on that sample. We repeat this sampling and analysis 100 times, averaging across the
runs. Substantial numbers of dance and rap songs do not enter the data until 1980 and 1990,
respectively, so we only include them (and sample accordingly) after those time points.
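A single under-sampling draw can be sketched as follows (genre labels and counts are hypothetical); in the procedure above, this draw is repeated 100 times and the analyses are averaged across draws:

```python
import random

def undersample(songs_by_genre, rng):
    """Draw an equal number of songs per genre: the size of the smallest genre."""
    n = min(len(s) for s in songs_by_genre.values())
    return {g: rng.sample(s, n) for g, s in songs_by_genre.items()}

# Hypothetical imbalanced genre counts.
data = {"pop": list(range(100)), "rock": list(range(60)), "country": list(range(30))}
balanced = undersample(data, random.Random(0))
print({g: len(s) for g, s in balanced.items()})  # every genre contributes 30 songs
```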
Second, one could argue that certain genres are less popular and so equal weighting does
not actually reflect reality. To address this possibility, as a robustness check we collect data on
genre popularity over time and weight the genres by these estimates. As shown in the Robustness
Checks section, results remain the same.
Methods and Results
Methods for Object of Aggression. Some work has measured misogyny using aggression
towards women (Fischer and Greitemeyer 2006) so we start by identifying verbs that express
aggression (e.g., hate) and measuring the degree to which women and men are the recipient of
aggressive thoughts and actions. In phrases like “I hit her”, for example, “her” is the recipient
(i.e., object) of the aggressive verb “hit.” To identify verbs commonly used for expressing
aggression (e.g., hate and burn) we use the Violence Vocabulary word list
(https://myvocabulary.com/word-list/violence-vocabulary/). We use the spaCy package (version
2.0.18) in Python to extract word dependencies in sentences and calculate the frequency of each
gender being used as the object of an aggressive verb.
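The counting step can be sketched without a parser: assume each sentence has already been reduced to (verb, object) dependency pairs (in practice spaCy's dependency parse supplies these), and the word lists below are short hypothetical excerpts of the full lists:

```python
AGGRESSIVE_VERBS = {"hit", "hate", "burn"}  # excerpt of the violence vocabulary
FEMALE = {"she", "her", "woman"}            # excerpt of the female word list
MALE = {"he", "him", "man"}                 # excerpt of the male word list

def aggression_object_counts(verb_object_pairs):
    """Count how often each gender appears as the object of an aggressive verb."""
    counts = {"female": 0, "male": 0}
    for verb, obj in verb_object_pairs:
        if verb in AGGRESSIVE_VERBS:
            if obj in FEMALE:
                counts["female"] += 1
            elif obj in MALE:
                counts["male"] += 1
    return counts

# Hypothetical parsed pairs, e.g. from "I hit her", "I kissed him", "I hate him".
pairs = [("hit", "her"), ("kissed", "him"), ("hate", "him")]
print(aggression_object_counts(pairs))  # {'female': 1, 'male': 1}
```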
Results for Object of Aggression. While women are more likely to be recipients of
aggression in songs today than in the 1960s (β = 1.45, p < 0.001), this ignores the possibility that
aggressive acts could have generally increased over time and affected both genders equally.
Indeed, men are also more likely to be recipients of aggression in recent years (β = 1.70, p =
0.07). Looking over time shows that women (47.6%) and men (52.4%) are recipients of
aggression a similar amount overall (χ2(1) = 2.50, p = 0.11) and that this does not change
significantly over the years (β = -0.0026, p = 0.86).
One could wonder whether the results are somehow driven by the specific way in which
the outcome measure was calculated, so to test robustness, we also use an alternate approach. We
calculate the difference between women and men being the recipient of aggression while controlling
for how often they are the recipient of any thought or action (e.g., “I kissed him” or “I kissed her”).
Consistent with the main analysis, however, this difference between the ratios does not change
over time (β = -0.0001, p = 0.77).
Even misogynists might not explicitly sing about wanting to hurt women, however, for
fear that it would restrict their audience. More generally, if misogyny does exist it may be more
implicit.
To address these limitations, we next use a state-of-the-art machine learning approach to
examine a subtler form of misogyny. Not whether lyrics are explicitly aggressive towards
women, but whether women are less likely to be linked with desirable traits.
Methods for Gender-Trait Association. Competence and warmth are two universal
dimensions of social cognition (Fiske, Cuddy, and Glick 2007), but while women are often
described as warm (e.g., kind and supportive), they are less often described as competent (e.g.,
smart and ambitious).
We use word2vec (Mikolov et al. 2013), a well-adopted word vector representation
technique, to quantify how likely women (relative to men) are to be linked to competence, as
well as other traits (i.e., intelligence, warmth, masculine stereotypes, and feminine stereotypes),
over time.1
Word2vec is a two-layer neural network that receives a corpus of text and transforms it
into numerical vectors. This approach assigns each word a high-dimensional vector such that the
relationship between vectors captures the semantic relationship between the words. Words that
relate to one another such as fruit (e.g., apple and orange) or vehicles (e.g., car and truck) appear
close together, but words that are not as related (e.g., apple and truck) appear further apart.
To train word embeddings, word2vec takes into account word co-occurrence, distance,
and occurrence in similar contexts. If “dogs are smart” shows up frequently in a body of text, for
1 We also tested several other well-known methods on a random small subset of our data and found that word2vec was most appropriate. We compared truncated SVD (a.k.a. latent semantic analysis), Positive Pointwise Mutual Information (PPMI), GloVe (Pennington et al. 2014), and word2vec. The preliminary results and the consistency of word2vec results, along with previous research (Sahlgren and Lenci 2016), showed that word2vec using CBOW is the appropriate method for our question.
example, that would shrink the relative distance between the vector for dog and the vector for
smart. Beyond just incidence, though, the distance between occurrences of words also
matters. Even though both “dogs are smart” and “dogs love running and are also smart” contain
the words dog and smart, the first example has them closer together, which decreases the
vectors’ distance in word2vec space. Note that two words do not necessarily have to co-occur;
occurring in similar contexts also shapes similarity. If “dogs are animals” and “animals are
smart” both appear frequently, it decreases the distance between the vectors for “dogs” and
“smart” even if the two words never appear together.
The difference between such vectors has been shown to be a powerful tool for studying
language and human perceptions about the relationship between words. Bhatia (2017), for
example, demonstrated that vectors are closer together for words that are similar or used in
similar contexts. Consequently, recent work has begun to use similar methods to measure
stereotypes. Work using the Google Books corpus (Lin, Michel, Aiden, Orwant, Brockman, and
Petrov 2012), for example, found that word embedding associations (i.e., the distance between
male and female words and target words) captured human ratings of whether those target words
were more associated with men or women (r > 0.76, p < .001, Kozlowski, Taddy, and Evans
2019). Other research using that corpus (Garg et al. 2018) found that word embedding
associations between words related to women and different occupation words tracked the
percentage of women in each of those occupations over time (r = 0.70, p < .001). Consequently,
a great deal of research in a variety of contexts has shown that word embeddings are a powerful
method for capturing people’s perceptions of the relationship between words, and those
relationships over time.
We use a similar approach to measure gender bias in music. We quantify the relationship
between words related to each gender, and key dimensions of interest (e.g., competence), over
time. First, we used word lists from prior work (Garg et al. 2018) to identify words related to
men (e.g., “man,” “men,” and “he”) and women (e.g., “woman,” “women,” and “she”; see table A1
for full list). Second, we use word2vec to train a word embedding model per time-bucket and
generate 300-dimensional real-valued word vectors. That means for each word (e.g., woman) and
period in the dataset (e.g., 1965-1970), we use the set of song lyrics over that period to position
the vector for that word. Third, we take the word vector representation of words in our gender
word lists and average the vectors to obtain a single 300-dimensional vector per gender.
Fourth, we use a similar approach to obtain vectors for key dimensions of interest (e.g.,
competence). In the case of competence, for example, we use a list of words from prior work
(Nicolas, Bai, and Fiske 2019) that are known to relate to competence (e.g., smart, persistent, and
knowledgeable, see table A1 for more examples).
Fifth, to measure the association, we capture the similarity between gender vectors and
key dimension vectors (e.g., competence) at any given period in time. We use the cosine
similarity metric because it is frequently used in the literature and in the original word2vec article
(Mikolov et al. 2013). If a gender word and a competence word occur in similar contexts, for
example, their vector representations are close in the word2vec space, which results in a larger
cosine similarity score and indicates that the two words are highly associated. Conversely, if the
words do not occur in similar contexts, the cosine similarity is smaller, and the words are
considered less associated.
To give a sense of how this approach works, Figure 1 provides a simplified version. We
only show two dimensions for the sake of illustration, but words are represented with 300-
dimensional vectors in our analyses.
FIGURE 1. ILLUSTRATION OF THE APPROACH IN A TWO-DIMENSIONAL VECTOR SPACE
As shown in the figure, semantically similar words (e.g., the group related to men or the
group related to women) appear close to one another. As discussed, we create a single vector for
each gender and compare how close it is to words representing key dimensions of interest (e.g.,
the vector for the word “successful” which relates to competence). The figure shows that
compared to the female vector, the male vector is closer to “successful,” indicating that
successful is more associated with men.
Formally, to calculate the similarity of dimension d (e.g., competence) to gender g, we
calculate the cosine similarity of each word vector in W_d to the gender vector V_g, where W_d is a
matrix of n by 300, n being the number of words in dimension d and 300 the vector size.2 To
calculate how biased dimensions are, for each dimension word we subtract the cosine similarity
between the dimension word vector and the vector representing female from the cosine similarity
between the dimension word vector and the vector representing male. Positive values indicate
that the dimension word is more associated with males; negative values indicate the opposite. This
2 While we create a vector for gender words following Garg et al. (2018), we do not do so for competence because, while gender words all capture similar semantics, competence words may not. Stubborn and ambitious are both on the competence list, for example, but have very different meanings. Averaging over gender word vectors gives us a vector representing that gender, but averaging over tens of dimension words would add noise.
difference score also removes any general time trends that affect both genders equally (e.g.,
maybe more recent songs talk about competence more in general):
For d ∈ {competence, intelligence, warmth, masculine stereotype, feminine stereotype}:

\[
\mathrm{bias}_d = \frac{1}{n} \sum_{i=1}^{n} \Big( \mathrm{cosine\ similarity}\big(\vec{W}_{d_i}, \vec{V}_{male}\big) - \mathrm{cosine\ similarity}\big(\vec{W}_{d_i}, \vec{V}_{female}\big) \Big)
\]
In addition to prior work, the Validation section below demonstrates that such difference scores
capture gender biases.3
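Given trained word vectors, the difference score reduces to a few lines. The sketch below uses toy 3-dimensional vectors invented for illustration (the actual model uses 300-dimensional vectors learned per time bucket):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(vectors):
    """Average word vectors into a single gender vector."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]

def bias(dimension_words, male_words, female_words, vec):
    """bias_d: mean over dimension words of cos(w, V_male) - cos(w, V_female)."""
    v_male = mean_vector([vec[w] for w in male_words])
    v_female = mean_vector([vec[w] for w in female_words])
    diffs = [cosine_similarity(vec[w], v_male) - cosine_similarity(vec[w], v_female)
             for w in dimension_words]
    return sum(diffs) / len(diffs)

# Toy embedding: "smart" is placed nearer the male words, so the bias is positive.
vec = {"he": [1.0, 0.1, 0.0], "him": [0.9, 0.2, 0.0],
       "she": [0.1, 1.0, 0.0], "her": [0.2, 0.9, 0.0],
       "smart": [0.8, 0.3, 0.1]}
b = bias(["smart"], ["he", "him"], ["she", "her"], vec)
print(round(b, 3) > 0)  # True: "smart" is more associated with the male vector
```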
Finally, we analyze whether the associations change over time. We use linear mixed
effect models with time as the independent variable and bias as the dependent variable. Given
that association with gender may vary across words, we add a random effect for words. We run
separate linear mixed effect models for individual dimensions (e.g., competence) and genres
(e.g., rock).4
Results for Gender-Trait Association. Results suggest that while lyrics have generally
become less misogynistic over time, they remain biased.
3 Note that we do not analyze changes in the popularity of competence words over time, but how connected competence words are to each gender. Rather than looking at whether more recent songs mention the word “smart”, for example, we are interested in whether, when songs use the word smart, they are using it to refer more to men or women.

4 One might wonder why we did not use word embeddings for the aggressive term analyses. Given that an aggressive act has both a subject and an object (e.g., “she hit him” or “he hit her”), being closer to aggression by itself does not distinguish whether women are the subject or object of such acts. Consequently, looking at the subject or object directly seemed more appropriate.
FIGURE 2. ASSOCIATION BETWEEN GENDER AND COMPETENCE
Note: Dashed line represents equal association with either gender and values greater (less) than 0 indicate the dimension is more associated with men (women). Grey regions represent 95% confidence intervals around the estimates for each period.
Words related to competence (e.g., smart) have become more associated with women,
from strongly biased towards men to less so (β = -0.53, p < 0.001, β2 = 0.54, p < 0.001, figure
2). That said, gains level off in the late 1990s, and confidence intervals do not overlap with zero
in recent years suggesting that competence remains associated with men. Results remain similar
through a number of robustness checks and words related to intelligence (e.g., precocious and
inquisitive, Garg et al., 2018) show similar results (β = -0.40, p < 0.01, β2 = -0.59, p < 0.001,
figure A5).
One could wonder whether the leveling off is driven solely by the introduction of new
genres. Rap and dance show up later in the dataset, for example, so if they are more misogynistic
than other genres, that could lead to a leveling off. Ancillary analyses (Web Appendix),
however, show that this alone does not explain the pattern. While the introduction of rap and
dance in the 1980s contributed, even focusing on genres that existed previously (i.e., pop, rock,
and country) shows similar patterns. More generally, examining different genres separately (web
appendix) suggests that while competence has become more associated with women in pop,
rock, and country, such associations have not changed in R&B and dance. In rap lyrics,
competence has become less associated with women.
For completeness, we also perform the same type of analysis for warmth related words
(e.g., kind, friendly, and caring, Nicolas et al. 2019, see table A1 for more examples). Results
indicate that warmth has become less uniquely associated with women (β = 0.86, p < 0.001, β2 =
-0.53, p < 0.001, figure 3).
FIGURE 3. ASSOCIATION BETWEEN GENDER AND WARMTH, MASCULINE, AND FEMININE STEREOTYPES.
Note: Dashed line represents equal association with either gender and values greater (less) than 0 indicate the dimension is more associated with men (women). Grey regions represent 95% confidence intervals around the estimates for each period.
Further analysis of words associated with masculine stereotypes (e.g., leader and
ambitious) and feminine stereotypes (e.g., cooperate and trustworthy, Gaucher, Friesen, and Kay
2011, see table A1 for full lists) suggests that lyrics may have become somewhat less gendered
overall (figure 3).5 Words related to masculine stereotypes have become less associated with men
5 One might wonder whether lyrics becoming less gendered (i.e., less associated with either gender) is good. While decreased gendered association is clearly good in some situations (e.g., competence moving from being more associated with men to less so), in others it is less obviously beneficial (e.g., warmth becoming less associated with women). Thus, whether a shift is perceived as good or bad seems to depend less on the change itself and more on whether the trait is seen as positive and was originally associated with women. Consequently, we use the term gendered rather than biased to remain agnostic about the valence of the change.
(β = -0.69, p < 0.001, β2 = 0.41, p < 0.001) and words related to feminine stereotypes have
become less associated with women (β = 0.17, p = 0.18, β2 = 0.26, p = 0.03). For both warmth
and masculine stereotypes, though, gains level off in the early 1990s (see web appendix for more
details).
Artist Gender Analysis. One might wonder what is driving the observed changes. Are
the linguistic shifts similar for male and female artists, for example? Or might they simply reflect
the inclusion of more female artists over time who may use less misogynistic and gendered
language?
To attempt to at least begin to address these questions, we analyze artist gender. To infer
artists’ gender, we use the gender_guesser package (version 0.4.0) in Python along with a national
name list (Kaggle 2017). For individual artists (e.g., Katy Perry or Billy Joel) we simply use their
name, but for artists that go by pseudonyms (e.g., Snoop Dogg) or bands that have multiple
members (e.g., Metallica or the Supremes) we first use their Wikipedia page to extract the
individual name(s). In cases where no such page is available, research assistants searched for this
information manually. We then use the same approach as for individual artists to detect gender and assign each band a score based on the percentage of members who are female. Gender could be assigned for 99.5% of the artists.6
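The band-level scoring can be sketched as follows. The lookup table below is a toy stand-in for the gender_guesser package and the Kaggle name list; the names and mapping are purely illustrative.

```python
# Hypothetical name-to-gender table standing in for gender_guesser + name list
NAME_GENDER = {"katy": "female", "billy": "male", "diana": "female",
               "mary": "female", "florence": "female", "james": "male"}

def first_name(full_name):
    return full_name.split()[0].lower()

def band_gender_score(member_names):
    """Share of members inferred to be female (None if all names unknown)."""
    genders = [NAME_GENDER.get(first_name(n), "unknown") for n in member_names]
    known = [g for g in genders if g != "unknown"]
    if not known:
        return None  # such cases were resolved manually by research assistants
    return sum(g == "female" for g in known) / len(known)

# Solo artists reduce to a 0/1 score; bands get the female-member share
solo = band_gender_score(["Billy Joel"])                                 # 0.0
supremes = band_gender_score(["Diana Ross", "Mary Wilson",
                              "Florence Ballard"])                       # 1.0
```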
Results cast doubt on the notion that the results are driven by more female artists over
time. There is no change in the percentage of female artists over time (23% female artists, β = -
0.0001, p = 0.82). Further, while one could argue that this is driven by new genres emerging,
even within genres there is no significant change (β pop = 0.0037, p = 0.06, all other genres p >
6 Note that there are three to four times more male artists in the dataset (figure S8), so the effects for male language may be more precisely estimated. Further, while there are enough solo female artists and all-female groups, and solo male artists and all-male groups, to estimate effects for each, it is more difficult to do so for mixed-gender bands: less than 10% of songs were performed by mixed-gender groups, which provides limited data for estimating language changes.
0.2).7 This casts doubt on the possibility that shifts in language are driven by more female artists
over time.
Instead, results are more consistent with a shift in male artists’ language. For male
artists, words related to competence (β = -0.0004, p < 0.001), and masculine stereotypes (β = -
0.0017, p < 0.01) become more associated with women while warmth related words become
more associated with men (β = 0.0002, p < 0.001). Words related to feminine stereotypes show
no change (β = -0.0006, p = 0.37).
While female artists’ language shows less evidence of change (βcompetence = 0.0001, p =
0.35; βmasculine stereotypes = 0.0006, p = 0.31; β feminine stereotypes = 0.0012, p = 0.1; βwarmth = 0.0003, p <
0.01), deeper examination suggests that this may be driven by the fact that female artists’
language was less misogynistic to begin with (figure 4 and table A4).
FIGURE 4. LINGUISTIC DIFFERENCES BASED ON ARTISTS’ GENDER.
7 Rock and rap seem to have a lower percentage of female artists than other genres (pop = 37%, rock = 13%, country = 26%, R&B = 26%, dance = 23%, rap = 17%).
Note: Dashed line represents equal association with either gender and values greater (less) than 0 indicate the dimension is more associated with men (women). Grey regions represent 95% confidence intervals around the estimates for each period.
Female artists are more likely to associate women with competence overall (βgender = -
0.0144, p < 0.001), and the gender x time interaction (β = 0.0002, p < 0.001) indicates that the
gender difference has decreased over time as male artists started associating women more with
competence. The same holds for masculine stereotypes (βgender = -0.0586, p < 0.001, βgender x time =
0.0013, p < 0.01). Female artists also associate warmth less with women (βgender = 0.0082, p <
0.001) though the gender difference has not closed over time (βgender x time = 0.0000, p = 0.8).
There were no significant effects for feminine stereotypes (βgender = -0.0273, p = 0.08; βgender x time
= 0.0008, p = 0.11). Analysis of the most recent time period, however, suggests that artists of
both genders remain biased. Lyrics associate men more with competence and masculine
stereotypes, and women more with warmth.
One might wonder whether individual artists’ (e.g., U2) language has changed, but the
mechanics of word embeddings makes answering such questions challenging. One would need
thousands of songs at different time points to make comparisons over time. While this is true
across artists, no individual artist has that many songs.
Validation
As noted above, an emerging body of research (e.g., Garg et al. 2018; Kozlowski et al.
2019) demonstrates that word embeddings can be used to accurately capture gender biases. That
said, to ensure that word embeddings are capturing misogyny and gender stereotypes in our
context, we conducted additional validation checks. First, we examined the relationship between
Kozlowski et al.’s (2019) human scores and the word embedding bias in our model. The two are strongly correlated (r = 0.63, p < .001).
Second, we asked three RAs to code competence words (e.g., think and knowledge) based on how stereotypically masculine or feminine they were, on a scale from -2 (a very feminine word) to 2 (a very masculine word). We then compared these scores to the embedding
bias at the most recent time point (i.e., 2015-2018). Results demonstrate that the embedding bias
is related to human-coded stereotype scores (r = 0.41, p < .001).
Third, following Garg et al. (2018), we calculated the correlation between word
embedding bias in our models and human-coded stereotype scores used by Garg et al. (2018) at
two time points (1977 and 1990). Since we use 5-year time buckets, we trained models for the 1975-1979 and 1990-1994 periods, respectively. Results revealed a significant correlation between
word embedding bias and human-coded gender stereotype scores for both time periods (r = 0.36,
p < .001 and r = 0.18, p < .05).
Taken together, the combination of prior work, as well as our own additional empirical
tests support the notion that word embeddings are accurately capturing differences in gender
associations.
We also provide a more general test that our word embeddings are accurately capturing
the semantic relationship between words. As mentioned earlier, words that are similar or are used
in similar contexts, have been shown to have closer word vectors in the word embeddings space
(Bhatia 2017). We find the same thing in our data. Following Hamilton, Leskovec, and Jurafsky
(2016) we use a standard similarity task benchmark called MEN (Bruni, Boleda, Baroni, and
Tran 2012). The MEN dataset includes human judgments of the similarity of 3,000 word pairs. We
calculate the similarity score between these word pairs in our model. Then, we measure the
correlation between human-coded similarity scores and model-calculated similarity scores. The
average correlations for the cross-genre and within-genre analyses are 0.47 (p < .001) and 0.37 (p < .001), respectively. These significant positive correlations show that the word associations in our trained models are good representations of the word associations humans perceive in language.
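This benchmark validation can be sketched as follows, using invented toy vectors and invented human similarity scores in place of the MEN data, purely to show the computation: cosine similarity per word pair, then a correlation with the human judgments.

```python
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors and hypothetical human similarity judgments (not MEN data)
vecs = {"car": np.array([1.0, 0.0]), "automobile": np.array([0.95, 0.1]),
        "banana": np.array([0.0, 1.0]), "king": np.array([0.8, 0.6]),
        "queen": np.array([0.7, 0.7])}
pairs = [("car", "automobile", 48), ("car", "banana", 5),
         ("king", "queen", 40), ("king", "banana", 3)]

human = np.array([h for _, _, h in pairs], dtype=float)
model = np.array([cos_sim(vecs[a], vecs[b]) for a, b, _ in pairs])
r = float(np.corrcoef(human, model)[0, 1])
# r is strongly positive: model similarities track the human judgments
```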
Robustness Checks
While the results suggest that misogyny has decreased, one could wonder whether this
simply reflects the increasing complexity of song lyrics. This argument would suggest that while
song lyrics in the 1970s were rather simplistic (e.g., Let It Be), in 2010, song lyrics may be much
more complex and varied and this would somehow lead to our effect. But while such an
argument might suggest that the increased complexity would lead to an increased distance
between men and competence words, as well as women and competence words, it says less about
why the relative distance between women and men with competence words would decrease. Said
another way, it would suggest that the distance between both male and female words and
competence would increase, but complexity alone has difficulty explaining why the increase
would be greater for one gender than the other.
That said, additional empirical tests cast doubt on this alternative. The main results
persist even when controlling for any such shifts in language complexity over time. To attempt to
control for the possibility that lyrics may have become more complex, we measure how the
distance amongst words within each of the gender word lists (e.g., words related to men like
“man,” “he,” and “his”) has changed over time. This captures whether the similarity of even
related words (“man” and “he,” for example, or “woman” and “she”) has shifted over time. We
then take the average of the similarity among words related to men (e.g., “man,” “he,” and “his”) and the similarity among words related to women (e.g., “woman,” “she,” and “her”) and normalize
our main competence measure by this average. Results are quite similar. Competence words
have become more associated with women over time, though they remain more associated with
men. Further, these gains have leveled off and even reversed as of late.
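This complexity control can be sketched as follows. It is a minimal version assuming vectors come as a word-to-array mapping; the toy 2-d vectors stand in for trained embeddings.

```python
import numpy as np
from itertools import combinations

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def within_list_similarity(model, words):
    """Average pairwise cosine similarity among words in one gender list."""
    present = [w for w in words if w in model]
    return float(np.mean([cos_sim(model[a], model[b])
                          for a, b in combinations(present, 2)]))

def normalized_bias(raw_bias, model, male_words, female_words):
    # Normalize the bias score by the mean within-list similarity of the
    # two gender lists, controlling for overall drift in word similarity
    denom = (within_list_similarity(model, male_words)
             + within_list_similarity(model, female_words)) / 2
    return raw_bias / denom

toy = {"man": np.array([1.0, 0.0]), "he": np.array([1.0, 0.1]),
       "woman": np.array([0.0, 1.0]), "she": np.array([0.1, 1.0])}
adjusted = normalized_bias(0.2, toy, ["man", "he"], ["woman", "she"])
```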
We also conducted a number of robustness checks to see whether the results are robust to
the songs, words, and method used.
Songs Used. Given that more popular songs may be more likely to have their lyrics
available, the dataset, especially in earlier years, may include more popular songs. That said,
because more popular songs are listened to more anyway, they should have a greater impact on
the population.
To ensure that equally weighting genres is not driving the results, we weight the genres
based on popularity. Billboard Hot 100 charts are the only consistent audience consumption data
available from 1960s to today, so we rely on genre popularity scores collected from Billboard by
Beckwith (2016), indicating the percentage of songs in Hot 100 charts from each genre in each
time period. We weight things accordingly. For example, assume in 2000, country, rock, and pop
songs occupy 25%, 25%, and 50% of the Billboard charts respectively. If we have 1000 songs
from each genre in our original dataset, we select 1000 pop songs, and 500 songs from rock and
country genres (total of 2000 songs). The number of songs for 1965-70 year-bucket did not meet
our threshold of half a million words and was dropped for the genre popularity analysis.
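The popularity weighting can be sketched with the worked example from the text: a genre's sample size is scaled by its share of the Billboard Hot 100, keeping the most popular genre at full size.

```python
def weighted_counts(genre_share, songs_per_genre):
    """Scale each genre's song count by its chart share, relative to the
    most popular genre (which keeps its full sample)."""
    top = max(genre_share.values())
    return {g: round(songs_per_genre * share / top)
            for g, share in genre_share.items()}

# Shares from the hypothetical year-2000 example: 50% pop, 25% rock, 25% country
counts = weighted_counts({"pop": 0.50, "rock": 0.25, "country": 0.25}, 1000)
# counts: {"pop": 1000, "rock": 500, "country": 500}, i.e., 2000 songs total
```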
Results are similar to the main analyses. Aggressive verbs were equally likely to be directed at both genders (toward women 49% of the time and toward men 51% of the time, χ2(1) = 2.0, p = 0.2), and this did not change significantly over the years (β = -0.0029, p = 0.74). Words related to competence (β = -0.66, p < 0.001, β2 = 0.58, p < 0.001) become less associated with men over time, as do intelligence-related words (β = -0.47, p < 0.001, β2 = 0.45, p < 0.001). Warmth has become
less associated with women (β = 0.69, p < 0.001, β2 = -0.16, p = 0.25), masculine stereotypes
have become less associated with men (β = -0.47, p < 0.01, β2 = 0.34, p = 0.01), and feminine
stereotypes are more equivocal (β = -0.09, p = 0.38, β2 = 0.23, p = 0.03).
Words Used. To further test robustness, we consider the words used in our analyses. First,
as in previous research (Mikolov et al. 2013), we drop words with frequency smaller than 5 in
each year-bucket and repeat our analyses. Results remain generally similar. Words related to
competence become less misogynistic (β = -0.13, p = 0.07, β2 = 0.27, p < 0.001), with a turnaround
in recent years. Warmth (β = 0.30, p < 0.001, β2 = -0.04, p = 0.63) and feminine stereotypes (β =
0.29, p < 0.001, β2 = 0.09, p = 0.22) become less associated with women and masculine
stereotypes become less associated with men (β = -0.12, p = 0.07, β2 = 0.24, p < 0.001).
Second, we only consider words which have occurred at every time point. This is a restrictive rule that guarantees the same predictors (i.e., words) are compared over the years. Results
remain similar. Words related to competence become less associated with men (β = -0.16, p =
0.04, β2 = 0.15, p = 0.05). Warmth (β = 0.36, p < 0.001, β2 = -0.03, p = 0.69) and feminine
stereotypes (β = 0.24, p < 0.01, β2 = 0.15, p = 0.07) become less associated with women, and
masculine stereotypes become less associated with men (β = -0.21, p < 0.01, β2 = 0.21, p <
0.01).
Methods Used. As noted previously, we tested several other well-known methods such as
truncated SVD (a.k.a. latent semantic analysis), Positive Pointwise Mutual Information
(PPMI), and GloVe (Pennington et al. 2014), and find similar results.
We also considered several alternate analyses. First, we considered testing whether there
is a shift in how men and women appear as subjects and objects. The subject of a sentence has an
active role as the agent of action (e.g., the word she in “she called him”), while the object takes a
more passive role as the recipient of action (e.g., the word him in “she called him”). We could
thus measure how often men and women appear as subjects (e.g., she and he) and objects (e.g.,
her and him) over time. The link between this and misogyny, however, is not completely
straightforward. Women could be the subject of a sentence, for example, but still talked about
negatively (e.g., she is dumb).
Second, we tried to examine changes in derogatory words (e.g., b*tch) over time, but
measurement proved challenging. For example, while “b*tch” originally referred to women, its usage shifted to cover both genders. Thus, it is difficult to accurately capture shifts in
reference to women.
GENERAL DISCUSSION
There is longstanding interest in culture and cultural dynamics, both within consumer
behavior, and across other disciplines more broadly. But measurement has been a key challenge.
It’s one thing to measure brand personality, perceptions of cultural issues, or stereotypes at a
particular time point, but tracking these aspects over time becomes challenging. Surveys are
difficult to use over long time horizons, and manual coding of archival data is difficult to scale.
We suggest that an emerging method from computational linguistics may help address
some of these challenges. To illustrate how this approach might be used, we applied it to the
question of misogyny in music: whether lyrics are misogynistic, whether this has changed over
time, and whether this varies across genres.
Analyzing over a quarter of a million songs over more than 50 years begins to shed light
on these questions. While there is little evidence of explicit misogyny (e.g., greater aggression
towards women), subtler forms of misogyny suggest a more complex picture. Lyrics have
become less biased against women over time, but bias remains. Women are more likely to be
associated with competence and intelligence than they were in the past, but these concepts are
still more associated with men. Lyrics have become less gendered more broadly, but they remain
gendered. Further, reductions in bias seem to have slowed and may even have reversed in some
cases. Deeper examination suggests that this is driven by both new genres (e.g., rap) and changes
in older ones (e.g., country). Finally, ancillary analyses suggest that rather than being driven by
more female artists, these results are driven by male artists shifting their language over time.
Researchers and cultural critics alike have long debated whether popular music is
misogynistic. These results not only provide empirical evidence, but allow for quantitative
comparison of where (i.e., which genres and artists) and when (i.e., time periods) such biases
may be larger.
The results also highlight the importance of considering subtler measures of bias. Across
disciplines, there is increased interest in more implicit measures of attitudes and biases (Payne,
Vuletich, and Brown-Iannuzzi 2019) and language provides another useful approach. Like a
fingerprint, words reflect things about the people and contexts that generate them (Pennebaker,
Mehl, and Niederhoffer 2003). Consequently, language can be a useful barometer in a wide
range of domains. Future work might examine whether the language used in films, for example,
has changed over time, and whether there have been shifts in the way women and minorities are
discussed.
Broader Applications
While our empirical work focused on misogyny in music, word embeddings can be used
to shed light on a variety of other questions.
To study brand personality, for example, researchers often ask people to fill out surveys
reporting whether particular brands are more or less exciting or sincere. While this works
reasonably well when looking at brand personality at one, or even a couple points in time,
measuring brand personality over a longer horizon becomes more challenging. Collecting survey
data over decades is often prohibitive, and even if prior researchers collected some related data,
they often used different measures. But word embeddings can help address these challenges.
Rather than asking people for their perceptions, researchers can use existing textual data (e.g.,
social media or online reviews) to automatically measure semantic relationships. Just as we
measured the relationship between words related to women and competence, researchers could
extract the distance between brands and each of the key personality dimensions, and do so over
time. Similar approaches could be used to track consumer attitudes towards social issues or
political candidates. The Google Books dataset (Lin et al. 2012) and New York Times corpus
(Sandhaus 2008) are valuable data sources for researchers interested in even longer time
horizons.
Word embeddings could also be used to address the ongoing debate in the emotions
literature about the universality of emotions. While some (e.g., Ekman 1992) have argued that
there are a discrete number of universal emotions that are consistent across cultures (e.g., anger,
sadness, and disgust), others have suggested that the meanings of emotions are more malleable
(Jack, Garrod, Yu, Caldara, and Schyns 2012). By examining language corpora from different
cultures, researchers could use word embeddings to test whether the same words really have
similar semantic meanings: whether anger is used to talk about the same things in Spanish, for example, as it is in English.
Researchers studying goals, motivation, and self-regulation may also find word
embeddings useful. Work on desire and desire regulation (Hofmann, Vohs, and Baumeister 2012),
for example, notes that “the majority of research on self-regulation occurs in the laboratory” and
that “little is known about what types of urges are felt strongly (or only weakly), what urges
conflict with other goals, and how successfully people resist their urges” (p. 582). Word
embeddings, topic modeling, and similar approaches can allow researchers to examine these
questions in more naturalistic settings. Social media, blog posts, and even the Google Books
dataset could be used to examine the urges people have, how they conflict, and even how these
urges may have varied over time. What people want and feel like they should do today (e.g.,
balance work and family or consume less social media), for example, is likely very different
today than it was 20 years ago.
Related approaches may also be valuable for looking within cultural items themselves.
Researchers have long theorized about the nature of narrative. Books, movies, and songs, but
also online content, speeches, and even academic papers usually have beginnings, middles, and
ends, through which there is some sort of narrative flow. Why are some narratives more
engaging and impactful than others?
One possibility is how the plot or narrative develops or unfolds. Some narratives move
very quickly, while others plod along at a slower pace. Some repeatedly tread new ground while
others circle back to the same themes again and again. Natural language processing can be used
to measure the geometry of narrative and whether it shapes success.
Word embeddings examine the semantic relationship between words, but related
approaches have applied these ideas to larger chunks of linguistic information. Just as word2vec
represents words as a vector in a multidimensional space, techniques like Doc2vec (Le and
Mikolov 2014) and USE (Cer, Yang, Kong, Hua, Limtiaco, John, and Sung 2018), do something
similar for sentences or larger chunks of text. This would allow researchers to test, for example,
whether narratives that move at a faster pace (i.e., adjoining chunks are more distant from one
another) are more or less successful.
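As a toy illustration of this idea (not the authors' implementation), one could embed successive chunks of a text and average the distances between adjacent chunks; here simple bag-of-words count vectors stand in for Doc2vec or USE embeddings.

```python
import numpy as np
from collections import Counter

def narrative_pace(chunks):
    """Average distance between vectors of adjacent text chunks."""
    vocab = sorted(set(" ".join(chunks).split()))
    # Bag-of-words count vectors as a crude stand-in for learned embeddings
    vecs = [np.array([Counter(c.split())[w] for w in vocab], dtype=float)
            for c in chunks]
    dists = [np.linalg.norm(a - b) for a, b in zip(vecs, vecs[1:])]
    return float(np.mean(dists))

# A repetitive lyric "moves" less than one that keeps changing topic
slow = narrative_pace(["love love love", "love love love", "love love love"])
fast = narrative_pace(["love is here", "war came fast", "peace at last"])
```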
Research could also examine the relationship between cultural items. Whether certain
songs, baby names, or other cultural items become popular depends not only on the features of
those items themselves, but how similar those items are to the larger cultural context in which
they are embedded. Songs whose lyrics are more atypical, or more differentiated from their
genre, for example, are more popular on the Billboard charts (Berger and Packard 2018). Similar
results might hold for movies or even academic papers. Papers whose themes are quite
differentiated from the journals they are published in, for example, may be more novel and thus
cited more. Alternatively, one could argue that if papers are too different, they may be cited less.
Topic modeling approaches may be even more useful than word embeddings here, because they
can capture underlying thematic content of cultural items.
Limitations
As with many projects involving field data, this work has a number of limitations. First,
one could wonder whether the dataset is representative of the music people are actually exposed
to. At a genre level, the fact that the results hold whether we equally weight each genre, or
weight them by popularity, casts doubt on the notion that genre weighting is driving the results.
At a song level, by collecting the Billboard charts each year, we certainly have a representative
sample of popular songs. We also sampled less popular songs, but it is harder to be certain about
whether that sample is representative because there is no complete list of songs out there. Given
we collected all available sources on songs that included lyrics, genre, and year information,
however, and the dataset is orders of magnitude larger than what has been used in prior work,
hopefully it is closer to being representative. Further, given that the population is exposed to
popular songs much more frequently than unpopular ones, we believe we have a reasonable
sample of what the population was exposed to.
Second, the results are certainly shaped by the word lists used. We relied on lists
developed by prior work, shown to tap the key constructs of interest, but one could still have
additional questions. In our competence (and warmth) dictionary, there are words with positive
relationship with competence (e.g., “smart” with weight = 1) and words with negative
relationship (e.g., “dumb” with weight = -1). That said, one could wonder about negations (e.g.,
“women are not smart”). While negations are not directly taken into account, if some lyrics say
“women are not smart,” it is likely that the same lyrics or other lyrics also say “women are
dumb.” Given enough representative data, there will be several other instances of “smart,”
“dumb,” and “women” appearing in different contexts which will shape the meaning and final
positioning of the words. As a result, “women” and “smart” will not end up close
together just because of a handful of “women are not smart” instances.
There are alternate ways one could imagine measuring misogyny. As noted previously,
misogyny is defined as a dislike of, contempt for, or ingrained prejudice against women. This
could be operationalized in a multitude of different ways. Indeed, looking at prior empirical work
shows that different papers often look at completely different things. Work has looked at
everything from the presence of violence (Herd 2009) and discussion of murder and assault
(Armstrong 2001), to whether women are talked about as objects (Harding and Nett 1984), and
as evil, as sex objects, or as physically attractive (Cooper et al. 1985). Consequently, it is
unlikely that any one paper could look at all aspects of misogyny. That said, as noted above, we
tried a number of different ways including object of aggression, association with desirable traits,
and, in ancillary analyses, the use of derogatory words as well as how often men and women
appear as subjects and objects. Future work could consider additional directions.
Finally, while automated textual analysis has some benefits, it also has some clear
weaknesses. While it provides consistency across texts, and scalability across large datasets, we
do not mean to suggest that automated methods are somehow better than more manual ones.
Close reading allows researchers to pick up nuances that automated methods might miss. Indeed,
before zeroing in on which aspects to focus on in this investigation, we manually read through
dozens of songs to get a sense of how language was used and how it discussed men and women.
In contrasting the differences between more automated and more manual approaches, Hart
(2001) uses the metaphor of trying to understand a city by walking the streets or viewing it from
a helicopter. Both provide different, but useful pictures of what is going on. Consequently,
automated and manual methods can be complementary, and often provide different viewpoints to
the same problem.
Conclusion
In conclusion, word embeddings and other computational linguistic techniques provide a
powerful toolkit to study culture. Researchers have long been interested in quantifying cultural
dynamics, but measurement has been a key challenge. Natural language processing, however,
provides a reliable method of extracting features, and doing so at scale. These emerging
approaches have the potential to shed light on a range of interesting questions related to
consumer behavior.
REFERENCES
Aaker, J. L., Lee, A. Y. (2001). “I” seek pleasures and “we” avoid pains: The role of self-regulatory goals in information processing and persuasion. Journal of Consumer Research, 28(1), 33–49.
Adams, T. M., Fuller, D. B. (2006). The words have changed but the ideology remains the same: Misogynistic lyrics in rap music. Journal of Black Studies, 36(6), 938–957.
Agarwal, S. (2017). Every song you have heard (almost)! Retrieved December 1, 2018, from https://www.kaggle.com/artimous/every-song-you-have-heard-almost
Anderson, C. A., Carnagey, N. L. and Eubanks, J. (2003). Exposure to violent media: The effects of songs with violent lyrics on aggressive thoughts and feelings. Journal of Personality and Social Psychology, 84(5), 960–971.
Armstrong, E. G. (2001). Gangsta misogyny: A content analysis of the portrayals of violence against women in rap music, 1987–1993. Journal of Criminal Justice and Popular Culture, 8(2), 96–126.
Bass, F. (1969). A new product growth for model consumer durables. Management Science, 15(5), 215-227.
Beckwith, J. (2016). The evolution of music genre popularity. Retrieved July 7, 2019, from http://thedataface.com/2016/09/culture/genre-lifecycles
Berger, J., Humphreys, A., Ludwig, S., Moe, W. W., Netzer, O. and Schweidel, D. A. (2019). Uniting the tribes: Using text for marketing insight. Journal of Marketing, 84(1), 1–25.
Berger, J., Packard, G. (2018). Are atypical things more popular? Psychological Science, 29(7), 1178–1184.
Bhatia, S. (2017). Associative judgment and vector space semantics. Psychological Review, 124(1), 1.
Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
Brummert Lennings, H. I. and Warburton, W. A. (2011). The effect of auditory versus visual violent media exposure on aggressive behaviour: the role of song lyrics, video clips and musical tone. Journal of Experimental Social Psychology, 47(4), 794-799.
Bruni, E., Boleda, G., Baroni, M. and Tran, N.-K. (2012). Distributional semantics in technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, 1, 136–145.
Carlana, M. (2019). Implicit Stereotypes: Evidence from Teachers’ Gender Bias*. The Quarterly Journal of Economics, 134(3), 1163–1224.
Cer, D., Yang, Y., Kong, S. Y., Hua, N., Limtiaco, N., John, R. S., ... & Sung, Y. H. (2018). Universal sentence encoder. arXiv preprint arXiv:1803.11175.
Cooper, C.L., Cooper, R. and Faragher, B. (1985), Stress and life event methodology. Stress Med., 1, 287-289.
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. ArXiv Preprint ArXiv:1810.04805.
Dodds, P. S., Danforth, C. M. (2010). Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. Journal of Happiness Studies, 11(4), 441–456.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3-4), 169–200.
Fischer, P., Greitemeyer, T. (2006). Music and aggression: The impact of sexual-aggressive song lyrics on aggression-related thoughts, emotions, and behavior toward the same and the opposite sex. Personality and Social Psychology Bulletin, 32(9), 1165–1176.
Fiske, S. T., Cuddy, A. J. C. and Glick, P. (2007). Universal dimensions of social cognition: Warmth and competence. Trends in Cognitive Sciences, 11(2), 77–83.
Fiske, S. T., Cuddy, A. J. C., Glick, P. and Xu, J. (2002). A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition. Journal of Personality and Social Psychology, 82(6), 878–902.
Flynn, M. A., Craig, C. M., Anderson, C. N. and Holody, K. J. (2016). Objectification in popular music lyrics: An examination of gender and genre differences. Sex Roles, 75(3-4), 164–176.
FreeDB. (2018). Retrieved May 10, 2019, from http://www.freedb.org/en/
Garg, N., Schiebinger, L., Jurafsky, D. and Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635–E3644.
Gaucher, D., Friesen, J. and Kay, A. C. (2011). Evidence that gendered wording in job advertisements exists and sustains gender inequality. Journal of Personality and Social Psychology, 101(1), 109.
Greitemeyer, T., Hollingdale, J. and Traut-Mattausch, E. (2015). Changing the track in music and misogyny: Listening to music with pro-equality lyrics improves attitudes and behavior toward women. Psychology of Popular Media Culture, 4(1), 56–67.
Hamberg, K. (2008). Gender bias in medicine. Women's Health, 4(3), 237–243.
Hamilton, W. L., Leskovec, J. and Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. ArXiv Preprint ArXiv:1605.09096.
Harding, D. and Nett, E. (1984). Women and rock music. Atlantis: Critical Studies in Gender, Culture & Social Justice, 10(1), 60–76.
Hart, R. P. (2001). Theory, method, and practice in computer content analysis. Westport: Ablex Publishing.
Herd, D. (2009). Changing images of violence in rap music lyrics: 1979–1997. Journal of Public Health Policy, 30(4), 395–406.
Hofmann, W., Vohs, K. D. and Baumeister, R. F. (2012). What people desire, feel conflicted about, and try to resist in everyday life. Psychological Science, 23(6), 582–588.
Humphreys, A. (2010). Semiotic structure and the legitimation of consumption practices: The case of casino gambling. Journal of Consumer Research, 37(3), 490–510.
Humphreys, A., Wang, R. J. H. (2018). Automated text analysis for consumer research. Journal of Consumer Research, 44(6), 1274–1306.
Hutto, C. J., Gilbert, E. (2014). Vader: A parsimonious rule-based model for sentiment analysis of social media text. Eighth International AAAI Conference on Weblogs and Social Media.
Jack, R. E., Garrod, O. G. B., Yu, H., Caldara, R. and Schyns, P. G. (2012). Facial expressions of
emotion are not culturally universal. Proceedings of the National Academy of Sciences, 109(19), 7241–7244.
Kaggle. (2017). US Baby Names, Version 2. Retrieved May 10, 2019, from https://www.kaggle.com/kaggle/us-baby-names
Kashima, Y. (2008). A social psychology of cultural dynamics: Examining how cultures are formed, maintained, and transformed. Social and Personality Psychology Compass, 2(1), 107–120.
Kesebir, S. (2017). Word order denotes relevance differences: The case of conjoined phrases with lexical gender. Journal of Personality and Social Psychology, 113(2), 262–279.
Kim, H., Markus, H. R. (1999). Deviance or uniqueness, harmony or conformity? A cultural analysis. Journal of Personality and Social Psychology, 77(4), 785–800.
Kozlowski, A. C., Taddy, M. and Evans, J. A. (2019). The geometry of culture: Analyzing the meanings of class through word embeddings. American Sociological Review, 84(5), 905–949.
Kuznetsov, S. (2017). 55000+ Song Lyrics. Retrieved December 1, 2018, from https://www.kaggle.com/mousehead/songlyrics
Larivière, V., Ni, C., Gingras, Y., Cronin, B. and Sugimoto, C. R. (2013). Bibliometrics: Global gender disparities in science. Nature, 504(7479), 211–213.
Le, Q., Mikolov, T. (2014). Distributed representations of sentences and documents. In International conference on machine learning, 1188–1196.
Lieberson, S. (2000). A matter of taste: How names, fashions, and culture change. New Haven: Yale University Press.
Lin, Y., Michel, J.-B., Aiden, E. L., Orwant, J., Brockman, W. and Petrov, S. (2012). Syntactic annotations for the google books ngram corpus. Proceedings of the ACL 2012 system demonstrations, 169–174.
Lundberg, S., Stearns, J. (2019). Women in economics: Stalled progress. Journal of Economic Perspectives, 33(1), 3–22.
Mahieux, T. B., Ellis, D. P. W., Whitman, B. and Lamere, P. (2011). The million song dataset. ISMIR-11.
Markus, H. R., Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98(2), 224.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems, 3111–3119.
Mishra, G. (2017). 380,000+ lyrics from MetroLyrics. Retrieved December 1, 2018, from https://www.kaggle.com/gyani95/380000-lyrics-from-metrolyrics
Moore, S. G., McFerran, B. (2017). She said, she said: Differential interpersonal similarities predict unique linguistic mimicry in online word of mouth. Journal of the Association for Consumer Research, 2(2), 229–245.
Moss-Racusin, C., Dovidio, J., Brescoll, V., Graham, M. and Handelsman, J. (2012). Science faculty's subtle gender biases favor male students. Proceedings of the National Academy of Sciences, 109(41), 16474–16479.
Netzer, O., Feldman, R., Goldenberg, J. and Fresko, M. (2012). Mine your own business: Market-structure surveillance through text mining. Marketing Science, 31(3), 521–543.
Netzer, O., Lemaire, A. and Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of Marketing Research, 56(6), 960–980.
Nicolas, G., Bai, X. and Fiske, S. T. (2019). Development and validation of comprehensive stereotype content dictionaries through an automated method.
Packard, G., Moore, S. G. and McFerran, B. (2018). (I'm) happy to help (you): The impact of personal pronoun use in customer–firm interactions. Journal of Marketing Research, 55(4), 541–555.
Payne, B. K., Vuletich, H. A. and Brown-Iannuzzi, J. L. (2019). Historical roots of implicit bias in slavery. Proceedings of the National Academy of Sciences, 116(24), 11693–11698.
Pennebaker, J. W., Boyd, R. L., Jordan, K. and Blackburn, K. (2015). The development and psychometric properties of LIWC2015. Austin, TX: University of Texas at Austin.
Pennebaker, J. W., Mehl, M. R. and Niederhoffer, K. G. (2003). Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54(1), 547–577.
Pennington, J., Socher, R. and Manning, C. (2014). Glove: Global vectors for word representation. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543.
Rao, H., Monin, P. and Durand, R. (2005). Border Crossing: Bricolage and the Erosion of Categorical Boundaries in French Gastronomy. American Sociological Review, 70(6), 968–991.
Reuben, E., Sapienza, P. and Zingales, L. (2014). How stereotypes impair women's careers in science. Proceedings of the National Academy of Sciences, 111(12), 4403–4408.
Rocklage, M., Fazio, R. (2015). The evaluative lexicon: Adjective use as a means of assessing and distinguishing attitude valence, extremity, and emotionality. Journal of Experimental Social Psychology, 56, 214–227.
Rocklage, M., Rucker, D. and Nordgren, L. (2018). The evaluative lexicon 2.0: The measurement of emotionality, extremity, and valence in language. Behavior Research Methods, 50(4), 1327–1344.
Sahlgren, M., Lenci, A. (2016). The effects of data size and frequency range on distributional semantic models. ArXiv Preprint ArXiv:1609.08293.
Sandhaus, E. (2008). The new york times annotated corpus. Linguistic Data Consortium, Philadelphia, 6(12), e26752.
Schaller, M., Conway, L. G. III and Tanchuk, T. L. (2002). Selective pressures on the once and future contents of ethnic stereotypes: Effects of the communicability of traits. Journal of Personality and Social Psychology, 82(6), 861–877.
Schaller, M., Crandall, C. S. (2003). The psychological foundations of culture. Mahwah, NJ: Lawrence Erlbaum Associates.
Shor, E., van de Rijt, A., Miltsov, A., Kulkarni, V. and Skiena, S. (2015). A paper ceiling: Explaining the persistent underrepresentation of women in printed news. American Sociological Review, 80(5), 960–984.
Simmel, G. (1957). Fashion. American Journal of Sociology, 62(6), 541–558.
Toubia, O., Iyengar, G., Bunnell, R. and Lemaire, A. (2018). Extracting features of entertainment products: A guided latent dirichlet allocation approach informed by the psychology of media consumption. Journal of Marketing Research, 56(1), 18–36.
Wolfram, W., Reaser, J. and Vaughn, C. (2008). Operationalizing linguistic gratuity: From principle to practice. Language and Linguistics Compass, 2, 1109–1134.