
Source: martinschweinberger.de/docs/articles/msch-ext-handout... (2012/10/24)


  • 1

    Martin Schweinberger http://martinschweinberger.de Universität Hamburg Email: [email protected] Freie Universität Berlin Email: [email protected]

    Particles and Priming - Combining sociolinguistic and psycholinguistic determinants of variation

    1. Setting the stage & Research question(s)

    Sociolinguistics

    - Branch of linguistics that investigates the correlation between social factors and language use, e.g. it looks at differences between the language use of women and men or between young speakers and old speakers.

    Psycholinguistics

    - Branch of linguistics that focuses on the relationship between the brain and language production, comprehension and processing. Psycholinguists investigate the cognitive underpinnings of language and analyze to what extent language use can be explained by mental processes, e.g. processing difficulty, priming, etc.

    Q1: Is our understanding of processes of language change and variation enhanced if we include psycholinguistic factors in sociolinguistic analyses of linguistic variation?

    Q2: To what extent do advanced statistical methods (mixed-effects regression models) improve our understanding of determinants of language change and variation (compared to conventional statistics, e.g. fixed-effects regressions, GoldVarb)?

    2. Theoretical framework

    Modern sociolinguistic theory

    - Most of the linguistic changes in progress studied in the 2nd half of the 20th century show a high degree of gender differentiation and social stratification (Labov 1994, 2001, 2010).


  • 2

    - Linguistic variation is not random, but correlates systematically with certain social factors (age, gender, class, ethnicity, etc.).

    - In fact, the systematicity extends beyond the correlation between social variables and language use: there are also recurring patterns with regard to which groups lead linguistic change and which lag behind.

    Systematic and recurring patterns in language change

    - Adolescent, lower middle-class females are typically the leaders of change.

    - In cases of change, female speakers are typically one generation ahead of male speakers.

    - Males exhibit higher rates of non-standard features while female speakers are more sensitive to stigmatization.

    - Successful changes typically spread from the center outward, i.e. from the middle classes, not from top to bottom or vice versa.

    - Our current models of language variation and change have the advantage that they …

      - are based on many studies (highly stable)

      - have high predictive and explanatory power

    - But these models also have shortcomings: they …

      - neglect language contact and multilingualism

      - do not take psycholinguistic factors into account

      - rely on outdated statistical methods (GoldVarb, fixed-effects models)

    - Sociolinguists are aware of these shortcomings and have begun to look at priming (persistence) (cf. Szmrecsanyi 2006).

    - In addition, classic fixed-effects (GoldVarb) regression analyses have been criticized as likely to produce unreliable results (Johnson 2009).

    - The question is to what degree the strong social stratification of language use is an artifact of inappropriate statistics and of ignoring psycholinguistic factors.

  • 3

                 | Traditional sociolinguistics              | Modern sociolinguistics | Tomorrow’s sociolinguistics
    Methodology  | Frequency analysis & bivariate statistics | Multivariate statistics | Sophisticated statistical modelling
                 | (non-)parametric tests                    | GoldVarb analysis       | Advanced statistics (R)
                 | χ2-test, t-test, Wilcoxon Sign-Rank test  | Logistic Regression     | Generalized linear (mixed-effects) models, PCA, MDS
    Proponents   | Trudgill                                  | Labov                   | Szmrecsanyi
                 | Chambers                                  | Rickford                | ?
                 | Preston                                   | Tagliamonte             | ?

    Table 1: Methodological evolution of (Variationist) Sociolinguistics.

    3. What is priming (or persistence)?

    - Priming refers to the tendency of a speaker to reuse linguistic material which occurred in the previous discourse, i.e. previous exposure to a linguistic element will make it more likely, all other things being equal, that the same element will be used again. (cf. Szmrecsanyi 2006:2)

    - For example, if someone says “hello” when greeting someone else, it is more likely that the other speaker will also say “hello” rather than “hi” or “How are you”.

    - The idea is that once the neuronal network for a certain linguistic expression is activated (by hearing that expression or using it actively), the activation first peaks and then starts to decay but the decay will take a while.

    - If an appropriate chance to use that expression comes up again before the activation of the neuronal network has fully decayed, the “primed” expression is more likely to be used again because it needs less effort to reach the activation threshold.

  • 4

    Figure 1: Decay of neuronal activity (priming effect) after activation event at time t0.
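
    The decay described above can be sketched numerically. The following Python sketch assumes, purely for illustration, that activation falls off exponentially once it has peaked; the exponential form, the `half_life` and `threshold` parameters and both function names are assumptions and are not taken from the handout, which leaves the exact decay rate open.

```python
def activation(t, peak=1.0, half_life=10.0):
    """Residual activation t units (e.g. words) after the peak.

    Assumption: exponential decay with a fixed half-life. The handout
    only says that activation peaks and then decays gradually.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return peak * 0.5 ** (t / half_life)


def effort_to_threshold(t, threshold=1.0, **kwargs):
    """Extra activation still needed to reach the threshold at time t."""
    return max(0.0, threshold - activation(t, **kwargs))


if __name__ == "__main__":
    # The longer ago the prime, the more effort is needed to reuse the form.
    for t in (0, 10, 20, 40):
        print(t, round(activation(t), 3), round(effort_to_threshold(t), 3))
```

    Under these assumptions a form that was primed recently needs less additional activation than one primed long ago, which is the intuition behind the priming effect.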

    4. The phenomenon

    Pragmatic particles

    - Not verb + particle constructions like get up

    - Subtype of discourse markers

    - Prototypical examples

    - sort of, kind of, like, so, you know

    - Syntactically optional

    - Semantically (rather) empty

    - Pragmatic and/or social meaning (multifunctional)

    - Do not change truth conditions of a proposition

  • 5

    eh in New Zealand English: Examples

    (1) ICE-NZ:S1A-002#B: remind me to ring up early in the morning not early about er … what time does the …
        ICE-NZ:S1A-002#B: oh be up there at ten eh
        ICE-NZ:S1A-002#Q: i'm out of here at nine

    (2) ICE-NZ:S1A-004#M: they're frightened that they might stop them sitting states
        ICE-NZ:S1A-004#M: it's a really bad buzz eh
        ICE-NZ:S1A-004#G: oh yeah yeah

    (3) ICE-NZ:S1A-004#M: er she's working with piki teaching
        ICE-NZ:S1A-004#G: oh is she
        ICE-NZ:S1A-004#M: cos you know her eh
        ICE-NZ:S1A-004#G: marie yeah i know her

    eh in New Zealand English: Properties

    - Typically in turn-final position (non-polar tag)

    - Heavily stereotyped and negatively evaluated in NZE (vernacular, not standard NZE)

    - Used more by Maori men than by Maori women or Pakehas (British/European New Zealanders)

    - Young Pakeha women, though, seem to be the next highest users of eh.

    - Typical feature of working-class speech (Stubbe & Holmes 1995:84)

    - Functions as an in-group signal of ethnic identity for these speakers. (Meyerhoff 1994:371)

    5. Data

    International Corpus of English (ICE)

    - ICE New Zealand

    - Most informal register (S1A): face-to-face conversation, telephone calls (highest frequency of non-standard and discourse features)

  • 6

    Summary Statistics

    Table 2: Survey of the data of this study.

                Sex
                male      female     Total
    speakers    86        144        230
    files       ---       ---        100
    turns       11,821    19,394     31,215
    words       89,935    164,695    254,630
    eh          185       257        442
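
    Because men and women contribute very different amounts of speech, the raw counts in Table 2 are easier to compare once they are normalized. The sketch below computes the frequency of eh per 1,000 words from the figures in Table 2; the normalization base of 1,000 words and the names `TABLE_2` and `per_thousand_words` are choices made here for illustration, not something the handout specifies.

```python
# Counts copied from Table 2 (ICE New Zealand, S1A files).
TABLE_2 = {
    "male": {"words": 89_935, "eh": 185},
    "female": {"words": 164_695, "eh": 257},
    "total": {"words": 254_630, "eh": 442},
}


def per_thousand_words(hits, words):
    """Relative frequency of a feature per 1,000 words."""
    return 1000 * hits / words


rates = {
    group: per_thousand_words(counts["eh"], counts["words"])
    for group, counts in TABLE_2.items()
}

for group, rate in rates.items():
    print(f"{group}: {rate:.2f} eh per 1,000 words")
```

    On these counts male speakers produce roughly 2.06 and female speakers roughly 1.56 instances of eh per 1,000 words. This is a bivariate summary only and does not control for age or ethnicity.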

    Data Plotting

    Figure 2: Distribution of eh over AGE and SEX.

  • 7

    Figure 3: Boxplot showing the number of words between prime eh (in previous context) and target eh (at the end of the respective turn) (left boxplot) and the number of words preceding the end of the turn if no prime occurs divided by 2 (control group) (right boxplot). The difference is statistically significant; significance testing was performed by a one-sided t-test for independent samples without equal variance.
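
    A t-test for independent samples without the equal-variance assumption is commonly known as Welch's t-test. The sketch below shows how its test statistic and degrees of freedom are computed, using only the Python standard library; the two samples are made-up word distances, not the study's data, and the function name `welch_t` is an assumption. Turning the statistic into a one-sided p-value additionally requires the t distribution, which the standard library does not provide.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors, based on the sample variances (n - 1 denominator).
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical word distances, NOT the study's data.
primed = [1, 2, 3, 4, 5]
control = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(primed, control)
print(round(t_stat, 3), round(dof, 3))
```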

  • 8

    Figure 4: Histograms and density plots showing the distribution of eh across speakers (top) and files (bottom).

    Table 3: Description of variables, their coding, and their levels.

    Dependent variable
    HIT         | nominal     | yes/no occurrence of turn-final eh

    Independent variables
    FILE        | categorical | File header
    SPEAKER     | categorical | Speaker ID
    AGE         | numeric     | Age groups in ascending order
    SEX         | nominal     | Male/female
    ETHNICITY   | categorical | Pakeha/Maori/other
    PRIMING     | nominal     | Yes/no occurrence of eh within last 30 words before turn
    OCCUPATION  | categorical | Unskilled manual labor (UML), skilled manual labor (SML), clerical (CLE), managerial (MAN), professional (PRO)
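
    The coding of HIT and PRIMING in Table 3 can be illustrated with a small Python sketch. It assumes that the 30-word window is counted over all words spoken before the turn within the same file, regardless of speaker, and that tokens are separated by whitespace; both points, as well as the function name `code_turns`, are assumptions made here and may differ from the author's annotated R code.

```python
def code_turns(turns, window=30, particle="eh"):
    """Code each turn for HIT and PRIMING.

    turns: list of (speaker, text) pairs in conversational order.
    HIT: the turn ends in the particle.
    PRIMING: the particle occurs among the last `window` words spoken
    before the turn starts (by any speaker; this is an assumption).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    rows = []
    context = []  # all words uttered before the current turn
    for speaker, text in turns:
        words = text.lower().split()
        hit = bool(words) and words[-1] == particle
        primed = particle in context[-window:]
        rows.append({"speaker": speaker, "hit": hit, "priming": primed})
        context.extend(words)
    return rows


# Example (3) from the handout, with the corpus prefixes stripped.
example = [
    ("M", "er she's working with piki teaching"),
    ("G", "oh is she"),
    ("M", "cos you know her eh"),
    ("G", "marie yeah i know her"),
]
coded = code_turns(example)
for row in coded:
    print(row)
```

    In this example only the third turn is a hit, and only the fourth turn counts as primed, because it is the only one preceded by an eh within the window.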

  • 9

    6. Statistical designs

    What is a Regression?

    - A regression or regression model is a statistical method which tests whether there is a meaningful relationship between some phenomenon (dependent variable) and various factors (independent variables).

    - For instance, a regression can tell us whether a person's monthly income (dependent variable) is influenced by her or his level of education (independent variable), i.e. whether people who have attended university earn, on average, more than people who have not.

    - If there is a meaningful relationship between the dependent and the independent variable(s), this relationship is called “significant”, and the so-called “effect size” tells us how strong it is.

    - For example, having A-levels (Abitur) correlates significantly with attending a university, and this “predictor” (having A-levels) has a substantial effect size, i.e. if you have your A-levels, it is very likely that you will also have attended university.

    - By way of illustration, having a beard is significantly correlated with being male, but since not all men have beards, the effect size of the predictor (having a beard) is high but not perfect.

    Figure 5: Schema depicting the main difference between fixed-effects and mixed-effects regression models.
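
    In formula form, the difference between the two model types can be sketched as follows. The notation is an assumption introduced here for illustration: i indexes turns, x_{ik} are the predictors from Table 3, s(i) is the speaker who produced turn i, and logit(p) = ln(p / (1 - p)). Whether the author's models used random intercepts for speakers, files, or both is not stated in this handout, so the speaker intercept below is only one plausible instance.

```latex
% Fixed-effects logistic regression: one intercept shared by all turns
\operatorname{logit} P(\mathrm{HIT}_i = 1) = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}

% Mixed-effects logistic regression: each speaker s gets an own intercept adjustment u_s
\operatorname{logit} P(\mathrm{HIT}_i = 1) = \beta_0 + u_{s(i)} + \sum_{k=1}^{K} \beta_k x_{ik},
\qquad u_s \sim \mathcal{N}(0, \sigma_u^2)
```

    The random term u_s absorbs speaker-specific tendencies, so that a few speakers who use eh very often cannot masquerade as an effect of their sex, age or ethnicity.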

  • 10

    Specifics of the statistical modeling

    - The statistical analyses were performed by using R.

    - The classic logistic regression model was validated using bootstrapping, and a correction for ‘overfitting’ was performed based on AIC comparisons. The correction was implemented by means of penalty factors (.95).

    - Model fitting followed a stepwise step-down protocol.

    - Everything I present today, i.e. the data, the annotated R code, plotting scripts and spreadsheets, is freely available on my homepage (martinschweinberger.de).
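
    The logic of a step-down procedure guided by AIC can be sketched in a few lines of Python. The sketch assumes the standard definition AIC = 2k - 2 ln(L) and walks through a pre-computed sequence of nested models; the model names, log-likelihoods and parameter counts are invented for illustration and are not the study's values, and neither the bootstrap validation nor the penalty factor mentioned above is modelled here.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def step_down(models):
    """Simplify step by step while the AIC does not get worse.

    models: list of (name, log_likelihood, n_params), ordered from the
    fullest model to the most reduced one.
    """
    current = models[0]
    for candidate in models[1:]:
        if aic(candidate[1], candidate[2]) <= aic(current[1], current[2]):
            current = candidate  # dropping the predictor did no harm
        else:
            break  # further simplification would cost too much fit
    return current[0]


# Hypothetical nested models, NOT the study's results.
models = [
    ("A + B + C + D", -1000.0, 5),
    ("A + B + C", -1000.3, 4),
    ("A + B", -1000.9, 3),
    ("A", -1012.0, 2),
]
print(step_down(models))
```

    With these invented numbers the procedure stops at the model "A + B": removing C and D barely changes the fit, whereas removing B makes the model clearly worse.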

    7. Results

    Table 4: Summaries of the results of three regression models. The left panel summarizes the results of a fixed-effects logistic regression model, the middle panel summarizes the results of a generalized linear mixed-effects model specified as a binomial logistic model, and the right panel summarizes the final minimal adequate mixed-effects model.

    Panels (each reporting odds ratio, increase/decrease (%), and p-value):

    - Traditional statistics (Logistic Regression Model)

    - Modern statistics (Generalized Linear Mixed Model)

    - Tomorrow’s statistics (Generalized Linear Mixed Model)

    Intercept 0.0442 -95.5

  • 11

    - If a turn is uttered by a Pakeha, then the odds that there is an eh at the end of the turn decrease by 72.3 percent.

    - There are no significant interactions, and neither the occupation of speakers nor priming has a significant effect.

    - The study did not detect a significant correlation between eh use and the occupation of a speaker and thus challenges the claim that eh is typically a feature of working-class speech.

    - However, eh is typically male and exhibits substantial age grading, i.e. younger speakers use it more than older speakers.

    - While eh is not an indicator of class membership, it is an ethnic marker and significantly more common among Maoris than among Pakehas.
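
    The percentages reported alongside the odds ratios follow from a simple conversion, (odds ratio - 1) × 100. The Python sketch below shows the conversion in both directions and how a change in odds translates into a change in probability. The function names are assumptions, and the baseline odds of 0.0442 are borrowed from the only surviving row of Table 4 purely as an illustration, since it is not recoverable here which model panel and which reference levels that row belongs to.

```python
def odds_ratio_to_percent(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def percent_to_odds_ratio(percent):
    """Inverse conversion: percentage change in the odds to odds ratio."""
    return 1.0 + percent / 100.0


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


# A 72.3 percent decrease corresponds to an odds ratio of about 0.277.
pakeha_or = percent_to_odds_ratio(-72.3)

# Illustrative baseline only (see the note above).
baseline_odds = 0.0442
p_baseline = odds_to_probability(baseline_odds)
p_pakeha = odds_to_probability(baseline_odds * pakeha_or)

print(round(pakeha_or, 3), round(p_baseline, 4), round(p_pakeha, 4))
```

    Applied to the rounded intercept of 0.0442, the conversion yields about -95.6 percent, close to the -95.5 given in Table 4; the small gap presumably reflects rounding of the odds ratio. Note that a percentage change in the odds is not the same as a percentage change in the probability, although the two are similar when the baseline probability is small.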

    A statistician’s interpretation

    - Bivariate statistics are oftentimes misleading because they do not correct for confounding factors.

    - Traditional statistical models tend to overestimate significance, and studies relying on their results may have overestimated the impact of extra-linguistic variables (sex, age, priming, …).

    - Studies which have not taken hierarchical/nested variable structures into account, although nesting tokens would have been required, may well be fatally flawed.

    - Traditional models have been the basis for sociolinguistic theorizing for the past 30 or so years, but these models may have led to flawed generalizations about the workings of language…

    Conclusion

    Converging psycholinguistics & sociolinguistics

    - Although priming did not turn out to affect the use of eh, including psycholinguistic factors in models that analyze language variation and change is advisable.

    - Advanced statistical models outperform traditional fixed-effects models and lead to more robust and reliable interpretations of complex data sets.

    - This case study of eh has shown that it has a distinct sociolinguistic profile and that implementing psycholinguistic factors into sociolinguistic models of variation and change has great potential to improve our understanding of determinants of linguistic variability.

    Outlook

    - What to do next and what to do better

  • 12

    - I would like to apply the approach I have presented here to other varieties of English and more phenomena which are currently undergoing change, for instance

    - General Extenders (stuff like that, and so on, and shit)

    - Intensifiers (totally, so, wicked)

    - I will definitely take a closer look at priming in the context of Construction Grammar, in particular with respect to collostructional and collexeme analyses.

    - I would also like to determine the exact decay rate of priming effects by combining experimental and corpus-driven analyses.

    9. References & Appendix

    References

    Baayen, R. Harald. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.

    Johnson, Daniel Ezra. 2009. Getting off the GoldVarb Standard: Introducing Rbrul for Mixed-Effects Variable Rule Analysis. Language and Linguistics Compass 3(1). 359–383.

    Galwey, Nick. 2006. Introduction to Mixed Modelling: Beyond Regression and Analysis of Variance. Chichester: Wiley.

    Gries, Stefan Th. 2009. Statistics for linguistics with R. A practical introduction. Berlin & New York: Mouton de Gruyter.

    Maier, Georg. 2012. The Distribution of Subject Pronoun Case Forms in Subject Predicative Complements in Varieties of English: a corpus- and web-based study of pronoun case variation. University of Hamburg PhD dissertation.

    Harrington, Jonathan. 2012. Generalised linear mixed models (GLMM) und die logistische Regression. http://www.phonetik.uni-muenchen.de/~jmh/lehre/sem/ss10/stat/glmm.pdf, accessed October 10th, 2012.

    Meyerhoff, Miriam. 1994. Sounds pretty ethnic, eh? A pragmatic particle in New Zealand English. Language in Society 23. 367–388.

    Manning, Christopher. 2007. Generalized Linear Mixed Models (illustrated with R on Bresnan et al.’s datives data). http://nlp.stanford.edu/~manning/courses/ling289/GLMM.pdf, accessed October 10th, 2012.

    Stubbe, Maria & Janet Holmes. 1995. You know, eh, and other ‘exasperating expressions’: an analysis of social and stylistic variation in the use of pragmatic devices in a sample of New Zealand English. Language & Communication 15(1). 63–88.

  • 13

    Szmrecsanyi, Benedikt. 2006. Morphosyntactic Persistence in Spoken English. A Corpus Study at the Intersection of Variationist Sociolinguistics, Psycholinguistics, and Discourse Analysis. Berlin & New York: Mouton de Gruyter.

    Labov, William. 1994. Principles of Linguistic Change. Vol. 1, Internal Factors. Oxford: Blackwell.

    Labov, William. 2002. “Driving forces in linguistic change”. In Proceedings of the 2002 International Conference on Korean Linguistics. http://www.ling.upenn.edu/~wlabov/Papers/DFLC.htm, accessed October 14th, 2012.

    Appendix

    Figure 6: Graphical display of the effect sizes of the sex, age and ethnicity of speakers according to the final minimal mixed-effects model.


  • 14

    Figure 7: Original age groups and their mean use of ‘eh’ with 95% confidence intervals.

    Figure 8: Cluster dendrogram showing the clustering of age groups according to their mean frequency of ‘eh’ use and their size.
