
Image: © flickr/srqpix CC BY 2.0

GENDER DIFFERENCES IN DISCIPLINARY WRITING

Brian N. Larson WRAB III, 20 February 2014

Université Paris-Ouest Nanterre La Défense


Housekeeping

•  www.Rhetoricked.com (these slides + some additional)

•  Communicate with me: –  @Rhetoricked –  [email protected]

•  Research supported by: –  Graduate Research Partnership Program fellowship (U of M CLA) –  James I. Brown fellowship


Synonyms (for now)

•  I’ll use these words as synonyms for this talk (for reasons explained in another talk) –  {sex, gender, Fr. sexe} –  {woman, female, feminine} –  {man, male, masculine}


Do men and women communicate differently?

•  Much work inspired by Robin Lakoff (1975)

•  Scholarly and popular works by Deborah Tannen (e.g. 1990) and others

•  Much of this research in oral/face-to-face communication


Writing: Process and product

•  In writing studies, we can (roughly) divide process and product –  Do men and women produce writing using different processes? –  Is the writing they produce distinguishable based on author gender?


Previous studies: Process research

•  Focus on interpersonal communications in mixed-gender contexts –  Lay, 1989; Rehling, 1996; Raign & Sims, 1993; Tong & Klecun, 2004; Wolfe & Alexander, 2005; Brown & Burnett, 2006; Wolfe & Powell, 2006, 2009


Previous studies: Product research

•  In technical and professional communication –  Sterkel, 1988 (20 stylistic characteristics) –  Smeltzer & Werbel, 1986 (16 stylistic and evaluative measures) –  Tebeaux, 1990 (quality of responses) –  Allen, 1994 (markers of authoritativeness)

•  Manual methods, small samples


Enter computational methods

•  Natural language processing (NLP) •  Allows processing of large quantities of text data •  Study that attracted my attention: –  Argamon et al., 2003 –  Koppel, Argamon & Shimoni, 2002 –  the “02/03 Argamon study”


02/03 Argamon study

•  Used 500 published texts from the BNC •  Mean 34,000 words (‘tokens’) per text •  Categorized texts by author gender accurately: –  82.6% of the time on non-fiction texts –  79.5% of the time on fiction texts


Gender in computer-mediated communication (CMC)

•  CMC popular for NLP studies – Data are readily available – Data are voluminous

•  Examples – Herring & Paolillo, 2006 (blog posts) – Yan & Yan, 2006 (blog posts) – Argamon et al., 2007 (blog posts) – Rao et al., 2010 (Twitter) – Burger et al., 2011 (Twitter)


Rationale: Why is the question important?

•  Lend support to one or more theories of gender –  ‘Two cultures’ (Maltz & Borker, 1982) –  ‘Standpoint’ (Barker & Zifcak, 1999) –  ‘Performative’ (Butler 1993, 1999, 2004) –  Others

•  Concern that “women’s writing” may be less persuasive (Armstrong & McAdams, 2009)

•  Sorting out methodological problems, particularly use of gender as a variable


Study design goals

•  Overarching: Show the utility of NLP/corpus methods in disciplinary communication research (important in light of, e.g., Pakhomov et al. 2008)

•  Examine a corpus of texts –  All of the same genre –  Where we can be confident of single authorship –  Where author gender is self-identified

•  Analyze them using the same variables (“features”) as the 02/03 Argamon study

•  LATER: analyze them using other features


Data collection

•  Major writing project at end of first year of law school* –  Students address a hypothetical problem (writing in the same ‘genre’, broadly defined) –  Students not allowed to collaborate –  Plagiarism difficult (but still possible)

•  Students self-identified gender** •  193 texts (mean word tokens = 3,764)

*Law school comes after a 4-year baccalaureate in the U.S. **This study was IRB-approved (UMN Study #1202E10685).


Text genre: Memorandum regarding motion to dismiss

•  Written to hypothetical court •  Supporting or opposing a motion before the court •  High-level organization is formulaic


Memorandum Sections

•  Caption** •  Introduction/summary* •  Facts •  Legal standard of review* •  Argument •  Conclusion •  Signature block**

* Not always present. ** Not analyzed (content is highly formulaic).


Manual Annotation Using GATE

•  General Architecture for Text Engineering (Cunningham et al. 2012; 2013)

•  Annotation is nondestructive bracketing that allows exclusion of material from analysis

•  Annotated and excluded from study –  Long quotations –  Legal citations –  Headings

•  Annotated to permit segmentation of samples: –  Large sections of text

•  About two hours of work for each text in sample


Coding and inter-rater reliability

•  Two coders did this work •  Coding guide developed with other legal texts (not the study texts) •  Performed a test of inter-rater reliability on 10 papers (5%) •  F-scores satisfactory (for those interested): –  Strict = .83 (target > .80) –  Lenient = .98 (target > .95) –  Average = .91 (target > .90)


Pre-processing

•  Exported from GATE in XML •  Used Python and NLTK (Bird et al. 2009) –  Stripped sections I am not analyzing –  Created a text corpus (sketch below)


Feature (“variable”) selection

•  For now, those of 02/03 Argamon study •  Relative frequencies of

–  405 “function words” (I used 429) –  76 BNC parts of speech (I used 45 from the

Penn Treebank tagset) –  500 most common part-of-speech trigrams –  100 most common POS bigrams –  I can explain variations from Argamon if you

have questions


‘Part-of-speech’ tags? ‘Bigrams & trigrams’?

•  First, ‘tokenize’ each sentence (automated): –  ‘My aunt’s pen is on the table.’ (on the slide, purple shading marks the ‘function’ words) –  ‘La plume de ma tante est sur la table.’
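The slide shows this step graphically; purely as an illustration (not the study’s code), the same tokenization in NLTK splits the possessive just as the Penn Treebank convention does:

```python
# Illustrative tokenization with NLTK (needs the 'punkt' models; exact
# model names vary slightly across NLTK versions).
import nltk
nltk.download("punkt", quiet=True)

print(nltk.word_tokenize("My aunt's pen is on the table."))
# ['My', 'aunt', "'s", 'pen', 'is', 'on', 'the', 'table', '.']
```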


POS tags

•  Then tag the parts of speech (automated)

•  I can now calculate relative frequency of function words and POS tags (automated)
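Again as an illustration only (the study’s own extraction code is not shown on the slides), tagging and the relative-frequency arithmetic might look like this in NLTK:

```python
# Illustrative POS tagging and relative-frequency computation with NLTK.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("My aunt's pen is on the table.")
tagged = nltk.pos_tag(tokens)  # Penn Treebank tags, e.g. ('pen', 'NN')

# Relative frequency of one function word and one POS tag:
print(tokens.count("the") / len(tokens))                     # 'the' / tokens
print(sum(1 for _, t in tagged if t == "NN") / len(tagged))  # NN / tokens
```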


POS bigrams and trigrams

•  A bigram or trigram is a 2- or 3-token ‘window’ on the sentence. –  Automated calculation
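For instance, sliding that window over the POS tags of the English example yields the following (a sketch; the tags are those a Penn Treebank tagger would typically assign to ‘My aunt’s pen is on the table.’):

```python
# POS bigrams and trigrams over one tagged sentence, using NLTK helpers.
import nltk

tags = ["PRP$", "NN", "POS", "NN", "VBZ", "IN", "DT", "NN", "."]
print(list(nltk.bigrams(tags))[:3])   # [('PRP$', 'NN'), ('NN', 'POS'), ('POS', 'NN')]
print(list(nltk.trigrams(tags))[:2])  # [('PRP$', 'NN', 'POS'), ('NN', 'POS', 'NN')]
```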


POS bigrams and trigrams

•  Or our French sentence with bigrams… (illustrated on the slide)


Each student’s text is represented as a ‘vector’

•  A series of numerical values expressing each feature (variable), i.e., the relative frequency of: – Function words / total tokens – POS tags / total tokens – Bigrams / total bigrams* – Trigrams / total trigrams* – Automated calculation

*Multiplied by a factor.
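A sketch of how one such vector might be assembled; the feature inventories and the scaling factor below are placeholders, not the study’s actual values:

```python
# Sketch: one text -> one vector of relative frequencies. The word/tag lists
# and SCALE are stand-ins (the study used 429 function words, 45 tags, the
# 100/500 most common POS bigrams/trigrams, and an unspecified factor).
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "a", "in"]
POS_TAGS = ["NN", "NNS", "DT", "IN", "VBZ"]
SCALE = 1000

def feature_vector(tokens, tags, top_bigrams, top_trigrams):
    n = len(tokens)
    words = Counter(t.lower() for t in tokens)
    tag_counts = Counter(tags)
    bigrams = Counter(zip(tags, tags[1:]))
    trigrams = Counter(zip(tags, tags[1:], tags[2:]))
    vec = [words[w] / n for w in FUNCTION_WORDS]
    vec += [tag_counts[t] / n for t in POS_TAGS]
    vec += [SCALE * bigrams[b] / max(n - 1, 1) for b in top_bigrams]
    vec += [SCALE * trigrams[t] / max(n - 2, 1) for t in top_trigrams]
    return vec
```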


Example 1

•  Tokens of the function word-type “all” in paper 1007 account for less than 7/100 of 1% (0.07%) of all tokens in that paper.


Example 2

•  Bigrams made up of a plural common noun (NNS) followed by a coordinating conjunction (CC) accounted for 1/10 of 1% (0.1%) of bigrams in paper 1009.


Example 3

•  Trigrams consisting of a determiner (DT), a past participle (VBN), and a common noun (NN) accounted for nearly 3.7% of the trigrams in paper 1014.


Machine learning algorithm (MLA)

•  All based on WEKA implementation (Hall et al. 2009)

•  Algorithm that trains on part of the data, learning which features are most useful for categorizing texts

•  Then it’s tested on another part of the data, to see how accurate it is

•  Repeat x times, using different “slices” of the data (x-fold cross validation)
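The study used WEKA; purely for illustration, a rough Python analog of x-fold cross-validation (with x = 10 and random placeholder data assumed, since the slide does not specify x) looks like this:

```python
# The study used WEKA (Hall et al. 2009). This is only a rough Python
# analog of x-fold cross-validation, with x = 10 and random placeholder
# data standing in for the real 193 feature vectors and gender labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((193, 50))    # placeholder feature matrix
y = rng.integers(0, 2, 193)  # placeholder binary labels

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```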


Evaluation baselines

•  Baseline 1—Default category: If all texts are assigned Gender 1, observed agreement would be 104/193 or 53.9%

•  Baseline 2—My “gut” target of 70% (given that the 02/03 Argamon study correctly categorized 82.6% of non-fiction texts)
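Baseline 1 is just the majority-class proportion:

```python
# Majority-class ("default category") baseline from the slide.
n_texts, n_gender1 = 193, 104
print(f"{n_gender1 / n_texts:.1%}")  # 53.9%
```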


Preliminary results

•  These MLAs were unable to classify texts better than the default-category baseline: –  Bayesian Logistic Regression (53.9%) –  Naïve Bayes (49.2%/51.3%) –  Voted Perceptron (53.9%)


Preliminary results

•  Two performed better than the default-category baseline, but only one at a statistically significant level: –  Logistic regression: 60.1%, χ²=7.66 (df=1), p<0.01 –  Simple logistic: 57%, χ²=3.54 (df=1), p>0.05

•  But compare the 70% target (and the 02/03 Argamon study’s 82.6% on non-fiction)


(Preliminary) Conclusion

•  My preliminary conclusion: With these texts, on these features, these MLAs cannot be said to classify texts successfully based on author gender.

•  Limitations/questions –  Conclusion not generalizable (not even to all law students) –  Would other features (lexical, syntactic, discourse-level) distinguish texts by author gender? –  Would humans attempting to classify these texts be able to? Based on what characteristics? –  Would differences be evident before/after law school?


Possible implications

•  If there is a gender-correlated language difference coming into law school, the conventions of legal writing disguise it

•  Students adapt their language to the genres in which they communicate

•  This supports standpoint and performative theories of gender, but not some psychological theories

•  Use of gender as a variable in many of these studies is undertheorized


Implications for NLP in disciplinary writing research

•  Thinking of Pakhomov et al. 2008 •  PRO: Tools are open-source •  PRO: Techniques are easily learned •  CON: Manual annotation is time-consuming (but may not always be necessary) •  CON: Methods not sufficiently theorized (in many cases)


THANK YOU!

•  www.Rhetoricked.com (these slides + some additional)

•  Communicate with me: –  @Rhetoricked –  [email protected]

•  Research supported by: –  Graduate Research Partnership Program fellowship (U of M CLA) –  James I. Brown fellowship


Works cited

Allen, J. (1994). Women and authority in business/technical communication scholarship: An analysis of writing... Technical Communication Quarterly, 3(3), 271.

Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text, 23(3), 321–346.

Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007). Mining the Blogosphere: Age, gender and the varieties of self-expression. First Monday, 12(9). Retrieved from http://firstmonday.org/issues/issue12_9/argamon/index.html

Armstrong, C. L., & McAdams, M. J. (2009). Blogs of information: How gender cues and individual motivations influence perceptions of credibility. Journal of Computer-Mediated Communication, 14(3), 435–456.

Barker, R. T., & Zifcak, L. (1999). Communication and gender in workplace 2000: Creating a contextually-based integrated paradigm. Journal of Technical Writing & Communication, 29(4), 335.

Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python (1st ed.). O’Reilly Media.

Brown, S. M., & Burnett, R. E. (2006). Women hardly talk. Really! Communication practices of women in undergraduate engineering classes (pp. T3F1–T3F9). Presented at the 9th International Conference on Engineering Education, San Juan, Puerto Rico: International Network for Engineering Education & Research. Retrieved from http://ineer.org/Events/ICEE2006/papers/3219.pdf

Burger, J., Henderson, J., Kim, G., & Zarrella, G. (2011). Discriminating gender on Twitter. Bedford, MA: MITRE Corporation. Retrieved from http://www.mitre.org/work/tech_papers/2011/11_0170/

Butler, J. (1993). Bodies that matter: On the discursive limits of “sex.” New York: Routledge.

Butler, J. (1999). Gender trouble. New York: Routledge.

Butler, J. (2004). Undoing gender. New York: Routledge.

Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., Aswani, N., Roberts, I., … Peters, W. (2012, December 28). Developing Language Processing Components with GATE Version 7 (a User Guide). GATE: General Architecture for Text Engineering. Retrieved January 1, 2013, from http://gate.ac.uk/sale/tao/split.html

Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting More Out of Biomedical Documents with GATE’s Full Lifecycle Open Source Text Analytics. PLoS Computational Biology, 9(2), e1002854.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 10–18.

Herring, S. C., & Paolillo, J. C. (2006). Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4), 439–459.

Koppel, M., Argamon, S., & Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401–412.

Lakoff, R. T. (1975/2004). Language and Woman’s Place: Text and Commentaries. (M. Bucholtz, Ed.) (Revised and expanded ed.). New York: Oxford University Press.

Lay, M. M. (1989). Interpersonal conflict in collaborative writing: What we can learn from gender studies. Journal of Business and Technical Communication, 3(2), 5–28.


Works cited

Maltz, D. N., & Borker, R. (1982). A cultural approach to male-female miscommunication. In J. J. Gumperz (Ed.), Language and social identity (pp. 196–216). Cambridge, U.K.: Cambridge University Press.

Pakhomov, S. V., Hanson, P. L., Bjornsen, S. S., & Smith, S. A. (2008). Automatic classification of foot examination findings using clinical notes and machine learning. Journal of the American Medical Informatics Association, 15, 198–202.

Raign, K. R., & Sims, B. R. (1993). Gender, persuasion techniques, and collaboration. Technical Communication Quarterly, 2(1), 89–104.

Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010). Classifying latent user attributes in Twitter. In Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents (pp. 37–44). Toronto, ON, Canada: ACM.

Rehling, L. (1996). Writing together: Gender’s effect on collaboration. Journal of Technical Writing and Communication, 26(2), 163–176.

Smeltzer, L. R., & Werbel, J. D. (1986). Gender differences in managerial communication: Fact or folk-linguistics? Journal of Business Communication, 23(2), 41–50.

Sterkel, K. S. (1988). The relationship between gender and writing style in business communications. Journal of Business Communication, 25(4), 17–38.

Tannen, D. (2001). You Just Don’t Understand: Women and Men in Conversation. William Morrow Paperbacks.

Tebeaux, E. (1990). Toward an understanding of gender differences in written business communications: A suggested perspective for future research. Journal of Business and Technical Communication, 4(1), 25–43.

Tong, A., & Klecun, E. (2004). Toward accommodating gender differences in multimedia communication. IEEE Transactions on Professional Communication, 47(2), 118–129.

Wolfe, J., & Alexander, K. P. (2005). The computer expert in mixed-gendered collaborative writing groups. Journal of Business and Technical Communication, 19(2), 135–170.

Wolfe, J., & Powell, B. (2006). Gender and expressions of dissatisfaction: A study of complaining in mixed-gendered student work groups. Women & Language, 29(2), 13–20.

Wolfe, J., & Powell, E. (2009). Biases in interpersonal communication: How engineering students perceive gender typical speech acts in teamwork. Journal of Engineering Education, 98(1), 5–16.

Yan, X., & Yan, L. (2006). Gender classification of weblog authors. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs (pp. 228–230).


‘Bonus’ slides


Inter-rater reliability: Strict and lenient

•  Coders have to select a span and code it •  What about leading/trailing spaces or punctuation?

•  “Strict” means spans are identical; “lenient” means they have same code but don’t overlap 100%
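The slides don’t show how these scores were computed; the study presumably used GATE’s corpus QA tools. As a minimal sketch of the strict case only, span-and-code matching might be computed like this (the lenient variant would also count overlapping same-code spans):

```python
# Minimal sketch of a strict span-matching F-score between two coders.
# Spans are (start, end, code) triples; this merely stands in for GATE's
# built-in corpus QA tools. A lenient variant would also count overlapping
# spans that share a code.
def f_score_strict(coder_a, coder_b):
    a, b = set(coder_a), set(coder_b)
    tp = len(a & b)  # identical span boundaries AND identical code
    p = tp / len(b) if b else 0.0
    r = tp / len(a) if a else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

coder_a = [(0, 120, "quotation"), (130, 160, "citation")]
coder_b = [(0, 120, "quotation"), (131, 160, "citation")]  # one char off
print(f_score_strict(coder_a, coder_b))  # 0.5: only one strict match
```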


Inter-rater reliability

•  My target F-scores –  Strict > .80 –  Lenient > .95 –  Average > .90

•  Actual F-scores –  Strict = .83 (target > .80) –  Lenient = .98 (target > .95) –  Average = .91 (target > .90)

•  Manual review showed most lenient matches would have been strict matches but for a missed terminal space or punctuation mark (not affecting this analysis)