Natural Language Processing for Games Studies Research
Jose P. Zagal & Noriko Tomuro
College of Computing and Digital Media
(2012 edition)



120+ game reviews


400,000 game reviews

26 meters

6 tall guys standing on each other

Natural Language Processing (NLP)

What is NLP?

A field in Artificial Intelligence (AI) devoted to creating computers that use natural language as input and/or output

NLP in Games

Façade: an AI-based interactive story game

It interacts with users through automatically generated dialogues

Analyzing Large Amounts of Text

NLP techniques automatically analyze human languages formally (in a principled, systematic way)

Useful for analyzing and extracting information from a large amount of text


NLP Technique: POS Tagging

Part-Of-Speech (POS) tagging is a process of assigning a POS to each word in a sentence (and all sentences in a corpus)

Input: The icy roads are dangerous
Output: The/Det icy/Adj roads/N are/V dangerous/Adj
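To make the input/output format concrete, here is a minimal Python sketch of a lexicon-lookup tagger. It is an illustration only: the tiny LEXICON and the default tag for unknown words are assumptions that cover just the example sentence, whereas practical taggers choose tags statistically, using the surrounding words.

```python
# Toy lexicon-based POS tagger. LEXICON is a hypothetical, hand-written
# dictionary covering only the example sentence; real taggers learn tag
# probabilities from an annotated corpus.
LEXICON = {
    "the": "Det",
    "icy": "Adj",
    "roads": "N",
    "are": "V",
    "dangerous": "Adj",
}


def pos_tag(sentence: str, default: str = "N") -> list[tuple[str, str]]:
    """Tag each whitespace-separated word; unknown words get `default`."""
    return [(word, LEXICON.get(word.lower(), default)) for word in sentence.split()]


def format_tags(tagged: list[tuple[str, str]]) -> str:
    """Render tagged words in the word/TAG notation used on the slide."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)


print(format_tags(pos_tag("The icy roads are dangerous")))
# prints: The/Det icy/Adj roads/N are/V dangerous/Adj
```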

NLP Technique: Syntactic Parsing

Deriving the phrase structure of a sentence. The structure is based on the grammar.

Grammar
R0: S → NP VP
R1: VP → VG NP
R2: NP → Det N
R3: VG → V
R4: NP → “John”
R5: V → “ate”
R6: Det → “the”
R7: N → “cake”

Parse tree for “John ate the cake”:
(S (NP “John”) (VP (VG (V “ate”)) (NP (Det “the”) (N “cake”))))
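The grammar above can be exercised with a small top-down (recursive-descent) parser. This Python sketch is an illustration, not a tool used in the study: it backtracks over every rule, so it returns all parses, but it would loop forever on a left-recursive rule such as NP → NP PP, which this toy grammar does not contain.

```python
# Toy grammar R0-R7. Keys are nonterminals; any symbol that is not a key
# (e.g. "John") is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],             # R0
    "VP": [["VG", "NP"]],            # R1
    "NP": [["Det", "N"], ["John"]],  # R2, R4
    "VG": [["V"]],                   # R3
    "V": [["ate"]],                  # R5
    "Det": [["the"]],                # R6
    "N": [["cake"]],                 # R7
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can cover tokens from pos."""
    if symbol not in GRAMMAR:  # terminal: must match the next word exactly
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each rule for this nonterminal
        for children, end in parse_sequence(rhs, tokens, pos):
            yield (symbol, children), end


def parse_sequence(symbols, tokens, pos):
    """Yield (list_of_trees, next_pos) for a sequence of grammar symbols."""
    if not symbols:
        yield [], pos
        return
    for first_tree, mid in parse(symbols[0], tokens, pos):
        for rest_trees, end in parse_sequence(symbols[1:], tokens, mid):
            yield [first_tree] + rest_trees, end


def parses(sentence):
    """Return every complete parse rooted at S that consumes all words."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


def to_brackets(tree):
    """Render a tree as a bracketed string, e.g. (NP "John")."""
    if isinstance(tree, str):
        return f'"{tree}"'
    label, children = tree
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"


for tree in parses("John ate the cake"):
    print(to_brackets(tree))
```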

Involves other disciplines

Linguistics (NLP is also called “Computational Linguistics”)
Psychology
Mathematics and Statistics
Information Theory
Computer Science

Some real-world applications

NLP can be a stand-alone application or a component embedded in other systems.

Major NLP applications include:
Machine translation (e.g. Google Translate)
Question answering (e.g. Ask.com)
Summarization
Conversational agents (e.g. chatbots)

Also, analyzing web documents: analyze (not just retrieve) weblogs, discussion forums, message boards, user groups, and other forms of user-generated media for:
Product marketing information
Political opinion tracking
Social network analysis
Buzz analysis (what’s hot, what topics are people talking about right now)

Source: Jurafsky & Martin “Speech and Language Processing”

NLP is Hard

Understanding natural languages is hard … because of inherent ambiguity

Engineering NLP systems is also hard … because of:
The huge amount of data resources needed (e.g. grammars, dictionaries, documents to extract statistics from)
The computational complexity of analyzing a sentence (the number of possible analyses can grow exponentially with sentence length)
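To see why the number of candidate analyses explodes, the Python sketch below counts every binary bracketing of an n-word sentence (these counts are the Catalan numbers). It ignores node labels and any particular grammar, so it only illustrates the growth: 4,862 bracketings for 10 words and over 1.7 billion for 20, which is why practical parsers share sub-analyses instead of listing trees one by one.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def binary_bracketings(n_words: int) -> int:
    """Number of distinct binary trees over n_words leaves (Catalan number C(n-1))."""
    if n_words <= 1:
        return 1
    # Split the words into a left part of k words and a right part of n-k words,
    # then combine every left tree with every right tree.
    return sum(
        binary_bracketings(k) * binary_bracketings(n_words - k)
        for k in range(1, n_words)
    )


for n in (3, 5, 10, 20):
    print(n, binary_bracketings(n))
```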

Ambiguity

There are different types of ambiguity and different techniques for dealing with it as well

“Get the cat with the gloves.”

Source: Marti Hearst, i256, at UC Berkeley

The Bottom Line

Complete NL understanding (and thus general intelligence) is impossible.

But we can make incremental progress.

We have also had successes in limited domains.

[But NLP is costly: it takes a lot of work and resources, and the return is sometimes not worth it.]

How NLP Can Help Games Research

By analyzing LOTS of game texts we could:
Verify various hypotheses about the ‘language’ of games, gamers, gamer cultures, etc.
Analyze player preferences from game reviews
Analyze text/dialogue from games
Analyze conversations in MMOGs
Create more realistic dialogue for interactive games

And more!

...now back to game studies

Two Examples (illustrating some of the questions we might explore)

Research Question 1: How readable are professionally written game reviews?

Common Wisdom (Hypothesis)

Game reviews are written for the “lowest common denominator”

Use simple words and sentences

Limited vocabulary

Poor writing

Study

Analyzed sentence length and word length in game reviews

Used established readability formulas

1,500 professional reviews posted between 2007 and 2008 on Gamespot

Results

SMOG (years of education needed to completely understand a piece of writing): Avg 10.98, Min 8.2, Max 15.1, Stdev 1.04

Coleman-Liau (approximate U.S. grade level necessary for comprehension): Avg 9.7, Min 6.9, Max 14, Stdev 1.01

Fog Index (number of years of formal education required to easily understand text on first reading): Avg 13.10, Min 8.8, Max 18.8, Stdev 1.56
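The slides do not say which implementation of these formulas was used. As a hedged sketch, the Python below applies the standard published forms of SMOG, Coleman-Liau and the Gunning Fog index; the syllable counter is a crude vowel-group heuristic and the Fog index is simplified (every word of three or more syllables counts as “complex”), so scores will differ somewhat from dedicated readability tools.

```python
import math
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per vowel group, minus a silent final 'e'."""
    word = word.lower()
    count = len(VOWEL_GROUP.findall(word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Count sentences, words, letters and words of 3+ syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": sum(len(w) for w in words),
        "polysyllables": sum(1 for w in words if count_syllables(w) >= 3),
    }


def smog(s: dict) -> float:
    # SMOG grade; the formula was calibrated on 30-sentence samples.
    return 1.0430 * math.sqrt(s["polysyllables"] * 30 / s["sentences"]) + 3.1291


def coleman_liau(s: dict) -> float:
    letters_per_100 = s["letters"] / s["words"] * 100
    sentences_per_100 = s["sentences"] / s["words"] * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def gunning_fog(s: dict) -> float:
    # Simplified: every word of 3+ syllables is treated as "complex".
    return 0.4 * (s["words"] / s["sentences"] + 100 * s["polysyllables"] / s["words"])


stats = text_stats("The game is fun. The controls are responsive.")
print(round(smog(stats), 2), round(coleman_liau(stats), 2), round(gunning_fog(stats), 2))
```

A real review would be run through text_stats once and then scored with all three functions, as in the last two lines.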

Findings

Game reviews are written at a secondary education reading level. That seems pretty high!

Perhaps this is a barrier to mainstream adoption of AAA videogames?

Further analysis is needed… We need to sample more broadly (other sources).

Research Question 2: What language do players use to describe gameplay, and what can this tell us about how they understand it?

Assumption

Consumer-written game reviews can provide insight into thoughts and feelings about gameplay in popular culture

Common Wisdom (Hypothesis)

Popular discourse for describing gameplay is limited in vocabulary and nuance:
1. Few words are used to describe gameplay
2. Mostly judgmental (e.g. gameplay is good/bad)
3. Rarely descriptive (e.g. gameplay is varied)

Analyzed User-Submitted Reviews

All user-submitted reviews on Gamespot as of April 20, 2009:
397,759 user reviews
111,943 unique users
8,279 game titles

Analysis (Part 1)

1. Extract all sentences in which the word gameplay (or a variation) appears

2. Extract all the adjectives used to describe gameplay, e.g. “smooth gameplay”, “gameplay was smooth”. This gave a list of 723 adjectives (after eliminating those that appeared only once)

3. Create a set of all the words that appeared in the context of a given adjective (175,000 words)
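Steps 1-3 can be approximated in a few lines of Python. This is a sketch under loudly stated assumptions: KNOWN_ADJECTIVES is a hypothetical stand-in for a real POS tagger, the two regular expressions cover only the “smooth gameplay” and “gameplay was smooth” constructions, and “context” is assumed to mean the other words of the same sentence, which may not match the definition used in the study.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a POS tagger: a tiny hand-made set of adjectives.
KNOWN_ADJECTIVES = {"smooth", "fast", "tedious", "varied", "repetitive", "great"}

# "gameplay" plus some spelling variants (assumed list).
GAMEPLAY = r"(?:gameplay|game-play|game play)"
# Pattern 1: adjective directly before the noun ("smooth gameplay").
PRENOMINAL = re.compile(rf"\b(\w+)\s+{GAMEPLAY}\b", re.IGNORECASE)
# Pattern 2: adjective after a linking verb ("gameplay was smooth").
PREDICATIVE = re.compile(rf"\b{GAMEPLAY}\s+(?:is|was|feels|felt)\s+(\w+)", re.IGNORECASE)


def adjective_contexts(reviews, min_count=2):
    """Map each gameplay adjective seen >= min_count times to a Counter of context words."""
    adjective_counts = Counter()
    contexts = defaultdict(Counter)
    for review in reviews:
        for sentence in re.split(r"(?<=[.!?])\s+", review):
            # Step 1: keep only sentences that mention gameplay.
            if not re.search(GAMEPLAY, sentence, re.IGNORECASE):
                continue
            # Step 2: pull out candidate adjectives with the two patterns.
            candidates = [m.group(1).lower() for m in PRENOMINAL.finditer(sentence)]
            candidates += [m.group(1).lower() for m in PREDICATIVE.finditer(sentence)]
            adjectives = [a for a in candidates if a in KNOWN_ADJECTIVES]
            # Step 3: the other words of the sentence count as context (assumed window).
            words = re.findall(r"[a-z]+", sentence.lower())
            for adj in adjectives:
                adjective_counts[adj] += 1
                contexts[adj].update(w for w in words if w not in (adj, "gameplay"))
    # Drop adjectives that appeared only once, as described on the slide.
    return {a: contexts[a] for a, n in adjective_counts.items() if n >= min_count}


sample = [
    "Smooth gameplay and great music. The story is weak.",
    "The gameplay was smooth but tedious at times.",
]
for adjective, context in adjective_contexts(sample).items():
    print(adjective, context.most_common(3))
```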

Analysis (Part 2)

1. Chose the 5,000 most frequent context words

2. Created a matrix of adjectives vs. context words. The value of each cell is how many times a given adjective appeared together with a given context word

3. Created adjective clusters: adjectives whose contexts are similar/close are grouped into the same cluster

4. Interpreted the clusters: What do the clusters mean? Categories of gameplay?
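Steps 1-3 of Part 2 can be sketched as follows. The slides do not name the clustering algorithm, so this Python uses cosine similarity between adjective rows and a simple single-link agglomerative merge with a similarity threshold; the threshold of 0.5 and the toy counts are assumptions for illustration only. Step 4, interpreting the clusters, remains a human judgment.

```python
import math
from collections import Counter


def top_context_words(contexts, k):
    """The k most frequent context words across all adjectives (5,000 in the study)."""
    totals = Counter()
    for counts in contexts.values():
        totals.update(counts)
    return [word for word, _ in totals.most_common(k)]


def build_matrix(contexts, vocabulary):
    """Adjective x context-word matrix: each cell is a co-occurrence count."""
    return {adj: [counts[w] for w in vocabulary] for adj, counts in contexts.items()}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(matrix, threshold=0.5):
    """Single-link agglomerative clustering: keep merging two clusters while
    some pair of their members has cosine similarity >= threshold."""
    clusters = [[adj] for adj in sorted(matrix)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(cosine(matrix[a], matrix[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


# Made-up toy counts, only to exercise the code.
toy_contexts = {
    "fast": Counter({"pace": 3, "action": 2}),
    "frantic": Counter({"pace": 2, "action": 3}),
    "vast": Counter({"world": 4, "explore": 2}),
    "endless": Counter({"world": 3, "explore": 3}),
}
vocabulary = top_context_words(toy_contexts, k=4)
print(cluster(build_matrix(toy_contexts, vocabulary)))
# prints [['endless', 'vast'], ['fast', 'frantic']]
```

Single-link merging is only one choice; with thousands of adjectives a library implementation of hierarchical or k-means clustering would be more practical.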

Assigning Meaning

Sample Adjectives from Cluster
fast, stressful, dull, tedious, frantic, chaotic, obnoxious, frenzied, energetic, silky, brisk

Is there a concept/idea/notion that captures what these adjectives refer to?

Pacing: the perception of how often game events occur

Assigning Meaning (another example)

Sample Adjectives from Cluster
limited, unlimited, large, endless, massive, vast, tremendous, immense, minimal, maximum, moderate, infinite, extensive

Is there a concept/idea/notion that captures what these adjectives refer to?

Scope: the size of the possibility space afforded by a game.

Findings (some)

Clusters of adjectives represent a popular aesthetic of gameplay, i.e. the main “categories” of concepts used to describe a medium

Many “important” gameplay concepts are absent from popular discourse, e.g. emergence

For full details, see references at the end.

Wrapping up…

Broader Potential of NLP

Provide baseline data for guiding future inquiry
E.g. What next for game review analysis?

Extend / complement findings that used other methods
E.g. Complement in-depth qualitative studies of player communities

Explore previously impractical questions
E.g. Stylistic differences in Infocom text adventure games?

Thank you!

Further reading:

Zagal, J. P., Ladd, A., Johnson, T. (2009), “Characterizing and Understanding Game Reviews”, Proceedings of the 4th International Conference on the Foundations of Digital Games, Orlando, FL, 215-222.

Zagal, J. P., Tomuro, N., Shepitsen, A. (2011), “Natural Language Processing for Games Studies Research”, Simulation & Gaming, published online before print October 12, 2011, doi: 10.1177/1046878111422560.

Zagal, J. P., Tomuro, N. (2010), “The Aesthetics of Gameplay: A Lexical Approach”, Proceedings of the 14th International Academic MindTrek Conference, Tampere, Finland, 9-16.

http://facsrv.cs.depaul.edu/~jzagal/publications.html

Bonus Slides

Syntactic Parsing


The process of deriving the phrase structure of a sentence is called “parsing”.

The structure (often represented by a context-free parse tree) is based on the grammar.

Lexical category (part of speech): “duck” as a noun or a verb

Lexical semantics (word meaning): “duck” as an animal or a plaster duck statue

Compound nouns: e.g. “dog food”, “Intelligent design scores …”

Syntactic ambiguity [but semantics can sometimes help disambiguate]:
“I saw a man on the hill with a telescope”
“I saw a man on the hill with a hat”

Dealing with Ambiguity

Four possible approaches:
1. Formal approaches: tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
3. Probabilistic approaches based on making the most likely choices.
4. Don’t do anything; maybe it won’t matter.

Source: Jurafsky & Martin “Speech and Language Processing”

NLU + NLG = Conversational Systems

ELIZA (by Joseph Weizenbaum), 1966

Simulates a psychotherapist

No real understanding; simple pattern-matching to respond to user input -> canned responses

>> Hello.
How do you do. Please state your problem.
>> Men are all alike.
In what way ?
>> They're always bugging us about something or other.
Can you think of a specific example ?
>> Well, my boyfriend made me come here.
Your boyfriend made you come here ?
>> He says I'm depressed much of the time.
I am sorry to hear that you are depressed.
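A few ELIZA-style rules are enough to reproduce the transcript above. The sketch below is a hypothetical Python illustration, not Weizenbaum's script: each rule is a regular expression plus a response template, captured text is pasted back into the template, and anything unmatched gets a stock reply. The original ELIZA also ranked keywords and swapped pronouns systematically; here the pronoun swap is simply hardcoded into one template.

```python
import re

# Hypothetical rule set: (pattern, response template), tried in order.
# Captured groups are substituted into the template; there is no understanding.
RULES = [
    (r"^hello\b", "How do you do. Please state your problem."),
    (r"\ball alike\b", "In what way ?"),
    (r"\bmy (.+?) made me (.+)", "Your {0} made you {1} ?"),
    (r"\bi'?m (depressed|sad|unhappy)\b", "I am sorry to hear that you are {0}."),
    (r"\balways\b", "Can you think of a specific example ?"),
]
FALLBACK = "Please go on."


def respond(user_input: str) -> str:
    """Return the canned response of the first matching rule."""
    text = user_input.lower().strip().rstrip(".!?")
    for pattern, template in RULES:
        match = re.search(pattern, text)
        if match:
            return template.format(*match.groups())
    return FALLBACK


for line in ["Hello.", "Men are all alike.", "Well, my boyfriend made me come here."]:
    print(">>", line)
    print(respond(line))
```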

Goals
1. (Briefly) explain what NLP is
2. Convince you that these methods and techniques can be useful and productive in games research
3. Outline some interesting research questions you think should be explored

1 sheet of paper (photocopier) = 0.009652 cm

1 game review = 1 sheet of paper

400,000 sheets of paper = about 38.6 meters
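The stack height follows from simple arithmetic. The Python sketch below uses only the slide's own assumptions (0.009652 cm per photocopier sheet, one review per sheet): 400,000 × 0.009652 cm = 3,860.8 cm, or roughly 38.6 m.

```python
SHEET_THICKNESS_CM = 0.009652  # one photocopier sheet (figure from the slide)
REVIEWS = 400_000              # one review printed per sheet (slide's assumption)

stack_cm = REVIEWS * SHEET_THICKNESS_CM  # about 3,860.8 cm
stack_m = stack_cm / 100                 # convert centimeters to meters
print(f"{stack_m:.1f} m")
# prints 38.6 m
```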