Natural Language Processing for
Games Studies Research
Jose P. Zagal & Noriko Tomuro
College of Computing and Digital Media
(2012 edition)
What is NLP?
A field in Artificial Intelligence (AI) devoted to creating computers that use natural language as input and/or output
NLP in Games
Façade: an AI-based interactive story game
It interacts with users through automatically generated dialogues
Analyzing Large Amounts of Text
NLP techniques automatically analyze human languages formally (in the right way)
Useful for analyzing and extracting information from a large amount of text
NLP Technique: POS Tagging
Part-Of-Speech (POS) tagging is the process of assigning a POS to each word in a sentence (and to all sentences in a corpus)
Input: The icy roads are dangerous
Output: The/Det icy/Adj roads/N are/V dangerous/Adj
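As a rough illustration of the input/output pair above, the toy tagger below simply looks each word up in a hand-written lexicon. The lexicon, the function names, and the fallback tag are assumptions made for this sketch; real taggers use statistical models to choose among competing tags in context.

```python
# Toy lexicon-lookup POS tagger (illustrative only; not a real tagger).
# The lexicon contents and the default tag are assumptions made for this sketch.
LEXICON = {
    "the": "Det",
    "icy": "Adj",
    "roads": "N",
    "are": "V",
    "dangerous": "Adj",
}

def tag(sentence, default="N"):
    """Return (word, tag) pairs, one per whitespace-separated word."""
    return [(word, LEXICON.get(word.lower(), default)) for word in sentence.split()]

def format_tagged(pairs):
    """Render pairs in the word/Tag notation used on the slide."""
    return " ".join(f"{word}/{pos}" for word, pos in pairs)

print(format_tagged(tag("The icy roads are dangerous")))
```

A lookup table cannot handle words with more than one possible tag, which is where the ambiguity discussed later comes in.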
NLP Technique: Syntactic Parsing
Deriving the phrase structure of a sentence
The structure is based on the grammar

Grammar
R0: S -> NP VP
R1: NP -> Det N
R2: VP -> VG NP
R3: VG -> V
R4: NP -> “John”
R5: V -> “ate”
R6: Det -> “the”
R7: N -> “cake”

Parse tree for “John ate the cake”
S
  NP: “John”
  VP
    V: “ate”
    NP
      Det: “the”
      N: “cake”
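A minimal sketch of how such a grammar can drive a parser, assuming the slide's rules read S -> NP VP, NP -> Det N, VP -> VG NP, VG -> V, plus the four lexical rules. The top-down strategy and all function names are choices made for this example; it only works for small grammars without left recursion.

```python
# Minimal top-down (recursive-descent) parser for the toy grammar.
# Any symbol that is not a key of GRAMMAR is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["John"]],
    "VP": [["VG", "NP"]],
    "VG": [["V"]],
    "V": [["ate"]],
    "Det": [["the"]],
    "N": [["cake"]],
}

def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can cover words from `start`."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])

def expand(symbol, rhs, words, pos, children):
    """Match the right-hand side left to right, collecting child subtrees."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, next_pos in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, next_pos, children + [child])

def parse_sentence(sentence):
    """Return every parse tree that covers the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

trees = parse_sentence("John ate the cake")
print(trees[0])
```

Each tree is a nested tuple whose first element is the category. An ambiguous sentence would produce more than one tree in the returned list.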
Involves other disciplines
  Linguistics (NLP is also called “Computational Linguistics”)
  Psychology
  Mathematics and Statistics
  Information Theory
  Computer Science
Some real-world applications
NLP can be stand-alone applications or components embedded in other systems.
Major NLP applications include:
  Machine translation (e.g. Google Translate)
  Question answering (e.g. Ask.com)
  Summarization
  Conversational agents (e.g. Chatbots)
Also, analyzing web documents:
  Analyze (not just retrieve) weblogs, discussion forums, message boards, user groups, and other forms of user-generated media
  Product marketing information
  Political opinion tracking
  Social network analysis
  Buzz analysis (what’s hot, what topics are people talking about right now)
Source: Jurafsky & Martin “Speech and Language Processing”
NLP is Hard
Understanding natural languages is hard … because of inherent ambiguity
Engineering NLP systems is also hard … because of:
  Huge amount of data resources needed (e.g. grammar, dictionary, documents to extract statistics from)
  Computational complexity (intractable) of analyzing a sentence
Ambiguity
There are different types of ambiguity and different techniques for dealing with it as well
“Get the cat with the gloves.”
Source: Marti Hearst, i256, at UC Berkeley
The Bottom Line
Complete NL Understanding (thus general intelligence) is impossible.
But we can make incremental progress.
Also, we have had successes in limited domains.
[But NLP is costly: lots of work and resources are needed, and the return is sometimes not worth it.]
How NLP Can Help Games Research
By analyzing LOTS of game texts we could:
  Verify various hypotheses about the ‘language’ of games, gamers, gamer cultures, etc.
  Analyze player preferences from game reviews
  Analyze text/dialogue from games
  Analyze conversations in MMOGs
  Create more realistic dialogue for interactive games
And more!
Common Wisdom (Hypothesis)
Game reviews are written for the “lowest common denominator”
Use simple words and sentences
Limited vocabulary
Poor writing
Study
Analyze sentence length and word length in game reviews
Used established readability formulas
1,500 professional reviews posted between 2007 and 2008 on Gamespot
Results
SMOG: years of education needed to completely understand a piece of writing
  Avg 10.98, Min 8.2, Max 15.1, Stdev 1.04
Coleman-Liau: approximate U.S. grade level necessary for comprehension
  Avg 9.7, Min 6.9, Max 14, Stdev 1.01
Fog Index: number of years of formal education required to easily understand text on first reading
  Avg 13.10, Min 8.8, Max 18.8, Stdev 1.56
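For readers who want to try these measures, here is a minimal sketch using the published SMOG, Coleman-Liau, and Gunning Fog formulas. The slides do not say which tool computed the scores above, so the tokenization and the vowel-group syllable estimate below are assumptions, and results will differ somewhat from dedicated readability software.

```python
import math
import re

def _stats(text):
    """Naive counts: words are runs of letters, sentences end at '.', '!' or '?'."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return words, sentences

def syllables(word):
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def coleman_liau(text):
    words, sentences = _stats(text)
    letters_per_100 = 100 * sum(len(w) for w in words) / len(words)
    sentences_per_100 = 100 * sentences / len(words)
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

def gunning_fog(text):
    words, sentences = _stats(text)
    complex_words = sum(1 for w in words if syllables(w) >= 3)
    return 0.4 * (len(words) / sentences + 100 * complex_words / len(words))

def smog(text):
    words, sentences = _stats(text)
    polysyllables = sum(1 for w in words if syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

SAMPLE = "The cat sat on the mat. It was happy."
print(round(coleman_liau(SAMPLE), 2), round(gunning_fog(SAMPLE), 2), round(smog(SAMPLE), 2))
```

All three formulas depend only on word, sentence, letter, and syllable counts, which is why they scale easily to thousands of reviews.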
Findings
Game reviews are written at a secondary education reading level
  That seems pretty high!
  Perhaps this is a barrier to mainstream adoption of AAA videogames?
Further analysis is needed…
  Need to sample more broadly (other sources)
Research Question 2: What language do players use to describe gameplay and what can this tell us about how they understand it?
Assumption
Consumer-written game reviews can provide insight into thoughts and feelings on gameplay in popular culture
Common Wisdom (Hypothesis)
Popular discourse for describing gameplay is limited in vocabulary and nuance
1. Few words are used to describe gameplay
2. Mostly judgmental (i.e. gameplay is good/bad)
3. Rarely descriptive (i.e. gameplay is varied)
Analyzed User-Submitted Reviews
All user-submitted reviews on Gamespot as of April 20, 2009
  397,759 user reviews
  111,943 unique users
  8,279 game titles
Analysis (Part 1)
1. Extract all sentences in which the word gameplay (and variations) appears
2. Extract all the adjectives used to describe gameplay
   E.g. “smooth gameplay”, “gameplay was smooth”
   List of 723 adjectives (eliminated those that appeared once)
3. Create a set of all the words that appeared in the context of a given adjective
   175,000 words
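The three steps above can be sketched roughly as follows. This is not the study's actual pipeline: the tiny adjective list stands in for a POS tagger, the two surface patterns cover only the examples shown on the slide, and the regular expression for "variations" of gameplay is a guess.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-ins: a real study would identify adjectives with a POS tagger.
ADJECTIVES = {"smooth", "fast", "tedious", "repetitive"}
GAMEPLAY = r"game[- ]?play"  # assumed variations: gameplay, game-play, game play

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def gameplay_adjectives(sentence):
    """Find adjectives in '<adj> gameplay' or 'gameplay is/was <adj>' patterns."""
    s = sentence.lower()
    found = re.findall(rf"\b(\w+) {GAMEPLAY}\b", s)
    found += re.findall(rf"\b{GAMEPLAY} (?:is|was) (\w+)\b", s)
    return [w for w in found if w in ADJECTIVES]

def collect_contexts(reviews):
    """Map each adjective to a Counter of the other words in its sentences."""
    contexts = defaultdict(Counter)
    for review in reviews:
        for sentence in split_sentences(review):
            if not re.search(GAMEPLAY, sentence.lower()):
                continue  # step 1: keep only sentences that mention gameplay
            words = re.findall(r"[a-z']+", sentence.lower())
            for adj in gameplay_adjectives(sentence):  # step 2
                contexts[adj].update(w for w in words if w != adj)  # step 3
    return contexts

reviews = [
    "Great graphics. The gameplay was smooth and the controls felt tight.",
    "Smooth gameplay all around! I hated the tedious game-play near the end.",
]
contexts = collect_contexts(reviews)
print(sorted(contexts))
```

Here the "context" of an adjective is taken to be the whole sentence, which is one possible reading of the slide; a fixed window of neighbouring words would be another.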
Analysis (Part 2)
1. Chose the 5,000 most frequent context words
2. Created matrix of adjectives vs. context words
   The value of each cell is how many times a given adjective appeared together with a context word
3. Created adjective clusters
   Adjectives whose context is similar/close are grouped into the same cluster
4. Interpreted clusters
   What do the clusters mean?
   Categories of gameplay?
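A small sketch of steps 1 to 3, under stated assumptions: the slides do not name the similarity measure or the clustering algorithm, so cosine similarity and a simple greedy grouping with a threshold are used here purely for illustration. The example counts are made up.

```python
import math
from collections import Counter

def build_matrix(contexts, top_n=5000):
    """Rows are adjectives; columns are the top_n most frequent context words."""
    totals = Counter()
    for counts in contexts.values():
        totals.update(counts)
    vocab = [word for word, _ in totals.most_common(top_n)]
    matrix = {adj: [counts.get(word, 0) for word in vocab] for adj, counts in contexts.items()}
    return matrix, vocab

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster(matrix, threshold=0.8):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters = []
    for adj in sorted(matrix):
        for group in clusters:
            if cosine(matrix[adj], matrix[group[0]]) >= threshold:
                group.append(adj)
                break
        else:
            clusters.append([adj])
    return clusters

# Made-up co-occurrence counts, for illustration only.
toy_contexts = {
    "fast": Counter({"pace": 3, "action": 2}),
    "frantic": Counter({"pace": 2, "action": 2}),
    "vast": Counter({"world": 4, "map": 1}),
    "immense": Counter({"world": 3, "map": 2}),
}
toy_matrix, toy_vocab = build_matrix(toy_contexts)
print(cluster(toy_matrix))
```

Step 4, interpreting what each cluster means, remains a human judgment, as the next slides show.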
Assigning Meaning
Sample Adjectives from Cluster
  fast, stressful, dull, tedious, frantic, chaotic, obnoxious, frenzied, energetic, silky, brisk
Is there a concept/idea/notion that captures what these adjectives refer to?
Pacing: The perception of how often game events occur
Assigning Meaning (another example)
Sample Adjectives from Cluster
  limited, unlimited, large, endless, massive, vast, tremendous, immense, minimal, maximum, moderate, infinite, extensive
Is there a concept/idea/notion that captures what these adjectives refer to?
Scope: The size of the possibility space afforded by a game.
Findings (some)
Clusters of adjectives represent a popular aesthetic of gameplay
  I.e. the main “categories” of concepts used to describe a medium
Many “important” gameplay concepts are absent from popular discourse
  E.g. Emergence
For full details, see references at the end.
Broader Potential of NLP
Provide baseline data for guiding future inquiry
  E.g. What next for game review analysis?
Extend / complement findings that used other methods
  E.g. Complement in-depth qualitative studies of player communities
Explore previously impractical questions
  E.g. Stylistic differences in Infocom text adventure games?
Thank you!
Further reading:
Zagal, J. P., Ladd, A., Johnson, T. (2009), “Characterizing and Understanding Game Reviews”, Proceedings of the 4th International Conference on the Foundations of Digital Games, Orlando FL, 215-222.
Zagal, J. P., Tomuro, N., Shepitsen, A., “Natural Language Processing for Games Studies Research”, Simulation & Gaming. Published online before print, October 12, 2011, doi: 10.1177/1046878111422560
Zagal, J. P., Tomuro, N. (2010), “The Aesthetics of Gameplay: A Lexical Approach”, Proceedings of the 14th International Academic MindTrek Conference, Tampere, Finland, 9-16.
http://facsrv.cs.depaul.edu/~jzagal/publications.html
Syntactic Parsing
The process of deriving the phrase structure of a sentence is called “parsing”.
The structure (often represented by a context-free parse tree) is based on the grammar.
Lexical category (part-of-speech): “duck” as a noun or a verb
Lexical semantics (word meaning): “duck” as an animal or a plaster duck statue
Compound nouns: e.g. “dog food”, “Intelligent design scores …”
Syntactic ambiguity [but semantics can sometimes help disambiguate]:
  “I saw a man on the hill with a telescope”
  “I saw a man on the hill with a hat”
Dealing with Ambiguity
Four possible approaches:
1. Formal approaches: tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
3. Probabilistic approaches based on making the most likely choices
4. Don’t do anything, maybe it won’t matter
Source: Jurafsky & Martin “Speech and Language Processing”
NLU + NLG = Conversational Systems
ELIZA (by Joseph Weizenbaum), 1966
  A psychotherapist
  No real understanding; simple pattern-matching to respond to user input -> canned responses

>> Hello.
How do you do. Please state your problem.
>> Men are all alike.
In what way ?
>> They're always bugging us about something or other.
Can you think of a specific example ?
>> Well, my boyfriend made me come here.
Your boyfriend made you come here ?
>> He says I'm depressed much of the time.
I am sorry to hear that you are depressed.
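To show how far simple pattern matching can go, here is a minimal ELIZA-style sketch that reproduces the exchange above. It is not Weizenbaum's original script: the rules, the pronoun table, and the fallback reply are all made up for this example.

```python
import re

# Pronoun swaps applied to text echoed back to the user ("my" -> "your", ...).
REFLECTIONS = {"i": "you", "i'm": "you are", "my": "your", "me": "you", "am": "are"}

# (pattern, response template) pairs, tried in order; every rule is illustrative.
RULES = [
    (re.compile(r".*\bi'?m (depressed|sad|unhappy)\b.*", re.I),
     "I am sorry to hear that you are {0}."),
    (re.compile(r".*\b(my .*)", re.I), "{0} ?"),
    (re.compile(r".*\balways\b.*", re.I), "Can you think of a specific example ?"),
    (re.compile(r".*\ball alike\b.*", re.I), "In what way ?"),
    (re.compile(r"hello\b.*", re.I), "How do you do. Please state your problem."),
]

def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(user_input):
    """Return the canned response of the first rule whose pattern matches."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            reply = template.format(*(reflect(g) for g in match.groups()))
            return reply[0].upper() + reply[1:]
    return "Please go on."  # assumed fallback when nothing matches

for line in ["Hello.", "Men are all alike.", "Well, my boyfriend made me come here."]:
    print(">>", line)
    print(respond(line))
```

Nothing in the code models meaning; the replies come entirely from surface patterns, which is the point the slide makes about ELIZA.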
Goals
1. (Briefly) Explain what NLP is
2. Convince you that these methods and techniques can be useful and productive in games research
3. Outline some interesting research questions you think should be explored