THE QUESTIONING NEWS SYSTEM: INTERACTIVE NEWS NARRATIVES*
Warren [email protected]
Abstract: Ever since Socrates we’ve known that posing questions is a good way of educating people and getting them to participate in a conversation. Most educators and advertising agents know the power of questions, but – beyond the editorial page – it is rare to see news laced with questions for the reader, viewer, or user. Questions get the reader to think and thinking helps remembering and encourages participation. The questioning news system described in this paper can be used to automatically annotate news stories with open-ended, leading questions: questions which might push you and me to read and retain more of the content of the news, and even generate some content ourselves. Given a news narrative, natural language processing techniques are used to tag, morphologically analyze, and parse the text of the story. A series of text analysis techniques – some well-known in media studies and psychoanalysis – have been implemented to detect possible unstated assumptions, ambiguities, or elisions. A handful of these techniques are now running in the questioning news system. More of these techniques are under development.
* to appear in Tell Tale: The Narrative in Post-Digital Culture, Marisa S. Olson, Editor (Cambridge, MA: MIT Press, forthcoming)
1.0 Introduction: Interactive News and Content Elicitation
One of the obligations of the news is to simultaneously inform and engage the
audience. This double obligation has motivated remarkable transformations of
the formats and styles of news story presentation and writing (e.g., the color
graphs and pictures introduced into newspapers by USA Today, and the
“docu-drama” format of many so-called “edutainment” programs now on television,
like COPS). While many of the narrative transformations that have been tested
and adopted have arguably increased audience size, in the eyes of many they
have also degraded the quality of the news. Thus, most local, evening,
television news shows are no more than poor excuses for ambulance-chasing.
Most tested transformations have been attempts to make news stories more
“exciting.” Instead, what is now needed are experiments in making the news
more interactive.
Unfortunately, most of the computer-based experiments in “interactive”
news that have been tried are also problematic. “Interaction” is almost always
interpreted to mean interaction between a networked computer and a single
person. What has been lost in these overly technocentric experiments with the
news is the understanding that the news is one of the main avenues through
which a community learns about itself and reproduces itself. As citizens of a
community, we read, watch or listen to the news in order to learn what we need
to know so that we can actively participate in public life and public discourse. In
other words, the news in a democratic society is only successful insofar as it
gets citizens to produce their own “content,” i.e., to add to the on-going
“conversation” that we know as “our community.”
Some non-computerized news is already truly interactive. Computerized,
“interactive” news should not be measured any differently. Truly “interactive”
news is only achieved when there is as much “content” flowing from the “news
audience” to the “news producer” as there is from the “news producer” to the
“news audience.” So, interactive news agencies must in the future be as
concerned with content elicitation as they are currently with content production
and distribution.
Computer-based content elicitation has received little serious attention.
Market researchers and social scientists have discovered various interviewing
techniques, but few of them have been computerized. Most attempts at
computerization have been of dubious value: for instance, web sites have been
produced for advertising and web users asked to fill out forms detailing personal
information and preferences in exchange for a small trinket or a chance to win
something larger. While these sorts of simplistic solutions may be adequate for
the needs of marketers – i.e., the provocation of quasi-private confessions of
taste from individuals – something more sophisticated is necessary if the goal is
to inspire participation in public discourse.
To achieve such a goal, it is important to look at non-computer-based
innovations that already function well to elicit commentary and participation.
Primarily the non-computerized procedures of content elicitation worthy of
emulation are interviewing techniques well-known to philosophers,
psychologists, counselors, educators, social scientists, documentary film
makers, journalists, and others who have a vested interest in questioning and
listening to others.
The work presented in this paper is a first step towards the computational
re-enactment of a selection of interviewing or questioning techniques. Of
special interest are those related to what is commonly called the Socratic
method. After a quick review of some previous work in the area of computational
question-asking, the architecture and implementation of the questioning news
system is discussed.
2.0 AI: Artificial Interviewing
The process of asking questions has only received selective attention in the
fields of artificial intelligence (AI), natural language processing (NLP), and
computational linguistics (CL). While a variety of works have addressed the
issues of question-answering (e.g., Lehnert, 1978), question-asking has only
received limited attention in such AI sub-fields as expert systems for medical
diagnosis (e.g., Clancey, 1987), machine learning (e.g., Shapiro, 1983), and
intelligent tutoring systems (e.g., Collins and Stevens, 1982). Indeed, in most of these
AI sub-fields, the research on question-asking has usually assumed that
questions can be posed in some constrained format (e.g., in multiple-choice
format) and so – at least implicitly – such research has failed to recognize the
importance of open-ended questions. Open-ended questions have been
discussed in the AI literature (e.g., Schank, 1986). But even research projects
aimed at the production of a system that can ask open-ended questions have
mostly been concerned with issues of machine understanding rather than the
processes of elicitation (e.g., Lehnert and Stucky, 1988).
Projects that were arguably successful in their implementation of content
elicitation procedures (e.g., ELIZA, Weizenbaum, 1966) have been
misunderstood by the AI community. Instead of asking why the simple
techniques implemented in ELIZA were very powerful elicitation devices, AI has
historically focused on the weaknesses of ELIZA. Arguably, ELIZA – and similar
systems (e.g., Colby, 1981) – did not “understand” the content input by users,
and this mis- or non-understanding on the system’s part is what has been
highlighted in most reviews of ELIZA’s failings. Unfortunately, even reviewers
that have recognized ELIZA’s strengths as a content elicitor (e.g., Suchman,
1987; Turkle, 1995) have not produced any theories of why the simple
techniques of ELIZA do, at times, work well for content elicitation. Even ELIZA’s
own author (Weizenbaum, 1966) has never produced a convincing explanation
of why ELIZA can be so wonderfully effective at getting people to talk about
themselves and their relationships to others.
ELIZA was intended to simulate a Rogerian therapist. Rogerian
therapists employ a client-directed set of techniques that have been summarized
like this:
Very briefly we have found that the therapist is most effective if he is: (a) genuine, integrated, transparently real in the relationship [with the client]; (b) acceptant of the client as a separate, different, person, and acceptant of each fluctuating aspect of the client as it comes to expression; and (c) sensitively empathic in his understanding, seeing the world through the client’s eyes (Rogers, 1961, p. 397).
From Rogers’ perspective, the very idea of a simulation of a Rogerian therapist
would be oxymoronic. Moreover, from an AI perspective, no existing machine is
“transparently real” or “empathic” and so any claim – by an AI scientist like
Weizenbaum – that a machine has already been built which successfully fulfills
these conditions would have to be met with considerable skepticism. Obviously,
Weizenbaum could not appreciate the merits of his system if he was employing
AI’s or Rogers’ criteria of success.
The proof is not in the pudding, but in the eating. By itself, the computer
program ELIZA is not an interesting artifact and to expect “understanding” from
such a trivial mechanism would be misguided. However, the system composed of
ELIZA in interaction with an earnest and interested user is a very interesting
phenomenon of content elicitation. Thus, the criteria with which to properly
measure ELIZA’s success are the criteria of content elicitation; namely, analyses
of the sorts of information and opinions that the system evokes or provokes from
its “users.” Even if the machine cannot “understand” its human interlocutors, it
might still be effective from the perspective of content elicitation.
Advancing the state-of-the-art of content elicitation devices will depend on
improving the techniques of computational question generation. Such
advancement can be made even if the machines we work with today
“understand” next to nothing.
3.0 The Questioning News System
The questioning news system was built as a first attempt at implementing a set
of new, content elicitation techniques. The system was tested on the texts of
several hundred AP news stories. System output for some of these stories can
be found on the web at this address:
http://www.media.mit.edu/~wsack/DC/toc.html. In this section, one example of
system output will be described. In the following section, an overview of the
architecture – and some of the implementation details – will be given.
3.1 An Example of Input to and Output from the System
Given the following news story, the questioning news system produced a large
number of questions:
FRENCH EMBASSY IN TOGO TURNS DOWN VISAS FOR 80 OF MOBUTU ENTOURAGE
LOME, Togo (AP) [05-21 15:58:45] -- More than half of ousted Zairian dictator Mobutu Sese Seko's entourage were denied visas to France, a French embassy official said Wednesday. With Togo's leaders pressing Mobutu to go, France is the most likely destination for the autocrat on the run, but French officials have repeatedly said that he has not yet turned to them for refuge. Mobutu and members of his extended family fled from his jungle retreat in Zaire's north on Sunday, as rebels loyal to the country's new leader, Laurent Kabila, closed in. Since then, he has been joined by other members of his family and close aides, bringing the number of his entourage in this West African country to 155. His son, Kongulu, arrived Monday night from Brazzaville, Republic of Congo, with 84 members of Mobutu's extended family. Of those, Togolese authorities would only let in five -- including Kongulu -- leaving the other 80 at the airport. They attempted to get visas to enter France, an official of the French Embassy in Lome said, but were turned down. He said that Mobutu was not among the applicants. France is the most likely last stop for Mobutu, who owns a lavish villa there. It was the country that backed his regime to the end, and it still refuses to recognize the name change to Democratic Republic of Congo that Kabila announced Saturday. The 80 boarded a flight that airport officials said is headed for N'djamena, Chad -- although there are no scheduled flights from Lome to N'djamena on Wednesdays. Mobutu
owns a property there, but it was unclear whether Chad would welcome the 80. Mobutu, whose body is ravaged by cancer, says he is too ill to travel immediately. Togolese President Gnassingbe Eyadema was caught off guard by Mobutu's arrival in a cargo plane, and has asked him to hasten his departure. Officials in Morocco -- where Mobutu also owns property -- say that the French government has asked Mobutu to postpone his arrival until after elections at the end of this month, to save President Jacques Chirac embarrassment.
After parsing and analyzing the story, the questioning news system
generates a set of questions for the reader and/or author of the story. For this
particular story, the system generated thirty-four questions. Following are the
seven questions generated concerning this sentence of the story: “With Togo's
leaders pressing Mobutu to go, France is the most likely destination for the
autocrat on the run, but French officials have repeatedly said that he has not yet
turned to them for refuge.”
(1) NOMINALIZATION QUESTION(S): Who leads what?
(2) PRESUPPOSITION QUESTION(S): If leaders [are] pressing Mobutu,
(2a) Did leaders compact Mobutu?
(2b) Did leaders pack together Mobutu?
(2c) Did leaders wring out Mobutu?
(2d) Did leaders squeeze out Mobutu?
(2e) Or, Did leaders compress Mobutu?
(3) IMPLICATION QUESTION(S): Did officials really say he [has not yet
turned to them for refuge]?
The additions and grammatical corrections shown [in brackets] in two of the
questions above have been added by hand, but the system could be easily
modified to produce them automatically.
The questions generated by the system are designed to help reveal, to
the reader, some of the ambiguities and possible ellipses of the news story. For
example, question (1) is questioning the usage of the word “leaders” in the text.
In general, the process which generates this question is looking for nominalized
verbs; i.e., words that are used as nouns (e.g., “leader”), but which have verbs
as their root form (e.g., “to lead”). Nominalizations are psychoanalytically
interesting because they allow the arguments of a verb to be elided. For
instance, in the story above, it is unclear as to what or whom the mentioned
“leaders” lead. By choosing to use a nominalized verb, rather than the root verb,
a writer can leave information out. The point of the questioning news system is
to point out these sorts of elisions or ambiguities; thus, it asks “Who leads
what?” after it encounters the phrase “Togo’s leaders.”
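The nominalization check described above can be sketched in a few lines. This is a minimal illustration in Python, not the Perl and C code of the questioning news system: the token format, the `CASE_FRAMES` table (a hypothetical stand-in for a verb case-frame database such as WordNet's), and the naive inflection helper are all assumptions made for the example.

```python
# Hypothetical stand-in for a database of verb case frames: for each verb
# root, the question words for its subject and object arguments.
CASE_FRAMES = {
    "lead": ("Who", "what"),
}

def third_person(verb):
    """Naive third-person singular inflection, for illustration only."""
    if verb.endswith(("s", "sh", "ch", "x")):
        return verb + "es"
    return verb + "s"

def nominalization_questions(tokens):
    """tokens: (word, tag, root, root_category) tuples, assumed to come
    from the tagging and morphological analysis steps."""
    questions = []
    for word, tag, root, root_category in tokens:
        # A nominalization is tagged as a noun but has a verb as its root.
        if tag.startswith("NN") and root_category == "verb" and root in CASE_FRAMES:
            subject_word, object_word = CASE_FRAMES[root]
            questions.append(f"{subject_word} {third_person(root)} {object_word}?")
    return questions

tokens = [
    ("Togo's", "NP$", "Togo", "noun"),
    ("leaders", "NNS", "lead", "verb"),
    ("pressing", "VBG", "press", "verb"),
    ("Mobutu", "NP", "Mobutu", "noun"),
]
print(nominalization_questions(tokens))  # ['Who leads what?']
```

Only "leaders" qualifies here: "pressing" has a verb root but is tagged as a verb, so it is not treated as a nominalization.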
A second question-generation tactic was used by the system to generate
questions 2a-2e. Given an abstract verb like “press,” one can imagine a great
number of actions which might cause or entail something to be “pressed.” The
questioning news system asks the reader to consider the many possibilities by
listing some of them: to press someone or something, one might, more
specifically, compact, pack together, wring out, squeeze out, or compress that
thing or person. Obviously, such a list can be extended since, in this case,
Togo’s leaders are doing none of these more specific actions in order to “press”
Mobutu. Instead, they are probably insistently arguing that he, Mobutu, should
leave the country.
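A sketch of this second tactic follows, again in Python and under stated assumptions. The `RELATED_ACTIONS` table is a hypothetical excerpt standing in for the lexical resource; its single entry simply copies the verbs that appear in questions 2a-2e, and the function name and argument order are invented for the example.

```python
# Hypothetical excerpt of a lexical table linking an abstract verb to more
# specific actions; the entries mirror questions 2a-2e in the text.
RELATED_ACTIONS = {
    "press": ["compact", "pack together", "wring out", "squeeze out", "compress"],
}

def presupposition_questions(subject, verb_root, obj):
    """Return one yes/no question per more specific action for verb_root."""
    actions = RELATED_ACTIONS.get(verb_root, [])
    questions = []
    for index, action in enumerate(actions):
        # Mark the final alternative with "Or," as in the example output.
        is_last = len(actions) > 1 and index == len(actions) - 1
        opener = "Or, did" if is_last else "Did"
        questions.append(f"{opener} {subject} {action} {obj}?")
    return questions

for question in presupposition_questions("leaders", "press", "Mobutu"):
    print(question)
```

For the triple ("leaders", "press", "Mobutu") this yields five questions, beginning with "Did leaders compact Mobutu?" and ending with "Or, did leaders compress Mobutu?".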
Question (3) “Did officials really say he [has not yet turned to them for
refuge]?” was generated by the third question-asking process implemented in
the system: given a statement containing a verb that directly entails or causes
other actions, ask for an elaboration of the statement. Thus, in this case the
verb “to say” entails talking, speaking, uttering, mouthing, or verbalizing
(according to the lexical resource, WordNet, used in this questioning process).
In other words, by saying that someone said something, one is implying
something more. So, the questioning news system simply asks a question that
is intended to elicit an elaboration; i.e., “Did <the subject> really <verb> <the
object>?”
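The template can be filled mechanically once subject-verb-object relations are available. The following Python sketch assumes the parser delivers (subject, verb root, object) triples and uses a small hypothetical `ENTAILS` table in place of WordNet; neither the data format nor the function names come from the system itself.

```python
# Hypothetical excerpt of an entailment table; the entry for "say" lists the
# verbs named in the text.
ENTAILS = {
    "say": ["talk", "speak", "utter", "mouth", "verbalize"],
}

def implication_questions(relations):
    """relations: (subject, verb_root, object) triples, assumed to be the
    output of the partial parser. A question is asked only for verbs that
    have at least one entailment in the table."""
    return [
        f"Did {subject} really {verb_root} {obj}?"
        for subject, verb_root, obj in relations
        if ENTAILS.get(verb_root)
    ]

relations = [
    ("officials", "say", "he"),
    ("France", "be", "destination"),
]
print(implication_questions(relations))  # ['Did officials really say he?']
```

The raw result, "Did officials really say he?", shows why the bracketed completion in question (3) had to be added by hand: the triple carries only the head of the object clause.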
3.2 Architecture and Implementation of the System
The questioning news system is implemented in the Perl and C programming
languages and runs in a UNIX environment. The first three modules of the
system are based upon pre-existing implementations and/or algorithms to
perform morphological analysis, part-of-speech tagging, and syntactic parsing of
the news stories:
1. Morphological analysis: The roots of the words of the input message are
extracted. The two-level morphological analysis method Kimmo (Koskenniemi,
1983) is used as implemented in the PC-Kimmo system, a C-based
morphological analyzer.
2. Part-of-speech tagging: The words of the input message are each labeled
with one of the fifty-some Brown corpus tags. An existing part-of-speech
tagger, Eric Brill’s rule-based tagger (Brill, 1993), is used. In addition, an
auxiliary module to select the most likely morphological analysis for a word
given the correct part-of-speech tag has been implemented to combine the
tagging and morphological analyses together.
3. Parsing: A set of procedures has been constructed to extract several
syntactic relations from the input sentences. Existing algorithms from the
literature of partial parsing techniques designed to handle very large corpora
with an acceptable rate of accuracy have been employed (especially, those
of Grefenstette, 1994). These algorithms are able to extract relations like
the following: adjective-modifies-noun, noun-modifies-noun, noun-of-
prepositional-phrase-modifies-noun, noun-is-subject-of-verb, noun-is-object-
of-verb. Moreover, “labeling” algorithms -- to distinguish passive, active, and
interrogative sentences – have been implemented in Perl for the questioning
news system.
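The auxiliary module mentioned in step 2, which reconciles the tagger's output with the morphological analyzer's candidates, might look roughly like the Python sketch below. The dictionary format for analyses, the partial tag-prefix table, and the fallback to the first candidate are assumptions made for illustration; the original module is not described in that detail.

```python
# Coarse word category for a few Brown corpus tag prefixes (partial list).
TAG_PREFIXES = (
    ("NN", "noun"),
    ("NP", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
)

def category_of(tag):
    """Map a Brown-style tag to a coarse category, or None if unknown."""
    for prefix, category in TAG_PREFIXES:
        if tag.startswith(prefix):
            return category
    return None

def select_analysis(tag, analyses):
    """Pick the first candidate analysis whose category agrees with the tag.
    Falls back to the first candidate (an assumption) when none agrees."""
    wanted = category_of(tag)
    for analysis in analyses:
        if analysis["category"] == wanted:
            return analysis
    return analyses[0] if analyses else None

# Two hypothetical analyses of "pressing" from a morphological analyzer.
candidates = [
    {"root": "pressing", "category": "adjective"},
    {"root": "press", "category": "verb"},
]
print(select_analysis("VBG", candidates)["root"])  # press
```

With the tag VBG the verb reading is chosen, so the later question-generation steps see the root "press" rather than the adjective "pressing".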
After the news stories have been analyzed, tagged, and parsed, they are run
through a series of functions written to detect nominalizations, presuppositions,
and implications. If any of these linguistic transformations are detected, then
questions are generated.
A. Nominalization: Media analysts investigating the ideological bias of the news
have pointed out how the choice of a nominalized verb can hide larger
political conflicts (cf., Trew, 1979). A nominalization can syntactically permit
the elision of the subject and object of a verb thus allowing one to write about
a process or action and yet not mention who is involved. For instance, Trew
(1979) describes how a headline in one day’s news – “Police shot 11” – is
referred to in the news several days later as simply “the shooting.” The
nominalization of the verb “to shoot” does not necessarily hide anything, but it
might be motivated by the newspaper’s efforts to portray the police in the
best possible light and, therefore, be indicative of a contradiction between the
police’s actions of that day and the newspaper’s prior descriptions of the
police as a peaceful force. Similarly, personal, psychosocial uses of
nominalization have been noted by psychotherapists. Individuals use
nominalization to cover up their opinions or feelings about other people and
events. Noting nominalizations computationally is relatively straightforward.
Given a tagged and morphologically analyzed text, a program can be written
to find those words which have been tagged as nouns, but which have a verb
as their root. Producing a question concerning a nominalization is equally
straightforward if one has access to a database of verb case frames (like the
database contained in WordNet (Miller, 1995) that identifies the type and
number of arguments for each verb). For instance, with such a database, the
question “Who shot whom?” can be generated for an occurrence of
“shooting.” The questioning news system uses WordNet’s verb case frames
to generate such questions.
B. Presupposition: While nominalization is a process by which a verb can be
replaced by a lexically related noun, presupposition licenses the replacement
of a verb with a potentially morphologically unrelated verb or noun.
Presupposition, as I am using it here, is the replacement of a verb with its
effects. Trew (1979) again provides an illustrative example: “11 shot” can
become “11 killed”, which, in turn, can become “11 died” since a mortal
shooting entails a killing which, in turn, entails death. But, for instance, the
verb “to die” is much vaguer than the verb “to kill” since one does not have to
be killed to die. In concert with a nominalization, the event denoted by “11
shot” becomes simply “the deaths” in a news report a few days later thus
hiding not only the role of the police, but also even the fact that people were
killed rather than simply dead of unknown causes. WordNet’s database of
verb causes and entailments is a useful lexical resource for the identification
of presupposition and the production of possible questions (e.g., “What
caused the deaths?”). The identification of presuppositions is a hard problem
in general since analogous statements (“11 killed” and “deaths”) need to be
identified within and between messages. Work on analogical reasoning
(especially, Haase, 1995) demonstrates a possible approach to this open
problem.
C. Implication: Terms that entail a set of implications – e.g., terms of causation –
are often colloquially or hyperbolically used to describe connections between
loosely related (or even unrelated) events or agents. For instance, it is more
likely that a statement such as “He makes me sick” is metaphorical, not a
statement of causation. Statements that put these “causal” relations into
question can elicit an entirely different or more specific set of relationships;
e.g., “What in particular about him makes you sick?” “How in particular are
you made sick?” The computational process of identifying possible “false”
causals is predominantly a process of keyword spotting in concert with the
use of a lexical resource, like WordNet. The questioning news system
currently only produces one sort of question of this type which is generated
using this template: “Did <the subject> really <verb> <the object> <the direct
object>?” (e.g., “Did he really make you sick?”)
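The keyword-spotting step for possible “false” causals can be illustrated with a short Python sketch. The verb list, the pronoun tables, and the whitespace tokenization are simplifying assumptions; a fuller version would take its verbs from a lexical resource and its subject and object from the parser.

```python
# Hypothetical keyword list: inflected causal verbs mapped to their roots.
CAUSAL_VERBS = {"makes": "make", "made": "make", "causes": "cause", "caused": "cause"}
# First-person words are shifted to second person when the question is posed.
TO_SECOND_PERSON = {"me": "you", "us": "you", "my": "your", "our": "your"}
# Sentence-initial pronouns that should be lowercased inside the question.
SUBJECT_PRONOUNS = {"He", "She", "It", "They"}

def false_causal_question(sentence):
    """Return a 'Did ... really ...?' question for the first causal verb
    spotted in the sentence, or None if no causal verb is found."""
    words = sentence.rstrip(".!?").split()
    for i, word in enumerate(words):
        root = CAUSAL_VERBS.get(word.lower())
        # The verb needs something before it (subject) and after it (object).
        if root is None or i == 0 or i == len(words) - 1:
            continue
        subject = words[:i]
        if subject[0] in SUBJECT_PRONOUNS:
            subject[0] = subject[0].lower()
        rest = [TO_SECOND_PERSON.get(w.lower(), w) for w in words[i + 1:]]
        return f"Did {' '.join(subject)} really {root} {' '.join(rest)}?"
    return None

print(false_causal_question("He makes me sick."))  # Did he really make you sick?
```

The sketch treats every word before the verb as the subject, which is adequate for short colloquial statements like the example but not for news sentences with embedded clauses.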
4.0 Conclusions and Future Directions
In its current state, the questioning news system provides a demonstration of a
small set of content elicitation procedures that might provide a kernel of
computational means for encouraging news readers to also become news
writers.
The questioning news system is a first step towards the implementation of
a large set of content elicitation procedures. The three current question-
generation procedures will be improved and four more procedures added. In
addition, a discourse focusing mechanism will be implemented that takes into
account the longer history of a discussion or “thread” in the news. Currently, too
many questions are generated for each story. The focusing mechanism will allow
the questions to be ordered according to a measurement of relevancy and
interestingness and will, consequently, provide a means by which only a few of
the best questions are presented to users of the system.
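Since the focusing mechanism is not yet built, the following Python sketch shows only the final selection step, assuming that some measure of relevancy and interestingness has already assigned each question a numeric score. The scores, the function name, and the default of three questions are placeholders, not results.

```python
def select_questions(scored_questions, k=3):
    """scored_questions: (question, score) pairs, where score is a
    hypothetical relevancy-and-interestingness measure. Returns the k
    highest-scoring questions, best first."""
    ranked = sorted(scored_questions, key=lambda pair: pair[1], reverse=True)
    return [question for question, _ in ranked[:k]]

# Placeholder scores, invented for illustration only.
scored = [
    ("Who leads what?", 0.7),
    ("Did leaders compact Mobutu?", 0.1),
    ("Did officials really say he?", 0.5),
]
print(select_questions(scored, k=2))
# ['Who leads what?', 'Did officials really say he?']
```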
5.0 References
Eric Brill. A Corpus-Based Approach to Language Learning. Ph.D. Thesis. University of Pennsylvania: Department of Computer and Information Science, 1993.
William J. Clancey. Knowledge-based tutoring: The GUIDON program. Cambridge, MA: MIT Press, 1987.
Kenneth Colby. “Modeling a Paranoid Mind.” Behavioral and Brain Sciences, 4 (1981): 515-534.
Allan Collins and Albert L. Stevens, “Goals and Strategies of Inquiry Teachers,” In Robert Glaser (editor), Advances in Instructional Psychology, Volume 2. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers, 1982.
Gregory Grefenstette, Explorations in Automatic Thesaurus Discovery. Boston: Kluwer Academic Publishers, 1994.
Kenneth Haase. “Analogy in the Large.” In Proceedings of the International Joint Conference on Artificial Intelligence. Montréal, 1995.
Kimmo Koskenniemi. Two-level morphology: a general computational model for word-form recognition and production. Publication No. 11. University of Helsinki: Department of General Linguistics, 1983.
Wendy Lehnert. The Process of Question Answering. Hillsdale, NJ: Lawrence Erlbaum, 1978.
Wendy Lehnert and Brian Stucky. “Understanding Answers to Questions.” In Questions and Questioning. Michael Meyer (editor) New York: Walter de Gruyter, 1988.
George Miller, “WordNet: A Lexical database for English”. Communications of the ACM. November 1995, 39-41.
Carl R. Rogers. On Becoming a Person: A Therapist’s View of Psychotherapy. Boston: Houghton Mifflin Company, 1961.
Roger Schank. Explanation Patterns: Understanding Mechanically and Creatively. Hillsdale, NJ: Lawrence Erlbaum, 1986.
Ehud Shapiro. Algorithmic Program Debugging. Cambridge, MA: MIT Press, 1983.
Lucy Suchman. Plans and Situated Actions: The problem of human-machine communication. New York: Cambridge University Press, 1987.
Tony Trew, “Theory and ideology at work.” In Roger Fowler, Bob Hodge, Gunther Kress, and Tony Trew (editors) Language and Control. Boston: Routledge & K. Paul, 1979.
Sherry Turkle. Life on the Screen: Identity in the Age of the Internet. New York: Simon and Schuster, 1995.
Joseph Weizenbaum, “ELIZA – A Computer Program for the Study of Natural Language Communication between Man and Machine,” Communications of the Association for Computing Machinery (9: 36-45), 1966.