THE QUESTIONING NEWS SYSTEM: INTERACTIVE NEWS NARRATIVES*

Warren Sack [email protected]

Abstract: Ever since Socrates we’ve known that posing questions is a good way of educating people and getting them to participate in a conversation. Most educators and advertising agents know the power of questions, but beyond the editorial page it is rare to see news laced with questions for the reader, viewer, or user. Questions get the reader to think, and thinking aids memory and encourages participation. The questioning news system described in this paper can be used to automatically annotate news stories with open-ended, leading questions: questions which might push you and me to read and retain more of the content of the news, and even generate some content ourselves. Given a news narrative, natural language processing techniques are used to tag, morphologically analyze, and parse the text of the story. A series of text analysis techniques, some well-known in media studies and psychoanalysis, has been implemented to detect possible unstated assumptions, ambiguities, or elisions. A handful of these techniques are now running in the questioning news system. More of these techniques are under development.

* to appear in Tell Tale: The Narrative in Post-Digital Culture, Marisa S. Olson, Editor (Cambridge, MA: MIT Press, forthcoming)

1.0 Introduction: Interactive News and Content Elicitation

One of the obligations of the news is to simultaneously inform and engage the

audience. This double obligation has motivated remarkable transformations of

the formats and styles of news story presentation and writing (e.g., the color

graphs and pictures introduced into newspapers by USA Today, and the

“docu-drama” format of many so-called “edutainment” programs now on television,

like COPS). While many of the narrative transformations that have been tested

and adopted have arguably increased audience size, in the eyes of many they

have also degraded the quality of the news. Thus, most local, evening,

television news shows are no more than poor excuses for ambulance-chasing.

Most tested transformations have been attempts to make news stories more

“exciting.” Instead, what is now needed are experiments in making the news

more interactive.

Unfortunately, most of the computer-based experiments in “interactive”

news that have been tried are also problematic. “Interaction” is almost always

interpreted to mean interaction between a networked computer and a single

person. What has been lost in these overly technocentric experiments with the

news is the understanding that the news is one of the main avenues through

which a community learns about itself and reproduces itself. As citizens of a

community, we read, watch or listen to the news in order to learn what we need

to know so that we can actively participate in public life and public discourse. In

other words, the news in a democratic society is only successful insofar as it

gets citizens to produce their own “content,” i.e., to add to the on-going

“conversation” that we know as “our community.”

Some non-computerized news is already truly interactive. Computerized,

“interactive” news should not be measured any differently. Truly “interactive”

news is only achieved when there is as much “content” flowing from the “news

audience” to the “news producer” as there is from the “news producer” to the

“news audience.” So, interactive news agencies must in the future be as

concerned with content elicitation as they are currently with content production

and distribution.

Computer-based content elicitation has received little serious attention.

Market researchers and social scientists have discovered various interviewing

techniques, but few of them have been computerized. Most attempts at

computerization have been of dubious value: for instance, web sites have been

produced for advertising and web users asked to fill out forms detailing personal

information and preferences in exchange for a small trinket or a chance to win

something larger. While these sorts of simplistic solutions may be adequate for

the needs of marketers – i.e., the provocation of quasi-private confessions of

taste from individuals -- something more sophisticated is necessary if the goal is

to inspire participation in public discourse.

To achieve such a goal, it is important to look at non-computer-based

innovations that already function well to elicit commentary and participation.

Primarily the non-computerized procedures of content elicitation worthy of

emulation are interviewing techniques well-known to philosophers,

psychologists, counselors, educators, social scientists, documentary film

makers, journalists, and others who have a vested interest in questioning and

listening to others.

The work presented in this paper is a first step towards the computational

re-enactment of a selection of interviewing or questioning techniques. Of

special interest are those related to what is commonly called the Socratic

method. After a quick review of some previous work in the area of computational

question-asking, the architecture and implementation of the questioning news

system is discussed.

2.0 AI: Artificial Interviewing

The process of asking questions has only received selective attention in the

fields of artificial intelligence (AI), natural language processing (NLP), and

computational linguistics (CL). While a variety of works have addressed the

issues of question-answering (e.g., Lehnert, 1978), question-asking has only

received limited attention in such AI sub-fields as expert systems for medical

diagnosis (e.g., Clancey, 1987), machine learning (e.g., Shapiro, 1983), and

intelligent tutoring systems (e.g., Collins and Stevens, 1982). Indeed, in most of these

AI sub-fields, the research on question-asking has usually assumed that

questions can be posed in some constrained format (e.g., in multiple-choice

format) and so – at least implicitly – such research has failed to recognize the

importance of open-ended questions. Open-ended questions have been

discussed in the AI literature (e.g., Schank, 1986). But even research projects

aimed at the production of a system that can ask open-ended questions have

mostly been concerned with issues of machine understanding rather than the

processes of elicitation (e.g., Lehnert and Stucky, 1988).

Projects that were arguably successful in their implementation of content

elicitation procedures (e.g., ELIZA, Weizenbaum, 1966) have been

misunderstood by the AI community. Instead of asking why the simple

techniques implemented in ELIZA were very powerful elicitation devices, AI has

historically focused on the weaknesses of ELIZA. Arguably, ELIZA -- and similar

systems (e.g., Colby, 1981) – did not “understand” the content input by users,

and this mis- or non-understanding on the system’s part is what has been

highlighted in most reviews of ELIZA’s failings. Unfortunately, even reviewers

that have recognized ELIZA’s strengths as a content elicitor (e.g., Suchman,

1987; Turkle, 1995) have not produced any theories of why the simple

techniques of ELIZA do, at times, work well for content elicitation. Even ELIZA’s

own author (Weizenbaum, 1966) has never produced a convincing explanation

of why ELIZA can be so wonderfully effective at getting people to talk about

themselves and their relationships to others.

ELIZA was intended to simulate a Rogerian therapist. Rogerian

therapists employ a client-directed set of techniques that have been summarized

like this:

Very briefly we have found that the therapist is most effective if he is: (a) genuine, integrated, transparently real in the relationship [with the client]; (b) acceptant of the client as a separate, different, person, and acceptant of each fluctuating aspect of the client as it comes to expression; and (c) sensitively empathic in his understanding, seeing the world through the client’s eyes (Rogers, 1961, p. 397).

From Rogers’ perspective, the very idea of a simulation of a Rogerian therapist

would be oxymoronic. Moreover, from an AI perspective, no existing machine is

“transparently real” or “empathic” and so any claim – by an AI scientist like

Weizenbaum – that a machine has already been built which successfully fulfills

these conditions would have to be met with considerable skepticism. Obviously,

Weizenbaum could not appreciate the merits of his system if he was employing

AI’s or Rogers’ criteria of success.

The proof is not in the pudding, but in the eating. By itself, the computer

program ELIZA is not an interesting artifact and to expect “understanding” from

such a trivial mechanism would be misguided. However, the system composed of

ELIZA in interaction with an earnest and interested user is a very interesting

phenomenon of content elicitation. Thus, the criteria with which to properly

measure ELIZA’s success are the criteria of content elicitation; namely, analyses

of the sorts of information and opinions that the system evokes or provokes from

its “users.” Even if the machine cannot “understand” its human interlocutors, it

might still be effective from the perspective of content elicitation.

Advancing the state-of-the-art of content elicitation devices will depend on

improving the techniques of computational question generation. Such

advancement can be made even if the machines we work with today

“understand” next to nothing.

3.0 The Questioning News System

The questioning news system was built as a first attempt at implementing a set

of new, content elicitation techniques. The system was tested on the texts of

several hundred AP news stories. System output for some of these stories can

be found on the web at this address:

http://www.media.mit.edu/~wsack/DC/toc.html. In this section, one example of

system output will be described. In the following section, an overview of the

architecture – and some of the implementation details – will be given.

3.1 An Example of Input to and Output from the System

Given the following news story, the questioning news system produced a large

number of questions:

FRENCH EMBASSY IN TOGO TURNS DOWN VISAS FOR 80 OF MOBUTU ENTOURAGE

LOME, Togo (AP) [05-21 15:58:45] -- More than half of ousted Zairian dictator Mobutu Sese Seko's entourage were denied visas to France, a French embassy official said Wednesday. With Togo's leaders pressing Mobutu to go, France is the most likely destination for the autocrat on the run, but French officials have repeatedly said that he has not yet turned to them for refuge. Mobutu and members of his extended family fled from his jungle retreat in Zaire's north on Sunday, as rebels loyal to the country's new leader, Laurent Kabila, closed in. Since then, he has been joined by other members of his family and close aides, bringing the number of his entourage in this West African country to 155. His son, Kongulu, arrived Monday night from Brazzaville, Republic of Congo, with 84 members of Mobutu's extended family. Of those, Togolese authorities would only let in five -- including Kongulu -- leaving the other 80 at the airport. They attempted to get visas to enter France, an official of the French Embassy in Lome said, but were turned down. He said that Mobutu was not among the applicants. France is the most likely last stop for Mobutu, who owns a lavish villa there. It was the country that backed his regime to the end, and it still refuses to recognize the name change to Democratic Republic of Congo that Kabila announced Saturday. The 80 boarded a flight that airport officials said is headed for N'djamena, Chad -- although there are no scheduled flights from Lome to N'djamena on Wednesdays. Mobutu

owns a property there, but it was unclear whether Chad would welcome the 80. Mobutu, whose body is ravaged by cancer, says he is too ill to travel immediately. Togolese President Gnassingbe Eyadema was caught off guard by Mobutu's arrival in a cargo plane, and has asked him to hasten his departure. Officials in Morocco -- where Mobutu also owns property -- say that the French government has asked Mobutu to postpone his arrival until after elections at the end of this month, to save President Jacques Chirac embarrassment.

After parsing and analyzing the story, the questioning news system

generates a set of questions for the reader and/or author of the story. For this

particular story, the system generated thirty-four questions. Following are the

seven questions generated concerning this sentence of the story: “With Togo's

leaders pressing Mobutu to go, France is the most likely destination for the

autocrat on the run, but French officials have repeatedly said that he has not yet

turned to them for refuge.”

(1) NOMINALIZATION QUESTION(S): Who leads what?

(2) PRESUPPOSITION QUESTION(S): If leaders [are] pressing Mobutu,

(2a) Did leaders compact Mobutu?

(2b) Did leaders pack together Mobutu?

(2c) Did leaders wring out Mobutu?

(2d) Did leaders squeeze out Mobutu?

(2e) Or, Did leaders compress Mobutu?

(3) IMPLICATION QUESTION(S): Did officials really say he [has not yet

turned to them for refuge]?

The additions and grammatical corrections shown [in brackets] in two of the

questions above have been added by hand, but the system could be easily

modified to produce them automatically.

The questions generated by the system are designed to help reveal, to

the reader, some of the ambiguities and possible ellipses of the news story. For

example, question (1) is questioning the usage of the word “leaders” in the text.

In general, the process that generates this question looks for nominalized

verbs; i.e., words that are used as nouns (e.g., “leader”), but which have verbs

as their root form (e.g., “to lead”). Nominalizations are psychoanalytically

interesting because they allow the arguments of a verb to be elided. For

instance, in the story above, it is unclear what or whom the mentioned

“leaders” lead. By choosing to use a nominalized verb, rather than the root verb,

a writer can leave information out. The point of the questioning news system is

to point out these sorts of elisions or ambiguities; thus, it asks “Who leads

what?” after it encounters the phrase “Togo’s leaders.”
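
The nominalization check just described might be sketched as follows. This is only an illustration in Python, not the system's own Perl and C code. The `Token` record, its `root_pos` field, and the naive `third_person` rule are assumptions made for the example, and the fixed "Who ... what?" frame stands in for a lookup in a database of verb case frames.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str      # surface form as it appears in the story
    tag: str       # part-of-speech tag assigned by the tagger
    root: str      # root form from morphological analysis
    root_pos: str  # part of speech of the root: "verb", "noun", ...


def third_person(verb: str) -> str:
    """Naive third-person singular form; adequate only for regular verbs."""
    if verb.endswith(("s", "sh", "ch", "x", "z")):
        return verb + "es"
    return verb + "s"


def nominalization_questions(tokens):
    """Ask 'Who <verb>s what?' for each noun whose root form is a verb."""
    questions = []
    for tok in tokens:
        if tok.tag.startswith("NN") and tok.root_pos == "verb":
            question = f"Who {third_person(tok.root)} what?"
            if question not in questions:  # avoid asking the same thing twice
                questions.append(question)
    return questions


# Hand-annotated fragment of the example sentence (annotations are assumed).
sentence = [
    Token("Togo's", "NP$", "Togo", "noun"),
    Token("leaders", "NNS", "lead", "verb"),
    Token("pressing", "VBG", "press", "verb"),
    Token("Mobutu", "NP", "Mobutu", "noun"),
]
print(nominalization_questions(sentence))  # ['Who leads what?']
```

Only “leaders” triggers a question here: “pressing” has a verb root but is tagged as a verb, so it is not a nominalization.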

A second question-generation tactic was used by the system to generate

questions 2a-2e. Given an abstract verb like “press,” one can imagine a great

number of actions which might cause or entail something to be “pressed.” The

questioning news system asks the reader to consider the many possibilities by

listing some of them: to press someone or something, one might, more

specifically, compact, pack together, wring out, squeeze out, or compress that

thing or person. Obviously, such a list can be extended since, in this case,

Togo’s leaders are doing none of these more specific actions in order to “press”

Mobutu. Instead, they are probably insistently arguing that he, Mobutu, should

leave the country.
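
A rough sketch of how questions 2a-2e could be produced is given below. It is written in Python for illustration only. The `SPECIFIC_ACTIONS` table is hand-copied from the example output above and is an assumed stand-in for a lookup in a lexical resource; the function name and signature are likewise hypothetical.

```python
# Assumed stand-in for a lexical lookup: more specific actions for a verb,
# hand-copied from the example output shown in the text.
SPECIFIC_ACTIONS = {
    "press": ["compact", "pack together", "wring out", "squeeze out", "compress"],
}


def presupposition_questions(subject, verb_root, obj, table=SPECIFIC_ACTIONS):
    """List one yes/no question per more specific action; mark the last with 'Or,'."""
    alternatives = table.get(verb_root, [])
    questions = []
    for position, action in enumerate(alternatives):
        question = f"Did {subject} {action} {obj}?"
        is_last = position == len(alternatives) - 1
        if is_last and len(alternatives) > 1:
            question = "Or, " + question
        questions.append(question)
    return questions


for q in presupposition_questions("leaders", "press", "Mobutu"):
    print(q)
```

A verb with no entry in the table simply yields no questions, which is one plausible way to keep the number of generated questions down.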

Question (3) “Did officials really say he [has not yet turned to them for

refuge]?” was generated by the third question-asking process implemented in

the system: given a statement containing a verb that directly entails or causes

other actions, ask for an elaboration of the statement. Thus, in this case the

verb “to say” entails talking, speaking, uttering, mouthing, or verbalizing

(according to the lexical resource, WordNet, used in this questioning process).

In other words, by saying that someone said something, one is implying

something more. So, the questioning news system simply asks a question that

is intended to elicit an elaboration; i.e., “Did <the subject> really <verb> <the

object>?”
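
The template above can be filled in with a few lines of code. The following Python sketch is illustrative only: the `ENTAILMENTS` table is hand-copied from the verbs listed in the text for “to say” rather than read from WordNet, and the optional `complement` argument is a hypothetical way of producing the bracketed addition that was, in the example, added by hand.

```python
# Assumed stand-in for a lexical lookup: what a verb is said to entail.
ENTAILMENTS = {
    "say": ["talk", "speak", "utter", "mouth", "verbalize"],
}


def implication_question(subject, verb_root, obj, complement="", table=ENTAILMENTS):
    """Fill 'Did <the subject> really <verb> <the object>?' when the verb entails more."""
    if verb_root not in table:
        return None  # no known entailments, so nothing to ask about
    parts = ["Did", subject, "really", verb_root, obj]
    if complement:
        parts.append(f"[{complement}]")
    return " ".join(parts) + "?"


print(implication_question("officials", "say", "he",
                           "has not yet turned to them for refuge"))
```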

3.2 Architecture and Implementation of the System

The questioning news system is implemented in the Perl and C programming

languages and runs in a UNIX environment. The first three modules of the

system are based upon pre-existing implementations and/or algorithms to

perform morphological analysis, part-of-speech tagging, and syntactic parsing of

the news stories:

1. Morphological analysis: The roots of the words of the input message are

extracted. The two-level morphological analysis method Kimmo (Koskenniemi,

1983) is used as implemented in the PC-Kimmo system, a C-based

morphological analyzer.

2. Part-of-speech tagging: The words of the input message are each labeled

with one of the fifty-some Brown corpus tags. An existing part-of-speech

tagger, Eric Brill’s rule-based tagger (Brill, 1993), is used. In addition, an

auxiliary module that selects the most likely morphological analysis for a word

given its part-of-speech tag has been implemented to combine the

tagging and morphological analyses.

3. Parsing: A set of procedures has been constructed to extract several

syntactic relations from the input sentences. Existing algorithms from the

literature on partial parsing, designed to handle very large corpora

with an acceptable rate of accuracy, have been employed (especially those

of Grefenstette, 1994). These algorithms are able to extract relations like

the following: adjective-modifies-noun, noun-modifies-noun,

noun-of-prepositional-phrase-modifies-noun, noun-is-subject-of-verb,

noun-is-object-of-verb. Moreover, “labeling” algorithms, which distinguish passive, active, and

interrogative sentences, have been implemented in Perl for the questioning

news system.
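
To give a feel for the kind of output the parsing step produces, here is a deliberately crude Python sketch that reads relations off adjacent pairs of tagged words. It is not Grefenstette's algorithm and not the system's code: the adjacency heuristic, the function name, and the tag prefixes are assumptions chosen to keep the example short, and real partial parsers handle intervening words, phrases, and passives.

```python
def extract_relations(tagged):
    """Read simple syntactic relations off adjacent (word, tag) pairs."""
    relations = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1.startswith("JJ") and t2.startswith("NN"):
            relations.append(("adjective-modifies-noun", w1, w2))
        elif t1.startswith("NN") and t2.startswith("NN"):
            relations.append(("noun-modifies-noun", w1, w2))
        elif t1.startswith("NN") and t2.startswith("VB"):
            relations.append(("noun-is-subject-of-verb", w1, w2))
        elif t1.startswith("VB") and t2.startswith("NN"):
            relations.append(("noun-is-object-of-verb", w2, w1))
    return relations


# A hand-tagged fragment (the tags are assumed, not tagger output).
tagged = [("French", "JJ"), ("officials", "NNS"), ("denied", "VBD"), ("visas", "NNS")]
for relation in extract_relations(tagged):
    print(relation)
```

Each relation is a triple of a label and two words, which is the form the later question-generation steps would need in order to fill in a subject, a verb, and an object.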

After the news stories have been analyzed, tagged, and parsed, they are run

through a series of functions written to detect nominalizations, presuppositions,

and implications. If any of these linguistic transformations are detected, then

questions are generated.

A. Nominalization: Media analysts investigating the ideological bias of the news

have pointed out how the choice of a nominalized verb can hide larger

political conflicts (cf. Trew, 1979). A nominalization can syntactically permit

the elision of the subject and object of a verb thus allowing one to write about

a process or action and yet not mention who is involved. For instance, Trew

(1979) describes how a headline in one day’s news – “Police shot 11” – is

referred to in the news several days later as simply “the shooting.” The

nominalization of the verb “to shoot” does not necessarily hide anything, but it

might be motivated by the newspaper’s efforts to portray the police in the

best possible light and, therefore, be indicative of a contradiction between the

police’s actions of that day and the newspaper’s prior descriptions of the

police as a peaceful force. Similar personal, psychosocial uses of

nominalization have been noted by psychotherapists. Individuals use

nominalization to cover up their opinions or feelings about other people and

events. Noting nominalizations computationally is relatively straightforward.

Given a tagged and morphologically analyzed text, a program can be written

to find those words which have been tagged as nouns, but which have a verb

as their root. Producing a question concerning a nominalization is equally

straightforward if one has access to a database of verb case frames (like the

database contained in WordNet (Miller, 1995) that identifies the type and

number of arguments for each verb). For instance, with such a database, the

question “Who shot whom?” can be generated for an occurrence of

“shooting.” The questioning news system uses WordNet’s verb case frames

to generate such questions.

B. Presupposition: While nominalization is a process by which a verb can be

replaced by a lexically related noun, presupposition licenses the replacement

of a verb with a potentially morphologically unrelated verb or noun.

Presupposition, as I am using it here, is the replacement of a verb with its

effects. Trew (1979) again provides an illustrative example: “11 shot” can

become “11 killed”, which, in turn, can become “11 died” since a mortal

shooting entails a killing which, in turn, entails death. But, for instance, the

verb “to die” is much vaguer than the verb “to kill” since one does not have to

be killed to die. In concert with a nominalization, the event denoted by “11

shot” becomes simply “the deaths” in a news report a few days later thus

hiding not only the role of the police, but also even the fact that people were

killed rather than simply dead of unknown causes. WordNet’s database of

verb causes and entailments is a useful lexical resource for the identification

of presupposition and the production of possible questions (e.g., “What

caused the deaths?”). The identification of presuppositions is a hard problem

in general since analogous statements (“11 killed” and “deaths”) need to be

identified within and between messages. Work on analogical reasoning

(especially, Haase, 1995) demonstrates a possible approach to this open

problem.

C. Implication: Terms that entail a set of implications (e.g., terms of causation)

are often colloquially or hyperbolically used to describe connections between

loosely related (or even unrelated) events or agents. For instance, it is more

likely that a statement such as “He makes me sick” is metaphorical, not a

statement of causation. Statements that put these “causal” relations into

question can elicit an entirely different or more specific set of relationships;

e.g., “What in particular about him makes you sick?” “How in particular are

you made sick?” The computational process of identifying possible “false”

causals is predominantly a process of keyword spotting in concert with the

use of a lexical resource, like WordNet. The questioning news system

currently only produces one sort of question of this type which is generated

using this template: “Did <the subject> really <verb> <the object> <the direct

object>?” (e.g., “Did he really make you sick?”)
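
Trew's chain from “shot” to “killed” to “died,” discussed under B above, suggests how a question such as “What caused the deaths?” might be triggered. The Python sketch below is one possible reading, not the system's implementation: the `CAUSED_BY` table is a hand-built, inverted stand-in for a database of verb causes and entailments, and both function names are hypothetical.

```python
# Assumed, hand-built table: for each verb, the verbs that can bring it about.
CAUSED_BY = {
    "die": ["kill"],
    "kill": ["shoot"],
}


def possible_causes(verb_root, table=CAUSED_BY):
    """Walk the table backwards and collect every verb that may lie behind this one."""
    found = []
    frontier = [verb_root]
    while frontier:
        current = frontier.pop(0)
        for cause in table.get(current, []):
            if cause not in found:
                found.append(cause)
                frontier.append(cause)
    return found


def presupposition_question(noun_phrase, verb_root, table=CAUSED_BY):
    """Ask what caused an event when its verb may be standing in for a more specific one."""
    if not possible_causes(verb_root, table):
        return None
    return f"What caused {noun_phrase}?"


print(possible_causes("die"))
print(presupposition_question("the deaths", "die"))
```

A verb such as “shoot,” which has no recorded cause in this small table, produces no question, since nothing more specific is known to be hidden behind it.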

4.0 Conclusions and Future Directions

In its current state, the questioning news system provides a demonstration of a

small set of content elicitation procedures that might provide a kernel of

computational means for encouraging news readers to also become news

writers.

The questioning news system is a first step towards the implementation of

a large set of content elicitation procedures. The three current question-

generation procedures will be improved and four more procedures added. In

addition, a discourse focusing mechanism will be implemented that takes into

account the longer history of a discussion or “thread” in the news. Currently, too

many questions are generated for each story. The focusing mechanism will allow

the questions to be ordered according to a measurement of relevancy and

interestingness and will, consequently, provide a means by which only a few of

the best questions are presented to users of the system.
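
Since the focusing mechanism is still to be built, the following Python sketch is purely speculative. It assumes, for the sake of illustration, that relevancy can be approximated by counting the content words a question shares with earlier stories in the same thread; the stop-word list, the scoring rule, and the names `content_words` and `rank_questions` are all invented for this example.

```python
import re

# Assumed stop-word list; a real system would need a much larger one.
STOP_WORDS = {"a", "an", "the", "did", "do", "who", "what", "whom",
              "or", "to", "in", "as", "of", "he", "really"}


def content_words(text):
    """Lower-case the text and keep the words that are not stop words."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS}


def rank_questions(questions, thread_history, keep=3):
    """Order questions by word overlap with the thread and keep the top few."""
    history_terms = content_words(" ".join(thread_history))
    scored = []
    for index, question in enumerate(questions):
        score = len(content_words(question) & history_terms)
        scored.append((-score, index, question))  # ties keep original order
    scored.sort()
    return [question for _, _, question in scored[:keep]]


questions = [
    "Who leads what?",
    "Did leaders compress Mobutu?",
    "Did officials really say he?",
]
history = [
    "Mobutu fled as rebels closed in.",
    "French officials denied visas to Mobutu's entourage.",
]
for q in rank_questions(questions, history, keep=2):
    print(q)
```

Any measure of interestingness would have to be added on top of this; word overlap captures, at best, a rough notion of relevancy to the thread.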

5.0 References

Eric Brill. A Corpus-Based Approach to Language Learning. Ph.D. Thesis. University of Pennsylvania: Department of Computer and Information Science, 1993.

William J. Clancey. Knowledge-based tutoring: The GUIDON program. Cambridge, MA: MIT Press, 1987.

Kenneth Colby. “Modeling a Paranoid Mind.” Behavioral and Brain Sciences, 4 (1981): 515-534.

Allan Collins and Albert L. Stevens. “Goals and Strategies of Inquiry Teachers.” In Robert Glaser (editor), Advances in Instructional Psychology, Volume 2. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers, 1982.

Gregory Grefenstette. Explorations in Automatic Thesaurus Discovery. Boston: Kluwer Academic Publishers, 1994.

Kenneth Haase. “Analogy in the Large.” In Proceedings of the International Joint Conference on Artificial Intelligence. Montréal, 1995.

Kimmo Koskenniemi. Two-level morphology: a general computational model for word-form recognition and production. Publication No. 11. University of Helsinki: Department of General Linguistics, 1983.

Wendy Lehnert. The Process of Question Answering. Hillsdale, NJ: Lawrence Erlbaum, 1978.

Wendy Lehnert and Brian Stucky. “Understanding Answers to Questions.” In Michael Meyer (editor), Questions and Questioning. New York: Walter de Gruyter, 1988.

George Miller. “WordNet: A Lexical Database for English.” Communications of the ACM, November 1995: 39-41.

Carl R. Rogers. On Becoming a Person: A Therapist’s View of Psychotherapy. Boston: Houghton Mifflin Company, 1961.

Roger Schank. Explanation Patterns: Understanding Mechanically and Creatively. Hillsdale, NJ: Lawrence Erlbaum, 1986.

Ehud Shapiro. Algorithmic Program Debugging. Cambridge, MA: MIT Press, 1983.

Lucy Suchman. Plans and Situated Actions: The problem of human-machine communication. New York: Cambridge University Press, 1987.

Tony Trew. “Theory and ideology at work.” In Roger Fowler, Bob Hodge, Gunther Kress, and Tony Trew (editors), Language and Control. Boston: Routledge & K. Paul, 1979.

Sherry Turkle. Life on the Screen: Identity in the Age of the Internet. New York: Simon and Schuster, 1995.

Joseph Weizenbaum. “ELIZA – A Computer Program for the Study of Natural Language Communication between Man and Machine.” Communications of the Association for Computing Machinery, 9 (1966): 36-45.
