Handbook of Visual Analysis an Ethnomethodological Approach

8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach

1/40

Practices of Seeing:Visual Analysis : An Ethnomethodological Approach

Charles Goodw inApp lied Lingu istics

UCLAcgoodw in@hu mn et.ucla.edu

Pp . 157-182 inHandbook of Visual Analysis

edited by Theo van Leeuw en and Carey JewittLondon : Sage Pu blications

2000 Charles Goodw in


2/40

Practices of SeeingVisual Analysis : An Ethnomethodological Approach

Charles Goodw in

A p rimord ial site for the an alysis of hum an langu age, cognition an d action

consists of a situation in which mu ltiple participan ts are attempting to carry out

courses of action together while attending to each other, the larger activities that

their cur rent actions are embedded within, and relevant phenomena in their

surrou nd . Vision can be central to this p rocess.1. The visible bodies of

participan ts provide systematic, changing d isplays about relevant action an d

orientation. Seeable structure in the environment can not only constitute a locus

for shared visual attention, but can also contribute crucial semiotic resources for

the organization of current action (consider for example the u se of graphs an d

charts in a scientific d iscussion). For the past thirty years both Conversation

Analysis and Ethnom ethodology have p rovided extensive analysis of how

hu man vision is socially organized . Both fields investigate the p ractices that

participan ts use to build and shape in concert with each other the stru ctured

events that constitute the lifeworld of a comm un ity of actors. Phenomena

investigated in which vision plays a central role range from sequ ences of talk, to

medical and legal encounters, to scientific knowledge.

The approach taken by both ethnomethodology and conversation analysis to

the study of visual p henom ena is qu ite distinctive. At least since Saussure

proposed studying langue as an an alytically distinct subfield of a more

encompassing science of signs, different kinds of semiotic phenomena (language,

visual signs, etc.) have typ ically been analyzed in isolation from each other.

How ever in the work to be d escribed here neither vision, nor the images or other

phenomena that participan ts look at, are treated as coherent, self-contained

1 Vision is not, however, essential as both the comp etence of the blind and

teleph one conversations d emonstrate. Below it will be argued that situated

action is accomplished through the juxtaposition of multiple semiotic fields,

only some of wh ich m ake vision relevant.

nal Printed page mbers in margin.

p. 157


3/40

2

dom ains that can be su bjected to analysis in th eir own terms. Instead it quickly

becomes app arent that visual ph enomena can only be investigated by taking into

account a diverse set of semiotic resources and meaning-making practices that

participan ts dep loy to build th e social worlds that they inhabit and constitute

through on going processes of action. Many of these, such as stru cture provided

by current talk, are not in any sense visual, but the visible phenomena that the

participan ts are attend ing to cannot be prop erly analyzed w ithout them. The

focus of analysis is not thu s not rep resentations or v ision p er se, but instead the

part p layed by visual phenomena in the prod uction of meaningful action.

Both the m ethodology and the forms of analysis used in this app roach can

best be demonstrated throu gh sp ecific examples.

Gaze between Speakers and HearersIn formu lating the d istinction between comp etence and performance Chomsky

(1965: 3-4) argu ed that actual speech is so full of perform ance errors, such as

sentence fragments, restarts and pau ses, that both lingu ists and p arties faced

with the task of acquiring a langu age should ignore it. Investigating a corpus of

conversation recorded on video Goodwin (1980a, 1981, Chapter 2) indeed found

precisely the false starts and changes of plan in m id-course that Chomsky

describes. In the following instead of prod ucing an u nbroken gramm atical

sentence the speaker says:2

2 Talk is transcribed using the system d eveloped by Gail Jefferson (see Sacks,

Schegloff and Jefferson Sacks, et al. 19 74: 731-733). Talk receiving som e formof emphasis is marked with und erlining or bold italics. Punctuation is used to

transcribe intonation: A period indicates falling pitch, a question mark rising

pitch, and a comm a a falling contour, as wou ld be found for examp le after a

non-terminal item in a list. A colon indicates lengthening of the current sound .

A d ash marks the sud den cut-off of the current sound (in English it is

frequen tly realized as glottal stop). Comm ents (e.g., descriptions of relevant

nonvocal behavior) are pr inted in italics within dou ble parentheses. Num bers

within single parentheses mark silences in second s and tenths of a second . A

degree sign () ind icates that th e talk that follows is being spoken with low

p. 158


4/40

3

Cathy: En a couple of girls- One other girl from there,

How ever, when the video is examined it is foun d th at the restart occurs at a

specific place: precisely at the point w here the speaker br ings her gaze to her

add ressee, and finds that her add ressee is looking elsewhere:

Pam: En a coup le of girls- One other girl from the:re,

Speaker BringsGaze to

Recipient

Restart

Hearer LookingAwayAnn:

Hearer StartsMoving Gazeto Speaker

Gaze Arrives

Moreover, the restart acts as a request for the Hearers gaze. Thu s immed iately

after the restart the h earer starts to move her gaze to the speaker.

Paradoxically, if the speaker had not p rodu ced a restart at this point she

could h ave said something that w ould ap pear to be an unbroken gramm atical

unit if one examined only the stream of speech (e.g., En a coup le of girls from

there ), but which would in fact be interactively a sentence fragment since her

add ressee attended to only part of it.

The identities of speaker an d hearer are th e most generic participan t

categories relevant to th e produ ction of a strip of talk. The p henomena examined

here (which occur pervasively in conversation) provide evidence that the w ork of

volume. Left brackets connecting talk by d ifferent sp eakers mark the p oint

where overlap begins.

p. 159


5/40

4

being a hearer in face-to-face interaction requires situated use of the body, and

gaze in particular, as a w ay of visibly displaying to others th e focus of ones

orientation. Moreover speakers not on ly use their own gaze to see relevant action

in the bod y of a silent hearer, bu t actively change the structure of their emerging

talk in term s of what th ey see.

What relevance do p rocesses such as th ese have to the oth er issue raised by

Chom sky (1965 :3), that of d etermining from the data of performance the

underlying system of rules mastered by the speaker-hearer? Many repairs

involve the rep etition, with some significant change, of something said elsewh ere

in the u tterance:

We wen t- I went to

If he could- If you could

Such rep etition has the effect of delineating th e boun dar ies and stru cture of

many d ifferent units in the stream of speech. Thu s, by analyzing w hat is the

same an d wh at is d ifferent in these examples one is able to d iscover: First, where

the stream of speech can be d ivided into significant subu nits; second , that

alternatives are p ossible in a p articular slot; third, wh at some of these

alternatives are (here d ifferent p ronouns); and fourth, that these alternatives

contrast w ith each other in some significant fashion, or else the rep air wou ld not

be war ranted . Repairs in other examples not only d elineate basic un its in the

stream of speech (noun p hrases for examp le), but also demonstrate the different

forms su ch units can take, and th e types of operations that can be performed

up on th em (see Goodwin 1981 :170-173). Repairs further requ ire that a listener

learn to recognize that not all of the sequences within the stream of speech are

possible sequences within the language, e.g., that I does not follow to in We

went t- I went to . In ord er to deal with such a repair a hearer is thus required

to make one of the most basic distinctions posed for anyone attemp ting to

decipher the stru cture of a langu age: to differentiate what are and are not

possible sequences in th e language, that is between gram matical and

un gramm atical structures. The fact that this task is posed m ay be crucial to any

learning process. If the party attemp ting to learn the language d id not have to


6/40

5

deal w ith ungram matical possibilities, if for example she was exposed to only

well-formed sentences, she might not have the data necessary to d etermine the

boundar ies, or even the structure of the system. Chomskys argument that the

repairs found in natu ral speech so flaw it that a child is faced with data of very

degenerate qu ality does not app ear warranted. Rather it might be argued that

if a child g rew u p in an ideal world w here she heard only well-formed sentences

she would not learn to p rodu ce sentences herself because she w ould lack the

analysis of their structure p rovided by events such as repair. Crucial to this

process is the way in wh ich visual phenomena , such as dispreferred gaze states

can both lead to repair, and d emonstrate that the participan ts are in fact

attending in fine detail to wh at might ap pear to be quite ephem eral structure in

the stream of speech.

What h as just been d escribed p rovides one example of the method ologies and

forms of analysis used to investigate visual ph enomena within Conversation

Analysis. Several observa tions can be m ade. First, the focus of analysis is not

visual events in isolation, but instead the systematic practices used by

participants in interaction to achieve courses of collaborative action with each

other in the p resent case the interactive construction of turns at talk, and the

utterances that emerge w ith those turn s. Visual events, such as gaze, play a

central role in th is process but their sense and relevance is established th rough

their embedd edness in other mean ing making tasks and practices, such as the

prod uction of a strip of talk that is in fact heard and attended to by its add ressee.

This links vision to a h ost of other phenom ena includ ing language an d the visible

body as an unfolding locus for the display of meaning and action. Second , what

the analyst seeks to do is not to provide his or her ow n gloss on how v isual

phenomena m ight be m eaningful, but instead to demonstrate how the

participan ts themselves not on ly actively orient to par ticular kinds of visual

events (such as states of gaze), bu t use them as a constitu tive featu re of the

activities they are engaged in (for example by modifying th eir talk in terms of

wh at they demonstrably see). Third, in addition to the spatial dimension that is

natu rally associated w ith vision, these p rocesses also have an intrinsic temporal

dimension as changes in visual events are marked by, and lead to, ongoing

changes in the organ ization of emerging action. If one had only a static snap shot,

p. 160


7/40

6

or measured only a single structural possibility, such as mutu al gaze instead of

looking a t the temp orally u nfolding interplay of different combinations of

participan t gaze, the type of analysis being pursued here would be impossible.

Fourth, such analysis requires data of a particular type, specifically a record that

maintains as much information as possible about the setting, embod ied displays

and spatial organization of all relevant participan ts, their talk, and h ow events

change throu gh time. In p ractice no record is comp letely adequ ate. Every camera

position exclud es other views of what is hap pening. The choice of where to p lace

the camera is but the first in a long series of crucial analytical decisions. Despite

these limitations a video or film record d oes constitute a relevant d ata source,

something that can be worked w ith in an imperfect world .

Fifth, crucial problems oftranscription are posed. The task of translating the

situated, embod ied p ractices used by p articipants in interaction to organize

phenomena relevant to vision p oses enorm ous theoretical and m ethodological

problems. Our ability to transcribe talk is built up on a process of analyzing

relevant stru cture in the stream of speech, and marking those distinctions with

wr itten symbols, that extends back thousand s of years, and is still being mod ified

today (for examp le the system developed by Gail Jefferson (Sacks, Schegloff and

Jefferson 1974: 731-733) for transcribing the texture of talk-in-interaction,

includ ing phenom ena, such as momentary restarts and sou nd stretches, that are

crucial for the analysis being reported here). When it comes to the transcription

of visual ph enomena we are at the very beginning of such a p rocess. The arrows

and other symbols Ive used to mark gaze on a transcript (see Goodw in 1981)

capture on ly a small part of a larger comp lex constituted by bod ies interacting

together in a relevant setting. The decision to describe gaze in terms of the

speaker-hearer framework is itself a major analytic one, and by no means simp le,

neutral description. Moreover a gazing head is embed ded within a larger

postu ral configuration, and indeed different par ts of the body can

simultaneously d isplay orientation to different p articipants or regions (see

Kend on 1990b, Schegloff 1998), creat ing participation frameworks of

considerable complexity. Thus on occasion a transcriber w ants some way of

indicating on the p rinted page posture and alignment. In ad dition, not only the

bodies of the participants, but also phenomena in their surrou nd , can be crucial

p. 161


8/40

7

to the organization of their action. To try to m ake the ph enomena Im analyzing

independently accessible to the reader so that she or he can evaluate m y analysis,

Ive experimented w ith using tran scription symbols, frame grabs, diagrams, and

movies embedded in electronic versions of papers. Multiple issues are involved

and no method is entirely successful. On the one han d th e analyst needs

materials that maintain as mu ch of the original structure of the events being

analyzed as p ossible, and wh ich can be easily and repetitively replayed. On the

other hand, just as a raw tape recording d oes not display the analysis of

segmental structure in the stream of speech p rovided by transcription with a

phonetic or alph abetic writing system, in itself a video, even one that can be

embedd ed w ithin a paper, does not provide an analysis of how visible events are

being parsed by p articipants. The comp lexity of the phenom ena involved

requires multiple methods for rendering relevant d istinctions (e.g., accurate

transcription of speech, gaze notation, fram e grabs, diagrams, etc., see also Ochs

1979). Moreover , like the two-faced Roman god Janus, any tran scription system

mu st attend simultaneously to two separate fields, looking in one direction at

how to accurately recover throu gh a systematic notation the end ogenous

structure of the events being investigated, while simultaneously keeping an other

eye on the add ressee/ reader of the analysis by attemp ting to present relevant

descript ions as clearly and vividly as possible. In man y cases d ifferent stages of

analysis and presentation w ill require m ultiple transcriptions. There is a

recursive interplay between analysis and method s of description.

Work in Conversation Analysis has provid ed extensive stud y of how the gaze

of participants tow ard each oth er is consequen tial for the organization of action

within talk-in-interaction. Phenomena investigated include th e way in w hich

speakers change the structure of an emerging utterance, and the sentence being

constructed within it, as gaze is moved from one typ e of recipient to another, so

that the u tterance maintains its app ropriateness for its addressee of the mom ent

(Goodw in 1979, 1981); how speakers modify d escript ions in term s of their

hearers visible assessment of wh at is being said (M.H. Goodwin 1980b); how

genres such as stories are constructed not by a speaker alone, but instead

through the d ifferentiated visible d isplays of a range of structurally d ifferent

kind s of recipients (speaker, pr imary addressee, pr incipal character, etc. See


9/40

8

Goodwin 1984); the organization of gaze and co-participation in medical

encoun ters (Heath 1986, Robinson 1998); the interactive organization of

assessments (Goodw in and Goodw in 1987), gesture (Goodw in in press, Streeck

1993, 1994), the u se of gaze in activities such as w ord searches (M.H. Good win

and C. Goodw in 1986), etc. Though not strictly lodged within Conversation

Analysis the w ork of Kendon (1990a. 1994, 1997) on both the interactive

organ ization of bodies as they frame states of talk, and on gesture, is central to

the stu dy of visible behavior in interaction. Haviland (1993) provides imp ortant

analysis of the interactive organization of gesture within narration (for extensive

analysis of gesture from a p sychological perspective see McNeill 1992).

Scientific Images

The visible, gazing bod y, and the orientation of participants tow ard eachother as they co-prod uce states of talk is central to the w ork in

ConversationalAnalysis just examined . By w ay of contrast mu ch work w ithin

Ethnomethod ology has focused not on th e bodies of actors, but instead on the

images, diagrams, graph s and other visual p ractices used by scientists to

construct the crucial visual working environm ents of their d isciplines. As noted

by Lynch and Woolgar (1990:5):

Manifestly, what scientists laboriously piece together, pick up in

their hand s, measure, show to one another, argue about, and

circulate to others in th eir comm un ities are not natural objects

indep enden t of cultural processes and literary forms. They are

extracts, tissue cultures, and residu es imp ressed w ithin grap hic

matrices; ordered, shaped, and filtered samples; carefully aligned

ph otograph ic traces and chart record ings; and verbal accoun ts.

These are the p roximal things" taken into th e laboratory and

circulated in p rint and they are a rich repository ofsocial

actions.

Despite importan t d ifferences in su bject matter and method ology both fields

emphasize the importan ce of focusing not on representations or other v isual

phenomena as self-contained en tities in their own right, but instead on how they

are constructed, attend ed to, and used by par ticipants as componen ts of the

p. 162


10/40

9

endogenous activities that make u p the lifeworld of a setting. Thus, in

introdu cing their imp ortant volume onRepresentation in Scientific Practice Lynch

and Woolgar (1990: 11) define their inqu iry as follows:

Instead of asking what d o we m ean, in various contexts, by

representation? the stud ies begin by asking, What do the participants,

in this case, treat as representation?

Note that what m ust be investigated is specified both in term s of the orientation

of the pa rticipants, and with respect to the features of the relevant local setting

(e.g., in this case). This leads to a distinctive ethn omethod ological persp ective

on reflexivity :

Reflexivity in th is usage m eans, not self-referential nor reflective

awareness of representational practice, but the inseparability of a theory

of representation from the h eterogeneous social contexts in wh ich

representations are composed and used (Lynch and Woolgar 1990 12).

In a classic article Lynch (1990 :153-154) formulates the task of analyzing

scientific representa tions as that of describing the publicly visible externatized

retina that is the site for the p ractices implicated in the social constitution of the

objects that are th e focus of scientific work:

This study is based on the prem ise that visual displays are more

than a simple matter of supplying p ictorial illustrations for

scientific texts. They are essential to how scientific objects and

orderly relationships are revealed an d mad e analyzable. To

app reciate this, we first need to wrest the idea ofrepresentation from

an ind ividu alistic cognitive foundation, and to replace a

preoccup ation with images on th e retina (or alternatively mental

images or pictorial ideas) with a focus on the externalized retina

of the graphic and instrum ental fields u pon which the scientific

image is imp ressed and circulated.

Using as d ata images from scientific journal art icles and books Lynch describes

two families of practices used to constitu te the visible scientific object: selection

and mathematization. Selection, illustrated through dou ble images in wh ich a

photograph and a diagram of entities visible in the ph otograph are presented

side by side, is described as a host of practices that iteratively transform one

p. 163


11/40

10

image of an entity into another (e.g. the photograp h to the diagram) while

simultaneously structuring and shap ing what it is that is being represented.

Crucial to this process is that fact that d ifferent selective/ shaping practices,

including Filtering, Uniforming, Upgrading and Defining can be repetitively

app lied creating not just a single image, but a linked, d irectional chain of

representations Indeed mu ch of the w ork of actually d oing science consists in

building and shap ing what Latour (1986) (see also Latou r and Woolgar 1979)

have called inscriptions in this fashion. Mathematization refers not simp ly to

the use of nu mbers, but instead to the host of practices used to tr ansform

recalcitrant events into mathematically tractable visual and graphic displays e.g.,

graph s, charts and d iagrams. Thus an image showing a map of lizard territories

is assembled throu gh, among oth er operations, driving stakes into the lizard s

environment to create a grid for measurement (and thus injecting a scientifically

relevant Cartesian sp ace into the very h abitat being stud ied), repetitively

capturing lizards, distingu ishing th em from each other by cutting off a different

pattern of toes on each lizard, recording each capture on a pap er map of the

staked ou t territory, and finally draw ing lines around collections of points to

create the m ap . As noted by Lynch (1990: 171) the p rod uct of these p ractices, e.g.,

the published map , is a hybrid object that is demon strably mathem atical,

natural and literary. Note how in all of these cases the focus of analysis is on the

contextually based practices of the participan ts who are assembling an d using

these images to accomp lish the w ork that defines their p rofession.

Though em erging from psychological anthropology, rather than

ethnomethdology, Hutchins (1995) grou nd breaking stud y of the cognitive

practices required to nav igate a ship outlines a major p erspective for the an alysis

of both images and seeing as forms of work-relevant p ractice. Hu tchins

dem onstrates how the practices required to navigate a ship are not situated

within the mental life of a single individu al, but are instead em bedd ed w ithin a

distributed system that encomp asses visual tools such as maps and instruments

for juxtaposing a land mark and compass bearing within the same visual field,

and actors in stru cturally d ifferent p ositions wh o use alternative tools and, in

part because of this, perform d ifferent kind s of cognitive operations, many of


12/40

11

wh ich h ave a strong visual comp onent (e.g., locating land marks, p lotting

positions on a map, etc.).

Images in Interaction

All of the work discussed so far takes as its point of departure for theinvestigation of visual ph enomena the task of describing an d analyzing the

practices used by participan ts to construct the actions and events that m ake up

their lifeworld. Rather than stand ing alone as a self-contained analytic dom ain,

visual ph enomena are constituted and mad e meaningful through the way in

wh ich they are embedd ed w ithin this larger set of practices. However, within

this common focus, two quite d ifferent ord ers of visual p ractice have been

examined. Research in science studies has investigated the images prod uced by

scientists, and the way in w hich they visually and math ematically structure thewor ld that is the focus on their inquiry, without h owever looking in much detail

at how scientists attend to each other as living, meaningful bodies, or structure

wh at they are seeing through the organ ization of talk-in-interaction. By w ay of

contrast studies of the interactive organ ization of vision in conversation looked

in considerable detail at how participan ts treat the visual displays of each

others bodies as consequential, and how this is relevant to the moment-by-

moment production of talk, but d id not focus m uch analysis on images in the

environment. Clearly all of the phenomena noted the visible body,

participation, gesture, the d etails of talk and language u se, visual structure in the

surrou nd , images, maps and other representational practices, the pu blic

organ ization of visual practice within the worklife of a profession, etc. are

relevant. The question arises as to wh ether it is possible to analyze such d isparate

phenomena w ithin a coherent analytic framework.

Before turn ing to stud ies that have probed such questions several issues must

be noted . First, it is clearly not the case that the on ly acceptable analysis is one

that includes th is full range of all possible visual p henom ena. Both p articipants

and the structures that p rovide organization for action and events use visible

phenomena selectively. Parties speaking over the telephone can see neither either

others bodies nor events in a common surrou nd . A scientific journ al can be read

in the absence of the parties who constructed its text and d iagrams. More

p. 164


13/40

12

interestingly within face-to-face interaction participants can continuously shift

between actions that invoke, and p erhaps require, gaze toward specific events in

the surround , and those make relevant gaze toward no more than each others

bodies, and even in th is more limited case there may be a real issue as to whether

it is relevant to attend to everything that a body does, e.g., some gestures mad e

by a speaker may not requ ire gaze toward them from an ad dressee. There is thu s

an essential contingency, not only for the analyst bu t more crucially for the

participants themselves, as to what su bset of possible visual events are in fact

relevant to the organ ization of the actions of the mom ent. Moreover, this means

that in ad dition to investigating how different kinds of visible phenom ena are

organized, the analyst mu st also take into account how participan ts show each

other what kind s of events they are expected to take into account at a p articular

mom ent, for example to indicate that a participan t, gesture, or entity in th e

surrou nd should be gazed at. There is thus not only comm un ication through

vision, bu t also ongoing comm un ication abou t relevant vision (Goodw in 1981,

1986; in p rep aration, Streeck 1988).

Second visual events are quite heterogeneous, not only in wh at they m ake

visible, but more crucially in their structure. Consider for example the issue of

temporality. Both gestures and the d isplays of postu ral orientation used to bu ild

participation framew orks are performed by th e body w ithin interaction.

However, while gestures, like the bits of talk they accompany, are typically brief

(e.g. they frequently fall within the scope of a single utterance) and display

semantic content relevant to th e topic of the m oment, participation d isplays

frame extended strips of talk and typically provide information abou t the

participants orientation rather th an the specifics of wh at is being d iscussed.

Bodily displays with one kind of temp oral du ration (and information content)

are thus embed ded within another class of visual displays being mad e by the

body w hich have a quite different structure.

Third , the structure of visual signs, including their possibilities for

prop agation through space and time, can be intimately tied to the medium used

to construct them. A major theme of Shakespeares sonnets focuses on the

contrast between the temporally constrained hu man bod y, cond emned to

inevitable decay, and the (limited) possibilities for transcending such corruption

. 165


14/40

13

provid ed by language inscribed on the p rinted page which can remain fresh and

alive long after its author an d subject have passed into du st. This contrast

between th e temp oral possibilities provided by alternative media (e.g., the bod y

and docum ents) constitutes an ongoing resource for pa rticipants in vernacular

settings as they bu ild, throu gh interaction w ith each other, the events that make

up their lifewor ld. In ad dition to the d isplays made by a fleeting gesture or local

participation framework, participants also have access to images and docum ents

wh ich can encompass m ultiple interactions and quite d iverse settings. This arises

in par t from the specific media u sed to constitute the signs they contain. Rather

than being lodged within an ever changing hu man body, such d ocum ents

constitute w hat Latou r (1987: 223) has called imm utable mobiles, portable

material objects that can carry stable inscriptions of various types from place to

place and through time.

However, despite the way in w hich crucial aspects of the structure of images

and docum ents remain constant in d ifferent environm ents, they are not self-

contained visual artifacts that can be analyzed in isolation from the processes of

interaction and w ork practices through w hich they are mad e relevant and

meaningful. The same image or docum ent can be construed in quite different

ways in alternat ive settings. For example, a schedule listing all arriving and

dep arting flights was a m ajor tool for almost all workgroups at th e airport

stud ied by the Xerox PARC workplace project (Brun-Cottan et al. 1991, Goodwin

and Goodw in 1996, Suchman 1992), and indeed it linked diverse w orkers

throughou t North America into a common w eb of activity. However w hile

baggage loaders carefully structured their work to anticipate arriving flights, so

that p lanes could be speedily unloaded , these same arrival times were almost

ignored by gate agents looking at the same schedule, but concerned w ith the

dep arture of passengers. Each work group h ighlighted the comm on docum ent in

ways relevant to the specific work tasks it faced. Similarly, on the oceanographic

ship reported in Goodwin (1995) a map showing where samples would be taken

in the Atlantic at the mou th of the Amazon, was a major docum ent at all stages

of the research project. Before the ship sailed th e places where samples could be

taken w as the focus of intense political debate betw een d ifferent grou ps of

scientists and the Brazilian and American governm ents; after the project was


15/40

14

comp leted the map p rovided an infrastructure for graph ic displays that could be

used in pu blished journal articles to show wh at the scientists had found about

how the w aters of the Am azon and the Atlantic interacted w ith each other, i.e., a

way of making visible relevant scientific phenomena; during the voyage itself the

map not only provided a comm on framework for the quite different work of

various teams of scientists and the crew nav igating the sh ip, but could also be

looked at by lab technicians not able to go to bed for days at a time because of the

map s incessant samp ling d emand s, to locate places wh ere stations w ere far

apart and rest was p ossible. In brief, though the material form of images and

docum ents gives them an extended temporal scope, and the ability to travel from

setting to setting, they cann ot be analyzed as self-conta ined fields of visually

organized m eaning, but instead stand in a reflexive relationship to th e settings

and p rocesses of embodied hum an interaction through w hich they are

constituted as m eaningful entities. To explicate such events ana lysis must deal

simultaneously with th e quite different structure and temporal organization of

both local embodied p ractice and endu ring graph ic displays.

Finally, the visual (and other p roperties) of settings stru cture environments

that sh ape, on an historical time scale, the activities systematically performed

within th ose settings. A very simp le examp le is provided by the bridge of the

oceanographic ship which not only had a wind ow facing forward so the

helmsman could steer the ship and watch for trouble, but also a window facing

backward s. This was u sed by a w inch operator who had the task of lifting heavy

instrument packages in and out of the sea. Though being u sed here to d o science,

this arrangem ent is in fact a systematic solution to a repetitive problem faced by

sailors, such as fishermen using n ets, who have to m aneuver heavy objects while

a sea. Solutions found to these tasks, such as the rear facing w indow with the

visual access it provides (as well as the forward window facilitating n avigation),

are built into the tools that constitute the work environments u sed by subsequent

actors faced w ith similar tasks. See Hutchins (1995) for illum inating analysis of

this process, includ ing tools that visually structure complex mathematical

calculations, as well as map s. Both w ork environm ents and many of the tools

used within them (comp uter d isplays, etc.) structure in quite specific ways the

embodied visual p ractices of those who inhabit such settings.

p. 166


16/40

15

In an attemp t to come to terms w ith such issues Goodw in (in press) has

prop osed that images in interaction are lodged within end ogenous activity

systems constituted through th e ongoing, changing dep loyment of mu ltiple

semiotic fields which mutually elaborate each other. The term semiotic field is

intended to focus on signs-in-their-med ia, i.e., the way in w hich what is typ ically

been attended to are sign p henomena of various typ es (gestures, maps, displays

of bodily orientation, etc.) which have variable structural properties that arise in

part from the d ifferent kind s of materials used to make them visible (e.g., the

body, talk, documents, etc.). Bringing signs lodged within different fields into a

relationship of mutu al elaboration produ ces locally relevant meaning and action

that could not be accomp lished by one sign system alone. Consider for example a

place on a m ap ind icated by a p ointing finger wh ich is being construed in a

specific fashion by the accomp anying talk. Neither the map as a w hole, that is a

self-sufficient representation, nor the pointing finger in isolation from a) its

target (the spot on th e map ) and b) the construal being p rovided by the talk, nor

the talk alone w ould be sufficient to constitute the action m ade visible by the

conjoined use of the three semiotic fields, each of wh ich p rovid es resources for

specifying how to relevantly see and understand the others (see the brief

d iscussion of the Rodney King data below for a specific examp le; see Goodwin

in preparation for more detailed analysis of pointing). The particular subset of

semiotic fields available in a setting that p articipants orient to as relevant to the

construction of the actions of the m oment constitutes a contextual configuration.

As interaction u nfolds contextual configurations can change as new fields are

add ed to, or d ropp ed from, the specific mix being u sed to constitute the events of

the moment. Thus, as contextual configurations change th ere is both un folding

pu blic semiotic structure an d contingency(and indeed in some circum stances

actions can m isfire when ad dressees fail to take into accoun t a relevant semiotic

field, such as the sequential organization p rovided by a prior unheard utterance

see Goodw in in preparation for an examp le).

Professional Vision

Work settings provid e one environment in wh ich the interplay between situated,

embodied interaction, and the u se of visual images of different types, can be

p. 167


17/40

16

systematically investigated. In man y w ork settings par ticipants face the task of

classifying visual phenom ena in a w ay that is relevant to the work they are

charged w ith performing. Frequently they m ust also construct different kinds of

representations of visual stru cture in the environment that is the focus of their

professional scrut iny. We will now briefly examine how such vision is socially

organ ized in two tasks faced by archaeologists: 1) color classification and 2) Map

making, and then look at how such professional vision was both constructed and

contested in the tr ial of four policemen charged w ith beating an African

American motor ist, Mr. Rodney King. The key eviden ce at the trial was a

videotape of the beating.

Color Classification as Historically Structured Professional Practice

As par t of the work involved in excavating a site, archaeologists make m aps

showing relevant stru cture in the layers of dirt they uncover. In ad dition to

artifacts, such as stone tools, archaeologists are also interested infeatures,such the

remains of an old h earth or the ou tlines of the posts that held up a building. Such

features are typically visible as color d ifferences in th e dirt being examined (e.g.,

the remains of a cooking fire will be blacker than the su rroun ding soil, and the

holes used for p osts will also have a different color from the soil aroun d them).

Field archaeologists thu s face the task of systematically classifying the color of

the d irt they are excavating. The m ethods th ey use to accomp lish this taskconstitute a form of professional visual p ractice. As dem onstrated by the

discussion of Lynchs analysis of scientific representation, and the brief

description of the oceanograp hers, crucial work in m any different occup ations

takes the form of classifying and constructing visual p henomena in w ays that

help shap e the objects of knowledge th at are the focus of the w ork of a profession

(e.g., architects, sailors plotting courses on charts, air traffic controllers,

professors making graphs and overheads for talks and classes, etc.). Such

professional vision constitutes a persp icuou s site for systematic stud y of howdifferent kinds of phenomena intersect to organize a commu nitys practices of

seeing.

Goodwin (1996, in press) describes how archaeologists code the color of the

dirt they are excavating throu gh u se of a Munsell chart. The following show s two


18/40

17

archaeologists performing th is task, the Mu nsell page that they are u sing, and

the coding form where they will record their classification:

17 Pam: En this one. ((Points at color patch))

18 (0.4) ((Jeff moves trowel))

19 Jeff: nuhhh?20 (1.8)

21 Pam: Or that one? ((Points at color patch))

Within this scene are a nu mber of different kinds of phenomena relevant to

the organ ization of visual p ractice, includ ing tools that stru cture the process of

seeing an d classification, and docum ents that organize cognition an d interaction

in the current setting w hile linking these p rocesses to larger activities and other

settings. These archaeologists are intently examining the color of a tiny sam ple of

d irt because they have been given a coding form to fill out. That form t ies their

work at th is site to a range of other settings, such as the offices and lab of the

senior investigator, where the form being filled in here will eventually become

part of the perm anent record of the excavation, and a componen t of subsequent

analysis. The multivocality of this form, the way in which it displays on a single

. 168


19/40

18

surface the actions of multip le actors in structura lly d ifferent positions, is show n

visually in vivid fashion by th e contrast between the printed coding categories,

and the hand written entries of the field workers.

The use of a coding form su ch as this to organize the perception of natu re,

events, or people within the d iscourse of a p rofession carries with it an array of

perceptual and cognitive operations that have far reaching impact. Coding

schemes d istributed on forms allow a senior investigator to inscribe his or her

perceptual d istinctions into the w ork p ractices of the technicians w ho code the

data. By using such a system a w orker views the world from the perspective it

establishes. Of all the possible ways that the earth could be looked at, the

perceptual w ork of field workers using this form is focused on d etermining the

exact color of a minu te samp le of d irt. They engage in active cognitive work, but

the p arameters of that work have been established by the classification system

that is organ izing their perception. In so far as the coding scheme establishes an

orientation toward the world, a work-relevant w ay of seeing, it constitutes a

structure of intentionality whose proper locus is not the isolated, Cartesian mind ,

but a m uch larger organizational system, one that is characteristically m ediated

through mu nd ane bureaucratic docum ents such as this form.

Rather than stand ing alone as self-explicating textual objects, forms are

embedd ed w ithin webs of socially organized, situated practices. In ord er to make

an entry in the slot provided for coloran archaeologist mu st make u se of another

tool, the set of standard color samp les provided by a Mu nsell chart. This chart

incorporates into a portable physical object the resu lts of a long history of

scientific investigation of the p roperties of color.

The Mun sell chart being u sed by the archaeologists contains not just on e, but

three different kind s of sign systems for describing each point in the color space

it provides: 1) a set of carefully controlled color samples arranged in a grid to

dem onstrate the changes that result from systematic variation of the variables of

Hu e , Chroma and Value u sed to d efine each color (each p age displays an

ordered set of Value and Chroma variables for a single hue); 2) numeric

coordinates for each row and column , the intersection of what specifies each

square as a p air of nu mbers (e.g., 4/ 6 on the 10YR Hue page); and 3) standard

color names such as dark yellowish brown (these names are on the left facing

p. 169


20/40

19

page w hich is not reproduced here). Moreover these systems are not p recisely

equivalent to each other. For example several color squares can fall within the

scope of a single nam e.

Why d oes the Munsell page contain mu ltiple, overlapping representation of

wh at is app arently the same visual entity (e.g., a par ticular choice within a larger

set of color categories)? The answer seems to like in the way that each

representation as a semiotic field with its own distinctive prop erties makes

possible alternative operations and actions, and thus fits into d ifferent kind s of

activities. Both the names and nu mbered grid coordinates can be written, and

thus easily transported from the actual excavation to the other w ork sites, such as

laboratories and journals, that constitute archaeology as a profession. The

numbers p rovide the most precise description, and d o not require translation

from language to langu age. However locating the color indexed by the

coordinates requ ires that the classification be read with a Mu nsell book at hand .

By way of contrast the color names can be grasped in a way that is adequate for

most p ractical pu rposes by any competent sp eaker of the language u sed to write

the rep ort. The outcome of the activity of color classification initiated by the

empty square on th e coding form is thu s a set of portable lingu istic objects that

can easily be incorporated into the u nfolding chains of inscription th at lead step

by step from the d irt at the site to reports in the archaeological literature.

How ever, as arbitrary linguistic signs produ ced in a medium that d oes not

actually m ake visible color, neither the color names nor the num bers, allow d irect

visual comparison betw een a sam ple of dirt an d a reference color. This is

precisely what the color patches and viewing h oles make possible. In brief, rather

than simply specifying unique p oints in a larger color space, the Mun sell chart is

used in mu ltiple overlapp ing activities (comparing a reference color and a p atch

of dirt as par t of the work of classification, transpor ting those results back to the

labe, comparing sam ples, publishing reports, etc.), and thu s represents the

same entity, a particular color, in multiple ways, each of which makes p ossible

different kind s of operations because of the unique p roperties of each

representational system.

In addition to its various sign system s it also contains a set of circular holes,

positioned so that one is ad jacent to each color patch. To classify color the

p. 170


21/40

20

archaeologist puts a sm all samp le of dirt on the tip of a trowel, pu ts the trowel

directly un der th e Munsell page and then m oves it from hole to hole until the

best match w ith an adjacent color samp le is found . With elegant simplicity the

Mun sell page with its holes for viewing the sam ple of dirt on the trow el

juxtaposes in a single visual field two qu ite different kind s of spaces: 1) actual

dirt from the site at the archaeologists feet is framed by 2) a theoretical space for

the r igorous, replicable classification of color. The latter is both a conceptual

space, the prod uct of considerable research into p roperties of color, and an actual

physical space instantiated in the orderly m odification of variables arranged in a

grid on the Mu nsell page. The pages juxtaposing color p atches and viewing holes

that allow the d irt to be seen right n ext to the color sample provide an

historically constituted architecture for perception, one that encapsulates in a

material object theory an d solutions d eveloped by earlier w orkers at other sites

faced w ith the task of color classification. By juxtap osing unlike spaces, but on es

relevant to th e accomp lishment of a specific cognitive task, the char t creates a

new , distinctively hum an, kind of space. It is precisely here, as bits of d irt are

shaped into the work relevant categories of a specific social group, that nature

is transformed into culture.

How are the resources provided by the chart made visible and relevant

within talk-in-interaction? At line 17 Pam moves her han d to the space above the

Mun sell chart an d points to a p articular color patch wh ile saying En this one.

Within th e field of action created by the activity of color classification, what Pam

does here is not simply an indexical gesture, but a p roposal that the ind icated

color migh t be the one they are searching for. By virtue of such cond itional

relevance (Schegloff 1968) it creates a new context in w hich rep ly from Jeff is the

expected n ext action. In line 19 Jeff rejects the p roposed color. His move occurs

after a noticeable silence in line 18. However that silence is not an em pty space,

but a p lace occup ied by its own relevant activity. Before a comp etent answ er to

Pams prop osal in line 17 can be made, the dirt being evaluated h as to be placed

un der the viewing hole next to the samp le she indicated, so that the tw o can be

compared . During line 18 Jeff moves the trowel to this position. Because of the

spatial organization of this activity, specific actions have to be p erformed before

a relevant task, a color comparison, can be competently performed. In brief, in

p. 171


22/40

21

this activity the spatial organization of the tools being w orked w ith, and th e

sequential organization of talk in interaction interact with each other in th e

prod uction of relevant action (e.g. getting to a place where one m ake an expected

answer requires rearrangement of the visual field being scrutinized so that th e

jud gment being requested can be competently p erformed ). Here socially

organized vision requ ires embod ied manipulation of the environment being

scrutinized.

It is comm on to talk about stru ctures such as the Munsell chart as

representations. How ever exclusive focus on the representational prop erties of

such structures can seriously distort our u nd erstand ing of how such entities are

embedd ed w ithin the organization of human p ractice. With its viewholes for

scru tinizing sam ples, the page is not simp ly a perspicuous rep resentation of

current know ledge about th e organization of color, but a space designed for the

ongoing prod uction of particular kinds ofaction.

We will now look at how a group of archaeologists make a m ap. This process

will allow us to examine the interface between seeing, writing practices, talk,

human interaction and tool use (see Goodw in 1994 for more d etailed analysis).

Map Making and t he Pract ices of Seeing it Requires

Map s are central to archaeological practice. The p rofessional seeing required to

prod uce and m ake use of a visual document, such as a map, encompasses notonly the image itself but also the ability to competently see relevant structure in

the territory being m app ed, mastery of appropriate tools, and on occasion the

ability to an alyze the w ork-relevant actions of anothers body. These d ifferent

kinds of phenomena can be brough t together within the temporally unfolding

process of hu man interaction used to accomp lish the activity of making a map . In

the following, two archaeologists are making a map to record w hat they have

foun d in a p rofile of the d irt on the side of one of the square holes they have

excavated. Before actually setting pen to paper some relevant events in the d irt,such as the bound ary between two different kinds of soil, are highlighted by

outlining them w ith the tip of a trowel. The structure visible in the dirt is then

map ped on a sheet of graph pap er. Typically this task is don e by two

participan ts working together. One u ses a pair of rulers (one laid horizontally on

the surface, and the other a hand held tape measure used to m easure d epth


23/40

22

beneath the sur face) to measure the length an d d epth coordinates of the points in

the dirt that are to be transferred to the m ap, and then speaks these coord inates

as pairs of numbers (e.g., at fifteen th ree point tw o). The second p erson p lots

the points specified on th e graph pap er, and d raws lines between successive

measurements. What w e find here is a small activity system that encomp asses

talk, writing, tools and distributed cognition as tw o parties collaborate to inscribe

events they see in the earth onto p aper. Here Ann, the party draw ing the map, is

the senior Archaeologist at the site, and Sue, the person making measurem ents is

her Student:

p. 172


24/40

23

1 An n: Give m e th e grou nd su rface over here

2 to about ninety .

3 (1.6)4 Ann: No- No- Not at ninety.

5 From you to about ninety.

6 (1.0)

7 Sue: Oh.

8 Ann: Wherever there's a change in slope.

9 (0.6)

10 Sue: Mm kay.

11 Ann: See so if its fairly flat

12 I'll need one13 where it stops being fairly flat.

14 Sue: Okay.

15 Ann: Like right there.

Line DrawnWith Trowel

SurfaceTape

MeasureSueAnn

Ruler


25/40

24

The sequence to be examined begins with a d irective. Ann, the w riter, tells Sue

the measurer, to Give me the ground su rface over here to aboutninety.

How ever before Sue has p roduced any num bers, indeed before she has said

anything w hatsoever, Ann in lines 4 and 5 challenges her, telling her that what

she is doing is wrong: No- No- Notat ninety.From you t o about ninety.

Directives are a classic form of speech action that sociolingu ists have used to

probe the relationship between language and social structure, and in particular

issues of power and gender. Here Sue formats both her d irective and her

correction in very strong, d irect aggravated fashion. No forms of mitigation are

foun d in either utterance, and Ann is not given an opportun ity to find and

correct the trouble on h er own. Directives formatted in this fashion h ave

frequen tly been argu ed to d isplay a hierarchical relationship, i. e., Ann is treating

Sue as someone that she can give direct, unm itigated ord ers to. And indeed Ann

is a professor and Sue is her stud ent.

Issues of power do not how ever exhau st the social phenomena visible in th is

sequence. Equally imp ortant are a range ofcognitive processes that are as

socially organized as the relationships between the participants. For example, in

that Sue has not p rodu ced an answer to the directive, how can Ann see that there

is something w rong w ith a response that has not even occurred yet? Crucial to

this process is the ph enomenon ofconditional relevance first described by

Schegloff (1968). Basically a first ut terance creates an interp retive environm ent

that w ill be used to an alyze whatever occurs after it. Here no subsequ ent talk has

yet been produced. How ever, providing an answer in this activity system

encompasses more than talk. Before speaking the set of nu mbers that counts as a

prop er next bit of talk, Sue m ust first locate a relevant point in the d irt and

measure its coordinates. Both her movement throu gh space, and h er use of tools

such as a tape measu re, are visible events. As Ann finishes her directive Sue is

holding the tape m easure against the d irt at the left or zero end of the profile.

How ever, just after hearing ninety Sue m oves both her body and the tape

measure to right, stopping near the 90 mark on the up per ru ler. By virtue of

the field interpretation opened u p through cond itional relevance, Sues

movement and tool use can now be analyzed by Ann as elements of the activity

she has been asked to p erform, and foun d w anting. Sue has moved imm ediately

. 173


26/40

25

to ninety instead of measuring the relevant points between zero and ninety. The

sequential framew ork created by a directive in talk thu s provid es resources for

analyzing and evaluating the visible activity of an ad dressees body interacting

with a relevant environment.

Additional elements of the cognitive operations and kinds of seeing that Ann

requires from Sue in order to make her measurements are revealed as the

sequence continues to unfold. Making the relevant measurements p resupposes

the ability to locate where in the d irt measurements should be mad e. However

Sues response calls this presupp osition into question and leads to Ann telling

her explicitly, in several different w ays, what she shou ld look for in order to

determine where to measure. After Ann tells Sue to measure points between zero

and ninety, Sues does not imm ediately move to points in that r egion bu t instead

hesitates for a full second before replying w ith a w eakOh (line 7). Ann th en

tells her w hat she shou ld be looking for Wherever theres a change in slope

(line 8). This description of course presup poses Sues ability to find in the d irt

wh at will could as a change in slope. Sue again moves her tape measure far to

the right. At this p oint, instead of relying upon talk alone to make explicit the

phenomena that she w ants Sue to locate, Ann m oves into the sp ace that Sue is

attending to and points to one place that should be measured w hile describing

more explicitly w hat constitutes a change in slope: See so if itsfairlyflat Ill

need one where itstops being fairly flat like right th ere.

One of the th ings that is occurr ing within this sequence is a p rogressive

expansion of Sues und erstanding as the d istinctions she must make to carry out

the task assigned to her are explicated and elaborated. In th is process of

socialization th rough language th ere is a growth in intersubjectivity as d omains

of ignorance that p revent the successful accomplishm ent of collaborative action

are revealed and transformed into practical knowledge, a way of seeing, that is

sufficient to get the job at hand don e, such that Sue is finally able to un derstand

wh at Ann is asking h er to do (that is to see the scene in front of her in a manner

that p ermits her to m ake an ap prop riate, comp etent response to the d irective). It

wou ld how ever be wrong to see the un it within which this intersubjectivity is

lodged as simply these two mind s coming together in the work at han d. Instead

the d istinction being explicated, the ability to see in the very complex perceptual

p. 174


27/40

26

field provid ed by the landscape they are attending to, those few events that

coun t as points to be transferred to the m ap, are central to wh at it means to see

the world as an archaeologist, and to use that seeing to bu ild the artifacts, such as

this map , which are constitut ive of archaeology as a profession. Such seeing

wou ld be expected of any competent archaeologist. It is an essential part of what

it means to be an archaeologist, and it is these professional p ractices of seeing

that Sue is being held accountable to. The relevant unit for the analysis of the

intersu bjectivity at issue here the ability of separate ind ividu als to see a

comm on scene in a congru ent, work-relevant fashion is thus not these

individuals as isolated entities, but instead archaeology as a profession, a

comm un ity of comp etent practitioners, most of whom have never m et each

other, but w ho n onetheless expect each other to be able to see and categorize the

wor ld in w ays that are relevant to the w ork, scenes, tools and ar tifacts that

constitute their profession.

The ph enomena examined so far provide some demonstration of how what is

to be seen in a m ap, scene, hum an bod y or image stand s in a reflexive

relationship to other semiotic structures that p articipants are u sing to constitute

visual phenom ena as a relevant component of the events and activities that make

up their lifeworld . These structures includ e language, the constitution of action

and context provid ed by sequen tial organization, and ways of seeing events and

using images of d ifferent typ es that are lodged w ithin the p ractices of particular

social commu nities, such as the profession of archaeology.

Professional Vision in Court

Parties who are not competent m embers of relevant social comm unities can lack

the ability, and/ or the social positioning, to see and articulate visual events in a

consequential way. These issues were mad e d ramatically visible in the trial of

four Los Angeles policemen w ho were recorded on v ideotape ad ministering a

beating to an African American motorist, Mister Rodn ey King, whom they hadstopped after a high speed p ursu it triggered by a traffic violation. When the tape

of the beating was show n on n ational television there was outrage, and even the

head of the Los Angeles police d epartm ent thought that conviction of the officers

was almost autom atic. How ever, at their first trial (they w ere later tried again in

Federal ra ther than state cour t for violating Mister Kings civil rights) all fourp. 175


28/40

27

policemen w ere acquitted, a verdict that triggered an u prising in the city of Los

Angeles, with neighborhood s being burn ed, federal troops being called in, etc.

The crucial evidence at the trial was a visual d ocum ent: the videotape of the

beating. Rather than transparently proving the guilt of the policemen w ho were

seen on it beating a man lying p rone on the groun d, the tape in fact provided the

policemens lawyers w ith their evidence for convincing the jury that th eir clients

were not gu ilty of any w rongd oing. They did this by using langu age, pointing

and expert testimony to structure how the jury saw the events on the tape in a

way the exonerated the p olicemen. In essence they used the tap e of the beating to

dem onstrate that Mr. King w as the aggressor, not the p olicemen, and that the

policemen w ere following prop er police practice for subd uing a violent,

dan gerous su spect (see Goodwin 1994 for more d etailed analysis of such

professional vision). Crucial to their success was their use of another policeman,

Sargent Du ke, as an expert witness. It was argued that laymen could n ot

prop erly see the events on the tap e. Instead, the ability to legitimately see wh at

the body of a suspect was d oing, such as Mr. Kings as he lay on the ground

being beaten, and specifically whether th e suspect w as being aggressive or

comp liant, was lodged w ithin the work practices of the social group charged

with arresting susp ects: the p olice. The ability to see such a body, and code it in

terms of its aggressiveness, was a component of the professional practices that

policemen u se to code the events that are the focus of their w ork. It so far as such

vision is a pu blic comp onent of the work p ractices of a p articular social group ,

someone wh o wasnt present bu t who is a member of the profession, a

policeman, can m ake auth oritative statements abou t what can be legitimately

seen on th e tape. How ever, while policemen constitute a socially organized

profession, susp ects and victims of beatings d ont. Therefore there is no one with

the social stand ing, i.e., mem bership and mastery of the practices of a relevant

social group , to act as an expert w itness to articulate what w as happening from

Mister Kings p erspective.

What was to be seen on the tape was structured through the way in w hich

different semiotic fields, such as structure in the stream of speech, pointing

wh ich h ighlighted specific places and phenomena in the image being looked at,

and events in the image itself, mu tually elaborated each other to p rovide a


29/40

28

construal of events that served the pu rposes of the party articulating the image.

The following p rovides an examp le. At the p oint wh ere we enter th is sequence

the prosecutor has noted that Mr. King ap pears to be moving into a position

app ropriate for han dcuffing him, and that on e officer is in fact reaching for his

han dcuffs, i.e. the su spect is being cooperative.

1 Prosecu tor: So uh would you ,

2 again consider this to be:

3 a nonagressive, movement by Mr. King?

4 Sgt. Duke: At this t ime no I wouldn't. (1.1)

5 Prosecutor : It is aggressive.

6 Sgt. Duke: Yes. It 's starting to be. (0.9)

7 This foot, is laying flat, (0.8)8 There's starting to be a bend. in uh (0.6)

9 this leg (0.4)

10 in his butt (0.4)

11 The buttocks area has started to rise. (0.7)

12 which would put us,

13 at the beginning of ourspectrum again.

indicates that Sgt. Duke is pointing on the screenat the body part described in his talk.

By noting the submissive elements in Mr. Kings posture, and the fact that one of

the officers is reaching for his handcuffs the prosecutor has shown that th e tape

demon strates that Mr. King is being cooperative. If he can establish this point

hitting Mr. King again would be un justified, and the officers should be found

guilty of the crimes they are charged with. The contested vision being d ebated

here has very h igh stakes.

To rebut the vision p roposed by the p rosecutor, Sgt. Duke uses the seman tic

resources provided by language to code as aggressive extremely subtle body

movements of a man lying face down beneath the officers (lines 7-11). Note for

examp le not only line 13s explicit placement of Mr. King at the very edge, the

beginning, of an aggressive spectrum introdu ced in earlier testimony, but also

p. 176


30/40

29

how very small movements are made mu ch larger by situating them w ithin a

prosp ective horizon through the repeated u se ofstarting to (lines 6, 8, 13). The

events visible on the tape are structured, enhanced and amp lified by the

language used to describe them.

This focusing of attention organizes the p erceptu al field p rovided by the

videotape into a salient figure, the aggressive susp ect, wh o is highlighted against

an am orphous backgroun d containing nonfocal participan ts, the officers doing

the beating. Such stru cturing of the materials prov ided by the image is

accomp lished n ot only through talk, but also through gesture. As Sergeant Duke

speaks he brings his hand to the screen and points to the par ts of Mr. Kings

body that he is arguing d isplay aggression. The pointing gesture and the

perceptual field wh ich it is articulating m utu ally elaborate each other. The

touchable events on the television screen provide visible evidence for the

description constructed th rough talk. What emerges from Sgt. Dukes testimony

is not just a statement, a static category, but a demonstration built through the

active interplay between coding scheme and the image to w hich it is being

app lied. As talk and image mu tually enhance each other a demonstration that is

greater than th e sum of its parts em erges, while simultaneously Mr. King, rather

than the p olice officers becomes the focus of attention as the experts finger

articulating the image delineates what is relevant w ithin it.

By virtue of the category system erected by the defense, the minu te rise in Mr.

Kings buttocks noted on the tape u nleashes a cascade of perceptual inferences

that have the effect of exonerating the offers. A rise in Mr . Kings body becomes

interp reted as aggression, which in tu rn justifies the escalation of force. Like

other p arties, such as th e archaeologists, faced with the task of coding a visual

scene, the jury was led to engage in intense, minute cognitive scrutiny as they

looked at the tap e of the beating to decide the issues at stake in the case.

How ever, once the d efense coding scheme is accepted as a relevant framework

for looking at the tap e, the operative perspective for viewing it is no longer a

laypersons reaction to a man lying on the grou nd being beaten, but instead a

micro-analysis of the movements being m ade by that m ans bod y to see if it is

exhibiting aggression.

p. 177


31/40

30

In the first trial, though the p rosecution dispu ted the an alysis of specific body

movements as d isplays of aggression, the relevance of looking at th e tape in

terms of such a category system w as not challenged . A key d ifference in the

second tr ial, which led to the conviction of two of the officers, was th at there the

prosecution gave the jury alternative frameworks for interp reting the events on

the tap e. These included ways of seeing the m ovements of Mr. Kings body that

Sgt. Duke highlighted as normal reactions of a man to a beating rather than as

displays of incipient aggression. In the prosecutions argum ent Mr. King cocks

his leg, not in p reparation for a charge, but because h is muscles natu rally jerk

after being hit w ith a m etal club.

The stud y of the p ractices used to structure relevant vision in scientific and

workplace environments, what H utchins (1995) has called Cognition in the Wild,

has become the focus of considerable research. A major initiative for such studies

was provided by Lucy Scuhman in th e early 1990s when she initiated the

Workp lace project while at Xerox PARC. The site chosen for research w as

groun d op erations at a mid-sized airport. Docum ents and images of many

different types, and the ability of actors in alternative structural positions to see

and analyze events in relevant w ays, were crucial to the work of the airport.

Phenomena that received extensive stud y includ ed w ork relevant seeing of

docum ents, airplanes and events (Goodw in and Goodw in 1996), the constitution

of shared workspaces (Suchman 1996), the stud y of how a common docum ent

coordinated different kinds of work in different w ork settings, and the p ractices

involved in seeing and shaping p henom ena in collaborative work (Suchman and

Trigg 1993, Bru n-Cottan 1991). In part because of the central role played by

visual phenom ena in the work being analyzed, the projects final report w as

submitted as a videotape (Brun-Cottan et al. 1991). Subsequent analysis growing

from this project has focused on th e organization of both d ocuments and visual

phenomena in a range of occupational settings, such as law firms and th e work

of architects. In England Christian Heath and his collaborators have investigated

the structuring of vision w ithin interaction in a ran ge of settings includ ing the

control room for the Lond on Und erground, centers for the produ ction of

electron ic news, ar t classes, etc. (Heath and Luff 1992, 1996, Heath and Nicholls

1997). In much of th is research there is a focus on h ow core practices for the


32/40

31

organization of talk, reference, gesture and other p henomena central to the

prod uction of action within hum an interaction can encompass not only talk but

also embod iment in a world p opu lated by work-relevant objects. Hinderm aash

and Heath (in p ress) investigate reference within su ch a framew ork. LeBaron

(1998), Streeck (1996) and LeBaron and Streeck (in p ress) examine how gesture

emerges from the interaction of working h and s acting in th e world in settings

such as architects meetings and auto body shops. Robinson (1998) has provided

analysis of participants in m edical interviews organize their interaction by

attending to how gaze is shifted from other participants to relevant visual

materials in the setting, such as medical records. Whalen (1995) analyzed how

the talk of operators respond ing to emergency 911 calls was organized in part by

the task of filling in requ ired information on a computer screen w ith a sp ecific

visual organization. Rogers Hall and Reed Stevens have investigated visual

practices in a ran ge of school, scientific and occup ational settings (Hall and

Stevens 1995; Stevens an d H all in p ress). Research in Compu ter Supp orted

Cooperative Work h as focused sp ecifically on new form s of visual access created

by electronic media. Heath (in p ress) and Heath and Luff (1993) have d one

considerable research on interaction mediated th rough video, demonstrating the

crucial ways in w hich resources available to parties who are actually co-present

to each other are not available in media spaces. Yamazaki and his

colleagues,have explored the systematic problems that arise when p articular

kinds of d irectives, such as instructions for how to use CPR to start a h eart attack

victims heart, are given through talk alone, for example over the p hone, without

access to a relevant visual environment. Patients usually die, since the novice is

not able to place his or her hand s at the appropriate spot on the patients body.

To remedy som e of these issues technologies that incorporate basic resources

available for doing reference in face-to-face interaction, such as p ointing, have

been developed . These include a remote controlled car w ith a laser that has the

ability to move while clearly marking the specific places being pointed at in a

remote environ ment (Yamazaki et al. 1999). Nishizaka (in press) has investigated

how par ticipants coordinate gaze both sp atially and temporally on electronic

docum ents such as compu ter screens. Kawatoko (in p ress) has investigated how

lathe workers organize p erceptu al fields so to make visible the invisible

p. 178


33/40

32

movements of their cutting tools.. Both Kwatoko and Ueno (in press) have

examined the organization of vision on m any d ifferent levels (from docum ents to

systematic placement of objects on the warehouse floor as part of its work flow)

in the w ork p ractices of large warehouse. In all of this work p ractices for seeing

relevant p henomena are systematically embedd ed within processes of social

organization, structures of mu tual accoun tability and the organ ization of activity.

Conclusion

Within both Conversation Analysis and Ethnomethod ology visual phenom ena

have been analyzed by investigating how th ey are made m eaningful by being

embedd ed w ithin the practices that p articipants in a var iety of settings use to

construct the events and actions that m ake up their lifeworld. This has led to the

detailed stud y of a range of quite different kinds of phenom ena, from th einterplay between gaze, restarts and gramm ar in the building of utterances

within conversation, to the construction and use of visual representations in

scientific practice, to how the ability of lawyers to shap e wh at can be seen in the

videotape of policemen beating a suspect can contribute to d isruption of the

body politic that leaves a city in flames, to the part played by visual practices in

both trad itional and electronic workp laces. Visual phenomena that h ave received

particular atten tion include 1) the body as a visible locus for disp lays of

intentional orientation throu gh both gaze and postu re; 2) the body as a locus for

a variety of different kinds of gesture, from iconic elaboration on w hat is being

said in the stream of speech, to pointing, to the hand as an agent engaged with

the world arou nd it; 3) visual documents of man y different types used in both

scientific practice and the w orkp lace, e.g., maps, grap hs, Mun sell charts, coding

forms, schedules, television screens providing access to distant sites,

architectural d rawings, comp uter screens, etc. 4) material structure in th e

environment where action and interaction are situated. This perspective brings

together within a common analytic framework both the d etails of how the visible

body is used to build talk and action in moment to moment interaction, and the

way in w hich h istorically structured v isual images and features of a setting

participate in that process. Rather than standing alone as self-contained, self-

explicating images, visual phenom ena become meaningful throu gh the way in

p. 179


34/40

33

wh ich th ey help elaborate, and are elaborated by, a range of other semiotic fields

sequential organization, structure in the stream of speech, encompassing

activities, etc. that are being used by par ticipants to both construct and make

visible to each other relevant actions. The focus of analysis is always on how the

participan ts in a setting themselves display a consequential orientation to v isual

phenomena (e.g., by shifting gaze after a restart, focusing th eir work on a

Mun sell chart, building images as a core comp onent of the p ractices used to

make visible scientific phen omen a, etc.). A var iety of different m ethod ologies are

employed. How ever a basic comp onent of many research p rojects includ es going

to the site where th e activities being investigated are actually performed, an d

examining wh at the p articipants are doing th ere as carefully as possible.

Videotap e records are frequently m ost useful because of the way in which they

preserve limited but crucial aspects of the spatial and environmental features of a

setting, the tem poral u nfolding organization of talk, the visible displays of

participants bodies, and changes in relevant phenomena in the setting as

relevant courses of action unfold. Analysis typically requires not only viewing

the tape, ethnograph ic records and docum ents collected in a setting, but also the

construction of new visual representations such as tran scripts of many d ifferent

types (note how some in this pap er incorporate both d etailed transcription of the

talk and a var iety of different kind s of graph ic representations). While this

analysis sheds mu ch important new light on how visual ph enomena are

organized throu gh systematic discursive practice, it is not restricted to vision per

se but is instead investigating the m ore general practices used to build action

within situated human interaction.

References Cited

Brun-Cottan, Franoise

1991 Talk in the Work Place: Occupational Relevance.Research on Language

and Social Interaction 24:277-295.

Brun-Cottan, Franoise, et al.

1991 The Workplace Project: Designing for Diversity and Change. Video

produced by Xerox Palo Alto Research Center.

Chomsky, Noam

1965 Aspects of the Theory of Syntax. Cambr idge, Mass.: MIT Press.

p. 180


35/40

34

Goodw in, Charles

1979 The Interactive Construction of a Sentence in N atural Conversation. In

Everyday Language: Studies in Ethnomethodology. George Psathas, ed. Pp .

97-121. New York: Irvington Publishers.

Goodw in, Charles

1980a Restarts, Pauses, and the Achievement of Mutual Gaze at Turn-

Beginning. Sociological Inquiry 50:272-302.

Goodw in, Charles

1981 Conversational Organization: Interaction Between Speakers and Hearers.

New York: Academic Press.

Goodw in, Charles

1984 Notes on Story Structure and the Organization of Participation. In

Structures of Social Action. Max Atkinson and John H eritage, eds. Pp.

225-246. Cambridge: Cambrid ge University Press.

Goodw in, Charles

1986 Gesture as a Resource for the Organization of Mutual Orientation.

Semiotica 62(1/ 2):29-49.

Goodw in, Charles

1994 Professional Vision .American A nthropologist96(3):606-633.

Goodw in, Charles

1995 Seeing in Dep th. Social Studies of Science 25:237-274.

Goodw in, Charles

1996 Practices of Color Classification.Ninchi Kagaku (Cognitive Studies:

Bulletin of the Japanese Cognitive Science Society) 3(2):62-82.

Goodw in, Charles

in preparation Pointing as Situated Practice. In Pointing: Where Language,

Culture and Cognition Meet. Sotaro Kita, ed.

Goodw in, Charles

in press Action and Embodiment Within Situated H um an Interaction.Journal of

Pragmatics .

Goodw in, Charles, and Marjorie Harness Goodw in

1987 Concurrent Operations on Talk: Notes on the Interactive Organization

of Assessments.IPrA Papers in Pragmatics 1, N o.1:1-52.


36/40

35

Goodw in, Charles, and Marjorie Harness Goodw in

1996 Seeing as a Situated Activity: Formulating Planes. In Cognition and

Communication at Work. Yrj Engestrm an d David Middleton, eds. Pp.

61-95. Cambridge: Cambridge University Press.

Goodw in, Marjorie Harness

1980b Processes of Mutual Monitoring Implicated in the Production of

Description Sequences. Sociological Inquiry 50:303-317.

Goodw in, Marjorie Harness, and Char les Goodw in

1986 Gesture and Coparticipation in the Activity of Searching for a Word.

Semiotica 62(1/ 2):51-75.

Hall, Rogers, and Reed Stevens

1995 Making Space: A Comparison of Mathematical Work in School and

Professional Design Practices. Sociological Review :118-275.

Haviland , John B.

1993 Anchoring, iconicity, and orientation in Guugu Yimidhirr pointing

gestures.Journal of Linguistic Anthropology 3(1):3-45.

Heath , Christian

1986 Body Movement and Speech in M edical Interaction. Cambridge:

Cambridge University Press.

Heath , Christian

in press Virtual Looking: Spatial Transformation and Comm un icative

Asymmetries. In Proceedings of the Colloquiem on the Semiotics of Space. P.

Pelligrino, ed . Geneva: University of Geneva.

Heath , Christian, and Pau l Luff

1993 Disembodied Condu ct:Interactional Asymmetries in Video-Mediated

Commun iation. In Technology in Working Order. Graham Button, ed.

Pp. 35-54. London and N ew York: Routledge.

Heath , Christian, and Pau l Luff

1996 Convergent Activities: Line Control and Passenger Information on the

London Un derground . In Cognition and Communication at Work. Yrj

Engestrm and David Midd leton, eds. Pp . 96-129. Cambridge:

Cambridge University Press.

Heath , Christian, and Gillian Nicholls

p. 181


37/40

36

1997 Animating Texts: Selective Readings of News Stories. InDiscourse,

Tools and Reasonsing: Essays on Situated Cognition . Lauren B. Resnick,

Roger Slj, Clotilde Pontecorvo, and Barbara Burge, eds. Pp . 63-86.

Berlin, Heidelberg, New York: Spr inger.

Heath , Christian C., and Paul K. Luff

1992 Crisis and Control: Collaborative Work in London Underground

Control Rooms.Journal of Computer Supported Cooperative Work1(1):24-

48.

Hind marsh, Jon, and Christian Heath

in p ress The Interactional Practice of Reference. Journ al of Pragmatics

Hutchins, Edwin

1995 Cognition in the Wild. Cambridge MA: MIT Press.

Kawatoko, Yasuko

in press Organizaing Multiple Vision.Mind, Culture and Activity .

Kend on, Adam

1990a Conducting Interaction: Patterns

Documents

Handbook of Visual Analysis an Ethnomethodological Approach