Upload
tanya-vdovina
View
221
Download
0
Embed Size (px)
Citation preview
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
1/40
Practices of Seeing:Visual Analysis : An Ethnomethodological Approach
Charles Goodw inApp lied Lingu istics
UCLAcgoodw in@hu mn et.ucla.edu
Pp . 157-182 inHandbook of Visual Analysis
edited by Theo van Leeuw en and Carey JewittLondon : Sage Pu blications
2000 Charles Goodw in
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
2/40
Practices of SeeingVisual Analysis : An Ethnomethodological Approach
Charles Goodw in
A p rimord ial site for the an alysis of hum an langu age, cognition an d action
consists of a situation in which mu ltiple participan ts are attempting to carry out
courses of action together while attending to each other, the larger activities that
their cur rent actions are embedded within, and relevant phenomena in their
surrou nd . Vision can be central to this p rocess.1. The visible bodies of
participan ts provide systematic, changing d isplays about relevant action an d
orientation. Seeable structure in the environment can not only constitute a locus
for shared visual attention, but can also contribute crucial semiotic resources for
the organization of current action (consider for example the u se of graphs an d
charts in a scientific d iscussion). For the past thirty years both Conversation
Analysis and Ethnom ethodology have p rovided extensive analysis of how
hu man vision is socially organized . Both fields investigate the p ractices that
participan ts use to build and shape in concert with each other the stru ctured
events that constitute the lifeworld of a comm un ity of actors. Phenomena
investigated in which vision plays a central role range from sequ ences of talk, to
medical and legal encounters, to scientific knowledge.
The approach taken by both ethnomethodology and conversation analysis to
the study of visual p henom ena is qu ite distinctive. At least since Saussure
proposed studying langue as an an alytically distinct subfield of a more
encompassing science of signs, different kinds of semiotic phenomena (language,
visual signs, etc.) have typ ically been analyzed in isolation from each other.
How ever in the work to be d escribed here neither vision, nor the images or other
phenomena that participan ts look at, are treated as coherent, self-contained
1 Vision is not, however, essential as both the comp etence of the blind and
teleph one conversations d emonstrate. Below it will be argued that situated
action is accomplished through the juxtaposition of multiple semiotic fields,
only some of wh ich m ake vision relevant.
nal Printed page mbers in margin.
p. 157
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
3/40
2
dom ains that can be su bjected to analysis in th eir own terms. Instead it quickly
becomes app arent that visual ph enomena can only be investigated by taking into
account a diverse set of semiotic resources and meaning-making practices that
participan ts dep loy to build th e social worlds that they inhabit and constitute
through on going processes of action. Many of these, such as stru cture provided
by current talk, are not in any sense visual, but the visible phenomena that the
participan ts are attend ing to cannot be prop erly analyzed w ithout them. The
focus of analysis is not thu s not rep resentations or v ision p er se, but instead the
part p layed by visual phenomena in the prod uction of meaningful action.
Both the m ethodology and the forms of analysis used in this app roach can
best be demonstrated throu gh sp ecific examples.
Gaze between Speakers and HearersIn formu lating the d istinction between comp etence and performance Chomsky
(1965: 3-4) argu ed that actual speech is so full of perform ance errors, such as
sentence fragments, restarts and pau ses, that both lingu ists and p arties faced
with the task of acquiring a langu age should ignore it. Investigating a corpus of
conversation recorded on video Goodwin (1980a, 1981, Chapter 2) indeed found
precisely the false starts and changes of plan in m id-course that Chomsky
describes. In the following instead of prod ucing an u nbroken gramm atical
sentence the speaker says:2
2 Talk is transcribed using the system d eveloped by Gail Jefferson (see Sacks,
Schegloff and Jefferson Sacks, et al. 19 74: 731-733). Talk receiving som e formof emphasis is marked with und erlining or bold italics. Punctuation is used to
transcribe intonation: A period indicates falling pitch, a question mark rising
pitch, and a comm a a falling contour, as wou ld be found for examp le after a
non-terminal item in a list. A colon indicates lengthening of the current sound .
A d ash marks the sud den cut-off of the current sound (in English it is
frequen tly realized as glottal stop). Comm ents (e.g., descriptions of relevant
nonvocal behavior) are pr inted in italics within dou ble parentheses. Num bers
within single parentheses mark silences in second s and tenths of a second . A
degree sign () ind icates that th e talk that follows is being spoken with low
p. 158
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
4/40
3
Cathy: En a couple of girls- One other girl from there,
How ever, when the video is examined it is foun d th at the restart occurs at a
specific place: precisely at the point w here the speaker br ings her gaze to her
add ressee, and finds that her add ressee is looking elsewhere:
Pam: En a coup le of girls- One other girl from the:re,
Speaker BringsGaze to
Recipient
Restart
Hearer LookingAwayAnn:
Hearer StartsMoving Gazeto Speaker
Gaze Arrives
Moreover, the restart acts as a request for the Hearers gaze. Thu s immed iately
after the restart the h earer starts to move her gaze to the speaker.
Paradoxically, if the speaker had not p rodu ced a restart at this point she
could h ave said something that w ould ap pear to be an unbroken gramm atical
unit if one examined only the stream of speech (e.g., En a coup le of girls from
there ), but which would in fact be interactively a sentence fragment since her
add ressee attended to only part of it.
The identities of speaker an d hearer are th e most generic participan t
categories relevant to th e produ ction of a strip of talk. The p henomena examined
here (which occur pervasively in conversation) provide evidence that the w ork of
volume. Left brackets connecting talk by d ifferent sp eakers mark the p oint
where overlap begins.
p. 159
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
5/40
4
being a hearer in face-to-face interaction requires situated use of the body, and
gaze in particular, as a w ay of visibly displaying to others th e focus of ones
orientation. Moreover speakers not on ly use their own gaze to see relevant action
in the bod y of a silent hearer, bu t actively change the structure of their emerging
talk in term s of what th ey see.
What relevance do p rocesses such as th ese have to the oth er issue raised by
Chom sky (1965 :3), that of d etermining from the data of performance the
underlying system of rules mastered by the speaker-hearer? Many repairs
involve the rep etition, with some significant change, of something said elsewh ere
in the u tterance:
We wen t- I went to
If he could- If you could
Such rep etition has the effect of delineating th e boun dar ies and stru cture of
many d ifferent units in the stream of speech. Thu s, by analyzing w hat is the
same an d wh at is d ifferent in these examples one is able to d iscover: First, where
the stream of speech can be d ivided into significant subu nits; second , that
alternatives are p ossible in a p articular slot; third, wh at some of these
alternatives are (here d ifferent p ronouns); and fourth, that these alternatives
contrast w ith each other in some significant fashion, or else the rep air wou ld not
be war ranted . Repairs in other examples not only d elineate basic un its in the
stream of speech (noun p hrases for examp le), but also demonstrate the different
forms su ch units can take, and th e types of operations that can be performed
up on th em (see Goodwin 1981 :170-173). Repairs further requ ire that a listener
learn to recognize that not all of the sequences within the stream of speech are
possible sequences within the language, e.g., that I does not follow to in We
went t- I went to . In ord er to deal with such a repair a hearer is thus required
to make one of the most basic distinctions posed for anyone attemp ting to
decipher the stru cture of a langu age: to differentiate what are and are not
possible sequences in th e language, that is between gram matical and
un gramm atical structures. The fact that this task is posed m ay be crucial to any
learning process. If the party attemp ting to learn the language d id not have to
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
6/40
5
deal w ith ungram matical possibilities, if for example she was exposed to only
well-formed sentences, she might not have the data necessary to d etermine the
boundar ies, or even the structure of the system. Chomskys argument that the
repairs found in natu ral speech so flaw it that a child is faced with data of very
degenerate qu ality does not app ear warranted. Rather it might be argued that
if a child g rew u p in an ideal world w here she heard only well-formed sentences
she would not learn to p rodu ce sentences herself because she w ould lack the
analysis of their structure p rovided by events such as repair. Crucial to this
process is the way in wh ich visual phenomena , such as dispreferred gaze states
can both lead to repair, and d emonstrate that the participan ts are in fact
attending in fine detail to wh at might ap pear to be quite ephem eral structure in
the stream of speech.
What h as just been d escribed p rovides one example of the method ologies and
forms of analysis used to investigate visual ph enomena within Conversation
Analysis. Several observa tions can be m ade. First, the focus of analysis is not
visual events in isolation, but instead the systematic practices used by
participants in interaction to achieve courses of collaborative action with each
other in the p resent case the interactive construction of turns at talk, and the
utterances that emerge w ith those turn s. Visual events, such as gaze, play a
central role in th is process but their sense and relevance is established th rough
their embedd edness in other mean ing making tasks and practices, such as the
prod uction of a strip of talk that is in fact heard and attended to by its add ressee.
This links vision to a h ost of other phenom ena includ ing language an d the visible
body as an unfolding locus for the display of meaning and action. Second , what
the analyst seeks to do is not to provide his or her ow n gloss on how v isual
phenomena m ight be m eaningful, but instead to demonstrate how the
participan ts themselves not on ly actively orient to par ticular kinds of visual
events (such as states of gaze), bu t use them as a constitu tive featu re of the
activities they are engaged in (for example by modifying th eir talk in terms of
wh at they demonstrably see). Third, in addition to the spatial dimension that is
natu rally associated w ith vision, these p rocesses also have an intrinsic temporal
dimension as changes in visual events are marked by, and lead to, ongoing
changes in the organ ization of emerging action. If one had only a static snap shot,
p. 160
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
7/40
6
or measured only a single structural possibility, such as mutu al gaze instead of
looking a t the temp orally u nfolding interplay of different combinations of
participan t gaze, the type of analysis being pursued here would be impossible.
Fourth, such analysis requires data of a particular type, specifically a record that
maintains as much information as possible about the setting, embod ied displays
and spatial organization of all relevant participan ts, their talk, and h ow events
change throu gh time. In p ractice no record is comp letely adequ ate. Every camera
position exclud es other views of what is hap pening. The choice of where to p lace
the camera is but the first in a long series of crucial analytical decisions. Despite
these limitations a video or film record d oes constitute a relevant d ata source,
something that can be worked w ith in an imperfect world .
Fifth, crucial problems oftranscription are posed. The task of translating the
situated, embod ied p ractices used by p articipants in interaction to organize
phenomena relevant to vision p oses enorm ous theoretical and m ethodological
problems. Our ability to transcribe talk is built up on a process of analyzing
relevant stru cture in the stream of speech, and marking those distinctions with
wr itten symbols, that extends back thousand s of years, and is still being mod ified
today (for examp le the system developed by Gail Jefferson (Sacks, Schegloff and
Jefferson 1974: 731-733) for transcribing the texture of talk-in-interaction,
includ ing phenom ena, such as momentary restarts and sou nd stretches, that are
crucial for the analysis being reported here). When it comes to the transcription
of visual ph enomena we are at the very beginning of such a p rocess. The arrows
and other symbols Ive used to mark gaze on a transcript (see Goodw in 1981)
capture on ly a small part of a larger comp lex constituted by bod ies interacting
together in a relevant setting. The decision to describe gaze in terms of the
speaker-hearer framework is itself a major analytic one, and by no means simp le,
neutral description. Moreover a gazing head is embed ded within a larger
postu ral configuration, and indeed different par ts of the body can
simultaneously d isplay orientation to different p articipants or regions (see
Kend on 1990b, Schegloff 1998), creat ing participation frameworks of
considerable complexity. Thus on occasion a transcriber w ants some way of
indicating on the p rinted page posture and alignment. In ad dition, not only the
bodies of the participants, but also phenomena in their surrou nd , can be crucial
p. 161
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
8/40
7
to the organization of their action. To try to m ake the ph enomena Im analyzing
independently accessible to the reader so that she or he can evaluate m y analysis,
Ive experimented w ith using tran scription symbols, frame grabs, diagrams, and
movies embedded in electronic versions of papers. Multiple issues are involved
and no method is entirely successful. On the one han d th e analyst needs
materials that maintain as mu ch of the original structure of the events being
analyzed as p ossible, and wh ich can be easily and repetitively replayed. On the
other hand, just as a raw tape recording d oes not display the analysis of
segmental structure in the stream of speech p rovided by transcription with a
phonetic or alph abetic writing system, in itself a video, even one that can be
embedd ed w ithin a paper, does not provide an analysis of how visible events are
being parsed by p articipants. The comp lexity of the phenom ena involved
requires multiple methods for rendering relevant d istinctions (e.g., accurate
transcription of speech, gaze notation, fram e grabs, diagrams, etc., see also Ochs
1979). Moreover , like the two-faced Roman god Janus, any tran scription system
mu st attend simultaneously to two separate fields, looking in one direction at
how to accurately recover throu gh a systematic notation the end ogenous
structure of the events being investigated, while simultaneously keeping an other
eye on the add ressee/ reader of the analysis by attemp ting to present relevant
descript ions as clearly and vividly as possible. In man y cases d ifferent stages of
analysis and presentation w ill require m ultiple transcriptions. There is a
recursive interplay between analysis and method s of description.
Work in Conversation Analysis has provid ed extensive stud y of how the gaze
of participants tow ard each oth er is consequen tial for the organization of action
within talk-in-interaction. Phenomena investigated include th e way in w hich
speakers change the structure of an emerging utterance, and the sentence being
constructed within it, as gaze is moved from one typ e of recipient to another, so
that the u tterance maintains its app ropriateness for its addressee of the mom ent
(Goodw in 1979, 1981); how speakers modify d escript ions in term s of their
hearers visible assessment of wh at is being said (M.H. Goodwin 1980b); how
genres such as stories are constructed not by a speaker alone, but instead
through the d ifferentiated visible d isplays of a range of structurally d ifferent
kind s of recipients (speaker, pr imary addressee, pr incipal character, etc. See
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
9/40
8
Goodwin 1984); the organization of gaze and co-participation in medical
encoun ters (Heath 1986, Robinson 1998); the interactive organization of
assessments (Goodw in and Goodw in 1987), gesture (Goodw in in press, Streeck
1993, 1994), the u se of gaze in activities such as w ord searches (M.H. Good win
and C. Goodw in 1986), etc. Though not strictly lodged within Conversation
Analysis the w ork of Kendon (1990a. 1994, 1997) on both the interactive
organ ization of bodies as they frame states of talk, and on gesture, is central to
the stu dy of visible behavior in interaction. Haviland (1993) provides imp ortant
analysis of the interactive organization of gesture within narration (for extensive
analysis of gesture from a p sychological perspective see McNeill 1992).
Scientific Images
The visible, gazing bod y, and the orientation of participants tow ard eachother as they co-prod uce states of talk is central to the w ork in
ConversationalAnalysis just examined . By w ay of contrast mu ch work w ithin
Ethnomethod ology has focused not on th e bodies of actors, but instead on the
images, diagrams, graph s and other visual p ractices used by scientists to
construct the crucial visual working environm ents of their d isciplines. As noted
by Lynch and Woolgar (1990:5):
Manifestly, what scientists laboriously piece together, pick up in
their hand s, measure, show to one another, argue about, and
circulate to others in th eir comm un ities are not natural objects
indep enden t of cultural processes and literary forms. They are
extracts, tissue cultures, and residu es imp ressed w ithin grap hic
matrices; ordered, shaped, and filtered samples; carefully aligned
ph otograph ic traces and chart record ings; and verbal accoun ts.
These are the p roximal things" taken into th e laboratory and
circulated in p rint and they are a rich repository ofsocial
actions.
Despite importan t d ifferences in su bject matter and method ology both fields
emphasize the importan ce of focusing not on representations or other v isual
phenomena as self-contained en tities in their own right, but instead on how they
are constructed, attend ed to, and used by par ticipants as componen ts of the
p. 162
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
10/40
9
endogenous activities that make u p the lifeworld of a setting. Thus, in
introdu cing their imp ortant volume onRepresentation in Scientific Practice Lynch
and Woolgar (1990: 11) define their inqu iry as follows:
Instead of asking what d o we m ean, in various contexts, by
representation? the stud ies begin by asking, What do the participants,
in this case, treat as representation?
Note that what m ust be investigated is specified both in term s of the orientation
of the pa rticipants, and with respect to the features of the relevant local setting
(e.g., in this case). This leads to a distinctive ethn omethod ological persp ective
on reflexivity :
Reflexivity in th is usage m eans, not self-referential nor reflective
awareness of representational practice, but the inseparability of a theory
of representation from the h eterogeneous social contexts in wh ich
representations are composed and used (Lynch and Woolgar 1990 12).
In a classic article Lynch (1990 :153-154) formulates the task of analyzing
scientific representa tions as that of describing the publicly visible externatized
retina that is the site for the p ractices implicated in the social constitution of the
objects that are th e focus of scientific work:
This study is based on the prem ise that visual displays are more
than a simple matter of supplying p ictorial illustrations for
scientific texts. They are essential to how scientific objects and
orderly relationships are revealed an d mad e analyzable. To
app reciate this, we first need to wrest the idea ofrepresentation from
an ind ividu alistic cognitive foundation, and to replace a
preoccup ation with images on th e retina (or alternatively mental
images or pictorial ideas) with a focus on the externalized retina
of the graphic and instrum ental fields u pon which the scientific
image is imp ressed and circulated.
Using as d ata images from scientific journal art icles and books Lynch describes
two families of practices used to constitu te the visible scientific object: selection
and mathematization. Selection, illustrated through dou ble images in wh ich a
photograph and a diagram of entities visible in the ph otograph are presented
side by side, is described as a host of practices that iteratively transform one
p. 163
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
11/40
10
image of an entity into another (e.g. the photograp h to the diagram) while
simultaneously structuring and shap ing what it is that is being represented.
Crucial to this process is that fact that d ifferent selective/ shaping practices,
including Filtering, Uniforming, Upgrading and Defining can be repetitively
app lied creating not just a single image, but a linked, d irectional chain of
representations Indeed mu ch of the w ork of actually d oing science consists in
building and shap ing what Latour (1986) (see also Latou r and Woolgar 1979)
have called inscriptions in this fashion. Mathematization refers not simp ly to
the use of nu mbers, but instead to the host of practices used to tr ansform
recalcitrant events into mathematically tractable visual and graphic displays e.g.,
graph s, charts and d iagrams. Thus an image showing a map of lizard territories
is assembled throu gh, among oth er operations, driving stakes into the lizard s
environment to create a grid for measurement (and thus injecting a scientifically
relevant Cartesian sp ace into the very h abitat being stud ied), repetitively
capturing lizards, distingu ishing th em from each other by cutting off a different
pattern of toes on each lizard, recording each capture on a pap er map of the
staked ou t territory, and finally draw ing lines around collections of points to
create the m ap . As noted by Lynch (1990: 171) the p rod uct of these p ractices, e.g.,
the published map , is a hybrid object that is demon strably mathem atical,
natural and literary. Note how in all of these cases the focus of analysis is on the
contextually based practices of the participan ts who are assembling an d using
these images to accomp lish the w ork that defines their p rofession.
Though em erging from psychological anthropology, rather than
ethnomethdology, Hutchins (1995) grou nd breaking stud y of the cognitive
practices required to nav igate a ship outlines a major p erspective for the an alysis
of both images and seeing as forms of work-relevant p ractice. Hu tchins
dem onstrates how the practices required to navigate a ship are not situated
within the mental life of a single individu al, but are instead em bedd ed w ithin a
distributed system that encomp asses visual tools such as maps and instruments
for juxtaposing a land mark and compass bearing within the same visual field,
and actors in stru cturally d ifferent p ositions wh o use alternative tools and, in
part because of this, perform d ifferent kind s of cognitive operations, many of
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
12/40
11
wh ich h ave a strong visual comp onent (e.g., locating land marks, p lotting
positions on a map, etc.).
Images in Interaction
All of the work discussed so far takes as its point of departure for theinvestigation of visual ph enomena the task of describing an d analyzing the
practices used by participan ts to construct the actions and events that m ake up
their lifeworld. Rather than stand ing alone as a self-contained analytic dom ain,
visual ph enomena are constituted and mad e meaningful through the way in
wh ich they are embedd ed w ithin this larger set of practices. However, within
this common focus, two quite d ifferent ord ers of visual p ractice have been
examined. Research in science studies has investigated the images prod uced by
scientists, and the way in w hich they visually and math ematically structure thewor ld that is the focus on their inquiry, without h owever looking in much detail
at how scientists attend to each other as living, meaningful bodies, or structure
wh at they are seeing through the organ ization of talk-in-interaction. By w ay of
contrast studies of the interactive organ ization of vision in conversation looked
in considerable detail at how participan ts treat the visual displays of each
others bodies as consequential, and how this is relevant to the moment-by-
moment production of talk, but d id not focus m uch analysis on images in the
environment. Clearly all of the phenomena noted the visible body,
participation, gesture, the d etails of talk and language u se, visual structure in the
surrou nd , images, maps and other representational practices, the pu blic
organ ization of visual practice within the worklife of a profession, etc. are
relevant. The question arises as to wh ether it is possible to analyze such d isparate
phenomena w ithin a coherent analytic framework.
Before turn ing to stud ies that have probed such questions several issues must
be noted . First, it is clearly not the case that the on ly acceptable analysis is one
that includes th is full range of all possible visual p henom ena. Both p articipants
and the structures that p rovide organization for action and events use visible
phenomena selectively. Parties speaking over the telephone can see neither either
others bodies nor events in a common surrou nd . A scientific journ al can be read
in the absence of the parties who constructed its text and d iagrams. More
p. 164
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
13/40
12
interestingly within face-to-face interaction participants can continuously shift
between actions that invoke, and p erhaps require, gaze toward specific events in
the surround , and those make relevant gaze toward no more than each others
bodies, and even in th is more limited case there may be a real issue as to whether
it is relevant to attend to everything that a body does, e.g., some gestures mad e
by a speaker may not requ ire gaze toward them from an ad dressee. There is thu s
an essential contingency, not only for the analyst bu t more crucially for the
participants themselves, as to what su bset of possible visual events are in fact
relevant to the organ ization of the actions of the mom ent. Moreover, this means
that in ad dition to investigating how different kinds of visible phenom ena are
organized, the analyst mu st also take into account how participan ts show each
other what kind s of events they are expected to take into account at a p articular
mom ent, for example to indicate that a participan t, gesture, or entity in th e
surrou nd should be gazed at. There is thus not only comm un ication through
vision, bu t also ongoing comm un ication abou t relevant vision (Goodw in 1981,
1986; in p rep aration, Streeck 1988).
Second visual events are quite heterogeneous, not only in wh at they m ake
visible, but more crucially in their structure. Consider for example the issue of
temporality. Both gestures and the d isplays of postu ral orientation used to bu ild
participation framew orks are performed by th e body w ithin interaction.
However, while gestures, like the bits of talk they accompany, are typically brief
(e.g. they frequently fall within the scope of a single utterance) and display
semantic content relevant to th e topic of the m oment, participation d isplays
frame extended strips of talk and typically provide information abou t the
participants orientation rather th an the specifics of wh at is being d iscussed.
Bodily displays with one kind of temp oral du ration (and information content)
are thus embed ded within another class of visual displays being mad e by the
body w hich have a quite different structure.
Third , the structure of visual signs, including their possibilities for
prop agation through space and time, can be intimately tied to the medium used
to construct them. A major theme of Shakespeares sonnets focuses on the
contrast between the temporally constrained hu man bod y, cond emned to
inevitable decay, and the (limited) possibilities for transcending such corruption
. 165
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
14/40
13
provid ed by language inscribed on the p rinted page which can remain fresh and
alive long after its author an d subject have passed into du st. This contrast
between th e temp oral possibilities provided by alternative media (e.g., the bod y
and docum ents) constitutes an ongoing resource for pa rticipants in vernacular
settings as they bu ild, throu gh interaction w ith each other, the events that make
up their lifewor ld. In ad dition to the d isplays made by a fleeting gesture or local
participation framework, participants also have access to images and docum ents
wh ich can encompass m ultiple interactions and quite d iverse settings. This arises
in par t from the specific media u sed to constitute the signs they contain. Rather
than being lodged within an ever changing hu man body, such d ocum ents
constitute w hat Latou r (1987: 223) has called imm utable mobiles, portable
material objects that can carry stable inscriptions of various types from place to
place and through time.
However, despite the way in w hich crucial aspects of the structure of images
and docum ents remain constant in d ifferent environm ents, they are not self-
contained visual artifacts that can be analyzed in isolation from the processes of
interaction and w ork practices through w hich they are mad e relevant and
meaningful. The same image or docum ent can be construed in quite different
ways in alternat ive settings. For example, a schedule listing all arriving and
dep arting flights was a m ajor tool for almost all workgroups at th e airport
stud ied by the Xerox PARC workplace project (Brun-Cottan et al. 1991, Goodwin
and Goodw in 1996, Suchman 1992), and indeed it linked diverse w orkers
throughou t North America into a common w eb of activity. However w hile
baggage loaders carefully structured their work to anticipate arriving flights, so
that p lanes could be speedily unloaded , these same arrival times were almost
ignored by gate agents looking at the same schedule, but concerned w ith the
dep arture of passengers. Each work group h ighlighted the comm on docum ent in
ways relevant to the specific work tasks it faced. Similarly, on the oceanographic
ship reported in Goodwin (1995) a map showing where samples would be taken
in the Atlantic at the mou th of the Amazon, was a major docum ent at all stages
of the research project. Before the ship sailed th e places where samples could be
taken w as the focus of intense political debate betw een d ifferent grou ps of
scientists and the Brazilian and American governm ents; after the project was
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
15/40
14
comp leted the map p rovided an infrastructure for graph ic displays that could be
used in pu blished journal articles to show wh at the scientists had found about
how the w aters of the Am azon and the Atlantic interacted w ith each other, i.e., a
way of making visible relevant scientific phenomena; during the voyage itself the
map not only provided a comm on framework for the quite different work of
various teams of scientists and the crew nav igating the sh ip, but could also be
looked at by lab technicians not able to go to bed for days at a time because of the
map s incessant samp ling d emand s, to locate places wh ere stations w ere far
apart and rest was p ossible. In brief, though the material form of images and
docum ents gives them an extended temporal scope, and the ability to travel from
setting to setting, they cann ot be analyzed as self-conta ined fields of visually
organized m eaning, but instead stand in a reflexive relationship to th e settings
and p rocesses of embodied hum an interaction through w hich they are
constituted as m eaningful entities. To explicate such events ana lysis must deal
simultaneously with th e quite different structure and temporal organization of
both local embodied p ractice and endu ring graph ic displays.
Finally, the visual (and other p roperties) of settings stru cture environments
that sh ape, on an historical time scale, the activities systematically performed
within th ose settings. A very simp le examp le is provided by the bridge of the
oceanographic ship which not only had a wind ow facing forward so the
helmsman could steer the ship and watch for trouble, but also a window facing
backward s. This was u sed by a w inch operator who had the task of lifting heavy
instrument packages in and out of the sea. Though being u sed here to d o science,
this arrangem ent is in fact a systematic solution to a repetitive problem faced by
sailors, such as fishermen using n ets, who have to m aneuver heavy objects while
a sea. Solutions found to these tasks, such as the rear facing w indow with the
visual access it provides (as well as the forward window facilitating n avigation),
are built into the tools that constitute the work environments u sed by subsequent
actors faced w ith similar tasks. See Hutchins (1995) for illum inating analysis of
this process, includ ing tools that visually structure complex mathematical
calculations, as well as map s. Both w ork environm ents and many of the tools
used within them (comp uter d isplays, etc.) structure in quite specific ways the
embodied visual p ractices of those who inhabit such settings.
p. 166
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
16/40
15
In an attemp t to come to terms w ith such issues Goodw in (in press) has
prop osed that images in interaction are lodged within end ogenous activity
systems constituted through th e ongoing, changing dep loyment of mu ltiple
semiotic fields which mutually elaborate each other. The term semiotic field is
intended to focus on signs-in-their-med ia, i.e., the way in w hich what is typ ically
been attended to are sign p henomena of various typ es (gestures, maps, displays
of bodily orientation, etc.) which have variable structural properties that arise in
part from the d ifferent kind s of materials used to make them visible (e.g., the
body, talk, documents, etc.). Bringing signs lodged within different fields into a
relationship of mutu al elaboration produ ces locally relevant meaning and action
that could not be accomp lished by one sign system alone. Consider for example a
place on a m ap ind icated by a p ointing finger wh ich is being construed in a
specific fashion by the accomp anying talk. Neither the map as a w hole, that is a
self-sufficient representation, nor the pointing finger in isolation from a) its
target (the spot on th e map ) and b) the construal being p rovided by the talk, nor
the talk alone w ould be sufficient to constitute the action m ade visible by the
conjoined use of the three semiotic fields, each of wh ich p rovid es resources for
specifying how to relevantly see and understand the others (see the brief
d iscussion of the Rodney King data below for a specific examp le; see Goodwin
in preparation for more detailed analysis of pointing). The particular subset of
semiotic fields available in a setting that p articipants orient to as relevant to the
construction of the actions of the m oment constitutes a contextual configuration.
As interaction u nfolds contextual configurations can change as new fields are
add ed to, or d ropp ed from, the specific mix being u sed to constitute the events of
the moment. Thus, as contextual configurations change th ere is both un folding
pu blic semiotic structure an d contingency(and indeed in some circum stances
actions can m isfire when ad dressees fail to take into accoun t a relevant semiotic
field, such as the sequential organization p rovided by a prior unheard utterance
see Goodw in in preparation for an examp le).
Professional Vision
Work settings provid e one environment in wh ich the interplay between situated,
embodied interaction, and the u se of visual images of different types, can be
p. 167
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
17/40
16
systematically investigated. In man y w ork settings par ticipants face the task of
classifying visual phenom ena in a w ay that is relevant to the work they are
charged w ith performing. Frequently they m ust also construct different kinds of
representations of visual stru cture in the environment that is the focus of their
professional scrut iny. We will now briefly examine how such vision is socially
organ ized in two tasks faced by archaeologists: 1) color classification and 2) Map
making, and then look at how such professional vision was both constructed and
contested in the tr ial of four policemen charged w ith beating an African
American motor ist, Mr. Rodney King. The key eviden ce at the trial was a
videotape of the beating.
Color Classification as Historically Structured Professional Practice
As par t of the work involved in excavating a site, archaeologists make m aps
showing relevant stru cture in the layers of dirt they uncover. In ad dition to
artifacts, such as stone tools, archaeologists are also interested infeatures,such the
remains of an old h earth or the ou tlines of the posts that held up a building. Such
features are typically visible as color d ifferences in th e dirt being examined (e.g.,
the remains of a cooking fire will be blacker than the su rroun ding soil, and the
holes used for p osts will also have a different color from the soil aroun d them).
Field archaeologists thu s face the task of systematically classifying the color of
the d irt they are excavating. The m ethods th ey use to accomp lish this taskconstitute a form of professional visual p ractice. As dem onstrated by the
discussion of Lynchs analysis of scientific representation, and the brief
description of the oceanograp hers, crucial work in m any different occup ations
takes the form of classifying and constructing visual p henomena in w ays that
help shap e the objects of knowledge th at are the focus of the w ork of a profession
(e.g., architects, sailors plotting courses on charts, air traffic controllers,
professors making graphs and overheads for talks and classes, etc.). Such
professional vision constitutes a persp icuou s site for systematic stud y of howdifferent kinds of phenomena intersect to organize a commu nitys practices of
seeing.
Goodwin (1996, in press) describes how archaeologists code the color of the
dirt they are excavating throu gh u se of a Munsell chart. The following show s two
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
18/40
17
archaeologists performing th is task, the Mu nsell page that they are u sing, and
the coding form where they will record their classification:
17 Pam: En this one. ((Points at color patch))
18 (0.4) ((Jeff moves trowel))
19 Jeff: nuhhh?20 (1.8)
21 Pam: Or that one? ((Points at color patch))
Within this scene are a nu mber of different kinds of phenomena relevant to
the organ ization of visual p ractice, includ ing tools that stru cture the process of
seeing an d classification, and docum ents that organize cognition an d interaction
in the current setting w hile linking these p rocesses to larger activities and other
settings. These archaeologists are intently examining the color of a tiny sam ple of
d irt because they have been given a coding form to fill out. That form t ies their
work at th is site to a range of other settings, such as the offices and lab of the
senior investigator, where the form being filled in here will eventually become
part of the perm anent record of the excavation, and a componen t of subsequent
analysis. The multivocality of this form, the way in which it displays on a single
. 168
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
19/40
18
surface the actions of multip le actors in structura lly d ifferent positions, is show n
visually in vivid fashion by th e contrast between the printed coding categories,
and the hand written entries of the field workers.
The use of a coding form su ch as this to organize the perception of natu re,
events, or people within the d iscourse of a p rofession carries with it an array of
perceptual and cognitive operations that have far reaching impact. Coding
schemes d istributed on forms allow a senior investigator to inscribe his or her
perceptual d istinctions into the w ork p ractices of the technicians w ho code the
data. By using such a system a w orker views the world from the perspective it
establishes. Of all the possible ways that the earth could be looked at, the
perceptual w ork of field workers using this form is focused on d etermining the
exact color of a minu te samp le of d irt. They engage in active cognitive work, but
the p arameters of that work have been established by the classification system
that is organ izing their perception. In so far as the coding scheme establishes an
orientation toward the world, a work-relevant w ay of seeing, it constitutes a
structure of intentionality whose proper locus is not the isolated, Cartesian mind ,
but a m uch larger organizational system, one that is characteristically m ediated
through mu nd ane bureaucratic docum ents such as this form.
Rather than stand ing alone as self-explicating textual objects, forms are
embedd ed w ithin webs of socially organized, situated practices. In ord er to make
an entry in the slot provided for coloran archaeologist mu st make u se of another
tool, the set of standard color samp les provided by a Mu nsell chart. This chart
incorporates into a portable physical object the resu lts of a long history of
scientific investigation of the p roperties of color.
The Mun sell chart being u sed by the archaeologists contains not just on e, but
three different kind s of sign systems for describing each point in the color space
it provides: 1) a set of carefully controlled color samples arranged in a grid to
dem onstrate the changes that result from systematic variation of the variables of
Hu e , Chroma and Value u sed to d efine each color (each p age displays an
ordered set of Value and Chroma variables for a single hue); 2) numeric
coordinates for each row and column , the intersection of what specifies each
square as a p air of nu mbers (e.g., 4/ 6 on the 10YR Hue page); and 3) standard
color names such as dark yellowish brown (these names are on the left facing
p. 169
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
20/40
19
page w hich is not reproduced here). Moreover these systems are not p recisely
equivalent to each other. For example several color squares can fall within the
scope of a single nam e.
Why d oes the Munsell page contain mu ltiple, overlapping representation of
wh at is app arently the same visual entity (e.g., a par ticular choice within a larger
set of color categories)? The answer seems to like in the way that each
representation as a semiotic field with its own distinctive prop erties makes
possible alternative operations and actions, and thus fits into d ifferent kind s of
activities. Both the names and nu mbered grid coordinates can be written, and
thus easily transported from the actual excavation to the other w ork sites, such as
laboratories and journals, that constitute archaeology as a profession. The
numbers p rovide the most precise description, and d o not require translation
from language to langu age. However locating the color indexed by the
coordinates requ ires that the classification be read with a Mu nsell book at hand .
By way of contrast the color names can be grasped in a way that is adequate for
most p ractical pu rposes by any competent sp eaker of the language u sed to write
the rep ort. The outcome of the activity of color classification initiated by the
empty square on th e coding form is thu s a set of portable lingu istic objects that
can easily be incorporated into the u nfolding chains of inscription th at lead step
by step from the d irt at the site to reports in the archaeological literature.
How ever, as arbitrary linguistic signs produ ced in a medium that d oes not
actually m ake visible color, neither the color names nor the num bers, allow d irect
visual comparison betw een a sam ple of dirt an d a reference color. This is
precisely what the color patches and viewing h oles make possible. In brief, rather
than simply specifying unique p oints in a larger color space, the Mun sell chart is
used in mu ltiple overlapp ing activities (comparing a reference color and a p atch
of dirt as par t of the work of classification, transpor ting those results back to the
labe, comparing sam ples, publishing reports, etc.), and thu s represents the
same entity, a particular color, in multiple ways, each of which makes p ossible
different kind s of operations because of the unique p roperties of each
representational system.
In addition to its various sign system s it also contains a set of circular holes,
positioned so that one is ad jacent to each color patch. To classify color the
p. 170
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
21/40
20
archaeologist puts a sm all samp le of dirt on the tip of a trowel, pu ts the trowel
directly un der th e Munsell page and then m oves it from hole to hole until the
best match w ith an adjacent color samp le is found . With elegant simplicity the
Mun sell page with its holes for viewing the sam ple of dirt on the trow el
juxtaposes in a single visual field two qu ite different kind s of spaces: 1) actual
dirt from the site at the archaeologists feet is framed by 2) a theoretical space for
the r igorous, replicable classification of color. The latter is both a conceptual
space, the prod uct of considerable research into p roperties of color, and an actual
physical space instantiated in the orderly m odification of variables arranged in a
grid on the Mu nsell page. The pages juxtaposing color p atches and viewing holes
that allow the d irt to be seen right n ext to the color sample provide an
historically constituted architecture for perception, one that encapsulates in a
material object theory an d solutions d eveloped by earlier w orkers at other sites
faced w ith the task of color classification. By juxtap osing unlike spaces, but on es
relevant to th e accomp lishment of a specific cognitive task, the char t creates a
new , distinctively hum an, kind of space. It is precisely here, as bits of d irt are
shaped into the work relevant categories of a specific social group, that nature
is transformed into culture.
How are the resources provided by the chart made visible and relevant
within talk-in-interaction? At line 17 Pam moves her han d to the space above the
Mun sell chart an d points to a p articular color patch wh ile saying En this one.
Within th e field of action created by the activity of color classification, what Pam
does here is not simply an indexical gesture, but a p roposal that the ind icated
color migh t be the one they are searching for. By virtue of such cond itional
relevance (Schegloff 1968) it creates a new context in w hich rep ly from Jeff is the
expected n ext action. In line 19 Jeff rejects the p roposed color. His move occurs
after a noticeable silence in line 18. However that silence is not an em pty space,
but a p lace occup ied by its own relevant activity. Before a comp etent answ er to
Pams prop osal in line 17 can be made, the dirt being evaluated h as to be placed
un der the viewing hole next to the samp le she indicated, so that the tw o can be
compared . During line 18 Jeff moves the trowel to this position. Because of the
spatial organization of this activity, specific actions have to be p erformed before
a relevant task, a color comparison, can be competently performed. In brief, in
p. 171
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
22/40
21
this activity the spatial organization of the tools being w orked w ith, and th e
sequential organization of talk in interaction interact with each other in th e
prod uction of relevant action (e.g. getting to a place where one m ake an expected
answer requires rearrangement of the visual field being scrutinized so that th e
jud gment being requested can be competently p erformed ). Here socially
organized vision requ ires embod ied manipulation of the environment being
scrutinized.
It is comm on to talk about stru ctures such as the Munsell chart as
representations. How ever exclusive focus on the representational prop erties of
such structures can seriously distort our u nd erstand ing of how such entities are
embedd ed w ithin the organization of human p ractice. With its viewholes for
scru tinizing sam ples, the page is not simp ly a perspicuous rep resentation of
current know ledge about th e organization of color, but a space designed for the
ongoing prod uction of particular kinds ofaction.
We will now look at how a group of archaeologists make a m ap. This process
will allow us to examine the interface between seeing, writing practices, talk,
human interaction and tool use (see Goodw in 1994 for more d etailed analysis).
Map Making and t he Pract ices of Seeing it Requires
Map s are central to archaeological practice. The p rofessional seeing required to
prod uce and m ake use of a visual document, such as a map, encompasses notonly the image itself but also the ability to competently see relevant structure in
the territory being m app ed, mastery of appropriate tools, and on occasion the
ability to an alyze the w ork-relevant actions of anothers body. These d ifferent
kinds of phenomena can be brough t together within the temporally unfolding
process of hu man interaction used to accomp lish the activity of making a map . In
the following, two archaeologists are making a map to record w hat they have
foun d in a p rofile of the d irt on the side of one of the square holes they have
excavated. Before actually setting pen to paper some relevant events in the d irt,such as the bound ary between two different kinds of soil, are highlighted by
outlining them w ith the tip of a trowel. The structure visible in the dirt is then
map ped on a sheet of graph pap er. Typically this task is don e by two
participan ts working together. One u ses a pair of rulers (one laid horizontally on
the surface, and the other a hand held tape measure used to m easure d epth
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
23/40
22
beneath the sur face) to measure the length an d d epth coordinates of the points in
the dirt that are to be transferred to the m ap, and then speaks these coord inates
as pairs of numbers (e.g., at fifteen th ree point tw o). The second p erson p lots
the points specified on th e graph pap er, and d raws lines between successive
measurements. What w e find here is a small activity system that encomp asses
talk, writing, tools and distributed cognition as tw o parties collaborate to inscribe
events they see in the earth onto p aper. Here Ann, the party draw ing the map, is
the senior Archaeologist at the site, and Sue, the person making measurem ents is
her Student:
p. 172
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
24/40
23
1 An n: Give m e th e grou nd su rface over here
2 to about ninety .
3 (1.6)4 Ann: No- No- Not at ninety.
5 From you to about ninety.
6 (1.0)
7 Sue: Oh.
8 Ann: Wherever there's a change in slope.
9 (0.6)
10 Sue: Mm kay.
11 Ann: See so if its fairly flat
12 I'll need one13 where it stops being fairly flat.
14 Sue: Okay.
15 Ann: Like right there.
Line DrawnWith Trowel
SurfaceTape
MeasureSueAnn
Ruler
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
25/40
24
The sequence to be examined begins with a d irective. Ann, the w riter, tells Sue
the measurer, to Give me the ground su rface over here to aboutninety.
How ever before Sue has p roduced any num bers, indeed before she has said
anything w hatsoever, Ann in lines 4 and 5 challenges her, telling her that what
she is doing is wrong: No- No- Notat ninety.From you t o about ninety.
Directives are a classic form of speech action that sociolingu ists have used to
probe the relationship between language and social structure, and in particular
issues of power and gender. Here Sue formats both her d irective and her
correction in very strong, d irect aggravated fashion. No forms of mitigation are
foun d in either utterance, and Ann is not given an opportun ity to find and
correct the trouble on h er own. Directives formatted in this fashion h ave
frequen tly been argu ed to d isplay a hierarchical relationship, i. e., Ann is treating
Sue as someone that she can give direct, unm itigated ord ers to. And indeed Ann
is a professor and Sue is her stud ent.
Issues of power do not how ever exhau st the social phenomena visible in th is
sequence. Equally imp ortant are a range ofcognitive processes that are as
socially organized as the relationships between the participants. For example, in
that Sue has not p rodu ced an answer to the directive, how can Ann see that there
is something w rong w ith a response that has not even occurred yet? Crucial to
this process is the ph enomenon ofconditional relevance first described by
Schegloff (1968). Basically a first ut terance creates an interp retive environm ent
that w ill be used to an alyze whatever occurs after it. Here no subsequ ent talk has
yet been produced. How ever, providing an answer in this activity system
encompasses more than talk. Before speaking the set of nu mbers that counts as a
prop er next bit of talk, Sue m ust first locate a relevant point in the d irt and
measure its coordinates. Both her movement throu gh space, and h er use of tools
such as a tape measu re, are visible events. As Ann finishes her directive Sue is
holding the tape m easure against the d irt at the left or zero end of the profile.
How ever, just after hearing ninety Sue m oves both her body and the tape
measure to right, stopping near the 90 mark on the up per ru ler. By virtue of
the field interpretation opened u p through cond itional relevance, Sues
movement and tool use can now be analyzed by Ann as elements of the activity
she has been asked to p erform, and foun d w anting. Sue has moved imm ediately
. 173
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
26/40
25
to ninety instead of measuring the relevant points between zero and ninety. The
sequential framew ork created by a directive in talk thu s provid es resources for
analyzing and evaluating the visible activity of an ad dressees body interacting
with a relevant environment.
Additional elements of the cognitive operations and kinds of seeing that Ann
requires from Sue in order to make her measurements are revealed as the
sequence continues to unfold. Making the relevant measurements p resupposes
the ability to locate where in the d irt measurements should be mad e. However
Sues response calls this presupp osition into question and leads to Ann telling
her explicitly, in several different w ays, what she shou ld look for in order to
determine where to measure. After Ann tells Sue to measure points between zero
and ninety, Sues does not imm ediately move to points in that r egion bu t instead
hesitates for a full second before replying w ith a w eakOh (line 7). Ann th en
tells her w hat she shou ld be looking for Wherever theres a change in slope
(line 8). This description of course presup poses Sues ability to find in the d irt
wh at will could as a change in slope. Sue again moves her tape measure far to
the right. At this p oint, instead of relying upon talk alone to make explicit the
phenomena that she w ants Sue to locate, Ann m oves into the sp ace that Sue is
attending to and points to one place that should be measured w hile describing
more explicitly w hat constitutes a change in slope: See so if itsfairlyflat Ill
need one where itstops being fairly flat like right th ere.
One of the th ings that is occurr ing within this sequence is a p rogressive
expansion of Sues und erstanding as the d istinctions she must make to carry out
the task assigned to her are explicated and elaborated. In th is process of
socialization th rough language th ere is a growth in intersubjectivity as d omains
of ignorance that p revent the successful accomplishm ent of collaborative action
are revealed and transformed into practical knowledge, a way of seeing, that is
sufficient to get the job at hand don e, such that Sue is finally able to un derstand
wh at Ann is asking h er to do (that is to see the scene in front of her in a manner
that p ermits her to m ake an ap prop riate, comp etent response to the d irective). It
wou ld how ever be wrong to see the un it within which this intersubjectivity is
lodged as simply these two mind s coming together in the work at han d. Instead
the d istinction being explicated, the ability to see in the very complex perceptual
p. 174
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
27/40
26
field provid ed by the landscape they are attending to, those few events that
coun t as points to be transferred to the m ap, are central to wh at it means to see
the world as an archaeologist, and to use that seeing to bu ild the artifacts, such as
this map , which are constitut ive of archaeology as a profession. Such seeing
wou ld be expected of any competent archaeologist. It is an essential part of what
it means to be an archaeologist, and it is these professional p ractices of seeing
that Sue is being held accountable to. The relevant unit for the analysis of the
intersu bjectivity at issue here the ability of separate ind ividu als to see a
comm on scene in a congru ent, work-relevant fashion is thus not these
individuals as isolated entities, but instead archaeology as a profession, a
comm un ity of comp etent practitioners, most of whom have never m et each
other, but w ho n onetheless expect each other to be able to see and categorize the
wor ld in w ays that are relevant to the w ork, scenes, tools and ar tifacts that
constitute their profession.
The ph enomena examined so far provide some demonstration of how what is
to be seen in a m ap, scene, hum an bod y or image stand s in a reflexive
relationship to other semiotic structures that p articipants are u sing to constitute
visual phenom ena as a relevant component of the events and activities that make
up their lifeworld . These structures includ e language, the constitution of action
and context provid ed by sequen tial organization, and ways of seeing events and
using images of d ifferent typ es that are lodged w ithin the p ractices of particular
social commu nities, such as the profession of archaeology.
Professional Vision in Court
Parties who are not competent m embers of relevant social comm unities can lack
the ability, and/ or the social positioning, to see and articulate visual events in a
consequential way. These issues were mad e d ramatically visible in the trial of
four Los Angeles policemen w ho were recorded on v ideotape ad ministering a
beating to an African American motorist, Mister Rodn ey King, whom they hadstopped after a high speed p ursu it triggered by a traffic violation. When the tape
of the beating was show n on n ational television there was outrage, and even the
head of the Los Angeles police d epartm ent thought that conviction of the officers
was almost autom atic. How ever, at their first trial (they w ere later tried again in
Federal ra ther than state cour t for violating Mister Kings civil rights) all fourp. 175
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
28/40
27
policemen w ere acquitted, a verdict that triggered an u prising in the city of Los
Angeles, with neighborhood s being burn ed, federal troops being called in, etc.
The crucial evidence at the trial was a visual d ocum ent: the videotape of the
beating. Rather than transparently proving the guilt of the policemen w ho were
seen on it beating a man lying p rone on the groun d, the tape in fact provided the
policemens lawyers w ith their evidence for convincing the jury that th eir clients
were not gu ilty of any w rongd oing. They did this by using langu age, pointing
and expert testimony to structure how the jury saw the events on the tape in a
way the exonerated the p olicemen. In essence they used the tap e of the beating to
dem onstrate that Mr. King w as the aggressor, not the p olicemen, and that the
policemen w ere following prop er police practice for subd uing a violent,
dan gerous su spect (see Goodwin 1994 for more d etailed analysis of such
professional vision). Crucial to their success was their use of another policeman,
Sargent Du ke, as an expert witness. It was argued that laymen could n ot
prop erly see the events on the tap e. Instead, the ability to legitimately see wh at
the body of a suspect was d oing, such as Mr. Kings as he lay on the ground
being beaten, and specifically whether th e suspect w as being aggressive or
comp liant, was lodged w ithin the work practices of the social group charged
with arresting susp ects: the p olice. The ability to see such a body, and code it in
terms of its aggressiveness, was a component of the professional practices that
policemen u se to code the events that are the focus of their w ork. It so far as such
vision is a pu blic comp onent of the work p ractices of a p articular social group ,
someone wh o wasnt present bu t who is a member of the profession, a
policeman, can m ake auth oritative statements abou t what can be legitimately
seen on th e tape. How ever, while policemen constitute a socially organized
profession, susp ects and victims of beatings d ont. Therefore there is no one with
the social stand ing, i.e., mem bership and mastery of the practices of a relevant
social group , to act as an expert w itness to articulate what w as happening from
Mister Kings p erspective.
What was to be seen on the tape was structured through the way in w hich
different semiotic fields, such as structure in the stream of speech, pointing
wh ich h ighlighted specific places and phenomena in the image being looked at,
and events in the image itself, mu tually elaborated each other to p rovide a
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
29/40
28
construal of events that served the pu rposes of the party articulating the image.
The following p rovides an examp le. At the p oint wh ere we enter th is sequence
the prosecutor has noted that Mr. King ap pears to be moving into a position
app ropriate for han dcuffing him, and that on e officer is in fact reaching for his
han dcuffs, i.e. the su spect is being cooperative.
1 Prosecu tor: So uh would you ,
2 again consider this to be:
3 a nonagressive, movement by Mr. King?
4 Sgt. Duke: At this t ime no I wouldn't. (1.1)
5 Prosecutor : It is aggressive.
6 Sgt. Duke: Yes. It 's starting to be. (0.9)
7 This foot, is laying flat, (0.8)8 There's starting to be a bend. in uh (0.6)
9 this leg (0.4)
10 in his butt (0.4)
11 The buttocks area has started to rise. (0.7)
12 which would put us,
13 at the beginning of ourspectrum again.
indicates that Sgt. Duke is pointing on the screenat the body part described in his talk.
By noting the submissive elements in Mr. Kings posture, and the fact that one of
the officers is reaching for his handcuffs the prosecutor has shown that th e tape
demon strates that Mr. King is being cooperative. If he can establish this point
hitting Mr. King again would be un justified, and the officers should be found
guilty of the crimes they are charged with. The contested vision being d ebated
here has very h igh stakes.
To rebut the vision p roposed by the p rosecutor, Sgt. Duke uses the seman tic
resources provided by language to code as aggressive extremely subtle body
movements of a man lying face down beneath the officers (lines 7-11). Note for
examp le not only line 13s explicit placement of Mr. King at the very edge, the
beginning, of an aggressive spectrum introdu ced in earlier testimony, but also
p. 176
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
30/40
29
how very small movements are made mu ch larger by situating them w ithin a
prosp ective horizon through the repeated u se ofstarting to (lines 6, 8, 13). The
events visible on the tape are structured, enhanced and amp lified by the
language used to describe them.
This focusing of attention organizes the p erceptu al field p rovided by the
videotape into a salient figure, the aggressive susp ect, wh o is highlighted against
an am orphous backgroun d containing nonfocal participan ts, the officers doing
the beating. Such stru cturing of the materials prov ided by the image is
accomp lished n ot only through talk, but also through gesture. As Sergeant Duke
speaks he brings his hand to the screen and points to the par ts of Mr. Kings
body that he is arguing d isplay aggression. The pointing gesture and the
perceptual field wh ich it is articulating m utu ally elaborate each other. The
touchable events on the television screen provide visible evidence for the
description constructed th rough talk. What emerges from Sgt. Dukes testimony
is not just a statement, a static category, but a demonstration built through the
active interplay between coding scheme and the image to w hich it is being
app lied. As talk and image mu tually enhance each other a demonstration that is
greater than th e sum of its parts em erges, while simultaneously Mr. King, rather
than the p olice officers becomes the focus of attention as the experts finger
articulating the image delineates what is relevant w ithin it.
By virtue of the category system erected by the defense, the minu te rise in Mr.
Kings buttocks noted on the tape u nleashes a cascade of perceptual inferences
that have the effect of exonerating the offers. A rise in Mr . Kings body becomes
interp reted as aggression, which in tu rn justifies the escalation of force. Like
other p arties, such as th e archaeologists, faced with the task of coding a visual
scene, the jury was led to engage in intense, minute cognitive scrutiny as they
looked at the tap e of the beating to decide the issues at stake in the case.
How ever, once the d efense coding scheme is accepted as a relevant framework
for looking at the tap e, the operative perspective for viewing it is no longer a
laypersons reaction to a man lying on the grou nd being beaten, but instead a
micro-analysis of the movements being m ade by that m ans bod y to see if it is
exhibiting aggression.
p. 177
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
31/40
30
In the first trial, though the p rosecution dispu ted the an alysis of specific body
movements as d isplays of aggression, the relevance of looking at th e tape in
terms of such a category system w as not challenged . A key d ifference in the
second tr ial, which led to the conviction of two of the officers, was th at there the
prosecution gave the jury alternative frameworks for interp reting the events on
the tap e. These included ways of seeing the m ovements of Mr. Kings body that
Sgt. Duke highlighted as normal reactions of a man to a beating rather than as
displays of incipient aggression. In the prosecutions argum ent Mr. King cocks
his leg, not in p reparation for a charge, but because h is muscles natu rally jerk
after being hit w ith a m etal club.
The stud y of the p ractices used to structure relevant vision in scientific and
workplace environments, what H utchins (1995) has called Cognition in the Wild,
has become the focus of considerable research. A major initiative for such studies
was provided by Lucy Scuhman in th e early 1990s when she initiated the
Workp lace project while at Xerox PARC. The site chosen for research w as
groun d op erations at a mid-sized airport. Docum ents and images of many
different types, and the ability of actors in alternative structural positions to see
and analyze events in relevant w ays, were crucial to the work of the airport.
Phenomena that received extensive stud y includ ed w ork relevant seeing of
docum ents, airplanes and events (Goodw in and Goodw in 1996), the constitution
of shared workspaces (Suchman 1996), the stud y of how a common docum ent
coordinated different kinds of work in different w ork settings, and the p ractices
involved in seeing and shaping p henom ena in collaborative work (Suchman and
Trigg 1993, Bru n-Cottan 1991). In part because of the central role played by
visual phenom ena in the work being analyzed, the projects final report w as
submitted as a videotape (Brun-Cottan et al. 1991). Subsequent analysis growing
from this project has focused on th e organization of both d ocuments and visual
phenomena in a range of occupational settings, such as law firms and th e work
of architects. In England Christian Heath and his collaborators have investigated
the structuring of vision w ithin interaction in a ran ge of settings includ ing the
control room for the Lond on Und erground, centers for the produ ction of
electron ic news, ar t classes, etc. (Heath and Luff 1992, 1996, Heath and Nicholls
1997). In much of th is research there is a focus on h ow core practices for the
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
32/40
31
organization of talk, reference, gesture and other p henomena central to the
prod uction of action within hum an interaction can encompass not only talk but
also embod iment in a world p opu lated by work-relevant objects. Hinderm aash
and Heath (in p ress) investigate reference within su ch a framew ork. LeBaron
(1998), Streeck (1996) and LeBaron and Streeck (in p ress) examine how gesture
emerges from the interaction of working h and s acting in th e world in settings
such as architects meetings and auto body shops. Robinson (1998) has provided
analysis of participants in m edical interviews organize their interaction by
attending to how gaze is shifted from other participants to relevant visual
materials in the setting, such as medical records. Whalen (1995) analyzed how
the talk of operators respond ing to emergency 911 calls was organized in part by
the task of filling in requ ired information on a computer screen w ith a sp ecific
visual organization. Rogers Hall and Reed Stevens have investigated visual
practices in a ran ge of school, scientific and occup ational settings (Hall and
Stevens 1995; Stevens an d H all in p ress). Research in Compu ter Supp orted
Cooperative Work h as focused sp ecifically on new form s of visual access created
by electronic media. Heath (in p ress) and Heath and Luff (1993) have d one
considerable research on interaction mediated th rough video, demonstrating the
crucial ways in w hich resources available to parties who are actually co-present
to each other are not available in media spaces. Yamazaki and his
colleagues,have explored the systematic problems that arise when p articular
kinds of d irectives, such as instructions for how to use CPR to start a h eart attack
victims heart, are given through talk alone, for example over the p hone, without
access to a relevant visual environment. Patients usually die, since the novice is
not able to place his or her hand s at the appropriate spot on the patients body.
To remedy som e of these issues technologies that incorporate basic resources
available for doing reference in face-to-face interaction, such as p ointing, have
been developed . These include a remote controlled car w ith a laser that has the
ability to move while clearly marking the specific places being pointed at in a
remote environ ment (Yamazaki et al. 1999). Nishizaka (in press) has investigated
how par ticipants coordinate gaze both sp atially and temporally on electronic
docum ents such as compu ter screens. Kawatoko (in p ress) has investigated how
lathe workers organize p erceptu al fields so to make visible the invisible
p. 178
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
33/40
32
movements of their cutting tools.. Both Kwatoko and Ueno (in press) have
examined the organization of vision on m any d ifferent levels (from docum ents to
systematic placement of objects on the warehouse floor as part of its work flow)
in the w ork p ractices of large warehouse. In all of this work p ractices for seeing
relevant p henomena are systematically embedd ed within processes of social
organization, structures of mu tual accoun tability and the organ ization of activity.
Conclusion
Within both Conversation Analysis and Ethnomethod ology visual phenom ena
have been analyzed by investigating how th ey are made m eaningful by being
embedd ed w ithin the practices that p articipants in a var iety of settings use to
construct the events and actions that m ake up their lifeworld. This has led to the
detailed stud y of a range of quite different kinds of phenom ena, from th einterplay between gaze, restarts and gramm ar in the building of utterances
within conversation, to the construction and use of visual representations in
scientific practice, to how the ability of lawyers to shap e wh at can be seen in the
videotape of policemen beating a suspect can contribute to d isruption of the
body politic that leaves a city in flames, to the part played by visual practices in
both trad itional and electronic workp laces. Visual phenomena that h ave received
particular atten tion include 1) the body as a visible locus for disp lays of
intentional orientation throu gh both gaze and postu re; 2) the body as a locus for
a variety of different kinds of gesture, from iconic elaboration on w hat is being
said in the stream of speech, to pointing, to the hand as an agent engaged with
the world arou nd it; 3) visual documents of man y different types used in both
scientific practice and the w orkp lace, e.g., maps, grap hs, Mun sell charts, coding
forms, schedules, television screens providing access to distant sites,
architectural d rawings, comp uter screens, etc. 4) material structure in th e
environment where action and interaction are situated. This perspective brings
together within a common analytic framework both the d etails of how the visible
body is used to build talk and action in moment to moment interaction, and the
way in w hich h istorically structured v isual images and features of a setting
participate in that process. Rather than standing alone as self-contained, self-
explicating images, visual phenom ena become meaningful throu gh the way in
p. 179
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
34/40
33
wh ich th ey help elaborate, and are elaborated by, a range of other semiotic fields
sequential organization, structure in the stream of speech, encompassing
activities, etc. that are being used by par ticipants to both construct and make
visible to each other relevant actions. The focus of analysis is always on how the
participan ts in a setting themselves display a consequential orientation to v isual
phenomena (e.g., by shifting gaze after a restart, focusing th eir work on a
Mun sell chart, building images as a core comp onent of the p ractices used to
make visible scientific phen omen a, etc.). A var iety of different m ethod ologies are
employed. How ever a basic comp onent of many research p rojects includ es going
to the site where th e activities being investigated are actually performed, an d
examining wh at the p articipants are doing th ere as carefully as possible.
Videotap e records are frequently m ost useful because of the way in which they
preserve limited but crucial aspects of the spatial and environmental features of a
setting, the tem poral u nfolding organization of talk, the visible displays of
participants bodies, and changes in relevant phenomena in the setting as
relevant courses of action unfold. Analysis typically requires not only viewing
the tape, ethnograph ic records and docum ents collected in a setting, but also the
construction of new visual representations such as tran scripts of many d ifferent
types (note how some in this pap er incorporate both d etailed transcription of the
talk and a var iety of different kind s of graph ic representations). While this
analysis sheds mu ch important new light on how visual ph enomena are
organized throu gh systematic discursive practice, it is not restricted to vision per
se but is instead investigating the m ore general practices used to build action
within situated human interaction.
References Cited
Brun-Cottan, Franoise
1991 Talk in the Work Place: Occupational Relevance.Research on Language
and Social Interaction 24:277-295.
Brun-Cottan, Franoise, et al.
1991 The Workplace Project: Designing for Diversity and Change. Video
produced by Xerox Palo Alto Research Center.
Chomsky, Noam
1965 Aspects of the Theory of Syntax. Cambr idge, Mass.: MIT Press.
p. 180
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
35/40
34
Goodw in, Charles
1979 The Interactive Construction of a Sentence in N atural Conversation. In
Everyday Language: Studies in Ethnomethodology. George Psathas, ed. Pp .
97-121. New York: Irvington Publishers.
Goodw in, Charles
1980a Restarts, Pauses, and the Achievement of Mutual Gaze at Turn-
Beginning. Sociological Inquiry 50:272-302.
Goodw in, Charles
1981 Conversational Organization: Interaction Between Speakers and Hearers.
New York: Academic Press.
Goodw in, Charles
1984 Notes on Story Structure and the Organization of Participation. In
Structures of Social Action. Max Atkinson and John H eritage, eds. Pp.
225-246. Cambridge: Cambrid ge University Press.
Goodw in, Charles
1986 Gesture as a Resource for the Organization of Mutual Orientation.
Semiotica 62(1/ 2):29-49.
Goodw in, Charles
1994 Professional Vision .American A nthropologist96(3):606-633.
Goodw in, Charles
1995 Seeing in Dep th. Social Studies of Science 25:237-274.
Goodw in, Charles
1996 Practices of Color Classification.Ninchi Kagaku (Cognitive Studies:
Bulletin of the Japanese Cognitive Science Society) 3(2):62-82.
Goodw in, Charles
in preparation Pointing as Situated Practice. In Pointing: Where Language,
Culture and Cognition Meet. Sotaro Kita, ed.
Goodw in, Charles
in press Action and Embodiment Within Situated H um an Interaction.Journal of
Pragmatics .
Goodw in, Charles, and Marjorie Harness Goodw in
1987 Concurrent Operations on Talk: Notes on the Interactive Organization
of Assessments.IPrA Papers in Pragmatics 1, N o.1:1-52.
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
36/40
35
Goodw in, Charles, and Marjorie Harness Goodw in
1996 Seeing as a Situated Activity: Formulating Planes. In Cognition and
Communication at Work. Yrj Engestrm an d David Middleton, eds. Pp.
61-95. Cambridge: Cambridge University Press.
Goodw in, Marjorie Harness
1980b Processes of Mutual Monitoring Implicated in the Production of
Description Sequences. Sociological Inquiry 50:303-317.
Goodw in, Marjorie Harness, and Char les Goodw in
1986 Gesture and Coparticipation in the Activity of Searching for a Word.
Semiotica 62(1/ 2):51-75.
Hall, Rogers, and Reed Stevens
1995 Making Space: A Comparison of Mathematical Work in School and
Professional Design Practices. Sociological Review :118-275.
Haviland , John B.
1993 Anchoring, iconicity, and orientation in Guugu Yimidhirr pointing
gestures.Journal of Linguistic Anthropology 3(1):3-45.
Heath , Christian
1986 Body Movement and Speech in M edical Interaction. Cambridge:
Cambridge University Press.
Heath , Christian
in press Virtual Looking: Spatial Transformation and Comm un icative
Asymmetries. In Proceedings of the Colloquiem on the Semiotics of Space. P.
Pelligrino, ed . Geneva: University of Geneva.
Heath , Christian, and Pau l Luff
1993 Disembodied Condu ct:Interactional Asymmetries in Video-Mediated
Commun iation. In Technology in Working Order. Graham Button, ed.
Pp. 35-54. London and N ew York: Routledge.
Heath , Christian, and Pau l Luff
1996 Convergent Activities: Line Control and Passenger Information on the
London Un derground . In Cognition and Communication at Work. Yrj
Engestrm and David Midd leton, eds. Pp . 96-129. Cambridge:
Cambridge University Press.
Heath , Christian, and Gillian Nicholls
p. 181
8/3/2019 Handbook of Visual Analysis an Ethnomethodological Approach
37/40
36
1997 Animating Texts: Selective Readings of News Stories. InDiscourse,
Tools and Reasonsing: Essays on Situated Cognition . Lauren B. Resnick,
Roger Slj, Clotilde Pontecorvo, and Barbara Burge, eds. Pp . 63-86.
Berlin, Heidelberg, New York: Spr inger.
Heath , Christian C., and Paul K. Luff
1992 Crisis and Control: Collaborative Work in London Underground
Control Rooms.Journal of Computer Supported Cooperative Work1(1):24-
48.
Hind marsh, Jon, and Christian Heath
in p ress The Interactional Practice of Reference. Journ al of Pragmatics
Hutchins, Edwin
1995 Cognition in the Wild. Cambridge MA: MIT Press.
Kawatoko, Yasuko
in press Organizaing Multiple Vision.Mind, Culture and Activity .
Kend on, Adam
1990a Conducting Interaction: Patterns