Clarke, R. J (2001) S951-10: 1 Critical Issues in Information Systems BUSS 951 Seminar 10 Transcription & Coding

Critical Issues in Information Systems

Page 1: Critical Issues in Information Systems

Clarke, R. J (2001) S951-10: 1

Critical Issues in Information Systems

BUSS 951

Seminar 10: Transcription & Coding

Page 2: Critical Issues in Information Systems

Transcription & Coding: An Introduction

Page 3: Critical Issues in Information Systems

Transcribing & Coding

transcription and coding are a major requirement for language-based methods of analysis

transcription: the conversion of speech to writing

coding: the addition of relevant information to the transcription

both are needed because spoken and written language are very different

Page 4: Critical Issues in Information Systems

Speech is not Writing: Differences in Spoken & Written Texts

Spoken texts:

+ interactive: 2 or more participants

+ face-to-face: in the same place and time

+ language as action: using language to accomplish some task

+ spontaneous: without rehearsing what is going to be said

+ casual: informal and everyday

Written texts:

- interactive: one participant

- face-to-face: on his or her own

- language as action: using language to reflect

- spontaneous: planning, drafting and rewriting

- casual: formal and special occasions

Page 5: Critical Issues in Information Systems

Transcribing & Coding

[Diagram of the transcription cycle; labels: Seek to, Lead-in Zone, Playback, Coding, Transcribe]

cue the tape (rewind and fast forward) until you get to the part of the tape you are seeking

iterate until the text is transcribed and coded

Page 6: Critical Issues in Information Systems

CHAT Standard

Page 7: Critical Issues in Information Systems

CHAT

one of the best standards is CHAT: Codes for the Human Analysis of Transcripts

well-defined standard

even in research literature, transcriptions are often ad hoc & idiosyncratic

formal standards are difficult to obtain

Page 8: Critical Issues in Information Systems

CHAT

developed with subsequent computer processing in mind

a suite of programs called CLAN is available to parse the text

excellent provision for creating transcripts even when the text is difficult to understand, for example when the speaker has an accent or a speech problem

Page 9: Critical Issues in Information Systems

CHAT

standard is extensible; provides a consistent way of adding new headers if necessary

developed by Brian MacWhinney and Jane Walter at the CHILDES (Child Language Data Exchange System) Research Centre, Department of Psychology, Carnegie Mellon University

Page 10: Critical Issues in Information Systems

CHAT Structure

CHAT has a basic structure common to all transcripts:

a block of so-called Constant Headers at the top of the transcript, starting with an @Begin

the body of the transcript, consisting of turns taken by speakers, called Mainlines, each followed by zero or more Dependent Tiers

a single command which is used to signal the end of the transcript, @End
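The three-part structure described on this slide can be checked mechanically. The following is only a minimal sketch in Python, not the CLAN parser: the function name `check_chat_structure` and the error messages are made up for illustration, and it assumes that headers start with `@`, mainlines with `*` and dependent tiers with `%`, one per line.

```python
def check_chat_structure(text):
    """Return a list of structural problems found in a CHAT-style transcript."""
    # Ignore blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or lines[0] != "@Begin":
        problems.append("transcript must start with @Begin")
    if not lines or lines[-1] != "@End":
        problems.append("transcript must finish with @End")

    previous = None  # kind of the previous line: header, mainline or tier
    for ln in lines:
        if ln.startswith("@"):
            kind = "header"
        elif ln.startswith("*"):
            kind = "mainline"
        elif ln.startswith("%"):
            kind = "tier"
            # A dependent tier must sit below a mainline (or below another
            # tier that belongs to the same mainline).
            if previous not in ("mainline", "tier"):
                problems.append("dependent tier without a mainline: " + ln)
        else:
            kind = None
            problems.append("unrecognised line: " + ln)
        previous = kind
    return problems


SAMPLE = """\
@Begin
@Participants: MCL MicroLabs Assistant, STU Student
*MCL what software do you want
%sit STU and MCL are at the service desk
@End
"""

print(check_chat_structure(SAMPLE))
```

An empty list means that no structural problem was found; the check says nothing about whether the individual headers or codes are valid.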

Page 11: Critical Issues in Information Systems

CHAT Structure: Top of Transcript

Page 12: Critical Issues in Information Systems

CHAT Structure: Top of the Transcript (1)

the top of any transcript always has two compulsory commands:

@Begin

@Participants: MCL MicroLabs Assistant, STU Student

@Begin indicates the start of the transcript. It must always be the first line of any CHAT transcript. It does not include any other information...

Page 13: Critical Issues in Information Systems

CHAT Structure: Top of the Transcript (2)

@Participants is a mandatory Constant Header (a command used only once per transcript) which lists the interactants in the transcript. The syntax, as with all transcripts, is critical.

each three-letter code after the header identifies a person who speaks or is otherwise involved with the text

the string after the three letter code explains the role of that participant in the text
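As a rough illustration of this syntax, the sketch below splits an @Participants header into its three-letter codes and role strings. The function name `parse_participants` is hypothetical, and the check that codes are three upper-case letters is an assumption drawn from the MCL and STU examples rather than a rule stated on the slides.

```python
def parse_participants(header):
    """Map each three-letter participant code to its role string."""
    prefix = "@Participants:"
    if not header.startswith(prefix):
        raise ValueError("not an @Participants header")
    participants = {}
    # Entries are separated by commas: "<code> <role>, <code> <role>, ..."
    for entry in header[len(prefix):].split(","):
        code, _, role = entry.strip().partition(" ")
        # Assumption: codes are exactly three upper-case letters.
        if len(code) != 3 or not (code.isalpha() and code.isupper()):
            raise ValueError("bad participant code: " + repr(code))
        participants[code] = role.strip()
    return participants


print(parse_participants("@Participants: MCL MicroLabs Assistant, STU Student"))
```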

Page 14: Critical Issues in Information Systems

CHAT Structure: Top of the Transcript (3)

other optional Constant Headers can be listed below @Begin and @Participants, including @Age of, @Sex of and @SES of

@Age of MCL: 35

@SES of MCL: middle

@Sex of MCL: male

Page 15: Critical Issues in Information Systems

CHAT Structure: Top of the Transcript (4)

optional Constant Headers must follow the @Participants header because they need to refer to the three-letter participant identifier

whether you include them depends on whether they are significant: is the age of a participant important in the text?

a complete list follows...

Page 16: Critical Issues in Information Systems

Table 1: CHAT Constant Headers. Constant Headers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Constant Headers are presented against a shaded background.

@Begin              indicates the start of CHAT file
@Participants:      list of actors in file
@Age of XXX:        speaker's age in yymmdd format
@Birth of XXX:      date of birth of speaker
@SES of XXX:        socio-economic status of speaker
@Education of XXX:  speaker's education in years
@Sex of XXX:        indicates gender of the speaker
@Filename:          name of transcription data file
@Coding:            version of CHAT being used
@Warning:           relative completeness of the transcript
@End                indicates the end of CHAT file

Page 17: Critical Issues in Information Systems

CHAT Structure: Top of the Transcript (6)

the CHAT Constant Headers can also be represented using a syntax diagram, a notation also used for describing the syntax rules of computer languages like Pascal

a diagram follows...

Page 18: Critical Issues in Information Systems

Figure 3: CHAT Constant Headers Syntax Diagram

Page 19: Critical Issues in Information Systems

CHAT Structure: Top of the Transcript (8)

Completed transcript so far...

@Begin

@Participants: MCL MicroLabs Assistant, STU Student

@Age of MCL: 35

@SES of MCL: middle

@Sex of MCL: male

@Age of STU: 18

@SES of STU: middle

@Sex of STU: male
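The header block above can be read into a per-participant record. The sketch below is illustrative only: `collect_attributes` is a hypothetical helper, and the regular expression assumes the `@<Field> of <CODE>: <value>` shape used in these examples. It also enforces the rule that such headers must come after @Participants, since they refer to a declared identifier.

```python
import re

# Assumed shape of an optional Constant Header: "@<Field> of <CODE>: <value>"
ATTRIBUTE = re.compile(r"^@(\w+) of (\w+):\s*(.*)$")


def collect_attributes(lines):
    """Group '@X of CODE: value' headers by participant code."""
    declared = set()
    attributes = {}
    for line in lines:
        line = line.strip()
        if line.startswith("@Participants:"):
            entries = line[len("@Participants:"):].split(",")
            # The first word of each entry is the participant code.
            declared = {entry.split()[0] for entry in entries if entry.strip()}
            continue
        match = ATTRIBUTE.match(line)
        if match:
            field, code, value = match.groups()
            if code not in declared:
                raise ValueError(repr(line) + " refers to an undeclared participant")
            attributes.setdefault(code, {})[field] = value
    return attributes


HEADERS = [
    "@Begin",
    "@Participants: MCL MicroLabs Assistant, STU Student",
    "@Age of MCL: 35",
    "@SES of MCL: middle",
    "@Sex of MCL: male",
    "@Age of STU: 18",
    "@SES of STU: middle",
    "@Sex of STU: male",
]

print(collect_attributes(HEADERS))
```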

Page 20: Critical Issues in Information Systems

CHAT Structure: Transcript Body

Page 21: Critical Issues in Information Systems

CHAT Structure: Transcript Body (1)

most of the transcript body consists of mainlines, which indicate that a participant is taking a turn in the conversation

other features also found in the transcript body include:

Dependent Tiers, which are used to add special coding for a given turn

Changeable or Repeating Headers

Page 22: Critical Issues in Information Systems

CHAT Structure: Mainlines (1)

a mainline is a turn taken by a participant, indicated by an *

who takes a turn is indicated by one of the participant identifiers, listed in the @Participants constant header...

Page 23: Critical Issues in Information Systems

CHAT Structure: Mainlines (2)

the text comprising the speaker's turn is transcribed after the * and participant identifier

an example of a completed mainline:

*MCL what software do you want

Page 24: Critical Issues in Information Systems

CHAT Structure: Dependent Tiers (1)

Dependent Tiers are used to add extra detail

there are many different types of them

they always relate only to a specific turn and, when needed, are only ever listed below the mainline to which they refer

Page 25: Critical Issues in Information Systems

CHAT Structure: Dependent Tiers (2)

dependent tiers are identified in a transcript by the use of a % followed by the appropriate dependent tier code

the dependent tier code tells the reader what kind of information is being coded for the above mainline

Page 26: Critical Issues in Information Systems

CHAT Structure: Dependent Tiers (3)

an example showing a mainline and its two dependent tiers (%sit, %com) is provided below:

*MCL what software do you want

%sit STU and MCL are at the service desk

%com STU looks like he is lost

a list of valid dependent tiers follows...

Page 27: Critical Issues in Information Systems

Table 4: CHAT Dependent Tiers. Dependent Tiers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Dependent Tiers are presented against a shaded background.

%flo  simplified flowing original
%pho  phonetic and phonemic transcription
%par  paralinguistic features
%int  intonation and prosody
%lan  code shifting into secondary language
%act  actions
%fac  facial actions
%gpx  gestures and proxemics
%add  addressee
%sit  situational coding
%exp  explanation
%com  comments by investigator/transcriber
%alt  alternative utterance
%tim  time stamp coding
%spa  speech act coding
%mor  morphemic semantics
%phs  phrase structure notation
%err  error coding
%cod  general purpose coding
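To show how mainlines and their dependent tiers fit together, here is a small sketch that attaches each `%` line to the mainline above it and rejects tier codes that are not in Table 4. The names `VALID_TIERS` and `parse_body` are made up for this example. The lines follow the form used in these slides (`*MCL text`), and a trailing colon after the code is tolerated as an assumption about other CHAT material.

```python
# Tier codes taken from Table 4.
VALID_TIERS = {
    "flo", "pho", "par", "int", "lan", "act", "fac", "gpx", "add", "sit",
    "exp", "com", "alt", "tim", "spa", "mor", "phs", "err", "cod",
}


def parse_body(lines):
    """Group each mainline with the dependent tiers listed below it."""
    turns = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(" ")
            turns.append({"speaker": speaker.rstrip(":"),
                          "text": text.strip(),
                          "tiers": []})
        elif line.startswith("%"):
            code, _, text = line[1:].partition(" ")
            code = code.rstrip(":")
            if code not in VALID_TIERS:
                raise ValueError("unknown dependent tier: %" + code)
            if not turns:
                raise ValueError("dependent tier without a mainline")
            # A tier always belongs to the most recent mainline.
            turns[-1]["tiers"].append((code, text.strip()))
    return turns


BODY = [
    "*MCL what software do you want",
    "%sit STU and MCL are at the service desk",
    "%com STU looks like he is lost",
]

print(parse_body(BODY))
```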

Page 28: Critical Issues in Information Systems

CHAT Structure: Changeable/Repeating Headers (1)

Repeating Headers can be inserted repeatedly in a transcript, but they are only used when a significant condition has changed

inserted in a transcript, a Repeating Header is valid for the remainder of the transcript, or until another Header of the same type overrides it
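The "valid until overridden" rule can be sketched as a running record of the current header values, copied onto every turn. This is only an illustration: `annotate_turns` is a hypothetical helper, the header names come from the list of Changeable or Repeating Headers, and the sample situations and the second mainline are invented for the example.

```python
# Names of the Changeable or Repeating Headers.
REPEATING = {
    "Date", "Timing", "Situation", "Activities", "Location", "Room Layout",
    "Transcriber", "Tape Location", "Stim", "Language",
}


def annotate_turns(lines):
    """Pair every mainline with the Repeating Headers in force at that point."""
    active = {}
    annotated = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("@") and ":" in line:
            name, _, value = line[1:].partition(":")
            if name.strip() in REPEATING:
                # A later header of the same type overrides the earlier one.
                active[name.strip()] = value.strip()
        elif line.startswith("*"):
            annotated.append((line, dict(active)))
    return annotated


# Invented sample: the situation changes between the two turns.
LINES = [
    "@Situation: service desk",
    "*MCL what software do you want",
    "@Situation: computer lab",
    "*STU where do i sit",
]

for turn, headers in annotate_turns(LINES):
    print(turn, headers)
```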

Page 29: Critical Issues in Information Systems

CHAT Structure: Changeable/Repeating Headers (2)

a list of valid Changeable or Repeating Headers is provided on the next slide

just like the Constant Headers, Changeable or Repeating Headers can be described using a syntax diagram, which is on the slide following the list

Page 30: Critical Issues in Information Systems

Table 2: CHAT Changeable or Repeating Headers. Repeating Headers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Repeating Headers are presented against a shaded background.

@Date:           the date of the interaction
@Timing:         absolute or relative timing
@Situation:      general atmosphere or setting
@Activities:     activities in the situation
@Location:       city, state, country
@Room Layout:    room configuration, furniture
@Transcriber:    name of transcriber of the tape
@Tape Location:  specific tape ID, side, footage
@Stim:           stimuli used in an elicited task
@Language:       unnecessary in a monolingual (English) study

Page 31: Critical Issues in Information Systems

Page 32: Critical Issues in Information Systems

CHAT Structure: Summary... so far!

so far we have described three separate types of structure that occur within the body of a CHAT transcript:

Mainlines (for transcribing turns)

Dependent Tiers (for coding turns)

Changeable or Repeating Headers

Page 33: Critical Issues in Information Systems

CHAT Structure: Special Mainline Codes (1)

sometimes it is important to add additional information into the mainline itself

NOTE the following about the body of the CHAT transcript:

an actual turn is shown in lower case on a mainline, and

there is normally no punctuation on mainlines

Page 34: Critical Issues in Information Systems

CHAT Structure: Special Mainline Codes (2)

this is because when punctuation is used it conforms to CHAT Special Mainline Codes

Special Mainline Codes occur in one of two types:

Utterance Junctures and Delimiters

Utterance Ambiguity Codes

we will describe both types in order...

Page 35: Critical Issues in Information Systems

CHAT Structure: Special Mainline Codes (3)

Utterance Junctures and Delimiters:

indicate either junctures or breaks in the turn (pauses etc). These Special Mainline Codes are referred to as Utterance Internal Junctures

indicate how a turn was completed (as a question, the speaker was interrupted etc). These Special Mainline Codes are referred to as Post Utterance Delimiters

Page 36: Critical Issues in Information Systems

CHAT Structure: Special Mainline Codes (4)

Utterance Junctures and Delimiters continued...

indicate how a turn was started, either by a participant taking up another's talk (called latching), or by completing another's talk (called completion). These Special Mainline Codes are referred to as Pre Utterance Delimiters

a list follows...

Page 37: Critical Issues in Information Systems

Utterance Junctures and Delimiters

(a) Utterance Internal Junctures

Short Pause    [#]
Long Pause     [#long]
Timed Pause    [#ss.mm]
Comma          ,

(b) Post Utterance Delimiters

Period         .
Question       ?
Exclamation    !
Trailing off   [...]
Interruption   [\]

(c) Pre Utterance Delimiters

Latching       [>]
Completion     [+]
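The delimiter codes in this list lend themselves to simple classification of turns. The sketch below reports how a turn started and how it ended. It assumes, going only by the names "Pre" and "Post", that Pre Utterance Delimiters appear at the very start of the turn text and Post Utterance Delimiters at the very end. The function name `classify_turn` and the sample turns are invented for illustration.

```python
# Codes taken from lists (b) and (c) above.
POST_DELIMITERS = {
    ".": "period",
    "?": "question",
    "!": "exclamation",
    "[...]": "trailing off",
    "[\\]": "interruption",
}
PRE_DELIMITERS = {
    "[>]": "latching",
    "[+]": "completion",
}


def classify_turn(text):
    """Return (how the turn started, how it ended); None where no code is found."""
    text = text.strip()
    start = next((label for code, label in PRE_DELIMITERS.items()
                  if text.startswith(code)), None)
    end = next((label for code, label in POST_DELIMITERS.items()
                if text.endswith(code)), None)
    return start, end


print(classify_turn("[>] the word processor ?"))
print(classify_turn("i was looking for [\\]"))
```

Counting the labels over a whole transcript gives a rough picture of how often participants latch onto, complete or interrupt one another.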

Page 38: Critical Issues in Information Systems

CHAT Structure: Special Mainline Codes (6)

Utterance Ambiguity Codes can also be inserted into a mainline

used when there has been:

a problem with the transcription process, or

an unusual condition (for example, when a gesture substitutes for a word)

special coding is required...

Page 39: Critical Issues in Information Systems

CHAT Structure: Special Mainline Codes (7)

Utterance Ambiguity Codes may also be moved to their own dependent tiers if the mainline is getting cluttered up with coding

the table that follows shows the valid CHAT Utterance Ambiguity Codes ...

Page 40: Critical Issues in Information Systems

Utterance Ambiguity Codes

(a) Main Line Coding

Unintelligible speech        xxx
Unrecognizable word break    &
Word Non-completion          ()
Omitted Word                 0

(b) Dependent Tier Coding

Phonemic transcription    yyy    %pho dependent tier
Untranscribed material    www    %exp dependent tier
Actions or Gestures       0      %act or %gpx tiers
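One use of these codes is a consistency check on a coded turn. The sketch below counts `xxx` markers and reports when a `yyy` or `www` marker on the mainline is not accompanied by the dependent tier listed beside it in part (b). This reading of the table (marker on the mainline, detail on the named tier) is an assumption, `check_turn` is a hypothetical helper, and the sample turns are invented. The `0` code is left out because it is shared between omitted words and actions or gestures.

```python
# Mainline marker -> dependent tier expected to carry the detail (part (b)).
EXPECTED_TIER = {"yyy": "pho", "www": "exp"}


def check_turn(mainline_text, tier_codes):
    """Summarise the ambiguity codes found in one turn."""
    words = mainline_text.split()
    needed = {EXPECTED_TIER[w] for w in words if w in EXPECTED_TIER}
    return {
        # Number of stretches marked as unintelligible speech.
        "unintelligible": words.count("xxx"),
        # Tiers that the markers call for but that are not present.
        "missing_tiers": sorted(needed - set(tier_codes)),
    }


print(check_turn("i want xxx the yyy program", ["com"]))
print(check_turn("www", ["exp"]))
```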

Page 41: Critical Issues in Information Systems

CHAT Structure: Bottom of the Transcript

Page 42: Critical Issues in Information Systems

CHAT Structure: Bottom of the Transcript (1)

the only syntax unique to the bottom of the transcript is the mandatory @End Constant Header

needed to indicate when a transcript is finished

a relatively complete transcript extract showing required features follows. NOTE that : is not part of the CHAT standard...

Page 43: Critical Issues in Information Systems

Page 44: Critical Issues in Information Systems

Tool Support

Page 45: Critical Issues in Information Systems

Tool Support (1)

the CHAT system has a number of tools available for it

one tool called CLAN consists of a parser for checking the syntax of CHAT transcripts

multimedia versions of CLAN are being developed; useful when meetings have been videotaped

Page 46: Critical Issues in Information Systems

Tool Support (2): Needed for Transcription NOT Coding

these tools are great for building elaborately coded transcripts

they are not so helpful when dealing with workplace language

coding is not the major problem: it's transcription that takes the greatest effort in workplace language studies

Page 47: Critical Issues in Information Systems

Tool Support (3)

Transcription

there are of course a number of transcription systems which, when combined with CHAT and CLAN, could form a useful workplace language system

but the ‘State-of-the-Art’ is still not very good

Page 48: Critical Issues in Information Systems

Tool Support (4): Speech Recognition?

some manufacturers claim to get 95% accuracy in transcription, but this is only possible under very constrained conditions:

these systems cannot handle speech which is continuous and flowing: the software cannot find where words start and end

these systems cannot transcribe speech unless the system has been trained to understand each and every speaker

Page 49: Critical Issues in Information Systems

Tool Support (5)

in some circumstances the inability of current systems to recognise Flowing Speech may not be a great problem because workplace transcripts can be sparse

some excellent systems are becoming available, e.g. Dragon DICTATE for Windows

Page 50: Critical Issues in Information Systems

Tool Support (6)

but, it has taken the IS Discipline 20 years to come up with reasonable CASE tools to support traditional systems development activities

we may need another 20 years to provide the same level of support for semio-informatics!