Clarke, R. J (2001) S951-10: 1
Critical Issues in Information Systems
BUSS 951
Seminar 10: Transcription & Coding
Transcription & Coding: An Introduction
Transcribing & Coding
transcription and coding are a major requirement for language-based methods of analysis
transcription: the conversion of speech to writing
coding: the addition of relevant information to the transcription
both are needed because spoken and written language are very different
Speech is not Writing: Differences in Spoken & Written Texts
Spoken texts:
+ interactive: 2 or more participants
+ face-to-face: in the same place and time
+ language as action: using language to accomplish some task
+ spontaneous: without rehearsing what is going to be said
+ casual: informal and everyday
Written texts:
- interactive: one participant
- face-to-face: on his or her own
- language as action: using language to reflect
- spontaneous: planning, drafting and rewriting
- casual: formal and special occasions
Transcribing & Coding
[Diagram of the transcription cycle; stage labels: Seek to, Lead-in Zone, Playback, Coding, Transcribe]
cue the tape (rewind and fast forward) until you get to the part of the tape you are seeking
iterate until the text is transcribed and coded
CHAT Standard
CHAT
one of the best standards is CHAT (Codes for the Human Analysis of Transcripts)
it is a well-defined standard
even in the research literature, transcriptions are often ad hoc & idiosyncratic
formal standards are difficult to obtain
CHAT
developed with subsequent computer processing in mind
a suite of programs called CLAN is available to parse the text
excellent provision for creating transcripts even when the text is difficult to understand
for example, when the speaker has an accent or a speech problem
CHAT
the standard is extensible; it provides a consistent way of adding new headers if necessary
developed by Brian MacWhinney and Jane Walter at CHILDES (the Child Language Data Exchange System) Research Centre, Department of Psychology, Carnegie Mellon University
CHAT Structure
CHAT has a basic structure common to all transcripts:
a block of so-called Constant Headers at the top of the transcript, starting with an @Begin
the body of the transcript, consisting of turns taken by speakers called Mainlines, each followed by zero through to many Dependent Tiers
a single command which is used to signal the end of the transcript, @End
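The three-part layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of CHAT or CLAN: the function name `split_transcript` is hypothetical, and the header block is assumed to run from @Begin up to the first mainline.

```python
from typing import List, Tuple


def split_transcript(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split a CHAT-style transcript into (constant headers, body).

    Assumption: @Begin is the first line, @End is the last line, and the
    constant headers are every line between @Begin and the first mainline.
    """
    lines = [ln.rstrip() for ln in lines if ln.strip()]
    if not lines or lines[0] != "@Begin":
        raise ValueError("transcript must start with @Begin")
    if lines[-1] != "@End":
        raise ValueError("transcript must finish with @End")
    inner = lines[1:-1]
    # The header block runs until the first mainline (a line starting with *).
    first_turn = next(
        (i for i, ln in enumerate(inner) if ln.startswith("*")), len(inner)
    )
    return inner[:first_turn], inner[first_turn:]


example = """@Begin
@Participants: MCL MicroLabs Assistant, STU Student
*MCL what software do you want
%sit STU and MCL are at the service desk
@End""".splitlines()

headers, body = split_transcript(example)
```

Here `headers` holds the single @Participants line and `body` holds the mainline with its dependent tier.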
CHAT Structure: Top of Transcript
CHAT Structure: Top of the Transcript (1)
the top of any transcript always has two compulsory commands:
@Begin
@Participants: MCL MicroLabs Assistant, STU Student
@Begin indicates the start of the transcript. It must always be the first line of any CHAT transcript. It does not include any other information...
CHAT Structure: Top of the Transcript (2)
@Participants is a mandatory Constant Header (a command used only once per transcript) which lists the interactants in the transcript. The syntax, as with all transcripts, is critical.
the three-letter codes after the header each indicate a person who speaks or is otherwise involved with the text
the string after the three-letter code explains the role of that participant in the text
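As a rough sketch of how the @Participants syntax could be read by a program, the Python below maps each three-letter code to its role string. The function name `parse_participants` is hypothetical, and the check that codes are three upper-case letters is an assumption drawn from the MCL and STU examples, not a rule quoted from the standard.

```python
def parse_participants(header: str) -> dict:
    """Map each three-letter participant code to its role description."""
    prefix = "@Participants:"
    if not header.startswith(prefix):
        raise ValueError("not an @Participants header")
    participants = {}
    for entry in header[len(prefix):].split(","):
        # The first word is the code; the rest of the entry is the role.
        code, _, role = entry.strip().partition(" ")
        if len(code) != 3 or not code.isupper():
            # Assumption: codes look like MCL or STU (three capitals).
            raise ValueError(f"bad participant code: {code!r}")
        participants[code] = role.strip()
    return participants


roles = parse_participants("@Participants: MCL MicroLabs Assistant, STU Student")
```

The resulting dictionary can then be used to check that every later header and mainline refers to a declared participant.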
CHAT Structure: Top of the Transcript (3)
other optional Constant Headers can be listed below @Begin and @Participants, including @Age of, @Sex of and @SES of
@Age of MCL: 35
@SES of MCL: middle
@Sex of MCL: male
CHAT Structure: Top of the Transcript (4)
optional Constant Headers must follow the @Participants header because they need to refer to the three-letter participant identifier
whether you include them depends on whether they are significant: is the age of a participant important in the text?
a complete list follows...
Table 1: CHAT Constant Headers. Constant Headers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Constant Headers are presented against a shaded background.
@Begin               indicates the start of CHAT file
@Participants:       list of actors in file
@Age of XXX:         speaker's age in yymmdd format
@Birth of XXX:       date of birth of speaker
@SES of XXX:         socio-economic status of speaker
@Education of XXX:   speaker's education in years
@Sex of XXX:         indicates gender of the speaker
@Filename:           name of transcription data file
@Coding:             version of CHAT being used
@Warning:            relative completeness of the transcript
@End                 indicates the end of CHAT file
CHAT Structure: Top of the Transcript (6)
the CHAT Constant Headers can also be represented using a syntax diagram, a notation also used for describing the syntax rules of computer languages like Pascal
a diagram follows...
Figure 3: CHAT Constant Headers Syntax Diagram
CHAT Structure: Top of the Transcript (8)
Completed transcript so far...
@Begin
@Participants: MCL MicroLabs Assistant, STU Student
@Age of MCL: 35
@SES of MCL: middle
@Sex of MCL: male
@Age of STU: 18
@SES of STU: middle
@Sex of STU: male
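The ordering rules for the header block (@Begin first, then @Participants, then optional headers that refer back to a declared participant) lend themselves to a simple check. The sketch below is an illustration only: `check_header_block` is a hypothetical helper, not a CLAN command, and it covers just the per-speaker headers listed in Table 1.

```python
import re

# Per-speaker Constant Headers from Table 1, e.g. "@Age of MCL: 35".
PER_SPEAKER = re.compile(r"^@(?:Age|Birth|SES|Education|Sex) of (\w+):")


def check_header_block(lines):
    """Return a list of problems found in the top-of-transcript headers."""
    problems = []
    if not lines or lines[0].strip() != "@Begin":
        problems.append("first line must be @Begin")
    if len(lines) < 2 or not lines[1].startswith("@Participants:"):
        problems.append("second line must be @Participants")
        return problems
    # Collect the three-letter codes declared in @Participants.
    entries = lines[1].split(":", 1)[1].split(",")
    declared = {e.split()[0] for e in entries if e.strip()}
    for line in lines[2:]:
        match = PER_SPEAKER.match(line)
        if match and match.group(1) not in declared:
            problems.append(
                f"{line!r} refers to undeclared participant {match.group(1)}"
            )
    return problems


header_block = [
    "@Begin",
    "@Participants: MCL MicroLabs Assistant, STU Student",
    "@Age of MCL: 35",
    "@SES of MCL: middle",
    "@Sex of MCL: male",
    "@Age of STU: 18",
    "@SES of STU: middle",
    "@Sex of STU: male",
]
problems = check_header_block(header_block)
```

For the header block shown above the list of problems is empty; adding a header for a code that is not in @Participants would produce one entry.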
CHAT Structure: Transcript Body
CHAT Structure: Transcript Body (1)
most of the transcript body consists of mainlines, which indicate that a participant is taking a turn in the conversation
other features found in the transcript body include:
Dependent Tiers, which are used to add special coding for a given turn
Changeable or Repeating Headers
CHAT Structure: Mainlines (1)
a mainline is a turn taken by a participant, indicated by an *
who takes a turn is indicated by one of the participant identifiers, listed in the @Participants constant header...
CHAT Structure: Mainlines (2)
the text comprising the speaker's turn is transcribed after the * and participant identifier
an example of a completed mainline:
*MCL what software do you want
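A mainline in the form shown here can be taken apart with very little code. The following is a minimal sketch, assuming the speaker identifier is the first word after the * and that everything after it is the turn; the names `Mainline` and `parse_mainline` are hypothetical.

```python
from typing import NamedTuple


class Mainline(NamedTuple):
    speaker: str  # participant identifier, e.g. MCL
    text: str     # the transcribed turn


def parse_mainline(line: str) -> Mainline:
    """Split a mainline into its participant identifier and turn text."""
    if not line.startswith("*"):
        raise ValueError("a mainline must start with *")
    speaker, _, text = line[1:].partition(" ")
    return Mainline(speaker, text.strip())


turn = parse_mainline("*MCL what software do you want")
```

The speaker can then be looked up against the identifiers declared in the @Participants header.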
CHAT Structure: Dependent Tiers (1)
Dependent Tiers are used to add extra detail
there are many different types of them
they always relate only to a specific turn and, when used, are only ever listed below the mainline to which they refer
CHAT Structure: Dependent Tiers (2)
dependent tiers are identified in a transcript by the use of a % followed by the appropriate dependent tier code
the dependent tier code tells the reader what kind of information is being coded for the above mainline
CHAT Structure: Dependent Tiers (3)
an example showing a mainline and its two dependent tiers (%sit, %com) is provided below:
*MCL what software do you want
%sit STU and MCL are at the service desk
%com STU looks like he is lost
a list of valid dependent tiers follows...
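Because a dependent tier always belongs to the mainline directly above it, a program can attach tiers to turns in a single pass. The Python below is a sketch of that idea using the example above; `group_turns` is a hypothetical name, and storing one value per tier code is a simplifying assumption.

```python
def group_turns(body_lines):
    """Attach each % dependent tier to the mainline that precedes it."""
    turns = []
    for line in body_lines:
        line = line.strip()
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(" ")
            turns.append({"speaker": speaker, "text": text, "tiers": {}})
        elif line.startswith("%"):
            if not turns:
                raise ValueError("dependent tier appears before any mainline")
            code, _, value = line[1:].partition(" ")
            # Assumption: at most one tier of each type per turn.
            turns[-1]["tiers"][code] = value
    return turns


turns = group_turns([
    "*MCL what software do you want",
    "%sit STU and MCL are at the service desk",
    "%com STU looks like he is lost",
])
```

The single turn produced here carries both its `sit` and `com` tiers, so the coding never becomes separated from the talk it describes.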
Table 4: CHAT Dependent Tiers. Dependent Tiers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Dependent Tiers are presented against a shaded background.
%flo   simplified flowing original
%pho   phonetic and phonemic transcription
%par   paralinguistic features
%int   intonation and prosody
%lan   code shifting into secondary language
%act   actions
%fac   facial actions
%gpx   gestures and proxemics
%add   addressee
%sit   situational coding
%exp   explanation
%com   comments by investigator/transcriber
%alt   alternative utterance
%tim   time stamp coding
%spa   speech act coding
%mor   morphemic semantics
%phs   phrase structure notation
%err   error coding
%cod   general purpose coding
CHAT Structure: Changeable/Repeating Headers (1)
Repeating Headers can be inserted repeatedly in a transcript, but they are only used when a significant condition has changed
once inserted in a transcript, a Repeating Header is valid for the remainder of the transcript, or until another Header of the same type overrides it
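The override rule can be made concrete with a short sketch that records which Repeating Header values are in force at each turn. Everything in the example data is made up for illustration (the @Location values and the second utterance are hypothetical), and `headers_in_force` is not a CLAN function.

```python
def headers_in_force(body_lines):
    """Pair each mainline with the Repeating Headers valid at that point."""
    current = {}
    result = []
    for line in body_lines:
        line = line.strip()
        if line.startswith("@"):
            # A later header of the same type overrides the earlier one.
            name, _, value = line[1:].partition(":")
            current[name.strip()] = value.strip()
        elif line.startswith("*"):
            result.append((line, dict(current)))
    return result


# Hypothetical body: the location changes part-way through.
body = [
    "@Location: service desk",
    "*MCL what software do you want",
    "@Location: laboratory",
    "*STU this one",
]
in_force = headers_in_force(body)
```

The first turn is paired with the service desk location and the second with the laboratory, which mirrors the rule that a header stays valid until one of the same type replaces it.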
CHAT Structure: Changeable/Repeating Headers (2)
a list of valid Changeable or Repeating Headers is provided on the next slide
just like the Constant Headers, Changeable or Repeating Headers can be described using a syntax diagram, which is on the slide following the list
Table 2: CHAT Changeable or Repeating Headers. Repeating Headers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Repeating Headers are presented against a shaded background.
@Date:            the date of the interaction
@Timing:          absolute or relative timing
@Situation:       general atmosphere or setting
@Activities:      activities in the situation
@Location:        city, state, country
@Room Layout:     room configuration, furniture
@Transcriber:     name of transcriber of the tape
@Tape Location:   specific tape ID, side, footage
@Stim:            stimuli used in an elicited task
@Language:        unnecessary in a monolingual (English) study
CHAT Structure: Summary...so far!
so far we have described three separate types of structure that occur within the body of a CHAT transcript:
Mainlines (for transcribing turns)
Dependent Tiers (for coding turns)
Changeable or Repeating Headers
CHAT Structure: Special Mainline Codes (1)
sometimes it is important to add additional information into the mainline itself
NOTE the following about the body of the CHAT transcript:
an actual turn is shown in lower case on a mainline, and
there is normally no punctuation on mainlines
CHAT Structure: Special Mainline Codes (2)
this is because when punctuation is used it conforms to CHAT Special Mainline Codes
Special Mainline Codes occur in one of two types:
Utterance Junctures and Delimiters
Utterance Ambiguity Codes
we will describe both types in order...
CHAT Structure: Special Mainline Codes (3)
Utterance Junctures and Delimiters:
indicate either junctures or breaks in the turn (pauses etc). These Special Mainline Codes are referred to as Utterance Internal Junctures
indicate how a turn was completed (as a question, the speaker was interrupted etc). These Special Mainline Codes are referred to as Post Utterance Delimiters
CHAT Structure: Special Mainline Codes (4)
Utterance Junctures and Delimiters continued...
indicate how a turn was started, either by a participant taking up another's talk (called latching), or by completing another's talk (called completion). These Special Mainline Codes are referred to as Pre Utterance Delimiters
a list follows...
Utterance Junctures and Delimiters
(a) Utterance Internal Junctures
Short Pause     [#]
Long Pause      [#long]
Timed Pause     [#ss.mm]
Comma           ,
(b) Post Utterance Delimiters
Period          .
Question        ?
Exclamation     !
Trailing off    [...]
Interruption    [\]
(c) Pre Utterance Delimiters
Latching        [>]
Completion      [+]
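To show how the codes in this table might be recognised automatically, the sketch below splits a mainline into tokens and labels each one as a word or as one of the three kinds of code. It is an illustration under assumptions: the coded utterance is made up, the timed pause is assumed to be written as digits in the [#ss.mm] pattern, and `classify` is a hypothetical helper rather than part of CLAN.

```python
import re

INTERNAL = {"[#]", "[#long]", ","}            # (a) Utterance Internal Junctures
POST = {".", "?", "!", "[...]", "[\\]"}       # (b) Post Utterance Delimiters
PRE = {"[>]", "[+]"}                          # (c) Pre Utterance Delimiters
TIMED_PAUSE = re.compile(r"\[#\d+\.\d+\]")    # assumed form of [#ss.mm]

# A token is a bracketed code, a single punctuation mark, or a word.
TOKEN = re.compile(r"\[[^\]]*\]|[.,?!]|[^\s\[\].,?!]+")


def classify(text):
    """Label every token on a mainline as word, internal, post or pre."""
    labelled = []
    for token in TOKEN.findall(text):
        if token in INTERNAL or TIMED_PAUSE.fullmatch(token):
            kind = "internal"
        elif token in POST:
            kind = "post"
        elif token in PRE:
            kind = "pre"
        else:
            kind = "word"
        labelled.append((token, kind))
    return labelled


# Hypothetical coded turn: latched onto previous talk, a short pause, a question.
coded = classify("[>] what software [#] do you want ?")
```

A checker built on this could, for instance, confirm that Pre Utterance Delimiters only occur at the start of a turn and Post Utterance Delimiters only at the end.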
CHAT Structure: Special Mainline Codes (6)
Utterance Ambiguity Codes can also be inserted into a mainline
used when there has been:
a problem with the transcription process, or
an unusual condition (for example, when a gesture substitutes for a word)
in these cases special coding is required...
CHAT Structure: Special Mainline Codes (7)
Utterance Ambiguity Codes may also be moved to their own dependent tiers if the mainline is getting cluttered up with coding
the table that follows shows the valid CHAT Utterance Ambiguity Codes ...
Utterance Ambiguity Codes
(a) Main Line Coding
Unintelligible speech        xxx
Unrecognizable word break    &
Word Non-completion          ()
Omitted Word                 0
(b) Dependent Tier Coding
Phonemic transcription       yyy    %pho dependent tier
Untranscribed material       www    %exp dependent tier
Actions or Gestures          0      %act or %gpx tiers
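Part (b) of the table pairs a mainline code with the dependent tier that explains it, which suggests a simple consistency check. The sketch below covers only yyy and www, because 0 appears in both halves of the table and cannot be resolved from the symbol alone. The utterance is made up and `missing_tiers` is a hypothetical helper.

```python
# Mainline code -> dependent tier expected to accompany it (Table, part b).
REQUIRED_TIER = {"yyy": "pho", "www": "exp"}


def missing_tiers(mainline_text, tier_codes):
    """List ambiguity codes on a mainline that lack their dependent tier."""
    present = set(tier_codes)
    problems = []
    for token in mainline_text.split():
        wanted = REQUIRED_TIER.get(token)
        if wanted and wanted not in present:
            problems.append(f"{token} needs a %{wanted} tier")
    return problems


# Hypothetical turn with a word that was only transcribed phonemically.
without_tier = missing_tiers("i need the yyy program", ["com"])
with_tier = missing_tiers("i need the yyy program", ["com", "pho"])
```

The first call reports the missing %pho tier; the second reports nothing because the tier is present.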
CHAT Structure: Bottom of the Transcript
CHAT Structure: Bottom of the Transcript (1)
the only unique syntax for the bottom of the transcript is the mandatory @End Constant Header
it is needed to indicate when a transcript is finished
a relatively complete transcript extract showing required features follows. NOTE that : is not part of the CHAT standard...
Tool Support
Tool Support (1)
the CHAT system has a number of tools available for it
one tool called CLAN consists of a parser for checking the syntax of CHAT transcripts
multimedia versions of CLAN are being developed; useful when meetings have been videotaped
Tool Support (2): Needed for Transcription NOT Coding
these tools are great for building elaborately coded transcripts
they are not so helpful when dealing with workplace language
coding is not the major problem: it is transcription that takes the greatest effort in workplace language studies
Tool Support (3): Transcription
there are of course a number of transcription systems which, when combined with CHAT and CLAN, could form a useful workplace language system
but the 'State-of-the-Art' is still not very good
Tool Support (4): Speech Recognition?
some manufacturers claim to get 95% accuracy in transcription, but this is only possible under very constrained conditions:
these systems cannot handle speech which is continuous and flowing, because the software cannot find where words start and end
these systems cannot transcribe speech unless the system has been trained to understand each and every speaker
Tool Support (5)
in some circumstances the inability of current systems to recognise flowing speech may not be a great problem, because workplace transcripts can be sparse
some excellent systems are becoming available, e.g. Dragon DICTATE for Windows
Tool Support (6)
but, it has taken the IS Discipline 20 years to come up with reasonable CASE tools to support traditional systems development activities
we may need another 20 years to provide the same level of support for semio-informatics!