
A Semi-supervised Method for Efficient Construction of Statistical Spoken Language Understanding Resources

Seokhwan Kim, Minwoo Jeong, and Gary Geunbae Lee
Pohang University of Science and Technology (POSTECH), South Korea

ABSTRACT

We present a semi-supervised framework for constructing spoken language understanding resources at very low cost. We generate context patterns from a few seed entities and a large amount of unlabeled utterances. Using these context patterns, we extract new entities from the unlabeled utterances. The extracted entities are appended to the seed entities, and by repeating these steps we obtain an extended entity list. Our method is based on an utterance alignment algorithm that is a variant of the biological sequence alignment algorithm. With this method, we obtain precise entity lists with high coverage, which helps reduce the cost of building resources for statistical spoken language understanding systems.

SCORING CANDIDATES

● The score of the alignment between a raw utterance and a context pattern

OVERALL PROCEDURE

1. Prepare a seed entity list E and an unlabeled corpus C.
2. Find utterances in the corpus C containing the lexical forms of entities in E, and replace the matched entity portions of those utterances with a label indicating the location of an entity. Add the partially labeled utterances to the context pattern set P.
3. Align each utterance in the corpus C with each context pattern in P, and extract as new entity candidates the parts of the utterance that are matched with the entity label in the context pattern.
4. Compute the score of the entity candidates extracted in step 3, and add only high-scoring candidates to E.
5. If no entities were added to E in step 4, terminate the process with the entity list E, the context pattern set P, and the partially labeled corpus C as results. Otherwise, return to step 2 and repeat the process.
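The loop in steps 1-5 can be sketched as follows. This is a deliberately simplified, runnable illustration: alignment is reduced to exact slot matching and the candidate score to the fraction of context patterns that extract a candidate, whereas the actual method uses the alignment and scoring described in the other sections. All function names are illustrative; only the 0.3 threshold comes from the experiments.

```python
# Simplified sketch of the bootstrapping loop (steps 1-5).
# Alignment is reduced to exact slot matching; score = fraction of
# patterns extracting the candidate. Names are illustrative.

THRESHOLD = 0.3   # entity selection threshold from the experiments
LABEL = "<ENTITY>"

def make_patterns(entities, corpus):
    # Step 2: replace entity mentions (not at utterance boundaries) with LABEL.
    patterns = set()
    for utt in corpus:
        words = utt.split()
        for i in range(1, len(words) - 1):
            if words[i] in entities:
                patterns.add(tuple(words[:i] + [LABEL] + words[i+1:]))
    return patterns

def extract_candidates(corpus, patterns):
    # Step 3 (simplified): an utterance matches a pattern if every word agrees
    # except the LABEL slot, whose word becomes an entity candidate.
    counts = {}
    for utt in corpus:
        words = utt.split()
        for pat in patterns:
            if len(pat) != len(words):
                continue
            if all(p == w or p == LABEL for p, w in zip(pat, words)):
                cand = words[pat.index(LABEL)]
                counts[cand] = counts.get(cand, 0) + 1
    return counts

def bootstrap(seeds, corpus):
    entities = set(seeds)                                  # step 1
    while True:
        patterns = make_patterns(entities, corpus)         # step 2
        counts = extract_candidates(corpus, patterns)      # step 3
        total = max(len(patterns), 1)
        new = {c for c, n in counts.items()                # step 4
               if n / total >= THRESHOLD and c not in entities}
        if not new:                                        # step 5
            return entities, patterns
        entities |= new

corpus = [
    "i want to fly to denver tomorrow",
    "i want to fly to boston tomorrow",
    "a flight to boston please",
    "a flight to seattle please",
]
entities, patterns = bootstrap({"denver"}, corpus)
# starting from the single seed "denver", the loop also picks up
# "boston" (iteration 1) and "seattle" (iteration 2)
```

Note how the second iteration helps: "boston" appears in a second context, which yields a new pattern that in turn extracts "seattle".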

MOTIVATION

Spoken Language Understanding (SLU) is the problem of extracting semantic meaning from recognized utterances and filling the correct values into a semantic frame structure. Most statistical SLU approaches require a sufficient amount of training data consisting of utterances labeled with their corresponding semantic concepts. Building such training data is one of the most important and most costly tasks in developing spoken dialog systems. We concentrate on a semi-supervised information extraction method for building resources for statistical SLU modules in spoken dialog systems.

EXTRACTING CONTEXT PATTERNS

To overcome the context sparseness problem of spoken utterances, we use not sub-phrases of an utterance but the full utterance itself as the context pattern for extracting named entities. First, we assume that each entry in the entity list is precise and belongs to exactly one category, whether it is a seed entity or an entity added at an intermediate stage of the overall procedure. For each entry in the entity list, we find the utterances containing it and make an utterance template by replacing the entity portion of the utterance with the defined entity label. In this replacement, we exclude entities located at the beginning or end of the utterance, because context patterns containing entities in such positions can cause ambiguity in determining entity boundaries in the later steps.
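Template creation for a single entity-list entry can be sketched as below, including multi-word entities and the boundary-exclusion rule just described. The function and label names are illustrative, not the authors' code.

```python
# Sketch of utterance-template creation with the boundary-exclusion rule.
# Names are illustrative.

def make_template(utterance, entity, label):
    """Replace `entity` in `utterance` with `label`, unless the entity
    sits at the very beginning or end of the utterance."""
    words = utterance.split()
    ent = entity.split()
    n = len(ent)
    for i in range(len(words) - n + 1):
        if words[i:i+n] == ent:
            # exclude matches at the utterance boundaries
            if i == 0 or i + n == len(words):
                continue
            return " ".join(words[:i] + [label] + words[i+n:])
    return None

make_template("i want to fly to new york on monday", "new york", "<CITY_NAME>")
# -> "i want to fly to <CITY_NAME> on monday"
make_template("new york to denver", "new york", "<CITY_NAME>")
# -> None (an entity at the beginning of the utterance is excluded)
```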

ALIGNMENT-BASED NAMED ENTITY RECOGNITION

We first align a raw utterance with a context pattern containing entity labels. Then, from the best alignment between them, we extract the parts of the raw utterance aligned to the entity labels in the context pattern as entity candidates belonging to the categories of the corresponding labels.

MATRIX COMPUTATION / TRACE BACK

The traceback step starts at the position with the maximum score among the first column and the first row. Then, the position following position [i, j] is determined by the following policies:

• If tar(i) and ref(j) are identical, the next position is [i + 1, j + 1].
• Otherwise, the next position is the position with the maximum score among [i + 1, j + 1] ~ [i + 1, n] and [i + 1, j + 1] ~ [m, j + 1].
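The poster's matrix-computation formula is not reproduced here, so the sketch below substitutes a standard Needleman-Wunsch global alignment with simple match/mismatch/gap scores, treating the entity label as matching any word; the utterance words aligned to the label during traceback become the entity candidate. The scoring constants and function names are illustrative assumptions, not the paper's exact variant.

```python
# Alignment-based candidate extraction, sketched with standard
# Needleman-Wunsch global alignment (the paper uses its own variant).
# The entity label is treated as a wildcard that matches any word.

LABEL = "<CITY_NAME>"
MATCH, MISMATCH, GAP = 2, -1, -1   # illustrative scoring constants

def sim(ref_word, tar_word):
    if ref_word == LABEL or ref_word == tar_word:
        return MATCH
    return MISMATCH

def extract_by_alignment(ref, tar):
    """Align context pattern `ref` with raw utterance `tar`; return the
    utterance words aligned to the entity label."""
    r, t = ref.split(), tar.split()
    n, m = len(r), len(t)
    # fill the DP matrix for global alignment
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i-1][0] + GAP
    for j in range(1, m + 1):
        F[0][j] = F[0][j-1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i-1][j-1] + sim(r[i-1], t[j-1]),
                          F[i-1][j] + GAP,
                          F[i][j-1] + GAP)
    # traceback, collecting utterance words aligned to the label
    i, j, cand = n, m, []
    while i > 0 and j > 0:
        if F[i][j] == F[i-1][j-1] + sim(r[i-1], t[j-1]):
            if r[i-1] == LABEL:
                cand.append(t[j-1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i-1][j] + GAP:
            i -= 1
        else:
            # gap in ref: this simple sketch does not absorb extra
            # utterance words next to the label slot into the candidate
            j -= 1
    return " ".join(reversed(cand))

extract_by_alignment("i want to fly to <CITY_NAME> tomorrow",
                     "i want to fly to boston tomorrow")
# -> "boston"
```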

● The score of an entity candidate e_j extracted by a context pattern ref_i
● The final score of an entity candidate e_j

EXPERIMENTS

We evaluated our method on the CU-Communicator corpus, which consists of 13,983 utterances. We chose the three most frequently occurring semantic categories in the corpus: CITY_NAME, MONTH, and DAY_NUMBER. We empirically set the entity selection threshold to 0.3.

● Result of automatic entity list extension

Category     # of seeds   # of extended entities   # of total entities   Precision   Recall
CITY_NAME    20           123                      209                   65.04%      37.91%
MONTH        1            10                       12                    100%        83.33%
DAY_NUMBER   3            27                       34                    100%        79.41%

● Result of corpus labeling experiment

Category     Precision   Recall   F-measure
CITY_NAME    91.30       86.83    89.01
MONTH        98.98       87.24    92.74
DAY_NUMBER   92.00       82.03    86.73
Overall      93.24       85.53    89.22

In the scoring formulas, ref is a context pattern, tar is a raw utterance, n is the number of words in ref, m is the number of words in tar, t is the number of aligned entity labels, and e is the number of words extracted as entity candidates.
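The F-measure column is the standard harmonic mean of precision and recall; the per-category rows can be sanity-checked directly from the table values:

```python
# Check that F-measure = harmonic mean of precision and recall
# for the corpus labeling results.

def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(91.30, 86.83), 2))  # CITY_NAME  -> 89.01
print(round(f_measure(98.98, 87.24), 2))  # MONTH      -> 92.74
print(round(f_measure(92.00, 82.03), 2))  # DAY_NUMBER -> 86.73
```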