23

HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

Embed Size (px)

Citation preview

Page 1: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical
Page 2: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

The HTK language modeling tools are a group of programs designed for constructing and testing statistical n-gram language models

Page 3: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical
Page 4: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Page 5: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Page 6: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Page 7: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Page 8: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Page 9: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical
Page 10: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LNewMap [options] name mapfn

-e esc Change the contents of the EscMode header to esc. Default is RAW.

-f fld Add the field fld to the Fields header.

Page 11: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Example:

LNewMap -f WFC Holmes empty.wmap

Name = HolmesSeqNo = 0Entries = 0EscMode = RAWFields = ID,WFC\Words\

Page 12: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LGPrep [options] wordmap [textfile ...]

-a n Allow upto n new words in input texts (default 100000).

-b n Set the internal gram buffer size to n (default 2000000). LGPrep stores incoming n-grams in this buffer. When the buffer is full, the contents are sorted and written to an output gram file. Thus, the buffer size determines the amount of process memory that LGPrep will use and the size of the individual output gram files.

Page 13: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LGPrep [options] wordmap [textfile ...]

-d Directory in which to store the output gram files (default current directory).

-i n Set the index of the first gram file output to be n (default 0).

-n n Set the output n-gram size to n (default 3).

-r s Set the root name of the output gram files to s (default “gram”).

Page 14: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LGPrep [options] wordmap [textfile ...]

-s s Write the string s into the source field of the output gram files. This string should be a comment describing the text source.

-z Suppress gram file output. This option allows LGPrep to be used just to compute a word frequency map. It is also normally applied when applying edit rules to the input.

Page 15: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Example:

LGPrep -T 1 -a 100000 -b 2000000 -d holmes.0 –n 4-s "Sherlock Holmes" empty.wmap D:\train\abbey_grange.txt, D:\train\beryl_coronet.txt,...

Page 16: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

WMAP file

Name = HolmesSeqNo = 1Entries = 18080EscMode = RAWFields = ID,WFC\Words\<s> 65536 33669IT 65537 8106WAS 65538 7595...

Page 17: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LGCopy [options] wordmap [mult] gramfiles

-b n Set the internal gram buffer size to n (default 2000000). LGPrep stores incoming n-grams in this buffer. When the buffer is full, the contents are sorted and written to an output gram file. Thus, the buffer size determines the amount of process memory that LGPrep will use and the size of the individual output gram files.

-d Directory in which to store the output gram files (default current directory).

Page 18: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LGCopy [options] wordmap [mult] gramfiles

-o n Output class mappings only. Normally all input n-grams are copied to the output, however, if a class map is specified, this options forces the tool to output only n-grams containing at least one class symbol.

Page 19: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Example:

LGCopy -T 1 -b 2000000 -d D:\holmes.1 D:\ holmes.0\wmap D:\ holmes.0\gram.1 D:\ holmes.0\gram.2.....

Page 20: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LBuild [options] wordmap outfile [mult] gramfile .. -c n c Set cutoff for n-gram to c.

-n n Set final model order to n.

Page 21: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Example:

LBuild -T 1 -c 2 1 -c 3 1 -n 3 D:\lm_5k\5k.wmapD:\lm_5k\tg2-1_1 D:\holmes.1\data.1 D:\holmes.1\data.2... D:\lm_5k\data.1 D:\lm_5k\data.12

Page 22: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

LPlex [options] langmodel labelFiles

-n n Perform a perplexity test using the n-gram component of the model. Multiple tests can be specified. By default the tool will use the maximum value of n available.

-t Text stream mode. If this option is set, the specified test files will be assumed to contain plain text.

Page 23: HTK Tool Kit. What is HTK tool kit HTK Tool Kit The HTK language modeling tools are a group of programs designed for constructing and testing statistical

HTK Tool Kit

Example:

Lplex -n 3 -t D:\lm_5k\tg1_1 D:\test\red-headed_league.txt