pocketsphinx_batch - Man Page
Run speech recognition in batch mode
Synopsis
pocketsphinx_batch -ctl ctlfile -cepdir cepdir -cepext .mfc [ options ]...
Description
Run speech recognition over a list of utterances in batch mode. A list of arguments follows:
- -adchdr
Size of audio file header in bytes (headers are ignored)
- -adcin
Input is raw audio data
- -agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
- -agcthresh
Initial threshold for automatic gain control
- -allphone
Perform phoneme decoding with phonetic lm
- -allphone_ci
Perform phoneme decoding with phonetic lm and context-independent units only
- -alpha
Preemphasis parameter
- -argfile
Argument file giving extra arguments.
- -ascale
Inverse of acoustic model scale for confidence score calculation
- -aw
Inverse weight applied to acoustic scores.
- -backtrace
Print results and backtraces to log file.
- -beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
- -bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
- -bestpathlw
Language model probability weight for bestpath search
- -build_outdirs
Create missing subdirectories in output directory
- -cepdir
Input files directory (prefixed to filespecs in control file)
- -cepext
Input files extension (suffixed to filespecs in control file)
- -ceplen
Number of components in the input feature vector
- -cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
- -cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
- -compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
- -ctl
Control file listing utterances to be processed
- -ctlcount
No. of utterances to be processed (after skipping -ctloffset entries)
- -ctlincr
Do every Nth line in the control file
- -ctloffset
No. of utterances at the beginning of -ctl file to be skipped
- -ctm
Recognition output in CTM file format (may require post-sorting)
- -debug
Verbosity level for debugging messages
- -dict
Main pronunciation dictionary (lexicon) input file
- -dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
- -dither
Add 1/2-bit noise
- -doublebw
Use double bandwidth filters (same center freq)
- -ds
Frame GMM computation downsampling ratio
- -fdict
Noise word pronunciation dictionary input file
- -feat
Feature stream type, depends on the acoustic model
- -featparams
File containing feature extraction parameters.
- -fillprob
Filler word transition probability
- -frate
Frame rate
- -fsg
Sphinx format finite state grammar file
- -fsgctl
Control file listing FSG file to use for each utterance
- -fsgdir
Base directory for FSG files
- -fsgext
File extension for FSG files (including leading dot)
- -fsgusealtpron
Add alternate pronunciations to FSG
- -fsgusefiller
Insert filler words at each state.
- -fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
- -fwdflatbeam
Beam width applied to every frame in second-pass flat search
- -fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
- -fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
- -fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
- -fwdflatwbeam
Beam width applied to word exits in second-pass flat search
- -fwdtree
Run forward lexicon-tree search (1st pass)
- -hmm
Directory containing acoustic model files.
- -hyp
Recognition output file name
- -hypseg
Recognition output with segmentation file name
- -input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
- -jsgf
JSGF grammar file
- -keyphrase
Keyphrase to spot
- -kws
File with keyphrases to spot, one per line
- -kws_delay
Delay to wait for best detection score
- -kws_plp
Phone loop probability for keyword spotting
- -kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
- -latsize
Initial backpointer table size
- -lda
File containing transformation matrix to be applied to features (single-stream features only)
- -ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
- -lifter
Length of sin-curve for liftering, or 0 for no liftering.
- -lm
Word trigram language model input file
- -lmctl
Specify a set of language models
- -lmname
Which language model in -lmctl to use by default
- -lmnamectl
Control file listing LM name to use for each utterance
- -logbase
Base in which all log-likelihoods are calculated
- -logfn
File to write log messages in
- -logspec
Write out logspectral files instead of cepstra
- -lowerf
Lower edge of filters
- -lpbeam
Beam width applied to last phone in words
- -lponlybeam
Beam width applied to last phone in single-phone words
- -lw
Language model probability weight
- -maxhmmpf
Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)
- -maxwpf
Maximum number of distinct word exits at each frame (or -1 for no pruning)
- -mdef
Model definition input file
- -mean
Mixture gaussian means input file
- -mfclogdir
Directory to log feature files to
- -min_endfr
Nodes ignored in lattice construction if they persist for fewer than N frames
- -mixw
Senone mixture weights input file (uncompressed)
- -mixwfloor
Senone mixture weights floor (applied to data from -mixw file)
- -mllr
MLLR transformation to apply to means and variances
- -mllrctl
Control file listing MLLR transforms to use for each utterance
- -mllrdir
Base directory for MLLR transforms
- -mllrext
File extension for MLLR transforms (including leading dot)
- -mmap
Use memory-mapped I/O (if possible) for model files
- -nbest
Number of N-best hypotheses to write to -nbestdir (0 for no N-best)
- -nbestdir
Directory for writing N-best hypothesis lists
- -nbestext
Extension for N-best hypothesis list files
- -ncep
Number of cep coefficients
- -nfft
Size of FFT
- -nfilt
Number of filter banks
- -nwpen
New word transition penalty
- -outlatbeam
Minimum posterior probability for output lattice nodes
- -outlatdir
Directory for dumping word lattices
- -outlatext
Filename extension for dumping word lattices
- -outlatfmt
Format for dumping word lattices (s3 or htk)
- -pbeam
Beam width applied to phone transitions
- -pip
Phone insertion penalty
- -pl_beam
Beam width applied to phone loop search for lookahead
- -pl_pbeam
Beam width applied to phone loop transitions for lookahead
- -pl_pip
Phone insertion penalty for phone loop
- -pl_weight
Weight for phoneme lookahead penalties
- -pl_window
Phoneme lookahead window size, in frames
- -rawlogdir
Directory to log raw audio files to
- -remove_dc
Remove DC offset from each frame
- -remove_noise
Remove noise with spectral subtraction in mel-energies
- -round_filters
Round mel filter frequencies to DFT points
- -samprate
Sampling rate
- -seed
Seed for random number generator; if less than zero, pick our own
- -sendump
Senone dump (compressed mixture weights) input file
- -senin
Input is senone score dump files
- -senlogdir
Directory to log senone score files to
- -senmgau
Senone to codebook mapping input file (usually not needed)
- -silprob
Silence word transition probability
- -smoothspec
Write out cepstral-smoothed logspectral files
- -svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
- -tmat
HMM state transition matrix input file
- -tmatfloor
HMM state transition probability floor (applied to -tmat file)
- -topn
Maximum number of top Gaussians to use in scoring.
- -topn_beam
Beam width used to determine top-N Gaussians (or a list, per-feature)
- -toprule
Start rule for JSGF (first public rule is default)
- -transform
Which type of transform to use to calculate cepstra (legacy, dct, or htk)
- -unit_area
Normalize mel filters to unit area
- -upperf
Upper edge of filters
- -uw
Unigram weight
- -var
Mixture gaussian variances input file
- -varfloor
Mixture gaussian variance floor (applied to data from -var file)
- -varnorm
Variance normalize each utterance (only if CMN == current)
- -verbose
Show input filenames
- -warp_params
Parameter string defining the warping function
- -warp_type
Warping function type (or shape)
- -wbeam
Beam width applied to word exits
- -wip
Word insertion penalty
- -wlen
Hamming window length
To do batch mode recognition, you will need to specify a control file using -ctl. This is a simple text file containing one entry per line. Each entry is the name of an input file relative to the -cepdir directory, without the filename extension (which is given in the -cepext argument).
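As an illustration of the control file layout described above, the following Python sketch writes one entry per input file and assembles the command line from the Synopsis. The helper names write_control_file and batch_command are hypothetical, and the sketch assumes that the extension is non-empty and that searching subdirectories of the input directory is wanted. It only builds the argument list and does not run the decoder.

```python
from pathlib import Path


def write_control_file(cepdir, cepext, ctl_path):
    """Write one control-file entry per input file found under cepdir.

    Each entry is the file's path relative to cepdir with cepext removed.
    Returns the list of entries written (sorted for reproducibility).
    """
    if not cepext:
        raise ValueError("cepext must be a non-empty suffix such as '.mfc'")
    root = Path(cepdir)
    entries = sorted(
        # Strip the extension; it is supplied separately through -cepext.
        path.relative_to(root).as_posix()[: -len(cepext)]
        for path in root.rglob("*" + cepext)  # assumption: recurse into subdirs
        if path.is_file()
    )
    Path(ctl_path).write_text("".join(entry + "\n" for entry in entries))
    return entries


def batch_command(ctl_path, cepdir, cepext):
    """Build the argument list from the Synopsis; all other options are omitted."""
    return [
        "pocketsphinx_batch",
        "-ctl", str(ctl_path),
        "-cepdir", str(cepdir),
        "-cepext", cepext,
    ]
```

The resulting list can be passed to a process launcher once the model options required by your setup (for example -hmm and -dict) have been appended.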
If you are using acoustic feature files as input (see sphinx_fe(1) for information on how to generate these), you can also specify a subpart of a file, using the following format:
FILENAME START-FRAME END-FRAME UTTERANCE-ID
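A minimal Python sketch of writing and reading entries in the two forms described here (a bare file name, or a file name with start frame, end frame, and utterance ID). The function names format_entry and parse_entry are hypothetical. The sketch assumes whitespace-separated fields and file names without spaces; it does not say whether the end frame is inclusive, since this page does not specify that, and the decoder itself may accept forms not covered here.

```python
def format_entry(filename, start_frame=None, end_frame=None, utt_id=None):
    """Return one control-file line, with or without a frame range."""
    if start_frame is None:
        return filename
    return f"{filename} {start_frame} {end_frame} {utt_id}"


def parse_entry(line):
    """Split a control-file line into (filename, start, end, utt_id).

    The last three items are None for a bare file name.
    Assumption: only the one-field and four-field forms are handled.
    """
    fields = line.split()
    if len(fields) == 1:
        return (fields[0], None, None, None)
    if len(fields) == 4:
        return (fields[0], int(fields[1]), int(fields[2]), fields[3])
    raise ValueError(f"unsupported control-file entry: {line!r}")
```

For example, format_entry("spk1/utt1", 0, 99, "utt1_a") yields the line "spk1/utt1 0 99 utt1_a", where the file name, frame numbers, and utterance ID are made-up values.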
Author
Written by numerous people at CMU from 1994 onwards. This manual page was written by David Huggins-Daines <dhdaines@gmail.com>.
Copyright
Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE included with this package for more information.
See Also
pocketsphinx_continuous(1), sphinx_fe(1).