Package tesseract-tools

Training tools for tesseract

https://github.com/tesseract-ocr/tesseract

Tesseract is a commercial-quality OCR engine originally developed at HP
between 1985 and 1995. In 1995, it was among the top three engines evaluated
by UNLV. It was open-sourced by HP and UNLV in 2005.

The tesseract-tools package contains tools for training tesseract.

Version: 5.5.2

See also: tesseract.

General Commands
ambiguous_words	generate sets of words Tesseract is likely to find ambiguous
classifier_tester	test a character classifier (legacy Tesseract engine only)
cntraining	character normalization training for Tesseract
combine_lang_model	generate starter traineddata
combine_tessdata	combine/extract/overwrite/list/compact Tesseract data
dawg2wordlist	convert a Tesseract DAWG to a wordlist
lstmeval	evaluation program for LSTM-based networks
lstmtraining	training program for LSTM-based networks
merge_unicharsets	simple tool to merge two or more unicharsets
mftraining	feature training for Tesseract
set_unicharset_properties	set properties about the unichars
shapeclustering	shape clustering training for Tesseract
text2image	generate OCR training pages
unicharset_extractor	read box or plain text files to extract the unicharset
wordlist2dawg	convert a wordlist to a DAWG for Tesseract
File Formats
unicharambigs	Tesseract unicharset ambiguities
unicharset	character properties file used by tesseract(1)
