Package tesseract-tools

Training tools for tesseract

https://github.com/tesseract-ocr/tesseract

Tesseract is a commercial-quality OCR engine originally developed at HP
between 1985 and 1995. In 1995, it was among the top three engines evaluated
by UNLV. HP and UNLV open-sourced it in 2005.

The tesseract-tools package contains tools for training tesseract.

Version: 5.4.1

See also: tesseract.

General Commands

ambiguous_words            generate sets of words Tesseract is likely to find ambiguous
classifier_tester          test a character classifier (legacy Tesseract engine only)
cntraining                 character normalization training for Tesseract
combine_lang_model         generate starter traineddata
combine_tessdata           combine/extract/overwrite/list/compact Tesseract data
dawg2wordlist              convert a Tesseract DAWG to a wordlist
lstmeval                   evaluation program for LSTM-based networks
lstmtraining               training program for LSTM-based networks
merge_unicharsets          merge two or more unicharsets
mftraining                 feature training for Tesseract
set_unicharset_properties  set properties of the unichars
shapeclustering            shape clustering training for Tesseract
text2image                 generate OCR training pages
unicharset_extractor       read box or plain text files to extract the unicharset
wordlist2dawg              convert a wordlist to a DAWG for Tesseract
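
As an illustration of how two of these commands pair up, the sketch below
builds the argument lists for a wordlist-to-DAWG round trip. It assumes the
positional argument order given in the wordlist2dawg(1) and dawg2wordlist(1)
man pages; the helper names and the eng.* file names are hypothetical and
exist only for this example.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def wordlist2dawg_cmd(wordlist: Path, dawg: Path, unicharset: Path) -> list[str]:
    # Assumed order, per wordlist2dawg(1): WORDLIST DAWG UNICHARSET
    return ["wordlist2dawg", str(wordlist), str(dawg), str(unicharset)]


def dawg2wordlist_cmd(unicharset: Path, dawg: Path, wordlist: Path) -> list[str]:
    # Assumed order, per dawg2wordlist(1): UNICHARSET DAWG WORDLIST
    return ["dawg2wordlist", str(unicharset), str(dawg), str(wordlist)]


def run_if_installed(cmd: list[str]) -> bool:
    """Run cmd only if its executable is on PATH; True means it exited with 0."""
    if shutil.which(cmd[0]) is None:
        return False
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    unicharset = Path("eng.unicharset")
    dawg = Path("eng.word-dawg")
    steps = [
        wordlist2dawg_cmd(Path("eng.wordlist"), dawg, unicharset),
        dawg2wordlist_cmd(unicharset, dawg, Path("eng.roundtrip.wordlist")),
    ]
    for step in steps:
        print(" ".join(step), "->", "ok" if run_if_installed(step) else "not run or failed")
```

Note that the two tools take the unicharset in different positions (last for
wordlist2dawg, first for dawg2wordlist), which is why the sketch keeps a
separate builder for each.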

File Formats

unicharambigs              Tesseract unicharset ambiguities
unicharset                 character properties file used by tesseract(1)