Package tesseract-tools
Training tools for Tesseract
https://github.com/tesseract-ocr/tesseract
A commercial-quality OCR engine originally developed at HP between 1985 and
1995. In 1995, this engine was among the top three engines evaluated by UNLV.
It was open-sourced by HP and UNLV in 2005.
The tesseract-tools package contains the tools used to train Tesseract.
Version: 5.5.1
See also: tesseract.
| General Commands | |
| ambiguous_words | generate sets of words Tesseract is likely to find ambiguous |
| classifier_tester | test a character classifier (for the legacy Tesseract engine) |
| cntraining | character normalization training for Tesseract |
| combine_lang_model | generate starter traineddata |
| combine_tessdata | combine/extract/overwrite/list/compact Tesseract data |
| dawg2wordlist | convert a Tesseract DAWG to a wordlist |
| lstmeval | evaluation program for LSTM-based networks |
| lstmtraining | training program for LSTM-based networks |
| merge_unicharsets | simple tool to merge two or more unicharsets |
| mftraining | feature training for Tesseract |
| set_unicharset_properties | set properties about the unichars |
| shapeclustering | shape clustering training for Tesseract |
| text2image | generate OCR training pages |
| unicharset_extractor | read box or plain text files to extract the unicharset |
| wordlist2dawg | convert a wordlist to a DAWG for Tesseract |
| File Formats | |
| unicharambigs | Tesseract unicharset ambiguities |
| unicharset | character properties file used by tesseract(1) |
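As a rough illustration of how a few of the commands above are typically invoked, the following Python sketch builds argument lists for wordlist2dawg, dawg2wordlist, and combine_tessdata. The argument order follows the synopsis in each tool's man page as best understood here; the helper names and all file names are hypothetical, and nothing is executed unless the tool is actually found on PATH.

```python
import shutil
import subprocess


def wordlist2dawg_cmd(wordlist, dawg, unicharset):
    """Build argv for: wordlist2dawg WORDLIST DAWG UNICHARSET."""
    return ["wordlist2dawg", str(wordlist), str(dawg), str(unicharset)]


def dawg2wordlist_cmd(unicharset, dawg, wordlist):
    """Build argv for: dawg2wordlist UNICHARSET DAWG WORDLIST."""
    return ["dawg2wordlist", str(unicharset), str(dawg), str(wordlist)]


def unpack_cmd(traineddata, prefix):
    """Build argv for: combine_tessdata -u TRAINEDDATA PATH_PREFIX."""
    return ["combine_tessdata", "-u", str(traineddata), str(prefix)]


def run_if_installed(cmd):
    """Run cmd only if its executable is on PATH; otherwise return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        unpack_cmd("eng.traineddata", "work/eng."),
        wordlist2dawg_cmd("words.txt", "work/eng.word-dawg", "work/eng.unicharset"),
        dawg2wordlist_cmd("work/eng.unicharset", "work/eng.word-dawg", "roundtrip.txt"),
    ]
    for step in steps:
        print(" ".join(step))
```

The script only prints the command lines; passing one of the lists to run_if_installed would execute it when the corresponding tool is installed. Note that dawg2wordlist takes the unicharset first, whereas wordlist2dawg takes it last.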