Description
The usage instructions are given in https://github.com/tesseract-ocr/tesseract/blob/master/training/combine_lang_model.cpp#L43-58:
  // Check validity of input flags.
  if (FLAGS_input_unicharset.empty() || FLAGS_script_dir.empty() ||
      FLAGS_output_dir.empty() || FLAGS_lang.empty()) {
    tprintf("Usage: %s --input_unicharset filename --script_dir dirname\n",
            argv[0]);
    tprintf("  --output_dir rootdir --lang lang [--lang_is_rtl]\n");
    tprintf("  [--words file --puncs file --numbers file]\n");
    tprintf("Sets properties on the input unicharset file, and writes:\n");
    tprintf("rootdir/lang/lang.charset_size=ddd.txt\n");
    tprintf("rootdir/lang/lang.traineddata\n");
    tprintf("rootdir/lang/lang.unicharset\n");
    tprintf("If the 3 word lists are provided, the dawgs are also added to");
    tprintf(" the traineddata file.\n");
    tprintf("The output unicharset and charset_size files are just for human");
    tprintf(" readability.\n");
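For readers unfamiliar with the tool, the check and the output layout described in that usage text can be sketched as follows. This is a minimal illustration, not Tesseract code: `CombineFlags`, `MissingRequiredFlags` and `ExpectedOutputs` are hypothetical names, and the sketch assumes the `ddd` in `lang.charset_size=ddd.txt` stands for the decimal charset size.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the four flags that the validity check requires.
struct CombineFlags {
  std::string input_unicharset;
  std::string script_dir;
  std::string output_dir;
  std::string lang;
};

// Returns the names of required flags that are empty, in the same order as
// the condition in combine_lang_model.cpp. An empty result means the check
// passes.
std::vector<std::string> MissingRequiredFlags(const CombineFlags& f) {
  std::vector<std::string> missing;
  if (f.input_unicharset.empty()) missing.push_back("--input_unicharset");
  if (f.script_dir.empty()) missing.push_back("--script_dir");
  if (f.output_dir.empty()) missing.push_back("--output_dir");
  if (f.lang.empty()) missing.push_back("--lang");
  return missing;
}

// Lists the files the usage text says are written as rootdir/lang/lang.*,
// where rootdir is --output_dir. Assumes "ddd" is the decimal charset size.
std::vector<std::string> ExpectedOutputs(const CombineFlags& f,
                                         int charset_size) {
  const std::string base = f.output_dir + "/" + f.lang + "/" + f.lang;
  return {base + ".charset_size=" + std::to_string(charset_size) + ".txt",
          base + ".traineddata", base + ".unicharset"};
}
```

For example, with `output_dir` set to `out` and `lang` set to `eng`, the second path returned is `out/eng/eng.traineddata`.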
However, the usage text actually displayed by the program is:
USAGE: combine_lang_model
  --lang_is_rtl  True if lang being processed is written right-to-left  (type:bool default:false)
  --pass_through_recoder  If true, the recoder is a simple pass-through of the unicharset. Otherwise, potentially a compression of it  (type:bool default:false)
  --input_unicharset  Unicharset to complete and use in encoding  (type:string default:)
  --script_dir  Directory name for input script unicharsets  (type:string default:)
  --words  File listing words to use for the system dictionary  (type:string default:)
  --puncs  File listing punctuation patterns  (type:string default:)
  --numbers  File listing number patterns  (type:string default:)
  --output_dir  Root directory for output files  (type:string default:)
  --version_str  Version string to add to traineddata file  (type:string default:)
  --lang  Name of language being processed  (type:string default:)
So it looks like the program calls a common training argument parser that prints its own generic flag listing and exits, so the tool-specific usage message above is never shown:
int main(int argc, char** argv) {
  tesseract::ParseCommandLineFlags(argv[0], &argc, &argv, true);
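The control flow described above can be modelled with a small sketch. It assumes, based on the `USAGE: combine_lang_model` line, that the parser prints its first argument after `USAGE:` together with the generic flag list and then exits when no arguments are given. `GenericParseStops` and `RunMain` are hypothetical names, and the exit is modelled by returning early so the behaviour can be checked.

```cpp
#include <string>

// Hypothetical model of the shared training-tool parser (not Tesseract code).
// With no arguments (argc == 1) it produces "USAGE: <usage>" plus the generic
// flag list and stops the program; the stop is modelled by returning true.
bool GenericParseStops(const std::string& usage, int argc, std::string* out) {
  if (argc == 1) {
    *out = "USAGE: " + usage + "\n<generic flag list>\n";
    return true;
  }
  return false;
}

// Models main() in the order quoted above: the parser runs first, then the
// tool's own validity check. Returns the text the user would see.
std::string RunMain(const std::string& usage, int argc, bool required_set) {
  std::string out;
  if (GenericParseStops(usage, argc, &out)) {
    return out;  // The tool-specific usage message is never reached.
  }
  if (!required_set) {
    return "tool-specific usage";
  }
  return "";  // Flags valid: processing would continue.
}
```

In this model, `RunMain("combine_lang_model", 1, false)` yields only the generic listing, matching the reported output, while the tool-specific message appears only when some arguments are passed but a required flag is missing. If the assumption about the first argument holds, one possible fix would be to pass the detailed usage text to the parser instead of `argv[0]`.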
Related: #1297