combine_lang_model does not print correct usage help #1375

@Shreeshrii

Usage instructions are given in https://github.com/tesseract-ocr/tesseract/blob/master/training/combine_lang_model.cpp#L43-58

// Check validity of input flags.
 if (FLAGS_input_unicharset.empty() || FLAGS_script_dir.empty() ||
     FLAGS_output_dir.empty() || FLAGS_lang.empty()) {
   tprintf("Usage: %s --input_unicharset filename --script_dir dirname\n",
           argv[0]);
   tprintf("  --output_dir rootdir --lang lang [--lang_is_rtl]\n");
   tprintf("  [--words file --puncs file --numbers file]\n");
   tprintf("Sets properties on the input unicharset file, and writes:\n");
   tprintf("rootdir/lang/lang.charset_size=ddd.txt\n");
   tprintf("rootdir/lang/lang.traineddata\n");
   tprintf("rootdir/lang/lang.unicharset\n");
   tprintf("If the 3 word lists are provided, the dawgs are also added to");
   tprintf(" the traineddata file.\n");
   tprintf("The output unicharset and charset_size files are just for human");
   tprintf(" readability.\n");

However, the output actually displayed is:

USAGE: combine_lang_model
  --lang_is_rtl  True if lang being processed is written right-to-left  (type:bool default:false)
  --pass_through_recoder  If true, the recoder is a simple pass-through of the unicharset. Otherwise, potentially a compression of it  (type:bool default:false)
  --input_unicharset  Unicharset to complete and use in encoding  (type:string default:)
  --script_dir  Directory name for input script unicharsets  (type:string default:)
  --words  File listing words to use for the system dictionary  (type:string default:)
  --puncs  File listing punctuation patterns  (type:string default:)
  --numbers  File listing number patterns  (type:string default:)
  --output_dir  Root directory for output files  (type:string default:)
  --version_str  Version string to add to traineddata file  (type:string default:)
  --lang  Name of language being processed  (type:string default:)

So it looks like the program calls the common training argument parser, which prints its own generic flag list and exits before the tool-specific usage message is ever reached.

https://github.com/tesseract-ocr/tesseract/blob/master/training/combine_lang_model.cpp#L40

int main(int argc, char** argv) {
  tesseract::ParseCommandLineFlags(argv[0], &argc, &argv, true);
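
To illustrate the suspected control flow, here is a minimal, self-contained sketch. It is a toy model, not Tesseract code: `shared_parser_help`, `has_flag` and `simulate_tool` are hypothetical names, and the sketch assumes the shared parser prints its generic help and exits when no flags are supplied, which would be consistent with the output quoted above. Flag values are ignored; only flag presence is checked.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a shared flag parser. ASSUMPTION: when no flags
// are given it emits its own generic help and terminates the program.
// Returns true (and fills *help) when the real program would exit here.
bool shared_parser_help(const std::vector<std::string>& args, std::string* help) {
  assert(!args.empty());  // args[0] is the program name, as with argv
  if (args.size() == 1) {
    *help = "USAGE: " + args[0] + "\n  (generic flag list)\n";
    return true;
  }
  return false;
}

// Hypothetical helper: true if the flag name appears after the program name.
bool has_flag(const std::vector<std::string>& args, const std::string& name) {
  return std::find(args.begin() + 1, args.end(), name) != args.end();
}

// Toy model of main(): the shared parser runs first, so the tool-specific
// usage text is only reachable when at least one flag was supplied.
std::string simulate_tool(const std::vector<std::string>& args) {
  std::string help;
  if (shared_parser_help(args, &help)) {
    return help;  // the real program would exit at this point
  }
  if (!has_flag(args, "--input_unicharset") || !has_flag(args, "--script_dir") ||
      !has_flag(args, "--output_dir") || !has_flag(args, "--lang")) {
    return "Usage: " + args[0] + " --input_unicharset filename --script_dir dirname\n";
  }
  return "ok\n";
}
```

Under these assumptions, calling `simulate_tool` with only the program name yields the generic `USAGE:` text, while supplying an incomplete set of flags reaches the tool-specific `Usage:` message.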

Related: #1297
