combine_lang_model does not print correct usage help #1375

The program's own usage instructions are defined in https://github.com/tesseract-ocr/tesseract/blob/master/training/combine_lang_model.cpp#L43-58:

```
  // Check validity of input flags.
  if (FLAGS_input_unicharset.empty() || FLAGS_script_dir.empty() ||
      FLAGS_output_dir.empty() || FLAGS_lang.empty()) {
    tprintf("Usage: %s --input_unicharset filename --script_dir dirname\n",
            argv[0]);
    tprintf("  --output_dir rootdir --lang lang [--lang_is_rtl]\n");
    tprintf("  [--words file --puncs file --numbers file]\n");
    tprintf("Sets properties on the input unicharset file, and writes:\n");
    tprintf("rootdir/lang/lang.charset_size=ddd.txt\n");
    tprintf("rootdir/lang/lang.traineddata\n");
    tprintf("rootdir/lang/lang.unicharset\n");
    tprintf("If the 3 word lists are provided, the dawgs are also added to");
    tprintf(" the traineddata file.\n");
    tprintf("The output unicharset and charset_size files are just for human");
    tprintf(" readability.\n");
```

However, the output actually displayed is:

```
USAGE: combine_lang_model
  --lang_is_rtl  True if lang being processed is written right-to-left  (type:bool default:false)
  --pass_through_recoder  If true, the recoder is a simple pass-through of the unicharset. Otherwise, potentially a compression of it  (type:bool default:false)
  --input_unicharset  Unicharset to complete and use in encoding  (type:string default:)
  --script_dir  Directory name for input script unicharsets  (type:string default:)
  --words  File listing words to use for the system dictionary  (type:string default:)
  --puncs  File listing punctuation patterns  (type:string default:)
  --numbers  File listing number patterns  (type:string default:)
  --output_dir  Root directory for output files  (type:string default:)
  --version_str  Version string to add to traineddata file  (type:string default:)
  --lang  Name of language being processed  (type:string default:)
```

So it looks like the program calls a common training argument parser, which prints its generic flag list and exits before the program's own usage check is ever reached:

https://github.com/tesseract-ocr/tesseract/blob/master/training/combine_lang_model.cpp#L40

```
int main(int argc, char** argv) {
  tesseract::ParseCommandLineFlags(argv[0], &argc, &argv, true);
```
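A minimal sketch of the ordering problem and one possible fix follows. It assumes, as the `USAGE: combine_lang_model` line suggests, that the output above came from running the tool with no arguments and that the shared parser prints its flag list and exits in that case. All function names here (`CombineLangModelUsage`, `StandInSharedParser`, `RunCurrentOrder`, `RunWithFix`) are hypothetical; the stand-in does not call or reproduce Tesseract's real `ParseCommandLineFlags`, and the fix is an illustration, not the project's actual patch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Program-specific usage text, abbreviated from the tprintf calls quoted above.
std::string CombineLangModelUsage(const std::string& prog) {
  return "Usage: " + prog +
         " --input_unicharset filename --script_dir dirname\n"
         "  --output_dir rootdir --lang lang [--lang_is_rtl]\n"
         "  [--words file --puncs file --numbers file]\n";
}

// Hypothetical stand-in for the shared training parser. It models only the
// behaviour seen in this report: on a bare invocation (argc == 1) it writes
// "USAGE: <usage>" plus a generic flag list and tells the caller to stop.
// Returns true when the caller should stop; the real parser is assumed to exit().
bool StandInSharedParser(int argc, const std::string& usage, std::ostream& out) {
  if (argc == 1) {
    out << "USAGE: " << usage << "\n";
    out << "  --lang_is_rtl  ...\n";  // generic flag list, elided
    return true;
  }
  return false;
}

// Current ordering: the shared parser runs first, so on a bare invocation the
// program-specific usage text is never printed.
int RunCurrentOrder(int argc, const std::string& prog, std::ostream& out) {
  if (StandInSharedParser(argc, prog, out)) return 0;
  // Models the missing-flags check: print the program-specific usage.
  out << CombineLangModelUsage(prog);
  return 1;
}

// Possible fix (hypothetical): handle the bare invocation before the shared
// parser runs, so the user sees the program-specific usage text.
int RunWithFix(int argc, const std::string& prog, std::ostream& out) {
  if (argc == 1) {
    out << CombineLangModelUsage(prog);
    return 1;
  }
  if (StandInSharedParser(argc, prog, out)) return 0;
  // ... normal flag validation and processing would follow here ...
  return 0;
}
```

With `argc == 1`, `RunCurrentOrder` emits only the generic `USAGE:` header, while `RunWithFix` emits the `--input_unicharset filename --script_dir dirname` usage text instead. Another option would be to pass the full program-specific usage string as the first argument of the shared parser instead of `argv[0]`, so that it is printed after `USAGE:` alongside the flag list.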

Related: https://github.com/tesseract-ocr/tesseract/issues/1297
