add ctc beam search decoder #59
| The algorithm in the prefix beam search paper is quite confusing and may have problems in its details. Here is a modification, on which the code is based. | 
| inputs_t = [ops.convert_to_tensor(x) for x in inputs] | ||
| inputs_t = array_ops.stack(inputs_t) | ||
|  | ||
| # run CTC beam search decoder in tensorflow | 
Why is the unit test written with TensorFlow?
Only to compare the results against TensorFlow.
| Validate the implementation: to confirm correctness, the implementation is compared with the ctc_beam_search_decoder in TensorFlow under the same input probability matrix and beam size. An independent repo is provided to test the logic. Run the script  More validation can be done by setting different  | 
- Please make the interface of ctc_beam_search_decoder more general to allow any external custom scorer to be used.
- Please carefully clean up and check the code before committing.
| import random | ||
| import numpy as np | ||
|  | ||
| # vocab = blank + space + English characters | 
Please remove unnecessary comment lines.
done
| return ids_str | ||
|  | ||
|  | ||
| def language_model(ids_list, vocabulary): | 
I think it's a "toy" language model just for testing. Please replace it with the "real" one built in pull request #71.
done
| beam_size, | ||
| vocabulary, | ||
| max_time_steps=None, | ||
| lang_model=language_model, | 
lang_model --> external_scoring_function.
- Please use "language_model" instead of lang_model for clarity.
- Custom scoring functions other than a language model are also allowed. Please rename it to make this clear.
done
| vocabulary, | ||
| max_time_steps=None, | ||
| lang_model=language_model, | ||
| alpha=1.0, | 
If lang_model is renamed to external_scoring_function, these parameters should be moved to the creator of external_scoring_function.
done
| space_id=1, | ||
| num_results_per_sample=None): | ||
| ''' | ||
| Beam search decoder for CTC-trained network, adapted from Algorithm 1 | 
Does "adapted" mean there is a difference? Could you please explain what the difference is?
done
| vocab = ['-', '_', 'a'] | ||
|  | ||
|  | ||
| def ids_list2str(ids_list): | 
Remove lines 13-20. Please clean up the code before committing.
done
| num_results_per_sample=None): | ||
| """ | ||
| CTC-like sequence decoding from a sequence of likelihood probabilities. | ||
Since we now have more than one type of decoder, please add comments briefly explaining each one.
done
| import numpy as np | ||
| import tensorflow as tf | ||
| from tensorflow.python.framework import ops | ||
| from tensorflow.python.ops import array_ops | 
It is not appropriate to add a TensorFlow dependency. It would be better to paste in the ground-truth results and compare our results against them.
Done
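For illustration, a dependency-free test along the lines suggested above could look like the sketch below. `best_path_decode` is a hypothetical stand-in written for this example (it is not the function from this PR), and the expected string was worked out by hand for the toy input rather than taken from TensorFlow.

```python
import numpy as np


def best_path_decode(probs_seq, vocabulary, blank_id=0):
    """Greedy CTC decoding: argmax per step, merge repeats, drop blanks."""
    best_ids = np.argmax(np.asarray(probs_seq), axis=1)
    # Merge consecutive duplicates first, then remove the blank symbol.
    merged = [idx for i, idx in enumerate(best_ids)
              if i == 0 or idx != best_ids[i - 1]]
    return "".join(vocabulary[idx] for idx in merged if idx != blank_id)


# Toy vocabulary in the same layout as the test file: blank, space, 'a'.
vocab = ["-", "_", "a"]
probs_seq = [
    [0.1, 0.1, 0.8],  # 'a'
    [0.1, 0.1, 0.8],  # 'a' again, merged with the previous step
    [0.8, 0.1, 0.1],  # blank, separates the two 'a' runs
    [0.1, 0.1, 0.8],  # 'a'
    [0.1, 0.8, 0.1],  # '_'
]

# Ground truth pasted into the test instead of computed by another framework.
EXPECTED = "aa_"
assert best_path_decode(probs_seq, vocab) == EXPECTED
```

The same pattern would apply to the beam search decoder: store the reference output once, then compare against it without importing the reference framework.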
| @@ -0,0 +1,69 @@ | |||
| from __future__ import absolute_import | |||
Should we put it in a ./test folder? What is the best practice for a Python unit test file?
Removed the test code. Done
| ## This is a prototype of ctc beam search decoder | ||
|  | ||
| import copy | ||
| import random | 
Not used. Remove it.
done
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      |  | ||
|  | ||
| def ctc_decode(probs_seq, vocabulary, method): | ||
| class Scorer(object): | 
Maybe we should consider extensibility. KenLM is only one of the language model tools, and each tool has its own interface. We can define a unified base class and derive KenLMScore from it.
If more language models are involved, the Scorer will be redesigned. For now we use a single class to avoid redundancy.
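As a rough sketch of the base-class idea raised in this thread (not what the PR implements), the shared weighting logic could live in an abstract class while each backend supplies only its own probability lookup. `BaseScorer` and `UniformScorer` are hypothetical names; `UniformScorer` is a toy backend standing in for a KenLM-backed subclass, so the example runs without any language model installed.

```python
import abc


class BaseScorer(abc.ABC):
    """Hypothetical base class: shared weighting, backend-specific LM lookup."""

    def __init__(self, alpha, beta):
        self._alpha = alpha
        self._beta = beta

    @abc.abstractmethod
    def _language_model_prob(self, sentence):
        """Return the backend's probability for the sentence."""

    def _word_count(self, sentence):
        return len(sentence.strip().split(" "))

    def __call__(self, sentence):
        # Weighted product of the language model term and the word count term.
        return (self._language_model_prob(sentence) ** self._alpha
                * self._word_count(sentence) ** self._beta)


class UniformScorer(BaseScorer):
    """Toy backend: every word has probability 0.5 (illustration only)."""

    def _language_model_prob(self, sentence):
        return 0.5 ** self._word_count(sentence)


scorer = UniformScorer(alpha=1.0, beta=0.0)
print(scorer("a b"))
```

A KenLM-specific subclass would override only `_language_model_prob`, which is the trade-off the reply above defers until a second backend actually exists.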
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | self._beta = beta | ||
| self._language_model = kenlm.LanguageModel(model_path) | ||
|  | ||
| def language_model_score(self, sentence, bos=True, eos=False): | 
Special tokens such as the end token and the unknown token should be replaced with KenLM's internal format. The start token should be removed from the sentence.
The decoded prefix in the CTC decoder doesn't contain any special tokens, so the reprocessing is simplified.
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | return ctc_best_path_decode(probs_seq, vocabulary) | ||
| else: | ||
| raise ValueError("Decoding method [%s] is not supported." % method) | ||
| max_time_steps = len(probs_seq) | 
Consider renaming max_time_steps to something else (like time_step_num)? The current name is somewhat confusing.
Done
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | ## initialize | ||
| # the set containing selected prefixes | ||
| prefix_set_prev = {'-1': 1.0} | ||
| probs_b, probs_nb = {'-1': 1.0}, {'-1': 0.0} | 
Consider renaming probs_b and probs_nb to probs_b_prev and probs_nb_prev?
Done
| Used grid search to find the optimal parameters alpha=0.26, beta=0.1, decreasing WER to ~0.17. | 
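For readers unfamiliar with the tuning step, a grid search over the two weights can be sketched as follows. This is not the code in tune.py: `grid_search` and `toy_wer` are hypothetical, and `toy_wer` is an artificial objective standing in for "decode the tuning set with these weights and measure WER", so the numbers it produces say nothing about the real model.

```python
import itertools


def grid_search(evaluate_wer, alphas, betas):
    """Return (best_wer, alpha, beta) over the Cartesian product of the grids."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        wer = evaluate_wer(alpha, beta)
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best


def toy_wer(alpha, beta):
    # Artificial bowl-shaped objective with its minimum at (0.4, 0.2).
    return (alpha - 0.4) ** 2 + (beta - 0.2) ** 2


best_wer, best_alpha, best_beta = grid_search(
    toy_wer, alphas=[0.1, 0.2, 0.3, 0.4, 0.5], betas=[0.0, 0.1, 0.2])
print(best_alpha, best_beta)
```

In practice each call to the objective is a full decoding pass, so the grid resolution is a trade-off between tuning cost and how finely the optimum is located.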
5c4751e to 3d292d0
    | Passed CI. With a rebuilt more powerful language model, the WER has been decreased to 13%. #115 | 
Great work!
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | cutoff_prob=1.0, | ||
| ext_scoring_func=None, | ||
| nproc=False): | ||
| '''Beam search decoder for CTC-trained network, using beam search with width | 
Use """ instead of ''' for consistency.
Please also check other places for this.
Done
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | import multiprocessing | ||
|  | ||
|  | ||
| def ctc_best_path_decode(probs_seq, vocabulary): | 
ctc_best_path_decode --> ctc_best_path_decoder. Please also change "decoding" to "decoder" in the function comments.
Done
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | ext_scoring_func=None, | ||
| nproc=False): | ||
| '''Beam search decoder for CTC-trained network, using beam search with width | ||
| beam_size to find many paths to one label, return beam_size labels in | 
", using beam search with width find many paths to one label, return beam_size labels in the descending order" --> ". It utilizes beam search to approximately select top best decoding paths and returning results in the descending order"
The original is not a complete sentence; pay particular attention to punctuation.
Done
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | '''Beam search decoder for CTC-trained network, using beam search with width | ||
| beam_size to find many paths to one label, return beam_size labels in | ||
| the descending order of probabilities. The implementation is based on Prefix | ||
| Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is | 
Beam Search( --> Beam Search (
Done
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | beam_size to find many paths to one label, return beam_size labels in | ||
| the descending order of probabilities. The implementation is based on Prefix | ||
| Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is | ||
| redesigned. | 
What was redesigned, and why? Could you please add a detailed explanation?
Done
        
          
deep_speech_2/scorer.py (Outdated)
          
        
      | return np.power(10, log_cond_prob) | ||
|  | ||
| # word insertion term | ||
| def word_count(self, sentence): | 
Do not expose word_count.
Done
        
          
deep_speech_2/scorer.py (Outdated)
          
        
      | self._language_model = kenlm.LanguageModel(model_path) | ||
|  | ||
| # n-gram language model scoring | ||
| def language_model_score(self, sentence): | 
no need to expose this score
Done
| return len(words) | ||
|  | ||
| # execute evaluation | ||
| def __call__(self, sentence, log=False): | 
rename to get_score
Preserved: with __call__, the scorer can be invoked as scorer_name(prefix), which keeps it compatible with a plain function func_name(prefix).
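The compatibility argument can be illustrated with a small sketch. All names here (`ToyScorer`, `plain_score`, `rescore`) are hypothetical and only mimic the shape of the decoder's `ext_scoring_func` hook; the point is that the caller never needs to know whether it received an object or a function.

```python
class ToyScorer(object):
    """Hypothetical scorer: a callable object that carries its own parameter."""

    def __init__(self, beta):
        self._beta = beta

    def _word_count(self, sentence):
        return len(sentence.strip().split(" "))

    def __call__(self, sentence):
        return self._word_count(sentence) ** self._beta


def plain_score(sentence):
    """A plain function with the same call shape as the scorer object."""
    return 1.0


def rescore(prefix, prob, ext_scoring_func=None):
    # The decoder side only assumes "something callable on a prefix".
    if ext_scoring_func is None:
        return prob
    return prob * ext_scoring_func(prefix)


print(rescore("a b", 0.5, ToyScorer(beta=1.0)))
print(rescore("a b", 0.5, plain_score))
```

Renaming the method to `get_score` would force callers to special-case scorer objects, which is the cost the reply is avoiding.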
| :param alpha: Parameter associated with language model. | ||
| :type alpha: float | ||
| :param beta: Parameter associated with word count. | ||
| :type beta: float | 
Explain when word count is not used? e.g. "If beta = xxxx ...."
Done
        
          
deep_speech_2/tune.py (Outdated)
          
        
      | from __future__ import division | ||
| from __future__ import print_function | ||
|  | ||
| import paddle.v2 as paddle | 
- Reorder the imports.
- Please modify all below according to the suggestions in infer.py and evaluate.py.
- Add descriptions to README.md for usage of tune.py and evaluate.py.
Done
Refined. Please review again.
| blank_id=0, | ||
| cutoff_prob=1.0, | ||
| ext_scoring_func=None, | ||
| nproc=False): | 
Kept temporarily until the problem of how to pass ext_scoring_func to multiple processes is fixed.
| from model import deep_speech2 | ||
| from decoder import * | ||
| from scorer import Scorer | ||
| from error_rate import wer | 
Done
        
          
deep_speech_2/evaluate.py (Outdated)
          
        
      | help="Manifest path for normalizer. (default: %(default)s)") | ||
| parser.add_argument( | ||
| "--decode_manifest_path", | ||
| default='data/manifest.libri.test-clean', | 
Done
| :param alpha: Parameter associated with language model. | ||
| :type alpha: float | ||
| :param beta: Parameter associated with word count. | ||
| :type beta: float | 
Done
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | import multiprocessing | ||
|  | ||
|  | ||
| def ctc_best_path_decode(probs_seq, vocabulary): | 
Done
        
          
deep_speech_2/scorer.py (Outdated)
          
        
      |  | ||
|  | ||
| class Scorer(object): | ||
| """External defined scorer to evaluate a sentence in beam search | 
Done
        
          
deep_speech_2/scorer.py (Outdated)
          
        
      | self._language_model = kenlm.LanguageModel(model_path) | ||
|  | ||
| # n-gram language model scoring | ||
| def language_model_score(self, sentence): | 
Done
        
          
deep_speech_2/scorer.py (Outdated)
          
        
      | return np.power(10, log_cond_prob) | ||
|  | ||
| # word insertion term | ||
| def word_count(self, sentence): | 
Done
| return len(words) | ||
|  | ||
| # execute evaluation | ||
| def __call__(self, sentence, log=False): | 
Preserved: with __call__, the scorer can be invoked as scorer_name(prefix), which keeps it compatible with a plain function func_name(prefix).
        
          
deep_speech_2/tune.py (Outdated)
          
        
      | from __future__ import division | ||
| from __future__ import print_function | ||
|  | ||
| import paddle.v2 as paddle | 
Done
Almost LGTM.
| of probabilities, the assignment operation is changed to accumulation for | ||
| one prefix may come from different paths; 2) the if condition "if l^+ not | ||
| in A_prev then" after probabilities' computation is deprecated for it is | ||
| hard to understand and seems unnecessary. | 
Can we make sure that these modifications are correct?
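One way to gain confidence in the accumulation change is to compare a prefix beam search against exhaustive enumeration on a tiny input. The sketch below is not the PR's decoder: it has no language model and no cutoff pruning, the function names are made up for this example, and it follows the commonly used blank/non-blank bookkeeping for CTC prefix search. With a beam wide enough that nothing is pruned, each prefix probability should equal the sum over all alignment paths that collapse to that prefix.

```python
import collections
import itertools


def prefix_beam_search(probs_seq, beam_size, blank_id=0):
    """Return [(prob, prefix), ...] sorted by descending probability."""
    # Each prefix tracks (prob of paths ending in blank, ending in non-blank).
    beam = {(): (1.0, 0.0)}
    for probs in probs_seq:
        next_beam = collections.defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beam.items():
            for c, p in enumerate(probs):
                if c == blank_id:
                    next_beam[prefix][0] += (p_b + p_nb) * p
                elif prefix and c == prefix[-1]:
                    # Repeat without a blank in between collapses into prefix.
                    next_beam[prefix][1] += p_nb * p
                    # Repeat after a blank extends the prefix.
                    next_beam[prefix + (c,)][1] += p_b * p
                else:
                    next_beam[prefix + (c,)][1] += (p_b + p_nb) * p
        # "+=" above is the accumulation: one prefix is reached by many paths.
        ranked = sorted(next_beam.items(),
                        key=lambda kv: sum(kv[1]), reverse=True)
        beam = {k: tuple(v) for k, v in ranked[:beam_size]}
    return sorted(((sum(v), k) for k, v in beam.items()), reverse=True)


def brute_force(probs_seq, blank_id=0):
    """Sum path probabilities per collapsed labeling by full enumeration."""
    totals = collections.defaultdict(float)
    num_classes = len(probs_seq[0])
    for path in itertools.product(range(num_classes), repeat=len(probs_seq)):
        p = 1.0
        for t, c in enumerate(path):
            p *= probs_seq[t][c]
        collapsed = tuple(c for i, c in enumerate(path)
                          if c != blank_id and (i == 0 or c != path[i - 1]))
        totals[collapsed] += p
    return dict(totals)


probs_seq = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]
exact = brute_force(probs_seq)
for prob, prefix in prefix_beam_search(probs_seq, beam_size=100):
    assert abs(prob - exact.get(prefix, 0.0)) < 1e-12
```

Replacing `+=` with plain assignment in the sketch makes the final check fail, which is the behavior the first modification in the docstring is meant to avoid. This check covers only the unpruned case; it does not by itself validate the removal of the "if l^+ not in A_prev" branch under a narrow beam.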
| blank_id=0, | ||
| cutoff_prob=1.0, | ||
| ext_scoring_func=None, | ||
| nproc=False): | 
Could we fix it now ?
        
          
deep_speech_2/decoder.py (Outdated)
          
        
      | '\t': 1.0 | ||
| }, { | ||
| '\t': 0.0 | ||
| } | 
No need to use so many lines. Maybe you can revert it back with only two lines.
| vocabulary, | ||
| blank_id=0, | ||
| blank_id, | ||
| num_processes, | 
Can we use 'multiprocessing.cpu_count()' as the default value?
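A possible shape for that default, as a sketch only: `resolve_num_processes` is a hypothetical helper, not part of this PR. Using `None` as the sentinel defers the `multiprocessing.cpu_count()` call to run time instead of fixing the value when the function is defined.

```python
import multiprocessing


def resolve_num_processes(num_processes=None):
    """Hypothetical helper: fall back to the machine's CPU count."""
    if num_processes is None:
        return multiprocessing.cpu_count()
    if num_processes < 1:
        raise ValueError("num_processes must be a positive integer.")
    return num_processes


print(resolve_num_processes(4))
```

Writing `num_processes=multiprocessing.cpu_count()` directly in the signature also works; the sentinel form simply leaves room for validation in one place.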
        
          
deep_speech_2/evaluate.py (Outdated)
          
        
      | help="Manifest path for normalizer. (default: %(default)s)") | ||
| parser.add_argument( | ||
| "--decode_manifest_path", | ||
| default='data/manifest.libri.test-clean', | 
Still 'data/manifest.libri.test-clean' ?
| help="Manifest path for decoding. (default: %(default)s)") | ||
| parser.add_argument( | ||
| "--model_filepath", | ||
| default='checkpoints/params.latest.tar.gz', | 
use latest as default.
| :param alpha: Parameter associated with language model. | ||
| :param alpha: Parameter associated with language model. Don't use | ||
| language model when alpha = 0. | 
--> Language-model scorer is disabled when alpha = 0.
| :type alpha: float | ||
| :param beta: Parameter associated with word count. | ||
| :param beta: Parameter associated with word count. Don't use word | ||
| count when beta = 0. | 
Word-count scorer is disabled when beta = 0.
        
          
deep_speech_2/lm/lm_scorer.py (Outdated)
          
        
      | word_cnt = self._word_count(sentence) | ||
| if log == False: | ||
| score = np.power(lm, self._alpha) \ | ||
| * np.power(word_cnt, self._beta) | 
Is it possible to put L60 and L61 into a single line within 80 columns?
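The expression under review multiplies the language model term raised to alpha by the word count raised to beta. A standalone sketch of that formula is given below; `combined_score` is a hypothetical free function, and the log-domain branch is an assumption (the excerpt only shows the `log == False` case), written as the logarithm of the same product.

```python
import numpy as np


def combined_score(lm_prob, word_cnt, alpha, beta, log=False):
    """lm_prob**alpha * word_cnt**beta, or its natural log when log=True."""
    if not log:
        return np.power(lm_prob, alpha) * np.power(word_cnt, beta)
    # Assumed log-domain form: log of the product above.
    return alpha * np.log(lm_prob) + beta * np.log(word_cnt)


# With alpha = 0 the language model factor is 1, with beta = 0 the word
# count factor is 1, so either term can be switched off independently.
print(combined_score(0.01, 3, alpha=0.0, beta=0.0))
```

This also shows why a zero weight disables a term: any positive base raised to the power 0 is 1 in the product, and contributes 0 in the log-domain sum.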
| @@ -0,0 +1,3 @@ | |||
| echo "Downloading language model." | |||
|  | |||
| wget -c ftp://xxx/xxx/en.00.UNKNOWN.klm -P ./data | |||
Could you replace it with a real url?

resolve PaddlePaddle/Paddle#2230
In progress. Add pseudo code and test information later.