Releases · wenet-e2e/wenet
v3.1.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
What's Changed
- [ctc] Update search.py by @pengzhendong in #2398
 - fix mask to bias by @Mddct in #2401
 - [ssl/w2vbert] weight copy from meta w2vbert-2.0 by @Mddct in #2392
 - [lint] fix linter version by @xingchensong in #2405
 - [search] Update search.py by @xingchensong in #2406
 - fix mask bias dtype in sdpa by @Mddct in #2407
 - Fix ckpt conversion bug by @zhr1201 in #2399
 - [dataset] restrict batch type by @Mddct in #2410
 - [wenet/bin/recognize.py] modify args to be consistent with train by @Mddct in #2411
 - [transformer] remove pe to device by @Mddct in #2413
 - add timer for steps by @Mddct in #2416
 - [dataset] support repeat by @Mddct in #2415
- (!! breaking changes, we recommend step_save instead of epoch_save !!) 🚀🚀🚀
 - [transformer] fix sdpa u2pp training nan by @Mddct in #2419
- (!! important bug fix, enjoy flash attention without pain !!) 🚀🚀🚀
 
 - [transformer] fix sdpa mask for ShowRelAttention by @xingchensong in #2420
 - [runtime/libtorch] fix jit issue by @xingchensong in #2421
 - [dataset] add shuffle at shards tar/raw file level by @kakashidan in #2424
 - [dataset] fix cycle in recognize.py by @Mddct in #2426
 - [dataset] unify shuf conf by @Mddct in #2427
 - fix order by @Mddct in #2428
 - [runtime] upgrade libtorch version to 2.1.0 by @xingchensong in #2418
 - [torchaudio] Fix torchaudio interface error (#2352) by @lsrami in #2429
 - [paraformer] fsdp fix submodule call by @Mddct in #2431
 - fix modify by @Mddct in #2436
 - [deprecated dataset] small fix by @kakashidan in #2440
 - [dataset] add signal channel conf & processor by @kakashidan in #2439
 - fix list shuffle in recognize.py by @Mddct in #2446
 - fix list_shuffle in cv_conf by @Mddct in #2447
 - [runtime] Fixed failed compilation without ITN. Now, compiling ITN is mandatory. by @roney123 in #2444
 - [runtime] add blank_scale in ctc_endpoint by @jia-jidong in #2374
 - fix step in continue training in steps mode by @Mddct in #2453
 - fix export_jit.py by @Mddct in #2455
 - [fix] fix copyright by @robin1001 in #2456
 - [fix] fix copyright by @xingchensong in #2457
 - fix llama rope by @Mddct in #2459
 - [train_engine] support fsdp by @Mddct in #2412
- (!! breaking changes, enjoy both fsdp & deepspeed !!) 🚀🚀🚀
 
 - [env] update python version and deepspeed version by @xingchensong in #2462
- (!! breaking changes, you may need to update your env !!) ❤❤❤
 
 - fix rope pos embedding by @Mddct in #2463
 - [transformer] add multi warmup and learning rate for different modules by @Mddct in #2449
- (!! Significant improvement on results of whisper !!) 💯💯💯
 
 - [whisper] limit language to Chinese by @xingchensong in #2470
 - [train] convert tensor to scalar by @xingchensong in #2471
 - [workflow] upgrade python version to 3.10 by @xingchensong in #2472
- (!! breaking changes, you may need to update your env !!) ❤❤❤
 
 - refactor cache behaviour in training mode (reduce compute cost and me… by @Mddct in #2473
 - fix ut by @Mddct in #2477
 - [transformer] Make MoE runnable by @xingchensong in #2474
 - [transformer] fix mqa by @Mddct in #2478
 - enable mmap in torch.load by @Mddct in #2479
 - [example] Add deepspeed configs of different stages for illustrative purposes by @xingchensong in #2485
 - [example] Fix prefetch and step_save by @xingchensong in #2486
- (!! Significant decrease on cpu ram !!) 💯💯💯
 
 - [ctl] simplified ctl by @Mddct in #2483
 - [branchformer] simplified branchformer by @Mddct in #2482
 - [e_branchformer] simplified e_branchformer by @Mddct in #2484
 - [transformer] refactor cache by @Mddct in #2481
 - fix gradient ckpt in branchformer/ebranformer by @Mddct in #2488
 - [transformer] fix search after refactor cache by @Mddct in #2490
 - [transformer] set use_reentrant=False for gradient ckpt by @xingchensong in #2491
 - [transformer] fix warning: ignore(True) has been deprecated by @xingchensong in #2492
 - [log] avoid redundant logging by @xingchensong in #2493
 - [transformer] refactor mqa repeat by @Mddct in #2497
 - [transformer] fix mqa in cross att by @Mddct in #2498
 - [deepspeed] update json config by @xingchensong in #2499
 - [onnx] clone weight for whisper by @xingchensong in #2501
 - [wenet/utils/train_utils.py] fix log by @Mddct in #2504
 - [transformer] keep high precision in softmax by @Mddct in #2508
 - [websocket] 8k and 16k support by @Sang-Hoon-Pakr in #2505
 - [Fix #2506] Specify multiprocessing context in DataLoader by @MengqingCao in #2507
 - [mask] set max_chunk_size according to subsample rate by @xingchensong in #2520
 - Revert "[Fix #2506] Specify multiprocessing context in DataLoader" by @xingchensong in #2521
 - [transformer] try to fix mqa in onnxruntime by @Mddct in #2519
 - [utils] update precision of speed metric by @xingchensong in #2524
 - fix segmentfault in (#2506) by @MengqingCao in #2530
 
New modules and methods (from LLM community) by @Mddct & @fclearner 🤩🤩🤩
- [transformer] support multi query attention && multi grouped attention by @Mddct in #2403
 - [transformer] add rope for transformer/conformer by @Mddct in #2458
 - LoRA support by @fclearner in #2049
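For context on the RoPE addition in #2458: rotary position embedding rotates feature pairs of queries and keys by a position-dependent angle, so attention scores depend only on relative position. A rough illustration of the technique (a simplified sketch, not WeNet's implementation):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate feature pairs of x with shape (seq_len, dim) by position angles."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair frequencies as in the RoPE paper: base^(-2i/dim)
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1, x2) pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)
q_rot = apply_rope(q)
```

Because it is a pure rotation, the transform preserves vector norms and leaves position 0 unchanged, which is what makes it attractive as a drop-in positional encoding.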
 
New Contributors
- @lsrami made their first contribution in #2429
 - @jia-jidong made their first contribution in #2374
 - @MengqingCao made their first contribution in #2507
 
Full Changelog: v3.0.1...v3.1.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
WeNet 3.0.1
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
What's Changed
- Fix loss returned by CTC model in RNNT by @kobenaxie in #2327
 - [dataset] new io for code reuse for many speech tasks by @Mddct in #2316
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
 
 - Fix eot by @Qiaochu-Song in #2330
 - [decode] support length penalty by @xingchensong in #2331
 - [bin] limit step when averaging model by @xingchensong in #2332
 - fix 'th_accuracy' not in transducer by @DaobinZhu in #2337
 - [dataset] support bucket by seq length by @Mddct in #2333
 - [examples] remove useless yaml by @xingchensong in #2343
 - [whisper] support arbitrary language and task by @xingchensong in #2342
- (!! breaking changes, happy whisper happy life !!) 💯💯💯
 
 - Minor fix decode_wav by @kobenaxie in #2340
 - fix comment by @Mddct in #2344
 - [w2vbert] support w2vbert fbank by @Mddct in #2346
 - [dataset] fix typo by @Mddct in #2347
 - [wenet] fix args.enc by @Mddct in #2354
 - [examples] Initial whisper results on wenetspeech by @xingchensong in #2356
 - [examples] fix --penalty by @xingchensong in #2358
 - [paraformer] add decoding args by @xingchensong in #2359
 - [transformer] support flash att by 'torch scaled dot attention' by @Mddct in #2351
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
 
 - [conformer] support flash att by torch sdpa by @Mddct in #2360
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
 
 - [conformer] sdpa default to false by @Mddct in #2362
 - [transformer] fix bidecoder sdpa by @Mddct in #2368
 - [runtime] Configurable blank token idx by @zhr1201 in #2366
 - [wenet] make runtime/core/decoder faster by @Sang-Hoon-Pakr in #2367
- (!! Significant improvement on warmup when using libtorch !!) 🚀🚀🚀
 
 - [lint] fix lint by @cdliang11 in #2373
 - [examples] better results on wenetspeech using revised transcripts by @xingchensong in #2371
- (!! Significant improvement on results of whisper !!) 💯💯💯
 
 - [dataset] support pad or trim for whisper decoding by @Mddct in #2378
 - [bin/recognize.py] support numworkers and compute dtype by @Mddct in #2379
- (!! Significant improvement on inference speed when using fp16 !!) 🚀🚀🚀
 
 - [whisper] fix decoding maxlen by @Mddct in #2380
 - fix whisper ckpt modify error by @fclearner in #2381
 - Update recognize.py by @Mddct in #2383
 - [transformer] add cross attention by @Mddct in #2388
- (!! Significant improvement on inference speed of attention_beam_search !!) 🚀🚀🚀
 
 - [paraformer] fix some bugs by @Mddct in #2389
 - new modules and methods by @Mddct 🤩🤩🤩
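The flash-attention entries above (#2351, #2360) route attention through torch's `scaled_dot_product_attention`, which dispatches to a fused flash or memory-efficient kernel when the backend supports it. A minimal sketch of the equivalence with the manual pipeline (shapes chosen arbitrarily):

```python
import math

import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 16, 32)  # (batch, heads, time, head_dim)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# One call replaces the manual softmax(QK^T / sqrt(d)) V pipeline and lets
# torch pick a kernel (flash / memory-efficient / plain math) per device.
out = F.scaled_dot_product_attention(q, k, v)

# Reference: the explicit computation it fuses.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
manual = scores.softmax(dim=-1) @ v
print(torch.allclose(out, manual, atol=1e-5))  # True
```

This is why the release notes tie the feature to torch 2.x: the fused kernels and this API only exist there.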
 
New Contributors
- @Qiaochu-Song made their first contribution in #2330
 - @Sang-Hoon-Pakr made their first contribution in #2367
 
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
Full Changelog: v3.0.0...v3.0.1
WeNet 3.0.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
New Features
- Support full bestrq #1869, #2060
 - Support GPU Tlg streaming #1878
 - Support streaming ASR web demo #1888
 - Support k2 rnnt loss and delay penalty #1909
 - Support context biasing #1931, #1936
 - Support ZeroPrompt (not merged) #1943
 - Support M1 Mac onnxruntime #1953
 - Support ITN runtime #2001, #2042, #2246
 - Support wav2vec2 #2034, #2035
 - Support part of w2vbert training #2039
 - wenet cli #2047, #2054, #2075, #2082, #2088, #2087, #2098, #2101, #2122 (!! simple and fast !!) 🛫
 - Support E-Branchformer module #2013
 - Support deepspeed #1849, #2168, #2123 (!! big big big !!) 💯
 - LoRA support (not merged) #2049
 - support batch decoding for ctc_prefix_beam_search & attention_rescoring #2059 (!! simple and fast !!) 🛫
 - support ali-paraformer #2067, #2078, #2093, #2096, #2099, #2124, #2139, #2140, #2155, #2219, #2222, #2277, #2282, #2289, #2314, #2324
 - support Contrastive learning for unified models #2100
 - support context biasing with ac automaton #2128, #2136
 - support whisper arch #2141, #2157, #2196, #2313, #2322, #2323
 - Support gradient checkpointing for Conformer & Transformer (whisper) #2173, #2275
 - ssh-launcher for multi-node multi-gpu training #2180, #2265
 - u2++-lite training support #2202
 - support blank penalty #2278
 - support speaker in dataset #2292
 - Whisper inference support in cpp runtime #2320
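On the blank penalty feature (#2278): the idea is to subtract a constant from the blank token's log-probability before CTC search, which discourages blank-heavy (deletion-prone) decodes. A hypothetical helper illustrating the mechanism (names and default values are made up, not WeNet's API):

```python
import torch

def apply_blank_penalty(log_probs: torch.Tensor, blank_id: int = 0,
                        penalty: float = 2.0) -> torch.Tensor:
    """Hypothetical helper: lower the blank score before CTC greedy/beam search."""
    out = log_probs.clone()
    out[..., blank_id] -= penalty  # only the blank column is shifted
    return out

lp = torch.randn(10, 30).log_softmax(-1)  # (frames, vocab)
penalized = apply_blank_penalty(lp, blank_id=0, penalty=2.0)
```

Non-blank scores are untouched, so the penalty only changes how often the search emits blank, not the relative ranking of real tokens.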
 
What's Changed
- Upgrade libtorch CPU runtime with IPEX version #1893
 - Refine ctc alignment #1966
 - Use torchrun for distributed training #2020, #2021
 - Refine training code #2055, #2103, #2123, #2248, #2252, #2253, #2270, #2286, #2288, #2312 (!! big changes !!) 🚀
 - mv all ctc functions to ctc_utils.py #2057 (!! big changes !!) 🚀
 - move search methods to search.py #2056 (!! big changes !!) 🚀
 - move all k2 related functions to k2 #2058
 - refactor and simplify decoding methods #2061, #2062
 - unify decode results of all decoding methods #2063
 - refactor(dataset): return dict instead of tuple #2106, #2111
 - init_model API changed #2116, #2216 (!! big changes !!) 🚀
 - move yaml saving to save_model() #2156
 - refine tokenizer #2165, #2186 (!! big changes !!) 🚀
 - deprecate wenetruntime #2194 (!! big changes !!) 🚀
 - use pre-commit to auto check and lint #2195
 - refactor(yaml): Config ctc/cmvn/tokenizer in train.yaml #2205, #2229, #2230, #2227, #2232 (!! big changes !!) 🚀
 - train with dict input #2242, #2243 (!! big changes !!) 🚀
 - [dataset] keep pcm for other task #2268
 - Upgrade torch to 2.x #2301 (!! big changes !!) 🚀
 - log everything to tensorboard #2307
 
New Bug Fixes
- Fix NST recipe #1863
 - Fix Librispeech fst dict #1929
 - Fix bug when make shard.list for *.flac #1933
 - Fix bug of transducer #1940
 - Avoid problem during model averaging when there is parameter-tying. #2113
 - [loss] set zero_infinity=True to ignore NaN or inf ctc_loss #2299
 - fix android #2303
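The `zero_infinity=True` fix (#2299) makes samples with no feasible CTC alignment (for example, targets longer than the available input frames) contribute zero loss instead of inf/NaN poisoning the batch. A minimal demonstration:

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

log_probs = torch.randn(50, 2, 20).log_softmax(-1)  # (T, N, C)
targets = torch.randint(1, 20, (2, 60))             # 60 labels > T=50 frames: no valid alignment
input_lengths = torch.full((2,), 50, dtype=torch.long)
target_lengths = torch.full((2,), 60, dtype=torch.long)

# Without zero_infinity this loss would be inf; with it, infinite losses
# (and their gradients) are clamped to zero.
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())  # 0.0
```

In training this means a few corrupt or mis-segmented utterances no longer blow up the whole step.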
 
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
Many thanks to all the contributors !!!!! I love u all.
WeNet 2.2.1
What's Changed
- Add http server/client @aluminumbox #1670
 - Add Trt (Myelin) support for streaming ASR @yuekaizhang #1679
 - Support OpenVino @FionaZZ92 #1700
 - Support ONNX GPU export, add librispeech results, and fix V2 streaming decode issue for efficient conformer @zwglory #1701
 - Support ort backend in wenetruntime @xingchensong #1708
 - Support LFMMI @aluminumbox #1725
 - Support Paraformer @MrSupW & @robin1001 #1738 & #1749 & #1791 & #1795
 - Support part of bestrq @Mddct #1750 & #1754 & #1824
 - Remove concat after to simplify the code flow #1762 & #1763 & #1764
 - Add riva cuda tlg decoder @yuekaizhang #1773
 - Add CUDA TLG nbest and mbr decoding @yuekaizhang #1804
 - Support IPEX @ZailiWang #1816
 - Support Branchformer @kli017 #1845
 - Support GPU hotword @zwglory #1860
 
WeNet 2.2.0
What's Changed
- support exporting squeezeformer to onnx (CPU & GPU) by @yygle in #1593 and #1634
 - support horizon x3 pi by @xingchensong in #1597
 - support noisy student training by @NevermoreCY in #1600
 - support efficient conformer by @zwglory in #1636
 - add blank scale for wfst decoding by @simonwang517 in #1646
 
WeNet 2.1.0
What's Changed
WeNet Python Binding Models
This release is for hosting the wenet python binding models.
WeNet 2.0.0
The following features are stable.
- U2++ framework for better accuracy
 - n-gram + WFST language model solution
 - Context biasing(hotword) solution
 - Very big data training support with UIO
 - More dataset support, including WenetSpeech, GigaSpeech, HKUST and so on.
 
WeNet 1.0.0
Model
- propose and support U2++, which uses both forward and backward information at training and decoding.
 
- support dynamic left chunk training and decoding, so we can limit history chunk at decoding to save memory and computation.
 - support distributed training.
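Dynamic chunk attention works by masking self-attention so each frame sees only its own chunk plus a limited number of left (history) chunks. A simplified sketch in the spirit of WeNet's chunk masking (not the exact library code):

```python
import torch

def chunk_mask(size: int, chunk_size: int, num_left_chunks: int = -1) -> torch.Tensor:
    """True = position may be attended to; num_left_chunks=-1 means unlimited history."""
    mask = torch.zeros(size, size, dtype=torch.bool)
    for i in range(size):
        chunk = i // chunk_size
        start = 0 if num_left_chunks < 0 else max((chunk - num_left_chunks) * chunk_size, 0)
        end = min((chunk + 1) * chunk_size, size)
        mask[i, start:end] = True
    return mask

# 4 frames, chunks of 2: frames 0-1 see chunk 0, frames 2-3 see chunks 0-1.
m = chunk_mask(4, 2)
```

Limiting `num_left_chunks` at decoding time bounds how much history each frame attends to, which is exactly the memory/computation saving the bullet above describes.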
 
Dataset
Now we support the following five standard speech datasets, with SOTA or near-SOTA results.
| Dataset | Language | Hours | Test set | CER/WER | SOTA |
|---|---|---|---|---|---|
| aishell-1 | Chinese | 200 | test | 4.36 | 4.36 (WeNet) |
| aishell-2 | Chinese | 1000 | test_ios | 5.39 | 5.39 (WeNet) |
| multi-cn | Chinese | 2385 | / | / | / |
| librispeech | English | 1000 | test_clean | 2.66 | 2.10 (ESPnet) |
| gigaspeech | English | 10000 | test | 11.0 | 10.80 (ESPnet) |
Productivity
Here are some features related to productivity.
- LM support. WeNet can work with or without an LM, depending on your application/scenario.
 
- timestamp support.
 - n-best support.
 - endpoint support.
 - gRPC support.
 - further refine x86 server and on-device android recipe.
 
WeNet 0.1.0
Major Features
- Joint CTC/AED model structure
 - U2, dynamic chunk training support
 - Torchaudio support
 - Runtime x86 and android support
 

