New Features
End-to-end large model training and inference
Core framework upgrades
Pipelines
Bug Fixes
What's Changed
- fix cache select bug by @wtmlon in fix cache select bug #7026
- [Bug fixes] Fix generate d2s by @wj-Mcat in [Bug fixes] Fix generate d2s #6971
- [Bug fixes] Fix typo by @wj-Mcat in [Bug fixes] Fix typo #7039
- add BAAI/bge-small-zh-v1.5 model by @qingzhong1 in add BAAI/bge-small-zh-v1.5 model #7036
- fix beam search topk bug by @wtmlon in fix beam search topk bug #7044
- raise error if TP > 2 for ChatGLM2 by @sijunhe in raise error if TP > 2 for ChatGLM2 #7041
- [New features] refactor llm ci testing by @wj-Mcat in [New features] refactor llm ci testing #7046
- [Trainer] Support show paddlenlp commit id by @ZHUI in [Trainer] Support show paddlenlp commit id #7047
- [Bug fixes] fix hard-code model-prefix by @wj-Mcat in [Bug fixes] fix hard-code model-prefix #7048
- [Paddle-Pipelines] Add function calling examples by @w5688414 in [Paddle-Pipelines] Add function calling examples #7051
- [Bug fixes] update inference-model generation-utils by @wj-Mcat in [Bug fixes] update inference-model generation-utils #7059
- [CI] add_llm_ci_ipipe by @zjjlivein in [CI] add_llm_ci_ipipe #7050
- Support `decode_token` for stream generation by @sijunhe in Support `decode_token` for stream generation #7063
- [SetUp] Fix set up with pip install -e by @ZHUI in [SetUp] Fix set up with pip install -e #7062
- fuse rope update by @FeixLiu in fuse rope update #6957
- [llm]fix bloom tensor parallelism by @lugimzzz in [llm]fix bloom tensor parallelism #7065
- Add Baichuan2 Models by @sijunhe in Add Baichuan2 Models #7072
- [Bug fixes]update layername of chatglm inference model by @wj-Mcat in [Bug fixes]update layername of chatglm inference model #7073
- [CI] fix-llm-tests by @zjjlivein in [CI] fix-llm-tests #7066
- [Bug fixes] use top_p_sampling of paddlenlp_ops by @wj-Mcat in [Bug fixes] use top_p_sampling of paddlenlp_ops #7071
- [Pipelines Agents] Add prompt for function call by @sijunhe in [Pipelines Agents] Add prompt for function call #7070
- Update README.md by @ZHUI in Update README.md #7069
- add if-else use_new_executor branch by @Wennie396 in add if-else use_new_executor branch #6897
- Update OpenWebText2.md by @ZHUI in Update OpenWebText2.md #7078
- Fix synchronized memcpy in GPT by @Wong4j in Fix synchronized memcpy in GPT #7008
- [Pipelines Agents] limit to 1 function call per round by @sijunhe in [Pipelines Agents] limit to 1 function call per round #7084
- [llm]add quantize by @lugimzzz in [llm]add quantize #7027
- align config between dy/st and add memory profiler for dy by @Wennie396 in align config between dy/st and add memory profiler for dy #6982
- Support WikiText & Lambada by @RachelXu7 in Support WikiText & Lambada #7079
- [New Features] add llm pretrain & lora & sft & prefix_tuning testing scripts by @wj-Mcat in [New Features] add llm pretrain & lora & sft & prefix_tuning testing scripts #7056
- [Paddle-Pipelines] Add stream output by @w5688414 in [Paddle-Pipelines] Add stream output #7087
- [Paddle-pipelines] Add HNSW support for BES by @qingzhong1 in [Paddle-pipelines] Add HNSW support for BES #7021
- [Paddle-Pipelines] Fix function log repetition bug by @w5688414 in [Paddle-Pipelines] Fix function log repetition bug #7097
- [LLM] Support gpt3 fine grained dybatch v1 by @yuanlehome in [LLM] Support gpt3 fine grained dybatch v1 #7080
- support opt-2.7 by @zhoutianzi666 in support opt-2.7 #7010
- [Paddle-Pipelines] Add top_k support by @w5688414 in [Paddle-Pipelines] Add top_k support #7103
- [Bug fixes]fix chatglm2 d2s by @wj-Mcat in [Bug fixes]fix chatglm2 d2s #7110
- Polish the benchmark scripts and add test for N4C32. by @Xreki in Polish the benchmark scripts and add test for N4C32. #7105
- sharding stage 2 main grad by @FeixLiu in sharding stage 2 main grad #7075
- fix benchmark compute statistics by @MARD1NO in fix benchmark compute statistics #7098
- [NVIDIA] Add stage2 NCCL kernel overlap by @Tom-Zheng in [NVIDIA] Add stage2 NCCL kernel overlap #7092
- [CI] fix llm tests by @zjjlivein in [CI] fix llm tests #7117
- [LLM] Unify pipeline model with PretrainModelPipe by @ZHUI in [LLM] Unify pipeline model with PretrainModelPipe #7095
- [BugFix] Revert PR7008 Update trainer.py by @ZHUI in [BugFix] Revert PR7008 Update trainer.py #7123
- Remove load state as np of from_pretrained by @ZHUI in Remove load state as np of from_pretrained #7120
- [New Features] Add test predictor for llm by @wj-Mcat in [New Features] Add test predictor for llm #7115
- [BugFix] Fix tp load and gpt pp model by @DesmonDay in [BugFix] Fix tp load and gpt pp model #7129
- group null judge fix by @TimeYWL in group null judge fix #7122
- [llm]add quantization tensor parallelism by @lugimzzz in [llm]add quantization tensor parallelism #7099
- [GPT] Fix bugs by @ZHUI in [GPT] Fix bugs #7141
- [llm]support lora for quantizationlinear by @lugimzzz in [llm]support lora for quantizationlinear #7138
- [LLM] Support chatglm precache. by @xiaoxiaohehe001 in [LLM] Support chatglm precache. #7109
- Enable FLAGS_new_executor_micro_batching in AutoEngine by @From00 in Enable FLAGS_new_executor_micro_batching in AutoEngine #7168
- [Bug fixes] Fix experimental llama set_state_dict by @wj-Mcat in [Bug fixes] Fix experimental llama set_state_dict #7142
- [BugFix] fix readthedocs by @wj-Mcat in [BugFix] fix readthedocs #7175
- [GPT-3] fix attention and past_key_values by @DrownFish19 in [GPT-3] fix attention and past_key_values #7088
- Support AI Studio download by @sijunhe in Support AI Studio download #7146
- [New features] Add precache predictor test by @wj-Mcat in [New features] Add precache predictor test #7137
- gradio ui support streaming by @wtmlon in gradio ui support streaming #7086
- [New Features] add llama ce scripts by @wj-Mcat in [New Features] add llama ce scripts #7118
- Add `share-folder` for distributed training by @KB-Ding in Add `share-folder` for distributed training #7068
- [README] fix README in model_zoo/ernie-1.0/preprocess by @KB-Ding in [README] fix README in model_zoo/ernie-1.0/preprocess #7188
- [LLM] Fix error when config.bs is more than input bs. by @xiaoxiaohehe001 in [LLM] Fix error when config.bs is more than input bs. #7187
- [Bug fixes] add opt __init__.py by @wj-Mcat in [Bug fixes] add opt __init__.py #7193
- [LLM] Support llm prealloc. by @xiaoxiaohehe001 in [LLM] Support llm prealloc. #7194
- [PEFT]support pp + lora by @lugimzzz in [PEFT]support pp + lora #7198
- [Function optimization] predict with batch_size = 3 by @wj-Mcat in [Function optimization] predict with batch_size = 3 #7191
- Add max_shard_size arg by @DesmonDay in Add max_shard_size arg #6835
- bloom chatglmv1/2 llm unittest by @wtmlon in bloom chatglmv1/2 llm unittest #7165
- [LLM] Reconstruct fused transformer layers by @RichardWooSJTU in [LLM] Reconstruct fused transformer layers #7186
- Fix ckpt shard size by @DesmonDay in Fix ckpt shard size #7202
- [Tokenizer] fix _decode by @DrownFish19 in [Tokenizer] fix _decode #7195
- [LLM] Support bloom precache. by @xiaoxiaohehe001 in [LLM] Support bloom precache. #7204
- adjust gradio unittest assert by @wtmlon in adjust gradio unittest assert #7205
- Update README.md by @ZHUI in Update README.md #7184
- Support iterable dataset with dist dataloader by @DesmonDay in Support iterable dataset with dist dataloader #7208
- [Bug fixes] Fix export model validate pdmodel by @wj-Mcat in [Bug fixes] Fix export model validate pdmodel #7201
- [New Features] Add prefix allowed token by @wj-Mcat in [New Features] Add prefix allowed token #7210
- Fix subfolder default argument in model download by @sijunhe in Fix subfolder default argument in model download #7220
- [peft]remove warning by @lugimzzz in [peft]remove warning #7234
- Support fp32 for dist dataloader by @DesmonDay in Support fp32 for dist dataloader #7240
- [llm] fall back self.linear to matmul+add when CUDA < 11.6 by @zhoutianzi666 in [llm] fall back self.linear to matmul+add when CUDA < 11.6 #7250
- [New Features] add chat-template for inference by @wj-Mcat in [New Features] add chat-template for inference #7219
- [LLM] Fix bloom precache by @xiaoxiaohehe001 in [LLM] Fix bloom precache #7249
- Support zero dataset splits for pre-training by @DesmonDay in Support zero dataset splits for pre-training #7259
- Fix fp32 dist dataloader by @DesmonDay in Fix fp32 dist dataloader #7261
- add flash attention for auto gpt3 by @zhiqiu in add flash attention for auto gpt3 #7135
- [Bug fixes] update lru_cache usage by @wj-Mcat in [Bug fixes] update lru_cache usage #7260
- update shard tensor for pp by @zhaoyinglia in update shard tensor for pp #7167
- fix error in dropout of hybrid_model by @heavyrain-lzy in fix error in dropout of hybrid_model #7269
- [New features] add docs for chat-template by @wj-Mcat in [New features] add docs for chat-template #7270
- [Trainer]Add unified checkpoint for hybrid parallel. by @ZHUI in [Trainer]Add unified checkpoint for hybrid parallel. #7238
- [PIR][Prim] support PIR train by @cyber-pioneer in [PIR][Prim] support PIR train #7276
- [llm]support chatglmv2 in fine-grained FT by @zhoutianzi666 in [llm]support chatglmv2 in fine-grained FT #7253
- [LLM] Support chatglm/bloom wint8/4. by @xiaoxiaohehe001 in [LLM] Support chatglm/bloom wint8/4. #7207
- Polish the setting of dist_strategy. by @Xreki in Polish the setting of dist_strategy. #7214
- Use set_dynamic_shape api to set dynamic shape in GPT generation model by @pkuzyc in Use set_dynamic_shape api to set dynamic shape in GPT generation model #7281
- readme update 14b by @wtmlon in readme update 14b #7291
- [New features] update llm docs by @wj-Mcat in [New features] update llm docs #7290
- [Bug fixes] disable ir_optim under bfloat16 by @wj-Mcat in [Bug fixes] disable ir_optim under bfloat16 #7292
- [llm]fix weight only quantization by @lugimzzz in [llm]fix weight only quantization #7252
- [New features] update inference doc by @wj-Mcat in [New features] update inference doc #7296
- Add qwen flash attention by @wawltor in Add qwen flash attention #7289
- Modify dist dataloader by @DesmonDay in Modify dist dataloader #7286
- Support qwen pretrain. by @ZHUI in Support qwen pretrain. #7275
- [LLM]fix llama pp recompute by @lugimzzz in [LLM]fix llama pp recompute #7235
- [llm benchmark] Add flash attention 2 by @w5688414 in [llm benchmark] Add flash attention 2 #7057
- Support setting allreduce_matmul_grad_overlapping and enable_send_recv_overlap in auto_config by @From00 in Support setting allreduce_matmul_grad_overlapping and enable_send_recv_overlap in auto_config #7297
- [Bug fixes] update inference doc by @wj-Mcat in [Bug fixes] update inference doc #7299
- Optimize qwen model with fused_rope, fused_rms_norm and core_attn recompute by @DesmonDay in Optimize qwen model with fused_rope, fused_rms_norm and core_attn recompute #7307
- Support Fast PTQ by @RachelXu7 in Support Fast PTQ #7282
- [quant] add QuantizationConfig by @lugimzzz in [quant] add QuantizationConfig #7314
- [LLM] Support fuse_gpt wint8/4. by @xiaoxiaohehe001 in [LLM] Support fuse_gpt wint8/4. #7304
- [CI] Add qwen pretrain test by @DesmonDay in [CI] Add qwen pretrain test #7330
- [aistudio] add save to aistudio hub by @lugimzzz in [aistudio] add save to aistudio hub #7338
- [Other]TopPProcess add bfloat16 input by @LokeZhou in [Other]TopPProcess add bfloat16 input #7337
- Add qwen baichuan ci by @wtmlon in Add qwen baichuan ci #7324
- [Trainer] Support broadcast optimizer in dp group by @ZHUI in [Trainer] Support broadcast optimizer in dp group #7256
- add parallel seed control by @zhiqiu in add parallel seed control #7347
- [PEFT]fix peft load_best_model_at_end & resume from checkpoint by @lugimzzz in [PEFT]fix peft load_best_model_at_end & resume from checkpoint #7360
- freeze gradio version by @wtmlon in freeze gradio version #7355
- [Auto Parallel] Support pure single-card modeling for auto_parallel by @haohongxiang in [Auto Parallel] Support pure single-card modeling for auto_parallel #7348
- [Fix llm bugs] Add use cache & version check by @w5688414 in [Fix llm bugs] Add use cache & version check #7357
- Qwen support position_ids by @wtmlon in Qwen support position_ids #7359
- 【AutoParallelism】add "refined_ops_patterns" in recompute strategy by @heavyrain-lzy in 【AutoParallelism】add "refined_ops_patterns" in recompute strategy #7349
- [LLM]add peft save_to_aistudio & add options in llm directory by @lugimzzz in [LLM]add peft save_to_aistudio & add options in llm directory #7364
- [PEFT]add qatlora mp by @lugimzzz in [PEFT]add qatlora mp #7288
- [baichuan2 13b] Fix alibi bug by @w5688414 in [baichuan2 13b] Fix alibi bug #7376
- [AutoParallel] Sequence Parallel for Auto Mode by @JZ-LIANG in [AutoParallel] Sequence Parallel for Auto Mode #7301
- [Function optimization] update llama ce script by @wj-Mcat in [Function optimization] update llama ce script #7192
- [llm]add neft by @lugimzzz in [llm]add neft #7382
- support torch safe tensor convert by @wtmlon in support torch safe tensor convert #7383
- Develop sharding reshard by @liuzhenhai93 in Develop sharding reshard #7399
- fix qwen lm_head matrix dimension alignment by @WAI-clear in fix qwen lm_head matrix dimension alignment #7402
- Fix dist dataloader with no mp group by @DesmonDay in Fix dist dataloader with no mp group #7407
- Separate linux and non-linux requirements by @sijunhe in Separate linux and non-linux requirements #7408
- convert chatglm2 to fuse qkv by @wtmlon in convert chatglm2 to fuse qkv #7379
- [BugFix] Fix bug in sequence parallel by @DesmonDay in [BugFix] Fix bug in sequence parallel #7414
- [llm] Update fa2 comments by @w5688414 in [llm] Update fa2 comments #7415
- support pir fuse pass by @zhiqiu in support pir fuse pass #7393
- add chatglm2 legacy checkpoints convert script by @wtmlon in add chatglm2 legacy checkpoints convert script #7419
- [LLM] Support ptq inference by @RichardWooSJTU in [LLM] Support ptq inference #7224
- [Bug fixes]revert chatglm2 config by @wj-Mcat in [Bug fixes]revert chatglm2 config #7410
- [paddle-pipelines] Fix retrieval benchmarks by @w5688414 in [paddle-pipelines] Fix retrieval benchmarks #7420
- Add llm ci report by @zjjlivein in Add llm ci report #7418
- [New features] Add chat template for training by @wj-Mcat in [New features] Add chat template for training #7241
- [llm] support GQA in chatglmv2 by @zhoutianzi666 in [llm] support GQA in chatglmv2 #7412
- [llm benchmark] remove tsinghua pip by @w5688414 in [llm benchmark] remove tsinghua pip #7435
- [Improvement] update chat-template readme by @wj-Mcat in [Improvement] update chat-template readme #7442
- [retrieval benchmark] fix dynamic oom bug by @w5688414 in [retrieval benchmark] fix dynamic oom bug #7449
- [AutoParallel] Support Llama by @zhaoyinglia in [AutoParallel] Support Llama #7394
- [CI]update gpt-3 scripts by @Liujie0926 in [CI]update gpt-3 scripts #7350
- [gpt-3]add new_exec_pp benchmark by @Liujie0926 in [gpt-3]add new_exec_pp benchmark #7457
- [LLM] Inference use new executor and remove top_p_sampling custom op by @yuanlehome in [LLM] Inference use new executor and remove top_p_sampling custom op #7432
- [Bug fixes] use old_executor to do static inference by @wj-Mcat in [Bug fixes] use old_executor to do static inference #7464
- Log ips in auto Llama by @From00 in Log ips in auto Llama #7459
- Add load multi torch bin slices logic by @WAI-clear in Add load multi torch bin slices logic #7445
- [Llama] Support hybrid strategy and recompute in llama auto version by @haohongxiang in [Llama] Support hybrid strategy and recompute in llama auto version #7458
- update bsz for pretrain auto by @zhaoyinglia in update bsz for pretrain auto #7474
- [gpt-3]Add pir ci by @Liujie0926 in [gpt-3]Add pir ci #7471
- [GPT] add sequence_parallel by @DrownFish19 in [GPT] add sequence_parallel #7311
- [AutoParallel] Sequence Parallel Optimization Pass Config by @JZ-LIANG in [AutoParallel] Sequence Parallel Optimization Pass Config #7409
- [LLM] Fix alibi in pipeline parallel for baichuan-13b by @ZHUI in [LLM] Fix alibi in pipeline parallel for baichuan-13b #7433
- [Unified Ckpt] Add unittest for unified checkpoint. by @ZHUI in [Unified Ckpt] Add unittest for unified checkpoint. #7392
- [Unified checkpoint] Add unified optimizer by @DesmonDay in [Unified checkpoint] Add unified optimizer #7373
- [BugFix] Update trainer.py by @ZHUI in [BugFix] Update trainer.py #7492
- dp main_grad by @tianhaodongbd in dp main_grad #7293
- sharding stage 3 main grad by @tianhaodongbd in sharding stage 3 main grad #7319
- [Improvement] add context_data support for chat_template rendering by @wj-Mcat in [Improvement] add context_data support for chat_template rendering #7480
- [Trainer] Fix pp config usage. by @ZHUI in [Trainer] Fix pp config usage. #7443
- [Unified Checkpoint] Fix optimizer sharding index json by @DesmonDay in [Unified Checkpoint] Fix optimizer sharding index json #7499
- [chatglm] Fix attention_score -nan bug by @w5688414 in [chatglm] Fix attention_score -nan bug #7501
- [Trainer] Remove dist_gather, use `gather` under paddle.distributed instead by @DesmonDay in [Trainer] Remove dist_gather, use `gather` under paddle.distributed instead #7509
- [Bug fixes]disable predictor testing by @wj-Mcat in [Bug fixes]disable predictor testing #7496
- Fix lora save by @zzjjay in Fix lora save #7504
- [Trainer] Fix dist gather by @DesmonDay in [Trainer] Fix dist gather #7513
- Gradio multi by @wtmlon in Gradio multi #7508
- [Bug fixes] remove qwen 14b-chat special token testing by @wj-Mcat in [Bug fixes] remove qwen 14b-chat special token testing #7514
- [Unified checkpoint] Add unified optimizer loader by @DrownFish19 in [Unified checkpoint] Add unified optimizer loader #7441
- remove useless import by @wtmlon in remove useless import #7517
- 【LLama Config】Add refined-recompute config in Llama model by @heavyrain-lzy in 【LLama Config】Add refined-recompute config in Llama model #7519
- rm sd ldm benchmark by @JunnYu in rm sd ldm benchmark #7434
- Hotfix recompute in PP for benchmark. by @Xreki in Hotfix recompute in PP for benchmark. #7479
- [Bug fixes]fix pyyaml installing under py310 by @wj-Mcat in [Bug fixes]fix pyyaml installing under py310 #7526
- [Paddle-pipelines] remove redundant requirements, upgrade to 0.6.2 by @w5688414 in [Paddle-pipelines] remove redundant requirements, upgrade to 0.6.2 #7518
- [chatglm] Fix fp16 o1 bugs by @w5688414 in [chatglm] Fix fp16 o1 bugs #7548
- Remove the paddlemix dependency from tipc by @JunnYu in Remove the paddlemix dependency from tipc #7543
- add scale dtype test for weight quantize by @wwbitejotunn in add scale dtype test for weight quantize #7552
- Fix init weight for llama modeling auto by @From00 in Fix init weight for llama modeling auto #7491
- [CLIP] Ensure CLIP static graph export uses a 1D tensor by @JunnYu in [CLIP] Ensure CLIP static graph export uses a 1D tensor #7555
- [llm]update data loading by @lugimzzz in [llm]update data loading #7557
- [PEFT]add qlora by @lugimzzz in [PEFT]add qlora #7416
- fix clip image processor by @westfish in fix clip image processor #7540
- [llm]fix qat lora by @lugimzzz in [llm]fix qat lora #7565
- init fp32 model and then cast to fp16 when use amp-o2 by @zhiqiu in init fp32 model and then cast to fp16 when use amp-o2 #7546
- [Bug fixes]fix integration test by @wj-Mcat in [Bug fixes]fix integration test #7569
- template.py AutoTemplate.load_from method read file with encoding utf8 by @anexplore in template.py AutoTemplate.load_from method read file with encoding utf8 #7558
- Set flash_attn mask to None in LlamaPretrainedModelAuto by @From00 in Set flash_attn mask to None in LlamaPretrainedModelAuto #7566
- [llm][gpt] fix mp parameter initialization by @xysheng-baidu in [llm][gpt] fix mp parameter initialization #7534
- [Dist Dataloader] Fix dist dataloader sampler by @DesmonDay in [Dist Dataloader] Fix dist dataloader sampler #7579
- Update set_seed in trainer_utils.py by @niuliling123 in Update set_seed in trainer_utils.py #7528
- remove astype for probs, all dtype should same with scores. by @zxcd in remove astype for probs, all dtype should same with scores. #7577
- [AutoParallel] Sequence Parallel Hybrid Parallelism for GPT3 by @JZ-LIANG in [AutoParallel] Sequence Parallel Hybrid Parallelism for GPT3 #7486
- [Bug Fixes]disable CUDA_DEVICE_MAX_CONNECTIONS for CE by @wj-Mcat in [Bug Fixes]disable CUDA_DEVICE_MAX_CONNECTIONS for CE #7586
- [gpt-3]Fix dynamic_prepare for CI by @Liujie0926 in [gpt-3]Fix dynamic_prepare for CI #7550
- Chat gradio by @wtmlon in Chat gradio #7581
- [LLM] inference model switch to new executor by @yuanlehome in [LLM] inference model switch to new executor #7467
- [Improvement] support chat_template for predictor & finetune_generation by @wj-Mcat in [Improvement] support chat_template for predictor & finetune_generation #7584
- [Auto Parallel] Support arg for enabling fused_linear_param_grad_add pass by @haohongxiang in [Auto Parallel] Support arg for enabling fused_linear_param_grad_add pass #7589
- Type improvement behavior correction, fix gpt-3's dtype. by @zxcd in Type improvement behavior correction, fix gpt-3's dtype. #7595
- 【benchmark】add max_mem_reserved for benchmark by @mmglove in 【benchmark】add max_mem_reserved for benchmark #7488
- Fix post_init bug for run_pretrain_auto by @From00 in Fix post_init bug for run_pretrain_auto #7604
- [Trainer] Remove dp group with group_sharded_parallel checks. by @ZHUI in [Trainer] Remove dp group with group_sharded_parallel checks. #7507
- [Bug fixes] open ci testing by @wj-Mcat in [Bug fixes] open ci testing #7606
- [Trainer] Add memory info in log by @ZHUI in [Trainer] Add memory info in log #7607
- add ce_gpt-gpt345_dp1 && dp8 && SD8_stage1 by @tianhaodongbd in add ce_gpt-gpt345_dp1 && dp8 && SD8_stage1 #7484
- [Llama2] Add uts for llama in semi-auto mode by @haohongxiang in [Llama2] Add uts for llama in semi-auto mode #7588
- [Pretrain] Fix Qwen paddle.outer calculation by @DesmonDay in [Pretrain] Fix Qwen paddle.outer calculation #7617
- [Bug fixes]Refactor chat template integration test by @wj-Mcat in [Bug fixes]Refactor chat template integration test #7618
- 【benchmark】add max_mem_reserved for llm benchmark by @Liujie0926 in 【benchmark】add max_mem_reserved for llm benchmark #7538
- [Llama2] Fix llama final norm in pp modeling by @haohongxiang in [Llama2] Fix llama final norm in pp modeling #7615
- [Qwen] Support Qwen flash attention 2 by @DesmonDay in [Qwen] Support Qwen flash attention 2 #7624
- add to_static for electra by @MayYouBeProsperous in add to_static for electra #7575
- [Unified Checkpoint] Add dynamic load unified checkpoint by @DesmonDay in [Unified Checkpoint] Add dynamic load unified checkpoint #7487
- Bug Fix, to fix pp sp layernorm grad sync by @iosmers in Bug Fix, to fix pp sp layernorm grad sync #7613
- [Llama2] Fix gradient accumulation for Llama2 training in auto model and add uts by @haohongxiang in [Llama2] Fix gradient accumulation for Llama2 training in auto model and add uts #7625
- [BugFix]Add local seed for gpt in flash_attn by @ForFishes in [BugFix]Add local seed for gpt in flash_attn #7626
- add args in set_seed by @niuliling123 in add args in set_seed #7592
- [FG] Fix fast generation by @DrownFish19 in [FG] Fix fast generation #7630
- Support SharedLayer with different prefixes for pipeline parallel by @DrownFish19 in Support SharedLayer with different prefixes for pipeline parallel #7598
- [Bug fixes] fix chatglm2 finetune for chat-template by @wj-Mcat in [Bug fixes] fix chatglm2 finetune for chat-template #7637
- [CI] add coverage status checks by @zjjlivein in [CI] add coverage status checks #7635
- [Bug fixes]disable chat-template in tipc by @wj-Mcat in [Bug fixes]disable chat-template in tipc #7633
- [AutoParallel] add resume_from_checkpoint and vpp for llama by @zhaoyinglia in [AutoParallel] add resume_from_checkpoint and vpp for llama #7601
- update for seed by @niuliling123 in update for seed #7651
- update for seed by @niuliling123 in update for seed #7653
- Type improvement behavior correction, index_sample API already support float16, logits no need to cast. by @zxcd in Type improvement behavior correction, index_sample API already support float16, logits no need to cast. #7619
- [AutoParallel] add pipeline.auto_parallel_profiler to auto_config by @AndSonder in [AutoParallel] add pipeline.auto_parallel_profiler to auto_config #7343
- add CE test for stage2 by @iosmers in add CE test for stage2 #7544
- add ce for mp2-pp2-sp2-dp2 by @iosmers in add ce for mp2-pp2-sp2-dp2 #7664
- [BugFix] Fix qwen merge tp bug by @DesmonDay in [BugFix] Fix qwen merge tp bug #7675
- Improve flash-attn mask calculation for auto llama by @From00 in Improve flash-attn mask calculation for auto llama #7670
- add ce_mp8 gpt test case by @xysheng-baidu in add ce_mp8 gpt test case #7665
- add to_static for electra-gen by @MayYouBeProsperous in add to_static for electra-gen #7656
- 【Hackathon 5th No.64】Enable dynamic-to-static training for PaddleNLP suite models -part by @MayYouBeProsperous in 【Hackathon 5th No.64】Enable dynamic-to-static training for PaddleNLP suite models -part #7576
- fix cache kv len by @zhink in fix cache kv len #7679
- [bug fix] Add ce for mp2-sp2-pp2-dp2 by @iosmers in [bug fix] Add ce for mp2-sp2-pp2-dp2 #7678
- [Llama2] Fix setting seeds of distributed training for Llama model with Auto Parallel by @haohongxiang in [Llama2] Fix setting seeds of distributed training for Llama model with Auto Parallel #7643
- [llm]Qlora fix by @lugimzzz in [llm]Qlora fix #7645
- llm doc update by @wtmlon in llm doc update #7478
- [Bug fixes] fix tokenizer decode-tokens by @wj-Mcat in [Bug fixes] fix tokenizer decode-tokens #7681
- add ce_sharding-stage3 && ce_mp_sharding-stage1 by @tianhaodongbd in add ce_sharding-stage3 && ce_mp_sharding-stage1 #7639
- Support sep by @pangengzheng in Support sep #6994
- [CI] set codecov status check by @zjjlivein in [CI] set codecov status check #7647
- [PEFT] support lora conv2d by @JunnYu in [PEFT] support lora conv2d #7680
- [CI]update for llm_gpt by @Liujie0926 in [CI]update for llm_gpt #7631
- fix ernie-vil2.0 example by @MayYouBeProsperous in fix ernie-vil2.0 example #7673
- Fix typos by @co63oc in Fix typos #7687
- Pp reshard develop by @liuzhenhai93 in Pp reshard develop #7629
- [DOC] Add pretraining docs and report pretrain performance. by @ZHUI in [DOC] Add pretraining docs and report pretrain performance. #7437
- [Type Promotion] fix dtype for LayoutXLMSelfOutput by @zxcd in [Type Promotion] fix dtype for LayoutXLMSelfOutput #7697
- support torch safetensors load by @wtmlon in support torch safetensors load #7690
- [PreTraining] Unify pretraining scripts by @ZHUI in [PreTraining] Unify pretraining scripts #7700
- mv refined recompute to run_pretrain_auto by @zhaoyinglia in mv refined recompute to run_pretrain_auto #7701
- Fix ce bug by @iosmers in Fix ce bug #7686
- [LLM PDC] merge some llm pdc code into develop by @JunnYu in [LLM PDC] merge some llm pdc code into develop #7694
- [BUG Fix] update IntervalStrategy, update lora conv2d init, update logger.warning by @JunnYu in [BUG Fix] update IntervalStrategy, update lora conv2d init, update logger.warning #7705
- [GPT2] Delete model zoo gpt2. by @ZHUI in [GPT2] Delete model zoo gpt2. #7702
- remove chunkid annotation by @zhaoyinglia in remove chunkid annotation #7703
- [LLM] Fix weight_quantize dtype. by @xiaoxiaohehe001 in [LLM] Fix weight_quantize dtype. #7692
- [Trainer] Fix bugs of setting distributed seed bugs by @haohongxiang in [Trainer] Fix bugs of setting distributed seed bugs #7715
- [BugFix] fix gpt by @ZHUI in [BugFix] fix gpt #7716
- fix lora from pretrained. by @zzjjay in fix lora from pretrained. #7714
- [Bug fixes] use single thread to do prediction by @wj-Mcat in [Bug fixes] use single thread to do prediction #7720
- [Improvement]support base-port & flask-port by @wj-Mcat in [Improvement]support base-port & flask-port #7668
- Fix sep compatible by @pangengzheng in Fix sep compatible #7721
- [Trainer] Support release gradients for develop branch by @haohongxiang in [Trainer] Support release gradients for develop branch #7594
- [Bug fixes] update input_spec for forward by @wj-Mcat in [Bug fixes] update input_spec for forward #7648
- Add unified checkpoint config by @DrownFish19 in Add unified checkpoint config #7574
- [BugFix] Fix dead ln links. by @ZHUI in [BugFix] Fix dead ln links. #7729
- Remove some internal changes related to tgt_attention_mask by @zhoutianzi666 in Remove some internal changes related to tgt_attention_mask #7696
- [Paddle-pipelines] update pdf by @qingzhong1 in [Paddle-pipelines] update pdf #7737
- update fasttokenizer readme by @JunnYu in update fasttokenizer readme #7684
- modify pdf bug by @qingzhong1 in modify pdf bug #7738
- Update unified checkpoint by @DrownFish19 in Update unified checkpoint #7730
- Add mp delay_scale_loss function by @sneaxiy in Add mp delay_scale_loss function #7713
- Update semantic_search.yaml by @kingTLE in Update semantic_search.yaml #7708
- Update Install_windows.md by @kingTLE in Update Install_windows.md #7707
- Support AWQ & GroupWiseQuant for LLMs by @RachelXu7 in Support AWQ & GroupWiseQuant for LLMs #7688
- [Paddle-pipelines] update processing by @qingzhong1 in [Paddle-pipelines] update processing #7743
- sharding reshard: keep compatibility with older versions by @liuzhenhai93 in sharding reshard: keep compatibility with older versions #7734
- [bloom] Add kv cache support for flash attention & fix bugs by @w5688414 in [bloom] Add kv cache support for flash attention & fix bugs #7735
- Fix qwen matrix dimension alignment v1 by @WAI-clear in Fix qwen matrix dimension alignment v1 #7473
- [Pretrain] Fix llama max_seq_len settings by @DesmonDay in [Pretrain] Fix llama max_seq_len settings #7745
- [Improvement] support system prompt in training dataset by @wj-Mcat in [Improvement] support system prompt in training dataset #7667
New Contributors
- @Wennie396 made their first contribution in add if-else use_new_executor branch #6897
- @Wong4j made their first contribution in Fix synchronized memcpy in GPT #7008
- @yuanlehome made their first contribution in [LLM] Support gpt3 fine grained dybatch v1 #7080
- @Xreki made their first contribution in Polish the benchmark scripts and add test for N4C32. #7105
- @Tom-Zheng made their first contribution in [NVIDIA] Add stage2 NCCL kernel overlap #7092
- @TimeYWL made their first contribution in group null judge fix #7122
- @From00 made their first contribution in Enable FLAGS_new_executor_micro_batching in AutoEngine #7168
- @RichardWooSJTU made their first contribution in [LLM] Reconstruct fused transformer layers #7186
- @heavyrain-lzy made their first contribution in fix error in dropout of hybrid_model #7269
- @LokeZhou made their first contribution in [Other]TopPProcess add bfloat16 input #7337
- @JZ-LIANG made their first contribution in [AutoParallel] Sequence Parallel for Auto Mode #7301
- @WAI-clear made their first contribution in fix qwen lm_head matrix dimension alignment #7402
- @tianhaodongbd made their first contribution in dp main_grad #7293
- @zzjjay made their first contribution in Fix lora save #7504
- @anexplore made their first contribution in template.py AutoTemplate.load_from method read file with encoding utf8 #7558
- @niuliling123 made their first contribution in Update set_seed in trainer_utils.py #7528
- @zxcd made their first contribution in remove astype for probs, all dtype should same with scores. #7577
- @MayYouBeProsperous made their first contribution in add to_static for electra #7575
- @iosmers made their first contribution in Bug Fix, to fix pp sp layernorm grad sync #7613
- @AndSonder made their first contribution in [AutoParallel] add pipeline.auto_parallel_profiler to auto_config #7343
- @zhink made their first contribution in fix cache kv len #7679
- @kingTLE made their first contribution in Update semantic_search.yaml #7708
Full Changelog: v2.6.1...1982091