@DrRyanHuang DrRyanHuang commented Mar 8, 2025

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

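TL;DR
In the dynamic-to-static (Dy2St) model export flow, paddle.jit.save caches some intermediate content in _global_inplace_map while the program is being built, keyed by id(program). Because this cache is not cleared once building finishes, a reused program can leave the cache pointing at memory that has already been freed, which leads to an illegal memory access and ultimately a segmentation fault. This PR clears the corresponding _global_inplace_map entry after program building finishes, so that the cache stays consistent with the memory state, the dangling-pointer hazard is removed, and the export flow becomes more stable.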


In PaddleSeg, the export process (and therefore the dynamic-to-static conversion) still runs even when the model is run in dynamic-graph mode:

def export(args, model=None, save_dir=None, use_ema=False):
    ...
    model = paddle.jit.to_static(model, input_spec=input_spec)
    ...
    paddle.jit.save(model, inference_model_path)

During jit.save, a segmentation fault occurs after many executions (more than 50):

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::static_api_add(_object*, _object*, _object*)
1   paddle::dialect::add(pir::Value const&, pir::Value const&)
2   paddle::dialect::GetValueDataType(pir::Value const&)
3   paddle::dialect::GetValueDataType(pir::Type const&)
4   pir::DenseTensorType::classof(pir::Type)
5   pir::AbstractType::GetInterfaceImpl(pir::TypeId) const

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1740228764 (unix time) try "date -d @1740228764" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x21) received by PID 19557 (TID 0x7f922eb4e740) from PID 33 ***]

paddle.jit.save builds the program, and during building some content is cached in _global_inplace_map and _global_parameter_recorder.

_global_parameter_recorder automatically releases its contents once building finishes, whereas _global_inplace_map does not.

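Because id(program) is used as the key, if a program is reused there is a risk of accessing variables that have already been freed, which causes the segmentation fault.
This PR adds the missing release step.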
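To illustrate why an id-keyed cache is hazardous, here is a minimal pure-Python sketch. It does not use Paddle: the `Program` class and the plain `cache` dict are hypothetical stand-ins, and the only thing taken from the description above is the idea of keying the cache by `id(program)`. It shows that the cache entry outlives the object it was recorded for, and that a later object may (but is not guaranteed to) receive the same id.

```python
import gc
import weakref


class Program:
    """Hypothetical stand-in for a static-graph program; not Paddle's class."""


# Toy cache keyed by id(program), mirroring the keying scheme described above.
cache = {}

program = Program()
cache[id(program)] = {"x": "value recorded while building this program"}
stale_key = id(program)
alive = weakref.ref(program)  # lets us observe when the program is freed

del program
gc.collect()

program_is_gone = alive() is None    # the program object has been freed
entry_survived = stale_key in cache  # but its cache entry is still present

# CPython may hand the same id to a later object. When that happens, a new
# program would find an entry that was recorded for the old, freed one.
later_programs = [Program() for _ in range(1000)]
id_was_reused = any(id(p) == stale_key for p in later_programs)
print(program_is_gone, entry_survived, id_was_reused)
```

In pure Python a stale entry only yields wrong data. In the flow described above, the cached content refers to objects that have already been freed on the C++ side, which is consistent with the crash showing up as a segmentation fault inside `pir` type checks.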

PCard-66972


paddle-bot bot commented Mar 8, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 8, 2025
@SigureMo SigureMo changed the title from "[Dy2st] add InplaceMap pop" to "[Dy2St] Clear InplaceMap after program is completed" Mar 8, 2025
@SigureMo SigureMo requested a review from Copilot March 8, 2025 09:16
@SigureMo SigureMo left a comment

LGTMeow 🐾

Copilot AI left a comment

PR Overview

This PR addresses a bug fix in the export process where repeated calls to paddle.jit.save eventually led to segmentation faults by ensuring that the InplaceMap gets cleared upon program completion.

  • Added a pop method to clear the parameters associated with a program in the InplaceMap.
  • Modified the program translator to call _global_inplace_map.pop(main_program) after handling parameters.
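The two changes listed above can be sketched roughly as follows. This is a simplified illustration, not the code merged in this PR: only the names `params_dict` and `pop` come from the overview, while `InplaceMapSketch`, `Program`, `build_once`, and the `add`/`get` signatures are assumptions made for the example.

```python
class Program:
    """Hypothetical stand-in for a static-graph program; not Paddle's class."""


class InplaceMapSketch:
    """Hypothetical, simplified map keyed by id(program)."""

    def __init__(self):
        self.params_dict = {}

    def add(self, program, origin_name, new_name):
        # Record a mapping for this program, creating its sub-dict on demand.
        self.params_dict.setdefault(id(program), {})[origin_name] = new_name

    def get(self, program, name):
        return self.params_dict.get(id(program), {}).get(name)

    def pop(self, program):
        # Drop everything recorded for this program; a missing key is fine.
        self.params_dict.pop(id(program), None)


def build_once(inplace_map, cleanup):
    """Simulate one export: build a program, use the map, optionally clean up."""
    main_program = Program()
    inplace_map.add(main_program, "x", "x_inplace")
    seen = inplace_map.get(main_program, "x")
    if cleanup:
        # The step the PR adds: clear the entry once the program is completed.
        inplace_map.pop(main_program)
    return seen


leaky, cleaned = InplaceMapSketch(), InplaceMapSketch()
for _ in range(50):
    build_once(leaky, cleanup=False)
    build_once(cleaned, cleanup=True)

leftover_without_pop = len(leaky.params_dict)   # at least one stale entry
leftover_with_pop = len(cleaned.params_dict)    # nothing left behind
```

With the cleanup step, no entry survives the program it was recorded for, so a later program that happens to receive the same id starts from an empty slot.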

Reviewed Changes

| File | Description |
| --- | --- |
| python/paddle/jit/pir_dy2static/parameter_recorder.py | Added a pop method to remove a program's entry from params_dict |
| python/paddle/jit/dy2static/program_translator.py | Invoked _global_inplace_map.pop(main_program) to clear the cached inplace map |

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

@SigureMo SigureMo changed the title from "[Dy2St] Clear InplaceMap after program is completed" to "[Dy2St] Clear InplaceMap after program is completed" Mar 8, 2025
@SigureMo SigureMo changed the title from "[Dy2St] Clear InplaceMap after program is completed" to "[Dy2St][3.13] Clear InplaceMap after program is completed" Mar 8, 2025
@SigureMo SigureMo changed the title from "[Dy2St][3.13] Clear InplaceMap after program is completed" to "[Dy2St] Clear InplaceMap after program is completed" Mar 8, 2025
@SigureMo SigureMo merged commit bb3f9c9 into PaddlePaddle:develop Mar 8, 2025
33 of 34 checks passed
@DrRyanHuang DrRyanHuang deleted the new_hash branch March 10, 2025 02:17
YqGe585 pushed a commit to YqGe585/Paddle that referenced this pull request May 7, 2025