@tizhou86
Member

PR Category

Custom Device

PR Types

New features

Description

Add XPU async memory copy to enable zero-cost checkpointing.

paddle-bot bot commented Feb 18, 2025

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

paddle-bot added the XPU label on Feb 18, 2025
tizhou86 force-pushed the flashcheckpoint branch 2 times, most recently from 033a925 to 406fb2d, on February 21, 2025 07:19

std::shared_ptr<distributed::XpuAsyncLoad::Task>>(
*m, "XpuAsyncLoadTask")
.def("is_completed",
&distributed::XpuAsyncLoad::Task::IsCompleted,

Contributor

This interface is not covered by the tests.

return;
}
// platform::MemcpySyncH2D(dst, src, num, dst_place);
xpu_memcpy_async(dst, src, num, XPU_HOST_TO_DEVICE, stream);

Contributor

The return value is not checked here.
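
To illustrate the kind of check this comment asks for, here is a minimal sketch. It assumes the runtime call reports failure through a non-zero integer status. `FakeMemcpyAsync`, `CheckStatus`, and `CheckedCopy` are hypothetical names introduced only for this example; they are not the XPU runtime API or code from this PR. Inside Paddle, the existing XPU enforcement macros would presumably be the natural way to do this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a runtime call such as xpu_memcpy_async; this is
// NOT the real XPU API. Returns 0 on success and non-zero on failure.
inline int FakeMemcpyAsync(void* dst, const void* src, std::size_t num) {
  if (dst == nullptr || src == nullptr) return 1;  // simulated runtime error
  std::memcpy(dst, src, num);
  return 0;
}

// Turns a non-zero status into an exception so that a failed copy cannot be
// silently ignored by the caller.
inline void CheckStatus(int status, const char* what) {
  assert(what != nullptr);
  if (status != 0) {
    throw std::runtime_error(std::string(what) + " failed with status " +
                             std::to_string(status));
  }
}

// Wrapper that always inspects the status returned by the copy.
inline void CheckedCopy(void* dst, const void* src, std::size_t num) {
  CheckStatus(FakeMemcpyAsync(dst, src, num), "FakeMemcpyAsync");
}
```

With a wrapper like this, a failed copy surfaces at the call site instead of leaving the destination buffer in an unknown state.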

return;
}
// platform::MemcpySyncD2H(dst, src, num, src_place);
xpu_memcpy_async(dst, src, num, XPU_DEVICE_TO_HOST, stream);

Contributor

The return value is not checked here either.

// (but let's store a CPU event just so we can return a reference).
// In a real design, you might do a separate approach.

phi::Place event_place = is_xpu_place(place) ? phi::CPUPlace() : place;

Contributor

Why doesn't XPU need to create an event?

data1 = paddle.randn([10, 10])
print_debug_info(data1, "data1 (for compute)")

# Offload data0 -> pinned memory (usually on CPU)

Contributor

I don't see where the CPU pinned-memory type is specified.

void XpuAsyncLoad::SyncCalcuStream(const Place& place,
phi::XPUContext* offload_ctx,
platform::DeviceEvent* calc_event) {
if (is_xpu_place(place)) {

Contributor

This place appears to be the offload source place, i.e. the XPU place. Why is no event wait inserted?

src.place(),
src_ptr,
size,
/*stream=*/nullptr);

Contributor

This uses a nullptr stream, whereas task->UpdateWaitChain(*load_ctx_) uses load_ctx_'s stream. That may cause a synchronization problem: task.wait can return successfully while the copy has not yet completed.
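
The hazard described here can be sketched with a toy model of in-order streams. This is an illustration only: `ToyStream`, `ToyEvent`, and `CopyVisibleAfterWait` are hypothetical names, and the model makes the simplifying assumption that waiting on an event drains only the stream the event was recorded on. It is not the XPU runtime or the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model of an in-order device stream (hypothetical, not the XPU API).
struct ToyStream {
  std::vector<std::function<void()>> ops;  // submitted work, in order
  std::size_t done = 0;                    // number of ops already executed

  void Enqueue(std::function<void()> op) { ops.push_back(std::move(op)); }

  // Execute pending ops up to (not including) index `upto`.
  void RunUntil(std::size_t upto) {
    for (; done < upto && done < ops.size(); ++done) ops[done]();
  }
};

// An event marks "everything submitted to `stream` so far".
struct ToyEvent {
  ToyStream* stream = nullptr;
  std::size_t mark = 0;

  void Record(ToyStream* s) {
    stream = s;
    mark = s->ops.size();
  }
  // Waiting drains only the stream the event was recorded on.
  void Wait() const {
    assert(stream != nullptr);
    stream->RunUntil(mark);
  }
};

// Returns true if the copy had finished by the time Wait() returned.
inline bool CopyVisibleAfterWait(bool copy_on_load_stream) {
  ToyStream default_stream;  // plays the role of the nullptr stream
  ToyStream load_stream;     // plays the role of load_ctx_'s stream
  bool copied = false;

  ToyStream& copy_stream = copy_on_load_stream ? load_stream : default_stream;
  copy_stream.Enqueue([&copied] { copied = true; });  // the "async copy"

  ToyEvent ev;
  ev.Record(&load_stream);  // the task always records on load_stream
  ev.Wait();
  return copied;
}
```

In this model, `CopyVisibleAfterWait(true)` returns true, while `CopyVisibleAfterWait(false)` returns false: the wait succeeds even though the copy submitted to the other stream has not run. Issuing the copy on the same stream that the wait chain tracks removes the mismatch.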

src.place(),
src_ptr,
size,
/*stream=*/nullptr);

Contributor

Same as above.

return task;
}

/* ------------ Reload (CPU -> XPU) ------------ */

Contributor

If the CPU memory used for Reload is not pinned memory, xpu_memcpy_async may degrade to a synchronous xpu_memcpy.

Contributor

@SylarTiaNII SylarTiaNII left a comment

LGTM

Contributor

@gongweibao gongweibao left a comment

LGTM

@tizhou86 tizhou86 merged commit bdeb17b into PaddlePaddle:develop Mar 10, 2025
31 checks passed
YqGe585 pushed a commit to YqGe585/Paddle that referenced this pull request May 7, 2025
…addlePaddle#71168)

* [XPU] feat: add xpu async memory copy to enable zero cost checkpoint
