Hi,
Intuitively, PCIe bandwidth is much lower than HBM bandwidth. With CoW or soft-dirty-bit tracking, it seems the checkpoint would have difficulty converging unless you severely slow down the XPU, which could substantially reduce the value of this work.
Is it possible that in most cases where the memory wall is the bottleneck, this work yields little or no improvement over traditional stop-the-world checkpointing? Or am I misunderstanding something?
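
To make the concern concrete, here is a rough sketch of the convergence argument as I understand it. It is only a toy model of iterative pre-copy, not this project's actual algorithm: `precopy_rounds`, its parameters, and all the numbers are my own assumptions for illustration, not measurements. Each round copies the currently dirty bytes over the slow link while the device keeps dirtying memory at a fixed rate.

```python
def precopy_rounds(state_bytes, copy_bw, dirty_rate, stop_threshold, max_rounds=30):
    """Toy model of iterative pre-copy checkpointing (hypothetical helper).

    state_bytes    : total device memory to checkpoint, in bytes
    copy_bw        : copy-out bandwidth (e.g. over PCIe), in bytes/s
    dirty_rate     : rate at which previously clean bytes get dirtied, in bytes/s
    stop_threshold : dirty set small enough for a short final pause, in bytes

    Returns (history, converged), where history lists the bytes copied per round.
    """
    remaining = float(state_bytes)
    history = []
    for _ in range(max_rounds):
        history.append(remaining)
        if remaining <= stop_threshold:
            return history, True
        # While this round is being copied, the device keeps writing.
        copy_time = remaining / copy_bw
        # The dirty set can never exceed the total state size.
        remaining = min(float(state_bytes), dirty_rate * copy_time)
    return history, False


# Illustrative numbers only (assumed, not measured).
GB = 1e9
slow_writer = precopy_rounds(16 * GB, 25 * GB, 5 * GB, 0.064 * GB)
fast_writer = precopy_rounds(16 * GB, 25 * GB, 100 * GB, 0.064 * GB)
print(slow_writer[1], len(slow_writer[0]))
print(fast_writer[1], len(fast_writer[0]))
```

In this model the dirty set shrinks by a factor of `dirty_rate / copy_bw` per round, so it only converges when the dirty rate is below the copy bandwidth. If the workload dirties memory at anything close to HBM speed, the final pause ends up copying nearly the whole state, which is what stop-the-world does anyway.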
Thanks in advance for your reply.