Skip to content

Conversation

fntlnz
Copy link
Contributor

@fntlnz fntlnz commented Sep 11, 2025

Changed the check pagehole logic to go through the entire parent chain to find missing pages. More details on how to reproduce in the linked issue.

Fixes #2738

@fntlnz
Copy link
Contributor Author

fntlnz commented Sep 11, 2025

I figured y'all might want me to add a zdtm test for this, I see that there are tests for predumps already but they don't trigger the behavior. I see that it consistently happens when the target process uses at least 3.6G of memory, maybe we can add a test that does that or modify the existing ones.

@fntlnz fntlnz force-pushed the fix/check-parent-dumps branch from 1a2f34e to 94f91c6 Compare September 11, 2025 16:05
@fntlnz
Copy link
Contributor Author

fntlnz commented Sep 11, 2025

force push to rebase criu-dev

@fntlnz
Copy link
Contributor Author

fntlnz commented Sep 17, 2025

@avagin @rst0git if there is any more info I can provide to help you folks when reviewing this please let me know. Thanks

@rst0git rst0git self-assigned this Sep 17, 2025
@avagin
Copy link
Member

avagin commented Sep 18, 2025

@fntlnz this patch isn't right. By design, criu has to read just previous pagemap image to figure out what pages has to be dumped. The issue in pagemap_len is an integer overflow when calculating a region size. Your reproducer should pass with this patch:

diff --git a/criu/include/pagemap.h b/criu/include/pagemap.h
index 3ae15deb9..fae110108 100644
--- a/criu/include/pagemap.h
+++ b/criu/include/pagemap.h
@@ -121,7 +121,7 @@ extern int dedup_one_iovec(struct page_read *pr, unsigned long base, unsigned lo
 
 static inline unsigned long pagemap_len(PagemapEntry *pe)
 {
-	return pe->nr_pages * PAGE_SIZE;
+	return (unsigned long)pe->nr_pages * PAGE_SIZE;
 }
 
 static inline bool page_read_has_parent(struct page_read *pr)

The real fix should convert nr_pages to uint64, but we need to preserve restore compatibility with previous images.

@fntlnz
Copy link
Contributor Author

fntlnz commented Sep 18, 2025

@avagin thanks for taking a look, I suspected that something else might be wrong because it seemed strange that we would need to do that. I didn't realize it was supposed to read only the previous pagemap. I'll test and report back soon.

@fntlnz
Copy link
Contributor Author

fntlnz commented Sep 18, 2025

@avagin I tested the cast and it works with my reproducer. I'll close this one and send another patch with your fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Third predump fails with "Missing in parent pagemap" when compiled with musl
3 participants