aarch64: ZFS image crashes due to a page fault caused by enabled access flag #1131

@wkozaczuk

The crash looks like this:

OSv v0.55.0-206-g6df05b6c
getauxval() stubbed
eth0: 192.168.122.15
Booted up in 141.08 ms
Cmdline: /hello
page fault outside application, addr: 0x0000100000000000
[registers]
PC: 0x000000004047a070 <???+1078435952>
X00: 0x0000100000000000 X01: 0x0000100000001000 X02: 0x0000100000000000
X03: 0x0000000084448004 X04: 0x0000000000000040 X05: 0x000000004087a000
X06: 0x0000200000100140 X07: 0x0000100000000000 X08: 0x0000000000000000
X09: 0x0000100000000000 X10: 0x000000000000007b X11: 0x0000000000000000
X12: 0x0000000000000001 X13: 0x0000000000000000 X14: 0x0000000000010000
X15: 0x0000000000000000 X16: 0x00000000401d93b0 X17: 0x0000000000000001
X18: 0x0000000000000000 X19: 0xffffa000410adb00 X20: 0x0000100000000000
X21: 0x0000000000001000 X22: 0x0000000000000000 X23: 0xffffa000410adb00
X24: 0x000000009600000b X25: 0x0000000000000005 X26: 0xffffa00040947be0
X27: 0xffffa000410ada00 X28: 0x0000200000100680 X29: 0x00002000001000e0
X30: 0x00000000401e4ac4 SP:  0x00002000001000e0 ESR: 0x000000009600014b
PSTATE: 0x0000000080000345
Aborted

[backtrace]
0x00000000401da5c4 <mmu::vm_fault(unsigned long, exception_frame*)+724>
0x000000004020bb18 <page_fault+100>
0x000000004020b824 <???+1075886116>
0x00000000401da3b8 <mmu::vm_fault(unsigned long, exception_frame*)+200>
0x000000004020bb18 <page_fault+100>
0x000000004020b824 <???+1075886116>
0x00000000401f1f98 <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+2696>
0x00000000401f28e4 <elf::program::get_library(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool)+116>
0x000000004031507c <osv::application::prepare_argv(elf::program*)+252>
0x0000000040315890 <osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<
0x00000000403160ac <osv::application::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>
0x0000000040316328 <osv::application::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+72>
0x00000000400dc78c <do_main_thread(void*)+2220>
0x0000000040348670 <???+1077184112>
0x00000000402e7290 <thread_main_c+32>
0x00000000402c04dc <???+1076626652>
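
For reference, the ESR value in the register dump above (0x000000009600014b) can be decoded by hand. The sketch below extracts the relevant ESR_EL1 fields for a data abort as laid out in the Arm architecture manual. The helper names (`esr_ec`, `esr_dfsc`, and so on) are made up for illustration and are not OSv functions.

```cpp
#include <cstdint>

// Field extraction for ESR_EL1 on a data abort.
// All helper names here are hypothetical, not part of OSv.
constexpr std::uint32_t esr_ec(std::uint64_t esr)   { return (esr >> 26) & 0x3f; } // exception class, bits [31:26]
constexpr std::uint32_t esr_dfsc(std::uint64_t esr) { return esr & 0x3f; }         // data fault status code, bits [5:0]
constexpr bool esr_wnr(std::uint64_t esr)           { return ((esr >> 6) & 1) != 0; } // write-not-read, bit 6
constexpr bool esr_cm(std::uint64_t esr)            { return ((esr >> 8) & 1) != 0; } // cache maintenance, bit 8

// EC 0x24 / 0x25 are data aborts (from a lower EL / from the same EL).
constexpr bool is_data_abort(std::uint64_t esr) {
    return esr_ec(esr) == 0x24 || esr_ec(esr) == 0x25;
}

// DFSC 0b0010LL encodes an access flag fault at translation level LL.
constexpr bool is_access_flag_fault(std::uint64_t esr) {
    return is_data_abort(esr) && (esr_dfsc(esr) & 0x3c) == 0x08;
}

// Translation table level reported in the low two DFSC bits.
constexpr std::uint32_t fault_level(std::uint64_t esr) { return esr_dfsc(esr) & 0x3; }

// The value from the crash above: data abort at the same EL,
// access flag fault at level 3, with the WnR and CM bits set.
static_assert(esr_ec(0x9600014b) == 0x25, "data abort without a change in EL");
static_assert(is_access_flag_fault(0x9600014b), "DFSC is 0b001011");
static_assert(fault_level(0x9600014b) == 3, "level 3 entry");
```

So the fault is reported as an access flag fault on a level 3 (page) entry. The CM bit being set suggests the faulting instruction was a cache maintenance or address translation operation rather than a plain load or store, although the dump alone does not confirm which instruction it was.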

Disabling the page_access_scanner thread lets the image run without this error, but it takes a long time before the app actually starts.

The relevant information from the AArch64 programmer's guide:
"Another memory attribute bit in the descriptor, the Access Flag (AF), indicates when a block entry is used for the first time.
• AF = 0: This block entry has not yet been used.
• AF = 1: This block entry has been used.
Operating systems use an access flag bit to keep track of which pages are being used. Software manages the flag. When the page is first created, its entry has AF set to 0. The first time the page is accessed by code, if it has AF at 0, this triggers an MMU fault. The Page fault handler records that this page is now being used and manually sets the AF bit in the table entry. For example, the Linux kernel uses the [AF] bit for PTE_AF on ARM64 (the Linux kernel name for AArch64), which is used to check whether a page has ever been accessed. This influences some of the kernel memory management choices. For example, when a page must be swapped out of memory, it is less likely to swap out pages that are being actively used."

The issue seems to be that OSv does not recognize or handle access flag faults. It seems that the page cache scanner needs to be adapted to work on AArch64: rather than scanning to check whether a page has been accessed, OSv should handle the access flag fault and record the access at that point.
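
A rough sketch of what software-managed access flag handling could look like, as a toy model operating on a raw descriptor value. The Access Flag is bit 10 of a stage 1 block or page descriptor. All function names below are hypothetical and none of this is existing OSv code. A real handler would also have to walk the page tables to find the entry for the faulting address and update it atomically.

```cpp
#include <cstdint>

// Descriptor bits used by this sketch.
constexpr std::uint64_t PTE_VALID = 1ull << 0;   // entry is valid
constexpr std::uint64_t PTE_AF    = 1ull << 10;  // Access Flag

// Hypothetical helper: DFSC 0b0010LL in ESR bits [5:0] is an access flag fault.
constexpr bool dfsc_is_access_flag(std::uint64_t esr) {
    return (esr & 0x3c) == 0x08;
}

// Fault side (hypothetical): given the ESR and the current descriptor,
// return the descriptor the handler would write back. On an access flag
// fault for a valid entry, AF is set so the retried access succeeds.
// Any other combination leaves the descriptor untouched.
constexpr std::uint64_t pte_after_fault(std::uint64_t esr, std::uint64_t pte) {
    return (dfsc_is_access_flag(esr) && (pte & PTE_VALID)) ? (pte | PTE_AF) : pte;
}

// Scanner side (hypothetical): a page counts as accessed if AF is set.
constexpr bool pte_accessed(std::uint64_t pte) { return (pte & PTE_AF) != 0; }

// Scanner side (hypothetical): clearing AF re-arms the fault, so the next
// access to the page is recorded again by the fault handler.
constexpr std::uint64_t pte_clear_accessed(std::uint64_t pte) { return pte & ~PTE_AF; }
```

With this split, the scanner only clears AF, and the fault handler is the one place where an access gets recorded. In the crash above, the scanner appears to have cleared the flag with nothing on the fault path to set it again.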
