This issue was discovered by @wkozaczuk and reported on the mailing list. It is likely either a recent regression, or an old bug that was exposed, for an unknown reason, by recent changes to the memory layout.
Running tst-huge.so with 1GB of memory works fine:
scripts/build image=tests
scripts/run.py -m1G -e tests/tst-huge.so
However, giving it more memory, such as the default 2GB, results in a hang.
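(For context: exhaust_memory() in frame #35 of the backtrace below calls mmap() with length=134217728 (128MB), fd=-1 and flags=32802, i.e. MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE, and presumably the test keeps making such allocations until memory runs out. A rough, hypothetical sketch of that loop - not the actual tests/tst-huge.cc code - might look like this:)

#include <sys/mman.h>
#include <cstddef>

// Hypothetical reconstruction of the test's allocation loop, inferred only
// from the mmap() arguments visible in the backtrace. Each iteration maps
// and immediately populates a 128MB anonymous region, so the kernel keeps
// asking the huge-page allocator for memory until nothing is left.
static void exhaust_memory(size_t size)   // size = 134217728 in the backtrace
{
    while (mmap(nullptr, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
                -1, 0) != MAP_FAILED) {
        // keep allocating until mmap() fails
    }
}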
With gdb we can see the test thread hung on a deadlock:
(gdb) connect
(gdb) osv syms
(gdb) osv info threads
...
253 (0xffff80000255b040) tests/tst-huge. cpu1 status::waiting waitqueue::wait(lockfree::mutex&) at core/waitqueue.cc:56 vruntime 2.75268e-19
(gdb) osv thread
(gdb) bt
#0 sched::thread::switch_to (this=0xffff800000a6a040,
this@entry=0xffff800000a81040) at arch/x64/arch-switch.hh:108
#1 0x00000000403f31c4 in sched::cpu::reschedule_from_interrupt (
this=0xffff800000053040, called_from_yield=called_from_yield@entry=false,
preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
#2 0x00000000403f36ac in sched::cpu::schedule () at include/osv/sched.hh:1310
#3 0x00000000403f3d72 in sched::thread::wait (
this=this@entry=0xffff80000255b040) at core/sched.cc:1214
#4 0x000000004041b0e2 in sched::thread::do_wait_for<lockfree::mutex, sched::wait_object<waitqueue> > (mtx=...) at include/osv/mutex.h:41
#5 sched::thread::wait_for<waitqueue&> (mtx=...) at include/osv/sched.hh:1220
#6 waitqueue::wait (this=this@entry=0x409a3550 <mmu::vma_list_mutex+48>,
mtx=...) at core/waitqueue.cc:56
#7 0x00000000403d6ea3 in rwlock::reader_wait_lockable (this=<optimized out>)
at core/rwlock.cc:174
#8 rwlock::rlock (this=this@entry=0x409a3520 <mmu::vma_list_mutex>)
at core/rwlock.cc:29
#9 0x0000000040341e0f in rwlock_for_read::lock (
this=0x409a3520 <mmu::vma_list_mutex>) at include/osv/rwlock.h:113
#10 std::lock_guard<rwlock_for_read&>::lock_guard (__m=...,
this=<synthetic pointer>) at /usr/include/c++/8/bits/std_mutex.h:162
#11 lock_guard_for_with_lock<rwlock_for_read&>::lock_guard_for_with_lock (
lock=..., this=<synthetic pointer>) at include/osv/mutex.h:89
#12 mmu::vm_fault (addr=18446603337326391296, addr@entry=18446603337326395384,
ef=ef@entry=0xffff800002560068) at core/mmu.cc:1334
#13 0x000000004039cdf6 in page_fault (ef=0xffff800002560068)
at arch/x64/mmu.cc:38
#14 <signal handler called>
#15 0x00000000403e5b64 in memory::page_range_allocator::insert<true> (
this=this@entry=0x409bb300 <memory::free_page_ranges>, pr=...)
at core/mempool.cc:575
#16 0x00000000403e337c in memory::page_range_allocator::<lambda(memory::page_range&)>::operator() (header=..., __closure=<synthetic pointer>)
at core/mempool.cc:751
#17 memory::page_range_allocator::<lambda(memory::page_range&)>::operator() (
header=..., __closure=<synthetic pointer>) at core/mempool.cc:736
#18 memory::page_range_allocator::for_each<memory::page_range_allocator::alloc_aligned(size_t, size_t, size_t, bool)::<lambda(memory::page_range&)> > (f=...,
min_order=<optimized out>, this=0x409bb300 <memory::free_page_ranges>)
at core/mempool.cc:809
#19 memory::page_range_allocator::alloc_aligned (
this=this@entry=0x409bb300 <memory::free_page_ranges>,
size=size@entry=2097152, offset=offset@entry=0,
alignment=alignment@entry=2097152, fill=fill@entry=true)
at core/mempool.cc:736
#20 0x00000000403e3c90 in memory::alloc_huge_page (N=N@entry=2097152)
at core/mempool.cc:1601
#21 0x000000004034718e in mmu::uninitialized_anonymous_page_provider::map (
this=0x408681b0 <mmu::page_allocator_init>, offset=83886080, ptep=...,
pte=..., write=<optimized out>) at include/osv/mmu-defs.hh:219
#22 0x000000004034c79a in mmu::populate<(mmu::account_opt)1>::page<1> (
offset=83886080, ptep=..., this=0x2000001ffd80)
at include/osv/mmu-defs.hh:235
#23 mmu::page<mmu::populate<>, 1> (ptep=..., offset=83886080, pops=...)
at core/mmu.cc:311
#24 mmu::map_level<mmu::populate<(mmu::account_opt)1>, 2>::operator() (
base_virt=35185397596160, parent=..., this=<synthetic pointer>)
at core/mmu.cc:437
#25 mmu::map_level<mmu::populate<(mmu::account_opt)1>, 3>::map_range<2> (
this=<synthetic pointer>, ptep=..., base_virt=35184372088832, slop=4096,
page_mapper=..., size=132120576, vcur=<optimized out>) at core/mmu.cc:399
#26 mmu::map_level<mmu::populate<(mmu::account_opt)1>, 3>::operator() (
base_virt=35184372088832, parent=..., this=<synthetic pointer>)
at core/mmu.cc:449
#27 mmu::map_level<mmu::populate<(mmu::account_opt)1>, 4>::map_range<3> (
this=<synthetic pointer>, ptep=..., base_virt=35184372088832, slop=4096,
page_mapper=..., size=134217728, vcur=<optimized out>) at core/mmu.cc:399
#28 mmu::map_level<mmu::populate<(mmu::account_opt)1>, 4>::operator() (
base_virt=35184372088832, parent=..., this=<synthetic pointer>)
at core/mmu.cc:449
#29 mmu::map_range<mmu::populate<(mmu::account_opt)1> > (
vma_start=vma_start@entry=35185313710080,
vstart=vstart@entry=35185313710080, size=<optimized out>, page_mapper=...,
slop=slop@entry=4096) at core/mmu.cc:354
#30 0x000000004034cfc4 in mmu::operate_range<mmu::populate<(mmu::account_opt)1> > (size=<optimized out>, start=0x200038200000, vma_start=<optimized out>,
mapper=...) at core/mmu.cc:801
#31 mmu::vma::operate_range<mmu::populate<(mmu::account_opt)1> > (size=3,
addr=0x200038200000, mapper=..., this=0xffffa000481ff300)
at core/mmu.cc:1412
#32 mmu::populate_vma<(mmu::account_opt)1> (vma=vma@entry=0xffffa000481ff300,
v=v@entry=0x200038200000, size=size@entry=134217728,
write=write@entry=false) at core/mmu.cc:1206
#33 0x0000000040345809 in mmu::map_anon (addr=addr@entry=0x0,
size=size@entry=134217728, flags=flags@entry=2, perm=perm@entry=3)
at core/mmu.cc:1222
#34 0x0000000040456dad in mmap (addr=addr@entry=0x0,
length=length@entry=134217728, prot=prot@entry=3, flags=flags@entry=32802,
fd=fd@entry=-1, offset=offset@entry=0) at libc/mman.cc:156
#35 0x00001000000685d4 in exhaust_memory (size=size@entry=134217728)
at /home/nyh/osv.tmp2/osv/tests/tst-huge.cc:31
#36 0x000010000006820b in main (argc=<optimized out>, argv=<optimized out>)
The deadlock happens because map_anon (frame 33) takes vma_list_mutex.for_write(), and then we get a page fault, and page_fault (frame 13) tries to take the same rwlock for reading. Apparently our rwlock implementation isn't recursive; maybe we should reconsider that, but it isn't the real bug here - the real bug is why we got this page fault in the first place.
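For illustration, here is a minimal standalone sketch of that deadlock pattern - not OSv code, using std::shared_mutex as a stand-in for our rwlock (formally this is undefined behaviour for std::shared_mutex, but on typical implementations it hangs just like the trace above):

#include <shared_mutex>

std::shared_mutex vma_list_lock;     // stands in for mmu::vma_list_mutex

void fake_page_fault()
{
    // Corresponds to frames 12-13: the fault handler takes the lock for reading.
    std::shared_lock<std::shared_mutex> r(vma_list_lock);
    // never reached - the same thread already holds the write lock
}

void fake_map_anon()
{
    // Corresponds to frame 33: map_anon() takes the lock for writing.
    std::unique_lock<std::shared_mutex> w(vma_list_lock);
    // ... populating the new mapping faults while the write lock is still held ...
    fake_page_fault();               // deadlock: read-after-write on the same thread
}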
The page fault happens on address 0xffff800041dffff8:
(gdb) fr 13
#13 0x000000004039cdf6 in page_fault (ef=0xffff800002560068)
at arch/x64/mmu.cc:38
38 mmu::vm_fault(addr, ef);
(gdb) p (void*) addr
$2 = (void *) 0xffff800041dffff8
How is this address calculated?
(gdb) fr 15
#15 0x00000000403e5b64 in memory::page_range_allocator::insert<true> (
this=this@entry=0x409bb300 <memory::free_page_ranges>, pr=...)
at core/mempool.cc:575
575 void insert(page_range& pr) {
(gdb) l
570 return size;
571 }
572
573 private:
574 template<bool UseBitmap = true>
575 void insert(page_range& pr) {
576 auto addr = static_cast<void*>(&pr);
577 auto pr_end = static_cast<page_range**>(addr + pr.size - sizeof(page_range**));
578 *pr_end = ≺
579 auto order = ilog2(pr.size / page_size);
(gdb) p addr
$3 = (void *) 0xffff800002561000
(gdb) p pr_size
No symbol "pr_size" in current context.
(gdb) p pr.size
$4 = 1066004480
(gdb) p pr_end
$5 = (memory::page_range **) 0xffff800041dffff8
So the problematic address is pr_end. It is calculated by taking the address of pr and adding pr.size = 1066004480 to it - almost a full gigabyte. Does this make sense? Do we really have such a huge contiguous (in physical memory...) allocation? We need to debug this further.
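As a sanity check on the arithmetic, plugging the values printed by gdb into the same expression that insert() uses (line 577 above) does reproduce the faulting address exactly (a standalone snippet, not OSv code):

#include <cstdint>
#include <cstdio>

int main()
{
    uintptr_t pr_addr = 0xffff800002561000;   // &pr, as printed by gdb
    uintptr_t pr_size = 1066004480;           // pr.size, about 0.99GB
    // Same expression as in insert(): the back-pointer is written into the
    // last 8 bytes of the page range.
    uintptr_t pr_end = pr_addr + pr_size - sizeof(void*);
    printf("%#lx\n", (unsigned long)pr_end);  // prints 0xffff800041dffff8
}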
Another interesting observation: running tst-huge.so with -m2G or even -m1.1G fails as above. But if I run it with sizes only a tiny bit over 1G, it fails in more spectacular ways:
$ scripts/run.py -m1.01G -e tests/tst-huge.so
OSv v0.53.0-88-g5377a50b
Assertion failed: ef->rflags & processor::rflags_if (arch/x64/mmu.cc: page_fault: 34)
Halting.
$ scripts/run.py -m1.02G -e tests/tst-huge.so
OSv v0.53.0-88-g5377a50b
page fault outside application, addr: 0xffff800067a13000
page fault outside application, addr: 0x0000000000000000
exception nested too deeplyHalting.
I'll open a new issue about this, because it happens before running the test, so it's not at all specific to tst-huge.so - although possibly it will end up that both issues stem from the same bug.