@NHDaly
Contributor

@NHDaly NHDaly commented Apr 10, 2019

This is a super-simple fix for JuliaInterop/libcxxwrap-julia#24.

I'm not sure if it's the best approach, but it seems sensible to me: nothing should be incorrectly baked in at compile time, and it will be reinitialized at runtime for any runtime-only calls to CxxWrap.

I am opening this PR to see if all the tests still pass, because I wasn't able to figure out how to run the tests locally. :)

@NHDaly
Contributor Author

NHDaly commented Apr 11, 2019

Huh, so this didn't actually fix the matter, but I don't yet understand why. Hopefully someone can help me here.

I'm still getting the same segfault during static compilation, even with this PR.

I tried changing to call __init__() instead of the manual ccall, thinking maybe I need the call to dlopen, but that didn't change anything.

I added some print statements that show that indeed the module global variable is being set. With this change:

diff --git a/src/CxxWrap.jl b/src/CxxWrap.jl
index 3baa25e..ab0dbd2 100644
--- a/src/CxxWrap.jl
+++ b/src/CxxWrap.jl
@@ -383,11 +386,17 @@ function readmodule(so_path::AbstractString, funcname, m::Module)
   Core.eval(m, :(const __cxxwrap_nbpointers = $nb_pointers))
 end

+get_cxxwrap_module() = ccall((:get_cxxwrap_module, jlcxx_path), Ptr{Module}, ())
+
 function wrapmodule(so_path::AbstractString, funcname, m::Module)
   # Initialize the jlcxxwrap library from top-level user code, so that it will still
   # be initialized for static compilation. (See:
   #   https://github.com/JuliaInterop/libcxxwrap-julia/issues/24)
-  ccall((:initialize, jlcxx_path), Cvoid, (Any, Any), CxxWrap, CppFunctionInfo)
+  @show get_cxxwrap_module()
+  CxxWrap.__init__()
+  println("Initialized")
+  @show get_cxxwrap_module()
+
   readmodule(so_path, funcname, m)
   wraptypes(m)
   wrapfunctions(m)

I now see this output before the crash:

get_cxxwrap_module() = Ptr{Module} @0x0000000000000000
Initialized
get_cxxwrap_module() = Ptr{Module} @0x000000010dd55120

signal (11): Segmentation fault: 11

So indeed, it is being set. But what seems to be happening is that the C++ module I'm actually wrapping hasn't seen the change:

 JLCXX_MODULE define_julia_module(jlcxx::Module& pagerWrap) {

+  std::cout << "jlcxx: " << jlcxx::get_cxxwrap_module() << std::endl;
+
   pagerWrap.add_type<rai::CloudStorageOptions>("CloudStorageOptions");

But here I see jlcxx: 0x0, and it still segfaults inside add_type:

get_cxxwrap_module() = Ptr{Module} @0x0000000000000000
Initialized
get_cxxwrap_module() = Ptr{Module} @0x000000010dd55120
jlcxx: 0x0

signal (11): Segmentation fault: 11

And when I run without static compilation, the C++ sees the same g_cxxwrap_module value both before and after setting it in @wrapmodule, although it does change to a new value when called from the __init__ function of our module (PagerWrap).

Here are those same print-statements when running without static compilation (and here I also print the value from register_julia_module, and added stacktraces for clarity):

get_cxxwrap_module() = Ptr{Module} @0x0000000119902980
Initialized
get_cxxwrap_module() = Ptr{Module} @0x0000000119902980
[ Info: register_julia_module
get_cxxwrap_module() = Ptr{Module} @0x0000000119902980
19-element Array{Base.StackTraces.StackFrame,1}:
 register_julia_module(::Module, ::Ptr{Nothing}) at CxxWrap.jl:155
 readmodule(::String, ::Symbol, ::Module) at CxxWrap.jl:389
 wrapmodule(::String, ::Symbol, ::Module) at CxxWrap.jl:405
...
jlcxx: 0x119902980

[ Info: register_julia_module
get_cxxwrap_module() = Ptr{Module} @0x0000000112fdf340
11-element Array{Base.StackTraces.StackFrame,1}:
 register_julia_module(::Module) at CxxWrap.jl:166
 __init__() at PagerWrap.jl:75
...
jlcxx: 0x112fdf340

So, is it possible that somehow (due to static compilation) there are two different instances of libcxxwrap being dynamically loaded, where Julia is calling into one version but our C++ module is calling into a different one? And this only happens during static compilation somehow?

Thanks in advance for the help!

@NHDaly
Contributor Author

NHDaly commented Apr 12, 2019

@SimonDanisch: Any chance you could help me understand the difference in behavior for the statically compiled version? Is it common that during static compilation we'd load multiple distinct instances of C/C++ dynamically linked libraries?

@SimonDanisch

Phew, that's the part of static compilation I know the least about :(

@staticfloat
Contributor

@NHDaly Can you run this within a debugger such as gdb or lldb and, at the point that it crashes, dump out the currently loaded shared libraries? In gdb that's info sharedlibrary. We should start by seeing if you do indeed have multiple copies of a library loaded, as that's obviously not what we want. :)
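
The same check can also be done from inside the process rather than from a debugger. The following is a minimal sketch, not something used in this thread: `loaded_images_matching` is a hypothetical helper name, and it assumes either macOS (the dyld image APIs) or a glibc-style Linux (`dl_iterate_phdr`). It returns every loaded image whose path contains a given substring, so more than one hit for a name like `libcxxwrap_julia` would point at a duplicate load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

#if defined(__APPLE__)
#include <mach-o/dyld.h>  // _dyld_image_count, _dyld_get_image_name
#else
#include <link.h>         // dl_iterate_phdr (assumes a glibc-style Linux)
#endif

// Hypothetical helper: returns the paths of all images currently loaded in
// this process whose path contains `needle` (an empty needle matches all).
std::vector<std::string> loaded_images_matching(const std::string& needle) {
    std::vector<std::string> matches;
#if defined(__APPLE__)
    const uint32_t count = _dyld_image_count();
    for (uint32_t i = 0; i < count; ++i) {
        const char* name = _dyld_get_image_name(i);
        if (name != nullptr && std::string(name).find(needle) != std::string::npos) {
            matches.emplace_back(name);
        }
    }
#else
    struct Context {
        const std::string* needle;
        std::vector<std::string>* out;
    };
    Context ctx{&needle, &matches};
    dl_iterate_phdr(
        [](struct dl_phdr_info* info, size_t, void* data) -> int {
            assert(info != nullptr);
            Context* c = static_cast<Context*>(data);
            // The main executable is reported with an empty name.
            const std::string name = info->dlpi_name ? info->dlpi_name : "";
            if (name.find(*c->needle) != std::string::npos) {
                c->out->push_back(name);
            }
            return 0;  // a non-zero return would stop the iteration
        },
        &ctx);
#endif
    return matches;
}
```

Calling `loaded_images_matching("libcxxwrap_julia")` just before the crashing call and printing the result would give the same information as the debugger's image listing, under the assumptions above.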

@NHDaly
Contributor Author

NHDaly commented Apr 15, 2019

Hey, @staticfloat, thanks for the response! ❤️!

Agreed, that's a good first step. I actually tried this on Friday; I'm sorry I didn't update this bug! So, as far as I can tell: no, there aren't multiple copies of the library loaded. Which is good, I think.

I used (lldb) image list and indeed it only shows up once. I also tried dtruss (the macOS equivalent of strace) and I see libcxxwrap is only open()ed once. So we just have the one shared library. Check that theory off the list.

@NHDaly
Contributor Author

NHDaly commented Apr 15, 2019

Since this initialization somehow wasn't visible to the C++ library that we're trying to wrap, I decided to just try doing the initialization in C/C++! So I added a ccall in my Julia module to a C function I defined that initializes the CxxWrap library:

#include <iostream>
#include "jlcxx/jlcxx.hpp"  // assumed include, for the jlcxx namespace used below

extern "C" {
// From libcxxwrap
extern void initialize(jl_value_t* julia_module, jl_value_t* cppfunctioninfo_type);

// Initializes libcxxwrap from inside our own C++ library, printing the
// module pointer before and after so the change is visible.
void initialize_cxxwrap(jl_value_t* cxxwrap_module, jl_value_t* cppfunctioninfo_type)
{
  std::cout << "Before setting manually -- jlcxx: " << jlcxx::get_cxxwrap_module() << std::endl;
  initialize(cxxwrap_module, cppfunctioninfo_type);
  std::cout << "After setting manually -- jlcxx: " << jlcxx::get_cxxwrap_module() << std::endl;
}
} // extern "C"

# Julia side: call the C function above from the wrapping module
initialize_cxxwrap() = ccall((:initialize_cxxwrap, _l_jpagerwrapper), Cvoid, (Any, Any), CxxWrap, CxxWrap.CppFunctionInfo)
initialize_cxxwrap()

And now, indeed, my define_julia_module() function does see a value from get_cxxwrap_module(), but crashes immediately after:

Process 21982 launched: '/Users/nathan.daly/src/julia_release_native/usr/bin/julia' (x86_64)
Before setting manually -- jlcxx: 0x0
After setting manually -- jlcxx: 0x10dd55390
define_julia_module(): jlcxx: 0x10dd55390
ERROR: LoadError: LoadError: LoadError: LoadError: LoadError: Failed to find module CxxWrap
Stacktrace:
 [1] top-level scope at none:0 (repeats 5 times)
in expression starting at /Users/nathan.daly/work/raicode2/src/PagerWrap/PagerWrap.jl:75
...
Process 21982 exited with status = 1 (0x00000001)

And, annoyingly, it exits immediately so I can't get a backtrace from lldb.

@NHDaly
Contributor Author

NHDaly commented Apr 15, 2019

Some other ideas we've ruled out:

We thought that maybe the fact that it's crashing without dropping into LLDB means there's a bug during C++ global variable initialization somewhere, but we can't find anything that might be responsible.

Also, I tried checking whether the problem might be coming from spawned child Julia processes, but I don't think there is any forking happening:

(lldb) b fork
Breakpoint 1: where = libsystem_c.dylib`fork, address = 0x000000000000fd6f
(lldb) r --compiled-modules=yes --cpu-target=native --optimize=3 -g0 --output-o=delve.a --track-allocation=none --code-coverage=none --history-file=yes --inline=yes --math-mode=ieee --project=@. --compile=yes --track-allocation=none --sysimage-native-code=yes --sysimage=/Users/nathan.daly/src/julia_release_native/usr/lib/julia/sys.dylib --compiled-modules=yes --optimize=2 /Users/nathan.daly/.julia/packages/PackageCompiler/oT98U/sysimg/run_julia_code.jl
Process 80500 launched: '/Users/nathan.daly/src/julia_release_native/usr/bin/julia' (x86_64)
@wrapmodule
get_cxxwrap_module() = Ptr{Module} @0x0000000000000000
Initialized
get_cxxwrap_module() = Ptr{Module} @0x000000010d49d390
┌ Info: register_julia_module(mod, fptr)
└ @ CxxWrap /Users/nathan.daly/.julia/dev/CxxWrap/src/CxxWrap.jl:153
get_cxxwrap_module() = Ptr{Module} @0x000000010d49d390
5-element Array{Base.StackTraces.StackFrame,1}:
 top-level scope at none:0
 top-level scope at none:0
 top-level scope at none:0
 top-level scope at none:0
 top-level scope at none:0
define_julia_module(): jlcxx: 0x0

signal (11): Segmentation fault: 11
in expression starting at /Users/nathan.daly/work/raicode2/src/PagerWrap/PagerWrap.jl:72


@barche
Collaborator

barche commented Apr 16, 2019

OK, so I managed to reproduce the problem. While working on STL support, I had actually run into a similar problem and already simplified the GC protection code and used a similar trick for initialization as in your PR. You can find it in the STL branches of CxxWrap and libcxxwrap-julia. I haven't been able to test it yet, since I can't seem to figure out how to make PackageCompiler use my locally modified CxxWrap (it appears to just get the latest release).

@NHDaly
Contributor Author

NHDaly commented Apr 17, 2019

> I haven't been able to test it yet, since I can't seem to figure out how to make PackageCompiler use my locally modified CxxWrap (it appears to just get the latest release).

Huh that's surprising. For me, when I use PackageCompiler to just build an executable (via build_executable()) it respects my current environment's Manifest. Is that not happening for you? It should be!

> OK, so I managed to reproduce the problem. While working on STL support, I had actually run into a similar problem and already simplified the GC protection code and used a similar trick for initialization as in your PR. You can find it in the STL branches of CxxWrap and libcxxwrap-julia.

Ooh interesting! I've just been comparing the branches, and it looks like a good improvement, yeah. I'm also interested to try this out! Will try that when I get home tonight. Do you have any idea why the simple fix I tried in this PR didn't work? Does the observed behavior above (setting the value of g_cxxwrap_module from Julia, only to have it still be 0x0 when read from my C++ module) make sense to you?

@barche
Collaborator

barche commented Apr 17, 2019

Aha, I was using compile_package. With build_executable, a simple hello-world test actually works; I haven't tried anything more complicated.

I think the reason your fix doesn't work is that wrapmodule is only called at compile time. In the STL branch, I call the init function from the register_julia_module function, which gets called from both the @wrapmodule and @initcxx macros.

I will backport the required changes from the STL branch to master. STL itself has to wait until #134 is fixed, which requires some major revision.
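
The idea of initializing from the one function that both macros reach can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: everything in the `sketch` namespace is a hypothetical stand-in, not the actual CxxWrap or libcxxwrap-julia code, and a "new process" is simulated by resetting the global.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for the library global the thread calls g_cxxwrap_module.
const void* g_module = nullptr;

// Stand-in for the library's initialize(): records the module handle.
void initialize(const void* module) {
    assert(module != nullptr);
    g_module = module;
}

// Stand-in for register_julia_module: always initializes before doing any
// work, so it does not matter which entry point reached it first.
bool register_module(const void* module) {
    initialize(module);
    return g_module == module;
}

// The two entry points named in the thread, reduced to their shared call.
bool wrapmodule_path(const void* module) { return register_module(module); }  // compile time
bool initcxx_path(const void* module) { return register_module(module); }     // runtime

// Models starting a fresh process, where library globals are zero again.
void simulate_new_process() { g_module = nullptr; }

}  // namespace sketch
```

In this model the global is valid after either path runs, including after the simulated restart, which is the property the shared call is meant to provide.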

@NHDaly
Contributor Author

NHDaly commented Apr 18, 2019

Thanks for the help debugging! :)

> I think the reason your fix doesn't work is that wrapmodule is only called at compile time. In the STL branch, I call the init function from the register_julia_module function, which gets called from both the @wrapmodule and @initcxx macros.

:/ I'm not sure that's it, unfortunately. The segfault itself is happening at compile time, so @initcxx hasn't been called yet.

I tried moving the initialization to register_julia_module as you suggest, but it still crashes the same way (which makes sense, since this is happening at compile time).

I've pushed up all my changes including the debug statements so that hopefully my output will make more sense. Here's what I'm seeing now:

@wrapmodule
get_cxxwrap_module() = Ptr{Module} @0x0000000000000000
┌ Info: register_julia_module(mod, fptr)
└ @ CxxWrap /Users/nathan.daly/.julia/dev/CxxWrap/src/CxxWrap.jl:153
get_cxxwrap_module() = Ptr{Module} @0x0000000000000000
Initialized
get_cxxwrap_module() = Ptr{Module} @0x00000001148d6980
5-element Array{Base.StackTraces.StackFrame,1}:
 top-level scope at none:0
 top-level scope at none:0
 top-level scope at none:0
 top-level scope at none:0
 top-level scope at none:0

signal (11): Segmentation fault: 11
in expression starting at /Users/nathan.daly/work/raicode2/src/PagerWrap/PagerWrap.jl:72

> Aha, I was using compile_package. With build_executable, a simple hello-world test actually works; I haven't tried anything more complicated.

:) I'm glad that helped, and I'm glad you got something working. Hopefully we can figure out what's making our situation more complicated...

> I will backport the required changes from the STL branch to master. STL itself has to wait until #134 is fixed, which requires some major revision.

That would be fantastic! :) I tried the stl branch myself, but it doesn't build:

ERROR: LoadError: LibraryProduct(nothing, ["libcxxwrap_julia_stl"], :libcxxwrap_julia_stl, "Prefix(/Users/nathan.daly/.julia/dev/CxxWrap/deps/usr)") is not satisfied, cannot generate deps.jl!

Sorry, I'm sure I just don't know how to set up the build so that the binary dependency gets built correctly.

barche added a commit that referenced this pull request Apr 19, 2019
barche added a commit that referenced this pull request Apr 20, 2019
barche added a commit that referenced this pull request Apr 20, 2019
@barche
Collaborator

barche commented Apr 20, 2019

I'm not sure why it doesn't work. I merged the GC-related changes from STL into master and made a release, though this doesn't seem to be in the registry yet. You should be able to test it using latest master, it should get the correct libcxxwrap-julia binaries.

@NHDaly
Contributor Author

NHDaly commented Apr 21, 2019

Thanks for letting me know. Yeah, I was able to get it to run, but sadly it still segfaults in the same way. :(

A coworker and I are going to try to look through it more carefully on Monday with a fine-toothed comb. I'll let you know if we find anything!

@barche
Collaborator

barche commented Apr 21, 2019

Do you have a test that reproduces the problem?

@NHDaly
Contributor Author

NHDaly commented Apr 23, 2019

@barche Sorry I don't have a minimal working example together yet. I'm still working on it.

In the meantime, I think the problem would be fairly simple to diagnose if we had a debug build of libcxxwrap_julia.dylib. We should be able to just put a watchpoint on the variable and see if it's reset back to 0, for example. However, the release (https://github.com/JuliaInterop/libcxxwrap-julia/releases/tag/v0.5.3) doesn't contain a debug build of the library.

Would you be able to help us build one? It would be super awesome if we could set things up to upload debug builds from binary builder as well so that they will always be created in the future, but I'd be happy to do it manually for now if I could figure out how to get it to build.

Thanks for your help, @barche!
~Nathan

@barche
Collaborator

barche commented Apr 23, 2019

OK, to build yourself it should normally just be a matter of building libcxxwrap-julia as any other CMake project, using the Julia_PREFIX CMake variable to indicate which Julia to build against. Then, in Julia, set the environment variable JLCXX_DIR to point to the build dir of libcxxwrap-julia, and Pkg.build the CxxWrap package again.

@NHDaly
Contributor Author

NHDaly commented Apr 25, 2019

OKAY! Good news: your change fixed my segfault, and I have some understanding to share about static compilation! :)

First: Indeed, your fix in #137 fixed the segfault I was seeing. Thank you! I believe that the solution presented in this PR would have fixed it as well. It turns out that I wasn't able to see the fix because it was being shadowed by a second bug:

Second: It turns out that all the symptoms I was seeing here and in JuliaInterop/libcxxwrap-julia#24 are actually the result of a different bug: apparently, when I have a different version of CxxWrap in my local Project.toml than in the top-level v1.1 environment, they both get loaded! So I did have two different versions of the library loaded, and somehow the C++ was seeing one while the Julia was seeing the other. But this only happens during static compilation. I'm guessing this is a bug in PackageCompiler.jl, but it might be in Julia itself. I am going to start investigating this now.

So in this case, because I had dev'd CxxWrap in my local Project while trying to fix the bug, I wasn't able to see that your PR fixed it because I was running into this double-library loading issue.

Thanks for the help. I will close this PR now. I'll keep commenting here on my investigation into the double-library-loading until I can find the right place to open an issue! 😅 Wow! Thanks again! 😊

@NHDaly NHDaly closed this Apr 25, 2019
@NHDaly NHDaly deleted the nhdaly-staticCompilation branch April 25, 2019 14:53
@NHDaly
Contributor Author

NHDaly commented Apr 25, 2019

Aha! I think i'm finally beginning to unravel the situation!

When we build our C++ library, we hardcode the path to libcxxwrap_julia.dylib, which I didn't realize. So when I change which library to load from Julia, the C++ library still tries to load the same hardcoded library. If those differ, we get two instances of the library loaded.

I still don't fully understand what is different about static compilation, but something I'm doing allows it to load both libs when static compiling, whereas for regular builds Julia's loaded library is somehow shared with the C++ library, even though it has a hardcoded path to a different lib.

We set the path by setting -DCMAKE_PREFIX_PATH="$jlcxx_dir" when compiling our own C++ library.
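
The symptom seen earlier in the thread (Julia sets the module pointer, yet the C++ side still reads 0x0) follows directly from two copies of the library each carrying their own globals. Below is a deliberately simplified model of that situation. It assumes a loader that distinguishes copies purely by path, which is cruder than what a real dynamic loader does, and all names and paths are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

namespace sketch {

// One loaded copy of a shared library, with its own private global state.
struct LibraryImage {
    std::string path;
    const void* cxxwrap_module = nullptr;  // models the library's module global
};

// Models a process-wide loader that keys images by the path they were opened
// with, so two different paths yield two independent sets of globals.
class Loader {
public:
    LibraryImage& open(const std::string& path) {
        assert(!path.empty());
        LibraryImage& image = images_[path];  // default-constructed on first open
        image.path = path;
        return image;
    }

    std::size_t image_count() const { return images_.size(); }

private:
    std::map<std::string, LibraryImage> images_;
};

}  // namespace sketch
```

If the Julia side opens one path and the wrapped C++ library opens another, initializing the first image leaves the second image's `cxxwrap_module` null, which matches the `jlcxx: 0x0` output above. Opening the same path twice returns the same image, which is the behavior the fix relies on.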

@barche
Collaborator

barche commented Apr 28, 2019

OK, great to hear you figured it out. The library to use is stored in the deps/deps.jl file, which is generated at Pkg.build time and at that time also uses the JLCXX_DIR environment variable.

@barche barche mentioned this pull request Apr 28, 2019
@NHDaly
Contributor Author

NHDaly commented Apr 29, 2019

Soooooooo just as an update:

This PR definitely fixed the build on my mac. However, it's still broken (now in a different way).

On Linux, I segfault when static compiling, with this GC error:

GC error (probable corruption) :
Allocations: 5285812 (Pool: 5284453; Big: 1359); GC: 10
<?#0x7fffe8dcc530::<circular reference @-1>>
0x7fffe208a010: Queued root: 0x7fffe82c20e0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a028: Queued root: 0x7fffe82c1be0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a040: Queued root: 0x7fffe66f96e0 :: 0x7fffe6518080 (bits: 3)
        of type Core.TypeName
0x7fffe208a058: Queued root: 0x7fffe82c0c40 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a070: Queued root: 0x7fffe6815a50 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a088: Queued root: 0x7fffe6815c30 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a0a0: Queued root: 0x7fffe82c25e0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a0b8: Queued root: 0x7fffe8766810 :: 0x7fffe6519c10 (bits: 3)
        of type Core.MethodInstance
0x7fffe208a0d0: Queued root: 0x7fffe8766890 :: 0x7fffe6519c10 (bits: 3)
        of type Core.MethodInstance
0x7fffe208a0e8: Queued root: 0x7fffe89bf810 :: 0x7fffe6519c10 (bits: 3)
        of type Core.MethodInstance
0x7fffe208a100: Queued root: 0x7fffe74dc670 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a118: Queued root: 0x7fffe79290a0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a130: Queued root: 0x7fffe89e0b50 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a148: Queued root: 0x7fffe7929000 :: 0x7fffe65181d0 (bits: 3)
        of type Core.MethodTable
0x7fffe208a160: Queued root: 0x7fffe82c04c0 :: 0x7fffe6518080 (bits: 3)
        of type Core.TypeName
0x7fffe208a178: Queued root: 0x7fffe6815af0 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a190: Queued root: 0x7fffe829f1c0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a1a8: Queued root: 0x7fffe6814f10 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a1c0: Queued root: 0x7fffe829f120 :: 0x7fffe65181d0 (bits: 3)
        of type Core.MethodTable
0x7fffe208a1d8: Queued root: 0x7fffe89e1c30 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a1f0: Queued root: 0x7fffe7937030 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a208: Queued root: 0x7fffe89e1cd0 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a220: Queued root: 0x7fffe82c0740 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a238: Queued root: 0x7fffe6815870 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a250: Queued root: 0x7fffe68157d0 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a268: Queued root: 0x7fffe6815730 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a280: Queued root: 0x7fffe68155f0 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a298: Queued root: 0x7fffe6815410 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a2b0: Queued root: 0x7fffe829ff80 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a2c8: Queued root: 0x7fffe6815550 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a2e0: Queued root: 0x7fffe6814fb0 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a2f8: Queued root: 0x7fffe7928c90 :: 0x7fffe6d35eb0 (bits: 3)
        of type Base.Dict{UInt64, Tuple{Any, Int64}}
0x7fffe208a310: Queued root: 0x7fffe6d8eef0 :: 0x7fffe6518080 (bits: 3)
        of type Core.TypeName
0x7fffe208a328: Queued root: 0x7fffef7e3750 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a340: Queued root: 0x7fffef7e3990 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a358: Queued root: 0x7fffee638610 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a370: Queued root: 0x7fffef7e36c0 :: 0x7fffe6518be0 (bits: 3)
        of type Core.TypeMapLevel
0x7fffe208a388: Queued root: 0x7fffeed695e0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a3a0: Queued root: 0x7fffe8c0f190 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a3b8: Queued root: 0x7fffed08b550 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a3d0: Queued root: 0x7fffee02e160 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a3e8: Queued root: 0x7fffee55c270 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a400: Queued root: 0x7fffee0711c0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a418: Queued root: 0x7fffedf88a20 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a430: Queued root: 0x7fffee691d00 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a448: Queued root: 0x7fffee69a1e0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a460: Queued root: 0x7fffefaaeac0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a478: Queued root: 0x7fffefab0dd0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a490: Queued root: 0x7fffec0cf460 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a4a8: Queued root: 0x7fffee6939c0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a4c0: Queued root: 0x7fffed126120 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a4d8: Queued root: 0x7fffed126620 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a4f0: Queued root: 0x7fffee68df80 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a508: Queued root: 0x7fffec42d5f0 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a520: Queued root: 0x7fffec7d2990 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a538: Queued root: 0x7fffefacc220 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a550: Queued root: 0x7fffefad0850 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a568: Queued root: 0x7fffedc911b0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a580: Queued root: 0x7fffe89e1b90 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a598: Queued root: 0x7fffe79359b0 :: 0x7fffe65181d0 (bits: 3)
        of type Core.MethodTable
0x7fffe208a5b0: Queued root: 0x7fffe89e1a50 :: 0x7fffe6519ba0 (bits: 3)
        of type Method
0x7fffe208a5c8: Queued root: 0x7fffeeb41cd0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a5e0: Queued root: 0x7fffed350be0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a5f8: Queued root: 0x7fffed6db590 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a610: Queued root: 0x7fffefb486d0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a628: Queued root: 0x7fffecb45890 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a640: Queued root: 0x7fffec513760 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a658: Queued root: 0x7fffedcc6320 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a670: Queued root: 0x7fffefa51920 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a688: Queued root: 0x7fffefa1c130 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a6a0: Queued root: 0x7fffeed4e2c0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a6b8: Queued root: 0x7fffed57c500 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a6d0: Queued root: 0x7fffee64b430 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a6e8: Queued root: 0x7fffedb54840 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a700: Queued root: 0x7fffedce03a0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a718: Queued root: 0x7fffedefc680 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a730: Queued root: 0x7fffedec7000 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a748: Queued root: 0x7fffe7cc3af0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a760: Queued root: 0x7fffe7ca0f10 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a778: Queued root: 0x7fffedc369e0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a790: Queued root: 0x7fffed947240 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a7a8: Queued root: 0x7fffed28ec70 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a7c0: Queued root: 0x7fffeeee3df0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a7d8: Queued root: 0x7fffee644970 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a7f0: Queued root: 0x7fffee63c0d0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a808: Queued root: 0x7fffed1d0870 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a820: Queued root: 0x7fffee643ca0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a838: Queued root: 0x7fffefac9540 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a850: Queued root: 0x7fffede10e70 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a868: Queued root: 0x7fffeca9fad0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a880: Queued root: 0x7fffecaa0c30 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a898: Queued root: 0x7fffe7ca0df0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a8b0: Queued root: 0x7fffe7ca0d90 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a8c8: Queued root: 0x7fffeec387f0 :: 0x7fffe6518be0 (bits: 3)
        of type Core.TypeMapLevel
0x7fffe208a8e0: Queued root: 0x7fffec4452b0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a8f8: Queued root: 0x7fffede613e0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a910: Queued root: 0x7fffe74dc6d0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a928: Queued root: 0x7fffee3fd3c0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a940: Queued root: 0x7fffe758a350 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a958: Queued root: 0x7fffee2273d0 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a970: Queued root: 0x7fffee892500 :: 0x7fffe6519040 (bits: 3)
        of type Array{Any, 1}
0x7fffe208a988: Queued root: 0x7fffe8767510 :: 0x7fffe6519c10 (bits: 3)
        of type Core.MethodInstance
0x7fffe208a9a0:  r-- Module (bindings) 0x7fffe693d120 (bits 3) -- [0x82a46a8, 0x82a4b20)

signal (6): Aborted
in expression starting at /tmp/nix-build-raicode.drv-0/builddir/Delve.app/res/Delve/src/PagerWrap/PagerWrap.jl:72

On the mac, it does successfully statically compile! But if I try to include CxxWrap functions in precompile statements generated by snooping, it segfaults when executing those precompile statements, such as this one:

try;precompile(Tuple{typeof(CxxWrap.register_julia_module), Module}); catch e; @debug "couldn't precompile statement 82" exception = e; end

I can try to get a more detailed backtrace, but for now I've just commented those lines out of my generated precompile.jl and am focusing only on macOS.

@NHDaly
Contributor Author

NHDaly commented Apr 30, 2019

Hmm, now I'm seeing that same GC error on mac as well.

I'm pretty sure the segfault is actually occurring inside the JL_GC_PUSH1 call here:
https://github.com/JuliaInterop/libcxxwrap-julia/blob/3818cbe8615c41d7f1d9b557cb85e66953494cde/src/jlcxx.cpp#L22-L24

When stepping through the assembly, it crashes at the symbol stub for: jl_get_ptls_states:

(lldb) nexti
Process 12000 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x000000011b8db043 libcxxwrap_julia.0.5.3.dylib`jlcxx::protect_from_gc(_jl_value_t*) + 35
libcxxwrap_julia.0.5.3.dylib`jlcxx::protect_from_gc:
->  0x11b8db043 <+35>: callq  0x11b8df892               ; symbol stub for: jl_get_ptls_states
    0x11b8db048 <+40>: movq   %rax, %rbx
    0x11b8db04b <+43>: movq   (%rbx), %rax
    0x11b8db04e <+46>: movq   %rax, -0x28(%rbp)
Target 0: (julia) stopped.
(lldb) nexti

signal (11): Segmentation fault: 11
in expression starting at /Users/nathan.daly/work/builddir/Delve.app/Contents/Resources/Delve/src/PagerWrap/PagerWrap.jl:72
unknown function (ip: 0xffffffffffffffff)
Allocations: 2837112 (Pool: 2836190; Big: 922); GC: 5


That also seems consistent with the error message describing `GC` problems.

(I checked, also, and `g_protect_from_gc` seems fine):
```gdb
(lldb) image lookup -v -s jlcxx::g_protect_from_gc
1 symbols match 'jlcxx::g_protect_from_gc' in /Users/nathan.daly/work/builddir/Delve.app/Contents/Resources/dotjulia/packages/CxxWrap/4xNLt/deps/usr/lib/libcxxwrap_julia.0.5.3.dylib:
        Address: libcxxwrap_julia.0.5.3.dylib[0x000000000000e688] (libcxxwrap_julia.0.5.3.dylib.__DATA.__common + 16)
        Summary: libcxxwrap_julia.0.5.3.dylib`jlcxx::g_protect_from_gc
         Module: file = "/Users/nathan.daly/work/builddir/Delve.app/Contents/Resources/dotjulia/packages/CxxWrap/4xNLt/deps/usr/lib/libcxxwrap_julia.0.5.3.dylib", arch = "x86_64"
         Symbol: id = {0x00000078}, range = [0x000000011b8e2688-0x000000011b8e2690), name="jlcxx::g_protect_from_gc", mangled="_ZN5jlcxx17g_protect_from_gcE"

(lldb) print _ZN5jlcxx17g_protect_from_gcE
(void *) $2 = 0x00000001211601b0
```

Although I'm not sure which C API functions to call to actually verify that it is a valid function pointer. So I think the problem is with the GC.

Is it weird that we're calling GC functions in C, and then calling this Julia function to do GC operations?

@barche
Collaborator

barche commented Apr 30, 2019

The protection functions are made cfunctions in initialize_cxx_lib. Maybe it's a problem that they are local variables? I'm not sure whether the cfunction result itself can get GC'ed.

@NHDaly
Contributor Author

NHDaly commented May 8, 2019

> maybe it's a problem they are local variables

Yeah, I'm wondering the same thing. That's my current best guess! The examples with @cfunction in the docs have it as a global variable, but there aren't many examples.

This section describing using the @cfunction($foo, ...) syntax to create a Functor/Closure explicitly says they'll be GC'd, but it's unclear if that's only referring to the specific usage with $ or all return values of @cfunction:
https://docs.julialang.org/en/v0.7/manual/calling-c-and-fortran-code/#Closure-cfunctions-1

> You must ensure that this return object is kept alive until all uses of it are done. The contents and code at the cfunction pointer will be erased via a finalizer when this reference is dropped and atexit.

I guess it couldn't hurt to try it! I'll give it a shot in the next couple of days! :)
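
The lifetime concern in the quoted docs can be modeled in plain C++. This is a sketch of the failure mode only, not of how Julia implements @cfunction: `make_callback` stands in for the owning return object, `Registry` stands in for the C++ side that stores a non-owning reference, and whether plain (non-closure) @cfunction results can actually be collected is exactly the open question above. With a raw pointer there would be no `expired()` check, just a crash.

```cpp
#include <cassert>
#include <functional>
#include <memory>

namespace sketch {

using Callback = std::function<int(int)>;

// Plays the role of the object returned by the callback factory: while this
// owner is alive, the callback is valid.
std::shared_ptr<Callback> make_callback() {
    return std::make_shared<Callback>([](int x) { return x + 1; });
}

// Plays the role of the C++ library: it keeps only a non-owning reference.
struct Registry {
    std::weak_ptr<Callback> protect;

    void register_callback(const std::shared_ptr<Callback>& owner) {
        assert(owner != nullptr);
        protect = owner;
    }

    bool usable() const { return !protect.expired(); }

    // Returns -1 when the owner has already been dropped.
    int call(int x) const {
        std::shared_ptr<Callback> cb = protect.lock();
        return cb ? (*cb)(x) : -1;
    }
};

}  // namespace sketch
```

Registering an owner held in a local scope leaves the registry unusable once that scope ends, while an owner that outlives every call (the "global variable" option discussed above) keeps it working.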

@NHDaly
Contributor Author

NHDaly commented May 9, 2019

So, I tried moving them to be global variables in the CxxWrap.jl module, but when they're used later, we run into the problem that Julia has zeroed out the Ptrs after precompiling the module... :/ sigh.

Can we maybe just change these in the C++ library to not be stored as global variables at all, but rather have them passed in as locals for every function call from Julia to the C++? It might be a fair amount of plumbing work, but I think that would solve this problem once and for all! What would you think about that, @barche?
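
To make the proposal concrete, here is a rough sketch of the two shapes side by side. It is an illustration under assumptions, not a patch: `GcHooks`, `add_type_with_global`, and `add_type_with_hooks` are hypothetical names, and the real callback signatures in libcxxwrap-julia may differ.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical pair of GC callbacks supplied by the Julia side.
struct GcHooks {
    void (*protect)(void* value);
    void (*unprotect)(void* value);
};

// Current shape: a library-level global that must be set, and stay valid,
// before any call that needs it.
const GcHooks* g_hooks = nullptr;

// Returns false when the global was never set (real code would crash instead).
bool add_type_with_global(void* type) {
    if (g_hooks == nullptr) {
        return false;
    }
    g_hooks->protect(type);
    return true;
}

// Proposed shape: the caller hands the hooks in on every call, so there is
// no hidden state for precompilation or a second library copy to invalidate.
void add_type_with_hooks(const GcHooks& hooks, void* type) {
    assert(hooks.protect != nullptr);
    hooks.protect(type);
}

// Tiny counting hooks, present only so the sketch can be exercised.
int g_protected_count = 0;
void count_protect(void*) { ++g_protected_count; }
void count_unprotect(void*) { --g_protected_count; }

}  // namespace sketch
```

The trade-off is the plumbing mentioned above: every function that may need GC protection gains an extra parameter, in exchange for no longer depending on initialization order.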
