Modularization/"Librarization" - create toolchain to optionally build custom kernel tailored to specific hypervisor or app #1110

@wkozaczuk

Currently, the OSv kernel provides a significant subset of the functionality of several standard Linux libraries, listed here - https://github.com/cloudius-systems/osv#kernel-size. In reality, many applications do not need all of this functionality, but they "get it" whether they need it or not. Even Java, which used to need lots of symbols from standard libraries, has become far more modular, and with the advent of GraalVM and other AOT-type technologies, the OSv kernel does not need to provide all this functionality universally to every app. Worse, if you run an app on Firecracker, which needs only the console and the non-PCI virtio-blk and virtio-net drivers, you still get all the other drivers, including the ones for VirtualBox, Xen, VMware, etc. This makes OSv barely a unikernel, or at best a "fat" one. It has real negative consequences: higher memory utilization (the kernel needs to be loaded in memory), a larger kernel file (which makes decompression take longer), poorer security because of the vast number of exported symbols (at this moment everything non-static gets exported), and possibly less optimized code. On the other hand, because of this "universality", it is quite easy, compared to other unikernels, to run an arbitrary Linux app on OSv. No matter what we do to make OSv more modular, we should preserve that "ease" and not make it harder, at least by default, to run an app on OSv.

So in general, what I am advocating for is the ability (and a mechanism) to create more "stripped-down" kernels tailored to the needs of a specific app and/or the specific hypervisor OSv will run on, while preserving the default universal kernel. I also propose shrinking the universal kernel by extracting optional functionality from it, where it makes sense and is relatively easy to do, into shared libraries to be loaded during the boot process. The latter should ideally also involve build process (compile/link) optimizations.

In the end, what I am proposing could be organized into the following three categories:

  • Tailor the kernel (and really the drivers) to a specific hypervisor - this could be as simple as defining more granular sets of targets in the main makefile, adding #ifdef in all relevant places, and possibly using the existing ./conf/*.mk-based mechanism. For starters, we could define a build configuration for Firecracker and the QEMU microvm machine, which I believe require the same small subset of drivers. These could be called profiles.
  • Extract optional functionality into shared libraries - this is more difficult than the above. One example of such functionality is ZFS, and there is already an open issue for it. Some drivers could be extracted as libraries as well, but that might be more difficult to do. The main difficulty here is that a filesystem needs to be mounted early enough in the boot process to load such a library from - bootfs (less attractive, as it is part of loader.elf/kernel.elf) or ROFS.
  • Create a mechanism to build a smaller kernel "tailored" to a specific app. This would require some sort of ELF analyzer tool that would identify all symbols needed by the given app and its dependencies and create a version script file defining the specific set of symbols to be exported from the kernel. To achieve that, we could start by addressing the issue "Be more selective on symbols exported from the kernel", which could deliver such a generic solution.
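To make the last item more concrete, here is a minimal sketch of the final step of such an analyzer: turning a set of symbols the app needs into a linker version script. It is only an illustration, not an existing OSv tool. The function name `make_version_script` and the sample symbol sets are hypothetical, and the sketch assumes the two input sets were already collected by some other means (for example by listing the undefined dynamic symbols of the app and its dependencies, and the symbols defined by the kernel).

```python
def make_version_script(needed, kernel_exports):
    """Build GNU ld version-script text that exports only the needed symbols.

    Returns (script_text, missing), where `missing` lists needed symbols that
    the kernel does not define (they would have to come from another library).
    """
    needed, kernel_exports = set(needed), set(kernel_exports)
    exported = sorted(needed & kernel_exports)   # keep these visible
    missing = sorted(needed - kernel_exports)    # kernel cannot satisfy these

    lines = ["{"]
    if exported:
        lines.append("  global:")
        lines.extend(f"    {name};" for name in exported)
    # Everything not listed above becomes local, i.e. is no longer exported.
    lines.append("  local:")
    lines.append("    *;")
    lines.append("};")
    return "\n".join(lines) + "\n", missing


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    app_needs = {"open", "read", "malloc", "my_app_helper"}
    kernel_defines = {"open", "read", "write", "malloc", "free"}

    script, missing = make_version_script(app_needs, kernel_defines)
    print(script)
    print("not provided by the kernel:", missing)
```

The resulting text could then be handed to the linker as a version script when linking the kernel, so that only the listed symbols stay exported. How this would plug into the OSv makefile is left open here.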

Addressing the third category could also help us with another issue - Combining pre-compiled OSv kernel with pre-compiled executable. To that end, we could also consider creating a mechanism to build a stripped-down version of the kernel that exposes its functionality only through the SYSCALL instruction, with no built-in musl or libc (except for the dynamic linker functions such as dlopen), and lets one mix in the original pre-built musl library, which would interact with the kernel through those SYSCALL calls. This would probably require exposing more functions as SYSCALL than we have now in linux.cc - at least brk and clone. I am not sure if that is even feasible, but I think at least one of the unikernels does just this - Hermitux.
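As a rough way to size that work, one could compare the system calls a pre-built libc issues against the ones the kernel already handles. The sketch below assumes both lists are available as plain sets of names. The function `missing_syscalls` and the sample sets are hypothetical and do not reflect the actual contents of linux.cc.

```python
def missing_syscalls(required, implemented):
    """Return, sorted by name, the syscalls that are required but not implemented."""
    return sorted(set(required) - set(implemented))


if __name__ == "__main__":
    # Hypothetical sets, for illustration only.
    required_by_libc = {"read", "write", "mmap", "brk", "clone"}
    handled_by_kernel = {"read", "write", "mmap"}

    for name in missing_syscalls(required_by_libc, handled_by_kernel):
        print("would need to be exposed as SYSCALL:", name)
```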

Please note that addressing this issue depends on #97.
