This is more of an art than anything pragmatic.
Useful if you are into low-level stuff and want to see ELF32 / dynamic linking / PIC relocation in action.
Not useful if you want to copy and paste some code that just works.
- No OS, no virtual memory, no threads/processes, no anything
- `arm-none-eabi-xx` toolchain for compiling and linking
- Custom pi-side bootloader that receives binaries through UART
- Custom unix-side bootloader that sends binaries through UART
- `libpi.so`: a custom set of drivers and a bare-metal libc implementation. The static version of this library is used for CS 140E @ Stanford University
- Custom FAT32 driver
- Have `libpi.so`, or any other dynamic libraries, reside at the pi's FAT32 filesystem root
- Send the binary (our main executable, compiled to ELF32) through the pi's UART
- Have the bootloader parse ELF32 and load the program
- Upon facing an unresolved symbol, our dynamic linker is automatically called and it loads, relocates, and links the dynamic libraries accordingly.
- `1-libpi.so`: the dynamic library to test on. Contains source and build system (a.k.a. makefile)
- `2-my-dynamic-linker`: the main folder. Contains FAT32 driver, ELF32 reader, and dynamic linker/loader.
- `3-my-unix-bootloader`: the unix-side bootloader that sends programs through UART
- `4-my-pi-bootloader`: the pi-side bootloader that receives programs through UART
- `5-tests`: a bunch of main executables we can send through UART for testing
- `README.md`: this file
- `docs`: relevant docs you can look into
- The executable is an ELF binary. So we parse it.
- We fill the third entry (each entry is 4B) in the .got.plt section with the address of the dynamic linking function (which is part of the pi-side bootloader binary). An ELF binary that requires dynamic linking is compiled such that the program eventually jumps to this value when it reaches an unresolved symbol.
- We read the .dynamic section, fetch the list of required shared libraries, load them into memory, and keep track of which addresses they were loaded at. Panic if any of them are not found on the FAT32 FS (another approach would be to do lazy loading, and load the files only when they are needed).
- We also keep track of the starting addresses of the following sections for dynamic symbol resolution later: .rel.dyn, .dynsym, .dynstr
- We copy the contents of the modified ELF executable, starting at file offset 0x8000, to the physical memory address 0x8000. The difference from our original bootloader is that we cannot just copy the .text, .rodata, and .data sections, because we need the other sections to resolve symbols at runtime.
- We set PC = 0x8000
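The two bookkeeping steps above (collecting the required libraries from .dynamic and planting the resolver address in .got.plt) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `collect_needed` and `install_resolver` are hypothetical names, and the sketch assumes the ELF reader has already located the .dynamic, .dynstr, and .got.plt sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF32 dynamic-section entry, laid out as in the ELF specification. */
typedef struct {
    int32_t  d_tag;  /* entry type */
    uint32_t d_val;  /* integer value or address, depending on d_tag */
} Elf32_Dyn;

enum { DT_NULL = 0, DT_NEEDED = 1 };

/* Hypothetical helper: walk .dynamic until DT_NULL and store a pointer to
 * the name of every DT_NEEDED library. For DT_NEEDED, d_val is a byte
 * offset into .dynstr. Returns the number of names written to `out`. */
size_t collect_needed(const Elf32_Dyn *dyn, const char *dynstr,
                      const char **out, size_t max_out)
{
    size_t n = 0;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_NEEDED && n < max_out)
            out[n++] = dynstr + dyn->d_val;
    }
    return n;
}

/* Hypothetical helper: write the resolver's address into the third
 * .got.plt entry (index 2, entries are 4 bytes each). */
void install_resolver(uint32_t *got_plt, uint32_t resolver_addr)
{
    assert(got_plt != NULL);
    got_plt[2] = resolver_addr;
}
```

Each returned name would then be handed to the FAT32 driver to find and load the corresponding file.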
- The shared library is a position-independent ELF binary. We must load and relocate its symbols properly.
- First, we load the contents of the binary to the correct "virtual addresses" based on the ELF Program Header Table. We can't just copy the entire file to a contiguous memory region, because the ELF binary might have intentional gaps between sections, defined through the program headers.
- We perform load-time relocations (since the shared library is PIC) based on the relocation sections of the ELF binary (.rel.dyn, .rel.plt). This mainly means finding symbol locations and filling in their addresses in the .got section.
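A rough sketch of these two load steps follows. It is an illustration under assumptions rather than the project's implementation: the function names are made up, `dest` stands for the memory the library is loaded into, and only the simplest relocation type (R_ARM_RELATIVE, which adds the load address to a stored word) is shown. Symbol-based types such as R_ARM_GLOB_DAT additionally need a symbol lookup and are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ELF32 program header and REL-style relocation entry. */
typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_flags, p_align;
} Elf32_Phdr;

typedef struct { uint32_t r_offset, r_info; } Elf32_Rel;

enum { PT_LOAD = 1, R_ARM_RELATIVE = 23 };

/* Copy every PT_LOAD segment of `file` to dest + p_vaddr, then zero the
 * tail (p_memsz - p_filesz), which is where .bss ends up. */
void load_segments(uint8_t *dest, const uint8_t *file,
                   const Elf32_Phdr *ph, unsigned phnum)
{
    for (unsigned i = 0; i < phnum; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        assert(ph[i].p_filesz <= ph[i].p_memsz);
        memcpy(dest + ph[i].p_vaddr, file + ph[i].p_offset, ph[i].p_filesz);
        memset(dest + ph[i].p_vaddr + ph[i].p_filesz, 0,
               ph[i].p_memsz - ph[i].p_filesz);
    }
}

/* Apply R_ARM_RELATIVE entries: the word at r_offset holds a link-time
 * address, so the load address is added to it. `base_addr` is the address
 * the running program sees for `dest` (on the pi, the same value). */
void apply_relative(uint8_t *dest, uint32_t base_addr,
                    const Elf32_Rel *rel, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if ((rel[i].r_info & 0xff) != R_ARM_RELATIVE)
            continue;  /* other relocation types are not handled here */
        uint32_t word;
        memcpy(&word, dest + rel[i].r_offset, sizeof word);
        word += base_addr;
        memcpy(dest + rel[i].r_offset, &word, sizeof word);
    }
}
```

Passing `base_addr` separately from `dest` is only there so the sketch can be exercised on a host machine against a plain buffer.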
- When our executable hits an unresolved symbol, it first jumps to the .plt section, which holds stub code that jumps to the address stored in the third entry of the .got.plt section. This is how our dynamic linking function gets called.
- At this point, we have the following information:
- r0, r1, r2, r3: holds the arguments to the dynamically linked function.
- lr: holds the address of the third entry of the .got.plt section, which holds the address of the dynamic linker entry function.
- r12: holds the address of the .got.plt entry that corresponds to the unresolved symbol. Our goal is to fill this entry with the actual address of the symbol. We can also calculate the index of the dynamic symbol table entry that we need to resolve from this address.
- stack: holds one 4B value, which is the return address. After we fill in the address at r12, and call the actual function that the original program meant to call, we need to store this value into the lr register, so that the program can return back to where it was. The stack also might hold additional arguments to the dynamically linked function.
-------------
| arg n |
-------------
| ... |
-------------
| arg 6 |
-------------
| arg 5 |
-------------
| lr |
------------- <- sp
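As a small illustration of how r12 and lr identify the symbol, the sketch below turns the two register values into a zero-based slot index. It assumes the conventional layout in which the first three .got.plt entries are reserved and the relocation entries in .rel.plt appear in the same order as the remaining slots. `plt_reloc_index` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* lr  = address of .got.plt[2] (the resolver slot)
 * r12 = address of the .got.plt slot of the symbol being resolved
 * Slots are 4 bytes wide and the first PLT symbol uses .got.plt[3],
 * so the zero-based index is the slot distance from lr, minus one. */
uint32_t plt_reloc_index(uint32_t r12, uint32_t lr)
{
    assert(r12 > lr && (r12 - lr) % 4 == 0);
    return (r12 - lr - 4) / 4;
}
```

For example, with .got.plt at 0x9000, lr is 0x9008, and a call through the slot at 0x9014 yields index 2.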
Using these values, we perform dynamic linking as follows.
- Push general registers to the stack.
- Jump to the executable's relocation tables, and find the relocation entry whose OFFSET field equals the address stored in r12. We can quickly calculate this by finding the current entry's index in the global offset table.
- Read this entry's INFO field and shift it right by 8 bits (the low 8 bits hold the relocation type). The result is the index in the dynamic symbol table.
- Jump to the executable's .dynsym section and go to the index we found (note that each entry is 16B). Read the first 4 bytes of the entry. This is the byte offset of the symbol's name in the dynamic string table.
- Jump to executable's .dynstr section, go to the index we found. Read the characters until you reach a null terminator. This is the name of the symbol.
- Hash this string according to the ELF32 specification.
- Iterate through the list of shared libraries that we loaded in the previous part, and jump to each library's .hash section. Find the bucket that corresponds to the hash we calculated (hash collisions must be handled by following the chain list in the .hash section). The bucket holds an index into the library's .dynsym section. Jump to this index, read the entry's name offset, read the name from the library's .dynstr section, and compare it with the name we are looking for. If it matches, we found the symbol. If not, continue to the next shared library.
- Jump back to the .dynsym table entry. The entry's .st_value field contains the address of the symbol.
- Pop r12 from the stack
- Store the resolved symbol address to the address stored in r12.
- Pop general registers from the stack.
- Pop the last item from the stack to the lr register, as well as general args if any.
- Pass control to the resolved symbol.
- Now the program will continue running with the resolved symbol, as if none of this ever happened. When the resolved function returns, control goes back to our main executable. The main executable's .got section now holds the resolved address, so the dynamic linker will not be called again the next time the program reaches the same symbol.
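The lookup part of the steps above (relocation entry to symbol index, name hashing, and the .hash bucket/chain walk) can be sketched in C as below. This is a minimal sketch, assuming the libraries carry a classic SysV .hash section rather than a GNU-style one. `rel_sym` and `lookup_symbol` are hypothetical names, and the register save/restore around the lookup would have to be written in assembly, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t r_offset, r_info; } Elf32_Rel;

typedef struct {              /* 16 bytes per entry */
    uint32_t st_name, st_value, st_size;
    uint8_t  st_info, st_other;
    uint16_t st_shndx;
} Elf32_Sym;

/* The symbol-table index lives in the upper 24 bits of r_info. */
uint32_t rel_sym(const Elf32_Rel *r) { return r->r_info >> 8; }

/* Hash function from the ELF specification. */
uint32_t elf_hash(const char *name)
{
    uint32_t h = 0, g;
    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* .hash layout: nbucket, nchain, bucket[nbucket], chain[nchain].
 * Start at bucket[hash % nbucket] and follow chain[] until index 0.
 * Entries with st_shndx == 0 are undefined references, not definitions. */
const Elf32_Sym *lookup_symbol(const uint32_t *hash, const Elf32_Sym *dynsym,
                               const char *dynstr, const char *name)
{
    uint32_t nbucket = hash[0];
    const uint32_t *bucket = hash + 2;
    const uint32_t *chain = bucket + nbucket;

    assert(nbucket != 0);
    for (uint32_t i = bucket[elf_hash(name) % nbucket]; i != 0; i = chain[i]) {
        const Elf32_Sym *s = &dynsym[i];
        if (s->st_shndx != 0 && strcmp(dynstr + s->st_name, name) == 0)
            return s;
    }
    return NULL;  /* not in this library, try the next one */
}
```

The caller would try `lookup_symbol` on each loaded library in turn and store the address derived from the first match into the slot r12 points at.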
The memory layout is more of an arbitrary decision, but here it is:
0xFFFF'FFFF ----------------
| |
| UNMAPPED |
| |
0x2030'0000 ----------------
| |
| Peripherals |
| |
0x2000'0000 ----------------
| |
| UNMAPPED |
| |
0x1200'0000 ----------------
| |
| Shared |
| Libraries |
| (32MB) |
| |
0x1000'0000 ----------------
| |
| UNMAPPED |
| |
0x0900'0000 ----------------
| |
| INTERRUPT |
| STACK |
| |
0x0890'0000 ----------------
| |
| UNMAPPED |
| |
0x0800'0000 ----------------
| |
| STACK |
| |
0x0790'0000 ----------------
| |
| UNMAPPED |
| |
0x0510'0000 ----------------
| |
| BOOTLOADER |
| + |
| DYNAMIC |
| LINKER |
| |
0x0500'0000 ----------------
| |
| HEAP |
| |
0x0010'0000 ----------------
| |
| CODE |
| |
0x0000'8000 ----------------
| |
| FREE |
| |
0x0000'0000 ----------------
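For reference, the same layout can be written down as a small C table with a lookup helper, which is handy for sanity-checking load addresses. This is only a sketch: the identifiers are invented for illustration, and each region is assumed to cover `[start, end)`, with the upper boundary in the diagram treated as exclusive.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    REGION_NONE,  /* unmapped or free */
    REGION_CODE, REGION_HEAP, REGION_BOOTLOADER, REGION_STACK,
    REGION_INT_STACK, REGION_SHARED_LIBS, REGION_PERIPHERALS
} region_id;

typedef struct { region_id id; uint32_t start, end; } region_t;

/* Boundaries copied from the diagram above; each region is [start, end). */
static const region_t regions[] = {
    {REGION_CODE,        0x00008000u, 0x00100000u},
    {REGION_HEAP,        0x00100000u, 0x05000000u},
    {REGION_BOOTLOADER,  0x05000000u, 0x05100000u},  /* + dynamic linker */
    {REGION_STACK,       0x07900000u, 0x08000000u},
    {REGION_INT_STACK,   0x08900000u, 0x09000000u},
    {REGION_SHARED_LIBS, 0x10000000u, 0x12000000u},
    {REGION_PERIPHERALS, 0x20000000u, 0x20300000u},
};

/* Return the region containing `addr`, or REGION_NONE if it falls in a
 * gap between regions. */
region_id region_of(uint32_t addr)
{
    for (unsigned i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        assert(regions[i].start < regions[i].end);
        if (addr >= regions[i].start && addr < regions[i].end)
            return regions[i].id;
    }
    return REGION_NONE;
}
```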