write history for most of december

author: Wojtek Kosior <kwojtus@protonmail.com> 2020-01-20 17:15:35 +0100
committer: Wojtek Kosior <kwojtus@protonmail.com> 2020-01-20 17:15:35 +0100
commit: 7482e3065d9f3b3db805a7f5cd5b9e8cd2c9dec8 (patch)
tree: 15dad01c86e04cec8290fbc5a9e66ef5f06740c4
parent: aee564b1242f2593a8991251e7f6e7b2ece06164 (diff)
download: rpi-MMU-example-7482e3065d9f3b3db805a7f5cd5b9e8cd2c9dec8.tar.gz
rpi-MMU-example-7482e3065d9f3b3db805a7f5cd5b9e8cd2c9dec8.zip
1 files changed, 28 insertions, 1 deletions
diff --git a/HISTORY.md b/HISTORY.md
index 21dee9f..c203913 100644
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -51,6 +51,33 @@ used to do - echoing everything on uart. The privileged code would mark a memory
 
 This was also an opportunity for us to check that the memory mapping truly works. We mapped virtual addresses 0xAAA00000 - 0xAAAFFFFF to physical addresses just after our kernel image and translation table (probably 0x00100000 - 0x001FFFFF given the small size of our kernel). The virtual address range 0xAAA00000 - 0xAAAFFFFF was also marked available for use by PL0 code. We then made the kernel write the blob at 0x00100000, knowing it would also appear at virtual 0xAAA00000. Then, successfully running the unprivileged code from that address confirmed that the mapping really works.
 
-There were 2 important things forgetting about which would stop us from succeeding in this step. The first one was the stack. Kernel used to use the memory just below itself (physical 0x8000) as the stack and since these addresses would not be available for PL0 code, a new stack had to be chosen - we set it somewhere on the high part of our unprivileged memory. The second important thing was marking the section in which the memory-mapped uart registers reside as accessible form PL0. This is because we were going to have unprivileged code write to uart by itself (and use the same uart code kernel uses...). Once we had interrupts programmed, this demo was to be improved to actually call the privileged code for writing and reading.
+There were 2 important things that, if forgotten, would have stopped us from succeeding in this step. The first one was the stack. The kernel used to use the memory just below itself (physical 0x8000) as the stack and, since these addresses would not be available to PL0 code, a new stack had to be chosen - we set it somewhere in the high part of our unprivileged memory. The second important thing was marking the section in which the memory-mapped uart registers reside as accessible from PL0. This is because we were going to have unprivileged code write to uart by itself (and use the same uart code the kernel uses...). Once we had exceptions programmed, this demo was to be improved to actually call the privileged code for writing and reading.
+
+The switch to unprivileged user mode was at that point done by the code loaded as the user program. The goal was to have 100% of that code execute without privileges, so a short mode-switching routine was separated out to execute from its own memory section (executable, but non-writable from PL0). We called that piece of code "libkernel" and embedded it in the actual kernel image.
+
+We also introduced another blob in the kernel, which would contain the exception vector table and some exception handlers and would be copied to address 0x0.
+
+As so many pieces of code had to be embedded one in another, we hoped to find a cleaner way than the current objcopy trick for achieving this. We came up with 2 options. The first one is using the .incbin directive in assembly source, which we didn't do at that time. The second one is linking together pieces of code that are supposed to work as separate programs (i.e. kernel and PL0 code), but putting their code in different elf sections, so that later, at runtime, such a piece of code can be copied out and run from another location. -fPIC and later -fPIE were added to the compile options to allow code pieces to run from different addresses than the kernel was compiled for. At first, that worked (for exception handlers and libkernel), but at some point something broke and after investigating we found out that position independent code relies on having a global offset table filled by some environment and it cannot work in a bare-metal case. All the changes with embedding had to be reverted.
+It is worth noting that truly position independent code can be produced for ARM; it's just not supported by most toolchains.
+Although the changes undertaken proved disastrous, while making them we learned linker script syntax and started rewriting the contents of the wiki.osdev-derived linker scripts in a more concise way, which was an important improvement to the project.
+
+To otherwise cope with embedding, we implemented a very simple ram filesystem to be able to easily embed many files at once in the kernel.
+
+We also found out that switching from system mode to user mode is illegal in ARM. This explained some of the weird bugs we had. We then made the kernel use supervisor mode most of the time. In fact, system mode ended up being used only to set the sp and lr of user mode (those 2 registers are shared between user and system modes).
+
+The problem with having exception-related code at 0x0 pushed us to make the decision to split the kernel into 2 stages, just as with the bootloader. We also wanted the kernel and loader to be able to run from any address, and bare-metal position independent code could not be generated by the compiler, so we wrote the 1st stages of the kernel and loader in careful, fully position independent assembly, which succeeded splendidly. At that point, we also got rid of the old assembly boot code taken from wiki.osdev.
+We then used .incbin to embed the second stages of the loader and kernel in their first stages, which, together with the inclusion of the exception vector in the kernel's stage2, reduced the need to use objcopy for embedding of code and simplified linking.
+
+With all this done, we could then extend and more easily test the exception-handling routines. We implemented the uart io of the PL0 process in terms of a supervisor call, which allowed us to make the memory region with mapped peripherals inaccessible to unprivileged code, as planned.
+
+To make debugging easier, we wrote some functions for printing of numbers and strings. We then also added some basic utilities, like memcpy().
+
+To know the memory size at runtime, we implemented handling of atags - a structure with information that is passed to the kernel at boot. Our solution had to involve copying the entire atags from the initial location of 0x100 to some other location that would not be overwritten by stage 2 of the kernel (which gets copied to 0x0). Later, C code in the kernel would parse the atags and get the ram size from it. No changes were required in the bootloader, as its second stage would be copied to 0x4000 and atags is guaranteed to end below that address.
+
+Unfortunately, rpi-open-firmware doesn't pass atags to the kernel, so this feature would only be useful in qemu. To otherwise learn the memory size, a flattened device tree, which has replaced atags in recent years, would need to be parsed. We did not, however, do it at that time.
+
+We then wrote code to dynamically allocate and deallocate memory pages and made the physical memory section for our only process be obtained that way. This feature would later be needed to implement multiple processes and their management.
+
+<more to come here>
 
 At that point we rearranged files, as it was becoming pretty unreadable having over 50 files cluttered in one directory, so now we could start writing proper docs and modularize the project.
 \ No newline at end of file