## Problems faced

### Ramfs alignment
* Our ramfs needs to be 4-aligned in memory, but when objcopy creates the embeddable file, it does not (at least by default) mark its data section as requiring `2**2` alignment. There has to be a `. = ALIGN(4);` line in the linker script before ramfs_embeddable.o. At some point we forgot about it, which caused the ramfs to misbehave.
Bugs located in the linker script, like this one, are often non-obvious. This makes them hard to trace.

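A bug like this shows up far from its cause. A cheap check at the point where the ramfs is first used can make the failure obvious. Below is a minimal sketch. `align_up4` only mirrors what the `ALIGN(4)` directive does to the location counter. `ramfs_is_aligned` and `RAMFS_ALIGN` are hypothetical names and do not come from our code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment the ramfs code relies on, in bytes. */
#define RAMFS_ALIGN 4u

/* Mirrors what `. = ALIGN(4);` does to the linker's location counter:
   round up to the next multiple of 4, leaving aligned values unchanged. */
static inline uintptr_t align_up4(uintptr_t addr)
{
    return (addr + (RAMFS_ALIGN - 1u)) & ~(uintptr_t)(RAMFS_ALIGN - 1u);
}

/* Hypothetical boot-time check: true if the embedded ramfs image starts
   on a RAMFS_ALIGN boundary. Without the ALIGN(4) line before
   ramfs_embeddable.o this can start failing, depending on what happens
   to precede the ramfs in the image. */
static inline bool ramfs_is_aligned(const void *ramfs_start)
{
    return ((uintptr_t)ramfs_start % RAMFS_ALIGN) == 0u;
}
```

Calling such a check once at boot, before any ramfs lookup, and halting with an error message when it fails turns silent misbehaviour into an immediate, explicit failure.
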
### /COM/ section
Many sources mention /COMMON/ as the section in compiled object files that holds a specific kind of uninitialized (zero-initialized) data, similar to .bss. Obviously, it has to be included in the linker script.
Unfortunately, gcc names this section differently, namely /COM/. Because of this, our linker script did not include it in the actual image. Instead, the section was placed somewhere after the last section defined in the linker script. That happened to be right after our NOLOAD stack section, which is where the first free MMU section is. Due to how our memory management algorithm works, this part of physical memory always gets allocated to the first process, which gets its code copied there.
This bug caused incredibly weird behaviour. The user space code would fail with either an abort or an undefined instruction, always on the second PL0 instruction. That was because some statically allocated scheduler variable in /COM/ was mapped at that address, so the variable and the process's code occupied the same memory. It took probably a few hours of analysing the generated assembly in radare2 and modifying [scheduler.c](../src/arm/PL1/kernel/scheduler.c) and [PL0_test.c](../src/arm/PL0/PL0_test.c) to find that the problem lay in the linker script.

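The sketch below shows the kind of C declaration that produces such a symbol. The variable names are hypothetical and are not taken from scheduler.c.

Under GCC's `-fcommon`, the default before GCC 10, a file-scope definition without an initializer is a tentative definition and becomes a common symbol. `objdump -t` lists such symbols under `*COM*`. An explicit initializer, or `-fno-common` (the default since GCC 10), makes GCC emit the object into `.bss` instead.

```c
#include <stdint.h>

/* Tentative definition: with -fcommon this becomes a common symbol
   (objdump -t shows it as *COM*), so the linker script has to place it
   explicitly or it lands after the last output section. */
uint32_t sched_tick_count;

/* Explicitly zero-initialized: GCC emits this into .bss regardless of
   -fcommon, so an ordinary .bss rule in the linker script covers it. */
uint32_t sched_current_pid = 0;

/* Both objects have static storage duration and therefore start at zero,
   provided the memory they end up in is actually zeroed at boot. */
uint32_t sched_state_sum(void)
{
    return sched_tick_count + sched_current_pid;
}
```

Running `objdump -t` on the object files shows whether any common symbols exist at all. `nm` gives the same information and marks such symbols with `C`.
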
### Bare-metal position-independent code
We wanted to make the bootloader and the kernel able to run regardless of the address they are loaded at (also see the comment in [kernel's stage1 linker script](../src/arm/PL1/kernel/kernel.ld)).
To achieve this, we added -fPIC to the compilation options of all ARM code. That let us drop objcopy-based embedding of code in other code. Instead, we put the relevant pieces of code in separate linker script sections, linked them together, and then copied entire sections to some other address at runtime. For example, the exception vector would be linked with the actual kernel (loaded at 0x8000) and then copied, along with the exception handling routines, to 0x0. This did work in 2 cases (the exception vector and libkernel). However, once most of the project had been moved to this method of code embedding, it turned out to be faulty, and we had to move back to objcopy.
The problem is that -fPIC (as well as -fPIE) requires the code to be loaded by some operating system or bootloader that can fill in its GOT (global offset table). Nothing does this in an environment like ours.
It is possible to generate ARM bare-metal position-independent code that works without a GOT, but gcc does not implement support for this, and it is not a common feature in general.
The solution was to write stage1 of both the bootloader and the kernel in careful, position-independent assembly. This required more effort, but was ultimately successful.

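Here is what "filling the GOT" means in concrete terms. With -fPIC, code does not access a global directly: it first loads the global's address from the GOT. In a plain static link, ld typically writes the link-time address into each entry. If the image then runs somewhere else, every entry must be shifted by the load offset before any code touches a global.

The sketch below shows only that step. It is not what we did, since stage1 went the assembly route. `got_relocate` and its parameters are hypothetical. The sketch also assumes every GOT entry points into the image itself and ignores any other relocation types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GOT fix-up: add (load address - link address) to every
   entry in [got_start, got_end). Assumes each entry refers to something
   inside the same image; entries holding true absolute addresses (for
   example MMIO registers) would have to be skipped. */
void got_relocate(uintptr_t *got_start, uintptr_t *got_end,
                  uintptr_t load_addr, uintptr_t link_addr)
{
    /* Unsigned arithmetic wraps, so this also works when the image is
       loaded below its link address. */
    uintptr_t offset = load_addr - link_addr;

    for (uintptr_t *entry = got_start; entry < got_end; entry++)
        *entry += offset;
}
```

The routine uses only its arguments, so it does not itself depend on the GOT being correct. The caller still has to find `load_addr` in a position-independent way, for example from pc.
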
### Linker section naming
Weird behaviour occurs when linking object files with nonstandard section names using the GNU linker. Output sections defined in the linker script did not cause problems in our case. Problems occurred when input sections had nonstandard names, such as sections generated with `__attribute__((section("name")))` in GCC-compiled C code. These sections would be left out, or placed in the wrong location, even though the linker script's SECTIONS command explicitly listed them.
At some point, renaming a section from .boot to .text.boot made the code work properly.

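For reference, here is the kind of declaration meant above. This is a minimal sketch: the function and its return value are hypothetical, and only the `.text.boot` name comes from our experience.

One property of such a name is worth knowing. A section called `.text.something` also matches a wildcard such as `*(.text*)`, if the script uses one. It then ends up in the code output section even without a dedicated rule, while a bare `.boot` must be listed explicitly.

```c
#include <stdint.h>

/* Placed in the input section .text.boot instead of plain .text (GCC,
   ELF targets). Where it ends up in the image is decided by the linker
   script, e.g. by a *(.text.boot) or *(.text*) rule inside SECTIONS. */
__attribute__((section(".text.boot")))
uint32_t boot_marker(void)
{
    return 0xB007u;  /* hypothetical value, just so the function does something */
}
```
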
### Context switches
This is a description of a mistake we made while working on the project.
At first, we did not know about the special features of the `SUBS pc, lr` and `ldm rn, {pc}^` instructions. Our code would switch to user mode by branching to code in a PL0-accessible memory section and having it execute a `cps` instruction. This worked, but was not good, because code executed by the kernel was in a memory section writable by userspace code.
The first improvement was separating that code into "libkernel". Libkernel was placed in a PL0-executable but non-writable section and performed the switch.
It did work; however, it was not the right way.
We later learned how to achieve the same with subs/ldm and removed libkernel, making the project a bit simpler.

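What makes these instructions special is that they perform an exception return. When they load pc (for `ldm`, pc must be in the register list together with `^`), they also copy SPSR into CPSR. So the kernel only has to put a value with the user-mode bits into SPSR, and the return lands in user mode without executing any code from user-writable memory.

Below is a minimal sketch of building such a value in C, assuming the ARMv6/ARMv7 CPSR layout. The helper names are hypothetical, and the `msr`/`subs` part itself stays in assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv6/ARMv7 CPSR mode field, bits [4:0]. */
#define PSR_MODE_MASK 0x1Fu
#define PSR_MODE_USR  0x10u
#define PSR_MODE_SVC  0x13u
#define PSR_MODE_SYS  0x1Fu

/* Other CPSR bits relevant when entering a new process. */
#define PSR_T_BIT (1u << 5)  /* Thumb state */
#define PSR_F_BIT (1u << 6)  /* FIQs disabled */
#define PSR_I_BIT (1u << 7)  /* IRQs disabled */

/* Hypothetical helper: SPSR value for the first entry into a process, in
   user mode, ARM (not Thumb) state, IRQs and FIQs enabled. The exception
   return copies this into CPSR. */
static inline uint32_t user_entry_spsr(void)
{
    return PSR_MODE_USR;  /* T, F and I bits all left clear */
}

/* Hypothetical helper: does a saved PSR describe user mode? */
static inline bool psr_is_user(uint32_t psr)
{
    return (psr & PSR_MODE_MASK) == PSR_MODE_USR;
}
```

The assembly side would then roughly do the following:
* load this value into a register;
* write it with `msr spsr_cxsf, rX`;
* put the user entry point in lr;
* return with `movs pc, lr`, which is the same as `subs pc, lr, #0`.
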
### Different modes' sp register
System mode has a separate stack pointer from supervisor mode. So upon a switch from supervisor to system mode, sp has to be set to point to the actual stack.
At first we did not know about that, and undefined behaviour occurred. At times during development, changing a line of code in one place would make a bug appear or disappear in some other, unrelated place in the kernel.

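The underlying mechanism is register banking. On ARMv6/ARMv7 A-profile cores, each exception mode has its own copy of r13 (sp), while user and system mode share a single copy. After a `cps` from supervisor to system mode, sp therefore refers to a different physical register, holding whatever value it had before.

The sketch below simply encodes which sp copy each mode uses, leaving out monitor and hyp modes. It is an illustration with hypothetical names, not code from the kernel.

```c
#include <stdint.h>

/* Mode field values (CPSR bits [4:0]) on ARMv6/ARMv7. */
enum arm_mode {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F,
};

/* Which physical copy of sp a mode uses. */
enum sp_bank { SP_USR_SYS, SP_FIQ, SP_IRQ, SP_SVC, SP_ABT, SP_UND, SP_UNKNOWN };

static inline enum sp_bank sp_bank_for_mode(uint32_t mode)
{
    switch (mode & 0x1Fu) {
    case MODE_USR:
    case MODE_SYS: return SP_USR_SYS;  /* shared: setting sp in system mode
                                          also sets the user-mode sp */
    case MODE_FIQ: return SP_FIQ;
    case MODE_IRQ: return SP_IRQ;
    case MODE_SVC: return SP_SVC;
    case MODE_ABT: return SP_ABT;
    case MODE_UND: return SP_UND;
    default:       return SP_UNKNOWN;  /* monitor/hyp or invalid: not modelled */
    }
}
```

In practice, this means a switch from supervisor to system mode needs an explicit assignment to sp right after the `cps`, before anything is pushed.
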
### Switching between system mode and user mode
It is also not allowed (undefined behaviour) to switch from system mode directly to user mode. We were not aware of this either, and it also caused some problems/bugs.

### UART interrupt masking
Both the BCM2835 ARM Peripherals manual and the manual of the PL011 UART itself say that writing 0s to PL011_UART_IMSC unmasks specific interrupts. Practical experiments showed that it is the opposite: writing 1s enables specific interrupts and writing 0s disables them.
The UART code on the OSDev wiki was also written to disable interrupts in the way the manuals describe, so it actually unmasked them instead. This did not cause problems in practice. UART interrupts also have to be unmasked elsewhere to actually occur, namely in the register defined as ARM_ENABLE_IRQS_2 in [interrupts.h](../src/arm/PL1/kernel/interrupts.h).

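Below is a minimal sketch of the behaviour we observed, where a set bit means "interrupt enabled". The bit positions are those of the PL011 UARTIMSC register: RXIM is bit 4, TXIM bit 5, RTIM bit 6. The helpers take a pointer to the register, so the sketch does not depend on its address. In our code that address is PL011_UART_IMSC. The helper names are hypothetical.

```c
#include <stdint.h>

/* PL011 UARTIMSC bits; as observed on the Pi, a 1 enables the interrupt. */
#define UART_IMSC_RXIM (1u << 4)  /* receive */
#define UART_IMSC_TXIM (1u << 5)  /* transmit */
#define UART_IMSC_RTIM (1u << 6)  /* receive timeout */

/* Hypothetical helper: enable the receive and receive-timeout interrupts,
   leaving the other bits untouched (read-modify-write). */
static inline void uart_enable_rx_irqs(volatile uint32_t *imsc)
{
    *imsc |= UART_IMSC_RXIM | UART_IMSC_RTIM;
}

/* Hypothetical helper: disable (mask) every UART interrupt by writing 0s. */
static inline void uart_mask_all_irqs(volatile uint32_t *imsc)
{
    *imsc = 0u;
}
```

As described above, enabling a bit here is not sufficient on its own. The interrupt must also be enabled through ARM_ENABLE_IRQS_2 before it actually fires.
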
### pipe_image breaks stdin
The very simple pipe_image program breaks stdin when run.
Even other programs run later in the same (bash) shell cannot read from stdin.
In zsh, other commands run interactively after pipe_image do work, but commands executed after pipe_image inside a shell function still have the problem.