aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--docs/Problems-faced.md45
-rw-r--r--docs/Sources.md16
-rw-r--r--docs/interrupts.pngbin38473 -> 0 bytes
3 files changed, 46 insertions, 15 deletions
diff --git a/docs/Problems-faced.md b/docs/Problems-faced.md
new file mode 100644
index 0000000..e19bd21
--- /dev/null
+++ b/docs/Problems-faced.md
@@ -0,0 +1,45 @@
+## Problems faced
+
+### Ramfs alignment
+* Our ramfs needs to be 4-aligned in memory, but when objcopy creates the embeddable file, it doesn't (at least by default) mark it's data section as requiring 2**2 alignment. There has to be .=ALIGN(4) line in linker script before ramfs_embeddable.o. At some point we forgot about it, which caused the ramfs to misbehave.
+Bugs located in linker script, like this one, are often non-obvoius. This makes them hard to trace.
+
+### /COM/ section
+Many sources mention /COMMON/ as the section in object files resulting from compilation, that contains some specific kind of uninitialized (0-initialized) data (simillar to .bss). Obviously, it has to be included in the linker script.
+Unfortunately, gcc names this section differently, mainly - /COM/. This caused our linker script to not include it in the actual image. Instead, it was placed somewhere after the last section defined in the linker script. This happened to be after our NOLOAD stack section, where first free MMU section is. Due to how our memory management algorithm works, this part of physical memory always gets allocated to the first process, which gets it's code copied there.
+This bug caused incredibly weird behaviour. The user space code would fail with either abort or undefined instruction, always on the second PL0 instruction. That was because some statically allocated scheduler variable in /COM/ was getting mapped at that address. It took probably a few hours of analysing generated assembly in radare2 and modyfying [scheduler.c](../src/arm/PL1/kernel/scheduler.c) and [PL0_test.c](../src/arm/PL0/PL0_test.c) to find, that the problem lies in the linker script.
+
+### Bare-metal position indeppendent code
+We wanted to make bootloader and kernel able to run regardless of what address they are loaded at (also see comment in [kernel's stage1 linker script](../src/arm/PL1/kernel/kernel.ld)).
+To achieve the goal, we added -fPIC to compilation options of all arm code. With this, we decided we can, instead of embedding code in other code using objcopy, put relevant pieces of code in separate linker script sections, link them together and then copy entire sections to some other addresss in runtime. I.e. the exception vector would be linked with the actual kernel (loaded at 0x8000), but the copied along with exception handling routines to 0x0. It did work in 2 cases (of exception vector and libkernel), but once most of the project was modified to use this method of code embedding, it turned out to be faulty and work had to be done to move back to the use of objcopy.
+The problem is, -fPIC (as well af -fPIE) requires code to be loaded by some operating system or bootloader, that can fill it's got (global offset table). This is not being done in environment like ours.
+It is possible to generate ARM bare-metal position-independent code, that would work without got, but support for this is not implemented in gcc and is not a common feature in general.
+The solution was to write stage1 of both bootloader and the kernel in careful, position-independent assembly This required more effort, but was ultimately successful.
+
+### Linker section naming
+Weird behaviour occurs, when trying to link object code files with nonstandard section names using GNU linker. Output sections defined in the linker script didn't cause problems in our case. Problems occured when input sections were nonstandard (such as sections generated by using __attribute__((section("name"))) in GCC-compiled C code), as they would not be included or would be included in wrong place, despite being explicitly listed for inclusion in the linker script's SECTION command.
+At some point, renaming a section from .boot to .text.boot would make the code work properly.
+
+### Context switches
+This is a description of a mistake made by us during work on the project.
+At first, we didn't know about special features of SUBS pc, lr and ldm rn {pc} ^ instructions. Our code would switch to user mode by branching to code in PL0-accessible memory section and having it execute cps instruction. This worked, but was not good, because code executed by the kernel was in memory section writable by userspace code.
+First improvement was separating that code into "libkernel". Libkernel would be in a PL0-executable but non-writable section and would perform the switch.
+It did work, however, it was not the right way.
+We later learned how to achieve the same with subs/ldm and removed, making the project a bit simpler.
+
+### Different modes' sp register
+System mode has separate stack pointer from supervisor mode, so when upon switch from supervisor to system mode it has to be set to point to the actual stack.
+At first we didn't know about that and we had undefined behaviour occur. At some points during the development, changing a line of code in one place would make a bug occur or not occur in some other, unrelated place in the kernel.
+
+### Swithing between system mode and user mode
+It is also not allowed (undefined behaviour) to switch from system mode directly to user mode, which we were not aware of and which also caused some problem/bugs.
+
+### UART interrupt masking
+Both BCM2835 ARM Peripherals manual and the manual to PL011 UART itself say, that writing 0s to PL011_UART_IMSC unmasks specific interrupts. Practical experiments showed, that it's the opposite: writing 1s enables specific interrupts and writing 0s disables them.
+UART code on wiki.osdev was also written to disable interrupts in the way described in the manuals. The interrrupts were then unmasked instead of masked. This didn't cause problems in practice, as UART interrupts have to also be unmasked elsewhere (register defined ARM_ENABLE_IRQS_2 in [interrupts.h](../src/arm/PL1/kernel/interrupts.h)) to actually occur.
+
+###
+The very simple pipe_image program breaks stdin when run.
+Even other programs run in that same (bash) shell after pipe_image cannot read from stdin.
+In zsh other commands run interactively after pipe_image do work, but commands executed after pipe_image inside a shell function still have the problem occur.
+
diff --git a/docs/Sources.md b/docs/Sources.md
index 7a48250..36236a4 100644
--- a/docs/Sources.md
+++ b/docs/Sources.md
@@ -1,18 +1,4 @@
-## Problems we faced
-
-Problems we've faced (I mentioned to Metal (is it ok to use that pseudonym? I think it's ok), I'd include those and he seemed very enthusiastic about us describing them) (+ some stuff qualifiable as probles is already in HISTORY.md)
-
-* Our ramfs needs to be 4-aligned in memory, but when objcopy creates the embeddable file, it doesn't (at least by default) mark it's data section as requiring 2**2 alignment. There has to be .=ALIGN(4) in linker script before ramfs_embeddable.o, but I forgot about it, which caused the ramfs to misbehave
-* VERY NAUGHTY PROBLEM · Many sources mention /COMMON/ as the section, that contains some specific kind of uninitialized (0-initialized) data. Obviously, it has to be included in the linker script. Unfortunately, gcc names it differently, mainly - /COM/. This caused our linker script to not include it in the image. Instead, it was placed somewhere after the last section specified in the linker script. This happened to be after our NOLOAD stack section, where first free MMU section is (which happens to always get allocated to the first process, which gets it's code copied there). Do You imagine sitting for hours in front of radare2, searching for a bug in scheduler.c or PL0_test.c, that causes the userspace code to fail with either some kind of weird abort or undefined instruction, always on the second PL0 instruction!?
-* VERY NAUGHTY PROBLEM · I wanted to make our bootloader and kernel able to run no matter what address they are loaded at (see comment in kernel's stage1 linker script). To achieve that, I added -fPIC to compilation options of all arm code. With this, I decided I can, instead of embedding code in other code using objcopy, just put that code in separate linker script section with section_start and section_end symbols defined, so that I can copy it to some other address in runtime. I did it and it worker with interrupt vector and libkernel (see point below). But once I changed EVERYTHING to use linker symbols/sections instead of objcopy embedding it turned out it doesn't really work... and I had to make it back the old way :( The thing is -fPIC requires code to be loaded by some os or bootloader, that will fill the global offset table with symbols. I knew it's possible to generate bare-metal position-independent code, that shall work without got, but it turned out this is not implemented in gcc (it is in arm compiler, but only in 32-bits and who would like to use arm compiler anyway). I ended up writing stage1 of both bootloader and the kernel in careful position-independent assembly, thus achieving my goal (jut with a bit more of effort).
-* Linker behaves weird when section names don't start with .text, .data, etc.
-* Not strictly a problem, but a funny mistake of mine, that is worth mentioning... At first I didn't know about special features of SUBS pc, lr and ldm rn {pc} ^ instructions. So I would switch to user mode by first branching to code in PL0-accessible section and having it execute isb instruction. This worked, but was not good, because code executed by the kernel was in memory section writable by userspace code. So i separated that into "libkernel", that would be in a PL0-executable but non-writable section and would perform the switch... Well, it did work. Still, I was happy when I learned how to achieve the same with subs/ldm and could remove libkernel, making the project a bit simpler.
-* system mode has separate stack pointer from supervisor mode, so when going from supervisor to system we need to set it... We didn't know that and we were getting weeeird bugs (where changing something little in one place would make the bug occur or not occur somewhere completely else); also, it's not allowed (undefined behaviour) to switch from system mode directly to user mode... (at least this didn't cause such weird things to happen...)
-* both bcm arm peripherals manual and the manual to uart itself say, that writing 0s to PL011_UART_IMSC unmasks interrupts; its the opposite: writing 1s enables specific interrupts and writing 0s disables them. wiki.osdev code also got it the way it's written in those docs, but this didn't cause problems, since uart irq was not enabled in ARM_ENABLE_IRQS_2 (using #define names from our code), so, as intended, no irq was occuring
-* STILL UNFIXED · The very simple pipe_image program somehow manages to break stdin, so that even other programs run in that same (bash) shell can't read from it... (in zsh other interactively run commands work ok, but commands following pipe_image inside a shell function still have that problem)
-
-## Sources of Info
-
+## Sources of Information
* wiki.osdev (good for starting off, we could also (in a polite way) mention the things, that were broken there)
+ getting uart irq masking wrong (not really their fault, see above)
+ zeroing bss even tho it was placed in a section, that wasn't marked NOLOAD (I need to verify that yet)
diff --git a/docs/interrupts.png b/docs/interrupts.png
deleted file mode 100644
index 8973fa8..0000000
--- a/docs/interrupts.png
+++ /dev/null
Binary files differ