high priority TODOs are higher; low priority ones and completed ones are lower; a lot of stuff was also implemented without ever being mentioned here
!!! VERY IMPORTANT !!!
* partially DONE - most of uart_init() unchanged * write better uart (using interrupts, maybe DMA?); the current one is copy-pasted from wiki.osdev which is lame
!!! VERY IMPORTANT !!!
* (never to be completed, as new stuff to document comes all the time) Write documentation for what we've already done (yea, there's HISTORY.md, but we also want something that gives technical details of how MMU gets set up (writes to SCTLR, DACR and others), etc.). Things worth including:
- Problems we've faced (I mentioned to Metal (is it ok to use that pseudonym? I think it's ok) that I'd include those and he seemed very enthusiastic about us describing them) (+ some stuff qualifiable as problems is already in HISTORY.md)
· Our ramfs needs to be 4-aligned in memory, but when objcopy creates the embeddable file, it doesn't (at least by default) mark its data section as requiring 2**2 alignment. There has to be .=ALIGN(4) in the linker script before ramfs_embeddable.o, but I forgot about it, which caused the ramfs to misbehave
· VERY NAUGHTY PROBLEM · Many sources mention /COMMON/ as the section that contains some specific kind of uninitialized (0-initialized) data. Obviously, it has to be included in the linker script. Unfortunately, gcc names it differently, namely /COM/. This caused our linker script to not include it in the image. Instead, it was placed somewhere after the last section specified in the linker script. This happened to be after our NOLOAD stack section, where the first free MMU section is (which happens to always get allocated to the first process, which gets its code copied there). Can You imagine sitting for hours in front of radare2, searching for a bug in scheduler.c or PL0_test.c that causes the userspace code to fail with either some kind of weird abort or undefined instruction, always on the second PL0 instruction!?
· VERY NAUGHTY PROBLEM · I wanted to make our bootloader and kernel able to run no matter what address they are loaded at (see comment in kernel's stage1 linker script). To achieve that, I added -fPIC to compilation options of all arm code. With this, I decided I could, instead of embedding code in other code using objcopy, just put that code in a separate linker script section with section_start and section_end symbols defined, so that I can copy it to some other address at runtime. I did it and it worked with the interrupt vector and libkernel (see point below). But once I changed EVERYTHING to use linker symbols/sections instead of objcopy embedding, it turned out it doesn't really work... and I had to change it back to the old way :( The thing is, -fPIC requires code to be loaded by some os or bootloader that will fill the global offset table (GOT) with symbols. I knew it's possible to generate bare-metal position-independent code that works without a GOT, but it turned out this is not implemented in gcc (it is in arm compiler, but only in 32-bits and who would like to use arm compiler anyway). I ended up writing stage1 of both the bootloader and the kernel in careful position-independent assembly, thus achieving my goal (just with a bit more effort).
· The linker behaves weirdly when section names don't start with .text, .data, etc.
· Not strictly a problem, but a funny mistake of mine that is worth mentioning... At first I didn't know about the special features of the subs pc, lr and ldm rn, {pc}^ instructions. So I would switch to user mode by first branching to code in a PL0-accessible section and having it execute an isb instruction. This worked, but was not good, because code executed by the kernel was in a memory section writable by userspace code. So I separated that into "libkernel", which would be in a PL0-executable but non-writable section and would perform the switch... Well, it did work. Still, I was happy when I learned how to achieve the same with subs/ldm and could remove libkernel, making the project a bit simpler.
· system mode has a separate stack pointer from supervisor mode, so when going from supervisor to system we need to set it... We didn't know that and we were getting weeeird bugs (where changing some little thing in one place would make the bug occur or not occur somewhere else completely); also, it's not allowed (undefined behaviour) to switch from system mode directly to user mode... (at least this didn't cause such weird things to happen...)
· both the bcm arm peripherals manual and the manual for the uart itself say that writing 0s to PL011_UART_IMSC unmasks interrupts; it's the opposite: writing 1s enables specific interrupts and writing 0s disables them. wiki.osdev code also got it the way it's written in those docs, but this didn't cause problems, since the uart irq was not enabled in ARM_ENABLE_IRQS_2 (using #define names from our code), so, as intended, no irq was occurring
· STILL UNFIXED · The very simple pipe_image program somehow manages to break stdin, so that even other programs run in that same (bash) shell can't read from it... (in zsh other interactively run commands work ok, but commands following pipe_image inside a shell function still have that problem)
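To make the PL011_UART_IMSC point above concrete, here is a minimal sketch of the mask arithmetic: a 1 bit enables an interrupt, a 0 bit disables it. The bit positions are the ones given in the PL011 TRM; the helper names imsc_enable() and imsc_disable() are made up for this note and are not from our code.

```c
#include <stdint.h>

/* PL011 UARTIMSC bit positions, as listed in the PL011 TRM. */
#define IMSC_RXIM (1u << 4) /* receive interrupt */
#define IMSC_TXIM (1u << 5) /* transmit interrupt */
#define IMSC_RTIM (1u << 6) /* receive timeout interrupt */

/* Hypothetical helper: returns imsc with the given interrupts
 * enabled, i.e. with their bits set to 1. */
static inline uint32_t imsc_enable(uint32_t imsc, uint32_t irqs)
{
    return imsc | irqs;
}

/* Hypothetical helper: returns imsc with the given interrupts
 * disabled, i.e. with their bits cleared to 0. */
static inline uint32_t imsc_disable(uint32_t imsc, uint32_t irqs)
{
    return imsc & ~irqs;
}
```

On the real device the result would be stored through a volatile pointer to the register, assuming PL011_UART_IMSC expands to its address; the pure functions are kept separate here so the bit logic can be checked on a host machine.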
- Our sources of information
· wiki.osdev (good for starting off, we could also (in a polite way) mention the things, that were broken there)
> getting uart irq masking wrong (not really their fault, see above)
> zeroing bss even tho it was placed in a section that wasn't marked NOLOAD (I still need to verify that)
> switch inside an enum!?!?!?
> + There is already some more mentioned in HISTORY.md
· ARM Architecture Reference Manual® ARMv7-A and ARMv7-R edition (man, this has 2720 pages! But I think this was the most useful document of all)
· dwelch67
· http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html - very good description of atags
· BCM2835-ARM-Peripherals.pdf and https://elinux.org/BCM2835_datasheet_errata
· https://buildmedia.readthedocs.org/media/pdf/devicetree-specification/latest/devicetree-specification.pdf - perhaps You've found another source for that... but if not, this seems to be good!
· online ARM Compiler toolchain Assembler Reference
· Christina Brook's rpi-open-firmware
· http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183g/DDI0183G_uart_pl011_r1p5_trm.pdf
· GNU make documentation
- Once we get rid of other ppl's code (there was more, now only a little bit in uart.c remains) boast that we wrote everything by ourselves
- maybe some special thanks (i.e. to gcc devs? idk)
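For the ramfs alignment problem listed above, a cheap guard that would have caught the missing .=ALIGN(4) could look roughly like the sketch below. The names is_aligned() and align_up() are hypothetical and not from our code; both assume the alignment is a power of two.

```c
#include <stdint.h>

/* Nonzero if addr is a multiple of align (align must be a power of two). */
static inline int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two),
 * which is the same arithmetic ALIGN() performs in a linker script. */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

A check like is_aligned((uintptr_t)ramfs_start, 4) early in the ramfs code, where ramfs_start stands for whatever symbol marks the embedded blob, would turn silent misbehaviour into an immediate, explainable failure.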
!!! IMPORTANT !!!
* Add license (Unlicense?)
! A bit important !
* Also implement the generic timer and use it, as it's probably going to run on both qemu and RPi.
* inform linux errata guys about incorrectly-described uart irq masking in bcm arm periph manual
* maybe add some comments in code (would do with some feedback from someone who didn't write this, as to what is unclear)
* Add multiple processes
* Add file access from userspace
* implement buffered read/write operations (for now we only have single-char read and write - if no process reads, and data keeps coming on uart, some of it might get lost)
* also use smaller pages with the MMU (not just those fat sections as it is now)
* malloc() ?
* Maybe make .o files depend not only on their respective .c but also on used headers? (Without this make sometimes doesn't know that some file needs to be recompiled)
* also handle flattened device tree (not just ATAGS)
* write some procedures for dumping registers and other stuff (for use in debugging); maybe print registers' contents on data/prefetch abort?
* Memory regions can be configured as one of several types, which affects how memory reads/writes are performed by the processor. Dig into that and use the best appropriate settings in paging.c (i.e. normal memory instead of strongly-ordered memory for RAM).
* In the Makefile: is ?= the right assignment for, say, CFLAGS?
* Reintroduce checking if size of loader_stage2.img is small enough in Makefile (removed by accident).
* Check if setting user mode's sp and lr can be achieved by msr instead of switching to system mode. If so, use this method.
* partially DONE - one can always add more, but we have the most important stuff * Implement some basic utilities for us to use (memcpy, printf, etc...)
* partially DONE - svc works; once we implement processes we could also kill them on aborts * develop userspace process supervision (handling of interrupt caused by svc instruction, proper handling of other data abort, undefined instruction, etc.)
* UNIMPORTANT/TOO HARD * Fix piping with pipe_image
* UNIMPORTANT/TOO HARD * Races might occur when one processor starts overwriting stuff at image load address before other processors execute the initial piece of code that puts them to sleep... This should be fixed in bootloader and will need to be taken into account when we develop the actual kernel to manage its own memory
* UNIMPORTANT/TOO HARD * Real RPi firmware would jump to the kernel on all cores after loading it from SD... So it'd be good if bootloader did a similar thing - i.e. bootloader, when started, first shuts off all cores but one, it loads its stage2, which downloads the kernel by uart, turns all cores back on and jumps to kernel on all of them... Additional kudos if U make this race-free (see TODO above)
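For the buffered read/write TODO further up, one common approach is a fixed-size ring buffer that the uart irq handler fills and the reading process drains. The sketch below is only an illustration under that assumption: struct ring_buf, rb_put(), rb_get() and RB_SIZE are hypothetical names, nothing here exists in our code yet.

```c
#include <stdint.h>

#define RB_SIZE 64u /* capacity in bytes; must be a power of two */

struct ring_buf {
    uint8_t  data[RB_SIZE];
    uint32_t head; /* total number of bytes ever written */
    uint32_t tail; /* total number of bytes ever read */
};

/* Store one byte; returns 0 on success, -1 if the buffer is full. */
static inline int rb_put(struct ring_buf *rb, uint8_t c)
{
    if (rb->head - rb->tail == RB_SIZE)
        return -1;
    rb->data[rb->head % RB_SIZE] = c;
    rb->head++;
    return 0;
}

/* Fetch the oldest byte into *out; returns 0 on success,
 * -1 if the buffer is empty. */
static inline int rb_get(struct ring_buf *rb, uint8_t *out)
{
    if (rb->head == rb->tail)
        return -1;
    *out = rb->data[rb->tail % RB_SIZE];
    rb->tail++;
    return 0;
}
```

Keeping head and tail as free-running counters makes "full" and "empty" distinguishable without wasting a slot, and unsigned wraparound stays harmless because RB_SIZE is a power of two. When the producer is an irq handler, the counters would additionally need volatile access or barriers; that part is left out of the sketch.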
* DONE * learn how recursive Makefiles work and put stuff into separate dirs (maybe also separate build/ dir for .o .elf and .img files)
* DONE * have one convention of linker script names
* DONE * Remove duplications in Makefile... i.e. use generic recipes for .c -> .o compilations and many other things, that can be shortened this way
* DONE (we don't place .bss nor /COM/ in a NOLOAD section, so they're included in kernel/bootloader image) * ensure .bss section is zeroed properly in stage2 (stage1 and actual kernel do it in common boot.S file; stage2 doesn't use boot.S); Note that:
- It works as it is right now. If we have no uninitialized static variables in stage2 code, then .bss is probably empty... so this is not really important
- What if wiki.osdev was wrong about this and objcopy includes .bss in its output image? Then also no work needs to be done
- Stage2 gets loaded between 0x4000 and 0x8000, so that piece of memory could be zeroed-out before by stage1 and that would solve the issue
* DONE * Add sanity-check at build-time, that stage2 blob is smaller than 0x4000 in size
* DOOOONE * Finally, the most important thing - move forward and start working with the MMU already!
* DOOOONE * Start doing this on hardware already... (Hey, whole making of a bootloader was with this in mind!)
* DONE * learn some asm and write exception handlers without gcc's "interrupt" function attribute (this is so that we see what's happening - right now gcc hides some things from us...)
* DOOOONE * boot.S is also copy-pasted from wiki-osdev (very lame)
* DOOOONE * shorten linker script... (I think I don't have to tell You where its original version was copy-pasted from...)
* DOOOONE (we have clock and uart irq working) * get external interrupts to work
* DONE (I think - it's 3 places now, but in a lot cleaner way) * find a way to make management of binary pieces less messy (in the entire project there're currently about 5 places where code is being copied by other code...)