# How we were writing it, from the beginning #
So, the goal is to learn about and program the MMU of the Raspberry Pi.
To get started quickly we, errrmm... copy-pasted some code from the [RaspberryPI bare metal tutorial on wiki OSdev](https://wiki.osdev.org/ARM_RaspberryPi_Tutorial_C), planning to replace these bits with our own later. "An easy way" - one would say. Not really. The wiki code, although useful, is far from working. Consider this part:

    enum
    {
        // The GPIO registers base address.
        switch (raspi) {
            case 2:
            case 3:  GPIO_BASE = 0x3F200000; break; // for raspi2 & 3
            case 4:  GPIO_BASE = 0xFE200000; break; // for raspi4
            default: GPIO_BASE = 0x20200000; break; // for raspi1, raspi zero etc.
        }
        // more stuff here
    };

A switch statement inside of an enum?! First thought? That it's some kind of nonstandard extension to C. Well, no. No matter how many of its own things gcc adds to the language, the ability to do THIS is not one of them. This is just one example.
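
For comparison, here is a minimal sketch of one way the same selection could be written in valid C. The function name `gpio_base_for` is hypothetical (it is not taken from the wiki); the addresses are the ones from the snippet quoted above.

```c
#include <stdint.h>

/* Hypothetical helper: returns the GPIO register base for a board model.
 * The addresses are copied from the wiki snippet quoted above. */
static uint32_t gpio_base_for(int raspi)
{
    switch (raspi) {
    case 2:
    case 3:  return 0x3F200000u; /* raspi2 & 3 */
    case 4:  return 0xFE200000u; /* raspi4 */
    default: return 0x20200000u; /* raspi1, raspi zero etc. */
    }
}
```

The point is only that a run-time choice has to live in a function (or in preprocessor conditionals); an enum can hold nothing but constant enumerators.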
It's really weird that someone wrote this and made it available online. There were also other, similar problems with that code. E.g. there was double padding, which resulted in the initial routine being loaded at address 0x10000, not 0x8000...
Maybe the wiki OSdev authors don't actually want people to directly reuse their code or they want to stop inexperienced programmers from doing this kind of low-level bare-metal stuff too easily.
Nevertheless, we had to fix the bugs and then we could run the kernel.elf under qemu emulating the RPI2 and receive the (virtual) uart output.
The real hardware we have is an RPI3B, but the versions of qemu available in some distributions don't yet have support for emulating RPI3, and compiling from source seemed like a good way to waste time we don't have (+ we were not going to use aarch64 until we got the 32-bit version working).
There were more problems with running the raw binary version of the kernel (which we had to get working if we ever wanted to use real hardware). As it turned out after 1.5 hours of static analysis of the image in radare2 - qemu doesn't load the binary image at 0x8000 as a real RPI would, but rather at 0x10000. This might also explain why the wiki asm and linker code produced the double padding. After finding that out, we could finally understand how the ld script and objcopy work, and we didn't need to use dd on the image anymore. We temporarily changed the entry point address to 0x10000.
We wrote a few lines to check for paging support based on wiki info and we moved the uart code into separate files. Then we came up with the idea of a bootloader.
Once we started working on a real RPI, we would have had to take the SD card out of it, put it into the PC, write the kernel to it and move it back to the PI on every kernel compilation. Sounds terrible, doesn't it? And aside from taking a lot of time, it also kills the SD card.
We decided to send the kernel through UART. In fact, there already exists a bootloader, called Raspbootin, that does exactly that. This time, however, we wanted to write this ourselves (especially since it seemed rather easy). So, the PI (or qemu) boots the loader instead of the actual kernel, the kernel (prepended with 4 bytes describing its size) is piped through the uart (or to qemu's stdin), and the loader writes the received data into memory and jumps to it. One problem is that if the loader gets loaded at 0x8000, then it cannot just write the kernel at 0x8000, because it would overwrite its own code. That's why we made a "2nd stage" of the bootloader, which is embedded in the main loader executable. The loader copies stage2 to some other address, e.g. 0x4000, and jumps to it. The 2nd stage then initializes the uart, receives the kernel, writes it at 0x8000 and jumps to it.
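
The receiving side of that protocol can be sketched roughly as below. This is an illustration, not our actual loader code: the names `receive_image`, `read_byte_fn`, `buf_src` and `buf_src_next` are hypothetical, and the byte order of the 4-byte size header (little-endian here) is an assumption. The byte source is passed in as a callback so the same routine can be fed from a memory buffer on a PC or from a UART read routine on the board.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte source; on the board this would wrap the UART read routine. */
typedef uint8_t (*read_byte_fn)(void *ctx);

/* Reads a 4-byte size header (ASSUMED little-endian), then that many payload
 * bytes into dst. Returns the payload size, or 0 if the announced size exceeds
 * capacity (in which case nothing is written to dst). */
static uint32_t receive_image(read_byte_fn read_byte, void *ctx,
                              uint8_t *dst, uint32_t capacity)
{
    uint32_t size = 0;

    for (int i = 0; i < 4; i++)
        size |= (uint32_t)read_byte(ctx) << (8 * i);

    if (size > capacity)
        return 0;

    for (uint32_t i = 0; i < size; i++)
        dst[i] = read_byte(ctx);

    return size;
}

/* Buffer-backed byte source, only for trying the routine out on a PC. */
struct buf_src {
    const uint8_t *data;
    size_t pos;
};

static uint8_t buf_src_next(void *ctx)
{
    struct buf_src *s = ctx;
    return s->data[s->pos++];
}
```

On the board, `dst` would be the address the kernel is to be written at (0x8000 in the description above), and the 2nd stage would jump there once `receive_image` returns a nonzero size.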
By writing the bootloader we also removed the need to change the kernel entry address depending on the environment. The bootloader entry address has to be changed instead, but this is less problematic, since the bootloader can just sit there on the SD card undisturbed while we're working on the kernel.
We also finally ran the code on real hardware. And we used [RPI Open Firmware](https://github.com/christinaa/rpi-open-firmware) for that (at last something it CAN be used for). Aside from the kernel (which, btw., is supposed to be linux and has to be named "zImage"), the firmware expected 2 files (device tree, cmdline.txt). It also turned out to load the kernel at yet another address. All this should be changed in the firmware itself **TODO**, but the original version is enough to work on for now.
Another problem appeared when trying to use the bootloader on a real RPi. Namely - how does one pipe the image through UART? GNU Screen, which we successfully used for communicating with the board, doesn't seem to support this. We found a tool called "socat", which was available from the repo and could be used instead. So the makefile rule would first pipe the image using socat and then run screen for the usual io. An additional uart_getc() had to be added at the beginning of the kernel main function, so that its first output wouldn't get lost before screen started. The socat solution also required the UART USB adapter and the PI's power supply to be replugged in a specific order to work, so we started working on a different solution using libRS232 for UART communication from the PC.
Only at this point did we start working on the relevant part - the MMU. It was surprisingly difficult to achieve something in this field. Information on the wiki was incomplete, just as information in the first few reference manuals we picked. The source that eventually proved to be good enough is *ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition*. Also, the configuration of the MMU turned out to be way more complicated than first thought. After some hours of digging through dozens of options changed in various coprocessor registers we eventually came up with some code to enable the MMU, with a simple, (obviously) flat mapping of memory, and after finding the bugs (forgetting to also map the part of memory where UART periphs are accessible, forgetting to mark the descriptor as describing a section, creating the translation table at the same place the stack was) we got it working in qemu.
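
To illustrate what a flat mapping built from section descriptors looks like, here is a minimal sketch, assuming the ARMv7-A short-descriptor format with 1 MiB sections. The function and macro names are hypothetical and this is not the code we ended up with; it only fills the table and leaves out all the coprocessor register setup.

```c
#include <stdint.h>

#define SECTION_COUNT     4096u          /* 4 GiB of address space / 1 MiB sections */
#define DESC_TYPE_SECTION 0x2u           /* bits [1:0] = 0b10 mark the entry as a section */
#define DESC_AP_PL1_RW    (0x1u << 10)   /* AP[1:0] = 0b01: read/write at PL1, no PL0 access */

/* Fills a first-level translation table so that every 1 MiB virtual section
 * maps to the physical section with the same address (a flat mapping).
 * All entries use domain 0, so domain 0 must be enabled in the DACR. */
static void fill_flat_table(uint32_t *table)
{
    for (uint32_t i = 0; i < SECTION_COUNT; i++)
        table[i] = (i << 20) | DESC_AP_PL1_RW | DESC_TYPE_SECTION;
}
```

Because all 4096 entries are filled, the section containing the peripheral registers gets mapped too, and the type bits are set in one place, which covers two of the bugs listed above. The third one is a placement matter: the table takes 16 KiB, has to be 16 KiB-aligned when the whole address space goes through TTBR0, and must not overlap the stack.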
The above could possibly have been achieved more easily by using others' existing code, but doing the whole project with the Copy-Paste method seemed like a bad idea.
Knowing a good, working sequence of actions needed for enabling the MMU, we could start rewriting it in a cleaner way - using unions and structs with bitfields, which make the code a lot more readable compared to when bit masks and bit shifts are used for work on coprocessor register contents and translation table entries.
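
As an illustration of the union-with-bitfields style, a section descriptor could be described roughly like this. The type and field names are hypothetical; the bit positions follow the short-descriptor section format from the ARMv7-A manual. Note that bitfield layout is implementation-defined in C: this sketch assumes gcc on a little-endian target, where fields are allocated starting from the least significant bit.

```c
#include <stdint.h>

/* One first-level section descriptor, viewed either as a raw word or as
 * named fields. ASSUMES LSB-first bitfield allocation (gcc, little-endian). */
typedef union {
    uint32_t raw;
    struct {
        uint32_t pxn          : 1;  /* bit 0: PXN (should be zero on cores without it) */
        uint32_t is_section   : 1;  /* bit 1: 1 for a section entry */
        uint32_t bufferable   : 1;  /* bit 2: B */
        uint32_t cacheable    : 1;  /* bit 3: C */
        uint32_t xn           : 1;  /* bit 4: execute-never */
        uint32_t domain       : 4;  /* bits 8:5 */
        uint32_t impl_defined : 1;  /* bit 9 */
        uint32_t ap_1_0       : 2;  /* bits 11:10: access permissions AP[1:0] */
        uint32_t tex          : 3;  /* bits 14:12 */
        uint32_t ap_2         : 1;  /* bit 15: AP[2] */
        uint32_t shareable    : 1;  /* bit 16: S */
        uint32_t not_global   : 1;  /* bit 17: nG */
        uint32_t supersection : 1;  /* bit 18: 0 for a plain section */
        uint32_t ns           : 1;  /* bit 19: NS */
        uint32_t base         : 12; /* bits 31:20: physical section base */
    } f;
} section_desc_t;

/* Builds a descriptor for the section containing phys_addr, with the given
 * AP[1:0] value and every other attribute left at zero. */
static section_desc_t make_section(uint32_t phys_addr, uint32_t ap_1_0)
{
    section_desc_t d = { .raw = 0 };

    d.f.is_section = 1;
    d.f.ap_1_0 = ap_1_0 & 0x3u;
    d.f.base = phys_addr >> 20;
    return d;
}
```

Compared with `(pa & 0xFFF00000) | (3 << 10) | 2`, the field names carry the meaning that would otherwise have to live in comments, at the cost of depending on the compiler's bitfield layout.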
The next step was switching to PL0 (unprivileged) mode under MMU-mapped address space. For that, we embedded in our kernel image a binary that is supposed to run in PL0 mode. It did the same simple thing the kernel used to do - echoing everything on uart. The privileged code would mark a memory section entry in the translation table as accessible for PL0 and then copy the embedded blob to that section. It would then jump to that just-copied code. Switching from PL1 to PL0 would be done on the blob side (at the very beginning of its execution).
This was also an opportunity for us to check that the memory mapping truly works. We mapped virtual addresses 0xAAA00000 - 0xAAAFFFFF to physical addresses just after our kernel image and translation table (probably 0x00100000 - 0x001FFFFF given the small size of our kernel). The virtual address range 0xAAA00000 - 0xAAAFFFFF was also marked available for use by PL0 code. We then made the kernel write the blob at 0x00100000, knowing it would also appear at virtual 0xAAA00000. Then, successfully running the unprivileged code from that address confirmed that the mapping really works.
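
The arithmetic behind that mapping can be sketched as follows, again assuming short-descriptor 1 MiB sections. `map_section` and `translate` are hypothetical names; `translate` is only a software model of the lookup the MMU performs for a section entry (it ignores supersections, the PXN bit, domains and permission checks), handy for checking table contents on a PC.

```c
#include <stdint.h>

#define SECTION_SHIFT     20u
#define SECTION_MASK      0xFFFFFu       /* offset within a 1 MiB section */
#define DESC_TYPE_SECTION 0x2u           /* bits [1:0] = 0b10 */
#define DESC_AP_FULL      (0x3u << 10)   /* AP[1:0] = 0b11, AP[2] = 0: read/write at PL1 and PL0 */

/* Points the table entry covering virt at the physical section containing phys. */
static void map_section(uint32_t *table, uint32_t virt, uint32_t phys, uint32_t flags)
{
    table[virt >> SECTION_SHIFT] = (phys & ~SECTION_MASK) | flags | DESC_TYPE_SECTION;
}

/* Software model of a section lookup. Returns 1 and stores the physical
 * address in *phys_out, or returns 0 if the entry is not a section. */
static int translate(const uint32_t *table, uint32_t virt, uint32_t *phys_out)
{
    uint32_t desc = table[virt >> SECTION_SHIFT];

    if ((desc & 0x3u) != DESC_TYPE_SECTION)
        return 0;

    *phys_out = (desc & ~SECTION_MASK) | (virt & SECTION_MASK);
    return 1;
}
```

With `map_section(table, 0xAAA00000, 0x00100000, DESC_AP_FULL)` the entry at index 0xAAA holds 0x00100C02, so virtual 0xAAA00123 resolves to physical 0x00100123.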
There were 2 important things that, if forgotten, would have stopped us from succeeding in this step. The first one was the stack. The kernel used to use the memory just below itself (physical 0x8000) as the stack, and since these addresses would not be available for PL0 code, a new stack had to be chosen - we set it somewhere in the high part of our unprivileged memory. The second important thing was marking the section in which the memory-mapped uart registers reside as accessible from PL0. This is because we were going to have the unprivileged code write to uart by itself (and use the same uart code the kernel uses...). Once we had interrupts programmed, this demo was to be improved to actually call the privileged code for writing and reading.