Boot-explained.txt



When RaspberryPi boots, it searches the first partition on the SD card (which should be formatted as FAT) for its firmware and configuration files, loads them and executes them. The firmware then searches for the kernel image file. The name of the file it looks for can be kernel.img, kernel7.img, kernel8.img (for 64-bit mode) or something else, depending on the configuration and the firmware used (rpi-open-firmware looks for zImage). The image is then copied to some address (which should be 0x8000 for a 32-bit kernel, but is 0x2000000 in rpi-open-firmware and 0x10000 in qemu (version 2.9.1)) and jumped to on all cores. Three arguments are passed to the kernel: the first (in r0) is 0; the second (in r1) is the machine type; the third (in r2) is the address of the FDT or ATAGS structure describing the system, or 0.
Pis that support aarch64 can also boot directly into 64-bit mode, in which case the image is loaded at 0x80000. We're not using 64-bit mode in this project.
Qemu can be used to emulate RaspberryPi, in which case the kernel image and memory size are provided to the emulator on the command line. Qemu can also load the kernel in the form of an ELF file, in which case its load address is determined from information in the ELF.

Our kernel has been executed on qemu emulating RaspberryPi 2 as well as on a real RaspberryPi 3 running rpi-open-firmware (although not every feature works everywhere). To speed up running new kernel images on the board, we wrote a simple bootloader, which can be run from the SD card instead of the actual kernel. It reads the kernel image from UART and executes it. The bootloader can also be used within qemu (although there are problems with passing keyboard input to the kernel once it's running).

Both the bootloader and the kernel are split into 2 stages.
In the case of the loader, this is because the actual kernel it reads from UART is supposed to be written at 0x8000. If the loader also ran from 0x8000 or a nearby address, it could overwrite its own code while writing the kernel to memory. To avoid this, the first stage of the loader first copies its embedded second stage to address 0x4000. It then jumps to that second stage, which reads the kernel image from UART, writes it at 0x8000 and jumps to it. Arguments (r0, r1, r2) are preserved and passed to the kernel. The second stage of the bootloader is intended to be kept small enough to fit between 0x4000 and 0x8000. The ATAGS structure, if present, is guaranteed to end below 0x4000, so it should not get overwritten by the loader's stage 2.
The loader protocol is simple: first, the size of the kernel is sent through UART (4 bytes, little endian), then the actual kernel image. Our program pipe_image is used to prepend the kernel image with its size.
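
The framing described above can be sketched in C as follows. This is only an illustration, not the code of pipe_image or of the loader: all function and type names are hypothetical, the UART is replaced by an in-memory byte stream, and the capacity check on the receiving side is an assumption of this sketch rather than something the loader is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a 32-bit size as 4 bytes, least significant byte first. */
static void encode_size_le(uint32_t size, uint8_t out[4])
{
    out[0] = (uint8_t)(size & 0xff);
    out[1] = (uint8_t)((size >> 8) & 0xff);
    out[2] = (uint8_t)((size >> 16) & 0xff);
    out[3] = (uint8_t)((size >> 24) & 0xff);
}

/* Decode the 4-byte little-endian header, as the receiving side would. */
static uint32_t decode_size_le(const uint8_t in[4])
{
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/*
 * Sender side: build the byte stream "size header, then image".
 * 'frame' must have room for image_len + 4 bytes. Returns the frame length.
 */
static size_t frame_image(const uint8_t *image, uint32_t image_len, uint8_t *frame)
{
    assert(image != NULL && frame != NULL);
    encode_size_le(image_len, frame);
    memcpy(frame + 4, image, image_len);
    return (size_t)image_len + 4;
}

/* Hypothetical stand-in for a blocking UART read: bytes come from a buffer. */
struct byte_stream {
    const uint8_t *data;
    size_t pos;
};

static uint8_t stream_next(void *ctx)
{
    struct byte_stream *s = ctx;
    return s->data[s->pos++];
}

/*
 * Receiver side: read the header, then that many image bytes into load_addr.
 * Returns the image size, or 0 if it would not fit (assumed check).
 */
static uint32_t receive_image(uint8_t (*read_byte)(void *), void *ctx,
                              uint8_t *load_addr, uint32_t capacity)
{
    uint8_t hdr[4];
    uint32_t size;

    for (int i = 0; i < 4; i++)
        hdr[i] = read_byte(ctx);
    size = decode_size_le(hdr);
    if (size > capacity)
        return 0;
    for (uint32_t i = 0; i < size; i++)
        load_addr[i] = read_byte(ctx);
    return size;
}
```

On the real board, load_addr would correspond to 0x8000 and read_byte to a routine polling the UART; here both are parameters so the logic can be tried on a host machine.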

In the case of the kernel, it is desirable to have the image run from 0x0, because that's where the interrupt vector table is under default settings. This is also achieved by splitting the kernel into 2 stages. Stage 1 is loaded at some higher address and has the stage 2 image embedded in it. It copies stage 2 to 0x0 and jumps to it. What is more complicated than in the loader is the handling of the ATAGS structure. Before copying stage 2 to 0x0, stage 1 checks whether ATAGS is present and, if so, copies it to a location high enough that it won't be overwritten by the stage 2 image. Whenever the memory layout is modified, it should be checked whether there is a danger of ATAGS being overwritten by some kernel operation before it is used. In the current setup, the new location chosen for ATAGS is always below the memory later used as the stack, and it might overlap the memory later used for the translation table, which is not a problem, since the kernel only uses ATAGS before filling that table.
When stage 1 of the kernel jumps to stage 2, it passes modified arguments: the first argument (r0) remains 0 if ATAGS was found and is set to 3 to indicate that ATAGS was not found. The second argument (r1) remains unchanged. The third argument (r2) is the current address of ATAGS (or remains unchanged if no ATAGS was found).
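
The argument handling above can be modelled on a host machine with the C sketch below. It is not the project's code (stage 1 is written in assembly), and the struct, function names and the bound max_words are hypothetical. It relies on the standard ATAGS layout: each tag starts with two 32-bit words (size in words, then tag id), the list starts with ATAG_CORE (0x54410001) and ends with ATAG_NONE (0). How the real stage 1 detects ATAGS is not stated here, so the ATAG_CORE check is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATAG_NONE 0x00000000u
#define ATAG_CORE 0x54410001u

/* The three boot arguments (hypothetical struct; the real code uses registers). */
struct boot_args {
    uint32_t r0, r1, r2;
};

/*
 * Length of an ATAGS list in 32-bit words, including the terminating
 * ATAG_NONE header. Returns 0 if the list does not start with ATAG_CORE,
 * is malformed, or does not end within max_words.
 */
static size_t atags_words(const uint32_t *atags, size_t max_words)
{
    size_t pos = 0;

    if (atags == NULL || max_words < 2 || atags[1] != ATAG_CORE)
        return 0;
    while (pos + 2 <= max_words) {
        uint32_t size = atags[pos];
        uint32_t tag = atags[pos + 1];

        if (tag == ATAG_NONE)
            return pos + 2;              /* terminator occupies two words */
        if (size < 2 || size > max_words - pos)
            return 0;                    /* malformed or out of bounds */
        pos += size;
    }
    return 0;
}

/*
 * Move ATAGS (if present) to 'dest' and compute the arguments for stage 2.
 * 'atags' is what r2 pointed to on entry; 'dest_addr' is the address that
 * is reported in r2 after the move.
 */
static struct boot_args relocate_atags(struct boot_args in,
                                       const uint32_t *atags, size_t max_words,
                                       uint32_t *dest, uint32_t dest_addr)
{
    struct boot_args out = in;           /* r1 is always passed through */
    size_t words = atags_words(atags, max_words);

    if (words == 0) {
        out.r0 = 3;                      /* ATAGS not found; r2 unchanged */
        return out;
    }
    memmove(dest, atags, words * sizeof(uint32_t));
    out.r2 = dest_addr;                  /* r0 stays as passed in (0) */
    return out;
}
```

Stage 2 can then tell the two cases apart by looking at r0 alone, without having to re-validate whatever r2 points to.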
If support for FDT is added in the future, it must be handled just as carefully, so that the FDT doesn't get overwritten.
At the start of stage 2 of the kernel, there is the interrupt vector table. Its first entry is the reset vector, which is normally unused. In our case, when stage 1 jumps to 0x0, the first instruction of stage 2, it lands on that vector, which then calls the setup routine.

In both the loader and the kernel, the beginning of stage 1 ensures that only one ARM core is executing.

It's worth noting that in the first stages, the loop that copies the embedded second stage is intentionally placed after the blob in the image. This way, the loop will not overwrite itself with the data it is copying, since stage 2 is always copied to some lower address (to 0x0 in the case of the kernel and to 0x4000 in the case of the loader; we assume stage 1 won't be loaded below 0x4000).
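
The reasoning above can be checked with a small host-side C model. This is a sketch under assumptions, not the project's code: the real copy loops are assembly, and the function names and example addresses are hypothetical. The first function shows that an ascending copy to a lower address keeps the data intact even when source and destination overlap; the second expresses the condition under which the destination range stops before the copy loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy n bytes in ascending order. When dst is below src, every source
 * byte is read before the copy can overwrite it, so overlap is harmless.
 */
static void copy_forward(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert((uintptr_t)dst <= (uintptr_t)src);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/*
 * Returns 1 if the destination range [dst, dst + len) ends at or before
 * the first instruction of the copy loop, i.e. the loop is not overwritten.
 * If the loop sits right after the blob (loop_start = blob_start + len),
 * this reduces to dst <= blob_start.
 */
static int loop_survives_copy(uint32_t dst, uint32_t len, uint32_t loop_start)
{
    return (uint64_t)dst + len <= loop_start;
}
```

For example, with a hypothetical blob of 0x3000 bytes at 0x8040 followed by the loop at 0xB040, copying to 0x0 leaves the loop alone. Had the loop been placed before a 0x9000-byte blob, at 0x8000, the same copy to 0x0 would have run over it.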

Qemu, the stock RaspberryPi firmware and rpi-open-firmware all load the image at different addresses. Although the stock firmware is not used in this project, our loader loads the kernel at 0x8000, where the stock firmware would. Because of that, it is desirable for the image to be able to run regardless of where it was loaded. This was achieved by writing the first stages of the loader and the kernel in careful, position-independent assembly. The starting address in the corresponding linker scripts is irrelevant. The stage 2 blobs are embedded using the .incbin assembly directive. The second stages are written normally in C and compiled as position-dependent code for their respective addresses.