From dbe4100b58c901685f223a241d90bd901ea59c68 Mon Sep 17 00:00:00 2001
From: Wojtek Kosior
Date: Tue, 21 Jan 2020 16:51:59 +0100
Subject: rewrite documentation in org

---
 README.org | 1321 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1321 insertions(+)
 create mode 100644 README.org

diff --git a/README.org b/README.org
new file mode 100644
index 0000000..4f6f7d8
--- /dev/null
+++ b/README.org
@@ -0,0 +1,1321 @@
#+TITLE: RaspberryPi MMU example

* Building the project
** Dependencies
1. Native GCC (+ binutils)
2. ARM cross-compiler GCC (+ binutils) (arm-none-eabi works; others
   might or might not)
3. GNU Make
4. rpi-open-firmware (for running on the Pi)
5. GNU screen (for communicating with the kernel when running on the Pi)
6. socat (for communicating with the bootloader when running on the Pi)
7. Qemu ARM (for emulating the Pi)

Building rpi-open-firmware requires additional tools (not listed here).

The project has been tested only in Qemu emulating a Pi 2 and on a real
Pi 3 model B.

Running on Pis other than the Pi 2 and Pi 3 will certainly require
changing the peripheral base address definition in global.h (because
peripheral base addresses differ between Pi versions) and might also
require other modifications, not known at this time.

Assuming make, gcc, arm-none-eabi-gcc and its binutils are in the PATH,
the kernel can be built with:

#+BEGIN_EXAMPLE
  $ make kernel.img
#+END_EXAMPLE

which is the same as:

#+BEGIN_EXAMPLE
  $ make
#+END_EXAMPLE

The bootloader can be built with:

#+BEGIN_EXAMPLE
  $ make loader.img
#+END_EXAMPLE

Both the loader and the kernel can then be found in build/.

* Running
** Running in Qemu
To run the kernel (passed as an elf file) in Qemu:

#+BEGIN_EXAMPLE
  $ make qemu-elf
#+END_EXAMPLE

To pass a binary image to Qemu instead:

#+BEGIN_EXAMPLE
  $ make qemu-bin
#+END_EXAMPLE

To pass the loader image to Qemu and pipe the kernel to it through the
emulated UART:

#+BEGIN_EXAMPLE
  $ make qemu-loader
#+END_EXAMPLE

With qemu-loader the kernel will run, but it will be unable to receive
any keyboard input.

The timer used by this project is the ARM timer ("based on an ARM
AP804", with registers mapped at 0x7E00B000 in the GPU address space).
It is absent in the emulated environment, so no timer interrupts can be
observed in Qemu.

** Running on real hardware

First, rpi-open-firmware has to be built. Then, kernel.img (or
loader.img) should be copied to the SD card (next to bootcode.bin) and
renamed to zImage. Also, the .dtb file corresponding to the Pi model
has to be taken from the stock firmware files, put on the SD card and
renamed to rpi.dtb (actually, any .dtb would do, since it is not used
right now). Finally, a cmdline.txt has to be present on the SD card
(its content doesn't matter).

Now, the RaspberryPi can be connected via UART to the development
machine. GPIO on the Pi works at 3.3V, so one should make sure that the
UART device on the other end also works at 3.3V.
This is the pinout of the RaspberryPi 3 model B that has been used for
testing so far:

#+BEGIN_EXAMPLE
  Top left of the board is here
  |
  V
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  | 2| 4| 6| 8|10|12|14|16|18|20|22|24|26|28|30|32|34|36|38|40|
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  | 1| 3| 5| 7| 9|11|13|15|17|19|21|23|25|27|29|31|33|35|37|39|
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
#+END_EXAMPLE

Under rpi-open-firmware (stock firmware might map UARTs differently):

1. pin 6 is Ground
2. pin 8 is TX
3. pin 10 is RX

Once the UART is connected, the board can be powered on.

It is assumed that a USB to UART adapter is used and that the system
sees it as /dev/ttyUSB0.

If one copied the kernel to the SD card, one can start communicating
with the board by running:

#+BEGIN_EXAMPLE
  $ screen /dev/ttyUSB0 115200,cs8,-parenb,-cstopb,-hupcl
#+END_EXAMPLE

If one copied the loader, one can send it the kernel image and start
communicating with the system by running:

#+BEGIN_EXAMPLE
  $ make run-on-rpi
#+END_EXAMPLE

To run again, one can replug the USB to UART adapter and the Pi's power
supply (order matters!) and re-enter the command.

Running under the stock firmware has not been tested. In particular,
the default configuration on the RaspberryPi 3 seems to route a UART
other than the one used by the kernel (the so-called miniUART) to pins
6, 8 and 10. This is supposed to be configurable through the use of
overlays.

* Makefile
To maintain order, all files created with make (binaries, object files,
natively executed helper programs, etc.) get placed in build/.

Our project contains 2 Makefiles: one in its root directory and one in
build/. The reason is that it is easier to use a Makefile to simply,
elegantly and efficiently produce files in the same directory where it
is.
Producing files in a directory other than the Makefile's own requires
that directory to be specified in many rules across the Makefile and,
in general, complicates things. A problem also arises when linking
objects that are not in the current directory: if an object is
referenced by name in a linker script (which is a frequent practice in
our scripts) and is passed to gcc with a path, then it would also need
to appear with that path in the linker script. Because of that, the
Makefile in build/ produces files into its own directory, and the
Makefile in the project's root is used as a proxy to it: it calls make
recursively in build/ with the same target it was called with. This
arrangement keeps the build rules easier to read.

From now on, only the Makefile in build/ will be discussed.

In the Makefile, variables with the names of certain tools and their
command line flags are defined using ?= assignment, so that a value
supplied from outside (on the command line or in the environment) is
used instead. If a cross-compiler with a different triple should be
used, ARM\_BASE, normally set to arm-none-eabi, can be set to something
like arm-linux-gnueabi or even /usr/local/bin/arm-none-eabi.

All variables discussed below are defined using := assignment, which
causes them to be evaluated only once instead of on every reference.

Objects that should be linked together to create each of the .elf files
are listed in their respective variables, e.g. the objects used for
creating kernel\_stage2.elf are all listed in KERNEL\_STAGE2\_OBJECTS.
When adding a new source file to the kernel, it is enough to add its
respective .o file to that list to make it compile and link properly;
no other Makefile modifications are needed. In a similar fashion, the
RAMFS\_FILES variable specifies the files that should be put in the
ramfs image, which will be embedded in the kernel. Adding another file
only requires listing it there.
However, if the file is to be found somewhere other than build/, it
might be useful to use the vpath directive to tell make where to look
for it.

Variables dirs and dirs\_colon are defined to store the list of all
directories within src/, separated with spaces and colons,
respectively. dirs\_colon is used for the vpath directive. The dirs
variable is used in ARM\_FLAGS to pass all the directories to gcc as
include search paths. empty and space are helper variables; defining
dirs\_colon could be achieved without them, but it's clearer this way.

The vpath directive tells make to look for assembler sources, C sources
and linker scripts in src/ and all of its direct and indirect
subdirectories. All other files are found in or created in build/.

** Targets

The default target is the binary image of the kernel.

The generic rule for compiling C sources uses the cross-compiler or the
native compiler with appropriate flags, depending on whether the source
file is located somewhere under the arm/ directory (which lies in src/)
or anywhere else.

The generic rules for making a stripped binary image out of an elf
file, for assembling an assembly file, for making an arbitrary file a
linkable object and for linking objects are ARM-only.

In the C world it is possible to embed a file in an executable by using
objcopy to create an object file from it and then linking that object
file into the executable. In this project this is currently used only
for embedding the ramfs in the kernel (incbin is used for embedding the
second stages of the kernel and loader in their first stages). A
generic rule for making a binary file into an object file is present,
in case it is needed somewhere else again.

To link elf files, the generic rule is combined with a rule that
specifies the elf's objects. Objects are listed in variables whenever
more than one of them is needed.

At this point in the Makefile, the dependency of objects assembled from
assembly sources on the files those sources reference via incbin is
declared.

A simple RAM filesystem is created from the files it should contain
using our own simple tool, makefs.

Another 2 rules specify how native programs (for the machine we're
working on) are to be linked.

** Aliased Rules

Rule qemu-elf runs the kernel in Qemu emulating a RaspberryPi 2 with
256MiB of memory by passing the elf file of the kernel to the emulator.

Rule qemu-bin does the same, but passes the binary image of the kernel
to Qemu.

Rule qemu-loader does the same, but first passes the binary image of
the bootloader to Qemu; the actual kernel is then piped to Qemu's
standard input, received by the bootloader as UART data and run. This
method currently makes it impossible to pass any keyboard input to the
kernel once it's running.

Rule run-on-rpi pipes the kernel through the UART, assuming it is
available under /dev/ttyUSB0, and then opens a screen session on that
interface. This allows executing the kernel on a Pi connected through
UART, provided that our bootloader is running on the board.

Rule clean removes all the files generated in build/.

Rules that don't generate files are marked as PHONY.

* Project structure
Directory structure of the project:

#+BEGIN_EXAMPLE
  doc/
  build/
    Makefile
  Makefile
  src/
    lib/
      rs232/
        rs232.c
        rs232.h
    host/
      pipe_image.c
      makefs.c
    arm/
      common/
        svc_interface.h
        strings.c
        io.h
        io.c
        strings.h
      PL0/
        PL0_utils.h
        svc.S
        PL0_utils.c
        PL0_test.c
        PL0_test.ld
      PL1/
        loader/
          loader_stage2.ld
          loader_stage2.c
          loader_stage1.S
          loader.ld
        kernel/
          demo_functionality.c
          paging.h
          setup.c
          interrupts.h
          interrupt_vector.S
          kernel.ld
          scheduler.h
          atags.c
          translation_table_descriptors.h
          bcmclock.h
          ramfs.c
          kernel_stage1.S
          paging.c
          ramfs.h
          interrupts.c
          armclock.h
          atags.h
          kernel_stage2.ld
          cp_regs.h
          psr.h
          scheduler.c
          memory.h
          demo_functionality.h
        PL1_common/
          global.h
          uart.h
          uart.c
#+END_EXAMPLE

** Most significant directories and files

doc/ Contains the documentation of the project.

build/ Contains the main Makefile of the project. All objects created
during the build process are placed there.

Makefile Proxies all calls to the Makefile in build/.

src/ Contains all sources of the project.

src/host/ Contains sources of helper programs to be compiled using the
native GCC and run on the machine where development takes place.

src/arm/ Contains sources to be compiled using the ARM cross-compiler
GCC and run on the RaspberryPi.

src/arm/common Contains sources used in both privileged and
unprivileged mode.

src/arm/PL0 Contains sources used exclusively in the unprivileged,
user-mode (PL0) program, as well as the program's linker script.

src/arm/PL1 Contains sources used exclusively in privileged (PL1) mode.

src/arm/PL1/loader Contains sources used exclusively in the bootloader,
as well as linker scripts for stages 1 and 2 of the bootloader.

src/arm/PL1/kernel Contains sources used exclusively in the kernel, as
well as linker scripts for stages 1 and 2 of the kernel.

src/arm/PL1/PL1\_common Contains sources used in both the kernel and
the bootloader.

TODOs Contains what the name suggests, in plain text. It lists things
that can still be implemented or improved, as well as tasks that were
once listed and have since been completed (in which case they're marked
as done).

* Boot Process
When the RaspberryPi boots, it searches the first partition on the SD
card (which should be formatted as FAT) for its firmware and
configuration files, loads them and executes them. The firmware then
searches for the kernel image file. The name of the file looked for can
be kernel.img, kernel7.img, kernel8.img (for 64-bit mode) or something
else, depending on the configuration and the firmware used
(rpi-open-firmware looks for zImage).

The image is then copied to some address and jumped to on all cores.
The address should be 0x8000 for a 32-bit kernel, but in reality it is
0x2000000 in rpi-open-firmware and 0x10000 in Qemu (version 2.9.1).
3 arguments are passed to the kernel: the first (in r0) is 0; the
second (in r1) is the machine type; the third (in r2) is the address of
the FDT or ATAGS structure describing the system, or 0 by default.

Pis that support AArch64 can also boot directly into 64-bit mode. Then,
the image gets loaded at 0x80000. We're not using 64-bit mode in this
project.

Qemu can be used to emulate the RaspberryPi, in which case the kernel
image and memory size are provided to the emulator on the command line.
Qemu can also load the kernel in the form of an elf file, in which case
its load address is determined from the information in the elf.

Our kernel has been executed on Qemu emulating a RaspberryPi 2 as well
as on a real RaspberryPi 3 running rpi-open-firmware (although not all
functionality works everywhere). To speed up running new kernel images
on the board, we have written a simple bootloader, which can be run
from the SD card instead of the actual kernel. It reads the kernel
image from UART and executes it.
The bootloader can also be used within Qemu, but there are several
problems with passing keyboard input to the kernel once it's running.

Both the bootloader and the kernel are split into 2 stages.

** Loader

In the case of the loader, this is because the actual kernel it reads
from UART is supposed to be written at 0x8000. If the loader also ran
from 0x8000 or a nearby address, it could overwrite its own code while
writing the kernel to memory. To avoid this, the first stage of the
loader copies the second stage embedded in it to address 0x4000. Then,
it jumps to that second stage, which reads the kernel image from UART,
writes it at 0x8000 and jumps to it. Arguments (r0, r1, r2) are
preserved and passed to the kernel. The second stage of the bootloader
is intended to be kept small enough to fit between 0x4000 and 0x8000.
The ATAGS structure, if present, is guaranteed to end below 0x4000, so
it should not get overwritten by the loader's stage 2.

The loader protocol is simple: first, the size of the kernel is sent
through UART (4 bytes, little endian), then the actual kernel image.
Our program pipe\_image is used to prepend the kernel image with its
size.

** Kernel
In the case of the kernel, it is desired to have the image run from
0x0, because that's where the interrupt vector table is under default
settings. This is also achieved by splitting it into 2 stages.
*** Stage 1
Stage 1 is loaded at some higher address. It has the second stage image
embedded in it, which it copies to 0x0 and jumps to. What gets more
complicated compared to the loader is the handling of the ATAGS
structure. Before copying stage 2 to 0x0, stage 1 checks whether ATAGS
is present and, if so, copies it to a location high enough that it
won't be overwritten by the stage 2 image. Whenever the memory layout
is modified, it should be checked whether there is a danger of ATAGS
being overwritten by some kernel operations before it is used.
In the current setup, the new location chosen for ATAGS is always below
the memory later used as the stack, and it might overlap the memory
later used for the translation table, which is not a problem, since the
kernel only uses ATAGS before filling that table.

When stage 1 of the kernel jumps to the second stage, it passes
modified arguments: the first argument (r0) remains 0 if ATAGS was
found and is set to 3 to indicate that ATAGS was not found. The second
argument (r1) remains unchanged. The third argument (r2) is the current
address of ATAGS (or remains unchanged if no ATAGS was found). If
support for FDT is added in the future, it must also be done carefully,
so that the FDT doesn't get overwritten.
*** Stage 2
At the start of stage 2 of the kernel there is the interrupt vector
table. Its first entry is the reset vector, which is normally unused.
In our case, when stage 1 jumps to 0x0 (the first instruction of stage
2), it lands on that vector, which then calls the setup routine.

*** Notes

In both the loader and the kernel, it is ensured at the beginning of
stage 1 that only one ARM core is executing.

It's worth noting that in the first stages the loop that copies the
embedded second stage is intentionally placed after the blob in the
image. This way, the loop will not overwrite itself with the data it is
copying, since stage 2 is always copied to a lower address: to 0x0 in
the case of the kernel and to 0x4000 in the case of the loader (we
assume stage 1 won't be loaded below 0x4000).

Qemu, the stock RaspberryPi firmware and rpi-open-firmware all load the
image at different addresses. Although the stock firmware is not used
in this project, our loader loads the kernel at 0x8000, where the stock
firmware would. Because of that, it is desired that the image be able
to run regardless of where it was loaded. This was achieved by writing
the first stages of the loader and the kernel in careful,
position-independent assembly.
The starting address in the corresponding linker scripts is irrelevant.
The stage 2 blobs are embedded using the .incbin assembly directive.
The second stages are written normally in C and compiled as
position-dependent for their respective addresses.

* MMU

Here's an explanation of the steps we took to enable the MMU and of how
the MMU works in general.

MMU stands for Memory Management Unit. It does 2 important things:

1. It allows programs to use virtual memory addressing. Virtual
   addresses are translated by the MMU to physical addresses with the
   help of a translation table.
2. It guards against disallowed memory accesses. A unit that only
   implements this functionality is called an MPU (Memory Protection
   Unit) and is also found in some ARM cores.

Without an MMU, code executing on a processor sees the memory as it
really is.

When it tries to load data from address 0x00AA0F3C, it indeed loads
data from 0x00AA0F3C. This doesn't mean address 0x00AA0F3C is in RAM:
RAM can be mapped into the address space in an arbitrary way.

The MMU can be configured to "redirect" some range of addresses to some
other range. Let's assume we configured the MMU to translate the
address range 0x00A00000 - 0x00B00000 to the range 0x00200000 -
0x00300000. Now, code trying to perform an operation on address
0x00AA0F3C would have the address transparently translated to
0x002A0F3C, on which the operation would actually take place.

The translation affects all (stack and non-stack) data accesses as well
as instruction fetches, hence an entire program can be made to work as
if it was running from some memory address, while in fact it runs from
a different one!

The addresses used by program code are referred to as virtual
addresses, while the addresses actually used by the processor are
called physical addresses.

This aids the operating system's memory management in several ways:

1. A program may be compiled to run from some fixed address and the OS
   is still free to choose any physical location to store that
   program's code; only a translation of the program's required address
   to that location's address has to be configured. The problem of
   simultaneously executing multiple programs compiled for the same
   address is also avoided this way.
2. Some program might require a contiguous memory region while, due to
   earlier allocations and deallocations, there isn't a big enough (no
   pun intended) free contiguous region of physical memory. Smaller
   regions can be mapped so that they are accessible as a single region
   in the virtual address space, thus avoiding the need for
   defragmentation.

A given mapping can be made valid for only one execution mode (e.g. a
region only accessible from privileged mode) or only for certain types
of accesses. A memory region can be made non-executable, which guards
against accidental jumps there by program code. That is important for
countering buffer-overflow exploits. A disallowed access triggers a
processor exception, which passes control to an appropriate interrupt
service routine.

In the RaspberryPi environments used by us there are ARMv7-A compatible
processors, which we currently use only in 32-bit mode. The information
here is relevant to those systems (there are Pi boards with both older
and newer processors, with more or less functionality and features
available).

If an MMU is present, its general configuration is done through the
registers of the appropriate coprocessor (cp15). Translations are
managed through a translation table. It is an array of 32-bit or 64-bit
entries (also called descriptors) describing how their corresponding
memory regions should be mapped. A number of the leftmost bits of a
virtual address constitutes the index of the translation table entry
used for translating it.
This way no virtual addresses need to be stored in the table and the
MMU can perform translations in O(1) time.

** Coprocessor 15

Coprocessor 15 contains several registers that control the behaviour
of the MMU. They are all accessed through the mcr and mrc ARM
instructions.

1. SCTLR, System Control Register - "provides the top level control of
   the system, including its memory system". Bits of this register
   control, among other things, whether the following are enabled:

   1. the MMU
   2. data cache
   3. instruction cache
   4. TEX remap (changes how some translation table entry bit fields,
      called C, B and TEX, are used; not used in the project)
   5. access flag (enabling it causes one translation table descriptor
      bit, normally used to specify the access permissions of a region,
      to be used as an access flag; not used either)

2. DACR, Domain Access Control Register - "defines the access permission
   for each of the sixteen memory domains". Entries in the translation
   table define which of the 16 available memory domains a memory region
   belongs to. Bits of DACR specify what permissions apply to each of
   the domains. Possible settings are to allow accesses to regions based
   on the settings in the translation table descriptor, or to
   allow/disallow all accesses regardless of the access permission bits
   in the translation table.

3. TTBR0, Translation Table Base Register 0 - "holds the base address of
   translation table 0, and information about the memory it occupies".
   The system programmer can choose (subject to some alignment
   requirements) where in physical memory to put the translation table.
   The chosen address (actually, only a number of its leftmost bits) has
   to be put in TTBR0 for the MMU to know where the table lies. Other
   bits of this register control some memory attributes relevant for
   accesses to table entries by the MMU.

4. TTBR1, Translation Table Base Register 1 - similar function to TTBR0
   (the use of two TTBRs is explained below).
5. TTBCR, Translation Table Base Control Register, which controls:

   1. How TLBs (Translation Lookaside Buffers) are used. TLBs are a
      mechanism for caching translation table entries.
   2. Whether to use an extension feature that changes translation table
      entries and TTBR* lengths to 64 bits (we're not using this, so we
      won't go into details).
   3. How a translation table is selected.

There can be 2 translation tables and there are 2 cp15 registers (TTBR0
and TTBR1) to hold their base addresses. When 2 tables are in use, some
leftmost bits of the virtual address determine, on each memory access,
which one should be used. If those bits are all 0s, the table pointed
to by TTBR0 is used; otherwise, the one pointed to by TTBR1 is used.
This allows the OS developer to use separate translation tables for
kernelspace and userspace (e.g. by having kernelspace code run from
virtual addresses starting with 1 and userspace code run from virtual
addresses starting with 0). A field of TTBCR (N) determines how many
leftmost bits of the virtual address are used for that (and also
affects the TTBR0 format). In the simplest setup (as in our project)
this number is 0, so only the table specified in TTBR0 is used.

** Translation table

The translation table consists of 4096 entries, each describing a 1MB
memory region. An entry can be of several types:

1. Invalid entry - the corresponding virtual addresses cannot be used.
2. Section - description of a mapping of a 1MB memory region.
3. Supersection - description of a mapping of a 16MB memory region,
   which has to be repeated in 16 consecutive translation table
   entries. This can be used to map to physical addresses higher than
   2\^32.
4. Page table - no mapping is given yet, but a page table is pointed
   to. See below.

Besides, a translation table descriptor also specifies:

1. Access permissions.
2. Other memory attributes (cacheability, shareability).
3. Which domain the memory belongs to.
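
To make the table lookup concrete, here is a host-side C sketch of the
ARMv7-A short-descriptor format for 1MB sections, reusing the
0x00A00000 to 0x00200000 example from earlier in this section. It uses
plain shifts and masks rather than the project's bitfield structs, the
names section\_descriptor() and translate() are hypothetical, and only
plain sections are handled (invalid entries, page tables and
supersections are reported as failures).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the ARMv7-A short-descriptor
 * format for a 1MB section entry. */
#define TT_ENTRIES          4096u        /* 4096 entries x 1MB = 4GB */
#define SECTION_BIT         (1u << 1)    /* bit 1 set: section or supersection */
#define SUPERSECTION_BIT    (1u << 18)   /* bit 18 set: supersection */
#define SECTION_BASE_MASK   0xFFF00000u  /* bits[31:20]: physical section base */
#define SECTION_OFFSET_MASK 0x000FFFFFu  /* low 20 bits pass through unchanged */

/* Build a section descriptor mapping one 1MB virtual section to the
 * physical section at phys_base. domain goes to bits[8:5], ap to AP[1:0]
 * (bits[11:10]). AP[2], XN, S, nG, TEX, C and B are left 0
 * (TEX = C = B = 0 means strongly-ordered memory). */
static uint32_t section_descriptor(uint32_t phys_base, unsigned domain,
                                   unsigned ap)
{
    assert((phys_base & SECTION_OFFSET_MASK) == 0); /* must be 1MB aligned */
    assert(domain < 16 && ap < 4);

    return (phys_base & SECTION_BASE_MASK)
         | ((uint32_t)ap << 10)
         | ((uint32_t)domain << 5)
         | SECTION_BIT;
}

/* Software model of a section-only lookup: the top 12 bits of va select
 * the entry and the low 20 bits become the offset within the section.
 * Returns 0 and stores the physical address in *pa, or -1 if the entry
 * is not a plain section. */
static int translate(const uint32_t table[TT_ENTRIES], uint32_t va,
                     uint32_t *pa)
{
    uint32_t entry = table[va >> 20];

    /* bit 1 clear: invalid entry (0b00) or page table (0b01) */
    if (!(entry & SECTION_BIT) || (entry & SUPERSECTION_BIT))
        return -1;

    *pa = (entry & SECTION_BASE_MASK) | (va & SECTION_OFFSET_MASK);
    return 0;
}
```

Storing section\_descriptor(0x00200000, 0, 1) at index 0x00A
(0x00A00000 shifted right by 20) makes translate() turn 0x00AA0F3C into
0x002A0F3C, while addresses in the unmapped section at 0x00B00000 fail.
AP[1:0] = 0b01 with AP[2] = 0 means read/write access from PL1 and no
access from PL0.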

** Page Table

A page table is similar to the translation table, but its entries
describe smaller regions (called, well, pages). When a translation
table descriptor pointing to a page table gets used for translation, an
entry in that page table is fetched, using some middle bits of the
virtual address as the index, and used for the translation. This allows
for finer granularity of mappings, while page tables don't have to
occupy space if small pages are not needed. We could say that 2-level
translations are performed. On some versions of ARM, translations can
have more levels than that. This means the MMU might sometimes need to
fetch several entries from tables at different levels to compute the
physical address. This is called a translation table walk.

As of 15.01.2020, page tables and small pages are not used in the
project (although programming them is on the TODO list).

** Project specific information

Despite the overwhelming number of configuration options available,
most can be left at their defaults, and this is how it's done in this
project. Those default settings usually make the MMU behave like it did
in older ARM versions, when some options were not yet available and,
hence, the entire system was simpler.

Our project uses C bitfield structs for operating on SCTLR contents and
on translation table descriptors. With DACR, bit shifts are more
appropriate, and with TTBCR, our default configuration means we're
simply writing 0 to that register. Bitfield structs are an elegant and
readable approach, yet not very portable across compilers (the layout
of bitfields is implementation-defined in C). The current struct
definitions work properly with GCC.

Definitions related to SCTLR, DACR and TTBCR are in
src/arm/PL1/kernel/cp\_regs.h. Structs describing translation table
descriptors are defined in
src/arm/PL1/kernel/translation\_table\_descriptors.h.

Before the MMU is enabled, all memory is seen as it really is.
Therefore, the only feasible way of enabling it is to initially set the
descriptors in the translation table to map all addresses (mapping just
the addresses used by the kernel would be enough) to themselves. This
is called a flat map.

** Setting up MMU and FlatMap

This is how setting up a flat map, turning on the MMU and managing
memory sections are done in our project:

1. The translation table is defined in the linker script
   src/arm/PL1/kernel/kernel\_stage2.ld as a NOLOAD section. C code gets
   the table's start and end addresses from symbols defined in that
   linker script (see src/arm/PL1/kernel/memory.h).
2. Function setup\_flat\_map(), defined in src/arm/PL1/kernel/paging.c,
   enables the MMU with a flat map. It prints relevant information to
   UART while performing the following procedure:

   1. In a loop, write all descriptors to the translation table, setting
      them as sections accessible from PL1 only and belonging to
      domain 0.
   2. Set DACR to allow domain 0 memory accesses based on translation
      table descriptor permissions and to block accesses to other
      domains, as only domain 0 is used in this project.
   3. Make sure TEX remap, access flag, caches and the MMU are disabled
      in SCTLR. Disabling some of them might be unnecessary, because the
      MMU is assumed to be disabled from the start and enabled caches
      might cause no problems as long as only the flat map is used.
      Still, the way it is done right now is known to work well and
      optimizations are not needed.
   4. Clear all caches and TLBs (again, it is suspected that some of
      this is unnecessary).
   5. Write TTBCR such that only TTBR0 is used, with the 32-bit
      (short-descriptor) translation table format.
   6. Make TTBR0 point to the start of the translation table. The rest
      of the attributes in TTBR0 (concerning how table entries are
      accessed) are left as 0s (defaults).
   7. Enable the MMU and caches by setting the appropriate bits in
      SCTLR.
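
Step 1 and the DACR value from step 2 can be modelled on the host with
the C sketch below. Every one of the 4096 entries becomes a section
mapping its own 1MB range, with PL1-only access (AP[2:0] = 0b001) in
domain 0, and DACR gets the "client" encoding (0b01) for domain 0 and
"no access" (0b00) for the other 15 domains. This is not the project's
setup\_flat\_map(), which works on bitfield structs and also programs
cp15 registers; the function names here are hypothetical and only the
values written are modelled.

```c
#include <assert.h>
#include <stdint.h>

#define FLAT_MAP_ENTRIES 4096u

/* DACR holds a 2-bit field per domain: 0b00 no access, 0b01 client
 * (the descriptor's AP bits are checked), 0b11 manager (no checks). */
#define DACR_NO_ACCESS 0x0u
#define DACR_CLIENT    0x1u
#define DACR_MANAGER   0x3u

/* DACR value with `domain` set to `type` and all other domains 0b00. */
static uint32_t dacr_value(unsigned domain, uint32_t type)
{
    assert(domain < 16);
    return (type & 0x3u) << (2 * domain);
}

/* Fill `table` (4096 entries; it must be 16KB aligned when TTBR0 points
 * at it with TTBCR.N = 0) so that virtual section i maps to physical
 * section i, i.e. every address translates to itself. */
static void fill_flat_map(uint32_t *table)
{
    for (uint32_t i = 0; i < FLAT_MAP_ENTRIES; i++) {
        table[i] = (i << 20)      /* physical base = virtual base */
                 | (0x1u << 10)   /* AP[1:0] = 0b01, AP[2] = 0: PL1 only */
                 | (0x0u << 5)    /* domain 0 */
                 | 0x2u;          /* bits[1:0] = 0b10: section */
    }
}
```

With such a table, every virtual address equals its physical address,
so setting the MMU enable bit does not move the code being executed out
from under the processor. dacr\_value(0, DACR\_CLIENT) evaluates to 0x1,
matching the "domain 0 by descriptor permissions, other domains
blocked" setting.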

After some cp15 register writes, the isb instruction is used, which
causes the ARM core to wait until the changes take effect. This is done
to prevent later instructions from being executed before the changes
are applied.

In src/arm/PL1/kernel/paging.c the function claim\_and\_map\_section()
can be used to modify an entry in the translation table to create a new
mapping. Memory allocation, also done in that source file, uses some
lists to describe free and taken sections, but has nothing to do with
the MMU.

* Program Status Register
CPSR (Current Program Status Register) is a register whose bits contain
and/or determine various aspects of execution, e.g. condition flags,
execution state (ARM, Thumb or Jazelle), endianness state, execution
mode and interrupt mask. This register is readable and writable with
the mrs and msr instructions from any PL1 mode, so it is possible to
change things like the mode or the interrupt mask by writing to this
register.

Additionally, there are other registers with the same or similar bit
fields as CPSR. Those PSRs (Program Status Registers) are:

1. APSR (Application Program Status Register)
2. SPSRs (Saved Program Status Registers)

APSR can be considered the same as CPSR, or a view of CPSR, with some
limitations: some bit fields from CPSR are missing (reserved) in APSR.
APSR can be accessed from PL0, while CPSR should only be accessed from
PL1. This way, an application program executing in user mode can learn
some of the settings in CPSR without accessing CPSR directly.

SPSR is used for exception handling. Each exception-taking mode has its
own SPSR (they can be called SPSR\_svc, SPSR\_irq, etc.). On exception
entry, the old contents of CPSR are backed up in the entered mode's
SPSR. Instructions used for exception return (subs and ldm \^), when
writing to the pc, have the important additional effect of copying the
SPSR to CPSR.
This way, on return from an exception, the processor returns to the
state from before the exception. That includes endianness settings,
execution state, etc.

In our project, the structure of PSRs is defined in terms of C bitfield
structs in src/arm/PL1/kernel/psr.h.

* Ramfs

A simple RAM filesystem has been introduced to avoid having to embed
too many separate files in the kernel in the future.

The RAM filesystem is created on the development machine and then
embedded into the kernel. The kernel can then parse the ramfs and
access the files in it.

The ramfs contains a mapping from a file's name to its size and
contents. Directories, file permissions, etc., as well as writing to
the filesystem, are not supported.

Currently the kernel uses it to access the code of the PL0 test
program, which it then copies to the appropriate memory location. If
more user mode programs are written later, they can all be added to the
ramfs to let the kernel access them easily.

** Specification

When the ramfs is accessed in memory, it MUST be aligned to a multiple
of 4.

The filesystem itself consists of blocks of data, each containing one
file. Blocks of data in the ramfs come one after another, with the
requirement that each block starts at a 4-aligned offset/address. If a
block doesn't end at a 4-aligned address, there shall be up to 3 null
bytes of padding after it, so that the next block is properly aligned.

Each block starts with a C (null-terminated) string with the name of
the file it contains. At the first 4-aligned offset after the string,
the file size is stored on 4 bytes in little endian. Null bytes are
used for padding between the file name and the file size if necessary.
Immediately after the file size reside the file contents, which take
exactly the number of bytes specified in the file size.

As is obvious from the specification, files bigger than 4GB are not
supported, which is not a problem in the case of this project.
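
The block layout can be illustrated with a self-contained C sketch of
both sides of the format: appending one block to a buffer (roughly what
makefs does for each file) and looking a file up by name (roughly what
find\_file does). It is not the project's code: ramfs\_append(),
ramfs\_find() and struct ramfs\_entry are hypothetical names, and since
the specification does not say how the end of the ramfs is marked, the
sketch takes the total image length as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to the next multiple of 4. */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Hypothetical result type: pointers into the ramfs image itself. */
struct ramfs_entry {
    const char    *name;
    const uint8_t *contents;
    uint32_t       size;
};

/* Append one block at 4-aligned offset `off` of `buf`, which the caller
 * has zero-filled and made large enough (so padding bytes are null).
 * Returns the 4-aligned offset at which the next block starts. */
static size_t ramfs_append(uint8_t *buf, size_t off, const char *name,
                           const uint8_t *data, uint32_t size)
{
    size_t name_len = strlen(name) + 1;          /* include the '\0' */

    assert(off % 4 == 0);
    memcpy(buf + off, name, name_len);
    off = align4(off + name_len);                /* padding before size */

    for (int i = 0; i < 4; i++)                  /* size, little endian */
        buf[off + i] = (uint8_t)(size >> (8 * i));
    off += 4;

    memcpy(buf + off, data, size);
    return align4(off + size);                   /* up to 3 bytes of padding */
}

/* Walk the blocks of `fs` (len bytes) looking for `name`. Returns 0 and
 * fills *out on success, -1 if the file is missing or a block would run
 * past the end of the image. */
static int ramfs_find(const uint8_t *fs, size_t len, const char *name,
                      struct ramfs_entry *out)
{
    size_t off = 0;

    while (off < len) {
        const char *entry_name = (const char *)(fs + off);
        const uint8_t *nul = memchr(entry_name, '\0', len - off);
        if (nul == NULL)
            return -1;

        size_t size_off = align4((size_t)(nul - fs) + 1);
        if (size_off + 4 > len)
            return -1;

        uint32_t size = (uint32_t)fs[size_off]
                      | (uint32_t)fs[size_off + 1] << 8
                      | (uint32_t)fs[size_off + 2] << 16
                      | (uint32_t)fs[size_off + 3] << 24;
        size_t data_off = size_off + 4;
        if (size > len - data_off)
            return -1;

        if (strcmp(entry_name, name) == 0) {
            out->name = entry_name;
            out->contents = fs + data_off;
            out->size = size;
            return 0;
        }
        off = align4(data_off + size);           /* skip to next block */
    }
    return -1;
}
```

For example, a file named "a.txt" with 3 bytes of contents occupies
offsets 0 to 15: the name and its terminator take 6 bytes and are
padded to 8, the size takes offsets 8 to 11, and the contents take 12
to 14, followed by 1 byte of padding.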
+
+** Implementations
+
+The ramfs is created by the makefs program (src/host/makefs.c). The
+program accepts file names as command line arguments, creates a ramfs
+containing all those files and writes it to stdout. As makefs is a very
+simple tool (just as our ramfs is a simple format), it stores files in
+the ramfs under the names it got on the command line. No stripping or
+normalizing of paths is performed. In case of errors (e.g. I/O errors),
+makefs prints information to stderr and exits.
+
+The ramfs is parsed/read by a kernel driver
+(src/arm/PL1/kernel/ramfs.c). The driver's function find\_file looks up
+a file in the ramfs by name and returns, through a structure, the file
+size and pointers to the file name string and the file contents.
+
+As the ramfs is embedded in the kernel image, it is easily accessible to
+kernel code. The alignment of the ramfs to a multiple of 4 is ensured by
+the kernel's linker script (src/arm/PL1/kernel/kernel\_stage2.ld).
+
+* Exceptions
+
+Whenever some illegal operation (an attempt to execute an undefined
+instruction, an attempt to access memory with insufficient permissions,
+etc.) happens or some peripheral device signals to the ARM core that
+something important happened, an exception occurs. An exception pauses
+normal execution and passes control to (a specific part of) the
+operating system. Upon an exception, several things happen:
+
+1. The processor mode changes.
+2. CPSR gets saved into the new mode's [[./PSRs-explained.txt][SPSR]].
+3. The pc (incremented by some value) is saved into the new mode's lr.
+4. Execution jumps to the entry in the exception vector table specific
+   to the exception.
+
+Each exception type is taken to its specific mode. The types and their
+modes are:
+
+1. Reset and supervisor mode.
+2. Undefined instruction and undefined mode.
+3. Supervisor call and supervisor mode.
+4. Prefetch abort and abort mode.
+5. Data abort and abort mode.
+6. 
Hypervisor trap and hypervisor mode (not used normally, only with
+   the Virtualization Extensions).
+7. IRQ and IRQ mode.
+8. FIQ and FIQ mode.
+
+The new value of the pc (the address to which the exception "jumps") is
+the address of the n-th instruction from the exception base address,
+i.e. base + 4 * (n - 1). Under the simplest settings, the base is 0x0
+(the bottom of the virtual address space). n depends on the exception
+type:
+
+1. reset
+2. undefined instruction
+3. supervisor call
+4. prefetch abort
+5. data abort
+6. hypervisor trap (not used here)
+7. IRQ
+8. FIQ
+
+Those 8 instructions constitute the exception vector table. As the
+instructions follow one another, with room for only one instruction per
+exception, each of them should be a branch to some exception-handling
+routine. By contrast, on many other architectures the exception vector
+table holds raw addresses of the handlers instead of instructions.
+
+The exception base address can be changed from 0x0 to some other value
+by manipulating the contents of the SCTLR and VBAR coprocessor registers
+(setting SCTLR.V selects the "high vectors" at 0xFFFF0000; otherwise,
+the base is taken from VBAR).
+
+On exception entry, the registers r0-r12 contain values used by the code
+that was executing before. In order for the exception handler to perform
+some action and then return to that code, those registers have to be
+preserved in memory (at least the ones the handler modifies). Some
+compilers can automatically generate an appropriate prologue and
+epilogue for handler functions that preserve the right registers (we're
+not using this feature in our project).
+
+Having the old CPSR in SPSR and the old pc in lr is helpful when, after
+handling the exception, the handler needs to return to the code that was
+executing before. There are 2 special instructions, subs and ldm \^
+(load multiple with a caret \^), that, when used to change the pc (and
+therefore perform a jump), cause the SPSR to be copied into CPSR. As bits
+of CPSR determine the current execution mode, this changes the mode
+back to the one from before the exception. In short, subs and ldm \^ are
+the instructions to use to return from exceptions.
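The vector layout can be made concrete with a small host-side C sketch. The names are hypothetical and this is not code from this project. The sketch computes three things:
- where each vector lives for a given base;
- which mode the exception is taken to, as the CPSR M[4:0] value;
- how an ARM-state b instruction filling one vector slot is encoded.

It assumes the standard 4-byte encoding of B with the AL condition (0xEA prefix) and a word-aligned target within +-32 MB. HIGH\_VECTORS\_BASE is the base used when SCTLR.V selects the high vectors.

```c
#include <stdint.h>

/* CPSR.M[4:0] encodings of the ARMv7-A processor modes. */
enum cpsr_mode {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_HYP = 0x1A, MODE_UND = 0x1B, MODE_SYS = 0x1F
};

/* Exception types in vector-table order: entry n of the list above has
 * index n - 1. */
enum exception_type {
    EXC_RESET, EXC_UNDEF, EXC_SVC, EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT, EXC_HYP_TRAP, EXC_IRQ, EXC_FIQ
};

/* Mode each exception is taken to (same order as enum exception_type). */
static const enum cpsr_mode target_mode[] = {
    MODE_SVC, MODE_UND, MODE_SVC, MODE_ABT,
    MODE_ABT, MODE_HYP, MODE_IRQ, MODE_FIQ
};

/* Base used when SCTLR.V is set ("high vectors"). */
#define HIGH_VECTORS_BASE 0xFFFF0000u

/* Every vector-table entry is a single 4-byte ARM instruction. */
uint32_t vector_address(uint32_t base, enum exception_type t)
{
    return base + 4u * (uint32_t)t;
}

enum cpsr_mode exception_mode(enum exception_type t)
{
    return target_mode[t];
}

/* Encode an ARM-state "b target" placed at address 'from' (condition AL).
 * The CPU adds 8 to the branch's own address (pipeline offset), and the
 * 24-bit field holds the word offset. Unsigned subtraction plus the mask
 * yields the correct two's-complement field for backward branches too.
 * Assumes target is word-aligned and within +-32 MB; this is not checked. */
uint32_t encode_branch(uint32_t from, uint32_t target)
{
    return 0xEA000000u | (((target - (from + 8u)) >> 2) & 0x00FFFFFFu);
}
```

Because every slot holds exactly one instruction, a slot can contain a single branch to its handler. A branch to itself (0xEAFFFFFE) simply hangs the core, which can serve as a placeholder for vectors that are not handled.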
+
+As noted earlier, upon exception entry an incremented value of the pc is
+stored in lr. How much it is incremented by depends on the exception
+type and the execution state. For example, taking the undefined
+instruction exception from Thumb state places in undef's lr the
+problematic instruction's address + 2, while taking this exception from
+ARM state places in undef's lr that instruction's address + 4 (see the
+full table in paragraph B1.8.3 of [[https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf][ARMv7-ar\_arm]]).
+
+It's worth noting that, while our
+[[file:src/arm/PL1/kernel/interrupt_vector.S][implementation of exception handlers]] also sets the stack pointer (sp) upon each
+exception entry, a kernel could be written that doesn't do this, as each
+mode that exceptions are taken to has its own sp.
+
+* IRQ
+
+Two of the possible exceptions in ARM are IRQ (Interrupt Request) and
+FIQ (Fast Interrupt Request). They can be caused by external sources,
+such as peripheral devices, and they can be used to inform the kernel
+that some event happened.
+
+Interrupts offer an efficient way of interacting with peripheral devices.
+For example, code can poll UART memory-mapped registers in a loop to
+see whether transmitting/receiving of a character finished. However,
+this makes the processor needlessly execute the loop and makes it
+impossible or difficult to perform other tasks at the same time.
+An interrupt can be used instead of polling to "notify" the kernel that
+something it was waiting for just happened. While waiting for an
+interrupt, the processor can be halted (e.g. with the wfi instruction),
+which helps save power, or it can perform other actions without wasting
+processor cycles in a loop.
+
+An interrupt that is normally signalled as IRQ can be made into FIQ by
+means that depend on the particular ARM system. FIQ is meant to be
+handled faster; for example, FIQ mode has its own banked copies of
+r8-r12, so its handler doesn't have to back up those registers. 
This project only uses IRQ.
+
+Some peripheral devices can be configured (through their memory-mapped
+registers) to generate an interrupt under certain conditions (e.g. the
+UART can generate an interrupt when its queue of received characters
+fills). The interrupt can then be either masked or unmasked (sometimes
+in more than one peripheral register). If interrupts are enabled in CPSR
+and a peripheral device tries to generate one that is not masked, an IRQ
+(or FIQ) exception occurs (which causes interrupts to be temporarily
+masked in CPSR). The code can usually check whether an interrupt of a
+given kind from a given device is *pending* by looking at the
+appropriate bit of the appropriate peripheral register (MMIO). As long
+as an interrupt is pending, re-enabling interrupts (for example via
+return from the IRQ handler) shall cause the exception to occur again.
+Removing the source of the interrupt (e.g. removing characters from a
+UART FIFO that filled up) usually doesn't stop the interrupt from
+pending, in which case a pending bit has to be cleared, usually by
+writing to the appropriate peripheral register (MMIO).
+
+IRQs and FIQs can be configured as vectored: the processor then, upon an
+interrupt, jumps to a different location depending on which interrupt
+occurred, instead of jumping to the standard IRQ/FIQ vector. This can be
+used to speed up interrupt handling. Our simple project does not,
+however, use this feature.
+
+Currently, IRQs from two sources are used:
+[[https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf][ARM timer IRQ]] and UART IRQs. The kernel makes sure that the timer IRQ
+only occurs when the processor is in user mode. The IRQ handler does not
+return in this case; it calls the scheduler. The kernel makes sure that
+the UART IRQ only occurs when a process is blocked waiting for a UART I/O
+operation. 
The interrupt handler, when called, checks what type of UART
+action happened and tries (by calling the appropriate function from
+scheduler.c) to handle that action and, possibly, to unblock the waiting
+process. A UART IRQ might occur while another process is executing (not
+possible now, with only one process, but possible once more processes
+are added to the project), in which case the handler returns, or while
+the kernel is explicitly waiting for interrupts (because all processes
+are blocked), in which case it calls schedule() instead of returning.
+
+* Processor modes
+
+An ARMv7-A core can be executing in one of several modes (not to be
+confused with instruction set states or endianness execution states).
+Those are:
+
+1. User
+2. FIQ
+3. IRQ
+4. Supervisor
+5. Abort
+6. Undefined
+7. System
+
+In fact, there are more modes if the processor implements some
+extensions (Monitor mode with the Security Extensions, Hyp mode with the
+Virtualization Extensions), but this is irrelevant here.
+
+The current processor mode is encoded in the lowest five bits of the
+CPSR register (the M[4:0] field).
+
+The processor can operate at one of 2 privilege levels (although, again,
+extensions exist that add more levels):
+
+1. PL0 - privilege level 0
+2. PL1 - privilege level 1
+
+Processor modes have their assigned privilege levels. User mode has
+privilege level 0 and all other modes have privilege level 1. Code
+executing in one of the privileged modes is allowed to do more than user
+mode code, e.g. read and write some of the coprocessor registers,
+execute some privileged instructions (e.g. mrs and msr when used to
+reference CPSR or other modes' registers), access privileged memory and
+change the mode (without causing an exception). Attempts to perform
+those actions in user mode result either in undefined (within some
+limits) behaviour or in an exception (depending on the action).
+
+Application programs usually run in user mode. The other modes are
+usually used by the operating system's kernel. 
The lack of
+privileges in user mode allows PL1 code to control the execution of PL0
+code.
+
+While code executing in PL1 can freely change the mode by either writing
+to CPSR or executing the cps instruction (except switching from system
+to user mode, which produces undefined behaviour), user mode can only be
+exited by means of an exception.
+
+Some ARM core registers (e.g. r0-r7) are shared between modes, while
+others are not. In the latter case, separate modes have their private
+(banked) copies of those registers. For example, lr and sp in supervisor
+mode are different from lr and sp in user mode. For full information
+about shared and banked registers, see paragraph B9.2.1 in the
+[[https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf][ARMv7-A manual]].
+The most important points are that user mode and system mode share all
+registers with each other and don't have their own SPSR (which is used
+for returning from exceptions, and exceptions are never taken to those 2
+modes), while all other modes have their own SPSR, sp and lr.
+
+The reason for having multiple copies of the same register in different
+modes is that it simplifies writing exception handlers. For example,
+supervisor mode code can safely use sp and lr without destroying the
+contents of user mode's sp and lr.
+
+The large number of PL1 modes is supposed to aid in handling of
+exceptions. Each kind of exception is taken to its specific mode.
+
+Supervisor mode, in addition to being the mode supervisor calls are
+taken to, is the mode the processor is in when the kernel boots.
+
+System mode, which uses the same registers as user mode, is said to have
+been added to the ARM architecture to ease accessing the unprivileged
+registers. For example, setting user mode's sp from supervisor mode can
+be done by switching to system mode, setting the sp and switching back
+to supervisor mode. 
Other modes' registers can alternatively be accessed
+with the mrs and msr assembly instructions (but not from user mode).
+
+Despite the name, system mode doesn't have to be the mode used most
+often by the operating system's kernel. In fact, the prohibition of
+direct switching from system mode to user mode would make extensive use
+of system mode impractical. This project, for example, uses supervisor
+mode for most of the privileged tasks.
+
+* Process management
+
+An operating system has to manage user processes. Our system only has
+one process right now, but the usual actions, such as context saving
+and context restoring, are implemented anyway. The following few
+paragraphs describe what process management looks like in operating
+systems in general.
+
+A process might return control to the system by executing the svc
+(earlier called swi) instruction. The system would then perform some
+action on behalf of the process and either return from the supervisor
+call exception or attempt to schedule another process to run, in which
+case the context of the old process would need to be saved for later and
+the context of the new process would need to be restored.
+
+A process has data in memory (such as its stack and code) as well as
+data in registers (r0-r15, CPSR). Together they constitute the process'
+context. From the process' perspective, its context should not change
+unexpectedly, so when control is taken away from user mode code (via an
+exception) and later (possibly after execution of some other processes)
+given back, this should be transparent to the process (except when the
+kernel does something for the process in response to a supervisor
+call). In particular, the contents of core registers should be the same
+as before. For this to be achievable, the operating system has to back
+up the process' registers somewhere in memory and later restore them
+from that memory.
+
+The operating system kernel maintains a queue of processes waiting for
+execution. 
When a process blocks (for example by waiting for I/O), it is
+removed from the queue. If a process unblocks (for example because I/O
+completed), it is added back to the queue. Some systems complicate this,
+for example by having more queues, but discussing those variations is
+out of the scope of this documentation. When the processor is free, one
+of the processes from the queue (determined by some scheduling algorithm
+implemented in the kernel) gets chosen and run on the processor.
+
+As a process might never make a supervisor call, it could occupy the
+processor forever. To remedy this, the kernel can use timer interrupts
+to interrupt the execution of a process after some time. The process
+would then have its context saved and go to the end of the queue, and
+another process would be scheduled to run.
+
+Other exceptions might occur while a process is running. Depending on
+the kernel design, the handler of an exception (such as IRQ) might
+return to the process or cause another one to be scheduled.
+
+If at some point all processes are blocked, the kernel can wait for an
+interrupt to happen, which could possibly unblock some process (e.g.
+because I/O completed).
+
+While not mentioned earlier, switching between processes' contexts
+involves not only saving and restoring registers, but also changing the
+translation table entries to properly map the memory regions used by the
+current process.
+
+In our project, process management is implemented in
+src/arm/PL1/kernel/scheduler.c.
+
+The "queue" contains the data of the only process (variables PL0\_regs[],
+PL0\_sp, PL0\_lr and PL0\_PSR).
+
+** Scheduler functions
+
+The function setup\_scheduler\_structures() is supposed to be called
+before the scheduler is used in any way.
+
+The function schedule\_new() creates and runs a new process.
+
+The function schedule\_wait\_for\_output() causes the current process to
+have its context saved and get blocked waiting for the UART to send data. 
+
+It is called from the supervisor call handler. The function
+schedule\_wait\_for\_input() is similar, but the process waits for the
+UART to receive data.
+
+The function schedule() attempts to select a process (currently the
+only one) and run it. If the process cannot be run, schedule() waits for
+an interrupt that could unblock the process. The interrupt handler does
+not return in this case, but rather calls schedule() again.
+
+The function scheduler\_try\_output() is supposed to be called by the IRQ
+handler when the UART is ready to transmit more data. It can cause a
+process to get unblocked. scheduler\_try\_input() is similar, but
+relates to receiving data.
+
+The following are ensured in our design:
+
+1. When the processor is in user mode, interrupts are enabled.
+2. When the processor is in system mode, interrupts are disabled, except
+   when explicitly waiting for an interrupt while the process is blocked.
+3. When a process is waiting for input/output, the corresponding IRQ is
+   unmasked. Otherwise, that IRQ is masked.
+4. If an interrupt from the UART occurs during execution of user mode
+   code (not possible here, as we only have one process, but it shall
+   become possible when proper processes are implemented), the handler
+   shall return. If that interrupt occurs during execution of PL1 code,
+   it means it occurred in the scheduler, which was explicitly waiting
+   for it, and the handler calls schedule() again instead of returning.
+5. The timer interrupt is unmasked and set to come whenever a process
+   gets scheduled to run. The timer interrupt is disabled in PL1 (when
+   the scheduler is waiting for an interrupt, only the UART one can come).
+6. A supervisor call requesting a UART operation that cannot be
+   completed immediately causes the process to block.
+
+* Linking
+
+[[https://en.wikipedia.org/wiki/Linker_%28computing%29][Linking]] is the process of creating an executable, library or another
+object file out of object files.
+During linking, values previously unknown to the compiler (e.g. 
what
+the addresses of external functions/variables will be, or from what
+address the code will be executing) might be injected into the code.
+
+A linker script is, among other things, used to tell the linker where in
+memory the specific parts of the executable should lie.
+
+In a hosted environment (when building a program to run under a
+full-featured operating system, like GNU/Linux), a linker script is
+usually provided by the toolchain and used if no other script is
+provided. In a bare-metal project, the developer usually has to write
+their own linker script, in which they specify the binary image's *load
+address* and section layout.
+
+Contents of an object code file or executable (our .o or .elf) are
+grouped into sections. Sections have names. Common names are .text
+(usually contains code), .data (usually contains statically-allocated
+variables initialized to non-zero values), .bss (usually used to reserve
+memory for statically-allocated variables initialized to zero) and
+.rodata (usually contains statically-allocated data that is not going
+to be modified).
+
+In a hosted environment, when an executable (say, of elf format) is
+executed, the contents of its sections are usually placed in different
+memory segments with different access permissions, so that, for
+example, code is not writable and variable contents are not executable.
+This helps reduce the risk of buffer overflow exploits.
+
+In a bare-metal environment like ours, we don't execute an elf file
+directly (except in qemu, which is the less preferred approach anyway),
+but rather a raw binary image created from an elf file. Still, the
+notion of sections is used along the way.
+
+During linking, one or more object code files are combined into one file
+(in our case an executable). Section contents of input files land in
+some sections of the output file, in a way defined in the linker script. 
+
+In a hosted environment, a linker script would likely put the contents
+of input .text sections in a .text section, the contents of input .data
+sections in a .data section, etc. The developer can, however, use
+sections with different names (although some linkers might behave
+unexpectedly) and assign their contents in their preferred way using a
+linker script.
+
+In a linker script it is possible to mark a section as NOLOAD (usually
+done for .bss), which, in our case, causes that section not to be
+included in the binary image later created with objcopy.
+
+It is also possible to treat same-named input sections differently
+depending on what file they came from and even to use wildcards when
+specifying file names.
+
+Variables (symbols) can be defined in a linker script and then
+referenced from C code.
+
+Defining the alignment of specific parts of the resulting image is also
+easily achievable.
+
+We made use of all those possibilities in our scripts.
+
+In src/arm/PL1/kernel/kernel\_stage2.ld the physical memory layout of
+the kernel is defined. Symbols defined there, such as \_stack\_end, are
+referenced in the C header src/arm/PL1/kernel/memory.h.
+
+While src/arm/PL1/kernel/kernel.ld and src/arm/PL1/loader/loader.ld
+define a starting address, it is irrelevant, as the position-independent
+assembly code of the first stages of the loader and kernel does not
+depend on that address.
+
+At the beginning of this project, we had very little understanding of
+linker script syntax.
+[[https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION][This article]] proved useful and allowed us to learn the required parts in a
+short time. As discussing the entire syntax of linker scripts is beyond
+the scope of this documentation, we refer the reader to that resource.
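The project's own symbols, such as \_stack\_end, only exist when linking with its scripts. This host-side sketch therefore uses the analogous symbols etext, edata and end, which the default linker setup on Linux provides (see the end(3) manual page). It assumes a Linux host with GNU ld or a compatible linker. It shows two points from the paragraphs above:
- linker-defined symbols are declared in C as extern objects whose address, not value, is meaningful;
- a variable with a non-zero initializer lands in .data, while a zero-initialized one lands in .bss.

How memory.h actually declares \_stack\_end is not reproduced here.

```c
#include <stdint.h>

/* Symbols defined by the linker for hosted Linux programs (see end(3)).
 * Only their addresses are meaningful, so they are never read. */
extern char etext;  /* first address past the program text (.text)      */
extern char edata;  /* first address past the initialized data (.data)  */
extern char end;    /* first address past the zero-initialized data (.bss) */

int initialized_var = 42;   /* non-zero initializer: placed in .data */
int zeroed_var;             /* zero-initialized: placed in .bss      */

/* Returns 1 if p lies in the half-open range [lo, hi). Addresses are
 * compared as integers, since they belong to different objects. */
static int in_range(const void *p, const void *lo, const void *hi)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)lo && a < (uintptr_t)hi;
}

/* .data follows the text and ends at edata. */
int data_in_data_section(void)
{
    return in_range(&initialized_var, &etext, &edata);
}

/* .bss follows .data and ends at end. */
int bss_in_bss_section(void)
{
    return in_range(&zeroed_var, &edata, &end);
}
```

In a linker script, such a symbol is typically created by an assignment like \_stack\_end = .; placed at the desired spot inside the SECTIONS command.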
+
+* Miscellaneous topics
+
+** Supervisor calls
+
+A supervisor call happens when the svc (previously called swi)
+instruction gets executed; the supervisor call exception is then taken.
+Supervisor calls are the standard way for a user process to ask the
+kernel for something. As user code might request many different things,
+the kernel must somehow know which one was requested. The svc
+instruction takes one immediate operand. The supervisor call exception
+handler can check at what address execution was, read the svc
+instruction from there and inspect its bytes (in ARM state, the svc
+instruction lies at lr - 4 and its low 24 bits hold the immediate).
+This way, by executing svc with different immediate values, user mode
+code can request different things from the kernel; the value in svc
+encodes the request's type.
+
+To save time and for the sake of simplicity, we don't make use of
+immediates in svc and instead encode the call's type in r0. In our
+implementation we decided that a supervisor call preserves and clobbers
+the same registers as a function call and returns values through r0,
+just like a function call. This enables us to perform a supervisor call
+simply by calling a function defined in src/arm/PL0/svc.S. Calls from C
+are performed in src/arm/PL0/PL0\_utils.c and request type encodings are
+defined in src/arm/common/svc\_interface.h (they must be known to both
+user mode code and handler code).
+
+** Utilities
+
+We've collected useful utilities (e.g. memcpy(), strlen()) in
+src/arm/common/strings.c. Those do not depend on the environment and can
+be used by user mode code, kernel code and even bootloader code.
+Functions used for I/O (like puts()) are also defined in a common way for
+privileged and unprivileged code. They do, however, rely on the
+existence of putchar() and getchar(). In PL0 code
+(src/arm/PL0/PL0\_utils.c), putchar() and getchar() are defined to
+perform a supervisor call that does that operation. In the PL1 code,
+they are defined as operations on the UART.
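Here is a host-side sketch of the convention described above, with the request type in r0 and the result returned in r0. All names are hypothetical:
- SVC\_PUTCHAR and SVC\_GETCHAR stand in for the real encodings in svc\_interface.h;
- svc\_dispatch() plays the kernel handler;
- do\_svc() stands in for the svc.S stub, which on the Pi would execute svc instead of calling the handler directly;
- pl0\_puts() shows how an I/O utility can be written purely in terms of putchar().

The UART is replaced by an in-memory buffer so the sketch runs on a PC.

```c
#include <stdint.h>

/* Hypothetical request encodings, for illustration only; the project's
 * real values are defined in src/arm/common/svc_interface.h. */
enum svc_type { SVC_PUTCHAR = 1, SVC_GETCHAR = 2 };

/* Host stand-ins for the PL1 UART routines: output goes to a small
 * buffer, input comes from a fixed string. */
static char out_buf[32];
static unsigned out_len;
static const char *in_ptr = "x";

static void uart_putchar(char c)
{
    if (out_len < sizeof out_buf - 1)
        out_buf[out_len++] = c;
}

static int uart_getchar(void)
{
    return *in_ptr ? *in_ptr++ : -1;   /* -1 once input is exhausted */
}

/* Kernel side: the handler receives the caller's r0-r3, with r0 holding
 * the request type; the result goes back to the caller's r0, exactly
 * like a function's return value. */
uint32_t svc_dispatch(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
    (void)r2;
    (void)r3;
    switch (r0) {
    case SVC_PUTCHAR:
        uart_putchar((char)r1);
        return 0;
    case SVC_GETCHAR:
        return (uint32_t)uart_getchar();
    default:
        return UINT32_MAX;             /* unknown request type */
    }
}

/* User side: on the Pi this would be the assembly stub, which receives
 * (type, arg) in r0/r1 per the normal calling convention and executes
 * svc. On the host it simply calls the dispatcher. */
static uint32_t do_svc(uint32_t type, uint32_t arg)
{
    return svc_dispatch(type, arg, 0, 0);
}

int pl0_putchar(int c) { do_svc(SVC_PUTCHAR, (uint32_t)c); return c; }
int pl0_getchar(void)  { return (int)do_svc(SVC_GETCHAR, 0); }

/* Environment-independent helper written only in terms of putchar. */
void pl0_puts(const char *s)
{
    while (*s)
        pl0_putchar(*s++);
    pl0_putchar('\n');
}
```

A real handler also has to save the caller's registers and return with subs or ldm \^. That part is assembly and is not modelled here.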
+
+** Timers
+
+Several timers are available on the RaspberryPi:
+
+1. System Timer (with 4 interrupt lines, regarded as the most reliable,
+   as it is not derived from the system clock and hence is not affected
+   by processor power mode changes),
+   [[https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf][BCM2837 ARM Peripherals, Chapter 12]]
+2. ARM side Timer (based on an ARM AP804),
+   [[https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf][BCM2837 ARM Peripherals, Chapter 14]]
+3. ARM Generic Timer (optional extension to ARMv7-A and ARMv7-R,
+   configured through coprocessor registers)
+
+At first, we attempted to use the System Timer, some code for which is
+still present in src/arm/PL1/kernel/bcmclock.h. However, under
+rpi-open-firmware, the interrupts from that timer are not routed to any
+ARM core, but rather to the GPU. Because of that, we ended up using the
+ARM side Timer (programmed in src/arm/PL1/kernel/armclock.h). The ARM
+side Timer based on the ARM AP804 is currently only available on real
+hardware and not in qemu. Programming the ARM Generic Timer (listed in
+TODOs) could enable the use of timer interrupts in qemu.
+
+** UARTs
+
+src/arm/PL1/PL1\_common/uart.c implements putchar() and getchar() in
+terms of the UART. Those implementations are blocking: they poll UART
+peripheral registers in a loop, checking whether the device is ready to
+perform the operation. They are, however, accompanied by the functions
+getchar\_non\_blocking() and putchar\_non\_blocking(), which check *once*
+whether the device is ready and only perform the operation if it is.
+Otherwise, they return an error value. They are meant to be used
+together with interrupts. With an interrupt-driven UART we avoid waiting
+in a loop; instead, an IRQ comes when the desired UART operation
+completes. Code that wants to write to or read from the UART does,
+however, need to tie its operation to the IRQ handler and the scheduler. 
The blocking versions should not
+be used once UART interrupts are enabled, nor in exception handlers,
+which should always run quickly. However, doing so does not break the
+UART and might be justified for debugging purposes (like the error()
+function defined in src/arm/common/io.c and used throughout the kernel
+code).
+
+There are 2 UARTs in the RaspberryPi: one mini UART (also called UART 1)
+and one PL011 UART (also called UART 0). The PL011 UART is used
+exclusively in this project. The hardware allows some degree of
+configuration of which pins each UART is routed to (via so-called
+alternate functions). In our project it is assumed that UART 0's TX and
+RX are routed to GPIO pins 14 & 15 by the firmware, which is true for
+rpi-open-firmware. With stock Broadcom firmware, either changing the
+default configuration (config.txt) or selecting the alternate functions
+as part of UART initialization (present in the TODOs list) might be
+required.
+
+Before the UART can be used, GPIO pins 14 and 15 should have pull-up/down
+disabled. This is done as part of UART initialization in uart\_init() in
+src/arm/PL1/PL1\_common/uart.c. The UART must also be disabled while it
+is being configured, which uart\_init() also takes care of.
+The PL011 is thoroughly described in
+[[https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf][BCM2837 ARM Peripherals]] as well as [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183f/DDI0183.pdf][PrimeCell UART (PL011) Technical Reference Manual]].
+
+* Afterword
+
+This project has been done as part of the Embedded Systems course at
+[[https://www.agh.edu.pl/en/][AGH University of Science and Technology]]. The goal of the project was to investigate and program the
+MMU (Memory Management Unit) of the RaspberryPi, but it ended up forming
+the basis of a small operating system. 
+
+[[https://www.raspberrypi.org/products/raspberry-pi-3-model-b/][RaspberryPi 3 model B]] was the hardware platform used, with stock firmware replaced
+with
+[[https://github.com/christinaa/rpi-open-firmware][rpi-open-firmware]].
+An emulator, [[https://www.qemu.org/download/][qemu]] (version 2.9.1),
+capable of emulating an older RaspberryPi 2, was also used extensively.
+
+The project was written in the C programming language and ARM assembly.
+Knowledge of C is required to understand the code. Knowledge of ARM
+assembly is useful, but it can be learned *while* working with the code.
+Still, the reader should at least have an idea of what assembly language
+is and how it is used.
+
+This documentation is intended to provide information on bare-metal
+programming on the RaspberryPi and ARM in general, as well as a
+description of our solutions and implementations. There is a lot of
+information available on the topic in online sources, yet it is not always in an
+easy-to-understand form, and the number of different options described in
+manuals might be overwhelming for people new to the topic. That's why we
+attempted to describe our work in a way that newcomers to bare-metal
+programming will find useful. External resources we used are listed at the end of the documentation.
+
+It is planned that future students of the Embedded Systems course will
+have the option to continue or reuse previous projects, such as this
+one. We hope this documentation will prove useful to our younger
+colleagues who happen to work with the codebase.
+
+In case of any bugs or questions, the authors can be contacted at kwojtus@protonmail.com.
+
+* Sources of Information
+ * wiki.osdev
+ * ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition (probably the most useful document of all)
+ * dwelch67
+ * http://www.simtec.co.uk/products/SWLINUX/files/booting\_article.html - very good description of atags
+ * BCM2835-ARM-Peripherals.pdf and https://elinux.org/BCM2835\_datasheet\_errata
+ * https://buildmedia.readthedocs.org/media/pdf/devicetree-specification/latest/devicetree-specification.pdf
+ * online ARM Compiler toolchain Assembler Reference
+ * Christina Brook's rpi-open-firmware
+ * http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183g/DDI0183G\_uart\_pl011\_r1p5\_trm.pdf
+ * GNU make documentation
+ * description of linker scripts: https://access.redhat.com/documentation/en-US/Red\_Hat\_Enterprise\_Linux/4/html/Using\_ld\_the\_GNU\_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION