From dbe4100b58c901685f223a241d90bd901ea59c68 Mon Sep 17 00:00:00 2001
From: Wojtek Kosior <kwojtus@protonmail.com>
Date: Tue, 21 Jan 2020 16:51:59 +0100
Subject: rewrite documentation in org

---
 README.org | 1321 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1321 insertions(+)
 create mode 100644 README.org

diff --git a/README.org b/README.org
new file mode 100644
index 0000000..4f6f7d8
--- /dev/null
+++ b/README.org
@@ -0,0 +1,1321 @@
+#+TITLE: RaspberryPi MMU example
+
+* Building the project
+** Dependencies
+1. Native GCC (+ binutils)
+2. ARM cross-compiler GCC (+ binutils) (arm-none-eabi works - others
+   might or might not)
+3. GNU Make
+4. rpi-open-firmware (for running on the Pi)
+5. GNU screen (for communicating with the kernel when running on the Pi)
+6. socat (for communicating with the bootloader when running on the Pi)
+7. Qemu ARM (for emulating the Pi)
+
+For building rpi-open-firmware one will need more tools (not listed
+here).
+
+The project has been tested only in Qemu emulating Pi 2 and on real Pi 3 model B.
+
+Running on Pis other than Pi 2 and Pi 3 is sure to require changing the definition in global.h (because peripheral base addresses differ between Pi versions) and might also require other modifications, not known at this time.
+
+Assuming make, gcc, arm-none-eabi-gcc and its binutils are in the PATH, the kernel can be built with:
+
+#+BEGIN_EXAMPLE
+    $ make kernel.img 
+#+END_EXAMPLE
+
+which is the same as:
+
+#+BEGIN_EXAMPLE
+    $ make
+#+END_EXAMPLE
+
+The bootloader can be built with:
+
+#+BEGIN_EXAMPLE
+    $ make loader.img
+#+END_EXAMPLE
+
+Both loader and kernel can then be found in build/.
+
+* Running
+** Running in Qemu
+To run the kernel (passed as elf file) in qemu:
+
+#+BEGIN_EXAMPLE
+    $ make qemu-elf
+#+END_EXAMPLE
+
+If you want to pass a binary image to qemu:
+
+#+BEGIN_EXAMPLE
+    $ make qemu-bin
+#+END_EXAMPLE
+
+To pass loader image to qemu and pipe kernel to it through emulated uart:
+
+#+BEGIN_EXAMPLE
+    $ make qemu-loader
+#+END_EXAMPLE
+
+With qemu-loader the kernel will run, but will be unable to receive any keyboard input.
+
+The timer used by this project is the ARM timer ("based on an ARM
+AP804", with registers mapped at 0x7E00B000 in the GPU address space).
+It's absent in emulated environment, so no timer interrupts can be
+witnessed in qemu.
+
+** Running on real hardware
+
+First, the rpi-open-firmware has to be built. Then, kernel.img (or
+loader.img) should be copied to the SD card (next to bootcode.bin) and renamed to
+zImage. Also, the .dtb file corresponding to the Pi model (actually, any .dtb
+would do, as it is not used right now) from the stock firmware files has to be put on the SD
+card and renamed to rpi.dtb. Finally, a cmdline.txt has to be present on the SD card
+(its content doesn't matter).
+
+Now, the RaspberryPi can be connected via UART to the development machine. GPIO on the Pi works
+with 3.3V, so one should make sure that the UART device on the other end
+also works with 3.3V. This is the pinout of the RaspberryPi 3 model B
+that has been used for testing so far:
+
+#+BEGIN_EXAMPLE
+    Top left of the board is here
+        |
+        V
+        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+  
+        | 2| 4| 6| 8|10|12|14|16|18|20|22|24|26|28|30|32|34|36|38|40|  
+        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+  
+        | 1| 3| 5| 7| 9|11|13|15|17|19|21|23|25|27|29|31|33|35|37|39|  
+        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+  
+#+END_EXAMPLE
+
+Under rpi-open-firmware (stock firmware might map UARTs differently):
+
+1. pin 6 is Ground
+2. pin 8 is TX
+3. pin 10 is RX
+
+Once UART is connected, the board can be powered on.
+
+It is assumed that a USB to UART adapter is used and that the system sees it as /dev/ttyUSB0.
+
+If one copied the kernel to the SD card, they can start communicating
+with the board by running:
+
+#+BEGIN_EXAMPLE
+    $ screen /dev/ttyUSB0 115200,cs8,-parenb,-cstopb,-hupcl
+#+END_EXAMPLE
+
+If one copied the loader, they can send it the kernel image and start
+communicating with the system by running:
+
+#+BEGIN_EXAMPLE
+    $ make run-on-rpi
+#+END_EXAMPLE
+
+To run again, one can replug the USB to UART adapter and the Pi's power supply (order
+matters!) and re-enter the command.
+
+Running under stock firmware has not been tested. In particular, the
+default configuration on RaspberryPi 3 seems to map a different UART (the
+so-called miniUART) than the one used by the kernel to pins 6, 8 and 10. This is supposed
+to be configurable through the use of overlays.
+
+* Makefile
+To maintain order, all files created by make (binaries, object
+files, natively executed helper programs, etc.) get placed in build/.
+
+Our project contains 2 Makefiles: one in its root directory and one in
+build/. The reason is that a Makefile most simply, elegantly and
+efficiently produces files in the directory where it itself resides.
+Producing files in a directory other than the Makefile's own requires
+that directory to be specified in many rules across the Makefile and
+generally complicates things. Also, a problem arises when trying to
+link objects from outside the current directory. If an object is
+referenced by name in a linker script (which is a frequent practice in our
+scripts) and is passed to gcc with a path, then it would also need to appear
+with that path in the linker script. Because of that, a Makefile is
+present in build/ that produces files into its own directory, and the
+Makefile in the project's root is used as a proxy to it: it
+calls make recursively in build/ with the same target it was called
+with. This arrangement keeps the rules short and easy to read.
+
+From now on only Makefile in build/ will be discussed.
+
+In the Makefile, variables with the names of certain tools and their
+command line flags are defined (using ?= assignment, which allows one to
+specify their own value of that variable on the command line). In case a
+cross-compiler with a different triple should be used, ARM\_BASE,
+normally set to arm-none-eabi, can be set to something like
+arm-linux-gnueabi or even /usr/local/bin/arm-none-eabi.
+
+All variables discussed below are defined using := assignment, which
+causes them to only be evaluated once instead of on every reference to
+them.
+
+Objects that should be linked together to create each of the .elf files
+are listed in their respective variables. For example, objects to be used for
+creating kernel\_stage2.elf are all listed in KERNEL\_STAGE2\_OBJECTS.
+When adding a new source file to the kernel, it is enough to add its
+respective .o file to that list to make it compile and link properly. No
+other Makefile modifications are needed. In a similar fashion, the
+RAMFS\_FILES variable specifies files that should be put in the ramfs
+image that will be embedded in the kernel. Adding another file only
+requires listing it there. However, if the file is to be found somewhere
+other than build/, it might be useful to use the vpath directive to tell
+make where to look for it.
+
+Variables dirs and dirs\_colon are defined to store the list of all
+directories within src/, separated with spaces and colons, respectively.
+dirs\_colon is used for the vpath directive, while dirs is used in
+ARM\_FLAGS to pass all the directories as include search paths to gcc.
+empty and space are helper variables - defining dirs\_colon could be
+achieved without them (but it's clearer this way).
+
+The vpath directive tells make to look for assembler sources, C sources
+and linker scripts in all direct and indirect subdirectories of src/
+(including itself). All other files shall be found/created in build/.
+
+** Targets
+
+The default target is the binary image of the kernel.
+
+The generic rule for compiling C sources uses cross-compiler or native
+compiler with appropriate flags depending on whether the source file is
+located somewhere under arm/ directory (which lies in src/) or anywhere
+else.
+
+The generic rules for making a stripped binary image out of elf file,
+for assembling an assembly file, for making an arbitrary file a linkable
+object and for linking objects are ARM-only.
+
+In C world it is possible to embed a file in an executable by using
+objcopy to create an object file from it and then linking that object
+file into the executable. In this project, at the current time, this is
+used only for embedding ramfs in the kernel (incbin is used for
+embedding kernel and loader second stages in their first stages).
+Generic rule for making a binary image into object file is present, in
+case it is needed somewhere else again.
+
+To link elf files, the generic rule is combined with a rule that
+specifies the elf's objects. Objects are listed in variables whenever
+more than one of them is needed.
+
+At this point in the Makefile, the dependence of objects created from
+assembly on files referenced in the assembly source via incbin is
+marked.
+
+Simple ram filesystem is created from files it should contain with the
+use of our own simple tool - makefs.
+
+Another 2 rules specify how native programs (for the machine we're
+working on) are to be linked.
+
+** Aliased Rules
+
+Rule qemu-elf runs the kernel in qemu emulating RaspberryPi 2 with
+256MiB of memory by passing the elf file of the kernel to the emulator.
+
+Rule qemu-bin does the same, but passes the binary image of the kernel
+to qemu.
+
+Rule qemu-loader does the same, but first passes the binary image of the
+bootloader to qemu and the actual kernel is piped to qemu's standard
+input, received by bootloader as uart data and run. This method
+currently makes it impossible to pass any keyboard input to kernel once
+it's running.
+
+Rule run-on-rpi pipes the kernel through uart, assuming it is available
+under /dev/ttyUSB0, and then opens a screen session on that interface.
+This allows for executing the kernel on the Pi connected through UART,
+provided that our bootloader is running on the board.
+
+Rule clean removes all the files generated in build/.
+
+Rules that don't generate files are marked as PHONY.
+
+
+* Project structure
+  Directory structure of the project:
+
+#+BEGIN_EXAMPLE
+    doc/
+    build/
+          Makefile
+    Makefile
+    src/
+        lib/
+            rs232/
+                  rs232.c
+                  rs232.h
+        host/
+             pipe_image.c
+             makefs.c
+        arm/
+            common/
+                   svc_interface.h
+                   strings.c
+                   io.h
+                   io.c
+                   strings.h
+            PL0/
+                PL0_utils.h
+                svc.S
+                PL0_utils.c
+                PL0_test.c
+                PL0_test.ld
+            PL1/
+                loader/
+                       loader_stage2.ld
+                       loader_stage2.c
+                       loader_stage1.S
+                       loader.ld
+                kernel/
+                       demo_functionality.c
+                       paging.h
+                       setup.c
+                       interrupts.h
+                       interrupt_vector.S
+                       kernel.ld
+                       scheduler.h
+                       atags.c
+                       translation_table_descriptors.h
+                       bcmclock.h
+                       ramfs.c
+                       kernel_stage1.S
+                       paging.c
+                       ramfs.h
+                       interrupts.c
+                       armclock.h
+                       atags.h
+                       kernel_stage2.ld
+                       cp_regs.h
+                       psr.h
+                       scheduler.c
+                       memory.h
+                       demo_functionality.h
+                PL1_common/
+                           global.h
+                           uart.h
+                           uart.c
+#+END_EXAMPLE
+
+** Most significant directories and files
+
+doc/ Contains documentation of the project.
+
+build/ Contains main Makefile of the project. All objects created during
+the build process are placed there.
+
+Makefile Proxies all calls to Makefile in build/.
+
+src/ Contains all sources of the project.
+
+src/host/ Contains sources of helper programs to be compiled using
+native GCC and run on the machine where development takes place.
+
+src/arm/ Contains sources to be compiled using ARM cross-compiler GCC
+and run on the RaspberryPi.
+
+src/arm/common Contains sources used in both: privileged mode and
+unprivileged mode.
+
+src/arm/PL0 Contains sources used exclusively in unprivileged, user-mode
+(PL0) program, as well as the program's linker script.
+
+src/arm/PL1 Contains sources used exclusively in privileged (PL1) mode.
+
+src/arm/PL1/loader Contains sources used exclusively in the bootloader,
+as well as linker scripts for stages 1 and 2 of this bootloader.
+
+src/arm/PL1/kernel Contains sources used exclusively in the kernel, as
+well as linker scripts for stages 1 and 2 of this kernel.
+
+src/arm/PL1/PL1\_common Contains sources used in both: kernel and
+bootloader.
+
+TODOs Contains what the name suggests, in plain text. It lists things
+that still can be implemented or improved, as well as tasks, that were
+once listed and have since been completed (in which case they're marked
+as done).
+
+* Boot Process
+When the RaspberryPi boots, it searches the first partition on the SD
+card (which should be formatted as FAT) for its firmware
+and configuration files, loads them and executes them. The firmware then
+searches for the kernel image file. The name of the file it looks for can
+be kernel.img, kernel7.img, kernel8.img (for 64-bit mode) or something
+else, depending on configuration and firmware used (rpi-open-firmware
+looks for zImage).
+
+The image is then copied to some address and jumped to on all cores.
+Address should be 0x8000 for 32-bit kernel, but in reality is 0x2000000
+in rpi-open-firmware and 0x10000 in qemu (version 2.9.1). 3 arguments
+are passed to the kernel: first (passed in r0) is 0; second (passed in
+r1) is machine type; third (passed in r2) is the address of FDT or ATAGS
+structure describing the system or 0 as default.
+
+Pis that support aarch64 can also boot directly into 64-bit mode. Then,
+the image gets loaded at 0x80000. We're not using 64-bit mode in this
+project.
+
+Qemu can be used to emulate RaspberryPi, in which case kernel image and
+memory size are provided to the emulator on the command line. Qemu can
+also load kernel in the form of an elf file, in which case its load
+address is determined based on information in the elf.
+
+Our kernel has been executed on qemu emulating RaspberryPi 2 as well as
+on real RaspberryPi 3 running rpi-open-firmware (although not every
+functionality works everywhere). To speed up running new images of the
+kernel on the board, we have written a simple bootloader, which
+can be run from the SD card instead of the actual kernel. It reads the
+kernel image from uart and executes it. The bootloader can also be used
+within qemu, but then no keyboard input can be passed
+to the kernel once it's running.
+
+Both bootloader and kernel are split into 2 stages.
+
+** Loader
+
+In case of the loader it is due to the fact that the actual kernel
+read by it from UART is supposed to be written at 0x8000. If the loader
+also ran from 0x8000 or a close address, it could possibly overwrite
+its own code while writing kernel to memory. To avoid this, the first
+stage of the loader first copies its second stage embedded in it to
+address 0x4000. Then, it jumps to that second stage, which reads kernel
+image from uart, writes it at 0x8000 and jumps to it. Arguments (r0, r1,
+r2) are preserved and passed to the kernel. Second stage of the
+bootloader is intended to be kept small enough to fit between 0x4000 and
+0x8000. Atags structure, if present, is guaranteed to end below 0x4000,
+so it should not get overwritten by loader's stage2.
+
+The loader protocol is simple: first, size of the kernel is sent through
+UART (4 bytes, little endian). Then, the actual kernel image. Our
+program pipe\_image is used to prepend kernel image with its size.
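+
+As a rough illustration of this framing, below is a minimal C sketch. It is
+not the code of pipe\_image: the function name frame\_kernel\_image and its
+parameters are hypothetical, chosen only for this example.
+
+#+BEGIN_SRC c
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Hypothetical helper (not part of the project): writes the 4-byte
+ * little-endian size header followed by the image itself into out.
+ * Returns the number of bytes written, or 0 if out is too small. */
+size_t frame_kernel_image(const uint8_t *image, uint32_t size,
+                          uint8_t *out, size_t out_capacity)
+{
+    if (out_capacity < 4 || out_capacity - 4 < size)
+        return 0;
+
+    /* Least significant byte goes first (little endian). */
+    out[0] = (uint8_t)(size & 0xFF);
+    out[1] = (uint8_t)((size >> 8) & 0xFF);
+    out[2] = (uint8_t)((size >> 16) & 0xFF);
+    out[3] = (uint8_t)((size >> 24) & 0xFF);
+
+    memcpy(out + 4, image, size);
+    return (size_t)size + 4;
+}
+#+END_SRC
+
+The receiving side would read the 4 header bytes first and only then
+know how many bytes of kernel image to expect.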
+
+** Kernel
+ In case of kernel, it is desired to have image run from 0x0,
+because that's where the interrupt vector table is under default
+settings. This is also achieved by splitting it into 2 stages.
+*** Stage 1
+ Stage 1 is loaded at some higher address. It has second stage
+image embedded in it. It copies it to 0x0 and jumps to it. What gets
+more complicated compared to loader, is the handling of ATAGS structure.
+Before copying stage 2 to 0x0, stage 1 first checks if atags is present
+and if so, it is copied to some location high enough, that it won't be
+overwritten by stage 2 image. Whenever the memory layout is modified, it
+should be checked, if there is a danger of ATAGS being overwritten by
+some kernel operations before it is used. In current setup, new location
+chosen for ATAGS is always below the memory later used as the stack and
+it might overlap memory later used for translation table, which is not a
+problem, since kernel only uses ATAGS before filling that table.
+
+When stage 1 of the kernel jumps to second stage, it passes modified
+arguments: first argument (r0) remains 0 if ATAGS was found and is
+otherwise set to 3 to indicate that ATAGS was not found. Second argument (r1) remains
+unchanged. Third argument (r2) is the current address of ATAGS (or
+remains unchanged if no ATAGS was found). If support for FDT is added in
+the future, it must also be done carefully, so that FDT doesn't get
+overwritten.
+*** Stage 2
+At the start of stage 2 of the kernel,
+there is the interrupt vector table. Its first entry is the reset
+vector, which is normally unused. In our case, when stage 1 jumps to
+0x0 (the first instruction of stage 2), it lands on that vector, which then
+calls the setup routine.
+
+*** Notes
+
+In both loader and the kernel, at the beginning of stage1 it is ensured,
+that only one ARM core is executing.
+
+It's worth noting, that in first stages the loop that copies the
+embedded second stage is intentionally situated after the blob in the
+image. This way, this loop will not overwrite itself with the data it is
+copying, since the stage 2 is always copied to some lower address. It
+copies to 0x0 in case of kernel and to 0x4000 in case of loader - we
+assume stage 1 won't be loaded below 0x4000.
+
+Qemu, stock RaspberryPi firmware and rpi-open-firmware all load image at
+different addresses. Although stock firmware is not used in this
+project, our loader loads kernel at 0x8000, where the stock firmware
+would. Because of that, it is desired, that image is able to run,
+regardless of where it was loaded at. This was realized by writing first
+stages of loader and kernel in careful, position-independent assembly.
+The starting address in corresponding linker scripts is irrelevant. The
+stage 2 blobs are embedded using .incbin assembly directive. Second
+stages are written normally in C and compiled as position-dependent for
+their respective addresses.
+
+* MMU
+
+Here's an explanation of steps we did to enable the MMU and how the MMU
+works in general.
+
+MMU stands for Memory Management Unit. It does 2 important things:
+
+1. It allows programs to use virtual memory addressing. Virtual
+   addresses are translated by the MMU to physical addresses with the
+   help of translation table.
+2. It guards against unallowed memory access. Element that only
+   implements this functionality is called MPU (Memory Protection Unit)
+   and is also found in some ARM cores.
+
+Without an MMU, code executing on a processor sees the memory as it really
+is.
+
+When it tries to load data from address 0x00AA0F3C it indeed loads data
+from 0x00AA0F3C. This doesn't mean address 0x00AA0F3C is in RAM: RAM can
+be mapped into the address space in an arbitrary way.
+
+MMU can be configured to "redirect" some range of addresses to some
+other range. Let's assume we configured the MMU to translate address
+range 0x00A00000 - 0x00B00000 to range 0x00200000 - 0x00300000. Now,
+code trying to perform operation on address 0x00AA0F3C would have the
+address transparently translated to 0x002A0F3C, on which the operation
+would actually take place.
+
+The translation affects all (stack and non-stack) data accesses as well
+as instruction fetches, hence an entire program can be made to work as
+if it was running from some memory address, while in fact it runs from a
+different one!
+
+The addresses used by program code are referred to as virtual addresses,
+while addresses actually used by the processor - as physical addresses.
+
+This aids operating system's memory management in several ways:
+
+1. A program may be compiled to run from some fixed address and the OS
+   is still free to choose any physical location to store that program's
+   code - only a translation of program's required address to that
+   location's address has to be configured. A problem of simultaneous
+   execution of multiple programs compiled for the same address is also
+   avoided in this way.
+2. A consecutive memory region might be required by some program, while
+   due to earlier allocations and deallocations there isn't a
+   big enough (no pun intended) free consecutive region of physical
+   memory. Smaller regions can be mapped to become accessible as a
+   single region in virtual address space, thus avoiding the need for
+   defragmentation.
+
+A given mapping can be made valid for only one execution mode (e.g. a
+region only accessible from privileged mode) or only certain types of
+accesses. A memory region can be made non-executable, which guards
+against accidental jumping there by program code. That is important for
+countering buffer-overflow exploits. An unallowed access triggers a
+processor exception, which passes control to an appropriate interrupt
+service routine.
+
+In RaspberryPi environments used by us, there are ARMv7-A compatible
+processors, which we currently use only in 32-bit mode. Information here
+is relevant to those systems (there are Pi boards with both older and
+newer processors, with more or less functionality and features
+available).
+
+If MMU is present, general configuration of it is done through registers
+of the appropriate coprocessor (cp15). Translations are managed through
+translation table. It is an array of 32-bit or 64-bit entries (also
+called descriptors) describing how their corresponding memory regions
+should be mapped. A number of leftmost bits of a virtual address
+constitutes an index into the translation table to be used for
+translating it. This way no virtual addresses need to be stored in the
+table and MMU can perform translations in O(1) time.
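+
+The lookup can be modelled in software as follows. This is only a
+sketch under simplifying assumptions: 1MB sections (so the leftmost 12
+bits of a 32-bit address form the index), and a table that stores just
+the physical base address of each section instead of full descriptors.
+The names translate and section\_bases are hypothetical. It reproduces
+the earlier example, in which 0x00AA0F3C is translated to 0x002A0F3C.
+
+#+BEGIN_SRC c
+#include <stdint.h>
+
+#define SECTION_SHIFT       20u          /* 1MB sections: 2^20 bytes each */
+#define SECTION_OFFSET_MASK 0x000FFFFFu  /* low 20 bits: offset in section */
+
+/* Hypothetical software model of what the MMU does in hardware.
+ * section_bases holds, for each of the 4096 sections of the virtual
+ * address space, the physical base address that section is mapped to. */
+uint32_t translate(const uint32_t section_bases[4096], uint32_t virt)
+{
+    uint32_t index  = virt >> SECTION_SHIFT;       /* leftmost 12 bits */
+    uint32_t offset = virt & SECTION_OFFSET_MASK;  /* remaining 20 bits */
+
+    /* No search is needed: the index selects the entry directly. */
+    return section_bases[index] | offset;
+}
+#+END_SRC
+
+With entry 0x00A of the table set to 0x00200000, the call
+translate(table, 0x00AA0F3C) yields 0x002A0F3C.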
+
+** Coprocessor 15
+
+Coprocessor 15 contains several registers, that control the behaviour of
+the MMU. They are all accessed through mcr and mrc arm instructions.
+
+1. SCTLR, System Control Register - "provides the top level control of
+   the system, including its memory system". Bits of this register
+   control, among other things, whether the following are enabled:
+
+   1. the MMU
+   2. data cache
+   3. instruction cache
+   4. TEX remap (changes how some translation table entry bit fields
+      (called C, B and TEX) are used - not used in the project)
+   5. access flags (enabling causes one translation table descriptor bit
+      normally used to specify access permissions of a region to be used
+      as access flag - not used either)
+
+2. DACR, Domain Access Control Register - "defines the access permission
+   for each of the sixteen memory domains". Entries in translation table
+   define which of available 16 memory domains a memory region belongs
+   to. Bits of DACR specify what permissions apply to each of the
+   domains. Possible settings are to allow accesses to regions based on
+   settings in translation table descriptor or to allow/disallow all
+   accesses regardless of access permission bits in translation table.
+
+3. TTBR0, Translation Table Base Register 0 - "holds the base address of
+   translation table 0, and information about the memory it occupies".
+   System mode programmer can choose (with respect to some alignment
+   requirements) where in the physical memory to put the translation
+   table. Chosen address (actually, only a number of its leftmost bits)
+   has to be put in TTBR0 for the MMU to know where the table lies. Other
+   bits of this register control some memory attributes relevant for
+   accesses to table entries by the MMU.
+
+4. TTBR1, Translation Table Base Register 1 - similar function to TTBR0
+   (see below for explanation of dual TTBR)
+5. TTBCR, Translation Table Base Control Register, which controls:
+
+   1. How TLBs (Translation Lookaside Buffers) are used. TLBs are a
+      mechanism of caching translation table entries.
+   2. Whether to use some extension feature, that changes translation
+      table entries and TTBR* lengths to 64-bit (we're not using this,
+      so we won't go into details)
+   3. How a translation table is selected.
+
+There can be 2 translation tables and there are 2 cp15 registers (TTBR0
+and TTBR1) to hold their base addresses. When 2 tables are in use, then
+on each memory access some leftmost bits of virtual address determine
+which one should be used. If the bits are all 0s, the table pointed to by TTBR0
+is used. Otherwise, the one pointed to by TTBR1 is used. This allows OS developer to use
+separate translation tables for kernelspace and userspace (e.g. by
+having the kernelspace code run from virtual addresses starting with 1
+and userspace code run from virtual addresses starting with 0). A field
+of TTBCR determines how many leftmost bits of virtual address are used
+for that (and also affects TTBR0 format). In the simplest setup (as in
+our project) this number is 0, so only the table specified in TTBR0 is
+used.
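+
+The selection rule described above can be sketched in C. The function
+name select\_ttbr is hypothetical, and n stands for the number of
+leftmost virtual address bits examined (assumed here to be in the range
+0 to 7, as for the 3-bit TTBCR field).
+
+#+BEGIN_SRC c
+#include <stdint.h>
+
+/* Hypothetical model of translation table selection.
+ * Returns 0 if the table pointed to by TTBR0 would be used and
+ * 1 if the table pointed to by TTBR1 would be used. */
+int select_ttbr(uint32_t virt, unsigned n)
+{
+    /* n == 0: no bits are examined, TTBR0 is always used. Handled
+     * separately, because shifting a 32-bit value by 32 is undefined. */
+    if (n == 0)
+        return 0;
+
+    /* The n leftmost bits are all 0s -> TTBR0, otherwise -> TTBR1. */
+    return (virt >> (32u - n)) != 0 ? 1 : 0;
+}
+#+END_SRC
+
+For example, with n equal to 1 every address of 0x80000000 and above
+would be translated using the TTBR1 table, and everything below it
+using the TTBR0 table.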
+
+** Translation table
+
+Translation table consists of 4096 entries, each describing a 1MB memory
+region. An entry can be of several types:
+
+1. Invalid entry - the corresponding virtual addresses can not be used
+2. Section - description of a mapping of 1MB memory region
+3. Supersection - description of a mapping of 16MB memory region, that
+   has to be repeated 16 times in consecutive table entries. This can
+   be used to map to physical addresses higher than 2\^32.
+4. Page table - no mapping is given yet, but a page table is pointed to.
+   See below.
+
+Besides, translation table descriptor also specifies:
+
+1. Access permissions.
+2. Other memory attributes (cacheability, shareability).
+3. Which domain the memory belongs to.
+
+** Page Table
+
+Page table is something similar to translation table, but its entries
+define smaller regions (called, well - pages). When a translation table
+descriptor describing a page table gets used for translation, an entry
+is fetched from that page table, with some middle bits of
+the virtual address used as index, and the translation is completed with it. This allows for better granularity of
+mappings, while not requiring page tables to occupy space where small
+pages are not needed. We could say, that 2-level translations are
+performed. On some versions of ARM translations can have more levels
+than that. This means the MMU might sometimes need to fetch several
+entries from different level tables to compute the physical address.
+This is called a translation table walk.
+
+As of 15.01.2020 page tables and small pages are not used in the project
+(although programming them is on the TODO list).
+
+** Project specific information
+
+Despite the overwhelming amount of configuration options available, most
+can be left default and this is how it's done in this project. Those
+default settings usually make the MMU behave like it did in older ARM
+versions, when some options were not yet available and hence, the entire
+system was simpler.
+
+Our project uses C bitfield structs for operating on SCTLR and TTBCR
+contents and translation table descriptors. With DACR - bit shifts are
+more appropriate and with TTBCR - our default configuration means we're
+writing '0' to that register. Bitfield structs are an elegant and readable approach,
+yet not very portable across compilers. Current struct definitions work
+properly with GCC.
+
+Structs describing SCTLR, DACR and TTBCR are defined in
+src/arm/PL1/kernel/cp\_regs.h. Structs describing translation table
+descriptors are defined in
+src/arm/PL1/kernel/translation\_table\_descriptors.h.
+
+Before the MMU is enabled, all memory is seen as it really is.
+Therefore, the only feasible way of enabling it is by initially setting
+the descriptors in translation table to map all addresses (mapping just
+addresses used by the kernel would be enough) to themselves. It is
+called a flat map.
+
+** Setting up MMU and FlatMap
+
+Setting up a flat map, turning on the MMU and managing memory
+sections is done in our project as follows:
+
+1. Translation table is defined in the linker script
+   src/arm/PL1/kernel/kernel\_stage2.ld as a NOLOAD section. C code gets
+   the table's start and end addresses from symbols defined in that
+   linker script (see arm/PL1/kernel/memory.h).
+2. Function setup\_flat\_map() defined in arm/PL1/kernel/paging.c
+   enables MMU with a flat map. It prints relevant information to uart
+   while performing the following procedure:
+
+   1. In a loop write all descriptors to the translation table, set them
+      as sections, accessible from PL1 only, belonging to domain 0.
+   2. Set DACR to allow domain 0 memory accesses, based on translation
+      table descriptor permissions and block accesses to other domains,
+      as only domain 0 is used in this project.
+   3. Make sure TEX remap, access flag, caches and the MMU are disabled
+      in SCTLR. Disabling some of them might be unnecessary, because MMU
+      is assumed to be disabled from the start and enabled caches might
+      cause no problems as long as only flat map is used. Still, the way
+      it is done right now is known to work well and optimizations are
+      not needed.
+   4. Clear all caches and TLBs (again, it is suspected that some of
+      this is unnecessary).
+   5. Write TTBCR setting such that only 32-bit translation table is
+      used.
+   6. Make TTBR0 point to the start of translation table. Rest of
+      attributes in TTBR0 (concerning how table entries are being
+      accessed) are left as 0s (defaults).
+   7. Enable the MMU and caches by setting the appropriate bits in
+      SCTLR.
+
+After some cp15 register writes, the isb assembly instruction is used,
+which causes ARM core to wait until changes take effect. This is done to
+prevent some later instructions from being executed before the changes
+are applied.
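+
+Steps 1 and 2 of the procedure above can be sketched as follows. Note
+that this is not the project's code: the project uses C bitfield
+structs, while this sketch uses plain shifts and masks to stay
+self-contained. The bit positions follow the ARMv7-A short-descriptor
+section format, and all names (flat\_section\_descriptor,
+fill\_flat\_map, dacr\_domain0\_client) are hypothetical.
+
+#+BEGIN_SRC c
+#include <stdint.h>
+
+#define DESC_TYPE_SECTION 0x2u  /* bits [1:0] = 0b10: section */
+#define DESC_DOMAIN_SHIFT 5u    /* domain number lives in bits [8:5] */
+#define DESC_AP_SHIFT     10u   /* AP[1:0] lives in bits [11:10] */
+#define AP_PL1_ONLY       0x1u  /* AP[2]=0, AP[1:0]=0b01: PL1 access only */
+#define DACR_CLIENT       0x1u  /* check permissions from the descriptor */
+
+/* Descriptor mapping section number index onto itself (flat map),
+ * accessible from PL1 only, belonging to domain 0. */
+uint32_t flat_section_descriptor(uint32_t index)
+{
+    return (index << 20)
+         | (AP_PL1_ONLY << DESC_AP_SHIFT)
+         | (0u << DESC_DOMAIN_SHIFT)
+         | DESC_TYPE_SECTION;
+}
+
+/* Fill all 4096 entries, so that every virtual address maps to itself. */
+void fill_flat_map(uint32_t table[4096])
+{
+    for (uint32_t i = 0; i < 4096u; i++)
+        table[i] = flat_section_descriptor(i);
+}
+
+/* DACR value: domain 0 (bits [1:0]) is a client, the other 15 domains
+ * are left as 0b00, which blocks all accesses to them. */
+uint32_t dacr_domain0_client(void)
+{
+    return DACR_CLIENT << 0;
+}
+#+END_SRC
+
+On real hardware the table would additionally have to satisfy the
+alignment requirements of TTBR0, which this sketch does not model.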
+
+In arm/PL1/kernel/paging.c the function claim\_and\_map\_section() can
+be used to modify an entry in translation table to create a new mapping.
+Memory allocation, also done in that source file, uses some lists to
+describe free and taken sections, but has nothing to do with the
+MMU.
+
+* Program Status Register
+  CPSR (Current Program Status Register) is a register whose bits contain and/or determine various aspects of
+  execution, e.g. condition flags, execution state (arm, thumb or
+  jazelle), endianness state, execution mode and interrupt mask. This register is readable and writeable with
+  the use of mrs and msr instructions from any PL1 mode, thus it is
+  possible to change things like mode or interrupt mask by writing to this
+  register.
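+
+To give an idea of how such bit fields can be handled, here is a small
+sketch that works on a PSR value already held in a variable. It
+deliberately uses masks instead of bitfield structs, and its names
+(psr\_mode, psr\_irqs\_masked, psr\_with\_mode) are hypothetical. It
+assumes the ARMv7-A layout, in which the mode occupies bits [4:0] and
+the IRQ mask is bit 7.
+
+#+BEGIN_SRC c
+#include <stdint.h>
+
+#define PSR_MODE_MASK 0x1Fu       /* mode field: bits [4:0] */
+#define PSR_I_BIT     (1u << 7)   /* set when IRQs are masked */
+
+#define PSR_MODE_USR  0x10u       /* user mode (PL0) */
+#define PSR_MODE_IRQ  0x12u       /* IRQ mode */
+#define PSR_MODE_SVC  0x13u       /* supervisor mode */
+
+/* Extract the execution mode from a PSR value. */
+uint32_t psr_mode(uint32_t psr)
+{
+    return psr & PSR_MODE_MASK;
+}
+
+/* Nonzero if IRQs are masked in the given PSR value. */
+int psr_irqs_masked(uint32_t psr)
+{
+    return (psr & PSR_I_BIT) != 0;
+}
+
+/* Return a copy of psr with only the mode field replaced. */
+uint32_t psr_with_mode(uint32_t psr, uint32_t mode)
+{
+    return (psr & ~PSR_MODE_MASK) | (mode & PSR_MODE_MASK);
+}
+#+END_SRC
+
+On the Pi, a value produced this way would still have to be written
+back with the msr instruction to take effect.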
+
+Additionally, there are other registers with the same or similar bit
+fields as CPSR. Those PSRs (Program Status Registers) are:
+
+1. APSR (Application Program Status Register)
+2. SPSRs (Saved Program Status Registers)
+
+APSR can be considered the same as CPSR or a view of CPSR, with some
+limitations - some bit fields from CPSR are missing (reserved) in APSR.
+APSR can be accessed from PL0, while CPSR should only be accessed from
+PL1. This way an application program executing in user mode can learn
+some of the settings in CPSR without accessing CPSR directly.
+
+SPSR is used for exception handling. Each exception-taking mode has its
+own SPSR (they can be called SPSR\_svc, SPSR\_irq, etc.). On exception
+entry, old contents of CPSR are backed up in entered mode's SPSR.
+Instructions used for exception return (subs and ldm \^), when writing
+to the pc, have the important additional effect of copying the SPSR to
+CPSR. This way, on return from an exception, processor returns to the
+state from before the exception. That includes endianness settings,
+execution state, etc.
+
+In our project, the structure of PSRs is defined in terms of C bitfield
+structs in src/arm/PL1/kernel/psr.h.
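+
+The idea of describing a PSR with a C bitfield struct can be sketched as
+follows. This is a minimal illustration and not the contents of psr.h:
+the type and field names are made up for this example, only the bit
+positions follow the ARMv7-A CPSR format. It also assumes a compiler
+that allocates bitfields starting from the least significant bit, as GCC
+does for little-endian targets.
+
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the project's real definitions live in psr.h. */
struct psr_fields {
    uint32_t mode         : 5; /* bits 0-4: processor mode */
    uint32_t thumb        : 1; /* bit 5: T, Thumb execution state */
    uint32_t fiq_masked   : 1; /* bit 6: F */
    uint32_t irq_masked   : 1; /* bit 7: I */
    uint32_t abort_masked : 1; /* bit 8: A */
    uint32_t big_endian   : 1; /* bit 9: E */
    uint32_t it_high      : 6; /* bits 10-15: IT[7:2] */
    uint32_t ge           : 4; /* bits 16-19: GE flags */
    uint32_t reserved     : 4; /* bits 20-23 */
    uint32_t jazelle      : 1; /* bit 24: J */
    uint32_t it_low       : 2; /* bits 25-26: IT[1:0] */
    uint32_t q            : 1; /* bit 27: cumulative saturation */
    uint32_t v            : 1; /* bit 28: overflow */
    uint32_t c            : 1; /* bit 29: carry */
    uint32_t z            : 1; /* bit 30: zero */
    uint32_t n            : 1; /* bit 31: negative */
};

/* Overlay of the named fields on the raw register value. */
union psr {
    uint32_t raw;
    struct psr_fields fields;
};

static_assert(sizeof(union psr) == sizeof(uint32_t),
              "PSR overlay must be exactly 32 bits wide");

/* Wrap a raw register value so its fields can be read by name. */
static inline union psr psr_from_raw(uint32_t raw)
{
    union psr p;
    p.raw = raw;
    return p;
}
```
+
+With such a union, a value read with mrs can be inspected by name, for
+example through fields.mode instead of masking the raw value with 0x1f.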
+
+* Ramfs
+
+A simple ram file system has been introduced to avoid having to embed
+too many files in the kernel in the future.
+
+The ram filesystem is created on the development machine and then
+embedded into the kernel. Kernel can then parse the ramfs and access
+files in it.
+
+Ramfs contains a mapping from file's name to its size and contents.
+Directories, file permissions, etc. as well as writing to filesystem are
+not supported.
+
+Currently the kernel uses this to access the code of the PL0 test
+program, which it then copies to the appropriate memory location. In case
+more user mode programs are later written, they can all be added to
+ramfs to enable the kernel to access them easily.
+
+** Specification
+
+When ramfs is accessed in memory, it MUST be aligned to a multiple of 4.
+
+The filesystem itself consists of blocks of data, each containing one
+file. Blocks of data in the ramfs come one after another, with the
+requirement, that each block starts at a 4-aligned offset/address. If a
+block doesn't end at a 4-aligned address, there shall be up to 3
+null-bytes of padding after it, so that the next block is properly
+aligned.
+
+Each block starts with a C (null-terminated) string with the name of the
+file it contains. At the first 4-aligned offset after the string, file
+size is stored in 4 bytes, in little endian. Null-bytes are used for
+padding between file name and file size if necessary. Immediately after
+the file size reside file contents, which take exactly the number of
+bytes specified in file size.
+
+As is obvious from the specification, files of 4GB or more are not
+supported, which is not a problem in the case of this project.
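+
+The layout of a single block can be sketched in C as below. This is a
+minimal sketch under stated assumptions, not the code of makefs: the
+function names are hypothetical, the output buffer is assumed to be
+large enough and to start at a 4-aligned position within the ramfs.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to the nearest multiple of 4. */
static inline size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Serialize one ramfs block (name, padding, size, contents, padding)
 * into out. Returns the number of bytes written, padding included.
 */
static inline size_t ramfs_write_block(uint8_t *out, const char *name,
                                       const uint8_t *data, uint32_t size)
{
    assert(out != NULL && name != NULL && (data != NULL || size == 0));

    size_t name_len = strlen(name) + 1;     /* includes the terminator */
    size_t size_off = align4(name_len);     /* 4-aligned size field */
    size_t data_off = size_off + 4;         /* contents follow the size */
    size_t total = align4(data_off + size); /* next block is 4-aligned */

    memset(out, 0, total);                  /* all padding is null-bytes */
    memcpy(out, name, name_len);

    /* File size on 4 bytes, little endian, independent of the host. */
    out[size_off + 0] = (uint8_t)(size & 0xffu);
    out[size_off + 1] = (uint8_t)((size >> 8) & 0xffu);
    out[size_off + 2] = (uint8_t)((size >> 16) & 0xffu);
    out[size_off + 3] = (uint8_t)((size >> 24) & 0xffu);

    if (size != 0)
        memcpy(out + data_off, data, size);

    return total;
}
```
+
+For a file named a.txt with the 5-byte contents "hello", the name takes
+6 bytes plus 2 bytes of padding, the size field sits at offset 8, the
+contents at offset 12 and 3 bytes of padding bring the block to 20 bytes.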
+
+** Implementations
+
+Creation of ramfs is done by the makefs program (src/host/makefs.c). The
+program accepts file names as command line arguments, creates a ramfs
+containing all those files and writes it to stdout. As makefs is a very
+simple tool (just as our ramfs is a simple format), it puts files in
+ramfs under the names it got on the command line. No stripping or
+normalizing of path is performed. In case of errors (e.g. I/O errors)
+makefs prints information to stderr and exits.
+
+Parsing/reading of ramfs is done by a kernel driver
+(src/arm/PL1/kernel/ramfs.c). The driver allows for finding a file in
+ramfs by name. File size and pointers to file name string and file
+contents are returned through a structure from function find\_file.
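+
+A lookup over the format described in the specification could look
+roughly like the sketch below. It is not the project's find\_file: the
+names ramfs\_file and ramfs\_lookup, the return convention and the
+explicit length argument are assumptions made for this example (the
+specification does not say how the end of the ramfs is detected).
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result structure: where a file lives inside the ramfs. */
struct ramfs_file {
    const char *name;
    const uint8_t *contents;
    uint32_t size;
};

static inline size_t ramfs_align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Walk the blocks of a ramfs of fs_len bytes and look for a file by name.
 * Returns 0 and fills *out on success, -1 if the file is missing or the
 * image is malformed.
 */
static inline int ramfs_lookup(const uint8_t *fs, size_t fs_len,
                               const char *wanted, struct ramfs_file *out)
{
    assert(fs != NULL && wanted != NULL && out != NULL);

    size_t off = 0;
    while (off < fs_len) {
        const char *name = (const char *)(fs + off);
        const void *nul = memchr(name, 0, fs_len - off);
        if (nul == NULL)
            return -1;                      /* unterminated file name */

        size_t name_len = (size_t)((const char *)nul - name) + 1;
        size_t size_off = off + ramfs_align4(name_len);
        if (size_off + 4 > fs_len)
            return -1;                      /* truncated size field */

        /* File size is stored on 4 bytes in little endian. */
        uint32_t size = (uint32_t)fs[size_off]
                      | ((uint32_t)fs[size_off + 1] << 8)
                      | ((uint32_t)fs[size_off + 2] << 16)
                      | ((uint32_t)fs[size_off + 3] << 24);

        size_t data_off = size_off + 4;
        if (size > fs_len - data_off)
            return -1;                      /* truncated contents */

        if (strcmp(name, wanted) == 0) {
            out->name = name;
            out->contents = fs + data_off;
            out->size = size;
            return 0;
        }

        /* Skip contents and padding to reach the next 4-aligned block. */
        off = ramfs_align4(data_off + size);
    }
    return -1;
}
```
+
+The function only returns pointers into the existing image, so no
+copying or allocation is needed to find a file.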
+
+As ramfs is embedded in kernel image, it is easily accessible to kernel
+code. The alignment of ramfs to a multiple of 4 is assured in kernel's
+linker script (src/arm/PL1/kernel/kernel\_stage2.ld).
+
+* Exceptions
+Whenever some illegal operation (attempt to execute undefined
+instruction, attempt to access memory with insufficient permission,
+etc.) happens or some peripheral device "messages" the ARM core, that
+something important happened, an exception occurs. Exception is
+something, that pauses normal execution and passes control to the
+(specific part of) operating system. Upon an exception, several things
+happen:
+
+1. Change of processor mode.
+2. CPSR gets saved into new mode's [[./PSRs-explained.txt][SPSR]].
+3. pc (incremented by some value) is saved into new mode's lr.
+4. Execution jumps to an entry in the exception vectors table specific
+   to the exception.
+
+Each exception type is taken to its specific mode. Types and their
+modes are:
+
+1. Reset and supervisor mode.
+2. Undefined instruction and undefined mode.
+3. Supervisor call and supervisor mode.
+4. Prefetch abort and abort mode.
+5. Data abort and abort mode.
+6. Hypervisor trap and hypervisor mode (not used normally, only with
+   extensions).
+7. IRQ and IRQ mode.
+8. FIQ and FIQ mode.
+
+The new value of the pc (the address, to which the exception "jumps") is
+the address of nth instruction from exception base address, which, under
+simplest settings, is 0x0 (bottom of virtual address space). N depends
+on the exception type. It is:
+
+1. reset
+2. undefined instruction
+3. supervisor call
+4. prefetch abort
+5. data abort
+6. hypervisor trap (not used here)
+7. IRQ
+8. FIQ
+
+Those 8 instructions constitute the exception vectors table. As the
+instructions follow one another, each of them should be a branch to some
+exception-handling routine. In contrast, on other architectures the
+exception vector table often holds raw addresses of where to jump instead of
+actual instructions, as here.
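+
+Since every entry is one 4-byte ARM instruction, the address of a given
+vector can be computed from the base address and the position of the
+exception in the list above. The sketch below illustrates this; the enum
+and function names are made up for this example and the enum counts from
+0, so the nth entry of the list has the value n - 1.
+
```c
#include <assert.h>
#include <stdint.h>

/* Exception types in the order of their entries in the vectors table. */
enum exception_type {
    EXC_RESET,
    EXC_UNDEFINED_INSTRUCTION,
    EXC_SUPERVISOR_CALL,
    EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT,
    EXC_HYPERVISOR_TRAP,
    EXC_IRQ,
    EXC_FIQ,
    EXC_COUNT
};

/* Address the pc is set to when an exception of the given type is taken. */
static inline uint32_t exception_vector_address(uint32_t base,
                                                enum exception_type type)
{
    assert(type < EXC_COUNT);
    return base + 4u * (uint32_t)type;  /* one 4-byte instruction per entry */
}
```
+
+With the base address 0x0 this gives 0x08 for supervisor call, 0x18 for
+IRQ and 0x1c for FIQ.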
+
+The exception base address can be moved away from the bottom of virtual
+address space by manipulating the contents of SCTLR and VBAR coprocessor
+registers.
+
+On exception entry, the registers r0-r12 contain values used by the code
+that was executing before. In order for the exception handler to perform
+some action and return to that code, those registers have to be preserved
+in memory. Some compilers can automatically generate appropriate
+prologue and epilogue for handler-functions, that will preserve the
+right registers (we're not using this feature in our project).
+
+Having old CPSR in SPSR and old pc in lr is helpful, when after handling
+the exception, the handler needs to return to the code that was
+executing before. There are 2 special instructions, subs and ldm \^
+(load multiple with a caret \^), that, when used to change the pc (and
+therefore perform a jump) cause the SPSR to be copied into CPSR. As bits
+of CPSR determine the current execution mode, this causes the mode to be
+changed to that from before the exception. In short, subs and ldm \^ are
+the instructions to use to return from exceptions.
+
+As noted earlier, upon exception entry an incremented value of pc is
+stored in lr. By how much it is incremented, depends on exception type
+and execution state. For example, entering undefined instruction
+exception from thumb state places in undef's lr the problematic
+instruction's address + 2, while taking this exception from ARM state
+places in undef's lr that instruction's address + 4 (see full table in
+paragraph B1.8.3 of [[https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf][ARMv7-ar\_arm]]).
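+
+For the undefined instruction example above, a handler that wants to
+inspect the offending instruction has to undo that increment. A minimal
+sketch follows; the function name is hypothetical and the caller is
+assumed to have learned the execution state from the T bit of the saved
+PSR.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Recover the address of the instruction that caused an undefined
 * instruction exception from the value found in undef mode's lr.
 */
static inline uint32_t undef_instruction_address(uint32_t lr_und,
                                                 bool was_thumb)
{
    /* lr holds the instruction's address + 2 (thumb) or + 4 (ARM). */
    uint32_t offset = was_thumb ? 2u : 4u;
    assert(lr_und >= offset);
    return lr_und - offset;
}
```
+
+Other exception types use other offsets, so a real handler would pick
+the value from the table in the manual rather than reuse this one.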
+
+It's worth noting, that while our
+[[file:../src/arm/PL1/kernel/interrupt_vector.S][implementation of exception handlers]] also sets the stack pointer (sp) upon each
+exception entry, a kernel could be written, where this wouldn't be done,
+as each mode enterable by exception has its own sp.
+
+* IRQ
+Two of the possible exceptions in ARM are IRQ (Interrupt Request) and FIQ (Fast
+Interrupt Request). They can be caused by an external source, such as
+peripheral devices and they can be used to inform the kernel about some
+action, that happened.
+
+Interrupts offer an economic way of interacting with peripheral devices.
+For example, code can probe UART memory-mapped registers in a loop to
+see whether transmitting/receiving of a character finished. However,
+this causes the processor to needlessly execute the loop and makes it
+impossible or difficult to perform other tasks at the same time.
+Interrupt can be used instead of probing to "notify" the kernel, that
+something it was waiting for just happened. While waiting for interrupt,
+the system can be put to halt (e.g. with the wfi instruction), which helps save
+power, or it can perform other actions without wasting processor cycles
+in a loop.
+
+An interrupt, that is normally IRQ, can be made into FIQ by ARM system
+dependent means. FIQ is meant to be able to be handled faster, by not
+having to back up registers r8-r12, that FIQ mode has its own copies
+of. This project only uses IRQ.
+
+Some peripheral devices can be configured (through their memory-mapped
+registers) to generate an interrupt under certain conditions (e.g. UART
+can generate interrupt when received characters queue fills). The
+interrupt can then be either masked or unmasked (sometimes in more than
+one peripheral register). If interrupts are enabled in CPSR and a
+peripheral device tries to generate one, that is not masked, IRQ (or
+FIQ) exception occurs (which causes interrupts to be temporarily masked
+in CPSR). The code can usually check, whether an interrupt of given kind
+from given device is *pending*, by looking at the appropriate bit of the
+appropriate peripheral register (mmio). As long as an interrupt is
+pending, re-enabling interrupts (for example via return from IRQ
+handler) shall cause the exception to occur again. Removing the source
+of the interrupt (e.g. removing characters from UART fifo, that filled)
+doesn't usually cause the interrupt to stop pending, in which case a
+pending-bit has to be cleared, usually by writing to the appropriate
+peripheral register (mmio).
+
+IRQs and FIQs can be configured as vectored - the processor then, upon
+interrupt, jumps to different location depending on which interrupt
+occurred, instead of jumping to the standard IRQ/FIQ vector. This can be used
+to speed up interrupt handling. Our simple project does not, however,
+use this feature.
+
+Currently, IRQs from 2 sources are used:
+[[https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf][ARM timer IRQ]] and UART IRQs. The kernel makes sure, that timer IRQ only
+occurs when processor is in user mode. IRQ handler does not return in
+this case - it calls scheduler. The kernel makes sure, that UART IRQ
+only occurs, when a process is blocked and is waiting for UART IO
+operation. The interrupt handler, when called, checks what type of UART
+action happened and tries (through calling of appropriate function from
+scheduler.c) to handle that action and, possibly, to unblock the waiting
+process. UART IRQ might occur when another process is executing (not
+possible now, with only one process, but shall be possible when more
+processes are added to the project), in which case the handler
+returns, or when kernel is explicitly waiting for interrupts (because
+all processes are blocked), in which case it calls schedule() instead of
+returning.
+
+* Processor modes
+
+ARMv7-A core can be executing in one of several modes (not to be
+confused with instruction set states or endianness execution state).
+Those are:
+
+1. User
+2. FIQ
+3. IRQ
+4. Supervisor
+5. Abort
+6. Undefined
+7. System
+
+In fact, there are more if the processor implements some extensions, but
+this is irrelevant here.
+
+Current processor mode is encoded in the lowest five bits of the CPSR register.
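+
+Decoding the mode from a PSR value is then a matter of masking those
+five bits. The sketch below uses the ARMv7-A mode encodings for the
+seven modes listed above; the enum and function names are made up for
+this example.
+
```c
#include <assert.h>
#include <stdint.h>

/* Values of the mode field (lowest five bits of a PSR). */
enum cpu_mode {
    MODE_USER       = 0x10,
    MODE_FIQ        = 0x11,
    MODE_IRQ        = 0x12,
    MODE_SUPERVISOR = 0x13,
    MODE_ABORT      = 0x17,
    MODE_UNDEFINED  = 0x1b,
    MODE_SYSTEM     = 0x1f
};

static_assert(MODE_SYSTEM <= 0x1f, "mode values must fit in five bits");

/* Extract the mode field from a raw CPSR/SPSR value. */
static inline uint32_t psr_mode_bits(uint32_t psr)
{
    return psr & 0x1fu;
}

/* Human-readable name of the mode encoded in a raw PSR value. */
static inline const char *mode_name(uint32_t psr)
{
    switch (psr_mode_bits(psr)) {
    case MODE_USER:       return "user";
    case MODE_FIQ:        return "fiq";
    case MODE_IRQ:        return "irq";
    case MODE_SUPERVISOR: return "supervisor";
    case MODE_ABORT:      return "abort";
    case MODE_UNDEFINED:  return "undefined";
    case MODE_SYSTEM:     return "system";
    default:              return "unknown"; /* extension or invalid */
    }
}
```
+
+Such a helper can be handy in debugging output, e.g. when printing the
+SPSR from inside an exception handler.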
+
+Processor can operate in one of 2 privilege levels (although, again,
+extensions exist, that add more levels):
+
+1. PL0 - privilege level 0
+2. PL1 - privilege level 1
+
+Processor modes have their assigned privilege levels. User mode has
+privilege level 0 and all other modes have privilege level 1. Code
+executing in one of privileged modes is allowed to do more things, than
+user mode code, e.g. writing and reading some of the coprocessor
+registers, executing some privileged instructions (e.g. mrs and msr,
+when used to reference CPSR, as well as other modes' registers),
+accessing privileged memory and changing the mode (without causing an
+exception). Attempts to perform those actions in user mode result either
+in undefined (within some limits) behaviour or an exception (depending
+on what action is considered).
+
+User mode is the one, in which application programs usually run. Other
+modes are usually used by the operating system's kernel. Lack of
+privileges in user mode allows PL1 code to control execution of PL0
+code.
+
+While code executing in PL1 can freely (except switching from system to
+user mode, which produces undefined behaviour) change mode by either
+writing the CPSR or executing cps instruction, user mode can only be
+exited by means of an exception.
+
+Some ARM core registers (e.g. r0 - r7) are shared between modes, while
+some are not. In the latter case, separate modes have their private copies of
+those registers. For example, lr and sp in supervisor mode are different
+from lr and sp in user mode. For full information about shared and not
+shared (banked) registers, see paragraph B9.2.1 in
+[[https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf][armv7-a
+manual]]. The most important things are that user mode and system mode
+share all registers with each other and they don't have their own SPSR
+(which is used for returning from exceptions and exceptions are
+never taken to those 2 modes) and that all other modes have their own
+SPSR, sp and lr.
+
+The reason for having multiple copies of the same register in different
+modes is that it simplifies writing interrupt handlers. For example, supervisor
+mode code can safely use sp and lr without destroying the contents of
+user mode's sp and lr.
+
+The large number of PL1 modes is supposed to aid in handling of
+exceptions. Each kind of exception is taken to its specific mode.
+
+Supervisor mode, in addition to being the mode supervisor calls are
+taken to, is the mode the processor is in when the kernel boots.
+
+System mode, which uses the same registers as user mode, is said to have
+been added to ARM architecture to ease accessing the unprivileged
+registers. For example, setting user mode's sp from supervisor mode can
+be done by switching to system mode, setting the sp and switching back
+to supervisor mode. Other modes' registers can alternatively be accessed
+with the use of mrs and msr assembly instructions (but not from user
+mode).
+
+Despite the name, system mode doesn't have to be the mode used most
+often by operating system's kernel. In fact, prohibition of direct
+switching from system mode to user mode would make extensive use of
+system mode impractical. This project, for example, uses supervisor mode
+for most of the privileged tasks.
+
+* Process management
+An operating system has to manage user processes. Our system only has
+one process right now, but usual actions, such as context saving or
+context restoring, are implemented anyways. The following few paragraphs
+contain information on what process management looks like in operating
+systems in general.
+
+Process might return control to the system by executing the svc (earlier
+called swi) instruction. System would then perform some action on behalf
+of the process and either return from the supervisor call exception or
+attempt to schedule another process to run, in which case context of the
+old process would need to be saved for later and context of the new
+process would need to be restored.
+
+Process has data in memory (such as its stack, code) as well as data in
+registers (r0-r15, CPSR). Together they constitute process' context.
+From process' perspective, context should not unexpectedly change, so
+when control is taken away from user mode code (via an exception) and
+later (possibly after execution of some other processes) given back, it
+should be transparent to the process (except when kernel does something
+for the process in terms of supervisor call). In particular, the
+contents of core registers should be the same as before. For this to be
+achievable, the operating system has to back up process' registers
+somewhere in memory and later restore them from that memory.
+
+Operating system kernel maintains a queue of processes waiting for
+execution. When a process blocks (for example by waiting for IO), it is
+removed from the queue. If a process unblocks (for example because IO
+completed) it is added back to the queue. In general, some systems might
+complicate it, for example by having more queues, but discussing those
+variations is out of scope of this documentation. When processor is
+free, one of the processes from the queue (determined by some scheduling
+algorithm implemented in the kernel) gets
+chosen and run on the processor.
+
+As one process could never use a supervisor call, it could occupy the
+processor forever. To remedy this, timer interrupts can be used by the
+kernel to interrupt the execution of a process after some time. The
+process would then have its context saved and go to the end of the
+queue. Another process would be scheduled to run.
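+
+The queue behaviour described in the last two paragraphs can be sketched
+as a small ring buffer. This is a generic illustration and not code from
+scheduler.c (which currently handles a single process): all names are
+hypothetical and processes are represented by plain integer ids.
+
```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 8

/* Queue of runnable processes, stored as a ring buffer of ids. */
struct run_queue {
    int pids[MAX_PROCS];
    size_t head;   /* index of the process that will run next */
    size_t count;  /* number of queued processes */
};

/* Add a process at the end of the queue (new or just unblocked). */
static inline void rq_push(struct run_queue *q, int pid)
{
    assert(q->count < MAX_PROCS);
    q->pids[(q->head + q->count) % MAX_PROCS] = pid;
    q->count++;
}

/* Take the next process to run; returns -1 if nothing is runnable. */
static inline int rq_pop(struct run_queue *q)
{
    if (q->count == 0)
        return -1;
    int pid = q->pids[q->head];
    q->head = (q->head + 1) % MAX_PROCS;
    q->count--;
    return pid;
}

/*
 * Timer interrupt: the running process goes to the end of the queue
 * and the process at the front is chosen to run instead.
 */
static inline int rq_preempt(struct run_queue *q, int running_pid)
{
    rq_push(q, running_pid);
    return rq_pop(q);
}
```
+
+A process that blocks is simply not pushed back; it would be pushed
+again once the event it waits for has happened. Saving and restoring of
+the context is left out of this sketch.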
+
+Other exceptions might occur when process is running. Depending on
+kernel design, handler of an exception (such as IRQ) might return to the
+process or cause another one to be scheduled.
+
+If at some time all processes are blocked waiting, the kernel can wait
+for some interrupt to happen, which could possibly unblock some process
+(e.g. because IO completed).
+
+While not mentioned earlier, switching between processes' contexts
+involves not only saving and restoring of registers, but also changing
+the translation table entries to properly map memory regions used by
+current process.
+
+In our project, process management is implemented in
+src/arm/PL1/kernel/scheduler.c.
+
+A "queue" contains data of the only process (variables PL0\_regs[],
+PL0\_sp, PL0\_lr and PL0\_PSR).
+
+** Scheduler functions
+
+Function setup\_scheduler\_structures is supposed to be called before
+scheduler is used in any way.
+
+Function schedule\_new() creates and runs a new process.
+
+Function schedule\_wait\_for\_output() causes the current process to
+have its context saved and get blocked waiting for UART to send data.
+It is called from supervisor call handler. Function
+schedule\_wait\_for\_input() is similar, but process waits for UART to
+receive data.
+
+Function schedule() attempts to select a process (currently the only
+one) and run it. If process cannot be run, schedule() waits for
+interrupt, that could unblock the process. The interrupt handler would
+not return in this case, but rather call schedule() again.
+
+Function scheduler\_try\_output() is supposed to be called by IRQ
+handler when UART is ready to transmit more data. It can cause a process
+to get unblocked. scheduler\_try\_input() is similar, but relates to
+receiving data.
+
+The following are assured in our design:
+
+1. When processor is in user mode, interrupts are enabled.
+2. When processor is in system mode, interrupts are disabled, except
+   when explicitly waiting for the interrupt when process is blocked.
+3. When a process is waiting for input/output, the corresponding IRQ is
+   unmasked. Otherwise, that IRQ is masked.
+4. If an interrupt from UART occurs during execution of user mode code
+   (not possible here, as we only have one process, but shall become
+   possible when proper processes are implemented), the handler shall
+   return. If that interrupt occurs during execution of PL1 code, it
+   means it occurred in scheduler, that was explicitly waiting for it and
+   the handler calls schedule() again instead of returning.
+5. Interrupt from timer is unmasked and set to come whenever a process
+   gets scheduled to run. Timer interrupt is disabled when in PL1 (when
+   scheduler is waiting for interrupt, only UART one can come).
+6. A supervisor call requesting a UART operation, that can not be
+   completed immediately, causes the process to block.
+
+* Linking
+
+[[https://en.wikipedia.org/wiki/Linker_%28computing%29][Linking]] is a process of creating an executable, library or another
+object file out of object files.
+During linking, values previously unknown to the compiler (e.g. what
+will be the addresses of external functions/variables, from what address
+will the code be executing) might be injected into the code.
+
+Linker script is, among other things, used to tell the linker, where in memory
+the specific parts of the executable should lie.
+
+In a hosted environment (when building a program to run under a
+full-featured operating system, like GNU/Linux), a linker script is
+usually provided by the toolchain and used if no other script is
+provided. In a bare-metal project, the developer usually has to write
+their own linker script, in which they specify the binary image's *load
+address* and section layout.
+
+Contents of an object code file or executable (our .o or .elf) are
+grouped into sections. Sections have names. Common names are .text
+(usually contains code), .data (usually contains statically-allocated
+variables initialized to non-zero values), .bss (usually used to reserve
+memory for statically allocated variables initialized to zero), .rodata
+(usually contains statically-allocated variables, that are not going to
+be modified).
+
+In a hosted environment, when an executable (say, of elf format) is
+executed, contents of its sections are usually placed in different
+memory segments with different access privileges, so that, for example,
+code is not writable and variable contents are not executable. This
+helps reduce the risk of buffer overflow exploits.
+
+In a bare-metal environment like ours, we don't execute an elf file directly
+(except in qemu, which is the unpreferred approach anyway), but rather a
+raw binary image created from an elf file. Still, the notion of section
+is used along the way.
+
+During linking, one or more object code files are combined into one file
+(in our case an executable). Section contents of input files land in
+some sections of the output file, in a way defined in the linker script.
+In a hosted environment, a linker script would likely put contents of
+input .text sections in a .text section, contents of input .data
+sections in a .data section, etc. The developer can, however, use
+sections with different names (although weird behaviour of some linkers
+might occur) and assign their contents in their preferred way using a
+linker script.
+
+In linker script it is possible to specify a section as NOLOAD (usually
+used for .bss), which, in our case, causes that section not to be
+included in the binary image later created with objcopy.
+
+It is also possible to treat same-named input sections differently
+depending on what file they came from and even use wildcards when
+specifying file names.
+
+Variables can be created, as well as new symbols, which can then be
+referenced from C code.
+
+Defining alignment of specific parts of future image is also easily
+achievable.
+
+We made use of all those possibilities in our scripts.
+
+In src/arm/PL1/kernel/kernel\_stage2.ld the physical memory layout of
+the kernel is defined. Symbols defined there, such as \_stack\_end, are
+referenced in C header src/arm/PL1/kernel/memory.h.
+
+While src/arm/PL1/kernel/kernel.ld and src/arm/PL1/loader/loader.ld
+define the starting address, it is irrelevant, as the assembly-written
+position-independent code for first stages of loader and kernel does not depend on that address.
+
+At the beginning of this project, we had very little understanding of
+linker scripts' syntax.
+[[https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION][This article]] proved useful and allowed us to learn the required parts in a
+short time. As discussing the entire syntax of linker scripts is beyond
+the scope of this documentation, we refer the reader to that resource.
+
+* Miscellaneous topics
+
+** Supervisor calls
+
+Supervisor call happens, when the svc (previously called swi)
+instruction gets executed. Exception is then entered. Supervisor call is
+the standard way for user process to ask the kernel for something. As
+user code might request many different things, the kernel must somehow
+know which one was requested. The svc instruction takes one immediate
+operand. The supervisor call exception handler can check at what address
+the execution was, read svc instruction from there and inspect its
+bytes. This way, by executing svc with different immediate values, the
+user mode code can request different things from the kernel - the value
+in svc shall encode the request's type.
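+
+For a caller running in ARM state, the svc instruction is the 4-byte
+word just before the address saved in supervisor mode's lr, and its
+immediate occupies the lowest 24 bits. A handler relying on immediates
+could therefore decode the request roughly as sketched below. The
+function names are made up for this example and thumb state, where the
+instruction is 2 bytes long with an 8-bit immediate, is not handled.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* In the ARM encoding of svc, bits 24-27 of the instruction are all 1s. */
static inline bool is_arm_svc(uint32_t instruction)
{
    return (instruction & 0x0f000000u) == 0x0f000000u;
}

/* The immediate operand is stored in the lowest 24 bits. */
static inline uint32_t arm_svc_immediate(uint32_t instruction)
{
    assert(is_arm_svc(instruction));
    return instruction & 0x00ffffffu;
}

/*
 * lr_svc points at the instruction following the svc, so the svc
 * itself is the word right before it.
 */
static inline uint32_t svc_immediate_from_lr(const uint32_t *lr_svc)
{
    return arm_svc_immediate(lr_svc[-1]);
}
```
+
+For example, the word 0xef00002a encodes svc with the immediate 42.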
+
+To save time and for the sake of simplicity, we don't make use of
+immediates in svc and instead we encode call's type in r0. In our
+implementation we decided, that supervisor call will preserve and
+clobber the same registers as function call and it will return values
+through r0, just as function call. This enables us to actually
+perform the supervisor call as a call to a function defined in
+src/arm/PL0/svc.S. Calls from C are performed in
+src/arm/PL0/PL0\_utils.c and request type encodings are defined in
+src/arm/common/svc\_interface.h (they must be known to both user mode
+code and handler code).
+
+** Utilities
+
+We've compiled useful utilities (e.g. memcpy(), strlen(), etc.) in
+src/arm/common/strings.c. Those do not depend on the environment and can
+be used by user mode code, kernel code and even bootloader code.
+Functions used for io (like puts()) are also defined in common way for
+privileged and unprivileged code. They do, however, rely on the
+existence of putchar() and getchar(). In PL0 code
+(src/arm/PL0/PL0\_utils.c), putchar() and getchar() are defined to
+perform a supervisor call, that does that operation. In the PL1 code,
+they are defined as operations on UART.
+
+** Timers
+
+Several timers are available on the RaspberryPi:
+
+1. System Timer (with 4 interrupt lines, regarded as the most reliable,
+   as it is not derived from the system clock and hence is not affected
+   by processor power mode changes),
+   [[https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf][BCM2837 ARM Peripherals, Chapter 12]]
+2. ARM side Timer (based on an ARM AP804)
+   [[https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf][BCM2837 ARM Peripherals, Chapter 14]]
+3. ARM Generic Timer (optional extension to ARMv7-A and ARMv7-R,
+   configured through coprocessor registers)
+
+At first, we attempted to use the System Timer, some code for which is
+still present in src/arm/PL1/kernel/bcmclock.h. The interrupts from that
+timer are not, however, routed to any ARM core under rpi-open-firmware,
+but rather to the GPU. Because of that, we ended up using the ARM side
+Timer (programmed in src/arm/PL1/kernel/armclock.h). The ARM side Timer
+based on ARM AP804 is currently only available on real hardware and not
+in qemu. Programming the ARM Generic Timer (listed in TODOs) could
+enable the use of timer interrupts in qemu.
+
+** UARTs
+
+src/arm/PL1/PL1\_common/uart.c implements putchar() and getchar() in
+terms of UART. Those implementations are blocking - they poll UART
+peripheral registers in a loop, checking, if the device is ready to
+perform the operation. They are, however, accompanied by functions
+getchar\_non\_blocking() and putchar\_non\_blocking(), that check *once*
+if the device is ready and only perform the operation if it is.
+Otherwise, they return an error value. Their purpose is to use them with
+interrupts. In interrupt-driven UART we avoid waiting in a loop -
+instead, an IRQ comes when desired UART's operation completes. The code
+that wants to write/read from UART, does, however, need to tie its
+operation with IRQ handler and scheduler. Blocking versions should not
+be used once UART interrupts are enabled or in exception handlers, that
+should always run quickly. However, doing this does not break UART and
+might be justified for debugging purposes (like error() function defined
+in src/arm/common/io.c and used throughout the kernel code).
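+
+The check-once behaviour can be sketched as below. This is not the code
+from uart.c: the struct, the function names and the use of -1 as the
+error value are assumptions made for this example. Only the two PL011
+registers needed here are described (data register at offset 0x00, flag
+register at offset 0x18) and the register block is passed as a pointer,
+so that the sketch can also be exercised against a fake block in memory.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PL011_FR_RXFE (1u << 4) /* receive FIFO empty */
#define PL011_FR_TXFF (1u << 5) /* transmit FIFO full */

/* Partial PL011 register layout: only DR and FR are named. */
struct pl011_regs {
    volatile uint32_t dr;          /* 0x00: data register */
    volatile uint32_t reserved[5]; /* 0x04-0x17: not used in this sketch */
    volatile uint32_t fr;          /* 0x18: flag register */
};

static_assert(offsetof(struct pl011_regs, fr) == 0x18,
              "flag register must sit at offset 0x18");

/* Send one character if there is room; returns 0 on success, -1 if not. */
static inline int uart_putchar_nb(struct pl011_regs *uart, char c)
{
    if (uart->fr & PL011_FR_TXFF)
        return -1;                 /* device not ready, do not wait */
    uart->dr = (uint8_t)c;
    return 0;
}

/* Receive one character if available; returns it, or -1 if none. */
static inline int uart_getchar_nb(struct pl011_regs *uart)
{
    if (uart->fr & PL011_FR_RXFE)
        return -1;                 /* nothing received yet */
    return (int)(uart->dr & 0xffu);
}
```
+
+A blocking variant would call the same function in a loop until it
+succeeds, while interrupt-driven code would retry only after the IRQ
+signalling that the device became ready.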
+
+There are 2 UARTs in RaspberryPi. One mini UART (also called UART 1) and
+one PL011 UART (also called UART 0). The PL011 UART is used exclusively
+in this project. The hardware allows some degree of configuration of
+which pins which UART is routed to (via so-called alternative
+functions). In our project it is assumed, that UART 0's TX and RX are
+routed to GPIO pins 14 & 15 by the firmware, which is true for
+rpi-open-firmware. With stock Broadcom firmware, either changing the
+default configuration (config.txt) or selection of alternative functions
+as part of uart initialization (present in TODOs list) might be
+required.
+
+Before UART can be used, GPIO pins 14 and 15 should have pull up/down
+disabled. This is done as part of UART initialization in uart\_init() in
+src/arm/PL1/PL1\_common/uart.c. There is a requirement that UART is
+disabled when being configured, which is also fulfilled by uart\_init().
+The PL011 is thoroughly described in
+[[https://cs140e.sergio.bz/docs/BCM2837-ARM-Peripherals.pdf][BCM2837 ARM Peripherals]] as well as [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183f/DDI0183.pdf][PrimeCell UART (PL011) Technical Reference Manual]].
+
+* Afterword
+
+This project has been done as part of the Embedded Systems course at
+[[https://www.agh.edu.pl/en/][AGH University of Science and Technology]]. The goal of the project was to investigate and program the
+MMU (Memory Management Unit) of the RaspberryPi, but it ended up forming the
+basis of a small operating system.
+[[https://www.raspberrypi.org/products/raspberry-pi-3-model-b/][RaspberryPi 3 model B]] was the hardware platform used, with stock firmware replaced
+with
+[[https://github.com/christinaa/rpi-open-firmware][rpi-open-firmware]].
+An emulator, [[https://www.qemu.org/download/][qemu]] (version 2.9.1)
+capable of emulating an older RaspberryPi 2 was also used extensively.
+
+The project was written in C programming language and ARM assembly.
+Knowledge of C is required to understand the code. Knowledge of ARM
+assembly is useful, but it should be considered a thing, that can be
+learned *while* working with it. Still, the reader should at least have
+an idea of what assembly language is and how it is used.
+
+This documentation is intended to provide information on bare-metal
+programming on the RaspberryPi and ARM in general, as well as
+description of our solutions and implementations. There is a lot of
+information available on the topic in online sources, yet it is not always in an
+easy-to-understand form and the amount of different options described in
+manuals might be overwhelming for people new to the topic. That's why we
+attempted to describe our work in a way the audience of bare-metal
+programming newcomers will find useful. External resources we used are listed at the end of the documentation.
+
+It is planned that students of the Embedded Systems course in future
+years will have an option to continue or reuse previous projects, such
+as this one. We hope this documentation will prove useful to our younger
+colleagues who happen to work with the codebase.
+
+In case of any bugs or questions, the authors can be contacted at kwojtus@protonmail.com.
+
+* Sources of Information
+ * wiki.osdev
+ * ARM Architecture Reference Manual® ARMv7-A and ARMv7-R edition (probably the most useful document of all)
+ * dwelch67
+ * http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html - very good description of atags
+ * BCM2835-ARM-Peripherals.pdf and https://elinux.org/BCM2835_datasheet_errata
+ * https://buildmedia.readthedocs.org/media/pdf/devicetree-specification/latest/devicetree-specification.pdf
+ * online ARM Compiler toolchain Assembler Reference
+ * Christina Brook's rpi-open-firmware
+ * http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183g/DDI0183G_uart_pl011_r1p5_trm.pdf
+ * GNU make documentation
+ * description of linker scripts: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION
-- 
cgit v1.2.3