Here's an explanation of the steps we took to enable the MMU and of how the MMU works in general.

MMU stands for Memory Management Unit. It does 2 important things:
1. It allows programs to use virtual memory addressing. Virtual addresses are translated by the MMU to physical addresses with the help of a translation table.
2. It guards against disallowed memory accesses. A unit that implements only this functionality is called an MPU (Memory Protection Unit) and is also found in some ARM cores.

Without an MMU, code executing on a processor sees the memory as it really is. I.e. when it tries to load data from address 0x00AA0F3C, it indeed loads data from 0x00AA0F3C (of course, this doesn't mean it's address 0x00AA0F3C in RAM; RAM can be mapped into the address space in an arbitrary way).

The MMU can be configured to "redirect" some range of addresses to some other range. Let's assume we configured the MMU to translate address range 0x00A00000 - 0x00B00000 to range 0x00200000 - 0x00300000. Now, code trying to perform an operation on address 0x00AA0F3C would have the address transparently translated to 0x002A0F3C, on which the operation would actually take place. The translation affects all (stack and non-stack) data accesses as well as instruction fetches, hence an entire program can be made to work as if it was running from some memory address, while it in fact runs from a different one!

The addresses used by program code are referred to as virtual addresses, while the addresses actually used by the processor are referred to as physical addresses.

This aids the operating system's memory management in several ways:
1. A program may be compiled to run from some fixed address and the OS is still free to choose any physical location to store that program's code - only a translation of the program's required address to that location's address has to be configured. The problem of simultaneous execution of multiple programs compiled for the same address is also avoided in this way.
2. A contiguous memory region might be required by some program. In a scenario where, due to earlier allocations and deallocations, no big enough (no pun intended) contiguous region of physical memory is free, smaller regions can be mapped to become accessible as a single region in virtual address space, thus avoiding the need for defragmentation.

A given mapping can be made valid for only one execution mode (e.g. a region only accessible from privileged mode) or for only certain types of accesses (e.g. a memory region can be made non-executable, which guards against program code accidentally jumping there (important for countering buffer-overflow exploits)). A disallowed access triggers a processor exception, which passes control to the appropriate exception handler.

General configuration of the MMU in ARM processors that have one is done through registers of the appropriate coprocessor (cp15). Translations are managed through a translation table. It is an array of 32-bit or 64-bit entries (also called descriptors) describing how their corresponding memory regions should be mapped. A number of leftmost bits of a virtual address constitutes an index into the translation table, selecting the entry used for translating that address. This way no virtual addresses need to be stored in the table and the MMU can perform translations in O(1) time.

Coprocessor 15 contains several registers that control the behaviour of the MMU. They are all accessed through the mcr and mrc ARM instructions.

1. SCTLR, System Control Register - "provides the top level control of the system, including its memory system"
   Bits of this register control, among other things:
   · whether the MMU is enabled
   · whether data cache is enabled
   · whether instruction cache is enabled
   · whether TEX remap is enabled
     TEX remap is a feature that changes how some translation table entry bit fields (called C, B and TEX) are used. We're not using TEX remap in our project.
   · whether access flags are enabled
     Enabling the access flag causes one translation table descriptor bit, normally used to specify access permissions of a region, to be used as an access flag. We don't use this feature either.

2. DACR, Domain Access Control Register - "defines the access permission for each of the sixteen memory domains"
   Entries in the translation table define which of the 16 available memory domains a memory region belongs to. Bits of DACR specify what permissions apply to each of the domains. The possible settings are to allow accesses to regions based on the settings in the translation table descriptor, or to allow/disallow all accesses regardless of the access permission bits in the translation table.

3. TTBR0, Translation Table Base Register 0 - "holds the base address of translation table 0, and information about the memory it occupies"
   The system programmer can choose (subject to some alignment requirements) where in physical memory to put the translation table. The chosen address (actually, only a number of its leftmost bits) has to be put in TTBR0 for the MMU to know where the table lies. Other bits of this register control some memory attributes relevant for accesses to table entries by the MMU.

4. TTBR1, Translation Table Base Register 1 - similar function to TTBR0 (see below for an explanation of the dual TTBR)

5. TTBCR, Translation Table Base Control Register
   Bits of this register control:
   · How TLBs (Translation Lookaside Buffers) are used. TLBs are a mechanism for caching translation table entries.
   · Whether to use an extension feature that changes the lengths of translation table entries and TTBR* to 64 bits (we're not using this, so we won't go into details)
   · How a translation table is selected. There can be 2 translation tables and there are 2 cp15 registers (TTBR0 and TTBR1) to hold their base addresses. When 2 tables are in use, then on each memory access some leftmost bits of the virtual address determine which one should be used.
     If the bits are all 0s, the table pointed to by TTBR0 is used. Otherwise, the table pointed to by TTBR1 is used. This allows an OS developer to use separate translation tables for kernelspace and userspace (e.g. by having the kernelspace code run from virtual addresses whose leftmost bit is 1 and the userspace code run from virtual addresses whose leftmost bit is 0). A field of TTBCR determines how many leftmost bits of the virtual address are used for that (and also affects the TTBR0 format). In the simplest setup (as in our project) this number is 0, so only the table specified in TTBR0 is used.

The translation table consists of 4096 entries, each describing a 1MB memory region. An entry can be of several types:
1. Invalid entry - the corresponding virtual addresses cannot be used
2. Section - description of a mapping of a 1MB memory region
3. Supersection - description of a mapping of a 16MB memory region, which has to be repeated in 16 consecutive table entries (can be used to map to physical addresses higher than 2^32)
4. Page table - no mapping is given yet, but a page table is pointed to. See below.

Besides that, a translation table descriptor also specifies:
1. Access permissions.
2. Other memory attributes (cacheability, shareability).
3. Which domain the memory belongs to.

A page table is something similar to a translation table, but its entries define smaller regions (called, well - pages). When a translation table descriptor pointing to a page table gets used for translation, the entry in that page table (with some middle bits of the virtual address used as an index into it) is fetched and used. This allows for finer granularity of mappings while not requiring page tables to occupy space where small pages are not needed. We can say that 2-level translations are performed. On some versions of ARM, translations can have more levels than here. As of 15.01.2020, page tables and small pages are not used in the project (although programming them is on the TODO list).
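The section lookup described above can be modelled in plain C. This is only a software sketch of what the MMU does in hardware, assuming a single translation table that contains nothing but invalid entries and 1MB sections (no supersections, no page tables, no permission checks). The names section_descriptor and translate are made up for this illustration and are not taken from the project.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT   20u          /* 1MB sections: 2^20 bytes each      */
#define SECTION_MASK    0x000FFFFFu  /* offset of an address in a section  */
#define TABLE_ENTRIES   4096u        /* 2^32 / 2^20 entries cover 4GB      */
#define DESC_TYPE_MASK  0x3u         /* bits [1:0] select the entry type   */
#define DESC_SECTION    0x2u         /* 0b10 = section (simplified model)  */
#define DESC_SUPER_BIT  (1u << 18)   /* set for supersections              */

/* Build a bare section descriptor: physical base plus type bits.
 * All permission/attribute/domain fields are left as 0 here. */
static uint32_t section_descriptor(uint32_t phys_base)
{
    assert((phys_base & SECTION_MASK) == 0);  /* base must be 1MB-aligned */
    return phys_base | DESC_SECTION;
}

/* Translate va using table (TABLE_ENTRIES descriptors).
 * Returns 0 and stores the physical address in *pa on success,
 * -1 if the entry is not a plain section (the real MMU would raise
 * an exception or walk a page table instead). */
static int translate(const uint32_t *table, uint32_t va, uint32_t *pa)
{
    /* The 12 leftmost bits of the virtual address index the table. */
    uint32_t desc = table[va >> SECTION_SHIFT];

    if ((desc & DESC_TYPE_MASK) != DESC_SECTION || (desc & DESC_SUPER_BIT))
        return -1;

    /* Upper 12 bits come from the descriptor, lower 20 from the address. */
    *pa = (desc & ~SECTION_MASK) | (va & SECTION_MASK);
    return 0;
}
```

With entry 0x00A of the table set to section_descriptor(0x00200000), translate() maps 0x00AA0F3C to 0x002A0F3C, which is the redirection example given at the beginning of this text. Note that the lookup is a single array access, which is why no virtual addresses have to be stored in the table.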
Our project uses C bitfield structs for operating on the contents of coprocessor registers and on translation table descriptors. This is an elegant and readable approach, but not very portable across compilers. The current struct definitions are sure to work properly with GCC.

Despite the overwhelming number of configuration options available, most can be left at their defaults, and this is how it's done in this project. Those default settings usually make the MMU behave as in older ARM versions, when some options were not yet available (and hence the entire system was simpler).
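To give an idea of what the bitfield approach looks like, here is a rough sketch of a section descriptor expressed as a bitfield struct. It is not copied from the project: the type and field names are invented for illustration, and the field order assumes the ARMv7-A short-descriptor section layout. It also assumes a little-endian target on which GCC allocates bitfields starting from the least significant bit, which is exactly the compiler-dependent part mentioned above.

```c
#include <stdint.h>

/* Hypothetical names; one field per bit range, lowest bits first. */
typedef struct {
    uint32_t pxn_or_zero : 1;   /* bit 0                                  */
    uint32_t one         : 1;   /* bit 1: must be 1 for a section         */
    uint32_t b           : 1;   /* bit 2: B (bufferable)                  */
    uint32_t c           : 1;   /* bit 3: C (cacheable)                   */
    uint32_t xn          : 1;   /* bit 4: execute-never                   */
    uint32_t domain      : 4;   /* bits 8:5: one of the 16 domains        */
    uint32_t impl        : 1;   /* bit 9: implementation defined          */
    uint32_t ap_1_0      : 2;   /* bits 11:10: access permissions         */
    uint32_t tex         : 3;   /* bits 14:12: TEX                        */
    uint32_t ap_2        : 1;   /* bit 15: access permissions, high bit   */
    uint32_t s           : 1;   /* bit 16: shareable                      */
    uint32_t ng          : 1;   /* bit 17: not global                     */
    uint32_t zero        : 1;   /* bit 18: 0 = section, 1 = supersection  */
    uint32_t ns          : 1;   /* bit 19: non-secure                     */
    uint32_t base        : 12;  /* bits 31:20: physical base, in MB       */
} section_bits_t;

/* The union gives access both to the named fields and to the raw word
 * that actually gets stored in the translation table. */
typedef union {
    uint32_t       raw;
    section_bits_t bits;
} section_desc_t;

/* Build a section descriptor mapping to physical megabyte phys_mb,
 * with the given domain number and AP[1:0] value; the rest stays 0. */
static section_desc_t make_section(uint32_t phys_mb, uint32_t domain,
                                   uint32_t ap)
{
    section_desc_t d = { .raw = 0 };
    d.bits.one    = 1;
    d.bits.domain = domain;
    d.bits.ap_1_0 = ap;
    d.bits.base   = phys_mb;
    return d;
}
```

Under those assumptions, make_section(0x002, 0, 3).raw is the word 0x00200C02: base 0x002 in the top 12 bits, AP[1:0] = 0b11 in bits 11:10 and the section type bit set. A compiler that orders bitfields differently would produce a different word from the same source, which is why this style needs checking when moving away from GCC.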