author | vetch <vetch97@gmail.com> | 2020-01-18 08:20:19 +0100 |
---|---|---|
committer | vetch <vetch97@gmail.com> | 2020-01-18 08:20:19 +0100 |
commit | bd7ba048a0b429e2b51b294cf427e87e976fe572 (patch) | |
tree | 4028c728b94de5072a026c20bddda518ebd464ac | |
parent | f540834c09905f848139371005d077577eabdd57 (diff) | |
download | rpi-MMU-example-bd7ba048a0b429e2b51b294cf427e87e976fe572.tar.gz rpi-MMU-example-bd7ba048a0b429e2b51b294cf427e87e976fe572.zip |
Improvement of MMU explanation
-rw-r--r-- | MMU-explained.txt | 57 |
1 files changed, 32 insertions, 25 deletions
diff --git a/MMU-explained.txt b/MMU-explained.txt index e640aaa..ec08efc 100644 --- a/MMU-explained.txt +++ b/MMU-explained.txt @@ -1,23 +1,27 @@ -Here's an explaination of steps we did to enable the MMU and how the MMU works in general. +Here's an explanation of steps we did to enable the MMU and how the MMU works in general. MMU stands for Memory Management Unit. It does 2 important things: 1. It allows programs to use virtual memory addressing. Virtual addresses are translated by the MMU to physical addresses with the help of translation table. 2. It guards against unallowed memory access. Element that only implements this functionality is called MPU (Memory Protection Unit) and is also found in some ARM cores. -Without MMU code executing on a processor sees the memory as it really is. I.e. when it tries to load data from address 0x00AA0F3C it indeed loads data from 0x000A0F3C (of course, this doesn't mean it's address 0x000A0F3C in RAM; RAM can be mapped into the address space in an arbitrary way). MMU can be configured to "redirect" some range of addresses to some other range. Let's assume we configured the MMU to translate address range 0x00A00000 - 0x00B00000 to range 0x00200000 - 0x00300000. Now, code trying to perform operation on address 0x00AA0F3C would have the address transparently translated to 0x002A0F3C, on which the operation would actually take place. -The translation affects all (stack and non-stack) data accesses as well as instruction fetches, hence an entire program can be made to work as if it was running from some memory address, while it in fact runs from a different one! +Without MMU code executing on a processor sees the memory as it really is. + +When it tries to load data from address 0x00AA0F3C it indeed loads data from 0x00AA0F3C. This doesn't mean address 0x00AA0F3C is in RAM: RAM can be mapped into the address space in an arbitrary way. + +MMU can be configured to "redirect" some range of addresses to some other range. 
Let's assume we configured the MMU to translate address range 0x00A00000 - 0x00B00000 to range 0x00200000 - 0x00300000. Now, code trying to perform operation on address 0x00AA0F3C would have the address transparently translated to 0x002A0F3C, on which the operation would actually take place. +The translation affects all (stack and non-stack) data accesses as well as instruction fetches, hence an entire program can be made to work as if it was running from some memory address, while in fact it runs from a different one! The addresses used by program code are referred to as virtual addresses, while addresses actually used by the processor - as physical addresses. This aids operating system's memory management in several ways 1. A program may by compiled to run from some fixed address and the OS is still free to choose any physical location to store that program's code - only a translation of program's required address to that location's address has to be configured. A problem of simultaneous execution of multiple programs compiled for the same address is also avoided in this way. -2. A consecutive memory region might be required by some program. In a scenerio where due to earlier allocations and deallocactions no big enough (no pun intended) consecutive region of physical memory is free, smaller regions can be mapped to become accessible as a single region in virtual address space, thus avoiding the need for defragmentation. +2. A consecutive memory region might be required by some program. For example: due to earlier allocations and deallocactions there isn't a big enough (no pun intended) free consecutive region of physical memory. Smaller regions can be mapped to become accessible as a single region in virtual address space, thus avoiding the need for defragmentation. -A given mapping can be made valid for only one execution mode (i.e. region only accessible from privileged mode) or only certain types of accesses (i.e. 
a memory region can be made non-executable, which guards against accidental jumping there by program code (important for countering buffer-overflow exploits)). An unallowed access triggers a processor exception, which passes control to an appropriate interrupt service routine. +A given mapping can be made valid for only one execution mode (i.e. region only accessible from privileged mode) or only certain types of accesses . A memory region can be made non-executable, which guards against accidental jumping there by program code. That is important for countering buffer-overflow exploits. An unallowed access triggers a processor exception, which passes control to an appropriate interrupt service routine. -In RaspberryPi environments used by us, there are ARMv7-A-compatible processors, which we currently use only in 32-bit mode. Information here is relevant to those systems (there are Pi boards with both older and newer processors, with more or less functionality and features available). +In RaspberryPi environments used by us, there are ARMv7-A compatible processors, which we currently use only in 32-bit mode. Information here is relevant to those systems (there are Pi boards with both older and newer processors, with more or less functionality and features available). -General configuration of the MMU in ARM processors it is present on is done through registers of the appropriate coprocessor (cp15). Translations are managed through translation table. It is an array of 32-bit or 64-bit entries (also called descriptors) describing how their corresponding memory regions should be mapped. A number of leftmost bits of a virtual address constitutes an index into the translation table to be used for translating it. This way no virtual addresses need to be stored in the table and MMU can perform translations in O(1) time. +If MMU is present, general configuration of it is done through registers of the appropriate coprocessor (cp15). 
Translations are managed through translation table. It is an array of 32-bit or 64-bit entries (also called descriptors) describing how their corresponding memory regions should be mapped. A number of leftmost bits of a virtual address constitutes an index into the translation table to be used for translating it. This way no virtual addresses need to be stored in the table and MMU can perform translations in O(1) time. Coprocessor 15 contains several registers, that control the behaviour of the MMU. They are all accessed through mcr and mrc arm instructions. 1. SCTLR, System Control Register - "provides the top level control of the system, including its memory system" @@ -43,35 +47,38 @@ Coprocessor 15 contains several registers, that control the behaviour of the MMU Translation table consists of 4096 entries, each describing a 1MB memory region. An entry can be of several types: 1. Invalid entry - the corresponding virtual addresses can not be used 2. Section - description of a mapping of 1MB memory region -3. Supersection - description of a mapping of 16MB memory region, that has to be repeated 16 times in consecutive memory sections (can be used to map to physical addresses higher than 2^32) +3. Supersection - description of a mapping of 16MB memory region, that has to be repeated 16 times in consecutive memory sections . This can be used to map to physical addresses higher than 2^32. 4. Page table - no mapping is given yet, but a page table is pointed. See below. + Besides, translation table descriptor also specifies: 1. Access permissions. 2. Other memory attributes (cacheability, shareability). -3. which domain the memory belongs to. +3. Which domain the memory belongs to. -Page table is something simillar to translation table, but it's entries define smaller regions (called, well - pages). 
When a translation table descriptor describing a page table gets used for translation, then entry in that page table (with some middle bits of the virtual address used as index into it) is fetched and used. This allows for better granularity of mappings while not requiring the page tables to occupy space if small pages are not needed. We can say, that 2-level translations are performed. On some versions of ARM translations can have more levels than here. This means the MMU might sometimes need to fetch several entries from different level tables to compute the physical address. This is called a translation table walk. +Page table is something simillar to translation table, but it's entries define smaller regions (called, well - pages). When a translation table descriptor describing a page table gets used for translation, then entry in that page table is fetched and used along with some middle bits of the virtual address used as index. This allows for better granularity of mappings, as it doesn't require the page tables to occupy space if small pages are not needed. We could say, that 2-level translations are performed. On some versions of ARM translations can have more levels than that. This means the MMU might sometimes need to fetch several entries from different level tables to compute the physical address. This is called a translation table walk. As of 15.01.2020 page tables and small pages are not used in the project (although programming them is on the TODO list). -Despite the overhelming amount of configuration options available, most can be left with deafults and this is how it's done in this project. Those default settings usually make the MMU behave as in older ARM versions, when some options were not yet available (and hence, the entire system was simpler). +Despite the overhelming amount of configuration options available, most can be left deafult and this is how it's done in this project. 
Those default settings usually make the MMU behave like it did in older ARM versions, when some options were not yet available and hence, the entire system was simpler. -Our project uses C bitfield structs for operating on SCTLR and TTBR contents (with DACR - bit shifts are more appropriate and with TTBCR - our default configuration means just writing 0 to register) and translation table descriptors. This is an elegant and readable approach, yet little-portable across compilers. Current struct definitions are sure to work properly with GCC. +Our project uses C bitfield structs for operating on SCTLR and TTBCR contents and translation table descriptors. With DACR - bit shifts are more appropriate and with TTBCR - our default configuration means we're writing '0' to that register. This is an elegant and readable approach, yet little-portable across compilers. Current struct definitions work properly with GCC. -Structs describing SCTLR, DACR and TTBR are defined in src/arm/PL1/kernel/cp_regs.h, while those describing translation table descriptors - in src/arm/PL1/kernel/translation_table_descriptors.h. +Structs describing SCTLR, DACR and TTBCR are defined in src/arm/PL1/kernel/cp_regs.h. +Structs describing translation table descriptors are defined in src/arm/PL1/kernel/translation_table_descriptors.h. Before the MMU is enabled, all memory is seen as it really is. Therefore, the only feasible way of enabling it is by initially setting the descriptors in translation table to map all addresses (mapping just addresses used by the kernel would be enough) to themselves. It is called a flat map. How setting up a flat map and turning on the MMU and management of memory sections is done in our project: -1. Translation table is defined in the linker script src/arm/PL1/kernel/kernel_stage2.ld as a NOLOAD section. C code gets the table's start and end addresses from smbols defined in that linker script (see arm/PL1/kernel/memory.h). -2. 
Function setup_flat_map() defined in arm/PL1/kernel/paging.c enables MMU with a flat map. It prints relevant information to uart while performing the following operations: - · In a loop writes all descriptors to the translation table, setting them as sections, accessible from PL1 only, belonging to domain 0. - · Sets DACR to allow domain 0 memory accesses based on translation table descriptor permissions and block accesses to other domains (only domain 0 is used in this project). - · Makes sure TEX remap, access flag, caches and the MMU are disabled in SCTLR. Disabling some of them might be unnecessary, because MMU is assumend to be disabled on the start and enabled caches might cause no problems as long as only flat map is used. Still, the way it is done right now is known to work well and optimizations are not needed. - · Clears all caches and TLBs (again, it is suspected that at some of this is unnecessary). - · Writes TTBCR setting, that causes only one, 32-bit translation table to be used. - · Makes TTBR0 point to the start of translation table. Rest of attributes in TTBR0 (concerning how table entries are being accessed) are left as 0s (defaults). - · Enables the MMU and caches by setting the appropriate bits in SCTLR. -After some cp15 register writes, the isb assembly instruction is used, which causes ARM core to wait until changes take effect (otherwise some later instructions could possibly be executed before this happens). - -In arm/PL1/kernel/paging.c the function claim_and_map_section() can be used to modify an entry in translation table to create a new mapping. Memory allocation also done in that source file uses some lists to describe free and taken sections and has nothing to do with with the MMU. +1. Translation table is defined in the linker script src/arm/PL1/kernel/kernel_stage2.ld as a NOLOAD section. C code gets the table's start and end addresses from symbols defined in that linker script (see arm/PL1/kernel/memory.h). +2. 
Function setup_flat_map() defined in arm/PL1/kernel/paging.c enables MMU with a flat map. It prints relevant information to uart while performing the following procedure: +2.1. In a loop write all descriptors to the translation table, set them as sections, accessible from PL1 only, belonging to domain 0. +2.2. Set DACR to allow domain 0 memory accesses, based on translation table descriptor permissions and block accesses to other domains, as only domain 0 is used in this project. +2.3. Make sure TEX remap, access flag, caches and the MMU are disabled in SCTLR. Disabling some of them might be unnecessary, because MMU is assumed to be disabled from the start and enabled caches might cause no problems as long as only flat map is used. Still, the way it is done right now is known to work well and optimizations are not needed. +2.4. Clear all caches and TLBs (again, it is suspected that some of this is unnecessary). +2.5. Write TTBCR setting such that only 32-bit translation table is used. +2.6. Make TTBR0 point to the start of translation table. Rest of attributes in TTBR0 (concerning how table entries are being accessed) are left as 0s (defaults). +2.7. Enable the MMU and caches by setting the appropriate bits in SCTLR. + +After some cp15 register writes, the isb assembly instruction is used, which causes ARM core to wait until changes take effect. This is done to prevent some later instructions from being executed before the changes are applied. + +In arm/PL1/kernel/paging.c the function claim_and_map_section() can be used to modify an entry in translation table to create a new mapping. Memory allocation also done in that source file uses some lists to describe free and taken sections, but has nothing to do with with the MMU. |
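The section translation and flat map described in the patched text can be sketched as a small host-side model in C. This is only an illustration under stated assumptions, not the code from arm/PL1/kernel/paging.c: every name below (table, make_section, fill_flat_map, map_section, translate) is hypothetical, the table is an ordinary array rather than the one placed by the linker script, and only the section base address (bits 31:20) and the descriptor type (bits 1:0) are modelled. Access permissions, domains, memory attributes, supersections and page tables are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SHIFT  20u          /* each first-level entry covers 1MB */
#define TABLE_ENTRIES  4096u        /* 4096 entries * 1MB = 4GB of virtual space */
#define SECTION_MASK   0xFFF00000u  /* bits [31:20]: section base address */
#define OFFSET_MASK    0x000FFFFFu  /* bits [19:0]: offset inside the section */
#define DESC_TYPE_MASK 0x3u         /* bits [1:0]: descriptor type */
#define DESC_SECTION   0x2u         /* 0b10 marks a section entry */

/* Hypothetical stand-in for the real translation table. */
static uint32_t table[TABLE_ENTRIES];

/* Build a section descriptor; all other attribute bits are left as 0. */
uint32_t make_section(uint32_t phys_base)
{
    assert((phys_base & OFFSET_MASK) == 0); /* base must be 1MB-aligned */
    return phys_base | DESC_SECTION;
}

/* Flat map: every virtual section points at the same physical section. */
void fill_flat_map(void)
{
    for (uint32_t i = 0; i < TABLE_ENTRIES; i++)
        table[i] = make_section(i << SECTION_SHIFT);
}

/* Redirect one 1MB virtual section to another physical section. */
void map_section(uint32_t virt_base, uint32_t phys_base)
{
    assert((virt_base & OFFSET_MASK) == 0);
    table[virt_base >> SECTION_SHIFT] = make_section(phys_base);
}

/*
 * Software model of the lookup: the top 12 bits of the virtual address
 * index the table, the low 20 bits are carried over unchanged.
 * Returns false for anything that is not a section entry
 * (a real MMU would raise a fault for an invalid entry).
 */
bool translate(uint32_t virt, uint32_t *phys)
{
    uint32_t desc = table[virt >> SECTION_SHIFT];

    if ((desc & DESC_TYPE_MASK) != DESC_SECTION)
        return false;

    *phys = (desc & SECTION_MASK) | (virt & OFFSET_MASK);
    return true;
}
```

With the flat map filled in, translate() returns every address unchanged. After map_section(0x00A00000, 0x00200000), address 0x00AA0F3C from the example in the text translates to 0x002A0F3C. The index is taken directly from the address bits, which is why no virtual addresses need to be stored in the table and the lookup takes constant time.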