From 2d54271bd84e256cb79041979ee2918ff10f639f Mon Sep 17 00:00:00 2001
From: vetch
Date: Sun, 19 Jan 2020 21:32:30 +0100
Subject: Add sources, format MMU

---
 MMU-explained.txt | 37 +++++++++++++++---------------
 Sources.txt       | 30 +++++++++++++++++++++++++
 document.md       | 67 ++++++++++++++++++++++++++++++++++++++++---------------
 makeDoc.sh        |  3 ++-
 4 files changed, 100 insertions(+), 37 deletions(-)
 create mode 100644 Sources.txt

diff --git a/MMU-explained.txt b/MMU-explained.txt
index 91d3caa..a26c670 100644
--- a/MMU-explained.txt
+++ b/MMU-explained.txt
@@ -31,25 +31,26 @@ If MMU is present, general configuration of it is done through registers of the
 
 ### Coprocessor 15
 Coprocessor 15 contains several registers, that control the behaviour of the MMU. They are all accessed through mcr and mrc arm instructions.
-1. SCTLR, System Control Register - "provides the top level control of the system, including its memory system"
- Bits of this register control, among other things:
- · whether the MMU is enabled
- · whether data cache is enabled
- · whether instruction cache is enabled
- · whether TEX remap is enabled
- TEX remap is a feature, that changes how some translation table entry bit fields (called C, B and TEX) are used. We're not using TEX remap in our project.
- · whether access flags are enabled
- Enabling access flag causes one translation table descriptor bit normally used to specify access permissions of a region to be used as access flag. We don't use this feature either
-2. DACR, Domain Access Control Register - "defines the access permission for each of the sixteen memory domains"
- Entries in translation table define which of available 16 memory domains a memory region belongs to. Bits of DACR specify what permissions apply to each of the domains. Possible setting are to allow accesses to regions based on settings in translation table descriptor or to allow/disallow all accesses regardless of access permission bits in translation table.
-3. TTBR0, Translation Table Base Register 0 - "holds the base address of translation table 0, and information about the memory it occupies"
- System mode programmer can choose (with respect to some alignment requirements) where in the physical memory to put the translation table. Chosen address (actually, only a number of it's leftmost bits) has to be put in TTBR for the MMU to know where the table lies. Other bits of this register control some memory attributes relevant for accesses to table entries by the MMU
+1. SCTLR, System Control Register - "provides the top level control of the system, including its memory system". Bits of this register control, among other things, whether the following are enabled:
+
+ 1. the MMU
+ 2. data cache
+ 3. instruction cache
+ 4. TEX remap (changes how some translation table entry bit fields, called C, B and TEX, are used; not used in our project)
+ 5. access flag (when enabled, one translation table descriptor bit that normally specifies the access permissions of a region is used as an access flag instead; not used either)
+
+2. DACR, Domain Access Control Register - "defines the access permission for each of the sixteen memory domains". Translation table entries define which of the 16 available memory domains a memory region belongs to. Bits of DACR specify what permissions apply to each of the domains. Possible settings are to allow accesses to regions based on the access permission bits in the translation table descriptor, or to allow/disallow all accesses regardless of those bits.
+
+3. TTBR0, Translation Table Base Register 0 - "holds the base address of translation table 0, and information about the memory it occupies". The system programmer can choose (subject to some alignment requirements) where in physical memory to put the translation table. The chosen address (actually, only some number of its leftmost bits) has to be put in TTBR0 for the MMU to know where the table lies. Other bits of this register control some memory attributes relevant for accesses to table entries by the MMU.
+
 3. TTBR1, Translation Table Base Register 1 - simillar function to TTBR0 (see below for explaination of dual TTBR)
-4. TTBCR, Translation Table Base Control Register
- Bits of this register control
- · How TLBs (Translation Lookaside Buffers) are used. TLBs are a mechanism of caching translation table entries.
- · Whether to use some extension feature, that changes traslation table entries and TTBR* lengths to 64-bit (we're not using this, so we won't go into details)
- · How a translation table is selected. There can be 2 translation tables and there are 2 cp15 registers (TTBR0 and TTBR1) to hold their base addresses. When 2 tables are in use, then on each memory access some leftmost bits of virtual address determine which one should be used. If the bits are all 0s - TTBR0-pointed table is used. Otherwise - TTBR1 is used. This allows OS developer to use separate translation tables for kernelspace and userspace (i.e. by having the kernelspace code run from virtual addresses starting with 1 and userspace code run from virtual addresses starting with 0). A field of TTBCR determines how many leftmost bits of virtual address are used for that (and also affects TTBR0 format). In the simplest setup (as in our project) this number is 0, so only the table specified in TTBR0 is used.
+4. TTBCR, Translation Table Base Control Register, which controls:
+
+ 1. How TLBs (Translation Lookaside Buffers) are used. TLBs are a mechanism for caching translation table entries.
+ 2. Whether to use an extension feature that changes translation table entries and TTBR* registers to 64 bits (we're not using this, so we won't go into details).
+ 3. How a translation table is selected.
+
+There can be 2 translation tables, and there are 2 cp15 registers (TTBR0 and TTBR1) to hold their base addresses. When 2 tables are in use, on each memory access some leftmost bits of the virtual address determine which one is used: if these bits are all 0s, the table pointed to by TTBR0 is used; otherwise, the one pointed to by TTBR1 is used. This allows an OS developer to use separate translation tables for kernelspace and userspace (e.g. by having the kernelspace code run from virtual addresses starting with 1 and the userspace code run from virtual addresses starting with 0). The N field of TTBCR determines how many leftmost bits of the virtual address are used for that (and also affects the TTBR0 format). In the simplest setup (as in our project) this number is 0, so only the table specified in TTBR0 is used.
 
 ### Translation table
 
diff --git a/Sources.txt b/Sources.txt
new file mode 100644
index 0000000..7a48250
--- /dev/null
+++ b/Sources.txt
@@ -0,0 +1,30 @@
+## Problems we faced
+
+Problems we've faced (I mentioned to Metal (is it ok to use that pseudonym? I think it's ok) that I'd include those, and he seemed very enthusiastic about us describing them) (+ some stuff that qualifies as problems is already in HISTORY.md)
+
+* Our ramfs needs to be 4-aligned in memory, but when objcopy creates the embeddable file, it doesn't (at least by default) mark its data section as requiring 2**2 alignment. There has to be .=ALIGN(4) in the linker script before ramfs_embeddable.o, but I forgot about it, which caused the ramfs to misbehave.
+* VERY NAUGHTY PROBLEM · Many sources mention /COMMON/ as the section that contains a specific kind of uninitialized (0-initialized) data. Obviously, it has to be included in the linker script. Unfortunately, gcc names it differently, namely /COM/. This caused our linker script not to include it in the image. Instead, it was placed somewhere after the last section specified in the linker script. This happened to be after our NOLOAD stack section, where the first free MMU section is (which always gets allocated to the first process, which gets its code copied there). Can you imagine sitting for hours in front of radare2, searching for a bug in scheduler.c or PL0_test.c that causes the userspace code to fail with either some kind of weird abort or an undefined instruction, always on the second PL0 instruction!?
+* VERY NAUGHTY PROBLEM · I wanted to make our bootloader and kernel able to run no matter what address they are loaded at (see comment in kernel's stage1 linker script). To achieve that, I added -fPIC to the compilation options of all ARM code. With this, I decided that, instead of embedding code in other code using objcopy, I could just put that code in a separate linker script section with section_start and section_end symbols defined, so that I could copy it to some other address at runtime. I did it and it worked with the interrupt vector and libkernel (see point below). But once I changed EVERYTHING to use linker symbols/sections instead of objcopy embedding, it turned out it didn't really work... and I had to change it back to the old way :( The thing is, -fPIC requires code to be loaded by some OS or bootloader that will fill the global offset table (GOT) with symbol addresses. I knew it's possible to generate bare-metal position-independent code that works without a GOT, but it turned out this is not implemented in gcc (it is in ARM Compiler, but only in 32-bit mode, and who would like to use ARM Compiler anyway). I ended up writing stage1 of both the bootloader and the kernel in careful position-independent assembly, thus achieving my goal (just with a bit more effort).
+* The linker behaves weirdly when section names don't start with .text, .data, etc.
+* Not strictly a problem, but a funny mistake of mine that is worth mentioning... At first I didn't know about the special features of the SUBS pc, lr and LDM rn, {pc}^ instructions. So I would switch to user mode by first branching to code in a PL0-accessible section and having it execute an isb instruction. This worked, but was not good, because code executed by the kernel was in a memory section writable by userspace code. So I separated that into "libkernel", which would be in a PL0-executable but non-writable section and would perform the switch... Well, it did work. Still, I was happy when I learned how to achieve the same with subs/ldm and could remove libkernel, making the project a bit simpler.
+* System mode has a separate stack pointer from supervisor mode, so when going from supervisor to system mode we need to set it... We didn't know that and we were getting weeeird bugs (where changing some little thing in one place would make the bug occur or not occur somewhere completely different); also, it's not allowed (undefined behaviour) to switch from system mode directly to user mode... (at least this didn't cause such weird things to happen...)
+* Both the BCM ARM peripherals manual and the manual of the UART itself say that writing 0s to PL011_UART_IMSC unmasks interrupts; it's the opposite: writing 1s enables specific interrupts and writing 0s disables them. The wiki.osdev code also got it the way it's written in those docs, but this didn't cause problems, since the UART IRQ was not enabled in ARM_ENABLE_IRQS_2 (using #define names from our code), so, as intended, no IRQ was occurring.
+* STILL UNFIXED · The very simple pipe_image program somehow manages to break stdin, so that even other programs run in that same (bash) shell can't read from it... (in zsh, other interactively run commands work ok, but commands following pipe_image inside a shell function still have that problem)
+
+## Sources of Info
+
+* wiki.osdev (good for starting off; we could also (in a polite way) mention the things that were broken there)
+ + getting UART IRQ masking wrong (not really their fault, see above)
+ + zeroing bss even though it was placed in a section that wasn't marked NOLOAD (I still need to verify that)
+ + switch inside an enum!?!?!?
+ + There are some more already mentioned in HISTORY.md
+* ARM Architecture Reference Manual® ARMv7-A and ARMv7-R edition (man, this has 2720 pages! But I think this was the most useful document of all)
+* dwelch67
+* http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html - very good description of atags
+* BCM2835-ARM-Peripherals.pdf and https://elinux.org/BCM2835_datasheet_errata
+* https://buildmedia.readthedocs.org/media/pdf/devicetree-specification/latest/devicetree-specification.pdf - perhaps you've found another source for that... but if not, this seems to be good!
+* online ARM Compiler toolchain Assembler Reference
+* Christina Brook's rpi-open-firmware
+* http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183g/DDI0183G_uart_pl011_r1p5_trm.pdf
+* GNU make documentation
+* description of linker scripts: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION
diff --git a/document.md b/document.md
index e6faca2..4802ac2 100644
--- a/document.md
+++ b/document.md
@@ -300,25 +300,26 @@ If MMU is present, general configuration of it is done through registers of the
 
 ### Coprocessor 15
 Coprocessor 15 contains several registers, that control the behaviour of the MMU. They are all accessed through mcr and mrc arm instructions.
-1. SCTLR, System Control Register - "provides the top level control of the system, including its memory system"
- Bits of this register control, among other things:
- · whether the MMU is enabled
- · whether data cache is enabled
- · whether instruction cache is enabled
- · whether TEX remap is enabled
- TEX remap is a feature, that changes how some translation table entry bit fields (called C, B and TEX) are used. We're not using TEX remap in our project.
- · whether access flags are enabled
- Enabling access flag causes one translation table descriptor bit normally used to specify access permissions of a region to be used as access flag. We don't use this feature either
-2. DACR, Domain Access Control Register - "defines the access permission for each of the sixteen memory domains"
- Entries in translation table define which of available 16 memory domains a memory region belongs to. Bits of DACR specify what permissions apply to each of the domains. Possible setting are to allow accesses to regions based on settings in translation table descriptor or to allow/disallow all accesses regardless of access permission bits in translation table.
-3. TTBR0, Translation Table Base Register 0 - "holds the base address of translation table 0, and information about the memory it occupies"
- System mode programmer can choose (with respect to some alignment requirements) where in the physical memory to put the translation table. Chosen address (actually, only a number of it's leftmost bits) has to be put in TTBR for the MMU to know where the table lies. Other bits of this register control some memory attributes relevant for accesses to table entries by the MMU
+1. SCTLR, System Control Register - "provides the top level control of the system, including its memory system". Bits of this register control, among other things, whether the following are enabled:
+
+ 1. the MMU
+ 2. data cache
+ 3. instruction cache
+ 4. TEX remap (changes how some translation table entry bit fields, called C, B and TEX, are used; not used in our project)
+ 5. access flag (when enabled, one translation table descriptor bit that normally specifies the access permissions of a region is used as an access flag instead; not used either)
+
+2. DACR, Domain Access Control Register - "defines the access permission for each of the sixteen memory domains". Translation table entries define which of the 16 available memory domains a memory region belongs to. Bits of DACR specify what permissions apply to each of the domains. Possible settings are to allow accesses to regions based on the access permission bits in the translation table descriptor, or to allow/disallow all accesses regardless of those bits.
+
+3. TTBR0, Translation Table Base Register 0 - "holds the base address of translation table 0, and information about the memory it occupies". The system programmer can choose (subject to some alignment requirements) where in physical memory to put the translation table. The chosen address (actually, only some number of its leftmost bits) has to be put in TTBR0 for the MMU to know where the table lies. Other bits of this register control some memory attributes relevant for accesses to table entries by the MMU.
+
 3. TTBR1, Translation Table Base Register 1 - simillar function to TTBR0 (see below for explaination of dual TTBR)
-4. TTBCR, Translation Table Base Control Register
- Bits of this register control
- · How TLBs (Translation Lookaside Buffers) are used. TLBs are a mechanism of caching translation table entries.
- · Whether to use some extension feature, that changes traslation table entries and TTBR* lengths to 64-bit (we're not using this, so we won't go into details)
- · How a translation table is selected. There can be 2 translation tables and there are 2 cp15 registers (TTBR0 and TTBR1) to hold their base addresses. When 2 tables are in use, then on each memory access some leftmost bits of virtual address determine which one should be used. If the bits are all 0s - TTBR0-pointed table is used. Otherwise - TTBR1 is used. This allows OS developer to use separate translation tables for kernelspace and userspace (i.e. by having the kernelspace code run from virtual addresses starting with 1 and userspace code run from virtual addresses starting with 0). A field of TTBCR determines how many leftmost bits of virtual address are used for that (and also affects TTBR0 format). In the simplest setup (as in our project) this number is 0, so only the table specified in TTBR0 is used.
+4. TTBCR, Translation Table Base Control Register, which controls:
+
+ 1. How TLBs (Translation Lookaside Buffers) are used. TLBs are a mechanism for caching translation table entries.
+ 2. Whether to use an extension feature that changes translation table entries and TTBR* registers to 64 bits (we're not using this, so we won't go into details).
+ 3. How a translation table is selected.
+
+There can be 2 translation tables, and there are 2 cp15 registers (TTBR0 and TTBR1) to hold their base addresses. When 2 tables are in use, on each memory access some leftmost bits of the virtual address determine which one is used: if these bits are all 0s, the table pointed to by TTBR0 is used; otherwise, the one pointed to by TTBR1 is used. This allows an OS developer to use separate translation tables for kernelspace and userspace (e.g. by having the kernelspace code run from virtual addresses starting with 1 and the userspace code run from virtual addresses starting with 0). The N field of TTBCR determines how many leftmost bits of the virtual address are used for that (and also affects the TTBR0 format). In the simplest setup (as in our project) this number is 0, so only the table specified in TTBR0 is used.
 
 ### Translation table
 
@@ -573,3 +574,33 @@ In src/arm/PL1/kernel/kernel_stage2.ld the physical memory layout of thkernel is
 While src/arm/PL1/kernel/kernel.ld and src/arm/PL1/loader/loader.ld define the starting address, it is irrelevant, as the assembly-written position-independent code for [first stages of loader and kernel](./Boot_explained.txt) does not depend on that address.
 
 At the beginning of this project, we had very little understanding of linker scripts' syntax. [This article](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION) proved useful and allowed us to learn the required parts in a short time. As discussing the entire syntax of linker scripts is beyond the scope of this documentation, we refer the reader to that resource.
+## Problems we faced
+
+Problems we've faced (I mentioned to Metal (is it ok to use that pseudonym? I think it's ok) that I'd include those, and he seemed very enthusiastic about us describing them) (+ some stuff that qualifies as problems is already in HISTORY.md)
+
+* Our ramfs needs to be 4-aligned in memory, but when objcopy creates the embeddable file, it doesn't (at least by default) mark its data section as requiring 2**2 alignment. There has to be .=ALIGN(4) in the linker script before ramfs_embeddable.o, but I forgot about it, which caused the ramfs to misbehave.
+* VERY NAUGHTY PROBLEM · Many sources mention /COMMON/ as the section that contains a specific kind of uninitialized (0-initialized) data. Obviously, it has to be included in the linker script. Unfortunately, gcc names it differently, namely /COM/. This caused our linker script not to include it in the image. Instead, it was placed somewhere after the last section specified in the linker script. This happened to be after our NOLOAD stack section, where the first free MMU section is (which always gets allocated to the first process, which gets its code copied there). Can you imagine sitting for hours in front of radare2, searching for a bug in scheduler.c or PL0_test.c that causes the userspace code to fail with either some kind of weird abort or an undefined instruction, always on the second PL0 instruction!?
+* VERY NAUGHTY PROBLEM · I wanted to make our bootloader and kernel able to run no matter what address they are loaded at (see comment in kernel's stage1 linker script). To achieve that, I added -fPIC to the compilation options of all ARM code. With this, I decided that, instead of embedding code in other code using objcopy, I could just put that code in a separate linker script section with section_start and section_end symbols defined, so that I could copy it to some other address at runtime. I did it and it worked with the interrupt vector and libkernel (see point below). But once I changed EVERYTHING to use linker symbols/sections instead of objcopy embedding, it turned out it didn't really work... and I had to change it back to the old way :( The thing is, -fPIC requires code to be loaded by some OS or bootloader that will fill the global offset table (GOT) with symbol addresses. I knew it's possible to generate bare-metal position-independent code that works without a GOT, but it turned out this is not implemented in gcc (it is in ARM Compiler, but only in 32-bit mode, and who would like to use ARM Compiler anyway). I ended up writing stage1 of both the bootloader and the kernel in careful position-independent assembly, thus achieving my goal (just with a bit more effort).
+* The linker behaves weirdly when section names don't start with .text, .data, etc.
+* Not strictly a problem, but a funny mistake of mine that is worth mentioning... At first I didn't know about the special features of the SUBS pc, lr and LDM rn, {pc}^ instructions. So I would switch to user mode by first branching to code in a PL0-accessible section and having it execute an isb instruction. This worked, but was not good, because code executed by the kernel was in a memory section writable by userspace code. So I separated that into "libkernel", which would be in a PL0-executable but non-writable section and would perform the switch... Well, it did work. Still, I was happy when I learned how to achieve the same with subs/ldm and could remove libkernel, making the project a bit simpler.
+* System mode has a separate stack pointer from supervisor mode, so when going from supervisor to system mode we need to set it... We didn't know that and we were getting weeeird bugs (where changing some little thing in one place would make the bug occur or not occur somewhere completely different); also, it's not allowed (undefined behaviour) to switch from system mode directly to user mode... (at least this didn't cause such weird things to happen...)
+* Both the BCM ARM peripherals manual and the manual of the UART itself say that writing 0s to PL011_UART_IMSC unmasks interrupts; it's the opposite: writing 1s enables specific interrupts and writing 0s disables them. The wiki.osdev code also got it the way it's written in those docs, but this didn't cause problems, since the UART IRQ was not enabled in ARM_ENABLE_IRQS_2 (using #define names from our code), so, as intended, no IRQ was occurring.
+* STILL UNFIXED · The very simple pipe_image program somehow manages to break stdin, so that even other programs run in that same (bash) shell can't read from it... (in zsh, other interactively run commands work ok, but commands following pipe_image inside a shell function still have that problem)
+
+## Sources of Info
+
+* wiki.osdev (good for starting off; we could also (in a polite way) mention the things that were broken there)
+ + getting UART IRQ masking wrong (not really their fault, see above)
+ + zeroing bss even though it was placed in a section that wasn't marked NOLOAD (I still need to verify that)
+ + switch inside an enum!?!?!?
+ + There are some more already mentioned in HISTORY.md
+* ARM Architecture Reference Manual® ARMv7-A and ARMv7-R edition (man, this has 2720 pages! But I think this was the most useful document of all)
+* dwelch67
+* http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html - very good description of atags
+* BCM2835-ARM-Peripherals.pdf and https://elinux.org/BCM2835_datasheet_errata
+* https://buildmedia.readthedocs.org/media/pdf/devicetree-specification/latest/devicetree-specification.pdf - perhaps you've found another source for that... but if not, this seems to be good!
+* online ARM Compiler toolchain Assembler Reference
+* Christina Brook's rpi-open-firmware
+* http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183g/DDI0183G_uart_pl011_r1p5_trm.pdf
+* GNU make documentation
+* description of linker scripts: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/sections.html#OUTPUT-SECTION-DESCRIPTION
diff --git a/makeDoc.sh b/makeDoc.sh
index 556969f..05f7c16 100755
--- a/makeDoc.sh
+++ b/makeDoc.sh
@@ -1,5 +1,6 @@
+#!/usr/bin/env bash
 rm -f document.md
- array=("Building-and-running-explained.txt" "Makefile-explained.txt" "Project-structure-explained.txt" "Boot-explained.txt" "MMU-explained.txt" "PSRs-explained.txt" "Ramfs-explained.txt" "Exception-vector-explained.txt" "IRQ-explained.txt" "processor-modes-explained.txt" "Scheduler-explained.txt" "Linker-scripts-explained.txt")
+ array=("Building-and-running-explained.txt" "Makefile-explained.txt" "Project-structure-explained.txt" "Boot-explained.txt" "MMU-explained.txt" "PSRs-explained.txt" "Ramfs-explained.txt" "Exception-vector-explained.txt" "IRQ-explained.txt" "processor-modes-explained.txt" "Scheduler-explained.txt" "Linker-scripts-explained.txt" "Sources.txt")
 echo "# Raspberry PI MMU project" >> document.md
 echo "//TODO insert [TOC] here" >> document.md
 for file in "${array[@]}"
-- 
cgit v1.2.3
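The MMU section of this patch describes three register rules in prose:

* how TTBCR.N selects between TTBR0 and TTBR1;
* how large the TTBR0 table is;
* how DACR encodes per-domain access.

These rules can be sketched in C. This is a minimal illustration that assumes the ARMv7 short-descriptor translation format. The helper names (ttbr_for_va, ttbr0_table_size, dacr_set_domain, write_dacr, write_ttbcr) are hypothetical and do not come from the project's sources. The mcr wrappers are only compiled when targeting ARM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not from the project's sources), assuming the ARMv7
 * short-descriptor translation table format. */

/* Returns 0 if a table walk for `va` starts from the table pointed to by
 * TTBR0, or 1 if it starts from TTBR1. `ttbcr_n` is the TTBCR.N field (0-7). */
static inline int ttbr_for_va(uint32_t va, unsigned ttbcr_n)
{
    assert(ttbcr_n <= 7);
    if (ttbcr_n == 0)
        return 0; /* N == 0: only TTBR0 is used (the setup in this project) */
    /* The top N bits of the virtual address select the table:
     * all zeros -> TTBR0, anything else -> TTBR1. */
    return (va >> (32u - ttbcr_n)) != 0;
}

/* Size in bytes of the first-level table pointed to by TTBR0:
 * 16 KiB (4096 four-byte entries) when N == 0, halved for every extra bit of N. */
static inline uint32_t ttbr0_table_size(unsigned ttbcr_n)
{
    assert(ttbcr_n <= 7);
    return UINT32_C(16384) >> ttbcr_n;
}

/* DACR holds a 2-bit access field for each of the 16 domains. */
enum domain_access {
    DOMAIN_NO_ACCESS = 0x0, /* any access to the domain faults */
    DOMAIN_CLIENT    = 0x1, /* accesses are checked against the descriptor's AP bits */
    DOMAIN_MANAGER   = 0x3  /* AP bits are ignored, all accesses are allowed */
};

/* Returns `dacr` with the field of `domain` (0-15) replaced by `access`. */
static inline uint32_t dacr_set_domain(uint32_t dacr, unsigned domain,
                                       enum domain_access access)
{
    assert(domain < 16);
    dacr &= ~(UINT32_C(3) << (2 * domain));   /* clear the old 2-bit field */
    dacr |= (uint32_t)access << (2 * domain); /* write the new one */
    return dacr;
}

#if defined(__arm__)
/* cp15 writers: DACR is c3/c0/opc2=0, TTBCR is c2/c0/opc2=2. */
static inline void write_dacr(uint32_t value)
{
    __asm__ volatile("mcr p15, 0, %0, c3, c0, 0" : : "r"(value));
}

static inline void write_ttbcr(uint32_t value)
{
    __asm__ volatile("mcr p15, 0, %0, c2, c0, 2" : : "r"(value));
}
#endif
```

With N = 0, which is the project's setting, ttbr_for_va always returns 0. With N = 1, every virtual address at or above 0x80000000 would be translated through TTBR1, which gives the kernelspace/userspace split described above. In that case the TTBR0 table shrinks to 8 KiB.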