Merge branch 'bob' of https://repo.or.cz/RPi-MMU-example into alice

# Conflicts:
#	src/arm/PL1/kernel/interrupts.c
author: vetch <vetch97@gmail.com> 2020-01-17 20:49:32 +0100
committer: vetch <vetch97@gmail.com> 2020-01-17 20:49:32 +0100
commit: dfd6177fea6769a0e7dcd2d2205e5a795bba3553 (patch)
tree: e11cd7794da7f283f177e36bbb906b3111d51883
parent: c0ce14c3fd8b598fceacdf0a194420f8acd924bf (diff)
parent: 9ed55d7612be0ffd17e3e9cc08bea7225470ee67 (diff)
download: rpi-MMU-example-dfd6177fea6769a0e7dcd2d2205e5a795bba3553.tar.gz
rpi-MMU-example-dfd6177fea6769a0e7dcd2d2205e5a795bba3553.zip
6 files changed, 90 insertions, 31 deletions
diff --git a/MMU-explained.txt b/MMU-explained.txt
index af350ff..e640aaa 100644
--- a/MMU-explained.txt
+++ b/MMU-explained.txt
@@ -14,5 +14,64 @@ This aids operating system's memory management in several ways
 
 A given mapping can be made valid for only one execution mode (i.e. region only accessible from privileged mode) or only certain types of accesses (i.e. a memory region can be made non-executable, which guards against accidental jumping there by program code (important for countering buffer-overflow exploits)). An unallowed access triggers a processor exception, which passes control to an appropriate interrupt service routine.
 
-General configuration of the MMU in ARM processors it is present on is done through registers of the appropriate coprocessor (cp15). Translations are managed through translation table. It is an array of 32-bit or 64-bit entries describing how their corresponding memory regions should be mapped. A number of leftmost bits of a virtual address constitutes an index into the translation table to be used for translating it. This way no virtual addresses need to be stored in the table and MMU can perform translations in O(1) time.
 
+In the Raspberry Pi environments we use, the processors are ARMv7-A-compatible and we currently run them only in 32-bit mode. The information here applies to those systems (there are Pi boards with both older and newer processors, which offer fewer or more features).
+
+In ARM processors that have an MMU, its general configuration is done through registers of the appropriate coprocessor (cp15). Translations are managed through a translation table. It is an array of 32-bit or 64-bit entries (also called descriptors) describing how their corresponding memory regions should be mapped. A number of leftmost bits of a virtual address constitutes an index into the translation table, selecting the entry used to translate that address. This way no virtual addresses need to be stored in the table and the MMU can perform translations in O(1) time.
+
+Coprocessor 15 contains several registers that control the behaviour of the MMU. They are all accessed through the mcr and mrc ARM instructions.
+1. SCTLR, System Control Register - "provides the top level control of the system, including its memory system"
+   Bits of this register control, among other things:
+      · whether the MMU is enabled
+      · whether data cache is enabled
+      · whether instruction cache is enabled
+      · whether TEX remap is enabled
+         TEX remap is a feature that changes how some translation table entry bit fields (called C, B and TEX) are used. We're not using TEX remap in our project.
+      · whether access flags are enabled
+         Enabling the access flag causes one translation table descriptor bit, normally used to specify access permissions of a region, to be used as an access flag. We don't use this feature either.
+2. DACR, Domain Access Control Register - "defines the access permission for each of the sixteen memory domains"
+   Entries in the translation table define which of the 16 available memory domains a memory region belongs to. Bits of DACR specify what permissions apply to each of the domains. The possible settings are to allow accesses to regions based on the settings in the translation table descriptor, or to allow/disallow all accesses regardless of the access permission bits in the translation table.
+3. TTBR0, Translation Table Base Register 0 - "holds the base address of translation table 0, and information about the memory it occupies"
+   A system mode programmer can choose (subject to some alignment requirements) where in physical memory to put the translation table. The chosen address (actually, only a number of its leftmost bits) has to be put in TTBR0 for the MMU to know where the table lies. Other bits of this register control some memory attributes relevant for accesses to table entries by the MMU.
+4. TTBR1, Translation Table Base Register 1 - similar function to TTBR0 (see below for an explanation of the dual TTBRs)
+5. TTBCR, Translation Table Base Control Register
+   Bits of this register control
+      · How TLBs (Translation Lookaside Buffers) are used. TLBs are a mechanism of caching translation table entries.
+      · Whether to use an extension feature that changes translation table entries and TTBR* lengths to 64-bit (we're not using this, so we won't go into details)
+      · How a translation table is selected. There can be 2 translation tables and there are 2 cp15 registers (TTBR0 and TTBR1) to hold their base addresses. When 2 tables are in use, then on each memory access some leftmost bits of the virtual address determine which one should be used. If the bits are all 0s, the table pointed to by TTBR0 is used. Otherwise, the one pointed to by TTBR1 is used. This allows an OS developer to use separate translation tables for kernelspace and userspace (e.g. by having the kernelspace code run from virtual addresses starting with 1 and userspace code run from virtual addresses starting with 0). A field of TTBCR determines how many leftmost bits of the virtual address are used for that (and also affects the TTBR0 format). In the simplest setup (as in our project) this number is 0, so only the table specified in TTBR0 is used.
+
+The translation table consists of 4096 entries, each describing a 1MB memory region. An entry can be of several types:
+1. Invalid entry - the corresponding virtual addresses cannot be used
+2. Section - description of a mapping of a 1MB memory region
+3. Supersection - description of a mapping of a 16MB memory region, which has to be repeated in 16 consecutive table entries (can be used to map to physical addresses higher than 2^32)
+4. Page table - no mapping is given yet, but a page table is pointed to. See below.
+Besides that, a translation table descriptor also specifies:
+1. Access permissions.
+2. Other memory attributes (cacheability, shareability).
+3. Which domain the memory belongs to.
+
+A page table is similar to the translation table, but its entries define smaller regions (called, well, pages). When a translation table descriptor describing a page table gets used for translation, an entry in that page table (with some middle bits of the virtual address used as an index into it) is fetched and used. This allows for finer granularity of mappings while not requiring the page tables to occupy space if small pages are not needed. We can say that 2-level translations are performed. On some versions of ARM, translations can have more levels than here. This means the MMU might sometimes need to fetch several entries from tables of different levels to compute the physical address. This is called a translation table walk.
+
+As of 15.01.2020 page tables and small pages are not used in the project (although programming them is on the TODO list).
+
+Despite the overwhelming number of configuration options available, most can be left at their defaults, and this is how it's done in this project. Those default settings usually make the MMU behave as in older ARM versions, when some options were not yet available (and hence the entire system was simpler).
+
+Our project uses C bitfield structs for operating on the contents of SCTLR and TTBR and on translation table descriptors (for DACR, bit shifts are more appropriate, and for TTBCR, our default configuration means just writing 0 to the register). This is an elegant and readable approach, yet not very portable across compilers. The current struct definitions are sure to work properly with GCC.
+
+Structs describing SCTLR, DACR and TTBR are defined in src/arm/PL1/kernel/cp_regs.h, while those describing translation table descriptors - in src/arm/PL1/kernel/translation_table_descriptors.h.
+
+Before the MMU is enabled, all memory is seen as it really is. Therefore, the only feasible way of enabling it is by initially setting the descriptors in translation table to map all addresses (mapping just addresses used by the kernel would be enough) to themselves. It is called a flat map.
+
+How setting up a flat map, turning on the MMU and managing memory sections is done in our project:
+1. The translation table is defined in the linker script src/arm/PL1/kernel/kernel_stage2.ld as a NOLOAD section. C code gets the table's start and end addresses from symbols defined in that linker script (see arm/PL1/kernel/memory.h).
+2. Function setup_flat_map() defined in arm/PL1/kernel/paging.c enables the MMU with a flat map. It prints relevant information to the UART while performing the following operations:
+   · In a loop writes all descriptors to the translation table, setting them as sections, accessible from PL1 only, belonging to domain 0.
+   · Sets DACR to allow domain 0 memory accesses based on translation table descriptor permissions and block accesses to other domains (only domain 0 is used in this project).
+   · Makes sure TEX remap, access flag, caches and the MMU are disabled in SCTLR. Disabling some of them might be unnecessary, because the MMU is assumed to be disabled at the start and enabled caches might cause no problems as long as only a flat map is used. Still, the way it is done right now is known to work well and optimizations are not needed.
+   · Clears all caches and TLBs (again, it is suspected that at least some of this is unnecessary).
+   · Writes TTBCR setting, that causes only one, 32-bit translation table to be used.
+   · Makes TTBR0 point to the start of translation table. Rest of attributes in TTBR0 (concerning how table entries are being accessed) are left as 0s (defaults).
+   · Enables the MMU and caches by setting the appropriate bits in SCTLR.
+After some cp15 register writes, the isb assembly instruction is used, which causes ARM core to wait until changes take effect (otherwise some later instructions could possibly be executed before this happens).
+
+In arm/PL1/kernel/paging.c the function claim_and_map_section() can be used to modify an entry in the translation table to create a new mapping. Memory allocation, also done in that source file, uses some lists to describe free and taken sections and has nothing to do with the MMU.
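The flat-map arithmetic described in MMU-explained.txt can be tried out on any host with the sketch below. It is not the project's code (which uses bitfield structs from translation_table_descriptors.h); all function and macro names here are hypothetical, and the bit positions are those of the ARMv7-A short-descriptor section format, with TEX, C and B left as 0.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20u    /* each first-level entry covers 1MB = 2^20 bytes */
#define TABLE_ENTRIES 4096u  /* 4096 * 1MB = 4GB of virtual address space */

/* Index of the translation table entry used for a virtual address:
 * its 12 leftmost bits. (Hypothetical helper, not from the project.) */
static inline uint32_t section_index(uint32_t va)
{
  return va >> SECTION_SHIFT;
}

/* Section descriptor mapping the index-th megabyte to itself (flat map):
 *   bits [31:20] - physical base address of the section
 *   bits [11:10] - AP[1:0] = 0b01, with AP[2] (bit 15) = 0: PL1 only
 *   bits [8:5]   - domain = 0
 *   bits [1:0]   - 0b10: this entry is a section */
static inline uint32_t flat_section_descriptor(uint32_t index)
{
  assert(index < TABLE_ENTRIES);
  return (index << SECTION_SHIFT) | (1u << 10) | 0x2u;
}

/* What the MMU computes when the selected entry is a section:
 * base address from the descriptor, offset from the virtual address. */
static inline uint32_t translate_section(const uint32_t *table, uint32_t va)
{
  uint32_t descriptor = table[section_index(va)];
  return (descriptor & 0xFFF00000u) | (va & 0x000FFFFFu);
}

/* DACR holds 2 bits per domain: 0b00 - no access, 0b01 - client
 * (descriptor permissions are checked), 0b11 - manager (no checks).
 * Returns a value that makes one domain a client and blocks the rest. */
static inline uint32_t dacr_client_only(unsigned domain)
{
  assert(domain < 16);
  return 1u << (2 * domain);
}
```

With these assumptions, filling all 4096 entries with flat_section_descriptor(i) makes translate_section() return every address unchanged, and dacr_client_only(0) gives the value 1 for DACR.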
diff --git a/Makefile b/Makefile
index e9d9852..66f3b0a 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,6 @@
 # actual recipes for everything are in build/Makefile;
 
 % :
-	echo generic
 	$(MAKE) -C build $@
 
 # below is just for shell auto-completion
diff --git a/Makefile-explained.txt b/Makefile-explained.txt
new file mode 100644
index 0000000..c199059
--- /dev/null
+++ b/Makefile-explained.txt
@@ -0,0 +1,13 @@
+Our project contains 2 Makefiles: one in its root directory and one in build/. The reason is that a Makefile can simply, elegantly and efficiently produce files in the directory where it resides, but producing files in any other directory requires that directory to be specified in many rules across the Makefile and in general complicates things. Also, a problem arises when trying to link objects from outside the current directory. If an object is referenced by name in a linker script (which is a frequent practice in our scripts) and is passed to gcc with a path, then it would also need to appear with that path in the linker script.
+Because of that, there is a Makefile in build/ that produces files into its own directory, and the Makefile in the project's root is used as a proxy to it - it calls make recursively in build/ with the same target it was called with.
+
+From now on only Makefile in build/ will be discussed.
+
+In the Makefile, variables with the names of certain tools and their command line flags are defined (using ?= assignment, which only sets a variable if it does not already have a value, so one can supply their own value, for example from the environment). In case a cross-compiler with a different triple should be used, ARM_BASE, normally set to arm-none-eabi, can be set to something like arm-linux-gnueabi or even /usr/local/bin/arm-none-eabi.
+
+All variables discussed below are defined using := assignment, which causes them to only be evaluated once instead of on every reference to them.
+
+Objects that should be linked together to create each of the .elf files are listed in their respective variables. E.g. objects to be used for creating kernel_stage2.elf are all listed in KERNEL_STAGE2_OBJECTS. When adding a new source file to the kernel, it is enough to add its respective .o file to that list to make it compile and link properly. No other Makefile modifications are needed.
+In a similar fashion, the RAMFS_FILES variable specifies files that should be put in the ramfs image that will be embedded in the kernel. Adding another file only requires listing it there. However, if the file is to be found somewhere other than build/, it might be useful to use the vpath directive to tell make where to look for it.
+
+Variables dirs and dirs_colon are defined to 
diff --git a/TODOs b/TODOs
index ebfafc5..dd7707e 100644
--- a/TODOs
+++ b/TODOs
@@ -54,6 +54,10 @@ high priority TODOs are higher; low priority ones and completed ones are lower;
 
 * write some procedures for dumping registers and other stuff (for use in debugging); maybe print registers' contents on data/prefetch abort?
 
+* Memory regions can be configured as one of several types, which affects how memory reads/writes are performed by the processor. Dig into that and use the most appropriate settings in paging.c (e.g. normal memory instead of strongly-ordered memory for RAM).
+
+* In the Makefile: is ?= the right assignment for, say, CFLAGS?
+
 * partially DONE - one can always add more, but we have the most important stuff * Implement some basic utilities for us to use (memcpy, printf, etc...)
 
 * partially DONE - svc works; once we implement processes we could also kill them on aborts * develop userspace process supervision (handling of interrupt caused by svc instruction, proper handling of other data abort, undefined instruction, etc.)
diff --git a/src/arm/PL1/kernel/interrupts.c b/src/arm/PL1/kernel/interrupts.c
index 2c3c752..5695e6f 100644
--- a/src/arm/PL1/kernel/interrupts.c
+++ b/src/arm/PL1/kernel/interrupts.c
@@ -4,6 +4,7 @@
 #include "armclock.h"
 #include "scheduler.h"
 
+// defined in setup.c
 void __attribute__((noreturn)) setup(void);
 
 // from what I've heard, reset is never used on the Pi;
@@ -101,29 +102,3 @@ void fiq_handler(void)
 {
   error("fiq happened");
 }
-
-
-/* Old, not sure if working interrupt function */
-//void
-//__attribute__((interrupt("IRQ")))
-//__attribute__((section(".interrupt_vectors.text")))
-//irq_handler2(void) {
-////    uart_puts("GOT INTERRUPT!\r\n");
-//
-//    local_timer_clr_reload_reg_t temp = { .IntClear = 1, .Reload = 1 };
-//    QA7->TimerClearReload  = temp;									// Clear interrupt & reload
-//}
-
-//int enable_timer(void) {
-//
-//    QA7->TimerRouting.Routing = LOCALTIMER_TO_CORE0_IRQ;			// Route local timer IRQ to Core0
-//    QA7->TimerControlStatus.ReloadValue = 100;						// Timer period set
-//    QA7->TimerControlStatus.TimerEnable = 1;						// Timer enabled
-//    QA7->TimerControlStatus.IntEnable = 1;							// Timer IRQ enabled
-//    QA7->TimerClearReload.IntClear = 1;								// Clear interrupt
-//    QA7->TimerClearReload.Reload = 1;								// Reload now
-//    QA7->Core0TimerIntControl.nCNTPNSIRQ_IRQ = 1;					// We are in NS EL1 so enable IRQ to core0 that level
-//    QA7->Core0TimerIntControl.nCNTPNSIRQ_FIQ = 0;					// Make sure FIQ is zero
-////    uart_puts("Enabled Timer\r\n");
-//    return(0);
-//}
-\ No newline at end of file
diff --git a/src/arm/PL1/kernel/paging.c b/src/arm/PL1/kernel/paging.c
index 771c681..6da9905 100644
--- a/src/arm/PL1/kernel/paging.c
+++ b/src/arm/PL1/kernel/paging.c
@@ -101,10 +101,11 @@ void setup_flat_map(void)
   // enable MMU
   puts("enabling the MMU");
 
-  // redundant - we already have SCTLR contents in the variable
-  // asm("mrc p15, 0, %0, c1, c0, 0" : "=r" (SCTLR.raw));
+  // we already have SCTLR contents in the variable
 
-  SCTLR.fields.M = 1;
+  SCTLR.fields.M = 1; // enable MMU
+  SCTLR.fields.C = 1; // enable data cache
+  SCTLR.fields.I = 1; // enable instruction cache
 
   asm("mcr p15, 0, %0, c1, c0, 0\n\r"
       "isb" :: "r" (SCTLR.raw) : "memory");
@@ -241,6 +242,14 @@ uint16_t claim_and_map_section
   // write modified descriptor to the table
   *section_entry = descriptor;
   
+  // invalidate instruction cache
+  asm("mcr p15, 0, r0, c7, c5, 0\n\r" // r0 gets ignored
+      "isb" ::: "memory");
+
+  // invalidate branch-prediction
+  asm("mcr p15, 0, r0, c7, c5, 6\n\r" // r0 - same as above
+      "isb" ::: "memory");
+
   // invalidate main Translation Lookaside Buffer
   asm("mcr p15, 0, r1, c8, c7, 0\n\r"
       "isb" ::: "memory");
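The paging.c hunk above sets the M, C and I fields of SCTLR through the project's bitfield struct. As a rough, compiler-independent illustration of the same update, the sketch below uses plain masks instead. The macro and function names are hypothetical and not part of the project; the bit positions (M = bit 0, C = bit 2, I = bit 12) are those of the ARMv7-A SCTLR.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Returns the given SCTLR value with the MMU and both caches enabled,
 * leaving all other bits untouched. (Hypothetical helper.) */
static inline uint32_t sctlr_enable_mmu_and_caches(uint32_t sctlr)
{
  return sctlr | SCTLR_M | SCTLR_C | SCTLR_I;
}

/* Example: starting from all bits clear, exactly bits 0, 2 and 12 get set. */
static inline void sctlr_example(void)
{
  assert(sctlr_enable_mmu_and_caches(0) == 0x1005u);
}
```

On the target, the resulting value would still have to be written with the mcr instruction followed by isb, as in the hunk above; the sketch only covers computing the value.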