


## About
This repository contains the code for 'Laboratory station based on
programmable logic device for WebAssembly execution evaluation', developed as
my engineering thesis at AGH University of Science and Technology in Kraków,
Poland.

The project uses Verilog HDL. Icarus Verilog is used for simulation and test
benches, while Yosys, arachne-pnr/nextpnr and icestorm are the tools chosen for
synthesis, place and route, and bitstream generation for Olimex's
iCE40HX8K-EVB FPGA board.

## Technical choices
I'm using one of the few FPGAs with a fully libre toolchain. SystemVerilog and
VHDL are not yet (officially) supported in Yosys, so I'm using Verilog-2005.

I'm writing my own stack machine CPU for the job. Another option would be to
run an existing register-based CPU (picorv32?) on the FPGA and interpret Wasm
on it. Although my thesis topic is broad enough to allow that, I didn't go this
way, because:
 - there'd be nothing innovative in this approach,
 - I'd end up mostly copying others' code, ending up with a copy-paster's
   thesis...

I'm using the Wishbone pipelined interconnect between the CPU and the other
components.

The Wasm binary format was not designed for direct execution, so I'm instead
creating a minimal stack machine that allows an almost 1:1 translation of
Wasm code to its own instruction format. I still think it's possible to make
a CPU that would execute Wasm directly - it's just a matter of a bit more
effort.
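
To illustrate what an almost 1:1 translation could look like, here is a minimal
Python sketch. The Wasm opcodes and the LEB128 encoding of immediates are taken
from the Wasm binary format; the target mnemonics (`const`, `local_get`, `add`
and so on) are hypothetical and are not the project's actual instruction set.
The variable-length immediates are one reason the binary format is awkward to
execute directly.

```python
def read_uleb128(code, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = code[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):          # continuation bit clear: last byte
            return result, pos


def read_sleb128(code, pos):
    """Decode a signed LEB128 integer; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = code[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if byte & 0x40:            # sign bit of the last byte is set
                result -= 1 << shift
            return result, pos


# Real Wasm opcodes on the left, invented target mnemonics on the right.
SIMPLE = {0x6A: "add", 0x6B: "sub", 0x6C: "mul", 0x6E: "div_u",
          0x1A: "drop", 0x0B: "end"}


def translate(code):
    """Translate a flat Wasm instruction sequence into (mnemonic, operand) pairs."""
    out, pos = [], 0
    while pos < len(code):
        op = code[pos]
        pos += 1
        if op == 0x41:                 # i32.const with a signed LEB128 immediate
            value, pos = read_sleb128(code, pos)
            out.append(("const", value & 0xFFFFFFFF))
        elif op == 0x20:               # local.get with an unsigned LEB128 index
            index, pos = read_uleb128(code, pos)
            out.append(("local_get", index))
        elif op in SIMPLE:
            out.append((SIMPLE[op], None))
        else:
            raise ValueError(f"unsupported opcode 0x{op:02X}")
    return out


# local.get 0; i32.const 300; i32.add; end
print(translate(bytes([0x20, 0x00, 0x41, 0xAC, 0x02, 0x6A, 0x0B])))
```

Each Wasm instruction maps to exactly one target instruction here; only the
immediates change shape, from variable-length to fixed-width.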

The stack machine is and will be limited. That's why some more complex Wasm
instructions (e.g. 64-bit operations, maybe float operations) have to be
replaced with calls to software routines.
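
As an example of such a software routine, a 64-bit addition can be built from
32-bit operations. The Python sketch below only models the arithmetic; the
function name and the (low word, high word) calling convention are assumptions
made for illustration.

```python
MASK32 = 0xFFFFFFFF


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values given as 32-bit halves; return (lo, hi)."""
    lo = (a_lo + b_lo) & MASK32
    # An unsigned 32-bit addition overflowed exactly when the wrapped
    # result is smaller than one of the operands.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


# 0x00000001_FFFFFFFF + 0x00000000_00000001 = 0x00000002_00000000
print(add64(0xFFFFFFFF, 0x1, 0x1, 0x0))
```

The routine needs only 32-bit addition, masking and an unsigned comparison, all
of which a small stack machine can provide.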

The goal is to write a minimal "bootloader" that translates Wasm to my stack
machine's instructions on-device.

The SPI flash chip on iCE40HX8K-EVB holds 2MB. The configuration stored on it
takes less than 137KB. I'm going to use the remaining memory to store the
actual Wasm code for execution.
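
The resulting flash budget can be worked out as follows. This sketch reads
"2MB" and "137KB" as binary units and places the Wasm image directly after the
137KB reserved for the configuration; both are assumptions, and the name
`WASM_BASE` is hypothetical.

```python
KIB = 1024

FLASH_SIZE = 2 * 1024 * KIB      # 2MB flash, read as 2 MiB
CONFIG_RESERVED = 137 * KIB      # upper bound on the FPGA configuration size

WASM_BASE = CONFIG_RESERVED      # hypothetical: Wasm image starts right after
WASM_SPACE = FLASH_SIZE - WASM_BASE

print(f"Wasm image base:  0x{WASM_BASE:06X}")
print(f"Space for Wasm:   {WASM_SPACE} bytes ({WASM_SPACE // KIB} KiB)")
```

Under these assumptions, roughly 1.9MB of the flash remains available for Wasm
code.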

The initial boot code will be preloaded into embedded RAM (iCE40HX8K has such
a feature and Yosys supports it).

I'm using VGA (640x480@60Hz) with a self-made text mode for communicating with
the outside world. UART is also planned.
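
The numbers behind such a text mode can be sketched in Python. The timing
values are the commonly used ones for 640x480@60Hz with a 25.175 MHz pixel
clock; the 8x16 glyph size and the row-major text buffer layout are assumptions
and may differ from the actual design.

```python
PIXEL_CLOCK_HZ = 25_175_000

# (visible, front porch, sync pulse, back porch), in pixels and in lines
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33

h_total = H_VISIBLE + H_FRONT + H_SYNC + H_BACK   # pixels per scanline
v_total = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   # scanlines per frame
refresh_hz = PIXEL_CLOCK_HZ / (h_total * v_total)

GLYPH_W, GLYPH_H = 8, 16                          # assumed glyph size
COLS = H_VISIBLE // GLYPH_W
ROWS = V_VISIBLE // GLYPH_H


def text_address(x, y):
    """Index of the character cell covering visible pixel (x, y)."""
    return (y // GLYPH_H) * COLS + (x // GLYPH_W)


def glyph_pixel(x, y):
    """Column and row inside the glyph for visible pixel (x, y)."""
    return x % GLYPH_W, y % GLYPH_H


print(h_total, v_total, round(refresh_hz, 2))
print(COLS, ROWS, text_address(639, 479))
```

With power-of-two glyph dimensions, the divisions and remainders reduce to
selecting bit ranges of the pixel counters, which is cheap in hardware.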

I wrote an assembler for my stack machine (tclasm.tcl). The actual assembly
instructions are expressed as Tcl command executions, so we could call it
pseudo-assembly. Before embracing Tcl, I needed a way to express memory
reads and writes for some test benches and created a simple macro assembly
(include/macroasm.vh). I probably should have used Tcl from the beginning...
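
The idea of expressing instructions as commands of a host language can be shown
with a small Python analogue. This is not how tclasm.tcl works internally; the
class, the mnemonics and the label handling are all hypothetical and only
demonstrate the general technique.

```python
class Program:
    """Collects instructions emitted by ordinary method calls."""

    def __init__(self):
        self.instructions = []   # (mnemonic, operand) pairs
        self.labels = {}         # label name -> instruction index

    def label(self, name):
        self.labels[name] = len(self.instructions)

    def _emit(self, mnemonic, operand=None):
        self.instructions.append((mnemonic, operand))

    # Each "assembly instruction" is just a method call (hypothetical set).
    def const(self, value):
        self._emit("const", value)

    def add(self):
        self._emit("add")

    def jump(self, target):
        self._emit("jump", target)

    def resolve(self):
        """Replace label names with instruction indices."""
        return [(m, self.labels[o] if isinstance(o, str) else o)
                for m, o in self.instructions]


p = Program()
p.label("loop")
for value in (1, 2):             # host-language loops act as macros
    p.const(value)
p.add()
p.jump("loop")
print(p.resolve())
```

Because the "assembly" is plain host-language code, loops, variables and
procedures are available as a macro facility for free.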

Everything is done through some (quite sophisticated) Makefiles.

## Project structure
 - Makefile - needs no explanation...
 - Makefile.config - included by Makefile and Makefile.test, defines variables,
   makes it easy to, e.g., change the compiler command
 - Makefile.util - also included by Makefile and Makefile.test, defines things
   that didn't semantically fit into Makefile.config
 - design/ - Verilog sources that will get synthesized for the FPGA (+ some
   other files like initial memory contents)
 - models/ - Verilog modules used in testing
 - tests/ - test benches, each in its own subdirectory, with a Makefile
   including Makefile.test
 - tclasm.tcl - implementation of a simple assembler in terms of Tcl commands
 - include/ - Verilog header files for inclusion
 - tools/ - small C programs
 - COPYING - 0BSD license
 - README.txt - You're reading it

## Project status
I'm well behind schedule with the work (should have had a working prototype in
June...), but I'm working on it.

I made a previous attempt at the problem in July. Work was going extremely
slowly and I felt that my code was really bad. This is also because I hadn't
had any serious hardware design experience before. Now I've started anew. My
current approach is less CISCy. I'm also doing everything in the simulator,
with test benches for every module, and plan to get it running on the FPGA once
the design is able to display something through VGA. That's different from my
previous approach, where I was trying to make something run on the board and
then write tests for it. I'm now determined to use Wishbone, because I believe
it helps me keep the design clean.

My stack machine is currently able to perform some operations, such as memory
accesses, addition, unsigned division and jumps, but it's not yet ready to have
most of Wasm translated to it. At the beginning of September I changed the
design and instruction format and rewrote the stack machine. The current one
can be considered my third approach :p

### Thoughts
It's indeed an interesting project, but from a practical point of view, it's
still going to be more efficient to JIT-compile Wasm on a register-based
architecture... Perhaps it'd be more useful to optimize an existing processor
(OpenRISC, OpenSPARC, RISC-V) for that?