Documentation of J1 Versions.
There are many different versions of the J1 processor. Here I will introduce them.
The J1 processor is a stripped-down 16-bit stack machine designed to be very small and very fast. Its goal was to copy bits from a camera into UDP packets, and that is all it had to do: no interrupts, no lights, nothing else. It is fewer than 200 lines of Verilog, and quite understandable. It did, however, require dual-port RAM: an instruction is read every clock cycle, and on average every 10th clock cycle also reads or writes application data. Otherwise the J1 by itself was not very useful, so James Bowman released the J1a, J1b, and J4, and then Mecrisp-Ice released a number of further versions.
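To make the instruction format concrete, here is a minimal Python sketch that classifies a 16-bit J1 instruction word by its top bits, following the layout in the original J1 paper: bit 15 set means a 15-bit literal; otherwise bits 15:13 select jump, conditional jump, call or ALU. The function name `decode` is hypothetical, and the ALU sub-fields are deliberately left undecoded because they differ between variants.

```python
from typing import Tuple

# Instruction classes selected by bits 15:13 when bit 15 is clear.
KINDS = {0b000: "jump", 0b001: "cond_jump", 0b010: "call", 0b011: "alu"}


def decode(insn: int) -> Tuple[str, int]:
    """Return (kind, operand) for a 16-bit J1 instruction word."""
    if not 0 <= insn <= 0xFFFF:
        raise ValueError("J1 instructions are 16 bits wide")
    if insn & 0x8000:
        # Bit 15 set: literal; the low 15 bits are pushed onto the data stack.
        return ("literal", insn & 0x7FFF)
    # Bit 15 clear: bits 15:13 are 0xx, so insn >> 13 is in 0..3.
    kind = KINDS[insn >> 13]
    # For jump/cond_jump/call the low 13 bits are a word address;
    # for ALU instructions they are the raw (variant-specific) ALU fields.
    return (kind, insn & 0x1FFF)
```

For example, `decode(0x4010)` returns `('call', 16)`.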
From https://github.com/jamesbowman/swapforth/issues/74:
Pseudo dual-port SRAMs can read from one address whilst writing to a different address, but each of the two ports is dedicated to either reading or writing. With true dual-port SRAM, each of the ports can read OR write. It's possible to 'emulate' a true dual-port SRAM with only pseudo dual-port blocks, but it costs performance, since you need to clock the SRAM twice as fast to do that. iCE40-architecture FPGAs only have pseudo dual-port embedded RAM blocks, and the j1a was written to run on an iCE40HX1K chip. So to accommodate memory accesses other than just reading the next instruction, the j1a core has to have an 'alternate' mode, which is done by setting […] You can see […] Of course, this then means there is no need for opcode […] Instead, that opcode can be used for a 'minus' op in the j1a, which would otherwise require […]
The other difference can be seen if you […] pc[12] isn't actually used to address the SRAM: the SRAM is generated in j1a/mkrom.py so that its initial contents can be set at FPGA compile time, which means the FPGA also bootstraps the core at configuration time. (This isn't so necessary now that the IceStorm tools can just replace SRAM contents without a recompile, but they couldn't do that back when the j1a was written, and it's a neat way to make the FPGA configuration logic do your SoC core's bootstrapping too.) Which is to say that […] The highest RAM fetch address bit the design uses is […]
It's a little confusing IMHO, but 'din' in ram.v is the connection carrying data from the RAM to the core, and vice versa for 'dout'.
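The cost of the 'alternate' mode can be illustrated with a toy cycle model. This is a hypothetical sketch, not the j1a's actual logic or encoding: instructions are Python tuples sharing one memory list with data, and the memory is assumed to serve exactly one read per cycle (the single read port) plus at most one write (the separate write port). A load therefore has to steal the next cycle's read slot, while a store rides on the write port.

```python
from typing import List, Tuple, Union

# Toy memory word: instructions are tuples, data words are ints (hypothetical ISA).
Word = Union[int, Tuple]


def run(mem: List[Word]) -> Tuple[List[int], int]:
    """Run a toy stack machine on a pseudo dual-port memory.

    Each cycle performs exactly one read on the single read port,
    plus at most one write on the write port. Returns (stack, cycles).
    """
    stack: List[int] = []
    pc, cycles = 0, 0
    pending_fetch = None  # data address to read in the alternate cycle
    while True:
        cycles += 1
        if pending_fetch is not None:
            # Alternate mode: the read port returns data, not an instruction.
            stack.append(mem[pending_fetch])
            pending_fetch = None
            continue
        insn = mem[pc]  # normal mode: the read port fetches an instruction
        pc += 1
        op = insn[0]
        if op == "lit":
            stack.append(insn[1])
        elif op == "fetch":
            # The read port is busy with this instruction, so the data
            # read has to wait for the next cycle.
            pending_fetch = stack.pop()
        elif op == "store":
            addr = stack.pop()
            mem[addr] = stack.pop()  # write port: same cycle, no penalty
        elif op == "halt":
            return stack, cycles
        else:
            raise ValueError(f"unknown op {op!r}")
```

For example, `run([("lit", 42), ("lit", 7), ("store",), ("lit", 7), ("fetch",), ("halt",), 0, 0])` returns `([42], 7)`: six instructions plus one extra cycle for the fetch.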
Another thing that makes the J1 very fast: note that the top of stack […] It makes one realize that […] Stack movements are just encoded as two-bit signed integers in the ALU opcode format, one for the return stack and one for the data stack, although […] You could in principle have opcodes that replace any number of stack items; you'd just rearrange the core so that the top few logical stack items are also registers, like […]
This makes the J1 design pretty interesting for custom FPGA SoC use, IMHO. Of course, in practice I've found it much easier to extend the I/O section (in icestorm/j1a.v) so that 'accelerator' units can simply be hooked up, adding them to the design as needed. The only 'deep' core modding I did was the j4a, which is, in a sense, 4x j(1/4)a: it has 4x the context and 'looks' like a quarter-speed j1a to the code... until you put the other 'cores' to work (they're logical only; the ALU, SRAM and I/O are all shared). Mainly it just has funky 'stack' modules, with a little bit of pipelining and tuning. It's probably got a bug, but it has mostly worked out pretty well for me. It lets me run multiple dumb spin-loop bit-bang I/O routines to control and talk to different chips at different rhythms without any interlocks or glitches. There is a maximum of 4 'threads', but that is heaps for simple things like a PID controller. A nice consequence is that you can have a spin-loop-based app running and still talk to swapforth over RS-232 to get and set variables in SRAM without any timing changes. You can even actively hack on or rewrite code for different jobs without upsetting the ones that are already running at all. Having no DRAM, no wait cycles, no bubbles and only an 'emergency' interrupt system (to recover crashed cores) is incredibly freeing when you're writing a real-time controller. It's kind of like having an RTOS in hardware, only better, since the timing is FPGA-state-machine rock-solid and interlocks are impossible.
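The two-bit signed stack-movement fields mentioned above are easy to decode. Here is a minimal sketch, assuming the original J1 layout in which bits 1:0 of an ALU instruction hold the data-stack delta and bits 3:2 the return-stack delta; the helper names are hypothetical. In hardware the same thing amounts to a sign-extending add onto each stack pointer.

```python
from typing import Tuple


def stack_delta(field: int) -> int:
    """Interpret a 2-bit field as two's complement: 00->0, 01->+1, 10->-2, 11->-1."""
    if not 0 <= field <= 0b11:
        raise ValueError("field must be 2 bits")
    # If the sign bit (bit 1) is set, subtract 4 to get the negative value.
    return field - 4 if field & 0b10 else field


def decode_alu_stack_moves(insn: int) -> Tuple[int, int]:
    """Return (data_stack_delta, return_stack_delta) for an ALU instruction.

    Assumes the J1 layout: bits 1:0 = data stack, bits 3:2 = return stack.
    """
    return stack_delta(insn & 0b11), stack_delta((insn >> 2) & 0b11)
```

A word like `+` consumes two items and leaves one, so its data-stack field would be `0b11` (-1).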
Anyway, the code for the J1 cores is so short and beautiful that 'documenting' them is probably more about learning to read Verilog than anything else. Better to have a single source of truth and all that. One interesting observation: there are other parts of the instruction space that are 'available'. The J1 uses a 4-bit field to select one of 16 ops, but that could easily be extended to select one of 32 ops, since the thirteenth bit is already 'free'...
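The idea of widening the op field can be sketched as follows. This assumes, as the paragraph says, that bit 12 of an ALU instruction is unused in the variant at hand (in the original J1 paper's layout that bit is the R->PC return flag, so this only applies where a variant has freed it). The function `alu_op` and its `wide` parameter are hypothetical.

```python
def alu_op(insn: int, wide: bool = False) -> int:
    """Extract the ALU operation selector from an ALU-class instruction.

    Normally bits 11:8 (one of 16 ops). With wide=True, bit 12 is also
    taken, giving one of 32 ops (assumes bit 12 is otherwise unused).
    """
    if insn >> 13 != 0b011:
        # Only instructions whose bits 15:13 are 011 are ALU instructions.
        raise ValueError("not an ALU instruction")
    mask = 0x1F if wide else 0x0F  # 5-bit vs 4-bit selector
    return (insn >> 8) & mask
```

With `insn = 0x7500` (ALU class, bit 12 set, bits 11:8 = 5), the narrow decode gives op 5 and the wide decode gives op 21.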
Built using the Forest Map Wiki