FPU and DEUNA fixes

Since I found the problem with the BAE register in the RH70 that prevented RSX-11MP from running, I’ve been working on straightening out the timing between the CPU and the main memory. It’s nowhere near finished, but the first tests show that when it is, the CPU will be capable of much higher speed – one experiment even ran at 90Mhz on the latest FPGA models. Not bad at all, compared to the current baseline of 10Mhz, even considering that the new timing needs slightly more cycles per instruction.

In the meantime people have been looking at RSX. Especially Paul Anokhin, who has helped me find several issues. Firstly in bringing the somewhat forgotten de1dram board variant back to life. The DE1 board has two memory chips, an sram of 512KB and a dram of 8MB. Obviously the sram is a bit too small to really bring to life 22-bit CPUs, and very limiting if you want to run the later versions of RSX or Unix on them. Years ago I made the de1dram version as an experiment while I was waiting for my first DE0 board to arrive, but it was never quite finished – the DE0 arrived a bit earlier than that, and I finished the work on that board and forgot about the de1dram. But now it works.

Secondly, Johnny Billquist has been working on BQTCP – a TCP stack for RSX that coexists with Decnet. It is afaik the only case that really requires the buffer chaining in the DEUNA to work correctly – Decnet uses fairly small buffers, but TCP by default uses an MTU of 1500, so if a packet of that size arrives and the buffers are smaller than that, the buffer chaining needs to be correct. This was not a problem before, since it appears as if all other cases – Decnet itself, TCP on 2.11BSD – appear to use buffer sizes slightly larger than the maximum packet size they expect to receive.

I wrote the DEUNA microcode as a sort of proof-of-concept – meaning, it is not really clean structured code. But it appeared to work well enough, so the somewhat more complex case of correct buffer chaining was not completely finished; it triggered a warning message, and it would also switch to the next buffer when needed. What I forgot was the case where a chunk of data from the ENC424J600 chip – the data is copied from the chip in 16-byte chunks – did not completely fit in the buffer; in that case, the whole 16-byte chunk would be placed in the new buffer instead of filling up the old one instead. Obviously, that caused errors for BQTCP. Luckily, it was surprisingly easy to fix, I only needed to restructure the receive flow in the microcode a bit and add a couple of tests to make the buffer chaining work correctly.

The latest issue Paul reported was slightly more complex to find – he wrote a F77 program to do some floating point calculations, and the results were not correct. The same algorithm in BP2 on RSX also was wrong, but translated to C on 2.11BSD it worked correctly. Since I did not have much time to look into this, I asked Paul to look at the instructions generated from the F77 program, and try to find the difference in flow between SIMH – which worked correctly – and the FPGA hardware. What he came up with was that the different flow started near the execution of the ABSF instruction.

That provided a nice clue for me to start chewing on. And soon enough, it became clear that there was an issue in the way that the addressing mode 0 for the group of instructions called ‘fp single operand group 2’ was handled – ABSF, NEGF, TSTF, and CLRF. For mode 0, I implemented a fast path in the instruction sequencer to bypass reading the input operand – because the input operand is in a register, it does not require memory access and thus does not need memory cycles. However, the register read occurred in the same cycle as picking up the output from the ALU – so, in effect, the output of the ALU was not based on the input. For the CLRF instruction, that makes no difference since the input is irrelevant anyway, and I would speculate that the TSTF instruction is not used much – but for the ABSF and NEGF instruction this is obviously not the case.

Apparently the addressing mode 0 ABSF and NEGF instructions are not used much. I checked 2.11BSD; at least the C implementation hides these instructions through library calls, so the compiler does not appear to generate these instructions directly. And the library implementation works with the operands on the stack, so it will never use mode 0. Also the MAINDECs seem to omit checking this part – maybe it did not use a separate data flow in the original machines, so that it would not make sense to specifically test for it. Whatever the case, none of FFPA, FFPB, FFPC, KFPA, KFPB, KFPC, or ZKDL picked up this issue – all of these run quite nicely even when the bug in the CPU is present.

Also here, once I understood the nature of the problem the fix was quite easy, I only needed to advance the register read to the main instruction decode state. Where it should have been in the first place, obviously – the whole point of the fast path was that the register should have been read already during the instruction decode.

Anyway, I’ll be posting the updated sources to the download page, and later this weekend I’ll post updated bitstreams as well. Big thanks to Paul for his help in finding and fixing these!