FPU and DEUNA fixes

Since I found the problem with the BAE register in the RH70 that prevented RSX-11M-Plus from running, I’ve been working on straightening out the timing between the CPU and the main memory. It’s nowhere near finished, but the first tests show that when it is, the CPU will be capable of much higher speed – one experiment even ran at 90MHz on the latest FPGA models. Not bad at all, compared to the current baseline of 10MHz, even considering that the new timing needs slightly more cycles per instruction.

In the meantime people have been looking at RSX, especially Paul Anokhin, who has helped me find several issues. The first was in bringing the somewhat forgotten de1dram board variant back to life. The DE1 board has two memory chips, an sram of 512KB and a dram of 8MB. Obviously the sram is a bit too small to really bring to life 22-bit CPUs, and very limiting if you want to run the later versions of RSX or Unix on them. Years ago I made the de1dram version as an experiment while I was waiting for my first DE0 board to arrive, but it was never quite finished – the DE0 arrived before I got that far, so I finished the work on that board and forgot about the de1dram. But now it works.

Secondly, Johnny Billquist has been working on BQTCP – a TCP stack for RSX that coexists with Decnet. It is afaik the only case that really requires the buffer chaining in the DEUNA to work correctly – Decnet uses fairly small buffers, but TCP by default uses an MTU of 1500, so if a packet of that size arrives and the buffers are smaller than that, the buffer chaining needs to be correct. This was not a problem before, since all other cases – Decnet itself, TCP on 2.11BSD – appear to use buffer sizes slightly larger than the maximum packet size they expect to receive.

I wrote the DEUNA microcode as a sort of proof-of-concept – meaning, it is not really clean structured code. But it appeared to work well enough, so the somewhat more complex case of correct buffer chaining was not completely finished; it triggered a warning message, and it would also switch to the next buffer when needed. What I forgot was the case where a chunk of data from the ENC424J600 chip – the data is copied from the chip in 16-byte chunks – did not completely fit in the buffer; in that case, the whole 16-byte chunk would be placed in the new buffer instead of first filling up the old one. Obviously, that caused errors for BQTCP. Luckily, it was surprisingly easy to fix: I only needed to restructure the receive flow in the microcode a bit and add a couple of tests to make the buffer chaining work correctly.
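
To make the chunk-splitting case concrete, here is a minimal sketch in Python. It is not the actual microcode; the function name and the way buffers are represented are assumptions for illustration. Only the 16-byte chunk size and the rule that the old buffer must be filled before moving on come from the description above.

```python
CHUNK = 16  # bytes copied from the Ethernet chip per transfer (from the text)


def chain_receive(frame, buffer_sizes):
    """Spread a received frame over consecutive buffers, chunk by chunk.

    frame        -- the received bytes
    buffer_sizes -- capacity of each receive buffer, in ring order (hypothetical)
    Returns the contents of every buffer that was used.
    """
    buffers = [bytearray()]
    for start in range(0, len(frame), CHUNK):
        chunk = frame[start:start + CHUNK]
        while chunk:
            index = len(buffers) - 1
            if index >= len(buffer_sizes):
                raise OverflowError("frame does not fit in the available buffers")
            room = buffer_sizes[index] - len(buffers[index])
            if room == 0:
                # Current buffer is full: chain to the next one.
                buffers.append(bytearray())
                continue
            # Fill up the current buffer first; only the remainder of the
            # chunk moves on to the next buffer.
            buffers[index] += chunk[:room]
            chunk = chunk[room:]
    return [bytes(b) for b in buffers]


frame = bytes(range(40))
parts = chain_receive(frame, [20, 20, 20])
print([len(p) for p in parts])
```

With 20-byte buffers, the second 16-byte chunk is split: 4 bytes complete the first buffer and 12 go into the second. The bug described above corresponds to putting all 16 bytes into the second buffer and leaving a hole at the end of the first.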

The latest issue Paul reported was slightly more complex to find – he wrote an F77 program to do some floating point calculations, and the results were not correct. The same algorithm in BP2 on RSX also gave wrong results, but translated to C on 2.11BSD it worked correctly. Since I did not have much time to look into this, I asked Paul to look at the instructions generated from the F77 program, and try to find the difference in flow between SIMH – which worked correctly – and the FPGA hardware. What he came up with was that the different flow started near the execution of the ABSF instruction.

That provided a nice clue for me to start chewing on. And soon enough, it became clear that there was an issue in the way that the addressing mode 0 for the group of instructions called ‘fp single operand group 2’ was handled – ABSF, NEGF, TSTF, and CLRF. For mode 0, I implemented a fast path in the instruction sequencer to bypass reading the input operand – because the input operand is in a register, it does not require memory access and thus does not need memory cycles. However, the register read occurred in the same cycle as picking up the output from the ALU – so, in effect, the output of the ALU was not based on the input. For the CLRF instruction, that makes no difference since the input is irrelevant anyway, and I would speculate that the TSTF instruction is not used much – but for the ABSF and NEGF instructions this is obviously not the case.

Apparently the addressing mode 0 ABSF and NEGF instructions are not used much. I checked 2.11BSD; at least the C implementation hides these instructions through library calls, so the compiler does not appear to generate these instructions directly. And the library implementation works with the operands on the stack, so it will never use mode 0. Also the MAINDECs seem to omit checking this part – maybe it did not use a separate data flow in the original machines, so that it would not make sense to specifically test for it. Whatever the case, none of FFPA, FFPB, FFPC, KFPA, KFPB, KFPC, or ZKDL picked up this issue – all of these run quite nicely even when the bug in the CPU is present.

Here too, once I understood the nature of the problem the fix was quite easy: I only needed to advance the register read to the main instruction decode state. Where it should have been in the first place, obviously – the whole point of the fast path was that the register should have been read already during the instruction decode.
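
The ordering problem can be illustrated with a toy two-state model in Python. This is only a sketch under my own assumptions (a single operand latch feeding the ALU, and a hypothetical function name); the real sequencer is VHDL and has many more states.

```python
def run_absf(ac_value, stale_operand, read_in_decode):
    """Return the value stored by a mode-0 ABSF in a toy two-state sequencer.

    ac_value       -- contents of the floating point register being operated on
    stale_operand  -- whatever an earlier instruction left in the operand latch
    read_in_decode -- True models the fix, False models the bug
    """
    operand_latch = stale_operand

    # State 1: instruction decode.
    if read_in_decode:
        operand_latch = ac_value      # fixed: register is read during decode
    alu_out = abs(operand_latch)      # ALU result settles on the latch as it is now

    # State 2: fast-path store.
    if not read_in_decode:
        operand_latch = ac_value      # buggy: the read only arrives here, too late
    return alu_out                    # the store picks up the already computed result


print(run_absf(-2.5, 7.0, read_in_decode=False))  # result based on stale data
print(run_absf(-2.5, 7.0, read_in_decode=True))   # result based on the register
```

In the buggy ordering the stored value depends on whatever happened to be in the latch, which matches the observation that CLRF is unaffected while ABSF and NEGF produce wrong results.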

Anyway, I’ll be posting the updated sources to the download page, and later this weekend I’ll post updated bitstreams as well. Big thanks to Paul for his help in finding and fixing these!

Fixes for DEUNA

Over the last months, I had a couple of occurrences of the problem where 2.11BSD would lose its network connection, reporting that there were no transmit buffers available on the DEUNA. All in all, I’ve seen this problem three or four times over the last year, but maybe ten times in the last month or so. No idea why – the only thing that changed is that I have a new Ethernet switch, which could make for a subtle change in the timing.

Anyway, now that it occurred more often, that also gave me the opportunity to find out what was actually wrong. I enabled the debug code in the if_de.c driver for the DEUNA and added some more debug statements. Next morning I was surprised by debug output – showing that in effect all transmit buffers were free…

So, that got me thinking of the interrupt controller core in the DEUNA. It did contain some strange edge-trigger construction that could potentially result in a deadlock. I changed it, and set up my venerable old 20MHz oscilloscope to show the interrupt signals – br and bg.

This time I had to wait for three days for the problem to occur again – and to my disappointment, it did, and it still locked up in the same way. However, it was also clear that no interrupts were taking place, so I was definitely looking in the right place – the interrupt controller was maybe not locked up itself, but even so no interrupts were taking place. More evidence against the interrupt controller and the edge-trigger in it.

A couple of experiments showed that an easy solution would be just to generate interrupts on the level instead of the edge. But this would also cause the DEUNA to keep on interrupting until the software disabled interrupts or cleared the originating bit. Not very elegant, but it did work – and after some time, I realised that the software will in all likely scenarios examine the interrupt bits, and most likely reset them. So would it maybe work if I went back to the edge triggering system, and reset the trigger on writes into the PCSR0 register?

Of course it did.
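
A rough Python model of that idea, as a sketch only: the class and method names are made up, and I assume here that the interrupt bits are cleared by writing ones to them. What it tries to show is the combination of edge-style behaviour (one request per arming) with re-arming on every PCSR0 write.

```python
class IrqLogic:
    """Toy model: one bus request per arming, re-armed by any PCSR0 write."""

    def __init__(self):
        self.pending = 0     # interrupt bits currently set
        self.armed = True    # trigger ready to fire
        self.requests = 0    # number of interrupt requests raised so far

    def _evaluate(self):
        if self.armed and self.pending:
            self.requests += 1
            self.armed = False

    def set_bits(self, mask):
        """Hardware side: an event sets one or more interrupt bits."""
        self.pending |= mask
        self._evaluate()

    def write_pcsr0(self, clear_mask):
        """Software side: writing ones clears bits (assumption) and re-arms."""
        self.pending &= ~clear_mask
        self.armed = True
        self._evaluate()


irq = IrqLogic()
irq.set_bits(0b01)       # first event: request raised
irq.set_bits(0b10)       # second event before the driver reacts: no new edge
irq.write_pcsr0(0b01)    # driver clears only the first bit: request raised again
irq.write_pcsr0(0b10)    # driver clears the second bit: nothing left pending
print(irq.requests)
```

Unlike a pure level-triggered design this does not keep interrupting while a bit stays set, but an event that arrives while another is still being handled is not lost either.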

And one other minor thing comes to mind: I keep saying DEUNA, but it’s actually a DELUA now. The difference is only in the PCSR1 ID bits; no logic has been changed at all. I did this because Decnet on RSX-11M-Plus tries to load microcode into the DEUNA – which will not work because in reality of course the controller does not look like a real DEUNA at all. But it will leave a DELUA alone. And because all the other software – 2.11BSD and RSTS – does not seem to distinguish between DEUNA and DELUA, there seems to be no reason not to change the thing into a DELUA.

I changed several subtle things in the microcode as well, mostly around buffer chaining and resetting the chip if it becomes disconnected for some reason. Buffer chaining probably still is not correct, but it doesn’t really seem to be used extensively by the operating systems – it’s only when broadcast frames longer than what the network stack expects arrive that the code seems to be triggered.

The updates – including the fix for RSX-11M-Plus – are on the download page now, and several pregenerated bitstreams as well. Enjoy!

RSX11M-Plus. Finally.

A couple of weeks ago someone mentioned that there were some FPGA related articles in the December issue of Circuit Cellar. So I checked it, and one of the articles pointed me to the built-in logic analyzers that the leading tool chains now all seem to have. At least, the Circuit Cellar article is about Chipscope, which is the Xilinx variant, and Altera has something similar called SignalTap.

Since most of my Xilinx stuff has been stored away since last year’s spring cleaning, I decided to go and play with SignalTap. And as usual with the FPGA tooling, the first impression was not that favourable. But a couple of days later I thought to try again, and this time around I started to appreciate some of the things that the software can do. For instance, tap into an enormous lot of signals at a time – at least certainly compared to my old ‘real’ analyzer, which can do only 32 signals. And the amount of capture memory is also decent, provided you’ve some room in your FPGA memories.

But more interesting is the trick where you can let the analyzer capture when some subset of the signals change state. And you can assign names to bit pattern values in a capture. Those two tricks I used to finally find the problem that prevented RSX-11M-Plus from booting – first, I used the address match signal within the RH11 controller logic as a trigger for the analyzer to capture state, and second, I assigned the register names of the control registers within the RH11 to the address signal.

So, I thought that would give me a nice and easy overview of exactly what RSX-11M-Plus was doing to the RH, and what would cause it to get wrong results. And that is exactly what it did – only, not in the way I expected. Took me some time to see something that in retrospect is very obvious; there is a write to a register in the RH11 space, but it isn’t decoded into a register name – even though I added register names for all registers that I knew about.

Aha. So, something going on here… The first thing I checked was whether it could be a controller register or a disk register – in a real setup with RH and RP, some of the registers reside in the disk, others in the controller. I decided to verify all the controller side registers first – and the one that I was consistently missing was BAE, the register that holds the bits 21-16 of the address for the controller. A quick change to the controller source proved that to be correct; if I assigned BAE to this register address, suddenly RSX-11M-Plus would boot happily… And it seems to run quite happily as well, including running complete sysgens, and also running Decnet and other software.
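
Why a missing BAE matters is easy to see from the address arithmetic. The following is a small sketch in Python, with a hypothetical function name, that only uses what is stated above: BAE supplies bits 21-16 and another register supplies the low 16 bits. It ignores any other place where address bits may show up in the controller.

```python
def bus_address(bae, ba):
    """Combine BAE (address bits 21-16) with the low 16 address bits."""
    return ((bae & 0o77) << 16) | (ba & 0o177777)


# A transfer the OS intends for a buffer above the first 64KB ...
print(oct(bus_address(0o03, 0o001000)))
# ... ends up here if the write to BAE never reaches the register.
print(oct(bus_address(0o00, 0o001000)))
```

If writes to BAE are decoded at the wrong address, the register stays at zero and every transfer lands in the lowest 64KB, which would be harmless for software that keeps its disk buffers there and fatal for software that does not.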

A couple of things still need some clarification; mostly, whether other registers also live at addresses other than where I would expect them. Once that is done, and I’ve completed my usual regression tests, I’ll be posting the new vhdl to the download page.

This output from the SignalTap-II analyzer shows the unexpected address for the BAE register


Last week Al Kossow posted some new sources for XXDP tests on Bitsavers. Included are the sources for the MMU tests for 11/34. I tried to run those, and was surprised by a list of error messages… Turns out, there were a number of issues left in my MMU implementation, that none of the other tests I’ve used so far had picked up. To be more specific, FKTA, FKTB, and FKTC all three complained about issues that FKTH, KKTA, KKTB, and ZKDK let pass…

  • mtpi worked in the wrong order. Because the instruction follows the more or less regular destination pattern, I did the address calculation for the destination first, and only after that the special part for mtpi – finding the implied operand from the stack, and popping the stack in the process. That gives wrong results if the stack pointer is updated in the destination address calculation… An obscure case, probably, and I’m not at all sure that this exact process is used by all PDP models. Nevertheless, there is a fix.
  • the a and w bits in the pdr registers should be reset on a write. However, my implementation only did that for word or even byte writes – not for odd byte writes.
  • and probably the most promising of all, the a and w bits in the pdr registers should not be set if the access was aborted by the mmu.
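
The second and third items can be sketched in Python as follows. This is an illustration, not the VHDL: the function names are hypothetical, the bit positions are the usual ones for the PDR (A in bit 7, W in bit 6), and for brevity only the W bit is modelled on access, since when A gets set depends on the access control field.

```python
A_BIT = 0o200  # bit 7
W_BIT = 0o100  # bit 6


def write_pdr(pdr, data, byte=False, odd=False):
    """Program write to a PDR: word, even-byte or odd-byte. Always clears A and W."""
    if not byte:
        pdr = data & 0o177777
    elif odd:
        pdr = (pdr & 0o000377) | ((data & 0o377) << 8)
    else:
        pdr = (pdr & 0o177400) | (data & 0o377)
    return pdr & ~(A_BIT | W_BIT)


def note_access(pdr, is_write, aborted):
    """Update the PDR after a memory reference through this page."""
    if aborted:
        return pdr                      # aborted accesses must not touch A or W
    return pdr | (W_BIT if is_write else 0)


# Odd-byte write: the low byte is kept, but A and W are still reset.
print(oct(write_pdr(0o077306, 0o17, byte=True, odd=True)))
```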

I had some hope that these fixes might also have an impact on the still mysterious problem in booting RSX-11M-Plus. No luck though… Still, it never ceases to amaze me how good and how devious some of these test programs are. It sometimes really takes hours to find out what a test does, and how to make the cpu and mmu work as intended. And, obscure though these issues may seem, they may in some way impact how some old software runs on the VHDL.

Anyway, I’m running a number of regression tests now, and will post the latest versions some time later. And maybe play with some of the other tests as well – there could still be more to find there.

And of course a big thank you to Al!


Since DEUNA works, I more or less constantly have one or more boards running 2.11BSD. One of these has only been down twice since November 2 – the first time after three months when I accidentally touched the power switch while cleaning, and the other time last week when a transformer exploded across the street – close to 10,000 homes without power. So only two reboots since the beginning of November – and both because of power problems. I have current systems that do worse…

Not much has happened on new developments. Partly because my boards were all showing increasingly impressive uptimes, partly because I was not sure on what to do next – and, other priorities demanded a lot of time. What I did do was restructure the DEUNA microcode a bit to make it more robust. There were a couple of bugs in there as well – for instance, the bit in the control register that should cause a reset of the controller hardware actually didn’t. But a more major improvement is that it is now no longer fatal if the pmod becomes disconnected. The controller will just reconnect when the chip becomes available to it again, and continue where it left off.

There is one problem remaining in DEUNA that I’m aware of; very occasionally – so far, I’ve seen it happen 3 times in almost 6 months of running – 2.11BSD will start complaining that there are no free buffers. At the same time, activity on the blinkenlights shows that there is something going on – more instructions appear to be processed than normal during idle-waiting. However, vmstat shows no unusual activity that I can detect. The solution is to ifconfig the interface down and up – and things will be normal again.

In the meantime, I’ve been thinking on what to do next – if anything. One item on the roadmap that is a bit overdue is the split disk controller – I mean, the change to the disk controller that allows the cpu to run while the sd cards are active. Would be useful to make this change – I expect that 2.11BSD and probably RSTS and RSX would benefit from the cpu being able to run during card activity, and thus become faster. But also interrupt response and clock stability would improve. The downside is that it is relatively boring work…

Another idea I have is to change my mind and implement the Qbus systems after all. The difficult bit there really is only that that would require a DEQNA – which shares surprisingly little with DEUNA, so it would be a lot of work. And there really is only one reason to do it – it would run Xinu.

Anyway. Summer is here, time to go out and play. But maybe it will rain some days.

DEUNA works!

Terasic's de0 board with Digilent's pmodnic100 attached

I already posted that I was working on it, so it should not be a very big surprise. DEUNA now works. At least, it works well enough for 2.11BSD – and very stable as well. I’ve even written a new web server for 2.11 – well, you’ve got to understand that 2.11 is a bit older than the concept of web servers. Even though there is a lot of code for simple web servers around on the net, most seem to assume the ‘new’ style of C – which is not that big a deal to convert, until you encounter the intricacies of varargs. Anyway, I decided that I would just roll my own. It seems only right to run your own web server on your own hardware, after all. And, in the process, I decided that I was not really interested in a ‘normal’ web server – I wanted something to browse the operating system with, not something that would serve me content. So, the web server I made gives you the sources, the text files, the directories, and if I get around to it it will give you octal dumps of the binaries. Would have been so brilliantly useful, 30 years ago. And in a way, it still is.

Anyway, Decnet also works. That is, I’ve not had time to test anything beyond a trivially simple 2-system network, with two nodes each running RSTS version 10. That was challenging enough – Decnet encodes its node address into the Ethernet card’s mac address, and as it turns out, I misread the specs of the Ethernet chip on the byte order of the mac address. It works now, but to find the mistake took me several nights of debugging. And the show counters command also basically works – at least, it doesn’t hang the system anymore like it did when I first tried it, but it still only produces random numbers. All the required hardware is there though – even the clock that is required for the seconds-since-last-reset fits in. It just needs some more microcode to implement the different counters.
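
For reference, the way DECnet Phase IV derives the mac address from the node address can be written down in a few lines of Python. The function name is mine; the mapping itself (the fixed prefix AA-00-04-00 followed by the 16-bit node address, low byte first) is the standard Phase IV convention. It says nothing about how the ENC424J600 orders the bytes in its own registers, which is where the mistake described above was made.

```python
def decnet_mac(area, node):
    """Return the 6-byte Ethernet address DECnet Phase IV uses for area.node."""
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1..63 and node 1..1023")
    address = (area << 10) | node          # 16-bit DECnet node address
    # Fixed prefix, then the node address with the low byte first.
    return bytes([0xAA, 0x00, 0x04, 0x00, address & 0xFF, address >> 8])


print(decnet_mac(1, 1).hex("-"))
```

Because the last two bytes are byte-swapped relative to how one would naturally write the number, any additional byte-order confusion in the controller results in a valid-looking but wrong address, and the other node simply never answers.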

Writing that microcode is not yet on the agenda though… All boards are now tasked with running 2.11BSD, or are in use for tweaking minor things in the DEUNA microcode. I now have plenty of PMODNIC100 modules and already had a lot of boards, so I should be able to run more than two systems at the same time. However, I forgot that my Ethernet switch hasn’t got any free ports left… so, yes, well, eeehm, I could very well run tests with other OSes as well, but then I’d have to break off the stability tests of the systems I have running now. Would be a shame. I’ll get around to it, eventually. But not this month. Lots of other things demand attention as well, and rewiring my network isn’t even on the things to do list yet.

Which brings me to the DEUNA itself. It’s an interesting collection of bits and pieces. First of all, the DEUNA that I implemented is a front end for the PMODNIC100 that Digilent sells. Or, more specifically, the ENC424J600 from Microchip – a brilliant tiny thing that has an SPI interface, and does all things that need doing to be Ethernet. The DEUNA frontend is basically a Unibus interface, with a cpu that runs microcode, a busmaster that copies data from the Unibus to the DEUNA, and a busmaster that copies data from the local DEUNA memories to the ENC424J600. Nifty hardware, that is… but the hardware, and by extension also the microcode for the DEUNA, is not compatible in any way with the real deal. Which means that running XXDP for the DEUNA is out of the question. And in turn, that makes verifying if the DEUNA-to-be works correctly a bit difficult. The microcode is also somewhat difficult to debug – it seems like stepping back into the early days of PDP-11 work to me – I mainly have had to rely on theory to find bugs, and only the occasional printf to console to help find a clue as to what is happening. There’s not that much debugging infrastructure in the DEUNA hardware and microcode just yet – probably that is because it’s not been a priority, and that’s because a lot of things already seem to work just fine.

Thus far, I’ve been able to find the answers for 2.11BSD and RSTS v10. At least, 2.11BSD works great – I’ve seen uptimes of several weeks so far, and I’m not really able to break things without cheating – for instance, flood pings work just fine, and all normal protocols work as they should and seem stable – also with uptimes ranging into weeks. And RSTS also seems to work fine – even though details like the read counters command are not really implemented yet. Ultrix-11 however has a problem; it fails because something appears to go wrong with the TCP acknowledge frames – even though the DEUNA microcode receives them and signals the frames to the upper layers, Ultrix appears not to be aware that the frames were received, so a session will stop after the TCP window is filled. I’m clueless as to what goes wrong. It might not even be a problem in my DEUNA; I’ve heard a suggestion that the same issue also may exist on ‘real’ hardware.

Most amazing about the whole DEUNA setup is that I’ve been able to fit an 11/70, a terminal including its own PDP-11 cpu to run the terminal microcode, and the DEUNA adapter including its own PDP-11 cpu for its microcode – all in the single smallish low cost FPGA that sits on the DE0 board. And the same thing also fits on the N2B1200 board. It should come as no surprise that for these configurations the 11/70 does not include floating point hardware – but still, three PDP-11 CPUs in a single small and low cost FPGA. Luckily, thanks to Walter’s brilliant work on fixing the FPU simulator code, you don’t need floating point hardware to run a 2.11BSD system – so you can have an 11/70 without FP11, but with Ethernet and an embedded console, and have it run a complete 2.11BSD system on the DE0 board – with the ENC424J600 to connect it to the Internet. Just make sure that you install the patches on a system that does include the FPU… then set FPSIM YES in your kernel configuration file, run make, probably tweak the makefile, make install, move the image to an SD card, and then you can boot it on your FPGA.

Curious? Surf to my de0 board at http://pdp11.sytse.net/.

PDP2011 has been to VCF

Jack Rubin sent me this picture of his DE0 board running RT11 in front of a real PDP11/70. Taken during the VCF East 8.0, which took place last weekend. It’s something of a classic ‘old meets new’ picture – there being something like 40 years of age difference in it. Would have liked to be there and see it for real – but it’s on the other side of the ocean from where I live. Still makes me proud to see my work on display. Thanks Jack!

Over the last weeks, I’ve been working on an Ethernet interface for the PDP – a DEUNA, to be more exact. And it’s basically only the implementation of the controller; the real Ethernet stuff is implemented in a tiny Microchip ENC424J600 device that I interface via SPI. The DEUNA is working a bit already, but there is still a major bug in it – it appears to cause memory corruption. I’ve been able to do some pings to a board running 2.11BSD, and even telnetted to it a couple of times, but it tends to crash often. I expect that there is a bug in the bus master logic, and that will take me some time to find. Especially since the summer is here, and it’s time to go outside and spend time on outdoor projects.

PDP2011 in front of a real PDP11/70

Fixes and Nexys3

Since the last post I’ve been fixing some more bugs in the new serial controller. The most obvious difference is that its speed is configurable now. Less obvious, but equally if not more important, is that the stability is much improved, especially in the rx channel. At the time of the last post, however, a flood ping would still cause the controller to lock up. Not any more. It now runs quite stable with SLIP at 38400 BPS. At that speed, the CPU is already doing ~4000 interrupts per second and very near 100% busy – it will not go much faster than that. The link will run at 57600 BPS, but the actual throughput will decrease, and the number of packets received in error will rise considerably at that speed.
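
The interrupt rate follows directly from the line speed. As a back-of-the-envelope check in Python, assuming 10 bits on the wire per character (start bit, 8 data bits, stop bit) and one interrupt per character in one direction; both are assumptions on my part, not measurements:

```python
def chars_per_second(bps, bits_per_char=10):
    """Characters per second on an async line; 10 bits per character assumes 8N1."""
    return bps / bits_per_char


for speed in (38400, 57600):
    print(speed, "BPS ->", chars_per_second(speed), "characters per second")
```

At 38400 BPS a saturated link in one direction already produces close to the ~4000 interrupts per second mentioned above, so it is not surprising that a faster line rate mostly adds overruns rather than throughput.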

Anyway, at the same time I received my order of new goodies – a Nexys3 board and assorted pmods, including the pmodnic100. The pmodnic100 is the first step towards creating a DEUNA-type ethernet controller – that is, it looks like it will be possible to do, what I’m not yet sure of is how much time it is going to take. The interfacing is not very difficult, but it is still going to be a lot of work, and DEUNA is a complicated thing to be compatible with.

The Nexys3 is somewhat of a mixed joy, I must say. From the pricing, I had expected the FPGA to be huge. It is not – it’s effectively smaller than the Nexys2-1200 board I already had. Also, it has only 3rd party prom parts – not the Xilinx proms, so prom programming is only really possible with the Digilent tools, and those only run on Windows. My last remaining Windows pc runs fine, but with one minor issue: I need to keep pushing down the CPU fan – if I let go of it, it comes loose, because the plastic retainer bracket is broken. If that happens, after about 5 seconds the cpu will overheat and shut down – causing Windows to crash. Oh joy.

Digilent in the meantime is very slow about answering support questions – it took them almost two weeks to come up with what I already knew – it’s not possible to program the proms with the Xilinx tools. And, to finish my rant, the difference in build quality if you compare Terasic’s products to Digilent’s is rather big – and the pricing is too, but the wrong way around. Digilent used to do very well – the S3Board was and is a fine product: well thought out, well-engineered, and lovingly produced to decent standards. Apparently something got lost along the way.

What the Nexys3 also brought was somewhat of a surprise – the VGA core did not work on it. And apparently it never worked on the Nexys2 either – I must have forgotten that I never implemented it. Anyway, the issue with Nexys3 was that the spartan-6 is unhappy about asynchronous access to blockram; it should be possible, but the synthesizer seems not to generate the core for it. So, since it didn’t really need to be asynchronous anyway, I changed it to synchronous – and fixed one of the older minor issues, that caused the last scan line to fall off the edge of the VGA screen. And I fixed another bug in the vga core as well – it was impossible to enter a capital U. Also I made lots of updates to the font table – it should look better now, and it will also show control characters in a style similar to a vt100. It’s nowhere near a ‘real’ vt100 yet, though.

Together with one of the nice things about the Nexys3 – the USB Host port for the keyboard – that does create a rather fancy system; I can now connect my new fancy Rapoo wireless mini keyboard. No more big old ugly PS2 keyboards!

I’ll be posting the latest sources later this weekend. Besides the fixes to the terminal core and the serial line controller, there’s also the floating point updates in there. Definitely worth the effort of an upgrade.

TCP/IP on 2.11BSD

After finding the bug in the MMU details of the JSR instruction, now almost three weeks ago, I thought to spend a lot of time just playing with 2.11BSD and not actively doing any development.

Well, that almost worked. At least, it did until I realized that it would be really great to be able to run a network on the PDP – and that since I could now rebuild the kernel on BSD, it should be possible to add a SLIP link to it and run TCP/IP over it. There was only a slight problem – the serial link controller. Good enough for generating output, even good enough to do some typing. But as I already had found out, it would lock up whenever there was more than just a few characters input, such as when cutting and pasting some text. The controller source was one of the oldest pieces of VHDL that I had left – and by and large unmodified for the last two years. Built according to what I then thought was a good idea. So basically what it needed was a major overhaul, or maybe even better, a complete rewrite, from scratch. Just a simple rewrite of a very simple controller. How much work can it be, really.

Probably, the right answer is: not that much. But the interrupt controller still took me a couple of tries to get exactly right. Because, it turns out that the real problem in the old controller was that sometimes it would get stuck in its interrupt controller – whenever there would be interrupts pending from both the receiver and the transmitter. And in the same situation, interrupts would also get lost – resulting in either the receiver or the transmitter getting stuck.

Anyway, it took some doing, but I think I’m fairly close now. It still is not perfect, but good enough to run SLIP over at modest speeds. It still gets stuck sometimes – especially when the disk is busy and locks out the lower level interrupts, the SLIP link will get stuck in retransmissions and may eventually become inoperative for minutes. It does recover by itself though – I’m not sure why, but it does.

So all in all I now have a working SLIP link to my 2.11BSD system:

# ifconfig sl0
sl0: flags=b1
        inet --> netmask ffffff00

and if we look at the counters with netstat

# netstat -in
Name  Mtu   Network     Address            Ipkts Ierrs    Opkts Oerrs  Coll
sl0   296   192.168       140479   953   138015    42     1
lo0   1536  127            391     0      391     0     0

so we can see that there is a significant number of input errors – some of which I think are caused by timeouts because the disk controller is locking the bus or causes its higher interrupt level to lock out the serial links. I might implement some kind of buffer in the KL controller to partially fix this, though I’m not sure yet if I will – another and maybe more elegant fix would be to make the disk controller somewhat more sophisticated.

Anyway, we can see some more detail in the other output from netstat, looking at the active sockets, the routing table, and the statistics:

# netstat -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      2      ESTABLISHED
tcp        0      0      ESTABLISHED
tcp        0      0      ESTABLISHED
tcp        0      0  *.smtp                 *.*                    LISTEN
tcp        0      0  *.printer              *.*                    LISTEN
tcp        0      0  *.tcpmux               *.*                    LISTEN
tcp        0      0  *.discard              *.*                    LISTEN
tcp        0      0  *.echo                 *.*                    LISTEN
tcp        0      0  *.ident                *.*                    LISTEN
tcp        0      0  *.finger               *.*                    LISTEN
tcp        0      0  *.uucp                 *.*                    LISTEN
tcp        0      0  *.login                *.*                    LISTEN
tcp        0      0  *.shell                *.*                    LISTEN
tcp        0      0  *.telnet               *.*                    LISTEN
tcp        0      0  *.ftp                  *.*                    LISTEN
udp        0      0          *.*                   
udp        0      0        *.*                   
udp        0      0  *.ntp                  *.*                   
udp        0      0  *.who                  *.*                   
udp        0      0  *.time                 *.*                   
udp        0      0  *.echo                 *.*                   
udp        0      0  *.biff                 *.*                   
udp        0      0  *.syslog               *.*                   
Active UNIX domain sockets
Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
    4588 dgram       0      0        0     5688        0     5988
    6288 dgram       0      0     3336        0     4088        0 /dev/log
    4188 dgram       0      0        0     5688        0     4408
    5908 dgram       0      0        0     5688        0        0
    4988 stream      0      0     3ed0        0        0        0 /dev/printer
# netstat -s
        142500 total packets received
        1207 bad header checksums
        159 with size smaller than minimum
        1009 with data size < data length
        349 with header length < data size
        0 with data length < header length
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets forwarded
        0 packets not forwardable
        0 redirects sent
        192 calls to icmp_error
        0 errors not generated 'cuz old message was icmp
        Output histogram:
                echo reply: 41249
                destination unreachable: 192
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                destination unreachable: 2470
                echo: 41249
        41249 message responses generated
        89138 packet sent
                88077 data packet (9414995 bytes)
                818 data packets (80393 byte) retransmitted
                190 ack-only packets (164 delayed)
                3 URG only packets
                3 window probe packets
                0 window update packets
                88 control packets
        89418 packet received
                86822 ack (for 9335337 bytes)
                145 duplicate acks
                0 acks for unsent data
                2587 packets (4470 bytes) received in-sequence
                20 completely duplicate packets (33 bytes)
                0 packets with some dup. data (0 bytes duped)
                3 out-of-order packets (0 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                0 window update packets
                0 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        84 connection requests
        10 connection accepts
        10 connections established (including accepts)
        95 connections closed (including 3 drops)
        87 embryonic connections dropped
        77241 segment updated rtt (of 78054 attempt)
        850 retransmit timeouts
                3 connections dropped by rexmit timeout
        0 persist timeouts
        56 keepalive timeouts
                56 keepalive probes sent
                0 connections dropped by keepalive
        0 incomplete headers
        0 bad data length fields
        3 bad checksums
        192 no ports
        0 (arrived as bcast) no ports
# netstat -rn
Routing tables
Destination      Gateway            Flags     Refs     Use  Interface
                                    UH          1      333  lo0
                                    UH          3    74558  sl0
default                             UG          2     4520  sl0
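
To put a few of the counters above into proportion, here is a small Python sketch that turns them into percentages. The numbers are copied from the netstat -s output; that the first group of counters is IP-level and that the "data packets" lines are TCP is an assumption based on the usual layout of that output, and the variable names are made up for the example.

```python
# Counters copied from the "netstat -s" output above.
# Assumption: the first group is IP-level, the "data packets" lines are TCP.
ip_total_received = 142500
ip_bad_header_checksums = 1207
tcp_data_packets_sent = 88077
tcp_data_packets_retransmitted = 818


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


bad_header_pct = percentage(ip_bad_header_checksums, ip_total_received)
retransmit_pct = percentage(tcp_data_packets_retransmitted, tcp_data_packets_sent)

print(f"bad header checksums: {bad_header_pct:.2f}% of received packets")
print(f"retransmitted data packets: {retransmit_pct:.2f}% of data packets sent")
```

Both ratios come out a little under one percent, which fits a link that works but is clearly not error-free.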

And then I should probably also show these commands:

# uptime    
 12:23am  up 4 days, 21:27,  4 users,  load averages: 0.66, 0.40, 0.21
# who
root            console Feb 26 02:33
sytse           ttyp0   Mar  1 23:06    (
root            ttyp1   Feb 26 02:56    (
root            ttyp2   Mar  2 00:20    (
# w
 12:24am  up 4 days, 21:28,  4 users,  load averages: 0.56, 0.40, 0.21
User            tty       login@  idle   JCPU   PCPU  what
root            console   2:33am    53     12      1  -sh 
sytse           ttyp0    11:06pm    55      2      1  -sh 
root            ttyp1     2:56am 71:13  96:26  29:25  sleep 10 
root            ttyp2    12:20am            4      1  w 
# ps ax
     0 ?   0:35 swapper
     1 ?   0:01  (init)
    46 ?   0:34 syslogd 
    56 ?   7:14 update 
    59 ?   0:02 cron 
    63 ?   1:34 acctd 
    71 ?   0:01 /usr/sbin/inetd 
    75 ?   0:04 rwhod 
    79 ?   0:00 /usr/sbin/lpd 
    97 ?   0:02 /usr/sbin/sendmail -bd -q1h 
   101 ?   1:13 ntpd 
   105 co  0:01 -sh 
    38 l1  0:00 slattach /dev/ttyl1 9600 
 15302 p0  0:02 telnetd 
 15303 p0  0:01 -sh 
   130 p1  4:22 telnetd 
   131 p1 29:25 -sh 
 18941 p1  0:00 sleep 10 
 18765 p2  0:01 telnetd 
 18766 p2  0:01 -sh 
 18944 p2  0:00 ps ax 
# uname -a
BSD pdp11.sytse.net 2.11 2.11 BSD UNIX #27: Tue Feb 21 21:14:38 MET 2012     root@pdp11.sytse.net:/usr/src/sys/PDP2011  pdp11

Maybe a bit more interesting, specifically for the VHDL system aspect, is that ntpd runs as well. It synchronizes to the ntpd running on my PC, which is itself synchronized to some outside time source – so the clock in 2.11BSD actually shows the real time, and quite accurately as well – even though it is only derived from the modest 50MHz crystal oscillator on the DE0-Nano board that I’m using for these tests. The crystal oscillator is not really stable – by itself, it tends to drift a couple of minutes fast or slow per day.

# ntpdc -v localhost
Neighbor address port:123  local address
Reach: 0377 stratum: 3, precision: -24
dispersion: 64000.000000, flags: 9101, leap: 0
Reference clock ID: [] timestamp: fad20173.eb057573
hpoll: 6, ppoll: 6, timer: 64, sent: 6535 received: 6382
Delay(ms)   175.00  175.00  175.00  175.00  175.00  175.00  175.00  175.00 
Offset(ms)    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00 

        delay: 175.000000 offset: 4085.000000 dsp 64000.000000

# tail -200 /usr/adm/messages|grep ntp
Mar  1 18:47:14 pdp11 March  1 18:47:14 ntpd[101]: adjust: STEP st 3 off 0.900270 drft 0.007104 cmpl 0.022846
Mar  1 18:55:48 pdp11 March  1 18:55:48 ntpd[101]: adjust: STEP st 3 off 1.819736 drft 0.007104 cmpl 0.022846
Mar  1 19:04:21 pdp11 March  1 19:04:21 ntpd[101]: adjust: STEP st 3 off 0.856055 drft 0.007104 cmpl 0.022846
Mar  1 19:08:37 pdp11 March  1 19:08:37 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 2.043122 SYNC 3
Mar  1 19:12:55 pdp11 March  1 19:12:55 ntpd[101]: adjust: STEP st 3 off 2.043122 drft 0.007104 cmpl 0.022846
Mar  1 19:21:30 pdp11 March  1 19:21:30 ntpd[101]: adjust: STEP st 3 off 2.016722 drft 0.007104 cmpl 0.022846
Mar  1 19:30:02 pdp11 March  1 19:30:02 ntpd[101]: adjust: STEP st 3 off 0.417967 drft 0.007104 cmpl 0.022846
Mar  1 19:38:36 pdp11 March  1 19:38:36 ntpd[101]: adjust: STEP st 3 off 1.240674 drft 0.007104 cmpl 0.022846
Mar  1 19:47:09 pdp11 March  1 19:47:09 ntpd[101]: adjust: STEP st 3 off 1.144854 drft 0.007104 cmpl 0.022846
Mar  1 19:55:44 pdp11 March  1 19:55:44 ntpd[101]: adjust: STEP st 3 off 2.617553 drft 0.007104 cmpl 0.022846
Mar  1 20:04:16 pdp11 March  1 20:04:16 ntpd[101]: adjust: STEP st 3 off 0.368686 drft 0.007104 cmpl 0.022846
Mar  1 20:08:48 pdp11 March  1 20:08:48 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 2.624537 SYNC 3
Mar  1 20:12:51 pdp11 March  1 20:12:51 ntpd[101]: adjust: STEP st 3 off 2.624537 drft 0.007104 cmpl 0.022846
Mar  1 20:21:25 pdp11 March  1 20:21:25 ntpd[101]: adjust: STEP st 3 off 1.297541 drft 0.007104 cmpl 0.022846
Mar  1 20:29:57 pdp11 March  1 20:29:57 ntpd[101]: adjust: STEP st 3 off 0.317852 drft 0.007104 cmpl 0.022846
Mar  1 20:38:31 pdp11 March  1 20:38:31 ntpd[101]: adjust: STEP st 3 off 1.005208 drft 0.007104 cmpl 0.022846
Mar  1 20:47:03 pdp11 March  1 20:47:03 ntpd[101]: adjust: STEP st 3 off 0.309247 drft 0.007104 cmpl 0.022846
Mar  1 20:55:37 pdp11 March  1 20:55:37 ntpd[101]: adjust: STEP st 3 off 1.381118 drft 0.007104 cmpl 0.022846
Mar  1 21:04:11 pdp11 March  1 21:04:11 ntpd[101]: adjust: STEP st 3 off 1.803875 drft 0.007104 cmpl 0.022846
Mar  1 21:08:59 pdp11 March  1 21:08:59 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 0.255038 SYNC 3
Mar  1 21:12:43 pdp11 March  1 21:12:43 ntpd[101]: adjust: STEP st 3 off 0.391487 drft 0.007104 cmpl 0.022846
Mar  1 21:21:17 pdp11 March  1 21:21:17 ntpd[101]: adjust: STEP st 3 off 1.278973 drft 0.007104 cmpl 0.022846
Mar  1 21:29:50 pdp11 March  1 21:29:50 ntpd[101]: adjust: STEP st 3 off 0.281603 drft 0.007104 cmpl 0.022846
Mar  1 21:38:24 pdp11 March  1 21:38:24 ntpd[101]: adjust: STEP st 3 off 1.804268 drft 0.007104 cmpl 0.022846
Mar  1 21:46:56 pdp11 March  1 21:46:56 ntpd[101]: adjust: STEP st 3 off 0.373988 drft 0.007104 cmpl 0.022846
Mar  1 21:55:31 pdp11 March  1 21:55:31 ntpd[101]: adjust: STEP st 3 off 1.949129 drft 0.007104 cmpl 0.022846
Mar  1 22:04:03 pdp11 March  1 22:04:03 ntpd[101]: adjust: STEP st 3 off 0.432497 drft 0.007104 cmpl 0.022846
Mar  1 22:09:07 pdp11 March  1 22:09:07 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 2.688494 SYNC 3
Mar  1 22:12:39 pdp11 March  1 22:12:39 ntpd[101]: adjust: STEP st 3 off 3.657928 drft 0.007104 cmpl 0.022846
Mar  1 22:21:12 pdp11 March  1 22:21:12 ntpd[101]: adjust: STEP st 3 off 0.259591 drft 0.007104 cmpl 0.022846
Mar  1 22:44:07 pdp11 March  1 22:44:07 ntpd[101]: Lost reachability with
Mar  1 22:46:20 pdp11 March  1 22:46:20 ntpd[101]: adjust: STEP st 3 off 7.093814 drft 0.007104 cmpl 0.022846
Mar  1 22:54:53 pdp11 March  1 22:54:53 ntpd[101]: adjust: STEP st 3 off 0.730772 drft 0.007104 cmpl 0.022846
Mar  1 23:03:26 pdp11 March  1 23:03:26 ntpd[101]: adjust: STEP st 3 off 0.366301 drft 0.007104 cmpl 0.022846
Mar  1 23:12:02 pdp11 March  1 23:12:02 ntpd[101]: adjust: STEP st 3 off 3.575523 drft 0.007104 cmpl 0.022846
Mar  1 23:18:26 pdp11 March  1 23:18:26 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 1.043266 SYNC 3
Mar  1 23:20:35 pdp11 March  1 23:20:35 ntpd[101]: adjust: STEP st 3 off 1.043266 drft 0.007104 cmpl 0.022846
Mar  1 23:29:11 pdp11 March  1 23:29:11 ntpd[101]: adjust: STEP st 3 off 4.085119 drft 0.007104 cmpl 0.022846

There is also a little story to tell about ntpd – just after the SLIP link started working, I noticed that ntpd would crash just after establishing the sync. That turned out to be caused by a minor problem in the load/convert integer to float instruction – LDCLF, in this case. The error was that I set the length of the long integer to 16 bits whenever R7 was used in the source – but that rule should only be applied when the mode is 2. That caused a divide-by-zero. Luckily, I’m getting fairly good with adb… But what is still a bit amazing is that this error is quite an obvious one – and yet none of the 11/34, 11/44, 11/45, 11/70, or J-11 MAINDEC tests I ran for the FP11 detected it. Can’t really complain about that, of course – these test programs were obviously not designed for finding bugs in a VHDL CPU almost 40 years later.
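
To make the rule concrete, here is a minimal Python sketch of the operand length decision, in the buggy and in the corrected form. It only models the rule as described above; the function names and the idea of returning a bit count are invented for illustration and do not come from the VHDL.

```python
def int_operand_bits_buggy(mode, reg, long_int):
    """Buggy rule: any source using R7 shortens a long integer to 16 bits."""
    if not long_int:
        return 16
    if reg == 7:
        return 16
    return 32


def int_operand_bits_fixed(mode, reg, long_int):
    """Corrected rule: only mode 2 with R7 shortens a long integer to 16 bits."""
    if not long_int:
        return 16
    if reg == 7 and mode == 2:
        return 16
    return 32


# Mode 6 with R7 is an ordinary memory operand, so a long integer there
# should keep its full 32 bits.
print(int_operand_bits_buggy(6, 7, True), int_operand_bits_fixed(6, 7, True))
```

The two versions only disagree when R7 is used with a mode other than 2, which may be part of why such a case is easy to miss in a test suite.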

So, I’m busy adding the new serial controller to all the board level files. And I’ve made some changes to the clock controller as well – it will be configurable for 50 or 60Hz. And I’ve ordered some new toys – a Nexys3 board and the PMODNIC100. Not sure yet if I’ll turn that into a DEUNA. As I said before, that’s a lot of work, and I’m not sure I like the DEUNA. I will start working on improving the RH controller though – to make the SD card interface work separately from the rest of the controller, so that the bus can be released while the card is busy. That will make systems much more responsive during disk activity.

That’s all for now!

Finding the final bug

Since the 11-11-11 announcement of the project, nothing much happened for quite a while. Or so it probably seems.

Actually, I’ve worked on-and-off (mostly off, to be honest) on the minor, but still very annoying bug in 2.11 BSD – I mean the bug that crashed the C compiler when doing a kernel build.

I had already looked into the bug a bit before the 11-11-11 announcement, but with the somewhat complex structure of the C compiler, I did not easily find what was going wrong and where – considering that the exact message was just “Fatal error in /lib/c0”, there was no easy clue to start looking into. And the failed instruction the core dump pointed to made no immediate sense either.

The 2.11 BSD images I used were the RP06 disk image from PUPS, and the RK image composed by Walter Müller. The RP06 image is a bit flaky in some areas, for instance it has a mix of binaries including the short- and long versions of UT_NAMESIZE that affect the passwd bdb, so it’s not easily possible to change the shell for the root user. And also I suspect the /usr/include contents are not consistent. Walter’s images don’t have that kind of problem, but lack the sources – there simply is no room for them anywhere – but also, the source images that I tried were not consistent with Walter’s kernel and what came with it in terms of /usr/include. So, I did not have a set of reliable sources that I could consistently use to reproduce the bug; but, at one point I was playing with it, and I noticed that if a kernel build crashed on one image, I could move it to the other and it would not crash at the same point. So, by moving the build between the RP image and Walter’s RK set, I could actually complete a build – and the resulting kernel image would appear to boot somewhat ok-ish – at least, it would not be completely broken. I was completely flabbergasted at this point, and decided to concentrate on other things for a while – and basically forgot about this strange phenomenon that I could not explain. In retrospect though, it was an important clue – I should have remembered it. Well, hindsight is always 20/20.

Anyway, a couple of weeks later I was playing with Unix V7 – which worked fine, I thought. But I noticed that there was a similar problem with the C compiler, only not while building the kernel, but while building the commands in /usr/src/cmd. Some of the commands living in their own subdirectory failed. One example is the build of the eqn command, which I then used many times as a test to reproduce the problem:

# cd /usr/src/cmd/eqn
# make
yacc -d e.y

conflicts: 85 shift/reduce, 71 reduce/reduce
mv y.tab.c e.c
mv y.tab.h e.def
yacc -d e.y

conflicts: 85 shift/reduce, 71 reduce/reduce
mv y.tab.c e.c
cc -O -c e.c
cc -O -c diacrit.c
cc -O -c eqnbox.c
cc -O -c font.c
cc -O -c fromto.c
cc -O -c funny.c
cc -O -c glob.c
cc -O -c integral.c
cc -O -c io.c
cc -O -c lex.c
cc -O -c lookup.c
cc -O -c mark.c
cc -O -c matrix.c
cc -O -c move.c
cc -O -c over.c
cc -O -c paren.c
cc -O -c pile.c
cc -O -c shift.c
Fatal error in /lib/c1
*** Error code 8


I verified the same set with SIMH – but there the compiler worked flawlessly. In all images that I could find or create, SIMH worked flawlessly, but my VHDL consistently failed, and it always failed in the same places. So, I started to try and find out more about the problem, beginning by varying things in the setup. The first attempt was to see if it might have something to do with timing – so, I ran the same setup at 1MHz instead of 12. Same thing. Next, I checked whether it made a difference to run the same test on an Altera board or a Xilinx board, or on a board with dram or one with sram. No difference at all – so the problem would very likely have to be in the VHDL.

I still did not have any useful idea on where to start looking. So, after a while, I came up with the idea to include some logic in the CPU core to show a signal outside the FPGA, to trigger my logic analyzer. I started at the instruction that consistently featured in the /lib/c1 core dumps – the mov 0104216(r0),r3.

Like so:

   when state_src6 =>
      if ir(8 downto 6) = "111" then
         addr_indirect <= unsigned(datain) + unsigned(rbus_data_p2);
      else
         addr_indirect <= unsigned(datain) + unsigned(rbus_data);
      end if;
      if rbus_data = "0010000001101110" and r7 = x"334C" then
--       no state transition - halt cpu
      else
         state <= state_src6a;
         r7 <= r7p2;
      end if;

So, effectively the CPU would halt at the point where the mov 0104216(r0),r3 instruction was executed – after several tries, I found that that was specific enough to allow Unix to boot and run everything up to the compiler run, and still consistently trigger the logic analyzer. But, to my disappointment, the ~4000 clock transitions that my logic analyzer can capture were not sufficient to show the problem – I traced back from where the trigger occurred and checked all the instructions, one by one, but all appeared to be processed correctly; at least, I did not see anything going wrong.

During the work with the logic analyzer, I also looked into the exact code that was breaking in the C compiler. It was the c12.c source file, and specifically this part:

struct tnode *atree;
{
        struct { int intx[4]; };
        register op, dope;
        int d1, d2;
        struct tnode *t;
        register struct tnode *tree;

        if ((tree=atree)==0)
                return(0);
        if ((op = tree->op)==0)
                return(tree);
        if (op==NAME && tree->class==AUTO) {
                tree->class = OFFS;
                tree->regno = 5;
                tree->offset = tree->nloc;
        }
        dope = opdope[op];

Here, the opdope[op] expression is the interesting part, because the index, op, was out of range and thus caused the problem. This is more easily understood by reading the compiled form of the same bit of the compiler source, especially the last two instructions, asl r0 and mov _opdope(r0),r3:

jsr     r5,csv
jbr     L1
mov     4(r5),r2
jne     L4
clr     r0
jbr     L3
L4:mov  (r2),r4
jne     L5
mov     r2,r0
jbr     L3
L5:cmp  $24,r4
jne     L6
cmpb    $13,4(r2)
jne     L6
movb    $24,4(r2)
movb    $5,5(r2)
mov     10(r2),6(r2)
L6:mov  r4,r0
asl     r0
mov     _opdope(r0),r3

What I also found out while looking into the structure of the C compiler was that Unix V7 already included a very advanced debugger: adb, which you can use to examine core dumps. It produced the following:

# adb /lib/c1 core
ps      0170000
pc      031512  ~optim+076
sp      0175234
r5      0175252
r4      010067
r3      0113544 _end+0154
r2      052
r1      0
r0      020156
~optim+076:     mov     0104216(r0),r3
c routine not found

The first thing to check in the adb output was the instruction itself, and the ones preceding it. Obviously, for this value of r0 a mode-6 access with the index in r0 would be out of range for a normal array – which is what the opdope[op] expression implies, and which I could easily verify from the C language sources. The shift instruction converting the word index in r4 to the byte offset in r0 had also worked correctly – after all, 2 times 010067 is exactly 020156. So, from the adb output, and also by looking at the logic analyzer, it was becoming clear that the problem actually had to be something else: the stack was corrupted, because there was no way that the op value of 010067 could be correct – at the very least, the high order bit was not correct, but even if the offending bit were ignored, 067 also did not make sense looking at the potential values for the field. Equally obviously, the 052 (octal) in r2 could not be a valid value for a pointer into the heap. But still, looking into the logic analyzer output and all of the instructions in the assembler output from the compiler, I could not see how or why the stack pointer would become corrupted – it simply happened outside of the tiny viewport of the ~4000 clock transitions that the logic analyzer recorded.
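
The octal arithmetic here is easy to get wrong by hand, so here is a small Python check of the numbers from the adb dump and from the trigger condition in the VHDL. The helper names are made up, and reading the x"334C" constant as "r7 already points just past the instruction word" is an interpretation, not something the dump states.

```python
# Values taken from the adb register dump and the VHDL trigger condition.
r4_op = 0o010067        # op, as left in r4
r0_index = 0o020156     # r0 after "asl r0"
opdope_base = 0o104216  # index word of "mov 0104216(r0),r3"
pc_at_fault = 0o031512  # address of the faulting instruction

MASK16 = 0xFFFF


def asl(value):
    """16-bit shift left by one, as 'asl r0' does."""
    return (value << 1) & MASK16


def mode6_address(base, index):
    """Effective address of a mode-6 (indexed) operand: index word plus register."""
    return (base + index) & MASK16


shifted = asl(r4_op)
effective = mode6_address(opdope_base, r0_index)
trigger_r0 = int("0010000001101110", 2)  # rbus_data constant in the trigger
trigger_r7 = 0x334C                      # r7 constant in the trigger

print(oct(shifted), oct(effective), oct(trigger_r0), oct(trigger_r7))
```

So the shift is indeed correct, the access lands at 0124374, and the two constants in the trigger are simply r0 = 020156 and r7 = 031514, two bytes past the start of the instruction.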

So, effectively, I was still more or less at the same point that I started out at. Even though I had spent considerable time working on it, the problem had not become any clearer, I had found no theory to chew on, and really the only definite clue I had was that “it did not work” and the vague notion that it probably had something to do with stack corruption. But for that kind of problem, there are many possible explanations – stack corruption can be the end result of many different kinds of errors. So, at that point I was somewhat discouraged and again decided to work on other things for a while.

That’s where I more or less forgot about the whole problem for a couple of weeks. I concentrated on some of the other things I like to do for a while – like, working out in the climbing gym. So for some weeks I was not really thinking about the problem at all, spending a lot of time on other things, and maybe only playing with the PDP stuff in some lost moments. Until the day that I was climbing something slightly too difficult, and hurt a pulley in one of my fingers. And decided that it would be a good idea to take a rest from climbing training for some days to allow my finger to heal somewhat – or to find out if it was an injury to be worried about, because at one point it really hurt bad. So suddenly I was confronted with a whole weekend I did not have any plans for – I had no trip planned to anywhere, the weather was not inviting to go outside, and I would certainly not go to the gym for training. To cut the story short, I decided to spend the entire weekend on debugging instead.

So on the Friday night, I started thinking about where to begin work. I had already looked into the set of instructions that were used in the C compiler assembler source, and had found no obvious candidates to corrupt the stack – except perhaps the jsr instruction. However, that instruction had already taken the lead role in the last major debugging exercise – it was, after all, the vexing jsr r6 problem which caused compiled Fortran code to break on RT-11. And thus also broke sysgen and much more importantly Dungeon. So I thought I would not have to look at the jsr instruction, because I had already examined it ad nauseam.

So, even though I thought that there could be no problem there, I had already looked at the jsr, the differences with the implementation in SIMH, and also the formation of the MMU SR1 for jsr. That SR1 update, I noted at some point, was missing in the VHDL, while SIMH did include it. So I had already added a tentative fix in the VHDL:

                  when state_jsr =>
                     rbus_ix <= "110";
                     state <= state_jsra;

                  when state_jsra =>
                     addr_indirect <= rbus_data_m2;
                     rbus_waddr <= pswmf(15 downto 14) & "0110";
                     rbus_d <= rbus_data_m2;
                     rbus_we <= '1';
                     sr1_dstd <= sr1_m2;
                     rbus_ix <= ir(8 downto 6);
                     state <= state_jsrb;

                  when state_jsrb =>
                     if ir(8 downto 6) /= "111" then
                        rbus_waddr <= pswmf(15 downto 14) & pswmf(11) & ir(8 downto 6);
                        rbus_d <= r7;
                        rbus_we <= '1';
                     end if;
                     r7 <= dest_addr;
                     state <= state_ifetch;

but that did not make any difference. And that was something that had already vexed me - why had Bob Supnik included the update of the SR1, if apparently none of the operating systems really needed it to run? I did not know what to think of it, really - so I started looking into the order in which the SR1 is formed, because I knew that was a difference between some of the PDP models and my VHDL - the rule appears to be that 'the register that is modified first goes into the lower byte, and the other goes into the upper byte'. But different models update the registers in different orders... so which version to follow?
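
As a rough illustration of that rule, here is a Python sketch of how two register changes could be packed into SR1. It assumes the usual SR1 layout of a register number in the low three bits of each byte and the change as a 5-bit two's complement value in the upper five bits; the function names are hypothetical, and which register a given PDP model records first is exactly the open question above.

```python
def sr1_entry(reg, delta):
    """One SR1 byte: 5-bit two's complement change in bits 7..3, register in bits 2..0."""
    if not 0 <= reg <= 7:
        raise ValueError("register number must be 0..7")
    if not -16 <= delta <= 15:
        raise ValueError("change must fit in 5 bits")
    return ((delta & 0o37) << 3) | reg


def sr1_record(sr1, reg, delta):
    """Record a change: the first one lands in the low byte, the next in the high byte."""
    entry = sr1_entry(reg, delta)
    if sr1 == 0:
        return entry
    return ((entry << 8) | sr1) & 0xFFFF


# A stack push alone: r6 changed by -2.
push_only = sr1_record(0, 6, -2)

# r2 incremented by 2 first, then the stack push.
both = sr1_record(sr1_record(0, 2, +2), 6, -2)

print(oct(push_only), hex(both))
```

With this packing, a lone stack push gives 0366, the value I would expect sr1_m2 to stand for in the VHDL.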

That is when I realized that by the fix I had applied I wrote the sr1_dstd field - but, since the state_jsr follows the destination address calculation, that field is already potentially used by the state_dstX FSM states. I quickly changed the VHDL to:

                  when state_jsr =>
                     rbus_ix <= "110";
                     state <= state_jsra;

                  when state_jsra =>
                     addr_indirect <= rbus_data_m2;
                     rbus_waddr <= pswmf(15 downto 14) & "0110";
                     rbus_d <= rbus_data_m2;
                     rbus_we <= '1';
                     sr1_srcd <= sr1_m2;
                     rbus_ix <= ir(8 downto 6);
                     state <= state_jsrb;

                  when state_jsrb =>
                     if ir(8 downto 6) /= "111" then
                        rbus_waddr <= pswmf(15 downto 14) & pswmf(11) & ir(8 downto 6);
                        rbus_d <= r7;
                        rbus_we <= '1';
                     end if;
                     r7 <= dest_addr;
                     state <= state_ifetch;

and reran the usual test - flash the Unix V7 image, because it would be corrupt after failing a test, then booting it, changing to /usr/src/cmd/eqn, and running make. Even though I had not really expected the test to pass outright, I was still somewhat disappointed that it failed - but, it failed in a different way - so I knew that I had somewhat affected the nature of the problem. Now, it crashed in the same compilation, but apparently earlier in the same run - or at least, at an instruction closer to the start of the /lib/c1 executable.

So that puzzled me for a while. Then it occurred to me that this was likely a stack issue: probably a jsr instruction whose stack push caused the MMU to abort, after which the instruction restart routines would kick in and somehow fail. So it would make sense to look into the order of the updates to the registers during the jsr instruction.

Oops. If the stack push failed, the CPU state machine would, during the memory access that was about to be aborted, already have progressed to the state where it updates the target register of the instruction. So, even though the jsr instruction was aborted because of the stack push, the target register would still be written - and thus, after the instruction restart carefully constructed by the OS fault handler, the restarted instruction would see the wrong value in the target register. In any case but jsr pc, that would mean that the target register was updated twice, but with only one update to the stack - so the original value of the register would be lost. Ouch. Once I saw that, the fix was easy: just introduce a new intermediate state, so that the update to the stack (which may cause the MMU to abort) and the update to the target register occur in different cycles - so that if the memory cycle causes an abort, no update to the target register will have been initiated. As follows:

                  when state_jsr =>
                     rbus_ix <= "110";
                     state <= state_jsra;

                  when state_jsra =>
                     addr_indirect <= rbus_data_m2;
                     rbus_waddr <= pswmf(15 downto 14) & "0110";
                     rbus_d <= rbus_data_m2;
                     rbus_we <= '1';
                     sr1_srcd <= sr1_m2;
                     rbus_ix <= ir(8 downto 6);
                     state <= state_jsrb;

                  when state_jsrb =>
                     state <= state_jsrc;

                  when state_jsrc =>
                     if ir(8 downto 6) /= "111" then
                        rbus_waddr <= pswmf(15 downto 14) & pswmf(11) & ir(8 downto 6);
                        rbus_d <= r7;
                        rbus_we <= '1';
                     end if;
                     r7 <= dest_addr;
                     state <= state_ifetch;
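
The effect of the ordering can be reproduced in a toy model. The sketch below is Python, not VHDL; the Cpu class, the addresses and the way the fault handler simply grows the stack are all invented for illustration, and the real sequence (restoring registers from SR1, the trap itself) is far more involved.

```python
class StackFault(Exception):
    """Stands in for an MMU abort on the stack push."""


class Cpu:
    def __init__(self, r5, sp, stack_limit):
        self.r5 = r5
        self.sp = sp
        self.pc = 0
        self.mem = {}
        self.stack_limit = stack_limit  # lowest address that is currently mapped

    def push(self, value):
        addr = self.sp - 2
        if addr < self.stack_limit:
            raise StackFault()  # abort before anything is written
        self.sp = addr
        self.mem[addr] = value


def jsr_r5(cpu, dest, return_pc, buggy):
    """jsr r5,dest: push old r5, put the return address in r5, jump."""
    old_r5 = cpu.r5
    if buggy:
        cpu.r5 = return_pc  # register write already started...
        cpu.push(old_r5)    # ...when the push aborts
    else:
        cpu.push(old_r5)    # push first; an abort leaves r5 untouched
        cpu.r5 = return_pc
    cpu.pc = dest


def run_with_restart(buggy):
    """Run one jsr that faults once, then restart it; return the pushed value."""
    original_r5 = 0o175252
    cpu = Cpu(r5=original_r5, sp=0o1000, stack_limit=0o1000)
    try:
        jsr_r5(cpu, 0o3000, 0o2004, buggy)
    except StackFault:
        cpu.stack_limit -= 0o100            # fault handler grows the stack
        jsr_r5(cpu, 0o3000, 0o2004, buggy)  # instruction restart
    return cpu.mem[cpu.sp]


print(oct(run_with_restart(buggy=False)), oct(run_with_restart(buggy=True)))
```

In the buggy ordering the restarted instruction pushes the return address instead of the original r5, so the caller's r5 is gone for good, which is the kind of damage that only shows up much later as a corrupted stack frame.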

After applying the fix, I was somewhat surprised - but happily so - that the problem in the make eqn test was apparently fixed. The realization that I had found the problem and actually fixed it came as something of a shock. And, going back to my plan of spending the entire weekend chewing on the problem, by the time I found the fix, it was only 10PM on the Friday night. Quickly I started test runs on several boards in parallel - so I had Unix V7 running all kinds of work on three different boards, and was working on another to check 2.11BSD. That one also now ran the commands that had previously crashed every time, and repeatedly completed kernel builds - in short: everything worked flawlessly.


So, I spent most of the night in the enlightened state that you can only reach when something of an incredibly complex technical nature unexpectedly works like a charm. Watching how it works over and over again, hours on end. In a state somewhere between disbelief and utter amazement, just looking at what I had created. If there is a heaven for geeks, it must be something very close to this.

So the next morning I had no plan for the rest of the weekend. Well, to be honest, by the time I woke up normal people were already well into their notion of afternoon. And this day, for the first time this season there was snow - and a lot of it too, 4 inches easily, but by the time I woke up the sun was shining and the sky was clear. So I took my camera and went out into the nature reserve that I consider my back yard and spent some time watching an incredible sunset over a frozen plain seemingly inhabited only by wildlife and me.

Some days, life really is wonderful.