Over the last few months, I ran into a problem a couple of times where 2.11BSD would lose its network connection, reporting that there were no transmit buffers available on the DEUNA. All in all, I’ve seen this problem three or four times over the last year, but maybe ten times in the last month or so. No idea why – the only thing that changed is that I have a new Ethernet switch, which could make for a subtle change in the timing.
Anyway, now that it occurred more often, I also had the opportunity to find out what was actually wrong. I enabled the debug code in the if_de.c driver for the DEUNA and added some more debug statements. The next morning I was surprised by the debug output – it showed that in effect all transmit buffers were free…
So, that got me thinking about the interrupt controller core in the DEUNA. It contained a somewhat strange edge-trigger construction that could potentially result in a deadlock. I changed it, and set up my venerable old 20 MHz oscilloscope to show the interrupt signals – br and bg.
This time I had to wait three days for the problem to occur again – and to my disappointment, it did, and it still locked up in the same way. However, it was also clear that no interrupts were taking place, so I was definitely looking in the right place – the interrupt controller was maybe not locked up itself, but even so it was not generating any interrupts. More evidence against the interrupt controller and the edge-trigger in it.
A couple of experiments showed that an easy solution would be to generate interrupts on the level instead of the edge. But this would also cause the DEUNA to keep on interrupting until the software disabled interrupts or cleared the originating bit. Not very elegant, but it did work – and after some time, I realised that in all likely scenarios the software will examine the interrupt bits, and most likely reset them. So would it maybe work if I went back to the edge triggering system, and reset the trigger on writes into the PCSR0 register?
Of course it did.
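To make the difference concrete, here is a small behavioural sketch in Python. It is not the actual controller logic, just a toy model under a few assumptions: the status bits are treated as write-one-to-clear, the bit values are made up, and the names (InterruptModel, race, rearm_on_write) are hypothetical. It shows one plausible way an edge trigger can lose an interrupt: a second event arrives while the driver is still handling the first, so the combined interrupt line never goes low and no new edge is seen.

```python
class InterruptModel:
    """Toy model of an edge-triggered interrupt latch (not the real core)."""

    def __init__(self, rearm_on_write):
        self.rearm_on_write = rearm_on_write
        self.status = 0      # pending interrupt-cause bits (hypothetical layout)
        self.armed = True    # edge detector is ready to fire
        self.delivered = 0   # interrupts actually seen by the CPU

    def _evaluate(self):
        if self.status == 0:
            self.armed = True        # line went low: next rise is a new edge
        elif self.armed:
            self.armed = False       # fire once per rising edge
            self.delivered += 1

    def device_sets(self, bits):
        """The device raises one or more interrupt-cause bits."""
        self.status |= bits
        self._evaluate()

    def driver_writes(self, bits):
        """The driver writes the status register (assumed write-one-to-clear)."""
        self.status &= ~bits
        if self.rearm_on_write:
            self.armed = True        # the fix: a write re-arms the trigger
        self._evaluate()


def race(rearm_on_write):
    """Second event arrives between the driver's read and its clearing write."""
    m = InterruptModel(rearm_on_write)
    m.device_sets(0b01)      # first event, interrupt is delivered
    seen = m.status          # driver reads the status bits in its handler
    m.device_sets(0b10)      # second event sneaks in, line is already high
    m.driver_writes(seen)    # driver clears only the bits it saw
    return m.delivered, m.status


print(race(rearm_on_write=False))   # pure edge trigger
print(race(rearm_on_write=True))    # edge trigger re-armed on register writes
```

In the pure edge-triggered run, the second bit stays pending but is never delivered, which matches the symptom of free transmit buffers with no interrupts. With the trigger reset on the write, the still-pending bit produces a fresh interrupt, without the continuous interrupting of the level-triggered variant.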
And one other minor thing comes to mind: I keep saying DEUNA, but it’s actually a DELUA now. The difference is only in the PCSR1 ID bits; no logic has been changed at all. I did this because DECnet on RSX-11M-Plus tries to load microcode into the DEUNA – which will not work because in reality of course the controller does not look like a real DEUNA at all. But it will leave a DELUA alone. And because all the other software – 2.11BSD and RSTS – does not seem to distinguish between DEUNA and DELUA, there seems to be no reason not to change the thing into a DELUA.
I changed several subtle things in the microcode as well, mostly around buffer chaining and resetting the chip if it becomes disconnected for some reason. Buffer chaining is probably still not correct, but it doesn’t really seem to be used extensively by the operating systems – the code only seems to be triggered when broadcast frames arrive that are longer than what the network stack expects.
The updates – including the fix for RSX-11M-Plus – are on the download page now, and several pregenerated bitstreams as well. Enjoy!