The VM.SYS issue in MINC

One of the leftover issues for MINC was how the install disks would set up a MINC system for booting, including setting up the VM.SYS driver to cache the MINC BASIC software so that it would load faster after a round trip to the (modified) RT-11 utilities like PIP and MEDIT.

In short, the original install procedure would hang at the end of startup after booting the newly installed disk image. Depending on the variant (11/03 or 11/23, Y or N to the VM question), you might be able to use ^C to get out of the hang. But there was an easy workaround: just don’t use VM.SYS.

Still, it was a bit unsatisfactory to know that there was a mismatch between PDP2011 and how the ‘real’ MINC systems used to work, and that the original images didn’t ‘just work’. So today I set out to find exactly what was going on, and with some luck I found it.

It was obvious from the start that the issue would be somewhere in the MMU, and instrumenting the core to log all MMU settings confirmed it: the code would trap, but unexpectedly on a page length error, and in a loop too. Weird: why would that happen?

Looking a bit closer, the first trap occurred on a write to address 17777572 (octal), which sets bit 0, the enable bit, in the MMU SR0 register. Turning on the MMU immediately caused it to trap, in the same cycle!
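To make the page length error a bit more concrete, here is a minimal sketch of the standard PDP-11 length check applied to the virtual address of SR0. It assumes, hypothetically, that the descriptor for page 7 (PDR7) still holds zero at that point, so the page length field (PLF) is 0 and the page expands upward. The function names are made up for illustration and are not taken from the PDP2011 sources.

```python
# Hypothetical sketch: the PDP-11 page length check for the SR0 address,
# assuming PDR7 holds 0 (PLF = 0, upward expansion). Only the length
# check is modelled; the access control field in the PDR is ignored.

SR0_VIRT = 0o177572  # SR0 as seen in the 16-bit address space


def page_and_block(va: int) -> tuple[int, int]:
    """Split a 16-bit virtual address into page number and block number."""
    page = (va >> 13) & 0o7      # bits 15:13 select one of 8 pages
    block = (va >> 6) & 0o177    # bits 12:6 are the 64-byte block number
    return page, block


def length_error(va: int, pdr: int) -> bool:
    """True if va lies outside the page length encoded in pdr."""
    _, block = page_and_block(va)
    plf = (pdr >> 8) & 0o177     # page length field, bits 14:8
    downward = bool(pdr & 0o10)  # ED, bit 3: expansion direction
    return block < plf if downward else block > plf


page, block = page_and_block(SR0_VIRT)
print(page, oct(block))            # 7 0o175
print(length_error(SR0_VIRT, 0))   # True: block 125 is beyond PLF 0
```

With the enable bit taking effect in the same cycle, the very write that sets it is already checked against such a descriptor, which fits the page length trap seen in the instrumented core.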

In the PDP2011 design, writes to SR0 take effect immediately. For all the operating systems I’ve looked at so far, that works just fine, because those OSes map the MMU registers before activating address translation. So the write to the register is allowed both before and after translation is switched on.
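The usual setup can be sketched as follows. This is a simplified model, assuming 22-bit mapping and looking only at page 7: with the MMU off, virtual 160000 to 177777 is hard-wired to the I/O page, and with the MMU on, the conventional page address register value of 177600 (the I/O page base in 64-byte units) sends the same virtual address to the same physical register. The function name is hypothetical.

```python
# Hypothetical sketch of 22-bit relocation for page 7 only, showing why
# PAR7 = 177600 keeps SR0 at the same physical address whether the MMU
# is on or off.

IO_PAGE_BASE = 0o17760000  # start of the 8 KB I/O page in 22-bit space


def physical(va: int, mmu_on: bool, par7: int) -> int:
    """Translate a 16-bit virtual address in page 7 to a 22-bit address."""
    assert (va >> 13) == 7, "this sketch only models page 7"
    if not mmu_on:
        # Relocation off: the top 8 KB of the 16-bit space is the I/O page.
        return IO_PAGE_BASE | (va & 0o17777)
    # Relocation on: the PAR holds the page base in 64-byte units.
    return ((par7 << 6) + (va & 0o17777)) & 0o17777777


sr0 = 0o177572
print(oct(physical(sr0, False, 0)))        # 0o17777572
print(oct(physical(sr0, True, 0o177600)))  # 0o17777572
print(oct(physical(sr0, True, 0)))         # 0o17572, i.e. plain memory
```

The last line shows the risk: with an unprepared PAR7, the same instruction that was addressing SR0 a moment ago would point somewhere else entirely once translation is on (or trap, if the descriptor does not allow the access).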

Not so for the MINC startup. It turns on the MMU without first making sure the MMU registers are mapped, which is something of a suicide move: either it works or it crashes itself. Once I understood what was happening, the fix was easy enough: delay the update to SR0 until the next cycle, at the cost of a one-bit register.

For now, I’ve implemented the fix very specifically for the 11/23 and MMUSR0 bit 0; all other CPU types and all other SR0 bit updates are unaffected. Only setting that one bit is delayed, so that the MMU is turned on only after the instruction that wrote 1 to SR0 bit 0 has completed. Turning the MMU off, in contrast, is still immediate, as are all other updates to SR0.
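As a rough behavioral model of that rule, the sketch below keeps a one-bit `enable_pending` flag next to SR0. It is an illustration in Python under the assumptions stated in the comments, not the actual PDP2011 VHDL; the class, field, and method names are all made up.

```python
# Hypothetical behavioral model of the delayed-enable rule: on an 11/23,
# a write that sets SR0 bit 0 is parked in a one-bit pending register and
# only takes effect once the current instruction has completed. Clearing
# bit 0, writes to other bits, and other CPU types stay immediate.
from dataclasses import dataclass

SR0_ENABLE = 0o1  # SR0 bit 0: enable relocation


@dataclass
class Mmu:
    cpu_type: int = 23
    sr0: int = 0
    enable_pending: bool = False  # the extra one-bit register

    def write_sr0(self, value: int) -> None:
        turning_on = bool(value & SR0_ENABLE) and not (self.sr0 & SR0_ENABLE)
        if self.cpu_type == 23 and turning_on:
            # Store all other bits now, but defer the enable bit.
            self.sr0 = value & ~SR0_ENABLE
            self.enable_pending = True
        else:
            # Turning off, other bits, other CPU types: immediate effect.
            self.sr0 = value
            self.enable_pending = False

    def end_of_instruction(self) -> None:
        # Called once the instruction that wrote SR0 has completed.
        if self.enable_pending:
            self.sr0 |= SR0_ENABLE
            self.enable_pending = False

    @property
    def relocating(self) -> bool:
        return bool(self.sr0 & SR0_ENABLE)


mmu = Mmu()
mmu.write_sr0(0o1)
print(mmu.relocating)  # False: the write itself still runs unmapped
mmu.end_of_instruction()
print(mmu.relocating)  # True
mmu.write_sr0(0)
print(mmu.relocating)  # False: turning off is immediate
```

Keeping the deferral to the single case of bit 0 going from 0 to 1 mirrors the conservative scope described above: everything else behaves exactly as before.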

It’s a fair question whether the fix should be more generic, but at this point the only answer I have is ‘I don’t really know’. All other OSes seem to map MMUSR0 to the same address with and without translation, which is of course the obvious and natural thing to do, because it saves you from having to use two different addresses for it in your code. The definitive answer lies in the details of the hardware and microcode of all the ‘real’ PDP-11s, but the general practice of mapping the MMU registers before turning on the MMU already tells me most of what I need to know.

In short, it needs a lot of testing and verification. The good news is that PDP2011 is a very, very tiny bit closer to the real thing. I won’t claim it’s at 100%, even after 15 years of running just about every piece of software mostly without issues, but it’s getting very, very close.

As usual for this kind of update, I’ll hold off on updating the download page until I’ve had the chance to run tests for quite a while, including regression testing on all kinds of weird setups. There’s also an upcoming update to the GPIB core to support active drivers. It will probably take a couple of weeks. Rushing things seems wrong when you’re working on systems that are more than 50 years old.
