Ingres!

After the summer, I’ve picked up work on the interface to Oscar Vermeulen’s PiDP11 console – what was left to do was the virtual settings on the address rotary switch and the actual values on the address and data lights. It mostly works now, and I’ve come to the point that I need to take a step back from it, let it rest for a while and come back to it in a couple of days, maybe a week or so – to avoid getting blind to the things that aren’t right yet. Meantime I’ve sent a preview to a beta tester, and I’m anxiously awaiting his comments…

So now it’s time to just play with the machine! and the first thing on my mind to dive into was Ingres. One of the oldest real relational database systems, and with a long and rich history. I knew it was included in 2.11BSD, but when I tried it out years ago when I first got 2.11BSD to run, it didn’t work… all the commands core dumped. So it needed a bit more work – and after quite a bit of tinkering and experimenting, it turned out to be quite easy – as usual if you know the answer. At first, I tried rebuilding the Ingres sources as root, but that doesn’t work quite right – it can be done, but it’s a lot easier to run the make as the ingres user.

So, what needs to be done is this:

  1. Reconfigure the kernel to include the Ingres lock driver – in other words, the INGRES option (on the last line of the config file) should be set to YES. And obviously then recompile the kernel, install it and reboot the machine – and all of that using root, as usual.
  2. Log in as the ingres user, change into the source directory, and run make – if you thought the kernel took a bit to recompile, well, this takes a bit longer.
  3. Change into the demo directory, and create the demo database by running ./demodb demo

And after that, the famous ’emp’ tables are ready for use. One surprise though – I must have known this in the day, but I forgot – this version of Ingres doesn’t use SQL, but its own language: QUEL. So ‘select * from emp’ doesn’t work, I had to use some of the examples from the manual.

* range of e is employee
* retrieve (e.all) 
* \g
Executing . . .


|number|name                |salary|manage|birthd|startd|
|-------------------------------------------------------|
|   157|Jones, Tim          | 12000|   199|  1940|  1960|
|  1110|Smith, Paul         |  6000|    33|  1952|  1973|
|    35|Evans, Michael      |  5000|    32|  1952|  1974|
|   129|Thomas, Tom         | 10000|   199|  1941|  1962|
|    13|Edwards, Peter      |  9000|   199|  1928|  1958|
|   215|Collins, Joanne     |  7000|    10|  1950|  1971|
|    55|James, Mary         | 12000|   199|  1920|  1969|
|    26|Thompson, Bob       | 13000|   199|  1930|  1970|
|    98|Williams, Judy      |  9000|   199|  1935|  1969|
|    32|Smythe, Carol       |  9050|   199|  1929|  1967|
|    33|Hayes, Evelyn       | 10100|   199|  1931|  1963|
|   199|Bullock, J.D.       | 27000|     0|  1920|  1920|
|  4901|Bailey, Chas M.     |  8377|    32|  1956|  1975|
|   843|Schmidt, Herman     | 11204|    26|  1936|  1956|
|  2398|Wallace, Maggie J.  |  7880|    26|  1940|  1959|
|  1639|Choy, Wanda         | 11160|    55|  1947|  1970|
|  5119|Ferro, Tony         | 13621|    55|  1939|  1963|
|    37|Raveen, Lemont      | 11985|    26|  1950|  1974|
|  5219|Williams, Bruce     | 13374|    33|  1944|  1959|
|  1523|Zugnoni, Arthur A.  | 19868|   129|  1928|  1949|
|   430|Brunet, Paul C.     | 17674|   129|  1938|  1959|
|   994|Iwano, Masahiro     | 15641|   129|  1944|  1970|
|  1330|Onstad, Richard     |  8779|    13|  1952|  1971|
|    10|Ross, Stanley       | 15908|   199|  1927|  1945|
|    11|Ross, Stuart        | 12067|     0|  1931|  1932|
|-------------------------------------------------------|

continue
*

A little bit more complex example: calculating the average salary for the employees working for each manager:

* range of e is employee
* retrieve (e.manager, avgsal=avg(e.salary by e.manager))
* \g
Executing . . .


|manage|avgsal    |
|-----------------|
|    10|  7000.000|
|     0| 19533.500|
|    32|  6688.500|
|    33|  9687.000|
|    13|  8779.000|
|    55| 12390.500|
|    26| 10356.333|
|   199| 11117.556|
|   129| 17727.667|
|-----------------|

continue
*

And then of course it would be nice to add another column with the name of the manager. Simple, add a second range variable on the same table and match the number to the manager id:

* range of e is employee
* range of m is employee
* retrieve (m.name, e.manager, avgsal=avg(e.salary by e.manager)) where e.manager=m.number
* \g
Executing . . .


|name                |manage|avgsal    |
|--------------------------------------|
|Ross, Stanley       |    10|  7000.000|
|Smythe, Carol       |    32|  6688.500|
|Hayes, Evelyn       |    33|  9687.000|
|Edwards, Peter      |    13|  8779.000|
|James, Mary         |    55| 12390.500|
|Thompson, Bob       |    26| 10356.333|
|Bullock, J.D.       |   199| 11117.556|
|Thomas, Tom         |   129| 17727.667|
|--------------------------------------|

continue
* 

But, oops. Now we’ve lost manager 0 – because there isn’t a row for manager 0 in the table. Maybe 0 means that there isn’t one, and that it’s the big boss who has manager 0 in the table? That would seem right for J.D. Bullock – he fits all the stereotypes, being the oldest, and earning the most of all employees – and he started working in the company the day he was born. But there’s also Stuart Ross, who started a year later, and earns a lot less. So, I’m not sure – maybe the sample data is intentionally confusing.

Anyway, this case of missing rows in the last query is a nice example of what would be easy to lift out of the data with an outer join, but I have no clue how to do that in QUEL, or if it’s even possible. Nothing to be found in the manuals I’ve seen so far.
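Just to make clear what I would want from an outer join, here is a rough sketch in Python rather than QUEL. It uses the rows from the employee table above and keeps every manager number that occurs, filling in None where no employee with that number exists. The function name and the tuple layout are made up for this illustration; it says nothing about how Ingres itself would do it.

```python
from collections import defaultdict

# (number, name, salary, manager), copied from the employee table above
EMPLOYEES = [
    (157, "Jones, Tim", 12000, 199),
    (1110, "Smith, Paul", 6000, 33),
    (35, "Evans, Michael", 5000, 32),
    (129, "Thomas, Tom", 10000, 199),
    (13, "Edwards, Peter", 9000, 199),
    (215, "Collins, Joanne", 7000, 10),
    (55, "James, Mary", 12000, 199),
    (26, "Thompson, Bob", 13000, 199),
    (98, "Williams, Judy", 9000, 199),
    (32, "Smythe, Carol", 9050, 199),
    (33, "Hayes, Evelyn", 10100, 199),
    (199, "Bullock, J.D.", 27000, 0),
    (4901, "Bailey, Chas M.", 8377, 32),
    (843, "Schmidt, Herman", 11204, 26),
    (2398, "Wallace, Maggie J.", 7880, 26),
    (1639, "Choy, Wanda", 11160, 55),
    (5119, "Ferro, Tony", 13621, 55),
    (37, "Raveen, Lemont", 11985, 26),
    (5219, "Williams, Bruce", 13374, 33),
    (1523, "Zugnoni, Arthur A.", 19868, 129),
    (430, "Brunet, Paul C.", 17674, 129),
    (994, "Iwano, Masahiro", 15641, 129),
    (1330, "Onstad, Richard", 8779, 13),
    (10, "Ross, Stanley", 15908, 199),
    (11, "Ross, Stuart", 12067, 0),
]


def average_salary_by_manager(rows):
    """Return {manager number: (manager name or None, average salary)}."""
    salaries = defaultdict(list)
    for _number, _name, salary, manager in rows:
        salaries[manager].append(salary)
    # Lookup table for the 'm' side of the join
    names = {number: name for number, name, _salary, _manager in rows}
    # names.get() returns None for a manager number without a row,
    # which is what keeps manager 0 in the result
    return {
        manager: (names.get(manager), sum(values) / len(values))
        for manager, values in salaries.items()
    }


if __name__ == "__main__":
    for manager, (name, avgsal) in average_salary_by_manager(EMPLOYEES).items():
        print("%-20s %6d %10.3f" % (name or "(none)", manager, avgsal))
```

The only difference with the last QUEL query is the lookup that tolerates a missing match, so manager 0 stays in the list with an empty name.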

Things are moving!

Blinkenlights, for instance.

It’s so hard to believe that it’s already almost been 3 years since my last post here.

Well, I did have to hack my own site – I didn’t remember the admin password. Still, it’s not really like nothing did happen in the meantime, just nothing that I felt was finished enough to merit a post – like, the experimental work I did on the faster cpu. Or the ideas I had for adding sdhc support. And then there were the preliminary discussions on Oscar’s PiDP11, and whether or not I could interface my vhdl pdp11 to that. Somehow all of that was still in the not-quite-ready-for-posting stage until now… and maybe it’s showing my age, I like to make things public when they’re finished and real, even though the current fashion is to start shouting when you’ve just got a plan but can’t be sure if it’ll ever fly yet.

Anyway, it’s real enough now, I’ve got a few setups blinking their lights at me now. No, the vhdl for the console interface isn’t quite ready yet, but the tricky bits are done. Since the pinouts of Oscar’s PiDP11, and the Raspberry Pi interface to that, don’t quite match the pinouts of the fpga boards, it looks like there’ll have to be a converter board – a ‘shim’, we’re calling it for now. And since there will be a couple pins left on the 40-pin interface, I’ll most likely add the bare essential peripherals to that shim too – most likely it’ll work out to just enough to connect a serial console and a sd card. Center of development now is the DE0-NANO board from Terasic – a big fpga with lots of IO connectors, a very good build quality, and widely available for about USD 80 – it’s unbeatable. But probably most of Terasic’s other boards will do fine as well, if they have two of the 40-pin connectors and if the fpga is big enough – I’m not sure that DE0 (without the -NANO) will still be big enough. Why two of the 40-pin connectors, you might ask, if the console is clearly using just one? well, I’m planning for a peripheral board and that would use the other connector.

Development version of the console
No that isn’t what the console will look like when it’s finished – it’s one of the pre-production boards that Oscar gave me to start development on, and what we call the ‘lab test animal’. Oscar fished it out of the bin for me to play with – the holes are not aligned correctly so it won’t fit nicely in the case, and the leds are not the right colour either, obviously – but it’s just fine for the development I’m now doing. Just to show you the setup that I’m now working on 😉

Functionally, most of the lights and switches already work. Most of the work still to be done is around the rotary switches (with the mmu console modes) and some of the lights that the fpga PDP11 never needed – such as the run/pause/master, for instance. That might still take lots of time, but it’s getting there.

The finished console of course has the nice switches and rich red leds, and the beautiful panel to hide the PCB behind. And of course the custom injection molded  case… check Oscar’s site at http://obsolescence.wixsite.com/obsolescence/pidp-11 to see more.

That’s my setup that’s now running Oscar’s PiDP11 for comparison and for playing, obviously! And, note how the white lamp test switch is not quite aligned with the rest of the switches, that’s entirely my fault in being in too much of a hurry to build the kit…

So that’s it for today, and I’ll try to post a bit more regularly to keep you up to speed on what’s going on with the PDP2011. Over the next weeks I’ll be working to get the vhdl for the front panel working correctly, and also to get the design of the shim finalised. Hope to get that done before the summer comes 😉

Release

Finally, I’ve managed to find the time to finish up the new boot code, test everything, generate new bitstreams for the download page, and update the site.

There are now two different sets of boot roms to choose from. The sources in m9312l46.mac and m9312h46.mac are the – now almost unchanged – DEC M9312 boot roms, as described in the K-SP-M9312 documents you can find on Bitsavers. The only change I made is a tiny one that will allow you to use lower case input – even though the size of the roms is completely filled up by the original code, I found some room by removing a couple of instructions that read the switch settings on the original hardware, which allowed you to select whether or not diagnostics would be run before booting. The PDP2011 does not have these switches – diagnostics will always run.

The second set of boot roms is in the sources m9312l47.mac and m9312h47.mac – I used the additional rom space to make the original boot code a bit more elaborate. It now lists what is  in the device space of the system, before going on to the original way of booting – ie, boot from the first disk of the first controller it finds, in the order RK, RL, RH.

Which of the two sets to choose depends a bit on which kind of configuration you run, and what you’re going to do with it. The DEC version is more flexible, it allows you to boot from whichever disk is in the system – which is very useful if you have made a configuration with more than one disk controller in it. And the load and store commands are very easy to use if you are debugging the interface between the PDP2011 core and memory chips, as you would do when porting the PDP2011 to an FPGA board that I don’t support. On the other hand, if you’re using a simple configuration and are booting it a lot, then the ‘old’ style core is easier – nothing to do, it just boots.

As a side effect of adding the second M9312 boot rom at 165000, in addition to the one already there in the older PDP2011 versions at 173000, the internal bus structure of the system has become larger. No problem for all of the existing board setups that I distribute, except for one – the de0, that was already used to the max of its capacity with the older 1170-rpxunofp setup, now gets seriously cramped for resources. As a consequence, I’ve had to decrease the clock speed somewhat – it now runs at 6.25MHz at the cpu, instead of the 10MHz. Still remarkable, if you consider that the 1170-rpxunofp has 3 actual PDP-11 cpus in it – one for the system itself, one for the DEUNA, and one for the embedded terminal. Interestingly, it seems to be snappier despite the lower clock speed when I access it over the network – it might be that the original clock speeds caused some kind of interference between the DEUNA code and the ENC chip.

I’ve also updated the site in several places, and added a couple of how-to pages to explain how to get started, how to run the system, and how to make your own configuration.

Next thing on the agenda is to redo the sd card core in the disk controllers – clean up the old core, and add sdhc support – which I’ve postponed for a long time already, but since regular sd cards are becoming increasingly difficult to find (and my own stock is also rapidly depleting) this is becoming a priority. I don’t have a plan yet when it will be finished though – as I’ve often said, PDP2011 is something to do in winter, and the only reason that I’ve just now found time to work on it is because of a spell of bad weather in The Netherlands.

Finishing up for today, I thought to give an example of what the device space list looks like with the new boot code. Here it is:

Hello, world [t47]: cpu 11/45 fpu
177776           psw
177774           slr
177772           pirq
177770           mbr
177676 - 177640  par
177636 - 177600  pdr
177576 - 177572  mmu
177570           sdr
177566 - 177560  kl
177546           kw
174406 - 174400  rl
173776 - 173000  m9312
172516           mmu
172376 - 172340  par
172336 - 172300  pdr
172276 - 172240  par
172236 - 172200  pdr
165776 - 165000  m9312

boot from rl:

which of course will look slightly different depending on the configuration.

Booting

Last November, Scott Swazey asked why I made my own boot loader instead of using the original M9312 code.

I knew that the sources for M9312 were available, and I did have a look at them a long time ago. At that point, I was not sure I would ever get the CPU running, let alone booting from disks. And the code looked, well, complex and unlikely to run unless the hardware would mimic the original exactly. Also that seemed hardly possible at that time.

Later, when I got the first disk controller working, I just copied the boot loader from the simh sources – which I studied to get an idea of which parts of the disk controller were essential and which I could skip. And after the second and third disk controllers came into being, I just followed that pattern. Eventually that turned into T44 – the boot loader so far, the one that will announce itself with ‘Hello world’ and then proceed to boot from the first disk on the first available controller it knows about – RK05, RL02, RP06, in that order. Since in most cases the systems have one SD card only, and thus only one controller, that conveniently works for most cases. But Scott was building a system with both an RL and RH controller, so wanting to boot from a specific disk made total sense. So we looked into the challenge of making the original M9312 code work.

The first issue was that the M9312 code used absolute psects – as in, code to be fixed at a specific address. I knew there was an issue with that in my macro11 toolchain, but I never found what it was. Scott found it quickly though, it was a rather embarrassing mistake I made in the replacement of the macro11 linker that I modified to output the VHDL source for the boot roms.

After that, it was surprisingly simple. Just a question of adding the secondary boot rom at 165000, and I restructured the original device boot roms into one source – so it will fit into a single rom image. A bit later, I also changed the interpreter to accept lower case input – the original only works with upper case, which is a bit awkward.

M9312 commands

  • L <octal value> : set address
  • D <octal value> : deposit value at address
  • E <space> : examine data at address
  • S : start program

It also accepts the name of the four device bootroms as command:

  • DL<#> : boot from RL disk #
  • DK<#> : boot from RK disk #
  • DB<#> : boot from RP disk #
  • ZZ : run diagnostic
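To give an idea of how the four basic commands hang together, here is a toy model in Python. It is only a sketch of the command list above: the class name, the dictionary used as memory and the output format of the examine command are all made up, and details of the real rom (how it echoes, what happens on repeated examines or deposits) are deliberately left out. The upper() call stands in for the lower case change.

```python
class ToyConsole:
    """Toy model of the L, D, E and S commands; not the real M9312 code."""

    def __init__(self):
        self.memory = {}        # hypothetical sparse memory: address -> word
        self.address = 0        # address set by the last L command
        self.started_at = None  # address passed to the last S command

    def execute(self, line):
        # Fold to upper case first, so lower case input is accepted too
        text = line.strip().upper()
        command, _, argument = text.partition(" ")
        if command == "L":
            self.address = int(argument, 8)
            return None
        if command == "D":
            self.memory[self.address] = int(argument, 8) & 0o177777
            return None
        if command == "E":
            value = self.memory.get(self.address, 0)
            return "%06o %06o" % (self.address, value)
        if command == "S":
            self.started_at = self.address
            return None
        raise ValueError("unknown command: " + line)


if __name__ == "__main__":
    console = ToyConsole()
    for line in ("l 1000", "d 12345", "e", "s"):
        print(line, "->", console.execute(line))
```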

To make space for the lower case input, I had to remove some of the code from the original interpreter source – a bit of diagnostic code that would run on first boot. That also leaves some leftover room to reintroduce the ‘Hello world’ message – I’ve become used to that, and I’m missing it now. Or maybe some more user friendliness in the command interpreter, it’s very historic in the original state – and although it is somewhat fun to have it work in that way, it also makes for a lot of typing mistakes.

Next to his work on the booting stuff, Scott also found a mistake in the RL controller. The adders for the sector address were not wide enough, so an access to the fourth disk could wrap around to the first. That’s fixed now. He also made a suggestion to offset the disk images on the card, and use a standard MBR to address those images. After some long and hard thought, I decided not to include this – it may be convenient in some cases, but it also conflicts with the future plans I have for the disk controllers.
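The wrap-around in the RL controller is easy to show with a few lines of Python. This is a sketch under assumptions: the RL02 geometry is the real one, but the layout of the images back to back on the card and the adder widths of 17 and 18 bits are hypothetical values picked to show the effect, not the widths in the actual vhdl.

```python
# RL02 geometry: 512 cylinders x 2 heads x 40 sectors of 256 bytes each
SECTORS_PER_DISK = 512 * 2 * 40


def card_sector(unit, sector, adder_bits):
    """Linear sector number on the card, truncated to the adder width.

    Assumes the disk images are stored back to back on the card.
    """
    linear = unit * SECTORS_PER_DISK + sector
    return linear & ((1 << adder_bits) - 1)


def unit_hit(unit, sector, adder_bits):
    """Which disk image the truncated sector number really lands in."""
    return card_sector(unit, sector, adder_bits) // SECTORS_PER_DISK


if __name__ == "__main__":
    last = SECTORS_PER_DISK - 1
    for bits in (17, 18):  # hypothetical adder widths
        print("adder of %d bits: last sector of unit 3 lands in unit %d"
              % (bits, unit_hit(3, last, bits)))
```

With the narrow adder the start of the fourth disk still works, and only accesses further into it fall back into the first image, which is the kind of error that is easy to miss in a quick test.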

I haven’t decided yet if I will include the new boot loader into all prebuilt bitstreams. For the simple setups at least, the old boot loader scheme still makes sense. What I’ll definitely do is integrate both to use a joint code base for the device boot roms.

Updated sources will be published in a couple of weeks, I’m currently working to include Terasic’s C5G board into the distribution. After that is finished, I’ll post the new sources.

 

RSTS and J-11

Some time ago, Paul Koning contacted me about the issue that RSTS did not correctly detect the CPU type when the cpu was configured as a J-11 type – 11/84 or 11/94. He had already identified a problem in the cpu sources: the MFPT instruction would set the CPU code in the primary register set, instead of the currently active register set according to the PSW.

I built the core for the MFPT instruction a long time ago, at the point where I was working with a copy of the ZKDJ test to verify that the regular instructions were working correctly. I added the MFPT mainly because I liked the idea of sticking as close as possible to the original ZKDJ source – at the time, I did not anticipate the system becoming as complete as it is now. Why I chose to write the CPU type value in the primary register set I don’t really remember – it seems illogical now.
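In Python, the difference between the old and the fixed behaviour looks roughly like this. It is a sketch, not a translation of the vhdl: the class and the honour_psw flag are invented for the illustration. What I am relying on is that bit 11 of the PSW selects the register set, that MFPT puts its result in R0, and that the processor type code for the J-11 is 5.

```python
J11_PROCESSOR_TYPE = 5      # value MFPT returns on a J-11
PSW_REGISTER_SET = 1 << 11  # PSW bit 11 selects general register set 0 or 1


class ToyCpu:
    """Just enough state to show where MFPT writes its result."""

    def __init__(self):
        self.psw = 0
        # Two sets of R0..R5; R6 and R7 are not part of the register sets
        self.general = [[0] * 6, [0] * 6]

    def active_set(self):
        return 1 if self.psw & PSW_REGISTER_SET else 0

    def mfpt(self, honour_psw=True):
        # honour_psw=False models the old bug: always register set 0
        target = self.active_set() if honour_psw else 0
        self.general[target][0] = J11_PROCESSOR_TYPE


if __name__ == "__main__":
    for honour_psw in (False, True):
        cpu = ToyCpu()
        cpu.psw |= PSW_REGISTER_SET  # software is running on set 1
        cpu.mfpt(honour_psw)
        seen = cpu.general[cpu.active_set()][0]
        print("honour_psw=%s: software sees R0=%d" % (honour_psw, seen))
```

With the bug, code running on register set 1 reads back whatever happened to be in its own R0, so it cannot recognise the J-11.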

Anyway. The fix did solve the CPU type detection problem, but immediately revealed another: the startup code in RSTS went into a halt. Paul quickly found the reason; RSTS would overwrite its memory sizing code while trying to find out how much memory was available. The cause of this was that the J-11 models have 2044Kw of memory, and do not implement the unibus remap of the top 128K back into low memory – as 11/44 and 11/70 do.

After I fixed that issue, yet another appeared: the startup would proceed further, but would now issue the message:

This DCJ11 cannot be used in conjunction with an FPJ11 accelerator.
Contact Field Service for FCO kit EQ-01440-01 to correct the problem.

INIT will continue, but timesharing cannot be started.

RSTS V10.1-L RSTS   (DB0) INIT V10.1-0L

Which I could easily suppress by setting the 8th bit of the control register at 17 777 750 to zero – stating no FPJ-11 floating point accelerator is present. Since the J-11 always includes the floating point instruction set in its microcode, functionally there is no difference in whether or not the FPJ-11 is present – it should only speed up the floating point instructions. But then, the message shows that there is a difference…

Diving deeper into the issue, Paul was able to find that the test that produced the message failed on a test involving the ASHC instruction. Sure enough, in the manual for the 11/84 EK-1184E-TM-001_Dec87.pdf – to be found on Bitsavers – page B-17 lists two model differences for the ASH and ASHC instructions, which I had already implemented a long time ago – but incorrectly applied to all models. As a test, I disabled this specific behaviour – and the result was that RSTS booted up, and recognized a FPJ-11 without complaining.

Apparently, the FPJ-11 then played some role in fixing the wrong implementation of ASHC and probably ASH in the J-11. Maybe the accelerator actually executed these instructions? Or maybe its presence implied different microcode, or a different path in the microcode?

I’m not sure there is a way to find out – none of the documentation I’ve found so far includes this level of detail on the original hardware. Whatever the case, the RSTS CPU recognition bug is now fixed. Thanks Paul!

Besides fixing these bugs, I also made the bit setting in 17 777 750 a configurable item – including the corresponding behaviour of the ASHC and ASH instructions. The parameter is called have_fpa, and its default setting is 0 meaning no FPJ-11. I don’t think there is any use for having this, other than looking at the differences in the hardware listing in the RSTS startup…


have_fpa => 0

Start timesharing?  HA

  HARDWR suboption? LI

  Name  Address Vector  Comments
  TT0:   177560   060   
  RB0:   176700   254   Units: 0(RP06)
  XE0:   174510   120   DELUA Address: 00-04-A3-1A-70-E1

  KW11L  177546   100   (Write-only)
  SR     177570
  DR     177570

  Hertz = 60.

  Other: FPU, 22-Bit, Data space, J11-E CPU

  HARDWR suboption? 


have_fpa => 1

Start timesharing?  HA

  HARDWR suboption? LI

  Name  Address Vector  Comments
  TT0:   177560   060   
  RB0:   176700   254   Units: 0(RP06)
  XE0:   174510   120   DELUA Address: 00-04-A3-1A-70-E1

  KW11L  177546   100   (Write-only)
  SR     177570
  DR     177570

  Hertz = 60.

  Other: FPU with FPA, 22-Bit, Data space, J11-E CPU

  HARDWR suboption? 

As usual, I’ll post the updated sources to the download page some time later this weekend.

FPU and DEUNA fixes

Since I found the problem with the BAE register in the RH70 that prevented RSX-11MP from running, I’ve been working on straightening out the timing between the CPU and the main memory. It’s nowhere near finished, but the first tests show that when it is, the CPU will be capable of much higher speed – one experiment even ran at 90MHz on the latest FPGA models. Not bad at all, compared to the current baseline of 10MHz, even considering that the new timing needs slightly more cycles per instruction.

In the meantime people have been looking at RSX. Especially Paul Anokhin, who has helped me find several issues. Firstly in bringing the somewhat forgotten de1dram board variant back to life. The DE1 board has two memory chips, an sram of 512KB and a dram of 8MB. Obviously the sram is a bit too small to really bring to life 22-bit CPUs, and very limiting if you want to run the later versions of RSX or Unix on them. Years ago I made the de1dram version as an experiment while I was waiting for my first DE0 board to arrive, but it was never quite finished – the DE0 arrived a bit earlier than that, and I finished the work on that board and forgot about the de1dram. But now it works.

Secondly, Johnny Billquist has been working on BQTCP – a TCP stack for RSX that coexists with Decnet. It is afaik the only case that really requires the buffer chaining in the DEUNA to work correctly – Decnet uses fairly small buffers, but TCP by default uses an MTU of 1500, so if a packet of that size arrives and the buffers are smaller than that, the buffer chaining needs to be correct. This was not a problem before, since all other cases – Decnet itself, TCP on 2.11BSD – appear to use buffer sizes slightly larger than the maximum packet size they expect to receive.

I wrote the DEUNA microcode as a sort of proof-of-concept – meaning, it is not really clean structured code. But it appeared to work well enough, so the somewhat more complex case of correct buffer chaining was not completely finished; it triggered a warning message, and it would also switch to the next buffer when needed. What I forgot was the case where a chunk of data from the ENC424J600 chip – the data is copied from the chip in 16-byte chunks – did not completely fit in the buffer; in that case, the whole 16-byte chunk would be placed in the new buffer instead of filling up the old one first. Obviously, that caused errors for BQTCP. Luckily, it was surprisingly easy to fix, I only needed to restructure the receive flow in the microcode a bit and add a couple of tests to make the buffer chaining work correctly.
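The chunk handling is easier to see in a small model than in the microcode. The Python below is a sketch of the idea only: the function name, the split_chunks switch and the buffer sizes in the example are invented, and the real receive flow has a lot more to deal with (descriptors, ownership bits, status words). It copies a frame in 16-byte chunks into a chain of buffers, either splitting a chunk over two buffers or, as the old code did, moving the whole chunk to the next buffer.

```python
def chain(frame, buffer_sizes, chunk_size=16, split_chunks=True):
    """Copy frame into a chain of buffers, chunk by chunk.

    split_chunks=False models the old behaviour: a chunk that does not
    fit in the current buffer is moved as a whole to the next one.
    """
    buffers = [bytearray() for _ in buffer_sizes]
    current = 0
    for start in range(0, len(frame), chunk_size):
        chunk = frame[start:start + chunk_size]
        while chunk:
            if current >= len(buffers):
                raise OverflowError("frame does not fit in the buffers")
            room = buffer_sizes[current] - len(buffers[current])
            if room >= len(chunk):
                buffers[current] += chunk
                chunk = b""
            elif split_chunks and room > 0:
                # Fill up the old buffer first, the rest goes to the next
                buffers[current] += chunk[:room]
                chunk = chunk[room:]
                current += 1
            else:
                current += 1
    return [bytes(b) for b in buffers]


if __name__ == "__main__":
    frame = bytes(range(100))
    sizes = [40, 40, 40]  # hypothetical buffer sizes, not multiples of 16
    print([len(b) for b in chain(frame, sizes)])
    print([len(b) for b in chain(frame, sizes, split_chunks=False)])
```

In the second case the data itself is still complete, but the first buffers are not filled to their end, and software that expects a chained buffer to be full before the next one is used reads the frame wrong. As long as a frame fits in a single buffer the two variants behave the same, which fits with nothing else having noticed.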

The latest issue Paul reported was slightly more complex to find – he wrote a F77 program to do some floating point calculations, and the results were not correct. The same algorithm in BP2 on RSX also was wrong, but translated to C on 2.11BSD it worked correctly. Since I did not have much time to look into this, I asked Paul to look at the instructions generated from the F77 program, and try to find the difference in flow between SIMH – which worked correctly – and the FPGA hardware. What he came up with was that the different flow started near the execution of the ABSF instruction.

That provided a nice clue for me to start chewing on. And soon enough, it became clear that there was an issue in the way that the addressing mode 0 for the group of instructions called ‘fp single operand group 2’ was handled – ABSF, NEGF, TSTF, and CLRF. For mode 0, I implemented a fast path in the instruction sequencer to bypass reading the input operand – because the input operand is in a register, it does not require memory access and thus does not need memory cycles. However, the register read occurred in the same cycle as picking up the output from the ALU – so, in effect, the output of the ALU was not based on the input. For the CLRF instruction, that makes no difference since the input is irrelevant anyway, and I would speculate that the TSTF instruction is not used much – but for the ABSF and NEGF instructions this is obviously not the case.

Apparently the addressing mode 0 ABSF and NEGF instructions are not used much. I checked 2.11BSD; at least the C implementation hides these instructions through library calls, so the compiler does not appear to generate these instructions directly. And the library implementation works with the operands on the stack, so it will never use mode 0. Also the MAINDECs seem to omit checking this part – maybe it did not use a separate data flow in the original machines, so that it would not make sense to specifically test for it. Whatever the case, none of FFPA, FFPB, FFPC, KFPA, KFPB, KFPC, or ZKDL picked up this issue – all of these run quite nicely even when the bug in the CPU is present.

Also here, once I understood the nature of the problem the fix was quite easy, I only needed to advance the register read to the main instruction decode state. Where it should have been in the first place, obviously – the whole point of the fast path was that the register should have been read already during the instruction decode.

Anyway, I’ll be posting the updated sources to the download page, and later this weekend I’ll post updated bitstreams as well. Big thanks to Paul for his help in finding and fixing these!

Fixes for DEUNA

Over the last months, I had a couple of occurrences of the problem where 2.11BSD would lose its network connection, reporting that there were no transmit buffers available on the DEUNA. All in all, I’ve seen this problem three or four times over the last year, but maybe ten times in the last month or so. No idea why – the only thing that changed is that I have a new Ethernet switch, that could make for a subtle change in the timing.

Anyway, now that it occurred more often, that also gave me the opportunity to find out what was actually wrong. I enabled the debug code in the if_de.c driver for the DEUNA and added some more debug statements. Next morning I was surprised by debug output – showing that in effect all transmit buffers were free…

So, that got me thinking of the interrupt controller core in the DEUNA. It did contain some strange edge-trigger construction, that could potentially result in a deadlock. I changed it, and set up my venerable old 20MHz oscilloscope to show the interrupt signals – br and bg.

This time I had to wait for three days for the problem to occur again – and to my disappointment, it did, and it still locked up in the same way. However, it was also clear that no interrupts were taking place, so I was definitely looking in the right place – the interrupt controller was maybe not locked up itself, but even so no interrupts were taking place. More evidence against the interrupt controller and the edge-trigger in it.

A couple of experiments showed that an easy solution would be just to generate interrupts on the level instead of the edge. But this would also cause the DEUNA to keep on interrupting until the software disabled interrupts or cleared the originating bit. Not very elegant, but it did work – and after some time, I realised that the software will in all likely scenarios examine the interrupt bits, and most likely reset them. So would it maybe work if I went back to the edge triggering system, and reset the trigger on writes into the PCSR0 register?

Of course it did.
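For those who like to see it spelled out, this is roughly how I picture the lockup and the fix, as a small Python model. Note the assumptions: the way the deadlock comes about here (a second interrupt bit gets set while the first is still pending, so there is never a new edge) is my reading of it and not something I captured on the scope, and the class, method names and bit values are invented for the sketch.

```python
class ToyInterruptControl:
    """Edge-triggered interrupt request on the OR of the PCSR0 interrupt bits."""

    def __init__(self, rearm_on_write):
        self.bits = 0        # pending interrupt bits
        self.armed = True    # edge detector is ready for a new rising edge
        self.rearm_on_write = rearm_on_write
        self.interrupts = 0  # number of interrupts actually requested

    def _evaluate(self):
        if self.bits and self.armed:
            self.interrupts += 1
            self.armed = False
        elif not self.bits:
            self.armed = True  # everything cleared, wait for the next edge

    def device_sets(self, mask):
        self.bits |= mask
        self._evaluate()

    def driver_writes_pcsr0(self, clear_mask):
        # The driver clears the bits it has seen by writing them back
        self.bits &= ~clear_mask
        if self.rearm_on_write:
            self.armed = True  # the fix: any write resets the trigger
        self._evaluate()


def scenario(rearm_on_write):
    ctl = ToyInterruptControl(rearm_on_write)
    ctl.device_sets(0x1)          # first event, interrupt requested
    ctl.device_sets(0x2)          # second event before the driver clears
    ctl.driver_writes_pcsr0(0x1)  # driver only knows about the first
    return ctl.interrupts


if __name__ == "__main__":
    print("edge only:      ", scenario(False), "interrupt(s)")
    print("re-arm on write:", scenario(True), "interrupt(s)")
```

Without the re-arm the second event never produces an interrupt, and since the driver only looks at the bits when it is interrupted, it never clears them either. With the re-arm the write itself produces a fresh request for whatever is still pending, without the continuous interrupting of the level-triggered variant.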

And a minor other thing comes to mind: I keep saying DEUNA, but it’s actually a DELUA now. The difference is only in the PCSR1 ID bits; no logic has been changed at all. I did this because Decnet on RSX-11M-Plus tries to load microcode into the DEUNA – which will not work because in reality of course the controller does not look like a real DEUNA at all. But it will leave a DELUA alone. And because all the other software – 2.11BSD and RSTS – does not seem to make a difference between DEUNA and DELUA, there seems to be no reason not to change the thing into a DELUA.

I changed several subtle things in the microcode as well, mostly around buffer chaining and resetting the chip if it becomes disconnected for some reason. Buffer chaining probably still is not correct, but it doesn’t really seem to be used extensively by the operating systems – it’s only when broadcast frames longer than what the network stack expects arrive that the code seems to be triggered.

The updates – including the fix for RSX-11M-Plus – are on the download page now, and several pregenerated bitstreams as well. Enjoy!

RSX11M-Plus. Finally.

A couple of weeks ago someone mentioned that there were some FPGA related articles in the December issue of Circuit Cellar. So I checked it, and one of the articles pointed me to the built-in logic analyzers that the leading tool chains now all seem to have. At least, the Circuit Cellar article is about Chipscope, which is the Xilinx variant, and Altera has something similar called SignalTap.

Since most of my Xilinx stuff has been stored away since last year’s spring cleaning, I decided to go and play with SignalTap. And as usual with the FPGA tooling, the first impression was not that favourable. But a couple of days later I thought to try again, and this time around I started to appreciate some of the things that the software can do. For instance, tap into an enormous lot of signals at a time – at least certainly compared to my old ‘real’ analyzer, which can do only 32 signals. And the amount of capture memory is also decent, provided you’ve some room in your FPGA memories.

But more interesting is the trick where you can let the analyzer capture when some subset of the signals change state. And you can assign names to bit pattern values in a capture. Those two tricks I used to finally find the problem that prevented RSX-11M-Plus from booting – first, I used the address match signal within the RH11 controller logic as a trigger for the analyzer to capture state, and second, I assigned the register names of the control registers within the RH11 to the address signal.

So, I thought that would give me a nice and easy overview of exactly what RSX-11M-Plus was doing to the RH, and what would cause it to get wrong results. And that is exactly what it did – only, not in the way I expected. Took me some time to see something that in retrospect is very obvious; there is a write to a register in the RH11 space, but it isn’t decoded into a register name – even though I added register names for all registers that I knew about.

Aha. So, something going on here… The first thing I checked was whether it could be a controller register or a disk register – in a real setup with RH and RP, some of the registers reside in the disk, others in the controller. I decided to verify all the controller side registers first – and the one that I was consistently missing was BAE, the register that holds the bits 21-16 of the address for the controller. A quick change to the controller source proved that to be correct; if I assigned BAE to this register address, suddenly RSX-11M-Plus would boot happily… And it seems to run quite happily as well, including running complete sysgens, and also running Decnet and other software.
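To see why a misplaced BAE matters, here is a minimal Python sketch of how the extension bits combine with the 16-bit bus address into a 22-bit address. The function name is hypothetical and this is a simplified model, not the VHDL from the controller source.

```python
def transfer_address(bae, ba):
    """Combine BAE (address bits 21-16) with the 16-bit bus address.

    Simplified model: only the low six bits of bae are used, and ba is
    taken as a plain 16-bit value.
    """
    return ((bae & 0o77) << 16) | (ba & 0o177777)


# With BAE = 1 the transfer lands just above the first 64 KB.
print(oct(transfer_address(1, 0o001000)))

# If the write to BAE never reaches the register and it stays 0,
# the same transfer ends up in the low 64 KB instead.
print(oct(transfer_address(0, 0o001000)))
```

In this model, a write to BAE that goes to an address the controller does not decode simply leaves the upper six address bits at their old value, so any transfer above 64 KB would go to the wrong place.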

A couple of things still need some clarification; mostly, whether other registers also live at addresses other than where I would expect them. Once that is done, and I’ve completed my usual regression tests, I’ll be posting the new VHDL to the download page.

This output from the SignalTap-II analyzer shows the unexpected address for the BAE register


FKTA, FKTB, FKTC

Last week Al Kossow posted some new sources for XXDP tests on Bitsavers. Included are the sources for the MMU tests for 11/34. I tried to run those, and was surprised by a list of error messages… Turns out, there were a number of issues left in my MMU implementation, that none of the other tests I’ve used so far had picked up. To be more specific, FKTA, FKTB, and FKTC all three complained about issues that FKTH, KKTA, KKTB, and ZKDK let pass…

  • mtpi worked in the wrong order. Because the instruction follows the more or less regular destination pattern, I did the address calculation for the destination first, and only after that the special part for mtpi – finding the implied operand from the stack, and popping the stack in the process. That gives wrong results if the stack pointer is updated in the destination address calculation… An obscure case, probably, and I’m not at all sure that this exact process is used by all PDP models. Nevertheless, there is a fix.
  • the a and w bits in the pdr registers should be reset on a write. However, my implementation only did that for word or even byte writes – not for odd byte writes.
  • and probably the most promising of all, the a and w bits in the pdr registers should not be set if the access was aborted by the mmu.
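The last two points can be summed up in a small Python model. It is a sketch under assumptions rather than the actual VHDL: it assumes the a and w flags sit in bits 7 and 6 of the pdr, it treats all other bits as freely writable, and the function names are made up for the illustration.

```python
A_BIT = 0o200  # assumed position of the 'a' flag (bit 7)
W_BIT = 0o100  # assumed position of the 'w' flag (bit 6)
FLAGS = A_BIT | W_BIT


def write_pdr(old, value, byte=None):
    """Model a cpu write to a pdr.

    byte=None is a word write, byte=0 an even byte write, byte=1 an odd
    byte write. Every kind of write clears the a and w flags, including
    the odd byte write that does not touch the low byte otherwise.
    """
    if byte is None:
        new = value & 0xFFFF
    elif byte == 0:
        new = (old & 0xFF00) | (value & 0x00FF)
    else:
        new = (old & 0x00FF) | ((value & 0x00FF) << 8)
    return new & ~FLAGS & 0xFFFF


def record_access(pdr, flags_to_set, aborted):
    """Set a and/or w after a memory access, unless the mmu aborted it."""
    if aborted:
        return pdr
    return pdr | (flags_to_set & FLAGS)


pdr = record_access(0o077406, W_BIT, aborted=False)  # w is now set
pdr = write_pdr(pdr, 0o177, byte=1)                  # odd byte write
print(f"{pdr:06o}", "flags cleared" if pdr & FLAGS == 0 else "flags still set")
```

Which accesses set which flag depends on the access control field and is left to the caller here; the sketch only covers when the flags are cleared and when setting them is suppressed.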

I had some hope that these fixes might also have an impact on the still mysterious problem in booting RSX-11M-Plus. No luck though… Still, it never ceases to amaze me how good and how devious some of these test programs are; it really sometimes takes hours to find out what a test does, and how to make the cpu and mmu work as intended. And, obscure though these issues may seem, they may in some way impact how some old software runs on the VHDL.

Anyway, I’m running a number of regression tests now, and will post the latest versions some time later. And maybe play with some of the other tests as well – there could still be more to find there.

And of course a big thank you to Al!

Uptime

Since DEUNA works, I more or less constantly have one or more boards running 2.11BSD. One of these has only been down twice since November 2 – the first time after three months when I accidentally touched the power switch while cleaning, and the other time last week when a transformer exploded across the street – close to 10,000 homes without power. So only two reboots since the beginning of November – and both because of power problems. I have current systems that do worse…

Not much has happened on new developments. Partly because my boards were all showing increasingly impressive uptimes, partly because I was not sure on what to do next – and, other priorities demanded a lot of time. What I did do was restructure the DEUNA microcode a bit to make it more robust. There were a couple of bugs in there as well – for instance, the bit in the control register that should cause a reset of the controller hardware actually didn’t. But a more major improvement is that it is now no longer fatal if the pmod becomes disconnected. The controller will just reconnect when the chip becomes available to it again, and continue where it left off.

There is one problem remaining in DEUNA that I’m aware of; very occasionally – so far, I’ve seen it happen 3 times in almost 6 months of running – 2.11BSD will start complaining that there are no free buffers. At the same time, activity on the blinkenlights shows that there is something going on – more instructions appear to be processed than normal during idle-waiting. However, vmstat shows no unusual activity that I can detect. The solution is to ifconfig the interface down and up – and things will be normal again.

In the meantime, I’ve been thinking on what to do next – if anything. One item on the roadmap that is a bit overdue is the split disk controller – I mean, the change to the disk controller that allows the cpu to run while the sd cards are active. Would be useful to make this change – I expect that 2.11BSD and probably RSTS and RSX would benefit from the cpu being able to run during card activity, and thus become faster. But also interrupt response and clock stability would improve. The downside is that it is relatively boring work…

Another idea I have is to change my mind and implement the Qbus systems after all. The difficult bit there really is only that that would require a DEQNA – which shares surprisingly little with DEUNA, so it would be a lot of work. And there really is only one reason to do it – it would run Xinu.

Anyway. Summer is here, time to go out and play. But maybe it will rain some days.