STCFI

Store and Convert from Floating or Double to Integer or Long Integer. STCFI / STCFL / STCDI / STCDL – or stc(f|d)(i|l) in my notation. One of the more obscure instructions in the FP11 floating point unit, and the hiding place of an equally obscure bug in PDP2011 – undetected since I wrote the erroneous bit of logic for the instruction, just before Christmas 2009.

It’s not the first time I’ve found an issue in the FP11 – there have been issues with the LDCIL instruction with mode 2, register 7 operands, and ABS/NEG/CLR mode 0 operands – in both cases resulting in the floating point ALU not getting the right input value. In the STCFI case though the right input reached the FP11, but the calculation of the result was wrong.

How? Well, simply explained, because I forgot to allow for the case where a negative result is stored in a 16-bit integer.

The PDP-11 floating point format stores the fraction as an absolute value, with the sign stored separately. Integers are different: they are stored in 2’s complement format. Converting a negative value to 2’s complement is easy enough: invert all bits, and add 1. There’s one tiny detail to that though: you have to add the 1 in the right place – to the least significant bit.

And that is where things went wrong – only the 32-bit case was implemented, and – since 16- and 32-bit integers are both stored left-aligned to the most significant bit – the add-1 operation was 16 bits out of place for the smaller format. So negative integers would typically be converted to a value 1 lower than the right answer, simply because the add-1 ended up in the wrong place.

(wrong)
if falu_input(63) = '1' then
   falu_output(63 downto 32) <= (not falu_work1(58 downto 27)) + 1; 

(right)
if falu_input(63) = '1' then                               
   if fps(6) = '0' then
      -- 16-bit                           
      falu_output(63 downto 48) <= (not falu_work1(58 downto 43)) + 1;
   else             
      -- 32-bit                     
      falu_output(63 downto 32) <= (not falu_work1(58 downto 27)) + 1;
   end if; 

Pretty simple, really. Except it took me a while to find this one.
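To see the effect outside the FPGA, here is a minimal Python model of the two variants. It is a sketch, not the PDP2011 logic itself: the function names are made up, and it assumes that the 32-bit slice holds the integer part left-aligned, with whatever remains of the fraction sitting in the lower 16 bits.

```python
def align(value: float) -> int:
    """Absolute value as a 32-bit field: integer part in the upper 16 bits,
    truncated fraction in the lower 16 (an assumption about the work register)."""
    return int(abs(value) * 65536) & 0xFFFFFFFF


def negate_wrong(field32: int) -> int:
    """Original logic: invert and add 1 across all 32 bits, then take the top 16."""
    out32 = (~field32 + 1) & 0xFFFFFFFF
    return out32 >> 16


def negate_right(field32: int) -> int:
    """Fixed logic: invert and add 1 on the upper 16 bits only."""
    top16 = field32 >> 16
    return (~top16 + 1) & 0xFFFF


def to_signed16(bits: int) -> int:
    """Interpret a 16-bit pattern as a 2's complement integer."""
    return bits - 0x10000 if bits & 0x8000 else bits


if __name__ == "__main__":
    for value in (-1.301015, -3.0):
        field = align(value)
        print(value,
              to_signed16(negate_wrong(field)),
              to_signed16(negate_right(field)))
```

Under these assumptions the model turns -1.301015 into -2 with the old logic and -1 with the new one. It also suggests why the error is only "typical": when the lower 16 bits are all zero, as for -3.0, the misplaced add-1 carries all the way up into the right place and both variants agree.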

I should probably explain how I found it. It started with the graphs in MINC: I was comparing what my vt105-under-development displayed with the figures in the book – the MINC-11 Book 4: MINC Graphic Programming. For instance figure 6 – a nice graph of a full-screen sine wave. Except mine was an almost flat line with very small waves.

I could correct that by setting the dimensions of the graph explicitly – using a WINDOW statement, or with the ‘exact’ keyword on the GRAPH statement. But still, things didn’t work as they should according to the book – and worse, a quick check showed that simh did get the scale right. So a definite difference, and with the previous experience of obscure bugs in the FP11, and since BASIC tends to use a lot of floating point, I had a suspicion where I would find the problem.

After spending some hours over a couple sessions of looking at the logic analyzer window and reading up on floating point formats, I decided that I wasn’t making a lot of progress that way. So I tried a different idea: I exported everything that the analyzer could capture to a text file, and wrote a script to interpret that – giving me a file with 3 columns: the instruction, the input value, and the output value – and those in readable values instead of hex floating point format.
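The core of such a script is turning raw words into readable numbers. The following is not the script I used, just a minimal Python sketch of that step for the single precision (F) format; the function name is made up. It relies on the standard PDP-11 layout: a sign bit, an 8-bit excess-128 exponent, and a 23-bit fraction with a hidden leading bit, so that the value is 0.1fff... times 2^(exponent - 128).

```python
def pdp11_f_to_float(word_hi: int, word_lo: int) -> float:
    """Decode a PDP-11 F-format value from its two 16-bit words."""
    sign = (word_hi >> 15) & 1
    exponent = (word_hi >> 7) & 0xFF
    fraction = ((word_hi & 0x7F) << 16) | (word_lo & 0xFFFF)

    if exponent == 0:
        # Exponent 0 means zero (with the sign bit set, the FP11 treats it
        # as an "undefined variable"; this sketch just returns 0.0).
        return 0.0

    # Restore the hidden bit: the mantissa is 0.1fff... in binary.
    mantissa = ((1 << 23) | fraction) / float(1 << 24)
    value = mantissa * 2.0 ** (exponent - 128)
    return -value if sign else value


if __name__ == "__main__":
    # 040200 octal is the familiar bit pattern for 1.0
    print(pdp11_f_to_float(0o040200, 0))
    print(pdp11_f_to_float(0o140200, 0))
```

Double precision (D) works the same way with two more words of fraction appended, which this sketch leaves out.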

That worked surprisingly well – it was immediately clear how the graph routine was scanning over all the data points to find the extreme values – by this time, I knew half the graph data points by heart so it was easy to recognise what was going on. Next it would do some calculations for setting the appropriate scale based on values in a table and some operations on that. Everything seemed to work as it should – everything made sense, and all the calculations I saw seemed correct – there wasn’t a line in the list of instructions that was obviously wrong.

So then I set out to make a similar file, but from a run in simh – adding fprintf statements all through the simh source file that deals with floating point, such that the same list of instructions, input and output would come out – and after an hour, I had two files that I could compare, and quite soon after that I had pinpointed the STCFI instruction as the problem. And then it was also clear what the issue was – the value it was converting was -1.301015, and the result was -2: obviously wrong.
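Comparing two such traces can itself be scripted. The Python sketch below is illustrative only: the three-column, whitespace-separated line format, the function name and the sample lines are assumptions, not my actual files. It reports the first line where the two traces disagree.

```python
from itertools import zip_longest


def first_divergence(trace_a, trace_b, tolerance=1e-6):
    """Return (line number, line_a, line_b) for the first mismatch, or None.

    Each line is assumed to be 'INSTRUCTION input output' with decimal values.
    """
    pairs = zip_longest(trace_a, trace_b)
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a is None or line_b is None:
            return number, line_a, line_b  # one trace ended early
        op_a, in_a, out_a = line_a.split()
        op_b, in_b, out_b = line_b.split()
        if (op_a != op_b
                or abs(float(in_a) - float(in_b)) > tolerance
                or abs(float(out_a) - float(out_b)) > tolerance):
            return number, line_a, line_b
    return None


if __name__ == "__main__":
    # Made-up sample data; only the -1.301015 / -2 pair comes from the real run.
    # The -1 on the simh side is the expected truncated result.
    fpga = ["MULF 0.5 0.25", "STCFI -1.301015 -2"]
    simh = ["MULF 0.5 0.25", "STCFI -1.301015 -1"]
    print(first_divergence(fpga, simh))
```

A small tolerance on the values avoids false alarms from differences in how the two sides print the last digit.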

But how to fix it was another matter – it took me at least an hour to find the source statement shown above, and to understand why that was the cause of the error and how to fix it. But once I realised what the exact issue was, it was obvious – and it was easy to verify that the fix did in fact solve the autorange problem.

I’ll publish the fix as usual – in a complete update, and together with the new MINC updates and vt105; after some final regression testing, and probably this week. If you run a regular, non-MINC PDP2011, you should probably update to the new version – but not necessarily with all urgency. This bug has been there for a while without anyone noticing, and it doesn’t seem the STCFI instruction is used a lot. Also (from earlier experience with the other FP11 bugs) compiled code is probably not affected. Even in MINC, STCFI is not used all that often – I’ve only really noticed things going wrong with the GRAPH statement.
