TCP/IP on 2.11BSD

After finding the bug in the MMU details of the JSR instruction almost three weeks ago, I planned to spend a lot of time just playing with 2.11BSD, without actively doing any development.

Well, that almost worked. At least, it did until I realized how great it would be to run a network on the PDP, and that since I could now rebuild the kernel on BSD, it should be possible to add a SLIP link and run TCP/IP over it. There was only one slight problem: the serial link controller. It was good enough for generating output, and even good enough for some typing. But as I had already found out, it would lock up whenever more than a few characters came in at once, such as when cutting and pasting text. The controller source was one of the oldest pieces of VHDL that I had left, largely unmodified for the last two years and built according to what I then thought was a good idea. So what it needed was a major overhaul or, maybe even better, a complete rewrite from scratch. Just a simple rewrite of a very simple controller. How much work can it be, really?

Probably, the right answer is: not that much. But the interrupt controller still took me a couple of tries to get exactly right. It turns out that the real problem in the old controller was that it would sometimes get stuck in its interrupt controller whenever interrupts were pending from both the receiver and the transmitter. In the same situation, interrupts would also get lost, leaving either the receiver or the transmitter stuck.
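
The situation with two pending interrupts can be illustrated with a small behavioural model. This is only a sketch in Python, not the actual VHDL: the class and method names are made up for the example, and the priority order (receiver before transmitter) is an assumption. The point it shows is that each source keeps its own pending latch, and acknowledging one interrupt clears only that latch, so the other request is neither lost nor left blocking the controller.

```python
class SerialInterruptModel:
    """Toy model of a serial controller with two interrupt sources.

    Hypothetical illustration only; names and priority order are assumptions.
    """

    SOURCES = ("rx", "tx")  # assumed priority: receiver first

    def __init__(self):
        # One independent pending latch per source.
        self.pending = {"rx": False, "tx": False}

    def request(self, source):
        """A source raises its interrupt request."""
        self.pending[source] = True

    def line_asserted(self):
        """The interrupt line stays asserted while anything is pending."""
        return any(self.pending.values())

    def acknowledge(self):
        """Serve the highest-priority pending source and clear only that latch."""
        for source in self.SOURCES:
            if self.pending[source]:
                self.pending[source] = False
                return source
        return None


def demo():
    """Both sources request at once; both must be served, none lost."""
    ctrl = SerialInterruptModel()
    ctrl.request("tx")
    ctrl.request("rx")
    served = []
    while ctrl.line_asserted():
        served.append(ctrl.acknowledge())
    return served
```

Calling demo() serves both requests in turn and leaves nothing pending, which is the behaviour the old controller failed to deliver.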

Anyway, it took some doing, but I think I’m fairly close now. It is still not perfect, but it is good enough to run SLIP over at modest speeds. It still gets stuck sometimes. Especially when the disk is busy and locks out the lower-level interrupts, the SLIP link gets stuck in retransmissions and may become inoperative for minutes. It does recover by itself, though. I’m not sure why, but it does.
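
For reference, the framing that SLIP puts on the serial line is tiny, which is part of why a single dropped character damages a whole packet. The sketch below follows the byte values from RFC 1055 (END = 0xC0, ESC = 0xDB). It is written in Python purely for illustration, the function names are made up for this example, and it is not taken from the 2.11BSD kernel.

```python
# SLIP special characters as defined in RFC 1055.
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD


def slip_encode(packet: bytes) -> bytes:
    """Wrap one IP packet in SLIP framing."""
    out = bytearray([END])  # optional leading END flushes any line noise
    for b in packet:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def slip_decode(stream: bytes) -> list:
    """Split a received byte stream into packets; empty frames are ignored."""
    packets, current, escaped = [], bytearray(), False
    for b in stream:
        if escaped:
            # Unknown escapes are passed through unchanged.
            current.append(END if b == ESC_END else ESC if b == ESC_ESC else b)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == END:
            if current:
                packets.append(bytes(current))
                current = bytearray()
        else:
            current.append(b)
    return packets
```

Because the framing carries no length or checksum of its own, a frame that lost a character on the wire still decodes to a packet, just a shorter one. It is then up to IP to notice the damage.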

So all in all I now have a working SLIP link to my 2.11BSD system:

# ifconfig sl0
sl0: flags=b1
        inet 192.168.0.1 --> 192.168.0.2 netmask ffffff00

And if we look at the interface counters with netstat:

# netstat -in
Name  Mtu   Network     Address            Ipkts Ierrs    Opkts Oerrs  Coll
sl0   296   192.168     192.168.0.1       140479   953   138015    42     1
lo0   1536  127         127.0.0.1            391     0      391     0     0

So we can see that there is a significant number of input errors. I think some of them are caused by timeouts, because the disk controller either locks the bus or, through its higher interrupt level, locks out the serial links. I might implement some kind of buffer in the KL controller to partially fix this, though I’m not sure yet whether I will. Another, and maybe more elegant, fix would be to make the disk controller somewhat more sophisticated.
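
A rough back-of-the-envelope calculation shows why a buffer would help. It uses the 9600 baud rate from the slattach command line shown further down, and assumes the usual 10 bits per character on the wire (start bit, 8 data bits, stop bit) and a receiver that can hold only a limited number of characters. The last two are assumptions, not measurements, and the function names are made up for the example.

```python
BAUD = 9600          # from the slattach command line
BITS_PER_CHAR = 10   # assumed: 1 start + 8 data + 1 stop bit


def char_time_ms(baud=BAUD, bits=BITS_PER_CHAR):
    """Time between two back-to-back characters on the line, in ms."""
    return 1000.0 * bits / baud


def tolerated_latency_ms(buffer_depth, baud=BAUD):
    """Approximate time interrupts may be locked out before a receiver
    holding buffer_depth characters overruns."""
    return buffer_depth * char_time_ms(baud)


def error_rate(ierrs, ipkts):
    """Fraction of input packets counted as errors."""
    return ierrs / ipkts
```

At 9600 baud a character arrives about every 1.04 ms, so a receiver holding a single character is overrun if its interrupt stays locked out for much longer than that. A 16-character buffer (an arbitrary example depth) would stretch this to almost 17 ms. The counters in the table above work out to roughly 0.7% input errors.
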
Anyway, we can see some more detail in the other output from netstat, looking at the active sockets, the statistics, and the routing table:

# netstat -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      2  192.168.0.1.telnet     192.168.0.2.51342      ESTABLISHED
tcp        0      0  192.168.0.1.telnet     192.168.0.2.51332      ESTABLISHED
tcp        0      0  192.168.0.1.telnet     192.168.0.2.50618      ESTABLISHED
tcp        0      0  *.smtp                 *.*                    LISTEN
tcp        0      0  *.printer              *.*                    LISTEN
tcp        0      0  *.tcpmux               *.*                    LISTEN
tcp        0      0  *.discard              *.*                    LISTEN
tcp        0      0  *.echo                 *.*                    LISTEN
tcp        0      0  *.ident                *.*                    LISTEN
tcp        0      0  *.finger               *.*                    LISTEN
tcp        0      0  *.uucp                 *.*                    LISTEN
tcp        0      0  *.login                *.*                    LISTEN
tcp        0      0  *.shell                *.*                    LISTEN
tcp        0      0  *.telnet               *.*                    LISTEN
tcp        0      0  *.ftp                  *.*                    LISTEN
udp        0      0  127.0.0.1.ntp          *.*
udp        0      0  192.168.0.1.ntp        *.*
udp        0      0  *.ntp                  *.*
udp        0      0  *.who                  *.*
udp        0      0  *.time                 *.*
udp        0      0  *.echo                 *.*
udp        0      0  *.biff                 *.*
udp        0      0  *.syslog               *.*
Active UNIX domain sockets
Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
    4588 dgram       0      0        0     5688        0     5988
    6288 dgram       0      0     3336        0     4088        0 /dev/log
    4188 dgram       0      0        0     5688        0     4408
    5908 dgram       0      0        0     5688        0        0
    4988 stream      0      0     3ed0        0        0        0 /dev/printer
# netstat -s
ip:
        142500 total packets received
        1207 bad header checksums
        159 with size smaller than minimum
        1009 with data size < data length
        349 with header length < data size
        0 with data length < header length
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets forwarded
        0 packets not forwardable
        0 redirects sent
icmp:
        192 calls to icmp_error
        0 errors not generated 'cuz old message was icmp
        Output histogram:
                echo reply: 41249
                destination unreachable: 192
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                destination unreachable: 2470
                echo: 41249
        41249 message responses generated
tcp:
        89138 packet sent
                88077 data packet (9414995 bytes)
                818 data packets (80393 byte) retransmitted
                190 ack-only packets (164 delayed)
                3 URG only packets
                3 window probe packets
                0 window update packets
                88 control packets
        89418 packet received
                86822 ack (for 9335337 bytes)
                145 duplicate acks
                0 acks for unsent data
                2587 packets (4470 bytes) received in-sequence
                20 completely duplicate packets (33 bytes)
                0 packets with some dup. data (0 bytes duped)
                3 out-of-order packets (0 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                0 window update packets
                0 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        84 connection requests
        10 connection accepts
        10 connections established (including accepts)
        95 connections closed (including 3 drops)
        87 embryonic connections dropped
        77241 segment updated rtt (of 78054 attempt)
        850 retransmit timeouts
                3 connections dropped by rexmit timeout
        0 persist timeouts
        56 keepalive timeouts
                56 keepalive probes sent
                0 connections dropped by keepalive
udp:
        0 incomplete headers
        0 bad data length fields
        3 bad checksums
        192 no ports
        0 (arrived as bcast) no ports
#
# netstat -rn
Routing tables
Destination      Gateway            Flags     Refs     Use  Interface
127.0.0.1        127.0.0.1          UH          1      333  lo0
192.168.0.2      192.168.0.1        UH          3    74558  sl0
default          192.168.0.2        UG          2     4520  sl0

And then I should probably also show these commands:

# uptime
 12:23am  up 4 days, 21:27,  4 users,  load averages: 0.66, 0.40, 0.21
# who
root            console Feb 26 02:33
sytse           ttyp0   Mar  1 23:06    (192.168.0.2)
root            ttyp1   Feb 26 02:56    (192.168.0.2)
root            ttyp2   Mar  2 00:20    (192.168.0.2)
# w
 12:24am  up 4 days, 21:28,  4 users,  load averages: 0.56, 0.40, 0.21
User            tty       login@  idle   JCPU   PCPU  what
root            console   2:33am    53     12      1  -sh
sytse           ttyp0    11:06pm    55      2      1  -sh
root            ttyp1     2:56am 71:13  96:26  29:25  sleep 10
root            ttyp2    12:20am            4      1  w
# ps ax
   PID TTY TIME COMMAND
     0 ?   0:35 swapper
     1 ?   0:01  (init)
    46 ?   0:34 syslogd
    56 ?   7:14 update
    59 ?   0:02 cron
    63 ?   1:34 acctd
    71 ?   0:01 /usr/sbin/inetd
    75 ?   0:04 rwhod
    79 ?   0:00 /usr/sbin/lpd
    97 ?   0:02 /usr/sbin/sendmail -bd -q1h
   101 ?   1:13 ntpd
   105 co  0:01 -sh
    38 l1  0:00 slattach /dev/ttyl1 9600
 15302 p0  0:02 telnetd
 15303 p0  0:01 -sh
   130 p1  4:22 telnetd
   131 p1 29:25 -sh
 18941 p1  0:00 sleep 10
 18765 p2  0:01 telnetd
 18766 p2  0:01 -sh
 18944 p2  0:00 ps ax
# uname -a
BSD pdp11.sytse.net 2.11 2.11 BSD UNIX #27: Tue Feb 21 21:14:38 MET 2012     root@pdp11.sytse.net:/usr/src/sys/PDP2011  pdp11

Maybe a bit more interesting from the VHDL system perspective is that ntpd runs as well. It synchronizes to the ntpd running on my PC, which is itself synchronized to an outside time source. So the clock in 2.11BSD actually shows the real time, and quite accurately too, even though it is only derived from the modest 50 MHz crystal oscillator on the DE0-Nano board that I’m using for these tests. The crystal oscillator is not really stable. By itself, it tends to drift a couple of minutes fast or slow per day.
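
To put a couple of minutes per day in perspective, here is a small calculation. It takes two minutes per day as the drift figure and the interval between two adjustments from the ntpd log further down (18:47:14 to 18:55:48). Both numbers are read off this post, and the helper names are made up for the example.

```python
SECONDS_PER_DAY = 86400

# Interval between two consecutive log entries, 18:47:14 and 18:55:48.
LOG_INTERVAL_S = (18 * 3600 + 55 * 60 + 48) - (18 * 3600 + 47 * 60 + 14)


def drift_ppm(seconds_per_day):
    """Express a drift in seconds per day as parts per million."""
    return seconds_per_day / SECONDS_PER_DAY * 1e6


def offset_after(interval_s, seconds_per_day):
    """Offset in seconds that accumulates over interval_s at the given drift."""
    return interval_s * seconds_per_day / SECONDS_PER_DAY
```

Two minutes per day is close to 1400 ppm, and over the 514 seconds between adjustments that adds up to about 0.7 seconds. That is the same order of magnitude as the offsets that ntpd reports in the log.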

# ntpdc -v localhost
Neighbor address 10.1.0.10 port:123  local address 192.168.0.1
Reach: 0377 stratum: 3, precision: -24
dispersion: 64000.000000, flags: 9101, leap: 0
Reference clock ID: [10.1.0.7] timestamp: fad20173.eb057573
hpoll: 6, ppoll: 6, timer: 64, sent: 6535 received: 6382
Delay(ms)   175.00  175.00  175.00  175.00  175.00  175.00  175.00  175.00
Offset(ms)    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
        delay: 175.000000 offset: 4085.000000 dsp 64000.000000
# tail -200 /usr/adm/messages|grep ntp
Mar  1 18:47:14 pdp11 March  1 18:47:14 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.900270 drft 0.007104 cmpl 0.022846
Mar  1 18:55:48 pdp11 March  1 18:55:48 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.819736 drft 0.007104 cmpl 0.022846
Mar  1 19:04:21 pdp11 March  1 19:04:21 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.856055 drft 0.007104 cmpl 0.022846
Mar  1 19:08:37 pdp11 March  1 19:08:37 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 2.043122 SYNC 10.1.0.10 3
Mar  1 19:12:55 pdp11 March  1 19:12:55 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 2.043122 drft 0.007104 cmpl 0.022846
Mar  1 19:21:30 pdp11 March  1 19:21:30 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 2.016722 drft 0.007104 cmpl 0.022846
Mar  1 19:30:02 pdp11 March  1 19:30:02 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.417967 drft 0.007104 cmpl 0.022846
Mar  1 19:38:36 pdp11 March  1 19:38:36 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.240674 drft 0.007104 cmpl 0.022846
Mar  1 19:47:09 pdp11 March  1 19:47:09 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.144854 drft 0.007104 cmpl 0.022846
Mar  1 19:55:44 pdp11 March  1 19:55:44 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 2.617553 drft 0.007104 cmpl 0.022846
Mar  1 20:04:16 pdp11 March  1 20:04:16 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.368686 drft 0.007104 cmpl 0.022846
Mar  1 20:08:48 pdp11 March  1 20:08:48 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 2.624537 SYNC 10.1.0.10 3
Mar  1 20:12:51 pdp11 March  1 20:12:51 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 2.624537 drft 0.007104 cmpl 0.022846
Mar  1 20:21:25 pdp11 March  1 20:21:25 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.297541 drft 0.007104 cmpl 0.022846
Mar  1 20:29:57 pdp11 March  1 20:29:57 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.317852 drft 0.007104 cmpl 0.022846
Mar  1 20:38:31 pdp11 March  1 20:38:31 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.005208 drft 0.007104 cmpl 0.022846
Mar  1 20:47:03 pdp11 March  1 20:47:03 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.309247 drft 0.007104 cmpl 0.022846
Mar  1 20:55:37 pdp11 March  1 20:55:37 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.381118 drft 0.007104 cmpl 0.022846
Mar  1 21:04:11 pdp11 March  1 21:04:11 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.803875 drft 0.007104 cmpl 0.022846
Mar  1 21:08:59 pdp11 March  1 21:08:59 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 0.255038 SYNC 10.1.0.10 3
Mar  1 21:12:43 pdp11 March  1 21:12:43 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.391487 drft 0.007104 cmpl 0.022846
Mar  1 21:21:17 pdp11 March  1 21:21:17 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.278973 drft 0.007104 cmpl 0.022846
Mar  1 21:29:50 pdp11 March  1 21:29:50 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.281603 drft 0.007104 cmpl 0.022846
Mar  1 21:38:24 pdp11 March  1 21:38:24 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.804268 drft 0.007104 cmpl 0.022846
Mar  1 21:46:56 pdp11 March  1 21:46:56 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.373988 drft 0.007104 cmpl 0.022846
Mar  1 21:55:31 pdp11 March  1 21:55:31 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.949129 drft 0.007104 cmpl 0.022846
Mar  1 22:04:03 pdp11 March  1 22:04:03 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.432497 drft 0.007104 cmpl 0.022846
Mar  1 22:09:07 pdp11 March  1 22:09:07 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 2.688494 SYNC 10.1.0.10 3
Mar  1 22:12:39 pdp11 March  1 22:12:39 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 3.657928 drft 0.007104 cmpl 0.022846
Mar  1 22:21:12 pdp11 March  1 22:21:12 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.259591 drft 0.007104 cmpl 0.022846
Mar  1 22:44:07 pdp11 March  1 22:44:07 ntpd[101]: Lost reachability with 10.1.0.10
Mar  1 22:46:20 pdp11 March  1 22:46:20 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 7.093814 drft 0.007104 cmpl 0.022846
Mar  1 22:54:53 pdp11 March  1 22:54:53 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.730772 drft 0.007104 cmpl 0.022846
Mar  1 23:03:26 pdp11 March  1 23:03:26 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 0.366301 drft 0.007104 cmpl 0.022846
Mar  1 23:12:02 pdp11 March  1 23:12:02 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 3.575523 drft 0.007104 cmpl 0.022846
Mar  1 23:18:26 pdp11 March  1 23:18:26 ntpd[101]: stats: dc 0.007104 comp 0.022846 peersw 1 inh 0 off 1.043266 SYNC 10.1.0.10 3
Mar  1 23:20:35 pdp11 March  1 23:20:35 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 1.043266 drft 0.007104 cmpl 0.022846
Mar  1 23:29:11 pdp11 March  1 23:29:11 ntpd[101]: adjust: STEP 10.1.0.10 st 3 off 4.085119 drft 0.007104 cmpl 0.022846

There is also a little story to tell about ntpd. Just after the SLIP link started working, I noticed that ntpd would crash right after establishing sync. That turned out to be caused by a minor problem in the load/convert integer to float instruction, LDCLF in this case. The error was that I set the length of the long integer to 16 bits whenever R7 was used in the source operand, but that rule should only be applied when the mode is 2. That caused a divide-by-zero. Luckily, I’m getting fairly good with adb… What is still a bit amazing is that this is quite an obvious error, yet none of the 11/34, 11/44, 11/45, 11/70, or J-11 MAINDEC tests I ran for the FP11 detected it. I can’t really complain about that, of course. These test programs were obviously not designed for finding bugs in a VHDL CPU almost 40 years later.
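
The fix boils down to one extra condition. The sketch below models only the rule as described above, in Python rather than the actual VHDL. The function names are invented for the example, and it deliberately ignores every other special case in the addressing modes.

```python
def long_operand_bits_buggy(mode, reg):
    """Old rule: any source operand using R7 was shortened to 16 bits."""
    return 16 if reg == 7 else 32


def long_operand_bits_fixed(mode, reg):
    """Corrected rule: only mode 2 with R7 (an immediate operand) is
    shortened to 16 bits; other R7 modes keep the full 32 bits."""
    return 16 if (reg == 7 and mode == 2) else 32
```

With a PC-relative source operand (mode 6 with R7), the old rule returns 16 and the corrected rule returns 32. For an immediate operand (mode 2 with R7), both return 16.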

So, I’m busy adding the new serial controller to all the board-level files. I’ve made some changes to the clock controller as well: it will be configurable for 50 or 60 Hz. And I’ve ordered some new toys, a Nexys3 board and the PMODNIC100. I’m not sure yet whether I’ll turn that into a DEUNA. As I said before, that’s a lot of work, and I’m not sure I like the DEUNA. I will start working on improving the RH controller, though, to make the SD card interface work separately from the rest of the controller, so that the bus can be released while the card is busy. That will make systems much more responsive during disk activity.

That’s all for now!