Well, 2022 came and went without a post from me, and I apologize. Oh, and most of 2023. Sorry.
I had plans, honest. I still have plans actually and maybe sometime in what remains of this year I can actually do something about them.
I started posting on Mastodon (if that means anything); you can check it out right here. I sometimes post more than once a month, can you even stand it?
I started writing this post back in June, intending to publish it on the third anniversary of the shuttering of LCM+L, but I couldn't think of anything to say that wouldn't get me immediately sued by The Allen Estate (hallowed be its name) and/or Vulcan so I'll just say that it sure is a darn shame that the place hasn't reopened and I'm sure the current owner has the best of intentions.
Anyway.
ANYWAY.
I do have some new projects in progress and coming down the pike, so to speak.
Xerox Dolphin
(For a bit more background on the Dolphin, see my previous post here about it.)
After many months I've finally made significant progress getting the Dolphin to boot over its 3mbit Ethernet interface!
Xerox 1100 (Dolphin) console
It doesn't really look like much, but that picture to the left is the Dolphin (aka 1100, aka D0) after successfully booting into the Alto Net Executive. (At the same time I got the thing to boot, I learned that my console has no vertical deflection (more on that later)).
The Dolphin can boot from either 3mbit Ethernet or from the internal hard disk (a 24MB Shugart SA-4008, with 14" platters). I've been working on updating the IFS implementation (which I started working on while at LCM, long may it stand) to add the new protocols needed for bootstrapping D0s. Turns out this only actually requires one new protocol; everything else uses standard PUP services that I'd already implemented to support Alto systems.
That new protocol is "MicrocodeBootRequest" and it's used by the Dolphin's microcode ROM to bring down the "Initial" microcode from the network. Initial does some hardware testing and initialization, and then initiates another MicrocodeBootRequest, this time to load so-called "emulator" microcode. This is the final microcode loaded at boot, and its job is to "emulate" (that is, implement) the instruction set that the real software for the system runs on. On the Dolphin, there were a variety of options, but typically this was either Alto (which emulates the Alto's Nova-based instruction set) or Mesa, which executes the Mesa instruction set. The Dolphin Alto emulator microcode was primarily used as a bootstrap into other systems (Smalltalk, for example) or for running common Alto utilities to transfer files or copy disks.
The MicrocodeBootRequest protocol is so simple the only documentation for it is less than a paragraph long:
"For version 1 of the protocol, a server willing to supply the data simply sends a sequence of packets of type MicrocodeReply as fast as it can. The high half of its pupID contains the version number (1) and the low half of the pupID contains the packet sequence number. After all the data packets have been sent, the server sends an empty (0 data bytes) packet for an end marker. There are no acknowledgments. This protocol is used by Dolphins and Dorados. Currently, the version 1 servers send packets containing 3 * n words of data. This constraint is imposed by the Rev L Dolphin EPROM microcode. I’d like to remove this restriction if I get a chance, so please don’t take advantage of it unless you need to. The Rev L Dolphin EPROM also requires the second word of the source socket to be 4. / HGM May - 80."
Seems simple enough, right? It took me about five minutes to put together the initial implementation and... it didn't work. Looked it over, fixed a couple of small things -- still, no joy. Pulled hair for an entire evening and even that didn't help.
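For reference, here's roughly what the protocol as documented boils down to. This is a minimal Python sketch, not my actual IFS code: the function name, the (pup_id, payload) tuple shape, the default packet size, and the choice to start sequence numbers at 0 are all assumptions made for illustration. The documentation only pins down the version number in the high half of the pupID, the sequence number in the low half, the 3 * n word payloads, and the empty end marker.

```python
from typing import Iterator, List, Tuple

PROTOCOL_VERSION = 1  # "version 1 of the protocol" per the quoted note


def microcode_reply_packets(
    words: List[int], words_per_packet: int = 3 * 85
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (pup_id, payload_words) pairs for one microcode image.

    words_per_packet defaults to an arbitrary multiple of 3 (an assumption);
    the only documented constraint is that each packet carries 3 * n words.
    """
    if words_per_packet <= 0 or words_per_packet % 3 != 0:
        raise ValueError("payload size must be a positive multiple of 3 words")
    if len(words) % 3 != 0:
        raise ValueError("image length must be a multiple of 3 words")

    seq = 0  # assumption: the documentation doesn't say where numbering starts
    for start in range(0, len(words), words_per_packet):
        payload = words[start:start + words_per_packet]
        # High half of the pupID: version. Low half: packet sequence number.
        yield (PROTOCOL_VERSION << 16) | (seq & 0xFFFF), payload
        seq += 1

    # End marker: a packet with zero data bytes. No acknowledgments anywhere.
    yield (PROTOCOL_VERSION << 16) | (seq & 0xFFFF), []
```

Wrapping each pair in a real Pup (type field, sockets, checksum) is left out on purpose, since none of those details are spelled out in the paragraph above.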
So I went looking for the Rev L Dolphin EPROM microcode sources and a friend of mine hooked me up (these sources have since been released publicly in CHM's PARC archives here). And there, right there, near the beginning of the Ethernet input routine, was this:
"skip over first packet?" What fresh hell is this? But sure enough, I changed my implementation to spit out a garbage packet before beginning the actual transfer... and it worked! The MP codes on the front panel of the D0 changed as Initial started a series of short diagnostic tests.
Front panel of the 1100, showing an MP code of 702
It stopped at "0702," which translates to "Bad Map" per the available documentation. The map is a 16KW RAM used by the memory mapping hardware to provide address translation; essentially a lookup table for virtual to physical address mappings. A failure here would seem to point to a bad RAM chip or two (which is not at all uncommon for the 4116 RAMs used in this thing) but it turned out to be the power supply -- the 12V supply had gone away, and 4116s require a 12V supply to function. (It took me *way* too long to find this, and I'm still kind of embarrassed about it.)
I tracked the cause of the loss of 12V to a shorted electrolytic capacitor on one of the 96KW memory boards, and after replacing it all was good -- the tests in the "Initial" microcode finished without errors, and it progressed up to MP code 758, which means: "Trying to load AltoD0.eb from the Ethernet." Unfortunately it never continued past that -- the D0 would indeed request AltoD0.eb from my IFS, and IFS would happily send it, but the boot process never went any further.
HP 1670D Logic Analyzer lashed up
I started writing a disassembler for Dolphin microinstructions and made some small progress with it, but the instruction stream was still rather difficult to decipher. And then it occurred to me: You dummy, you have the original source code for the boot ROM, it's in the CHM archive. What if the archive also has the original Alto programs used to assemble those sources? What if you could use those tools to produce a listing as well? Then all you'd need would be the microcode instruction addresses from your logic analyzer captures, and you could see *exactly* what was being executed at the source level... with context!
ContrAlto running MicroD
After all that faffing about, now I had a set of listing files that looked like this:
MicroD 9.14 (OS 16) of March 25, 1981
at 18-Jul-95 20:15:41
MicroD.run/o/m Sa4000Boot EtherBoot Boot
Sa4000Boot.DIB 302b instructions written 18-Jul-95 20:14:37
EtherBoot.DIB 222b instructions written 18-Jul-95 20:14:10
Boot.DIB 356b instructions written 18-Jul-95 20:13:48
Total of 1102b instructions
Checking for errors...
Linking...
Building allocation lists...
Assigning locations...
1102b instructions in rings involving ONPAGE or AT
Reloading binaries...
Checking assignment...
Writing .MB file...
Writing listing...
IM:
Imag   Real      W0      W1  W2  Symbol
----   ----  ------  ------  --  --------
Sa4000Boot.DIB:
0        77   30050   33171  14  DISKBOOT
1        74   30024  131025  14  (+1)
2        12   30147   21372  14  (+2)
3        13   34004  101020  14  (+3)
4        10   30026  101014  14  DISKEMULATORLOOP
5         6   30050   25011  14  (+1)
6         4   30050   25372  14  (+2)
7         5   31350  124414  14  (+3)
10        7   35350  124420  14  (+4)
11       11   30045    3175  14  (+5)
12       76   30050   25132  14  (+6)
13       75   30050   25401  14  DISKPAGETASKSWITCH
14    @6000   30045    1002  14  INITRDC
15     6001   30002   61277  14  (+1)
16     6002   30020   41101  14  (+2)
...
The column labeled "Real" indicates the physical address in the control store that the given symbol+offset resides at. Given that, I could do fun things like annotate the original source code:
%
Get here is Task 0 with previous error code in T and in MP and all old devices already turned off.
If EtherBoting doesn't work, we will blink the MP between the old number and our number.
%
1400 EtherBoot: OldMPCode <- T, AT[EtherBootLoc];
%
Check to see if we were booted by the fault handler (Boot instruction in Boot.mc) or booted because we got another fault while the fault handler (task 15) was already running. If so, delay a while to avoid flooding the net. Note that we will get an H4 pairity error right away if the pairity bit on the host number switches is set wrong.
We would like to test for a programmed boot (BootF = 4) or a parity boot (1), but the hardware doesn't latch BootF. So for now, we will take the short delay if we recognize why we are getting booted.
%
* BEWARE: The documentation (Feb 80) doesn't mention that the bits are read in complemented.
1401 T <- (GetRSpec[157]) XOR (77C);
1411 R0 <- T;
* LU <- (R0) AND (5C); * programed Boot or Parity Boot
* SKIP[ALU=0],TimerReg <- 400C;
1427 LU <- (R0) AND (62C); * PowerOn, Tester, or PushButton
1456 SKIP[ALU#0],TimerReg <- 400C;
1454 TimerReg <- 20000C;
1455 NOP; * Allocation
1404 EtherDally: R0 <- 60000C;
1402 R0 <- (R0)-1, GOTO[.,R>=0];
1403 TimerReg <- (TimerReg)-1, GOTO[EtherDally,R>=0];
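The lookup behind that annotation (control store address in, symbol+offset out) is simple enough to sketch. The Python below is an illustration rather than the tooling I actually used: the function names are made up, the sample rows are copied from the MicroD listing earlier in this post, and the "@" prefix on some Real addresses is simply stripped without assuming what it means.

```python
from typing import Dict, Iterable, List

# A few rows copied from the MicroD listing shown earlier.
SAMPLE_LISTING = """\
Imag Real W0 W1 W2 Symbol
---- ---- ------ ------ -- --------
0 77 30050 33171 14 DISKBOOT
1 74 30024 131025 14 (+1)
2 12 30147 21372 14 (+2)
13 75 30050 25401 14 DISKPAGETASKSWITCH
14 @6000 30045 1002 14 INITRDC
15 6001 30002 61277 14 (+1)
"""


def real_address_map(listing: str) -> Dict[int, str]:
    """Map each octal 'Real' address to a 'SYMBOL' or 'SYMBOL+n' label."""
    labels: Dict[int, str] = {}
    current = None  # most recent named symbol, used to expand "(+n)" rows
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # not an instruction row
        _imag, real, _w0, _w1, _w2, symbol = fields
        try:
            address = int(real.lstrip("@"), 8)  # addresses are octal
        except ValueError:
            continue  # column header or separator row
        if symbol.startswith("(+") and symbol.endswith(")"):
            if current is None:
                continue  # offset row with no preceding symbol
            label = f"{current}+{symbol[2:-1]}"
        else:
            current = symbol
            label = symbol
        labels[address] = label
    return labels


def annotate(addresses: Iterable[int], labels: Dict[int, str]) -> List[str]:
    """Render captured addresses next to their labels ('?' if unknown)."""
    return [f"{a:04o} {labels.get(a, '?')}" for a in addresses]
```

Feed it the addresses from a logic analyzer capture and every line of the trace gets a name attached, which is most of the battle.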
With this annotated listing in hand, it was relatively easy to see what was going on in the processor after it'd attempted to download the emulator microcode. What it looked like on the logic analyzer was an infinite (or nearly so) loop of the below:
D0 - State Listing
Label > Odata CIA
Base > Octal Octa
____________ ______ ____
13290 000220 0302
13292 001462 0326
13294 004122 0336
13296 004220 0303
13298 001462 0326
13300 004122 0336
13302 000220 0304
13310 001462 0326
13313 000222 0336
13314 000263 0305
13316 006264 0310
13321 000527 0307
13323 000522 0313
13324 000535 0314
13326 000626 0315
13329 000526 0317
13331 001426 0320
13333 000022 0321
13337 000526 0322
13339 001426 0323
13341 000022 3774
13345 004126 0325
13347 000220 0302
The annotated listing revealed the above loop to be this bit of microcode (in LoadRam.mc):
OnPage[LRJpage];
LRJenter:
0300 xfTemp <- 1c, AT[LRJStart,0];
* following location is the place to restart for overlays
0301 T <- xfTemp, AT[LRJContinue]; * wait for write of xfTemp to avoid bypass problem
LRJloop:
0302 PFetch1[LP,xBuf2], Call[LRJIncCount], At[LRJStart,2];
0303 PFetch1[LP,xBuf], Call[LRJIncCount], At[LRJStart,3];
0304 PFetch1[LP,xBuf1], Call[LRJIncCount], At[LRJStart,4];
0305 LU <- (LdF[xfTemp1,14,3]) + 1, GoTo[DoTask,R Odd], At[LRJStart,5];
0310 xfTemp1 <- (xfTemp1) + (2c), Skip[Alu#0], At[LRJStart,10];
0306 Refresh[xfTemp1, 0], At[LRJStart,6];
0307 T <- LdF[xBuf2,0,14], At[LRJStart,7]; * address
0313 xBuf3 <- T, At[LRJStart,13];
LRJloopx:
0314 LU <- (xBuf3) xnor (170000c), At[LRJStart,14]; *look for m-i address = 7777
0315 T <- xBuf2, GoTo[RamLoaded,Alu=0], At[LRJStart,15];
0317 LU <- xBuf, At[LRJStart,17];
0320 APCTask&APC <- xBuf3, At[LRJStart,20];
0321 WriteCS0&2, At[LRJStart,21];
0322 LU <- xBuf1, At[LRJStart,22];
0323 APCTask&APC <- xBuf3, At[LRJStart,23];
0324 WriteCS1, At[LRJStart,24];
0325 T <- xfTemp, GoTo[LRJloop], At[LRJStart,25];
LRJIncCount:
0326 T <- xfTemp <- (xfTemp) + 1, UseCTask, GoTo[LRJRet], At[LRJStart,26];
This loop stuffs the downloaded microcode from main memory into the D0's control store... so the download had apparently completed, but why was the copy to the control store never finishing?
Even though microcode's not the easiest thing to read, the code is fairly simple: it copies microcode data from main memory into the microcode store until it finds an entry with a destination address of 7777; this exit condition is tested for by the instruction at 0314, the actual branch takes place in 0315. So the only obvious way to get an infinite loop here is if the microcode data in memory is somehow invalid and doesn't have an entry with address 7777 specified in it. If that were to occur, the microcode would loop through memory forever since there are no other exit conditions.
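Here's a small Python model of that loop, mostly to show the exit condition. It's a simplification under stated assumptions: each entry is modeled as a plain (address, data) pair rather than the three memory words the real loop fetches, the function names are invented, and the XNOR check reflects my reading of the instruction at 0314 on a 16-bit ALU. Unlike the real hardware, the model gives up when it runs out of entries instead of spinning forever.

```python
from typing import Dict, List, Tuple

END_ADDRESS = 0o7777  # the "m-i address = 7777" end marker from the comment


def is_end_marker(address: int) -> bool:
    """Model of the test at 0314: (address XNOR 170000) == 0 on 16 bits.

    XNOR yields zero only when every bit differs, so for a 12-bit address
    the result is zero exactly when the address is 7777 octal.
    """
    return ((~(address ^ 0o170000)) & 0xFFFF) == 0


def load_control_store(
    entries: List[Tuple[int, int]]
) -> Tuple[Dict[int, int], bool]:
    """Copy (address, data) entries until the end marker shows up.

    Returns the modeled control store and whether the marker was found.
    On the real machine there is no 'ran out of entries' case: the loop
    just keeps walking through memory.
    """
    store: Dict[int, int] = {}
    for address, data in entries:
        if is_end_marker(address):
            return store, True  # corresponds to GoTo[RamLoaded, Alu=0]
        store[address] = data
    return store, False  # no 7777 entry: the hardware would loop forever
```

With no data in memory at all, the second case is the one you get, which matches what the logic analyzer was showing.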
I decided to look a step back in the boot process and see what the actual microcode download was doing -- perhaps by looking at the behavior of the microcode during download some additional secrets would be revealed. I set the logic analyzer to trigger at E3InStart (microcode address 2110) and rebooted the Dolphin. And what I noticed was that shortly after triggering, the microcode execution trace fell right back into that same LRJloop as above -- so the download was being short-circuited for some reason. I moved the trigger back to LRJloop, and I could watch the Dolphin jump into that loop while my IFS server was still sending microcode. So it was 100% clear that whatever I was sending, the Initial microcode didn't like it.
But why?
If the microcode that the IFS sent was invalid or corrupted (bad checksum) or just being ignored it'd trigger a retry of the MicrocodeBootRequest, but that's not what was happening -- the Initial microcode was reading in a single Ethernet packet, taking zero bytes out of it, and continuing happily on its way -- leaving no data in memory for the LRJloop code to copy, and no "7777" marker to indicate the end of the transfer.
Zero bytes... none of the packets that IFS is sending should have a zero byte payload. Except that pesky garbage packet I hacked in to satisfy the boot ROM microcode... hmm. Well, it made no sense at all, because the Initial code also discards the first packet it receives, but just for the fun of it I hacked the IFS code again. This time I modified it to send the garbage packet only for the Initial microcode, not for the AltoD0 microcode.
It worked. It worked. The boot ROM pulled down Initial, Initial pulled down AltoD0, and the emulator booted into the Alto Net Executive via a Breath Of Life packet (just like a real Alto!) and there you have it.
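In code terms, the hack on top of the hack amounts to something like the sketch below. Everything named here is hypothetical (the Packet shape, the placeholder ID on the garbage packet, the sending_initial flag); it's only meant to pin down the behavior that ended up working, not to show how my IFS code decides which file it's serving.

```python
from typing import List, Tuple

Packet = Tuple[int, bytes]  # hypothetical (pup_id, payload) pair

# The throwaway packet: zero bytes of payload. The ID is a placeholder.
GARBAGE_PACKET: Packet = (0, b"")


def packets_to_send(image_packets: List[Packet], sending_initial: bool) -> List[Packet]:
    """Prepend the garbage packet only when serving the Initial microcode.

    The boot ROM skips the first packet it receives, so Initial needs the
    sacrificial packet up front. Sending the same packet ahead of the
    emulator microcode (AltoD0) is what broke the second stage.
    """
    if sending_initial:
        return [GARBAGE_PACKET] + list(image_packets)
    return list(image_packets)
```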
As to why this hack (on top of a hack) works, I'm still not sure. I don't know why the D0's Boot ROM and Initial microcode skip that first packet -- the Xerox IFS implementation doesn't send a garbage packet, it effectively does exactly what my re-implementation does. Further, the boot microcode on the Dorado (the D0's ECL-based big brother, which uses the same MicrocodeBootRequest protocol) doesn't discard any packets at all! As far as I can tell, my IFS code (and the hardware it's built on top of) isn't doing anything incorrect, but I can't discount having overlooked something. The discrepancy between the Dolphin and the Dorado's microcode makes me think that the Dolphin may have had some kind of hardware bug that the boot microcode is working around.
At this point I'm pretty close to having the system running again, but I need to fix the issue with the console's lack of vertical deflection. The cause is the deflection yoke: the vertical windings were corroded by the glue used to hold them in place, and went open. Fortunately, a fellow I know rewinds transformers and the like for audio folks and he's graciously taken it on as a short project, so hopefully when that's complete I'll be able to see an actual picture on screen.
The 1100 Console's deflection yoke
After that, I need to work on the hard drive, after which I can have all kinds of fun (it'd be neat to try to get Cedar running, for one). Then... I tackle the Dorado...
Other Projects
In addition to the Dolphin and the Dorado projects, I also semi-recently acquired a DEC PDP-11/60, an 11/20, most of a Honeywell DDP-416 processor, a Raytheon 704, a DEC KS10, and I will be taking a trip later this year to grab a PDP-15. I also still have to finish restoring the PDP-11/45 and read in about 80 QIC tapes for the Three Rivers PERQ, and there are probably at least three other things I've forgotten. I guess what I'm saying is: I have a problem, this blog is a cry for help, won't someone please help me end the pain.
Until next time (probably September of 2025, by my reckoning), be excellent to each other... and PARTY ON, DUDES!