Tuesday, August 22, 2023

Stuff

Well, 2022 came and went without a post from me, and I apologize.  Oh, and most of 2023.  Sorry.

I had plans, honest.  I still have plans actually and maybe sometime in what remains of this year I can actually do something about them.

I started posting on Mastodon (if that means anything); you can check it out right here.  I sometimes post more than once a month, can you even stand it?

I started writing this post back in June, intending to publish it on the third anniversary of the shuttering of LCM+L, but I couldn't think of anything to say that wouldn't get me immediately sued by The Allen Estate (hallowed be its name) and/or Vulcan so I'll just say that it sure is a darn shame that the place hasn't reopened and I'm sure the current owner has the best of intentions.

Anyway.

ANYWAY.

I do have some new projects in progress and coming down the pike, so to speak.

Xerox Dolphin 

(For a bit more background on the Dolphin, see my previous post here about it.)

After many months I've finally made significant progress getting the Dolphin to boot over its 3mbit Ethernet interface!

Xerox 1100 (Dolphin) console

It doesn't really look like much, but that picture to the left is the Dolphin (aka 1100, aka D0) after successfully booting into the Alto Net Executive.  (At the same time I got the thing to boot, I learned that my console has no vertical deflection (more on that later)). 

The Dolphin can boot from either 3mbit Ethernet or the internal hard disk (a 24MB Shugart SA-4008, with 14" platters).  I've been updating the IFS implementation (which I started working on while at LCM, long may it stand) to add the protocols needed to bootstrap D0s.  It turns out this requires only one new protocol; everything else uses standard PUP services that I'd already implemented to support Alto systems.

That new protocol is "MicrocodeBootRequest" and it's used by the Dolphin's microcode ROM to bring down the "Initial" microcode from the network.  Initial does some hardware testing and initialization, and then initiates another MicrocodeBootRequest, this time to load so-called "emulator" microcode.  This is the final microcode loaded at boot, and its job is to "emulate" (that is, implement) the instruction set that the real software for the system runs on.  On the Dolphin, there were a variety of options, but typically this was either Alto (which emulates the Alto's Nova-based instruction set) or Mesa, which executes the Mesa instruction set.  The Dolphin Alto emulator microcode was primarily used as a bootstrap into other systems (Smalltalk, for example) or for running common Alto utilities to transfer files or copy disks.

The MicrocodeBootRequest protocol is so simple the only documentation for it is less than a paragraph long:

"For version 1 of the protocol, a server willing to supply the data simply sends a sequence of packets of type MicrocodeReply as fast as it can.  The high half of its pupID contains the version number (1) and the low half of the pupID contains the packet sequence number. After all the data packets have been sent, the server sends an empty (0 data bytes) packet for an end marker.  There are no acknowledgments. This protocol is used by Dolphins and Dorados. Currently, the version 1 servers send packets containing 3 * n words of data.  This constraint is imposed by the Rev L Dolphin EPROM microcode.  I’d like to remove this restriction if I get a chance, so please don’t take advantage of it unless you need to.  The Rev L Dolphin EPROM also requires the second word of the source socket to be 4. / HGM May - 80."

Seems simple enough, right?  It took me about five minutes to put together the initial implementation and... it didn't work.  Looked it over, fixed a couple of small things -- still, no joy.  Pulled hair for an entire evening and even that didn't help.
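For reference, here's roughly what that paragraph boils down to on the server side.  This is a minimal sketch and not my actual IFS code: the function names, the 32-bit pupID split into two 16-bit halves, the sequence numbers starting at zero, and the `words_per_packet` default are all assumptions made for illustration.  It also doesn't model the source socket requirement.

```python
from typing import Iterator, List, Tuple

PROTOCOL_VERSION = 1  # "version 1 of the protocol", per the quoted note


def make_pup_id(sequence: int) -> int:
    """Pack the version into the high half and the sequence number into the low half.

    Assumption: the pupID is 32 bits wide, split into two 16-bit halves.
    """
    return (PROTOCOL_VERSION << 16) | (sequence & 0xFFFF)


def microcode_reply_packets(
    words: List[int], words_per_packet: int = 96
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (pup_id, payload_words) for each MicrocodeReply packet, in order.

    Every data packet carries 3 * n words, and the stream ends with an empty
    packet. There are no acknowledgments, so nothing is ever read back.
    """
    if words_per_packet <= 0 or words_per_packet % 3 != 0:
        raise ValueError("words_per_packet must be a positive multiple of 3")
    if len(words) % 3 != 0:
        raise ValueError("microcode data must be a multiple of 3 words")

    sequence = 0  # assumption: numbering starts at 0
    for start in range(0, len(words), words_per_packet):
        yield make_pup_id(sequence), words[start:start + words_per_packet]
        sequence += 1

    # End marker: a packet with zero data bytes.
    yield make_pup_id(sequence), []
```

A real server would wrap each payload in a PUP of type MicrocodeReply and push it onto the wire as fast as it can; the generator above only captures the numbering, the 3 * n chunking, and the end marker.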

So I went looking for the Rev L Dolphin EPROM microcode sources and a friend of mine hooked me up (these sources have since been released publicly in CHM's PARC archives here).  And there, right there, near the beginning of the Ethernet input routine, was this:


"skip over first packet?"  What fresh hell is this?  But sure enough, I changed my implementation to spit out a garbage packet before beginning the actual transfer... and it worked!  The MP codes on the front panel of the D0 changed as Initial started a series of short diagnostic tests.

Front panel of the 1100, showing an MP code of 702

It stopped at "0702," which translates to "Bad Map" per the available documentation.  The map is a 16KW RAM used by the memory mapping hardware to provide address translation; essentially a lookup table for virtual to physical address mappings.  A failure here would seem to point to a bad RAM chip or two (which is not at all uncommon for the 4116 RAMs used in this thing) but it turned out to be the power supply -- the 12V supply had gone away, and 4116s require a 12V supply to function.  (It took me *way* too long to find this, and I'm still kind of embarrassed about it.)

I tracked the cause of the loss of 12V to a shorted electrolytic capacitor on one of the 96KW memory boards, and after replacing it all was good -- the tests in the "Initial" microcode finished without errors, and it progressed up to MP code 758, which means: "Trying to load AltoD0.eb from the Ethernet."  Unfortunately it never continued past that -- the D0 would indeed request AltoD0.eb from my IFS, and IFS would happily send it, but the boot process never went any further.

HP 1670D Logic Analyzer lashed up

And this is where I remained stuck until July 2023.  I'd considered a number of options, but ultimately what I needed to be able to see was what the microcode was actually doing when the boot failed.  Much like on the Alto, the D0's backplane offers an array of test points that let me look at the current microcode instruction address, and most (but not all) of the microcode instruction word being executed.  I rigged up the logic analyzer to capture this data, but the major issue was being able to decipher the instruction stream from a raw dump of the bits.

I started writing a disassembler for Dolphin microinstructions and made some small progress with it, but the instruction stream was still rather difficult to decipher.  And then it occurred to me: You dummy, you have the original source code for the boot ROM, it's in the CHM archive.  What if the archive also has the original Alto programs used to assemble those sources?  What if you could use those tools to produce a listing as well?  Then all you'd need would be the microcode instruction addresses from your logic analyzer captures, and you could see *exactly* what was being executed at the source level... with context!

ContrAlto running MicroD

Well, the idea had merit, I gotta admit.  And, as it turns out, the CHM archive did indeed have the D0 microcode assembler (called MicroD), and so it was time to whip out my other pet project, ContrAlto, to do the heavy lifting.  The result is available here (the archive contains both the sources and an Alto disk image containing all the necessary tools and sources to rebuild everything for all three stages of microcode boot).

After all that faffing about, now I had a set of listing files that looked like this:

MicroD 9.14 (OS 16) of March 25, 1981
  at 18-Jul-95 20:15:41

MicroD.run/o/m Sa4000Boot EtherBoot Boot


Sa4000Boot.DIB   302b instructions   written 18-Jul-95 20:14:37
EtherBoot.DIB    222b instructions   written 18-Jul-95 20:14:10
Boot.DIB         356b instructions   written 18-Jul-95 20:13:48

Total of 1102b instructions

Checking for errors...
Linking...
Building allocation lists...
Assigning locations...
  1102b instructions in rings involving ONPAGE or AT
Reloading binaries...
Checking assignment...
Writing .MB file...
Writing listing...

IM:

 Imag   Real    W0     W1   W2  Symbol
 ----   ----  ------ ------ --  --------
Sa4000Boot.DIB:
    0     77   30050  33171 14  DISKBOOT
    1     74   30024 131025 14   (+1)
    2     12   30147  21372 14   (+2)
    3     13   34004 101020 14   (+3)
    4     10   30026 101014 14  DISKEMULATORLOOP
    5      6   30050  25011 14   (+1)
    6      4   30050  25372 14   (+2)
    7      5   31350 124414 14   (+3)
   10      7   35350 124420 14   (+4)
   11     11   30045   3175 14   (+5)
   12     76   30050  25132 14   (+6)
   13     75   30050  25401 14  DISKPAGETASKSWITCH
   14  @6000   30045   1002 14  INITRDC
   15   6001   30002  61277 14   (+1)
   16   6002   30020  41101 14   (+2)

 ...

The column labeled "Real" indicates the physical address in the control store at which the given symbol+offset resides.  Given that, I could do fun things like annotate the original source code with those addresses:

%
Get here is Task 0 with previous error code in T and in MP and all old devices already turned off.

If EtherBoting doesn't work, we will blink the MP between the old number and our number.
%

1400    EtherBoot:    OldMPCode <- T, AT[EtherBootLoc];

%
Check to see if we were booted by the fault handler (Boot instruction in Boot.mc) or booted because we got another fault while the fault handler (task 15) was already running.  If so, delay a while to avoid flooding the net.  Note that we will get an H4 pairity error right away if the pairity bit on the host number switches is set wrong.

We would like to test for a programmed boot (BootF = 4) or a parity boot (1), but the hardware doesn't latch BootF.  So for now, we will take the short delay if we recognize why we are getting booted.
%
* BEWARE: The documentation (Feb 80) doesn't mention that the bits are read in complemented.
1401        T <- (GetRSpec[157]) XOR (77C);
1411    R0 <- T;
*    LU <- (R0) AND (5C);  * programed Boot or Parity Boot
*    SKIP[ALU=0],TimerReg <- 400C;
1427    LU <- (R0) AND (62C);  * PowerOn, Tester, or PushButton
1456    SKIP[ALU#0],TimerReg <- 400C;
1454    TimerReg <- 20000C;
1455    NOP;    * Allocation
1404 EtherDally:    R0 <- 60000C;
1402    R0 <- (R0)-1, GOTO[.,R>=0];
1403    TimerReg <- (TimerReg)-1, GOTO[EtherDally,R>=0];
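The address-to-symbol lookup behind those annotations is easy to automate.  Below is a minimal sketch (not the tooling I actually used) that parses the "IM:" rows of a MicroD listing into a table keyed by the octal "Real" address.  The function names and the regular expression are my own, and I'm assuming every row has the six columns shown in the excerpt above, with an optional "@" in front of the Real address.

```python
import re
from typing import Dict

# Imag, Real (optionally prefixed with "@"), W0, W1, W2, Symbol -- numbers are octal.
ROW_RE = re.compile(
    r"^\s*([0-7]+)\s+@?([0-7]+)\s+([0-7]+)\s+([0-7]+)\s+([0-7]+)\s+(\S+)\s*$"
)


def parse_listing(text: str) -> Dict[int, str]:
    """Map each Real control store address to a 'SYMBOL' or 'SYMBOL+n' label."""
    table: Dict[int, str] = {}
    current_symbol = None
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # headers, file names, blank lines, etc.
        real_address = int(match.group(2), 8)
        symbol = match.group(6)
        if symbol.startswith("(+") and symbol.endswith(")"):
            # Offset row: attach it to the most recent named symbol.
            offset = symbol[1:-1]  # "(+3)" -> "+3", kept as written in the listing
            label = f"{current_symbol}{offset}" if current_symbol else symbol
        else:
            current_symbol = symbol
            label = symbol
        table[real_address] = label
    return table


def describe(table: Dict[int, str], address: int) -> str:
    """Return the label for a captured instruction address, or '?' if unknown."""
    return table.get(address, "?")
```

Feeding each instruction address from a logic analyzer capture through `describe` gives the same kind of source-level context as the hand annotations, just with less squinting.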

With this annotated listing in hand, it was relatively easy to see what was going on in the processor after it'd attempted to download the emulator microcode.  What it looked like on the logic analyzer was an infinite (or nearly so) loop of the below:

D0          -  State Listing                                                    

                           
Label    >   Odata  CIA  
Base     >   Octal  Octa
____________ ______ ____
    13290   000220 0302   
    13292   001462 0326    
    13294   004122 0336    
    13296   004220 0303    
    13298   001462 0326    
    13300   004122 0336    
    13302   000220 0304    
    13310   001462 0326        
    13313   000222 0336
    13314   000263 0305    
    13316   006264 0310        
    13321   000527 0307    
    13323   000522 0313
    13324   000535 0314    
    13326   000626 0315
    13329   000526 0317    
    13331   001426 0320    
    13333   000022 0321        
    13337   000526 0322    
    13339   001426 0323    
    13341   000022 3774    
    13345   004126 0325    
    13347   000220 0302

The annotated listing revealed the above loop to be this bit of microcode (in LoadRam.mc):

OnPage[LRJpage];

LRJenter:
0300     xfTemp <- 1c, AT[LRJStart,0];
* following location is the place to restart for overlays
0301    T <- xfTemp, AT[LRJContinue];    * wait for write of xfTemp to avoid bypass problem
LRJloop:
0302    PFetch1[LP,xBuf2], Call[LRJIncCount], At[LRJStart,2];
0303    PFetch1[LP,xBuf], Call[LRJIncCount], At[LRJStart,3];
0304    PFetch1[LP,xBuf1], Call[LRJIncCount], At[LRJStart,4];
0305    LU <- (LdF[xfTemp1,14,3]) + 1, GoTo[DoTask,R Odd], At[LRJStart,5];
0310    xfTemp1 <- (xfTemp1) + (2c), Skip[Alu#0], At[LRJStart,10];
0306      Refresh[xfTemp1, 0], At[LRJStart,6];
0307    T <- LdF[xBuf2,0,14], At[LRJStart,7];    * address
0313    xBuf3 <- T, At[LRJStart,13];
LRJloopx:
0314    LU <- (xBuf3) xnor (170000c), At[LRJStart,14]; *look for m-i address = 7777
0315    T <- xBuf2, GoTo[RamLoaded,Alu=0], At[LRJStart,15];

0317    LU <- xBuf, At[LRJStart,17];
0320    APCTask&APC <- xBuf3, At[LRJStart,20];
0321    WriteCS0&2, At[LRJStart,21];
0322    LU <- xBuf1, At[LRJStart,22];
0323    APCTask&APC <- xBuf3, At[LRJStart,23];
0324    WriteCS1, At[LRJStart,24];
0325    T <- xfTemp, GoTo[LRJloop], At[LRJStart,25];

LRJIncCount:
0326    T <- xfTemp <- (xfTemp) + 1, UseCTask, GoTo[LRJRet], At[LRJStart,26];

This loop stuffs the downloaded microcode from main memory into the D0's control store... so the download had apparently completed, but why was the copy to the control store never finishing? 

Even though microcode's not the easiest thing to read, the code is fairly simple: it copies microcode data from main memory into the microcode store until it finds an entry with a destination address of 7777; this exit condition is tested for by the instruction at 0314, the actual branch takes place in 0315.  So the only obvious way to get an infinite loop here is if the microcode data in memory is somehow invalid and doesn't have an entry with address 7777 specified in it.  If that were to occur, the microcode would loop through memory forever since there are no other exit conditions.
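In higher-level terms, the loop behaves something like the sketch below.  This is a simplified model, not a translation of the microcode: each entry is treated as an already-decoded `(address, word_a, word_b)` tuple (the real loop fetches three words and extracts the address from a bit field of the first one), the control store is a plain dictionary, and the names are mine.

```python
from typing import Dict, Iterable, Tuple

END_ADDRESS = 0o7777  # destination address that marks the end of the microcode image

Entry = Tuple[int, int, int]  # (address, word_a, word_b), already decoded


def load_control_store(entries: Iterable[Entry]) -> Dict[int, Tuple[int, int]]:
    """Copy entries into a modeled control store until the 7777 marker appears."""
    control_store: Dict[int, Tuple[int, int]] = {}
    for address, word_a, word_b in entries:
        if address == END_ADDRESS:
            # The only exit condition: the test at 0314 and the branch at 0315.
            return control_store
        control_store[address] = (word_a, word_b)
    # The real microcode has no equivalent of this: it just keeps walking
    # through memory. Raising here stands in for "loops forever".
    raise RuntimeError("ran out of entries without seeing a 7777 end marker")
```

With a well-formed image the function returns as soon as it sees 7777.  With empty or garbage memory it never finds the marker, which is the situation the logic analyzer trace was showing.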

I decided to look a step back in the boot process and see what the actual microcode download was doing -- perhaps by looking at the behavior of the microcode during download some additional secrets would be revealed.  I set the logic analyzer to trigger at E3InStart (microcode address 2110) and rebooted the Dolphin.  And what I noticed was that shortly after triggering, the microcode execution trace fell right back into that same LRJloop as above -- so the download was being short-circuited for some reason.  I moved the trigger back to LRJloop, and I could watch the Dolphin jump into that loop while my IFS server was still sending microcode.  So it was 100% clear that whatever I was sending, the Initial microcode didn't like it.

But why?

If the microcode that the IFS sent was invalid or corrupted (bad checksum) or just being ignored, it'd trigger a retry of the MicrocodeBootRequest, but that's not what was happening -- the Initial microcode was reading in a single Ethernet packet, taking zero bytes out of it, and continuing happily on its way -- leaving no data in memory for the LRJloop code to copy, and no "7777" marker to indicate the end of the transfer.

Zero bytes... none of the packets that IFS is sending should have a zero byte payload.  Except that pesky garbage packet I hacked in to satisfy the boot ROM microcode... hmm.  Well, it made no sense at all, because the Initial code also discards the first packet it receives, but just for the fun of it I hacked the IFS code again.  This time I modified it to send the garbage packet only for the Initial microcode, not for the AltoD0 microcode.

It worked.  It worked.  The boot ROM pulled down Initial, Initial pulled down AltoD0, and the emulator booted into the Alto Net Executive via a Breath Of Life packet (just like a real Alto!) and there you have it.
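The server-side change amounts to something like the following.  It's a sketch under assumptions rather than the real IFS code: the boot file names used as keys, the `NEEDS_THROWAWAY_PACKET` set, and the function name are all hypothetical, and how the server actually works out which file a request is for isn't shown.

```python
from typing import Iterable, Iterator

# Hypothetical: boot files that get a leading zero-byte garbage packet.
# Only the Initial microcode (requested by the boot ROM) is in this set;
# the AltoD0 emulator microcode (requested by Initial) is deliberately not.
NEEDS_THROWAWAY_PACKET = {"Initial"}


def payloads_for(boot_file: str, data_payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Yield packet payloads in the order they should go out on the wire."""
    if boot_file in NEEDS_THROWAWAY_PACKET:
        yield b""  # garbage packet for the requester to skip over
    yield from data_payloads
    yield b""  # normal end-of-transfer marker (0 data bytes)
```

So a request for Initial produces an empty packet, the data, and the end marker, while a request for AltoD0.eb produces just the data and the end marker.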

As to why this hack (on top of a hack) works, I'm still not sure.  I don't know why the D0's Boot ROM and Initial microcode skip that first packet -- the Xerox IFS implementation doesn't send a garbage packet, it effectively does exactly what my re-implementation does.  Further, the boot microcode on the Dorado (the D0's ECL-based big brother, which uses the same MicrocodeBootRequest protocol) doesn't discard any packets at all!  As far as I can tell, my IFS code (and the hardware it's built on top of) isn't doing anything incorrect, but I can't discount having overlooked something.  The discrepancy between the Dolphin and the Dorado's microcode makes me think that the Dolphin may have had some kind of hardware bug that the boot microcode is working around.

The 1100 Console's deflection yoke

At this point I'm pretty close to having the system running again, but I need to fix the issue with the console's lack of vertical deflection.  The cause is the deflection yoke: the vertical windings were corroded by the glue used to hold them in place, and went open.  Fortunately, a fellow I know rewinds transformers and the like for audio folks and he's graciously taken it on as a short project, so hopefully when that's complete I'll be able to see an actual picture on screen.

After that, I need to work on the hard drive, after which I can have all kinds of fun (it'd be neat to try to get Cedar running, for one...)  Then... I tackle the Dorado...

Other Projects

In addition to the Dolphin and the Dorado projects, I also semi-recently acquired a DEC PDP-11/60, an 11/20, most of a Honeywell DDP-416 processor, a Raytheon 704, a DEC KS10, and I will be taking a trip later this year to grab a PDP-15.  I also still have to finish restoring the PDP-11/45 and read in about 80 QIC tapes for the Three Rivers PERQ, and there are probably at least three other things I've forgotten.  I guess what I'm saying is: I have a problem, this blog is a cry for help, won't someone please help me end the pain.

Until next time (probably September of 2025, by my reckoning), be excellent to each other... and PARTY ON, DUDES!