January 2021: Debugging Commences
In my last post I left off my tale of PDP-11/70 restoration in late January, having just powered up the rebuilt supplies after reinstalling them in the chassis. The next step was to reinstall the processor, cache, and memory and see what happens:
[Photo: (not quite) The First Power-up with Stuff Installed]
The answer is: not much. The processor was almost entirely unresponsive: it powered up with the RUN and MASTER lights on and wasn't responding to most input from the front panel. Toggling the "Halt" switch and hitting "Start" caused the RUN light to go out, but that's the only response I got from the console.
Enter the KM11-A:
How does one debug a processor as complex as the 11/70's? These days, advanced diagnostic tools like logic analyzers and digital storage oscilloscopes are commonplace, but in 1974 they weren't really an option. DEC's solution to this was the KM11-A "Maintenance Set", a pair of boards with an array of lights and four switches. The lights were used to monitor device state, and the switches controlled the behavior of the device and allowed for single-stepping processors. The KM11 could be used to debug a variety of DEC hardware -- various PDP-11 processors and a few different peripherals and device controllers. My KM11-A is a reproduction, which I built over a decade ago to debug my PDP-11/40; since then I've also used it to repair my PDP-11/05. And now, it's time for the KM11 to work its magic again.
With the KM11 boardset installed (you can see it sticking out from the left-hand side in the above picture) I was able to step the processor through micro-instruction execution. A toggle switch on the KM11 clocks the processor, and the DATA lights on the front panel show the microcode address in the right 8 bits (with the selector knob turned to "uADDRS FPP/CPU"). (In the above picture it's showing address 200 octal.) The PDP-11/70's KB11-C processor is microcoded, using an array of small, high-speed bipolar PROMs to store 256 64-bit microcode words. These 256 words are interpreted by the hardware to implement the PDP-11 instruction set, address memory and the Unibus, and to interface the processor to the front panel.
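As a quick sanity check on those numbers, here is a small Python sketch of why 256 microcode words fit in the right 8 DATA lights, and what an octal reading such as 200 looks like as a light pattern. The function names are my own, made up for illustration, and the only hardware facts assumed are the ones stated above (256 words of 64 bits, address shown in octal).

```python
MICROCODE_WORDS = 256   # from the text: 256 microcode words
WORD_BITS = 64          # from the text: each word is 64 bits wide


def address_bits(words: int) -> int:
    """Number of bits needed to address `words` locations (0 .. words-1)."""
    return (words - 1).bit_length()


def octal_to_int(octal_text: str) -> int:
    """Convert an octal reading (as written in the flow diagrams) to an integer."""
    return int(octal_text, 8)


def light_pattern(address: int, width: int = 8) -> str:
    """Render an address as a row of lights: '1' = lit, '0' = dark."""
    return format(address, f"0{width}b")


zap = octal_to_int("200")
print(address_bits(MICROCODE_WORDS))   # bits needed for the micro-address
print(zap)                             # 200 octal as a decimal number
print(light_pattern(zap))              # which of the 8 lights are lit
print(MICROCODE_WORDS * WORD_BITS)     # total microcode storage in bits
```

So an address of 200 octal is decimal 128, which shows up as a single lit light in the leftmost of the eight address positions.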
The KB11-C Engineering Drawings contain 14 pages of "flow diagrams" which detail precisely how the microcode executes. The "KB11-C Processor Manual" (EK-KB11C-TM-001) provides 376 pages explaining exactly how the hardware works. A typical flow diagram looks like:
[Image: Feel Flows]
This is FLOWS 14, which diagrams the Console (front panel) portion of the microcode. Top-center, you can see a starting bubble labeled "CON.00" which marks the start of the console portion of the microcode. The box below it represents a single microinstruction, and details the operation of this microinstruction in each of the processor instruction cycle's "T-states." The arrows coming out of this box indicate branches to other microinstructions, depending on the state of the hardware at the time of the instruction execution. Branches may also lead to other flows (indicated by diamonds).
Use of the KM11 indicated that the processor was definitely executing microinstructions, and seemed to be following the flow diagrams in the engineering drawings. This is excellent -- it indicates that a lot of the hardware is functional.
Curiously, left to its own devices the processor didn't seem to be executing microinstructions at all and was stuck at micro-address 200 octal. This is "ZAP.00" in the flow diagrams and is where the processor starts at power up or after a reset.
In the troubleshooting section of the 11/70 service docs (diagram on p. 5-16) it states:
IF LOAD ADRS DOES NOT WORK AND:
- RUN, MASTER & ALL DATA INDICATORS ARE ON
- uADRS = 200 (ZAP)
THEN MEMORY HAS LOST POWER
This seems to adequately describe the symptoms I was seeing -- there is power-fail hardware in the processor that forces the microcode address to 200 in the event that power is lost. But the AC LO and DC LO signals (which are what the power supply uses to tell the processor of such a failure) were all fine (after checking again, just to be sure). Also, if this were the case, I wouldn't expect the KM11 to be able to step the processor at all -- the power-fail hardware should force the processor's microcode address to 200 at all times until the power failure is resolved.
Probing the processor clock signal on the backplane with an oscilloscope revealed no clock signal at all, just a flat line. The clock signal is provided from one of three sources on the "TIG" (Timing Generator) board: normally it comes from a 33.3333 MHz clock crystal. While debugging with the KM11, it can come either from the MAINT STPR switch on the KM11, or from a special diagnostic RC clock network on the TIG board (the latter can be adjusted to a wide range of frequencies for margin testing). This lack of a clock signal was definitely an important clue.
Another oddity was revealed after a closer look at the service docs: In Chapter 4 of the Processor Manual, Section 4.1.3 it states:
"The third source of timing [the other two being the crystal clock and a diagnostic R/C network] is the manually-operated, single-step MAINT STPR switch S4, located on the maintenance card. This switch is only enabled when maintenance card switches S2 and S3 are both set to 1."
Section 4.2.3 confirms this:
"The maintenance card S2 and S1 switches are both set to 1 to allow single timing pulses to be generated by MAINT STPR switch S4.... Removing the S2 or S1 input conditions the MS EN flip-flop to be cleared."
What was interesting about the above is that on my system, switch S4 (MAINT STPR) stepped the processor with switches S1 and S2 set to any configuration. This being the case, I wondered if the logic that selects the clock source was faulty, and was always selecting the MAINT STPR input.
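To spell out the mismatch, here is a rough Python sketch comparing the enable condition as Section 4.2.3 words it (S1 and S2 both set to 1) against what my machine actually did. The function names are hypothetical, and this only models the behavior described above; it says nothing about how the TIG board implements the logic.

```python
from itertools import product


def stepper_enabled_documented(s1: bool, s2: bool) -> bool:
    """MAINT STPR enable per the manual quote: S1 and S2 both set to 1."""
    return s1 and s2


def stepper_enabled_observed(s1: bool, s2: bool) -> bool:
    """What this machine did: S4 stepped the processor in every setting."""
    return True


def find_mismatches():
    """Return every (S1, S2) setting where observation contradicts the manual."""
    mismatches = []
    for s1, s2 in product((False, True), repeat=2):
        if stepper_enabled_documented(s1, s2) != stepper_enabled_observed(s1, s2):
            mismatches.append((s1, s2))
    return mismatches


for s1, s2 in find_mismatches():
    print(f"S1={int(s1)} S2={int(s2)}: manual says S4 is disabled, but it stepped")
```

Three of the four switch combinations disagree with the documentation, which is what made the clock-select logic look suspect.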
Well, only one way to be sure, and this would require getting the TIG board out on an extender for some extensive probing. In doing so, I found that no clock signal was being generated by the 33.3333 MHz crystal at all; in fact, while I was probing it, one of the legs of the crystal fell right off. This is usually a sign of a faulty component.
So I placed an order on Digi-Key for a replacement.
But then I got impatient and remembered that the rusty burned-out hulk of a PDP-11/45 I picked up along with the 11/70 was in the garage, and the 11/45 also has a TIG board, very similar to the one in the 11/70, and also using a 33.3333 MHz crystal.
A short while later, the 11/70's TIG had a new, stolen, clock crystal:
[Photo: Where'd you get that shiny new crystal?]
And after reinstalling the TIG back in the backplane and powering up:
It's alive! A bit. With a working clock, the processor was able to respond to the front panel and I was able to load addresses and examine and deposit into memory. However, instructions would not execute -- loading an address and hitting "Start" on the front panel had no effect. More pressing: after the system warmed up for a minute or two, the "Load Address" switch on the front panel would stop working properly, and would always load "0" rather than what was in the front panel switches.
Still, good progress for just a few evenings of research and debugging (and conversing with people on cctalk for advice.) Over the next few days I started in on investigating these issues... which I'll talk about in my next exciting installment. Until then... go find something else to read.