Lexicon ARU
Early Lexicon reverbs contain an IC called the 'ARU' which contains the datapath required for a reverberation unit. Mine came from a PCM-60. The other chips used in the PCM60 are described here for background.
The MMU contains two counters, an adder, and an address multiplexor. The first counter is the program counter, and is clocked on the incoming transition to state 2 (/AS1 rising). This increments the program counter address (SA0..SA6). The second counter is the address pointer counter. A specialized DSP for reverberation does not need static variables or any nonsense like that. So to facilitate a delay (z^-1), a counter is used to cycle through the addresses. An offset is used to determine the length of a delay line, and this offset comes from the memory system. Anyways, the SEL line chooses between row or column address, MI0..MI15 provides the memory offset, and YA0..YA7 provides row and column addresses. Anyone competent at digital electronics should be able to see how the memory cycle timing is performed. Note that the 4416 DRAMs use YA1..YA6 for the second address... not YA0!!! This can bite you if you attempt to use a 4464 DRAM in place of the 4416s!!! Perhaps the MMU designers routed adder outputs 15 and 16 to YA0 and YA7, respectively, so either chip could be used? I dunno, haven't tried it.
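The pointer-plus-offset addressing described above can be sketched in a few lines of Python. This is only a model under assumptions of mine: a 16-bit adder (suggested by MI0..MI15), a pointer that counts down once per sample (the real direction is not something I have confirmed), and SEL high meaning the upper address half. The function names are hypothetical.

```python
ADDR_BITS = 16                      # assumption: MI0..MI15 implies a 16-bit adder
ADDR_MASK = (1 << ADDR_BITS) - 1


def effective_address(pointer: int, offset: int) -> int:
    """Adder stage: running pointer plus per-instruction offset, wrapping at 16 bits."""
    return (pointer + offset) & ADDR_MASK


def multiplexed_address(addr: int, sel: int) -> int:
    """Address multiplexor: put one 8-bit half of the address on YA0..YA7.

    Which SEL level selects which half is an assumption.
    """
    return (addr >> 8) & 0xFF if sel else addr & 0xFF


def run_delay(samples, length):
    """Delay line built only from a moving pointer and a fixed offset."""
    mem = {}                        # sparse stand-in for the DRAM
    pointer = 0
    out = []
    for x in samples:
        # Read the tap 'length' samples behind the write position.
        out.append(mem.get(effective_address(pointer, length), 0))
        # Write the new sample at the current pointer (offset 0).
        mem[effective_address(pointer, 0)] = x
        # Assumption: the pointer counter steps down once per sample.
        pointer = (pointer - 1) & ADDR_MASK
    return out


print(run_delay([1, 2, 3, 4, 5, 6], 3))
```

Because the pointer moves and the offsets stay fixed, no data is ever copied: every delay line in the program simply slides through memory together.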
The CMU consists of two registers. One register system is a 16-bit successive approximation register. This uses the (single) PCM-53 DAC as an A-D converter. Those chips are expensive, so one converter is used for both outputs and the input!!! The SAR, comparator, and DAC have a nonlinear settling time, so the program provides timing pulses for the sequence, as well as DAC and sample+hold timing. Clever. For writing, the CMU contains a DAC output register which is sent to the DAC. I need to look at the specific timing of this to determine if it is a simple transceiver or a latch.... I expect a latch. Possibly it's the same latch that performs the SAR function.
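For anyone who has not met a successive approximation converter before, here is a minimal Python sketch of the general technique: try each bit from the MSB down, and keep it if the DAC output does not exceed the input. The ideal DAC/comparator model, the unipolar coding, and the names `sar_convert` and `make_comparator` are my assumptions; this is not the CMU's actual timing or number format.

```python
def sar_convert(comparator, bits=16):
    """Successive approximation: one comparator decision per bit, MSB first.

    comparator(code) must return True when the analog input is at or above
    the DAC output produced by 'code'.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)   # tentatively set this bit
        if comparator(trial):       # DAC output still <= input: keep the bit
            code = trial
    return code


def make_comparator(vin, vref=1.0, bits=16):
    """Ideal DAC plus comparator (hypothetical model, unipolar coding assumed)."""
    def comparator(code):
        return vin >= vref * code / (1 << bits)
    return comparator


print(sar_convert(make_comparator(0.5)))
```

A 16-bit conversion needs 16 DAC settling periods, which is why the program has to hand out timing pulses for the sequence.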
The 'meat' of this page is the ARU. This chip is no longer in production, and in the interest of how things work, I've started looking at it carefully. Here's what I have so far:
/AS2 latches a holding register. There are four holding registers: R0, R1, R2, and R3 (R3 is normally used). WA0..1 and RA0..1 select which register is written to or read from, respectively. In almost all cases, if a register is written to, the RA and WA addresses are the same. In software, they would be used to store intermediate calculations or memory contents. These registers are located at the 'inlet' of the chip... the data bus. To store an intermediate calculation, you need to /XCLK the result out onto the data bus, then on the next cycle, write it to a register.
/XCLK latches the internal data bus onto the output (bus) register. This value will be placed on the bus when the MEMWR line is brought high. When MEMWR is low, the data bus is an input and may be latched into a register.
BCON3 and BCON0 control the state of the two internal data bus paths. BCON3 selects either a shifted-right version of the accumulator when low (the shift occurs on an MC rising edge, so it's one cycle behind), or a shifted-right version of an internal transfer bus when high. This data path goes to the 'B' input of a bank of the (simple) ALU. BCON0 selects either reading the holding registers (when low), or the accumulator (when high). This bus is fed to two registers. The first is the output register (the one clocked by /XCLK); the other is clocked by the MC edge with /AREG low. This second register is fed into the 'A' input of the ALU. The output of the ALU goes to an accumulator register, which is latched by the rising edge of /SC.
The inputs M0..M4 and /CI are used in the ALU. After doing a lot of looking at the coefficient ROM, I realized that only very few functions were actually used. Well, it turns out that the M4..0 and /CI bits are simply applied to an ALU consisting of an adder or data selector. Either the ALU will perform addition (A+B), subtraction (A-B), select A pass-through, select B pass-through, invert B, or all zeros. Only five combinations are used. Ok, so that simplifies it. So why six bits to do this? It's legacy. If you make an ALU out of 74S181 chips, it all works out! Magic. Not only the Data General Nova, and numerous other minicomputers.... but reverbs!!! In this case, what is a useful chart is this one: multiplier coefficient vs. program bit combination. The original PROM is barely readable as it just has a whole bunch of hex codes. But with some reading, and a lot of staring at state maps and timing diagrams and a 'scope, I think I have a lot of it figured out.
I have not fully decoded the red part there....
CI3..0 | Single Precision | DP, ACC, first cycle | DP, ACC, second cycle | DP, non-ACC, first cycle | DP, non-ACC, second cycle |
0000 | -0.875 | 0.46875 | -0.5 - acc | ||
0001 | -0.750 | 0.4375 | 0.21875 | ||
0010 | -0.625 | 0.40625 | |||
0011 | -0.500 | 0.3750 | 0.1875 | ||
0100 | -0.375 | 0.34375 | -0.5 - first cyc + acc | ||
0101 | -0.250 | 0.3125 | - first cyc + acc | 0.15625 | |
0110 | -0.125 | 0.28125 | -0.5 - first | ||
0111 | -0 | 0.25 | - first | 0.125 | |
1000 | +0.875 | 0.21875 | +0.5 + acc | ||
1001 | +0.750 | 0.1875 | 0.09375 | ||
1010 | +0.625 | 0.15625 | |||
1011 | +0.500 | 0.125 | 0.0625 | ||
1100 | +0.375 | 0.09375 | +0.5 + first + acc | first + 0.75 + acc | |
1101 | +0.250 | 0.0625 | first + acc | 0.03125 | first + 0.5 + acc |
1110 | +0.125 | 0.03125 | +0.5 + first | ||
1111 | +0 | 0 | first | 0 | first |
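The two fully decoded columns appear to follow a simple rule: the magnitude is the bitwise complement of the code bits, and in single precision CI3 acts as a sign bit (1 = positive). The Python sketch below encodes that reading. It is my inference from the table values only, not something taken from a datasheet, and the function names are hypothetical. The partly decoded columns are deliberately left out.

```python
def single_precision_coeff(ci: int) -> float:
    """Single-precision coefficient for a 4-bit CI3..0 code.

    Inferred rule: CI2..0 complemented gives the magnitude in eighths,
    CI3 set means positive, CI3 clear means negative.
    """
    magnitude = (~ci & 0b0111) / 8.0
    sign = 1.0 if ci & 0b1000 else -1.0
    return sign * magnitude


def dp_acc_first_cycle_coeff(ci: int) -> float:
    """'DP, ACC first cycle' coefficient: all four bits complemented, in 1/32 steps."""
    return (~ci & 0b1111) / 32.0


for code in range(16):
    print(f"{code:04b}  {single_precision_coeff(code):+.3f}  "
          f"{dp_acc_first_cycle_coeff(code):.5f}")
```

Complemented bits suggest the coefficient lines are simply active-low going into the multiplier, which would fit the rest of the control signals.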
A multiply cycle adds and shifts, but in reverse of how you do it by hand. This ALU multiplies the LS bit first, then shifts the result RIGHT. Then it adds the next most significant bit, and shifts. The first cycle of an instruction does the two least significant bits at once; following cycles multiply one bit at a time. The following table illustrates the multiplier pipeline. Note that each cycle consists of a single addition - no surprise, I suppose. There are a lot of side effects in the dual-precision mode in terms of what loads the accumulator and what doesn't. The original Lexicon code uses a few of these tricks. For example, the DP/ACC lines that ignore the ACC even though it is being commanded to add.
Instruction | Cycle | What happens, Single Precision ACC | What happens, DP - ACC |
ACC (DP) | 0 | reg = 0.125 * MEM + 0.250 * MEM | reg = 0.03125 * MEM + 0.0625 * MEM |
1 | reg += 0.500 * MEM | reg += 0.125 * MEM | |
2 | acc += reg * sign (+/- 1) | reg += 0.250 * MEM | |
ACC | 0 | reg += 0.500 * MEM | |
1 | acc += reg * sign (+/- 1) | ||
2 | empty |
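The add-then-shift-right scheme can be sketched in Python as follows. This is a simplified model under assumptions of mine: integer samples, one coefficient bit per loop iteration (the hardware folds the two LS bits into cycle 0), truncation on every shift, and the coefficient passed as plain magnitude bits rather than the raw CI code. The names `shift_add_multiply` and `mac` are hypothetical.

```python
def shift_add_multiply(mem: int, magnitude_bits: int, nbits: int = 3) -> int:
    """Multiply mem by magnitude_bits / 2**nbits, least significant bit first.

    Each step adds mem if the current coefficient bit is set, then shifts the
    partial result right by one. After nbits steps the LS bit has been shifted
    right nbits times, so it carries a weight of 1 / 2**nbits.
    """
    partial = 0
    for i in range(nbits):
        if (magnitude_bits >> i) & 1:
            partial += mem
        partial >>= 1               # arithmetic shift right (truncates)
    return partial


def mac(acc: int, mem: int, magnitude_bits: int, negative: bool = False) -> int:
    """Final step: add or subtract the product, as in 'acc += reg * sign'."""
    product = shift_add_multiply(mem, magnitude_bits)
    return acc - product if negative else acc + product


print(shift_add_multiply(1024, 0b101))          # 1024 * 0.625
print(mac(100, 1024, 0b110, negative=True))     # 100 - 1024 * 0.750
```

Shifting right instead of left means the partial sum never grows wider than the data word plus a carry, so the multiplier needs only one adder and one register.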
The /OVLD signal is used to indicate that the accumulator has overflowed.
DAB0..15 are obviously the data bus and connect to the DRAM and CMU.
The state machine is very well done. Designers had to really know what they were doing back then. The PROM U45 contains the instruction sequencing. There are delays used to remember double-precision mode for the next instruction cycle. Instructions take three clock cycles. Instruction fetch occurs on the entrance of state 0. Some of the state transitions are a bit cumbersome in today's terms but for the early 1980's this is amazing pipelining, considering the simplicity of the hardware itself. Single-precision ACC cycles are pipelined so the result is ready in cycle 1 of an instruction - the addition is clocked in cycle 0 of the next instruction.
Instruction execution on the PCM60 starts with an instruction latch (U44, U35) at the start of state 0, followed by a memory read cycle in state 1. At the end of state 0, the instruction code is also latched for the instruction decode PROM (U41 is the latch), and at the end of state 1, the decode latch (U42) feeds the new instruction code out.
I am assuming that this is correct... I think I'm making progress here.
126: rdadc w3 r3 tfr +0.000 wrdac 0x3ff2 ; read A-D converter
127: xc w3 r3 acc +0.000 wrdac 0x3fff ; latch data
000: rdmem w3 r3 dp l(+0.000) wrdac 8966 ; read output of predelay buffer.
001: wrmem w3 r3 acc h(+0.500) wrdac 9039 ; write new value of predelay buffer - latched by xc. Finish double multiply *0.5000.
002: rdmem w0 r0 dp l(A-0.40625) wrdac 16174 ; read diffusor output, multiply by diffusion constant, sum with input
003: wrmem xc w3 r3 ndp h(A-0.40625) wrdac 16381 ; write output of predelay buffer for other parts of the reverb loop. Finish first sum
004: wrb w3 r3 dpa l(A+0.40625) wrdac 0x3fff ; dummy write, multiply input of diffusor delay line by diffusion constant. latch input of diff delay
005: wrm xc w3 r0 ndp h(A+0.40625) wrdac 16244 ; write input of diffusor, finish multiply of diffusor output
The room reverb looks just like the Griesinger 'plate' reverb covered by the AES paper 'Effect Design - Part 1, Reverberator and other filters', except that there is one additional stage of diffusion and one additional stage of delay through each leg of the tank. Also, the placement of the damping low-pass filters is somewhat different. The input diffusion is done differently. Rather than having four cascaded input diffusors, there are two pairs of cascaded diffusors, each feeding one leg of the tank. Both are fed from the predelay line. The output tap summation uses a lot more taps, including some in the predelay.
Also of note - the PCM60 addressing system does not have a facility for modulation of delay taps, which makes the modulating delays impossible. Other Lexicon units can - the 'slave' processor can update the control store at will. The function of a separate slave processor is to do exactly that - update the control store at a moment's notice. But the PCM-60 uses a ROM control store without a control processor at all.
Note that a double precision multiply flushes the pipeline at the end, which is why it is used for the initial scalar multiply *0.5000. Also, double precision multiply offers accumulator clearing functions which are not present in the single-precision MAC instruction.
Turns out that the PCM70 is almost identical to this PCM60. Except that the processor (a Z80) can reload the program 'at will' which is useful for modifying parameters. The master Z80 contains the user interface. The slave processor (another Z80) contains the DSP microcode, LFO's, and a bunch of parameter updating functions. It also has some of the presets stored in it. The two EPROM's are literally full.
Looking at this design, and especially the timeframe they were done in, is a lot like looking at Mr. Wozniak's Apple II logic design, and the group coding disk controller card. Those were awesome designs and the way the old ARU was put together reminds me of what good design is.
The same architecture is used on the 224XL, 200, and the ARU IC's in the PCM60, PCM70, and the 480. The later Lexicons (300, PCM80, 81, 90, 91) use a new 'Lexichip', which looks to be a single-chip implementation of a PCM70 ARU with one additional state in the multiplier - a Lexichip uses four clocks per cycle, which makes the multiplier precision any one of 4-bit, 5-bit, 7-bit, or 8-bit, depending on the timing bits set in the microcode. The microcode for the Lexichip1 is very similar, though not identical, to the ARU system on the 480L. Some control bits are missing to make room for generating the raw ADC/DAC timing signals and memory bank selection. I don't think they significantly affect programming it, though. Note that the 224XL ARU is significantly different... yet not really. The 200, PCM60-70-480 ARU's are practically identical - the 480 uses an additional T-state per instruction, allowing one more multiplication, whereas the PCM-70 is an updated PCM-60. The 224XL allows two multiplier bits at once, and has a dedicated shift register rather than single registers with rewired inputs.
Different Lexicons used different program lengths. The 224 used 100 program steps, the PCM60 and 70 used 128 steps, the 480 uses 80 steps but with four processing cores, the 300 (using two Lexichips) used a 96 step program, and the PCM91, which uses two Lexichips as well, allows all 128 steps. A Lexichip-3 appears to offer 256 program steps per loop rather than the 128 steps for the older ARU-based units.
Another interesting thing marginally related to this is the AL3201 digital reverb. It follows many of these principles in hardware design; however, almost two decades of 'progress' has made transistors cheap enough that a fast (and cheap) 7-bit plus sign multiplier is very practical.... which is what the AL3201 does. There are fewer temporary registers (just enough, though, to help in making reverb logic like diffusors, comb filters, allpass filters, etc.). The 128-byte per sample code length is shared with the Lexicon 'verbs, although the control store on the AL3201 is in RAM - much like a PCM70. I may try running one of these chips when I get around to it.... when I have some time. Note that the added efficiency of the AL3201 is offset by the shorter data RAM page size, meaning that the DRAM needs to have instructions specifically added to refresh the dynamic memory used for storage. That makes the PCM-60 room algorithm take around 110 out of 128 instructions. Also, the audio data memory is stored in a floating-point format. Finally, it also contains four LFO's which would allow modulating delays to be implemented without processor intervention. It also offers a linear interpolation feature which allows LFO sample switching to be a bit smoother.
A diffusor in the AL3201 can be written more efficiently than in the Lexicon reverb. Two lines of code are required.
RAPB diffend k=-0.40625
WBP diffstart k=0.40625
The first line fetches the end of the diffusor, stores that in B. Simultaneously, it multiplies that by -0.40625 and adds it to the accumulator.
The next line stores the accumulator into the data memory, *then* takes the accumulator, multiplies it by 0.40625, and adds it to the B register, and stores that into the accumulator. Voila, instant diffusor in two lines.
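To check the arithmetic of those two lines, here is a small floating-point Python model of the diffusor as described above. The `deque` stands in for the delay memory between diffstart and diffend; the chip's real number format, rounding, and addressing are not modeled, and `make_diffusor` is a hypothetical name.

```python
from collections import deque


def make_diffusor(length: int, k: float = 0.40625):
    """Allpass diffusor following the two-instruction description.

    line[0] plays the role of 'diffend' (oldest sample); appending plays the
    role of writing to 'diffstart'.
    """
    line = deque([0.0] * length, maxlen=length)

    def step(acc: float) -> float:
        b = line[0]            # first line: fetch end of diffusor into B
        acc += -k * b          # ...and accumulate B * -k
        line.append(acc)       # second line: store accumulator into memory
        return k * acc + b     # ...then acc * k + B becomes the new accumulator

    return step


diffusor = make_diffusor(3)
impulse_response = [diffusor(x) for x in (1.0, 0.0, 0.0, 0.0)]
print(impulse_response)
```

With an impulse in, the first output is k and the output one delay length later is 1 - k*k, which is the usual allpass signature.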
Now, I'm not about to start distributing any Lexicon code... I only posted a small snippet that you see here - so don't go asking for any more of it. I've been in trouble for posting too much reverse engineered information before. Up to this point, it's educational.