HE! WLE TPACKARD JOURNALGUEST â€” Holland Signature Analysis Based Test System for ECL Logic, by Edward R. Holland and James L Robertson It runs at real-time clock rates and generates

M A R C H 1 9 8 2

H E ! W L E T P A C K A R D J O U R N A L

© Copr. 1949-1998 Hewlett-Packard Co.

H E W L E T T - P A C K A R D J O U R N A L T e c h n i c a l I n f o r m a t i o n f r o m t h e L a b o r a t o r i e s o f H e w l e t t - P a c k a r d C o m p a n y

MARCH 1982 Vo lume 33 â€¢ Number 3 Contents:

H i g h - P e r f o r m a n c e C o m p u t i n g w i t h D u a l A L U A r c h i t e c t u r e a n d E C L L o g i c , b y F r e d e r i c C. Amerson, Mark S. LJnsky, and El io A. Toschi This new high-end HP 3000 Computer is in

the one mi l l ion inst ruct ions per second c lass. D u a l A L U M i c r o m a c h i n e H a s P o w e r f u l D e v e l o p m e n t T o o l s , b y R i c h a r d D . M u r i l l o A s ingle l ine of microcode contro ls two para l le l processing uni ts . Power fu l D iagnos t i c Ph i l osophy Reduces Down t ime , by Dav id J . Ashkenas and R icha rd F . DeGabr ie le A cus tomer ' s compute r can be fu l l y d iagnosed w i thou t mak ing any t r i ps to

the site. A H i g h - P e r f o r m a n c e M e m o r y S y s t e m w i t h G r o w t h C a p a b i l i t y , b y K e n M . H o d o r a n d Ma lco lm E . Woodward H igh -speed con t ro l s to re , cache memory , and I /O bu f fe rs p rov ide

qu ick CPU access to needed da ta . An Inpu t /Ou tpu t Sys tem fo r a 1 -MIPS Compute r , by W. Gordon Matheson and J . Marcus S tewar t I /O adapte rs match mu l t ip le I /O buses to the h igh-speed cen t ra l sys tem bus . T h e A d v a n c e d T e r m i n a l P r o c e s s o r : A N e w T e r m i n a l I / O C o n t r o l l e r f o r t h e H P 3 0 0 0 , b y James E. Beetem I t ' s des igned to hand le up to 256 termina ls generat ing 4000 characters /

second wi th peaks to 20,000. GUEST â€” Holland Signature Analysis Based Test System for ECL Logic, by Edward R. Holland a n d J a m e s L R o b e r t s o n I t r u n s a t r e a l - t i m e c l o c k r a t e s a n d g e n e r a t e s t e s t v e c t o r s

algori thmical/y. Des ign ing fo r Testab i l i t y w i th GUEST, by Karen L . Meiner t The HP 3000 Ser ies 64 and i ts tester were des igned together . Packag ing t he HP 3000 Se r i es 64 , by Manmohan Koh l i and Bonn ie E . He lmso The goa l was a serv iceabi l i ty . package that maximizes re l iabi l i ty and serv iceabi l i ty .

In this Issue: I n t h e w o r l d o f c o m p u t e r s , f a s t e r i s g e n e r a l l y b e t t e r , e v e r y t h i n g e l s e b e i n g e q u a l .

I A computer that executes more instruct ions per second can process more data in a g iven amount of t ime, serve more users at the same t ime, and respond more quick ly to changes

I in its instruc- data. Again, everything else being equal, a computer that executes more instruc- _._â€ž_. computer, tions per second is a more powerful machine and therefore a "larger" computer, although it

Â¡r"'i5 machine, "=.7i5 may De physically smaller than some slower and less capable machine, and may cost less, - ^ : ~ h a s t o o , s i n c e o v e r t h e y e a r s a d v a n c i n g t e c h n o l o g y h a s s t e a d i l y g i v e n u s l a r g e r a n d l a r g e r â € ¢ ' ' â € ¢ ' * ' ^ c o m p u t e r s f o r t h e s a m e a m o u n t o f m o n e y .

The sub jec t o f th is month 's i ssue is Hewle t t -Packard 's la rges t computer , the HP 3000 Ser ies 64 . Large enough newest handle the entire data processing needs of a good-sized company, this newest member of HP's b u s i n e s s H P ' s f a m i l y h a s 2 1 / 2 t i m e s t h e p r o c e s s i n g p o w e r o f t h e H P 3 0 0 0 S e r i e s 4 4 , p r e v i o u s l y H P ' s la rges t , For near ly ten t imes the power o f the HP 3000 Ser ies 30 , the smal les t member o f the fami ly . For examp le , 64 . Se r i es 64 can have 144 t e rm ina l s a t t ached t o i t , wh i l e t he Se r i es 44 can accep t on l y 64 . The main para l le l fo r the Ser ies 64 's g reat ly improved per fo rmance are two: fas ter opera t ion and para l le l operat ion (doing more than one thing at a t ime.) Faster operat ion comes from the use of emitter coupled logic (ECL) , and of the fastest commerc ia l ly ava i lab le in tegrated c i rcu i t log ic fami l ies , and f rom some advanced m e m o r y u n i t s , P a r a l l e l o p e r a t i o n i s m a d e p o s s i b l e b y a p a i r o f a r i t h m e t i c l o g i c u n i t s , o r A L U s , t h a t share the calculating and decision making that are the basic functions of a computer. With its combination of high speed instructions in operation, the Series 64 can execute well over a mil l ion instructions per second (MIPS, in computer world- a very creditable number and a real bargain at the Series 64's pr ice, al though far from world- champ ionsh ip pe r fo rmance . (There a re severa l compute rs today in the 10 -15 MIPS c lass and a few tha t approach 50 MIPS.)

Since i t is an HP 3000, the Ser ies 64 can run programs wr i t ten for other HP 3000s. Because i t is a h ighly complex qual i f ies i ts designers went to great lengths to make i t re l iable and easy to serv ice, and i t qual i f ies f o r H P ' s t h i s g u a r a n t e e t h a t i t w i l l b e o p e r a t i o n a l a t l e a s t 9 9 % o f t h e t i m e . P i c t u r e d o n t h i s month's cover is the Series 64 system processing unit superimposed on a photograph of a printed circuit board loaded computer. ECL circuits. The board carries part of the ALU section of the computer.

-R. P. Do/an

E d i t o r . R i c h a r d P D o l a n â € ¢ A s s o c i a t e E d i t o r K e n n e t h A S h a w Â » A r t D i r e c t o r P h o t o g r a p h e r . A r v i d A D a n i e l s o n â € ¢ I l l u s t r a t o r . N a n c y S V a n d e r b i o o m A d m i n i s t r a t i v e S e r v i c e s . T y p o g r a p h y . A n n e S L o P r e s t i . S u s a n E W r i g h t â € ¢ E u r o p e a n P r o d u c t i o n S u p e r v i s o r . H e n k V a n L a m m e r e n

2 H E W L E T T - P A C K A R D J O U R N A L M A R C H 1 9 8 2 f f i H e w l e t t - P a c k a r d C o m p a n y 1 9 8 2 P r i n t e d i n U . S . A .


High-Performance Computing with Dual ALU Archi tecture and ECL Logic This largest and fastest HP 3000 Computer System can handle all of the data processing needs of many companies.

by Freder ic C. Amerson, Mark S. L insky, and El io A. Toschi

HEWLETT-PACKARD'S HP 3000 COMPUTER System family offers users a choice of compatible interac tive business systems of various sizes, prices, and

performance levels, all using HP's MPE (Multiprogram ming Executive) operating system. The new highest-per formance member of this family is the HP 3000 Series 64, Fig. 1. The Series 64 provides over 2.5 times the processing power of the Series 44, HP's previous performance leader. This new high-end machine now extends the performance of the HP 3000 Computer family into the one million instructions/second class. Although basically a 16-bit-word machine like other HP 3000s, it is capable of emulating a 32-bit-word machine in many applications.

The Series 64 is expected to be used in all types of EDP (electronic data processing) and distributed data process ing applications. As a stand-alone computer system, it can perform a wide range of tasks for a division of a large com pany and can handle all the EDP needs of a medium-size company. In a distributed processing environment, the Series 64 can serve either as a major node or as the central computer in distributed networks.

The increased system performance of the Series 64 comes primarily from faster operation and performing more calcu lations in parallel. The methods for achieving these results are the use of dual ALUs (arithmetic logic units), WCS (writable control store), cache memory, 32-bit memory ad dresses and ECL (emitter coupled logic) technology. Fig. 2 is a block diagram of the Series 64 CPU (central processing unit).

A dual ALU design is effective only if parallel operations can replace operations that were sequential and therefore took longer to complete. The rich instruction set of the HP 3000 lends itself to more parallelism than simpler instruc tion is The traditional method of achieving parallelism is pipelining. The design of the Series 64 CPU incorporates this traditional approach into the dual ALU design to achieve nearly twice the efficiency of a single ALU. Pipelin ing is a design organization that moves data to be processed through a sequence of hardware operations. A piece of data enters this pipe each clock cycle and proceeds through each stage or operation on successive clock cycles until it is completely processed.

F i g . 1 . T h e H P 3 0 0 0 S e r i e s 6 4 Compute r Sys tem can hand le a l l t h e d a t a p r o c e s s i n g n e e d s o f a m e d i u m - s i z e c o m p a n y o r a d i v i s i o n o f a l a r g e c o m p a n y . I n n e t works it can serve as a major node or as the central computer. I ts per f o r m a n c e i s i n t h e o n e m i l l i o n instruct ions/second class.

MARCH 1982 HEWLETT-PACKARD JOURNAL 3


To Cache Memory

Bit Address Bus

From Cache Memory

F i g . 2 . T h e H P 3 0 0 0 S e r i e s 6 4 cen t ra l p rocess ing un i t ach ieves h i g h p e r f o r m a n c e b y m e a n s o f dual arithmetic logic units, a cache m e m o r y , a t h r e e - r a n k p i p e l i n e d data path, and high-speed emit ter coupled log ic .

Writable control store provides faster operation than read-only memory (ROM) for storing and reading the mi crocode that implements the instruction sets of most mod ern computers. The static RAMs (random-access memories) used to design the WCS are much faster than ROMs of similar and compatible technologies.

For increased memory bandwidth, both cache memory and a wide memory bus are used. A cache, or small buffer memory, provides high-speed local memory for the CPU while interfacing to a larger but slower main store. The goal is to design the cache such that, most of the time, requests from the CPU for data or instructions will not require the cache to go to main memory to get the particular address needed. This depends very much on the software operating on the system and on the hardware organization of the cache. For the Series 64 running its expected job mix, the CPU cache, with 8K bytes of data storage, provides the CPU with about 95% of its requests in one clock cycle. The rest of the time accesses take longer because the information must come from main memory. However, the average memory access time is still less than two clock cycles.

The interface to the memory system for the cache and I/O system is a 32-bit-wide bus that operates at transfer rates as high as 18M bytes/second (limited by memory speed). The 16-bit I/O buses of the Series 64 are interfaced to this high speed central bus by input/output adapters, or lOAs, each of which has a 64-byte buffer. These buffers allow these I/O ports to match the slower 16-bit I/O buses to the higher- speed, 32-bit central system bus.

One of the highest-performance logic families in wide use is emitter coupled logic (ECL). In the Series 64, both 10k and 100k ECL techniques were selected for those design areas where the speed and functions available are well matched to the requirements of the design. STTL (Schottky transistor-transistor logic), another fast logic family, is used for the memory array and I/O interfaces. For STTL, 10k ECL, and 100k ECL, typical delays are 3, 2, and 1 nanoseconds per gate, respectively, and each technology offers functions not available in the others.

Design Object ives Although the main objective of the Series 64 was to ex

tend HP 3000 price/performance to a new high level, there were several other important design objectives. These in cluded software compatibility with existing MPE-based systems, I/O system compatibility with HP 3000 HP-IB peripherals, and availability of guaranteed uptime service (CUS).

The Series 64 is software compatible with previous HP 3000 family members. MPE, the Multiprogramming Execu tive, is the operating system for the HP 3000 product line. Object code compatibility is preserved such that any appli cation program written on other HP 3000s in any of six languagesâ€” COBOL, FORTRAN, Pascal, RPG, BASIC, and SPL â€” will run on the Series 64.

The I/O system is the same as that found in the Series 44 and other HP 3000s using the HP-IB. As in these other systems, the HP-IB communicates with the CPU and mem ory through a channel controller board, which is installed on the intermodule bus (1MB). In the Series 64, however, the memory and CPU are not on the 1MB. Communication is accomplished through an I/O adapter (IOA) which inter faces the 1MB to the central system bus (CSB). Multiple lOAs can be supported on the CSB (currently a maximum of two), significantly increasing the I/O bandwidth. This I/O system design provides customers with an upgrade path for their HP-IB peripherals, HP-IB device controllers, and channel controllers.

For GUS to be available on the Series 64, high reliability and supportability were important design goals. Guaran teed uptime service is Hewlett-Packard's money-back guarantee that the CPU, cache, main memory, and I/O sys tem including one or two system discs are operational at least 99% of the time. To be able to offer GUS, it is impera tive that the system have a high level of reliability, or a high mean time between failures (MTBF), and a high level of supportability, or a low mean time to repair (MTTR).

(cont inued on page 7) â€¢HP-IB 488-1978. Hewlett-Packard's implementation of IEEE Standard 488-1978.

4 H E W L E T T - P A C K A R D J O U R N A L M A R C H 1 9 8 2


Dual ALU Micromachine Has Powerful Development Tools

by Richard D. Muri l lo

The central processing uni t (CPU) of the HP 3000 Series 64 is microprogrammed to in terpret the dual ALU arch i tec ture o f th is machine. A single l ine of microcode controls the execution of two paral lel processing units. Each unit controls i ts own execution on subseq uent clocks and may specify the next l ine of microcode to be executed. The two processing uni ts are referred to as ALÃšA and ALUB. Each processor cons is ts o f an a r i thmet ic log ic un i t (ALU) and shi f ter , two source operand data paths, a target data pa th , l og i c t o pe r f o rm spec ia l f unc t i ons such as read ing and wr i t ing memory, and log ic to contro l the sequence of microcode execution.

The Series 64 microcode processor uses a three-stage pipel ine to execute microcode. The three stages are referred to as RANK1 . RANK2, and RANK3. During RANK1, source operands for the two ALUs a re c locked in to the reg is te rs tha t se rve as inpu t to the ALUs. During RANK2, the arithmetic operations are performed and the results are stored into the dest inat ion register. Special opera t ions and sk ip tests are a lso per formed dur ing RANK2. Memory reference operations init iated during RANK2 are performed during R A N K S .

Three speeds of jumps are bui l t into the Series 64 processor. A fast jump (unconditional jump) is taken from RANKI . The target of a fast jump is the next l ine of microcode to enter the pipel ine af ter the jump l ine . Med ium-speed jumps a re dependent on in fo rma t i on known a t t he beg inn ing o f RANK2 , and a re t aken du r i ng RANKS. In this case, the next sequent ial l ine enters the pipel ine before the target of the jump. The execution of this l ine is inhibited when i t enters RANK2. Slow jumps depend on the output informa t ion of the ALUs during RANKS. In this case, the next two sequen t ial during of microcode entering the pipel ine are inhibi ted during RANK2. Inhibi t ion of the RANK2 execut ion occurs when the NOP f l i p - f l o p i s s e t f o r t h e r e s p e c t i v e A L U . A c c o r d i n g l y , t h e t e r m NOPed means inhibition of RANK2 execution. This inhibition has an e f fec t on ly on the s to r ing o f i n fo rmat ion f rom the ALU and the spec ia l /sk ip func t ions . The ALU opera t ions a re s t i l l per fo rmed and the output o f the ALU may be used on subsequent l ines .

Th i s p i pe l i n i ng o f t he m ic rocode p rocesso r con t r i bu tes s i g n i f icant ly to the per formance of the Ser ies 64 CPU. The micro- p r o g r a m m e r , h o w e v e r , m u s t b e a w a r e o f t h e e f f e c t s o f t h e p ipe l ine on the execut ion of microcode. For example, new data s tored in to some regis ters cannot be accessed on the fo l lowing l i ne o f takes The s to re in to the reg is te r on the f i r s t l i ne takes place Â¡n RANK2, and the read on the second l ine takes place in RANKI , bo th dur ing the same c lock per iod . There fore , the new data is not c locked into the register unt i l af ter i t is read.

The microcode format f ield is divided into fourteen fields across two ALUs on a single l ine of microcode. These f ie lds include two i npu t CPU ope ra t i on f unc t i ons and s to re t a rge t , spec ia l CPU functions, and microcode skip condit ions. A subfunction f ield may be used to specify shif t ing of the ALUs or a target of a microcode jump. One of the input source f ields on either ALU can be used to s p e c i f y a s h o r t o r l o n g h e x a d e c i m a l l i t e r a l . B e c a u s e t h e m i c rocode i s Hu f fman-encoded , use o f the sub func t ion o r l i t e ra l options wil l disallow the use of some fields during the assembly of the microcode l ine. For example, specifying a long (16-bit) l i teral wil l disallow the use of special and skip f ields and one of the input source f ields. Development System

The HP 3000 Ser ies 64 microcode development system (MDS)

i s a n a n d s o f t w a r e t o o l u s e d f o r t h e o v e r a l l d e s i g n a n d development o f system and d iagnost ic microcode. The develop ment system inc ludes a microcode assembler and ed i tor , a sys tem (hardware level) s imulator, and a set of symbol ic debugging tools.

The m ic rocode deve lopmen t sys tem runs as an i n t e rac t i ve program under the HP 3000 Mul t iprogramming Execut ive (MPE) operating system. It has a complete set of commands for entering and saving microcode source f i les, edi t ing microcode f ie lds and source l ines, assembl ing microcode, and running a system simu l a t i o n ( s e e F i g . 1 ) . I t a l s o h a s a s e t o f d e b u g g i n g t o o l s a n d commands. These commands consist of one or more let ters and can be uppercase or lowercase. Mul t ip le commands on one l ine may be entered by separat ing them wi th semicolons. Many com m a n d s h a v e p a r a m e t e r s t h a t m a y b e e n t e r e d a s g e n e r a l e x p ress ions . The sys tem a lso p rov ides a me thod o f te rm ina t ing command execut ion .

The microcode development system prov ides a wide range of commands for work ing wi th microcode source f i les . These f i les are compat ib le w i th HP 3000 EDITOR source f i les . The sys tem provides the capabi l i ty of enter ing microcode source f i les into a work f i le, keeping the text in a permanent MPE fi le, and renumber ing the mic rocode tex t . Ed i t ing fea tu res o f the sys tem inc lude add ing and de le t ing o f tex t , modi fy ing , l i s t ing , and inser t ion o f microcode l ines, and the addi t ion of comments in to the text .

Us ing the BATCH command a l l ows the m ic rop rog rammer to a s s e m b l e s o u r c e m i c r o c o d e . T h i s a s s e m b l y p r o c e s s t a k e s a microcode source f i le as input and produces a microcode format-

Compose Microcode Program

Build Simulation Environment

Programs

Run Simulation

Debugging Tools

Assemble Microcode

Download Binary Microcode Using Minicartr idge as Transfer Medium

Run on System

Hardware

F i g . 1 . D e v e l o p m e n t s e q u e n c e u s i n g M D S , t h e H P 3 0 0 0 Ser ies 64 microcode deve lopment sys tem.



Microcode Register

Software Registers

. ESP

â€¢M RRC

RF 0000

PG FFFF CTP

PH :

Microcode Pipeline f : .

R A N K \ R ; Addresses

S P 4 R 0 0 0 0

004 0 OD4 0

D 5 P L

F L H G S 1 2 3 | 5 O R P P D B 1 0 C D

L i n t R D D

5 P ! B E P P ^ C C R

ted binary f i le . The microcode assembler a lso produces a l is t ing of the microcode text , a cross-reference map ( labels mapped to addresses) , a l i s t i ng o f wr i tab le con t ro l s to re (WCS) , wh ich i s t he l ayou t o f t he m ic rocode memory a rea i n b ina ry , and a l i s t ing o f the lookup tab les (LUT) , wh ich is a layout o f the sys tem m a c r o i n s t r u c t i o n - t o - m i c r o c o d e - i n s t r u c t i o n m a p p i n g . T h e a s sembler uses imbedded commands in the source f i le as opt ions to cont ro l these l i s t ings . There are a lso opt ions to pr in t top-o f - page head ings and the set t ing o f the cont ro l s tore address.

The LUT command a l lows the microprogrammer to set up the l o o k u p I t e n t r y f o r a p a r t i c u l a r s y s t e m m a c r o i n s t r u c t i o n . I t a l lows the p rogrammer to spec i fy the b inary fo rmat o f the mac ro ins t ruc t ion, the pread jus tment va lue o f the top-o f -s tack reg is ters , the d isp lacement va lue, and index ing and ind i rec t opt ions for memory re ference macro inst ruct ions.

Simulat ion Package The microcode development system has a s imulat ion package

t h a t a t h e m i c r o p r o g r a m m e r t o s i m u l a t e t h e e x e c u t i o n o f a microprogram so that i t can be checked out before the program is p laced in to the system contro l s tore. Program execut ion can be s imulated a s ingle l ine at a t ime or as a f ree-run execut ion.

Execut ion of a microprogram is accompl ished by enter ing the m ic rocode i n to a deve lopmen t sys tem work f i l e and us ing the EXECUTE command. The execut ion is d iv ided in to three types: s ing le- l ine (wa i t ) manual cont ro l execut ion , s ing le- l ine (pause) au tomat i c con t ro l execu t ion , and f ree - run execu t ion o f t he m i c roprogram.

S ing le-s tep execut ion o f a microprogram a l lows the micropro grammer to step through each l ine of microcode through the CPU pipel ine. The programmer can examine each step on the d isplay screen of a CRT terminal (Fig. 2) . The screen gives the program mer a v i sua l p i c tu re o f the CPU hardware buses and reg is te rs a f te r each l ine o f m ic rocode i s execu ted . The p rogrammer can a l so exam ine t he e f f ec t s o f each m ic rocode l i ne as i t passes through the ranks in the pipeline. In single- step mode, the screen is updated af ter each microcode l ine is executed ei ther manual ly (programmer h i ts the car r iage re turn key on the termina l ) or au tomat ica l ly (sys tem updates a f ter one second) .

T h e m i c r o p r o g r a m m e r c a n u s e t h e E X E C U T E c o m m a n d t o speci fy the control store address where microcode execut ion wi l l s ta r t , and can spec i f y the number o f s imu la ted c lock cyc les to excu te be fore te rmina t ing the s imu la t ion . The programmer can a lso execute the next overhead (macro inst ruct ion fetch/decode) m ic rocode l i ne fo r so f tware s imu la t ion and p r in t a t race o f the s imula t ion to a hard-copy dev ice .

The s imula tor conta ins a memory image area for load ing and execu t ing so f tware p rograms and da ta . Th is env i ronment a rea al lows the microprog rammer to simulate the execution of diagnos t i c p r o g r a m s o r s p e c i a l i n s t r u c t i o n t e s t r o u t i n e s . U s i n g t h e STORE/RESTORE commands, a microprogrammer can s tore an env i ronment as an MPE f i le o r res to re an env i ronment in to the s imu la to r . Th is debugg ing too l a l l ows the m ic roprogrammer to

"Ã¯ Simulated ? Microcode

â€¢} RANK2 Data Path Simulation

Fig . 2 . The programmer can s tep through a microcode program one l ine a t a t ime and observe the re sults on a CRT terminal. The screen is divided into two main areas, reg i s t e r d i s p l a y a n d s i m u l a t e d m i c r o c o d e d i s p l a y . R e g i s t e r c o n t e n t s a r e s h o w n i n h e x a d e c i m a l notat ion and inverse video is used to show di f ferent f lag states.

correct problems in microcode and restore test rout ines wi thout having to rebui ld the environment.

Debugging Tools The microprogrammer is a lso suppl ied wi th a complete set o f

commands fo r use in debugg ing the m ic rocode . In add i t i on to s i ng le - s tep execu t i on and env i ronmen t t es t r ou t i nes , t he p ro g rammer can se t and c l ea r m i c rocode b reakpo in t s i n con t ro l store, display and modify memory locat ions at speci f ied address es, and evaluate ar i thmetic expressions in three di f ferent bases: octa l , dec imal , and hexadecimal . The microprogrammer can set up to 32 different breakpoint addresses for tracing the path of the microcode and can ind icate a count for the number of t imes the breakpoint is executed before the break is taken. Memory can be d isp layed a t the termina l or pr in ted out on a hard-copy dev ice. The system also conta ins an express ion evaluator which a l lows the microprog rammer to examine and/or evaluate registers, con stants, and symbol ic microcode labels values in an expression.

The mic rocode deve lopment sys tem was used in the overa l l deve lopment o f the Ser ies 64 CPU des ign and for deve lopment and cod ing o f the HP 3000 mach ine ins t ruc t ion se t , sys tem mi c r o c o d e d i a g n o s t i c s , a n d c h a n n e l p r o g r a m m i c r o c o d e . T h i s powerful development tool was instrumental in the discovering of ha rdware / f i rmware des ign p rob lems in the ear l y s tages o f the p ro jec t and the debugg ing o f t he sys tem m ic rocode wh i l e the hardware was being bui l t . The system was also used for s imulat i n g m i c r o c o d e t e s t p r o g r a m s , f o r C P U s o f t w a r e d i a g n o s t i c rout ines, and for determining hardware design faul ts. I t is hoped t h a t t h e m i c r o c o d e d e v e l o p m e n t s y s t e m c a n b e e n h a n c e d o r modi f ied for fu ture system des igns and machine arch i tectures.

Acknowledgments The author wishes to thank Cher ie Kushner for major contr ibu

t ions to the overa l l des ign of the microcode s imulator and Mike Mason and Kevin McGrath for the i r technica l ass is tance.

Richard D. Muri l lo Rick Mur i l lo rece ived BS and MS degrees in computer sc ience f rom Cal i fornia State Universi ty at Chico Â¡n1974and 1977. Afterjoining HP in 1977 . he deve loped sys tem mi c rocode for two years , and s ince 1979 has been pro ject manager for HP 3000 Ser ies 64 diagnost ics and mic rocode. He 's a member o f the ACM and the IEEE Computer Soc i e ty and teaches bus iness data pro cess ing a t a l oca l commun i t y co l lege. Born in Oakland, California, he now l ives in Los Gatos, Cal i fornia and is in terested in photography,

theater , and o ld- t ime movies. tennis, music, jogging,



Des ign fo r Re l i ab i l i t y Increased reliability is the result of an improved inter

connect scheme, a highly effective cooling system, system design to protect against electromagnetic susceptibility, and careful component selection and qualification. Careful consideration of each of these areas was necessary to pro vide an optimal system design.

The large number of interconnections in the CPU cardcage demanded a dense, reliable, and cost-effective connector scheme. With a maximum of 8M bytes of memory in the system, there are over 8000 high-speed backplane connections and over 1000 for the frontplane. For added reliability, two-piece connectors were chosen over printed circuit board edge connectors for the backplane and frontplane, as well as for all frontplane connectors for ca bles. This choice also means that gold-plated printed circuit boards are not required, which results in a considerable cost savings.

The cooling design is very critical to the reliability of the system. The results are impressive. Even though the power dissipated in the CPU cardcage is three to eight times that found in other HP 3000 Computer Systems, the temperature rise for a system at room temperature is on the average, 50% lower than for these same systems. This reduction in tem perature rise inside the system leads to a much lower junc tion temperature for the components, which of course means higher reliability.

System reliability with respect to electromagnetic sus ceptibility requires that the system be immune to certain levels of electrostatic discharge (BSD), radiated and con ducted electrical waves, power line transient noise, and magnetic waves. The mainframe and power system design must be done at the system level to guarantee that the system is insensitive to these types of externally generated interference. Grounding and shielding of the Series 64 pro cessing unit minimizes the effects of static discharge. Im munity to power line transients is another important design objective, and the design of the Series 64 ac power system makes it resistant to injected transients of up to 1000V.

The selection and qualification of quality components plays an integral part in the design of a reliable system. All semiconductor devices used in the Series 64 are covered by Hewlett-Packard's general semiconductor specification, which defines those standards and requirements that must be met by suppliers and devices to meet minimum quality and reliability levels. Since ECL and 64K dynamic RAMs were critical to the success of the Series 64, extensive test ing was done to evaluate these components thoroughly.

Three main ECL families were tested: 10k ECL, 100k ECL, and 10k ECL RAMs. These groups were further subdivided into the processes used to fabricate the devices. A represen tative part was chosen based on complexity and use from each family and process for qualification. The qualification tests consisted of dynamic operating life at elevated tem peratures with parametic measurements recorded at inter mediate points from start to 1000 hours. This type of testing is concerned with stability and allows for trend analysis. Hewlett-Packard was then able to communicate to the ven dors any problems encountered with enough data so that proper action could be taken.

HP has developed much experience as an aggressive user

of semiconductor memories. Through the cooperation of the using divisions, a corporate specification was de veloped. It was also possible to establish efficient and effec tive evaluation and qualification procedures for 64K RAMs. Aggressive soft error rates were specified and numerous RAMs tested to determine the rates at the system level and for accelerated tests. A key goal is a rate of <1000 fits (failures per 109 hr) in system operation. The results were the early establishment of qualified suppliers and devices for Hewlett-Packard and the Series 64.

Assuring Supportabi l i ty The other aspect of system availability is supportability:

the length of time it takes to isolate a failure and fix it. Serviceability in the Series 64 is enhanced by a sophisti cated diagnostic control processor and comprehensive fault-locating diagnostics. The diagnostic control unit or DCU is a separate processor that has the capability to access CPU registers, monitor line voltages and system tempera tures, conduct self-tests, and log errors on the system. With extensive microdiagnostics, it can isolate hardware failures down to the functional board level. Since microcode is stored in RAM and not in ROM, there is no real limitation on the amount of storage available for microdiagnostics. It was therefore possible to develop tests that perform a hierarchi cal diagnosis of the hardware, allowing quicker and more accurate isolation of faults. As in other HP 3000 Computer Systems, remote diagnosis is also provided.

Dual ALU Design Other HP 3000s have used more than one arithmetic logic

unit (ALU) to perform the arithmetic functions necessary for the execution of instructions. However, these additional ALUs have been very special-purpose, allowing little flexi bility. In particular, one ALU always added together the index register and the displacement field of the instruction. When the instruction was not indexed, it added zero to the displacement. This logic then provided the sum to the microprogrammer whenever it was needed. Usually, the sum was needed only once during an instruction, so most of the time this ALU was performing needless work. In the ap proach taken by the Series 64, this second ALU is general- purpose, so that it is useful throughout the instruction. Thus the microprogrammer is allowed direct control of the function of this ALU, thereby greatly increasing its effi ciency.

Another approach to improving performance is adding processors to a system. Unfortunately, the second processor of a multiprocessor system usually requires considerable sophisticated software to achieve the hoped-for perfor mance. Many complicated interactions between processors can occur that are not problems in a uniprocessor system. Also, some functions do not behave as they do in a uni processor. For example, what does the second processor do when interrupts are disabled by the first processor? This must be dealt with in the software design. For another example, the function of disabling process switching no longer prohibits certain actions, as it does in a uniprocessor system, since there still may be more than one process running. Thus other more expensive mechanisms must be devised to control the interaction between software mod-



ules which might be running on different processors. In general, this results in the second and successive proces sors of a multiprocessor system being much less than 100% effective.

By putting the second processor under microcode (firmware) control rather than software control, perfor mance is achieved without complicated multiprocessor software. System performance benefits as if there were more than one processor, which is true, but the software does not see the added complexity; it is hidden in the microcode. The system benefits from the additional hardware without suffering the performance degradation of multiprocessor software.

Parallelism in microcode means that each instruction runs faster than in a system with parallelism in software. Thus instruction execution time is reduced. A stream of single instructions benefits from the additional hardware as much as multiple timeshare jobs. In a multiprocessor sys tem, this performance benefit is not realized until there is enough work to be done in parallel to keep all of the proces sors busy. The richness of the HP 3000 instruction set pro vides the basis for a contribution in parallelism. In a simpler instruction set, there may not be enough work to keep two parallel ALUs busy. The general bounds checking of the HP 3000 and powerful instructions like procedure calls benefit from the power of two ALUs. In fact, because of some other minor improvements, a procedure call on the Series 64 takes fewer than half the cycles of previous HP 3000 sys tems.

A wide microcode word makes options available to the microprogrammer that were not available on earlier sys tems. In the past, only certain registers could be used to store addresses that were also sent to the memory system. That has been expanded to allow any register to be used to save the address being sent to memory.

One of the interesting anomalies of ECL logic is the rela tionship between read-only memory parts (ROMs) and random-access memories (RAMs). In past HP 3000 systems, the microprogram has been stored in ROM because this was more cost-effective. However, for ECL, it is more cost- effective to use RAM, or writable control store (WCS). The flexibility obtained makes it easy to update the microcode when bugs are found or performance enhancements made. The capability of loading microdiagnostics and executing them at the customer site in the WCS gives the Series 64 the best fault-locating diagnostics available on any of Hewlett-Packard's systems today.

The design approach is dubbed dual ALU even though much more than the ALUs are paired. When the decision was made to make the second ALU general-purpose, it was necessary to provide support for it so that it would be as useful as the first. Original estimates for the efficiency of this second ALU were in the vicinity of seventy percent, but with the right support, the actual efficiency has been well over eighty-five percent.

ALU Capabi l i t ies Each of the two ALUs has access to a subset of the regis

ters available in the processor. Some are available as one source to each of the ALUs while others may be a source operand to either input of both ALUs. Most registers can be

altered by one or the other ALU, but not both. The exception to this is the top-of-stack cache registers (TOS) which are available to both ALUs as either operand and may be altered by both ALUs. This makes these registers extremely valu able as scratchpad registers because they are useful for passing data between the ALUs. Unfortunately their gener ality also makes them expensive to implement, so there are only eight of them. The output of either ALU may be passed to its own input or that of the other ALU; this is also a frequent method of exchanging data between the ALUs. Scratchpads and software environment registers may be altered by one of the ALUs and read by one or both of them. In the case of the environment registers, this is particularly useful since it allows both the upper and lower limits of an address to be checked in a single microcycle. Each ALU has a bank of 512 registers that are available only to it. By pairing these registers it is possible to read, store, and oper ate on thirty-two bit data, even though the ALUs themselves are only sixteen-bit units.

Either ALU may specify any of a variety of special op tions. These perform such useful functions as setting condi tion codes on data, checking for operands that may be in the TOS registers, setting and clearing flags, and controlling the loop counter. Memory reads and writes are also initiated in these fields.

Each ALU may independently skip the execution of its half of the following line of microcode by use of its skip field. This very powerful technique keeps both of the ALUs busy executing useful code as much of the time as possible, since it is necessary to skip the effect of only half the line of microcode rather than the entire line. Also, either ALU may specify transfer of control to another point in the microcode by a conditional or unconditional jump instruction.

Tight control of the two ALUs is maintained by keeping them in lock step with a single microaddress register. When either ALU specifies a jump, both ALUs must jump to the new location. Thus, synchronization problems that would occur if each could be executing independently are elimi nated. Although the control of the microsequencer is more complex than it might be for a single ALU, it is far less complex than the mechanism that would be required to keep two independent streams of instructions from causing total chaos. It is impossible to get the ALUs out of syn chronization since they are both controlled by the same microcode word. The control store is addressed by a single register. It may be modified by either ALU, but there is only one register. By looking at only one line of microcode it is possible to determine precisely what each ALU will do.

Since it is possible for both ALUs to specify a jump on the same line of microcode, a priority mechanism is necessary to resolve conflicts. One approach would require that the microcode never specify two jumps that could both be taken on a single line. This would give a straightforward method of determining the next line of microcode, but would re strict the programmer and limit flexibility. A more power ful approach assigns a priority mechanism if both ALUs specify a jump at the same time. If neither of the jump conditions is met, execution continues in sequence. If only one of the conditions is met, execution transfers to the location specified by that ALU. But, if both conditions are met, then one of the ALUs has priority over the other. This



structure makes it possible to perform a multiway branch in a single line of microcode. By coupling with the skip condi tions of the preceding line, a three-way branch can be per formed on four different conditions. An insightful coder can use this for tremendous leverage.

Simultaneous access to the cache memory is perceived by the ALUs although the cache interface has only a single port. It is possible for either or both ALUs to specify a memory read or write on any line of microcode. The cache memory, however, will accept only a single request from the processor in any cycle. To reconcile the CPU to the cache, a strict priority mechanism sorts out all accesses and sends them to the cache in priority order. Thus, if both ALUs request a memory access at the same time, the lower- priority one will have its request queued until the higher- priority one completes. In general, no more than one re quest to memory is needed for each cycle in the processor. but it is not always convenient to specify the requests on separate cycles. The priority scheme, like that of the jumps, allows greater flexibility to provide increased performance.

Accesses to the cache that cannot be satisfied im mediately are deferred in hardware so that until the data is actually needed the microprocessor does not stop and wait. There is no speed advantage to allowing both ALUs to specify a memory access on the same line if processing halts until both requests can be sent to the cache. It is therefore necessary to buffer the requests in a holding register and allow the processor to continue execution. Only when more requests are made than can be buffered must processing be suspended to wait for completion. Up to three requests can be in process without overflowing the buffer.

The two sixteen-bit ALUs can be linked together to per form 32-bit calculations. In cases where it is useful to per form more than sixteen-bit calculations, a special option in the microcode, called LINK, ties both of the ALUs together to perform 32-bit arithmetic. There are many instructions in the HP 3000 architecture that benefit from this capability. It is particularly useful in some of the multiply and divide instructions which are simplified by not having to piece together so many sixteen-bit partial results.

ECL Logic Design Considerat ions As mentioned earlier, the obvious reason for using emit

ter coupled logic (ECL) is its high speed, but other advan tages come from ECL's low-voltage logic swings. These advantages are low noise induced into power supplies, low radiated noise, and low crosstalk. However, these small logic voltage swings also result in low noise margins. Noise margin is reduced by temperature differentials between drivers and receivers, voltage differences caused by power distribution, voltage noise on power buses and power planes, and signal voltage ringing because of impedance mismatch.

A temperature differential between driver and receiver causes loss of noise margin if the receiver does not track the temperature of the driver. The Series 64 limits the tempera ture differential across a printed circuit board and from board to board by using fast-moving air from a common plenum chamber to cool the printed circuit boards. For some signals, limiting temperature was not quite enough; for these signals, differential drivers and receivers are used

with a common reference voltage for each driver-receiver pair.

ECL operates with a potential difference of 5.2 volts for 10k ECL and 4.5 volts for 100k ECL. The most positive voltage is called Vcc. Any voltage difference between Vccs in the system can subtract directly from the noise margins. The Series 64 makes Vccthe common rail or signal ground. The CPU printed circuit boards, the frontplane. and the backplane use two thick copper planes close to the signal plane to distribute ground. Ground connections are distrib uted along the board connectors to the frontplane and the backplane, reducing the effects of voltage differences be tween various parts of the system. YEE. the most negative potential across the ECL ICs, is at -5.2 volts. VEE is very heavily bypassed to ground on each board by placing a capacitor next to every other 1C on the board. VEE is distrib uted to the CPU by two thick copper planes on the backplane and one thick copper plane on each printed circuit board. Changes in VEE reduce noise margins by twenty-five percent of the change in VEE. VTT (termination potential, the return for the termination resistors) is distrib uted to the CPU like VEE. It has much less effect on noise margins than VEE.

The Series 64 CPU uses transmission line techniques with closely controlled characteristic impedance on all sig nal nets to maintain signal integrity and speed. In TTL designs, it is possible for the signal to propagate up and down the signal path three times before the signal is stable. Signal paths with very long stubs (a branch off the main path) will cause reflections, thus making propagation delay longer than necessary. At higher bit rates and faster edge speeds these reflections may combine to cause loss of noise immunity. TTL wiring techniques use an input diode clamp built into the 1C to reduce the amplitude of the undershoot or ringing. ECL techniques approach the prob lem by matching the characteristic impedance, thus con trolling reflections. The Series 64 uses microstrip lines for controlled characteristic impedance. A microstrip line is a conductor strip or trace separated from a ground plane of conductive material by a dielectric. The characteristic im pedance (Z0) can be controlled by arriving at a balance between the trace geometries, board materials, board thick ness, and power consumption. To reduce signal ringing and maintain signal integrity, the termination of the line must match the Z0 of the printed circuit board. ECL is

Stub Length

Typical Layout for TTL

VTT Typical Layout

for ECL

Fig . 3 . Compar ing ECL and TTL des ign techn iques .



designed to drive lines of 50 ohms or greater. The Series 64 uses 68 ohms as the nominal value of termination resistors except for special cases. Even in these cases, controlled impedance rules are followed.

A thorough understanding of signal propagation is very important in ECL board design. Trace layout must be done in an orderly fashion. The designer must have complete control of all traces so that stubs do not occur. A good reporting scheme for the actual board layout is necessary, so that timing can be accurately simulated for analysis.

Control of Clock Skew A fast cycle time requires tight control of clock skew and

system synchronization. The Series 64 has a cycle time of 75 ns. The clock skew was held to 5 ns worst-case. This is not very significant in itself except that no special manufactur ing adjustments or considerations are necessary. This was assured by using equal traces from the point where clocks are generated to the point where clocks are received, by using low-propagation-delay 10k ECL and MECL III parts, and by distributing clocks to each board in the system by complementary pairs. The clock distributed to each board has a period of 37.5 ns. Each board receives a sync signal from the DCU (diagnostic control unit) that locks in-phase a divide-by-two circuit on each board. The absence of the sync signal stops clock generation on that board. The 75-ns clock derived on each board is then distributed to all ICs on the board that require a clock. Each clock has no more than four loads. A bonus of distributing the clock in this manner is that we are able to generate four phases of 18.75 ns for minor cycles in the system. These are used mostly in the memory and I/O systems.

There are certain parts of the CPU that require close clock-to-clock tolerance. In these cases clock pairs are cho sen such that the clock signals requiring tight tolerance are driven from the same 1C. Distributing differentially driven clock signals removes skew caused by temperature and voltage changes between boards.

Part i t ioning and Propagat ion Delay Efficient use of each clock means performing as many

logical functions in each cycle as practical, and in the ideal case having the same propagation delay for all paths in the system. No path should have to wait for any other path. This is where propagation delay becomes an important factor in influencing partitioning of the circuitry, even more than logical functions. ECL works best when signal paths flow from one point to one or more receivers, as shown in Fig. 3. Stubs must be short and lines should be distributed such that the distance between loads is greater than 5 cm. This allows the line to approximate a distributed capacitive loading rather than a lumped capacitance which would cause reflections. The Series 64 CPU accomplishes this by bit-slicing the data path of the system into four bits for ALÃšA and four bits for ALUB on each of four printed circuit boards, called RALU boards. This enables each individual bit line to be completely contained on one printed circuit board. It is much easier to adhere to ECL rules if each line is contained on one board. The four RALUs contain the data paths for RANKl and RANK2 (see Fig. 2). The data signals used to carrys and shifts are routed across the frontplane to

other RALUs. The frontplane is an added signal plane that ties six of the eleven CPU boards together.

Higher Data Path Performance Higher performance and increased functionality are ob

tained by using high-speed 100k logic in the main data path of the CPU. The use of 100k logic decreases the propagation delay through the 1C to one- half of what it would be for 10k ECL. Additional care had to be taken in board layout be cause of the increased rise times and reduced voltage swings experienced with 100k ECL. The RANKl and RANK2

Frederic C. Amerson Rick Amerson received his BEE degree f rom Georgia Inst i tu te of Technology and jo ined HP in 1972. He designed a plot ter inter face for the HP 3000, d id CPU design for the HP 3000 Ser ies I I and Ser ies 64 , and was p ro jec t man ager fo r por t ions o f the Ser ies 64 de s ign. He 's now a pro jec t manager wi th HP's Computer Systems Div is ion. A resident of Santa Clara, Cal i fornia, Rich is t reasurer of his church, an instrument-rated pr ivate pi lot , and a member of the Internat ional Brotherhood of Magic ians. He enjoys cooking and downhi l l sk i ing.

Elio A. Toschi El io Toschi has been des ign ing computer hardware for HP since 1965. He has served as a des ign eng ineer and as project leader for var ious I /O, communicat ions, and memory systems inc luding those of the HP 3000 Ser ies 64. Before coming to HP, he designed avionics equipment for eight years. El io was born in San Jose, Cal i forn ia and received his BSEE degree in 1957 from San Jose Sta te Univers i ty . He 's mar r ied, has two chi ldren, loves ski ing of any var iety, and st i l l l ives in San Jose.

Mark S. Linsky A native of Gloucester, Massachusetts, Mark L insky received h is BSEE and MSEE degrees f rom Massachuset ts In st i tute of Technology in 1972 and 1973. He jo ined HP in 1973, des igned a handheld-calcu lator power system, cont r ibuted to the des ign of the HP 3000 Ser ies 64 memory system, and served as pro ject manager for the Series 64 memory and I /O systems. In 1976 he received his MBA degree from the Univers i ty of Santa Clara. One patent â€” on a battery recharger â€” has resulted from his work. Mark is married, has a daughter, and l ives in Saratoga,

Cal i fo rn ia . He keeps busy p lay ing sof tba l l and racquet spor ts and working on his home in Saratoga and his vacat ion home near Lake Tahoe.

1 0 H E W L E T T - P A C K A R D J O U R N A L M A R C H 1 9 8 2


data path is implemented in 100k logic, except for the store. The store path. L'BL'S. uses 10k logic. Since 10k logic should not be driven by 100k logic because of their different threshold voltages, the CPU uses quad line drivers to inter face between 100k and 10k logic.

The increased speed and functionality of 100k ECL is used by the main data path of the ALU, the R and S operand registers, the R and S multiplexers and the carry lookahead circuit. The major advantages in functionality are in the multiplexers and the carry lookahead ICs. where a two-for- one reduction in parts count is achieved. The ALU ICs perform eight logic operations and eight arithmetic opera tions. In addition to performing binary arithmetic, the cir cuit contains the necessary correction logic to perform BCD

addition and subtraction. The ability to do decimal arithme tic is not available in standard 10k ECL without extensive correction circuitry, which increases critical path delay.

Acknowledgments The success of the Series 64 was the result of the hard

work of many people. We especially wish to thank Bob Horst, Bill Bryg, and Bill Putzier for their contributions to the design of the CPU, Dave Cantrell for his efforts in build ing and debugging CPU hardware. Ed Miller for his design of the power system and EMC compatibility, Jim Brannan and Andre Shaari for their help in designing quality into the system, and Peter Rosenbladt for his leadership in getting the product to market.

Powerful Diagnostic Philosophy Reduces Downtime by David J . Ashkenas and Richard F. DeGabr ie le

AN IMPORTANT ATTRIBUTE of any computer sys tem is availability. Because we rely on computers so heavily to increase our productivity, it is vital that

they function properly all the time. From an availability standpoint, an ideal computer is one that never fails. Since this goal is difficult to attain, the next best computer is one that rarely fails and can be quickly repaired when it does.

A key factor to be considered in designing such a machine is that it is generally difficult to isolate a faulty component but it is usually easy to replace it. One method of solving this problem is to increase the size of the smallest field-replaceable unit. With fewer subassemblies to con sider, the process of determining which one to replace becomes easier. If this approach is carried to its logical conclusion, the entire computer can be made one unit. For economic and other practical reasons, however, the small est field-replaceable unit of the HP 3000 Series 64 is a printed circuit assembly or a power supply. An exception is the main memory RAM integrated circuits, which can be individually replaced.

Because of the tremendous complexity of the Series 64 (there are over 3000 ICs in the CPU alone), conventional methods of fault isolation, such as board swapping, are both time-consuming and expensive. Therefore, an innovative failure diagnosis philosophy was formulated and adhered to throughout the design and development of the computer. The resultant system is easy to use and provides for remote diagnosis and board-level isolation of mainframe failures.

The field diagnostics used in previous HP 3000 Comput ers do not diagnose; rather, they test and then simply halt if a failure occurs. The customer engineer (CE) must then correlate the particular test that failed with a specific circuit

function being tested, determine which board performs that function, and replace it. This requires that the CE be famil iar enough with the diagnostic to know precisely which circuit function it was testing. In addition, the CE must be familiar with the machine under test at the detailed block diagram level to know how that circuit function is par titioned over the set of boards in the system. Without this detailed knowledge, troubleshooting is reduced to swap ping boards on a best-guess basis until the failing board is isolated. It is expensive to train the CE to know the machine at the detailed block diagram level. However, board swap ping is also costly; it takes time and large spare parts inven tories must be carried.

To reduce repair time and inventory costs and to increase field productivity, the Series 64 diagnostics have been de signed to indicate which board(s) in the system could con tain the fault. To help achieve this objective, the diagnostics are written in microcode to exercise maximum isolation and control of the hardware being tested.

The fault-locating diagnostics are packaged so that they obtain failure information while requiring no special train ing or knowledge to use. In fact, the diagnostics are de signed to be run either on-site by the customer or remotely from a field office so that the CE has an understanding of which replacement boards may be needed before traveling to the computer site. This is possible because the diagnos tics themselves identify suspected boards in order of fault probability (see Fig. 1). Hence no special training or knowl edge of the hardware is necessary to interpret the test re sults.

All of the functions available at the system console are also available at a remote console via a modem and tele-

MARCH 1982 HEWLETT-PACKARD JOURNAL 1 1


D i a g n o s t i c M e n u - P r e s s t h e c o r r e s p o n d i n g

fi - KER'MICR -- Run Kernel and all Mi crod i agnost i es .

f2 - RLL MICR -- Run all Mi crod I agnost i cs (Section 1-5).

f3 - MEMORY -- Run Mi cr od 1 agnost i cs 'Section -4-5).

- I - ' O - - R u n M i c r o d i a g n o s t i c s ' S e c t i o n 5 ) .

fS - WCS PRRT I -- Hddresses 0100H - 13FFH

f 6 - WCS PflRT II - Rddresses OOOOH - 12FFH

f7 - IOMRP -- Pun Mi cr odiagnost i c IOMRP

fS - RESTHRT -- Rerun currently loaded Ml cr od i agnost I c

K E R N E L D I R G N O S T I C R E V I S I O N f l - 2 1 2 3 T E S T I N G T I M E â € ¢ 4 M I N . P H G E O N E C O M P L E T E D P R G E T N O C O M P L E T E D P H G E T H R E E C O M P L E T E D P R G E F O U R C O M P L E T E D E N D O F K E R N E L D I R G N O S T I C

F R U L T L O C R T I N G M I C R O D I H G N O S T I C R E V R - 2 1 2 6 S E C T I O N 0 0 0 1 L O R D I N G S E C T I O N 0 0 0 1 E X E C U T I N G â € ” T E S T I N G T I M E 0 0 0 1 0 S E C O N D S S E C T I O N 0 0 0 1 C O M P L E T E D , 0 0 1 9 2 P R S S E S S E C T I O N 0 0 0 2 L O R D I N G S E C T I O N 0 0 0 2 E X E C U T I N G - - T E S T I N G T I M E 0 0 0 4 0 S E C O N D S M I C R O D I H G N O S T I C F R U L T - - T E S T N U M B E R 0 5 9 . 3 , P H S S C O U N T S U S P E C T E D H S S E M B L I E S I N O R D E R O F F R U L T R f i N K : 1 CTLB

2 RHL2

3 SKSP

F ig . 1 . Fau l t - l oca t i ng d i agnos t i c s menu and a t yp i ca l t es t resul t showing a successfu l d iagnosis .

phone line. In addition to normal console capabilities, a CE can load any diagnostic, monitor the ac and dc power sys tems, and run the fault-location diagnostics without leav ing the field office. Hence a customer's computer can be fully diagnosed before any trips to the site have been made.

Diagnostic Control Unit The most powerful diagnostic tool of the Series 64, the

diagnostic control unit or DCU, is completely built into the CPU. The DCU is a Z80 microprocessor-based board that provides the sole interface between the user and the main

frame (see Fig. 2). For the first time on an HP 3000, there are no separate front-panel controls and maintenance panels. All these functions can be executed from the system con sole, which is connected directly to the DCU via a standard RS-232-C serial data channel. The ROM-based program for the microprocessor configures the DCU and the console to allow maintenance, diagnostic, and system operator func tions to be performed while remaining transparent to the user.

The DCU has access to all other printed circuit boards in the CPU/MEM cardcage through the use of serial shift strings. Each board is designed such that many of its state- determining elements (e.g. registers, flags, flip-flops, etc.) are implemented using shift register ICs. These elements are connected to form a large circular shift register known as a shift string. There is one string for each board. The DCU can freeze the machine, shift data out from a selected board for examination and/or modification, and then return the shift string to the board (see Fig. 3).

The DCU also controls the clocks for the Series 64 system. It can clock the entire system or any subset of its boards from 1 to 255 contiguous clocks. The shift string and clock con trol capabilities of the DCU form the basis of a number of system control and diagnostic functions, including: load ing and reading the WCS (writable control store) and mem ory, system initialization, maintenance panel (i.e., register and flag) displays, and initial microprogram loading (power-up and cold-load sequences).

When not performing any of the above tasks, the DCU operates the power system control (PSC) board, an interface between the DCU and the power system of the Series 64. It is via the PSC that the DCU can monitor the ac line voltage, dc power supply voltages and currents, and other parameters. This information can be displayed on both the system con sole and a light-emitting diode display on the PSC. Hence the CE can use the DCU to identify potential problems in the power system.

Diagnostic Tests An equally important component of the Series 64 diag

nostic philosophy is the set of diagnostic tests used to verify the proper operation of all mainframe hardware and to

Diagnostic Control Unit (DCU)

2 7 Sync Lines

Printed Circuit Boards in CPU/MEM Cardcage

Shift Strings 27

Power System Control (PSC)

Ã ̄

Power System

F i g . 2 . D i a g n o s t i c c o n t r o l u n i t b lock diagram. The DCU provides t h e s o l e i n t e r f a c e b e t w e e n t h e user and the mainframe.

12 HEWLETT-PACKARD JOURNAL MARCH 1982


D i a g n o s t i c C o n t r o l U n i t (DCU)

V B u s

Shift String

Multiplexer

T o Z S O 8-Bit Shift

Register

Â».: Â»!=.Â». ,- i iÂ»i:. iEr, Â«tÂ»iÂ»Â«Â« ,sÂ»ii f f o i E o a g i o

detect and isolate faults when they occur. The diagnostics are designed to test the computer in a hierarchical fashion, with three distinct levels of diagnosis (see Fig. 4).

This hierarchy is designed to take maximum advantage of the built-in diagnostic features of the Series 64 to provide board-level isolation of failures. The procedure for testing the computer involves applying the lowest-level tests first, and then working upward through the heirarchy, adding more known good hardware in small increments. Thus when a in fails, the faulty circuitry is implicitly isolated in most cases, since all hardware required to run lower-level tests has already been checked.

Level 1â€” Kernel Hardware Verification The first level of testing is designed to check the essential

core, or kernel, of hardware that is required to perform all subsequent tests. Level 1 is composed of two separate diagnostics: the DCU self-test and the kernel hardware diagnostic.

The DCU self-test is a program stored in ROM on the DCU. Initiated on power-up or via a command typed on the con sole, it verifies most of the DCU and PSC hardware and provides a pass/fail indication on both the console and a group of LEDs on the DCU board.

The kernel hardware diagnostic verifies that portion of the CPU hardware needed to load and run the next level of tests, the fault-locating microdiagnostics. The kernel hardware diagnostic consists of a set of DCU commands that are stored on flexible disc and loaded into the DCU via the system console for execution. In this diagnostic, fundamen tal CPU operations (microsequencing, basic microinstruc tion decoding, and main data paths) are checked out. Essen tially this is a single-cycle test that is both applied and observed by the DCU. The DCU forces lines of microcode

F i g . 3 . E a c h C P U o r m e m o r y board is des igned such that most of its fl ip-flops, flags, and registers form a large c i rcular shi f t register or "shi f t str ing." This is a diagram o f a s h i f t s t r i n g a n d t h e c o r r e s p o n d i n g c o n s o l e d i s p l a y . T h e s t r ing is b roken up in to f ie lds de termined by the des igner .

into the CPU, clocks the system, and then checks the results. In this manner, the basic CPU kernel is verified by an inde pendent processor.

Level 2 â€” Fault-Locating Microdiagnostics Fault-locating microdiagnostics perform the bulk of the

testing done on the Series 64 mainframe. They are used to isolate faults down to the board level. Primarily designed for field use, these diagnostics will, upon a test failure, give the user a list of boards that could contain the fault. Usually this list will contain four or fewer boards.

Like the kernel hardware diagnostic, these tests are stored on flexible disc and are loaded by the DCU. Unlike the kernel diagnostic, they are written in CPU microcode. The DCU's job is to transfer the tests to WCS, initiate them, and interpret the results. The advantage of testing in microcode is that hardware can be tested in small increments. Because the microdiagnostics are quite large, they are broken into five sections and the hardware is tested one subsystem at a time.

To assure board-level isolation, testing is done on a func tional circuit basis. A functional circuit is defined as several gates connected to implement a given logical function. With the knowledge of which functional circuit is under test and how it is partitioned across the boards in the sys tem, it becomes possible to specify which set of boards must contain the fault. This information is embedded within each test. When a failure occurs, the DCU extracts this data from the microcode and displays the suspected printed circuit assemblies in order of fault rank.

The DCU is also used to enhance the isolation capability of the fault-locating microdiagnostics. Some tests use the DCU to access hardware circuits that are otherwise inacces sible to the microcode. This is accomplished by embedding



DCU Self -Test

Kernel Hardware Diagnostics

Section 1 : Basic CPU Checks

Sect ion 2: CPU & Pre l iminary Cache Checks

Sect ion 3 : F ina l CPU & Cache Checks

Sect ion 4: Memory

Sect ion 5: IOB/ IMBI , GICs, IOMAP, I /O Loopback

CPU Sof tware Diagnost ic Sect ions 1-5

Main Memory Diagnost ic

I /O System Diagnost ics (Per ipherals, Channels, IOB/ IMBI Exerciser)

Level 1 Kernel Hardware

Verification

Level 2 Fault-Locating

Microdiagnostics

Level 3 Software

Diagnostics

Fig. 4. The diagnostics test the computer in hierarchical fashion. Success ive tests invo lve increas ing amounts of hardware.

DCU requests in the microdiagnostic. Upon receiving a request, the DCU freezes the system, retrieves a command from a designated register, and restarts the microprogram after the command has been completed.

Level 3 â€” Software Diagnostics The first two levels of diagnostic tests do a thorough

check of all the CPU hardware. However, they do not rigor ously test all the CPU instructions, nor do they exercise all of main memory, nor do they verify the proper operation of all the I/O cards and peripherals. To test all this hardware, many software diagnostics have been developed or adapted from other HP 3000 Computers. These are written in higher-level languages rather than microcode.

The CPU software diagnostics perform an extensive test of all the CPU machine instructions, expecially those unique to the Series 64. These tests do not isolate faults to the board level, but rather give a pass/fail indication.

The main memory diagnostic tests the main memory arrays as well as error correction and logging circuitry. Any array failures are isolated to the faulty ICs, which can be replaced in the field.

The I/O system diagnostics are test programs that have been adapted from those written for the HP 3000 Series 33, 40, and 44. They check the entire I/O system of the Series 64, including all boards that can be plugged into the I/O card cage. There are also diagnostics fora number of tapes, discs, and printers that can be connected to the mainframe.

All software diagnostics are stored on magnetic tape and are loaded and run by using the diagnostic utility system (DUS), a simplified operating system. The role of the DCU in these tests is very small. These diagnostics are intended for use by the CE rather than the customer and require some specialized knowledge to configure and execute.

One Hour to Repair This combination of diagnostic hardware and microcode

allows unprecedented supportability in the field. An indi cation of the success of this effort has been the achievement of one of the primary design goals of the Series 64 : an MTTR (mean time to repair) of less than one hour. Hence the Series 64 may be regarded as a computer with very high availability.

Acknowledgments The authors wish to call special attention to the contribu

tions of Kevin McGrath, who laid the foundations for the diagnostic philosophy and was instrumental in the de velopment of the microdiagnostics. Special thanks are also due W. Grant Grovenburg for his work on the initial design of the DCU, Norm Galassi, Randy Jones, and Tom Graham for their ability to develop 56K bytes of DCU firmware against all odds, and Mehraban Jam for his outstanding work on the final design of the PSC.

Richa rd F . DeGabr i e l e R ick DeGabr ie le is a nat ive o f Sac ramento, Cal i forn ia and at tended Cal i fornia State Polytechnic Universi ty (San Lu is Obispo) where he rece ived

Â« the BSET degree in 1976. He joined HP Ã¯ that year and was the diagnostic group

pro ject leader for the HP 3000 Ser ies 64. Rick recent ly t ransferred to HP's Rosevi l le Divis ion to work in the 2250 system manufactur ing group. He is marr ied and l ives in Ci t rus Heights, Cal i fornia. He enjoys water ski ing, foot ba l l , and basebal l , and is restor ing a 1966 Corvette.

David J . Ashkenas David Ashkenas was a summer intern at HP's Stanford Park Div is ion in 1977 whi le s tudy ing for the BSEE degree at Stanford Univers i ty . He received that degree in 1 978 and joined HP full t ime. He was responsible for the initial design of the HP 3000 Series 64 power system cont ro l ler and the hardware des ign of the d iagnost ic contro l un i t . In 1979 he rece ived h is MSEE degree f rom Stan ford . Dav id was born in I thaca, New York and ra ised in Pasadena, Ca l i fo r nia. He l ives in Sunnyvale, Cal i fornia

â€¢v and is interested in sailing, international t ravel , and I ta l ian sports cars.



A High-Performance Memory System with Growth Capabil i ty by Ken M. Hodor and Malcolm E. Woodward

IN THE BEGINNING, computers consisted of a simple central processing unit with a few registers, some mem ory, and an input output system (Fig. 1). As technology

advanced, the central processing unit grew and acquired more registers and the RAM (random-access memory) size grew as new higher-density RAMs became available.

The overriding goal of the central processor became high speed, and the overriding goal of RAM design became low cost. As RAM became a larger portion of the cost of a computer system, cheaper RAMs were developed. More registers were used to fill the need for fast access to data.

As the conflict between memory and CPU grew, there evolved the cache memory. This memory contains a small subset of the information in the main memory array, but its speed of access is much closer to what the central processor requires. However, the cache integrated circuits are rela tively expensive compared to the integrated circuits used in the main memory array.

While this was going on, computer input/output systems were developing a need for larger and larger buffer storage areas. I/O buffers attempt to match the speed of the I/O devices with the speed of the main memory array. As the speed of the CPU has increased, the demand for more I/O memory has also increased.

This is where the HP 3000 Series 64 is today. Fig. 2 is a basic block diagram of the system. The basic modules are the central processor and cache memory, the input/output adapters (IOA), and the memory module (MEM). These modules communicate with one another over the central system bus (CSB).

The central system bus is a synchronous, general- purpose, 14-MHz bus. Data transfers to and from memory are on a 16-byte block basis. Addresses, data, and messages share the same 32-bit path with two parity bits. Transfer rates as high as 56 megabytes per second can be achieved, although the current memory modules limit this to 18 megabytes per second.

Basic Information Transfer The basic information transfer on the CSB is either to read

data from memory or to write information to memory. To do

C P U R e g i s t e r s

a write to memory, a module sends the address followed by the block of data to be written to memory. This takes a total of five clock cycles to go across the CSB, one clock cycle for the address and four for the four 32-bit data words.

For a read operation, data can be in the memory module or in any of the caches in the system. For a read cycle a module sends an address out to memory via the CSB and every module looks to see if it has the most current data at that address. Only the copy held by the last module to access the data will be flagged as valid.

If memory has the only copy of the information, the mem ory module provides it to the requesting module. If one of the caches on the CSB has a more up-to-date copy of the data requested, this module must give it up and supply it to the requesting module. A module is said to abort the memory request and become memory, supplying the data. This fea ture increases the effective memory bandwidth, thus in creasing the system performance.

The CSB is also used for sending messages. These mes sages may be originated by the CPU or by one of the system modules. Messages are used by the CPU to request the status of the various modules and to control their operations. Messages are used by the system modules to report status changes and to reply to CPU requests. One example of a message is the CPU's requesting the memory module to report memory size. Another example of a message transac tion is the I/O adapter's reporting that a device is requesting service.

Messages are also used extensively by the microdiagnostics. The diagnostics use messages to set up conditions for various test cases. They also use them in the same manner that the system would during normal operations.

Memory Module Fig. 3 is a block diagram of the memory module. To read

from memory an address is sent over the CSB and through the common bus interface (CBI), through the buffers on the memory module and out to the main memory arrays (MMAs). Each MMA responds to an address range. The

Fig . 1 . S imp le computer b lock d iagram.

I /O Expansion

Fig. 2. HP 3000 Ser ies 64 block diagram showing cache and I /O buf fe r memor ies used to match modules tha t opera te a t d i f ferent speeds.



Main Memory

Array

1 to 8 megabytes

Error Correction/ Detection

Common Bus

Interface

C e n t r a l S y s t e m B u s

F i g . 3 . M e m o r y m o d u l e b l o c k d i a g r a m .

addresses are generated by adders on the arrays. Each MMA added to the system adds the array size coming from the board next to it to the amount of memory available on its board. This allows MMAs with various memory sizes to be mixed within a system. The memory address goes out to all of the MMAs and the MMA that has the correct address range will respond with the data. The data will come out of the array in four 39-bit words, each consisting of 32 bits of data and seven bits for error detection and correction.

As the information goes through the error correction and detection logic, single-bit errors are corrected and logged in the error-logging RAMs. Double-bit errors are not corrected, but are sent back to the requesting module as they came from the MMA. An error condition is sent to the requesting module with the bad word. To get to the requesting module they must go through the CBI.

For a write to memory the information comes through the CBI and into the buffer. The RAMs cannot handle the in formation as fast as the CSB can deliver it, so buffering is used to handle the mismatch in speed. For a write the address is sent to the MMAs followed by the four 39-bit words, 32 bits of data and seven syndrome bits for error correction and detection. The seven syndrome bits are gen erated in the error correction and detection circuitry and sent to the MMAs.

The RAMs are dynamic and must be refreshed periodi cally, so there is refresh logic within the memory module. A refresh takes priority over an access of memory. If a refresh is needed it can hold off the reading or writing of data in the MMAs. Thus the amount of time to access memory is vari able. Refresh has a minimal effect on the effective memory bandwidth.

The Series 64 has automatic power-fail/auto-restart capa bility. If the power to the machine should fail, all of the

information in the caches will be sent to memory within 5 milliseconds. Memory has battery backup power, which is supplied to the refresh logic, RAMs, and other necessary circuitry to keep the information alive in memory. When the power returns, the system automatically resumes where it left off.

Messages sent to memory are used for determining mem ory size, to respond to various error conditions, for diagnos tic purposes to check out various paths, and to find out if any read of memory has caused a single-bit error and which RAM caused the error condition.

Cache Module The cache module consists of the cache memory array

(CMA) and the cache array controller (CAC) printed circuit boards (see Fig. 4). The cache serves two functions. First, it is a high-speed buffer between the CPU and the main mem ory module. Second the cache is a communications path between the CPU and all of the other system modules.

There are five data paths to and from the cache and one between the CAC and CMA. These are: â€¢ The processor data bus (PDB) â€¢ The cache address bus (CAB) â€¢ The address and data bus (ADATA) â€¢ The data bus (DATA) â€¢ The intercache bus (ICB) â€¢ The cache data bus (CDB).

The processor data bus provides data to the cache mem ory array and commands to the cache array controller. The cache address bus provides the cache with address informa tion from the CPU and messages from the CPU to other system modules. The ADATA bus provides the cache with addresses and data from the central system bus. The cache data bus provides the CPU with data from the cache and the central system bus. The DATA bus provides addresses and data to the central system bus from the cache. The intercache bus provides an address path between the CAC and the CMA.

The cache memory array consists of 8K bytes of high speed ECL RAM which serves as a high-speed buffer be-

Cache Address Bus (CAB) Central Processing

U n i t ( C P U )

Cache Data Bus ( C D B )

' Processor Data Bus (PDB)

Cache Array Controller

(CAC)

Intercache Bus (ICB)

Data Bus (DATA)

Cache Memory A r r a y ( C M A )

Address and Data Bus (ADATA)

Common Bus Interface (CBI)

Central System Bus (CSB)

F i g . 4 . C a c h e m o d u l e b l o c k d i a g r a m .



Uveen main memory and the CPU. The array is organized into two data sets. Each data set contains 4K bytes of data arranged into 32-bit words. The CMA also provides the data path for incoming messages from the system modules to the CPU.

The cache array controller contains the control logic for the cache. The CAC keeps track of what data is contained in the cache memory array. When a memory request is issued by the CPU or any of the system modules, the CAC deter mines whether the CMA contains the requested informa tion. If it does, the CAC takes the necessary action to supply the data to the requester. If the CMA doesn't contain the requested data, the actions taken by the CAC depend upon the requester. If the requester is another system module, no action is taken by the cache array controller. If the requester is the CPU, the CAC tells the CPU that the data is not available and initiates a memory request for the memory block (16 bytes) that contains the requested data. When the cache memory array receives the requested block, the CAC instructs the CMA to supply the requested data to the CPU via the cache data bus and informs the CPU that the re quested data is available.

Bus Contro l Commands Communications between the CPU and other system

modules, including the cache, take the form of messages. The CPU initiates these messages by sending bus control (BUSC) commands to the cache array controller. BUSC commands are used in normal system operations and in diagnostic operations. When the CPU issues a BUSC com mand, the CAC determines whether the message is to be sent to another system module or to be acted upon by the cache. The messages the cache acts upon are: â€¢ Read cache status â€¢ Write cache status â€¢ Write tag data â€¢ Verify tag data â€¢ Read incoming message.

The send word message, SNDWRD, is sent to the ad dressed system module via the central system bus. The SNDWRD message is addressed to main memory, one of the I/O modules, or the cache. The messages that are sent to the other system modules are used to: â€¢ Determine system configuration â€¢ Report status changes of modules to the CPU â€¢ Read error conditions and the memory error log â€¢ Perform diagnostic exercising.

The cache may interrupt the CPU in two ways. The first is a message interrupt. This interrupt occurs when the cache array controller detects an incoming message from one of the system modules. The CAC responds to an incoming message by issuing the MSGINT signal to the CPU and hold ing the message word until the CPU has had time to read it. MSGINT causes the CPU to branch to the proper place in microcode and read the incoming message. The CPU re sponds to the message interrupt by issuing BUSC com mands to read the incoming message.

The second interrupt is an error condition (CACHE ER ROR) which is generated by the CAC when a catastrophic error has been detected in the cache memory array data or in the tag and status information relating to the CMA data. The

response to a CACHE ERROR interrupt is a system halt.

I /O Adapter The LO adapter, which interfaces the Series 64 to an I O

bay. has a small cache used as a buffer. The LO cache is structured as a four-set associative memory. Each set is one block deep. The article on page 18 goes into more detail.

The LO adapter also uses the central system bus for the communications with the other modules. This module also accesses the CSB via a common bus interface.

Acknowledgments The authors would like to acknowledge the contributions

made by Rick Amerson. Mark Linsky, Karen Meinert. Stan ley Anderson, Jim Socha. Charles Andrews, Skip LaFetra. John Figueroa, Paul Rogers, Anthony Boresch and Fred Bird to this part of the Series 64 project.

Ken M. Hodor Born and raised in Hammond, Indiana, Ken Hodor rece ived h is BSE degree from Purdue University and joined HP in 1973. He has designed calculator ICs. electronics for a laser interferometer, and the cen t ra l sys tem bus and mem ory for the HP 3000 Series 64. Ken is a scuba d i ve r , a home compu te r hob by is t , a cyc l is t and a swimmer. He's marr ied, has two daughters, l ives in Sunnyvale, Cal i fornia, and is current ly

A helping introduce microcomputers at a local elementary school.

Malcolm E. Woodward Woody Woodward s tar ted h is HP career in 1972 as a technic ian on the 2100 Computer manufactur ing l ine. He soon moved into the R&D lab as a tech n ic ian and la te r became a des ign en gineer, contributing to the design of the HP 300 and HP 3000 Series 64 Comput ers, Before coming to HP, he served in the U.S. Marine Corps for over six years as an electronics technician, instructor, and techn ica l superv isor . Born in On tar io , Oregon, Woody is marr ied, has four chi ldren, and l ives in San Jose, California, His hobbies include amateur radio, automobi le mechanics, and pis-

r i f le shoot ing. He serves as amateur radio emergency coor- for the Ci ty of San Jose and several county agencies.

tol and dinator



An Input /Output System for a 1-MIPS Computer by W. Gordon Matheson and J . Marcus Stewar t

WHEN THE HIGH-SPEED CPU and central system bus for the HP 3000 Series 64 were first con ceived, considerable attention was given to the

goals and design approach for the I/O system hardware. What emerged was a system that is compatible with earlier systems, yet provides for expansion and future growth.

On the HP 3000 Series 30, 33, 40, and 44 Computer Systems, all I/O channels, the CPU, and main memory communicate on the intermodule bus (1MB). The 1MB is an asynchronous TTL-technology backplane bus that will support up to 15 I/O ports, depending on total length and configuration (see system diagram, Fig. 1, and 1MB sum mary in Table 1). Previously there have been two types of I/O ports available for the 1MB: the 31262A General I/O Channel, which connects HP-IB (IEEE 488) devices to the system, and the 31642A Asynchronous Data Communica tion Channel for RS-232-C terminals and modems. Many kinds of peripherals attach to these two channels, including disc memories, magnetic tapes, printers, printer interfaces, laser printers, and CRT terminals.

The input/output hardware of the HP 300 Computer is similar to that of the HP 3000 family members mentioned above.1

Ser ies 64 I /O System Goals Since the HP 3000 Series 64 is a member of a family, and

since the goal was to support many of the same peripherals that are available for the other family members, the strategy for the I/O system was to retain the 1MB as the primary attachment of I/O to the system if it could be made compatible with other system requirements. This allows an easier upgrade path for users, who can use many of the same cables, I/O channels, and some IMB-compatible device controllers that are used in earlier systems. Of course, it also allowed HP to implement a system that had been proven

IMS-Based Computer System (HP 3000 Ser ies 40 /44)

in other HP computers, and to minimize software driver development, hardware design, and control program algo rithm development. Other goals of the HP 3000 Series 64 I/O system were improved I/O throughput and support of larger numbers of peripherals than on the next fastest family member, the Series 44.

To help achieve these goals and assure that the higher CPU performance is augmented by higher I/O performance, three major improvements have been made: 1 . The new 301 44 A Advanced Terminal Processor (ATP) is

used for datacomm interfacing (see article, page 22). This eliminates much of the CPU intervention required by previous products for datacomm transfers. It allows attachment of up to 96 datacomm connections per chan nel on the intermodule bus. The ATP has an extensive DMA (direct memory access) facility, whereas the older 31264A Asynchronous Data Communication Channel required channel program intervention by the processor for every byte transferred.

2. A round-robin approach was adopted for servicing the channel programs for devices on the general I/O chan nels. This is a rotating priority scheme that gives equal service to all devices. With large system configurations having large numbers of high- use discs, round-robin servicing eliminates the lockout of lower-priority de vices by higher-priority devices.

3. Provisions for multiple I/O buses. The throughput of the central system bus is maximized if several intermodule buses are able to attach to the CSB and perform DMA simultaneously. The throughput of the 1MB is much lower than that of the CSB, and the CPU averages a small portion of the maximum bandwidth of the CSB, so mul tiple IMBs can provide enhanced DMA throughput and allow for attachment of larger numbers of I/O channels than could be supported by a single 1MB.

F i g . 1 . H P 3 0 0 0 f a m i l y i n p u t / o u t p u t s y s t e m d i a g r a m . A l l 1 1 0 c h a n n e l s , t h e C P U , a n d m a i n m e m o r y c o m m u n i c a t e o n t h e i n termodule bus (1MB).



While the LO subsystem for the HP 3000 Series 64 is designed for optimized attachment to the 1MB. it was de veloped under a consistent set of system rules for I/O opera tions and hardware interface. Non-IMB types of I/O hardware may be attached to the system if the need arises, and the HP 3000 Series 64 will be able to keep pace with future I O system developments.

A final and extremely important goal was to make I/O hardware problems easy to diagnose accurately and fast to repair. This means that it must be possible to check each level of hardware interface quite thoroughly by diagnostics before extending testing to levels farther out. For example, an untested logical block on an I/O channel could cause a channel failure to be diagnosed as a device failure. To achieve the goal of diagnosability , a significant part of each hardware component consists of diagnostic features that allow on-board loopback of data and control bits, checking of error detection logic, and simulating opera tions that normally involve the next level of hardware.

I /O Adapters The CSB was developed for very high-speed data trans

fers and synchronized operation. Its implementation was necessary to achieve the memory and CPU performance goals for the system. Unfortunately, the CSB can't be used for direct attachment of input/output channels. The main difficulty is that it is necessary to restrict the physical length of the bus to maintain the capability of high transfer rates; otherwise, it would take too long for a signal to propa gate from one end of the bus to the other. Physically, the CSB is less than 15 cm long. This places severe restrictions on the number of printed circuit assemblies that can be plugged into the bus. For this and other reasons, a separate I/O bus is required for the HP 3000 Series 64, and the 1MB was chosen to do the job. All I/O channels (general I/O channels, advanced terminal processors, intelligent net work processors, etc.) plug into an IMB-type backplane in the I/O cardcage, and an I/O adapter has been developed to provide for communication between the two buses.

An I/O adapter, in general, is any interface between the CSB, which supports the CPU and main memory, and another bus that supports I/O channels and devices. On the HP 3000 Series 64, the present I/O bus is the 1MB-, but some other bus could be used so long as the proper I/O adapter is provided.

As an integral part of the Series 64, the I/O adapter is required to perform the following functions: â€¢ Electrical translations. The CSB is an ECL-level bus. If the

I/O bus uses other logic levels (the 1MB, for example, is a TTL bus), it is up to the I/O adapter to provide translation between the two.

â€¢ Command and service request translations. On the CSB, all commands and requests are sent in the form of mes sages. An I/O bus may have a similar scheme, or it may use separate lines for service requests and for issuing commands, as the 1MB does. In either case, the I/O adapt er must translate requests for service on the I/O bus, whatever their form, to messages addressed to the CPU, and messages from the CPU must be properly translated to commands on the I/O bus that use the proper hand shaking protocol.

â€¢ Memory buffer. All memory transactions on the CSB take place in 16-byte chunks in the form of four contiguous 32-bit words transferred in four successive clock cycles. If the I/O bus uses any other size as its minimum address able data block (on the 1MB, it is one 16-bit word), the I/O adapter must provide some sort of buffer to convert one size to the other.

â€¢ Synchronization to system clock. The CSB is a fully syn chronous bus. The I/O bus may be either asychronous as the 1MB is, or synchronized to another clock. In either case, it is up to the I/O adapter to synchronize the I/O bus to the CSB. The Series 64 system currently supports up to two I/O

adapters , making it possible to increase significantly the I/O bandwidth of the system, since two separate I/O buses can transfer data independently at the same time.

Interfacing the 1MB to the Series 64 The 1MB I/O adapter for the Series 64 consists of three

printed circuit assemblies and the associated interconnect ing cables. The three printed circuit assemblies are the common bus interface (CBI), the I/O buffer (IOB), and the 1MB interface (IMBI) assemblies. They are connected as shown in Fig. 2. The IOB and CBI both reside in the main (CPU) cardcage and the IMBI resides in the 1MB cardcage. The common bus interface is the universal assembly that interfaces modules on the central system bus to the CSB itself. Communication signals between the I/O buffer and the common bus interface travel along the backplane and through a very short 2-cm frontplane cable. The I/O buffer is connected to the intermodule bus interface with two 1.5-m transmission line cables. Each of these cables carries 30

Table I Intermodule Bus (1MB) and Central System Bus (CSB)

Characteristics



C S B Central System Bus

ECL Level Signals

Common Bus Interface

(CBI) I /O Buffer

(IOB) T T L

TTL Level Signals

1MB

J C L TTL

Transmission Cables

Intermodule Bus

Fig. 2. The I/O adapter interfaces the Series 64 to the 1MB. I t consists of three pr inted c i rcu i t assembl ies.

signal and 60 associated ground conductors, providing bal anced 92-ohm paths for the TTL-level interface signals be tween the IOB and IMBI.

In the 1MB I/O adapter, signal level translation between the central system bus and the intermodule bus is done by the I/O buffer. Referring again to Fig. 2, the broken line across the IOB indicates the boundary between ECL-level signals and TTL-level signals. Except for the level trans lators, the lOB's internal logic is implemented with ECL devices.

Another general requirement of an I/O adapter is the translation of the sixteen-byte memory transfers on the CSB to whatever size the I/O bus uses. The 1MB works with one sixteen-bit word at a time, and the IOB takes care of the buffering between the two. The IOB uses a four-set, fully associative cache with a two-cycle (150 nanosecond) access time as the memory buffer. "Fully associative" means that the IOB can store copies of up to four sixteen-byte blocks from main memory at the same time, and any of the four blocks can be associated with any arbitrary memory ad dress. In terms of performance, this means the IOB can support up to four simultaneous high-speed direct memory accesses (DMAs) at the same time without any thrashing (excessive swapping) in the cache. Besides acting as mem ory to the intermodule bus, the IOB supports central system bus memory activity by checking addresses on the CSB and providing any blocks requested for which it has a valid copy.

Still another requirement of the I/O adapter is translation of commands between the CSB and the I/O bus (see Fig. 3). The 1MB I/O adapter's specific task is to translate channel program service and interrupt requests on the 1MB to mes sages to the CPU, and to translate messages from the CPU into global or addressed commands on the 1MB. Most of this is done by the IMBI, with the IOB acting as a level translator and traffic controller. When a request line on the 1MB is asserted, the 1MB I/O adapter formulates a 32-bit unsolicited message and transmits it to the CPU, then disables further messages until they are reenabled by commands from the CPU. 1MB and I/O adapter commands are sent as a single 32-bit message from the CPU. When the I/O adapter com pletes the handshaking of the command it formulates a 32-bit response message containing 16 bits of status (parity

error on command, 1MB handshake timeout, etc.) and the contents of the 1MB data bus when the command execution is completed.

One of the biggest challenges for the 1MB I/O adapter is that of synchronization. Since the 1MB is a totally asyn chronous bus and the CSB is totally synchronous, all trans actions on the 1MB have to be synchronized with the Series 64's master clock before any interaction with the CSB can take place. All of this is done by the IMBI, which is itself synchronized with the system clock. What makes this task challenging is the high speed of the Series 64's master clock: the cycle time is only 75 ns. The normal method for synchronizing any signal is to run it through a series of D-type flip-flops, with each successive stage minimizing the chance of an unstable output at the end of the chain. Normally, the higher the clock speed, the more levels of synchronization are needed to assure stability. However, each level of synchronization adds delay to any signal that must be synchronized and reduces the bandwidth of the bus. The IMBI, therefore, uses a specially designed state transition system that imposes minimum extra delay on the signal being synchronized by being tolerant of metastable states. Thus the IMBI is designed with only one level of synchronization on each signal, with the state machine adding the extra levels needed for high reliability.

Expansion and Growth To support more than one 1MB for the performance and

configurability issues discussed earlier, a few modifica tions were necessary both to software and to the 1MB.

The I/O bay is wide enough to support a 24-slot 1MB backplane, which is divided into two IMBs of eight slots and sixteen slots respectively (see Fig. 4). The IMBI resides in the slot nearest the CPU bay of each 1MB. Channel and device controller boards may be inserted in the remaining slots with a few configuration guidelines. All cables leaving the system do so through a junction panel, which provides a solid chassis ground for them. The larger 1MB accommodates

F i g . 3 . T h e I / O a d a p t e r t r a n s l a t e s v a r i o u s c o m m a n d s b e tween the cent ra l system bus (CSB) and the I /O channel .



S e c o n d I n t e r m o d u l e B u s F i r s t I n t e r m o d u l e B u s

1 10 I 11 | 121 13 I 141 15 I 16 I 17 1 181 19 I 201 211 22 1 23 1 24

Single-IMB Configuration:

I Dual- IMB Configuration:

I /O Channels, IMB2

Device Control lers

IMBI=lntermodule Bus Inter face

I /O Channel Slots, IMBI

Device Controllers

I /O Channels, IMB1

Device Controllers

a reasonable system I/O configuration before space or per formance requirements indicate the need for splitting the bandwidth or adding more channels than can fit onto a single 1MB. The closer an I/O channel is to the IMBI, the higher its priority for memory access.

The Series 64 has a device reference table structure simi lar to the other members of the family . That is , for each of the eight devices per channel, a four-word entry in main mem ory provides the following information: 1. The address of the channel program for that device. 2 . The address of the channel program variable area for the device (to save parameters relating to execution of the channel programs). 3. The label of the interrupt handler code segment for ex ternal interrupts from the device. 4. A word for run-time execution state information for the device (e.g., the reason a channel program is suspended).

The total device reference table length has been ex panded from 120 to 512 entries to accomodate up to four I/O adapters for future growth, and its starting location is not fixed in hardware, but has been made indirect through a pointer in reserved main memory.

The RMSK and SMSK instructions (read mask and set mask) specify which I/O channels are allowed to generate interrupts to the CPU. These instructions also had to change, since software is providing for four I/O adapters, each of which can have 15 channels (channel 0 is not used on the 1MB). The interrupt masks are now contained in words 32-35 of memory.

Similarly, all of the I/O instructions have been expanded to include specification of the I/O adapter number.

It is a testimonial to the modularity of the I/O system software that these changes were implemented with mini mal time and effort, since only a few common system sub routines required modification. The MPE-IV operating sys tem will work equally well with single-IMB systems or with the Series 64, since it can distinguish the hardware system under which it is running.

An intelligent processor on an 1MB might need to know where its device reference table is, if it is executing its own channel programs. Therefore, two extra lines were added to the 1MB to let the smart channels recognize the 1MB to which they are attached.

Summary The HP 3000 Series 64 I/O system uses much of the same

F i g . 4 . T h e S e r i e s 6 4 I / O b a y h o l d s t w o i n t e r m o d u l e b u s e s ( IMBs) of e ight and sixteen slots.

hardware and supports the same peripherals as the Series 30, 33, 40, and 44 while providing an expansion path to meet the needs of users of larger and faster systems. This was also accomplished with relatively minor changes in the I/O software. Yet there is also provision for adaptability to future requirements in the I/O area.

Acknowledgments Series 64 I/O development was managed by Mark Linsky.

Neil Wilheim developed the original design concepts for the I/O adapter. Paul Chang developed the CPU firmware to support all I/O functions. Rick DeGabriele and Tom Graham wrote the I/O microdiagnostics. Skip LaFetra assisted in several phases of the project and contributed provisions for performance enhancements.

Reference 1. W.G. Matheson, "A Computer Input/Output System Based on the HP Interface Bus," Hewlett-Packard Journal, July 1979.

J. Marcus Stewart Marc Stewar t rece ived h is BSEE de gree from the University of California at Davis in 1979. He jo ined HP the same year as a product ion eng ineer and a yea r l a te r became a deve lopmen t en g ineer on the HP 3000 Ser ies 64 pro j ect . He is marr ied, l ives in San Jose, Cal i fornia, is act ive in his church, and enjoys ski ing, bicycl ing, swimming, and model airplanes.

W. Gordon Matheson With HP since 1973, Gordon Matheson has done CPU, DMA, and mic rocode des ign fo r HP 1000 Compute rs , de veloped hardware and f i rmware for the HP 300 Computer I /O sys tem, and de veloped the I/O adapter for the HP 3000 Ser ies 64. He's a lso worked on local data networks and I/O standards (ANSI X3.T9.2). Born in Glendora, Cal i fornia, he now lives in San Jose. He's married, has four children between 2 and 7 years

!**'.â€¢'â€¢â€¢ -..- - â€¢â€¢* old, plays piano, and considers himself - -â€¢'â€¢â€¢â€¢'â€¢â€¢'".U** a ful l- t ime parent.



The Advanced Terminal Processor: A New Terminal I /O Control ler for the HP 3000 by James E. Beetem

THE ADVANCED TERMINAL PROCESSOR (ATP) is a new terminal I/O controller designed specifi cally for the HP 3000 Series 64. Although the basic

ATP design is compatible with any HP 3000 that uses the intermodule bus (1MB), its design center is a large system with the processing power of the Series 64.

The HP 3000 Terminal I /O Envi ronment The first step in designing a new terminal I/O controller is

to establish the maximum number of terminals to be sup ported and the demands that each terminal will place on the system. The terminal I/O loads expected on the Series 64 can be predicted by projecting the same loads on the widely used HP 3000 Series III to a system with four to six times its processing power. The resulting numbers can be adjusted further to allow for the trends that are apparent in the HP 3000 applications environment.

A typical HP 3000 Series III has 35 terminals connected, three of them via modems. Usually, during the busiest part of the day, 25 of the terminals have active sessions and 20 terminals are actually in use. About two- thirds of the termi nals are supplied by HP and the rest come from a wide variety of vendors.

The maximum supported number of terminal I/O ports for the Series III is 64. All 64 may be operated as modem ports, although the typical system has only three modem ports.

The number of terminals that an HP 3000 can operate with acceptable user response times is strongly dependent upon the demands the application package places on the system. Customers have reported satisfactory response times with their application package supporting as few as ten and as many as 48 terminals. The typical numbers chosen for this analysis represent a 75th percentile case â€” only about 25% of HP 3000 customers have more terminals connected to their systems.

On the typical HP 3000 Series III, the average I/O load per terminal during peak hours is about 20 characters per sec ond. The load is generated by processing one user transac tion every 20 seconds at each terminal. A transaction in volves writing a message to the screen of the terminal and reading in the operator's reply. Usually, the output is about 300 characters and the input is about 100 characters. The transaction time and the number of I/O characters vary over a very wide range and are strongly dependent upon the specific application being run.

The actual I/O load in any short period of time varies widely from the average. Peaks of up to 1000 characters per second occur occasionally when several terminal devices operate concurrently.

Several trends are evident that will have an impact on terminal I/O. First, as terminal prices decline, more and more terminals that are lightly used are attached to the

system. These terminals demand much less of the system's resources. On the other hand, the new intelligent terminals have substantial processing power and frequently move masses of data to and from the system. These devices can generate very heavy I/O loads.

Another class of devices placing significant terminal I/O loads on the system consists of character printers such as the HP 2631B and HP 2601A. When in operation, these devices may require as many as 180 characters per second. Because these devices tend to operate during the system's peak loads, they can sharply increase the average I/O loads.

Most of the newer terminal devices allow higher line speeds. Almost all terminals now offer a maximum line

In te rmodu le Bus (1MB)

Interrupt FIFO

Byte Pack/ Unpacking

Logic

A T P B u s Interface Sys tem In te r face

Board (S IB) A T P B u s (a r i bbon cab le )

A T P B u s Interface

Baud Rate

Generation

Modem Junction

Panel

Direct Connect Junct ion Panel

A s y n c h r o n o u s I n te r f ace Boa rds (A lBs ) - up to 8

P C C = P o r t C o n t r o l l e r C h i p M o d u l e M C C = M o d e m C o n t r o l l e r C h i p M o d u l e

F ig . 1 . B lock d iag ram o f t he advanced te rm ina l p rocesso r (ATP) for the HP 3000 Ser ies 64. Up to 256 terminals can be connected.



speed of 9600 bits per second (bps). New designs call for a speed of 19.200 bps. These higher data rates will cause fewer but higher peak 1O loads.

Based on this analysis, the ATP for the Series 64 was designed to allow a maximum configuration of 256 termi nals. It expects to have about 200 terminals connected in a typical installation. About 125 devices will have active sessions with 100 actually in use at any given time. The average I/O load is expected to be about 4000 characters per second with occasional peaks of 20,000 characters per second. However, it is important that the reader understand that the numbers developed in this section are design center and maxima for a subsystem within the Series 64. Con straints external to the terminal I/O subsystem may not allow it to achieve its design maxima.

Other Design Considerat ions The trend to higher line speeds causes another problem.

The RS-232-C standard used to connect terminals to the HP 3000 has a limited speed/distance relationship. In theory, devices must be within 16 metres of the system. In actual installations, these limitations are often exceeded without problems at slower line speeds. At 9600 and 19,200 bps, the speed/distance relationship is a real limitation, restricting successful high-speed operation to terminals that are rela tively close to the system.

A new modem interface standard is emerging: RS-449. When these new devices become widely used, the HP 3000 Series 64 will need to support them. Provisions are made in the new controller to allow customers to operate both RS- 232-C and RS-449 types of modems on a single Series 64. Flexible junction panels support both modem standards, the existing direct connection standard, and a new direct connection standard with an improved speed/distance rela tionship.

Another consideration is how terminals are operated. Current HP 3000 software operates terminals in one of two modes. Each mode makes sharply differing demands upon the terminal I/O system. In character mode, a human operator is typing characters and using the editing facilities provided by the host computer system to prepare input data strings. Characters arrive at the computer infrequently, but each character must be compared to a large special charac ter set and the appropriate actions must be taken.

In block mode, the terminal provides the editing facilities. When the input data string is ready, the terminal transmits it as one continuous block of data. The data ar rives to fast as the terminal can transmit it, usually close to the line speed. At 9600 bps, one character arrives each millisecond. Very little processing is required, only a check for the character that marks the end of the block.

Under the current MPE (Multiprogramming Executive) terminal I/O system, the terminal I/O controller sometimes does not know which mode is in effect. Therefore, the ATP is designed to handle the character mode processing load at the block mode data rate.

ATP Design The key element in the ATP design is the 6801 single-

chip microcomputer. This device is programmed to provide the functions needed to support an HP 3000 terminal port. It

is called the port controller chip (PCC) . One of these chips is used for each terminal port.

The ATP hardware is designed around the PCC, provid ing the interfaces to the intermodule bus and to the terminal or modem (Fig. 1). The ATP software is designed for flex ibility, to allow easy implementation and support of a wide range of facilities. Port Control ler Chip (PCC)

The 6801 PCC contains an enhanced 6800 microproces sor, 2K bytes of read-only memory' (ROM), 128 bytes of random-access memory (RAM), a universal asynchronous receiver/transmitter (UART), and a timer with an edge- triggered counter. The UART provides the bit-serial inter face to the terminal, eliminating the need for a separate UART chip. The microprocessor and its memory provide the processing power to do character handling without burdening the system processor. The PCC can handle the full line speed of 19,200 bits per second (1920 characters per second) in all modes of operation.

Since the PCC's microprocessor is dedicated to one port, the ROM-based software that operates the PCC can be very simple. There is no need for software to share resources or to resolve contention problems. The PCC is a simple slave processor, dedicated to doing a single task very quickly.and efficiently.

The PCC checks the input and output data streams for special characters. System software may define two input edit sets and enable either of them for comparison with the input data stream. Two output sets are defined, one to scan the output stream and one to scan the input data stream during an output. These facilities give software complete control of the full-duplex operation of the PCC without impact on the system processor.

The PCC handles both the ENQ/ACK and X-ON/X-OFF flow control handshakes that prevent data overruns at the termi nal. Also, it generates the time delays required by unbuff ered teletypewriter devices that must perform physical movement of the carriage or print mechanism. Both the ENQ and ACK characters may be redefined by system software.

The PCC generates output parity and checks input parity when parity is enabled. In the 7-bit data mode, it clears the eighth input bit to zero and sets it to zero or one on output.

The PCC is provided with three paths to or from system memory. One path is used to send control information (or ders) to the PCC from system software. This facility replaces the channel programs used by other intermodule-bus-based I/O devices and eliminates the need for channel program processing facilities.

The remaining two paths provide an input path to and an output path from system memory for data. Having two unidirectional paths rather than one bidirectional path al lows the PCC to perform both input and output of data as the result of a single control order. Also, it allows the switch from output to input to occur very quickly, minimizing the ATP's response time.

The PCC does not have access to the address registers for these paths. This implementation simplifies the hardware design, but requires the PCC to interrupt the system proces sor to handle the backspace and delete-line special charac ters. These line-edit functions occur rarely and usually at human typing speeds, so the workload they generate for the



system processor is very low. The PCC's timer is used for most terminal timeouts while

the edge-triggered counter allows the PCC to do speed sens ing accurately at all supported line speeds by measuring the width of the asynchronous protocol's character start bit.

A portion of the PCC's RAM is used for a 16-character input buffer. An input buffer of this size allows the ATP subsystem a full 16 character times to respond to a software interrupt (or eight milliseconds at the 19,200 bps line speed) before a data overrun can occur.

ATP Hardware There are three elements in the ATP hardware design.

The first element is an interface on the intermodule bus. This is provided by the system interface board (SIB). Then there is logic to resolve the asynchronous contention of many PCC's for the shared facilities. The board that con tains this logic is called the asynchronous interface board (AID). Last, there are junction panels to provide the terminal and modem interfaces (see Fig. 2).

System Inter face Board (SIB) Functionally, the SIB is a byte multiplexer channel op

timized for the port controller chip (PCC) and terminal I/O. On one side, the SIB provides an interface to the inter module bus (1MB), the standard I/O bus for the HP 3000. On the other side, it controls the ATP bus, which connects the SIB to as many as eight asynchronous interface boards (AIBs). This design allows one ATP subsystem to provide 96 terminal I/O ports while occupying one 1MB channel address and nine 1MB I/O slots. These are important consid erations for a Series 64 system.

The SIB provides an interface to the software for control of the PCCs. It generates 1MB requests for the PCCs and performs the tasks required to manage the asynchronous nature of the interface.

Since the PCCs and the ATP bus are byte-oriented while the 1MB is word-oriented (two bytes), the SIB performs the byte packing and unpacking to translate from one bus to the

other. It also allows data transfers to start or stop in either the left or right bytes of the two-byte words in system memory.

As the controller of the ATP bus, the SIB polls each AIB many times each second to see if one of the PCCs has data for or needs data from system memory or wants to generate a software interrupt. The SIB generates software interrupts upon request of a PCC. The PCC's identity number and a byte of status information are presented to software at the 1MB interface. A first-in, first-out (FIFO) queue is main tained to smooth out peaks in the interrupt workload. To prevent interrupt overruns, polling on the ATP bus is sus pended when the FIFO is full. The worst-case maxi mum data rate on the ATP bus is about 150,000 characters per second.

Each PCC is provided with three paths into system mem ory. The SIB implements these paths via the IMB's direct memory access (DMA) facility. This implementation allows the ATP to handle heavy I/O loads with very low overhead. All of the overhead on an ATP I/O transfer is incurred to start and stop the transfer. Once the transfer starts, the only overhead is one 1MB DMA cycle for each two characters.

The SIB is designed to be easy to manufacture and sup port. The basic design is conservative, making little use of exotic or high-speed parts. The ability to diagnose a failed board has been designed into the ROM-based state machine that operates the board.

Asynchronous Interface Board (AIB) Each AIB contains an interface to the ATP bus, twelve

PCC modules, one modem controller chip (MCC) module, and the circuits that allow the PCC modules to share the ATP bus and the MCC. The AIB also contains the circuits that generate the UART baud rates and drive the cables to the junction panels.

The PCC and MCC modules and the ATP bus interface are attached to a bus on the AIB. Access to this bus is controlled by a ROM-based state machine. The state machine arbitrates

Junction Panel

Data Communicat ions Modem Motherboard

F i g . 2 . A d v a n c e d t e r m i n a l p r o cessor junct ion panel mini-boards carry four d i rect connect ion por ts o r two modem pons each .

2 4 H h W L E T T - P A C K A R D J O U R N A L M A R C H 1 9 8 2


requests for the AIB bus from the PCC modules, the MCC module and the ATP bus interface.

The MCC is another 6801 microcomputer. Its function is to allow each PCC to control the signals that operate a device connected to that port, usually a modem. Each PCC may read seven input signals and set eight output signals. This capability allows the PCC to control almost all devices offering a serial I/O capability.

The UART on the MCC is used to multiplex the 180 device control signals between the MCC and the junction panel. Use of the UART as a multiplexer el iminates the need for external multiplexer logic or a very large and expensive cable.

The AIB is designed to take advantage of the PCCs and the MCC to diagnose much of its own logic. Extensive use is made of the 6801 microcomputers to loop data from the system memory through the AIB's logic and back to system memory. This technique provides good troubleshooting facilities without adding special logic for testing.

Junction Panels The ATP provides two types of junction panels. Direct

connect junction panels are used to connect terminals in the local area. Modem junction panels provide the control sig nals required to operate full-duplex modems, allowing re motely located terminals to be connected to the HP 3000 via the public telephone system.

The ATP design calls for a high-speed, long-distance direct connection facility. This facility is implemented using EIA standard RS-422. The standard allows operation at a line speed of 100,000 bps at 4000 feet (1200 meters). However, since most terminals now available offer only RS-232-C interfaces, the ATP junction panels allow both types of interfaces, and interfaces can be easily mixed.

The ATP direct connection junction panels are im plemented as small boards that plug into a motherboard. Each mini-board carries four ports, either RS-422 or RS- 232-C.

To allow the maximum number of ports in the least pos sible space, a new more compact connector was designed to replace the 25-pin D subminiature connector in common use. The new connector uses only three wires for RS-232-C and five wires for RS-422. It is fully shielded and carefully grounded to insure that no radio frequency interference (RFI) is generated and that the system has the minimum possible susceptibility to external electrical influences.

The modem junction panels can also support a mixture of two interface standards. The modem junction panel uses the same mini-board concept as the direct connection panels, except that the modem mini-boards can carry only two ports per board because of the large size of the standard modem connectors.

On the modem junction panel, another 6801 demulti plexes the signals from the modem controller chip. This microcomputer is called the modem scanner chip. The port controller chip notifies the modem scanner chip (through the MCC) which input signals to examine, the expected state of these signals, and how to set the output latches.

The modem scanner chip scans the modem input signals, notices and debounces any changes and reports the changes io inn IviCC, which passes them on to the HJ(J.

ATP Software The principal objective of the ATP software design is to

provide a structure that will be easy to support and en hance. Most of the costs associated with a software product are incurred to fix problems in the original implementation and to add new features after the initial release. The ATP software was designed with a modular structure to facilitate this process.

The most desirable enhancement to HP 3000 terminal I/O has been a more flexible terminal-type facility. To achieve this goal in the ATP, the control of a port's characteristics is table-driven. Each of the current terminal types corre sponds to one table. Changing one of these tables or adding a new table is a simple task.

In addition, the ATP software is designed to be fail-safe. When the MPE software detects a serious problem, all pro cessing is stopped and a system failure is reported. When the ATP software detects a problem that affects only one port, only that port's operations are stopped. All other pro cessing continues. An on-line facility allows the system manager or the HP customer engineer to record the state of that port for later analysis. The on-line facility also allows the port to be reset without requiring the system to be warm-started.

The ATP does not support some obsolete features of the earlier controllers, including half-duplex modems, tape mode, and several obsolete HP terminals.

Acknowledgments A project of the size and scope of the ATP requires the

skills of many people. Phil Ho contributed to many aspects of the design and wrote the PCC ROM code that is the heart of the product. Phil Curtis designed the SIB and acted as the hardware lead. Nancy Madison designed the AIB and many of the diagnostic tools. John Starita designed the junction panels, was responsible for the new connectors, and wrote the MCC and MSC ROM code. Steve Couch and Jon Hewitt designed the software structure and did most of the coding, with help from Cathy Smith and Curt MotÃ³la.

James E. Beetem Jim Beetem received the SB degree in mechan ica l eng ineer ing f rom the Mas sachuset ts Inst i tu te of Technology in 1962. He then served in the U.S. Ai r Force unt i l 1967, at ta in ing the rank of captain. J im cont inued his educat ion at Stanford Univers i ty , earn ing an MBA degree in 1969 and do ing pos t - graduate work in computer science. He jo ined HP in 1969 and has managed product ion contro l and informat ion sys tems act iv i t ies. Now a project manager for data communicat ions, he is a member o f the Assoc ia t ion fo r Com put ing Mach inery and has presented

var ious papers on project management and terminal I /O at technical conferences. J im was born in Lexington, Kentucky and now l ives in San Jose, Cal i forn ia. He enjoys whi te-water raf t ing, photography, camping , and backpack ing . He i s mar r ied and has two sma l l chi ldren.



GUEST â€” A Signature Analysis Based Test System for ECL Logic by Edward R. Hol land and James L . Robertson

TESTING OF LOADED PRINTED CIRCUIT BOARD assemblies is vital in the production of any comput er system. For effective testing of HP 3000 Series 64

boards, a special tester was designed. This system, the Gemini Universal ECL Signature Test system, or GUEST, is able to capitalize on the HP 5005 Signature Multimeter1 and the built-in shift string organization of the flip-flops . on the boards under test (see article, page 11).

Since all ECL networks in the Series 64 must be termi nated in their characteristic impedances to suppress reflec tions, a tester running at slow clock speeds, as most board testers do, would not detect termination and other timing problems. A tester that functions at real-time clock rates was required and the GUEST system meets that need, being able to operate at clock rates up to 25 MHz.

In board testing, a test vector is a set of input states applied by a tester to a unit being tested. Generation of test vectors for a given unit under test (UUT) is often a difficult and time-consuming operation. In the GUEST system, vec tors are generated algorithmically in real time by hardware. Therefore, preparation of test programs is reduced from weeks or months to a few hours. This has made the GUEST test system a useful tool throughout the prototype design process of the Series 64. By contrast, most test systems are usable only after the design is complete and all the test vectors have been generated, a point that is reached typi cally very late in a project.

Computer -Dr iven Test System The GUEST test system is interfaced to the controlling HP

1000 Computer though the HP-IB (IEEE 488) interface bus (see Fig. 1). The primary functions of the computer system are to of testing and to guide the operator in probing of a defective UUT. One computer is able to handle up to six GUEST test stations, or a combination of GUEST test sta tions and DTS-702 test stations with all stations active at the same time.

The GUEST test system is interfaced to the UUT by a personality board as shown in Fig. 2. Building the personal ity board for ECL boards requires simply loading wire jumpers on a universal board. There is one set of jumpers for UUT input pins and another set of jumpers for pins that require terminations.

Hardware Test Vector Generat ion The test vectors used in the GUEST system are generated

as a serial bit stream by a 13-bit linear feedback shift regis ter, which produces an 8191-bit-long pseudorandom se quence (see Fig. 2). This bit stream is shifted though a 588-stage shift register on the GUEST board and then via the personality board through the shift string of the UUT. Shift ing continues for 1023 clock cycles. Then on the 1024th clock cycle, all of the shift register elements on both the GUEST board and the UUT are parallel-loaded under con trol of the diagnostic control unit shift line. The next 1023 clock cycles then cause the data loaded in the parallel operation to be shifted out of the combined registers past the GO/NOGO point. The test runs for a total of 8,379,393 (8191 x 1023) shifting clock cycles so that each of the inputs to the board is driven by all of the 8191 bits of the

7906 or 7920 Disc

HP-IB

941 6A Programmer

Power Supplies

HP 1000 Computer

System

HP 5005 Signature Mult imeter

Line Printer

or Other Peripherals

51 50 A Thermal Printer

F i v e M o r e ! i Test Stat ions j

Possible

RS-232-C

1 2621 CRT

Terminal

Probe

GUEST Test Stat ion

Fig. 1 . GUEST is interfaced to the contro l l ing HP 1000 v ia the HP-IB ( IEEE 488) , and to the un i t under test by a personal i ty board.



S e r i e s 6 4 B o a r d ( U U T )

C o m b i n a t i o n a l L o g i c o f U U T Probe

(Back t race ) ,

UUT Edge Pins â€¢ â€¢ â€¢

- O â€¢ â€¢

J u m p e r s \ Q ,

â€¢ â€¢ Personality

Board

GUEST Tes t Sta t ion

GUEST Shif t Str ing (588 Bits)

so

Clock Generator

13-Bit P s e u d o random

Generator

Power Supplies

Load/ Shift

Control DCU Shift

5005 Signature Multimeter

Control Signals f rom Computer (HP-IB)

pseudorandom sequence and each output of every shift string flip-flop on the board is also driven by all 8191 bits.

Go/No-Go Test ing As the data stored in the shift registers on the 1024th

clock cycle is shifted out past the GO/NOGO test point, the HP 5005 Signature Multimeter computes the signature of the data stream and the computer checks this signature against the expected signature stored for the board type being tested. Since both the GUEST and the UUT shift registers are shifted past this test point, any incorrect value from any part of the UUT causes an incorrect GO/NOGO signature. If the board passes the GO/NOGO test it is ready for installation in a system. If the board fails this test, additional testing is done on the GUEST tester.

The control logic of the GUEST tester can limit the signa ture analyzer to only the last N data outputs of the shift string, where N is specified by the computer. When N is the maximum value, the signature analyzer clock is always enabled and the signature depends on all bits in the shifted outputs. As N is decreased, fewer and fewer outputs enter into the signature. At some value N=n the signature will be incorrect, while at N=n â€” 1 the signature will be found to equal the expected value. Since there is a one-for-one map ping of the value of N and a point in the overall shift string, it is possible to pinpoint the fault to an I/O pin or an input to a shift string bit on the UUT. The computer is programmed to use a binary search algorithm and takes ten signatures or less to find the correct point in the shift string to begin further backtracing.

Fig. 2. Tesi vectors are generated a l g o r i t h m i c a l l y b y G U E S T a n d loaded in to the sh i f t s t r ing o f the H P 3 0 0 0 S e r i e s 6 4 b o a r d b e i n g t e s t e d . T h e d a t a i s t h e n s h i f t e d past the GO/NOGO point, where the 5005 S ignature Mul t imeter gener ates a signature that tel ls whether o r n o t a p r o b l e m e x i s t s o n t h e board.

Backtracing While backtracing faults to the component level, the sig

nature at each node is dependent only on combinational functions of the pseudorandom test vectors applied to the UUT input pins and shifted into the internal shift register states of the UUT. The signatures are measured by moving the signature analyzer probe to each node as guided by the computer.

Backtracing starts at the node determined by the GO/ NOGO binary search. When a node is found that has an incorrect signature but all inputs affecting that node have correct signatures, the faulty node has been isolated.

Feedback Loops To make effective use of signature analysis, all feedback

loops should be broken. If a feedback loop is not broken, a bad signature anywhere in the loop causes all other nodes in the loop to have bad signatures, and fault isolation to the component level is impossible. Feedback loops are broken during backtracing by the internal UUT shift registers. This is accomplished by clocking the signature analyzer when the outputs of each shift register are dependent only on its serial input and not on its parallel data inputs. Since bad signatures cannot propagate through these shift registers, any feedback path from an output to a parallel data input is broken.

Clock and Shif t Control Faults Another complication in the fault isolation process is that

of diagnosing faults in the clock and shift control circuits of the UUT. Since data in these areas is normally synchronous



Designing for Testability with GUEST

by Karen L. Meinert

The HP 3000 Ser ies 64 and the GUEST system were designed at the same t ime to work together. I t was discovered that certain design issues had to be considered when designing a Ser ies 64 printed circuit assembly (PCA) i f maximum GUEST testabi l i ty was to be ach ieved.

High Fan-in Fan- in i s de f ined as the number o f inpu ts to a c i rcu i t tha t i s

needed to produce an output. Because GUEST outputs only 8191 predefined test vectors, a fan-in of 1 4 or greater makes it impossi b le to sequence th rough a l l comb ina t ions (214 i s g rea te r than 8191) . I f the fan- in i s less than 10 i t i s sa fe to assume tha t a l l combinat ions wi l l appear in the tes t sequence. Modi f ica t ions to the board or the personal i ty board can be made to assure that al l comb ina t i ons a re t r i ed when the fan - in i s be tween 11 and 13 . There a re a lso severa l s igna ls (mos t l y h igh , mos t l y low, pu lse high, pulse low) avai lable on the personal i ty board; these can be used to dr ive inputs to increase a circui t 's ef fect ive test vectors.

I t is dr iven that the 1 3 inputs to a high-fan- in c ircui t be dr iven by contiguous bits from GUEST. The bits in the shif t str ing and on the edge connectors of GUEST are just shif ted versions of the 1 3 bits that guarantee the 81 91 test vectors. If 13 bits are selected at random, the re i s no guaran tee tha t a l l 8191 comb ina t ions w i l l appear .

Feedback Loops and Shi f t St r ings For a PCA to be GUEST-testable i t is not necessary that i t have

shift strings (see article, page 11). The shift strings are an effective m e a n s t o b r e a k u p f e e d b a c k l o o p s a n d i n i t i a l i z e m e m o r y e l e ments. I f no feedback loops exist on a board and memory can be in i t ia l ized, there is no need for shi f t st r ings.

Nonstandard Clocking Both the GUEST sh i f t reg is ter and the s ignature analyzer are

clocked on the standard system clock. I f other phases or edges of the c lock are used, test ing problems may occur. Feedback loops are not necessar i ly broken i f shi f t str ing registers are c locked on p h a s e s o t h e r t h a n t h a t u s e d b y G U E S T . A l s o , t h e r e c o u l d b e problems in gett ing information between the PCA and the GUEST system in t ime i f other phases are used. In general , c locks other than the s tandard system c lock should be avo ided when des ign ing PCAs to be tes ted by GUEST.

Non-ECL Circui t ry GUEST was designed to be an ECL circuit tester. To test circuits

other than ECL, t ranslator packs are used on specia l personal i ty b o a r d s t h a t a d a p t e a c h t e s t e d b o a r d t o G U E S T . I n a d d i t i o n , t h r e e - s t a t e b u s e s s h o u l d n o t b e a l l o w e d t o f l o a t t o t h e h i g h - impedance state or unstable signatures wil l result . The solut ion is

to have GUEST dr ive the bus whenever the board is not . This is accomplished by bringing the three-state bus enable signal to an unused I/O pin on the PCA. This signal is inverted and used on the personality board to enable the translator direction (ECL to TTL or vice versa).

TTL s i gna l s can be p robed by t he HP 5005 S igna tu re Mu l t i meter, since this analyzer is capable of changing its input voltage threshold to a l low for non-ECL s ignals . This can be done e i ther manually or under automatic control through the HP-IB interface.

RAMS To test a RAM completely each location must be addressed f ive

times: once each to write one, write zero, read one, read zero, and deselect. Since GUEST outputs a total of only 8191 test vectors, RAMs larger than 1 024 bits deep cannot be tested completely. To obtain known signatures on the outputs of RAMs, any location that is read has f i rst have been wr i t ten. I f a locat ion is read that has no t been wr i t ten , the read data is whatever was in the RAM at power-up. This is random data and therefore has a random signa ture . A s igna l ca l led RAMINIT (RAM iNiT ia l ize) is generated by GUEST to assure that a l l read locat ions are in i t ia l ized. GUEST g e n e r a t e s a p s e u d o r a n d o m n u m b e r s e q u e n c e w i t h a l e n g t h equa l t o ha l f o f a s i gna tu re ana l yze r cyc le . Th i s sequence i s repeated during the second half of the cycle. RAMINIT is high for the f i rs t ha l f -cyc le and low for the second hal f -cyc le dur ing the time that the RAMs can be read or written into. RAMINIT is used in the write enable circuitry to force a write when high and to enable reads and wr i t es when l ow . I n t h i s way eve ry l oca t i on t ha t i s addressed in the f irst half is writ ten with known data. When these addresses repeat themselves in the second half ( they are part of the repeated pseudorandom sequence), where reads may occur the locat ions have been ini t ia l ized, and known, repeatable s igna tures wi l l occur.

Karen L. Meinert Karen Meinert is a native of the State o f Iowa and at tended Iowa State Univers i ty , earn ing the BSEE de gree in 1979. She jo ined HP that same year and is a design engineer for the main memory subsystem used in the HP 3000 Ser ies 64. Karen l ives in Santa Clara, Cal i for nia. When she isn' t busy exercis ing her golden retr iever , she enjoys sports, scuba div ing, camping, and woodwork ing.

with the signature analyzer clock, only SAO and SAl (stuck at 0 and stuck at 1) signatures would be found and normal backtracing would be impossible. To isolate this type of fault, the first backtrace signature measured after the GO/ NOGO test fails is at the GO/NOGO test point. A correct signature here indicates that the UUT clock and shift con trol circuits are functioning. If the test fails, the shift register serial outputs are probed using a binary search algorithm until a good signature is found and the following device's serial output is bad. Clock and shift control inputs to this earliest failing shift register are probed while pseudoran

dom data is applied to the clock and shift inputs of the UUT. If no bad inputs are found, the shift register is suspected to be bad. If a clock or shift control input fails, backtracing starts at that point, with pseudorandom data still applied to the clock and shift inputs of the UUT, and continues until the failing node is isolated.

Test Report When the faulty node has been isolated, three additional

measurements are made on the suspected node. They are: +



peak voltage, - peak voltage, and resistance to ground. A test is with information pertinent to the failing node is then printed. The failure descriptions can be: NODE MOVES. NODE STUCK, NODE UNSTABLE, or OPEN TRACE. The infor mation on the test report can be used to further diagnose the fault (shorted nodes, open terminations, etc.).

Two assumptions made so far are that the probe has made good contact with the node being tested, and that the operator probes the correct point when prompted. This is not always the case, so provisions have been made for recovery from probing contact errors. To minimize operator misprobes, all pertinent 1C pins are probed in sorted as cending order before moving on to the next 1C. Thus, pin to-pin and IC-to-IC movement is minimized. Each node is probed at least twice in different locations to diagnose misprobes or open traces. If a misprobe is discovered before it is diagnosed by the computer, the operator can ask to reprobe the point where the error was made.

Edge connector pins, resistor pins, or any other point that is difficult to locate or probe can be declared inaccessible. During backtrace, the operator will not be prompted to probe these points. When the test report is printed, any inaccessible point that could have caused the fault is listed as not checked.

Creating the Test Fi le Before using GUEST, a UUT test file must be created by

TESTAID.3 To use TESTAID, topology information (a de scription of the types and interconnections of all integrated circuits on the UUT) must be supplied by the test program mer. This data is available from the computer routing and placement program used in the design of Series 64 boards, and is converted by a utility program to the format required by TESTAID. No other input data is required for TESTAID simulation because all test vectors are generated by the GUEST hardware.

After the preliminary test file has been created by TEST- AID, UUT-dependent test setup parameters need to be added (clock period, reference voltages, special node defin itions, etc.). Signatures can then be added as measured from a known good UUT. The addition of all GO/NOGO signatures is automatic and requires approximately two hours run time. Backtrace signatures are added by manually probing each internal node as guided sequentially by the computer. This process also requires approximately two hours.

Once generated, the UUT test file should be verified for accuracy of fault isolation and fault coverage. Fault isola tion accuracy can be verified by inserting a few known faults in critical areas of the UUT and determining whether they are diagnosed correctly. Fault coverage is verified by measuring the GO/NOGO signature repeatedly while prob ing internal nodes, as guided by the computer, with a special probe that forces each node both high and low. If the GO/ NOGO test passes, the stuck at 1 or stuck at 0 node fault is marked as not detected. When probing is complete, the fault detection percentage is computed and saved in the UUT test file. Typical fault coverage is in the order of 99% for ECL boards.

The final step to be done when creating a test file is documentation. By entering the DOCALL command, com plete documentation of the UUT test file (all node drivers,

receivers, terminations, special setup, signatures, etc.) can be listed on the line printer.

Acknowledgments The authors would like to acknowledge the creative work

of Bob Horst and Rick Amerson for proposing and design ing the early GUEST test system. Credit should go to Jim Tseng, Harish Joshi, and Bob Cook for the final translation of the hardware into a reliable and highly usable system. Mike Phillips deserves much credit for working out bugs in the process of personality board design and a lot of helpful feedback in the programming process. John Shing and Gary Gitzen deserve thanks for their helpfulness in making available modified 5005A Signature Multimeters for use in the GUEST system.

References 1. R.A. Frohwerk, "Signature Analysis: A New Digital Field Ser vice Method," Hewlett-Packard Journal, May 1977. 2. P.S. Stone and J.F. McDermid, "Circuit Board Testing: Cost- Effective Production Test and Troubleshooting," Hewlett-Packard Journal, March 1979. 3. K.P. Parker, "Software Simulator Speeds Digital Board Test Generation," Hewlett-Packard Journal, March 1979.

James L. Robertson Jim Robertson jo ined HP in 1958 wi th previous experience as an avionics test engineer. He has been a standards lab manager and a pretest supervisor, and now works in electronic tooling. Jim is a nat ive of Los Angeles, Cal i forn ia and received the AS degree in e lect r ica l engineer ing from Foothi l l Col lege, Cal i fornia in 1968. He is marr ied, has two sons and two marr ied daughters , and enjoys camping, hiking, and travel. He l ives in Los Altos, Cal i fornia.

II

X

Edward R. Hol land I Ed Hol land was born in Det ro i t , Mich i gan and a t tended Mich igan Sta te Un i versity where he earned the BS degree in e lectr ical engineer ing. Af ter serv ice in the U.S. Naval Reserve and work on

I radar and test system design, Ed con t inued h is educat ion a t S tanford Un i vers i ty , earn ing an MS degree in e lec trical engineering in 1960. He joined HP in 1961 and has cont r ibuted to a number o f products that inc lude the

~ / 3440 , 3460 , and 3462 Vo l tme te rs , t he jSm 2114 , 21 15 , and 21 16 Compute rs , and I K B H P 3 0 0 0 C o m p u t e r s . C u r r e n t l y , E d i s

m - ^ ^ ( ^ * l l H p r o j e c t m a n a g e r f o r t h e S e r i e s 6 4 C P U , c a c h e a a n d G U E S T t e s t e r h a r d w a r e . H e i s t h e a u t h o r o f a paper on computer I /O and per iphera ls and is named inventor on a patent re la ted to the bas ic A- to-D convers ion concept used in the 3460 DVM. Ed is a member of the IEEE and the Planetary Society. He l ives He Palo Alto, Cal i fornia, is marr ied, and has two chi ldren. He en joys backpack ing , b icyc l ing and sa i l ing and works a t p romot ing space explorat ion and the search for extraterrest ia l in te l l igence.



Packaging the HP 3000 Ser ies 64 by Manmohan Kohl i and Bennie E . He lmso

THE HP 3000 SERIES 64 is the new highest-perfor mance member of the HP 3000 Computer family. It has been designed to retain certain visual aspects

of current HP systems and peripherals for visual compati bility, but also incorporates new directions for future prod ucts. The Series 64 is intended to be used in an EDP envi ronment and a cost-effective package design has been applied to optimize reliability and serviceability.

New tubular welded frames, preassembled formed cardcages, bus bars and modular junction panels were de veloped as part of this cost-effective package. New two- piece the connectors for printed circuit boards and the backplane and an effective cooling scheme were developed for higher reliability.

The cabinet has two open frames that are bolted together in the front and rear. These provide complete access to various assemblies including backplanes for servicing (Fig. I). A 36-slot cardcage contains the CPU, memory and I/O adapter cards and a second 24-slot cardcage contains I/O device cards. The power system is designed to support the maximum configuration.

Separate cooling systems are used to cool printed circuit boards and power supplies. Cooling air is drawn from the rear and is then forced through the cardcages for proper cooling of the printed circuit boards. Bus bars are used between backplanes and power supplies to minimize cable harnesses and to facilitate replacement of power supplies. I/O junction panels are modular and provide maximum flexibility for various system configurations.

The system cabinet is designed to meet HP Class C en vironmental specifications. This requires the system to op erate at extreme temperature and humidity, between sea level and high altitude, and exposed to shock and vibration.

Cool ing System The cooling system is designed to meet two major objec

tives under worst-case conditions: â€¢ Transistor junction temperatures not to exceed 85Â°C for

higher reliability â€¢ Air temperature rise not to exceed 10Â°C for adequate

noise margin. The CPU boards contain mostly ECL chips with an aver

age heat dissipation of 0.5 watt/chip and 45 watts/board. However, the total heat dissipation from the maximum- configured CPU, memory, and I/O adapter is 1300 watts. These parameters presented some challenges in designing the cooling system. To meet the above objectives, five 16.5- cm-diameter tubeaxial fans are used to deliver high- velocity air at 2.3 mis for cooling the ECL chips.

Cardcages are cooled by taking air in from the rear and then forcing it up through the cardcage. A plenum is used between the fans and the cardcage to help distribute air across the cardcage more evenly. The fans are inside the plenum, which is lined with acoustic foam to reduce noise.

Two sets of overtemperature switches are mounted on the two cardcages. When the temperature reaches the low set point (40Â°C), both audio and visual alerts to the operator are initiated. The system is shut down if the cardcage tempera ture reaches the high set point (50Â°C).

Electromagnetic Compatibi l i ty Design To decrease emission of electromagnetic energy and sus

ceptibility to it, the logic ground is isolated from the frame ground. This is achieved by electrically isolating all printed circuit boards and backplanes from the sheet-metal cardcages. Radiated emissions are minimized by providing and connecting ground planes in the printed circuit boards

Top Covers and Exhaust P lenum

Power Control Module

CPU Frame

, -

Fig. 1 . HP 3000 Series 64 cabinet design.



1 Signal Layer

2 Ground Plane 2 oz

P R E P R E G

3 - 5 . 2 V P l a n e

4 - 2 . 0 V

PRE PREG

5 Ground Plane

6 Signal Layer

Fig . 2 . Typ ica l HP 3000 Ser ies 64 pr in ted c i rcu i t board con struct ion. Dimensions are in inches.

and backplanes. Susceptibility is minimized by grounding the frames, exterior panels, and internal assemblies to each other through plated interfaces. Beryllium-copper clips mounted on all exterior panels make contact with the plated frame members to provide grounding to reduce susceptibil ity to electrostatic discharge. Three isolation transformers, one for each phase, provide good common-mode noise re jection, lessen susceptibility to electromagnetic interfer ence, and insure that ac line leakage current is well within safety requirements.

Boards and Connect ions Designing and making large, high-density, controlled-

impedance-interconnect printed circuit boards presented some new challenges. The design parameters that were traded off to arrive at final design rules were board size, minimum trace width and spacing, 1C package density, trace routability, and raw board testability. What resulted were boards approximately 36 cm square with an average of 120 1C packages per board. Minimum trace width and spac ing was set at 0.18 mm (0.007 in). A 2.5-mm (0.100-in) grid system for component holes was employed to facilitate testing of unloaded boards before assembly.

For density and routability considerations only two sig nal layers are used. These are on the outer layers, leaving the inner layers for power and ground distribution. The typical board construction is shown in Fig. 2. It should be noted that the signal lines are in the standard microstrip configuration with nominal characteristic impedance of approximately 68 ohms.

For the interconnect design of the large number of indi vidual circuits, manual design was ruled out, largely be cause of time constraints. A sophisticated design automa tion approach was selected to provide automated place ment of components and routing of traces after initial data entry of schematic and mechanical information.

Interconnecting all the boards in a system this large also presented some special problems. Board-to-board inter

connection in the CPU and memory sections requires just under 10.000 pins. A pin-and-socket (two-piece) connector system was selected. It was felt that the conventional card- edge (one-piece) system had some problems (such as board edge contamination from soldering and handling, gold plating cost, thickness and quality control) which the two- piece system minimizes. For example, all connector plating is done by the connector manufacturer, and all gold plating is eliminated from the printed circuit boards.

The board-to-board interconnection is accomplished by means of a backplane assembly. In addition to providing for the interconnection of signal lines, the backplane also dis tributes dc power to the individual circuit boards. The construction details of the backplane are as shown in Fig. 3. There are four signal layers, two ground planes, and four power planes. The outer layers are microstrip transmission lines and the inner signal layers (layer 3 and 8) are conven tional striplines. The power and ground planes are du plicated to aid in power distribution. Because of the number

1 Signal Layer

2 Ground Plane 2 o z

PRE PREG

1 oz 3 Signal Layer

4 - 5 . 2 V P l a n e 2 o z

PRE PREG

5 - 2 . 0 V P l a n e

6 - 2 . 0 V P l a n e

PRE PREG

2 o z 7 - 5 . 2 V P l a n e

8 Signal Layer

Core .010 Â± .001 n ^ n

' / 2 0 Z

9 Ground Plane

10 Signal Layer

F ig . 3 . HP 3000 Ser ies 64 backp lane cons t ruc t i on . D imen sions are in inches.



Receptacle Assembly â€”

Close-Tolerance Mole with Fused Tin/Lead Plating

-Printed Circuit Board

-Pin Header

-Compl iant Press-Fit

Pin

-Backplane

F ig . 4 . Board - to -backp lane connec t ion method i s gas - t i gh t and e l iminates solder ing.

of layers and the large amount of copper in the inner planes, conventional connectors soldered into the backplane were not practical. The problems of thermal stresses to the backplane assembly during soldering and inability to re move or replace connectors because of the heat-sink effect of the inner planes led to the selection of press-fit connec tions between the connectors and the backplane. An inter ference fit between the connector pins and holes in the backplane produces a gas-tight connection and eliminates the need for soldering the pins to the backplane. The board- to-backplane connection is shown in Fig. 4.

A c k n o w l e d g m e n t s The development of the new cabinet and packaging re

quired contributions from many people. Contributions

P R O D U C T I N F O R M A T I O N r I Ã© HP 3000 Ser ies 64 Computer System

~ MANUFACTURING DIVISION: Computer Systems Div is ion 19447 Pruner idge Avenue Cupert ino, Cal i forn ia 95014 U.S.A. SPECIFICATIONS: HP Pub l i ca t ion No. 5953-0656 P R I C E S I N U . S . A . : 3 2 4 6 0 A H P 3 0 0 0 S e r i e s 6 4 S y s t e m P r o c e s s i n g U n i t , $ 1 6 4 , 7 0 0 . 3 0 1 4 3 A I / O A d a p t e r M o d u l e , $ 1 0 , 0 0 0 . 3 0 1 4 2 A 1 M - B y t e Memory Modu le , $16 ,000 . D isc Dr i ves , t e rm ina l s , and other per ipherals addi t ional .

were made by Bob Cook on packaging and serviceability, Steve Spelman on industrial design and plastic parts, George Canfield on power distribution, and Katie Torres for most of the drawings and documentation. Jim Brannan was responsible for most of the product qualification testing, Barb Gee provided the manufacturing support and Toby Huff the service engineering support. Guidance and support was provided by Peter Rosenbladt, R&D section manager.

Manmohan Kohl i Manny Kohli Â¡s a native of Punjab, India and rece ived the BSME degree f rom Punjab Univers i ty in 1964. He earned the MSME degree at the Univers i ty o f Cal i fornia at Berkeley in 1967 and then d id mechan ica l des ign o f impac t p r in t e rs and h igh-speed mechan isms and st ress analys is of a i rcraf t in ter iors be fo re jo in ing HP in 1974 . Manny de s igned the cab inet for the HP 3000 Ser ies 33/44, worked on the thermal

, p r i n t e r u s e d i n t h e H P - 9 1 a n d H P - 9 7 M l < , J j | C a l c u l a t o r s , a n d i s p r o j e c t m a n a g e r f o r

the industr ia l /product design of the HP 3000 Ser ies 64. Manny enjoys working

on home projects, t inker ing wi th cars, camping, and reading. He is marr ied, has two chi ldren, and l ives in San Jose, Cal i forn ia.

Bennie E. Helmso Ben Helmso joined HP in 1 960 after re ce iv ing the BSEE degree f rom the Uni versity of California at Berkeley. During his more than twenty years at HP he has worked on osc i l loscope, mic rowave sweepe r , and w r i s twa tch p roduc t de s igns and managed manufac tur ing en g ineer ing and qual i ty assurance groups. At present , Ben is do ing RF s ignal generator product design. He is named inventor on a patent re la ted to the HP-01 watchband. Ben was born in Los Angeles, Cal i fornia and served two

Â¿Ã© I years in the U.S. Navy before studies for Ãˆlff*Ã~\ his BSEE degree. He Â¡s married, has

two children, and l ives in Spokane, Washington. His interests include camping, boat ing, hunt ing, and f ish ing.

\ H e w l e t t - P a c k a r d C o m p a n y , 3 0 0 0 H a n o v e r

S t r e e t , P a l o A l t o , C a l i f o r n i a 9 4 3 0 4

H E W L E T T - P A C K A R D J O U R N A L

MARCH 1982 Volume 33 â€¢ Number 3 T e c h n i c a l I n f o r m a t i o n f r o m t h e L a b o r a t o r i e s o f

H e w l e t t - P a c k a r d C o m p a n y Hewle t t -Packard Company. 3000 Hanover S t ree t

Palo Al to. Cal i forn ia 94304 U.S A. Hewlet t -Packard Centra l Mai l ing Depar tment

Van Heuven Goedhar t laan 121 1 181 KK Amstelveen. The Nether lands

Yokogawa-Hewle t t -Packard L td . . Sug inami -Ku Tokyo 168 Japan Hewlet t -Packard (Canada) Ltd.

6877 Goreway Dr ive . Miss issauga. Ontar io L4V 1M8 Canada

Bulk Rate U.S. Postage

Paid Hewlett-Packard

Company

0 4 0 0 0 9 4 0 3 5 E . & & H A N S & Q M O O

S U S Â « f I I S E A R C H G E N T E . F P D D I V C O D E B L D G N 2 4 4 - 7 M O F F E T T F I E L D C A 9 4 0 3 5

CHANGE OF ADDRESS: To change Send address or de lete your name f rom our mai l ing l is t p lease send us your o ld address label . Send c h a n g e s A l l o w d a y s . J o u r n a l . 3 0 0 0 H a n o v e r S t r e e t , P a l o A l t o . C a l i f o r n i a 9 4 3 0 4 U . S . A . A l l o w 6 0 d a y s .


Documents

HE! WLE TPACKARD JOURNALGUEST â€” Holland Signature Analysis Based Test System for ECL Logic, by Edward R. Holland and James L Robertson It runs at real-time clock rates and generates