EY-3474E-DP
May 1986
107 pages
Document:
dtj v01-02 mar1986
Order Number:
EY-3474E-DP
Revision:
0
Pages:
107
Original Filename:
dtj_v01-02_mar1986.pdf
OCR Text
MicroVAX II System

Digital Technical Journal
of Digital Equipment Corporation
Number 2
March 1986

Editorial Staff
Editor - Richard W. Beane

Production Staff
Production Editor - M. Terri Autieri
Designer - Charlotte Bell
Typesetting Programmer - James K. Scarsdale

Advisory Board
Samuel H. Fuller, Chairman
Robert M. Glorioso
John W. McCredie
John F. Mucci
Mahendra R. Patel
Grant F. Saviers
William D. Strecker
Maurice V. Wilkes

The Digital Technical Journal is published by Digital Equipment Corporation, 77 Reed Road, Hudson, Massachusetts 01749.

Comments on the content of any paper are welcomed. Write to the editor at Mail Stop HL02-3/KI1 at the published-by address. Comments can also be sent on the ENET to RDVAX::BEANE or on the ARPANET to BEANE%RDVAX.DEC@DECWRL.

Copyright © 1986 Digital Equipment Corporation. Copying without fee is permitted provided that such copies are made for use in educational institutions by faculty members and are not distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. Requests for other copies for a fee may be made to the Digital Press of Digital Equipment Corporation. All rights reserved.

The information in this journal is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

ISBN 932376-89-4
Documentation Number EY-3474E-DP

The following are trademarks of Digital Equipment Corporation: CompacTape, DEC, the Digital logo, MicroVAX, MicroVAX I, MicroVAX II, MicroVMS, PDP-7, PDP-11, Q-BUS, RSTS, TK50, ULTRIX, ULTRIX-32, UNIBUS, VAX, VAX-11/730, VAX-11/750, VAX-11/780, VAX 8600, VAX 8200, VAXELN, VAXstation, VMS, VT.

Apple II is a trademark of Apple Computer, Inc.
AT&T is a trademark of American Telephone & Telegraph Company.
IBM is a registered trademark of International Business Machines, Inc.
Mylar is a trademark of E. I. duPont deNemours & Company.
Tek is a registered trademark of Tektronix, Inc.
UNIX and System V are trademarks of AT&T Bell Laboratories.
Xerox is a registered trademark of Xerox Corporation.
68000 is a trademark of Motorola, Inc.
8086 and Intel are trademarks of Intel Corporation.

Cover Design
Hardware, software, and peripheral devices for the MicroVAX II system are featured in this issue. Two VLSI devices, the 78032 CPU chip and the 78132 FPU chip, form the core of this system. Our cover shows the input programmable logic array for the FPU chip. The cover was designed by Deborah Falck of the Graphic Design Department.

The manuscript for this book was created using generic coding and, via a translation program, was automatically typeset. Book production was done by Educational Services Media Communications Group in Bedford, MA.

Contents

Foreword
  Jeffrey C. Kalb
The MicroVAX 78032 Chip, A 32-Bit Microprocessor
  Daniel W. Dobberpuhl, Robert M. Supnik, Richard T. Witek
The MicroVAX 78132 Floating Point Chip
  William R. Bidermann, Amnon Fisher, Burton M. Leary, Robert J. Simcoe, William R. Wheeler
Developing the MicroVAX II CPU Board
  Barry A. Maskas
The Evolution of the Custom CAD Suite Used on the MicroVAX II System
  Anthony F. Hutchings
The Making of a MicroVAX Workstation
  Rick Spitz, Peter George, Stephen Zalewski
The RQDX3 Design Project
  Nicholas A. Warchol, Stephen F. Shirron
The Evolution of Instruction Emulation for the MicroVAX Systems
  Kathleen D. Morse, Lawrence J. Kenah
The TK50 Cartridge Tape Drive
  Steven E. Boone, Guenter E. Schneider
Porting ULTRIX Software to the MicroVAX System
  Raymond J. Lanza
New Products

Editor's Introduction

Richard W. Beane
Editor

This issue of the Journal is the second published by Digital's engineering organization. Our first issue (August 1985) featured papers about the technologies used in designing the VAX 8600 processor.
The Journal presents papers written by the technical contributors who design Digital's products. The information is directed at engineering faculty members, Digital's own engineers, and customers.

This issue features the MicroVAX II system, which implements the VAX architecture on a single CPU chip, the 78032. Another chip, the 78132, executes fast floating point operations; a single board holds both those chips, plus one megabyte of memory. New peripherals have been designed, and the VMS and ULTRIX software adapted to the MicroVAX II system. This collection of papers, by authors from different engineering groups, presents a wide spectrum of the MicroVAX II hardware and software.

The first paper, by Dan Dobberpuhl, Bob Supnik, and Rich Witek, is a description of the 78032 CPU chip, which implements a subset of the full VAX instruction set. The decisions about which instructions to microcode are discussed, along with hardware simplifications needed to fit functions on one chip. The chip's various operations are explained, with emphasis on parallel execution.

The CPU chip can use a coprocessor, the 78132 FPU chip, to perform fast floating point operations. The paper by Bill Bidermann, Amnon Fisher, Mike Leary, Bob Simcoe, and Bill Wheeler relates the 78132's architecture and algorithms. The protocol between the two chips is discussed and a description is given of the wiring and signal integrity issues and how they were addressed.

Both chips are mounted on a single board containing one megabyte of memory. Barry Maskas' paper explains how the CPU board had to be designed as a linked sequential machine with dual ports. The development process is interesting because the board and the chips were designed in parallel.
The paper on CAD tools, by Tony Hutchings, relates the large role they played in the chip and board designs. The various levels of CAD support, from behavioral modeling, through logic and circuit simulation, to wirelist generation are described.

The software graphics that turn the MicroVAX II system into a single-user workstation are reported in the paper by Rick Spitz, Peter George, and Steve Zalewski. The control of windowing software and virtual displays is discussed, as are the implementation details.

The RQDX3 disk controller provides fast data transfers between a CPU and disk storage devices. Nick Warchol and Stephen Shirron explain the top-down development process that led to unique solutions to difficult problems. Their description of the final architecture shows how the original goals were met in the eventual design.

With a subset architecture, those instructions not in the set have to be executed another way. The paper by Kathy Morse and Larry Kenah describes the macrocode emulation of the VMS changes required to do that. The testing techniques are interesting since they were done without MicroVAX hardware.

The paper by Steve Boone and Guenter Schneider describes the TK50, a streaming cartridge tape drive providing fast data transfer. The authors discuss the unique cartridge, tape transport, and controller designs, highlighting the self-threading technique and the serpentine read/write process.

The final paper, by Ray Lanza, describes porting the ULTRIX-32 software to the MicroVAX processor. Ray explains the cross development environment and the mapping techniques that allowed the heart of the ULTRIX software to fit on a small system.

Biographies

William R. Bidermann
Bill Bidermann is the engineering manager of the Advanced Development Memory Group. He consulted on the floating point chips for both the VAX 8200 and MicroVAX II processors.
Before joining Digital in 1984, he was a consultant for Tenex and Rampower. Previously, he worked as a project manager at Hewlett Packard Laboratories in Palo Alto, California, and as a design engineer at Texas Instruments Central Research Labs. Bill received his S.B. and S.M. degrees in electrical engineering and computer science from M.I.T. in 1978.

Steven E. Boone
Steve Boone graduated from Michigan State University (B.S.E.E., 1974) and the University of Michigan (M.S.E.C.E., 1975). He has also done advanced graduate work at Southern Methodist University. Before joining Digital in 1984, Steve worked as a principal hardware engineer for Sequoia Systems, and as a senior design engineer at Prime and Raytheon. For two years, he was an engineering supervisor working on the TK50 controller design. Steve is currently the technical engineering manager for TK Cartridge Tape Subsystem Engineering.

Daniel W. Dobberpuhl
Dan Dobberpuhl is a senior consulting engineer and manager of the Processor Advanced Development Group. On the MicroVAX II project, he led the implementation of the 78032 CPU chip. Previously, he consulted on CMOS, ZMOS, and TIPI technology development, and worked on the T11 and F11 projects. Dan joined Digital in 1976 from General Electric Company. He received a B.S.E.E. degree from the University of Illinois in 1967. A member of IEEE, he holds four patents and is the coauthor of The Design and Analysis of VLSI Circuits.

Amnon Fisher
Educated at Israel Institute of Technology (B.S.E.E., 1973) and City College of New York (M.S.E.E., 1975), Amnon Fisher worked as both a contributor and project leader on the 32016 CPU at National Semiconductor. Joining Digital in 1983, he was a project leader of the V11/SCORPIO floating point chip (VAX 8200 system), and a contributor to the MicroVAX II 78132 chip.
Amnon is currently an engineering manager in the Semiconductor Engineering Group, working on the design and development of a four-chip set VAX implementation.

Peter C. George
Earning his bachelor's and master's degrees in computer science and engineering from M.I.T. in 1980, Peter George joined the VMS Development Group in that year. He first worked on VMS user interfaces, then on the workstation software as a principal engineer on the VAXstation project. Peter is currently a project leader, working on advanced workstation software projects. Peter is a member of ACM, and the national honor societies Tau Beta Pi, and Eta Kappa Nu.

Anthony F. Hutchings
Tony Hutchings received his B.S. degree from the University of Newcastle On Tyne in 1965. At ICL in the U.K. for 16 years, he designed operating systems and was one of the VME system architects on the 2900 series. He later became corporate manager of CAD. Tony joined Digital in 1982 as the project manager for the proprietary DECSIM software and then became manager of the VLSI CAD Group. Tony, a member of IEEE and the British Computer Society, is currently chairman of the CAD section of the ICCD.

Lawrence J. Kenah
Larry Kenah, a consulting software engineer in the VMS Development Group, wrote the decimal/string emulator for the MicroVAX project. Since joining engineering in 1980, Larry has worked on the VMS nucleus in the areas of memory management, process scheduling, and image activation. He came to Digital in 1975 as an instructor and course developer in Educational Services. Larry received his B.S. degree (1968) from Boston College and his M.S. (1970) and Ph.D. (1977) degrees in high-energy physics from Northwestern University. He is coauthor of VAX/VMS Internals and Data Structures.

Raymond J. Lanza
Ray Lanza is currently the project leader for the ULTRIX-32 system.
After joining Digital in 1983, he ported the ULTRIX system to the MicroVAX I processor. As project leader, he ported the system to the MicroVAX II processor in 1984. Ray received his B.S.E.E./C.E. degree from the University of New Hampshire in 1980, then became the lead engineer in a UNIX group at AT&T. Later he was a senior software engineer at Wang Laboratories, Inc., researching windowing systems and UNIX distributed systems.

Burton M. Leary
In 1980, Mike Leary joined Digital after receiving his B.S. degree in electrical engineering from the University of Massachusetts. In semiconductor engineering, he worked on chip designs and helped to develop the floating point chip for the MicroVAX II system. Mike did behavioral modeling, wrote microcode, and designed the main sequencer for that chip. He is now a senior engineer in the Advanced Development Memory Group, designing the internal cache for an advanced chip project.

Barry A. Maskas
Barry Maskas is a principal engineer currently specifying and designing an integrated circuit, and fiber-optic boards for future systems. As a senior engineer on the MicroVAX II project, he was co-designer of the CPU board and the memory boards. Barry came to Digital in 1979 after receiving his B.S.E.E. degree from Pennsylvania State University. He also holds an associate's degree from the Community College of Allegheny County and did undergraduate work at LSU. Barry is a member of Eta Kappa Nu; he has a patent pending for a self-configurable memory subsystem.

Kathleen D. Morse
As a consulting software engineer, Kathy Morse is responsible for VMS support on all low-end CPUs and peripherals. Earlier, she did the VMS support for both MicroVAX systems, the VAX 11/782 system, and the MA780 multiport memory. Kathy joined Digital in 1976 after receiving her B.S.C.S.
degree from Worcester Polytechnic Institute, where she also earned her M.S.C.S. degree in 1985. Kathy is a member of IEEE, the Professional Council, and ACM, as well as Tau Beta Pi and Upsilon Phi Epsilon. She has published in the Computer Measurement Group's 1985 Conference Proceedings, and Datamation.

Guenter E. Schneider
Guenter Schneider joined the Mass Storage Group in 1970, when it had only about 25 people. He has worked on the designs for the RX05, RL01, RX02, TU58, RX50, and RD50/51 storage devices. As a consulting engineer, he helped to design the TK50 cartridge tape drive. Guenter received a Diplom Ingenieur from the Technische Hochschule Aachen in West Germany and his M.S.M.E. degree from M.I.T. in 1969. He holds two patents, with a third pending, and is a member of the engineering society Verein Deutscher Ingenieure.

Stephen F. Shirron
Educated at Catholic University of America (B.S., 1980 and M.S., 1981), Stephen Shirron came to Digital after graduating Summa Cum Laude. As a senior software engineer, he developed an interpreter for VAX/Smalltalk-80 and designed the VAXstation 100 firmware. Currently a principal software engineer, Stephen designed and implemented the firmware for the RQDX3 disk controller. He is a member of Phi Beta Kappa and has written a chapter in Smalltalk-80: Bits of History, Words of Advice.

Robert J. Simcoe
Bob Simcoe is a technical manager currently working on serial interconnect products. He was the technical manager for the floating point chips in both the MicroVAX II and VAX 8200 systems. Before joining Digital in 1982, Bob worked for the Department of Defense and General Electric Company. His duties involved MOS design, process development, and product design using custom ICs. Bob holds seven patents on IC circuitry and systems. He graduated from the University of Illinois (B.S.E.E.
, 1966).

Rick Spitz
Rick Spitz manages VAX/VMS software development for CPUs and peripherals. As a consulting software engineer, he was a primary member of the architectural design team on the MicroVAX workstation project. Rick designed the VMS graphics hardware interface architecture and, for six years, has specialized in VAX/VMS hardware software interfaces. He joined Digital in 1977 as a senior software specialist and received Digital's Software Excellence Award. Previously, Rick developed microprocessor software for Inco, Inc. He earned a B.S.E.E. degree from Clemson University in 1974 and his M.S.C.E. degree from the University of Lowell in 1983.

Robert M. Supnik
Bob Supnik is a corporate consultant and group manager in semiconductor engineering. On the MicroVAX CPU chip project, he was project leader and lead microprogrammer. Bob was the project manager for the J11, a contributor to the F11, and supervised advanced development on the HSC50 and UDA50. Before joining Digital in 1977, he worked at Applied Data Research. Bob received his S.B. degrees (1967) in math and history from M.I.T. and his M.A. degree (1972) in history from Brandeis University. He received Science Digest's "100 Top Innovators of 1985" award.

Nicholas A. Warchol
In 1977, Nick Warchol joined Digital after receiving his B.S.E.E. degree (cum laude) from the New Jersey Institute of Technology. Later he earned his M.S.E.E. degree from Worcester Polytechnic Institute in 1984. He is a member of Tau Beta Pi and Eta Kappa Nu. Nick has worked on the advanced development of charged couple device memories, bubble memories, and laser video disks. In his present position as a principal engineer, he worked on the design of the RQDX3 disk controller.

William R. Wheeler
After earning his B.S.E.E. degree in 1982 and his M.S.E.E.
degree in 1983 from Cornell University, Bill Wheeler came to Digital as a junior engineer. On the MicroVAX II project, he designed the exponent datapath and control for the 78132 floating point chip. Later he designed the exponent section of the floating point chip in the VAX 8200 system. Bill is currently working on the instruction box and bus interface unit for a new microprocessor chip.

Richard T. Witek
Rich Witek is a consulting engineer working on the architecture and implementation of new microprocessors. He helped to develop and debug the MicroVAX 78032 CPU chip. Rich also worked on implementing DECnet/E and on the DECnet Architecture Review Group during Phases 2 and 3. He also worked in the VLSI CAD group. Before joining Digital in 1977, Rich was a senior technical associate at AT&T Bell Laboratories and an engineering assistant at Argonne National Laboratory. He received his B.A. degree in computer science from Aurora College, and is a member of ACM and IEEE.

Stephen H. Zalewski
Steve Zalewski is a senior software engineer working on the graphics execution routines for the VAXstation II/GPX system. He joined Digital in 1981 after receiving his B.S. degree in computer engineering from Worcester Polytechnic Institute. Steve developed the graphics device driver for the VAXstation I and II systems. His earlier work involved writing RMS file-sharing internals and implementing RMS file sharing and global buffers for VAXcluster software.

Foreword

Jeffrey C. Kalb
Vice President and Group Manager
Large Scale Integration

The roots of the MicroVAX program go back to the summer of 1981. To understand why this program was initiated and the thinking behind it, one has to look at the events of that time. Many developments were taking place, suggesting that a whole new class of systems capabilities could emerge before long.
The VAX-11/780 system was in its heyday. It was recognized as the standard against which all other computers were compared and benchmarked. And true to fashion, everyone seemed to find some way to benchmark his machine in some particular niche against the 11/780's capabilities. That was particularly true of the upcoming generation of microprocessors and microprocessor-based systems. The universities were busily benchmarking Intel Corporation's latest generations of 8086s, 80186s, and the early 80286s on specific jobs. The same was true of the 68000-based system. Many companies were starting to come to market with engineering workstations and similar products based on these microprocessor chips. In fact if one believed the trade press, the VAX-11/780 system had actually been eclipsed in performance and capabilities by these "upstarts."

Needless to say, these events caused some degree of consternation and soul-searching within Digital Equipment Corporation. Moreover, another factor was becoming painfully obvious: the emergence of the independent software vendors. Hordes of small companies were springing up everywhere to generate software for various personal computers that either had already been introduced to the marketplace, like the Apple II, or shortly would be, like the IBM PC. These small vendors wanted to write software for the systems that had the highest market volume. Their reasoning was clear. To sell as many of their software packages as possible required implementing their ideas on the highest volume hardware. It was also clear that the highest volume hardware was going to be microprocessor based and quite inexpensive.

Meanwhile, within Digital, the Semiconductor Engineering Group (SEG) was busy developing a multichip implementation of the VAX architecture.
Built with a midrange, multiuser, high-performance system in mind, this chip set and its attendant system implementations were aimed at the marketplace for systems above $50 thousand. CAD tools were being developed and manufacturing processes developed and refined. The module and system concepts were then in the definition stage.

Discussions began at this time, centered around what was later known as the MicroVAX system. There was a perceived need to counter the rising tide of encroachment on our systems business by microprocessors. We wanted to create systems with volumes high enough to warrant the attention of the independent software vendors. In general, we wanted to establish the VAX architecture as one of the preferred architectures at all potential price levels in the entire industry.

These discussions and strategic thinking converged after receiving an unsolicited proposal from a semiconductor manufacturer. This firm had approached us during that summer, wanting to implement the VAX architecture in one or two high-performance chips. This set of chips could be used in our systems and sold as standalone products. The firm wanted to use the VAX/VMS architecture (and primarily the software associated with it) to get a jump in the marketplace by establishing a high-volume architectural standard at the 32-bit level. We were concerned from the beginning that the capabilities and resources of this smaller firm would not be sufficient to execute such a formidable program. But the notion that building a single-chip VAX implementation and using it to counter-attack the emerging microprocessor-based systems had struck a responsive chord. Until that time, our thinking had been in terms of our traditional price/performance learning curves. Our strategies did not include extraordinarily low-priced VAX systems.
As indicated above, the Semiconductor Engineering Group in Hudson, Massachusetts, was already heavily committed to the multichip VAX system. A number of other major chip projects were in development as well. Therefore, we searched for a larger semiconductor vendor who could bring additional design and manufacturing resources to bear on this concept. Such a vendor could also make available additional distribution channels for sales of high-volume chips to the general marketplace. This line of thinking was pursued with various vendors throughout the fall and winter of 1981, until April 1982.

Interestingly enough, there was less than wholehearted enthusiasm on the part of the various vendors who were approached. Each of them had already decided on an approach to the problem and was unwilling to make the development of the MicroVAX chip a priority item. That commitment was an extremely important issue to us. Experience had shown that complex projects of this nature always exceeded the schedules and the budgets anticipated when they received second-class attention within the merchant semiconductor industry. Thus one criterion for working with a vendor was that he commit to the MicroVAX architecture as a primary market thrust. No one was willing to do that.

At the same time, other issues had to be worked. It was clear that the full VAX architecture as implemented in the multichip set could not easily be put on a single chip. That would have taken over 1 million transistors, a capability that would not be available until the end of the decade. Therefore, early in the project, we recognized that there was a need to subset the architecture to make it implementable on a single chip. By December 1981, the idea of developing a single-chip VAX implementation was beginning to get some positive reinforcement within Digital.
As a result, in that month, Gordon Bell, then vice-president of Engineering, chartered a subcommittee to investigate what should be included in a MicroVAX architecture. The key people involved were Roy Moffa, who had been leading the strategic thinking about a single-chip VAX system; Bob Supnik, representing semiconductor technology; Dick Hustvedt and Dave Cutler, representing software technology; and Bill Strecker, representing VAX architecture technology. After a few intensive meetings, they proposed a subset of the VAX architecture in January 1982. Bob Supnik and the semiconductor technologists thought that this subset could be implemented in a single chip. This new architecture would be modified slightly later in the year, but it is essentially the architecture that exists today. The only significant modification was in the memory management capability, and in some sense, this change actually simplified the development of the chip.

In parallel with these other activities, Bob Supnik and other members of SEG had been studying ways to get the chip developed internally. They were hoping to leverage the existing investments in process technology, chip modeling, CAD tools, and the various other elements that were necessary. Furthermore, and highly significant to the whole program, they developed ways of re-using some of the investments being made in the multichip VAX implementation and other programs already in progress. As a result the floating point chip being developed for a PDP-11 microprocessor was used as the building block for the MicroVAX implementation. Not only that but the chip was also retrofitted back into the existing multichip set to minimize the workload.
Moreover, the datapath was lifted from the instruction/execution unit of the multichip set to form the backbone of the MicroVAX CPU. Tools and techniques were borrowed whenever it was possible.

In this sense the MicroVAX program was unique. There were almost nine months of strategy discussion and evaluations of various ways of implementing and executing before any real design actually started. While many of the proposed business strategies were never adopted, they at least received a hearing. In any case the die was cast.

The real implementation of the MicroVAX chip did not get started until June 1982, the official start date being July 6, 1982. (Some work had been done prior to that for recruiting and staffing.) It was soon evident that there were some key elements that had to be addressed. The first was CAD tools. There was no question that this device had to be simulated extensively at all levels of implementation. There was no other way to get the quality of design and performance levels being planned. At the time the program started, these tools were mostly experimental. Some techniques had been tested, but the reality was that CAD tools "broke" on numerous occasions during the development of the system. Crisis-oriented SWAT teams had to be put in place to bridge over or break through barriers that threatened to bring the entire program to a halt.

There was another equally important element. The entire program was an extremely complicated one, with many elements on parallel paths. Process technology had to be developed, CAD tools developed and refined, chip designs done, systems implementations executed, and test techniques and equipment developed. Each of those elements was intimately entwined with the others.
Therefore the possibility clearly existed that, upon reaching the end of the design, we would be faced with debugging a new process technology, a new manufacturing line, new testers, a new chip design, new packages, and a new system, all simultaneously. A real possibility existed that we couldn't separate the variables in a sufficiently clear and timely manner to allow the chip debugging and system evaluation to take place. This phase could last for months or perhaps even years, something that has happened before on many such programs in the merchant industry.

To avoid that, we segmented the major risks in the program and put plans in place to minimize as many of those as possible in parallel before the new chip arrived. For instance, rather than debugging an entirely new manufacturing line while trying to build this new chip, we combined the existing two wafer fabrication lines into one. The smaller line was then retrofitted to provide a pilot line capability. That gave us a trained staff, a debugged facility, and all the other elements necessary to minimize the interaction of the process and facility. Additionally, a test vehicle was designed so that manufacturing could run wafers, debug process steps, and improve the basic yields of the process well before the new chip arrived. In the test area, test programs were implemented on older, proven testers on which the engineers had experience. That worked even though we knew that, for the eventual production, an entirely new generation of testers would be necessary to precisely test such a complicated device at its full speed. Similarly, other areas, such as packaging, CAD tool development, and parts of the system evaluation, were examined and improved in parallel long before they had to work together.
A major program was put in place to uncouple risks and to hire and train the workforce well in advance of the completion of the MicroVAX chip design. This effort was quite expensive; some people thought that much of the money was being thrown out with the materials that were made experimentally. But the end result was one of the smoothest debugs and introductions into chip manufacturing that I have ever witnessed for a complex device. While there were problems and although things didn't always work right, there were almost always independent ways of separating the variables in the problem. In that way it could be properly analyzed and corrections put in place. This example should serve us well with complex development programs in the future.

One other thing done to enhance the debug and ensure the quality at the system level was to co-locate the CPU module designers with the chip designers. In that way their interaction was enhanced and the rate of problem resolution greatly accelerated. The module team itself was exceptionally small for such a major program, consisting of only three primary engineering people. But this unique program environment featured a high degree of simulation, close proximity of the engineers (the MicroVAX chip team had only 20 people), and heavy reliance on thorough evaluation at every step. The end result was very, very few bugs in either the chip or the system. In fact there were fewer than 20 bugs that had to be corrected before the integrated chip and system were able to boot the operating system. It should be noted that this quality has continued to manifest itself in the rapid manufacturing ramp-up and the quality of the systems that have been generated. There were more engineering changes to the parts and the system to enhance our margin and ease of manufacture than there were to make the system functional in the first place.
That is evidence of a fundamentally different approach to building systems. As noted above, the MicroVAX program is quite unique, from its initial conception to the continuing efforts to enhance quality and productivity. From the initial conception of the strategy, through the organization of the people and problems, to the ongoing engineering activity around quality and ease of manufacture, this program has provided a new paradigm for program execution and management. Our hope is that, with this knowledge, people can emulate the success of this program while eliminating the errors. In so doing, Digital can greatly enhance its ability to build and manufacture high-quality systems in increasingly shorter periods of time.

Daniel W. Dobberpuhl
Robert M. Supnik
Richard T. Witek

The MicroVAX 78032 Chip, A 32-Bit Microprocessor

The MicroVAX 78032 implements the VAX architecture on one chip. To do that, the instruction set was repartitioned to reduce the number of transistors. The instructions used most frequently are in microcode; others, notably floating point, are emulated in macrocode. Hardware was simplified by having a small address translation cache and no memory cache; however, full VAX memory management is supported. A fast 200-nanosecond microcycle allows instructions to execute in parallel. The CPU chip is made using a 3-micron, double-metal NMOS process. The control store ROM has X-shaped cells, which help to reduce its size.

The MicroVAX 78032 chip is the latest extension of the VAX architecture and the first in the form of a single-chip microprocessor. As the CPU of the MicroVAX II computer system, the 78032 performs nearly as fast as the VAX-11/780 superminicomputer, but in a microcomputer package.

Origins and Goals

Digital began the MicroVAX CPU chip project in late 1981 in anticipation of increasing competitive pressures from industry-standard microprocessors. The original intent of the program was to license a semiconductor vendor to design and manufacture a MicroVAX single-chip microprocessor. However, the leading semiconductor companies were unable to meet the high-performance requirements and tight schedules that the project required. In May 1982, an internal development project was chartered to design the MicroVAX CPU chip.

From a designer's viewpoint, the development of this CPU was a challenging exercise in shrinking the VAX computer architecture without changing its function. There were five major goals that governed the design.

1. The kernel architecture was to be implemented on a single chip. Other chips or hardware could be used to improve performance or to provide additional functionality, but the basic VAX functions had to be incorporated in the base CPU design.

2. The chip had to be compatible with all VAX application programs. It had to execute any application program, whatever its size or complexity, written for any computer in the VAX family. And it had to execute without alterations to the program code. That meant that the chip had to run the MicroVMS and ULTRIX-32m (Digital's enhanced UNIX software) operating systems, and the VAXELN real-time kernel.

3. The chip had to perform at or near the speed of the VAX-11/780 processor. This goal implied that the chip had to have a highly parallel internal implementation, a high-performance external interface, and a fast microcycle. Accordingly, the internal microcycle of the chip was set at the same 200 nanoseconds (ns) as the 11/780's microcycle.

4. The price of the chip had to be competitive with commercial 32-bit microprocessors of comparable complexity. This required a relatively conservative die size and an inexpensive package. It also required the implementation of an external interface that was compatible with standard VLSI peripheral chips and demanded minimal support from the hardware on the CPU board.

5. The chip had to be designed and built quickly. To meet or beat competitive products, the chip had to be in production less than 2½ years after the start of development.

With these goals guiding the chip design team, the major problem was quickly identified: to reduce the number of transistors. That, in turn, required repartitioning the VAX instruction set and simplifying hardware functions wherever possible.

Reducing the Number of Transistors

The principal problem in designing the 78032 was how to implement the complexity of the VAX architecture on a single chip. There are 304 instructions in the full instruction set, with 14 data types and 21 addressing modes. Instructions vary in length from 1 byte to 54 bytes.[1] Demand-paged virtual memory support is required to guarantee compatibility with the operating system software. To accommodate this complexity in a full-scale VLSI VAX implementation requires about 1.25 million transistor sites.[2] However, the semiconductor technologies available at the time of design could support only about one-tenth that number in a single-chip microprocessor.[3]

The architectural functions in all VAX systems are partitioned among hardware, microcode, and the operating system. All previous VAX implementations have similar boundaries between these three. The hardware provides the registers and memory, the microcode provides the instruction set, and the operating system provides the program services. A large control store, a minimum of 400 kilobits (Kb), is required to contain the instruction microcode.
The console function is handled in either microcode or a support processor. Moreover, the control logic needed to support memory management and the variable instruction format is quite complex.[4]

Two different approaches were taken to reduce the transistor count in the microprocessor chip. First, the VAX instruction set was repartitioned to cut the size of the control store to 62 Kb. Second, the amount of on-chip hardware was reduced by simplifying some functions, placing others elsewhere, or omitting some altogether.

Repartitioning the Instruction Set

As the first repartitioning step, the design team assumed that all VAX instructions had to be implemented in order to execute all VAX application software. However, there are several classes of instructions that involve a good deal of microcode and yet are infrequently executed. For example, a typical timesharing workload is handled by base instructions, scientifically oriented instructions, and commercially oriented instructions. Analyses of more than 70 million executed instructions showed that the commercially oriented ones represented less than 0.2 percent of the total executed.[5,6] Studies of scientific and engineering workloads showed even lower percentages. Even in commercial applications, the commercially oriented instructions represented less than 4 percent of the total executed, the majority being base instructions. Therefore, emulating the commercially oriented instructions in the operating system rather than using microcode would significantly reduce the size of the control store, but would have little effect on overall performance because these instructions were seldom executed.

On the other hand, floating point instructions require a good deal of microcode and are executed more frequently.
Even with microcode, instruction execution is relatively slow unless a separate floating point accelerator (FPA) is used. Therefore, although existing VAX implementations offered both microcoded (warm) and hardware (hot) floating point, the design team decided not to implement these instructions in microcode. Instead, floating point instructions would be executed in an optional floating point chip, or by emulation using macrocode.

In total, 175 of the 304 VAX instructions and 6 of the 14 data types are implemented in on-chip microcode. Those include integer and logical instructions, variable-bit field, control, queue, procedure calls, character string moves, and operating system support. This microcoded subset comprises over 98 percent of the instructions that are used to execute a typical program. However, the required microcode occupies only one-fifth the control store space of a full VAX implementation. Seventy floating point instructions and three data types (F, D, and G floating) are implemented in the floating point chip, when it is present. If that chip is absent, the instructions are emulated in macrocode. The remaining 59 instructions and 5 data types are always emulated in macrocode. Those are mainly decimal string, character string, and H floating point operations. The CPU chip provides some microcode support for the emulated instructions. Table 1 summarizes the instruction set architecture of the 78032 chip.

The decision to emulate instructions in macrocode has an effect on speed because emulated instructions take three to ten times longer to execute than microcoded instructions. However, the instructions in this group of 59 are normally used so infrequently that the execution speed of a typical program is reduced by no more than four percent.
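The partitioning arithmetic above can be checked with a short sketch. The instruction counts (175, 70, 59 of 304) come from the text; the cost model, the function names, and the sample inputs (0.4 percent of executed instructions emulated, each ten times slower) are illustrative assumptions rather than measurements from the project.

```python
# Counts quoted in the text: 304 VAX instructions split three ways.
PARTITION = {"cpu_microcode": 175, "fpu_chip": 70, "macrocode": 59}
TOTAL_INSTRUCTIONS = 304


def partition_shares(partition, total):
    """Return each group's share of the instruction set, in percent."""
    return {name: 100.0 * count / total for name, count in partition.items()}


def relative_run_time(emulated_fraction, slowdown):
    """Run time relative to a machine that microcodes every instruction.

    Simplifying assumption (not from the paper): each microcoded
    instruction costs one time unit and each emulated instruction
    costs `slowdown` units.
    """
    return (1.0 - emulated_fraction) + emulated_fraction * slowdown


if __name__ == "__main__":
    assert sum(PARTITION.values()) == TOTAL_INSTRUCTIONS
    for name, share in partition_shares(PARTITION, TOTAL_INSTRUCTIONS).items():
        print(f"{name:14s} {share:5.1f}% of the instruction set")
    # Hypothetical mix: 0.4% of executed instructions emulated, 10x slower.
    print(f"relative run time: {relative_run_time(0.004, 10.0):.3f}")
```

Under this simple model the penalty grows linearly with both the emulated fraction and the slowdown factor, which is why a rarely executed group can be moved to macrocode at little overall cost.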
Table 2 illustrates the division of instructions between the CPU chip, the FPU chip, and the macrocode. All in all, the fivefold reduction in the size of the control store halved what would have been the active area of the chip.

Table 1  Instruction Set Architecture

Instructions:

  Implemented in CPU Chip          Implemented in Floating Point Chip    Implemented in Macrocode
  -----------------------------    ----------------------------------    ------------------------
  Integer and Logical        89    F floating     24                     H floating          28
  Address                     8    D floating     23                     Octaword             4
  Variable Bit Field          7    G floating     23                     Character String     9
  Control                    39                                          Decimal String      16
  Procedure Call              3                                          Edit                 1
  Miscellaneous              10                                          CRC                  1
  Queue                       6
  Operating System Support   11
  Character Move              2
  Total                     175    Total          70                     Total               59

Data Types:

  Implemented in CPU Chip          Implemented in Floating Point Chip    Implemented in Macrocode
  -----------------------------    ----------------------------------    ------------------------
  Byte Integer                     F floating                            H floating
  Word Integer                     D floating                            Octaword
  Longword Integer                 G floating                            Leading Separate Numeric String
  Quadword Integer                                                       Trailing Numeric String
  Variable Bit Field                                                     Packed Decimal
  Variable Character String

Table 2  Division of Instructions

                                            Instructions     Instructions            Instructions
                                            Implemented      Implemented in          Implemented
                                            in CPU Chip      Floating Point Chip     in Macrocode
  Percent by Instruction Count                 57.6%            23.0%                   19.4%
  Percent by Microword Count                   20.0%            20.0%                   60.0%
  Percent by Typical Execution Frequency       98.1%             1.7%                    0.2%

Simplifying the Hardware Functions

The principal hardware simplifications in the 78032 are the reduced size of the address translation cache (translation buffer), and the elimination of a memory cache in favor of tightly coupled local memory. As mentioned earlier, demand-paged virtual memory management was required for compatibility with the VAX architecture. Consequently, the design team decided that the 78032 would be the first single-chip CPU with full demand-paged virtual memory support right on the chip. At first the design team proposed to use a simplified version of VAX memory management.
During the course of the design, however, the software engineers reported that not providing full memory management was quite expensive in terms of the use of physical memory. Therefore, the design team implemented full VAX double-mapped compatibility in the chip. As the design progressed, it became evident that the incremental cost of providing this capability was much lower than originally anticipated.

All existing VAX processors implement memory management with a large address translation cache (at least 128 entries), with system and process addresses in separate halves. A translation cache must have a high hit rate to be effective. Since most caches are direct mapped, many entries are required to achieve a high hit rate.[7,8] Implementing a comparable number of translation cache entries in the 78032 was out of the question, due to die size constraints. However, the VLSI technology in the 78032 is very amenable to using a fully associative translation cache with least-recently-used (LRU) replacement. Such a cache needs many fewer entries to achieve the same hit rate as the direct-mapped version. In addition, the tight coupling to local memory, as explained in the next paragraph, made it possible to reduce drastically the amount of time required to process a translation cache miss. Thus the translation cache in the chip has only eight entries, but the cache is fully associative, uses true LRU replacement, and is supported by highly optimized microcode for fast processing of misses. Moreover, simulation studies showed that the best use of the eight entries was with a homogeneous structure. Therefore, the system and process addresses are cached together.

The team also decided to forgo the use of an external memory cache, which required a complex external interface.
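The translation cache policy described above (eight entries, fully associative, true LRU replacement, system and process addresses held together) can be illustrated with a minimal software sketch. This is not the chip's logic: the class and method names are hypothetical, the page table is a plain dictionary standing in for the microcode miss handler, and the 512-byte page size is the VAX architectural value rather than a figure quoted in this paper.

```python
from collections import OrderedDict

PAGE_SIZE = 512  # VAX page size in bytes (architectural value, not from the text)


class TranslationCache:
    """Fully associative translation cache with true LRU replacement."""

    def __init__(self, entries=8):
        self.entries = entries
        self._map = OrderedDict()  # virtual page number -> page frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        """Return the physical address for vaddr, filling the cache on a miss."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self._map:
            self.hits += 1
            self._map.move_to_end(vpn)         # mark as most recently used
        else:
            self.misses += 1                   # stand-in for the microcode miss path
            if len(self._map) >= self.entries:
                self._map.popitem(last=False)  # evict the least recently used entry
            self._map[vpn] = page_table[vpn]
        return self._map[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    page_table = {vpn: vpn + 100 for vpn in range(16)}  # made-up mapping
    cache = TranslationCache()
    for vaddr in (0, 4, 512, 0, 9 * 512):
        cache.translate(vaddr, page_table)
    print(cache.hits, cache.misses)
```

Because any entry can hold any page, a single homogeneous structure like this needs no split between system and process halves, which matches the design choice reported above.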
Use of an internal memory cache had already been ruled out due to die size constraints. Accordingly, the speed of memory access is 400 ns, or two microcycles, which is the speed of local memory. Thus the chip encounters no wait states, and its average time to access memory is approximately the same as the 11/780's. In a typical program, there is little difference between the integer instruction performance of the two CPUs.

Additional simplifications included the elimination of warm (microcoded) floating point in favor of a floating point accelerator, elimination of writable control store capability, and elimination of on-chip console support.

Design Narrative

The starting point for the chip design was the instruction execution chip of a multichip VLSI VAX processor already in design. This chip would provide a general floorplan and a base microarchitecture, and might even provide complete design sections that could be used for the MicroVAX 78032. As the project progressed, the designs of the VLSI VAX processor and the MicroVAX 78032 tended to diverge under the pressure of differing constraints: chip set and system functionality for the former; die size, power, and time to market for the latter. Ultimately, only part of the main datapath was shared between the two; the rest of the MicroVAX 78032 design and its microcode were unique.

The MicroVAX 78032 project took 20 months from start to first-pass mask generation: 6 months for specification and general design, and 14 months for physical implementation. Eighteen people worked on the design team.

Project Design Tools

The design team was aided by a hierarchical CAD tool suite that ran on a VAX system.
The use of these tools was one of the primary reasons that the project was completed on schedule. The principal components of this tool suite are as follows:

1. A proprietary chip-database manager and tool interface called the CHAS system

2. A schematic capture program, QUICKDRAW, that uses simple terminals

3. A proprietary hierarchical simulator called the DECSIM system, used for behavioral simulation

4. A switch-level MOS logic simulator, RSIM, used for unit-delay logic simulation

5. A modified version of the standard SPICE circuit simulator that incorporates new analytical, rather than empirical, MOS transistor models

6. Design-rule checking programs, DRC and DRACULA II

7. An interconnect verification program called the IV system, which performs both layout extraction and wiring verification[9]

8. A cross-reference program, XREF, that analyzes coupling, bootstrap ratios, dynamic node stability, and other circuit problems

The chip layout was done on Calma GDS II systems. Three dedicated VAX-11/780 systems and five Calma stations were used throughout the project. The back-end verification of circuits and the layout required as many as eight VAX systems.

Final Chip Design

The final product of this design process is a microprocessor that contains 125,000 transistor sites in a 3-micron, double-metal NMOS chip that measures 8.7 by 8.6 mm. It requires only 5 Vdc and a maximum of 3 watts of power; it is packaged in a 68-pin, surface-mounted leaded chip carrier. The chip operates at 20 MHz and has full 32-bit internal and external datapaths.

The 78032 is mounted on a single-board, quad-sized (8.5 by 10.5 in.) CPU module having a Q22 I/O bus and 1 megabyte (MB) of local memory.
An optional FPA, the MicroVAX 78132 chip, can also be mounted on the CPU board.

The measured speeds of integer and floating point operations of the 78032 represent a breakthrough in 32-bit microprocessors. System evaluations of MicroVAX 78032 modules indicate that their performance in processing integers is approximately equal to that of the VAX-11/780 system. With the floating point chip, the performance is between those of the VAX-11/750 and VAX-11/780 systems with FPAs.

The remainder of this paper explains the functional organization of the chip and its physical implementation in silicon.

Functional Organization

The diagram in Figure 1 and the photomicrograph in Figure 2 outline the various subsections, or functional boxes, of the MicroVAX 78032 chip. They are organized into three sections. At the left of Figure 2 are the datapaths for decoding and executing instructions and for memory management. At the center is the control logic for internal operations and the protocol signal logic for external operations. At the right is the sequencing logic for both internal and external operations.

The left section in the photomicrograph (Figure 2), comprising the datapaths, consists of the I Box, the E Box, and the M Box.

• The I Box prefetches and decodes instructions. Its main function is to parse the current macroinstruction in the instruction stream and work in conjunction with the microsequencer to generate the microaddress for the next microinstruction. This microaddress is a function of the current macroinstruction. A prefetcher, which works in parallel with other chip operations, accesses and stores instruction data in an eight-byte prefetch queue.
The prefetcher acts autonomously by attempting to keep that queue full at all times, using any free I/O-bus cycles to access the instruction stream. Even if the queue is full, the prefetcher will start to read data if the queue will be at least half-empty after the current microcycle. The I Box also decodes instructions and variable-length operand specifiers in parallel with other chip operations. That avoids requiring explicit decode cycles to execute successive macroinstructions. Due to the constraints on the size of the control store, most of the address-specific microcode had to be shared among all instructions. The instruction-decode PLA (IPLA) generates 19 bits of opcode-specific data for controlling other chip operations related to a given instruction. That allows many microcode sequences to be table driven and shared.

Figure 1  Block Diagram of the CPU Chip
Figure 2  Photomicrograph of the CPU Chip

• The E Box is the instruction execution unit and contains the main datapath of the chip. This box holds 16 VAX-specified general purpose registers (GPRs), 20 microcode registers, a 32-bit arithmetic logic unit (ALU), and a 32-bit barrel shifter. The E Box also maintains condition codes for the processor status longword (PSL) and determines VAX branch conditions at the macrocode level. In a 200-ns cycle, the E Box can read two registers, perform an ALU operation or shift, and write the result into a register. Since reading and writing to registers are performed sequentially, the ALU result bus is multiplexed with an input bus, thus saving vertical interconnect. The ALU employs a 4-bit lookahead carry scheme, with ripple carries across the nibbles.
The carry chain uses dual-rail logic for maximum speed. The barrel shifter is a pass-transistor network, which is very compact and fast enough for this task.

• The M Box serves as the memory management unit and translates virtual addresses to physical addresses. The address translation cache, which is fully associative, stores the most recently referenced address translations. The M Box maintains three virtual address registers, one for instruction data and two for program data. This unit also detects cross-page accesses and includes a separate comparator for length checking. A dedicated adder generates the next virtual address for sequential data and instruction addresses. The time to perform an address translation is less than 25 ns when the virtual address is in the translation cache. This short translation time allows memory management to be transparent to the external chip timing.

The center section of the photomicrograph is composed mostly of random control logic. That logic translates the highly vertical (39-bit) microcode into the many discrete control signals required to operate the datapath.

The right section of the photomicrograph, comprising the sequencing and clocking logic, consists of the interrupt logic, the control store, the DAL interface, and the clock generator.

• The interrupt logic accepts, synchronizes, and prioritizes external interrupt requests, compares them with the current interrupt priority level (IPL), and determines if the request will be serviced. The interrupt requests are checked at the beginning of each microcycle and the interrupt update is forwarded to the I Box. That all happens through the central control logic before the next microcycle begins. External interrupt processing has been implemented on-chip in the 78032 to avoid the complexity that results from having the interrupt priorities arbitrated outside the chip. Since these priorities are an integral part of the processor state, an off-chip design would involve broadcasting the interrupt priority level each time it changed. Moreover, off-chip interrupt processing would also require additional hardware on the CPU board.

• The microsequencer accepts inputs from various points on the chip and generates the next microaddress to access the control store. The microsequencer logic performs such operations as microsubroutine calls and returns, microcode traps, n-way (or case) branches, and signed offset conditional branches. Implemented in the microsequencer is an eight-level microprogram stack.

• The control store is a 39-bit ROM with 1600 entries. It receives microaddresses and status signals and generates the next set of microinstructions. The control store transfers those microinstructions to the control section in the center area. That section, in turn, generates control signals for the three principal functions in the main datapath: the I Box, the E Box, and the M Box. The access time of the control store is less than 100 ns.

• The DAL interface handles all control signals and transfers data and addresses between the chip and local memory, peripherals, and other devices outside the chip. The DAL interface transparently processes variable-length operands and aligns data references that cross natural 32-bit memory boundaries. It also causes the microprocessor to stall during I/O references, so that additional microcode is not needed to test for I/O completion. The DAL interface controls transactions involving the CPU chip, the FPU chip, and external devices. It also arbitrates direct memory access (DMA) requests.
• The clock generator receives an external 40 MHz clock reference and produces the eight 25-ns clock phases that time functions on the chip. The control logic of the chip makes extensive use of bootstrapped drivers. For that reason, certain clock phases have to drive very high capacitances, as much as 250 picofarads. To assist in that task, a special driver circuit with current-limiting resistors is used to provide fast edges without using excessive power or silicon area. These resistors control the overlap current drawn during bootstrapping and provide a voltage drop during the overlap.

External Interface

A principal goal in designing the chip's external interface (Figure 3) was to demand as few support functions as possible from the CPU board. The 78032 chip provides seven hardware interrupt inputs. Four of these inputs (IRQ<3:0> L) correspond to standard VAX I/O interrupts and result in vectored interrupt transactions. Three others (INTTIM L, PWRFL L, HALT L) have preassigned interpretations and the corresponding vectors are generated inside the chip. The 78032 takes in a double-frequency clock input from a standard oscillator. The chip produces a normal-frequency clock output, which can be used to drive or synchronize external logic. The functions between the chip and the Q-bus can be implemented in off-the-shelf discrete logic.
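The clock figures quoted in this paper fit together arithmetically, as the following sketch shows. The 40 MHz input, the eight phases, and the two-microcycle memory access are taken from the text; the function name is hypothetical, and treating each phase as one period of the input clock is an assumption that happens to reproduce the stated 25 ns.

```python
def clock_timing(input_mhz=40.0, phases_per_microcycle=8):
    """Derive the phase and microcycle times (in ns) from the external clock.

    Assumption: each clock phase lasts one period of the external reference.
    """
    phase_ns = 1000.0 / input_mhz                    # period of the input clock
    microcycle_ns = phase_ns * phases_per_microcycle
    return phase_ns, microcycle_ns


phase_ns, microcycle_ns = clock_timing()
memory_access_ns = 2 * microcycle_ns   # local memory access takes two microcycles
output_mhz = 40.0 / 2                  # normal-frequency output, taken as half the input
print(phase_ns, microcycle_ns, memory_access_ns, output_mhz)
```

The result agrees with the 200-ns microcycle, the 400-ns memory access, and the 20 MHz operating frequency given elsewhere in the paper.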
Figure 3  External Interface (block diagram: MicroVAX 78032 central processing unit, MicroVAX 78132 floating point unit, address latch, and data transceivers)

Except for the 32-bit DAL bus, the external interface closely resembles those for existing 16-bit microprocessors. Specifically, its timing and signal complement are quite similar to those in current machines. The addresses and data on the DAL are time-division multiplexed, with separate timing strobes (AS and DS, respectively, in Figure 3). The data direction and the data buffer signals (WR and DBE in Figure 3) are used to control external transceivers directly. The cycle status signals differentiate among the various types of bus transactions. Four byte-mask signals, one for each group of eight bits on the DAL bus, allow straightforward manipulation of bytes within longwords (four bytes). The RDY signal allows slower peripheral devices on the I/O bus to stretch the memory access time beyond 400 ns until they are ready to respond.

Parallel Operation

Besides giving the 78032 optimized microcode and a fast microcycle time, the design team enhanced the chip's performance by allowing parallel operations between and within functional subsections. This parallel flow is actually a form of pipelining in which the operations happen independently and concurrently. For example, while the E Box is executing a datapath operation, the control store can access the next microinstruction. At the same time, the microsequencer can be calculating the address of the microinstruction after that one, and the M Box can be translating a virtual address. Meanwhile, the I Box can be decoding an instruction or operand specifier and prefetching more instruction data. And the DAL interface can be initiating or completing an external bus operation.

For example, assume that the chip is to execute the following two three-microcycle macroinstructions in sequence:

ADDL3 R0, R1, R2
SUBL3 R4, R5, R6

Within the third 200-ns microcycle, some operations associated with these two macroinstructions are performed in parallel by several subsections. The E Box will write the result of ADDL3 into R2 in the register file, set the PSL condition codes, and check for arithmetic exceptions, such as an overflow trap. Meanwhile, the I Box will decode the next macroinstruction, SUBL3, and its first specifier, R4. Concurrently, the prefetcher in the I Box will determine if the decode of the instruction and specifier will clear enough space in the prefetch stack to warrant another longword transfer. If so, the I Box will then initiate the transfer and fetch another macroinstruction, which also involves the DAL interface.

Within each subsection, there are also a number of parallel operations that reduce the overall execution time significantly. In addition to simultaneous prefetch and decode actions in the I Box (as described above), the microcode access in the control store is pipelined: the next microaddress is accessed while the current microinstruction at the current microaddress is being executed. In the M Box, length checks against referenced addresses take place simultaneously with the translation cache lookups.
If a lookup misses, therefore, the length check will have already determined whether or not the referenced page is within range. In the E Box, a separate program counter (PC) adder maintains the PC so that the ALU can be dedicated to its primary task.

Some typical execution times for instructions under normal operating conditions (aligned operands, no memory management exceptions) are as follows:

    Instruction          Operands     Typical Execution Time (Nanoseconds)
    MOVL                 Reg, Reg      400
    ADDL2                Reg, Reg      400
    MOVL                 Mem, Reg      800
    ADDL2                Mem, Reg      800
    MOVL                 Reg, Mem      600
    ADDL2                Reg, Mem     1200
    Conditional Branch   not taken     200
    Conditional Branch   taken         800

Physical Implementation

The MicroVAX 78032 chip is made using a 3-micron, double-metal NMOS process that allows power savings and superior circuit flexibility. Until the MicroVAX 78032 chip design, single metal was a standard for NMOS technology. The use of a second layer on the 78032 chip was a significant departure for NMOS design.

There are two main advantages of a double-metal implementation. First, it is easier to place logic circuits in the interconnect layer, where there are more circuits per unit area of silicon. Second, the metal interconnect has lower resistance than polysilicon, thus avoiding wire delays that are difficult to eliminate in design.

The double-metal process provided the chip design team with two layers of aluminum interconnect and four types of devices (N, E, L, and D). The four types allow some savings in power and a substantial increase in circuit flexibility. However, the E device (light enhancement) is typically used only in source-follower circuits, and the L device (light depletion) only in latches and static memories.

The second layer of aluminum interconnect manages the complexity associated with 32-bit microprocessors.
That permits global communications and allows local control or routing to share the same chip area. However, second metal can only contact first metal, and then only through an offset, or staggered, contact.

Figure 4   X-shaped Cells

The control store is a 1600-entry by 39-bit ROM. Although its size was decreased mostly through repartitioning and optimized microcode, about ten percent of the reduction was gained through the cell structure chosen. X-shaped cells with a virtual-ground design were used (Figure 4). This ROM has no physical ground, whereas standard ROMs with H-shaped cells have one ground line for every two data lines. The X-shaped cell, which is 95 microns square, is also more dense than the standard cell. Moreover, in the X-shaped cells, second metal is strapped across the top of the array to minimize the row propagation time. The cell access time is 100 ns.

The ROM bit lines are precharged to Vcc using depletion pullups. Sensing is done with a cross-coupled stage using local depletion divider voltage references set at 0.6 x VDD. Column access occurs in 25 ns.

The control circuits (at the center in Figure 1) are implemented in dynamic logic so that the total power dissipation is kept below three watts. That also allows a low-cost packaging design. The eight clock phases provide refresh timing references to the dynamic logic.

Due to tight silicon constraints, the test features built into the design had to be limited in scope. The principal ones used are as follows:

• Serial shift registers with feedback for observing the control store, IPLA, and microsequencer outputs

• Special test mode for overriding normal sequencing with external microaddresses

• Dedicated microcode for optimizing state observations in the special test mode

Summary

The MicroVAX 78032 represents a major breakthrough both in semiconductor technology and in the VAX family. From a technology perspective, it is the first implementation of a successful 32-bit superminicomputer on a single chip. It is the first chip to provide integral demand-paged virtual memory management. And it is the first chip to provide system performance comparable to the 11/780. From a VAX perspective, the 78032 is the key to the downward extension of the industry-standard VAX family into the realm of small systems and workstations.

Acknowledgements

The authors acknowledge the technical contributions of John Beck, Sandy Carroll, Gerry Cheney, Mary Jo Doherty, John Glynn, Jim Gorr, Bob Grondalski, Dave Grondalski, Pat Hart, Ernie Hohengasser, Taan Lee, Steve Morris, Tony Pasquito, Steve Thierauf, Tim Thrush, Janet Vitello, and Barry Worster.

References

1. VAX Architecture Handbook (Maynard: Digital Equipment Corporation, Order No. EB-19580, 1981).

2. W.N. Johnson, "A VLSI Superminicomputer CPU," IEEE International Solid-State Circuits Conference Digest of Technical Papers (1984): 174-175.

3. J. Slager et al., "A 16-bit Microprocessor with On-chip Memory Protection," International Solid-State Circuits Conference Digest of Technical Papers (1983): 24-25.

4. H.M. Levy and R.H. Eckhouse, Computer Programming and Architecture: The VAX-11 (Bedford: Digital Press, 1980).

5. D.W. Clark and J.S. Emer, "Measurement and Analysis of Instruction Use in the VAX-11/780," IEEE Proceedings of the 9th Annual Symposium on Computer Architecture (1982): 9-17.

6. J.S. Emer and D.W. Clark, "A Characterization of Processor Performance in the VAX-11/780," IEEE Proceedings of the 11th Annual Symposium on Computer Architecture (1984): 301-310.

7. W.D. Strecker, "Transient Behavior of Cache Memories," ACM Transactions on Computer Systems, vol. 1, no. 4 (November 1983): 281-293.

8. D.W. Clark, "Cache Performance on the VAX-11/780," ACM Transactions on Computer Systems, vol. 1, no. 1 (February 1983): 24-37.

9. G.M. Tarolli and W.J. Herman, "Hierarchical Circuit Extraction with Detailed Parasitic Capacitances," ACM IEEE 20th Design Automation Conference Proceedings (1983): 337-345.

William R. Bidermann
Amnon Fisher
Burton M. Leary
Robert J. Simcoe
William R. Wheeler

The MicroVAX 78132 Floating Point Chip

A separate chip, the 78132, in the MicroVAX II system performs fast floating point calculations. Three datapaths, each controlled by microcode, work in parallel to yield a 100-nanosecond microcycle. The wide datapaths accommodate a large variety of instructions, using microwords of only 35 bits for control. The 78132 is a 3-micron NMOS chip connecting to the CPU chip of the MicroVAX II system via a general purpose protocol and a limited set of lines. Crosstalk and resistivity posed particular design problems, as did the routing of signals and power. The 78132's electrical integrity was carefully checked to ensure high reliability.

Scientific and engineering applications require strong floating point support from their computers. All VAX implementations offer both microcoded (warm) and hardware (hot) capabilities to execute the 95 floating point instructions in the full VAX instruction set. The MicroVAX II processor also supports floating point instructions, but in a slightly different fashion.
Since the control store in the microprocessor, the CPU chip, has a limited size, these instructions are not executed in microcode; instead they are emulated in macrocode.[1,2] Emulation is relatively slow and does not provide the fast speeds required for intensive mathematical applications. Therefore, a separate floating point accelerator (FPA), the MicroVAX 78132 chip, has been developed as a companion to the CPU chip, the MicroVAX 78032 chip.

The 78132, or FPU chip, is designed to provide fast floating point calculations on a single chip. It executes 61 of the 70 floating point instructions in the MicroVAX instruction set. Nine of the 70 instructions simply move data, and the CPU chip does not need the FPU chip to handle them. The FPU chip also accelerates calculations for 9 integer instructions, which are associated with integer multiplies and divides. The FPU chip executes instructions about 100 times faster than macrocoded emulation.

The FPU chip (Figure 1) contains 32,141 transistors in a 3-micron, double-metal NMOS chip, which requires just under 2 watts of power at 5 Vdc. It measures 8.4 by 6.6 mm and is packaged in a 68-pin leaded chip carrier. The chip has a 100-nanosecond (ns) microcycle, divided into four 25-ns clock phases generated from a 40-MHz input clock. The CPU chip, which also operates on a 40-MHz input clock, has a microcycle of 200 ns. The faster microcycle and wide datapaths enable the FPU chip to perform floating point operations much faster than the CPU chip with its general datapath.

This paper discusses the implementation of floating point in the MicroVAX II's FPU chip and the unique constraints of a single-chip floating point accelerator.
These constraints are not limited only to architecture but include interface design, wiring, and signal integrity, all areas where design trade-offs are important.

At the highest level, the FPU chip implements the F, D, and G floating point instructions in the VAX instruction set. The chip is constrained by the requirements of the VAX architecture (data formats, accuracy requirements, and instruction vagaries) and by the characteristics of the technology (limited number of pins, limited die size, and limited interconnect). These constraints dictated many of the design considerations in the FPU chip.

Figure 1   Photomicrograph of the FPU Chip

FPU Chip Architecture

The main elements of the FPU chip, shown in the block diagram in Figure 2, are similar to those in most floating point devices. Three separate processors (a 67-bit fraction processor, a 13-bit exponent processor, and a single-bit sign processor) operate in parallel. The bus interface unit handles data transfers over the external bus to the CPU chip and data movement into and out of the three datapaths. The microsequencer controls the parallel operations of the processors.

Each element in the FPU chip operates in parallel to speed up instruction processing. The microsequencer steps through the microcode for an instruction and determines which operation is to be performed by each processor for the current cycle. The microsequencer also takes inputs from each of the processors to determine which microword is to be executed next.

The datapath of the fraction processor performs all the arithmetic computations on the mantissa of a floating point number.
This datapath is designed to be flexible enough to handle the many different operations required in a general-purpose FPA. The datapath is also segmented to handle the F, D, and G data types, and is optimized to provide the maximum possible performance from the N-channel MOS technology.

The datapath of the exponent processor handles only the exponent portion of a floating point number. The exponent datapath is also used as a counter during certain operations such as multiply and divide. This datapath does all the exception and bounds checking for operations like addition and subtraction. The sign processor is incorporated into the exponent datapath and handles all operations pertaining to the sign bit. During an addition or subtraction, the sign bit determines which case is performed by checking the signs of the two operands and the opcode of the instruction.

Figure 2   Block Diagram of the FPU Chip

The bus interface unit (BIU) is responsible for handling all the FPU portions of the bus traffic between the FPU and CPU chips. The BIU decodes the opcode sent to the FPU chip and tells the microsequencer which instruction to execute. That allows the FPU and CPU chips to coordinate their actions without a lot of protocol or pins. Since many different data types are processed, the BIU is responsible for unpacking the operands and steering them to the appropriate datapath. Once the instruction is completed, the BIU takes the unpacked result from each datapath and formats the result into the specified data type. Figure 3 contains a more detailed block diagram for the entire floating point unit.

Algorithms

To keep the FPU chip at a size that could be produced, we decided not to use special-purpose hardware to implement instructions like addition or multiplication.
Instead, the datapaths are designed to be general-purpose ones to accommodate the needs of a wide variety of instructions.

Figure 3   Block Diagram of the FPU Processor

Addition and Subtraction

The datapaths are under microcode control and work in parallel. Within each, the steps required for either addition or subtraction are done serially. First, the exponents of the two operands are compared to see if they are of equal magnitude. If not, the larger exponent is stored in a register, and the exponent difference is used to control the alignment. The shifter on the output of the fraction arithmetic logic unit (ALU shifter) allows the fraction with the smaller exponent to be aligned up to five bits at a time. During each alignment step, the exponent difference is reduced by up to a magnitude of five until the exponents are equal. Once equal, the fractions are added. (In subtraction, the fraction to be aligned is complemented before alignment.)

The resulting fraction is then normalized. The normalize shift is accomplished by a single left shift in the fraction ALU and two left shifts in the ALU shifter. If the addition of the fractions results in an overflow into the top guard bit, a single right shift in the ALU shifter is required to normalize the result. During normalization, a 3-bit code is sent to the exponent datapath, which determines the amount the exponent must be adjusted.

After normalization, the fraction is rounded using a rounding constant appropriate for the data type of the floating point operation being performed. If the round results in an overflow in the fraction datapath, the exponent is incremented by one and the fraction is normalized.
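The alignment step described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the chip's microcode: fractions are plain non-negative integers, guard bits, complementing for subtraction, normalization, and rounding are all left out, and the names `align_and_add` and `max_shift` are hypothetical.

```python
def align_and_add(frac_a, exp_a, frac_b, exp_b, max_shift=5):
    """Align the fraction with the smaller exponent, then add.

    Illustrative only: the shifter moves at most `max_shift` bits per
    step, so a large exponent difference takes several steps.
    Returns (fraction_sum, result_exponent, alignment_steps).
    """
    # Keep the operand with the larger exponent in (frac_a, exp_a).
    if exp_a < exp_b:
        frac_a, exp_a, frac_b, exp_b = frac_b, exp_b, frac_a, exp_a

    difference = exp_a - exp_b
    steps = 0
    while difference > 0:
        shift = min(difference, max_shift)  # up to five bits per step
        frac_b >>= shift                    # bits shifted out are dropped here
        difference -= shift
        steps += 1

    return frac_a + frac_b, exp_a, steps


if __name__ == "__main__":
    print(align_and_add(0b10000000, 10, 0b10000000, 3))
```

With an exponent difference of 7, the sketch needs two alignment steps (a shift of five, then a shift of two) before the add.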
The exponent datapath then checks the resulting exponent for any error conditions. If no errors are found, the final fraction and exponent values are loaded into the output register and the sequencer signals the BIU that the operation is complete.

Multiply

The multiply operation in the FPU chip is based on a 3-bit retirement algorithm. The 3-bit retirement, or octal multiply, must generate the required multiple, 0-7, of the multiplicand to be added into the partial product for each step. The multiples must be generated by simply shifting the multiplicand and adding or subtracting them from the partial product. The multiples 0, 2, 4, and 8 are easy to generate in this way. The multiple 6 can be formed by taking three-quarters of the multiplicand and storing that in a register at the beginning of the multiply (3/4 x 8 = 6). As shown in Table 1, all the even multiples can be generated. To generate all the odd multiples, a -1 multiple is added to achieve the final exact multiple for each retired group of three bits.

Table 1   Multiply Operation - Booth Encodings

    Multiplier  Required  Data      Multiple  Multiple  Multiple
    Group       Multiple  Used      Shift     Added     Owed
    000         0         0         0         0          0
    001         1         mult      1         2         -1
    010         2         mult      1         2          0
    011         3         mult      2         4         -1
    100         4         mult      2         4          0
    101         5         3/4 mult  3         6         -1
    110         6         3/4 mult  3         6          0
    111         7         mult      3         8         -1

The key to making this scheme work is that this -1 multiple must be generated from the previous group of three bits. To that group, the -1 multiple for the next group is equivalent to a -8 multiple. To know whether or not the next group will need the -1 multiple, it is sufficient to examine the least significant bit (lsb) of the next group of bits. If the lsb is a 1, then the group will be odd and will need the -1 multiple. This process is started by examining the lsb of the multiplier and initializing the partial product register to either zero or minus the multiplicand. If the lsb is a 0, the -1 multiple will not be needed. The operation always terminates in the case not requiring compensation because the numbers are all normalized. Table 1 shows the Booth encodings for each multiplier group.

These Booth encodings translate into the fraction datapath controls depicted in Table 2. A multiplication in the FPU chip is begun by loading the multiplier into the Q Register (quotient register) and loading the multiplicand into register 0 in the scratch RAM. Three-quarters of the multiplicand is then calculated during two ALU cycles and is stored in register 1 of the scratch RAM. Subsequently, the A Register is initialized to store the partial products.

During each cycle of the multiply loop, the four least significant bits of the Q Register are latched to control each multiply step. Based on these four bits, the multiply control loads either the multiplicand or three-quarters of the multiplicand from the scratch RAM into the B Register. The control then adds or subtracts the B Register from the A Register. The resulting new partial product is shifted right by the ALU shifter and relatched in the A Register. The Q Register is then shifted three bits to the right to retire the current set of multiplier bits and to set up for the next iteration.

The exponent datapath is used to control the number of iterations that should occur for each multiply operation and to calculate the resulting exponent. The number of iterations that take place for a multiply depends on the length of the mantissa.
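The 3-bit retirement scheme can be checked with a small Python sketch. It is an arithmetic model under stated assumptions rather than a description of the hardware: operands are plain non-negative integers, each step's contribution is weighted by a left shift instead of shifting the partial product right as the chip does, and the names `octal_multiply`, `EVEN_MULTIPLE`, and `groups` are hypothetical.

```python
# Even multiple chosen for each 3-bit group value (odd groups round up;
# the "owed" -1 is paid by the previous step as a -8 multiple).
EVEN_MULTIPLE = [0, 2, 2, 4, 4, 6, 6, 8]


def octal_multiply(multiplicand, multiplier, groups):
    """Multiply by retiring three multiplier bits per step.

    `groups` is the number of 3-bit groups to retire; the multiplier
    must fit in 3 * groups bits so the last look-ahead bit is 0.
    """
    if not 0 <= multiplier < (1 << (3 * groups)):
        raise ValueError("multiplier does not fit in the given groups")

    # Start at -multiplicand if the first group is odd, else at zero.
    partial = -multiplicand if multiplier & 1 else 0

    for step in range(groups):
        group = (multiplier >> (3 * step)) & 0b111
        # Look ahead at the lsb of the next group: if it is 1, that
        # group owes a -1 multiple, which is a -8 multiple at this step.
        look_ahead = (multiplier >> (3 * (step + 1))) & 1
        multiple = EVEN_MULTIPLE[group] - 8 * look_ahead
        partial += (multiple * multiplicand) << (3 * step)

    return partial


if __name__ == "__main__":
    print(octal_multiply(0xABCDEF, 0x123456, groups=8))
```

Every multiple used is one of 0, ±2, ±4, ±6, or ±8, matching the set that the text says can be formed from the multiplicand or three-quarters of it by shifting. With `groups=8` the sketch retires 24 multiplier bits in eight passes.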
For example, an F format number with a 23-bit mantissa requires eight iterations.

Table 2   Multiply Operation - Fraction Datapath Controls

    Next Group  Present         Group     Look Ahead  Actual Multiple
    Look Ahead  Group    Group  Multiple  Multiple    Generated        ALU Operation
    0           0        000    0          0           0               A <- A
    0           1        001    2          0           2               A <- A+B; B=R0
    0           2        010    2          0           2               A <- A+B; B=R0
    0           3        011    4          0           4               A <- A+B; B=R0
    0           4        100    4          0           4               A <- A+B; B=R0
    0           5        101    6          0           6               A <- A+B; B=R1
    0           6        110    6          0           6               A <- A+B; B=R1
    0           7        111    8          0           8               A <- A+B; B=R0
    1           0        000    0         -8          -8               A <- A-B; B=R0
    1           1        001    2         -8          -6               A <- A-B; B=R1
    1           2        010    2         -8          -6               A <- A-B; B=R1
    1           3        011    4         -8          -4               A <- A-B; B=R0
    1           4        100    4         -8          -4               A <- A-B; B=R0
    1           5        101    6         -8          -2               A <- A-B; B=R0
    1           6        110    6         -8          -2               A <- A-B; B=R0
    1           7        111    8         -8           0               A <- A

    where: R0 contains the multiplicand
           R1 contains 3/4 multiplicand

Division

The floating point unit performs a 1.5-bit, non-restoring division. This algorithm is similar to a 1-bit, non-restoring division, but takes advantage of the fact that long strings of zeros or ones in the partial remainder can be skipped over without doing an addition or subtraction. The FPU chip handles double precision through its normal datapath.

Within the FPU chip, the partial remainders will always be < +1/2 and > -1/2 because both floating point numbers are normalized. If the partial remainder is small relative to the normalized divisor, a 1 will not be shifted into the quotient over the next few cycles. (The opposite is true if an addition is performed.) Knowing this fact and whether the previous operation was an addition, subtraction, or a shift will determine how the quotient bits are developed. If the previous operation was a shift, the process is in the middle of a long string of zeros or ones and no addition or subtraction has to be performed. If the partial remainder is not small relative to the normalized divisor, the quotient bits are developed as they would be in a 1-bit division algorithm. Table 3 summarizes the 1.5-bit, non-restoring division.

The implementation of this algorithm in the FPU chip is straightforward. To start, the divisor is loaded into the B Register and the dividend into the A Register. The Q Register is initialized to 0 and will become the location where the quotient is developed. During each step of the division, quotient bits are inserted at the least significant end of the Q Register. The register contents are then shifted left either 1 or 2 places as required to develop the new quotient for that step. If necessary, the divisor is added to or subtracted from the partial remainder. The result is then shifted left by the appropriate number of places. When bit 65 in the Q Register becomes a 1, the division is stopped. Since these numbers are normalized, the result will fall in the range of greater than 1/2 but less than 2. The contents of the Q Register, already normalized, are then read back into the A Register. However, if the initial subtraction resulted in a positive partial remainder, then one must be added to the exponent to account for the fact that the result has a whole part (i.e., >= 1).
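As a point of comparison for the 1.5-bit scheme, the plain 1-bit, non-restoring division it builds on can be sketched in Python. This is the textbook unsigned-integer form, offered as a baseline under that assumption; it does not model the skipping of strings of zeros or ones, the 67-bit datapath, or the register usage described above, and the names `nonrestoring_divide` and `bits` are hypothetical.

```python
def nonrestoring_divide(dividend, divisor, bits):
    """Textbook 1-bit non-restoring division on unsigned integers.

    `bits` is the width of the dividend. Returns (quotient, remainder).
    """
    if divisor <= 0 or not 0 <= dividend < (1 << bits):
        raise ValueError("operands out of range")

    remainder = 0   # partial remainder; allowed to go negative
    quotient = 0
    for position in reversed(range(bits)):
        next_bit = (dividend >> position) & 1
        if remainder >= 0:
            # Previous partial remainder was non-negative: subtract.
            remainder = 2 * remainder + next_bit - divisor
        else:
            # Previous partial remainder was negative: add, no restore.
            remainder = 2 * remainder + next_bit + divisor
        # The quotient bit is 1 when the new partial remainder is >= 0.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)

    if remainder < 0:
        remainder += divisor  # single correction at the very end
    return quotient, remainder


if __name__ == "__main__":
    print(nonrestoring_divide(1000, 7, bits=10))
```

The point of the non-restoring form is that a negative partial remainder is never repaired immediately; the next step compensates by adding instead of subtracting.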
Table 3   1.5-Bit Division Operation
[Table body not recoverable. Columns: most significant bits of the partial remainder (bits 66-63); value of bits 66-63; ALU operation; shift left; Add/Sub Quotient; Shift Quotient.]
Add/Sub Quotient: bits shifted into the quotient if the previous operation was an addition or subtraction.
Shift Quotient: bits shifted into the quotient if the previous operation was a pure shift (no ALU operation).

Integer Division

The FPU chip also performs a 1-bit, non-restoring divide algorithm, which is used to accelerate the execution of the DIVL and EDIV instructions. In all cases, the integer divide is accomplished with a 32-bit divisor and a 64-bit dividend.

Polynomial Calculations

The polynomial evaluation algorithm, POLY, uses Horner's Method to calculate all trigonometric functions. Because execution time can be so long, POLY is the only VAX floating point instruction that can be interrupted by the CPU chip.

The algorithm performs one ax + b operation during each cycle. In each operation, x is treated as a constant, the value of b is provided by the CPU chip, and the value of ax + b in the current cycle becomes a in the next cycle. The FPU chip first multiplies a by x with the MUL algorithm and then adds b with the ADD algorithm. The main sequencer tells the I/O controller that the first POLY cycle has been completed and that the result is ready in the I/O registers for transfer to the CPU chip. The sequencer executes the second MUL, (ax + b)x, during the time that the CPU chip is reading the first result, storing it in a register, and transferring the next value of b to the FPU chip.
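The ax + b recurrence described here is Horner's method, which can be sketched in a few lines of Python. The sketch covers only the arithmetic; it assumes ordinary Python floats rather than VAX formats, it does not model the overlap between the FPU multiply and the CPU operand fetch, and the name `poly_horner` and the highest-degree-first coefficient order are choices made for this example.

```python
def poly_horner(x, coefficients):
    """Evaluate a polynomial with one multiply-add per coefficient.

    `coefficients` is ordered from the highest-degree term down to the
    constant term. The first coefficient plays the role of the initial
    `a`; each later one is the `b` supplied for that cycle.
    """
    result = coefficients[0]
    for b in coefficients[1:]:
        # One cycle: multiply the running value by x, then add b.
        # The sum becomes `a` for the next cycle.
        result = result * x + b
    return result


if __name__ == "__main__":
    # x**2 + 2*x + 3 evaluated at x = 2
    print(poly_horner(2, [1, 2, 3]))
```

A degree-n polynomial therefore costs n multiplies and n adds, which is why each cycle maps onto one MUL followed by one ADD.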
The second ADD operation, (ax + b)x + b (using the new value of b), then takes place to complete the second cycle, and the process continues. The CPU chip's register is updated with the new result at the end of each cycle. This pipelining allows fast generation of trigonometric and transcendental functions. Both the CPU and FPU chips are working to implement the instruction, and the actual multiply time is overlapped by the operand fetch time.

The Microsequencer

The microcode for the FPU chip is contained in a large programmable logic array (PLA), which is the heart of the microsequencer. Inputs to the PLA are received from all major sections of the FPU chip. A microword of 35 bits is all that is needed to control the two main datapaths (the sign processor is part of the exponent datapath) and to communicate with the bus interface unit. Each field in the microword is encoded to reduce the number of wires routed to the other sections. Two hundred microwords are required to implement the sixty-one floating point and nine accelerated integer instructions executed by the FPU chip. The block diagram for the microsequencer is shown in Figure 4.

Figure 4   Block Diagram of the Microsequencer

Inputs to the PLA consist of five next-address bits, three dedicated inputs, and forty signals from the three major processors on the chip. Three bits from the next-address field are used to select five of the forty signals for the next FPU cycle. These five multiplexed inputs, in conjunction with the eight direct inputs, are used to address the next microword. The thirty-five outputs, or signals, from the PLA are used to communicate with the rest of the floating point unit. These signals determine which operation is to be performed by each of the three datapaths (exponent, fraction, and sign processor).

Interface Between Chips

Interface Lines

The communication between the CPU and FPU chips is done through a very limited set of lines: a write (W) strobe, three cycle status (CS) lines, an external processor strobe (EPS), and the 32-bit data and address lines (DAL). (This approach was used to reduce the pincount on both chips.) In the MicroVAX II processor, the chip protocol is designed as a general-purpose one so that other coprocessors could take the place of the FPU chip. Each interface line has a specific purpose, as explained below.

• The W strobe sends a signal from the CPU chip to indicate the direction of data flow over the DAL. For the FPU chip, the write signal indicates that data is being transferred from the CPU chip.

• The EPS is used by the CPU chip to qualify all communication between itself and the FPU chip or other non-memory device.

• The three CS lines provide status about the current bus cycle. Two of the lines indicate the type of information being transferred; they are "valid" when the external processor strobe is asserted. The third line is an open drain output (functionally similar to an open collector in TTL), which will be active when the bus cycle is a response enable and the FPU chip has completed the current commanded operation.

• The DAL is a 32-bit, bidirectional bus that exchanges data between the CPU and FPU chips. The CPU chip is always the bus master and controls the transfer of operands to the FPU chip and results back to itself.

The information exchanged between the CPU and FPU chips could be of different types: write external processor command, read or write external processor data, command to other external processors (not the FPU chip), and external processor response enable. The external processor strobe (EPS) is used by the CPU chip to qualify all communication between itself and the FPU chip. Figure 5 illustrates all the interface lines between the two chips.

Figure 5   Interfaces Between the CPU and FPU Chips

Communications Protocol

The communications protocol permits the FPU and CPU chips to communicate efficiently. Every interchip operation will be associated with the following sequence of bus activities:

1. The CPU chip initiates an interaction by placing a command onto the DAL bus, a status code on two CS lines, a write signal of "low," and an EPS of "low." The FPU chip recognizes this sequence as a command-write cycle and aborts any instruction being executed. The FPU chip then decomposes the command to determine the required operation and the number and size of the operands.

2. The CPU chip fetches the required operands and executes one or more data-write cycles to transfer them to the FPU chip.

3. After transferring the last operand, the CPU chip asserts a response-enable signal on the CS lines and pulses the EPS "low." The chip does that once for each microcycle that it has control of the bus in order to determine if the FPU chip has finished processing the data.

4. To signal the completion of operations, the FPU chip asserts the CS<2> line "low" when the response-enable signal is on the two CS lines and the EPS is "low." At the same time, the FPU chip asserts the status of the just-completed operation.

5. The CPU chip recognizes the "low" signal from the FPU chip and reads the status information. The CPU chip will repeat this transaction to compensate for its microcoded pipeline, capturing the status information the second time.

6. The CPU chip executes zero or more data-read cycles to read the results, if there are any, from the FPU chip.

Both chips are now free to perform the next transaction in the instruction stream. (The FPU chip will respond unpredictably to other nonstandard protocols and relies on the sequence of interactions described above for proper operation.)

Performance Analysis

The performance of the FPU chip is very sensitive to the I/O bandwidth. Every floating point operation is associated with a specified sequence of events that must occur between the chips before the execution can start. There is another sequence of events that must take place when the computation is completed. These sequences happen without any parallelism or pipelining.

The protocol affects the performance of the FPU chip because cycles must be expended for sending and reading status signals, and transferring data. Table 4 illustrates the individual steps that occur for three types of operations: ADDF, MULF, and MULD. For these examples, assume that no time is spent on instruction fetch and decode, and that the memory subsystem has an unlimited bandwidth and buffering capability for reads and outstanding writes. The performance is measured from the completion of the initial instruction decode to the final result store in the memory (or a register).
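The way a total is built up from the individual steps can be illustrated with a short Python sketch that adds the ADDF register-mode entries of Table 4. The step times are transcribed from that table; the data layout and the names `ADDF_REGISTER_MODE` and `total_times` are hypothetical and chosen only for this example.

```python
# ADDF, register mode, from Table 4. Times are in nanoseconds.
# Each entry is (step, execute_ns, protocol_ns).
ADDF_REGISTER_MODE = [
    ("first operand: specifier decode and data transfer", 0, 300),
    ("second operand: specifier decode and data transfer", 0, 200),
    ("internal transfer (first operand)", 100, 0),
    ("execution", 600, 0),
    ("status read", 0, 200),
    ("status read", 0, 200),
    ("result transfer on DAL bus", 0, 200),
]


def total_times(steps):
    """Return (execute_ns, protocol_ns, total_ns) for a list of steps."""
    execute = sum(execute_ns for _, execute_ns, _ in steps)
    protocol = sum(protocol_ns for _, _, protocol_ns in steps)
    return execute, protocol, execute + protocol


if __name__ == "__main__":
    execute, protocol, total = total_times(ADDF_REGISTER_MODE)
    print(f"execute {execute} ns, protocol {protocol} ns, total {total} ns")
```

For this case the protocol steps account for 1100 ns of the 1800-ns total, which illustrates why the text calls the chip's performance very sensitive to I/O bandwidth.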
The total execution time for other instructions can be derived in the same manner using the following internal execution times:

Add in D format: 700 ns
Division in F format: 2200 ns
Division in D format: 4400 ns

Table 4 Steps for Add and Multiply Operations (all times in nanoseconds)

Instruction: ADDF
                                                           Register Mode       Byte Displacement
Step                                                     Protocol   Execute   Protocol   Execute
Specifier decode and data transfer for first operand          300         -        500         -
Specifier decode and data transfer for second operand         200         -        500         -
Internal transfer (first operand)                               -       100          -       100
Execution                                                       -       600          -       600
Status read                                                   200         -        200         -
Status read                                                   200         -        200         -
Result transfer on DAL bus                                    200         -        400         -
Total                                                        1100       700       1800       700
Total Execution Time:                                    1.8 microseconds     2.5 microseconds

Instruction: MULF
                                                           Register Mode       Byte Displacement
Step                                                     Protocol   Execute   Protocol   Execute
Specifier decode and data transfer for first operand          300         -        500         -
Specifier decode and data transfer for second operand         200         -        500         -
Internal transfer (first operand)                               -       100          -       100
Execution                                                       -      1900          -      1900
Status read                                                   200         -        200         -
Status read                                                   200         -        200         -
Result transfer on DAL bus                                    200         -        400         -
Total                                                        1100      2000       1800      2000
Total Execution Time:                                    3.1 microseconds     3.8 microseconds

Instruction: MULD
                                                           Register Mode       Byte Displacement
Step                                                     Protocol   Execute   Protocol   Execute
Specifier decode and data transfer for first operand          400         -        600         -
Specifier decode and data transfer for second operand         300         -        600         -
Internal transfer (first operand)                               -       100          -       100
Execution                                                       -      2700          -      2700
Status read                                                   200         -        200         -
Status read                                                   200         -        200         -
Result transfer on DAL bus                                    400         -        800         -
Total                                                        1500      2800       2400      2800
Total Execution Time:                                    4.3 microseconds     5.2 microseconds

Wiring and Signal Integrity in the FPU

Signal integrity in a large VLSI chip such as the 78132 is fundamental to ensure correct functionality and good yield, given the variations in manufacturing. The one- to two-micron proximity of signal lines on an integrated circuit (IC) can cause significant coupling problems. Moreover, there are problems in terms of clock distribution and power-supply noise. The design of the logic must allow sufficient noise margin to permit correct operation in spite of the noise present in the system. The use of charge as the signal (used in many circuits in an NMOS design), rather than voltage or current, created some special design problems for the FPU chip team.

IC Wiring Characteristics

The FPU chip has four layers (two of metal, one of polysilicon, and one of diffusion) that are used to interconnect and form devices. The wiring in an IC is conceptually similar to the wiring on a printed circuit board. Although the total wiring length on the FPU chip is only about four meters, the interconnected nodes and elements number in the tens of thousands. Placing and routing the logic functions inevitably affects the estimates of loading and system performance. Thus an iterative process of first routing a design, then simulating the subsequent performance is needed to identify a workable routing plan. Once this workable routing-performance trade-off is identified, the final routing and loadings can be made.

The wiring considerations for a VLSI design are different from those for conventional systems in several ways. First, the dimensions are smaller.
In the NMOS process the horizontal metal separation is about three microns and the vertical separation is from one to two microns. Even with the smaller size of the wiring in the MicroVAX II chips, crosstalk can become a serious problem. On a MOS chip, crosstalk between poorly designed nodes can approach fifty percent. The capacitance on many of the critical nodes in the FPU chip is only about 100 femtofarads (0.1 picofarad). Any coupling at all on these nodes becomes quite significant. The largest capacitance on the chip is the clock lines at around 110 picofarads. On dynamic nodes, which rely on a charge stored on a capacitor to represent a logic level, this coupling is particularly troublesome. To eliminate this problem on the FPU chip, the design team checked each of the over 12,500 nodes for crosstalk from all other nodes in the chip. This data was then used to change the layout, where appropriate, to minimize or in some critical cases, eliminate intolerable levels of crosstalk. These checks took about three man-months to complete.

Another difference in the wiring of a VLSI chip is the resistivity of the wiring. The metal layers in the FPU chip have resistivities on the order of 100 milliohms per square. However, the resistivities of the polysilicon and diffusion interconnect layers are about 40 ohms per square, or 400 times that of the metal layers. The interaction of this parasitic resistance with the on-chip capacitive loads can cause serious performance limitations if not carefully monitored. In fact, these two layers are so resistive that they were unusable for unconditional routing of either signals or power; they could be used only for very local routing. As a special precaution, a hand-check of those layers was made at pattern generation time to verify that no long, speed-critical paths utilized these layers as part of the routing network.
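To illustrate why the polysilicon and diffusion layers were restricted to local routing, the following Python sketch compares wire resistance and a lumped RC delay for the two kinds of layer. The sheet resistances and the 100-femtofarad load come from the text above; the wire dimensions, the function names, and the single lumped RC approximation are assumptions made only for illustration.

```python
# Sheet resistances quoted in the text (ohms per square).
METAL_OHMS_PER_SQUARE = 0.1   # about 100 milliohms per square
POLY_OHMS_PER_SQUARE = 40.0   # polysilicon or diffusion, about 40 ohms per square


def wire_resistance_ohms(ohms_per_square, length_um, width_um):
    """Resistance of a uniform wire: sheet resistance times the number of squares."""
    squares = length_um / width_um
    return ohms_per_square * squares


def lumped_rc_delay_ns(resistance_ohms, capacitance_ff):
    """Single-pole RC time constant in nanoseconds (a crude lumped approximation)."""
    return resistance_ohms * capacitance_ff * 1e-15 * 1e9


if __name__ == "__main__":
    # Hypothetical wire: 300 microns long, 3 microns wide (100 squares),
    # driving a 100 fF node like the critical nodes mentioned in the text.
    for name, sheet in (("metal", METAL_OHMS_PER_SQUARE), ("poly", POLY_OHMS_PER_SQUARE)):
        r = wire_resistance_ohms(sheet, length_um=300.0, width_um=3.0)
        print(name, r, "ohms,", lumped_rc_delay_ns(r, 100.0), "ns")
```

For the same geometry the resistance, and therefore the RC product, scales by the factor of 400 between the layers, which is why even a moderately long polysilicon or diffusion run can dominate the delay of a speed-critical path.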
Power and Signal Routing

A minimum-width wire routed the length of the FPU chip has a resistance of about 200 ohms. The use of metal layers with noticeable resistance therefore begins to set system performance limits through RC delays as well as the IR drops that also occur in larger systems. The clock distribution introduces a delay of about one nanosecond across the FPU chip, due solely to the resistance of the metal interconnect and the distributed load capacitance. This delay amounts to about four percent of the length of a single phase in the chip. A well-monitored clock distribution system is a requirement in any semiconductor chip. The problem is that the performance of the underlying semiconductor device is beginning to outstrip the capability of the chip wiring to distribute the clock. RC delays become the limiting speed factor of the wiring in an IC, while the speed of light across transmission lines is the limiting factor in a larger system. These resistances can also seriously affect the power and ground supply as it is distributed throughout the FPU chip.

We used several techniques to keep the supply noise under 200 mV as power is distributed throughout the chip. First, the total dc current was calculated by summing the current used in each power and ground line as it joined other branches on the route to the actual supply pad. At this point in the net, two factors had to be analyzed so that the width of the power bus could be sized correctly. That sizing kept the equivalent resistance low enough so that the overall drop from a pad to the most remote logic could be kept under 200 mV. Unfortunately, that sometimes required large (on an IC scale) power buses in which a significant fraction of an ampere must be provided by one supply line.
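The sizing argument can be sketched with Ohm's law. The Python fragment below is a simplified illustration, not the design team's method: it assumes the full dc current flows along the whole bus (a worst case that ignores current tapering off at branches), uses the metal sheet resistance of about 100 milliohms per square and the 200 mV budget from the text, and takes a hypothetical current and bus length as inputs.

```python
def max_bus_resistance_ohms(current_a, max_drop_v=0.2):
    """Largest bus resistance that keeps the IR drop within the budget."""
    return max_drop_v / current_a


def min_bus_width_um(length_um, current_a, ohms_per_square=0.1, max_drop_v=0.2):
    """Minimum width of a uniform metal bus of the given length.

    The allowed resistance sets the allowed number of squares
    (length / width), which in turn sets the minimum width.
    """
    max_squares = max_bus_resistance_ohms(current_a, max_drop_v) / ohms_per_square
    return length_um / max_squares


if __name__ == "__main__":
    # Hypothetical case: 0.25 A carried over a 1000-micron run.
    print(max_bus_resistance_ohms(0.25))   # allowed resistance in ohms
    print(min_bus_width_um(1000.0, 0.25))  # required width in microns
```

With these assumed numbers the bus may contain only about eight squares, so a run of 1000 microns needs a width on the order of 125 microns, which shows how quickly the power buses become large on an IC scale.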
The second problem, and the more difficult one, associated with the power and ground wiring is the large ac voltage transients that can occur when large portions of the system switch at the same time. That problem is especially significant with the VSS lines. And it is particularly difficult when driving wide buses or large datapaths as wide as the 81 bits in the FPU chip. In these cases, large transients (one ampere or more) flow in ground and power lines for a few nanoseconds. In a large system environment, decoupling capacitors can be used to supply these currents locally. Unfortunately, that is not possible in an IC environment where such large capacitors are not practical. As a result certain ground lines in the FPU chip are allowed to have significant noise on them. In some cases this noise spike can be as much as two volts. This noise is handled by running these "dirty" grounds in a separate metal line all the way back to the pad on the chip.

However, even when the line is taken back to the pad to prevent local IR drops from upsetting the logic, parasitic inductance in the packaging can still cause problems. The most striking example is that of off-chip bus drivers. Here a typical 32-bit bus is driven over 4- or 5-volt swings in as little as four or five nanoseconds. With each bus load being on the order of 100 pF, the large di/dt that the chip imposes on the power pins causes inductive ringing. Solving this problem by placing a decoupling capacitor on the external pins is of little value since the package inductance effectively isolates the capacitor from the actual nodes it must decouple inside the chip. Therefore, the FPU chip, like most chips that drive wide buses, has separate power pins going only to the output transistors. The subsequent ringing is tolerated since it does not affect any internal logic.
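The size of the current involved follows from I = C dV/dt. The Python sketch below works through the bus-driver figures quoted above (32 lines, about 100 pF per line, a 5-volt swing in 5 nanoseconds). The function names are illustrative, and the package inductance passed to the second function is a purely hypothetical value, since the text does not give one.

```python
def switching_current_a(lines, load_f, swing_v, transition_s):
    """Average current needed to charge `lines` capacitive loads through
    `swing_v` volts in `transition_s` seconds (I = N * C * dV / dt)."""
    return lines * load_f * swing_v / transition_s


def inductive_voltage_v(inductance_h, delta_i_a, delta_t_s):
    """Voltage across an inductance for a current change delta_i over delta_t
    (V = L * di/dt), treating the current ramp as linear."""
    return inductance_h * delta_i_a / delta_t_s


if __name__ == "__main__":
    # Figures from the text: 32-bit bus, 100 pF per line, 5 V in 5 ns.
    i_bus = switching_current_a(32, 100e-12, 5.0, 5e-9)
    print(i_bus, "A")
    # Hypothetical 5 nH of supply-pin inductance, current ramping over 5 ns.
    print(inductive_voltage_v(5e-9, i_bus, 5e-9), "V")
```

Under these assumptions the drivers demand a few amperes for a few nanoseconds, and even a few nanohenries of pin inductance would develop volts of disturbance, which is consistent with the decision to give the output transistors their own power pins.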
(The ringing can become even more of a problem on chips with several buses with different timings, since separate supplies must be used for each bus. That drastically increases the number of supply pins required on the chip.) The FPU chip devotes 19 of its 68 pins to VSS and VDD distribution.

Electromigration

A final wiring consideration in designing the FPU chip was electromigration. Electromigration is a reliability issue in IC wiring because high current density in the metal interconnect can cause the metal to migrate, thinning sections of wiring until they finally fail. Current densities much higher than 10^5 amperes per square centimeter can cause increases in wiring resistance and eventually, open circuits or increased interlevel leakage, and short circuits. Clock lines, power and ground buses, as well as some global wiring, are susceptible to this failure mechanism. As a result, all lines on the FPU chip have an additional current constraint imposed by electromigration. When the chip was designed, these lines all had to be checked to eliminate the problem.

Wiring Integrity

Considerable time was spent checking the electrical integrity of the wiring in the FPU chip. The following list contains the most important wiring integrity checks made of the interconnect on the chip:

1. Transistor Source/Drain Integrity - This check assured that the silicon interconnect resistance caused less than five percent degradation.

2. RC Delays - All RC delays greater than one nanosecond were analyzed.

3. Coupling - All internodal coupling capacitors were checked to verify that there would be less than 200 mV of noise injected into the node.

4. VDD and VSS Nets - Three checks were performed. First, all IR drops were measured to ensure that ac and dc voltage sources were kept under 200 mV. Second, all buses were sized to verify their reliability for electromigration resistance. This check included contact electromigration. Third, a check ensured that sufficient isolated power pins existed to guarantee that clean and dirty grounds were isolated.

5. Clock - An analysis identical to that for VDD and VSS nets was done on all eight clock lines.

Although there were significant CAD tools to perform most of the checking, this task alone required approximately ten percent of the total engineering time for the entire project.

Summary

The VLSI chips we are now designing are as complex as several boards of TTL used in past implementations of the VAX architecture. The FPU chip performs the same functions at about the same speed as five boards containing ICs in the VAX-11/780 system. The designs of these complex systems on chips present a set of constraints and considerations similar to and yet different from those encountered by board-level system designers. We hope that this paper captures the complexity and uniqueness involved in the MicroVAX FPU chip.

Acknowledgements

The FPU chip team completed the design of two VAX floating point chips, the MicroVAX FPU and the 8200 chip, in eighteen months. That was possible only because another design team working on the J-11 FPA had established the basic architecture and took the time to help our team to understand that work. This close working relationship allowed us to complete the MicroVAX FPU design in step with the CPU chip team, which was our major challenge.

References

1. D.W. Dobberpuhl et al., "The MicroVAX 78032 Chip, A 32-bit Microprocessor," Digital Technical Journal (March 1986, this issue): 12-23.

2. R.J. Simcoe et al., "A Floating Point Unit for a 32-bit Microprocessor System," Proceedings of the 1984 IEEE Custom Integrated Circuit Conference (May 1984): 478-481.

3. G. Wolrich et al., "A High Performance Floating Point Coprocessor," IEEE Journal of Solid State Circuits, vol. SC-19, no. 5 (October 1984): 690-696.

Barry A. Maskas

Developing the MicroVAX II CPU Board

Within the MicroVAX II system, the CPU board provides an environment to optimize the performance of the CPU and floating point processor chips. The board is designed as a linked sequential machine to accommodate the sequential control of the CPU chip. A Q-bus handles I/O for the system. The memory access path is dual ported, allowing the memory and the CPU chip to run synchronously without wait states. A scatter-gather map provides Q-bus address translations. To minimize product delivery time, the CPU board was developed in parallel with the chips. Using CAD tools helped to go from first-pass chips to running the MicroVMS system in only two weeks.

The CPU board in the MicroVAX II system (Figure 1) holds two chips: a microprocessor, called the CPU chip, and a floating point coprocessor, called the FPU chip. The board also integrates a synchronous memory subsystem, a synchronous I/O-bus controller, and a synchronous on-board I/O subsystem. The project to develop the CPU board was governed primarily by time-to-market considerations. Other factors, such as VMS and ULTRIX compatibility, performance, reliability, cost, and ease of high-volume production were also important criteria. The end result is a successful balance between all these factors.

[Figure 1: The MicroVAX II CPU Board]

Development Goals

The importance of the primary goal governed how the project team organized itself to make decisions and to execute tasks.
Rapid decision making, and parallel and overlapping activities were the norms for this development effort. Unfortunately, parallel activities can cause communication problems, thus increasing the risks of product failure. However, these problems were anticipated and mechanisms put in place to reduce the risks to an acceptable level.

The CPU board was designed around the specifications of the CPU and FPU chips, which were being developed at the same time. Therefore, one development goal was to minimize the dependency of the board design and layout on the first-pass designs for these chips. The team aimed at providing a fully functional system environment into which the first-pass chips could drop. This aggressive approach led the team to leap-frog over events rather than to take a conventional stepping-stone progression. The overall project manager encouraged the taking of prudent risks because he was responsible for meeting the development schedule. The acceptance of these risks eventually paid off in an on-time delivery of the CPU-board design.

Single-board Design

Developing the CPU board around the two chips required us to provide a specific system environment. That environment had to balance the memory bandwidth of the CPU chip against its I/O bandwidth requirements. The realization of that balance is the key to the board's success. Having either a slower memory or a slower I/O subsystem would degrade system performance by at least twenty-five percent. The environment also had to support the MicroVMS, ULTRIX, and VAXELN operating systems.

Our goal was to provide the hardware specified by the three operating systems on one Digital-standard quad-sized board (8-1/2 by 10-1/2 inch). The single-board goal was a consequence of technology improvements balanced by the costs of replacing the unit in the field.
In this case, needing fewer pieces to build the system would reduce manufacturing costs, improve reliability, and ease maintenance. The objective of operating at the full bandwidths of the chip and the I/O bus was especially challenging when so little board space was available for the necessary functions.

Most new chips do not run at their full speed immediately; they take some time to debug. Our design objective was to run the CPU chip at an operating frequency lower than its maximum during the first-pass debug. Of course, running at a slower clock rate was never an acceptable compromise for the final product. (Two versions of the CPU board were developed with minimal component differences, one running at the full 200-nanosecond (ns) microcycle speed and the other at a slower 242-ns microcycle speed.) However, if the first-pass chip had missed its performance target, the development of the CPU board could still have continued. It is a tribute to the chip designers that the first-pass chips did run at full speed, which was quite unusual in so complicated a product.

The bus chosen to meet the I/O needs of the system was the Q22-bus. This 22-bit bus has sufficient bandwidth to handle traffic from the system disk, the Ethernet LAN, and other I/O sources, such as other processors. The risk of using this bus was low due to its proven design, and the development cost for this application was reasonable. The Q22-bus is also supported by many disk, tape, and other I/O products from both Digital and third-party add-on manufacturers.

CPU Board Functions

We ruled out using the Q22-bus for accessing memory directly, since the bus could not meet the memory cycle time of 400 nanoseconds for the CPU chip [1]. Therefore, a new memory architecture had to be developed.
We investigated two alternative schemes, the first being the widely used direct memory access (DMA) with a single port. Unfortunately, DMA forces addresses and data to cross the microprocessor bus on their way to memory. The usual procedure is to halt the microprocessor with a DMA request or grant while the DMA device uses the microprocessor's data and address paths. In this case the CPU chip, having no cache, would waste time by exercising the memory request and memory grant signals. Therefore, we chose the second scheme, a dual-ported memory controller. Figure 2 depicts the single- and dual-ported memory controllers that were considered.

[Figure 2: Block Diagrams of the Proposed Controllers (single-ported organization and dual-ported organization)]

This dual-ported controller requires that the CPU chip have different address and datapaths for the Q22-bus and the memory controller. While a DMA access is taking place, the CPU chip can continue operating on its 32-bit external datapath, primarily communicating with memory and the FPU chip. In this context, memory cycles can be pictured as strings of 400-ns time slots controlled by a central arbiter. This memory controller minimizes the impact on the CPU chip's performance by DMA accesses to memory on the Q22-bus. This organization is not locked up by asynchronous Q22-bus cycles, whose transactions are three to four times slower than the CPU chip's memory cycles. It also allows the Q22-bus protocol to operate autonomously with the CPU chip and memory, except when the buffered bus protocol and the memory system exchange buffered data.

The memory controller also serves as an alternative to one based on a cache. The CPU chip does not implement an internal cache due to power and chip-size constraints [1]. Cycles for DMA, refreshing memory, and CPU-chip access are interleaved in time.

The MicroVAX II system is designed to be used in a multicomputing environment. Therefore, the bus interface logic has to accommodate the role of either bus arbiter or auxiliary processor. To that end, a doorbell register facilitates an interprocessor interrupt mechanism. The datapath of the Q22-bus interface has to provide the address translations from the virtual memory space of the bus to the address space of memory.

We defined several other elements as being essential for supporting an operating system on a single board. Those are the time-of-year (TOY) clock, the console serial line, the VAX console command program, and the console interface-boot and self-test ROM. These elements, along with some status and error registers, make up the on-board I/O subsystem. The functional organization of the CPU board is depicted in Figure 3.

Linked Sequential Machines

Optimizing the overall computer performance means that data transfers between the CPU chip and memory have to be as fast as the chip can operate. Without a cache memory, the CPU chip has a relatively long memory cycle time of 400 ns (two microcycles). Thus CPU chip-to-memory data transfers can take place without wait states.
The 400-ns I/O cycle is nevertheless fast enough that the CPU board had to be designed as a linked sequential machine rather than as flow-through logic. The control function in the MicroVAX II system receives signals, interprets them, and generates control outputs, all in a defined sequence. This mode of control cannot be satisfied using a combinational logic system.

In addition to permitting 400-ns memory cycles without wait states, sequential machine design requires less random logic and board space than a flow-through design. The design process is simplified because the machines are implemented in easily changeable FPLS (fuse programmable logic sequencer) logic. Moreover, design changes can be readily documented and less time is needed for debugging and tracing events. Sequential circuitry is more easily simulated than random logic, in which all events must be sampled. And, since the CPU board's logic components run on the same clock, it is possible to debug them at faster or slower operating speeds.

When the CPU-board project started, this sequential machine approach had not been widely used in microcomputer design. Off-the-shelf hardware and adequate CAD tools were not available. This project shows that designing with commercial PALs and FPLS logic can reduce the chip count, as well as cost and development time.

The overall control logic of this linked sequential machine is divided into partitions.
[Figure 3: Functional Partitions of the CPU Board]

The block diagram in Figure 4 depicts a sequential machine representation of the CPU board's functional configuration in Figure 3. Under the on-board control partition at the left, the control function for the memory subsystem is distributed among three sequential devices: the memory sequencer, the memory arbiter, and the auxiliary device controller. Under Q22-bus control, there are also three sequential devices: the slave, arbitration, and master machines. These machines exchange request, acknowledge, and status signals to control operations.

The events inside individual partitions are governed by independent sequential machines, called controllers. The logic within a partition goes through a fixed, repetitive sequence of operations, or states, during the four quarters, or phases, of a microcycle.

The operations of the various partitions are coordinated in two ways. First, all sequential machines run from the same clock so that their timing is based on the same stream of clock edges. Second, the sequential machines are constantly exchanging signals, providing each other with the protocol information needed for coordinating their flow sequences.

The sequential machines can be classified as modified Mealy machines [2]. The outputs are determined by the present input conditions and the present state of the machine.
However, the state register is separated from the output regis ter, with the AND programmable logic array fed by both the state register and the i np uts tO gen erate OR p l a ne terms for the c l ocked S R latches . The advantage o f clocked S R latc hes is that the past state need not be regenerated by every c lock edge; only c hanges need activate an OR term. Using D-type latches wou ld requi re that rege neration . Memory Subsystem Our market research data suggested that the on board me mory should be either 2 5 6 kilobytes (KB) or 1 megabyte (MB) . The amount depends on whether 6 4 K DRAMs or 2 5 6K DRAMs are used . At the time the design was started, 2 5 6 K parts were i n short supply. Therefore, using 64 K D RAMs was a strategy to cou nter t hat s hortage . The function of the memory control ler is to carry out 4 0 0- ns read and write operations and to refres h its RAM chips. This control ler con- f-.--- ON-BOARD CONTROL ----to>- 022-BUS CONTROL ---- 1 I I I MEMORY SEQUENCER I lu-- i a: 0 <f) rJ) () >< o w CI: a; U a. < CI: o "- < a: CI: w o w >- >- <( ::;: � (9 UJ MEMORY ARBITER I I ARBITRATION MACHINE � MASTER MACHINE I I I AUXILIARY DEVICE CONTROLLER Fig ure 4 Digital Tecbnicaljournal No. 2 March 1 986 SLAVE MACHINE I I I <f) a: w N z 0 a: I u z >rJ) BUS INTERFACE GATE ARRAY I I I Block Diagram of the Control A rchitecture 41 ---- . De veloping the Micro VA X II CPU Board tains a Q 2 2 -bus scatter-gather map that handles coincident with the CPU c h i p ' s entry to a new transfers between the Q 2 2 - bus virtual m e mory m i crocyc l e . This enabl ing happens even though and on-board phys i cal m e m ory. the sequencer does not yet know whether or Mem ory access is controlled by the memory not there will actually be a memory access by arbiter. T h i s arbi ter c h e c ks for out sta n d i n g the C PU c h i p . 
memory access requests in a fixed-priority sequence at the ends of 200-ns idle cycles and 400-ns memory cycles. It also checks for requests from the Q22-bus slave machine, the memory-refresh counter, and the CPU chip, in that order. The fixed-priority sequence resolves collision requests for memory usage. If the arbiter requires exclusive control of the memory subsystem, a locking mechanism built into the subsystem prevents contention.

When the CPU chip requires a memory-read lock, the memory arbiter will stall the chip and direct the Q22-bus arbitration machine to suspend other bus activity. Those actions will happen only after any pending memory cycles of the slave machine have been completed. The arbitration machine will retain Q22-bus mastership until the write/unlock cycle of the CPU chip frees the bus.

Until the arbitration machine becomes Q22-bus master and while the CPU chip is stalled, the memory arbiter will perform the demand-driven refresh cycles and resolve slave-deadlock cycles from the Q22-bus. As each memory cycle is completed, the memory arbiter checks these requests again, and either the Q22-bus or the refresh-memory cycle can begin at the next clock edge. If no Q22-bus or refresh requests are pending, the arbiter anticipates that a CPU-chip cycle will be next.

That anticipation and the fixed-priority sequence save a lot of program execution time. The CPU chip makes about seventy percent of all memory references. Slave machine accesses by the I/O bus devices occur twenty percent of the time (a maximum burst rate, not the average rate), and those by the refresh counter, two percent. (The remainder are idle cycles.) Therefore the controller, by anticipating that the CPU chip, rather than the I/O bus or the memory-refresh counter, will make the next memory access, allows a memory cycle of 400 ns, instead of 600 ns. (The 600-ns cycle would be necessary because the address strobe of the CPU chip would have to assert before the memory cycle could start, thus wasting one microcycle.)

When timing microcycles, the memory arbiter enables the memory sequencer at phases [...] Not until three phases later can the sequencer determine whether or not the address strobe has been asserted for a memory reference. If so, the sequencer enables the continuation of the anticipated memory access. After that cycle completes, the next memory access will be enabled, and the procedure repeated. If not, the sequencer "kills" the cycle and runs another poll loop after checking for Q22-bus slave or refresh requests. Not anticipating a memory access would reduce performance by approximately thirty-three percent.

The memory sequencer generates the row and column address strobes, sets up reads and writes on each byte, and handles parity generation and detection. The auxiliary device controller can "stretch" the memory cycle of the CPU chip to synchronize its timing with slower devices, such as the TOY clock and the boot ROM.

The scatter-gather map converts between the 22-bit virtual addresses of the Q22-bus (4 MB addressable) and the 24-bit physical addresses of the memory (up to 16 MB addressable). As defined by VAX memory management, the 4 MB memory is divided into 8192 pages of 512 bytes each. The 22-bit virtual address consists of a 13-bit page number and a 9-bit offset to the addressed byte in that page. The 24-bit physical address consists of a 15-bit page number and a 9-bit offset. An entry in the map for each 512-byte page and offset points to a location in physical memory. Each physical address has four byte masks that select which bytes are inactive on any memory reference.

There are, of course, other ways to map addresses between the I/O bus and memory. One way is one-to-one address translation, which in this case would have restricted physical memory to 4 MB. Another way is first to map one-to-one into the lowest 4 MB of memory. Then, the CPU chip can perform the translations and data transfers to the proper pages in the address space of the remaining memory. Unfortunately, this approach is unacceptable due to its effect on performance. A third way is to have fewer than 8192 mapped pages. In this case, programmers might have to provide their own mapping software for many real-time I/O applications. That typically involves DMA access to large numbers of RAM locations. None of these methods proved as satisfactory as the use of the scatter-gather map.

Interface Control Signals

The interface control signals to the CPU chip include the following:

• Clock-in (40 MHz), clock-out (20 MHz; used to time the sequential machines), and reset signals
• Address, data, external-processor, and timing-strobes-out signals
• Three chip-status, four byte-mask, and the read/write signals
• DMA-request and DMA-grant signals
• Four interrupt-line signals and one HALT signal
• Ready and error signals

The pulse of the design is a four-state Gray code binary counter, which is clocked from the synchronous clock-out signal of the CPU chip. The first edge assertion of the clock-out signal after power-up puts the CPU chip in the first 50-ns phase of the four-phase microcycle. The Gray code allows the memory arbiter and auxiliary device controller to track the state of the microcycles.
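As a rough illustration of why a Gray code suits the four-phase counter described above, the following Python sketch steps a 2-bit Gray counter through one microcycle and checks that exactly one bit changes on every transition, including the wraparound. The state encoding (00, 01, 11, 10), the function names, and the choice of Python are assumptions made for illustration; the text does not give the encoding actually used on the board. The 50-ns phase time follows from the 20-MHz clock-out signal.

```python
# Illustrative sketch only: the encoding and names below are assumptions,
# not taken from the MicroVAX II board design.

PHASE_NS = 50                 # one period of the 20 MHz clock-out signal
PHASES_PER_MICROCYCLE = 4
MICROCYCLE_NS = PHASE_NS * PHASES_PER_MICROCYCLE


def gray_encode(n: int) -> int:
    """Return the reflected-binary Gray code of the non-negative integer n."""
    return n ^ (n >> 1)


def bits_changed(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def phase_states() -> list[int]:
    """Gray-coded counter states for the four phases of one microcycle."""
    return [gray_encode(n) for n in range(PHASES_PER_MICROCYCLE)]


def max_bits_changed(states: list[int]) -> int:
    """Largest number of bits that flip between consecutive states (cyclic)."""
    return max(
        bits_changed(states[i], states[(i + 1) % len(states)])
        for i in range(len(states))
    )


if __name__ == "__main__":
    gray = phase_states()
    binary = list(range(PHASES_PER_MICROCYCLE))
    for phase, state in enumerate(gray):
        print(f"phase {phase}: t = {phase * PHASE_NS:3d} ns, state = {state:02b}")
    print("Gray counter, max bits changed per step:  ", max_bits_changed(gray))
    print("binary counter, max bits changed per step:", max_bits_changed(binary))
```

Because only one state bit changes per clock edge, logic that samples the counter while it is switching sees either the old phase or the new one, never a spurious third value. A plain binary counter does not have this property: two bits flip on the 01 to 10 step and again on the wraparound.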
The 28-bit address of the CPU chip is decoded to select the accessed device and then encoded into a series of 3-bit cycle codes. The auxiliary device controller, the memory arbiter, and the master machine decode those cycle codes to identify what type of timing cycles to sequence through.

The two key signals, apart from the cycle codes, are those for the address strobe and the read/write. They direct the auxiliary device controller, the memory sequencer, and the master machine to perform the read or write operations with the device specified in the cycle codes. Those three elements control the CPU chip's cycles and any system exceptions via the ready and error signals. The DMA request signal is used only during a reset operation to delay the CPU chip until the system has finished resetting.

The byte-mask signals simply direct the control logic to perform certain operations. Those include masked (byte or word) or unmasked (longword) memory cycles and data funneling operations on the Q22-bus. (Data funneling converts 32-bit longwords to 16-bit words and vice-versa.) The unmasked cycles are required since the Q22-bus is 16 bits wide, whereas the memory and CPU-chip buses are 32 bits wide.

The on-board I/O time can be extended to accommodate slower external devices. The memory controller allows the memory cycle to end only when a device has asserted a ready (RDY) signal, indicating the completion of its task.

Add-on Memory

The system's memory can be expanded with one or two memory boards, each containing either 1, 2, 4 or 8 megabytes. Thus total memory can be as large as 16 MB and still offer a fixed 400-ns access time with no wait-states. Each board is linked to the CPU board by means of a local interconnect.
This interconnect consists of special control signals on the C and D rows of the Q22 backplane and a 50-pin module-header and ribbon cable for data. Each interconnect links a board directly to the one just below it in the board cage of the system enclosure. Thus control signals and addresses can pass directly between the chips and memory without using the Q22-bus. The diagram in Figure 5 shows the functional organization of the memory boards.

[Figure 5: Functional Partitions of Memory Modules. Labels: memory datapath (10 MB/s); private memory; C/D interconnect; Q22-bus block-mode DMA (3.3 MB/s); I/O.]

For ease of installation and maintainability, the add-on memory boards are self-configurable; there are no user-settable switches or jumpers on the CPU board or memory boards. This design requires a logic function that combines active addresses with static configuration data to generate the proper control strobes according to the configuration. Therefore, although the add-on memory boards are position independent, they "recognize" which expansion slots they occupy. (To get the full 16 MB configuration, the memory controller design supports 1 MB-by-1 DRAM chips.)

On-board I/O Subsystem

The serial line interface in the on-board I/O subsystem provides the CPU board with a full-duplex, RS-423 EIA console terminal interface. The console interface program is implemented in macrocode in the boot ROM. The console mode functions include general booting, user computer interface, self-test and HALT. The boot ROM also includes special support functions for the software in the MicroVMS, ULTRIX and VAXELN systems. As the boot ROM goes through a self-test sequence, programmable LEDs display the test status, identifying any board subsystem that contains a failure.
By analyzing this sequence for effectiveness, we found that it provided a confidence level of eighty-six percent in the functional integrity of the CPU board and add-on memory boards. Although some Q22-bus logic functions could not be tested with this method, it helped to reduce significantly the times to do manufacturing and field service tests.

To emulate a CPU-halted condition, the CPU chip can be directed by either software or hardware switches to transfer program control to a firmware routine at a fixed PROM address. The HALT function retains the board state. The CPU chip traps to the boot ROM when there is a HALT, masking it until there is an instruction fetch outside the ROM. While in this emulated HALT, the firmware will perform the specified operations only after receiving either console commands or a signal from the AUTO-REBOOT switch.

The CPU chip does not have a RESET instruction; the chip simply sets a RESET request flag. The UNJAM command in the console mode initializes the bus by forcing the CPU chip to the DMA grant state. UNJAM then transfers control to RESET in the interface gate array of the CPU chip. After that, the logic resets the board's functions and the arbitration machine resets the Q22-bus. Any auxiliary processors are reset from the Q22-bus reset signal.

Exceptions, which may originate in the console, the on-board I/O, the Q22-bus, or the memory subsystem, are reported to the CPU chip for a machine check. This process involves setting an error-register flag in the interface gate array of the CPU chip. The chip then treats the exception as either fatal (HALT or AUTO REBOOT) or non-fatal (abort the process).

Board Components

Logic hardware for the CPU board was selected by balancing the need for minimum power and board space against the use of low-cost, off-the-shelf components.
The gate arrays for the CPU board and the bus interface, for instance, are more expensive than discrete logic; however, they are necessary to fit all support functions on one quad-sized board. Due to a conductivity limitation in the connection through the board's edge fingers, the maximum allowable power consumption is 45 watts for a 1 MB on-board memory configuration. We were also constrained by the watts per square inch that had to be conducted from the board surface to the environment. That was important given that the enclosure is cooled by the flow of forced air.

The gate array for the CPU-chip interface decodes addresses and latches boot-ROM words. This gate array also contains registers for booting, diagnostics, and memory subsystem errors; the on-board I/O datapath; and the interrupt-acknowledge decode and control. The gate array for the bus interface includes such components as the doorbell register, the memory-refresh counter, the holding latches for byte and word packing and unpacking, and timeout counters. This gate array also generates the bus addresses.

The memory subsystem includes a number of discrete components. The memory arbiter and auxiliary device controller are both commercial programmable sequencers. The memory sequencer consists of 12 discrete logic chips. However, we had to design our own memory controllers. The available commercial ones could not handle both the speed and the higher-level arbitration function required to anticipate memory accesses.

Previous board designs used an eight-layer construction technology (two power, four signal, two covers, and top and bottom solder masks). However, to reduce the board's cost, a six-layer technology had to be developed (two power, four signal, and top and bottom dry-film solder masks).
Six-layer construction costs less than eight-layer due to alignment and drilling problems with the stacked layers of the latter. We used a CAD system to evaluate the chip interconnects on the board layout. The system showed that the signals could not be routed on two signal layers, but could on four. The two additional layers provide the 5V power and ground planes. Digital's Computer-Aided Design (CAD) Group in Maynard, Massachusetts, designed a custom software tool to help in developing the board layout. With this tool, it was possible to fit all functions on the board with 8-mil lines and spaces, and 60-mil pads. Having the lines and pads as wide as possible offers satisfactory yield in production and good signal quality due to strip-line characteristics.

Enclosures

Two enclosures were considered to house the boards, the BA23 and the BA123 boxes. At the time, the BA23 box was an active product; only minor modifications were needed to accommodate it to the MicroVAX II system, a nice, low-risk plan. In contrast, the BA123 box was still being developed. Using it represented a greater risk; however, it could support more mass storage. The backplane cages of either box could accept add-on memory and peripheral device interfaces on either quad-sized or dual-sized (5-1/4 by 8-1/2 inch) boards. However, the BA123 box accepted more quad-sized and dual-sized boards. That was a distinct advantage because there would be different numbers of board slots in the board cages in different packages of the MicroVAX II system. Moreover, each enclosure had a different thermal environment that had to be considered in the layout of the CPU and memory boards. Based on these considerations, we chose to use both the BA23 and BA123 boxes as the enclosures for the boards.
CAD Tools

The tight schedule dictated that separate design teams had to develop each of the chips and the CPU board as parallel projects. These separate efforts were made possible by the extensive use of CAD tools and computer simulation. Simulation was used extensively to design the CPU and FPU chips, the on-board memory and I/O subsystems, the gate arrays, the sequential machine controllers, and the Q22-bus.

A board development tool set was selected from CAD packages available in the industry. Since these packages were generally incompatible, we developed a process that transported wire lists between these various CAD tools. The process linked inputs and outputs between the schematic-capture work stations, the PC-board layout system, the simulator, the gate-array vendor, and the documentation control group. One key to the rapid development of schematics was to let the designers retain control by performing their own drawings and edits.

We planned to use gate arrays right from the start of the project. Therefore, a hierarchical schematic-capture system was needed to facilitate the representation of devices at a number of levels. To verify the schematics, we selected a mixed-mode logic simulator that had library support for most of the off-the-shelf devices used in PC-board design. That minimized the development time to construct the simulation libraries.

A complete simulation model of the CPU board was also constructed to expedite the design verification process. This model provided a "soft" test bed for design changes before they were committed to hardware. Behavioral models were used to simulate the signals from the CPU chip, as well as any device attached to the Q22-bus. No attempt was made to emulate the VAX instruction set.
Instead, the goal was to verify the sequences for reads, writes, interrupt acknowledgements, and the cycle flows for the block and non-block modes of the Q22-bus.

Several CAD packages developed by Digital were also employed to expedite the board design process. Figure 6 shows the CAD flow process that was assembled. (For more details on the CAD tool suite, see reference 3.)

[Figure 6: CAD Tools Used in the CPU Board Development Process. Labels: user hand-drawn control flow diagrams; wire lists; 2 gate arrays; 2 memory boards; sequencer design (FPLS); microprocessor hardware simulators; prototype single board computer; logic analyzer.]

Two CAD tools were used to help in the decision process for selecting reliable components. The CPU board was modeled with the reliability prediction program PREDIC, which is based on MIL STD 217. PREDIC utilizes component thermal data from the second tool, the THUDS analysis program. Using these tools helped us to avoid the creation of hot spots on the board layout and the use of low-reliability components.

These CAD tools were so successful that the CPU board was ready by the time the first-pass CPU and FPU chips were ready. It then took only two weeks of debug to go from the functional chips to running the MicroVMS operating system. In all, the development of the CPU board took less than one year from initial specification to operational prototypes.

Summary

The CPU board was designed as part of a larger project with formidable time constraints. Such an environment demanded that the design of any one component rely on the proposed specifications for other, interlocking components, rather than on actual pieces of developed hardware. That environment required a cooperative team spirit that was goal oriented and fostered the assumption of rational risks. Both inter-group and intra-group communication became extremely important. The achievement of these factors was largely responsible for the success of the MicroVAX II project.

Especially important was the fact that communication was aided by the CAD tool suite used to support the overall project. In the case of the MicroVAX II system, we started from a well-organized datapath and employed sequential machine architectures for controlling it. In that way, the design documentation, simulation, verification, and support were all made more manageable. In future projects these tool suites will mature and behavioral component models will begin to serve as design specifications. The ability to solidify the design early in a project means that board designers can fashion silicon systems on boards that are functional on the first pass.

References

1. D.W. Dobberpuhl et al., "The MicroVAX 78032 Chip, A 32-Bit Microprocessor," Digital Technical Journal (March 1986, this issue): 12-23.
2. W.I. Fletcher, An Engineering Approach to Digital Design (Englewood Cliffs: Prentice-Hall, Inc., 1980).
3. A.F. Hutchings, "The Evolution of the Custom CAD Suite Used on the MicroVAX II System," Digital Technical Journal (March 1986, this issue): 48-55.

Anthony F. Hutchings

The Evolution of the Custom CAD Suite Used on the MicroVAX II System

The MicroVAX II chips were designed in only 20 months, due in part to simulation on CAD systems. Digital has a long history of using CAD. Much of the MicroVAX II's CAD suite evolved from tools used on an earlier VLSI VAX design. The higher-level chip functions were debugged using behavioral simulation, after which the circuits were modeled using the reliable SPICE and GRAPES systems. The IV system verified all interconnects and extracted wirelists, while other tools controlled the databases and checked design rules.
The next generation of CAD tools must deal with a threefold increase in chip complexity.

The factors that must be considered when initiating and committing to a new VLSI design are quite complex. They are related in the following way:

Market Requirements/Chip Definition + Technology Status + CAD Status + Engineering Talent Available

Products with long lead-times can accept higher risks in the process chosen for chip fabrication and CAD technology. However, products with short lead-times, such as the MicroVAX 78032 chip, can tolerate virtually no risk in this domain.

One way to reduce these risks is to test the chip designs by simulating their performance before fabrication; another way is to check for all possible, known fabrication process violations before submitting the mask data for manufacture. CAD systems and tools have been developed for this purpose: to discover problems so they can be corrected at minimal cost, both in time and resources. Digital Equipment Corporation was an early user of CAD to decrease the time-to-market for its VLSI products.

The MicroVAX II project needed to rely on a stable CAD system and set of tools while designing the 78032 CPU chip (and its companion floating point coprocessor, the 78132 FPU chip). Much of the stability of the CAD system was derived from work done to develop a multichip set for another VAX microprocessor.¹ We were able to both rationalize and simplify the results of this pioneering effort to suit the needs of the MicroVAX project. Let's begin by discussing this earlier CAD system to see how its use affected decisions made on the 78032 and 78132 projects.

CAD System for Earlier VLSI VAX Design

In many ways, the design process for the earlier VLSI VAX microcomputer set the tone for all subsequent VLSI designs at Digital Equipment Corporation.
This process was characterized by the extensive use of simulation, especially high-level, or behavioral, simulation. The commitment to high-level simulation was particularly innovative at that time.

Two types of simulation models were used for this earlier microcomputer. The first type was designed as a high-level software breadboard used to develop and check out the microcode before the chip hardware was available. The second type was developed as a relatively detailed register transfer level (RTL) model of the actual physical partitions and design concepts of the chips themselves. This model was used directly by the logic and circuit designers to develop the switch- and circuit-level representations of the design.

One problem with using two models is that the output test vectors have to be checked continually to ensure compatibility between the microcode and chip designs. Thus, although each was optimized to a specific task, the models proved to be somewhat cumbersome to use.

The hub, or kernel, of the data management system was called CHAS.²,³ This proprietary system was developed at Digital's semiconductor facility in Hudson, Massachusetts, expressly to form the nucleus of an integrated MOS custom design suite. The CHAS system performs the necessary data management functions on chip design databases and was originally intended to control all the design activities of a chip project. The system embodies many of the "structured top-down design" principles of Carver Mead.⁴

The CHAS system manages the data collected from circuit and logic simulations, layout designs and syntheses, layout verifications, and schematics entry. This central system also provides data protection and conversion functions, as well as generating simulation wirelists.
Decisions Derived from the Earlier Project

From the outset, the CPU and FPU design teams made a number of important decisions based on the experience gained from the earlier project. One driving factor in these decisions was the short time-to-market, which dictated that simplifying the design process was a primary goal.

• The first decision was that there would be only one behavioral, or functional, high-level simulation model of the chip rather than the two used earlier. Thus the functional model was more complicated than the earlier one, but avoided the very time-consuming task of checking the output test vectors. Using one model guaranteed that the microcode development would be in step with the chip design, since both teams had to use the same model.

• The next decision was to carefully control the evolution of the CAD system that was used. Any experimentation with enhancements to existing CAD tools or with brand new CAD tools would be done only in a controlled environment. One project engineer, trained in software and with CAD experience, was to be responsible for re-verifying the new functionality and "robustness" of all new CAD releases. This approach enabled the team to acquire a vastly superior design rule checker (DRC), which considerably enhanced productivity during the physical design phase of the project. This approach also differed greatly from that of the earlier project, although the lessons learned from that project considerably shaped the team's attitudes. For example, the earlier project suffered, for a while, from attempting to use a first-generation layout editor that had too many bugs. (This tool was not in fact used on any part of the final design.) It also experimented with early versions of the CHAS system. These versions did not perform as well as desired for some functions (e.g., the Assembled Block Wirelister). In contrast, the MicroVAX design teams decided to perform all layout on the industry-standard CALMA GDS II layout system, a robust and proven tool.

• The third decision involved the data management of the design database. Rather than use all the features of the CHAS system, we decided to manipulate the design data using the simpler VMS file-management system with its loose but adequate version-control mechanisms. The CHAS system was used, but in the role of tool integrator, linking, for example, the QUICKDRAW schematic editor to the SPICE circuit simulator.⁵ The CHAS system also provided a variety of valuable format conversion utilities.

• The final decision was to use one proven tool for interconnection verification. This layout extraction/verification tool, called IV, performed all the electrical connectivity checking in a very efficient manner.⁶ The earlier project had used a combination of bought-out tools and although that verification was very thorough, it was more costly than the single-tool process (IV) used on the MicroVAX project.

The Design Methodology and CAD Tool System

Having made these simplifications, the design team established a fixed definition of their design methodology and CAD tool mapping. This definition was followed faithfully throughout the life of the project. Figure 1 shows all the activities in the design phase that were supported by CAD tools. The middle column lists each activity; the left column shows the type of data used in this activity and manipulated by the CAD tools, which are shown in the right-hand column alongside the actual activity and data they support/use. The arrows indicate iteration paths where feedback is sent to a higher level. That is, where results are obtained from a checking or verification activity, it may be necessary to go back and modify an earlier set of assumptions and design decisions. For example, in running the DRC, it is highly likely that we will find design rule violations that require us to correct the physical chip layout.

[Figure 1: CAD Tools Used in the Design Phase. Columns: design representation used; design phase; CAD tool used. Tools shown: DECSIM-Behavior/AXE/HCORE, QUICKDRAW, RSIM, SPICE-GRAPES, CALMA GDS II, IV/XREF, DRC, MDP, and ad hoc project tools plus DECSIM.]

A number of very important paradigms should be noted.

1. The behavioral model of the design was kept current with the logic design of the chip to guarantee the accuracy of the microcode with the chip design.

2. The critical hurdle for the functional correctness of the design was the correct execution of a certain number of VAX macroinstructions under an automated checking process. (The tool used for this process was called AXE, an architectural test-case generator and execution tool, working in conjunction with the DECSIM system, Digital's proprietary multi-level, mixed-mode simulation system.) The minimum number of cases was 100,000 tests for each VAX instruction group. In all, more than 1 million tests were executed before the chip was fabricated.

3. The number of iterations during the layout-design phase was minimized. Changes during this phase are very expensive and the number was kept small by having the design team submit only sized schematics (i.e., ones with transistor width and length specifications that were verified using the logic and circuit simulators) to the layout design team.

4. The mask data was not submitted to the mask shop (or even generated) until all sections on the whole chip were free of design-rule and electrical-connectivity errors.

The Value of the NMOS CAD Suite on the MicroVAX II Project

Figure 2 illustrates the entire CAD suite used on the 78032 and 78132 chip designs.

Use of the CHAS System

As mentioned earlier, the final use of the CHAS system was pared down considerably by the MicroVAX project as compared with its use in the earlier project. The functions used most frequently were

• Schematic wirelisting
• Layout format conversion
• Copying files out of the CHAS database
• Plotting
• Invoking the SPICE circuit simulator and the GRAPES graphical post-processor

Behavioral Modeling and Simulation

A simulation system called DECSIM was used to simulate the behavioral definition of the chip design.³,⁷,⁸ The DECSIM system works interactively and was used to debug the high-level functional design.
This system is very reliable and proved to be a vital ingredient in achieving the high degree of accuracy of the microcode.

[Figure 2: CAD Suite Used on the MicroVAX II VLSI Design. Labels: ROM/PLA layout assembly; SPICE; GRAPES; QUICKDRAW; hierarchical behavioral/logic simulation; gate simulation.]

Schematic Capture

A drawing system called QUICKDRAW was used as a schematic editor. QUICKDRAW's greatest assets were its architectural simplicity, reliability, and ease of use. The system permitted schematic entry on low-performance graphics terminals (VT125s). Of course, keyboard entry is not always totally practical for bulk schematics entry, or even good for small schematic changes. However, QUICKDRAW could be accessed from any terminal, was easy to learn in a few hours, and could be used by the whole chip team.

Logic Simulation

As in the earlier project, the MicroVAX team decided that they needed the accuracy of switch-level logic simulation. At this level of representation, the transistors are literally treated as "switches," but with resistance and capacitance attributes. The models can also represent both bidirectionality and charge sharing. At the time, the MOS (switch-level) capability of the DECSIM software was still maturing; therefore, the team decided to use a switch-level simulator called RSIM, developed at the Massachusetts Institute of Technology. RSIM was sufficiently accurate to enable the complete design to be simulated at this level, although its timing aspects could not be used. RSIM's usage, therefore, resembled that of a logic simulation system. The prime role of this stage of the process was to prove equivalence with the higher-level behavioral model, thus gaining functional completeness at a lower, more accurate level of representation. That equivalence was achieved by supplying the same test vectors used in the behavioral phase to the RSIM runs.

Circuit Simulation

An industry-standard system, SPICE, was used for circuit simulation. SPICE was the most accurate mechanism of its kind available for simulating the electrical performance of circuits on the chips. This simulator was used extensively for circuits containing up to 1000 transistors. There were two major advantages of Digital's version of the SPICE system.

1. The device models encoded into SPICE were a very accurate representation of the devices made in Digital's NMOS process. The device equations built into these models were derived in two ways: first, by extracting the operating characteristics of NMOS devices from fabricated test chips; and second, from the results of experiments performed by another team at Digital. That team created models for devices and processes by using a battery of sophisticated simulators, such as MINIMOS, SUPREM, and SEDAN.

2. Throughout the pre- and post-processing stages, all voltage values over time from a SPICE run could be saved and later graphically analyzed by the designers in a proprietary graphical post-processing system called GRAPES. Using this system avoided having to make multiple runs of SPICE and permitted much easier interpretation of the output waveforms. Figure 3 is a sample circuit simulation waveform from the GRAPES system.

Interconnect Verification and Wirelist Extraction

The IV system, partially proven on previous chip design projects, was a major boon to this design team. The system performed several functions.

1. It extracted a wirelist (in SPICE format) from the actual layout database.

2. It calculated the parasitic capacitances for devices and nodes and fed those into the extracted wirelist. That automatic input permitted the final simulations in SPICE to be very accurate.
3. It detected any open and short circuits in the electrical network of the wirelist.
4. It compared the extracted wirelist with the original wirelist (created via the schematics editor, QUICKDRAW) and reported any mismatches in signal or node names, device sizes, and other elements.

This verification and extraction tool performed all these functions much faster and more accurately than any of the connectivity checkers or extractors that were available commercially. The IV system is generally recognized as one of the best in the industry for this purpose.

[Figure 3: Sample Output from the GRAPES System. Waveform plot of node voltages (V) against time (ns).]

The system has some unique data structures and algorithms.

1. It simplifies circuit extraction by converting all shapes into trapezoids. These are very convenient representations that permit IV to thoroughly analyze lateral node and vertical-device connections.
2. It calculates the parasitic capacitances for both area and periphery, taking into account cell-capacitance effects coming from ever-shrinking device geometries. The system also calculates coupling capacitances.
3. It performs very fast wirelist comparisons (layout to logical), using a unique graph-isomorphism algorithm that isolates errors rather than propagating them.

System Verification

The final system-level verification of the MicroVAX chips was performed using the AXE test-case generator in conjunction with the DECSIM behavioral models. In this way, test cases (which were in fact VAX macroinstructions generated by AXE) were passed to the simulation model for execution. The execution results were then compared automatically with those obtained from running the same test cases on an operational VAX system. The MicroVAX team used AXE in a particularly novel way. Via Digital's Ethernet network, they searched for in-house VAX-11/780 systems with spare capacity on non-prime shifts. Then the team activated AXE on those systems, which generated a tremendous number of test cases. This same approach was used (and continues to be used today on subsequent projects) for running CPU-intensive SPICE circuit simulations on many processors in remote locations.

VLSI CAD Beyond the MicroVAX II Project

Digital's use of the NMOS VLSI CAD suite reached a peak of maturity with the 78032 and 78132 projects. We have been able to make a major process-technology step to CMOS 1 with little cost by exploiting the same basic set of tools. That has enabled us to develop a whole new set of VLSI chip products in very quick succession.
However, following Moore's Law, it is time to face the challenge of a two- to threefold increase in complexity for the next generation of chip designs. This complexity means that design teams for new custom chips must be able to design parts with twice the transistor count of the 78032, yet take the same or less time to do it. Figure 4 illustrates the complexity that will be experienced in future chip design projects.

[Figure 4: Chip Complexity Projections. Transistor count plotted against year, 1979 to 1988. Labeled points include the T11 chip (10.5K), the F11 chip (17K), the BIIC chip (35K), the V11 IE chip (70K), and the MicroVAX II chip (125K).]

Major productivity improvements in CAD systems must be made to accomplish this doubling of the transistor count. Digital's VLSI CAD Group is now making the following improvements in its custom tool suite:

• A new system for tool integration and database management, called KATIE, is being developed to replace the CHAS system. The KATIE system has a simpler, more modular CAD kernel than the CHAS system, but with much higher performance.
• The DECSIM software is being improved to provide true mixed-mode modeling and simulation (behavioral-gate-switch). Initial results indicate a doubling of simulation productivity, and our aim is to gain equivalent performance in the separate switch and behavioral areas.
• A variety of techniques is now providing up to ten times the performance of the traditional SPICE system for circuit simulation. For example:
1. An event-driven circuit simulation system called SAMSON,⁹ which exploits the temporal sparseness of digital networks, has been developed. SAMSON offers from five to fifty times the performance of SPICE for direct current and transient analyses.
2. SPICE can be made to run much faster on vector processors and multiprocessors.
3. A timing verification system called TV can analyze critical paths at the rate of 1000 transistors per minute of CPU time.¹⁰ TV performs within fifteen percent of the accuracy of SPICE, but its speed is several orders of magnitude faster.
• Schematic entry can be improved by running QUICKDRAW on high-performance, high-resolution graphics workstations. The system will support multiwindowing, menus, and pointing devices, as well as provide high-performance wirelisting, with at least a doubling of speed over the version used on the 78032 chip design.
• High-resolution, VAX-based graphics workstations will also be used for custom layout editing, using the in-house developed editor, MEGAN.

Summary and Conclusions

The MicroVAX II project demonstrated a number of valuable lessons about CAD in general and VLSI CAD in particular.

1. The second and subsequent projects that use a particular CAD technology benefit enormously from the experience gained during the first use.
2. As a corollary to the point above, it is imperative that CAD tools and systems be built to endure at least two generations of projects. Otherwise, the cost and difficulties of using these tools will far outweigh the benefits.
3. The CAD teams should use the period of stability during these later uses of the tools to develop the next generation of more powerful tools.
4. Much conservatism exists in the IC industry around the need to archive complete images of all tools (layered products, operating systems, etc.) used in the design of an IC, along with its final mask database. Future chip teams plan to migrate their mask databases to contemporary CAD systems.
This process will use the same exhaustive checks and tools used on the original design to ensure that the conversion is thorough. In this way, there will be no need to revert to old copies of outdated systems and tools when making engineering change orders late in the product's life cycle.
5. The close coupling between chip design teams and CAD developers is an invaluable ingredient in the successful completion of chip projects.

References

1. W.N. Johnson, "A VLSI Superminicomputer CPU," IEEE International Solid-State Circuits Conference Digest of Technical Papers (1984): 174-175.
2. J.C. Mudge, C. Peters, and G.M. Tarolli, "A VLSI Chip Assembler," in Design Methodologies for VLSI Circuits, ed. P.G. Jespers (Rockville: Sijthoff and Noordhoff, 1982), 329-356.
3. A.F. Hutchings, R.J. Bonneau, and W.M. Fisher, "Integrated VLSI CAD Systems at Digital Equipment Corporation," Proceedings of the 22nd ACM/IEEE Design Automation Conference (1985): 543-548.
4. C. Mead and L. Conway, Introduction to VLSI Systems (Reading: Addison-Wesley, 1980).
5. SPICE was developed by Lawrence Nagel and Ellis Cohen of the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley.
6. W.J. Herman and G.M. Tarolli, "Hierarchical Circuit Extraction with Detailed Parasitic Capacitance," ACM IEEE 20th Design Automation Conference Proceedings (1983): 337-345.
7. M.A. Kearney, "DECSIM: A Multi-level Simulation System for Digital Design," Proceedings of the ICCD Conference on Computer Design (1984): 206-209.
8. R.R. Rezac and L.T. Smith, "Methodology for and Results from the Use of a Hardware Logic Simulation Engine," Proceedings of the ICCD Conference on Computer Design (1984): 457-461.
9. K.A. Sakallah and S.W. Director, "SAMSON: An Event Driven VLSI Circuit Simulator," Proceedings of the Custom Integrated Circuits Conference (1984): 226-231.
10. N.P. Jouppi, "TV: An NMOS Timing Verifier," (Thesis, Stanford University, 1982).

Other References

Panel Discussion, R.J. Camoin, Moderator, "Central DA and its Role: An Executive View," ACM IEEE 20th Design Automation Conference Proceedings (1983): 3-11.
R.H. Katz, "Managing the Chip Design Database," IEEE Computer, vol. 16, no. 12 (December 1983): 26-35.
W.M. van Cleemput and H. Ofek, "Design Automation for Systems," IEEE Computer, vol. 17, no. 10 (October 1984): 114-122.
J.C. Foster, "A Unified CAD System for Electronic Design," ACM IEEE 21st Design Automation Conference Proceedings (1984): 365-369.
K. Sherhart, M. Vershel, and J. Owen, "The Engineering Design Environment," ACM IEEE 21st Design Automation Conference Proceedings (1984): 466-472.
B.W. Lampson, "Hints for Computer System Design," IEEE Software, vol. 1, no. 1 (January 1984): 11-28.

Rick Spitz
Peter George
Stephen Zalewski

The Making of a MicroVAX Workstation

Developing a MicroVAX workstation required that graphics hardware and software be designed. The project team kept the hardware simple by using VAX instructions for most of the work. Extensive graphics software bridges the hardware and the graphics applications. The graphics and windowing software, UIS, is the key to that process. UIS supports transparent multitasking with a distributed method for managing regions on the screen. A video device driver manages lists of region descriptors, keeping track of keyboard and mouse changes.
The UIS system normally executes in user mode, thus minimizing overhead and utilizing the full performance of the VMS system.

When Digital decided to develop the MicroVAX series, we also began to consider how to build them into a family of low-cost VAX engineering workstations. Experience with the VAXstation 100 provided us with a great deal of knowledge related to workstation requirements. However, its architecture required extensive graphics hardware. This architectural approach was not considered viable for a low-cost, high-volume engineering workstation intended for a single user. Another approach placing greater emphasis on software was illustrated by Xerox's Star workstations, which were in use within Digital.

We decided that combining the MicroVAX processor with a low-cost graphics controller, the VMS operating system, and a good human interface would result in a powerful workstation. The VAX/VMS environment already allowed any VMS application program to run on every member of the VAX family. The MicroVAX system would extend the family to include lower-cost VAX systems. A MicroVAX workstation, in addition to running all existing VMS software, would now provide a base for graphics applications.

In the spring of 1983, a joint task force of hardware and software engineers was formed to determine how this workstation should be built. Our strategy was to design a product based on the MicroVAX I system and evolve it to a mature workstation using the MicroVAX II system.

The task force's objective was to set the overall goals of the project and to make sure that the graphics hardware and software were well integrated. Guided by a strong focus on time to market, the workstation hardware group had the responsibility of building an initial graphics controller.
They were also chartered to initiate design work on future hardware graphics controllers with more features and higher performance. The VMS software group took on the role of developing the software components. This paper is written by members of the VMS Development Group; therefore, its primary emphasis is on the software aspects of this project.

Our first task was to make sure that the graphics hardware being defined was suitable for efficient use by the software. Having limited experience with low-cost graphics controllers and workstations, we proposed a strategy of using a very basic Q-bus controller and doing most of the work with VAX instructions. This approach was viable because the VAX instruction set is rich and versatile in the area of character and bit manipulation. It also minimized the risk in developing hardware and provided maximum flexibility for the graphics capabilities. With greater freedom in the software design, we could gain experience and provide better direction for hardware features needed in future graphics controllers.

Since no MicroVAX CPU had yet been developed, we built a breadboard hardware configuration to do hardware and software evaluations. MicroVAX systems execute a subset of the full VAX instruction set in hardware; however, software emulation of the other instructions allows all VAX software to run transparently. For cost and space reasons, MicroVAX systems were targeted to use the Q-bus for I/O, while most existing VAX systems used the UNIBUS for most peripherals. The breadboard configuration consisted of a VAX-11/750 system with a UNIBUS-to-Q-bus adapter. We obtained some experimental Q-bus graphics controllers used in the development of the graphics interface for the PRO 350 hardware.
Using this configuration, we evaluated the performance of text and graphics by implementing a number of software algorithms.¹ This technique treated display memory as standard VAX program memory, and VAX character and bit instructions were used to generate text and graphics. Evaluation of our results showed that this approach was reasonable and the basic performance was acceptable; however, some assists were still needed in hardware.

The VCB01 Hardware Graphics Controller

Taking our results back to the joint task force, we settled, after several iterations, on a hardware design. The hardware graphics controller was named the VCB01, known internally as the Q-bus video subsystem, or QVSS. Due to space and power constraints in MicroVAX packages, the controller had to fit on a single-quad Q-bus module. It contained 256K of bitmap memory that was fully addressable by any VAX instruction. That amount of memory was more than was needed to fill a full-screen video monitor. The extra memory would allow software graphics routines to operate directly on occluded areas of windows in the video display memory.

Based on inputs from the software evaluation, the hardware would also contain a scan-line map to allow mapping any scan line in display memory onto the physical screen. This technique allows much better scrolling performance, facilitates the management of occluded window areas, and allows the simultaneous support of different windowing systems. A 16 X 16-pixel cursor plane, a separate hardware component, greatly simplified the software logic required to manage the mouse cursor. The pattern is programmable to allow dynamic changes to the cursor pattern, depending on its screen location and the state of the workstation. In addition, a mouse interface and dual UART are provided to connect to a mouse, a keyboard, and an optional tablet. The inherent simplicity of the hardware allowed the hardware team to produce the first prototype by the early summer of 1983. Figure 1 shows a block diagram of the VCB01 configuration.

[Figure 1: Block Diagram of the VCB01. Labeled blocks: bitmap (2048 lines by 1024 bits), scan map (1024 entries), physical screen (864 lines by 960 bits), and cursor (16 by 16 bits).]

Software Architecture

The software team was chartered to develop a general software workstation architecture. Our goal was to allow the evolution of future MicroVAX workstations that would address cost-sensitive markets with basic, inexpensive hardware. We also wanted to improve performance and take advantage of features to be provided by more-intelligent hardware graphics controllers in the future.

Our performance evaluation of the VAXstation 100 architecture pointed out that the central dispatcher needed to manage the windowing activities on the physical screen was a real bottleneck. Therefore, we elected to pursue an approach that used a distributed method to manage regions on the physical screen. In most cases this approach would allow an individual job, called a process in the VMS system, to operate directly on bitmap memory. There is much less overhead than context switching between processes, as required in a centralized screen manager design.

The software architecture that we defined was implemented by a loadable set of VMS system services known as the User Interface Services, or UIS.² UIS provides fundamental graphics services and display list capabilities. Application programs, high-level graphics packages, and VMS's VT100 and TEK4014 emulation drivers all utilize UIS to construct individual windows, as well as for text and graphics functions.³
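The scan-line map described for the VCB01 lends itself to a small illustration. The following Python sketch is a software model, not the hardware or driver interface: the class name, method names, and the way an overlay is saved and restored are all hypothetical, and the line counts are assumptions chosen for the example. It shows why scrolling through a map is cheap: only map entries move, never the pixels themselves.

```python
# Illustrative model only: the names are hypothetical and the sizes are
# assumptions for this sketch, not a description of the actual registers.
BITMAP_LINES = 2048   # assumed number of lines held in bitmap memory
SCREEN_LINES = 864    # assumed number of visible scan lines


class ScanLineMap:
    """Maps each physical scan line to the bitmap line it displays."""

    def __init__(self):
        # Entry i names the bitmap line shown on physical scan line i.
        self.entries = list(range(SCREEN_LINES))
        self._saved = None

    def scroll_up(self, top, bottom, count):
        """Scroll screen lines [top, bottom) up by `count` without copying pixels.

        Returns the bitmap lines that rotate to the bottom of the region;
        a caller would clear them before drawing new content there.
        """
        region = self.entries[top:bottom]
        recycled = region[:count]
        self.entries[top:bottom] = region[count:] + recycled
        return recycled

    def show_overlay(self, first_bitmap_line, count):
        """Point the top `count` screen lines at another part of bitmap memory."""
        if first_bitmap_line + count > BITMAP_LINES:
            raise ValueError("overlay extends past the end of bitmap memory")
        self._saved = self.entries[:count]
        self.entries[:count] = range(first_bitmap_line, first_bitmap_line + count)

    def hide_overlay(self):
        """Restore the entries that show_overlay replaced."""
        if self._saved is not None:
            self.entries[:len(self._saved)] = self._saved
            self._saved = None
```

Scrolling a ten-line region by three lines, for example, rewrites ten map entries and leaves the bitmap untouched; an overlay works the same way, by redirecting entries to off-screen bitmap lines and putting the saved entries back afterward.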
A VCB01 device driver is used to manage the physical hardware.⁴ The driver is responsible for controlling the keyboard, the mouse (pointer), and the scan-line map.

VCB01 Video Device Driver

The video device-driver software has one primary function: to manage lists of region descriptors. In particular, it keeps three main lists, one each for keyboard input, button transitions, and pointer (mouse) movement. To be notified about a particular event, an application program posts a request to the driver. The request specifies the type of event desired and the region on the screen. The driver then places this request on the appropriate list. For example, if pointer movement requests are active and mouse movement occurs, the driver will search the list for the entry that has specified a region that the pointer is currently within. The driver then notifies the application that was the last one to specify this area. The notification mechanism used is a software interrupt, known in the VMS system as an asynchronous system trap. This trap interrupts the flow of the specified user process and invokes a user-defined action routine. This technique provides a low-cost, responsive notification to the application.

The keyboard is connected to the device driver by a dual UART on the video controller. A hardware interrupt is delivered to the driver each time a key is pressed. The driver then searches the keyboard list and delivers the character to the process associated with the top entry on the list. All keys are "soft," which means that any key on the main keypad can be defined as any of the possible ASCII character codes. It is also possible to define multicharacter sequences for a given key. The second half of the dual UART is used to support a bit tablet or a serial mouse. These devices need to send several bytes of data for each pointer or button transition.
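The region-descriptor search described for pointer events can be sketched as follows. This is a simplified Python model under stated assumptions: the class and field names are hypothetical, a plain callback stands in for the asynchronous system trap, and rectangles are treated as half-open pixel ranges. The one behavior taken directly from the text is that the most recently posted request covering the pointer position is the one notified.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RegionRequest:
    """A posted request: a screen rectangle plus a routine to call (hypothetical)."""
    x0: int
    y0: int
    x1: int
    y1: int
    notify: Callable[[int, int], None]   # stands in for the software interrupt

    def contains(self, x: int, y: int) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


class EventList:
    """One list of region requests, such as the pointer-movement list."""

    def __init__(self) -> None:
        self._requests: List[RegionRequest] = []

    def post(self, request: RegionRequest) -> None:
        self._requests.append(request)

    def dispatch(self, x: int, y: int) -> Optional[RegionRequest]:
        # Search newest first, so the last application to claim an area wins.
        for request in reversed(self._requests):
            if request.contains(x, y):
                request.notify(x, y)
                return request
        return None
```

A driver built this way would keep one such list per event type (keyboard input, button transitions, pointer movement) and call `dispatch` from its interrupt handling path.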
The driver buffers this data until it receives enough to decode an event. Then it searches the appropriate event list and, if necessary, delivers a software interrupt to the application.

The driver supports the capability to specify cursor patterns for a region. When cursor movement is detected, the driver searches a list to determine what the cursor pattern should be for the current location of the pointing device. Once located, the pattern is loaded into the hardware. The video controller hardware then superimposes the pattern onto the appropriate screen area by merging the pattern with the video signal from the bitmap memory. This procedure eliminates the need for a save-and-restore operation in the physical bitmap each time the cursor moves or a write to bitmap memory occurs. The hardware also has the ability to specify two logical operations, NAND and XOR, on the cursor pattern. This ability prevents a white cursor from being lost on a white screen, or a black cursor on a black screen. The driver tests the physical bitmap location that is overlaid by the cursor to determine which logical operation should be used to maximize the cursor's visibility.

A proportional-acceleration movement algorithm is used to minimize the desktop area required for a mouse pointer. The driver accelerates the cursor's movement if the mouse's rate of movement exceeds any of a series of thresholds in a given screen refresh interval. If no acceleration were to occur, it would take a desktop space of approximately 13 by 11 inches to move the mouse both horizontally and vertically, respectively, across the screen. With acceleration, a mouse movement of only 2 inches is needed to move across.
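The proportional-acceleration scheme can be sketched in a few lines of Python, using the four threshold bands the article gives for this algorithm (1 to 2, 3 to 4, 5 to 8, and more than 8 pixels per refresh interval). The function names are hypothetical, and applying the factor independently to each axis is an assumption of this sketch; the article does not say how the driver combines horizontal and vertical movement.

```python
def acceleration_factor(pixels: int) -> int:
    """Return the multiplier for one refresh interval's linear movement."""
    distance = abs(pixels)
    if distance <= 2:
        return 1      # 1 to 2 pixels: no acceleration
    if distance <= 4:
        return 2      # 3 to 4 pixels: accelerate by 2
    if distance <= 8:
        return 4      # 5 to 8 pixels: accelerate by 4
    return 6          # more than 8 pixels: accelerate by 6


def accelerate(dx: int, dy: int) -> tuple:
    """Scale a raw mouse delta into a cursor delta.

    Assumption: each axis is scaled by its own factor.
    """
    return dx * acceleration_factor(dx), dy * acceleration_factor(dy)
```

With these bands, a slow, precise movement passes through unchanged, while a fast sweep covers up to six times the raw distance.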
The acceleration values used are as follows: 1 to 2 pixels of linear mouse movement per screen refresh interval, no acceleration needed; 3 to 4 pixels, accelerate by a factor of 2; 5 to 8 pixels, accelerate by a factor of 4; greater than 8 pixels, accelerate by a factor of 6.

The driver provides an optional console window to allow system-level debugging. The MicroVAX CPU can communicate directly with the video controller during booting and debugging. If this feature is enabled, the top 240 scan lines of video memory will be allocated for the console window. When the CPU wants to communicate with the console, the VMS console driver will map directly to those 240 scan lines. Thus, the console driver emulates a "dumb" terminal in this region. When a function key is pressed on the keyboard, the video driver will map this special console memory onto the top 240 entries of the physical scan-line map, and the operator console will appear. When the key toggles again, the top 240 entries of the scan-line map will be restored.

UIS Graphics and Windowing Software

The decision to use simple hardware meant that software had to be developed to bridge the gap between that hardware and the applications. This software was of critical importance because the hardware designers assumed that a software layer would be needed to support even the most basic graphics functions.

Early in the design process, we decided that this software would provide more than just basic I/O support through the video controller. Like the VMS operating system it was built on, the workstation graphics and windowing software, UIS, would support transparent multitasking. That meant being able to handle simultaneous demands by multiple independent applications on the shared VCB01 hardware resources. Therefore, UIS should be designed to provide two capabilities.
First, it should have a library of general-purpose procedures that applications could use to easily access the hardware resources. Second, UIS should contain transparent management and synchronization mechanisms. In that way, independent applications could share both screen space and the use of the system's input devices. This design would also allow the development of UIS application programs on any VAX system, whether it was a workstation or not.

For the initial release of the MicroVMS workstation on the VAXstation I, these objectives were broken down into the following specific design goals:

• Provide routines for creating and manipulating viewports on the video display.
• Support multiple overlapping viewports and manage viewport occlusion transparently for applications.
• Allow simultaneous graphics operations into all viewports.
• Provide a user interface for viewport manipulations.
• Provide routines for creating graphics objects.
• Provide display-list backup for graphics operations so that applications can easily perform operations like "pan" and "zoom."
• Support shared access to the mouse and keyboard and provide routines to notify applications of input events occurring on these devices.

The following sections describe the architecture of UIS and the mechanisms that were used to realize these goals. Figure 2 is a block diagram showing the functions of UIS.

Virtual Displays

The fundamental presentation object manipulated by applications to construct images is the virtual display. All UIS output functions are performed within a virtual display. The coordinate system of a virtual display is defined in "world coordinates." The world coordinate system uses the coordinate system of an application as a means of expressing display locations.
For example, an application that draws a graph showing population growth versus time may find it convenient to use "Time" and "Number of People" as x and y coordinates. The range of world-coordinate values is specified to the graphics subsystem when the virtual display is created. The coordinates are specified as signed F-floating VAX data types for reasons of precision and ease of calculation in high-level languages.

[Figure 2: UIS Functional Block Diagram. Blocks: user application program; UIS shareable image; UIS graphics/windowing system services (binary encoding); encoding dispatcher; graphics execution routines and routines to update the display list; bitmap graphics routines (GER); display viewport services (VPS); VCB01 device driver; display memory (VCB01 or VAX).]

A display list is an encoding of the exact contents of a virtual display, independent of the device. Display lists are maintained and used by UIS to achieve the following short- and long-term goals:

• Allow the automatic management of panning, zooming, resizing, and duplicating display windows
• Allow high-resolution printing of virtual displays
• Allow the structuring and manipulation of virtual-display objects
• Allow an application to select an arbitrary output from a virtual display, give it to an "intelligent" cooperating application, or simply store it in a file as generic encoding, and then later replay the generic encoding into a new virtual display

Display lists consist of the following basic objects:

• Output primitives
• Attribute primitives
• Structural primitives

Output primitives map directly onto the UIS output operations (e.g., plot some lines, write a text, draw a circle) and the modifications that they make to a virtual display.
Attribute primitives change the current value of an attribute in an attribute block in order to affect subsequent output primitives. Attribute blocks are used by UIS to specify a set of attribute values for all UIS graphics objects (lines, text, circles). Typical attributes include the writing mode (replace, complement, erase), line style (solid, dashed), and font to use when writing text. There may be up to 256 attribute blocks addressable at one time. Attribute block numbers are used and assigned only by the application, except for attribute block 0. This block is a special one that cannot be modified. It provides a set of attributes used as a standard default for text and graphics. Block 0 also provides a template for creating alternate attribute blocks.

Structural primitives allow the hierarchical grouping of attribute and output primitives into graphical begin and end blocks, called segments. Segments allow applications to have access to many more than 256 attribute blocks. While segments inherit current attribute blocks from higher-level segments, modifications to attribute blocks from within a segment cause local copies of the modified attribute blocks to be created. For example, if a particular attribute block is referenced within a segment, then that segment is first searched for the block. If the block isn't found, the search is made in successive outer segments.

The coordinate system, called normalized coordinates, is used both within the display list and when creating generic encoding. Normalized coordinates are used to defer the mapping of a set of world coordinates to specific device coordinates until the actual output device is known. As described in the following section, this mapping to the physical device does not occur until a display viewport is created.
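The attribute-block lookup through nested segments can be illustrated with a short Python sketch. It is a model under assumptions, not the UIS data structure: the class name, method names, and the particular default attribute values are hypothetical. The behaviors it demonstrates come from the description above: block 0 cannot be modified and serves as a template, lookups search the current segment first and then successive outer segments, and a modification made inside a segment creates a local copy.

```python
# Hypothetical defaults standing in for attribute block 0.
DEFAULT_BLOCK = {"writing_mode": "replace", "line_style": "solid", "font": "default"}
MAX_BLOCKS = 256


class Segment:
    """A display-list segment holding local copies of attribute blocks."""

    def __init__(self, parent=None):
        self.parent = parent
        # Only the outermost segment holds block 0, the unmodifiable default.
        self.blocks = {} if parent is not None else {0: dict(DEFAULT_BLOCK)}

    def lookup(self, number):
        """Search this segment first, then successive outer segments."""
        segment = self
        while segment is not None:
            if number in segment.blocks:
                return segment.blocks[number]
            segment = segment.parent
        raise KeyError(number)

    def set_attributes(self, number, **changes):
        """Modify a block by making a local copy in this segment."""
        if not 0 < number < MAX_BLOCKS:
            raise ValueError("block 0 cannot be modified; valid numbers are 1-255")
        try:
            base = self.lookup(number)
        except KeyError:
            base = self.lookup(0)      # block 0 acts as the template
        local = dict(base)
        local.update(changes)
        self.blocks[number] = local
```

Because each segment keeps its own copies, an inner segment can redefine block 1 without disturbing the meaning of block 1 in the enclosing segment, which is how segments give access to more than 256 distinct attribute sets.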
This delay is important since output devices have different resolutions. For example, printers typically have much higher resolutions than video monitors.

Since floating point calculations are typically slower than integer ones, normalized coordinates are expressed in units called "Gutenbergs," which are stored as 32-bit integers. A Gutenberg, the same unit used in UIS font definitions, is defined to be 1/7200 inch (0.01 point). Their use as normalized coordinates is well suited because they minimize the number of coordinate transformations that must be performed when writing text. Gutenbergs have the desirable characteristics of being both reasonably small, and therefore amenable to good graphics resolution, and very efficient for text operations.

The conversion between world and normalized coordinates is based on the desired physical size and world-coordinate size of the virtual display as specified by the application. When a virtual display is created, the application expresses the desired size of the virtual display in both physical and virtual units. That establishes the relationship between the physical size of the fonts and the arbitrary size of a virtual display's world-coordinate system.

Display Windows and Viewports

A display window is the object used by applications to control how much of a virtual display is available for viewing by the user. This control is accomplished by defining a rectangle specifying the viewable portion of the virtual display. A display viewport is the area of the physical screen into which a display window is mapped. Display viewports vary in size and may be placed anywhere in the physical screen area. Display viewports always occlude when they overlap.
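The chain of conversions from world coordinates to Gutenbergs to device pixels can be sketched in Python. The only figure taken from the text is the unit itself, 1/7200 inch (0.01 point). The function names, the linear form of the world-to-normalized mapping, and the device resolutions in the usage note are assumptions made for illustration.

```python
GUTENBERGS_PER_INCH = 7200    # from the text: 1 Gutenberg = 1/7200 inch
GUTENBERGS_PER_POINT = 100    # from the text: 1 Gutenberg = 0.01 point


def world_to_gutenbergs(value, world_min, world_max, physical_inches):
    """Map a world coordinate to an integer count of Gutenbergs.

    Assumes a linear mapping of the world range onto the physical size
    the application requested for the virtual display.
    """
    fraction = (value - world_min) / (world_max - world_min)
    return round(fraction * physical_inches * GUTENBERGS_PER_INCH)


def gutenbergs_to_pixels(gutenbergs, pixels_per_inch):
    """Map Gutenbergs to device pixels once the output device is known."""
    return round(gutenbergs * pixels_per_inch / GUTENBERGS_PER_INCH)


def points_to_gutenbergs(points):
    """Font sizes in points convert by a simple integer multiply."""
    return points * GUTENBERGS_PER_POINT
```

For example, the midpoint of a 0-to-100 world range on a virtual display 10 inches wide falls at 36,000 Gutenbergs. On a hypothetical 80-pixel-per-inch monitor that becomes pixel 400, and on a hypothetical 300-dot-per-inch printer it becomes dot 1500, with no change to the stored display list.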
The order of occlusion usually depends on the order in which the display viewports were created. However, the order may be altered by the user through the UIS user interface or by applications using the UIS windowing services.

A display window is created, mapped, and automatically scaled to a display viewport when the application makes a single routine call to UIS. Note that at the time of the call, the output of the UIS application is directed to a specific physical output device, usually the screen. Scaling can be avoided if the application directs UIS to use the physical size supplied by the application when the virtual display was created. That allows text and graphics to appear in exactly the size and aspect ratio that an application considers ideal.

The amount and size of the image that appears in a display viewport can be controlled by altering the size and position of the display window or the size of the display viewport. The image can be managed by either the application, through UIS, or the user, through the user interface functions. The following rules govern the image:

• To magnify the image, either the size of the window is decreased without altering the viewport, or the size of the viewport is increased without altering the window.
• To reduce the image, either the size of the window is increased without altering the viewport, or the size of the viewport is decreased without altering the window.
• To change the amount of the virtual display being viewed without scaling, both the window and the viewport size are expanded or contracted by the same amount.
• To pan the image, the window around the virtual display is moved without altering the viewport size or location.

Figure 3 illustrates the mapping that takes place when going directly from a virtual display to a physical display. The left column shows the transformations between the coordinate spaces. The two columns on the right show the way the virtual display is scaled to the final output device.

[Figure 3: Mapping from Virtual-to-Physical Display. Stages shown: world coordinates; virtual display (sized to); display list encoding in normalized coordinates; display window (clipped to); output primitive execution routines; display viewport (sized to, device-specific coordinates); physical display (written to).]

Virtual Keyboards

Applications use a concept called virtual keyboards to share and individually manipulate the physical workstation keyboard. Virtual keyboards allow an application to get input from the physical keyboard and to modify its characteristics, both in a synchronized manner. Input can be received in either of two forms. First, applications can specify that they be delivered a software interrupt whenever keyboard input occurs. Second, they can periodically poll the virtual keyboard to see if new input has occurred. Certain characteristics can be managed for each virtual keyboard, such as keyclick volumes and keyboard key mappings.

The connection between the physical keyboard and the various virtual keyboards available on the workstation is generally managed by the user. An application could force the physical keyboard to be bound to a virtual keyboard. Typically, however, the application will associate the keyboard with some display viewport and allow the user to manage that connection through the user interface.

Mouse Input

Applications can both solicit and manage input from a mouse with respect to rectangles within display viewports. To do that, an application must specify a world-coordinate rectangle and the display viewport to which the rectangle applies. The application then directs the UIS to

• Change the cursor pattern or position when the cursor moves within the rectangle
• Send a software interrupt whenever the cursor moves within or out of the rectangle
• Send a software interrupt whenever a mouse button is depressed or released within the rectangle

Applications can also check the current mouse position or button state at any time.

Implementation Details

UIS was designed with two primary implementation goals in mind. Of course, the first goal was to implement the architecture described in the previous sections. Just as important was the belief that the cost of using UIS had to be as small as possible. The overhead associated with a routine call had to be minimized, and the algorithms and architecture employed by UIS had to be as efficient as possible. UIS also had to be fast because the simple graphics hardware relied upon UIS software to take the place of sophisticated graphics hardware. To meet these goals, the software team made some basic design decisions right at the start. The effect of these decisions on how the design operates is discussed in the following section.

UIS operates in the caller's mode (usually user mode) because the cost involved in changing to kernel mode would be prohibitive. Because UIS operates in user mode, all data structures used by UIS are given user-write protection. This design decision means that timesharing use of the graphics package is possible, but without any security considerations.

Most of the UIS code resides in system space, and UIS routines exist as system services within the VMS operating system. That gives UIS all the desirable performance characteristics of operating system code (i.e., minimal image activation cost, maximum shareability, separately managed paging, etc.).

Fonts are stored in files and treated as system resources. Since several applications are likely to use the same fonts at the same time, UIS font management was designed to optimize font sharing. Fonts currently in use are kept in a font pool in system memory. Upon beginning a text drawing operation, a process accesses the system font pool to find the required font. If not found in the pool, a font can be loaded into the font pool by searching the disk for the proper font file and then reading it into system memory. Similarly, fonts can be removed from the font pool because they can always be retrieved from disk.

Each virtual display is managed by only one process. That synchronizes the access to virtual displays and display lists and minimizes the effect that graphics applications have on each other. If a second process wants to manipulate the virtual display of another process, then the applications running in the two processes must communicate. The process that created the virtual display must then make modifications to it. This concept is enforced by the fact that the contexts for all virtual displays reside in process address space.

Data structures for display viewports, on the other hand, are kept in system space. That allows a process to change the topology of the viewports on the video display. For example, a viewport bound to a display window that it owns can be "popped" without having to notify every other process of the necessary screen changes. The storage for viewport data structures is allocated from paged pool. However, the storage protection must be changed to user write to allow access by the process-based graphics routines. Access to those data structures by UIS routines is synchronized using the VMS lock manager.
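The font pool behavior described earlier in this section (look in the shared pool first, load from disk on a miss, and discard freely because the file can always be read again) is essentially a shared cache. A minimal sketch follows; the class FontPool and the load_from_disk callable are hypothetical names, and the sketch ignores the cross-process sharing and synchronization that the real system pool requires.

```python
class FontPool:
    """Toy model of a shared font pool backed by font files on disk."""

    def __init__(self, load_from_disk):
        # load_from_disk is a hypothetical callable: font name -> font data.
        self._load_from_disk = load_from_disk
        self._fonts = {}

    def get(self, name):
        """Return the font, loading it into the pool only on a miss."""
        if name not in self._fonts:
            self._fonts[name] = self._load_from_disk(name)
        return self._fonts[name]

    def evict(self, name):
        """Drop a font from the pool; it can be reloaded from disk later."""
        self._fonts.pop(name, None)
```

A second `get` for the same font does not touch the disk again; only a `get` after an `evict` does.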
Multiple processes are granted shared read/write access to the physical display as long as they are simply reading from or writing to their own viewports. If a process needs to change the relationships between the display viewports on the screen (e.g., create a new viewport or pop an existing viewport), it must request exclusive read/write access to the physical display. Thus, no synchronization overhead is incurred in the steady state. Figure 4 depicts the basic use of storage by UIS.

[Figure 4: UIS Storage. System space: mapped VCB01 memory including the physical bitmap; paged pool holding backup viewport bitmaps, viewport data structures, and fonts. Process (P1) space: process-permanent display context (deleted at process rundown). Process (P0) space: non-process-permanent display context (deleted at image rundown).]

As shown in Figure 4, UIS software is organized into five basic parts.

The first piece of UIS that applications encounter is the UIS shareable image. UIS routines are accessed by applications through transfer vectors in a VMS-protected shareable image. That allows UIS code to increase in size and to change location within the operating system without affecting the applications that use the code. Also, UIS application development can occur on machines where UIS has not been installed. The UIS shareable image can be used to resolve UIS references at link and image activation time, even if the UIS system services are not present on the system. Finally, because the shareable image is protected, UIS can get control during image rundown and perform some necessary clean-up activities.

The shareable image performs the requested operation by calling the appropriate UIS system service. At this point, user requests are translated into calls to internal UIS routines, and the relevant internal data structures are located. For example, for a typical keyboard operation, UIS would locate the right virtual keyboard and make the appropriate calls to the VCB01 device driver.

For a typical output operation, such as drawing a line, UIS first creates a display list entry. UIS then calls the display list management routines to update the display list and all windows into the virtual display. These routines, in turn, will check with the viewport service routines (VPS) to find the right area of the physical screen in which to draw. Finally, the management routines direct the bitmap graphics execution routines (GER) to draw to those areas.

VPS is more than a simple screen rectangle manager. Its tasks are

• To present the rest of UIS with the "illusion" that viewports are always unoccluded and are contiguous pieces of hardware video controller memory
• To take advantage of VCB01 scan-line scrolling whenever possible
• To provide bitmap backup for occluded windows so that applications are free from the complexities of occlusion management

VPS does this by judiciously using and mixing three different types of video memory: on-screen VCB01 memory, off-screen VCB01 memory, and off-screen VAX memory. VPS also manipulates the entries in the VCB01 video scan-line map to present UIS with a virtual scan-line map, or virtual viewport, for each physical display viewport.

If the physical display has only one viewport, VPS will simply allocate a set of physical VCB01 scan lines and set up the viewport data structures to direct GER to that set. In this case, the physical and virtual viewports will be the same.
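The access rule given earlier for the physical display (shared access while processes only touch their own viewports, exclusive access to change the viewport layout) follows the usual shared/exclusive compatibility pattern. The sketch below models only that compatibility rule with a non-blocking try-acquire; it does not use or imitate the VMS lock manager interface, and the class and method names are hypothetical.

```python
class DisplayLock:
    """Toy shared/exclusive lock for the physical display (no blocking, no queueing)."""

    def __init__(self):
        self.shared_holders = set()
        self.exclusive_holder = None

    def try_acquire(self, process, exclusive=False) -> bool:
        """Grant the request only if it is compatible with the current holders."""
        if self.exclusive_holder is not None:
            return False                      # nothing is compatible with exclusive
        if exclusive:
            if self.shared_holders - {process}:
                return False                  # other processes are still drawing
            self.exclusive_holder = process
            return True
        self.shared_holders.add(process)      # any number of shared holders
        return True

    def release(self, process) -> None:
        """Release whatever access the process currently holds."""
        if self.exclusive_holder == process:
            self.exclusive_holder = None
        self.shared_holders.discard(process)
```

In the steady state every request is a shared one and is granted immediately, which matches the observation that no synchronization overhead is incurred unless the viewport layout changes.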
However, if the display has occluding viewports, VPS will create a virtual viewport in off-screen memory for each physical viewport. Then, at 80-millisecond intervals, VPS will copy the modified contents of the virtual viewports to the physical viewports. If changes must be made to the VCB01 video scan-line map, then VPS will update them. These changes could be caused by either a viewport that needs to be hardware scrolled or a change in the layout of the viewports on the physical screen. VPS then merges all the virtual scan-line maps and requests an update of the physical scan-line map. Those actions are done in synchronization with the 60-Hz video vertical-retrace interval.

Summary

Our initial goals were to design a workstation product with the MicroVAX I system, thus providing a stable, mature product available for the MicroVAX II system. The joint engineering task force was initiated in the spring of 1983; prototype graphics hardware was available in the early summer. Once that preliminary hardware was ready, the VMS team entered into full-scale development. The VAX/VMS workstation (VWS) product was developed during the fall and winter of 1983, and into the spring of 1984. VWS underwent customer field test with the VCB01 graphics controller, the MicroVMS system, and the MicroVAX I system in the summer and early fall of 1984. The first release of the VAXstation I was available in late 1984. This initial product allowed third-party VAX software vendors to take advantage of the VWS architecture.

Later, the VAXstation II replaced the MicroVAX I CPU with a MicroVAX II engine, thus gaining much higher performance.
The MicroVAX II processor entered customer field test in the early spring of 1985, with shipments to customers by early summer. A new VWS software release that supported the VAXstation II was made available shortly afterwards. That VMS software was the fulfillment of this project's long-term goal.

References

1. J. D. Foley and A. van Dam, Fundamentals of Interactive Computer Graphics (Reading: Addison-Wesley, 1982).
2. MicroVMS Workstation Graphics Programming Guide (Maynard: Digital Equipment Corporation, Order No. AA-G110B-TN, 1985).
3. MicroVMS Workstation User's Guide (Maynard: Digital Equipment Corporation, Order No. AA-EZ24C-TN, 1985).
4. MicroVMS Workstation Video Device Driver Manual (Maynard: Digital Equipment Corporation, Order No. AA-DY65C-TE, 1985).

Acknowledgements

We would like to acknowledge the contribution made by Dick Hustvedt to the MicroVAX workstation effort. Dick was instrumental in spearheading this undertaking. The contributions of Cathy Learoyd, Tom Furlong, Rob Scott, John DiMack, Mike Rosenblum, Jake Vannoy, and the rest of the VMS workstation team were also invaluable.

Nicholas A. Warchol
Stephen F. Shirron

The RQDX3 Design Project

The RQDX3 is a Winchester and floppy disk controller aimed specifically for use on MicroVAX II systems. The designers followed a top-down development process to meet their goals. Trade-offs, some requiring hardware and firmware to be built and tested for reliability, were identified and evaluated early in the project. The RQDX3 has a three-port data buffer to smooth data transfers between the host processor, the controller's microprocessor, and the disks. Four internal subsystems work in parallel to allow maximum system performance.

Design Goals

The project team set a number of specific goals at the start of the RQDX3 design.
The greatest need was to improve the performance of the MicroVAX II system over that available with existing controllers, yet greatly reduce the manufacturing costs of the disk subsystem. The following list contains the goals that governed the design of the module:

• Cost - Obtain a manufacturing cost less than half of the best current disk controller, the RQDX2.
• Performance - The controller should not force an interleave of data sectors on the surface of the hard disk drives or limit the performance of the Winchester disk drives. The controller should also avoid wasting system bus bandwidth on the Q-bus. The controller architecture had therefore to be chosen to allow the highest performance possible while meeting the other design goals.
• Dual Module - The controller should be designed so that it will fit on one Q-bus dual module. This form factor will allow the most flexible system configurations.
• Schedule - First customer shipment would be approximately one year from the project start. Meeting this goal would allow the phase-out of the higher cost and lower performance RQDX1 and RQDX2 modules.
• Testable Design - A high percentage of this module would be testable by providing extra hardware, microprocessor code, and test strategies. This design would help to reduce both manufacturing and maintenance costs.

The Design Philosophy

The team members decided that a top-down approach to the problem was the only way that the design goals could be met. A well structured, well documented design would allow the maximum communication between team members, and it would allow trade-offs to be made early in the design cycle. The design process used in the project adhered to the following form:

• Set the goals and assign priorities to determine how flexible each one is; that will allow trade-offs to be made if a goal is not attainable.
• Collect and study any overall system specifications and requirements that apply. This is the time to write the preliminary engineering specification and define the interfaces (both hardware and software) that must be adhered to. Any impulse to go back and change these specifications should be vehemently resisted.
• Analyze the problem and determine the system architecture based on the flow of information and the complexity of the required control functions. If the problem appears too large or is not easy to document or describe, then it should be divided into smaller, more manageable functions. During this phase, operational descriptions are created. Those can be flow diagrams, timing diagrams, state transition diagrams, or anything that will help to explain how the controller should work. These descriptions should be included as part of the documentation package.
• Look for the solution to each problem while weighing it against the design goals. Iterations between this step and the previous one can be expected in order to meet the goals. This part of the process involves looking at the available technologies and other designs to determine what is or is not usable. If other designs have followed the same documentation strategy, then this task is much easier; if they have not, then do not waste too much time trying to "reverse engineer" those designs. The risk of using new technologies must be assessed to determine what impact they would have on the design's cost and schedule. The hardware design is documented using drawings called functional partitions. These drawings are a hierarchy showing the interconnection of functional, not physical, pieces of the design. All datapaths and control signals are named at this time. The drawings will be the reference point of the design team and make up a major portion of the design package. Because of the functional nature of these drawings, simulation of the design can be accomplished in a structured form. At this time, a technical description document is written to allow others outside the design team to understand the operation of the design. This document is especially useful in training new groups about the design as it progresses from the design phase to the manufacturing phase.
• "Paper debug" the design. This is an in-depth review by the design team before any hardware is built. The process begins with the operational descriptions and follows the documentation hierarchy down to the lowest level of the design. Normal operations and error conditions are checked, and each element is analyzed for test and diagnostic coverage. Mistakes found at this stage are much easier to fix on paper than in circuit boards, gate arrays, or software debugging.
• Build a prototype. This process includes the drawing of schematics to show the interconnection of the physical pieces, the layout of circuit boards, the development of gate arrays, and the writing of software routines that interface to the hardware.
• Debug the prototype. If the paper debug was done correctly, this stage should not uncover any disasters. The individual functional pieces of the design can be tested and checked off using the functional partitions as a guide. That systematic method will ensure that the entire design is tested.

The design process is the solution to a multidimensional problem. Therefore, there is probably more than one design that will meet the goals. There is also the probability that it may be impossible to meet all the goals. In this case, some compromise in the goals must be made in order to make a solution possible.
This design problem is like those encountered in most other designs: Make it fast, cheap, small, reliable, and don't take too much time. With each goal being constrained by others, the need for a structured method of finding a solution becomes more important. The way to solve a set of simultaneous equations is not to try a solution and see if it fits, but to use some proven techniques to determine the correct solution. Dividing the overall problem into smaller ones and then determining a solution is probably the most powerful technique that can be applied.

Design Implementation and Testing

Attacking the Goals

Each goal placed some unique restrictions on the design. Thus, it was important to understand the effect of each goal and how flexible the achievement of that goal was. By keeping a constant watch on how the goals were being met, trade-offs could be made very quickly. The following discussion details each goal and how it was handled:

• Cost - This was the original goal that caused the creation of the RQDX3 project. The cost/performance relationship was higher than desirable for the current disk controllers. A project like the MicroVAX II system, in order to obtain a good market share, needed to improve this relationship by reducing the cost of the disk subsystem. Therefore, it was very important for us to attain our cost goal. To do that we placed a restriction on which components or technologies could be used, and what the assembly cost of the module could be. Maximizing the number of machine-insertable parts therefore became an important consideration.
• Performance - The MicroVAX II system would support the full VAX/VMS operating system. Since it supports virtual memory, the VMS system uses large data transfers in the disk subsystem.
We therefore chose to optimize the performance of the controller around these large transfers to improve total system performance. By making the physical disk drive the limiting factor, we evolved an architecture that would allow simultaneous operations in the controller. In contrast, the current RQDX1 and RQDX2 disk controllers limit the data transfer rate between the host memory and the disk drive because of their architecture. The single thread of control in these modules, though adequate for PDP-11 systems, forced an interleave of logical data blocks on the disk surface. That interleaving would hinder the performance of the MicroVAX II system. There are also many techniques for reducing the average seek time of the disk drives. These methods include overlapped seeking on multiple drives, rotational optimizations, improved seek algorithms, and various data buffering techniques. We wanted to include as many of these optimizations as possible and, since the goals were driven by the design team, the trade-offs were a little more flexible.
• Dual Module - This goal more than any other caused the most problems in the design of the hardware. Many times a solution seemed to meet all the goals but, when a detailed parts count and mock-up were created, there were a few components that just didn't fit on the board. Meeting this size restriction led to the extensive use of CMOS gate-array technology.
• Schedule - We did not have the luxury of setting the date for the project's completion. Because the disk controller was so important to the overall MicroVAX II project, we were given a completion date based on the availability of the MicroVAX II hardware. Of course, this procedure involved a management factor that certainly kept the design team on its toes by being told to see if we could do it.
In response, we developed a schedule that would maximize the work that could be done in parallel while keeping the risks at an acceptable level.
• Testable Design - This goal became more important as the details of the design were completed. The module, being driven by an onboard microprocessor, would be capable of self-diagnosis. Therefore, where possible, all internally addressable registers were made to be write/read registers and extra datapaths were added to maximize the amount of logic available to the microprocessor for testing. This goal had to be weighed against the need for limiting the design complexity, cost, and size.

Task Partitioning

The short project schedule forced us to adopt a development strategy that would maximize parallelism in the development of the RQDX3. The first division was made between the hardware development and the microprocessor firmware development. Each major task was further reduced to smaller design functions. In many cases we had to create a model or emulator of some other undeveloped part of the design in order to allow tasks to continue.

Hardware Development

Once the functional partition drawings were created, we had a solution that met the performance and functionality that were required. However, we still did not know if the cost and board area requirements would be met. The design team quickly determined that some custom integrated circuits would be needed to help us meet these goals. Previous experience, a known process, and quick turnaround made CMOS gate array technology the key to our solution. Two gate-array devices would be needed, but we had only one gate-array design team on our project. We decided that one gate array would be developed first and a TTL emulator of the second device would be created and used for the module-level testing.
In that way, the integration of the firmware under development with the hardware could begin early in the schedule.

The key area in almost any disk controller centers around the design of the phase locked loop and the data separator logic used in recovering the encoded data from the disk surface. We knew at the beginning of this project that our team did not have the experience to design this section. Therefore, we employed the services of outside consultants to this project. They contributed not only their previous experience in data separator design, but also reinforcement and management of the design philosophy taught to us in the past.

Firmware Development

To meet our schedule goal, it was necessary to begin development and testing of the firmware for the onboard microprocessor well before any hardware was ready. The firmware consisted of many modules, the majority of which were independent of the hardware. These modules could be designed, coded, debugged, and tested in parallel with the design, implementation, and debugging of the hardware. Then at a later date, the few remaining hardware-dependent modules could be developed and integrated to form the complete RQDX3 firmware.

Thus, the target system first used for developing the firmware was not the prototype RQDX3 with its onboard microprocessor, but a VAX/VMS system with two software emulators (one for the Q-bus subsystem and one for the disk subsystem). The VMS system was chosen for several reasons: first, it has an extremely nice set of program development tools; second, the VMS disk driver could be adapted to produce a steady stream of stimuli (disk I/O requests) to verify the correctness of the firmware's responses.
With only a small amount of "trickery," the VMS system could be "convinced" to use a disk controller built not out of hardware, but out of software; the two emulators mentioned above provided the necessary glue. The emerging RQDX3 firmware could be developed in the context of a normal VMS process, taking full advantage of VMS compilers, linkers, and debuggers. Although it took a lot of time (and many system crashes) to get this technique to work, it greatly speeded up the job of building all the hardware-independent modules. This stage took about fifty percent of the total time spent to develop the firmware.

The next target system was the actual prototype RQDX3 with an in-circuit emulator (ICE) for the microprocessor and a TTL emulator for one of the gate arrays. Hardware debugging was accomplished first by special code written to perform repetitive actions on particular portions of the hardware. Then, the actual firmware, which had been previously developed and was, in a sense, known to work, was loaded into the hardware. The ICE was a great help here since it allowed RAM to be substituted for ROM; that allowed a level of symbolic debugging. At this point in the process, the hardware-dependent modules were built. This stage took about thirty percent of the total firmware development time.

The final target system was the "bare" RQDX3, with no emulators and real ROM. This configuration proved to be identical to the previous one (i.e., no problems were found in replacing the emulators with real devices), but allowed prototype boards to be shipped internally. The firmware of the RQDX3 could now be tested by different operating system groups, and bugs appropriately located and fixed. This stage took about twenty percent of the total firmware development time.
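The staged approach described above depends on keeping the hardware-independent firmware behind a narrow interface, so that a software emulator can stand in for a subsystem that does not exist yet. The sketch below shows the general idea in modern terms only; it is not the RQDX3 firmware. The interface DiskSubsystem, the class EmulatedDisk, the function handle_request, and the 512-byte sector size are all assumptions made for illustration.

```python
from typing import Protocol

SECTOR_SIZE = 512  # assumed sector size, for illustration only


class DiskSubsystem(Protocol):
    """Narrow interface that the hardware-independent logic is written against."""
    def read_sector(self, lbn: int) -> bytes: ...
    def write_sector(self, lbn: int, data: bytes) -> None: ...


class EmulatedDisk:
    """Software stand-in for the disk subsystem, usable before hardware exists."""

    def __init__(self, sectors: int):
        self._data = [bytes(SECTOR_SIZE) for _ in range(sectors)]

    def read_sector(self, lbn: int) -> bytes:
        return self._data[lbn]

    def write_sector(self, lbn: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector")
        self._data[lbn] = bytes(data)


def handle_request(disk: DiskSubsystem, op: str, lbn: int, data: bytes = b"") -> bytes:
    """Hardware-independent request handling; works with any DiskSubsystem."""
    if op == "read":
        return disk.read_sector(lbn)
    if op == "write":
        disk.write_sector(lbn, data)
        return b""
    raise ValueError("unknown operation: " + op)
```

Because `handle_request` sees only the interface, the same logic can be exercised first against `EmulatedDisk` and later against a class that drives real hardware, which mirrors the progression from emulators to the bare module.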
Design Verification Testing

The purpose of design verification testing (DVT) is to assess at an early stage whether a design has any particular implementation problems. To do that, the board is tested against all Digital's applicable standards. First, the layout of the board (the etch) is checked by looking for noise radiation and pickup, and for undershoot or overshoot on clock lines. Then, the board is checked thermally to see if it can withstand both operating and nonoperating environmental stresses. Next, FCC testing is done to measure the radiated frequency spectrum. Finally, the module is shaken and dropped to ensure that no chip falls out of its socket under normal handling conditions. Feedback from DVT can result in physical changes to the module, perhaps as severe as a new etch layout. In the case of the RQDX3, a recommendation was made to add resistors to a pair of clock lines in order to dampen undershoot. Fortunately, this alteration did not have much impact on the schedule.

Reliability and Quality Testing

The purpose of reliability and quality testing (RQT) is to demonstrate that the product meets certain minimum reliability standards, measured as mean time between failures (MTBF). The design team specifies the MTBF and also other measures of quality, such as hard and soft error rates, both of which affect the perceived quality of a disk controller product. Then, the RQT team designs a test that will demonstrate whether or not the product meets or exceeds these measurable quantities. Usually that involves building a system (CPU, memory, serial line interface) that includes the product under test. The system runs some level of host software that exercises the product for a large number of hours under various temperature and humidity extremes.
Designing these tests is not an easy task, and indeed the RQDX3 had major problems during RQT because of this difficulty.

Feedback from RQT can result in hardware changes, or firmware changes, or both. Ideally, if the product is changed, RQT should start again from the beginning. However, schedules will often not allow that and compromises must be made.

A decision affecting all of RQT must be made near the beginning: whether to test the product at the system level or at the module level. Testing at the system level implies that the system MTBF and error rates must be met, and all failures, whether related to the product under test or not, should be counted. Testing at the module level implies that the module MTBF and error rates must be met, and only failures that can be attributed to components under test should be counted. Clearly, module-level testing is preferred since it gives the most information about the new product. However, module-level testing is more difficult because each error has to be investigated to determine its cause and whether or not it should be counted. Furthermore, the burden of proof is on the design team to verify that the error was not caused by their module. (Guilty until proven innocent!) Weighing all these factors, we decided to test the RQDX3 at the module level; that caused most of our RQT problems.

A sealed chamber was used to control the tests of cycling over temperature and humidity extremes. The RQDX3 modules were placed in this chamber, along with the systems into which the modules were plugged. Part of the testing included reading and writing from both floppy disks and Winchester disks. Since these disks could not withstand the environmental extremes inside the chamber, they were placed outside.
Early testing showed that this setup did not work, since the disk drives had to be connected to the controllers with lengthy cables, which were susceptible to noise pickup. This configuration was modified to bring the disk drives inside the chamber where they were connected to the controllers with normal cables. That eliminated the noise problem, but now dictated a reduced environmental stress on the RQDX3 module (from class C to class A).

At first, we encountered a higher-than-normal rate of soft errors on the floppy disks. A search for the cause of this problem showed that a combination of two separate but contributing problems was responsible. First, a rare combination of events could cause the data separator for the floppy disk to temporarily fail to lock to the data stream. Second, most if not all the floppy disk drives themselves were not performing correctly. The former problem was fixed by a component change to the data separator; the latter, by testing and repairing those drives that showed the greatest number of soft errors. These two changes reduced the soft error rate for the floppy disks to a level well within the range specified by the design team.

The extensive, and lengthy, RQT also uncovered one bug in the error handling of the RQDX3 firmware that had never been seen in our development lab. The problem could only have been experienced by running many, many modules in parallel. Of course, the purpose of RQT is to catch such problems then instead of at customers' sites.

The RQDX3 Architecture

The mass storage controller protocol (MSCP) defines the communication between the host processor and the disk controller. Communication occurs using sequences of command packets, generated by the host, and response packets, generated by the controller.
The transmission of the packets and logical data blocks that are to move between the host and the controller is defined in the U/Q Storage Systems Port (UQSSP) specification. These two specifications place the following requirements on the controller:

• Two sequential-word register locations on the Q-bus are required. Those are referred to as the status and address (SA) register and the initialization and poll (IP) register. These registers must be able to be assigned at any longword boundary within the Q-bus I/O page.
• The controller must have the ability to interrupt the host processor using a previously loaded vector address.
• The controller must contain enough intelligence to initialize itself, perform internal diagnostics, decode command packets, perform all disk control functions, transfer data, and encode response packets. These tasks are accomplished on the RQDX3 through the use of a DCT11 microprocessor.
• The controller must be able to perform DMA data transfers on the Q-bus. These transfers will be for command and response packets, as well as for disk data.

The diagram in Figure 1 shows the flow of information in an MSCP controller. MSCP command and response packets flow between the memory in the host processor and the on-board microprocessor. Disk data flows between the memory of the host processor and the disk surface. Information dealing with the format of data on the disk surface (revector tables, format tables, etc.) must be transferred between the disk surface and the microprocessor.

Figure 1   Information Flow in the RQDX3 (host CPU, host memory, microprocessor, data, revector/format tables, disk drive)

Figure 1 shows a centralized data buffer element.
It is used for temporary storage and as a means for smoothing the differences in data transfer rates between the host memory, the microprocessor, and the disk surface.

It was decided to implement this centralized data buffer as a three-port memory system. Three control elements are provided for the transfer of data between each memory port and the appropriate source or destination. These elements are the Q-bus DMA controller, the microprocessor with its internal bus-interface controller, and a VLSI disk controller with an internal DMA interface. The interconnection of these subsystems is shown in Figure 2. Each control element assumes that it has the memory system for its own dedicated use. The arbitration between these elements for access to the memory devices is handled within the memory subsystem.

Figure 2   RQDX3 Subsystems (Q-bus interface, multiport memory, microprocessor, and disk interface subsystems)

The Memory Subsystem

The memory subsystem contains a finite sequential-state machine that receives requests for memory cycles from the three ports and performs the memory cycle for the highest-priority requesting port. It is required that any port requesting a memory cycle must have its address and any required data available before posting the request to the memory controller state machine.
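
The arbitration rule just stated can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the RQDX3 state machine: the port names and the functions `grant` and `run_cycles` are invented for the example, and the priority order (disk, then Q-bus, then microprocessor) is the one the article goes on to give. Prefetching and the holding registers are not modeled.

```python
# Hypothetical model of a fixed-priority memory arbiter.
# Port names are illustrative; the order follows the article's description.
PRIORITY = ("disk", "qbus", "cpu")  # highest priority first


def grant(requests):
    """Return the highest-priority port in `requests`, or None if it is empty."""
    for port in PRIORITY:
        if port in requests:
            return port
    return None


def run_cycles(pending):
    """Serve one memory cycle per step until no requests remain.

    `pending` maps a port name to its number of outstanding memory cycles.
    Returns the order in which ports were granted cycles.
    """
    pending = dict(pending)  # work on a copy
    order = []
    while True:
        requesting = {port for port, count in pending.items() if count > 0}
        port = grant(requesting)
        if port is None:
            return order
        pending[port] -= 1
        order.append(port)


print(run_cycles({"cpu": 1, "qbus": 2, "disk": 1}))
# prints ['disk', 'qbus', 'qbus', 'cpu']
```

In this toy model the lowest-priority port simply waits until the others are idle, which loosely corresponds to the "cycle slipping" of the microprocessor described in the text.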
The principal function of the memory system is twofold: first, it allows the controller attached to a specific port to deposit data to be written to the memory in a holding register; second, it allows the memory controller to write that data to the RAM devices some time later. For most read requests, the memory controller performs a prefetch operation when there is an empty output register in one of the ports. This operation is possible because the accesses by both the disk and Q-bus controllers are known to be sequential, with the next address always available to the memory controller.

The port of the microprocessor is an exception to this prefetch operation. The memory controller cannot prefetch the data since memory accesses by a microprocessor are not always sequential. When requesting a cycle from the memory, the microprocessor will be "cycle slipped" (i.e., wait states added to its microcycle) until the memory controller determines that the microprocessor is the highest-priority requesting device.

The highest priority for memory cycles is given to the disk controller port. Failure to service this port first will cause overrun or underrun errors in the disk controller chip, which has little buffering. These error conditions would cause serious degradation of system performance, since full disk revolutions would be wasted retrying the operations.

The middle priority is given to the Q-bus DMA controller port. This port requires the highest service rate from the system (approximately 700 nanoseconds per request). However, the port is capable of slowing itself if it cannot be serviced in time by the memory controller. Of course, to achieve the highest system performance and most efficient use of the Q-bus, it is desirable that the Q-bus controller never slow down.

The microprocessor is given the lowest priority for memory cycles.
That allows the normal operation of data transfer between the disk and host (both disk controller and Q-bus DMA controller active) to be completed as fast as possible. The microprocessor can use any remaining memory bandwidth for its operation. The microprocessor uses the shared memory for both temporary storage and its operational stack. Since its use of that memory will be infrequent, the microprocessor will not be affected by any loss in memory response.

A prototype of the memory subsystem was built to measure the amount of bandwidth available to the individual ports and to determine the effect of arbitration between the ports. A worst-case condition of requests from all ports was created and the bandwidth used by each was measured. With any two ports operating at their full speed, there was no measurable reduction in service rate from that of the ports running independently. When all three ports were operating, the disk port lost no memory bandwidth, the Q-bus port lost only one percent of its requested bandwidth, and the microprocessor lost eight percent of its requested bandwidth. These observations during worst-case conditions indicated that all three ports are capable of operating at full speed with their normal request patterns. This feature of the RQDX3 allows it to overlap disk data transfers, Q-bus DMA transfers, and microprocessor operations to achieve maximum performance.

The memory controller is implemented using a field programmable logic sequencer (FPLS) and an external input synchronizer. Even though gate-array technology was used for the majority of the datapath on this module, it was felt that building the state machine in the gate array was too risky for the project schedule. The state machine was therefore placed outside the gate array.
Only a few gate array pins connect it to the datapath elements that it controls.

The memory controller also incorporates some features to aid in the test and repair of the module. After module initialization, an input signal is asserted to force the memory controller to honor only those requests coming from the microprocessor. Without that, a hardware failure in either the disk controller or the Q-bus DMA controller could constantly request memory cycles and cause the microprocessor to "hang" on its first access to memory. With this signal asserted, the microprocessor can initiate the module diagnostics in a small, isolated environment that enables the microprocessor, ROM and RAM devices, and I/O page registers to be tested. The microprocessor can then clear the signal later in its diagnostics, thus completing the module testing.

The Microprocessor Subsystem

The microprocessor subsystem of the RQDX3 module is made up of a DCT11 microprocessor, 16K words of EPROM memory, a front-panel interface, and a prioritizing interrupt circuit. Although many different microprocessors could have been used, the choice of the DCT11 was made with the following criteria in mind:

• A 16-bit microprocessor could handle the MSCP requirements adequately, while an 8-bit microprocessor would be strained and a 32-bit microprocessor might be an overkill.
• A multiplexed address-and-data bus would reduce the number of gate array pins required.
• A rich, orthogonal instruction set (PDP-11 system) that could be easily understood should be used.
• The microprocessor should be able to be programmed in a high-level language. Much of the code for this module would be written in the C programming language.
• Relatively fast execution speed is desired.
• Available hardware and software development tools should be used.
• Our past design experience should be exploited to improve the product's time to market.

The Q-bus Subsystem

The Q-bus subsystem of this module is made up of the programmed I/O section, the Q-bus DMA controller section, and the Q-bus interrupt section. The Q-bus DMA controller is composed of a finite sequential-state machine and associated datapath elements that are used to perform both block-mode and nonblock-mode Q-bus cycles. The state machine is implemented in a field programmable logic sequencer rather than a gate array to eliminate the risk of schedule delays due to coding errors. However, the datapath elements needed to support the state machine are contained within the gate array devices. Some of the features of this controller are

• Full 22-bit Q-bus addressing
• A 16-bit DMA word counter
• Q-bus memory parity detection
• Full, efficient implementation of Q-bus block-mode transfers
• A programmable hold-off timer to regulate the Q-bus activity

The Disk Controller Subsystem

The disk controller subsystem had to provide the control and datapath functions for both floppy and hard disk drives in the smallest space and for the least cost. This requirement was satisfied by using a VLSI disk controller device.

The RQDX3 data separator is designed to receive the encoded data stream from the disk and convert it into a binary data stream and clock, both of which are then fed to the disk controller chip. The data separator is designed to operate at three different data frequencies to be compatible with the available range of Winchester and floppy disk drives.
The frequencies for each type of drive are as follows:

• 5-MHz MFM encoded data recovery from ST412 Winchester disks (RD5X type)
• 500-KHz MFM encoded data from high-speed, high-density floppy disks (RX33 type)
• 250-KHz MFM encoded data from standard double-density floppy disks (RX50 type)

The data recovery system for the RQDX3 is a unique MFM data recovery circuit that is very close to ideal. In short, with proper matching of the device delays, the recovery window is ±50 nanoseconds, or one hundred percent of the window. This almost ideal data recovery is made possible by the following conditions:

• A solid and precise phase locked loop is used.
• The MFM encoding rules specify a 100 nanosecond "null" period after each flux transition. This period is used to reset the edge store and compensation flip-flops of the circuit.
• The VCO output has a fifty percent duty cycle.
• The logic delay paths in the data separator circuits are carefully matched. This matching was accomplished by device matching within the gate array that implements this function. Careful simulation of this logic was carried out to prove this operation.

The Structure of the Firmware

The firmware had to be designed to take full advantage of the parallelism provided by the chosen hardware architecture. Therefore, the RQDX3 firmware consists of a set of cooperating routines, or jobs, each of which performs a dedicated function. Each job has its own stack and thus its own context and state information. Any operations that could possibly run in parallel have been separated and are controlled by separate jobs.
A small operating system kernel provides facilities for creating new jobs, suspending and resuming execution of a given job, acquiring exclusive access to shared resources and later releasing those resources, and scheduling jobs to run based upon priority and resource contention criteria. This kernel provides a controlled way of overlapping operations. That effectively means that the RQDX3 can be simultaneously seeking on one or more drives, reading or writing from another drive, and transferring data to or from the host, all while performing calculations relating either to the current transfer or to a pending transfer.

Performance Tests

The main performance goal was to be able to sustain a high data-transfer rate for large transfers. In a typical situation, the VMS system uses the disk to swap, page, and load images. The RQDX3 is tuned so that these operations are completed as rapidly as possible. Maximum sustained data transfer rates of 420 KB per second have been measured, compared to 170 KB per second on the RQDX2. Such workloads are atypical, though, and do not give a good indication of overall system performance.

When tested with a workload of from one to fifteen users on a MicroVAX II system, the RQDX3 is faster than the RQDX2, but slightly slower than the KDA50. This relationship is more in line with the performance based on theoretical calculations. A user workload generates a lot of seeking, and the RD-class disks controlled by the RQDX2 and RQDX3 seek more slowly than the RA-class disks controlled by the KDA50.

Higher performance can be gained by splitting the disk activity among two, three, or even four disks. The RQDX3 has the ability to keep all four drives seeking at the same time. For small transfers, seek time dominates, and an increase in system throughput of thirty-five to forty percent can be realized.
For large transfers, seek time is still important but decreases in significance; the increase in system throughput may only be twenty percent. The RQDX2 does not take advantage of separate system and user disks; however, the RQDX3 will.

Higher performance on a single drive can be achieved by queuing multiple requests to the RQDX3. The MSCP protocol allows these multiple requests to be automatically reordered by the controller to reduce the average seek time. For example, the controller could always choose the request with the shortest seek time instead of the first request in its queue. An increase in system throughput of thirty to forty percent occurs when the number of outstanding I/O requests increases from one to twelve.

Summary

The RQDX3 design project came close to meeting all its design goals. There were 40 working units exactly one year after the project began. However, problems in the reliability test setup, which delayed the manufacturing startup, caused our first customer shipment to slip. The cost, performance, and module-size goals were all met to the satisfaction of the design team. The high yields in manufacturing can be attributed to the quality of both the design and the manufacturing process. Without the structured design process and the team's adherence to it, this project would not have been successful.

References

1. W. I. Fletcher, An Engineering Approach to Digital Design (Englewood Cliffs: Prentice-Hall, 1980).

Kathleen D. Morse
Lawrence J. Kenah

The Evolution of Instruction Emulation for the MicroVAX Systems

The MicroVAX CPU, the 78032 chip, implements a subset of the VAX instruction set, yet the operating system must support the full set. To accomplish that, the MicroVMS developers decided to emulate the missing instructions (floating point, packed decimal, and character string instructions) in software.
Since hardware and software were developed in parallel, a VAX-11/730 system, with its microcode rewritten to make it act like MicroVAX hardware, was used as a test vehicle. The performance measurements indicated excessively long execution times. The hardware design was extended to assist the software emulation task. The final emulator was also used in the ULTRIX-32 and VAXELN systems.

When Digital Equipment Corporation decided to implement the VAX architecture¹ in silicon, it was clear that the entire instruction set could not be implemented on a single chip. To determine what could be implemented, a team of software and hardware engineers was formed to identify the best subset of the VAX instructions that would fit. As a consequence, the software engineers had to find ways to provide support in the operating system for those instructions removed from the base machine. This paper discusses how that emulation support was provided.

MicroVAX Architecture

The amount of microcode needed to implement an instruction is a good measure of the amount of space needed on a chip to implement the same instruction. Microcode size thus became one measure used in determining which instructions to move off the chip. A second criterion was the frequency with which particular instructions are used. For example, integer and logical instructions are used very heavily and their frequency of use is independent of the application area. Floating point instructions appear most frequently in scientific and engineering computations. Packed decimal instructions are more common in certain commercial applications. Eventually, by balancing these considerations, the engineers identified a subset of the VAX instruction set that would fit on one chip. That subset became the definition of the MicroVAX architecture.
(The subset architecture also differed from the full VAX architecture in such areas as the console subsystem.)

Once the MicroVAX architecture was completed, the hardware and software teams began independent development efforts. Since a major project goal was to minimize the time to market, one hardware team investigated a MicroVAX implementation (the MicroVAX I system) that used semicustom logic instead of a single chip. A second hardware team started the design of the MicroVAX chip itself, and a third team initiated the design of the implementation (the MicroVAX II system) that would incorporate that chip.

At the same time, the software teams began their investigations of how to enhance the VMS, ULTRIX-32, and VAXELN operating systems in order to run these new machines. The software designs were influenced in part by the need to implement and test the missing-instruction software emulation before any hardware was available.

Operating System Support

The major difference between the software architectures of the MicroVAX and the full VAX systems is the group of instructions that were not implemented in the chip hardware. This group consists of

• Floating point instructions
• Packed decimal instructions
• Character string instructions

(The MicroVAX architecture included the MOVC3 and MOVC5 instructions because they were heavily used in fundamental routines, such as copying or filling memory arrays.)

Each of the three operating systems was supported by a different design group. These groups had to decide which course of action to take to accommodate the reduced number of instructions that would be implemented in microcode. The following alternatives were the most realistic courses to take:

1. All compilers and assemblers could be changed to eliminate all uses of the missing instructions.
2. Emulation subroutines that applications could link into their programs could be supplied. (VMS used this method on early VAX models that did not include hardware support for the G and H floating point data types.)
3. The emulation subroutines could be implemented so that their use would be invisible to application programs and even to most of the operating system.

The VMS Decision Process

The VMS design team began a study to determine the extent to which the missing instructions were used in the operating system code, including all the various VMS utility programs. As expected, the character string instructions were used most frequently and, in fact, were more widely used than expected. The CMPC3, CMPC5, and LOCC instructions were the most frequently used string instructions, occurring almost everywhere that ASCII text was manipulated (for example, in device names, file names, and DCL commands). All software that included some kind of bitmap (about six to ten different areas, ranging from the file system to memory management) used the SCANC and SPANC instructions. A large number of table lookup designs (including DCL and utility command parsers) used the MATCHC, MOVTC, and MOVTUC instructions. Finally, the CRC instruction was used by the BACKUP utility and by the DECnet code.

Very few data types were used outside their realms and only a few unexpected sequences were found that used the missing instructions. One example was the use of the CVTLF instruction in the VMS kernel to determine the smallest power of 2 larger than a given integer. A second example was the use of the CVTLP instruction in the FORTRAN run-time support library as a quick method for converting binary representations to text.
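
The CVTLF example may seem surprising, so here is a minimal Python sketch of why an integer-to-floating conversion helps. It is not the VMS code, which the article does not show; `next_power_of_two` is a name invented here, and Python's `math.frexp` stands in for reading the exponent of the converted value. The sketch also assumes that "larger" means strictly larger. A normalized floating point number stores a fraction between one half and one together with a power-of-2 exponent, so the exponent alone identifies the next power of 2.

```python
import math


def next_power_of_two(n):
    """Return the smallest power of 2 strictly greater than the integer n.

    Valid for 0 < n < 2**53, the range in which converting n to a Python
    float is exact. (A narrower float format would round large inputs.)
    """
    if not 0 < n < (1 << 53):
        raise ValueError("n must satisfy 0 < n < 2**53")
    # frexp returns (m, e) with n == m * 2**e and 0.5 <= m < 1,
    # so 2**e is the first power of 2 above n.
    _, exponent = math.frexp(n)
    return 1 << exponent


for value in (1, 5, 8, 1023):
    print(value, next_power_of_two(value))
```

Note that an exact power of 2 maps to the next one up (8 gives 16), because the fraction of a normalized value is always below one.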
Once the extent of the missing instruction usage was determined, the design team considered the number of compilers that were supported by the VMS operating system. In all, over fifteen different languages are supported.³ The first alternative, changing the compilers and assemblers, would require that the code generators for each product be changed. Moreover, new versions of the VMS operating system and all its layered products would have to be generated using these new compilers. That would involve a significant investment of manpower, not just to enhance the compilers, but to provide ongoing support to maintain each product. In addition, two variants of each new version of each product would have to be produced. A likely side effect was that these changes would probably cause other development groups to limit most layered products to the MicroVAX subset on all VAX machines. In that way, each group would have to maintain only one version of their product.

Another consideration was the effect that the first or second alternatives would have on the marketing of MicroVAX systems. Customers and Digital's software engineers had become accustomed to developing software on one machine and executing it transparently on any other machine in the VAX family. That would not have been possible under either of the first two alternatives.

Through this reasoning process, it became obvious that the correct choice was the third alternative, to design for software emulation and make it transparent to both applications and operating system code. While requiring a concentrated effort to write the emulation support, the overall effort for software emulation was much smaller than removing the use of the missing instructions from existing software and compiler code generators. The effort was also isolated. While some new code was needed, the number of changes to existing components was minimized.
These changes were confined to the exception handler and the startup routines for the operating system. Finally, transparent emulation of all missing instructions would guarantee that systems implementing the MicroVAX architecture would be fully compatible with the VAX family of machines.

Implementation

As mentioned earlier, the MicroVAX program was geared to a tight time-to-market schedule. That made it highly desirable to develop the hardware and software in parallel as much as possible. The VMS design team decided to implement the emulation code and debug it long before the hardware design specifications for a particular MicroVAX implementation were written. In this way, the emulation code would be finished and working by the time the first MicroVAX hardware was ready to be debugged.

Design of the Emulator

At this point in the project, several decisions were made relating to the design and implementation of the MicroVMS instruction emulator. The emulation routines would be developed and tested by the VMS Development Group. These routines would attempt to avoid features or coding techniques specific to the VMS operating system. Thus the same emulation source code for the instructions could be used later by the ULTRIX-32 and VAXELN Development Groups.

The emulation support was divided into two pieces. The first supported character string and packed decimal instructions (including CRC and EDITPC); the other, floating point data types. From the beginning of the MicroVAX effort, system configurations would be offered that provided some sort of floating point support in hardware.⁴ That fact influenced the design of the two pieces in the emulator. Software support for floating point was viewed as a technique for running programs that contained small amounts of floating point computation.
Applications that depended heavily on floating point operations would likely be run on systems that had floating point support in the hardware. Conversely, applications that depended heavily on packed decimal or character operations did not have a hardware option at their disposal. The decimal/string emulator reflects that in several places where space is sacrificed in an effort to speed up the emulation subroutines.

Structure of the Emulator

Once the two pieces were designed, the actual coding began. Each of the two emulation components was further divided into an operand decode piece and an instruction execution piece.

The operand decoder was a straightforward finite-state machine. It parsed the instruction stream one operand at a time, placing results into registers "appropriate" to each instruction. The register assignments were usually made by examining the expected register contents after each instruction had completed its execution. For example, the final state of a CMPC5 instruction suggests that R1 and R3 be used as pointers to the two character strings, while R0 and R2 contain the initial sizes of the strings.

The instruction execution routines were simple subroutines that accepted input parameters in registers and produced output conforming to the architectural specification of the instructions. For example, after the execution of an ADDP4 instruction, R0 and R2 contain zero, R1 and R3 locate the addend and sum strings, and the other registers are preserved.

At the outset, several other decisions were made that simplified the design and implementation of the emulator.

• Emulation support was provided transparently by being implemented at a very low level in the operating system.
• Emulation subroutines were executed in the access mode of the missing instruction.
• The existing emulation support for G and H floating point data types would serve as a base for full floating point emulation support.

Transparent Support

To emulate the missing instructions transparently, the emulators had to become an integral part of the operating system. They were loaded into system space during the system bootstrap and connected directly to the reserved-opcode exception vector in the system control block. Whenever a reserved-opcode exception occurred, the emulator would distinguish the execution of a missing instruction from other illegal opcodes. Missing instructions would cause a control transfer to the appropriate emulation subroutines. Other illegal opcodes were passed on to the operating system as exceptions. Since the host operating system provided support in a transparent fashion, existing programs could execute on a MicroVAX system without being changed.

Access Mode of Execution

The reserved-opcode exception handler had to begin its execution in kernel mode, as defined by the VAX architecture. However, if the emulator routines continued in that mode, the address validation rules demanded that not only each operand but also each byte in a character string be probed for read or write access before that operand could be used. Because of the excessive cost of these operations, we decided that the emulator routines would execute in the access mode in which the missing instruction was used. If an operand or string was not accessible, an access violation exception would occur, which could be intercepted for special processing by the emulator.

The Use of Existing Routines

An emulator for G and H floating point data types already existed.
Instead of completely rewriting this emulator to accommodate all four data types, it was restructured to separate its operand packing and unpacking routines from the arithmetic and conversion operations. Then, additional packing and unpacking routines were added for F and D floating point data types. Also, the overall structure of the floating point emulator was changed from a condition handler to an integral piece of the operating system. (A condition handler executes only within user programs, while an integral component would receive control whenever a missing floating point instruction is executed.)

Initial Testing

It was obvious that a testbed was needed to enable the design team to debug the emulation software. Some method was needed to force the emulation software to gain control in order to execute the missing instructions. Since the VMS macro assembler can substitute a macro for an instruction opcode, macros could be used to cause the assembler to take special action whenever it encountered any of the missing instructions.

A set of macros was written that caused special object code to be generated whenever any of the missing instructions was encountered. This special object code consisted of a byte containing the illegal opcode FE (hex), the opcode for the instruction, and all the operand specifiers. When one of these instructions was executed, a reserved-opcode exception was generated. A special exception handler would then advance the PC from the byte containing the FE opcode to the actual opcode. Control was then passed to the instruction emulator. One of these macros is listed in Figure 1.

Using these macros, programs written in assembly language could be reassembled and executed using software emulation for the missing instructions.
With this scheme, any existing VAX processor, such as a VAX-11/730 system, could be used as a testbed for the software emulation.

Results of Initial Tests

One key factor to determine was the increase in execution time required by software emulation for different parts of the operating system and for application programs. To determine these differences, the VMS Performance Group at Digital ran standard instruction-timing tests against the emulation code. Because these tests were run on an existing VAX processor, the execution times for emulated instructions could be compared to those done in hardware on the same VAX processor.

These test results showed that it took about ten times longer to emulate character string instructions than to execute them in hardware. To determine the reasons for this disparity, the design team performed a close inspection of the emulation code. Quite quickly it became obvious that, for the simpler string instructions, the operand decode required as much time as the instruction execution.

To speed up the emulated instructions, hardware support was requested by the MicroVMS team. To support this request, we made a list of the operand types for the missing character string and packed decimal instructions. There were only 5 operand types in all 27 instructions. These operand types were already being used by instructions that were a part of the MicroVAX subset, such as MOVC3 and MOVC5. A meeting of the hardware and software teams concluded that there would be little cost to the underlying hardware if these operands were decoded before a missing instruction exception was signaled.

            .title  locc_tst
            $opdef
    ; Redefine the LOCC opcode with a new LOCC macro.
            .opdef  locc_fe <<op$_locc@8>!^xfe>,rb,rw,ab
            .macro  locc    char.rb,len.rw,addr.ab
            locc_fe char.rb,len.rw,addr.ab
            .endm   locc
    desc:   .ascid  "This is a test program to try a LOCC instruction."    ; Test data for LOCC
            .entry  start_here,0                ; Entry point for test program
            locc    <#^A"a">,desc,@desc+4       ; Generate an emulated LOCC
            movzwl  #1,r0                       ; Standard exit status code
            ret                                 ; Exit from test program
            .end    start_here                  ; End of test program

Figure 1 Test Program with Macro for LOCC Instruction

Design of New Emulation Exceptions

The result of that meeting was that two new exceptions were added to the MicroVAX architecture as emulation assists. Since the hardware could easily decode the operands for the character string and decimal string instructions, they were defined as the ones that the new exceptions would support. Thus, two of the three instruction types not implemented in hardware could now be handled effectively. The third type, floating point instructions, would continue to cause reserved-opcode exceptions, since their operands could not be decoded without significant additional hardware support. (A separate floating point unit, the MicroVAX 78132 chip, provides this hardware support for three of the four floating point data types.)⁴

The first exception is generated whenever a character string or decimal string instruction that is not in the hardware subset is executed. The process causes the hardware to decode the operands and push the exception parameters onto the current stack. The exception parameters are depicted in Figure 2.

The second exception occurs only when one of the emulated instructions is executed and the first-part-done (FPD) bit is set in the processor status longword (PSL).
The VAX architecture allows many instructions (including all the decimal and character string instructions) to be interrupted after partial execution. The original operand specifiers cannot be decoded again because the register contents may have been altered to store the intermediate results. When this second exception occurs, the exception handler unpacks the intermediate results and resumes execution at the point where the instruction was interrupted.

    OPCODE
    PC OF INSTRUCTION
    DECODED FIRST OPERAND
    OTHER DECODED OPERANDS
    UPDATED PC
    PSL OF EXCEPTION

Figure 2 Exception Parameters for Emulation Assist Exception

Note that this second exception can occur only when an access violation has already occurred during instruction emulation. In that case, the operating system's access violation handler transfers control to the emulator. Enough intermediate state is stored in the registers to allow restarting the instruction, at which time the stack is restored to its state when the instruction began execution. Then the exception PC is changed from a PC inside the emulator to the PC of the original instruction that triggered emulation. Finally, control is passed back to the operating system's exception reporting mechanism. (Page faults, device interrupts, and the like are invisible to the user and require no special handling. That is, there is no need to pack the state into the registers and alter the saved PC.)

Final Design of the Instruction Emulators

The final design produced emulation support in two pieces: one for the missing floating point instructions; the other for packed decimal and character string instructions. Although the two emulator programs supported different data types, their overall design contained many common threads.
This section describes the common design philosophy, as well as the step-by-step operation of each emulator.

Common Design Philosophy

Nearly all the emulation code executes in the access mode in which each missing instruction was originally executed. The stack associated with that access mode is used as a working storage area for the emulation routines.

The emulation of missing instructions is nearly invisible to programs in the sense that memory and register contents are identical to those obtained on full VAX implementations. The only difference between the emulated and hardware implementations is in the time required to complete an instruction and in the stack remnants from the emulator's temporary storage area. (Memory locations at small negative offsets from the top of the stack are specified as UNPREDICTABLE in the VAX architecture.)

The two emulator pieces share a common philosophy, if not common code, in regards to the two memory management faults. One fault is made in response to an invalid page and the other when a reference is made to a page that is not readable or writable as required.

No special treatment is required for page faults (translation-not-valid faults). If an invalid page is referenced by the emulator, a page-fault exception is reported to the operating system. The PC in the page-fault frame points at the instruction within the emulator that referenced the invalid page. After the operating system makes the page valid, execution resumes with the faulting instruction.

References to pages that are not accessible (access-violation faults) are more complicated than the page faults. Access-violation faults, unlike references to invalid pages, are visible at the program level.
When the emulator intercepts the exception, the faulting PC points at the emulator instruction that references the inaccessible page. The stack contains working storage that must be removed and saved registers that must be restored. In that way, the exception looks like an access violation generated on a full VAX implementation.

For most floating point instructions, an access violation implies that the state of the machine will be reset to its state when the instruction began. For the decimal, string, and POLYx instructions, the instruction can be left in a partially completed state. The intermediate context is stored in the registers and the FPD bit is set in the saved PSL. This bit allows the emulator to resume these instructions at the point where they left off, rather than restarting them from the beginning (assuming that the access violation can be resolved).

Floating Point Emulation Support

The program that emulates the missing floating point instructions in software differs in several details from the decimal/string emulation routines. In floating point emulation, the functions are performed in the following order:

1. Execution begins in kernel mode as a result of a reserved-opcode exception.
2. If the exception occurs in a mode other than kernel, the exception parameters are copied to the stack of that access mode. Further emulation takes place in that access mode.
3. Each operand is decoded.
4. Floating point operands are unpacked into exponent and mantissa.
5. The operation (arithmetic or conversion) is performed.
6. If the result is a floating point number, the resulting exponent and mantissa are packed into a single number.
7. The result is stored and the exception dismissed.

Before the exception is dismissed, the floating point emulator examines the opcode of the next instruction.
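Steps 4 through 6 of the list above can be sketched in a few lines. The Python below is a minimal illustration, not the emulator's code: it unpacks an F floating longword into sign, exponent, and mantissa, multiplies two unpacked numbers, and packs the result. The field positions follow the F_floating format as defined by the VAX architecture (excess-128 exponent, hidden leading fraction bit); the function names, the truncating multiply, and the use of Python exceptions in place of floating point faults are assumptions.

```python
BIAS = 128        # F_floating exponents are stored excess-128
FRAC_BITS = 23    # stored fraction bits; a 24th "hidden" bit is implied


def unpack_f(lw):
    """Split an F_floating longword into (sign, exponent, mantissa).

    The mantissa is a 24-bit integer m representing m / 2**24, which lies
    in [0.5, 1) for every nonzero number.
    """
    sign = (lw >> 15) & 1
    biased = (lw >> 7) & 0xFF
    if biased == 0:
        if sign:
            raise ValueError("reserved operand")
        return 0, 0, 0                       # true zero
    # High 7 fraction bits sit in bits 6:0, low 16 bits in bits 31:16.
    frac = ((lw & 0x7F) << 16) | ((lw >> 16) & 0xFFFF)
    return sign, biased - BIAS, (1 << FRAC_BITS) | frac


def pack_f(sign, exp, mant):
    """Pack an unpacked, normalized number back into a longword."""
    if mant == 0:
        return 0
    biased = exp + BIAS
    if not 0 < biased <= 0xFF:
        raise OverflowError("exponent out of range")  # stands in for a fault
    frac = mant & ((1 << FRAC_BITS) - 1)              # drop the hidden bit
    return ((frac & 0xFFFF) << 16) | (sign << 15) | (biased << 7) | (frac >> 16)


def multiply(a, b):
    """Multiply two unpacked numbers without reference to their data type."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    if ma == 0 or mb == 0:
        return 0, 0, 0
    prod = ma * mb                 # up to 48 bits
    exp = ea + eb
    if prod < (1 << 47):           # product of fractions fell below 0.5
        prod <<= 1                 # renormalize
        exp -= 1
    return sa ^ sb, exp, prod >> 24   # keep 24 bits, truncating the rest


# 1.5 is 0x40C0 and 2.0 is 0x4100 in F_floating.
product = pack_f(*multiply(unpack_f(0x40C0), unpack_f(0x4100)))
```

Because multiply sees only unpacked values, the same routine could serve any data type that has its own unpack and pack pair, which is the sharing the restructured emulator relies on.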
If the next instruction is also a floating point instruction, control is passed back to the beginning of the emulator to begin the operand decode for that instruction. This technique saves the overhead of dismissing one exception and immediately generating an identical reserved-opcode exception.

The nature of floating point operations allows many instructions to accomplish their results by sharing different routines. There are routines that can unpack and pack each of the four floating point data types. There are also routines that perform the various arithmetic and conversion operations. Because these routines operate on unpacked numbers, the routines are independent of the initial data type.

The floating point emulation routines support all four floating point data types. Thus the routines can be used with all MicroVAX systems and other VAX systems that do not implement all four floating point data types in firmware or hardware.

Decimal/String Emulation Support

The emulation of a character string or packed decimal instruction proceeds as follows:

1. Execution begins in the access mode in which the missing instruction was originally used.
2. Operands are moved from the stack into registers and control is passed to an instruction-specific routine.
3. Some instruction results (for example, from MOVTC, MOVTUC, and packed decimal arithmetic and conversions) are stored while these routines are executing.
4. The routine executes until an input or output string is used up, at which time it completes the storage of results. Execution is resumed with the next instruction.

Because the decimal/string emulator relies on hardware for its operand decode stage, the lookahead technique used by the floating point emulator cannot be used for decimal and string instructions.
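The four decimal/string steps listed above can be made concrete with a small sketch built around LOCC, the instruction used in Figure 1. The Python below assumes a hypothetical dictionary standing in for the exception parameters that the hardware pushes on the stack; the key names, the ROUTINES table, and the emulate function are illustrative only. The LOCC results follow the architectural definition: R0 holds the number of bytes remaining including the located byte, R1 holds its address, and Z is set when the character is not found.

```python
def locc(char, length, addr, memory):
    """Instruction-specific routine for LOCC (locate character).

    Inputs arrive already decoded; outputs are the register and condition
    code values a program would observe after the instruction completes.
    """
    r0, r1 = length, addr
    while r0 > 0 and memory[r1] != char:
        r0 -= 1                    # one fewer byte left to examine
        r1 += 1                    # step to the next byte of the string
    return {"R0": r0, "R1": r1, "Z": r0 == 0}


# Hypothetical dispatch table, one entry per emulated instruction.
ROUTINES = {"LOCC": locc}


def emulate(frame, memory):
    """Move decoded operands out of the frame and run the routine."""
    routine = ROUTINES[frame["opcode"]]
    regs = routine(*frame["operands"], memory)
    regs["PC"] = frame["updated_pc"]   # resume with the next instruction
    return regs


text = b"This is a test"
frame = {"opcode": "LOCC",
         "operands": (ord("a"), len(text), 0),
         "updated_pc": 0x210}
regs = emulate(frame, text)
```

No operand decoding appears in the sketch because, with the emulation assist exceptions, that work has already been done by the time software gains control.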
If the instruction following an emulated instruction also requires emulation support, the following sequence takes place:

1. The first exception is dismissed.
2. The next instruction is executed.
3. The operands of that instruction are decoded and stored on the stack.
4. The decimal/string emulator regains control.

Since these instructions perform many unrelated operations, there is little code that can be shared between their emulation routines.

Testing and Debugging

The main problem in testing the emulation software initially was that there was no MicroVAX hardware available during most of the implementation cycle. Thus we had to develop techniques to simulate the hardware in order to begin the tests. There were two chief techniques used to test and debug the emulator. First, instruction-specific routines were tested as user-mode programs in a normal program development environment. Second, the exception handler front-end was tested on a VAX-11/730 system that was modified, by rewriting some of the 11/730 microcode, to act like a MicroVAX system.

Instruction-Specific Testing

Microcode written for a particular implementation (both VAX and MicroVAX systems) can be used only on that particular machine or a simulation of that machine. However, macro-level code can be executed on any VAX processor. Therefore, since the emulation routines were written in macro-level code that executes on any VAX processor, "normal" debugging techniques could be used for part of the debug effort.

A set of test programs was constructed that would run on other VAX processors (11/730, 11/750, and 11/780).
These test programs would call each instruction-specific subroutine and compare the results (memory contents, register contents, and settings of the condition codes) with the output from the corresponding instructions executed on those processors. These tests allowed the basic algorithms to be debugged even before they were plugged into the emulator. The set of tests was limited only by the choice of input data for each instruction.

The first set of tests uncovered most algorithmic problems but did not exercise the error paths (such as inaccessible source or destination strings). The code to handle these error conditions was written later in the development cycle. Neither the absence of these error paths nor errors in edge conditions (such as zero-length strings) prevented the VMS system from executing.

Another benefit of a macrocode implementation was seen during the debug of the edge condition problems. Since the instruction emulation routines were just an extension of the operating system, the debugging tools used for other operating system code could be used to debug the emulator.

Testing the VAX-11/730 Breadboard Implementation

The availability of the two new emulation exceptions changed the strategy for debugging the emulation code. The software solution used to obtain preliminary results was unable to mimic the new exceptions invented to assist the emulation. Therefore, a new testbed was needed to accommodate the debugging process. The testbed had to decode the operands and generate the appropriate exceptions to pass control to the software emulation code.

One way to perform these functions was to alter an existing VAX system, such as the VAX-11/730 processor. The 11/730 is an entirely "soft" machine; that is, all its microcode is loaded at powerup rather than being resident in ROM.
By altering that microcode, the design team could make the 11/730 look like the architecture in a MicroVAX system. The required changes were simply a matter of removing the microcode for instruction execution while leaving that for operand decode. To finish the alterations, the design team had to write a new "exception generator" to create the emulation exceptions.

At this time in the project, the first real MicroVAX hardware would still not be available for nine months. Therefore, the VMS design team decided to undertake the modifications to the 11/730's microcode and to build the testbed. We estimated that this effort would take one to two months, since the VMS developer had to learn to write microcode. That meant that the software emulation code would still be completed long before the first MicroVAX hardware was ready.

The microcode source programs were acquired from the 11/730 microcode team and assembled using the latest version of the microcode assembler. The 11/730 microcode was structured as separate modules for different functions (for example, floating point, compatibility mode, exceptions, memory management, and so on). Due to the lack of a "linker," label files that allowed routines to be called across modules had to be created. To speed the development, the design team wrote several FORTRAN tools that automatically generated new label files. In addition, command files were built that correctly created a new set of binary microcode files from a set of modified sources.

The next step was to change the 11/730's microcode. Since it had to exist in a limited amount of RAM space, the new code could not be added without removing some existing code.
Therefore, we decided to replace the compatibility mode microcode with a new routine to generate the emulation exception. Some new flags were added that, at the developer's choice, would allow different classes of instructions to be emulated (i.e., decimal string, character string, or floating point). Finally, to boot the VMS system on this MicroVAX version of an 11/730, we had to enhance the VMS bootstrap code to load the emulation exception handlers and connect them to the appropriate exception vectors. Now the software emulation code, from the exception handler all the way down to instruction execution, could be debugged.

The best measure of the success of this venture was made when MicroVAX hardware was finally available. The customized VAX-11/730 system was such a good testbed, not only for the instruction emulator but also the rest of the MicroVAX I support, that it took a mere four days to get the VMS system running.

Other Test Mechanisms

The initial testing of the instruction emulator consisted of a set of programs and sample input data for each of the missing instructions. While providing routines that worked in almost all cases, these tests did not exercise some of the more exotic edge conditions. Those included very long or very short strings, illegal operands, or strings that were not readable or writable. Once MicroVAX hardware was available, several new testing techniques could be used to exercise the emulator.

Operating System Code

More testing was provided by running the operating system code with the emulator providing character-string and packed-decimal support. The VMS Development Group has a large set of regression tests that exercise most success and error paths within the operating system.
These tests plus normal daily use by the VMS development community ensured that extensive testing of the instructions used by the VMS operating system was performed.

Once the VMS system was running, the ULTRIX-32 and VAXELN Development Groups requested the source code for incorporation into their systems. These systems exercised parts of the emulator that the VMS system did not use. The ULTRIX kernel uses a small number of packed decimal instructions (ASHP, ADDP4, SUBP4, and EDITPC) for some of its arithmetic and formatting support. When the ULTRIX-32 operating system first exercised the emulator, several bugs were detected and corrected.

Compiler-Generated Code and Associated Tests

The base operating systems used packed-decimal and floating point instructions in a small number of cases. These instructions received better testing using programs written in COBOL and FORTRAN. The compilers and their validation tests were used to test the emulator routines from the time they were first written until they finally shipped.

Architectural Conformance

Even such continual testing is no guarantee that each instruction executes according to the VAX architecture specification. Most of the testing described so far exercised the success paths of the emulation subroutines. The error paths, especially the code that intercepted and modified access violations, required a different set of tests.

CPU Diagnostics

For each CPU designed by Digital, a set of CPU diagnostics is written that exercises as much of the central processor as possible. Included in these diagnostics is an instruction-set exerciser that tests for proper behavior in at least some of the interesting error cases. The CPU diagnostics for the MicroVAX I served as the primary test for the access violation handler in the decimal/string emulator.
AXE Verification Program

All new VAX computers at Digital are tested with an architectural verification tool known as AXE. AXE programs are used to determine whether or not the machine conforms to the VAX architectural specification. AXE accomplishes this testing by subjecting each VAX instruction, with many combinations of operands, to a variety of error conditions. These conditions include inaccessible operands, instructions or operands that cross page boundaries, and unusual operands. When the MicroVAX instruction emulator was subjected to AXE testing, the only bugs that remained involved an instruction restart following an access violation.

Results

As a result of this strategy, the software emulation code was completed and fully debugged before the first real MicroVAX hardware was finished. The ULTRIX-32 and VAXELN operating system groups were able to take the VMS emulation code and convert it to work under their operating systems. That took much less effort than was required for the VMS development team to implement that code. With this technique, bugs found in the instruction-execution logic in one system could be corrected in all three operating systems.

A second benefit of this engineering effort was seen by the hardware designers. The revised VAX-11/730 microcode sources and microcode tools were further modified to create a MicroVAX CPU chip simulator. The simulator allowed the MicroVAX CPU boards to be tested before any MicroVAX chips were actually available.

The biggest gain of all was that no application software, compilers, or operating system code had to be rewritten to avoid the use of the missing instructions.

References

1. VAX Architecture Reference Manual (Bedford: Digital Equipment Corporation, Order No. EK-VAXAR-RM-002, 1983).
2. D.W. Dobberpuhl et al., "The MicroVAX 78032 Chip: A 32-bit Microprocessor," Digital Technical Journal (March 1986, this issue): 12-23.
3. VAX Software, Languages and Tools Handbook (Maynard: Digital Equipment Corporation, Order No. EB-27240-48, 1985).
4. W.R. Bidermann et al., "The MicroVAX 78132 Floating Point Chip," Digital Technical Journal (March 1986, this issue): 24-36.

Steven E. Boone
Guenter E. Schneider

The TK50 Cartridge Tape Drive

A streaming tape drive, the TK50 subsystem, provides fast backup and data transfer for small computers like the MicroVAX II system. A single-reel cartridge, using half-inch magnetic tape, stores 100 megabytes of data. A unique tape transport system automatically threads the tape when the cartridge is inserted. The drive reads and writes data in a serpentine manner, going the entire tape length first on one track, then another. For high data integrity, the TK50 subsystem employs a sophisticated error-recovery algorithm, reading data after writing it and rewriting any corrected data farther down on the tape. The Q-bus controller, the TQK50, contains complex firmware conforming to Digital's Storage Architecture and controlling data transfers between the CPU and the tape.

As the performance of computer systems expands while their size shrinks, many factors demand special attention. One major factor is storage systems. Over the past few years, disk drives have made dramatic advances, providing storage capacity of hundreds of megabytes in very small and relatively inexpensive packages. Since the predominant technology for today's disk drive is based on the fixed-media concept, some means of providing system backup and data transfer capabilities is required. Magnetic tape systems are still the most viable way of providing these capabilities.
Ease-of-use considerations require that a backup/transfer device be matched in capacity to the supported disk systems. It should also be extremely reliable, fast, and very cost effective. This paper describes a peripheral subsystem, the TK50 magnetic cartridge tape drive (Figure 1), that meets all these requirements.

Figure 1 The TK50 Tape Drive

Design Goals of the TK50 Subsystem

The TK50 cartridge tape subsystem was conceived to meet the needs of the MicroVAX II and similar computer systems. A study of tape products then available indicated that existing quarter-inch cartridge drives did not provide either the performance or the capacity required to back up the large capacity disk drives supported by these systems. Existing drives also lacked the reliability and data integrity required to complement the designs of our new microsystems. Therefore, Digital designed the TK50 cartridge tape subsystem to meet the needs of the MicroVAX II system and other small to mid-range computers.

A wide variety of factors defined the design goals of the TK50 subsystem. It had to fit into a standard 5¼-inch form factor and provide high capacity with high data integrity. The desire for mechanical simplicity, reliability, and low cost, while maintaining good performance, dictated a streaming tape design. The TK50 subsystem had to be compatible with the Q-bus, and the TK50 controller had to support the Tape Mass Storage Control Protocol (TMSCP) of the Digital Storage Architecture.

Our investigations led to the concept of an automatic-threading, single-reel cartridge that utilized the established medium of instrumentation tape. This tape supports high bit densities and fast tape speeds, allowing great latitude in specifying the performance and capacity of the TK50 subsystem. We also decided to use half-inch tape, rather than quarter-inch, to maximize capacity. The requirement of the MicroVAX II system, as well as our desire to minimize risks in a first generation product, dictated that the tape capacity should be 100 megabytes (MB).

System Design

The TK50 cartridge tape subsystem was developed with three major components:

• A tape cartridge, called the CompacTape Cartridge, that houses 600 feet of half-inch tape and supports the auto-threading feature of the transport mechanism
• A unique streaming tape transport featuring auto-threading and a microprocessor-controlled servo-system
• An intelligent, microprocessor-based Q-bus controller that supports TMSCP

CompacTape Cartridge

The CompacTape Cartridge is unique in many ways. First, it provides a large amount of data recording surface for its volume. The cartridge has approximately two hundred and fifty times the recording surface area of a single-sided 5¼-inch floppy disk. Moreover, compared to the only commercial tape product then available to fit the 5¼-inch form factor, the CompacTape Cartridge is four times as efficient in utilizing tape volume in relation to cartridge volume. The cartridge is designed to maximize the volume of tape in the standard form factor of the 5¼-inch drive. The cartridge, shown in Figure 2, contains a single reel with the tape occupying forty percent of the cartridge's volume. The tape is ½ inch wide, .001 inch thick, and 600 feet long.

Second, the CompacTape Cartridge is a completely enclosed device that never exposes the media to the environment, thus greatly enhancing the data reliability of the entire subsystem.

Third, the CompacTape Cartridge allows automatic tape threading once it is inserted into the TK50 tape drive.
This auto-threading function is a key feature of the mechanical design of the tape transport.

Figure 2  The TK50 Tape Cartridge

The auto-threading works in the following way. When a cartridge is inserted into the drive, the tape must be threaded around the tape guides, over the read/write head, around the take-up reel, and then fastened to the reel hub. Two leaders are used to accomplish the threading, as shown in Figure 3. One, made of .007-inch Mylar, is attached to the BOT end of the tape in the cartridge; the second is attached to the hub of the take-up reel in the drive. This second leader has an arrow-shaped tip that reaches from the reel, through the tape path, and into the area that will be occupied by the tip of the first leader when the cartridge is inserted. During the insertion process, the arrow-shaped tip is moved by a cam into the opening of the cartridge leader. Tension is then applied to lock the leaders together. This "buckle" is now ready to be pulled through the tape path and wound onto the take-up reel.

Figure 3  Engagement of Drive Leader to Cartridge Leader (Steps 1 to 3; labels: cartridge leader opening, arrow-shaped buckle tip, retaining notch, tape drive leader)

This buckling process is accomplished by two links in the drive, in conjunction with a constant tension applied by the motor to the take-up leader. One link uses a cam to move the two leader tips into each other. The other link holds the take-up leader in the correct position and retreats at the right instant, allowing the motor to cinch the buckle. The entire process happens during the last half-inch of insertion as the cartridge enters the drive. (See Figure 4.) This linking takes place without any tape being spooled out of the cartridge.
When the tape is rewound into the cartridge for removal from the drive, the two ears on the cartridge leader come to rest in a pocket in the cartridge shell. When the cartridge is removed from the drive, two opposing locks hold the reel in this position. The toothed locks engage with the teeth on the outer diameter of the reel flange. Thus locked, the tape stays tightly wound and the leader tip is kept in the correct position for a subsequent buckling process.

Figure 4  View of Leader Shown in Four Positions (proper location of leader; leader unhooked; leader hidden; leader displaced above link)

Tape Transport

The TK50 tape transport (Figure 5) consists of two major components: the tape drive and a single printed circuit board assembly. The tape drive encompasses the mechanical and electromechanical components to read data from and write data to the magnetic tape. The drive's major components include

• The magnetic read/write head and its linear positioner
• The cartridge threading mechanism
• The take-up reel and its motor
• The drive hub mechanism, which interfaces to the CompacTape Cartridge, and its motor
• The tachometer, which provides feedback to a microprocessor, the 8051, for tape speed control
• Various sensing devices that monitor and control the handling of the tape as it passes over the read/write head

Figure 5  TK50 Tape Drive Transport

Figure 6  TK50 Read/Write Head (Top View)

Read/Write Head

The read/write head is designed with four islands that are in contact with the tape (Figure 6). The tape forms a polygon as it contacts these four areas. Each island bends the tape by an angle of six degrees.
Over its width, each island is curved by an amount corresponding to the radius of the natural curvature of the tape under working tension, thus assuring good surface contact (Figure 7). The narrow islands limit any temporary liftoff (due to contamination) to very short sections of tape, and they clean the tape as well.

Figure 7  TK50 Read/Write Head (Side View)

Except for the ferrite cores, the entire head block is made of ceramic material to ensure long life. The two inner islands contain the read/write cores; the two outer ones direct the tape to the inner ones so that uniform contact between the tape and the head is provided.

On the upper part of the head assembly are two gaps, a write gap (.018 inch wide) followed by a read gap (.008 inch wide), that read and write data when the tape is moving forward. Two corresponding lower gaps read and write data during reverse tape motion. The lower gaps cover the odd tracks and the upper gaps cover the even tracks; thus, the head has to traverse only half the tape width, helping greatly to keep the height of the drive within limits. The track spacing is .019 inch.

Auto-Threading

As the cartridge is inserted, its door opens, exposing the cartridge leader. Then, as described earlier, two plastic arms in the drive act to buckle the cartridge's supply leader to the drive's take-up leader. The rest of the auto-threading process is handled by the drive's motors, sensors and microprocessor.

Tape motion and tension control is accomplished through two microprocessor-controlled brushless direct-current motors. One of these motors is connected directly to the take-up hub; the other to a drive hub designed to interface to the CompacTape Cartridge.
The engagement of the cartridge hub with the drive motor shaft is accomplished by a pair of gears that transmit torque and simultaneously center the reel (Figure 8). A plastic hub with one set of teeth is attached to the spindle; another set of teeth is molded on the underside of the cartridge reel hub. A clutch gear engages both sets of teeth to drive the reel. To facilitate the insertion or removal of the cartridge, the clutch gear is axially retracted out of engagement. The clutch gear is activated by the operator's lowering or raising the handle. When the handle is lowered, the spring-loaded lower gear engages the reel and lifts it slightly into the cartridge to eliminate contact between the rotating reel and the stationary shell (Figure 8). This clutching arrangement has a big advantage because it allows mechanical simplicity and easy operation of the drive.

Figure 8  TK50 Door Assembly

The cartridge is inserted by the operator into a channel (receiver) that puts the two leaders into a coplanar relationship. The entire linking process is thus accomplished by merely sliding the cartridge into the receiver slot. A solenoid-activated interposer locks the cartridge in place when it reaches the end position in the receiver. When the front handle is then lowered, the drive gear rises to mate with the cartridge reel. A set of fingers simultaneously enters the bottom of the cartridge to release the reel locks, thus allowing the tape to move. The operator accomplishes all these actions with one hand.

After a tape cartridge is inserted into the TK50 drive, the operator presses a button and the 8051 microprocessor on the printed circuit assembly initiates the threading process. The reel motors, under microprocessor control, slowly put tension on the tape to accomplish the process.
The buckled leaders and a length of tape are pulled through the drive and onto the take-up reel. Auto-threading is complete when the BOT hole in the tape is detected by a photo-transistor. When the auto-threading operation ends, the microprocessor will have received pulses from a tachometer attached to one of the rotating tape guides. Through the information derived from the tachometer, the microprocessor can maintain proper tension and tape speed.

After the tape is positioned at BOT, the controller requests a calibration procedure. This procedure sets up the drive to ensure that proper values for the read circuitry gain and head stepper alignment are obtained. This calibration provides one of the key features of the TK50 subsystem: the ability of a user to exchange media between different TK50 tape drives without the need for adjustments. Once calibrated and at BOT, the TK50 drive is ready to read or write data.

The drive writes data in a serpentine fashion over the entire length of the tape. The upper part of the read/write head writes data on one track down the entire length of tape until it reaches a logical EOT marker. (The logical EOT marker is a preset tachometer count; the physical EOT marker is a hole in the tape.) The tape direction is then reversed, and the lower write core writes data in the other direction for the entire length of the tape until a logical BOT is reached. The direction of the tape is then changed to forward, the head is stepped up by 19 mils, and the upper write core is again used to write data. Figure 9 illustrates the physical tape configuration.

Drive Circuitry

The printed circuit board assembly is built around an 8051 microprocessor.
The 8051 and associated circuitry provide the intelligence to interpret commands, provide servo control for the reel motors, perform tape calibration procedures, and monitor various status inputs. The read/write circuits necessary to translate data to and from the tape's MFM format also reside on the board. Figure 10 illustrates a simplified block diagram of the TK50 drive board.

Write data comes into the drive's logic board via the differential signal cable from the controller board. The data enters the shift register, which accepts the serial data and outputs a five-bit parallel data pattern into a programmable array logic (PAL) device. The data is clocked through the shift register by a 500-KHz clock. (500 KHz is the write pulse rate, or data rate.) The PAL first accepts the five parallel bits from the shift register. Then the PAL generates the pre-compensation, as required, and translates the data into the MFM format recorded on the tape. A constant current source of 15 milliamps is applied alternately to each core of the active write head, resulting in the flux transitions necessary to write data on the tape.

To enhance data reliability, the TK50 subsystem reads data just after writing it. This technique uses the read head (positioned immediately behind the write head) to read the data from the tape as soon as it has been written. (See Figure 7.) The read data is sent back to the controller, where the communications interface performs CRC processing. If an error is detected, the controller rewrites the block that contained the error. The rewritten block is placed farther down on the tape to avoid the performance loss resulting from the drive's having to move the tape back and rewrite over the data block containing the error. The controller firmware is able to detect these rewritten blocks during a subsequent read pass for data recovery procedures, thereby enhancing system integrity.

Figure 9  Physical Tape Configuration (data area 183 meters (600 feet) nominal, with BOT and EOT holes, calibration tracks, guard bands, and the logical beginning of track for the forward and reverse directions)

Figure 10  Block Diagram of the Drive Board

Read data signals from the read head are fed to the differential preamplifier circuit and in turn to the read amplifier. The gain of the preamplifier is automatically set during calibration to maintain an optimum signal level. The signal from the read amplifier is then passed to a differentiated, linear-phase, low-pass filter. A zero-crossing detection circuit produces a digital signal, consisting of a single pulse for each detected zero crossing, that represents data read from the tape. The digital data is then sent to the phase lock loop (PLL) circuit where the clock signal is recovered and the MFM data is decoded. The PLL consists of two PALs, a voltage-controlled oscillator, and some analog circuitry.

The read-data pulse from the read amplifier circuit is used in conjunction with the 500-KHz write clock to optimize the "lock time" for the PLL. Whenever there is a gap (no signal) going into the PLL, it will lock onto the 500-KHz clock signal. This locking is done so that the loop-filter integrating capacitor is kept at a constant voltage. This process minimizes the phase lock time during the preamble.

When the READ ENABLE signal is asserted, the PLL waits for the synchronization (sync) bit. When the PLL detects the transition, it clocks the sync bit and data onto the serial line to the controller and starts sending back the read clock. The sync bit signals the communications processor on the controller to start processing the following data and the CRC check-word, and to check for a matching CRC.

Q-bus Controller

The intelligent interface between the TK50 tape transport and the Q-bus is designated as the TQK50. Figure 11 is a block diagram of the TQK50. The interface is a Q-bus-compatible dual board based on the 80186 microprocessor. In conjunction with 32 kilobytes (KB) of highly complex firmware, the 80186 and its associated hardware perform the following functions:

• Interface the controller to the Q-bus (via single-word and DMA transfers)
• Translate and process TMSCP command packets and responses
• Provide data format and error recovery processing
• Control the general operation of the tape transport mechanism
• Support the serial data link between the controller and drive

Figure 11  Block Diagram of the TQK50 (M7546 Q-bus controller: microprocessor, RAM, ROM, UART, Q-bus interface, drive interface, and drive transceivers)

Hardware

The Q-bus interface is controlled primarily by an 80186 microprocessor and an 82S105 field programmed logic sequencer (FPLS), which is a high-performance LSI device capable of performing complex logic functions. Using the 82S105 FPLS sequencer allowed us to create an efficient, flexible design in a very small space. The FPLS and microprocessor are responsible for maintaining the strict Q-bus protocol during DMA transfers to and from the controller. The DMA transfers and interface interrupts are processed very quickly due to the high performance of the microprocessor and FPLS. This high performance makes possible the data rates needed to support tape streaming and lessens the criticality of the DMA latencies in the host system.

Assisting the FPLS is an 80186 microprocessor operating at 6 MHz. The 80186 is a highly complex, 16-bit microprocessor; it is responsible for all the command, control, and data processing for the TQK50. A microprocessor with the 80186's performance is required due to the large number of complex tasks that must be performed within very short time frames (e.g., ECC processing during inter-block gaps on tape).

The high level of integration available with the 80186 was a key factor in its selection.
In addition to the CPU, the 80186 contains three onboard timers, an interrupt controller, address decoding, and two DMA channels. Also important in the selection of the 80186 was the availability of sophisticated development tools and efficient software support packages.

The 80186 microprocessor is supported by numerous components that include SSI, MSI and PAL devices. Furthermore, the program store and the workspace/data buffers are provided by 128-kilobit (Kb) EPROMs and 64Kb static RAMs. A total of 32KB of program store and 16KB of buffer is available to the 80186.

Communications between the TQK50 controller and the TK50 tape transport take place over a pair of full-duplex, differential, serial lines. A multiprotocol communications processor (NEC 7201) is used to process the serial-to-parallel and parallel-to-serial conversions. One full-duplex channel, operating at 187.5 kilobaud, communicates the command/status information between the controller and the transport. The other channel provides the data communications path, supported by data-link error checking via CRC-16. This second channel operates synchronously at 500Kb per second. The NEC 7201 communications chip supports DMA transfers to and from the 80186 and operates in a priority-interrupt mode.

Firmware

The most complex component of the TK50 subsystem is its firmware. The 32KB of firmware contained in EPROM are partitioned into five major functions:

• The PORT/Q22 (Q-bus) for data transfer control
• The SERVER for TMSCP command processing
• The TOS for tape transport control and formatting
• The ECC for error detection and correction
• The ROD for resident onboard diagnostics

The PORT/Q22 firmware controls data transfers between the controller and CPU, and also maintains the command queue processing.
Up to four TMSCP commands can be queued, allowing the host to set up a series of operations for execution while it continues with other processing. DMA transfers of up to 64K-1 bytes can be made, allowing an effective, low overhead data transfer between the subsystem and CPU memory space.

The SERVER firmware is responsible for translating and executing the wide variety of TMSCP commands. These commands provide a very structured environment within which control, status, and data transfers are accomplished. TMSCP is a packet protocol that uses a command-response sequence. Each pair of command-response packets contains information pertaining to the internal command as well as various command modifiers, status fields, and subsystem parameters. All levels of information, from the command sequence number to command status to hardware and firmware revision levels, are provided in TMSCP. In addition to assembling and processing this information, the SERVER firmware uses values, such as physical and logical record numbers, to validate information being processed from the tape.

SERVER has an additional mode that supports the Diagnostic Utility Protocol (DUP). DUP provides a set of commands that allow detailed tests of the subsystem to be performed. DUP operates in conjunction with the resident onboard diagnostic module.

The TOS (tape operation support) firmware controls the transfer of data between the tape transport and the buffers allocated by SERVER. This control is accomplished through formatting operations and through physical control of the tape transport mechanism.

The TK50 subsystem is a streaming tape drive that was designed to operate in an efficient block-mode environment. The TK50 subsystem relies on logical information written on the tape to determine the tape's physical and logical positions.
The physical and logical contexts are maintained by the TOS firmware and written into special control fields embedded in the TK50 tape format. Information contained in these fields includes physical object number, logical object number, tape-mark number, byte count, sequence control number, track number, and block type. This information is processed by TOS to maintain the physical and logical contexts between the subsystem and the data on the tape.

During streaming operations, context processing is the primary function of TOS. However, when the host system is unable to process data at a sufficient rate to maintain the streaming operation (45KB per second), TOS must provide complex positioning control. Whenever the host system falls below the required data transfer rate, TOS must stop the tape. Since the TK50 subsystem was mechanically optimized for streaming, any stopping and starting of the tape is a time-consuming and imprecise operation. Moreover, the TK50 subsystem lacks the inter-record gaps that are used for positional information in traditional 9-track tape drives. The TK50 subsystem must rely on data read from the tape to locate its position.

When the host system resumes data processing, TOS must reposition the tape by a sequence of reverse, stop, forward, and read. After locating the last data block processed on the tape, TOS continues with the host's request. The host's failure to process data at a sufficient rate is costly in terms of system throughput. This situation requires increased complexity in the subsystem design.

TOS provides a padding function to help compensate for insufficient host processing power. With padding, TOS allows data latencies of up to 63 milliseconds before reverting to the repositioning mode.
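The padding policy can be illustrated with a small decision function. This is only a sketch, not the TOS firmware: the 63-millisecond window is the figure given above, while the 9-millisecond pad interval and the 512-byte pad block are the figures the article gives in the sentences that follow. The function name and the rounding rule (a pad block, once started, is always finished) are assumptions made for illustration.

```python
import math

# Figures quoted in the article
PAD_WINDOW_MS = 63      # longest host latency covered by padding
PAD_INTERVAL_MS = 9     # one pad block is written every 9 ms
PAD_BLOCK_BYTES = 512   # tape consumed by each pad block


def tos_latency_action(latency_ms):
    """Return (action, pad_blocks, capacity_lost_bytes) for one host stall.

    Hypothetical model. Assumption: a pad block that has been started is
    always completed, so the pad count is rounded up.
    """
    if latency_ms <= 0:
        return ("stream", 0, 0)
    if latency_ms > PAD_WINDOW_MS:
        # Window exceeded: stop and reposition. Pads written during the
        # window are overwritten later, so they cost no capacity.
        return ("reposition", PAD_WINDOW_MS // PAD_INTERVAL_MS, 0)
    pads = math.ceil(latency_ms / PAD_INTERVAL_MS)
    return ("pad", pads, pads * PAD_BLOCK_BYTES)
```

Under these assumptions a 20-millisecond stall costs three pad blocks (1536 bytes of tape) but keeps the tape streaming, whereas a 64-millisecond stall forces a reposition.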
During this data latency period, pad blocks are written to the tape in 9-millisecond increments. That allows the tape to continue streaming. The trade-off is improved performance at the expense of slightly reduced tape capacity (512 bytes per pad block). If the 63-millisecond period is exceeded, TOS stops and performs a reposition to the point of the last data block. When additional data arrives, TOS overwrites any previously written pad blocks. In practice, this pad function enhances performance and seldom reduces tape capacity by more than ten percent.

The ECC firmware provides the means to detect and correct errors. To provide a high level of reliability, the TK50 subsystem is designed to allow only one unrecoverable error in every 1 × 10¹¹ bits read. This is equivalent to one unrecoverable error in every 125 cartridge reads. To achieve this goal, ECC implements error-detection and error-correction schemes. Error detection is based on the CRC-16 method, which is supported by the hardware communications device. This industry-standard method has been proven to be very efficient in this environment.

To implement the error-correction function, ECC processes serial-formatted data to and from the tape. Data is written to and read from the tape in 512-byte blocks. The blocks are grouped into 8-block units, called data entities. Within an entity, the four even-numbered data blocks (0, 2, 4, 6) and the four odd-numbered blocks (1, 3, 5, 7) are protected by longitudinal checksum blocks. An entity, therefore, consists of ten blocks: data blocks 0 through 7 and ECC blocks 8 and 9. Figure 12 shows the arrangement of the ten blocks.

Figure 12  Entity of Ten Blocks (512-byte blocks)

This technique, coupled with record-level checking by SERVER and the host operating system, ensures the complete integrity of the user's data.

The ROD (resident onboard diagnostics) firmware provides additional support for the TQK50. When the subsystem is initialized, the firmware executes a series of go/no-go tests that validate the functionality of the controller. Ninety-eight percent of the TQK50's functionality is covered by these tests, excluding the Q-bus and drive-interface logic circuits. More extensive diagnostics that fully test the TK50 subsystem are available under the DUP. Having the diagnostics resident in firmware allows the running of integrated tests that interact at levels not permitted from the system interface. That avoids the difficulties in supporting down-line loadable code in various run-time environments.

Summary

Designing the TK50 cartridge tape subsystem and turning it into a product was a significant challenge. The effort proves that good performance, high reliability, ease of use, and extraordinary data integrity can be achieved in a cost-effective manner. These qualities will continue to be required as computer systems increase in performance and capacity.

To that end, the TK50 cartridge tape subsystem is but the first of a family of cartridge tape products. Work is continuing on the development of subsystems with higher performance and greater capacities. Interfaces to computer systems other than those based on the Q-bus have been or are being developed to meet the expanding needs for greater storage capacity.

Acknowledgements

Designing the TK50 cartridge tape subsystem required a multitude of disciplines involving scores of individuals. Each member of the TK50 program team contributed time, energy, and personal commitment to yield a successful product. The authors wish to acknowledge those contributions here.

Raymond J. Lanza

Porting ULTRIX Software to the MicroVAX System

The ULTRIX system, written in the C programming language, was ported to the MicroVAX II processor by a multistep process. This involved establishing a cross-development environment, building a boot path, porting the ULTRIX kernel, and writing special device drivers. The remaining software was ported after those steps were completed. To minimize ULTRIX design changes, the system's I/O architecture was mapped into the MicroVAX physical address space so as to mirror the equivalent mapping on larger VAX systems. Some MicroVAX instructions must be emulated in macrocode. The emulator used in the MicroVMS software was adapted for use in this ULTRIX software.

The UNIX system came into existence in 1969 at the AT&T Bell Laboratories in Murray Hill, New Jersey. The initial system was written in assembler and ran on a PDP-7 system that was loaded from paper tapes. From late 1970 to early 1971, the UNIX software was reimplemented for the PDP-11 system using a cross assembler running on the original PDP-7 system. In 1973, the kernel was rewritten in the C programming language. Since that time the system has undergone many changes and is still the subject of much research.¹ Today, there are two major 32-bit variants of the original software: 4BSD, developed at the University of California at Berkeley; and System V, from AT&T Corporation. Digital Equipment Corporation's original ULTRIX-32 product is a direct descendant of 4.2BSD.

In 1983, Digital decided to develop and distribute a UNIX software product. At that time, 4.2BSD was the only virtual-memory UNIX operating system running on VAX processors. It is still the only UNIX software derivative to provide network support. These features were the key factors in deciding to use 4.2BSD as the basis of the ULTRIX-32 system.
Development started in the fall of 1983 on one of the first 4.2BSD distributions, and the final product was released in April 1984 as ULTRIX-32 V1.0. In the current version of the product, we have combined the two UNIX system derivatives by adding the system services of the AT&T version to the original ULTRIX-32 system. To that base we have added reliability and maintainability features, as well as new-processor support. The resulting system, one of the industry's most powerful and versatile UNIX software versions, spans the full VAX system price/performance range.

Porting the UNIX System

"Porting" is the process of implementing an operating system on a new processor. The UNIX system has been ported to more processors than any other system in existence. It runs on all classes of machines, from 8086 microprocessors to the CRAY-2. For VMS and RSX systems and the like, porting normally means a major rewrite because significant parts of them are written in low-level languages, usually macro assembler. Rewriting one of these systems is so expensive that either the effort would not be undertaken or the new system would be written from scratch.

The UNIX system is different. It is written in a single high-level language, C,[2] and has been structured to be as processor independent as possible. However, vestiges of its PDP-11 heritage are still apparent.

All 32-bit versions of the ULTRIX-32 system are built from a common set of source files. The kernel files are organized into machine-dependent and machine-independent parts. The differences between the VAX and MicroVAX versions of the system are resolved through the use of conditional compilation and linking. The present kernel sources for the MicroVAX version are as follows:

    Files   Language
    209     C headers
    315     C source
    21      Assembler source

The 21 assembler source files can be further broken down as follows:

    Files   Purpose
    14      MicroVAX subset and floating point emulator
    3       Templates for rpb, scb, spt
    3       Macro definitions
    1       Initial startup code (locore.s)

The last and most significant file is locore.s, which contains the initial startup code and a few critical routines needed for process management.

Bringing the UNIX system up on a new processor is normally done in multiple steps by a small team. The difficulty and extent of the work involved are directly related to the architectural differences between the versions for the existing and target processors. Our team consisted of three people, later joined by a fourth. The first was responsible for the compiler and subset emulator. The second did the software installation and verification for the first version of the product. Later, he was responsible for some device drivers. The author of this paper did the kernel port and other device drivers. The fourth person assumed responsibility for installation.

Bringing the ULTRIX-32 system up on a processor involves the following steps:

1. Establish a cross-development environment.
   • C language
   • Native assembler
   • Linker
   • Debugger
2. Build a boot path.
3. Port the kernel and a few key programs.
4. Write special device drivers.
5. Port the rest of the system.

The Cross Development Environment

When porting to a new architecture, it is necessary to develop a set of tools that produces code for the target system. These tools constitute a cross-development system for software generation and often become the basis for the eventual native environment. Their construction is normally the first step in the porting process. In the MicroVAX case, the cross-development tools were not necessary, for reasons explained below.

The MicroVAX system is a subset architecture with the majority of the string manipulation instructions missing.[3] MicroVAX systems can also be configured without floating point support in the hardware. Our challenge, which was also shared by the VMS and VAXELN Development Groups, was to provide an execution environment for user programs that was completely compatible with larger VAX systems.

By closely examining the instructions produced by our C compiler, we found that, with the exception of the floating point instructions, it generated none of the missing instructions. Further examination revealed that the only place where any of the missing instructions were used was in a handful of output formatting routines. As an interim solution, the affected routines were rewritten to eliminate the missing instructions.[4]

The Boot Path

MicroVAX systems contain the virtual memory boot (VMB) program in ROM. Normally this program loads the VMS system, but it has been enhanced to perform an alternate initial program load operation, called a boot-block boot. This operation is the mechanism used to boot the ULTRIX system and is based on block number 0 of the boot disk being in a special format. Booting is a multistage process.

1. VMB first checks for an ODS-II file structure.[5] In the default case, VMB will perform a "sniffer boot," which consists of first checking the removable media, then the fixed disks, and finally the Ethernet. The system can also be booted from the TK50 cartridge tape drive and a special PROM board.
2. If an ODS-II file structure is not present, VMB looks for a valid boot-block image in the first block on the disk. This block contains a table that specifies the size and location of the secondary boot image. If the table is valid, VMB reads the secondary boot image into memory and transfers control to the image. (If the table is invalid, control is transferred back to step 1 above.)

3. The secondary boot image on ULTRIX systems is a program that locates, reads, and executes the tertiary boot program from an ULTRIX file system. The functionality of the secondary boot is severely constrained because it resides outside the file system in a fixed-size (7.5KB) area adjacent to the boot block.

4. The tertiary boot is capable of loading and running other programs. Unlike the secondary boot program, it supports interactive terminal I/O and can prompt the user for an alternate program to load. As a default, the tertiary boot loads the operating system kernel, called vmunix,[6] from the boot disk.

5. After the steps above have been completed, the kernel is in memory and ready to run.

The two boot programs are part of the stand-alone system, which in itself constitutes a porting problem that is not very different from porting the kernel. The problems encountered are similar, although simplified, because the stand-alone system runs with the interrupts and memory management disabled. The stand-alone system is not nearly as flexible as the kernel.

Porting the Kernel

The VAX Architecture Standard (Digital Standard 032) specifies the VAX instruction set, memory management, and process environment. However, the standard leaves many other areas open for change. These areas are typically ones that need to be supported on each new processor.
For the MicroVAX system, it was necessary to address problems in the following areas:

• Startup code
• I/O architecture
• Console support
• System clock
• Missing instruction emulation

Initial Startup Code

After the kernel is loaded into memory, control is transferred to the initial startup code. This is entered with the processor interrupt priority "raised" to disable the interrupts, and with memory management turned off. The code sets up the memory management system and then "handcrafts" the processor to run the first VAX process. The majority of this code is located in a single assembly language file, called locore.s. In the case of the MicroVAX system, the instruction emulator and several changes to the I/O system required special mapping support during startup. (This support is discussed in the last section of this paper.)

In addition to the startup code, locore.s contains time-critical routines that use the VAX process-management instructions. Some of them contain a casel instruction based on the processor type for processor-specific operations. Those routines had to be extended to include the MicroVAX processors.

I/O Architecture

VAX processors do not contain I/O instructions; instead, devices and device adapters exist in various sections of the physical address space of the processor. The control and data registers for these adapters appear as memory locations and are accessed using normal instructions. A key element of system software for any new processor is support for these devices and their associated address spaces. As an example, the physical address space of the VAX-11/780 system is pictured in Figure 1.

Figure 1  VAX-11/780 Physical Address Space

    Physical Address        Function
    0000 0000 - 1FFF FFFF   Memory
    2000 0000               TRACK0  (8KB)   adapter or NEXUS register address space
    2000 2000               TRACK1  (8KB)
    2000 4000               TRACK2  (8KB)
    2000 6000               TRACK3  (8KB)
    ...
    2001 C000               TRACK14 (8KB)
    2001 E000               TRACK15 (8KB)
                            Reserved (128K)
    2010 0000               UNIBUS0 address space (256K)
    2014 0000               UNIBUS1 address space (256K)
    2018 0000               UNIBUS2 address space (256K)
    201C 0000               UNIBUS3 address space (256K)

Each of the UNIBUS spaces can be further broken down as shown in Figure 2. The physical address space of the MicroVAX II system is somewhat simpler, as depicted in Figure 3. With the exception of the memory sections, the address spaces of the two processors appear to be very different. In fact there are a surprising number of similarities, as shown in Table 1.

The NEXUS space is where the adapter control and status registers reside. In the case of a UNIBUS adapter, the registers that control the mapping from the bus to main memory are located in the NEXUS space. The equivalent MicroVAX area, called local register space, also contains the mapping registers for the Q-bus to main memory. These physical address spaces are eventually mapped into virtual addresses through entries in the VAX Page Table. The result is pictured in Figure 4.

One development goal that we set for each new processor support project is to minimize the changes necessary in the operating system. In the case of the MicroVAX II system, we examined the differences in the physical address spaces between that system and larger VAX systems. Although the names, sizes, and positions of the spaces were different, the spaces are functionally equivalent on both the small and larger systems. As a result, we "coerced" the local register space into the NEXUS map, and the Q-bus memory and I/O spaces were arranged to look like a large UNIBUS adapter. With this approach we were not forced to drastically alter the kernel's view of the machine, thus minimizing changes to other portions of the kernel.
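The coercion of MicroVAX II address spaces into their larger-VAX counterparts can be sketched as a simple lookup table. The following Python fragment is only an illustration under stated assumptions, not ULTRIX kernel code: the base addresses and sizes are read from the labels of Figure 3 and Table 1, and the names REGIONS and classify are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Each entry: (base, size, MicroVAX II name, role in the kernel's existing view).
# Bases and sizes follow the labels of Figure 3 and Table 1; treating them as
# exact region boundaries is an assumption made for this sketch.
REGIONS = [
    (0x00000000, 16 * MB, "memory", "memory"),
    (0x20000000, 8 * KB, "Q-bus I/O", "UNIBUS I/O"),
    (0x20080000, 256 * KB, "local register space", "NEXUS"),
    (0x30000000, 4 * MB, "Q-bus memory", "UNIBUS memory"),
]


def classify(paddr):
    """Return (MicroVAX II name, kernel-view role) for a physical address,
    or None when the address lies outside every listed region."""
    for base, size, name, role in REGIONS:
        if base <= paddr < base + size:
            return name, role
    return None
```

For example, classify(0x30000000) reports Q-bus memory, which the kernel would treat as UNIBUS memory, while an address outside every listed region yields None.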
Figure 2  VAX-11/780 UNIBUS Space

    UNIBUS memory (248K)
    UNIBUS I/O: device registers (8KB)

Figure 3  MicroVAX II Physical Address Space

    Physical Address        Function
    0000 0000 - 1FFF FFFF   Memory
    2000 0000               Q-bus I/O: device registers (8KB)
    2008 0000 - 200B FFFF   Local register space (256KB)
    3000 0000 - 303F FFFF   Q-bus memory space (4MB)

Table 1  Comparison of Physical Address Spaces for the VAX-11/780 System and the MicroVAX II System

    VAX Function     Size        MicroVAX II Function   Size     Purpose of Function
    Memory           2MB-64MB    Memory                 16MB     Execute programs
    NEXUS            8K each     Local Register         256K     CPU and bus control registers
    UNIBUS Memory    248K        Q-bus Memory           4MB      Device memory
    UNIBUS I/O       8K each     Q-bus I/O              8K       Device registers

[Figure 4  Result of Physical-to-Virtual Mapping (labels: 8000 0000, kernel, VAX memory, UNIBUS space, NEXUS space)]

A similar situation existed with respect to the Q-bus map. A device installed in a UNIBUS adapter sees an 18-bit address for a 256KB address space. The adapter has a set of registers that maps this 256KB space onto the much larger VAX memory space. These registers perform the equivalent of the function provided by VAX Page Table entries. In effect, they "virtualize" the memory that devices access. Figure 5 depicts this mapping.

[Figure 5  Q-bus Memory Mapping (labels: bus map, Q-bus memory)]

The MicroVAX II system contains a similar set of registers, with the principal difference being that it has enough to map all four megabytes of main memory. Although that appears advantageous, it in fact posed a serious problem. The ULTRIX system dynamically allocates the bus mapping registers from a central routine. It would have been easy to modify this routine to "know" about the extra registers. The problem encountered here was that these allocation routines return a word that is encoded as shown in Figure 6. The upper part contains the number of the buffered datapath allocated, the middle is the number of registers used, and the lower is the bus virtual address. The format of this 32-bit word is known by all device drivers that do DMA transfers. To change the word to use all the map registers available meant that the virtual address portion would need 22 bits instead of 18. That would have required corresponding changes in each of the device drivers. To determine the severity of these problems, we did some tests to see if the 18-bit format would be a limiting factor. Fortunately, we found that there were always registers available.

[Figure 6  Coding of Allocation Routine Word (fields from bit position 31 downward: buffered datapath number, number of registers, bus virtual address)]

The end result of the mapping and map-register allocation scheme was that UNIBUS device drivers could be left unchanged as long as the Q-bus hardware was compatible with the UNIBUS versions. We took advantage of that fact and thus were able to support the TSV05, DHV11, and RL02 subsystems without any impact on the development schedule.[7]

Console Port

Traditional VAX systems have a separate processor that performs console functions. This processor is used to control the main CPU and replaces the older-style front panel. Instead of having switches for halt or run, the console runs a program that provides halt, run, examine, and initial program load capabilities. Programs running in a VAX system can communicate with the console through an internal processor register. Commands sent in this register are used by the operating system to reboot and restart the machine.
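Returning briefly to the allocation word of Figure 6, the constraint it imposes can be checked with a little arithmetic. The sketch below is an illustration in Python, not the ULTRIX allocation routine: the 32-bit word and the 18-bit bus virtual address come from the text, while the widths chosen for the register-count and datapath fields (10 and 4 bits) and all names are assumptions made for the example.

```python
WORD_BITS = 32   # from the text: the routine returns a 32-bit word
ADDR_BITS = 18   # from the text: 18-bit bus virtual address (256KB)
NREG_BITS = 10   # assumption: width of the "number of registers" field
BDP_BITS = 4     # assumption: width of the buffered-datapath field


def fits_in_word(addr_bits):
    """True if all three fields fit in one word for this address width."""
    return addr_bits + NREG_BITS + BDP_BITS <= WORD_BITS


def pack(bdp, nreg, addr):
    """Encode datapath number, register count, and bus virtual address."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("bus virtual address does not fit in 18 bits")
    if not 0 <= nreg < (1 << NREG_BITS):
        raise ValueError("register count does not fit in its field")
    if not 0 <= bdp < (1 << BDP_BITS):
        raise ValueError("datapath number does not fit in its field")
    return (bdp << (ADDR_BITS + NREG_BITS)) | (nreg << ADDR_BITS) | addr


def unpack(word):
    """Decode a packed word back into (bdp, nreg, addr)."""
    addr = word & ((1 << ADDR_BITS) - 1)
    nreg = (word >> ADDR_BITS) & ((1 << NREG_BITS) - 1)
    bdp = (word >> (ADDR_BITS + NREG_BITS)) & ((1 << BDP_BITS) - 1)
    return bdp, nreg, addr
```

Under these assumed widths an 18-bit address exactly fills the word, whereas a 22-bit address would not fit without shrinking or moving the other fields, which is the kind of format change every DMA driver would have had to follow.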
The MicroVAX system is different: the console functions are handled by the microprocessor, the MicroVAX 78032 chip, which runs a program resident in ROM. As on the larger VAX systems, a register is used to communicate with the console. A code can be placed in this register. When a subsequent HALT instruction is executed, execution switches to the console program in ROM, which then examines the code in the register.[8] In fact the register is actually a memory location in RAM that is backed up by batteries. The ULTRIX system contains a reboot and halt routine that is accessed by a privileged system call. That routine was modified to communicate with the console program.

System Clock

The ULTRIX system keeps track of the current time by counting clock interrupts from the 10 ms interval timer. The time is kept in memory as an unsigned integer; it is initialized from the time-of-year (TOY) register during system boot. The time is set by a privileged program through standard system calls and can be read by normal user programs. That set procedure is normally done by the system manager using the DATE command. DATE converts the time from a format of year, month, day, hour, minute, and second to the integer format needed by the system call.

    User enters:                  System converts to:
    yymmddhhmmss    -- set -->    integer
                    <-- read --

    where yymmddhhmmss = Year, Month, Day, Hour, Minute, Second

The MicroVAX system does not have a TOY register; instead, it has a watch chip backed up by a battery. The chip contains a number of counters that correspond to the year, month, day, hour, minute, and second. We could have modified the system call or added a new one to explicitly set the MicroVAX TOY clock. That would have avoided the conversion to integer format, given that the user has to enter date and time information in the format needed by the watch chip.
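The two directions of the conversion between the DATE format and the integer format can be sketched as follows. This is an illustrative Python fragment rather than the ULTRIX routines: it assumes the integer counts seconds since 1 January 1970 UTC, the usual UNIX convention, and that a two-digit year yy means 19yy; the function names are hypothetical.

```python
import calendar
import time


def fields_to_seconds(stamp):
    """Convert a 'yymmddhhmmss' string to integer seconds since 1970 (UTC)."""
    if len(stamp) != 12 or not stamp.isdigit():
        raise ValueError("expected 12 digits: yymmddhhmmss")
    yy, mo, dd, hh, mi, ss = (int(stamp[i:i + 2]) for i in range(0, 12, 2))
    # Assumption for this sketch: two-digit years are 19yy.
    return calendar.timegm((1900 + yy, mo, dd, hh, mi, ss, 0, 0, 0))


def seconds_to_fields(seconds):
    """Convert integer seconds since 1970 (UTC) back to 'yymmddhhmmss'."""
    t = time.gmtime(seconds)
    return "%02d%02d%02d%02d%02d%02d" % (
        t.tm_year % 100, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec)
```

A watch chip with separate year, month, day, hour, minute, and second counters needs the field form, so a value that has passed through the integer form must be converted back with something like seconds_to_fields.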
However, it would have meant that we needed two versions of the DATE command, one for existing systems and the other for the MicroVAX system, to use the new format. To avoid that, we borrowed the conversion routines from the DATE command and used them in MicroVAX versions of the system time-setting routine. The irony here is that the date is now converted twice. The integer format is present on either side of the system call.

    User enters:                              System reads:
    yymmddhhmmss    -->    integer    -->     yymmddhhmmss

Missing Instruction Emulation

As mentioned previously, the MicroVAX hardware implements a subset of the full VAX instruction set. Most string instructions are missing and are emulated in macrocode instead of implemented in hardware. The emulation code could have been placed in libraries, where it could be linked with user-level code. To do that, however, would mean that linked images from other VAX systems would not run on a MicroVAX system, thus violating one of its basic objectives. Rather than using libraries, we chose to use an emulator designed by the VMS Development Group and ported that emulator to the ULTRIX system.[9] The emulator links with the kernel and is almost completely invisible to user programs. It is supported by new traps in the hardware that help to decode each missing instruction.

When the kernel or a user program executes one of the missing instructions, a trap occurs and the emulation code takes over. That happens without changing mode; in other words, if an emulation trap occurs in a user program, the emulator is entered in user mode, not in kernel mode as with other traps. The result is user-mode execution of code in the kernel address space. (Unlike the VMS system, the entire ULTRIX kernel is normally unreadable by user programs.) The startup code now initializes the pages containing the emulator so that they can be read and executed by user-level code.

As stated earlier, the end result is a combination of hardware and software that is almost completely compatible with systems running the full VAX instruction set. In fact, executable images from other VAX systems can run without relinking. The only point of incompatibility is that the emulation code runs on the user stack when one of the missing instructions is executed by user code. (We have seen one customer application that was affected by this situation. The application used knowledge of its past usage of the stack to do "garbage collection" and was confused by the intermediate results of the emulation code. That is normally not a problem; the ULTRIX-32 and ULTRIX-32m kits have over 500 user-level programs. They are compiled and linked once on a full VAX system and then run without modification on the MicroVAX system.)

Summary

In porting the ULTRIX system to the MicroVAX processor, we opted to maintain compatibility with other versions of the system wherever possible. We chose not to support hardware features if they violated internal or external interfaces. Therefore, we were able to deliver a broader range of peripheral support with a minimum of development. The end result, the MicroVAX system, combines hardware and software to provide customers, including developers of software device drivers, with a product that runs all VAX programs for a fraction of the cost of a larger VAX system.

References

1. A detailed history of and supplemental information about the UNIX system can be found in the AT&T Bell Laboratories Technical Journal, vol. 57, no. 6 (July/August 1978) and vol. 63, no. 8 (October 1984).

2. Some programmers consider C to be a low-level language; in fact, it has proven to be more than adequate for programming an operating system like the UNIX system.

3. D.W. Dobberpuhl et al., "The MicroVAX 78032 Chip, A 32-Bit Microprocessor," Digital Technical Journal (March 1986, this issue): 12-23.

4. This work was done long before the first hardware prototype was developed.

5. ODS-II is the VMS on-disk file structure.

6. The AT&T versions call this file "unix," while the Berkeley versions call it "vmunix," denoting "virtual unix."

7. However, we did have to expend time and energy to do additional configuration testing.

8. The MicroVAX 78032 chip never halts; it is running either ROM console code or programs in RAM.

9. K.D. Morse and L.J. Kenah, "The Evolution of Instruction Emulation for the MicroVAX Systems," Digital Technical Journal (March 1986, this issue): 76-85.