EK-FP780-TD-001
December 1978
112 pages
Document:
VAX-11/780 FP780 Floating-Point Accelerator Technical Description
Order Number:
EK-FP780-TD
Revision:
001
Pages:
112
EK-FP780-TD-001

FP780 Floating-Point Accelerator
Technical Description

digital equipment corporation - maynard, massachusetts

1st Edition, December 1978

Copyright © 1978 by Digital Equipment Corporation

The material in this manual is for informational purposes and is subject to change without notice. Digital Equipment Corporation assumes no responsibility for any errors which may appear in this manual.

Printed in U.S.A.

This document was set on DIGITAL's DECset-8000 computerized typesetting system.

The following are trademarks of Digital Equipment Corporation, Maynard, Massachusetts:

DIGITAL, DECsystem-10, MASSBUS, DEC, DECSYSTEM-20, OMNIBUS, OS/8, PDP, DIBOL, DECUS, EDUSYSTEM, RSTS, UNIBUS, VAX, VMS, RSX, IAS

CONTENTS

PREFACE

CHAPTER 1 INTRODUCTION

1.1 GENERAL DESCRIPTION
1.1.1 Accelerator Interface
1.2 FPA INSTRUCTION SET
1.3 PHYSICAL DESCRIPTION
1.4 REVIEW OF FLOATING POINT NUMBERS AND ARITHMETIC
1.4.1 Introduction
1.4.2 Integers
1.4.3 Floating-Point Numbers
1.4.4 Decimal/Binary/Hexadecimal Conversion
1.4.5 Normalization
1.4.6 VAX Floating-Point Notation
1.4.7 Floating-Point Addition and Subtraction
1.4.8 Floating-Point Multiplication and Division
1.5 EXCESS 80 (EXCESS 200₈) NOTATION

CHAPTER 2 FUNCTIONAL DESCRIPTION

2.1 DATA FORMAT
    Floating-Point Numbers
    Integer Numbers
    Literals
    Zero and Reserved Operand Codes
    Hidden, Overflow and Guard Bits
    Overflow, Underflow, Zero, and Reserved Operands
2.2 INSTRUCTIONS AND ALGORITHMS
    Add/Subtract
        Load
        Add/Subtract
        Normalize
    Multiply (Floating-Point)
        Load
        Multiply
        Normalize
    MULL (Multiply Integer Longword)
        Load
        Multiply and Return
    Divide
        Load
        Divide
        Normalize
    EMOD (Extended Precision Multiply and Integerize)
        Operand Load
        Result Calculation and Return
    POLY (Polynomial Evaluation)
        Introduction
        The Polynomial Expression
        Normal POLY Flows
        POLY Exception Flows
2.3 BLOCK DIAGRAM AND UNIT DESCRIPTION
    CPU-FPA Interface
    CPU-FPA Status and Control Interface
    CPU-FPA Data Interface
    Trap and Diagnostic Information
    FPA Internal Buses
    Fraction Adder (FAD)
    Fraction Normalizer/Divide (FNM)
    Normalize Operation
    Divide Operation
    Fraction Multiplier (FML and FMH)
    The Pipeline
    FM Control
    Division
    Exponent Processor
    Sign Processor
    Control Store and Logic
    IRD
    Performing an FPA Instruction
    Exception Conditions
2.4 FPA MICROCONTROL FIELDS
2.5 FPA MICROCODE STRUCTURE
2.6 FPA INTERFACE FIRMWARE
    Major Interface Functions
    Major Instruction Groups

FIGURES

The FPA
FPA Physical Location
Positional Value of Binary Number
Floating-Point Format
Integer Format
Short Literal Format
Zero and Reserved Operand Code
Hidden, Overflow, and Guard Bits
Overflow and Underflow Ranges
FPA Block Diagram
The POLY Flow
FPA Block Diagram
CPU-FPA Interface
Status Register
Maintenance Register
FP Bus Formats
Fraction Adder Block Diagram
[illegible] Operation
Fraction Normalizer/Divide Block Diagram
Normalize Shift Enable Control Hardware
Divide Sequence Hardware
Divide Sequence Timing
Fraction Multiplier Block Diagram
The Pipeline
Loading and Accessing the Multiplicand
Loading and Accessing the Multiplier
SALU Operation - Adding the Stored Carrys
FM Control States
FM Control Logic
[illegible]
The XFER State
MULD Control
MULL Control
Exponent Processor Block Diagram
Sign Processor Block Diagram
Control Store and Logic Block Diagram
Next Address Logic
FPA Control Word Fields
FPA Microcode Structure

TABLES

Related Hardware Manuals
FPA Instruction Set
FPA Modules
Binary-Hex Equivalents
Floating Literals
Zero Operand Microcode
Exception Conditions
FALU Operation
Special FAD Operation
The Division Load
The Status Register
CS Lines
The Maintenance Register
Signals Monitored by Visibility Bus
BSC Control Store Field
Fraction Data Entry
FALU Operation
FALU MUX Control
Round Byte and Normalize Control
Divide Sequence States
Operand Bus Source
FM Control States
EAC Control Store Field
EALU Input Control
EALU Control Store Field
SGNC Control Store Field
Sign Processor Operation
Next Address Lines
BEN Control Store Field
FPA Control Word Field Definitions
Interface Microcode

CHAPTER 1 INTRODUCTION

1.1 GENERAL DESCRIPTION

The FPA is a microprogrammed device operating as a synchronous extension of the CPU data path. Both the FPA and CPU operate using a 200 ns microcycle; FPA T0 coincides with CPU T0.

As an extension of the CPU, the FPA does not access memory data. The CPU must do memory address calculations, access the calculated address, and transmit the accessed data to the FPA. The CPU is also responsible for fetching and storing the FPA results. The FPA performs only the required floating-point or integer operation on the properly formatted operands transmitted to it.

The FPA can do floating-point addition, subtraction, multiplication, and division instructions. It receives a packed, normalized floating-point number containing a sign bit, fraction bits, and exponent bits. The FPA breaks the number into parts, and the FPA data manipulation sections perform the operations required to carry out the instructions on each part. Once the result is completed, the FPA normalizes and packs it for return to the CPU. Refer to Figure 1-1, a simplified diagram of the FPA.

[Figure 1-1 The FPA (TK-0522): the CPU connects to the FPA fraction processors and the exponent and sign processors through the DFMX bus, the CS bus, and control lines]

1.1.1 Accelerator Interface

The FPA is an optional hardware extension of the VAX CPU data path.
It is the first of a series of optional accelerators that can be plugged into slots 24 through 28 of the CPU backplane. To facilitate design of these optional accelerators, a set of standard interface signals and buses is used to transfer data and control information.

Two copies of the CPU general register set are kept in the FPA. These are read-only memory to the FPA and provide rapid access to register operands when used in instructions. Every time the CPU general register set is updated, a copy of the update data is transmitted via the DFMX bus to the FPA copies [illegible].

Other data (memory and literal) is transmitted to the accelerator via the ID bus. Memory data is transferred into [illegible] register and then onto the ID bus. Literal data is transferred from the [illegible] to the ID bus.

All op codes are received from the instruction buffer. The FPA uses dedicated hardware to handle [illegible]. The op code is decoded and, if part of the FPA implemented set, processing is [illegible].

FPA results are returned to the CPU via the DFMX bus. Any transfer of data (either operands or results) between the CPU and FPA is controlled by the CPSYNC and FPSYNC signals. CPSYNC is transmitted via the CS bus. When an operand is transferred to the FPA, CPSYNC asserted (by the CPU) indicates that data is available on the ID bus, and FPSYNC is asserted (by the FPA) to indicate that the data has been received. When the FPA is returning a result, FPSYNC indicates result available and CPSYNC indicates result received. When a result is transferred, the FPA also transmits the proper condition codes to the CPU.

Traps and errors are handled with three signals: ACC ERROR (from FPA to CPU), FP TRAP (CPU to FPA), and ACC TRAP (CPU to FPA).
ACC ERROR (also called ERRSYNC) is asserted when the FPA detects an internal error and is input to the CPU BEN mux. FP TRAP is used by the CPU to initiate microdiagnostics stored in the FPA. ACC TRAP selects either the power-up trap or the abort trap (both stored in the FPA microcode).

1.2 FPA INSTRUCTION SET

The FPA handles only a limited number of instructions (refer to Table 1-2). No floating-point instructions are available in VAX's PDP-11 compatibility mode. As shown in the table, the FPA handles single and double precision instructions in both 2-operand and 3-operand formats. The FPA handles the single and double precision instruction variations internally.

However, as stated before, the FPA does no memory accessing. This means the CPU must do all address calculations and accessing for any input operands stored in memory. Also, the FPA does not store any final results; it merely makes the results available to the DFMX bus. The CPU must enable the result onto the DFMX bus, determine the result destination, and put the result into the destination. In a 3-operand instruction, the FPA begins computing as soon as it has the 2 source operands, while the CPU is computing the third, or destination, address.

Table 1-2 FPA Instruction Set

    Mnemonic   Description
    ADDF*      Add single-precision floating-point
    ADDD*      Add double-precision floating-point
    SUBF*      Subtract single-precision floating-point
    SUBD*      Subtract double-precision floating-point
    MULF*      Multiply single-precision floating-point
    MULD*      Multiply double-precision floating-point
    DIVF*      Divide single-precision floating-point
    DIVD*      Divide double-precision floating-point
    POLYF      Evaluate polynomial single-precision floating-point
    POLYD      Evaluate polynomial double-precision floating-point
    EMODF      Extended single-precision floating-point
    EMODD      Extended double-precision floating-point
    MULL*      Multiply integer longword

    *The FPA instruction set includes both the 2-operand and 3-operand format of these instructions.

1.3 PHYSICAL DESCRIPTION

The FPA consists of 5 hex-height, extended-length modules containing mostly Schottky TTL logic. They replace blank modules 7014103 in slots 24 through 28 of the KA780 backplane. These slots are designated as the accelerator option slots.

The FPA is powered by an H7100 installed in power supply position 1. When viewed from the rear, position 1 is the rightmost location in the VAX CPU cabinet. Position 1 is left empty if an accelerator is not installed. The H7100 is a 5 V, 100 A supply. Refer to Figure 1-2 for the location of backplane slots and power supply. Refer to Table 1-3 for module designations and locations.

[Figure 1-2 FPA Physical Location (TK-0524): cabinet view showing the FPA modules and power supply positions PS #1 through PS #5, with the FPA power supply marked]

Table 1-3 FPA Modules

    Module No.   Module Name   Slot   Module Function
    M8285        FNM           24     Normalization and fraction division
    M8286        FMH           25     Fraction multiplication (most significant bits)
    M8287        FML           26     Fraction multiplication (least significant bits)
    M8288        FAD           27     Fraction addition and subtraction
    M8289        FCT           28     Exponent manipulation and FPA control

1.4 FLOATING-POINT NUMBERS AND ARITHMETIC

1.4.1 Introduction

This section discusses some fundamentals of floating-point numbers and arithmetic. It provides useful background for more advanced topics in later sections. The reader already familiar with floating-point may skip this section.

1.4.2 Integers

All data within a computer system could be represented in integer form. The numbers that could be represented in a 32-bit machine range in magnitude from 00000000₁₆ to FFFFFFFF₁₆ (or from 0₁₀ to 4,294,967,295₁₀).

However, integer form imposes some limitations. Only whole numbers can be represented, i.e., no fraction or decimal parts; this imposes an accuracy limitation. Furthermore, numbers greater than 4,294,967,295 cannot be represented; this imposes a range limitation.

These limitations are imposed by the stationary position of the radix point (e.g., the decimal point in base 10 notation or the binary point in base 2 notation). An integer's radix point is usually omitted in integer representation because it always marks the integer's least significant place. That is, there are never any digits to the right of an integer's radix point. For this reason, an integer is sometimes called a fixed-point number.

Integer notation, however, can be modified to overcome the range and accuracy limitations imposed by the fixed radix point. This is done through the use of floating-point notation.

1.4.3 Floating-Point Numbers

Floating-point numbers, unlike integers, have no position restrictions imposed on their radix points. A popular type of floating-point representation is called scientific notation.
With scientific notation, a floating-point number is represented by some basic value multiplied by the radix raised to some power.

Example

    1,000,000 = 1. × 10⁶    (basic value 1., radix 10, exponent 6)

There are many ways to represent the same number in scientific notation, as shown in the following example.

    Right shifts              Left shifts
    512 = 512.  × 10⁰         512 = 512.    × 10⁰
        = 51.2  × 10¹             = 5120.   × 10⁻¹
        = 5.12  × 10²             = 51200.  × 10⁻²
        = .512  × 10³             = 512000. × 10⁻³

The convention chosen for representing floating-point numbers with scientific notation in the FPA requires the radix point to always be to the left of the most significant digit in the basic value (e.g., .512 × 10³ in the above example). This modified basic value is called a fraction.

Notice that for each right shift of the basic value, the exponent is incremented, and for each left shift the exponent is decremented. The value of the number remains constant if the exponent is adjusted for each shift of the basic value. More examples of scientific notation are as follows.

    Decimal         Decimal          Binary      Hex        Hex
    Notation        Scient. No.      Notation    Notation   Scient. No.
    64              .64 × 10²        1000000.    40₁₆       .4 × 16²
    33              .33 × 10²        100001.     21₁₆       .21 × 16²
    1/2 (.5)        .5 × 10⁰         0.1         .8₁₆       .8 × 16⁰
    3/32 (.09375)   .9375 × 10⁻¹     0.00011     .18₁₆      .18 × 16⁰

1.4.4 Decimal/Binary/Hexadecimal Conversion

There are standard routines to convert from decimal notation to hexadecimal (also called hex) and back. When converting either from decimal to hex or from hex to decimal, it is convenient to first convert to binary notation and then to the final notation.

Decimal to Hex Conversion: To convert a decimal number with both integer and fraction portions to a hex number, the integer and fraction are separated and converted individually. The integer is converted to binary by a repeated division technique, the fraction by a repeated multiplication technique.

To convert an integer to binary representation, the integer is divided by two.
The remainder of this division (either 1 or 0) becomes the LSB of the binary representation. The result of this division is again divided by two. The remainder of this division goes to the left of the LSB, becoming "next to LSB." The result is divided again. This process is continued until the result is zero. Refer to Example 1.

Example 1  Convert 197₁₀ to binary

    STEP 1   197 ÷ 2 = 98   R 1   (LSB)
    STEP 2    98 ÷ 2 = 49   R 0
    STEP 3    49 ÷ 2 = 24   R 1
    STEP 4    24 ÷ 2 = 12   R 0
    STEP 5    12 ÷ 2 =  6   R 0
    STEP 6     6 ÷ 2 =  3   R 0
    STEP 7     3 ÷ 2 =  1   R 1
    STEP 8     1 ÷ 2 =  0   R 1   (MSB)

    197₁₀ = 1100 0101₂

A repeated multiply-by-2 converts a decimal fraction to a binary fraction. The decimal fraction is multiplied by two. If the result is 1.0 or more, a 1 is placed in the MSB of the fraction (directly to the right of the binary point); if less than 1.0, a zero is placed there. Only the fraction portion of this result is again multiplied by two; if the result is 1.0 or more, a 1 goes to the right of the MSB; if less than 1.0, a zero. This continues until the fraction portion of the result is all zeros (refer to Example 2) or until enough binary fraction bits have been generated to represent the decimal accurately enough (refer to Example 3). Note that finite length decimal fractions can become repeating fractions in binary (Example 3).

Example 2  Convert 3/8 (.375) to binary

    STEP 1   .375 × 2 = 0.750   bit 0   (MSB)
    STEP 2   .75  × 2 = 1.50    bit 1
    STEP 3   .50  × 2 = 1.00    bit 1   STOP

    .375₁₀ = .011₂

Example 3  Convert .603₁₀ to binary

    STEP 1   .603 × 2 = 1.206   bit 1   (MSB)
    STEP 2   .206 × 2 = 0.412   bit 0
    STEP 3   .412 × 2 = 0.824   bit 0
    STEP 4   .824 × 2 = 1.648   bit 1
    STEP 5   .648 × 2 = 1.296   bit 1
    STEP 6   .296 × 2 = 0.592   bit 0
    STEP 7   .592 × 2 = 1.184   bit 1   DECIDE TO STOP

    .603₁₀ = .1001 101₂

The conversion from binary to hex is very simple. Starting at the binary point, break the binary number into groups of 4 digits each. (Zero fill at both right and left ends to complete groups of 4.) Then replace each group of 4 with its hex equivalent. Refer to Table 1-4 and Example 4.
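
The three conversion procedures above can be sketched in a few lines of Python. This is only an illustration of the repeated-division, repeated-multiplication, and group-by-four steps; the function names are hypothetical and are not part of any DEC software. Decimal fractions are passed as strings so that `fractions.Fraction` can hold them exactly.

```python
from fractions import Fraction


def int_to_binary(n: int) -> str:
    """Repeated division by two: each remainder is the next bit, LSB first."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))


def frac_to_binary(decimal_fraction: str, max_bits: int = 16) -> str:
    """Repeated multiplication by two: each integer part is the next bit, MSB first."""
    value = Fraction(decimal_fraction)  # exact value of a string such as "0.375"
    bits = []
    while value != 0 and len(bits) < max_bits:
        value *= 2
        if value >= 1:
            bits.append("1")
            value -= 1
        else:
            bits.append("0")
    return "".join(bits)


def binary_to_hex(int_bits: str, frac_bits: str = "") -> str:
    """Group bits in fours outward from the binary point, zero-filling both ends."""
    int_bits = int_bits.zfill(-(-len(int_bits) // 4) * 4)
    frac_bits = frac_bits.ljust(-(-len(frac_bits) // 4) * 4, "0")

    def groups(bits: str) -> str:
        return "".join(format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4))

    whole = groups(int_bits) or "0"
    return f"{whole}.{groups(frac_bits)}" if frac_bits else whole


print(int_to_binary(197))                   # 11000101
print(frac_to_binary("0.375"))              # 011
print(frac_to_binary("0.603", max_bits=7))  # 1001101
```

The `max_bits` limit plays the role of "decide to stop" in Example 3, since a finite decimal fraction may never terminate in binary.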

Table 1-4 Binary-Hex Equivalents

    Binary   Hex        Binary   Hex
    0000     0          1000     8
    0001     1          1001     9
    0010     2          1010     A
    0011     3          1011     B
    0100     4          1100     C
    0101     5          1101     D
    0110     6          1110     E
    0111     7          1111     F

Example 4  Convert 110010110.101101₂ to hex

    1. Break into groups of four and zero-fill left and right ends.

           0001 1001 0110 . 1011 0100
           (three zeros added at the left end, two zeros added at the right end)

    2. Replace four-digit groups with hex equivalents. Refer to Table 1-4.

           0001 1001 0110 . 1011 0100
             1    9    6  .   B    4

           1 1001 0110.1011 01₂ = 196.B4₁₆

To convert from hex back to decimal, first replace each hex digit with its 4-bit binary equivalent (refer to Table 1-4). Each position in a binary number has a positional value based on which side of the binary point it is on and its distance from the binary point. The positional values are based on powers of two. The bit in the unit column has a positional value of one. The positional value doubles each time you move from right to left, and halves as you move from left to right. Refer to Figure 1-3 for a summary of binary positional values in both powers of two and decimal value.

    Left of the binary point:    2⁷ = 128   2⁶ = 64   2⁵ = 32   2⁴ = 16   2³ = 8   2² = 4   2¹ = 2   2⁰ = 1
    Right of the binary point:   2⁻¹ = 1/2 (.5)   2⁻² = 1/4 (.25)   2⁻³ = 1/8 (.125)   2⁻⁴ = 1/16 (.0625)   2⁻⁵ = 1/32 (.03125)   2⁻⁶ = 1/64 (.015625)

    Figure 1-3 Positional Value of Binary Number (TK-0657)

To convert from binary notation to decimal notation, add the decimal positional value of each bit that is a one. This sum will be the decimal equivalent of the binary number.

1.4.5 Normalization

As discussed previously, there are many ways to represent a particular floating-point number using scientific notation, and the convention chosen for representing floating-point numbers in VAX and the FPA requires the radix point to be to the left of the most significant bit in the basic value. Refer to Example 5.

Example 5  Floating-Point Form

    29₁₀ = 11101₂

    Right shifts             Left shifts
    11101.    × 2⁰           11101.         × 2⁰
    1110.1    × 2¹           111010.        × 2⁻¹
    111.01    × 2²           1110100.       × 2⁻²
    11.101    × 2³           11101000.      × 2⁻³
    1.1101    × 2⁴           111010000.     × 2⁻⁴
    .11101    × 2⁵           1110100000.    × 2⁻⁵
    .011101   × 2⁶           11101000000.   × 2⁻⁶
    .0011101  × 2⁷           111010000000.  × 2⁻⁷

    The form .11101 × 2⁵ has the radix point directly to the left of the most significant bit (fraction .11101, exponent 5).

The process of ensuring that the first significant bit is directly to the right of the binary point is called normalization. If the number is one or larger, normalization involves right-shifting the basic value and incrementing the exponent until the MSB (a one) is directly to the right of the binary point. If the number is a fraction with leading zeros, the basic value is left-shifted and the exponent is decremented. Examples 6 and 7 show conversion of numbers to VAX normalized form.

Example 6  Convert 75₁₀ to a normalized binary number

    1. Integer conversion
           75₁₀ = 100 1011₂
    2. Floating-point form
           100 1011₂ = 100 1011₂ × 2⁰
    3. Normalized form
           Right shift fraction 7 times
           Increment exponent by 7
           100 1011₂ × 2⁰ = .100 1011 × 2⁷
           Fraction = .100 1011
           Exponent = 7

Example 7  Convert 3/16 (.1875) to a normalized binary number

    1. Fraction conversion
           .1875₁₀ = .0011₂
    2. Floating-point form
           .0011₂ = .0011₂ × 2⁰
    3. Normalized form
           Left shift fraction 2 times
           Decrement exponent by 2
           .0011₂ × 2⁰ = .11 × 2⁻²
           Fraction = .11
           Exponent = -2

1.4.6 VAX Floating-Point Notation

Two conventions are used in the FPA to conserve memory space without losing accuracy and to aid in hardware manipulation.

The first convention is called the hidden bit. All numbers transferred between the CPU and FPA are normalized floating-point numbers. This means the first significant bit (always a 1) is always directly to the right of the binary point. To conserve memory space and data lines, the first significant bit is not stored or transmitted to the FPA. For example, the fraction part of the normalized binary number .11000... × 2⁻² will be stored and transmitted to the FPA as 100.... The normalized fraction of 1/2 (.100... × 2⁰) will be stored and transmitted as 000.... In both cases the first 1 (the hidden bit) will be added by hardware in the FPA. When the FPA transfers a normalized answer back to the CPU, the hidden bit is not sent.
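
Normalization and the hidden-bit convention can be sketched as follows. This is an illustrative model that shifts a positive value until the fraction lies in the range .1₂ (0.5) up to but not including 1.0; the function names are hypothetical, and the FPA itself does this with dedicated shift hardware rather than a loop.

```python
def normalize(value: float) -> tuple[float, int]:
    """Return (fraction, exponent) with 0.5 <= fraction < 1 and value == fraction * 2**exponent."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    fraction, exponent = value, 0
    while fraction >= 1.0:   # right-shift the basic value, increment the exponent
        fraction /= 2
        exponent += 1
    while fraction < 0.5:    # left-shift the basic value, decrement the exponent
        fraction *= 2
        exponent -= 1
    return fraction, exponent


def fraction_bits(fraction: float, width: int) -> str:
    """The first `width` bits to the right of the binary point."""
    return format(int(fraction * (1 << width)), f"0{width}b")


def drop_hidden_bit(bits: str) -> str:
    """Bits as stored or transmitted: the leading 1 of a normalized fraction is omitted."""
    if not bits.startswith("1"):
        raise ValueError("fraction is not normalized")
    return bits[1:]


print(normalize(75))        # (0.5859375, 7), i.e. .1001011 x 2^7
print(normalize(3 / 16))    # (0.75, -2), i.e. .11 x 2^-2
print(drop_hidden_bit(fraction_bits(0.75, 5)))  # 1000
```

The last line reproduces the hidden-bit example in the text: the fraction .11000 is stored as 1000 once its leading 1 is dropped.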

The 8-bit exponent portion of a floating-point number is stored using excess 80₁₆ notation. This notation simplifies the hardware that manipulates the exponent during floating-point arithmetic operation. Excess 80₁₆ exponent notation is obtained by adding 10000000₂ (200₈, 80₁₆, or 128₁₀) to 2's complement notation. Refer to Paragraph 1.5 for a further discussion of excess 80 notation.

1.4.7 Floating-Point Addition and Subtraction

In order to perform floating-point addition or subtraction, the exponents of the two floating-point numbers involved must be aligned or equal. If they are not aligned, the fraction with the smaller exponent is shifted right until they are. Each shift to the right is accompanied by an increment of the associated exponent. When the exponents are aligned, the fractions can then be added or subtracted. The exponent value indicates the number of places the binary point is to be moved to obtain the integer representation of the number.

In Example 8, the number 7₁₀ is added to the number 40₁₀ using floating-point representation. Note that the exponents are first aligned and then the fractions are added; the exponent value dictates the final location of the binary point.

Example 8  Floating-Point Addition

    1.    0.1010 0000 0000 000 × 2⁶ = 28₁₆ = 40₁₀
        + 0.1110 0000 0000 000 × 2³ =  7₁₆ =  7₁₀

       To align exponents, shift the fraction with the smaller exponent three places to the right and increment the exponent by 3, and then add the two fractions.

          0.1010 0000 0000 000 × 2⁶ = 28₁₆ = 40₁₀
        + 0.0001 1100 0000 000 × 2⁶ =  7₁₆ =  7₁₀
          0.1011 1100 0000 000 × 2⁶ = 2F₁₆ = 47₁₀

    2. To find the integer value of the answer, move the binary point six places to the right.

          010 1111.0000 0000 0

1.4.8 Floating-Point Multiplication and Division

In floating-point multiplication, the fractions are multiplied and the exponents are added. For floating-point division, the fractions are divided and the exponents are subtracted.
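
The alignment step for addition and the fraction and exponent rules for multiplication and division can be sketched on (fraction, exponent) pairs. This is a simplified model for positive operands with unbiased exponents; the function names are hypothetical, exact rational arithmetic stands in for the fixed-width fraction hardware, and no rounding, guard bits, or overflow handling is modeled.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def normalize(frac: Fraction, exp: int) -> tuple[Fraction, int]:
    """Shift until 1/2 <= frac < 1, adjusting the exponent for each shift."""
    if frac == 0:
        return frac, 0
    while frac >= 1:
        frac, exp = frac / 2, exp + 1
    while frac < HALF:
        frac, exp = frac * 2, exp - 1
    return frac, exp


def fp_add(a, b):
    """Align the operand with the smaller exponent, then add the fractions."""
    (fa, ea), (fb, eb) = a, b
    if ea < eb:
        (fa, ea), (fb, eb) = (fb, eb), (fa, ea)
    fb = fb / 2 ** (ea - eb)  # right-shift by the exponent difference
    return normalize(fa + fb, ea)


def fp_mul(a, b):
    """Multiply the fractions and add the exponents."""
    return normalize(a[0] * b[0], a[1] + b[1])


def fp_div(a, b):
    """Divide the fractions and subtract the exponents."""
    return normalize(a[0] / b[0], a[1] - b[1])


forty = (Fraction(5, 8), 6)   # .101 x 2^6
seven = (Fraction(7, 8), 3)   # .111 x 2^3
print(fp_add(forty, seven))   # (Fraction(47, 64), 6)
print(fp_mul(seven, forty))   # (Fraction(35, 64), 9)
```

The sum 47/64 × 2⁶ is 47 and the product 35/64 × 2⁹ is 280; each result is the fraction multiplied by two raised to the exponent.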

There is no requirement to align the binary point in floating-point multiplication or division. Example 9 shows floating-point multiplication. Example 10 shows division.

Example 9:

    1.    0.1110000 × 2³ =  7₁₆ =  7₁₀
        × 0.1010000 × 2⁶ = 28₁₆ = 40₁₀

       Fraction multiplication (significant bits only):

              .111
            × .101
               111
              000
             111
           .100011

       Product: .1000110000 × 2⁹ (Result already in normalized form.)

    2. Move the binary point nine places to the right.

          100011000.00000 = 118₁₆ = 280₁₀

Example 10:

    1.  .1111000 × 2⁴ divided by .1010000 × 2³

                     1.100000
           1010000 ) 1111000.000000
                     1010000
                      1010000
                      1010000
                            0

    2. Exponent: 4 - 3 = 1

    3. Result: 1.100000 × 2¹
       Normalized result: .1100000 × 2² (normalized fraction .1100000, normalized exponent 2)
       Move binary point two places to the right.

          11.00000 = 3₁₆ = 3₁₀

1.5 EXCESS 80 NOTATION

The VAX and, consequently, the FPA use excess 80 notation to store and handle the exponent portion of floating-point numbers. Excess 80 notation is the 2's complement exponent plus 128₁₀ (80₁₆).

It is convenient to handle the exponent portion of the floating-point number in 2's complement notation. This allows a wide range of both positive and negative exponents to be represented. However, in 2's complement notation an overflow must occur to go from the least negative number to zero. To avoid this, the bias of 128₁₀ is added to the 2's complement number.

Historically, minicomputers have been discussed and explained using octal notation. In octal, the bias of 128₁₀ is 200₈. In previous manuals this exponent notation has been discussed using octal form. As a result, it is called excess 200₈ or excess 200. However, the VAX is discussed using hexadecimal notation. Unfortunately, when discussing the excess 80 bias in VAX documentation, it has been called 80₁₆, 128₁₀, 200₈, and 10000000₂ (sometimes the base is indicated, sometimes it isn't). When studying the FPA print sets, technical manuals, and microcode listings, be aware of this variation in terminology. In this manual hex notation is used and the exponent bias is called excess 80.
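
A minimal sketch of excess 80 encoding follows, together with the bias correction that this section goes on to describe for multiply and divide. The function names and the range check are assumptions made for illustration; reserved exponent codes are not modeled.

```python
BIAS = 0x80  # 128 decimal, 200 octal


def to_excess_80(exponent: int) -> int:
    """Add the bias to a signed (2's complement) exponent to get the stored 8-bit value."""
    if not -128 <= exponent <= 127:
        raise ValueError("exponent does not fit in 8 bits")
    return exponent + BIAS


def from_excess_80(stored: int) -> int:
    """Remove the bias from a stored 8-bit exponent."""
    return stored - BIAS


def multiply_exponents(a: int, b: int) -> int:
    """Adding two biased exponents doubles the bias, so one bias is subtracted."""
    return a + b - BIAS


def divide_exponents(a: int, b: int) -> int:
    """Subtracting two biased exponents cancels the bias, so one bias is added back."""
    return a - b + BIAS


print(hex(to_excess_80(2)))                  # 0x82
print(hex(to_excess_80(-2)))                 # 0x7e
print(hex(multiply_exponents(0x82, 0x82)))   # 0x84
print(hex(divide_exponents(0x85, 0x83)))     # 0x82
```

Because the stored values run from 00 to FF in the same order as the exponents they represent, two biased exponents can be compared as plain unsigned numbers.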

When multiply and divide operations are performed using floating-point numbers with excess 80 exponent notation, the resulting exponent must be adjusted by the bias to return the result to excess 80 notation.

When a multiplication is performed, exponents are added, and 80₁₆ must be subtracted from the result to return it to excess 80 notation. To understand why 80 must be subtracted from the exponent calculation during multiplication, consider the following.

      Exponent A + 80        (excess 80 notation)
    + Exponent B + 80
    = Exponent A + Exponent B + 100

Both exponent A and exponent B are biased by 80, yielding a bias of 100. However, only a bias of 80 is desired in excess 80 notation.

Multiplication Example  2 × 3 = 6

    Value   Fraction   Exponent
    2       0.100      82
    3       0.110      82

    Fraction calculation:   0.100 × 0.110 = 0.011000
    Exponent calculation:   82 + 82 = 104;  104 - 80 = 84

    6 = fraction 0.011000, exponent 84

    Normalize the fraction by left-shifting one place and decreasing the exponent by 1.

    6 = fraction 0.11000, exponent 83

When a division is performed, exponents are subtracted, and 80₁₆ must be added to the result to return it to excess 80 notation. To understand why 80 must be added to the exponent calculation during division, consider the following:

      Exponent A + 80
    - (Exponent B + 80)
    = Exponent A - Exponent B + 80 - 80
    = Exponent A - Exponent B + 0

However, since the result is to be in excess 80 notation, 80₁₆ must be added to the exponent, yielding Exponent A - Exponent B + 80.

Division Example  16/4 = 4

    Value   Fraction   Exponent
    16      .10000     85
    4       .10000     83

    Fraction calculation:   0.10000 ÷ 0.10000 = 1.000
    Exponent calculation:   85 - 83 = 02;  02 + 80 = 82

    Normalize the fraction by right-shifting one place and incrementing the exponent.

    4 = fraction .10000, exponent 83

CHAPTER 2 FUNCTIONAL DESCRIPTION

This chapter explains the operation of the FPA. The chapter can be divided into four areas: introduction, algorithms, hardware operation, and microcode.
The introduction (Paragraph 2.1) discusses the various types of data formats that may be handled by the FPA. The algorithms section (Paragraph 2.2) lists the instructions the FPA can execute and explains the FPA operations required to perform each one. This section discusses the FPA operation based on instruction flow. Hardware operation (Paragraph 2.3) breaks the FPA into hardware blocks and discusses the operation of each. Both the algorithm section and the hardware operation section should be read to get a thorough understanding of the FPA operation. They discuss the same equipment from different viewpoints. Microcode (Paragraphs 2.4 through 2.6) summarizes both the FPA microcode and the FPA-specific microcode in the CPU. This discussion focuses on the generation and monitoring of the various control signals passed between the units.

2.1 DATA FORMATS

The FPA handles single (float) and double precision floating-point data and signed integer longwords. It receives normalized, packed data from the CPU and returns normalized, packed results to the CPU over 32-bit wide buses. Within the FPA, intermediate data is transmitted over two 34-bit wide buses. The data formats used by the FPA are compatible with these bus structures as well as the input and output formats of the various data manipulation units within the FPA.

2.1.1 Floating-Point Numbers

Floating-point numbers consist of a sign bit, exponent bits, and fraction bits. A single precision floating-point number is stored in CPU memory as 4 contiguous bytes starting on an arbitrary byte boundary. Bits are labeled from the right, 0 through 31. The number is specified by its address A, the address of the byte containing bit 0 (Figure 2-1). The range of a single precision floating-point number is approximately .29 × 10⁻³⁸ through 1.7 × 10³⁸. The precision is typically 7 decimal digits. A double precision floating-point number is stored as 8 contiguous bytes.
Bit labeling and addressing are similar to those of a single precision floating-point number. A double precision number has a range similar to that of a single precision number, but its precision is about 16 decimal digits (Figure 2-1).

[Figure 2-1 Floating-Point Format (Sheet 1 of 2), a. Single Precision (TK-0528). The figure shows a normalized floating-point number (sign, fraction, exponent) and its computer representation in excess 200 notation, followed by the number at each stage: as stored in VAX memory and received by the FPA (sign in bit 15, exponent in bits 14:7, high-order fraction in bits 6:0, low-order fraction in bits 31:16); as transferred on FP bus A and FP bus B (unnormalized, intermediate results, with the hidden bit in bit 32 and the overflow bit in bit 33); as used in the FPA (unpacked, unnormalized results); ready for return to the CPU (packed, normalized); and as returned to the CPU. NOTE 1: A normalized number has a 0 (zero) overflow bit and a 1 hidden bit.]
[Figure 2-1 Floating-Point Format (Sheet 2 of 2), b. Double Precision (TK-0527). The figure shows the same stages for a double precision number: as stored in VAX memory, transferred to the FPA, and received by the FPA (in two transfers: bits 0-31 in the first transfer, bits 32-63 in the second); as transferred on the FP buses (unnormalized, intermediate results; the complete number, 66 bits, is transferred simultaneously); as used in the FPA (unpacked, unnormalized results); ready for return to the CPU (packed, normalized); and as returned to the CPU (1st transfer: 32 bits, exponent and most significant fraction bits; 2nd transfer: 32 bits, least significant fraction bits). NOTE 1: A normalized number has a 0 (zero) overflow bit and a 1 hidden bit.]

Floating-point numbers are transmitted to the FPA as packed, normalized numbers without a hidden or overflow bit. A single precision (float) number will have 24 fraction bits and a double precision number will have 56 fraction bits. Hardware in the FPA inserts and handles both the hidden and overflow bits. The number is split apart and used in various data manipulation units in the FPA.

Although all operations begin with normalized operands, the intermediate results produced by the FPA data manipulation units can vary widely. Subtraction of nearly equal numbers can produce a number very close to zero. Addition and division can produce numbers close to 2. As a result, intermediate results are transferred between data manipulation units as unnormalized numbers with both hidden and overflow bits.

After the result is normalized, it is ready to return to the CPU. The result is transmitted as a packed, binary normalized number without hidden or overflow bits.

POLY uses specialized floating-point notation for intermediate results. In POLY, 7 additional bits are used for fraction addition. POLY execution consists of multiply, add, multiply, etc.
To maintain maximum accuracy while functioning within the limitations of the FPA hardware, 7 additional LSBs are transferred from the fraction multiply (FMH + FML) hardware to the fraction add hardware (FAD). The 7 additional bits come from LSH <11:5> along FP bus A <14:08> into AR <06:00> (also called ARX). The FPA performs the add on the extended precision number, then transfers the addition result to the normalizer logic (FNM) where it is rounded, normalized, and held for the next part of the POLY instruction. The EMOD instruction causes a 32 X 24 (64 X 56 for double) bit fraction multiplication to be performed in the FMH and FML. The extra 8 bits in the multiplicand are transferred over the ID bus to FP bus B line <07:00> to MCINT (also called MCX). MCINT <07:00> drives MCAND bus <07:00> for the fraction multiply. MPLIER is handled in the usual fashion. The result of the extended precision multiply is transferred to the CPU in one 32-bit transfer (F) or two 32-bit transfers (D). 2.1.2 Integer Numbers The FPA handles a single integer format instruction, MULL (multiply longword). A longword is stored in CPU memory as 4 contiguous bytes starting on an arbitrary byte boundary. The FPA receives two 32-bit signed integers and multiplies them as unsigned integers to form a 64-bit product. The product, a 64-bit number, is returned to the CPU in two 32-bit transfers (low half first) for further processing. Refer to Figure 2-2 for summary of integer format. 2.1.3 Literals The FPA handles float and double precision literal data. It receives the data from the CPU IB. Float literal data is transferred from the IB to the FPA’s Literal Register (LR) using the ID bus. The FPA then loads the LR data into FPA internal registers and begins processing. The first half of double precision literal data is handled similarly. The second half comes from the CPU D-register via the ID bus and is loaded directly from the ID bus into the FPA internal registers. 
[Figure 2-2 Integer Format (TK-0523). The figure shows the integer (MULL) format at each stage: as stored in VAX memory, transferred to the FPA, and received by the FPA (2's complement signed number, sign in bit 31); as transferred on FPA buses (unsigned positive number; bits 32 and 33 of the FP bus are not used); and the result stored in the FPA and returned to the CPU (via FP bus A to the DFMX bus) in a 1st and a 2nd transfer.]

The FPA handles short literals. Short literals contain only six data bits and are part of the instruction. The CPU formats the six data bits within the 32-bit data longword based on instruction type (floating-point or integer instruction). If it is an integer instruction (the FPA handles only MULL), the six data bits are zero extended (26 zeros are added). Any integer between 0 and 63₁₀ can be written using a short literal. If it is a floating-point instruction, the short literal is assumed to contain three exponent bits and three fraction bits. The IB packs the data into standard FP format. This includes excess 80 notation for the exponent, a positive sign bit, and a normalized fraction with a one hidden bit that is not stored. Refer to Figure 2-3 for FPA short literal format, and Table 2-1 for data that can be transferred using floating-point short literal form. Notice that only positive numbers can be transferred. If a double precision short literal is specified, the FPA accepts the first half and manufactures zeros to fill the second half.

[Figure 2-3, part A: short literal data as stored in the instruction stream, with the exponent in bits 5:3 and the fraction in bits 2:0.]
[Figure 2-3, part B: short literal data as formatted by the IB and transferred to the FPA for a floating-point operation, with zeros in bits 13:10, the six data bits in bits 9:4, and zeros in bits 3:0 (TK-0519).]

Figure 2-3 Short Literal Format

Table 2-1 Floating Literals

                                  Exponent
    Fraction     0       1       2       3       4     5     6     7
       0        1/2      1       2       4       8    16    32    64
       1        9/16   1-1/8   2-1/4   4-1/2     9    18    36    72
       2        5/8    1-1/4   2-1/2     5      10    20    40    80
       3       11/16   1-3/8   2-3/4   5-1/2    11    22    44    88
       4        3/4    1-1/2     3       6      12    24    48    96
       5       13/16   1-5/8   3-1/4   6-1/2    13    26    52   104
       6        7/8    1-3/4   3-1/2     7      14    28    56   112
       7       15/16   1-7/8   3-3/4   7-1/2    15    30    60   120

The FPA also handles long literals (32 or 64 data bits). Thirty-two bits, either a complete single precision transfer or the first half of a double precision transfer, are transferred from the IB to the FPA LR. The second half of the double precision number is taken directly from the ID bus. Float and double precision floating-point data can be transferred using long literal format. The FPA also receives 32-bit integer data using the long literal format. (The FPA does not handle any 64-bit integer operands.)

2.1.4 Zero and Reserved Operand Codes

The FPA checks all data received for zeros and reserved operands during the fraction processing. Both zero and reserved operand function as codes transmitting special information. As discussed in Paragraph 1.4, the FPA assumes all floating-point numbers to be normalized numbers (between 1/2 and 1) with a hidden bit that is not stored. The hidden bit is normally inserted by data manipulation hardware. A zero cannot be represented as a normalized number, and the hardware that inserts the hidden bit only increases the problem of representing and using zero. As a result, zero is represented by a code with zeros in the exponent bits (no excess 200 notation) and a clear sign bit. The fraction bits do not matter. Whenever this combination of bits is sensed, the FPA accesses special microcode that simulates the special properties of addition, subtraction, multiplication, and division with zero.
Refer to Table 2-2 for the result of an operation with zero, and Figure 2-4 for the zero code.

Table 2-2 Zero Operand Microcode Operation

    Operation    Operand(s)                              Result
    Add          0+X, X+0                                X operand returned
                 0+0                                     Zero returned*
    Subtract     0-X                                     -X returned
                 X-0                                     X operand returned
                 0-0                                     Zero returned
    Multiply     0×0, X×0, 0×X                           Zero returned*
    Divide       0÷X (dividend is zero)                  Zero returned*
                 X÷0 (divisor is zero; divide by zero)   Error condition†

    * Zero code is returned, 0 in sign and exponent.
    † FPA informs CPU that division by zero was attempted by asserting FPA error and PSL V bit and not asserting FP SYNC.

[Figure 2-4 Zero and Reserved Operand Code (TK-0517). Zero code: sign (bit 15) is 0, exponent (bits 14:7) is zero, fraction bits (31:16 and 6:0) are don't care. Reserved operand code: sign (bit 15) is 1, exponent (bits 14:7) is zero, fraction bits are don't care.]

The code for reserved operand is zeros (cleared) in the exponent bits and a one (set) in the sign bit. One in the sign bit normally indicates a minus number, so this is sometimes called minus zero. A reserved operand indicates invalid data. It indicates that data was accessed from a location that had not had data loaded into it, or that a previous exception occurred. Refer to Figure 2-4 for reserved operand code.

2.1.5 Hidden, Overflow and Guard Bits

The FPA uses extra fraction data bits during fraction manipulation to completely represent the fraction data, to handle result overflow, and to ensure accuracy of the fraction result. Refer to Figure 2-5 for location of hidden, overflow, and guard bits.

[Figure 2-5 Hidden, Overflow, and Guard Bits (TK-0518). The figure shows the data from the CPU (fraction, exponent, and sign in bits 31:0), the overflow and hidden bits added by the FPA in bits 33 and 32, and the FP bus lines where guard bits are transferred.]

As discussed previously, the CPU stores floating-point numbers in a packed normalized form with the MSB of the fraction (called the hidden bit) not stored (since it is always a 1). The FPA receives the floating-point numbers in this form.
To facilitate fraction calculation, logic on FNM adds the hidden bit to all CPU fraction data as it is transported over the FP buses. The hidden bit is transmitted on FP bus <32>. This means that all fraction data received by FPA fraction manipulation units have correct hidden bits.

The FPA also transmits an overflow bit between fraction manipulation units using FP bus <33>. The overflow bit handles unnormalized intermediate fraction results. The combination (addition, subtraction, or division) of two normalized fractions can create a result greater than 1. The overflow bit enables the FPA to transmit this unnormalized result from the fraction computation units to the fraction normalizer logic (FNM).

To ensure accuracy of fractional results, the FPA data manipulation units add seven zeros, called guard bits, to the low order end of the fraction data they receive. This means a float fraction is 32 bits wide; a double, 64 bits wide. The POLY instruction loads extra data bits rather than zeros at the low order end of each coefficient fraction. The instruction also transfers additional low order data bits from the fraction multiply logic to the fraction add logic. These guard bits are dropped each time the POLY accumulation is normalized and rounded, but they do ensure that the final answer is accurate.

Without the guard bits, the right-shifting of a FP fraction to align radix points for addition and subtraction, or to normalize the result, would lose the least significant bits off the right end of the shifted fraction. In some cases this loss would cause the last bit of the normalized result to be wrong. The guard bits prevent this.

Guard bits are transmitted between FP data manipulation units using FP bus A <14:08>. These lines normally transmit exponent data. This arrangement allows the FPA to maximize accuracy without additional hardware overhead.
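The hidden, overflow, and guard bits can be illustrated with a small software sketch. This is not the FPA logic itself. It assumes the packed single precision layout labeled in Figure 2-1 (sign in bit 15, exponent in bits 14:7, high-order fraction in bits 6:0, low-order fraction in bits 31:16); the names unpack_f and value are hypothetical.

```python
BIAS = 0x80        # excess 80 exponent bias
GUARD_BITS = 7     # zeros appended at the low order end of the fraction

def unpack_f(word: int):
    """Split a packed single precision longword into (sign, exponent, working fraction).

    Assumed layout: sign = bit 15, exponent = bits 14:7,
    high-order fraction = bits 6:0, low-order fraction = bits 31:16.
    """
    sign = (word >> 15) & 1
    exponent = (word >> 7) & 0xFF
    if exponent == 0:
        raise ValueError("zero or reserved operand code: no hidden bit applies")
    stored = ((word & 0x7F) << 16) | ((word >> 16) & 0xFFFF)  # 23 stored fraction bits
    with_hidden = (1 << 23) | stored                          # 24 bits, MSB is the hidden bit
    working = with_hidden << GUARD_BITS                       # 31 bits; bit 31 (overflow) is 0
    return sign, exponent, working

def value(sign: int, exponent: int, working: int) -> float:
    """Numeric value, with the binary point just above the hidden bit."""
    fraction = working / 2**31
    return (-1) ** sign * fraction * 2.0 ** (exponent - BIAS)

sign, exponent, working = unpack_f(0x00004080)   # 0.5 x 2^1
print(value(sign, exponent, working))            # 1.0

# Alignment shift of a fraction whose least significant stored bit is set.
_, _, w = unpack_f(0x00014080)
with_guard = w >> 3                    # the low bit moves into the guard positions
without_guard = (w >> GUARD_BITS) >> 3 # the low bit falls off the end
print((with_guard << 3) == w)                        # True: nothing lost
print((without_guard << 3) == (w >> GUARD_BITS))     # False: LSB lost
```

With the seven guard zeros in place the working fraction is 32 bits wide including the overflow position, which agrees with the width stated above for a float fraction.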
2.1.6 Overflow, Underflow, Zero, and Reserved Operands

The FPA monitors all operands and results for exceptional conditions. When the FPA senses one or more of these conditions it informs the CPU via various bits and combinations of bits. Either one or both units begin special operations designed to minimize the effect of the condition. In some cases it stops the FPA's current operation and returns the FPA to the IRD state where all logic and registers are cleared in anticipation of a new FP instruction. The following paragraphs discuss these various unusual conditions. Table 2-3 summarizes the FPA and CPU operations caused by the unusual conditions.

Table 2-3 Exception Conditions

Exceptions encountered in the result (all operations handle the occurrence of zero, underflow, and overflow results similarly*):

    ZERO:       The zero code and FPSYNC are sent. PSL Z bit is set.
    UNDERFLOW:  Zero code, FPSYNC, and ERRSYNC are sent. PSL Z is set. If PSL U (underflow) is set, underflow causes a trap; otherwise operations continue.
    OVERFLOW:   Reserved code, FPSYNC, and ERRSYNC are sent. PSL V is set. CPU traps FPA to IRD.

Op codes ADD, SUBT, MULT, EMOD:

    Zero operand:      Microcode simulates arithmetic operation with zero (Table 2-2).
    Reserved operand:  FPSYNC (ACC0) clear, ERRSYNC (ACC1) set. CPU traps FPA to IRD.

Op code DIVIDE:

    Zero operand:      ZERO DIVIDEND: Microcode returns zero as result.
                       ZERO DIVISOR: Divide by zero ERROR. FPSYNC (ACC0) clear, ERRSYNC (ACC1) set, PSL V bit set.
    Reserved operand:  FPSYNC (ACC0) clear, ERRSYNC (ACC1) set, PSL V bit clear.
    CPU differentiates between ZERO DIVISOR and RESERVED OPERAND by examining the PSL V bit. In both cases, CPU traps FPA to IRD.

Op code POLY*:

    Zero operand:      POLY microcode simulates POLY operations with zero (Table 2-2 and Paragraph 2.2.6).
    Reserved operand:  FPSYNC (ACC0) set, ERRSYNC (ACC1) set. In STATUS REGISTER, minus ZERO ERROR bit set. CPU checks argument = RESERVED OPERAND. FPA checks coefficient = RESERVED OPERAND.

Op code MULL:

    No checking of MULL operands or results is performed by FPA software or hardware.
Any combination of bits can be interpreted as an acceptable integer.

* When POLY flows note a RESERVED OPERAND, UNDERFLOW, or OVERFLOW, both FPSYNC (ACC0) and ERRSYNC (ACC1) are set. CPU examines PSL and FPA STATUS REGISTER to determine the exception condition. RESERVED OPERAND sets the MINUS ZERO ERROR bit. OVERFLOW sets the PSL V bit. UNDERFLOW sets the PSL Z bit.

Overflow and Underflow

The FPA can handle a very large, but bounded, range of numbers. Numbers too large (overflow) or too small (underflow) cannot be accurately handled (Figure 2-6). Special hardware monitors the results of all FPA operations for overflow and underflow conditions.

The FPA checks for overflow and underflow by monitoring the exponent results. The monitoring is straightforward because of the excess 80 notation used. If the exponent with its excess 80 bias exceeds FF₁₆, an overflow has occurred. If the exponent is less than 0, an underflow has occurred.

[Figure 2-6 Overflow and Underflow Ranges (TK-0521). A number line showing the overflow ranges beyond approximately -1.7 × 10³⁸ (most negative number) and +1.7 × 10³⁸, the representable negative and positive ranges, and the underflow range around zero between approximately -.29 × 10⁻³⁸ (smallest negative number) and +.29 × 10⁻³⁸ (smallest positive number). Exact zero does not cause underflow.]

If an overflow condition is sensed, the overflowed number is useless. The FPA manufactures a reserved operand and informs the CPU that an overflow occurred. The CPU notes the overflow and stores the reserved operand. The FPA returns to IRD.

Underflow is not as serious a problem. It merely indicates that the number is so small and so close to zero that the FPA cannot accurately represent it. If an underflow occurs, the FPA sets the underflowed number to zero and informs the CPU that an underflow has occurred by asserting both FP SYNC and ERR SYN. It is important to inform the CPU that a zero has been returned because the CPU may at some later time attempt a division by the result (division by zero results in an error).
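The exponent checks above, combined with the bias adjustment of Paragraph 1.5, can be sketched in a few lines. The function names and the returned strings are illustrative assumptions, not FPA signal names; the comparisons follow the text literally (greater than FF hex is overflow, less than 0 is underflow).

```python
BIAS = 0x80  # excess 80 exponent bias

def multiply_exponents(exp_a: int, exp_b: int) -> int:
    """Add two excess 80 exponents and remove the doubled bias."""
    return exp_a + exp_b - BIAS

def divide_exponents(exp_a: int, exp_b: int) -> int:
    """Subtract two excess 80 exponents and restore the cancelled bias."""
    return exp_a - exp_b + BIAS

def classify_exponent(biased_exp: int) -> str:
    """Classify a result exponent before it is packed into 8 bits."""
    if biased_exp > 0xFF:
        return "overflow"    # a reserved operand would be returned
    if biased_exp < 0:
        return "underflow"   # a zero would be returned
    return "in range"

print(hex(multiply_exponents(0x82, 0x82)))                  # 0x84 (2 x 3 example)
print(hex(divide_exponents(0x85, 0x83)))                    # 0x82 (16 / 4 example)
print(classify_exponent(multiply_exponents(0xF0, 0xF0)))    # overflow
print(classify_exponent(multiply_exponents(0x10, 0x10)))    # underflow
```

The sketch keeps the exponent in an unbounded integer so that out-of-range results remain visible; hardware with a fixed-width exponent path has to detect the same conditions from carry and sign information instead.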
Zero

If a zero code is encountered in an operand transmitted to the FPA from the CPU, FPA microcode simulates the special properties of addition, subtraction, multiplication, and division with zero. Refer to Table 2-2 for the result of an operation with zero. If an exact zero is generated as a result of an FPA operation, the zero code is returned to the CPU and the condition code bits are set for a zero result. Zero can be generated in a normal arithmetic add or subtract operation (equal or equal-opposite operands) or in a microcode simulated arithmetic operation with a zero operand. An operation that generates an exact zero does not assert ERR SYN as an underflow operation does (although both return a zero code).

Reserved Operand

Refer to Table 2-3 for the condition codes returned to the CPU when a reserved operand is encountered by the FPA.

2.2 INSTRUCTIONS AND ALGORITHMS

This section concentrates on the microcontrol used to carry out each FPA instruction. Each instruction accesses different microcontrol addresses to correctly move and load operands, compute intermediate results, and ready the final result for return to the CPU. Special instructions check for and handle errors and exceptional conditions.

This section details the data flow between hardware required to carry out the selected instruction. It only summarizes the hardware actions started once the data has been loaded by the microcontrol. Paragraph 2.3 contains a complete and detailed description of the hardware in each FPA section. Paragraphs 2.2 and 2.3 complement each other and both should be read to thoroughly understand how the hardware implements each FPA instruction.

As stated before, this section concentrates on data flow. Figure 2-7, FPA block diagram, shows the data bus interconnections and the various registers in the FPA. Although this figure is not specifically referenced in the discussion, it will help in understanding the data flow and should be referred to frequently.
[Figure 2-7 FPA Block Diagram (TK-0538). The diagram shows the FPA modules: FCT (M8289, slot 28), FPA control and sign processor; FAD (M8288, slot 27), fraction adder; FMH/FML (M8286/M8287, slots 25/26), fraction multiplier; and FNM (M8285, slot 24), fraction normalizer. It also shows their registers and the interconnecting buses: FP bus A <33:00>, FP bus B <33:00>, ID bus <31:00>, DFMUX bus <31:00>, and CS bus <95:00>.]

During IRD (instruction decode) the FPA performs some operations that are prerequisites to many FPA instructions. The FPA assumes an R-R float instruction and begins FPA register loading. The FPA has two copies of the CPU general registers. During IRD, it receives specifier information from the IB and accesses the register addresses contained. The contents of the first specifier are placed on FPA bus A, the contents of the second on bus B. The data on bus A is loaded in AR1, LA, SA, MC1, and MP0; bus B loads BR1, LB, SB, MP1, and MCI.

AR1 and BR1 are fraction registers used for the addition and subtraction of floating-point numbers. LA and LB are loaded with the exponents of the numbers, and the hardware immediately begins an exponent difference calculation.
The exponent difference and/or which exponent is larger is needed for floating-point additions, subtractions, and multiplications. SA and SB are input registers for the sign-processing hardware.

Fraction data from specifier 1 (on bus A) is loaded into multiply registers MC1 (multiplicand) and MP0 (multiplier). Fraction data from specifier 2 (on bus B) is loaded into MP1 (multiplier) and MCI (multiplicand-integer). MC1 and MP1 hold operand data for MULF and EMODF instructions. The hardware multiply begins the MULF or EMODF fraction multiply operation during IRD using MC1 and MP1. MCI and MP0 contain the operands for a MULL instruction.

During IRD, numerous FPA instructions have been started. If the instruction is a float register-to-register, both operands are already loaded and ready in the FPA. Exponent manipulations needed for add, subtract, and multiply operations have started. MULF and EMODF fraction multiplication have started. If the instruction decoded is a MULL, the multiplier and multiplicand have already been loaded into the proper registers.

2.2.1 Add/Subtract

The FPA add/subtract operations can be broken into three states:

1. Load
2. Add/Subtract
3. Normalize.

2.2.1.1 Load - While the FPA is in IRD, it is setting up for a float R-R operation. This means that specifiers 1 and 2 from the instruction buffer are being placed on FP buses A and B, respectively. Bus A loads AR1 (fraction register), LA (exponent register), and SA (sign latch). Bus B loads BR1, LB, and SB. When the FPA decodes a floating-point instruction, it enters A-Fork and selects a microword address based on op code and specifier types. If the instruction is a float R-R A/S, the FPA enters the optimized add/subtract execution state immediately. If, however, it is not, the FPA, under control of the selected microword, receives and stores the required data during A-Fork and possibly B-Fork flows.
If it is double-precision, 32 additional fraction bits are loaded into both AR0 (extension of AR1) and BR0 (extension of BR1). If it is not an R-R operation, the new data from the correct source is loaded into AR1, LA, SA, BR1, LB, and SB.

As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or during some following microcontrol state in A-Fork or B-Fork, the exponent difference of the two operands is determined by comparing LA and LB in DALU and CALU. Based on the exponent difference, the fraction associated with the smaller exponent is loaded into SHMX and right-shifted by ASHR until the radix points align. This happens before entering the add/subtract state.

2.2.1.2 Add/Subtract - In this state, the fractional result is computed. Based on the op codes, signs of the operands, and exponent difference, FALU operation is selected. Normally, the FALU adds or subtracts the already aligned fractions for the fractional result. Refer to Table 2-4 for normal FALU operation, and Table 2-5 for special FAD operation criteria.

Table 2-4 FALU Operation

    Op Code    Operand Sign    FALU Operation
    ADD        Same            Add
    ADD        Diff            Subtract
    SUBT       Same            Subtract
    SUBT       Diff            Add

Table 2-5 Combination of Conditions Initializing Special FAD Operation

    FALU Subtract    Exponent Diff     Op Code    Precision
    Yes              Greater than 7    X          D
    Yes              Greater than 1    POLY       D
    Yes              Less than 2       POLY       X

    X = Don't care

The special FAD operation is used to ensure maximum accuracy in the result while operating within the FPA hardware constraints. The special FAD operation involves complementing the fraction associated with the smaller exponent by subtracting the fraction from zero in the FAD, returning the complemented number to the fraction register (either AR or BR) it was in originally, and then loading it into SHFMX and right-shifting and sign-extending based on exponent difference until the radix points align. This special operation takes an extra microstep but ensures maximum accuracy.
As a result, the actual fraction subtraction to produce the result does not take place until this third state. During the add/subtract state, the larger exponent is transferred to the PR.

2.2.1.3 Normalize - In this state, the answer is readied for return to the main machine. This involves final normalization of the fraction, adjustment of the exponent, and determination of the resultant sign. If the calculation involved special FAD operations as discussed in the previous paragraph, the fraction subtraction will first be carried out and then the result will be readied for return to the main machine.

When entering the normalization flows, the FPA checks three conditions:

1. Exponents equal zero
2. FALU subtract with exponent difference less than two
3. Subtract, exponent difference less than 7, and DP.

If a zero operand is noted, the other (non-zero) operand is transferred to the output and, if it is the subtrahend in a FALU subtraction, the sign is complemented (minuend - subtrahend = remainder; 0 - X = -X).

A FALU subtraction with exponent difference of 1 or 0 initiates special flows because the subtraction of two nearly equal numbers can result in a very small fraction (numerous leading zeros) which might require many shifts before the first significant bit is located. The special flow initiated can shift the result up to sixty places to find the first significant bit before it is transferred to the standard normalize routine. If a first significant bit is not found after 60 bits have been shifted, a zero is readied as a result.

If the third branch is taken, the addition state described in Paragraph 2.2.1.2 results, then flow reenters the normalization routine.

Usually, the unnormalized result requires a shift of four places or less. If this is the case, the four MSBs are examined to locate the first significant bit. Based on the location of the first significant bit, a rounding byte is added to the fraction.
If the result from a FALU subtraction is negative, the FALU result is subtracted from the rounding byte to return the number to sign magnitude notation and round it in a single step. Once the FALU result is added to or subtracted from the rounding byte, the fraction is shifted and the least significant bits are dropped.

In all cases, the number of shifts required to ready the fraction for return to the CPU is computed and is used to adjust the exponent in the PR. Once completed, the exponent, the normalized fraction, and the sign of the result are placed on FP bus A. When the complete result is on the bus, standard routines handle the actual transfer to the main machine.

2.2.2 Multiply (Floating-Point)

The FPA multiply operation can be broken into three operations: load, multiply, and normalize. In the process of carrying out a FP multiply, the FPA receives the operands (each consisting of exponent, fraction, and sign bits); checks for zeros and reserved operands; loads the exponent, fraction, and sign bits into the appropriate registers; starts the hardware to carry out the required calculations; and assembles and readies the result for return to the CPU when notified that the hardware calculation is finished.

2.2.2.1 Load - To maximize speed, the FPA is continuously setting up for a float R-R operation. This means that in IRD, specifiers 1 and 2 from the instruction buffer are addressing the GPRs (general-purpose registers) in the CPU, and the register data is being placed on FP buses A and B, respectively. Bus A loads MC1 (multiplicand register), LA (exponent register), and SA (sign latch). Bus B loads MP1 (multiplier register), LB, and SB. When the FPA decodes a floating-point instruction, it enters A-Fork and branches to a specific microword based on op code and specifier types. If the instruction is a float R-R multiply, the operands are already loaded and the FPA enters the multiply state immediately.
If, however, it is not, the FPA, under control of the selected microword, receives and stores the required data during A-Fork and possibly B-Fork flows. If it is a double-precision multiply, 32 additional fraction bits are loaded into both MC0 (extension of MC1) and MP0 (extension of MP1). If one or both of the specifiers are not registers, all new data will be loaded into MC1, LA, SA, MP1, LB, and SB.

As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or during some following microcontrol state, the fraction multiplier begins the fraction multiply by breaking the fractions into nibbles and beginning the hardware multiplication using the first multiplier nibble.

2.2.2.2 Multiply - In the multiply state, the fraction multiplication continues until a final fraction (as yet unnormalized) is computed, the exponents are added, and the sign of the result is computed.

The fraction multiplication is initiated when the multiply flows issue MCONT (multiply continue). As MCONT is issued, the FPA checks for operands equal to zero or minus zero (reserved operand). If a zero operand is found, computation stops and the FPA immediately returns a zero to the base machine. If a reserved operand is found, the operation aborts. If neither is found, computation continues.

In the case of a float (single-precision) multiply, the fraction multiplication is completed as the exponent calculation is completed. The product is transferred to the NR. In a double-precision multiply, the microcontrol enters a wait state. While waiting during a double-precision multiply, the FPA continually transfers the output of the fraction multiplier to the normalizer. This enables the FPA to begin normalizing the fraction result as soon as the multiplication is complete. It remains in the wait state until a hardware counter in the fraction multiply logic asserts MUL/DIV DONE, indicating the fraction multiply is complete.
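The idea of multiplying a fraction one multiplier nibble at a time can be shown with a simple shift-and-add sketch. This is only an illustration: the order in which the hardware consumes nibbles, and how it accumulates partial products, are not described here, so the least-significant-first loop and the name multiply_by_nibbles are assumptions.

```python
def multiply_by_nibbles(mcand: int, mplier: int, width: int = 24) -> int:
    """Multiply two unsigned fractions, consuming the multiplier 4 bits at a time.

    Assumption: nibbles are taken least significant first and each partial
    product is simply shifted and added; real hardware may differ.
    """
    product = 0
    for shift in range(0, width, 4):
        nibble = (mplier >> shift) & 0xF         # next 4 multiplier bits
        product += (mcand * nibble) << shift     # weighted partial product
    return product

# 24-bit fractions with the binary point at the left: 0.110 (3/4) and 0.100 (1/2)
a = 0xC00000
b = 0x800000
p = multiply_by_nibbles(a, b)
print(p == a * b)      # True
print(p / 2**48)       # 0.375, that is 0.011 binary, not yet normalized
```

The product 0.011 has one leading zero, matching the 2 × 3 example of Paragraph 1.5, where the fraction is left-shifted one place and the exponent is decreased by 1.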
While the fraction multiply and the check for zeros and reserved operands are taking place, the exponents are added. If no zeros or reserved operands are found, the fraction multiply and exponent processing continue. After the exponents are added, a bias of 200₈ (80₁₆) is subtracted from the exponent result to return the exponent to excess 80 notation (refer to Paragraph 1.5). In a multiply operation, the sign of the result is the exclusive-OR of the operand signs. By the time the fraction multiply is complete, the exponents have been added, the exponent bias has been subtracted, and the sign of the result has been calculated. The result of the fraction multiply is moved to NR.
2.2.2.3 Normalize - The normalize state of a floating-point multiply is very simple. Since the input operands are always between 1/2 and 1, the result is always between 1/4 and 1. This means that the result can be normalized with a single shift of four bits or less. In the normalize state, the fraction is rounded and shifted, and the exponent is adjusted to reflect the normalization shift. The normalized fraction, adjusted exponent, and sign bit are placed on the FP bus A. Once the complete result is on the bus, standard routines handle the actual data transfers to the main machine.
2.2.3 MULL (Multiply Integer Longword)
The FPA’s MULL algorithm is the simplest and most straightforward of all the operation flows. The FPA receives two 32-bit signed integers, performs an unsigned multiplication, and returns the 64-bit answer to the base machine. The FPA performs no result normalization and no checks for reserved operands, zero operands, or other error conditions. Microcode in the base machine generates the condition codes and handles all the checks and manipulations required to ensure a correct result.
2.2.3.1 Load - As discussed in introductory Paragraph 2.2, the FPA during IRD loads MP0 and MC1 (the two registers used in MULL operations) with the register contents of specifier 1 and 2, respectively.
If the instruction decoded in the A-Fork flows is an R-R MULL, the FPA can begin the multiply immediately. If it is a MULL but not an R-R, the FPA will, under the control of the selected microaddress, load data from the correct source into either or both MP0 and MC1.
2.2.3.2 Multiply and Return - The decoding of a MULL causes the fraction multiply hardware to abandon set-up of a MULF and begin accessing the registers used for MULL (MC1 and MP0). When the proper data has been loaded, MCONT is issued by the FPA. This indicates to the fraction multiply hardware that the correct data is in MP0 and MC1, and that the data accesses started previously were accessing correct data. MCONT enables the fraction multiply hardware to continue multiplying. The multiply continues, controlled by a hardware sequencer within the fraction multiply hardware, while the FPA waits two machine cycles. The answer accumulates in ACCM and LSH. After two wait cycles, the multiply is finished. The hardware stops and the FPA makes the 32 low-order bits (from LSH) available to the CPU. When the CPU responds with CPSYNC, indicating the low-order bits have been stored, the FPA readies the high 32 bits from SALU for transmission to the CPU.
2.2.4 Divide
The FPA divide operation can be broken into three steps: load, divide, and normalize. To do a floating-point divide, the FPA receives the operands (each consisting of sign, fraction, and exponent bits), loads the operands into holding registers, transfers the operands from the holding registers into the correct division registers, starts the hardware to do the fraction division, checks for zero and reserved operands, starts the hardware to store the result, and normalizes and packs the result for return to the CPU.
2.2.4.1 Load - The loading of division operands takes place in two substeps: data fetch, and division register load.
Unlike the FPA add/subtract, multiply, and MULL operations, the FPA does not load division operands into the proper division registers during IRD (Table 2-6).

Table 2-6 The Division Load

IRD
  Specifier 1: Register and float assumed (divisor). Register data to AR1, LA, SA.
  Specifier 2: Register and float assumed (dividend). Register data to BR1, LB, SB.
Data Fetch Substep (op code decoded, specifiers and precision known)
  Specifier 1: New data loaded into AR1 and AR0*, LA, and SA, if needed.
  Specifier 2: New data loaded into BR1 and BR0*, LB, and SB, if needed.
Division Register Load Substep (2 microwords)
  1st Microword: Move LA (divisor exponent) to XR. Move BR (dividend fraction) to NR.
  2nd Microword: Move AR (divisor fraction) to just-vacated NR. Move NR (dividend fraction) to RR and right shift the just-loaded dividend fraction to compensate for RR’s hard-wired left shift. This right shift ensures the initial dividend is properly represented. Subtract XR (divisor exponent) from LB (dividend exponent).
*AR0 and BR0 are fraction extension registers for double precision operations.

During IRD an R-R float operation is assumed. This means that both specifier 1 and 2 are assumed to be registers. The contents of the first register named are placed in AR, LA, and SA, the contents of the second in BR, LB, and SB. If the operation decoded is an R-R float divide, the data fetch substep is done and division register load may begin. However, if it is not an R-R float divide, microcode waits for data from the correct specifier and loads it into AR, LA, and SA and/or BR, LB, and SB. When the divisor is in AR, LA, and SA, and the dividend is in BR, LB, and SB, the data fetch substep is finished.
The division register load substep loads the divisor’s and the dividend’s fraction and exponent components into the registers required to do a division. The loading of the proper registers takes two microcode steps. The first microcode step loads the divisor exponent into XR and loads the dividend fraction into the NR.
The second microcode step finishes the register loading by moving the dividend fraction (in the NR) to the RR and loading the just-vacated NR with the divisor fraction from the AR. It also starts the fraction division hardware, checks for zeros and reserved operands, and subtracts the divisor exponent (XR) from the dividend exponent (LB) (LB - XR).
2.2.4.2 Divide - The divide operation continues unless a zero or reserved operand is found. If a zero dividend is found, operations cease and a zero is readied for return to the CPU. Finding a zero divisor or a reserved operand initiates error states. The FPA will remain in these error states until returned to IRD by a CPU signal.
If no zeros or reserved operands are found, the division continues. A bias of 80 is added to the result of the exponent subtraction to return it to excess 80 notation (Paragraph 1.5). The fraction multiply hardware is started. This hardware is used to store the result of the fraction division as it is generated. The division continues under hardware control as the FPA microcode remains in a divide wait loop.
The hardware uses the restoring, repeated subtraction technique to divide. The dividend is initially loaded into the RR and the divisor is stored in the NR. The divisor (contents of NR) is subtracted from the dividend (contents of RR). If the result is negative, a zero is left-shifted into the result register in the fraction multiply hardware and the contents of the RR are left-shifted by one. If the result is positive or zero, a 1 is left-shifted into the result register, and the result is loaded into the remainder register left-shifted by one. The divisor (contents of NR) is continually subtracted from the contents of the RR until 26 bits (58 bits for double precision) of quotient are generated. MUL/DIV DONE is now asserted. Asserting MUL/DIV DONE stops the division and ends the divide wait loop.
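The restoring, repeated-subtraction loop described above can be sketched in Python as follows. This is a minimal model under stated assumptions: the function and variable names are hypothetical, the fractions are held as plain unsigned integers, and the hard-wired RR left shift and its compensating right shift are not modelled separately.

```python
def restoring_divide(dividend: int, divisor: int, quotient_bits: int = 26) -> int:
    """Generate quotient bits one at a time by restoring division.

    rr plays the role of the remainder register (RR) and nr the divisor
    held in NR. The first quotient bit generated carries weight 1, so the
    returned integer q approximates dividend / divisor * 2**(quotient_bits - 1).
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("dividend must be non-negative and divisor positive")
    if dividend >= 2 * divisor:
        raise ValueError("sketch assumes dividend / divisor < 2 (normalized operands)")

    rr = dividend
    nr = divisor
    quotient = 0
    for _ in range(quotient_bits):
        difference = rr - nr
        if difference < 0:
            # Negative result: shift a 0 into the quotient and keep
            # (restore) the old remainder, shifted left by one.
            quotient = quotient << 1
            rr = rr << 1
        else:
            # Positive or zero result: shift a 1 into the quotient and
            # load the difference, shifted left by one, into the remainder.
            quotient = (quotient << 1) | 1
            rr = difference << 1
    return quotient
```

For normalized operands (both between 1/2 and 1) the ratio is always below 2, which is why a single leading integer bit is enough in this model; passing 58 for quotient_bits corresponds to the double precision count given in the text.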
The divide result is transferred from the fraction multiply hardware, where it was stored during generation, to the normalize register (NR) in the normalize hardware.
2.2.4.3 Normalize - Since the two initial operands are normalized (between 1/2 and 1), the result is always positive and between 1/2 and 2. This means the normalize and round operation is simple and will take only one microstep. The result is examined, a round byte is selected and added, and the data is shifted as needed to produce a normalized result. The exponent result is adjusted to reflect the direction and amount of the fraction shift. The normalized fraction, adjusted exponent, and sign bit are placed on the FP bus(es). Once the result is on the bus(es), standard storage routines handle the actual transfer to the CPU.
2.2.5 EMOD (Extended Precision Multiply and Integerize)
The EMOD operation is partially done in the FPA. The FPA performs an unsigned 32 X 24-bit (64 X 56-bit for double precision) multiplication and returns the fraction result to the main machine. The main machine does all further processing. The FPA EMOD operation can be broken into two steps: operand load, and result calculation and return.
2.2.5.1 Operand Load - Loading the EMOD operands involves loading the multiplicand, an 8-bit multiplicand extension, and the multiplier into the proper registers. The multiplicand (either single or double precision) is loaded into MC during A-Fork. In B-Fork, EMOD flows are started. These flows wait for the CPU to fetch the multiplicand extension (8 bits) and transmit it to the FPA via the ID bus. The FPA loads the extension into MCX, which is part of the MC1 register. The second operand is then transmitted to the FPA and loaded into the appropriate multiplier registers, MP0 and MP1. The multiplier is not extended. The FPA receives and stores the exponent and sign associated with both operands but does not use them.
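The single precision operand widths given above (a 24-bit multiplicand fraction with an 8-bit extension, times a 24-bit multiplier fraction) can be illustrated with a short Python sketch. It assumes that the extension supplies the eight low-order bits of a 32-bit multiplicand; that placement, and the function name, are assumptions for illustration and are not stated in the text above.

```python
def emod_fraction_product(multiplicand_fraction: int, extension: int,
                          multiplier_fraction: int) -> int:
    """Unsigned 32 x 24-bit product for a single precision EMOD (sketch).

    Assumption: the 8-bit extension is appended below the 24-bit
    multiplicand fraction to form a 32-bit extended multiplicand.
    """
    if not 0 <= multiplicand_fraction < (1 << 24):
        raise ValueError("multiplicand fraction must fit in 24 bits")
    if not 0 <= extension < (1 << 8):
        raise ValueError("extension must fit in 8 bits")
    if not 0 <= multiplier_fraction < (1 << 24):
        raise ValueError("multiplier fraction must fit in 24 bits")

    # Form the 32-bit extended multiplicand, then multiply unsigned.
    extended_multiplicand = (multiplicand_fraction << 8) | extension
    return extended_multiplicand * multiplier_fraction
```

The product of a 32-bit and a 24-bit unsigned value always fits in 56 bits; exponent and sign handling are left out because the main machine performs them.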
2.2.5.2 Result Calculation and Return - Once the operands are loaded, MCONT is asserted and the EMOD multiply begins. The operands are tested for zeros or reserved operands. If zeros are found, special flows stop the multiply and return a zero to the CPU. Finding reserved operands initiates error flows. If no exceptions are found, the multiply sequencer, started by the assertion of MCONT, continues multiplying. A single precision (float) multiply is finished in one microstep after the exponent test. A double precision multiply causes the FPA to enter a wait loop. It remains in the wait loop until the multiply sequencer asserts MUL/DIV DONE, indicating the result is computed.
When the result computation is finished, the fraction (32 bits float, 64 bits double) is transmitted to the CPU. The CPU does all further processing, including sign computation, removal of the integer part, normalization, and exponent calculation.
2.2.6 POLY (Polynomial Evaluation)
2.2.6.1 Introduction - POLY is an FPA-implemented instruction. The FPA does the majority of calculations required to evaluate a polynomial expression. This involves storing a constant and an accumulation; receiving coefficients; repeated additions and multiplications using the constant, the accumulation, and the new coefficient; and the readying of a final result to be returned to the CPU. It also uses specialized operations (both hardware and microcode) to ensure maximum accuracy within the FPA hardware limits.
The following paragraphs explain the polynomial expression, define various terms, and describe the POLY flows and POLY exceptions in detail. Also discussed are the numerous flows required to handle errors, underflows, overflows, and zeros.
2.2.6.2 The Polynomial Expression - The generalized polynomial may be written: $f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$. The x, a constant within each polynomial, is called the argument and is raised to various powers: $x^1, x^2, x^3, \dots, x^n$.
The highest power, represented here by the superscript n, is called the degree of the equation. The $a_0, a_1, a_2, \dots, a_n$ are the coefficients. Rearrangement and factoring produces $f(x) = a_0 + x(a_1 + x(a_2 + \dots + x(a_{n-1} + x a_n)))$. The result, f(x), may be computed: $a_n$ times x, then add $a_{n-1}$; the resultant answer times x, and then add $a_{n-2}$, and so on. The generalized form is: (accumulation times x) plus the new coefficient, $a_i$, equals the new accumulation.
The POLY instruction format is POLY argument, degree, coefficients table. The FPA receives and stores the argument. The CPU uses the degree operand to determine when it has accessed the last coefficient of the table so it may inform the FPA that the POLY calculation is done. The coefficient table is arranged in $a_n, a_{n-1}, a_{n-2}, \dots, a_1$, and $a_0$ order. The CPU transmits the coefficients to the FPA as needed: $a_n$ first, $a_{n-1}$ next, and so on.
2.2.6.3 Normal POLY Flows - The FPA begins special POLY flows in B-Fork. The POLY argument is transferred to the FPA during A-Fork and then loaded into the argument registers. The argument fraction is loaded into MP, the exponent into XR, and the sign into SX. The argument remains in these registers throughout POLY execution. The FPA waits for the first coefficient to be sent so the POLY computation can begin.
POLY computation can be divided into three large categories:
1. Argument and First Coefficient Handler
2. Generalized POLY Computation (neither first term nor last term)
3. POLY DONE Handler (handles A(0), the last coefficient).
This section will discuss the flow by these three categories. Within each category, microcode controls the normal operations, checks for exceptional conditions, and attempts to recover from any exceptional conditions. Refer to Figure 2-8 for a summary of the POLY flow.
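The nested evaluation described in Paragraph 2.2.6.2 (accumulation times x, plus the new coefficient) can be sketched in Python as shown below. The function name is hypothetical, ordinary Python floats stand in for the FPA registers, and none of the exception handling discussed in this section is modelled.

```python
from typing import Iterable


def poly_evaluate(argument: float, coefficients: Iterable[float]) -> float:
    """Evaluate a polynomial from coefficients supplied highest degree first.

    The coefficients arrive in a_n, a_(n-1), ..., a_1, a_0 order, matching
    the table order described in the text.
    """
    stream = iter(coefficients)
    try:
        # The first coefficient received (a_n) starts the accumulation.
        accumulation = next(stream)
    except StopIteration:
        raise ValueError("at least one coefficient is required") from None

    for coefficient in stream:
        # General step: (accumulation times x) plus the new coefficient.
        accumulation = accumulation * argument + coefficient
    return accumulation
```

For example, `poly_evaluate(2.0, [3.0, 0.0, 1.0])` evaluates 3x^2 + 1 at x = 2 with one multiply and one add per coefficient after the first.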
[Figure 2-8 POLY Flow: flowchart beginning with the argument in AR, LA, and SA and showing the First Coefficient Handler, the General POLY Flows (no POLY DONE), and the Last Coefficient Handler (POLY DONE asserted), with overflow/underflow recovery entries.]
Within the flows, different microcode handles float and double precision operation. In POLY double, the coefficient, argument, and accumulation fractions each use an additional 32 low-order bits. The differences between float and double precision are not discussed in each operation because they are normally limited to longer fraction multiply times and slower fraction transfers. These come about because there are more bits to be multiplied and moved.
When the first coefficient, A(N), is sent, it is loaded in MC, LB, and SB. Since the argument has not yet been checked, both the argument and the coefficient are checked for exception conditions and POLY DONE is checked. If any exception condition is noted, special flows are accessed. POLY DONE asserted indicates that the coefficient just sent was the final coefficient (in this case, the first coefficient is also the last coefficient). If the argument (x) is zero, all terms except the A(0) term of the polynomial will be zero. Either POLY DONE asserted or x equal to zero causes the FPA to access a special last coefficient routine in the argument and first coefficient handler that returns A(0) to the CPU as the result of the polynomial calculation.
After both the argument and the coefficient are checked and no exception conditions are found, the first multiply takes place. While the fractions are multiplied in the fraction multiply logic (FML and FMH), the exponents are added and adjusted to return to excess 80 notation (FCT) and the sign of the result is computed (FCT). When the multiply is done, the fraction is moved to AR for the addition operation. To maximize calculation accuracy, no normalization is performed after the multiplication, and 8 additional low-order fraction bits are transferred to the AR register and stored in ARX. These 8 bits are used when the new coefficient is added to the multiplication result to produce the new accumulation.
While the multiplication fraction result is being transferred to AR, the exponent result is checked for exponent overflow or underflow. If no overflow or underflow is found, the addition will begin as soon as the new coefficient data is ready. If, however, overflow or underflow is sensed, special flows that attempt to recover from the over/underflow are accessed (Paragraph 2.2.6.4).
While the new coefficient data is checked for zero and/or reserved operands, the addition/subtraction begins on the assumption that the coefficient data will be valid. The exponent difference hardware selects the larger exponent for processing by the FCT and loads it into PR. It also shifts and loads the fraction associated with the smaller exponent into the B-input of FALU. FALU then adds or subtracts the fraction. When the coefficient data proves valid, the computed fraction result is transferred to NR where it can be normalized.
The fraction normalization takes place in the FNM logic. A rounding byte is added and the result is shifted until normalized. The exponent is adjusted based on both the rounding byte and the number of shifts required to normalize the fraction. The normalized fraction is moved to MC and a multiply with the stored argument (x) begins.
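The align, add, and normalize sequence described above can be illustrated with a simplified Python sketch. It is only a model under stated assumptions: both operands are treated as same-sign magnitudes, exponents are plain (unbiased) integers, bits shifted out during alignment are truncated rather than rounded with a rounding byte, and the function name and 24-bit width are hypothetical.

```python
from typing import Tuple


def align_and_add(frac_a: int, exp_a: int, frac_b: int, exp_b: int,
                  width: int = 24) -> Tuple[int, int]:
    """Add two same-sign fraction/exponent pairs and normalize the sum.

    A normalized fraction has its top bit (bit width-1) set, so it
    represents a value between 1/2 and 1 times 2**exponent.
    """
    # Select the larger exponent; the other fraction is shifted to align.
    if exp_a >= exp_b:
        big_frac, small_frac, exponent = frac_a, frac_b, exp_a
        shift = exp_a - exp_b
    else:
        big_frac, small_frac, exponent = frac_b, frac_a, exp_b
        shift = exp_b - exp_a

    # Align the radix points (truncating) and add the fractions.
    total = big_frac + (small_frac >> shift)
    if total == 0:
        return 0, 0

    # Normalize: shift right on carry out, left while the top bit is clear,
    # adjusting the exponent by the number of shifts.
    while total >= (1 << width):
        total >>= 1
        exponent += 1
    while total < (1 << (width - 1)):
        total <<= 1
        exponent -= 1
    return total, exponent
```

With 0x800000 standing for the fraction 1/2, `align_and_add(0x800000, 1, 0x800000, 0)` adds 1.0 and 0.5 and yields the fraction 3/4 with exponent 1, that is, 1.5.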
Once the first multiply is completed, the POLY calculation is in the general POLY flow. These flows multiply the result of the last add and normalize by the argument (x), receive a new coefficient from the CPU, check it for exceptional conditions, then add it to the result of the multiply operation, normalize the result of the addition, and ready it for the next multiply. The general POLY flows check the intermediate results for overflow, underflow, and zeros, and access special flows if an exception is found.
The general POLY flow continues until the CPU sends a coefficient flagged with POLY DONE rather than CP SYNC. This indicates that the coefficient just transmitted is the final coefficient in the table. The POLY DONE flow adds the final coefficient and then accesses the normalization flows in the FPA addition flows. These flows round and normalize the fraction and adjust the exponent based on the rounding byte and normalization shift. Once the result is complete, it is placed on the FP bus A and standard routines handle the transfer to the CPU.
2.2.6.4 POLY Exception Flows - The POLY flows have many special sections to check for and handle exceptional conditions. Each coefficient is checked for zeros and reserved operands. The POLY argument is checked for zero. The CPU checks the argument and degree for reserved operand. The FPA also checks the intermediate results for underflow, zero, and overflow. If an underflow or overflow is detected, special flows attempt to recover from the condition without a loss of accuracy.
The exception flows (zero, reserved operand, overflow, and underflow) can be divided into three categories to handle exceptions discovered during:
1. First coefficient and argument handling
2. General coefficient handling
3. POLY DONE (final coefficient) handling.
Within each category, different microcode handles float and double precision operation. However, there is little difference between the exception procedures used in each category and only minor differences in the microcode. As a result, each individual exception flow is not discussed; rather, the microcode procedure for each type of exception is explained.
Zeros
The argument and each coefficient are checked for zeros. The argument and first coefficient are checked for zeros at the start of the POLY flow. If the argument (x) is zero, all the terms of the polynomial will be zero except A(0), the last coefficient. With the argument equal to 0, the FPA will remain in the first coefficient loop waiting for the last coefficient (flagged by POLY DONE). When it is received, it will be tested for reserved operand and, if not reserved, will be returned to the CPU as the result of the polynomial. If the first coefficient is zero, the accumulation registers will be set to zero and the FPA will wait for the next coefficient.
If a zero is found as a subsequent coefficient (when the current accumulation is not zero), the current accumulation, which is unnormalized, will be rounded and normalized, and the FPA will wait for the next coefficient.
Reserved Operand
Each coefficient is checked by FPA hardware for reserved operand. If a reserved operand is found, the POLY operation is immediately aborted and the accelerator error bit is set. The argument is not checked for reserved operand by the FPA because it is checked in the CPU and, if found to be reserved, the POLY operation never starts in the FPA.
Overflow
The FPA checks for overflow by examining the exponent bits PR8 and PR9 in the PR register. If PR8 (the overflow bit) is high and PR9 is low, an overflow has occurred.
The FPA checks each current accumulation two times per cycle for an overflow condition: once when the unnormalized multiplication result is readied for adding the new coefficient, and once after the addition result has been rounded and normalized. If an overflow is detected in the second instance (normalized addition result overflow), the FPA will abort. The FPA will set the PSL V (overflow) bit and wait until the CPU traps it back to IRD.
If the unnormalized multiplication result overflows, the FPA accesses overflow routines in an attempt to recover an accurate result from the overflow. The FPA microcode is written based on the assumption that if the new coefficient exponent is subtracted from the current overflow, the result may be small enough that the exponent will no longer overflow (PR8 will be low). As stated before, PR8 is high. This means the exponent in PR is 10XXXXXXX (9 bits long). Since the exponent difference taker EALU is only 8 bits long, the overflowed exponent must be scaled down. The FPA subtracts 80₁₆ to scale it down.
The new coefficient is first checked for zero or reserved operands. A reserved operand causes an abort. If the coefficient is zero, it will not change the overflow. The FPA will attempt to recover from the overflow by first adding back the 80₁₆ to return the exponent to its correct value, then normalizing and rounding. If this fails, the FPA will set the overflow bit and abort.
If the new coefficient is not zero or reserved, the operation continues. The FPA subtracts 80₁₆ from the exponent of the coefficient to scale it down. The reduced exponent coefficient is checked for underflow. If an underflow is sensed, the coefficient is effectively zero when compared with the accumulation. Since the coefficient is effectively zero, the FPA will attempt to recover from the overflow by first adding back the 80₁₆ to return the exponent to its correct value, then normalizing and rounding.
If this fails, the FPA will set the overflow bit and abort.
If the reduced coefficient did not underflow, it shows that the coefficient can affect the accumulation and possibly recover it from the overflow condition. In the case of accumulation overflow flows, we know the accumulation is the larger number. Therefore, no checks are performed on the exponent to find the larger number. The exponent difference taker then subtracts the two scaled-down exponents to determine how many times the coefficient must be shifted to align the radix points. The POLY add/subtract will take place.
The accumulation fraction is moved through ADER MUX to FALU and the restored (80₁₆ added) accumulation exponent is moved to PR for processing. The POLY add/subtract takes place. The fraction result is moved to NR where it is normalized and rounded. The result exponent (formerly the accumulation exponent) is adjusted based on the fraction normalization and rounding. The result is checked for overflow and underflow. As stated at the beginning of this overflow section, an overflow after the normalization and rounding operation will cause the FPA to assert the overflow V bit and abort.
Underflow
The FPA can handle numbers as small as 0.29 X 10^-38. A number smaller than this causes an underflow. The FPA checks for underflow by examining the exponent register PR. PR9 will be high or PR<8:0> will be low in an underflow.
Underflow is not as serious a fault as overflow. An underflow means the result just checked is so close to zero that the FPA cannot accurately represent it. When encountered, the FPA sets the ACC ZDATA bit and special flows attempt to recover the number. If the underflow result cannot be recovered, the number is set to zero and FPA operation continues. After the POLY operation is completed, the CPU will trap on underflow if bit 6 (floating underflow) of the PSL is set.
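The exponent tests stated above (overflow when PR8 is high and PR9 is low; underflow when PR9 is high or PR<8:0> is all zeros) and the 80₁₆ scale-down of an overflowed exponent can be sketched in Python. The function names and the treatment of PR as a 10-bit integer are assumptions for illustration; the sketch does not model the recovery flows themselves.

```python
def classify_exponent(pr: int) -> str:
    """Classify a 10-bit PR exponent value using the bit tests in the text."""
    pr &= 0x3FF                      # keep PR<9:0>
    pr9 = (pr >> 9) & 1
    pr8 = (pr >> 8) & 1
    if pr9 == 1 or (pr & 0x1FF) == 0:
        return "underflow"           # PR9 high, or PR<8:0> all low
    if pr8 == 1:
        return "overflow"            # PR8 high while PR9 is low
    return "ok"


def scale_down_overflowed(pr: int) -> int:
    """Subtract 80 (hex) from an overflowed exponent so it fits in 8 bits."""
    if classify_exponent(pr) != "overflow":
        raise ValueError("exponent is not in the overflow range")
    return (pr & 0x3FF) - 0x80
```

An overflowed exponent of the form 10XXXXXXX lies between 100₁₆ and 17F₁₆, so after the subtraction it lies between 80₁₆ and FF₁₆ and can be handled by an 8-bit exponent difference taker.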
The FPA checks for accumulation underflow twice per POLY cycle, once as the unnormalized multiplication result is readied for the following addition and once after the result of the addition has been normalized and rounded. If an underflow is detected in the normalized addition result, no result recovery is possible. The FPA merely sets the accumulation to zero, informs the CPU of the underflow, and continues the operation.
If an underflow is detected after the multiplication, special flows are accessed to save the result. In an underflow, the exponents of both the accumulation and the coefficient must be scaled up so the exponent difference can be taken with an 8-bit exponent processor. The scale factor is 80₁₆.
The coefficient is first checked for zero or reserved operands. A reserved operand causes an abort. A zero coefficient will not change the underflow, so the FPA will try to recover by normalizing and rounding. If this fails, the accumulation will be cleared (set to zero) and the FPA operation continues.
If the new coefficient is not zero or reserved, the operation continues. The FPA adds 80₁₆ to both exponents to scale them up. If the coefficient exponent overflows when it is scaled up, the coefficient is so much larger than the accumulation that the accumulation will not affect the coefficient. The FPA will disregard the accumulation and make the new coefficient the accumulation by subtracting the 80₁₆ just added to the coefficient exponent and moving the coefficient to the registers formerly holding the underflowed accumulation.
If the new coefficient does not overflow, it shows that the coefficient can affect the accumulation, and the exponent difference taker determines the exponent difference. Since the coefficient is the larger number, the coefficient fraction is moved through the ADER MUX to the FALU and the coefficient exponent is stored in PR after the bias previously added is removed.
The accumulation fraction is shifted based on the exponent difference until the radix points align, and then added/subtracted. The result is rounded and normalized in the normalize logic. The coefficient exponent (stored in PR) is adjusted based on the fraction normalization and rounding, and becomes the accumulation exponent. The rounded result is checked for underflow. If underflow is detected, the ACCZ bit is set and a zero is stored. The FPA informs the CPU that an underflow has occurred by asserting both FP SYNC and ERR SYNC. In any case, the polynomial operation continues.
2.3 BLOCK DIAGRAM AND UNIT DESCRIPTION
This section provides a functional description of each area of the FPA with relation to the control store and instruction execution. Discussions of logic unit operations are included for areas that require further clarification.
The FPA can be divided into three areas. The first area contains two interface sections: the CPU-FPA interface and the FPA internal buses (which interface between the various sections of the data manipulation area). The second area, data manipulation, contains five sections: Fraction Adder/Subtractor, Fraction Normalizer/Divider, Fraction Multiplier, Exponent Processor, and Sign Processor. Each section in this area operates as an independent unit, capable of processing data in parallel with operations being performed in other sections. The third area contains only the Control Store and Logic, which controls both interfacing and data manipulation. Refer to Figure 2-9, the FPA Block Diagram.
[Figure 2-9 FPA Block Diagram: FCT (M8289, slot 28) FPA control, sign processor, and exponent processor; FMH/FML (M8286/M8287, slots 25 and 26) fraction multiplier; FAD (M8288, slot 27) fraction adder; FNM (M8285, slot 24) fraction normalizer; interconnected by BUS FP A <33:00>, BUS FP B <33:00>, CS BUS <95:00>, the system ID bus <31:00>, and BUS DFMUX <31:00>.]
The CPU transmits both data and instructions to the FPA. The instructions are decoded in the Control Store and Logic and access an FPA control store word. The FPA control store word controls the transfer of the data on the FPA internal buses and the operation of the various data manipulation sections. The various data manipulation sections perform the required operations. The resulting answer is formatted and sent to the CPU-FPA interface. A signal from the FPA informs the CPU that the answer is available at the interface.
Each of the eight sections mentioned in this introduction is discussed individually in the following paragraphs. Each discussion includes an explanation of pertinent control store fields and a description of the hardware operation as controlled by the control store, CPU instruction, data characteristics, and both internal and external flags.
2.3.1 CPU-FPA Interface
The CPU and FPA have numerous interconnections. They exchange data, instruction information, device control signals, and status information over buses and individual signal lines.
There are three types of information transferred via the CPU-FPA interface.

1. CPU-FPA control and status
2. Data
3. Trap and diagnostic information.

They are discussed in this order in the following paragraphs. Refer to Figure 2-10 for a summary of the CPU-FPA interface.

[Figure 2-10 CPU-FPA Interface (TK-0520). Connections shown between CPU and FPA: ID bus (register #16 maintenance, register #17 status), CS bus, op code information, machine clocks, FP SYNC, ACC ERROR, general register address lines, DF mux bus, C, V, Z, and N bits, execution point counter.]

2.3.1.1 CPU-FPA Status and Control Interface - The FPA and CPU work interactively. This means they are constantly exchanging status and control information, and that operations in one unit can and do affect operations in the other unit.

The status register (ID register 17) provides some CPU control of the FPA. Bit 15 of the status register is used by the CPU to enable the FPA. The CPU can disable all FPA outputs and effectively remove the FPA from the computing system by clearing bit 15. Refer to Figure 2-11 and Table 2-7 for a complete description of this register.

[Figure 2-11 Status Register, ID register #17 (TK-0514). Fields shown: bit 31 ACC ERROR, bit 27 MINUS ZERO ERROR, bit 15 ACC EN, bits <3:0> ACC TYPE (0001); all other bits zero.]

Table 2-7 The Status Register

Bit No.  Name                          Bit Access           Function
31       Accelerator Error             Write by FPA         Set when FPA detects an exception condition.
         (also called ACC Error Sync)  Read by CPU
30-28    Not Used - Set to Zero
27       Minus Zero Error              Write by FPA         Set when FPA encounters a reserved operand or
                                       Read by CPU          generates an overflow. Setting this bit sets
                                                            Accelerator Error.
26-16    Not Used - Set to Zero
15       Accelerator Enable            Write by CPU         When clear, all FPA outputs are disabled. This
                                       Read by FPA          removes the FPA from the computing system.
                                                            Must be set for normal FPA outputs.
14-4     Not Used - Set to Zero
3-0      Accelerator Type              Read by CPU          A hardwired code identifies the type of
                                       Hardwired in FPA     accelerator installed in the backplane slots.
                                                            The FPA code is 0001.

The FPA also receives control and status information from the CS bus. The functions of these lines are summarized in Table 2-8.

Table 2-8 CS Lines

CS BUS
71  70      Name           Function
0   0       NOP
1   0       ACC TRAP       Initiates an Accelerator trap. Refer to Paragraph 2.3.1.3.
0   1       CPSYNC         Indicates CPU has received FPA data or CPU is presenting valid data to FPA.
1   1       Redefine uSI   Decodes CS lines 57, 56, and 55 for more information.

CS BUS
57  56  55  Name           Function
1   1   0   Poly End       Indicates last term of polynomial has been transmitted from CPU.
1   1   1   FP TRAP        Initiates an FPA trap. Refer to Paragraph 2.3.1.3.

Op code information (operation and precision) is transmitted to the FPA from the instruction buffer via IRC OPC lines 7 to 0. These lines, from byte 0 of the instruction buffer, are used by the A-Fork/B-Fork logic and BEN logic for FPA control store next address generation (refer to Figure 2-34). A few other lines from the instruction buffer and decode logic provide specifier source information to the FPA. The possible sources of data are as follows:

1. Memory
2. Register
3. Short literal
4. Long literal.

The CPU-FPA interface includes clock signals from the CPU to the FPA. The units operate synchronously on a 200 ns cycle. T0 of both units coincides.

The FPA transmits two status signals to the CPU: FP SYNC and ACC ERROR. These signals are input to the CPU for branch control. FP SYNC is normally asserted when an FPA result is available to the CPU. ACC ERROR is set during an FPA error condition.

2.3.1.2 CPU-FPA Data Interface - The FPA receives operand data from the CPU and, after performing the required operation, returns the answer to the CPU. The data is transmitted to the FPA via the ID bus and is returned to the CPU via the DF mux bus. As mentioned previously, the FPA does not do any memory accessing.
The CPU must calculate the data memory address, access the address, and place the data on the ID bus to the FPA.

The FPA is optimized to use CPU scratchpad register data. It stores two copies of the 16 CPU scratchpad registers. To ensure that the FPA copies are exact copies, they are addressed and written by the same lines that address and write the CPU general registers. The address lines are from the DAP board and the data is transmitted via the DF mux bus. To ensure that a changing register is never read, the CPU updates the general register and the FPA copies between T100 and T200 (T0), and the FPA reads the copies between T0 and T100. Note that the FPA general register copies are write-only memory to the CPU and read-only memory to the FPA. This means that results of FPA operations that are destined for the general register set are transmitted back to the CPU via the DF mux bus and then written into the general register set under CPU control, rather than written directly into the general register copies by the FPA.

The data stored in the FPA general register copies is read by the FPA using address lines from the instruction buffer operand source logic. This scheme enables the FPA to access register data and begin the operation as soon as the general register address or addresses are in the instruction buffer.

All operands other than register operands are transmitted to the FPA via the ID bus. This includes memory data, and long and short literals. When memory data is specified in an instruction, the CPU fetches it and places it in the CPU D-register. The contents of the D-register are placed on the ID bus and, in the FPA, are transferred from the ID bus directly onto the FP buses. Since the D-register and ID bus are only 32 bits wide each, it takes two transfers to transmit a double precision number. Single precision (float) literal data, part of the instruction stream, is transferred from the instruction buffer onto the ID bus.
In the FPA, single precision literal data is latched into the literal register (LR) and then placed on the FP bus. The most significant part of double precision literal data is handled similarly, i.e., IB -> ID bus -> LR -> FP buses. The least significant part of a double precision literal is transferred from the instruction buffer over the ID bus to the CPU D-register, then back on the ID bus and onto the FP buses. Note that no ID bus addresses are required for data transfers over the ID bus. The FPA simply accepts the current ID bus data.

When the FPA operation result is ready to be transmitted to the CPU, FP SYNC is asserted and the single precision result or the most significant part of a double precision result is on FP bus A. The CPU responds to FP SYNC by enabling the FPA DF mux bus drivers, which place the FP bus A contents on the DF multiplexer bus. The FPA result is transferred to the CPU D-register via the DF mux bus. When the CPU has the data, it asserts CP SYNC. This ends a single precision (float) transfer or enables the second part of a double precision transfer. For a double precision transfer, the second part is placed on FP bus A and remains there until the CPU responds to the newly asserted FP SYNC by enabling the DF mux bus drivers, accepting the data, and asserting CP SYNC to indicate it has the data.

While the FPA is transmitting the result back to the CPU, valid condition codes are also being transmitted to the CPU condition code latches. These latches are read during the next machine cycle. The N, V, and Z bits are set based on the status of the result. The C-bit is always cleared by the FPA.

2.3.1.3 Trap and Diagnostic Information - The FPA contains several features to facilitate error diagnosis and troubleshooting. These include programmable traps, microdiagnostics, special maintenance features, and the visibility bus.

The CPU can initiate two types of traps: ACC TRAP and FP TRAP. CS 71 high and CS 70 low initiate an ACC TRAP.
This causes the FPA to access one of the FPA microcode addresses 0 through 7, as selected by CS lines 57, 56, and 55. Currently only two of these traps are used: Accelerator Power-Up Trap (address 0) and Accelerator Abort Trap (address 2).

The FP TRAP (used for FP microdiagnostics) is selected by CS lines 71, 70, 57, 56, and 55 high. When FP TRAP is asserted, the FPA microcode address is selected by bits 23 through 16 of the maintenance register. The trap address (0 through 255 in the microcode) is selected by the data previously loaded into the maintenance register.

The maintenance register is a CPU-FPA readable/writeable register located on the ID bus. The CPU accesses this register as ID bus register 16. The register is designed to facilitate maintenance. As discussed previously, it contains the FP trap diagnostic address. Using the trap address, the CPU can exercise various sections of FPA logic. Bit 14 of this register provides a sync pulse that can be used for troubleshooting with an oscilloscope. This bit will go high each time the FPA accesses the microcode address stored in bits 8 through 0. Refer to Figure 2-12 and Table 2-9 for a summary of this register.

[Figure 2-12 Maintenance Register, ID register #16 (TK-0515). Fields shown: bit 31 WRITE TRAP ADDRESS, bits <30:24> zero, bits <23:16> TRAP ADDRESS, bit 15 WRITE MICRO BREAK, bit 14 MICRO MATCH, bits <13:9> zero, bits <8:0> MICRO BREAK/CURRENT ADDRESS.]

Table 2-9 The Maintenance Register

Bit No.  Name                  Bit Access                   Function
31       Write Trap Address    Write by CPU                 When set (by CPU), enables CPU to write trap
                               Read by FPA                  address (bits <23:16>).
30-24    Not Used - Set to Zero
23-16    Trap Address          Write/Read by CPU            Selects FPA microcode address for FPA
                               Read by FPA                  microdiagnostics.
15       Write Microbreak      Write by CPU                 When set (by CPU), enables CPU to write
                               Read by FPA                  microbreak (bits <8:0>).
14       Micromatch            Write by FPA                 Set by FPA when the address currently accessed
                               Read by CPU                  by FPA microcode equals the address stored in
                                                            microbreak (bits <8:0>).
13-9     Not Used - Set to Zero
8-0      Microbreak/           CPU writes microbreak.       These bits serve two functions:
         Current Address       FPA reads microbreak.        1. The microbreak selects the FPA microcode
                               FPA writes current FPA          address to be monitored for micromatch
                               microcode address.              (bit 14).
                               CPU reads current FPA        2. The current address provides CPU monitoring
                               microcode address.              of FPA microcode activity.

Forty-three FPA signals are accessed by the Visibility Bus (V bus). The V bus is a diagnostic tool, designed to allow polling of stable internal CPU (in this case, FPA) signals. The console can issue commands which load the V bus latches with the signals monitored and then shift the loaded latches one bit at a time to a control word located in the console interface. At the console, the data shifted in is examined by diagnostic software. There are 8 data input channels on the V bus; channel 6 is devoted to the FPA. Refer to Table 2-10 for a listing of the FPA signals that are available to the V bus.
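The maintenance register layout in Table 2-9 can be illustrated with a short software sketch. This is only an illustration of the bit positions given in the table; the function names and dictionary keys are hypothetical and do not correspond to any DIGITAL diagnostic software.

```python
# Sketch of the maintenance register (ID register 16) field layout.
# Bit positions follow Table 2-9; all names below are hypothetical.


def decode_maintenance_register(value: int) -> dict:
    """Split a 32-bit maintenance register value into its fields."""
    value &= 0xFFFFFFFF
    return {
        "write_trap_address": bool((value >> 31) & 1),  # bit 31
        "trap_address": (value >> 16) & 0xFF,           # bits <23:16>
        "write_microbreak": bool((value >> 15) & 1),    # bit 15
        "micromatch": bool((value >> 14) & 1),          # bit 14
        "microbreak_or_current": value & 0x1FF,         # bits <8:0>
    }


def encode_trap_address(trap_address: int) -> int:
    """Build a value that sets Write Trap Address and loads bits <23:16>."""
    if not 0 <= trap_address <= 255:
        raise ValueError("trap address must fit in 8 bits")
    return (1 << 31) | (trap_address << 16)


def encode_microbreak(address: int) -> int:
    """Build a value that sets Write Microbreak and loads bits <8:0>."""
    if not 0 <= address <= 0x1FF:
        raise ValueError("microbreak address must fit in 9 bits")
    return (1 << 15) | address


if __name__ == "__main__":
    print(decode_maintenance_register(encode_trap_address(0x2A)))
    print(decode_maintenance_register(encode_microbreak(0x123)))
```

The 8-bit trap address field matches the stated trap address range of 0 through 255, and the 9-bit microbreak field allows any of 512 microcode addresses to be monitored.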
Table 2-10 Signals Monitored by Visibility Bus

FCTE SHF COUNT 5 H
FCTE SHF COUNT 4 H
FCTE SHF COUNT 3 H
FCTD EALUOL
FCTE COMPL L
FADR SPC (0) H
FNMS EALU CIN L
FCTC SEL NORM H
FCTP RA ADRS 3 L
FCTP RA ADRS 2 H
FCTP RA ADRS 1 L
FCTP RA ADRS 0 L
FCTP RB ADRS 3 L
FCTP RB ADRS 2 L
FCTP RB ADRS 1 L
FCTP RB ADRS 0 L
DAPL ACC CONTEXT 0 H
DAPL ACC CONTEXT 1 H
FCTC CLR RR L
FCTH CP SYNC H
FNME BUS - EXP L
FCTE SHF COUNT 2 H
FCTE SHF COUNT 1 H
FCTE SHF COUNT 0 H
FCTN FALU CARRY IN H
FCTN FAMX SEL 0 H
FCTN FAMX EN 0 L
FCTAAGTBIJ
FCTN SHF MUX EN 1 L
FCTN SHF MUX EN 0 L
FCTN FALU FUNC SEL 2 H
FCTN FALU FUNC SEL 1 H
FCTN FALU FUNC SEL 0 H
FCTN FAMX SEL 1 H
FCTN LOAD AR 1 H
FCTN LOAD AR 0 H
FCTN LOAD ARX H
FCTN LOAD BR 1 H
FCTN LOAD BR 0 H
FADS BUS - FAD L
FCTIACCNDATA H
FCTC ACC Z DATA H
FCTC ACC V DATA H

2.3.2 FPA Internal Buses
As discussed in Paragraph 2.3, the FPA internal buses transmit data between the various data manipulation units. These units are arranged along two parallel 34-bit tristate buses called FP bus A and FP bus B. These buses transmit data from the CPU-FPA interface to the various data manipulation units, transfer intermediate results between units, and return the result to the FPA-CPU interface. The buses can transfer a complete 64-bit double-precision word or two 32-bit float words simultaneously.

The BSC field of the microword controls a majority of the bus activity. The available sources include all FPA data manipulation units and the CPU-FPA interface. Refer to Table 2-11 for a summary of BSC bus control operations. Note that the BSC field controls only the data source. The destination is enabled via other control fields and accepts the data available on the FP buses.
Table 2-11 BSC Control Store Field

       BSC Field
       3    2    1    0
       uCS  uCS  uCS  uCS    Microcode
Hex    15   14   13   12     Mnemonic   Function
0-3    0    0    x    x      INTH       Bus A <- SALU (one code in this range)
4      0    1    0    0      NL         Bus B* <- Bus A* <- NSHF LO
5      0    1    0    1      NH         Bus B* <- Bus A* <- NSHF HI, EXP, SGN (packed result)
6      0    1    1    0      PQ         Buses <- SALU and LSH if MUL
                                        Buses <- TEMP and LSH if DIV
                                        (LSH is accessed differently if MUL or DIV)
7      0    1    1    1      INTL       Bus A <- LSH
8      1    0    0    0      ID         Bus B* <- Bus A* <- ID bus
9      1    0    0    1      LR         Bus B* <- Bus A* <- LR
A      1    0    1    0      ID.RB      Bus A <- ID bus
                                        Bus B <- RB
B      1    0    1    1      R          Bus A <- RA
                                        Bus B <- RB
C      1    1    0    0      FAL.X      Bus A <- FALU HI/LO
                                        Bus B <- FALU LO/HI
D      1    1    0    1      FAL.LH     Bus A <- FALU LO
                                        Bus B <- FALU HI
E      1    1    1    0      FAL.HL     Bus A <- FALU HI
                                        Bus B <- FALU LO
F      1    1    1    1

*The same data is placed on both buses.

The buses handle both floating-point and integer numbers. The buses can handle intermediate, unpacked, and unnormalized data as well as final packed and normalized results. Since the buses must handle intermediate data, each bus contains two extra lines to handle the overflow and hidden bits. Refer to Figure 2-13 for a summary of the data formats used on the FP buses.
[Figure 2-13 FP Bus Formats (TK-0833). Panels: single precision (float) floating point format on either FP bus; double precision floating point format in AR format and in BR format across FP bus A and FP bus B; long word integer (MULL) format on either FP bus; and the MULL result on FP bus A, with the least significant half from the LSH register in the first cycle and the most significant half from SALU in the second cycle. Bus lines 33 and 32 carry the overflow and hidden bits.]

2.3.3 Fraction Adder (FAD)
The fraction adder aligns and adds or subtracts the fraction portions of two FPNs. The module contains two registers that receive data from the FP buses, two multiplexers that manipulate the register data, a shifter to align register contents before an add or subtract, an ALU to add or subtract the data, and bus drivers to place the result on the FP buses (Figure 2-14). Certain FAD signals are interfaced to the V bus for maintenance and diagnostic purposes. Refer to Paragraph 2.3.1 for a discussion of the V bus.
[Figure 2-14 Fraction Adder Block Diagram (TK-0268). Elements shown: AR and BR registers loaded from BUS FP A <33:00> and BUS FP B <33:00>; FAMX (larger number) and SHFMX (smaller number) multiplexers; SHFR right shifter with sign extension, controlled by SHF COUNT <5:0>; FALU with FALU FUNC SEL <2:0>; FALU output multiplexer (format select) controlled by BSC <3:0>.]

The fraction parts of the FPNs are loaded into the AR and BR registers. The data entry is controlled by the FADC (Fraction Processor Controls) control store field, as shown in Table 2-12. Both registers are loaded with the MSB in bit 63. The execution of the POLY instruction causes an additional 7 LSBs to be transmitted via FP bus A lines <14:08> (where the FPE is normally) and placed in AR <6:0> by loading ARX.

[Table 2-12 Fraction Data Entry. Columns: hex code (0 through 8); FADC field bits 3 through 0 (uCS 11, 10, 9, 8); load operations AR1, AR0, ARX, BR1, BR0. The individual bit entries are not legible in this copy.]

Select lines controlled by both microcode and hardware normally load the FPF associated with the smaller exponent into the SHFMX and the other fractional part into FAMX.

The contents of SHFMX are then right-shifted up to 63 bits to ensure that the radix points align. The magnitude of the exponent difference determines the amount of the shift. The shifted number is padded on the left with its sign. In most cases, the fraction is positive (Figure 2-15).

[Figure 2-15 SHFR Operation (TK-0275). Unaligned 64-bit data from SHFMX passes through three cascaded shift stages controlled by SHF COUNT <5:0> (the magnitude of the shift): SHFB shifts 0, 16, 32, or 48 places; SHFC shifts 0, 4, 8, or 12 places; the final stage shifts 0, 1, 2, or 3 places. Sign extension fills with 1s for a negative number and 0s for a positive number. The aligned data goes to FALU input B.]

When the two FPFs are aligned, the FALU operates on the two fractions. The FALU operation is determined by the op code and the signs of the two numbers. Refer to Table 2-13.

Table 2-13 FALU Operation

Instruction   Sign of Numbers        FALU Operation
Add           Like (both + or -)     Add
Add           Unlike                 Subtract
Subtract      Like                   Subtract
Subtract      Unlike                 Add

FALU Operations Selected

S2  S1  S0   Function   Comment
0   0   0    Clear
0   0   1    B-A        B = 0. Used for complementing a number when a double
                        precision shift/subtract would lose bits off the end.
                        Used when SUBD and exponent difference is greater
                        than 7, or POLYD.
0   1   0    A-B        Normal subtract
0   1   1    A+B        Normal add
1   0   0    Not used
1   0   1    A or B     Used to get A out or B out. Other side is zero.
1   1   0    Not used
1   1   1    Not used

The output of the FALU is loaded onto the FP buses under control of hardware and the BSC microcontrol field. Refer to Table 2-14. The result is in unnormalized form.

When a double precision ALU subtraction is done (as the result of an ADDD, SUBD, or POLY instruction), the exponent difference is examined. If it is less than or equal to 7, operation continues as usual. However, if the difference is 8 or more, an error would be introduced into the LSB if a shift followed by a subtract were done. To prevent this error, special control hardware is enabled. It disables the output of SHFMX, forcing zeros into the shifter. The smaller operand is routed through FAMX to the A side of the ALU. A B-A operation (B = all zeros) is done, complementing the operand. The larger operand remains stored in its original register. The result of the ALU operation is output to the FP buses and reloaded into the AR or BR, depending upon where it was before complementing. During the next machine state the complemented operand is aligned, sign-extended, and added to the other operand. The result is loaded onto the FP buses and is normalized.
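The alignment shift performed by SHFR (Figure 2-15) can be modelled in software as three cascaded stages, each driven by two bits of the 6-bit shift count. The sketch below is an illustration only: it assumes the stage granularities shown in the figure (multiples of 16, multiples of 4, and 0 to 3 places) and a 64-bit data path, and the function names are hypothetical.

```python
# Software model of a three-stage right shifter with sign extension.
# Stage sizes follow Figure 2-15; names are illustrative, not signal names.

WIDTH = 64
MASK = (1 << WIDTH) - 1


def shift_stage(value: int, places: int, negative: bool) -> int:
    """Shift right by `places`, filling vacated high bits with the sign."""
    value &= MASK
    if places == 0:
        return value
    fill = ((1 << places) - 1) << (WIDTH - places) if negative else 0
    return (value >> places) | fill


def shfr(fraction: int, shf_count: int, negative: bool = False) -> int:
    """Align a 64-bit fraction by a 6-bit shift count (0 through 63)."""
    if not 0 <= shf_count <= 63:
        raise ValueError("shift count must fit in 6 bits")
    coarse = ((shf_count >> 4) & 3) * 16   # 0, 16, 32, or 48 places
    middle = ((shf_count >> 2) & 3) * 4    # 0, 4, 8, or 12 places
    fine = shf_count & 3                   # 0, 1, 2, or 3 places
    for places in (coarse, middle, fine):
        fraction = shift_stage(fraction, places, negative)
    return fraction


if __name__ == "__main__":
    aligned = shfr(0x8000000000000000, 21)
    print(f"{aligned:016X}")
```

Splitting the count this way lets each stage be a simple 4-way selection, and the three stages together cover every shift from 0 to 63 places.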
Table 2-14 FALU MUX Control

       BSC Field
       3    2    1    0
       uCS  uCS  uCS  uCS
Hex    11   10   9    8      FALU MUX Control
0-B                          Not used for FALU MUX control
C      1    1    0    0      Hardware determined (see NOTE)
D      1    1    0    1      FP A <- FALU L, FP B <- FALU H (BR format)
E      1    1    1    0      FP A <- FALU H, FP B <- FALU L (AR format)

NOTE
During double precision add/subtract and poly: if EXP A < EXP B, AR format is used. If EXP B < EXP A, BR format is used.

2.3.4 Fraction Normalize/Divide (FNM)
The normalize/divide logic located on FNM performs the two functions indicated by its title. Refer to Figure 2-16. The hardware can either normalize the fractional result of an add, subtract, multiply, or divide, or generate the quotient given a divisor and dividend. The quotient is generated bit by bit and stored elsewhere. When the quotient is complete, it is returned to the same hardware to be normalized like any other fraction result. Both functions receive data based on microcontrol words but, once started, operate relatively free of microcode control until they are ready to transmit the answer.

[Figure 2-16 Fraction Normalizer/Divide Block Diagram (TK-0274). Elements shown: NR, RR, NALU, RNDMX, round bit generator, NMX, NSHF (controlled by SHF VAL), quotient bit stream output, BUS FP A <33:00>, BUS FP B <33:00>.]

2.3.4.1 Normalize Operation - Before a normalize operation can take place, the Remainder Register (RR) must be cleared. A 3 in the 3-bit MSC field of the microstore word clears it during IRD. Since the divide operations use the RR, it is also cleared at the end of the divide flows, before the normalization of the quotient.

The add, subtract, multiply, and divide operations produce results with varying characteristics. The add/subtract operation has the widest variability in result. Operand size (both fraction and exponent), operand sign, and desired operation all contribute to this variation.
The subtraction of two very nearly equal operands can result in a very small number, i.e., a number that must be shifted left many times before it is in final normalized form. Addition of two operands with equal exponents will produce a result between 1 and 2, necessitating a right shift. Since the add/subtract operations produce such a wide variability of results, special firmware in the control store is accessed and the normalizations proceed under firmware and hardware control. A divide operation produces results between 1/2 and 2. A multiply produces results between 1/4 and 1. Both divide and multiply normalizations proceed under hardware-only control.

All normalizations begin with NRC equal to 0, parallel-loading the result to be normalized into the NR. If the operation was an add/subtract, BEN 5 selects special firmware based on exponent differences. If the special firmware is enabled, an NRC equal to 2 enables the NR to shift left in 4-bit steps, 3 steps per machine cycle. Once the NR shift left is enabled, hardware looks at the top 12 bits of the NR for the first significant bit as the leading bits are left-shifted away. In a positive number, leading zeros are disregarded and the first significant bit is a 1. In negative numbers (2's complement notation), leading 1s are disregarded and the first significant bit is a 0 (refer to Figure 2-17). MSN NE SIGN becomes true as the data is parallel-loaded into NR if the first significant bit is in NR <63:60>. This stops any left shifts. STOP SHF goes high whenever NR <59:56> contains the first significant bit, and causes the NR to stop shifting after one more 4-bit shift (i.e., when the first significant bit is in NR <63:60>). If NR <63:52> does not contain the first significant bit, SWR remains low, shifting all 12 bits out and enabling a new microstore control word via BEN 2. The hardware continues monitoring for the first significant bit.
If the NR is left-shifted 60 bits (counted by the control store) and the first significant bit is not found, firmware returns a result of zero by forcing the output of the NMX to zero via FORCE ZERO.

[Figure 2-17 Normalize Shift Enable Control Hardware (TK-0272). Inputs: NR <63:52> and RES NEG. Outputs: SWR, MSN NE SIGN, STOP SHF. If the number is negative, leading 1s are disregarded; if positive, leading 0s are disregarded.]

When the first significant bit is in NR <63:60>, the number can be rounded and normalized by the remaining FNM logic. The round byte contents, NALU operation, and final normalization shift are controlled by the round bit generator (RBG). The round bit generator controls these functions based on NR 63, NR 62, NR 61, and RES NEG.

The round byte is combined with NR lines 39 through 36 (float or single precision) or lines 7 through 4 (double precision). This is selected via the FLOAT line. Since the final normalization shift takes place after the round byte is added, and the first significant bit (FSB) can be in NR 63, NR 62, NR 61, or NR 60 (it must be in one of these four positions), the position of the round bit (1) in the round byte varies (refer to Table 2-15). As summarized in the table, decode logic divides the 16 possible input cases into 4 cases, corresponding to the FSB in bit 63, 62, 61, or 60. Note that the RBG does not monitor NR bit 60 but, since the logic is only enabled when the FSB is in bits 63 through 60, the RBG logic can infer the contents of NR bit 60 even though it does not monitor it.

RES NEG L asserted (low) means that the number being shifted and normalized is negative. This means that leading 1s (Hs) should be disregarded in the search for the FSB and that the FSB will be a 0 (L). RES NEG L high indicates a positive number, disregard of leading 0s (Ls), and an FSB of 1 (H).

The contents of the rounding byte are based on the location of the FSB. The rounding byte is designed to place a one 24 bits (56 bits for double precision) behind the FSB.

Table 2-15 Round Byte and Normalize Control

1. The logic decodes the four signals and locates the FSB.

RES NEG L*   NR63   NR62   NR61   First Significant Bit (FSB)
L            L      L      L      63
L            L      L      H      63
L            L      H      L      63
L            L      H      H      63
L            H      L      L      62
L            H      L      H      62
L            H      H      L      61
L            H      H      H      60
H            L      L      L      60
H            L      L      H      61
H            L      H      L      62
H            L      H      H      62
H            H      L      L      63
H            H      L      H      63
H            H      H      L      63
H            H      H      H      63

*RES NEG L high indicates a positive number. This means a 1 (H) is the FSB. RES NEG L low indicates a negative number. This means a 0 (L) is the FSB. RES NEG L asserted also causes a NALU subtract, thereby rounding and complementing the number in a single step.

2. Based on the location of the FSB, an appropriate rounding byte is generated.

       Rounding Byte Selected
FSB    Bit 3   Bit 2   Bit 1   Bit 0
63     1       0       0       0
62     0       1       0       0
61     0       0       1       0
60     0       0       0       1

3. Also based on the location of the FSB, the final shift required to normalize and ready the result for the CPU is selected.

FSB    Shift Selected    SHF VAL 1   SHF VAL 0
63     Right 1 place     L           L
62     No shift          L           H
61     Left 1 place      H           L
60     Left 2 places     H           H

If the FSB is not in NR <63:60>, the NR is left-shifted and a binary counter counts each 4-bit shift. This count, the RES NEG line, and NR bits 63, 62, and 61 (magnitude of final shift) determine the NORM ROM location to be addressed. The content of this location is added to the exponent of the result in the EALU and corrects it for all shifts that take place in the FNM. If, however, the number to be rounded is all 1s, the addition of the rounding byte will ripple through all bits and cause a fraction overflow. This is sensed by comparing the round byte location (indicating where the logic decoded the current MSB of the number to be rounded) with the location of the MSB of the rounded result. If this comparison asserts NORM ERR and thus EALU CIN (indicating there was a ripple and subsequent overflow), a one is added in the EALU (the exponent adder on FCT) to correct the exponent for the overflow.
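The three decoding steps of Table 2-15 can be expressed compactly in software. The following sketch models only the truth tables described above (FSB location, rounding byte, and final shift selection); the function names are hypothetical, and signal levels are represented as 1 for H and 0 for L.

```python
# Model of the round bit generator decode described by Table 2-15.
# Levels: 1 = H, 0 = L. Function names are illustrative only.


def locate_fsb(res_neg_l: int, nr63: int, nr62: int, nr61: int) -> int:
    """Return the bit position (63 to 60) of the first significant bit.

    res_neg_l = 1 (high) means a positive number, whose FSB is a 1.
    res_neg_l = 0 (low) means a negative number, whose FSB is a 0.
    """
    fsb_level = 1 if res_neg_l else 0
    for position, level in ((63, nr63), (62, nr62), (61, nr61)):
        if level == fsb_level:
            return position
    return 60  # NR 60 is not monitored; it is the only position left


def rounding_byte(fsb: int) -> int:
    """Return the 4-bit rounding byte (bit 3 set for FSB 63, and so on)."""
    return 1 << (fsb - 60)


def final_shift(fsb: int) -> tuple:
    """Return (SHF VAL 1, SHF VAL 0, places); places > 0 means left."""
    places = 62 - fsb          # 63 -> right 1, 62 -> none, 61 -> left 1
    code = 63 - fsb            # 0..3 encoded on SHF VAL <1:0>
    return (code >> 1) & 1, code & 1, places


def round_bit_position(fsb: int, is_float: bool) -> int:
    """NR bit that receives the round bit (byte sits on 39:36 or 7:4)."""
    low_line = 36 if is_float else 4
    return low_line + (fsb - 60)


if __name__ == "__main__":
    for fsb in (63, 62, 61, 60):
        print(fsb, f"{rounding_byte(fsb):04b}", final_shift(fsb),
              fsb - round_bit_position(fsb, True),
              fsb - round_bit_position(fsb, False))
```

Running the loop shows that the round bit always lands 24 positions behind the FSB for float and 56 positions behind it for double precision, which is the property the rounding byte is designed to provide.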
NR <63:04> goes to the NALU B side and the round byte (4 bits) goes to the A side. Normally the NR is added to the rounding byte. However, if RES NEG L is asserted, indicating a negative (2's complement) number, the content of the NR is subtracted from the rounding byte. This operation rounds and complements (returns to positive notation) in one step.

The 60-bit result <63:04> of the NALU operation (rounded and ready to be normalized) is transmitted to the NMX. The high part (the only part, if float or single precision) is transmitted through to the NSHF for the final normalization shift. The NSHF shift control bits select a 0- to 3-bit shift for final normalization. Final normalization moves the MSB to the equivalent of the NR 62 position. When the data is placed on the FP buses, NR 62 (always a one, since the fraction is now normalized) is the hidden bit and is placed on FP bus A bit 32. When the data is transferred to the CPU, the hidden bit is not transferred and the data in NR 61 (bus A bit 6) is the MSB to be transferred.

2.3.4.2 Divide Operation - This logic also performs the fraction part of the divide operation for the FPA. Once the dividend and divisor are loaded into the FNM logic and the quotient storage on the multiplier boards is enabled for either a float (single) or double precision result, the divide operation runs under hardware control until the answer has been computed to the required precision. Once the answer has been computed, microcontrol takes over and transmits the unnormalized quotient back to the FNM logic, where it is normalized and rounded like any other fraction.

The hardware uses the restoring, repeated subtraction technique to divide. The dividend is initially loaded into the RR and the divisor is stored in the NR. The divisor (contents of NR) is subtracted from the dividend (contents of RR). If the result is negative, a 0 is left-shifted into the answer (quotient) register and the contents of the RR are left-shifted by one. If the result is positive or 0, a 1 is left-shifted into the answer (quotient) register, and the result is loaded into the remainder register left-shifted by one. The divisor (contents of NR) is continually subtracted from the contents of the RR until 26 bits (58 bits for double precision) of quotient are generated. The quotient is then rounded and normalized.

The division operands are loaded under microstore control. The first microstore state loads the dividend into the NR. The second state causes the NALU to OR the contents of the NR with the contents of the RR (currently clear) and load the result of the operation into the RR. In the same state the divisor is loaded into the NR. At the end of the second state the division operands are in their correct registers and the divide sequencer hardware takes over.

The divide sequencer hardware generates the RR control signals (refer to Figures 2-18 and 2-19). The RR CTL signals either load the NALU result into the RR or left-shift the RR contents, based on the result being negative or positive. The input of the RR is hardwired to automatically produce a left shift when loading the NALU result. This means that during the initial loading of the RR, the dividend is left-shifted by 1. The 11 state in Table 2-16 right-shifts the dividend by one to adjust for this before beginning the divide operation.
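The restoring, repeated-subtraction technique described above can be illustrated with a short software model. This is a sketch of the arithmetic only, with hypothetical names: it treats the fractions as plain integers, generates one quotient bit per iteration, and does not model the hardwired left shift on RR load or the compensating right shift of the 11 state.

```python
# Software model of restoring division by repeated subtraction.
# rr and nr stand in for the Remainder Register and the NR (divisor).


def restoring_divide(dividend: int, divisor: int, bits: int) -> int:
    """Generate `bits` quotient bits, most significant first.

    The first bit has weight 1, the next 1/2, and so on, so the returned
    integer equals floor(dividend * 2**(bits - 1) / divisor), provided
    that 0 <= dividend < 2 * divisor.
    """
    if divisor <= 0 or not 0 <= dividend < 2 * divisor:
        raise ValueError("operands out of range for this sketch")
    rr = dividend
    nr = divisor
    quotient = 0
    for _ in range(bits):
        difference = rr - nr
        if difference < 0:
            # Negative result: shift in a 0, keep (restore) RR, shift left.
            quotient = quotient << 1
            rr = rr << 1
        else:
            # Positive or zero: shift in a 1, load the difference shifted.
            quotient = (quotient << 1) | 1
            rr = difference << 1
    return quotient


if __name__ == "__main__":
    # 3/4 with four quotient bits gives 0110, i.e. 0.110 binary = 0.75.
    print(f"{restoring_divide(3, 4, 4):04b}")
    # 26 quotient bits, as generated for a float divide.
    print(f"{restoring_divide(5, 7, 26):026b}")
```

Because a failed subtraction simply leaves the RR contents in place before the shift, no separate add-back step is needed, which is why the hardware only has to choose between loading the NALU result and shifting the RR.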
[Figure 2-18 Divide Sequence Hardware (TK-0270). Signals shown: DIV DONE H, INIT, RES POS H, RR CTL 1, RR CTL 0, and the 100 ns RR CLK. Refer to Table 2-16 for the divide sequence states.]

[Figure 2-19 Divide Sequence Timing (TK-0516). Shows the 200 ns CPU and FPA clock, the 100 ns RR clock, and the sequencer flip-flop outputs stepping through states 00, 00, 01 (uword = LD RR), 11 (RR right shift), and 10 (divide, repeated).]

[Table 2-16 Divide Sequence States. Columns: state (A, B), input, next state (A, B), FNM function, RR CTL 1 and 0, RR function. FNM functions listed: NOP, LD RR, Shift R*, Divide. RR functions listed: NOP, Parallel LD**, Shift R*, Parallel LD Result**, Shift L RR Contents. The individual row entries are not legible in this copy.]

*Used only once at the beginning of each divide.
+Control bit 0 is controlled by RES POS H.
**Since the RR is hardwired for a left shift, a parallel load shifts the data one place left.

The answer is generated at the rate of one bit per 100 ns. If the result of the NALU subtract is positive or zero, a 1 is left-shifted into the quotient register. A negative NALU result causes a 0 to be shifted into the quotient register. The quotient register is made of two multiplier registers (TEMP and LSH). In single (float) precision the quotient bit stream is shifted into TEMP (only TEMP <29:4> is used). In double precision the bit stream shifts into LSH <31:4>, then into TEMP <29:00>. When a 1 is left-shifted into TEMP 29 or 28 on the proper time phase in the multiplier logic, DIV DONE is asserted. This stops the division and accesses a new microstore word that normalizes and rounds the quotient.

2.3.5 Fraction Multiplier (FML and FMH)
The fraction multiplier hardware in the FPA is located on two modules, FMH (Fraction Multiplier High) and FML (Fraction Multiplier Low).
They handle all fraction multiply functions and part of the EMOD function, and also store the division quotient as it is generated. The fraction multiplier accepts data from the FP buses, performs the required unsigned multiplication, and gates the results back onto the FP buses. Refer to Figure 2-20.

Figure 2-20 Fraction Multiplier Block Diagram

The FPA microcontrol controls the loading of both the multiplicand and multiplier into the appropriate FM (fraction multiplier) registers. In both float and double the complete multiplier is stored on the FMH. During the single precision (float) function, the FMH handles the upper 16 bits of the multiplicand, the FML the lower 8 bits, and the answer is completed after one pass through the logic. For double precision (56 bits) the upper half of the multiplicand fraction is handled in the FMH and the lower half is handled in the FML. Two passes are required to compute the final answer.

The FM multiplies under its own control logic. After the operands are loaded, the MCTL field in the FPA microcontrol is asserted; this starts the multiplication. A float multiply is stopped by the microcode two states (400 ns) after it starts. For a double multiply, control goes to a wait state and remains at that location until MUL/DIV DONE is enabled, indicating that the FM logic has finished the operation. At this point microstore control takes over and the answer is transmitted to the normalize logic or, in the case of EMOD or MULL, transmitted to the CPU as an unnormalized number.

In order to obtain fast multiplication, a pipeline technique is used (Figure 2-21).
The multiplier is divided into 4-bit nibbles. The nibbles are then accessed consecutively by a counter-multiplexer combination (least significant nibble first) and each nibble operates on up to 32 bits of multiplicand. The MCAND bus and MPLIER nibbles are used to address the ROMs. The banks of ROMs provide a 4 X 4 primitive with 2-way interleaving. The data is latched (ROM STORE) and applied to the inputs of 4-bit adders (PALU). These adders combine the ROM data to form a partial product, storing the carry-out of each 4-bit section to be added in on the next cycle. The partial product is latched in PPROD and passed to another row of adders (AALU) which accumulate the final product, again saving the carries. Thus, when the pipeline is operating, there are four processes cycling at the same time:

Select ROM addresses
Latch ROM data
Form partial product
Accumulate final product.

After the final product is calculated, the stored carries from both stages are combined with the accumulated product using full carry look-ahead to produce the final answer in a single precision (float) operation. In double precision, this result is stored and used during the generation of the final answer during the second pass. Each of the pipeline processes, with the exception of accessing ROM data (which occurs in each bank of ROMs every 100 ns), occurs at 50 ns intervals.

The operation of the FM hardware is discussed in three sections. The first section explains the operation of the pipeline, concentrating on operand loading and manipulation of partial products, partial results, and carries to produce the final answer. The second section concentrates on the control logic and how the signals that control the pipeline are generated. The third, and shortest, section explains how the FM registers are used to accumulate the quotient during a divide operation.

2.3.5.1 The Pipeline

Loading the Operands
The multiplication process begins with the loading of the operands.
As discussed in Paragraphs 2.1 and 2.3.2, data is transferred along the FPA buses in several formats. The multiplicand loading logic sorts out these formats and loads the multiplicand register (MC0, MC1, and MC1L) so that when the MCAND bus does a parallel access of the MCAND, the MSB of the multiplicand is always in MCAND bus bit 31, and each following bit is progressively less significant (Figures 2-22 and 2-23).

Figure 2-21 The Pipeline
Multiplier (MP) and multiplicand (MCAND) addressing: both multiplier and multiplicand are divided into 4-bit nibbles. The multiplier nibbles are accessed individually (least significant nibble first) and are used with all multiplicand nibbles to generate ROM addresses.

Figure 2-22 Loading and Accessing the Multiplicand
Access codes:
1 - First half of EMODD or MULD
2 - Second half of EMODD or MULD
F - EMODF or MULF
I - MULL (integer multiply)
*This 8-bit register is also called EMOD extension and MCX.

Figure 2-23 Loading and Accessing the Multiplier

The multiplier, up to 56 bits (14 nibbles) long, is loaded into MP1 and MP0 on FMH. MP1 is 24 bits (6 nibbles) long and MP0 is 32 bits (8 nibbles) long. Unlike the multiplicand, the multiplier is loaded in one format only (Figure 2-23). The MSB is in MP1-23 and each following bit is progressively less significant. The LSB is MP1-00 for single precision (float) or MP0-00 for double precision. The single format is possible because, as stated before, the multiplier is used consecutively; the various formats are sorted out by the counter as the nibbles are used during the multiplication.

Selecting the Multiplicand
The operands, multiplicand and multiplier, are enabled onto their respective buses, MCAND BUS and MPLIER BUS, under control of operand bus source logic. Refer to Figures 2-22 and 2-23 and Table 2-17. All 32 lines of the MCAND bus are enabled every time. During a MULF and EMOD and for the first pass of a MULD and EMODD, the MCAND bus accesses MCX. Both MULF and MULD (first pass) use only the top 24 bits, as the lower 8 are discarded later in the pipeline.

The MPLIER BUS multiplexer begins by selecting the least significant byte of the multiplier.
Interleaving hardware later selects the high or low nibble of the bus. The mux then selects a new, progressively more significant byte each 100 ns.

Selecting ROM Address - The Interleave Hardware
Both the MCAND and MPLIER buses are divided into 4-bit nibbles for ROM addressing. Each MCAND nibble (8 nibbles) is combined with a MPLIER nibble to provide address bits for 16 4x4 look-up ROMs. Rather than compute the product of the two 4-bit nibbles, the fraction multiply hardware uses look-up ROMs in which the multiply results are stored. The data is stored within the ROMs such that the content of the address accessed by the two nibbles is the 8-bit result of a multiply of the same two nibbles.

Since the ROMs are relatively slow, the 16 ROMs are divided into two interleaved 8-ROM banks. One bank is accessed by the low MPLIER nibble (MP 3:0), the other by the high MPLIER nibble (MP 7:4). Both ROM banks are addressed on 100 ns cycles; the MP low bank is first, and the MP high bank is second, trailing by 50 ns. The addressing of a ROM bank ends the first part of the pipe.

Latch the ROM Data
The second part of the pipe selects the outputs from either of the ROM banks, using the ROM SEL MUX, and latches the data (64 bits) in ROM STRG. It alternately selects data from the low and high ROM banks on a 50 ns cycle. While the selected ROM data is being latched, the first part of the pipe is selecting a new address for the ROM bank just selected. The output of the other ROM bank will be selected during the next cycle (50 ns in the future). The address lines of this ROM bank were changed 50 ns ago and the outputs are settling.

Form Partial Product
The outputs of ROM STRG and any carrys from the previous PALU add are added to form the partial product. The PALU is eight 4-bit adders. The outputs of the ROM STRG are wired to the PALU adder inputs such that bits of equal significance are combined. The outputs of the PALU without carrys are stored in the PPROD LATCH.
The carrys are stored in CARRY HOLD registers to be added in on the next PALU add. The latching of the partial products in the PPROD LATCH ends the third part of the pipeline.

As indicated previously, each multiply cycle selects 4 new bits from the multiplier register and each 4 new bits are 4 positions more significant. This means that the input of the PALU add becomes 4 bits more significant each multiply cycle. Because of the increase in significance, the stored carry-out of each PALU adder is input, on the next cycle, to the carry-in of the same PALU adder rather than the carry-in of the next PALU adder.

Table 2-17 Operand Bus Source

Operation             DOUBLE  TTH  OPC7  MPLIER BUS Nibble Select
EMODF or MULF         L       X    L     Start at A, do 6 nibbles
MULL (integer mul)    X       X    H     Start at 6, do 4, then start at 2, do 4
EMODD or MULD
  1st pass            H       H    L     Start at 2, do 14
  2nd pass            H       L    L     Start at 2, do 14

Remaining columns: MCAND Bus Load Enable* (MC1, MC1L, MC0, MCINT, MCX); MCAND Bus lines fed (31-8, 7-0).
*MCAND Bus lines are low enabled.

Note that while the third part of the pipeline is operating, new ROM data is being placed in ROM STRG to be presented to the PALU inputs on the next cycle, and new ROM addresses are being generated to access new data.

Accumulate Result
The fourth and final section, the AALU and associated accumulator (ACCM), adds the partial products computed in the previous pipeline section to the result stored in the ACCM, including stored carries from the previous AALU cycle, and latches the result into the ACCM and LSH register. The AALU, ACCM, and ALU carry-hold interconnections automatically shift the ACCM content and ALU carry-hold content to adjust for the 4-bit increase of each new partial product.
Because each partial product input to the AALU is 4 bits more significant than the previously stored ACCM content, the outputs of the ACCM are wired to shift the ACCM content 4 bits right (a decrease in significance) before being added to the PPROD LATCH content. The lower 4 bits of the AALU output are always right-shifted into the LSH register. In double precision operations, the content of this register is the least significant half of the result.

As with the PALU carrys, the carry-out of each AALU is stored and added in on the next cycle. Also similar to the PALU logic, the stored carrys are added to the AALU adder that generated them because the content of the AALU is now 4 bits more significant than when the stored carrys were generated. The latching of the accumulating final result in the ACCM ends the fourth pipeline section.

The 4 sections of the pipeline continue to operate until stopped by the FM control logic. The stopping point is selected based on both function and precision.

SALU Operation
When stop is initiated, the whole pipeline stops and new logic, the SALU, is accessed, which adds the two sets of stored carrys still in the pipeline to the total product on the output of the AALU. When a pipeline stop is initiated, the AALU output (SALU input) is the contents of ACCM plus the current PPROD. Both the ACCM plus PPROD addition (the AALU operation) and the PPROD-forming addition (the PALU operation) form stored carrys. The hard-wired 2-bit shift in the PPROD LATCH input is not part of the several 4-bit shifts that take place throughout the FM logic, but rather formats the stored carrys so they may be easily combined for a final answer in the SALU. Both the PALU and AALU are composed of 4-bit adders with carry-outs.
This means that the carry-outs are generated every 4 bits and that the PALU and AALU stored carry-outs can be treated as numbers of the following format:

X000X000X

X is a stored carry (data bit)
0 is a zero (non-significant bit)

Conventional wiring (output of a 4-bit PALU adder to input of a 4-bit PPROD LATCH to a 4-bit AALU adder) would cause the data bits of the PALU stored-carry to line up (be of equal significance) with the AALU stored-carry. This would prevent the PALU stored-carrys, the AALU stored-carrys, and the ACCM result from being combined in one operation in one adder (the SALU). However, wiring the PPROD LATCH input and outputs with a 2-place shift generates a PALU stored-carry number with data bits of significance between the AALU stored-carry data bits. This shift allows both AALU and PALU stored-carry numbers to be input to one side of the SALU, since the data bit of the PALU stored-carry is always a non-significant bit of the AALU stored-carry and vice versa. Refer to Figure 2-24.

Figure 2-24 SALU Operation - Adding the Stored Carrys

The use of the SALU result is determined by the operation and its precision. If the SALU result is the final answer, the result is transferred to the FP buses under both op code control and FPA microcontrol. If, however, the operation is double precision, the result is stored and then shifted to format it for later operations under FM logic control. Before the shift, the most significant half of the operation is in TEMP, the least significant half in LSH. The shift transfers the contents of LSH (the least significant half) to the ACCM register, which is designated ACCM 14 at this time, and transfers the most significant half from TEMP to the (just vacated) LSH.
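The reason for the 2-place shift in the stored-carry wiring can be checked with a small bit-level sketch. This Python is illustrative only: the helper names `carry_word` and `merge_carries` are hypothetical, and the absolute bit positions (carries at bits 0, 4, 8, ... for the AALU and 2, 6, 10, ... for the PALU) are assumptions chosen to show the interleaving, not the actual significance used in the FM logic.

```python
def carry_word(carries, offset):
    """Build an X000X000... word: one stored carry every 4 bits, starting at `offset`."""
    word = 0
    for i, carry in enumerate(carries):
        word |= (carry & 1) << (4 * i + offset)
    return word

def merge_carries(aalu_word, palu_word):
    """Merge two carry words onto one adder input; valid only if no bits coincide."""
    if aalu_word & palu_word:
        raise ValueError("carry data bits line up; they cannot share one input")
    return aalu_word | palu_word

aalu_carries = carry_word([1, 0, 1, 1, 0, 1, 0, 1], offset=0)
palu_aligned = carry_word([1, 1, 0, 1, 0, 0, 1, 1], offset=0)  # conventional wiring
palu_shifted = carry_word([1, 1, 0, 1, 0, 0, 1, 1], offset=2)  # 2-place shift
```

With the shifted PALU word, `merge_carries(aalu_carries, palu_shifted)` equals the arithmetic sum of the two words, so a single adder input can carry both sets of stored carrys. With the aligned word the data bits coincide and the merge is rejected.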
For the second pass, the second half (the more significant half) of the multiplicand is accessed from registers MC1 and MC1L, and logic enabled only during the second pass combines the data transferred to LSH from TEMP with the new result being accumulated. Otherwise, the operation of the pipeline during the second pass is the same as during the first pass.

2.3.5.2 FM Control - The fraction multiplier logic is hardware rather than firmware controlled. Four state bits select one of 13 function states that control the FM logic. Within each state, the state bits, various internal flags, and various flags from other FPA logic are combined to provide the control signals needed to implement the selected state's functions (Figure 2-25 and Table 2-18).

Figure 2-25 FM Control States

Table 2-18 FM Control States (state variables X3 X2 X1 X0)

INIT (0000): Result of MINIT signal from microcode. Prepares MPLIER nibble select counter for MULF sequence.
SYNC: Provides synchronization between the multiplier's 50 ns clock and the microcode's 200 ns clock.
CONT: NOP if MULF or EMODF; loads MPLIER nibble select counter if MULL, MULD, or EMODD.
TEST: Tests op code for first execution state calculation. Next state: INIT (0000) if MCNT is not asserted; else NOP (1000) if DIV; else WAIT (1100) if DBL or INT; else PIPE (0100).
NOP (1000): Waits for first quotient bit to be formed in the NALU. Next state: NOP until the nibble counter is even, then DIV (1011).
DIV (1011): Shifts LSH and TEMP left one bit to accept quotient bits in divide. Next state: DIV until DIV DONE, then DONE (1111).
WAIT (1100): Clears data path and carry registers for MULD, EMODD, and MULL. Waits for first ROM look-up. Next state: WAIT until FLAG is set; then PIPE (0100) if DBL, else MULL (1110).
MULL (1110): Runs multiplier pipe for MULL. Next state: MULL until COUNT = 3, then DONE (1111).
PIPE (0100): Runs multiplier pipe for floating-point multiply operations. Adds LSH's 4 LSBs to ACCM's 4 MSBs if second time through a double multiply. Next state: ADDZ (0101) if SHF ZERO and DBL with FLAG clear; else CADD (0111) if D1; else PIPE.
CADD (0111): Stops pipe to add final stored carrys to final accumulation. Loads TEMP. Next state: CADD until D4; then XFER (0110) if FLAG, else DONE (1111).
XFER (0110): Shifts ACCM, TEMP, and LSH right to transfer TEMP and LSH to LSH and ACCM. Next state: XFER until D8, then PIPE (0100).
ADDZ (0101): Adds zeros to ACCM's 4 MSBs. Next state: ADDZ until D1, then CADD (0111).
DONE (1111): Stops all registers from changing to allow NR or CPU register to accept final result.

The states can be roughly divided into four groups:

IRD
Integer Multiply
Fraction Multiply
Divide.

This section will discuss the states by groups and in the previously shown order. Within each discussion, the states will be discussed in the order they are accessed within the group. This is important because the function of some states is partially dependent on the previous state. The state of the logic is defined by the output of the PRESENT STATE register, which is clocked on a 50 ns cycle. The inputs to this register (the next state) are based on the current state and internal and external flags. A majority of the internal flags provide sequence information and are generated in the logic shown in Figure 2-26.

IRD Group (Instruction Register Decode)
When the FM logic is not performing a multiply or divide operation, it is in IRD. While waiting, the logic is continually cycling through the 4 states in this group, preparing the FM logic for a multiply. In this IRD group the op codes in the instruction buffer are monitored. Initially (in INIT), the FM logic is set up for a MULF, but if the op codes indicate either a MULL, MULD, or EMODD, new information is loaded into the FM logic in the CONT state. The FPA microcontrol will be loading the MPLIER and MCAND registers during IRD if the op codes indicate a multiply operation.

The control logic enters INIT whenever the Multiplier Operand Control (OPLD) field in the FPA microcontrol store is F. This normally happens during the FPA IRD or when a multiply operation is finished. The SYNC state is entered at CPU T50 and synchronizes the FM clock with the CPU clock. It also clears FLAG.
CONT is entered at T100 and loads new information if the op codes indicate a MULL, MULD, or EMODD. TEST is entered at T150. In TEST, if the MCNT bit in the FPA microcode is not asserted, indicating that the FPA does not want the multiply pipeline to begin, the FM returns to the INIT state and continues waiting. If, however, MCNT is asserted, indicating that the multiplier operands are loaded and the FPA wants a multiply to start, the correct execution state is selected based on the op code. Refer to Table 2-18 for a summary of IRD group functions.

Multiply Float Path
If the op code indicates a MULF, the PIPE state is selected and the multiplier pipe can continue. Note that during INIT the nibble counter was loaded with MULF control data so that ROM look-up could begin based on that data. Since a MULF is being done, the data in the beginning of the pipe is correct. The logic remains in this state (PIPE), running the pipe and accumulating the answer, until D1, a timing signal, is asserted. When D1 is asserted, the current content of the PPROD plus ACCM plus the stored-carrys is the final correct answer. Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the SALU add of stored-carrys to the AALU content. CADD also latches the SALU result into TEMP. The FM logic remains in CADD 150 ns (until D4 is asserted).

Since FLAG was cleared during the IRD group and never set, it is clear, and asserting D4 initiates the DONE state. This state asserts MUL/DIV DN and NOPs all other FM logic. MUL/DIV DN, monitored by the FPA control logic, returns control to the FPA microcontrol. It is the FPA control store that selects the MULF result, via a multiplexer, directly from the SALU outputs rather than from TEMP. The FM logic will remain in DONE until returned to INIT by the multiplier INIT code in the multiplier operand control field of the FPA microcontrol store. Refer to Figure 2-27 for a summary of MULF control.
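The arithmetic performed by the pipe during a MULF (ROM look-up per nibble pair, a partial product per multiplier nibble, and an accumulator that moves 4 bits right each cycle while its low nibble passes to LSH) can be sketched as follows. This is a simplified software model under stated assumptions: the names `ROM`, `partial_product`, and `nibble_multiply` are hypothetical, the two interleaved ROM banks and the 50 ns timing are not modeled, and Python integer addition stands in for the PALU, AALU, and SALU with their stored carrys.

```python
# Stand-in for the 4x4 look-up ROMs: ROM[a][b] holds the 8-bit product a * b.
ROM = [[a * b for b in range(16)] for a in range(16)]

def partial_product(mcand: int, mp_nibble: int, mcand_nibbles: int = 8) -> int:
    """Combine the ROM outputs for one multiplier nibble and every multiplicand nibble."""
    pp = 0
    for i in range(mcand_nibbles):
        mc_nibble = (mcand >> (4 * i)) & 0xF
        pp += ROM[mc_nibble][mp_nibble] << (4 * i)  # bits of equal significance combine
    return pp

def nibble_multiply(mcand: int, mplier: int, mp_nibbles: int = 6):
    """Return (final, lsh): the last accumulation and the nibbles shifted out below it."""
    accm = 0
    lsh = 0
    for n in range(mp_nibbles):                     # least significant nibble first
        mp_nibble = (mplier >> (4 * n)) & 0xF
        total = accm + partial_product(mcand, mp_nibble)
        if n < mp_nibbles - 1:
            lsh |= (total & 0xF) << (4 * n)         # low 4 bits pass to LSH
            accm = total >> 4                       # accumulator moves 4 bits right
        else:
            accm = total                            # last cycle: result of the final add
    return accm, lsh
```

For a 24-bit multiplier (6 nibbles), the full product in this model is `(final << 20) | lsh`, since five low nibbles have been shifted out by the time the last partial product is added.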
Figure 2-26 FM Control Logic
Note: This figure shows only general signal flow. All items shown have numerous other outputs and interconnections.

Figure 2-27 MULF Control
After each addition of the partial product and accumulator contents, the 4 least significant bits of the result are loaded into the LSH register.

MULD Path
If, when the FM control logic is in TEST, the op codes indicate a double precision multiply (DOUBLE set), the WAIT state will be entered.
Initially (in INIT) the nibble counter was loaded for MULF and ROM look-up began; then in CONT (100 ns later), when a MULD was decoded, new data was loaded into the nibble counter. The WAIT state waits for the data loaded in CONT to settle and access new ROM locations before beginning the pipe. After 100 ns in this state FLAG is set. In this context, FLAG set indicates the first pass in a double precision multiply. After 150 ns, since both DOUBLE and FLAG are set, PIPE is entered.

The logic remains in the PIPE state, running the pipe and accumulating the answer, until D1, a timing signal, is asserted. When D1 is asserted, the current content of ACCM plus the two sets of stored-carrys is the first half of the MULD partial product. Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the SALU add of stored-carrys and the ACCM content. CADD latches the upper 32 bits of the first half of the MULD partial product in TEMP. The lower 32 bits have been accumulating in LSH during the pipeline operation. The FM logic remains in CADD 150 ns (until D4 is asserted).

Since FLAG is asserted, indicating first pass, asserting D4 selects the XFER state. Four cycles in the XFER state transfer the content of TEMP and LSH to LSH and ACCM (refer to Figure 2-28), clear FLAG, and clear the stored-carry registers. The assertion of D8 returns the FM logic to PIPE.

With FLAG now cleared and DOUBLE set, ALU ADD is asserted. This signal causes the data stored in LSH during the XFER state to be added in (4 bits per cycle) to the final product being developed. Six cycles transfer all 24 bits stored during XFER. While these bits are being right-shifted from the right end of LSH into the MSBs of the developing final product, the LSBs of the developing final product are being right-shifted into the left end of the LSH. When 20 bits have been transferred in from LSH, SHF ZERO is enabled. This causes the logic to enter the ADDZ state.
The final 4-bit transfer of LSH data takes place during the first ADDZ state. After that the ALU that added LSH to the ACCM is disabled. During this state, the pipe continues functioning and the LSBs of the accumulating final product are still shifted into the left end of LSH. The only difference between PIPE and ADDZ during this second pass is, in PIPE, LSH data bits are added into the MSB of the ACCM, and, in ADDZ, zeros are added. Note this state even has the same ending criterion as PIPE, namely D1 asserted. D1 asserted transfers control to the CADD state. As discussedin MULF path, CADD is entered when the ACCM plus the two sets of stored-carrys is the final answer. In CADD the stored-carrys are added to the AALU content by SALU and the result is latched into TEMP. Since FLAG is now clear the assertion of D4 causes a transfer to DONE. In DONE, MUL /DIV DONE is asserted. This causes the FPA microcode to select and transfer, via multiplexers, the upper 32 bits of the double precision result from the SALU onto FP bus A and the lower 32 bits from the LSH register onto FP bus B. Refer to Figure 2-29 for a summary of MULD control. MULL Path If the op code being monitored during CONT decodes as MULL, new data is loaded into the nibble counter. The logic proceeds to TEST and, in TEST, selects the WAIT as the first execution state because INT (meaning integer) is set. 2-63 31 BEFORE XFER —=0 TEMP 31e— AFTER [ AeIS YAAX YL v9-C 8z-¢ 2Indiyg xFer | NOT 123 ~0 8 7e—=0 —+(0 Usep | 31 LSH 1 1 NOT USED 31— —* 0 0 LSH 8l 7e—e0 ACCM 14 THE XFER TEMP 28 24 RIGHT 20 16 12 8 [ |Q I IQ | 1& | IQ | |Q I I< N LsH : 28 Sy 24 . N N\ . N . 
NSNS 20 12 JIIIllllllllllllllll a III N\ A 8 SHIFT 4x RIGHT SHIFT lNlll4lllO4x ACCM \X\x DATA FLOW T 12 | RIGHT 8 4 SHIFT 4 X TK-0273 MULD TIMING T0 T0 T0 TO |RD—————»}¢———— MCONT———» ¢ TO TO TO —»{50 NSje— 1 I MUL CLK _I l_ MUL STATE | INIT |SYNC|CONT| TEST | WAIT |WAIT|WAIT| (FMHM)FLAG | X | 0 [ o | o PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE | PIPE |CADD|CADD|CADD MULNIBBLECNTR | x | A | B8 | 2 BANK A MP <3:0> CONTENTS OF ROM STRG . | 4 [ 5 | 6 MP MP 71! MP 19:16 8| 9| MP 27:24 MP Aa]|B|c | Db]|eE MP MP MP 35:32 MP 43:40 MP MP F 51:48 | o] MP MP 1 59:56 2 | 3| 4| 5| 2 MP 11 2 l PP1 | PP2 | PP3 | PP4 | PPS | PP6 | PP7 | PP8 | PP9 [ PP10| PP11|PP12|PP13|PP14 | x | st | st |{st|s.|[w|w|w|{w|w)||w|w|L|Lw/|fL]|Lw]|fL]|L,/ 1 | x | x Lb|LD]|LD]|NOP|NOP ACCM| ACCM{ACCM|ACCM|ACCM|ACCM|ACCM|ACCM|ACCM|ACCM|ACCM|ACCM|ACCM CONTENTS OF ACCM 2 | 3| 4| 5| 6| 7| 8| 9| 1011]12]13 |{st|st]st|st|w|w}|w}w|to|w|w|w]|L}|L]LW|fL]|LD|LD]|LD| to LSH ||| LD NOP|NOP w|w|w||Lw]|LW|LW]|L]|L,|LD|NOP|LD Lo | LD TEMP T 1 15:12 23:20 31:28 39:36 47:44 55:52 63:60 MCc | MC | mc | MC | mc | Mc |[MC | MC | MC| MC | MC | MC | MC | MC zo0|vyo|xo|wofvOo|luo0|TO|S0|RO|QO|PO|OO|N-O|MO CONTENTS OF PPROD accyetk | 3 11:08 CLR ppcctL 1 1 po | b1 | D2 | D3 | D4 (FMHM) “D" TIMING BANK B MP <7:4> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | 0 | o [ 1 | I , ALU ADD 3 N MUL DIV DONE TK-0530 Figure 2-29 MULD Control (Sheet 1 of 3) 2-65 MULD TIMING T0 T0 TO T0 T0 T0 TO MUL CLK MUL STATE (FMHM) FLAG XFER | XFER | XFER 1 o| PIPE | PIPE | PIPE | PIPE | PIPE o] o o (FMHM) “D” TIMING D5 | D6 | D7 | D8 MUL NIBBLE CNTR 3 | 4|5 P |6 7 19:16 M 15:12 BANK B MP <7:4> 8 MP 11:08 BANK A MP <3:0> | o o | 9 MP o | A B | o] | c MP 27:24 MP 23:20 mc CONTENTS OF ROM STRG | o |ADDZ|ADDZ|ADDZ|ADDZ|ADDZ| ADDZ|ADDZ|ADDZ |CADD|CADD|CADD|DONE|DONE|DONE|DONE 35:32 MP 31:28 o ol | D MP E F T s R 1 ool po | b1 2 3 | D2 ofjo]o]|]o]|o}fo | D3 | 4|5 | D4 | D5 | 2] 3|4 ]| 51|66 59:56 MP 55:52 | Mc | mc | Mc | MC 
| MmC | MC | MC | MC | z1 | Y1 | x1|warlver{u-t| o MP 51:48 MP 47:44 | o] | o P 43:40 MP 39:36 oo mC | |1 MP 63:60 MC MC P01 [N M CLR CONTENTS OF PPROD PPC CTL PP15 | PP16 | PP17|PP18 | PP19 | PP20 | PP21 | PP22 | PP23 | PP24 | PP25 | PP26]| st | st CONTENTS OF ACCM PP27 | PP28 |st|{st|]w]|w|w|w|wjw|wb|w]|L|Lb|w]|L]|Lb|LD ACCM|ACCM|ACCM|[ACCM |ACCM|ACCM |ACCM[ ACCM|ACCM|ACCM (ACCM] ACCM |ACCM 15 | 16 | 17 | 18 | 19| 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 ACCY CTL NoP| st [ st | st|sL|w}|w|b|w]|to|w|{w|L|w]| || LSH NOP| SR | SR | SR| LD TEMP Lb | SR| sR| SR | SR | D | D | LD| D | D | LD | D| D |LD| |[LD| | LD |NOP|NOP Ldb|NOP|NOP | LD SR TTH ALU ADD MUL DIV DONE MUL DONE | TK-0531 Figure 2-29 MULD Control (Sheet 2 of 3) 2-66 MULD OPERANDS MINJo]lPr|la]lr]s]|T = v|w] : MP1 vl (€ 30 € 199YS) [o13U0D ATNN 6Z-7 2In3L] v | z MPO MC1.MC1L L9T x| |568ITMPLER E’ MCO 0's 32 24 SECOND HALF FIRST HALF | MCAND »>ie—8—» MULD RESULT ACCUMULATION 31— TEMP ACCM 1 =[ PP1 + ACCMO |-+{LSH| - accm2 = =1 pP2 + ACCM1 |+f LsH | | L 0|31 «—» 0 : LSH 1 v23 i 0131 owL | RESULT OF TRANSFER ACCM14 LSH ACCM 3 =| pP3 + ACCM2 |-+ LSH | ACCM4 =| LSH | L LSH Tor ) | s LsH ] LSH LisH | LsSH LSH | H L] pp4 + ACCM3 | accm s =[5 7 Acova Jeof AccM 6 =[ Pps + ACCMS |+ ACCM 7 =| PP7 + ACCM6 |4 ACCM 8 =| Ppg + ACCM7 |-»f ACCM 9 =| ACCM 10 ={ ppPg + ACCM8 PP10 + ACCMI |-»] LSH |-+ ACCM 12 =| PP12 + ACCM11 |-»f ACCM 13= | PP13 + ACCM12 |-of sh LSH LSH | }——] [ — [ — [ SALU = PP14 PLUS ACCM B P+LUS CARRYS FROM PP14 AND ACCM13 TEMP e LSH o |-+ ] Accmis = pp15 + Accmia Accmis psiLsH] ACCM16 AccMi6 = = PP16 PP16 ++ ACCM15 AC | | Accmis | | Accmas Accm2e Accm27 |+ == L LSH ]Accmzo LsH | Accmz1 = PP21 + ACCM20 LSH | Accm22 LSH LSH |+ 1L ] Accmie = PP19 + AcCM18 e |+ sH LSH LSH 1sH | Accmis = pPis + Accmi? 
[Figure 2-29 MULD Control (Sheet 3 of 3; TK-0532): MULD operands and MULD result accumulation. Each step forms ACCM n = PP n + ACCM n-1, with the low-order bits shifted into LSH. First half partial product: SALU = PP14 plus ACCM13 plus carries from PP14 and ACCM13. Final product: SALU = PP28 plus ACCM27 plus carries from PP28 and ACCM27.]

In WAIT, the ROM data selected by the new ROM address (accessed as a result of the new data loaded into the nibble counter during CONT) is given time to settle before entering the pipeline. When FLAG is set, the data has settled and the integer multiply pipeline state (MULL) is entered. The FM logic remains in the MULL state as the pipeline accumulates the final product (the least significant half accumulates in LSH). When COUNT = 3 is set, the SALU plus the two sets of stored carries is the final product. COUNT = 3 asserted selects DONE. In DONE, MUL/DIV DONE is asserted and the final product is available. The FPA microcode loads the upper half from the SALU onto FP bus A during one machine cycle. On the following cycle the lower half is loaded from LSH onto FP bus A. Refer to Figure 2-30 for a summary of MULL control.

2.3.5.3 Division - The TEMP and LSH registers in the fraction multiplier logic are used to store the quotient generated during floating-point division. The registers are concatenated, with the MSB of LSH shifting into the LSB of TEMP. During a divide operation the FPA asserts DIV and loads the divisor and dividend into the FNM. In the FM logic, the nibble counter is loaded for a MULF and clocks through until TEST. To initiate quotient storage, the multiply control field (MCNT) of the FPA microcode must be asserted. The combination of MCNT and DIV asserted selects the NOP state in the division path. The FM logic enters NOP with the nibble counter odd and exits when the nibble counter is even.
The 2 cycles (100 ns) allow the first quotient bit to be formed. From NOP, the FM logic enters DIV. In DIV, the logic left-shifts LSH and TEMP one bit every even cycle. When doing a single precision division, the single quotient bit is input to both LSH bit 4 and TEMP bit 4. The data input to LSH is never accessed in single precision. In double precision, the TEMP bit 4 quotient input is blocked and TEMP bit 3 is input to TEMP bit 4 on the left shifts. DIV DONE is asserted when quotient bits are left-shifted into TEMP bits 28 and 29. This condition is tested at T100 of each state and transfers control to DONE if true. In DONE, MUL/DIV DONE is asserted, stopping the division process in the FNM and causing the FPA microcode to access TEMP for a single precision quotient and TEMP and LSH for a double precision quotient.

2.3.6 Exponent Processor
The exponent processor, part of the FCT, processes the FP exponent during FP operations. During FP multiply/divide, the processor adds/subtracts the exponents as needed. During add/subtract, the processor stores the larger exponent and determines the final exponent by taking into account the operation, fraction right-shifts, and left-shifts during normalization. By comparing the exponent magnitudes, the exponent processor also controls the FPF addition and subtraction in the FAD. Refer to Figure 2-31.
[Figure 2-30 MULL Control (TK-0525): MULL operands (32-bit multiplier and 32-bit multiplicand), MULL timing (MUL STATE, FLAG, MULL nibble counter, COUNT = 3, Bank A MP <3:0>, Bank B MP <7:4>, contents of ROM STRG, ACCM, and LSH, MUL DIV DONE), and MULL result accumulation: ACCM n = PP n + ACCM n-1 for n = 1 through 7, then SALU = PP8 plus ACCM7 plus stored carries from PP8 and ACCM7.]

[Figure 2-31 Exponent Processor Block Diagram (TK-0277): LA and LB latches, CALU and DALU producing SHF COUNT (always positive or zero) and A GT B to the FAD, AMUX and BMUX feeding the EALU, XR and PR registers, normalization constant input, EAC load enables, and connections to FP bus A <14:07> and FP bus B.]

The FPEs are loaded from FP buses A and B into LA and LB under control of the EAC field in the microcontrol (Table 2-19). The contents of LA and LB are loaded into CALU and DALU. CALU computes LA - LB and DALU computes LB - LA. The carry-out signal from DALU selects either CALU or DALU as the positive exponent difference (SHF COUNT) to provide FPF control in the FAD.
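As a rough illustration of the exponent comparison just described, the following Python sketch models CALU and DALU as 8-bit subtractors and uses the DALU carry-out to pick the non-negative difference. The function name, the 8-bit width, and the carry polarity (carry-out set when LB >= LA) are assumptions made for this example; the text states only that the DALU carry-out makes the selection.

```python
from typing import Tuple

EXP_MASK = 0xFF  # assumed 8-bit exponent datapath


def exponent_difference(la: int, lb: int) -> Tuple[int, bool]:
    """Return (SHF COUNT, A GT B) for two 8-bit exponents.

    Hypothetical model: CALU forms LA - LB, DALU forms LB - LA, and the
    DALU carry-out selects whichever difference is not negative.
    """
    la &= EXP_MASK
    lb &= EXP_MASK

    calu = (la - lb) & EXP_MASK               # CALU: LA - LB
    dalu_raw = lb + ((~la) & EXP_MASK) + 1    # DALU: LB - LA as a two's complement add
    dalu = dalu_raw & EXP_MASK
    dalu_carry = dalu_raw >> 8                # assumed polarity: 1 when LB >= LA

    shf_count = dalu if dalu_carry else calu  # always positive or zero
    a_gt_b = not dalu_carry
    return shf_count, a_gt_b


if __name__ == "__main__":
    print(exponent_difference(0x85, 0x81))
    print(exponent_difference(0x81, 0x85))
```

Computing both differences in parallel and selecting afterward avoids a separate compare-then-subtract step, which is presumably why the hardware carries two ALUs.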
Table 2-19 EAC Control Store Field

EAC Bit   uCS Bit   Operation
3         27        Controls LA <- Bus A transfers
2         26        Controls LB <- Bus B transfers
1         25        Controls PR <- EALU transfers
0         24        Controls XR <- EALU transfers

Hex values 0 through F of the field are the 16 combinations of these four bits.

NOTE
Although the control field appears to be a 4-bit field, each bit of the 4 bits actually controls a single, independent function.

The contents of LA and LB, as well as XR (poly register), PR (product register), a normalization constant, and 80₁₆ are possible inputs to EALU. Input selection is controlled by both microcontrol and hardware. Refer to Table 2-20 for input selection summary.

Table 2-20 EALU Input Control

AMXC Fields
1 (uCS 35)   0 (uCS 34)   Operation
0            0            LA to EALU A input
0            1            LB to EALU A input
1            0            PR to EALU A input
1            1            Hardware select: for FP Add/Subtract, larger exponent (LA or LB) to EALU A

BMXC Fields
1 (uCS 33)   0 (uCS 32)   Operation
0            0            Normalization constant to EALU B input
0            1            XR to EALU B input
1            0            80₁₆ to EALU B input
1            1            LB to EALU B input

The EALU operation is controlled by the microcontrol field EALUC. Refer to Table 2-21. The output of the EALU can be loaded into XR or PR for further processing, or loaded onto the FPA bus as a final answer. The XR and/or PR are loaded under control of the EAC microcontrol field. Refer to Table 2-19 (bits 0 and 1). The EALU output to FP bus A <14:07> is controlled by the BSC microcontrol field (Bus A EXP). Refer to the discussion of the BSC field in Paragraph 2.3.2.

The partial answers in XR and PR are reloaded into the EALU via AMUX and BMUX, and are combined with either a normalization constant or 80₁₆ before they are loaded onto FP bus A <14:07>. Refer to Table 2-20. The normalization constant, a variable quantity, adjusts the exponent for shifts required to normalize the FPF in the FAD.
(The actual normalization constant is read from a ROM rather than computed. The ROM is on the FNM.) The 80₁₆ corrects for the offset that results in FPE add/subtract during exponent processing in MUL/DIV. Refer to Paragraphs 1.4 and 1.5.

Table 2-21 EALU Control Store Field

EALUC Fields               Control Signals Generated
1 (uCS 31)   0 (uCS 30)    Carry   Mode        S3  S2  S1  S0   EALU Operation
0            0             X       H (logic)   H   H   H   H    Pass A input
0            1             0       L (arith)   L   H   H   L    A-B
1            0             1       L (arith)   H   L   L   H    A+B
1            1             X       H (logic)   H   H   L   L    Force 1's out (interpreted as underflow). This function is used to generate zeros on the buses.

X = Don't care

2.3.7 Sign Processor
The sign processor, a section of the FCT, determines the sign of the FP operation result using both hardware and the microcontrol field SGNC (sign latch controls). Refer to Figure 2-32 and Tables 2-22 and 2-23. This section receives information indicating the sign and magnitude of each operand, the desired operation (add, subtract, multiply, divide, poly), and the magnitude of the result. The resulting sign is placed on FP bus A 15.

[Figure 2-32 Sign Processor Block Diagram (TK-0280): sign latches SA and SB loaded from FP bus A <15> and FP bus B <15>, instruction decoder, and combinatorial logic producing the result sign on FP bus A <15>. Inputs come from the uCS SGN field, the IB (instruction type), intermediate results, and the signs of the operands.]

Table 2-22 SGNC Control Store Field

SGNC Field
C2 (uCS 07)   C1 (uCS 06)   C0 (uCS 05)   Load into SA                Load into SB
0             0             0             SA (NOP)                    SB (NOP)
0             0             1             FP bus A 15                 SB (NOP)
0             1             0             SA + Op Code (1 = SUB)      SB (NOP)
0             1             1             Result*                     SB (NOP)
1             0             0             SA (NOP)                    FP bus B 15
1             0             1             FP bus A 15                 FP bus B 15
1             1             0             SB                          SB (NOP)
1             1             1             SA + SX                     SB (NOP)

* This is the resultant sign, determined by the op code, signs of the operands, the relative magnitude of the exponents, and the signs of the FALU.
It can also be forced if a floating underflow or overflow occurs.

Table 2-23 Sign Processor Operation

Op Code   Relative Size of Exponents   Sign of Result (FALU sign)   Result*
MULX      X                            X                            SA ⊕ SB
DIVX      X                            X                            SA ⊕ SB
ADDX      LA>LB                        X                            SA
SUBX      LA>LB                        X                            SA
ADDX      LA<LB                        X                            SB
SUBX      LA<LB                        X                            SB
ADDX      LA=LB                        Positive                     SB
ADDX      LA=LB                        Negative                     SB
SUBX      LA=LB                        Positive                     SB
SUBX      LA=LB                        Negative                     SB

X = Don't Care
* Except for error: in case of overflow, the sign is forced to a 1, while underflow forces a 0.

2.3.8 Control Store and Logic
As indicated in previous sections, the control store and logic, located on the FCT, provide the control signals for all FPA operations. These include both FPA internal operations (the transfer and manipulation of FP data) and external operations (the interface between the FPA and CPU). Refer to Figure 2-33.

[Figure 2-33 Control Store and Logic Block Diagram (TK-0271): CPU interface logic, control store, op code and specifier decode logic, next address logic, and trap logic, with the FPA status and control lines to the CPU, the NEXT ADR <8:0> and ADR <8:0> lines, the IRC OPC <7:0> and specifier lines, BEN, MSC, STALL, FLOAT, CLK, and the CS and trap address lines.]

The FPA has two normal operating functions: instruction register decode (IRD), and performing an FPA instruction. The FPA normally alternates between these two functions. A third function, exceptional conditions, handles error conditions, traps, and interrupts. The FPA executes the third function whenever an exceptional condition is sensed.

The FPA and the CPU run synchronously, i.e., both have 200 ns microcycles divided into 4 time states (CPT0, CPT50, CPT100, CPT150), and T0 CPU is simultaneous with T0 FPA. Both load a new microword only at T0.

The FPA always keeps two updated copies of the 16 CPU general (scratchpad) registers. These copies are used by the FPA to optimize register-mode instructions.
These register copies are accessed and updated by the same lines that access and update the CPU registers themselves. To ensure that the FPA never reads a changing register, the CPU updates the general register set (and FPA copies) between T100 and T200 (T0), and the FPA reads the copies only between T0 and T100.

The FPA as a whole is directly controlled by the CPU. The CPU can enable and disable the FPA via bit 15 of the FPA status register (ID bus register 17). The FPA is normally enabled by the CPU.

The FPA is a microcontrolled unit containing a 512-word by 48-bit control store in ROM. Each word is divided into control fields of various lengths, each field providing independent control of a particular section of the FPA. In general, these fields control the operation of the FPA data manipulation components; coordinate the operation of the FPA with the operation of the CPU; and initiate the operation of parts of the FPA control logic. Control of FPA operations is handled by accessing specific ROM words, each causing a particular set of FPA actions.

2.3.8.1 IRD - The IRD state is controlled by location IRD.1 in the control ROM. In this state a new microword is not read until STALL is disabled. ACC INSTR H and IB CALL from the CPU microword disable the STALL condition. When the FPA leaves IRD, the ACC ERROR bit in the status register is cleared if it was set during a previous cycle.

The op code and specifier decode logic monitors the IRC OPC 7:0 and specifier lines. The OPC lines enable ACC INSTR H when an FPA instruction is in the IB and are decoded to determine instruction type. The specifier decode lines determine specifier type. The output of this decode logic is transmitted to the next address logic.

Location IRD.1 controls all FPA operations in the IRD state. The operation assumed is a register to register operation.
The FPA continually begins this operation without any indication that the next operation will be an R to R, because it has both operands in its register set and, if the next FPA operation is an R to R, both operands will already be loaded. Location IRD.1 has MSC = 6 and the next address = 180. This information is transmitted to the next address logic and, along with the outputs of the op code and specifier decode logic, determines the correct next microaddress.

In the next address logic (refer to Figure 2-34 and Table 2-24), MSC = 6 and the op code and specifier decode logic lines select the address offset to be ORed with the next address (= 180) to select the next microaddress. MSC = 6 selects the A-fork inputs from the op code and specifier decode logic lines and transmits them through the A-B fork mux. This selects the correct offset based on instruction type, float or double, and specifiers 1 and 2.

[Figure 2-34 Next Address Logic (TK-0534): next address decode from the current microword, trap control (ACC trap address from the CS bus, FP trap address from the ID bus maintenance register), A-B fork mux selected by MSC = 6 or 7 and FLOAT, BEN mux, and next address select.]

Table 2-24 Next Address Lines

Next Address Control Lines

FCTK BEN 2:0 H
    From FPA control store. Selects lines to be monitored during execution flows.
CS 71, 70
    CPU accelerator control field:
    00 - NOP
    01 - CPSYNC - To 3-bit address specified by CPU USI field
    10 - ACC TRAP
    11 - REDEFINE USI
CS 57, 56, 55
    If CS71 and CS70 are high, enabling DEC USI, a 6 on these lines enables POLY DONE, a 7 FP TRAP.
FCTH ACC TRAP H
    High during accelerator trap, low otherwise.
FCTH FP TRAP L
    Low during FP trap, high otherwise.
FCTH TRAP DIS L
    Low during either FP trap or accelerator trap, high otherwise.

Next Address Selector Controls

DEC uSI
    FCTH DEC uSI L enabled and CS 57, 56, and 55 high enable FCTH FP TRAP; otherwise it is high.
A-FORK B-FORK MUX SELECT
    Enable H causes all highs out and doesn't affect next address. Enable L enables select input to select A-B data.
NEXT ADDRESS MUX
    Enable H causes all highs out. If enable is low, S low selects A input.
BEN MUX
    Enable high causes all highs out.

Address Lines

FCTR CRADR 08:00 H
    To control store; selects address. Also can be transmitted to CPU via Reg 16 as current ADR.
FCTK NEXT ADR 08:00
    From control store; next address from microword.
FCTH FCTF TRAP A 07:00 L
    Contains either trap address or next address.
FMHR TRAP A 7:00 H
    FP trap address from MAINT REG ID BUS.
FCTH BRC 2:0 L
    From branch enable MUX (BEN). Monitors various FPA conditions and modifies the next address during execution flows based on the BEN field in FPA microcode.
A-B FORK ADR (not a signal name on prints)
    From A-FORK B-FORK select mux. Monitors op code and specifier type from IB and modifies address in A-B forks.
FCTF FLOAT H
    Based on op code. Used during A-B forks and by branch enable logic (BEN).
CS 57, 56, 55
    Select trap address during ACC trap. Also refer to CS 57, 56, 55 in control lines.

The offset is ORed with 180, and since STALL is no longer enabled (ACC INSTR H is high), the next CPT0 will select the correct microword to control the next FPA cycle. If the data is already in the FPA, an optimized routine will be selected.

2.3.8.2 Performing an FPA Instruction - Once an FPA instruction is sensed, the microcontrol words and the order in which they are selected are based on the operation desired, float or double, location of the operands, and relative size of the operands and/or result. The FPA first ensures that it has all the required data.
If both operands are in registers, or one is in a register and the other is a short literal, all the data is in the FPA after the A-fork test and the FPA transfers directly to the execution flows. If not, the first operand is fetched during A-fork, and then MSC = 7 and next address = 100 are transmitted to the next address logic. In the next address logic, MSC = 7 selects the B-fork inputs from the op code and specifier decode, and transmits them through the A-B fork mux to be ORed with next address = 100. The offset selected depends on instruction type, double or float, and type of specifier 2. As before, if the data is already in the FPA, an optimized routine is selected; otherwise, the FPA waits for the CPU to fetch data.

In some data transfers (A-fork or B-fork) the FPA must wait for data to be transmitted from the CPU via the ID bus. The microcode has a special WAIT bit to enable STALL for this purpose. The CPU indicates that the required data is on the ID bus by asserting CP SYNC. CP SYNC causes the data to be stored in the FPA and clears STALL, thereby enabling a new microword to be read and FPA operations to continue.

Once the FPA has all required data, ACC OVERIDE is asserted. This signal, transmitted to CPU microaddress bit 12, causes the CPU to select microcode from FPA specialized microcode in the writeable control store (WCS) rather than PCS. This prevents the CPU from beginning the microcode floating-point routines (used when no FPA is present) to do FP instructions. The enabling of ACC OVERIDE is based on instruction type (IRC lines) and the execution point counter (IRC EP 2:0). Note that since the FPA cannot fetch data itself, the data-fetch routines (CPU AFORK and BFORK) are allowed to continue until the FPA has all required data.

Once the FPA has all the data, the FPA execution flows are entered. These flows perform the manipulation required to A, S, M, and D.
This includes unpacking and individually manipulating the FPF and FPE parts of the number, as well as checking the operands and/or results for unusual conditions (zeros, underflow, overflow, etc.).

During execution flows the BEN field selects lines to be monitored and used to modify the next address. The 3-bit BEN field of each microword can select 3 of 24 possible lines to be ORed with the next address field of the microword to select the address. The BEN multiplexer monitors signals from both the CPU and FPA. POLY DONE and CP SYNC are transmitted from the CPU using CS lines 71, 70, 57, 56, and 55. FLOAT, IRBR0 L, and IRBR1 L are generated in the FPA but are summaries of op code information transmitted from the instruction buffer. All other BEN lines monitor FPA internal conditions. Refer to Table 2-25 for a summary of BEN fields.

Finally, the flows manipulate the result to ensure it is in correct form and inform the CPU via FP SYNC asserted that the answer is available.

Table 2-25 BEN Control Store Field

BEN     Lines Monitored
Field   BRC2 L           BRC1 L       BRC0 L        Operation Summary
0                                                   NOP
1       FLOAT H*         IRBR1 L*     IRBR0 L*      Op code decode
2       SWR              SWR          SWR           Shift within range
3       RSV H            B H          A=0 H         Reserved operand; operand(s) equal zero
4       POLY DN L*       CPSYNC H*    FLOAT*
5       (A or B=0) H     ED.GES H     SUB*ED<2 H    Operand(s) equal zero; check exponent difference
6       MUL/DIV DN H                                Multiply done; division done
7       UNDFL  PR 8 H                               Error condition

* From the CPU.

The CPU accepts the answer via DFMX bus drivers on the FNM using DAP ENA ACC D (1) and also reads the ACC Z, V, C, and N data lines to determine the condition codes of the answer. Once the CPU has the answer it transmits a CPSYNC and the FPA returns to its IRD state.

2.3.8.3 Exception Conditions - At any time during either IRD or instruction states the CPU can direct the FPA to enter a trap routine for error recovery or microdiagnostics. The trap routines are located in the FPA's own microcode.
There are two separate sets of trap routines: ACC traps for CPU and FPA errors and FP traps for microdiagnostics. Both trap routines are initiated via CS lines 71 and 70. If CS bus 71 is H and CS bus 70 is L, an ACC TRAP is initiated. An ACC TRAP addresses the FPA microcode location selected by CS bus lines 57, 56, and 55 (location 0-7). These traps are normally initiated for power-up and abort sequences. If CS bus 71, 70, 57, and 56 are high and 55 is low, an FP trap is initiated. The FP trap selects an 8-bit address previously stored in ID register 16, the Status register, to access one of 256 addresses in the FPA microcode (location 0-255). These trap locations normally handle FPA microdiagnostics. Refer to Figure 2-34.

2.4 FPA Microcontrol Fields
This section summarizes all the fields in the FPA microcontrol word. Figure 2-35 shows the complete microcontrol word, all the fields, and the microcode mnemonics. Table 2-26 lists the function of each field.

[Figure 2-35 FPA Control Word Fields (TK-0513): layout of the 48-bit microcontrol word, bits 47 through 00, showing the fields listed in Table 2-26.]

Table 2-26 FPA Control Word Field Definitions

Microcode Bits    Field                                        Function
47:39 (9 bits)    NAD - Next Address                           Contains the address of the next control word to be accessed.
38:36 (3 bits)    BEN - Branch Enable                          Selects signals to be used for next address calculations.
35:34 (2 bits)    AMXC - A Mux Control                         Selects A input to FCT exponent ALU.
33:32 (2 bits)    BMXC - B Mux Control                         Selects B input to FCT exponent ALU.
31:30 (2 bits)    EALUC - EALU Control                         Controls FCT exponent ALU operation.
29 (1 bit)        FPSYNC - Floating-Point Synchronize          Transmits FPSYNC to CPU.
28 (1 bit)        MCTL - Multiply Control                      Starts FML and FMH fraction multiply operation.
27:24 (4 bits)    EAC - Exponent Processor Control             Controls FCT (exponent processing).
23 (1 bit)        WAIT - Wait                                  Controls FPA wait loop operation. Stalls until CPSYNC.
22:20 (3 bits)    MSC - Miscellaneous Control                  Controls miscellaneous FPA operations.
19:18 (2 bits)    NRC - Normalization Register Control         Controls fraction normalize operation in FNM.
17:16 (2 bits)    SCR - Scratchpad Control                     Handles FPA General Register copies on FNM.
15:12 (4 bits)    BSC - Bus A - Bus B Data Source              Controls data transmission along FPA buses.
11:8 (4 bits)     FADC - Fraction Processor Controls           Controls FAD fraction processing.
7:5 (3 bits)      SGNC - Sign Latch Controls                   Controls sign calculation on FCT.
4 (1 bit)         LRR - Load Remainder Register                Controls remainder register (RR) on FNM.
3:0 (4 bits)      OPLD - Operand Load (Multiplier Control)     Loads fractions for multiplication on FML and FMH.

2.5 FPA MICROCODE STRUCTURE
The FPA contains a 512 word by 48 bits (per word) memory. This memory provides microcontrol of the FPA during normal operation and diagnostic programs for maintenance and troubleshooting. About 225 locations are for normal microcontrol, and 200 locations contain diagnostic programs. The other locations are available for future use.

The microcontrol code has an IRD state (instruction register decode) and three fork points (A, B, and C). The FPA remains in the IRD state until an FPA instruction is decoded. The FPA then enters A-fork to receive the operands. If both operands are registers or short literals, optimized routines are entered and computation begins. Otherwise, B-fork is entered. If the second operand is not register data, C-fork is entered. Otherwise a B-fork optimization is taken. Figure 2-36 shows the basic microcode structure and indicates the microcode starting addresses of the various routines.
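The field boundaries in Table 2-26 can be exercised with a small decoder. The Python sketch below is only illustrative: the names FIELDS, decode_microword, and encode_microword are invented for this example, and the sample word assumes that the IRD.1 values quoted in Paragraph 2.3.8.1 (MSC = 6, next address = 180) are hexadecimal and that every other field is zero, which the manual does not state.

```python
from typing import Dict, Tuple

# (high bit, low bit) of each field, copied from Table 2-26.
FIELDS: Dict[str, Tuple[int, int]] = {
    "NAD": (47, 39), "BEN": (38, 36), "AMXC": (35, 34), "BMXC": (33, 32),
    "EALUC": (31, 30), "FPSYNC": (29, 29), "MCTL": (28, 28), "EAC": (27, 24),
    "WAIT": (23, 23), "MSC": (22, 20), "NRC": (19, 18), "SCR": (17, 16),
    "BSC": (15, 12), "FADC": (11, 8), "SGNC": (7, 5), "LRR": (4, 4),
    "OPLD": (3, 0),
}

WORD_BITS = 48


def decode_microword(word: int) -> Dict[str, int]:
    """Split a 48-bit control word into its named fields."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("control word must fit in 48 bits")
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, (hi, lo) in FIELDS.items()
    }


def encode_microword(**values: int) -> int:
    """Pack named field values into a 48-bit control word (unnamed fields are 0)."""
    word = 0
    for name, value in values.items():
        hi, lo = FIELDS[name]
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word


if __name__ == "__main__":
    # Hypothetical IRD.1-like word: only NAD and MSC are taken from the text.
    sample = encode_microword(NAD=0x180, MSC=6)
    fields = decode_microword(sample)
    print(hex(fields["NAD"]), fields["MSC"])
```

The field widths sum to 48, so every bit of the word belongs to exactly one field.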
2.6 FPA INTERFACE FIRMWARE
The CPU-FPA interaction is handled by specialized firmware located in the CPU's writeable control store (WCS). This firmware handles numerous interface tasks. For ADD, SUB, MUL, and DIV operations it accepts and stores the FPA results and condition codes, and handles any exceptions flagged by the FPA. In 3-operand op codes it calls specifier decoding microcode in the base machine to decode the third operand. It also handles the special requirements of the EMOD, MULL, and POLY commands. It is accessed when the FPA overrides the CPU address by forcing uPC <12> to 1. This happens when the FPA detects an execution or optimization exit at a CPU A-fork, B-fork, or C-fork for an FPA implemented instruction.

2.6.1 Major Interface Functions
This firmware coordinates the interface between the CP microcode and the FP microcode, including the normal transfers of CPU data to the FPA, FPA results back to the proper register in the CPU, and various control signals for both normal and exception control. Table 2-27 lists important macros and microorders that are used by the FPA interface firmware to generate and/or monitor the signals which are transferred between the CPU and FPA.
[Figure 2-36 FPA Microcode Structure (TK-0511): flow from IRD (A-fork) through B-fork and C-fork, giving the microcode starting address of each routine (MULL, ADDF/SUBF, MULF, DIVF, ADDD/SUBD, MULD, DIVD, EMODF, EMODD, POLYF, POLYD) for each combination of operand sources. Data source key: R = register, S^# = short literal, # = literal (immediate), MEM = memory, X = don't know.]

Table 2-27 Interface Microcode

ID-D. SYNC
    Signal monitored or generated: CP SYNC generated
    Data transfer: CPU to FPA
    Function: Gates the CPU D-Register's contents onto the ID bus. Generates CP SYNC. CP SYNC indicates that valid data is on the bus.

D-ACCEL & SYNC
    Signal monitored or generated: CP SYNC generated
    Data transfer: FPA to CPU
    Function: Gates data placed on the DFMX bus by the FPA into the D-Register. CP SYNC indicates that the FPA's data has been accepted.

Q-ACCEL & SYNC
    Signal monitored or generated: CP SYNC generated
    Data transfer: FPA to CPU
    Function: Gates data placed on the DFMX bus by the FPA into the Q-Register. CP SYNC indicates that the FPA's data has been accepted.

ACCEL?* (BEN/ACC<UB2, UB1, UB0>)†
    Signal monitored or generated: FP SYNC monitored
    Data transfer: FPA to CPU
    Function: ACC<UB0> = 1. Result data, on the DFMX bus, and condition codes are being transmitted by the FPA. If double precision, condition codes are passed with the first half.

    Signal monitored or generated: ERR SYNC monitored
    Data transfer: No
    Function: ACC<UB1> = 1. An exception has been detected by the FPA. This initiates specialized routines that handle the exception.

    Signal monitored or generated: Not MULL** generated
    Data transfer: No
    Function: ACC<UB2> = 1. Separates MULL and MULF.

POLY.DONE
    Signal monitored or generated: POLY.DONE generated
    Data transfer: CPU to FPA
    Function: Indicates that the last coefficient in the POLY operation is being presented. In POLYD, used while both halves of the last coefficient are transmitted.

TRAP.ACC[1]
    Signal monitored or generated: Accelerator Trap
    Data transfer: No
    Function: Returns FPA microcode to IRD state.

MSC/LOAD.ACC.CC†
    Data transfer: No
    Function: Loads PSW<N,Z,V,C> with FPA generated condition codes from CPU latches loaded in the previous cycle.

* This macro, in combination with the target constraint block, enables the CP microcode to test for various conditions.
† This is a microorder rather than a macro.
** This is a condition rather than a specific signal.

2.6.2 Major Instruction Groups
The FPA firmware can be broken into 4 groups of routines: generalized instructions handler, POLY handler, MULL handler, and EMOD handler.

Group 1 handles all ADD, SUB, MUL, and DIV instructions as well as FPA exceptions. This group provides optimized flows for operands located in the general register set and literal operands.

The POLY group transmits the polynomial coefficients to the FPA as they are needed and transmits POLY DONE when the last coefficient has been transmitted. It also responds to the FPA detection of overflow, underflow, and coefficient reserved operand. Overflow and reserved operand detections cause a branch to exception conditions routines in the base machine. If an underflow is detected, the firmware notes it and continues execution of the POLY flows.

The MULL routine accepts the result of the longword integer multiplication from the FPA. Since the FPA creates an unsigned 64-bit product using 32-bit signed operands, the firmware must correct the result by subtracting out the effects of the negative signs on the magnitude result.
To do this the firmware stores the operands in a form that can later be used as subtrahend operands to correct the product and, based on this stored information, determines the correction sequence to select when the result is transmitted from the FPA. The firmware also creates the proper signed result, sets the condition codes, and tests for overflow.

The FPA handles only the fraction multiply of the EMOD instructions. As a result the EMOD firmware is relatively short. While the FPA is doing the fraction multiply, this routine adds the exponents and checks for reserved operands, accepts the fraction multiply result from the FPA, checks for a zero result, and formats the FPA result so control can return to the EMOD routines in the base machine.
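The MULL correction described in Paragraph 2.6.2 can be illustrated with the standard identity for recovering a signed product from an unsigned one. The Python sketch below is not the firmware's actual sequence: the function names are invented, and it assumes the FPA multiplies the raw 32-bit two's complement bit patterns as unsigned values. Under that assumption, each negative operand contributes an excess of the other operand shifted left 32 bits, and that excess is what gets subtracted.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def unsigned_product(a: int, b: int) -> int:
    """Model of the raw 64-bit product of two 32-bit patterns (assumed FPA output)."""
    return ((a & MASK32) * (b & MASK32)) & MASK64


def corrected_signed_product(a: int, b: int) -> int:
    """Recover the signed 64-bit product from the unsigned one.

    a and b are signed 32-bit integers. For each negative operand, the
    other operand's bit pattern shifted left 32 bits is subtracted,
    modulo 2**64 (hypothetical correction sequence).
    """
    ua, ub = a & MASK32, b & MASK32
    product = unsigned_product(a, b)
    if a < 0:
        product -= ub << 32   # remove the effect of a's sign bit
    if b < 0:
        product -= ua << 32   # remove the effect of b's sign bit
    product &= MASK64
    # Reinterpret the 64-bit pattern as a signed value.
    return product - (1 << 64) if product >> 63 else product


if __name__ == "__main__":
    print(corrected_signed_product(-3, 5))
    print(corrected_signed_product(-3, -5))
```

The identity follows from writing each negative operand as its unsigned pattern minus 2^32 and discarding the 2^64 term, which falls outside the 64-bit result.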