EK-FP780-TD-1
December 1978
112 pages
Document:
VAX-11/780 FP780 Floating-Point Accelerator Technical Description
Order Number:
EK-FP780-TD
Revision:
1
Pages:
112
EK-FP780-TD-001

FP780 Floating-Point Accelerator Technical Description

digital equipment corporation • maynard, massachusetts

1st Edition, December 1978

Copyright © 1978 by Digital Equipment Corporation

The material in this manual is for informational purposes and is subject to change without notice. Digital Equipment Corporation assumes no responsibility for any errors which may appear in this manual.

Printed in U.S.A.

This document was set on DIGITAL's DECset-8000 computerized typesetting system.

The following are trademarks of Digital Equipment Corporation, Maynard, Massachusetts:

DIGITAL, DEC, PDP, DECUS, UNIBUS, DECsystem-10, DECSYSTEM-20, DIBOL, EDUSYSTEM, VAX, VMS, MASSBUS, OMNIBUS, OS/8, RSTS, RSX, IAS

CONTENTS

PREFACE

CHAPTER 1  INTRODUCTION

1.1        GENERAL DESCRIPTION
1.1.1        Accelerator Interface
1.2        FPA INSTRUCTION SET
1.3        PHYSICAL DESCRIPTION
1.4        REVIEW OF FLOATING POINT NUMBERS AND ARITHMETIC
1.4.1        Introduction
1.4.2        Integers
1.4.3        Floating-Point Numbers
1.4.4        Decimal/Binary/Hexadecimal Conversion
1.4.5        Normalization
1.4.6        VAX Floating-Point Notation
1.4.7        Floating-Point Addition and Subtraction
1.4.8        Floating-Point Multiplication and Division
1.5        EXCESS 80 (EXCESS 200₈) NOTATION

CHAPTER 2  FUNCTIONAL DESCRIPTION

2.1        DATA FORMAT
2.1.1        Floating-Point Numbers
2.1.2        Integer Numbers
2.1.3        Literals
2.1.4        Zero and Reserved Operand Codes
2.1.5        Hidden, Overflow and Guard Bits
2.1.6        Overflow, Underflow, Zero, and Reserved Operands
2.2        INSTRUCTIONS AND ALGORITHMS
2.2.1        Add/Subtract
2.2.1.1        Load
2.2.1.2        Add/Subtract
2.2.1.3        Normalize
2.2.2        Multiply (Floating-Point)
2.2.2.1        Load
2.2.2.2        Multiply
2.2.2.3        Normalize
2.2.3        MULL (Multiply Integer Longword)
2.2.3.1        Load
2.2.3.2        Multiply and Return
2.2.4        Divide
2.2.4.1        Load
2.2.4.2        Divide
2.2.4.3        Normalize
2.2.5        EMOD (Extended Precision Multiply and Integerize)
2.2.5.1        Operand Load
2.2.5.2        Result Calculation and Return
2.2.6        POLY (Polynomial Evaluation)
2.2.6.1        Introduction
2.2.6.2        The Polynomial Expression
2.2.6.3        Normal POLY Flows
2.2.6.4        POLY Exception Flows
2.3        BLOCK DIAGRAM AND UNIT DESCRIPTION
2.3.1        CPU-FPA Interface
2.3.1.1        CPU-FPA Status and Control Interface
2.3.1.2        CPU-FPA Data Interface
2.3.1.3        Trap and Diagnostic Information
2.3.2        FPA Internal Buses
2.3.3        Fraction Adder (FAD)
2.3.4        Fraction Normalizer/Divide (FNM)
2.3.4.1        Normalize Operation
2.3.4.2        Divide Operation
2.3.5        Fraction Multiplier (FML and FMH)
2.3.5.1        The Pipeline
2.3.5.2        FM Control
2.3.5.3        Division
2.3.6        Exponent Processor
2.3.7        Sign Processor
2.3.8        Control Store and Logic
2.3.8.1        IRD
2.3.8.2        Performing an FPA Instruction
2.3.8.3        Exception Conditions
2.4        FPA MICROCONTROL FIELDS
2.5        FPA MICROCODE STRUCTURE
2.6        FPA INTERFACE FIRMWARE
2.6.1        Major Interface Functions
2.6.2        Major Instruction Groups

FIGURES

Figure No.  Title

1-1         The FPA
1-2         FPA Physical Location
1-3         Positional Value of Binary Number
2-1         Floating-Point Format
2-2         Integer Format
2-3         Short Literal Format
2-4         Zero and Reserved Operand Code
2-5         Hidden, Overflow, and Guard Bits
2-6         Overflow and Underflow Ranges
2-7         FPA Block Diagram
2-8         The POLY Flow
2-9         FPA Block Diagram
2-10        CPU-FPA Interface
2-11        Status Register
2-12        Maintenance Register
2-13        FP Bus Formats
2-14        Fraction Adder Block Diagram
2-15        SHFR Operation
2-16        Fraction Normalizer/Divide Block Diagram
2-17        Normalize Shift Enable Control Hardware
2-18        Divide Sequence Hardware
2-19        Divide Sequence Timing
2-20        Fraction Multiplier Block Diagram
2-21        The Pipeline
2-22        Loading and Accessing the Multiplicand
2-23        Loading and Accessing the Multiplier
2-24        SALU Operation - Adding the Stored Carrys
2-25        FM Control States
2-26        FM Control Logic
2-27        MULF Control
2-28        The XFER State
2-29        MULD Control
2-30        MULL Control
2-31        Exponent Processor Block Diagram
2-32        Sign Processor Block Diagram
2-33        Control Store and Logic Block Diagram
2-34        Next Address Logic
2-35        FPA Control Word Fields
2-36        FPA Microcode Structure

TABLES

Table No.   Title

1-1         Related Hardware Manuals
1-2         FPA Instruction Set
1-3         FPA Modules
1-4         Binary-Hex Equivalents
2-1         Floating Literals
2-2         Zero Operand Microcode
2-3         Exception Conditions
2-4         FALU Operation
2-5         Special FAD Operation
2-6         The Division Load
2-7         The Status Register
2-8         CS Lines
2-9         The Maintenance Register
2-10        Signals Monitored by Visibility Bus
2-11        BSC Control Store Field
2-12        Fraction Data Entry
2-13        FALU Operation
2-14        FALU MUX Control
2-15        Round Byte and Normalize Control
2-16        Divide Sequence States
2-17        Operand Bus Source
2-18        FM Control States
2-19        EAC Control Store Field
2-20        EALU Input Control
2-21        EALU Control Store Field
2-22        SGNC Control Store Field
2-23        Sign Processor Operation
2-24        Next Address Lines
2-25        BEN Control Store Field
2-26        FPA Control Word Field Definitions
2-27        Interface Microcode

The FPA is a microprogrammed device operating as a synchronous extension of the CPU data path. Both the FPA and CPU operate using a 200 ns microcycle; FPA T0 coincides with CPU T0. As an extension of the CPU, the FPA does not access memory data. The CPU must do memory address calculations, access the calculated address, and transmit the accessed data to the FPA. The CPU is also responsible for fetching and storing the FPA results. The FPA performs only the required floating-point or integer operation on the properly formatted operands transmitted to it.

The FPA can do floating-point addition, subtraction, multiplication, and division instructions.
It receives a packed, normalized floating-point number containing a sign bit, fraction bits, and exponent bits. The FPA breaks the number into parts, and the FPA data manipulation sections perform the operations required to carry out the instruction on each part. Once the result is completed, the FPA normalizes and packs the result for return to the CPU. Refer to Figure 1-1, a simplified diagram of the FPA.

[Figure 1-1 The FPA (TK-0522): simplified block diagram. The CPU connects to the FPA through the ID bus, the DFMX bus, the CS bus, and control lines; inside the FPA are the fraction processors and the exponent and sign processors.]

1.1.1 Accelerator Interface
The FPA is an optional hardware extension of the VAX CPU data path. It is the first of a series of optional accelerators that can be plugged into slots 24 through 28 of the CPU backplane. To facilitate design of these optional accelerators, a set of standard interface signals and buses is used to transfer data and control information.

[Copies of] the CPU general register set are kept in the FPA. These are read-only memory to the [FPA and provide r]apid access to register operands when used in instructions. Every time the CPU g[eneral registers are u]pdated, a copy of the update data is transmitted via the DFMX bus to the FPA [...].

[...] literal) is transmitted to the accelerator via the ID bus. Memory data is [...] register and then onto the ID bus. Literal data is transferred from the [...].

All op codes are received from the instruction buffer. The FPA uses dedicated hardware to handle [...] decoded and, if part of the FPA implemented set, processing is [...].

FPA results are returned to the CPU via the DFMX bus.
Any transfer of data (either operands or results) between the CPU and FPA is controlled by the CPSYNC and FPSYNC signals. CPSYNC is transmitted via the CS bus. When an operand is transferred to the FPA, CPSYNC asserted (by the CPU) indicates that data is available on the ID bus, and FPSYNC is asserted (by the FPA) to indicate that the data has been received. When the FPA is returning a result, FPSYNC indicates result available and CPSYNC indicates result received. When a result is transferred, the FPA also transmits the proper condition codes to the CPU.

Traps and errors are handled with three signals: ACC ERROR (from FPA to CPU), FP TRAP (CPU to FPA), and ACC TRAP (CPU to FPA). ACC ERROR (also called ERRSYNC) is asserted when the FPA detects an internal error and is input to the CPU BEN mux. FP TRAP is used by the CPU to initiate microdiagnostics stored in the FPA. ACC TRAP selects either the power-up trap or the abort trap (both stored in the FPA microcode).

1.2 FPA INSTRUCTION SET
The FPA handles only a limited number of instructions (refer to Table 1-2). No floating-point instructions are available in VAX's PDP-11 compatibility mode. As shown in the table, the FPA handles single and double precision instructions in both 2-operand and 3-operand formats. The FPA handles the single and double precision instruction variations internally.

However, as stated before, the FPA does no memory accessing. This means the CPU must do all address calculations and accessing for any input operands stored in memory. Also, the FPA does not store any final results; it merely makes the results available to the DFMX bus. The CPU must enable the result onto the DFMX bus, determine the result destination, and put the result into the destination. In a 3-operand instruction, the FPA begins computing as soon as it has the 2 source operands, while the CPU is computing the third, or destination, address.
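The division of labor described above (the CPU does all memory accessing, the FPA only computes, and each transfer is acknowledged) can be sketched in a few lines of Python. This is an illustrative model only, not the hardware or its microcode: the class and function names (FPAModel, accept_operand, offer_result, run_three_operand) are hypothetical, Python floats stand in for packed floating-point operands, and the boolean return value merely stands in for the FPSYNC acknowledgement.

```python
import operator

# Hypothetical model: every name here is illustrative, and Python floats
# stand in for the packed floating-point operands carried on the real buses.
OPERATIONS = {"add": operator.add, "mul": operator.mul}


class FPAModel:
    """Toy accelerator: it sees only the operands handed to it, never memory."""

    def __init__(self):
        self.operands = []
        self.result = None

    def accept_operand(self, value):
        # The caller models the CPU asserting CPSYNC (data available);
        # returning True models the FPA asserting FPSYNC (data received).
        self.operands.append(value)
        return True

    def compute(self, operation):
        first, second = self.operands
        self.result = OPERATIONS[operation](first, second)
        self.operands = []

    def offer_result(self):
        # Models the FPA signalling that a result is available.
        return self.result


def run_three_operand(memory, operation, src1, src2, dest):
    """CPU side of a 3-operand instruction: all memory traffic happens here."""
    fpa = FPAModel()
    for address in (src1, src2):
        operand = memory[address]            # the CPU fetches the operand
        if not fpa.accept_operand(operand):  # then transfers it to the FPA
            raise RuntimeError("FPA did not acknowledge the operand")
    fpa.compute(operation)
    memory[dest] = fpa.offer_result()        # the CPU stores the result
    return memory[dest]


memory = {0x100: 40.0, 0x104: 7.0, 0x108: 0.0}
print(run_three_operand(memory, "add", 0x100, 0x104, 0x108))
```

The model is strictly sequential; it does not attempt to show the overlap in which the FPA computes while the CPU works out the destination address.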
Table 1-2 FPA Instruction Set

    Mnemonic   Description

    ADDF*      Add single-precision floating-point
    ADDD*      Add double-precision floating-point
    SUBF*      Subtract single-precision floating-point
    SUBD*      Subtract double-precision floating-point
    MULF*      Multiply single-precision floating-point
    MULD*      Multiply double-precision floating-point
    DIVF*      Divide single-precision floating-point
    DIVD*      Divide double-precision floating-point
    POLYF      Evaluate polynomial single-precision floating-point
    POLYD      Evaluate polynomial double-precision floating-point
    EMODF      Extended single-precision floating-point
    EMODD      Extended double-precision floating-point
    MULL*      Multiply integer longword

    *The FPA instruction set includes both the 2-operand and 3-operand format of these instructions.

1.3 PHYSICAL DESCRIPTION
The FPA consists of 5 hex-height, extended-length modules containing mostly Schottky TTL logic. They replace blank modules 7014103 in slots 24 through 28 of the KA780 backplane. These slots are designated as the accelerator option slots.

The FPA is powered by an H7100 installed in power supply position 1. When viewed from the rear, position 1 is the rightmost location in the VAX CPU cabinet. Position 1 is left empty if an accelerator is not installed. The H7100 is a 5 V, 100 A supply.

Refer to Figure 1-2 for the location of backplane slots and power supply. Refer to Table 1-3 for module designations and locations.

[Figure 1-2 FPA Physical Location (TK-0524)]

Table 1-3 FPA Modules

    Module No.   Slot   Module Name   Module Function

    M8285        24     FNM           Normalization and fraction division
    M8286        25     FMH           Fraction multiplication (most significant bits)
    M8287        26     FML           Fraction multiplication (least significant bits)
    M8288        27     FAD           Fraction addition and subtraction
    M8289        28     FCT           Exponent manipulation and FPA control

1.4 FLOATING-POINT NUMBERS AND ARITHMETIC

1.4.1 Introduction
This section discusses some fundamentals of floating-point numbers and arithmetic.
It provides useful background for more advanced topics in later sections. The reader already familiar with floating-point may skip this section.

1.4.2 Integers
All data within a computer system could be represented in integer form. The numbers that could be represented in a 32-bit machine range in magnitude from 00000000₁₆ to FFFFFFFF₁₆ (or from 0₁₀ to 4,294,967,295₁₀). However, integer form imposes some limitations. Only whole numbers can be represented, i.e., no fraction or decimal parts; this imposes an accuracy limitation. Furthermore, numbers greater than 4,294,967,295 cannot be represented; this imposes a range limitation.

These limitations are imposed by the stationary position of the radix point (e.g., the decimal point in base 10 notation or the binary point in base 2 notation). An integer's radix point is usually omitted in integer representation because it always marks the integer's least significant place. That is, there are never any digits to the right of an integer's radix point. For this reason, an integer is sometimes called a fixed-point number.

Integer notation, however, can be modified to overcome the range and accuracy limitations imposed by the fixed radix point. This is done through the use of floating-point notation.

1.4.3 Floating-Point Numbers
Floating-point numbers, unlike integers, have no position restrictions imposed on their radix points. A popular type of floating-point representation is called scientific notation. With scientific notation, a floating-point number is represented by some basic value multiplied by the radix raised to some power.

Example

    1,000,000 = 1. × 10⁶

    where 1. is the basic value, 10 is the radix, and 6 is the exponent.

There are many ways to represent the same number in scientific notation, as shown in the following example.

    Right shifts                  Left shifts

    512 = 512.  × 10⁰             512 = 512    × 10⁰
        = 51.2  × 10¹                 = 5120   × 10⁻¹
        = 5.12  × 10²                 = 51200  × 10⁻²
        = .512  × 10³                 = 512000 × 10⁻³

The convention chosen for representing floating-point numbers with scientific notation in the FPA requires the radix point to always be to the left of the most significant digit in the basic value (e.g., .512 × 10³ in the above example). This modified basic value is called a fraction. Notice that for each right shift of the basic value, the exponent is incremented, and for each left shift the exponent is decremented. The value of the number remains constant if the exponent is adjusted for each shift of the basic value. More examples of scientific notation are as follows.

    Decimal          Decimal          Binary       Hex        Hex
    Notation         Scient. No.      Notation     Notation   Scient. No.

    64               .64 × 10²        1000000.     40₁₆       .4 × 16²
    33               .33 × 10²        100001.      21₁₆       .21 × 16²
    1/2 (.5)         .5 × 10⁰         0.1          .8₁₆       .8 × 16⁰
    3/32 (.09375)    .9375 × 10⁻¹     0.00011      .18₁₆      .18 × 16⁰

1.4.4 Decimal/Binary/Hexadecimal Conversion
There are standard routines to convert from decimal notation to hexadecimal (also called hex) and back. When converting from either decimal-to-hex or hex-to-decimal, it is convenient to first convert to binary notation and then to the final notation.

Decimal to Hex Conversion: To convert a decimal number with both integer and fraction portions to a hex number, the integer and fraction are separated and converted individually. The integer is converted to binary by a repeated division technique, the fraction by a repeated multiplication technique.

To convert an integer to binary representation, the integer is divided by two. The remainder of this division (either 1 or 0) becomes the LSB of the binary representation. The result of this division is again divided by two. The remainder of this division goes to the left of the LSB, becoming "next to LSB." The result is divided again. This process is continued until the result is zero. Refer to Example 1.
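The repeated division technique for integers, and the repeated multiplication technique for fractions mentioned above, can be sketched in Python as follows. This is a minimal illustration rather than anything taken from the FPA itself: the function names are hypothetical, and exact rational arithmetic (Python's fractions module) is an assumption made here so that decimal fractions such as .603 are not disturbed by the host machine's own binary rounding.

```python
from fractions import Fraction


def integer_to_binary(n):
    """Repeated division by two; each remainder is the next bit, LSB first."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))  # remainders were produced LSB first


def fraction_to_binary(fraction, max_bits):
    """Repeated multiplication by two for 0 <= fraction < 1, MSB first.

    Stops when the fraction portion reaches zero or after max_bits bits.
    """
    fraction = Fraction(fraction)
    bits = []
    while fraction != 0 and len(bits) < max_bits:
        fraction *= 2
        bit = int(fraction >= 1)    # 1 if the product is 1.0 or more
        bits.append(str(bit))
        fraction -= bit             # keep only the fraction portion
    return "".join(bits)


print(integer_to_binary(197))
print(fraction_to_binary(Fraction(3, 8), 16))
print(fraction_to_binary(Fraction(603, 1000), 7))
```

The three calls use the same inputs as Examples 1, 2, and 3, so the printed bit strings can be compared with the worked results there. The max_bits argument plays the role of the "decide to stop" step for fractions that repeat in binary.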
Example 1  Convert 197₁₀ to binary

    STEP 1    197 ÷ 2 = 98    R 1   (LSB)
    STEP 2     98 ÷ 2 = 49    R 0
    STEP 3     49 ÷ 2 = 24    R 1
    STEP 4     24 ÷ 2 = 12    R 0
    STEP 5     12 ÷ 2 =  6    R 0
    STEP 6      6 ÷ 2 =  3    R 0
    STEP 7      3 ÷ 2 =  1    R 1
    STEP 8      1 ÷ 2 =  0    R 1   (MSB)

    197₁₀ = 1100 0101₂

A repeated multiply-by-2 converts a decimal fraction to a binary fraction. The decimal fraction is multiplied by two. If the result is 1.0 or more, a 1 is placed in the MSB of the fraction (directly to the right of the binary point); if less than 1.0, a zero is placed there. The fraction portion only of this result is again multiplied by two; if the result is 1.0 or more, a 1 goes to the right of the MSB; if less than 1.0, a zero. This continues until the fraction portion of the result is all zeros (refer to Example 2) or until enough binary fraction bits have been generated to represent the decimal accurately enough (refer to Example 3). Note that finite length decimal fractions can become repeating fractions in binary (Example 3).

Example 2  Convert 3/8 (.375) to binary

    STEP 1    .375 × 2 = 0.75    bit 0   (MSB)
    STEP 2    .75  × 2 = 1.50    bit 1
    STEP 3    .50  × 2 = 1.00    bit 1   STOP (fraction portion is zero)

    .375₁₀ = .011₂

Example 3  Convert .603₁₀ to binary

    STEP 1    .603 × 2 = 1.206    bit 1   (MSB)
    STEP 2    .206 × 2 = 0.412    bit 0
    STEP 3    .412 × 2 = 0.824    bit 0
    STEP 4    .824 × 2 = 1.648    bit 1
    STEP 5    .648 × 2 = 1.296    bit 1
    STEP 6    .296 × 2 = 0.592    bit 0
    STEP 7    .592 × 2 = 1.184    bit 1   DECIDE TO STOP

    .603₁₀ ≈ .1001 101₂

The conversion from binary to hex is very simple. Starting at the binary point, break the binary number into groups of 4 digits each. (Zero fill at both right and left ends to complete groups of 4.) Then replace each group of 4 with its hex equivalent. Refer to Table 1-4 and Example 4.

Table 1-4 Binary-Hex Equivalents

    Binary   Hex        Binary   Hex

    0000     0          1000     8
    0001     1          1001     9
    0010     2          1010     A
    0011     3          1011     B
    0100     4          1100     C
    0101     5          1101     D
    0110     6          1110     E
    0111     7          1111     F

Example 4  Convert 1 1001 0110.1011 01₂ to hex

    1. Break into groups of four and zero-fill left and right ends.

       0001 1001 0110.1011 0100
       (three zeros added at the left end, two at the right end)

    2. Replace four-digit groups with hex equivalents. Refer to Table 1-4.

       0001 1001 0110 . 1011 0100
         1    9    6  .   B    4

       1 1001 0110.1011 01₂ = 196.B4₁₆

To convert from hex back to decimal, first replace each hex digit with its 4-bit binary equivalent (refer to Table 1-4). Each position in a binary number has a positional value based on which side of the binary point it is on and its distance from the binary point. The positional values are based on powers of two. The bit in the unit column has a positional value of one. The positional value doubles each time you move from right to left, and halves as you move from left to right. Refer to Figure 1-3 for a summary of binary positional values in both powers of two and decimal value.

    Left of the binary point       Right of the binary point

    2⁷ = 128                       2⁻¹ = 1/2  = .5
    2⁶ = 64                        2⁻² = 1/4  = .25
    2⁵ = 32                        2⁻³ = 1/8  = .125
    2⁴ = 16                        2⁻⁴ = 1/16 = .0625
    2³ = 8                         2⁻⁵ = 1/32 = .03125
    2² = 4                         2⁻⁶ = 1/64 = .015625
    2¹ = 2
    2⁰ = 1

    Figure 1-3 Positional Value of Binary Number (TK-0657)

To convert from binary notation to decimal notation, add the decimal positional value of each bit that is a one. This sum will be the decimal equivalent of the binary number.

1.4.5 Normalization
As discussed previously, there are many ways to represent a particular floating-point number using scientific notation, and the convention chosen for representing floating-point numbers in VAX and the FPA requires the radix point to be to the left of the most significant bit in the basic value. Refer to Example 5.

Example 5  Floating-Point Forms of 29₁₀

    29₁₀ = 11101₂

    Right shifts                           Left shifts

    11101.      × 2⁰                       1 1101.            × 2⁰
    1110.1      × 2¹                       11 1010.           × 2⁻¹
    111.01      × 2²                       111 0100.          × 2⁻²
    11.101      × 2³                       1110 1000.         × 2⁻³
    1.1101      × 2⁴                       1 1101 0000.       × 2⁻⁴
    .1110 1     × 2⁵   (chosen form)       11 1010 0000.      × 2⁻⁵
    .0111 01    × 2⁶                       111 0100 0000.     × 2⁻⁶
    .0011 101   × 2⁷                       1110 1000 0000.    × 2⁻⁷

    Fraction = .11101
    Exponent = 5

The process of ensuring that the first significant bit is directly to the right of the binary point is called normalization. If the number is one or larger, it involves right-shifting the basic value and incrementing the exponent until the MSB (a one) is directly to the right of the binary point. If the number is a fraction with leading zeros, the basic value is left-shifted and the exponent is decremented. Examples 6 and 7 show conversion of numbers to VAX normalized form.

Example 6  Convert 75₁₀ to a normalized binary number

    1. Integer conversion
       75₁₀ = 100 1011₂

    2. Floating-point form
       100 1011₂ = 100 1011₂ × 2⁰

    3. Normalized form
       Right shift fraction 7 times
       Increment exponent by 7
       100 1011₂ × 2⁰ = .100 1011 × 2⁷

       Fraction = .100 1011
       Exponent = 7

Example 7  Convert 3/16 (.1875) to a normalized binary number

    1. Fraction conversion
       .1875₁₀ = .0011₂

    2. Floating-point form
       .0011₂ = .0011₂ × 2⁰

    3. Normalized form
       Left shift fraction 2 times
       Decrement exponent by 2
       .0011₂ × 2⁰ = .11 × 2⁻²

       Fraction = .11
       Exponent = -2

1.4.6 VAX Floating-Point Notation
Two conventions are used in the FPA to conserve memory space without losing accuracy and to aid in hardware manipulation. The first convention is called the hidden bit. All numbers transferred between the CPU and FPA are normalized floating-point numbers. This means the first significant bit (always a 1) is always directly to the right of the binary point. To conserve memory space and data lines, the first significant bit is not stored or transmitted to the FPA. For example, the fraction part of the normalized binary number .11000... × 2⁻² will be stored and transmitted to the FPA as 100.... The normalized fraction of 1/2 (.100... × 2⁰) will be stored and transmitted as 000.... In both cases the first 1 (the hidden bit) will be added by hardware in the FPA. When the FPA transfers a normalized answer back to the CPU, the hidden bit is not sent.
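Normalization and the hidden bit convention can be sketched together in Python. The sketch below is an illustration under stated assumptions, not the FPA's normalizer: the function names are hypothetical, only positive values are handled, exact rational arithmetic replaces the fixed-width fraction registers, and the number of stored bits is a parameter chosen for display rather than the width of any VAX format.

```python
from fractions import Fraction


def normalize(value):
    """Return (fraction, exponent) with 1/2 <= fraction < 1.

    The value is unchanged: value == fraction * 2**exponent.
    """
    fraction = Fraction(value)
    if fraction <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = 0
    while fraction >= 1:                # one or larger: right shift,
        fraction /= 2                   # increment the exponent
        exponent += 1
    while fraction < Fraction(1, 2):    # leading zeros: left shift,
        fraction *= 2                   # decrement the exponent
        exponent -= 1
    return fraction, exponent


def stored_fraction_bits(fraction, nbits):
    """Bits that follow the hidden bit of a normalized fraction."""
    bits = []
    for _ in range(nbits + 1):          # one extra pass for the hidden bit
        fraction *= 2
        bit = int(fraction >= 1)
        bits.append(str(bit))
        fraction -= bit
    assert bits[0] == "1", "fraction was not normalized"
    return "".join(bits[1:])            # the leading 1 is not stored


for value in (75, Fraction(3, 16), Fraction(1, 2)):
    fraction, exponent = normalize(value)
    print(value, fraction, exponent, stored_fraction_bits(fraction, 4))
```

The first two values are those of Examples 6 and 7, and the last two are the hidden-bit cases discussed above: a fraction of .11 is stored as 1000 (to four bits) and a fraction of .1 as 0000.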
The 8-bit exponent portion of a floating-point number is stored using excess 80₁₆ notation. This notation simplifies the hardware that manipulates the exponent during floating-point arithmetic operations. Excess 80₁₆ exponent notation is obtained by adding 10000000₂ (200₈, 80₁₆, or 128₁₀) to 2's complement notation. Refer to Paragraph 1.5 for a further discussion of excess 80 notation.

1.4.7 Floating-Point Addition and Subtraction
In order to perform floating-point addition or subtraction, the exponents of the two floating-point numbers involved must be aligned or equal. If they are not aligned, the fraction with the smaller exponent is shifted right until they are. Each shift to the right is accompanied by an increment of the associated exponent. When the exponents are aligned, the fractions can then be added or subtracted. The exponent value indicates the number of places the binary point is to be moved to obtain the integer representation of the number. In Example 8, the number 7₁₀ is added to the number 40₁₀ using floating-point representation. Note that the exponents are first aligned and then the fractions are added; the exponent value dictates the final location of the binary point.

Example 8 Floating-Point Addition

      0.1010 0000 0000 000 × 2^6 = 28₁₆ = 40₁₀
    + 0.1110 0000 0000 000 × 2^3 =  7₁₆ =  7₁₀

1. To align exponents, shift the fraction with the smaller exponent three places to the right and increment the exponent by 3, and then add the two fractions.

      0.1010 0000 0000 000 × 2^6 = 28₁₆ = 40₁₀
    + 0.0001 1100 0000 000 × 2^6 =  7₁₆ =  7₁₀
      ------------------------
      0.1011 1100 0000 000 × 2^6 = 2F₁₆ = 47₁₀

2. To find the integer value of the answer, move the binary point six places to the right.

      010 1111.0000 0000 0

1.4.8 Floating-Point Multiplication and Division
In floating-point multiplication, the fractions are multiplied and the exponents are added. For floating-point division, the fractions are divided and the exponents are subtracted.
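The alignment step of Paragraph 1.4.7 and the multiplication and division rule just stated can be sketched in Python. This is a minimal illustration, assuming each operand is a (fraction, exponent) pair with an unbiased exponent; the names fp_add, fp_multiply, fp_divide, and value are hypothetical, and the results are deliberately left unnormalized.

```python
from fractions import Fraction


def fp_add(a, b):
    """Align exponents, then add fractions (Paragraph 1.4.7)."""
    (fa, ea), (fb, eb) = a, b
    if ea < eb:                        # make `a` the operand with the larger exponent
        (fa, ea), (fb, eb) = (fb, eb), (fa, ea)
    fb = fb / 2 ** (ea - eb)           # right-shift the smaller-exponent fraction
    return fa + fb, ea


def fp_multiply(a, b):
    """Multiply fractions, add exponents."""
    (fa, ea), (fb, eb) = a, b
    return fa * fb, ea + eb


def fp_divide(a, b):
    """Divide fractions, subtract exponents."""
    (fa, ea), (fb, eb) = a, b
    return fa / fb, ea - eb


def value(x):
    """Numeric value of a (fraction, exponent) pair."""
    fraction, exponent = x
    return fraction * Fraction(2) ** exponent
```

With 40 written as (5/8, 6) and 7 as (7/8, 3), fp_add gives (47/64, 6), which is .101111 × 2^6 = 47, and fp_multiply gives (35/64, 9), which is .100011 × 2^9 = 280. With 15 written as (15/16, 4) and 5 as (5/8, 3), fp_divide gives (3/2, 1), which is the unnormalized 1.1 × 2^1 = 3.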
There is no requirement to align the binary point in floating-point multiplication or division. Example 9 shows floating-point multiplication. Example 10 shows division.

Example 9: Multiply 7₁₀ by 40₁₀.

1.    0.1110000 × 2^3 =  7₁₆ =  7₁₀
    × 0.1010000 × 2^6 = 28₁₆ = 40₁₀
      ---------
      .1000110000 × 2^9    (Result already in normalized form.)

2. Move the binary point nine places to the right.

      100011000.0 = 118₁₆ = 280₁₀

Example 10: Divide 15₁₀ by 5₁₀.

1.    .1111000 × 2^4
      --------------
      .1010000 × 2^3

      Fraction division: 1111000 / 1010000 = 1.100000 (remainder 0)

2. Exponent: 4 - 3 = 1

3. Result: 1.100000 × 2^1
   Normalized result: .1100000 × 2^2 (normalized fraction, normalized exponent)
   Move binary point two places to the right.

      11.00000 = 3₁₆ = 3₁₀

1.5 EXCESS 80 NOTATION
The VAX and, consequently, the FPA use excess 80 notation to store and handle the exponent portion of floating-point numbers. Excess 80 notation is the 2's complement of the exponent plus 128₁₀ or 80₁₆.

It is convenient to handle the exponent portion of the floating-point number in 2's complement notation. This allows a wide range of both positive and negative exponents to be represented. However, in 2's complement notation an overflow must occur to go from the least negative number to zero. To avoid this, the bias of 128₁₀ is added to the 2's complement number.

Historically, minicomputers have been discussed and explained using octal notation. In octal, the bias of 128₁₀ is 200₈. In previous manuals this exponent notation has been discussed using octal form. As a result, it is called excess 200₈ or excess 200. However, the VAX is discussed using hexadecimal notation. Unfortunately, when discussing the excess 80 bias in VAX documentation, it has been called 80₁₆, 128₁₀, 200₈, and 10000000₂ (sometimes the base is indicated, sometimes it isn't). When studying the FPA print sets, technical manuals, and microcode listings, be aware of this variation in terminology.
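As a small illustration of the bias just described, the following Python sketch converts between a true (2's complement) exponent and its excess 80 form, and confirms that the differently written biases name the same number. The names BIAS, to_excess_80, and from_excess_80 are hypothetical and are used here only for illustration.

```python
BIAS = 0x80  # 80 hex = 200 octal = 128 decimal = 1000 0000 binary


def to_excess_80(exponent):
    """Add the bias to a true exponent; the result must fit in 8 bits."""
    biased = exponent + BIAS
    if not 0 <= biased <= 0xFF:
        raise ValueError("exponent does not fit in the 8-bit exponent field")
    return biased


def from_excess_80(biased):
    """Remove the bias to recover the true exponent."""
    return biased - BIAS
```

For example, to_excess_80(2) is 82 hex and to_excess_80(-2) is 7E hex, so negative and positive exponents both map onto unsigned 8-bit values.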
In this manual hex notation is used and the exponent bias is called excess 80.

When multiply and divide operations are performed using floating-point numbers with excess 80 exponent notation, the resulting exponent must be adjusted by the bias to return the result to excess 80 notation.

When a multiplication is performed, exponents are added and 80₁₆ must be subtracted from the result to return it to excess 80 notation. To understand why 80 must be subtracted from the exponent calculation during multiplication, consider the following.

      Exponent A + 80        (excess 80 notation)
    + Exponent B + 80        (excess 80 notation)
      ---------------
      Exponent A + Exponent B + 100

Both exponent A and exponent B are biased by 80, yielding a bias of 100. However, only a bias of 80 is desired in excess 80 notation.

Multiplication Example 2 × 3 = 6

             Fraction    Exponent
    2 =      0.100       82
    3 =      0.110       82

    Fraction Calculation        Exponent Calculation
        0.100                       82
      × 0.110                     + 82
      --------                    ----
        0.011000                   104
                                  - 80
                                  ----
                                    84

    6 = 0.011000 × 84

Normalize the fraction by left-shifting one place and decreasing the exponent by 1.

    Fraction 0.11000, Exponent 83: 0.11000 × 83 = 6

When a division is performed, exponents are subtracted and 80₁₆ must be added to the result to return it to excess 80 notation. To understand why 80 must be added to the exponent calculation during division, consider the following:

      Exponent A + 80
    - (Exponent B + 80)
      ---------------
      Exponent A - Exponent B + 80 - 80 = Exponent A - Exponent B + 0

However, since the result is to be in excess 80 notation, 80₁₆ must be added to the exponent, yielding Exponent A - Exponent B + 80.

Division Example 16/4 = 4

             Fraction    Exponent
    16 =     .10000      85
     4 =     .10000      83

    Exponent Calculation        Fraction Calculation
        85                          .10000 / .10000 = 1.000
      - 83
      ----
         2
      + 80
      ----
        82

Normalize the fraction by right-shifting one place and incrementing the exponent.

    Fraction .10000, Exponent 83: .10000 × 83 = 4

CHAPTER 2
FUNCTIONAL DESCRIPTION

This chapter explains the operation of the FPA. The chapter can be divided into four areas: introduction, algorithms, hardware operation, and microcode.
The introduction (Paragraph 2.1) discusses the various types of data formats that may be handled by the FPA. The algorithms section (Paragraph 2.2) lists the instructions the FPA can execute and explains the FPA operations required to perform each one. This section discusses the FPA operation based on instruction flow. Hardware operation (Paragraph 2.3) breaks the FPA into hardware blocks and discusses the operation of each. Both the algorithm section and the hardware operation section should be read to get a thorough understanding of the FPA operation. They discuss the same equipment from different viewpoints. Microcode (Paragraphs 2.4 through 2.6) summarizes both the FPA microcode and the FPA-specific microcode in the CPU. This discussion focuses on the generation and monitoring of the various control signals passed between the units.

2.1 DATA FORMATS
The FPA handles single (float) and double precision floating-point data and signed integer longwords. It receives normalized, packed data from the CPU and returns normalized, packed results to the CPU over 32-bit wide buses. Within the FPA, intermediate data is transmitted over two 34-bit wide buses. The data formats used by the FPA are compatible with these bus structures as well as the input and output formats of the various data manipulation units within the FPA.

2.1.1 Floating-Point Numbers
Floating-point numbers consist of a sign bit, exponent bits, and fraction bits. A single precision floating-point number is stored in CPU memory as 4 contiguous bytes starting on an arbitrary byte boundary. Bits are labeled from the right, 0 through 31. The number is specified by its address A, the address of the byte containing bit 0 (Figure 2-1). The range of a single precision floating-point number is approximately .29 × 10^-38 through 1.7 × 10^38. The precision is typically 7 decimal digits.

A double precision floating-point number is stored as 8 contiguous bytes.
Bit labeling and addressing is similar to a single precision floating-point number. A double precision number has a range similar to a single precision, but its precision is about 16 decimal digits (Figure 2-1).

Figure 2-1 (a. Single Precision, TK-0528; b. Double Precision, TK-0527). The figure shows a normalized floating-point number (sign, fraction, and exponent in excess 200 notation) in each of its forms: as stored in VAX memory, transferred to the FPA, and received by the FPA; as transferred on the FPA buses, FP bus A and FP bus B (unnormalized, intermediate results); as used in the FPA (unpacked, unnormalized results, with overflow and hidden bits); ready for return to the CPU (packed, normalized); and as returned to the CPU. A double precision number is received in two transfers (bits 0-31 in the first transfer, bits 32-63 in the second), is transferred within the FPA as a complete number (66 bits transferred simultaneously), and is returned to the CPU in two 32-bit transfers (exponent and most significant fraction bits first, least significant fraction bits second).
NOTE 1: A normalized number has a 0 (zero) overflow bit and a 1 hidden bit.

Floating-point numbers are transmitted to the FPA as packed, normalized numbers without a hidden or overflow bit. A single precision (float) number will have 24 fraction bits and a double precision number will have 56 fraction bits. Hardware in the FPA inserts and handles both the hidden and overflow bits. The number is split apart and used in various data manipulation units in the FPA. Although all operations begin with normalized operands, the intermediate results produced by the FPA data manipulation units can vary widely. Subtraction of nearly equal numbers can produce a number very close to zero. Addition and division can produce numbers close to 2. As a result, intermediate results are transferred between data manipulation units as unnormalized numbers with both hidden and overflow bits. After the result is normalized, it is ready to return to the CPU. When the result is transmitted, it is transmitted as a packed, binary normalized number without hidden or overflow bits.

POLY uses specialized floating-point notation for intermediate results. In POLY, 7 additional bits are used for fraction addition. POLY execution consists of multiply, add, multiply, etc. To maintain maximum accuracy while functioning within the limitations of the FPA hardware, 7 additional LSBs are transferred from the fraction multiply (FMH + FML) hardware to the fraction add hardware (FAD). The 7 additional bits come from LSH <11:05> along FP bus A <14:08> into AR <06:00> (also called ARX).
The FPA performs the add on the extended precision number, then transfers the addition result to the normalizer logic (FNM) where it is rounded, normalized, and held for the next part of the POLY instruction.

The EMOD instruction causes a 32 × 24 (64 × 56 for double) bit fraction multiplication to be performed in the FMH and FML. The extra 8 bits in the multiplicand are transferred over the ID bus to FP bus B lines <07:00> to MCINT (also called MCX). MCINT <07:00> drives MCAND bus <07:00> for the fraction multiply. MPLIER is handled in the usual fashion. The result of the extended precision multiply is transferred to the CPU in one 32-bit transfer (F) or two 32-bit transfers (D).

2.1.2 Integer Numbers
The FPA handles a single integer format instruction, MULL (multiply longword). A longword is stored in CPU memory as 4 contiguous bytes starting on an arbitrary byte boundary. The FPA receives two 32-bit signed integers and multiplies them as unsigned integers to form a 64-bit product. The product is returned to the CPU in two 32-bit transfers (low half first) for further processing. Refer to Figure 2-2 for a summary of integer format.

2.1.3 Literals
The FPA handles float and double precision literal data. It receives the data from the CPU IB. Float literal data is transferred from the IB to the FPA's Literal Register (LR) using the ID bus. The FPA then loads the LR data into FPA internal registers and begins processing. The first half of double precision literal data is handled similarly. The second half comes from the CPU D-register via the ID bus and is loaded directly from the ID bus into the FPA internal registers.

Figure 2-2 Integer (MULL) Format (TK-0523). The figure shows the integer in each of its forms: as stored in VAX memory, transferred to the FPA, and received by the FPA (2's complement signed number, sign in bit 31); as transferred on the FPA buses (unsigned positive number, bits 32 and 33 of the FP bus not used); the result as stored in the FPA; and the result returned to the CPU (via FP bus A to the DFMX bus) in two transfers, the least significant half in the first transfer and the most significant half in the second.

The FPA handles short literals. Short literals contain only six data bits and are part of the instruction. The CPU formats the six data bits within the 32-bit data longword based on instruction type (floating-point or integer instruction). If it is an integer instruction (the FPA handles only MULL), the six data bits are zero extended (26 zeros are added). Any integer between 0 and 63₁₀ can be written using a short literal. If it is a floating-point instruction, the short literal is assumed to contain three exponent bits and three fraction bits. The IB packs the data into standard FP format. This includes excess 80 notation for the exponent, a positive sign bit, and a normalized fraction with a hidden one bit that is not stored. Refer to Figure 2-3 for FPA short literal format, and Table 2-1 for data that can be transferred using floating-point short literal form. Notice that only positive numbers can be transferred. If a double precision short literal is specified, the FPA accepts the first half and manufactures zeros to fill the second half.

Figure 2-3 Short Literal Format (TK-0519). A. Short literal data as stored in the instruction stream: exponent in bits <5:3>, fraction in bits <2:0>. B. Short literal data as formatted by the IB and transferred to the FPA for a floating-point operation: data in bits <9:4>, zeros in bits <3:0>.

Table 2-1 Floating Literals

                                      Fraction
    Exponent    0      1       2       3       4       5       6       7
       0       1/2    9/16    5/8    11/16    3/4    13/16    7/8    15/16
       1       1      1-1/8   1-1/4  1-3/8    1-1/2  1-5/8    1-3/4  1-7/8
       2       2      2-1/4   2-1/2  2-3/4    3      3-1/4    3-1/2  3-3/4
       3       4      4-1/2   5      5-1/2    6      6-1/2    7      7-1/2
       4       8      9       10     11       12     13       14     15
       5       16     18      20     22       24     26       28     30
       6       32     36      40     44       48     52       56     60
       7       64     72      80     88       96     104      112    120

The FPA also handles long literals (32 or 64 data bits).
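The floating-point short literal encoding described above (three exponent bits and three fraction bits, with the values listed in Table 2-1) can be reproduced with a few lines of Python. This is a sketch under the assumption, consistent with every entry in Table 2-1, that the value is 2 raised to the exponent field times (8 + fraction field)/16, where the 8 accounts for the hidden bit; the name short_literal_value is hypothetical.

```python
from fractions import Fraction


def short_literal_value(literal):
    """Value of a 6-bit floating-point short literal (exponent in
    bits <5:3>, fraction in bits <2:0>), reproducing Table 2-1."""
    if not 0 <= literal <= 63:
        raise ValueError("a short literal holds only six data bits")
    exponent = literal >> 3            # three exponent bits
    fraction_bits = literal & 0b111    # three fraction bits
    # Hidden bit (weight 8/16 = 1/2) plus the three stored fraction bits.
    fraction = Fraction(8 + fraction_bits, 16)
    return fraction * 2 ** exponent
```

For instance, exponent 0 with fraction 0 gives 1/2, and exponent 7 with fraction 7 gives 120, the smallest and largest entries in Table 2-1.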
Thirty-two bits, either a complete single precision transfer or the first half of a double precision, are transferred from the IB to the FPA LR. The second half of the double precision number is taken directly from the ID bus. Float and double precision floating-point data can be transferred using long literal format. The FPA also receives 32-bit integer data using the long literal format. (The FPA does not handle any 64-bit integer operands.)

2.1.4 Zero and Reserved Operand Codes
The FPA checks all data received for zeros and reserved operands during the fraction processing. Both zero and reserved operand function as codes transmitting special information.

As discussed in Paragraph 1.4, the FPA assumes all floating-point numbers to be normalized numbers (between 1/2 and 1) with a hidden bit that is not stored. The hidden bit is normally inserted by data manipulation hardware. A zero cannot be represented as a normalized number, and the hardware that inserts the hidden bit only increases the problem of representing and using zero. As a result, zero is represented by a code with zeros in the exponent bits (no excess 200 notation) and a clear sign bit. The fraction bits do not matter. Whenever this combination of bits is sensed, the FPA accesses special microcode that simulates the special properties of addition, subtraction, multiplication, and division with zero. Refer to Table 2-2 for the result of an operation with zero, and Figure 2-4 for the zero code.

Table 2-2 Zero Operand Microcode Operation

    Operation    Operand(s)                          Result
    Add          0+X, X+0                            X operand returned
                 0+0                                 Zero returned*
    Subtract     0-X                                 -X returned
                 X-0                                 X operand returned
                 0-0                                 Zero returned
    Multiply     0×0, X×0, 0×X                       Zero returned*
    Divide       0÷X (dividend is zero)              Zero returned*
                 X÷0 (divisor is zero;               Error condition†
                      divide by zero)

    * Zero code is returned, 0 in sign and exponent.
    † FPA informs CPU that division by zero was attempted by asserting FPA error and PSL V bit and not asserting FP SYNC.
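The zero code test and the rules of Table 2-2 can be expressed as a short Python sketch. It is illustrative only: the function names are hypothetical, the bit positions (sign in bit 15, exponent in bits <14:07> of the first 32-bit word) are an assumption taken from the standard VAX floating-point layout, and the treatment of 0÷0 as a divide-by-zero error is likewise an assumption, since Table 2-2 does not list that case.

```python
def is_zero_code(word):
    """True if a packed 32-bit floating-point word carries the zero code:
    sign bit clear and all exponent bits zero (fraction bits ignored).
    Assumes sign in bit 15 and exponent in bits <14:07>."""
    sign = (word >> 15) & 1
    exponent = (word >> 7) & 0xFF
    return sign == 0 and exponent == 0


def zero_operand_result(op, a_is_zero, b_is_zero):
    """Result named by Table 2-2 for `a op b`, where X stands for the
    non-zero operand."""
    if not (a_is_zero or b_is_zero):
        return "normal arithmetic"
    if op == "add":
        return "zero" if (a_is_zero and b_is_zero) else "X"
    if op == "subtract":
        if a_is_zero and b_is_zero:
            return "zero"
        return "-X" if a_is_zero else "X"
    if op == "multiply":
        return "zero"
    if op == "divide":
        # Assumption: any zero divisor, including 0/0, is reported as an error.
        return "error: divide by zero" if b_is_zero else "zero"
    raise ValueError("unknown operation: " + op)
```

A word such as FFFF007F hex still satisfies is_zero_code, because only the sign and exponent bits are examined and the fraction bits do not matter.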
Figure 2-4 Zero and Reserved Operand Code (TK-0517). Zero code: sign bit <15> clear, exponent bits <14:07> zero, fraction bits <31:16> and <06:00> don't care. Reserved operand code: sign bit <15> set, exponent bits <14:07> zero, fraction bits <31:16> and <06:00> don't care.

The code for reserved operand is zeros (cleared) in the exponent bits and a one (set) in the sign bit. A one in the sign bit normally indicates a minus number, so this is sometimes called minus zero. A reserved operand indicates invalid data. It indicates that data was accessed from a location that had not had data loaded into it, or a previous exception. Refer to Figure 2-4 for the reserved operand code.

2.1.5 Hidden, Overflow and Guard Bits
The FPA uses extra fraction data bits during fraction manipulation to completely represent the fraction data, to handle result overflow, and to ensure accuracy of the fraction result. Refer to Figure 2-5 for the location of hidden, overflow, and guard bits.

Figure 2-5 Hidden, Overflow, and Guard Bits (TK-0518). The figure shows the FP bus as used by the FPA: the fraction, sign, and exponent data received from the CPU, the overflow and hidden bits added by the FPA, and the lines on which the guard bits are transferred.

As discussed previously, the CPU stores floating-point numbers in a packed normalized form with the MSB of the fraction (called the hidden bit) not stored (since it is always a 1). The FPA receives the floating-point numbers in this form. To facilitate fraction calculation, logic on FNM adds the hidden bit to all CPU fraction data as it is transported over the FP buses. The hidden bit is transmitted on FP bus <32>. This means that all fraction data received by FPA fraction manipulation units have correct hidden bits.

The FPA also transmits an overflow bit between fraction manipulation units using FP bus <33>. The overflow bit handles unnormalized intermediate fraction results.
The combination (addition, subtraction, or division) of two normalized fractions can create a result greater than 1. The overflow bit enables the FPA to transmit this unnormalized result from the fraction computation units to the fraction normalizer logic (FNM).

To ensure accuracy of fractional results, the FPA data manipulation units add seven zeros called guard bits to the low order end of the fraction data they receive. This means a float fraction is 32 bits wide; a double, 64 bits wide. The POLY instruction loads extra data bits rather than zeros at the low order end of each coefficient fraction. The instruction also transfers additional low order data bits from the fraction multiply logic to the fraction add logic. These guard bits are dropped each time the POLY accumulation is normalized and rounded, but they do ensure that the final answer is accurate.

Without the guard bits, the right-shifting of a FP fraction to align radix points for addition and subtraction, or to normalize the result, would lose the least significant bits off the right end of the shifted fraction. In some cases this loss would cause the last bit of the normalized result to be wrong. The guard bits prevent this. Guard bits are transmitted between FP data manipulation units using FP bus A <14:08>. These lines normally transmit exponent data. This arrangement allows the FPA to maximize accuracy without additional hardware overhead.

2.1.6 Overflow, Underflow, Zero, and Reserved Operands
The FPA monitors all operands and results for exceptional conditions. When the FPA senses one or more of these conditions, it informs the CPU via various bits and combinations of bits. Either one or both units begin special operations designed to minimize the effect of the condition. In some cases it stops the FPA's current operation and returns the FPA to the IRD state, where all logic and registers are cleared in anticipation of a new FP instruction.
The following paragraphs discuss these various unusual conditions. Table 2-3 summarizes the FPA and CPU operations caused by the unusual conditions.

Table 2-3 Exception Conditions

    Op code: ADD, SUBT, MULT, EMOD
        Reserved operand:  FPSYNC (ACC0) clear. ERRSYNC (ACC1) set. CPU traps FPA to IRD.
        Zero operand:      Microcode simulates arithmetic operation with zero (Table 2-2).

    Op code: DIVIDE
        Reserved operand:  FPSYNC (ACC0) clear. ERRSYNC (ACC1) set. PSL V bit clear.
        Zero operand:      ZERO DIVIDEND - Microcode returns zero as result.
                           ZERO DIVISOR - Divide by zero ERROR. FPSYNC (ACC0) clear. ERRSYNC (ACC1) set. PSL V bit set.
        CPU differentiates between ZERO DIVISOR and RESERVED OPERAND by examining PSL V bit. In both cases, CPU traps FPA to IRD.

    Op code: POLY*
        Reserved operand:  FPSYNC (ACC0) set. ERRSYNC (ACC1) set. In STATUS REGISTER, MINUS ZERO ERROR bit set. CPU checks argument = RESERVED OPERAND. FPA checks coefficient = RESERVED OPERAND.
        Zero operand:      POLY microcode simulates POLY operations with zero (Table 2-2 and Paragraph 2.2.6).

    Op code: MULL
        No checking of MULL operands or results is performed by FPA software or hardware. Any combination of bits can be interpreted as an acceptable integer.

    Result: All operations handle the occurrence of zero, underflow, and overflow results similarly.*
        ZERO - The zero code and FPSYNC are sent. PSL Z bit is set.
        UNDERFLOW - Zero code, FPSYNC, and ERRSYNC are sent. PSL Z is set. If PSL U (underflow) is set, underflow causes a trap; otherwise operations continue.
        OVERFLOW - Reserved code, FPSYNC, and ERRSYNC are sent. PSL V is set. CPU traps FPA to IRD.

    * When POLY flows note a RESERVED OPERAND, UNDERFLOW, or OVERFLOW, both FPSYNC (ACC0) and ERRSYNC (ACC1) are set. CPU examines PSL and FPA STATUS REGISTER to determine exception condition. RESERVED OPERAND sets the MINUS ZERO ERROR bit. OVERFLOW sets the PSL V bit. UNDERFLOW sets PSL Z bit.

Overflow and Underflow
The FPA can handle a very large, but bounded, range of numbers.
Numbers too large (overflow) or too small (underflow) cannot be accurately handled (Figure 2-6). Special hardware monitors the results of all FPA operations for overflow and underflow conditions. The FPA checks for overflow and underflow by monitoring the exponent results. The monitoring is straightforward because of the excess 80 notation used. If the exponent with its excess 80 bias exceeds FF₁₆, an overflow has occurred. If the exponent is less than 0, an underflow has occurred.

Figure 2-6 Overflow and Underflow Ranges (TK-0521). The figure shows a number line with an overflow range beyond the most negative number (-.111... × 2^7F, approximately -1.7 × 10^38), an underflow range between the smallest negative number and the smallest positive number (magnitude approximately .29 × 10^-38) on either side of zero, and an overflow range beyond the largest positive number (.111... × 2^7F, approximately 1.7 × 10^38). Exact zero does not cause underflow.

If an overflow condition is sensed, the overflowed number is useless. The FPA manufactures a reserved operand and informs the CPU that an overflow occurred. The CPU notes the overflow and stores the reserved operand. The FPA returns to IRD.

Underflow is not as serious a problem. It merely indicates that the number is so small and so close to zero that the FPA cannot accurately represent it. If an underflow occurs, the FPA sets the underflowed number to zero and informs the CPU that an underflow has occurred by asserting both FP SYNC and ERR SYN. It is important to inform the CPU that a zero has been returned because the CPU may at some later time attempt a division by the result (division by zero results in an error).

Zero
If a zero code is encountered in an operand transmitted to the FPA from the CPU, FPA microcode simulates the special properties of addition, subtraction, multiplication, and division with zero. Refer to Table 2-2 for the result of an operation with zero. If an exact zero is generated as a result of an FPA operation, the zero code is returned to the CPU and the condition code bits are set for a zero result.
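The exponent range check described under Overflow and Underflow above can be sketched in Python. This is a simplified illustration that follows the text as written (overflow when the biased exponent exceeds FF hex, underflow when it is less than 0); the name check_result_exponent is hypothetical, and the sketch does not model how the FPA hardware actually performs the comparison.

```python
BIAS = 0x80  # excess 80 bias


def check_result_exponent(true_exponent):
    """Classify a result exponent after the excess 80 bias is applied."""
    biased = true_exponent + BIAS
    if biased > 0xFF:
        return "overflow"      # FPA manufactures a reserved operand
    if biased < 0:
        return "underflow"     # FPA returns the zero code
    # Note: a biased exponent of exactly 0 coincides with the zero code;
    # the text does not single that case out, so neither does this sketch.
    return "in range"
```

Under this reading, a true exponent of 7F hex is the largest that stays in range, and 80 hex is the first that overflows.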
Zero can be generated in a normal arithmetic add or subtract operation (equal or equal-opposite operands) or in a microcode simulated arithmetic operation with a zero operand. An operation that generates an exact zero does not assert ERR SYN like an underflow operation (although both return a zero code).

Reserved Operand
Refer to Table 2-3 for the condition codes returned to the CPU when a reserved operand is encountered by the FPA.

2.2 INSTRUCTIONS AND ALGORITHMS
This section concentrates on the microcontrol used to carry out each FPA instruction. Each instruction accesses different microcontrol addresses to correctly move and load operands, compute intermediate results, and ready the final result for return to the CPU. Special instructions check for and handle errors and exceptional conditions.

This section details the data flow between hardware required to carry out the selected instruction. It only summarizes the hardware actions started once the data has been loaded by the microcontrol. Paragraph 2.3 contains a complete and detailed description of the hardware in each FPA section. Paragraphs 2.2 and 2.3 complement each other, and both should be read to thoroughly understand how the hardware implements each FPA instruction.

As stated before, this section concentrates on data flow. Figure 2-7, FPA block diagram, shows the data bus interconnections and the various registers in the FPA. Although this figure is not specifically referenced in the discussion, it will help in understanding the data flow and should be referred to frequently.

Figure 2-7 FPA Block Diagram (TK-0638). The diagram shows the FPA sections and their interconnections: FCT (M8289, slot 28: FPA control, sign processor, and exponent processor), FNM (M8285, slot 24: fraction normalizer), FMH/FML (M8286/M8287, slots 25 and 26: fraction multiplier), and FAD (M8288, slot 27: fraction adder), connected by FP bus A <33:00> and FP bus B <33:00>, with the control store (512 × 48), the system ID bus <31:00>, and the DFMUX bus <31:00>.

During IRD (instruction decode) the FPA performs some operations that are prerequisites to many FPA instructions. The FPA assumes an R-R float instruction and begins FPA register loading. The FPA has two copies of the CPU general registers. During IRD, it receives specifier information from the IB and accesses the register addresses contained. The contents of the first specifier are placed on FPA bus A, the contents of the second on bus B. The data on bus A is loaded into AR1, LA, SA, MC1, and MP0; bus B loads BR1, LB, SB, MP1, and MCI.

AR1 and BR1 are fraction registers used for the addition and subtraction of floating-point numbers. LA and LB are loaded with the exponents of the numbers, and the hardware immediately begins an exponent difference calculation. The exponent difference and/or which exponent is larger is needed for floating-point additions, subtractions, and multiplications. SA and SB are input registers for the sign-processing hardware.

Fraction data from specifier 1 (on bus A) is loaded into multiply registers MC1 (multiplicand) and MP0 (multiplier). Fraction data from specifier 2 (on bus B) is loaded into MP1 (multiplier) and MCI (multiplicand-integer). MC1 and MP1 hold operand data for MULF and EMODF instructions.
The hardware multiply begins the MULF or EMODF fraction multiply operation during IRD using MC1 and MP1. MCI and MP0 contain the operands for a MULL instruction.

During IRD, numerous FPA instructions have been started. If the instruction is a float register-to-register, both operands are already loaded and ready in the FPA. Exponent manipulations needed for add, subtract, and multiply operations have started. MULF and EMODF fraction multiplication have started. If the instruction decoded is a MULL, the multiplier and multiplicand have already been loaded into the proper registers.

2.2.1 Add/Subtract
The FPA add/subtract operations can be broken into three states:

1. Load
2. Add/Subtract
3. Normalize.

2.2.1.1 Load - While the FPA is in IRD, it is setting up for a float, R-R operation. This means that specifiers 1 and 2 from the instruction buffer are being placed on FP buses A and B, respectively. Bus A loads AR1 (fraction register), LA (exponent register), and SA (sign latch). Bus B loads BR1, LB, and SB. When the FPA decodes a floating-point instruction, it enters A-Fork and selects a microword address based on op code and specifier types. If the instruction is a float R-R A/S, the FPA enters the optimized add/subtract execution state immediately. If, however, it is not, the FPA, under control of the selected microword, receives and stores the required data during A-Fork and possibly B-Fork flows. If it is double-precision, 32 additional fraction bits are loaded into both AR0 (extension of AR1) and BR0 (extension of BR1). If it is not an R-R operation, the new data from the correct source is loaded into AR1, LA, SA, BR1, LB, and SB.

As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or during some following microcontrol state in A-Fork or B-Fork, the exponent difference of the two operands is determined by comparing LA and LB in DALO and CALU.
Based on the exponent difference, the fraction associated with the smaller exponent is loaded into SHMX and right-shifted by ASHR until the radix points align. This happens before entering the add/subtract state.

2.2.1.2 Add/Subtract - In this state, the fractional result is computed. Based on the op codes, signs of the operands, and exponent difference, FALU operation is selected. Normally, the FALU adds or subtracts the already aligned fractions for the fractional result. Refer to Table 2-4 for normal FALU operation, and Table 2-5 for special FAD operation criteria.

Table 2-4 FALU Operation

    Op Code    Operand Sign    FALU Operation
    ADD        Same            Add
    ADD        Diff            Subtract
    SUBT       Same            Subtract
    SUBT       Diff            Add

Table 2-5 Combination of Conditions Initializing Special FAD Operation

    FALU Subtract    Exponent Diff     Precision    Op Code
    Yes              Greater than 7    X            POLY
    Yes              Greater than 1    D            POLY
    Yes              Less than 2       D            X

    X = Don't care

The special FAD operation is used to ensure maximum accuracy in the result while operating within the FPA hardware constraints. The special FAD operation involves complementing the fraction associated with the smaller exponent by subtracting the fraction from zero in the FAD, returning the complemented number to the fraction register (either AR or BR) it was in originally, and then loading it into SHFMX and right-shifting and sign-extending based on exponent difference until the radix points align. This special operation takes an extra microstep but ensures maximum accuracy. As a result, the actual fraction subtraction to produce the result does not take place until the third state.

During the add/subtract state, the larger exponent is transferred to the PR.

2.2.1.3 Normalize - In this state, the answer is readied for return to the main machine. This involves final normalization of the fraction, adjustment of the exponent, and determination of the resultant sign.
If the calculation involved special FAD operations as discussed in the previous paragraph, the fraction subtraction will first be carried out and then the result will be readied for return to the main machine.

When entering the normalization flows, the FPA checks three conditions:

1. Exponents equal zero
2. FALU subtract with exponent difference less than two
3. Subtract, exponent difference less than 7, and DP.

If a zero operand is noted, the other (non-zero) operand is transferred to the output and, if it is the subtrahend in a FALU subtraction, its sign is complemented (minuend - subtrahend = remainder; 0 - X = -X).

A FALU subtraction with an exponent difference of 1 or 0 initiates special flows because the subtraction of two nearly equal numbers can result in a very small fraction (numerous leading zeros), which might require many shifts before the first significant bit is located. The special flow can shift the result up to sixty places to find the first significant bit before the result is transferred to the standard normalize routine. If a first significant bit is not found after 60 bits have been shifted, a zero is readied as the result.

If the third branch is taken, the addition state described in Paragraph 2.2.1.2 results, then flow reenters the normalization routine.

Usually, the unnormalized result requires a shift of four places or less. If this is the case, the four MSBs are examined to locate the first significant bit. Based on the location of the first significant bit, a rounding byte is added to the fraction. If the result from a FALU subtraction is negative, the FALU result is subtracted from the rounding byte to return the number to sign-magnitude notation and round it in a single step. Once the FALU result is added to or subtracted from the rounding byte, the fraction is shifted and the least significant bits are dropped.
In all cases, the number of shifts required to ready the fraction for return to the CPU is computed and is used to adjust the exponent in the PR. Once completed, the exponent, the normalized fraction, and the sign of the result are placed on FP bus A. When the complete result is on the bus, standard routines handle the actual transfer to the main machine.

2.2.2 Multiply (Floating-Point)
The FPA multiply operation can be broken into three operations: load, multiply, and normalize. In the process of carrying out a floating-point multiply, the FPA receives the operands (each consisting of exponent, fraction, and sign bits); checks for zeros and reserved operands; loads the exponent, fraction, and sign bits into the appropriate registers; starts the hardware to carry out the required calculations; and assembles and readies the result for return to the CPU when notified that the hardware calculation is finished.

2.2.2.1 Load - To maximize speed, the FPA is continuously setting up for a float R-R operation. This means that in IRD, specifiers 1 and 2 from the instruction buffer are addressing the GPRs (general-purpose registers) in the CPU, and the register data is being placed on FP buses A and B, respectively. Bus A loads MC1 (multiplicand register), LA (exponent register), and SA (sign latch). Bus B loads MP1 (multiplier register), LB, and SB.

When the FPA decodes a floating-point instruction, it enters A-Fork and branches to a specific microword based on op code and specifier types. If the instruction is a float R-R multiply, the operands are already loaded and the FPA enters the multiply state immediately. If it is not, the FPA, under control of the selected microword, receives and stores the required data during A-Fork and possibly B-Fork flows. If it is a double-precision multiply, 32 additional fraction bits are loaded into both MC0 (extension of MC1) and MP0 (extension of MP1).
If one or both of the specifiers are not registers, all new data will be loaded into MC1, LA, SA, MP1, LB, and SB.

As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or during some following microcontrol state, the fraction multiplier begins the fraction multiply by breaking the fractions into nibbles and beginning the hardware multiplication using the first multiplier nibble.

2.2.2.2 Multiply - In the multiply state, the fraction multiplication continues until a final fraction (as yet unnormalized) is computed, the exponents are added, and the sign of the result is computed.

The fraction multiplication is initiated when the multiply flows issue MCONT (multiply continue). As MCONT is issued, the FPA checks for operands equal to zero or minus zero (reserved operand). If a zero operand is found, computation stops and the FPA immediately returns a zero to the base machine. If a reserved operand is found, the operation aborts. If neither is found, computation continues.

In the case of a float (single-precision) multiply, the fraction multiplication is completed as the exponent calculation is completed. The product is transferred to the NR. In a double-precision multiply, the microcontrol enters a wait state. While waiting, the FPA continually transfers the output of the fraction multiplier to the normalizer. This enables the FPA to begin normalizing the fraction result as soon as the multiplication is complete. The FPA remains in the wait state until a hardware counter in the fraction multiply logic asserts MUL/DIV DONE, indicating that the fraction multiply is complete.

While the fraction multiply and the check for zeros and reserved operands are taking place, the exponents are added. If no zeros or reserved operands are found, the fraction multiply and exponent processing continue.
After the exponents are added, a bias of 200 (octal), or 80 (hex), is subtracted from the exponent result to return the exponent to excess 80 notation (refer to Paragraph 1.5). In a multiply operation, the sign of the result is the exclusive-OR of the operand signs.

By the time the fraction multiply is complete, the exponents have been added, the exponent bias has been subtracted, and the sign of the result has been calculated. The result of the fraction multiply is moved to NR.

2.2.2.3 Normalize - The normalize state of a floating-point multiply is very simple. Since the input operands are always between 1/2 and 1, the result is always between 1/4 and 1. This means that the result can be normalized with a single shift of four bits or less. In the normalize state, the fraction is rounded and shifted, and the exponent is adjusted to reflect the normalization shift. The normalized fraction, adjusted exponent, and sign bit are placed on FP bus A. Once the complete result is on the bus, standard routines handle the actual data transfers to the main machine.

2.2.3 MULL (Multiply Integer Longword)
The FPA's MULL algorithm is the simplest and most straightforward of all the operation flows. The FPA receives two 32-bit signed integers, performs an unsigned multiplication, and returns the 64-bit answer to the base machine. The FPA performs no result normalization and no checks for reserved operands, zero operands, or other error conditions. Microcode in the base machine generates the condition codes and handles all the checks and manipulations required to ensure a correct result.

2.2.3.1 Load - As discussed in introductory Paragraph 2.2, the FPA during IRD loads MP0 and MC1 (the two registers used in MULL operations) with the register contents of specifiers 1 and 2, respectively. If the instruction decoded in the A-Fork flows is an R-R MULL, the FPA can begin the multiply immediately.
If it is a MULL but not an R-R, the FPA will, under the control of the selected microaddress, load data from the correct source into either or both of MP0 and MC1.

2.2.3.2 Multiply and Return - The decoding of a MULL causes the fraction multiply hardware to abandon set-up of a MULF and begin accessing the registers used for MULL (MC1 and MP0). When the proper data has been loaded, MCONT is issued by the FPA. This indicates to the fraction multiply hardware that the correct data is in MP0 and MC1, and that the data accesses started previously were accessing correct data. MCONT enables the fraction multiply hardware to continue multiplying.

The multiply continues, controlled by a hardware sequencer within the fraction multiply hardware, while the FPA waits two machine cycles. The answer accumulates in ACCM and LSH. After two wait cycles, the multiply is finished. The hardware stops and the FPA makes the 32 low-order bits (from LSH) available to the CPU. When the CPU responds with CPSYNC, indicating that the low-order bits have been stored, the FPA readies the high 32 bits from SALU for transmission to the CPU.

2.2.4 Divide
The FPA divide operation can be broken into three steps: load, divide, and normalize. To do a floating-point divide, the FPA receives the operands (each consisting of sign, fraction, and exponent bits), loads the operands into holding registers, transfers the operands from the holding registers into the correct division registers, starts the hardware to do the fraction division, checks for zero and reserved operands, starts the hardware to store the result, and normalizes and packs the result for return to the CPU.

2.2.4.1 Load - The loading of division operands takes place in two substeps: data fetch, and division register load. Unlike the FPA add/subtract, multiply, and MULL operations, the FPA does not load division operands into the proper division registers during IRD (Table 2-6).
Table 2-6 The Division Load

IRD
  Specifier 1: Register and float assumed (divisor). Register data to AR1, LA, SA.
  Specifier 2: Register and float assumed (dividend). Register data to BR1, LB, SB.

Data Fetch Substep (op code decoded, specifiers and precision known)
  Specifier 1: New data loaded into AR1 and AR0*, LA, and SA, if needed.
  Specifier 2: New data loaded into BR1 and BR0*, LB, and SB, if needed.

Division Register Load Substep (2 microwords)
  1st microword - Move LA (divisor exponent) to XR. Move BR (dividend fraction) to NR.
  2nd microword - Move AR (divisor fraction) to the just-vacated NR. Move NR (dividend fraction) to RR and right-shift the just-loaded dividend fraction to compensate for RR's hard-wired left shift. This right shift ensures that the initial dividend is properly represented. Subtract XR (divisor exponent) from LB (dividend exponent).

*AR0 and BR0 are fraction extension registers for double-precision operations.

During IRD, an R-R float operation is assumed. This means that both specifiers 1 and 2 are assumed to be registers. The contents of the first register named are placed in AR, LA, and SA; the contents of the second in BR, LB, and SB. If the operation decoded is an R-R float divide, the data fetch substep is done and the division register load may begin. However, if it is not an R-R float divide, the microcode waits for data from the correct specifier and loads it into AR1, LA, and SA, and/or BR, LB, and SB. When the divisor is in AR, LA, and SA, and the dividend is in BR, LB, and SB, the data fetch substep is finished.

The division register load substep loads the divisor's and the dividend's fraction and exponent components into the registers required to do a division. The loading of the proper registers takes two microcode steps. The first microcode step loads the divisor exponent into XR and loads the dividend fraction into the NR.
The second microcode step finishes the register loading by moving the dividend fraction (in the NR) to the RR and loading the just-vacated NR with the divisor fraction from the AR. It also starts the fraction division hardware, checks for zeros and reserved operands, and subtracts the divisor exponent (XR) from the dividend exponent (LB) (LB - XR).

2.2.4.2 Divide - The divide operation continues unless a zero or reserved operand is found. If a zero dividend is found, operations cease and a zero is readied for return to the CPU. Finding a zero divisor or a reserved operand initiates error states. The FPA will remain in these error states until returned to IRD by a CPU signal.

If no zeros or reserved operands are found, the division continues. A bias of 80 (hex) is added to the result of the exponent subtraction to return it to excess 80 notation (Paragraph 1.5). The fraction multiply hardware is started. This hardware is used to store the result of the fraction division as it is generated.

The division continues under hardware control as the FPA microcode remains in a divide wait loop. The hardware uses the restoring, repeated-subtraction technique to divide. The dividend is initially loaded into the RR and the divisor is stored in the NR. The divisor (contents of NR) is subtracted from the dividend (contents of RR). If the result is negative, a zero is left-shifted into the result register in the fraction multiply hardware and the contents of the RR are left-shifted by one. If the result is positive or zero, a 1 is left-shifted into the result register, and the result is loaded into the remainder register left-shifted by one. The divisor (contents of NR) is continually subtracted from the contents of the RR until 26 bits (58 bits for double precision) of quotient are generated. MUL/DIV DONE is now asserted. Asserting MUL/DIV DONE stops the division and ends the divide wait loop.
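
The restoring, repeated-subtraction loop just described can be modeled in a few lines of software. The following is only an illustrative sketch, not the FPA hardware or microcode: the function name restoring_divide, its parameters, and the use of plain Python integers for the RR, NR, and result register are assumptions made for the example, and the compensation for RR's hard-wired left shift is not modeled.

```python
def restoring_divide(dividend: int, divisor: int, quotient_bits: int = 26) -> int:
    """Model of restoring division on unsigned fractions (hypothetical helper).

    dividend and divisor are fractions held as integers with the same scaling.
    Returns quotient_bits of quotient; the first bit has weight 2**0, so the
    value equals floor(dividend * 2**(quotient_bits - 1) / divisor) as long as
    dividend < 2 * divisor (true for normalized operands).
    """
    if divisor <= 0:
        raise ValueError("divisor must be non-zero")
    remainder = dividend          # plays the role of RR
    quotient = 0                  # plays the role of the result register
    for _ in range(quotient_bits):
        trial = remainder - divisor          # divisor plays the role of NR
        if trial < 0:
            # Negative: shift in a 0 and keep (restore) the old remainder.
            quotient = quotient << 1
            remainder = remainder << 1
        else:
            # Positive or zero: shift in a 1 and keep the difference.
            quotient = (quotient << 1) | 1
            remainder = trial << 1
    return quotient


# 0.75 / 0.5 with 24-bit fractions: the quotient 1.5 appears as binary 1.1000...
q = restoring_divide(0xC00000, 0x800000)
print(bin(q))
```

Passing quotient_bits=58 models the double-precision case, under the same assumptions.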
The divide result is transferred from the fraction multiply hardware, where it was stored during generation, to the normalize register (NR) in the normalize hardware.

2.2.4.3 Normalize - Since the two initial operands are normalized (between 1/2 and 1), the result is always positive and between 1/2 and 2. This means that the normalize and round operation is simple and takes only one microstep. The result is examined, a round byte is selected and added, and the data is shifted as needed to produce a normalized result. The exponent result is adjusted to reflect the direction and amount of the fraction shift. The normalized fraction, adjusted exponent, and sign bit are placed on the FP bus(es). Once the result is on the bus(es), standard storage routines handle the actual transfer to the CPU.

2.2.5 EMOD (Extended Precision Multiply and Integerize)
The EMOD operation is partially done in the FPA. The FPA performs an unsigned 32 X 24-bit (64 X 56-bit for double precision) multiplication and returns the fraction result to the main machine. The main machine does all further processing. The FPA EMOD operation can be broken into two steps: operand load, and result calculation and return.

2.2.5.1 Operand Load - Loading the EMOD operands involves loading the multiplicand, an 8-bit multiplicand extension, and the multiplier into the proper registers. The multiplicand (either single or double precision) is loaded into MC during A-Fork. In B-Fork, EMOD flows are started. These flows wait for the CPU to fetch the multiplicand extension (8 bits) and transmit it to the FPA via the ID bus. The FPA loads the extension into MCX, which is part of the MC1 register. The second operand is then transmitted to the FPA and loaded into the appropriate multiplier registers, MP0 and MP1. The multiplier is not extended. The FPA receives and stores the exponent and sign associated with both operands but does not use them.
2.2.5.2 Result Calculation and Return - Once the operands are loaded, MCONT is asserted and the EMOD multiply begins. The operands are tested for zeros and reserved operands. If zeros are found, special flows stop the multiply and return a zero to the CPU. Finding reserved operands initiates error flows. If no exceptions are found, the multiply sequencer, started by the assertion of MCONT, continues multiplying. A single-precision (float) multiply is finished in one microstep after the exponent test. A double-precision multiply causes the FPA to enter a wait loop. It remains in the wait loop until the multiply sequencer asserts MUL/DIV DONE, indicating that the result is computed.

When the result computation is finished, the fraction (32 bits for float, 64 bits for double) is transmitted to the CPU. The CPU does all further processing, including sign computation, removal of the integer part, normalization, and exponent calculation.

2.2.6 POLY (Polynomial Evaluation)

2.2.6.1 Introduction - POLY is an FPA-implemented instruction. The FPA does the majority of the calculations required to evaluate a polynomial expression. This involves storing a constant and an accumulation; receiving coefficients; performing repeated additions and multiplications using the constant, the accumulation, and the new coefficient; and readying a final result to be returned to the CPU. It also uses specialized operations (both hardware and microcode) to ensure maximum accuracy within the FPA hardware limits.

The following paragraphs explain the polynomial expression and define various terms, then describe the POLY flows and POLY exceptions in detail. Also discussed are the numerous flows required to handle errors, underflows, overflows, and zeros.

2.2.6.2 The Polynomial Expression - The generalized polynomial may be written:

f(x) = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n.

The x, a constant within each polynomial, is called the argument and is raised to various powers: x^1, x^2, x^3, ..., x^n.
The highest power, represented here by the superscript n, is called the degree of the polynomial. The a_0, a_1, a_2, ..., a_n are the coefficients. Rearrangement and factoring produces

f(x) = a_0 + x(a_1 + x(a_2 + ... + x(a_{n-1} + x a_n))).

The result, f(x), may be computed as follows: a_n times x, then add a_{n-1}; the resultant answer times x, then add a_{n-2}; and so on. The generalized form is: (accumulation times x) plus the new coefficient, a_i, equals the new accumulation.

The POLY instruction format is POLY argument, degree, coefficient table. The FPA receives and stores the argument. The CPU uses the degree operand to determine when it has accessed the last coefficient of the table so that it may inform the FPA that the POLY calculation is done. The coefficient table is arranged in a_n, a_{n-1}, a_{n-2}, ..., a_1, a_0 order. The CPU transmits the coefficients to the FPA as needed: a_n first, a_{n-1} next, and so on.

2.2.6.3 Normal POLY Flows - The FPA begins special POLY flows in B-Fork. The POLY argument is transferred to the FPA during A-Fork and then loaded into the argument registers. The argument fraction is loaded into MP, the exponent into XR, and the sign into SX. The argument remains in these registers throughout POLY execution. The FPA waits for the first coefficient to be sent so that the POLY computation can begin. POLY computation can be divided into three large categories:

1. Argument and First Coefficient Handler
2. Generalized POLY Computation (neither first term nor last term)
3. POLY DONE Handler (handles A0, the last coefficient).

This section discusses the flow by these three categories. Within each category, microcode controls the normal operations, checks for exceptional conditions, and attempts to recover from any exceptional conditions. Refer to Figure 2-8 for a summary of the POLY flow.
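
The accumulation rule of Paragraph 2.2.6.2 (accumulation times x, plus the new coefficient) and the three flow categories listed above can be illustrated with a short software sketch. This is not the FPA microcode: the function name poly and the use of ordinary Python floats in place of VAX floating-point registers are assumptions made for the example, and no exception handling (zeros, reserved operands, overflow, underflow) is modeled.

```python
def poly(argument, coefficients):
    """Evaluate a polynomial by the factored (nested) form.

    coefficients is in coefficient-table order: a_n, a_(n-1), ..., a_1, a_0.
    """
    if not coefficients:
        raise ValueError("coefficient table must contain at least a_0")

    # Argument and first coefficient handler: a_n starts the accumulation.
    accumulation = coefficients[0]

    # General flow: for each further coefficient, multiply the accumulation
    # by the argument, then add the new coefficient. The last pass through
    # the loop corresponds to the POLY DONE handler, which adds a_0.
    for coefficient in coefficients[1:]:
        mult_result = accumulation * argument
        accumulation = mult_result + coefficient

    return accumulation


# 3x^2 + 2x + 1 evaluated at x = 2
print(poly(2.0, [3.0, 2.0, 1.0]))
```

With an argument of zero, every multiply result is zero and the function returns the last coefficient, which matches the behavior the manual describes for a zero argument.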
Figure 2-8 POLY Flow (flowchart summarized as text). POLY begins with the argument in AR, LA, and SA.

First Coefficient Handler
  * Move argument to registers: MP <- AR (argument fraction), XR <- LA (argument exponent), SX <- SA (argument sign).
  * If the argument is zero, flow remains in this handler waiting for the last coefficient, which will be flagged by POLY DONE.
  * Wait for first coefficient.
  * Move coefficient to registers: MC,BR <- A(N) (coefficient fraction), LB <- A(N) (coefficient exponent), SB <- A(N) (coefficient sign), SA <- SB (transfer coefficient sign).
  * Multiply coefficient and argument forming mult. result: AR <- MP*MC (multiply fractions), LA,PR <- XR+LB-128 (add and adjust exponents), SA <- SA.XOR.SX (compute sign).
  * If overflow/underflow, enter general POLY flows attempting a recovery.

Last Coefficient Handler (POLY DONE asserted and argument or degree = 0; answer is just the last coefficient)
  * Ready coefficient for return: PR <- LB (transfer exponent), NR <- BR (transfer fraction), SA <- SB (transfer sign).
  * Go to regular store flows: NSHF <- NR (transfer fraction); assert FPSYNC indicating answer is ready.

General POLY Flows (no POLY DONE; normal entry or overflow/underflow entry)
  * Wait for coefficient.
  * Move coefficient to registers: BR <- A(I) (coefficient fraction), LB <- A(I) (coefficient exponent), SB <- A(I) (coefficient sign).
  * Add coefficient and mult. result forming accumulation: NR <- AR+BR (add fractions), PR <- MAX(LA,LB) (select exponent), MC <- NR (normalized), PR <- PR (normalized), SA <- SR (sign of accumulation).
  * If overflow, error. If underflow, accumulation is set to zero.
  * Multiply accumulation and argument forming mult. result: AR <- MP*MC (argument * accumulation), PR <- PR+XR-128 (add and adjust exponents), SA <- SA.XOR.SX (compute sign).
  * If overflow/underflow, continue general POLY flows attempting a recovery.

Last Coefficient Handler (POLY DONE asserted)
  * Wait for coefficient.
  * Move coefficient to registers: BR <- A(I) (coefficient fraction), LB <- A(I) (coefficient exponent), SB <- A(I) (coefficient sign).
  * Add coefficient and mult. result forming accumulation: NR <- AR+BR (add fractions), PR <- MAX(LA,LB) (select exponent).
  * If overflow, error.
  * Go to regular normalize flows: NSHF <- NR (normal fraction), PR <- PR (adjust exponent), SA <- SR (sign of result); assert FPSYNC indicating answer is ready.

Within the flows, different microcode handles float and double-precision operation. In POLY double, the coefficient, argument, and accumulation fractions each use an additional 32 low-order bits. The differences between float and double precision are not discussed for each operation because they are normally limited to longer fraction multiply times and slower fraction transfers. These come about because there are more bits to be multiplied and moved.

When the first coefficient, An, is sent, it is loaded in MC, LB, and SB. Since the argument has not yet been checked, both the argument and the coefficient are checked for exception conditions, and POLY DONE is checked. If any exception condition is noted, special flows are accessed. POLY DONE asserted indicates that the coefficient just sent was the final coefficient (in this case, the first coefficient is also the last coefficient). If the argument (x) is zero, all terms of the polynomial except the A0 term will be zero. Either POLY DONE asserted or x equal to zero causes the FPA to access a special last-coefficient routine in the argument and first coefficient handler that returns A0 to the CPU as the result of the polynomial calculation.

After both the argument and the coefficient are checked and no exception conditions are found, the first multiply takes place. While the fractions are multiplied in the fraction multiply logic (FML and FMH), the exponents are added and adjusted to return to excess 80 notation (FCT), and the sign of the result is computed (FCT).
When the multiply is done, the fraction is moved to AR for the addition operation. To maximize calculation accuracy, no normalization is performed after the multiplication, and 8 additional low-order fraction bits are transferred to the AR register and stored in ARX. These 8 bits are used when the new coefficient is added to the multiplication result to produce the new accumulation.

While the multiplication fraction result is being transferred to AR, the exponent result is checked for exponent overflow or underflow. If no overflow or underflow is found, the addition will begin as soon as the new coefficient data is ready. If, however, overflow or underflow is sensed, special flows that attempt to recover from the overflow or underflow are accessed (Paragraph 2.2.6.4).

While the new coefficient data is checked for zero and/or reserved operands, the addition/subtraction begins on the assumption that the coefficient data will be valid. The exponent difference hardware selects the larger exponent for processing by the FCT and loads it into PR. It also shifts the fraction associated with the smaller exponent and loads it into the B-input of FALU. FALU then adds or subtracts the fractions. When the coefficient data proves valid, the computed fraction result is transferred to NR, where it can be normalized.

The fraction normalization takes place in the FNM logic. A rounding byte is added and the result is shifted until normalized. The exponent is adjusted based on both the rounding byte and the number of shifts required to normalize the fraction. The normalized fraction is moved to MC and a multiply with the stored argument (x) begins.

Once the first multiply is completed, the POLY calculation is in the general POLY flow.
These flows multiply the result of the last add and normalize by the argument (x), receive a new coefficient from the CPU, check it for exceptional conditions, add it to the result of the multiply operation, normalize the result of the addition, and ready it for the next multiply. The general POLY flows check the intermediate results for overflow, underflow, and zeros, and access special flows if an exception is found.

The general POLY flow continues until the CPU sends a coefficient flagged with POLY DONE rather than CP SYNC. This indicates that the coefficient just transmitted is the final coefficient in the table. The POLY DONE flow adds the final coefficient and then accesses the normalization flows in the FPA addition flows. These flows round and normalize the fraction and adjust the exponent based on the rounding byte and normalization shift. Once the result is complete, it is placed on FP bus A and standard routines handle the transfer to the CPU.

2.2.6.4 POLY Exception Flows - The POLY flows have many special sections to check for and handle exceptional conditions. Each coefficient is checked for zeros and reserved operands. The POLY argument is checked for zero. The CPU checks the argument and degree for reserved operand. The FPA also checks the intermediate results for underflow, zero, and overflow. If an underflow or overflow is detected, special flows attempt to recover from the condition without a loss of accuracy.

The exception flows (zero, reserved operand, overflow, and underflow) can be divided into three categories, according to when the exception is discovered:

1. First coefficient and argument handling
2. General coefficient handling
3. POLY DONE (final coefficient) handling.

Within each category, different microcode handles float and double-precision operation. However, there is little difference between the exception procedures used in each category and only minor differences in the microcode.
As a result, each individual exception flow is not discussed; rather, the microcode procedure for each type of exception is explained.

Zeros
The argument and each coefficient are checked for zeros. The argument and first coefficient are checked for zeros at the start of the POLY flow. If the argument (x) is zero, all the terms of the polynomial will be zero except A0, the last coefficient. With the argument equal to 0, the FPA will remain in the first coefficient loop waiting for the last coefficient (flagged by POLY DONE). When it is received, it will be tested for reserved operand and, if not reserved, will be returned to the CPU as the result of the polynomial.

If the first coefficient is zero, the accumulation registers will be set to zero and the FPA will wait for the next coefficient. If a zero is found as a subsequent coefficient (when the current accumulation is not zero), the current accumulation, which is unnormalized, will be rounded and normalized, and the FPA will wait for the next coefficient.

Reserved Operand
Each coefficient is checked by FPA hardware for reserved operand. If a reserved operand is found, the POLY operation is immediately aborted and the accelerator error bit is set. The argument is not checked for reserved operand by the FPA because it is checked in the CPU and, if found to be reserved, the POLY operation never starts in the FPA.

Overflow
The FPA checks for overflow by examining the exponent bits PR8 and PR9 in the PR register. If PR8 (the overflow bit) is high and PR9 is low, an overflow has occurred. The FPA checks each current accumulation two times per cycle for an overflow condition: once when the unnormalized multiplication result is readied for adding the new coefficient, and once after the addition result has been rounded and normalized. If an overflow is detected in the second instance (normalized addition result overflow), the FPA will abort.
The FPA will set the PSL V (overflow) bit and wait until the CPU traps it back to IRD.

If the unnormalized multiplication result overflows, the FPA accesses overflow routines in an attempt to recover an accurate result from the overflow. The FPA microcode is written on the assumption that if the new coefficient exponent is subtracted from the current overflow, the result may be small enough that the exponent will no longer overflow (PR8 will be low). As stated before, PR8 is high. This means that the exponent in PR is 10XXXXXXX (9 bits long). Since the exponent difference taker EALU is only 8 bits long, the overflowed exponent must be scaled down. The FPA subtracts 80 (hex) to scale it down.

The new coefficient is first checked for zero or reserved operands. A reserved operand causes an abort. If the coefficient is zero, it will not change the overflow. The FPA will attempt to recover from the overflow by first adding back the 80 (hex) to return the exponent to its correct value, then normalizing and rounding. If this fails, the FPA will set the overflow bit and abort.

If the new coefficient is not zero or reserved, the operation continues. The FPA subtracts 80 (hex) from the exponent of the coefficient to scale it down. The reduced coefficient exponent is checked for underflow. If an underflow is sensed, the coefficient is effectively zero when compared with the accumulation. Since the coefficient is effectively zero, the FPA will attempt to recover from the overflow by first adding back the 80 (hex) to return the exponent to its correct value, then normalizing and rounding. If this fails, the FPA will set the overflow bit and abort.

If the reduced coefficient did not underflow, the coefficient can affect the accumulation and possibly recover it from the overflow condition. In the accumulation overflow flows, the accumulation is known to be the larger number. Therefore, no checks are performed on the exponents to find the larger number.
The exponent difference taker then subtracts the two scaled-down exponents to determine how many times the coefficient must be shifted to align the radix points. The accumulation fraction is moved through ADER MUX to FALU, and the restored (80 hex added) accumulation exponent is moved to PR for processing. The POLY add/subtract then takes place. The fraction result is moved to NR, where it is normalized and rounded. The result exponent (formerly the accumulation exponent) is adjusted based on the fraction normalization and rounding. The result is checked for overflow and underflow. As stated at the beginning of this overflow section, an overflow after the normalization and rounding operation will cause the FPA to assert the overflow (V) bit and abort.

Underflow
The FPA can handle numbers as small as 0.29 X 10^-38. A number smaller than this causes an underflow. The FPA checks for underflow by examining the exponent register PR. PR9 will be high or PR<8:0> will be low in an underflow.

Underflow is not as serious a fault as overflow. An underflow means that the result just checked is so close to zero that the FPA cannot accurately represent it. When an underflow is encountered, the FPA sets the ACC ZDATA bit and special flows attempt to recover the number. If the underflow result cannot be recovered, the number is set to zero and FPA operation continues. After the POLY operation is completed, the CPU will trap on underflow if bit 6 (floating underflow) of the PSL is set.

The FPA checks for accumulation underflow twice per POLY cycle: once as the unnormalized multiplication result is readied for the following addition, and once after the result of the addition has been normalized and rounded. If an underflow is detected in the normalized addition result, no result recovery is possible. The FPA merely sets the accumulation to zero, informs the CPU of the underflow, and continues the operation.
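
The PR8/PR9 overflow test and the PR9 or PR<8:0> underflow test described above can be illustrated with a small model of the exponent arithmetic. This sketch rests on assumptions that are not stated in this section: PR is treated as a 10-bit register whose arithmetic wraps in two's complement, "PR<8:0> will be low" is read as all nine bits being zero, and the names multiply_exponent and classify_exponent are hypothetical.

```python
EXCESS_BIAS = 0x80   # excess 80 (hex) exponent bias
PR_MASK = 0x3FF      # assumption: PR holds 10 bits, PR<9:0>


def multiply_exponent(exp_a: int, exp_b: int) -> int:
    """Add two biased exponents and remove one bias, as in a multiply."""
    return (exp_a + exp_b - EXCESS_BIAS) & PR_MASK


def classify_exponent(pr: int) -> str:
    """Classify a PR value as 'underflow', 'overflow', or 'ok'."""
    pr &= PR_MASK
    pr9 = (pr >> 9) & 1
    pr8 = (pr >> 8) & 1
    if pr9 or (pr & 0x1FF) == 0:
        # PR9 high (result went negative) or PR<8:0> all zero.
        return "underflow"
    if pr8:
        # PR8 high with PR9 low: exponent exceeds 8 bits.
        return "overflow"
    return "ok"


for a, b in [(0x81, 0x81), (0xFF, 0xFF), (0x01, 0x01)]:
    pr = multiply_exponent(a, b)
    print(hex(a), hex(b), hex(pr), classify_exponent(pr))
```

Under these assumptions, the largest possible exponent sum (FF + FF - 80 hex) sets PR8 with PR9 clear, while the smallest (01 + 01 - 80 hex) wraps negative and sets PR9.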
If an underflow is detected after the multiplication, special flows are accessed to save the result. In an underflow, the exponents of both the accumulation and the coefficient must be scaled up so the exponent difference can be taken with an 8-bit exponent processor. The scale factor is 80₁₆. The coefficient is first checked for zero or reserved operands. A reserved operand causes an abort. A zero coefficient will not change the underflow, so the FPA will try to recover by normalizing and rounding. If this fails, the accumulation will be cleared (set to zero) and the FPA operation continues.

If the new coefficient is not zero or reserved, the operation continues. The FPA adds 80₁₆ to both exponents to scale them up. If the coefficient exponent overflows when it is scaled up, the coefficient is so much larger than the accumulation that the accumulation will not affect the coefficient. The FPA will disregard the accumulation and make the new coefficient the accumulation by subtracting the 80₁₆ just added to the coefficient exponent and moving the coefficient to the registers formerly holding the underflowed accumulation.

If the new coefficient does not overflow, it shows that the coefficient can affect the accumulation, and the exponent difference taker determines the exponent difference. Since the coefficient is the larger number, the coefficient fraction is moved through the ADER MUX to the FALU and the coefficient exponent is stored in PR after the bias previously added is removed. The accumulation fraction is shifted based on the exponent difference until the radix points align, and is then added or subtracted. The result is rounded and normalized in the normalize logic. The coefficient exponent (stored in PR) is adjusted based on the fraction normalization and rounding, and becomes the accumulation exponent. The rounded result is checked for underflow. If underflow is detected, the ACCZ bit is set and a zero is stored.
The FPA informs the CPU that an underflow has occurred by asserting both FP SYNC and ERR SYNC. In any case, the polynomial operation continues.

2.3 BLOCK DIAGRAM AND UNIT DESCRIPTION

This section provides a functional description of each area of the FPA with relation to the control store and instruction execution. Discussions of logic unit operations are included for areas that require further clarification.

The FPA can be divided into three areas. The first area contains two interface sections: the CPU-FPA interface and the FPA internal buses (which interface between the various sections of the data manipulation area). The second area, data manipulation, contains five sections: Fraction Adder/Subtractor, Fraction Normalizer/Divider, Fraction Multiplier, Exponent Processor, and Sign Processor. Each section in this area operates as an independent unit, capable of processing data in parallel with operations being performed in other sections. The third area contains only the Control Store and Logic, which controls both interfacing and data manipulation. Refer to Figure 2-9, the FPA Block Diagram.

Figure 2-9 FPA Block Diagram
[Diagram TK-0538. Modules shown: FCT (M8289, slot 28), FPA control, sign processor, and exponent processor; FNM (M8285, slot 24), fraction normalizer; FM H/FM L (M8286/M8287, slots 25 and 26), fraction multiplier; FAD (M8288, slot 27), fraction adder; control store, 512 x 48; BUS FP A <33:00> and BUS FP B <33:00>; CS BUS <95:00>; ID BUS <31:00>; BUS DFMUX <31:00>.]

The CPU transmits both data and instructions to the FPA. The instructions are decoded in the Control Store and Logic and access an FPA control store word.
The FPA control store word controls the transfer of the data on the FPA internal buses and the operation of the various data manipulation sections. The data manipulation sections perform the required operations. The resulting answer is formatted and sent to the CPU-FPA interface. A signal from the FPA informs the CPU that the answer is available at the interface.

Each of the eight sections mentioned in this introduction is discussed individually in the following paragraphs. Each discussion includes an explanation of pertinent control store fields and a description of the hardware operation as controlled by the control store, CPU instruction, data characteristics, and both internal and external flags.

2.3.1 CPU-FPA Interface

The CPU and FPA have numerous interconnections. They exchange data, instruction information, device control signals, and status information over buses and individual signal lines. There are three types of information transferred via the CPU-FPA interface.

1. CPU-FPA control and status
2. Data
3. Trap and diagnostic information.

They will be discussed in this order in the following paragraphs. Refer to Figure 2-10 for a summary of the CPU-FPA interface.

Figure 2-10 CPU-FPA Interface
[Diagram TK-0520. Connections shown between CPU and FPA: ID bus (register #16, maintenance; register #17, status), CS bus, op code information, machine clocks, FP SYNC, ACC ERROR, general register address lines, DFMX bus, C, V, Z, and N bits, execution point counter.]

2.3.1.1 CPU-FPA Status and Control Interface - The FPA and CPU work interactively. This means they are constantly exchanging status and control information, and that operations in one unit can and do affect operations in the other unit.

The status register (ID register 17) provides some CPU control of the FPA. Bit 15 of the status register is used by the CPU to enable the FPA. The CPU can disable all FPA outputs and effectively remove the FPA from the computing system by clearing bit 15.
Refer to Figure 2-11 and Table 2-7 for a complete description of this register.

Figure 2-11 Status Register
[Diagram TK-0514. Status register, ID register #17: bit 31, ACC ERROR; bits 30-28, zero; bit 27, MINUS ZERO ERROR; bits 26-16, zero; bit 15, ACC EN; bits 14-4, zero; bits 3-0, ACC TYPE (0001).]

Table 2-7 The Status Register

Bit No.  Name                       Bit Access          Function
31       Accelerator Error          Write by FPA        Set when FPA detects an exception
         (also called ACC ERROR     Read by CPU         condition.
         and Error Sync)
30-28    Not used (set to zero)
27       Minus Zero Error           Write by FPA        Set when FPA encounters a reserved
                                    Read by CPU         operand or generates an overflow.
                                                        Setting this bit sets Accelerator Error.
26-16    Not used (set to zero)
15       Accelerator Enable         Write by CPU        When clear, all FPA outputs are
                                    Read by FPA         disabled. This removes the FPA from
                                                        the computing system. Must be set for
                                                        normal FPA outputs.
14-4     Not used (set to zero)
3-0      Accelerator Type           Read by CPU         A hardwired code identifies the type
                                    Hardwired in FPA    of accelerator installed in the
                                                        backplane slots. The FPA code is 0001.

The FPA also receives control and status information from the CS bus. The functions of these lines are summarized in Table 2-8.

Table 2-8 CS Lines

CS BUS
71  70   Name           Function
0   0    NOP
1   0    ACC TRAP       Initiates an Accelerator trap. Refer to Paragraph 2.3.1.3.
0   1    CP SYNC        Indicates CPU has received FPA data or CPU is presenting
                        valid data to FPA.
1   1    Redefine µSI   Decodes CS lines 57, 56, and 55 for more information.

CS BUS
57  56  55   Name       Function
1   1   0    Poly End   Indicates last term of polynomial has been transmitted
                        from CPU.
1   1   1    FP TRAP    Initiates an FPA trap. Refer to Paragraph 2.3.1.3.

Op code information (operation and precision) is transmitted to the FPA from the instruction buffer via IRC OPC lines 7 to 0. These lines, from byte 0 of the instruction buffer, are used by the A-Fork/B-Fork logic and BEN logic for FPA control store next address generation (refer to Figure 2-34).
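The bit assignments in Table 2-7 and the CS line encodings in Table 2-8 can be illustrated with a short Python sketch. This is only a reading aid built from the two tables; the names `StatusRegister`, `decode_status`, and `decode_cs` are hypothetical and do not come from any DEC software, and the string "UNLISTED" is an assumption for encodings the tables do not define.

```python
from typing import NamedTuple


class StatusRegister(NamedTuple):
    """Fields of ID register 17 as listed in Table 2-7 (names are illustrative)."""
    acc_error: bool         # bit 31, written by FPA
    minus_zero_error: bool  # bit 27, written by FPA
    acc_enable: bool        # bit 15, written by CPU
    acc_type: int           # bits 3-0, hardwired (0001 for the FPA)


def decode_status(word: int) -> StatusRegister:
    """Split a 32-bit status register value into its named fields."""
    return StatusRegister(
        acc_error=bool((word >> 31) & 1),
        minus_zero_error=bool((word >> 27) & 1),
        acc_enable=bool((word >> 15) & 1),
        acc_type=word & 0xF,
    )


def decode_cs(cs71: int, cs70: int, cs57: int = 0, cs56: int = 0, cs55: int = 0) -> str:
    """Name the function selected by the CS lines, following Table 2-8."""
    primary = {(0, 0): "NOP", (1, 0): "ACC TRAP", (0, 1): "CP SYNC"}
    if (cs71, cs70) in primary:
        return primary[(cs71, cs70)]
    # CS 71 and CS 70 both high: CS lines 57, 56, and 55 are decoded further.
    secondary = {(1, 1, 0): "POLY END", (1, 1, 1): "FP TRAP"}
    # Encodings not listed in Table 2-8 are reported as "UNLISTED" (assumption).
    return secondary.get((cs57, cs56, cs55), "UNLISTED")
```

For example, `decode_status(0x00008001)` describes an enabled FPA with no error bits set, and `decode_cs(1, 1, 1, 1, 1)` names the FP TRAP encoding.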
A few other lines from the instruction buffer and decode logic provide specifier source information to the FPA. The possible sources of data are as follows:

1. Memory
2. Register
3. Short literal
4. Long literal.

The CPU-FPA interface includes clock signals from the CPU to the FPA. The units operate synchronously on a 200 ns cycle. T0 of both units coincides.

The FPA transmits two status signals to the CPU: FP SYNC and ACC ERROR. These signals are input to the CPU for branch control. FP SYNC is normally asserted when an FPA result is available to the CPU. ACC ERROR is set during an FPA error condition.

2.3.1.2 CPU-FPA Data Interface - The FPA receives operand data from the CPU and, after performing the required operation, returns the answer to the CPU. The data is transmitted to the FPA via the ID bus and is returned to the CPU via the DF mux bus. As mentioned previously, the FPA does not do any memory accessing. The CPU must calculate the data memory address, access the address, and place the data on the ID bus to the FPA.

The FPA is optimized to use CPU scratchpad register data. It stores two copies of the 16 CPU scratchpad registers. To ensure that the FPA copies are exact copies, the FPA copies are addressed and written by the same lines that address and write the CPU general registers. The address lines are from the DAP board and the data is transmitted via the DF mux bus. To ensure that a changing register is never read, the CPU updates the general register and the FPA copies between T100 and T200 (T0) and the FPA reads the copies between T0 and T100. Note that the FPA general register copies are write-only memory to the CPU and read-only memory to the FPA. This means that results of FPA operations that are destined for the general register set are transmitted back to the CPU via the DF mux bus and then written into the general register set under CPU control rather than written directly into the general register copies by the FPA.
The data stored in the FPA general register copies is read by the FPA using address lines from the instruction buffer operand source logic. This scheme enables the FPA to access register data and begin the operation as soon as the general register address or addresses are in the instruction buffer.

All operands other than register operands are transmitted to the FPA via the ID bus. This includes memory data, and long and short literals. When memory data is specified in an instruction, the CPU fetches it and places it in the CPU D-register. The contents of the D-register are placed on the ID bus and, in the FPA, are transferred from the ID bus directly onto the FP buses. Since the D-register and ID bus are only 32 bits wide each, it takes two transfers to transmit a double precision number.

Single precision (float) literal data, part of the instruction stream, is transferred from the instruction buffer onto the ID bus. In the FPA, single precision literal data is latched into the literal register (LR) and then placed on the FP bus. The most significant part of double precision literal data is handled similarly, i.e., IB → ID bus → LR → FP buses. The least significant part of a double precision literal is transferred from the instruction buffer over the ID bus to the CPU D-register, then back on the ID bus and onto the FP buses. Note that no ID bus addresses are required for data transfers over the ID bus. The FPA simply accepts the current ID bus data.

When the FPA operation result is ready to be transmitted to the CPU, FP SYNC is asserted and the single precision result or the most significant part of a double precision result is on FP bus A. The CPU responds to FP SYNC by enabling the FPA DF mux bus drivers, which place the FP bus A contents on the DF multiplexer bus. The FPA result is transferred to the CPU D-register via the DF mux bus. When the CPU has the data, it asserts CP SYNC.
This ends a single precision (float) transfer or enables the second part of a double precision transfer. For a double precision transfer, the second part is placed on FP bus A and remains there until the CPU responds to the newly asserted FP SYNC by enabling the DF mux bus drivers, accepting the data, and asserting CP SYNC to indicate it has the data.

While the FPA is transmitting the result back to the CPU, valid condition codes are also being transmitted to the CPU condition code latches. These latches are read during the next machine cycle. The N, V, and Z bits are set based on the status of the result. The C bit is always cleared by the FPA.

2.3.1.3 Trap and Diagnostic Information - The FPA contains several features to facilitate error diagnosis and troubleshooting. These include programmable traps, microdiagnostics, special maintenance features, and the visibility bus.

The CPU can initiate two types of traps: ACC TRAP and FP TRAP. CS 71 high and CS 70 low initiate an ACC TRAP. This causes the FPA to access one of the FPA microcode addresses 0 through 7 as selected by CS lines 57, 56, and 55. Currently only two of these traps are used: Accelerator Power-Up Trap (address 0) and Accelerator Abort Trap (address 2).

The FP TRAP (used for FP microdiagnostics) is selected by CS lines 71, 70, 57, 56, and 55 high. When FP TRAP is asserted, the FPA microcode address is selected by bits 23 through 16 of the maintenance register. The trap address (0 through 255 in the microcode) is selected by the data previously loaded into the maintenance register.

The maintenance register is a CPU-FPA readable/writeable register located on the ID bus. The CPU accesses this register as ID bus register 16. The register is designed to facilitate maintenance. As discussed previously, it contains the FP trap diagnostic address. Using the trap address, the CPU can exercise various sections of FPA logic.
Bit 14 of this register provides a sync pulse that can be used for troubleshooting with an oscilloscope. This bit will go high each time the FPA accesses the microcode address stored in bits 8 through 0. Refer to Figure 2-12 and Table 2-9 for a summary of this register.

Figure 2-12 Maintenance Register
[Diagram TK-0515. Maintenance register, ID register #16: bit 31, WRITE TRAP ADDRESS; bits 30-24, zero; bits 23-16, TRAP ADDRESS; bit 15, WRITE MICRO BREAK; bit 14, MICRO MATCH; bits 13-9, zero; bits 8-0, MICRO BREAK/CURRENT ADDRESS.]

Table 2-9 The Maintenance Register

Bit No.  Name                    Bit Access            Function
31       Write Trap Address      Write by CPU          When set (by CPU), enables CPU to
                                 Read by FPA           write trap address (bits <23:16>).
30-24    Not used (set to zero)
23-16    Trap Address            Write/Read by CPU     Selects FPA microcode address for
                                 Read by FPA           FPA microdiagnostics.
15       Write Microbreak        Write by CPU          When set (by CPU), enables CPU to
                                 Read by FPA           write microbreak (bits <8:0>).
14       Micromatch              Write by FPA          Set by FPA when the FPA microcode
                                 Read by CPU           address currently accessed equals the
                                                       address stored in microbreak
                                                       (bits <8:0>).
13-9     Not used (set to zero)
8-0      Microbreak/Current      CPU writes            These bits serve two functions:
         Address                 microbreak.           1. The microbreak selects the FPA
                                 FPA reads             microcode address to be monitored
                                 microbreak.           for micromatch (bit 14).
                                 FPA writes current    2. The current address provides CPU
                                 FPA microcode         monitoring of FPA microcode
                                 address.              activity.
                                 CPU reads current
                                 FPA microcode
                                 address.

Forty-three FPA signals are accessed by the Visibility Bus (V bus). The V bus is a diagnostic tool designed to allow polling of stable internal CPU (in this case, FPA) signals. The console can issue commands which load the V bus latches with the signals monitored and then shift the loaded latches one bit at a time to a control word located in the console interface. At the console, the data shifted in will be examined by diagnostic software. There are 8 data input channels on the V bus; channel 6 is devoted to the FPA.
Refer to Table 2-10 for a listing of the FPA signals that are available to the V bus.

Table 2-10 Signals Monitored by Visibility Bus

FCTD EALU 0 L
FCTE COMPL L
FADR SPC (0) H
FNMS EALU CIN L
FCTC SEL NORM H
FCTP RA ADRS 3 L
FCTP RA ADRS 2 H
FCTP RA ADRS 1 L
FCTP RA ADRS 0 L
FCTP RB ADRS 3 L
FCTP RB ADRS 2 L
FCTP RB ADRS 1 L
FCTP RB ADRS 0 L
DAPL ACC CONTEXT 0 H
DAPL ACC CONTEXT 1 H
FCTC CLR RR L
FCTH CP SYNC H
FNME BUS → EXP L
FCTJ ACC NDATA H
FCTC ACC ZDATA H
FCTC ACC VDATA H
FCTE SHF COUNT 5 H
FCTE SHF COUNT 4 H
FCTE SHF COUNT 3 H
FCTE SHF COUNT 2 H
FCTE SHF COUNT 1 H
FCTE SHF COUNT 0 H
FCTN FALU CARRY IN H
FCTN FAMX SEL 0 H
FCTN FAMX EN 0 L
FCTAAGTBJ
FCTN SHF MUX EN 1 L
FCTN SHF MUX EN 0 L
FCTN FALU FUNC SEL 2 H
FCTN FALU FUNC SEL 1 H
FCTN FALU FUNC SEL 0 H
FCTN FAMX SEL 1 H
FCTN LOAD AR1 H
FCTN LOAD AR0 H
FCTN LOAD ARX H
FCTN LOAD BR1 H
FCTN LOAD BR0 H
FADS BUS → FAD L

2.3.2 FPA Internal Buses

As discussed in Paragraph 2.3, the FPA internal buses transmit data between the various data manipulation units. These units are arranged along two parallel 34-bit tristate buses called FP bus A and FP bus B. These buses transmit data from the CPU-FPA interface to the various data manipulation units, transfer intermediate results between units, and return the result to the FPA-CPU interface. The buses can transfer a complete 64-bit double-precision word or two 32-bit float words simultaneously.

The BSC field of the microword controls a majority of the bus activity. The available sources include all FPA data manipulation units and the CPU-FPA interface. Refer to Table 2-11 for a summary of BSC bus control operations. Note that the BSC field controls only the data source. The destination is enabled via other control fields and accepts the data available on the FP buses.
Table 2-11 BSC Control Store Field

BSC field bits 3, 2, 1, and 0 are µCS 15, 14, 13, and 12.

Hex   BSC Field   Mnemonic   Function
      3 2 1 0
                  INTH       Bus A ← SALU
                  NL         Bus B* ← Bus A* ← NSHF LO
                  NH         Bus B* ← Bus A* ← NSHF HI, EXP, SGN (packed result)
                  PQ         Buses ← SALU and LSH if MUL; TEMP and LSH if DIV
                             (LSH is accessed differently if MUL or DIV)
                  INTL       Bus A ← LSH
                  ID         Bus B* ← Bus A* ← ID bus
                  LR         Bus B* ← Bus A* ← LR
A     1 0 1 0     ID.RB      Bus A ← ID bus; Bus B ← RB
B     1 0 1 1     R          Bus A ← RA; Bus B ← RB
C     1 1 0 0     FAL.X      Bus A ← FALU HI/LO or Bus B ← FALU LO/HI
D     1 1 0 1     FAL.LH     Bus A ← FALU LO; Bus B ← FALU HI
E     1 1 1 0     FAL.HL     Bus A ← FALU HI; Bus B ← FALU LO
F     1 1 1 1

*The same data is placed on both buses.

The buses handle both floating-point and integer numbers. The buses can handle intermediate, unpacked, and unnormalized data as well as final packed and normalized results. Since the buses must handle intermediate data, each bus contains two extra lines to handle the overflow and hidden bits. Refer to Figure 2-13 for a summary of the data formats used on the FP buses.

Figure 2-13 FP Bus Formats
[Diagram. Panels: single precision (float) floating point format, BR format, on FP bus lines (either A or B); double precision floating point format, AR format, on FP bus B and FP bus A; long word integer (MULL) format on FP bus (either A or B); result on FP bus A, first cycle least significant half from LSH register, second cycle most significant half from SALU. Bits 33 and 32 of each bus carry the overflow and hidden bits.]

2.3.3 Fraction Adder (FAD)

The fraction adder aligns and adds or subtracts the fraction portions of two FPNs. The module contains 2 registers that receive data from the FP buses, 2 multiplexers that manipulate the register data, a shifter to align register contents before an add or subtract, an ALU to add or subtract the data, and bus drivers to place the result on the FP buses (Figure 2-14). Certain FAD signals are interfaced to the V bus for maintenance and diagnostic purposes. Refer to Paragraph 2.3.1 for a discussion of the V bus.

Figure 2-14 Fraction Adder Block Diagram
[Diagram TK-0268. Elements shown: AR and BR registers loaded from BUS FP A <33:00> and BUS FP B <33:00>; FAMX (FAMX SEL input select, FAMX EN output enable); SHF MUX (SHF MUX SEL input select, SHF MUX EN output enable); shifter with SHF COUNT <5:0> and sign extension; FALU with FALU FUNC SEL <2:0>; format select from BSC <3:0> and SEL AR FMT.]

The fraction parts of the FPNs are loaded into the AR and BR registers. The data entry is controlled by the FADC (Fraction Processor Controls) control store field as shown in Table 2-12. Both registers are loaded with the MSB in bit 63.
The execution of the POLY instruction causes an additional 7 LSBs to be transmitted via FP bus A lines <14:08> (where the FPE is normally) and placed in AR <6:0> by loading ARX.

Table 2-12 Fraction Data Entry
[FADC field bits 3, 2, 1, and 0 are µCS 11, 10, 9, and 8. Hex codes 0 through 8 select which of AR1, AR0, ARX, BR1, and BR0 are loaded.]

Select lines controlled by both microcode and hardware normally load the FPF associated with the smaller exponent into the SHFMX and the other fractional part into FAMX.

The contents of SHFMX are then right-shifted up to 63 bits to ensure that the radix points align. The magnitude of the exponent difference determines the amount of the shift. The shifted number is padded on the left with its sign. In most cases, the fraction is positive (Figure 2-15).

Figure 2-15 SHFR Operation
[Diagram TK-0275. SHF COUNT <5:0> gives the magnitude of the shift. Unaligned data from SHFMX passes through three 64-bit stages: SHFB shifts 0, 16, 32, or 48; SHFC shifts 0, 4, 8, or 12; SHFR shifts 0, 1, 2, or 3. Sign extension supplies 1s for a negative number and 0s for a positive number. Aligned data goes to FALU input B.]

When the two FPFs are aligned, the FALU operates on the two fractions. The FALU operation is determined by the op code and the sign of the two numbers. Refer to Table 2-13.

Table 2-13 FALU Operation

Instruction   Sign of Numbers        FALU Operation
Add           Like (both + or -)     Add
Add           Unlike                 Subtract
Subtract      Like                   Subtract
Subtract      Unlike                 Add

FALU Operations Selected

S2 S1 S0   Function   Comment
0  0  0    Clear
0  0  1    B-A        B = 0. Used for complementing number when
                      shift/subtract D.P. would lose bits off end.
                      Used when SUBD and exponent difference is
                      greater than 7, or POLYD.
0  1  0    A-B        Normal subtract
0  1  1    A+B        Normal add
1  0  0    Not used
1  0  1    A or B     Used to get A out or B out. Other side is zero.
1  1  0    Not used
1  1  1    Not used

The output of the FALU is loaded onto the FP buses under control of hardware and the BSC microcontrol field. Refer to Table 2-14. The result is in unnormalized form.

When a double precision ALU subtraction is done (as the result of an ADDD, SUBD, or POLY instruction), the exponent difference is examined. If it is less than or equal to 7, operation continues as usual. However, if the difference is 8 or more, error will be introduced into the LSB if a shift and then a subtract is done. To prevent this error, special control hardware is enabled. It disables the output of SHFMX, forcing zeros into the shifter. The smaller operand is routed through FAMX to the A side of the ALU. A B-A (B = all zeros) is done, complementing the operand. The larger operand remains stored in its original register. The result of the ALU operation is output to the FP buses and reloaded into the AR or BR, depending upon where it was before complementing. During the next machine state the complemented operand is aligned, sign-extended, and added to the other operand. The result is loaded onto the FP buses and is normalized.

Table 2-14 FALU MUX Control

BSC field bits 3, 2, 1, and 0 are shown as µCS 11, 10, 9, and 8.

Hex   BSC Field   FALU Function
      3 2 1 0
0-B               Not used for FALU MUX control
C     1 1 0 0     Hardware determined
D     1 1 0 1     FP A ← FALU L; FP B ← FALU H (BR format)
E     1 1 1 0     FP A ← FALU H; FP B ← FALU L (AR format)

NOTE
During double precision add/subtract and POLY: if EXP A < EXP B, AR format is used. If EXP B < EXP A, BR format is used.

2.3.4 Fraction Normalize/Divide (FNM)

The normalize/divide logic located on FNM performs the two functions indicated by its title. Refer to Figure 2-16. The hardware can either normalize the fractional result of an add, subtract, multiply, or divide, or generate the quotient given a divisor and dividend. The quotient is generated bit by bit and stored elsewhere. When the quotient is complete, it is returned to the same hardware to be normalized as any other fraction result.
Both functions receive data based on microcontrol words, but once started, operate relatively free of microcode control until they are ready to transmit the answer.

Figure 2-16 Fraction Normalizer/Divide Block Diagram
[Diagram TK-0274. Elements shown: NR, RR, NALU, round bit generator (RND BIT GEN), shifter (SHF VAL), quotient bit stream output, BUS FP B <33:00>, BUS FP A <33:00>.]

2.3.4.1 Normalize Operation - Before a normalize operation can take place, the Remainder Register must be cleared. A 3 in the 3-bit MSC field of the microstore word clears it during IRD. Since the divide operations use the RR, it is also cleared during the end of the divide flows before the normalization of the quotient.

The add, subtract, multiply, and divide operations produce results with varying characteristics. The add/subtract operation has the widest variability in result. Operand size (both fraction and exponent), operand sign, and desired operation all contribute to this variation. The subtraction of two very nearly equal operands can result in a very small number, i.e., a number that must be shifted left many times before it is in final normalized form. Addition of two operands with equal exponents will produce a result between 1 and 2, necessitating a right-shift. Since the add/subtract operations do produce a wide variability of results, special firmware in the control store is accessed and the normalizations proceed under firmware and hardware control.

A divide operation produces results between 1/2 and 2. A multiply produces results between 1/4 and 1. Both divide and multiply normalizations proceed under hardware-only control.

All normalizations begin with NRC equal to 0, parallel-loading the result to be normalized into the NR. If the operation was an A/S, BEN 5 selects special firmware based on exponent differences. If the special firmware is enabled, an NRC equal to 2 enables the NR to shift left in 4-bit steps, 3 steps per machine cycle.
Once the NR shift left is enabled, hardware looks at the top 12 bits of the NR for the first significant bit as the leading bits are left-shifted away. In a positive number, leading zeros are disregarded and the first significant bit is a 1. In negative numbers (2's complement notation), leading 1s are disregarded and the first significant bit is a 0 (refer to Figure 2-17). MSN NE SIGN becomes true as the data is parallel-loaded into NR if the first significant bit is in NR <63:60>. This stops any left shifts. STOP SHF goes high whenever NR <59:56> contains the first significant bit and will cause the NR to stop shifting after one more 4-bit shift (i.e., when the first significant bit is in NR <63:60>). If NR <63:52> does not contain the first significant bit, SWR will remain low, shifting all 12 bits out and enabling a new microstore control word via BEN 2. The hardware continues monitoring for the first significant bit. If the NR is left-shifted 60 bits (counted by the control store) and the first significant bit is not found, firmware returns a result of zero by forcing the output of the NMX to zero via FORCE ZERO.

Figure 2-17 Normalize Shift Enable Control Hardware
[Diagram TK-0272. Inputs: NR <63:52> and RES NEG. Outputs: SWR, MSN NE SIGN, STOP SHF. If the number is negative, leading 1s are disregarded; if positive, leading 0s are disregarded.]

When the first significant bit is in NR <63:60>, the number can be rounded and normalized by the remaining FNM logic. The round byte contents, NALU operation, and final normalization shift are controlled by the round bit generator. The round bit generator controls these functions based on NR 63, NR 62, NR 61, and RES NEG. The round byte is combined with NR lines 39 through 36 (float or single precision) or lines 7 through 4 (double precision). This is selected via the FLOAT line.
Since the final normalization shift takes place after the round byte is added and the first significant bit can be in NR 63, NR 62, NR 61, or NR 60 (it must be in one of these four positions), the position of the round bit (1) in the round byte varies (refer to Table 2-15). As summarized in the table, decode logic divides the 16 possible input cases into 4 cases, corresponding to the FSB in bit 63, 62, 61, and 60. Note that the RBG does not monitor NR bit 60, but since the logic is only enabled when the FSB is in bits 63 through 60, the RBG logic can sense the contents of NR bit 60 even though it does not monitor it.

RES NEG L enabled means that the number being shifted and normalized is negative. This means that leading 1s (Hs) should be disregarded in the search for the FSB and that the FSB will be a 0 (L). RES NEG L high indicates a positive number, disregard of leading 0s (Ls), and that the FSB will be a 1 (H).

The contents of the rounding byte are based on the location of the FSB. The rounding byte is designed to place a one 24 bits (56 bits for double precision) behind the FSB.

Table 2-15 Round Byte and Normalize Control

1. The logic decodes the four signals and locates the FSB.

RES NEG L*   NR63   NR62   NR61   First Significant Bit (FSB)
L            L      L      L      63
L            L      L      H      63
L            L      H      L      63
L            L      H      H      63
L            H      L      L      62
L            H      L      H      62
L            H      H      L      61
L            H      H      H      60
H            L      L      L      60
H            L      L      H      61
H            L      H      L      62
H            L      H      H      62
H            H      L      L      63
H            H      L      H      63
H            H      H      L      63
H            H      H      H      63

*RES NEG L high indicates a positive number. This means a 1 (H) is the FSB. RES NEG L low indicates a negative number. This means a 0 (L) is the FSB. RES NEG L asserted also causes a NALU subtract, thereby rounding and complementing the number in a single step.

2. Based on location of FSB, an appropriate rounding byte is generated.

       Rounding Byte Selected
FSB    Bit 3   Bit 2   Bit 1   Bit 0
63     1       0       0       0
62     0       1       0       0
61     0       0       1       0
60     0       0       0       1

3. Also based on location of FSB, the final shift required to normalize and ready the result for the CPU is selected.

FSB    Shift Selected    SHF VAL 1   SHF VAL 0
63     Right 1 place     L           L
62     No shift          L           H
61     Left 1 place      H           L
60     Left 2 places     H           H

If the FSB is not in NR <63:60>, the NR is left-shifted and a binary counter counts each 4-bit shift. This count, the RES NEG line, and NR bits 63, 62, and 61 (magnitude of final shift) determine the NORM ROM location to be addressed. The content of this location is added to the exponent of the result in the EALU and corrects it for all shifts that take place in the FNM.

If, however, the number to be rounded is all 1s, the addition of the rounding byte will ripple through all bits and cause a fraction overflow. This is sensed by comparing the round byte location (indicating where the logic decoded the current MSB of the number to be rounded) and the location of the MSB of the rounded result. If this comparison asserts NORM ERR and thus EALU CIN (indicating there was a ripple and subsequent overflow), a one will be added in the EALU (the exponent adder on FCT) to correct the exponent for the overflow.

NR <63:04> goes to the NALU B side and the round byte (4 bits) goes to the A side. Normally the NR is added to the rounding byte. However, if RES NEG L is asserted, indicating a negative (2's complement) number, the content of the NR is subtracted from the rounding byte. This operation rounds and complements (returns to positive notation) in one step.

The 60-bit result <63:04> of the NALU operation (rounded and ready to be normalized) is transmitted to the NMX. The high part (and only part, if float or single precision) is transmitted through to the NSHF for the final normalization shift. The NSHF shift control bits select a 0 to 3-bit shift for final normalization. Final normalization moves the MSB to the equivalent of the NR 62 position.
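The three parts of Table 2-15 can be read as a small decode function. The Python sketch below is only an illustration of the table, not a description of the actual gate-level logic; the names `locate_fsb`, `ROUND_BYTE`, `FINAL_SHIFT`, and `round_bit_position` are hypothetical. Signal levels are modeled as 1 for H and 0 for L, and a positive shift value means a left shift.

```python
def locate_fsb(res_neg_l: int, nr63: int, nr62: int, nr61: int) -> int:
    """Return the FSB position (63..60) per part 1 of Table 2-15.

    res_neg_l = 1 (H) means a positive number, so the FSB is the first 1;
    res_neg_l = 0 (L) means a negative number, so the FSB is the first 0.
    """
    significant = 1 if res_neg_l else 0
    for position, bit in ((63, nr63), (62, nr62), (61, nr61)):
        if bit == significant:
            return position
    # NR 60 is not an input; the FSB must be there if it is not in 63..61.
    return 60


# Part 2: rounding byte (bit 3 .. bit 0) selected for each FSB position.
ROUND_BYTE = {63: 0b1000, 62: 0b0100, 61: 0b0010, 60: 0b0001}

# Part 3: final shift; negative is a right shift, positive is a left shift.
FINAL_SHIFT = {63: -1, 62: 0, 61: 1, 60: 2}


def round_bit_position(fsb: int, is_float: bool) -> int:
    """NR bit that receives the round bit.

    The round byte lines up with NR <39:36> for float and NR <7:4> for
    double precision, as stated in the text.
    """
    base = 36 if is_float else 4
    return base + ROUND_BYTE[fsb].bit_length() - 1
```

With these definitions the round bit always lands 24 bits (float) or 56 bits (double precision) behind the FSB, and applying `FINAL_SHIFT` to any FSB position moves it to bit 62, which matches the statements in the text.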
When the data is placed on the FP buses, NR 62 (always a one since the fraction is now normalized) is the hidden bit and is placed on the FP bus A bit 32. When the data is transferred to the CPU, the hidden bit is not transferred and the data in NR 61 (bus A bit 6) is the MSB to be transferred.

2.3.4.2 Divide Operation - This logic also performs the fraction part of the divide operation for the FPA. Once the dividend and divisor are loaded into the FNM logic and the quotient storage on the multiplier boards is enabled for either a float (single) or double precision result, the divide operation runs under hardware control until the answer has been computed to the required precision. Once the answer has been computed, microcontrol takes over and transmits the unnormalized quotient back to the FNM logic, where it is normalized and rounded like any other fraction.

The hardware uses the restoring, repeated-subtraction technique to divide. The dividend is initially loaded into the RR and the divisor is stored in the NR. The divisor (contents of NR) is subtracted from the dividend (contents of RR). If the result is negative, a 0 is left-shifted into the answer (quotient) register and the contents of the RR are left-shifted by one. If the result is positive or 0, a 1 is left-shifted into the answer (quotient) register, and the result is loaded into the remainder register, left-shifted by one. The divisor (contents of NR) is repeatedly subtracted from the contents of the RR until 26 bits (58 bits for double precision) of quotient are generated. The quotient is then rounded and normalized.

The division operands are loaded under microstore control. The first microstore state loads the dividend into the NR. The second state causes the NALU to OR the contents of the NR with the contents of the RR (currently clear) and load the result of the operation into the RR. In the same state the divisor is loaded into the NR.
At the end of the second state the division operands are in their correct registers and the divide sequencer hardware takes over.

The divide sequencer hardware generates the RR control signals (refer to Figures 2-18 and 2-19). The RR CTL signals either load the NALU result into the RR (result positive or zero) or left-shift the RR contents (result negative). The input of the RR is hardwired to automatically produce a left shift when loading the NALU result. This means that during the initial loading of the RR, the dividend is left-shifted by 1. The 11 state in Table 2-16 right-shifts the dividend by one to adjust for this before beginning the divide operation.

Figure 2-18 Divide Sequence Hardware (signals shown: INIT, RES POS H, 100 ns CLK, NEXT A, NEXT B, RR CTL 0; refer to Table 2-16 for the divide sequence states)

Figure 2-19 Divide Sequence Timing (waveforms shown: CPU and FPA clock, 200 ns; divide sequence clock, 100 ns; flip-flop outputs; RR right shift; RR loaded from NALU)

Table 2-16 Divide Sequence States (columns: state A and B, input, next state A and B, FNM function, RR CTL 1 and 0, RR function; the row entries are not legible in this copy)

*Shift R is used only once at the beginning of each divide.
†Control bit 0 is controlled by RES POS H.
**Since the RR is hardwired for a left shift, a parallel load shifts the data one place left.

The answer is generated at the rate of one bit per 100 ns. If the result of the NALU subtract is positive or zero, a 1 is left-shifted into the quotient register. A negative NALU result causes a 0 to be shifted into the quotient register. The quotient register is made of two multiplier registers (TEMP and LSH).
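The restoring, repeated-subtraction loop described for the divide hardware can be sketched in Python. This is a minimal arithmetic model under stated assumptions, not the sequencer itself: `restoring_divide` is a hypothetical name, the operands are plain non-negative integers standing in for the fractions in RR and NR, the model assumes the dividend is less than twice the divisor, and the initial right-shift adjustment and the 100 ns timing are not modeled.

```python
def restoring_divide(dividend, divisor, quotient_bits):
    """Restoring (repeated-subtraction) division, one quotient bit per step.

    Returns the quotient bits as an integer, most significant bit first.
    """
    rr = dividend          # RR: running remainder, starts as the dividend
    nr = divisor           # NR: divisor, unchanged throughout
    quotient = 0
    for _ in range(quotient_bits):
        difference = rr - nr                # NALU subtract: RR minus NR
        if difference >= 0:
            quotient = (quotient << 1) | 1  # shift a 1 into the quotient
            rr = difference << 1            # load the result, shifted left by one
        else:
            quotient = quotient << 1        # shift a 0 into the quotient
            rr = rr << 1                    # keep the old RR, shifted left by one
    return quotient
```

Under these assumptions, the first quotient bit has weight 1 and each later bit half the weight of the one before, so `restoring_divide(3, 2, 4)` yields the bit pattern 1100, that is, 1.5 with three fraction bits.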
In single (float) precision the quotient bit stream is shifted into TEMP (only TEMP <29:4> is used). In double precision the bit stream shifts into LSH <31:4>, then into TEMP <29:00>. When a 1 is left-shifted into TEMP 29 or 28 on the proper time phase in the multiplier logic, DIV DONE is asserted. This stops the division and accesses a new microstore word that normalizes and rounds the quotient.

2.3.5 Fraction Multiplier (FML and FMH)
The fraction multiplier hardware in the FPA is located on two modules, FMH (Fraction Multiplier High) and FML (Fraction Multiplier Low). They handle all fraction multiply functions and part of the EMOD function, and also store the division quotient as it is generated. They accept data from the FP buses, perform the required unsigned multiplication, and gate the results back onto the FP buses. Refer to Figure 2-20.

Figure 2-20 Fraction Multiplier Block Diagram (blocks shown: MC1, MC0, MCINT, MCAND bus, ROM banks A and B, ROM storage, PALU, PPROD latch, AALU, ACCM, SALU, carry hold registers, MPLIER bus, MP1, MP0, LSH, TEMP)

The FPA microcontrol controls the loading of both the multiplicand and multiplier into the appropriate FM (fraction multiplier) registers. In both float and double the complete multiplier is stored on the FMH. During the single precision (float) function, the FMH handles the upper 16 bits of the multiplicand and the FML the lower 8 bits, and the answer is completed after one pass through the logic. For double precision (56 bits) the upper half of the multiplicand fraction is handled in the FMH and the lower half is handled in the FML. Two passes are required to compute the final answer.

The FM multiplies under its own control logic. After the operands are loaded, the MCTL field in the FPA microcontrol is asserted; this starts the multiplication. A float multiply is stopped by the microcode two states (400 ns) after it starts.
For a double multiply, control goes to a wait state and remains at that location until MUL/DIV DONE is enabled, indicating that the FM logic has finished the operation. At this point microstore control takes over and the answer is transmitted to the normalize logic or, in the case of EMOD or MULL, transmitted to the CPU as an unnormalized number.

In order to obtain fast multiplication, a pipeline technique is used (Figure 2-21). The multiplier is divided into 4-bit nibbles. The nibbles are then accessed consecutively by a counter-multiplexer combination (least significant nibble first) and each nibble operates on up to 32 bits of multiplicand. The MCAND bus and MPLIER nibbles are used to address the ROMs. The banks of ROMs provide a 4 X 4 primitive with 2-way interleaving. The data is latched (ROM STORE) and applied to the inputs of 4-bit adders (PALU). These adders combine the ROM data to form a partial product, storing the carry-out of each 4-bit section to be added in on the next cycle. The partial product is latched in PPROD and passed to another row of adders (AALU), which accumulate the final product, again saving the carries. Thus, when the pipeline is operating, there are four processes cycling at the same time:

1. Select ROM addresses
2. Latch ROM data
3. Form partial product
4. Accumulate final product.

After the final product is calculated, the stored carries from both stages are combined with the accumulated product using full carry look-ahead to produce the final answer in a single precision (float) operation. In double precision, this result is stored and used during the generation of the final answer during the second pass. Each of the pipeline processes, with the exception of accessing ROM data (which occurs in each bank of ROMs every 100 ns), occurs at 50 ns intervals.

The operation of the FM hardware is discussed in three sections.
The first section explains the operation of the pipeline, concentrating on operand loading and manipulation of partial products, partial results, and carries to produce the final answer. The second section concentrates on the control logic and how the signals that control the pipeline are generated. The third, and shortest, section explains how the FM registers are used to accumulate the quotient during a divide operation.

2.3.5.1 The Pipeline

Loading the Operands
The multiplication process begins with the loading of the operands. As discussed in Paragraphs 2.1 and 2.3.2, data is transferred along the FPA buses in several formats. The multiplicand loading logic sorts out these formats and loads the multiplicand registers (MC0, MC1, and MCI) so that when the MCAND bus does a parallel access of the MCAND, the MSB of the multiplicand is always in MCAND bus bit 31, and each following bit is progressively less significant (Figures 2-22 and 2-23).

Figure 2-21 The Pipeline (timing chart of the four pipeline processes at 50 ns steps: select ROM addresses, latch ROM data in ROM storage, form partial product in PPROD latch, accumulate in ACCM, followed by the SALU operation that computes the final result as ACCM plus carries)
Note on multiplier (MP) and multiplicand (MCAND) addressing: both multiplier and multiplicand are divided into 4-bit nibbles. The multiplier nibbles are accessed individually (least significant nibble first) and are used with all multiplicand nibbles to generate ROM addresses.

Figure 2-22 Loading and Accessing the Multiplicand
Access codes:
1 - First half of EMODD or MULD
2 - Second half of EMODD or MULD
F - EMODF or MULF
I - MULL (integer multiply)
*The 8-bit register shown in the figure is also called EMOD extension and MCX.

Figure 2-23 (diagram of MP1, 24 bits, and MP0, 32 bits, the MPLIER bus, and nibble counters SA and SB)

The multiplier, up to 56 bits (14 nibbles) long, is loaded into MP1 and MP0 on FMH. MP1 is 24 bits (6 nibbles) long and MP0 is 32 bits (8 nibbles) long. Unlike the multiplicand, the multiplier is loaded in one format only (Figure 2-23). The MSB is in MP1-23 and each following bit is progressively less significant. The LSB is MP1-00 for single precision (float) or MP0-00 for double precision. The single format is possible because, as stated before, the multiplier is used consecutively; the various formats are sorted out by the counter as the nibbles are used during the multiplication.

Selecting the Multiplicand
The operands, multiplicand and multiplier, are enabled onto their respective buses, MCAND BUS and MPLIER BUS, under control of operand bus source logic. Refer to Figures 2-22 and 2-23 and Table 2-17. All 32 lines of the MCAND bus are enabled every time. During a MULF and EMODF and for the first pass of a MULD and EMODD, the MCAND bus accesses MCX. Both MULF and MULD (first pass) use only the top 24 bits, as the lower 8 are discarded later in the pipeline.

The MPLIER BUS multiplexer begins by selecting the least significant byte of the multiplier. Interleaving hardware later selects the high or low nibble of the bus. The mux then selects a new, progressively more significant byte each 100 ns.

Selecting ROM Address - The Interleave Hardware
Both the MCAND and MPLIER buses are divided into 4-bit nibbles for ROM addressing. Each MCAND nibble (8 nibbles) is combined with a MPLIER nibble to provide address bits for 16 4 X 4 look-up ROMs. Rather than compute the product of the two 4-bit nibbles, the fraction multiply hardware uses look-up ROMs. The multiply results are stored in the ROMs.
The data is stored within the ROMs such that the content of the address accessed by the two nibbles is the 8-bit result of a multiply with the same two nibbles. Since the ROMs are relatively slow, the 16 ROMs are divided into two interleaved 8-ROM banks. One bank is accessed by the low MPLIER nibble (MP 3:0), the other by the high MPLIER nibble (MP 7:4). Both ROM banks are addressed on 100 ns cycles; the MP low ROM bank is first, and the MP high bank is second, trailing by 50 ns. The addressing of a ROM bank ends the first part of the pipe.

Latch the ROM Data
The second part of the pipe selects the outputs from either of the ROM banks, using the ROM SEL MUX, and latches the data (64 bits) in ROM STRG. It alternately selects data from the low and high ROM banks on a 50 ns cycle. While the selected ROM data is being latched, the first part of the pipe is selecting a new address for the ROM bank just selected. The output of the other ROM bank will be selected during the next cycle (50 ns in the future). The address lines of that ROM bank were changed 50 ns ago and its outputs are settling.

Form Partial Product
The outputs of ROM STRG and any carries from the previous PALU add are added to form the partial product. The PALU is eight 4-bit adders. The outputs of the ROM STRG are wired to the PALU adder inputs such that bits of equal significance are combined. The outputs of the PALU, without carries, are stored in the PPROD LATCH. The carries are stored in CARRY-HOLD registers to be added in on the next PALU add. The latching of the partial products in the PPROD LATCH ends the third part of the pipeline.

As indicated previously, each multiply cycle selects 4 new bits from the multiplier register, and each 4 new bits are 4 positions more significant. This means that the input of the PALU add becomes 4 bits more significant each multiply cycle.
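The look-up and partial-product steps described so far can be illustrated with a small arithmetic model. The Python sketch below is a simplification under stated assumptions: the names `ROM`, `nibbles`, `partial_product`, and `multiply` are hypothetical, the ROM address layout (multiplicand nibble in the high four address bits, multiplier nibble in the low four) is assumed for illustration, carries are resolved immediately instead of being held for the next cycle, and the partial product is shifted left rather than the accumulator right, so no low-order bits are dropped.

```python
# 256-entry look-up table standing in for a 4 X 4 multiply ROM:
# address = (multiplicand nibble << 4) | multiplier nibble (assumed layout),
# content = 8-bit product of the two nibbles.
ROM = [a * b for a in range(16) for b in range(16)]


def nibbles(value, count):
    """Split value into `count` 4-bit nibbles, least significant first."""
    return [(value >> (4 * i)) & 0xF for i in range(count)]


def partial_product(mcand, mp_nibble, mcand_nibbles=8):
    """Combine one multiplier nibble with every multiplicand nibble.

    Each ROM output is added at the significance of its multiplicand
    nibble, so bits of equal significance are combined.
    """
    total = 0
    for i, mc_nibble in enumerate(nibbles(mcand, mcand_nibbles)):
        total += ROM[(mc_nibble << 4) | mp_nibble] << (4 * i)
    return total


def multiply(mcand, mplier, mplier_nibbles):
    """Accumulate partial products, least significant multiplier nibble first."""
    accumulator = 0
    for k, mp_nibble in enumerate(nibbles(mplier, mplier_nibbles)):
        # Each successive partial product is 4 bits more significant.
        accumulator += partial_product(mcand, mp_nibble) << (4 * k)
    return accumulator
```

With a 32-bit multiplicand and a 24-bit (6-nibble) multiplier, `multiply(mcand, mplier, 6)` returns the same value as an ordinary integer multiplication, which is a convenient check that the nibble decomposition is consistent.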
Because of the increase in significance, the stored carry-out of each PALU adder is input, on the next cycle, to the carry-in of the same PALU adder rather than the carry-in of the next PALU adder.

Note that while the third part of the pipeline is operating, new ROM data is being placed in ROM STRG to be presented to the PALU inputs on the next cycle, and new ROM addresses are being generated to access new data.

Table 2-17 Operand Bus Source

                            Input Signals
Operation                   DOUBLE   TTH   OPC7   MPLIER BUS Nibble Select
EMODF or MULF               L        X     L      Start at A, do 6 nibbles
MULL (integer multiply)     X        X     H      Start at 6, do 4, then start at 2, do 4
EMODD or MULD, 1st pass     H        H     L      Start at 2, do 14
EMODD or MULD, 2nd pass     H        L     L      Start at 2, do 14

(The MCAND Bus Load Enable columns for MC1, MC1L, MC0, MCINT, and MCX, and the row of MCAND bus lines fed by each register, are not legible in this copy.)
*MCAND Bus lines are low enabled.

Accumulate Result
In the fourth and final section, the AALU and associated accumulator (ACCM) add the partial products computed in the previous pipeline section to the result stored in the ACCM, including stored carries from the previous AALU cycle, and latch the result into the ACCM and LSH register. The AALU, ACCM, and ALU carry-hold interconnections automatically shift the ACCM content and ALU carry-hold content to adjust for the 4-bit increase of each new partial product. Because each partial product input to the AALU is 4 bits more significant than the previously stored ACCM content, the outputs of the ACCM are wired to shift the ACCM content 4 bits right (a decrease in significance) before being added to the PPROD LATCH content. The lower 4 bits of the AALU output are always right-shifted into the LSH register. In double precision operations, the content of this register is the least significant half of the result. As with the PALU carries, the carry-out of each AALU adder is stored and added in on the next cycle.
Also similar to the PALU logic, the stored carries are added to the AALU adder that generated them, because the content of the AALU is now 4 bits more significant than when the stored carries were generated. The latching of the accumulating final result in the ACCM ends the fourth pipeline section.

The 4 sections of the pipeline continue to operate until stopped by the FM control logic. The stopping point is selected based on both function and precision.

SALU Operation
When a stop is initiated, the whole pipeline stops and new logic, the SALU, is accessed, which adds the two sets of stored carries still in the pipeline to the total product on the output of the AALU. When a pipeline stop is initiated, the AALU output (SALU input) is the contents of ACCM plus the current PPROD. Both the ACCM plus PPROD addition (the AALU operation) and the PPROD-forming addition (the PALU operation) produce stored carries.

The hard-wired 2-bit shift in the PPROD LATCH input is not part of the several 4-bit shifts that take place throughout the FM logic; rather, it formats the stored carries so they may be easily combined for a final answer in the SALU. Both the PALU and AALU are composed of 4-bit adders with carry-outs. This means that the carry-outs are generated every 4 bits and that the PALU and AALU stored carry-outs can be treated as numbers of the following format:

X000X000X

X is a stored carry (data bit)
0 is a zero (non-significant bit)

Conventional wiring (output of a 4-bit PALU adder to input of a 4-bit PPROD LATCH to a 4-bit AALU adder) would cause the data bits of the PALU stored-carry to line up (be of equal significance) with the AALU stored-carry. This would prevent the PALU stored carries, the AALU stored carries, and the ACCM result from being combined in one operation in one adder (the SALU). However, wiring the PPROD LATCH input and outputs with a 2-place shift generates a PALU stored-carry number with data bits of significance between the AALU stored-carry data bits.
This shift allows both AALU and PALU stored-carry numbers to be input to one side of the SALU, since the data bit of the PALU stored-carry is always a non-significant bit of the AALU stored-carry and vice versa. Refer to Figure 2-24.

Figure 2-24 SALU Operation - Adding the Stored Carrys (SALU inputs shown: AALU, 32 bits; PALU carry hold, 8 bits; carries from AALU, 8 bits; zeros. SALU output: 32 bits)

The use of the SALU result is determined by the operation and the operation precision. If the SALU result is the final answer, the result is transferred to the FP buses under both op code control and FPA microcontrol. If, however, the operation is double precision, the result is stored and then shifted, under FM logic control, to format it for later operations. Before the shift, the most significant half of the operation is in TEMP and the least significant half in LSH. The shift transfers the contents of LSH (the least significant half) to the ACCM register, which is designated ACCM 14 at this time, and transfers the most significant half from TEMP to the (just vacated) LSH.

For the second pass, the second half (the more significant half) of the multiplicand is accessed from registers MC1 and MC1L, and logic enabled only during the second pass combines the data transferred to LSH from TEMP with the new result being accumulated. Otherwise, the operation of the pipeline during the second pass is the same as during the first pass.

2.3.5.2 FM Control - The fraction multiplier logic is hardware rather than firmware controlled. Four state bits select one of 13 function states that control the FM logic. Within each state, the state bits, various internal flags, and various flags from other FPA logic are combined to provide the control signals needed to implement the selected state's functions (Figure 2-25 and Table 2-18).
Figure 2-25 FM Control States (state diagram)

Table 2-18 FM Control States
(The state variable columns X3 through X0 and the output control columns LD CNTR, CNTR CONSTANT, NEXT TTH, NEXT ALU ADD, NEXT FLAG, NEXT CLR, MUL/DIV DONE, PPROD, ACCM, LSH, and TEMP are not legible in this copy. Next-state conditions are given as printed.)

INIT
  Next state: IF T0, THEN 0000; ELSE 0010
  Definition: Result of MINIT signal from microcode. Prepares MPLIER nibble select counter for MULF sequence.
SYNC
  Next state: 1101
  Definition: Entry from state 0000 at T50 to provide synchronization between the multiplier's 50 ns clock and the microcode's 200 ns clock.
CONT
  Next state: 0001
  Definition: NOP if MULF or EMODF; load MPLIER nibble select counter if MULL, MULD, or EMODD.
TEST
  Next state: IF CONT, THEN 0000; ELSE IF DIV, THEN 1000; ELSE IF DBL OR INT, 1100; ELSE 0100
  Definition: Tests op code for first execution state calculation; clears the multiplier data path.
NOP
  Next state: IF EVEN, THEN 1000; ELSE 1011
  Definition: Waits for first quotient bit to be formed in the NALU.
DIV
  Next state: IF DIV DONE, THEN 1011; ELSE 1111
  Definition: Shifts LSH and TEMP left one bit to accept quotient bits in divide.
WAIT
  Next state: IF FLAG, THEN 1100; ELSE IF DBL, THEN 0100; ELSE 1110
  Definition: Clears data path and carry registers for MULD, EMODD, and MULL. Waits for first ROM look-up.
MULL
  Next state: IF COUNT=3, THEN 1110; ELSE 1111
  Definition: Runs multiplier pipe for MULL.
PIPE
  Next state: IF SHF ZEROES AND DBL AND FLAG, THEN 0101; ELSE IF D1, THEN 0111; ELSE 0100
  Definition: Runs multiplier pipe for floating-point multiply operations. LSH's 4 LSBs to ACCM's 4 MSBs if second time through DBL multiply.
CADD
  Next state: IF D4, THEN 0111; ELSE IF FLAG, THEN 0110; ELSE 1111
  Definition: Stops pipe to add final stored carries to final accumulation. Loads TEMP.
XFER
  Next state: IF D8, THEN 0110; ELSE 0100
  Definition: Shifts ACCM, TEMP, and LSH right to transfer.
ADDZ
  Next state: IF D1, THEN 0101; ELSE 0111
  Definition: Adds zeroes to ACCM's 4 MSBs.
DONE
  Next state: 1111
  Definition: Stops all registers from changing to allow NR or CPU D REG to accept final result.

The states can be roughly divided into four groups:

1. IRD
2. Integer Multiply
3. Fraction Multiply
4. Divide.

This section will discuss the states by groups and in the previously shown order. Within each discussion, the states will be discussed in the order they are accessed within the group. This is important because the function of some states is partially dependent on the previous state.

The state of the logic is defined by the output of the PRESENT STATE register, which is clocked on a 50 ns cycle. The inputs to this register (the next state) are based on the current state and internal and external flags. A majority of the internal flags provide sequence information and are generated in the logic shown in Figure 2-26.

IRD Group (Instruction Register Decode)
When the FM logic is not performing a multiply or divide operation, it is in IRD. While waiting, the logic is continually cycling through the 4 states in this group, preparing the FM logic for a multiply. In this IRD group the op codes in the instruction buffer are monitored.
Initially (in INIT), the FM logic is set up for a MULF, but if the op codes indicate either a MULL, MULD, or EMODD, new information is loaded into the FM logic in the CONT state. The FPA microcontrol will be loading the MPLIER and MCAND registers during IRD if the op codes indicate a multiply operation.

The control logic enters INIT whenever the Multiplier Operand Control (OPLD) field in the FPA microcontrol store is F. This normally happens during the FPA IRD or when a multiply operation is finished. The SYNC state is entered at CPU T50 and synchronizes the FM clock with the CPU clock. It also clears FLAG. CONT is entered at T100 and loads new information if the op codes indicate a MULL, MULD, or EMODD. TEST is entered at T150. In TEST, if the MCNT bit in the FPA microcode is not asserted, indicating that the FPA does not want the multiply pipeline to begin, the FM returns to the INIT state and continues waiting. If, however, MCNT is asserted, indicating that the multiplier operands are loaded and the FPA wants a multiply to start, the correct execution state is selected based on the op code. Refer to Table 2-18 for a summary of IRD group functions.

Multiply Float Path
If the op code indicates a MULF, the PIPE state is selected and the multiplier pipe can continue. Note that during INIT the nibble counter was loaded with MULF control data, and ROM look-up began based on that data. Since a MULF is being done, the data in the beginning of the pipe is correct. The logic remains in this state (PIPE), running the pipe and accumulating the answer, until D1, a timing signal, is asserted. When D1 is asserted, the current content of the PPROD plus ACCM plus the stored carries is the final correct answer.

Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the SALU add of stored carries to the AALU content. CADD also latches the SALU result into TEMP. The FM logic remains in CADD 150 ns (until D4 is asserted).
Since FLAG was cleared during the IRD group and never set, it is clear, and asserting D4 initiates the DONE state. This state asserts MUL/DIV DN and NOPs all other FM logic. MUL/DIV DN, monitored by the FPA control logic, returns control to the FPA microcontrol. It is the FPA control store that selects the MULF result, via a multiplexer, directly from the SALU outputs rather than from TEMP. The FM logic will remain in DONE until returned to INIT by the multiplier INIT code in the multiplier operand control field of the FPA microcontrol store. Refer to Figure 2-27 for a summary of MULF control.

Figure 2-26 FM Control Logic (blocks shown: multiplier nibble counter, 4-bit up counter, 6-bit latch, decode, 8-bit register wired as shift register producing D1 through D8, MPLIER select lines for ROM banks A and B)
NOTE: This figure shows only general signal flow. All items shown have numerous other outputs and interconnections.

Figure 2-27 MULF Control (timing chart; state sequence shown: INIT, SYNC, CONT, TEST, PIPE for 7 cycles, CADD for 3 cycles, DONE; nibble counter values A through F)
*After each addition of the partial product and accumulator contents, the 4 least significant bits of the result are loaded into the LSH register.

MULD Path
If, when the FM control logic is in TEST, the op codes indicate a double precision multiply (DOUBLE set), the WAIT state will be entered. Initially (in INIT) the nibble counter was loaded for MULF and ROM look-up began; then in CONT (100 ns later), when a MULD was decoded, new data was loaded into the nibble counter. The WAIT state waits for the data loaded in CONT to settle and access new ROM locations before beginning the pipe. After 100 ns in this state FLAG is set. In this context, FLAG set indicates the first pass in a double precision multiply. After 150 ns, since both DOUBLE and FLAG are set, PIPE is entered.

The logic remains in the PIPE state, running the pipe and accumulating the answer, until D1, a timing signal, is asserted. When D1 is asserted, the current content of ACCM plus the two sets of stored carries is the first half of the MULD partial product.

Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the SALU add of stored carries and the ACCM content. CADD latches the upper 32 bits of the first half of the MULD partial product in TEMP. The lower 32 bits have been accumulating in LSH during the pipeline operation. The FM logic remains in CADD 150 ns (until D4 is asserted). Since FLAG is asserted, indicating first pass, asserting D4 selects the XFER state.

Four cycles in the XFER state transfer the content of TEMP and LSH to LSH and ACCM (refer to Figure 2-28), clear FLAG, and clear the stored-carry registers.
The assertion of D8 returns the FM logic to PIPE. With the FLAG bit now cleared and DOUBLE set, ALU ADD is asserted. This signal causes the data stored in LSH during the XFER state to be added in (4 bits per cycle) to the final product being developed. Six cycles transfer all 24 bits stored during XFER. While these bits are being right-shifted from the right end of LSH into the MSBs of the developing final product, the LSBs of the developing final product are being right-shifted into the left end of the LSH. When 20 bits have been transferred in from LSH, SHF ZERO is enabled. This causes the logic to enter the ADDZ state. The final 4-bit transfer of LSH data takes place during the first ADDZ state. After that, the ALU that added LSH to the ACCM is disabled.

During this state, the pipe continues functioning and the LSBs of the accumulating final product are still shifted into the left end of LSH. The only difference between PIPE and ADDZ during this second pass is that in PIPE, LSH data bits are added into the MSBs of the ACCM, and in ADDZ, zeros are added. Note that this state even has the same ending criterion as PIPE, namely D1 asserted. D1 asserted transfers control to the CADD state.

As discussed in the MULF path, CADD is entered when the ACCM plus the two sets of stored carries is the final answer. In CADD the stored carries are added to the AALU content by the SALU and the result is latched into TEMP. Since FLAG is now clear, the assertion of D4 causes a transfer to DONE. In DONE, MUL/DIV DONE is asserted. This causes the FPA microcode to select and transfer, via multiplexers, the upper 32 bits of the double precision result from the SALU onto FP bus A and the lower 32 bits from the LSH register onto FP bus B. Refer to Figure 2-29 for a summary of MULD control.

MULL Path
If the op code being monitored during CONT decodes as MULL, new data is loaded into the nibble counter.
The logic proceeds to TEST and, in TEST, selects WAIT as the first execution state because INT (meaning integer) is set. In WAIT, the new ROM data selected by the new ROM address (accessed as a result of the new data loaded into the nibble counter during CONT) is given time to settle before entering the pipeline. When FLAG is set, the data has settled and the integer multiply pipeline state (MULL) is entered. The FM logic remains in the MULL state as the pipeline accumulates the final product (the least significant half accumulates in LSH). When COUNT = 3 is set, the AALU plus the two sets of stored-carrys is the final product. COUNT = 3 asserted selects DONE. In DONE, MUL/DIV DONE is asserted and the final product is available. The FPA microcode loads the upper half from the SALU onto FP bus A during one machine cycle. On the following cycle the lower half is loaded from LSH onto FP bus A. Refer to Figure 2-30 for a summary of MULL control.

[Figure 2-28 (TK-0273): contents of TEMP, LSH, and ACCM before and after XFER; each register is right-shifted 4 times.]

[Figure 2-29 MULD Control, Sheets 1 and 2 of 3 (TK-0530, TK-0531): MULD timing chart showing MUL CLK, MUL STATE, FLAG, "D" timing, MUL nibble counter, contents of ROM STRG, contents of PPROD, PPC CTL, contents of ACCM, ACCY CTL, LSH, TEMP, ALU ADD, and MUL DIV DONE.]

[Figure 2-29 MULD Control, Sheet 3 of 3 (TK-0532): MULD operands and result accumulation. First half: ACCM n = PPn + ACCM n-1 through ACCM 13, then SALU = PP14 plus ACCM 13 plus carrys from PP14 and ACCM 13, giving the first half partial product of MULD (32 bits in TEMP, 32 bits in LSH). Second half, after the transfer: ACCM 15 = PP15 + ACCM 14 through ACCM 27 = PP27 + ACCM 26. Final product = SALU = PP28 plus ACCM 27 plus carrys from PP28 and ACCM 27.]

2.3.5.3 Division - The TEMP and LSH registers in the fraction multiplier logic are used to store the quotient generated during floating-point division. The registers are concatenated, with the MSB of LSH shifting into the LSB of TEMP.
During a divide operation the FPA asserts DIV and loads the divisor and dividend into the FNM. In the FM logic, the nibble counter is loaded for a MULF and clocks through until TEST. To initiate quotient storage the multiply control field (MCNT) of the FPA microcode must be asserted. The combination of MCNT and DIV asserted selects the NOP state in the division path. The FM logic enters NOP with the nibble counter odd and exits when the nibble counter is even. The 2 cycles (100 ns) allow the first quotient bit to be formed.

From NOP, the FM logic enters DIV. In DIV, the logic left-shifts LSH and TEMP one bit every even cycle. When doing a single precision division the single quotient bit is input to both LSH bit 4 and TEMP bit 4. The data input to LSH is never accessed in single precision. In double precision the TEMP bit 4 quotient input is blocked and TEMP bit 3 is input to TEMP bit 4 on the left shifts. DIV DONE is asserted when quotient bits are left-shifted into TEMP bits 28 and 29. This condition is tested at T100 of each state and transfers control to DONE if true. In DONE, MUL/DIV DONE is asserted, stopping the division process in the FNM and causing the FPA microcode to access TEMP for a single precision quotient and TEMP and LSH for a double precision quotient.

2.3.6 Exponent Processor
The exponent processor, part of the FCT, processes the FP exponent during FP operations. During FP multiply/divide, the processor adds/subtracts the exponents as needed. During add/subtracts, the processor stores the larger exponent and determines the final exponent by taking into account the operation, fraction right-shifts, and left-shifts during normalization. By comparing the exponent magnitudes the exponent processor also controls the FPF addition and subtraction in the FAD. Refer to Figure 2-31.
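Because VAX floating-point exponents are held in excess-128 form, adding or subtracting two stored exponents leaves the bias counted twice or not at all, so a correction of 80 (hex) is needed. The following sketch shows that arithmetic only. The function names are hypothetical, and the normalization adjustment and overflow/underflow checks performed by the hardware are deliberately left out.

```python
BIAS = 0x80  # excess-128 bias of a stored VAX exponent


def mul_exponent(ea: int, eb: int) -> int:
    """Biased exponent of a product before normalization.

    (ea - BIAS) + (eb - BIAS) + BIAS simplifies to ea + eb - BIAS.
    """
    return ea + eb - BIAS


def div_exponent(ea: int, eb: int) -> int:
    """Biased exponent of a quotient before normalization.

    (ea - BIAS) - (eb - BIAS) + BIAS simplifies to ea - eb + BIAS.
    """
    return ea - eb + BIAS


if __name__ == "__main__":
    # True exponents 1 and 2 are stored as 0x81 and 0x82.
    print(hex(mul_exponent(0x81, 0x82)))
    print(hex(div_exponent(0x83, 0x81)))
```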
[Figure 2-30 MULL Control (TK-0525): MULL operands (32-bit multiplier, 32-bit multiplicand), MULL timing chart, and MULL result accumulation. ACCM n = PPn + ACCM n-1 through ACCM 7 = PP7 + ACCM 6; SALU = PP8 plus ACCM 7 plus stored-carrys from PP8 and ACCM 7.]

[Figure 2-31 Exponent Processor Block Diagram (TK-0277): LA, LB, CALU (LA - LB), DALU (LB - LA), AMX, BMX, EALU, PR, and XR, with FP bus A and FP bus B. SHF COUNT to the FAD is always positive or zero.]

The FPEs are loaded from FP buses A and B into LA and LB under control of the EAC field in the microcontrol (Table 2-19). The contents of LA and LB are loaded into CALU and DALU. CALU computes LA - LB and DALU computes LB - LA. The carry-out signal from DALU selects either CALU or DALU as the positive exponent difference (SHF COUNT) to provide FPF control in the FAD.
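The selection of a non-negative shift count from the two difference ALUs can be modeled as below. This is a sketch under stated assumptions: exponents are treated as 8-bit values, DALU is modeled as a two's-complement subtract, and the carry-out is assumed to be 1 when LB is greater than or equal to LA (no borrow). The function name and the returned `a_gt_b` flag are illustrative, not signal definitions taken from the prints.

```python
MASK8 = 0xFF


def exponent_difference(la: int, lb: int):
    """Return (shf_count, a_gt_b) for two 8-bit biased exponents."""
    calu = (la - lb) & MASK8            # CALU computes LA - LB
    raw = lb + ((~la) & MASK8) + 1      # DALU computes LB - LA as LB + ~LA + 1
    dalu = raw & MASK8
    carry_out = raw >> 8                # assumed: 1 means LB >= LA (no borrow)
    # Pick whichever ALU holds the non-negative difference.
    shf_count = dalu if carry_out else calu
    return shf_count, not carry_out


if __name__ == "__main__":
    print(exponent_difference(0x85, 0x82))
    print(exponent_difference(0x82, 0x85))
```

Computing both differences in parallel and selecting one avoids a separate negate step, which is consistent with the statement that SHF COUNT is always positive or zero.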
Table 2-19 EAC Control Store Field

EAC Bit   µCS Bit   Operation
3         27        Controls bus A to LA transfers
2         26        Controls bus B to LB transfers
1         25        Controls EALU to PR transfers
0         24        Controls EALU to XR transfers

The table lists all 16 codes of the field (hex 0 through F), including a NOP code.

NOTE
Although the control field appears to be a 4-bit field, each of the 4 bits actually controls a single, independent function.

The contents of LA and LB, as well as XR (poly register), PR (product register), a normalization constant, and 80₁₆ are possible inputs to EALU. Input selection is controlled by both microcontrol and hardware. Refer to Table 2-20 for an input selection summary.

Table 2-20 EALU Input Control

AMXC Field
Bit 1 (µCS 35)   Bit 0 (µCS 34)   Operation
0                0                LA to EALU A input
0                1                LB to EALU A input
1                0                PR to EALU A input
1                1                Hardware select: for FP add/subtract, larger exponent (LA or LB) to EALU A input

BMXC Field
Bit 1 (µCS 33)   Bit 0 (µCS 32)   Operation
0                0                Normalization constant to EALU B input
0                1                XR to EALU B input
1                0                80₁₆ to EALU B input
1                1                LB to EALU B input

The EALU operation is controlled by the microcontrol field EALUC. Refer to Table 2-21. The output of the EALU can be loaded into XR or PR for further processing, or loaded onto the FPA bus as a final answer. The XR and/or PR are loaded under control of the EAC microcontrol field. Refer to Table 2-19 (bits 0 and 1). The EALU output to FP bus A <14:07> is controlled by the BSC microcontrol field (Bus A EXP). Refer to the discussion of the BSC field in Paragraph 2.3.2.

The partial answers in XR and PR are reloaded into the EALU via AMUX and BMUX, and are combined with either a normalization constant or ±80₁₆ before they are loaded onto FPA <14:7>. Refer to Table 2-20. The normalization constant, a variable quantity, adjusts the exponent for shifts required to normalize the FPF in the FAD.
(The actual normalization constant is read from a ROM rather than computed. The ROM is on the FNM.) The 80₁₆ corrects for the offset that results in FPE add/subtract during exponent processing in MUL/DIV. Refer to Paragraphs 1.4 and 1.5.

Table 2-21 EALU Control Store Field

EALU Field                     Control Signals Generated
Bit 1 (µCS 31)  Bit 0 (µCS 30)  Carry Req  Mode Control  S3 S2 S1 S0   EALU Operation Required
0               0               X          H (logic)     H  H  H  H    Pass A input
0               1               0          L (arith)     L  H  H  L    A - B
1               0               1          L (arith)     H  L  L  H    A + B
1               1               X          H (logic)     H  H  L  L    Force 1's out (interpreted as underflow). This function is used to generate zeros on the buses.

X = Don't care

2.3.7 Sign Processor
The sign processor, a section of the FCT, determines the sign of the FP operation result using both hardware and the microcontrol field SGNC (sign latch controls). Refer to Figure 2-32 and Tables 2-22 and 2-23. This section receives information indicating the sign and magnitude of each operand, the desired operation (add, subtract, multiply, divide, poly), and the magnitude of the result. The resulting sign is placed on FP bus A 15.

[Figure 2-32 Sign Processor Block Diagram (TK-0280): sign latches SA, SB, and SX fed from FP bus A <15> and FP bus B <15>, the instruction decoder, the µCS SGN field, and combinatorial logic driving the sign output to FP bus A <15>.]

Table 2-22 SGNC Control Store Field

SGN C2 (µCS 07)  SGN C1 (µCS 06)  SGN C0 (µCS 05)  Load into SA            Load into SB
0                0                0                SA (NOP)                SB (NOP)
0                0                1                FP bus A 15             SB (NOP)
0                1                0                SA ⊕ (Op Code = SUB)    SB (NOP)
0                1                1                Result*                 SB (NOP)
1                0                0                SA (NOP)                FP bus B 15
1                0                1                FP bus A 15             FP bus B 15
1                1                0                SB                      SB (NOP)
1                1                1                SA ⊕ SX                 SB (NOP)

* This is the resultant sign, determined by the op code, signs of the operands, the relative magnitude of the exponents, and the signs of the FALU.
It can also be forced if a floating underflow or overflow occurs.

Table 2-23 Sign Processor Operation

Op Code   Sign of Result (FALU sign)   Relative Size of Exponents   Result*
MULX      X                            X                            SA ⊕ SB
DIVX      X                            X                            SA ⊕ SB
ADDX      X                            LA > LB                      SA
SUBX      X                            LA > LB                      SA
ADDX      X                            LA < LB                      SB
SUBX      X                            LA < LB                      SB
ADDX      Positive                     LA = LB                      SB
ADDX      Negative                     LA = LB                      NOT SB
SUBX      Positive                     LA = LB                      SB
SUBX      Negative                     LA = LB                      NOT SB

X = Don't care
* Except for error: in case of overflow, the sign is forced to a 1, while underflow forces a 0.

2.3.8 Control Store and Logic
As indicated in previous sections, the control store and logic, located on the FCT, provides the control signals for all FPA operations. These include both FPA internal operations (the transfer and manipulation of FP data) and external operations (the interface between the FPA and CPU). Refer to Figure 2-33.

[Figure 2-33 Control Store and Logic Block Diagram (TK-0271): control store, next address logic, op code and specifier decode logic, stall logic, trap logic, and CPU interface logic.]

The FPA has two normal operating functions: instruction register decode (IRD), and performing an FPA instruction. The FPA normally alternates between these two functions. A third function, exceptional conditions, handles error conditions, traps, and interrupts. The FPA executes the third function whenever an exceptional condition is sensed.

The FPA and the CPU run synchronously, i.e., both have 200 ns microcycles divided into 4 time states (CPT0, CPT50, CPT100, CPT150), and T0 CPU is simultaneous with T0 FPA. Both load a new microword only at T0.

The FPA always keeps two updated copies of the 16 CPU general (scratchpad) registers. These copies are used by the FPA to optimize register-mode instructions.
These register copies are accessed and updated by the same lines that access and update the CPU registers themselves. To ensure that the FPA never reads a changing register, the CPU updates the general register set (and FPA copies) between T100 and T200 (T0) and the FPA reads the copies only between T0 and T100.

The FPA as a whole is directly controlled by the CPU. The CPU can enable and disable the FPA via bit 15 of the FPA status register (ID bus register 17). The FPA is normally enabled by the CPU.

The FPA is a microcontrolled unit containing 512 words by 48 bits of control store in ROM. Each word is divided into control fields of various lengths, each field providing independent control of a particular section of the FPA. In general, these fields: control the operation of the FPA data manipulation components; coordinate the operation of the FPA with the operation of the CPU; and initiate the operation of parts of the FPA control logic. Control of FPA operations is handled by accessing specific ROM words, each causing a particular set of FPA actions.

2.3.8.1 IRD - The IRD state is controlled by location IRD.1 in the control ROM. In this state a new microword is not read until STALL is disabled. ACC INSTR H and IB CALL from the CPU microword disable the STALL condition. When the FPA leaves IRD, the ACC ERROR bit in the status register is cleared if it was set during a previous cycle.

The op code and specifier decode logic monitors the IRC OPC 7:0 and specifier lines. The OPC lines enable ACC INSTR H when an FPA instruction is in the IB and are decoded to determine instruction type. The specifier decode lines determine specifier type. The output of this decode logic is transmitted to the next address logic.

Location IRD.1 controls all FPA operations in the IRD state. The operation assumed is a register to register operation.
The FPA continually begins this operation without any indication that the next operation will be an R to R because it has both operands in its register set and, if the next FPA operation is an R to R, both operands will already be loaded.

Location IRD.1 has MSC = 6 and the next address = 180. This information is transmitted to the next address logic and, along with the outputs of the op code and specifier decode logic, determines the correct next microaddress.

In the next address logic (refer to Figure 2-34 and Table 2-24), MSC = 6 and the op code and specifier decode logic lines select the address offset to be ORed with next address (= 180) to select the next microaddress. MSC = 6 selects the A-fork inputs from the op code and specifier decode logic lines and transmits them through the A-B fork mux. This selects the correct offset based on instruction type, float or double, and specifiers 1 and 2.

[Figure 2-34 Next Address Logic (TK-0534): next address mux, A-B fork mux, and BEN mux, with trap control signals, CS decode, ACC trap address from the CS bus, maintenance register <16:23> from the ID bus, A-fork and B-fork data, and branch enable data.]

Table 2-24 Next Address Lines

Address                 Description

Next Address Control Lines
FCTK BEN 2:0 H          From FPA control store; selects lines to be monitored during execution flows.
CS 71, 70               CPU accelerator control field:
                        00 - NOP
                        01 - CPSYNC
                        10 - ACC TRAP - to 3-bit address specified by CPU USI field
                        11 - REDEFINE USI
CS 57, 56, 55           If CS 71 and CS 70 are high, enabling DEC USI, a 6 on these lines enables POLY DONE, a 7 FP TRAP.
FCTH ACC TRAP H         High during accelerator trap, low otherwise.
FCTH FP TRAP L          Low during FP trap, high otherwise.
FCTH TRAP DIS L         Low during either FP trap or accelerator trap, high otherwise.
Next Address Selector Controls
DEC µSI                       FCTH DEC µSI L enabled and CS 57, 56, and 55 high enable FCTH FP TRAP; otherwise it is high.
A-FORK B-FORK SELECT MUX      Enable H causes all highs out and doesn't affect next address. Enable L enables select input to select A-B data.
NEXT ADDRESS MUX              Enable H causes all highs out. If enable is low, S low selects A input.
BEN MUX                       Enable high causes all highs out.

Address Lines
FCTR CRADR 08:00 H            To control store; selects address. Can also be transmitted to CPU via Reg 16 as current ADR.
FCTK NEXT ADR 08:00           From control store; next address from microword.
FCTH TRAP A 07:00 L           To FCTF. Contains either trap address or next address.
FMHR TRAP A 7:00 H            FP trap address from MAINT REG ID BUS.
FCTH BRC 2:0 L                From branch enable mux (BEN). Monitors various FPA conditions and modifies the next address during execution flows based on BEN field in FPA microcode.
A-B FORK ADR                  (Not a signal name on prints.) From A-FORK B-FORK select mux. Monitors op code and specifier type from IB and modifies address in A-B forks.
FCTF FLOAT H                  Based on op code. Used during A-B forks and by branch enable logic (BEN).
CS 57, 56, 55                 Select trap address during ACC trap. Also refer to CS 57, 56, 55 in control lines.

The offset is ORed with 180 and, since STALL is no longer enabled (ACC INSTR H is high), the next CPT0 will select the correct microword to control the next FPA cycle. If the data is already in the FPA, an optimized routine will be selected.

2.3.8.2 Performing an FPA Instruction - Once an FPA instruction is sensed, the microcontrol words and the order in which they are selected are based on the operation desired, float or double, location of the operands, and relative size of the operands and/or result.

The FPA first ensures that it has all the required data. If both operands are in registers, or one is in a register and the other is a short literal, all the data is in the FPA after the A-fork test and the FPA transfers directly to the execution flows.
If not, the first operand is fetched during A-fork and then MSC = 7 and next address = 100 is transmitted to the next address logic. In the next address logic, MSC = 7 selects the B-fork inputs from the op code and specifier decode, and transmits them through the A-B fork mux to be ORed with next address = 100. The offset selected depends on instruction type, double or float, and type of specifier 2. As before, if the data is already in the FPA, an optimized routine is selected; otherwise, the FPA waits for the CPU to fetch data.

In some data transfers (A-fork or B-fork) the FPA must wait for data to be transmitted from the CPU via the ID bus. The microcode has a special WAIT bit to enable STALL for this purpose. The CPU indicates that the required data is on the ID bus by asserting CP SYNC. CP SYNC causes the data to be stored in the FPA and clears STALL, thereby enabling a new microword to be read and FPA operations to continue.

Once the FPA has all required data, ACC OVERIDE is asserted. This signal, transmitted to CPU microaddress bit 12, causes the CPU to select microcode from FPA specialized microcode in the writeable control store (WCS) rather than PCS. This prevents the CPU from beginning microcode floating-point routines (used when no FPA is present) to do FP instructions. The enabling of ACC OVERIDE is based on instruction type (IRC lines) and the execution point counter (IRC EP 2:0). Note that since the FPA cannot fetch data itself, the data-fetch routines (CPU AFORK and BFORK) are allowed to continue until the FPA has all required data.

Once the FPA has all the data, the FPA execution flows are entered. These flows perform the manipulation required to A, S, M, and D. This includes unpacking and individually manipulating the FPF and FPE parts of the number, as well as checking the operands and/or results for unusual conditions (zeros, underflow, overflow, etc.).
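The A-fork and B-fork address selection described above amounts to ORing a decode-dependent offset into the next address field when MSC is 6 or 7. The sketch below models only that step. It assumes that the addresses 180 and 100 are hexadecimal (the control store has 512 words, so addresses are 9 bits); the offset values in the example are hypothetical, since the real offsets come from the op code and specifier decode logic.

```python
ADDR_MASK = 0x1FF      # 9-bit microaddress (512-word control store)
A_FORK_BASE = 0x180    # next address in location IRD.1, assumed to be hex
B_FORK_BASE = 0x100    # next address used with MSC = 7, assumed to be hex


def next_microaddress(msc: int, next_addr: int, fork_offset: int) -> int:
    """OR the fork offset into the next address when MSC selects a fork.

    msc == 6 selects the A-fork inputs and msc == 7 the B-fork inputs;
    for any other value the next address field is used unchanged.
    """
    if msc in (6, 7):
        return (next_addr | fork_offset) & ADDR_MASK
    return next_addr & ADDR_MASK


if __name__ == "__main__":
    # 0x78 and 0x2C are made-up offsets used only for illustration.
    print(hex(next_microaddress(6, A_FORK_BASE, 0x78)))
    print(hex(next_microaddress(7, B_FORK_BASE, 0x2C)))
```

Because the offset is ORed rather than added, the base address must have zeros in every bit position the decode logic can set, which both 180 and 100 satisfy in their low-order bits.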
During execution flows the BEN field selects lines to be monitored and used to modify the next address. The 3-bit BEN field of each microword can select 3 of 24 possible lines to be ORed with the next address field of the microword to select the address. The BEN multiplexer monitors signals from both the CPU and FPA. POLY DONE and CP SYNC are transmitted from the CPU using CS lines 71, 70, 57, 56, and 55. FLOAT, IRBR0 L, and IRBR1 L are generated in the FPA but are summaries of op code information transmitted from the instruction buffer. All other BEN lines monitor FPA internal conditions. Refer to Table 2-25 for a summary of BEN fields.

Finally the flows manipulate the result to ensure it is in correct form and inform the CPU, via FP SYNC asserted, that the answer is available.

Table 2-25 BEN Control Store Field Operation

BEN Field   Lines Monitored (BRC2 L, BRC1 L, BRC0 L)    Summary
0           (none)                                      NOP
1           FLOAT H*, IRBR1 L*, IRBR0 L*                Op code decode
2           SWR, SWR, SWR                               Shift within range
3           RSV H, B=0 H, A=0 H                         Reserved operand; operand(s) equal zero
4           POLYDN L*, CPSYNC H*, FLOAT*
5           (A or B=0) H, ED.GE.8 H, SUB*ED<2 H         Operand(s) equal zero; check exponent difference
6           MUL/DIV DN H                                Multiply done; division done
7           UNDFL, PR8 H                                Error condition

* From the CPU.

The CPU accepts the answer via DFMX bus drivers on the FNM using DAP ENA ACC D (1) and also reads the ACC Z, V, C, and N data lines to determine the condition codes of the answer. Once the CPU has the answer it transmits a CPSYNC and the FPA returns to its IRD state.

2.3.8.3 Exception Conditions - At any time during either IRD or instruction states the CPU can direct the FPA to enter a trap routine for error recovery or microdiagnostics. The trap routines are located in the FPA's own microcode. There are two separate sets of trap routines: ACC traps for CPU and FPA errors, and FP traps for microdiagnostics. Both trap routines are initiated via CS lines 71 and 70. If CS bus 71 is H and CS bus 70 is L, an ACC TRAP is initiated.
An ACC TRAP addresses the FPA microcode location selected by CS bus lines 57, 56, and 55 (location 0-7). These traps are normally initiated for power-up and abort sequences. If CS bus 71, 70, 57, and 56 are high and 55 is low, an FP trap is initiated. The FP trap selects an 8-bit address previously stored in ID register 16, the Status register, to access one of 256 addresses in the FPA microcode (location 0-255). These trap locations normally handle FPA microdiagnostics. Refer to Figure 2-34.

2.4 FPA MICROCONTROL FIELDS
This section summarizes all the fields in the FPA microcontrol word. Figure 2-35 shows the complete microcontrol word, all the fields, and the microcode mnemonics. Table 2-26 lists the function of each field.

[Figure 2-35 FPA Control Word Fields (TK-0513): layout of the 48-bit control word, bits 47 through 00, showing next address, branch enable, EALU A input, EALU B input, EALU control, FPSYNC, MCTL, exponent processor control, wait, miscellaneous controls, normalization register control, scratchpad control, bus A - bus B data source, fraction processor control, sign latch control, remainder register control, and multiplier operand control.]

Table 2-26 FPA Control Word Field Definitions

Microcode Bits     Field                                    Function
47:39 (9 bits)     NAD - Next Address                       Contains the address of the next control word to be accessed.
38:36 (3 bits)     BEN - Branch Enable                      Selects signals to be used for next address calculations.
35:34 (2 bits)     AMXC - A Mux Control                     Selects A input to FCT exponent ALU.
33:32 (2 bits)     BMXC - B Mux Control                     Selects B input to FCT exponent ALU.
31:30 (2 bits)     EALUC - EALU Control                     Controls FCT exponent ALU operation.
29 (1 bit)         FPSYNC - Floating-Point Synchronize      Transmits FPSYNC to CPU.
28 (1 bit)         MCTL - Multiply Control                  Starts FML and FMH fraction multiply operation.
27:24 (4 bits)     EAC - Exponent Processor Control         Controls FCT (exponent processing).
23 (1 bit)         WAIT - Wait                              Controls FPA wait loop operation. Stalls until CPSYNC.
22:20 (3 bits)     MSC - Miscellaneous Control              Controls miscellaneous FPA operations.
19:18 (2 bits)     NRC - Normalization Register Control     Controls fraction normalize operation in FNM.
17:16 (2 bits)     SCR - Scratchpad Control                 Handles FPA general register copies on FNM.
15:12 (4 bits)     BSC - Bus A - Bus B Data Source          Controls data transmission along FPA buses.
11:8 (4 bits)      FADC - Fraction Processor Controls       Controls FAD fraction processing.
7:5 (3 bits)       SGNC - Sign Latch Controls               Controls sign calculation on FCT.
4 (1 bit)          LRR - Load Remainder Register            Controls remainder register (RR) on FNM.
3:0 (4 bits)       OPLD - Operand Load (Multiplier Control) Loads fractions for multiplication on FML and FMH.

2.5 FPA MICROCODE STRUCTURE
The FPA contains a 512 word by 48 bits (per word) memory. This memory provides microcontrol of the FPA during normal operation and diagnostic programs for maintenance and troubleshooting. About 225 locations are for normal microcontrol, and 200 locations contain diagnostic programs. The other locations are available for future use.

The microcontrol code has an IRD state (instruction register decode) and three fork points (A, B, and C). The FPA remains in the IRD state until an FPA instruction is decoded. The FPA then enters A-fork to receive the operands. If both operands are registers or short literals, optimized routines are entered and computation begins. Otherwise, B-fork is entered. If the second operand is not register data, C-fork is entered. Otherwise a B-fork optimization is taken. Figure 2-36 shows the basic microcode structure and indicates the microcode starting addresses of the various routines.

2.6 FPA INTERFACE FIRMWARE
The CPU-FPA interaction is handled by specialized firmware located in the CPU's writeable control store (WCS).
This firmware handles numerous interface tasks. For ADD, SUB, MUL, and DIV operations it accepts and stores the FPA results and condition codes, and handles any exceptions flagged by the FPA. In 3-operand op codes it calls specifier decoding microcode in the base machine to decode the third operand. It also handles the special requirements of the EMOD, MULL, and POLY commands. It is accessed when the FPA overrides the CPU address by forcing µPC <12> to 1. This happens when the FPA detects an execution or optimization exit at a CPU A-fork, B-fork, or C-fork for an FPA implemented instruction.

2.6.1 Major Interface Functions
This firmware coordinates the interface between the CP microcode and the FP microcode, including the normal transfers of CPU data to the FPA, FPA results back to the proper register in the CPU, and various control signals for both normal and exception control. Table 2-27 lists important macros and microorders that are used by the FPA interface firmware to generate and/or monitor the signals which are transferred between the CPU and FPA.

[Figure 2-36 FPA Microcode Structure (TK-0511): IRD, A FORK, B FORK, and C FORK entry points with the starting addresses of the routines. Data source key: R = register, S^# = short literal, # = literal (immediate), MEM = memory, X = don't know.]

Table 2-27 Interface Microcode

ID-D.SYNC
    Signal monitored or generated: CP SYNC generated
    Data transfer: CPU → FPA
    Function: Gates the CPU D-Register's contents onto the ID bus. Generates CP SYNC. CP SYNC indicates that valid data is on the bus.

D-ACCEL & SYNC
    Signal monitored or generated: CP SYNC generated
    Data transfer: FPA → CPU
    Function: Gates data placed on the DFMX bus by the FPA into the D-Register. CP SYNC indicates that the FPA's data has been accepted.

Q-ACCEL & SYNC
    Signal monitored or generated: CP SYNC generated
    Data transfer: FPA → CPU
    Function: Gates data placed on the DFMX bus by the FPA into the Q-Register. CP SYNC indicates that the FPA's data has been accepted.
ACCEL?* (BEN/ACC<UB2, UB1, UB0>)†
    Signal monitored or generated: FP SYNC monitored
    Data transfer: FPA → CPU
    Function: ACC<UB0> = 1; result data, on the DFMX bus, and condition codes are being transmitted by the FPA. If double precision, condition codes are passed with the first half.

    Signal monitored or generated: ERR SYNC monitored
    Data transfer: No
    Function: ACC<UB1> = 1; an exception has been detected by the FPA. This initiates specialized routines that handle the exception.

    Signal monitored or generated: Not MULL** generated
    Data transfer: No
    Function: ACC<UB2> = 1; separates MULL and MULF.

POLY.DONE
    Signal monitored or generated: POLY.DONE generated
    Data transfer: CPU → FPA
    Function: Indicates the last coefficient in the POLY operation is being presented. In POLYD, used while both halves of the last coefficient are transmitted.

TRAP.ACC[1]
    Signal monitored or generated: Accelerator trap
    Data transfer: No
    Function: Returns FPA microcode to IRD state.

MSC/LOAD.ACC.CC†
    Data transfer: No
    Function: Loads PSW <N,Z,V,C> with FPA generated condition codes from CPU latches loaded in the previous cycle.

* This macro, in combination with the target constraint block, enables the CP microcode to test for various conditions.
† This is a microorder rather than a macro.
** This is a condition rather than a specific signal.

2.6.2 Major Instruction Groups
The FPA firmware can be broken into 4 groups of routines: generalized instructions handler, POLY handler, MULL handler, and EMOD handler.

Group 1 handles all ADD, SUB, MUL, and DIV instructions as well as FPA exceptions. This group provides optimized flows for operands located in the general register set and literal operands.

The POLY group transmits the polynomial coefficients to the FPA as they are needed and transmits POLY DONE when the last coefficient has been transmitted. It also responds to the FPA detection of overflow, underflow, and coefficient reserved operand. Overflow and reserved operand detections cause a branch to exception condition routines in the base machine. If an underflow is detected, the firmware notes it and continues execution of the POLY flows.

The MULL routine accepts the result of the longword integer multiplication from the FPA.
Since the FPA creates an unsigned 64-bit product using 32-bit signed operands, the firmware must correct the result by subtracting out the effects of the negative signs on the magnitude result. To do this, the firmware stores the operands in a form that can later be used as subtrahend operands to correct the product and, based on this stored information, determines which correction sequence to select when the result is transmitted from the FPA. The firmware also creates the proper signed result, sets the condition codes, and tests for overflow.

The FPA handles only the fraction multiply of the EMOD instructions. As a result, the EMOD firmware is relatively short. While the FPA is doing the fraction multiply, this routine adds the exponents and checks for reserved operands. It then accepts the fraction multiply result from the FPA, checks for a zero result, and formats the FPA result so control can return to the EMOD routines in the base machine.
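
The MULL sign correction described above can be illustrated with a short Python sketch. It is not the FP780 microcode. It assumes that the accelerator multiplies the two operand bit patterns as unsigned 32-bit values, and that the correction subtracts the other operand from the high longword once for each negative operand; this is the standard identity for recovering a two's-complement product from an unsigned one. The names unsigned_multiply, to_signed, and mull are hypothetical. The condition-code settings follow the architectural definition of MULL (C is cleared, V is set when the product does not fit in a longword).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret a bit pattern of the given width as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def unsigned_multiply(ua, ub):
    """Hypothetical stand-in for the accelerator: 32 x 32 -> 64-bit unsigned product."""
    return (ua * ub) & MASK64


def mull(a, b):
    """Signed longword multiply built on an unsigned multiplier (assumed scheme)."""
    # Keep the operand bit patterns; they serve later as subtrahends.
    ua, ub = a & MASK32, b & MASK32

    product = unsigned_multiply(ua, ub)
    high, low = product >> 32, product & MASK32

    # Select the correction sequence from the stored operand signs.
    if a < 0:
        high = (high - ub) & MASK32
    if b < 0:
        high = (high - ua) & MASK32

    signed64 = to_signed((high << 32) | low, 64)
    result = to_signed(low, 32)  # MULL stores only the low longword

    flags = {
        "N": result < 0,
        "Z": result == 0,
        "V": signed64 != result,  # overflow: product does not fit in 32 bits
        "C": False,
    }
    return result, flags, signed64
```

For example, mull(-3, 5) yields a result of -15 with V clear, while mull(70000, 70000) sets V because the full product needs more than 32 bits. Note that the low longword is the same for signed and unsigned multiplication, so in this sketch the correction affects only the high longword and therefore the overflow test.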