Document:	VAX-11/780 FP780 Floating-Point Accelerator Technical Description
Order Number:	EK-FP780-TD
Revision:	1
Pages:	112
EK-FP780-TD-001

FP780 Floating-Point Accelerator
Technical Description

digital equipment corporation • maynard, massachusetts

1st Edition, December 1978

Copyright © 1978 by Digital Equipment Corporation
The material in this manual is for informational
purposes and is subject to change without notice.
Digital Equipment Corporation assumes no responsibility for any errors which may appear in
this manual.

Printed in U.S.A.

This document was set on DIGITAL's DECset-8000
computerized typesetting system.

The following are trademarks of Digital Equipment Corporation, Maynard, Massachusetts:

DIGITAL
DEC
PDP
DECUS
UNIBUS
DECsystem-10
DECSYSTEM-20
DIBOL
EDUSYSTEM
VAX
VMS
MASSBUS
OMNIBUS
OS/8
RSTS
RSX
IAS

CONTENTS

Section    Title   Page

PREFACE

CHAPTER 1  INTRODUCTION

1.1        GENERAL DESCRIPTION   1-1
1.1.1      Accelerator Interface   1-2
1.2        FPA INSTRUCTION SET   1-3
1.3        PHYSICAL DESCRIPTION   1-4
1.4        REVIEW OF FLOATING-POINT NUMBERS AND ARITHMETIC   1-5
1.4.1      Introduction   1-5
1.4.2      Integers   1-5
1.4.3      Floating-Point Numbers   1-5
1.4.4      Decimal/Binary/Hexadecimal Conversion   1-6
1.4.5      Normalization   1-11
1.4.6      VAX Floating-Point Notation   1-12
1.4.7      Floating-Point Addition and Subtraction   1-13
1.4.8      Floating-Point Multiplication and Division   1-13
1.5        EXCESS 80 (EXCESS 200₈) NOTATION   1-14

CHAPTER 2  FUNCTIONAL DESCRIPTION

2.1        DATA FORMAT   2-1
2.1.1      Floating-Point Numbers   2-1
2.1.2      Integer Numbers   2-4
2.1.3      Literals   2-4
2.1.4      Zero and Reserved Operand Codes   2-7
2.1.5      Hidden, Overflow and Guard Bits   2-8
2.1.6      Overflow, Underflow, Zero, and Reserved Operands   2-9
2.2        INSTRUCTIONS AND ALGORITHMS   2-12
2.2.1      Add/Subtract   2-14
2.2.1.1    Load   2-14
2.2.1.2    Add/Subtract   2-14
2.2.1.3    Normalize   2-15
2.2.2      Multiply (Floating-Point)   2-16
2.2.2.1    Load   2-16
2.2.2.2    Multiply   2-16
2.2.2.3    Normalize   2-17
2.2.3      MULL (Multiply Integer Longword)   2-17
2.2.3.1    Load   2-17
2.2.3.2    Multiply and Return   2-17
2.2.4      Divide   2-17
2.2.4.1    Load   2-18
2.2.4.2    Divide   2-19
2.2.4.3    Normalize   2-19
2.2.5      EMOD (Extended Precision Multiply and Integerize)   2-19
2.2.5.1    Operand Load   2-19
2.2.5.2    Result Calculation and Return   2-19
2.2.6      POLY (Polynomial Evaluation)   2-20
2.2.6.1    Introduction   2-20
2.2.6.2    The Polynomial Expression   2-20
2.2.6.3    Normal POLY Flows   2-20
2.2.6.4    POLY Exception Flows   2-23
2.3        BLOCK DIAGRAM AND UNIT DESCRIPTION   2-25
2.3.1      CPU-FPA Interface   2-27
2.3.1.1    CPU-FPA Status and Control Interface   2-28
2.3.1.2    CPU-FPA Data Interface   2-30
2.3.1.3    Trap and Diagnostic Information   2-31
2.3.2      FPA Internal Buses   2-34
2.3.3      Fraction Adder (FAD)   2-37
2.3.4      Fraction Normalizer/Divide (FNM)   2-41
2.3.4.1    Normalize Operation   2-43
2.3.4.2    Divide Operation   2-45
2.3.5      Fraction Multiplier (FML and FMH)   2-48
2.3.5.1    The Pipeline   2-50
2.3.5.2    FM Control   2-57
2.3.5.3    Division   2-68
2.3.6      Exponent Processor   2-68
2.3.7      Sign Processor   2-74
2.3.8      Control Store and Logic   2-76
2.3.8.1    IRD   2-77
2.3.8.2    Performing an FPA Instruction   2-80
2.3.8.3    Exception Conditions   2-81
2.4        FPA MICROCONTROL FIELDS   2-82
2.5        FPA MICROCODE STRUCTURE   2-84
2.6        FPA INTERFACE FIRMWARE   2-84
2.6.1      Major Interface Functions   2-84
2.6.2      Major Instruction Groups   2-87

FIGURES

Figure No.   Title   Page

1-1    The FPA   1-2
1-2    FPA Physical Location   1-4
1-3    Positional Value of Binary Number   1-11
2-1    Floating-Point Format   2-2
2-2    Integer Format   2-5
2-3    Short Literal Format   2-6
2-4    Zero and Reserved Operand Code   2-8
2-5    Hidden, Overflow, and Guard Bits   2-8
2-6    Overflow and Underflow Ranges   2-11
2-7    FPA Block Diagram   2-13
2-8    The POLY Flow   2-21
2-9    FPA Block Diagram   2-26
2-10   CPU-FPA Interface   2-27
2-11   Status Register   2-28
2-12   Maintenance Register   2-32
2-13   FP Bus Formats   2-36
2-14   Fraction Adder Block Diagram   2-37
2-15   SHFR Operation   2-39
2-16   Fraction Normalizer/Divide Block Diagram   2-42
2-17   Normalize Shift Enable Control Hardware   2-43
2-18   Divide Sequence Hardware   2-47
2-19   Divide Sequence Timing   2-48
2-20   Fraction Multiplier Block Diagram   2-49
2-21   The Pipeline   2-51
2-22   Loading and Accessing the Multiplicand   2-52
2-23   Loading and Accessing the Multiplier   2-53
2-24   SALU Operation - Adding the Stored Carries   2-57
2-25   FM Control States   2-58
2-26   FM Control Logic   2-61
2-27   MULF Control   2-62
2-28   The XFER State   2-64
2-29   MULD Control   2-65
2-30   MULL Control   2-69
2-31   Exponent Processor Block Diagram   2-70
2-32   Sign Processor Block Diagram   2-74
2-33   Control Store and Logic Block Diagram   2-76
2-34   Next Address Logic   2-78
2-35   FPA Control Word Fields   2-82
2-36   FPA Microcode Structure   2-85

TABLES

Table No.   Title   Page

1-1    Related Hardware Manuals   1-1
1-2    FPA Instruction Set   1-3
1-3    FPA Modules   1-5
1-4    Binary-Hex Equivalents   1-10
2-1    Floating Literals   2-6
2-2    Zero Operand Microcode   2-7
2-3    Exception Conditions   2-10
2-4    FALU Operation   2-15
2-5    Special FAD Operation   2-15
2-6    The Division Load   2-18
2-7    The Status Register   2-29
2-8    CS Lines   2-30
2-9    The Maintenance Register   2-33
2-10   Signals Monitored by Visibility Bus   2-34
2-11   BSC Control Store Field   2-35
2-12   Fraction Data Entry   2-38
2-13   FALU Operation   2-40
2-14   FALU MUX Control   2-41
2-15   Round Byte and Normalize Control   2-44
2-16   Divide Sequence States   2-48
2-17   Operand Bus Source   2-55
2-18   FM Control States   2-59
2-19   EAC Control Store Field   2-71
2-20   EALU Input Control   2-72
2-21   EALU Control Store Field   2-73
2-22   SGNC Control Store Field   2-75
2-23   Sign Processor Operation   2-75
2-24   Next Address Lines   2-78
2-25   BEN Control Store Field   2-81
2-26   FPA Control Word Field Definitions   2-83
2-27   Interface Microcode   2-86

CHAPTER 1
INTRODUCTION

1.1 GENERAL DESCRIPTION
The FPA is a microprogrammed device operating as a synchronous extension of the CPU data path. Both the FPA and CPU operate using a 200 ns microcycle; FPA T0 coincides with CPU T0. As an extension of the CPU, the FPA does not access memory data. The CPU must do memory address calculations, access the calculated address, and transmit the accessed data to the FPA. The CPU is also responsible for fetching and storing the FPA results. The FPA performs only the required floating-point or integer operation on the properly formatted operands transmitted to it.
The FPA can execute floating-point addition, subtraction, multiplication, and division instructions. It receives packed, normalized floating-point operands, each containing a sign bit, exponent bits, and fraction bits. The FPA breaks each operand into these parts, and separate FPA data manipulation sections perform the operations the instruction requires on each part. When the result is complete, the FPA normalizes it and packs it for return to the CPU. Refer to Figure 1-1, a simplified diagram of the FPA.

[Figure 1-1 The FPA: simplified block diagram showing the CPU connected to the FPA by the ID bus, the DFMX bus, the CS bus, and control lines; the FPA contains the fraction processors and the exponent and sign processors.]

1.1.1 Accelerator Interface
The FPA is an optional hardware extension of the VAX CPU data path. It is the first of a series of
optional accelerators that can be plugged into slots 24 through 28 of the CPU backplane. To facilitate
design of these optional accelerators, a set of standard interface signals and buses is used to transfer
data and control information.
Copies of the CPU general register set are kept in the FPA. These copies are read-only memory to the FPA and provide rapid access to register operands when they are used in instructions. Every time the CPU general registers are updated, a copy of the update data is transmitted via the DFMX bus to the FPA.

All other operand data (memory or literal) is transmitted to the accelerator via the ID bus. Memory data is transferred to a CPU register and then onto the ID bus. Literal data is transferred from the [illegible in the scanned original].

All op codes are received from the instruction buffer. The FPA uses dedicated hardware to decode each op code and, if the op code is part of the FPA implemented set, processing is initiated.

FPA results are returned to the CPU via the DFMX bus. Any transfer of data (either operands or results) between the CPU and FPA is controlled by the CPSYNC and FPSYNC signals. CPSYNC is transmitted via the CS bus. When an operand is transferred to the FPA, CPSYNC asserted (by the CPU) indicates that data is available on the ID bus, and FPSYNC is asserted (by the FPA) to indicate that the data has been received. When the FPA is returning a result, FPSYNC indicates that the result is available and CPSYNC indicates that it has been received. When a result is transferred, the FPA also transmits the proper condition codes to the CPU.
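The ordering of the two sync signals can be summarized in a few lines of code. This is a minimal sketch under stated assumptions: only the signal names come from the text; the `HANDSHAKE` table and the `transfer_sequence` function are hypothetical illustrations of the assertion order, not a model of the actual CPU or FPA logic.

```python
# Hypothetical sketch of the CPSYNC/FPSYNC ordering described in the text.
# Only the signal names come from the manual; the data layout is illustrative.

HANDSHAKE = {
    # Operand transfer, CPU -> FPA over the ID bus
    "operand": [
        ("CPSYNC", "CPU", "operand available on ID bus"),
        ("FPSYNC", "FPA", "operand received"),
    ],
    # Result transfer, FPA -> CPU over the DFMX bus
    "result": [
        ("FPSYNC", "FPA", "result and condition codes available"),
        ("CPSYNC", "CPU", "result received"),
    ],
}


def transfer_sequence(kind):
    """Return the signal names in the order they are asserted for one transfer."""
    (first, sender, _), (second, receiver, _) = HANDSHAKE[kind]
    # The sending unit asserts first; the receiving unit acknowledges.
    assert sender != receiver
    return [first, second]


for kind in ("operand", "result"):
    print(kind, transfer_sequence(kind))
```

In both directions CPSYNC is driven by the CPU and FPSYNC by the FPA; only the order in which they are asserted changes.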
Traps and errors are handled with three signals: ACC ERROR (from FPA to CPU), FP TRAP (CPU
to FPA), and ACC TRAP (CPU to FPA). ACC ERROR (also called ERRSYNC) is asserted when the
FPA detects an internal error and is input to the CPU BEN mux. FP TRAP is used by the CPU to
initiate microdiagnostics stored in the FPA. ACC TRAP selects either the power-up trap or the abort
trap (both stored in the FPA microcode).
1.2 FPA INSTRUCTION SET
The FPA handles only a limited number of instructions (refer to Table 1-2). No floating-point instructions are available in VAX's PDP-11 compatibility mode. As shown in the table, the FPA handles single- and double-precision instructions in both 2-operand and 3-operand formats, and it handles the single- and double-precision variations internally. However, as stated before, the FPA does no memory accessing. This means the CPU must do all address calculations and accessing for any input operands stored in memory. Also, the FPA does not store any final results; it merely makes the results available to the DFMX bus. The CPU must enable the result onto the DFMX bus, determine the result destination, and put the result into the destination. In a 3-operand instruction, the FPA begins computing as soon as it has the two source operands, while the CPU is computing the third (destination) address.

Table 1-2  FPA Instruction Set

Mnemonic   Description
ADDF*      Add single-precision floating-point
ADDD*      Add double-precision floating-point
SUBF*      Subtract single-precision floating-point
SUBD*      Subtract double-precision floating-point
MULF*      Multiply single-precision floating-point
MULD*      Multiply double-precision floating-point
DIVF*      Divide single-precision floating-point
DIVD*      Divide double-precision floating-point
POLYF      Evaluate polynomial single-precision floating-point
POLYD      Evaluate polynomial double-precision floating-point
EMODF      Extended-precision multiply and integerize, single-precision floating-point
EMODD      Extended-precision multiply and integerize, double-precision floating-point
MULL*      Multiply integer longword

*The FPA instruction set includes both the 2-operand and 3-operand formats of these instructions.


1.3 PHYSICAL DESCRIPTION
The FPA consists of five hex-height, extended-length modules containing mostly Schottky TTL logic. They replace blank modules 7014103 in slots 24 through 28 of the KA780 backplane; these slots are designated as the accelerator option slots. The FPA is powered by an H7100 installed in power supply position 1. When viewed from the rear, position 1 is the rightmost location in the VAX CPU cabinet. Position 1 is left empty if an accelerator is not installed. The H7100 is a 5 V, 100 A supply. Refer to Figure 1-2 for the location of the backplane slots and power supply. Refer to Table 1-3 for module designations and locations.

[Figure 1-2 FPA Physical Location]

Table 1-3  FPA Modules

Module No.   Slot   Module Name   Module Function
M8285        24     FNM           Normalization and fraction division
M8286        25     FMH           Fraction multiplication (most significant bits)
M8287        26     FML           Fraction multiplication (least significant bits)
M8288        27     FAD           Fraction addition and subtraction
M8289        28     FCT           Exponent manipulation and FPA control

1.4 REVIEW OF FLOATING-POINT NUMBERS AND ARITHMETIC

1.4.1 Introduction
This section discusses some fundamentals of floating-point numbers and arithmetic, providing background for more advanced topics in later sections. Readers already familiar with floating-point arithmetic may skip this section.
1.4.2 Integers
All data within a computer system could be represented in integer form. The numbers that could be represented in a 32-bit machine range in magnitude from 00000000₁₆ to FFFFFFFF₁₆ (or from 0₁₀ to 4,294,967,295₁₀). However, integer form imposes some limitations. Only whole numbers can be represented, i.e., there are no fraction or decimal parts; this imposes an accuracy limitation. Furthermore, numbers greater than 4,294,967,295 cannot be represented; this imposes a range limitation.
These limitations are imposed by the stationary position of the radix point (e.g., the decimal point in
base 10 notation or the binary point in base 2 notation). An integer's radix point is usually omitted in
integer representation because it always marks the integer's least significant place. That is, there are
never any digits to the right of an integer's radix point. For this reason, an integer is sometimes called a
fixed-point number.
Integer notation, however, can be modified to overcome the range and accuracy limitations imposed
by the fixed radix point. This is done through the use of floating-point notation.

1.4.3 Floating-Point Numbers
Floating-point numbers, unlike integers, have no position restrictions imposed on their radix points. A
popular type of floating-point representation is called scientific notation. With scientific notation, a
floating-point number is represented by some basic value multiplied by the radix raised to some power.

Example:

1,000,000 = 1. × 10⁶

where 1. is the basic value, 10 is the radix, and 6 is the exponent.

There are many ways to represent the same number in scientific notation, as shown in the following example.

Right shifts:
512 = 512.  × 10⁰
    = 51.2  × 10¹
    = 5.12  × 10²
    = .512  × 10³

Left shifts:
512 = 512.     × 10⁰
    = 5120.    × 10⁻¹
    = 51200.   × 10⁻²
    = 512000.  × 10⁻³

The convention chosen for representing floating-point numbers with scientific notation in the FPA requires the radix point always to be to the left of the most significant digit of the basic value (e.g., .512 × 10³ in the example above). This modified basic value is called a fraction.
Notice that for each right shift of the basic value, the exponent is incremented and for each left shift the
exponent is decremented. The value of the number remains constant if the exponent is adjusted for
each shift of the basic value.
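The shift-and-adjust rule above can be sketched in Python. This is an illustrative sketch, assuming decimal input and using Python's standard `decimal` module for exact arithmetic; the function name `to_fraction_form` is hypothetical. Each division by 10 is one right shift of the basic value (exponent incremented) and each multiplication by 10 is one left shift (exponent decremented), so the value of the number never changes.

```python
from decimal import Decimal


def to_fraction_form(value):
    """Return (fraction, exponent) with value == fraction * 10**exponent and
    the radix point just left of the most significant digit (0.1 <= |f| < 1)."""
    fraction = Decimal(value)
    exponent = 0
    if fraction == 0:
        return fraction, exponent
    while abs(fraction) >= 1:              # right shift: exponent incremented
        fraction /= 10
        exponent += 1
    while abs(fraction) < Decimal("0.1"):  # left shift: exponent decremented
        fraction *= 10
        exponent -= 1
    return fraction, exponent


print(to_fraction_form("512"))  # (Decimal('0.512'), 3)
```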
More examples of scientific notation follow.

Decimal          Decimal          Binary       Hex         Hex
Notation         Scient. No.      Notation     Notation    Scient. No.
64               .64 × 10²        1000000.     40₁₆        .4 × 16²
33               .33 × 10²        100001.      21₁₆        .21 × 16²
1/2 (.5)         .5 × 10⁰         0.1          .8₁₆        .8 × 16⁰
3/32 (.09375)    .9375 × 10⁻¹     0.00011      .18₁₆       .18 × 16⁰

1.4.4 Decimal/Binary/Hexadecimal Conversion
There are standard routines to convert from decimal notation to hexadecimal (also called hex) and
back. When converting from either decimal-to-hex or hex-to-decimal it is convenient to first convert to
binary notation and then to the final notation.
Decimal to Hex Conversion:
To convert a decimal number with both integer and fraction portion to a hex number, the integer and
fraction are separated and converted individually. The integer is converted to binary by a repeated
division technique, the fraction by a repeated multiplication technique.


To convert an integer to binary representation, the integer is divided by two. The remainder of this division (either 1 or 0) becomes the LSB of the binary representation. The quotient of this division is again divided by two, and the remainder goes to the left of the LSB, becoming the "next to LSB" bit. The new quotient is divided again, and this process continues until the quotient is zero. Refer to Example 1.
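Example 1  Convert 197₁₀ to binary

STEP 1   197 ÷ 2 = 98   remainder 1   (LSB)
STEP 2    98 ÷ 2 = 49   remainder 0
STEP 3    49 ÷ 2 = 24   remainder 1
STEP 4    24 ÷ 2 = 12   remainder 0
STEP 5    12 ÷ 2 = 6    remainder 0
STEP 6     6 ÷ 2 = 3    remainder 0
STEP 7     3 ÷ 2 = 1    remainder 1
STEP 8     1 ÷ 2 = 0    remainder 1   (MSB)

197₁₀ = 1100 0101₂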
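The repeated-division technique of Example 1 can be sketched as follows. This is a minimal illustration; the function name `int_to_binary` is hypothetical, and Python's built-in `divmod` supplies the quotient and remainder of each division by two.

```python
def int_to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by two."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)   # the quotient is divided again on the next pass
        remainders.append(remainder)  # the first remainder is the LSB
    # The last remainder is the MSB, so read the remainders in reverse order.
    return "".join(str(bit) for bit in reversed(remainders))


print(int_to_binary(197))  # 11000101
```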

A repeated multiply-by-2 converts a decimal fraction to a binary fraction. The decimal fraction is multiplied by two. If the result is 1.0 or more, a 1 is placed in the MSB of the fraction (directly to the right of the binary point); if it is less than 1.0, a 0 is placed there. Only the fraction portion of this result is again multiplied by two; if the new result is 1.0 or more, a 1 goes to the right of the MSB, and if it is less than 1.0, a 0 goes there. This continues until the fraction portion of the result is all zeros (refer to Example 2) or until enough binary fraction bits have been generated to represent the decimal fraction accurately enough (refer to Example 3). Note that finite-length decimal fractions can become repeating fractions in binary (Example 3).

Example 2  Convert 3/8 (.375) to binary

STEP 1   .375 × 2 = 0.75   → 0
STEP 2   .75  × 2 = 1.50   → 1
STEP 3   .50  × 2 = 1.00   → 1   (fraction portion is zero: STOP)

.375₁₀ = .011₂

Example 3  Convert .603₁₀ to binary

STEP 1   .603 × 2 = 1.206   → 1
STEP 2   .206 × 2 = 0.412   → 0
STEP 3   .412 × 2 = 0.824   → 0
STEP 4   .824 × 2 = 1.648   → 1
STEP 5   .648 × 2 = 1.296   → 1
STEP 6   .296 × 2 = 0.592   → 0
STEP 7   .592 × 2 = 1.184   → 1   (decide to stop)

.603₁₀ ≈ .1001101₂
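The repeated-multiplication technique of Examples 2 and 3 can be sketched as follows. This sketch assumes the input is a fraction between 0 and 1 given as a decimal string, and uses Python's standard `fractions.Fraction` so that no floating-point rounding creeps into the illustration; the function name `fraction_to_binary` and its `max_bits` cutoff (the "decide to stop" point) are hypothetical.

```python
from fractions import Fraction


def fraction_to_binary(decimal_fraction, max_bits=16):
    """Convert a fraction 0 <= x < 1 to binary by repeated multiplication by two.

    Returns (bits, exact): exact is False when the expansion was cut off at
    max_bits, as happens with repeating binary fractions such as .603.
    """
    x = Fraction(decimal_fraction)   # exact arithmetic, no float rounding
    bits = ""
    while x != 0 and len(bits) < max_bits:
        x *= 2
        if x >= 1:
            bits += "1"
            x -= 1                   # keep only the fraction portion of the product
        else:
            bits += "0"
    return "." + bits, x == 0


print(fraction_to_binary("0.375"))              # ('.011', True)
print(fraction_to_binary("0.603", max_bits=7))  # ('.1001101', False)
```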

The conversion from binary to hex is very simple. Starting at the binary point, break the binary
number into groups of 4 digits each. (Zero fill at both right and left ends to complete groups of 4.)
Then replace each group of 4 with its hex equivalent. Refer to Table 1-4, and Example 4.
Table 1-4  Binary-Hex Equivalents

Binary   Hex
0000     0
0001     1
0010     2
0011     3
0100     4
0101     5
0110     6
0111     7
1000     8
1001     9
1010     A
1011     B
1100     C
1101     D
1110     E
1111     F

Example 4  Convert 110010110.101101₂ to hex

1. Starting at the binary point, break the number into groups of four and zero-fill the left and right ends (three zeros are added at the left, two at the right):

   0001 1001 0110.1011 0100

2. Replace each four-digit group with its hex equivalent (refer to Table 1-4):

   0001 1001 0110.1011 0100
      1    9    6.   B    4

1 1001 0110.101101₂ = 196.B4₁₆
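The grouping procedure of Example 4 can be sketched as follows. This is an illustrative sketch; the helper names `bits_to_hex` and `binary_to_hex` are hypothetical, and the input is assumed to be a string of 0s and 1s with at most one binary point. Grouping 110010110.101101 this way gives 196.B4 in hex.

```python
HEX_DIGITS = "0123456789ABCDEF"


def bits_to_hex(bits):
    """Map a bit string whose length is a multiple of 4 to hex digits (Table 1-4)."""
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


def binary_to_hex(binary):
    """Group bits in fours outward from the binary point, zero-filling both ends."""
    int_part, _, frac_part = binary.partition(".")
    int_part = int_part or "0"
    int_width = -(-len(int_part) // 4) * 4         # round length up to a multiple of 4
    frac_width = -(-len(frac_part) // 4) * 4
    int_part = int_part.zfill(int_width)           # zero-fill on the left
    frac_part = frac_part.ljust(frac_width, "0")   # zero-fill on the right
    result = bits_to_hex(int_part)
    return result + "." + bits_to_hex(frac_part) if frac_part else result


print(binary_to_hex("110010110.101101"))  # 196.B4
```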

To convert from hex back to decimal, first replace each hex digit with its 4-bit binary equivalent (refer to Table 1-4). Each position in a binary number has a positional value based on which side of the binary point it is on and its distance from the binary point. The positional values are powers of two. The bit in the units column has a positional value of one; the positional value doubles with each move from right to left and halves with each move from left to right. Refer to Figure 1-3 for a summary of binary positional values, both as powers of two and as decimal values.
Integer positions (left of the binary point):
  ...  2⁷    2⁶   2⁵   2⁴   2³   2²   2¹   2⁰
       128   64   32   16   8    4    2    1

Fraction positions (right of the binary point):
  2⁻¹   2⁻²   2⁻³    2⁻⁴     2⁻⁵      2⁻⁶      ...
  1/2   1/4   1/8    1/16    1/32     1/64
  .5    .25   .125   .0625   .03125   .015625

Figure 1-3  Positional Value of Binary Number

To convert from binary notation to decimal notation, add the decimal positional value of each bit that
is a one. This sum will be the decimal equivalent of the binary number.
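The positional-value method can be sketched as follows, including the hex-to-binary first step described above. This sketch assumes inputs are plain digit strings with at most one radix point; `binary_to_decimal` and `hex_to_decimal` are hypothetical names, and `fractions.Fraction` keeps the sum exact.

```python
from fractions import Fraction


def binary_to_decimal(binary):
    """Add the positional value (a power of two) of every bit that is a one."""
    int_part, _, frac_part = binary.partition(".")
    total = Fraction(0)
    for power, bit in enumerate(reversed(int_part)):  # 2**0, 2**1, ... moving left
        if bit == "1":
            total += 2 ** power
    for power, bit in enumerate(frac_part, start=1):  # 2**-1, 2**-2, ... moving right
        if bit == "1":
            total += Fraction(1, 2 ** power)
    return total


def hex_to_decimal(hex_number):
    """Replace each hex digit with its 4-bit equivalent, then sum positional values."""
    binary = "".join(ch if ch == "." else format(int(ch, 16), "04b") for ch in hex_number)
    return binary_to_decimal(binary)


print(float(hex_to_decimal("196.B4")))  # 406.703125
```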
1.4.5 Normalization
As discussed previously, there are many ways to represent a particular floating-point number using scientific notation. The convention chosen for representing floating-point numbers in VAX and the FPA requires the radix point to be to the left of the most significant bit of the basic value. Refer to Example 5.
Example 5  Floating-Point Form

29₁₀ = 11101₂ = 11101.

Right shifts of the basic value:
  1 1101.      × 2⁰
  1110.1       × 2¹
  111.01       × 2²
  11.101       × 2³
  1.1101       × 2⁴
  .11101       × 2⁵    ← chosen form: fraction = .11101, exponent = 5
  .0111 01     × 2⁶
  .0011 101    × 2⁷

Left shifts of the basic value:
  1 1101.            × 2⁰
  11 1010.           × 2⁻¹
  111 0100.          × 2⁻²
  1110 1000.         × 2⁻³
  1 1101 0000.       × 2⁻⁴
  11 1010 0000.      × 2⁻⁵
  111 0100 0000.     × 2⁻⁶
  1110 1000 0000.    × 2⁻⁷

The process of ensuring that the first significant bit is directly to the right of the binary point is called normalization. If the number is one or larger, normalization involves right-shifting the basic value and incrementing the exponent until the MSB (a one) is directly to the right of the binary point. If the number is a fraction with leading zeros, the basic value is left-shifted and the exponent is decremented. Examples 6 and 7 show conversion of numbers to VAX normalized form.
Example 6

Convert 7510 to a normalized binary number
I.

Integer conversion
7510=10010112

Floating-point form
I 00 I01 12 = I00 10112 X 20

Normalized form
Right shift fraction 7 times
Increment exponent by 1

100 10112 x i> = .100 1011 x 21
Fraction = .100 I 011
Exp.onent = 7
Example 7 Convert 3/16 (.1875) to a normalized binary number.

1. Binary conversion
   $.1875_{10} = .0011_2$
2. Floating-point form
   $.0011_2 = .0011_2 \times 2^{0}$
3. Normalized form
   Left-shift the fraction 2 times.
   Decrement the exponent by 2.
   $.0011_2 \times 2^{0} = .11_2 \times 2^{-2}$
   Fraction = .11
   Exponent = -2
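The shifting procedure of Examples 6 and 7 can be sketched as follows. This is a minimal Python illustration (the name `normalize` is hypothetical) that works on exact rational values rather than bit fields, and it handles positive numbers only.

```python
from fractions import Fraction


def normalize(value: Fraction) -> tuple[Fraction, int]:
    """Return (fraction, exponent) with 1/2 <= fraction < 1 and value == fraction * 2**exponent."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    fraction, exponent = Fraction(value), 0
    # Number is one or larger: right-shift the basic value and increment the exponent.
    while fraction >= 1:
        fraction /= 2
        exponent += 1
    # Fraction with leading zeros: left-shift and decrement the exponent.
    while fraction < Fraction(1, 2):
        fraction *= 2
        exponent -= 1
    return fraction, exponent


print(normalize(Fraction(75)))     # (Fraction(75, 128), 7)  i.e. .1001011 x 2**7
print(normalize(Fraction(3, 16)))  # (Fraction(3, 4), -2)    i.e. .11 x 2**-2
```

Applied to 29, the same function returns (29/32, 5), which is the chosen form .11101 × 2^5 of Example 5.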

1.4.6 VAX Floating-Point Notation
Two conventions are used in the FPA to conserve memory space without losing accuracy and to aid in
hardware manipulation. The first convention is called the hidden bit. All numbers transferred between
the CPU and FPA are normalized floating-point numbers. This means the first significant bit (always a
1) is always directly to the right of the binary point. To conserve memory space and data lines, the first
significant bit is not stored or transmitted to the FPA. For example, the fraction part of the normalized
binary number $.11000\ldots \times 2^{-2}$ will be stored and transmitted to the FPA as 1000 .... The normalized
fraction of 1/2 ($.1000\ldots \times 2^{0}$) will be stored and transmitted as 000 .... In both cases, the first 1 (the
hidden bit) is added by hardware in the FPA. When the FPA transfers a normalized answer back
to the CPU, the hidden bit is not sent.
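A minimal sketch of the hidden-bit convention, assuming the fraction is held as an integer of `width` bits whose most significant bit is the first significant bit. The helper names are hypothetical; in the FPA the equivalent insertion is done in hardware.

```python
def strip_hidden_bit(fraction: int, width: int) -> int:
    """Drop the leading 1 of a normalized `width`-bit fraction before it is stored or sent."""
    if fraction >> (width - 1) != 1:
        raise ValueError("fraction is not normalized (MSB must be 1)")
    return fraction & ((1 << (width - 1)) - 1)


def restore_hidden_bit(stored: int, width: int) -> int:
    """Re-insert the hidden 1 directly to the left of the `width - 1` stored bits."""
    return stored | (1 << (width - 1))


# Five-bit fractions from the text: .11000 is sent as 1000, .10000 (1/2) as 0000.
print(bin(strip_hidden_bit(0b11000, 5)))  # 0b1000
print(bin(strip_hidden_bit(0b10000, 5)))  # 0b0
```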


The 8-bit exponent portion of a floating-point number is stored using excess $80_{16}$ notation. This notation simplifies the hardware that manipulates the exponent during floating-point arithmetic operations.
Excess 80 exponent notation is obtained by adding $10000000_2$ ($200_8$, $80_{16}$, or $128_{10}$) to the exponent in 2's complement notation.
Refer to Paragraph 1.5 for a further discussion of excess 80 notation.
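The following Python sketch (hypothetical helper names) shows excess 80 encoding and checks that adding 10000000 (binary) to the 8-bit 2's complement form, discarding any carry, gives the same field. It only checks that the value fits in 8 bits; it does not model the zero and reserved operand codes that use an exponent field of 0 (Paragraph 2.1.4).

```python
BIAS = 0x80  # excess 80 (hexadecimal) = 128 decimal


def to_excess80(exponent: int) -> int:
    """Encode a true exponent as the 8-bit excess 80 field by adding the bias."""
    field = exponent + BIAS
    if not 0 <= field <= 0xFF:
        raise OverflowError("exponent does not fit in an 8-bit excess 80 field")
    return field


def from_excess80(field: int) -> int:
    """Recover the true exponent from an excess 80 field."""
    return field - BIAS


def via_twos_complement(exponent: int) -> int:
    """Same encoding: 8-bit 2's complement plus 10000000 (binary), keeping only 8 bits."""
    return ((exponent & 0xFF) + 0b10000000) & 0xFF


print(hex(to_excess80(5)), hex(to_excess80(-2)))  # 0x85 0x7e
```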
1.4.7 Floating-Point Addition and Subtraction
In order to perform floating-point addition or subtraction, the exponents of the two floating-point
numbers involved must be aligned or equal. If they are not aligned, the fraction with the smaller
exponent is shifted right until they are. Each shift to the right is accompanied by an increment of the
associated exponent. When the exponents are aligned, the fractions can then be added or subtracted.
The exponent value indicates the number of places the binary point is to be moved to obtain the integer
representation of the number.
In Example 8, the number $7_{10}$ is added to the number $40_{10}$ using floating-point representation. Note
that the exponents are first aligned and then the fractions are added; the exponent value dictates the
final location of the binary point.
Example 8 Floating-Point Addition

$$
\begin{aligned}
&\phantom{+}0.1010\,0000\,0000\,000 \times 2^{6} = 28_{16} = 40_{10}\\
&+0.1110\,0000\,0000\,000 \times 2^{3} = 7_{16} = 7_{10}
\end{aligned}
$$

1. To align exponents, shift the fraction with the smaller exponent three places to the right and
   increment its exponent by 3; then add the two fractions.

$$
\begin{aligned}
&\phantom{+}0.1010\,0000\,0000\,000 \times 2^{6} = 28_{16} = 40_{10}\\
&+0.0001\,1100\,0000\,000 \times 2^{6} = 7_{16} = 7_{10}\\
&\phantom{+}0.1011\,1100\,0000\,000 \times 2^{6} = 2F_{16} = 47_{10}
\end{aligned}
$$

2. To find the integer value of the answer, move the binary point six places to the right.

$$010\,1111.0000\,0000\,0 = 2F_{16} = 47_{10}$$
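Example 8 can be reproduced with a small Python sketch. It assumes positive operands and a 15-bit integer fraction as in the example, and its names are hypothetical. Bits shifted off the right end during alignment are simply discarded here, unlike the FPA, which keeps guard bits (Paragraph 2.1.5).

```python
from fractions import Fraction

WIDTH = 15  # fraction bits used in Example 8: 0.xxxx xxxx xxxx xxx


def fp_value(fp: tuple[int, int]) -> Fraction:
    """Value of (fraction, exponent), where fraction is a WIDTH-bit integer: f / 2**WIDTH * 2**e."""
    fraction, exponent = fp
    return Fraction(fraction, 2 ** WIDTH) * Fraction(2) ** exponent


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two positive (fraction, exponent) numbers by aligning exponents first."""
    (fa, ea), (fb, eb) = a, b
    if ea < eb:  # make `a` the operand with the larger exponent
        (fa, ea), (fb, eb) = (fb, eb), (fa, ea)
    fb >>= ea - eb  # right-shift the smaller fraction; each shift increments its exponent
    total = fa + fb
    if total >> WIDTH:  # carry out of the fraction: shift right once and bump the exponent
        total >>= 1
        ea += 1
    return total, ea


forty = (0b101000000000000, 6)  # 0.1010 0000 0000 000 x 2**6
seven = (0b111000000000000, 3)  # 0.1110 0000 0000 000 x 2**3
result = fp_add(forty, seven)
print(bin(result[0]), result[1], fp_value(result))  # 0b101111000000000 6 47
```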

1.4.8 Floating-Point Multiplication and Division
In floating-point multiplication, the fractions are multiplied and the exponents are added. For floating-point division, the fractions are divided and the exponents are subtracted. There is no requirement
to align the binary point in floating-point multiplication or division. Example 9 shows floating-point multiplication; Example 10 shows division.


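Example 9: Multiply $7_{10}$ by $40_{10}$.

1. Multiply the fractions and add the exponents.

$$
\begin{aligned}
&\phantom{\times\ }0.1110000 \times 2^{3} = 7_{16} = 7_{10}\\
&\times\ 0.1010000 \times 2^{6} = 28_{16} = 40_{10}
\end{aligned}
$$

Partial products (multiplier bits 1, 0, 1):

$$
\begin{aligned}
&\phantom{+\ }.0111000000\\
&\phantom{+\ }.0000000000\\
&+\ .0001110000\\
&=\ .1000110000 \times 2^{9} \quad \text{(result already in normalized form)}
\end{aligned}
$$

2. Move the binary point nine places to the right.

$$100011000.0 = 118_{16} = 280_{10}$$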
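A sketch of the multiplication rule in Python, using exact fractions (hypothetical names, positive normalized operands assumed). Because each fraction is at least 1/2 and less than 1, their product is at least 1/4, so at most one normalizing left shift is needed; Example 9 needs none.

```python
from fractions import Fraction


def fp_mul(a: tuple[Fraction, int], b: tuple[Fraction, int]) -> tuple[Fraction, int]:
    """Multiply (fraction, exponent) pairs: multiply the fractions, add the exponents, normalize."""
    (fa, ea), (fb, eb) = a, b
    fraction, exponent = fa * fb, ea + eb
    # Two normalized fractions (1/2 <= f < 1) give a product in [1/4, 1),
    # so at most one left shift is needed to renormalize.
    if fraction < Fraction(1, 2):
        fraction *= 2
        exponent -= 1
    return fraction, exponent


seven = (Fraction(7, 8), 3)  # 0.111 x 2**3
forty = (Fraction(5, 8), 6)  # 0.101 x 2**6
product = fp_mul(seven, forty)
print(product, product[0] * 2 ** product[1])  # (Fraction(35, 64), 9) 280
```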

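Example 10: Divide $15_{10}$ by $5_{10}$.

1. Divide the fractions and subtract the exponents.

$$15_{10} = .1111000 \times 2^{4}, \qquad 5_{10} = .1010000 \times 2^{3}$$

              1.100000
    1010000 )1111000.000000
             1010000
              101000 0
              101000 0
                    0

   Exponent: 4 - 3 = 1
   Result: $1.100000 \times 2^{1}$
   Normalized result: $.1100000 \times 2^{2}$ (normalized fraction .1100000, normalized exponent 2)

2. Move the binary point two places to the right.

$$11.00000 = 3_{16} = 3_{10}$$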
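The division rule can be sketched the same way (hypothetical names, exact fractions, positive normalized operands assumed). The quotient of two normalized fractions lies between 1/2 and 2, so at most one normalizing right shift is needed, as in Example 10.

```python
from fractions import Fraction


def fp_div(a: tuple[Fraction, int], b: tuple[Fraction, int]) -> tuple[Fraction, int]:
    """Divide (fraction, exponent) pairs: divide the fractions, subtract the exponents, normalize."""
    (fa, ea), (fb, eb) = a, b
    if fb == 0:
        raise ZeroDivisionError("divide by zero")
    fraction, exponent = fa / fb, ea - eb
    # Two normalized fractions give a quotient between 1/2 and 2,
    # so at most one right shift is needed to renormalize.
    if fraction >= 1:
        fraction /= 2
        exponent += 1
    return fraction, exponent


fifteen = (Fraction(15, 16), 4)  # .1111 x 2**4
five = (Fraction(5, 8), 3)       # .101 x 2**3
print(fp_div(fifteen, five))     # (Fraction(3, 4), 2)  i.e. .11 x 2**2 = 3
```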

1.5 EXCESS 80 NOTATION
The VAX and, consequently, the FPA use excess 80 notation to store and handle the exponent portion
of floating-point numbers. Excess 80 notation is the 2's complement form of the exponent plus $128_{10}$ ($80_{16}$).


It is convenient to handle the exponent portion of the floating-point number in 2's complement notation, which allows a wide range of both positive and negative exponents to be represented. However, in
2's complement notation a carry out of the most significant bit must occur to go from the least negative number (-1, all ones) to zero. To
avoid this, the bias of $128_{10}$ is added to the 2's complement number.
Historically, minicomputers have been discussed and explained using octal notation. In octal, the bias
of $128_{10}$ is $200_8$. In previous manuals this exponent notation has been discussed using octal form; as a
result, it is called excess $200_8$ or excess 200. However, the VAX is discussed using hexadecimal notation. Unfortunately, when discussing the excess 80 bias in VAX documentation, it has been called $80_{16}$,
$128_{10}$, $200_8$, and $10000000_2$ (sometimes the base is indicated, sometimes it isn't). When studying the
FPA print sets, technical manuals, and microcode listings, be aware of this variation in terminology. In
this manual, hex notation is used and the exponent bias is called excess 80.
When multiply and divide operations are performed on floating-point numbers with excess 80 exponent notation, the resulting exponent must be adjusted by the bias to return the result to excess 80
notation. When a multiplication is performed, the exponents are added and $80_{16}$ must be subtracted from the
result to return it to excess 80 notation. To understand why 80 must be subtracted from the exponent
calculation during multiplication, consider the following.
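$$
\begin{aligned}
&\phantom{+}\ (\text{Exponent A} + 80) && \text{(excess 80 notation)}\\
&+\ (\text{Exponent B} + 80) && \text{(excess 80 notation)}\\
&=\ \text{Exponent A} + \text{Exponent B} + 100
\end{aligned}
$$

Both exponent A and exponent B are biased by 80, yielding a bias of 100 (all values hexadecimal). However, only a bias of 80 is
desired in excess 80 notation.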
Multiplication Example: $2 \times 3 = 6$

        Fraction    Exponent (excess 80, hex)
  2 =   0.100       82
  3 =   0.110       82

Fraction calculation: $0.100 \times 0.110 = 0.011000$
Exponent calculation: $82 + 82 = 104$, then $104 - 80 = 84$

This gives 6 = 0.011000 with exponent 84.

Normalize the fraction by left-shifting one place and decreasing the exponent by 1.

        Fraction    Exponent
  6 =   0.11000     83
When a division is performed, the exponents are subtracted and $80_{16}$ must be added to the result to return
it to excess 80 notation. To understand why 80 must be added to the exponent calculation during
division, consider the following:

$$(\text{Exponent A} + 80) - (\text{Exponent B} + 80) = \text{Exponent A} - \text{Exponent B} + 80 - 80 = \text{Exponent A} - \text{Exponent B} + 0$$

However, since the result is to be in excess 80 notation, $80_{16}$ must be added to the exponent, yielding
Exponent A - Exponent B + 80.
Division Example: $16 / 4 = 4$

        Fraction    Exponent (excess 80, hex)
  16 =  .10000      85
   4 =  .10000      83

Exponent calculation: $85 - 83 = 2$, then $2 + 80 = 82$
Fraction calculation: $.10000 / .10000 = 1.000$

Normalize the fraction by right-shifting one place and incrementing the exponent.

        Fraction    Exponent
   4 =  .10000      83
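The bias corrections for both operations take one line each. The Python sketch below (hypothetical names) works on excess 80 exponent fields held as integers and does not check the result range; the normalization shift that follows is applied separately, as in the two examples.

```python
BIAS = 0x80  # excess 80 (hexadecimal)


def mul_exponent(ea: int, eb: int) -> int:
    """Biased exponents add during multiplication; the sum carries the bias twice, so remove one."""
    return ea + eb - BIAS


def div_exponent(ea: int, eb: int) -> int:
    """Biased exponents subtract during division; the biases cancel, so add one back."""
    return ea - eb + BIAS


# Multiplication example: 2 x 3, both exponents 82; normalizing then subtracts 1.
print(hex(mul_exponent(0x82, 0x82)))  # 0x84
# Division example: 16 / 4, exponents 85 and 83; normalizing then adds 1.
print(hex(div_exponent(0x85, 0x83)))  # 0x82
```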


CHAPTER 2
FUNCTIONAL DESCRIPTION

This chapter explains the operation of the FPA. The chapter can be divided into four areas: introduction, algorithms, hardware operation, and microcode. The introduction (Paragraph 2.1) discusses the various types of data formats that may be handled by the FPA. The algorithms section (Paragraph
2.2) lists the various instructions the FPA can execute and explains the FPA operations required to perform
each one; it discusses FPA operation based on instruction flow. Hardware operation (Paragraph 2.3) breaks the FPA into hardware blocks and discusses the operation of each. Both
the algorithm section and the hardware operation section should be read to get a thorough understanding of FPA operation; they discuss the same equipment from different viewpoints. Microcode (Paragraphs 2.4 through 2.6) summarizes both the FPA microcode and the FPA-specific
microcode in the CPU. This discussion focuses on the generation and monitoring of the various control signals passed between the units.
2.1 DATA FORMATS
The FPA handles single (float) and double precision floating-point data and signed integer longwords.
It receives normalized, packed data from the CPU and returns normalized, packed results to the CPU
over 32-bit wide buses. Within the FPA, intermediate data is transmitted over two 34-bit wide buses.
The data formats used by the FPA are compatible with these bus structures as well as the input and
output formats of the various data manipulation units within the FPA.
2.1.1 Floating-Point Numbers
Floating-point numbers consist of a sign bit, exponent bits, and fraction bits. A single precision floating-point number is stored in CPU memory as 4 contiguous bytes starting on an arbitrary byte boundary.
Bits are labeled from the right, 0 through 31. The number is specified by its address A, the address of
the byte containing bit 0 (Figure 2-1). The range of a single precision floating-point number is approximately $.29 \times 10^{-38}$ through $1.7 \times 10^{38}$. The precision is typically 7 decimal digits.
A double precision floating-point number is stored as 8 contiguous bytes. Bit labeling and addressing
are similar to those of a single precision floating-point number. A double precision number has a range similar to
that of single precision, but its precision is about 16 decimal digits (Figure 2-1).
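As an illustration, the single precision layout can be decoded in Python. This sketch assumes the standard VAX single precision (F_floating) bit layout that Figure 2-1 depicts: sign in bit 15, excess 80 exponent in bits 14:7, high-order fraction in bits 6:0, and low-order fraction in bits 31:16. The function name is hypothetical, and it returns an exact fraction rather than a host floating-point value.

```python
from fractions import Fraction


def decode_f_floating(longword: int) -> Fraction:
    """Decode a 32-bit single precision longword as it is stored in VAX memory."""
    sign = (longword >> 15) & 1
    exponent = (longword >> 7) & 0xFF        # excess 80 exponent field
    hi_fraction = longword & 0x7F            # high-order fraction bits <6:0>
    lo_fraction = (longword >> 16) & 0xFFFF  # low-order fraction bits <31:16>
    if exponent == 0:
        if sign:
            raise ValueError("reserved operand")
        return Fraction(0)                   # zero code; fraction bits do not matter
    # Put the hidden bit back in front of the 23 stored fraction bits: .1xxx...x
    fraction = (1 << 23) | (hi_fraction << 16) | lo_fraction
    magnitude = Fraction(fraction, 1 << 24) * Fraction(2) ** (exponent - 0x80)
    return -magnitude if sign else magnitude


print(decode_f_floating(0x00004080))  # 1
print(decode_f_floating(0x0000C0C0))  # -3/2
```

With an all-ones fraction and exponent field FF (hex), the value is just under $2^{127} \approx 1.7 \times 10^{38}$; with exponent field 1 and a zero stored fraction it is $2^{-128} \approx .29 \times 10^{-38}$, matching the range quoted above.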

Figure 2-1 Floating-Point Data Formats
a. Single Precision; b. Double Precision. Each panel shows the number as stored in VAX memory, transferred to the FPA, and received by the FPA (sign bit, exponent bits in excess 80 notation, and fraction bits); as transferred on FP bus A and FP bus B (unnormalized intermediate results, with the overflow bit on line 33 and the hidden bit on line 32); as used in the FPA (unpacked, unnormalized results); ready for return to the CPU (packed, normalized); and as returned to the CPU. A double precision number is transferred in two 32-bit transfers (bits 0-31 first, bits 32-63 second). Note 1: A normalized number has a 0 (zero) overflow bit and a 1 hidden bit.

Floating-point numbers are transmitted to the FPA as packed, normalized numbers without a hidden
or overflow bit. A single precision (float) number has 24 fraction bits, of which the 23 below the hidden
bit are transmitted; a double precision number has 56 fraction bits, of which 55 are transmitted. Hardware
in the FPA inserts and handles both the hidden and overflow bits. The number is split apart and used in
various data manipulation units in the FPA.
Although all operations begin with normalized operands, the intermediate results produced by the
FPA data manipulation units can vary widely. Subtraction of nearly equal numbers can produce a
number very close to zero. Addition and division can produce numbers close to 2. As a result, intermediate results are transferred between data manipulation units as unnormalized numbers with both
hidden and overflow bits. After the result is normalized, it is ready to return to the CPU. The
result is transmitted as a packed, normalized binary number without hidden or overflow bits.
POLY uses specialized floating-point notation for intermediate results. In POLY, 7 additional bits are
used for fraction addition. POLY execution consists of multiply, add, multiply, etc. To maintain
maximum accuracy while functioning within the limitations of the FPA hardware, 7 additional LSBs
are transferred from the fraction multiply (FMH + FML) hardware to the fraction add hardware
(FAD). The 7 additional bits come from LSH <11:5> along FP bus A <14:08> into AR <06:00>
(also called ARX). The FPA performs the add on the extended precision number, then transfers the
addition result to the normalizer logic (FNM) where it is rounded, normalized, and held for the next
part of the POLY instruction.
The EMOD instruction causes a 32 X 24 (64 X 56 for double) bit fraction multiplication to be performed in the FMH and FML. The extra 8 bits in the multiplicand are transferred over the ID bus to
FP bus B line <07:00> to MCINT (also called MCX). MCINT <07:00> drives MCAND bus
<07:00> for the fraction multiply. MPLIER is handled in the usual fashion. The result of the extended
precision multiply is transferred to the CPU in one 32-bit transfer (F) or two 32-bit transfers (D).
2.1.2 Integer Numbers
The FPA handles a single integer format instruction, MULL (multiply longword). A longword is
stored in CPU memory as 4 contiguous bytes starting on an arbitrary byte boundary. The FPA receives two 32-bit signed integers and multiplies them as unsigned integers to form a 64-bit product. The
product, a 64-bit number, is returned to the CPU in two 32-bit transfers (low half first) for further
processing. Refer to Figure 2-2 for a summary of the integer format.
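A sketch of the MULL data path in Python (hypothetical names): the two longwords are multiplied as unsigned 32-bit values, and the 64-bit product is split into two 32-bit transfers, low half first. The manual does not describe here how the signed result is obtained from the unsigned product; the second helper shows one standard correction, included only as an assumption to illustrate that the low half needs no adjustment.

```python
MASK32 = 0xFFFFFFFF


def mull_transfers(a: int, b: int) -> tuple[int, int]:
    """Multiply two longword bit patterns as unsigned integers; return the 64-bit
    product as two 32-bit transfers, low half first."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product >> 32


def signed_high_half(a: int, b: int, high: int) -> int:
    """Hypothetical helper: one standard way to turn the unsigned high half into the
    signed (2's complement) high half. The low half needs no correction."""
    a, b = a & MASK32, b & MASK32
    if a >> 31:  # a is negative as a signed longword
        high -= b
    if b >> 31:  # b is negative as a signed longword
        high -= a
    return high & MASK32


low, high = mull_transfers(-3 & MASK32, 5)
print(hex(low), hex(high), hex(signed_high_half(-3 & MASK32, 5, high)))
# 0xfffffff1 0x4 0xffffffff  (that is, -15 as a 64-bit 2's complement number)
```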
2.1.3 Literals
The FPA handles float and double precision literal data. It receives the data from the CPU IB. Float
literal data is transferred from the IB to the FPA's Literal Register (LR) using the ID bus. The FPA
then loads the LR data into FPA internal registers and begins processing. The first half of double
precision literal data is handled similarly. The second half comes from the CPU D-register via the ID
bus and is loaded directly from the ID bus into the FPA internal registers.

Figure 2-2 Integer (MULL) Format
Panels show the longword as stored in VAX memory, transferred to the FPA, and received by the FPA (a 2's complement signed number with the sign in bit 31); as transferred on the FPA buses (treated as an unsigned, positive number; bits 32 and 33 of the FP bus are not used); the result stored in the FPA (AALU, SALU, and LSH REG); and the result sent to the CPU via FP bus A to the DFMX bus in a 1st and a 2nd transfer.

The FPA handles short literals. Short literals contain only six data bits and are part of the instruction.
The CPU formats the six data bits within the 32-bit data longword based on instruction type (floating-point or integer instruction). If it is an integer instruction (the FPA handles only MULL), the six data
bits are zero-extended (26 zeros are added). Any integer between 0 and $63_{10}$ can be written using a
short literal. If it is a floating-point instruction, the short literal is assumed to contain three exponent
bits and three fraction bits. The IB packs the data into standard FP format. This includes excess 80
notation for the exponent, a positive sign bit and a normalized fraction with a one hidden bit that is
not stored. Refer to Figure 2-3 for FPA short literal format, and Table 2-1 for data that can be
transferred using floating-point short literal form. Notice only positive numbers can be transferred.
If a double precision short literal is specified, the FPA accepts the first half and manufactures zeros to
fill the second half.
Figure 2-3 Short Literal Format
A. Short literal data as stored in the instruction stream: exponent in bits <5:3>, fraction in bits <2:0>. B. Short literal data as formatted by the IB and transferred to the FPA for a floating-point operation: sign bit <15> clear, bit <14> set, bits <13:10> zero, the six data bits in bits <9:4>, and zeros in the remaining bits.

Table 2-1 Floating Literals

                                 Fraction
  Exponent     0      1       2       3       4       5       6       7
     0        1/2    9/16    5/8    11/16    3/4    13/16    7/8    15/16
     1         1    1-1/8   1-1/4   1-3/8   1-1/2   1-5/8   1-3/4   1-7/8
     2         2    2-1/4   2-1/2   2-3/4    3     3-1/4   3-1/2   3-3/4
     3         4    4-1/2    5      5-1/2    6     6-1/2    7      7-1/2
     4         8     9      10      11      12      13      14      15
     5        16    18      20      22      24      26      28      30
     6        32    36      40      44      48      52      56      60
     7        64    72      80      88      96     104     112     120
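Table 2-1 can be regenerated from the short literal format. In this Python sketch (hypothetical names), the literal's three fraction bits follow the hidden 1, so the value is $(8 + f)/16 \times 2^{e}$ for exponent bits $e$ and fraction bits $f$. The packed longword assumes the standard VAX single precision layout (sign bit 15, exponent field bits 14:7), which places the six literal bits in bits 9:4 as Figure 2-3 shows.

```python
from fractions import Fraction


def _check(literal: int) -> None:
    if not 0 <= literal <= 0o77:
        raise ValueError("a short literal has only six bits")


def short_literal_value(literal: int) -> Fraction:
    """Value of a floating short literal: exponent bits <5:3>, fraction bits <2:0>.
    The fraction is .1fff (hidden 1 followed by the three literal bits)."""
    _check(literal)
    exponent, fraction_bits = literal >> 3, literal & 0o7
    return Fraction(8 + fraction_bits, 16) * 2 ** exponent


def short_literal_longword(literal: int) -> int:
    """Packed longword: positive sign, exponent field 80 (hex) + exponent,
    and the three fraction bits directly below the exponent field."""
    _check(literal)
    exponent, fraction_bits = literal >> 3, literal & 0o7
    return ((0x80 + exponent) << 7) | (fraction_bits << 4)


# Rebuild Table 2-1: one row per exponent, one column per fraction value.
for exponent in range(8):
    print(exponent, [str(short_literal_value(exponent * 8 + f)) for f in range(8)])
```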

The FPA also handles long literals (32 or 64 data bits). Thirty-two bits, either a complete single
precision transfer or the first half of a double precision, are transferred from the IB to the FPA LR.
The second half of the double precision number is taken directly from the ID bus. Float and double
precision floating-point data can be transferred using long literal format. The FPA also receives 32-bit
integer data using the long literal format. (The FPA does not handle any 64-bit integer operands.)
2.1.4 Zero and Reserved Operand Codes
The FPA checks all data received for zeros and reserved operands during the fraction processing. Both
the zero code and the reserved operand code transmit special information. As discussed in Paragraph 1.4, the FPA assumes all floating-point numbers to be normalized numbers (between 1/2 and 1)
with a hidden bit that is not stored. The hidden bit is normally inserted by data manipulation hardware. A zero cannot be represented as a normalized number, and the hardware that inserts the hidden
bit only increases the problem of representing and using zero. As a result, zero is represented by a code
with zeros in the exponent bits (no excess 80 exponent) and a clear sign bit. The fraction bits do not
matter. Whenever this combination of bits is sensed, the FPA accesses special microcode that simulates the special properties of addition, subtraction, multiplication, and division with zero. Refer to
Table 2-2 for the result of an operation with zero, and Figure 2-4 for the zero code.
Table 2-2 Zero Operand Microcode

  Operation   Operand(s)                                Operation Result
  Add         0 + X, X + 0                              X operand returned
              0 + 0                                     Zero returned*
  Subtract    0 - X                                     -X returned
              X - 0                                     X operand returned
              0 - 0                                     Zero returned
  Multiply    0 × 0, X × 0, 0 × X                       Zero returned*
  Divide      0 ÷ X (dividend is zero)                  Zero returned*
              X ÷ 0 (divisor is zero; divide by zero)   Error condition†

  * Zero code is returned: 0 in sign and exponent.
  † The FPA informs the CPU that division by zero was attempted by asserting FPA error and the PSL V bit and
    not asserting FP SYNC.
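The zero-operand rules of Table 2-2 can be expressed as a small dispatch function. This Python sketch (hypothetical names, exact fractions standing in for the zero code and for X) assumes at least one operand is zero. The table does not say which rule applies to 0 ÷ 0; the sketch treats any zero divisor as the divide-by-zero error, which is an assumption.

```python
from fractions import Fraction


def operate_with_zero(op: str, x: Fraction, y: Fraction) -> Fraction:
    """Result of x (op) y when at least one operand is the zero code, following Table 2-2."""
    if x != 0 and y != 0:
        raise ValueError("Table 2-2 applies only when an operand is zero")
    if op == "add":  # 0+X and X+0 return X; 0+0 returns zero
        return y if x == 0 else x
    if op == "sub":  # 0-X returns -X; X-0 returns X; 0-0 returns zero
        return -y if x == 0 else x
    if op == "mul":  # any product with zero returns zero
        return Fraction(0)
    if op == "div":
        if y == 0:  # assumption: any zero divisor (even 0/0) is treated as divide by zero
            raise ZeroDivisionError("divide by zero: FPA error, PSL V bit set")
        return Fraction(0)  # zero dividend returns zero
    raise ValueError(f"unknown operation {op!r}")
```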


Figure 2-4 Zero and Reserved Operand Code
Zero code: sign bit <15> = 0, exponent bits <14:7> = 0, all fraction bits don't care. Reserved operand code: sign bit <15> = 1, exponent bits <14:7> = 0, all fraction bits don't care.

The code for a reserved operand is zeros (cleared) in the exponent bits and a one (set) in the sign bit. A one
in the sign bit normally indicates a negative number, so this code is sometimes called minus zero. A reserved
operand indicates invalid data: data was accessed from a location into which no data had been loaded,
or the data is the result of a previous exception. Refer to Figure 2-4 for the reserved operand code.
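Both codes can be recognized from the sign bit and exponent field alone. A minimal Python sketch (hypothetical name), assuming the single precision bit positions shown in Figure 2-4 (sign bit 15, exponent bits 14:7):

```python
def classify_f_operand(longword: int) -> str:
    """Classify a single precision operand by its sign bit <15> and exponent bits <14:7>."""
    sign = (longword >> 15) & 1
    exponent = (longword >> 7) & 0xFF
    if exponent != 0:
        return "normal"
    # Exponent field is zero: the fraction bits are "don't care" in both codes.
    return "reserved operand" if sign else "zero"


print(classify_f_operand(0x00000000))  # zero
print(classify_f_operand(0x00008000))  # reserved operand
print(classify_f_operand(0x00004080))  # normal
```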
2.1.5 Hidden, Overflow and Guard Bits
The FPA uses extra fraction data bits during fraction manipulation to completely represent the fraction data, to handle result overflow, and to ensure accuracy of fraction result. Refer to Figure 2-5 for
location of hidden, overflow, and guard bits.
Figure 2-5 Hidden, Overflow, and Guard Bits
Data from the CPU (sign, exponent, and fraction bits) is placed on the FP bus lines; the FPA adds the overflow bit (line 33) and the hidden bit (line 32) above the fraction, and the guard bits are transferred on the FP bus lines that normally carry the exponent.

As discussed previously, the CPU stores floating-point numbers in a packed, normalized form with the
MSB of the fraction (called the hidden bit) not stored (since it is always a 1). The FPA receives the
floating-point numbers in this form. To facilitate fraction calculation, logic on FNM adds the hidden
bit to all CPU fraction data as it is transported over the FP buses. The hidden bit is transmitted on FP
bus (32). This means that all fraction data received by FPA fraction manipulation units have correct
hidden bits.

The FPA also transmits an overflow bit between fraction manipulation units using FP bus (33). The
overflow bit handles unnormalized intermediate fraction results. The combination (addition, subtraction, or division) of two normalized fractions can create a result greater than 1. The overflow bit
enables the FPA to transmit this unnormalized result from the fraction computation units to the
fraction normalizer logic (FNM).
To ensure accuracy of fractional results, the FPA data manipulation units add seven zeros, called guard
bits, to the low-order end of the fraction data they receive. This means a float fraction is 32 bits wide; a
double, 64 bits wide. The POLY instruction loads extra data bits rather than zeros at the low-order end
of each coefficient fraction. The instruction also transfers additional low-order data bits from the
fraction multiply logic to the fraction add logic. These guard bits are dropped each time the POLY
accumulation is normalized and rounded, but they do ensure that the final answer is accurate. Without
the guard bits, right-shifting an FP fraction to align radix points for addition and subtraction, or
to normalize the result, would lose the least significant bits off the right end of the shifted fraction. In
some cases this loss would cause the last bit of the normalized result to be wrong. The guard bits
prevent this. Guard bits are transmitted between FP data manipulation units using FP bus A <14:08>.
These lines normally transmit exponent data. This arrangement allows the FPA to maximize accuracy
without additional hardware overhead.
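The effect of guard bits can be demonstrated with a small subtraction. This Python sketch (hypothetical names) uses 4-bit fractions so the numbers stay readable, appends `guard` zero bits before alignment, normalizes, and then rounds to nearest by adding one half at the first guard bit. That rounding rule is an assumption for illustration, not a description of the FNM rounding logic. The FPA itself uses 7 guard bits on 24- or 56-bit fractions.

```python
from fractions import Fraction


def subtract_aligned(fa: int, ea: int, fb: int, eb: int, width: int, guard: int) -> tuple[int, int]:
    """Compute a - b for positive, normalized `width`-bit fractions with a > b and ea >= eb.

    `guard` zero bits are appended to each fraction before alignment; the difference is
    normalized, rounded to nearest at the first guard bit, and the guard bits are dropped.
    """
    fa, fb = fa << guard, fb << guard
    fb >>= ea - eb  # align radix points; bits past the guard bits are lost
    diff, exponent = fa - fb, ea
    top = 1 << (width + guard - 1)  # position of the most significant fraction bit
    while diff and diff < top:  # normalize: left-shift, pulling guard bits back in
        diff <<= 1
        exponent -= 1
    if guard:
        diff = (diff + (1 << (guard - 1))) >> guard  # round, then drop the guard bits
    if diff >> width:  # rounding carried out of the fraction
        diff >>= 1
        exponent += 1
    return diff, exponent


def value(fraction: int, exponent: int, width: int) -> Fraction:
    return Fraction(fraction, 1 << width) * Fraction(2) ** exponent


# .1000 x 2**0 minus .1111 x 2**-4 = 113/256 exactly.
for g in (0, 3):
    f, e = subtract_aligned(0b1000, 0, 0b1111, -4, width=4, guard=g)
    print(g, value(f, e, 4))  # g=0 prints 1/2 (wrong), g=3 prints 7/16
```

Without guard bits the subtrahend is shifted out entirely and the result is 1/2; with three guard bits the borrow is preserved and the result rounds to 7/16, the closest 4-bit value to the exact 113/256.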
2.1.6 Overflow, Underflow, Zero, and Reserved Operands
The FPA monitors all operands and results for exceptional conditions. When the FPA senses one or
more of these conditions, it informs the CPU via various bits and combinations of bits. Either one or
both units begin special operations designed to minimize the effect of the condition. In some cases, the
FPA's current operation is stopped and the FPA is returned to the IRD state, where all logic and registers
are cleared in anticipation of a new FP instruction. The following paragraphs discuss these various
unusual conditions. Table 2-3 summarizes the FPA and CPU operations caused by the unusual conditions.


Table 2-3 Exception Conditions

ADD, SUBT, MULT, EMOD
  Reserved operand: FPSYNC (ACC0) clear; ERRSYNC (ACC1) set. CPU traps FPA to IRD.
  Zero operand: Microcode simulates arithmetic operation with zero (Table 2-2).

DIVIDE
  Reserved operand: FPSYNC (ACC0) clear; ERRSYNC (ACC1) set; PSL V bit clear.
  Zero dividend: Microcode returns zero as result.
  Zero divisor: Divide-by-zero error. FPSYNC (ACC0) clear; ERRSYNC (ACC1) set; PSL V bit set.
  The CPU differentiates between ZERO DIVISOR and RESERVED OPERAND by examining the PSL V
  bit. In both cases, the CPU traps the FPA to IRD.

POLY*
  Reserved operand: FPSYNC (ACC0) set; ERRSYNC (ACC1) set; MINUS ZERO ERROR bit set in the
  STATUS REGISTER. The CPU checks the argument for RESERVED OPERAND; the FPA checks each
  coefficient for RESERVED OPERAND.
  Zero operand: POLY microcode simulates POLY operations with zero (Table 2-2 and
  Paragraph 2.2.6).

MULL
  No checking of MULL operands or results is performed by FPA software or hardware. Any
  combination of bits can be interpreted as an acceptable integer.

Result (all operations handle the occurrence of zero, underflow, and overflow results similarly*)
  ZERO: The zero code and FPSYNC are sent. PSL Z bit is set.
  UNDERFLOW: Zero code, FPSYNC, and ERRSYNC are sent. PSL Z is set. If PSL U (underflow) is set,
  underflow causes a trap; otherwise, operations continue.
  OVERFLOW: Reserved operand code, FPSYNC, and ERRSYNC are sent. PSL V is set. CPU traps FPA to IRD.

* When POLY flows note a RESERVED OPERAND, UNDERFLOW, or OVERFLOW, both FPSYNC (ACC0)
and ERRSYNC (ACC1) are set. The CPU examines the PSL and FPA STATUS REGISTER to determine the exception
condition. RESERVED OPERAND sets the MINUS ZERO ERROR bit. OVERFLOW sets the PSL V bit.
UNDERFLOW sets the PSL Z bit.

Overflow and Underflow

The FPA can handle a very large, but bounded, range of numbers. Numbers that are too large (overflow) or too
small (underflow) cannot be accurately handled (Figure 2-6). Special hardware monitors the results of
all FPA operations for overflow and underflow conditions. The FPA checks for overflow and underflow by monitoring the exponent results. The monitoring is straightforward because of the excess 80
notation used. If the exponent with its excess 80 bias exceeds $FF_{16}$, an overflow has occurred. If the
biased exponent is less than 1 (a biased exponent of 0 is reserved for the zero and reserved operand codes), an underflow has occurred.
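The check reduces to comparing the final, normalized excess 80 exponent against the limits of the 8-bit field. A Python sketch with a hypothetical name; it treats a biased exponent below 1 as underflow because a field of 0 is taken by the zero and reserved operand codes (Paragraph 2.1.4).

```python
def check_exponent(biased: int) -> str:
    """Classify a final excess 80 result exponent before it is packed into the 8-bit field."""
    if biased > 0xFF:
        return "overflow"   # too large: the FPA returns a reserved operand and PSL V is set
    if biased < 1:
        return "underflow"  # field 0 is taken by the zero/reserved codes: the FPA returns zero
    return "ok"


# Multiplying two numbers with exponent C0 (hex): C0 + C0 - 80 = 100 (hex), which overflows.
print(check_exponent(0xC0 + 0xC0 - 0x80))  # overflow
print(check_exponent(0x40 + 0x40 - 0x80))  # underflow
print(check_exponent(0x82 + 0x82 - 0x80))  # ok
```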

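Figure 2-6 Overflow and Underflow Ranges
Number line showing, from left to right: the negative overflow range beyond the most negative number, about $-1.7 \times 10^{38}$ ($-.111\ldots \times 2^{127}$); the representable negative numbers; the underflow range between the smallest negative and smallest positive numbers, about $\pm .29 \times 10^{-38}$ ($\pm .1 \times 2^{-127}$), with exact zero in the middle; the representable positive numbers; and the positive overflow range beyond about $1.7 \times 10^{38}$. *Exact zero does not cause underflow.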
If an overflow condition is sensed, the overflowed number is useless. The FPA manufactures a reserved
operand and informs the CPU that an overflow occurred. The CPU notes the overflow and stores the
reserved operand. The FPA returns to IRD.

Underflow is not as serious a problem. It merely indicates that the number is so small and so close to
zero that the FPA cannot accurately represent it. If an underflow occurs, the FPA sets the underflowed
number to zero and informs the CPU that an underflow has occurred by asserting both FP SYNC and
ERR SYNC. It is important to inform the CPU that a zero has been returned because the CPU may at
some later time attempt a division by the result (division by zero results in an error).
Zero
If a zero code is encountered in an operand transmitted to the FPA from the CPU, FPA microcode
simulates the special properties of addition, subtraction, multiplication, and division with zero. Refer
to Table 2-2 for the result of an operation with zero. If an exact zero is generated as a result of an FPA
operation, the zero code is returned to the CPU and the condition code bits are set for a zero result.
Zero can be generated in a normal arithmetic add or subtract operation (subtracting equal operands, or
adding operands of equal magnitude and opposite sign) or in a microcode-simulated arithmetic operation with a zero operand. An operation that
generates an exact zero does not assert ERR SYNC as an underflow does (although both return a
zero code).
Reserved Operand

Refer to Table 2-3 for the condition codes returned to the CPU when a reserved operand is encountered by the FPA.


2.2 INSTRUCTIONS AND ALGORITHMS
This section concentrates on the microcontrol used to carry out each FPA instruction. Each instruction accesses different microcontrol addresses to correctly move and load operands, compute intermediate results, and ready the final result for return to the CPU. Special instructions check for and
handle errors and exceptional conditions.
This section details the data flow between hardware required to carry out the selected instruction. It
only summarizes the hardware actions started once the data has been loaded by the microcontrol.
Paragraph 2.3 contains a complete and detailed description of the hardware in each FPA section.
Paragraphs 2.2 and 2.3 complement each other, and both should be read to thoroughly understand how
the hardware implements each FPA instruction.
As stated before, this section concentrates on data flow. Figure 2-7, the FPA block diagram, shows the
data bus interconnections and the various registers in the FPA. Although this figure is not specifically
referenced in the discussion, it will help in understanding the data flow and should be referred to
frequently.


Figure 2-7 FPA Block Diagram
Modules: FCT (M8289, slot 28), FPA control, sign processor, and exponent processor; FNM (M8285, slot 24), fraction normalizer; FMH/FML (M8286/M8287, slots 25/26), fraction multiplier; FAD (M8288, slot 27), fraction adder. Buses shown include FP A <33:00>, FP B <33:00>, the system ID bus <31:00>, and the DFMUX bus <31:00>; FPA control includes a 512 × 48 control store.

During IRD (instruction decode), the FPA performs some operations that are prerequisites to many
FPA instructions. The FPA assumes an R-R float instruction and begins FPA register loading. The
FPA has two copies of the CPU general registers. During IRD, it receives specifier information from
the IB and accesses the register addresses contained. The contents of the first specifier are placed on
FPA bus A, the contents of the second on bus B.
The data on bus A is loaded into AR1, LA, SA, MC1, and MP0; bus B loads BR1, LB, SB, MP1, and
MCI. AR1 and BR1 are fraction registers used for the addition and subtraction of floating-point
numbers. LA and LB are loaded with the exponents of the numbers, and the hardware immediately
begins an exponent difference calculation. The exponent difference and/or which exponent is larger is
needed for floating-point additions, subtractions, and multiplications. SA and SB are input registers
for the sign-processing hardware. Fraction data from specifier 1 (on bus A) is loaded into multiply
registers MC1 (multiplicand) and MP0 (multiplier). Fraction data from specifier 2 (on bus B) is
loaded into MP1 (multiplier) and MCI (multiplicand-integer). MC1 and MP1 hold operand data for
MULF and EMODF instructions. The hardware multiply begins the MULF or EMODF fraction
multiply operation during IRD using MC1 and MP1. MCI and MP0 contain the operands for a
MULL instruction.
During IRD, numerous FPA instructions have been started. If the instruction is a float register-to-register, both operands are already loaded and ready in the FPA. Exponent manipulations needed for
add, subtract, and multiply operations have started. MULF and EMODF fraction multiplication have
started. If the instruction decoded is a MULL, the multiplier and multiplicand have already been
loaded into the proper registers.

2.2.1 Add/Subtract
The FPA add/subtract operations can be broken into three states:
1. Load
2. Add/Subtract
3. Normalize

2.2.1.1 Load - While the FPA is in IRD, it is setting up for a float R-R operation. This means that
specifiers 1 and 2 from the instruction buffer are being placed on FP buses A and B, respectively. Bus
A loads AR1 (fraction register), LA (exponent register), and SA (sign latch). Bus B loads BR1, LB, and
SB.
When the FPA decodes a floating-point instruction, it enters A-Fork and selects a microword address
based on op code and specifier types. If the instruction is a float R-R A/S (add/subtract), the FPA enters the optimized add/subtract execution state immediately. If it is not, the FPA, under control of the
selected microword, receives and stores the required data during A-Fork and possibly B-Fork flows. If
it is double precision, 32 additional fraction bits are loaded into both AR0 (extension of AR1) and
BR0 (extension of BR1). If it is not an R-R operation, the new data from the correct source is loaded
into AR1, LA, SA, BR1, LB, and SB.
As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or
during some following microcontrol state in A-Fork or B-Fork, the exponent difference of the two
operands is determined by comparing LA and LB in DALO and CALU. Based on the exponent
difference, the fraction associated with the smaller exponent is loaded into SHMX and right-shifted by
ASHR until the radix points align. This happens before entering the add/subtract state.

2.2.1.2 Add/Subtract - In this state, the fractional result is computed. Based on the op code, the signs of
the operands, and the exponent difference, the FALU operation is selected. Normally, the FALU adds or
subtracts the already aligned fractions to form the fractional result. Refer to Table 2-4 for normal FALU
operation, and Table 2-5 for the criteria that initiate the special FAD operation.

Table 2-4 FALU Operation

Op Code    Operand Sign    FALU Operation
ADD        Same            Add
ADD        Diff            Subtract
SUBT       Same            Subtract
SUBT       Diff            Add
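Table 2-4 reduces to a single exclusive-OR: the FALU subtracts exactly when one, but not both, of "the op code is SUBT" and "the operand signs differ" holds. A small Python sketch of that selection rule follows; the function name `falu_operation` and its string arguments are hypothetical.

```python
def falu_operation(op_code, signs_same):
    """Select the normal FALU operation per Table 2-4."""
    if op_code not in ("ADD", "SUBT"):
        raise ValueError("op_code must be 'ADD' or 'SUBT'")
    # Subtract when exactly one of these is true:
    # the op code is SUBT, or the operand signs differ.
    subtract = (op_code == "SUBT") != (not signs_same)
    return "Subtract" if subtract else "Add"
```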

Table 2-5 Combination of Conditions Initializing Special FAD Operation
FALU Subtract

Exponent Diff

Op Code

Precision

Yes
Yes
Yes

Greater than 7
Greater than 1
Less than 2

D
D

POLY
POLY

X = Don't care
The special FAD operation is used to ensure maximum accuracy in the result while operating within
the FPA hardware constraints. The special FAD operation involves complementing the fraction associated with the smaller exponent by subtracting the fraction from zero in the FAD, returning the
complemented number to the fraction register (either AR or BR) it was in originally, and then loading
it into SHFMX and right-shifting and sign-extending based on exponent difference until the radix
points align. This special operation takes an extra microstep but ensures maximum accuracy. As a
result, the actual fraction subtraction to produce the result does not take place until the third (normalize) state.
During the add/subtract state, the larger exponent is transferred to the PR.
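The special FAD sequence can be sketched by modelling fractions as `width`-bit two's-complement integers. The function name `special_fad_subtract`, the `width` parameter, and the requirement `smaller < 2**(width - 1)` are assumptions made for this illustration; the sketch shows only the order of operations (complement, sign-extending shift, add), not the FPA's register transfers.

```python
def special_fad_subtract(larger, smaller, shift, width):
    """Subtract by complementing first, then shifting with sign extension.

    larger, smaller: unsigned fraction patterns (smaller < 2**(width - 1))
    shift: exponent difference, i.e. the alignment shift for `smaller`
    width: register width in bits for the two's-complement model
    """
    mask = (1 << width) - 1
    # 1. Complement the smaller-exponent fraction by subtracting it from zero.
    complemented = (0 - smaller) & mask
    # 2. Reinterpret as a signed width-bit value and shift right with sign
    #    extension (Python's >> on a negative int is an arithmetic shift).
    if complemented & (1 << (width - 1)):
        complemented -= 1 << width
    aligned = complemented >> shift
    # 3. The FALU now adds the aligned, complemented fraction.
    return larger + aligned
```

With `larger=128`, `smaller=5`, and `shift=2`, the sketch returns 126, that is, 128 minus the ceiling of 1.25; shifting first and then subtracting would give 127 (128 minus the floor of 1.25). In the complement-first order, nonzero bits shifted off the subtrahend still produce a borrow into the retained bits instead of being silently dropped.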
2.2.1.3 Normalize - In this state, the answer is readied for return to the main machine. This involves
final normalization of the fraction, adjustment of the exponent, and determination of the resultant sign.
If the calculation involved special FAD operations as discussed in the previous paragraph, the fraction
subtraction is first carried out and then the result is readied for return to the main machine.
When entering the normalization flows, the FPA checks three conditions:
1. Exponents equal zero
2. FALU subtract with exponent difference less than two
3. Subtract, exponent difference less than 7, and DP.

If a zero operand is noted, the other (non-zero) operand is transferred to the output, and if it is the
subtrahend in a FALU subtraction, its sign is complemented (minuend - subtrahend = difference; 0 - X = -X). A FALU subtraction with an exponent difference of 1 or 0 initiates special flows because the
subtraction of two nearly equal numbers can result in a very small fraction (numerous leading zeros),
which might require many shifts before the first significant bit is located. The special flow initiated can
shift the result up to sixty places to find the first significant bit before it is transferred to the standard
normalize routine. If a first significant bit is not found after 60 bits have been shifted, a zero is readied
as the result. If the third branch is taken, the add/subtract state described in Paragraph 2.2.1.2 is carried out, and
flow then reenters the normalization routine.
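The special flow's search for the first significant bit, with its 60-place limit, can be sketched as follows. The function name, the `width` parameter, and the bit-at-a-time loop are assumptions for illustration; the hardware's actual shift paths are not modelled.

```python
def find_first_significant_bit(fraction, width, limit=60):
    """Shift a fraction left until its MSB is set, giving up after `limit` shifts.

    Returns (normalized_fraction, shift_count); (0, 0) means no significant
    bit was found within `limit` shifts, so a zero result is readied instead.
    """
    msb = 1 << (width - 1)
    mask = (1 << width) - 1
    shifts = 0
    while not (fraction & msb):
        if fraction == 0 or shifts == limit:
            return 0, 0
        fraction = (fraction << 1) & mask    # one-place left shift
        shifts += 1
    return fraction, shifts
```

The returned shift count is what the normalization flows use to adjust the exponent in PR.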


Usually, the unnormalized result requires a shift of four places or less. If this is the case, the four MSBs
are examined to locate the first significant bit. Based on the location of the first significant bit, a
rounding byte is added to the fraction. If the result from a FALU subtraction is negative, the FALU
result is subtracted from the rounding byte to return the number to sign magnitude notation and round
it in a single step. Once the FALU result is added to or subtracted from the rounding byte, the fraction
is shifted and least significant bits are dropped.
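The rounding step can be sketched in Python as below. This is a minimal model under stated assumptions: the function name `round_and_shift` is hypothetical, Python's `int.bit_length()` stands in for the hardware's examination of the four MSBs, and round-half-up on the magnitude is assumed, since the text does not give the rounding byte values. For a negative FALU result, the sketch subtracts the result from the rounding constant, mirroring the single-step conversion to sign-magnitude form described above.

```python
def round_and_shift(value, keep):
    """Round a signed FALU result to `keep` significant bits, then shift.

    Returns (sign, fraction, shift): fraction has exactly `keep` bits and
    `shift` is the net right shift applied (negative means a left shift),
    which the exponent adjustment must account for.
    """
    if value == 0:
        return 0, 0, 0
    sign = 1 if value < 0 else 0
    magnitude = -value if sign else value
    excess = magnitude.bit_length() - keep      # bits below the kept field
    if excess <= 0:                             # nothing to drop: shift left
        return sign, magnitude << -excess, excess
    round_const = 1 << (excess - 1)             # half an LSB of the kept field (assumed)
    # Negative results are subtracted from the rounding constant, which
    # restores sign-magnitude form and rounds in a single operation.
    rounded = round_const - value if sign else value + round_const
    if rounded.bit_length() > magnitude.bit_length():
        excess += 1                             # rounding carried into a new MSB
    return sign, rounded >> excess, excess
```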
In all cases, the number of shifts required to ready the fraction for return to the CPU is computed and
is used to adjust the exponent in the PR. Once completed, the exponent, the normalized fraction, and
the sign of the result are placed on the FP bus A. When the complete result is on the bus, standard
routines handle the actual transfer to the main machine.
2.2.2 Multiply (Floating-Point)
The FPA multiply operation can be broken into three operations: load, multiply, and normalize. In the
process of carrying out a FP multiply, the FPA receives the operands (each consisting of an exponent,
fraction, and sign bits), checks for zeros and reserved operands; loads the exponent, fraction, and sign
bits into the appropriate registers; starts the hardware to carry out the required calculations; and
assembles and readies the result for return to the CPU when notified that the hardware calculation is
finished.
2.2.2.1 Load - To maximize speed, the FPA is continuously setting up for a float R-R operation. This
means that in IRD, specifiers 1 and 2 from the instruction buffer are addressing the GPRs (general-purpose registers) in the CPU, and the register data is being placed on FP buses A and B, respectively.
Bus A loads MC1 (multiplicand register), LA (exponent register), and SA (sign latch). Bus B loads MP1
(multiplier register), LB, and SB.
When the FPA decodes a floating-point instruction, it enters A-Fork and branches to a specific microword based on op code and specifier types. If the instruction is a float R-R multiply, the operands are
already loaded and the FPA enters the multiply state immediately. If it is not, the FPA,
under control of the selected microword, receives and stores the required data during A-Fork and
possibly B-Fork flows. If it is a double-precision multiply, 32 additional fraction bits are loaded into
both MC0 (extension of MC1) and MP0 (extension of MP1). If one or both of the specifiers are not
registers, all new data will be loaded into MC1, LA, SA, MP1, LB, and SB.
As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or
during some following microcontrol state, the fraction multiplier begins the fraction multiply by
breaking the fractions into nibbles and beginning the hardware multiplication using the first multiplier
nibble.
2.2.2.2 Multiply - In the multiply state, the fraction multiplication continues until a final fraction (as
yet unnormalized) is computed, the exponents are added, and the sign of the result is computed. The
fraction multiplication is initiated when the multiply flows issue MCONT (multiply continue).
As MCONT is issued, the FPA checks for operands equal to zero or minus zero (reserved operand). If
a zero operand is found, computation stops and the FPA immediately returns a zero to the base
machine. If a reserved operand is found, the operation aborts. If neither is found, computation
continues. In the case of a float (single-precision) multiply, the fraction multiplication is completed as
the exponent calculation is completed. The product is transferred to the NR. In a double-precision
multiply, the microcontrol enters a wait state. While waiting during a double-precision multiply, the
FPA continually transfers the output of the fraction multiplier to the normalizer. This enables the FPA
to begin normalizing the fraction result as soon as the multiplication is complete. It remains in the wait
state until a hardware counter in the fraction multiply logic asserts MUL/DIV DONE indicating the
fraction multiply is complete.


While the fraction multiply and the check for zeros and reserved operands are taking place, the exponents are added. If no zeros or reserved operands are found, the fraction multiply and exponent
processing continue. After the exponents are added, a bias of 200₈ or 80₁₆ is subtracted from the
exponent result to return the exponent to excess 80 notation (refer to Paragraph 1.5).
In a multiply operation, the sign of the result is the exclusive-OR of the operand signs.
By the time the fraction multiply is complete, the exponents have been added, the exponent bias has been
subtracted, and the sign of the result has been calculated. The result of the fraction multiply is moved
to NR.
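The exponent and sign arithmetic for a floating-point multiply is simple enough to show directly. A minimal sketch, assuming 8-bit biased exponents in excess 80₁₆ (excess-128) notation and one-bit signs; the function name is hypothetical and no range checking is done:

```python
BIAS = 0x80  # excess 80 (hex) exponent notation

def multiply_exponent_and_sign(exp_a, sign_a, exp_b, sign_b):
    """Form the biased exponent and sign of a floating-point product."""
    # Each biased exponent carries one bias, so their sum carries two;
    # subtracting one bias returns the result to excess 80 notation.
    exponent = exp_a + exp_b - BIAS
    sign = sign_a ^ sign_b    # exclusive-OR of the operand signs
    return exponent, sign
```

For example, true exponents 2 and 3 are stored as 0x82 and 0x83, and the sketch returns 0x85, the biased form of 5.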
2.2.2.3 Normalize - The normalize state of a floating-point multiply is very simple. Since the input
operands are always between 1/2 and 1, the result is always between 1/4 and 1. This means that the
result can be normalized with a single shift of four bits, or less. In the normalize state, the fraction is
rounded and shifted, and the exponent is adjusted to reflect the normalization shift. The normalized
fraction, adjusted exponent, and sign bit are placed on the FP bus A. Once the complete result is on the
bus, standard routines handle the actual data transfers to the main machine.
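The claim that a single normalization shift suffices can be checked with exact rational arithmetic. A minimal sketch using Python's `fractions` module; the function name and the convention of returning an exponent adjustment are assumptions. In this exact binary model a one-bit left shift always suffices, which is within the four-place limit stated above.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def normalize_product(frac_a, frac_b):
    """Multiply two normalized fractions and renormalize the product.

    Returns (fraction, exponent_adjustment).
    """
    if not (HALF <= frac_a < 1 and HALF <= frac_b < 1):
        raise ValueError("operands must be normalized (1/2 <= f < 1)")
    product = frac_a * frac_b            # always in [1/4, 1)
    if product < HALF:
        return product * 2, -1           # one left shift; exponent minus one
    return product, 0                    # already normalized
```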
2.2.3 MULL (Multiply Integer Longword)
The FPA's MULL algorithm is the simplest and most straightforward of all the operation flows. The
FPA receives two 32-bit signed integers, performs an unsigned multiplication, and returns the 64-bit
answer to the base machine. The FPA performs no result normalization, no checks for reserved operands, zero operands, or other error conditions. Microcode in the base machine generates the condition
codes and handles all the checks and manipulations required to ensure a correct result.
2.2.3.1 Load - As discussed in introductory Paragraph 2.2, the FPA during IRD loads MP0 and
MC1 (the two registers used in MULL operations) with the register contents of specifiers 1 and 2,
respectively. If the instruction decoded in the A-Fork flows is an R-R MULL, the FPA can begin the
multiply immediately. If it is a MULL but not an R-R, the FPA will, under the control of the selected
microaddress, load data from the correct source into either or both MP0 and MC1.
2.2.3.2 Multiply and Return - The decoding of a MULL causes the fraction multiply hardware to
abandon set-up of a MULF and begin accessing the registers used for MULL (MC1 and MP0). When
the proper data has been loaded, MCONT is issued by the FPA. This indicates to the fraction multiply
hardware that the correct data is in MP0 and MC1, and that the data accesses started previously were
accessing correct data.
MCONT enables the fraction multiply hardware to continue multiplying. The multiply continues,
controlled by a hardware sequencer within fraction multiply hardware, while the FPA waits two machine cycles. The answer accumulates in ACCM and LSH. After two wait cycles, the multiply is
finished. The hardware stops and the FPA makes the 32 low-order bits (from LSH) available to the
CPU. When the CPU responds with CPSYNC, indicating the low-order bits have been stored, the
FPA readies the high 32 bits from SALU for transmission to the CPU.
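Because the FPA performs an unsigned multiplication, something must correct the high longword when an operand is negative. The text says only that base-machine microcode handles this; the sketch below shows one standard correction (subtract the other operand's bit pattern from the high word for each negative operand), assumed here for illustration. The function name `mull` is hypothetical, and the low and high words are returned in the order the FPA hands them to the CPU.

```python
MASK32 = 0xFFFFFFFF

def mull(a, b):
    """Multiply two 32-bit signed integers via an unsigned 32 x 32 multiply.

    Returns (low, high) 32-bit words of the 64-bit two's-complement product.
    """
    ua, ub = a & MASK32, b & MASK32      # operand bit patterns
    product = ua * ub                    # unsigned multiply, as the FPA does
    low = product & MASK32               # low-order bits (from LSH) go first
    high = (product >> 32) & MASK32      # high-order bits go after CPSYNC
    # Signed correction (assumed to be done by base-machine microcode):
    if a < 0:
        high = (high - ub) & MASK32
    if b < 0:
        high = (high - ua) & MASK32
    return low, high
```

Only the high word needs correcting: the low 32 bits of a two's-complement product are the same whether the operands are treated as signed or unsigned.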
2.2.4 Divide
The FPA divide operation can be broken into three steps: load, divide, and normalize. To do a floating-point divide, the FPA receives the operands (each consisting of sign, fraction, and exponent bits),
loads the operands into holding registers, transfers the operands from the holding registers into the
correct division registers, starts the hardware to do the fraction division, checks for zero and reserved
operands, starts the hardware to store the result, and normalizes and packs the result for return to the
CPU.


2.2.4.1 Load - The loading of division operands takes place in two substeps: data fetch and division
register load. Unlike the FPA add/subtract, multiply, and MULL operations, the FPA does not load
division operands into the proper division registers during IRD (Table 2-6).
Table 2-6 The Division Load

IRD (both specifiers): Op code decoded; specifiers and precision known.

Data Fetch Substep
  Specifier 1: New data loaded into AR1 and AR0*, LA, and SA, if needed.
  Specifier 2: New data loaded into BR1 and BR0*, LB, and SB, if needed.

Division Register Load Substep (2 microwords)
  1st microword, Specifier 1: Move LA (divisor exponent) to XR.
  1st microword, Specifier 2: Move BR (dividend fraction) to NR.
  2nd microword, Specifier 1: Move AR (divisor fraction) to the just-vacated NR.
  2nd microword, Specifier 2: Move NR (dividend fraction) to RR and right-shift the just-loaded
  dividend fraction to compensate for RR's hard-wired left shift. This right shift ensures the
  initial dividend is properly represented. Subtract XR (divisor exponent) from LB (dividend exponent).

*AR0 and BR0 are fraction extension registers for double precision operations.

During IRD, an R-R float operation is assumed. This means that both specifiers 1 and 2 are assumed to be
registers. The contents of the first register named are placed in AR, LA, and SA, and the contents of the
second in BR, LB, and SB. If the operation decoded is an R-R float divide, the data fetch substep is done
and division register load may begin.
However, if it is not an R-R float, divide microcode waits for data from the correct specifier and loads
it into AR1, LA, and SA and/or BR, LB, and SB. When the divisor is in AR, LA, and SA, and
the dividend is in BR, LB, and SB, the data fetch substep is finished.
The division register load substep loads the divisor's and the dividend's fraction and exponent components into the registers required to do a division. The loading of the proper registers takes two
microcode steps. The first microcode step loads the divisor exponent into XR and loads the dividend
fraction into the NR. The second microcode step finishes the register loading by moving the dividend
fraction (in the NR) to the RR and loading the just-vacated NR with the divisor fraction from the AR.
It also starts the fraction division hardware, checks for zeros and reserved operands, and subtracts the
divisor exponent (XR) from the dividend exponent (LB) (LB - XR).

2.2.4.2 Divide - The divide operation continues unless a zero or reserved operand is found. If a zero
dividend is found, operations cease and a zero is readied for return to the CPU. Finding a zero divisor
or a reserved operand initiates error states. The FPA will remain in these error states until returned to
IRD by a CPU signal.
If no zeros or reserved operands are found, the division continues. A bias of 80₁₆ is added to the result of
the exponent subtraction to return it to excess 80 notation (Paragraph 1.5). The fraction multiply
hardware is started. This hardware is used to store the result of the fraction division as it is generated.
The division continues under hardware control as the FPA microcode remains in a divide wait loop.

The hardware uses the restoring, repeated subtraction technique to divide. The dividend is initially
loaded into the RR and the divisor is stored in the NR. The divisor (contents of NR) is subtracted
from the dividend (contents of RR). If the result is negative, a zero is left-shifted into the result register in
the fraction multiply hardware and the contents of the RR are left-shifted by one. If the result is positive
or zero, a 1 is left-shifted into the result register, and the result is loaded into the remainder register left-shifted by one. The divisor (contents of NR) is continually subtracted from the contents of the RR
until 26 bits (58 bits for double precision) of quotient are generated. MUL/DIV DONE is now asserted.
Asserting MUL/DIV DONE stops the division and ends the divide wait loop. The divide result is
transferred from the fraction multiply hardware where it was stored during generation to the normalize register (NR) in the normalize hardware.
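The restoring division loop can be sketched in Python as follows. Assumptions: the function name is hypothetical, dividend and divisor are integer fraction patterns on the same scale, and the RR's hard-wired left shift (and the compensating right shift at load time) is replaced by an explicit shift of the partial remainder. For normalized operands the quotient lies between 1/2 and 2, so the first bit generated is the units bit and the quotient's value is `q / 2**(quotient_bits - 1)`.

```python
def restoring_divide(dividend, divisor, quotient_bits):
    """Generate `quotient_bits` quotient bits by restoring division."""
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be positive")
    remainder = dividend                  # plays the role of RR
    quotient = 0                          # plays the role of the result register
    for _ in range(quotient_bits):
        trial = remainder - divisor       # subtract NR from RR
        if trial < 0:
            quotient = quotient << 1      # shift in a 0; keep old remainder
        else:
            quotient = (quotient << 1) | 1  # shift in a 1
            remainder = trial             # keep the difference
        remainder <<= 1                   # left-shift by one for the next bit
    return quotient
```

For single precision the loop would run for 26 quotient bits and for double precision 58, per the text above. For example, `restoring_divide(0b1100, 0b1000, 4)` returns `0b1100`, which represents 1.5.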
2.2.4.3 Normalize - Since the two initial operands are normalized (between 1/2 and 1), the result is
always positive and between 1/2 and 2. This means the normalize and round operation is simple and
will take only one microstep. The result is examined, a round byte is selected and added, and the data is
shifted as needed to produce a normalized result. The exponent result is adjusted to reflect the direction and amount of the fraction shift. The normalized fraction, adjusted exponent, and sign bit are
placed on the FP bus(es). Once the result is on the bus(es), standard storage routines handle the actual
transfer to the CPU.
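Combining the exponent arithmetic from Paragraph 2.2.4.2 with this normalization gives the following sketch. It uses exact fractions rather than bit patterns; the function name `divide_normalize` is hypothetical, and the single right shift when the quotient is 1 or more reflects the 1/2 to 2 range stated above.

```python
from fractions import Fraction

BIAS = 0x80  # excess 80 (hex) exponent notation
HALF = Fraction(1, 2)

def divide_normalize(exp_dividend, frac_dividend, exp_divisor, frac_divisor):
    """Return (biased_exponent, normalized_fraction) of dividend / divisor."""
    for f in (frac_dividend, frac_divisor):
        if not HALF <= f < 1:
            raise ValueError("operands must be normalized (1/2 <= f < 1)")
    exponent = exp_dividend - exp_divisor + BIAS   # LB - XR, then re-bias
    quotient = frac_dividend / frac_divisor        # always between 1/2 and 2
    if quotient >= 1:
        quotient /= 2          # one right shift to normalize
        exponent += 1          # exponent adjusted for the shift
    return exponent, quotient
```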
2.2.5 EMOD (Extended Precision Multiply and Integerize)
The EMOD operation is partially done in the FPA. The FPA performs an unsigned 32 X 24-bit (64 X
56-bit for double precision) multiplication and returns the fraction result to the main machine. The
main machine does all further processing. The FPA EMOD operation can be broken into two steps:
operand load, and result calculation and return.
2.2.5.1 Operand Load - Loading the EMOD operands involves loading the multiplicand, an 8-bit
multiplicand extension, and the multiplier into the proper registers. The multiplicand (either single or
double precision) is loaded into MC during A-Fork. In B-Fork, EMOD flows are started. These flows
wait for the CPU to fetch the multiplicand extension (8 bits) and transmit it to the FPA via the ID bus.
The FPA loads the extension into MCX, which is part of the MC1 register. The second operand is then
transmitted to the FPA and loaded into the multiplier registers MP0 and MP1. The multiplier
is not extended. The FPA receives and stores the exponent and sign associated with both operands but
does not use them.
2.2.5.2 Result Calculation and Return - Once the operands are loaded, MCONT is asserted and the
EMOD multiply begins. The operands are tested for zeros or reserved operands. If zeros are found,
special flows stop the multiply and return a zero to the CPU. Finding reserved operands initiates error
flows. If no exceptions are found, the multiply sequencer, started by MCONT asserted, continues
multiplying. A single precision (float) multiply is finished in one microstep after the exponent test. A
double precision multiply causes the FPA to enter a wait loop. It remains in the wait loop until the
multiply sequencer asserts MUL/DIV DONE, indicating the result is computed.


When the result computation is finished, the fraction (32-bit float, 64-bits double) is transmitted to the
CPU. The CPU does all further processing including sign computation, removal of the integer part,
normalization, and exponent calculation.
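The split of work between FPA and CPU can be illustrated with a single-precision model. Several details here are assumptions, labeled in the code: the 8-bit extension is appended below the 24-bit multiplicand fraction to form the 32-bit multiplicand, `scale` stands for the unbiased power of two the CPU derives from the exponents, and the integer part is taken by truncation toward zero. Sign handling is omitted. The function name `emod_single` is hypothetical.

```python
import math
from fractions import Fraction

def emod_single(mc_frac24, mcx8, mp_frac24, scale):
    """Model single-precision EMOD: FPA fraction multiply, then CPU split.

    mc_frac24, mp_frac24: 24-bit fraction patterns (value = pattern / 2**24)
    mcx8:  8-bit multiplicand extension, appended below mc_frac24 (assumed)
    scale: unbiased power of two the CPU applies from the exponents (assumed)
    """
    multiplicand = (mc_frac24 << 8) | mcx8       # 32-bit extended multiplicand
    raw = multiplicand * mp_frac24               # FPA: unsigned 32 x 24 multiply
    # CPU side: scale the raw product, then remove the integer part.
    value = Fraction(raw, 1 << 56) * Fraction(2) ** scale
    integer = math.trunc(value)                  # truncation toward zero (assumed)
    return integer, value - integer
```

For example, a multiplicand of 0.75 (0xC00000), a multiplier of 0.5 (0x800000), and `scale=2` give 1.5, which splits into `(1, Fraction(1, 2))`.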

2.2.6 POLY (Polynomial Evaluation)

2.2.6.1 Introduction - POLY is an FPA-implemented instruction. The FPA does the majority of
calculations required to evaluate a polynomial expression. This involves storing a constant and an
accumulation; receiving coefficients; repeated additions and multiplications using the constant, the
accumulation, and the new coefficient; and readying a final result to be returned to the CPU. It
also uses specialized operations (both hardware and microcode) to ensure maximum accuracy within
the FPA hardware limits.
The following paragraphs define the polynomial expression and related terms, and explain the POLY flows and POLY exceptions in detail. Also discussed are the numerous flows required to handle errors, underflows, overflows, and zeros.

2.2.6.2 The Polynomial Expression - The generalized polynomial may be written:

$$f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$$

The x, a constant within each polynomial, is called the argument and is raised to various powers: $x^1, x^2, x^3, \ldots, x^n$. The highest power, represented here by the superscript n, is called the degree of the polynomial.
The $a_0, a_1, a_2, \ldots, a_n$ are the coefficients. Rearrangement and factoring produces

$$f(x) = a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-1} + x a_n)))$$

The result, f(x), may therefore be computed as: $a_n$ times x, then add $a_{n-1}$; the resulting
answer times x, then add $a_{n-2}$; and so on. The generalized form is: (accumulation times x) plus the new
coefficient, $a_i$, equals the new accumulation.
The POLY instruction format is POLY argument, degree, coefficient table. The FPA receives and
stores the argument. The CPU uses the degree operand to determine when it has accessed the last
coefficient of the table, so that it can inform the FPA that the POLY calculation is done. The coefficient
table is arranged in $a_n, a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ order. The CPU transmits the coefficients to the
FPA as needed: $a_n$ first, $a_{n-1}$ next, and so on.
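The accumulation recurrence is Horner's rule. A minimal Python sketch that consumes the coefficient table in the same order the CPU sends it ($a_n$ first, $a_0$ last); it uses ordinary Python numbers and so ignores the FPA's precision, rounding, and exception handling. The function name `poly_horner` is hypothetical.

```python
def poly_horner(x, table):
    """Evaluate a polynomial from its coefficient table (a_n first, a_0 last)."""
    if not table:
        raise ValueError("coefficient table must not be empty")
    accumulation = table[0]                 # a_n
    for coefficient in table[1:]:           # a_(n-1), ..., a_0
        # (accumulation times x) plus the new coefficient
        accumulation = accumulation * x + coefficient
    return accumulation
```

For example, with table `[3, 0, 1]` (that is, $3x^2 + 1$) and x = 2, the accumulation goes 3, 6, 13.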

2.2.6.3 Normal POLY Flows - The FPA begins special POLY flows in B-Fork. The POLY argument
is transferred to the FPA during A-Fork and then loaded into the argument registers. The argument
fraction is loaded into MP, the exponent into XR, and the sign into SX. The argument remains in these
registers throughout POLY execution. The FPA waits for the first coefficient to be sent so the POLY
computation can begin.
POLY computation can be divided into three large categories:
1. Argument and First Coefficient Handler
2. Generalized POLY Computation (neither first term nor last term)
3. POLY DONE Handler (handles $a_0$, the last coefficient).

This section will discuss the flow by these three categories. Within each category, microcode controls
the normal operations, checks for exceptional conditions, and attempts to recover from any exceptional conditions. Refer to Figure 2-8 for a summary of the POLY flow.


Figure 2-8 (POLY flow summary)

POLY BEGINS WITH ARGUMENT IN AR, LA, AND SA

FIRST COEFFICIENT HANDLER
  * MOVE ARGUMENT TO REGISTERS
      MP ← AR                  ARGUMENT FRACTION
      XR ← LA                  ARGUMENT EXPONENT
      SX ← SA                  ARGUMENT SIGN
  * IF ARGUMENT IS ZERO, FLOW REMAINS IN THIS HANDLER WAITING FOR
    LAST COEFFICIENT, WHICH WILL BE FLAGGED BY POLY DONE
  * WAIT FOR FIRST COEFFICIENT
  * MOVE COEFFICIENT TO REGISTERS
      MC, BR ← A(N)            COEFFICIENT FRACTION
      LB ← A(N)                COEFFICIENT EXPONENT
      SB ← A(N)                COEFFICIENT SIGN
      SA ← SB                  TRANSFER COEFFICIENT SIGN
  * MULTIPLY COEFFICIENT AND ARGUMENT FORMING MULT. RESULT
      AR ← MP * MC             MULTIPLY FRACTIONS
      LA, PR ← XR + LB - 128   ADD & ADJUST EXPONENTS
      SA ← SA .XOR. SX         COMPUTE SIGN
  * IF OVERFLOW/UNDERFLOW, ENTER GENERAL POLY FLOWS ATTEMPTING
    A RECOVERY

LAST COEFFICIENT HANDLER
(POLY DONE ASSERTED AND ARGUMENT OR DEGREE = 0)
ANSWER IS JUST LAST COEFFICIENT
  * READY COEFFICIENT FOR RETURN
      PR ← LB                  TRANSFER EXPONENT
      NR ← BR                  TRANSFER FRACTION
      SA ← SB                  TRANSFER SIGN
  * GO TO REGULAR STORE FLOWS
      NSHF ← NR                TRANSFER FRACTION
      ASSERT FPSYNC INDICATING ANSWER IS READY

GENERAL POLY FLOWS (NO POLY DONE)
(ENTERED BY NORMAL ENTRY OR OVERFLOW/UNDERFLOW ENTRY)
  * WAIT FOR COEFFICIENT
  * MOVE COEFFICIENT TO REGISTERS
      BR ← A(I)                COEFFICIENT FRACTION
      LB ← A(I)                COEFFICIENT EXPONENT
      SB ← A(I)                COEFFICIENT SIGN
  * ADD COEFFICIENT AND MULT. RESULT FORMING ACCUMULATION
      NR ← AR + BR             ADD FRACTIONS
      PR ← MAX(LA, LB)         SELECT EXPONENT
      MC ← NR                  NORMALIZED
      PR ← PR                  NORMALIZED
      SA ← SR                  SIGN OF ACCUMULATION
  * IF OVERFLOW, ERROR
    IF UNDERFLOW, ACCUMULATION IS SET TO ZERO
  * MULTIPLY ACCUMULATION AND ARGUMENT FORMING MULT. RESULT
      AR ← MP * MC             ARGUMENT * ACCUMULATION
      PR ← PR + XR - 128       ADD & ADJUST EXPONENTS
      SA ← SA .XOR. SX         COMPUTE SIGN
  * IF OVERFLOW/UNDERFLOW, CONTINUE GENERAL POLY FLOWS
    ATTEMPTING A RECOVERY

LAST COEFFICIENT HANDLER (POLY DONE ASSERTED)
  * WAIT FOR COEFFICIENT
  * MOVE COEFFICIENT TO REGISTERS
      BR ← A(I)                COEFFICIENT FRACTION
      LB ← A(I)                COEFFICIENT EXPONENT
      SB ← A(I)                COEFFICIENT SIGN
  * ADD COEFFICIENT AND MULT. RESULT FORMING ACCUMULATION
      NR ← AR + BR             ADD FRACTIONS
      PR ← MAX(LA, LB)         SELECT EXPONENT
  * IF OVERFLOW, ERROR
  * GO TO REGULAR NORMALIZE FLOWS
      NSHF ← NR                NORMAL FRACTION
      PR ← PR                  ADJUST EXPONENT
      SA ← SR                  SIGN OF RESULT
      ASSERT FPSYNC INDICATING ANSWER IS READY

Within the flows, different microcode handles float and double precision operation. In POLY double precision, the
coefficient, argument, and accumulation fractions each use an additional 32 low-order bits. The differences between float and double precision are not discussed in each operation because they are normally
limited to longer fraction multiply times and slower fraction transfers. These come about because there
are more bits to be multiplied and moved.
When the first coefficient, $a_n$, is sent, it is loaded in MC, LB, and SB. Since the argument has not yet
been checked, both the argument and the coefficient are checked for exception conditions, and POLY
DONE is checked. If any exception condition is noted, special flows are accessed. POLY DONE
asserted indicates that the coefficient just sent was the final coefficient (in this case, the first coefficient
is also the last coefficient). If the argument (x) is zero, all terms except the $a_0$ term of the polynomial
will be zero. Either POLY DONE asserted or x equal to zero causes the FPA to access a special last
coefficient routine in the argument and first coefficient handler that returns $a_0$ to the CPU as the result
of the polynomial calculation.
After both the argument and the coefficient are checked and no exception conditions are found, the
first multiply takes place. While the fractions are multiplied in the fraction multiply logic (FML and
FMH), the exponents are added and adjusted to restore excess 80 notation (FCT), and the sign of
the result is computed (FCT). When the multiply is done, the fraction is moved to AR for the addition
operation. To maximize calculation accuracy, no normalization is performed after the multiplication
and 8 additional low-order fraction bits are transferred to the AR register and stored in ARX. These 8
bits are used when the new coefficient is added to the multiplication result to produce the new accumulation.
While the multiplication fraction result is being transferred to AR, the exponent result is checked for
exponent overflow or underflow. If no overflow or underflow is found, the addition will begin as soon
as the new coefficient data is ready. If, however, overflow or underflow is sensed, special flows that
attempt to recover from the over/underflow are accessed (Paragraph 2.2.6.4).
While the new coefficient data is checked for zero and/or reserved operands, the addition/subtraction
begins on the assumption that the coefficient data will be valid. The exponent difference hardware
selects the larger exponent for processing by the FCT and loads it into PR. It also shifts and loads the
fraction associated with the smaller exponent into the B-input of FALU. FALU then adds or subtracts
the fractions. When the coefficient data proves valid, the computed fraction result is transferred to NR
where it can be normalized.
The fraction normalization takes place in the FNM logic. A rounding byte is added and the result is
shifted until normalized. The exponent is adjusted based on both the rounding byte and the number of
shifts required to normalize the fraction. The normalized fraction is moved to MC and a multiply with
the stored argument (x) begins.
Once the first multiply is completed, the POLY calculation is in the general POLY flow. These flows
multiply the result of the last add and normalize by the argument (x), receive a new coefficient from
the CPU, check it for exceptional conditions, add it to the result of the multiply operation, normalize the result of the addition, and ready it for the next multiply. The general POLY flows check the
intermediate results for overflow, underflow, and zeros, and access special flows if an exception is
found.
The general POLY flow continues until the CPU sends a coefficient flagged with POLY DONE rather
than CPSYNC. This indicates that the coefficient just transmitted is the final coefficient in the table.
The POLY DONE flow adds the final coefficient and then accesses the normalization flows in the FPA
addition flows. These flows round and normalize the fraction and adjust the exponent based on the rounding byte and normalization shift. Once the result is complete, it is placed on the FP bus A and
standard routines handle the transfer to the CPU.


2.2.6.4 POLY Exception Flows - The POLY flows have many special sections to check for and
handle exceptional conditions. Each coefficient is checked for zeros and reserved operands. The POLY
argument is checked for zero. The CPU checks the argument and degree for reserved operand. The
FPA also checks the intermediate results for underflow, zero, and overflow. If an underflow or overflow is detected, special flows attempt to recover from the condition without a loss of accuracy.
The exception flows (zero, reserved operand, overflow, and underflow) can be divided into three categories to handle exceptions discovered during:
1. First coefficient and argument handling
2. General coefficient handling
3. POLY DONE (final coefficient) handling.

Within each category, different microcode handles float and double precision operation. However,
there is little difference between the exception procedures used in each category and only minor differences in the microcode. As a result, rather than discussing each individual exception flow, this section explains the microcode procedure for each type of exception.
Zeros
The argument and each coefficient are checked for zeros. The argument and first coefficient are
checked for zeros at the start of the POLY flow. If the argument (x) is zero, all the terms of the
polynomial will be zero except $a_0$, the last coefficient. With the argument equal to 0, the FPA will
remain in the first coefficient loop waiting for the last coefficient (flagged by POLY DONE). When it
is received, it will be tested for reserved operand and, if not reserved, will be returned to the CPU as the
result of the polynomial. If the first coefficient is zero, the accumulation registers will be set to zero and
the FPA will wait for the next coefficient.
If a zero is found as a subsequent coefficient (when the current accumulation is not zero), the current
accumulation which is unnormalized will be rounded and normalized, and the FPA will wait for the
next coefficient.
Reserved Operand

Each coefficient is checked by FPA hardware for a reserved operand. If a reserved operand is found, the
POLY operation is immediately aborted and the accelerator error bit is set. The argument is not
checked for a reserved operand by the FPA because it is checked in the CPU and, if found to be reserved, the POLY operation never starts in the FPA.
Overflow

The FPA checks for overflow by examining the exponent bits PR8 and PR9 in the PR register. If PR8
(the overflow bit) is high and PR9 is low, an overflow has occurred.
The FPA checks each current accumulation two times per cycle for an overflow condition - once when
the unnormalized multiplication result is readied for adding the new coefficient and once after the
addition result has been rounded and normalized. If an overflow is detected in the second instance
(normalized addition result overflow) the FPA will abort. The FPA will set the PSL V (overflow) bit
and wait until the CPU traps it back to IRD.
If the unnormalized multiplication result overflows, the FPA accesses overflow routines in an attempt
to recover an accurate result from the overflow. The FPA microcode is written based on the assumption that if the new coefficient exponent is subtracted from the current overflowed exponent, the result may be
small enough that the exponent will no longer overflow (PR8 will be low). As stated before, PR8 is
high. This means the exponent in PR is 10XXXXXXX (9 bits long). Since the exponent difference
taker (EALU) is only 8 bits long, the overflowed exponent must be scaled down. The FPA subtracts 80₁₆
to scale it down.
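The overflow test and the scale-down step can be sketched as bit operations on a 10-bit PR value. Treating PR as exactly 10 bits (PR<9:0>) is an assumption drawn from the PR8 and PR9 tests above; the function names are hypothetical.

```python
def is_overflow(pr):
    """PR8 high and PR9 low: the 10-bit PR exponent has overflowed."""
    return bool(pr & 0x100) and not (pr & 0x200)

def scale_down(pr):
    """Subtract 80 (hex) from an overflowed exponent of the form 10XXXXXXX."""
    if (pr & 0x380) != 0x100:            # PR<9:7> must read 0, 1, 0
        raise ValueError("exponent is not of the form 10XXXXXXX")
    return pr - 0x80                     # 256..383 becomes 128..255 (8 bits)
```

For example, the largest exponent a multiply can produce, 0xFF + 0xFF - 0x80 = 0x17E, passes the overflow test and scales down to 0xFE. Because the coefficient exponent is reduced by the same 80₁₆, the exponent difference, and therefore the alignment shift, is unchanged.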

The new coefficient is first checked for zero or reserved operands. A reserved operand causes an abort.
If the coefficient is zero, it will not change the overflow. The FPA will attempt to recover from the
overflow by first adding back the 80₁₆ to return the exponent to its correct value, then normalizing and
rounding. If this fails, the FPA will set the overflow bit and abort.
If the new coefficient is not zero or reserved, the operation continues. The FPA subtracts 80₁₆ from the
exponent of the coefficient to scale it down. The reduced exponent coefficient is checked for underflow. If an underflow is sensed, the coefficient is effectively zero when compared with the accumulation. Since the coefficient is effectively zero, the FPA will attempt to recover from the overflow by first
adding back the 80₁₆ to return the exponent to its correct value, then normalizing and rounding. If this
fails, the FPA will set the overflow bit and abort.
If the reduced coefficient did not underflow, the coefficient can affect the accumulation
and possibly recover it from the overflow condition. In the case of accumulation overflow flows, the
accumulation is known to be the larger number. Therefore, no checks are performed on the exponents to
find the larger number. The exponent difference taker then subtracts the two scaled-down exponents to
determine how many places the coefficient must be shifted to align the radix points. To set up the POLY
add/subtract, the accumulation fraction is moved through ADER MUX to FALU and
the restored (80₁₆ added) accumulation exponent is moved to PR for processing.
The POLY add/subtract then takes place. The fraction result is moved to NR, where it is normalized and rounded. The result exponent (formerly the accumulation exponent) is adjusted based on the fraction normalization and rounding, and the result is checked for overflow and underflow. As stated at the beginning of this overflow discussion, an overflow after the normalization and rounding operation causes the FPA to assert the overflow (V) bit and abort.

Underflow
The FPA can handle numbers as small as 0.29 × 10^-38. A smaller number causes an underflow. The FPA checks for underflow by examining the exponent register PR: in an underflow, PR9 is high or PR <8:0> is all zeros (low).
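The overflow and underflow tests on PR can be summarized in a short Python sketch. It assumes, for illustration only, that PR <9:0> is read as a 10-bit integer in which PR9 flags a negative (underflowed) intermediate exponent and PR8 flags an exponent too large for 8 bits; the function name `classify_pr` is hypothetical.

```python
def classify_pr(pr: int) -> str:
    """Classify a 10-bit intermediate exponent held in PR <9:0> (sketch)."""
    pr &= 0x3FF                 # keep only PR <9:0>
    pr9 = (pr >> 9) & 1         # assumed: set when the exponent went negative
    pr8 = (pr >> 8) & 1         # set when the exponent needs a ninth bit
    if pr9 or (pr & 0x1FF) == 0:
        return "underflow"      # PR9 high, or PR <8:0> all zeros
    if pr8:
        return "overflow"       # exponent too large for 8 bits
    return "in range"           # biased exponent 1 through 255


for value in (0x081, 0x0FF, 0x100, 0x000, 0x3FF):
    print(hex(value), classify_pr(value))
```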
Underflow is not as serious a fault as overflow. An underflow means the result just checked is so close to zero that the FPA cannot accurately represent it. When an underflow is encountered, the FPA sets the ACC ZDATA bit, and special flows attempt to recover the number. If the underflowed result cannot be recovered, the number is set to zero and FPA operation continues. After the POLY operation is completed, the CPU traps on underflow if bit 6 (floating underflow) of the PSL is set.
The FPA checks for accumulation underflow twice per POLY cycle, once as the unnormalized multiplication result is readied for the following addition and once after the result of the addition has been
normalized and rounded. If an underflow is detected in the normalized addition result, no result
recovery is possible. The FPA merely sets the accumulation to zero, informs the CPU of the underflow, and continues the operation.
If an underflow is detected after the multiplication, special flows are accessed to save the result. In an underflow, the exponents of both the accumulation and the coefficient must be scaled up so that the exponent difference can be taken with the 8-bit exponent processor. The scale factor is 80₁₆.
The coefficient is first checked for zero or reserved operands. A reserved operand causes an abort. A zero coefficient will not change the underflow, so the FPA tries to recover by normalizing and rounding. If this fails, the accumulation is cleared (set to zero) and FPA operation continues.


If the new coefficient is not zero or reserved, the operation continues. The FPA adds 80₁₆ to both exponents to scale them up. If the coefficient exponent overflows when it is scaled up, the coefficient is so much larger than the accumulation that the accumulation cannot affect it. The FPA then disregards the accumulation and makes the new coefficient the accumulation: it subtracts the 80₁₆ just added to the coefficient exponent and moves the coefficient into the registers that formerly held the underflowed accumulation.

If the scaled coefficient exponent does not overflow, the coefficient can affect the accumulation, and the exponent difference taker determines the exponent difference. Since the coefficient is the larger number, the coefficient fraction is moved through the ADER MUX to the FALU, and the coefficient exponent is stored in PR after the previously added bias is removed. The accumulation fraction is shifted by the exponent difference until the radix points align and is then added or subtracted. The result is rounded and normalized in the normalize logic. The coefficient exponent (stored in PR) is adjusted based on the fraction normalization and rounding and becomes the accumulation exponent. The rounded result is checked for underflow. If an underflow is detected, the ACCZ bit is set and a zero is stored. The FPA informs the CPU that an underflow has occurred by asserting both FP SYNC and ERR SYNC. In either case, the polynomial operation continues.
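The underflow recovery decision can be sketched in Python. This is an illustrative model only: the function name is hypothetical, exponents are plain Python integers (the underflowed accumulation exponent is represented as a value of zero or less), and 0xFF is taken as the largest 8-bit exponent. The sketch does not model values that remain out of range after scaling.

```python
SCALE = 0x80    # 80 (hex) scale factor used by the underflow flows
EXP_MAX = 0xFF  # largest 8-bit exponent


def underflow_recovery(acc_exp: int, coef_exp: int):
    """Decide how POLY recovers an underflowed accumulation (sketch).

    acc_exp  -- underflowed accumulation exponent (zero or negative here)
    coef_exp -- biased exponent of the new, nonzero coefficient (1..255)
    """
    acc_scaled = acc_exp + SCALE
    coef_scaled = coef_exp + SCALE
    if coef_scaled > EXP_MAX:
        # The scaled coefficient exponent overflows: the accumulation cannot
        # affect the coefficient, which simply becomes the new accumulation.
        return ("coefficient replaces accumulation", None)
    # Otherwise the coefficient is the larger number; the accumulation
    # fraction is shifted right by the exponent difference before the add.
    return ("align accumulation", coef_scaled - acc_scaled)


print(underflow_recovery(-5, 0x90))  # coefficient exponent overflows when scaled
print(underflow_recovery(-5, 0x10))  # accumulation shifted 21 places
```

Because both exponents are scaled by the same 80₁₆, the alignment shift equals the difference of the unscaled exponents; the scaling only keeps both values inside the 8-bit EALU range.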

2.3 BLOCK DIAGRAM AND UNIT DESCRIPTION
This section provides a functional description of each area of the FPA with relation to the control store
and instruction execution. Discussions of logic unit operations are included for areas that require
further clarification.
The FPA can be divided into three areas. The first area contains two interface sections: the CPU-FPA
interface and the FPA internal buses (which interface between the various sections of the data manipulation area). The second area, data manipulation, contains five sections: Fraction Adder/Subtractor,
Fraction Normalizer/Divider, Fraction Multiplier, Exponent Processor, and Sign Processor. Each
section in this area operates as an independent unit, capable of processing data in parallel with operations being performed in other sections. The third area contains only the Control Store and Logic
which controls both interfacing and data manipulation. Refer to Figure 2-9, the FPA Block Diagram.


Figure 2-9  FPA Block Diagram
[Block diagram: FCT (M8289, slot 28): FPA control, sign processor, exponent processor; FNM (M8285, slot 24): fraction normalizer; FMH/FML (M8286/M8287, slots 25/26): fraction multiplier; FAD (M8288, slot 27): fraction adder; control store 512 x 48. Buses shown: BUS FP A <33:00>, BUS FP B <33:00>, CS BUS <95:00>, SYSTEM ID BUS <31:00>, and BUS DFMUX <31:00> (tristate).]

The CPU transmits both data and instructions to the FPA. The instructions are decoded in the Control Store and Logic and access an FPA control store word. The FPA control store word controls the
transfer of the data on the FPA internal buses and the operation of the various data manipulation
sections. The various data manipulation sections perform the required operations. The resulting answer is formatted and sent to the CPU-FPA interface. A signal from the FPA informs the CPU that
the answer is available at the interface.
Each of the eight sections mentioned in this introduction is discussed individually in the following paragraphs. Each discussion includes an explanation of pertinent control store fields and a description of the hardware operation as controlled by the control store, CPU instruction, data characteristics, and both internal and external flags.

2.3.1 CPU-FPA Interface
The CPU and FPA have numerous interconnections. They exchange data, instruction information,
device control signals, and status information over buses and individual signal lines. There are three
types of information transferred via the CPU-FPA interface.
1. CPU-FPA control and status
2. Data
3. Trap and diagnostic information.

They are discussed in this order in the following paragraphs. Refer to Figure 2-10 for a summary of the CPU-FPA interface.

Figure 2-10  CPU-FPA Interface
[Diagram of the CPU-FPA interface signals: ID bus (register #16, maintenance register; register #17, status register), CS bus, op code information, machine clocks, FP SYNC, ACC ERROR, general register address lines, DFMX bus, C, V, Z, and N bits, and execution point counter.]

2.3.1.1 CPU-FPA Status and Control Interface - The FPA and CPU work interactively. They constantly exchange status and control information, and operations in one unit can and do affect operations in the other. The status register (ID register 17) provides some CPU control of the FPA. Bit 15 of the status register is used by the CPU to enable the FPA; by clearing bit 15, the CPU can disable all FPA outputs and effectively remove the FPA from the computing system. Refer to Figure 2-11 and Table 2-7 for a complete description of this register.

Figure 2-11  Status Register
[Diagram of ID register #17: bit 31, ACC ERROR; bit 27, MINUS ZERO ERROR; bit 15, accelerator enable; bits 3:0, ACC TYPE; all other bits zero.]

Table 2-7  The Status Register

Bit No.  Name                        Access                          Function
31       Accelerator Error           Write by FPA; read by CPU       Set when the FPA detects an exception condition.
         (also called ACC Error
         or Error Sync)
30-28    Not used (set to zero)
27       Minus Zero Error            Write by FPA; read by CPU       Set when the FPA encounters a reserved operand or generates
                                                                     an overflow. Setting this bit sets Accelerator Error.
26-16    Not used (set to zero)
15       Accelerator Enable          Write by CPU; read by FPA       When clear, all FPA outputs are disabled, which removes the FPA
                                                                     from the computing system. Must be set for normal FPA outputs.
14-4     Not used (set to zero)
3-0      Accelerator Type            Read by CPU; hardwired in FPA   A hardwired code identifies the type of accelerator installed
                                                                     in the backplane slots. The FPA code is 0001.
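The field layout in Table 2-7 can be expressed as a small decoder. This Python sketch is illustrative only; the function and dictionary key names are hypothetical, and the register value is assumed to be available as a 32-bit integer.

```python
def decode_status(reg: int) -> dict:
    """Split a 32-bit status register (ID register 17) value into its fields."""
    return {
        "accelerator_error": (reg >> 31) & 1,   # bit 31, written by FPA
        "minus_zero_error": (reg >> 27) & 1,    # bit 27, written by FPA
        "accelerator_enable": (reg >> 15) & 1,  # bit 15, written by CPU
        "accelerator_type": reg & 0xF,          # bits 3-0, hardwired
    }


status = decode_status(0x00008001)
print(status)
```

A value of 0x00008001 would describe an enabled FPA (type code 0001) with no error bits set.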


The FPA also receives control and status information from the CS bus. The functions of these lines are
summarized in Table 2-8.
Table 2-8  CS Lines

CS 71  CS 70   Name           Function
               NOP
H      L       ACC TRAP       Initiates an Accelerator trap. Refer to Paragraph 2.3.1.3.
               CPSYNC         Indicates that the CPU has received FPA data or is presenting valid data to the FPA.
H      H       Redefine µSI   Decodes CS lines 57, 56, and 55 for more information.

CS 57  CS 56   Name           Function
               Poly End       Indicates that the last term of the polynomial has been transmitted from the CPU.
H      H       FP TRAP        Initiates an FPA trap (CS 55 is also high). Refer to Paragraph 2.3.1.3.

Op code information (operation and precision) is transmitted to the FPA from the instruction buffer via IRC OPC lines 7 to 0. These lines, from byte 0 of the instruction buffer, are used by the A-Fork/B-Fork logic and BEN logic for FPA control store next-address generation (refer to Figure 2-34). A few other lines from the instruction buffer and decode logic provide specifier source information to the FPA. The possible sources of data are as follows:
1. Memory
2. Register
3. Short literal
4. Long literal.

The CPU-FPA interface includes clock signals from the CPU to the FPA. The units operate synchronously on a 200 ns cycle, and the T0 times of both units coincide.
The FPA transmits two status signals to the CPU: FP SYNC and ACC ERROR. These signals are
input to the CPU for branch control. FP SYNC is normally asserted when an FPA result is available
to the CPU. ACC ERROR is set during an FPA error condition.
2.3.1.2 CPU-FPA Data Interface - The FPA receives operand data from the CPU and, after performing the required operation, returns the answer to the CPU. The data is transmitted to the FPA via the ID bus and is returned to the CPU via the DF mux bus. As mentioned previously, the FPA does not do any memory accessing. The CPU must calculate the data memory address, access the address, and place the data on the ID bus to the FPA.


The FPA is optimized to use CPU scratchpad register data. It stores two copies of the 16 CPU scratchpad registers. To ensure that the FPA copies are exact, they are addressed and written by the same lines that address and write the CPU general registers. The address lines come from the DAP board, and the data is transmitted via the DF mux bus. To ensure that a changing register is never read, the CPU updates the general register and the FPA copies between T100 and T200 (T0), and the FPA reads the copies between T0 and T100. Note that the FPA general register copies are write-only memory to the CPU and read-only memory to the FPA. Results of FPA operations that are destined for the general register set are therefore transmitted back to the CPU via the DF mux bus and written into the general register set under CPU control, rather than written directly into the general register copies by the FPA.
The data stored in the FPA general register copies is read by the FPA using address lines from the instruction buffer operand source logic. This scheme enables the FPA to access register data and begin the operation as soon as the general register address or addresses are in the instruction buffer.
All operands other than register operands are transmitted to the FPA via the ID bus. This includes memory data and long and short literals. When memory data is specified in an instruction, the CPU fetches it and places it in the CPU D-register. The contents of the D-register are placed on the ID bus and, in the FPA, transferred from the ID bus directly onto the FP buses. Since the D-register and ID bus are each only 32 bits wide, it takes two transfers to transmit a double precision number. Single precision (float) literal data, which is part of the instruction stream, is transferred from the instruction buffer onto the ID bus. In the FPA, single precision literal data is latched into the literal register (LR) and then placed on the FP buses. The most significant part of double precision literal data is handled similarly, i.e., IB → ID bus → LR → FP buses. The least significant part of a double precision literal is transferred from the instruction buffer over the ID bus to the CPU D-register, then back on the ID bus and onto the FP buses. Note that no ID bus addresses are required for data transfers over the ID bus; the FPA simply accepts the current ID bus data.
When the FPA operation result is ready to be transmitted to the CPU, FP SYNC is asserted and the
single precision result or the most significant part of a double precision result is on FP bus A. The CPU
responds to FP SYNC by enabling the FPA DF mux bus drivers which place the FP bus A contents on
the DF multiplexer bus. The FPA result is transferred to the CPU D-register via the DF mux bus.
When the CPU has the data, it asserts CP SYNC. This ends a single precision (float) transfer or
enables the second part of a double precision transfer. For a double precision transfer, the second part
is placed on FP bus A and remains there until the CPU responds to the newly asserted FP SYNC by
enabling the DF mux bus drivers, accepting the data, and asserting CP SYNC to indicate it has the
data.
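The result-return handshake described above can be modeled as a sequence of steps. The Python sketch below is a hypothetical model, not FPA microcode or logic: it assumes a double precision result is held as one 64-bit integer whose upper 32 bits are the "most significant part" sent first, and it simply lists the FP SYNC / CP SYNC exchange for each 32-bit transfer.

```python
def result_transfer(result: int, double: bool):
    """Yield the FP SYNC / CP SYNC handshake steps for one FPA result (model)."""
    if double:
        # Assumed layout: most significant part (upper 32 bits) goes first.
        longwords = [(result >> 32) & 0xFFFFFFFF, result & 0xFFFFFFFF]
    else:
        longwords = [result & 0xFFFFFFFF]
    for word in longwords:
        yield f"FPA: place {word:08X} on FP bus A, assert FP SYNC"
        yield f"CPU: enable DF mux drivers, load {word:08X} into D-register"
        yield "CPU: assert CP SYNC"


for step in result_transfer(0x1122334455667788, double=True):
    print(step)
```

A single precision (float) result produces one three-step exchange; a double precision result repeats the exchange for the second part.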
While the FPA is transmitting the result back to the CPU, valid condition codes are also being transmitted to the CPU condition code latches. These latches are read during the next machine cycle. The N, V, and Z bits are set based on the status of the result; the C bit is always cleared by the FPA.
2.3.1.3 Trap and Diagnostic Information - The FPA contains several features to facilitate error diagnosis and troubleshooting. These include programmable traps and microdiagnostics, special maintenance features, and the visibility bus.
The CPU can initiate two types of traps: ACC TRAP and FP TRAP. CS 71 high and CS 70 low initiate an ACC TRAP. This causes the FPA to access one of FPA microcode addresses 0 through 7, as selected by CS lines 57, 56, and 55. Currently only two of these traps are used: the Accelerator Power-Up Trap (address 0) and the Accelerator Abort Trap (address 2). The FP TRAP (used for FPA microdiagnostics) is selected by CS lines 71, 70, 57, 56, and 55 all high. When FP TRAP is asserted, the FPA microcode address is selected by bits 23 through 16 of the maintenance register; that is, the trap address (0 through 255 in the microcode) is the value previously loaded into the maintenance register.
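The trap-address selection just described can be sketched in Python. The function and its arguments are hypothetical; each CS line is passed as 0 (low) or 1 (high), `maint_reg` is the 32-bit maintenance register contents, and treating CS 57 as the most significant bit of the ACC TRAP address is an assumption of this sketch.

```python
def trap_address(cs71: int, cs70: int, cs57: int, cs56: int, cs55: int,
                 maint_reg: int):
    """Return the FPA microcode trap address selected by the CPU, or None.

    Assumption: CS 57 is the most significant bit of the ACC TRAP address.
    """
    if cs71 == 1 and cs70 == 0:
        # ACC TRAP: CS 57, 56, 55 select microcode address 0 through 7
        return (cs57 << 2) | (cs56 << 1) | cs55
    if cs71 == cs70 == cs57 == cs56 == cs55 == 1:
        # FP TRAP: address comes from maintenance register bits <23:16>
        return (maint_reg >> 16) & 0xFF
    return None  # no trap requested by this combination


print(trap_address(1, 0, 0, 0, 0, 0))                # power-up trap: 0
print(trap_address(1, 0, 0, 1, 0, 0))                # abort trap: 2
print(hex(trap_address(1, 1, 1, 1, 1, 0x00A50000)))  # FP TRAP: 0xa5
```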

The maintenance register is a CPU-FPA readable/writable register located on the ID bus. The CPU accesses this register as ID bus register 16. The register is designed to facilitate maintenance. As discussed previously, it contains the FP TRAP diagnostic address; using the trap address, the CPU can exercise various sections of FPA logic. Bit 14 of this register provides a sync pulse that can be used for troubleshooting with an oscilloscope. This bit goes high each time the FPA accesses the microcode address stored in bits 8 through 0. Refer to Figure 2-12 and Table 2-9 for a summary of this register.
Figure 2-12  Maintenance Register
[Diagram of ID register #16: bit 31, WRITE TRAP ADDRESS; bits 30:24, zero; bits 23:16, TRAP ADDRESS; bit 15, WRITE MICRO BREAK; bit 14, MICRO MATCH; bits 13:9, zero; bits 8:0, MICRO BREAK/CURRENT ADDRESS.]

Table 2-9  The Maintenance Register

Bit No.  Name                          Access                          Function
31       Write Trap Address            Write by CPU; read by FPA       When set (by the CPU), enables the CPU to write the trap
                                                                       address (bits <23:16>).
30-24    Not used (set to zero)
23-16    Trap Address                  Write/read by CPU; read by FPA  Selects the FPA microcode address for FPA microdiagnostics.
15       Write Microbreak              Write by CPU; read by FPA       When set (by the CPU), enables the CPU to write the
                                                                       microbreak (bits <8:0>).
14       Micro Match                   Write by FPA; read by CPU       Set by the FPA when the microcode address currently accessed
                                                                       by the FPA equals the address stored in the microbreak
                                                                       (bits <8:0>).
13-9     Not used (set to zero)
8-0      Microbreak/Current Address    CPU writes microbreak;          These bits serve two functions:
                                       FPA reads microbreak.           1. The microbreak selects the FPA microcode address to be
                                       FPA writes current FPA             monitored for micro match (bit 14).
                                       microcode address; CPU reads    2. The current address provides CPU monitoring of FPA
                                       current FPA microcode address.     microcode activity.
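The register layout in Table 2-9 can be sketched as two helpers, one building a value for the CPU to write and one interpreting a value read back. The names are hypothetical, and the sketch assumes a field is written only when its write-enable bit (31 or 15) is set in the same value, so each helper sets an enable bit only for the field being written.

```python
WRITE_TRAP_ADDRESS = 1 << 31  # bit 31
WRITE_MICROBREAK = 1 << 15    # bit 15
MICRO_MATCH = 1 << 14         # bit 14


def maint_write_value(trap_address=None, microbreak=None) -> int:
    """Build a value for the CPU to write to ID register 16 (sketch)."""
    value = 0
    if trap_address is not None:
        value |= WRITE_TRAP_ADDRESS | ((trap_address & 0xFF) << 16)  # <23:16>
    if microbreak is not None:
        value |= WRITE_MICROBREAK | (microbreak & 0x1FF)             # <8:0>
    return value


def maint_read(value: int) -> dict:
    """Interpret a value the CPU reads back from ID register 16."""
    return {
        "trap_address": (value >> 16) & 0xFF,
        "micro_match": bool(value & MICRO_MATCH),
        "current_address": value & 0x1FF,
    }


print(hex(maint_write_value(trap_address=0x12, microbreak=0x123)))
print(maint_read(0x00124123))
```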


Forty-three FPA signals are accessed by the Visibility Bus (V bus). The V bus is a diagnostic tool designed to allow polling of stable internal CPU (in this case, FPA) signals. The console can issue commands that load the V bus latches with the monitored signals and then shift the loaded latches, one bit at a time, to a control word located in the console interface. At the console, the shifted-in data is examined by diagnostic software. There are eight data input channels on the V bus; channel 6 is devoted to the FPA. Refer to Table 2-10 for a listing of the FPA signals available to the V bus.
Table 2-10

Signals Monitored by Visibility Bus
FCTDEALUOL
FCTECOMPLL
FADR SPC (0) H
FNMS EALU CIN L
FCTC SEL NORM H
FCTP RA ADRS 3 L
FCTP RA ADRS 2 H
FCTP RA ADRS 1 L
FCTP RA ADRS 0 L
FCTP RB ADRS 3 L
FCTP RB ADRS 2 L
FCTP RB ADRS 1 L
FCTP RB ADRS 0 L
DAPL ACC CONTEXT 0 H
DAPL ACC CONTEXT 1 H
FCTC CLR RR L
FCTH CP SYNC H
FNME BUS → EXP L
FCTJ ACC N DATA H
FCTC ACC Z DATA H
FCTC ACC V DATA H

FCTE SHF COUNT 5 H
FCTE SHF COUNT 4 H
FCTE SHF COUNT 3 H
FCTE SHF COUNT 2 H
FCTE SHF COUNT 1 H
FCTE SHF COUNT 0 H
FCTN FALU CARRY IN H
FCTN FAMX SEL 0 H
FCTN FAMX EN 0 L
FCTAAGTBJ
FCTN SHF MUX EN 1 L
FCTN SHF MUX EN 0 L
FCTN FALU FUNC SEL 2 H
FCTN FALU FUNC SEL 1 H
FCTN FALU FUNC SEL 0 H
FCTN FAMX SEL 1 H
FCTN LOAD AR1 H
FCTN LOAD AR0 H
FCTN LOAD ARX H
FCTN LOAD BR1 H
FCTN LOAD BR0 H
FADS BUS → FAD L

2.3.2 FPA Internal Buses
As discussed in Paragraph 2.3, the FPA internal buses transmit data between the various data manipulation units. These units are arranged along two parallel 34-bit tristate buses called FP bus A and FP bus B. These buses transmit data from the CPU-FPA interface to the various data manipulation units, transfer intermediate results between units, and return the result to the FPA-CPU interface. Together, the buses can transfer a complete 64-bit double precision word or two 32-bit float words simultaneously.
The BSC field of the microword controls most of the bus activity. The available sources include all FPA data manipulation units and the CPU-FPA interface. Refer to Table 2-11 for a summary of BSC bus control operations. Note that the BSC field controls only the data source; the destination is enabled via other control fields and accepts the data available on the FP buses.


Table 2-11  BSC Control Store Field

Hex   Mnemonic   Function
0     INTH       Bus A ← SALU
1     NL         Bus B* ← Bus A* ← NSHF LO
2     NH         Bus B* ← Bus A* ← NSHF HI, EXP, SGN (packed result)
                 Buses ← SALU and LSH if MUL; TEMP and LSH if DIV (LSH is accessed differently for MUL and DIV)
      INTL       Bus A ← LSH
      ID         Bus B* ← Bus A* ← ID bus
      LR         Bus B* ← Bus A* ← LR
      ID.RB      Bus A ← ID bus; Bus B ← RB
                 Bus A ← RA; Bus B ← RB
      FAL.X      Bus A ← FALU HI/LO; Bus B ← FALU LO/HI
      FAL.LH     Bus A ← FALU LO; Bus B ← FALU HI
      FAL.HL     Bus A ← FALU HI; Bus B ← FALU LO

*The same data is placed on both buses.

The buses handle both floating-point and integer numbers. They carry intermediate, unpacked, and unnormalized data as well as final packed and normalized results. Because the buses must handle intermediate data, each bus contains two extra lines for the overflow and hidden bits. Refer to Figure 2-13 for a summary of the data formats used on the FP buses.


Figure 2-13  FP Bus Formats
[Diagram of FP bus data formats: single precision (float) floating-point format (BR format), double precision floating-point format (AR format), and longword integer (MULL) format. In the floating-point formats, FP bus bits 33 and 32 hold the overflow and hidden bits ahead of the fraction. The MULL result is returned on FP bus A in two cycles: the least significant half from the LSH register in the first cycle and the most significant half from the SALU in the second.]

2.3.3 Fraction Adder (FAD)
The fraction adder aligns and adds or subtracts the fraction portions of two FPNs. The module contains two registers that receive data from the FP buses, two multiplexers that manipulate the register data, a shifter to align register contents before an add or subtract, an ALU to add or subtract the data, and bus drivers to place the result on the FP buses (Figure 2-14). Certain FAD signals are interfaced to the V bus for maintenance and diagnostic purposes. Refer to Paragraph 2.3.1.3 for a discussion of the V bus.
Figure 2-14  Fraction Adder Block Diagram
[Block diagram: AR (63:00, CLK AR) and BR (CLK BR; BR bits 06:00 not loaded) receive data from BUS FP A <33:00> and BUS FP B <33:00>. FAMX (FAMX SEL, FAMX EN) and SHF MUX (SHF MUX SEL, SHF MUX EN) select the register data; the SHF MUX path passes through the shifter (SHF COUNT <5:0>, sign extension) to the FALU. FALU FUNC SEL <2:0> selects the FALU function, and BSC <3:0> with SEL AR FMT (format select) controls the output to the FP buses.]

The fraction parts of the FPNs are loaded into the AR and BR registers. The data entry is controlled
by the FADC (Fraction Processor Controls) control store field as shown in Table 2-12. Both registers
are loaded with the MSB in bit 63. The execution of the POLY instruction causes an additional 7 LSBs
to be transmitted via FP bus A lines <14:08> (where the FPE is normally) and placed in AR <6:0> by
loading ARX.
Table 2-12 Fraction Data Entry

Hex
0
I
2
3
4

5
6

7
8

FADCFields
1

Operation

µCS

ARI

ARO

ARX

BRI

BRO

0
0
0
0
0
0
0
0

0
0
0
0

0
0

I
0
0
0
0
0

I
0
0
0
0

0
0
I

0
I
0
I
I
0
0
0

0
0
0

0
1
0
1
0

I
1
1
0

LOAD

1
0

1
0
I
I
0
0
0
0

I
I

1
I

I
I
0
l
1

Select lines controlled by both microcode and hardware normally load the FPF associated with the smaller exponent into the SHFMX and the other fraction into the FAMX.


The contents of SHFMX are then right-shifted up to 63 bits so that the radix points align. The magnitude of the exponent difference determines the amount of the shift. The shifted number is padded on the left with its sign; in most cases the fraction is positive (Figure 2-15).
Figure 2-15  SHFR Operation
[Diagram: unaligned data from SHFMX passes through three cascaded shifter stages controlled by SHF COUNT (magnitude of shift): SHFB shifts 0, 16, 32, or 48 places; SHFC shifts 0, 4, 8, or 12; SHFR shifts 0, 1, 2, or 3. Sign extension supplies 1s for negative and 0s for positive numbers, and the aligned 64-bit data goes to FALU input B.]
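The three-stage alignment shifter can be modeled in Python as a sign-extending right shift split into the SHFB, SHFC, and SHFR steps of Figure 2-15. This is a behavioral sketch with hypothetical function names; it assumes the fraction is a 64-bit two's-complement value held in an unsigned Python integer, and that SHF COUNT bits <5:4>, <3:2>, and <1:0> drive the three stages.

```python
MASK64 = (1 << 64) - 1


def shift_stage(value: int, count: int) -> int:
    """Right-shift a 64-bit two's-complement value, filling with its sign bit."""
    if count == 0:
        return value
    fill = ((MASK64 << (64 - count)) & MASK64) if value >> 63 else 0
    return (value >> count) | fill


def shfr_align(value: int, shf_count: int) -> int:
    """Align a 64-bit fraction through three cascaded shifter stages."""
    if not 0 <= shf_count <= 63:
        raise ValueError("SHF COUNT is a 6-bit value")
    value = shift_stage(value, shf_count & 0b110000)  # SHFB: 0, 16, 32, or 48
    value = shift_stage(value, shf_count & 0b001100)  # SHFC: 0, 4, 8, or 12
    value = shift_stage(value, shf_count & 0b000011)  # SHFR: 0, 1, 2, or 3
    return value


print(hex(shfr_align(0x4000000000000000, 62)))  # 0x1
print(hex(shfr_align(0x8000000000000000, 4)))   # 0xf800000000000000
```

Because the stage shifts add, any 0 to 63 place shift is covered by one choice per stage, and the result matches an arithmetic right shift of the signed value.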

When the two FPFs are aligned, the FALU operates on the two fractions. The FALU operation is
determined by the op code and the sign of the two numbers. Refer to Table 2-13.
Table 2-13  FALU Operation

Instruction   Sign of Numbers       FALU Operation
Add           Like (both + or -)    Add
Add           Unlike                Subtract
Subtract      Like                  Subtract
Subtract      Unlike                Add

FALU Operations Selected

FALU FUNC SEL
2  1  0    Function   Comment
0  0  0    Clear
0  0  1    B-A        B = 0. Used for complementing a number when a D.P. shift/subtract would lose
                      bits off the end. Used for SUBD when the exponent difference is greater than 7,
                      or for POLYD.
0  1  0    A-B        Normal subtract
0  1  1    A+B        Normal add
1  0  0    Not used
1  0  1    A or B     Used to get A out or B out. The other side is zero.
1  1  0    Not used
1  1  1    Not used
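The first part of Table 2-13 reduces to a comparison of the operand signs. The Python sketch below is illustrative only: the function name is hypothetical, signs are passed as 0 (plus) or 1 (minus), and a subtraction is shown as A-B, the "normal subtract" code; which operand reaches each ALU side is decided by the hardware and is not modeled.

```python
FUNC_SEL = {  # FALU FUNC SEL <2:0> codes from Table 2-13
    "Clear": 0b000,
    "B-A": 0b001,
    "A-B": 0b010,
    "A+B": 0b011,
    "A or B": 0b101,
}


def falu_operation(instruction: str, sign_a: int, sign_b: int) -> str:
    """Pick the effective FALU operation from the op code and operand signs."""
    like_signs = sign_a == sign_b
    if instruction == "add":
        return "A+B" if like_signs else "A-B"
    if instruction == "subtract":
        return "A-B" if like_signs else "A+B"
    raise ValueError(f"unsupported instruction: {instruction}")


op = falu_operation("subtract", 0, 1)   # unlike signs: the FALU adds
print(op, format(FUNC_SEL[op], "03b"))  # A+B 011
```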

The output of the FALU is loaded onto the FP buses under control of hardware and the BSC microcontrol field (refer to Table 2-14). The result is in unnormalized form. When a double precision ALU subtraction is done (as the result of an ADDD, SUBD, or POLY instruction), the exponent difference is examined. If it is less than or equal to 7, operation continues as usual. However, if the difference is 8 or more, error would be introduced into the LSB if a shift followed by a subtract were done. To prevent this error, special control hardware is enabled. It disables the output of SHFMX, forcing zeros into the shifter. The smaller operand is routed through FAMX to the A side of the ALU, and a B-A operation (B = all zeros) is done, complementing the operand. The larger operand remains stored in its original register. The result of the ALU operation is output to the FP buses and reloaded into the AR or BR, depending on where it was before complementing. During the next machine state, the complemented operand is aligned, sign-extended, and added to the other operand. The result is loaded onto the FP buses and is normalized.

Table 2-14  FALU MUX Control

BSC Field (µCS)
3  2  1  0    Hex    FALU Function
              0-B    Not used for FALU MUX control
1  1  0  0    C      Hardware determined
                     FP A ← FALU L, FP B ← FALU H (BR format)
                     FP A ← FALU H, FP B ← FALU L (AR format)

NOTE
During double precision add/subtract and POLY:
If EXP A < EXP B, AR format is used.
If EXP B < EXP A, BR format is used.

2.3.4 Fraction Normalize/Divide (FNM)
The normalize/divide logic located on the FNM module performs the two functions indicated by its title (refer to Figure 2-16). The hardware can either normalize the fraction result of an add, subtract, multiply, or divide, or generate the quotient given a divisor and dividend. The quotient is generated bit by bit and stored elsewhere. When the quotient is complete, it is returned to the same hardware to be normalized like any other fraction result. Both functions receive data under microcontrol, but once started, they operate relatively free of microcode control until they are ready to transmit the answer.


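Figure 2-16  Fraction Normalizer/Divide Block Diagram
[Block diagram: NR (shift data input, SHF VAL control), RR, the round bit generator (RND BIT GEN), and the 60-bit NALU, with the quotient bit stream output; data enters and leaves on BUS FP A <33:00> and BUS FP B <33:00>.]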

2.3.4.1 Normalize Operation - Before a normalize operation can take place, the Remainder Register (RR) must be cleared. A 3 in the 3-bit MSC field of the microstore word clears it during IRD. Since the divide operations use the RR, it is also cleared at the end of the divide flows, before the quotient is normalized.
The add, subtract, multiply, and divide operations produce results with varying characteristics. The
add/subtract operation has the widest variability in result. Operand size (both fraction and exponent),
operand sign, and desired operation, all contribute to this variation. The subtraction of two very
nearly equal operands can result in a very small number, i.e., a number that must be shifted left many
times before it is in final normalized form. Addition of two operands with equal exponents will produce a result between 1 and 2, necessitating a right-shift. Since the add/subtract operations do produce
a wide variability of results, special firmware in the control store is accessed and the normalizations
proceed under firmware and hardware control.
A divide operation produces results between 1/2 and 2. A multiply produces results between 1/4 and 1.
Both divide and multiply normalizations proceed under hardware-only control.
All normalizations begin with NRC equal to 0, which parallel-loads the result to be normalized into the NR. If the operation was an add/subtract, BEN 5 selects special firmware based on the exponent difference. If the special firmware is enabled, an NRC equal to 2 enables the NR to shift left in 4-bit steps, three steps per machine cycle.
Once the NR left shift is enabled, hardware examines the top 12 bits of the NR for the first significant bit as the leading bits are shifted out to the left. In a positive number, leading zeros are disregarded and the first significant bit is a 1. In a negative number (2's complement notation), leading 1s are disregarded and the first significant bit is a 0 (refer to Figure 2-17). MSN NE SIGN becomes true as the data is parallel-loaded into NR if the first significant bit is in NR <63:60>; this stops any left shifts. STOP SHF goes high whenever NR <59:56> contains the first significant bit and causes the NR to stop shifting after one more 4-bit shift (i.e., when the first significant bit is in NR <63:60>). If NR <63:52> does not contain the first significant bit, SWR remains low, shifting all 12 bits out and enabling a new microstore control word via BEN 2, and the hardware continues monitoring for the first significant bit. If the NR has been left-shifted 60 bits (counted by the control store) and the first significant bit has not been found, the firmware returns a result of zero by forcing the output of the NMX to zero via FORCE ZERO.

Figure 2-17  Normalize Shift Enable Control Hardware
[Diagram: NR <63:52> and RES NEG feed the SWR, MSN NE SIGN, and STOP SHF logic. If the number is negative, leading 1s are disregarded; if positive, leading 0s are disregarded.]
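The net effect of the normalize shift search can be sketched in Python. This is a behavioral model with a hypothetical function name: it shifts one 4-bit step at a time and counts steps, whereas the hardware performs up to three steps per cycle with STOP SHF look-ahead and the control store keeps the count. NR is assumed to be a 64-bit two's-complement value in an unsigned Python integer.

```python
MASK64 = (1 << 64) - 1


def nibble_normalize(nr: int):
    """Shift NR left 4 bits at a time until its FSB is in NR <63:60> (model).

    Returns (shifted NR, number of 4-bit shifts), or (None, shifts) when no
    first significant bit is found within 60 bits (result forced to zero).
    """
    negative = bool(nr >> 63)  # RES NEG: the number is in 2's complement
    shifts = 0
    while True:
        top = nr >> 60  # NR <63:60>
        # Positive: the FSB is the first 1. Negative: the FSB is the first 0.
        found = (top != 0xF) if negative else (top != 0)
        if found:
            return nr, shifts
        if shifts * 4 >= 60:
            return None, shifts  # firmware forces zero via FORCE ZERO
        nr = (nr << 4) & MASK64
        shifts += 1


nr, shifts = nibble_normalize(0x0123000000000000)
print(hex(nr), shifts)  # 0x1230000000000000 1
```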

When the first significant bit is in NR <63:60>, the number can be rounded and normalized by the remaining FNM logic.
The round byte contents, the NALU operation, and the final normalization shift are controlled by the round bit generator (RBG), based on NR 63, NR 62, NR 61, and RES NEG. The round byte is combined with NR lines 39 through 36 (float, or single precision) or lines 7 through 4 (double precision), as selected by the FLOAT line. Since the final normalization shift takes place after the round byte is added, and the first significant bit (FSB) can be in NR 63, NR 62, NR 61, or NR 60 (it must be in one of these four positions), the position of the round bit (1) in the round byte varies (refer to Table 2-15). As summarized in the table, decode logic divides the 16 possible input cases into four cases, corresponding to the FSB in bit 63, 62, 61, or 60. Note that the RBG does not monitor NR bit 63, but since the logic is enabled only when the FSB is in bits 63 through 60, the RBG logic can sense the contents of NR bit 63 even though it does not monitor it. RES NEG L asserted means that the number being shifted and normalized is negative: leading 1s (Hs) are disregarded in the search for the FSB, and the FSB is a 0 (L). RES NEG L high indicates a positive number: leading 0s (Ls) are disregarded, and the FSB is a 1 (H). The contents of the rounding byte depend on the location of the FSB; the rounding byte is designed to place a one 24 bits (56 bits for double precision) behind the FSB.
Table 2-15 Round Byte and Normalize Control

1. The logic decodes the four signals and locates the FSB.

RES NEG L*   NR 63   NR 62   NR 61   First Significant Bit (FSB)
L            L       L       L       63
L            L       L       H       63
L            L       H       L       63
L            L       H       H       63
L            H       L       L       62
L            H       L       H       62
L            H       H       L       61
L            H       H       H       60
H            L       L       L       60
H            L       L       H       61
H            L       H       L       62
H            L       H       H       62
H            H       L       L       63
H            H       L       H       63
H            H       H       L       63
H            H       H       H       63

*RES NEG L high indicates a positive number. This means a 1 (H) is the FSB. RES NEG L low indicates a
negative number. This means a 0 (L) is the FSB. RES NEG L asserted also causes a NALU subtract, thereby
rounding and complementing the number in a single step.

2. Based on the location of the FSB, an appropriate rounding byte is generated.

FSB   Rounding Byte Selected
      Bit 3   Bit 2   Bit 1   Bit 0
63    1       0       0       0
62    0       1       0       0
61    0       0       1       0
60    0       0       0       1

3. Also based on the location of the FSB, the final shift required to normalize the result and ready it
for the CPU is selected.

FSB   Shift Selected    SHF VAL 1   SHF VAL 0
63    Right 1 place     L           L
62    No shift          L           H
61    Left 1 place      H           L
60    Left 2 places     H           H
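The decode summarized in Table 2-15 can be sketched behaviorally in Python. This is not the RBG gate logic: the function name `decode_fsb` is hypothetical, L and H levels are written as 0 and 1, and the argument `negative` stands for RES NEG L being low. As in the hardware, NR 60 is never examined; an FSB of 60 is inferred when bits 63 through 61 do not contain it.

```python
def decode_fsb(negative, nr63, nr62, nr61):
    """Return (FSB position, rounding byte, SHF VAL <1:0>) per Table 2-15.

    negative         -- True when RES NEG L is low (2's complement number).
    nr63, nr62, nr61 -- NR bits as 0 (L) or 1 (H); NR 60 is not examined.
    """
    # A positive number's FSB is the first 1; a negative number's is the first 0.
    significant = 0 if negative else 1
    for position, bit in ((63, nr63), (62, nr62), (61, nr61)):
        if bit == significant:
            fsb = position
            break
    else:
        fsb = 60  # inferred: the logic is only enabled when the FSB is in <63:60>

    round_byte = 1 << (fsb - 60)  # FSB 63 -> bit 3 set, ..., FSB 60 -> bit 0 set
    shf_val = 63 - fsb            # 63 -> LL (right 1), 62 -> LH (no shift),
                                  # 61 -> HL (left 1),  60 -> HH (left 2)
    return fsb, round_byte, shf_val
```

For example, `decode_fsb(False, 0, 1, 1)` returns `(62, 4, 1)`: FSB 62, rounding byte 0100, and SHF VAL 01 (no shift), matching the row H L H H of the table.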

If the FSB is not in NR <63:60>, the NR is left-shifted and a binary counter counts each 4-bit shift.
This count, the RES NEG line, and NR bits 63, 62, and 61 (magnitude of the final shift) determine the
NORM ROM location to be addressed. The content of this location is added to the exponent of the
result in the FALU and corrects it for all shifts that take place in the FNM. If, however, the number to
be rounded is all 1s, the addition of the rounding byte will ripple through all bits and cause a fraction
overflow. This is sensed by comparing the round byte location (indicating where the logic decoded the
current MSB of the number to be rounded) with the location of the MSB of the rounded result. If this
comparison asserts NORM ERR, and thus EALU CIN (indicating there was a ripple and subsequent
overflow), a one is added in the EALU (the exponent adder on FCT) to correct the exponent for
the overflow. NR <63:04> goes to the NALU B side and the round byte (4-bit) goes to the A side.
Normally the NR is added to the rounding byte. However, if RES NEG L is asserted, indicating a
negative (2's complement) number, the content of the NR is subtracted from the rounding byte. This
operation rounds and complements (returns to positive notation) in one step.
The 60-bit result <63:04> of the NALU operation (rounded and ready to be normalized) is transmitted to the NMX. The high part (the only part, if float or single precision) is transmitted through to
the NSHF for the final normalization shift. The NSHF shift control bits select a 0- to 3-bit shift for final
normalization.
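Why one subtraction both rounds and complements can be checked with modular arithmetic: for a negative NR, round byte - NR = |NR| + round byte. A minimal sketch, assuming a 60-bit NALU (matching NR <63:04>) and a round byte that is already aligned to the position where it is added; `round_and_fix_sign` and the constants are hypothetical names.

```python
NALU_BITS = 60                    # NR <63:04> is 60 bits wide
NALU_MASK = (1 << NALU_BITS) - 1

def round_and_fix_sign(nr, round_byte, res_neg):
    """Model one NALU operation on the NR, a 60-bit 2's complement pattern.

    res_neg -- True when RES NEG L is asserted (the NR holds a negative number).
    """
    if res_neg:
        # round byte - NR == |NR| + round byte: complement and round in one subtract
        return (round_byte - nr) & NALU_MASK
    return (nr + round_byte) & NALU_MASK
```

`round_and_fix_sign((-1000) & NALU_MASK, 8, True)` and `round_and_fix_sign(1000, 8, False)` both return 1008. An all-ones NR plus a round bit wraps to zero in this model; in the hardware that lost carry is the fraction overflow detected through NORM ERR.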
Final normalization moves the MSB to the equivalent of the NR 62 position. When the data is placed
on the FP buses, NR 62 (always a one since the fraction is now normalized) is the hidden bit and is
placed on the FP bus A bit 32. When the data is transferred to the CPU, the hidden bit is not transferred and the data in NR 61 (bus A bit 6) is the MSB to be transferred.

2.3.4.2 Divide Operation - This logic also performs the fraction part of the divide operation for the
FPA. Once the dividend and divisor are loaded into the FNM logic and the quotient storage on the
multiplier boards is enabled for either a float (single) or double precision result, the divide operation
runs under hardware control until the answer has been computed to the required precision. Once the
answer has been computed, microcontrol takes over and transmits the unnormalized quotient back to
the FNM logic, where it is normalized and rounded like any other fraction.

Figure 2-18 Divide Sequence Hardware
[Signals shown: NEXT A, NEXT B, INIT, RES POS H, CLK (100 ns), RR CTL 0. Refer to Table 2-16, Divide Sequence States.]

Figure 2-19 Divide Sequence Timing
[Timing shown: CPU and FPA clock (200 ns); divide sequence clock (100 ns); flip-flop outputs; divide microword = LORR; RR ← NALU; RR right shift.]

Table 2-16 Divide Sequence States

FNM
Function

RRCTL
1
0

RR
Function

Input

0
0
0

0
0

LORR
LORR

0
0
I

0
I
I

NOP
NOP
LONALU
TORR

L
L
H

NOP

L
H
H

H
Ht
Lt

Shift R*
Parallel LD Result**
Shift L RR Contents
Refer to
PREVIOUS STATE

I
I

I
DIV DONE I

0
0

Shift R*
Divide

DIVDONa 0

Divide

Parallel LD**

-.-

*Used only once at the beginning of each divide.

t Control bit 0 is controlled by RES POS H.
**Since the RR is hardwired for a left shift, a parallel load shifts the data one place left.

The answer is generated at the rate of one bit per 100 ns. If the result of the NALU subtract is positive
or zero, a 1 is left-shifted into the quotient register. A negative NALU result causes a 0 to be shifted
into the quotient register. The quotient register is made of two multiplier registers (TEMP and LSH).
In single (float) precision the quotient bit stream is shifted into TEMP (using only TEMP <29:4>).
In double precision the bit stream shifts into LSH <31:4> and then into TEMP <29:00>. When a 1 is left-shifted into TEMP 29 or 28 on the proper time phase in the multiplier logic, DIV DONE is asserted.
This stops the division and accesses a new microstore word that normalizes and rounds the quotient.
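The bit-per-cycle loop described above is a shift-and-subtract (non-performing restoring) division: on a negative trial result the old partial remainder is simply shifted left instead of being restored. A minimal Python sketch, assuming the operands are integers holding aligned fractions with 0 <= dividend < 2 × divisor (true for normalized fractions) and replacing DIV DONE detection with a fixed bit count `n_bits`; `divide_fraction` is a hypothetical name.

```python
def divide_fraction(dividend, divisor, n_bits):
    """Return the first n_bits quotient bits of dividend / divisor.

    The first bit developed has weight 1, so the result equals
    floor(dividend * 2**(n_bits - 1) / divisor).
    """
    if not 0 <= dividend < 2 * divisor:
        raise ValueError("operands must satisfy 0 <= dividend < 2 * divisor")
    remainder = dividend  # partial remainder (held in the RR in the FPA)
    quotient = 0          # stands in for the TEMP/LSH quotient register
    for _ in range(n_bits):
        trial = remainder - divisor          # NALU subtract
        if trial >= 0:
            quotient = (quotient << 1) | 1   # positive or zero: shift in a 1
            remainder = trial << 1           # load the result, one place left
        else:
            quotient <<= 1                   # negative: shift in a 0
            remainder <<= 1                  # keep the old remainder, shift left
    return quotient
```

`divide_fraction(3, 4, 8)` returns 96 (binary 01100000; reading the first bit as the units bit gives 0.11 binary, or 3/4).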
2.3.5 Fraction Multiplier (FML and FMH)
The fraction multiplier hardware in the FPA is located on two modules, FMH (Fraction Multiplier
High) and FML (Fraction Multiplier Low). These modules handle all fraction multiply functions and part of the
EMOD function, and also store the division quotient as it is generated. The FM accepts data from the FP
buses, performs the required unsigned multiplication, and gates the results back onto the FP buses. Refer
to Figure 2-20.

Figure 2-20 Fraction Multiplier Block Diagram
[Blocks shown: MC1, MC0, MCINT, MCAND BUS (32 bits), MP 7:4, ROM BANK A, ROM BANK B, ROM STORAGE, PALU, PPROD LATCH, AALU, ACCM, SALU, CARRY HOLD registers, MPLIER BUS, MP1, MP0, LSH, TEMP.]

The FPA microcontrol controls the loading of both the multiplicand and the multiplier into the appropriate FM (fraction multiplier) registers. In both float and double the complete multiplier is stored on the
FMH. During a single-precision (float) operation, the FMH handles the upper 16 bits of the multiplicand and the FML the lower 8 bits, and the answer is complete after one pass through the logic. For
double precision (56 bits), the upper half of the multiplicand fraction is handled in the FMH and the lower
half in the FML. Two passes are required to compute the final answer.
The FM multiplies under its own control logic. After the operands are loaded, the MCTL field in the
FPA microcontrol is asserted; this starts the multiplication. A float multiply is stopped by the microcode two states (400 ns) after it starts. For a double multiply, control goes to a wait state and remains
there until MUL/DIV DONE is enabled, indicating that the FM logic has finished the
operation. At this point microstore control takes over and the answer is transmitted to the normalize
logic or, in the case of EMOD or MULL, transmitted to the CPU as an unnormalized number.
In order to obtain fast multiplication, a pipeline technique is used (Figure 2-21). The multiplier is
divided into 4-bit nibbles. The nibbles are then accessed consecutively by a counter-multiplexer combination (least significant nibble first), and each nibble operates on up to 32 bits of multiplicand. The
MCAND bus and MPLIER nibbles are used to address the ROMs. The banks of ROMs provide a 4 × 4
primitive with 2-way interleaving. The data is latched (ROM STORE) and applied to the inputs of 4-bit
adders (PALU). These adders combine the ROM data to form a partial product, storing the carry-out of each 4-bit section to be added in on the next cycle. The partial product is latched in PPROD
and passed to another row of adders (AALU), which accumulate the final product, again saving the
carries. Thus, when the pipeline is operating, there are four processes cycling at the same time:

1. Select ROM addresses
2. Latch ROM data
3. Form partial product
4. Accumulate final product.

After the final product is calculated, the stored carries from both stages are combined with the accumulated product using full carry look-ahead to produce the final answer in a single-precision (float)
operation. In double precision, this result is stored and used in generating the final answer
during the second pass.
Each of the pipeline processes, with the exception of accessing ROM data (which occurs in each bank
of ROMs every 100 ns), occurs at 50 ns intervals.
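The overlap of the four processes can be tabulated with a short script. The stage offsets used here (ROM address selected at time T, ROM data latched 100 ns later, partial product formed 50 ns after that, and accumulation 50 ns after that) are this sketch's reading of the pipeline timing in Figure 2-21; `schedule` is a hypothetical helper.

```python
STEP_NS = 50  # a new multiplier nibble enters the pipe every 50 ns

# (stage, offset in ns from the time the nibble's ROM address is selected)
STAGES = (
    ("select ROM address", 0),
    ("latch ROM data", 100),      # ROM access takes 100 ns in each bank
    ("form partial product", 150),
    ("accumulate", 200),
)

def schedule(n_nibbles):
    """Map each time (ns) to the (stage, nibble index) pairs active then."""
    events = {}
    for k in range(n_nibbles):
        start = k * STEP_NS
        for stage, offset in STAGES:
            events.setdefault(start + offset, []).append((stage, k))
    return dict(sorted(events.items()))
```

For six multiplier nibbles (a MULF), `schedule(6)[200]` shows all four processes busy at once: nibble 4 being addressed, nibble 2's ROM data being latched, nibble 1's partial product being formed, and nibble 0 being accumulated.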
The operation of the FM hardware is discussed in three sections. The first section explains the operation of the pipeline, concentrating on operand loading and the manipulation of partial products, partial
results, and carries to produce the final answer. The second section concentrates on the control logic
and how the signals that control the pipeline are generated. The third and shortest section explains
how the FM registers are used to accumulate the quotient during a divide operation.
2.3.5.1 The Pipeline

Loading the Operands

The multiplication process begins with the loading of the operands. As discussed in Paragraphs 2.1 and
2.3.2, data is transferred along the FPA buses in several formats. The multiplicand loading logic sorts
out these formats and loads the multiplicand registers (MC0, MC1, and MCI) so that when the
MCAND bus does a parallel access of the MCAND, the MSB of the multiplicand is always in
MCAND bus bit 31, and each following bit is progressively less significant (Figures 2-22 and 2-23).

Figure 2-21 The Pipeline

Time   1. Select ROM Address*      2. Latch ROM Data   3. Form Partial Product   4. Accumulate in ACCM
T0     Bank A MP <7:4> (Z), 1st    NOP                 NOP                       NOP
T50    Bank B MP <3:0> (Y), 1st    NOP                 NOP                       ACCM = 0
T100   Bank A MP <7:4> (X), 2nd    Z × MCAND lookup    NOP                       ACCM = 0
T150   Bank B MP <3:0> (W), 2nd    Y × MCAND lookup    Z × MCAND (ZCAND)         ACCM = 0
T200   Bank A MP <7:4> (V), 3rd    X × MCAND lookup    Y × MCAND (YCAND)         ACCM (0) + ZCAND = ACCM
T250   Bank B MP <3:0> (U), 3rd    W × MCAND lookup    X × MCAND (XCAND)         ACCM + YCAND = new ACCM
...    ...                         V × MCAND lookup    W × MCAND (WCAND)         ACCM + XCAND = new ACCM
TEND   SALU operation: the final result equals ACCM plus the stored carries.

*Multiplier (MP) and multiplicand (MCAND) addressing. Both multiplier and multiplicand are divided into 4-bit
nibbles. The multiplier nibbles are accessed individually (least significant nibble first) and are used
with all multiplicand nibbles to generate ROM addresses.

Figure 2-22 Loading and Accessing the Multiplicand
[MCAND BUS <31:0> is supplied to ROM banks A and B in byte fields 31:24, 23:16, 15:8, and 7:0 from registers MC1, MC0, MCI, and an 8-bit register*, according to access code. Access codes: 1 - first half of EMODD or MULD; 2 - second half of EMODD or MULD; F - EMODF or MULF; I - MULL (integer multiply). *This 8-bit register is also called EMOD extension and MCX.]

Figure 2-23
[Multiplier loading and access: the multiplier is loaded from the FP buses (FP A and FP B) into MP1 (24 bits) and MP0 (32 bits); the MPLIER BUS (<63:08> and <7:0>) supplies MP 7:4 and MP 3:0 to ROM banks A and B under control of the nibble counters (SA and SB).]

The multiplier, up to 56 bits (14 nibbles) long, is loaded into MP1 and MP0 on FMH. MP1 is 24 bits (6
nibbles) long and MP0 is 32 bits (8 nibbles) long. Unlike the multiplicand, the multiplier is loaded in
one format only (Figure 2-23). The MSB is in MP1-23 and each following bit is progressively less
significant. The LSB is MP1-00 for single precision (float) or MP0-00 for double precision. A single
format is possible because, as stated before, the multiplier is used consecutively; the various formats
are sorted out by the counter as the nibbles are used during the multiplication.
Selecting the Multiplicand
The operands, multiplicand and multiplier, are enabled onto their respective buses, MCAND BUS
and MPLIER BUS, under control of the operand bus source logic. Refer to Figures 2-22 and 2-23 and
Table 2-17. All 32 lines of the MCAND bus are enabled every time. During a MULF or EMOD, and
during the first pass of a MULD or EMODD, the MCAND bus accesses MCX. Both MULF and
MULD (first pass) use only the top 24 bits, as the lower 8 are discarded later in the pipeline.
The MPLIER BUS multiplexer begins by selecting the least significant byte of the multiplier. Interleaving hardware later selects the high or low nibble of the bus. The multiplexer then selects a new, progressively more significant byte every 100 ns.
Selecting ROM Address - The Interleave Hardware
Both the MCAND and MPLIER buses are divided into 4-bit nibbles for ROM addressing. Each
MCAND nibble (8 nibbles) is combined with an MPLIER nibble to provide address bits for sixteen 4 × 4
look-up ROMs. Rather than compute the product of the two 4-bit nibbles, the fraction multiply
hardware uses look-up ROMs in which the multiply results are stored. The data is stored within
the ROMs such that the content of the address accessed by the two nibbles is the 8-bit result of a
multiply with the same two nibbles. Since the ROMs are relatively slow, the 16 ROMs are divided into
two interleaved banks of 8 ROMs. One bank is accessed by the low MPLIER nibble (MP 3:00), the other by
the high MPLIER nibble (MP 7:4). Both ROM banks are addressed on 100 ns cycles; the MP low bank is
first, and the MP high bank is second, trailing by 50 ns. The addressing of a ROM bank ends the first part of
the pipe.
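The look-up idea can be illustrated in a few lines. This sketch models the ROM contents and the partial product for one multiplier nibble; `PRODUCT_ROM` and `rom_partial_product` are hypothetical names, and the summing of the overlapping 8-bit ROM outputs, which the hardware performs in the PALU with stored carries, is done here with ordinary integer addition.

```python
# Model of the 4 x 4 look-up ROM contents: the word at address
# (multiplicand nibble a, multiplier nibble b) holds the 8-bit product a * b.
PRODUCT_ROM = [[a * b for b in range(16)] for a in range(16)]

def rom_partial_product(mcand, mp_nibble):
    """Multiply a 32-bit MCAND bus value by one 4-bit multiplier nibble using
    only table look-ups and shifted additions."""
    total = 0
    for i in range(8):                           # 8 multiplicand nibbles
        mc_nibble = (mcand >> (4 * i)) & 0xF
        # Each 8-bit ROM output is aligned by its multiplicand nibble position.
        total += PRODUCT_ROM[mc_nibble][mp_nibble] << (4 * i)
    return total
```

Every entry fits in 8 bits (the largest is 15 × 15 = 225), and adjacent ROM outputs overlap by 4 bits, which is why an adder stage is needed after the ROM STRG latch.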
Latch the ROM Data
The second part of the pipe selects the outputs from either of the ROM banks, using the ROM SEL
MUX, and latches the data (64 bits) in ROM STRG. It alternately selects data from the low and high
ROM banks on a 50 ns cycle.
While the ROM data selected is being latched, the first part of the pipe is selecting a new address for
the ROM bank just selected. The output of the other ROM bank will be selected during the next cycle
(50 ns in the future). The address lines of this ROM bank were changed 50 ns ago and the outputs are
settling.
Form Partial Product
The outputs of ROM STRG and any carries from the previous PALU add are added to form the
partial product. The PALU is eight 4-bit adders. The outputs of the ROM STRG are wired to the
PALU adder inputs such that bits of equal significance are combined. The outputs of the PALU,
without carries, are stored in the PPROD LATCH. The carries are stored in CARRY-HOLD registers
to be added in on the next PALU add. The latching of the partial products in the PPROD LATCH
ends the third part of the pipeline.
As indicated previously, each multiply cycle selects 4 new bits from the multiplier register, and each 4
new bits are 4 positions more significant. This means that the input of the PALU add becomes 4 bits
more significant each multiply cycle. Because of this increase in significance, the stored carry-out of
each PALU adder is input, on the next cycle, to the carry-in of the same PALU adder rather than the
carry-in of the next PALU adder.


Table 2-17 Operand Bus Source

Operation

Input Signals
DOUBLE

ITH

OPC7

MCAND Bus Load Enable*
MCI
MCIL
MCO

EMODF or MULF

MULL (INTEGER MUL)

1st Pass

2nd Pass

MCINT

MCX

MPLIER BUS
Nibble Select

Start at A, do
6 nibbles

Start at 6, do 4,
then start at 2, do 4.

Start at 2, do 14

EMODD or MULD

MCAND Bus lines fed
*MCAND Bus lines are low enabled.

L
L

31-8

7-0

Start at 2, do 14

31-8

7-0

Note that while the third part of the pipeline is operating, new ROM data is being placed in ROM
STRG to be presented to the PALU inputs on the next cycle, and new ROM addresses are being
generated to access new data.
Accumulate Result
The fourth and final section, the AALU and its associated accumulator (ACCM), adds the partial products computed in the previous pipeline section to the result stored in the ACCM, including the stored
carries from the previous AALU cycle, and latches the result into the ACCM and the LSH register.

The AALU, ACCM, and ALU carry-hold interconnections automatically shift the ACCM content
and the ALU carry-hold content to adjust for the 4-bit increase in significance of each new partial product. Because each
partial product input to the AALU is 4 bits more significant than the previously stored ACCM content, the outputs of the ACCM are wired to shift the ACCM content 4 bits right (a decrease in
significance) before it is added to the PPROD LATCH content. The lower 4 bits of the AALU
output are always right-shifted into the LSH register. In double-precision operations, the content of
this register is the least significant half of the result.
As with the PALU carries, the carry-out of each AALU adder is stored and added in on the next cycle. Also
similar to the PALU logic, the stored carries are added to the AALU adder that generated them,
because the content of the AALU is now 4 bits more significant than when the stored carries were
generated.
The latching of the accumulating final result in the ACCM ends the fourth pipeline section.
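The ACCM wiring (shift the old content 4 bits right, add the new partial product, send the low nibble to LSH) can be checked against ordinary multiplication. In this sketch the partial product is computed directly rather than from ROMs, and all carries are propagated every cycle instead of being held for the SALU; `accumulate_product` is a hypothetical name.

```python
def accumulate_product(mcand, mplier, n_nibbles):
    """Multiply by consuming the multiplier 4 bits per cycle, least significant first."""
    accm = 0  # accumulator, cleared before the first partial product
    lsh = 0   # low-order result nibbles, in the order they leave the AALU
    for k in range(n_nibbles):
        mp_nibble = (mplier >> (4 * k)) & 0xF
        pprod = mcand * mp_nibble          # partial product for this nibble
        aalu = (accm >> 4) + pprod         # ACCM output wired 4 bits right
        lsh |= (aalu & 0xF) << (4 * k)     # lower 4 bits of the AALU output
        accm = aalu
    # High part: what remains in the ACCM above the nibble already sent to LSH.
    return ((accm >> 4) << (4 * n_nibbles)) | lsh
```

With `n_nibbles=6` the loop corresponds to the six cycles of a MULF; with 14 it consumes a full 56-bit multiplier. In both cases the result equals `mcand * mplier`.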
The 4 sections of the pipeline continue to operate until stopped by the FM control logic. The stopping
point is selected based on both function and precision.
SALU Operation
When a stop is initiated, the whole pipeline stops and new logic, the SALU, is accessed; it adds the
two sets of stored carries still in the pipeline to the total product at the output of the AALU. When a
pipeline stop is initiated, the AALU output (SALU input) is the contents of ACCM plus the current
PPROD. Both the ACCM-plus-PPROD addition (the AALU operation) and the PPROD-forming
addition (the PALU operation) produce stored carries.
The hard-wired 2-bit shift in the PPROD LATCH input is not part of the several 4-bit shifts that take
place throughout the FM logic; rather, it formats the stored carries so they may be easily combined for
a final answer in the SALU. Both the PALU and AALU are composed of 4-bit adders with carry-outs.
This means that the carry-outs are generated every 4 bits and that the PALU and AALU stored carry-outs can be treated as numbers of the following format:

X000X000X

where X is a stored carry (data bit) and 0 is a zero (non-significant bit).

Conventional wiring (output of a 4-bit PALU adder to the input of a 4-bit PPROD LATCH to a 4-bit
AALU adder) would cause the data bits of the PALU stored carry to line up (be of equal significance)
with those of the AALU stored carry. This would prevent the PALU stored carries, the AALU stored carries, and
the ACCM result from being combined in one operation in one adder (the SALU). However, wiring
the PPROD LATCH inputs and outputs with a 2-place shift generates a PALU stored-carry number
whose data bits fall in significance between the AALU stored-carry data bits. This shift allows both the AALU
and PALU stored-carry numbers to be input to one side of the SALU, since a data bit of the PALU
stored carry is always a non-significant bit of the AALU stored carry and vice versa. Refer to Figure
2-24.
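The bit-position argument can be demonstrated numerically. `split_add` (a hypothetical helper) adds two values with independent 4-bit adders and keeps each carry-out at the weight it represents instead of rippling it, which produces the X000X000X pattern; offsetting one such carry word by 2 places puts its bits exactly in the other word's empty positions. The sketch shows only why the two words can share one SALU input; it does not model how the hardware accounts for the 2-place offset in significance.

```python
def split_add(a, b, n_adders=8):
    """Add a and b with n_adders separate 4-bit adders.

    Returns (sums, carries): the 4-bit sums with no carry propagation, and a
    carry word whose possible bits sit at positions 4, 8, 12, ... (one per adder).
    """
    sums = 0
    carries = 0
    for i in range(n_adders):
        t = ((a >> (4 * i)) & 0xF) + ((b >> (4 * i)) & 0xF)
        sums |= (t & 0xF) << (4 * i)
        carries |= (t >> 4) << (4 * i + 4)
    return sums, carries

# Worst case: every 4-bit adder produces a carry in both carry words.
aalu_carries = split_add(0xFFFFFFFF, 0xFFFFFFFF)[1]       # X000X000... pattern
palu_carries = split_add(0xFFFFFFFF, 0xFFFFFFFF)[1] >> 2  # same pattern, 2 places over
shared_bits = aalu_carries & palu_carries                  # no position is shared
```

Because `shared_bits` is 0, OR-ing the two carry words gives the same value as adding them, so both can be presented on one adder input. For any operands, `sums + carries` equals `a + b`.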

Figure 2-24 SALU Operation - Adding the Stored Carries
[SALU with a 32-bit output: one input is the AALU output (32 bits); the other carries the PALU carry-hold bits (8 bits) and the carries from the AALU (8 bits), with zeros in the remaining positions.]

The use of the SALU result is determined by the operation and its precision. If the SALU result
is the final answer, the result is transferred to the FP buses under both op code control and FPA
microcontrol. If, however, the operation is double precision, the result is stored and then shifted,
under FM logic control, to format it for later operations. Before the shift, the most significant half of the
result is in TEMP and the least significant half in LSH. The shift transfers the contents of LSH (the
least significant half) to the ACCM register, which is designated ACCM 14 at this time, and transfers
the most significant half from TEMP to the (just vacated) LSH.
For the second pass, the second (more significant) half of the multiplicand is accessed from
registers MCI and MCIL, and logic enabled only during the second pass combines the data transferred
to LSH from TEMP with the new result being accumulated. Otherwise, the operation of the pipeline
during the second pass is the same as during the first pass.
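Splitting the multiplicand over two passes is exact, because M × P = (M_high × 2^k + M_low) × P. A minimal sketch, assuming k = 24 (the first pass handling the low 24 bits of the 56-bit fraction and the second pass the upper 32, which is how this sketch reads the statement that the first pass of a MULD uses only the top 24 MCAND bus bits); `two_pass_multiply` is hypothetical, and unlike the hardware it keeps every low-order bit.

```python
def two_pass_multiply(mcand, mplier, low_bits=24):
    """Double-precision fraction multiply as two passes over the same multiplier.

    Pass 1 multiplies by the less significant part of the multiplicand; its upper
    bits are held over and added in during pass 2, which multiplies by the more
    significant part.
    """
    low_mask = (1 << low_bits) - 1
    mc_low = mcand & low_mask        # first pass: less significant part
    mc_high = mcand >> low_bits      # second pass: more significant part

    first = mc_low * mplier
    settled = first & low_mask       # bits that no later addition can change
    carried = first >> low_bits      # held over and added during pass 2

    second = mc_high * mplier + carried
    return (second << low_bits) | settled
```

With two 56-bit fractions, `two_pass_multiply(m, p)` returns the same 112-bit product as `m * p`; the hardware then keeps only the bits its registers can hold.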
2.3.5.2 FM Control - The fraction multiplier logic is hardware rather than firmware controlled. Four
state bits select one of 13 function states that control the FM logic. Within each state, the state bits,
various internal flags, and various flags from other FPA logic are combined to provide the control
signals needed to implement the selected state's functions (Figure 2-25 and Table 2-18).

Figure 2-25 FM Control States
[State diagram. Transition conditions shown include MULTIPLIER INIT, FLAG * DOUBLE, IRD, INT, FLAG * INT, DIV, COUNT = 3, EVEN, and DIV DONE.]

Table 2-18 FM Control States
STATE VARIABLES
X3

NAME

NEXT STATE

OUTPUT CONTROL

DEFINITION
LDCNTR

CNTR
CONSTANT

NEXT
TTH

NEXT
ALU
ADD

1010

1 IF FLAG
AND
DOUBLE

X2 Xl XO
0

INIT

SYNC

IF TO, THEN 0000;
ELSEOOlO

1101

RESULT OF MINIT SIGNAL FROM
MICROCODE. PREPARES MPLIER
NIBBLE SELECT COUNTER FOR MULF
SEQUENCE.
ENTRY FROM STATE 0000 AT T50 TO
PROVIDE SYNCHRONIZATION BETWEEN
MULTIPLIER'S 50 ns CLOCK AND
MICROCODE'S 200 ns CLOCK

0
l IF
DOUBLE
OR INT

1010

CONT

0001

NOP IF MULF OR EMODF; LOAD MPLIER
NIBBLE SELECT COUNTER IF MUL,
MULD, OR EMODD.

TEST

IF CONT., THEN 0000
ELSE IF DIV, THEN 1000;
ELSE IF DBL. OR INT.,
1100: ELSE 0100

TESTS OPCODE FOR FIRST EXECUTION
STATE CALCULATION; CLEARS THE
MULTIPLIER DATA PATH,

IF EVEN THEN 1000;
ELSE 1011

WAITS FOR FIRST QUOTIENT BIT TO BE
FORMED IN THE NALU.

IF DIV DONE, THEN 1011 ;
ELSE 1111

SHIFTS LSH AND TEMP LEFT ONE BIT
TO ACCEPT QUOTIENT BITS IN DIVIDE

1 IF DBL
I IF
1110 IF INT
D3ANDDBL ELSE 1010 ELSE PREV
TTH
ORINT *

IF FLAG, THEN 1100
ELSE IF DBL, THEN 0100;
ELSE 1110

CLEAR DATA PATH AND CARRY
REGISTERS FOR MULD, EMODD, AND
MULL. WAITS FOR FIRST ROM LOOK UP.

1 IF INT
AND FLAG

0010

IF COUNT=3, THEN 1110
ELSE 1111

RUNS MULTIPLIER PIPE FOR MULL.

0010

IF SHF ZEROES, AND DBL.
AND FLAG THEN 0101
ELSE IF D 1, THEN 01 11 ;
ELSE 0100

RUNS MULTIPLIER PIPE FOR FLOATING
POINT MULTIPLY OPERATIONS. LSH'S 4
LSB'S TO ACCM'S 4 MSB'S IF SECOND
TIME THROUGH DBL MULTIPLY.

1 IF
INT AND
FLAG

0010

IF D4, THEN 0111
ELSE IF FLAG, THEN 0110
ELSE 1111

STOPS PIPE TO ADD FINAL STORED
CARRYS TO FINAL ACCUMULATION.
LOADS TEMP.

1 IF
D3

0010

NOP

DIV

WAIT

MULL

PIPE

CADO

0110 IF INT
ELSE 0010

1010

XFER

IF 08, THEN 0110
ELSE 0100

SHIFTS ACCM, TEMP, AND LSH RIGHT
TO TRANSFER.

0010

ADDZ

IF DI, THEN 0101
ELSE 0111

ADDS ZEROES TO ACCM'S 4 MSB'S.

0010

DONE

1111

STOPS ALL REGISTERS FROM
CHANGING TO ALLOW NR OR CPU D
REG. TO ACCEPT FINAL RESULT.

*
0

PREV
TTH

1 IF
DOUBLE
ELSEO

1010

PREV
FLAG

NEXT
CLR
ALL

MUL/DIV
PPROD ACCM
DONE

LSH

TEMP

NOP

PREV
FLAG

NOP

SL IF EVEN
LDIF
EVEN AND
FLAG

l IF FLAG
AND
DOUBLE

NEXT
FLAG

PREV
FLAG

NOP

PREV
FLAG

NOP

PREV
FLAG

NOP

SL IF
EVEN

SLIF
EVEN

PREV
TTH

1 IF EVEN
ELSE PREV
FLAG

1 IF
EVEN

NOP

PREV
TTH *

PREV
TTH

1 IF FLAG
AND
DOUBLE

PREY
FLAG

NOP

PREY
FLAG

1 IF
FLAG

NOP

LDIF
EVEN
AND FLAG

PREV
TTH

l IF
DOUBLE

PREY
FLAG

I IF
DOUBLE

PREY
FLAG

NOP

I IF
0110 IF INT
D3 AND DBL ELSE 0010
OR INT *
*

*DON'T CARE

*
*


The states can be roughly divided into four groups:
1. IRD
2. Integer Multiply
3. Fraction Multiply
4. Divide.

This section will discuss the states by groups and in the previously shown order. Within each discussion, the states will be discussed in the order they are accessed within the group. This is important
because the function of some states is partially dependent on the previous state.
The state of the logic is defined by the output of the PRESENT STATE register which is clocked on a
50 ns cycle. The inputs to this register (the next state) are based on the current state and internal and
external flags. A majority of the internal flags provide sequence information and are generated in the
logic shown in Figure 2-26.
IRD Group (Instruction Register Decode)
When the FM logic is not performing a multiply or divide operation, it is in the IRD group. While waiting, the
logic continually cycles through the 4 states in this group, preparing the FM logic for a multiply. In
the IRD group the op codes in the instruction buffer are monitored. Initially (in INIT), the FM logic
is set up for a MULF, but if the op codes indicate a MULL, MULD, or EMODD, new information is loaded into the FM logic in the CONT state. The FPA microcontrol will be loading the
MPLIER and MCAND registers during IRD if the op codes indicate a multiply operation.
The control logic enters INIT whenever the Multiplier Operand Control (OPLD) field in the FPA
microcontrol store is F. This normally happens during the FPA IRD or when a multiply operation is
finished. The SYNC state is entered at CPU T50 and synchronizes the FM clock with the CPU clock.
It also clears FLAG. CONT is entered at T100 and loads new information if the op codes indicate a
MULL, MULD, or EMODD. TEST is entered at T150. In TEST, if the MCNT bit in the FPA microcode is not asserted, indicating that the FPA does not want the multiply pipeline to begin, the FM
returns to the INIT state and continues waiting. If, however, MCNT is asserted, indicating that the
multiplier operands are loaded and the FPA wants a multiply to start, the correct execution state is
selected based on the op code. Refer to Table 2-18 for a summary of IRD group functions.
Multiply Float Path
If the op code indicates a MULF, the PIPE state is selected and the multiplier pipe can continue. Note
that during INIT the nibble counter was loaded with MULF control data, and ROM look-up began
based on that data. Since a MULF is being done, the data at the beginning of the pipe is correct.
The logic remains in this state (PIPE), running the pipe and accumulating the answer, until D1, a timing
signal, is asserted. When D1 is asserted, the current content of the PPROD plus ACCM plus the stored carries is the final correct answer.
Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the
SALU add of the stored carries to the AALU content. CADD also latches the SALU result into TEMP.
The FM logic remains in CADD 150 ns (until D4 is asserted).
Since FLAG was cleared during the IRD group and never set, it is clear, and asserting D4 initiates the
DONE state. This state asserts MUL/DIV DN and NOPs all other FM logic. MUL/DIV DN, monitored by the FPA control logic, returns control to the FPA microcontrol. It is the FPA control store
that selects the MULF result, via a multiplexer, directly from the SALU outputs rather than from
TEMP. The FM logic remains in DONE until returned to INIT by the multiplier INIT code in the
multiplier operand control field of the FPA microcontrol store. Refer to Figure 2-27 for a summary of
MULF control.
Figure 2-26 FM Control Logic
[General signal flow only: the multiplier nibble counter (load counter data, 4-bit up counter, counter LSB ignored) drives the MPLIER select lines for ROM banks A and B, with bank B trailing by 50 ns; a register wired as a shift register, a 6-bit latch, decode logic, an 8-bit register (through D8), and a 4-bit register, all on the 50 ns clock, generate the control timing. All items shown have numerous other outputs and interconnections.]

Figure 2-27 MULF Control
[MULF timing diagram. Operands: MP1, 24-bit multiplier; MC1, 24-bit multiplicand. After MCONT at T0, the FM steps through INIT, SYNC, CONT, TEST, PIPE, CADD, and DONE on the 50 ns MUL CLK. Also shown: the nibble counter (SA <2:0>, SB <2:0>), bank A MP <3:0> and bank B MP <7:4> selections, ROM STRG, PPROD (PP1 through PP6), and ACCM (ACCM 1 = PP1 + ACCM 0 through ACCM 5 = PP5 + ACCM 4). SALU = PP6 plus ACCM 5 plus the stored carries from PP6 and ACCM 5. After each addition of the partial product and accumulator contents, the 4 least significant bits of the result are loaded into the LSH register.]

MULD Path
If, when the FM control logic is in TEST, the op codes indicate a double-precision multiply (DOUBLE
set), the WAIT state is entered. Initially (in INIT) the nibble counter was loaded for MULF and
ROM look-up began; then in CONT (100 ns later), when a MULD was decoded, new data was loaded
into the nibble counter. The WAIT state waits for the data loaded in CONT to settle and access new
ROM locations before beginning the pipe. After 100 ns in this state, FLAG is set. In this context,
FLAG set indicates the first pass of a double-precision multiply. After 150 ns, since both DOUBLE
and FLAG are set, PIPE is entered.
The logic remains in the PIPE state, running the pipe and accumulating the answer, until D1, a timing
signal, is asserted. When D1 is asserted, the current content of ACCM plus the two sets of stored carries
is the first half of the MULD partial product.
Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the
SALU add of the stored carries and the ACCM content. CADD latches the upper 32 bits of the first half of
the MULD partial product in TEMP. The lower 32 bits have been accumulating in LSH during the
pipeline operation. The FM logic remains in CADD 150 ns (until D4 is asserted).
Since FLAG is asserted, indicating the first pass, asserting D4 selects the XFER state. Four cycles in the
XFER state transfer the content of TEMP and LSH to LSH and ACCM (refer to Figure 2-28), clear
FLAG, and clear the stored-carry registers.
The assertion of D8 returns the FM logic to PIPE. With FLAG now cleared and DOUBLE set, ALU ADD
is asserted. This signal causes the data stored in LSH during the XFER state to be added in (4
bits per cycle) to the final product being developed. Six cycles transfer all 24 bits stored during XFER.
While these bits are being right-shifted from the right end of LSH into the MSBs of the developing
final product, the LSBs of the developing final product are being right-shifted into the left end of
LSH.
When 20 bits have been transferred in from LSH, SHF ZERO is enabled. This causes the logic to enter
the ADDZ state. The final 4-bit transfer of LSH data takes place during the first ADDZ state. After
that, the ALU that added LSH to the ACCM is disabled. During this state, the pipe continues functioning and the LSBs of the accumulating final product are still shifted into the left end of LSH. The only
difference between PIPE and ADDZ during this second pass is that in PIPE, LSH data bits are added into
the MSBs of the ACCM, while in ADDZ, zeros are added. Note that this state even has the same ending
criterion as PIPE, namely D1 asserted.
D1 asserted transfers control to the CADD state. As discussed for the MULF path, CADD is entered when
the ACCM plus the two sets of stored carries is the final answer. In CADD the stored carries are added
to the AALU content by the SALU, and the result is latched into TEMP. Since FLAG is now clear, the
assertion of D4 causes a transfer to DONE.
In DONE, MUL/DIV DONE is asserted. This causes the FPA microcode to select and transfer, via
multiplexers, the upper 32 bits of the double-precision result from the SALU onto FP bus A and the
lower 32 bits from the LSH register onto FP bus B. Refer to Figure 2-29 for a summary of MULD
control.
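The MULF and MULD paths described above can be summarized as a next-state function. This is a simplified sketch, not a transcription of Table 2-18: the state names come from the text, while the condition names (`MINIT`, `T50`, `MCNT`, `DOUBLE`, `FLAG`, `D1`, `D4`, `D8`, `SHF_ZERO`) are treated as inputs sampled once per 50 ns clock, and the MULL and divide paths are omitted. `next_state` and `run` are hypothetical helpers.

```python
def next_state(state, sig):
    """Next FM control state for the MULF and MULD paths.

    sig -- set of condition names asserted on this 50 ns clock.
    """
    if "MINIT" in sig:                  # multiplier INIT code in the OPLD field
        return "INIT"
    if state == "INIT":
        return "SYNC" if "T50" in sig else "INIT"
    if state == "SYNC":                 # syncs with the CPU clock, clears FLAG
        return "CONT"
    if state == "CONT":                 # reloads the nibble counter for MULD
        return "TEST"
    if state == "TEST":
        if "MCNT" not in sig:
            return "INIT"               # FPA not ready: keep waiting
        return "WAIT" if "DOUBLE" in sig else "PIPE"
    if state == "WAIT":                 # new ROM addresses settle; FLAG gets set
        return "PIPE" if {"DOUBLE", "FLAG"} <= sig else "WAIT"
    if state == "PIPE":
        if "SHF_ZERO" in sig:           # second pass: LSH data used up
            return "ADDZ"
        return "CADD" if "D1" in sig else "PIPE"
    if state == "ADDZ":
        return "CADD" if "D1" in sig else "ADDZ"
    if state == "CADD":                 # SALU adds the stored carries
        if "D4" not in sig:
            return "CADD"
        return "XFER" if "FLAG" in sig else "DONE"
    if state == "XFER":                 # TEMP/LSH move to LSH/ACCM, FLAG cleared
        return "PIPE" if "D8" in sig else "XFER"
    if state == "DONE":
        return "DONE"
    raise ValueError(f"unmodeled state {state!r}")


def run(trace, state="INIT"):
    """Apply next_state to each clock's signal set; return the states visited."""
    visited = []
    for sig in trace:
        state = next_state(state, set(sig))
        visited.append(state)
    return visited
```

A MULD trace in which FLAG rises on the third WAIT clock spends 150 ns in WAIT, then runs PIPE, CADD, XFER, PIPE, ADDZ, and CADD before reaching DONE, as in the description above.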
MULL Path
If the op code being monitored during CONT decodes as MULL, new data is loaded into the nibble
counter. The logic proceeds to TEST and, in TEST, selects WAIT as the first execution state
because INT (meaning integer) is set.

Figure 2-28 The XFER
[Before XFER: TEMP (24 bits used) and LSH. TEMP and LSH are right-shifted 4 places per cycle (4×), so that after XFER the former TEMP content is in LSH and the former LSH content is in ACCM 14.]

[Figure 2-29 MULD Control (Sheet 1 of 3): MULD timing diagram (TK-0530)]

[Figure 2-29 MULD Control (Sheet 2 of 3): MULD timing diagram, continued (TK-0531)]

[Figure 2-29 MULD Control (Sheet 3 of 3): MULD operands and MULD result accumulation (TK-0532)]

In WAIT, the new ROM data, selected by the new ROM address that results from the data loaded into the nibble counter during CONT, is given time to settle before entering the pipeline. When FLAG is set, the data has settled and the integer multiply pipeline state (MULL) is entered.
The FM logic remains in the MULL state as the pipeline accumulates the final product (the least significant half accumulates in LSH). When COUNT = 3 is asserted, the AALU contents plus the two sets of stored carries form the final product, and COUNT = 3 selects DONE.
In DONE, MUL/DIV DONE is asserted and the final product is available. The FPA microcode loads the upper half from the SALU onto FP bus A during one machine cycle. On the following cycle, the lower half is loaded from LSH onto FP bus A. Refer to Figure 2-30 for a summary of MULL control.
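The statement that the AALU plus the two sets of stored carries is the final product describes deferred carry propagation: carries are saved rather than rippled during accumulation and are resolved by one final addition (the CADD or SALU step). The sketch below shows the general carry-save technique on Python integers. It illustrates the principle only; it does not model the FM's nibble-slice ALUs or how its two carry sets are organized, and the function names are hypothetical.

```python
def carry_save_add(sum_bits: int, carry_bits: int, addend: int) -> tuple[int, int]:
    """One 3:2 carry-save step on non-negative integers.

    Returns (new_sum, new_carry) with new_sum + new_carry equal to
    sum_bits + carry_bits + addend; no carry ripples between bit positions.
    """
    new_sum = sum_bits ^ carry_bits ^ addend  # per-bit sum without carry
    majority = (sum_bits & carry_bits) | (sum_bits & addend) | (carry_bits & addend)
    return new_sum, majority << 1             # carries saved, moved one place left


def accumulate(addends: list[int]) -> int:
    """Sum many addends with saved carries, then resolve them once at the end."""
    s, c = 0, 0
    for value in addends:
        if value < 0:
            raise ValueError("addends must be non-negative")
        s, c = carry_save_add(s, c, value)
    return s + c  # single carry-propagate add, analogous to the CADD/SALU step
```

Each carry-save step costs one gate delay per bit regardless of operand width, which is why the slow carry-propagate addition is postponed to a single final cycle.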
2.3.5.3 Division - The TEMP and LSH registers in the fraction multiplier logic are used to store the
quotient generated during floating-point division. The registers are concatenated, with the MSB of
LSH shifting into the LSB of TEMP.

During a divide operation the FPA asserts DIV and loads the divisor and dividend into the FNM. In
the FM logic, the nibble counter is loaded as for a MULF and clocks through until TEST. To initiate
quotient storage, the multiply control field (MCNT) of the FPA microcode must be asserted. The
combination of MCNT and DIV asserted selects the NOP state in the division path.
The FM logic enters NOP with the nibble counter odd and exits when the nibble counter is even. These two
cycles (100 ns) allow the first quotient bit to be formed.
From NOP, the FM logic enters DIV. In DIV, the logic left-shifts LSH and TEMP one bit every even
cycle. When doing a single precision division, the single quotient bit is input to both LSH bit 4 and
TEMP bit 4; the data input to LSH is never accessed in single precision. In double precision, the
TEMP bit 4 quotient input is blocked, and TEMP bit 3 is input to TEMP bit 4 on the left shifts.
DIV DONE is asserted when quotient bits have been left-shifted into TEMP bits 28 and 29. This condition is
tested at T100 of each state, and control transfers to DONE if it is true.
In DONE, MUL/DIV DONE is asserted, stopping the division process in the FNM and causing the
FPA microcode to access TEMP for a single precision quotient, and TEMP and LSH for a double
precision quotient.
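The quotient storage described above can be sketched as a bit-serial divider whose quotient bits are left-shifted into a 64-bit register pair, with the MSB of LSH moving into the LSB of TEMP. This is a minimal sketch assuming restoring (compare-and-subtract) division on unsigned integers. Unlike the hardware, it enters each new quotient bit at bit 0 rather than bit 4, and it ignores the FNM's actual divide algorithm and timing. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shift_in(temp: int, lsh: int, bit: int) -> tuple[int, int]:
    """Left-shift the 64-bit TEMP:LSH pair one place, entering `bit` at LSH bit 0.

    The MSB of LSH moves into the LSB of TEMP, as in the concatenated registers.
    """
    temp = ((temp << 1) | (lsh >> 31)) & MASK32
    lsh = ((lsh << 1) | (bit & 1)) & MASK32
    return temp, lsh


def divide_into_temp_lsh(dividend: int, divisor: int, qbits: int) -> tuple[int, int, int]:
    """Restoring division producing one quotient bit per step, MSB first.

    Requires 0 < qbits <= 64 and 0 <= dividend < divisor << qbits, so the
    quotient fits in qbits bits. Returns (temp, lsh, remainder).
    """
    if not 0 < qbits <= 64 or divisor <= 0 or not 0 <= dividend < (divisor << qbits):
        raise ValueError("operands out of range for a qbits-bit quotient")
    temp = lsh = 0
    remainder = dividend
    for i in range(qbits - 1, -1, -1):
        trial = divisor << i              # divisor aligned with quotient bit i
        bit = 1 if remainder >= trial else 0
        if bit:
            remainder -= trial            # subtract only when it does not go negative
        temp, lsh = shift_in(temp, lsh, bit)
    return temp, lsh, remainder
```

For example, `divide_into_temp_lsh(1000, 7, 16)` leaves the quotient 142 in the TEMP:LSH pair and returns a remainder of 6.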
2.3.6 Exponent Processor
The exponent processor, part of the FCT, processes the FP exponent during FP operations. During FP
multiply/divide, the processor adds/subtracts the exponents as needed. During add/subtract operations, the
processor stores the larger exponent and determines the final exponent by taking into account the
operation, the fraction right-shifts, and the left-shifts during normalization. By comparing the exponent magnitudes, the exponent processor also controls the FPF addition and subtraction in the FAD. Refer to
Figure 2-31.

[Figure 2-30 MULL Control: MULL operands, MULL timing, and MULL result accumulation (TK-0525)]

[Figure 2-31 Exponent Processor Block Diagram (TK-0277)]

The FPEs are loaded from FP buses A and B into LA and LB under control of the EAC field in the
microcontrol (Table 2-19). The contents of LA and LB are loaded into CALU and DALU. CALU
computes LA - LB and DALU computes LB - LA. The carry-out signal from DALU selects either
CALU or DALU as the positive exponent difference (SHF COUNT), which provides FPF control in the
FAD.
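A minimal sketch of the parallel subtractors described above, assuming 8-bit unsigned exponents: both differences are formed at once, and the carry-out of the LB - LA subtraction (computed as LB + NOT LA + 1) picks whichever difference is non-negative, so the shift count is never negative. The function name and the returned flag are hypothetical.

```python
def exponent_difference(la: int, lb: int, width: int = 8) -> tuple[int, bool]:
    """Return (shift_count, la_larger) for two unsigned exponents.

    CALU forms LA - LB and DALU forms LB - LA in parallel; DALU's carry-out
    (1 when LB >= LA, i.e. no borrow) selects the non-negative result.
    """
    mask = (1 << width) - 1
    calu = (la - lb) & mask             # LA - LB, modulo 2**width
    dalu_sum = lb + (~la & mask) + 1    # LB - LA as LB + NOT(LA) + 1
    dalu = dalu_sum & mask
    dalu_carry = dalu_sum >> width      # carry-out of the DALU adder
    if dalu_carry:
        return dalu, False              # LB >= LA: LB - LA is the shift count
    return calu, True                   # LA > LB: LA - LB is the shift count
```

Computing both differences and choosing afterwards avoids a compare-then-subtract sequence; the selection costs only a multiplexer delay.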

Table 2-19 EAC Control Store Field

EAC Bit   µCS Bit   Operation When Set
EAC3      27        Controls Bus A to LA transfers (LA load enable)
EAC2      26        Controls Bus B to LB transfers (LB load enable)
EAC1      25        Controls EALU to PR transfers (PR load enable)
EAC0      24        Controls EALU to XR transfers (XR load enable)

With all four bits clear (EAC = 0), the field is a NOP.

NOTE
Although the control field appears to be a 4-bit field,
each of the 4 bits actually controls a single, independent function.

The contents of LA and LB, as well as XR (poly register), PR (product register), a normalization
constant, and 80₁₆ are possible inputs to the EALU. Input selection is controlled by both microcontrol
and hardware. Refer to Table 2-20 for a summary of input selection.
Table 2-20 EALU Input Control

AMXC Field
µCS 35   µCS 34   Operation
0        0        LA to EALU A input
0        1        LB to EALU A input
1        0        PR to EALU A input
1        1        Hardware select: for FP add/subtract, larger exponent (LA or LB) to EALU A input

BMXC Field
µCS 33   µCS 32   Operation
0        0        Normalization constant to EALU B input
0        1        XR to EALU B input
1        0        80₁₆ to EALU B input
1        1        LB to EALU B input

The EALU operation is controlled by the microcontrol field EALUC. Refer to Table 2-21. The output
of the EALU can be loaded into XR or PR for further processing, or loaded onto the FPA bus as a
final answer. XR and/or PR are loaded under control of the EAC microcontrol field (Table 2-19, bits 0 and 1). The EALU output to FP bus A <14:07> is controlled by the BSC microcontrol
field (Bus A EXP); refer to the discussion of the BSC field in Paragraph 2.3.2. The partial answers in XR
and PR are reloaded into the EALU via AMUX and BMUX, and are combined with either a normalization constant or ±80₁₆ before they are loaded onto FP bus A <14:07> (Table 2-20). The normalization constant, a variable quantity, adjusts the exponent for the shifts required to normalize the FPF in
the FAD. (The actual normalization constant is read from a ROM on the FNM rather than computed.) The 80₁₆ constant corrects for the offset that results when FPEs are added or subtracted during MUL/DIV exponent
processing: because each FPE carries an excess-128 (80₁₆) offset, adding two FPEs counts the offset twice, and subtracting them cancels it. Refer to Paragraphs 1.4 and 1.5.
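The role of the 80₁₆ constant can be shown with excess-128 arithmetic, assuming 8-bit exponents stored in excess-128 form as in the VAX F_floating and D_floating formats. The sketch omits the normalization adjustment and the overflow/underflow checks, and the function names are hypothetical.

```python
EXCESS = 0x80  # excess-128 offset of an 8-bit VAX F/D_floating exponent


def product_exponent(ea: int, eb: int) -> int:
    """Biased exponent of a product, before any normalization adjustment.

    Each biased exponent is (true exponent + 0x80), so ea + eb contains
    the offset twice; subtracting 0x80 once leaves a correctly biased sum.
    """
    return ea + eb - EXCESS


def quotient_exponent(ea: int, eb: int) -> int:
    """Biased exponent of a quotient: ea - eb cancels the offset, so add it back."""
    return ea - eb + EXCESS
```

For example, true exponents 1 and 2 are stored as 81₁₆ and 82₁₆; their product's exponent comes out as 83₁₆, the correctly biased form of 3. A result outside 1 through FF₁₆ would indicate exponent overflow or underflow.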
Table 2-21 EALU Control Store Field

EALUC Field (µCS 31:30)
EALU Operation     Required Mode Control
Pass A input       H (logic)
A - B              L (arith)
A + B              L (arith)
Force 1s out       H (logic)

Force 1s out is interpreted as underflow. This function is used to generate zeros on the buses.
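The input selection of Table 2-20 and the operations of Table 2-21 can be combined into a small behavioral model of the EALU. This sketch assumes a 10-bit datapath (wide enough to hold the sum of two 8-bit exponents), takes the normalization constant as an input instead of reading it from the FNM ROM, and names the EALU operations rather than encoding them in bits; `ealu` and the register dictionary are hypothetical.

```python
A_MUX = {0b00: "LA", 0b01: "LB", 0b10: "PR", 0b11: "LARGER"}  # Table 2-20, AMXC
B_MUX = {0b00: "NORM", 0b01: "XR", 0b10: "K80", 0b11: "LB"}   # Table 2-20, BMXC


def ealu(amxc: int, bmxc: int, op: str, regs: dict[str, int], width: int = 10) -> int:
    """Select the EALU A and B inputs, then apply one named operation.

    regs supplies LA, LB, PR, XR and NORM (the normalization constant).
    """
    mask = (1 << width) - 1
    a_src = A_MUX[amxc]
    # AMXC = 11: hardware picks the larger exponent for FP add/subtract
    a = max(regs["LA"], regs["LB"]) if a_src == "LARGER" else regs[a_src]
    b_src = B_MUX[bmxc]
    b = 0x80 if b_src == "K80" else regs[b_src]
    if op == "PASS_A":
        result = a
    elif op == "A_MINUS_B":
        result = a - b
    elif op == "A_PLUS_B":
        result = a + b
    elif op == "FORCE_ONES":
        result = mask  # all ones out (treated as underflow, zeros on the buses)
    else:
        raise ValueError(f"unknown EALU operation {op!r}")
    return result & mask  # keep the assumed datapath width
```

With PR holding the sum of two biased exponents, selecting PR on the A input and 80₁₆ on the B input with A - B removes the duplicated offset, as the text above describes for MUL.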


2.3.7 Sign Processor
The sign processor, a section of the FCT, determines the sign of the FP operation result using both
hardware and the microcontrol field SGNC (sign latch controls). Refer to Figure 2-32 and Tables 2-22
and 2-23. This section receives information indicating the sign and magnitude of each operand, the
desired operation (add, subtract, multiply, divide, poly), and the magnitude of the result. The resulting
sign is placed on FP bus A <15>.
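The sign rules that Tables 2-22 and 2-23 implement in hardware follow from sign-magnitude arithmetic: multiply and divide use the exclusive OR of the operand signs, while add and subtract take the sign of the operand with the larger magnitude. The sketch below applies those general rules; it is not a transcription of the tables. It assumes normalized operands, so that comparing (exponent, fraction) pairs compares magnitudes, and it returns a positive sign for an exact cancellation, since a VAX true zero is positive. The function name is hypothetical.

```python
def result_sign(op: str, sa: int, sb: int, ea: int, eb: int, fa: int, fb: int) -> int:
    """Sign bit (0 = positive, 1 = negative) of A op B for sign-magnitude operands.

    ea/eb are exponents and fa/fb fractions of normalized operands.
    """
    if op in ("MUL", "DIV"):
        return sa ^ sb                   # like signs give +, unlike signs give -
    if op == "SUB":
        sb ^= 1                          # A - B is handled as A + (-B)
    elif op != "ADD":
        raise ValueError(f"unsupported operation {op!r}")
    mag_a, mag_b = (ea, fa), (eb, fb)    # compare exponents first, then fractions
    if sa != sb and mag_a == mag_b:
        return 0                         # exact cancellation yields a positive zero
    return sa if mag_a >= mag_b else sb  # larger magnitude supplies the sign
```

The exponent comparison corresponds to the LA/LB relative-size column of Table 2-23; only when the exponents are equal does the fraction comparison (the FALU sign in the hardware) decide.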
[Figure 2-32 Sign Processor Block Diagram (TK-0280)]

Table 2-22 SGNC Control Store Field

SGNC Field (µCS 07:05)
SGN C2   SGN C1   SGN C0   Load into SA             Load into SB
0        0        0        SA (NOP)                 SB (NOP)
0        0        1        FP bus A <15>            SB (NOP)
0        1        0        SA ⊕ (op code = SUB)     SB (NOP)
0        1        1        Result*                  SB (NOP)
1        0        0        SA (NOP)                 FP bus B <15>
1        0        1        FP bus A <15>            FP bus B <15>
1        1        0        SB                       SB (NOP)
1        1        1        SA ⊕ SX                  SB (NOP)

* This is the resultant sign, determined by the op code, the signs of the operands, the relative magnitude of the
exponents, and the sign of the FALU result. It can also be forced if a floating underflow or overflow occurs.

Table 2-23

Op Code
MULX
DIVX
ADDX
SUBX
ADDX
SUBX
ADDX
ADDX
SUBX
SUBX

Sign Processor Operation
Sign of
Result
(FALU sign)

Relative Size
of Exponents

x
x

x
x
x
x
x
x

LA>LB
LA>LB
LA<LB
LA<LB
LA= LB
LA= LB
LA= LB
LA= LB

Positive
Negative
Positive
Negative

Result*
SAG> SB
SAG> SB
SA

SB
SB
SB

Sil
SB

X = Don't Care
*Except for error - in case of overflow, the sign is forced to a 1 while underflow forces a 0.


2.3.8 Control Store and Logic
As indicated in previous sections, the control store and logic, located on the FCT, provide the control
signals for all FPA operations. These include both internal FPA operations (the transfer and manipulation of FP data) and external operations (the interface between the FPA and the CPU). Refer to Figure 2-33.

[Figure 2-33 Control Store and Logic Block Diagram (TK-0271)]

The FPA has two normal operating functions: instruction register decode (IRD) and performing an
FPA instruction. The FPA normally alternates between these two functions. A third function, exception handling, deals with error conditions, traps, and interrupts. The FPA executes the third function
whenever an exceptional condition is sensed.
The FPA and the CPU run synchronously; that is, both have 200 ns microcycles divided into four time states
(CPT0, CPT50, CPT100, CPT150), and T0 of the CPU is simultaneous with T0 of the FPA. Both load a new
microword only at T0.
The FPA always keeps two updated copies of the 16 CPU general (scratchpad) registers. The FPA uses these
copies to optimize register-mode instructions. The copies are accessed and
updated by the same lines that access and update the CPU registers themselves. To ensure that the
FPA never reads a changing register, the CPU updates the general register set (and the FPA copies) between T100 and T200 (T0), and the FPA reads the copies only between T0 and T100.

The FPA as a whole is directly controlled by the CPU. The CPU can enable and disable the FPA via
bit 15 of the FPA status register (ID bus register 17). The FPA is normally enabled by the CPU.
The FPA is a microcontrolled unit containing a 512-word by 48-bit control store in ROM. Each
word is divided into control fields of various lengths, each field providing independent control of a particular section of the FPA. In general, these fields control the operation of the FPA data manipulation
components, coordinate the operation of the FPA with the operation of the CPU, and initiate the
operation of parts of the FPA control logic. FPA operations are controlled by accessing specific ROM words, each of which causes a particular set of FPA actions.
2.3.8.1 IRD - The IRD state is controlled by location IRD.1 in the control ROM. In this state a new
microword is not read until STALL is disabled. ACC INSTR H and IB CALL from the CPU microword disable the STALL condition. When the FPA leaves IRD, the ACC ERROR bit in the status
register is cleared if it was set during a previous cycle. The op code and specifier decode logic
monitors the IRC OPC 7:0 and specifier lines. The OPC lines enable ACC INSTR H when an FPA
instruction is in the IB and are decoded to determine the instruction type. The specifier decode lines
determine the specifier type. The output of this decode logic is transmitted to the next address logic.

Location IRD.1 controls all FPA operations in the IRD state. The operation assumed is a register-to-register
(R to R) operation. The FPA begins this operation every time, without any indication that the next
operation will be R to R, because it holds both operands in its own register set: if the next FPA
operation is R to R, both operands will already be loaded. Location IRD.1 has MSC = 6 and
next address = 180. This information is transmitted to the next address logic and, along with the
outputs of the op code and specifier decode logic, determines the correct next microaddress.
In the next address logic (refer to Figure 2-34 and Table 2-24), MSC = 6 and the op code and specifier
decode logic lines select the address offset to be ORed with the next address (180) to select the next
microaddress. MSC = 6 selects the A-fork inputs from the op code and specifier decode logic lines and
transmits them through the A-B fork mux. This selects the correct offset based on instruction type,
float or double, and specifiers 1 and 2.
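Because the offset is ORed rather than added, the base address must have zeros in the offset bit positions for the result to equal base plus offset. A minimal sketch, assuming that the addresses 180 and 100 are hexadecimal (which fits the 512-word, 9-bit control store) and that the A-B fork offset is 4 bits wide; the function name and the rejection of a base with nonzero low bits are illustrative choices, not documented hardware behavior.

```python
CONTROL_STORE_WORDS = 512  # 9-bit microaddresses


def fork_address(next_address: int, offset: int) -> int:
    """OR a 4-bit A-fork/B-fork offset into a microword's next-address field."""
    if not 0 <= next_address < CONTROL_STORE_WORDS:
        raise ValueError("next address must fit in 9 bits")
    if not 0 <= offset <= 0xF:
        raise ValueError("offset must fit in 4 bits")
    if next_address & 0xF:
        # this sketch rejects bases whose low bits would merge with the offset
        raise ValueError("base address must have zeros in the offset bits")
    return next_address | offset
```

Under these assumptions an A-fork offset of 5 turns next address 180 into 185, and a B-fork offset of C turns 100 into 10C.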
[Figure 2-34 Next Address Logic (TK-0534)]

Table 2-24 Next Address Lines

Address                 Description

Next Address Control Lines
FCTK BEN 2:0 H          From the FPA control store; selects the lines to be monitored during
                        execution flows.
CS 71, 70               CPU accelerator control field:
                        00 - NOP
                        01 - CPSYNC
                        10 - ACC TRAP (to the 3-bit address specified by the CPU USI field)
                        11 - REDEFINE USI
CS 57, 56, 55           If CS71 and CS70 are high, enabling DEC USI, a 6 on these lines
                        enables POLY DONE and a 7 enables FP TRAP.
FCTH ACC TRAP H         High during an accelerator trap, low otherwise.
FCTH FP TRAP L          Low during an FP trap, high otherwise.
FCTH TRAP DIS L         Low during either an FP trap or an accelerator trap, high otherwise.

Next Address Selector Controls
DEC µSI                 FCTH DEC µSI L enabled and CS 57, 56, and 55 high enable
                        FCTH FP TRAP; otherwise it is high.
A-FORK B-FORK SELECT    Enable H causes all highs out and does not affect the next address.
MUX                     Enable L enables the select input to select A-B data.
NEXT ADDRESS MUX        Enable H causes all highs out. If enable is low, S low selects the A
                        input.
BEN MUX                 Enable high causes all highs out.

Address Lines
FCTR CRADR 08:00 H      To the control store; selects the address. Can also be transmitted to
                        the CPU via Reg 16 as the current address.
FCTK NEXT ADR 08:00     From the control store; the next address from the microword.
FCTH TRAP A 07:00 L     Contains either the trap address or the next address.
to FCTF
FMHR TRAP A 7:00 H      FP trap address from the maintenance register (MAINT REG) via the ID bus.
FCTH BRC 2:0 L          From the branch enable mux (BEN); monitors various FPA conditions and
                        modifies the next address during execution flows based on the BEN
                        field in the FPA microcode.
A-B FORK ADR            (Not a signal name on prints.) From the A-FORK B-FORK select mux.
                        Monitors op code and specifier type from the IB and modifies the
                        address in A-B forks.
FCTF FLOAT H            Based on op code. Used during A-B forks and by the branch enable
                        logic (BEN).
CS 57, 56, 55           Select the trap address during an ACC trap. Also refer to CS 57, 56, 55
                        under Next Address Control Lines.

The offset is ORed with 180, and since STALL is no longer enabled (ACC INSTR H is high), the next
CPT0 selects the correct microword to control the next FPA cycle. If the data is already in the
FPA, an optimized routine is selected.
2.3.8.2 Performing an FPA Instruction - Once an FPA instruction is sensed, the microcontrol words
selected, and the order in which they are selected, depend on the operation desired, float or double, the location of the
operands, and the relative size of the operands and/or result.
The FPA first ensures that it has all the required data. If both operands are in registers, or one is in a
register and the other is a short literal, all the data is in the FPA after the A-fork test, and the FPA
transfers directly to the execution flows. If not, the first operand is fetched during A-fork, and then
MSC = 7 and next address = 100 are transmitted to the next address logic.
In the next address logic, MSC = 7 selects the B-fork inputs from the op code and specifier decode logic
and transmits them through the A-B fork mux to be ORed with next address = 100. The offset selected
depends on instruction type, double or float, and the type of specifier 2. As before, if the data is already in
the FPA, an optimized routine is selected; otherwise, the FPA waits for the CPU to fetch the data.
In some data transfers (A-fork or B-fork) the FPA must wait for data to be transmitted from the CPU
via the ID bus. The microcode has a special WAIT bit that enables STALL for this purpose. The CPU
indicates that the required data is on the ID bus by asserting CP SYNC. CP SYNC causes the data to
be stored in the FPA and clears STALL, thereby enabling a new microword to be read and FPA
operations to continue.
Once the FPA has all required data, ACC OVERIDE is asserted. This signal, transmitted to CPU
microaddress bit 12, causes the CPU to select FPA-specific microcode in the
writeable control store (WCS) rather than in the PCS. This prevents the CPU from beginning the microcode
floating-point routines (used when no FPA is present) for FP instructions. The enabling of ACC
OVERIDE is based on instruction type (IRC lines) and the execution point counter (IRC EP 2:0).
Note that since the FPA cannot fetch data itself, the data-fetch routines (CPU AFORK and BFORK)
are allowed to continue until the FPA has all required data.
Once the FPA has all the data, the FPA execution flows are entered. These flows perform the manipulation required to add, subtract, multiply, and divide. This includes unpacking and individually manipulating the FPF
and FPE parts of the number, as well as checking the operands and/or results for unusual conditions
(zeros, underflow, overflow, etc.). During execution flows the BEN field selects the lines to be monitored
and used to modify the next address. The 3-bit BEN field of each microword selects 3 of 24 possible
lines to be ORed with the next address field of the microword to form the next address.
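The BEN mechanism can be sketched as a table lookup: the 3-bit BEN code picks one group of three condition lines out of eight groups (8 x 3 = 24 lines), and the selected lines are ORed into the low three bits of the next address. A minimal sketch, assuming that grouping, that the lines land in the low three address bits, and active-high line values (the actual BRC 2:0 L lines are active low); the function name is hypothetical.

```python
def ben_next_address(next_address: int, ben: int,
                     line_groups: list[tuple[int, int, int]]) -> int:
    """OR the three condition lines chosen by a 3-bit BEN code into the next address.

    line_groups[k] holds the current (BRC2, BRC1, BRC0) values, as 0/1,
    of the group that BEN code k selects; there are 8 groups of 3 lines.
    """
    if len(line_groups) != 8:
        raise ValueError("expected 8 groups of 3 lines (24 lines in all)")
    if not 0 <= ben <= 7:
        raise ValueError("BEN is a 3-bit field")
    brc2, brc1, brc0 = line_groups[ben]
    return (next_address | (brc2 << 2) | (brc1 << 1) | brc0) & 0x1FF
```

Because the microword's own next-address field supplies the high bits, a single microword can branch up to eight ways on the selected conditions.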
The BEN multiplexer monitors signals from both the CPU and the FPA. POLY DONE and CP SYNC
are transmitted from the CPU using CS lines 71, 70, 57, 56, and 55. FLOAT, IRBR0 L, and IRBR1 L
are generated in the FPA but are summaries of op code information transmitted from the instruction
buffer. All other BEN lines monitor FPA internal conditions. Refer to Table 2-25 for a summary of
BEN fields. Finally, the flows manipulate the result to ensure it is in the correct form and inform the CPU,
by asserting FP SYNC, that the answer is available.


Table 2-25

BEN Control Store Field
Operation

Lines Monitored
BRC2 L
BRC1 L

BRC0 L

Summary

FLOAT H*

IRBR1 L*

IRBR0 L*

NOP
Op code decode

SWR

Shift within range

RSVH

A=0 H

Operand(s) equal zero
Reserved operand

POLY DN L* CP SYNC H*

FLOAT*

(A or B=0) H

ED.GE.8 H

Operand(s) equal zero
Check exponent difference

MUL/DIV

DNH

Multiply done
Division done

PR8H

Error Condition

BEN
Field

SUB*ED<2 H

UNDFL

*From the CPU.

The CPU accepts the answer via DFMX bus drivers on the FNM using DAP ENA ACC D (I) and
also reads the ACC Z, V, C, and N data lines to determine the condition codes of the answer. Once the
CPU has the answer it transmits a CPSYNC and the FPA returns to its IRD state.
2.3.8.3 Exception Conditions - At any time during either the IRD or instruction states, the CPU can
direct the FPA to enter a trap routine for error recovery or microdiagnostics. The trap routines are
located in the FPA's own microcode. There are two separate sets of trap routines: ACC traps for CPU
and FPA errors, and FP traps for microdiagnostics. Both are initiated via CS lines 71 and
70.
If CS bus 71 is H and CS bus 70 is L, an ACC TRAP is initiated. An ACC TRAP addresses the FPA
microcode location selected by CS bus lines 57, 56, and 55 (locations 0-7). These traps are normally
initiated for power-up and abort sequences.

If CS bus 71, 70, 57, and 56 are high and 55 is low, an FP trap is initiated. The FP trap selects an 8-bit
address previously stored in ID register 16, the maintenance register, to access one of 256 addresses in the
FPA microcode (locations 0-255). These trap locations normally handle FPA microdiagnostics. Refer
to Figure 2-34.
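A minimal decoding sketch of the CS-line protocol, following the CS 71, 70 encodings listed in Table 2-24 and treating CS 57 as the most significant of the three trap-select lines (an assumption). Taking the FP trap's 8-bit address from bits <23:16> of a register value is likewise an assumption, and all names are hypothetical. The sub-functions that CS 57, 56, 55 select under REDEFINE USI are returned undecoded.

```python
from typing import Optional, Tuple


def decode_cs(cs71: int, cs70: int, cs57_55: int) -> Tuple[str, Optional[int]]:
    """Decode the CPU accelerator control field (CS 71, 70) per Table 2-24."""
    field = ((cs71 & 1) << 1) | (cs70 & 1)
    code = cs57_55 & 0b111          # CS 57 taken as the most significant bit
    if field == 0b00:
        return "NOP", None
    if field == 0b01:
        return "CPSYNC", None
    if field == 0b10:
        return "ACC TRAP", code     # trap to FPA microcode location 0-7
    return "REDEFINE USI", code     # sub-function code left undecoded here


def fp_trap_address(maint_reg: int) -> int:
    """8-bit FP trap address (locations 0-255), assumed to occupy bits <23:16>."""
    return (maint_reg >> 16) & 0xFF
```

The split mirrors the two trap classes: an ACC trap can reach only eight fixed entry points, whereas an FP trap can enter any of the first 256 microcode locations.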


2.4 FPA Microcontrol Fields
This section summarizes all the fields in the FPA microcontrol word. Figure 2-35 shows the complete
microcontrol word, all the fields, and the microcode mnemonics. Table 2-26 lists the function of each
field.
[Figure 2-35 FPA Control Word Fields (TK-0513)]

Table 2-26 FPA Control Word Field Definitions

Microcode Bits    Field                                      Function
47:39 (9 bits)    NAD - Next Address                         Contains the address of the next control word to be accessed.
38:36 (3 bits)    BEN - Branch Enable                        Selects signals to be used for next address calculations.
35:34 (2 bits)    AMXC - A Mux Control                       Selects the A input to the FCT exponent ALU.
33:32 (2 bits)    BMXC - B Mux Control                       Selects the B input to the FCT exponent ALU.
31:30 (2 bits)    EALUC - EALU Control                       Controls the FCT exponent ALU operation.
29 (1 bit)        FPSYNC - Floating-Point Synchronize        Transmits FPSYNC to the CPU.
28 (1 bit)        MCTL - Multiply Control                    Starts the FML and FMH fraction multiply operation.
27:24 (4 bits)    EAC - Exponent Processor Control           Controls FCT exponent processing.
23 (1 bit)        WAIT - Wait                                Controls FPA wait loop operation; stalls until CPSYNC.
22:20 (3 bits)    MSC - Miscellaneous Control                Controls miscellaneous FPA operations.
19:18 (2 bits)    NRC - Normalization Register Control       Controls the fraction normalize operation in the FNM.
17:16 (2 bits)    SCR - Scratchpad Control                   Handles the FPA general register copies on the FNM.
15:12 (4 bits)    BSC - Bus A - Bus B Data Source            Controls data transmission along the FPA buses.
11:8 (4 bits)     FADC - Fraction Processor Controls         Controls FAD fraction processing.
7:5 (3 bits)      SGNC - Sign Latch Controls                 Controls the sign calculation on the FCT.
4 (1 bit)         LRR - Load Remainder Register              Controls the remainder register (RR) on the FNM.
3:0 (4 bits)      OPLD - Operand Load (Multiplier Control)   Loads fractions for multiplication on the FML and FMH.
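The field boundaries in Table 2-26 can be captured in a lookup table and used to pack or unpack a 48-bit control word. A minimal sketch; the helper names are hypothetical, and the example word (next address 180, read here as hexadecimal, with MSC = 6, as the text gives for location IRD.1) is built only to exercise the code, not copied from the FPA ROM.

```python
# Field name -> (high bit, low bit), from Table 2-26
FIELDS = {
    "NAD": (47, 39), "BEN": (38, 36), "AMXC": (35, 34), "BMXC": (33, 32),
    "EALUC": (31, 30), "FPSYNC": (29, 29), "MCTL": (28, 28), "EAC": (27, 24),
    "WAIT": (23, 23), "MSC": (22, 20), "NRC": (19, 18), "SCR": (17, 16),
    "BSC": (15, 12), "FADC": (11, 8), "SGNC": (7, 5), "LRR": (4, 4), "OPLD": (3, 0),
}


def decode_microword(word: int) -> dict[str, int]:
    """Split a 48-bit FPA control word into its named fields."""
    if not 0 <= word < 1 << 48:
        raise ValueError("control words are 48 bits wide")
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, (hi, lo) in FIELDS.items()}


def encode_microword(**values: int) -> int:
    """Pack named field values into a 48-bit control word (unnamed fields are 0)."""
    word = 0
    for name, value in values.items():
        hi, lo = FIELDS[name]
        width = hi - lo + 1
        if not 0 <= value < 1 << width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word


word = encode_microword(NAD=0x180, MSC=6)
fields = decode_microword(word)
```

The field widths sum to exactly 48 bits, so every bit of the control word belongs to one field.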


2.5 FPA MICROCODE STRUCTURE
The FPA contains a 512-word by 48-bit memory. This memory provides microcontrol of
the FPA during normal operation and diagnostic programs for maintenance and troubleshooting.
About 225 locations are used for normal microcontrol, and 200 locations contain diagnostic programs. The
other locations are available for future use.
The microcontrol code has an IRD state (instruction register decode) and three fork points (A, B, and
C). The FPA remains in the IRD state until an FPA instruction is decoded. The FPA then enters A-fork to receive the operands. If both operands are registers or short literals, optimized routines are
entered and computation begins. Otherwise, B-fork is entered. If the second operand is not register
data, C-fork is entered; otherwise, a B-fork optimization is taken. Figure 2-36 shows the basic microcode structure and indicates the microcode starting addresses of the various routines.
2.6 FPA INTERFACE FIRMWARE
The CPU-FPA interaction is handled by specialized firmware located in the CPU's writeable control
store (WCS).
This firmware handles numerous interface tasks. For ADD, SUB, MUL, and DIV operations it
accepts and stores the FPA results and condition codes, and handles any exceptions flagged by the
FPA. For 3-operand op codes it calls specifier decoding microcode in the base machine to decode the
third operand. It also handles the special requirements of the EMOD, MULL, and POLY instructions.
The firmware is accessed when the FPA overrides the CPU microaddress by forcing µPC<12> to 1. This happens
when the FPA detects an execution or optimization exit at a CPU A-fork, B-fork, or C-fork for an
FPA-implemented instruction.
2.6.1 Major Interface Functions
This firmware coordinates the interface between the CP microcode and the FP microcode, including
the normal transfers of CPU data to the FPA, the transfer of FPA results back to the proper register in the CPU, and
various control signals for both normal and exception control.
Table 2-27 lists important macros and microorders that the FPA interface firmware uses to
generate and/or monitor the signals transferred between the CPU and the FPA.

[Figure 2-36 FPA Microcode Structure (TK-0511)]

Table 2-27 Interface Microcode

Name of Macro        Signal Monitored      Data        Function
                     or Generated          Transfer

ID-D.SYNC            CP SYNC generated     CPU → FPA   Gates the CPU D register's contents onto the ID bus and
                                                       generates CP SYNC. CP SYNC indicates that valid data is
                                                       on the bus.
D-ACCEL & SYNC       CP SYNC generated     FPA → CPU   Gates data placed on the DFMX bus by the FPA into the
                                                       D register. CP SYNC indicates that the FPA's data has
                                                       been accepted.
Q-ACCEL & SYNC       CP SYNC generated     FPA → CPU   Gates data placed on the DFMX bus by the FPA into the
                                                       Q register. CP SYNC indicates that the FPA's data has
                                                       been accepted.
ACCEL?*              FP SYNC monitored     FPA → CPU   ACC<UB0> = 1: Result data (on the DFMX bus) and
(BEN/ACC<UB2,                                          condition codes are being transmitted by the FPA. In
UB1, UB0>)†                                            double precision, the condition codes are passed with
                                                       the first half.
                     ERR SYNC monitored                ACC<UB1> = 1: An exception has been detected by the
                                                       FPA. This initiates specialized routines that handle the
                                                       exception.
                     Not MULL** generated              ACC<UB2> = 1: Separates MULL and MULF.
POLY.DONE            POLY.DONE generated   CPU → FPA   Indicates that the last coefficient in the POLY operation
                                                       is being presented. In POLYD, used while both halves of
                                                       the last coefficient are transmitted.
TRAP.ACC[1]          Accelerator Trap                  Returns the FPA microcode to the IRD state.
MSC/LOAD.ACC.CCT                                       Loads PSW <N,Z,V,C> with the FPA-generated condition
                                                       codes from the CPU latches loaded in the previous cycle.

* This macro, in combination with the target constraint block, enables the CP microcode to test for various
conditions.
† This is a microorder rather than a macro.
** This is a condition rather than a specific signal.

2.6.2 Major Instruction Groups
The FPA firmware can be broken into four groups of routines: the generalized instruction handler, the POLY
handler, the MULL handler, and the EMOD handler.
Group I handles all ADD, SUB, MUL, and DIV instructions as well as FPA exceptions. This group
provides optimized flows for operands located in the general register set and for literal operands.
The POLY group transmits the polynomial coefficients to the FPA as they are needed and transmits
POLY DONE when the last coefficient has been transmitted. It also responds to the FPA's detection of
overflow, underflow, and a reserved-operand coefficient. Overflow and reserved operand detections
cause a branch to exception condition routines in the base machine. If an underflow is detected, the
firmware records it and continues execution of the POLY flows.
The MULL routine accepts the result of the longword integer multiplication from the FPA. Because the
FPA forms an unsigned 64-bit product from the 32-bit signed operands, the firmware must correct the
result by subtracting out the effects of the negative signs on the magnitude result. To do this, the
firmware stores the operands in a form that can later be used as subtrahends to correct the
product and, based on this stored information, selects the correction sequence to apply when the
result is transmitted from the FPA. The firmware also forms the properly signed result, sets the condition codes, and tests for overflow.
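The sign correction can be illustrated with a small integer model. This is a hypothetical Python sketch, not the FP780 microcode: it assumes the FPA multiplies the two 32-bit operand bit patterns as unsigned integers, so a negative operand's two's-complement pattern exceeds its true value by 2^32. Writing A and B for the unsigned patterns, a * b = A * B - 2^32 * (B if a < 0) - 2^32 * (A if b < 0) (mod 2^64), which is why each operand is worth saving as a possible subtrahend for the high longword. The names `fpa_unsigned_multiply` and `mull` are invented; the condition codes follow the VAX MULL definition (N and Z from the longword result, V on overflow, C cleared).

```python
from typing import Dict, Tuple

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def fpa_unsigned_multiply(a_bits: int, b_bits: int) -> int:
    """Stand-in for the FPA: multiply two 32-bit patterns as unsigned integers."""
    return (a_bits & MASK32) * (b_bits & MASK32)


def mull(a: int, b: int) -> Tuple[int, int, Dict[str, bool]]:
    """Return (exact 64-bit product, 32-bit MULL result, condition codes)."""
    a_bits, b_bits = a & MASK32, b & MASK32
    # Before the FPA result arrives, save the subtrahends: if an operand is
    # negative, the *other* operand must be subtracted from the high longword.
    subtrahend = 0
    if a_bits >> 31:
        subtrahend += b_bits
    if b_bits >> 31:
        subtrahend += a_bits
    # Correct the unsigned product, then reinterpret it as signed 64-bit.
    product = (fpa_unsigned_multiply(a_bits, b_bits) - (subtrahend << 32)) & MASK64
    full = product - (1 << 64) if product >> 63 else product
    # MULL keeps only the low longword; overflow if it cannot hold the full product.
    low = product & MASK32
    result = low - (1 << 32) if low >> 31 else low
    cc = {"N": result < 0, "Z": result == 0, "V": full != result, "C": False}
    return full, result, cc


print(mull(-3, 7))  # prints (-21, -21, {'N': True, 'Z': False, 'V': False, 'C': False})
```

Because the subtrahends depend only on the operands, they can be prepared while the multiply is in progress, which matches the text's point that the correction sequence is chosen before the FPA returns its product.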
The FPA performs only the fraction multiplication of the EMOD instructions, so the EMOD firmware is relatively short. While the FPA performs the fraction multiply, this routine adds the exponents
and checks for reserved operands. It then accepts the fraction product from the FPA, checks for a zero
result, and formats the FPA result so that control can return to the EMOD routines in the base machine.
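The division of labor for the multiply portion can be sketched for F_floating operands given as unpacked fields. In this hypothetical model (not the FP780 microcode), `fpa_fraction_multiply` stands in for the FPA and only multiplies the two 24-bit fractions with their hidden bits restored, while `multiply_f` plays the firmware's role: it checks for reserved operands (sign 1, exponent 0), adds the excess-128 exponents, handles zero, and renormalizes the product. EMOD's 8-bit multiplier extension, the separation of the integer and fraction parts, and exponent overflow/underflow checks are all omitted; every name here is invented.

```python
from typing import NamedTuple, Tuple


class FFields(NamedTuple):
    sign: int  # 0 or 1
    exp: int   # 8-bit excess-128 exponent; 0 means zero (or reserved if sign is 1)
    frac: int  # 23 stored fraction bits; the leading 1 is hidden


class ReservedOperand(Exception):
    """Raised for sign = 1, exponent = 0 (a reserved operand)."""


def fpa_fraction_multiply(s1: int, s2: int) -> int:
    """Stand-in for the FPA: multiply two 24-bit significands into 48 bits."""
    return s1 * s2


def multiply_f(x: FFields, y: FFields) -> Tuple[int, int, int]:
    """Return (sign, biased exponent, 48-bit significand); all zero for a zero result."""
    for op in (x, y):
        if op.sign == 1 and op.exp == 0:
            raise ReservedOperand(op)
    if x.exp == 0 or y.exp == 0:
        return 0, 0, 0
    sign = x.sign ^ y.sign
    # Firmware: add the excess-128 exponents while the FPA multiplies.
    exp = x.exp + y.exp - 128
    sig = fpa_fraction_multiply((1 << 23) | x.frac, (1 << 23) | y.frac)
    # Each fraction is in [0.5, 1), so the product is in [0.25, 1):
    # at most one left shift restores normalization.
    if not sig >> 47:
        sig <<= 1
        exp -= 1
    return sign, exp, sig


def to_float(sign: int, exp: int, sig: int) -> float:
    """Decode the sketch's result: value = (sig / 2**48) * 2**(exp - 128)."""
    if exp == 0:
        return 0.0
    return (-1.0) ** sign * (sig / 2**48) * 2.0 ** (exp - 128)


# 0.75 (exp 128, fraction 0.11b) times 2.0 (exp 130, fraction 0.1b)
print(to_float(*multiply_f(FFields(0, 128, 0x400000), FFields(0, 130, 0))))  # prints 1.5
```

The exponent addition needs nothing from the fraction product except the final one-bit normalization adjustment, which is why it can overlap the FPA's multiply.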


FP780
FLOATING-POINT ACCELERATOR
TECHNICAL DESCRIPTION
EK-FP780-TD-001


Additional copies of this document are available from:
Digital Equipment Corporation
444 Whitney Street
Northboro, MA 01532
Attention: Communications Services (NR2/M15)
Customer Services Section
Order No. EK-FP780-TD-001
