MISC-6840850E
August 1991
1022 pages
Original
51MB
Document: NVAX CPU Chip Functional Specification
Order Number: MISC-6840850E
Revision:
Pages: 1022
Original Filename:
OCR Text
NVAX CPU Chip Functional Specification

The NVAX CPU Chip is a high-performance, single-chip implementation of the VAX Architecture for use in low-end and mid-range systems.

Revision/Update Information: This is Revision 1.2 of this specification, which supersedes Revision 1.1 released in August 1991. The information in this specification reflects pass 2 of the NVAX CPU chip. Only the Electrical Characteristics Chapter was updated from Revision 1.1 to Revision 1.2.

DIGITAL CONFIDENTIAL

This information shall not be disclosed to persons other than DIGITAL employees or generally distributed within DIGITAL. Distribution is restricted to persons authorized and designated by the originating organization. This document shall not be transmitted electronically, copied unless authorized by the originating organization, or left unattended. When not in use, this document shall be stored in a locked storage area. These restrictions are enforced until this document is reclassified by the originating organization.

Semiconductor Engineering Group
Digital Equipment Corporation, Hudson, Massachusetts
December 1991

The drawings and specifications in this document are the property of Digital Equipment Corporation and shall not be reproduced or copied or used in whole or in part as the basis for the manufacture or sale of items without written permission.

The information in this document may be changed without notice and is not a commitment by Digital Equipment Corporation. Digital Equipment Corporation is not responsible for any errors in this document.

This specification does not describe any program or product that is currently available from Digital Equipment Corporation, nor is Digital Equipment Corporation committed to implement this specification in any program or product.
Digital Equipment Corporation makes no commitment that this document accurately describes any product it might ever make.

Copyright © 1989, 1990, 1991 by Digital Equipment Corporation
All Rights Reserved
Printed in U.S.A.

The following are trademarks of Digital Equipment Corporation: ULTRIX, ULTRIX-32, DEC, DECnet, DECUS, MicroVAX, MicroVMS, PDP, UNIBUS, VAX, VAXBI, VAXcluster, VAXstation, VMS, VT

Contents

CHAPTER 1 INTRODUCTION
1.1 SCOPE AND ORGANIZATION OF THIS SPECIFICATION
1.2 RELATED DOCUMENTS
1.3 TERMINOLOGY AND CONVENTIONS
1.3.1 Numbering
1.3.2 UNPREDICTABLE and UNDEFINED
1.3.3 Ranges and Extents
1.3.4 Must be Zero (MBZ)
1.3.5 Should be Zero (SBZ)
1.3.6 Register Format Notation
1.3.7 Timing Diagram Notation
1.4 REVISION HISTORY

CHAPTER 2 ARCHITECTURAL SUMMARY
2.1 OVERVIEW
2.2 VISIBLE STATE
2.2.1 Virtual Address Space
2.2.2 Physical Address Space
2.2.2.1 Physical Address Control Registers
2.2.3 Registers
2.3 DATA TYPES
2.4 INSTRUCTION FORMATS AND ADDRESSING MODES
2.4.1 Opcode Formats
2.4.2 Addressing Modes
2.4.3 Branch Displacements
2.5 INSTRUCTION SET
2.6 MEMORY MANAGEMENT
2.6.1 Memory Management Control Registers
2.6.2 System Space Address Translation
2.6.3 Process Space Address Translation
2.6.3.1 P0 Region Address Translation
2.6.3.2 P1 Region Address Translation
2.6.4 Page Table Entry
2.6.5 Translation Buffer
2.7 EXCEPTIONS AND INTERRUPTS
2.7.1 Interrupts
2.7.1.1 Interrupt Control Registers
2.7.2 Exceptions
2.7.2.1 Arithmetic Exceptions
2.7.2.2 Memory Management Exceptions
2.7.2.3 Emulated Instruction Exceptions
2.7.2.4 Vector Unit Disabled Fault
2.7.2.5 Machine Check Exceptions
2.7.2.6 Console Halts
2.8 SYSTEM CONTROL BLOCK
2.8.1 System Control Block Vectors
2.8.2 System Control Block Layout
2.9 CPU IDENTIFICATION
2.10 SYSTEM IDENTIFICATION
2.11 PROCESS STRUCTURE
2.12 PROCESSOR REGISTERS
2.13 I/O SPACE ADDRESSES
2.14 REVISION HISTORY

CHAPTER 3 NVAX CHIP INTERFACE
3.1 INTRODUCTION
3.2 NVAX CPU PINOUT
3.2.1 NDAL Signals and Timing
3.2.1.1 P%CPU_REQ_L
3.2.1.2 P%CPU_HOLD_L
3.2.1.3 P%CPU_SUPPRESS_L
3.2.1.4 P%CPU_GRANT_L
3.2.1.5 P%CPU_WB_ONLY_L
3.2.1.6 P%NDAL_H<63:0>
3.2.1.7 P%CMD_H<3:0>
3.2.1.8 P%ID_H<2:0>
3.2.1.9 P%PARITY_H<2:0>
3.2.1.10 P%ACK_L
3.2.2 Clocking signals
3.2.2.1 P%OSC_H, P%OSC_L
3.2.2.2 P%OSC_TC1_H, P%OSC_TC2_H
3.2.2.3 P%OSC_TEST_H
3.2.2.4 P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, P%PHI41_OUT_H
3.2.2.5 P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, P%PHI41_IN_H
3.2.2.6 P%ASYNC_RESET_L
3.2.2.7 P%SYS_RESET_L
3.2.3 Interrupt and Error Signals
3.2.3.1 P%MACHINE_CHECK_H
3.2.3.2 P%IRQ_L<3:0>
3.2.3.3 P%H_ERR_L
3.2.3.4 P%S_ERR_L
3.2.3.5 P%INT_TIM_L
3.2.3.6 P%PWRFL_L
3.2.3.7 P%HALT_L
3.2.4 Cache Interface signals
3.2.4.1 P%TS_INDEX_H<20:5>
3.2.4.2 P%TS_OE_L
3.2.4.3 P%TS_WE_L
3.2.4.4 P%TS_TAG_H<31:17>
3.2.4.5 P%TS_ECC_H<5:0>
3.2.4.6 P%TS_OWNED_H
3.2.4.7 P%TS_VALID_H
3.2.4.8 P%DR_INDEX_H<20:3>
3.2.4.9 P%DR_OE_L
3.2.4.10 P%DR_WE_L
3.2.4.11 P%DR_DATA_H<63:0>
3.2.4.12 P%DR_ECC_H<7:0>
3.2.5 Test Pins
3.2.5.1 P%TEST_DATA_H
3.2.5.2 P%TEST_STROBE_H
3.2.5.3 P%DISABLE_OUT_L
3.2.5.4 P%TEMP_H
3.2.5.5 P%TMS_H
3.2.5.6 P%TCK_H
3.2.5.7 P%TDI_H
3.2.5.8 P%TDO_H
3.2.5.9 P%PP_CMD_H<2:0>
3.2.5.10 P%PP_DATA_H<11:0>
3.3 THE NDAL
3.3.1 Terms
3.3.2 NDAL Clocking
3.3.3 NDAL Arbitration
3.3.3.1 NDAL Arbitration Signals
3.3.3.1.1 P%CPU_REQ_L
3.3.3.1.2 IO1_REQ_L
3.3.3.1.3 IO2_REQ_L
3.3.3.1.4 P%CPU_HOLD_L
3.3.3.1.5 IO1_HOLD_L
3.3.3.1.6 IO2_HOLD_L
3.3.3.1.7 P%CPU_SUPPRESS_L
3.3.3.1.8 IO1_SUPPRESS_L
3.3.3.1.9 IO2_SUPPRESS_L
3.3.3.1.10 P%CPU_GRANT_L
3.3.3.1.11 IO1_GRANT_L
3.3.3.1.12 IO2_GRANT_L
3.3.3.1.13 P%CPU_WB_ONLY_L
3.3.3.1.14 IO1_WB_ONLY_L
3.3.3.1.15 IO2_WB_ONLY_L
3.3.3.2 NDAL Arbitration Timing
3.3.3.3 NDAL Suppress and Its Timing
3.3.3.4 NDAL Arbitration Rules
3.3.4 NDAL Information Transfer
3.3.4.1 P%NDAL_H<63:0>
3.3.4.1.1 Address Field
3.3.4.1.2 Byte Enable Field
3.3.4.1.2.1 I/O space writes
3.3.4.1.3 Length Field
3.3.4.2 P%CMD_H<3:0>
3.3.4.3 P%ID_H<2:0>
3.3.4.4 P%PARITY_H<2:0>
3.3.4.5 P%ACK_L
3.3.5 NDAL Transactions
3.3.5.1 Reads and Fills
3.3.5.1.1 Dstream Read Requests (DREAD)
3.3.5.1.2 Istream Read Requests (IREAD)
3.3.5.1.3 Ownership Read Requests (OREAD)
3.3.5.1.4 How memory handles reads to Owned blocks
3.3.5.1.5 Read cycle description and timing
3.3.5.1.6 Read Data Return cycles (RDR0, RDR1, RDR2, RDR3)
3.3.5.1.7 Read data error cycles (RDE)
3.3.5.1.8 Read data cycle description and timing
3.3.5.1.9 Read Transaction Examples
3.3.5.1.9.1 Quadword Read and Fill
3.3.5.1.9.2 Multiple Quadword Reads
3.3.5.2 Writes
3.3.5.2.1 Normal Write Transactions (WRITE)
3.3.5.2.2 Disown Write Transactions (WDISOWN)
3.3.5.2.3 Write Data and Bad Write Data (WDATA, BADWDATA)
3.3.5.2.4 Write transaction description and timing
3.3.5.2.5 Write Transaction Examples
3.3.5.2.5.1 Quadword Writes
3.3.5.2.5.2 Multiple Quadword Writes
3.3.5.3 NOPs
3.3.6 Cache Coherency
3.3.7 Interrupts
3.3.8 Clear Write Buffer
3.3.9 VAX architecturally-defined Interlocks
3.3.9.1 Ownership and Interlock transactions
3.3.10 Errors
3.3.10.1 Transaction Timeout
3.3.10.2 Non-existent memory and I/O
3.3.10.3 Error Handling
3.3.10.4 Error Recovery
3.3.11 NDAL Initialization
3.4 THE XMI-2 NVAX SYSTEM
3.4.1 Cache coherency in the XMI2 system
3.5 THE LOWEND NVAX SYSTEM - OMEGA
3.6 RESOLVED ISSUES
3.7 NVAX CHIP INTERFACE SIGNAL NAME CROSS-REFERENCE
3.8 REVISION HISTORY

CHAPTER 4 CHIP OVERVIEW
4.1 NVAX CPU CHIP BOX AND SECTION OVERVIEW
4.1.1 The Ibox
4.1.2 The Ebox and Microsequencer
4.1.3 The Fbox
4.1.4 The Mbox
4.1.5 The Cbox
4.1.6 Major Internal Buses
4.2 REVISION HISTORY

CHAPTER 5 MACROINSTRUCTION AND MICROINSTRUCTION PIPELINES
5.1 INTRODUCTION
5.2 PIPELINE FUNDAMENTALS
5.2.1 The Concept of a Pipeline
5.2.2 Pipeline Flow
5.2.3 Stalls and Exceptions in an Instruction Pipeline
5.3 NVAX CPU PIPELINE OVERVIEW
5.3.1 Normal Macroinstruction Execution
5.3.1.1 The Ibox
5.3.1.2 The Microsequencer
5.3.1.3 The Ebox
5.3.1.4 The Fbox
5.3.1.5 The Mbox
5.3.1.6 The Cbox
5.3.2 Stalls in the Pipeline
5.3.2.1 S0 Stalls
5.3.2.2 S1 Stalls
5.3.2.3 S2 Stalls
5.3.2.4 S3 Stalls
5.3.2.5 S4 Stalls
5.3.3 Exception Handling
5.3.3.1 Interrupts
5.3.3.2 Integer Arithmetic Exceptions
5.3.3.3 Floating Point Arithmetic Exceptions
5.3.3.4 Memory Management Exceptions
5.3.3.5 Translation Buffer Miss
5.3.3.6 Reserved Addressing Mode Faults
5.3.3.7 Reserved Operand Faults
5.3.3.8 Exceptions Occurring as the Consequence of an Instruction
5.3.3.9 Trace Fault
5.3.3.10 Conditional Branch Mispredict
5.3.3.11 First Part Done Handling
5.3.3.12 Cache and Memory Hardware Errors
5.4 REVISION HISTORY

CHAPTER 6 MICROINSTRUCTION FORMATS
6.1 EBOX MICROCODE
6.1.1 Data Path Control
6.1.2 Microsequencer Control
6.2 IBOX CSU MICROCODE
6.3 IBOX INSTRUCTION ROM AND CONTROL PLAS
6.4 REVISION HISTORY

CHAPTER 7 THE IBOX
7.1 OVERVIEW
7.1.1 Introduction
7.1.2 Functional Overview
7.1.3 The Pipeline
7.2 INSTRUCTION STREAM PREFETCHING
7.2.1 The VIC
7.2.1.1 VIC Control
7.2.1.2 VIC Reads
7.2.1.3 VIC Fills
7.2.1.4 VIC Writes
7.2.1.5 VIC Bypass
7.2.1.6 VIC Hits Under Miss
7.2.1.7 VIC Exceptions and Errors
7.2.1.8 PC Load Effects
7.2.1.9 E%STOP_IBOX_H Effects
7.2.1.10 Prefetch Stop Conditions
7.2.1.11 Prefetch Start Conditions
7.2.1.12 Prioritized List of Prefetch Start/stop Conditions
7.2.1.13 VIC Enable
7.2.1.14 VIC Flushing
7.2.1.15 Flushing IREFs
7.2.1.16 VIC Control and Error Registers
7.2.1.17 VIC Performance Monitoring Hardware
7.2.2 The Prefetch Queue
7.2.2.1 PC load effects
7.3 INSTRUCTION PARSING
7.3.1 VAX Instruction Format
7.3.2 The Instruction Burst Unit
7.3.2.1 Specifier Identification
7.3.2.2 Operand Access Types
7.3.2.3 DL stall
7.3.2.4 Driving SPEC_CTRL
7.3.2.5 PC and Delta_PC
7.3.2.6 Branch Displacement Processing
7.3.2.7 Ebox Assist Processing
7.3.2.8 Reserved Addressing Modes
7.3.2.9 Quadword Immediate Specifiers
7.3.2.10 Index Mode Specifiers
7.3.2.11 Loading a new opcode
7.3.2.12 Reserved Opcodes
7.3.2.13 Instruction Parse Completion
7.3.2.14 Operands with Access Type VR and VM
7.3.2.15 I%IMEM_MEXC_H and I%IMEM_HERR_H
7.3.2.16 IBU stop and restart conditions
7.3.2.17 First Part Done (FPD) Set
7.3.3 The Instruction Issue Unit
7.3.3.1 Issue Stall
7.3.3.2 PC Queue and PC loads
7.4 OPERAND SPECIFIER PROCESSING
7.4.1 Operand Queue Unit
7.4.1.1 Source Queue Interface
7.4.1.1.1 Short Literal Specifiers (Modes 0..3)
7.4.1.1.2 RMODE Specifiers (Mode 5)
7.4.1.1.3 Index Mode Specifiers (Mode 4)
7.4.1.1.4 All Other Addressing Modes
7.4.1.2 Destination Queue Interface
7.4.1.2.1 RMODE Specifiers (Mode 5)
7.4.1.2.2 Index Mode Specifiers (Mode 4)
7.4.1.2.3 All Other Addressing Modes
7.4.1.3 Queue Entry Allocation
7.4.1.4 MD Allocation
7.4.1.5 Specifier Bus Enable
7.4.1.6 E%STOP_IBOX and Branch Mispredict
7.4.2 Complex Specifier Unit
7.4.2.1 CSU Microcode Control
7.4.2.2 CSU Pipeline
7.4.2.2.1 S1 Pipeline Stage
7.4.2.2.2 S2 Pipeline Stage
7.4.2.2.3 S3 Pipeline Stage
7.4.2.3 RLOG
7.4.2.4 Branch Mispredict effects
7.4.2.5 E%STOP_IBOX Effects
7.4.2.6 RSVD_ADDR_FAULT effects
7.4.2.7 CSU Microcode Restrictions
7.4.2.8 Ibox IPR Transactions
7.4.2.8.1 IPR Reads
7.4.2.8.2 IPR Writes
7.4.3 Scoreboard Unit
7.4.3.1 E%STOP_IBOX and Branch Mispredict PC load Effects
7.5 BRANCH PREDICTION
7.5.1 Branch Prediction Unit
7.5.1.1 The Branch Prediction Algorithm
7.5.1.2 The Branch History Table
7.5.1.3 Branch Prediction Sequence
7.5.1.4 The Branch Queue
7.5.1.5 Branch Mispredict
7.5.1.6 Branch Stall
7.5.1.7 PC loads
7.5.1.8 Branch Prediction IPR Register
7.6 PC LOAD EFFECTS
7.6.1 Mispredict PC Loads
7.6.2 Ebox PC Loads
7.7 E%STOP_IBOX EFFECTS
7.8 INITIALIZATION
7.8.1 Mechanisms for Ibox State Reset
7.9 ERRORS, EXCEPTIONS, AND FAULTS
7.9.1 Overview
7.9.2 Istream Memory Errors
7.9.3 Dstream Memory Errors
7.9.4 Reserved Opcode Faults
7.9.5 Reserved Addressing Mode Faults
7.10 IBOX SIGNAL NAME CROSS-REFERENCE
7.11 TESTABILITY
7.11.1 Overview
7.11.2 Internal Scan Register and Data Reducer
7.11.3 Parallel Port
7.11.4 Architectural Features
7.12 PERFORMANCE MONITORING HARDWARE
7.12.1 Signals
7.13 REVISION HISTORY

CHAPTER 8 THE EBOX
8.1 CHAPTER OVERVIEW
8.2 INTRODUCTION
8.3 CHAPTER STRUCTURE
8.4 EBOX OVERVIEW
8.4.1 Microword Fields
8.4.1.1 Microsequencer Control Fields
8.4.2 The Register File
8.4.3 ALU and Shifter
8.4.3.1 Sources of ALU and Shifter Operands
8.4.3.2 ALU Functions
8.4.3.3 Shifter Functions
8.4.3.4 Destinations of ALU and Shifter Results
8.4.4 Ibox-Ebox Interface
8.4.5 Other Registers and States
8.4.6 Ebox Memory Access
8.4.7 CPU Control Functions
8.4.8 Ebox Pipeline
8.4.9 Pipeline Stalls
8.4.10 Microtraps, Exceptions, and Interrupts
8.5 EBOX DETAILED FUNCTIONAL DESCRIPTION
8.5.1 Register File
8.5.1.1 Register Groups
8.5.1.2 Access Ports
8.5.1.3 Register File Bypass Paths
8.5.1.4 Write Collisions
8.5.1.5 Valid, Fault, and Error Bits
8.5.2 Constant Generation
8.5.3 The ALU
8.5.3.1 ALU Condition Codes
8.5.3.2 SMUL Step Definition
8.5.3.3 UDIV Step Definition
8.5.4 The Shifter
8.5.4.1 Shifter Condition Codes
8.5.4.2 Shifter Sign
8.5.5 RMUX and E_BUS%WBUS_L
8.5.5.1 RMUX Produced Memory Request Signals
8.5.5.2 RMUX Produced E_BUS%WBUS_L Related Information
8.5.6 VA Register
8.5.7 Q Register
8.5.8 Bypassing of Results
8.5.9 Result Destinations
8.5.10 Miscellaneous Ebox Registers and States
8.5.10.1 PSL
8.5.10.1.1 Condition Code Alteration
8.5.10.1.2 Trace and Trace Pending Bits
8.5.10.2 SC
8.5.10.3 INT.SYS
8.5.10.4 MMGT.MODE
8.5.10.5 State Flags
8.5.10.5.1 E%MACHINE_CHECK_H
8.5.10.5.2 State Flags and Pipeline Abort
8.5.10.6 DL Part of the Instruction Context Register
8.5.10.7 Mask Processing Unit
8.5.11 Branch Condition Evaluator
8.5.12 Miscellaneous Ebox Operand Sources
8.5.12.1 S+PSW_EX
8.5.12.2 Population Counter
8.5.12.3 RN.MODE.OPCODE
8.5.12.4 PMFCNT Register
8.5.13 VAX Restart Bit
8.5.14 Ebox-Microsequencer Interface
8.5.14.1 Instruction Context Register
8.5.14.2 Microtest Fields
8.5.14.3 Miscellaneous Microsequencer Signals
8.5.14.4 Miscellaneous Ebox-to-Microsequencer Signals
8.5.15 Ebox-Ibox Interface
8.5.15.1 Ibox Counters
8.5.15.2 Source Queue
8.5.15.3 Destination Queue
8.5.15.4 Miscellaneous Queue Retire Information
8.5.15.5 Branch Queue
8.5.15.6 Operand and Branch Buses
8.5.15.7 Retire Queue
8.5.15.8 Field Queue
8.5.15.9 Retiring Instructions
8.5.15.10 First Part Done
8.5.15.11 Ebox to Ibox Commands and IPR Accesses
8.5.15.12 Loading The PC
8.5.15.13 Ebox to Ibox Flush Signals
8.5.15.14 Detecting Ibox Incurred Faults and Errors
8.5.16 Ebox-Fbox Interface
8.5.16.1 Fbox Opcode and Operand Delivery
8.5.16.2 Fbox Result Handling
8.5.16.3 Fbox Store Stall
8.5.16.4 Fbox Destination Scoreboard
• 8-54 8.5.16.5 Fbox Fault and Error Management • ~6 8.5.16.6 Ebox to Fbox Commands • ~6 8.5.16.7 Summary of Fbox-Ebox Signals • ~7 8.5.16.8 Fbox Disabled Mode • 8-58 Ebox-Mbox Interface 8.5.17.1 10 Read Synchronization • 8-63 8.5.17.2 Mbox-Ebox signals • 8-84 8.5.17.3 Ibox IPR Access and LOAD PC • 8-66 Ebox Vector Support Fault and Trap Management 8.5.19.1 Faults and Errors Detected in S4 • 8-68 8.5.19.1 .1 Coordinating Ebox and Fbox Faults and Errors • 8-68 8.5.19.1 .2 Breaking the S4 Stall • 8-09 8.5.19.2 Faults and Errors detected in S3 • 8-69 8.5.19.3 Integer Overflow and Branch Mispredict Traps • 8-69 8.5.19.4 Ebox Microtrap Handling • 8-70 8.5.19.5 Coincidence of Branch Mispredict Trap with other Traps • 8-70 8.5.19.6 Possible Microtrap Requests • ~71 8.5.19.7 Fbox Fault Reporting • ~71 Ebox Stalls 8.5.20.1 The STALL Microword • 8-74 8.5.20.2 Field Queue Stall • ~75 8.5.20.3 Ebox Stall Conditions • 8-75 8.5.20.4 Fbox and RMUX Related Stall Conditions • 8-76 Miscellaneous Operations Ebox IPRs 8.5.22.1 IPR 7C (hex), Patchable Control Store Control Register • 8-80 8.5.22.2 IPR 70 (hex), Ebox Control Register • 8-81 Initialization Timing Error Detection 8.5.25.1 S3 Stall limeout • 8-84 8.5.25.1 .1 Testing the S3 Stalllimeout Timer • 8-86 Testability 8.5.26.1 Parallel Port Test Features • 8-87 8.5.26.2 E%WBUS_H<31 :0> LFSR • 8-90 8-52 8-59 8-67 8-67 8-72 8-77 8-79 8-84 8-84 8-84 8-87 DIGITAL CONFIDENTIAL Contents 8.5.27 8.5.28 8.5.29 CHAPTER 9 9.1 9.2 Microcode Restrictions 8.5.27.1 Register Access Restriction • 8-91 8.5.27.2 FLUSH.PAQ Restriction • 8-91 8.5.27.3 Memory access restrictions • 8-91 8.5.27.4 Shifter Restrictions • 8-91 8.5.27.5 SHIFT.SIGN Restriction • 8-92 8.5.27.6 MMGT.MODE Restrictions • 8-92 8.5.27.7 MPU Restrictions • 8-92 8.5.27.8 Microbranch Condition Restrictions • 8-92 8.5.27.9 Ibox IPR read restriction • 8-92 8.5.27.10 RETIRE.lNSTRUCTION • 8-92 8.5.27.11 VAX Restart Bit Restriction • 8-92 8.5.27.12 Q Register Interaction With SMUL.STEP and 
UDIV.STEP • 8-92 8.5.27.13 UDIV/SMUL Restrictions • 8-93 8.5.27.14 F.DEST.CHECK Restrictions • 8-93 8.5.27.15 Fbox Operand Delivery Restriction • 8-93 8.5.27.16 RMUX control Restrictions • 8-93 8.5.27.17 Control Bits • 8-93 8.5.27.18 Microtrap Dispatch and RESET.CPU Restrictions • 8-93 8.5.27.18.1 Microtrap Flows • 8-93 8.5.27.18.2 MISC/RESET.CPU Restrictions • 8-94 8.5.27.18.3 Asynchronous Hardware Error Microtrap Restriction • 8-94 8.5.27.18.4 Rrst Part Done Dispatch Restriction • 8-94 8.5.27.19 PSL Use Restrictions • 8-94 8.5.27.20 S+PSW Restrictions • 8-96 8.5.27.21 RN.MODE.OPCODE Restrictions • 8-96 Signal Name Cross-Reference Revision History 8-91 8-97 8-99 THE MICROSEQUENCER 9-1 OVERVIEW 9-1 FUNCTIONAL DESCRIPTION Introduction Control Store 9.2.2.1 Patchable Control Store • 9-3 Loading the Patchable Control Store • 9-3 9.2.2.1.1 Microsequencer Control Field of Microcode • 9-8 9.2.2.2 9.2.2.2.1 Jump Format • 9-9 9.2.2.2.2 Branch Format • 9-10 9.2.2.3 M IB Latches • 9-1 0 Next Address Logic 9.2.3 9.2.3.1 CAL and CAL INPUT BUS • 9-11 9.2.3.1.1 Microtest Bus • 9-12 9.2.3.2 Microtrap Logic • 9-13 9.2.3.2.1 Microtraps • 9-13 9.2.3.2.2 Microtrap Request liming • 9-15 9.2.3.2.3 Prioritization of Microtraps • 9-15 Erroneous Microtrap Interruption • 9-16 9.2.3.2.4 Microtrap Detection Abort Effects • 9-17 9.2.3.2.5 9.2.3.3 Last Cycle Logic • 9-18 9.2.3.3.1 Interrupts • 9-19 9.2.3.3.2 Trace Fault • 9-1 9 9-1 9-1 9-3 9.2.1 9.2.2 DIGITAL CONFIDENTIAL 9-11 xiii Contents 9.2.4 9.2.3.3.3 First Part Done • 9-19 9.2.3.3.3.1 Interaction with Reserved Instructions • 9-19 9.2.3.3.4 Instruction Queue • 9-20 9.2.3.3.4.1 Instruction Context Latches • 9-22 9.2.3.4 Microstack • 9-22 Stall Logic 9.3 INITIALIZAnON 9-24 9.4 MICROCODE RESTRICTIONS 9-25 9.5 TESTABIUTY 9.5.1 Test Address 9.5.2 MlB Scan Chain 9-25 9-25 9-26 9.6 SIGNAL CROSS REFERENCE 9-30 9.7 REVISION HISTORY 9-33 CHAPTER 10 THE INTERRUPT SECTION 10-1 10.1 OVERVIEW 10-1 10.2 INTERRUPT SUMMARY External Interrupt Requests 
Received by Edge-SensHlve Logic 10.2.1 External Interrupt Requests Received by Level-Sensltlve Logic 10.2.2 Internal Interrupt Requests 10.2.3 Special Considerations for Interval Timer Interrupts 10.2.4 10.2.5 Priority of Interrupt Requests 10-1 10-2 10-2 10-4 10-5 INTERRUPT SECTION STRUCTURE 10.3.1 Edge Detect and Synchronization Logic 10.3.1.1 Edge Detect Circuitry • 10-8 10.3.1.2 Interrupt Synchronization • 10-9 10.3.2 Interrupt State Register 10.3.3 Interrupt Generation Logic 10-8 10-8 10-9 10-10 10.4 EBOX MICROCODE INTERFACE 10-12 10.5 PROCESSOR REGISTER INTERFACE 10-13 10.6 INTERRUPT SECTION INTERFACES Ebox Interface 10.6.1 10.6.1.1 Signals From Ebox • 10-14 H)"'6.1.2 Signals To Ebox • 10-14 M1crosequencer Interface 10.6.2 10.6.2.1 Signals from Microsequencer • 10-14 10.6.2.2 Signals To Microsequencer • 10-15 Cbox Interface 10.6.3 10.6.3.1 Signals From Cbox • 10-15 Ibox Interface 10.6.4 10.6.4.1 Signals From Ibox • 10-15 10.6.5 Mbox Interface 10.6.5.1 Signals From Mbox • 10-15 10.6.6 Pin Interface 10.6.6.1 Input Pins • 10-15 10.6.7 Signal Dictionary 10-14 10-14 10.3 xlv 9-24 10-6 10-14 10-15 10-15 10-15 10-15 10-15 DIGITAL CONFIDENTIAL Contents REVISION HISTORY 10-17 THE FBOX 11-1 11.1 OVERVIEW 11-1 11.2 INTRODUCTION 11-1 11.3 FBOX FUNCTIONAL OVERVIEW Fbox Interface 11.3.1 Divider 11.3.2 Stage 1 11.3.3 Stage 2 11.3.4 Stage 3 11.3.5 Stage 4 11.3.6 11-2 11-3 11-4 11-4 11-4 11-4 11-4 11.4 FBOX-EBOXINTERFACE Opcode Transfers to the Fbox 11.4.1 Operand Transfers to the Fbox 11.4.2 11.4.3 SummarY of Fbox Input Stage Stall Rules 11.4.4 Fbox Result Transfers to the Ebox 11.4.5 Fbox Pipeline Stalls Fbox Reset and Flush 11.4.6 11.4.7 Summary of Fbox-Ebox Signals 11.4.8 Fbox Instruction Set 11-4 11-5 11-6 11-7 11-8 11-10 11-11 11-11 11-12 11.5 DIVIDER 11.5.1 11.5.2 11-15 11-15 11-16 10.7 CHAPTER 11 Introduction Overview 11.6 INTERFACE SIGNAL TIMING DIAGRAMS 11-17 11.7 DIVIDER OPERATION 11-17 11.8 DIVIDER IMPLEMENTATION 11.8.1 Divider Fraction Data Path 11 .8.1.1 Divisor 
Register - DVR • 11-20 11 .8.1 .2 Divider Array • 11-20 11.8.1.2.1 DCSA and DSEL • 11-20 11.8.1.2.2 LAT1· 11-21 11.8.1.2.3 R2D and DCSAF· 11-21 11 .8.1.2.4 DFB and SHF • 11-21 CPA· 11-22 11.8.1.2.5 11.8.1.3 Ouotient Recoding and Quotient Registers • 11-23 11.8.1.3.1 OS21 and OREC • 11-24 11 .8.1.3.2 OM and OS registers • 11-24 11.8.1.3.3 OSEL and TSF· 11-25 11.8.2 Divider Control 11 .8.2.1 Divider Control Blocks • 11-26 11 .8.2.1 .1 Control Sequencer • 11-26 11.8.2.1.2 Opcode Information Latches • 11-27 11 .8.2.1 .3 Divider Behavior during ABORT • 11-27 11 .8.2.1 .4 Data path Control Drivers • 11-27 11 .8.2.2 Summary of Divider Stage Outputs • 11-27 11 .8.2.3 Data Valid Logic • 11-28 11-19 11-19 DIGITAL CONFIDENTIAL 11-25 xv Contents 11.8.3 11-28 11.9 STAGE 1 11-28 11.10 SECnON IMPLEMENTATION DESCRIPTION 11.10.1 Fraction Datapath 11.10.2 Integer Overflow - IOVF 11.10.3 Input Selector - ISEL Adder 11.10.4 Recoder Selector - RSEL 11.10.5 11.10.6 SRECODER 11.10.7 Multiplier 'TWo's Complement Register - MTCR<18:0> Recoder 11.10.8 11.10.9 PHI_4 LATCHES 11.10.10 Recoder Register - MRECR[0:6]<5:0> Multiplier Initial Partial Product Selector and Register - MlPPR 11.10.11 11.10.12 Multiplier Row 1 Selector and Register - MRW1 R 11.10.13 Multiplier Row 2 Selector and Register - MRW2R 11.10.14 Selector and Register - FD1 R 11.10.15 Selector and Register - FD2R 11-29 11-29 11-31 11-31 11-31 11-32 11-32 11-32 11-32 11-32 11-33 11-33 11-33 11-33 11-33 11-33 11.11 EXPONENT DATAPATH Stage 1 Exponent Processor Block diagram 11.11.1 Exponent Adders 11.11.2 11.11.3 Constants Zero Detection 11.11.4 11.11.5 Exponent Adder 1 11.11.6 Exponent Adder 2 Exponent Difference Detection 11.11.7 Output Selector 11.11.8 11-35 11-35 11-36 11-36 11-37 11-37 11-37 11-38 11-38 11.12 SIGN DATAPATH 11-39 11.13 STAGE 1 CONTROL 11.13.1 Divide Instruction 11-39 11-39 11.14 FRACTION DATAPATH OPERATION SUMMARY 11-40 11.15 FRACTION DATAPATH EXCEPTION SUMMARY 11-40 11.16 EXPONENT DATAPATH OPERATION SUMMARY 
11-42 11.17 EXPONENT DATAPATH EXCEP110N SUMMARY Passthru Signals 11.17.1 11-43 11-43 11.18 STAGE 2 11.18.1 11.18.2 Introduction MUL Instruction Flows 11-44 11-44 11-44 STAGE 2 IMPLEMENTATION DESCRIPTION Fraction Datapath 11.19.1 MSEL - Multiplier Selector 11.19.2 11.19.3 MROW1 - Multiplier Row 1 MROW2 - Multiplier Row 2 11.19.4 MARRAY - Multiplier Array 11.19.5 11-47 11-47 11-49 11-49 11-50 11-50 11.19 xvi Exponent and Sign Data Path DIGITAL CONFIDENTIAL Contents 11.19.6 11.19.7 11.19.8 11.19.9 11.19.10 11.19.11 11.19.12 11.19.13 11.19.14 11.19.15 11.19.16 11.19.17 11.19.18 11.19.19 11.19.20 11.19.21 11.19.22 11.19.23 11.19.24 11.19.25 11.19.26 11.19.27 11.19.28 11.19.29 11.19.30 11.20 STAGE 3 11.20.1 11.20.2 11.20.3 11.20.4 11.20.5 11.20.6 DIGITAL CONFIDENTIAL MILSBSR<S:O> - Multiplier Integer LSB Sum Register MILSBCR<4:0> - Multiplier Integer LSB Carry Register RSHIFT - Right Shifter RSHFTOR<AO:B58> - Right Shifter Output Register SDEC - Shift Decoders SDECOR<57:0> - Shift Decoder Output Register DETL - Detection Logic DETLOR<BO:B57> - Detection Logic Output Register L 1DETL - Leading 1 Detection Logic LSSEL - Left Shift Selector LSENC - Left Shift Encoder LSHR<57:0> - Left Shifter Control Register FD1 SEL - Fraction Data 1 Selector FD1 R<AO:B58> - Stage 2 Fraction Data 1 Register FD2R<AO:B58> - Stage 2 Fraction Data 2 Register Exponent Datapath Zero Detection Exponent Adder 1 Floating Overflow and Underflow Detection Output Selector ED2R<5:0> - Exponent Data 2 Register Sign Datapath Control 11.19.28.1 Datapath Control Signals Output from Control Block • 11-58 Stage 2 Fraction Datapath Operation Summary Passthru Signals Introduction Stage 4 Bypass 11.20.2.1 Stage 4 Bypass Request • 11-64 11.20.2.2 Stage 4 Bypass Abort • 11-64 11.20.2.3 Stage 3 Response to FBOX Purge • 11-64 Section Implementation Description 11.20.3.1 Block Diagrams • 11-65 Fraction Datapath 11.20.4.1 Normalizer Input Selection • 11-69 11.20.4.2 Left Shifter • 11-69 11.20.4.3 Adder Input 
Selection • 11-70 11.20.4.4 Adder • 11-70 11.20.4.5 Mini-Round Incrementers • 11--72 11.20.4.6 Output Selector • 11-72 11.20.4.7 Fraction Datapath Operation Summary (Normal Operating Mode): • 11-73 Exponent Datapath 11.20.5.1 Constants • 11-74 -11.20.5.2 Zero Detection • 11-74 11.20.5.3 Exponent Adder 1 • 11-75 11.20.5.4 Output Selector • 11-75 11.20.5.5 Exponent Datapath Operation Summary (Normal Operating Mode): • 11-77 Sign Datapath 11-50 11-51 11-51 11-51 11-51 11-51 11-52 11-52 11-52 11-52 11-53 11-53 11-53 11-53 11-53 11-54 11-54 11-55 11-55 11-55 11-55 11-56 11-57 11-60 11-62 11-63 11-63 11-63 11-64 11-69 11-74 11-77 xvii Contents 11.20.7 Control 11.20.7.1 Miscellaneous Control Signals • 11-78 11.20.7.2 Oata_Valid • 11-78 11.20.7.3 Fault Bits and NEW_FOP • 11-78 11.20.7.4 Signs_NoCEql, Fb_Neg4 • 11-79 11.20.7.5 Integer Overflow Logic • 11-79 11.20.7.6 Cin_BSS • 11-80 11.20.7.7 SeLOther • 11-80 11.20.7.8 Left Shifter Input Selection Signals • 11-80 11.20.7.9 Osel1_Zero • 11-81 11.20.7.10 0se11_Ed1r • 11-81 11.20.7.11 MULL Adder • 11-81 11.21 STAGE 4 11-$1 11.22 FRACTION DATAPATH 11.22.1 Fraction Implementation Descrl ptlon 11.22.2 Fraction Operation 11-$2 11-$3 11-84 11.23 EXPONENT DATAPATH 11.23.1 Exponent Block Description 11.23.2 Exponent Operation 11.23.3 Floating Overflow and Underflow Detection 11.23.4 Output Selector 11-86 11-$7 11-$7 11-$7 11-$8 11.24 CONTROL 11.24.1 Control Block Description 11.24.2 Control Block Implementation 11-90 11-90 11-90 11.25 MISCELLANEOUS AND SIGN LOGIC 11.25.1 Miscellaneous Sign Logic Implementation 11.25.2 Sign and Negative Result Logic 11.25.3 Integer Overflow 11.25.4 Zero Result 11.25.5 Reserved Operand 11.25.6 Floating Divide by Zero 11-91 11-91 11-92 11-93 11-95 11-96 11-96 11.26 FBOX TESTABIUTY 11.26.1 FBOX_Test Control Signals 11.26.2 FBOX_Test Mode Description 11.26.2.1 FBOX Section Operation During FBOX_Test Mode • 11-97 11.26.3 Revision History 11-97 11-97 11-97 CHAPTER 12 THE MBOX xvIII 11-77 11-99 12-1 12.1 
INTRODUCTION 12-1 12.2 MBOX STRUCTURE 12.2.1 IREF_LATCH 12.2.2 SPEC_QUEUE 12.2.3 EM_LATCH VAP_LATCH 12.2.4 12.2.5 MME_LATCH 12.2.6 RTY_OMISS_LATCH 12.2.7 CBOX_LATCH 12-2 12-6 12-8 12-9 12-11 12-12 12-14 12-16 DIGITAL CONFIDENTIAL Contents 12.2.8 12.2.9 12.2.10 12.2.11 12.2.12 12.2.13 12.2.14 12.2.15 12.3 PA_QUEUE TB MME_DATAPATH ARBITRATION LOGIC S6_PIPELATCH DMISS_LATCH and IMISS_LATCH MD_BUS_ROTATOR Pcache REFERENCE PROCESSING 12.3.1 REFERENCE DEFINITIONS 12.3.2 SIMPLE MBOX PIPELINE FLOW 12.3.3 REFERENCE ORDER RESTRICTIONS 12.3.3.1 No D-stream hits under O-stream misses • 12-26 12.3.3.2 No I-stream hits under I-stream misses • 12-26 12.3.3.3 Maintain the order of writes • 12-27 12.3.3.4 Maintain the order of Cbox references • 12-27 12.3.3.5 Preserve the order of Ibox reads relative to any pending Ebox writes to the same quadword address • 12-27 12.3.3.6 110 Space Reads from the Ibox must only be executed when the Ebox is executing the corresponding instruction • 12-27 12.3.3.7 Reads to the same Pcache block as a pending read/fill operation must be inhibited • 12~8 12.3.3.8 Writes to the same Pcache block as a pending readltill operation must be inhibited until the readlfill operation completes • 12-28 12.3.4 REFERENCE ARBITRATION 12.3.4.1 Arbitration Priority • 12~8 12.3.4.2 Arbitration Algorithm • 12-29 12.3.5 READS 12.3.5.1 Generic Read-hit and Read-miss/Cache_fili Sequences • 12-30 12.3.5.1.1 Returning Read Data • 12-31 12.3.5.1.1.1 Pcache Data Bypass • 12-31 12.3.5.2 I-stream Read Processing • 12-31 12.3.5.2.1 I-stream Read Hits • 12-31 12.3.5.2.2 I-stream Read Misses • 12-32 12.3.5.2.3 1/0 Space I-stream Reads • 12--32 12.3.5.3 D-stream Read Processing • 12--32 12.3.5.3.1 Reads under Rlls • 12--33 12.3.5.4 110 Space Reads • 12-33 12.3.6 WRITES 12.3.6.1 Destination Specifier Writes • 12-35 12.3.6.2 Explicit Writes • 12-36 12.3.6.3 Writes to 110 Space • 12--36 12.3.6.4 Byte Mask Generation • 12-36 12.3.7 IPR PROCESSING 12.3.7.1 MBOX IPRs • 12-37 12.3.7.2 
Hardware MBOX IPR Format • 12-46 12.3.7.3 IPR Reads • 12-47 12.3.7.3.1 Mbox IPR Reads • 12-48 12.3.7.3.2 Non-Mbox IPR Reads • 12-48 12.3.7.4 IPR WRITES • 12-48 12.3.7.4.1 Mbox IPR Writes • 12-48 DIGITAL CONFIDENTIAL 12-17 12-18 12-18 12-18 12-18 12-19 12-20 12-21 12-23 12-23 12-24 12-25 12-28 12-30 12-34 12-37 xix Contents 12.3.8 12.3.9 12.3.10 12.3.11 12.3.12 12.3.13 12.3.14 12.3.15 12.3.16 12.3.17 12.3.18 12.3.19 12.3.20 12.3.21 12.4 xx 12.3.7.4.2 Non-Mbox IPR Writes • 12-49 LOAD_PC INVALIDATES CACHE FILL COMMANDS MME CHECK COMMANDS 12.3.11.1 MME_CHK· 12-50 12.3.11.2 PROBE· 12-50 TB Fills 12.3.12.1 TB Tag Rlls • 12-51 12.3.12.2 TB PTE Rlls • 12-52 TBIS TBIP TBIA STOP_SPEC_Q UNALIGNED REFERENCES 12.3.17.1 Unaligned Reads • 12-56 12.3.17.2 Unaligned Writes • 12-56 12.3.17.3 Byte Mask Generation for Unaligned Writes • 12-57 12.3.17.4 Unaligned Destination Specifier Writes • 12-58 12.3.17.5 Implication of Ebox unaligned references on M%EM_LAT_FUU._H • 12-58 ABORTING REFERENCES 12.3.18.1 Conditions for Aborting References • 12-59 12.3.18.1.1 Aborting to Maintain Reference Order Restrictions • 12-59 12.3.18.1.2 Aborting due to lack of hardware resources • 12-62 12.3.18.1.3 Aborting due to memory management operation • 12-63 12.3.18.1.4 Aborting due to an external flush condition • 12-63 MBOX PIPELINE DEADLOCK AVOIDANCE SCENARIOS 12.3.19.1 Unaligned Reference Deadlock Condition • 12-04 12.3.19.2 READ_LOCKlWRITE_UNLOCK Deadlock Condition • 12-64 THESPEC_Q_SYNC_CTR FLUSHING REFERENCES FROM THE MBOX PIPE 12.3.21.1 Ibox Rushes • 12-66 12.3.21.2 Ebox Rushes • 12-67 12.3.21 .2.1 Rushing due to E%EM_ABORT_L • 12-67 12.3.21 .2.2 Rushing due to E%FLUSH_MBOX_H • 12-67 12.3.21.2.3 Ebox Rushing of the PA_QUEUE • 12-68 THE PCACHE 12.4.1 PCCTL 12.4.2 Pcache HltlMiss Determination 12.4.2.1 HitlMiss Determination by Tag Comparison • 12-72 12.4.2.2 Conditions which force Pcache Miss • 12-72 12.4.2.3 Conditions which force Pcache Hit • 12-73 12.4.3 Pcache Read Operation 12.4.4 Peaehe 
Write Operation 12.4.5 Pcache Replacement Algorithm 12.4.6 Pcache Fill Operation 12.4.7 Pcache Invalidate Operation 12.4.8 Pcache IPR Access 12.4.9 Pcache IPR Summary 12.4.10 Pcache States Resulting In UNPREDICTABLE operation 12.4.11 Pcache Redundancy Logic 12-49 12-49 12-50 12-50 12-51 12-54 12-54 12-55 12-55 12-55 12-58 12-63 12-65 12-66 12-70 12-71 12-72 12-73 12-74 12-74 12-74 12-75 12-75 12-76 12-77 12-71
12.5 MEMORY MANAGEMENT 12.5.1 NVAX MEMORY STRUCTURE 12.5.1.1 Virtual Address Space • 12-80 12.5.1.2 Physical Address Spaces • 12-81 12.5.1.2.1 Physical Address Space Mappings • 12-81 12.5.1.3 ADDRESS TRANSLATION AND THE TB • 12-83 12.5.1.4 30-bit to 32-bit Physical Address Translations • 12-85 12.5.1.5 MEMORY MANAGEMENT EXCEPTIONS • 12-85 12.5.1.5.1 MME_DATAPATH • 12-85 12.5.1.5.1.1 MME Register File • 12-86 12.5.1.5.1.2 MME ALU • 12-87 12.5.1.5.1.3 MME_SEQ • 12-88 12.5.1.5.2 TB MISS SEQUENCE • 12-91 12.5.1.5.2.1 Single Miss Sequence • 12-91 12.5.1.5.2.2 Double Miss Sequence • 12-92 12.5.1.5.3 ACV/TNV/M=0 • 12-93 12.5.1.5.3.1 ACV/TNV/M=0 Fault Handling: • 12-93 12.5.1.5.3.2 ACV detection: • 12-94 12.5.1.5.3.3 TNV detection • 12-94 12.5.1.5.3.4 M=0 detection: • 12-95 12.5.1.5.3.5 Recording ACV/TNV/M=0 Faults • 12-95 12.5.1.5.3.6 ACV/TNV/M=0 MME_DATAPATH Sequence • 12-97 12.5.1.5.3.7 Microcode Invocation of ACV/TNV/M=0 • 12-98 12.5.1.5.3.8 Microcode Processing of ACV/TNV/M=0: • 12-99 12.5.1.5.3.9 Pipeline Implications of ACV/TNV/M=0 condition • 12-100 12.5.1.5.3.9.1 Pipeline Effects for MME Faults on Write References • 12-100 12.5.1.5.3.9.2 Pipeline Effects for MME Faults on Read References • 12-100 12.5.
1.5.3.9.3 Pipeline Effects of E%FLUSH_MBOX_H on MME State • 12-100 12.5.1.5.3.9.4 Pipeline Effects of E%FLUSH_MBOX_H on M%MME_TRAP_L • 12-101 12.5.1.5.4 Cross Page Sequence • 12-102 12-79 12-80
12.6 MBOX ERROR HANDLING 12-103 12.6.1 Types of Errors Handled 12-103 12.6.2 TB parity error detection 12-103 12.6.2.1 TB tag parity error detection • 12-103 12.6.2.2 TB data parity error detection • 12-104 12.6.3 Pcache parity error detection 12-104 12.6.3.1 Pcache tag parity error detection • 12-104 12.6.3.2 Pcache data parity error detection • 12-105 12.6.4 Recording Mbox errors 12-105 12.6.4.1 TBSTS and TBADR • 12-106 12.6.4.2 PCSTS and PCADR • 12-107 12.6.5 Mbox Error Processing 12-108 12.6.5.1 Processing TB parity errors • 12-108 12.6.5.2 Processing Pcache parity errors • 12-109 12.6.5.3 Processing Cbox errors on Mbox-initiated read-like sequences • 12-110 12.6.5.3.1 Cbox-detected ECC errors • 12-110 12.6.5.3.2 Cbox-detected hard errors on requested fill data • 12-110 12.6.5.3.3 Cbox-detected hard errors on non-requested fill data • 12-111 12.6.5.3.4 12.6.5.4
12.7 MBOX INTERFACES 12-116 12.7.1 IBOX INTERFACE 12-116 12.7.1.1 Signals from Ibox • 12-116 12.7.1.2 Signals to Ibox • 12-116 12.7.2 EBOX INTERFACE 12-117 12.7.2.1 Signals from Ebox • 12-117 12.7.2.2 Signals to Ebox • 12-117 12.7.3 INTERRUPT SECTION INTERFACE 12-118 12.7.3.1 Signals to Interrupt Section • 12-118 12.7.4 USEQ INTERFACE 12-118 12.7.4.1 Signals to Useq • 12-118 12.7.5 CBOX INTERFACE 12-118 12.7.5.1 Signals from Cbox • 12-118 12.7.5.2 Signals to Cbox • 12-119
12.8 INITIALIZATION 12-120 12.8.1 Power-up Initialization 12-120 12.8.2 Initialization by Microcode and Software 12-120 12.8.2.1 Pcache Initialization • 12-121 12.8.2.2 Memory Management Initialization • 12-121
12.9 MBOX TESTABILITY FEATURES 12.9.1 Internal Scan Register and Data Reducers 12.9.2 Nodes on Parallel Port 12.9.3 Nodes on Top Metal 12.9.4 Architectural features 12.9.4.1 Translation Buffer Testability •
12-124 12.9.4.2 Pcache Testability • 12-125 12.9.5 M-BOX Miscellaneous Features 12-125 12-122 12-122 12-123 12-124 12-124
12.10 MBOX PERFORMANCE MONITOR HARDWARE 12-126 12.10.1 TB hit rate Performance Monitor Modes 12-126 12.10.1.1 TB hit rate for P0/P1 I-stream Reads • 12-127 12.10.1.2 TB hit rate for P0/P1 D-stream Reads • 12-127 12.10.1.3 TB hit rate for S0 I-stream Reads • 12-127 12.10.1.4 TB hit rate for S0 D-stream Reads • 12-127 12.10.2 Pcache hit rate Performance Monitor Modes 12-127 12.10.2.1 Pcache hit rate for I-stream Reads • 12-127 12.10.2.2 Pcache hit rate for D-stream Reads • 12-128 12.10.3 Unaligned reference statistics 12-128 Microcode Invocation on Cbox-detected Hard Errors • 12-111 Mbox Error Processing Matrix • 12-112
12.11 MBOX SIGNAL NAME CROSS-REFERENCE 12-129
12.12 REVISION HISTORY 12-132
CHAPTER 13 THE CBOX 13-1
13.1 TERMINOLOGY 13-1
13.2 FUNCTIONAL OVERVIEW OF THE CBOX AND BACKUP CACHE 13-1 13.2.1 The Cbox and the System 13-2 13.2.2 Writeback Cache and Ownership Concepts 13-3 13.2.3 Backup Cache Operating Modes 13-3
13.3 NVAX BACKUP CACHE ORGANIZATION AND INTERFACE 13-4 13.3.1 Backup Cache Interface 13-10 13.3.1.1 P%TS_INDEX_H<20:5> • 13-10 13.3.1.2 P%TS_OE_L • 13-11 13.3.1.3 P%TS_WE_L • 13-11 13.3.1.4 P%TS_TAG_H<31:17> • 13-11 13.3.1.5 P%TS_ECC_H<5:0> • 13-12 13.3.1.6 P%TS_OWNED_H • 13-12 13.3.1.7 P%TS_VALID_H • 13-12 13.3.1.8 P%DR_INDEX_H<20:3> • 13-12 13.3.1.9 P%DR_OE_L • 13-13 13.3.1.10 P%DR_WE_L • 13-13 13.3.1.11 P%DR_DATA_H<63:0> • 13-13 13.3.1.12 P%DR_ECC_H<7:0> • 13-13 13.3.2 Backup Cache Block Diagrams
THE CBOX DATAPATH 13.4.1 Mbox Interface 13.4.1.1 Mbox to Cbox Transactions • 13-26 13.4.1.1.1 The IREAD_LATCH and the DREAD_LATCH • 13-26 13.4.1.1.2 WRITE_PACKER and WRITE_QUEUE • 13-27 13.4.1.2 Cbox to Mbox Transactions • 13-30 13.4.1.2.1 CM_OUT_LATCH • 13-32 13.4.1.2.2 FILL_DATA_PIPE1 and FILL_DATA_PIPE2 • 13-33 13.4.1.2.3 IREAD Aborts • 13-35 13.4.2 ECC Datapaths 13.4.2.1 Backup Cache
Tag Store ECC • 13-36 13.4.2.2 Backup Cache Data Store ECC • 13-39 13.4.3 The BIU 13.4.3.1 NDAL_IN_QUEUE • 13-42 13.4.3.2 NON_WRITEBACK_QUEUE • 13-44 13.4.3.3 WRITEBACK_QUEUE • 13-44 13.4.3.4 Timeout counters • 13-46 13.4.3.5 BIU clocking: Relating internal cycles to external cycles • 13-50 13.4.4 The FILL_CAM 13.4.4.1 Block-conflict in the FILL_CAM • 13-54 13.4.4.2 The FILL_CAM and DREAD_LOCKs • 13-54 13-19 13-24 13.4
13.5 CBOX INTERNAL PROCESSOR REGISTERS 13-14 13-36 13-42 13-53 13-56 13.5.1 Cbox Control IPR (CCTL) 13.5.1.1 ENABLE • 13-62 13.5.1.2 TAG_SPEED • 13-62 13.5.1.3 DATA_SPEED • 13-62 13.5.1.4 SIZE • 13-63 13.5.1.5 FORCE_HIT • 13-63 13.5.1.6 DISABLE_ERRORS • 13-63 13.5.1.7 SW_ECC • 13-63 13.5.1.8 TIMEOUT_TEST • 13-64 13.5.1.9 DISABLE_PACK • 13-64 13.5.1.10 PM_ACCESS_TYPE • 13-64 13.5.1.11 PM_HIT_TYPE • 13-64 13.5.1.12 FORCE_NDAL_PERR • 13-64 13.5.1.13 SW_ETM • 13-64 13.5.1.14 HW_ETM • 13-64 13.5.2 IPR A2 (hex), BCDECC 13.5.3 Backup Cache Tag Store Error Registers (BCETSTS, BCETIDX, BCETAG) 13.5.3.1 Bcache Error Tag Status (BCETSTS) • 13-66 13.5.3.1.1 LOCK • 13-67 13.5.3.1.2 CORR • 13-67 13.5.3.1.3 UNCORR • 13-67 13.5.3.1.4 BAD_ADDR • 13-67 13.5.3.1.5 LOST_ERR • 13-68 13.5.3.1.6 TS_CMD • 13-68 13.5.3.2 Bcache Error Tag Index (BCETIDX) • 13-68 13.5.3.3 Bcache Error Tag (BCETAG) • 13-69 13.5.3.3.1 VALID • 13-70 13.5.3.3.2 OWNED • 13-70 13.5.3.3.3 ECC • 13-70 13.5.3.3.4 TAG • 13-70 13.5.4 Backup Cache Data RAM Error Registers (BCEDSTS, BCEDIDX, BCEDECC) 13.5.4.1 Bcache Error Data Status (BCEDSTS) • 13-71 13.5.4.1.1 LOCK • 13-72 13.5.4.1.2 CORR • 13-72 13.5.4.1.3 UNCORR • 13-72 13.5.4.1.4 BAD_ADDR • 13-72 13.5.4.1.5 LOST_ERR • 13-73 13.5.4.1.6 DR_CMD • 13-73 13.5.4.2 Bcache Error Data Index (BCEDIDX) • 13-73 13.5.4.3 Bcache Error Data ECC (BCEDECC) • 13-74 13.5.5 Fill Error Registers (CEFADR, CEFSTS) 13.5.5.1 Cbox Error Fill Status (CEFSTS) • 13-76 13.5.5.1.1 RDLK • 13-77 13.5.5.1.2 LOCK • 13-77 13.5.5.1.3 TIMEOUT • 13-77
13.5.5.1.4 RDE • 13-78 13.5.5.1.5 LOST_ERR • 13-78 13.5.5.1.6 ID0 • 13-78 13.5.5.1.7 IREAD • 13-78 13.5.5.1.8 OREAD • 13-78 13.5.5.1.9 WRITE • 13-78 13.5.5.1.10 TO_MBOX • 13-78 13-61 13-65 13-66 13-71 13-76 13.5.5.1.11 RIP • 13-78 13.5.5.1.12 OIP • 13-78 13.5.5.1.13 DNF • 13-79 13.5.5.1.14 RDLK_FL_DONE • 13-79 13.5.5.1.15 REQ_FILL_DONE • 13-79 13.5.5.1.16 COUNT • 13-79 13.5.5.1.17 UNEXPECTED_FILL • 13-79 13.5.5.2 Fill Error Address (CEFADR) • 13-79 13.5.6 NDAL Error Registers (NESTS, NEOADR, NEOCMD, NEDATHI, NEDATLO, NEICMD) 13-81 13.5.6.1 NDAL Error Status IPR (NESTS) • 13-81 13.5.6.1.1 NOACK • 13-82 13.5.6.1.2 BADWDATA • 13-82 13.5.6.1.3 LOST_OERR • 13-82 13.5.6.1.4 PERR • 13-82 13.5.6.1.5 INCON_PERR • 13-83 13.5.6.1.6 LOST_PERR • 13-83 13.5.6.2 NDAL Error Output Address IPR (NEOADR) • 13-83 13.5.6.3 NDAL Error Output Command (NEOCMD) • 13-83 13.5.6.4 NDAL Error Input Command (NEICMD) • 13-84 13.5.6.4.1 PARITY • 13-85 13.5.6.4.2 ID • 13-85 13.5.6.4.3 CMD • 13-85 13.5.6.5 NDAL Error Data High and NDAL Error Data Low (NEDATHI and NEDATLO) • 13-85 13.5.7 Backup Cache Tag Store Access Through IPR Reads and Writes (BCTAG) 13-87 13.5.8 Backup cache deallocates through IPR access (BCFLUSH) 13-89
13.6 CBOX CONTROL DESCRIPTION 13-90
13.7 TRANSACTION DESCRIPTIONS 13-94 13.7.1 IPR Reads and IPR Writes 13-94 13.7.2 I/O Space 13-94 13.7.3 Clear Write Buffer 13-94 13.7.4 Memory Read Hit 13-96 13.7.5 Read Miss and Fill 13-97 13.7.6 Write Hit 13-97 13.7.7 Write Miss 13-98 13.7.8 Deallocates Due to CPU Reads and Writes 13-98 13.7.9 DREAD_LOCK and WRITE_UNLOCK 13-98
13.8 CACHE COHERENCY 13-99
13.9 ABNORMAL CONDITIONS 13.9.1 Cbox Behavior When the Backup Cache is OFF 13.9.2 Cbox Behavior When the Backup Cache is In FORCE_HIT Mode 13.9.3 Cbox Behavior When the Backup Cache Is In Error Transition Mode 13.9.4 Cbox transition Into Error Transition Mode 13.9.5 How to turn the Bcache off 13.9.6 How to turn the Bcache on 13.9.7 Assertion of P%CPU_WB_ONLY_L 13.9.8
Backup Cache Errors 13.9.9 Backup Cache Errors Incurred While In Error Transition Mode 13-101 13-103 13-104 13-104 13-106 13-108 13-108 13-109 13-111 13-114 13.9.10 NDAL Parity Errors 13-114
13.10 TESTABILITY 13-115 13.10.1 Parallel port 13-115 13.10.2 Internal scan chain 13-116
13.11 PERFORMANCE MONITORING 13-118
13.12 INITIALIZATION 13-119
13.13 CBOX INTERFACES 13-120
13.14 RESOLVED ISSUES 13-124
13.15 NVAX CBOX SIGNAL NAME CROSS-REFERENCE 13-125
13.16 REVISION HISTORY 13-128
CHAPTER 14 VECTOR INTERFACE 14-1
14.1 DESCRIPTION 14-1
14.2 REVISION HISTORY 14-4
ERROR HANDLING 15-1
15.1 TERMINOLOGY 15-1
15.2 ERROR HANDLING INTRODUCTION AND SUMMARY 15-1
15.3 ERROR HANDLING AND RECOVERY 15.3.1 Error State Collection 15.3.2 Error Analysis 15.3.3 Error Recovery 15.3.3.1 Special Considerations for Cache and Memory Errors • 15-9 15.3.3.1.1 Cache Coherence in Error Handling • 15-10 15.3.3.1.1.1 Cache Enable, Disable, and Flush Procedures • 15-10 15.3.3.1.1.1.1 Disabling the NVAX Caches for Error Handling (Leaving the Bcache in ETM) • 15-11 15.3.3.
1.1.1.2 Flushing and Disabling the Bcache • 15-11 15.3.3.1.1.1.3 Enabling the NVAX Caches • 15-11 15.3.3.1.2 Special Writeback Cache Recovery Situations and Procedures • 15-12 15.3.3.1.2.1 Bcache Uncorrectable Error During Writeback • 15-12 15.3.3.1.2.2 Memory State • 15-12 15.3.3.1.2.2.1 Accessing Memory State • 15-13 15.3.3.1.2.2.2 Repairing Memory State (Fill Errors) • 15-13 15.3.3.1.2.2.3 Repairing Memory State (Tagged-Bad Locations) • 15-14 15.3.3.1.2.3 Extracting Data from the Bcache • 15-14 15.3.3.1.2.4 Address Determination Procedure for Recovery from Uncorrectable Bcache Data RAM Errors • 15-15 15.3.3.1.2.5 Special Address Determination Procedure for Recovery from Uncorrectable Bcache Tag Store Errors • 15-15 15.3.3.1.3 Cache and TB Test Procedures • 15-16 15.3.4 Error Retry 15-17 15.3.4.1 General Multiple Error Handling Philosophy • 15-17 15.3.4.2 Retry Special Cases • 15-18 15-3 15-3 15-7 15-8 CHAPTER 15
15.4 CONSOLE HALT AND HALT INTERRUPT 15-19
15.5 MACHINE CHECKS 15.5.1 Machine Check Stack Frame 15.5.2 Events Reported Via Machine Check Exceptions 15.5.2.1 MCHK_UNKNOWN_MSTATUS • 15-33 15.5.2.2 MCHK_INT.ID_VALUE • 15-33 15.5.2.3 MCHK_CANT_GET_HERE • 15-33 15.5.2.4 MCHK_MOVC.STATUS • 15-33 15.5.2.5 MCHK_ASYNC_ERROR • 15-34 15.5.2.5.1 TB Parity Errors • 15-34 15.5.2.5.2 Ebox S3 Stall Timeout Error • 15-34 15.5.2.6 MCHK_SYNC_ERROR • 15-35 15.5.2.6.1 VIC Parity Errors • 15-36 15.5.2.6.2 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors • 15-36 15.5.2.6.3 Bcache Lost Data RAM Access Error • 15-37 15.5.2.6.4 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Timeout Errors • 15-37 15.5.2.6.5 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Data Errors • 15-39 15.5.2.6.6 Lost Bcache Fill Error • 15-41 15.5.2.6.7 Unacknowledged NDAL I-Stream or D-Stream Read or D-Stream Ownership Read • 15-41 15.5.2.6.8 Lost NDAL Output Error • 15-42 15.5.2.6.9 PTE read errors • 15-43 15.5.2.6.9.1 PTE Read Errors in
Interruptable Instructions • 15-43 15.5.2.6.9.2 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on PTE Reads • 15-44 15.5.2.6.9.3 NDAL PTE Read Timeout Errors • 15-45 15.5.2.6.9.4 NDAL PTE Read Data Errors • 15-46 15.5.2.6.9.5 Unacknowledged NDAL PTE Read • 15-47 15.5.2.6.9.6 Multiple Errors Which interfere with Analysis of PTE Read Error • 15-47 15.5.2.7 Inconsistent Status in Machine Check Cause Analysis • 15-47 15-22 15-22 15-24
15.6 POWER FAIL INTERRUPT 15-48
15.7 HARD ERROR INTERRUPTS 15-49 15.7.1 Events Reported Via Hard Error Interrupts 15-49 15.7.1.1 Uncorrectable Data Errors and Addressing Errors During Write or Write-Unlock Processing • 15-51 15.7.1.2 Lost Bcache Data RAM Hard Errors • 15-53 15.7.1.3 Bcache Timeout or Read Data Error in Quadword OREAD Fill After Write Data Merged • 15-53 15.7.1.3.1 Unexpected Fill Error • 15-54 15.7.1.3.2 Lost Bcache Fill Error • 15-54 15.7.1.4 NDAL No-ACK During WRITE or WDISOWN • 15-55 15.7.1.5 Lost NDAL No-ACK Hard Errors • 15-55 15.7.1.6 System Environment Hard Error Interrupts • 15-55 15.7.1.7 Inconsistent Status in Hard Error Interrupt Cause Analysis • 15-56
15.8 SOFT ERROR INTERRUPTS 15-57 15.8.1 Events Reported Via Soft Error Interrupts 15-57 15.8.1.1 VIC Parity Errors • 15-69 15.8.1.2 Pcache Parity Errors • 15-69 15.8.1.3 Bcache Tag Store Uncorrectable Errors • 15-69 15.8.1.3.1 Case: BCETSTS<TS_CMD>=WUNLOCK • 15-70 15.8.1.3.2 Case: BCETSTS<TS_CMD>=DREAD,IREAD,OREAD • 15-70 15.8.1.3.3 Case: BCETSTS<TS_CMD>=R_INVAL,O_INVAL,IPR_DEALLOCATE • 15-] 15.8.1.4 Lost Bcache Tag Store Errors • 15-71 15.8.1.5 Bcache Tag Store Correctable ECC errors • 15-71 15.8.1.6 Lost Bcache Tag Store Correctable ECC errors • 15-72 15.8.1.7 Bcache Data RAM Correctable ECC Errors • 15-72 15.8.1.8 Lost Bcache Data RAM Correctable ECC Errors • 15-72 15.8.1.9 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on I-Stream or D-Stream Reads • 15-73 15.8.1.10 Bcache Data RAM
Uncorrectable ECC Errors and Addressing Errors on Writebacks • 15-73 15.8.1.11 Lost Bcache Data RAM Errors With Possible Lost Writebacks • 15-74 15.8.1.12 Lost Bcache Data RAM Errors Without Lost Writebacks • 15-75 15.8.1.13 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Timeout Errors • 15-76 15.8.1.14 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Data Errors • 15-77 15.8.1.15 Lost Bcache Fill Error • 15-79 15.8.1.16 Unacknowledged NDAL I-Stream or D-Stream Read or D-Stream Ownership Read • 15-79 15.8.1.17 Lost NDAL Output Error • 15-80 15.8.1.18 PTE read errors • 15-80 15.8.1.18.1 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on PTE Reads • 15-81 15.8.1.18.2 NDAL PTE Read Timeout Errors • 15-82 15.8.1.18.3 NDAL PTE Read Data Errors • 15-83 15.8.1.18.4 Unacknowledged NDAL PTE Read • 15-84 15.8.1.18.5 Multiple Errors Which interfere with Analysis of PTE Read Error • 15-84 15.8.1.19 NDAL Parity Errors • 15-84 15.8.1.20 Lost Parity Errors • 15-85 15.8.1.21 System Environment Soft Error Interrupts • 15-85 15.8.1.22 Inconsistent Status in Soft Error Interrupt Analysis • 15-85
15.9 KERNEL STACK NOT VALID EXCEPTION 15-87
15.10 ERROR RECOVERY CODING EXAMPLES 15-88
15.11 MISCELLANEOUS BACKGROUND INFORMATION 15-88 15.11.1 Note On Tagged-Bad Data Mechanisms 15-88 15.11.2 Note On Ownership Mechanism 15-88
15.12 REVISION HISTORY 15-90
CHAPTER 16 CHIP INITIALIZATION 16-1
16.1 OVERVIEW 16-1
16.2 HARDWARE/MICROCODE INITIALIZATION 16-1
16.3 CONSOLE INITIALIZATION 16-2
16.4 CACHE INITIALIZATION 16-3
16.5 MISCELLANEOUS INFORMATION 16-6
16.6 REVISION HISTORY 16-7
CHAPTER 17 CHIP CLOCKING 17-1
17.1 OVERVIEW OF THE NVAX CLOCKING SYSTEM 17-1
17.2 RECEIVING THE NVAX EXTERNAL OSCILLATOR SIGNAL 17-1 17.2.1 The System Environment 17-1 17.2.2 The Chip Test Environment
17-2
17.3 ON-CHIP CLOCKS 17-3 17.3.1 Clock Generation/Distribution Overview 17-3 17.3.2 Global Clock Distribution 17-5 17.3.3 Section Clock Distribution 17-5 17.3.4 Global Clock Waveforms 17-5 17.3.5 Section Clock Waveforms 17-6 17.3.6 Clock Skews and Rise/Fall Times of the Section Clocks 17-7
17.4 THE NDAL INTERFACE TIMING SYSTEM 17-7 17.4.1 NDAL Clocks 17-7 17.4.2 Controlling Inter-Chip Clock Skew 17-7 17.4.2.1 Self Skew • 17-8 17.4.2.2 Inter-Clock Skew • 17-9 17.4.3 Driving and Receiving NDAL signals 17-9 17.4.4 Information Transfer between the NDAL clock system and the on-chip clock system 17-9
17.5 INITIALIZING THE NVAX SYSTEM 17-9 17.5.1 Internal NVAX Reset 17-10 17.5.2 Generation of Clocks During Power-up 17-11 17.5.3 Clock Generator Reset 17-11
17.6 NVAX CLOCK SECTION SIGNAL/PIN DICTIONARY 17-14 17.6.1 Schematic - Behavioral Translation 17-14 17.6.2 Behavioral - Schematic Translation 17-15
17.7 REVISION HISTORY 17-16
CHAPTER 18 PERFORMANCE MONITORING FACILITY 18-1
18.1 OVERVIEW 18-1
18.2 SOFTWARE INTERFACE TO THE PERFORMANCE MONITORING FACILITY 18-1 18.2.1 Memory Data Structure 18-1 18.2.2 Memory Data Structure Updates 18-2 18.2.3 Configuring the Performance Monitoring Facility 18-3 18.2.3.1 Ibox Event Selection • 18-4 18.2.3.2 Ebox Event Selection • 18-4 18.2.3.3 Mbox Event Selection • 18-5 18.2.3.4 Cbox Event Selection • 18-6 18.2.4 Enabling and Disabling the Performance Monitoring Facility 18-6 18.2.5 Reading and Clearing the Performance Monitoring Facility Counts 18-7
18.3 HARDWARE AND MICROCODE IMPLEMENTATION OF THE PERFORMANCE MONITORING FACILITY 18-8 18.3.1 Hardware Implementation 18-10 18.3.2 Microcode Interaction with the Hardware 18-11
18.4 REVISION HISTORY 18-12
CHAPTER 19 TESTABILITY MICRO-ARCHITECTURE 19-1
19.1 CHAPTER OVERVIEW 19-1
19.2 THE TESTABILITY STRATEGY 19-1
19.3 TEST MICRO-ARCHITECTURE OVERVIEW 19-2
19.4 PARALLEL TEST PORT 19-3 19.4.1 Parallel Port Operation 19-4
19.5 TEST PADS 1~
19.6 SYSTEM PORT
1~
19.7 SERIAL P-CACHE PORT 19-7
19.8 IEEE 1149.1 (JTAG) SERIAL TEST PORT 19-7 19.8.1 TAP Controller State Machine 19-9 19.8.2 Instruction Register 19-11 19.8.3 Bypass Register 19-12 19.8.4 Control Dispatch Logic 19-12 19.8.5 Initialization 19-16
19.9 BOUNDARY SCAN REGISTERS 19-16 19.9.1 Boundary Scan Register Cells 19-16 19.9.2 Boundary Scan Register Organization 19-19
19.10 INTERNAL SCAN REGISTER AND LFSR REDUCER 19-22 19.10.1 Internal Scan Register Cells 19-22 19.10.2 Internal Scan Register Organization 19-23
19.11 OUTPUT PIN TRI-STATE CONTROL 19-23
19.12 OPERATING SPEED OF TEST LOGIC 19-24
19.13 REVISION HISTORY 19-25
CHAPTER 20 ELECTRICAL CHARACTERISTICS 20-1
20.1 INTRODUCTION 20-1
20.2 NVAX DC OPERATING CHARACTERISTICS 20-1 20.2.1 Maximum Ratings 20-1 20.2.2 Pin Driver Impedance 20-3 20.2.3 Pin Capacitance 20-4 20.2.4 Pin Operating Levels 20-4
20.3 NVAX AC OPERATING CHARACTERISTICS 20-7 20.3.1 AC Conditions of Test 20-7 20.3.2 NDAL Timing Specification 20-9 20.3.3 BCACHE Timing Specification 20-11 20.3.4 Other Pin Timing Specifications 20-22 20.3.4.1 Clock Timing • 20-22 20.3.4.2 Reset Timing • 20-23 20.3.4.3 Interrupt, Error, and Test Pin Timing • 20-24
20.4 REVISION HISTORY 20-26
APPENDIX A PROCESSOR REGISTER DEFINITIONS A-1
A.1 REVISION HISTORY A-19
INDEX
FIGURES
1-1 Register Format Example 1-3 1-2 Timing Diagram Notation 1-5 2-1 Virtual Address Space Layout 2-2 2-2 32-bit Physical Address Space Layout 2-3 2-3 30-bit Physical Address Space Layout 2-3 2-4 IPR E7 (hex), PAMODE 2-4 2-5 General Purpose Registers 2-5 2-6 Processor Status Longword Fields 2-5 2-7 Data Types 2-6 2-8 Opcode Formats 2-8 2-9 Addressing Modes 2-9 2-10 Branch Displacements 2-11 2-11 IPR 38 (hex), MAPEN 2-25 2-12 IPR 3A (hex), TBIS 2-25 2-13 IPR 39 (hex), TBIA 2-26 2-14 IPR 0C (hex), SBR and IPR 0D (hex), SLR 2-27 2-15 System Space Translation Algorithm 2-28 2-16 IPR 08 (hex), P0BR and IPR 09 (hex), P0LR 2-29 2-17 P0 Space Translation Algorithm 2-29 2-18 IPR
0A (hex), P1BR and IPR 0B (hex), P1LR 2-30 2-19 2-20 2-21 2-22 PTE Format (21-bit PFN) PTE Format (25-bit PFN) Minimum Exception Stack Frame 2-23 General Exception Stack Frame 2-24 2-25 IPR 12 (hex), IPL IPR 14 (hex), SIRR IPR 15 (hex), SISR Arithmetic Exception Stack Frame 2-26 2-27 2-28 2-29 2-30 2-31 2-30 2-31 2-31 2-33 2-33 2-35 2-35 2-35 2-37 2-38 2-39 2-40 Memory Management exception Stack Frame Instruction Emulation Trap Stack Frame Suspended Emulation Fault Stack Frame Generic Machine Check Stack Frame IPR 2A (hex), SAVPC and IPR 2B (hex), SAVPSL IPR 11 (hex), SCBB System Control Block Vector IPR 0E (hex), CPUID IPR 3E (hex), SID IPR 10 (hex), PCBB 2-40 2-41 2-41 2-42 2-44 2-45 2-47 Process Control Block IPR Address Space Decoding 2-48 2-49 3-1 3-2 3-3 3-4 3-5 3-6 3-7 3-8 NDAL Pin Timing Relative to the NDAL CLOCKS NDAL Arbitration timing NDAL Suppress timing Address Cycle Format Physical Address Space Layout NDAL Memory Address Interpretation 3-5 3-11 3-19 3-23 3-25 3-27 3-28 3-29 3-9 3-10 3-11 3-17 P%ACK_L Timing NDAL Read timing RDE example NDAL Fill timing Quadword Read and Fill Read command on the NDAL Read data return without using HOLD Read data return using HOLD NDAL Write timing 3-37 3-43 3-45 3-46 3-47 3-47 3-48 3-48 3-51 3-18 3-19 3-20 3-21 3-22 Quadword write on the NDAL Hexaword write on the NDAL Octaword write on the NDAL NVAX XMI-2 System Block Diagram XMI2 Unlock Write example 3-52 3-52 3-53 3-65 3-66 2-32 2-33 2-34 2-35 2-36 2-37 2-38 2-39 3-12 3-13 3-14 3-15 3-16 P1 Space Translation Algorithm Bcache Pin Timing Relative to INTERNAL NVAX Clocks (14ns system) NDAL Arbitration Block Diagram 3-23 4-1 NVAX Lowend System Block Diagram NVAX CPU Block Diagram 5-1 5-2 Non-Pipelined Instruction execution Partially-Pipelined Instruction Execution 3-67 4-2 5-2 5-2 5-3 Fully-Pipelined Instruction Execution 5-3 5-4 5-4 5-7 5-8 Simple Three-Segment Pipeline Information Flow Against
the Pipeline Stalls Introduced by Backward Pipeline Flow Buffers Between Pipeline Segments NVAX CPU Pipeline 6-1 6-2 Ebox Data Path Control, Standard Format Ebox Data Path Control, Special Format 6-1 6-2 6-3 Ebox Microsequencer Control, Jump Format Ebox Microsequencer Control, Branch Format Ibox CSU Format 6-4 6-4 6-5 6-6 7-1 7-2 7-3 7-4 7-5 Ibox Instruction ROM Format Ibox Block Diagram VIC Block Diagram VIC Cache Row Format IPR D0 (hex), VMAR IPR D1 (hex), VTAG 6-6 7-2 7-6 7-7 7-14 7-15 7-6 IPR D2 (hex), VDATA IPR D3 (hex), ICSR Prefetch Queue Block Diagram Source/Destination Queue Entry Formats Microword Format Complex Specifier Unit Control Path Block Diagram Complex Specifier Unit Data Path Block Diagram Branch Table Entry Format 7-15 7-16 7-18 7-33 7-40 7-42 7-51 7-56 IPR D4 (hex), BPCR Ebox Block Diagram SMUL Step Operation UDIV Step Operation S+PSW Format RN.MODE.OPCODE E_BUS%BBUS_L<31:0> Source A Source Queue Entry A Destination Queue Entry Ebox Pipeline Latches 7-60 8-3 8-20 IPR 7C (hex), PCSCR IPR 7D (hex), ECR NVAX Timeout Counters E%WBUS_H LFSR Block Diagram PMFCNT Processor Register in E%WBUS_H<31:0> LFSR Format 8-80 8-82 8-85 8-90 8-91 5-5 5-6 6-4 6-5 7-7 7-8 7-9 7-10 7-11 7-12 7-13 7-14 8-1 8-2 8-3 8-4 8-5 8-6 8-7 8-8 8-9 8-10 8-11 8-12 8-13 5-4 5-5 5-6 5-7 8-21 8-36 8-37 8-44 8-45 8-73 9-1 Microsequencer Block Diagram 9-2 Microcode Microsequencer Control Field Formats Instruction Queue Entry Format 9-3 9-4 Instruction Context Format 9-5 Microstack Organization 9-6 10-1 10-2 10-3 11-1 11-2 11-3 11-4 11-5 11-6 11-7 11-8 11-9 11-10 11-11 11-12 11-13 Parallel Port Output Format Interrupt SCB Vector Offset Interrupt Section Block Diagram IPR 7A (hex), INTSYS Fbox block diagram Fbox Execute Cycle Diagram Opcode Transfers to the Fbox Divider Array Block Diagram Input Signals from Input Interface Result Transfer to Stage-1 Divider Fraction Data Path CPA Block Diagram Divider Sequencer State Transition Table
Fraction Datapath Block Diagram Recoder Block Diagram Stage 1 Exponent Processor Block diagram Sign Datapath Block Diagram 11-14 11-15 Fraction Datapath Operation Table 11-16 Exponent Datapath Operation Table Fraction Datapath Exception Summary 11-17 Exponent Datapath Exception Table 11-18 Stage 2 Fraction Datapath Block Diagram 11-19 Stage 2 Exponent Datapath Block Diagram 11-20 Sign Datapath Block Diagram 11-21 Control Block Diagram 11-22 Fraction Datapath Operation Summary 11-23 Stage 2 Exponent Datapath Operation Summary 11-24 Stage 3 Fraction Datapath Block Diagram 11-25 11-26 Stage 3 Fraction Mini-round Block Diagram Stage 3 Exponent Datapath Block Diagram 11-27 11-28 Fraction Datapath Operation Summary Fraction Datapath Block Diagram 11-29 11-30 Block Diagram of Exponent Processor Control Block Diagram 11-31 Miscellaneous PLA Block Diagram 12-1 12-2 Mbox Block Diagram Iref Latch 12-3 Spec Queue 9-2 9-8 9-20 9-22 9-23 9-26 10-3 10-8 10-12 11-2 11-3 11-7 11-17 11-18 11-19 11-22 11-23 11-26 11-30 11-34 11-35 11-39 11-40 11-41 11-42 11-43 11-47 11-54 11-56 11-57 11-60 11-62 11-66 11-67 11-68 11-73 11-82 11-86 11-90 11-91 12-3 12-7 12-8 12-4 EM_LATCH 12-10 12-5 VAP_LATCH 12-11 12-6 MME_LATCH 12-13 12-7 RTY_DMISS_LATCH 12-15 12-8 CBOX_LATCH 12-16 12-9 PA_QUEUE 12-17 12-10 DMISS_LATCH and IMISS_LATCH 12-19 12-11 MD_BUS_ROTATOR 12-21 12-12 Basic Mbox Timing 12-25 12-13 2 Processor Synchronization Example 12-26 12-14 Memory Scoreboard Example 12-27 12-15 Barrel Shifter Function 12-35 12-16 IPR E0 (hex), MP0BR 12-38 12-17 IPR E1 (hex), MP0LR 12-38 12-18 IPR E2 (hex), MP1BR 12-39 12-19 IPR E3 (hex), MP1LR 12-39 12-20 IPR E4 (hex), MSBR 12-39 12-21 IPR E5 (hex), MSLR 12-39 12-22 IPR E6 (hex), MMAPEN 12-40 12-23 IPR E7 (hex), PAMODE 12-40 12-24 IPR E8 (hex), MMEADR 12-41 12-25 IPR E9 (hex), MMEPTE 12-41 12-26 IPR EA (hex), MMESTS 12-41 12-27 IPR EC (hex), TBADR 12-42 12-28 IPR ED (hex), TBSTS 12-42 12-29 IPR F2 (hex), PCADR 12-43 12-30 IPR
F4 (hex), PCSTS 12-43 12-31 IPR F8 (hex), PCCTL 12-44 12-32 IPRs 01800000 thru 01801FE0 (hex), PCTAG 12-45 12-33 IPRs 01C00000 thru 01C01FF8 (hex), PCDAP 12-46 12-34 MP0LR Register 12-46 12-35 MP1LR Register 12-47 12-36 MSLR Register 12-47 12-37 MP1BR Register 12-47 12-38 TB_TAG_FILL Format (from MME_LATCH) 12-52 12-39 TB_TAG_FILL Format (from EM_LATCH): IPR 7E (hex), MTBTAG 12-52 12-40 TB_PTE_FILL Data Format (from MME_LATCH) 12-53 12-41 TB_PTE_FILL Data Format (from EM_LATCH): IPR 7F (hex), MTBPTE 12-54 12-42 PA_QUEUE conflict detection 12-60 12-43 Logical Pcache Organization 12-70 12-44 Pcache Address Breakdown 12-71 12-45 IPR Address Space Mapping 12-76 12-46 Pcache Address Redundancy Mapping 12-78 12-47 Virtual Address Space Layout 12-80 12-48 Physical Address Space of the NVAX Hardware 12-81 12-49 30-bit Physical Address Mapping 12-82 12-50 32-bit Physical Address Mapping 12-83 12-51 PTE and TB format 12-84 12-52 12-86 12-53 MME Datapath MME Sequences 12-54 MME Sequences Cont'd 12-55 IPR EA (hex), MMESTS 12-56 IPR ED (hex), TBSTS 12-57 IPR F4 (hex), PCSTS 13-1 The Cbox In the System 13-2 13-2 Backup Cache Tag RAM Pin Timing 13-8 12-106 12-107 13-3 Backup Cache Data RAM Pin Timing 13-9 13-4 13-5 Tags and Data for 128-Kilobyte Cache 13-14 Address as used for 128-Kilobyte Cache 13-15 13-6 Tags and Data for 256-Kilobyte Cache 13-16 13-7 Address as used for 256-Kilobyte Cache 13-8 Tags and Data for 512-Kilobyte Cache 13-16 13-17 13-9 Address as used for 512-Kilobyte Cache 13-17 13-10 Tags and Data for 2-Megabyte Cache 13-18 13-11 Address as used for 2-Megabyte Cache 13-18 13-12 13-13 Cbox block diagram with DATA_BUS Cbox block diagram with ADDRESS_BUS 13-22 13-23 13-14 Mbox Interface 13-24 13-15 B%S6_DATA_H<63:0> bypass timing 13-16 M%ABORT_CBOX_IRD_H Timing 13-33 13-35 13-17 Tag Store ECC Block Diagram 13-37 13-18 Tag Store Error Correcting Code Matrix 13-38 13-19 Data RAM ECC Block Diagram 13-40 13-20 Backup Cache Data Store
Error Correcting Code Matrix 13-41 13-21 NVAX Timeout Counters 13-22 BIU cycle counts NVAX time relative to NDAL time 13-47 13-50 13-23 13-24 13-51 IPR Address Space Decoding as seen by Software 13-56 IPR A0 (hex), CCTL Format of the BCDECC 13-61 13-65 IPR A3 (hex), BCETSTS 13-66 13-25 13-26 13-27 13-28 IPR A4 (hex), BCETIDX IPR A5 (hex), BCETAG 13-69 13-29 12-89 12-90 12-95 13-69 13-30 13-31 IPR A6 (hex), BCEDSTS IPR A7 (hex), BCEDIDX 13-71 13-74 13-32 IPR A8 (hex), BCEDECC 13-75 13-33 13-34 IPR AC (hex), CEFSTS IPR AB (hex), CEFADR 13-76 13-80 13-35 13-36 IPR AE (hex), NESTS IPR B0 (hex), NEOADR 13-81 13-83 13-37 13-38 13-39 13-40 13-41 13-42 IPR B2 (hex), NEOCMD IPR B8 (hex), NEICMD IPR B4 (hex), NEDATHI Backup Cache Tag Store IPR Addressing Format IPRs 01000000 thru 011FFFE0 (hex), BCTAG 13-84 13-85 13-86 13-86 13-87 13-87 13-43 15-1 IPRs 01400000 thru 015FFFE0 (hex), BCFLUSH IPR 2A (hex), SAVPC 13-89 15-19 15-2 15-3 15-4 15-5 15-6 15-7 15-8 15-9 15-10 IPR 2B (hex), SAVPSL IPR 26 (hex), MCESR Machine Check Stack Frame Cause Parse Tree for Machine Check Exceptions Power Fail Interrupt Stack Frame 15-19 15-22 15-23 15-11 Kernel Stack Not Valid Stack Frame NVAX CPU Interface Circuitry On-Chip XOR Test Functionality Waveforms On-Chip Clock Distribution Global Clock Waveforms Relationship of Internal and NDAL Clock Cycles Self Skew System Reset Timing 17-1 17-2 17-3 17-4 17-5 17-6 17-7 17-8 17-9 18-1 18-2 18-3 18-4 18-5 19-1 19-2 19-3 19-4 19-5 19-6 19-7 IPR B6 (hex), NEDATLO Hard Error Interrupt Stack Frame Cause Parse Tree for Hard Error Interrupts Soft Error Interrupt Stack Frame Cause Parse Tree for Soft Error Interrupts 15-26 15-48 15-49 15-50 15-57 15-58 15-87 17-2 17-3 17-4 17-6 17-8 17-8 17-10 Clock State During Initial Power-up Clock Generator Reset Timing Performance Monitoring Data Structure Base Address Per-CPU Performance Monitoring Data Structure IPR 3D (hex), PME IPR 7B (hex), PMFCNT In PMF Format Performance
Monitoring Hardware Block Diagram Test Interface Unit 17-11 17-13 18-2 Internal Scan Register Operation Timing 19-5 19-6 19-8 19-9 19-10 19-12 Self Relative Timing in Observe MAB Mode Serial Port Timing IEEE 1149.1 Serial Port (the Basic CTI) TAP Controller State Machine JTAG Instruction Register Cell 18-2 18-7 18-8 18-9 19-2 19-8 19-9 IEEE 1149.1 Logic Timing during IR-Scan Sequence IEEE 1149.1 Logic Timing during DR-Scan Sequence 19-14 19-15 19-10 in_bcell Boundary Scan Cell out_bcell Boundary Scan Cell 19-17 19-17 io_bcell Boundary Scan Cell 19-18 19-18 19-21 19-22 19-23 19-11 19-12 19-13 19-14 md_bcell Boundary Scan Cell Boundary Scan Register at TAG Store Interface 19-15 19-16 20-1 20-2 Cells for Internal Scan Registers An ISR section turned Into LFSR NDAL Pin Timing Relative to the NDAL CLOCKS Generic Data RAM Timing Diagram 20-12 20-3 Generic Tag RAM Timing Diagram 20-14 20-4 20-5 20-6 20-7 20-8 XNP Specific Data RAM Timing Diagram XNP Specific Tag RAM Timing Diagram Relationship of Internal and NDAL Clock Cycles 20-18 20-20 20-22 20-23 20-24 20-9 System Reset Timing Clock Generator Reset Timing
TABLES
1-1 Register Field Description Example 1-3 1-2 1-3 1-4 Register Field Type Notation Register Field Notation Revision History 30-bit Mapping of Program Addresses to 32-bit Hardware Addresses General Purpose Register Usage 1-3 1-4 1-6 2-4 2-5 Processor Status Longword General Register Addressing Modes PC-Relative Addressing Modes NVAX Instruction Set 2-5 2-10 2-11 2-12 PTE Protection Code Access Matrix 2-32 2-34 2-35 2-37 2-1 2-2 2-3 2-4 2-5 2-6 2-7 2-8 2-9 2-10 2-11 2-12 2-13 2-14 2-15 2-16 2-17 2-18 2-19 Interrupt Priority Levels Exception Classes Arithmetic exceptions Memory Management Exceptions Memory Management Exception Fault Parameter Instruction Emulation Trap Stack Frame System Control Block Vector System Control Block Layout SID Field Descriptions IPR Address Space Decoding Processor Registers I/O Space
Registers 2-37 2-38 2-39 2-42 2-42 2-46 2-50 2-52 2-61 DIGITAL CONFIDENTIAL Contents 2-20 3-1 3-2 3-3 Revision History NVAX CPU pinout NDAL AC timing specs NVAX DAL Bandwidth at 30ns 3-4 NVAX DAL Bandwidth at 42ns 3-5 NDAL Signals NDAL clocks 3-6 3-7 3-8 3-9 3-10 3-11 3-12 3-13 3-14 3-15 3-16 3-17 3-18 3-19 3-20 3-21 3-22 3-23 3-24 3-25 4-1 5-1 6-1 6-2 6-3 6-4 6-5 6-6 6-7 7-1 7-2 7-3 7-4 7-5 7-6 7-7 7-8 Byte Enable for Quadword Reads and Writes Byte Enable for Octaword Writes Possible Byte Enables for NVAX-generated transactions NDAL Length Field 2-62 3-2 3-6 3-15 3-15 3-16 3-18 3-30 3-30 3-32 3-33 NDAL Command Encodlngs and Definitions NDAL Address Cycle Commands as used by the NVAX CPU Commander P%ID_H Assignments 3-34 3-34 3-35 . NDAL Parity Coverage NDAL Command Usage by NVAX NDAL Command Usage by NDAL nodes besides NVAX RDR usage for ALL fill cycles NVAX Backup Cache Invalidates and Wrltebacks NVAX Read Timeout Values In Normal Mode 3-36 NVAX Read Timeout Values In Test Mode NDAL Errors and NVAX CPU Error Responses NDAL Errors and Error Responses by System Components XMI2-NVAX Coherency requirements Cross-reference of all names appearing In the NVAX chip Interface chapter Revision History Revision History Revision History EBOX Data Path Control Mlcroword Fields, Standard Format EBOX Data Path Control Mlcroword Fields, Special Format Ebox Mlcrosequencer Control Mlcroword Fields, Jump Format Ebox Microsequencer Control Mlcroword Fields, Branch Format Ibox CSU Mlcroword Fields Ibox Instruction ROM Fields Revision History lbox Pipeline VIC Attributes VIBA bit fields VIC Status Flags VMAR Field Descriptions VTAG Field Descriptions VDATA Field Descriptions ICSR Field Descriptions DIGITAL CONFIDENTIAL 3-39 3-40 3-44 3-54 3-57 3-57 3-60 3-63 3-65 3-70 3-72 4-6 5-22 6-1 6-2 6-4 6-4 6-5 6-6 6-8 7-4 7-5 7-7 7-8 7-14 7-15 7-15 7-16 xxxix Contents 7-9 Specifier Control Fields 7-20 7-10 Complex Specifier Control Fields 7-20 7-11 Specifier Data Fields 7-21 7-12 Instruction 
Context Summary 7-21 7-13 Ebox Assist Summary 7-25 7-14 IBU stop and start summary 7-29 7-15 Instruction Queue Entry Format 7-30 7-16 I%OPERAND_BUS_H Definition 7-34 7-17 I%OPERAND_BUS_H Definition 7-34 7-18 Source Queue Entries Written for Non-field Access Type Operands 7-35 7-19 Source queue Entries Written for VR or VM Access Type Operands 7-35 7-20 Destination Queue Entries Written for Non-field Access Type Operands 7-37 7-21 Destination Queue Entries Written for VM Access Type Operands 7-38 7-22 Source Queue Entries Retired 7-39 7-23 Mlcroword Fields 7-40 7-24 Microcode Page Allocation 7-41 7-25 S1 Pipe Latch 7-43 7-26 Next Address Generation Fields 7-45 7-27 S2 Pipe Latch 7-46 7-28 CSU Registers 7-46 7-29 S3 Pipe Latch 7-48 7-30 Branch Prediction Logic 7-57 7-31 BPCR Field Descriptions 7-60 7-32 BPCR <8:6> 7-61 7-33 Reserved Addressing Mode Faults 7-66 7-34 Cross-reference of all names appearing In the Ibox chapter 7-67 7-35 Ibox Scan Register Fields 7-69 7-36 Revision History 7-71 8-1 Data Path Control Mlcroword Fields 8-4 8-2 GPR Write Length 8-13 ALU Operations 8-18 Shifter Operations 8-22 8-3 8-4 8-5 8-6 8-7 8-8 8-9 8-10 8-11 8-12 8-13 8-14 8-15 xl Condition Code Alteration Maps Specified By Microcode 8-28 Condition Code Alteration Maps Used By The Fbox 8-28 Setting and Clearing State Flags 8-30 MPU Calculation 8-32 Branch Condition Evaluation 8-34 Ebox Sourced Microbranch Conditions 8-40 Field Queue Branch 8-48 Detection of Ibox Incurred Faults and Errors 8-51 Ebox Mbox Requests 8-60 Ebox Memory Request Information Busses 8-62 Ebox Memory Request Information Truth Table 8-62 DIGITAL CONFIDENTIAL Contents 8-16 8-17 8-18 8-19 8-20 8-21 8-22 8-23 8-24 8-25 8-26 8-27 Ebox Response to M%MME_FAULT_H and MOfoHARD_ERR_H Ebox Mlcrotrap Requests Fbox Fault Codes Ebox Pipeline Stall and Flush Cases Ebox Miscellaneous Operations PCSCR Field Descriptions ECR Field Descriptions S3 Stall Timeout Values In Normal Mode S3 Stall Timeout Values In Test Mode Derivation of 
NVAX TImeout Values Ebox Observe Scan Signals PSt Restrictions Summary 8-65 8-71 8-72 8-73 8-77 8-81 8-83 8-85 8-86 8-87 8-87 8-96 8-87 8-28 8-29 Signal Name Cross-Reference Revision History 9-1 9-2 9-3 Example: Writing an Entry In the Patchable Control Store Contents of MlB Scan Chain, When Loading Patchable Control Store Jump Format Control Field Definitions 9-4 9-8 Branch Format Control Field Definitions Jump Format Control Field Decodes Branch Format Control Field Decodes Branch Address Formation Current Address Selection 9-9 9-10 9-10 9-11 9-9 9-10 9-11 9-12 9-13 9-14 9-15 9-16 Mlcrotest Bus Sources Mlcrotrap Request TIming Mlcrotraps Abort Effects In the Mlcrosequencer Mlcroaddresses for Last Cycle Interrupts or exceptions Instruction Queue Entry Format Field Definitions Control Store Address Formation Instruction Queue Operation 9-12 9-15 9-15 9-18 9-18 9-20 9-20 9-21 9-17 9-18 9-19 9-22 9-23 9-24 9-26 9-26 10-1 Instruction Context Format Field Definitions Mlcrostack Pointer Example Stall TIming In the Mlcrosequencer Parallel Port Output Format Field Definitions Contents of MlB Scan Chain, In Observe Mode Schematic Signal Names, In Alphabetical Order Behavioral Model Signal Names, In Alphabetical Order Revision History Interrupt Vector Offset Registers 10-2 10-3 10-4 10-5 Interrupt sca Vector Offset Internal Interrupt Requests Software Interrupts References to Interval TImer Processor Registers 9-5 9-6 9-7 9-20 9-21 9-22 9-23 9-24 DIGITAL CONFIDENTIAL 8-89 9-4 9-5 9-9 9-9 9-30 9-31 9-33 10-3 10-3 10-4 1~5 1~ xII Contents Relative Interrupt Priority 10-7 Summary of Interrupts 10-10 10-8 INTSYS Field Descriptions 10-9 Cross-reference of all names appearing In the Interrupt chapter 10-12 10-15 10-10 Revision History 10-17 11-1 Fbox Internal Execute Cycles List of the Fbox Total Execute Cycles 11-3 11-3 11-3 Fbox Floating Point and Integer Instructions 11-4 Total Fbox execute cycles for Divide operation 11-13 11-16 11-5 CSA Inputs 11-21 11-6 QM Cell Control 
Signals 11-25 10-6 10-7 11-2 11-7 Divider Output Stages 11-27 11-8 Stage 1 Fraction Register Operations 11-31 11-9 Exponent Adder Operations 11-36 11-10 Exponent Adder Carry-in Operations 11-37 11-11 Stage 3 Fraction Datapath Operations 11-69 11-12 Possible Values For Sum Bits cAO:B1> 11-71 11-13 Bit Injection Within Adder 11-72 11-14 11-15 Exponent Datapath Operation Summary 11-74 11-75 11-16 Exponent Output Selection Stage 3 Sign Datapath Operatlons/slgn_dp_oper 11-76 11-78 11-19 Categories of Datapath Operations Fraction Datapath Operations 11-20 Fraction Datapath Operation Summary 11-17 11-18 11-77 11-84 11-85 11-21 Exponent Datapath Operation Summary 11-89 11-22 Revision History 11-99 12-1 Reference Definitions Byte Mask Logic for Aligned References 12-23 Mbox IPRs 12-37 MMAPEN Field Descriptions PAMODE Field Descriptions 12-40 12-40 12-2 12-3 12-4 12-5 12-6 xiii LSB Carry-ln Values 12-37 MMESTS Field Descriptions 12-41 12-7 12-8 TBSTS Field Descriptions 12-42 PCSTS Field Descriptions 12-43 12-9 12-10 PCCTL Field Descriptions PCTAG Field Descriptions 12-44 12-45 12-11 PCDAP Field Descriptions 12-46 12-12 Probe Status Encodlngs 12-51 12-13 TB_TAG_FILL Definition 12-52 12-14 12-15 MTBTAG Field Descriptions TB_PTE_FILL Definition 12-52 12-53 12-16 MTBPTE Field Descriptions 12-54 DIGITAL CONFIDENTIAL Contents 12-17 Byte Mask Logic for Aligned and Unaligned References 12-57 12-18 PcachelPRs 12-76 12-19 MMESTS Field Descriptions 12-95 12-20 LOCK Encodlngs 12-96 12-21 FAULT Encodlngs 12-96 12-22 MMESTS State Update 12-97 12-23 18STS Field Descriptions 12-106 12-24 SRC Encodings 12-106 12-25 PCSTS Field Descriptions 12-107 12-26 Mbox Error Handling Matrix 12-112 12-27 Mbox Performance Monitor Modes 12-126 12-28 Cross-reference of all names appearing in the Mbox chapter 12-129 13-1 Backup Cache Size and RAMs Used 13-4 13-2 Tag and Index Interpretation based on cache size 13-4 13-3 Backup Cache RAM Speeds and NVAX Cycle Time 13-5 13-4 Cache pin drive times In the XNP 
environment 13-6 13-5 Cache pin timing symbol definitions 13-6 13-6 NVAX Backup Cache Interface Pins 13-10 13-7 Usage of P%TS_INDEX_H<20:5> based on cache size 13-11 13-8 Usage of P%TS_TAG_H<20:17> based on cache size 13-11 13-9 Usage of P%DR_INDEX_H<20:5> based on cache size 13-12 13-10 Cbox Queues and Major Latches 13-19 13-11 Mbox-Cbox Commands 13-25 13-12 Mbox to Cbox Command Matrix 13-26 13-13 tREAD_LATCH Fields 13-27 13-14 DREAD_LATCH Fields 13-27 13-15 WRITE_QUEUE Fields 13-28 13-16 Cbox to Mbox interface signals 13-30 13-17 Cbox to Mbox Command Matrix 13-31 13-18 Cbox to Mbox commands and resulting Mbox actions 13-31 13-19 CM_OUT_LATCH Fields 13-32 13-20 Fields of FILL_DATA_PIPE1 and FILL_DATA_PIPE2 13-34 13-21 Cbox Action Upon Receiving M%A8ORT_CBOX_IRD_H 13-35 13-22 NDAL_IN_QUEUE Fields 13-43 13-23 BIU commands sent to Cbox proper 13-43 13-24 NON_WRITEBACK_QUEUE Fields 13-44 13-25 WRITEBACK_QUEUE Fields 13-44 13-26 NVAX Tlmeout Values In Normal Mode 13-48 13-27 NVAX Tlmeout Values In Test Mode 13-48 13-28 Derivation of NVAX Tlmeout Values 13-49 13-29 FILL_CAM Fields Cbox Response to Coherence Transactions to FILL_CAM Entries 13-53 13-30 13-31 tPR Address Space Decoding 13-57 DIGITAL CONFIDENTIAL 13-54 xliii Contents 13-32 Cbox Processor Registers 13-33 CCTL Field Descriptions 13-58 13-61 13-34 TAG_SPEED 13-62 13-35 DATA_SPEED 13-62 13-36 SIZE 13-63 13-37 13-66 13-38 BCETSTS Field Descriptions Interpretation of TS_ CMD 13-39 BCETAG Field Descriptions 13-40 TAG Interpretation 13-69 13-70 13-41 BCEDSTS Field Descriptions 13-42 Interpretation of DR_CMD 13-71 13-73 13-43 BCEDIDX Interpretation 13-74 13-44 CEFSTS Field Descriptions 13-76 13-45 NESTS Field Descriptions 13-81 13-46 NEOCMD Field Descriptions 13-47 BCTAG Field Descriptions 13-84 13-87 13-48 Tag and Index interpretation for BCTAG IPR 13-88 13-49 Cbox Task Priority Under Normal Conditions. 13-50 Cbox Task Priority When DWR_CONFLICT Bits are Set In the WRITE_QUEUE. 
13-91 13-91 13-51 Cbox Task Priority When IWR_CONFLICT Bits are Set In the WRITE_QUEUE. 13-92 13-52 Cbox Task Priority When a DREAD_LOCK Is In progress until the WRITE_UNLOCK Is done. 13-92 13-54 Order of quaclwords read from the Bcache NVAX Backup Cache Invalidates and Wrltebacks 13-96 13-99 13-55 Backup cache behavior while It Is ON 13-102 13-56 Backup cache behavior during ETM 13-105 13-57 Backup cache state changes during ETM 13-106 13-58 Backup Cache ECC Errors and NVAX CPU Error Responses 13-53 13-60 13-111 Probability of reading data with an uncorrectable error after writing It with Inverted checkblts 13-113 Backup Cache ECC Error handling during ETM 13-114 13-61 13-62 Cbox Parallel Port Connections Interpretation of BC_TS_CMD<2:0> 13-63 FILL_CAM scan chain 13-116 13-64 13-65 Cbox Performance Monitoring Control Cbox Performance Monitoring Control 13-118 13-118 13-66 CBOX Interface signals 13-120 13-67 Cross-reference of all names appearing In the CBOX chapter 13-125 13-68 Revision History 13-128 13-59 xliv 13-68 13-115 13-115 14-1 Vector Instruction Set 14-1 14-2 15-1 Revision History 14-4 Error Summary By Notification Entry Point 15-2 Console Halt Codes 15-2 15-19 DIGITAL CONFIDENTIAL Contents 15-3 CPU State Initialized on Console Halt 15-20 15-4 Machine Check Stack Frame Fields 15-23 15-5 Machine Check Codes 15-24 15-6 Revision History 15-90 16-1 Revision History 16-7 17-1 NVAX CPU Clock Sections 17-3 17-2 Skews and RlselFall Times 17-3 Revision History 17-7 17-16 18-1 Performance Monitoring Facility Box Selection 18-3 18-2 Ibox Event Selection 18-4 18-3 Ebox Event Selection 18-4 18-4 Mbox Event Selection 18-5 18-5 Cbox PMCTRO Event Selection 18-6 18-6 18-7 Cbox PMCTR1 Event Selection 18-6 Revision History 18-12 19-3 19-4 19-1 NVAX CPU's Test Pins 19-2 Parallel Port Operating Modes 19-3 Serial to Parallel Conversion of Scan Data 19-4 Instruction Register 19-5 19-11 19-5 Boundary Scan Register Organization 19-19 19-6 Revision History 19-25 20-1 Maximum 
Chapter 1

Introduction

The NVAX CPU is a high-performance, single-chip implementation of the VAX architecture. It is partitioned into multiple sections which cooperate to execute the VAX base instruction group. The CPU chip includes the first levels of the memory subsystem hierarchy in an on-chip virtual instruction cache and an on-chip physical instruction and data cache, as well as the controller for a large second-level cache implemented in static RAMs on the CPU module.

1.1 Scope and Organization of this Specification

This specification describes the operation of the NVAX CPU chip. It contains a description of the interface to the chip, an overview of the operation of the instruction pipeline, and extensive detail about the functional operation of each section of the chip. In addition, the specification contains discussions of error handling, chip initialization, and testability features.

1.2 Related Documents

The following documents are related to or were used in the preparation of this document:

• DEC Standard 032 VAX Architecture Standard.
• NVAX CPU Chip Design Methodology.

1.3 Terminology and Conventions

1.3.1 Numbering

All numbers are decimal unless otherwise indicated.
Where there is ambiguity, numbers other than decimal are indicated with the name of the base following the number in parentheses, e.g., FF (hex).

1.3.2 UNPREDICTABLE and UNDEFINED

RESULTS specified as UNPREDICTABLE may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. Software can never depend on results specified as UNPREDICTABLE.

NVAX CPU Chip Functional Specification, Revision 1.0, February 1991

OPERATIONS specified as UNDEFINED may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. The operation may vary in effect from nothing, to stopping system operation. UNDEFINED operations must not cause the processor to hang, i.e., reach a state from which there is no transition to a normal state in which the machine executes instructions. Note the distinction between result and operation. Non-privileged software cannot invoke UNDEFINED operations.

1.3.3 Ranges and Extents

Ranges are specified by a pair of numbers separated by a ".." and are inclusive, e.g., a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4. Extents are specified by a pair of numbers in angle brackets separated by a colon and are inclusive, e.g., bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3.

1.3.4 Must be Zero (MBZ)

Fields specified as Must Be Zero (MBZ) must never be filled by software with a non-zero value. If the processor encounters a non-zero value in a field specified as MBZ, a Reserved Operand exception occurs.

1.3.5 Should be Zero (SBZ)

Fields specified as Should Be Zero (SBZ) should be filled by software with a zero value. These fields may be used at some future time. Non-zero values in SBZ fields produce UNPREDICTABLE results.
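
As an illustration of the range and extent conventions of Section 1.3.3, the following Python sketch enumerates an inclusive range and extracts an inclusive bit extent from a value. The helper names (inclusive_range, extent_bits, extract_extent) are hypothetical and are not part of this specification.

```python
def inclusive_range(first: int, last: int) -> list[int]:
    """Integers of the range first..last, with both ends included."""
    return list(range(first, last + 1))


def extent_bits(high: int, low: int) -> list[int]:
    """Bit numbers covered by the extent <high:low>, highest bit first."""
    if low < 0 or high < low:
        raise ValueError("an extent <high:low> requires high >= low >= 0")
    return list(range(high, low - 1, -1))


def extract_extent(value: int, high: int, low: int) -> int:
    """Value of bits <high:low> of 'value', right-justified."""
    width = len(extent_bits(high, low))
    return (value >> low) & ((1 << width) - 1)


if __name__ == "__main__":
    print(inclusive_range(0, 4))            # the range 0..4
    print(extent_bits(7, 3))                # the extent <7:3>
    print(hex(extract_extent(0xFF, 7, 3)))  # bits <7:3> of FF (hex)
```

Both helpers treat the two bounds as inclusive, which is the point of the notation: the extent <7:3> is five bits wide, not four.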
1.3.6 Register Format Notation

This specification contains a number of figures that show the format of various registers, followed by a description of each field. In general, the fields on the register are labeled with either a name or a mnemonic. The description of each field includes the name or mnemonic, the bit extent, and the type. An example of a register is shown in Figure 1-1. Table 1-1 is an example of the description of the fields in this register.

Figure 1-1: Register Format Example

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0|       FAULT_CMD       | x  x  x  x|IE| 0  0  0  0  0  0  0  0|  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                         |  |  |
                                                                                 TRAP ---+  |  |
                                                                               INTERRUPT ---+  |
                                                                                  BUS_ERROR ---+

Table 1-1: Register Field Description Example

Name       Extent  Type  Description
BUS_ERROR  0       WC,0  The BUS_ERROR bit is set when a bus error is detected.
INTERRUPT  1       WC,0  The INTERRUPT bit is set when an error that is reported as an interrupt is detected.
TRAP       2       WC,0  The TRAP bit is set when an error that is reported as a trap is detected.
IE         11      RW,0  The IE bit enables error reporting interrupts. When IE is 0, interrupts are disabled. When IE is 1, interrupts are enabled.
FAULT_CMD  23:16   RO    The FAULT_CMD field latches the command that was in progress when an error is detected.

The "Type" column in the field description includes both the actual type of the field, and an optional initialized value, separated from the type by a comma. The type denotes the functional operation of the field, and may be one of the values shown in Table 1-2. If present, the initialized value indicates that the field is initialized by hardware or microcode to the specified value at powerup.
If the initialized value is not present, the field is not initialized at powerup.

Table 1-2: Register Field Type Notation

Notation  Description
RW        A read-write bit or field. The value may be read and written by software, microcode, or hardware.
RO        A read-only bit or field. The value may be read by software, microcode, or hardware. It is written by hardware; software or microcode writes are ignored.
WO        A write-only bit or field. The value may be written by software or microcode. It is read by hardware and reads by software or microcode return an UNPREDICTABLE result.
WZ        A write-only bit or field. The value may be written by software or microcode. It is read by hardware and reads by software or microcode return a 0.
WC        A write-one-to-clear bit. The value may be read by software or microcode. Software or microcode writes of a 1 cause the bit to be cleared by hardware. Software or microcode writes of a 0 do not modify the state of the bit.
RC        A read-to-clear field. The value is written by hardware and remains unchanged until read. The value may be read by software or microcode, at which point, hardware may write a new value into the field.

In addition to named fields in registers, other bits of the register may be labeled with one of the three symbols listed in Table 1-3. These symbols denote the type of the unnamed fields in the register.

Table 1-3: Register Field Notation

Notation  Description
0         A "0" in a bit position denotes a register bit that is read as a 0 and ignored on write.
1         A "1" in a bit position denotes a register bit that is read as a 1 and ignored on write.
x         An "x" in a bit position denotes a register bit that does not exist in hardware. The value is UNPREDICTABLE when read, and ignored on write.
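
The software-visible behavior of three of the field types in Table 1-2 can be sketched as a small Python model. This is an illustrative sketch only: the class RegisterField and its method names are hypothetical, only the RW, RO, and WC types are modeled, and the hardware side is reduced to a single hardware_write call.

```python
class RegisterField:
    """One register field with the software-visible behavior of its type."""

    def __init__(self, kind: str, width: int = 1, initial: int = 0):
        if kind not in ("RW", "RO", "WC"):
            raise ValueError("this sketch models only RW, RO, and WC fields")
        self.kind = kind
        self.mask = (1 << width) - 1
        self.value = initial & self.mask

    def hardware_write(self, data: int) -> None:
        """Hardware loads the field, e.g. to latch an error condition."""
        self.value = data & self.mask

    def software_write(self, data: int) -> None:
        """Apply a software (or microcode) write according to the type."""
        data &= self.mask
        if self.kind == "RW":
            self.value = data
        elif self.kind == "WC":
            # Writing a 1 clears the bit; writing a 0 leaves it unchanged.
            self.value &= ~data & self.mask
        # RO: software writes are ignored.

    def software_read(self) -> int:
        return self.value


if __name__ == "__main__":
    # Field names and types follow the example register of Table 1-1.
    bus_error = RegisterField("WC")
    fault_cmd = RegisterField("RO", width=8)
    bus_error.hardware_write(1)     # hardware detects a bus error
    fault_cmd.hardware_write(0x5A)  # arbitrary example command value
    bus_error.software_write(0)     # no effect on a WC bit
    fault_cmd.software_write(0xFF)  # ignored for an RO field
    print(bus_error.software_read(), hex(fault_cmd.software_read()))
    bus_error.software_write(1)     # write-one-to-clear
    print(bus_error.software_read())
```

The WC case is the one most often misread: the written value is not stored, it selects whether the bit is cleared.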
1.3.7 Timing Diagram Notation

This specification contains a number of timing diagrams that show the timing of various signals, including NDAL signals. The notation used in these timing diagrams is shown in Figure 1-2.

Figure 1-2: Timing Diagram Notation

[Figure 1-2 defines the waveform symbols for: HIGH, LOW, INTERMEDIATE, VALID HIGH OR LOW, CHANGING, HIGH TO LOW, HIGH TO VALID, LOW TO HIGH, LOW TO VALID, LOW TO INTERMEDIATE, and INVALID TO INTERMEDIATE.]

1.4 Revision History

Table 1-4: Revision History

Who         When         Description of change
Mike Uhler  06-Mar-1989  Release for external review.
Mike Uhler  15-Dec-1989  Update for second-pass release.

Chapter 2

Architectural Summary

2.1 Overview

This chapter provides a summary of the VAX architectural features of the NVAX CPU Chip. It is not intended as a complete reference but rather to give an overview of the user-visible features. For a complete description of the architecture, consult the VAX Architecture Standard (DEC Standard 032).

2.2 Visible State

The visible state of the processor consists of memory, both virtual and physical, the general registers, the processor status longword (PSL), and the privileged internal processor registers (IPRs).

2.2.1 Virtual Address Space

The virtual address space is four gigabytes (2**32), separated into three accessible regions (P0, P1, and S0) and one reserved page, as shown in Figure 2-1.

Figure 2-1: Virtual Address Space Layout

00000000 +-------------------------+
         |                         |
         |       P0 Region         |  length of P0 Region in pages (P0LR)
         |                         |
         | - - - - - - - - - - - - |
         |            |            |
         |            V            |  P0 Region growth direction
3FFFFFFF |                         |
40000000 +-------------------------+
         |            ^            |
         |            |            |  P1 Region growth direction
         | - - - - - - - - - - - - |
         |                         |
         |       P1 Region         |  length of P1 Region in pages (2**21-P1LR)
7FFFFFFF |                         |
80000000 +-------------------------+
         |                         |
         |      System Region      |  length of System Region in pages (SLR)
         |                         |
         | - - - - - - - - - - - - |
         |            |            |
         |            V            |  System Region growth direction
         |                         |
         +-------------------------+
         |      Reserved Page      |
         +-------------------------+

NOTE: NVAX CPU chips at revision 1 implement the original VAX memory management architecture in which any reference to a virtual address above BFFFFFFF (hex) causes a length violation. NVAX CPU chips at revision 2 or later implement the extended S0 space addressing described above.

2.2.2 Physical Address Space

The NVAX CPU naturally generates 32-bit physical addresses. This corresponds to a four gigabyte physical address space as shown in Figure 2-2. Memory space occupies the first seven-eighths (3.5GB) of the physical address space. I/O space occupies the last one-eighth (512MB) of the physical address space and can be distinguished from memory space by the fact that bits <31:29> of the physical address are all ones.

NVAX CPU Chip Functional Specification, Revision 1.1, August 1991

Figure 2-2: 32-bit Physical Address Space Layout

00000000 +-------------------------+
         |                         |
         |                         |
         |       Memory Space      |  3.5 GB
         |                         |
DFFFFFFF |                         |
E0000000 +-------------------------+
         |        I/O Space        |  512 MB
FFFFFFFF +-------------------------+

In addition to the natural 32-bit physical address, the CPU may be configured to generate 30-bit physical addresses.
In this mode, only 512MB of memory space can be referenced, as shown in Figure 2-3.

Figure 2-3: 30-bit Physical Address Space Layout

00000000 +-------------------------+
         |       Memory Space      |  512 MB
1FFFFFFF |                         |
20000000 +-------------------------+
         |                         |
         |   Inaccessible Region   |  3.0 GB
         |                         |
DFFFFFFF |                         |
E0000000 +-------------------------+
         |        I/O Space        |  512 MB
FFFFFFFF +-------------------------+

The translation from 30-bit addresses to 32-bit addresses is accomplished by sign-extending PA<29> to PA<31:30>. In this mode, the programmer sees a 1GB address space, split evenly between memory and I/O space, which is mapped to the actual 32-bit physical address space as shown in Table 2-1. Unless explicitly stated otherwise, addresses that are given in the remainder of this specification are the full 32-bit addresses (which, of course, may have been generated from a 30-bit program address via the mapping shown).

Table 2-1: 30-bit Mapping of Program Addresses to 32-bit Hardware Addresses

Program Address       Hardware Address
00000000..1FFFFFFF    00000000..1FFFFFFF
20000000..3FFFFFFF    E0000000..FFFFFFFF

2.2.2.1 Physical Address Control Registers

During powerup, microcode configures the CPU to generate 30-bit physical addresses. Console firmware may then reconfigure the CPU and optional vector unit to generate either 30-bit or 32-bit physical addresses by writing to the MODE bit in the PAMODE register. The PAMODE register is shown in Figure 2-4.
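
The 30-bit to 32-bit address translation of Section 2.2.2 (sign-extension of PA<29> into PA<31:30>) and the I/O space test on bits <31:29> can be sketched in Python as follows. The function names are hypothetical, and the sketch models only the address arithmetic, not the PAMODE register or any hardware timing.

```python
MASK_30 = (1 << 30) - 1  # largest 30-bit program address


def map_30_to_32(program_addr: int) -> int:
    """Sign-extend PA<29> into PA<31:30> to form the 32-bit hardware address."""
    if not 0 <= program_addr <= MASK_30:
        raise ValueError("program address must fit in 30 bits")
    pa29 = (program_addr >> 29) & 1
    upper = 0b11 if pa29 else 0b00  # copies of PA<29> for bits 31 and 30
    return (upper << 30) | program_addr


def is_io_space(hardware_addr: int) -> bool:
    """I/O space is the region where bits <31:29> are all ones."""
    return ((hardware_addr >> 29) & 0b111) == 0b111


if __name__ == "__main__":
    # The four boundary addresses of the two program-address ranges.
    for addr in (0x00000000, 0x1FFFFFFF, 0x20000000, 0x3FFFFFFF):
        mapped = map_30_to_32(addr)
        print(f"{addr:08X} -> {mapped:08X}  io={is_io_space(mapped)}")
```

Run over the boundary addresses, the sketch maps the lower half of the program address space to itself and the upper half to E0000000..FFFFFFFF, in agreement with Table 2-1.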

Figure 2-4: IPR E7 (hex), PAMODE

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|  | :PAMODE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                               |
                                                                                       MODE ---+

The PAMODE register also determines how PTEs are to be interpreted. In 30-bit mode, PTEs are interpreted in 21-bit PFN format. In 32-bit mode, PTEs are interpreted in 25-bit PFN format (although the two upper bits of the PFN field are ignored). The different PTE formats are described in Section 2.6.4. The PAMODE register is described in more detail in Chapter 12.

2.2.3 Registers

There are 16 32-bit General Purpose Registers (GPRs). The format is shown in Figure 2-5, and the use of each GPR is shown in Table 2-2.

Figure 2-5: General Purpose Registers

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :Rn
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-2: General Purpose Register Usage

GPR     Synonym  Use
R0-R11           General Purpose
R12     AP       Argument Pointer
R13     FP       Frame Pointer
R14     SP       Stack Pointer
R15     PC       Program Counter

The Processor Status Longword (PSL) is a 32-bit register which contains processor state. The PSL format is shown in Figure 2-6, and the fields of the PSL are shown in Table 2-3.

Figure 2-6: Processor Status Longword Fields

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|CM|TP|VM|MB|FP|IS| CUR | PRV |MB|     IPL      |          MBZ          |DV|FU|IV| T| N| Z| V| C| :PSL
|  |  |  |Z |D |  | MOD | MOD |Z |              |                       |  |  |  |  |  |  |  |  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-3: Processor Status Longword

Name     Bit(s)  Description
CM       31      Compatibility Mode (1)
TP       30      Trace Pending
VM       29      Virtual Machine Mode (1)
FPD      27      First Part Done
IS       26      Interrupt Stack
CUR_MOD  25:24   Current Mode
PRV_MOD  23:22   Previous Mode
IPL      20:16   Interrupt Priority Level
DV       7       Decimal Overflow Trap Enable
FU       6       Floating Underflow Fault Enable
IV       5       Integer Overflow Trap Enable
T        4       Trace Trap Enable
N        3       Negative Condition Code
Z        2       Zero Condition Code
V        1       Overflow Condition Code
C        0       Carry Condition Code

(1) MBZ in current implementation

2.3 Data Types

The NVAX CPU supports nine data types: byte, word, longword, quadword, character string, variable length bit field, F_floating, D_floating, and G_floating. These are summarized in Figure 2-7.

Figure 2-7: Data Types

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|                       | :A
+--+--+--+--+--+--+--+--+
Data Type: Byte
Length: 8 bits
Use: Signed or unsigned integer

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Data Type: Word
Length: 16 bits
Use: Signed or unsigned integer

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Data Type: Longword
Length: 32 bits
Use: Signed or unsigned integer

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Data Type: Quadword
Length: 64 bits
Use: Signed integer

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|                       | :A
+--+--+--+--+--+--+--+--+
|                       | :A+1
+--+--+--+--+--+--+--+--+
            .
            .
+--+--+--+--+--+--+--+--+
|                       | :A+length-1
+--+--+--+--+--+--+--+--+
Data Type: Character String
Length: 0-64K bytes
Use: Byte string

[diagram: bit field, shown hatched, located relative to base address A]
Data Type: Variable length bit field
Length: 0-32 bits
Use: Bit string

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| s|       exponent        |      fraction      | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16
Data Type: F_floating
Length: 32 bits
Use: Floating point

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| s|       exponent        |      fraction      | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+6
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48
Data Type: D_floating
Length: 64 bits
Use: Floating point

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| s|            exponent            | fraction  | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   fraction                    | :A+6
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48
Data Type: G_floating
Length: 64 bits
Use: Floating point

2.4 Instruction Formats and Addressing Modes

VAX instructions consist of a one- or two-byte opcode, followed by zero to six operand specifiers.

2.4.1 Opcode Formats

An opcode may be either one or two contiguous bytes. The two-byte format begins with an FD (hex) byte and is followed by a second opcode byte. The one-byte format is indicated by an opcode byte whose value is anything other than FD (hex). The one- or two-byte opcode format is shown in Figure 2-8.
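
The opcode length rule of Section 2.4.1 can be sketched in Python as below. The names opcode_length and split_opcode are hypothetical, the instruction stream is modeled as a plain byte string, and the byte values in the usage lines are arbitrary examples rather than a decoding of any particular instruction.

```python
TWO_BYTE_OPCODE_PREFIX = 0xFD  # first byte of every two-byte opcode


def opcode_length(istream: bytes) -> int:
    """Return 2 if the stream starts with FD (hex), otherwise 1."""
    if not istream:
        raise ValueError("empty instruction stream")
    return 2 if istream[0] == TWO_BYTE_OPCODE_PREFIX else 1


def split_opcode(istream: bytes) -> tuple[bytes, bytes]:
    """Split the stream into (opcode bytes, remaining specifier bytes)."""
    length = opcode_length(istream)
    if len(istream) < length:
        raise ValueError("instruction stream ends inside a two-byte opcode")
    return istream[:length], istream[length:]


if __name__ == "__main__":
    print(split_opcode(bytes([0xC0, 0x50, 0x51])))  # one-byte opcode
    print(split_opcode(bytes([0xFD, 0x32, 0x50])))  # two-byte opcode
```

Only the first byte is inspected to choose the format, which matches the rule that any first byte other than FD (hex) is a complete one-byte opcode.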

Figure 2-8: Opcode Formats

One-byte opcode:

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|        opcode         | :A
+--+--+--+--+--+--+--+--+

Two-byte opcode:

 15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        opcode         |          FD           | :A
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

2.4.2 Addressing Modes

An operand specifier starts with a specifier byte and may be followed by a specifier extension. Bits <3:0> of the specifier byte contain a GPR number and bits <7:4> of the specifier byte indicate the addressing mode of the specifier. If the register number in the specifier byte is not 15, the addressing mode is a general register addressing mode. If the register number in the specifier byte is 15, the addressing mode is a PC-relative addressing mode. The different addressing modes are shown graphically in Figure 2-9. General register addressing modes are listed in Table 2-4 and PC-relative addressing modes are listed in Table 2-5.

Figure 2-9: Addressing Modes

General register addressing mode:

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|   mode    | register  |
+--+--+--+--+--+--+--+--+

PC-relative addressing mode:

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|   mode    | 1  1  1  1|
+--+--+--+--+--+--+--+--+

Table 2-4: General Register Addressing Modes

                                                  Access
Mode  Name                            Assembler   r m w a v  PC  SP  Indexable?
0-3   literal                         S^#literal  y f f f f  x   x   f
4     index                           i[Rx]       y y y y y  u   y   f
5     register                        Rn          y y y f y  u   uq  f
6     register deferred               (Rn)        y y y y y  u   y   y
7     autodecrement                   -(Rn)       y y y y y  u   y   ux
8     autoincrement                   (Rn)+       y y y y y  p   y   ux
9     autoincrement deferred          @(Rn)+      y y y y y  p   y   ux
A     byte displacement               B^d(Rn)     y y y y y  p   y   y
B     byte displacement deferred      @B^d(Rn)    y y y y y  p   y   y
C     word displacement               W^d(Rn)     y y y y y  p   y   y
D     word displacement deferred      @W^d(Rn)    y y y y y  p   y   y
E     longword displacement           L^d(Rn)     y y y y y  p   y   y
F     longword displacement deferred  @L^d(Rn)    y y y y y  p   y   y

Access Types
r = read
m = modify
w = write
a = address
v = variable bit field

Syntax
i = any indexable address mode
d = displacement
Rn = general register, n = 0 to 15
Rx = general register, x = 0 to 14

Results
y = yes, always valid address mode
f = reserved addressing mode fault
x = logically impossible
p = program counter addressing
u = unpredictable
ud = unpredictable for destination of CALLG, CALLS, JMP and JSB
uq = unpredictable for quad, D/G_floating and field if pos+size > 32
ux = unpredictable if index register = base register

Table 2-5: PC-Relative Addressing Modes

                                                  Access
Mode  Name                        Assembler       r m w a v   PC  SP  Indexable?
8     immediate                   I^#constant     y u u y ud  -   -   u
9     absolute                    @#address       y y y y y   -   -   y
A     byte relative               B^address       y y y y y   -   -   y
B     byte relative deferred      @B^address      y y y y y   -   -   y
C     word relative               W^address       y y y y y   -   -   y
D     word relative deferred      @W^address      y y y y y   -   -   y
E     longword relative           L^address       y y y y y   -   -   y
F     longword relative deferred  @L^address      y y y y y   -   -   y

For notation, refer to the key in Table 2-4.

2.4.3 Branch Displacements

Branch instructions contain a one- or two-byte signed branch displacement after the final specifier (if any). The branch displacement is shown in Figure 2-10.

Figure 2-10: Branch Displacements

Signed byte displacement:

 07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+
|     displacement      |
+--+--+--+--+--+--+--+--+

Signed
word displacement: 15 14 13 12111 10 09 08107 06 05 04103 02 01 00 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ I displacem&pt I +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 2.5 Instruction Set The NVAX CPU supports the VAX Base Instruction Group as defined in DEC Standard 032. These instructions are listed in Table 2-6. DIGITAL CONFIDENTIAL Architectural Summary 2-11 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6: Opcode NVAX Instruction Set IDstructiOD N Z V C Exceptions Integer, Arithmetic and Logical Instructions 58 ADAWI add.rw, sum.mw * * * * iov 80 ADDB2 add.rb, sum.mb ADDL2 add.rI, sum.ml AO ADDW2 add.rw, sum.mw * * * * * * * * * iov 00 * * * 81 ADDB3 add1.rb, add2.rb, sum.wb Al ADDLS addi.rI, add2.rI, sum.wI ADDW3 add1.rw, add2.rw, sum.ww * * * * * * * * * iov 01 * * * iov 08 ADWC add.rI, sum.m1 * * * * iov 78 ASHL cnt.rb, src.rI, dst. wI * * iov ASHQ cnt.rb, src.rq, dst.wq * * 0 79 * * 0 iov 8A OA BIOB2 mask.rb, dst.mb BICW2 mask-rw, dst.mw * * * 0 0 AA * * * 8B OB BICB3 mask.rb, src.rb, dst. wb BICW3 mask-rw, srC.TW, dst.ww * * * 0 AB * * * 88 BISB2 mask.rb, dst.m.b AS BISL2 mask.rI, dst.ml BISW2 mask.rw, dst.mw * * * 0 08 * * * 89 BISB3 mask.rb, src.rb, dst.wb BISLS mask.rI, src.rI, dst.wI A9 BISW3 mask.rw, src.rw, dst. ww * * * 0 09 * * * 93 BITB mask.rb, src.rb BITL mask.rI, src.rl B3 BITW mask.rw, src.TW * * * 0 D3 * * * BICL2 mask.rI, dst.m1 BICL3 mask.rI, src.rI, dst. 
wI 2-12 Architectural Summary iov iov iov 0 0 0 0 0 0 0 0 0 DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstructioD N Z V C EltceptiODS Integer, Arithmetic and LogicallDstructioDS 94 CLRB dst.wb 0 I 0 D4 CLRL{::F} dst.wI 0 I 0 7C CLRQ{=D::G} dst.wq 0 I 0 B4 CLRWdst.ww 0 I 0 91 CMPB src1.rb, src2.rb Dl CMPL srel.rl, src2.rI Bl CMPW sre1.rw, sre2.rw * * * 98 CVTBL sre.rb, dst.wI 99 F6 CVTBW sre.rb, dst.ww CVTLB sre.rl, dst.wb F7 CVTLW src.rl, dst.ww 33 CVTWB sre.rw, dst.wb 32 CVTWL Srt.rw, dst.wI 97 DECB dif.mb D7 DECL dif.m1 B7 DECW dif.mw 86 DIVB2 divr.rb, quo.mb C6 DIVL2 elivr.rl, quo.ml A6 DIVW2 divr.rw, quo.mw 87 DIVB3 divr.rb, divcLrb, quo.wb C7 DIVL3 divr.rl, divd.rl, quo.wI II< 0 II< II< 0 II< II< 0 II< II< 0 0 II< 0 0 * II< 0 iov II< II< 0 iov II< II< 0 iov II< 0 0 * * * II< II< II< iov II< II< II< iov II< II< II< iov * * * II< II< 0 iov, idvz II< II< 0 iov, idvz II< II< 0 iov, idvz * * * II< 0 iov, idvz * 0 iov, idvz II< 0 iov, idvz iov, idvz * * * * * * A7 DIVW3 divr.rw, divd.rw, quo.ww * * * 7B EDIV divr.rI, divcLrq, quo.wl, rem.wI * II< * 0 7A EMUL mulr.rl, muld.rl, add.rl, prod.wq * * 0 0 96 INCB sum.mb II< II< D6 INCLsum.m1 * * II< II< * * DIGITAL CONFIDENTIAL iov iov Architectural Summary 2-13 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstractiOD N Z V C ExceptioDS * iov * * iov * iov * 0 iov * * 0 iov 0 iov 0 iov 0 iov 0 iov Integer, Arithmetic and Logical InstructiOD& B6 INCWsum.mw * * * 92 MOOMB src.rb, dst.wb * 0 D2 MOOML src.rI, dst.wl B2 MOOMW src.rw, dst. ww * * * * * BE M'NEGB src.rb, dst.wb * OE MNEGL src.rI, dst.wI AE MNEGW src.rw, dst. ww * * * * * * 90 MOVB sre.rb, dst. 
wb * DO MOVL sre.rI, dst.wl * 7D MOVQ sre.rq, dst.wq BO MOv\v srC.rw, dst.ww * * 9A MOVZBW sre.rb, dst.wb 0 9B MOVZBL sre.rb, dst.wl 0 30 MOVZWL sre.rw, dst.wI 0 * * * 84 MULB2 mulr.rb, prod.mb C4 MULL2 mulr.rl, prod.m1 A4 MULW2 mulr.rw, prod.mw * * * * * * 85 MULB3 mulr.rb, muld.rb, prod.wb * * * * * * * * * * * * 0 0 iov 0 0 0 0 0 0 0 05 MULL3 mulr.rI, muld.rI, prod.wI AS MULW3 mulr.rw, muld.rw, prod.ww * * * DD PUSHL src.rI, {-(SP).wI} * * 0 90 ROTL cnt.rb, src.rl, dst.wl * * 0 D9 SBWO sub.rI, dif.m1 * * * * iov 82 SUBB2 sub.rb, dif.m.b * * * * iov 2-14 Architectural Summary DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0. February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstructiOD. N Z V C Exceptions * * * * * * * * iov * * * * * * * * * * * * * * * * * * 0 0 0 0 0 0 * * * * * * 0 * * * * * * 0 * * * * * * * * 0 * * * * * * * * 0 Integer, Arithmetic and LogicallDstructioDS C2 SUBL2 sub.rI, di£m1 A2. SUBW2 sub.rw, dif.mw 83 SUBB3 sub.rb, min.rb, dif.wb C3 SUBL3 sub.rI, min.rI, dif.wI A3 SUBW3 sub.rw, min.rw, dif.ww 95 TSTB sre.rb D5 TSTL sre.rl B5 TSTW srC.rw BC XORB2 mask.rb, dst.mb CC XORL2 mask.rI, dst.m1 AC XORW2 mask.rw, dst.mw 8D XORB3 mask.rb, src.rb, cist.wb CD XORL3 mask.rI, src.rI, dst.wl AD XORW3 mask.rw, src.rw, dst. ww iov iov iov iov 0 0 0 0 Address IDstructioDS 9E MOVAB src.ab, cist. wI DE MOVAL{=F} arc.al, dst.wI 7E 3E MOVAQ{=D::G} src.aq, dst. 
wI 9F PUSHAB src.ab, {-(SP).wl} DF PUSHALt=F} ere.al, {-(SP).wI} 7F PUSHAQ{=D=G} arc.aq, {-(SP).wI} 3F PUSHAW src.aw, {-(SP).wl) MOVAW src.aw, dst.wI 0 0 0 0 0 0 Variable-LeDgth Bit Field IDstructioDS EC CMPV pos.rI, size.rb, base.vb, {field.rv}, sre.rl * * 0 * rav ED CMPZV pos.rI, size.rb, base.vb, {field.rv}, src.rI * * 0 * rav DIGITAL CONFIDENTIAL Architectural Summary 2-15 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstructiOD N Z V C Exceptions Variable-Length Bit Field IDstructions EE EXTV pos.rl, size.rb, base.vb, ffield.rv}, dst.wl * * 0 rsv EF EXTZV pos.rl, size.rb, base.vb, {field.rv}, dst.wl * * 0 rsv FO INSV src.rl, pos.rl, size.rb, base. vb, {field.wv} EB FFC startpos.rl, size.rb, base. vb , {field.rv} , findpos.wl FFS startpos.rl, size.rb, base.vb, {field.rv} , findpos.wl EA rsv 0 * 0 0 rsv 0 * 0 0 rsv * * * * * * * * * * * * * * * Control InstructiODS 9D ACBB limit.rb, add.rb, index.mb, displ.bw Fl ACBL limit.rl, add.rl, index.mI, clispl.bw 3D ACBW limit.rw, add.rw, index.mw, disp1.bw F3 AOBLEQ limit.rl, index.ml, disp1.bb F2 AOBLSS limit.rl, index.mI, displ.bb IE BCC{=BGEQU} disp1.bb IF BCS{=BLSSU} displ.bb 13 BEQL{=BEQLU} displ.bb 18 BGEQ clisp1.bb 14 BGTR displ.bb iov iov iov iov iov 1A BGTRU disp1.bb 15 BLEQ displ.bb IB BLEQU d,ispLbb 19 BLSS displ.bb 12 BNEQI=BNEQU} displ.bb 1C BVC displ.bb ID BVS dispLbb El BBC pos.rl, base.vb, clisp1.bb, {field.rv} rsv EO BBS pos.rl, base.vb, displ.bb, {neld.rv} rsv 2-16 Architectural Summary DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0. 
February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set N IDstruction Z V C E%ceptiODS Control InstructioDS E5 BBCC pos.rI, base.vb, displ.bb, ffield.mv} rsv E3 BBCS pos.rI, base.vb, dispI.bb, ffield.mv} rsv E4 BBSC pos.rI, base.vb, displ.bb, ffield.mv} rsv E2 BBSS pos.rI, base.vb, dispI.bb, ffield.mv} rsv E7 BBCCI pos.rI, base.vb, displ.bb, ffieId.mv} rsv E6 BBSSI pos.rI, base.vb, displ.bb, {field.mv} rsv E9 ES BLBC src.rI, disp1.bb BLBS src.r1, disp1.bb 11 BRB disp1.bb 31 BRW displ.bw 10 BSBB disp1.bb, {-(SP).wl} 30 BSBW disp1.bw, {-(SP).wl} SF CASEB selector.rb, displ.bw-list base.rb, limit.rb, * * 0 * CF CASEL selector.rI, displ.bw-list base.rI, limit.rI, * lie 0 lie AF CASEW selector.rw, displ.bw-list base.rw, limit.rw, * lie 0 lie 17 JMP dst.ab 16 JSB dst.ab, {-(SP).wl} 05 RSB {(SP)+.rl} F4 SOBGEQ index.ml, disp1.bb lie lie iov F5 SOBGTR index.ml, displ.bb * * lie lie iov DIGITAL CONFIDENTIAL Architectural Summary 2-17 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstructiOD N Z V C Exceptions Procedure Call1DstructiODS FA CALLG arglist.ab, dst.ab, {-(SP).w*) 0 0 0 0 rev FB CALLS numarg.rl, dst.ab, {-(SP).w*} 0 0 0 0 rev 04 RET {(SP)+.r*) * * * * rev * * * * * * rev rsv 0 0 0 Miscellaneous IDstructions B9 BICPSW mask.rw B8 BISPSW mask.rw * * 03 BPI' (·<KSP). w*} 0 00 HALT f·<KSP).w*} OA INDEX subscript.rl, low.r!, bigh.rl, size.rl, indexin.rl, indexout.wI DC MOVPSL dst.wI 01 NOP BA POPR mask.rw, {(SP)+.r*} BB PUSHR mask.rw, {-(SP). w*} Fe XFC {unspecified operands} PrY * * 0 0 0 0 0 0 * * 0 * 0 * * * sub Queue IDstructiODS 5C INSQHI entry.ab, header.aq 0 5D OE INSQTI entry.ab, header.aq 0 INSQUE entry.ab, pred.ab * * 0 5E 5F OF REMQHI header.aq, addr. wI 0 * REMQTI header.aq, addr. 
wI 0 REMQUE entry.ab, addr.wl * * * * * * 2-18 Architectural Summary * rsv rsv rsv rev * DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstructiOD N Z V C hceptioDS Operating System Support IDstructiODS BD OHME param.rw, {...(ySP).w*} 0 0 0 0 BO OHM!{ param.rw, {-(ySP).w*} 0 0 0 0 BE OHMS param.rw, {-(ySP).w*} 0 0 0 0 BF OHMU param.rw, {-(ySP).w*} 0 0 0 0 06 LDPOTX fPOB.r*, -<KSP).w*} DB MFPR procreg.rl, dst. wI DA rsy, prv * * 0 rsy, prv MTPR src.rl, procreg.rI * * 0 rsy, prv 00 PROBER mode.rb, len.rw, base.ab 0 PROBEW mode.rb, Ien.rw, base.ab 0 * * 0 OD 02 REI {(SP)+.r*} oj< * * 07 SVPCTX {(SP)+.r*, POB.w*} 0 * rsy prv Character StriDg IDstractioDB 29 OMPCS len.rw, srcladdr.ab, src2addr.ab 2D OMP05 srcllen.rw, :fill.rb,src21en.rw, src2addr.ab SA LOOO char.rb, len.rw, addr.ab 28 MOVOS Ien.rw, {R0-5.wl} * * 0 * * * 0 * 0 * 0 0 dstaddr.ab, 0 1 0 0 20 MOV05 srclen.rw, srcaddr.ab, fill.rb, dstlen.rw, dstaddr.ab,{R0-5.wl} * * 0 * 2A SOANO len.rw, addr.ab, tbladdr.ab, mask.rb 0 * 0 0 DIGITAL CONFIDENTIAL srcaddr.ab, srcladdr.ab, Architectural Summary 2-19 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstructiOD N Z V C ExceptioDS Character StriDg InstructiODS 3B SKPC ehar.rb, len.rw, addr.ab 0 * 0 0 2B SPANC len.rw, addr.ab, tbladdr.ab, mask.rb 0 * 0 0 * * * * * * 0 0 0 rav, fov, fuv 0 0 0 rav, fov, fuv * * * * * * 0 0 rsv, fov,fuv 0 0 rav, fov, fuv 0 0 rsv, fov, fuv * * * * * * 0 0 rav 0 0 rav 0 0 rav * * * * * * * * * * * * * * 0 0 0 0 0 0 * 0 rav, iov 0 0 rav, fov * * * 0 rav, iov 0 rav, iov 0 rav, iov 0 0 rsv 0 0 * * * 0 rsv rsv, iov 0 rav, iov 0 rav, iov 0 0 rsv,fov,fuv * * 0 rsv, iov Floating Point IDstructiODS 60 ADDD2 add.rd, sum.md 40 ADDF2 add.rf, sum.mf 40FD ADDG2 add.rg t sum.mg 61 ADDD3 addl.rd, add2.rd t sum.wd 4lFD ADDF3 add1.ri', add.2.ri't sum.wf ADDG3 addl.rg, add2.rg, sum. 
wg 71 CMPD srcl.rd, src2.rd 51 51FD CMPF srel.n, sre2.rf CMPG srel.rg, src2.rg 6C CVTBD sre.rb, dst.wd 41 33FD CVTGF sre.rg, dst.wf * * * * * * * * * * * * * * 4A.FD CVTGL arc.rg, dst.wl * 4C 4CFD 68 76 CVTBF sre.rb, dst.wf CVTBG sre.rb, dst.wg CVTDB sre.rd, dst.wb CVTDF sre.rd, dst.wf SA CVTDL arc.rd, dst.wl 69 48 CVTDW src.rd, dst.ww CVTFB src.n, dst. wb 56 CVTFD sre.n, dst.wd 99FD CVTFG src.n, dst. wg 4A CVTFL src.n, dst. wI 49 CVTFW sre.n, dst.ww 48FD CVTGB sre.rg, dst.wb 2-20 Architectural Summary rav, fov, fuv DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstructiOIl N Z * * * * * * * * * * * * * * * * * * * * * * * V C ExCeptioDS rsv, iov Floating Point IDstructiODB 49FD CVTGW src.rg, dst.ww 6E CVTLD sre.rI, dst.wd 4E CVTLF src.rI, dst.wf 4EFD CVTLG sre.rI, dst.wg 6D CVTWD src.rw, dst.wd 4D CVTWF src.rw, dst. wf 4DFD CVTWG src.rw, dst.wg 6B CVTRDL sre.rd, dst.wI 4B CVTRFL sre.rf, dst. wI 4BFD CVTRGL sre.rg, dst.wI 66 DIVD2 divr.rd, quo.md 46 DIVF2 divr.rf, quo.mf 46FD DIVG2 divr.rg, quo.mg 67 DIVD3 divr.rd, divd.rd, quo.wd 47 DIVF3 divr.rf, divd.rf, quo.wf 47FD DIVG3 divr.rg, divd.rg, quo.wg 72 MNEGD arc.rd., dst.wd 52 MNEGF src.rf, dst. 
wf 52FD MNEGG sre.rg, dst.wg 70 MOVD src.rd., dst.wd * 0 0 0 0 0 0 0 0 0 0 0 0 0 * * * 0 rsv, iov 0 rsv, iov 0 rsv, iov * * * 0 0 rsv, fov, fuv, fdvz 0 0 rsv, fov, fuv, fdvz 0 0 rsv, fov, fuv, fdvz * * * * * * 0 0 rsv, fov, fuv, fdvz 0 0 rsv, fov, fuv, fdvz 0 0 rsv, fov, fuv, fdvz * * * * * * * * * * * 0 0 rsv 0 0 rsv 0 0 rsv * * * 0 50 MOVF src.rf, dst.wf 50FD MOVG src.rg, dst.wg lie 64 MULD2 mulr.rd, prod.md lie 44 MULF2 .mulr.rf, prod.mf * 44FD MULG2 mulr.rg, prod.mg lie 65 MULD3 mulr.rd, muld.rd, prod.wd lie 45 MULF3 mulr.rf, muld.rf, prod.wf 45FD MULG3 mulr.rg, muld.rg, prod.wg DIGITAL CONFIDENTIAL 0 rsv 0 rsv 0 rsv 0 rsv,fov, fuv 0 0 rsv, fov, fuv 0 0 rsv,fov,fuv 0 0 rsv, fov, fuv * * * 0 0 rsv, fov, fuv lie lie 0 0 rsv, fov, fuv Architectural Summary 2-21 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set N Z V C Exceptions * * * * * * 0 0 rsv,fov,fuv 0 0 rsv, fov, fuv 0 0 rsv, fov, fuv * * * * * * 0 0 rsv, fov, fuv 0 0 rsv, fov, fuv 0 0 rsv, fov, fuv * * * * * * 0 0 rsv 0 0 rsv 0 0 rsv sumlen.rw, * * * 0 rsv, dov ADDP6 addllen.rw, add1addr.ab, add21en.rw, * * * 0 rsv,dov rsv,dov IDstructiOD Floating Point IDstructioDS 62 SUBD2 sub.rd, dif.md 42 SUBF2 sub.rf, dif.mf 42FD SUBG2 sub.rg, dif.mg 63 SUBD3 sub.rd, min.rd, dif.wd 43 SUBF3 sub.rf, min.rf, dif.wf 43FD SUBG3 sub.rg, min.rg, dif.wg 73 TSTD sre.rd 53 TSTF sre.rf 53FD TSTGsrc.rg Microcode-Assisted Emulated IDstructiODS 20 ADDP4 addlen.rw, addaddr.ab, sumaddr.ab 21 add2addr.ab, suaUen.rw, suznaddr.ab F8 ASHP cnt.rb, srclen.rw, srcaddr.ab, round.rb, dstlen.rw, dstaddr.ab * * * 0 35 CMPP3 len.rw, srcladdr.ab, src2addr.ab * * 0 37 * * 0 CMPP4 srellen.rw, srcladdr.ab, src21en.rw, src2addr.ab 0 0 OB CRC tbl.ab, inierc.rI, strlen.rw, stream.ab * * 0 0 F9 CVTLP src.rI, dstlen.rw, dstaddr.ab * * rsv,dov CVTPL srcien.TW, srcaddr.ab, dst.wI * * 0 36 * * 0 rsv, iov 08 CVTPS srelen.rw, dstaddr.ab srcaddr.ab, dstlen.rw, * * * 0 rsv, dov 09 CVTSP 
srclen.rw, dstaddr.ab srcaddr.ab, dstlen.rw, * * * 0 rsv,dov 2-22 Architectural Summary DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): Opcode NVAX Instruction Set IDstrDctiOD N Z V C kceptions Microcode-Assisted Emulated Instructions 24 CVTPl' srclen.rw, srcaddr.ab, dstlen.rw, dstaddr.ab tbladdr.ab, * '" '" 0 rsv, dov 26 CVTTP srclen.rw, srcaddr.ab, dstlen.rw, dstaddr.ab tbladdr.ab, * '" '" 0 rsv, dov 27 DIVP divrlen.rw, divraddr.ab, divdlen.rw, divdaddr.ab, quolen.rw, quoaddr.ab * '" * 0 rsv, dov, ddvz EDITPC srclen.rw, srcaddr.ab, pattern.ab, * * * '" rsv, dov srclen.rw, 0 * 0 0 * '" 0 0 38 dstaddr.ab 39 :MA.TCHC objlen.rw, srcaddr.ab 34 :MO'V"P len.rw, srcaddr.ab, dstaddr.ab 2E MOVTC srclen.rw, srcaddr.ab, tbladdr.ab, dstlen.rw, dstaddr.ab fill.rb, * '" 0 * 2F MOVTUC srclen.rw, srcaddr.ab, tbladdr.ab, dstlen.rw, dstaddr.ab esc.rb, * * * '" 25 MULP mulrlen.rw, mulraddr.ab, muldlen.rw, muldaddr.ab, prodlen.rw, prodaddr.ab * * * 0 rsv, dov 22 SUBP4 sublen.rw, difaddr.ab subaddr.ab, diflen.rw, '" * * 0 rsv, dov 23 SUBPG sublen.rw, subaddr.ab, mjnaddr ab diflen]=WI difaddr ab minlen.rw, * * '" 0 rsv, dov DIGITAL CONFIDENTIAL objaddr.ab, Architectural Summary 2-23 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-6 (Cont.): NVAX Instruction Set The notation used for operand specifiers is <DBme>.<access type><data type>. Implied operands (those locations that are referenced by the instruction but not specified by an operand) are denoted by curly braces n. 
Access Type
  a = address operand
  b = branch displacement
  m = modified operand (both read and written)
  r = read only operand
  v = if not "Rn", same as a, otherwise R[n+1]'R[n]
  w = write only operand

Data Type
  b = byte
  d = D_floating
  f = F_floating
  g = G_floating
  l = longword
  q = quadword
  v = field (used only in implied operands)
  w = word
  * = multiple longwords (used only in implied operands)

Condition Codes Modification
  * = conditionally set/cleared
  - = not affected
  0 = cleared
  1 = set

Exceptions
  rsv  = reserved operand fault
  iov  = integer overflow trap
  idvz = integer divide by zero trap
  fov  = floating overflow fault
  fuv  = floating underflow fault
  fdvz = floating divide by zero fault
  dov  = decimal overflow trap
  ddvz = decimal divide by zero trap
  sub  = subscript range trap
  prv  = privileged instruction fault
  vec  = vector unit disabled fault

2.6 Memory Management

The NVAX CPU Chip supports a four gigabyte (2**32) virtual address space, divided into two sections, system space and process space. Process space is further subdivided into the P0 region and the P1 region.

2.6.1 Memory Management Control Registers

Memory management is controlled by three processor registers: Memory Management Enable (MAPEN), Translation Buffer Invalidate Single (TBIS), and Translation Buffer Invalidate All (TBIA).

Bit <0> of the MAPEN register enables memory management if written with a 1 and disables memory management if written with a 0. The MAPEN register is shown in Figure 2-11.
Figure 2-11: IPR 38 (hex), MAPEN

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|  | :MAPEN
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                              |
                                                                                        MME --+

The TBIS register controls translation buffer invalidation. Writing a virtual address into TBIS invalidates any entry which maps that virtual address. The TBIS format is shown in Figure 2-12.

Figure 2-12: IPR 3A (hex), TBIS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                        Virtual Address                                        | :TBIS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The TBIA register also controls translation buffer invalidation. Writing a zero into TBIA invalidates the entire translation buffer. The TBIA format is shown in Figure 2-13.

Figure 2-13: IPR 39 (hex), TBIA

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0| :TBIA
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

2.6.2 System Space Address Translation

A virtual address with bit <31> = 1 is an address in the system virtual address space. System virtual address space is mapped by the System Page Table (SPT), which is defined by the System Base Register (SBR) and the System Length Register (SLR).
The SBR contains the page-aligned physical address of the System Page Table. The SLR contains the size of the SPT in longwords, that is, the number of Page Table Entries. The Page Table Entry addressed by the System Base Register maps the first page of system virtual address space, that is, virtual byte address 80000000 (hex). These registers are shown in Figure 2-14.

With a 22-bit SLR width, 2**22 - 1 pages in system space may be addressed. As a result, the last page of system space (beginning at virtual address FFFFFE00 (hex)) is not addressable. This page is therefore reserved, and a reference to any address in that page will result in a length violation.

NOTE
NVAX CPU chips at revision 1 implement the original VAX memory management architecture in which any reference to a virtual address above BFFFFFFF (hex) causes a length violation. NVAX CPU chips at revision 2 or later implement the extended S0 space addressing described above.

NOTE
When the CPU is configured to generate 30-bit physical addresses, SBR<31:30> are ignored.

Figure 2-14: IPR 0C (hex), SBR and IPR 0D (hex), SLR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    Physical Page Address of SPT                    | 0  0  0  0  0  0  0  0  0| :SBR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0|                   Length of SPT in Longwords                    | :SLR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The system space translation algorithm is shown graphically in Figure 2-15.
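The first and last steps of the system space translation can be sketched as follows. This is a minimal illustration, not the chip's implementation: the function names, the LengthViolation exception, and the use of plain integers for register contents are all assumptions. The sketch assumes the extended S0 addressing of revision 2 or later (a 22-bit virtual page number in VA<30:9>), takes the length comparison directly from the definition of SLR as the number of Page Table Entries, and omits the access check and the 30-bit-mode sign extension.

```python
PAGE_SHIFT = 9                       # 512-byte pages: byte offset is VA<8:0>
BYTE_MASK = (1 << PAGE_SHIFT) - 1
SYSTEM_VPN_MASK = (1 << 22) - 1      # VA<30:9>, assumed 22-bit VPN


class LengthViolation(Exception):
    """Hypothetical stand-in for the length violation described in the text."""


def spte_physical_address(va: int, sbr: int, slr: int) -> int:
    """Return the physical address of the SPTE that maps system address va."""
    if not (va >> 31) & 1:
        raise ValueError("not a system-space virtual address")
    vpn = (va >> PAGE_SHIFT) & SYSTEM_VPN_MASK
    if vpn >= slr:                   # SLR holds the number of PTEs in the SPT
        raise LengthViolation(hex(va))
    return sbr + 4 * vpn             # one longword (4 bytes) per PTE


def merge(pfn: int, va: int) -> int:
    """Merge a page frame number with the byte offset of va."""
    return (pfn << PAGE_SHIFT) | (va & BYTE_MASK)
```

With SBR = 1000 (hex) and SLR = 16, virtual address 80000200 (hex) selects the second PTE, at physical address 1004 (hex).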
Figure 2-15: System Space Translation Algorithm

system-space virtual address:  <31> = 1 | <30:9> virtual page number | <8:0> byte
      |
      |  extract VPN, check length, and add
      v
SBR:  physical address of SPT base
      |
      |  sign-extend PA<29> to PA<31:30> if in 30-bit mode
      |  yields
      v
physical address of SPTE
      |
      |  fetch
      v
SPTE:  page frame number
      |
      |  check access in current mode,
      |  sign-extend PTE<20> to PTE<22:21> if in 30-bit mode
      |  merge
      v
physical address:  <31:9> page frame number | <8:0> byte

2.6.3 Process Space Address Translation

A virtual address with bit <31> = 0 is an address in the process virtual address space. Process space is divided into two equal sized, separately mapped regions. If virtual address bit <30> = 0, the address is in region P0. If virtual address bit <30> = 1, the address is in region P1.

2.6.3.1 P0 Region Address Translation

The P0 region of the address space is mapped by the P0 Page Table (P0PT), which is defined by the P0 Base Register (P0BR) and the P0 Length Register (P0LR). The P0BR contains the page-aligned system virtual address of the P0 Page Table. The P0LR contains the size of the P0PT in longwords, that is, the number of Page Table Entries. The Page Table Entry addressed by the P0 Base Register maps the first page of the P0 region of the virtual address space, that is, virtual byte address 0.
The P0 base and length registers are shown in Figure 2-16. The P0 space translation algorithm is shown graphically in Figure 2-17.

Figure 2-16: IPR 08 (hex), P0BR and IPR 09 (hex), P0LR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1| 0|             System Virtual Page Address of P0PT              | 0  0  0  0  0  0  0  0  0| :P0BR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0|                   Length of P0PT in Longwords                   | :P0LR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 2-17: P0 Space Translation Algorithm

process-space virtual address:  <31:30> = 0 | <29:9> virtual page number | <8:0> byte
      |
      |  extract VPN, check length, and add
      v
P0BR:  virtual address of P0PT base
      |
      |  yields
      v
virtual address of P0PTE:  virtual page number | byte
      |
      |  fetch using system-space translation algorithm,
      |  including length check, but without access check
      v
P0PTE:  page frame number
      |
      |  check access in current mode,
      |  sign-extend PTE<20> to PTE<22:21> if in 30-bit mode
      |  merge
      v
physical address:  <31:9> page frame number | <8:0> byte

2.6.3.2 P1 Region Address Translation

The P1 region of the address space is mapped by the P1 Page Table (P1PT), which is defined by the P1 Base Register (P1BR) and the P1 Length Register (P1LR). Because P1 space grows towards smaller addresses, and because a consistent hardware interpretation of the base and length registers is desirable, P1BR and P1LR describe the portion of P1 space that is NOT accessible.

Note that P1LR contains the number of nonexistent PTEs. P1BR contains the page-aligned virtual address of what would be the PTE for the first page of P1, that is, virtual byte address 40000000 (hex). The address in P1BR is not necessarily an address in system space, but all the addresses of PTEs must be in system space.

The P1 space translation algorithm is shown graphically in Figure 2-19.
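Because P1BR and P1LR describe the inaccessible portion of P1 space, the length check runs in the opposite direction from P0 and system space. The following minimal sketch shows the first step of the P1 translation only. The function name, the LengthViolation exception, and the integer representation of the registers are assumptions for illustration; the direction of the comparison is inferred from the statement that P1LR holds the number of nonexistent PTEs.

```python
PAGE_SHIFT = 9                        # 512-byte pages
PROCESS_VPN_MASK = (1 << 21) - 1      # VA<29:9>, 21-bit VPN within a region


class LengthViolation(Exception):
    """Hypothetical stand-in for the length violation described in the text."""


def p1_pte_virtual_address(va: int, p1br: int, p1lr: int) -> int:
    """Return the virtual address of the P1PTE that maps P1 address va."""
    if (va >> 30) & 0x3 != 0b01:      # bit <31> = 0 and bit <30> = 1
        raise ValueError("not a P1 virtual address")
    vpn = (va >> PAGE_SHIFT) & PROCESS_VPN_MASK
    if vpn < p1lr:                    # the first P1LR PTEs do not exist
        raise LengthViolation(hex(va))
    # P1BR is where the PTE for virtual address 40000000 (hex) would be.
    return (p1br + 4 * vpn) & 0xFFFFFFFF
```

The returned address is itself a virtual address, which would then be translated by the system-space algorithm as the figure describes.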
Figure 2-18: IPR 0A (hex), P1BR and IPR 0B (hex), P1LR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    Virtual Page Address of P1PT                    | 0  0  0  0  0  0  0  0  0| :P1BR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0|             (2 ** 21) - Length of P1PT in Longwords             | :P1LR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 2-19: P1 Space Translation Algorithm

process-space virtual address:  <31> = 0, <30> = 1 | <29:9> virtual page number | <8:0> byte
      |
      |  extract VPN, check length, and add
      v
P1BR:  virtual address of P1PT base
      |
      |  yields
      v
virtual address of P1PTE:  virtual page number | byte
      |
      |  fetch using system-space translation algorithm,
      |  including length check, but without access check
      v
P1PTE:  page frame number
      |
      |  check access in current mode,
      |  sign-extend PTE<20> to PTE<22:21> if in 30-bit mode
      |  merge
      v
physical address:  <31:9> page frame number | <8:0> byte

2.6.4 Page Table Entry

If the CPU is
configured to generate 30-bit physical addresses, it interprets PTEs in the 21-bit PFN format shown in Figure 2-20. Conversely, if the CPU is configured to generate 32-bit physical addresses, it interprets PTEs in the 25-bit PFN format shown in Figure 2-21. Note that bits <24:23> of the 25-bit PFN format are ignored by the NVAX CPU chip, which implements only 32-bit physical addresses. The PTE formats shown below are described both in DEC Standard 032, and in Chapter 12.

Figure 2-20: PTE Format (21-bit PFN)

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| V|   PROT    | M| Z| OWN | S| S|                      Page Frame Number                       | :PTE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 2-21: PTE Format (25-bit PFN)

[Figure: 32-bit PTE layout with a 25-bit page frame number; field boundaries illegible in the scanned original]

Table 2-7: PTE Protection Code Access Matrix

     Code                        Current Mode
Decimal  Binary   Mnemonic    K    E    S    U     Comment

   0     0000     NA          -    -    -    -     no access
   1     0001     unpredictable                    reserved
   2     0010     KW          RW   -    -    -
   3     0011     KR          R    -    -    -
   4     0100     UW          RW   RW   RW   RW    all access
   5     0101     EW          RW   RW   -    -
   6     0110     ERKW        RW   R    -    -
   7     0111     ER          R    R    -    -
   8     1000     SW          RW   RW   RW   -
   9     1001     SREW        RW   RW   R    -
  10     1010     SRKW        RW   R    R    -
  11     1011     SR          R    R    R    -
  12     1100     URSW        RW   RW   RW   R
  13     1101     UREW        RW   RW   R    R
  14     1110     URKW        RW   R    R    R
  15     1111     UR          R    R    R    R

Access Modes
  K = Kernel
  E = Executive
  S = Supervisor
  U = User

Access Types
  R = Read
  W = Write
  - = No access

2.6.5 Translation Buffer

In order to save actual memory references when repeatedly referencing pages, the NVAX CPU Chip uses a translation
buffer to remember successful virtual address translations and page status. The translation buffer contains 96 fully associative entries. Both system and process references share these entries.

Translation buffer entries are replaced using a not-last-used (NLU) algorithm. This algorithm guarantees that the replacement pointer is not pointing at the last translation buffer entry to be used. This is accomplished by rotating the replacement pointer to the next sequential translation buffer entry if it is pointing to an entry that has just been accessed. Both D-stream and I-stream references can cause the NLU to cycle.

When the translation buffer does not contain a reference's virtual address and page status, the machine updates the translation buffer by replacing the entry that is selected by the replacement pointer.

2.7 Exceptions and Interrupts

At certain times during the operation of a system, events within the system require the execution of software routines outside the explicit flow of control of instruction execution. An exception is an event that is relevant primarily to the currently executing process and normally invokes a software routine in the context of the current process. An interrupt is an event which is usually due to some activity outside the current process and invokes a software routine outside the context of the current process.

Exceptions and interrupts are reported by constructing a frame on the stack and then dispatching to the service routine through an event-specific vector in the System Control Block (SCB). The minimum stack frame for any interrupt or exception is a PC/PSL pair as shown in Figure 2-22.
Figure 2-22: Minimum Exception Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

This minimum stack frame is used for all interrupts. Certain exceptions expand the stack frame by pushing additional parameters on the stack above the PC/PSL pair as shown in Figure 2-23.

Figure 2-23: General Exception Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                          Parameter n                                          | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                          Parameter 1                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

What parameters, if any, are pushed on the stack above the PC/PSL pair is a function of the specific exception being reported.

2.7.1 Interrupts

DEC Standard 032 defines 31 interrupt priority levels, a subset of which is implemented by the NVAX CPU. When an interrupt request is generated, the hardware compares the request with the current IPL of the CPU. If the new request is of higher priority, an internal request is generated.
At the completion of the current instruction (or at selected points during the execution of interruptible instructions), a microcode interrupt handler is invoked to process the request. With hardware assistance, the microcode handler determines the highest priority interrupt, updates the IPL, pushes a PC/PSL pair on the stack, and dispatches to a macrocode interrupt handler through the appropriate location in the SCB.

Of the 31 interrupt priority levels defined by DEC Standard 032, the NVAX CPU makes use of 24 of them, as shown in Table 2-8.

Table 2-8: Interrupt Priority Levels

IPL (hex)  IPL (decimal)  Interrupt Condition
1F         31             HALT_L asserted (non-maskable)
1E         30             PWRFL_L asserted
1D         29             H_ERR_L asserted (or internal hard error detected)
1C         28             Unused
1B         27             Performance monitoring interrupt (internally handled by microcode)
1A         26             S_ERR_L asserted (or internal soft error detected)
18-19      24-25          Unused
17         23             IRQ_L<3> asserted
16         22             IRQ_L<2> or INT_TIM_L asserted (IRQ_L<2> takes priority)
15         21             IRQ_L<1> asserted
14         20             IRQ_L<0> asserted
10-13      16-19          Unused
01-0F      01-15          Software interrupt asserted

Interrupts are discussed in more detail in Chapter 10.

2.7.1.1 Interrupt Control Registers

The interrupt system is controlled by three processor registers: the Interrupt Priority Level Register (IPL), the Software Interrupt Request Register (SIRR), and the Software Interrupt Summary Register (SISR).

A new interrupt priority level may be loaded into PSL<20:16> by writing the new value to IPL<4:0>. The IPL register is shown in Figure 2-24.
Figure 2-24: IPR 12 (hex), IPL

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|  PSL<20:16>  | :IPL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

A software interrupt may be requested by writing the desired level to SIRR<3:0>. The SIRR register is shown in Figure 2-25.

Figure 2-25: IPR 14 (hex), SIRR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|Request IPL| :SIRR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The SISR register records pending software interrupt requests at levels 01 through 0F (hex). The SISR register is shown in Figure 2-26.

Figure 2-26: IPR 15 (hex), SISR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 0| :SISR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Bit 15: IPL 15 request
Bit 14: IPL 14 request
  ...
Bit 2:  IPL 2 request
Bit 1:  IPL 1 request

2.7.2 Exceptions

The VAX architecture recognizes six classes of exceptions. Table 2-9 lists instances of exceptions in each class.
Table 2-9: Exception Classes

Exception Class                    Instances
Arithmetic traps/faults            Integer overflow trap
                                   Integer divide-by-zero trap
                                   Subscript range trap
                                   Floating overflow fault
                                   Floating divide-by-zero fault
                                   Floating underflow fault
Memory management exceptions       Access control violation fault
                                   Translation not valid fault
                                   M=0 fault
Operand reference exceptions       Reserved addressing mode fault
                                   Reserved operand fault or abort
Instruction execution exceptions   Reserved/privileged instruction fault
                                   Emulated instruction faults
                                   XFC fault
                                   Change-mode trap
                                   Breakpoint fault
                                   Vector disabled fault
Tracing exceptions                 Trace fault
System failure exceptions          Kernel-stack-not-valid abort
                                   Interrupt-stack-not-valid halt
                                   Console error halt
                                   Machine check abort

A trap is an exception that occurs at the end of the instruction that caused the exception. Therefore, the PC saved on the stack is the address of the next instruction that would normally have been executed.

A fault is an exception that occurs during an instruction and that leaves the registers and memory in a consistent state such that elimination of the fault condition and restarting the instruction will give correct results. After the instruction faults, the PC saved on the stack points to the instruction that faulted.

An abort is an exception that occurs during an instruction. An abort leaves the value of registers and memory UNPREDICTABLE such that the instruction cannot necessarily be correctly restarted, completed, simulated, or undone. In most instances, the NVAX microcode attempts to convert an abort into a fault by restoring the state that was present at the start of the instruction which caused the abort.
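As an illustration of the distinction drawn above, the following Python sketch models which PC value ends up in the stack frame for a trap and for a fault. The names ExceptionKind, saved_pc, and restart_is_safe are hypothetical and exist only for this example. Aborts are deliberately not given a saved PC here, since the text makes no restart guarantee for them.

```python
from enum import Enum


class ExceptionKind(Enum):
    TRAP = "trap"
    FAULT = "fault"
    ABORT = "abort"


def saved_pc(kind, instruction_pc, next_pc):
    """Return the PC that is saved on the stack for a trap or a fault.

    instruction_pc: address of the instruction that raised the exception.
    next_pc: address of the instruction that would normally execute next.
    """
    if kind is ExceptionKind.TRAP:
        # A trap is taken at the end of the instruction.
        return next_pc
    if kind is ExceptionKind.FAULT:
        # A fault points back at the instruction so it can be restarted.
        return instruction_pc
    raise ValueError("the saved PC of an abort is not modeled in this sketch")


def restart_is_safe(kind):
    """Only a fault leaves registers and memory consistent for a restart."""
    return kind is ExceptionKind.FAULT
```

For example, saved_pc(ExceptionKind.FAULT, 0x1000, 0x1004) returns 0x1000, while the same call with ExceptionKind.TRAP returns 0x1004.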
The following sections describe only those exceptions which are unique to the NVAX CPU, or where DEC Standard 032 is not clear about the implementation.

2.7.2.1 Arithmetic Exceptions

Arithmetic exceptions are detected during the execution of instructions that perform integer or floating point arithmetic manipulations. Whether the exception is reported as a trap or a fault is a function of the specific event. In any case, the exception is reported through SCB vector 34 (hex) with the stack frame shown in Figure 2-27. Table 2-10 lists the exceptions reported by this mechanism.

Figure 2-27: Arithmetic Exception Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                           Type Code                                           | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-10: Arithmetic Exceptions

Type Code
Decimal  Hex  Type   Exception
1        1    Trap   Integer overflow
2        2    Trap   Integer divide-by-zero
7        7    Trap   Subscript range
8        8    Fault  Floating overflow
9        9    Fault  Floating divide-by-zero
10       A    Fault  Floating underflow

2.7.2.2 Memory Management Exceptions

Memory management exceptions are detected during a memory reference and are always reported as faults. The three memory management exceptions are listed in Table 2-11. All three exceptions push the same frame on the stack, as shown in Figure 2-28. The top longword of the stack frame contains a fault parameter whose bits are described in Table 2-12.
Table 2-11: Memory Management Exceptions

SCB Vector  Exception
20 (hex)    Access control violation
24 (hex)    Translation not valid
30 (hex)    Modify fault

Figure 2-28: Memory Management Exception Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0| M| P| L| :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                           Some Virtual Address in the Faulting Page                           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-12: Memory Management Exception Fault Parameter

Bit  Mnemonic  Meaning
0    L         Length violation
1    P         PTE reference
2    M         Modify or write intent

2.7.2.3 Emulated Instruction Exceptions

The NVAX CPU implements the VAX base instruction group. For certain instructions outside that group, the NVAX microcode provides support for the macrocode emulation of instructions. There are two types of emulation exceptions, depending on whether PSL<FPD> is set at the beginning of the instruction.

If PSL<FPD>=0 at the beginning of the instruction, the exception is reported through SCB vector C8 (hex) as a trap with the stack frame shown in Figure 2-29. The longwords in the stack frame are described in Table 2-13.
Figure 2-29: Instruction Emulation Trap Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            Opcode                                             | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            Old PC                                             |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #1                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #2                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #3                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #4                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #5                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #6                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #7                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                         Specifier #8                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            New PC                                             |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-13: Instruction Emulation Trap Stack Frame

Location    Use
Opcode      Zero-extended opcode of the emulated instruction
Old PC      PC of the opcode of the emulated instruction
Specifiers  Address of the specified operand for specifiers of access type write (.wx) or address (.ax). Operand value for specifiers of access type read (.rx).
            For read-type operands whose size is smaller than a longword, the remaining bits are UNPREDICTABLE. For those instructions that don't have 8 specifiers, the remaining specifier longwords contain UNPREDICTABLE values.
New PC      PC of the instruction following the emulated instruction
PSL         PSL saved at the time of the trap

If PSL<FPD>=1 at the beginning of the instruction, the exception is reported through SCB vector CC (hex) as a fault with the stack frame shown in Figure 2-30. In this case, PC is that of the opcode of the emulated instruction.

Figure 2-30: Suspended Emulation Fault Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

2.7.2.4 Vector Unit Disabled Fault

When the NVAX CPU attempts to issue a vector instruction to the optional vector processor, it may discover that the vector unit is disabled. In this case, a vector unit disabled fault is initiated through SCB vector 68 (hex). There are no parameters for this exception (besides the usual PC/PSL pair), and the reason for the exception must be determined by reading the appropriate vector unit registers.

2.7.2.5 Machine Check Exceptions

A machine check exception is reported through SCB vector 04 (hex) when the NVAX CPU detects an error condition. The frame pushed on the stack for a machine check indicates the type of error and provides internal state information that may help identify the cause of the error. The generic machine check stack frame is shown in Figure 2-31. Machine checks are discussed at length in Chapter 15.
Figure 2-31: Generic Machine Check Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                       Byte Count of Parameters, Excluding This Longword                       | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                                                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

2.7.2.6 Console Halts

In certain microcode flows, the NVAX microcode may detect an inconsistency in internal state, a kernel-mode HALT, or a system reset. In these instances, the microcode initiates a hardware restart sequence which passes control to the console program.

When a hardware restart sequence is initiated, the NVAX microcode saves the current CPU state, partially initializes the CPU, and passes control to the console program at physical address E0040000 (hex).

During a hardware restart sequence, the stack pointer is saved in the appropriate stack pointer IPR (0 through 4), the current PC is saved in IPR 42 (SAVPC), and the current PSL, halt code, and validity flag are saved in IPR 43 (SAVPSL). The formats of SAVPC and SAVPSL are shown in Figure 2-32.

Figure 2-32: IPR 2A (hex), SAVPC and IPR 2B (hex), SAVPSL

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                           Saved PC                                            | :SAVPC
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  PSL<31:16>                   |  |  |    Halt Code    |       PSL<7:0>        | :SAVPSL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Bit 15: MAPEN<0>
Bit 14: Invalid SAVPSL if 1

Console halts are discussed in detail in Chapter 15.

2.8 System Control Block

The System Control Block (SCB) is a page containing the vectors for servicing interrupts and exceptions. The SCB is pointed to by the System Control Block Base Register (SCBB), whose format is shown in Figure 2-33. For best performance, SCBB should contain a page-aligned address. Microcode forces a longword-aligned SCBB by clearing bits <1:0> of the new value before loading the register.

NOTE
When the CPU is configured to generate 30-bit physical addresses, SCBB<31:30> are ignored.

Figure 2-33: IPR 11 (hex), SCBB

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    Physical Page Address of SCB                    |        SBZ         | 0  0| :SCBB
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

2.8.1 System Control Block Vectors

An SCB vector is an aligned longword in the SCB through which the NVAX microcode dispatches interrupts and exceptions. Each SCB vector has the format shown in Figure 2-34.
The fields of the vector are described in Table 2-14.

Figure 2-34: System Control Block Vector

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                           longword address of service routine                           |code |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-14: System Control Block Vector

Bits  Contents
31:2  Virtual address of the service routine for the interrupt or exception. The routine must be longword aligned, as the microcode forces the lower two bits of the address to 00.
1:0   Code, interpreted as follows:

      Value  Meaning
      00     The event is to be serviced on the kernel stack unless the CPU is already on the interrupt stack, in which case the event is serviced on the interrupt stack.
      01     The event is to be serviced on the interrupt stack. If the event is an exception, the IPL is raised to 1F (hex).
      10     Unimplemented, results in a console error halt.
      11     Unimplemented, results in a console error halt.

2.8.2 System Control Block Layout

The System Control Block layout is shown in Table 2-15.
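The vector format in Table 2-14 can be illustrated with a short Python sketch. It is a reading aid under stated assumptions, not a description of the microcode: the function names decode_scb_vector and service_stack are hypothetical, and the string results are labels chosen for this example.

```python
def decode_scb_vector(vector):
    """Split a 32-bit SCB vector longword into (address, code).

    The address is bits <31:2> with the low two bits forced to 00;
    the code is bits <1:0>.
    """
    vector &= 0xFFFFFFFF
    address = vector & 0xFFFFFFFC
    code = vector & 0x3
    return address, code


def service_stack(code, on_interrupt_stack):
    """Return where the event is serviced for a given vector code."""
    if code == 0b00:
        # Kernel stack, unless the CPU is already on the interrupt stack.
        return "interrupt stack" if on_interrupt_stack else "kernel stack"
    if code == 0b01:
        return "interrupt stack"
    # Codes 10 and 11 are unimplemented.
    return "console error halt"
```

For instance, decode_scb_vector(0x80001235) yields the service routine address 0x80001234 and code 1, which selects the interrupt stack.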
Table 2-15: System Control Block Layout

Vector    Name                                              Type         Param  Notes
00        passive release                                   interrupt    0      IPL is raised to request IPL
04        machine check                                     abort        6      parameters reflect machine state; must be serviced on interrupt stack
08        kernel stack not valid                            abort        0      must be serviced on interrupt stack
0C        power fail                                        interrupt    0      IPL is raised to 1E (hex)
10        reserved/privileged instruction                   fault        0
14        customer reserved instruction                     fault        0      XFC instruction
18        reserved operand                                  fault/abort  0      not always recoverable
1C        reserved addressing mode                          fault        0
20        access control violation/vector alignment fault   fault        2      parameters are virtual address, status code
24        translation not valid                             fault        2      parameters are virtual address, status code
28        trace pending                                     fault        0
2C        breakpoint instruction                            fault        0
30        unused                                                                compatibility mode in other VAXes
34        arithmetic                                        trap/fault   1      parameter is type code
38-3C     unused
40        CHMK                                              trap         1      parameter is operand word sign-extended
44        CHME                                              trap         1      parameter is operand word sign-extended
48        CHMS                                              trap         1      parameter is operand word sign-extended
4C        CHMU                                              trap         1      parameter is operand word sign-extended
50        unused
54        soft error notification                           interrupt    0      IPL is 1A (hex)
58        performance monitoring counter overflow           interrupt           Internal interrupt at IPL 1B (hex). This vector supplies the physical base address of the block of performance monitoring counts in memory. See Chapter 18 for details.
5C        unused
60        hard error notification                           interrupt    0      IPL is 1D (hex)
64        unused
68        vector unit disabled                              fault        0      vector instructions
6C-7C     unused
80        interprocessor interrupt                          interrupt    0      IPL is 16 (hex)
84        software level 1                                  interrupt    0
88        software level 2                                  interrupt    0      ordinarily used for AST delivery
8C        software level 3                                  interrupt    0      ordinarily used for process scheduling
90-BC     software levels 4-15                              interrupt    0
C0        interval timer                                    interrupt    0      IPL is 16 (hex)
C4        unused
C8        emulation start                                   fault        10     same mode exception, FPD=0; parameters are opcode, PC, specifiers
CC        emulation continue                                fault        0      same mode exception, FPD=1; no parameters
D0-F4     unused
F8        console receiver                                  interrupt    0      IPL is 15 (hex)
FC        console transmitter                               interrupt    0      IPL is 15 (hex)
100-FFFC  device vectors                                    interrupt    0      Device interrupt vectors

2.9 CPU Identification

Software may quickly determine on which CPU it is executing in a multi-processor system by reading the CPUID processor register. The format of this register is shown in Figure 2-35.

Figure 2-35: IPR 0E (hex), CPUID

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0|  CPU Identification   | :CPUID
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The CPUID processor register is implemented internally as an 8-bit read-write register. The source of the CPU ID information is system-specific, and it is the responsibility of the console firmware at powerup to determine the CPU ID from the system-specific source and write the correct value to the CPUID register.
2.10 System Identification

The System Identification Register (SID) is a read-only register which includes the system (actually the CPU) type and the microcode revision number. The format of the SID register is shown in Figure 2-36.

Figure 2-36: IPR 3E (hex), SID

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|       CPU Type        | 0  0  0  0  0  0  0  0  0  0|Patch Revision|NS|  Microcode Revision   | :SID
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 2-16: SID Field Descriptions

Name                Extent  Type  Description
Microcode Revision  7:0     RO    This field contains the microcode (chip) revision number. This number is incremented for each pass of the chip.
NS                  8       RO,0  If this bit is a zero, there is either no microcode patch loaded, or the patch is a standard patch. If this bit is a one, a non-standard microcode patch is loaded. A non-standard patch is one which goes beyond the formally released patches, such as a patch used for performance analysis. This bit is cleared on chip reset.
Patch Revision      13:9    RO,0  If this field is zero, no microcode patch is loaded. If this field is non-zero, a microcode patch is loaded and this field indicates the patch number. This field is cleared on chip reset.
CPU Type            31:24   RO    This field contains 19 (decimal), indicating that this is an NVAX CPU.

NOTE
The patch revision and non-standard patch fields (SID<13:8>) were added in pass 2 of the NVAX chip.

2.11 Process Structure

A process is a single thread of execution. The context of the current process is contained in the Process Control Block (PCB).
The PCB is pointed to by the Process Control Block Base register (PCBB), which is shown in Figure 2-37. The format of the process control block is shown in Figure 2-38. Microcode forces a longword-aligned PCBB by clearing bits <1:0> of the new value before loading the register.

NOTE
When the CPU is configured to generate 30-bit physical addresses, PCBB<31:30> are ignored.

Figure 2-37: IPR 10 (hex), PCBB

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                          Physical Longword Address of the PCB                           | 0  0| :PCBB
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Figure 2-38: Process Control Block

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              KSP                                              | :PCB
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              ESP                                              | +4
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              SSP                                              | +8
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              USP                                              | +12
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R0                                               | +16
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R1                                               | +20
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R2                                               | +24
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R3                                               | +28
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R4                                               | +32
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R5                                               | +36
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R6                                               | +40
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R7                                               | +44
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R8                                               | +48
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R9                                               | +52
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R10                                              | +56
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              R11                                              | +60
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                           AP (R12)                                            | +64
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                           FP (R13)                                            | +68
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | +72
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              | +76
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                             P0BR                                              | +80
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0  0  0  0  0| ASTLVL | 0  0|                              P0LR                               | +84
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                             P1BR                                              | +88
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|PME|0  0  0  0  0  0  0  0  0|                              P1LR                               | +92
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
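The layout in Figure 2-38 can be summarized as a table of byte offsets. The Python sketch below is illustrative only: the names PCB_FIELDS, PCB_OFFSETS, decode_astlvl_p0lr, and decode_pme_p1lr are hypothetical, and the bit positions used for ASTLVL, P0LR, PME, and P1LR are an assumption read off the bit ruler of the figure (ASTLVL in <26:24>, the length fields in <21:0>, PME in <31>).

```python
# Longwords of the PCB in ascending address order, as drawn in Figure 2-38.
PCB_FIELDS = (
    ["KSP", "ESP", "SSP", "USP"]
    + [f"R{n}" for n in range(12)]
    + ["AP", "FP", "PC", "PSL", "P0BR", "ASTLVL_P0LR", "P1BR", "PME_P1LR"]
)

# Each entry is one longword, so the byte offset is four times the index.
PCB_OFFSETS = {name: 4 * index for index, name in enumerate(PCB_FIELDS)}
PCB_SIZE = 4 * len(PCB_FIELDS)


def decode_astlvl_p0lr(longword):
    """Split the longword at PCB+84 into (ASTLVL, P0LR); positions assumed."""
    return (longword >> 24) & 0x7, longword & 0x3FFFFF


def decode_pme_p1lr(longword):
    """Split the longword at PCB+92 into (PME, P1LR); positions assumed."""
    return (longword >> 31) & 0x1, longword & 0x3FFFFF
```

With these definitions, PCB_OFFSETS["PSL"] evaluates to 76 and PCB_SIZE to 96 bytes, matching the +76 and +92 labels in the figure.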
2.12 Processor Registers

The processor registers that are implemented by the NVAX CPU chip, and those that are required of the system environment, are logically divided into five groups, as follows:

• Normal: Those IPRs that address individual registers in the NVAX CPU chip or system environment.
• Bcache tag IPRs: The read-write block of IPRs that allow direct access to the Bcache tags.
• Bcache deallocate IPRs: The write-only block of IPRs by which a Bcache block may be deallocated.
• Pcache tag IPRs: The read-write block of IPRs that allow direct access to the Pcache tags.
• Pcache data parity IPRs: The read-write block of IPRs that allow direct access to the Pcache data parity bits.

Each group of IPRs is distinguished by a particular pattern of bits in the IPR address, as shown in Figure 2-39.

Figure 2-39: IPR Address Space Decoding

(Fields are listed from bit 31 down to bit 00. The per-bit alignment of the original drawing is not recoverable from the source scan, and the first address format in the figure is not legible.)

Bcache Tag IPR Address
| SBZ | 1 | 0 | 0 | x | Bcache Tag Index | SBZ |

Bcache Deallocate IPR Address
| SBZ | 1 | 0 | 1 | x | Bcache Tag Deallocate Index | SBZ |

Pcache Tag IPR Address
| SBZ | 1 | 1 | 0 | SBZ | Pcache Tag Index | SBZ |
Callout: Pcache Set Select (0-left, 1-right)

Pcache Data Parity IPR Address
| SBZ | 1 | 1 | 1 | SBZ | Pcache Tag Index | SBZ |
Callouts: Pcache Set Select (0-left, 1-right); Subblock select

The numeric range for each of the five groups is shown in Table 2-17.

Table 2-17: IPR Address Space Decoding

IPR Group           Mnemonic (2)  IPR Address Range (hex)  Contents
Normal                            00000000..000000FF (1)   256 individual IPRs.
Bcache Tag          BCTAG         01000000..011FFFE0 (1)   64k Bcache tag IPRs, each separated by 20 (hex) from the previous one.
Bcache Deallocate   BCFLUSH       01400000..015FFFE0 (1)   64k Bcache tag deallocate IPRs, each separated by 20 (hex) from the previous one.
Pcache Tag          PCTAG         01800000..01801FE0 (1)   256 Pcache tag IPRs, 128 for each Pcache set, each separated by 20 (hex) from the previous one.
Pcache Data Parity  PCDAP         01C00000..01C01FF8 (1)   1024 Pcache data parity IPRs, 512 for each Pcache set, each separated by 8 (hex) from the previous one.

(1) Unused fields in the IPR addresses for these groups should be zero. Neither hardware nor microcode detects and faults on an address in which these bits are non-zero. Although non-contiguous address ranges are shown for these groups, the entire IPR address space maps into one of these groups. If these fields are non-zero, the operation of the CPU is UNDEFINED.
(2) The mnemonic is for the first IPR in the block.

NOTE
The address ranges shown above are those used by the programmer. When processing normal IPRs, the microcode shifts the IPR number left by 2 bits for use as an IPR command address. This positions the IPR number to bits <9:2> and modifies the address range as seen by the hardware to 0..3FC, with bits <1:0>=00. No shifting is performed for the other groups of IPR addresses.

Because of the sparse addressing used for IPRs in groups other than the normal group, valid IPR addresses are not separated by one. Rather, valid IPR addresses are separated by either 8 or 20 (hex). For example, the IPR address for Bcache tag 0 is 01000000 (hex), and the IPR address for Bcache tag 1 is 01000020 (hex). In this group, bits <4:0> of the IPR address are ignored, so IPR numbers 01000001 through 0100001F all address Bcache tag 0. Similarly, the IPR address for the first subblock of Pcache data parity is 01C00000 (hex), and the IPR address for the second subblock of Pcache data parity is 01C00008 (hex).

Processor registers in all groups except the normal group are processed entirely by the NVAX CPU chip and will never appear on the NDAL. This is also true for a number of the IPRs in the normal group. IPRs in the normal group that are not processed by the NVAX CPU chip are converted into I/O space references and passed to the system environment via a read or write command on the NDAL.

Each of the 256 possible IPRs in the normal group is of longword length, so a 1KB block of I/O space is required to convert each possible IPR to a unique I/O space longword. This block starts at address E1000000 (hex). Conversion of an IPR address to an I/O space address in this block is done by shifting the IPR address left into bits <9:2>, filling bits <1:0> with zeros, and merging in the base address of the block.
This can be expressed by the equation

    IO_ADDRESS = E1000000 + (IPR_NUMBER * 4)

The actual hardware implementation of this is different in that the IPR number is shifted left by 2 bits, and bits <31:29,24> are set. There is no multiply or add done as one might conclude from the equation.

Because many of the 256 possible IPRs in the normal group are processed entirely by the NVAX CPU chip, the corresponding I/O space location in the 1KB block is never referenced as a result of an MTPR/MFPR to or from these IPRs. However, note that a programmer can indeed reference these locations via an explicit I/O space reference with, e.g., MOVL. References to this block of I/O space locations with instructions other than MTPR/MFPR may result in UNDEFINED behavior.

The processor registers implemented by the NVAX CPU are shown in Table 2-18.

NOTE
Many of the processor registers listed in Table 2-18 are used internally by the microcode during normal operation of the CPU, and are not intended to be referenced by software except during test or diagnosis of the system. These registers are flagged with the notation "Testability and diagnostic use only; not for software use in normal operation". References by software to these registers during normal operation can cause UNDEFINED behavior of the CPU.

Table 2-18: Processor Registers

Number Register Name Mnemonic (Dec) (Hex) Type Impl
Cat Kernel Stack Pointer KSP Executive Stack Pointer ESP Supervisor Stack Pointer SSP User Stack Pointer USP 3 Interrupt Stack Pointer ISP 0 0 RW NVAX 1-1 1 1 RW 2 RW 3 RW RW NVAX NVAX NVAX NVAX 1-1 2 1I0Adcb 1-1 1-1 4 4 Reserved 5 5 3 El00001· Reserved 6 6 3 El00001~ Reserved 7 7 3 El00001C 1-1 PO Base Register POBR 8 8 RW NVAX 1-2 PO Length Register POLR 9 9 RW 1-2 PI Base Register PIBR 10 A RW PI Length Register PILR 11 B RW System Base Register SBR 12 C RW System Length Register SLR 13 D RW CPU Identification1 CPUID 14 E RW NVAX NVAX NVAX NVAX NVAX NVAX Reserved 1-2 1-2 1-2 1-2 2-1 3 El00003( 15 F Process Control Block Base PCBB 16 10 RW System Control Block Base Interrupt Priority Levell ASTLevel l SCBB 17 11 RW IPL 18 12 RW ASTLVL 19 13 RW Soft.ware Interru.pt Request Register SmR 20 14 W Soft.ware Interru.pt Summary Registerl SISR 21 15 RW Reserved 22 16 3 El000058 Reserved 23 17 3 El00005C 2-7 El000060 3-7 El000064 3-7 El000068 2-3 El00006C 2-3 El000070 2-3 El000074 2-3 El000078 2-3 El00007C 2-3 El000080 Interval Counter ControllStatus l ,2 ICCS 24 18 RW Next Interval Count NICR 25 19 W Interval Count ICR 26 1A R Time of Year Register TODR 27 IB RW Console Storage Receiver Status CSRS 28 lC RW Console Storage Receiver Data CSRD 29 ID R Console Storage 'Transmitter Status CSTS 30 IE RW Console Storage 'Transmitter Data CSTD 31 IF W Console Receiver Contro1lStatus RXCS 32 20 RW NVAX NVAX NVAX NVAX NVAX NVAX NVAX System System System System System System System System 1-1 1-1 1-1 1-1 1-1 1-1 1 Initialized on reset 2Subset or full implementation depending on ECR control hit 2-52 Architectural Summary DIGITAL CONFIDENTIAL NVAX CPU .Chip Functional Specification, Revision 1.0, February 1991 Table 2-18 (Cont.): Processor Registers Number VOAddress Register Name Mnemonic (])ec) (Bm:) Type Impl Console Receiver Data Buffer RXDB 33 21 R System 2-3 E1000084 Console Transmitter Control/Status TXCS 34 22 RW System 2-3 E1000088 Console Transmitter Data Buffer TXDB W 
System 2-3 E100008C Cat 35 23 Reserved 36 24 3 E1000090 Reserved 37 25 3 E1000094 38 26 Reserved 39 27 3 E100009C Reserved 40 28 3 E10000AO Reserved 41 29 3 E10000A4 42 2A R R Machine Check Error Register MCESR Console Saved PC SAVPC Console Saved PSL SAVPSL NVAX W :NVAX NVAX 2-1 2-1 43 2B Reserved 44 2C 3 E10000BO Reserved 45 2D 3 E10000B4 Reserved 46 2E 3 E10000B8 Reserved 47 2F 3 E10000BC Reserved 48 30 3 E10000CO Reserved 49 31 3 E10000C4 Reserved 50 32 3 E10000C8 Reserved 51 33 3 E10000CC Reserved 52 34 3 E10000DO Reserved 53 35 3 E10000D4 Reserved 54 36 3 E10000D8 IORESET 55 37 W System 2-3 MAPEN 56 38 RW Translation Buffer Invalidate All TBIA 57 39 W Translation Buffer Invalidate Single TBIS 58 3A W NVAX NVAX NVAX I/O System Reset Register Memory Management Enable 1 2-1 E10000DC 1-2 1-1 1-1 3B 3 E10000EC 60 3C 3 E10000FO PME 61 3D RW System Identification sm 62 3E R Translation Buffer Check TBCHK 63 3F W Reserved . Reserved Performance Monitor Enable 1 59 NVAX NVAX NVAX 2-1 2-1 1-1 llnitialized on reset DIGITAL CONFIDENTIAL Architectural Summary 2-53 NVAX CPU Chip Functional Specification, R.evision 1.0, F.ebruary 1991 Table 2-18 (Cont.): Processor Registers Number Register Name Mnemomc (Dec) (Hex) Type Impl IPL 14 Interrupt ACKs IPL 15 Interrupt ACKs IPL 16 Interrupt ACKs IPL 17 Interrupt ACKs Clear Write Buffers IAKl4 IAKl5 64 1AK16 66 IAKl7 CWB Reserved 67 68 69 70 71 72 73 45 46 47 48 49 3 Reserved 74 4A 3 Reserved 75 4B 3 Reserved 76 4C 8 Reserved 4D 4E 4F 50 51 52 53 3 Reserved 77 78 79 80 81 82 83 Reserved 84 54 Reserved 85 Reserved 86 Reserved 87 55 56 57 58 59 5A 5B 5C 5D 5E 5F Reserved Reserved Reserved Reserved Reserved Reserved Reserved Reserved Reserved 88 Reserved Reserved 89 90 91 92 93 Reserved 94 Reserved 95 Reserved Reserved Reserved I10Addre R System 2-3 System 2-3 R System 2-3 43 R 44 RW System 2-3 System 2-3 E1000100 E1000104 E1000108 E100010C E1000110 E1000114 E1000118 E100011C E1000120 E1000124 E1000128 E100012C E1000130 E1000134 
E1000138 E100013C E1000140 E1000144 E1000148 E100014C E1000150 E1000154 E1000158 E100015C E1000160 E1000164 E1000168 ElOOO16C E1000170 E1000174 ElOOO178 E1OOO17C 40 41 42 65 Reserved Cat R 8 3 3 8 3 3 3 3 3 3 3 8 3 3 3 3 3 3 3 3 3 8 sTestability and diagnostic use only; not for software use in normal operation 2-54 Architectural Summary DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-18 (Cont.): Processor Registers Number Register Name Mnemonic (Dec) (He:s:) Type Impl Cat IlOAddress Reserved 96 60 3 E1000180 Reserved 97 61 3 E1000184 Reserved 98 62 3 E1000188 Reserved 99 63 3 E100018C Reserved for VM 100 64 3 E1000190 Reserved for VM 101 65 3 E1000194 Reserved for VM 102 66 3 E1000198 Reserved 103 67 3 E100019C Reserved 104 68 3 E1000lAO Reserved 105 69 3 E1000lA4 Reserved 106 6A 3 E1000lA8 Reserved 107 6B 3 E1000lAC Reserved 108 6C 3 E10001BO Reserved 109 6D 3 E10001B4 Reserved 110 6E 3 E10001B8 Reserved 111 6F 3 E10001BC Reserved 112 70 3 E10001CO Reserved 113 71 3 E10001C4 Reserved 114 72 3 E10001C8 Reserved 115 73 3 E10001CC Reserved 116 74 3 E10001DO Reserved 117 75 3 E10001D4 Reserved 118 76 3 E10001D8 Reserved 119 77 3 E10001DC Reserved for Ebox 120 78 2-6 E 1000lEO Reserved for Ebox 121 79 2-6 E10001E4 Interrupt System Status Registe~ INTSYS 122 7A RW Performance Monitoring Facility Count PMFCNT. 123 7B RW Patchable Control Store Control Registers PCSeR 124 7C RW Ebox Control Register ECR 125 7D RW Mbox TB Tag FillS MTBTAG. 
126 7E W Mbox TB PTE FillS MTBPI'E 127 7F W NVAX NVAX NVAX NVAX NVAX NVAX 2-1 2-1 2-1 2-1 2-1 2-1 sTesta.bility and diagnostic use only; not for software use in normal operation DIGITAL CONFIDENTIAL Architectural Summary 2-55 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-18 (Cont.): Processor Registers Number Register Name Mnemonic (Dec) (Hex) Type Impl Cat 110 Addrel Reserved for Vectors 128 80 3 E1000230 Reserved for Vectors 129 81 3 E1000230 Reserved for Vectors 130 82 3 E1000230 Reserved for Vectors 131 83 3 E1000230 Reserved for Vectors 132 84 3 E1000230 Reserved for Vectors 133 85 3 E1000230 Reserved for Vectors 134 86 3 E1000230 Reserved for Vectors 135 87 3 E1000230 Reserved for Vectors 136 88 3 E1000230 Reserved for Vectors 137 89 3 E1000230 Reserved for Vectors 138 SA 3 E1000230 Reserved for Vectors 139 BB 3 E1000230 Reserved for Vectors 140 8e 3 E1000230 Reserved for Vectors 141 8D 3 E1000234 Reserved for Vectors 142 8E 3 E1000238 143 8F 3 E100023C Vector Processor Status Register Reserved for Vectors VPSR 144 90 RW Vector 3 E1000240 Vector Arithmetic Exception Register VAER 145 91 R Vector 3 E1000244 Vector Memory Activity Register VMAC 146 92 R Vector 3 E1000248 Vector Trans. 
Buffer Invalidate All VTBIA W Vector 147 93 3 E100024C Reserved for Vectors 148 94 3 E1000250 Reserved for Vectors 149 95 3 E1000254 Reserved for Vectors 150 96 3 E1000258 Reserved for Vectors 151 97 3 E100025C Reserved for Vectors 152 98 3 E1000260 Reserved for Vectors 153 99 3 E1000264 Reserved for Vectors 154 9A 3 E1000268 Reserved for Vectors 155 9B 3 E100026C Reserved for Vectors 156 9C 3 E1000270 Reserved for Vectors 157 9D 3 E1000274 Reserved for Vectors 158 9E 3 E1000278 Reserved for Vectors 159 9F 3 E100027C 2-56 Architectural Summary DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-18 (Cont.): Processor Registers Number Register Name Mnemonic (Dec) (Hes:) Type Impl Cat Obox Control Register CCTL NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX :t\"VAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX 2-5 Reserved for Obox 160 AO 161 A1 A2 W Bcache Data ECC BCDECC 162 RW Bcache Error Tag Status BCETSTS 163 A3 RW Bcache Error Tag Index BCETIDX 164 A4 R Bcache Error Tag BCETAG 165 AD R Bcache Error Data Status BCEDSTS 166 A6 RW Bcache Error Data Index BCEDIDX 167 A7 R Bcache Error ECC BCEDECC 168 AS R Reserved for Cbox 169 A9 Reserved for Obox 170 AA Fill Error Address CEFADR 171 AB R Fill Error Status OEFSTS 172 AO RW 173 AD 174 AE 175 AF 176 BO 177 B1 178 B2 Reserved for Obox NDAL Error Status :r-..~STS Reserved for Cbox NDAL Error Output Address NEOADR Reserved for Cbox NDAL EITOr Output Command NEOCMD 179 B8 NEDATHI 180 B4 181 B5 NEDATLO 182 B6 188 B7 184 B8 Reserved for Obox 185 B9 Reserved for Cbox 186 BA Reserved for Cbox NDAL EITOr Data High Reserved for Obox NDAL Error Data Low Reserved for Obox NDAL Error Input Command NEIOMD Reserved for Cbox 187 BB Reserved for Cbox 188 BC Reserved for Cbox 189 BD Reserved for Cbox 190 BE Reserved for Obox 191 BF DIGITAL CONFIDENTIAL RW R R R R R 110 Address 2-6 2-5 2-5 2-5 2-5 2-5 2-5 2-5 2-6 2-6 2-5 2-5 2-6 
2-5 2-6 2-5 2-6 2-5 2-6 2-5 2-6 2-5 2-6 2-5 2-6 2-6 2-6 2-6 2-6 2-6 2-6 Architectural Summary '2-57 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Table 2-18 (Cont.): Processor Registers Number Register Name Mnemonic (Dec) (Be:&:) Type Impl Cat I10Addrt Reserved 192 00 3 El000300 Reserved 193 01 3 El000304 Reserved 194 02 3 El000308 Reserved 195 03 3 El00030C Reserved 196 C4 3 El000310 Reserved 197 05 3 El000314 Reserved 198 OS 3 El000318 Reserved 199 07 3 El000310 Reserved 200 08 3 E1000320 Reserved 201 09 3 E1000324 Reserved 202 CA 3 El000328 Reserved 203 OB 3 E100032C Reserved 204 CC 3 El000330 Reserved 205 CD 3 El000334 Reserved 206 CE 3 E1000338 3 E100033C Reserved VIC :Memory Address Register VIC Tag Register VIC Data Register Ibox Control and Status Register Ibox Branch Prediction Control RegisterB VMAR VTAG VDATA ICSR BPCR Reserved for !box 207 CF 208 DO 209 D1 210 D2 211 D3 212 D4 213 D5 RW RW RW RW RW Ibox Backup pQ4 BPC 214 DS R Ibox Backup PC with RLOG Unwind' BPCUNW 215 D7 R Reserved for !box 216 D8 Reserved for !box 217 D9 Reserved for !box 218 DA Reserved for !box 219 DB Reserved for !box 220 DC Reserved for !box 221 DD Reserved for Ibox 222 DE Reserved for Ibox 223 DF NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX 2-5 2-5 2-5 2-5 2-5 2-6 2-5 2-5 2-6 2-6 2-6 2-6 2-6 2-6 2-6 2-6 sTestability and diagnostic use only; not for software use in normal operation "Chip test use only; not for software use 2~ Architectural Summary DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0t February 1991 Table 2-18 (Cont.): Processor Registers Number Register Name Mnemonic (Dec) (Hex) Type Impl Cat Mhox PO Base Registe~ MPOBR 2-5 2-5 Mbox System Length RegisterS MSLR 229 Mbox Memory Management Enable! 
MMAPEN 230 E4 E5 E6 Mbox Physical Address Mode PAMODE 231 E7 RW l\Ibox M:ME Address MMEADR 232 E8 R l\fbox M:ME PTE Address MMEPTE 233 E9 R Mbox M:ME Status MMESTS 234 EA R 235 EB 236 EC R NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX !\"VAX RW 1'4\TAX. 2-5 NVAX NVAX NVAX NVAX NVAX NVAX NVAX. NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX NVAX 2-6 224 EO RW Mhox PO Length Registe~ MPOLR 225 El RW Mhox P1 Base RegisterS MP1BR 226 E2 RW Mhox P1 Length Registers MP1LR 227 E3 RW Mhox System Base Registe~ MSBR 228 RW RW Reserved for l\fbox ~fbox TB Parity Address TBADR l\fbox TB Parity Status TBSTS 23i ED Reserved for Mbox 238 EE Reserved for Mhox 239 EF Reserved for Mhox 240 FO 241 Fl 242 244 F2 F3 F4 Reserved for Mbox Mbox Pcache Parity Address PCADR Reserved for Mbox Mhox Pcache Status 243 PCSTS Reserved for Mhox 245 F5 Reserved for Mbox 246 F6 Reserved for Mbox 247 F7 Mhox Pcache Control PCCTL 248 F8 Reserved for Mbox 249 F9 Reserved for Mhox 250 FA Reserved for Mhox 251 FB Reserved for Mbox 252 FC Reserved for Mbo%. 
253 FD Reserved for Mbox 254 FE Reserved for Mbox 255 FF RW R RW RW I/O Address 2-5 2-5 2-5 2-5 2-5 2-5 2-5 2-5 2-5 2-5 2-6 2-6 2-6 2-6 2-5 2-6 2-5 2-6 2-6 2-6 2-5 2-6 2-6 2-6 2-6 2-6 2-6 2-6

(3) Testability and diagnostic use only; not for software use in normal operation

Table 2-18 (Cont.): Processor Registers

Register Name    Number (Hex)         Cat
Unimplemented    100-00FFFFFF         3
See Table 2-17   01000000-FFFFFFFF    2

Type:
  R  = Read-only register
  RW = Read-write register
  W  = Write-only register

Impl(emented):
  NVAX   = Implemented in the NVAX CPU chip
  System = Implemented in the system environment
  Vector = Implemented in the optional vector unit or its NDAL interface

Cat(egory), class-subclass, where:

class is one of:
  1 = Implemented as per DEC standard 032
  2 = NVAX-specific implementation which is unique or different from the DEC standard 032 implementation
  3 = Not implemented internally; converted to I/O space read or write and passed to system environment

subclass is one of:
  1 = Processed as appropriate by Ebox microcode
  2 = Converted to Mbox IPR number and processed via internal IPR command
  3 = Processed by internal IPR command, then converted to I/O space read or write and passed to system environment
  4 = If virtual machine option is implemented, processed as in 1, otherwise as in 3
  5 = Processed by internal IPR command
  6 = May be block decoded; reference causes UNDEFINED behavior
  7 = Full interval timer may be implemented in the system environment. Subset ICCS is implemented in NVAX CPU chip

2.13 I/O Space Addresses

As noted above, processor registers that are not implemented on the NVAX CPU chip are converted to I/O space reads or writes.
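The address arithmetic described in Section 2.12 can be sketched in a few lines of Python. This is only an illustrative model of the documented address ranges, not the hardware or microcode implementation. The function and constant names are hypothetical; the only values taken from this specification are the block base addresses and spacings of Table 2-17 and the I/O space base E1000000 (hex).

```python
# Illustrative model of the IPR address arithmetic in Section 2.12.
# All names are hypothetical; only the numeric constants come from the text.

IO_SPACE_BASE = 0xE1000000  # start of the 1KB I/O space block for normal IPRs
BCTAG_BASE = 0x01000000     # first Bcache tag IPR (Table 2-17)
BCTAG_STRIDE = 0x20         # Bcache tag IPRs are 20 (hex) apart
PCDAP_BASE = 0x01C00000     # first Pcache data parity IPR (Table 2-17)
PCDAP_STRIDE = 0x8          # Pcache data parity IPRs are 8 (hex) apart


def normal_ipr_to_io_address(ipr_number: int) -> int:
    """Map a normal-group IPR number (0..FF hex) to its I/O space longword."""
    if not 0 <= ipr_number <= 0xFF:
        raise ValueError("normal IPR numbers are 0..FF (hex)")
    # Shift into bits <9:2>, leave bits <1:0> zero, merge in the base address.
    return IO_SPACE_BASE | (ipr_number << 2)


def bcache_tag_ipr_address(tag_index: int) -> int:
    """IPR address of the Bcache tag IPR with the given index (0..64k-1)."""
    if not 0 <= tag_index < 64 * 1024:
        raise ValueError("there are 64k Bcache tag IPRs")
    return BCTAG_BASE + tag_index * BCTAG_STRIDE


def bcache_tag_index(ipr_address: int) -> int:
    """Recover the tag index; bits <4:0> of the address are ignored."""
    last = BCTAG_BASE + (64 * 1024) * BCTAG_STRIDE - 1
    if not BCTAG_BASE <= ipr_address <= last:
        raise ValueError("address is outside the Bcache tag block")
    return (ipr_address - BCTAG_BASE) // BCTAG_STRIDE


def pcache_parity_ipr_address(n: int) -> int:
    """IPR address of the n-th Pcache data parity IPR (0..1023)."""
    if not 0 <= n < 1024:
        raise ValueError("there are 1024 Pcache data parity IPRs")
    return PCDAP_BASE + n * PCDAP_STRIDE
```

Under these assumptions, normal IPR 40 (hex) maps to I/O space address E1000100 (hex), and every IPR address from 01000000 through 0100001F (hex) resolves to Bcache tag 0.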
Most of these IPRs are optional and may be implemented or not, as dictated by the needs of the system environment. The I/O space registers that must be implemented by the system environment are shown in Table 2-19.

Table 2-19: I/O Space Registers

I/O Space Address (Hex)  Type  Definition
E0040000                 RO    Powerup boot ROM address from which the first instruction is fetched.
E1000100                 RO    Interrupt acknowledge for an IPL 14 (hex) interrupt requested via the IRQ_L<0> pin.
E1000104                 RO    Interrupt acknowledge for an IPL 15 (hex) interrupt requested via the IRQ_L<1> pin.
E1000108                 RO    Interrupt acknowledge for an IPL 16 (hex) interrupt requested via the IRQ_L<2> pin.
E100010C                 RO    Interrupt acknowledge for an IPL 17 (hex) interrupt requested via the IRQ_L<3> pin.
E1000110                 RW    Location which invokes a write buffer flush in the system environment. When this location is read, the CPU is waiting for confirmation that the flush has completed. The returned data is ignored.

2.14 Revision History

Table 2-20: Revision History

Who         When         Description of change
Mike Uhler  06-Mar-1989  Release for external review.
Mike Uhler  15-Dec-1989  Update for second-pass release.
Mike Uhler  20-Jul-1990  Update to reflect implementation.
Mike Uhler  04-Dec-1990  Update after pass 1 PG.

Chapter 3
NVAX Chip Interface

3.1 Introduction

The NVAX chip communicates through five interfaces: the NDAL (NVAX data-address lines), the backup cache interface, the interrupt lines, the clocking interface, and the test interface. This chapter begins by listing all the NVAX pins and giving a brief description of each. The rest of the chapter describes the NDAL protocol in detail.
The other interfaces are described as follows: the backup cache interface in Chapter 13, the interrupt lines in Chapter 10, the test interface in Chapter 19, and the clocking interface in Chapter 17.

The NDAL is a 64-bit pended bidirectional bus which is used by the NVAX CPU to communicate with the system environment. The NDAL cycle time is three times longer than the NVAX CPU cycle time. The NVAX CPU cycle time is targeted to 14ns, making the NDAL cycle time 42ns. Binned CPU parts may run at 10ns, resulting in an NDAL cycle time of 30ns.

The NDAL supports up to four (4) nodes with a maximum of one (1) NVAX CPU. In this spec, these four nodes are referred to as CPU (NVAX), IO1_NODE, IO2_NODE, and the memory interface.

The NVAX CPU contains a writethrough primary cache and a writeback backup cache. The NDAL is designed to support the writeback cache and cache coherency in a multiprocessor system.

NOTE
IMPORTANT INFORMATION REGARDING THE NVAX CHIP INTERFACE IS ALSO CONTAINED IN Chapter 10 (The Interrupt Section), Chapter 13 (The Cbox), Chapter 17 (Chip Clocking), AND Chapter 19 (Testability Micro-Architecture). THE READER MUST CONSULT THOSE CHAPTERS IN ORDER TO OBTAIN COMPLETE INFORMATION.

3.2 NVAX CPU pinout

The NVAX CPU chip contains the pins listed in Table 3-1. Following the table, each pin is described in more detail.
Table 3-1: NVAX CPU pinout

                                                                                 Running
Pin                  I/O(1)  Type(2)    Function                         Number  Total

NDAL SIGNALS (80 total)(3)
P%CPU_REQ_L          O       SS,1D1R    NVAX Request                     1       1
P%CPU_HOLD_L         O       SS,1D1R    NVAX Hold                        1       2
P%CPU_SUPPRESS_L     O       SS,1D1R    NVAX Suppress                    1       3
P%CPU_GRANT_L        I       SS,1D1R    NVAX Grant                       1       4
P%CPU_WB_ONLY_L      I       SS,1D1R    Writeback Only                   1       5
P%NDAL_H<63:0>       IO      T,4D4R     Data/Address Lines               64      69
P%CMD_H<3:0>         IO      T,4D4R     Command                          4       73
P%ID_H<2:0>          IO      T,4D4R     Node Identification Lines        3       76
P%PARITY_H<2:0>      IO      T,4D4R     NDAL Parity                      3       79
P%ACK_L              IO      OD,4D4R    Acknowledge                      1       80

CLOCKS (15 total)(4)
P%OSC_H              I       SS,1D1R    Oscillator, High Asserted        1       81
P%OSC_L              I       SS,1D1R    Oscillator, Low Asserted         1       82
P%OSC_TC1_H          I       SS,1D1R    Test Clock/Timeout Clock         1       83
P%OSC_TC2_H          I       SS,1D1R    Test Clock                       1       84
P%OSC_TEST_H         I       SS,1D1R    Test Clock Control               1       85
P%PHI12_OUT_H        O       SS,1D4R    NDAL PHI12, Driven               1       86
P%PHI23_OUT_H        O       SS,1D4R    NDAL PHI23, Driven               1       87
P%PHI34_OUT_H        O       SS,1D4R    NDAL PHI34, Driven               1       88
P%PHI41_OUT_H        O       SS,1D4R    NDAL PHI41, Driven               1       89
P%PHI12_IN_H         I       SS,1D4R    NDAL PHI12, Received             1       90
P%PHI23_IN_H         I       SS,1D4R    NDAL PHI23, Received             1       91
P%PHI34_IN_H         I       SS,1D4R    NDAL PHI34, Received             1       92
P%PHI41_IN_H         I       SS,1D4R    NDAL PHI41, Received             1       93
P%ASYNC_RESET_L      I       SS,1D1R    Reset Input to NVAX              1       94
P%SYS_RESET_L        O       SS,1D3R    Reset Output to System           1       95

INTERRUPT AND ERROR SIGNALS (10 total)(5)
P%MACHINE_CHECK_H    O       SS,1D1R    Machine Check                    1       96
P%IRQ_L<3:0>         I       OD,3D1R    Interrupt Request Lines          4       100
P%H_ERR_L            I       OD,3D1R    Hard (unrecoverable) Error       1       101
P%S_ERR_L            I       OD,3D1R    Soft (recoverable) Error         1       102
P%INT_TIM_L          I       SS,1D1R    Interval Timer Request           1       103
P%PWRFL_L            I       SS,1D1R    Power Fail                       1       104
P%HALT_L             I       SS,1D1R    Halt                             1       105

BACKUP CACHE SIGNALS (133 total)(6)
P%TS_INDEX_H<20:5>   O       SS,1D6R    Tag Store Index Lines            16      121
P%TS_OE_L            O       SS,1D6R    Tag Store Output Enable          1       122
P%TS_WE_L            O       SS,1D6R    Tag Store Write Enable           1       123
P%TS_TAG_H<31:17>    IO      T,7D7R     Tag Store Tag                    15      138
P%TS_ECC_H<5:0>      IO      T,7D7R     Tag Store ECC                    6       144
P%TS_OWNED_H         IO      T,7D7R     Tag Store Owned Bit              1       145
P%TS_VALID_H         IO      T,7D7R     Tag Store Valid Bit              1       146
P%DR_INDEX_H<20:3>   O       SS,1D18R   Data RAM Index Lines             18      164
P%DR_OE_L            O       SS,1D18R   Data RAM Output Enable           1       165
P%DR_WE_L            O       SS,1D18R   Data RAM Write Enable            1       166
P%DR_DATA_H<63:0>    IO      T,19D19R   Data RAM Data Lines              64      230
P%DR_ECC_H<7:0>      IO      T,19D19R   Data RAM ECC                     8       238

TEST SIGNALS (23 total)(7)
P%TEST_DATA_H        I       SS,1D1R    Test data input for microcode use.   1   239
P%TEST_STROBE_H      I       SS,1D1R    Test strobe for microcode use.       1   240
P%DISABLE_OUT_L      I       SS,1D1R    Disable NVAX Outputs             1       241
P%TEMP_H             O       SS,1D1R    NVAX Temperature Output          1       242
P%TMS_H              I       SS,1D1R    JTAG Test Mode Select            1       243
P%TCK_H              I       SS,1D1R    JTAG Test Clock                  1       244
P%TDI_H              I       SS,1D1R    JTAG Serial Test Data Input      1       245
P%TDO_H              O       SS,1D2R    JTAG Serial Test Data Output     1       246
P%PP_CMD_H<2:0>      I       SS,1D1R    Parallel Test Port Command       3       249
P%PP_DATA_H<11:0>    O       T,2D2R     Parallel Test Port Data          12      261

(1) Indicates whether the pin is an NVAX CPU Input, Output, or Input/Output pin.
(2) Single Source is denoted by SS, Tristate by T, Open Drain by OD; #D indicates the maximum number of drivers and #R indicates the maximum number of receivers expected on the board.
(3) These pins are discussed in detail in this chapter.
(4) These pins are discussed in detail in Chapter 17.
(5) These pins are discussed in detail in Chapter 10.
(6) These pins are discussed in detail in Chapter 13.
(7) These pins are discussed in detail in Chapter 19.

3.2.1 NDAL Signals and Timing

The functionality of the NDAL pins is described in detail in Section 3.3. The timing of the pins is shown in Figure 3-1, and the AC specs are given in Table 3-2.
NOTE
The timing of the NDAL signals is given relative to the NDAL clocks which are received by NVAX: P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H. NVAX drivers were designed to meet this timing, taking the NDAL clock skew into account. (NDAL clock skew is covered in Chapter 17.) NVAX expects to receive signals which have been designed taking the clock skew into account; NVAX receivers account for no clock skew.

Figure 3-1: NDAL Pin Timing Relative to the NDAL CLOCKS

[Timing diagram over one NDAL cycle (phases P1 through P4) showing P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H together with the signal groups below. Only the annotations legible in the source are listed.]

P%ID_H<2:0>, P%PARITY_H<2:0>, P%NDAL_H<63:0>, P%CMD_H<3:0>:
  As driven by NVAX CPU: driven from P%PHI12_IN_H rising edge.
  As received by NVAX CPU: latch closes with P%PHI41_IN_H rising edge (latch open during phi23).
Open-drain signal pulled low by NVAX CPU and pulled high through board pullup resistor (P%ACK_L, per Section 3.2.1.10):
  NVAX drives low with P%PHI23_IN_H rising edge and releases with P%PHI23_IN_H falling edge.
  As required by NVAX CPU: latch closes with P%PHI34_IN_H rising edge (latch open during phi12).
P%CPU_HOLD_L, P%CPU_SUPPRESS_L, P%CPU_REQ_L:
  As driven by NVAX CPU: driven with P%PHI12_IN_H rising edge.
P%CPU_WB_ONLY_L, P%CPU_GRANT_L:
  As required by NVAX CPU: latch closes with P%PHI41_IN_H rising edge (latch open during phi23).

Table 3-2: NDAL AC timing specs

Input Pin            Hold Time
P%NDAL_H<63:0>
P%CMD_H<3:0>
P%ID_H<2:0>
P%PARITY_H<2:0>
P%CPU_WB_ONLY_L
P%CPU_GRANT_L

Output Pin           Drive Time                                        Tristate Time
P%NDAL_H<63:0>
P%CMD_H<3:0>
P%ID_H<2:0>
P%PARITY_H<2:0>
                     P%PHI23_IN_H R + 1 phase (low transition),
                     P%PHI23_IN_H F + 3 phases (high transition)(3)
P%CPU_HOLD_L
P%CPU_SUPPRESS_L
P%CPU_REQ_L

(1) R means the rising edge of the clock is used; F means the falling edge of the clock is used.
(2) The 2ns hold time requirement on the NDAL is as follows: the data does not have to be actively driven for this amount of time if the driver ensures that the values will be capacitively held on the bus for 2ns past the phi4 rising edge.
(3) P%ACK_L is pulled up through a resistor in the system; the same must be done on the test load board.

3.2.1.1 P%CPU_REQ_L

NVAX asserts P%CPU_REQ_L to request the NDAL for the following cycle. P%CPU_REQ_L is a unidirectional signal from NVAX to the arbiter.

3.2.1.2 P%CPU_HOLD_L

The NVAX CPU asserts P%CPU_HOLD_L in order to drive the NDAL on consecutive cycles.

3.2.1.3 P%CPU_SUPPRESS_L

NVAX asserts P%CPU_SUPPRESS_L in order to suppress new NDAL transactions. While P%CPU_SUPPRESS_L is asserted, only fills and writebacks are allowed to proceed from non-CPU nodes.
3.2.1.4 P%CPU_GRANT_L

P%CPU_GRANT_L is asserted to notify NVAX that it must drive the NDAL during the following cycle.

3.2.1.5 P%CPU_WB_ONLY_L

When the system asserts P%CPU_WB_ONLY_L, NVAX only issues WDISOWN or NOP commands.

3.2.1.6 P%NDAL_H<63:0>

NVAX uses P%NDAL_H<63:0> to transfer address and data information to and from the system.

3.2.1.7 P%CMD_H<3:0>

The P%CMD_H<3:0> lines contain the NDAL command during any given cycle. NVAX drives and receives these lines.

3.2.1.8 P%ID_H<2:0>

NVAX drives and receives P%ID_H<2:0>, which contain the node identification number for every cycle. These lines identify which node is driving the NDAL or which node is to receive the NDAL, depending upon the current command.

3.2.1.9 P%PARITY_H<2:0>

NVAX drives and receives P%PARITY_H<2:0>, which contains parity computed over P%NDAL_H<63:0>, P%CMD_H<3:0>, and P%ID_H<2:0> during every NDAL cycle.

3.2.1.10 P%ACK_L

NVAX asserts P%ACK_L when it has received a fill data cycle. NVAX receives P%ACK_L as an acknowledgement that its outgoing cycle was successfully received. It also receives P%ACK_L for cycles which it did not drive on the NDAL, as a way of detecting inconsistent parity errors. An inconsistent parity error is where NVAX detects a parity error on the NDAL and also notices that P%ACK_L was asserted for that cycle. P%ACK_L is an open drain signal which is pulled high (deasserted) by an external resistor on the board.

3.2.2 Clocking signals

The NVAX CPU chip generates four two-phase clocks which are distributed to the system. These clocks are also distributed back to itself, which minimizes skew between NVAX and the other chips on the NDAL. Each NDAL cycle is three CPU cycles long. The clocking signals are described in detail in Chapter 17.

3.2.2.1 P%OSC_H, P%OSC_L

P%OSC_H and P%OSC_L are complementary oscillator inputs to NVAX.
They are used to generate on-chip clocks and system clocks. When P%OSC_TEST_H is deasserted, P%OSC_H and P%OSC_L are used to generate NVAX clocks.

3.2.2.2 P%OSC_TC1_H, P%OSC_TC2_H

P%OSC_TC1_H and P%OSC_TC2_H are oscillator inputs to NVAX for use during testing only. When P%OSC_TEST_H is asserted, P%OSC_TC1_H and P%OSC_TC2_H are used to generate NVAX clocks. P%OSC_TC1_H and P%OSC_TC2_H are 90 degrees out of phase with each other, and are XOR'd internally to produce an internal clock which runs at twice the speed. This allows NVAX to run at full speed while the input clocks are running at half speed.

P%OSC_TC1_H is also used as an input to the Ebox base timeout counter as an alternate clock for the timeout counter. Normally, the base counter is run from the internal NVAX clock; if the system designer wants to lengthen the timeout values used by NVAX, the base counter may be configured to run from P%OSC_TC1_H instead. P%OSC_TC1_H is synchronized to the internal NVAX clocks in order to be used for this purpose.

3.2.2.3 P%OSC_TEST_H

P%OSC_TEST_H is a control pin which determines which oscillator inputs are used by the clock generators. When P%OSC_TEST_H is deasserted, P%OSC_H and P%OSC_L are used; when P%OSC_TEST_H is asserted, P%OSC_TC1_H and P%OSC_TC2_H are used.

3.2.2.4 P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, P%PHI41_OUT_H

These two-phase overlapping clocks are driven from the NVAX chip to all nodes on the NDAL, including back to NVAX itself.

3.2.2.5 P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, P%PHI41_IN_H

These NVAX pins are used to receive the NDAL clocks, which are driven from P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, and P%PHI41_OUT_H.

3.2.2.6 P%ASYNC_RESET_L

P%ASYNC_RESET_L is an asynchronous input to NVAX which is used to generate an internal reset signal as well as P%SYS_RESET_L.
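The clock doubling described in Section 3.2.2.2 can be illustrated with a small Python model: two square waves of the same period, offset by a quarter period, are XORed sample by sample, and the result toggles twice as often as either input. This is only a sketch of the principle. The sample resolution, the ideal 50% duty cycle, and all names are assumptions made for illustration, not properties of the chip's clock generator.

```python
# Sketch of the XOR clock doubling in Section 3.2.2.2.
# The sampling model and helper names are illustrative assumptions.

SAMPLES_PER_PERIOD = 8  # model resolution; must be a multiple of 4


def square_wave(t: int, shift: int = 0) -> int:
    """Ideal square wave: 1 for the first half of each period, else 0."""
    half = SAMPLES_PER_PERIOD // 2
    return 1 if (t - shift) % SAMPLES_PER_PERIOD < half else 0


def rising_edges(samples: list) -> int:
    """Count 0 -> 1 transitions, treating the sample list as periodic."""
    return sum(
        1
        for i in range(len(samples))
        if samples[i - 1] == 0 and samples[i] == 1
    )


periods = 4
ticks = range(periods * SAMPLES_PER_PERIOD)

# Stand-ins for P%OSC_TC1_H and P%OSC_TC2_H, 90 degrees (a quarter period) apart.
tc1 = [square_wave(t) for t in ticks]
tc2 = [square_wave(t, shift=SAMPLES_PER_PERIOD // 4) for t in ticks]

# XOR of the two inputs models the doubled internal clock.
internal = [a ^ b for a, b in zip(tc1, tc2)]
```

Counting edges with `rising_edges` shows the XORed waveform rising twice per input period, which is why the test oscillators can run at half the intended internal clock rate.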
3.2.2.7 P%SYS_RESET_L

NVAX drives P%SYS_RESET_L to notify all NDAL receivers to reset. It is deasserted synchronously with the NDAL clocks.

3.2.3 Interrupt and Error Signals

The interrupt and error signals are described in detail in Chapter 10.

3.2.3.1 P%MACHINE_CHECK_H

The assertion of P%MACHINE_CHECK_H indicates that the CPU is in a machine check sequence. This signal may be wired to an LED on the board. (The pin is not able to drive the LED directly.) It will flicker during a normal machine check. If the CPU never comes out of machine check, the LED will stay lit and indicates to Field Service that the board needs to be replaced.

3.2.3.2 P%IRQ_L<3:0>

The P%IRQ_L<3:0> lines provide a general-purpose interrupt request facility to interrupt the NVAX CPU. These four external interrupt request lines correspond to interrupt requests at IPLs 17, 16, 15, and 14 (hex). P%IRQ_L<3> corresponds to IPL 17, P%IRQ_L<2> corresponds to IPL 16, P%IRQ_L<1> corresponds to IPL 15, and P%IRQ_L<0> corresponds to IPL 14. These lines are level-sensitive, NOT edge sensitive. Once a node asserts its interrupt line, it should keep it asserted until NVAX services the request. P%IRQ_L<3:0> are asynchronous inputs to NVAX and are not expected to operate with any fixed relationship to the NDAL timing.

3.2.3.3 P%H_ERR_L

P%H_ERR_L is used to notify NVAX of an error condition in the system which has corrupted machine state. These errors usually cannot be corrected by any retry mechanism. If at all possible, NDAL errors should be reported using the transaction level error reporting mechanisms (not asserting P%ACK_L or using the Read Data Error command). If this is not possible, P%H_ERR_L or P%S_ERR_L may be used. When P%H_ERR_L is asserted, NVAX will take a Hard Error Interrupt at IPL 1D (hex).
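The pin-to-IPL assignment of Section 3.2.3.2, together with the interrupt acknowledge locations listed in Table 2-19, can be summarized in a short Python sketch. The helper names are hypothetical, and the treatment of the pins as a 4-bit value in which a 0 bit means "asserted" is an assumption based only on the low-asserted `_L` naming of the pins.

```python
# Illustrative summary of P%IRQ_L<3:0> (Section 3.2.3.2) and the interrupt
# acknowledge locations of Table 2-19. All names are hypothetical.

IAK_BASE = 0xE1000100  # I/O space address of the IPL 14 (hex) acknowledge


def irq_line_ipl(line: int) -> int:
    """IPL (14..17 hex) requested by P%IRQ_L<line>, for line 0..3."""
    if line not in range(4):
        raise ValueError("P%IRQ_L has lines <3:0>")
    return 0x14 + line


def irq_line_iak_address(line: int) -> int:
    """I/O space longword that acknowledges an interrupt on P%IRQ_L<line>."""
    if line not in range(4):
        raise ValueError("P%IRQ_L has lines <3:0>")
    return IAK_BASE + 4 * line


def asserted_lines(irq_l_pins: int) -> list:
    """Lines currently requesting, assuming a 0 bit means asserted (low)."""
    return [n for n in range(4) if not (irq_l_pins >> n) & 1]
```

For instance, `irq_line_ipl(3)` gives 17 (hex) and `irq_line_iak_address(3)` gives E100010C (hex), matching the last interrupt acknowledge entry in Table 2-19.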
P%H_ERR_L is an asynchronous input to NVAX and is not expected to operate with any fixed relationship to the NDAL timing.

3.2.3.4 P%S_ERR_L

The assertion of P%S_ERR_L indicates that an error which did not affect instruction execution has been detected in the system environment. For example, if an NDAL node uses BADWDATA because of an uncorrectable error in its cache, it would also assert P%S_ERR_L to notify NVAX of the event. When it recognizes the assertion of P%S_ERR_L, NVAX takes a Soft Error Interrupt at IPL 1A (hex).

P%S_ERR_L is an asynchronous input to NVAX and is not expected to operate with any fixed relationship to the NDAL timing.

3.2.3.5 P%INT_TIM_L

The assertion of P%INT_TIM_L indicates that the interval timer period has expired.

P%INT_TIM_L is an asynchronous input to NVAX and is not expected to operate with any fixed relationship to the NDAL timing.

3.2.3.6 P%PWRFL_L

The assertion of P%PWRFL_L informs the CPU of an impending power failure.

P%PWRFL_L is an asynchronous input to NVAX and is not expected to operate with any fixed relationship to the NDAL timing.

3.2.3.7 P%HALT_L

The assertion of P%HALT_L causes the CPU to enter the console at IPL 1F (hex) at the next macroinstruction boundary.

P%HALT_L is an asynchronous input to NVAX and is not expected to operate with any fixed relationship to the NDAL timing.

3.2.4 Cache Interface Signals

These pins are described in detail in Chapter 13. The timing of the pins is shown in Figure 3-2.

NOTE
The timing of the Bcache interface signals is given relative to the INTERNAL NVAX clocks.

3.2.4.1 P%TS_INDEX_H<20:5>

P%TS_INDEX_H<20:5> drive the address lines of the backup cache tag RAMs, thus indexing into one row of the tag store.

3.2.4.2 P%TS_OE_L

This pin is connected to the output enable pins of the backup cache tag store RAMs.
When NVAX asserts P%TS_OE_L, the RAMs are enabled to drive P%TS_TAG_H<31:17>, P%TS_VALID_H, P%TS_OWNED_H, and P%TS_ECC_H<5:0>.

3.2.4.3 P%TS_WE_L

This pin is connected to the write enable pins of the backup cache tag store RAMs. When NVAX asserts P%TS_WE_L, the RAMs are enabled to write the information on P%TS_TAG_H<31:17>, P%TS_VALID_H, P%TS_OWNED_H, and P%TS_ECC_H<5:0>, which NVAX drives when P%TS_WE_L is asserted.

Figure 3-2: Bcache Pin Timing Relative to INTERNAL NVAX Clocks (14ns system)

[Timing diagram not recoverable from the source. It shows, over two NVAX cycles, the tag store pins (P%TS_OE_L, P%TS_WE_L, P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, P%TS_OWNED_H, P%TS_VALID_H) and the data RAM pins (P%DR_OE_L, P%DR_WE_L, P%DR_DATA_H<63:0>, P%DR_ECC_H<7:0>), both as driven by NVAX for writes and as required by NVAX for reads.]

NOTE: All drive times are shown as simulated in the XNP board environment with typical (14ns) NVAX parts; drive times in other environments and with non-typical NVAX parts will differ. The diagram assumes a 14-ns NVAX cycle.

3.2.4.4 P%TS_TAG_H<31:17>

P%TS_TAG_H<31:17> carry the tag which is written to and read from the backup cache tag store. Each of these pads is built with an internal resistor so that if the tag bit is not used in a particular system, the pin value as seen by the Cbox is 0. For example, a machine which runs only in 30-bit mode does not need to connect P%TS_TAG_H<31:29> to the backup cache.

3.2.4.5 P%TS_ECC_H<5:0>

P%TS_ECC_H<5:0> carry the error correcting code which is written to and read from the backup cache tag store.

3.2.4.6 P%TS_OWNED_H

P%TS_OWNED_H carries the OWNED bit which is written to and read from the backup cache tag store.

3.2.4.7 P%TS_VALID_H

P%TS_VALID_H carries the VALID bit which is written to and read from the backup cache tag store.

3.2.4.8 P%DR_INDEX_H<20:3>

P%DR_INDEX_H<20:3> drive the address lines of the backup cache data RAMs, thus indexing into one row (one quadword) of the cache.

3.2.4.9 P%DR_OE_L

This pin is connected to the output enable pins of the backup cache data RAMs. When NVAX asserts P%DR_OE_L, the RAMs are enabled to drive P%DR_DATA_H<63:0> and P%DR_ECC_H<7:0>.
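The bit ranges on the tag store and data RAM pins can be illustrated by slicing a physical address. The sketch below assumes that each pin bit number corresponds to the same-numbered physical address bit; the helper names `bits` and `bcache_fields` are hypothetical, and the actual Cbox behavior is specified in Chapter 13.

```python
def bits(value, hi, lo):
    """Return value<hi:lo>, an inclusive bit range in DEC notation."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def bcache_fields(pa):
    """Split a 32-bit physical address into the fields named by the pin ranges.

    Assumption: pin bit n carries physical address bit n.
    """
    return {
        "ts_tag": bits(pa, 31, 17),    # as on P%TS_TAG_H<31:17>
        "ts_index": bits(pa, 20, 5),   # as on P%TS_INDEX_H<20:5>
        "dr_index": bits(pa, 20, 3),   # as on P%DR_INDEX_H<20:3>
    }


fields = bcache_fields(0x00123458)
```

Under this assumption the data RAM index carries two more low-order bits than the tag store index, so one tag store row covers four consecutive quadwords (one hexaword) of data RAM.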
3.2.4.10 P%DR_WE_L

This pin is connected to the write enable pins of the backup cache data RAMs. When NVAX asserts P%DR_WE_L, the RAMs are enabled to write the information on P%DR_DATA_H<63:0> and P%DR_ECC_H<7:0>, which NVAX drives when P%DR_WE_L is asserted.

3.2.4.11 P%DR_DATA_H<63:0>

P%DR_DATA_H<63:0> carry the cache data which is written to and read from the backup cache.

3.2.4.12 P%DR_ECC_H<7:0>

P%DR_ECC_H<7:0> carry the error correcting code which is written to and read from the backup cache data RAMs.

3.2.5 Test Pins

These pins are covered in more detail in Chapter 19.

3.2.5.1 P%TEST_DATA_H

P%TEST_DATA_H is an asynchronous input pin which may be used by microcode. It is pulled high internally so that if it is not used, it does not have to be connected on the board.

3.2.5.2 P%TEST_STROBE_H

P%TEST_STROBE_H is an asynchronous input pin which may be used by microcode. It is pulled high internally so that if it is not used, it does not have to be connected on the board.

3.2.5.3 P%DISABLE_OUT_L

When P%DISABLE_OUT_L is asserted, NVAX does not drive any of its Input/Output or Output pins, including the NDAL clock outputs (P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, and P%PHI41_OUT_H). This functionality is used only during test.

3.2.5.4 P%TEMP_H

P%TEMP_H is an output pin to be used in test to determine when the NVAX CPU chip is at thermal equilibrium. The voltage on this pin will vary between VDD_I and VSS_I, depending on chip temperature, but the temperature-to-voltage transfer function will not be specified. As the chip heats up, the voltage on the pin will fall, and once the chip is at thermal equilibrium the voltage will remain at some value below VDD_I. This voltage will be monitored by the tester, and testing will commence only when the voltage stops changing, indicating that the chip is at thermal equilibrium.
3.2.5.5 P%TMS_H

P%TMS_H is the JTAG test mode select input. It is pulled high by an on-chip resistor when it is not being driven externally.

3.2.5.6 P%TCK_H

P%TCK_H is the JTAG test clock. It is pulled low by an on-chip resistor when it is not being driven externally.

3.2.5.7 P%TDI_H

P%TDI_H is the JTAG serial test data input. It is pulled high by an on-chip resistor when it is not being driven externally.

3.2.5.8 P%TDO_H

P%TDO_H is the JTAG serial test data output.

3.2.5.9 P%PP_CMD_H<2:0>

P%PP_CMD_H<2:0> provides the NVAX parallel port with a command indicating the current function of the parallel port.

3.2.5.10 P%PP_DATA_H<11:0>

P%PP_DATA_H<11:0> are output pins for reading test data from NVAX.

3.3 The NDAL

The NDAL is a 64-bit, limited-length, pended, synchronous bus with centralized arbitration. Several transactions can be in progress at a given time, allowing highly efficient use of bus bandwidth. Arbitration and data transfers occur simultaneously. The bus uses multiplexed data and address lines.

The NDAL supports quadword, octaword, and hexaword reads and writes to memory and I/O space.

The NDAL supports up to four (4) nodes with a maximum of one (1) NVAX CPU. In this spec, these four nodes are referred to as CPU (NVAX), IO1_NODE, IO2_NODE, and the memory interface.

Thirty nanoseconds is the minimum NDAL cycle time being considered for a binned CPU. Operating at 30ns, the NDAL has a raw bandwidth of 267 Mbytes/second. At 42ns, the NDAL has a raw bandwidth of 190 Mbytes/second. The usable bandwidth, which depends on transaction length, is shown in Table 3-3 and Table 3-4.
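The raw figures quoted above follow from one quadword (8 bytes) moving per NDAL cycle. The usable figures in Table 3-3 and Table 3-4 are reproduced by the sketch below if each transaction is charged one non-data cycle in addition to its data cycles. That one-cycle overhead is an inference from the tabulated numbers, not a statement taken from this specification, and the function names are hypothetical.

```python
QUADWORD_BYTES = 8  # the NDAL carries 64 bits of data per cycle


def raw_bandwidth_mbytes(cycle_ns):
    """Raw bandwidth in Mbytes/second (10**6 bytes) for a given NDAL cycle time."""
    return QUADWORD_BYTES / cycle_ns * 1000.0


def usable_bandwidth_mbytes(data_bytes, cycle_ns):
    """Usable bandwidth, assuming one overhead cycle plus one cycle per quadword."""
    data_cycles = data_bytes // QUADWORD_BYTES
    total_cycles = 1 + data_cycles  # assumed: one command/address cycle
    return data_bytes / (total_cycles * cycle_ns) * 1000.0


for cycle_ns in (30, 42):
    print(f"{cycle_ns}ns raw: {raw_bandwidth_mbytes(cycle_ns):.0f} Mbytes/sec")
    for name, size in (("Quadword", 8), ("Octaword", 16), ("Hexaword", 32)):
        rate = usable_bandwidth_mbytes(size, cycle_ns)
        print(f"  {name}: {rate:.0f} Mbytes/sec")
```

Longer transactions amortize the overhead cycle over more data cycles, which is why hexaword transfers come closest to the raw bandwidth.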
Table 3-3: NVAX DAL Bandwidth at 30ns

Operation         Bandwidth
Quadword Read     133.0 Mbytes/sec
Octaword Read     178.0 Mbytes/sec
Hexaword Read     213.0 Mbytes/sec
Quadword Write    133.0 Mbytes/sec
Octaword Write    178.0 Mbytes/sec
Hexaword Write    213.0 Mbytes/sec

Table 3-4: NVAX DAL Bandwidth at 42ns

Operation         Bandwidth
Quadword Read      95.0 Mbytes/sec
Octaword Read     127.0 Mbytes/sec
Hexaword Read     152.0 Mbytes/sec
Quadword Write     95.0 Mbytes/sec
Octaword Write    127.0 Mbytes/sec
Hexaword Write    152.0 Mbytes/sec

Table 3-5 details each NDAL signal. Where All is indicated for Drivers and Receivers, all four possible NDAL nodes drive or receive the signal.

Table 3-5: NDAL Signals

Signal              Type(1)  Drivers    Receivers     Function

Arbitration signals
P%CPU_REQ_L         SS       NVAX       Arbiter       NVAX requests the bus.
IO1_REQ_L           SS       IO1_NODE   Arbiter       IO1_NODE requests the bus.
IO2_REQ_L           SS       IO2_NODE   Arbiter       IO2_NODE requests the bus.
P%CPU_HOLD_L        SS       NVAX       Arbiter       Extends P%CPU_GRANT_L.
IO1_HOLD_L          SS       IO1_NODE   Arbiter       Extends IO1_GRANT_L.
IO2_HOLD_L          SS       IO2_NODE   Arbiter       Extends IO2_GRANT_L.
P%CPU_GRANT_L       SS       Arbiter    NVAX          Grants NVAX the bus.
IO1_GRANT_L         SS       Arbiter    IO1_NODE      Grants IO1_NODE the bus.
IO2_GRANT_L         SS       Arbiter    IO2_NODE      Grants IO2_NODE the bus.
P%CPU_SUPPRESS_L    SS       NVAX       Arbiter       Suppresses all but writebacks and fills.
P%CPU_WB_ONLY_L     SS       Arbiter    NVAX          Limits NVAX to doing only Disown Writes or NOPs.
IO1_SUPPRESS_L      SS       IO1_NODE   Arbiter       Suppresses all but writebacks and fills.
IO1_WB_ONLY_L       SS       Arbiter    IO1_NODE      IO1_NODE may only do Disown Writes and fills.
IO2_SUPPRESS_L      SS       IO2_NODE   Arbiter       Suppresses all but writebacks and fills.
IO2_WB_ONLY_L       SS       Arbiter    IO2_NODE      IO2_NODE may only do Disown Writes and fills.

Data, address, and command signals
P%NDAL_H<63:0>      T        All        All           Multiplexed data and address lines.
P%CMD_H<3:0>        T        All        All           Command being performed this cycle.
P%ID_H<2:0>         T        All        All           Commander identification for the transaction.
P%PARITY_H<2:0>     T        All        All           Parity for P%NDAL_H, P%CMD_H, and P%ID_H.
P%ACK_L             OD       All        All           NDAL acknowledgement of receipt.

Clock signals
P%SYS_RESET_L       SS       NVAX       All but NVAX  Resets all nodes.
PHI12_H             SS       NVAX       All           PHI12 clock for all bus residents.
PHI23_H             SS       NVAX       All           PHI23 clock for all bus residents.
PHI34_H             SS       NVAX       All           PHI34 clock for all bus residents.
PHI41_H             SS       NVAX       All           PHI41 clock for all bus residents.

(1) Indicates whether the pin is Single Source (SS), Tristate (T), or Open Drain (OD).

3.3.1 Terms

In order to clearly describe the transactions which occur on the NDAL, the following terms are used:

• Node - A node is a hardware device that connects to the NDAL. The largest NDAL system configuration will support 4 nodes.
• Transfer - A transfer is the smallest quantum of work that occurs on the NDAL. Typical examples of transfers are the address cycle of a read, the address cycle of a write, and each data cycle of a write.
• Transaction - A transaction is composed of one or more transfers. Transaction is the name given to the logical task being performed (e.g., read); in the case of the read specifically, the transaction consists of a command transfer followed some time later by a return data transfer. See Commander, Responder, Transmitter, and Receiver below.
• Commander - The commander is the node that initiated the transaction in progress. In any write transaction, the commander is the node that requested the write; for reads, the commander is the one who requested the data. The distinction of being the commander in a transaction holds for the duration of the transaction in spite of the fact that in some cases it might appear that the commander changes. A case in point is where the commander initiates a read transaction.
It is the responder (data source) that initiates the return data transfer, but the node that requested the data is still the commander.
• Responder - The responder is the complement to the commander in a transaction.
• Transmitter - The transmitter during an NDAL cycle is the node that is driving the information on the NDAL. Using the read transaction as an example, the commander is the transmitter during the command transfer; during the return data transfer the commander is the receiver.
• Receiver - The receiver receives the data being moved during a transfer.
• Naturally Aligned - Refers to a data quantity whose address could be specified as an offset, from the beginning of memory, of an integral number of data elements of the same size. The lower address bits of a piece of naturally aligned data are zero.
• ETM - Error Transition Mode. The backup cache enters Error Transition Mode when an error occurs. While in ETM, the state of the backup cache is preserved as much as possible. It continues to service requests to blocks which it owns, since those contain the only valid copy of data in the system. ETM is described completely in Chapter 13.
• Address cycle - The cycle during which the address of the transaction is transmitted on the NDAL. This is the first cycle of a read or write.
• Data cycle - A cycle during which the NDAL transfers data. These include data cycles of a write and fill data cycles.
• Read Data Return - This is the command used during a cycle in which a responder is returning read data to a commander. These cycles are also referred to as fills.

3.3.2 NDAL Clocking

The NDAL is a four-phase bus. NVAX drives four two-phase overlapping clocks to the other chips on the NDAL as well as back to itself, as shown in Table 3-6.
Table 3-6: NDAL clocks

NVAX output pin    NDAL clock    NVAX input pin
P%PHI12_OUT_H      PHI12_H       P%PHI12_IN_H
P%PHI23_OUT_H      PHI23_H       P%PHI23_IN_H
P%PHI34_OUT_H      PHI34_H       P%PHI34_IN_H
P%PHI41_OUT_H      PHI41_H       P%PHI41_IN_H

See Chapter 17 for more details.

3.3.3 NDAL Arbitration

The NDAL protocol can architecturally support up to 4 nodes, which consist of one NVAX CPU and three interfaces to memory or I/O. This spec assumes one interface to memory and two interfaces to I/O. The I/O interfaces are referred to as IO1_NODE and IO2_NODE. The non-CPU nodes may or may not contain caches.

At a given time, any or all of the nodes may desire the use of the NDAL. Arbitration cycles occur in parallel with data transfer cycles using a set of lines dedicated specifically for arbitration.

Figure 3-3 shows the connection of the arbitration signals on the fully-configured NDAL. This arbitration scheme assumes that the arbiter is built into the memory interface. If the arbiter were built as a separate chip, the memory interface would need its own request, hold, grant, suppress, and wb_only lines. When the arbiter is built into the memory interface, the memory interface can withhold grant if its input queues are filling up.

Figure 3-3: NDAL Arbitration Block Diagram

[Block diagram: the arbiter, built into the memory interface, connects separately to NVAX, IO1_NODE, and IO2_NODE. Each node drives its REQ_L, HOLD_L, and SUPPRESS_L lines to the arbiter and receives its GRANT_L and WB_ONLY_L lines from the arbiter.]

The following sections describe the NDAL arbitration signals.

3.3.3.1 NDAL Arbitration Signals

3.3.3.1.1 P%CPU_REQ_L

NVAX asserts P%CPU_REQ_L to request the NDAL for the following cycle. P%CPU_REQ_L is a unidirectional signal from NVAX to the arbiter.

3.3.3.1.2 IO1_REQ_L

IO1_NODE, an interface node, asserts IO1_REQ_L when it wants to drive the NDAL. IO1_REQ_L is a unidirectional signal from IO1_NODE to the arbiter.

3.3.3.1.3 IO2_REQ_L

IO2_NODE, an interface node, asserts IO2_REQ_L when it wants to drive the NDAL. IO2_REQ_L is a unidirectional signal from IO2_NODE to the arbiter.

3.3.3.1.4 P%CPU_HOLD_L

The NVAX CPU asserts P%CPU_HOLD_L in order to gain access to the NDAL for consecutive cycles. The NVAX CPU asserts P%CPU_HOLD_L only while P%CPU_GRANT_L is asserted. Assertion of P%CPU_HOLD_L guarantees that NVAX may retain ownership of the NDAL in the next cycle, independent of the value of any other outstanding requests. The arbiter must grant the bus to the CPU if the CPU asserts P%CPU_HOLD_L.

P%CPU_HOLD_L is used for multicycle transfers, allowing NVAX to acquire consecutive cycles. NVAX asserts P%CPU_HOLD_L for hexaword Disown Write transactions, in order to transfer the four quadwords of data consecutively and directly after the address cycle; and for quadword Write or Disown Write transactions, in order to transfer the one quadword of data directly after the address cycle. NVAX never asserts P%CPU_HOLD_L for more than four contiguous cycles.
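The four-cycle limit on P%CPU_HOLD_L is consistent with the write lengths described above. The sketch below counts bus and hold cycles under one assumption that this section does not state explicitly: a node asserts HOLD in every cycle of a write except the last, so that each assertion retains the bus for one more cycle. The function name and constants are hypothetical.

```python
QUADWORD_BYTES = 8
MAX_HOLD_CYCLES = 4  # NVAX never holds for more than four contiguous cycles


def write_cycles(data_bytes):
    """Return (bus_cycles, hold_cycles) for a write of the given length.

    Assumed model: one address cycle followed by one data cycle per quadword,
    with HOLD asserted in every cycle except the final one.
    """
    data_cycles = data_bytes // QUADWORD_BYTES
    bus_cycles = 1 + data_cycles
    hold_cycles = bus_cycles - 1
    if hold_cycles > MAX_HOLD_CYCLES:
        raise ValueError("transfer would exceed the contiguous HOLD limit")
    return bus_cycles, hold_cycles


hexaword = write_cycles(32)  # address cycle + four data cycles
quadword = write_cycles(8)   # address cycle + one data cycle
```

Under this model a hexaword Disown Write uses exactly four hold cycles, the stated maximum.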
3.3.3.1.5 IO1_HOLD_L

IO1_HOLD_L is analogous to P%CPU_HOLD_L. It performs HOLD functionality for IO1_NODE. IO1_NODE may not assert IO1_HOLD_L unless IO1_GRANT_L is asserted during the current NDAL cycle. Assertion of IO1_HOLD_L guarantees that IO1_NODE may retain ownership of the NDAL in the next cycle, independent of the value of any other outstanding requests. The arbiter must grant the bus to IO1_NODE if it asserts IO1_HOLD_L.

IO1_HOLD_L is used for multicycle transfers, allowing IO1_NODE to acquire consecutive cycles. In a hexaword write transaction, for instance, IO1_NODE asserts IO1_HOLD_L in order to transfer the four quadwords of data consecutively. IO1_HOLD_L may also be used to transfer fill data in consecutive cycles. IO1_HOLD_L may be asserted for a maximum of four contiguous cycles.

3.3.3.1.6 IO2_HOLD_L

IO2_HOLD_L is analogous to IO1_HOLD_L. It performs HOLD functionality for IO2_NODE.

3.3.3.1.7 P%CPU_SUPPRESS_L

NVAX asserts P%CPU_SUPPRESS_L in order to suppress new NDAL transactions which NVAX treats as cache coherency requests. It does this when its two-entry cache coherency queue (the NDAL_IN_QUEUE) is in danger of overflowing. During the cycle when P%CPU_SUPPRESS_L is asserted, NVAX will accept a new transaction. NVAX requires transactions in the following cycle to be suppressed.

While P%CPU_SUPPRESS_L is asserted, only fills and writebacks are allowed to proceed from non-CPU nodes. The CPU may continue to put all transactions onto the bus (as long as P%CPU_WB_ONLY_L is not asserted). Because the NDAL_IN_QUEUE is full and takes the highest priority within the Cbox, NVAX is mostly working on cache coherency transactions while P%CPU_SUPPRESS_L is asserted, which may cause NVAX to issue WDISOWNs on the NDAL. However, NVAX may and does issue any type of transaction while P%CPU_SUPPRESS_L is asserted.
3.3.3.1.8 IO1_SUPPRESS_L

IO1_NODE can suppress new transactions on the NDAL by asserting IO1_SUPPRESS_L. Fills and writebacks will proceed as usual.

3.3.3.1.9 IO2_SUPPRESS_L

IO2_NODE can suppress new transactions on the NDAL by asserting IO2_SUPPRESS_L. Fills and writebacks will proceed as usual.

3.3.3.1.10 P%CPU_GRANT_L

P%CPU_GRANT_L is asserted to notify NVAX that it must drive the NDAL during the following cycle. When P%CPU_GRANT_L is asserted, NVAX must drive the bus with a valid command and correct parity. If NVAX did not request the NDAL, it drives the bus with a NOP. It only drives a non-NOP command if it actually requested the NDAL in the previous cycle. If NVAX asserts P%CPU_HOLD_L, P%CPU_GRANT_L must be asserted in the next cycle.

3.3.3.1.11 IO1_GRANT_L

The arbiter asserts IO1_GRANT_L when IO1_NODE is permitted to drive the bus. When IO1_GRANT_L is asserted, IO1_NODE must drive the bus with a valid command and correct parity. If IO1_HOLD_L is asserted, IO1_GRANT_L must be asserted in the next cycle.

3.3.3.1.12 IO2_GRANT_L

IO2_GRANT_L is analogous to IO1_GRANT_L. It grants the bus to IO2_NODE.

3.3.3.1.13 P%CPU_WB_ONLY_L

When P%CPU_WB_ONLY_L is asserted, NVAX will only issue Write Disown or NOP commands, including Write Disowns due to Write Unlocks when the cache is off or in ETM. Otherwise, NVAX will not issue any new requests. During the cycle in which P%CPU_WB_ONLY_L is asserted, the system must be prepared to accept one more non-writeback command from the CPU. Starting with the cycle following the assertion of P%CPU_WB_ONLY_L, NVAX will only issue writeback commands.

3.3.3.1.14 IO1_WB_ONLY_L

IO1_WB_ONLY_L is driven by the arbiter and received by IO1_NODE. When IO1_WB_ONLY_L is asserted, IO1_NODE only arbitrates for the bus in order to return fills or disown writes. It does not initiate any new transactions.
3.3.3.1.15 IO2_WB_ONLY_L

IO2_WB_ONLY_L is driven by the arbiter and received by IO2_NODE. When IO2_WB_ONLY_L is asserted, IO2_NODE only arbitrates for the bus in order to return fills or disown writes. It does not initiate any new transactions.

3.3.3.2 NDAL Arbitration Timing

The timing for NDAL arbitration is shown in Figure 3-4. There are several critical spots to note in the diagram. The arbiter receives the request lines by the end of P1. It must drive the grant lines to valid values by the end of P3. It has two phases to calculate arbitration and to drive the grant lines across the board. In the fastest system (10ns NVAX), the arbiter has 15ns after receiving the request lines to arbitrate and to drive the grant lines. Board simulations for one system show that driving the grant lines will take about half that time.

From the time a bus driver receives its grant line, it has three phases to drive P%NDAL_H<63:0>, P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0> to valid levels. From the time the NDAL is valid on its pins, the receiver has four phases to compute parity and to assert P%ACK_L.

[Figure 3-4, the NDAL arbitration timing diagram, is not recoverable from the source. Legible signal names: NDAL_H, CPU_REQ_L, CPU_HOLD_L, CPU_GRANT_L, IO1_REQ_L, IO1_GRANT_L, ACK_L, PHI12, PHI23, PHI34, PHI41. Legible annotations: CPU request to do write; bus granted to CPU; drive write address; write address cycle ACKed; CPU holds bus to drive write data; grant held because of CPU_HOLD_L; bus not granted to IO_NODE1; drive write data; null data cycle; NULL cycle not ACKed.]

3.3.3.3 NDAL Suppress and Its Timing

When any node asserts its suppress line, no transactions other than writebacks or fills may be driven onto the bus, starting in the following cycle. For example, when P%CPU_SUPPRESS_L is asserted, the arbiter can accomplish this in the following way: if P%CPU_SUPPRESS_L is asserted during cycle 0, the arbiter does not grant the bus to any node, with the possible exception of the CPU, in cycle 0. At the same time it asserts IO1_WB_ONLY_L and IO2_WB_ONLY_L. In cycle 1, the arbiter continues to perform bus arbitration as it normally would, but now IO1_NODE and IO2_NODE recognize the assertion of their respective WB_ONLY lines, and they do not request the bus except for fills and writebacks.

From this, it may be seen that the assertion of P%CPU_SUPPRESS_L causes the arbiter to assert IO1_WB_ONLY_L and IO2_WB_ONLY_L; the assertion of IO1_SUPPRESS_L causes the arbiter to assert P%CPU_WB_ONLY_L and IO2_WB_ONLY_L; and the assertion of IO2_SUPPRESS_L causes the arbiter to assert P%CPU_WB_ONLY_L and IO1_WB_ONLY_L.

The timing for suppression of the bus is shown in Figure 3-5. In this example, the CPU suppresses the bus by asserting P%CPU_SUPPRESS_L, which is valid at the end of P1 in NDAL cycle 0. The arbiter immediately asserts IO1_WB_ONLY_L and IO2_WB_ONLY_L, which are valid by the end of P3 in the same cycle. This notifies IO1_NODE and IO2_NODE that they should not arbitrate for the bus for new transactions, only for writebacks and fills.
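The mapping from SUPPRESS lines to WB_ONLY lines described in 3.3.3.3 can be written as a small function. This is a sketch rather than the arbiter design: the node labels and the function name are hypothetical, and the handling of several simultaneous suppressors (taking the union of the individual mappings) is an assumption, since the text only describes one suppressor at a time.

```python
NODES = ("CPU", "IO1", "IO2")


def wb_only_lines(suppressing):
    """Return the set of nodes whose WB_ONLY_L line the arbiter asserts.

    suppressing: set of node names currently asserting their SUPPRESS_L line.
    A node's WB_ONLY_L is asserted when any OTHER node is suppressing.
    Assumption: with more than one suppressor, the mappings are combined.
    """
    unknown = set(suppressing) - set(NODES)
    if unknown:
        raise ValueError(f"unknown node(s): {sorted(unknown)}")
    return {node for node in NODES
            if any(other != node for other in suppressing)}


cpu_case = wb_only_lines({"CPU"})   # CPU suppresses the bus
io1_case = wb_only_lines({"IO1"})   # IO1_NODE suppresses the bus
idle_case = wb_only_lines(set())    # nobody suppresses
```

The suppressing node itself is left free to issue new transactions, which matches the statement that the CPU may continue to put all transactions onto the bus while P%CPU_SUPPRESS_L is asserted.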
(If the I/O chip cannot suppress its request line quickly enough, it may drive NOPs onto the NDAL if it gets GRANT, instead of withdrawing its request in the first cycle.) Accordingly, in NDAL cycle 1 as shown in the diagram, IO1_REQ_L is deasserted by IO1_NODE, since its pending request is for a read or a write. IO2_REQ_L remains asserted because IO2_NODE has a fill to do.

During the cycle in which P%CPU_SUPPRESS_L is asserted, the arbiter does not grant to any node with the exception of the CPU. Since the CPU is the one suppressing the bus, it should be allowed to continue issuing transactions on the bus. If a node had its HOLD line asserted and it had been granted the bus in the cycle before, it WOULD get grant under suppress. The rules for HOLD override the rules for SUPPRESS.

In NDAL cycle 1, the bus is granted to IO2_NODE, which has arbitrated to do its fill. The fill is driven in NDAL cycle 2.

3.3.3.4 NDAL Arbitration Rules

The rules of arbitration are as follows:

1. Any node may assert its request line during any cycle.
2. A node's grant line must be asserted before that node drives the NDAL.
3. An NDAL driver may only assert its HOLD_L line if it has been granted the bus for the current cycle.
4. If a node has been granted the bus, and it asserts HOLD, it is guaranteed to be granted the bus in the following cycle.
5. HOLD may only be used in two cases: (a) to hold the bus for the data cycles of a write; (b) to send consecutive fill cycles.

[Figure 3-5, the NDAL suppress timing diagram, is not recoverable from the source. Legible signal names: NDAL_H, CPU_SUPPRESS_L, IO1_WB_ONLY_L, IO2_WB_ONLY_L, IO1_REQ_L, IO2_REQ_L, CPU_REQ_L, IO1_GRANT_L, IO2_GRANT_L, CPU_GRANT_L, PHI12, PHI23, PHI34, PHI41. Legible annotations: CPU suppresses bus due to full input queue; arbiter asserts IO1_WB_ONLY_L due to suppress; arbiter asserts IO2_WB_ONLY_L due to suppress; WB_ONLY deasserted after SUPPRESS deasserted; request withdrawn after grant; bus not granted due to suppress; bus granted to CPU under CPU_SUPPRESS; bus granted in cycle after suppress; read data return driven by IO_NODE2.]

6. HOLD must be used to retain the bus for the data cycles of a write, as the data cycles must be contiguous with the write address cycle.
7. HOLD must not be used to retain the bus for new transactions, as arbitration fairness would not be maintained.
8. If a node requests the bus and is granted the bus, it must drive the NDAL during the granted cycle with a valid command. NOP is a valid command. NVAX takes this a step further and drives NOP if it is granted the bus when it did not request it.
9. Any node which issues a read must be able to accept the corresponding fills, as they cannot be suppressed or slowed.
10. If a node's WB_ONLY line is asserted, it may only drive the NDAL with NOP, RDE, RDRn, WDISOWN, WDATA, or BADWDATA.
11. If a node asserts its SUPPRESS line, the arbiter must not grant the bus to any node except that one in the next cycle. At the same time the arbiter must assert the appropriate WB_ONLY lines. In the following cycle, the arbiter must grant the bus normally.
12. The rules for HOLD override the rules for SUPPRESS.
13. The bus must be actively driven during every cycle.

Specifics on arbitration algorithms may be found in the system specs for each NVAX system.

3.3.4 NDAL Information Transfer

3.3.4.1 P%NDAL_H<63:0>

The use of this field is multiplexed between address and data information. On data cycles the lines represent 64 bits of read or write data; on address cycles the lines represent address, byte enable, and length information.

There are four types of data cycles: Write Data, Bad Write Data, Read Data Return, and Read Data Error. During write data cycles the commander drives its Commander ID on P%ID_H<2:0> and drives data on P%NDAL_H<63:0>. The full 64 bits of data are written during hexaword writes. For octaword and quadword length writes, the data bytes which are written correspond to the byte enable bits which were asserted during the address cycle which initiated the transaction. During Read Data Return and Read Data Error cycles the responder drives the original commander ID.

The NDAL address cycle is used by a commander to initiate an NDAL transaction. On address cycles the address is driven in the lower longword of the bus, and the byte enable and transaction length are in the upper longword, as shown in Figure 3-6.
Figure 3-6: Address Cycle Format

[Bit-field diagram residue. Recoverable labels: the address occupies bits <31:00>; bits <31:29> select the space, with Mem = 000..110 and I/O = 111.]

Each field shown in the diagram is described in the sections which follow.

3.3.4.1.1 Address Field

The address space supported by the NDAL is divided into memory space and I/O space. The lower 32 bits of the address cycle, P%NDAL_H<31:0>, define the address of an NDAL read or write transaction. The NDAL supports a 4 Gigabyte (2**32 byte) address space. The most significant bits of this address (corresponding to lines P%NDAL_H<31:29>) select the 512-Megabyte I/O space (P%NDAL_H<31:29> = 111) or the 3.5-Gigabyte memory space (P%NDAL_H<31:29> = 000..110). Figure 3-7 illustrates the division of the address space into memory space and I/O space.

The division of the NDAL address space in the I/O region is further defined to accommodate the need for NDAL node and I/O node address space. More information about the division of I/O space may be found in Chapter 2.

Figure 3-7: Physical Address Space Layout

[Diagram residue. Recoverable labels: Memory Space, 3.5 Gigabytes, starting at address 00000000; I/O Space, 512 Megabytes.]

Address bits <31:0> are all significant bits in an address to I/O space. Although the length field on the NDAL is always quadword for I/O space reads and writes, the actual amount of data read or written may be less than a quadword. The byte enable is used to read or write the requested bytes only. If the byte enable indicates a 1-byte read or write, every bit of the address is significant. The lower bits of the address are provided so that the I/O adapters do not have to deduce the address from the byte enable.

The number of significant bits in an address to memory depends on the transaction type and length, as shown in Figure 3-8.

Figure 3-8: NDAL Memory Address Interpretation

                           A<i>, i = 4 3 2 1 0
                                    +-+-+-+-+-+
Read quadword, octaword, hexaword   |s|s|d|d|d|
                                    +-+-+-+-+-+
Write quadword                      |s|s|d|d|d|
                                    +-+-+-+-+-+
Write octaword                      |s|d|d|d|d|
                                    +-+-+-+-+-+
Write hexaword                      |d|d|d|d|d|
                                    +-+-+-+-+-+
s = significant    d = don't care

It can be seen from the figure that bits A<4:3> are significant address bits or don't care, depending on the function being requested. All reads have significant bits down to the quadword. Although fills may be returned in any order, there is a performance advantage if memory returns the requested quadword first. The NDAL protocol identifies each quadword using one of the four Read Data Return commands, so that quadwords can be placed in correct locations regardless of the order in which they are returned. Quadword, octaword, and hexaword writes are always naturally aligned and driven on the NDAL in order from the lowest-addressed quadword to the highest.

3.3.4.1.2 Byte Enable Field

The Byte Enable field is located in P%NDAL_H<55:40> during the address cycle. It is used to supply byte-level enable information for quadword-length DREADs, IREADs, OREADs, WRITEs, and WDISOWNs and octaword-length WRITEs and WDISOWNs. Of these transactions, NVAX generates only quadword IREADs and DREADs to I/O space, quadword WRITEs to I/O space, and quadword WRITEs and WDISOWNs to memory space. If the byte enable bit is a "1", the byte is to be read or written. If it is a "0", the byte is not read or written.
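The field positions given above can be pulled out of a 64-bit address cycle value with a few shifts and masks. The following Python fragment is an illustrative sketch only, not part of the NVAX design: the function names and dictionary keys are hypothetical, and it decodes only the fields whose positions are stated in the text (the address in <31:0>, the space select in <31:29>, and the byte enable in <55:40>). The second function applies the significance rules of Figure 3-8 and is meant for memory-space addresses only, since every address bit is significant in I/O space.

```python
IO_SPACE_SELECT = 0b111          # P%NDAL_H<31:29> = 111 selects I/O space

# Number of low-order "don't care" address bits for memory references,
# following Figure 3-8 (all reads are significant down to the quadword).
DONT_CARE_BITS = {
    ("read", "quadword"): 3,
    ("read", "octaword"): 3,
    ("read", "hexaword"): 3,
    ("write", "quadword"): 3,
    ("write", "octaword"): 4,
    ("write", "hexaword"): 5,
}


def decode_address_cycle(ndal: int) -> dict:
    """Split a 64-bit address-cycle value into the fields named in the text."""
    address = ndal & 0xFFFF_FFFF                 # P%NDAL_H<31:0>
    space = "io" if (address >> 29) == IO_SPACE_SELECT else "memory"
    byte_enable = (ndal >> 40) & 0xFFFF          # P%NDAL_H<55:40>
    return {"address": address, "space": space, "byte_enable": byte_enable}


def significant_memory_address(address: int, kind: str, length: str) -> int:
    """Clear the don't-care low bits of a memory-space address (Figure 3-8)."""
    dont_care = DONT_CARE_BITS[(kind, length)]
    return address & ~((1 << dont_care) - 1) & 0xFFFF_FFFF
```

For example, `decode_address_cycle((0x00FF << 40) | 0xE0000010)` reports an I/O space reference with all eight low byte enable bits set.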
NOTE

During quadword-length transactions the high portion of the byte enable field, located in P%NDAL_H<55:48>, is ignored. Commanders may drive any data pattern they wish in this field as long as it has correct parity. Responders must not depend on a certain defined pattern (such as all zeros).

During hexaword-length transactions the entire byte enable field is ignored. During hexaword transactions, commanders are permitted to drive any data pattern they wish in this field as long as it has correct parity. Responders must not depend on a certain defined pattern (such as all zeros).

During octaword-length transactions, the byte enable located in P%NDAL_H<47:40> always corresponds to the low-order quadword of the octaword. The byte enable located in P%NDAL_H<55:48> always corresponds to the high-order quadword of the octaword.

The correspondence between bits in the enable and bytes of the data is shown in Table 3-7 and Table 3-8.

Table 3-7: Byte Enable for Quadword Reads and Writes

Address cycle          Data cycle
Byte Enable            Data Byte
P%NDAL_H<47>           P%NDAL_H<63:56>
P%NDAL_H<46>           P%NDAL_H<55:48>
P%NDAL_H<45>           P%NDAL_H<47:40>
P%NDAL_H<44>           P%NDAL_H<39:32>
P%NDAL_H<43>           P%NDAL_H<31:24>
P%NDAL_H<42>           P%NDAL_H<23:16>
P%NDAL_H<41>           P%NDAL_H<15:08>
P%NDAL_H<40>           P%NDAL_H<07:00>

Table 3-8: Byte Enable for Octaword Writes

Address cycle          First data cycle         Second data cycle
Byte Enable Bit        Quadword 0 Data Byte     Quadword 1 Data Byte
P%NDAL_H<47>           P%NDAL_H<63:56>
P%NDAL_H<46>           P%NDAL_H<55:48>
P%NDAL_H<45>           P%NDAL_H<47:40>
P%NDAL_H<44>           P%NDAL_H<39:32>
P%NDAL_H<43>           P%NDAL_H<31:24>
P%NDAL_H<42>           P%NDAL_H<23:16>
P%NDAL_H<41>           P%NDAL_H<15:08>
P%NDAL_H<40>           P%NDAL_H<07:00>
P%NDAL_H<55>                                    P%NDAL_H<63:56>
P%NDAL_H<54>                                    P%NDAL_H<55:48>
P%NDAL_H<53>                                    P%NDAL_H<47:40>
P%NDAL_H<52>                                    P%NDAL_H<39:32>
P%NDAL_H<51>                                    P%NDAL_H<31:24>
P%NDAL_H<50>                                    P%NDAL_H<23:16>
P%NDAL_H<49>                                    P%NDAL_H<15:08>
P%NDAL_H<48>                                    P%NDAL_H<07:00>

Table 3-9 illustrates possible bit patterns in the byte enable for transactions which NVAX generates. Only transactions in which the byte enable is valid are listed. NVAX will generate every possible byte enable for every possible address for quadword WRITEs and WDISOWNs to memory space, as shown by the table. IREADs to I/O space will always request a full quadword of data by asserting all the byte enable bits.

DREADs and WRITEs to I/O space are issued using the quadword length NDAL encoding, but the requests are for only a byte, word, or longword at a time, as indicated by the byte enable given in the command cycle of a transaction. References that are unaligned across a naturally aligned quadword are decomposed into two separate requests for the bytes in each quadword; where this is the case, Table 3-9 shows the byte enable values for both references generated. In the cases where a second request is generated, the address is incremented by 8, which addresses the next quadword in I/O space, but address bits <2:0> are 000 (binary).

When the NVAX CPU does an I/O space read for an interrupt acknowledge (IAK read), it always generates a longword-aligned word-length read request. In other words, the byte enable which NVAX uses for an IAK read is either 0000 0011 (binary) or 0011 0000 (binary).

Table 3-9 reflects what NDAL requests the NVAX CPU will generate, depending on the software written. Software must take care only to generate requests which make sense in the system environment. Specifically, unaligned requests are forbidden by DEC Standard 032.

Table 3-9: Possible Byte Enables for NVAX-generated Transactions

                                                  Byte Enable<7:0>, by size of software request
NDAL Transaction    Software   Req. #  NDAL       Byte        Word        Longword    Quadword
                    Addr<2:0>          Addr<2:0>
QW WRITE, WDISOWN   any addr   1st     same as SW unrestricted unrestricted unrestricted unrestricted
(memory space)
QW IREAD            000        1st     000        1111 1111 (full quadword always requested)
(I/O space)
QW DREAD, QW WRITE  000        1st     000        0000 0001   0000 0011   0000 1111   0000 1111
(I/O space)                    2nd     100                                            1111 0000
                    001        1st     001        0000 0010   0000 0110   0001 1110   0001 1110
                               2nd     101                                            1110 0000
                               3rd     000                                            0000 0001
                    010        1st     010        0000 0100   0000 1100   0011 1100   0011 1100
                               2nd     110                                            1100 0000
                               3rd     000                                            0000 0011
                    011        1st     011        0000 1000   0001 1000   0111 1000   0111 1000
                               2nd     111                                            1000 0000
                               3rd     000                                            0000 0111
                    100        1st     100        0001 0000   0011 0000   1111 0000   1111 0000
                               2nd     000                                            0000 1111
                    101        1st     101        0010 0000   0110 0000   1110 0000   1110 0000
                               2nd     000                                0000 0001   0000 0001
                               3rd     001                                            0001 1110
                    110        1st     110        0100 0000   1100 0000   1100 0000   1100 0000
                               2nd     000                                0000 0011   0000 0011
                               3rd     010                                            0011 1100
                    111        1st     111        1000 0000   1000 0000   1000 0000   1000 0000
                               2nd     000                    0000 0001   0000 0111   0000 0111
                               3rd     011                                            0111 1000

3.3.4.1.2.1 I/O space writes

When the NVAX CPU issues an I/O space write, it always replicates the data identically on the high longword and the low longword of the NDAL, although the byte enable indicates that the data is only valid in one longword or the other. A system device may take advantage of this fact to avoid rotating the data.

3.3.4.1.3 Length Field

The length field is used to indicate the amount of data to be read or written for the current transaction. Table 3-10 shows how the length values correspond to transaction lengths.

Table 3-10: NDAL Length Field

00    hexaword
01    unused
10    quadword
11    octaword (not used by NVAX CPU)

3.3.4.2 P%CMD_H<3:0>

The P%CMD_H<3:0> lines specify the current bus transaction during any given cycle. The interpretation of the four bits is shown in Table 3-11.

Table 3-11: NDAL Command Encodings and Definitions

Levels  Abbrev.    Bus Transaction               Type  Function
0000    NOP        No Operation                  Nop   No Operation
0001               Reserved
0010    WRITE      Write                         Addr  Write to memory with byte enable if quadword or octaword
0011    WDISOWN    Write Disown                  Addr  Write memory; cache disowns block and returns ownership to memory
0100    IREAD      Instruction Stream Read       Addr  Instruction-stream read
0101    DREAD      Data Stream Read              Addr  Data-stream read (without ownership)
0110    OREAD      D-Stream Read Ownership       Addr  Data-stream read claiming ownership for the cache
0111               Reserved
1000               Reserved
1001    RDE        Read Data Error               Data  Used instead of Read Data Return in the case of an error.
1010    WDATA      Write Data Cycle              Data  Write data is being transferred
1011    BADWDATA   Bad Write Data                Data  Write data with errors is being transferred
1100    RDR0       Read Data0 Return (fill)      Data  Read data is returning corresponding to QW 0 of a hexaword.
1101    RDR1       Read Data1 Return (fill)      Data  Read data is returning corresponding to QW 1 of a hexaword.
1110    RDR2       Read Data2 Return (fill)      Data  Read data is returning corresponding to QW 2 of a hexaword.
1111    RDR3       Read Data3 Return (fill)      Data  Read data is returning corresponding to QW 3 of a hexaword.

The NVAX CPU does not implement all transaction lengths with all commands.
The commands and lengths which it uses are in the table which follows. If NVAX implements the command in memory space, MEM is indicated in the table; if it implements the command in I/O space, I/O is indicated in the table.

Table 3-12: NDAL Address Cycle Commands as used by the NVAX CPU

COMMAND    QUADWORD     OCTAWORD    HEXAWORD
IREAD      I/O                      MEM
DREAD      I/O                      MEM
OREAD                               MEM
WRITE      MEM, I/O
WDISOWN    MEM (1)                  MEM

(1) NVAX uses these transactions only when the backup cache is disabled or in Error Transition Mode.

When the cache is off, the NVAX CPU issues OREAD commands of hexaword length, and corresponding Disown Write commands of quadword length. These correspond to the CPU-internal commands of Read Lock and Write Unlock. The lock/ownership granularity in memory must not be less than a hexaword. Otherwise, when the CPU did a hexaword OREAD followed by a quadword Disown Write, the other three quadwords would be in limbo. The CPU would assume that it didn't own them, and memory would believe that they were still owned by the cache.

3.3.4.3 P%ID_H<2:0>

During the address cycle and return data cycles, P%ID_H<2:0> contain the commander's ID. This ID is used to identify the source of the request on the address cycle and to associate returning data with the commander who issued the request on return data cycles. The commander ID codes available for use by a node are shown in Table 3-13. P%ID_H<2:1> indicate which node originated the transaction, and P%ID_H<0> indicates which of the two outstanding reads per node is involved.

Table 3-13: Commander P%ID_H Assignments

Node Name           P%ID_H<2:0>
NVAX                00x
memory interface    01x
IO1_NODE            10x
IO2_NODE            11x

During write command and data cycles, P%ID_H<2:0> is driven with the ID of the commander. P%ID_H<2:1> is driven with the bits identifying the commander, and P%ID_H<0> may be driven with any value. P%ID_H<0> is not necessarily driven with the same value during the command cycle of a write and the corresponding data cycles of that write.

Each commander node on the NDAL may have two read transactions outstanding.

The memory interface is not a commander node, but it has been assigned a commander ID which may be used in some NVAX systems. For example, in the XMI2 system, the memory interface reflects XMI2 read and write commands onto the NDAL for cache coherency reasons. These reads and writes are not taken up by any node on the NDAL except to enforce cache coherency. The memory interface uses its own ID when driving these reads and writes onto the NDAL. If a write is reflected onto the NDAL merely to enforce cache coherency, the WDATA cycles may be omitted.

3.3.4.4 P%PARITY_H<2:0>

P%PARITY_H<2> is computed over P%CMD_H<3:0> and P%ID_H<2:0>. Even parity is used, where the "exclusive OR" of all bits including the parity bit is a "0". (All bits, including the parity bit, have an even number of "1"s.) P%PARITY_H<2> is inverted, forcing an NDAL parity error, when CCTL<FORCE_NDAL_PERR> is set. This is described in Chapter 13.

P%PARITY_H<1> is computed over the high longword of the NDAL, P%NDAL_H<63:32>. Odd parity is used, where the "exclusive OR" of all bits including the parity bit is a "1". (All bits, including the parity bit, have an odd number of "1"s.)

P%PARITY_H<0> is computed over the low longword of the NDAL, P%NDAL_H<31:0>. Even parity is used.

Using a combination of odd and even parity means that neither all "1"s nor all "0"s is a legal bus pattern.

If a device requests the bus and is granted it, but chooses not to use it during a given cycle, it is responsible for driving the NOP command on P%CMD_H<3:0>. It must drive P%NDAL_H<63:0>, P%ID_H<2:0>, and P%PARITY_H<2:0> with correct parity.
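The three parity rules can be checked with a short calculation. The Python fragment below is a minimal sketch written for illustration; the function names are hypothetical and are not taken from the NVAX design. It computes each parity bit exactly as stated in the text: even parity over the command and ID lines, odd parity over the high longword, and even parity over the low longword.

```python
def xor_of_bits(value: int) -> int:
    """Return the exclusive OR of all bits in value (1 if an odd number are set)."""
    return bin(value).count("1") & 1


def ndal_parity(cmd: int, cmdr_id: int, ndal: int) -> int:
    """Compute the 3-bit P%PARITY_H<2:0> value for one NDAL cycle."""
    # <2>: even parity over P%CMD_H<3:0> and P%ID_H<2:0>.
    p2 = xor_of_bits(((cmd & 0xF) << 3) | (cmdr_id & 0x7))
    # <1>: odd parity over the high longword, P%NDAL_H<63:32>.
    p1 = xor_of_bits((ndal >> 32) & 0xFFFF_FFFF) ^ 1
    # <0>: even parity over the low longword, P%NDAL_H<31:0>.
    p0 = xor_of_bits(ndal & 0xFFFF_FFFF)
    return (p2 << 2) | (p1 << 1) | p0


def parity_ok(cmd: int, cmdr_id: int, ndal: int, parity: int) -> bool:
    """Check a received cycle against the parity bits that accompanied it."""
    return ndal_parity(cmd, cmdr_id, ndal) == (parity & 0x7)
```

With every line at "0" the odd-parity check on the high longword fails, and with every line at "1" the even-parity check on the low longword fails, which is why neither pattern is legal on the bus.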
If NVAX did not request the bus, and it is granted the NDAL anyway, it will drive the NDAL with a NOP.

When the bus is idle, the arbiter ensures that the NDAL is driven with correct parity. To do this, the arbiter may take advantage of the fact that NVAX will drive the NDAL with NOP if it is unexpectedly granted the bus.

The NVAX BIU checks the NDAL for correct parity in every cycle, regardless of the contents of the bus. It does not distinguish between errors on the command lines or the data lines; it computes the three parity bits, and if any fail, it responds to the error according to Table 3-21.

Table 3-14: NDAL Parity Coverage

Parity bit         Protected data                  Parity type
P%PARITY_H<2>      P%CMD_H<3:0>, P%ID_H<2:0>       even parity
P%PARITY_H<1>      P%NDAL_H<63:32>                 odd parity
P%PARITY_H<0>      P%NDAL_H<31:0>                  even parity

3.3.4.5 P%ACK_L

P%ACK_L is an open drain signal which is pulled high (deasserted) by an external resistor on the board. The resistor is able to pull the node high during the time allotted without assistance from any other P%ACK_L driver. Thus, a P%ACK_L driver only has to pull the signal low at the appropriate time.

The receiver for a particular NDAL cycle is responsible for pulling P%ACK_L low (asserted) if it receives the cycle without parity errors. If another receiver detects a parity error on the cycle, it reports it by asserting P%H_ERR_L or P%S_ERR_L.

If P%ACK_L is asserted in response to an NDAL cycle, it indicates that the receiving node has accepted an address cycle or a data cycle. P%ACK_L being asserted for a read address cycle indicates that the responder will return a read response cycle at a later time. If it is asserted for a write address cycle, the transfer of the write address is assumed successful.

If a cycle is accompanied by a NOP command, the cycle may or may not be acknowledged by the assertion of P%ACK_L; NOPs do not have to be acknowledged but they may be.

P%ACK_L is always asserted by the NDAL receiver unless there was a parity error on the bus. It is NOT used for flow control.

P%ACK_L is also not asserted when there is no node on the NDAL which recognizes the address space addressed, i.e., transactions to non-existent memory and I/O space will not receive P%ACK_L assertion. See Table 3-21 for the NVAX response when P%ACK_L is not asserted.

The timing of ACK_L relative to the data or address cycle is shown in Figure 3-9. For a given transfer, ACK_L is asserted one cycle later. In cycle 0 a read is driven, so ACK_L is asserted in cycle 1. In cycle 4 a NOP is driven, and in cycle 5 ACK_L is not asserted because NOPs do not have to be acknowledged.

Figure 3-9: NDAL P%ACK_L Timing

[Diagram residue. The figure covers NDAL cycles 0 through 7 and shows ACK_L asserted in the cycle after each acknowledged transfer, and not asserted in the cycle after a NOP.]

3.3.5 NDAL Transactions

The following sections describe the entire set of NDAL transactions. Table 3-15 shows the entire set of NDAL commands and how they are used by NVAX.

In memory space, NVAX issues all reads with hexaword length. Normal writes to memory space are always quadword length, and Disown Writes are quadword or hexaword. When the cache is operating normally, Disown Writes are only issued in hexaword length. When the cache is in ETM, NVAX issues Disown Writes of both hexaword and quadword length. When the cache is off, NVAX issues only quadword Disown Writes. NVAX issues quadword Disown Writes only as the result of an interlock operation.

In I/O space, the ownership commands (OREAD and Disown Write) are not defined at all.
NVAX issues only quadword operations in I/O space. NVAX never uses the BADWDATA command in I/O space.

Table 3-15: NDAL Command Usage by NVAX

Address Space   Command     Used by NVAX   Length QW   Length OW   Length HW
N/A             NOP         yes
N/A             Reserved    no
Memory          WRITE       yes            yes         no          no
Memory          WDISOWN     yes            yes         no          yes
Memory          IREAD       yes            no          no          yes
Memory          DREAD       yes            no          no          yes
Memory          OREAD       yes            no          no          yes
Memory          RDE         no
Memory          WDATA       yes
Memory          BADWDATA    yes
Memory          RDR0        no
Memory          RDR1        no
Memory          RDR2        no
Memory          RDR3        no
I/O             WRITE       yes            yes         no          no
I/O             WDISOWN     no             no          no          no
I/O             IREAD       yes            yes         no          no
I/O             DREAD       yes            yes         no          no
I/O             OREAD       no             no          no          no
I/O             RDE         no
I/O             WDATA       yes
I/O             BADWDATA    no
I/O             RDR0        no
I/O             RDR1        no
I/O             RDR2        no
I/O             RDR3        no

Table 3-16 shows the usage of NDAL commands by NDAL devices other than NVAX. The ownership commands (OREAD and WDISOWN) are not defined at all in I/O space. Although nodes may use OREAD and WDISOWN of lengths other than hexaword, they must be aware of the memory coherency problems connected with using lengths other than hexaword for these operations. Memory defines ownership along hexaword boundaries.

Table 3-16: NDAL Command Usage by NDAL nodes besides NVAX

Address Space   Command     Used by NDAL nodes   Length QW   Length OW   Length HW
N/A             NOP         yes
N/A             Reserved    no
Memory          WRITE       yes                  yes         yes         yes
Memory          WDISOWN     yes                  yes         yes         yes
Memory          IREAD       yes                  yes         yes         yes
Memory          DREAD       yes                  yes         yes         yes
Memory          OREAD       yes                  yes         yes         yes
Memory          RDE         yes                  yes         yes         yes
Memory          WDATA       yes
Memory          BADWDATA    yes
Memory          RDR0        yes
Memory          RDR1        yes
Memory          RDR2        yes
Memory          RDR3        yes
I/O             WRITE       yes
I/O             WDISOWN     no                   no          no          no
I/O             IREAD       yes                  yes         yes         yes
I/O             DREAD       yes                  yes         yes         yes
I/O             OREAD       no                   no          no          no
I/O             RDE         yes
I/O             WDATA       yes
I/O             BADWDATA    yes
I/O             RDR0        yes
I/O             RDR1        yes
I/O             RDR2        yes
I/O             RDR3        yes

3.3.5.1 Reads and Fills

The read address cycle, which is recognized by one of the three read commands (DREAD, IREAD, or OREAD), is decoded by the interfaces in the system, and the one which recognizes the address latches that address and command. This device is the responder. The responder uses Read Data Return or Read Data Error cycles to return the data. Reads and fills are described in the sections which follow.

3.3.5.1.1 Dstream Read Requests (DREAD)

An NDAL commander uses the DREAD command to request Data Stream data from a responder, either memory or an I/O device.

3.3.5.1.2 Istream Read Requests (IREAD)

The IREAD command is used to request Instruction-Stream data from a responder, either memory or an I/O device.

The separate I-stream read command is used in implementing halt protection for the CPU. When a system device which asserts P%HALT_L recognizes an I-stream read in halt-protected space, it prevents P%HALT_L from being asserted to the CPU. In the meantime, DREADs outside of halt-protected space may occur. When an IREAD outside of halt-protected space happens, the system device resumes asserting P%HALT_L to the CPU.
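The halt-protection behavior described above amounts to a two-state gate driven by IREAD addresses. The following Python sketch is one possible model, offered only as an illustration: the class name, the method names, and the idea of a single contiguous halt-protected address range are assumptions, not details from this specification. Commands other than IREAD are assumed to leave the state unchanged, in line with the statement that DREADs may occur in the meantime.

```python
class HaltGate:
    """Tracks whether a system device may assert P%HALT_L to the CPU."""

    def __init__(self, protected_start: int, protected_end: int) -> None:
        # Hypothetical halt-protected range [protected_start, protected_end).
        self.protected = range(protected_start, protected_end)
        self.halt_allowed = True

    def observe(self, command: str, address: int) -> None:
        """Update the gate from one read address cycle seen on the NDAL."""
        if command != "IREAD":
            return  # assumption: only I-stream reads change the state
        # An IREAD inside the protected range blocks the halt;
        # an IREAD outside it lets the device resume asserting the halt.
        self.halt_allowed = address not in self.protected
```

A device using such a gate would sample `halt_allowed` before driving P%HALT_L. The address values passed to the constructor are system-specific and are not defined here.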
When NVAX issues the IREAD command in I/O space, it expects a full quadword of data in return. The responding device may decode the IREAD command instead of the byte enable field to detect the need to return a full quadword of data.

In addition, the separate IREAD command may be helpful in analysis during system debug or for performance analysis.

3.3.5.1.3 Ownership Read Requests (OREAD)

A node uses the OREAD command to gain ownership of a hexaword block of memory. Whereas previous systems implemented an Interlock read as well, the NDAL defines only the Ownership read. Interlocks can be accomplished using OREADs. OREADs are only defined for memory space; they are not used in I/O space.

When memory receives an ownership read, an "owned" bit is set in memory and the read data is returned. Each hexaword in memory has an owned bit. The NVAX backup cache is organized by hexawords also, with an owned bit for each hexaword. Memory clears the owned bit when a Disown Write of any length is received to the same block. It [remainder of sentence missing in source]

3.3.5.1.4 How memory handles reads to Owned blocks

If the ownership bit is already set in memory when the OREAD arrives, data is not returned immediately to the commander. Once the node which owns the data Disown Writes the block, the ownership bit is set in memory and the data is returned to the commander. The fact that the ownership bit was set at the beginning of the reference is transparent to the commander on the NDAL. Once an OREAD is issued on the NDAL, the data must be returned to the commander without requiring any retry sequence.

The analogous statement is true for an IREAD or a DREAD: If the ownership bit is already set in memory when the IREAD or DREAD arrives, data is not returned immediately to the commander. Once the node which owns the data Disown Writes the block, the data is returned to the commander. The fact that the ownership bit was set at the beginning of the reference is transparent to the commander on the NDAL. Once an IREAD or DREAD is issued on the NDAL, the data must be returned to the commander without requiring any retry sequence.

In certain error-handling situations, NVAX itself may issue a read to a block which it already owns. In this case the memory controller should handle the read as it normally would: wait until NVAX completes the WDISOWN, then return the read data to NVAX and set the ownership bit if the read was an OREAD.

3.3.5.1.5 Read cycle description and timing

A read command cycle consists of a commander driving an address cycle on P%NDAL_H<63:0>, as shown in Section 3.3.4. The commander drives P%CMD_H<3:0> with DREAD, IREAD, or OREAD. It drives its own identification code on P%ID_H<2:0>, and it drives correct parity on P%PARITY_H<2:0>.

The timing for a read cycle is shown in Figure 3-10. In this example, NVAX is doing a read. In cycle 0, NVAX asserts P%CPU_REQ_L to request the NDAL. It is granted the bus immediately, as shown by the assertion of P%CPU_GRANT_L in cycle 0. (This example assumes that no other device was requesting the NDAL during this cycle.) The assertion of P%CPU_GRANT_L in phase 3 of cycle 0 means that NVAX is obligated to drive the NDAL in phase 1 of cycle 1. It drives the read address out at that time. In this example, it deasserts its request line at the same time, as it has no other requests to make. (It is not obligated to deassert request if it does have other requests to make.)

The device receiving the read recognizes it in phase 3 of cycle 1, and computes parity across the data it received. In this example, it recognizes no parity error, and asserts P%ACK_L so that it is valid in phase 3 of cycle 2. The CPU receives P%ACK_L and knows that the read address cycle completed successfully.
If there had been a parity error and P%ACK_L had not been asserted, NVAX would have responded with an error condition as described in Section 3.3.10.3.

[Figure 3-10: timing diagram residue. Signals shown: NDAL_H, CPU_REQ_L, CPU_HOLD_L, CPU_GRANT_L, ACK_L, and clock phases PHI12, PHI23, PHI34, PHI41 across NDAL cycles 0 through 2. Annotations: request to do read; bus granted to read driver; drive read address; read address cycle ACKed; null data cycle not ACKed.]

3.3.5.1.6 Read Data Return cycles (RDR0, RDR1, RDR2, RDR3)

The Read Data Return command is used in response to any read request, whether IREAD, DREAD, or OREAD. Multiple cycles are necessary to transfer all of the quadwords in a given hexaword transaction, and the cycles are not required to be consecutive. The commander, which has been monitoring the bus traffic waiting for its return data, latches the information. The responder returns the commander ID with the returned read data so the commander can recognize the returned read data it requested.

For a hexaword read, the four fill quadwords may be returned in any order. The NDAL Read Data Return command identifies the location of each quadword within the natural boundary as it is returned so that it can be placed in the correct location regardless of the return order. The data which is returned is naturally aligned within each quadword.

In I/O space, only one cycle's worth of data is returned. The actual amount of valid data returned depends upon the byte enable which was issued with the read request, as described in Section 3.3.4.1.2. The Read Data Return command corresponding to the requested I/O space address is used in returning the data.

Read Data Return cycles do not have to occur in adjacent cycles. The requested quadword should be returned as soon as possible, for performance reasons, even if the remaining quadwords are not yet available. The remaining quadwords may be sent as they become available.

Because the NDAL is a pended bus, multiple reads may be outstanding at a time. Because Read Data Return cycles do not have to occur contiguously, it is possible for Read Data Return cycles resulting from different read requests to take place in an interleaved fashion.

Table 3-17 shows the correspondence between address bits <4:3> and the RDR command used in returning data at that address. (Bits <4:3> indicate the alignment of a quadword of data within a hexaword.) The RDR command must correspond to the address of the data being returned for transactions of all lengths, whether quadword, octaword, or hexaword. The correct RDR command must be used for both memory space and I/O space.

Table 3-17: RDR usage for ALL fill cycles

Address bits <4:3>    Command used for fill cycle
00                    RDR0
01                    RDR1
10                    RDR2
11                    RDR3

3.3.5.1.7 Read data error cycles (RDE)

RDE is used to notify a commander of a problem with read data which is being returned. For example, the memory interface may use this command when it encounters an uncorrectable read error.

Once a Read Data Error cycle is sent for a particular read, no further read responses may be sent for that transaction.
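The fill rules above can be summarized in a small model: the responder picks the RDR code from address bits <4:3>, the commander files each quadword by the low two bits of that code, and an RDE closes the transaction. The Python sketch below is illustrative only. The class and function names are hypothetical; the command codes follow Table 3-11 (RDR0 through RDR3 are 1100 through 1111, RDE is 1001).

```python
RDR0 = 0b1100        # RDR0..RDR3 occupy command codes 1100..1111
RDE = 0b1001         # Read Data Error


def rdr_command(address: int) -> int:
    """Pick the Read Data Return code from address bits <4:3> (Table 3-17)."""
    return RDR0 | ((address >> 3) & 0b11)


class FillCollector:
    """Collects the fill quadwords for one outstanding read."""

    def __init__(self, expected_quadwords: int = 4) -> None:
        self.expected = expected_quadwords
        self.quadwords = {}      # quadword index within the hexaword -> data
        self.failed = False

    @property
    def done(self) -> bool:
        return self.failed or len(self.quadwords) == self.expected

    def accept(self, command: int, data: int = 0) -> None:
        """Record one read response cycle addressed to this commander."""
        if self.done:
            raise ValueError("no further read responses are allowed")
        if command == RDE:
            self.failed = True   # RDE ends the transaction
        elif RDR0 <= command <= RDR0 + 3:
            self.quadwords[command & 0b11] = data
        else:
            raise ValueError("not a read response command")
```

Because each fill carries its own position, the collector gives the same result for any return order. A quadword read to I/O space would use `expected_quadwords=1`.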
The following sequence illustrates the series of events during a return data of hexaword length containing an uncorrectable read error. In this example, HOLD is used to return the data in consecutive cycles.

Figure 3-11: RDE example

cycle       0      1      2      3      4      5
Arb         resp   HOLD   HOLD
CMD_H              RDR0   RDR1   RDE
NDAL_H             data   data
ID_H               cmdr   cmdr   cmdr
ACK_L                     ACK    ACK    ACK

3.3.5.1.8 Read data cycle description and timing

During a read data cycle, P%CMD_H<3:0> is driven to the value representing RDR0, RDR1, RDR2, RDR3, or RDE. P%NDAL_H<63:0> is driven with the quadword of read data being returned. P%ID_H<2:0> is driven with the ID which was issued with the original read request. Correct parity is driven on P%PARITY_H<2:0>.

The timing for a Read Data Return cycle is shown in Figure 3-12. In this example, IO1_NODE has fill data to return. In cycle 0, IO1_REQ_L is asserted to request the bus, and IO1_GRANT_L is asserted in response. Since IO1_GRANT_L was asserted in cycle 0, IO1_NODE is obligated to drive the NDAL in cycle 1. It does so and returns the fill data. The original requestor of the data receives the data at the beginning of phase 3 of cycle 1, and since it detects no parity errors, it asserts P%ACK_L so that it is valid in phase 3 of cycle 2, as shown.

[Figure 3-12: timing diagram residue. Signals shown: NDAL_H, IO1_REQ_L, IO1_HOLD_L, IO1_GRANT_L, ACK_L, and clock phases PHI12, PHI23, PHI34, PHI41 across NDAL cycles 0 through 2. Annotations: request to return read data; bus granted to read responder; read data return; read data return ACKed; null data cycle not ACKed.]

3.3.5.1.9 Read Transaction Examples

3.3.5.1.9.1 Quadword Read and Fill

A quadword read consists of a command transfer followed by a return data transfer as shown below:

Figure 3-13: Quadword Read and Fill

cycle       0      1      2      3      4      5      6
Arb         cmdr                        resp
CMD_H              read                        RDR2
NDAL_H             addr                        data
ID_H               cmdr                        cmdr
ACK_L                     ACK                         ACK

The two transfers are the read command and the Read Data Return. The CPU commander arbitrates for the NDAL in cycle 0, and wins. In cycle 1 it drives the command and address of the read, and its own ID (for use later to identify the returning data). In cycle 2 the receiver for that cycle asserts P%ACK_L if no parity error was detected on the bus.

Sometime later (call it cycle 4) the return data transfer begins with the responder arbitrating for the NDAL. Having won it, in cycle 5 it drives the command, the data, and the commander's ID. The status of the returning data is specified in the read response code: either Read Data Return or Read Data Error. In this example, the quadword requested was quadword 2 of a hexaword, so the RDR2 command is used in returning the data.

The commander monitors the NDAL and checks for an ID match during Read Data Return cycles.
An ID match indicates that the read data is meant for that commander. In cycle 6, the commander asserts P%ACK_L if it detected no parity error during the previous NDAL cycle.

3.3.5.1.9.2 Multiple Quadword Reads

The only type of multiple quadword read which is used by NVAX is the hexaword read. Octaword reads are also supported by the NDAL protocol but are not issued by the NVAX CPU. These read transactions move multiple quadwords of data from the responder to the commander. The command transfer of the transaction is shown below.

Figure 3-14: Read command on the NDAL

          0      1      2      3
  Arb     cmdr
  CMD_H          read
  NDAL_H         addr
  ID_H           cmdr
  ACK_L                 ACK

The following sequence illustrates the response to a hexaword read. In this example, quadword 1 of the hexaword was the requested quadword, so Read Data Return 1 is the command accompanying the first data to return. The requested quadword is returned first for performance reasons, although that is not required by NVAX or the NDAL.

Figure 3-15: Read data return without using HOLD

          0      1      2      3      4      5      6      7      8
  Arb     resp          resp          resp          resp
  CMD_H          RDR1          RDR0          RDR3          RDR2
  NDAL_H         data          data          data          data
  ID_H           cmdr          cmdr          cmdr          cmdr
  ACK_L                 ACK           ACK           ACK           ACK

The transfer above moves four quadwords of data. The command field of the NDAL in cycle 1, 3, 5, and 7 says Read Data Return with the P%ID_H field identifying the intended receiver (the transaction commander). Each cycle provides a new quadword of read data and the P%ID_H remains unchanged. The example shows no transactions interleaved with the Read Data Return cycles, but it is entirely possible for non-related transfers to be taking place in the cycles between the fill cycles for one read.

Read data may be returned in continuous cycles, if desired, through the use of the hold arbitration signals (see example below).
The transmitter asserts its hold line in the first cycle to ensure that it maintains use of the NDAL long enough to complete the transfer. The hold lines are the highest priority arbitration lines and thus guarantee access. An interface is constrained to a maximum of four consecutive cycles in which it can assert its hold line.

Figure 3-16: Read data return using HOLD

          0      1      2      3      4      5
  Arb     resp   hold   hold   hold
  CMD_H          RDR2   RDR3   RDR0   RDR1
  NDAL_H         data   data   data   data
  ID_H           cmdr   cmdr   cmdr   cmdr
  ACK_L                 ACK    ACK    ACK    ACK

3.3.5.2 Writes

3.3.5.2.1 Normal Write Transactions (WRITE)

These transactions are used to move a pattern of bytes from an NDAL commander to one of the responders. The byte enable functionality is only used for quadword and octaword length transactions. In any hexaword write, all bytes are written regardless of the byte enable values. Parity must be correct for all bytes sent from any node, as NVAX checks parity across the entire NDAL during every cycle.

If NVAX sees a write on the NDAL, it treats it as an invalidate request. A block invalidate is done if it is valid in the cache. A writeback is done if the block is owned.

3.3.5.2.2 Disown Write Transactions (WDISOWN)

The Disown Write transaction is the complement to the Ownership Read. After NVAX successfully gains ownership of a block in memory, it must relinquish ownership when another node wants ownership of the block or when the Bcache needs to do a deallocate. NVAX accomplishes this by performing a Disown Write to the memory with the latest copy of the data. The memory, which has been monitoring the bus traffic, notices that the transaction requested is a Disown Write. This condition allows it to clear the ownership bit in memory and to write the data as requested.

NVAX uses the Disown Write command of hexaword length to perform writebacks from the backup cache.
When the cache is off, it uses quadword Disown Writes to achieve the effect of a Write Unlock.

3.3.5.2.3 Write Data and Bad Write Data (WDATA, BADWDATA)

The Write Data command is used during the data cycles of a write if the data is good. If the data has been corrupted in some way, for instance, there were uncorrectable errors in a cache which was storing the data, the command used is Bad Write Data. When one quadword of a hexaword Write Disown is bad, the Bad Write Data command is only used for that quadword. The Write Data command is used for the good quadwords. The memory can use this information to distinguish which quadword of a hexaword block is bad. In addition, P%S_ERR_L may be asserted when the Bad Write Data command is used, to notify NVAX of the error.

3.3.5.2.4 Write transaction description and timing

In a Write transaction, a commander gains the NDAL and sends an address cycle. In this cycle, P%CMD_H<3:0> is driven to the value for WRITE. P%NDAL_H<63:0> is driven with the address, the transaction length, and byte enable. P%ID_H<2:1> is driven with the commander's identification code, and P%ID_H<0> is driven with any value.

The commander immediately follows this cycle with one to four consecutive cycles of write data, depending on the length specified. In these cycles, P%CMD_H<3:0> is driven with either the WDATA command or the BADWDATA command. P%NDAL_H<63:0> is driven with the write data. P%ID_H<2:1> is driven with the commander's identification code, and P%ID_H<0> must be driven, but may be driven with any value, as long as the parity is correct.

All interfaces on the NDAL decode the address, and the one that recognizes the address becomes the responder and asserts P%ACK_L. The responder accepts the command, address, and data and performs the requested write.
For quadword and octaword length transactions to memory space, the byte enable field that accompanies each command and address is completely unrestricted. Each bit in the 16-bit byte enable field corresponds to a byte of data in the associated quadword or octaword. If the bit is 0, that byte must not be written; if the bit is 1, that byte must be written. For hexaword write transactions, the responder ignores the byte enable and writes all 32 bytes. For I/O space transactions, the byte enable is used as indicated in Section 3.3.4.1.2.

The timing for a quadword write on the NDAL is shown in Figure 3-17. In cycle 0, NVAX requests the bus for the write by asserting P%CPU_REQ_L. In this example, no higher priority request is pending, so NVAX is granted the bus right away, in cycle 0. NVAX then drives the write command and address in cycle 1, and asserts P%CPU_HOLD_L at the same time in order to retain the bus. In cycle 2 the write data is driven. Assuming there are no parity errors, P%ACK_L is asserted by the receiver in cycle 2. This is in response to the address cycle of cycle 1. In cycle 3, which is not shown, P%ACK_L is asserted for the data cycle, cycle 2.

3.3.5.2.5 Write Transaction Examples

3.3.5.2.5.1 Quadword Writes

Quadword writes move some number of bytes from the commander to the responder as specified by the byte enable field. The commander arbitrates as usual and upon winning the NDAL, drives the appropriate write command, the intended address, the data byte enable, and its own ID and asserts its hold line to signal that it will need the next cycle also. In cycle two, it identifies the cycle as a Write Data Cycle and provides the write data. If an NDAL parity error is detected on cycle 1 or 2, it is signaled in cycle 2 or 3 by withholding the assertion of P%ACK_L. The cycle timing for a quadword write is shown in Figure 3-18.
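The byte enable rules described above can be illustrated with a short Python sketch. It is not part of the specification: the function name `apply_write`, the use of a `bytearray` as memory, and the mapping of byte enable bit i to data byte i are all assumptions made for illustration.

```python
def apply_write(memory, base, data, byte_enable):
    """Model a responder performing a memory-space write (illustrative only).

    memory      -- bytearray standing in for the responder's storage
    base        -- starting byte offset of the transaction
    data        -- 8, 16, or 32 bytes (quadword, octaword, hexaword)
    byte_enable -- 16-bit mask taken from the address cycle
    """
    length = len(data)
    if length not in (8, 16, 32):
        raise ValueError("length must be a quadword, octaword, or hexaword")
    if length == 32:
        # Hexaword: the byte enable is ignored and all 32 bytes are written.
        memory[base:base + 32] = data
        return
    for i in range(length):
        # Assumption: bit i of the byte enable governs byte i of the data.
        if (byte_enable >> i) & 1:
            memory[base + i] = data[i]
```

For example, a quadword write with a byte enable of 0b00000101 would update only bytes 0 and 2 of the addressed quadword under this bit ordering, while a hexaword write updates all 32 bytes whatever the byte enable value is.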
[Figure 3-17: timing diagram for a quadword write on the NDAL. The waveforms are not recoverable from the scanned source.]

Figure 3-18: Quadword write on the NDAL

          0      1      2      3      4
  Arb     cmdr   HOLD
  CMD_H          writ   wdat
  NDAL_H         addr   data
  ID_H           cmdr   cmdr
  ACK_L                 ACK    ACK

3.3.5.2.5.2 Multiple Quadword Writes

The only multiple-data-cycle write issued by the NVAX CPU is the Hexaword Disown Write. Hexaword writes are similar to quadword writes except for the amount of data moved. The byte enable must be ignored in hexaword write transactions and all the bytes of the hexaword must be written. The first cycle of a hexaword write is identified with the length desired; successive cycles are identified as write data cycles. The hold line remains asserted, maintaining use of the NDAL for the commander.

The four quadwords of data within the hexaword must be issued in order from lowest address to highest address. The order then is quadword 0, quadword 1, quadword 2, quadword 3.
(Address bits <4:3> determine the position of a quadword within a hexaword.) Unlike fill data cycles, the same command, WDATA, is issued for every write data cycle, so the order in which the data is issued is essential so that it is written to the correct address in memory.

A hexaword write is shown in Figure 3-19.

Figure 3-19: Hexaword write on the NDAL

          0      1      2      3      4      5      6
  Arb     cmdr   hold   hold   hold   hold
  CMD_H          wrt    wdat   wdat   wdat   wdat
  NDAL_H         addr   dat0   dat1   dat2   dat3
  ID_H           cmdr
  ACK_L                 ACK    ACK    ACK    ACK    ACK

NOTE
The write data must always immediately follow the write address cycle with no NULL cycles in between.

The NDAL protocol also allows for octaword writes. The NVAX CPU does not use these, but they may be used by other nodes. The two quadwords of data within the octaword must be issued in order from lowest address to highest address. The order then is quadword 0, quadword 1. (Address bit <3> determines the position of a quadword within an octaword.)

An octaword write is shown in Figure 3-20.

Figure 3-20: Octaword write on the NDAL

          0      1      2      3      4      5
  Arb     cmdr   hold   hold
  CMD_H          writ   wdat   wdat
  NDAL_H         addr   dat0   dat1
  ID_H           cmdr
  ACK_L                 ACK    ACK    ACK

3.3.5.3 NOPs

For implementation reasons, occasionally NVAX will arbitrate for the NDAL and, if the bus is granted, it will drive a NOP. This only happens when NVAX has just driven out two back-to-back transactions. This happens rarely, and since NVAX has the lowest priority of the NDAL nodes, it is not a performance problem.

3.3.6 Cache Coherency

Ownership Reads and Disown Writes on the NDAL are intended to support writeback caches by attaching an owner status to each block in physical memory.
A block in memory is defined as a hexaword, or 32 bytes. A node which owns a block may write it repeatedly without accessing memory. Only one node owns a given block. Ownership is passed from memory to a non-memory node through an Ownership Read command. Ownership is passed from non-memory nodes to memory through a Disown Write command.

The ownership bits in the caches and in memory indicate that a cache owns the block. The ownership bit in the writeback cache is set when the cache owns the block and is clear when the cache does not own the block. The ownership bit in memory is set when some cache owns the block and clear when memory owns the block. Shared read-only access to a block is permitted only when memory owns it. Otherwise the block can only be read by the node which owns the block.

NVAX nodes with writeback caches can gain ownership and retain it for a very long time. NVAX monitors the bus continuously for read-type and write commands to memory space by other nodes. When NVAX detects a request for a block that it owns, it will perform the disown write to memory, allowing the original command to complete successfully. Table 3-18 shows what action is performed in the backup cache based upon the state of the block in the cache when a particular command is received.

Table 3-18: NVAX Backup Cache Invalidates and Writebacks

  NDAL Command    Invalid block   Valid & Unowned   Valid & Owned
  IREAD, DREAD    -               -                 Writeback, set Bcache to valid-unowned state
  OREAD           -               Invalidate        Writeback, Invalidate
  WRITE           -               Invalidate        Writeback, Invalidate
  WDISOWN         -               -                 -

  (A dash marks a cell for which no action is listed.)

Some devices other than NVAX will access memory directly over the NDAL. As these commands go to memory, NVAX recognizes the command and performs the appropriate cache coherency action. NVAX does not acknowledge the commands as the memory interface is the receiver for the transaction.
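The lookup described by Table 3-18 can be sketched in Python as a small state table. This is an illustration only, not the Cbox implementation: the names `State` and `snoop` are hypothetical, the next state after an invalidate is assumed to be invalid, and the WDISOWN row is treated as "no action" because no action is legible for it in the table.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    VALID_UNOWNED = "valid-unowned"
    VALID_OWNED = "valid-owned"


# (command, current state) -> (actions, next state).
# Combinations not listed here are assumed to require no action.
_TABLE = {
    ("IREAD", State.VALID_OWNED): (("writeback",), State.VALID_UNOWNED),
    ("DREAD", State.VALID_OWNED): (("writeback",), State.VALID_UNOWNED),
    ("OREAD", State.VALID_UNOWNED): (("invalidate",), State.INVALID),
    ("OREAD", State.VALID_OWNED): (("writeback", "invalidate"), State.INVALID),
    ("WRITE", State.VALID_UNOWNED): (("invalidate",), State.INVALID),
    ("WRITE", State.VALID_OWNED): (("writeback", "invalidate"), State.INVALID),
}


def snoop(command, state):
    """Return (actions, next_state) for a command seen from another node."""
    return _TABLE.get((command, state), ((), state))
```

Under these assumptions, `snoop("OREAD", State.VALID_OWNED)` yields a writeback followed by an invalidate, and an IREAD or DREAD to an owned block yields a writeback that leaves the block valid and unowned.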
NVAX distinguishes cycles driven by devices other than itself by decoding the value driven on P%ID_H<2:0> for the cycle, and recognizes those as cache coherency transactions.

In some systems, such as the XMI2 system, there is a system bus to which multiple NVAX CPUs are interfaced. In these systems, memory commands which occur on the system bus must be driven into the NDAL so that NVAX can respond to them as necessary with cache coherency actions. For example, if an OREAD happens on the XMI2, an OREAD must be driven onto the NDAL to trigger NVAX to write back the block if it owns it. However, there is no node on the NDAL which becomes a responder to a memory access transaction which is driven FROM the memory interface. The result is that P%ACK_L is not asserted to acknowledge such a transaction. This is not an error condition.

For more detail on the specific cache coherency requirements in the XMI2 system, refer to Section 3.4.1.

3.3.7 Interrupts

The P%IRQ_L<3:0> lines provide a general-purpose interrupt request facility to interrupt the NVAX CPU. These lines are level-sensitive, NOT edge sensitive. Once a node asserts its interrupt line, it should keep it asserted until NVAX services the request.

When NVAX receives an interrupt, it issues a read on the NDAL to one of four specified I/O space addresses. There is one address specified for each Interrupt Priority Level. This mechanism replaces the specific command, Read Interrupt Vector, which was used in previous systems.

Read cycles to these specified I/O space addresses are monitored by all nodes which have an interrupt outstanding. The node which responds first with a Read Data Return transaction will deassert its interrupt request. Interrupting nodes on the NDAL do not have to deassert and reissue their interrupts after one node is serviced.
The remaining nodes monitoring the bus see the return vector cycle and maintain their interrupt requests in anticipation of another NVAX I/O space read for an Interrupt Vector. If the common interrupt line remains asserted, NVAX will initiate another such cycle to be fielded by another first responder.

Chapter 10 describes interrupts in detail.

3.3.8 Clear Write Buffer

Clear Write Buffer is used to force all writes in the processor to be delivered to memory. In previous systems, an explicit Clear Write Buffer command on the pin bus was used. The NDAL uses an I/O space address which may be read or written to indicate that write buffers should be cleared.

The I/O space read is used when the CPU wishes an acknowledgement of the request. The CPU waits for the read data to return before continuing operation. The actual read data which is returned is meaningless except to allow the CPU to proceed. The I/O space read does not complete until all previous writes are complete. This mechanism may be used during a process context switch to force any errors associated with previous writes to happen in the context of the current process before the process context switch actually occurs.

The device which responds with read data to the Clear Write Buffer is system dependent. In theory it would be memory, since memory responding would indicate that all buffers before memory had been cleared.

The I/O space write which serves as Clear Write Buffer is used when the process mode changes but the process is not being switched. Here the purpose is to flush the writes as fast as possible when the mode changes, and to flush them ahead of any subsequent reads. Because the mode is changed often, it would be a performance hit to use the CWB read and to have to wait for the read data to return. Therefore the Clear Write Buffer is done as a write.

When the Cbox receives the clear write buffer command from the Mbox, it flushes its write queue.
The writes are delivered to the backup cache, since it is writeback, rather than directly to memory. The I/O space clear write buffer command, whether a read or a write, is then issued on the NDAL.

3.3.9 VAX architecturally-defined interlocks

A VAX interlocked instruction causes the generation of a Read-Lock and a Write-Unlock which are guaranteed to happen back-to-back. The NDAL does not explicitly define interlocked transactions. Instead, the Ownership Read command is used in place of Read Lock and the Disown Write command is used in place of Unlock Write.

If the interlocked location is already owned in the backup cache when the Cbox receives the read lock from the Mbox, the command is never seen on the NDAL as it is serviced directly on the cache. Writeback of the block is prevented until the write unlock is issued from the Ebox.

3.3.9.1 Ownership and Interlock transactions

If NVAX has a read lock in progress and P%CPU_WB_ONLY_L is asserted, the Cbox issues the write unlock regardless of the assertion of P%CPU_WB_ONLY_L. Otherwise, deadlock might occur if P%CPU_WB_ONLY_L were asserted and a device in the system was waiting for NVAX to do a Write Unlock before deasserting P%CPU_WB_ONLY_L. For example, memory would not return Read Lock data to an I/O device if the ownership bit were set.

The NVAX CPU does not support interlocks to I/O space. If the Cbox receives an interlock to I/O space, it converts it to a normal read on the NDAL.

3.3.10 Errors

The NDAL supports the detection of all single-bit and some multiple-bit transmission related error conditions on the P%NDAL_H, P%CMD_H, P%ID_H, and P%PARITY_H lines by implementing parity across those lines.
Additionally, the NDAL allows commanders to recover from some memory and I/O-space read/write class errors.

3.3.10.1 Transaction Timeout

Each NDAL node must implement a timeout counter for each read which it may have outstanding. The NVAX Cbox implements two timeout counters, one for each possible outstanding read. If a read request times out, it is aborted by the Cbox. Any missing Read Data Return cycles will eventually cause that read to timeout in the Cbox. See Table 3-21 for details on how timeout is handled.

The NVAX BIU starts its read timeout counter when it receives P%ACK_L assertion for the read. The counter is an 8-bit counter which, in normal operation, is clocked with a signal from the Ebox, E%TIMEOUT_ENABLE_H. The base counter in the Ebox is 16 bits wide. This implementation results in the timeout values shown in Table 3-19.

Table 3-19: NVAX Read Timeout Values in Normal Mode

  NVAX chip speed   Timeout granularity   Read timeout
  10-ns NVAX        655 microseconds      167 milliseconds
  12-ns NVAX        786 microseconds      200 milliseconds
  14-ns NVAX        917 microseconds      234 milliseconds

A test mode for the NVAX read timeout counters is provided, and is described in detail in Chapter 13. In test mode, the read timeout counters are run directly from the internal NVAX clock, rather than from E%TIMEOUT_ENABLE_H. The test mode timeout values are shown in Table 3-20.

Table 3-20: NVAX Read Timeout Values in Test Mode

  NVAX chip speed   Timeout granularity   Read timeout
  10-ns NVAX        10 nanoseconds        2.5 microseconds
  12-ns NVAX        12 nanoseconds        3.0 microseconds
  14-ns NVAX        14 nanoseconds        3.5 microseconds

The occurrence of transaction timeout is not normal and is expected to happen only when the system is broken. More information on timeout may be found in Chapter 13.
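The timeout values in Tables 3-19 and 3-20 can be approximated from the counter widths given above. The Python sketch below is illustrative only. It assumes that one granularity tick equals 2^16 chip cycles in normal mode (the 16-bit Ebox base counter) and one chip cycle in test mode, and that the 8-bit counter expires after roughly 2^8 ticks. The second point is an assumption: the tabulated read timeouts are slightly lower than this estimate, by less than about 1 percent.

```python
def read_timeout(cycle_ns, base_bits=16, counter_bits=8):
    """Estimate (granularity_ns, timeout_ns) for the read timeout counter.

    cycle_ns     -- NVAX cycle time in nanoseconds (10, 12, or 14)
    base_bits    -- width of the base counter; 0 models test mode, where
                    the timeout counter runs from the internal clock
    counter_bits -- width of the read timeout counter
    """
    granularity_ns = cycle_ns * 2 ** base_bits
    # Assumption: the counter expires after about 2**counter_bits ticks.
    timeout_ns = granularity_ns * 2 ** counter_bits
    return granularity_ns, timeout_ns


for cycle_ns in (10, 12, 14):
    normal = read_timeout(cycle_ns)
    test = read_timeout(cycle_ns, base_bits=0)
    print(cycle_ns, "ns:", normal[0] / 1e3, "us,", normal[1] / 1e6, "ms;",
          "test mode", test[0], "ns,", test[1] / 1e3, "us")
```

For a 10-ns part the normal-mode granularity comes out as 655.36 microseconds, which matches the first row of Table 3-19 once truncated.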
3.3.10.2 Non-existent memory and I/O

An address which is not implemented in memory on a particular system is known as non-existent memory. An I/O address of a device which is not present on a particular system is known as non-existent I/O. Devices on the NDAL must acknowledge any transactions to address space which they recognize by asserting P%ACK_L (except when there is a parity error). An address which is not recognized by any NDAL device is not acknowledged. If P%ACK_L is not asserted in response to an NVAX request, the Cbox records the error by saving state in its error registers. (This error case is covered in Table 3-21.) Software can read the error registers in the other NDAL nodes and find that the absence of P%ACK_L was not due to a parity error on the NDAL. From that information it can deduce that the problem was non-existent memory or I/O.

If an interface between the NDAL and another bus recognizes a read to some address and ACKs it, then finds that the address is not implemented on the other bus, the interface must use RDE to terminate the read on the NDAL. It must not simply let the read time out, as this method of terminating the transaction takes much longer.

If an NDAL device ACKs a write, then determines that it was to non-existent memory or I/O space, it should notify the CPU appropriately. One possibility is to assert P%H_ERR_L.

3.3.10.3 Error Handling

This section describes the required behavior of NDAL commanders and responders in reaction to error conditions. In general, NDAL errors are handled as follows:

• Null cycles have correct parity but are not acknowledged. The absence of P%ACK_L assertion for these cycles is not an error condition.

• Any NDAL receiver detecting bad parity in any field on a non-NULL cycle must ignore the cycle.
  P%ACK_L must not be asserted and no action should be taken in response to the NDAL command. The receiver may log the error. The device which drove the NDAL cycle must log the error (the absence of P%ACK_L assertion) and notify NVAX in some way, depending on the exact situation.

• If an NDAL responder returns Read Data Error for one quadword of return data, it must not send any further quadwords of data for that request. If any further fills are received, the Cbox treats them as unexpected fills as described in Table 3-21.

• On an ownership read, the memory should set the ownership bit as soon as it starts sending data back to the requestor. The NVAX backup cache does not set its ownership bit until it receives all the data for the block, so if any fill data is lost, the block will appear not to be owned by any element in the system. This simplifies error handling if NVAX did the OREAD because of a write, and the write data has already been written into the cache when the error occurs. No other device can get access to the block while the error is being handled.

• An NDAL memory node may not clear its ownership bit unless all write data cycles associated with the Disown Write transaction are properly received. If write data is sent with the BADWDATA command, it is considered to be properly received.

The NVAX BIU does not retry failed commands on the NDAL. If the Cbox recognizes that data has been lost, it asserts C%CBOX_H_ERR_H to the Ebox. (In some cases, the data may be recoverable by software.) When C%CBOX_H_ERR_H is asserted, the Cbox always puts the Bcache into Error Transition Mode (ETM).

The Cbox asserts C%CBOX_S_ERR_H when it recognizes a soft error. A soft error does not necessarily interfere with code running on the machine. In some cases, the Cbox enters ETM upon recognizing a soft error.
Table 3-21 shows the response of the NVAX CPU for every error situation.

Table 3-21: NDAL Errors and NVAX CPU Error Responses

General problem: NVAX detects parity error on any NDAL cycle

  P%ACK_L asserted (inconsistent parity error)
    Cbox asserts C%CBOX_S_ERR_H, puts backup cache into Error Transition Mode. An invalidate or writeback request may have been missed.

  P%ACK_L not asserted (parity error) [1]
    Cbox asserts C%CBOX_S_ERR_H, puts backup cache into Error Transition Mode. An invalidate or writeback request may have been missed.

General problem: P%ACK_L not asserted for NVAX-originated command

  IREAD, DREAD (to memory or I/O)
    Cbox aborts the read in the Cbox and the Mbox [2], asserts C%CBOX_S_ERR_H.

  OREAD
    Cbox aborts the read in the Cbox and the Mbox, enters Error Transition Mode, and asserts C%CBOX_S_ERR_H. If the OREAD was done because of a write miss, the write will now be done straight to memory since the cache is in ETM.

  WRITE or WDISOWN, address cycle or data cycle (to memory or I/O)
    Cbox asserts C%CBOX_H_ERR_H, enters Error Transition Mode. Data which should have been written to memory has been lost. If the error was on the data cycle and, in the system implementation, memory marks the data bad, software may choose to ignore the hard error response since the error will be detected when and if the data is read. NVAX continues to send the WDATA cycles even if the address cycle or one of the WDATA cycles is NAKed.

General problem: Read timeout or Read Data Error before requested quadword is received

  IREAD, DREAD (memory or I/O space)
    Cbox aborts the read in the Cbox and the Mbox, asserts C%CBOX_S_ERR_H.

  OREAD
    Cbox aborts the read in the Cbox and the Mbox, asserts C%CBOX_S_ERR_H, enters Error Transition Mode. The Cbox does not set the ownership bit in the cache. If memory has set its ownership bit, there is no record of ownership for the block in the system; however, software can analyze and clean up the problem by reading the Cbox error registers. If the OREAD was done because of a write miss, the write will now be done straight to memory since the cache is in ETM.

General problem: Read timeout or Read Data Error after requested quadword successfully received

  IREAD, DREAD
    Cbox aborts the read in the Cbox and the Mbox, asserts C%CBOX_S_ERR_H, does not validate cache entry.

  OREAD for a read-modify or a read-lock
    Cbox aborts the read in the Cbox and the Mbox, asserts C%CBOX_S_ERR_H, enters Error Transition Mode. The block is not validated or marked owned in the backup cache. Depending on system implementation, the ownership bit may be set in memory. If the OREAD was for a read-modify, software can analyze and correct the potential inconsistency in ownership information by reading the Cbox error registers. If the OREAD was for a read-lock, the write-unlock will follow to memory (as a quadword disown write) after the Cbox handles the error. If the memory subsystem has set its ownership bit, this write unlock preserves consistency in ownership in the memory subsystem; if not, the write unlock location appears to be owned by memory and will be handled as an error by memory.

  OREAD for a write
    Cbox aborts the read in the Cbox, asserts C%CBOX_H_ERR_H, enters Error Transition Mode. The write was previously done into the cache when the requested quadword returned, since the Cbox merges the write data with the fill data. Since the read did not complete, the ownership bit is not set in the cache even though the new write data is in the cache. Software can recover the write data if it is non-shared data. The backup cache must be flushed of owned data using the deallocate register, then put into force hit mode. The data can then be read and written to memory. If the data is shared, writes to memory may have been done out of order by the Cbox, and system integrity is in question.

General problem: Read timeout or RDE on OREAD with pending writeback request

    A pending writeback request is entered in the FILL_CAM when a writeback request arrives for an outstanding OREAD. If the OREAD does not complete successfully for any reason, the writeback request is aborted. The Cbox has not received the entire block, so it does not claim ownership for the block. Therefore, it does not write back the block as was requested.

General problem: Unexpected fill or unexpected RDE received

    If there is no corresponding FILL_CAM entry for a returning fill or RDE, the Cbox ignores the fill data. C%CBOX_H_ERR_H is asserted. The data is not placed in the Bcache and not sent to the Mbox [3]. CEFSTS is loaded and locked; the UNEXPECTED_FILL bit is set since the fill or RDE was unexpected.

Notes to Table 3-21:

[1] In some systems, such as the NVAX XMI-2 system, commands may be sent on the NDAL purely to notify NVAX of an invalidate request; these commands are not acknowledged.

[2] The Cbox aborts the read in the Cbox by clearing the valid bit in the FILL_CAM; it aborts the read in the Mbox by asserting C%CBOX_HARD_ERR_H with the I_CF or D_CF command.

[3] It is possible to create a scenario where an unexpected fill is received and is recognized by the Cbox because there is an entry in the FILL_CAM which apparently corresponds to the fill.
For example, suppose the Cbox starts READ A. READ A times out, so the Cbox aborts it and the corresponding FILL_CAM entry is cleared. Now the Cbox starts READ B using the same ID as the aborted READ A. Now, if memory returns read data for A, it apparently corresponds to the FILL_CAM entry for READ B. The data is accepted and NVAX is unknowingly operating with incorrect data. This behavior may cascade into READ C, READ D, etc., if the Cbox always has a new read outstanding by the time some unexpected data arrives. Eventually, however, the FILL_CAM entry will be empty when read data is returned, and the Cbox will recognize the error. Before the Cbox recognizes the condition, NVAX may have been behaving very strangely, as it has probably been operating with either wrong Dstream or wrong Istream data.

Each system which uses the NVAX CPU chip will develop its own error strategy. In general, enough information should be logged so that software can understand the problem. Table 3-22 addresses system errors which the system designer should take into account.

Table 3-22: NDAL Errors and Error Responses by System Components

General problem: NDAL parity error and P%ACK_L asserted

  Node has cache
    Assert P%S_ERR_L and disable the cache. The node may have missed an invalidate.

  Node has no cache
    Assert P%S_ERR_L.

General problem: NDAL parity error and P%ACK_L not asserted

    The lack of assertion of P%ACK_L is sufficient to notify the transmitter of the cycle; that transmitter is responsible for notifying the CPU of the error. If the transmitter lost a write, it should assert P%H_ERR_L.

  Any write or WDATA
    The memory interface should not assert P%H_ERR_L because it cannot tell who sent the write. It should log the parity error. The transmitter which sent the write asserts P%H_ERR_L or takes other actions to initiate error recovery.

  WDATA for a Disown Write
    The memory should not clear its OBIT; this way, reads from other CPUs will fail until software corrects the problem.

General problem: WDISOWN to memory location which memory owns

    Response is system dependent. Memory should probably perform the write and log the error.

3.3.10.4 Error Recovery

In most cases an NDAL commander is permitted to reissue a failing transaction in order to recover from transient bus errors. Should the recovery fail (recovery may involve one or more reattempts of the failed transaction), then the commander logs a hard error. Implementation of error recovery is a system-dependent decision. This section contains guidelines on when a transaction may be retried.

• All transactions which do not receive P%ACK_L assertion for the address cycle may be retried.
• Any failing NDAL Write transaction may be retried.
• Any failing Read to memory space may be retried.
• Any failing I/O space Write transaction may be retried.
• It is unsafe to retry any I/O space Read transaction receiving a response timeout since some I/O devices may have read side effects.

The NVAX CPU will not implement retry on any NDAL transactions.

3.3.11 NDAL Initialization

When the NVAX CPU chip enters the reset state, the BIU does the following:

• Tristates P%NDAL_H<63:0>, P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0>. This occurs when internal reset is asserted, and is not qualified with any clock.
• Releases P%ACK_L. This occurs when internal reset is asserted, and is not qualified with any clock.
• Deasserts P%CPU_REQ_L, P%CPU_HOLD_L, and P%CPU_SUPPRESS_L. This occurs when internal reset is asserted, and is not qualified with any clock.
• P%CPU_GRANT_L and P%CPU_WB_ONLY_L are sampled during reset.

While NVAX is asserting P%SYS_RESET_L, the NDAL clocks are running. P%SYS_RESET_L is deasserted relative to NDAL PHI12.
During reset, some NDAL node must drive the NDAL so that it is driven with a NOP and good parity by the time P%SYS_RESET_L is deasserted. NVAX receives the NDAL during reset. The NDAL must be driven to valid levels with good parity by the time reset is deasserted, to prevent NVAX from detecting a parity error. The following is an example of how to drive the NDAL with a NOP, while putting valid parity on the bus:

• Drive P%CMD_H<3:0> low (this is the NOP command).
• Drive P%NDAL_H<63:0> low.
• Drive P%ID_H<2:0> low.
• Drive P%PARITY_H<2> low.
• Drive P%PARITY_H<1> high.
• Drive P%PARITY_H<0> low.

The NVAX CPU does not assert P%CPU_REQ_L until at least 4 NDAL cycles after P%SYS_RESET_L is deasserted. P%CPU_GRANT_L should be deasserted during system reset. NVAX will not drive the NDAL if granted during reset.

3.4 The XMI-2 NVAX System

A block diagram of the XMI-2 system is shown in Figure 3-21. Everything in the picture except memory, I/O, other CPUs, and the XMI-2 is contained on one module. The XMI-2 system is being developed by MSB and is a follow-on to the Mariah XMI-2 system.

Figure 3-21: NVAX XMI-2 System Block Diagram

[Block diagram not recoverable from the scanned original. Legible labels: Bcache tag RAMs (ECC, Tag, Valid, Owned), cache data RAMs, ROM, EEPROM, SRAM, IOPort, TOY, ROMBus, XMI-2, memory, I/O, other CPUs.]

3.4.1 Cache coherency in the XMI2 system

Commands on the XMI2 must be forwarded to the CPU in order to maintain cache coherency. Table 3-23 shows the XMI2 commands and the corresponding command which must be forwarded on the NDAL to NVAX. The actions which NVAX takes as a result of the NDAL commands are shown in Section 3.3.6.

Table 3-23: XMI2-NVAX Coherency requirements

XMI2 Command                       Resulting NDAL Command
---------------------------------  ----------------------
Read                               Dstream read
Interlock Read, Ownership Read     Ownership Read
Unlock Write, Write Masked         Write¹
Disown Write, Tag Bad Data         none

¹ WDATA cycles for the write may be omitted since the write is driven onto the NDAL for cache coherency reasons only.

Unlock Writes must be forwarded to the NDAL for the following case. Assume an I/O device does a Read Lock, Write Unlock to memory location A. Assume that the CPU wants to do a normal read to location A, and that it does not have A in its cache. Assume the following timing on the XMI:

Figure 3-22: XMI2 Unlock Write example

  Time     I/O device            CPU
    |      Interlock Read A
    |                            Read A
    v      Unlock Write A

If the CPU reads A between the Read Lock and the Write Unlock, the data the CPU caches should be invalidated after the Write Unlock. Otherwise, the CPU has stale data in its cache. This is because normal reads get data from XMI2 memory even if the location is interlocked.

When writes are forwarded from the XMI2 to the NDAL, only the write address cycle must be driven. The write data cycles may be omitted.

3.5 The Lowend NVAX System - OMEGA

A block diagram of the lowend system, called Omega, is shown in Figure 3-23.
The lowend system is being developed in Maynard, in ESB, the Entry Systems Business Group (formerly MVB).

Figure 3-23: NVAX Lowend System Block Diagram

[Block diagram not recoverable from the scanned original. Legible labels: Bcache tag RAMs (ECC, Tag, Valid, Owned), cache data RAMs, memory with O-bit, NMC (memory interface), NCA, CP-bus, Qbus.]

The Lowend System implements an ownership bit in memory which is used to indicate that the NVAX CPU owns the block in its backup cache. This bit is covered by ECC. If an I/O interface issues a read or a write to a location which is owned by the NVAX backup cache, the memory interface holds the request until the writeback completes. It then completes the original transaction. The same applies to ownership transactions which may arrive from the NCA for an owned block of memory. The NCA uses the NDAL ownership transactions in order to perform interlocked transactions.

One key problem in the Lowend System is the latency of a Qbus transaction. Once a device successfully issues a transaction on the Qbus, a timeout counter starts which will time out after 8 microseconds. This timing is difficult to meet in an NVAX system because of the writeback cache. The analysis of the problem may be found in the specs for the Omega system.

3.6 Resolved Issues

1. Issue: Should we implement Force Bad Parity on the NDAL for testing purposes, or can we get away without it?
   Solution: We are implementing a way to force bad parity on the command field of the NDAL.

2. Issue: The arbitration signals are not parity protected.
   Solution: This is not a problem because they are acknowledged by grant. The commander can always detect a problem by observing grant. If a request line is broken, the CPU will eventually time out.

3. Issue: Should the Cbox do retry on parity errors?
   Solution: No. The XMI has never seen a parity error and it is a much longer bus with big connectors which we don't have. Retry would add unnecessary complexity.

4. Issue from Supnik: Allow space for extended addressing by moving byte enable over.
   Solution: Byte enable moved over.

5. Issue: Should parity be even or odd or a combination of both?
   Solution: Use even parity across the command, even parity across the lower longword of the NDAL, and odd parity across the upper longword of the NDAL. The combination helps for package reasons: all pins can't drive the same way at once. (Steve Thierauf)

6. Issue: Should the NDAL cycle time equal 2 or 3 CPU cycles?
   Solution: It will be much easier to design to 3 cycles so we'll do this in the interest of the schedule. 3 cycles may cost us 3% performance but it is worth it for ease of design.

7. Issue: Should NVAX drive the lower three bits of address for I/O space transactions?
   Solution: Yes. It is in the critical path of I/O devices to deduce the address from the byte enable.

8. Issue: If an Unlock Write transaction is directed to a location not currently locked, should the responder perform the write operation?
   Solution: This is a system-dependent issue. Recommendation added to the Errors section.

9. Issue: Should we have an acknowledged I/O space write? This would preserve write ordering between memory writes and I/O space writes.
   Solution: Historically this problem has not been addressed so our solving it is no value added. Software can be written which avoids the problem.

10. Issue: Do the lowend systems need byte parity?
    Solution: If a system is built without a backup cache, the performance is going to be poor so doing the read-modify-write for masked writes to memory is OK. The Lowend System will need to do read-modify-writes when the cache is in Error Transition Mode, but this is very rare. As long as there is time to compute longword parity it seems sufficient. Adding byte parity would increase the number of pins on the CPU and on all NDAL interfaces by 6.

11. Issue: There was not enough time for the arbiter if HOLD was a single open-drain signal.
    Solution: Have three hold signals, one for each commander, each of which is point-to-point.

12. Issue: Is parity enable necessary? If not, we get rid of a pin.
    Solution: Parity enable is not necessary. Every planned NVAX system is able to generate parity on every NDAL cycle.

13. Issue: An additional command is under consideration. It would be called Disown Without Writeback (DISWOWB). It would be driven from the CPU to the memory interface after the CPU received a hexaword write to an owned block. DISWOWB indicates that the backup cache has given up ownership and invalidated the block, but is returning no data to memory. If a hexaword write is done in the system, memory has no use for the old data so it would be a waste of time for the CPU to return it.
    Solution: This command does not appear to be useful enough to warrant the complexity.

14. Issue: Can the CPU chip remove the internal resistors on the NDAL? If we do, some chip in the system would have to pull the bus to valid levels during reset.
    Resolution: Yes, NVAX has removed the internal resistors. Another component in every NVAX system will pull the NDAL to valid levels during reset.
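
The parity convention chosen in resolved issue 5 can be illustrated with a short sketch. The Python below is illustrative only and is not part of the NVAX design. The function names are hypothetical, and the mapping of P%PARITY_H<0> to the lower longword, P%PARITY_H<1> to the upper longword, and P%PARITY_H<2> to the command and ID fields is an assumption inferred from the reset example in Section 3.3.11, where an all-zero NOP cycle is driven with P%PARITY_H<2:0> = 010.

```python
def xor_of_bits(value: int) -> int:
    """Return 1 if the number of set bits in value is odd, else 0."""
    return bin(value).count("1") & 1


def ndal_parity(cmd: int, ident: int, data: int) -> int:
    """Return a 3-bit value modelling P%PARITY_H<2:0> for one NDAL cycle.

    Assumed coverage (not confirmed by this chapter):
      bit 0: even parity over NDAL<31:0>
      bit 1: odd parity over NDAL<63:32>
      bit 2: even parity over CMD<3:0> and ID<2:0>
    """
    lower = data & 0xFFFF_FFFF
    upper = (data >> 32) & 0xFFFF_FFFF
    # Even parity: the parity bit makes the total number of ones even.
    p0 = xor_of_bits(lower)
    # Odd parity: the parity bit makes the total number of ones odd.
    p1 = xor_of_bits(upper) ^ 1
    p2 = xor_of_bits(cmd & 0xF) ^ xor_of_bits(ident & 0x7)
    return (p2 << 2) | (p1 << 1) | p0


# All-zero NOP cycle, as driven by a system node during reset.
nop_parity = ndal_parity(cmd=0, ident=0, data=0)
print(format(nop_parity, "03b"))
```

Under these assumptions the all-zero NOP cycle gives 010, which agrees with the levels listed in the reset example (P%PARITY_H<2> low, P%PARITY_H<1> high, P%PARITY_H<0> low). Mixing even and odd parity also means that an all-zero or all-one bus never has every parity pin at the same level, which is the package-related motivation given in the issue.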
3.7 NVAX Chip Interface Signal Name Cross-Reference

All NVAX signal names and pin names referenced in this chapter have appeared in bold and reflect the actual name appearing in the NVAX schematic set. For each signal and pin appearing in this chapter, the table below lists the corresponding name which exists in the behavioral model.

Table 3-24: Cross-reference of all names appearing in the NVAX chip interface chapter

Schematic Name            Behavioral Model Name
------------------------  ------------------------
C%CBOX_H_ERR_H  C%CBOX_S_ERR_H  C%CBOX_HARD_ERR_H  ~'l'lMEO'VT_ENABLE_H
P%ACK_L                   P%ACK_L
P%ASYNC_RESET_L           P%ASYNC_RESET_L
P%CMD_H<3:0>              P%CMD_H<3:0>
P%CPU_GRANT_L             P%CPU_GRANT_L
P%CPU_HOLD_L              P%CPU_HOLD_L
P%CPU_REQ_L               P%CPU_REQ_L
P%CPU_SUPPRESS_L          P%CPU_SUPPRESS_L
P%CPU_WB_ONLY_L           P%CPU_WB_ONLY_L
P%DISABLE_OUT_L           P%DISABLE_OUT_L
P%DR_DATA_H<63:0>         P%DR_DATA_H<63:0>
P%DR_ECC_H<7:0>           P%DR_ECC_H<7:0>
P%DR_INDEX_H<20:3>        P%DR_INDEX_H<20:3>
P%DR_OE_L                 P%DR_OE_L
P%DR_WE_L                 P%DR_WE_L
P%HALT_L                  P%HALT_L
P%H_ERR_L                 P%H_ERR_L
P%ID_H<2:0>               P%ID_H<2:0>
P%INT_TIM_L               P%INT_TIM_L
P%IRQ_L<3:0>              P%IRQ_L<3:0>
P%MACHINE_CHECK_H         P%MACHINE_CHECK_H
P%NDAL_H<63:0>            P%NDAL_H<63:0>
P%OSC_H                   P%OSC_H
P%OSC_L                   P%OSC_L
P%OSC_TC1_H               P%OSC_TC1_H
P%OSC_TC2_H               P%OSC_TC2_H
P%OSC_TEST_H              P%OSC_TEST_H
P%PARITY_H<2:0>           P%PARITY_H<2:0>
P%PHI12_IN_H              P%PHI12_IN_H
P%PHI12_OUT_H             P%PHI12_OUT_H
P%PHI23_IN_H              P%PHI23_IN_H
P%PHI23_OUT_H             P%PHI23_OUT_H
P%PHI34_IN_H              P%PHI34_IN_H
P%PHI34_OUT_H             P%PHI34_OUT_H
P%PHI41_IN_H              P%PHI41_IN_H
P%PHI41_OUT_H             P%PHI41_OUT_H
P%PP_CMD_H<2:0>           P%PP_CMD_H<2:0>
P%PP_DATA_H<11:0>         P%PP_DATA_H<11:0>
P%PWRFL_L                 P%PWRFL_L
P%SYS_RESET_L             P%SYS_RESET_L
P%S_ERR_L                 P%S_ERR_L
P%TCK_H                   P%TCK_H
P%TDI_H                   P%TDI_H
P%TDO_H                   P%TDO_H
P%TEMP_H                  P%TEMP_H
P%TEST_DATA_H             P%TEST_DATA_H
P%TEST_STROBE_H           P%TEST_STROBE_H
P%TMS_H                   P%TMS_H
P%TS_ECC_H<5:0>           P%TS_ECC_H<5:0>
P%TS_INDEX_H<20:5>        P%TS_INDEX_H<20:5>
P%TS_OE_L                 P%TS_OE_L
P%TS_OWNED_H              P%TS_OWNED_H
P%TS_TAG_H<31:17>         P%TS_TAG_H<31:17>
P%TS_VALID_H              P%TS_VALID_H
P%TS_WE_L                 P%TS_WE_L

3.8 Revision History

Table 3-25: Revision History

Who            When         Description of change
-------------  -----------  ---------------------
Rebecca Stamm  20-Feb-1991  Update after NVAX first pass. Clarified ACK timing. Added signal name cross-reference. Added NDAL timing AC spec. Corrected Byte Enable table. Updated Bcache pin timing. J5T1MEOt1CENABLEJI clocks the Cbox timeout counter, not l5"l'Dr1EOVT_BABEJI. Added P% prefix to all pin names.
Rebecca Stamm  7-Nov-1990   PP_DATA are output only. Clarify NACK'd write handling. Correction: NVAX DOES receive the NDAL I/O signals during power-up.
Rebecca Stamm  4-Jul-1990   Update initialization description. Assert Herr on unexpected fill. Update NDAL pin timing. NVAX may drive NOPs under WB_ONLY. P%ID_H<0> not driven with same value during command and data cycles of a write. Close force_bad_parity issue.
Rebecca Stamm  17-May-1990  Take out vector pins, add two new test pins, update description of unexpected fill handling by setting CEFSTS<UNEXPECTED_FILL>.
Rebecca Stamm  20-Feb-1990  Add unexpected RDE handling. Clarified byte enables and octaword-length transactions. Corrected running total for NVAX pins. Add detailed timeout description. Added timeout functionality to P%OSC_TC1_H.
Rebecca Stamm  3-Feb-1990   External release. Updates from internal review. Address<2:0> is sent out as zeros for the second half of an unaligned I/O space reference. NVAX does not implement internal resistors to pull the NDAL to valid levels during reset; a system device must drive the bus during reset.
Rebecca Stamm  30-Jan-1990  Reorganized chapter. Clarified byte enable section. NVAX issues identical data on both halves of the bus during I/O space writes. Released for internal review.
Rebecca Stamm  01-Dec-1989  Revision 1.0 release. Clarified byte enable table. Added error handling for unexpected fills. Added error handling for requested writebacks whose OREADs do not complete.
Rebecca Stamm  06-Mar-1989  Release for external review.
Rebecca Stamm  24-Oct-1989  Several NVAX pins were added, deleted, or changed in either name or functionality. The terminology byte mask is changed to byte enable. IO1_WB_ONLY, IO2_WB_ONLY, IO1_SUPPRESS, and IO2_SUPPRESS were added, and NDAL arbitration was changed, giving the arbiter responsibility for asserting the appropriate WB_ONLY lines when a SUPPRESS line is asserted. Addition of BADWDATA command. New command encodings. Elimination of Read Lock and Write Unlock commands on the NDAL. Add better explanation of Clear Write Buffer. Update error section. Remove PARITY_ENABLE_L pin. Removed Qbus latency problem description. Assigned an ID to the memory interface. Read data may be returned in any order: NVAX does not require the requested quadword first, although it is a performance advantage to return the requested quadword first.

Chapter 4 Chip Overview

4.1 NVAX CPU Chip Box and Section Overview

The NVAX CPU Chip is a single-chip CMOS-4 macropipelined implementation of the base instruction group, and the optional vector instruction group of the VAX architecture.
Included in the chip are:

• CPU: Instruction fetch and decode, microsequencer, and execution unit
• Control Store: 1600, 61-bit microwords
• Primary Cache: 8 KB, 2-way set associative, physically-addressed, write through, mixed instruction and data stream
• Instruction Cache: 2 KB, direct-mapped, virtually addressed, instruction stream only
• Translation Buffer: 96 entries, fully associative
• Floating Point: 4 stage, pipelined, integrated floating point unit with selective stage 4 bypass
• Backup Cache Interface: Support for four cache sizes (2 MB, 512 KB, 256 KB, 128 KB), two tag RAM speeds and three data RAM speeds
• NDAL Interface: Memory subsystem interface. Supports an ownership coherence protocol on the Backup Cache

The NVAX chip is designed in CMOS-4 with a typical cycle time of 14 ns, and with the option of running chips at a slower or faster cycle time. The chip can be incorporated into many different system environments, ranging from the desktop to the midrange, and from single processor to multiprocessor systems.

The NVAX is a macropipelined design: it pipelines macroinstruction decode and operand fetch with macroinstruction execution. Pipeline efficiency is increased by queuing up instruction information and operand values for later use by the execution unit. Thus, when the macropipeline is running smoothly, the Ibox (instruction parser/operand fetcher) is running several macroinstructions ahead of the Ebox (execution unit). Outstanding writes to registers or memory locations are kept in a scoreboard to ensure that data is not read before it has been written. See Chapter 5 for a more in-depth discussion of the macropipeline.

This chapter gives an overview of the different sections, or "boxes", that comprise the NVAX CPU. For more information on any of the boxes, please see the appropriate chapters within this specification.
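
The cache sizes listed above translate into block and set counts by simple arithmetic. The sketch below is illustrative only: the helper name and its return format are hypothetical, and it assumes the 32-byte block size that this chapter gives for the Pcache, the VIC, and the Bcache in the sections that follow.

```python
def cache_geometry(size_bytes: int, ways: int, block_bytes: int = 32) -> dict:
    """Derive block count, set count, and address field widths for a cache.

    Assumes size, associativity, and block size are all powers of two.
    """
    blocks = size_bytes // block_bytes
    sets = blocks // ways
    return {
        "blocks": blocks,
        "sets": sets,
        "offset_bits": block_bytes.bit_length() - 1,  # byte-within-block bits
        "index_bits": sets.bit_length() - 1,          # set-select bits
    }


KB = 1024
pcache = cache_geometry(8 * KB, ways=2)        # Primary Cache
vic = cache_geometry(2 * KB, ways=1)           # Instruction Cache
bcache_max = cache_geometry(2048 * KB, ways=1) # largest Backup Cache
bcache_min = cache_geometry(128 * KB, ways=1)  # smallest Backup Cache

for name, geom in [("Pcache", pcache), ("VIC", vic),
                   ("Bcache 2 MB", bcache_max), ("Bcache 128 KB", bcache_min)]:
    print(name, geom)
```

With these assumptions the 2 MB Bcache needs 16 index bits above a 5-bit block offset, that is, address bits <20:5>, which is consistent with the width of the P%TS_INDEX_H<20:5> pins listed in Table 3-24.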
Figure 4-1 is a block diagram of the boxes, and the major buses that run between them.

Figure 4-1: NVAX CPU Block Diagram

[Block diagram not recoverable from the scanned original. Legible labels: IBOX, EBOX, FBOX, CBOX, tag and data RAMs, NDAL.]

4.1.1 The Ibox

The Ibox decodes VAX instructions and parses operand specifiers. Instruction control, such as the control store dispatch address, is then placed in the instruction queue for later use by the Microsequencer and Ebox. The Ibox processes the operand specifiers at a rate of one specifier per cycle and, as necessary, initiates specifier memory read operations. All the information needed to access the specifiers is queued in the source queue and destination queue in the Ebox.

The Ibox prefetches instruction stream data into the prefetch queue (PFQ), which can hold 16 bytes. The Ibox has a dedicated instruction-stream-only cache, called the virtual instruction cache (VIC). The VIC is a 2 KB, direct-mapped cache, with a block and fill size of 32 bytes.

The Ibox has both read and write ports to the GPR and MD portions of the Ebox register file which are used to process the operand specifiers. The Ibox maintains a scoreboard to ensure that reads and writes to the register file are always performed in synchronization with the Ebox. The Ibox stops processing instructions and operands upon issuing certain complex instructions (for example, CALL, RET, and character string instructions). This is done to maintain read/write ordering when the Ebox will be altering large amounts of VAX state.

Since the Ibox is often parsing several macroinstructions ahead of the Ebox, the correct value for the PSL condition codes is not known at the time the Ibox executes a conditional branch instruction.
Rather than emptying the pipe, the Ibox predicts which direction the branch will take, and passes this information on to the Ebox via the branch queue. The Ebox later signals if there was a misprediction, and the hardware backs out of the path. The branch prediction algorithm utilizes a 512-entry RAM, which caches four bits of branch history per entry.

4.1.2 The Ebox and Microsequencer

The Ebox and Microsequencer work together to perform the actual "work" of the VAX instructions. Together they implement a four stage micropipelined unit, which has the ability to stall and to microtrap. The Ebox and Microsequencer dequeue instruction and operand information provided by the Ibox via the instruction queue, the source queue, and the destination queue. For literal type operands, the source queue contains the actual operand value. In the case of register, memory, and immediate type operands, the source queue holds a pointer to the data in the Ebox register file. The contents of memory operands are provided by the Mbox based on earlier requests from the Ibox. GPR results are written directly back to the register file. Memory results are sent to the Mbox, where the data will be matched with the appropriate specifier address previously sent by the Ibox. At times, the Ebox initiates its own memory reads and writes using E%VA_BUS_L and E%WBUS_H.

The Microsequencer determines the next microword to be fetched from the control store. It then provides this cycle-by-cycle control to the Ebox. The Microsequencer allows for eight-way microbranches, and for microsubroutines to a depth of six.

The Ebox contains a five-port register file, which holds the VAX GPRs, six Memory Data Registers (MDs), six microcode working registers, and ten miscellaneous CPU state registers. It also contains an ALU, a shifter, and the VAX PSL. The Ebox uses the RMUX, controlled by the retire queue, to order the completion of Ebox and Fbox instructions.
As the Ebox and the Fbox are distinct hardware resources, there is some amount of execution overlap allowed between the two units.

The Ebox implements specialized hardware features in order to speed the execution of certain VAX instructions: the population counter (CALLx, PUSHR, POPR), and the mask processing unit (CALLx, RET, FFx, PUSHR, POPR). The Ebox also has logic to gather hardware and software interrupt requests, and to notify the Microsequencer of pending interrupts.

4.1.3 The Fbox

The Fbox implements a four stage pipelined execution unit with selective stage 4 bypass for the floating point and integer multiply instructions. Operands are supplied by the Ebox up to 64 bits per cycle on E%ABUS_H and E%BBUS_H. Results are returned to the Ebox 32 bits per cycle on F%FBOX_RESULT_H. The Ebox is responsible for storing the Fbox result in memory or the GPRs.

4.1.4 The Mbox

The Mbox receives read requests from the Ibox (both instruction stream and data stream) and from the Ebox (data stream only). It receives write/store requests from the Ebox. Also, the Cbox sends the Mbox fill data and invalidates for the Pcache. The Mbox arbitrates between these requesters, and queues requests which cannot currently be handled. Once a request is started, the Mbox performs address translation and cache lookup in two cycles, assuming there are no misses or other delays. The two-cycle Mbox operation is pipelined.

The Mbox uses the translation buffer (96 fully associative entries) to map virtual to physical addresses. In the case of a TB miss, the memory management hardware in the Mbox will read the page table entry and fill the TB. The Mbox is also responsible for all access checks, TNV checks, M-bit checks, and quadword unaligned data processing.

The Mbox houses the Primary Cache (Pcache).
The Pcache is 8 KB, 2-way set associative and writethrough, with a block and fill size of 32 bytes. The Pcache state is maintained as a subset of the Backup Cache.

The Mbox ensures that Ibox specifier reads are ordered correctly with respect to Ebox specifier stores. This memory "scoreboarding" is accomplished by using the PA queue, a small list of physical addresses which have a pending Ebox store.

4.1.5 The Cbox

The Cbox is the controller for the second level cache (the Backup Cache, or Bcache). Both the tags and data for the Bcache are stored in off-chip RAMs. The size and access time of the Bcache RAMs can be configured as needed by different system environments. The Bcache sizes supported are 2 MB, 512 KB, 256 KB, and 128 KB. In addition, a system with no Bcache RAMs is supported, although significant performance degradation occurs without a Bcache. The Bcache is a direct mapped writeback cache with block and fill sizes of 32 bytes.

The Cbox packs sequential writes to the same quadword in order to minimize Bcache write accesses. Multiple write commands are held in the eight-entry WRITE_QUEUE.

The Cbox is also the interface to the NDAL, which is the NVAX connection to the memory subsystem. The NDAL_IN_QUEUE loads fill data and writeback requests from the NDAL to the CPU. The NON_WRITEBACK_QUEUE and WRITEBACK_QUEUE hold read requests and writeback data to be sent to the memory subsystem over the NDAL.

4.1.6 Major Internal Buses

This is a list of the major interbox buses:

• B%S6_DATA_H: This bidirectional bus between the Cbox and Mbox is used to transfer write data to the backup cache, and to transfer fill data to the primary cache.
• C%CBOX_ADDR_H: This bus is used to transfer the physical address of a Pcache invalidate from the Cbox to the Mbox.
• E%ABUS_H, E%BBUS_H: These two 32-bit buses contain the A- and B-port operands for the Ebox, and are also used to transfer operand data to the Fbox.
• E%IBOX_IA_BUS_L: This bus is used by the Ibox to read the Ebox Register File in order to perform an operand access. An example is to read a register's contents for a register deferred type specifier.
• E%DQ_RETIRE_H, E%DQ_RETIRE_RMODE_H, E%DQ_RETIRE_RN_H: This collection of related buses transfers information from the Ebox to the Ibox when a destination queue entry is retired.
• E%SQ_RETIRE_H, E%SQ_RETIRE_MD_H, E%SQ_RETIRE_RMODE_H, E%SQ_RETIRE_RN1_H, E%SQ_RETIRE_RN2_H: This collection of related buses transfers information from the Ebox to the Ibox when a source queue entry is retired.
• E%VA_BUS_L: This bus transfers an address from the Ebox to the Mbox.
• E%WBUS_H: This 32-bit bus transfers write data from the RMUX to the register file and the Mbox.
• E_USQ%MIB_H: This bus carries Control Store data from the Microsequencer to the Ebox.
• E_BUS%UTEST_L: This 3-bit bus transfers microbranch conditions from the Ebox to the Microsequencer.
• F%FBOX_RESULT_H: This bus is used to transfer results from the Fbox to the Ebox.
• I%IBOX_ADDR_H: This bus transmits the virtual address of an Ibox memory reference to the Mbox. The address may be for instruction prefetch or an operand access.
• I%IQ_BUS_H: This bus carries instruction information from the Ibox to the Instruction Queue in the Microsequencer.
• I%IBOX_IW_BUS_H: This bus is used by the Ibox to write the Ebox Register File for autoincrement/decrement type specifiers and to deliver immediate operands to the Register File.
• I%OPERAND_BUS_H: This bus transfers information from the Ibox to the source and destination queues in the Ebox.
• M%MD_BUS_H: This bus returns right-justified memory read data from the Mbox to either the Ibox (64 bits) or the Ebox (32 bits).
• M%S6_PA_H: This bus transfers the address for a backup cache reference from the Mbox to the Cbox.
• NDAL: The NDAL are bidirectional off-chip multiplexed address and data lines used by the Cbox to communicate with the memory subsystem. The NDAL carries fill data and writeback requests to the CPU, and writeback data and read requests from the CPU to memory.

4.2 Revision History

Table 4-1: Revision History

Who             When         Description of change
--------------  -----------  -------------------------------
Debra Bernstein 06-Mar-1989  Release for external review.
Mike Uhler      18-Dec-1989  Update for second-pass release.
Mike Uhler      04-Dec-1990  Update after pass 1 PG.

Chapter 5 Macroinstruction and Microinstruction Pipelines

5.1 Introduction

This chapter discusses the architecture of the NVAX CPU macroinstruction and microinstruction pipeline. It includes a section of general pipeline fundamentals to set the stage for the specific NVAX CPU implementation of the pipeline. This is followed by an overview of the NVAX CPU pipeline, an examination of macroinstruction execution, and a discussion of stall and exception handling from the viewpoint of the Ebox.

5.2 Pipeline Fundamentals

This section discusses the fundamentals of instruction pipelining in a general manner that is independent of the NVAX CPU implementation. It is intended as a primer for those readers who do not understand the concept and implications of instruction pipelining. Readers familiar with this material are encouraged to skip (or at most skim) this section.

5.2.1 The Concept of a Pipeline

The execution of a VAX macroinstruction involves a sequence of steps which are carried out in order to complete the macroinstruction operation. Among these steps are: instruction fetch, instruction decode, specifier evaluation and operand fetch, instruction execution, and result store. On the simplest machines, these steps are carried out sequentially, with no overlap of the steps, as shown in Figure 5-1.
Figure 5-1: Non-Pipelined Instruction Execution

               --------------- Time --------------->
               +--------------------+
Instruction 1  |S0|S1|S2|S3|S4|S5|S6|
               +--------------------+
                                    +--------------------+
Instruction 2                       |S0|S1|S2|S3|S4|S5|S6|
                                    +--------------------+
                                                         +--------------------+
Instruction 3                                            |S0|S1|S2|S3|S4|S5|S6|
                                                         +--------------------+

In this diagram, "S0", "S1", ..., "S6" denote particular steps in the execution of an instruction. For this simple scheme, all of the steps for one instruction are performed, and the instruction is completed, before any of the steps for the next instruction are started.

In more complex machines, one or more steps of the execution process are carried out in parallel with other steps. For example, consider Figure 5-2.

Figure 5-2: Partially-Pipelined Instruction Execution

               --------------- Time --------------->
               +--------------------+
Instruction 1  |S0|S1|S2|S3|S4|S5|S6|
               +--------------------+
                                 +--------------------+
Instruction 2                    |S0|S1|S2|S3|S4|S5|S6|
                                 +--------------------+
                                                   +--------------------+
Instruction 3                                      |S0|S1|S2|S3|S4|S5|S6|
                                                   +--------------------+

In this example, step S6 of each instruction is overlapped in time (or executed in parallel) with step S0 of the next instruction. In doing so, the number of instructions executed per unit time (instruction throughput) goes up because an instruction appears to take less time to complete.

In the most complex machines, most (or all) of the steps are executed in parallel as indicated in Figure 5-3.
Figure 5-3: Fully-Pipelined Instruction Execution

               ---------------> Time
               +--------------------+
Instruction 1  |S0|S1|S2|S3|S4|S5|S6|
               +--------------------+
                  +--------------------+
Instruction 2     |S0|S1|S2|S3|S4|S5|S6|
                  +--------------------+
                     +--------------------+
Instruction 3        |S0|S1|S2|S3|S4|S5|S6|
                     +--------------------+
                        +--------------------+
Instruction 4           |S0|S1|S2|S3|S4|S5|S6|
                        +--------------------+
                           +--------------------+
Instruction 5              |S0|S1|S2|S3|S4|S5|S6|
                           +--------------------+

In this example every step of instruction execution is performed in parallel with every other step. This means that a new instruction is started as soon as step S0 is completed for the previous instruction. If each step, S0..S6, took the same amount of time, the apparent instruction throughput would be seven times greater than that of Figure 5-1 above, even though each instruction takes the same amount of time to execute in both cases.

Figures 5-2 and 5-3 are examples of the concept of instruction pipelining, in which one or more steps necessary to execute an instruction are performed in parallel with steps for other instructions.

5.2.2 Pipeline Flow

A real-world form of a pipeline is an automobile assembly line. At each station of the assembly line (called segments of the pipeline in our case), a task is performed on the partially completed automobile and the result is passed on to the next station. At the end of the assembly line, the automobile is complete. In an instruction pipeline, as in an assembly line, each segment is responsible for performing a task and passing the completed result to the next segment. The exact task to be performed in each pipeline segment is a function of the degree of pipelining implemented and the complexity of the instruction set.
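As a rough check on the throughput comparison among Figures 5-1, 5-2, and 5-3, the following Python sketch counts the step times needed to complete a run of instructions for a given amount of overlap. It is illustrative only and is not part of the NVAX design: the function name total_time and its parameters are hypothetical, and the model assumes that every step takes exactly one time unit and that no stalls occur.

```python
def total_time(num_instructions: int, steps: int = 7, overlap: int = 0) -> int:
    """Time units needed to complete num_instructions.

    Hypothetical model: each instruction takes `steps` equal-length steps,
    and the last `overlap` steps of one instruction run in parallel with
    the first `overlap` steps of the next instruction.
      overlap = 0          corresponds to Figure 5-1 (non-pipelined)
      overlap = 1          corresponds to Figure 5-2 (S6 overlaps S0)
      overlap = steps - 1  corresponds to Figure 5-3 (fully pipelined)
    """
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    if not 0 <= overlap < steps:
        raise ValueError("overlap must be in the range 0..steps-1")
    # A new instruction can start every (steps - overlap) time units.
    start_interval = steps - overlap
    return steps + (num_instructions - 1) * start_interval


def speedup(num_instructions: int, steps: int = 7) -> float:
    """Throughput of the fully pipelined case relative to the non-pipelined case."""
    slow = total_time(num_instructions, steps, overlap=0)
    fast = total_time(num_instructions, steps, overlap=steps - 1)
    return slow / fast


if __name__ == "__main__":
    for n in (1, 5, 100, 10000):
        print(n, total_time(n), total_time(n, overlap=1),
              total_time(n, overlap=6), round(speedup(n), 3))
```

Under these assumptions the ratio approaches seven only for long instruction runs. For a short run it is lower, because the first instruction still needs all seven steps before anything completes.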
One attribute of an automobile assembly line is equally important to an instruction pipeline: smooth and continuous flow. An automobile assembly line works well because the tasks to be performed at each station take about the same amount of time. This keeps the line moving at a constant pace, with no starts and stops which would reduce the number of completed automobiles per unit time.

An analogous situation exists in an instruction pipeline. In order to achieve real efficiency in an instruction pipeline, information must flow smoothly and continuously from the start of the pipeline to the end. If a pipeline segment somewhere in the middle is not able to supply results to the next segment of the pipeline, the entire pipeline after the offending segment must stop, or stall, until the segment can supply a result. In the general case, a pipeline stall results when a pipeline segment can not supply a result to the next segment, or when it can not accept a new result from a previous segment.

This is a fundamental problem with most instruction pipelines because they occasionally (or not so occasionally) stall. Stalls result in decreased instruction throughput because the smooth flow of the pipeline is broken.

A typical example of a pipeline stall involves memory reads. A simple three-segment pipeline might fetch operands in segment 1, use the operands to compute results in segment 2, and make memory references or store results in segment 3, as shown in Figure 5-4.

Figure 5-4: Simple Three-Segment Pipeline

+-----------+  +-----------+  +-----------+
|  Operand  |->|Computation|->|  Memory   |
|  Access   |  |           |  |   Read    |
+-----------+  +-----------+  +-----------+

Figure 5-5 illustrates what happens when the pipeline control wants to use the result of the memory read as an operand.
Figure 5-5: Information Flow Against the Pipeline

[Diagram not recoverable from the scanned source. It shows instruction I1 passing through Operand Access, Computation, and Memory Read, with the output of the Memory Read segment fed back to the Operand Access segment of instruction I2, which passes through Operand Access, Computation, and Result Store.]

In this case, the operand access segment of I2 can not supply an operand to the computation segment because the memory read done by I1 has not yet completed. As a result, the pipeline must stall until the memory read has completed. This is shown in Figure 5-6.

Figure 5-6: Stalls Introduced by Backward Pipeline Flow

[Diagram not recoverable from the scanned source. It shows I1 passing through Operand Access, Computation, and Memory Read, followed by two rows in which all three segments of I2 are marked Stall, and a final row in which I2 passes through Operand Access, Computation, and Result Store using the data fed back from the Memory Read segment of I1.]

In this diagram, the memory read data from I1 is not available until the read request passes through segment 3 of the pipeline. But the operand access segment for I2 wants the data immediately. The result is that the operand access segment of I2 has to stall twice waiting for the memory read data to become available. This, in turn, stalls the rest of the pipeline segments after the operand access segment.
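The stall count in Figure 5-6 can be reproduced with a small model of the three-segment pipeline. The sketch below is illustrative only: the names Instr and schedule are hypothetical, each segment is assumed to take one cycle, instructions enter the pipeline in order, and read data is assumed to become usable in the cycle after the read passes through the last segment (there is no bypass path).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    name: str
    reads_memory: bool = False             # performs a memory read in the last segment
    needs_result_of: Optional[str] = None  # earlier memory-reading instruction whose
                                           # read data this instruction uses as an operand


def schedule(program: List[Instr], depth: int = 3) -> Dict[str, Tuple[int, int]]:
    """Return {name: (cycle in which operand access completes, stall cycles)}.

    Hypothetical model of Figure 5-4: segment 1 is operand access and
    segment `depth` is the memory read. Cycles are numbered from 1.
    """
    data_ready: Dict[str, int] = {}  # name -> first cycle in which read data is usable
    result: Dict[str, Tuple[int, int]] = {}
    next_free = 1                    # earliest cycle in which segment 1 is free
    for ins in program:
        earliest = next_free
        start = earliest
        if ins.needs_result_of is not None:
            # Operand access must wait for the producer's memory read data.
            start = max(start, data_ready[ins.needs_result_of])
        result[ins.name] = (start, start - earliest)
        if ins.reads_memory:
            # The read occupies segment `depth` in cycle start + depth - 1,
            # so its data is usable one cycle later.
            data_ready[ins.name] = start + depth
        next_free = start + 1
    return result


if __name__ == "__main__":
    prog = [Instr("I1", reads_memory=True), Instr("I2", needs_result_of="I1")]
    for name, (cycle, stalls) in schedule(prog).items():
        print(name, "operand access in cycle", cycle, "after", stalls, "stall cycle(s)")
```

With I1 reading memory and I2 consuming the read data, the model reports two stall cycles for I2, in agreement with the text. Removing the dependency lets I2 follow I1 with no stall.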
This situation is an excellent example of an age-old problem with instruction pipelining. The natural and desired direction of information flow in a pipeline is from left to right in the above diagrams. In this case, information must flow from the output of the memory read segment into the operand access segment. This requires a right-to-left movement of information from a later pipeline segment to an earlier one. In general, any information transfer which goes against the normal flow of the pipeline has the potential for causing pipeline stalls.

5.2.3 Stalls and Exceptions in an Instruction Pipeline

Even the best pipeline design must be prepared to deal with stalls and exceptions created in the pipeline. As mentioned above, a stall is a condition in which a pipeline segment can not accept a new result from a previous segment, or can not send a result to the next segment. An exception occurs when a pipeline segment detects an abnormal condition which must stop, and then drain, the pipeline. Examples of exceptions are: memory management faults, reserved operand faults, and arithmetic overflows.

One of the inherent costs of a pipelined implementation is the extra logic necessary to deal with stalls and exceptions. There are two primary considerations concerning stalls: what action to take when one occurs, and how to minimize them in the first place.

The design of most instruction pipelines assumes that the pipeline will not stall, and handles the stall condition as a special case, rather than the other way around. This means that each segment of the pipeline performs its function and produces a result each cycle. If a stall occurs just before the end of the cycle, the segment must block global state updates and repeat the same operation during the next cycle. The design of the pipeline control must take this into account and be prepared to handle the condition.
A common stall condition occurs when each pipeline segment has the same average speed, but different peak speeds. For example, a pipeline segment whose task is to perform both memory references and register result stores may take longer to perform memory references than result stores. This can cause earlier segments of the pipeline to stall because the segment can not take new inputs as fast if it is doing a memory reference rather than a result store. A common technique to minimize this problem is to place buffers between pipeline segments, as shown in Figure 5-7.

Figure 5-7: Buffers Between Pipeline Segments

+-----------+  +------+  +-----------+  +------+  +-----------+
|  Operand  |->|Buffer|->|Computation|->|Buffer|->|  Memory   |
|  Access   |  |      |  |           |  |      |  |   Read    |
+-----------+  +------+  +-----------+  +------+  +-----------+

By placing a buffer of sufficient depth between each segment of the pipeline, segments of differing peak speeds can avoid stalls caused if the next segment is unable to accept a new result. Instead, the result goes into the inter-segment buffer and the next segment removes it from the buffer when it needs it. Unfortunately, adding such buffers means that additional logic must also be added to handle the buffer full/buffer empty conditions.

The performance advantage of an instruction pipeline comes from the parallelism built into the pipeline. If the parallelism is defeated by, for example, a stall, the advantage starts to drop. One problem associated with pipelines is that they can provide "lumpy" performance. That is, two similar programs may experience radically different performance if one causes many more stalls (which defeat the parallelism of the pipeline) than the other.

Pipeline exceptions are different from stalls in that exceptions cause the pipeline to empty or drain.
Usually, everything that entered the pipeline before the point of error is allowed to complete. Everything that entered the pipeline after the point of error is prevented from completing. This can add considerable complexity to the pipeline control.

A larger problem occurs when the designer wants exceptions to be recoverable. Consider an exception caused by a memory management fault. On the VAX, this condition can occur because of a TB miss. The correct response to this fault is to read a PTE from memory, refill the TB, and restart the request that caused the fault. This can add considerable complexity to the design.

5.3 NVAX CPU Pipeline Overview

The remainder of this chapter discusses the NVAX CPU pipeline, which is shown as a block diagram in Figure 5-8. This is a high-level view of the CPU and abstracts many of the details. For a more detailed view of the pipeline, users are encouraged to refer to the individual box chapters in this specification.

The pipeline is divided into seven segments denoted as "S0" through "S6". In Figure 5-8, the components of each section of the CPU are shown in the segment of the pipeline in which they operate.

The NVAX CPU is fully pipelined and, as such, is most similar to the abstract example shown in Figure 5-3. In addition to the overall macroinstruction pipeline, in which multiple macroinstructions are processed in the various segments of the pipeline, most of the sections also micropipeline operations. That is, if more than one operation is required to process a macroinstruction, the multiple operations are also pipelined within a section.

[Figure 5-8: block diagram of the NVAX CPU pipeline, segments S0 through S6. The diagram is not recoverable from the scanned source.]

5.3.1 Normal Macroinstruction Execution

Execution of macroinstructions in the NVAX pipeline is decomposed into many smaller steps which are the distributed responsibility of the various sections of the chip. Because the NVAX CPU implements a macroinstruction pipeline, each section is relatively autonomous, with queues inserted between the sections to normalize the processing rates of each section.

5.3.1.1 The Ibox

The Ibox is responsible for fetching instruction stream data for the next instruction, decomposing the data into opcode and specifiers, and evaluating the specifiers with the goal of prefetching operands to support Ebox execution of the instruction.

The Ibox is distributed across segments S0 through S3 of the pipeline, with most of the work being done in S1. In S0, instruction stream data is fetched from the virtual instruction cache (VIC) using the address contained in the virtual instruction buffer address register (VIBA). The data is written into the prefetch queue (PFQ) and VIBA is incremented to the next location.

In segment S1, the PFQ is read and the burst unit uses internal state and the contents of the IROM to select the next instruction stream component, either an opcode or a specifier. This decoding process is known as bursting. Some instruction components take multiple cycles to burst. For example, FD opcodes require two burst cycles: one for the FD byte, and one for the second opcode byte. Similarly, indexed specifiers require at least two burst cycles: one for the index byte, and one or more for the base specifier.

When an opcode is decoded, the information is passed to the issue unit, which consults the IROM for the initial Ebox control store address of the routine which will process the instruction.
The issue unit sends the address and other instruction-related information to the instruction queue where it is held until the Ebox reaches the instruction.

When a specifier is decoded, the information is passed to the source and destination queue allocation logic and, potentially, to the complex specifier pipeline. The source and destination queue allocation logic allocates the appropriate number of entries for the specifier in the source and destination queues in the Ebox. These queues contain pointers to operands and results, and are discussed in more detail below.

If the specifier is not a short literal or register specifier, which are collectively known as simple specifiers, it is considered to be a complex specifier and is processed by the small microcode-controlled complex specifier unit (CSU), which is distributed in segments S1 (control store access), S2 (operand access, including register file read), and S3 (ALU operation, Mbox request, GPR write) of the pipeline. The CSU pipeline computes all specifier memory addresses, and makes the appropriate request to the Mbox for the specifier type. To avoid reading or writing a GPR which is interlocked by a pending Ebox reference, the CSU pipeline includes a register scoreboard which detects data dependencies.

The CSU pipeline also provides additional help to the Ebox by supplying operand information that is not an explicit part of the instruction stream. For example, the PC is supplied as an implicit operand for instructions that require it (such as BSBB).

The branch prediction unit (BPU) watches each opcode that is decoded looking for conditional and unconditional branches. For unconditional branches, the BPU calculates the target PC and redirects PC and VIBA to the new path. For conditional branches, the BPU predicts whether the instruction will branch or not based on previous history. If the prediction indicates that the branch will be taken, PC and VIBA are redirected to the new path.
The BPU writes the conditional branch prediction flag into the branch queue in the Ebox, to be used by the Ebox in the execution of the instruction. The BPU maintains enough state to restore the correct instruction PC if the prediction turns out to be incorrect.

5.3.1.2 The Microsequencer

The microsequencer operates in segment S2 of the pipeline and is responsible for supplying to the Ebox the next microinstruction to execute. If a macroinstruction requires the execution of more than one microinstruction, the microsequencer supplies each microinstruction in sequence based on directives included in the previous microinstruction.

At macroinstruction boundaries, the microsequencer removes the next entry from the instruction queue, which includes the initial microinstruction address for the macroinstruction. If the instruction queue is empty, the microsequencer supplies the address of a special no-op microinstruction.

The microsequencer is also responsible for evaluating all exception requests, and for providing a pipeline flush control signal to the Ebox. For certain exceptions and interrupts, the microsequencer injects the address of a special microinstruction handler that is used to respond to the event.

5.3.1.3 The Ebox

The Ebox is responsible for executing all of the non-floating point instructions, for delivery of operands to and receipt of results from the Fbox, and for handling non-instruction events such as interrupts and exceptions. The Ebox is distributed through segments S3 (operand access, including register file read), S4 (ALU and shifter operation, RMUX request), and S5 (RMUX completion, register write, completion of Mbox request) of the pipeline.

For the most part, instruction operands are prefetched by the Ibox, and addressed indirectly through the source queue.
The source queue contains the operand itself for short literal specifiers, and a pointer to an entry in the register file for other operand types.

An entry in the field queue is made when a field-type specifier entry is made into the source queue. The field queue provides microbranch conditions that allow the Ebox microcode to determine if a field-type specifier addresses either a GPR or memory. A microbranch on a valid field queue entry retires the entry from the queue.

The register file is divided into four parts: the GPRs, memory data (MD) registers, working registers, and CPU state registers. For register-mode specifiers, the source queue points to the appropriate GPR in the register file. For other non-short literal specifier modes, the source queue points to an MD register. The MD register is either written directly by the Ibox, or by the Mbox as the result of a memory read generated by the Ibox.

The S3 segment of the Ebox pipeline is responsible for selecting the appropriate operands for the Ebox and Fbox execution of instructions. Operands are selected onto E%ABUS_H and E%BBUS_H for use in both the Ebox and Fbox. In most instances, these operands come from the register file, although there are other data path sources of non-instruction operands (such as the PSL).

Ebox computation is done by the ALU and the shifter in the S4 segment of the pipeline on operands supplied by the S3 segment. Control for these units is supplied by the microinstruction which was originally supplied to the S3 segment by the microsequencer, and then subsequently moved forward in the pipeline.

The S4 segment also contains the RMUX, whose responsibility is to select results from either the Ebox or Fbox and perform the appropriate register or memory operation.
The RMUX inputs come from the ALU, shifter, and P%FBOX_RESULT_H at the end of the cycle. The RMUX actually spans the S4/S5 boundary such that its outputs are valid at the beginning of the S5 segment. The RMUX is controlled by the retire queue, which specifies the source (either Ebox or Fbox) of the result to be processed (or retired) next. Non-selected RMUX sources are delayed until the retire queue indicates that they should be processed.

As the source queue points to instruction operands, so the destination queue points to the destination for instruction results. If the result is to be stored in a GPR, the destination queue contains a pointer to the appropriate GPR. If the result is to be stored in memory, the destination queue indicates that a request is to be made to the Mbox, which contains the physical address of the result in the PA queue (which is described below). This information is supplied as a control input to the RMUX logic.

Once the RMUX selects the appropriate source of result information, it either requests Mbox service, or sends the result onto E%WBUS_H to be written back to the register file or to other data path registers in the S5 segment of the pipeline. The interface between the Ebox and Mbox for all memory requests is the EM_LATCH, which contains control information and may contain an address, data, or both, depending on the type of request. In addition to operands and results that are prefetched by the Ibox, the Ebox can also make explicit memory requests to the Mbox to read or write data.

5.3.1.4 The Fbox

The Fbox is responsible for executing all of the floating point instructions in the VAX base instruction group, as well as the longword-length integer multiply instructions.

For each instruction that the Fbox is to execute, it receives from the microsequencer the opcode and other instruction-related information. The Fbox receives operand data from the Ebox on E%ABUS_H and E%BBUS_H.
Execution of instructions is performed in a dedicated Fbox pipeline that appears in segment S4 of Figure 5-8, but is actually a minimum of three cycles in length. Certain instructions, such as integer multiply, may require multiple passes through some segments of the Fbox pipeline. Other instructions, such as divide, are not pipelined at all.

Fbox results and status are returned via P%FBOX_RESULT_H to the RMUX in the Ebox for retirement. When the instruction is next to retire, the RMUX hardware, as directed by the destination queue, sends the results to either the GPRs for register destinations, or to the Mbox for memory destinations.

5.3.1.5 The Mbox

The Mbox operates in the S5 and S6 segments of the pipeline, and is responsible for all memory references initiated by the other sections of the chip. Mbox requests can come from the Ibox (for VIC fills and for specifier references), the Ebox or Fbox via the RMUX and the EM_LATCH (for instruction result stores and for explicit Ebox memory requests), from the Mbox itself (for translation buffer fills and PTE reads), and from the Cbox (for invalidates and cache fills).

All virtual references are translated to a physical address by the translation buffer (TB), which operates in the S5 segment of the pipeline. For instruction result references generated by the Ibox, the translated address is stored in the physical address queue (PA queue). These addresses are later matched with data from the Ebox or Fbox, when the result is calculated.

For memory references, the physical address from either the TB or the PA queue is used to address the primary cache (Pcache) starting in the S5 segment of the pipeline and continuing into the S6 segment. Read data is available in the middle of the S6 segment, right-justified and returned to the requester on M%MD_BUS_H by the end of the cycle.
Writes are also completed by the end of the cycle. Although the Pcache access spans the S5 and S6 segments of the pipeline, a new access can be started each cycle in the absence of a TB or cache miss.

5.3.1.6 The Cbox

The Cbox is responsible for maintaining and accessing the backup cache (Bcache), and for control of the off-chip bus (the NDAL). The Cbox receives input from the Mbox in the S6 segment of the pipeline, and usually takes multiple cycles to complete a request. For this reason, the Cbox is not shown in specific pipeline segments.

If a memory read misses in the Pcache, the request is sent to the Cbox for processing. The Cbox first looks for the data in the Bcache and fills the Pcache from the Bcache if the data is present. If the data is not present in the Bcache, the Cbox requests a cache fill on the NDAL from memory. When memory returns the data, it is written to both the Bcache and to the Pcache (and potentially to the VIC). Although Pcache fills are done by making a request to the Mbox pipeline, data is returned to the original requester as quickly as possible by driving data directly onto B%S6_DATA_H, and from there onto M%MD_BUS_H as soon as the bus is free.

Because the Pcache operates as a write-through cache, all memory writes are passed to the Cbox. To avoid multiple writes to the same Bcache block, the Cbox contains a write buffer in which multiple writes to the same quadwords are packed together before the Bcache is actually written. To maintain cache coherence with other system components, the Cbox acquires ownership of any data that is written to the cache.

5.3.2 Stalls in the Pipeline

Despite our best attempts at keeping the pipeline flowing smoothly, there are conditions which cause segments of the pipeline to stall. Conceptually, each segment of the pipeline can be considered as a black box which performs three steps every cycle:

1. The task appropriate to the pipeline segment is performed, using control and inputs from the previous pipeline segment. The segment then updates local state (within the segment), but not global state (outside of the segment).

2. Just before the end of the cycle, all segments send stall conditions to the appropriate state sequencer for that segment, which evaluates the conditions and determines which, if any, pipeline segments must stall.

3. If no stall conditions exist for a pipeline segment, the state sequencer allows it to pass results to the next segment and accept results from the previous segment. This is accomplished by updating global state.

This sequence of steps maximizes throughput by allowing each pipeline segment to assume that a stall will not occur (which should be the common case). If a stall does occur at the end of the cycle, global state updates are blocked, and the stalled segment repeats the same task (with potentially different inputs) in the next cycle (and the next, and the next) until the stall condition is removed.

This description is over-simplified in some cases because some global state must be updated by a segment before the stall condition is known. Also, some tasks must be performed by a segment once and only once. These are treated specially on a case-by-case basis in each segment.

Within a particular section of the chip, a stall in one pipeline segment also causes stalls in all upstream segments (those that occur earlier in the pipeline) of the pipeline. Unlike Rigel, stalls in one segment of the pipeline do not cause stalls in downstream segments of the pipeline. For example, a memory data stall in Rigel also caused a stall of the downstream ALU segment. In NVAX, a memory data stall does not stall the ALU segment (a no-op is inserted into the S4 segment when S4 advances to S5).
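The stall propagation rule described above can be sketched as a one-cycle update of a single pipeline section. This is a simplified model and not the NVAX control logic: the function step, the NOP marker, and the list representation are hypothetical, and the special cases noted above (global state that must be updated before the stall is known, and tasks that must be performed once and only once) are ignored.

```python
from typing import List, Optional, Set, Tuple

NOP = "nop"  # hypothetical marker for the no-op inserted behind a stalled segment


def step(pipeline: List[str], stalled: Set[int],
         incoming: str) -> Tuple[List[str], Optional[str], bool]:
    """Advance one section of a pipeline by one cycle.

    pipeline: contents of each segment; index 0 is the most upstream segment.
    stalled:  indices of segments that reported a stall condition this cycle.
    incoming: the item waiting to enter segment 0.

    Returns (new_pipeline, retired_item, incoming_accepted).
    """
    n = len(pipeline)
    # A stall in segment i also holds every upstream segment 0..i.
    hold = max(stalled) if stalled else -1
    new = list(pipeline)

    # The last segment retires its item unless it is itself held.
    retired = pipeline[-1] if hold < n - 1 else None

    # Segments downstream of the stall advance normally (global state update).
    for i in range(n - 1, hold + 1, -1):
        new[i] = pipeline[i - 1]

    if hold + 1 < n:
        # The first segment past the stall receives a no-op when its upstream
        # neighbor is held; with no stall, segment 0 accepts the incoming item.
        new[hold + 1] = NOP if hold >= 0 else incoming

    return new, retired, hold < 0


if __name__ == "__main__":
    state = ["A", "B", "C", "D"]
    print(step(state, set(), "E"))  # no stall: everything advances
    print(step(state, {1}, "E"))    # segment 1 stalls: 0 and 1 hold, 2 gets a no-op
```

In this model a stall in segment 1 holds segments 0 and 1, lets the downstream segments advance, and places a no-op in segment 2, mirroring the no-op inserted into S4 when S4 advances to S5.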
There are a number of stall conditions in the chip which result in a pipeline stall. Each is discussed briefly below and in much more detail in the appropriate chapter of this specification.

5.3.2.1 S0 Stalls

Stalls that occur in the S0 segment of the pipeline are as follows:

Ibox:

• PFQ full: In normal operation, the VIC is accessed using the address in VIBA, the data is sent to the prefetch queue, and VIBA is incremented. If the PFQ is full, the increment of VIBA is blocked, and the data is re-referenced in the VIC until there is room for it in the PFQ. At that point, prefetch resumes.

5.3.2.2 S1 Stalls

Stalls that occur in the S1 segment of the pipeline are as follows:

Ibox:

• Insufficient PFQ data: The burst unit attempts to decode the next instruction component each cycle. If there are insufficient PFQ bytes valid to decode the entire component, the burst unit stalls until the required bytes are delivered from the VIC.

• Source queue or destination queue full: During specifier decoding, the source and destination queue allocation logic must allocate enough entries in each queue to satisfy the requirements of the specifier being parsed. To guarantee that there will be sufficient resources available, there must be at least 2 free source queue entries and 2 free destination queue entries to complete the burst of the specifier. If there are insufficient free entries in either queue, the burst unit stalls until free entries become available.

• MD file full: When a complex specifier is decoded, the source queue allocation logic must allocate enough memory data registers in the register file to satisfy the requirements of the specifier being parsed. To guarantee that there will be sufficient resources available, there must be at least 2 free memory data registers available to complete the burst of the specifier. If there are insufficient free registers, the burst unit stalls until enough memory data registers become available.
• Second conditional branch decoded: The branch prediction unit predicts the path that each conditional branch will take and redirects the instruction stream based on that prediction. It retains sufficient state to restore the alternate path if the prediction was wrong. If a second conditional branch is decoded before the first is resolved by the Ebox, the branch prediction unit has nowhere to store the state, so the burst unit stalls until the Ebox resolves the actual direction of the first branch.

• Instruction queue full: When a new opcode is decoded by the burst unit, the issue unit attempts to add an entry for the instruction to the instruction queue. If there are no free entries in the instruction queue, the burst unit stalls until a free entry becomes available, which occurs when an instruction is retired through the RMUX.

• Complex specifier unit busy: If the burst unit decodes an instruction component that must be processed by the CSU pipeline, it makes a request for service by the CSU through an S1 request latch. If this latch is still valid from a previous request for service (either due to a multi-cycle flow or a CSU stall), the burst unit stalls until the valid bit in the request latch is cleared.

• Immediate data length not available: The length of the specifier extension for immediate specifiers is dependent on the data length of the specifier for that specific instruction. The data length information comes from one of the Ibox instruction PLAs which is accessed based on the opcode of the instruction. If the PLA access is not complete before an immediate specifier is decoded (which would have to be the first specifier of the instruction), the burst unit stalls for one cycle.

5.3.2.3 S2 Stalls

Stalls that occur in the S2 segment of the pipeline are as follows:

Ibox:

• Outstanding Ebox or Fbox GPR write: In order to calculate certain specifier memory addresses, the CSU must read the contents of a GPR from the register file. If there is a pending Ebox or Fbox write to the register, the Ibox GPR scoreboard prevents the GPR read by stalling the S2 segment of the CSU pipeline. The stall continues until the GPR write completes.

• Memory data not valid: For certain operations, the Ibox makes an Mbox request to return data which is used to complete the operation (e.g., the read done for the indirect address of a displacement deferred specifier). The Ibox MD register contains a valid bit which is cleared when a request is made, and set when data returns in response to the request. If the Ibox references the Ibox MD register when the valid bit is off, the S2 segment of the CSU pipeline stalls until the data is returned by the Mbox.

Microsequencer:

• Instruction queue empty: The final microinstruction of a macroinstruction execution flow in the Ebox is indicated when a SEQ.MUX/LAST.CYCLE* microinstruction is decoded by the microsequencer. In response to this event, the Ebox expects to receive the first microinstruction of the next macroinstruction flow based on the initial address in the instruction queue. If the instruction queue is empty, the microsequencer supplies the instruction queue stall microinstruction in place of the next macroinstruction flow. In effect, this stalls the microsequencer for one cycle.

5.3.2.4 S3 Stalls

Stalls that occur in the S3 segment of the pipeline are as follows:

Ibox:

• Outstanding Ebox GPR read: In order to complete the processing for auto-increment, auto-decrement, and auto-increment deferred specifiers, the CSU must update the GPR with the new value.
If there is a pending Ebox read to the register through the source queue, the Ibox scoreboard prevents the GPR write by stalling the S3 segment of the CSU pipeline. The stall continues until the Ebox reads the GPR.

• Specifier queue full: For most complex specifiers, the CSU makes a request for Mbox service for the memory request required by the specifier. If there are no free entries in the specifier queue, the S3 segment of the CSU pipeline stalls until a free entry becomes available.

• RLOG full: Auto-increment, auto-decrement, and auto-increment deferred specifiers require a free RLOG entry in which to log the change to the GPR. If there are no free RLOG entries when such a specifier is decoded, the S3 segment of the CSU pipeline stalls until a free entry becomes available.

Ebox:

• Memory read data not valid: In some instances, the Ebox may make an explicit read request to the Mbox to return data in one of the 6 Ebox working registers in the register file. When the request is made, the valid bit on the register is cleared. When the data is written to the register, the valid bit is set. If the Ebox references the working register when the valid bit is clear, the S3 segment of the Ebox pipeline stalls until the entry becomes valid.

• Field queue not valid: For each macroinstruction that includes a field-type specifier, the microcode microbranches on the first entry in the field queue to determine whether the field specifier addresses a GPR or memory. If the field queue is empty (indicating that the Ibox has not yet parsed the field specifier), the result of the next address calculation repeats the microbranch the next cycle. Although this is not a true stall, the effects are the same in that a microinstruction is repeated until the field queue becomes valid.

• Outstanding Fbox GPR write: Because the Fbox computation pipeline is multiple cycles long, the Ebox may start to process subsequent instructions before the Fbox completes the first.
If the Fbox instruction result is destined for a GPR that is referenced by a subsequent Ebox microword, the S3 segment of the Ebox pipeline stalls until the Fbox GPR write occurs.

• Fbox instruction queue full: When an instruction is issued to the Fbox, an entry is added to the Fbox instruction queue. If there are no free entries in the queue, the S3 segment of the Ebox pipeline stalls until a free entry becomes available.

Ebox/Fbox:

• Source queue empty: Most instruction operands are prefetched by the Ibox, which writes a pointer to the operand value into the source queue. The Ebox then references up to two operands per cycle indirectly through the source queue for delivery to the Ebox or Fbox. If either of the source queue entries referenced is not valid, the S3 segment of the Ebox pipeline stalls until the entry becomes valid.

• Memory operand not valid: Memory operands are prefetched by the Ibox, and the data is written by either the Mbox or Ibox into the memory data registers in the register file. If a referenced source queue entry points to a memory data register which is not valid, the S3 segment of the Ebox pipeline stalls until the entry becomes valid.

5.3.2.5 S4 Stalls

Stalls that occur in the S4 segment of the pipeline are as follows:

Ebox:

• Branch queue empty: When a conditional or unconditional branch is decoded by the Ibox, an entry is added to the branch queue. For conditional branch instructions, the entry indicates the Ibox prediction of the branch direction. The branch queue is referenced by the Ebox to verify that the branch displacement was valid, and to compare the actual branch direction with the prediction. If the branch queue entry has not yet been made by the Ibox, the S4 segment of the Ebox pipeline stalls until the entry is made.
• Fbox GPR operand scoreboard full: The Ebox implements a register scoreboard to prevent the Ebox from reading a GPR to which there is an outstanding write by the Fbox. For each Fbox instruction which will write a GPR result, the Ebox adds an entry to the Fbox GPR scoreboard. If the scoreboard is full when the Ebox attempts to add an entry, the S4 segment of the Ebox pipeline stalls until a free entry becomes available.

Fbox:

• Fbox operand not valid: Instructions are issued to the Fbox when the opcode is removed from the instruction queue by the microsequencer. Operands for the instruction may not arrive until some time later. If the Fbox attempts to start the instruction execution when the operands are not yet valid, the Fbox pipeline stalls until the operands become valid.

Ebox/Fbox:

• Destination queue empty: Destination specifiers for instructions are processed by the Ibox, which writes a pointer to the destination (either GPR or memory) into the destination queue. The destination queue is referenced in two cases: when the Ebox or Fbox store instruction results via the RMUX, and when the Ebox tries to add the destination of Fbox instructions to the Ebox GPR scoreboard. If the destination queue entry is not valid (as would be the case if the Ibox has not completed processing the destination specifier), a stall occurs until the entry becomes valid.

• PA queue empty: For memory destination specifiers, the Ibox sends the virtual address of the destination to the Mbox, which translates it and adds the physical address to the PA queue. If the destination queue indicates that an instruction result is in memory, a store request is made to the Mbox which supplies the data for the result. The Mbox matches the data with the first address in the PA queue and performs the write. If the PA queue is not valid when the Ebox or Fbox has a memory result ready, the RMUX stalls until the entry becomes valid. As a result, the source of the RMUX input (Ebox or Fbox) also stalls.
• EM_LATCH full: All implicit and explicit memory requests made by the Ebox or Fbox pass through the EM_LATCH to the Mbox. If the Mbox is still processing the previous request when a new request is made, the RMUX stalls until the previous request is completed. As a result, the source of the RMUX input (Ebox or Fbox) also stalls.

• RMUX selected to other source: Macroinstructions must be completed in the order in which they appear in the instruction stream. The Ebox retire queue determines whether the next instruction to complete comes from the Ebox or the Fbox. If the next instruction should come from one source and the other makes an RMUX request, the other source stalls until the retire queue indicates that the next instruction should come from that source.

5.3.3 Exception Handling

A pipeline exception occurs when a segment of the pipeline detects an event which requires that the normal flow of the pipeline be stopped in favor of another flow. There are two fundamental types of pipeline exceptions: those that resume the original pipeline flow once the exception is corrected, and those that require the intervention of the operating system. A TB miss on a memory reference is an example of the first type, and an access control violation is an example of the second type. M=0 faults are handled specially, as described below.

Restartable exceptions are handled entirely within the confines of the section that detected the event. Other exceptions must be reported to the Ebox for processing. Because the NVAX CPU is macropipelined, exceptions can be detected by sections of the pipeline long before the instruction which caused the exception is actually executed by the Ebox or Fbox. However, the reporting of the exception is deferred until the instruction is executed by the Ebox or Fbox.
At that point, an Ebox handler is invoked to process the event.

Because the Ebox and Fbox are micropipelined, the point at which an exception handler is invoked must be carefully controlled. For example, three macroinstructions may be in execution in segments S3, S4, and S5 of the Ebox pipeline. If an exception is reported for the macroinstruction in the S3 segment, the two macroinstructions that are in the S4 and S5 segments must be allowed to complete before the exception handler is invoked.

To accomplish this, the S4/S5 boundary in the Ebox is defined to be the commit point for a microinstruction. Architectural state is not modified before the S5 segment of the pipeline, unless there is some mechanism for restoring the original state if an exception is detected (the Ibox RLOG is an example of such a mechanism). Exception reporting is deferred until the microinstruction to which the event belongs attempts to cross the S4/S5 boundary. At that point, the exception is reported and an exception handler is invoked. By deferring exception reporting to this point, the previous microinstruction (which may belong to the previous macroinstruction) is allowed to complete.

Most exceptions are reported by requesting a microtrap from the Microsequencer. When the Microsequencer receives a microtrap request, it causes the Ebox to break all its stalls, aborts the Ebox pipeline (by asserting E_USQ%PE_ABORT_L), and injects the address of a handler for the event into the control store address latch. This starts an Ebox microcode routine which will process the exception as appropriate. Certain other kinds of exceptions are reported by simply injecting the appropriate handler address into the control store at the appropriate point.

The VAX architecture categorizes exceptions into two types: faults and traps. For both types, the microcode handler for the exception causes the Ibox to back out all GPR modifications that are in the RLOG, and retrieves the PC from the PC queue.
For faults, the PC returned is the PC of the opcode of the instruction which caused the exception. For traps, the PC returned is the PC of the opcode of the next instruction to execute. The microcode then constructs the appropriate exception frame on the stack, and dispatches to the operating system through the appropriate SCB vector.

There are a number of exceptions detected by the NVAX CPU pipeline, each of which is discussed briefly below, and in much more detail in the appropriate chapter of this specification.

5.3.3.1 Interrupts

The CPU services interrupt requests from various sources between macroinstructions, and at selected points within the string instructions. Interrupt requests are received by the interrupt section and compared with the current IPL in the PSL. If the interrupt request is for an IPL that is higher than the current value in the PSL, a request is posted to the microsequencer. At the next macroinstruction boundary, the microsequencer substitutes the address of the microcode interrupt service routine for the instruction execution flow. The microcode handler then determines if there is actually an interrupt pending. If there is, it is dispatched to the operating system through the appropriate SCB vector.

5.3.3.2 Integer Arithmetic Exceptions

There are three integer arithmetic exceptions detected by the CPU, all of which are categorized as traps by the VAX architecture. This is significant because the event is not reported until after the commit point of the instruction, which allows that instruction to complete.

Integer Overflow Trap

An integer overflow is detected by the RMUX at the end of the S4 segment of the Ebox pipeline. If PSL<IV> is set and overflow traps are enabled by the microcode, the event is reported in segment S5 of the pipeline via a microtrap request.
Integer Divide-By-Zero Trap

An integer divide-by-zero is detected by the Ebox microcode routine for the instruction. It is reported by explicitly retiring the instruction and then jumping directly to the microcode handler for the event.

Subscript Range Trap

A subscript range trap is detected by the Ebox microcode routine for the INDEX instruction. It is reported by explicitly retiring the instruction and then jumping directly to the microcode handler for the event.

5.3.3.3 Floating Point Arithmetic Exceptions

All floating point arithmetic exceptions are detected by the Fbox pipeline during the execution of the instruction. The event is reported by the RMUX when it selects the Fbox as the source of the next instruction to process. At that point, a microtrap is requested.

5.3.3.4 Memory Management Exceptions

Memory management exceptions are detected by the Mbox when it processes a virtual read or write. This section covers actual memory management exceptions such as access control violation, translation not valid, and M=0 faults. Translation buffer misses are discussed separately in the next section.

Because the reporting of memory management exceptions is specific to the operation that caused the exception, each case is discussed separately.

• I-Stream Faults

While the Ibox is decoding instructions, it may access a page which is not accessible due to a memory management exception. This may occur on the opcode, a specifier or specifier extension, or on a branch displacement. Should this occur, the Ibox sets a global MME fault flag and stops. Memory management exceptions detected on intermediate operations during specifier evaluation (such as a read for the indirect address of a displacement deferred specifier) are converted by the Ibox into source or destination faults, as described below.
If the Ebox reaches the instruction which caused the exception (which may not happen due to, for example, an interrupt, exception, or branch), it will reference one of the queues, which does not have a valid entry because the Ibox stopped when the error was detected. The particular queue depends on the instruction component on which the error was detected. If the Ibox global MME flag is set when an empty queue entry is referenced, the error is reported in one of four ways.

If the Ibox global MME flag is set when the microsequencer references an invalid instruction queue entry, it inserts the instruction queue stall into the pipeline and the Ebox qualifies it with the fault flag. When this flag reaches the S4 segment of the pipeline and is selected by the RMUX, a microtrap is requested.

If the Ibox global MME flag is set when the Ebox references an invalid source queue entry, a fault flag is injected into either the Ebox or Fbox pipelines, depending on the type of instruction. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the RMUX, a microtrap is requested.

If the Ibox global MME flag is set when the Ebox microcode microbranches on an invalid field queue entry, a fault flag is injected into the Ebox pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the RMUX, a microtrap is requested.

If the Ibox global MME flag is set when the Ebox references an invalid branch queue entry, and the RMUX selects the Ebox, a microtrap is requested.

If the Ibox global MME flag is set when the RMUX references an invalid destination queue entry for a store request, a microtrap is requested.

• Source Operand Faults

If the Mbox detects a memory management exception during the translation for a source specifier, it qualifies the data returned to the MD file with a fault flag which is written into the MD file.
When this entry is referenced by the Ebox, a fault flag is injected into the pipeline. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the RMUX, a microtrap is requested.

• Destination Address Faults

If the Mbox detects a memory management exception during the translation for a destination specifier, it sets a fault flag in the PA queue entry for the address. When this entry is referenced by the RMUX, a microtrap is requested.

• Faults on Explicit Ebox Memory Requests

Explicit Ebox reads and writes are, by definition, performed in the context of the instruction which the Ebox is currently executing. If the Mbox detects a memory management exception that was the result of an explicit Ebox read or write, it requests an immediate microtrap to the memory management fault handler.

• M=0 Faults

M=0 faults occur when the Mbox finds the M-bit clear in the PTE which is used to translate write-type references. The event is reported to the Ebox in one of the three ways described above: via the MD file or PA queue fault flags, or via an immediate microtrap for explicit Ebox writes.

Unlike other memory management exceptions, which are dispatched to the operating system, M=0 faults are completely processed by the Ebox microcode handler. For normal instructions, the handler causes the Ibox to back out all GPR modifications that are in the RLOG and retrieves the PC from the PC queue. For string instructions, any RLOG entries that belong to the string instructions are not processed, and PSL<FPD> is set. Using the PTE address supplied by the Mbox, the Ebox microcode reads the PTE, sets the M-bit, and writes the PTE back to memory.
The instruction stream is then restarted at the interrupted instruction (which may result in special FPD handling, as described below).

5.3.3.5 Translation Buffer Miss

Translation buffer misses are handled by the Mbox transparently to the rest of the CPU. When a reference misses in the translation buffer, the Mbox aborts the current reference and invokes the services of the memory management exception sequencer in the Mbox, which fetches the appropriate PTE from memory and loads it into the translation buffer. The original reference is then restarted.

5.3.3.6 Reserved Addressing Mode Faults

Reserved addressing mode faults are detected by the Ibox for certain illegal combinations of specifier addressing modes and registers. When one of these combinations is detected, the Ibox sets a global addressing mode fault flag that indicates that the condition was detected and stops.

If the Ibox global addressing mode fault flag is set when the Ebox references an invalid source queue entry, a fault flag is injected into either the Ebox or Fbox pipelines, depending on the type of instruction. To avoid a deadlock, S3 stalls do not prevent forward progress of the flag in the pipeline. The fault flag is carried along the Ebox or Fbox pipeline and passed to the RMUX, which reports the event by requesting a microtrap when that source is selected.

If the Ibox global addressing mode fault flag is set when the Ebox microcode microbranches on an invalid field queue entry, a fault flag is injected into the Ebox pipeline. When the flag reaches the S4 segment of the pipeline and is selected by the RMUX, a microtrap is requested.

Similarly, if the Ibox global addressing mode fault flag is set when the RMUX, in response to a request by the Ebox or Fbox, references an invalid destination queue entry, a microtrap is requested.
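The interaction between an Ibox global fault flag and an invalid source queue or field queue entry can be sketched as a three-way decision. This is a simplified behavioral illustration, not the hardware logic: the `Outcome` enumeration and the `reference_queue_entry` function are hypothetical names, and the sketch assumes that an invalid entry with no fault flag set results in the ordinary stall (or repeated microbranch) described in the S3 stall section.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"                  # entry is valid; normal execution continues
    STALL = "stall"                      # entry not valid yet; wait for the Ibox
    INJECT_FAULT = "inject fault flag"   # Ibox has stopped; carry a fault flag to the RMUX


def reference_queue_entry(entry_valid: bool, ibox_fault_flag: bool) -> Outcome:
    """Decide what happens when the Ebox references a source or field queue entry."""
    if entry_valid:
        # Entries written before the Ibox stopped are still consumed normally.
        return Outcome.PROCEED
    if ibox_fault_flag:
        # The entry will never arrive, so a fault flag is injected instead of stalling.
        return Outcome.INJECT_FAULT
    return Outcome.STALL
```

The point of the sketch is that the fault flag only matters once the Ebox runs out of valid entries, which is why the fault is reported in program order.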
5.3.3.7 Reserved Operand Faults

Reserved operand faults for floating point operands are detected by the Fbox, and reported in the same manner as the floating point arithmetic exceptions described above. Other reserved operand faults are detected by Ebox microcode as part of macroinstruction execution flows and are reported by jumping directly to the fault handler.

5.3.3.8 Exceptions Occurring as the Consequence of an Instruction

Opcode-specific exceptions such as reserved instruction faults, breakpoint faults, etc., are dispatched directly to handlers by placing the address of the handler in the instruction PLA for each instruction. Other instruction-related faults, such as privileged instruction faults, are detected in execution flows by the Ebox microcode and are reported by jumping directly to the fault handler.

For testability, the Fbox may be disabled. If this is the case, integer multiply instructions are executed by the Ebox microcode and floating point instructions are converted into reserved instruction faults for emulation by software. When the first Ebox microinstruction of an Fbox operand flow for a floating point macroinstruction reaches the S4 segment of the pipeline, a microtrap is requested. The handler for this microtrap then jumps directly to the reserved instruction fault handler.

5.3.3.9 Trace Fault

Trace faults are detected by the microsequencer with some help from the Ebox. The microsequencer maintains a duplicate copy of PSL<TP>, which it updates as required to track the state of the PSL copy as it would exist when the instruction is executed by the Ebox. At the end of a macroinstruction, the microsequencer logically ORs its local copy of the TP bit with PSL<TP>. If either is set, the microsequencer substitutes the address of the microcode trace fault handler for the address of the next macroinstruction.
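The trace fault selection described above reduces to a single OR followed by an address substitution. The following minimal sketch shows only that step; the function name and its parameters are hypothetical, and the addresses are opaque integers rather than real control store addresses.

```python
def next_flow_address(local_tp: bool, psl_tp: bool,
                      next_macro_address: int,
                      trace_handler_address: int) -> int:
    """Pick the flow to start at the end of a macroinstruction (trace check only)."""
    # The microsequencer ORs its local TP copy with PSL<TP>.
    trace_pending = local_tp or psl_tp
    # If either bit is set, the trace fault handler replaces the next macroinstruction.
    return trace_handler_address if trace_pending else next_macro_address
```

Other substitutions made at a macroinstruction boundary (interrupts, FPD handling) are deliberately left out of this sketch.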
5.3.3.10 Conditional Branch Mispredict

When the Ibox decodes a conditional branch, it predicts the path that the branch will take and places its prediction into the branch queue. When the Ebox reaches the instruction, it evaluates the actual path that the branch took and compares it in the S5 segment of the Ebox pipeline with the Ibox prediction. If the two are different, the Ibox is notified that the branch was mispredicted and a microtrap request is made to abort the Ebox and Fbox pipelines. The Ibox flushes itself, backs out any GPR modifications that are in the RLOG, and redirects the instruction stream to the alternate path. The Ebox microcode handler for this event cleans up certain machine state and waits for the first instruction from the alternate path.

5.3.3.11 First Part Done Handling

During the execution of one of the 8 string instructions that are implemented by the CPU, an exception or an interrupt may be detected. In that event, the Ebox microcode saves all state necessary to resume the instruction in the GPRs, backs up PC to point to the opcode of the string instruction, sets PSL<FPD> in the saved PSL, and dispatches to the handler for the interrupt or exception. When the interrupt or exception is resolved, the software handler terminates with an REI back to the instruction.

When the Ibox decodes an instruction with PSL<FPD> set, it stops parsing the instruction immediately after the opcode. In particular, it does not parse the specifiers. When the microsequencer finds PSL<FPD> set at a macroinstruction boundary, it substitutes the address of a special FPD handler for the instruction execution flow.
The FPD handler determines which instruction is being resumed from the opcode, unpacks the state saved in the GPRs, clears PSL<FPD>, advances PC to the end of the string instruction (by adding the opcode PC to the length of the instruction, which was part of the saved state), and jumps back to the middle of the interrupted instruction.

5.3.3.12 Cache and Memory Hardware Errors

Cache and memory hardware errors are detected by the Mbox or Cbox, depending on the type of error. If the error is recoverable (e.g., a Pcache tag parity error on a write simply disables the Pcache), it is reported via a soft error interrupt request and is dispatched to the operating system. In some instances, write errors that are not recoverable by hardware are reported via a hard error interrupt request, which results in the invocation of the operating system.

Read errors that are not recoverable by hardware are reported via the assertion of a soft error interrupt, and also in a manner that is similar to that used for memory management exceptions, as described above. In fact, the MD file, PA queue, and the Ibox all contain a hardware error flag in parallel with the memory management fault flag. With the exception of TB parity errors, which cause an immediate microtrap request, the event is reported to the Ebox in exactly the same way as the equivalent memory management exception would be, but the microcode exception handler is different.

For example, an unrecoverable error on a specifier read would set the hardware error flag in the MD file. When the flag is referenced, the error flag is injected into the pipeline. When the flag advances to the S4 segment and is selected by the RMUX, it causes a microtrap request which invokes a hardware error handler rather than a memory management handler.

Note that certain other errors are reported in the same way.
For example, if the memory management sequencer in the Mbox receives an unrecoverable error trying to read a PTE necessary to translate a destination specifier, it sets the hardware error flag in the PA queue for the entry corresponding to the specifier. This results in a microtrap to the hardware error handler when the entry is referenced. PTE read errors for read references are also reported via the original reference.

5.4 Revision History

Table 5-1: Revision History

Who          When          Description of change
Mike Uhler   06-Mar-1989   Release for external review.
Mike Uhler   19-Dec-1989   Update for second-pass release.
Mike Uhler   02-Feb-1991   Update after pass 1 PG.

Chapter 6

Microinstruction Formats

6.1 Ebox Microcode

The NVAX microword consists of 61 bits divided into two major sections. Bits <60:15> control the Ebox Data Path and are encoded into two formats. Bits <14:0> control the Microsequencer and are also encoded into two formats.

6.1.1 Data Path Control

The Data Path Control Microword specifies all the information needed to control the Ebox Data Path. The two formats, Standard and Special, are selected by bit <60>, the FORMAT bit. In addition, bit <45>, the LIT bit, selects the constant generation format of the microword, which may be either an 8-bit constant or a 10-bit constant, depending on a decode in the MISC field.

Pictures of the microword formats are in Figure 6-1 and Figure 6-2. A brief description of each field is given in Table 6-1 and Table 6-2.
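As an illustration of the field boundaries just described, the following Python sketch splits a 61-bit microword into its data path and microsequencer sections and pulls out a few fields. The helper names (`bits`, `split_microword`) are hypothetical and not part of the specification. The ALU and MISC bit positions are taken from Table 6-1, and the FORMAT encoding (0 for Standard, 1 for Special) is taken from Figures 6-1 and 6-2. The MISC encoding that selects CONST.10 is not given in this section, so the sketch does not attempt to decode the constant width.

```python
MICROWORD_BITS = 61


def bits(word: int, hi: int, lo: int) -> int:
    """Return field <hi:lo> of word, right-justified."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def split_microword(word: int) -> dict:
    """Split an NVAX Ebox microword into sections and a few named fields."""
    if not 0 <= word < (1 << MICROWORD_BITS):
        raise ValueError("microword must fit in 61 bits")
    return {
        "data_path": bits(word, 60, 15),   # bits <60:15>, Ebox data path control
        "sequencer": bits(word, 14, 0),    # bits <14:0>, microsequencer control
        "format": "Special" if bits(word, 60, 60) else "Standard",
        "lit": bits(word, 45, 45),         # B port control: register (0) or literal (1)
        "alu": bits(word, 59, 55),         # ALU function select
        "misc": bits(word, 19, 15),        # miscellaneous function select, group 0
    }
```

For example, `split_microword(1 << 60)["format"]` evaluates to `"Special"`, because only the FORMAT bit is set.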
Figure 6-1: Ebox Data Path Control, Standard Format

 6|5 5 5 5|5 5 5 5|5 5 4 4|4 4 4 4|4 4 4 4|3 3 3 3|3 3 3 3|3 3 2 2|2 2 2 2|2 2 2 2|1 1 1 1|1
 0|9 8 7 6|5 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0|9 8 7 6|5 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0|9 8 7 6|5
+-+---------+---------+-+-----+-+---------+---------+-+-+-+-----------+-----------+---------+
|0|   ALU   |   MRQ   |Q| SHF |0|   VAL   |    B    |L|W|V|    DST    |     A     |  MISC   |
+-+---------+---------+-+-----+-+---------+---------+-+-+-+-----------+-----------+---------+
                              |1|POS|     CONST     |  MISC not equal CONST.10
                              +-+---+---------------+
                              |1|      CONST.10     |  MISC equal CONST.10
                              +-+-------------------+

Table 6-1: EBOX Data Path Control Microword Fields, Standard Format

Bit Position  Microword Field  Microword Format  Description
60            FORMAT                             Microword format: Standard or Special
59:55         ALU              Both              ALU function select
54:50         MRQ              Both              Mbox request select
49            Q                Standard          Q register load control
48:46         SHF              Standard          Shifter function select
45            LIT              Both              ALU/shifter B port control: register or literal
44:40         VAL              Standard (1)      Constant shift amount
39:35         B                Both (1)          ALU/shifter B port select
44:43         POS              Both (2)          Constant position
42:35         CONST            Both (2)          8-bit constant value
44:35         CONST.10         Both (3)          10-bit constant value
34            L                Both              Length control
33            W                Both              Wbus driver control
32            V                Both              VA write enable
31:26         DST              Both              WBUS destination select
25:20         A                Both              ALU/shifter A port select
19:15         MISC             Both              Miscellaneous function select, group 0

(1) NOT constant generation microword variant
(2) 8-bit constant generation microword variant, when MISC field not equal CONST.10
(3) 10-bit constant generation microword variant, when MISC field equal CONST.10

Figure 6-2: Ebox Data Path Control, Special Format

 6|5 5 5 5|5 5 5 5|5 5 4 4|4 4 4 4|4 4 4 4|3 3 3 3|3 3 3 3|3 3 2 2|2 2 2 2|2 2 2 2|1 1 1 1|1
 0|9 8 7 6|5 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0|9 8 7 6|5 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0|9 8 7 6|5
+-+---------+---------+-------+-+-------+-+---------+-+-+-+-----------+-----------+---------+
|1|   ALU   |   MRQ   | MISC1 |0| MISC2 |D|    B    |L|W|V|    DST    |     A     |  MISC   |
+-+---------+---------+-------+-+-------+-+---------+-+-+-+-----------+-----------+---------+
                              |1|POS|     CONST     |  MISC not equal CONST.10
                              +-+---+---------------+
                              |1|      CONST.10     |  MISC equal CONST.10
                              +-+-------------------+

Table 6-2: EBOX Data Path Control Microword Fields, Special Format

Bit Position  Microword Field  Microword Format  Description
60            FORMAT                             Microword format: Standard or Special
59:55         ALU              Both              ALU function select
54:50         MRQ              Both              Mbox request select
49:46         MISC1            Special           Miscellaneous function select, group 1
45            LIT              Both              ALU/shifter B port control: register or literal
44:41         MISC2            Special (1)       Miscellaneous function select, group 2
40            DISABLE.RETIRE   Special (1)       Instruction retire disable
39:35         B                Both (1)          ALU/shifter B port select
44:43         POS              Both (2)          Constant position
42:35         CONST            Both (2)          8-bit constant value
44:35         CONST.10         Both (3)          10-bit constant value
34            L                Both              Length control
33            W                Both              Wbus driver control
32            V                Both              VA write enable
31:26         DST              Both              WBUS destination select
25:20         A                Both              ALU/shifter A port select
19:15         MISC             Both              Miscellaneous function select, group 0

(1) NOT constant generation microword variant
(2) 8-bit constant generation microword variant, when MISC field not equal CONST.10
(3) 10-bit constant generation microword variant, when MISC field equal CONST.10

6.1.2 Microsequencer Control

The Microsequencer Control Microword supplies the information necessary for the Microsequencer to calculate the address of the next microinstruction.
The basic computation done by the Microsequencer involves selecting a base address from one of several sources, and then optionally modifying three bits of the base address to get the final next address. Bit <14>, SEQ.FMT, selects between Jump and Branch formats. Figure 6-3 and Figure 6-4 show the two formats. Table 6-3 and Table 6-4 describe each of the fields.

Figure 6-3: Ebox Microsequencer Control, Jump Format

 1 1 1|1 1
 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0
+-+-+---+---------------------+
|0|S|MUX|          J          |
+-+-+---+---------------------+

Table 6-3: Ebox Microsequencer Control Microword Fields, Jump Format

Bit       Microword  Microword
Position  Field      Format     Description
14        SEQ.FMT    Both       Microsequencer format: Jump or Branch
13        SEQ.CALL   Both       Subroutine call
12:11     SEQ.MUX    Jump       Next address select
10:0      J          Jump       Next address

Figure 6-4: Ebox Microsequencer Control, Branch Format

 1 1 1|1 1
 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0
+-+-+---------+---------------+
|1|S|SEQ.COND |    BR.OFF     |
+-+-+---------+---------------+

Table 6-4: Ebox Microsequencer Control Microword Fields, Branch Format

Bit       Microword  Microword
Position  Field      Format     Description
14        SEQ.FMT    Both       Microsequencer format: Jump or Branch
13        SEQ.CALL   Both       Subroutine call
12:8      SEQ.COND   Branch     Microbranch condition select
7:0       BR.OFF     Branch     Page offset of next address

6.2 Ibox CSU Microcode

The Ibox complex specifier unit is controlled by a 29-bit microword, as shown in Figure 6-5. A brief description of each field is given in Table 6-5.

Figure 6-5: Ibox CSU Format

 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  ALU   |DL|   A    |   B    |  DST   |  MISC  |   MREQ    | MUX |        NXT         |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 6-5: Ibox CSU Microword Fields

Bit       Microword
Position  Field      Description
28:26     ALU        ALU function select
25        DL         Data length control
24:22     A          ALU A port select
21:19     B          ALU B port select
18:16     DST        Wbus destination
15:13     MISC       Miscellaneous function select
12:9      MREQ       Mbox request select
8:7       MUX_CNT    Next address mux select
6:0       NXT        Next address

6.3 Ibox Instruction ROM and Control PLAs

The Ibox instruction decode is controlled by several ROMs and PLAs that are generated from a single source file whose format is shown in Figure 6-6. A brief description of each field is given in Table 6-6. A more detailed description of the control information as it is actually found in the hardware is given in Table 7-12.

Figure 6-6: Ibox Instruction ROM Format

 64 63 62 61 60 59 58 57 56 55  54  53 52 51 50 49 48 47 46  45
+--+--+--+--+--+--+--+--+--+--+----+--+--+--+--+--+--+--+--+-----+
|        EXEC_DISP         |VS|ST_SPCQ|DS|B | V|FB| SP_CNT |A_CNT|
+--+--+--+--+--+--+--+--+--+--+----+--+--+--+--+--+--+--+--+-----+

 44 43 42 41 40 39 38 37 36 35 34 33 32
+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  A1_REG   |A1_DL| A1_AT  |  ASSIST1  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|E_DL |  AT6   | DL6 |  AT5   | DL5 |  AT4   | DL4 |  AT3   | DL3 |  AT2   | DL2 |  AT1   | DL1 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 6-6: Ibox Instruction ROM Fields

Bit       Microword
Position  Field      Description
64:56     EXEC_DISP  Bits <9:1> of the instruction entry point address in the Ebox control store
55        VS         Determines whether a Vfield specifier occupies 1 or 2 source queue entries
54:53     ST_SPCQ    Determines whether the parser is stopped at the end of the instruction,
                     when the next PC queue entry is made, and when the parser is restarted
52        DS         Specifies the length (byte or word) of a branch displacement for the instruction
51        B          Specifies whether the instruction has a branch displacement
50        V          Not currently used
49        FB         Specifies whether this instruction is implemented in the Fbox
48:46     SP_CNT     Specifies the number of real specifiers for the instruction
45        A_CNT      Specifies whether the instruction has an assist
44:41     A1_REG     Specifies the register to use for instructions with an assist
40:39     A1_DL      Specifies the data length to use for instructions with an assist
38:36     A1_AT      Specifies the access type to use for instructions with an assist
35:32     ASSIST1    Specifies the type of assist for instructions with an assist
31:30     E_DL       Specifies the initial Ebox data length to be used for the instruction
29:27     AT6        Supplies the encoded access type of the sixth specifier, if any
26:25     DL6        Supplies the encoded data length of the sixth specifier, if any
24:22     AT5        Supplies the encoded access type of the fifth specifier, if any
21:20     DL5        Supplies the encoded data length of the fifth specifier, if any
19:17     AT4        Supplies the encoded access type of the fourth specifier, if any
16:15     DL4        Supplies the encoded data length of the fourth specifier, if any
14:12     AT3        Supplies the encoded access type of the third specifier, if any
11:10     DL3        Supplies the encoded data length of the third specifier, if any
9:7       AT2        Supplies the encoded access type of the second specifier, if any
6:5       DL2        Supplies the encoded data length of the second specifier, if any
4:2       AT1        Supplies the encoded access type of the first specifier, if any
1:0       DL1        Supplies the encoded data length of the first specifier, if any

6.4 Revision History

Table 6-7: Revision History

Who              When         Description of change
Debra Bernstein  06-Mar-1989  Release for external review.
Mike Uhler       13-Dec-1989  Update for second-pass release.
Mike Uhler       04-Feb-1991  Update after pass 1 PG.

Chapter 7

The Ibox

7.1 Overview

7.1.1 Introduction

This chapter describes the Ibox section of the NVAX CPU chip. The 4-stage Ibox pipeline (S0..S3) runs semi-autonomously to the rest of the NVAX CPU and supports the following functions:

• Instruction Stream Prefetching
  The Ibox attempts to maintain sufficient instruction stream data to decode the next instruction or operand specifier.

• Instruction Parsing
  The Ibox identifies the instruction opcodes and operand specifiers, and extracts the information necessary for further processing.

• Operand Specifier Processing
  The Ibox processes the operand specifiers, initiates the required memory references, and provides the Ebox with the information necessary to access the instruction's operands.

• Branch Prediction
  Upon identification of a branch opcode, the Ibox hardware predicts the direction of the branch (taken vs. not taken). For branch taken predictions, the Ibox redirects the instruction prefetching and parsing logic to the branch destination, where instruction processing resumes.

Figure 7-1 is a top level block diagram of the Ibox showing the major Ibox sub-sections and their inter-connections. This chapter presents a high-level description of the Ibox functions, then provides details of the Ibox sub-sections which support each function.

Figure 7-1: Ibox Block Diagram
[Block diagram not recoverable from the scanned source. Legible block labels include BRANCH, BR OPCODE, DISP, IBU, OPCODE, ISSUE STALL, SPEC CTRL, SBU, OQU, CSU, and PC QUEUE.]

7.1.2 Functional Overview

The Ibox fetches, parses, and processes the instruction stream, attempting to maintain a constant supply of parsed VAX instructions available to the Ebox for execution. The pipelined nature of the NVAX CPU allows for multiple macroinstructions to reside within the CPU at various stages of execution. The Ibox, running semi-autonomously to the Ebox, parses the macroinstructions following the instruction that is currently in Ebox execution. Performance gains are realized when the time required for instruction parsing in the Ibox is hidden during the Ebox execution of an earlier instruction.

The Ibox places the information generated while parsing ahead into Ebox queues. The Instruction Queue contains instruction specific information which includes the instruction opcode, a floating point instruction flag, and an entry point for the Ebox microcode. The Source Queue contains information about the source operands for the instructions in the instruction queue. Source queue entries contain either the actual operand (as in a short literal), or a pointer to the location of the operand. The Destination Queue contains information required for the Ebox to select the location for execution results storage. The two possible locations are the VAX General Purpose Registers (GPRs) and memory.

These queues allow the Ibox to work in parallel with the Ebox. As the Ebox consumes the entries in the queues, the Ibox parses ahead adding more. In the ideal case, the Ibox would stay far enough ahead of the Ebox such that the Ebox would never have to stall because of an empty queue.
The Ibox needs access to memory for instruction and operand data. Instruction and operand data requests are made through a common port to the Mbox. All data for both the Ibox and the Ebox is returned on a shared M%MD_BUS_H<63:0>. The Ibox port feeds operand data requests to the Mbox Specifier Request Latch and instruction data requests to the Mbox Instruction Request Latch. These 2 latches allow the Ibox to issue memory requests for both instruction and operand data even though the Mbox may be processing other requests.

The Ibox supports 4 main functions:

1. Instruction Stream Prefetching
2. Instruction Parsing
3. Operand Specifier Processing
4. Branch Prediction

Instruction Stream Prefetching works to provide a steady source of instruction stream data for instruction parsing. While the instruction parsing logic works on one instruction, the instruction prefetching logic fetches several instructions ahead.

The Instruction Parsing logic parses the incoming instruction stream, identifying each of the instruction's components and performing initial processing on them. The instruction opcodes and associated information are passed directly into the Ebox instruction queue. Operand specifier information is passed on to the operand specifier processing logic.

The Operand Specifier Processing logic locates the operands in registers, in memory, or in the instruction stream. This logic places operand information in the Ebox source and destination queues, and makes the required operand memory requests.

The Ibox does not have prior knowledge of branch direction for branches which rely on Ebox condition codes. The Branch Prediction logic makes a prediction on which way the branch will go and forces the Ibox to take that path. This logic saves the alternate branch path target, so that in the event that Ebox branch execution shows that the prediction was wrong, the Ibox can be redirected to the correct branch direction.
7.1.3 The Pipeline

The Ibox logic spans the first 4 segments of the NVAX CPU pipeline (S0..S3). The following table lists the major Ibox sub-sections and which pipe segments they occupy.

Table 7-1: Ibox Pipeline

Sub-Section
Name        Description

S0 Pipe Stage
VIC         The Virtual Instruction Cache is a 2KB direct mapped Istream-only cache with
            32 byte blocks, a valid bit per quadword, and an access size of 8 bytes.
PFQ         The Prefetch Queue is a queue of instruction stream data supplied by the VIC.
            It is 4 bytes wide by 4 elements deep.

S1 Pipe Stage
IBU         The Instruction Burst Unit breaks up the incoming instruction data into opcodes,
            operand specifiers, specifier extensions, and branch displacements and passes
            the results to other parts of the Ibox for further processing.
IIU         The Instruction Issue Unit takes the opcodes provided by the IBU and generates
            Ebox microcode dispatch addresses and other context for instruction execution.
BPU         The Branch Prediction Unit predicts whether or not branches will be taken and
            redirects the Ibox instruction processing as necessary.
OQU         The Operand Queue Unit is the interface to the Ebox source and destination queues.
SBU         The Scoreboard Unit tracks outstanding read and write references to the GPRs.
CSU (S1)    This segment of the Complex Specifier Unit contains the microsequencer and
            control store.

S2 Pipe Stage
CSU (S2)    This is the register READ segment of the complex specifier unit. It accesses the
            necessary registers and provides the data to the ALU in the next pipe stage.

S3 Pipe Stage
CSU (S3)    This is the ALU and WRITE segment of the complex specifier unit. This segment
            performs the necessary ALU operations and writes the results either to the Ebox
            register file or to local temporary registers. This segment also contains the
            Mbox interface.
Pipe segment S0 is dedicated to supplying a steady stream of instruction data for use by the IBU. When prefetching is enabled, the VIC attempts to fill the PFQ with up to 8 bytes of instruction stream data.

As the IBU parses in S1, the Ebox receives information about the instruction and its operands in the instruction, source, and destination queues. The IIU is the Ibox interface to the Ebox instruction queue, and the OQU is the interface to the source and destination queues. When the IBU has identified a new opcode, this opcode is passed to the IIU, which places the necessary opcode-specific information in the Ebox instruction queue. When operand specifiers are identified, the OQU places the necessary operand-specific information in the source and destination queues.

The CSU is a 3 stage (S1..S3) microcoded pipeline dedicated to handling operand specifiers which require complex processing and/or access to memory. It has read and write access to the Ebox register file and a port to the Mbox. Memory requests from the VIC are received at the CSU and forwarded to the Mbox when there is a cycle free of specifier memory requests.

7.2 Instruction Stream Prefetching

The Instruction Stream Prefetching mechanism provides a buffer of Istream data 4 bytes wide and 4 elements deep for use by the instruction parser. This buffer insulates the instruction parser from the bursty behavior of the cache and memory sub-systems, and allows for the parallel operation of the instruction fetching and instruction parsing functions. The two Ibox sub-sections which support the instruction prefetching function are the Virtual Instruction Cache (VIC) and the Prefetch Queue (PFQ), both of which reside in the S0 pipe stage.

7.2.1 The VIC

The VIC is a 2KB, direct-mapped, Istream cache which acts as the primary source of instruction stream data for the Ibox. The VIC attributes are summarized in Table 7-2.
Table 7-2: VIC Attributes

Cache size        2 KBytes
Access Type       Direct Mapped
Block Size        32 Bytes
Sub-block Size    8 Bytes
Valid Bits        4 Valid bits/Cache Block = 1 Per Sub-block
Data Parity Bits  4 Even Parity bits/Cache Block = 1 Per Sub-block
# of Tags         64 Tags
Tag Parity Bit    1 Even Parity Bit Per Tag
Fill Algorithm    Fill Forward
Access Size       8 Bytes
Bus Size          8 Bytes
Prefetching       NONE
Data stored       Istream Only
Virtual/Physical  Virtual

Figure 7-2: VIC Block Diagram

[Block diagram not recoverable from the scanned source.]

The VIC is a virtual cache because the addresses that are used to index into the cache are untranslated VAX virtual addresses. See Section 12.5 for more on VAX Memory Management and Address Translation.

The VIC maintains a local prefetch pointer called VIBA<31:3> (Virtual Instruction Buffer Address). This address is quadword aligned and always points to the next quadword of Istream data to be sent to the PFQ. Table 7-3 shows the fields in VIBA<>.

Table 7-3: VIBA bit fields

Bit field  Field name    Description
<4:3>      SUBBLK_INDEX  Sub-block index (or column select) bits indicate which sub-block
                         to select from cache block.
<10:5>     ROW_INDEX     Row select bits determine which cache row to access
<31:11>    VIBA_TAG      Bits to be compared against cache tag

Whenever the BPU issues a new PC, the VIC latches the NEW_PC<31:3> in VIBA<31:3>. VIBA<10:5> are used to select which cache row to access.
Each cache row, shown in Figure 7-3, stores a 21-bit tag with even parity for the tag, and four quadword sub-blocks each with a valid bit and an even parity bit which covers the data only. When a cache row of the VIC is accessed, the 21-bit tag is compared with VIBA<31:11> to determine cache hit or miss. VIBA<4:3> selects the cache sub-block.

Figure 7-3: VIC Cache Row Format

+-----------+-+-+-+------------------+-+-+------------------+-+-+------------------+-+-+------------------+
| TAG<20:0> |P|V|P| Sub-block 3 data |V|P| Sub-block 2 data |V|P| Sub-block 1 data |V|P| Sub-block 0 data |
+-----------+-+-+-+------------------+-+-+------------------+-+-+------------------+-+-+------------------+
\------------------------------------------------ 287 bits -----------------------------------------------/

Whenever space exists in the PFQ, the VIC attempts to supply the next quadword of instruction stream data by doing a VIC_READ using the current value of VIBA<31:3>. If the VIC_READ results in a miss, the VIC begins a VIC_FILL sequence by sending a request through the CSU for a cache fill operation from the Mbox.

7.2.1.1 VIC Control

The VIC control evaluates the status flags summarized in Table 7-4 every cycle to determine the proper type of cache sequence for the next cycle. VIC_ENABLE enables the cache itself, specifically VIC_READs and VIC_WRITEs. PREFETCH_ENABLE is the enable bit for the Istream prefetch sequencer. VIC_ERROR indicates that there was a VIC parity error. MBOX_ERROR indicates that error status was reported by the Mbox. WRITE_PENDING indicates that the Mbox drove valid Istream data on the M%MD_BUS_H<63:0> last cycle, and a cache write cycle should begin next. The MISS_PENDING flag is set when a VIC_READ misses in cache, and remains set until the cache fill sequence terminates. LOAD_VIC_DATA indicates that VIC data is ready for the PFQ.
LOAD_MD_DATA indicates that the data on the M%MD_BUS_H<63:0> during a VIC fill should be loaded into the PFQ.

Table 7-4: VIC Status Flags

VIC Flag         Meaning
VIC_ENABLE       The VIC enable bit
PREFETCH_ENABLE  The prefetch enable bit
VIC_ERROR        There was a parity error in the VIC
MBOX_ERROR       There was an error in the Mbox fetching Istream data
WRITE_PENDING    Valid data latched from M%MD_BUS_H<63:0>, ready to be written to the VIC
MISS_PENDING     A VIC cache fill from the Mbox is in progress
VIC_READ         A cache read from the VIC is in progress

7.2.1.2 VIC Reads

The VIC starts a VIC_READ sequence when PREFETCH_ENABLE is set and WRITE_PENDING is clear. If VIC_ENABLE is set, the VIC_READ sequence accesses the cache using the address in VIBA<31:3>. The decode of VIBA<10:5> selects one of 64 cache rows. If TAG<20:0> matches VIBA<31:11> and the valid (V) bit for the sub-block selected by VIBA<4:3> is set, then there is a cache hit. The data from the sub-block selected by VIBA<4:3> is driven onto VIC_DATA_BUS<63:0>, LOAD_VIC_DATA is asserted if the PFQ is not full, and the data is loaded into the PFQ.

If VIBA<31:11> does not match TAG<20:0>, or the tag matches but the V bit for the selected sub-block is not set, then a cache miss has occurred. In this case, VIBA<31:3> is saved in MISS_ADDRESS<31:3> and the MISS_PENDING flag is set. The four data parity bits for the accessed cache block are latched in MISS_PARITY<3:0>. The four valid bits for the same cache block are latched in MISS_VALID<3:0> if the cache miss is caused by a clear sub-block valid bit. If the cache miss is caused by a tag miscompare then MISS_VALID<3:0> is cleared. VIC_WRITEs make use of MISS_ADDRESS<31:3>, MISS_PARITY<3:0>, and MISS_VALID<3:0>. A cache fill operation begins as described in Section 7.2.1.3.

If VIC_ENABLE is clear or the LOCK bit in the ICSR register is set, indicating a VIC parity error has occurred, then all VIC_READs are forced to miss.
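The hit/miss decision described above can be sketched in a few lines of Python. This is only an illustrative software model under stated assumptions, not the hardware implementation: the names `CacheRow`, `split_viba`, and `vic_read` are hypothetical, parity checking is left out, and the value returned for a forced miss is a placeholder because the text does not say what is latched in that case.

```python
from dataclasses import dataclass, field

NUM_ROWS = 64        # 2 KB cache / 32-byte blocks
NUM_SUBBLOCKS = 4    # quadword sub-blocks per cache block


@dataclass
class CacheRow:
    """Hypothetical model of one cache row (parity bits omitted)."""
    tag: int = 0                                                        # TAG<20:0>
    valid: list = field(default_factory=lambda: [False] * NUM_SUBBLOCKS)
    data: list = field(default_factory=lambda: [0] * NUM_SUBBLOCKS)     # one quadword each


def split_viba(viba):
    """Split VIBA into (tag <31:11>, row index <10:5>, sub-block index <4:3>)."""
    return (viba >> 11) & 0x1FFFFF, (viba >> 5) & 0x3F, (viba >> 3) & 0x3


def vic_read(rows, viba, vic_enable=True, lock=False):
    """Return (hit, data, miss_valid) for one VIC_READ.

    miss_valid models the value latched in MISS_VALID<3:0> on a miss:
    a copy of the row's valid bits when the tag matched but the selected
    sub-block was invalid, and all clear on a tag miscompare. It is None
    on a hit and on a forced miss (an assumption of this sketch).
    """
    if not vic_enable or lock:
        return False, None, None              # all reads are forced to miss
    tag, row_index, sub = split_viba(viba)
    row = rows[row_index]
    if row.tag != tag:
        return False, None, [False] * NUM_SUBBLOCKS   # tag miscompare
    if not row.valid[sub]:
        return False, None, list(row.valid)           # sub-block not valid
    return True, row.data[sub], None                  # cache hit
```

A caller would build `rows = [CacheRow() for _ in range(NUM_ROWS)]`, populate a row, and pass a quadword-aligned address; on a miss, the returned valid bits are what a later write sequence would merge with the newly filled sub-block.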
7.2.1.3 VIC Fills

Upon detection of a cache miss during a VIC_READ, the VIC issues a fill request to the CSU. The miss address, stored in MISS_ADDRESS<31:3>, is driven onto VIC_REQ_ADDR<31:3> and VIC_REQ is asserted. The CSU forwards the VIC_REQ to the Mbox during the next free cycle on the I%IBOX_ADDR_H<> bus and associated control lines.

The Mbox returns quadwords of instruction data starting with the requested quadword and continuing to the end of the block. This cache fill algorithm is called fill forward. If the Mbox goes off-chip to get the requested data, then a full cache block of instruction data is returned, but not necessarily in any particular order. If the Mbox processes the fill request and finds that the request resides in I/O space, the request is also sent off-chip. In this case only the single requested quadword of data returns to the VIC. In all cases, the VIC is unaware of the number of data blocks being returned. When the last block of data is being returned by either the Cbox or Mbox, M%LAST_FILL_H is signaled, allowing MISS_PENDING to be cleared and a new read begun.

7.2.1.4 VIC Writes

The assertion of M%VIC_DATA_L indicates the presence of Istream data on M%MD_BUS_H<63:0>. The VIC latches M%MD_BUS_H<63:0> in FILL_DATA<63:0>, M%MD_BUS_QW_PARITY_L<0> in FILL_DATA_PARITY<0>, M%QW_ALIGNMENT_H<1:0> in MISS_ADDRESS<4:3>, and sets WRITE_PENDING.

If VIC_ENABLE is set, then a VIC_WRITE commences the next cycle using the address stored in MISS_ADDRESS<31:3> and the data stored in FILL_DATA<63:0>. MISS_ADDRESS<10:5> selects the cache row to write and MISS_ADDRESS<4:3> selects the sub-block to write. TAG<20:0> and its parity bit for the selected row are written with MISS_ADDRESS<31:11> and the even parity calculated for these bits. The selected sub-block is written with FILL_DATA<63:0>.
MISS_PARITY<3:0> and MISS_VALID<3:0> contain the four data parity bits and four valid bits for the cache block being filled. The parity bit in MISS_PARITY<3:0> indexed by MISS_ADDRESS<4:3> is associated with the sub-block being written. This parity bit is written with M%MD_BUS_QW_PARITY_L<0>. The valid bit in MISS_VALID<3:0> indexed by MISS_ADDRESS<4:3> is associated with the sub-block being written. This valid bit is set. Both MISS_PARITY<3:0> and MISS_VALID<3:0> are written into the cache array.

There may be up to four VIC_WRITEs for each VIC_FILL depending upon sub-block alignment and fill sequence. However, the cache block tag and tag parity, all four data parity bits, all four data valid bits, and one sub-block of data are all written with every VIC_WRITE.

If VIC_ENABLE is clear, VIC_WRITEs are disabled, but the cache fill sequence completes normally. See Section 7.2.1.7 for information on M%HARD_ERR_H and M%MME_FAULT_H.

7.2.1.5 VIC Bypass

When fill data arrives at the VIC on the M%MD_BUS_H<63:0>, an evaluation is done to determine if the incoming data should be loaded directly into the PFQ. If so, then the PFQ latches the data directly from the M%MD_BUS_H<63:0> and VIBA is incremented by 8. This action is referred to as a VIC bypass and is signaled to the PFQ by LOAD_MD_DATA. Note that a VIC_WRITE occurs regardless of the outcome of the evaluation and whether or not the VIC bypass is enabled. If PFQ_FULL from the PFQ is asserted, indicating the PFQ is full, then LOAD_MD_DATA is not asserted and VIBA is not incremented.

The evaluation consists of checking to make sure that the incoming data is for the same cache block and sub-block to which VIBA points. The only time VIBA can be pointing to a different block than the block for which data is returning is if a previous VIC bypass or Hit-Under-Miss incremented VIBA across a cache block boundary. This circumstance is indicated by a VIBA_NEW_BLOCK flag.
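As a rough illustration of the bypass evaluation, the following Python sketch decides whether LOAD_MD_DATA would be asserted for one piece of fill data and what VIBA becomes afterward. The function name `vic_bypass` and its argument names are hypothetical. The sketch assumes the quadword index delivered with the fill data (the value latched into MISS_ADDRESS<4:3>) is available as `qw_alignment`, assumes VIBA wraps at 32 bits, and assumes the new-block flag is set exactly when the increment crosses a 32-byte block boundary. Other blocking conditions, such as error flags, are not modeled.

```python
BLOCK_SHIFT = 5   # 32-byte cache blocks


def vic_bypass(viba, qw_alignment, pfq_full, viba_new_block):
    """Return (load_md_data, next_viba, next_new_block) for one fill quadword.

    viba            current prefetch pointer (quadword aligned)
    qw_alignment    2-bit quadword index of the incoming fill data
    pfq_full        True when the PFQ cannot accept data
    viba_new_block  True when VIBA already points outside the block being filled
    """
    same_subblock = ((viba >> 3) & 0x3) == (qw_alignment & 0x3)
    load_md_data = (not pfq_full) and (not viba_new_block) and same_subblock
    if not load_md_data:
        return False, viba, viba_new_block    # no bypass: VIBA is unchanged
    next_viba = (viba + 8) & 0xFFFFFFFF       # advance to the next quadword
    crossed = (next_viba >> BLOCK_SHIFT) != (viba >> BLOCK_SHIFT)
    return True, next_viba, crossed
```

Keeping the decision as a pure function of the current flags mirrors the every-cycle evaluation style used elsewhere in this section; the cache write itself happens regardless of the result and is therefore not part of the function.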
In order to facilitate VIC bypass, the Mbox returns M%QW_ALIGNMENT_H<1:0> with each piece of fill data. These two bits represent the quadword index for this data within the hexaword cache block. If VIBA_NEW_BLOCK is clear and M%QW_ALIGNMENT_H<1:0> match VIBA<4:3> then the incoming data can be loaded into the PFQ. When VIBA_NEW_BLOCK is set, indicating that the data the PFQ is waiting for is not in the block being filled by the Mbox, then VIC bypass is blocked and LOAD_MD_DATA is not asserted.

7.2.1.6 VIC Hits Under Miss

If the last VIC_WRITE was also a VIC bypass condition, then VIBA increments and potentially points to valid data in the current or next cache block. A subsequent VIC_READ is permitted even when MISS_PENDING is still set. This is referred to as a VIC Hit-Under-Miss. If the VIC_READ during MISS_PENDING also misses, no cache fill request is started. MISS_ADDRESS, MISS_PARITY<3:0>, and MISS_VALID<3:0> are not updated on a second miss. Note that VIC_READs may start and stop during a fill sequence based on VIC_WRITEs, but they always restart at the termination of a fill sequence when M%LAST_FILL_H is signaled.

7.2.1.7 VIC Exceptions and Errors

The VIC interprets the Mbox exception and error signals during the VIC_WRITE sequence. The M%MME_FAULT_H signal indicates that the Mbox encountered a memory management exception during the processing of an instruction stream reference. The Mbox produces the M%HARD_ERR_H signal when a hardware error is detected during the processing of an instruction stream reference.

When M%VIC_DATA_L indicates the presence of data from the Mbox on the M%MD_BUS_H<63:0>, the assertion of either M%MME_FAULT_H or M%HARD_ERR_H blocks the setting of the WRITE_PENDING flag. M%MME_FAULT_H and M%HARD_ERR_H set the error flags IMMGT_EXC and MHARD_ERR, respectively. These flags are sent directly to the IBU.
They are also used to disable prefetching and block VIC bypass until they are cleared either by an E%STOP_IBOX_H from the Ebox or a LOAD_NEW_PC from the BPU. They are also cleared by E%IBOX_LOAD_PC_L, which indicates an impending LOAD_NEW_PC.

The VIC checks tag and data parity during VIC_READs. Parity is calculated for the data sub-block selected by VIBA<4:3>. The even parity value for the quadword of data is then compared to the parity (P) bit associated with the sub-block read from cache. Data parity miscompares are reported as parity errors only on valid data. The even parity value for VIC_TAG<20:0> is calculated on VIC_READs and compared to the parity (P) bit from the array that is associated with the tag read. Tag parity miscompares are always reported as parity errors.

When the VIC detects either parity error, it clears PREFETCH_ENABLE, disabling VIC prefetching, and sets the LOCK bit in the ICSR register, preventing further cache reads and writes. The VIC asserts IHARD_ERR to forward the error condition to the IBU. IHARD_ERR remains asserted until it is cleared by an E%STOP_IBOX_H. The error status bits are set appropriately in the ICSR IPR and the address of the error is latched in the VMAR register, as explained in Section 7.2.1.16. In addition, the VIC requests a system soft error interrupt by asserting I%IBOX_S_ERR_L.

VIC tag and data parity checking are done specifically to protect the data in the VIC arrays. Refer to Section 7.9.2 for details on the IBU handling of Istream errors.

7.2.1.8 PC Load Effects

The assertion of LOAD_NEW_PC by the BPU has the following effects:

1. PREFETCH_ENABLE is set.
2. VIBA is loaded. VIBA<31:3> is loaded from the global Ibox bus NEW_PC<31:3>.
3. MHARD_ERR is cleared.
4. IMMGT_EXC is cleared.
5. MISS_PENDING is cleared.
6. WRITE_PENDING is cleared.
7. VIC_READ is set.
8. I%FLUSH_IREF_LAT_H is asserted by the BPU to the Mbox.

The VIC reacts to any LOAD_NEW_PC from the BPU on a cycle by cycle basis as follows:

Cycle N:
• The Ibox may make an Istream request this cycle.
• Fill data returning from the Mbox to the Ibox is ignored.

Cycle N+1:
• LOAD_NEW_PC is asserted to redirect instruction flow.
• I%FLUSH_IREF_LAT_H is asserted to clear outstanding Istream references.
• The Ibox may make an Istream request this cycle which is ignored by the Mbox.
• Fill data returning from the Mbox to the Ibox is ignored. This is the last cycle in which fill data for the Istream being flushed can be sent.
• Prefetching is enabled if previously disabled.
• MISS_PENDING is cleared and VIC_READ is set.
• New VIC hit or miss is determined.

Cycle N+2:
• The Ibox may make a new Istream request based on whether the VIC hit or missed.
• MISS_PENDING may be set and VIC_READ cleared if a VIC miss was determined.
• The Mbox may not send Istream data for the old Istream request to the Ibox.

Section 7.6 and Section 7.5.1.7 explain more about PC loads.

7.2.1.9 E%STOP_IBOX_H Effects

The assertion of E%STOP_IBOX_H by the Ebox has the following effects:

1. PREFETCH_ENABLE is cleared.
2. MHARD_ERR is cleared.
3. IMMGT_EXC is cleared.
4. IHARD_ERR is cleared.
5. MISS_PENDING is cleared.
6. WRITE_PENDING is cleared.
7. VIC_READ is cleared.
8. I%FLUSH_IREF_LAT_H is asserted by the BPU.

The VIC reacts to an E%STOP_IBOX_H on a cycle by cycle basis as follows:

Cycle N:
• E%STOP_IBOX_H is asserted.
• The Ibox may make an Istream request this cycle.
• Fill data returning from the Mbox to the Ibox is ignored.

Cycle N+1:
• I%FLUSH_IREF_LAT_H is asserted to clear outstanding Istream references.
• The Ibox will not make an Istream request this cycle.
• Fill data returning from the Mbox to the Ibox is ignored.
  This is the last cycle in which fill data for the Istream being flushed can be sent.
• Prefetching is disabled.
• MISS_PENDING and VIC_READ are cleared, and the VIC is put into an idle state, waiting for an E%IBOX_LOAD_PC_L from the Ebox.

7.2.1.10 Prefetch Stop Conditions

PREFETCH_ENABLE is cleared in the following cases:

1. Any VIC error, Mbox error, or Mbox exception: when a VIC error is detected or an Mbox error is reported.
2. E%STOP_IBOX_H signaled by Ebox: when the Ebox microcode performs a MISC/RESET_CPU, which asserts E%STOP_IBOX_H.
3. STOP_VIC_PREFETCH: the STOP_PARSER bit from the IROM stops Ibox prefetching for those instructions expected to redirect the instruction flow or access the IPRs.

7.2.1.11 Prefetch Start Conditions

PREFETCH_ENABLE is set in the following cases:

1. PC load: on all PC loads.
2. E%RESTART_IBOX_H signaled by Ebox: when the Ebox microcode performs an E%RESTART_IBOX_H, unless there is an outstanding VIC or Mbox error, or a PC load by the Ebox is pending, as signaled by E%IBOX_LOAD_PC_L.

7.2.1.12 Prioritized List of Prefetch Start/Stop Conditions

The following priority is followed when multiple prefetch start/stop conditions occur simultaneously:

1. E%STOP_IBOX_H: stops prefetching
2. PC Load: starts prefetching
3. E%IBOX_LOAD_PC_L: stops prefetching (a PC load is pending)
4. Any VIC or Mbox Error or Exception: stops prefetching
5. E%RESTART_IBOX_H: starts prefetching
6. STOP_VIC_PREFETCH: stops prefetching

7.2.1.13 VIC Enable

The VIC powers up with VIC_ENABLE clear. VIC_ENABLE can be set and cleared during normal operation through the IPR described in Section 7.2.1.16. VIC_ENABLE is cleared by hardware when any VIC parity error is detected.

MACROCODE RESTRICTION

In functional operation, an REI must precede the MTPR which enables the VIC in order to flush all of the valid bits.
However, if all the valid bits are guaranteed to have been written with a known value (such as in diagnostics or in macrocode that initializes the entire VIC), then this REI may be omitted.

7.2.1.14 VIC Flushing

The Ebox asserts E%FLUSH_VIC_H under microcode control to flush the VIC (clear all data valid bits). VIC flushes occur in such instances as the REI instruction, machine checks, and certain exceptions and interrupts.

MICROCODE RESTRICTION

The Ebox microcode guarantees that prefetching is disabled whenever E%FLUSH_VIC_H is asserted, either implicitly in the context of an instruction with a STOP_PARSER assist or by performing an explicit E%STOP_IBOX_H.

The VIC reacts to an E%FLUSH_VIC_H on a cycle by cycle basis as follows:

Cycle N:
• Prefetching has already been disabled.
• E%FLUSH_VIC_H is asserted.
• The Ibox may make an Istream request this cycle.
• Fill data returning from the Mbox to the Ibox is ignored.

Cycle N+1:
• I%FLUSH_IREF_LAT_H is asserted to clear outstanding Istream references.
• The Ibox will not make an Istream request this cycle.
• Fill data returning from the Mbox to the Ibox is ignored. This is the last cycle in which fill data for the Istream being flushed can be sent.

7.2.1.15 Flushing IREFs

The signal I%FLUSH_IREF_LAT_H is asserted by the BPU whenever a new PC is loaded, indicating a redirection of the Istream. It is also asserted whenever there is an E%STOP_IBOX_H or an E%FLUSH_VIC_H from the Ebox. In all cases, the Mbox may continue to return VIC fill data in the same cycle as the I%FLUSH_IREF_LAT_H, but not the following cycle. The VIC will ignore any fill data received in the same cycle or the one cycle previous to the cycle in which I%FLUSH_IREF_LAT_H is signaled.

7.2.1.16 VIC Control and Error Registers

The VIC contains 4 internal processor registers (IPRs) which provide VIC control and read/write access to the arrays.
MACROCODE RESTRICTION
VIC_ENABLE must be cleared before writing to the VIC IPRs: VMAR, VDATA, or VTAG. VIC_ENABLE must be cleared before reading from VIC IPRs: VDATA, VTAG. In functional operation, an REI must precede the MTPR which enables the VIC.

See Section 7.4.2.8 for details of the IPR mechanism.

Figure 7-4: IPR D0 (hex), VMAR
Bit layout, high to low: ADDR <31:11>, ROW_INDEX <10:5>, SUB_BLOCK <4:3>, LW <2>, zero <1:0>.

Table 7-5: VMAR Field Descriptions

Name        Extent   Type   Description
LW          2        WO     Longword select bit. Selects longword of sub-block for cache access.
SUB_BLOCK   4:3      RW     Sub-block select. Selects data sub-block for cache access; also latches VIBA<4:3> on VIC parity errors.
ROW_INDEX   10:5     RW     Row select. Row index for read and write access to cache array; also latches VIBA<10:5> on VIC parity errors.
ADDR        31:11    RO     Error address field. Latches tag portion of VIBA on VIC parity errors.

When the VIC is disabled, the VIC Memory Address Register (VMAR) may be used as an index for direct IPR access to the cache arrays. VMAR<10:5> supply the cache row index, VMAR<4:3> supply the cache sub-block, and VMAR<2> indicates the longword within a quadword address. VMAR also latches and holds the VIBA<31:3> on VIC array parity errors.
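The VMAR field boundaries in Table 7-5 can be illustrated with a small software model. This is only a sketch for readers working with register dumps or simulators; the function names `decode_vmar` and `encode_vmar_index` are hypothetical and are not part of the specification.

```python
# Illustrative model of the VMAR field layout from Table 7-5.
# Function names are hypothetical; only the bit positions come from the table.

def decode_vmar(vmar: int) -> dict:
    """Split a 32-bit VMAR value into its named fields."""
    vmar &= 0xFFFFFFFF
    return {
        "ADDR": (vmar >> 11) & 0x1FFFFF,  # bits <31:11>, tag portion of VIBA
        "ROW_INDEX": (vmar >> 5) & 0x3F,  # bits <10:5>, cache row
        "SUB_BLOCK": (vmar >> 3) & 0x3,   # bits <4:3>, quadword sub-block
        "LW": (vmar >> 2) & 0x1,          # bit <2>, longword within quadword
    }


def encode_vmar_index(row_index: int, sub_block: int, lw: int) -> int:
    """Build a VMAR index value for direct access to the cache arrays."""
    if not (0 <= row_index < 64 and 0 <= sub_block < 4 and lw in (0, 1)):
        raise ValueError("field out of range")
    return (row_index << 5) | (sub_block << 3) | (lw << 2)
```

For example, `encode_vmar_index(0x2A, 3, 1)` yields `0x55C`, which `decode_vmar` splits back into row 0x2A, sub-block 3, and the high longword.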
Figure 7-5: IPR D1 (hex), VTAG
Bit layout, high to low: TAG <31:11>, TP <8>, DP <7:4>, V <3:0>.

Table 7-6: VTAG Field Descriptions

Name   Extent   Type   Description
V      3:0      RW     Data valid bits. Supply data valid bits on array read/writes.
DP     7:4      RW     Data parity bits. Supply data parity on array read/writes.
TP     8        RW     Tag parity bit. Supplies tag parity on tag array read/writes.
TAG    31:11    RW     Tag. Supplies tag on tag array read/writes.

The VTAG IPR provides read and write access to the cache tag array. An IPR write to VTAG will write the contents of the M%MD_BUS_H<63:0> to the tag, parity, and valid bits for the row indexed by VMAR<10:5>. VTAG<31:11> are written to the cache tag. VTAG<8> is written to the associated tag parity bit. VTAG<7:4> are used to write the four data parity bits associated with the indexed cache row. Similarly, VTAG<3:0> write the four data valid bits associated with the cache row.

DP<3:0> and V<3:0> are the data parity and data valid bits, respectively, for the 4 quadwords of data in the same row. DP<0> and V<0> correspond to the quadword of data addressed when address bits 4:3 = 00, DP<1> and V<1> correspond to the quadword of data addressed when address bits 4:3 = 01, etc.
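The correspondence between address bits 4:3 and the per-quadword DP and V bits can be sketched in software as follows. The function names `decode_vtag` and `quadword_status` are hypothetical; only the bit positions are taken from Table 7-6.

```python
# Illustrative model of the VTAG fields and the per-quadword valid/parity
# selection described above. Function names are hypothetical.

def decode_vtag(vtag: int) -> dict:
    """Split a 32-bit VTAG value into its named fields."""
    vtag &= 0xFFFFFFFF
    return {
        "TAG": (vtag >> 11) & 0x1FFFFF,  # bits <31:11>
        "TP": (vtag >> 8) & 0x1,         # bit <8>, tag parity
        "DP": (vtag >> 4) & 0xF,         # bits <7:4>, one parity bit per quadword
        "V": vtag & 0xF,                 # bits <3:0>, one valid bit per quadword
    }


def quadword_status(vtag: int, address: int) -> tuple:
    """Return (valid, data_parity) for the quadword selected by address<4:3>."""
    quad = (address >> 3) & 0x3          # 00 selects DP<0>/V<0>, 01 selects DP<1>/V<1>, ...
    fields = decode_vtag(vtag)
    valid = bool((fields["V"] >> quad) & 1)
    parity = (fields["DP"] >> quad) & 1
    return valid, parity
```

With V = 0010 (binary), only the quadword at address bits 4:3 = 01 is reported valid.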
Figure 7-6: IPR D2 (hex), VDATA
Bit layout: DATA <31:0>.

Table 7-7: VDATA Field Descriptions

Name   Extent   Type   Description
DATA   31:0     RW     Data for data array reads and writes.

The VDATA IPR provides read and write access to the cache data array. When VDATA is written, the cache data array entry indexed by VMAR is written with the IPR data. Since the IPR data is a longword, two accesses to VDATA are required to read or write a quadword cache sub-block.

Writes to VDATA with VMAR<2> = 0 simply accumulate the IPR data destined for the low longword of a sub-block in FILL_DATA<31:0>. A subsequent write to VDATA with VMAR<2> = 1 directs the IPR data to FILL_DATA<63:32>, and triggers a cache write sequence to the sub-block indexed by VMAR.

Reads to VDATA with VMAR<2> = 0 trigger a cache read sequence to the sub-block indexed by VMAR. The low longword of the sub-block is returned as IPR read data. A read of VDATA with VMAR<2> = 1 returns the high longword of the sub-block as IPR data.

Figure 7-7: IPR D3 (hex), ICSR
Bit layout, high to low: TPERR <4>, DPERR <3>, LOCK <2>, ENABLE <0>.

Table 7-8: ICSR Field Descriptions

Name     Extent   Type   Description
ENABLE   0        RW,0   Enable Bit. When set, allows cache access to the VIC. Initializes to 0 on RESET.
LOCK     2        WC     Lock Bit. When set, validates and prevents further modification of the error status bits in the ICSR and the error address in the VMAR register. When clear, indicates no VIC parity error has been recorded and allows ICSR and VMAR to be updated.
DPERR    3        RO     Data Error Bit. When set, indicates data parity error occurred in data array if the Lock Bit is also set.
TPERR    4        RO     Tag Error Bit. When set, indicates tag parity error occurred in tag array if the Lock Bit is also set.

The ICSR IPR provides control and status functions for the Ibox. VIC tag and data parity errors are latched in the read-only ICSR<4:3>, respectively. ICSR<2> is set when a tag or data parity error occurs and keeps the error status bits and the VMAR register from being modified further. Writing a logic one to ICSR<2> clears the LOCK bit and allows the error status to be updated.

When ICSR<2> is clear, the values in ICSR<4:3> are meaningless. When ICSR<2> is set, a VIC parity error has occurred, and either ICSR<4> or ICSR<3> will be set indicating that the parity error was either a tag parity error or a data parity error, respectively. ICSR<4:3> cannot be cleared from software.

ICSR<0> provides IPR control of the VIC enable. It is cleared on RESET.

7.2.1.17 VIC Performance Monitoring Hardware

Hardware exists in the Ibox VIC to support the NVAX Performance Monitoring Facility. See Chapter 18 for a global description of this facility. The VIC hardware generates two signals, I%PMUX0_H and I%PMUX1_H, which are driven to the central performance monitoring hardware residing in the Ebox. These two signals are used to supply VIC hit rate data to the performance monitoring counters.

I%PMUX0_H is asserted the cycle when a VIC read reference is first attempted while the prefetch queue is not full. I%PMUX1_H signals the hit status for this event in the same cycle.
The data is captured only on the first read reference that could be used by the PFQ to avoid skewed hit ratios caused by multiple hits or misses to the same reference while the prefetch queue is full or the VIC is waiting for a cache fill.

7.2.2 The Prefetch Queue

The PFQ is a 4-longword-deep queue for Istream data. When prefetching is enabled, the VIC controls the supply of data to the PFQ. The PFQ can accept one quadword of data each cycle. When the PFQ contains insufficient available space to load another quadword of data, it asserts PFQ_FULL, which prevents the VIC from loading additional data into the PFQ. When the PFQ contains no unused Istream data, it asserts PFQ_EMPTY and sends it to the IBU.

The PFQ loads data from the M%MD_BUS_H<63:0> or VIC_DATA_BUS as directed by the load signals LOAD_MD_DATA and LOAD_VIC_DATA from the VIC. LOAD_MD_DATA is asserted by the VIC only when there are no errors associated with the data. Data loaded from the VIC_DATA_BUS must be conditioned with the error signal IHARD_ERR. If LOAD_VIC_DATA and IHARD_ERR are both asserted, corrupted data is loaded into the PFQ from the VIC_DATA_BUS. To prevent this data from being used, the IBU reports the error immediately and stops parsing data.

The PFQ determines the number of valid unused bytes of Istream data available for parsing and sends this information to the IBU on AVAIL_LE<5:0>. When the IBU retires Istream data, it signals the number of Istream bytes retired to the PFQ on I_IBU%RETIRE_SPEC_H<5:0> and I_IBU%RETIRE_OPCODE. These two signals are used to update the pointers in the PFQ.

The output of the PFQ is directed through a MUX which aligns the data for use by the IBU. The alignment MUX takes the first and second longwords and the first byte from the third longword as inputs. The alignment MUX outputs 6 contiguous bytes starting from any byte in the first longword, based on the PFQ pointers.
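The behavior of the alignment MUX can be pictured with a short software model. This is a sketch only: the function name `align_mux` and the representation of the pointer as a byte offset 0..3 are assumptions, and the little-endian byte order within each longword follows the usual VAX convention rather than anything stated in this section.

```python
# Illustrative model of the PFQ alignment MUX: 9 input bytes, 6 output bytes.
# The function name and the pointer encoding (byte offset 0..3) are assumptions.

def align_mux(lw0: int, lw1: int, lw2: int, ptr: int) -> bytes:
    """Return 6 contiguous Istream bytes starting at byte `ptr` of the first longword."""
    if not 0 <= ptr <= 3:
        raise ValueError("pointer must select a byte in the first longword")
    window = (
        (lw0 & 0xFFFFFFFF).to_bytes(4, "little")       # first longword, all 4 bytes
        + (lw1 & 0xFFFFFFFF).to_bytes(4, "little")     # second longword, all 4 bytes
        + (lw2 & 0xFFFFFFFF).to_bytes(4, "little")[:1] # only the first byte of the third
    )
    return window[ptr:ptr + 6]
```

The worst case is a pointer of 3, where the 6-byte window ends exactly at the first byte of the third longword, which is why only that one byte of the third longword is needed as an input.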
7.2.2.1 PC load effects

The PFQ is flushed when the BPU broadcasts a new PC load as indicated by I_BPU%LOAD_NEW_PC and when the Ebox asserts E%IBOX_LOAD_PC_L. In addition, when the BPU loads the PC, bits <2:0> of the new PC are decoded and used to set the PFQ pointer.

7.3 Instruction Parsing

The instruction parser identifies the different components of incoming VAX instructions and forwards those components to other parts of the Ibox for further processing. The instruction parser contains two logic sub-sections - the Instruction Burst Unit (IBU) and the Instruction Issue Unit (IIU).

Figure 7-8: Prefetch Queue Block Diagram (the diagram shows the PFQ source MUX, the queue entries, and the PFQ alignment MUX, which is steered by the PFQ pointer and delivers data to the IBU)

The IBU parses incoming instruction data into Opcodes, Operand Specifiers and Specifier Extensions, and Branch Displacements. This information is then passed on to the operand specifier processing logic. The opcode is also sent to the IIU, which generates an Ebox microcode entry point for this opcode and places it and other needed information in the instruction queue in the Ebox. See Table 7-15 for more information on the format of the Ebox instruction queue.

Instruction parsing is logically divided into 2 distinct activities: instruction issue and specifier identification, and branch displacement and Ebox assist processing.

The instruction issue and specifier identification activity starts when a new opcode is loaded by the IBU. The IBU sends the opcode to the IIU for issuing to the Ebox.
The instruction opcode is also used to determine the number of operand specifiers and branch displacements associated with the instruction. In parallel with instruction issue, the IBU identifies the operand specifiers.

When all the operand specifiers are processed, the IBU begins the branch displacement and Ebox assist processing activity. The branch displacement (if present) is sent to the BPU, and Ebox assist specifiers (if present) are processed. See Section 7.3.2.7 for more on Ebox assists.

7.3.1 VAX Instruction Format

There are 3 components in VAX instructions: opcodes, operand specifiers and specifier extensions, and branch displacements. The 1 or 2 byte opcode specifies the function to be performed. Operand specifiers with potential extensions range from 1 to 9 bytes and specify an instruction operand or operand location. The 1 or 2 byte branch displacements are signed offsets used to compute the destination PC in branch instructions.

A VAX instruction is composed of an opcode and optionally up to 6 operand specifiers and one branch displacement. For a given opcode, the number of operand specifiers and branch displacements is fixed. The instruction opcode is the first one or two bytes in the instruction, followed by the operand specifiers, followed by the branch displacement, all at successively increasing addresses. All references to opcodes in this section refer to one-byte opcodes unless specified otherwise.

For more information on VAX instruction formats, opcodes, and operand specifiers, see DEC STD 032, VAX Architecture Standard.

7.3.2 The Instruction Burst Unit

The IBU bursts apart Istream data into its component parts: opcodes, operand specifiers, and branch displacements. The IBU is capable of identifying an opcode and one operand specifier each cycle. Operand specifiers are categorized according to their Addressing Mode as being either simple or complex.
Simple specifiers are register mode (Addressing Mode 5) and short literal (Addressing Modes 0 .. 3). All other specifier types, including assists, are considered complex.

The IBU retires up to 6 bytes of data from the PFQ each cycle. New data is available from the PFQ at the beginning of a cycle. The IBU sends the number of specifier bytes being retired back to the PFQ so that new data is available for processing by the next cycle.

Instruction components extracted from the Istream data are sent to other parts of the Ibox for further processing. The opcode is sent to the IIU and the BPU on OPCODE<8:0>. The specifiers, except for branch displacements, are sent to the CSU, the SBU and the OQU via SPEC_CTRL<21:0>. Branch displacements are sent to the BPU on B_BRANCH_DISP<7:0> and SPEC_DATA<7:0>.

The specifier control field SPEC_CTRL<21:0> contains information about the specifier being retired each cycle. SPEC_CTRL<21:14> and SPEC_DATA<31:0> contain information used in processing complex specifiers. Table 7-9 describes the information contained on these busses.

Table 7-9: Specifier Control Fields

Bit Field   Field Name         Description
<0>                            This bit is set if the specifier is a short literal.
<6:1>       RN/SHORT LITERAL   Contains a 6-bit short literal if the shlit flag is set. <4:1> contains the general purpose register number associated with the specifier if the shlit flag is not set, in which case <6:5> are not used.
<9:7>                          Access type of the instruction operand with which this operand specifier is associated.
<11:10>     DL                 Data length of the instruction operand with which this operand specifier is associated.
<12>        VALID              Flags data valid on the bus.
<13>        COMPLEX            This bit is set if this is a complex specifier.

If the IBU is retiring a specifier, SPEC_CTRL<21:0> and SPEC_DATA<31:0> contain information about the specifier being retired.
SPEC_CTRL<21:14> and SPEC_DATA<31:0> contain valid data used by the CSU only when the specifier is complex. If a simple specifier is being retired, the information on SPEC_CTRL<21:14> is invalid and not used by the CSU, and the complex flag SPEC_CTRL<13> is not set. Table 7-10 describes the fields in SPEC_CTRL<21:14> used for complex specifiers. Table 7-11 describes the fields in SPEC_DATA<31:0> used by the CSU and BPU. When displacement and displacement deferred mode specifiers are processed, byte and word data length specifiers are sign extended to longword data length on SPEC_DATA<31:0>.

Table 7-10: Complex Specifier Control Fields

Bit Field   Field Name   Description
<16:14>     DISPATCH     Dispatch address for Complex Specifier Unit Control Store.
<17>        AT_RMW       1 if access type of operand is R, M or W.
<18>        INDEXED      This bit is set if mode of previous specifier is index.
<19>        ASSIST       This bit is set if this is an Ebox assist specifier.
<20>        PC_MODE      This flag is set if bits <3:0> of the specifier point to GPR 15 = PC.
<21>        JMP_OR_JSB   This bit is set if this instruction is a JMP or JSB.

Table 7-11: Specifier Data Fields

Bit Field   Description
<7:0>       Upper order byte of word displacement if branch displacement is being processed. Otherwise, the lower order byte of data for immediate and displacement mode specifiers.
<31:8>      Upper 3 bytes of data for immediate and displacement mode specifiers.

7.3.2.1 Specifier Identification

In the instruction issue and specifier identification phase of instruction parsing, operand specifiers are parsed, and the necessary information about each specifier is sent to the specifier processing logic. The information needed by the Ebox to process the instruction is also identified and sent to the IIU.
Each time a new opcode is loaded in the IBU, instruction context for that opcode is extracted from PLAs, complementary logic, and the Instruction ROM (IROM). This information is summarized in Table 7-12. As each specifier is identified, the current SPEC_COUNT is decremented. When this counter reaches 0, the IBU enters the next phase of instruction parsing, Ebox assists and branch displacements processing.

Table 7-12: Instruction Context Summary

Field Name                          # bits   Description

Instruction context stored in the IROM
                                    3        Number of specifiers for this instruction
STP_SUPPRESS_PCQ/STP_RESTART_IBOX   2        0/0: Do not stop parser, make a PC queue entry for the next instruction.
                                             0/1: Stop parser at the end of the instruction, make a PC queue entry for the next instruction, and restart parser on E%RESTART_IBOX_H.
                                             1/0: Stop parser at the end of the instruction, suppress PC entry for next instruction until LOAD_NEW_PC is received, and restart parser on LOAD PC. See Table 7-14.
                                             1/1: Stop parser at the end of the instruction, suppress PC queue entry for next instruction until LOAD_NEW_PC is received, restart parser on E%RESTART_IBOX_H.
ASSIST                              1        Number of Ebox assists for this instruction
                                    3        Assist dispatch
                                    2        Access type for Ebox assist
                                    1        Data length for Ebox assist
A_REG                               1        Register for Ebox assist
AT1                                 3        Access type for specifier #1
AT2                                 3        Access type for specifier #2
DL1                                 2        Data length for specifier #1
DL2                                 2        Data length for specifier #2
                                    1        1 when this is an Fbox instruction
DISPATCH                            9        Ebox microcode dispatch address
E_DL                                2        Data length for instruction execution

Instruction context stored in the PLAs
AT3                                 3        Access type for specifier #3
AT4                                 3        Access type for specifier #4
AT5                                 1        Access type for specifier #5
AT6                                 1        Access type for specifier #6
DL3                                 2        Data length for specifier #3
DL4                                 2        Data length for specifier #4
DL5                                 2        Data length for specifier #5
DL6                                 1        Data length for specifier #6
B                                   1        Indicates that there is a branch displacement.
DISP_SIZE                           1        Size of the branch displacement. 0 = byte displacement, 1 = word.

Instruction context decoded by logic
                                    1        Indicates how many source queue entries to allocate for RMODE (Mode 5) specifiers with variable bit field access type. 0 = 1 entry, 1 = 2 entries.

Each cycle, the IBU evaluates the following information to determine if an operand specifier is available and how many PFQ bytes should be retired to get to the next opcode or specifier:
• The number of PFQ bytes available. Each cycle, the PFQ provides the IBU with the number of instruction stream bytes available on AVAIL_LE<5:0>. This can be as little as 0 and as many as 6.
• The number of specifiers left to be parsed in the instruction stream. The IBU keeps a running count of the number of specifiers left to be parsed for the current instruction.
• The data length of the next specifier.
• The COMPLEX_UNIT_BUSY flag, S1_VALID. When the CSU is busy and cannot accept another complex specifier, S1_VALID is asserted. If the IBU identifies a complex specifier while this signal is asserted, it stalls until the flag is cleared by the CSU.
• DATA_LENGTH_VALID flag. This flag is asserted when the instruction PLAs have valid data length information ready.
This flag is cleared when a new opcode is loaded and set when the access type and data length information is available for use.
• Specifier bus enable flag, SPEC_CTRL_ENABLE, from the OQU. This flag enables the loading of specifier information onto the specifier control bus. If SPEC_CTRL_ENABLE is 1 then the specifier control bus is enabled, and one specifier can be processed. If SPEC_CTRL_ENABLE is 0 then no specifiers can be processed, and the IBU stalls.
• The parser stopped flag, PARSER_STOPPED. There are many times when the parser must be stopped to prevent it from interfering with Ebox activity. When this is necessary, PARSER_STOPPED is asserted and all parser activity stops.
• The next 2 bytes of the instruction stream.

If the specifier byte is a simple specifier (Addressing Modes 0 .. 3, or 5), and the following conditions are met, then the information for this specifier is driven onto SPEC_CTRL<12:0>, and the specifier byte is retired from the PFQ at the end of the cycle:
1. There are at least 2 bytes of valid PFQ data. (At least one byte in the specifier field and one byte in the opcode field.)
2. The parser is not stopped.
3. There is at least one specifier remaining for this instruction.
4. SPEC_CTRL_ENABLE = 1.

If the first specifier byte is a complex specifier, and the following conditions are met, then the information for this specifier is driven onto SPEC_CTRL<21:0> and SPEC_DATA<31:0>, and the appropriate number of PFQ bytes for this specifier are retired from the PFQ at the end of the cycle:
1. The number of bytes required according to the Addressing Mode and Data Length of the specifier (plus one for the opcode field) are available from the PFQ.
2. The parser is not stopped.
3. There is at least one specifier remaining for this instruction.
4. SPEC_CTRL_ENABLE = 1.
5. COMPLEX_UNIT_BUSY flag is not asserted.

7.3.2.2 Operand Access Types

There are 6 different access types for operands.
The access type information determines whether the operand is a source or destination operand, and whether the operand or the address of the operand is needed by the Ebox. These access types are modeled after, but are not identical to, the operand access types specified in the architectural summary.
• A (Address): An operand with access type = A is a source operand. The Ebox gets the address of the operand, not the actual operand.
• R (Read): An operand with access type = R is a source operand. The Ebox gets the actual operand.
• M (Modify): An operand with access type = M is both a source and a destination. The Ebox gets the actual operand and a pointer to the destination.
• W (Write): An operand with access type = W is a destination operand. The Ebox gets a pointer to the destination.
• VR (Variable bit field read-access): An operand with access type = VR is a source operand. The Ebox gets the actual operand if the addressing mode of the specifier for the operand is RMODE (Mode 5). Otherwise the Ebox gets the address of the operand.
• VM (Variable bit field modify-access): An operand with access type = VM is both a source and a destination. The Ebox gets the actual operand if the addressing mode of the specifier for the operand is RMODE (Mode 5). Otherwise the Ebox gets the address of the operand. If the operand specifier is RMODE, the Ebox gets a pointer to the destination. Otherwise no destination pointer is supplied.

7.3.2.3 DL stall

For all but one addressing mode, the number of bytes to retire for a specifier is determined entirely by the addressing mode. Immediate mode (8F) addressing, however, requires the data length information for the operand to determine how many PFQ bytes to retire.
In the event that a new opcode is loaded and the first specifier is an immediate mode specifier, the absence of DATA_LENGTH_VALID causes the IBU to stall because there is no way to determine the number of PFQ bytes to retire for this specifier. DATA_LENGTH_VALID is asserted the following cycle, after the opcode has passed through the instruction PLAs and IROM to generate the required data length information. The immediate mode specifier can be retired the following cycle if the conditions described above are met.

7.3.2.4 Driving SPEC_CTRL

The data on SPEC_CTRL<13:0> is used by the OQU to generate Ebox source queue and destination queue entries that may be needed in the next cycle. The data on SPEC_CTRL<21:14> is used by the CSU to generate the microcode dispatch addresses. SPEC_DATA<31:0> contains instruction stream data for Immediate and Displacement mode specifiers.

7.3.2.5 PC and Delta_PC

The IBU keeps a local copy of the PC called the IBU_PC, which points to the next byte of Istream data that will be processed by the IBU. When the IBU retires instruction stream data, the IBU_PC is incremented by the number of opcode and operand specifier bytes retired, as signaled by SPEC_BYTES_RETIRED and LOAD_NEW_OPCODE. The IBU_PC can be loaded from NEW_PC<31:0> when the signal LOAD_NEW_PC is asserted and all operand specifier, Ebox assist, and branch displacement processing is completed by the IBU. The IBU_PC is sent to the CSU, IIU and BPU on IBU_PC<31:0>.

7.3.2.6 Branch Displacement Processing

Some instructions have branch displacements as indicated by B. If B is set, the instruction has a branch displacement and the branch size is determined by DISP_SIZE. Both B and DISP_SIZE are outputs of the instruction PLAs. A DISP_SIZE of 0 indicates a byte branch displacement and a DISP_SIZE of 1 indicates a word displacement.
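As a rough illustration of how DISP_SIZE selects the displacement width, the following sketch sign-extends a byte or word displacement and adds it to a PC. The function names are hypothetical, and the choice to add the displacement to the address of the byte following the displacement is the usual VAX architectural convention, assumed here rather than taken from this section.

```python
# Illustrative sketch of DISP_SIZE handling; names are hypothetical.
# Assumption: the target is computed relative to the PC of the byte that
# follows the branch displacement (standard VAX branch semantics).

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value & (sign - 1)) - (value & sign)


def branch_target(pc_after_disp: int, raw_disp: int, disp_size: int) -> int:
    """DISP_SIZE = 0 selects a byte displacement, DISP_SIZE = 1 a word displacement."""
    bits = 16 if disp_size else 8
    return (pc_after_disp + sign_extend(raw_disp, bits)) & 0xFFFFFFFF
```

A byte displacement of FE (hex) therefore branches two bytes backward, while the same bit pattern treated as a word displacement branches forward.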
The branch displacement is always the last piece of data for an instruction and is used by the BPU to compute the branch destination. Branch displacements are not sent to the specifier parsing logic. They are sent only to the BPU on SPEC_DATA<7:0> and B_BRANCH_DISP<7:0>.

Branch displacement processing begins after all the non-displacement specifiers are parsed and retired from the PFQ. A branch displacement is processed when the following conditions are met:
1. There are no specifiers left to be processed (Ebox assists excluded).
2. The branch flag B is set in the instruction PLAs and the branch displacement has not been processed.
3. The required number of bytes is available from the PFQ according to DISP_SIZE.
4. The parser is not stopped.
5. BRANCH_STALL is not asserted. BRANCH_STALL occurs on the load opcode of the next instruction after a second conditional branch is received. BRANCH_STALL is described in Section 7.5.1.6.

If all these conditions are met, then the branch displacement is placed on SPEC_DATA<7:0> and B_BRANCH_DISP<7:0> and DISP_VALID is asserted. SPEC_DATA<7:0> contains the high byte of a word branch displacement and B_BRANCH_DISP<7:0> contains the low byte of a word branch displacement or the byte branch displacement. If these conditions are not met, the IBU stalls.

If an instruction contains no operand specifier, the branch displacement can be processed during the same cycle that the opcode is processed provided that there is sufficient data in the PFQ.

7.3.2.7 Ebox Assist Processing

Ebox assist processing can go on in parallel with branch displacement processing since they require no common resources. Ebox assists are implicit specifiers which help the Ebox speed up some of the time critical instructions. To the CSU, these assists look very similar to normal complex specifiers and have associated with them all the normal access type, data length and register information.
The only real difference is where this data comes from. Since these specifiers are not a part of the instruction stream, information about them must be stored in the IROM. The 7 Ebox assists are summarized in the following table:

Table 7-13: Ebox Assist Summary

Assist Name        Access Type   Data Length   Register   Description
                   Read          Quad          FP         Read register mask for Ebox. Read return PC for Ebox and BPU
BSB_DEST           Read          Long          SP         Read return PC for Ebox and BPU
(SP)+.RQ           Read          Quad          SP         Quadword stack pop
-(SP).WL           Write         Long          SP         Longword stack push
PC.RL              Read          Long          NONE       Current PC is sent to Ebox
PC_-(SP).ML        Modify        Long          SP         Combines effects of PC.RL and -(SP).WL assists
STOP.MBOX.QUEUE    NONE          NONE          NONE       Mbox specifier queue is stopped

All of the Ebox assists generate dispatches to the CSU. When all the normal specifiers for an instruction have been identified and retired from the PFQ, the Ebox assist (if any) is processed. The maximum number of assists for any instruction is 1. An Ebox assist is processed and its associated data driven onto SPEC_CTRL<21:0> when the following conditions are met:
1. There is an Ebox assist.
2. The parser is not stopped.
3. It is not the same cycle as the opcode load.
4. If the instruction is BSBW or BSBB, the branch displacement has been parsed.
5. SPEC_CTRL_ENABLE = 1.
6. COMPLEX_UNIT_BUSY flag is not asserted.

BSBW and BSBB instructions have PC.RL Ebox assists. For these instructions, the branch displacement must be retired and the IBU_PC must be updated to point to the byte following the branch displacement before the PC.RL assist can be processed.

7.3.2.8 Reserved Addressing Modes

Some combinations of specifier mode, specifier register, and access type cause reserved addressing mode faults in the VAX architecture.
Refer to Table 7-33 for more details on reserved addressing mode detection.

7.3.2.9 Quadword Immediate Specifiers

Immediate mode specifiers with quadword data length take two or more cycles to process. When a quadword immediate specifier is detected by the IBU parse logic, the first longword is processed (like a longword immediate specifier) and QUAD_FLAG is set. QUAD_FLAG is used by the IBU retire logic to properly retire the next four bytes when they become available in the PFQ. When the second longword is retired, QUAD_FLAG is cleared and the specifier count is decremented. QUAD_FLAG is also cleared by E%BRANCH_MISPREDICT_H, E%STOP_IBOX_H, I%IMEM_MEXC_H, and I%IMEM_HERR_H.

The first longword of the quadword immediate data is sent to the CSU in the normal fashion. The second longword of the quadword immediate data from the instruction stream is discarded. The CSU then uses the specifier PC and generates a memory request to fetch the next four bytes of the immediate data.

7.3.2.10 Index Mode Specifiers

Index mode specifiers are two-part specifiers which take two or more cycles to process. The first byte of an index mode specifier specifies the index register; it is treated like any other complex specifier with the exception that a flag, index_wait, is set, and the specifier counter is NOT decremented. Additionally, SPEC_CTRL<21:17> is ignored by the CSU.

When the second byte of an index mode specifier is processed, the specifier counter is decremented and SPEC_CTRL<21:17> contains the appropriate data. SPEC_CTRL<18> is set and index_wait is cleared.

The reserved addressing mode fault PLA in the IBU checks the mode of the second specifier byte. If index_wait is set and the second byte is short literal, register mode, or index mode, a reserved addressing mode fault is detected and sent to the Ebox on I%RSVD_ADDR_FAULT_H.
Refer to Table 7-33 for more details on reserved addressing mode detection.

7.3.2.11 Loading a new opcode

A new opcode is loaded in the IBU under the following conditions:
1. All operand specifiers, branch displacements and Ebox assists for the current instruction have been parsed (which asserted INSTR_DONE).
2. The parser is not stopped.
3. There is at least one byte of data available from the PFQ.
4. ISSUE_STALL is not being asserted by the IIU.
5. BRANCH_STALL is not being asserted by the BPU.

New opcodes are loaded and passed directly to the instruction PLAs and IROM. In parallel, the instruction issue and specifier identification process for the new instruction begins.

When the new opcode is loaded, a check is made to see if the value of the opcode is FD. If it is, no instruction parsing is done this cycle. FD_OPCODE is set, the byte is retired from the PFQ, and another opcode load is enabled for the following cycle.

The opcode sent to the IIU and the BPU on OPCODE<8:0> is a concatenation of FD_OPCODE and the opcode byte. FD_OPCODE is bit 8, and the opcode is in <7:0>.

7.3.2.12 Reserved Opcodes

Each time a new opcode is loaded in the IBU, instruction and operand specifier information is extracted from a set of PLAs and from the IROM in the IBU for that opcode. This information is specified in Table 7-12.

When a reserved or unimplemented opcode is detected, the following occurs:
1. The IBU IROM has one of the STOP_PARSER bits set. This signals the IBU to stop parsing instruction stream data.
2. The IBU IROM provides the reserved opcode dispatch address for Ebox microcode.

7.3.2.13 Instruction Parse Completion

Once all the operand specifiers, branch displacements and Ebox assists have been processed, instruction parsing is complete and INSTR_DONE is asserted.
INSTR_DONE is used by the CSU to make RLOG base queue entries and by the IBU to control loading of the BPU_PC under certain conditions. Additionally, if instruction parsing is complete and if there is no PC load pending, RETIRE_OPCODE is asserted and sent to the PFQ control logic and the IIU PC queue logic. In the PFQ this signal increments the number of specifier bytes retired by 1 in order to retire the previous opcode and allow for loading of the new opcode. It is used in the IIU to update the PC queue pointer under certain conditions.

7.3.2.14 Operands with Access Type VR and VM

One of the outputs from the instruction PLAs is a bit that indicates how many source queue entries should be written for VR and VM access type operands with register mode specifiers. When this bit is 0, only one source queue entry is written; when it is 1, two are written. This bit is available in the middle of the opcode load cycle and is sent to the OQU on VS. This signal remains valid throughout the instruction parsing operation.

7.3.2.15 I%IMEM_MEXC_H and I%IMEM_HERR_H

The IBU forwards Istream errors to the Ebox on I%IMEM_HERR_H and I%IMEM_MEXC_H. These signals flag memory management exceptions and hardware errors. The IBU receives three error signals from the VIC which are used to determine when to assert I%IMEM_HERR_H and I%IMEM_MEXC_H: IHARD_ERR, MHARD_ERR, and IMMGT_EXC. Refer to Section 7.2.1.7 for more detail on these signals.

The IBU asserts I%IMEM_MEXC_H if IMMGT_EXC is asserted from the VIC and the PFQ is empty or contains insufficient data to complete parsing of the current specifier, and parsing is not stopped. I%IMEM_MEXC_H remains asserted as long as these conditions are met.

The IBU asserts I%IMEM_HERR_H under two different conditions.
First, if MHARD_ERR is asserted from the VIC and the PFQ is empty or contains insufficient data to complete parsing of the current specifier, and parsing is not stopped. Additionally, if IHARD_ERR is asserted from the VIC, I%IMEM_HERR_H is asserted immediately without waiting for the PFQ to run dry or contain insufficient data. I%IMEM_HERR_H remains asserted as long as these conditions are met.

7.3.2.16 IBU stop and restart conditions

Two categories of conditions cause the IBU to stop parsing: the first is exceptions, the second is instructions which need pipeline synchronization. When the IBU is stopped, PARSER_STOPPED is asserted. Table 7-14 summarizes all IBU stop and restart conditions.

Table 7-14: IBU stop and start summary

Stop Condition           Start Condition                      Description
[illegible]              [illegible]                          stop ibox, Ebox restarts parser
[illegible]              E%RESTART_IBOX (suffix illegible)    reserved addressing mode fault, Ebox restarts parser
IHARD_ERR                E%RESTART_IBOX (suffix illegible)    VIC hardware error, Ebox restarts parser
FPD and opcode load      [illegible]                          FPD is set, parse opcode and stop parser, Ebox restarts parser
E%BRANCH_MISPREDICT_L    [illegible]                          branch mispredict, ibox restarts parser
stop parser set case 1   [illegible]                          parser stopped when STP_RESTART_IBOX and INSTR_DONE are both asserted, ibox restarts parser
stop parser set case 2   [illegible]                          parser stopped when STP_SUPPRESS_PCQ and INSTR_DONE are both asserted and STP_RESTART_IBOX is de-asserted; restart occurs when the CSU supplies the BPU with the new PC and all other instruction parsing is complete

7.3.2.17 First Part Done (FPD) Set

Some long instructions can be interrupted in the middle of their execution sequence (e.g. MOVC instructions). When such an instruction is interrupted, the first part done bit (FPD) in the Processor Status Longword (PSL) is set, indicating that the interrupted instruction will be resumed at the execution point where the interrupt occurred, rather than at the beginning of the instruction.
All such instructions have one of the STOP_PARSER bits set in the IROM. This allows the FPD pack-up to IPR read the current PC (from the top of the PC queue) and then load the PC of the interrupt handler.

When an instruction such as MOVC is interrupted, and the interrupt is processed, processor context is switched back to the interrupted process by the REI instruction. This instruction causes the PSL of the interrupted process to be reloaded with the FPD bit set. The Ebox sends the E%FPD_SET_L signal to the Ibox. If E%FPD_SET_L is asserted, the Ibox will re-issue the interrupted instruction when valid opcode data is parsed by the IBU. However, after parsing and issuing the instruction, no further data is parsed by the IBU.

When the interrupted instruction is complete, the Ebox loads the PC of the next instruction and parsing is restarted by the IBU.

7.3.3 The Instruction Issue Unit

The IIU takes opcodes received from the IBU and generates the information needed by the Ebox to begin instruction execution. An instruction is said to be issued when this information is sent to the Ebox instruction queue. Table 7-15 shows the format of the instruction queue entries created by the IIU. This information is sent to the Ebox on I%IQ_BUS_H<21:0>.

The IIU must also keep track of the program counter (PC) values of the opcodes that are either in the instruction queue or are in Ebox execution. If the Ebox detects a fault during the execution of an instruction, it needs to be able to get at the PC of the faulting opcode. These PCs are kept in the PC queue.

Table 7-15: Instruction Queue Entry Format

Bit Field   Field Name   Description
<0>         VALID        1 when this queue entry is valid
<9:1>       DISPATCH     Ebox microcode dispatch address
<10>        FB           1 when this is an Fbox instruction
<12:11>     DL           Data length for instruction execution
<21:13>     OPCODE       Instruction Opcode

Most of the information needed to create an instruction queue entry is stored in the instruction ROM located in the IBU. See Table 7-12. The opcode used to access the ROM is a 9-bit composite opcode consisting of 8 true opcode bits and 1 bit indicating whether or not this is a two byte FD opcode. This extra bit is generated by the IBU and passed along with the other 8 opcode bits.

The IIU issues an instruction as soon as the instruction ROM access completes unless the instruction queue is full. The instruction queue full status is computed and maintained locally in the IIU.

7.3.3.1 Issue Stall

The IIU maintains a counter of the number of slots filled in the Ebox instruction queue. Each time a new opcode is issued to the IIU, the counter is incremented. When the Ebox removes an entry from the queue, as indicated by the E%RETIRE_INSTR_L signal, the counter is decremented. When the counter equals 6, the depth of the instruction queue, ISSUE_STALL is asserted, blocking the IBU from parsing a new opcode.

7.3.3.2 PC Queue and PC loads

The PC queue is a 7 entry FIFO which contains PC values of opcodes that are either in the instruction queue or are in Ebox execution. Opcode PCs are added to the back of the queue as instructions are issued and removed from the front of the queue when the Ebox retires an instruction as indicated by E%RETIRE_INSTR_L. The PC of the next instruction to be retired by the Ebox is always at the front of the queue unless the PC queue is empty.

The PC queue is flushed on chip reset or when either E%FLUSH_PCQ_H or E%BRANCH_MISPREDICT_L is asserted by the Ebox. Any time the Ibox broadcasts a new PC on NEW_PC<31:0>, as signaled by LOAD_NEW_PC, it is loaded into the next available slot in the PC queue.
If E%BRANCH_MISPREDICT_L caused the PC load or if the Ebox stops the Ibox as signaled by E%STOP_IBOX_H, then the following additional actions are taken:

• The instruction queue counter is cleared.
• ISSUE_STALL is cleared if set.

In the event of an Ebox PC load, the parser is guaranteed to stop either by E%STOP_IBOX_H, STP_SUPPRESS_PCQ, or STP_RESTART_IBOX several cycles before the actual PC load occurs. These signals are used in the IBU to stop instruction parsing. When the new PC arrives, the PC queue is empty and ready to accept the new PC into the first available slot.

The value of STP_SUPPRESS_PCQ affects whether the PC queue loads the next PC as the parser stops. If STP_SUPPRESS_PCQ is asserted then the next PC is entered in the PC queue. The value of the IBU_PC is loaded into the PC queue if LOAD_NEW_PC is not asserted, the burst unit signals that the parsing is complete with RETIRE_OPCODE, E%FPD_SET_L is not asserted, and either of the following conditions is true:

• STP_SUPPRESS_PCQ is not asserted or STOP_VIC_PREFETCH is not asserted, and the BPU is not stalled.
• BSTL_FRC_PCQ (from the BPU) is asserted and the instruction is done.

The PC at the front of the PC queue is readable by the CSU. When the Ebox needs access to this PC, it stops the Ibox and sends an IPR read request to the CSU. The CSU responds by reading the front of the PC queue and then writing that value to the Ebox working register (WX) specified by a register index supplied with the IPR command. See Section 7.4.2.8 for more details on IPR transactions.

MICROCODE RESTRICTION

For proper operation, retire_instr and IPR read of the BPC (Backup PC) from the PC queue must not occur in the same microword. This guarantees that the PC queue does not decrement in the same cycle that an IPR read of the BPC occurs.
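The issue counter of Section 7.3.3.1 and the PC queue behavior described above can be pictured together with a short behavioral sketch. It is a simplification under stated assumptions: the class and method names are hypothetical, a single flush() stands in for chip reset, E%FLUSH_PCQ_H, and E%BRANCH_MISPREDICT_L, and the loading of a new PC via NEW_PC<31:0> is not modeled.

```python
from collections import deque

IQ_DEPTH = 6   # instruction queue depth given in Section 7.3.3.1
PCQ_DEPTH = 7  # PC queue depth given in Section 7.3.3.2


class IssueUnitModel:
    """Hypothetical model of the IIU issue counter and PC queue."""

    def __init__(self):
        self.iq_count = 0        # slots filled in the Ebox instruction queue
        self.pc_queue = deque()  # front = PC of next instruction to retire

    @property
    def issue_stall(self):
        # ISSUE_STALL is asserted when the counter equals the queue depth.
        return self.iq_count == IQ_DEPTH

    def issue(self, opcode_pc):
        if self.issue_stall:
            raise RuntimeError("ISSUE_STALL asserted; opcode cannot issue")
        if len(self.pc_queue) >= PCQ_DEPTH:
            raise RuntimeError("PC queue full")
        self.pc_queue.append(opcode_pc)
        self.iq_count += 1

    def retire(self):
        # Models E%RETIRE_INSTR_L: pop the front PC, decrement the counter.
        if not self.pc_queue:
            raise RuntimeError("nothing to retire")
        self.iq_count -= 1
        return self.pc_queue.popleft()

    def backup_pc(self):
        # The front entry is what an IPR read of the BPC would return.
        return self.pc_queue[0] if self.pc_queue else None

    def flush(self):
        # Flush clears the PC queue and the instruction queue counter.
        self.pc_queue.clear()
        self.iq_count = 0
```

In this model the microcode restriction corresponds to never calling retire() and backup_pc() for the same microword, so that the value read is not affected by a simultaneous queue update.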
7.4 Operand Specifier Processing

Operand Specifier Parsing prepares instruction operands for access by the Ebox. The three Ibox sub-sections which together perform this function are the Operand Queue Unit (OQU), the Complex Specifier Unit (CSU), and the Scoreboard Unit (SBU). The OQU handles simple specifiers and acts as the interface to the Ebox source and destination queues; the CSU is responsible for processing complex specifiers; and the SBU provides the CSU with information about the number of outstanding GPR read and write references in the source and destination queues.

7.4.1 Operand Queue Unit

The OQU controls the passing of operand information into the Ebox operand queues and the allocation of Ebox Memory Data registers (MDs). Simple specifiers are processed entirely in the OQU. Register mode specifiers are passed into the source or destination queues as pointers to the corresponding Ebox register file location. The OQU passes short literal specifiers as immediate data.

The 6 MD registers in the Ebox register file are used as destinations for operand data requests made by the CSU. When a complex specifier appears on the specifier control bus, the OQU allocates both the source queue entries and Ebox MDs and passes the Ebox register file index of the first allocated MD to the CSU.

I%OPERAND_BUS_H<14:0> transfers source and destination queue entry information to the Ebox. There may be up to 2 source queue entries and 2 destination queue entries made via I%OPERAND_BUS_H<14:0> in a given cycle. The format for this bus is shown in Figure 7-9.
Figure 7-9: Source/Destination Queue Entry Formats

[Bit-layout diagrams not recoverable from the scan. The figure shows the 15-bit I%OPERAND_BUS_H word for four cases; the surviving annotations are listed below.]

Short literal mode: SQ_VALID1; SQ_VALID2 (1 = quad operand); SHLIT (1 = short lit); short literal value 1; short literal value 2 (quad), MBZ if SQ_VALID2 = 1.
Register mode: SQ_VALID1; SQ_VALID2 (1 = quad r/m operand); DQ_VALID1; DQ_VALID2 (1 = quad w/m operand); SHLIT (0 = not short lit); VFIELD (1 = Field Queue Entry); GPR (1 = GPR); REG1 (GPRn tag); REG2 (GPRn+1 tag for quad).
All other modes for access types read and modify: SQ_VALID1; SQ_VALID2 (1 = quad r/m operand); DQ_VALID1; DQ_VALID2 (1 = quad w/m operand); SHLIT (0 = not short lit); VFIELD (1 = Field Queue Entry); GPR (0 = MD); REG1 (MDn tag); REG2 (MDn+1 tag for quad).
All other modes for access type write: SQ_VALID2 (1 = quad r/m operand); DQ_VALID2 (1 = quad w/m operand); SHLIT (0 = not short lit); VFIELD (1 = Field Queue Entry); GPR (0 = mdest); REG1 (GPRn tag); REG2 (GPRn+1 for quad).

Short literals:

Table 7-16: I%OPERAND_BUS_H Definition

Bit Field   Field Name   Description
3:0         VALUE2       Upper bits for quadword short literal, must be zeros
9:4         VALUE1       Short literal value. Lower bits for quadword short literal.
10          SHLIT        1 if short literal, 0 otherwise
11          DQ_VALID2    Valid second destination queue entry - always 0 for short literal
12          DQ_VALID1    Valid destination queue entry - always 0 for short literal
13          SQ_VALID2    Valid second source queue entry - set if quadword short literal
14          SQ_VALID1    Valid source queue entry

All other modes:

Table 7-17: I%OPERAND_BUS_H Definition

Bit Field   Field Name   Description
3:0         REG2         Register or MD for 2nd source/dest queue entry of a quadword specifier
7:4         REG1         Register or MD for 1st source/dest queue entry
8           GPR          Source/dest queue entry is for a register mode specifier
9           VFIELD       Field queue entry to be made
10          SHLIT        Short literal. 1 if short literal, 0 otherwise
11          DQ_VALID2    Valid second destination queue entry for quadword specifiers
12          DQ_VALID1    Valid destination queue entry
13          SQ_VALID2    Valid second source queue entry for quadword specifiers
14          SQ_VALID1    Valid source queue entry

7.4.1.1 Source Queue Interface

The OQU can write up to two source queue entries each cycle depending on the access type and data length of the operand they specify. I%OPERAND_BUS_H<SQ_VALID1> and I%OPERAND_BUS_H<SQ_VALID2> are the source queue entry valid bits. I%OPERAND_BUS_H<SQ_VALID1> indicates that the information on I%OPERAND_BUS_H<10:4> is for a valid source queue entry. I%OPERAND_BUS_H<SQ_VALID2> indicates that the information on I%OPERAND_BUS_H<3:0> is for a valid source queue entry.

I%OPERAND_BUS_H<10:4> contains the information for any specifier that is placed on SPEC_CTRL. I%OPERAND_BUS_H<3:0> contains the second source queue entry whenever the specifier on SPEC_CTRL has an access type of Read or Modify and a data length of quadword, or it is an RMODE specifier with access type VR or VM and the VS bit is set. I%OPERAND_BUS_H<SQ_VALID2> is set only if I%OPERAND_BUS_H<SQ_VALID1> is set.

The addressing mode of the operand specifiers determines the value of the source queue entries. For short literal (Modes 0..3) addressing modes, I%OPERAND_BUS_H<VALUE1> contains the short literal data directly, with I%OPERAND_BUS_H<SHLIT> set.
Source queue entries for register (Mode 5) addressing mode specifiers contain pointers to the referenced GPR, with I%OPERAND_BUS_H<GPR> set and I%OPERAND_BUS_H<SHLIT> cleared. Source queue entries for all other addressing modes contain pointers to Memory Data (MD) registers in the Ebox, with I%OPERAND_BUS_H<GPR> and I%OPERAND_BUS_H<SHLIT> both cleared. I%OPERAND_BUS_H<VFIELD> is set for variable bit field specifiers and cleared otherwise. This bit is used by the Ebox to make Field Queue entries.

The access type and data length of the operand being specified determine the number of source queue entries that are written for all operands except those with access types VR or VM. Read (R) and Modify (M) access type operands write one source queue entry if the operand data length is byte, word, or longword, and two source queue entries if the operand data length is quadword. Write (W) access type operands never write any source queue entries. Address (A) access type operands always write one source queue entry regardless of the operand data length. The number of source queue entries written for non-field access type operands is summarized in Table 7-18.

Table 7-18: Source Queue Entries Written for Non-field Access Type Operands

Access Type   Data Length              Number of Source Queue Entries Written
Read (R)      Byte, Word, Long         1 source queue entry written
Modify (M)    Byte, Word, Long         1 source queue entry written
Write (W)     Byte, Word, Long, Quad   0 source queue entries written
Address (A)   Byte, Word, Long, Quad   1 source queue entry written
Read (R)      Quad                     2 source queue entries written
Modify (M)    Quad                     2 source queue entries written

For VR and VM operands, the VS bit associated with the instruction and the addressing mode determine the number of source queue entries that are written. For these variable bit field access type operands, VS performs a function similar to the data length in non-field operands.
The VS bit specifies how many source queue entries to write for VM and VR operands with RMODE specifiers. The value of VS is ignored if the access type of the operand is not VR or VM. If VS is 0 then one source queue entry is written for VR and VM operands with an RMODE specifier. If VS is 1 then two source queue entries are written for VR and VM operands with an RMODE specifier. Only one source queue entry is written for VR and VM operands with non-RMODE specifiers, regardless of the value of VS. Table 7-19 shows the number of source queue entries written for operands with VR or VM access types.

Table 7-19: Source Queue Entries Written for VR or VM Access Type Operands

VS   Access Type   Number of Source Queue Entries Written
0    RMODE         1 source queue entry written
1    RMODE         2 source queue entries written
X    non-RMODE     1 source queue entry written

VS is supplied by the IBU in the middle of the cycle in which the opcode is loaded and is held throughout the parsing of the instruction.

7.4.1.1.1 Short Literal Specifiers (Modes 0..3)

Short literal specifiers create a source queue entry with the SHLIT flag set and the short literal data in I%OPERAND_BUS_H<VALUE1>. The short literal data is the full RN_SHORT_LITERAL<6:1> from the specifier control bus. For quadword operands the OQU writes two source queue entries. In this case, I%OPERAND_BUS_H<VALUE2> is 0, I%OPERAND_BUS_H<VALUE1> contains the short literal value, I%OPERAND_BUS_H<SHLIT> is set, and I%OPERAND_BUS_H<SQ_VALID1> and I%OPERAND_BUS_H<SQ_VALID2> are both set to indicate 2 source queue entries.

Short literal addressing modes for VM and VR access type operands cause a reserved addressing mode fault to be signaled to the Ebox. All reserved addressing mode faults block the OQU from writing any source or destination queue entries. See Section 7.9.5 for details on these faults.
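The rules of Tables 7-18 and 7-19 can be condensed into one small lookup function. The sketch below is illustrative only: the function name, the string encodings for access type and data length, and the rmode/vs parameters are assumptions made for the example, and reserved addressing mode faults (which suppress all queue writes) are not modeled.

```python
def source_queue_entries(access_type, data_length, rmode=False, vs=0):
    """Number of source queue entries written for one operand.

    access_type: "R", "M", "W", "A", "VR" or "VM" (hypothetical encoding)
    data_length: "byte", "word", "long" or "quad"
    rmode:       True for a register mode (Mode 5) specifier
    vs:          VS bit from the instruction PLAs; used only for VR/VM
    """
    if access_type in ("VR", "VM"):
        # Table 7-19: VS matters only for RMODE specifiers.
        return 2 if (rmode and vs == 1) else 1
    if access_type == "W":
        return 0  # write operands never use the source queue
    if access_type == "A":
        return 1  # address operands: one entry regardless of length
    if access_type in ("R", "M"):
        return 2 if data_length == "quad" else 1
    raise ValueError("access type not covered by Tables 7-18 and 7-19")
```

For instance, a quadword Modify operand yields two entries, while a VM operand with a non-RMODE specifier yields one entry whatever the value of VS.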
7.4.1.1.2 RMODE Specifiers (Mode 5)

Register mode specifiers create source queue entries with I%OPERAND_BUS_H<REG1> pointing to the specified Ebox GPR index and the SHLIT bit clear. The contents of I%OPERAND_BUS_H<REG1> are taken directly from the specifier control bus RN field. I%OPERAND_BUS_H<GPR> is equal to 1 for register mode operands. If two entries are allocated for an operand due to quadword data length or the VS bit, the value for the second entry on I%OPERAND_BUS_H<REG2> is the value of the first entry on I%OPERAND_BUS_H<REG1> incremented by 1, modulo 16. For specifiers of type VR or VM, I%OPERAND_BUS_H<VFIELD> is set to indicate a variable bit field specifier; it is cleared otherwise.

7.4.1.1.3 Index Mode Specifiers (Mode 4)

Indexed specifiers are processed by the IBU as two specifiers. Only the second specifier, the base, may create a source queue entry. The first specifier is recognized and ignored by the OQU if it is a complex specifier with the dispatch field of the specifier control bus pointing to index mode. Therefore, if SPEC_CTRL<COMPLEX> is set and SPEC_CTRL<DISPATCH> is index mode, then no source queue entries will be made for the specifier.

7.4.1.1.4 All Other Addressing Modes

Specifiers which are not literal or register mode create source queue entries with the I%OPERAND_BUS_H<REG1> fields pointing to Ebox MDs and the SHLIT and GPR bits clear. One MD is allocated for each source queue entry of this type written. See Section 7.4.1.4 for more detail on MD allocation. If two entries are allocated for an operand due to quadword data length or RMODE with the VS bit set, the I%OPERAND_BUS_H<REG2> field for the second entry is equal to the I%OPERAND_BUS_H<REG1> field of the first incremented by 1, modulo 6. The most significant bit of both I%OPERAND_BUS_H<REG1> and I%OPERAND_BUS_H<REG2> is set to 1 to correspond with Ebox register file addressing.
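The second-entry arithmetic described above for GPRs (modulo 16) and MDs (modulo 6) can be sketched as follows. The function names are hypothetical, and the MD tag encoding is an assumption: the sketch reads "most significant bit set to 1" as bit 3 of the 4-bit REG field, so MD index n appears on the bus as 8 + n.

```python
GPR_COUNT = 16
MD_COUNT = 6
MD_TAG_MSB = 0x8  # assumed: bit 3 of the 4-bit REG1/REG2 field


def gpr_pair(reg1):
    """REG1/REG2 values for a two-entry register mode operand."""
    return reg1, (reg1 + 1) % GPR_COUNT


def md_pair(md_index):
    """REG1/REG2 values for a two-entry operand held in MDs."""
    first = md_index % MD_COUNT
    second = (md_index + 1) % MD_COUNT
    return MD_TAG_MSB | first, MD_TAG_MSB | second
```

Both increments wrap: a quadword operand starting in R15 continues in R0, and one starting in MD5 continues in MD0.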
For specifiers of type VR or VM, the VFIELD bit is set to indicate a variable bit field specifier and cleared otherwise. Only one specifier per instruction may be of access type VR or VM, so as not to overflow the field queue.

7.4.1.2 Destination Queue Interface

The OQU can write up to two destination queue entries each cycle depending on the access types and data lengths of the operands they specify. The addressing mode of the operand specifier determines the contents of the destination queue entries written. Destination queue entries for register (Mode 5) addressing mode specifiers contain pointers to the referenced GPR and the GPR flag is set to indicate a register mode destination. All destination queue entries for specifiers with an access type write will contain pointers to the referenced GPR, regardless of addressing mode. For non-register mode specifiers of access types read and modify, the I%OPERAND_BUS_H<REG1> and I%OPERAND_BUS_H<REG2> fields are used by the source queue and ignored by the destination queue. All addressing modes other than register mode (Mode 5) and short literal (Modes 0..3) clear the GPR flag to indicate a memory destination.

I%OPERAND_BUS_H<DQ_VALID1> is set if there is a valid destination queue entry. I%OPERAND_BUS_H<DQ_VALID2> indicates a second destination queue entry is also valid. I%OPERAND_BUS_H<DQ_VALID2> will only be set if I%OPERAND_BUS_H<DQ_VALID1> is also set.

Short literal addressing mode specifiers for operands with access types of Write (W), Modify (M), and VM cause Reserved Addressing Mode Faults. Reserved Addressing Mode Faults block the OQU from writing any source or destination queue entries. See Section 7.9.5 for details on these faults.
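To make the field positions of Tables 7-16 and 7-17 concrete, the following sketch unpacks a 15-bit I%OPERAND_BUS_H value into named fields. The function name and the dictionary representation are assumptions for illustration; only the bit positions come from the tables.

```python
def decode_operand_bus(word):
    """Split a 15-bit I%OPERAND_BUS_H<14:0> value into its fields."""
    if not 0 <= word < (1 << 15):
        raise ValueError("operand bus value must fit in 15 bits")

    def bit(n):
        return (word >> n) & 1

    fields = {
        "SQ_VALID1": bit(14),
        "SQ_VALID2": bit(13),
        "DQ_VALID1": bit(12),
        "DQ_VALID2": bit(11),
        "SHLIT": bit(10),
    }
    if fields["SHLIT"]:
        # Table 7-16 layout: short literal value in <9:4>, <3:0> zero.
        fields["VALUE1"] = (word >> 4) & 0x3F
        fields["VALUE2"] = word & 0xF
    else:
        # Table 7-17 layout: register or MD pointers.
        fields["VFIELD"] = bit(9)
        fields["GPR"] = bit(8)
        fields["REG1"] = (word >> 4) & 0xF
        fields["REG2"] = word & 0xF
    return fields
```

As a usage example, a quadword read of R3 in register mode would appear with SQ_VALID1, SQ_VALID2, and GPR set, REG1 = 3, and REG2 = 4.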
The access type and data length of the operand being specified determine the number of destination queue entries that are written for all operands except those with access types VR or VM. Write (W) and Modify (M) access type operands write 1 destination queue entry if the operand data length is byte, word, or longword, and two destination queue entries if the operand data length is quadword. The number of destination queue entries written for non-field access type operands is summarized in Table 7-20.

Table 7-20: Destination Queue Entries Written for Non-field Access Type Operands

Access Type   Data Length        Number of Destination Queue Entries Written
Read (R)      Byte, Word, Long   0 destination queue entries written
Modify (M)    Byte, Word, Long   1 destination queue entry written
Write (W)     Byte, Word, Long   1 destination queue entry written
Address (A)   Byte, Word, Long   0 destination queue entries written
Read (R)      Quadword           0 destination queue entries written
Modify (M)    Quadword           2 destination queue entries written
Write (W)     Quadword           2 destination queue entries written
Address (A)   Quadword           0 destination queue entries written

For VR access type operands no destination queue entries are written. For VM access type operands, the VS bit associated with the instruction and the addressing mode of the operand specifier determine the number of destination queue entries that are written. The VS bit specifies how many destination queue entries to write for VM access type operands with RMODE specifiers. The value of VS is ignored if the access type of the operand is not VM. If VS is 0 then one destination queue entry is written for VM access type operands with an RMODE specifier. If VS is 1 then two destination queue entries are written for VM access type operands with an RMODE specifier. VM access type operands with non-RMODE specifiers create no destination queue entries. Table 7-21 shows the number of destination queue entries written for operands with VM access type.
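The destination queue rules stated above (Table 7-20 for non-field operands, and the VS rule for VM operands) can be summarized in a single function. As before, this is a sketch: the function name and the string encodings are assumptions made for the example, and reserved addressing mode faults are not modeled.

```python
def destination_queue_entries(access_type, data_length, rmode=False, vs=0):
    """Number of destination queue entries written for one operand.

    access_type: "R", "M", "W", "A", "VR" or "VM" (hypothetical encoding)
    data_length: "byte", "word", "long" or "quad"
    rmode:       True for a register mode (Mode 5) specifier
    vs:          VS bit from the instruction PLAs; used only for VM
    """
    if access_type == "VM":
        if not rmode:
            return 0  # non-RMODE VM operands write no destination entries
        return 2 if vs == 1 else 1
    if access_type in ("R", "A", "VR"):
        return 0
    if access_type in ("W", "M"):
        return 2 if data_length == "quad" else 1
    raise ValueError("access type not covered by the destination queue rules")
```

Note the asymmetry with the source queue: a VM operand with a non-RMODE specifier writes one source queue entry but no destination queue entry.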
Table 7-21: Destination Queue Entries Written for VM Access Type Operands

VS   Access Type   Number of Destination Queue Entries Written
0    RMODE         1 destination queue entry written
1    RMODE         2 destination queue entries written
X    non-RMODE     0 destination queue entries written

7.4.1.2.1 RMODE Specifiers (Mode 5)

Register mode specifiers create destination queue entries with I%OPERAND_BUS_H<REG1> pointing to the specified Ebox GPR and the I%OPERAND_BUS_H<GPR> bit set. The contents of the I%OPERAND_BUS_H<REG1> field are taken directly from the specifier control bus RN field. If two entries are allocated for an operand due to quadword data length or the VS bit, I%OPERAND_BUS_H<REG2> for the second entry is I%OPERAND_BUS_H<REG1> incremented by 1, modulo 16. I%OPERAND_BUS_H<DQ_VALID1> and I%OPERAND_BUS_H<DQ_VALID2>, the destination queue entry valid bits, are both set.

7.4.1.2.2 Index Mode Specifiers (Mode 4)

Indexed specifiers are processed by the IBU as two specifiers. Only the second specifier, the base, may create a destination queue entry. The first specifier is recognized and ignored when the specifier control bus has a complex specifier with the dispatch field pointing to index mode. In other words, if SPEC_CTRL<COMPLEX> is set and SPEC_CTRL<DISPATCH> equals index mode, then no destination queue entries will be made for the specifier.

7.4.1.2.3 All Other Addressing Modes

All other addressing modes create destination queue entries with the GPR bit clear. If two entries are allocated for an operand due to quadword data length or VM access type with the VS bit set, the GPR bit applies to both entries.

7.4.1.3 Queue Entry Allocation

The OQU maintains a count of available Source and Destination Queue entries using an up-down counter for each. When the OQU allocates source queue entries, the source queue counter increments by the number of entries allocated.
When the OQU allocates destination queue entries, the destination queue counter increments by the number of entries allocated. When the source queue counter equals 12, the source queue is full. When the destination queue counter equals 6, the destination queue is full.

The source and destination queue counters decrement whenever the Ebox retires entries from the respective queues. The signals E%SQ_RETIRE_H<1:0> and E%DQ_RETIRE_H<0> are generated by the Ebox, and indicate the number of source and destination queue entries, respectively, to be retired this cycle. Up to two source queue entries and one destination queue entry may be retired each cycle. The E%SQ_RETIRE_H<1:0> signal decode is shown in Table 7-22.

7.4.1.4 MD Allocation

MDs are allocated in the OQU using an up-down allocate counter and an index counter. When the OQU allocates a new MD, the allocate counter increments and the current value of the index counter is sent to the CSU and then incremented modulo 6. Whenever a source queue entry which points to an MD is retired by the Ebox, the allocate counter decrements. The value of the allocate counter always represents the number of previously allocated MDs and the index counter always points to the next MD to allocate. When the allocate counter equals 6 there are no MDs left to allocate.

The signals E%SQ_RETIRE_MD_H<1:0> are generated by the Ebox and indicate the number of MD source queue entries to be retired this cycle. The E%SQ_RETIRE_MD_H<1:0> signal decode is shown in Table 7-22.

Table 7-22: Source Queue Entries Retired

E%SQ_RETIRE_H<1:0>   # SQ Entries Retired   E%SQ_RETIRE_MD_H<1:0>   # MD SQ Entries Retired
00                   0                      00                      0
01                   1                      01                      1
10                   1                      10                      1
11                   2                      11                      2

7.4.1.5 Specifier Bus Enable

The OQU applies back-pressure to the IBU whenever there are insufficient MDs or source and destination queue entries to hold more operands.
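The MD bookkeeping of Section 7.4.1.4, which is one of the inputs to this back-pressure, can be sketched with two counters. The class and method names are hypothetical, and retire() takes an already decoded entry count rather than the raw E%SQ_RETIRE_MD_H<1:0> encoding.

```python
MD_COUNT = 6  # number of MD registers in the Ebox register file


class MdAllocator:
    """Hypothetical model of the OQU allocate counter and index counter."""

    def __init__(self):
        self.allocated = 0  # up-down allocate counter
        self.index = 0      # index counter: next MD to hand out

    def available(self):
        return MD_COUNT - self.allocated

    def allocate(self):
        if self.allocated == MD_COUNT:
            raise RuntimeError("no MDs left to allocate")
        md = self.index                    # value sent to the CSU
        self.index = (self.index + 1) % MD_COUNT
        self.allocated += 1
        return md

    def retire(self, count):
        # count = number of MD source queue entries retired this cycle
        if not 0 <= count <= self.allocated:
            raise ValueError("cannot retire more MDs than are allocated")
        self.allocated -= count

    def reset(self):
        self.allocated = 0
        self.index = 0
```

Because the index counter wraps modulo 6 independently of the allocate counter, MDs are reused in strict rotation once earlier entries have been retired.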
SPEC_CTRL_ENABLE is driven by the OQU to enable the driving of specifier data on the specifier control bus. SPEC_CTRL_ENABLE, when asserted, allows the IBU to drive a specifier on SPEC_CTRL<21:0>. The number of available source queue entries, destination queue entries, and MDs determines whether a specifier may be parsed by the IBU and driven on the specifier control bus. SPEC_CTRL_ENABLE is asserted if there are at least 2 source queue entries, 2 destination queue entries, and 2 MDs available.

7.4.1.6 E%STOP_IBOX and Branch Mispredict

The following actions take place when the Ebox issues an E%STOP_IBOX_H or a branch mispredict.

• The MD allocation counter and index counter are both reset to 0
• The source queue counter is reset to 0
• The destination queue counter is reset to 0
• Any specifiers currently being processed will not make a queue entry.

7.4.2 Complex Specifier Unit

The Complex Specifier Unit (CSU) processes all specifiers with modes other than short literal or register. It receives parsed instruction stream data and parameters on the specifier control fields. Using a 32-bit, 3-stage pipelined datapath with microcode control, the CSU performs the register and memory data operations required to provide the Ebox with instruction operands. Final operand values are routed to the Ebox memory data registers.

7.4.2.1 CSU Microcode Control

The CSU microsequencer provides microcoded control for the 3-stage pipelined datapath. Under typical operation, a control store address is generated for the 128-entry x 29-bit control store array and a new microword is referenced every cycle. The complete microword depicted in Table 7-23 is issued and forwarded to the subsequent pipeline stages in consecutive cycles in order to control the datapath logic in those stages.

Figure 7-10: Microword Format
Table 7-23: Microword Fields

field            description
ALU.FNC          controls the ALU function
ML               selects mem req data length = long or DL
A                IA_bus source
B                IB_bus source
DST              IW_bus destination
MISC             miscellaneous functions
MREQ.FNC         controls memory request function
NXT.ADDR         full next microaddress field
[not legible]    conditional control of decoder next

The 128-entry control store array is arranged as 8 pages of 16 microwords per page. Bits <6:4> of the control store address designate the microcode page; bits <3:0> designate the microword address within a page. The page organization places the microcode corresponding to a unique complex specifier flow within a particular page.

Table 7-24: Microcode Page Allocation

page    description
000     displacement flows, modes=A,C,E
001     displacement deferred flows, modes=B,D,F
010     auto increment flow, mode=8
011     auto increment deferred flow, mode=9
100     register deferred flow, mode=6
101     auto decrement flow, mode=7
110     IPR and utility routines, index flow, mode=4
111     Ebox assists, idle address

The CSU specifier microcode processes VAX defined specifiers 4 and 6-F. These are the operand specifiers that the Ibox defines as complex. Displacement data will be sign extended by the IBU so the CSU can process byte, word, and longword displacement specifiers in a longword microcode flow. Displacement deferred specifiers merge together in a similar fashion. Ebox assists are "implicit" operands in some of the VAX opcodes.
In order to simplify Ebox microcode to handle instruction execution only, the implicit specifiers are processed up front by the Ibox. These assists appear to the Ebox as typical complex operands. See Section 7.3.2.7 for more information on assists.

7.4.2.2 CSU Pipeline

The 3-stage CSU pipeline operates under microcode control during the S1, S2, and S3 stages of the Ibox pipeline. Control store address generation, control store lookup, and microword issue occur in the S1 stage. The datapath source busses are driven during the S2 pipeline stage. The S3 stage contains the ALU and write destination bus logic, and memory request logic.

Ordinarily, microwords move through the pipeline synchronously, advancing every cycle. Stalls occur when a resource required for a particular pipeline stage is unavailable. Stalls operate synchronously and transparently to the microcode flow by freezing the sequence and the pipeline, thereby causing the CSU logic to repeat the operation performed in the previous cycle. The stall terminates upon acquisition of the resource which caused the stall, and the pipeline flow returns to normal, advancing every cycle.

7.4.2.2.1 S1 Pipeline Stage

The S1 pipe latch, also called the dispatch latch, controls the S1 pipeline logic. The S1 pipe latch is loaded from the parsed instruction stream data and parameters shown in Table 7-25. S1_RN, S1_AT, S1_DL, S1_DISPATCH, S1_AT_RMW, S1_INDEXED, S1_ASSIST, and S1_PC_MODE load directly from the specifier control field and the specifier complex control field as driven by the IBU. S1_REG_INDEX loads from the MD_INDEX lines coming from the OQU. The 32-bit S1_IB_DATA and S1_IBOX_PC are loaded from SPEC_DATA<31:0> and IBU_PC<31:0>, respectively.

Figure 7-11: Complex Specifier Unit Control Path Block Diagram
S1_RXS_SCORE and S1_RXD_SCORE load from the entry in the SBU scoreboard array pointed to by the GPR number of the specifier. S1_RXS_SCORE and S1_RXD_SCORE represent "snapshot" values of the scoreboard, taken when a specifier dispatch enters the S1 pipe latch. The scoreboard updates the value of these entries based on the Ebox retiring source and destination queue entries. See Section 7.4.3 for scoreboard details. The snapshot values decrement in parallel with the SBU values.

Table 7-25: S1 Pipe Latch

Bit Field    Field Name       Description
<3:0>        S1_RN            GPR number from the specifier.
<6:4>        S1_AT            Access Type of the operand associated with the specifier.
<8:7>        S1_DL            Data length of the operand associated with the specifier.
<12:9>       S1_RXS_SCORE     Value of scoreboard source queue counter indexed by GPR number.
<15:13>      S1_RXD_SCORE     Value of scoreboard dest queue counter indexed by GPR number.
<18:16>      S1_DISPATCH      Control store dispatch address.
<19>         S1_AT_RMW        Access Type of operand is R, M or W.
<20>         S1_INDEXED       The base specifier has an index specifier.
<21>         S1_ASSIST        Ebox assist specifier.
<22>         S1_PC_MODE       The specifier uses program counter addressing.
<25:23>      S1_REG_INDEX     Value of OQU MD allocation pointer.
<57:26>      S1_IB_DATA       Data for immediate and displacement mode specifiers.
<89:58>      S1_IBOX_PC       The PC of the next Istream byte following this specifier.
<90>         S1_VALID         S1 pipe latch valid bit.
<91>         S1_JMP_OR_JSB    Indicates whether the instruction was JMP or JSB.

The S1_VALID bit indicates that the S1 pipe latch contains valid dispatch arguments waiting to be serviced. The CSU recognizes the availability of the valid complex dispatch and performs the control store access. The microword is issued in S1 and loaded into the S2 pipe latch. The CSU sets S1_VALID when a complex specifier is parsed by the IBU and does not advance to stage S2 the following cycle, which is the result of an S1_STALL. The S1 logic clears S1_VALID upon successful transition of the S1 microword into the S2 pipe latch. A clear S1_VALID bit indicates that the S1 pipe stage is available for a new complex specifier dispatch next cycle.

The S1_STALL condition occurs when the S1 context latch cannot be loaded immediately into the S2 pipe latch. This condition may occur during an S2_STALL, when I_IBU%QUAD_FLAG_H<0> is asserted, or during a multiple-microword flow. S2_STALL indicates that the S2 pipe latch cannot currently advance (see Section 7.4.2.2.2 for more details on the S2_STALL). Naturally this stall ripples back to become an S1_STALL as well, because the S1 microword cannot advance into the S2 pipe latch. I_IBU%QUAD_FLAG_H<0> indicates that the IBU is waiting for the second longword of a quadword immediate mode specifier. Once the second longword is retired, I_IBU%QUAD_FLAG_H<0> is de-asserted and the CSU is allowed to process the quadword immediate mode specifier. During multiple-microword flows, the next control store address is generated from the microword in the S2 pipe latch. Consequently, the S1 pipe latch may accept one dispatch from the IBU, which sets S1_VALID. The dispatch in the S1 pipe latch is then in the S1_STALL condition waiting for service.

The IBU uses S1_VALID as part of the parser enable equation. If S1_VALID is clear, then the IBU may parse a complex specifier and retire the instruction stream from the PFQ.
If S1_VALID is set and the IBU parses a complex specifier, the IBU cannot retire the instruction stream because the S1 pipe latch cannot accept the dispatch. The IBU stalls the parser such that the same specifier is parsed in subsequent cycles.

Typical microcode flows begin at a microcode address determined by a complex specifier dispatch. A DECODER_NEXT directive in the S2 pipe latch tells the microsequencer that the next microcode address is not related to the current flow. If S1_VALID indicates a valid dispatch waiting in the S1 pipe latch and the S2 pipe latch contains a DECODER_NEXT, then the microsequencer selects the S1 pipe latch as the source of the next microaddress. This begins a new microcode flow for the specifier being dispatched. The microcode sequences through a flow using microaddress jumps. A jump selects the NXT_ADDR<6:0> field of the microword in the S2 pipe latch directly for the next microword address. The final microword of each flow contains a DECODER_NEXT, which once again requests a new dispatch address.

Requests for IPR references, which are detailed in Section 7.4.2.8, must guarantee that the CSU is idle. Thus, whenever the S1 logic detects an IPR read strobe from the Ebox, the next microaddress is selected by the IPR number. The request immediately dispatches to the utility microcode page. The unwind_mispredict routine is selected when the Ebox signals a branch mispredict. The RLOG unwinds, restoring the GPRs until the RLOG is empty, and then the Ibox is restarted. The CSU dispatches to the common entry point for the single-microword index routine when the dispatch number of a specifier indicates that it is an index. The index register is read from the Ebox and shifted by length = DL.

The microaddress control selects the IDLE address when no valid dispatch or utility dispatch awaits processing.
The IDLE microword simply jumps to its own address and executes the DECODER_NEXT directive, awaiting a valid dispatch.

In addition to the standard DECODER_NEXT directive, the microcode and next address logic support a conditional DECODER_NEXT. The DECODER_NEXT_IF_BWL directive performs a standard DECODER_NEXT if the data length associated with the specifier is byte, word, or longword. For quadword data length, the next address logic performs a microaddress jump.

The microcode and next address logic support one conditional jump. The BRANCH_IF_RLOG_EMPTY directive causes the next microaddress logic to perform a standard jump, but in addition the logical OR of a 1 and next microaddress bit <0> is performed if the RLOG is empty. The RLOG unwind microcode uses this conditional jump feature. A single microword jumps to itself as long as the RLOG still has valid entries. When the RLOG empties, the microword conditionally jumps out of the loop. See Section 7.4.2.3 for RLOG details.

The S1 logic uses a five-input multiplexer to select the source of the next control store address. Both the complex specifier multiplexer input and the Ebox assist multiplexer input use data from the S1 pipe latch to form the next address. The IPR multiplexer input uses the latched IPR number from the Ebox to select which IPR type field will be used to form the next address. The next address field from the S2 microword enters another multiplexer input in order to perform the microaddress jump. The final multiplexer input is the idle address. Next address generation is summarized by Table 7-26.
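The address arithmetic described above is small enough to sketch in Python. The following is an illustrative model under stated assumptions, not the microsequencer logic: the function names are hypothetical, the page/offset split follows the control store organization of Section 7.4.2.1, and the sample address used in the usage note is made up for the example.

```python
IDLE_ADDR = 0b1111111  # idle dispatch forces all seven address bits to 1


def split_address(addr):
    """Split a 7-bit control store address into (page <6:4>, offset <3:0>)."""
    if not 0 <= addr < 128:
        raise ValueError("control store address must fit in 7 bits")
    return addr >> 4, addr & 0xF


def jump_target(nxt_addr, branch_if_rlog_empty=False, rlog_empty=False):
    """Next microaddress for a jump taken from the NXT_ADDR<6:0> field.

    For BRANCH_IF_RLOG_EMPTY, a 1 is ORed into bit <0> when the RLOG is empty.
    """
    addr = nxt_addr & 0x7F
    if branch_if_rlog_empty and rlog_empty:
        addr |= 1
    return addr


def decoder_next_if_bwl(dl):
    """True if DECODER_NEXT_IF_BWL acts as DECODER_NEXT (byte, word, longword).

    Returns False for quadword (DL=3), where a microaddress jump is taken.
    """
    if dl not in (0, 1, 2, 3):
        raise ValueError("DL is a 2-bit field")
    return dl != 3
```

With a hypothetical unwind microword at an even address such as 0b1100010, `jump_target` keeps returning that same address while the RLOG holds entries and returns the adjacent odd address once the RLOG is empty, which is how a single microword can loop on itself and then exit.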
Table 7-26: Next Address Generation

bit field    field name          description

Specifier Dispatch
<0>                              forced to 0
<1>          S1_INDEXED          index specifier
<2>          S1_PC_MODE          base register is the PC
<3>          S1_AT_RMW           access type = read, modify, or write
<6:4>        S1_DISPATCH<2:0>    field from the IBU

Assist Dispatch
<0>                              forced to 0
<3:1>                            assist type
<6:4>                            forced to 111, assist page number

IPR and Utility Dispatch
<0>                              forced to 0
<3:1>        000                 index routine
<3:1>        001                 IPR unwind RLOG/read back-up PC
<3:1>        010                 MISPREDICT_L
<3:1>        011                 IPR read
<6:4>                            forced to 110, IPR/utility page number

Idle Dispatch
<6:0>                            forced to 1111111, idle address

Next Address
<0>          NXT_ADDR            next address field from the MICROWORD; for a conditional jump, OR in 1 if the RLOG is empty
<6:1>        NXT_ADDR            next address field from the MICROWORD

7.4.2.2.2 S2 Pipeline Stage

The S2 pipe latch controls the S2 pipeline datapath. Each cycle, the S2 pipe latch attempts to load a microword and specifier-specific parameters from the instruction stream. The S2 pipe latch is shown in Table 7-27.

Table 7-27: S2 Pipe Latch

Bit Field    Field Name       Description
<3:0>        S2_RN            GPR number from the specifier.
<6:4>        S2_AT            Access Type of the operand associated with the specifier.
<8:7>        S2_DL            Data length of the operand associated with the specifier.
<11:9>       S2_REG_INDEX     Current value of S2 MD allocation pointer or WX index.
<15:12>      S2_RXS_SCORE     Value of scoreboard source queue counter indexed by GPR number.
<18:16>      S2_RXD_SCORE     Value of scoreboard dest queue counter indexed by GPR number.
<47:19>      S2_MICROWORD     The microword issued in S1.
<48>         S2_NEW_FLOW      Indicates the first microword of a flow.
<49>         S2_JSB_OR_JMP    Indicates whether the instruction was JMP or JSB.

S2_RN, S2_AT, S2_DL, S2_JSB_OR_JMP, S2_RXS_SCORE, and S2_RXD_SCORE load directly from the S1 pipe latch.
S2_RXS_SCORE and S2_RXD_SCORE decrement in parallel with their corresponding SBU values. S2_REG_INDEX typically loads directly from S1_REG_INDEX; however, if the dispatch is for an IPR read, it loads a copy of WX_INDEX from the Ebox.

The S2_MICROWORD field of the S2 pipe latch updates from the microword issued by the S1 pipe stage. During an initial specifier dispatch, all of the S2 pipe latch updates. Bits <48:19> of the latch update every cycle, assuming no stalls. However, bits <49,18:0> of the latch remain constant throughout the context of one specifier flow, except for local scoreboard decrements of S2_RXS_SCORE and S2_RXD_SCORE. This part of the S2 pipe latch does not load again until another dispatch occurs. This allows for multiple-microword flows within the context of a given specifier.

S2_NEW_FLOW indicates that the contents of the S2 pipe latch represent the first microword of a new dispatch. In other words, the microword address for the microword in S2 was generated in any manner other than a microaddress jump. This pipe bit aids the S3 stage in loading the specifier context portion of the S3 latch. See Section 7.4.2.2.3 for details.

The S2 datapath contains the CSU register set and constant generator. The CSU ALU source busses, the IA_bus and IB_bus, are driven in the S2 pipeline stage under control of the microcode /A and /B fields. The CSU microcode may also request an Ebox GPR to source the IA_bus by providing I%IBOX_IA_ADDR_H<3:0> from the S2_RN field of the S2 pipe latch. The Ebox register read is strobed with I%IBOX_IA_READ_H. The Ebox returns GPR data later that cycle on the E%IBOX_IA_BUS_H<31:0> lines. This provides a path for the CSU to obtain the base specifier register of the operand currently being processed. When the S2_MICROWORD is sourcing a GPR which is identical to the S3_MICROWORD destination register, the IW_BUS is driven onto the source bus, bypassing the GPR read.
Table 7-28: CSU Registers

Register    Available    Written
Name        On           From         Description
T0          IA,IB        IW           temporary register
IB_DATA     IA,IB        SPEC_DATA    immediate and displacement data
RX          IA           IW           base specifier register
IMD         IA           MD           Ibox memory data
MD                       IW           Ebox memory data register
WX                       IW           Ebox working register
KDL         IB                        1 for DL=Byte, 2 for Word, 4 for LONG, 8 for QUAD
IDL         IB                        1 for DL=Byte, 2 for Word, 4 for LONG, 4 for QUAD
K4          IB                        Constant 4
K12         IB                        Constant 12
RLOG_RX     IA           IW_BUS       Register pointed to by top of RLOG
RLOG_KDL    IB                        Same as KDL except using DL from top of RLOG stack
IBOX_PC     IA           IBU_PC       PC of instruction byte following last byte in specifier

T0 is a temporary register for microcode use. IB_DATA and IBOX_PC are the S2 pipeline copies of S1_IB_DATA and S1_IBOX_PC, respectively. IB_DATA and IBOX_PC are loaded along with S2_PIPE_LATCH<18:0> on the first microword of a dispatch. The CSU microcode then maintains control of these registers throughout the context of a given specifier flow. RX refers to the Ebox GPR indexed by S2_RN. RLOG_RX refers to the Ebox GPR indexed by RLOG_RN. See Section 7.4.2.3 for more details. MD addresses the Ebox MD register indexed by S2_REG_INDEX. WX points to the Ebox working register, also indexed by S2_REG_INDEX.

K4 and K12 are constants. KDL is a constant based on S2_DL. The value of the constant is 1 for DL=0 (byte), 2 for DL=1 (word), 4 for DL=2 (longword), and 8 for DL=3 (quadword). IDL is a constant based on S2_DL for immediate mode specifiers with access type A or V. IDL differs from KDL in that the constant value is 4 for DL=3 (quadword). RLOG_KDL is a constant similar to KDL, but based on RLOG_DL. See Section 7.4.2.3 for more details.
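The relationship between the DL encoding and the KDL and IDL constants can be written out directly. The Python sketch below is illustrative only; the function and constant names are hypothetical, and the values are taken from the paragraph above.

```python
DL_BYTE, DL_WORD, DL_LONG, DL_QUAD = 0, 1, 2, 3  # 2-bit DL encoding


def kdl(dl):
    """KDL constant: operand size in bytes (1, 2, 4, 8) for DL = 0..3."""
    if dl not in (DL_BYTE, DL_WORD, DL_LONG, DL_QUAD):
        raise ValueError("DL is a 2-bit field")
    return 1 << dl


def idl(dl):
    """IDL constant: same as KDL, except that quadword yields 4 instead of 8."""
    return 4 if dl == DL_QUAD else kdl(dl)
```

RLOG_KDL would be the same computation as `kdl`, applied to RLOG_DL rather than S2_DL.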
For a majority of memory requests started by the CSU microcode, the Ibox memory data returns to the IMD register. The Mbox drives M%IBOX_DATA_L when M%MD_BUS_H<31:0> contains valid data from a specifier memory request. The IMD has a signal, IMD_VALID, associated with it. Each time the CSU microcode initiates a memory request, IMD_VALID is set. Each time memory data returns to the IMD, IMD_VALID is reset.

When M%MME_FAULT_H or M%HARD_ERR_H is asserted by the Mbox along with M%IBOX_DATA_L, this indicates that the Ibox data on M%MD_BUS_H<63:0> is invalid and that the corresponding reference was associated with either a memory management exception or a hard error condition. In both cases the CSU continues to process the specifier, but sets flags indicating that the IMD contains invalid data. The flags are reset at the end of each specifier flow. They are forwarded to stage S3 whenever the IMD is selected to source the IA_bus. They are called I%FORCE_MME_FAULT_H and I%FORCE_HARD_FAULT_H. When set, they indicate to the Ebox and Mbox that the associated register write or Ibox reference should be forced to "look" like a memory management fault or a hardware fault from the Ibox point of view.

The S2 pipeline stage stalls for three reasons: GPR destination queue stall (RXD_STALL), Ibox memory data stall (IMD_STALL), and S3_STALL.

RXD_STALL occurs when the CSU microcode attempts a read of a GPR for which there exist outstanding writes in the Ebox destination queue. The S2 pipeline logic detects RXD_STALL when S2_RXD_SCORE does not equal 0 and the S2_MICROWORD attempts to read the GPR from the Ebox indexed by S2_RN. The stall breaks when the Ebox retires a destination queue entry that causes both the SBU counter and the snapshot S2_RXD_SCORE to decrement. Multiple destination queue entries may have to be retired, causing multiple decrements, before S2_RXD_SCORE equals 0.
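The RXD_STALL condition can be pictured with a very small behavioural model. The Python sketch below is an illustration under simplifying assumptions: the class and method names are hypothetical, and each call to `retire` stands for one Ebox destination queue retirement that refers to the GPR being tracked.

```python
class RxdSnapshot:
    """Sketch of the S2_RXD_SCORE snapshot for one specifier flow (not RTL)."""

    def __init__(self, score):
        self.score = score  # outstanding destination queue writes to this GPR

    def stalled(self, microword_reads_gpr):
        """RXD_STALL: the microword reads the GPR while the score is nonzero."""
        return microword_reads_gpr and self.score != 0

    def retire(self):
        """One matching destination queue entry retired by the Ebox."""
        if self.score > 0:
            self.score -= 1
```

Starting from a score of 2, the model stays stalled after the first retirement and releases only after the second, matching the statement that multiple decrements may be needed before the score reaches 0.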
IMD_STALL occurs when the S2_MICROWORD attempts to read the IMD when IMD_VALID is set. This condition implies that a memory request was initiated by CSU microcode, which set IMD_VALID, but the memory data which resets the signal has not yet been returned. IMD_STALL can only happen in the context of one complex specifier flow, when the Ibox requests and then waits for memory data to be returned to the IMD.

S2_STALLs block the S2 pipeline latch update, causing the S2 stage to execute the same stalled microword until the stall breaks. If an S2 stall occurs that does not result from an S3 stall, the S3 pipeline latch continues to update; however, NOPs are fed into the S3 pipeline latch while the S2 stall is in progress. When the stall breaks, the pipeline latches resume normal operation.

7.4.2.2.3 S3 Pipeline Stage

The S3 pipe latch controls the S3 pipeline datapath. Each cycle, the S3 pipe latch attempts to load a microword and the specifier-specific parameters from the instruction stream. The S3 pipe latch is shown in Table 7-29.

Table 7-29: S3 Pipe Latch

Bit Field    Field Name       Description
<3:0>        S3_RN            GPR number from the specifier.
<6:4>        S3_AT            Access Type of the operand associated with the specifier.
<8:7>        S3_DL            Data length of the operand associated with the specifier.
<11:9>       S3_REG_INDEX     Current value of S3 MD allocation pointer or WX index.
<15:12>      S3_RXS_SCORE     Value of scoreboard source queue counter indexed by GPR number.
<46:16>      S3_MICROWORD     The microword issued in S1.
<47>         S3_JSB_OR_JMP    Indicates whether the instruction was JMP or JSB.

S3_RN, S3_AT, S3_DL, S3_REG_INDEX, S3_JSB_OR_JMP, and S3_RXS_SCORE load directly from the S2 pipe latch. S3_RXS_SCORE decrements in parallel with its corresponding SBU value. When S3 logic initiates a memory reference with an MD destination, S3_REG_INDEX specifies the index into the MD register array for the memory data write.
Such memory requests cause MD_INDEX to increment modulo the size of the MD register file, so that the data for quadword operands, which require two memory requests, occupy successive MD registers.

The S3_MICROWORD field of the S3 pipe latch updates from the S2_MICROWORD. During the first instruction of a specifier dispatch flow, as indicated by the contents of S2_NEW_FLOW, all of the S3 pipe latch updates. The microword field in bits <46:16> continues to update every cycle, loading the new microword from S2. However, bits <47,15:0> of the latch remain constant throughout the context of one specifier flow, except for local scoreboard decrements of S3_RXS_SCORE and local increments of S3_REG_INDEX. This part of the S3 pipe latch does not reload until another dispatch occurs, allowing for multiple-microword flows within the context of a given specifier.

The S3 datapath contains the CSU ALU and register write logic. The ALU maintains 32-bit input latches which load the IA_BUS and IB_BUS during an S3 pipe latch update. Under control of the microcode /ALU.FNC field, the ALU performs 32-bit add, subtract, pass, and left bit-shift equal to S2_DL. The destination bus, IW_bus, provides the path to write the ALU results to one of the CSU registers under control of the microcode /DST field.

The IW_bus can also be selected to write to the Ebox GPR, MD, and working (WX) registers. The I%IBOX_IW_BUS_H<31:0> lines are driven from the ALU output, and the S3_RN field of the S3 pipe latch provides I%IBOX_IW_ADDR_H<4:0> as an index into the GPR array. MD and WX writes both use the S3_REG_INDEX field of the S3 pipe latch to provide I%IBOX_IW_ADDR_H<4:0> as an index into the Ebox register array. The Ebox register write is strobed with I%IBOX_IW_WRITE_H.

The S3 stage logic initiates CSU memory requests based on the S3_MICROWORD.
Along with a memory request command, the full 32-bit address is sent to the Mbox on the I%IBOX_ADDR_H<31:0> lines. These lines may be sourced from either the IA_BUS or the IW_BUS, under control of the S3_MICROWORD /MREQ field. If microcode selects the IA_BUS for the memory request address, the S3 pipe latch for the IA_BUS sources the address. The S3 logic also forwards VIC_REQ from VIC Istream requests to the Mbox when there are no specifier memory requests in the S3_MICROWORD. In this case, I%IBOX_ADDR_H<31:0> is sourced by VIC_REQ_ADDR from the VIC.

The following control signals accompany I%IBOX_ADDR_H<31:0>. I%IBOX_CMD_L<4:0> indicates the reference type to the Mbox. See Section 12.3.1 in Chapter 12 for valid values. I%IBOX_TAG_L<4:0> contains the Ebox register file destination of a memory request, a copy of S3_REG_INDEX. I%IBOX_AT_L<1:0> and I%IBOX_DL_L<1:0> provide the Mbox with the access type and data length. I%IBOX_AT_L<1:0> is either a copy of S3_AT or forced to read or write, depending on control of the microcode /MREQ field. I%IBOX_DL_L<1:0> is either a copy of S3_DL or forced to longword, depending on control of the microcode /ML field. I%IBOX_REF_DEST_L<1:0> specifies the destination for memory request data. I%IBOX_REF_DEST_L<1> indicates that the Ebox MD registers are the destination. I%IBOX_REF_DEST_L<0> indicates that the Mbox IMD register is the destination. This field is decoded from the S3_MICROWORD memory field. The I%SPEC_REQ_H strobe is asserted for CSU specifier memory requests. The I%IREF_REQ_H strobe is asserted for VIC Istream memory requests.

For JMP, JSB, and certain Ebox assists, the S3 logic sends requests to the BPU to load a new PC. The PC value may be sourced from either I%IBOX_IW_BUS_H<31:0> or M%MD_BUS_H<31:0> under S3_MICROWORD /MISC field control, as indicated by LD_PC_WBUS or LD_PC_MD, respectively.
The S3 pipeline stage stalls for three reasons: GPR source queue stall (RXS_STALL), memory request stall (MRQ_STALL), and RLOG stall (RLOG_STALL).

RXS_STALL occurs when the CSU microcode attempts to write a GPR destination for which there exist outstanding reads in the Ebox source queue. The S3 pipeline logic detects RXS_STALL when S3_RXS_SCORE does not equal 0 and the S3_MICROWORD attempts to write the GPR in the Ebox indexed by S3_RN. The stall breaks when the Ebox retires a source queue entry that causes both the SBU counter and the snapshot S3_RXS_SCORE to decrement. Multiple source queue entries may have to be retired, causing multiple decrements, before S3_RXS_SCORE equals 0.

RLOG_STALL occurs when RLOG_FULL is asserted and the microword in the S3 pipe requests a GPR write. The stall effect is exactly the same as RXS_STALL. The stall breaks when the Ebox retires an instruction, which in turn relinquishes RLOG resources.

MRQ_STALL occurs when the S3_MICROWORD attempts a memory request but the M%SPEC_Q_FULL_H signal from the Mbox indicates that the request cannot be accepted.

S3_STALLs block the S3 pipeline latch update, causing the S3 stage to execute the same stalled microword until the stall breaks. S3_STALLs also back-stall the S2 stage, in effect causing an S2_STALL which blocks the S2 pipeline latch update. Both pipeline stages execute their respective stalled microwords until the stall condition breaks, allowing successful completion of the microword. The pipeline latches then continue to update as usual.

RXS_STALL does not block the initiation of a memory request by the S3_MICROWORD. In other words, if the S3_MICROWORD indicates a memory request operation and no MRQ_STALL or RLOG_STALL exists, the request is initiated regardless of RXS_STALL.
This somewhat de-coupled operation of the S3_STALLs breaks possible macroinstruction deadlocks due to the R0,(R0)+ case. While processing the specifier (R0)+, the CSU microcode performs a write to the GPR R0. An RXS_STALL will hold until the Ebox retires the first source, R0. The Ebox must retire two source operands at a time, and therefore cannot retire the R0 specifier until the MD for the second specifier is valid. The converse case, whether MRQ_STALL blocks a register write, is not an architectural or performance issue. This implementation blocks register writes during an MRQ_STALL.

Figure 7-12: Complex Specifier Unit Data Path Block Diagram

7.4.2.3 RLOG

The register log, or RLOG, allows the Ibox to restore the state of the GPRs under certain exception conditions. Because of the pipeline organization, the Ibox works on macroinstructions ahead of the Ebox execution. Any or all of six possible operand specifiers for any distinct macroinstruction may be auto-increment or auto-decrement mode, which by definition modify the GPRs. The Ibox must log all modifications to the GPRs for these operand specifiers and keep the log until the Ebox has retired the associated instruction. If the instruction stream gets redirected due to a branch or exception, then the Ibox uses the RLOG to restore the GPRs to the condition expected at the time of the redirection.

The RLOG is an 8-entry circular queue with read and write pointers.
Each entry is composed of 7 bits: 4 bits contain the GPR number, 2 bits specify DL, and 1 bit indicates auto-increment or auto-decrement. Elements are added to the RLOG under control of the S3_MICROWORD /DST field. When the microword specifies a register log operation, S3_RN, S3_DL, and the encoded /ALU.FNC are entered in the RLOG entry pointed to by the write pointer. The write pointer is then incremented modulo 8. If the RLOG write pointer reaches the state in which another increment causes the write pointer to equal the read pointer, then the RLOG is full. The RLOG full condition may cause an RLOG_STALL as described in Section 7.4.2.2.3.

The RLOG only contains specifier state for macroinstructions which the Ebox has not executed. When the Ebox retires a macroinstruction, the RLOG discards the RLOG entries associated with that macroinstruction by advancing the RLOG read pointer. The RLOG_BASE_POINTER and RLOG_BASE_QUEUE provide the means for read pointer advancement. The RLOG_BASE_POINTER increments any time a valid auto-increment address mode specifier, auto-decrement address mode specifier, auto-increment assist, or auto-decrement assist appears on SPEC_CTRL. In effect, the RLOG_BASE_POINTER allocates RLOG spaces for the CSU to make subsequent entries. The RLOG_BASE_POINTER is loaded into the 6-entry RLOG_BASE_QUEUE each time a new PC is loaded into the PC_QUEUE. The RLOG_BASE_QUEUE thus maintains an RLOG read pointer for every PC in the PC_QUEUE. The RLOG_BASE_QUEUE and the PC_QUEUE both retire entries when the Ebox asserts E%RETIRE_INSTR_L, indicating that it has retired a macroinstruction. The RLOG read pointer loads the value of the next RLOG_BASE_QUEUE entry at this time.

The CSU microcode controls the RLOG unwind procedure.
RLOG unwind consists of repeatedly executing a microword that updates the GPRs based on indirect references to RLOG_RN, RLOG_DL, and RLOG_FUNC. The RLOG supplies the values for the indirect references from the entry pointed to by the read pointer. This entry is retired by incrementing the read pointer. The RLOG retires successive entries until the read pointer is equal to the write pointer, at which point the RLOG is empty. The unwind procedure then completes, and the RLOG is flushed by resetting the RLOG read and write pointers, the RLOG_BASE_POINTER, and the RLOG_BASE_QUEUE read and write pointers. If the RLOG is empty when the microcode initiates an unwind, 0 will be added to whatever GPR is pointed to by the read pointers.

7.4.2.4 Branch Mispredict Effects

When the Ebox asserts E%BRANCH_MISPREDICT_L, the NOP microword is forced into the S3 pipeline stage, the S1 pipe latch valid bit is cleared, and the next microaddress logic selects the MISPREDICT.UNWIND utility routine address. The microcode at this location unwinds the RLOG and then restarts the Ibox. If the RLOG is empty when the microcode initiates an unwind, 0 will be added to whatever GPR is pointed to by the read pointers. Note that the RLOG is NOT flushed on the assertion of E%BRANCH_MISPREDICT_L. It needs to remain intact to be unwound by CSU microcode. IMD_VALID is reset upon the assertion of E%BRANCH_MISPREDICT_L.

7.4.2.5 E%STOP_IBOX Effects

When the Ebox asserts E%STOP_IBOX_H, the microsequencer jams the CSU to the idle state, except when the CSU is in the middle of the IPR transaction unwind RLOG/read back-up PC. In this situation, the RLOG will unwind until completion, and the read of the back-up PC will be disabled. The CSU is put into the idle state by forcing NOP microwords into the S2 and S3 pipeline stages, clearing the S1 pipe latch valid bit, and selecting the IDLE microaddress.
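The RLOG queue discipline of Section 7.4.2.3 can be illustrated with a short Python model. This is a sketch under stated assumptions rather than a description of the hardware: the class and method names are hypothetical, the RLOG_BASE_POINTER and RLOG_BASE_QUEUE are omitted, and the direction of each restore (subtracting the operand size for a logged auto-increment, adding it for a logged auto-decrement) is an assumption, since the text states only that the GPRs are updated from RLOG_RN, RLOG_DL, and RLOG_FUNC.

```python
class RLog:
    """Behavioural sketch of the 8-entry RLOG circular queue (not RTL)."""

    SIZE = 8

    def __init__(self):
        self.entries = [None] * self.SIZE
        self.rd = 0  # read pointer
        self.wr = 0  # write pointer

    def empty(self):
        return self.rd == self.wr

    def full(self):
        # Full when one more increment would make write equal read.
        return (self.wr + 1) % self.SIZE == self.rd

    def log(self, rn, dl, auto_increment):
        """Record a GPR modification: GPR number, DL, and direction bit."""
        if self.full():
            raise RuntimeError("RLOG full (would cause RLOG_STALL)")
        self.entries[self.wr] = (rn, dl, auto_increment)
        self.wr = (self.wr + 1) % self.SIZE

    def unwind(self, gprs):
        """Undo every logged modification, then flush the pointers."""
        while not self.empty():
            rn, dl, auto_increment = self.entries[self.rd]
            size = 1 << dl  # same value as the KDL constant for this DL
            delta = -size if auto_increment else size  # assumed direction
            gprs[rn] = (gprs[rn] + delta) & 0xFFFFFFFF
            self.rd = (self.rd + 1) % self.SIZE
        self.rd = self.wr = 0
```

Because fullness is declared one slot early, the model holds at most seven entries at a time, which follows directly from the full condition as stated in the text.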
7.4.2.6 RSVD_ADDR_FAULT Effects

When I%RSVD_ADDR_FAULT_H is asserted for a complex specifier, the S1 pipe latch valid bit is cleared. If there is no S1 stall, the NOP microword is forced into the S2 pipeline stage. Complex specifiers already in the CSU pipeline when I%RSVD_ADDR_FAULT_H is asserted are allowed to finish processing.

7.4.2.7 CSU Microcode Restrictions

The CSU microcode must guarantee, for all auto-increment, auto-increment deferred, and auto-decrement specifier microcode flows, that any specifier memory request destined for the MD is issued before or during the microword that modifies the GPR. Otherwise, it is possible for the CSU to stall indefinitely due to an RXS_STALL. This is evident in the case ADDL2 R0,@(R0)+, where the Ebox must retire two source operands and therefore cannot retire the R0 specifier until the MD for the second specifier is valid.

The CSU microcode must also guarantee, for all auto-increment, auto-increment deferred, and auto-decrement specifier microcode flows, that the microword which initiates the memory request destined for the MD has the misc field stall_if_rlog_full if the following microword modifies the GPR.

The CSU microcode must guarantee, for all auto-increment, auto-increment deferred, auto-decrement, and auto-decrement deferred specifier microcode flows with access type AV, that the microword which writes the MD is immediately followed by the microword that modifies the GPR. This, in conjunction with an Ebox microcode restriction, is necessary to prevent an infinite RXS stall from occurring.

The CSU microcode must guarantee that memory requests which specify the Ibox IMD as the data destination are used only for deferred operand evaluation. For a microword with an [IMD] source,
the previous microword must initiate the memory request with destination IMD, must not perform a GPR write, and must not have the misc field stall_if_rlog_full. All this is necessary to protect the use of an unconditional MD latch in the CSU datapath.

7.4.2.8 Ibox IPR Transactions

The Ebox microcode communicates with the Ibox in part through internal processor registers (IPRs). The IPR reads are handled by CSU microcode. The IPR write control is distributed; however, the description is included here for completeness.

Ebox microcode conventions guarantee that the Ibox is idle before initiating Ibox IPR transactions. This is accomplished either by the knowledge that the current Ebox microcode flow takes place in a macroinstruction with a drain Ibox assist or by asserting an explicit E%STOP_IBOX_H command. The only exception involves the issuing of an IPR transaction when the CSU is involved in an RLOG unwind operation. In this case the unwind finishes in the CSU, then the CSU processes the latched IPR command. If the RLOG is empty when the microcode initiates an unwind, 0 will be added to whatever GPR is pointed to by the read pointers.

MICROCODE RESTRICTION

7.4.2.8.1 IPR Reads

The Ebox signifies an IPR read by asserting the E%IBOX_IPR_READ_H strobe, the E%IBOX_IPR_TAG_H<2:0>, and the E%IBOX_IPR_NUM_H<3:0>. This information is latched in the S1 logic stage, and an IPR request flag is posted. The S1 next address logic responds by creating an IPR dispatch to an IPR microaddress in the utility page of microcode, and by clearing the IPR request flag. All Ibox logic blocks associated with IPR reads examine the E%IBOX_IPR_TAG_H<2:0>. If the IPR source is within a section, that section prepares to drive the IPR read data onto the VIC_REQ_ADDR.
The microcode at the common IPR routine reads the VIC_REQ_ADDR, passes the value through the ALU, and writes the data to an Ebox working register located at the E%IBOX_IPR_NUM_H<3:0> offset in the register array. The VIC_REQ_ADDR is used as the IPR read data source simply because it is a convenient 32-bit bus that runs through the entire section.

7.4.2.8.2 IPR Writes

The Ebox signifies an IPR write by asserting the E%IBOX_IPR_WRITE_H strobe and the E%IBOX_IPR_TAG_H<2:0>. All Ibox logic blocks associated with IPR writes examine the E%IBOX_IPR_TAG_H<2:0>. If the IPR destination is within a section, that section prepares to accept the IPR write data from the M%MD_BUS_H<63:0>. The Mbox drives the M%MD_BUS_H<63:0> with IPR data and asserts M%IBOX_IPR_WR_H to complete the transaction.

7.4.3 Scoreboard Unit

The Scoreboard Unit (SBU) keeps track of the number of outstanding references to GPRs in the source and destination queues. The SBU contains two arrays of 15 counters: the RXS_ARRAY for the source queue and the RXD_ARRAY for the destination queue. The counters in the arrays map one-to-one with the GPRs. There is no scoreboard counter corresponding to GPR 15, the PC, because RMODE operations to the PC are unpredictable. The maximum number of outstanding operand references determines the maximum count value for the counters. This value is based on the length of the source and destination queues. The RXS_ARRAY counts up to 12 and the RXD_ARRAY counts up to 6.

Each time valid register mode source specifiers appear on SPEC_CTRL<13:0>, the RXS_ARRAY counters that correspond to those registers are incremented. At the same time, the OQU inserts entries pointing to these registers in the source queue. In other words, for each register mode source queue entry, there is a corresponding RXS_ARRAY counter increment.
This implies a maximum of 2 counters incrementing each cycle when a quadword register mode source operand is parsed. Each counter may only be incremented by 1. When the Ebox removes the source queue entries, the counters are decremented. The Ebox removes up to 2 register mode source queue entries per cycle as indicated on E%SQ_RETIRE_RMODE_H<1:0>. The GPR numbers for these registers are provided by the Ebox on E%SQ_RETIRE_RN1_H<3:0> and E%SQ_RETIRE_RN2_H<3:0>. A maximum of 2 counters may decrement each cycle, or any one counter may be decremented by up to 2 if both register mode entries being retired point to the same base register.

In a similar fashion, when a new register mode destination specifier appears on SPEC_CTRL<13:0>, the RXD_ARRAY counter that corresponds to that register is incremented. A maximum of 2 counters increment in one cycle for a quadword register mode destination operand. When the Ebox removes a destination queue entry, the counter is decremented. The Ebox indicates removal of a register mode destination queue entry on E%DQ_RETIRE_RMODE_H. The GPR number for the register is provided by the Ebox on E%DQ_RETIRE_RN_H<3:0>.

Whenever a complex specifier is parsed, the GPR associated with that specifier is used as an index into the source and destination scoreboard arrays, and snapshots of both scoreboard counter values are passed to the CSU on RXS_SCORE<3:0> and RXD_SCORE<2:0>. The CSU stalls if it needs to read a GPR for which the destination scoreboard counter value is non-zero. A non-zero destination counter indicates that there is at least one pointer to that register in the destination queue. This means that there is a future Ebox write to that register and that its current value is invalid. The CSU also stalls if it needs to write a GPR for which the source scoreboard counter value is non-zero. A non-zero source scoreboard value indicates that there is at least one pointer to that register in the source queue.
This means that there is a future Ebox read to that register and its contents must not be modified. For both scoreboards, the copies in the CSU pipe are locally decremented on assertion of the retire signals from the Ebox.

7.4.3.1 E%STOP_IBOX and Branch Mispredict PC Load Effects

Whenever a branch mispredict PC load occurs, or the Ebox issues an E%STOP_IBOX_H, all scoreboard array counters are cleared.

7.5 Branch Prediction

The Branch Prediction Unit (BPU) monitors each instruction opcode as it is parsed, looking for a branch opcode. Upon identification of a branch opcode, the BPU predicts whether or not the branch will be taken. If the BPU predicts the branch will be taken, it adds the sign extended branch displacement to the current PC and broadcasts the resulting new PC to the rest of the Ibox on the NEW_PC lines.

7.5.1 Branch Prediction Unit

7.5.1.1 The Branch Prediction Algorithm

The BPU uses a "Branch History" algorithm for predicting branches. The basic premise behind this algorithm is that branch behavior tends to be patterned. If one looks in a program at one particular branch instruction, and traces over time that instruction's history of branch taken vs. branch not taken, in most cases a pattern develops. Branch instructions that have a past history of branching seem to maintain that history and are more likely to branch than not branch in the future. Branch instructions which follow a pattern such as branch, no branch, branch, no branch, etc., are likely to maintain that pattern. Branch history algorithms for branch prediction attempt to take advantage of this "branch inertia".

The NVAX branch prediction unit uses a table of branch histories and a prediction algorithm based on the past history of the branch. When the BPU encounters a conditional branch opcode, a subset of the opcode PC bits is used to access the branch history table.
The output from the table is a 4-bit field containing the branch history information for the branch. From these 4 history bits, a new prediction is calculated indicating the expected branch path.

Many different opcode PCs map to each entry of the branch table because only a subset of the PC bits form the index. When a branch opcode changes outside of the index region, the history table entry that it indexes may be based on a different branch opcode. The branch table relies on the principle of locality, and assumes that, having switched PCs, the current process operates within a small region for a period of time. This allows the branch history table to generate pertinent history relating to the new PC within a few branches.

The branch history information consists of a string of 1's and 0's indicating what that branch did the last four times it was seen. For example, 1100, read from right to left, indicates that the last time this branch was seen it did not branch. Neither did it branch the time before that. But it branched the two times before those. The prediction bit is the result of passing the stored history bits through logic which predicts the direction a branch will go given the history of its last four executions.

The prediction algorithm is accessible via IPR for software programming and testability reasons. After power-up, the Ebox microcode initializes the branch prediction algorithm segment of the BPCR register with an algorithm which is the result of extensive simulation and statistics gathering. While it would be possible to create a program for which this prediction logic is wrong all the time, on average it does very well. This algorithm is shown in Table 7-30. The BPCR is discussed in greater detail in Section 7.5.1.8.

7.5.1.2 The Branch History Table

The 512 entries in the branch table are indexed by the opcode PC<8:0>.
Each branch table entry, as depicted in Figure 7-13, contains the previous four branch history bits for branch opcodes at this index. The Ebox asserts E%FLUSH_BPT_H under microcode control during process context switches. This signal resets all branch table entries to a neutral value: history = 0100. This will result in a next prediction of 0.

MICROCODE RESTRICTION

E%FLUSH_BPT_H may only occur while the Ibox is stopped. E%FLUSH_BPT_H must be asserted before the first branch is executed.

Figure 7-13: Branch Table Entry Format

  3   2   1   0
+---+---+---+---+
|    History    |
+---+---+---+---+
          (most recent)

7.5.1.3 Branch Prediction Sequence

When the BPU encounters a conditional branch opcode it reads the branch table entry indexed by PC<8:0>. If the prediction logic indicates the branch taken, then the BPU sign extends and adds the branch displacement supplied by the IBU to the current PC, and broadcasts the result to the Ibox on the NEW_PC lines. If the prediction bit indicates not to expect a branch taken, then the current PC in the Ibox remains unaffected. The alternate PC in both cases (current PC in the predicted taken case, and branch PC in the predicted not taken case) is retained in the BPU until the Ebox retires the conditional branch. When the Ebox retires a conditional branch, it indicates the actual direction of the branch. The BPU uses the alternate PC to redirect the Ibox in the case of an incorrect prediction. Section 7.5.1.7 has more details on mispredicted branches.

The branch table is written with new history each time a conditional branch is encountered. Once a prediction is made, the oldest of the branch history bits is discarded. The remaining 3 branch history bits and the new predicted history bit are written back to the table at the same branch PC index. When the Ebox retires a branch queue entry for a conditional branch, if there was not a mispredict, the new entry is unaffected and the BPU is ready to process a new conditional branch.
If a mispredict is signaled, the same branch table entry is rewritten; this time the least significant history bit receives the complement of the predicted direction, reflecting the true direction of the branch.

The branch prediction logic is based on the contents of the BPCR register, described in Section 7.5.1.8. After power-up, as part of the initialization sequence, the Ebox microcode initializes the BPCR to FECA (hex), which implements the truth table in Table 7-30.

MICROCODE RESTRICTION

An IPR write to the BPCR register in the BPU is required after power-up to load the branch prediction algorithm.

Table 7-30: Branch Prediction Logic

Branch History    Prediction for Next Branch
0000              Not Taken
0001              Taken
0010              Not Taken
0011              Taken
0100              Not Taken
0101              Not Taken
0110              Taken
0111              Taken
1000              Not Taken
1001              Taken
1010              Taken
1011              Taken
1100              Taken
1101              Taken
1110              Taken
1111              Taken

7.5.1.4 The Branch Queue

Each time the BPU makes a prediction on a branch opcode, it sends information about that prediction to the Ebox on the I%BRANCH_BUS_H<1:0>. The Ebox maintains a queue of branch data entries containing information about branches that have been processed by the BPU but not by the Ebox. The bus is 2 bits wide: one valid bit and one bit to indicate whether the Ibox took the branch or not. Entries are made to the branch queue for both conditional and unconditional branches. For unconditional branches, the value of I%BRANCH_BUS_H<0> is ignored by the Ebox. The branch queue length is selected such that it does not overflow, even if the entire instruction queue is filled with branch instructions and there are branch instructions currently in the Ebox pipeline. At any one time there may be only one conditional branch in the queue.

A queue entry is not made until a valid displacement has been processed.
In the case of a second conditional branch encountered while a first is still outstanding, the entry may not be made until the first conditional branch has been retired.

7.5.1.5 Branch Mispredict

When the Ebox executes a branch instruction and makes the final determination on whether the branch should or should not be taken, it removes the next element from the branch queue and compares the direction taken by the Ibox with the direction that should be taken. If these differ, the Ebox sends E%BRANCH_MISPREDICT_L to the BPU. A mispredict causes the Ibox to stop processing, undo (using the RLOG) any GPR modifications made while parsing down the wrong path, and restart processing at the correct alternate PC.

7.5.1.6 Branch Stall

The BPU back-pressures the IBU by asserting BRANCH_STALL when it encounters a new conditional branch with a conditional branch already outstanding. If the BPU has processed a conditional branch but the Ebox has not yet executed it, then another conditional branch causes the BPU to assert BRANCH_STALL. Unconditional branches that occur with conditional branches outstanding do not create a problem because the instruction stream merely requires redirection. The alternate PC remains unchanged until resolution of the conditional branch. The Ebox informs the BPU with E%BCOND_RETIRE_L each time a conditional branch is retired from the branch queue, so that the BPU can free up the alternate PC and other conditional branch hardware.

BRANCH_STALL blocks the Ibox from processing further opcodes. When BRANCH_STALL is asserted, the IBU finishes parsing the current conditional branch instruction, including the branch displacement and any assists, and then the IBU stalls. The branch queue entry to the Ebox is made after the first conditional branch is retired.
At this time, BRANCH_STALL is de-asserted and the alternate PC for the first conditional branch is replaced with that for the second.

BSTL_FRC_PCQ is a signal used by the PC queue logic to force an entry into the PC queue when the second conditional branch is finally processed by the BPU after the release of a BRANCH_STALL. During a BRANCH_STALL, the PC queue refrains from updating the last entry to point to the next instruction until the stall breaks and the BPU finishes processing the second conditional branch.

7.5.1.7 PC Loads

The BPU distributes all PC loads to the rest of the Ibox.

Ibox PC loads from the CSU microcode load a new PC in one of two ways. When the CSU asserts PC_LD_WBUS, it drives a new PC value on the I%IBOX_IW_BUS_H<31:0> lines. PC_LD_MD indicates that the new PC is on the M%MD_BUS_H<63:0> lines. The BPU responds by forwarding the appropriate value onto the NEW_PC<31:0> lines and asserting LOAD_NEW_PC. These Ibox PC loads do not change conditional branch state in the BPU.

The Ebox signals its intent to load a new PC by asserting E%IBOX_LOAD_PC_L. The assertion of this signal indicates that the next piece of IPR data to arrive on the M%MD_BUS_H<63:0> is the new PC. The next time the Mbox asserts M%IBOX_IPR_WR_H, the new PC is taken from M%MD_BUS_H<31:0> and forwarded onto NEW_PC<31:0>, and LOAD_NEW_PC is asserted.

The BPU performs unconditional branches by adding the sign extended branch displacement to the current PC, driving the new PC onto the NEW_PC<31:0> lines, and asserting LOAD_NEW_PC. Conditional branches load the PC in the same fashion if the logic predicts a branch taken.
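The branch destination computation described above (sign extend the displacement, add it to the current PC) can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that "current PC" means the address the displacement is relative to and that the displacement width (8 or 16 bits for VAX byte and word displacements) is passed in by the caller.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value & mask) ^ sign) - sign


def branch_target(pc, displacement, disp_bits=8):
    """Return the 32-bit branch destination: PC plus sign-extended displacement."""
    return (pc + sign_extend(displacement, disp_bits)) & 0xFFFFFFFF
```

For example, branch_target(0x1000, 0xFE) treats 0xFE as a byte displacement of -2.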
The following actions occur on a conditional branch mispredict or Ebox PC load:

• any pending conditional branch is cleared
• pending unconditional branches are cleared
• any pending write to the Ebox branch queue is cleared
• I%FLUSH_IREF_LAT_H is asserted to abort pending Istream fill requests in the Mbox

7.5.1.8 Branch Prediction IPR Register

The BPCR IPR provides control for the BPU and read/write access to the history array.

The write-only BPCR<FLUSH_BHT> bit causes a BPU branch history table flush. The flush is identical to the context switch flush, which resets all branch table entries to a neutral value: history bits = 0100.

The write-only BPCR<FLUSH_CTR> bit causes the BRANCH_TABLE_COUNTER<8:0> to be cleared. The BRANCH_TABLE_COUNTER provides an address into the branch table for IPR read and write accesses. Each IPR read from the BPCR, or write to the BPCR with BPCR<LOAD_HISTORY> = 1, increments the counter. This allows IPR branch table reads and writes to step through the branch table array.

BPCR<LOAD_HISTORY> enables writes to the branch history table. A write to the BPCR<HISTORY> field with BPCR<LOAD_HISTORY> = 1 causes a BPU branch history table write. The history bits for the entry indexed by the counter are written with the IPR data. BPCR reads supply the history bits in BPCR<HISTORY> for the entry indexed by the counter. BPCR<MISPREDICT> will return a "1" if the last conditional branch mispredicted.

BPCR<31:16> contain the branch prediction algorithm. Any IPR write to the BPCR will update the algorithm. An IPR read will return the value of the current algorithm. For example, a "0" in BPCR<16> means that the next branch encountered will not be taken if the history is "0000". A "1" in BPCR<21> means that the next branch encountered when the prior history is "0101" will be taken.
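As an illustration of how BPCR<31:16> acts as a 16-entry lookup table indexed by the 4 history bits, the following sketch models the prediction and the history update. It is not the hardware implementation: the function names are hypothetical, and the power-up value 0xFECA is an assumption read off the bit pattern in Figure 7-14, which agrees with Table 7-30.

```python
POWER_UP_ALGORITHM = 0xFECA  # assumed BPCR<31:16> power-up pattern (Figure 7-14)
FLUSH_HISTORY = 0b0100       # neutral history written by a branch table flush


def predict(history, algorithm=POWER_UP_ALGORITHM):
    """Return 1 (taken) or 0 (not taken): bit `history` of the algorithm field."""
    return (algorithm >> (history & 0xF)) & 1


def push_history(history, taken):
    """Drop the oldest bit and shift in the new direction as bit 0 (most recent)."""
    return ((history << 1) | (1 if taken else 0)) & 0xF


def correct_mispredict(history):
    """On a mispredict, the least significant history bit is complemented."""
    return history ^ 1
```

With the assumed power-up algorithm, predict(FLUSH_HISTORY) is 0, matching the statement that a flushed entry yields a next prediction of not taken.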
Figure 7-14: IPR D4 (hex), BPCR

 31                    16 15            9  8  7  6  5  4  3        0
+------------------------+---------------+--+--+--+--+--+-----------+
|     BPU_ALGORITHM      |       0       |  |  |  |  | 0|  HISTORY  | :BPCR
+------------------------+---------------+--+--+--+--+--+-----------+
                                           |  |  |  |
                            LOAD_HISTORY --+  |  |  |
                               FLUSH_CTR -----+  |  |
                               FLUSH_BHT --------+  |
                              MISPREDICT -----------+

The microcode will write the following bit pattern as part of the power-up sequence:

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15                  0
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+---------------------+
| 1| 1| 1| 1| 1| 1| 1| 0| 1| 1| 0| 0| 1| 0| 1| 0|       All 0's       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+---------------------+

Table 7-31: BPCR Field Descriptions

Name            Extent  Type  Description
HISTORY         3:0     RW    Branch history table entry history bits.
MISPREDICT      5       RO    Indicates if last conditional branch mispredicted.
FLUSH_BHT       6       WO    Write of a 1 resets all history table entries to a neutral value, hardware clears bit.
FLUSH_CTR       7       WO    Write of a 1 resets BPCR address counter to 0, hardware clears bit.
LOAD_HISTORY    8       WO    Write history array addressed by BPCR address counter.
BPU_ALGORITHM   31:16   RW    Controls direction of branch for given history.

MACROCODE RESTRICTION

If an MTPR to the BPCR register is followed by a conditional branch instruction, the prediction algorithm used for this branch is unpredictable. Furthermore, the branch history table update is also unpredictable. The BPU functions correctly, but programs which depend on particular patterns of branch predictions (such as diagnostic tests) should avoid placing conditional branch instructions immediately after an MTPR instruction that writes to the BPCR register.
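The field positions in Table 7-31 can be expressed as a small packing and unpacking sketch. The helper names are hypothetical and this is only a model of the register layout, not a driver for the IPR. Because every IPR write updates the algorithm field, the sketch requires the caller to supply the algorithm on each write.

```python
HISTORY_MASK = 0xF        # BPCR<3:0>
MISPREDICT = 1 << 5       # BPCR<5>, read-only
FLUSH_BHT = 1 << 6        # BPCR<6>, write-only
FLUSH_CTR = 1 << 7        # BPCR<7>, write-only
LOAD_HISTORY = 1 << 8     # BPCR<8>, write-only
ALGORITHM_SHIFT = 16      # BPCR<31:16>


def bpcr_write_value(algorithm, history=0, load_history=False,
                     flush_ctr=False, flush_bht=False):
    """Build a 32-bit value for an IPR write to the BPCR."""
    value = (algorithm & 0xFFFF) << ALGORITHM_SHIFT
    value |= history & HISTORY_MASK
    if load_history:
        value |= LOAD_HISTORY
    if flush_ctr:
        value |= FLUSH_CTR
    if flush_bht:
        value |= FLUSH_BHT
    return value


def bpcr_decode_read(value):
    """Split a value read from the BPCR into its readable fields."""
    return {
        "algorithm": (value >> ALGORITHM_SHIFT) & 0xFFFF,
        "mispredict": bool(value & MISPREDICT),
        "history": value & HISTORY_MASK,
    }
```

Under these assumptions, bpcr_write_value(0xFECA) reproduces the power-up pattern shown above: the algorithm in the upper 16 bits and all zeros below.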
Bits 8, 7, and 6 are defined in Table 7-32 for IPR writes to the BPCR. NOTE: The prediction algorithm will be updated on every IPR write to the BPCR.

Table 7-32: BPCR <8:6>

BIT 8  BIT 7  BIT 6  Write Action
0      0      0      Do nothing, except update algorithm
0      0      1      Flush branch table. History not written
0      1      0      Address counter reset to 0. History not written
0      1      1      Flush branch table, reset address counter, history not written
1      0      0      Write history to table, counter automatically increments
1      0      1      Undefined: Branch table flushed, new history written, counter incremented
1      1      0      Undefined: Write history to old counter value, counter reset to 0
1      1      1      Undefined: Branch table flushed, write history to old counter value, counter reset to 0

7.6 PC Load Effects

This section summarizes the various effects of loading a new PC in the Ibox. New PCs are loaded from five different sources. The BPU receives the new PCs from all these sources, drives the new PC on NEW_PC<31:0>, and asserts LOAD_NEW_PC. The five sources for new PCs, in priority order, are:

1. Ebox PC load from the M%MD_BUS_H<31:0>
   The Ebox loads a new PC as a result of an interrupt or exception or for instructions like REI, HALT, CASEx, etc. After the Ebox asserts the E%IBOX_LOAD_PC_L signal, the PC is supplied on the M%MD_BUS_H<31:0>, along with the M%IBOX_IPR_WR_H signal. The BPU selects M%MD_BUS_H<31:0> to drive NEW_PC<31:0> and asserts LOAD_NEW_PC.

2. Branch Mispredict PC
   When a mispredict has been detected, the BPU drives NEW_PC<31:0> from the alternate PC latch containing the address of the branch path not taken, and asserts LOAD_NEW_PC.

3. PC_LD_WBUS from the CSU
   For instructions like JSB and JMP, the CSU computes a new PC and drives that PC up to the BPU. The BPU receives the PC on I%IBOX_IW_BUS_H<31:0>, drives NEW_PC<31:0>, and asserts LOAD_NEW_PC.

4. PC_LD_MD from the CSU
   For instructions like JSB, JMP, RET, and RSB, the CSU requests a new PC from the Mbox. The CSU asserts PC_LD_MD, and the next M%IBOX_DATA_L signals that the new PC is on the M%MD_BUS_H<31:0>. The BPU receives the PC on M%MD_BUS_H<31:0>, drives NEW_PC<31:0>, and asserts LOAD_NEW_PC.

5. Branch Destination PC
   For unconditional branches, or when the BPU predicts a conditional branch as taken, it computes the branch destination, drives NEW_PC<31:0>, and asserts LOAD_NEW_PC.

The effects of loading a new PC are shown below. These effects take place regardless of the source of the PC.

• PREFETCH_ENABLE is set in the VIC.
• VIBA<31:3> in the VIC are loaded from NEW_PC<31:3>.
• MHARD_ERR is cleared in the VIC.
• IMMGT_EXC is cleared in the VIC.
• MISS_PENDING is cleared in the VIC.
• WRITE_PENDING is cleared in the VIC.
• VIC_READ is set in the VIC, allowing a new cache read sequence from the new address.
• The PFQ is flushed and NEW_PC<2:0> are latched as the initial BYTES_RETIRED.
• The BPU asserts I%FLUSH_IREF_LAT_H, indicating that the Mbox should flush its IREF latch.
• The IBU stops the parser and latches the new PC from NEW_PC<31:0>.
• The IIU latches the new PC as the next entry in the PC queue.

7.6.1 Mispredict PC Loads

When a PC load is the result of a branch mispredict, additional actions must be taken as described below.

• All pending conditional and unconditional branches are cleared in the BPU.
• Pending branch queue writes are aborted by the BPU.
• In the IIU, the instruction queue free counter is cleared.
• In the IIU, the PC queue is flushed.
• In the IIU, ISSUE_STALL is cleared.
• The SBU clears the scoreboard array counters.
• In the CSU, the S1 stage produces the mispredict RLOG unwind microaddress. The S3 stage is forced to NOP. NOTE: The RLOG is NOT flushed.
• In the CSU, IMD_VALID is reset.
• In the OQU, the MD allocation pointer is reset and the MD allocation counter is cleared.
• In the OQU, the source queue free counter is cleared.
• In the OQU, the destination queue free counter is cleared.
• In the CSU, the LD_PC_MD latch is cleared.

7.6.2 Ebox PC Loads

When the Ebox is the source of the new PC, the signal E%IBOX_LOAD_PC_L is asserted several cycles before the actual PC arrives from the Mbox. After this signal is asserted, but before the new PC is loaded, the signal E%RESTART_IBOX_H may be asserted, starting the parser and VIC prefetching. To avoid parsing from the wrong instruction stream, the following actions are taken upon the assertion of E%IBOX_LOAD_PC_L.

• The PFQ is flushed, forcing PFQ_EMPTY to be asserted.
• VIC prefetching is disabled until LOAD_NEW_PC is asserted by the BPU. This also blocks VIC bypass to the PFQ.
• MHARD_ERR is cleared in the VIC.
• IMMGT_EXC is cleared in the VIC.

MICROCODE RESTRICTION

E%IBOX_LOAD_PC_L and E%IBOX_IPR_WRITE_H must not occur in the same cycle. E%IBOX_LOAD_PC_L and E%RESTART_IBOX_H must not occur in the same cycle.

7.7 E%STOP_IBOX Effects

When the Ebox microcode performs a MISC/RESET_CPU it asserts E%STOP_IBOX_H. The Ibox requires E%STOP_IBOX_H to be asserted whenever RESET_L is asserted.

MICROCODE RESTRICTION

E%STOP_IBOX_H must always be followed by E%IBOX_LOAD_PC_L and then E%RESTART_IBOX_H. E%STOP_IBOX_H and E%BRANCH_MISPREDICT_L cannot occur in the same cycle.

The effects of this signal on the various sub-sections in the Ibox are shown below.

• PREFETCH_ENABLE is cleared in the VIC.
• MISS_PENDING, WRITE_PENDING, and READ_STATE are cleared in the VIC, putting the VIC in an idle state.
• IHARD_ERR is cleared in the VIC.
• MHARD_ERR is cleared in the VIC.
• IMMGT_EXC is cleared in the VIC.
• In the IIU, the instruction queue free counter is cleared.
• In the IIU, ISSUE_STALL is cleared.
• The IQ_VALID signal, from the IIU to the Ebox, is cleared.
• The Istream parser in the IBU is stopped.
• The signals I%IMEM_HERR_H and I%IMEM_MEXC_H are cleared.
• The PREV_NOT_DONE signal is cleared in the IBU.
• CSU_LD_PC_PEND is cleared in the IBU.
• LD_NEW_PC_PEND is cleared in the IBU.
• The FD opcode flip-flop is cleared in the IBU.
• The IDLE microword is injected into all stages of the CSU pipeline. NOTE: RLOG unwind is not aborted, however. If an IPR read to back-up PC with RLOG unwind is in progress, the unwind completes as normal, but the back-up PC write to the Ebox working register is disabled. All other Ibox IPR accesses are aborted.
• IMD_VALID is reset in the CSU.
• The IREF-pending latch is cleared in the CSU.
• The PC_LD_MD-pending latch is cleared in the CSU.
• The IPR read/write select signals are reset in the CSU.
• The stage 1 valid bit is cleared in the CSU.
• The source queue allocation counter is cleared in the OQU.
• The destination queue allocation counter is cleared in the OQU.
• The MD allocation counter is cleared in the OQU.
• The MD index counter is cleared in the OQU.
• The source and destination scoreboard counters are cleared in the SBU.
• Branch stalls are cleared in the BPU.
• I%FLUSH_IREF_LAT_H is asserted.

7.8 Initialization

7.8.1 Mechanisms for Ibox State Reset

The Ibox depends on the E%STOP_IBOX_H signal to initialize the states shown in Section 7.7. In addition, RESET_L is used to clear those states listed below which cannot be initialized by E%STOP_IBOX_H.

• VIC_ENABLE is cleared in the VIC.
• RLOG pointers are reset in the CSU.
• The IDLE microword is injected into stage 1 of the CSU pipeline.
• PC queue pointers are reset in the IIU.
7.9 Errors, Exceptions, and Faults

7.9.1 Overview

The Ibox handles some of the processing for memory hardware errors, memory management exceptions, reserved opcode faults, and reserved addressing mode faults. A global view of error, exception, and fault handling is presented here. Implementation details are distributed amongst the Ibox sub-section text.

Istream memory hardware errors may originate in the Mbox and memory subsystem or in the VIC array. Dstream memory hardware errors originate in the Mbox and memory subsystem. Istream and Dstream memory management exceptions originate in the Mbox. Reserved opcodes and reserved addressing modes are detected in Ibox hardware during instruction parsing.

7.9.2 Istream Memory Errors

When the Mbox returns Istream data flagged with M%MME_FAULT_H or M%HARD_ERR_H, the VIC and PFQ writes are inhibited, prefetching is disabled, and the VIC sets the appropriate condition flags for the IBU. The IBU continues to parse until it attempts to parse the Istream data that caused the exception or error. The condition flags are then forwarded to the Ebox. If the Ebox detects an empty instruction queue, source queue, destination queue, or field queue while the exception or error condition is asserted, the Ebox initiates an exception microtrap.

Any PC load or E%STOP_IBOX_H resets the error and exception flags in the VIC. An Ibox PC load or E%RESTART_IBOX_H restarts prefetching and parsing. Thus if the error or exception gets forwarded to the Ebox, the Ebox can reset the Ibox flags, load a new PC, and continue. If the instruction stream branches around the instruction stream data responsible for the error or exception, the Ibox resets the error flags and continues without reporting the condition.
If a VIC parity error is detected, VIC prefetching and IBU instruction parsing are halted immediately and the error is forwarded to the Ebox. This action is taken because the data containing the error may already have been loaded into the PFQ. If the Ebox detects an empty instruction queue, source queue, destination queue, or field queue while the exception or error condition is asserted, the Ebox initiates an exception microtrap.

Section 7.2.1.7 and Section 7.3.2.15 contain the Ibox implementation details of Istream error and exception handling. See Table 8-12 and Section 8.5.19 for Ebox implementation details.

7.9.3 Dstream Memory Errors

Memory errors on incoming Dstream data are detected during the processing of some deferred mode specifiers. In auto-increment deferred and displacement deferred specifier modes, the complex specifier unit reads the address of an operand from memory. This memory read is followed either by a direct write to an Ebox MD, or an operand memory reference to read the actual operand into an Ebox MD and/or create a PA queue entry for a result store.

If the Mbox returns M%MME_FAULT_H or M%HARD_ERR_H, then in the case of a direct MD write, the appropriate flag is sent with the MD write to the Ebox. If the Ebox detects one of the flags during an MD file access, it initiates an exception microtrap. If a memory operation is required to complete the processing of the specifier, the appropriate error or exception flag is sent with the memory request. The Mbox forces a memory management error or exception to occur for that reference, causing a fault flag to be returned to the appropriate Ebox MD.

Section 7.4.2.2.2 contains the Ibox implementation details of Dstream error and exception handling. See Table 8-12 and Section 8.5.19 for Ebox implementation details and Section 12.6.5 for Mbox implementation details.
7.9.4 Reserved Opcode Faults

Reserved opcode faults occur when the IBU detects unimplemented or reserved opcodes during instruction parsing. All such opcodes stop the parser and make an Ebox instruction queue entry containing a microcode dispatch for the reserved opcode routine. Section 7.3.2.12 contains the Ibox implementation details for reserved opcode handling.

7.9.5 Reserved Addressing Mode Faults

Reserved Addressing Mode Faults occur due to illegal combinations of specifier mode, specifier register, and access type. Unpredictable addressing modes occur due to combinations of specifier mode, specifier register, access type, and data length that do not make sense. Table 7-33 summarizes the behavior of the Ibox on reserved and unpredictable addressing modes.

Reserved addressing modes as specified by the VAX Architecture Standard always cause reserved addressing mode faults. Unpredictable addressing modes may produce a fault, or may be allowed to continue even though the result does not make sense. The processing of unpredictable modes never hangs the machine.
Table 7-33: Reserved Addressing Mode Faults

Address Mode   Access Type   Data Length   GPRs         Indexed   Action
S^#literal     Modify                                             take required fault
S^#literal     Write                                              take required fault
S^#literal     Address                                            take required fault
S^#literal     Field                                              take required fault
S^#literal                                              Yes       take required fault
base[Rx]                                   PC           Yes       take required fault
Rn                                                      Yes       take required fault
Rn             Address                                            take required fault
(Rn)+          Modify                      PC                     take required fault
(Rn)+          Write                       PC                     take required fault
Rn                                         PC                     source/dest queue entry has Rn=PC
Rn                           q, d, g       SP                     2nd source/dest queue entry has Rn=PC
Rn                           o, h          SP, AP, FP             unimplemented data lengths
(Rn)                                       PC                     Operand address is unpredictable
-(Rn)                                      PC                     Operand address is unpredictable
-(Rn)                                      Rx=Rn                  Rx read for index, then Rn read for base
(Rn)+                                      Rx=Rn                  Rx read for index, then Rn read for base
@(Rn)+                                     Rx=Rn                  Rx read for index, then Rn read for base
(Rn)+                                      PC           Yes       Rx for index is read but not used
(Rn)+          Address                     PC                     PC after specifier byte passed as address

When a Reserved Addressing Mode Fault is detected, I%RSVD_ADDR_FAULT_H is asserted, VIC prefetching is stopped, the IBU is stopped, and the CSU goes idle. A Reserved Addressing Mode Fault also blocks the OQU from making the source queue or destination queue entry associated with the faulting operand. If the Ebox detects an empty source queue, destination queue, or field queue while I%RSVD_ADDR_FAULT_H is asserted, the Ebox initiates an exception microtrap. All reserved addressing mode fault conditions are cleared in the Ibox when the Ebox loads a new PC.

7.10 Ibox Signal Name Cross-Reference

All signal names referenced in this chapter have appeared in bold and reflect the actual name appearing in the NVAX schematic set.
For each signal appearing in this chapter, the table below lists the corresponding name which exists in the behavioral model.

Table 7-34: Cross-reference of all names appearing in the Ibox chapter

Schematic Name              Behavioral Model Name
I%BRANCH_BUS_H<1:0>         I%BRANCH_BUS_H<1:0>
I%FORCE_HARD_FAULT_H        I%FORCE_HARD_FAULT_H
I%FORCE_MME_FAULT_H         I%FORCE_MME_FAULT_H
I%IBOX_IA_ADDR_H<3:0>       I%IBOX_IA_ADDR_H<3:0>
I%IBOX_IA_READ_H            I%IBOX_IA_READ_H
I%IBOX_IW_ADDR_H<4:0>       I%IBOX_IW_ADDR_H<4:0>
I%IBOX_IW_BUS_H<31:0>       I%IBOX_IW_BUS_H<31:0>
                            I%IBOX_IW_WRITE_H
                            I%IBOX_S_ERR_L
                            I%IMEM_HERR_H
I%IMEM_MEXC_H               I%IMEM_MEXC_H
I%IQ_BUS_H<22:0>            I%IQ_BUS_H<22:0>
I%OPERAND_BUS_H<14:0>       I%OPERAND_BUS_H<14:0>
                            I%PMUX0_H
I%PMUX1_H                   I%PMUX1_H
I%RSVD_ADDR_FAULT_H         I%RSVD_ADDR_FAULT_H
E%BCOND_RETIRE_L            E%BCOND_RETIRE_H
                            E%BRANCH_MISPREDICT_H
E%DQ_RETIRE_H               E%DQ_RETIRE_H
E%DQ_RETIRE_RMODE_H         E%DQ_RETIRE_RMODE_H
E%DQ_RETIRE_RN_H<3:0>       E%DQ_RETIRE_RN_H<3:0>
E%FLUSH_BPT_H               E%FLUSH_BPT_H
                            E%FLUSH_PCQ_H
E%FLUSH_VIC_H               E%FLUSH_VIC_H
E%FPD_SET_L                 E%FPD_SET_H
                            E%IBOX_~BUS_H<31:0>
E%IBOX_IPR_NUM_H<3:0>       E%IBOX_IPR_NUM_H<3:0>
E%IBOX_IPR_READ_H           E%IBOX_IPR_READ_H
E%IBOX_IPR_TAG_H<2:0>       E%IBOX_IPR_TAG_H<2:0>
                            E%IBOX_IPR_WRITE_H
E%IBOX_LOAD_PC_L            E%IBOX_LOAD_PC_H
E%RESTART_IBOX_H            E%RESTART_IBOX_H
E%RETIRE_INSTR_L            E%RETIRE_INSTR_H
E%SQ_RETIRE_H<1:0>          E%SQ_RETIRE_H<1:0>
E%SQ_RETIRE_MD_H<1:0>       E%SQ_RETIRE_MD_H<1:0>
E%SQ_RETIRE_RMODE_H<1:0>    E%SQ_RETIRE_RMODE_H<1:0>
E%SQ_RETIRE_RN1_H<3:0>      E%SQ_RETIRE_RN1_H<3:0>
E%SQ_RETIRE_RN2_H<3:0>      E%SQ_RETIRE_RN2_H<3:0>
E%STOP_IBOX_H               E%STOP_IBOX_H
                            I%FLUSH_IREF_LAT_H
                            I%FORCE_HARD_FAULT_H
I%FORCE_MME_FAULT_H         I%FORCE_MME_FAULT_H
I%IBOX_ADDR_H<31:0>         I%IBOX_ADDR_H<31:0>
I%IBOX_AT_L<1:0>            I%IBOX_AT_H<1:0>
I%IBOX_CMD_L<4:0>           I%IBOX_CMD_H<4:0>
I%IBOX_DL_L<1:0>            I%IBOX_DL_H<1:0>
I%IBOX_REF_DEST_L<1:0>      I%IBOX_REF_DEST_H<1:0>
I%IBOX_TAG_L<2:0>           I%IBOX_TAG_H<2:0>
I%IREF_REQ_H                I%IREF_REQ_H
I%SPEC_REQ_H                I%SPEC_REQ_H
M%HARD_ERR_H                M%HARD_ERR_H
M%IBOX_DATA_L               M%IBOX_DATA_H
M%IBOX_IPR_WR_H             M%IBOX_IPR_WR_H
M%LAST_FILL_H               M%LAST_FILL_H
M%MD_BUS_H<63:0>            M%MD_BUS_H<63:0>
M%MD_BUS_QW_PARITY_L        M%MD_BUS_QW_PARITY_H
M%MME_FAULT_H               M%MME_FAULT_H
M%QW_ALIGNMENT_H<1:0>       M%QW_ALIGNMENT_H<1:0>
M%SPEC_Q_FULL_H             M%SPEC_Q_FULL_H
M%VIC_DATA_L                M%VIC_DATA_H

7.11 Testability

7.11.1 Overview

Ibox testability is enhanced by architecturally accessible features, and connections to the internal scan register and the parallel port.

7.11.2 Internal Scan Register and Data Reducer

Ibox state can be latched into the scan register and shifted off-chip through the global internal scan register. The shift out begins with scan register bit 0. See Chapter 19 for the implementation details of the internal scan register. Table 7-35 lists the states in the Ibox scan register. Under global control from the test port, the Ibox scan register can be configured as an LFSR.
Table 7-35: Ibox Scan Register Fields

Bit Field   Field Name         Description
<0>         STP_RESTART        Stop parser flag
<1>         STP_SUPPRESS       Stop parser flag
<2>         SHLIT              specifier control <0>, short literal
<8:3>       RN/SHORT LITERAL   specifier control <6:1>, register or shlit value
<11:9>      AT                 specifier control <9:7>, access type
<13:12>     DL                 specifier control <11:10>, data length
<14>        VALID              specifier control <12>, valid
<15>        COMPLEX            specifier control <13>, complex specifier
<18:16>     DISPATCH           specifier control <16:14>, dispatch address
<19>        AT_RMW             specifier control <17>
<20>        INDEXED            specifier control <18>, index
<21>        ASSIST             specifier control <19>, assist
<22>        PC_MODE            specifier control <20>, PC mode
<23>        JMP_OR_JSB         specifier control <21>, JMP or JSB
<25:24>     E_DL               execution data length <1:0>

7.11.3 Parallel Port

The CSU microcode address is routed to the chip parallel port. The microcode address can be monitored on a cycle by cycle basis during chip debug by selecting the Ibox as source to the parallel port. When selected, a buffered version of the control store address, MUX_H<6:0>, appears on PP_DATA<6:0>. See Chapter 19 for the implementation details of the parallel port.

7.11.4 Architectural Features

Internal processor registers are included as architectural features to aid in testability. IPR access to VIC tags and data is available through the VTAG and VDATA registers. See Section 7.2.1.16 for the implementation details of these registers. IPR access to the branch history table and branch status is available through the BPCR register. See Section 7.5.1.8 for the implementation details of the BPCR.

7.12 Performance Monitoring Hardware

7.12.1 Signals

The Ibox provides two signals for performance monitoring: I%PMUX0_H asserts on every VIC access and I%PMUX1_H asserts on every VIC hit.
These signals enable the Ebox performance monitoring hardware to gather statistics on VIC hits versus VIC accesses.

7.13 Revision History

Table 7-36: Revision History

Who                                                                            When          Description of change
John F. Brown                                                                  19-Feb-1991   Update following pass 1 tape out.
John F. Brown, Ruben Castelino, Mary Field, Paul Gronowski, Jeanne Meyer       12-Jan-1990   Intermediate release.
John F. Brown, Paul Gronowski, Jeanne McKinley                                 06-Mar-1989   Release for external review.
John F. Brown                                                                  19-Dec-1988   Partial update.
Shawn Persels                                                                  06-Oct-1988   Initial release.

Chapter 8 The Ebox

8.1 Chapter Overview

This chapter describes the Ebox section of the NVAX CPU chip. Only the major functional blocks, their interfaces to each other, and the interface to the rest of the NVAX system are described here. Circuit level implementation details are not of primary concern in this document.

8.2 Introduction

The Ebox is the instruction execution unit in the NVAX CPU chip. It is a 3 stage pipeline (S3..S5) which runs semi-autonomously to the rest of the NVAX chip and supports the following functions:

• Instruction Execution
  The Ebox is responsible for carrying out the execution portion of each VAX instruction under control of a microflow whose initial address is provided by the Ibox issue unit.

• Instruction Coordination
  The Ebox is a major source of control to coordinate instruction processing in the Ibox, Mbox, and Fbox. It ensures that Ebox and Fbox macroinstructions retire in the proper order, and it provides controls to the Mbox and Ibox which help manage certain inter-macroinstruction dependencies. The Ebox cooperates with the Ibox in handling mispredicted branches.

• Trap, Fault and Exception Handling
  The Ebox coordinates trap, fault, and interrupt handling. It delays the condition until all preceding macroinstructions complete properly.
  It then collects information about the condition and ensures that the correct architectural state is reached.

• CPU Control
  Most CPU control is provided by the Ebox. Ebox control functions include CPU initialization, controlling Ibox, Fbox, and Mbox activities, and setting control bits during major CPU state changes (e.g. taking an interrupt or executing a change mode instruction).

The Ebox accomplishes many of the above functions by executing the NVAX Ebox microcode. This chapter views the Ebox as the interpreter of microcode. Describing how microcode functions are used to correctly emulate the VAX architecture or the architectural motivation for Ebox hardware functions is generally outside the scope of this discussion.

Figure 8-1 at the end of this section is a top level block diagram of the Ebox showing all the major Ebox function units, their interconnections, and their place in the pipeline. The pipeline segments are shown in the diagram (S2, S3, S4, and S5). The sections following the diagram describe the function elements depicted and the Ebox pipeline.

Figure 8-1: Ebox Block Diagram

8.3 Chapter Structure

The Ebox is described from both an overall functional and individual function unit standpoint.
The top level description is of the major Ebox functions. The next level consists of a detailed description of each of the Ebox function units.

The Ebox functions are described in the initial sections of this chapter. They are presented referring to the microcode fields which control the Ebox. Within each section the Ebox functions in question are discussed in detail and the Ebox function units which support that function are introduced.

The functional overview is followed by a comprehensive description of each of the Ebox function units. The latter sections of this document describe Ebox initialization, timing, error handling, testability and other details not related to the main-line functionality of the Ebox.

8.4 Ebox Overview

8.4.1 Microword Fields

The Ebox is controlled by the data path control portion of the microword, which is either standard or special format. The other portion of the control word, the microsequencer control portion, controls the microsequencer which determines which microword is fetched in every cycle. The fields of the data path control portion of the microword and their effect within the Ebox are shown in Table 8-1. For more information on microword formats and field widths see Chapter 6.

NOTATION
The notation FIELD/FUNCTION is used throughout this chapter to mean that microword field FIELD specifies FUNCTION.

Table 8-1: Data Path Control Microword Fields

Microword Field   Microword Format   Description
FORMAT            Both               This one-bit field determines whether the microword is in the special format. If it is 1, the MISC1, MISC2, and D fields exist. If it is 0, the Q, SHF, and VAL fields exist instead.
LIT               Both               This one-bit field determines whether the microword is the constant generation variant (format). If it is 1, the POS and CONST fields exist. If it is 0, the VAL and B fields exist instead in standard format, and the MISC2, D, and B fields exist instead in special format.
ALU               Both               Sets the ALU function, including typical ALU operations, and others.
MRQ               Both               Controls initiation of Ebox memory accesses and other Mbox control functions. The Ebox decodes the field and sends the corresponding request to the Mbox.
SHF               Standard           Sets the shifter function. The W and Q fields control how the shifter output is used. Some settings of this field specify a pass operation instead of a shift.
VAL               Standard (1)       Specifies the shift amount (1 to 31) or, if VAL=0, specifies to shift the amount in the SC register.
A                 Both               Specifies the source of E_BUS%ABUS_L<31:0> for this microword. The A field can select any element in the register file or one of several Ebox sources. E_BUS%ABUS_L<31:0> is one of the two sources for the ALU and the shifter.
B                 Both (1)           When the source of E_BUS%BBUS_L<31:0> is a register this field specifies the source of E_BUS%BBUS_L<31:0>. The B field can select from some of the elements in the register file or from a small number of other Ebox sources. E_BUS%BBUS_L<31:0> is one of the two sources for the ALU and the shifter.
POS               Both (2)           When the source of E_BUS%BBUS_L<31:0> is from the constant generator this field specifies which byte the constant value is in. Bytes 0 through 3 may be specified. The other bytes are forced to 0.
CONST             Both (2)           This field contains the literal byte value which is sourced to one of the bytes of E_BUS%BBUS_L<31:0> as specified by the POS field. (The other E_BUS%BBUS_L<31:0> bytes are forced to 0.)
CONST.10 (3)      Both (2)           This field contains the literal 10-bit value which is sourced to E_BUS%BBUS_L<9:0>. (E_BUS%BBUS_L<31:10> are forced to 0.)
DST               Both               This field specifies the destination of E_BUS%WBUS_L<31:0>. The possible destinations include a subset of the register file and a number of other Ebox destinations.
Q                 Standard           Controls whether or not the Q register is loaded with the shifter output for this microword.
W                 Both               Selects the driver of E_BUS%WBUS_L<31:0>. Either the ALU or the shifter output is driven on E_BUS%WBUS_L<31:0>.
L                 Both               This field controls whether the Ebox operations are done with a data length of longword or the length specified in the DL register. The Ebox operations affected are condition code calculation, size of memory operations, zero extending of E_BUS%WBUS_L data, and bytes affected by register file writes.
V                 Both               Controls updating of the VA register. Either the VA register is updated with the value from the ALU, or it is not changed from its previous value.
MISC              Both               This field has many uses. Only one use can be selected at a time. This field can control PSL condition code alterations, set the DL register, set or clear state flags, or invoke a box coordination or control function.
MISC1             Special            This field can specify one of a few Ibox or Fbox coordination or control functions, and can be used to set or clear state flags.
MISC2             Special (1)        One Mbox control function and one to add an Fbox destination scoreboard entry.
DISABLE.RETIRE    Special (1)        This field is used to disable retire of macroinstructions and retire queue entries.

(1) Not constant generation microword variant.
(2) Constant generation microword variant.
(3) The CONST.10 field is actually the POS field bitwise concatenated with the CONST field, with the POS field in the more significant position. It is simply a way of treating these two microword fields as one. CONST.10 is only used when MISC/CONST.10.BIT is specified.

A microword field that is not present in all formats defaults to NOP (no operation) in any microword format that lacks it.
More specifically, standard format microwords effectively specify MISC1/NOP, MISC2/NOP, and DISABLE.RETIRE/NO by default. Special format microwords effectively specify Q/HOLD.Q, SHF/NOP, and VAL/0. When the microword is the constant generation variant of the standard format microword, VAL/0 is effectively specified, and the B field is ignored since this microword variant sources a constant onto E_BUS%BBUS_L<31:0>. In the constant generation variant of the special format microword, MISC2/NOP and DISABLE.RETIRE/NO are effectively specified, and the B field is ignored because this microword variant also sources a constant onto E_BUS%BBUS_L<31:0>.

8.4.1.1 Microsequencer Control Fields

In addition to decoding the datapath control portion of the microword, the Ebox decodes a part of the Microsequencer control portion of the microword. Specifically, it detects when the SEQ.FMT and SEQ.MUX fields (see Chapter 9 and Chapter 6) specify LAST.CYCLE or LAST.CYCLE.OVERFLOW. The Ebox fault detection logic and the RMUX control logic use these decodes.

8.4.2 The Register File

The register file contains four kinds of registers: MD (memory data), GPR, Wn (working), and CPUSTATE registers. The MD registers receive data from memory reads initiated by the Ibox, and from direct writes from the Ibox. The Wn registers hold microcode temporary data. They can receive data from memory reads initiated by the Ebox and receive result data from ALU, shifter, or Fbox operations, and from the Ibox in the case of Ibox IPR reads. The GPRs are the VAX architecture general-purpose registers (though R15 is not in the file) and can receive data from Ebox initiated memory reads, from the ALU or shifter, or from the Ibox. The CPUSTATE registers hold semipermanent architectural state (e.g. KSP, SCBB). They can only be written by the Ebox.
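The constant generation fields listed in Table 8-1 (POS, CONST, and CONST.10) lend themselves to a short worked example. The following is a minimal sketch, not a description of the hardware: the function names are hypothetical, and the field widths (a 2-bit POS and an 8-bit CONST) are inferred from the statements that POS selects bytes 0 through 3, that CONST is a byte value, and that CONST.10 is a 10-bit value with POS in the more significant position.

```python
def constant_on_bbus(pos: int, const: int) -> int:
    """Value placed on the 32-bit B bus by the constant generator.

    CONST is placed in byte number POS; all other bytes are 0.
    """
    if not 0 <= pos <= 3:
        raise ValueError("POS selects byte 0 through 3")
    if not 0 <= const <= 0xFF:
        raise ValueError("CONST is a byte value")
    return const << (8 * pos)


def const10_on_bbus(pos: int, const: int) -> int:
    """Value placed on the B bus when the fields are treated as CONST.10.

    POS is concatenated above CONST to form bits <9:0>; bits <31:10> are 0.
    """
    if not 0 <= pos <= 3:
        raise ValueError("POS is assumed to be a 2-bit field")
    if not 0 <= const <= 0xFF:
        raise ValueError("CONST is a byte value")
    return (pos << 8) | const


if __name__ == "__main__":
    print(hex(constant_on_bbus(2, 0xAB)))   # CONST in byte 2
    print(hex(const10_on_bbus(1, 0x02)))    # POS supplies bits <9:8>
```

The same two microword fields therefore yield different bus values depending on whether they are interpreted as a positioned byte or as a single 10-bit literal.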
8.4.3 ALU and Shifter

Each microword specifies source operands for the ALU or shifter (A, B, POS, and CONST fields), operations for these function units to perform (ALU, SHF, and VAL fields), and a destination (or possibly two destinations if Q or VA is updated) for the result(s) (DST, Q, W, and V fields). Note that in special format microwords no shifter operation can be specified and the Q register can't be altered. In the course of executing the microword, the Ebox will fetch the source operands onto E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>, carry out the specified ALU and shifter functions, and store the result in the specified locations (if any).

8.4.3.1 Sources of ALU and Shifter Operands

In general the sources of E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0> (the inputs to the ALU and shifter) are either a constant, a register from the register file, an Ebox register (e.g. PSL, Q, or VA), an Ebox source value calculated by a special function unit, a hardware status provided via a special path from outside the Ebox (e.g., interrupt status), or an entry from the source queue. E_BUS%BBUS_L<31:0> sources are limited to a subset of the register file, certain Ebox registers, and an entry from the source queue. The source queue is introduced in Section 8.4.4.

8.4.3.2 ALU Functions

The ALU is capable of standard operations on byte, word, and longword size operands. It can pass either input to the output and is capable of a number of arithmetic and logical operations on one or two operands, producing condition codes based on data length and operation. It also has specialized functions which are discussed in Section 8.5.3.

8.4.3.3 Shifter Functions

The shifter does longword and quadword shift operations and certain pass-thru operations, always producing a longword output.
The shifter treats the two sources as a single quadword, with E_BUS%ABUS_L<31:0> as the more significant longword. The longword output is this quadword shifted right 0 to 32 bits and truncated to longword length. The shifter produces condition codes based on the longword output data.

8.4.3.4 Destinations of ALU and Shifter Results

The output of the shifter and the output of the ALU can drive E_BUS%WBUS_L<31:0>. The shifter output is also directly connected to the Q register so that the Q register can be loaded with the shifter output regardless of the source of E_BUS%WBUS_L<31:0>. In the same way, the ALU output is directly connected to the VA register. E_BUS%WBUS_L<31:0> data is the input to one of the write ports on the register file and can be used to update any register file entry except an MD register. Certain other Ebox registers (e.g. SC, PSL) can be loaded from E_BUS%WBUS_L<31:0>. The destination of E_BUS%WBUS_L<31:0> can be specified by the current destination queue entry, when the microword so specifies. The destination queue is introduced in the following section.

8.4.4 Ibox-Ebox Interface

The Ibox-Ebox interface is made up of a number of FIFO queues. The purpose of these queues is to allow the Ibox to fetch and decode new instructions before the Ebox is ready to execute them. The Ibox adds entries as it decodes instructions, and the Ebox removes them from the other end as it executes them. For each opcode, there is a predetermined number of entries added to the various queues by the Ibox. Ebox execution microflows remove exactly the right number of entries from each queue.

The queues which interface the Ibox to the Ebox directly are the source queue, the destination queue, the branch queue, and the field queue. The instruction queue, the PA queue, and the retire queue are introduced here for completeness.

The source queue holds source operand information.
Entries are added by the Ibox as it decodes the source type operand specifiers of each instruction. The entry is either a pointer into the register file or the data from a literal mode operand specifier. The Ebox accesses and removes an entry each time a microword specifies a source queue access in either the A or B fields. If the entry is literal data, it is used as an ALU and/or a shifter operand. Otherwise the register file is accessed using the pointer in the entry.

The destination queue holds result destination information. Entries are added by the Ibox as it decodes the destination type operand specifiers of each instruction. A destination queue entry is either a pointer to a GPR in the register file or a flag indicating that the result destination is memory. The Ebox accesses and removes an entry each time a microword specifies a destination queue access in the DST field, or the Fbox supplies a result which specifies a destination queue access. If the entry is a pointer to a GPR, the Ebox writes the ALU, shifter, or Fbox data into the register file. Otherwise the data is stored in memory at the address found in the PA queue.

The PA queue is in the Mbox. Each time the Ibox adds an entry indicating a memory destination to the destination queue it also sends the Mbox a virtual address to be translated. When the Mbox has translated the address it puts it in the PA queue. If the current destination queue entry indicates a memory destination, the Ebox sends the result data to the Mbox to be written to the physical address found in the PA queue. The Mbox removes the PA queue entry as it uses it.

The branch queue holds status bits for each branch instruction processed by the Ibox. The Ibox adds an entry to the branch queue each time it finishes processing a conditional or unconditional branch.
The Ebox references and removes the current branch queue entry in the execution microflow for the branch. This allows the Ebox to synchronize with the Ibox so that the branch does not finish executing until the Ibox has successfully fetched the branch displacement specifier. It also allows the Ebox to check for an incorrect branch prediction by the Ibox.

Each time the Ibox decodes a branch it calculates the branch address. For unconditional branches it simply begins fetching from the new instruction stream immediately. For conditional branches the Ibox predicts whether the branch will be taken or not. The branch queue entry added by the Ibox indicates the branch prediction. When the Ebox executes an unconditional branch, it references the branch queue simply to ensure that the Ibox was able to fetch the displacement specifier without a fault or error. For conditional branches the Ebox also checks that the branch prediction was correct and initiates a microtrap if it wasn't. If the prediction wasn't correct, the Ebox notifies the Ibox, which uses the alternate path PC (which it had kept) to begin fetching along the correct path.

The retire queue holds status for each macroinstruction currently being executed in the Ebox or the Fbox. The status indicates which unit will execute the instruction, the Ebox or the Fbox. The Ebox adds an entry each time the Microsequencer dispatches to a macroinstruction execution microflow. The Ebox references the retire queue when the macroinstruction execution is complete in order to ensure that instructions finish executing in the proper order. A certain amount of concurrent execution in the Fbox and Ebox is possible. The retire queue is used to prevent one box from altering any architecturally visible state before the other box's execution for preceding macroinstructions finishes. The Ebox references and removes a retire queue entry each time an Fbox or Ebox instruction is retired.
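The branch queue handshake described above can be sketched as a small FIFO model. This is an illustration under stated assumptions rather than the hardware design: the class and field names are hypothetical, and the alternate path PC is stored in the queue entry purely for convenience, whereas the text says the Ibox keeps that PC itself. An empty queue is reported as an error here; the text says only that the Ebox synchronizes with the Ibox at this point.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class BranchEntry:
    conditional: bool       # False for an unconditional branch
    predicted_taken: bool   # Ibox prediction (conditional branches only)
    alternate_pc: int       # hypothetical: PC of the path the Ibox did not follow


class BranchQueue:
    """Toy FIFO between the Ibox (producer) and the Ebox (consumer)."""

    def __init__(self) -> None:
        self._fifo = deque()

    def ibox_add(self, entry: BranchEntry) -> None:
        """Ibox adds an entry when it finishes processing a branch."""
        self._fifo.append(entry)

    def ebox_execute_branch(self, actually_taken: bool) -> Optional[int]:
        """Ebox removes the oldest entry and checks the prediction.

        Returns the PC to refetch from on a misprediction, otherwise None.
        """
        if not self._fifo:
            raise LookupError("no branch queue entry available yet")
        entry = self._fifo.popleft()
        if entry.conditional and entry.predicted_taken != actually_taken:
            return entry.alternate_pc   # Ibox restarts along the correct path
        return None
```

Because the queue is strictly first-in first-out, each branch executed by the Ebox is checked against the prediction the Ibox made for that same branch.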
The field queue holds a one-bit type status for variable-length bit field base address operands processed in the Ibox. (Note that some operands are treated as variable-length bit field base address operands internally by the NVAX CPU even though the operand is not really the base address of a variable-length bit field. These operands, including the true bit field base address operands, are collectively referred to as field operands.) The field queue entry indicates whether the field operand was register mode. The Ibox adds an entry when it processes operands which it knows by context require an entry. The Ebox retires an entry after it has used the information in a microcode conditional branch. Very different execution microflows are required for some instructions, particularly bit field instructions, depending on whether a particular operand is register mode or specifies a memory address. In the latter case the information sent by the Ibox is a memory address, while in the first case the source and destination queue entries point to the register in the register file. See Section 8.5.15.8 for more information.

The instruction queue is part of the Ibox-Microsequencer interface. It holds information derived from the VAX instruction opcode. The Ibox adds an entry as it decodes each instruction. An entry contains the opcode, data length, the microcode dispatch address for execution, and a flag indicating whether the macroinstruction is for the Fbox. The Microsequencer references and removes an entry at the start of execution of each VAX instruction. It uses the dispatch address to fetch the first microword of the macroinstruction execution microflow. At the same time it passes the opcode, data length, and the Fbox execution flag to the Ebox. The Ebox adds an entry to the retire queue at that time.
That entry is simply the Fbox execution flag (except if the Fbox is disabled, see Section 8.5.15.7). See Section 9.2.3.3.4 for more on the instruction queue.

8.4.5 Other Registers and States

The Ebox contains several special purpose registers, the SC, VA, and Q registers, and the PSL. The SC register holds a shift count for use in some shift operations. The VA register can hold a virtual address or a microcode temporary value. The VA register is directly readable by the Mbox and is the address source for all Ebox initiated memory operations. The VA register is loaded directly from the ALU output. The PSL is the VAX architecture program status longword register. It is loaded from E_BUS%WBUS_L<31:0> and can be used as a source operand by the ALU or shifter. Its bits are used in many places in the Ebox and elsewhere in the CPU where required by the VAX architecture. The Q register is loaded from the output of the shifter. It holds shifter results for later use.

8.4.6 Ebox Memory Access

Through the mechanism of the source queue and the destination queue, the Ibox initiates most memory accesses for the Ebox. In certain cases the Ebox must carry out memory accesses on its own. The MRQ field of the microword specifies the Mbox operation. The virtual or physical address is provided from the VA register. If the VA is being updated in this microword, the address is bypassed directly from the output of the ALU. For writes, the data is taken from E_BUS%WBUS_L<31:0>, so it can be the output of the shifter or the ALU. For reads, the DST field of the microword specifies the register file entry which is to receive the data. This register must be a GPR or a working register.

8.4.7 CPU Control Functions

Most control functions are invoked through one of the MISC fields, but some of the MRQ field functions are Mbox control functions or miscellaneous control functions rather than memory access commands.
The control functions generally act to reset a function unit (Fbox, Ibox, or Mbox), synchronize Ebox operation with a function unit, or restart semiautonomous operation of the Mbox or Ibox when either of them has stopped for some reason.

8.4.8 Ebox Pipeline

Execution of microwords in the Ebox is pipelined with three pipe stages (S3..S5). These stages are shown in Figure 8-1. In the first stage (S3), the E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0> sources are fetched or prepared. In the second (S4) the ALU and shifter operate on the data. In the third (S5) the result is written into the register file or to some other destination.

Stages S3 and S4 can stall for various reasons. Stage S5 cannot stall. Once a particular microword's execution has advanced into S5, it is going to complete. Various stalls occur in S4 in order to ensure that a particular microword's effects do not change any architecturally visible state (e.g., GPRs, PSL) before proper completion without memory management faults is guaranteed.

The Microsequencer fetches the microword and delivers it to the Ebox in S3. If the Ebox's S3 stage is stalled, the Microsequencer's S2 activity is stalled as well. See Chapter 9 for more detail.

Even though the operand fetch, function execution, and result store take place in different cycles, the microword specifies the operation as if it all took place in one cycle. The Ebox has bypass paths which allow a microword to use a register as a source even if it is updated by one of the two preceding microwords.
For example, if the immediately preceding microword updates W1 in the register file and the current microword specifies W1 as a source to the ALU, the Ebox hardware detects the condition and muxes the data into the staging latch before the ALU at the same time as it forwards the data to the latch which sources E_BUS%WBUS_L<31:0> in stage S5.

Bypass paths are only implemented where performance considerations warrant. Also, bypassing isn't the solution to every problem pipelining introduces. For example, after the PSL is updated the microcode allows 2 cycles before a microword specifying SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW because the PSL is not actually updated until S5. The Microsequencer uses the FPD, T, and TP bits in the PSL to determine the proper new microflow dispatch. It would make the decision based on old PSL information if the microcode didn't allow the 2 cycles.

One place where the effect of pipelining is particularly apparent is in microcode conditional branches. For example, a microcode branch based on E_BUS%BBUS_L<31:0> data must immediately follow the microword which sources the relevant data onto E_BUS%BBUS_L<31:0>. Similarly, a microcode branch based on the ALU condition codes must be the second microword after the one which specified the ALU operation. See Chapter 9 for more detail on microcode branches.

8.4.9 Pipeline Stalls

The Ebox pipeline is controlled by the stall and fault logic. This function unit supplies stall signals which are used to gate clocking of control and data latches in each stage. It also controls insertion of effective no-ops into S4 when S3 is stalled and into S5 when S4 is stalled.

The Ebox pipeline stalls in S3 when it is accessing a source operand in the register file or the source queue which is not valid. Many register file entries have a valid bit associated with them.
A register file entry is not valid, and its valid bit is not set, if a memory read has been initiated for that entry and hasn't yet completed. A source queue entry is not valid if the Ibox hasn't added that entry yet.

The Ebox stalls in S4 if the current destination queue entry is not valid and the microword in S4 references a destination queue entry. A destination queue entry is not valid if the Ibox hasn't added that entry yet.

The Ebox stalls in S4 if the current destination queue entry is valid but specifies a memory destination for the data and the current PA queue entry is not valid. A PA queue entry is not valid if the Mbox hasn't added that entry yet.

The Ebox stalls in S4 if the microword in S4 requests a memory operation and the Mbox is already working on an Ebox-initiated memory operation (that is, the previous request is still in the EM_LATCH).

The Ebox stalls in S4 if the microword in S4 synchronizes with the branch queue and the branch queue entry is not valid. A branch queue entry is not valid if the Ibox hasn't added that entry yet.

The Ebox stalls in S4 if the current retire queue entry specifies that an Fbox instruction must retire before the instruction associated with the microword in S4 and the Ebox is requesting the use of the RMUX to store result data. (The Ebox requests the use of the RMUX if the microword in S4 specifies anything other than NONE in the DST field.)

If the Ebox stalls in S3, the S4 and S5 stages of the pipeline can continue execution. If S4 doesn't stall when S3 does, then an effective no-op is inserted into S4 after the current S4 operation advances into S5. The no-op is necessary so that the stalled S3 microword isn't advanced to S4 and S5 while an S3 stall is in effect. See Section 8.5.20 for more detail.

If the Ebox stalls in S4 then S3 stalls as well. (Microwords can't pass each other in the pipeline.)
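The no-op insertion rules stated at the start of this section (a no-op into S4 when only S3 is stalled, a no-op into S5 when S4 is stalled) can be sketched as a one-cycle update of the three stage contents. This is an illustrative model only, not the stall logic itself: the function name `advance`, the `NOP` marker, and the use of strings for microwords are assumptions made for the example.

```python
NOP = "NOP"  # hypothetical marker for an effective no-op

def advance(s3, s4, s5, next_uword, stall_s3=False, stall_s4=False):
    """Return (s3, s4, s5, fetched) after one cycle.

    S5 never stalls, so whatever is in S5 always completes this cycle.
    'fetched' reports whether next_uword was accepted into S3.
    """
    if stall_s4:
        # An S4 stall also stalls S3; S5 is refilled with a no-op.
        return s3, s4, NOP, False
    if stall_s3:
        # S4 advances into S5, a no-op takes its place, S3 is held.
        return s3, NOP, s4, False
    # No stall: every stage moves forward by one.
    return next_uword, s3, s4, True

# Example: an S3 stall lets the older microwords drain.
print(advance("u3", "u2", "u1", "u4", stall_s3=True))
```

Treating an S4 stall as dominant over an S3 stall mirrors the statement that microwords cannot pass each other in the pipeline.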
During S4 stalls, an effective no-op is inserted into S5 after the operation in S5 completes. This is necessary so that the operation in S4 isn't advanced into S5 while an S4 stall is in effect. See Section 8.5.20 for more detail.

In any cycle that the Ibox has not made a microstore dispatch address available to the Microsequencer and a dispatch is needed (i.e., during the last cycle of any microflow), the Microsequencer fetches the STALL microword. This microword specifies no Ebox operation and can't cause a stall anywhere in the pipeline (although it does specify SEQ.MUX/LAST.CYCLE). This allows the microwords already in the pipeline to continue even when the Ibox is temporarily unable to supply new instruction execution dispatches. See Chapter 9 for more detail.

A microcode loop which repeatedly accesses the field queue until the current field queue entry becomes valid is also very much like a stall, though the stall logic is not actually involved. This condition is referred to as a field queue stall. In this situation, the Ebox pipeline advances in each cycle (unless the microword in S4 is stalled also). However, the same microword is fetched out of the control store in every cycle. In typical microcode usage of the field queue conditional branch, this microword will not alter any state in S4 or S5. See Section 8.5.15.8 for more detail.

8.4.10 Microtraps, Exceptions, and Interrupts

The Ebox and Microsequencer together coordinate the handling of exceptions and interrupts. Most interrupts and some exceptions are handled by the Microsequencer dispatching to a microcode exception handler routine at the end of the current VAX instruction. These dispatches do not affect the execution of microwords already in the pipeline. Other exceptions cause a microtrap. In a microtrap the Microsequencer signals the Ebox to cause stages S3, S4, and S5 of the Ebox control pipeline to be flushed. It also signals the Ebox to flush the retire queue.
(Flushing of the other Ibox-to-Ebox queues, the Fbox pipeline, and the specifier queue in the Mbox is done by microcode, except in the case of a branch misprediction.) At the same time the Microsequencer fetches a new microword from a special dispatch address in the control store based on the particular microtrap condition. This microflow handles any other necessary state flushing.

Because a microtrap affects microwords already in the pipeline, the Ebox delays handling most traps until the microword which incurred the fault has reached S4. The microtrap is taken at the time that microword would normally have entered S5. In certain cases, Ebox stalls delay a microtrap until the stall is ended. The purpose of this is to ensure that operations which are part of a preceding VAX instruction are allowed to complete properly.

Most of the microtraps which the Ebox delays until S4 are due to Ibox-initiated memory operations which had an access or translation fault. Faults due to Ibox-initiated reads are detected by the Ebox when it accesses a valid MD register from the register file, and the fault bit associated with that MD is set. Each MD register has a fault bit which is set by the Ibox or the Mbox when a fault occurs in the memory reads necessary to fetch the source data. When the Ebox accesses an MD register with its fault bit set in S3, it carries that fault status down the pipeline into S4. All faults detected in S3 are piped to S4 before they cause a microtrap. Faults detected in S4 or piped to S4 will cause a microtrap only if the Ebox is next to retire a macroinstruction. Otherwise they are delayed until the Fbox retires an instruction and the retire queue entry indicates the Ebox.

Fault status signals are sent by the Ibox for entries in the instruction queue, source queue, field queue, destination queue, and branch queue. Entries in the PA queue have fault bits.
The Ebox detects a fault when it accesses a PA queue entry with its fault bit set or when it finds the instruction queue, source queue, field queue, destination queue, or branch queue empty and one of the fault status signals from the Ibox asserted. In the case of the instruction queue, the fault is detected in S2 and carried into S3 only when there is no S3 stall. In the case of the source queue and field queue, the faults are detected in S3. Instruction queue, source queue, and field queue related faults are carried down the pipeline until they reach S4, where they cause a microtrap once the Ebox is next to retire a macroinstruction.

Faults encountered in Ebox-initiated memory operations cause the Microsequencer to trap immediately. Ebox memory accesses begin in S5 so these traps cannot affect microwords from preceding VAX instructions. It is up to microcode to make sure that the last Ebox memory access has completed properly before the Microsequencer dispatches to another VAX instruction execution microflow.

Hardware errors are essentially handled in the same way as faults. See Section 8.5.19.

8.5 Ebox Detailed Functional Description

8.5.1 Register File

The register file has 4 distinct groups of registers: MD (memory data), GPR, Wn (working registers), and CPUSTATE registers. There are a total of 37 registers in the file. There are 6 ports: 3 read ports and 3 write ports. The read ports are the A port, the B port, and the IA port. The write ports are the W port, the IW port, and the MD port. The result is UNPREDICTABLE if more than one write to the same location occurs at the same time. Section 8.5.1.4 explains why this never happens.

8.5.1.1 Register Groups

The MD registers are only written by the Ibox directly or by the Mbox in completing an Ibox-initiated memory read. They are only read by the Ebox, and only accessed using a pointer from the source queue.
There are 6 MD registers, MD0-MD5.

The GPRs are all of the VAX general purpose registers, except R15 (PC). These are read and written by the Ebox in the course of instruction execution. The Mbox writes them to complete an Ebox-initiated memory read. The Ibox also reads and writes them. It reads them as it processes operand specifiers which use a GPR in an address calculation. It writes them as it processes autoincrement and autodecrement operand specifiers, and in unwinding the RLOG. There are 15 GPRs, R0-R14 (R14 is often referred to as SP).

Writes to GPRs can depend on the DL (data length) register. If the L field of the microword which caused the write specifies LONG, the full longword is written. If the microword specifies L/LEN(DL), only the appropriate bytes are written. The following table shows which bytes are written in all cases.

Table 8-2: GPR Write Length

                                      Write Byte?
DL Register   L Field of Microword    3   2   1   0
X             LONG                    Y   Y   Y   Y
BYTE          LEN(DL)                 N   N   N   Y
WORD          LEN(DL)                 N   N   Y   Y
LONGWORD      LEN(DL)                 Y   Y   Y   Y
QUADWORD      LEN(DL)                 Y   Y   Y   Y

X means don't care

The Wn registers are used by microcode for temporary storage and to receive memory read data. They are only read by the Ebox using the A or B fields of the microword. They can be written by the Ebox, Mbox, or Ibox. The Mbox writes them in completing an Ebox memory operation. The Ibox only writes them when completing an Ebox-initiated read of an Ibox IPR. There are 6 Wn registers, W0-W5.

The CPUSTATE registers are used by the microcode to hold elements of architectural state. They are read and written only by the Ebox. There are 10 CPUSTATE registers: KSP, ESP, SSP, USP, ISP, ASTLVL, SCBB, PCBB, SAVEPC, and SAVEPSL.

8.5.1.2 Access Ports

The A port and B port of the register file are read ports which can supply data to E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>, respectively. These two ports are accessed in S3.
The address can be supplied directly from the A and B fields of the microword or indirectly through the source queue. Source queue addressing is specified in the A and/or B microword fields. The A port can read any register in the file; the B port can read any register in the file except a CPUSTATE.

The W port is the write port connected to E_BUS%WBUS_L<31:0>. It receives a result from the Ebox or Fbox in S5. It can write to the GPRs, CPUSTATEs, and Wn registers. The address can be supplied directly by the microword in the DST field or (for GPRs only) indirectly through the destination queue. Destination queue addressing is used when the microword specifies DST/DST or when the Fbox writes a result to a GPR.

NOTE

When the Ebox initiates a memory read by sending a request to the Mbox, it specifies the register which will receive the memory data in the DST field of the microword. This has the side effect, when the microword is in S5, of writing that register with the value on E_BUS%WBUS_L<31:0>. Normally this register is written by the Mbox after this, before the particular register is read again. However, an exception can prevent the Mbox write and leave the register containing effectively garbage data.

The IA port is a read port used by the Ibox to read GPRs for use in general address calculation and for autoincrement and autodecrement operand specifier processing. It can only read the GPRs. The address is supplied by the Ibox.

The IW port is a write port used by the Ibox. It can write to the GPRs, the MD registers, and the Wn registers. The Ibox writes GPRs when it processes autoincrement and autodecrement operand specifiers and when unwinding the RLOG. It writes MD registers when operand specifier decoding requires passing a value (such as an address) to the Ebox. The Ibox writes the Wn registers only when responding to an Ebox-initiated IPR read. The address is supplied by the Ibox.
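The register groups and the five ports described so far can be summarized as a small lookup table. The sketch below is only a reading aid: the dictionary layout and the helper name `port_allows` are assumptions made for illustration, and the MD port, which is described next, is deliberately left out.

```python
# Register counts per group, as given in Section 8.5.1.1.
GROUP_SIZES = {"MD": 6, "GPR": 15, "Wn": 6, "CPUSTATE": 10}

# Direction and reachable register groups for each port described so far.
PORTS = {
    "A":  ("read",  {"MD", "GPR", "Wn", "CPUSTATE"}),  # any register
    "B":  ("read",  {"MD", "GPR", "Wn"}),              # any except CPUSTATE
    "IA": ("read",  {"GPR"}),                          # Ibox read port
    "W":  ("write", {"GPR", "Wn", "CPUSTATE"}),        # Ebox/Fbox results
    "IW": ("write", {"GPR", "MD", "Wn"}),              # Ibox write port
}

def port_allows(port, direction, group):
    """Return True if 'port' can perform 'direction' on register 'group'."""
    port_direction, groups = PORTS[port]
    return port_direction == direction and group in groups

# The group sizes add up to the 37 registers quoted for the file.
print(sum(GROUP_SIZES.values()))
```

Keeping the table as data makes it easy to check statements such as "only the A port can read a CPUSTATE register" against the text.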
The MD port is used by the Mbox to write memory or IPR read data into Wn registers, MD registers, and GPRs. The Mbox writes MD registers to complete Ibox-initiated reads. It writes Wn registers or GPRs to complete Ebox-initiated reads. The register file address is supplied by the Mbox. (The Mbox received the register file address when the memory operation was initiated.)

8.5.1.3 Register File Bypass Paths

The Ebox implements bypass for data being written into the register file or scheduled to be written into the register file further down the pipeline. Two techniques are employed: actual bypass datapaths and flow-thru bypass.

Actual bypass paths are data paths and drivers which directly drive the data onto E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. The register file E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0> drivers are automatically disabled when bypassed data is driven.

Flow-thru bypass is the technique in which a write to the register file occurs early in the cycle, well before the read. This way, reads see the result of writes which occur in the same cycle. This technique can only be used when the write data is available early enough and is scheduled to be written in that cycle. (For example, bypass of S4 Ebox results to E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0> can't be done with flow-thru bypass because the register file write isn't supposed to happen until S5.) See Section 8.5.8 for a description of bypassing of Ebox or Fbox result data from S4 or S5.

The register file has actual bypass paths for bypassing IW port writes to E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>. The IW port write occurs too late in the cycle for flow-thru bypass to be used.

NOTE

IW port bypass is necessary for the NVAX CPU to correctly handle some sequences of operand specifier decoding. Here is one example.
(To understand this example, the reader may need to know things which haven't been explained before this point in this specification.) Assume the CPU has to execute the following sequence of macroinstructions:

    ADDL2 R0,(R0)+
    ADDL2 R0,R1

If the Ibox is executing far enough ahead of the Ebox and the read of memory data at (R0) takes a long time (as it would if neither the Pcache nor the Bcache contains the data), then at some point the Ebox is stalled waiting for that data to arrive in an MD and the source and destination queues contain all the entries generated by the two ADDL2 instructions. The Ebox microword which executes ADDL2 is:

    A/S1, B/S2, ALU/A.PLUS.B, L/LEN(DL), MISC/LOAD.PSL.CC.IIII, SEQ.MUX/LAST.CYCLE.OVERFLOW

In S3 this microword accesses the first two entries in the source queue, which in this case point to R0 and some MD. The microword is stalled waiting for the memory read to complete (and the MD to become valid). The Ibox complex specifier unit (CSU) is stalled by the scoreboard unit (SBU) because it is just about to write R0+4 into the register file. Because the Ebox must see the old value when it reads R0, the Ibox write to R0 must be stalled. Once the Ebox retires the source queue entry containing the pointer to R0, the Ibox knows it can write R0.

In cycle N the memory data arrives and is written into the MD. This ends the S3 stall in the Ebox. The very next microword to enter S3 (in cycle N+1) is for the second ADDL2. It reads R0 and R1, and must see the new (incremented) value of R0. In cycle N+1, the Ebox signals the Ibox of two source queue retires, the Ibox SBU ends the CSU's stall, and the CSU writes R0+4 on the IW port. The Ebox reads R0 in that cycle and, because of the IW port bypass, it sees the correct (autoincremented) value of R0.

When processing an autoincrement or autodecrement specifier for an address access type operand specifier, the Ibox does two sequential writes into the register file.
The first writes the address into an MD register; the second writes the incremented or decremented register value back into the register. In some cases this can cause the Ebox to attempt to bypass both from the output of the RMUX in S4 and from the IW port to either or both of E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>. In these cases the bypass from the output of the RMUX overrides the IW bypass. See Section 8.5.8 for more on bypass from the output of the RMUX.

8.5.1.4 Write Collisions

The result is UNPREDICTABLE if more than one write to the same register file location occurs at the same time. To prevent this, writes to registers are controlled by certain hardware and microcode mechanisms.

The MD registers can only be written by the Ibox or the Mbox. The Ibox complex specifier unit has hardware which allocates and deallocates MD registers. The Mbox writes an MD only when returning data for an Ibox-initiated operand data read, and it writes to the particular MD specified by the Ibox. The Ibox writes an MD directly only when it knows that no outstanding reads to the same MD exist in the Mbox. Therefore, the Mbox and Ibox will never write an MD at the same time.

The GPRs can be written by the Ebox, Ibox, and Mbox. In many typical instruction execution situations, the Ebox never writes a GPR explicitly. It only writes them through destination queue accesses. The Ibox only writes GPRs to process autoincrement or autodecrement operand specifiers, so it always reads a given GPR prior to writing it. The Ibox scoreboard unit keeps track of which GPRs have been entered into the destination queue and allows Ibox complex specifier unit reads only when there are no Ebox writes outstanding. This means the Ibox will never write a GPR at the same time as the Ebox.
When execution of a particular macroinstruction requires the Ebox to directly write GPRs, the Ibox is always stopped (the Ibox stops itself after processing the macroinstruction's operand specifiers). In these cases, microcode can write to any GPR without colliding with an Ibox write. The Mbox only writes a GPR when returning data for an Ebox-initiated Mbox operation. Microcode doesn't issue such a memory read unless it knows the Ibox is stopped, and microcode doesn't write the GPR while such an operation is outstanding.

When unwinding the RLOG, the Ibox may write GPRs. The Ebox microcode knows this may be happening because the unwind was either initiated under microcode control or as a result of a branch mispredict. In either case the Ebox microcode doesn't write GPRs while the unwind is occurring.

The Ebox, Ibox, and the Mbox can write the Wn registers. The Mbox only writes a Wn register when returning read data for an Ebox-initiated Mbox operation. The Ibox only writes Wn registers to return IPR data at the Ebox's request. Microcode never writes a Wn if there is an Mbox or Ibox operation outstanding which will write the same register.

Only the Ebox can write CPUSTATE registers, so there is no possibility of a write collision on those registers.

8.5.1.5 Valid, Fault, and Error Bits

Some of the registers in the register file have valid bits and/or fault and error bits associated with them. There is one valid bit, one fault bit, and one error bit associated with each MD register. The Wn registers each have a valid bit, but no fault or error bits.

Valid bits are used to allow synchronization with memory reads. Whenever a memory read to a Wn register is initiated, the associated valid bit is cleared. The valid bit for an MD register is cleared as a side effect of reading it, so it is already cleared when a memory read to it is initiated. (The MD valid bits are also cleared in exception cases, by MISC/RESET.CPU.)
When the Mbox supplies the data, the valid bit is set. If the microword in S3 reads from an MD or Wn register whose valid bit is not set, the pipeline stalls in S3.

Fault and error bits are used to indicate that some sort of exception occurred with the memory read. Fault bits indicate memory management exceptions, while error bits indicate hardware errors. When the microword in S3 reads an MD register whose fault or error bit is set, a microtrap is scheduled for this microword. The microtrap is delayed in the pipeline as is discussed in Section 8.5.19.

Fault and error bits are needed to delay Ebox detection of memory exceptions until the Ebox is processing the associated VAX instruction. A set fault or error bit indicates an Ibox or Mbox detected exception condition related to source operand specifier processing. If the Mbox was unable to complete an Ibox-initiated memory operation targeted to an MD, it sets the fault or error bit. If the Ibox encountered any sort of fault or error before initiating the final memory read necessary to process an operand specifier, it sets the fault or error bit directly. In either case the Ebox will not detect the fault until it is executing the associated VAX instruction. There is no need for Wn register fault bits because microtraps due to Ebox memory reads are taken as soon as they are reported by the Mbox.

All the Wn register valid bits are set unconditionally in S3 of each new macroinstruction execution microflow. The Microsequencer signals the Ebox at the start of these microflows. This is done to prevent errors from causing the pipeline to stall waiting for a condition which will never be true. If an error causes an Ebox memory read to a particular Wn register to fail to complete, it leaves the valid bit cleared. If a new microflow references the same working register, it will stall.
Since the memory operation will never complete, the stall will never end.

All Wn register valid bits are set unconditionally when the MISC field of the microword specifies RESET.CPU. Wn register valid bits are normally set. A Wn register's valid bit is cleared in S4 if the microword specifies a memory read which will deliver data to that register. The bit is set when the Mbox or Ibox writes to that register. It is not altered by Ebox (A, B, or W port) accesses. The S4 clear of a Wn valid bit will cause the current S3 microword to stall if it references Wn.

All the MD valid bits are cleared when the microword MISC field specifies RESET.CPU. MD valid bits are not normally set. In normal operation, an MD register's valid bit is set when the Mbox or Ibox writes that register, and is cleared as a side effect of the Ebox reading the register.

8.5.2 Constant Generation

There are two constant generators, an extremely simple E_BUS%ABUS_L<31:0> constant source and a more complicated E_BUS%BBUS_L<31:0> source.

The E_BUS%ABUS_L<31:0> constant source is specified in the A field of the microword. It can produce the following longword constants: 0, 1. To source these constants to E_BUS%ABUS_L<31:0>, the microword specifies K0 or K1, respectively, in the A field.

The E_BUS%BBUS_L<31:0> constant generator builds a longword constant by placing a byte value in one of the four byte positions in the longword. The POS and CONST fields of the microword specify the value. The CONST field contains a byte value, while the POS field specifies the byte in the longword in which the value appears. The other bytes are zero. It is as if the POS field specified a left shift with zero fill of the CONST value. The POS and CONST fields are part of the constant generation variant of the microword. In this variant the VAL and B fields of the standard format microword, or the MISC2, DISABLE.RETIRE, and B fields of the special format, are replaced by the POS and CONST fields.
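The byte-position behavior of the E_BUS%BBUS_L<31:0> constant generator can be illustrated with a short sketch. It assumes that POS value 0 selects the least significant byte, which is consistent with the "left shift with zero fill" description but is not stated explicitly; the function name `bbus_constant` is hypothetical.

```python
def bbus_constant(pos, const):
    """Build a longword with byte 'const' placed in byte position 'pos'.

    Assumption: pos 0 is the least significant byte, so the result is
    const shifted left by 8 * pos bits with zero fill.
    """
    if not 0 <= pos <= 3:
        raise ValueError("POS selects one of four byte positions")
    if not 0 <= const <= 0xFF:
        raise ValueError("CONST is a byte value")
    return (const << (8 * pos)) & 0xFFFFFFFF

# Example: place 0xAB in byte 2 of the longword.
print(hex(bbus_constant(2, 0xAB)))
```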
In the constant generation variant, E_BUS%BBUS_L<31:0> receives the constant so the B field is unnecessary. Also, the shifter uses the SC register for the shift amount so the VAL field is not needed (put another way, VAL/0 is effectively specified by the constant generation variant). Similarly, MISC2/NOP and DISABLE.RETIRE/NO are effectively specified by constant generation variant microwords.

Under control of the MISC field, the E_BUS%BBUS_L<31:0> constant generator can also provide a constant in which the low order 10 bits are specified by microcode and the high order 22 bits are all zero. This mode of constant generation occurs when the MISC field specifies CONST.10.BIT. In this case the 10-bit constant is sourced from the CONST.10 field of the microword. (The CONST.10 field is formed by concatenating the two-bit POS field with the 8-bit CONST field, with the POS field more significant.) The microword format must be the constant generation variant if MISC/CONST.10.BIT is specified.

The E_BUS%BBUS_L<31:0> constant generator can also provide the constant 0000FFFF (hexadecimal). It is produced when the B field of the microword specifies K.FFFF.

8.5.3 The ALU

The ALU is a 32-bit function unit capable of arithmetic and logical operations. Its inputs are E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>. Its output drives E_ALU%RESULT_H<31:0> which can be muxed onto E_BUS%WBUS_L<31:0> and is directly connected to the VA register (see Section 8.5.6). It also produces condition codes (ALU<C>, ALU<N>, ALU<V>, ALU<Z>) based on the results of its operation. The ALU condition codes are data length dependent, with the data length coming from the DL register or defaulting to longword depending on the microword L field.

The ALU operation is specified by the ALU field of the microword. The following table shows the ALU operations by name, and gives a description of each operation.
Table 8-3: ALU Operations

ALU Operation Name     Operation Description
PASS.A                 E_ALU%RESULT_H <- A
PASS.B                 E_ALU%RESULT_H <- B
A.AND.B                E_ALU%RESULT_H <- A .AND. B
A.AND.NOT.B            E_ALU%RESULT_H <- A .AND. (.NOT. B)
A.OR.B                 E_ALU%RESULT_H <- A .OR. B
A.XOR.B                E_ALU%RESULT_H <- A .XOR. B
NOT.A.AND.B            E_ALU%RESULT_H <- (.NOT. A) .AND. B
A.PLUS.1               E_ALU%RESULT_H <- A + 1
A.PLUS.B               E_ALU%RESULT_H <- A + B
A.PLUS.B.PLUS.1        E_ALU%RESULT_H <- A + B + 1
B.MINUS.A              E_ALU%RESULT_H <- B - A = B + (.NOT. A) + 1
A.MINUS.B              E_ALU%RESULT_H <- A - B = A + (.NOT. B) + 1
A.MINUS.B.MINUS.1      E_ALU%RESULT_H <- A - B - 1 = A + (.NOT. B)
A.MINUS.1              E_ALU%RESULT_H <- A - 1
A.PLUS.4               E_ALU%RESULT_H <- A + 4
A.MINUS.4              E_ALU%RESULT_H <- A - 4
NEG.B                  E_ALU%RESULT_H <- -B (minus B)
NOT.B                  E_ALU%RESULT_H <- .NOT. B (ones complement of B)
SMUL.STEP              E_ALU%RESULT_H <- A .SMUL. B (Q register is affected, see text)
UDIV.STEP              E_ALU%RESULT_H <- A .UDIV. B (Q register is affected, see text)

The following signals are used in functional descriptions below:

• E_ALU%RESULT_H<n> is the nth bit of the ALU result.
• E_ALU%CI_H<n> is the nth carry-in bit in the ALU. It is the carry into the nth bit slice. The carry-in to the ALU is E_ALU%CI_H<0>, while the carry out for longword data length is E_ALU%CI_H<32>.

8.5.3.1 ALU Condition Codes

The four condition codes calculated by the ALU are:

• ALU<V> - Integer Overflow
  This bit indicates an integer overflow from the operation. It is the XOR of the carry in to the most significant bit with the carry out of the same bit. The calculation depends on the data length in effect for the operation. It is E_ALU%CI_H<n> .XOR. E_ALU%CI_H<n+1> where n is 7, 15, or 31 for byte, word, or longword data length, respectively.
• ALU<C> - Carry Out
  This bit is the carry out from the operation. It is E_ALU%CI_H<8>, E_ALU%CI_H<16>, or E_ALU%CI_H<32> for byte, word, or longword data length, respectively.
• ALU<Z> - Zero
  This bit indicates that the ALU result was zero. It is the logical NOR of E_ALU%RESULT_H<7:0>, E_ALU%RESULT_H<15:0>, or E_ALU%RESULT_H<31:0> for byte, word, or longword data length, respectively.
• ALU<N> - Negative
  This bit indicates that the ALU result was negative. It is simply E_ALU%RESULT_H<7>, E_ALU%RESULT_H<15>, or E_ALU%RESULT_H<31> for byte, word, or longword data length, respectively.

For logical and PASS operations the ALU<C> and ALU<V> condition code bits are always zero.

The ALU condition codes are available on the microtest bus and can be used to update the PSL. If the microword following the one setting the ALU condition codes is stalled, the Ebox control logic holds the ALU condition code bits constant until the microword branching on them is ready to use them. The effect is the same as if no stall had occurred. See Section 8.5.14 and Chapter 9 for more about the microtest bus and see Section 8.5.5 and Section 8.5.10.1 for more detail on setting PSL condition code bits.

If the ALU operation is SMUL or UDIV, the ALU condition codes correspond to the ALU result before the one-bit shift is done on the result.

8.5.3.2 SMUL Step Definition

The signed multiplication step is used to implement the sequential add and shift multiplication algorithm. It allows microcode to implement byte, word, and longword multiplication of two operands. The SMUL step uses the single bit left or right shifter at the output of the ALU, the Q register, and two microcode working registers.

The operation of a single SMUL step is described in Figure 8-2. The proper number of SMUL steps is controlled by the microcode and depends upon the data length of the operation.
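One reading of the step in Figure 8-2 can be sketched in Python for longword data length. This is an illustration under stated assumptions, not the microcode: the names `smul_step` and `multiply` are hypothetical, the loop count of 32 is assumed for longword operands, and the multiplier in Q is treated as unsigned, since the handling of a negative multiplier is not spelled out here.

```python
MASK32 = 0xFFFFFFFF

def smul_step(wa, wb, q):
    """One add-and-shift step; returns (new_wa, new_q) as 32-bit values.

    wa holds the partial product, wb the multiplicand, q the multiplier.
    """
    if q & 1:
        total = wa + wb
        result = total & MASK32
        carry_out = (total >> 32) & 1                       # carry out of bit 31
        carry_in = (((wa & 0x7FFFFFFF) + (wb & 0x7FFFFFFF)) >> 31) & 1
        overflow = carry_in ^ carry_out                     # arithmetic overflow
    else:
        result = wa                                         # pass partial product
        overflow = 0
    # The bit shifted into the msb is the ALU sign corrected for overflow.
    sign = ((result >> 31) & 1) ^ overflow
    new_wa = (sign << 31) | (result >> 1)
    # The bit shifted out of the ALU result enters the msb of Q.
    new_q = ((result & 1) << 31) | (q >> 1)
    return new_wa, new_q

def multiply(multiplicand, multiplier):
    """Run 32 steps and return the 64-bit pattern Wa ' Q (assumed count)."""
    wa, wb, q = 0, multiplicand & MASK32, multiplier & MASK32
    for _ in range(32):
        wa, q = smul_step(wa, wb, q)
    return (wa << 32) | q

print(multiply(7, 6))
```

With a signed multiplicand and a non-negative multiplier, the concatenation of the two results is the 64-bit two's complement product.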
The SMUL step operation selects the ALU operation (either PASS.A or A.PLUS.B) based on the least significant bit of the Q register. However, the Q register must not have been loaded by the previous microword unless that microword specified an SMUL step. This is because that bit of the Q register is not ready in time to control the ALU operation if the Q register was loaded from the output of the shifter in the previous cycle.

Figure 8-2: SMUL Step Operation

  IF Q<0> = 1
    THEN E_ALU%RESULT_H<31:0> <- Wa + Wb   (Partial Product + Multiplicand)
    ELSE E_ALU%RESULT_H<31:0> <- Wa        (Partial Product)

  WBUS<31:0> <- (E_ALU%RESULT_H<31> .XOR. E_ALU%CI_H<31> .XOR. E_ALU%CI_H<32>) ' E_ALU%RESULT_H<31:1>
  Q<31:0>    <- E_ALU%RESULT_H<0> ' Q<31:1>

  At end: Wa ' Q = product

  NOTE: E_ALU%RESULT_H is the value of the ALU before the single-bit shift.

  Description: The lsb of the Q register is tested for a 0 or 1. If Q<0> EQL 0, then the partial product is passed through the ALU unmodified. If Q<0> EQL 1, then the partial product and the multiplicand are added together. Then the output of the ALU and the Q register are shifted right one bit. The shift into the msb of WBUS is the exclusive-or of the ALU's output sign and the arithmetic overflow out of the ALU (arithmetic overflow is the exclusive-or of the carry-in and carry-out of the msb). The shift into the msb of Q comes from E_ALU%RESULT_H<0>.

8.5.3.3 UDIV Step Definition

The unsigned division step is used to implement the sequential shift and subtract non-restoring division algorithm. It allows microcode to implement byte, word, and longword division of two operands, and to produce the remainder. The UDIV step uses the single bit left or right shifter at the output of the ALU, the Q register, and two microcode working registers.

The operation of a single UDIV step is described in Figure 8-3.
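For readers unfamiliar with non-restoring division, the following is a minimal sketch of the textbook algorithm. It is not a model of the UDIV step hardware: it does not model ALU CC.C, the rotation onto the WBUS, or the final one-bit remainder adjustment that microcode performs. The function name `nonrestoring_divide` and its parameters are hypothetical. The sketch shows only the general idea that a negative partial remainder is corrected by adding the divisor in the next step rather than by restoring it immediately.

```python
def nonrestoring_divide(dividend, divisor, bits=32):
    """Textbook non-restoring unsigned division (illustrative only).

    Returns (quotient, remainder) for 0 <= dividend < 2**bits, divisor > 0.
    """
    mask = (1 << bits) - 1
    r = 0                    # signed partial remainder
    q = dividend & mask      # dividend bits shift out, quotient bits shift in
    for _ in range(bits):
        msb = (q >> (bits - 1)) & 1
        q = (q << 1) & mask
        if r >= 0:
            r = 2 * r + msb - divisor   # subtract step
        else:
            r = 2 * r + msb + divisor   # add step replaces restore-then-subtract
        if r >= 0:
            q |= 1                       # quotient bit is 1 when remainder is non-negative
    if r < 0:
        r += divisor                     # single final correction
    return q, r
```

For example, `nonrestoring_divide(100, 7)` yields quotient 14 and remainder 2.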
The proper number of UDIV steps is controlled by the microcode and depends upon the data length of the operation. The unsigned divide algorithm using the UDIV step requires microcode to shift the remainder one bit to the right after the final UDIV step.

Figure 8-3: UDIV Step Operation

  Note that non-restoring division uses the fact that
    2 * (Partial Remainder - Divisor + Divisor) - Divisor = 2 * (Partial Remainder - Divisor) + Divisor

  At start: Q register = dividend
            Wb = divisor
            Wa = 0 (except during an extended divide, when Wa contains the high-order longword of the dividend)

  This operation results in the Q register containing the quotient and Wa containing the remainder.

  IF ALU CC.C = 1
    THEN E_ALU%RESULT_H <- Wa - Wb   (Partial Remainder/Quotient - Divisor)
    ELSE E_ALU%RESULT_H <- Wa + Wb   (Partial Remainder/Quotient + Divisor)

  At end: Q register = quotient
          Wa = remainder

  NOTE: E_ALU%RESULT_H is the value of the ALU before the single-bit shift.

  Description: ALU CC.C is tested for a 0 or 1. If ALU CC.C EQL 1, then Wb is subtracted from Wa. If ALU CC.C EQL 0, then Wa and Wb are added together. The output of the ALU is then rotated to the left one bit and driven onto the WBUS with WBUS<0> being driven by Q<31>. Additionally, the Q register is rotated left one bit with the complement of the bit shifted out of the ALU result becoming Q<0>. The new ALU CC.C condition flag comes from the carry out of the ALU (or E_ALU%CI_H<32> here).

8.5.4 The Shifter

The shifter is a right shift network with 64 bits of input and 32 bits of output. The input is E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0> concatenated to form a 64-bit word with E_BUS%ABUS_L<31:0> in the more significant longword. The output is E_SHF%SHF_RESULT_H<31:0>, which can be muxed onto E_BUS%WBUS_L<31:0> and is directly connected to the Q register (see Section 8.5.7).
The shifter produces two condition code bits, SHF<N> and SHF<Z>. These are available on the microtest bus and can be used to update the PSL. See Chapter 9 for more about the microtest bus and see Section 8.5.5 and Section 8.5.10.1 for more detail on setting PSL condition code bits.

The shifter shifts its input right by 0 to 32 bits. A shift amount of 0 selects E_BUS%BBUS_L<31:0> and a shift amount of 32 selects E_BUS%ABUS_L<31:0>. The equivalent of a left shift of N is accomplished by shifting left-justified data (32-N) to the right.

The shift operation is specified in the SHF field of the microword. The following table shows the shifter operations by name and gives a description of each operation. If the microword is in the special format, the shifter function defaults to NOP since the SHF field is not present.

Table 8-4: Shifter Operations

Shifter Operation Name   Operation Description
NOP                      E_SHF%SHF_RESULT_H <- UNPREDICTABLE
PASS.A                   E_SHF%SHF_RESULT_H <- A
PASS.B                   E_SHF%SHF_RESULT_H <- B
PASS.Z                   E_SHF%SHF_RESULT_H <- 0
LEFT.DOUBLE              E_SHF%SHF_RESULT_H <- A'B rsh (32 - count) (the effect is LSH count)
LEFT.SINGLE              E_SHF%SHF_RESULT_H <- A'0 rsh (32 - count) (the effect is LSH count)
RIGHT.DOUBLE             E_SHF%SHF_RESULT_H <- A'B rsh count
RIGHT.SINGLE             E_SHF%SHF_RESULT_H <- 0'B rsh count

' is the bitwise concatenation operator.

For the SHF/LEFT.SINGLE and SHF/RIGHT.SINGLE operations the shifter masks off E_BUS%BBUS_L<31:0> or E_BUS%ABUS_L<31:0>, respectively. This guarantees that the bits shifted into the result are 0.

The shift amount comes from the VAL field of the microword or from the SC register. The SC register is the source of the shift amount if the VAL field is 0 or if the VAL field is not present because the microword is in the constant generation variant format.
The SC register can specify an actual shift amount in the range of 0 to 31, and the VAL field can specify a shift amount of 1 to 31 (0 in VAL implies SC contains the shift amount). Neither the SC nor the VAL field can specify a shift of 32. However, since the SHF/LEFT.SINGLE and SHF/LEFT.DOUBLE operations differ from the corresponding right shift operations only in that the actual shift amount is the amount in the SC register or VAL field subtracted from 32 (32-N), the shifter shifts right by 32 when a left shift of 0 is specified.

8.5.4.1 Shifter Condition Codes

The shifter condition codes are not dependent on the instruction data length. They are always calculated for longword data length. The two condition codes calculated by the shifter are:

• SHF<Z> - Zero
  This bit indicates that the shifter result was zero. It is the logical NOR of E_SHF%SHF_RESULT_H<31:0>.
• SHF<N> - Negative
  This bit indicates that the shifter result was negative. It is simply E_SHF%SHF_RESULT_H<31>.

The shifter condition codes are available on the microtest bus and can be used to update the PSL. If the microword following the one setting the shifter condition codes is stalled, the Ebox control logic holds the shifter condition code bits constant until the microword branching on them is ready to use them. The effect is the same as if no stall had occurred. See Section 8.5.14 and Chapter 9 for more about the microtest bus and see Section 8.5.5 and Section 8.5.10.1 for more detail on setting PSL condition code bits.

8.5.4.2 Shifter Sign

The shifter sign, SHF<N>, is saved after each shifter operation including pass operations. A constant based on this saved value is available as an input to E_BUS%ABUS_L<31:0>. It is accessed by specifying SHIFT.SIGN in the A field of the microword. The constant is 0 or FFFFFFFF#16 for Saved-SHF<N> equal 0 or 1, respectively.
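The shifter operations, condition codes, and sign constant described above can be summarized in a short software model. This is a sketch under stated assumptions, not the hardware: the function names `shifter`, `shifter_cc`, and `sign_constant` are hypothetical, `count` stands for the 0 to 31 amount taken from VAL or SC, and NOP is left out because its result is UNPREDICTABLE.

```python
MASK32 = 0xFFFFFFFF

def shifter(op, a, b, count):
    """Model of the 64-bit-in, 32-bit-out right shift network (illustrative)."""
    a &= MASK32
    b &= MASK32
    if op == "PASS.A":
        return a
    if op == "PASS.B":
        return b
    if op == "PASS.Z":
        return 0
    if op == "LEFT.DOUBLE":
        wide, amount = (a << 32) | b, 32 - count   # A'B rsh (32 - count)
    elif op == "LEFT.SINGLE":
        wide, amount = a << 32, 32 - count         # A'0: B input masked off
    elif op == "RIGHT.DOUBLE":
        wide, amount = (a << 32) | b, count        # A'B rsh count
    elif op == "RIGHT.SINGLE":
        wide, amount = b, count                    # 0'B: A input masked off
    else:
        raise ValueError("unmodeled shifter operation: " + op)
    return (wide >> amount) & MASK32

def shifter_cc(result):
    """Return (SHF<N>, SHF<Z>), always computed on the full longword."""
    return (result >> 31) & 1, int(result == 0)

def sign_constant(saved_n):
    """Constant sourced onto the A bus from the saved shifter sign."""
    return MASK32 if saved_n else 0
```

Note how a left shift of 0 becomes a right shift by 32 in this model, which simply selects the A input, matching the text above.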
Saved-SHF<N> is updated after each shifter operation and is held in each shifter NOP cycle. If microword N specifies a shifter operation, and microword N+1 sources this constant, the new value is used to form the constant. However, Saved-SHF<N> may be destroyed by executing a special format microword. The bit is UNPREDICTABLE after executing such a microword.

8.5.5 RMUX

The RMUX coordinates Fbox and Ebox result storage and macroinstruction retiring. It is a large selector which selects the source of Ebox memory requests and the source of the next E_BUS%WBUS_L<31:0> data and associated information. The RMUX selection takes place in S4, as does the driving of the memory request to the Mbox. The new E_BUS%WBUS_L<31:0> data is not used until S5.

The RMUX is controlled by the retire queue. See Section 8.5.15.7 for detail on the retire queue. The retire queue output is a status which indicates whether the next macroinstruction to retire is being executed in the Ebox or the Fbox. Based on this status, the RMUX selects one of the two boxes to drive E_BUS%WBUS_L<31:0> and to drive the memory request signals. The box not selected will stall if it needs to drive E_BUS%WBUS_L<31:0> or memory request signals. The retire queue read pointer is not advanced, and therefore the RMUX selection cannot change, until the currently selected box indicates that its macroinstruction is to be retired (except that the retire queue read pointer is not advanced when MISC1/RETIRE.INSTRUCTION is specified).

NOTE
The Ebox stalls when the microword does not specify NONE in the DST field and the retire queue selects the Fbox. It does not stall if the microword specifies DST/NONE, even if the same microword specifies a memory request. This is the reason for the microcode restriction that any microword specifying a memory operation must also specify DST/WBUS or something other than NONE in the DST field. See Section 8.5.27.15.
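The stall rule in the NOTE above, and the microcode restriction that follows from it, can be written as two small predicates. This is an illustrative sketch only: the function names and the string encodings ("EBOX", "FBOX", "NONE") are hypothetical and do not correspond to actual signal or field encodings.

```python
def ebox_stalls_for_rmux(retire_queue_selects, dst):
    """True if an Ebox microword must stall because the Fbox owns the RMUX.

    retire_queue_selects: "EBOX" or "FBOX" (hypothetical encoding).
    dst: DST field mnemonic of the microword, e.g. "NONE" or "WBUS".
    """
    return retire_queue_selects == "FBOX" and dst != "NONE"

def violates_memory_restriction(dst, has_memory_request):
    """True if a microword breaks the rule that memory operations need DST != NONE.

    With DST/NONE the microword would not stall, so its memory request could
    collide with the Fbox while the Fbox is selected.
    """
    return has_memory_request and dst == "NONE"
```

A microcode checker built on this idea would flag any microword for which `violates_memory_restriction` returns True.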
The source (Ebox or Fbox) indicated by the retire queue is always selected to drive the RMUX. If the Ebox is selected, the W field of the microword in S4 selects either the ALU or the shifter as the source of the RMUX. (Note that E_BUS%WBUS_L<31:0> is always driven, even if the Ebox specifies DST/NONE.)

8.5.5.1 RMUX Produced Memory Request Signals

The RMUX produced memory request signals are:

• a memory command,
• a status indicating a destination queue indirect memory store,
• a tag giving a register file address in case a memory read is specified, and
• the data length for the operation.

This information is processed slightly further in the Ebox's Mbox interface logic to produce a memory request about halfway through S4. See Section 8.5.17 for more on Ebox memory requests.

The only memory operation the Fbox can initiate is a destination queue indirect store (a memory store). If the Fbox is selected as the RMUX source, the memory request information comes from the Fbox and the destination queue. The destination queue is only accessed if the Fbox requests it. If it does not request a destination queue access, the memory information output by the RMUX indicates no operation. The Fbox also provides the data length if there is a store.

If the Ebox is selected as the RMUX source, the memory request information comes from the microword. However, the DST field can cause a memory store request if it specifies a destination queue indirect store. The data length is from the DL register unless the microword L field overrides it to longword. The register file address for memory reads always comes from the DST field.

8.5.5.2 RMUX Produced E_BUS%WBUS_L Related Information

E_BUS%WBUS_L<31:0> carries result data from the Ebox and Fbox to the register file, to several Ebox registers, and to memory as write data.
The RMUX produced E_BUS%WBUS_L<31:0> related information is:

• the E_BUS%WBUS_L<31:0> (a longword of data),
• the E_BUS%WBUS_L<31:0> destination address or specification,
• the data length associated with E_BUS%WBUS_L<31:0>,
• the S5 condition codes, and
• an indication of which condition code map is to be used.

The above control information is driven into S5 provided there is not an S4 stall. If there is an S4 stall, S5 control information specifying no operation is driven into S5 instead.

If the Fbox is selected, E_BUS%WBUS_L<31:0> data comes from the Fbox. The E_BUS%WBUS_L<31:0> destination address comes from the destination queue. The condition code bits and map specification come from the Fbox. The Fbox sets the map specification code to specify no change of the condition code bits, except in the last cycle of an instruction retire when the map specifier specifies a particular condition code update. See Section 8.5.10.1.1 for more detail on condition code alteration.

If the Ebox is selected, E_BUS%WBUS_L<31:0> data comes either from E_ALU%RESULT_H<31:0> or E_SHF%SHF_RESULT_H<31:0>. The condition codes come from the same source (ALU or shifter). Since the shifter only produces N and Z condition code bits, the RMUX substitutes 0 for the S5 C and V bits if the shifter is selected. The E_BUS%WBUS_L<31:0> destination address comes from the DST field of the microword or from the destination queue. The status indicating whether the condition code bits are to be updated and the condition code map to be used are both decoded from the MISC field of the microword.

In S5, E_BUS%WBUS_L<31:0> drives the W port of the register file and is the input to several miscellaneous registers in the Ebox. The condition codes and the map are used to update the PSL condition code bits if the map and associated status indicate this should happen.
E_BUS%WBUS_L<31:0> is also the source of write data for any memory write request which was sent by the Ebox to the Mbox in the previous cycle. In other words, E_BUS%WBUS_L<31:0> in S5 is the source of write data for the memory operation selected by the RMUX in S4.

In S5, E_BUS%WBUS_L<31:0> data is zero extended from the effective data length to longword. The data length is from the DL register unless the microword L field overrides it to longword.

8.5.6 VA Register

The 32-bit VA register is the source for the address on all Ebox memory requests, except destination queue based stores, which use the current PA queue entry for an address. Unlike the entry in the PA queue, the VA register address is not yet translated (though it may be a physical address). It is a virtual address except when the memory operation doesn't require translation (as in IPR references or explicit physical memory references) or when memory management is off. The VA register can be used to latch a temporary ALU output value without driving the ALU result onto E_BUS%WBUS_L<31:0>.

The VA register can be loaded only from the output of the ALU, E_ALU%RESULT_H<31:0>. It is loaded when the microword V field specifies to load it. The load occurs at the end of S4, even when there is an S4 stall. If a given microword specifies a memory operation in the MRQ field and loads the VA register, the new VA value will be received by the Mbox with the memory command. For more detail on Ebox-initiated memory operations, see Section 8.5.17.

NOTE
The address for memory operations is part of the data latched in the EM_LATCH in the Mbox. This is why the Ebox can overwrite the VA value during S4 stalls even though the stall might be because the EM_LATCH is full.

The VA register is one of the possible E_BUS%ABUS_L<31:0> sources. The microword specifies VA in the A field to use it.
8.5.7 Q Register

The 32-bit Q register is closely associated with the shifter. It can be loaded directly from the shifter output without driving that data onto E_BUS%WBUS_L<31:0>. Microcode uses it to hold temporary data.

The Q register can only be loaded from the shifter output, E_SHF%SHF_RESULT_H<31:0>. It is loaded when the microword Q field specifies to load it. The load occurs at the end of S4, even when there is an S4 stall.

The Q register is one of the possible sources of both E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0>. The microword specifies Q in the A or B field to use it.

The data in the Q register is shifted one bit to the left or right as a side effect of the ALU SMUL.STEP and UDIV.STEP operations. The shift is one bit to the left for UDIV.STEP and one bit to the right for SMUL.STEP.

8.5.8 Bypassing of Results

The Ebox implements bypass paths for result data from S4 or S5 to E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. These paths allow microwords to use any register in the register file as a source of E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0> even if the register has been updated by one of the two preceding microwords.

The Ebox pipeline reads from the register file in S3, operates on the data in S4, and writes the register file in S5. Since adjacent microwords in the pipeline could be from entirely different macroinstruction execution microflows, it is necessary that the Ebox hardware detect and resolve cases where one microword alters a register and a subsequent microword reads that register before it is written.

NOTE
The Fbox is one possible source of result data in S4, and any S5 operation may be a result store operation from the Fbox piped forward one stage. Bypassing of results destined for the register file from S4 or S5 works for Fbox result store operations in the Ebox pipeline in the same way as for microcode operations.
The Ebox monitors the register file addresses on the A and B ports of the register file in S3 and compares those to the RMUX register file address in S4. Whenever E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0> is expecting data that is not yet in the register file, the data is steered directly from the output of the RMUX (at the end of S4).

NOTE
The bypass path for register file entries from E_BUS%WBUS_L<31:0> in S5 to E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0> is implemented by register file flow-thru writes. E_BUS%WBUS_L<31:0> data is written into the register file early in the cycle and read after the write. So reads see the result of writes from the same cycle.

The S3 A and B port addresses can come from the microword or the source queue. Similarly, the RMUX address in S4 can come from the microword, the destination queue, or the Fbox. The W port address in S5 has already been determined by the RMUX in the previous cycle. The Ebox bypass path control logic compares the final S3 read addresses to the final S4 write addresses and enables the appropriate bypass path when there is a match. (As noted above, S5 to S3 register file bypass is a flow-thru path.)

Data length has an effect on bypass operations for GPRs. When a pending GPR write is to less than a full longword, only the bytes which are going to be updated are bypassed. The other bytes are read from the register file. Effectively, an independent bypass check is made for each of the following: byte 0, byte 1, and the upper word.

In the event that the W port and the RMUX update the same register, the bypass logic chooses the RMUX data as the source of E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>.

NOTE
It would be possible for a value to be constructed of data from the register file, the RMUX, and the W port all at once, because of differing data lengths.
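The per-group bypass check can be illustrated with a small model. This is a sketch under stated assumptions, not the control logic itself: the function name `bypassed_read`, the length strings, and the list-of-writes interface are all hypothetical. Pending writes to the same GPR are applied oldest first (W port in S5, then RMUX in S4), so the RMUX data wins wherever both update the same byte group.

```python
# Byte groups checked independently: byte 0, byte 1, upper word.
GROUP_MASKS = (0x000000FF, 0x0000FF00, 0xFFFF0000)

# Which groups a write of each data length updates (hypothetical encoding).
GROUPS_WRITTEN = {
    "byte": (True, False, False),
    "word": (True, True, False),
    "long": (True, True, True),
}

def bypassed_read(regfile_value, pending_writes):
    """Value seen on the A or B bus for a GPR with writes still in flight.

    pending_writes: list of (data, length) tuples, oldest first, e.g. the
    W port write in S5 followed by the RMUX result in S4.
    """
    value = regfile_value & 0xFFFFFFFF
    for data, length in pending_writes:
        for mask, written in zip(GROUP_MASKS, GROUPS_WRITTEN[length]):
            if written:
                value = (value & ~mask & 0xFFFFFFFF) | (data & mask)
    return value
```

With a register holding AABBCCDD#16, a pending word write of 1122#16 on the W port, and a pending byte write of 33#16 from the RMUX, the model returns AABB1133#16: the upper word from the register file, byte 1 from the W port, and byte 0 from the RMUX.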
In the event that the IW port (from the Ibox) and the RMUX update the same register, the bypass logic chooses the RMUX data as the source of E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. See Section 8.5.1.3 for more on IW port bypass.

The Q and VA register updates are also effectively bypassed. Microcode can depend upon the new data being available to E_BUS%ABUS_L<31:0> and E_BUS%BBUS_L<31:0> when the preceding microword updated these registers. However, the Q register contents are not bypassed if the Q register was updated by a shift caused by an ALU/SMUL.STEP or ALU/UDIV.STEP ALU operation.

NOTE
The bypass mechanisms for the VA and Q registers are based on a flow-thru latch updated in S4 (and not stalled) rather than actual bypass paths. Neither bypass is data length dependent, as writes to these registers always load the entire longword.

Bypassing for other registers and states in the Ebox generally does not make sense, and therefore is not implemented. For example, there is no bypass associated with the INT.SYS register or the PSL.

8.5.9 Result Destinations

Most of the Ebox result destinations receive their data from E_BUS%WBUS_L<31:0> in S5. Destinations specified in the DST field of the microword are updated in S5 from E_BUS%WBUS_L<31:0>. Possible E_BUS%WBUS_L<31:0> destinations are any register file entry, the PSL and SC registers, and the MMGT.MODE and INT.SYS special registers. More detail on the miscellaneous registers is given in the next section.

A number of special capabilities for loading registers are available through the MISC field of the microword.

• The DL (data length) register can be altered in S3, affecting the next microword but not the current one.
• The SC register can be updated directly from E_BUS%ABUS_L<4:0> in S4 (overriding an S5 update from the preceding microword).
• The MPU (mask processing unit, see Section 8.5.10.7) can be updated directly from E_BUS%BBUS_L<29:16> in S4.

8.5.10 Miscellaneous Ebox Registers and States

There are a number of states and registers in the Ebox with special purposes. Some, like the DL register, provide control information. Some provide status signals used by Microsequencer conditional branches. They also vary in how and when they are loaded.

8.5.10.1 PSL

The PSL is the VAX architecture PSL register. Its bits are used in several places within the Ebox. The Microsequencer uses a number of the bits to make dispatching decisions. Additionally, the current mode is used by the Mbox and the IPL level is used by the interrupt section (see Chapter 10 for more on interrupts).

The PSL can be loaded as a longword or byte destination of E_BUS%WBUS_L<31:0> in S5. There are two different decodes of the DST microword field which load the PSL, DST/PSL and DST/PSL.B0. The first loads the entire PSL. The second loads only the low-order byte of the PSL.

8.5.10.1.1 Condition Code Alteration

The condition code bits of the PSL can be altered independently. This occurs when the MISC field of the microword specifies one of the six possible PSL condition code update functions. Condition code update also occurs when the Fbox retires a macroinstruction. The update occurs at the end of S5. The resulting bits can be used in the next cycle (for example, the second following microword can source the PSL).

The new condition codes are a logic function (called a map) of the current PSL condition codes and the new S5 condition codes. The S5 condition codes in any cycle were selected in the previous cycle by the RMUX from the shifter, ALU, and Fbox condition codes. The map specifier is an output of the RMUX. It is either supplied by the Ebox or the Fbox.
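The maps listed in Table 8-5 and Table 8-6 can be expressed as small functions of the current PSL condition codes and the S5 condition codes. The following is an illustrative sketch only: the function names are hypothetical, condition codes are modeled as (N, Z, V, C) tuples, and the map names are given without the LOAD.PSL.CC. prefix. The reading of the four-letter suffixes (one letter per bit in N, Z, V, C order) is an assumption inferred from the map functions in the table.

```python
def ebox_cc_map(name, psl, s5):
    """Apply a microcode-specified map. psl and s5 are (N, Z, V, C) tuples of 0/1."""
    n, z, v, c = psl
    sn, sz, sv, sc = s5
    if name == "IIIP":        # N,Z,V from S5; C unchanged
        return (sn, sz, sv, c)
    if name == "JIZJ":        # N = N xor V; Z from S5; V = 0; C inverted
        return (sn ^ sv, sz, 0, 1 - sc)
    if name == "IIII":        # all four bits from S5
        return (sn, sz, sv, sc)
    if name == "IIIJ":        # N,Z,V from S5; C inverted
        return (sn, sz, sv, 1 - sc)
    if name == "IIIP.QUAD":   # Z accumulates across the two longwords
        return (sn, z & sz, sv, c)
    if name == "PPJP":        # V = not Z; N,Z,C unchanged
        return (n, z, 1 - sz, c)
    raise ValueError("unknown map: " + name)

def fbox_cc_map(specifier, psl, s5):
    """Apply an Fbox map specifier value (0 to 3)."""
    n, z, v, c = psl
    sn, sz, sv, sc = s5
    if specifier == 0:        # no change
        return psl
    if specifier == 1:        # MOVF, MOVD, MOVG
        return (sn, sz, 0, c)
    if specifier == 2:        # most floating point instructions
        return (sn, sz, 0, 0)
    if specifier == 3:        # MULL and some convert instructions
        return (sn, sz, sv, 0)
    raise ValueError("unknown specifier: " + str(specifier))
```

The IIIP.QUAD case shows why a map needs the current PSL bits as an input: PSL<Z> stays set only if both halves of a quadword result were zero.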
The six different condition code update functions available through the MISC field of the microword indicate six different maps. The Fbox derives its map from the opcode of the macroinstruction it is executing. The following tables show all the different condition code alteration maps. Table 8-5 shows the microcode specified maps used for macroinstructions executed in the Ebox. Table 8-6 shows the maps used for macroinstructions executed in the Fbox.

Table 8-5: Condition Code Alteration Maps Specified By Microcode

MISC Field Specification   Map Function
LOAD.PSL.CC.IIIP           PSL<N,Z,V> <- S5 Condition Codes <N,Z,V>
                           PSL<C> <- PSL<C> (unchanged)
LOAD.PSL.CC.JIZJ           PSL<N> <- S5 Condition Code <N> XOR S5 Condition Code <V>
                           PSL<Z> <- S5 Condition Code <Z>
                           PSL<V> <- 0
                           PSL<C> <- NOT S5 Condition Code <C>
LOAD.PSL.CC.IIII           PSL<N,Z,V,C> <- S5 Condition Codes <N,Z,V,C>
LOAD.PSL.CC.IIIJ           PSL<N,Z,V> <- S5 Condition Codes <N,Z,V>
                           PSL<C> <- NOT S5 Condition Code <C>
LOAD.PSL.CC.IIIP.QUAD      PSL<Z> <- PSL<Z> AND S5 Condition Code <Z>
                           PSL<N,V> <- S5 Condition Codes <N,V>
                           PSL<C> <- PSL<C> (unchanged)
LOAD.PSL.CC.PPJP           PSL<V> <- NOT S5 Condition Code <Z>
                           PSL<N,Z,C> <- PSL<N,Z,C> (unchanged)

Table 8-6: Condition Code Alteration Maps Used By The Fbox

Map Specifier Value                                    Map Function
0                                                      No change to the PSL condition code bits.
1 (used for MOVF, MOVD, MOVG)                          PSL<N,Z> <- S5 Condition Codes <N,Z>
                                                       PSL<V> <- 0
                                                       PSL<C> <- PSL<C> (unchanged)
2 (used for most floating point instructions)          PSL<N,Z> <- S5 Condition Codes <N,Z>
                                                       PSL<V,C> <- 0
3 (used for MULL and some convert instructions)        PSL<N,Z,V> <- S5 Condition Codes <N,Z,V>
                                                       PSL<C> <- 0

8.5.10.1.2 Trace and Trace Pending Bits

When the first microword of a macroinstruction execution microflow reaches S5, the PSL<T> bit is copied into the PSL<TP> bit.
(Macroinstruction execution microflows are distinguished from other microflows by a status bit sent from the Microsequencer. See Section 8.5.14.3.) The Microsequencer receives both these bits and causes a trace fault dispatch when necessary. The Microsequencer anticipates the setting of PSL<TP> when it dispatches a macroinstruction execution microflow so that it will dispatch to the trace fault handler on the next SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW. (See Section 9.2.3.3.2.)

8.5.10.2 SC

The SC register is a 5-bit register which holds a shifter shift amount. The microword can specify left and right shifts of the amount in the SC register. A microword specifies this in one of two ways. If the constant generation variant of the microword is used, the SC register is always the source of the shift amount. Also, the SC register is the shift amount source if the microword is not a constant generation variant and the VAL field is zero.

The SC register can be loaded in two different ways. One way is to specify DST/SC, specifying the SC as the destination of E_BUS%WBUS_L<4:0>. The other way is to specify MISC/LOAD.SC.FROM.A. In this case the SC register is loaded from E_BUS%ABUS_L<4:0>.

The E_BUS%WBUS_L<4:0> load into SC occurs at the end of S5. The E_BUS%ABUS_L<4:0> load occurs at the end of S4. In either case, the new value is not seen by the shifter until the next cycle. The shifter can use the old SC value during the current cycle.

The SC control logic ensures that the following case works the same way with and without a stall on the second microword. If microword N loads the SC register from E_BUS%WBUS_L<4:0>, and microword N+1 shifts some data by the amount in the SC register, the data will be shifted according to the value in SC as microword N began.

If two different microwords each specify a load of the SC in the same cycle, the E_BUS%ABUS_L<4:0> data is loaded.
This can only happen if one microword specifies DST/SC and the following microword specifies MISC/LOAD.SC.FROM.A. The more recently executed microword wins. (Note that this means the result when a stall delays the second microword is the same as if there is no stall.)

NOTE
If an Ebox pipeline abort occurs, it does not necessarily prevent the modification of the SC register by a microword in the pipeline. If a microword which would alter the SC in S5 (i.e., specifies DST/SC) enters S5 in a pipeline abort cycle, the SC is loaded despite the abort. Effectively, the SC register is UNPREDICTABLE after a pipeline abort (though if a particular case is analyzed carefully, it may be possible to determine that the SC is predictable in that case).

8.5.10.3 INT.SYS

INT.SYS is a possible E_BUS%ABUS_L<31:0> source and a possible E_BUS%WBUS_L<31:0> destination. It is microcode's interface to the interrupt section. Both as a source and as a destination, INT.SYS is a longword. For information on the format and use of the register, see Chapter 10. The register is read in S3 and written in S5.

8.5.10.4 MMGT.MODE

The MMGT.MODE register is a 2-bit E_BUS%WBUS_L destination. It is loaded from E_BUS%WBUS_L<3:2> early in S5. Its value is used in memory management probe accesses (MRQ field specifies PROBE.V.RCHK, PROBE.V.RCHK.NOFILL, or PROBE.V.WCHK). The Ebox drives this mode directly to the Mbox. For more detail on Ebox-Mbox interaction see Section 8.5.17.

8.5.10.5 State Flags

There are six 1-bit state flags: 0 through 5. Microcode can conditionally branch on these bits. They can be set and cleared by microcode, and some are cleared automatically at the start of each macroinstruction execution microflow. The state flags are used as microcode flags for loops and shared microcode paths.

The state bits are maintained in S3.
If the state bits are altered in a microword, a branch based on the new state may be specified in the next microword. It is possible to set or clear state flags and branch on the previous value in the same microword. The following table shows the microword fields and specifications used to set and reset state flags.

Table 8-7: Setting and Clearing State Flags

MISC Field in Standard Format Microword
Mnemonic                Operation
MISC/CLR.STATE.3-0      Clear State Flags 0-3
MISC/SET.STATE.0        Set State Flag 0
MISC/SET.STATE.1        Set State Flag 1
MISC/SET.STATE.2        Set State Flag 2

MISC1 Field in Special Format Microword
Mnemonic                Operation
MISC1/CLR.STATE.5-4     Clear State Flags 4 and 5
MISC1/SET.STATE.3       Set State Flag 3
MISC1/SET.STATE.4       Set State Flag 4
MISC1/SET.STATE.5       Set State Flag 5

At the start of each macroinstruction (macroinstruction and FPD dispatches in the Microsequencer), in S3, state flags 0 through 3 are reset. If the first microword of the macroinstruction execution microflow sets any of the state flags, it will override the automatic reset for the particular state bit(s) specified; the others are still cleared.

The state flag bits may be selected onto the microtest bus for use in microcode branches. See Section 8.5.14 and Chapter 9 for more on microcode branches.

8.5.10.5.1 E%MACHINE_CHECK_H

If state flag 5 is 1 and state flag 4 is 0, the signal E%MACHINE_CHECK_H is asserted. This causes pin P%MACHINE_CHECK_H to be asserted.

8.5.10.5.2 State Flags and Pipeline Abort

The state flags are maintained in S3. If a microword which specifies to set or clear state flags enters S3, the flags are altered. Also, the automatic reset of state flags 0 through 3 at the start of a new macroinstruction execution flow occurs when the associated microword is in S3.
In either of these cases, a pipeline abort (due to a microtrap) in S4 for the associated microword will not prevent the state flag modification.

When microcode intends that the state flags not be altered by a specific flow if it is aborted by a microtrap, special rules must be followed. There are two cases. If the anticipated microtrap can only occur with microword N in S3, microword N+1 can specify an alteration of a state flag and it will not happen if the microtrap occurs. If the anticipated microtrap can only occur with microword N in S4, and microword N+1 alters a state flag, that state flag will be affected even if the microtrap occurs. In this case, microword N+2 may alter a state flag and it will not happen if the microtrap occurs.

If it is not predictable whether microword N will be in S3 or S4 when the anticipated microtrap occurs, then the obvious extrapolation of the above explanation determines the result.

Here is an example case in which microword N is guaranteed to be in S3 when an anticipated microtrap occurs:

• Microcode issued an explicit memory read to a Wn register and microword N sources Wn to the E_BUS%ABUS_L<31:0> to synchronize the operation. The anticipated microtrap is associated with the memory read to Wn.

Here are some example cases in which microword N is guaranteed to be in S4 when an anticipated microtrap occurs:

• Microword N sources an MD to the E_BUS%ABUS_L<31:0> (through the source queue) to synchronize to an operand prefetch issued by the Ibox. The microtrap is associated with the operand which is to be returned to the MD.
• Microword N synchronizes to an explicit memory reference in microword N-1 by specifying MRQ/SYNC.MBOX. The microtrap is associated with the memory reference issued by microword N-1.
Microcode which intends to avoid the side effect in which state flags 0 through 3 are cleared in the first cycle of a macroinstruction microflow if a microtrap occurs may have to add a microword after the one synchronizing to the anticipated microtrap before specifying SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW. Specifically, if microword N synchronizes to an anticipated microtrap in S4 and microword N+1 specifies SEQ.MUX/LAST.CYCLE, then state flags 0 through 3 will not be cleared if the microtrap occurs. However, if microword N specifies SEQ.MUX/LAST.CYCLE, the state flags could be cleared (though it would depend on the detailed timing of the events).

8.5.10.6 DL Part of the Instruction Context Register

The DL is one field of the instruction context register. It contains the initial data length for the macroinstruction which is being executed in the Ebox. The data length is determined by the Ibox and passed to the Microsequencer in the instruction queue. The Microsequencer enters the DL into the instruction context register, along with other instruction context information. It is used by the Ebox as the default data length for each microword. Each microword specifies use of the data length in the DL or use of a data length of longword. The L field of the microword determines this. The operations affected by data length are:

• Calculation of the ALU condition codes. The four condition codes are determined according to the data length. (For example, the ALU<N> is bit <31>, <15>, or <7> for longword, word, or byte length operations, respectively.)
• Zero extending of E_BUS%WBUS_L<31:0> data. E_BUS%WBUS_L<31:0> data is zero extended from the specified data length to longword.
• The size of a memory operation initiated by this microword. This affects all memory operations except result stores to the current PA queue entry address.
(PA queue entries contain the data length used for the store operation.)
• Register File GPR Writes. GPR writes from E_BUS%WBUS_L<31:0> are gated by the data length such that only the bytes in that data length are affected by the write and others are unchanged. (Writes from the MD and IW ports to the GPRs are not affected by the DL.)

The DL field in the instruction context register can be modified by specifying DL.BYTE, DL.WORD, or DL.LONG in the MISC field of the microword. The effect is to set the DL to byte, word, or longword data length, respectively. The old DL value applies to operations in the current microword. The new DL value applies to the next microword. See Section 8.5.14.1 for more on the instruction context register.

8.5.10.7 Mask Processing Unit

The mask processing unit (MPU) holds and processes a 14-bit value. The value is loaded from E_BUS%BBUS_L<29:16> when the microword specifies LOAD.MPU.FROM.B in the MISC field. The MPU outputs a set of bits with which the microcode can carry out an eight-way branch. They are MPU0_6<2:0> and MPU7_13<2:0>. The purpose of this is to allow microcode to quickly process bit masks in macroinstruction execution microflows for CALLG, CALLS, RET, FFC, FFS, POPR, and PUSHR.

The load of the 14-bit value occurs in S4. The MPU evaluates the input, producing the values on MPU0_6<2:0> and MPU7_13<2:0> shown in the table below. MPU0_6<2:0> depends only on mask bits <6:0> and MPU7_13<2:0> depends only on mask bits <13:7>.

Table 8-8: MPU Calculation

MPU0_6<2:0> Truth Table
Mask<6:0>        MPU0_6<2:0>
X X X X X X 1    000
X X X X X 1 0    001
X X X X 1 0 0    010
X X X 1 0 0 0    011
X X 1 0 0 0 0    100
X 1 0 0 0 0 0    101
1 0 0 0 0 0 0    110
0 0 0 0 0 0 0    111

MPU7_13<2:0> Truth Table
Mask<13:7>       MPU7_13<2:0>
X X X X X X 1    000
X X X X X 1 0    001
X X X X 1 0 0    010
X X X 1 0 0 0    011
X X 1 0 0 0 0    100
X 1 0 0 0 0 0    101
1 0 0 0 0 0 0    110
0 0 0 0 0 0 0    111

All values shown in binary. X = don't care.

Microcode can branch on the MPU7_13<2:0> or MPU0_6<2:0> values after they are loaded. The initial processing is done by the end of the S4 cycle which loaded the MPU. When microcode does branch on one of these values, the least significant bit which is 1 in the current mask value in the MPU is reset to 0 automatically. This occurs in S3, so that the next microword can branch on the new value of the mask. (The MPU bit clear does not occur in a cycle in which there is an S3 stall.)

The MPU detects that the microword entering S3 specifies an eight-way branch on MPU7_13<2:0> or MPU0_6<2:0> by examining the E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> bits. If they specify an MPU branch, the appropriate bit is reset.

If a load of a new MPU mask value is simultaneous with a microcode MPU branch, the new data is loaded correctly without any side effect due to the branch. This occurs when a microword specifies LOAD.MPU.FROM.B and the immediately following microword does a branch on the previous mask value. The branch is an S3 operation of the second microword, while at the same time the load is an S4 operation of the first. (The branch outcome is guaranteed to reflect the MPU value before the load.)
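The MPU behavior described above amounts to a priority encoder over each 7-bit half of the mask, combined with an automatic clear of the bit that was just reported. The following is a minimal behavioral sketch, not a description of the actual logic. The names `Mpu`, `encode7`, and `walk_mask` are hypothetical. The sketch assumes the automatic clear applies to the lowest set bit within the 7-bit field being branched on; the text says only that "the appropriate bit" is reset, and the two readings agree when microcode processes bits <6:0> before bits <13:7>.

```python
def encode7(field7):
    """3-bit code per Table 8-8: index of the lowest set bit, or 7 if none."""
    for i in range(7):
        if field7 & (1 << i):
            return i
    return 7


class Mpu:
    def __init__(self):
        self.mask = 0

    def load(self, value):
        # LOAD.MPU.FROM.B: keep a 14-bit mask value.
        self.mask = value & 0x3FFF

    def branch(self, which):
        """Return the branch code, then clear the bit it reported."""
        shift = 0 if which == "MPU0_6" else 7
        code = encode7((self.mask >> shift) & 0x7F)
        if code != 7:
            # Assumption: the clear applies within the selected 7-bit field.
            self.mask &= ~(1 << (shift + code))
        return code


def walk_mask(mask):
    """List the set bit positions the way a microcode loop might visit them."""
    mpu = Mpu()
    mpu.load(mask)
    positions = []
    while (code := mpu.branch("MPU0_6")) != 7:
        positions.append(code)
    while (code := mpu.branch("MPU7_13")) != 7:
        positions.append(7 + code)
    return positions


print(walk_mask((1 << 0) | (1 << 2) | (1 << 7) | (1 << 13)))
```

Each call to `branch` stands for one microword performing an eight-way branch; a code of 7 (binary 111) means the selected half of the mask is empty.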
8.5.11 Branch Condition Evaluator

The branch condition evaluator uses the macroinstruction opcode, the ALU condition code bits, the PSL condition code bits, and E_SHF%SHF_RESULT_H<0> to evaluate the branch condition for all macroinstruction conditional branches. The evaluation is done in every cycle but is only used if the microword specifies SYNC.BDISP.TEST.PRED in the MRQ field. The result of the evaluation is compared with the Ibox prediction for the branch. The Ibox prediction is indicated in the current branch queue entry. If the Ibox prediction was not correct, the Ebox signals the Ibox and sends a branch misprediction trap request to the Microsequencer.

The branch condition evaluation is begun late in S4 and finished early in S5. All the information needed to perform the evaluation is gathered late in S4. The PSL condition code bits used in the comparison are bypassed; they are the bits which will be latched into the PSL at the end of S4. The ALU condition code bits used are generated late in S4 and are dependent on the data length for the instruction. The shifter result bit is also generated late in S4. The opcode is available early in S4 and is used to set up the evaluation.

In S5, the result of the branch condition evaluation is compared with the Ibox prediction, and E%BCOND_RETIRE_L is asserted to tell the Ibox that a branch queue entry for a conditional branch was removed from the branch queue. If the prediction was not correct, the Ebox also asserts E%BRANCH_MISPREDICT_L, which is received by the Ibox and Microsequencer. The Microsequencer forces a branch mispredict microtrap beginning in the next cycle when E%BRANCH_MISPREDICT_L is asserted.
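The evaluate-and-compare step described above can be sketched as follows. This is a simplified illustration, not the hardware implementation: it covers only a few of the PSL-based branch conditions from Table 8-9, models the two active-low signals as plain "asserted" booleans, and uses hypothetical names (`CondCodes`, `PSL_CONDITIONS`, `resolve_branch`).

```python
from dataclasses import dataclass


@dataclass
class CondCodes:
    n: bool = False
    z: bool = False
    v: bool = False
    c: bool = False


# A subset of Table 8-9, keyed by opcode (hex).
PSL_CONDITIONS = {
    0x12: lambda p: not p.z,           # BNEQ, BNEQU
    0x13: lambda p: p.z,               # BEQL, BEQLU
    0x14: lambda p: not (p.n or p.z),  # BGTR
    0x15: lambda p: p.n or p.z,        # BLEQ
    0x18: lambda p: not p.n,           # BGEQ
    0x19: lambda p: p.n,               # BLSS
    0x1A: lambda p: not (p.c or p.z),  # BGTRU
    0x1B: lambda p: p.c or p.z,        # BLEQU
}


def resolve_branch(opcode, psl, predicted_taken):
    """Compare the evaluated condition with the Ibox prediction."""
    taken = PSL_CONDITIONS[opcode](psl)
    return {
        "taken": taken,
        # Asserted for every resolved conditional branch.
        "bcond_retire": True,
        # Asserted only when the Ibox prediction was wrong.
        "branch_mispredict": taken != predicted_taken,
    }


# BEQL with PSL<Z> set, predicted not taken: a misprediction.
print(resolve_branch(0x13, CondCodes(z=True), predicted_taken=False))
```

In this model `bcond_retire` corresponds to E%BCOND_RETIRE_L and `branch_mispredict` to E%BRANCH_MISPREDICT_L.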
If E%BCOND_RETIRE_L is asserted and E%BRANCH_MISPREDICT_L is not, the Ibox releases the resource which is holding the alternate PC (the address which the branch should have gone to if the prediction was not correct). If E%BCOND_RETIRE_L and E%BRANCH_MISPREDICT_L are both asserted, the Ibox begins unwinding the RLOG and fetching instructions from the alternate PC. In this case, the microtrap in the Ebox will cause the Ebox and Fbox pipelines to be purged and the various Ibox-Ebox queues to be flushed. Also, E%FLUSH_MBOX_H is asserted, flushing Mbox processing of Ebox operand accesses other than writes. See Section 8.5.19 for more on Ebox handling of microtraps. See Chapter 9 for more on dispatching a microtrap. See Chapter 7 for more on activity surrounding branch misprediction.

The branch macroinstruction has entered S5 and is therefore retired even in the event of a misprediction. It is the macroinstructions following the branch in the pipeline which must be prevented from completing in the event of a misprediction trap.

The following shows all the cases the branch condition evaluator handles. The macroinstruction opcode and mnemonic are given along with the boolean equation used to determine if the branch is taken.
Table 8-9: Branch Condition Evaluation

Instruction     Opcode   Branch Taken Condition
BNEQ, BNEQU     12       NOT PSL<Z>
BEQL, BEQLU     13       PSL<Z>
BGTR            14       NOT (PSL<N> OR PSL<Z>)
BLEQ            15       PSL<N> OR PSL<Z>
BGEQ            18       NOT PSL<N>
BLSS            19       PSL<N>
BGTRU           1A       NOT (PSL<C> OR PSL<Z>)
BLEQU           1B       PSL<C> OR PSL<Z>
BVC             1C       NOT PSL<V>
BVS             1D       PSL<V>
BGEQU, BCC      1E       NOT PSL<C>
BLSSU, BCS      1F       PSL<C>
SOBGEQ          F4       NOT ALU<N>
SOBGTR          F5       NOT (ALU<N> OR ALU<Z>)
AOBLSS          F2       ALU<N> XOR ALU<V>
AOBLEQ          F3       (ALU<N> XOR ALU<V>) OR ALU<Z>
ACBB            9D       (ALU<N> XOR ALU<V>) OR ALU<Z>
ACBW            3D       (ALU<N> XOR ALU<V>) OR ALU<Z>
ACBL            F1       (ALU<N> XOR ALU<V>) OR ALU<Z>
BBS             E0       E_SHF%SHF_RESULT_H<0>
BBC             E1       NOT E_SHF%SHF_RESULT_H<0>
BBSS            E2       E_SHF%SHF_RESULT_H<0>
BBCS            E3       NOT E_SHF%SHF_RESULT_H<0>
BBSC            E4       E_SHF%SHF_RESULT_H<0>
BBCC            E5       NOT E_SHF%SHF_RESULT_H<0>
BBSSI           E6       E_SHF%SHF_RESULT_H<0>
BBCCI           E7       NOT E_SHF%SHF_RESULT_H<0>
BLBS            E8       E_SHF%SHF_RESULT_H<0>
BLBC            E9       NOT E_SHF%SHF_RESULT_H<0>

8.5.12 Miscellaneous Ebox Operand Sources

Generally Ebox operand sources are registers in the register file or other registers. Certain sources are read type accesses to Ebox states, special results calculated automatically, or access to a data path not normally used as an operand source. In some cases data which can be accessed in another way is arranged in a special format as a source.
8.5.12.1 S+PSW_EX

The S+PSW_EX E_BUS%ABUS_L<31:0> source is simply a bit from the macroinstruction opcode and several bits from the PSL. It saves microcode steps in the CALLS and CALLG macroinstructions. Figure 8-4 shows the format of this longword source. Bit <29> comes from the instruction context register (OPCODE<0>). Bits <7:5> come from the PSL register.

Figure 8-4: S+PSW Format

 31 30 | 29        | 28 ... 08 | 07 06 05 | 04 ... 00
 0  0  | OPCODE<0> | 0 ... 0   | PSL<7:5> | 0 ... 0

8.5.12.2 Population Counter

The Population Counter is an Ebox function unit which calculates four times the number of ones in E_BUS%ABUS_L<13:0> every cycle. Its result is available as an E_BUS%ABUS_L<31:0> source to the following microword. It saves microcode steps in the CALLS, CALLG, POPR, and PUSHR macroinstructions.

The Population Counter calculates a result in the range 0 to 14*4 equal to four times the number of ones in E_BUS%ABUS_L<13:0> early in S4. If microword N steers data to E_BUS%ABUS_L<31:0>, microword N+1 can access the Population Counter result for that data by specifying POP.COUNT in the A field. If microword N+1 is stalled in S3, Ebox control logic holds the Population Counter result until the stall ends. The effect is the same as if no stall had occurred.

The Population Counter's result is used to calculate the extent of the stack frame which will be written by the macroinstruction. The two ends of the stack frame are checked for memory management purposes before any writes are done.
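The Population Counter calculation is small enough to state directly. The sketch below is illustrative only; the function name `pop_count_source` is hypothetical, and the model ignores the one-microword latency and the stall handling described above.

```python
def pop_count_source(abus):
    """Four times the number of ones in bits <13:0> of a 32-bit A-bus value."""
    return 4 * bin(abus & 0x3FFF).count("1")


# A mask with bits 0 and 2 set; bits 14 and 15 are outside <13:0> and ignored.
print(pop_count_source(0xC005))
# All fourteen mask bits set gives the maximum result, 14*4.
print(pop_count_source(0x3FFF))
```

Since each register saved on the stack occupies one longword (four bytes), the result can serve directly as a byte count when sizing the stack frame.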
8.5.12.3 RN.MODE.OPCODE

RN.MODE.OPCODE is a longword composite source used when the microcode needs to access one of these data items. The four data fields in this register are RN<3:0>, CUR_MOD<1:0>, OPCODE<7:0>, and the VAX_RESTART_BIT. Figure 8-5 shows the position of these fields in the longword. This longword is one of the possible E_BUS%BBUS_L<31:0> sources. It is read in S3.

The RN<3:0> field is really a special data path. Its value is the GPR number in the current source queue entry. The following restrictions apply: the A field of the microword must specify S1 (the current source queue output), and the microcode must know from context that the source queue entry points to a GPR. If these restrictions are not met, the value returned in the RN field is UNPREDICTABLE.

Figure 8-5: RN.MODE.OPCODE E_BUS%BBUS_L<31:0> Source

 31:28 | 27:26 | 25:24   | 23:16  | 15:08   | 07              | 06:00
 RN    | 0 0   | CUR_MOD | OPCODE | 0 ... 0 | VAX_RESTART_BIT | 0 ... 0

The CUR_MOD<1:0> field is simply the access mode of the current process; it is taken directly from PSL<25:24>.

The OPCODE<7:0> field is the opcode from the most recent macroinstruction execution dispatch. It is taken from the instruction context register in S3. This instruction context register field has 9 bits. The 9th bit indicates the first byte of the opcode was FD#16. The opcode portion of the RN.MODE.OPCODE source does not include the 9th bit.

The VAX_RESTART_BIT field is the VAX Restart Bit which indicates that the most recently dispatched macroinstruction execution microflow has not altered a GPR or initiated a memory write operation of some kind.
It is used to indicate to the operating system that a macroinstruction which encountered some error hasn't modified any architectural state. See Section 8.5.13 for more detail.

8.5.12.4 PMFCNT Register

The PMFCNT register, which is part of the performance monitoring facility, is available as an E_BUS%ABUS_L<31:0> source. See Chapter 18 and Figure 18-4.

8.5.13 VAX Restart Bit

The VAX Restart Bit is used to keep track of whether the currently executing macroinstruction has altered any architecturally visible state. It is only used by macrocode handling machine check exceptions. Conceptually, the Ebox hardware resets this bit anytime a GPR is altered or a memory write or store is initiated and sets it anytime a new macroinstruction begins. Often there is more than one macroinstruction in the NVAX pipeline, making maintenance of the VAX Restart Bit somewhat tricky.

As is described in Section 8.5.19, microtraps for faults are always taken at the end of S4, before the microword can advance to S5. The VAX Restart Bit is set or reset only when operations advance to S5 and there is no pipeline abort in that cycle.

The VAX Restart Bit is reset each time a microword which alters a GPR or specifies any memory write is advanced into S5. The bit is reset in S5 when a read is sent to the Mbox and the read data is to be returned to a GPR, since that event actually writes the data on E_BUS%WBUS_L<31:0> into the specified GPR. The memory operations specified in the MRQ field which cause the VAX Restart Bit to be reset are:

• WRITE.V.WCHK and
• WRITE.V.UNLOCK.

In addition, all microwords specifying DST/DST reset the VAX Restart Bit since destination queue indirect stores are either memory stores or GPR writes.
The VAX Restart Bit is set each time a microword which causes dispatch to an execution microflow is advanced into S5, or when microcode handles a trap exception by retiring the current instruction and dispatching to the exception handler in microcode. Specifically, it is set when MISC/RETIRE.INSTRUCTION or SEQ.MUX/LAST.CYCLE is specified by the microword in S5. The set always overrides the reset when both conditions exist in the same cycle. So the bit is reset when a microword which alters a GPR or writes or stores to memory is in S5 and that microword does not specify MISC/RETIRE.INSTRUCTION or SEQ.MUX/LAST.CYCLE.

When the Fbox is retiring results, the VAX Restart Bit is maintained properly. It is reset if the Fbox stores a result in memory or the register file (that is, it is reset on any destination queue indirect store from the Fbox). It is set when the Fbox asserts F%RETIRE_H, retiring the current Fbox instruction. (Note that this is not the cycle in which the microword which initiated the Fbox instruction is in S4; this is the cycle in which the Fbox sends the result of the operation to the Ebox.) As with Ebox retires, the set overrides the reset.

The VAX Restart Bit doesn't detect all changes to architecturally visible states. Microcode takes explicit action when it is about to alter some architecturally visible state other than memory or a GPR. It can, for example, copy a GPR to itself before changing the other state in question.

The VAX Restart Bit is read out in S3 but is maintained in S5. The value of this bit isn't useful if the pipeline is executing macroinstructions normally. It is useful only when a machine check exception has been detected. Since the VAX Restart Bit is updated in mid S5, it won't report a memory or GPR write until the second microword after the one which does the write.

The VAX Restart Bit is read through the RN.MODE.OPCODE E_BUS%BBUS_L<31:0> source. See Section 8.5.12.3.
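The set and reset rules for the VAX Restart Bit can be condensed into a single next-state function. The sketch below is a simplified illustration with hypothetical names; it models only the per-cycle S5 decision and leaves out the Fbox path and the read-out latency described above.

```python
def next_restart_bit(current, *, alters_gpr_or_memory, retires,
                     pipeline_abort=False):
    """Next value of the VAX Restart Bit for one microword reaching S5.

    alters_gpr_or_memory: the microword alters a GPR or writes/stores memory.
    retires: the microword specifies MISC/RETIRE.INSTRUCTION or
             SEQ.MUX/LAST.CYCLE.
    """
    if pipeline_abort:
        # The bit changes only when there is no pipeline abort in the cycle.
        return current
    if retires:
        # The set always overrides the reset in the same cycle.
        return True
    if alters_gpr_or_memory:
        return False
    return current


bit = True
bit = next_restart_bit(bit, alters_gpr_or_memory=True, retires=False)
print(bit)   # the instruction has now changed architectural state
bit = next_restart_bit(bit, alters_gpr_or_memory=True, retires=True)
print(bit)   # a write in the retiring microword still leaves the bit set
```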
8.5.14 Ebox-Microsequencer Interface

The Ebox receives the data path control part of the microword and the macroinstruction context information from the Microsequencer at the beginning of S3. It also receives a few signals indicating the circumstances accompanying the fetch of the microword. The Ebox sends many states which are needed for conditional branches to the Microsequencer from various points in the Ebox pipeline. The Microsequencer uses these states for conditional branch calculation.

8.5.14.1 Instruction Context Register

The Microsequencer latches macroinstruction information at the beginning of each macroinstruction execution microflow, including FPD microflows. This information was originally created in the Ibox and entered in the instruction queue. At some point the Microsequencer extracted that information along with a control store dispatch address. The Microsequencer pipelines this information so that it becomes visible to the Ebox at the same time as the microword from the dispatch address is clocked into the MIB Latch. The Microsequencer holds this data until the next time the first microword of a macroinstruction enters S3. See Section 9.2.3.3.4 and Section 9.2.3.3.4.1.

Except for the DL data, the Ebox simply carries the instruction context data down the pipeline. In the Ebox, the DL register is loaded with the DL data when the first microword of a macroinstruction is in S3. This latch can be altered under microcode control. See Section 8.5.10.6.

The information passed by the Microsequencer to the Ebox is made up of the following fields:

• Macroinstruction Opcode; Instruction Context<OPCODE> = Instruction Context<12:4>
  The ninth bit indicates FD#16 was the first opcode byte. This data is carried down the Ebox pipeline. It is used in S3 as a source of data and for microcode conditional branches.
  In S4/S5 it is used in the conditional branch evaluator.
• Data Length; Instruction Context<DL> = Instruction Context<3:2>
  The Ebox holds this initial instruction data length in the DL register.
• Fbox Instruction Flag; Instruction Context<FI> = Instruction Context<1>
  This bit is asserted if the opcode is for any macroinstruction which is normally executed in the Fbox. The Ebox enters it in the retire queue and uses it to force a reserved opcode fault for Fbox instructions when the Fbox is disabled.

The Microsequencer signals that a new microflow begins with the accompanying microword and macroinstruction context information. If the new microflow is due to a macroinstruction, the Ebox latches the DL<1:0> data. The DL value can be altered by microcode, so a special latch is implemented in S3 for it. The opcode is simply carried along the pipeline. It remains latched in the Microsequencer until the next new macroinstruction flow is dispatched, so it is not latched explicitly in the Ebox. This instruction context information is available to any microword in the associated macroinstruction's execution microflow. The floating point instruction flag is also entered in the retire queue when a new microflow is for a macroinstruction. For more detail on the retire queue see Section 8.5.15.7.

The macroinstruction context information is carried down the pipeline with each microword. The context information stalls when the microword stalls. The opcode is used in S4 and S5 to determine conditional branch results. The DL is used to control the ALU in S4, the size for any memory request in S4, E_BUS%WBUS_L<31:0> zero extension in S5, and GPR byte write-enables in S5. The floating point instruction flag is used in S3 to determine how to handle source operand faults.

The DL register can be altered by microcode. This occurs when the microword specifying the change is in S4.
If new instruction context information enters S3 at the same time as a microword specified DL alteration occurs, the instruction context load overrides the microword specified alteration. This is because the instruction context load is for the microword subsequent to the microword specifying the DL alteration.

8.5.14.2 Microtest Fields

The Ebox provides most of the information used by the Microsequencer for microcode branches. The condition bits are driven onto the microtest bus when the Microsequencer requests it by driving the select code on E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0>. The condition data is driven early in the cycle after it is computed. The following table shows the information the Ebox can supply. It gives the source and pipeline segment in which the data is driven.

This condition information is tested in S3, as specified by the SEQ.COND field in the Microsequencer control part of the microword. The S3 operation determines the address of the next microword. So data delivered by the Ebox when microword N is in S3 is used by microword N+1 to select microword N+2. If the data is driven while microword N is in S4 or S5, one or two more cycles of microbranch latency are required, respectively.

Table 8-10: Ebox Sourced Microbranch Conditions

Source                                       Pipeline Stage (condition bit driven at end of stage)
ALU<Z>, ALU<C>, ALU<V>, ALU<N>               S4
SHF<N>, SHF<Z>                               S4
E_BUS%ABUS_L<81.1&1""1I>                     S3
E_BUS%ABUS_L<D> OR E_BUS%ABUS_L<U>           S3
E_BUS%BBUS_L<Sa3.IIO>                        S3
E_BUS%BBUS_L<2aO> EQ 0                       S3
E_BUS%BBUS_L<U18> NEQ 0                      S3
MPU0_6<2:0>, MPU7_13<2:0>                    S4
State Flags 0-5                              S3
Opcode<2:0>                                  S2 (1)
PSL<29~26:22>                                S5
VECTOR_PRESENT                               always stable; configuration status bit (not used by NVAX microcode, see Section 8.5.18)
FBOX_ENABLE                                  always stable; configuration status bit
Field queue status - valid, and reLmode      always accessible; Ibox-Ebox queue
Fbox fault code (see Section 8.5.19.7)       effectively always stable; not valid except in microtrap for Fbox faults

(1) Bypass or flow-thru design required so the first microword of a macroinstruction execution flow can specify a conditional branch on its macroinstruction opcode.

See Chapter 9 for more on microbranches.

8.5.14.3 Miscellaneous Microsequencer Signals

The Microsequencer provides the Ebox with several control signals. They signal certain Microsequencer events which have Ebox side effects.

The Microsequencer signals E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> are used in early S3 by the Ebox to detect that one of the MPU conditional branches (MPU0_6 or MPU7_13) is decoded from the Microsequencer control part of the microword. The Ebox clears the appropriate bit in the mask stored in the MPU by the end of S3. See Section 8.5.10.7 for more detail.

The Microsequencer signals E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> are used early in S3 by the Ebox to detect that the field queue status conditional branch is decoded from the Microsequencer control part of the microword. The Ebox retires an entry from the field queue if the entry was valid at the time the branch was evaluated. See Section 8.5.15.8 for more detail.

NOTE
E_USQ%UTSEL_H<4:0> and E_USQ%UTSEL_L<4:0> are derived almost directly from the SEQ.COND field of the Microsequencer control part of the microword. See Chapter 9.

The Microsequencer asserts E_USQ%MACRO_1ST_CYCLE_H when the microword in S3 is the first microword of a macroinstruction execution microflow (including the microflow at the FPD dispatch).
The Ebox sets all the Wn register valid bits and resets state flags 0-3 as a result of this signal. Both effects occur in S3. It also copies PSL<T> into PSL<TP> once the microword reaches S5. Also, the Ebox latches the new instruction context DL value at the beginning of S3.

The Microsequencer asserts E_USQ%PE_ABORT_L when a microtrap is initiated. In this cycle all the control latches in the Ebox pipeline are flushed. Also, the Ebox flushes the retire queue.

The Microsequencer asserts E_USQ%IQ_STALL_H when the microword in S2 is the STALL microword (see Section 8.5.20.1). This status is carried down the Ebox pipeline along with the microword. The status is asserted (and the microword is the STALL microword) only when the Microsequencer required an instruction queue entry but no entry was valid. When this status is true, and the Ibox is asserting one of its memory error signals, the Ebox assumes a memory error in fetching the opcode byte(s) occurred. This is piped forward to S3 and then treated like any other S3 detected fault. A microtrap is forced when the condition is clocked into S4. See Section 8.5.19. The STALL microword status is also used by the Ebox S3 stall timeout logic (see Section 8.5.25.1).

Two fields from the Microsequencer control portion of the microword are decoded by the Ebox. These fields are SEQ.MUX and SEQ.FMT. The Ebox determines when these fields decode to the operation LAST.CYCLE or LAST.CYCLE.OVERFLOW. See Chapter 9 for more on the format of the Microsequencer control portion of the microword. The decoded status is carried down the Ebox pipeline with the other decodes of the microword.

When a microword specifying SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW is advanced into S5, the Ebox signals the Ibox that a macroinstruction is retiring (except if the microword specifies DISABLE.RETIRE/YES). See Section 8.5.15.9 for more detail.
When a microword specifying SEQ.MUX/LAST.CYCLE.OVERFLOW is advanced into S5, and the PSL<IV> and PSL<V> bits are both set, the Ebox signals the Microsequencer that an integer overflow microtrap should occur.

8.5.14.4 Miscellaneous Ebox-to-Microsequencer Signals

The Ebox sends the Microsequencer several PSL bits which affect new microflow dispatching (dispatching in response to SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW). They are PSL<T, TP, and FPD>. When the Microsequencer next decodes a SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW operation, if PSL<FPD> or PSL<TP> is set, it dispatches to special microflows (a different microflow for FPD than for TP) instead of the next macroinstruction execution microflow. If it dispatches for FPD (first part done), the Microsequencer removes an entry from the instruction queue and sends the instruction context information to the Ebox. For TP (trace fault) dispatches, the instruction queue is not referenced and the instruction context register is not loaded.

When PSL<T> is set at instruction dispatch time (including dispatching for FPD), the Microsequencer sets a local copy of the PSL<TP> bit, called LOCAL_TP (see Section 9.2.3.3.2). If LOCAL_TP or PSL<TP> is set at the time of a dispatch for a macroinstruction, the instruction queue reference does not occur and a trace fault dispatch occurs instead. This could happen on the very next cycle after the macroinstruction dispatch with PSL<T> set and PSL<TP> not set. The Microsequencer sets LOCAL_TP during the first dispatch cycle so that it can affect the immediately subsequent dispatch.

The Ebox asserts the signal E_PSL%PSL_IS_DST_S5_H in S5 of any cycle in which the entire PSL is being updated (i.e., if only the low byte of the PSL is updated, E_PSL%PSL_IS_DST_S5_H is not asserted). The Microsequencer clears LOCAL_TP when this signal is asserted.
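The interaction of PSL<T>, PSL<TP>, PSL<FPD>, and LOCAL_TP at dispatch time can be sketched as a small state machine. This is an illustrative model only, with hypothetical class and method names. One point is an assumption rather than something this section states: when a trace fault condition and PSL<FPD> are both present, the sketch gives the trace fault dispatch priority.

```python
class DispatchModel:
    """Toy model of new-microflow dispatch selection in the Microsequencer."""

    def __init__(self):
        self.local_tp = False

    def dispatch(self, *, psl_t, psl_tp, psl_fpd):
        """Return which kind of microflow a LAST.CYCLE dispatch selects."""
        if psl_tp or self.local_tp:
            # Trace fault dispatch: the instruction queue is not referenced.
            # Assumption: this check takes priority over FPD.
            return "trace"
        kind = "fpd" if psl_fpd else "macro"
        if psl_t:
            # Set during this dispatch so that it can affect the
            # immediately subsequent dispatch.
            self.local_tp = True
        return kind

    def full_psl_write(self):
        # Models E_PSL%PSL_IS_DST_S5_H: the entire PSL is being updated.
        self.local_tp = False


m = DispatchModel()
print(m.dispatch(psl_t=True, psl_tp=False, psl_fpd=False))  # macro
print(m.dispatch(psl_t=True, psl_tp=False, psl_fpd=False))  # trace
m.full_psl_write()
print(m.dispatch(psl_t=False, psl_tp=False, psl_fpd=True))  # fpd
```

The second dispatch selects the trace fault microflow even though PSL<TP> has not yet been set, which is the case LOCAL_TP exists to cover.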
Note that the Microsequencer will initiate a trace fault dispatch if the PSL<TP> bit is set or LOCAL_TP is set, or both. So if a new PSL with PSL<TP> set is loaded, the trace fault dispatch will occur at the correct point.

NOTE
There is a microcode restriction which disallows specifying SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW in the two microwords following one which loads the PSL. An exception to this rule is made when none of the PSL bits which affect new microflow dispatching will be changed. Some microflows know from context that none of these bits will change in a given PSL write (for example, in the execution microflow for the CALL macroinstruction, several bits in the low byte of PSL are cleared, but <T, TP, and FPD> are unaffected).

8.5.15 Ebox-Ibox Interface

The Ibox to Ebox interface is made up of a number of FIFO queues which carry operand information to the Ebox. These are the source queue, destination queue, field queue, and branch queue, which carry source operand information, destination operand information, type information for bit field operands, and branch related information, respectively. These queues are part of the Ebox.

The Ibox generally processes instructions ahead of the Ebox. As it processes operand specifiers it adds entries to one or more of the queues. Each specific macroinstruction execution microflow always removes the same number of entries from each queue as the Ibox adds (unless an exception occurs). With this buffering, the Ibox and Ebox operate independently enough that stalls or latencies in one box don't necessarily cause a stall in the other, resulting in greater overall execution speed. See Chapter 7 for more detail on many of the topics in this section.

The Ebox maintains macroinstruction ordering information in the retire queue. This FIFO is not part of the Ibox to Ebox interface, but is closely related. The Ebox is both the supplier and the consumer of retire queue entries.
In any of the queues described in this section an entry which hasn't been added is said to be invalid. Except in the case of the field queue, a stall (S3 for source queue, S4 for destination queue and branch queue) results when the microword references a queue entry which isn't valid. This stall ends when the Ibox adds enough entries to fulfill the microword's request.

In any of the queues described here, adding an entry means writing an entry, and moving the write pointer to the next entry in the queue. Accessing or referencing an entry means reading an entry, and moving the read pointer to the next entry in the queue. Where it is needed, status information concerning the number of valid entries in a queue is generated by examining the read and write pointers of that queue.

8.5.15.1 Ibox Counters

The Ibox has three counters which prevent queue overrun. Two counters are used to keep track of the number of entries in the source and destination queues, one for the source queue (allowing 12 entries) and one for the destination queue (allowing 6 entries). The Ibox increments these counters when it adds entries. The Ebox notifies the Ibox when it retires entries from the source or destination queue, and the Ibox decrements the counters in response.

Another counter in the Ibox keeps track of the number of macroinstructions which have been sent to the Ebox but have not been retired. This limits the number of entries in the retire queue, branch queue and field queue because there can be no more than one entry in each of these queues for any given macroinstruction. The counter allows up to 6 instructions in the Ebox/Fbox at a time. The Ibox increments this counter when it adds an entry to the instruction queue. When the Ebox signals the Ibox that a macroinstruction is retiring, the Ibox decrements the counter.
This happens in S5 of the Ebox pipeline, one or two stages after the stage in which entries are removed from these queues. Note that this same mechanism limits the number of instruction queue entries to 6.

NOTE
The limit of one field queue entry per macroinstruction is simply an NVAX convention. The VAX Architecture does not include instructions which have more than one bit field base address operand specifier, but NVAX defines other operands as field type where it simplifies the implementation.

The Ibox also has a counter to keep track of the number of available MD registers. It increments this counter when it allocates an MD to hold operand data (e.g., when it initiates a read of operand data from memory to an MD). When the Ebox retires a source queue entry, it tells the Ibox whether the entry pointed to an MD. The Ibox decrements the counter when the Ebox retires a source queue entry which pointed to an MD. It is possible for the Ebox to retire two source queue entries in one cycle, and the Ibox decrements the counter by two when both source queue entries pointed to MDs.

8.5.15.2 Source Queue

The source queue carries source operand information. The information is either literal mode data (6 bits) or a pointer into the register file. If it is a register file pointer, it points either to a GPR or to an MD register.

The Ebox accesses one or two source queue entries per cycle in S3. Source queue accesses always cause data to be sourced to E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. Literal mode data is zero extended and driven directly onto the specified bus. Otherwise the contents of the location in the register file pointed to by the source queue entry are fetched. If the register which is accessed is not valid or is marked for writing by the Fbox, then the appropriate S3 stall occurs.

Figure 8-6 shows a source queue entry. The VALUE field is either a register file address or a 6-bit literal data value.
If it is a register file address, it points to either a GPR or an MD register. SH_LIT indicates whether VALUE is short literal data (if SH_LIT is 1, VALUE is short literal data).

Source queue entries are made for read, modify, address, and field operands. Both a source queue entry and a destination queue entry are made for each modify operand.

Figure 8-6: A Source Queue Entry

 06 05 04 03 02 01 00
+--+--+--+--+--+--+--+
|  |      VALUE      |
+--+--+--+--+--+--+--+
  |
  +--- SH_LIT

Field operands in NVAX are classified into read and modify types. Read and modify field operands both result in a source queue entry. Modify field operands also result in a destination queue entry if the operand specifier is register mode.

Two source queue entries are made for quadword length operands. If they are for registers, they point to registers N and N+1. If they are memory operands, they point to MD registers which will receive data from memory addresses A and A+4. For literal mode, the first value is the immediate data, and the second is 0.

Source queue access fulfills a necessary synchronization function. When microcode successfully accesses a source queue entry it knows that the Ibox was able to fetch the associated operand specifier. It also knows that there is no access violation or invalid translation condition associated with the operand. For modify type operands it also knows that the location will not give an access violation when written. Microcode for complex macroinstructions always references all source operands which might cause a memory management fault before altering any architecturally visible state.

The number of entries in the source queue is 12.

8.5.15.3 Destination Queue

The destination queue carries destination operand information.
The information is either the register file address of a GPR or a status indicating a memory write to the address in the PA queue in the Mbox.

The destination queue is accessed in S4 (no more than one entry per cycle is used). Its information is used to decide how to write the result which is being calculated by the ALU, shifter, or Fbox in the same cycle. If the destination queue entry indicates a memory store, the request is sent to the Mbox. An S4 stall occurs if the Mbox is already busy or the PA queue entry is not ready. If the destination queue entry indicates a GPR write, the register file will be written using the address from the destination queue. The GPR write occurs in the next cycle (S5).

Figure 8-7 shows a destination queue entry. The VALUE field is either a register file address or is unused. If it is a register file address, it points to a GPR. MDEST indicates whether the destination of the data is memory. If MDEST is 0, the result is destined for the register file and the VALUE field indicates the destination address. If MDEST is 1, the destination of the data is memory and the VALUE field is unused.

Destination queue entries are made for modify and write access type operands. Also, modify field operands result in a destination queue entry if the operand specifier is register mode.

Figure 8-7: A Destination Queue Entry

 04 03 02 01 00
+--+--+--+--+--+
|  |   VALUE   |
+--+--+--+--+--+
  |
  +--- MDEST

Two destination queue entries are made for quadword length operands. If they are for registers, they point to registers N and N+1. For memory operands they point to addresses A and A+4.

Destination queue access fulfills a necessary synchronization function.
If the destination queue entry is accessed and used successfully, microcode knows that the destination operand specifier was fetched successfully and that there will be no access violation when the destination location (if it is in memory) is written. In the case of quadword data length, successful use of the first destination queue entry guarantees that the second write will not incur a memory management exception either.

The destination queue contains the Fbox destination scoreboard function. See Section 8.5.16.4 for more information.

The number of entries in the destination queue is 6.

8.5.15.4 Miscellaneous Queue Retire Information

When an entry is retired from the source or destination queues, certain information is sent back to the Ibox. The Ibox uses this information to maintain three counter values and to maintain GPR scoreboard information in the scoreboard unit (SBU).

Zero or one destination queue entry can be retired in a given cycle. The retire information sent to the Ibox for the destination queue is:

• whether an entry is being retired,
• whether the entry being retired indicates a GPR write or a memory write, and
• the GPR number if it is a GPR write.

The Ebox signals the Ibox when a destination queue retire occurs early in the cycle in which the operation is advanced into S5.

Zero, one, or two source queue entries can be retired in a given cycle. Similar information is sent for each of the two source queue read ports. The retire information sent to the Ibox for each source queue read port is:

• whether an entry is being retired,
• whether the entry being retired indicates a GPR read, an MD read, or is short literal data, and
• the GPR number if it is a GPR read.

The Ebox signals the Ibox when one or two source queue retires occur. It does this early in the cycle in which the microword retiring the source queue entries is advanced into S4.
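The counter bookkeeping described in Sections 8.5.15.1 and 8.5.15.4 can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class and method names (IboxCounters, add_operand, retire_sources, and so on) are hypothetical and do not appear in this specification, the MD counter is modeled simply as a count of allocated MD registers, and no cycle timing is modeled. The limits of 12, 6, and 6 are the values given in the text.

```python
from dataclasses import dataclass

SOURCE_QUEUE_ENTRIES = 12  # source queue depth (Section 8.5.15.2)
DEST_QUEUE_ENTRIES = 6     # destination queue depth (Section 8.5.15.3)
MAX_INSTRUCTIONS = 6       # instructions allowed in the Ebox/Fbox at a time


@dataclass
class IboxCounters:
    """Hypothetical model of the Ibox counters that prevent queue overrun."""
    source: int = 0
    dest: int = 0
    instructions: int = 0
    md_allocated: int = 0  # assumption: counts MDs currently holding operand data

    def can_add(self, n_source: int = 0, n_dest: int = 0) -> bool:
        # The Ibox may only add entries that still fit in both queues.
        return (self.source + n_source <= SOURCE_QUEUE_ENTRIES
                and self.dest + n_dest <= DEST_QUEUE_ENTRIES)

    def add_operand(self, n_source: int = 0, n_dest: int = 0, n_md: int = 0) -> None:
        # Ibox side: counters are incremented when entries are added.
        if not self.can_add(n_source, n_dest):
            raise RuntimeError("Ibox must wait: a queue would overrun")
        self.source += n_source
        self.dest += n_dest
        self.md_allocated += n_md

    def issue_instruction(self) -> None:
        if self.instructions >= MAX_INSTRUCTIONS:
            raise RuntimeError("Ibox must wait: too many unretired instructions")
        self.instructions += 1

    def retire_sources(self, kinds) -> None:
        # Ebox retire information: up to two entries per cycle, each one
        # reported as "gpr", "md", or "literal".
        if len(kinds) > 2:
            raise ValueError("at most two source queue entries retire per cycle")
        self.source -= len(kinds)
        self.md_allocated -= sum(1 for kind in kinds if kind == "md")

    def retire_dest(self) -> None:
        # At most one destination queue entry retires per cycle.
        self.dest -= 1

    def retire_instruction(self) -> None:
        self.instructions -= 1
```

In this model a quadword memory source operand would be recorded with add_operand(n_source=2, n_md=2), and the later retire of both entries in one cycle with retire_sources(["md", "md"]).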
8.5.15.5 Branch Queue

The branch queue carries information for conditional and unconditional branches. The information is a one-bit prediction status. The prediction status is only used by conditional branches. It indicates which way the Ibox predicted the conditional branch would go.

The Ebox references the branch queue for two reasons: to synchronize with the Ibox fetch of the branch displacement and to compare the Ibox branch prediction to the actual branch result.

The Ebox accesses the branch queue in S4 when the microword specifies SYNC.BDISP, SYNC.BDISP.RETIRE, or SYNC.BDISP.TEST.PRED in the MRQ field. SYNC.BDISP.RETIRE is used in unconditional branches. SYNC.BDISP.TEST.PRED is used in all conditional branches. SYNC.BDISP is used in some complex conditional branches.

The Ibox doesn't add an entry to the branch queue until it has successfully fetched the displacement. When the Ebox accesses the branch queue, it will stall until there is an entry. This stall occurs in S4 and prevents the branch macroinstruction from retiring before the displacement has been successfully fetched.

For conditional branches, the Ebox waits for the Ibox to add the entry to the branch queue and then compares the Ibox prediction to the actual result of the branch which is calculated in the Ebox. If the branch was mispredicted, the Ebox initiates a microtrap in S5. Because the microtrap is in S5, the branch macroinstruction retires but subsequent macroinstructions are prevented from completing.

In some complex conditional branches, the Ebox microcode waits for the branch queue entry to become valid before it stores a result calculated by the instruction. This allows the microcode to be sure the branch displacement was fetched without a memory management fault or hardware error before modifying state.
The microcode may have to delay retiring the branch queue entry and checking the branch prediction. So SYNC.BDISP accesses the branch queue, and causes an S4 stall if the entry is not valid, but does not cause the entry to be retired.

The Ebox signals the Ibox whenever a microword which retires a conditional branch queue entry advances into S5 (that is, a microword specifying SYNC.BDISP.TEST.PRED). This causes the Ibox to release the alternate branch path PC (the PC of the path not taken by the Ibox prediction). The Ebox signals a mispredicted branch at the same time, if there is one. If there is a mispredicted branch, the Ibox responds by unwinding the RLOG and resuming macroinstruction fetching at the alternate PC address.

Due to complexity in the branch queue bypass logic, it may happen that one cycle of "unnecessary" stall occurs in cases where back-to-back branches are executed. The extra cycle of stall happens only if the two branches are in adjacent stages of the Ebox pipeline and the Ibox writes the second branch queue entry one cycle before the second branch is in S4, ready to retire (i.e., it wouldn't be stalled except for the branch queue stall). In this case the branch queue read pointer is being advanced and another branch queue entry is being written. Bypass is not implemented for the second branch in this specific case.

The number of entries in the branch queue is 6.

8.5.15.6 Operand and Branch Buses

The transmission of operand information for the source queue, destination queue, and field queue occurs via the operand bus. This bus is described in Chapter 7. It carries all the information which might be entered into any of these queues, and it has valid bits which tell the Ebox when to add entries.

The operand bus carries information derived from decoding a single operand specifier.
Zero, one, or two source and/or destination queue entries are specified, and zero or one field queue entry. Only when the operand is quadword length can more than one source or destination queue entry be made. Whether a source or destination queue entry is made depends on whether the operand is read, write, or modify access type. (Note that the access type referred to here might not be identical to the true access type given in the VAX Architecture Standard, for various reasons.)

A field queue entry is made for each field operand. The Ibox instruction decode logic determines if a particular operand is a field operand. Only certain macroinstructions have a field operand, and no macroinstruction has more than one field operand.

The branch queue receives its information via the branch bus. This bus has one bit of data (a prediction status) and a valid bit. A branch queue entry containing the prediction status is added in every cycle in which the valid bit is asserted. See Chapter 7 for more information.

8.5.15.7 Retire Queue

The retire queue is used by the Ebox to force macroinstructions to retire in order. It contains one bit of information, a status indicating whether the Ebox or Fbox is the source of the next macroinstruction to retire.

The Ebox adds an entry to the retire queue in S3 each time a new macroinstruction execution microflow begins. (If there is an S3 stall, the entry is added to the retire queue in the first cycle of the stall. Exactly one entry is made whether or not an S3 stall occurs for one or more cycles.) The retire queue entry is the FI bit from the instruction context register (see Section 8.5.14.1). However, if the FBOX_ENABLE bit in ECR (IPR 125) is not set, the retire queue entry is forced to indicate Ebox retire regardless of the FI bit. Similarly, if PSL<FPD> (PSL<27>) is set, the retire queue entry is forced to indicate Ebox retire regardless of the FI bit.
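The rule for forming a retire queue entry can be written as a short Python sketch. It is illustrative only: the function name and the EBOX/FBOX constants are hypothetical, and it assumes that a set FI bit marks a macroinstruction that the Fbox will retire, which this section implies but does not state outright.

```python
EBOX = "ebox"  # hypothetical labels for the one-bit retire queue status
FBOX = "fbox"


def retire_queue_entry(fi: bool, fbox_enable: bool, psl_fpd: bool) -> str:
    """Return which box is recorded as the source of the retire.

    fi          -- FI bit from the instruction context register
    fbox_enable -- ECR<FBOX_ENABLE>
    psl_fpd     -- PSL<FPD>
    """
    # Either override forces the entry to indicate Ebox retire,
    # regardless of the FI bit.
    if not fbox_enable or psl_fpd:
        return EBOX
    # Assumption: FI set means the Fbox executes the macroinstruction.
    return FBOX if fi else EBOX
```

With both overrides inactive the entry simply follows FI; with either one active the entry selects the Ebox.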
The retire queue is forced to indicate that the Ebox is next to retire when ECR<FBOX_ENABLE> is not set because the Fbox will not receive an operation dispatch from the Ebox (F%FBOX_1ST_CYCLE_H will never be asserted). A clear ECR<FBOX_ENABLE> also disables the sending of operand data, overriding microcode. The Ebox generally forces a reserved instruction microtrap when Fbox instructions are in S4 (see Section 8.5.16.8 for more detail). This microtrap flushes the retire queue (and, because the retire queue is empty, the Ebox is automatically selected as the RMUX source). If the Fbox instruction is MULL, a reserved instruction microtrap does not occur (see Section 8.5.16.8). Instead the Ebox microcode executes the MULL. This requires that the Ebox be selected as next to retire and is the reason ECR<FBOX_ENABLE> forces the retire queue entry to select the Ebox.

When PSL<FPD> is set, SEQ.MUX/LAST.CYCLE and SEQ.MUX/LAST.CYCLE.OVERFLOW cause the Microsequencer to dispatch to a specific microcode entry point regardless of the instruction queue contents. Since this dispatch is to an Ebox microcode flow which will not send operands to the Fbox, the Ebox must be selected in the retire queue (though any previous instruction is not affected and retires normally). Otherwise, the Ebox could stall waiting for the Fbox to retire an instruction while the Fbox waited for source operands to be sent. That deadlock would only end on S3 stall timeout.

The Ebox examines (without retiring an entry) the retire queue in S4 to determine whether the Fbox or the Ebox is the next source of a retiring macroinstruction. Based on the retire queue output, the RMUX is set to select either the Fbox or the Ebox as the source of control for S4-initiated memory references and most S5 operations. This selection remains in effect until the retire queue entry is retired. See Section 8.5.5 for more on how this status is used to control the RMUX.
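The way the retire queue output steers the RMUX can be modeled as a FIFO that is examined without being consumed. The following Python sketch is a behavioral illustration only: the RetireQueue class and its method names are hypothetical, pipeline timing is not modeled, and the queue depth is left unbounded for simplicity.

```python
from collections import deque

EBOX = "ebox"  # hypothetical labels for the one-bit retire queue status
FBOX = "fbox"


class RetireQueue:
    """Hypothetical model of retire queue examination and RMUX selection."""

    def __init__(self):
        self._entries = deque()

    def add(self, source: str) -> None:
        # Added in S3 when a new macroinstruction execution microflow begins.
        self._entries.append(source)

    def rmux_select(self) -> str:
        # Examined in S4 without retiring the entry. An empty queue
        # automatically selects the Ebox.
        return self._entries[0] if self._entries else EBOX

    def retire(self) -> None:
        # The selection stays in effect until the head entry is retired.
        if self._entries:
            self._entries.popleft()

    def flush(self) -> None:
        # A microtrap flushes the queue, which re-selects the Ebox.
        self._entries.clear()
```

Calling rmux_select() repeatedly returns the same box until retire() or flush() is called, which mirrors the statement that the selection remains in effect until the entry is retired.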
If the Ebox is the next to retire a macroinstruction, the retire queue entry is retired in S4 when the microword advancing into S5 specified SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW and did not specify DISABLE.RETIRE/YES. If the Fbox is the next to retire a macroinstruction, the retire queue entry is retired in S4 when the Fbox asserts F%RETIRE_H. In either case the retire queue entry is not retired unless the selected operation advances into S5 (i.e., there is no S4 stall). (Note that a retire queue entry is not retired by the MISC1/RETIRE.INSTRUCTION operation.)

The retire queue is flushed when a microtrap occurs as well as when the MISC field function RESET.CPU is specified. Anytime the retire queue is empty, the Ebox is automatically selected as the source of the RMUX. Note that it is not possible for the retire queue to have fewer than the necessary number of entries in it, except after a microtrap, because each entry is added before it is required.

The number of entries in the retire queue is 6.

8.5.15.8 Field Queue

The field queue carries information about field type source operands for bit-field macroinstructions and some other macroinstructions. The information is one bit which indicates whether the operand was register mode or not.

Two different execution microflows are required for bit-field macroinstructions and certain other macroinstructions depending on whether a particular operand is register mode. The Ibox provides this information when it adds a source queue entry for the operand. Microcode is able to branch conditionally on the status of the field queue. This allows execution microflows to decide how to execute the instruction.

Each entry in the field queue is a one-bit status which indicates whether the associated field operand is register mode.
Microcode branches on a field queue entry are four-way branches, though only three of the four outcomes are possible. The following table shows the possible branch outcomes.

Table 8-11: Field Queue Branch

Condition                                 Resulting Microtest Bus Value
----------------------------------------  ----------------------------------------------
Field queue empty                         11 (can be execution dispatch target)
Field queue not empty, register mode      01 (start of execution for register mode case)
Field queue not empty, not register mode  00 (start of execution for address mode case)

A branch on the field queue when it is not empty causes the current field queue entry to be retired.

The field queue has 6 entries.

When the Ebox is branching on the field queue, it may have to wait for the Ibox to make an entry, in which case it loops, repeatedly testing the field queue. This condition is similar to a stall, but no Ebox stall is involved. When microcode is branching on the field queue and it is empty, the signal E_FLQ%FQ_STALL_H is asserted. This tells the S3 stall timeout logic that the Ebox is looping on the field queue. If this continues for a long time, a machine check occurs. See Section 8.5.25.1 for more detail.

E_FLQ%FQ_STALL_H is also used by the fault logic. If E_FLQ%FQ_STALL_H is asserted and one of I%IMEM_MEXC_H, I%IMEM_HERR_H, or I%RSVD_ADDR_FAULT_H is asserted, then an S3 fault condition is detected. After a cycle in which there is no S4 stall (and given that the Ebox is next to retire), the fault condition advances into S4 and the appropriate microtrap is requested. See Table 8-12 and Section 8.5.19 for more information.

8.5.15.9 Retiring Instructions

Retiring a macroinstruction is an important synchronization point between the Ebox and the Ibox. When a macroinstruction is retiring, the last of its operations is in S5 and cannot be stalled or aborted. The Ebox signals the Ibox so that it can free up certain resources associated with the retiring instruction.
The Ebox usually retires a retire queue entry at the same time as it retires the macroinstruction (the exception is MISC1/RETIRE.INSTRUCTION, which doesn't affect the retire queue). The resources in the Ibox which are freed up by retiring a macroinstruction are a backup PC queue entry and a group of RLOG entries associated with that macroinstruction.

When the retire queue indicates the Ebox is next to retire a macroinstruction, retiring occurs when either of the following conditions is met:

• the microword in S5 specifies SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW, and not DISABLE.RETIRE/YES, or
• the MISC1 field function, MISC1/RETIRE.INSTRUCTION, is specified (though the retire queue is not affected in this case).

The Fbox determines its own retire instruction status, which it sends through the RMUX when the retire queue indicates the Fbox is next to retire a macroinstruction. If the Fbox operation request in S4 is advanced to S5 with this condition asserted, the Ebox retires an instruction.

8.5.15.10 First Part Done

The Ebox sends the current state of the PSL<FPD> bit to the Ibox on E%FPD_SET_L. If the Ibox fetches an opcode and this bit is set, the Ibox stops operation as soon as the opcode has been completely fetched. If the instruction is an interrupted instruction that is being resumed, then the operand specifiers mustn't be processed again since they may have side effects or may depend on data which has been altered by the instruction's execution.

8.5.15.11 Ebox to Ibox Commands and IPR Accesses

The Ebox is the source of two signals which immediately affect Ibox operation, and three others which cause IPR read and write operations or a load-PC operation.

The two signals which immediately change Ibox operation are E%STOP_IBOX_H and E%RESTART_IBOX_H. E%STOP_IBOX_H is asserted in S5 when the microword specifies MISC/RESET.CPU. E%RESTART_IBOX_H is asserted when the microword in S5 specifies MISC/RESTART.IBOX.
E%STOP_IBOX_H is used to cause the Ibox to stop processing instructions and clear the Ibox GPR scoreboard. It does not clear the RLOG or backup PC queue, so the Ibox is still able to restore state to that required for a fault. See Chapter 7 and Section 8.5.19.

E%RESTART_IBOX_H is used to restart the Ibox when it has put itself in the stopped state after processing the operands for certain complex instructions.

The Ebox detects its own accesses to Ibox IPRs in S5 just after issuing the request to the Mbox. It also decodes MRQ/LOAD.PC to detect a load-PC operation in S5. At that time it asserts one of three command strobes to the Ibox. They are E%IBOX_IPR_READ_H, E%IBOX_IPR_WRITE_H, and E%IBOX_LOAD_PC_L. The Ebox drives the signal fields E%IBOX_IPR_TAG_H<2:0> and E%IBOX_IPR_NUM_H<3:0> with the Wn register file destination for IPR read data and the IPR number, respectively. (The full register file address for the destination is 6 bits, but the Ibox appends the prefix for Wn registers since all Ibox IPR reads are sent to Wn registers.) For IPR writes and load-PC operations the Ibox receives the data when the Mbox forwards it on M%MD_BUS_H<31:0> in a later cycle. For read accesses the Ibox returns the data to the designated Wn register.

Microcode synchronizes load-PC operations by issuing an Mbox operation (possibly MRQ/SYNC.MBOX). This synchronization is necessary because the Ibox will not be ready to accept the new PC data if a MISC/RESET.CPU occurs before the new PC data is forwarded by the Mbox. Any interrupt or exception which occurs after the load-PC will cause the Ebox to read the backup PC from the Ibox, and that value must have resulted from the load-PC operation. Once the synchronizing Mbox operation is complete, the microcode knows the Ibox has the data.

Ibox IPR writes are synchronized by issuing an MRQ/SYNC.MBOX (or another Mbox operation) after the operation.
Once the MRQ/SYNC.MBOX (or other Mbox operation) is complete, the microcode knows the Ibox has the data.

8.5.15.12 Loading The PC

The Ibox maintains all PC information for the NVAX CPU. When microcode executing in the Ebox determines that instruction fetching should begin at some address, it sends the starting PC value to the Ibox. Conceptually, this is equivalent to loading the PC register. However, the Ibox keeps track of a number of PC values, and there isn't really a current PC register. See Chapter 7 for more on how PC values are maintained.

The Ebox sends a new PC value to the Ibox in S5 when the microword specifies LOAD.PC in the MRQ field. The PC data is sent via the Mbox. Microcode first ensures that the Ibox is stopped and, if necessary, flushes appropriate queues. Note that the RLOG should have been unwound beforehand.

8.5.15.13 Ebox to Ibox Flush Signals

Microcode is able to flush several entities in the Ibox: the virtual instruction cache (VIC), the branch prediction cache (BPC), and the backup PC queue (PCQ). In S5, the Ebox drives E%FLUSH_VIC_H, E%FLUSH_BPT_H, and E%FLUSH_PCQ_H when it decodes MISC1/FLUSH.VIC, MISC1/FLUSH.BPC, and MISC1/FLUSH.PCQ, respectively.

8.5.15.14 Detecting Ibox Incurred Faults and Errors

There are two kinds of faults which can occur due to Ibox processing. Also, hardware errors can occur. When a fault or error occurs, the status is latched. The Ebox effectively detects the fault or error when it executes a microword which uses the result of the operation which incurred the fault or error. The Ebox causes a microtrap to occur when that same microword is about to be advanced from S4 into S5. See Section 8.5.19 for more on microtrap management.

Some Ibox incurred faults and errors are initially detected by the Ibox, while others are first detected by the Ebox.
When the Ibox detects a fault or error, it halts operation and asserts one of two fault indication signals or an error indication signal, all of which are received by the Ebox. These signals are I%IMEM_MEXC_H (which indicates a memory management fault), I%IMEM_HERR_H (which indicates a hardware error), and I%RSVD_ADDR_FAULT_H (which indicates a reserved addressing mode). The Ibox only asserts I%RSVD_ADDR_FAULT_H for one cycle, so the Ebox has a latch which is set when it is asserted. This latch is reset by MISC/RESET.CPU and by branch mispredict microtraps.

The Ebox ignores Ibox fault conditions until it determines that they apply to the current microword. This is done by associating some queue empty condition with the fault status. See Table 8-12.

Faults and errors not detected by the Ibox are reported by the Mbox. For reads, the Mbox sets the fault or error bit associated with the target MD register in the register file. For writes, it sets the fault or error bit in the appropriate PA queue entry. When the Ebox references the MD register or tries to use the PA queue entry with a fault bit set, it detects the fault.

Faults in memory reads issued by the Ibox as an intermediate step in processing an operand specifier (as in register deferred mode) are handled in a special way. When the memory read fault or error is detected in the Mbox, it returns a fault/error status instead of data. The Ibox latches this fault/error status. If the Ibox was going to use this data as an address (deferred mode), it sends the fault/error status with the next specifier related memory request. The Mbox, seeing the fault/error status associated with the operation, sends the result to the MD register (for reads) or PA queue (for writes) with the same fault/error status.

Faults in memory reads issued by the Ibox as an intermediate step in processing an operand specifier can also be detected another way.
In the case where the Ibox will not have to issue a memory request using the result of the failed request (as in address access type with a deferred mode operand specifier), the Ibox reports the error by writing the MD fault or error status bit directly. The fault/error status latched in the Ibox is written into the MD fault/error status bits when the Ibox writes the MD.

The table below lists the faults and indicates how each is detected.

Table 8-12: Detection of Ibox Incurred Faults and Errors

Fault                                              How Detected
-------------------------------------------------  -------------------------------------------------
Instruction stream read fault/error on opcode      Instruction queue empty AND
                                                   (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Instruction stream read fault/error on source      (Source queue empty¹ OR E_FLQ%FQ_STALL_H asserted)
operand (including modify type)                    AND (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Instruction stream read fault/error on             Destination queue empty AND
destination operand (write type)                   (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Instruction stream read fault/error on branch      Branch queue empty AND
displacement                                       (I%IMEM_MEXC_H OR I%IMEM_HERR_H)

Memory access fault/error encountered in           Attempt to read an MD register with a fault bit
processing a source operand (including modify      set
type)

Memory access fault/error encountered in           Attempt to use a PA queue entry with a fault bit
processing a destination operand (write type)      set

Reserved addressing mode on source operand         (Source queue empty¹ OR E_FLQ%FQ_STALL_H asserted)
                                                   AND I%RSVD_ADDR_FAULT_H

Reserved addressing mode on destination operand    Destination queue empty AND I%RSVD_ADDR_FAULT_H

Reserved opcode                                    Microsequencer Dispatch

¹ In this context, source queue empty includes the case where the microword in S3 requires two source queue entries to advance, but only one entry is present in the source queue.
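The signal-based rows of Table 8-12 amount to pairing a fault indication with the queue empty condition that shows the fault applies to the current microword. The Python sketch below expresses one reading of those rows. It is an illustration under stated assumptions: the IboxFaultStatus fields and the detect_ibox_faults function are hypothetical names, the rows detected through MD or PA queue fault bits and through Microsequencer dispatch are omitted, and the table as printed should be taken as authoritative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IboxFaultStatus:
    """Hypothetical snapshot of the signals and queue states in Table 8-12."""
    imem_mexc: bool = False                # I%IMEM_MEXC_H
    imem_herr: bool = False                # I%IMEM_HERR_H
    rsvd_addr_fault: bool = False          # latched I%RSVD_ADDR_FAULT_H
    instruction_queue_empty: bool = False
    source_queue_empty: bool = False       # also true when two entries are needed, one present
    fq_stall: bool = False                 # E_FLQ%FQ_STALL_H
    destination_queue_empty: bool = False
    branch_queue_empty: bool = False


def detect_ibox_faults(s: IboxFaultStatus) -> List[str]:
    """Return the names of the table rows whose detection condition holds."""
    istream_fault = s.imem_mexc or s.imem_herr
    waiting_on_source = s.source_queue_empty or s.fq_stall
    checks = [
        ("istream fault/error on opcode",
         s.instruction_queue_empty and istream_fault),
        ("istream fault/error on source operand",
         waiting_on_source and istream_fault),
        ("istream fault/error on destination operand",
         s.destination_queue_empty and istream_fault),
        ("istream fault/error on branch displacement",
         s.branch_queue_empty and istream_fault),
        ("reserved addressing mode on source operand",
         waiting_on_source and s.rsvd_addr_fault),
        ("reserved addressing mode on destination operand",
         s.destination_queue_empty and s.rsvd_addr_fault),
    ]
    return [name for name, hit in checks if hit]
```

A fault indication with no matching queue empty condition yields an empty list, which corresponds to the Ebox ignoring the fault until it applies to the current microword.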
It is not possible for the Ibox to assert both I%RSVD_ADDR_FAULT_H and either of I%IMEM_MEXC_H or I%IMEM_HERR_H at the same time. The Ibox stops operation as soon as it encounters one of these two faults, so the other cannot occur after one is detected.

8.5.16 Ebox-Fbox Interface

The Fbox executes independently of the Ebox but is dependent on the Ebox for delivery of source operands and storing of results. Floating point macroinstructions are decoded by the Ibox exactly like any other macroinstruction. The Ebox is dispatched to an execution microflow. This microflow delivers the source operands to the Fbox in S3 of the pipeline. Once the operands are delivered, the microflow is done. The Fbox returns the result in S4, along with any faults it might have detected.

The Ebox keeps track of whether the Fbox macroinstruction is next to retire using the retire queue (see Section 8.5.15.9 and Section 8.5.15.7). Once the Fbox is next to retire, the Ebox may, at the Fbox's request, access the destination queue for the Fbox to determine where the Fbox results are to be written. When the Fbox indicates its last execution cycle, the Ebox retires a retire queue entry and updates the PSL with an Fbox supplied condition code.

8.5.16.1 Fbox Opcode and Operand Delivery

The Ebox prepares to deliver operands during S3 when the microword specifies FOP.VALID in the MISC1 field. The opcode<8:0> for the instruction is delivered from the Microsequencer late in S2, so that the Fbox can decode the opcode before the operands arrive. The operands are available at the beginning of S4. They come from the output of the bypass muxes so that result data from the most recent S4 (Ebox or Fbox) operation is bypassed if necessary. Anything which stalls S3 in the Ebox stalls Fbox operand delivery (this includes S4 stalls). Along with the operands, the Ebox sends the current value of PSL<FU>.
If the Ebox detects a fault or error associated with an Fbox source operand, it indicates this to the Fbox. The Fbox carries this information along its pipeline and indicates the fault and/or error when the Ebox is retiring the Fbox operation. This is how Fbox source operand fault microtraps are delayed until all preceding macroinstructions have retired. The Ebox ignores source operand faults (which proceed down the pipeline to S4) when the Fbox is next to retire.

8.5.16.2 Fbox Result Handling

The Ebox handles writing of Fbox results in S4 and S5. When the current retire queue entry indicates the next macroinstruction to retire is to come from the Fbox, the Ebox waits for the Fbox to assert F%STORE_H or F%RETIRE_H. Either or both may be asserted. If F%STORE_H is asserted, the Ebox accesses the destination queue and issues a memory store or a GPR write, depending on the MDEST bit in the current destination queue entry. (See Section 8.5.17 for the exact definition of memory store.)

The Fbox indicates it is retiring an instruction by asserting the signal F%RETIRE_H. In response to this signal, the Ebox retires the current retire queue entry. The Fbox sends a map specifier which tells the PSL logic in S5 of the Ebox pipeline how to set the PSL condition code bits based on the Fbox condition code. There may be an Fbox result store at the same time as a retire.

The storing of Fbox results is handled exactly like the storing of Ebox results in the pipeline. The request is made in S4, through the RMUX. The Fbox supplies the data length for the store. (It derives the data length from the opcode.) If there is no stall or fault, the operation is advanced into S5 where the write is done unconditionally. Condition code updates are done in S5, too.

The stalls which apply to this operation are the same as for an Ebox microword doing a store.
The destination queue and PA queue must have valid entries and the Mbox must be ready, if the Fbox is doing a store. The retire queue must indicate the Fbox for an Fbox store or retire to be allowed. Otherwise the Fbox store or retire is stalled.

8.5.16.3 Fbox Store Stall

In some cases the Fbox asserts F%STORE_H to indicate it has result data to store and then asserts F%STORE_STALL_H to abort the store. This is done because certain Fbox operations may take an extra cycle, depending on the actual data pattern. F%STORE_STALL_H is asserted too late for the Ebox to withhold the store request from the Mbox (if the result is supposed to be stored to memory). If a store is forwarded to the Mbox and is then revoked by F%STORE_STALL_H, the Ebox asserts E%EM_ABORT_L early in the next cycle to abort the EM_LATCH operation and purge the EM_LATCH. This is the same mechanism used to abort EM_LATCH operations when an Ebox pipeline abort occurs (see Section 8.5.17.2).

Due to complexities in the Mbox (see Section 8.5.17.2), the Ebox ignores M%PA_Q_STATUS_H<0> in cycles in which E%EM_ABORT_L is asserted because of a previous F%STORE_STALL_H assertion. In this cycle, it behaves as if M%PA_Q_STATUS_H<0> is deasserted. Ignoring M%PA_Q_STATUS_H<0> and behaving as if it is deasserted has the effect of unconditionally stalling the Fbox store (which is always ready in these cases in the current implementation). This means there is one cycle of additional latency beyond that introduced by the Fbox aborting the store. Note this only occurs when E%EM_ABORT_L is actually asserted. If the aborted store was never sent to the Mbox, M%PA_Q_STATUS_H<0> is not ignored.

8.5.16.4 Fbox Destination Scoreboard

The Ebox maintains state to detect pending Fbox stores to GPRs in the Fbox destination scoreboard.
If any Ebox or Fbox operation attempts to source one of the GPRs which the Fbox is scheduled to update, the Ebox stalls and Fbox operand delivery is stalled. The Fbox destination scoreboard is implemented as part of the destination queue. This section describes the Fbox destination scoreboard functionality of the destination queue. See Section 8.5.15.3 for more on the main function of the destination queue.

The Fbox destination scoreboard consists of a pair of comparators and a write-pending bit associated with each destination queue entry. If an Fbox update of a particular GPR is pending, the write-pending bit in the destination queue entry for that store is set. The bit is set in S4, by specifying F.DEST.CHECK in the MISC2 field. If the Fbox source operands are all sent by one microword, that microword specifies MISC2/F.DEST.CHECK. If a sequence of more than one microword sends the source operands to the Fbox, the MISC2/F.DEST.CHECK is in the last microword.

Whenever a GPR is accessed using the source queue (A/S1 and/or B/S2) in S3, every destination queue entry with a set write-pending bit is compared with the two outputs of the source queue. A match, or hit, causes a stall if the source queue output which hits is actually specified by the microword in the A or B fields. For a hit to cause a stall, the write-pending bit in the destination queue must be set. Additionally, the source queue output which hits must specify a GPR access (i.e., it must not point to an MD register or contain literal data). If these conditions are met, the S3 operation is stalled.

Note that the above check includes destination queue entries with their MDEST bit set. So pending writes to memory (using PA queue addresses) may cause a scoreboard hit stall. This is not done to prevent the Ebox from reading a GPR before a pending Fbox write to the GPR completes. Instead, it is done to prevent the Ebox from reading a GPR when the Ibox must write an incremented or decremented value first.
This occurs when the Ibox processes an autoincrement or autodecrement specifier with write access type for an Fbox instruction. In processing the specifier, the Ibox CSU can be stalled for some reason, and thus be delayed from writing the new value to the GPR. To handle this case, the Ibox sends the GPR number with ALL destination queue entries. If the Ebox reads a GPR which was used in a destination specifier, the scoreboard hit stall prevents the read until the destination queue entry is retired. Because of the minimum latency in the Mbox in processing specifier accesses, it is known that the Ibox CSU will update the GPR before the associated PA queue entry becomes valid, and the destination queue entry will not be retired until the PA queue entry becomes valid. (Actually, the destination queue entry is effectively retired before the Ebox "knows" that the PA queue entry is not valid, but then an S4 stall exists which will last until the PA queue entry becomes valid. This stall will also stall S3, so the GPR access will be prevented until the GPR is valid. This is why all RMUX S4 stalls also stall S4 and S3 when the Fbox is next to retire an instruction.)

In the event that a modify access type specifier is processed, an entry is made in the source and destination queues for the same specifier. If it is a register mode specifier, it does not cause a deadlock because the MISC2/F.DEST.CHECK operation which sets the write-pending bit in the destination queue for the entry is not done until the last microword of the execution microflow is in S4. By that time all the operands have been sent to the Fbox. If the addressing mode is some memory access mode, the operand bus bits which carry the GPR number when processing a write access type specifier are used instead to carry the index of the MD register which will hold the source data.
Interpreting this MD index as a GPR number could cause lost performance if a subsequent instruction accesses the GPR with the same index as that MD. (Deadlock doesn't occur for the same reason as before.) To prevent possible loss in performance, the Ebox forces the index bits to 1 as they are written into the destination queue GPR field for modify access type operands only. This has the effect of specifying the PC in the destination queue. If a subsequent instruction does access the PC directly, then a stall will not hurt since this is an UNPREDICTABLE case. (The Ebox supplies the value 0 when the PC is specified in this way.)

NOTE
When the Ebox is next to retire an instruction and is writing to a write access type destination operand, it will stall in S4 if the PA queue is not valid. This causes an S3 stall. Thus the case which motivated the above special scoreboarding case for Fbox destinations cannot occur for Ebox instructions. In fact, the only reason it can occur for Fbox instructions is because there are several "hidden" pipeline stages between S3 and S4 when the Fbox processes an instruction. These extra pipeline stages allow the Fbox to accept new instructions and their associated source operands before it has retired the current instruction. This, combined with the fact that the Ibox can process "simple" specifiers for new instructions even while the CSU is stalled processing a complex write access type specifier from a previous instruction, is what leads to the need for the special scoreboard case described above.

The Ebox will access ahead of the current destination queue entry as part of the Fbox destination scoreboard function. A pointer called the FDest pointer is maintained which may point to an entry which is after the front entry in the FIFO queue. Normally, it points to the current entry.
However, in circumstances where the Fbox is next to store a result, it is incremented ahead of the current destination queue entry pointer. When the microword in S4 specifies MISC2/F.DEST.CHECK, the Ebox checks that the destination queue entry at the FDest pointer is valid. If it isn't, S4 stalls (stalling S3 as well). If the destination queue entry is valid, the associated write-pending bit is set. If the DL is quadword, then the bit associated with the next destination queue entry is also set. The FDest pointer is incremented by one, or by two if the DL is quadword. The write-pending bits are set in S4 even if there is an S4 stall. The FDest pointer is incremented as the operation advances into S5, when there are no S4 stalls.

NOTE
The DL supplied in the instruction queue with Fbox instructions is the length of the result.

Flow-thru bypass ensures that the S3 microword is stalled if it is accessing a GPR and that GPR is specified by a destination queue entry whose write-pending bit is being set by the microword in S4.

Write-pending bits in the destination queue are reset in S4 as the Fbox writes results, even if the MDEST bit is set in the destination queue entry being retired. Flow-thru bypass ensures that an S3 stall due to the scoreboard is broken in the cycle in which the Fbox drives the result to the Ebox. This means the result in S4 (after the RMUX) is bypassed to E_BUS%ABUS_L<31:0> and/or E_BUS%BBUS_L<31:0> in these cases.

In S4, when the Fbox stores a result, the write-pending bit of the destination queue entry is reset. This means that destination queue entry can no longer cause a scoreboard hit stall. The bit is cleared even if RMUX S4 stalls. In all cases this is safe either because the destination queue entry has MDEST set or because the particular RMUX S4 stall also causes an S4 stall which in turn causes an S3 stall which prevents Fbox operand delivery.
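The scoreboard rules above can be sketched as a small behavioral model. This is an illustration only, not the hardware implementation: the class and function names (DestEntry, SourceOutput, s3_scoreboard_stall, dest_check, fbox_store) are hypothetical, the destination queue is modeled as a plain list rather than a circular FIFO, and the S4/S5 timing of the FDest pointer update is collapsed into a single step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DestEntry:
    gpr: int                     # GPR number the Ibox sent with this entry
    valid: bool = True
    mdest: bool = False          # memory destination; still takes part in the check
    write_pending: bool = False  # set by MISC2/F.DEST.CHECK, cleared on Fbox store


@dataclass
class SourceOutput:
    index: int     # register index presented by the source queue
    is_gpr: bool   # False for an MD register or literal data
    used: bool     # True if the A or B microword field names this output


def s3_scoreboard_stall(dq: List[DestEntry], sources: List[SourceOutput]) -> bool:
    """Return True if the S3 operation must stall on a scoreboard hit."""
    for src in sources:
        if not (src.used and src.is_gpr):
            continue  # unused outputs and non-GPR outputs never stall
        for entry in dq:
            if entry.write_pending and entry.gpr == src.index:
                return True
    return False


def dest_check(dq: List[DestEntry], fdest: int, quadword: bool) -> Optional[int]:
    """Model MISC2/F.DEST.CHECK: return the new FDest pointer, or None for an S4 stall."""
    if fdest >= len(dq) or not dq[fdest].valid:
        return None  # entry at FDest not valid: S4 (and S3) stall
    dq[fdest].write_pending = True
    if quadword and fdest + 1 < len(dq):
        dq[fdest + 1].write_pending = True  # quadword result uses two entries
    return fdest + (2 if quadword else 1)


def fbox_store(dq: List[DestEntry], index: int) -> None:
    """Model the Fbox storing a result: the write-pending bit is cleared."""
    dq[index].write_pending = False
```

Note that the MDEST bit plays no part in the hit test, which mirrors the statement that entries with MDEST set are included in the check.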
The write-pending bits and all destination queue pointers are reset when E_MSC%FLUSH_EBOX_H is asserted. This happens in every microtrap, including the power-up microtrap.

8.5.16.5 Fbox Fault and Error Management

As mentioned above, the Fbox latches source operand fault and error information and carries it along with its other instruction related information. Also, the Fbox may encounter a fault in the course of computing the result. All these faults and errors are presented by the Fbox when it requests the RMUX. The Ebox responds by signaling a microtrap to the Microsequencer once the retire queue indicates the Fbox. Before the retire queue points to the Fbox, the Ebox ignores the fault status coming from the Fbox.

The Ebox detects Ibox incurred faults and errors for Fbox operands as described in Table 8-12, but instead of handling them directly, it passes the fault/error status to the Fbox. The Fbox doesn't wait for the operand valid signal when a fault or error status is asserted, even though there isn't valid data. This breaks a stall which might never end otherwise, since the Ibox stops processing operand specifiers when it encounters a fault or error.

NOTE
The Fbox treats the data which comes with the fault/error status as UNPREDICTABLE. Also the Fbox breaks the stall on any operands which follow an operand with an associated fault or error. The Ibox stops processing operand specifiers when it encounters a fault or error. If the Fbox didn't break the stall and propagate the fault/error to the RMUX, the CPU would hang.

If there isn't a fault or error being signaled by the Fbox, there could still be a destination operand fault or error. If the Fbox is requesting the RMUX and indicating a destination queue indirect store, the Ebox checks for a destination operand fault or error (see Table 8-12). If there is one, the appropriate microtrap is forced.
Most Fbox faults, and all Fbox errors, result in VAX architecture exceptions of the fault type. This means most Fbox faults, and all errors, are taken in S4 when the operation is about to advance into S5. Integer overflow is a trap in the VAX architecture sense, and causes a microtrap late in S5.

Fbox operand faults and errors have higher priority in the Microsequencer than Fbox originated data faults. Fbox operand faults cause the same microtraps as would be taken if that fault or error was detected in an Ebox instruction. Fbox originated data faults cause a floating fault microtrap, provided there aren't any operand faults or errors. See Section 8.5.19.7 for more on how microcode determines the cause of the microtrap.

8.5.16.6 Ebox to Fbox Commands

The Ebox asserts the signal E%FLUSH_FBOX_H when the microword in S6 specifies RESET.CPU in the MISC field. This has the effect of resetting the Fbox and clearing its pipeline of all operations.

8.5.16.7 Summary of Fbox-Ebox Signals

The following signals are driven by the Ebox to the Fbox.

• E%FOPCODE_H
  This 9-bit bus carries the full opcode for Fbox operations. (This bus is actually driven by the Microsequencer.)
• E%FBOX_1ST_CYCLE_L
  This bit indicates there is a valid Fbox opcode on E%FOPCODE_H. (This signal is actually driven by the Microsequencer.)
• E%ABUS_H and E%BBUS_H
  These 32-bit busses carry the source operand(s).
• E%FDATA_VALID_H
  This signal tells the Fbox that all operands being sent to it are valid. The Fbox knows, from decoding the opcode, exactly what data is being sent on E%ABUS_H<31:0> and E%BBUS_H<31:0>.
• E%A_SHLIT_H and E%B_SHLIT_H
  These signals indicate the data on E%ABUS_H<31:0> and E%BBUS_H<31:0>, respectively, is a 6-bit short literal value extracted from the instruction stream. Special data formatting is required by the Fbox.
• E%PSL_FU_H
  The current PSL<FU> value for use by the Fbox in deciding whether to signal floating point underflow faults or not.
• E%F_MMGT_FLT_H, E%F_MEM_ERR_H, and E%F_RSVD_ADDR_MODE_H
  These signals tell the Fbox that there is a fault or error associated with the source operands. The Fbox carries this status down the pipeline so that it is handled after instructions which are already in the Fbox pipeline.
• E%FLUSH_FBOX_H
  This signal causes the Fbox to clear its pipeline of all operations.
• E%RETIRE_OK_H
  This signal tells the Fbox whether to stall if it has an instruction to retire. The Fbox stalls if it wants to retire an instruction and this signal is not asserted.
• E%STORE_OK_H
  This signal tells the Fbox whether to stall if it has a result to store. The Fbox stalls if it wants to write a result and this signal is not asserted, even if it also wants to retire an instruction and E%RETIRE_OK_H is asserted.

The following signals are driven by the Fbox to the Ebox.

• F%INPUT_STALL_H
  This signal causes the Ebox to stall in S3 if it is attempting to send operands to the Fbox.
• F%STORE_STALL_H
  This signal is asserted by the Fbox when it is asserting F%STORE_H but isn't able to supply valid data.
• F%FBOX_RESULT_H
  This 32-bit bus carries Fbox results to the Ebox.
• F%CC_N_H, F%CC_Z_H, and F%CC_V_H
  These are the three Fbox condition code bits. They are Negative, Zero, and Overflow.
• F%RETIRE_H
  This control signal tells the Ebox the Fbox is retiring an instruction in this cycle.
• F%STORE_H
  This control signal tells the Ebox the Fbox is storing a result in this cycle.
• F%CC_MAP_H<1:0>
  This is the map specifier which tells the Ebox how to update the PSL condition code bits.
• F%FBOX_DL_H<1:0>
  This is the data length used by the Ebox for an Fbox store.
• F%MMGT_FAULT_H
  Signals a memory management fault for one of the currently retiring instruction's source operands.
• ~_H
  Signals a memory access hardware error for one of the currently retiring instruction's source operands.
• F%RSVD_ADDR_MODE_H
  Signals a reserved address mode fault for one of the currently retiring instruction's source operands.
• F%RSV_H
  Signals a reserved operand fault for one of the currently retiring instruction's source operands.
• F%FOV_H
  Signals a floating point overflow fault resulted from the currently retiring instruction.
• F%FU_H
  Signals a floating point underflow fault resulted from the currently retiring instruction.
• F%FD~H
  Signals a floating point divide-by-zero fault resulted from the currently retiring instruction.

8.5.16.8 Fbox Disabled Mode

The ability to operate with the Fbox disabled is provided in the Ebox. When the Fbox is disabled, all floating point macroinstructions, including all floating point CVT macroinstructions, cause reserved instruction faults. MULL is handled in microcode.

The Fbox enable bit is in IPR 125, ECR (see Section 8.5.22). If it is not set, Ebox hardware functions are altered as follows:

• Assertion of E%FBOX_1ST_CYCLE_L to the Fbox is disabled (in the Microsequencer).
• The entry made in the retire queue is overridden to specify Ebox instruction retire.
• A reserved instruction fault is signaled to the Microsequencer when the first microword of any Fbox execution microflow is about to advance into S5, except if that microword specifies MISC/MULL.

With the Fbox disabled, each floating point macroinstruction causes a fault (a VAX architecture reserved instruction fault) when the first microword of its execution microflow is about to advance into S5. This occurs for all floating point macroinstructions, including floating point CVT instructions.

Microcode can branch conditionally on the Fbox disable bit.
The first microword of the MULL execution microflow specifies MISC/MULL and branches conditionally on the Fbox disable status. If the Fbox is enabled, the branch is to a microflow which dispatches the operation to the Fbox. If the Fbox is disabled, the branch is to an Ebox execution microflow which completes the MULL.

8.5.17 Ebox-Mbox Interface

The Ebox to Mbox interface has a memory request function and a returned read result function. The Ebox issues memory requests by sending a command, address, and possibly write data to the Mbox. The Mbox returns read results by writing them directly into the register file. Faults and errors encountered by the Mbox in completing the operation are reported one of three ways depending on the operation.

NOTE
When the Ebox initiates a memory read by sending a request to the Mbox, it specifies the register which will receive the memory data in the DST field of the microword. This has the side effect, when the microword is in S5, of writing that register with the value on E_BUS%WBUS_L<31:0>. Normally this register is written by the Mbox after this, before the particular register is read again. However, an exception can prevent the Mbox write and leave the register containing effectively garbage data.

There are three kinds of memory access requests issued by the Ebox: reads, writes, and stores. Reads are requests for memory data to be returned to a Wn or GPR register in the register file. The Ebox supplies the address directly. Writes are requests that data be written to memory. The address and data are both supplied directly by the Ebox. Stores are requests that data be written to the address in the current PA queue entry in the Mbox. The Ebox only supplies the data for stores.

There are several control operations the Ebox can request of the Mbox. There are three kinds of TB invalidate requests.
The Ebox can synchronize to the Mbox, causing a stall until the Mbox finishes memory management checks for the current request. Also, probe, write check, TB fill, and processor register read and write operations are available.

The Ibox issues operand data reads to MD registers on behalf of the Ebox as it processes operand specifiers. The Ebox simply uses the data when it is returned. The Ibox also issues a request that is the first half of a store. This supplies an address for the Mbox to translate and then enter into the PA queue. The Ebox eventually issues a store request which uses the address in the PA queue to do the write.

Memory management faults encountered in memory reads and writes (not stores) issued by the Ebox are reported by the Mbox asserting the signal M%MME_TRAP_L, which is received by the Microsequencer. This causes an immediate microtrap and Ebox pipeline abort.

Memory management faults encountered in memory reads initiated by the Ibox on behalf of the Ebox result in the Mbox asserting M%MME_FAULT_H, which sets the memory management fault status bit associated with the target MD register in the register file. The Ebox detects the fault when a microword sources that particular MD register. Faults for stores are reported by the Mbox as soon as the PA queue entry is valid. The Ebox detects the fault when a microword attempts to issue a store request.

Hardware errors in memory reads issued by the Ebox are reported by asserting M%HARD_ERR_H in the cycle in which read data is written into the register file. The data is generally incorrect, since an error occurred. The register file write can't be to an MD register since it is issued directly by the Ebox. There aren't fault bits in the register file to receive the error status for registers other than the MD registers. So, when the Ebox detects an MD port write to a register other than an MD and the error status is asserted, the Ebox forces an immediate microtrap.
This microtrap is not delayed by any S3 or S4 stalls.

Hardware errors in memory reads initiated by the Ibox on behalf of the Ebox result in the Mbox asserting M%HARD_ERR_H as it writes the target MD register in the register file. This sets the error status bit associated with the target MD register. The Ebox detects the error when a microword sources that particular MD register. Hardware errors for stores are reported by the Mbox as soon as the PA queue entry is valid. The Ebox detects the error when a microword attempts to issue a store request.

TB parity errors are a special case. Whenever a TB parity error is encountered, the Mbox asserts M%TB_PERR_TRAP_L. The Microsequencer initiates an immediate asynchronous hardware error microtrap when M%TB_PERR_TRAP_L is asserted. This could happen as a result of Mbox processing of any Ebox memory reference, Ibox operand prefetch reference, or Ibox instruction fetch or prefetch which uses the TB.

All Mbox requests except store are specified in the MRQ field of the microword. The store request is implicit in Ebox or Fbox result storing through the RMUX. All Mbox requests are issued in S4. The table below shows the requests the Ebox can send to the Mbox. See Chapter 12 for more detail on each operation.
Table 8-13: Ebox Mbox Requests

Request Mnemonic             Addressing  Access Check  Mode Used[1]  Operation Description
---------------------------  ----------  ------------  ------------  ---------------------
MRQ/READ.V.RCHK              virtual     read          current       read virtual memory
MRQ/READ.V.WCHK              virtual     write         current       read virtual memory and check for write access
MRQ/READ.V.NOCHK             virtual     -             -             read virtual memory with no access check
MRQ/READ.V.LOCK              virtual     write         current       read-lock virtual memory
MRQ/READ.P                   physical    -             -             read physical memory
MRQ/READ.PR                  physical    -             -             read processor register
MRQ/PROBE.V.RCHK             virtual     read          mode          Probe byte address for read - return 3-bit probe status to register file
MRQ/PROBE.V.RCHK.NOFILL      virtual     -             -             Probe byte address for presence in TB - return 1-bit status to register file, but don't fill TB if entry is not already in TB
MRQ/WCHK                     virtual     write         current       check that memory location can be written
MRQ/WRITE.V.WCHK             virtual     write         current       write virtual memory
MRQ/WRITE.V.NOCHK            virtual     -             -             write virtual memory without access checks
MRQ/WRITE.V.UNLOCK           virtual     write         current       write-unlock virtual memory
MRQ/WRITE.P                  physical    -             -             write physical memory
MRQ/WRITE.PR                 physical    -             -             write processor register
MRQ/PROBE.V.WCHK             virtual     write         mode          Probe byte address for write - return 3-bit probe status to register file
STORE[2]                     virtual[3]  write[3]      current[3]    write to physical address in PA queue
MRQ/LOAD.PC                  -           -             -             send the data to the Ibox to be used as the new PC
MRQ/SYNC.MBOX                -           -             -             synchronize with memory management check from previous Mbox request by issuing a form of NOP
MRQ/TB.TAG.FILL              virtual     -             -             directly load TAG part of "current" TB entry
MRQ/TB.PTE.FILL              -           -             -             directly load PTE part of "current" TB entry
MRQ/TB.INVAL.SINGLE          virtual     -             -             invalidate single TB entry, if present
MRQ/TB.INVAL.PROCESS         -           -             -             invalidate all TB entries for current process
MRQ/TB.INVAL.ALL             -           -             -             invalidate all TB entries

[1] Current means CUR_MOD from the PSL, mode means contents of MMGT.MODE.
[2] This operation is not initiated through the MRQ field. It is issued by microwords specifying DST/DST and Fbox operations with F%STORE_H asserted, given that the destination queue entry indicates a memory destination.
[3] Translation and access check done previously by the Mbox.
- means don't care, doesn't apply.

The store operation in the above table is not specified in the MRQ field. Each destination queue indirect result store which is to memory (as opposed to a GPR) is turned into an Mbox store request. The Mbox writes the data received with this request to the address extracted from the PA queue. (Two address entries in the PA queue are needed for unaligned stores.)

The load-PC operation is accomplished with the aid of the Mbox (MRQ/LOAD.PC). The Mbox's part is to pass the data (PC) on E%WBUS_H<31:0> to the Ibox via M%MD_BUS_H<31:0>. The Ebox signals the Ibox that the new PC value is coming.

The information sent to the Mbox when the Ebox issues a command is shown in the following table. The information, except E%WBUS_H<31:0> data, is valid in S4. The command information is driven early in S4, while the address isn't valid until late in S4. E%WBUS_H<31:0> data is valid early in S5. The table shows the source of each item. See Chapter 12 for the encoding of these fields.

Table 8-14: Ebox Memory Request Information Busses

Signal               Source                                       Description
-------------------  -------------------------------------------  -----------
E%EBOX_CMD_H<4:0>    decoded from MRQ and DST microword fields    Request - command code
E%WBUS_H<31:0>       E_BUS%WBUS_L<31:0>                           write data, not ready until S5 (only needed for write type and store operations)
E%VA_BUS_L<31:0>     VA register with bypass                      address (or PTE in case of TB.PTE.FILL)
E%EBOX_TAG_H<4:0>    DST microword field                          address in register file where read or probe result is to go
E%EBOX_AT_H<1:0>     decoded from MRQ microword field             access type for operation
E%EBOX_DL_H<1:0>     DL register                                  data length for access
E%EBOX_VIRT_ADDR_H   decoded from MRQ microword field             Indicates virtual address - translation needed
E%NO_MME_CHECK_H     decoded from MRQ microword field             Indicates no access check should be done

This information is all latched by the Mbox in the EM_LATCH. This latch can only hold one command. Once it is full the Mbox will ignore Ebox requests until it is empty again. It is emptied when the Mbox request completes.

To process requests from the Ebox and from the Ibox, the Mbox receives the CUR_MOD bits from the PSL and the MMGT.MODE register contents. The CUR_MOD bits are normally used as the access mode for a request's TB check. The MMGT.MODE bits are used only when the request is a PROBE.V.RCHK, PROBE.V.RCHK.NOFILL or PROBE.V.WCHK. Note that the Mbox uses the CUR_MOD field for all Ibox-initiated requests at all times, so it must receive both mode fields simultaneously.

The address for Ebox-initiated memory accesses comes from the VA register. The microword issuing the memory request may update the VA register. If it does, the new VA value is sent with the request.

The write data for a memory request is the data put on E%WBUS_H<31:0>, a buffered copy of E_BUS%WBUS_L<31:0>, by the microword issuing the memory request.

The following table shows what the Ebox sends on each of the memory request information busses for each operation.

Table 8-15: Ebox Memory Request Information Truth Table

Column headings abbreviate the bus names: CMD is E%EBOX_CMD_H<4:0>, AT is E%EBOX_AT_H<1:0>, DL is E%EBOX_DL_H<1:0>, TAG is E%EBOX_TAG_H<4:0>, VIRT_ADDR is E%EBOX_VIRT_ADDR_H, and NO_MME_CHECK is E%NO_MME_CHECK_H.

Request Mnemonic      CMD           AT      DL[1]  TAG[2]  VIRT_ADDR  NO_MME_CHECK  Addr/Data Sent?
--------------------  ------------  ------  -----  ------  ---------  ------------  ---------------
READ.V.RCHK           DREAD         read    DL     DST     true       false         yes/no
READ.V.WCHK           DREAD         modify  DL     DST     true       false         yes/no
READ.V.NOCHK          DREAD         read    DL     DST     true       true          yes/no
READ.V.LOCK           DREAD_LOCK    modify  DL     DST     true       false         yes/no
READ.P                DREAD         read    DL     DST     false      -             yes/no
READ.PR               IPR_RD        -       DL     DST     false      -             yes/no
PROBE.V.RCHK          PROBE         read    Byte   DST     true       false         yes/no
PROBE.V.RCHK.NOFILL   PROBE         0[4]    Byte   DST     true       true          yes/no
WCHK                  MME_CHK       write   DL     -       true       false         yes/no
WRITE.V.WCHK          WRITE         write   DL     -       true       false         yes/yes
WRITE.V.NOCHK         WRITE         write   DL     -       true       true          yes/yes
WRITE.V.UNLOCK        WRITE_UNLOCK  write   DL     -       true       false         yes/yes
WRITE.P               WRITE         write   DL     -       false      -             yes/yes
WRITE.PR              IPR_WR        -       DL     -       false      -             yes/yes
PROBE.V.WCHK          PROBE         write   Byte   DST     true       false         yes/no
STORE                 STORE         -       -      -       false      -             no/yes
LOAD.PC               LOAD_PC       -       -      -       false      -             no/yes
SYNC.MBOX             NOP           -       Byte   -       false      -             no/no
TB.PTE.FILL           TB_PTE_FILL   -       Byte   -       false      true          yes[3]/no
TB.TAG.FILL           TB_TAG_FILL   -       Byte   -       false      true          yes/no
TB.INVAL.SINGLE       TBIS          -       Byte   -       false      true          yes/no
TB.INVAL.PROCESS      TBIP          -       Byte   -       false      true          no/no
TB.INVAL.ALL          TBIA          -       Byte   -       false      true          no/no

[1] DL means data length dictated by the microword; the DL register value unless the microword overrides the data length to longword.
[2] DST means the tag is the register specified in the DST field of the microword.
[3] PTE data is sent on address bus through VA register.
[4] Special code - no access check is done. Only the presence of an entry in the TB is checked.
- means don't care, doesn't apply.
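As an illustration of how Table 8-15 can be read, the sketch below encodes a handful of its rows as a lookup table. It is a reading aid under stated assumptions, not a description of the decode logic: the Request tuple, the REQUESTS dictionary, and the helper needs_wbus_data are hypothetical names, and only rows for virtual reads and writes are included.

```python
from typing import Dict, NamedTuple


class Request(NamedTuple):
    cmd: str            # value driven on E%EBOX_CMD_H
    at: str             # access type on E%EBOX_AT_H
    virt_addr: bool     # E%EBOX_VIRT_ADDR_H
    no_mme_check: bool  # E%NO_MME_CHECK_H
    sends_addr: bool    # first half of "Addr/Data Sent?"
    sends_data: bool    # second half of "Addr/Data Sent?"


# A subset of Table 8-15, keyed by request mnemonic (illustrative only).
REQUESTS: Dict[str, Request] = {
    "READ.V.RCHK":   Request("DREAD",      "read",   True, False, True, False),
    "READ.V.WCHK":   Request("DREAD",      "modify", True, False, True, False),
    "READ.V.NOCHK":  Request("DREAD",      "read",   True, True,  True, False),
    "READ.V.LOCK":   Request("DREAD_LOCK", "modify", True, False, True, False),
    "WRITE.V.WCHK":  Request("WRITE",      "write",  True, False, True, True),
    "WRITE.V.NOCHK": Request("WRITE",      "write",  True, True,  True, True),
}


def needs_wbus_data(mnemonic: str) -> bool:
    """True if the request also needs write data on E%WBUS_H in S5."""
    return REQUESTS[mnemonic].sends_data
```

For example, needs_wbus_data("WRITE.V.WCHK") is true while needs_wbus_data("READ.V.LOCK") is false, matching the yes/yes and yes/no entries in the table.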
8.5.17.1 IO Read Synchronization

Because the Ibox issues operand reads before the Ebox executes the associated macroinstruction, there is a possibility that an exception or branch will result in an operand read occurring even though the associated macroinstruction is never executed. This is not a problem if the read is to memory space, but it might be if the read is to IO space. Many IO space reads have side effects, so some mechanism is required which postpones an Ibox issued IO space read until the Ebox is actually executing the macroinstruction which requires the IO space read. The Mbox delays all IO space reads issued by the Ibox until the Ebox asserts the signal E%START_IBOX_IO_RD_H.

The Ebox asserts E%START_IBOX_IO_RD_H when the following are all true:

1. the Ebox is stalled in S3 waiting for a register file entry indexed through the source queue (i.e., A/S1, A/S2, B/S1, or B/S2) to become valid, or E_FLQ%FQ_STALL_H is asserted,
2. there is exactly one entry in the retire queue,
3. there is no stall of S4 of the RMUX part of the Ebox pipeline,
4. conditions 1, 2, and 3 were true in the previous cycle,
5. there is no MD fault for any of the MD registers currently being accessed in (stalled) S3,
6. and the Ebox pipeline is not being flushed by a microtrap this cycle.

The Mbox processes specifier queue entries one at a time (the specifier queue is the queue in the Mbox which receives all operand data references issued by the Ibox). If the specifier queue entry is an IO space access, the Mbox will not process it unless S6 in the Mbox is idle (not processing any reference) and S6 was idle in the previous cycle and E%START_IBOX_IO_RD_H is asserted. (Note that a one cycle delay occurs in the Mbox on E%START_IBOX_IO_RD_H. This is why the current cycle and the previous cycle are checked for NOP in S6 in the Mbox.)
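The six conditions and the Mbox-side check can be written out as a small predicate. This is a sketch for illustration only: the CycleState fields and the function names are hypothetical, and each cycle is modeled as a snapshot of the relevant status bits rather than as real signal timing.

```python
from dataclasses import dataclass


@dataclass
class CycleState:
    s3_stalled_on_source_queue: bool  # waiting for a source queue operand in S3
    fq_stall: bool                    # E_FLQ%FQ_STALL_H asserted
    retire_queue_entries: int
    rmux_s4_stall: bool
    md_fault_in_s3: bool
    microtrap_flush: bool


def base_conditions(c: CycleState) -> bool:
    """Conditions 1, 2, and 3 for a single cycle."""
    return ((c.s3_stalled_on_source_queue or c.fq_stall)
            and c.retire_queue_entries == 1
            and not c.rmux_s4_stall)


def start_ibox_io_rd(prev: CycleState, cur: CycleState) -> bool:
    """Model of when the Ebox asserts E%START_IBOX_IO_RD_H."""
    return (base_conditions(cur)
            and base_conditions(prev)        # condition 4
            and not cur.md_fault_in_s3       # condition 5
            and not cur.microtrap_flush)     # condition 6


def mbox_may_issue_io_read(s6_idle_now: bool, s6_idle_prev: bool,
                           start_asserted: bool) -> bool:
    """Mbox side: an IO space specifier read waits for two idle S6 cycles."""
    return s6_idle_now and s6_idle_prev and start_asserted
```

Requiring the base conditions in two consecutive cycles is what makes a single-cycle stall insufficient to release the IO space read.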
If the Ebox is stalled waiting for read data to be put in an MD by the Mbox, and the Mbox is waiting for E%START_IBOX_IO_RD_H to be asserted (because the specifier queue entry is an IO space read), then the Ebox must be waiting for the result of that IO space read. The Ebox only asserts E%START_IBOX_IO_RD_H when it is certain that the macroinstruction which will use the result of the IO space read is going to execute. If the retire queue contains more than one entry, other instructions are in the Ebox or Fbox pipeline, so E%START_IBOX_IO_RD_H is not asserted in case one of them incurs an exception. If the Ebox is stalled in (RMUX) S4, it doesn't assert E%START_IBOX_IO_RD_H because the previous macroinstruction's result store may incur an exception when it advances to S5. (Note that the retire queue entry is removed from the queue before the RMUX S4 stall status is known, so the RMUX S4 stall status has to be examined as well.) If the Ebox is being flushed by a microtrap in the current cycle, it doesn't assert E%START_IBOX_IO_RD_H because the previous macroinstruction actually had a trap. If there is an MD fault being reported in S3 of the Ebox, then the Ebox will take a microtrap after one cycle with no S4 stalls has passed. In the interim, E%START_IBOX_IO_RD_H must not be asserted.

Assertion of E%START_IBOX_IO_RD_H when a field queue stall is present is necessary to avoid deadlock; however, it will cause the CPU to start an IO space operand prefetch even when a memory management fault will cause the instruction to fault. For example, this might occur with ADAWI if the second operand is in IO space and the first can incur a memory management fault.

8.5.17.2 Mbox-Ebox Signals

The Mbox drives the following control signals for Ebox use: M%EM_LAT_FULL_H and M%PA_Q_STATUS_H<2:0>. M%EM_LAT_FULL_H tells the Ebox the EM_LATCH is full. M%PA_Q_STATUS_H<2:0> gives the status of the current PA queue entry.
M%PA_Q_STATUS_H<0> indicates that sufficient entries are valid in the PA queue to accept a store request. Multiple PA queue entries are needed for a store when the store will access multiple longwords in memory (as in quadword length stores and unaligned stores which cross a longword boundary). M%PA_Q_STATUS_H<1> indicates that the relevant PA queue entries have a memory management fault associated with them. The Ebox will not issue the store; it will microtrap when the microcode attempts it. M%PA_Q_STATUS_H<2> indicates the relevant PA queue entries have a hardware error associated with them. The Ebox will not issue the store; it will microtrap when the microcode attempts it.

In one case Ebox logic ignores M%PA_Q_STATUS_H<0> and behaves as if it is deasserted. Due to complexities in the Mbox, M%PA_Q_STATUS_H<2:1> are not logically correct in the cycle in which the Ebox aborts an EM_LATCH operation by asserting E%EM_ABORT_L. This happens when the Ebox aborts an Fbox result store operation because of F%STORE_STALL_H (see Section 8.5.16.3).

Due to complexities in the Mbox, M%PA_Q_STATUS_H<2:1>, which signal memory management exceptions and hardware errors associated with the PA queue entry, are not always correct in a cycle in which an EM_LATCH operation is aborted by assertion of E%EM_ABORT_L. In this cycle, the Ebox ignores M%PA_Q_STATUS_H<0> and behaves as if it is deasserted. M%PA_Q_STATUS_H<0> qualifies every use of M%PA_Q_STATUS_H<2:1>, so the Ebox can't incorrectly take or not take an exception because of incorrect M%PA_Q_STATUS_H<2:1> values. The Ebox ignores M%PA_Q_STATUS_H<0> only in cycles in which a store of Fbox data was sent in the previous cycle and was aborted in this cycle by asserting E%EM_ABORT_L because F%STORE_STALL_H was asserted.
This may be coincident with an actual pipeline abort (which also causes assertion of E%EM_ABORT_L if a request was sent to the Mbox in the previous cycle). In this case the Ebox will ignore M%PA_Q_STATUS_H<0> in a cycle in which the microword in S4 is effectively a NOP, and no change in behavior will result.

The Ebox stalls the microword in S4 if it specifies an Mbox request and the EM_LATCH is full. Also, S4 is stalled if the microword specifies a store and M%PA_Q_STATUS_H<0> is not asserted.

The Mbox drives several signals and busses used in writing data into the register file. These are M%EBOX_DATA_H, M%MD_BUS_H<31:0>, and M%MD_TAG_H<4:0>. When M%EBOX_DATA_H is asserted, the data on M%MD_BUS_H<31:0> is written into the register addressed by M%MD_TAG_H<4:0>. Note that M%MD_TAG_H<4:0> is 5 bits; it can address up to 32 locations. The organization of the register file is such that the MD, Wn, and GPR registers (a total of 27 registers) are in the first 32 locations in the register file. This means they can be addressed with a 5-bit tag (which is mapped into the full 6-bit address by zero extension).

The Mbox drives fault and error flags which are associated with the data on M%MD_BUS_H<31:0>: M%MME_FAULT_H and M%HARD_ERR_H. If M%MME_FAULT_H or M%HARD_ERR_H is asserted when M%EBOX_DATA_H is asserted, then a fault or error is being reported to the Ebox for some previously initiated read operation. This is handled in one of several ways, depending on the case, as is shown in Table 8-16.

Table 8-16: Ebox Response to M%MME_FAULT_H and M%HARD_ERR_H

Signal Asserted   Register     Response

M%MME_FAULT_H     Wn or GPR    The Ebox ignores this case. M%MME_TRAP_L would have been asserted for the same fault in a previous cycle.
M%HARD_ERR_H      Wn or GPR    In this case the Ebox forces an immediate hardware error microtrap.
M%MME_FAULT_H     MD           In this case the fault bit for the particular MD is set in the register file.
M%HARD_ERR_H      MD           In this case the error bit for the particular MD is set in the register file.

The Mbox drives M%MME_TRAP_L and M%TB_PERR_TRAP_L to force immediate microtraps. M%MME_TRAP_L causes a memory management exception microtrap, while M%TB_PERR_TRAP_L causes an asynchronous hardware error microtrap.

The Ebox asserts certain Mbox control signals under the control of the MISC and MISC2 fields of the microword. These signals are E%FLUSH_MBOX_H, E%FLUSH_PA_QUEUE_H, and E%RESTART_SPEC_QUEUE_H.

E%FLUSH_MBOX_H is asserted when MISC/RESET.CPU is specified. It causes the Mbox to flush ongoing Ebox reads, including those initiated by the Ibox. It also flushes the specifier queue. It does not flush the PA queue, so writes and stores already issued by the Ebox are not affected.

E%RESTART_SPEC_QUEUE_H is asserted when MISC/RESTART.MBOX is specified. It restarts Mbox processing of specifier queue references. Mbox specifier queue processing is stopped by Ibox request when certain complex macroinstructions are encountered.

E%FLUSH_PA_QUEUE_H is asserted when MISC2/FLUSH.PAQ is specified. It causes the PA queue in the Mbox to be flushed. MISC2/FLUSH.PAQ should always be specified with an MRQ field request which causes an EM latch command (i.e., other than MRQ/SYNC.BDISP, MRQ/SYNC.BDISP.RETIRE, MRQ/SYNC.BDISP.TEST.PRED, or MRQ/NOP).

When a pipeline abort occurs, the Ebox conditionally asserts E%EM_ABORT_L. It is asserted because the abort is recognized too late to prevent the Ebox from issuing an Mbox request in S4. E%EM_ABORT_L is asserted in S5 and signals the Mbox to disregard the command just sent in S4. It is only asserted if the Ebox actually made an Mbox request in S4 and the EM_LATCH wasn't full. Even stores and write requests are aborted in this case.
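The store handling driven by M%PA_Q_STATUS_H<2:0> and the pipeline-abort condition for E%EM_ABORT_L described in this section can be summarized in a short behavioral sketch. The names `StoreAction`, `store_action`, and `em_abort_on_pipeline_abort` are hypothetical. The ordering of the fault check ahead of the EM_LATCH-full check is an assumption made for illustration, and the separate abort caused by F%STORE_STALL_H is represented only by the `ignore_valid` flag.

```python
from enum import Enum

# Bit positions of M%PA_Q_STATUS_H<2:0> as described in the text.
PA_VALID = 0b001      # <0>: sufficient PA queue entries are valid for the store
PA_MME_FAULT = 0b010  # <1>: memory management fault on the relevant entries
PA_HARD_ERR = 0b100   # <2>: hardware error on the relevant entries


class StoreAction(Enum):
    STALL = "stall"          # hold the microword in S4
    MICROTRAP = "microtrap"  # do not issue the store; take a microtrap
    ISSUE = "issue"          # send the store to the Mbox


def store_action(pa_q_status: int, em_latch_full: bool,
                 ignore_valid: bool = False) -> StoreAction:
    """Decide what S4 does with a microword that specifies a store.

    ignore_valid models the one cycle in which the Ebox treats <0> as
    deasserted (after aborting an Fbox store because of F%STORE_STALL_H).
    """
    valid = bool(pa_q_status & PA_VALID) and not ignore_valid
    if not valid:
        # <0> qualifies every use of <2:1>, so fault bits are not examined.
        return StoreAction.STALL
    if pa_q_status & (PA_MME_FAULT | PA_HARD_ERR):
        return StoreAction.MICROTRAP
    if em_latch_full:  # assumed to be checked after the fault bits
        return StoreAction.STALL
    return StoreAction.ISSUE


def em_abort_on_pipeline_abort(pipeline_abort: bool, request_sent_in_s4: bool,
                               em_latch_was_full: bool) -> bool:
    """E%EM_ABORT_L (modeled active-true) for the pipeline-abort case only."""
    return pipeline_abort and request_sent_in_s4 and not em_latch_was_full
```

Note that a status value with a fault bit set but bit <0> clear yields a stall rather than a microtrap, which mirrors the statement that <0> qualifies every use of <2:1>.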
8.5.17.3 Ibox IPR Access and LOAD PC

The Ebox detects Ibox IPR access requests in S5. At that time it asserts a command strobe to the Ibox. The Mbox will also detect that the IPR access is to the Ibox. It will treat an Ibox IPR read as a NOP. For IPR writes the Mbox forwards the data on M%MD_BUS_H<31:0> in a later cycle. Microcode synchronizes with Ibox IPR writes by issuing an MRQ/SYNC.MBOX after the operation. Once the MRQ/SYNC.MBOX is complete, the microcode knows the Ibox has the data.

In detecting Ibox IPRs, the Ebox treats the entire range of normal IPR addresses from D0 to DF (hex) as Ibox IPRs. The exact test used by the Ebox is: VA<9:6> = D (hex) and VA<24> = 0. The low four bits (VA<5:2>) are sent to the Ibox so it can determine which of its IPRs is specified.

The Ebox requests a load-PC Mbox operation in S4 when the microword specifies LOAD.PC in the MRQ field. In S5 of that microword it asserts a command strobe to the Ibox informing it that the Mbox will soon forward the new PC value. Microcode synchronizes with the load-PC operation by specifying a SEQ.MUX/LAST.CYCLE. The instruction queue must be empty at this time. Once the Ibox adds a new instruction queue entry, a macroinstruction dispatch occurs. While waiting, the Ebox executes a continuous stream of "STALL" microwords (see Section 8.5.20.1).

8.5.18 Ebox Vector Support

The Ebox supports potential future vector architecture integration by providing a configuration status bit which is available for microcode conditional branches. VECTOR_UNIT_PRESENT is a configuration status bit for vector support in the IPR ECR. See Section 8.5.22. Microcode can branch conditionally on the VECTOR_UNIT_PRESENT status.

8.5.19 Fault and Trap Management

There are three kinds of VAX Architecture exceptions: faults, aborts, and traps.
In all cases the PC, PSL, and other data are pushed on the stack, and the address of an exception handling routine is fetched from the SCB. For a trap, the instruction which caused the trap has finished completely, and the PC on the stack points to the next instruction to execute. For a fault, the PC on the stack points to the instruction which caused the exception. For an abort, the PC, PSL, and other state are UNPREDICTABLE; however, whenever possible the NVAX CPU tries to turn aborts into faults. The difference between an abort and a fault is that no important architecturally visible state was modified by the instruction if it was a fault, while some important architecturally visible state may have been modified if it was an abort. (Certain state, for example, the memory location which is pointed to by the stack pointer, can be modified in the case of a fault. Generally speaking, aborts are cases where restarting the instruction may not work because some state which the instruction depended on may have been altered.) The VAX Restart Bit in the machine check stack can be used in determining whether it is safe to treat an abort as a fault.

To cleanly support the concepts described in the previous paragraph, the NVAX CPU has a macroinstruction commit point in the pipeline. Once any microword of the execution microflow has passed this point, the macroinstruction may have modified architectural state. Until the first microword of the microflow passes the commit point, the instruction cannot have modified any architectural state. This point is the boundary between S4 and S5 in the Ebox pipeline. No architecturally visible state is ever modified in S3 or S4 of the pipeline. For example, the PSL and all registers in the register file are written only in S5. Also, memory requests are not issued until a microword specifying one is about to advance into S5, and it is certain there are no S4 stalls.
Each macroinstruction execution microflow obeys the restriction that no microword in that flow modifies any architectural state before it is certain that all the operand specifiers for the instruction have been properly fetched and decoded and that none of the memory accesses which this microflow will request is going to encounter a memory management violation. This does not mean that no microword of the microflow passes the S4/S5 boundary before all this is checked. It only means that the microwords in the microflow don't write memory or any other architecturally visible state until these things are verified. The net result is that macroinstructions which encounter a memory management violation are restartable once the condition has been corrected. (Note that the string instructions don't quite follow these simple rules. Instead, they use a more elaborate set of rules to ensure that they can be restarted after any memory management fault.)

Microflows for macroinstructions which might encounter any kind of fault other than a memory management exception specifically test for the fault condition(s) before modifying any architectural state. This is in addition to checking for memory management faults, as described above.

Ebox hardware forces a reserved opcode fault for Fbox instructions (except MULL) when the Fbox is disabled, in S4 of the first microword of Fbox macroinstruction execution flows. Because this fault is requested in S4 of the first microword, it prevents any architectural state from being altered by these flows.

Hardware errors are handled differently. They generally can't be checked for, and the architecture doesn't require any such checks. Generally aborts occur as a result of a hardware error encountered in a macroinstruction after all memory management checks have been done.
In these cases, some architecturally visible state may have been modified before the macroinstruction has completed.

8.5.19.1 Faults and Errors Detected in S4

When the Ebox detects a fault or error condition in S4 associated with an operation that is about to advance into S5, it signals the Microsequencer to cause a microtrap. The microtrap will cause the Ebox pipeline to abort before it advances. Any operation which was already in S5 completes normally, but the operation in S4 is purged before it enters S5. The operation in S5 may be part of a previous macroinstruction microflow. That macroinstruction is not affected by the microtrap. The microword in S4 may be the first microword to modify architecturally visible state in a given execution microflow, so it must be prevented from advancing into S5.

8.5.19.1.1 Coordinating Ebox and Fbox Faults and Errors

It is necessary that macroinstructions retire in order, even when there is a fault or error detected in S4. The microtrap for the fault or error must be delayed until the macroinstruction connected to the fault or error is next to retire. The current retire queue entry is used by the Ebox to decide whether a microtrap should be signaled. For example, if a branch displacement access fault or error is detected by the Ebox in S4 but the retire queue indicates the Fbox is next to retire a macroinstruction, then the branch macroinstruction came after the one being executed in the Fbox. The branch's fault or error must not cause a microtrap until the Fbox has retired its macroinstruction. Then the microtrap is forced, given that the next entry in the retire queue indicates the Ebox is next to retire a macroinstruction. The microtrap occurs in S4 after the Fbox's last operation advances into S5. The branch is prevented from retiring by the microtrap, since it incurred a fault or error.

The Fbox reports a number of faults and one error to the Ebox.
The Ebox ignores them until the retire queue indicates the Fbox is next to retire a macroinstruction. The reason is the same as in the previous paragraph. The microtrap has to be delayed until the logically preceding macroinstructions are advanced into S5.

Destination queue and PA queue faults and errors can be connected either to the Ebox or the Fbox. It depends on whether the box selected by the retire queue is requesting a destination queue indirect store. If the destination queue is empty and I%IMEM_MEXC_H, I%IMEM_HERR_H, or I%RSVD_ADDR_FAULT_H is asserted and the box indicated by the retire queue is requesting a destination queue store, then a microtrap is signaled immediately. Also, if a destination queue store is requested while the current destination queue entry is valid and M%PA_Q_STATUS_H<1> or M%PA_Q_STATUS_H<2> is asserted, a microtrap is taken (see Section 8.5.17).

8.5.19.1.2 Breaking the S4 Stall

Other than the requirement that instructions retire in order, S4 stalls do not delay microtraps for faults or errors which are in S4. In other words, any S4 stall is broken once a fault or error in S4 is due to the next macroinstruction to retire.

8.5.19.2 Faults and Errors Detected in S3

When the Ebox detects a fault or error condition in S3, it latches it in order to carry it down the pipeline to S4. Unlike most control signals propagating down the pipeline, these fault and error conditions are not forced off when S3 is stalled and S4 isn't stalled. So the S3 stall doesn't have to end for the fault/error condition to propagate to S4. However, the fault/error conditions do stall in S3 if there is an S4 stall. This is because the microword in S4 may be from a previous macroinstruction. That instruction must be allowed to complete normally before the microtrap.
Once the fault or error status has advanced into S4 and the retire queue indicates the Ebox is next to retire a macroinstruction, the Ebox signals the Microsequencer to microtrap.

8.5.19.3 Integer Overflow and Branch Mispredict Traps

There are two traps handled in Ebox hardware: integer overflow traps and branch misprediction traps. Integer overflow traps are VAX Architecture exceptions, while branch misprediction traps are not part of the VAX Architecture.

Both of these traps are handled in the Ebox by causing a microtrap once the last microword of the macroinstruction's execution microflow has entered S5. The microtrap prevents the next microword (which is the first microword of a new microflow) from advancing into S5. This means that the macroinstruction in question completes properly but its successors are not allowed to execute. This is done for integer overflow because this is the effect required by the VAX Architecture. It is done for branch misprediction because this is the effect required to recover from an incorrectly predicted conditional branch.

Integer overflow traps occur when a microword which specifies SEQ.MUX/LAST.CYCLE.OVERFLOW is in S5 and PSL<IV> and PSL<V> are both set. If a microtrap is signaled, it prevents the next microword (or Fbox operation) from advancing into S5; the current operation in S5 completes regardless of whether the microtrap is signaled.

Of the VAX Architecture instructions which can cause integer overflow, MULL and all the CVT instructions are executed in the Fbox (except that MULL is executed in the Ebox when the Fbox is disabled). Integer overflow is detected in the Fbox for these instructions. The Ebox determines that an integer overflow occurred by examining the new PSL<V> bit for every instruction the Fbox retires. To distinguish instructions which can incur integer overflow traps from others the Fbox might retire, the Ebox checks the map specifier supplied by the Fbox.
MULL and CVTs with integer destinations all use the same map specifier, and no other Fbox executed instruction uses that particular specifier. When the instruction being retired by the Fbox uses that particular map specifier, and PSL<IV> and PSL<V> are both set, the Ebox forces the microtrap for integer overflow.

Branch misprediction traps are taken in S5 when the microword specifies SYNC.BDISP.TEST.PRED and the branch condition evaluator determines that the branch was incorrectly predicted. The Ibox prediction is read from the branch queue in S4. The branch condition evaluator result is available in S5. If the prediction doesn't match the actual result, a branch misprediction microtrap is signaled. The microtrap will prevent the microword in S4 from completing. That microword may have been the first microword of the execution microflow for the next macroinstruction. It is not supposed to be executed because the Ibox incorrectly predicted the outcome of the conditional branch. For more on mispredicted branches see Section 8.5.15.5.

If a branch mispredict is detected at the same time as an integer overflow, the integer overflow microtrap is taken. See Section 8.5.19.5.

8.5.19.4 Ebox Microtrap Handling

The Ebox makes a microtrap request by asserting one of a number of microtrap request signals. The Microsequencer causes a microtrap at the end of the current cycle. The Microsequencer has a priority encoder which it uses to decide which microtrap dispatch should be taken when more than one microtrap request is asserted (see Chapter 9). Regardless of which microtrap is taken, the signal E_USQ%PE_ABORT_L is asserted, causing an effective no-op to be inserted into all the control latches in S3, S4, and S5. The result is a pipeline abort.
Early in a pipeline abort cycle (the cycle in which all the control latches in the pipeline are flushed), the Microsequencer asserts E_USQ%PE_ABORT_L. The Ebox responds by flushing the retire queue and the Ebox pipeline. Also, if in the last cycle a new command had been accepted by the Mbox, the Ebox asserts E%EM_ABORT_L, which aborts that command. (E%EM_ABORT_L will abort any EM_LATCH entry.)

In the case of a branch mispredict microtrap, the Ibox has already been signaled by the Ebox that a mispredict occurred. The Ibox has the alternate PC latched, and it will begin fetching from that location as soon as it has unwound the RLOG. See Chapter 7 for more detail.

All microtrap flows except branch mispredict execute a RESET.CPU. This causes a flush or reset of the Ebox queues and register file valid bits, the Fbox, and the Mbox (except the PA queue and EM_LATCH). It also causes E%STOP_IBOX_H to be asserted. These microtrap flows then read the Ibox IPR which causes the RLOG to be unwound and returns the backup PC.

The branch mispredict microflow doesn't execute a RESET.CPU because the Ibox automatically recovers from the branch mispredict and begins fetching instructions from the correct memory location. For the same reason, it does not read the Ibox IPR which causes the RLOG to be unwound and returns the backup PC. For branch mispredict, Ebox hardware asserts all the flush or reset signals that MISC/RESET.CPU would have caused, except that E%STOP_IBOX_H is not asserted.

All microtrap flows synchronize with the Mbox by executing MRQ/SYNC.MBOX. Then they execute a MISC2/FLUSH.PAQ which causes the PA queue in the Mbox to be flushed. This allows any stores which were pending in the EM_LATCH to be finished before the PA queue is flushed.

Certain microcode rules and restrictions apply to the process of gathering state and flushing the various boxes and function units within boxes. See Section 8.5.27.18.
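The two hardware-handled trap conditions from Section 8.5.19.3, and the rule that an integer overflow microtrap wins when it coincides with a branch mispredict, can be sketched as follows. This is a minimal illustration; the function names and the string labels for the dispatches are hypothetical, and the Microsequencer's full priority encoder (Chapter 9) is not modeled.

```python
from typing import Optional, Tuple


def integer_overflow_trap(psl_iv: bool, psl_v: bool,
                          last_cycle_overflow_in_s5: bool,
                          fbox_retire_uses_integer_map_specifier: bool) -> bool:
    """PSL<IV> and PSL<V> both set, with either an Ebox microword specifying
    SEQ.MUX/LAST.CYCLE.OVERFLOW in S5 or an Fbox retire whose map specifier
    is the one shared by MULL and CVTs with integer destinations."""
    return (psl_iv and psl_v
            and (last_cycle_overflow_in_s5
                 or fbox_retire_uses_integer_map_specifier))


def branch_mispredict_trap(test_pred_in_s5: bool, predicted_taken: bool,
                           actual_taken: bool) -> bool:
    """Microword specifies SYNC.BDISP.TEST.PRED and the Ibox prediction
    differs from the branch condition evaluator result."""
    return test_pred_in_s5 and predicted_taken != actual_taken


def select_trap(overflow: bool, mispredict: bool) -> Tuple[Optional[str], bool]:
    """Return (microtrap taken, whether the Ibox was signaled of a mispredict).

    Integer overflow is taken when both occur, but the Ibox is still told
    about the mispredict so it can correct its own state.
    """
    if overflow:
        return "integer_overflow", mispredict
    if mispredict:
        return "branch_mispredict", True
    return None, False
```

The second element of the result reflects that the Ibox is signaled of a mispredict independently of which microtrap dispatch is taken.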
8.5.19.5 Coincidence of Branch Mispredict Trap with Other Traps

It is possible for a branch mispredict trap to happen at the same time as an integer overflow trap. When this occurs, the integer overflow trap is taken because it has higher priority than branch mispredict. However, the Ibox is still signaled that a branch mispredict took place. In the few cycles that it takes for the MISC/RESET.CPU in the integer overflow microflow to arrive at S5 in the Ebox pipeline, the Ibox has begun unwinding the RLOG and correcting the backup PC queue. Once the Ibox starts this process, it delays its own response to the E%STOP_IBOX_H signal (which is asserted by MISC/RESET.CPU) until it has completed the correction process for the mispredicted branch. In this way, the correct backup PC is made available to the integer overflow microflow.

It is also possible for a trace fault to follow a mispredicted branch. In this case, the branch mispredict trap flushes the pipeline (purging the microflow for a trace fault which is following it down the pipeline) and the Ibox unwinds the RLOG and corrects the backup PC queue. Then the branch mispredict microflow executes a LAST.CYCLE which causes the Microsequencer to dispatch to the trace-fault handler. Early in the trace fault microflow, RESET.CPU will be executed, and so E%STOP_IBOX_H may be asserted to the Ibox before it has finished correcting for the mispredicted branch. The Ibox's ability to delay its response to E%STOP_IBOX_H is what allows the Ibox to finish its corrective action.

8.5.19.6 Possible Microtrap Requests

The following table lists the microtrap requests the Ebox can make.
Table 8-17: Ebox Microtrap Requests

Microtrap                       Stage  Cause                                               Request signal

Memory management fault         S4     Ibox signal, MD fault status bits, PA queue         JUPI,.TIJfMMB BRR II
                                       fault bit, or indicated by Fbox signal,
                                       lI'MIMG'l'_I'AVLT...B
Memory access error             S4     Ibox signal, MD fault status bits, or indicated     JUPI,.TlmWJCBIUI
                                       by Fbox signal, J"IWBRR II
Reserved addressing mode        S4     Ibox signal or indicated by Fbox                    E_FLT%RSVD_ADD...
Reserved operand fault          S4     Indicated by Fbox                                   E_FLT%FLOATING_FAULT_H
Reserved instruction fault      S4     For floating point macroinstructions when the       E_FLT%RSVD_INSTR_L
                                       Fbox is not enabled
Branch misprediction trap       S5     Branch result mismatch                              ~ICTJ.
Integer overflow trap           S5     PSL<V> and PSL<IV> both set, and                    JU'LTlMOVll'IJ.
                                       SEQ.MUX/LAST.CYCLE.OVERFLOW or Fbox map
                                       specifier indicates integer result
Floating overflow fault         S4     Indicated by Fbox                                   E_FLT%FLOATING_FAULT_H
Floating underflow fault        S4     Indicated by Fbox                                   E_FLT%FLOATING_FAULT_H
Floating divide-by-zero fault   S4     Indicated by Fbox                                   E_FLT%FLOATING_FAULT_H

8.5.19.7 Fbox Fault Reporting

The four Fbox faults, reserved operand, floating overflow, floating underflow, and floating divide-by-zero, all cause the same dispatch in the Microsequencer. The Ebox latches a priority encoded status when one of these faults is reported by the Fbox. This status is available to the trap handler via a microbranch. The priority order, from highest to lowest, is reserved operand, floating divide-by-zero, floating overflow, and floating underflow. Table 8-18 shows the code for each of the four fault conditions.

Table 8-18: Fbox Fault Codes

Fault                     Priority  Code

Reserved operand          1         0
Floating divide-by-zero   2         1
Floating overflow         3         2
Floating underflow        4         3

8.5.20 Ebox Stalls

The Ebox pipeline is controlled by the Ebox stall logic. It supplies stall signals which gate clocking of data information into each pipeline stage.
The Ebox stall logic stalls only the segments which must be stalled. S5 is never stalled. S3 stalls if S4 is stalled. If S3 is stalled but S4 is not stalled, a NULL microword (or, more generally, an effective no-op) is injected into S4 after the control information in S4 advances into S5.

The clock for each pipeline latch in S3 and S4 is Φ1 gated by a stall control signal. The stall control signals are E%S3_STALL, E%S4_STALL, and E%RMUX_S4_STALL for stages S3, S4, and RMUX S4 respectively. These signals determine whether the corresponding latches are opened in Φ1.

The stall control signals are used to stall a pipe stage. A stage is stalled when it cannot complete its operation for some reason. Generally data needed by the stage is not yet valid, but is expected to become valid after some time. Also, stage N will be stalled when stage N+1 is not ready to receive the output of stage N.

The Ebox pipeline can be stalled while the Fbox uses the RMUX portion of the pipeline to store results. When the Fbox is next to retire an instruction, E%RMUX_S4_STALL, E%RMUX_S4_FLUSH, and E%RMUX_S5_FLUSH depend on the progress of Fbox result store operations. When the Ebox is next to retire, these signals are driven to the same logic level as E%S4_STALL, E%S4_FLUSH, and E%S5_FLUSH, respectively.

The clock for S5 pipeline latches is not gated. However, there is an S5 flush signal for control information and another flush signal for the output of the RMUX.

The S3, S4, and S5 pipeline latches which hold control information also have an asynchronous reset input signal: E%S3_FLUSH, E%S4_FLUSH, E%RMUX_S4_FLUSH, E%S5_FLUSH, and E%RMUX_S5_FLUSH. These signals clear (flush) the control information to an effective no-op. They are asserted after the clock which loads the latch but before the control information is used to alter any state in the Ebox or anywhere else in the NVAX CPU.

The flush control signals are used to insert effective no-ops into a particular stage.
This is done for two distinct reasons. First, when pipeline stage N is stalled but stage N+1 is not stalled, an effective no-op is inserted into stage N+1 as its current operation advances to stage N+2. Secondly, when a pipeline flush is needed, the flush signals are all asserted, so every stage of the pipeline has an effective no-op inserted. The Ebox flushes the pipeline when the Microsequencer asserts E_USQ%PE_ABORT_L (which indicates that a microtrap dispatch has been initiated).

Figure 8-8 shows control and data path latches and how the various pipeline control signals are typically connected.

Figure 8-8: Ebox Pipeline Latches
[Figure: data and control latches for stages S3, S4, and S5, including the RMUX data-part latches; the control latches hold the microword, its decodes, and the Fbox control signals.]

Table 8-19 shows the various pipeline stall and flush combinations which can occur. An important factor in determining the pipeline controls is whether the Ebox or Fbox is next to retire a macroinstruction. This status is given by the current retire queue entry.

Table 8-19: Ebox Pipeline Stall and Flush Cases

Ebox Next to Retire a Macroinstruction

Pipeline Control Case          S3 Clock/          S4 Clock/          RMUX S4 Clock/     S5 Flush/
                               S3 Flush           S4 Flush           RMUX S4 Flush      RMUX S5 Flush

No Stalls                      run/don't flush    run/don't flush    run/don't flush    don't flush/don't flush
S3 Stall (with no S4 stall)    stall/don't flush  run/flush          run/flush          don't flush/don't flush
S4 Stall                       stall/don't flush  stall/don't flush  stall/don't flush  flush/flush
Pipeline Abort                 run/flush          run/flush          run/flush          flush/flush

Fbox Next to Retire a Macroinstruction (1)
Pipeline Control Case                    S3 Clock/               S4 Clock/               RMUX S4 Clock/           S5 Flush/
                                         S3 Flush                S4 Flush                RMUX S4 Flush            RMUX S5 Flush

Ebox requesting RMUX (in S4) (2),        stall/don't flush       stall/don't flush       see note 3/don't flush   flush/see note 4
or Ebox S4 stall
Ebox not requesting RMUX (in S4)         see note 5/don't flush  see note 5/don't flush  see note 3/don't flush   don't flush/see note 4
and no S3 stall (2)
Ebox not requesting RMUX (in S4)         stall/don't flush       see note 5/flush        see note 3/don't flush   don't flush/see note 4
with S3 stall (2)

(1) If the Fbox is next to retire a macroinstruction, then the RMUX always selects the Fbox even if the Fbox doesn't request it.
(2) The Ebox is requesting the RMUX if the microword in S4 specifies anything other than NONE in the DST field.
(3) Run if the Fbox is not requesting the RMUX or if the Fbox is requesting and there is no stall on the operation. Stall if the Fbox is requesting a store and/or retire and there is a stall on the operation.
(4) Don't flush if the Fbox is not requesting the RMUX or if the Fbox is requesting and there is no stall on the operation. Flush if the Fbox is requesting a store and/or retire and there is an RMUX S4 stall on the operation.
(5) Stall if the RMUX S4 clock is stalled. Otherwise run.

As is shown in Table 8-19, when an effective no-op is inserted into S4 during an S3 stall, S5 does not need to be flushed. The effective no-op in S4 will propagate into an effective no-op in S5.

VERIFICATION NOTE
The interaction between stalls and microbranches is different than in Rigel. It should be verified carefully that all microbranch tests work properly when S3 is stalled and S4 is not stalled.

8.5.20.1 The STALL Microword

In any cycle that the instruction queue is empty (and the Ibox is not providing a bypassed instruction queue entry directly to the Microsequencer), the Microsequencer fetches the STALL microword. This microword specifies no operation, except SEQ.MUX/LAST.CYCLE, and can't cause a stall anywhere in the pipeline.
This allows the microwords already in the pipeline to continue even when the Ibox is temporarily unable to supply new instruction execution dispatches. See Chapter 9 for more detail.

8.5.20.2 Field Queue Stall

When microcode uses the field queue, it executes a 4-way conditional microcode branch on two conditions: a not-empty condition and the I-bit status in the current field queue entry. Only three of the four branch outcomes are actually possible, because the output of the field queue is forced off if the queue is empty. The Ibox makes an entry in the field queue when it processes a field operand. While the queue is empty, the microcode loops, continuously repeating the same conditional branch. This is very much like a stall condition in that the pipeline stages all have the same operation in them in every cycle while the field queue remains empty. See Section 8.5.15.8 for more on field queue operation.

8.5.20.3 Ebox Stall Conditions

The Ebox stall logic detects the need for stalls in various parts of the pipeline. The stalls must be detected in time to gate φ1 latches at the start of the next cycle. This section assumes the Ebox is next to retire a macroinstruction. The next section deals with stalls with the Fbox next to retire a macroinstruction.

The Ebox pipeline stalls in S3 when it is accessing some data in the register file which is not valid or when it requires an entry in the source queue which is not available. Up to two source queue entries and up to two MD or Wn registers can be accessed at once. The S3 stall lasts until all the accessed elements are valid and available.

Wn and MD registers have valid bits associated with them. A register is valid only if this bit is set. A register's valid bit is not set when a memory read has been initiated for that register and hasn't yet completed. The valid bit is set by the Mbox when the read completes.
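The valid-bit rule can be illustrated with a small behavioral model. This is a minimal sketch, not the Ebox logic: the class name, the method names, the register names, and the assumption that every register starts out valid are all hypothetical and chosen only for illustration.

```python
class RegisterFile:
    """Toy model of MD/Wn valid bits (all names here are hypothetical)."""

    def __init__(self, names):
        # Assumption for this sketch: every register starts out valid.
        self.valid = {name: True for name in names}

    def start_read(self, name):
        # A memory read has been initiated for this register, so its
        # valid bit is not set until the read completes.
        self.valid[name] = False

    def complete_read(self, name):
        # Models the Mbox setting the valid bit when the read completes.
        self.valid[name] = True

    def s3_stalls(self, accessed):
        # S3 stalls until every accessed register is valid.
        return any(not self.valid[name] for name in accessed)


rf = RegisterFile(["MD0", "MD1", "W0"])
rf.start_read("MD0")
stalled_before = rf.s3_stalls(["MD0", "W0"])   # read still outstanding
rf.complete_read("MD0")
stalled_after = rf.s3_stalls(["MD0", "W0"])    # read has completed
```

The model covers only the register valid bits; the source queue availability check that also contributes to S3 stalls is not represented.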
The source queue read and write pointers are examined to determine when there are sufficient source queue entries to satisfy the microword in S3. Either one or two entries might be needed. Only one is needed if the source queue is referenced in the A or B microword fields but not both. Two are needed if the source queue is referenced in both microword fields. The Ebox stalls in S3 if fewer entries than needed are present. In particular, if only one entry is needed, then the Ebox only stalls if the source queue is completely empty, and if two entries are needed, the Ebox stalls until two entries are made.

The Ebox stalls in S3 if the microword in S3 is sending operands to the Fbox and the Fbox is indicating that it can't accept any more operands.

The Ebox stalls in S3 if the microword in S3 is accessing at least one GPR which is marked in the Fbox destination scoreboard as having an Fbox result store pending.

Given that the retire queue indicates the Ebox is next to retire a macroinstruction, the Ebox stalls in S4 if both of the following are true:

• The microword in S4 specifies DST/DST.
• The destination queue is empty, or the destination queue isn't empty, the destination queue entry indicates a memory store, and the current PA queue entry is not valid.

The destination queue read and write pointers are examined to determine when the destination queue is empty. The current PA queue entry is valid when the Mbox has completed memory management checks for the store reference. The Mbox asserts M%PA_Q_STATUS_H<0> when the PA queue entry is valid.

The Ebox stalls in S4 if the microword in S4 initiates a memory operation and the Mbox is already working on an Ebox-initiated memory operation. The EM_LATCH in the Mbox holds the current Ebox memory request. It is not available until the Mbox has finished that request.
The Mbox provides a status which informs the Ebox that the EM_LATCH is empty. Destination queue indirect stores that are memory stores go in the EM_LATCH like any other Ebox memory access. So EM_LATCH-full S4 stalls can occur even when the microword in S4 specifies MRQ/NOP.

The Ebox stalls in S4 if the microword in S4 synchronizes with the branch queue and the branch queue is empty. The branch queue read and write pointers are examined to determine when the branch queue is empty.

The Ebox stalls in S4 if the microword in S4 specifies MISC2/F.DEST.CHECK and the entry in the destination queue needed to complete this operation is not yet valid. This stall ends when the Ibox writes the needed entry. The destination queue has a second access pointer, the FDest pointer. This pointer is compared to the destination queue write pointer to determine when the entry needed for the MISC2/F.DEST.CHECK is available.

When it is next to retire an instruction, the Fbox can cause an S4 stall by asserting F%STORE_STALL_H, indicating that the Fbox is stalling for this cycle because the data on F%FBOX_RESULT_H is incorrect or there is a data exception to be evaluated in the Fbox's last stage. F%STORE_STALL_H is only supposed to be asserted if the Fbox is storing a result (i.e., F%STORE_H is asserted).

8.5.20.4 Fbox and RMUX Related Stall Conditions

The Ebox has several Fbox related stalls. When the Fbox requests the RMUX the Ebox may have to stall the Fbox. Also, depending on which box (Fbox or Ebox) is next to retire a macroinstruction, several different Ebox stalls may occur.

NOTE
When the microcode needs to stall in S3 waiting for an Fbox operation to complete, one or two microwords which specify DST/WBUS should precede the microword needing the Fbox operation to be complete. Any microword specifying DST/WBUS will stall in S4 until the Fbox retires its instruction. The appropriate amount of delay depends on which result is being awaited.
The Ebox stalls in S4 if the current retire queue entry specifies that the Fbox is next to retire a macroinstruction and the Ebox is requesting the RMUX. The Ebox is requesting the RMUX if the microword in S4 specifies anything other than NONE in the DST field. Otherwise it is not requesting the RMUX.

The Ebox stalls the Fbox (by asserting a stall signal before the end of the cycle) when the Fbox is requesting the RMUX and one of the following four conditions is true (note that if the Fbox is next to retire, the RMUX portion of the Ebox pipeline is stalled whenever the Ebox stalls the Fbox):

• The Ebox is next to retire a macroinstruction.
• The Fbox is next to retire a macroinstruction, is requesting to use the destination queue, and the current destination queue entry is not valid.
• The Fbox is next to retire a macroinstruction, is requesting to use the destination queue, the current destination queue entry is valid and indicates a memory destination, and the PA queue is not valid.
• The Fbox is next to retire a macroinstruction, is requesting to use the destination queue, the current destination queue entry is valid and indicates a memory destination, the PA queue is valid, and the EM_LATCH is full.

The Ebox determines all these conditions as described in the previous section. No part of the Ebox pipeline is stalled by an Fbox request if the Ebox is next to retire a macroinstruction.

The Fbox can cause an RMUX S4 stall by asserting F%STORE_STALL_H, indicating that the Fbox is stalling for this cycle because the data on F%FBOX_RESULT_H is incorrect or there is a data exception to be evaluated in the Fbox's last stage. (This also causes an S4 stall.) F%STORE_STALL_H is only supposed to be asserted if the Fbox is storing a result (i.e., F%STORE_H is asserted). The Ebox is always stalled in S4 if an RMUX S4 stall occurs.
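The four conditions under which the Ebox stalls the Fbox nest naturally, and can be restated as a short decision function. The following is only a sketch of the rules as worded in this section; the data class names, field names, and function names are hypothetical and do not correspond to signals in the design.

```python
from dataclasses import dataclass


@dataclass
class FboxRequest:
    requesting_rmux: bool          # Fbox is requesting the RMUX this cycle
    uses_dest_queue: bool = False  # Fbox is requesting the destination queue


@dataclass
class EboxState:
    ebox_next_to_retire: bool          # from the current retire queue entry
    dq_entry_valid: bool = True        # current destination queue entry valid
    dq_entry_is_memory: bool = False   # entry indicates a memory destination
    pa_queue_valid: bool = True        # current PA queue entry valid
    em_latch_full: bool = False        # Mbox still holds an Ebox request


def ebox_stalls_fbox(req: FboxRequest, st: EboxState) -> bool:
    """Return True when the Ebox would stall the Fbox, per the four cases."""
    if not req.requesting_rmux:
        return False
    if st.ebox_next_to_retire:
        return True                    # case 1
    if not req.uses_dest_queue:
        return False
    if not st.dq_entry_valid:
        return True                    # case 2
    if not st.dq_entry_is_memory:
        return False
    if not st.pa_queue_valid:
        return True                    # case 3
    return st.em_latch_full            # case 4


def ebox_s4_stall_for_rmux(st: EboxState, dst_field: str) -> bool:
    """Ebox S4 stall: Fbox is next to retire and the Ebox requests the RMUX."""
    return (not st.ebox_next_to_retire) and dst_field != "NONE"
```

For example, `ebox_stalls_fbox(FboxRequest(True, True), EboxState(False, dq_entry_is_memory=True, em_latch_full=True))` evaluates the fourth case. The F%STORE_STALL_H path is deliberately left out of this sketch.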
8.5.21 Miscellaneous Operations

The microword allows for a number of miscellaneous control and data movement operations. Most of them have been described elsewhere in this chapter, and are only summarized here. The following table lists all the miscellaneous operations by microword field and gives a description. Any of these fields can also specify NOP (no operation).

Table 8-20: Ebox Miscellaneous Operations

MISC Field - Both Standard and Special Microword Formats

Mnemonic                Description
DL.BYTE                 DL <- byte; change affects next microword
DL.WORD                 DL <- word; change affects next microword
DL.LONG                 DL <- long; change affects next microword
RESTART.IBOX            restart Ibox operand specifier parsing in S5
RESTART.MBOX            restart Mbox operand processing in S5
RESET.CPU               flush Mbox and Fbox, initialize register file valid bits, flush Ebox queues, all in S6; stop Ibox in S5
CLR.PERF.COUNT          clear the performance counters in S5. See Chapter 18
INCR.PERF.COUNT         increment a performance counter in S5 if ECR<PMF_EMUX> is a certain value. See Chapter 18
CLR.STATE.3-0           clear flags<3:0>; change affects next microword
SET.STATE.0             set flag<0>; change affects next microword
SET.STATE.1             set flag<1>; change affects next microword
SET.STATE.2             set flag<2>; change affects next microword
MULL                    disables reserved instruction fault normally generated for Fbox instructions when the Fbox is not enabled. Used in MULL2 and MULL3 so microcode can execute the macroinstruction instead of the Fbox.
CONST.10.BIT            special constant generation mode. See Section 8.5.2
LOAD.SC.FROM.A          SC <- E_BUS%ABUS_L<4:0>
LOAD.MPU.FROM.B         MPU <- E_BUS%BBUS_L<29:18>
LOAD.PSL.CC.IIIP        update PSL CCs:
                          PSL<N,Z,V> <- S5 Condition Codes <N,Z,V>
                          PSL<C> <- PSL<C> (Unchanged)
LOAD.PSL.CC.JIZJ        update PSL CCs:
                          PSL<N> <- S5 Condition Code <N> .XOR. S5 Condition Code <V>
                          PSL<Z> <- S5 Condition Code <Z>
                          PSL<V> <- 0
                          PSL<C> <- .NOT. S5 Condition Code <C>
LOAD.PSL.CC.IIII        update PSL CCs:
                          PSL<N,Z,V,C> <- S5 Condition Codes <N,Z,V,C>
LOAD.PSL.CC.IIIJ        update PSL CCs:
                          PSL<N,Z,V> <- S5 Condition Codes <N,Z,V>
                          PSL<C> <- .NOT. S5 Condition Code <C>
LOAD.PSL.CC.IIIP.QUAD   update PSL CCs:
                          PSL<Z> <- PSL<Z> .AND. S5 Condition Code <Z>
                          PSL<N,V> <- S5 Condition Codes <N,V>
                          PSL<C> <- PSL<C> (Unchanged)
LOAD.PSL.CC.PPJP        update PSL CCs:
                          PSL<N,Z,V> <- PSL<N,Z,V> (Unchanged)
                          PSL<V> <- .NOT. S5 Condition Code <Z>
CLR.VECT.RDY            S3 clear of VECTOR_RDY condition. See Section 8.5.18

MISC1 Field - Special Format Microword

Mnemonic                Description
RETIRE.INSTRUCTION      generate Ibox retire instruction signal in S5
FLUSH.VIC               flush Ibox virtual instruction cache in S5
FLUSH.BPC               flush Ibox branch prediction cache in S5
FOP.VALID               Fbox operand valid on E%ABUS_H<31:0>, E%BBUS_H<31:0>, or both
FLUSH.PCQ               flush PC queue in Ibox
CLR.STATE.5-4           clear flags<5:4>; change affects next microword
SET.STATE.3             set flag<3>; change affects next microword
SET.STATE.4             set flag<4>; change affects next microword
SET.STATE.5             set flag<5>; change affects next microword

MISC2 Field - Special Format Microword

Mnemonic                Description
F.DEST.CHECK            access destination queue and make entry in Fbox destination scoreboard
FLUSH.PAQ               flush PA queue in Mbox

MRQ Field - Both Standard and Special Format Microwords

Mnemonic                Description
SYNC.BDISP              stall if branch displacement invalid in S4; microtrap if fault
SYNC.BDISP.RETIRE       stall if branch displacement invalid in S4; microtrap if fault; S5 retire entry
SYNC.BDISP.TEST.PRED    stall if branch displacement invalid in S4; microtrap if fault; S5 microtrap if mispredict and retire entry
LOAD.PC                 load new PC (always followed by MISC/RESTART.IBOX)

DISABLE.RETIRE Field - Special Format Microword

Mnemonic                Description
YES                     disable the retire-macroinstruction and retire-retire-queue-entry effects of SEQ.MUX/LAST.CYCLE and SEQ.MUX/LAST.CYCLE.OVERFLOW
NO                      enable the retire-macroinstruction and retire-retire-queue-entry effects of SEQ.MUX/LAST.CYCLE and SEQ.MUX/LAST.CYCLE.OVERFLOW

The MISC1/RETIRE.INSTRUCTION function signals the Ibox to retire an instruction in order to bring the backup PC queue and the RLOG into the correct state for restoring GPRs and providing the backup PC after a microtrap. It does not retire a retire queue entry. Therefore MISC1/RETIRE.INSTRUCTION must always be followed by a MISC/RESET.CPU before the next macroinstruction execution dispatch (via SEQ.MUX/LAST.CYCLE).
The MISC/RESET.CPU function causes E%STOP_IBOX_H to be asserted in S5 and E%FLUSH_MBOX_H, E_MSC%FLUSH_EBOX_H, and E%FLUSH_FBOX_H to be asserted in S6.

8.5.22 Ebox IPRs

The Ebox implements two IPRs. They are IPRs 124-125 (decimal), PCSCR and ECR. ECR is a possible source of E_BUS%ABUS_L<31:0>, accessed by specifying ECR in the A field of the microword. ECR and PCSCR are also possible destinations of E_BUS%WBUS_L<31:0>, written by specifying PCSCR or ECR in the DST field of the microword. On writes, the entire register is written, regardless of the current DL value.

Figure 8-9: IPR 7C (hex), PCSCR
[Figure: 32-bit register layout showing the fields NONSTANDARD_PATCH, PATCH_REV, DATA, RWL_SHIFT, PCS_WRITE, PCS_ENB, and PAR_PORT_DIS; all other bits are 0.]

8.5.22.1 IPR 7C (hex), Patchable Control Store Control Register

The PCSCR is used to load control store patches. Chapter 9 describes the patchable control store function in detail. Figure 8-9 and Table 8-21 show the bit fields and give descriptions.

Table 8-21: PCSCR Field Descriptions

Name                Extent  Type   Description
PAR_PORT_DIS        8       RW,0   Writing a 1 disables control by the testability parallel port of the section of the internal scan used in loading the control store CAM (content addressable memory) and RAM. This is necessary when using this register to load the control store CAM and RAM.
PCS_ENB             9       RW,0   Enables the control store CAM and RAM so that patches are fetched and supersede the control store ROM.
PCS_WRITE           10      WO     The event of writing a 1 to this bit causes the PCS scan chain contents to be written into the control store CAM and RAM. The control signal which enables the write returns to the inactive state automatically; there is no need for software to write a 0 to this bit after writing a 1. This bit always reads as 0.
RWL_SHIFT           11      WO     The event of writing a 1 to this bit causes the PCS scan chain to shift by one. The control signal which enables the shift returns to the inactive state automatically; there is no need for software to write a 0 to this bit after writing a 1. This bit always reads as 0.
DATA                12      RW,0   This bit holds the data which is shifted into the PCS scan chain when a 1 is written to RWL_SHIFT. By repeatedly setting DATA and writing a 1 to RWL_SHIFT, software can shift any data pattern into the PCS scan chain.
NONSTANDARD_PATCH           RW     This bit is set by software after loading a microcode patch. If it is 1, it indicates a non-standard microcode patch has been loaded. This bit is returned as bit <8> in a read from the SID processor register, except that 0 is substituted for this bit in microcode for a SID read if PCSCR<PCS_ENB> is 0.
PATCH_REV           28:24   RW     This field is set by software after loading a microcode patch. It indicates the revision of the standard microcode patch which has been loaded. This field is returned as bits <13:9> in a read from the SID processor register, except that 0 is substituted for this field in microcode for a SID read if PCSCR<PCS_ENB> is 0.

1. This bit or field is not implemented in pass 1 chips.

8.5.22.2 IPR 7D (hex), Ebox Control Register

The ECR is used to configure certain Ebox functions. Figure 8-10 and Table 8-22 show the bit fields and give descriptions.
Figure 8-10: IPR 7D (hex), ECR
[Figure: 32-bit register layout showing the fields PMF_CLEAR, PMF_LFSR, PMF_EMUX, PMF_PMUX, PMF_ENABLE, FBOX_TEST_ENABLE, ICCS_EXT, TIMEOUT_CLOCK, TIMEOUT_TEST, TIMEOUT_OCCURRED, FBOX_ST4_BYPASS_ENABLE, FBOX_ENABLE, and VECTOR_PRESENT; all other bits are 0.]

Table 8-22: ECR Field Descriptions

Name                     Extent  Type   Description
VECTOR_PRESENT           0              This bit is for vector unit support in a future version of this chip.
FBOX_ENABLE              1       RW,0   This bit is set to a 1 by configuration code to enable the Fbox.
TIMEOUT_EXT              2       RW,0   This bit is set to a 1 by configuration code to select an external timebase for the S3 stall timeout timer.
FBOX_ST4_BYPASS_ENABLE   3       RW,0   This bit is set to a 1 by configuration code to enable Fbox Stage 4 bypass.
TIMEOUT_OCCURRED         4       WC     This bit indicates that an S3 stall timeout occurred. Writing it with a 1 clears it.
TIMEOUT_TEST             5       RW,0   If this bit is a 1, the S3 stall timeout circuit counts cycles instead of cycles in which E%TIMEOUT_ENABLE_H is asserted. In this test mode the S3 stall timeout time is roughly 50 microseconds instead of roughly 3 seconds.
TIMEOUT_CLOCK            6              This bit is the most significant bit of the timeout base counter. It is used as an indication that E%TIMEOUT_ENABLE_H is functioning (though some logic is not covered by this test). It should be 1 half of the time and 0 the other half of the time. The period of the oscillation is 65536 times the cycle time of the chip or of the waveform on P%OSC_TC1_H, depending on ECR<TIMEOUT_EXT>. For ECR<TIMEOUT_EXT> set to 0 and a 14 nsec cycle time, this is a period of roughly 900 microseconds.
ICCS_EXT                 7       RW,0   This bit is set by configuration code to select the interval timer mode. When it is 0, the CPU implements a subset interval timer with ICCS<6> maintained on the chip. When set to 1, the CPU implements a full interval timer with ICCS, NICR, and ICR processor registers implemented off chip. See Chapter 10.
FBOX_TEST_ENABLE         13             When this bit is set to a 1, E%FBOX_TEST_ENB_H is asserted. This puts the Fbox in a test mode in which data is passed from stage to stage unaltered.
PMF_ENABLE               16      RW,0   This bit is the internal implementation of the PME processor register. See Section 18.2.4 for more detail.
PMF_PMUX                 18:17          This field selects the source of the events counted by the performance monitoring facility, when enabled, to be Ibox, Ebox, Mbox, or Cbox. See Section 18.2.3 for more detail.
PMF_EMUX                 21:19   RW,0   This field selects the Ebox events counted by the performance monitoring facility, when the performance monitoring facility is configured to count Ebox events. See Table 18-3 for more detail.
PMF_LFSR                 22      RW,0   This bit enables the E%WBUS_H<31:0> LFSR (linear feedback shift register) accumulator. This is a testability feature. See Section 8.5.26.2 for more detail.
PMF_CLEAR                31      WO     Writing a 1 to this bit clears the performance monitoring facility counters (which are also the E%WBUS_H<31:0> LFSR accumulator). It is not implemented in hardware. Microcode handles this function.

8.5.23 Initialization

The main mechanism for Ebox initialization is the power-up microtrap, and the MISC/RESET.CPU which occurs in the first microword of this microtrap flow.
When this trap occurs, the Microsequencer will assert E_USQ%PE_ABORT_L, aborting the Ebox pipeline as it does for any microtrap. None of the registers in the register file or elsewhere in the Ebox are cleared on initialization, except that IPR bits are cleared where indicated by the bit type (see Section 8.5.22). The state flags are also cleared by reset.

The Ebox asserts E%STOP_IBOX_H, E_MSC%FLUSH_EBOX_H, E%FLUSH_MBOX_H, and E%FLUSH_FBOX_H during reset. This is the same effect as MISC/RESET.CPU. See the sections on initialization for each of the boxes for more detail.

8.5.24 Timing

TBS. A timing diagram for major Ebox signals will someday appear here.

8.5.25 Error Detection

Ebox handling of memory management faults and hardware errors detected by the Mbox while processing an Ebox or Ibox request is covered in Section 8.5.19 and Section 8.5.17.

8.5.25.1 S3 Stall Timeout

The Ebox implements an S3 stall timeout timer. The timeout time is shown in Table 8-23. Figure 8-11 shows all the NVAX timeout timers, including those implemented in the Cbox. The Cbox timeout timers are shown because they use E%TIMEOUT_BASE_H as their timebase. See Section 13.4.3.4 for more detail on the Cbox timeout timers.

The timeout timer input is E%TIMEOUT_BASE_H, which is created internally by dividing the CPU clock by 65536. As an alternative, in systems which require longer timeout times than NVAX implements, this timer can use an externally supplied timebase. To select the external timebase, K%EXT_TMBS_H, ECR<TIMEOUT_EXT> is set to 1. In this case the base counter counts cycles of K%EXT_TMBS_H instead of the NVAX CPU internal clock. K%EXT_TMBS_H is a synchronized version of the signal received on pin P%OSC_TC1_H. Note that P%OSC_TC1_H is synchronized in the clock section to NDAL clocks and must therefore be driven with a clock signal which is high for longer than one NDAL cycle and low for longer than one NDAL cycle. For a square wave clock waveform this implies a frequency of 11.9 MHz or less.

Table 8-23: S3 Stall Timeout Values in Normal Mode

Cycle time    Timeout Granularity    S3 Stall timeout
10-ns NVAX    655 microseconds       2.6837 (min) to 2.68435 (max) seconds
12-ns NVAX    786 microseconds       3.22044 (min) to 3.22123 (max) seconds
14-ns NVAX    917 microseconds       3.75718 (min) to 3.7581 (max) seconds

Figure 8-11: NVAX Timeout Counters
[Figure: the 16-bit Ebox base counter, cleared by reset, producing E%TIMEOUT_BASE_H (master update enable) or counting the synchronized P%OSC_TC1_H; the Ebox counter controlled by E_FLT%S3_TIMEOUT_STALL_H and ECR<TIMEOUT_TEST>; and the READ0 and READ1 counters, each with clear and enable inputs.]

In every cycle the Ebox counter increments if one of the following is true:

• S3 is stalled.
• The microword in S3 is the STALL microword (as determined by the piped version of E_USQ%IQ_STALL_H sent from the Microsequencer).
• The field queue is being accessed via a microcode conditional branch and is empty (E_FLQ%FQ_STALL_H is asserted).

These conditions are accumulated into one condition, E_FLT%S3_TIMEOUT_STALL_H, in the fault logic section of the Ebox. If none of the above conditions is true, the Ebox counter is reset to 0. If the counter reaches its maximum value and overflows, an immediate asynchronous hardware error microtrap is forced. The microtrap breaks the Ebox stall by aborting the pipeline.

When the S3 stall timeout timer overflows, forcing a microtrap, the signal E%S3_TIMEOUT_H is asserted for one cycle. This causes the chip reset logic to reset the Mbox and Cbox. Microcode, in handling the asynchronous hardware error microtrap, must also do MISC/RESET.CPU in order to properly reset the Mbox.

The Ebox timeout counter treats cycles in which the pipeline advances the STALL microword into S3 as S3 stall cycles.
If the Microsequencer sends STALL microwords into the pipeline continuously, the timer will eventually time out. This is the case when the instruction queue in the Microsequencer remains empty forever. Similarly, if microcode is in an infinite loop, conditionally branching on the field queue contents, an S3 stall timeout will occur. Any true S3 or S4 stall which lasts forever will cause an S3 stall timeout.

It is expected that some hardware failures within the NVAX CPU could cause the Ebox to get out of sync with the Ibox, Mbox, or Fbox. This could result in the Ebox waiting forever for an event which will never happen. This timeout timer causes a machine check exception to occur instead of allowing the CPU to simply hang.

8.5.25.1.1 Testing the S3 Stall Timeout Timer

The Ebox timeout counter may be configured for testing by writing a 1 to ECR<TIMEOUT_TEST>. When this bit is 1, the Ebox counter counts NVAX CPU internal clock cycles instead of cycles of E%TIMEOUT_BASE_H. Table 8-24 gives the timeout times in test mode. See the timeout counter test discussion in Section 13.4.3.4 for detail on how to cause a timeout for test purposes. The timeout will cause the asynchronous hardware error machine check (see Chapter 15).
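The normal-mode and test-mode timeout values can be cross-checked numerically from the cycle counts listed in Table 8-25. The following is a minimal sketch under that assumption; the function name and its parameters are illustrative only and are not part of the specification.

```python
def s3_timeout_bounds(cycle_ns, test_mode=False):
    """Return (granularity, minimum, maximum) in seconds for one cycle time.

    Cycle counts are taken from Table 8-25: the normal-mode timebase
    advances once every 2**16 cycles and the timeout falls between
    2**28 - 2**16 and 2**28 cycles; in test mode the counter counts
    every cycle and times out after 2**12 - 1 to 2**12 cycles.
    """
    if test_mode:
        granularity, low, high = 1, 2**12 - 1, 2**12
    else:
        granularity, low, high = 2**16, 2**28 - 2**16, 2**28
    cycle_s = cycle_ns * 1e-9
    return granularity * cycle_s, low * cycle_s, high * cycle_s


for cycle_ns in (10, 12, 14):
    gran, low, high = s3_timeout_bounds(cycle_ns)
    print(f"{cycle_ns}-ns normal: {gran * 1e6:.0f} us, {low:.5f} to {high:.5f} s")
    gran, low, high = s3_timeout_bounds(cycle_ns, test_mode=True)
    print(f"{cycle_ns}-ns test:   {gran * 1e9:.0f} ns, {low * 1e6:.2f} to {high * 1e6:.3f} us")
```

The spread between the minimum and maximum in each mode is one timebase period, which reflects the uncertainty in where the stall begins relative to the timebase.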
Table 8-24: S3 Stall Timeout Values in Test Mode

Cycle time    Timeout Granularity    S3 Stall timeout
10-ns NVAX    10 nanoseconds         40.95 (min) to 40.96 (max) microseconds
12-ns NVAX    12 nanoseconds         49.14 (min) to 49.152 (max) microseconds
14-ns NVAX    14 nanoseconds         57.33 (min) to 57.344 (max) microseconds

DERIVATION OF TIMEOUT VALUES
The timeout values given above were derived as follows:

Table 8-25: Derivation of NVAX Timeout Values

Mode      Timeout Granularity (in NVAX cycles)    S3 Stall timeout (in NVAX cycles)
Normal    2**16                                   2**28-2**16 (min) to 2**28 (max)
Test      1                                       2**12-1 (min) to 2**12 (max)

8.5.26 Testability

This section describes the testability features in the Ebox.

8.5.26.1 Parallel Port Test Features

The microaddress currently being used to access the control store is visible on the parallel port. Much information about Ebox execution can be inferred from the sequence of microaddresses seen on the parallel port. See Section 9.5.

No other Ebox signal is visible directly at the parallel port. Quite a few are visible through the internal observability scan chain controlled via the parallel port controlled inputs. Table 8-26 shows these signals. Timing information and a description are given for each signal.

The scan chain loads input data in φ4. If a signal is not ready to be latched in φ4, it has to be delayed before being loaded into the scan chain. This implies that the particular signal's value sampled by the scan chain is from one cycle earlier than the cycle in which the scan chain was loaded. This is shown in the timing column of Table 8-26.

Table 8-26 lists the scan chain data bits in the order in which they would appear at the parallel port. The value of E_RGF%ERROR_H appears first and the value of F%STORE_STALL_H appears last.
Table 8-26: Ebox Observe Scan Signals

Schematic Signal    Timing     Description
E_RGF%ERROR_H       delayed    A 1 value means the Ebox is detecting a hardware error associated with the current MD read (including bypassed MD reads) or with a current S3 Ibox-to-Ebox queue access (instruction queue, source queue, or field queue).
                    delayed    A 1 value means the Ebox is detecting a memory management fault associated with the current MD read (including bypassed MD reads) or with a current S3 Ibox-to-Ebox queue access (instruction queue, source queue, or field queue).
                               A 0 value means the Ibox is signaling a memory management exception associated with one of the Ibox-to-Ebox queues (instruction queue, source queue, field queue, branch queue, or destination queue).
                               A 0 value means the Fbox is currently requesting that no more input data be sent.
                               A 0 value means the Ibox is signaling a hardware error associated with one of the Ibox-to-Ebox queues (instruction queue, source queue, field queue, branch queue, or destination queue).
                               A 1 value means the destination queue is being accessed and there isn't a valid entry.
                               A 0 value means the Fbox is requesting a store in this cycle.
                    delayed    A 1 value means the Ibox register file write is being bypassed to E_BUS%BBUS_L.
                    delayed    A 1 value means the data on E_BUS%BBUS_L is valid (otherwise a MD or WN stall would occur).
                    delayed    A 1 value means the Ibox register file write is being bypassed to E_BUS%ABUS_L.
                    delayed    A 1 value means the data on E_BUS%ABUS_L is valid (otherwise a MD or WN stall would occur).
                               A 0 value means the Fbox destination scoreboard in the destination queue has a hit (i.e., a current source queue based register file read is to a register the Fbox will update in the future).
                               A 0 value means the current source queue read(s) is (are) accessing an empty location - one kind of S3 stall.
                               A 1 value means the Ebox is next to retire an instruction, not the Fbox.
                    delayed    A 0 value means the Ebox is signaling the microsequencer to initiate a memory management fault microtrap.
                    delayed    A 0 value means the Ebox is signaling the microsequencer to initiate a synchronous hardware error microtrap.
                               A 1 value means the Ebox is stalled in S4 doing the F.DEST.CHECK operation and the destination queue doesn't contain the necessary entry or entries.
VSS                            Always a 1 value.
                               A 0 value means the Ebox is recognizing a hardware error because the Mbox wrote a working register or GPR while asserting M%HARD_ERR_H.
                               A 0 value means the Ebox is detecting a memory management fault with a current S3 Ibox-to-Ebox queue access (instruction queue, source queue, or field queue).
                               A 0 value means the Ebox is detecting a hardware error with a current S3 Ibox-to-Ebox queue access (instruction queue, source queue, or field queue).
                               A 0 value means the most significant bit of this field is a 1. See Table 8-6. This data is valid for the condition code alteration in the current cycle (S5), provided it is an Fbox instruction being retired.
                               A 0 value means the least significant bit of this field is a 1. See Table 8-6. This data is valid for the condition code alteration in the current cycle (S5), provided it is an Fbox instruction being retired.
                               A 0 value means the Fbox is requesting an instruction retire in this cycle.
                               A 0 value means the Ebox is stalling because the PA queue is not valid and the current destination queue access is requiring the use of the PA queue.
                               A 0 value means the Ebox is stalling because the branch queue is empty and the current microinstruction in S4 accesses it.
                               A 0 value means the Fbox is signaling a hardware error on one of the source operands for the currently retiring instruction.
                               A 0 value means the Fbox is signaling a reserved address mode fault on one of the source operands for the currently retiring instruction.
                               A 0 value means the Fbox is signaling a memory management fault on one of the source operands for the currently retiring instruction.
                               A 0 value means the Fbox is signaling a reserved operand fault for the currently retiring instruction.
                               A 0 value means the Fbox is signaling a floating overflow fault for the currently retiring instruction.
                               A 0 value means the Fbox is signaling a floating divide-by-zero fault for the currently retiring instruction.
                               A 0 value means the Fbox is signaling a floating underflow fault for the currently retiring instruction.
                    delayed    A 0 value means the Mbox is signaling that the EM_LATCH is full.
                               A 0 value means the Ebox is making an Mbox request in this current cycle.
VSS                            Always a 1 value.
                               A 0 value means the Ebox is signaling the Mbox that an Ibox IO space read may begin in the current cycle, subject to certain Mbox restrictions.
F%STORE_STALL_H                A 0 value means the Fbox aborted a store request late in the current cycle.

8.5.26.2 E%WBUS_H<31:0> LFSR

E%WBUS_H<31:0> (the buffered copy of E_BUS%WBUS_L<31:0> which is driven to the Mbox) has an LFSR (linear feedback shift register) accumulator. The LFSR is implemented as part of the performance monitoring facility that is described in Chapter 18, and controlled by two bits in the ECR processor register: PMF_LFSR and PMF_CLEAR.

The E%WBUS_H<31:0> LFSR is implemented as two identical 16-bit LFSRs, one for E%WBUS_H<31:16> and one for E%WBUS_H<15:0>. A block diagram of one of these 16-bit LFSRs is shown in Figure 8-12.
The reader should note that the output of the left-most bit in the LFSR chain is inverted before being XORed with earlier taps. This was done for implementation reasons.

Figure 8-12: E%WBUS_H LFSR Block Diagram

Both halves of the E%WBUS_H<31:0> LFSR may be cleared by software by writing a 1 to ECR<PMF_CLEAR> (which results in microcode executing the MISC/CLRPERF.COUNT function). The operation of the pair of LFSRs is started by software by writing a 1 to ECR<PMF_LFSR> and stopped by writing a 0 to the same bit. The current state of the E%WBUS_H<31:0> LFSR may be read by software via the PMFCNT processor register (an E_BUS%ABUS_L<31:0> source available via MFPR) in the format shown in Figure 8-13.

Figure 8-13: PMFCNT Processor Register in E%WBUS_H<31:0> LFSR Format

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           E_WBUS<31:16> LFSR Value            |           E_WBUS<15:00> LFSR Value            | :PMFCNT
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

CAUTION

The E%WBUS_H<31:0> LFSR hardware also provides the performance monitoring facility function under control of ECR<PMF_ENABLE>. The operation of the hardware is UNDEFINED if both ECR<PMF_ENABLE> and ECR<PMF_LFSR> are on, or if software uses a single MTPR write to turn off one bit and turn on the other simultaneously. That is, if either bit is on, software must turn off both bits with one MTPR and turn on the other with a second MTPR.

8.5.27 Microcode Restrictions

This section gives microcode restrictions due to Ebox microarchitecture and the VAX architecture.
8.5.27.1 Register Access Restriction

The first microword of any execution microflow must not read GPRs explicitly, and an explicit read must be preceded by at least one microword specifying something other than NONE in the DST field. (A/S1, A/S2, B/S1, and B/S2 are always allowed.) This restriction has to do with the fact that the Fbox destination scoreboard only examines the source queue outputs to detect GPR read-before-write hazards. Therefore it specifically does not apply in a microtrap flow, since the Fbox can never write a result after a microtrap.

8.5.27.2 FLUSH.PAQ Restriction

MISC2/FLUSH.PAQ should only be specified when the MRQ field specifies an Mbox operation which is sent in the EM latch (i.e., other than MRQ/SYNC.BDISP, MRQ/SYNC.BDISP.RETIRE, MRQ/SYNC.BDISP.TEST.PRED, or MRQ/NOP). Otherwise the Mbox will not flush the PA queue.

8.5.27.3 Memory Access Restrictions

Microcode must ensure that all accesses from the current microflow are complete before allowing the microsequencer to dispatch to the next microflow.

Destination queue indirect writes (DST/DST) may be implicit memory operations. The MRQ field must specify NOP, SYNC.BDISP, SYNC.BDISP.RETIRE, or SYNC.BDISP.TEST.PRED when this operation is specified.

8.5.27.4 Shifter Restrictions

If the shifter uses the SC register as the source of the shift amount, the SC must have been loaded from E_BUS%ABUS_L<4:0> by the previous microword or from E_BUS%WBUS_L<4:0> by the microword before that. Otherwise the old SC value is used as the shift amount.

8.5.27.5 SHIFT.SIGN Restriction

The saved copy of the shifter sign bit (Saved-SHF<N>) is UNPREDICTABLE after executing a special format microword.

8.5.27.6 MMGT.MODE Restrictions

The MMGT.MODE register must be loaded before (in a microword preceding) the microword specifying a memory management probe in the MRQ field.
8.5.27.7 MPU Restrictions

If the MPU mask value is loaded by microword N specifying MISC/LOAD.MPU.FROM.B, microcode may not branch on the new value until microword N+2. Microcode may branch on the old value in N and N+1.

8.5.27.8 Microbranch Condition Restrictions

The first microword of a macroinstruction execution microflow should not branch based on the state flags. (It may set or clear them.)

8.5.27.9 Ibox IPR Read Restriction

Microcode should not use GPRs as the target for read type accesses to Ibox IPRs. There is no synchronization mechanism to determine when the result is ready. Also, the control logic in the Ibox IPR assumes a working register is the destination.

8.5.27.10 RETIRE.INSTRUCTION

The MISC1 field operation RETIRE.INSTRUCTION must always be followed by a MISC/RESET.CPU. The MISC/RESET.CPU may come any number of cycles later, but must come before the next macroinstruction microflow is dispatched.

8.5.27.11 VAX Restart Bit Restriction

The VAX Restart Bit should not be read until two microwords after the last microword whose effect is expected to be reflected in the bit's state. For example, the machine check microflow should wait until the second microword before reading the bit to put it on the stack. Then the bit will reflect the state for the aborted execution microflow.

8.5.27.12 Q Register Interaction With SMUL.STEP and UDIV.STEP

In the microword after the last ALU/SMUL.STEP or ALU/UDIV.STEP, the Q register should not be sourced to E_BUS%ABUS_L<31:0> or E_BUS%BBUS_L<31:0>. Bypassing is not implemented for this kind of Q register update.

The microword before an ALU/SMUL.STEP must not update the Q register (Q/UPDATE.Q) unless that microword also specifies ALU/SMUL.STEP.

8.5.27.13 UDIV/SMUL Restrictions

The Q field must specify Q/UPDATE.Q if the ALU field functions SMUL.STEP or UDIV.STEP are specified.
Also the ALU result must be specified as the source of E_BUS%WBUS_L<31:0>, and the shifter operation must be NOP.

8.5.27.14 F.DEST.CHECK Restrictions

The F.DEST.CHECK miscellaneous operation should only be used as intended. It should be specified in the last microword of a microflow which sends operands to the Fbox. It should never be specified in a microword which also specifies DST in the DST field.

8.5.27.15 Fbox Operand Delivery Restriction

In delivering operands to the Fbox, microcode may not use A/S2 or B/S1. Short literal bypass to the Fbox source operand buses is not implemented for these decodes. Use of these decodes for Fbox operands could cause improper input data formatting in the Fbox if a short literal data item is present in the source queue.

8.5.27.16 RMUX Control Restrictions

Every microword with an S4 or S5 side effect of modifying any state (examples include SYNC.BDISP.RETIRE, RESET.CPU, and LOAD.PSL.CC.xxxx) must specify a DST other than NONE. A DST of WBUS is acceptable. This restriction specifically does not apply to F.DEST.CHECK.

Every microword specifying any operation other than NOP in the MRQ field must specify a DST other than NONE. A DST of WBUS is acceptable.

8.5.27.17 Control Bits

After changing either of ECR<1 or 3> (FBOX_ENABLE or FBOX_ST4_BYPASS_ENABLE), microcode should not do a SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW in the three microwords following the one altering the control bit.

8.5.27.18 Microtrap Dispatch and RESET.CPU Restrictions

8.5.27.18.1 Microtrap Flows

In a microtrap handler for any microtrap except branch mispredict, microcode must do a MISC/RESET.CPU before it can read any of the registers in the register file which have a valid bit. This restriction is necessary to avoid deadlock. Specifically, microcode must not source any Wn register (working register) until the microword after the one which specifies MISC/RESET.CPU.
In a microtrap handler for any microtrap except branch mispredict, there should be no memory request until the third microword after the one specifying MISC/RESET.CPU.

In a microtrap handler for any microtrap except branch mispredict, any microcode operation which causes an entry in the retire queue to be retired is illegal until a MISC/RESET.CPU is executed and a second microword specifying SEQ.MUX/LAST.CYCLE and DISABLE.RETIRE/YES is executed. This second microword must not occur until after the third microword after the one specifying MISC/RESET.CPU.

In a microtrap handler for any microtrap except branch mispredict, UNPREDICTABLE or UNDEFINED results could occur if microcode accesses the source queue, destination queue, instruction queue, branch queue, or field queue until the fourth microword after the one which specifies MISC/RESET.CPU. Similarly, UNPREDICTABLE results could occur if microcode reads from the Wn or GPR register before the fourth microword after the one which specifies MISC/RESET.CPU, or writes to these registers before the second microword after the one which specifies MISC/RESET.CPU.

8.5.27.18.2 MISC/RESET.CPU Restrictions

The fourth microword after one specifying MISC/RESET.CPU may specify SEQ.MUX/LAST.CYCLE (with DISABLE.RETIRE/YES), but the first three must not.

The first three microwords after a MISC/RESET.CPU must not access the source queue or field queue.

The first two microwords after a MISC/RESET.CPU must not access the destination queue or branch queue.

The first two microwords after a MISC/RESET.CPU must not issue memory requests.

After a microword specifying MISC/RESET.CPU, any microcode operation which causes an entry in the retire queue to be retired is illegal until a microword specifying SEQ.MUX/LAST.CYCLE and DISABLE.RETIRE/YES is executed.
This microword must not occur until after the third microword after the one specifying MISC/RESET.CPU.

8.5.27.18.3 Asynchronous Hardware Error Microtrap Restriction

There are two possible causes of this microtrap: TB parity error and S3 stall timeout. If the cause is S3 stall timeout, then the Mbox and Cbox are reset by Ebox hardware for 17.5 cycles. Microcode must not issue any memory requests during that reset time period. Also, the Mbox requires that the MISC/RESET.CPU function be done during the reset period. The first microword of the microtrap handler does not reach S6 until 5 cycles after the S3 stall timeout is detected. Hence the earliest the effect of MISC/RESET.CPU on the Mbox can occur is 5 cycles into the 17.5 cycle reset period.

Microcode currently issues the MISC/RESET.CPU upon entry to the asynchronous hardware error microtrap (regardless of the cause) and then waits 23 cycles before beginning normal exception handling procedures. This is the recommended procedure.

8.5.27.18.4 First Part Done Dispatch Restriction

The microcode flow at the dispatch for PSL<FPD> set must determine if the opcode is that of an Fbox instruction. If it is, then a MISC/RESET.CPU must occur before the next SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW. This case results in the Fbox and Ebox being out of sync in the protocol for sending opcodes and operands. The Fbox must be flushed. If the instruction is not an Fbox instruction, microcode may continue without the MISC/RESET.CPU (as it does in the case of unpacking and continuing the execution of an interrupted string instruction such as MOVC3).

8.5.27.19 PSL Use Restrictions

The PSL must not be loaded in the first microword of a macroinstruction execution microflow.

The first two microwords of any macroinstruction execution microflow (any opcode dispatch or the FPD dispatch) should not use the PSL as a source. The PSL<TP> bit read onto E_BUS%ABUS_L<31:0> will not necessarily be correct.
Microcode may disregard this restriction if it is acceptable for this bit to be incorrect. (Reading the PSL does not prevent the automatic copy of <T> to <TP>.)

The PSL should not be read in the microword after it is updated. If this rule is not followed, it is UNPREDICTABLE whether the second microword will source the old or the new PSL value. (Actually it depends on whether an S3 stall occurs on the second microword.)

On loading a new PSL, the third microword after the one altering the PSL may specify LAST.CYCLE for a decode dispatch, but the first two may not. If it is known that the PSL<FPD, T, or TP> bits will not change, then this restriction does not apply.

On loading a new value to PSL<FU>, the microword after the one altering the PSL may specify SEQ.MUX/LAST.CYCLE for a decode dispatch, but the one which altered the PSL must not.

If microcode loads a new value to PSL<IPL> in microword N, then microwords N through N+3 must not specify SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW, but N+4 may.

After changing the PSL, microcode generally should not micro-branch on PSL bits in the next two microwords. Assuming microword N updates the PSL, if microwords N or N+1 branch on the PSL, the old PSL value will determine the result of the microbranch. However, if microword N+2 branches on the PSL, it is UNPREDICTABLE whether the old or new PSL bits will be used to determine the branch outcome. (Actually, it is predictable if S3 stalls on microword N+1 are known.) If N+3 branches on the PSL, the new PSL value will definitely determine the result of the microbranch. This restriction specifically does not apply if PSL<29,26:22> are not changed by the load.

Many microcode flows alter the condition code bits, PSL<3:0>, in the last cycle of the flow.
This implies that microcode should not source the PSL in the first microwords of any flow except microtrap flows (i.e., don't in these flows: opcode dispatch, FPD dispatch, trace fault dispatch, or interrupt dispatch) unless it is acceptable that the incorrect value might be read for the condition code bits. (This assumes that the first microword of the flow synchronizes to any outstanding Fbox retire by specifying a DST other than NONE.)

Certain restrictions accompany changes to PSL<CUR_MOD>. The Mbox must not be processing any Ebox references or operand prefetches while PSL<CUR_MOD> is being changed. The microword after the one changing PSL<CUR_MOD> can issue a memory reference which will be access checked using the new PSL<CUR_MOD> value.

There are no restrictions on reading or writing the PSL at the beginning of a microtrap flow. The Ebox pipeline has been flushed before the microtrap flow begins, so there can't be updates to the PSL after this microflow starts.

The following table summarizes PSL restrictions at beginnings and ends of flows.

Table 8-27: PSL Restrictions Summary

PSL Bits                      At beginning of new flow [1]   Before end of any flow [2]
PSL<3:0>; PSL<N,Z,V,C> [3]    1 [4]                          0
PSL<4,27>; PSL<T,FPD>         0                              3
PSL<6>; PSL<FU>               0                              1
PSL<20:16>; PSL<IPL>          0                              4
PSL<30>; PSL<TP>              2                              3
PSL<any other>                1                              0

[1] Number of microwords required at beginning of microflow before microword in which these bits are read. Applies to macroinstruction execution flows (including FPD dispatches), and to trace fault and interrupt dispatches, but not to microtrap dispatches.
[2] Number of microwords after one which alters these bits (before and including the one which specifies SEQ.MUX/LAST.CYCLE or SEQ.MUX/LAST.CYCLE.OVERFLOW).
[3] This assumes the microcode convention of altering the PSL condition code bits in the last microword of some execution flows.
[4] This assumes that the first microword of the flow synchronizes to any outstanding Fbox retire.

8.5.27.20 S+PSW Restrictions

The PSL is written in S5 while the S+PSW source is read in S3. If microword N updates the PSL, microword N+1 should not source S+PSW. It is UNPREDICTABLE whether the old or new value would be sourced if this restriction were not obeyed.

8.5.27.21 RN.MODE.OPCODE Restrictions

For the RN field to be valid, the A field of the microword must specify S1 (the current source queue entry), and the microcode must know from context that the source queue entry points to a GPR. If these restrictions are not met, the value returned in the RN field is UNPREDICTABLE.

The PSL is written in S5 while the RN.MODE.OPCODE source is read in S3. If microword N updates the PSL, microword N+1 must not source the new value of RN.MODE.OPCODE. It could receive the old value.

8.5.28 Signal Name Cross-Reference

The following table gives a cross reference for selected signal names in this chapter. Only signal names which have different names in this chapter than they do on the schematics are listed. Different names are used in this chapter only where the resulting description is significantly clearer.

Table 8-28: Signal Name Cross-Reference

Columns: Name in this chapter | Name on schematics | Name in behavioral model

[Table entries are not recoverable from the scan. Several entries in the schematics and behavioral model columns read "no exact match, roughly equals the following:" followed by a list of signal names.]

8.5.29 Revision History

Table 8-29: Revision History

Who              When          Description of change
John Edmondson   30-NOV-1988   Initial Release.
John Edmondson   19-DEC-1988   Corrections and Updates.
John Edmondson   06-MAR-1989   Release for external review.
John Edmondson   29-NOV-1989   Updates after external review and modeling complete.
John Edmondson   18-DEC-1989   Further updates, particularly adding real signal names.
John Edmondson   31-JAN-1990   Updates reflecting minor implementation motivated changes - rev 0.5.
John Edmondson   4-MAY-1990    Updates reflecting minor implementation motivated changes - post rev 0.5.
John Edmondson   20-FEB-1991   Further updates post implementation.
John Edmondson   31-MAY-1991   Minor updates for pass 2 changes.

Chapter 9

The Microsequencer

9.1 Overview

The microsequencer is a microprogrammed finite state machine that controls the three Ebox sections of the NVAX pipeline: S3, S4, and S5. The microsequencer itself resides in the S2 section of the pipeline. It accesses microcode contained in an on-chip control ROM, and microcode patches contained in an on-chip SRAM.

Each microword is made up of fields that control all three pipeline stages.
A complete microword is issued to S3 each cycle, and the appropriate microword decodes are pipelined forward to S4 and S5 under Ebox control.

Each microword contains a microsequencer control field that specifies the next microinstruction in the microflow. This field may specify an explicit address contained in the microword or direct the microsequencer to accept an address from another source. It also allows the microcode to conditionally branch on various NVAX states.

Frequently used microcode can be made into microsubroutines. When a microsubroutine is called, the return address is pushed onto the microstack. Up to six levels of subroutine nesting are possible.

Stalls, which are transparent to the microcoder, occur when an NVAX resource is unavailable, such as when the ALU requires an operand that has not yet been provided by the Mbox. The microsequencer stalls when S3 of the Ebox is stalled.

Microtraps allow the microcoder to deal with abnormal events that require immediate service. For example, a microtrap is requested on a branch mispredict, when the Ebox branch calculation is different from that predicted by the Ibox for a conditional branch instruction. When a microtrap occurs, the microcode control is transferred to a service microroutine.

9.2 Functional Description

9.2.1 Introduction

The NVAX microsequencer consists of several functional units of logic that are explained in the following sections and illustrated in the block diagram, Figure 9-1.

[Figure 9-1: microsequencer block diagram; the drawing is not recoverable from the scan.]
NVAX CPU Chip Functional Specification, Revision 1.0, February 1991

9.2.2 Control Store

The control store is an on-chip ROM which contains the microcode used to execute macroinstructions and microtraps. It is made up of up to 1600 microwords. These are arranged as 200 entries, each entry consisting of 8 microwords. Each microword is 61 bits long, with bits <14:0> being used to control the microsequencer. The remainder of the microword, bits <60:15>, is used by the Ebox to control S3 through S5. The Ebox also receives bits <14,12:11>, enabling it to recognize the last cycle of a microflow and the validity of the microtest bus select lines.

The control store access is performed during φ34 of S2 and φ1 of S3 of the NVAX pipeline. The output of the Current Address Latch (CAL), E_USQ_CAL%CAL_H<10:0>, is used to address the control store. Bits <10:4,0> are used to select one of the 200 entries. The eight microwords in the selected entry then enter an eight-way multiplexer, where E_USQ_CAL%CAL_H<3:1> select the final control store output. This structure is used because E_USQ_CAL%CAL_H<3:1> are valid later than bits <10:4,0>, since E_USQ_CAL%CAL_H<3:1> must be OR'd with the microtest bus for a BRANCH format microinstruction (see Section 9.2.2.2.2 for details).

9.2.2.1 Patchable Control Store

The patchable control store is an on-chip SRAM which contains microcode patches. It consists of up to 20 microwords.
It operates in parallel with the control store. The microaddress from the CAL is the input to its CAM (Content Addressable Memory). If the address hits in the CAM, the output of the patchable control store is selected as the new microword, rather than the output of the ROM control store.

The patchable control store and CAM are precharged in φ3 and evaluate in φ4. The CAL output, E_USQ_CAL%CAL_H<10:0>, is used in its entirety as the lookup address in the CAM, as opposed to the 1-of-200 selection followed by the 1-of-8 selection used in the ROM control store.

9.2.2.1.1 Loading the Patchable Control Store

Entries in the Patchable Control Store and its CAM are written under software control from the Patchable Control Store Control Register (PCSCR) in the Ebox. The CAM must be disabled during this operation, so that no hits can occur. This is done by writing a zero to PCSCR<PCS_ENB>. In addition, Parallel Test Port control of the MIB scan chain must be disabled, by writing a one to PCSCR<PAR_PORT_DIS>. Following assertion of K_E%RESET_L, PCSCR<PCS_ENB> and PCSCR<PAR_PORT_DIS> both contain zeroes.

Data is serially scanned into the MIB scan chain, in the order shown in Table 9-2 (data is shifted from bit 0 to bit 91). The data is taken from PCSCR<DATA>; shifting into the scan chain is enabled by PCSCR<RWL_SHIFT>. The final 20 bits scanned in (positions <19:0> in the scan chain) are used to select which entry in the patchable control store is to be written. Only one of these 20 bits may be asserted at a time.

When all 92 bits of the scan chain have been serially loaded, the selected patchable control store and CAM entry are written under control of PCSCR<PCS_WRITE>.

All patchable control store entries must be written with either valid or NULL patches before the PCS is enabled. A NULL patch is an entry whose CAM location is written with an unused/unreferenced microaddress; there can never be a hit on this microaddress.
The values of the MIB bits in a NULL patch are don't-care.

When the patchable control store is loaded, the patch revision must be loaded into PCSCR<PATCH_REV>. If the patch is non-standard (i.e., one which is not a formally distributed patch, such as a performance analysis patch), PCSCR<NONSTANDARD_PATCH> must be set to 1; otherwise it must be set to 0. These fields can be read by software to determine which patches are present in the machine. These fields are included in reads of the SID processor register.

Enabling of the patchable control store is done by writing a zero to PCSCR<PAR_PORT_DIS> and then writing a one to PCSCR<PCS_ENB>. See Section 8.5.22.1 for more details on PCSCR operation.

The following table shows an example of writing an entry in the patchable control store.

Table 9-1: Example: Writing an Entry in the Patchable Control Store

Microcycle   Phase   Action
1            3       Write 0 to PCSCR<PCS_ENB> [1] (disable the CAM)
                     CAM NOW DISABLED
                     Write 1 to PCSCR<PAR_PORT_DIS> [1] (disable parallel port control)
2            2       PARALLEL PORT CONTROL NOW DISABLED [2]
             3       Write data for MIB scan chain bit<91> to PCSCR<DATA> [1]
                     Write 1 to PCSCR<RWL_SHIFT> [1]
3            3       Write data for MIB scan chain bit<90> to PCSCR<DATA>
                     Write 1 to PCSCR<RWL_SHIFT>
             4       Data for MIB scan chain bit<91> shifted into MIB scan chain bit<0> [2]
4            3       Write data for MIB scan chain bit<89> to PCSCR<DATA>
                     Write 1 to PCSCR<RWL_SHIFT>
             4       Data for MIB scan chain bit<90> shifted into MIB scan chain bit<0>
                     Data for MIB scan chain bit<91> shifted into MIB scan chain bit<1>
...
94           3       Write 1 to PCSCR<PCS_WRITE> [1] (write data into patchable control store)
             4       Data for MIB scan chain bit<91> shifted into MIB scan chain bit<91>
95           4       DATA WRITTEN INTO PCS ENTRY FROM MIB SCAN CHAIN [2]

[1] An S5 operation.
[2] Note 1-cycle delay between some PCSCR fields and MIB scan chain.

Phases not listed for a microcycle have no action. Note that this example assumes no stalls within the Ebox. Also note that PCSCR<PCS_ENB> and PCSCR<PAR_PORT_DIS> must be re-written with the correct values every cycle that PCSCR<DATA> is written.

Table 9-2: Contents of MIB Scan Chain, When Loading Patchable Control Store

Position   Description                               Comment
<91:84>    MIB_H<0> through MIB_H<7>                 Microword Field BRANCH.OFFSET [1]
<83>       MIB_H<34>                                 Microword Field L
<82:79>    MIB_H<49> through MIB_H<46>               Microword Field MISC1
<78>       MIB_H<60>                                 Microword Field FMT
<77:73>    MIB_H<19> through MIB_H<15>               Microword Field MISC
<72:66>    MIB_H<31> through MIB_H<25>               Microword Field DST
<65:61>    MIB_H<24> through MIB_H<20>               Microword Field A
<60:50>    CAM MICROADDRESS<10> through <0>          Microaddress to be patched
<49:47>    MIB_H<10> through MIB_H<8>                Microword Field SEQ.COND
<46>       MIB_H<14>                                 Microword Field SEQ.FMT
<45>       MIB_H<13>                                 Microword Field SEQ.CALL
<44:43>    MIB_H<12> through MIB_H<11>               Microword Field SEQ.COND
<42:38>    MIB_H<39> through MIB_H<35>               Microword Field B
<37:34>    MIB_H<44> through MIB_H<41>               Microword Field MISC2
<33>       MIB_H<45>                                 Microword Field LIT
<32>       MIB_H<40>                                 Microword Field D
<31:27>    MIB_H<54> through MIB_H<50>               Microword Field MRQ
<26>       MIB_H<33>                                 Microword Field W
<25>       MIB_H<32>                                 Microword Field V
<24:20>    MIB_H<59> through MIB_H<55>               Microword Field ALU
<19:0>     PCS ENTRY SELECT<19> through <0>          Entry in PCS to be written

[1] See Chapter 6 for details on microword fields.

9.2.2.2 Microsequencer Control Field of Microcode

The microsequencer control field of the NVAX microword is used to help select the next microword address. The next address source is explicitly coded in the current microword; there is no concept of sequential next address.

The SEQ.FMT field, bit <14> of the microsequencer control field, selects between the following two formats:

Figure 9-2: Microcode Microsequencer Control Field Formats

 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 0|  |     |               J                |   JUMP
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |  |   |
  |  |   +--- SEQ.MUX
  |  +--- SEQ.CALL
  +--- SEQ.FMT

 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1|  |   SEQ.COND   |     BRANCH.OFFSET     |   BRANCH
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |  |
  |  +--- SEQ.CALL
  +--- SEQ.FMT

Table 9-3: Jump Format Control Field Definitions

Name            Extent   Description
SEQ.FMT         14       0 for JUMP
SEQ.CALL        13       Controls whether return address is pushed on microstack
SEQ.MUX         12:11    Selects source of next microaddress
J               10:0     JUMP target address

Table 9-4: Branch Format Control Field Definitions

Name            Extent   Description
SEQ.FMT         14       1 for BRANCH
SEQ.CALL        13       Controls whether return address is pushed on microstack
SEQ.COND        12:8     Selects source of Microtest Bus
BRANCH.OFFSET   7:0      Page offset of next microinstruction

9.2.2.2.1 Jump Format

Jump format microinstructions choose the next address from one of three possible sources: the J field (bits <10:0> of the current microword), the microstack, or the last cycle logic. The microword fields decode as follows:

Table 9-5: Jump Format Control Field Decodes

SEQ.CALL   SEQ.MUX   NEXT ADDRESS SOURCE   REMARKS
0          0         J                     JUMP microinstruction.
1          0         J                     CALL microinstruction. Current microword address with bits <3:0> incremented by one is pushed onto microstack.
X          1         STACK                 RETURN microinstruction. Top entry of microstack is selected.
X          2         Last Cycle Logic      Last cycle. Select next microflow.
X          3         Last Cycle Logic      Last cycle and enable integer overflow trap. Select next microflow.

On a CALL microinstruction, the address of the current microinstruction, with bits <3:0> incremented by one, is pushed onto the microstack. The CALL address is modified to avoid a RETURN to the CALL address, which would cause an infinite loop.

9.2.2.2.2 Branch Format

Branch format microinstructions allow the microcoder to perform CASE operations on NVAX state. The SEQ.COND field drives the microtest bus select lines, which select the source that drives the microtest bus. (Refer to Section 9.2.3.1.1 for details.) The microtest bus is OR'd with bits <3:1> of the BRANCH.OFFSET field, allowing up to an eight-way case. Casing may be reduced to two-way or four-way by setting to ones the appropriate bits in BRANCH.OFFSET<3:1>.

Table 9-6: Branch Format Control Field Decodes

SEQ.CALL   NEXT ADDRESS SOURCE   REMARKS
0          BRANCH.OFFSET         BRANCH microinstruction.
1          BRANCH.OFFSET         CONDITIONAL CALL microinstruction. Current microword address with bits <3:0> incremented by one is pushed onto microstack.

As in the JUMP format, the SEQ.CALL field is used to indicate that a RETURN address must be pushed on the microstack.

For the purposes of BRANCH microinstructions, the control store is divided into 256-microword pages. The target of a BRANCH microinstruction must be in the same page as the BRANCH, as only the least significant 8 bits of the address are modified. The BRANCH.OFFSET field is the destination address offset within the current page. A branch address is made up as follows:

Table 9-7: Branch Address Formation

Bit(s)    Source
<10:8>    Current Address<10:8>
<7:4>     BRANCH.OFFSET<7:4>
<3:1>     BRANCH.OFFSET<3:1> OR UTEST<2:0>
<0>       BRANCH.OFFSET<0>

9.2.2.3 MIB Latches

The microword output from the Control Store 8-to-1 multiplexer is latched in Φ1 into the Control Store Microinstruction Buffer (CS_MIB) latch. The microword output from the Patchable Control Store is also latched in Φ1, into the PCS_MIB latch. The outputs of the CS_MIB and PCS_MIB latches drive a multiplexer, which selects the PCS_MIB output if the CAL output hit in the Patchable Control Store CAM; otherwise, the multiplexer selects the CS_MIB output.

Bits <14:0> of the multiplexer output (the Microsequencer Microinstruction, E_USQ_CSM%UMIB_H<14:0>) are driven back to the microsequencer; all bits are driven to the Microinstruction Buffer (MIB) latch, which operates in Φ2. Bits <60:14,12:11> of the MIB latch output (E_USQ%MIB_H) are driven to S3 of the Ebox; all bits are driven to the MIB scan chain (see Section 9.5.2). When a microtrap is detected, the contents of the MIB latch are forced to NOP. The MIB latch is stalled on a microsequencer stall.
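The field layouts of Tables 9-3 and 9-4 and the branch address formation of Table 9-7 can be illustrated with a small software model. The following Python sketch is illustrative only and is not part of the chip definition; the names JumpField, BranchField, decode_seq_field, and branch_address are hypothetical and chosen for this example.

```python
from typing import NamedTuple, Union


class JumpField(NamedTuple):
    """JUMP format (SEQ.FMT = 0), per Table 9-3."""
    call: int    # SEQ.CALL, bit 13
    mux: int     # SEQ.MUX, bits 12:11
    target: int  # J, bits 10:0


class BranchField(NamedTuple):
    """BRANCH format (SEQ.FMT = 1), per Table 9-4."""
    call: int    # SEQ.CALL, bit 13
    cond: int    # SEQ.COND, bits 12:8
    offset: int  # BRANCH.OFFSET, bits 7:0


def decode_seq_field(field: int) -> Union[JumpField, BranchField]:
    """Split a 15-bit microsequencer control field into its subfields."""
    if not 0 <= field < (1 << 15):
        raise ValueError("control field must fit in 15 bits")
    call = (field >> 13) & 0x1
    if (field >> 14) & 0x1:  # SEQ.FMT = 1 selects BRANCH format
        return BranchField(call, (field >> 8) & 0x1F, field & 0xFF)
    return JumpField(call, (field >> 11) & 0x3, field & 0x7FF)


def branch_address(current: int, offset: int, utest: int) -> int:
    """Form an 11-bit branch address as in Table 9-7.

    Bits <10:8> come from the current address, bits <7:0> from
    BRANCH.OFFSET, and UTEST<2:0> is OR'd into bits <3:1>.
    """
    page = current & 0x700
    low = (offset & 0xFF) | ((utest & 0x7) << 1)
    return page | low
```

For example, with a current address of 3A5 (hex), a BRANCH.OFFSET of 10 (hex), and UTEST<2:0> equal to 101 (binary), this model yields the branch address 31A (hex). Setting bits of BRANCH.OFFSET<3:1> to ones masks the corresponding microtest bits, which is how the case width is reduced.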
9.2.3 Next Address Logic

The remainder of the microsequencer is devoted to determining the next control store lookup address. There are five next address sources:

1. JUMP/BRANCH.OFFSET field of Microword
2. Microtrap Logic
3. Last Cycle Logic
4. Microstack
5. Test Address Generator

9.2.3.1 CAL and CAL INPUT BUS

The CAL, or Current Address Latch, is a static latch which holds the 11-bit address used to access the control store. It operates in Φ3, and is stalled on a microsequencer stall. Bits <10:8> are also "stalled" when forming a branch address (see Table 9-7).

The input to the CAL is the CAL Input Bus (E_USQ_BUS%CAL_INPUT_L). The CAL Input Bus is a dynamic bus, precharged in Φ2. The selected next address source drives this bus in Φ3. Bits <14,12:11> of the microsequencer control field are used in selecting three of the next address sources: E_USQ_CSM%UMIB_H<10:0> (for a BRANCH or JUMP address), the output of the last cycle logic, and the microstack output. The fourth CAL Input Bus source is the microtrap address; if a microtrap is detected, this input is selected regardless of the value of E_USQ_CSM%UMIB_H<14,12:11>. The fifth source is a test address, driven from the Test Address Generator. This input has the highest priority. In summary:

Table 9-8: Current Address Selection

TEST   TRAP       SEQ.FMT   SEQ.MUX   NEXT ADDRESS
ADDR   DETECTED   <14>      <12:11>   SOURCE                   REMARKS
0      0          1         XX        Branch Address (1)       BRANCH/CONDITIONAL CALL microinstructions
0      0          0         00        J                        JUMP/CALL microinstructions
0      0          0         01        Microstack               RETURN microinstruction
0      0          0         1X        Last Cycle Logic         Start new microflow
0      1          X         XX        Microtrap Logic          Microtrap
1      X          X         XX        Test Address Generator   Test address

(1) See Table 9-7.

9.2.3.1.1 Microtest Bus

The microtest bus allows conditional branches and conditional calls based on information generated outside the microsequencer, such as Ebox condition codes.
The SEQ.COND field of the BRANCH format is driven on the microtest select lines, E_USQ%UTSEL_H<4:0>, in Φ23. These lines are decoded by all conditional information sources in the Ebox. The selected source drives its information on the microtest bus, E_BUS%UTEST_L<2:0>. E_BUS%UTEST_L must be valid in time to be OR'd with the value on the CAL Input Bus and latched in the CAL in Φ3. The sources for the microtest bus are as follows:

Table 9-9: Microtest Bus Sources

UTSEL<4:0>   Select             UTEST<2:0>
00           No source          000
01           ALU.NZV (2)        ALU_CC.N, ALU_CC.Z, ALU_CC.V
02           ALU.NZC (2)        ALU_CC.N, ALU_CC.Z, ALU_CC.C
03           B.2-0 (1)          EB_BUS<2:0>
04           B.5-3 (1)          EB_BUS<5:3>
05           A.7-5 (1)          EA_BUS<7:5>
06           A.15-12 (1)        EA_BUS<15:14>, EA_BUS<13> OR EA_BUS<12>
07           A31.BQA.BNZ (1)    EA_BUS<31>, EB_BUS<2:0> = 0, EB_BUS<15:8> NEQ 0
08           MPU.0-6 (2)        MPU0_6<2:0>
09           MPU.7-13 (2)       MPU7_13<2:0>
0A           STATE.2-0 (2)      STATE<2:0>
0B           STATE.5-3 (2)      STATE<5:3>
0C           OPCODE.2-0 (1)     OPCODE<2:0>
0D           PSL.26-24 (3)      PSL<26:24>
0E           PSL.29.23-22 (3)   PSL<29>, PSL<23:22>
0F           SHF.NZ (2), INT    SHF_CC.N, SHF_CC.Z, INTERRUPT_REQUEST
10           VECTOR,TEST        ECR<VECTOR_UNIT_PRESENT> (3), TEST DATA, TEST STROBE
11           FBOX               Encoded fault<1:0> (4), ECR<FBOX_ENABLED> = 0 (3)
12           FQ.VR (1)          0, FIELD_QUEUE_NOT_VALID, FIELD_QUEUE_RMODE
13-1F        Not Used

(1) Data is taken from S3.
(2) Data is taken from S4.
(3) Data is taken from S5.
(4) See Section 8.5.19.7.

The microtest select lines are always driven with bits <12:8> of the MIB output regardless of the microinstruction format. The microtest bus is only OR'd with the CAL Input Bus if the BRANCH source is selected to drive that bus.

Two of the microtest sources, the Field Queue (FQ) and the Mask Processing Unit (MPU), perform some function based on the value of the microtest select lines. These functions must check SEQ.FMT, E_USQ%MIB_H<14>, for validity of the microtest select lines.
The microtest select lines are precharged to a value of zero during Φ1; no microtest source is selected for this value.

9.2.3.2 Microtrap Logic

Microtraps allow the microcoder to deal with abnormal events that require immediate service. When a microtrap occurs, the microcode control is transferred to a service microroutine. Operations further behind in the pipe than the one which caused the microtrap are aborted.

Microtraps are generated by the Ebox, Mbox, or Ibox. Those Ebox microtrap requests considered faults are asserted in S4 of the microinstruction in which they occurred. Those that are considered traps are asserted in S5 of the microinstruction in which they occurred. Microtraps have higher priority than all other next address sources except the Test Address Generator.

Microtraps are detected in Φ4. The microtrap signals are OR'd together in Φ1 to form E_USQ%PE_ABORT_L. The trap signals are prioritized and address lookup is done to select the appropriate microtrap handler address, which is driven on the CAL Input Bus in Φ3.

Since microtraps are not detected until Φ4, too late for control store access in that cycle, the signal E_USQ%PE_ABORT_L is used to force NOPs in all the Ebox and microsequencer inter-stage latches in Φ1 and Φ2. This effectively flushes the pipe. In the cycle following microtrap detection, control store access is done using the microtrap handler address, and the first microword of the trap handler is driven to S3 on E_USQ%MIB_H.

Microtrap microcode flows flush the Ebox, Fbox, the specifier queue in the Mbox, the Instruction Queue in the microsequencer, and the Ibox. The only exception to this is the branch mispredict microtrap, which does not flush the Ibox. The microtrap handler also loads a new PC, which allows the Ibox to start prefetching. At the end of the microtrap, microcode control is returned to the last cycle logic.

Microtrap signals must be asserted for only one cycle, to prevent multiple detections of the same trap.
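The selection priority among the five next address sources (Table 9-8) can be expressed as a short decision function. The Python sketch below is an illustrative software model only; the function name next_address_source and the returned string labels are hypothetical and do not correspond to signal names in this specification.

```python
def next_address_source(test_addr: bool, trap_detected: bool,
                        seq_fmt: int, seq_mux: int) -> str:
    """Return the CAL Input Bus source chosen for the next microaddress.

    Priority follows Table 9-8: the Test Address Generator wins over
    everything, a detected microtrap wins over the microword fields,
    and otherwise SEQ.FMT and SEQ.MUX of the current microword decide.
    """
    if test_addr:
        return "test_address_generator"
    if trap_detected:
        return "microtrap_logic"
    if seq_fmt == 1:
        # BRANCH or CONDITIONAL CALL: address formed per Table 9-7.
        return "branch_address"
    if seq_mux == 0b00:
        return "j_field"           # JUMP or CALL
    if seq_mux == 0b01:
        return "microstack"        # RETURN
    return "last_cycle_logic"      # SEQ.MUX = 1X starts a new microflow
```

Both last-cycle encodings of SEQ.MUX (2 and 3) map to the same source in this model; the integer overflow enable implied by SEQ.MUX = 3 is not modeled.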
9.2.3.2.1 Microtraps

1. Powerup
   The powerup microtrap is requested when the chip is powered up. This forces the internal state of the chip to a known condition. See Chapter 16 for details.

2. Asynchronous Hardware Error
   The asynchronous hardware error microtrap request can happen at any time regardless of what is in the pipeline. The following conditions cause execution of this microtrap:

   • S3 Stall Timer Expiration
     The S3 stall timer counts the number of consecutive cycles that S3 is stalled. When the counter reaches its limit, it initiates the Asynchronous Hardware Error microtrap by asserting E_TIM%S3_TIMEOUT_H. See Section 8.5.25.1 for more detail concerning the timer.

   • Translation Buffer Parity Error
     If the Mbox detects a TB parity error, it initiates the Asynchronous Hardware Error microtrap by asserting M%TB_PERR_TRAP_L.

3. Integer Overflow
   The integer overflow microtrap request, E_FLT%IOVFL_L, is asserted in S5 when the Ebox detects an integer overflow condition (see Section 8.5.19.3) during the last cycle of a macroinstruction with overflow checking enabled. The microinstruction that checked the overflow condition completes, but any microinstruction initiated after it is aborted.

4. Branch Mispredict
   A branch mispredict microtrap request, E_PSL%BRANCH_MISPREDICT_H, is asserted in S5 by the Ebox when the output of the Branch Queue (the Ibox branch prediction) does not match the branch direction calculated by the Ebox. See Section 8.5.19.3.

5. Reserved Instruction Fault
   The Ebox initiates the reserved instruction microtrap in S4 when the Fbox is disabled and any Fbox instruction other than MULL is issued. It asserts E_FLT%RSVD_INSTR_L to initiate the microtrap.
6. Hardware Errors
   The Ebox hardware error microtrap request, E_FLT%HW_ERR_H, is asserted in S4 on operand-related hardware errors, such as the attempted access of an MD register which has its error bit set.

7. Memory Management Exceptions

   • Reported by Mbox
     An explicit read or write request by the Ebox can result in a memory management exception. This causes the Mbox to assert the microtrap request signal, M%MME_TRAP_L. See Section 12.5.1.5.3.7 for further detail.

   • Reported by Ebox
     A memory management fault can also occur during a memory access initiated by the Ibox, such as for an opcode or operand specifier. When this happens the Ibox asserts I%IMEM_MEXC_H. The Ebox combines this signal with several other conditions to generate E_FLT%MME_ERR_H. It initiates the memory management microtrap by asserting E_FLT%MME_ERR_H in S4. See Section 8.5.15.14 and Section 8.5.19 for more detail.

8. Reserved Addressing Mode
   A reserved addressing mode fault occurs when the Ibox detects a reserved addressing mode on an operand specifier. The reserved addressing mode microtrap request, E_FLT%RSVD_ADDR_MODE_H, is asserted in S4 by the Ebox. Refer to Section 8.5.15.14 and Section 8.5.19 for details.

9. Floating Point Fault
   A floating point fault is a fault detected by the Fbox. If the current entry of the retire queue points to the Fbox, the request E_FLT%FLOATING_FAULT_H can be asserted. If the retire queue points to the Ebox, the request is stalled until the retire queue does point to the Fbox. There are four possible causes for assertion of E_FLT%FLOATING_FAULT_H: floating overflow, floating underflow, reserved operand, and floating divide by zero. The trap handler cases on the floating point fault code on the microtest bus. See Section 8.5.16.5 for further detail.
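When more than one of the nine microtrap requests listed above is asserted at once, the microtrap logic selects the highest-priority request and looks up its handler address (priorities and dispatch addresses are those of Table 9-11). The Python sketch below models only that lookup. It is illustrative; the request names used as dictionary keys and the function name select_microtrap are hypothetical and are not signal names from this specification.

```python
from typing import Iterable, Optional, Tuple

# (request name, dispatch address), highest priority first, per Table 9-11.
MICROTRAP_DISPATCH = [
    ("powerup", 0x00),
    ("asynchronous_hardware_error", 0x04),
    ("integer_overflow", 0x08),
    ("branch_mispredict", 0x0C),
    ("reserved_instruction_fault", 0x10),
    ("hardware_error", 0x14),
    ("memory_management_exception", 0x18),
    ("reserved_addressing_mode", 0x1C),
    ("floating_point_fault", 0x20),
]


def select_microtrap(requests: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Pick the highest-priority asserted request and its handler address.

    Returns None when no request is asserted, in which case the next
    address comes from one of the other sources.
    """
    asserted = set(requests)
    for name, address in MICROTRAP_DISPATCH:
        if name in asserted:
            return name, address
    return None
```

The model is purely combinational; it does not represent request timing or the blocking of the Mbox memory management exception during a microtrap in progress.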
9.2.3.2.2 Microtrap Request Timing

The exceptions which result in microtrap requests to the microsequencer are detected in different pipeline segments. In addition, some microtrap requests are delayed in order to align the request with a particular pipeline segment. The following table gives the pipeline segment in which the exception is detected and the pipeline segment in which the microtrap request is made for each type of microtrap.

Table 9-10: Microtrap Request Timing

                                                   Exception   Microtrap
Microtrap                                          Detected    Requested
Powerup                                            N/A         N/A
Asynchronous Hardware Error, S3 Stall Timer        S3          S3
Asynchronous Hardware Error, TB Parity Error       N/A         N/A
Integer Overflow                                   S5          S5
Branch Mispredict                                  S5          S5
Reserved Instruction Fault                         S3          S4
Hardware Error                                     S3,S4       S4
Memory Management Exception, Mbox                  N/A         N/A
Memory Management Exception, Ebox                  S3,S4       S4
Reserved Addressing Mode                           S3,S4       S4
Floating Point Faults                              S4          S4

9.2.3.2.3 Prioritization of Microtraps

Microtraps must be prioritized since more than one request may be asserted at a time. Microtrap priorities and microtrap handler addresses are given in the following table.

Table 9-11: Microtraps

Priority   Microtrap                         Dispatch Address (Hex)
1          Powerup                           00
2          Asynchronous hardware errors      04
3          Integer overflow                  08
4          Branch mispredict                 0C
5          Reserved instruction fault        10
6          Hardware error                    14
7          Memory management exceptions      18
8          Reserved addressing mode faults   1C
9          Floating point faults             20

The priorities of the microtraps are assigned utilizing the following dependencies:

1. The chip must be placed in a known state upon powerup.
2. Once in a known state, asynchronous hardware errors take precedence over all, since they indicate a serious problem.
3. Microtrap requests issued in S5 have priority over those in S4 since they are further down the pipe.
4. Opcode faults take priority over operand faults.
5. Of the requests issued in S4, whichever physically took place first (was forwarded the furthest) has priority.
6. Architecturally defined faults or traps (i.e. integer overflow) have priority over implementation defined faults or traps (i.e. branch mispredict).
7. Reserved addressing mode faults are mutually exclusive of operand memory management faults for the same operand, because the source queue is empty before a reserved addressing mode fault request is made.
8. The floating point fault may only be requested when the retire queue points to the Fbox.

9.2.3.2.4 Erroneous Microtrap Interruption

A window of at least 4 cycles exists between initiation of a microtrap (assertion of E_USQ%PE_ABORT_L) and decoding of RESET.CPU for all microtraps except Branch Mispredict. (A subset of the RESET.CPU operations is performed immediately on detection of branch mispredict.) During this window, a lower priority microtrap based on state which will be cleared by RESET.CPU must not be allowed to interrupt the higher priority microtrap which has begun execution. This restriction is met by the following rules:

• Powerup
  Powerup can interrupt any microtrap as it has the highest priority. The powerup microtrap is initiated by deassertion of K_E%RESET_L. Assertion of K_E%RESET_L causes all NVAX state to be initialized, so no microtraps will occur to interrupt powerup based on previous state.

• Asynchronous Hardware Error
  Asynchronous hardware errors can interrupt any microtrap but Powerup. Due to the effects of K_E%RESET_L described above, no special logic is needed to meet this constraint.
• Ebox-Generated Microtraps
  All Ebox-generated microtrap requests (integer overflow, branch mispredict, reserved instruction fault, Ebox hardware error, Ebox memory management exception, reserved addressing mode, and floating point faults) are cleared within the Ebox immediately on assertion of E_USQ%PE_ABORT_L. Thus, none of these microtraps can interrupt another.

• Mbox-Generated Microtrap
  The Mbox memory management exception can occur at any time between assertion of E_USQ%PE_ABORT_L and decoding of RESET.CPU, if an Ebox-initiated memory reference is outstanding. The following list describes the possibility of assertion of M%MME_TRAP_L (initiation of the Mbox memory management exception microtrap) during each type of microtrap.

  • Powerup:
    As described above, M%MME_TRAP_L cannot be asserted.

  • Asynchronous Hardware Error:
    By the nature of these errors, the Ebox may be performing any operation during initiation of this microtrap, so M%MME_TRAP_L could be asserted.

  • Integer Overflow, Branch Mispredict:
    Detection of these traps occurs only on the last cycle of a microflow, in S5. All outstanding Ebox-initiated memory references which could produce an error have been completed by this time, so M%MME_TRAP_L cannot be asserted.

  • Reserved Instruction Fault:
    Initiation of this microtrap occurs in S4, on the first cycle of the microflow for the offending instruction. If that same microword begins an Ebox-initiated memory reference, the reference will be aborted on initiation of the microtrap. On initiation of a Reserved Instruction Fault microtrap, S5 can only contain the last microword of the previous microflow. As described above, M%MME_TRAP_L could not be asserted at that point.

  • Ebox Hardware Error, Ebox Memory Management Exception, Reserved Address Mode:
    These faults are generated during operand access.
    By microcode convention, no operands are referenced while there is an outstanding Ebox-initiated memory reference. Thus, M%MME_TRAP_L cannot be asserted.

  • Mbox Memory Management Exception:
    Multiple Ebox-initiated memory references can be outstanding at any time, so a second Mbox Memory Management Exception could occur.

  • Floating Point Fault:
    Similar to the Reserved Instruction Fault, this fault is detected in S4, with the first result transfer from the Fbox. Any memory reference initiated during this cycle will be aborted on initiation of the microtrap. S5 could only contain the last cycle of a microflow. Thus M%MME_TRAP_L cannot be asserted.

In summary, the Mbox Memory Management Exception microtrap is the only trap which could incorrectly interrupt a higher priority microtrap in this window. In order to prevent this error, detection of the Mbox Memory Management Exception is blocked at the microtrap logic for the cycles from microtrap initiation (assertion of E_USQ%PE_ABORT_L) through execution of RESET.CPU (assertion of E_MSC%EARLY_FLUSH_EBOX_H). Mbox Memory Management Exception detection is enabled again in the cycle following execution of RESET.CPU.

Branch Mispredict is the only microtrap for which RESET.CPU is not executed. In this case, E_MSC%EARLY_FLUSH_EBOX_H is asserted in the same cycle as E_USQ%PE_ABORT_L; therefore, detection of the Mbox Memory Management Exception is only blocked at the microsequencer for one cycle. However, as described above, M%MME_TRAP_L cannot be asserted during the Branch Mispredict microtrap, so the blocking is not necessary for proper execution of this microtrap.

9.2.3.2.5 Microtrap Detection Abort Effects

The microsequencer aborts operation on detection of a microtrap (assertion of E_USQ%PE_ABORT_L). The following table shows the timing for all microsequencer logic that is cleared or reset on an abort.
Table 9-12: Abort Effects in the Microsequencer

Phase   What is Cleared/Reset
        E_USQ_STL%LATE_USQ_STALL_L
        E_USQ%MIB_H to S3
        E_USQ%MACRO_1ST_CYCLE_H to S3
        E%FBOX_1ST_CYCLE_L to Fbox
        E_USQ_STL%VERY_LATE_USQ_STALL_L
        E%FBOX_1ST_CYCLE_L to Fbox
        E_USQ%MACRO_1ST_CYCLE_H master latch

9.2.3.3 Last Cycle Logic

The last cycle logic examines several conditions used to determine which new microflow is to be taken when LAST.CYCLE or LAST.CYCLE.OVERFLOW is detected on E_USQ_CSM%UMIB_H, no microtraps are detected, and no test address is driven. There are five possible new microflows, listed in order of priority:

1. Interrupt Request Handler
2. Trace Fault Handler
3. First Part Done Handler
4. Instruction Queue Stall
5. The macroinstruction microcode indicated by the top entry in the instruction queue.

The last cycle logic prioritizes these sources and performs address lookup. In addition, the signal E_USQ_LST%SELECT_IQ_H is derived. This signal is asserted when a valid entry is taken from the instruction queue.

Table 9-13: Microaddresses for Last Cycle Interrupts or Exceptions

Priority   Interrupt or Exception    Dispatch Address (Hex)
1          Interrupt request         24
2          Trace fault               28
3          First part done           2C
4          Instruction Queue Stall   30

The priorities in the last cycle logic are assigned using the following dependencies:

1. Interrupts and trace faults must be handled between instructions. (Interrupts may also be serviced at defined points during long instructions such as string instructions; this servicing is handled by microcode.)
2. By definition, an interrupt that is permitted to request service has a higher priority level (IPL) than any exception that occurs in the process to be interrupted, or any instruction to be executed by that process.
3. When tracing is enabled (E_PSL%PSL_H<TP> is set), a trace fault must be taken before the execution of each instruction.
4. If an instruction begins execution with PSL<FPD> set, the first part done handler must be entered rather than the normal entry point for the instruction.
5. PSL<TP> and PSL<FPD> cannot both be set when an instruction begins execution. In order for PSL<FPD> to be set, the instruction must have been interrupted previously; the interrupt handler always clears PSL<TP> before saving the PSL when interrupting an instruction. (Note that the interrupt handler does not clear PSL<TP> when the interrupt is taken between instructions.)
6. The Instruction Queue Stall microword is executed if an opcode is requested from the Instruction Queue but the queue is empty.

9.2.3.3.1 Interrupts

Interrupt servicing is requested by the Ebox by assertion of E%INT_REQ_H. For more information on interrupts, see Chapter 10.

9.2.3.3.2 Trace Fault

A trace fault should be requested when the PSL<TP> bit is set. Due to the pipelined implementation of the Ebox, a local version of the PSL<TP> bit must be maintained; thus, the trace fault is actually requested when LOCAL_TP is asserted.

There are two cases that must be considered in setting LOCAL_TP. In the first case, a macroinstruction starts execution with PSL<T> set. This is the normal program tracing mode. LOCAL_TP must be set immediately after the macroinstruction begins execution. In the second case, an interrupt was taken at the end of a macroinstruction, and the trace must be taken when interrupt processing completes. In this case, PSL<TP> is set, and LOCAL_TP is asserted.

LOCAL_TP is also updated whenever the PSL is written. LOCAL_TP is cleared by loading the PSL as a longword, with a value of 0 in the <TP> bit.
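The priority scheme of the last cycle logic (Table 9-13 and the dependencies listed above) can be summarized in a small software model. The Python sketch below is illustrative only. The function name last_cycle_select and its argument names are hypothetical. The handling of PSL<FPD> with an empty instruction queue follows Section 9.2.3.3.3. For a normal dispatch the model returns the raw DISPATCH field of the queue entry rather than a formed control store address.

```python
from typing import Tuple

INTERRUPT_HANDLER = 0x24
TRACE_FAULT_HANDLER = 0x28
FIRST_PART_DONE_HANDLER = 0x2C
IQ_STALL_ADDRESS = 0x30


def last_cycle_select(interrupt_request: bool, local_tp: bool,
                      psl_fpd: bool, iq_valid: bool,
                      iq_dispatch: int) -> Tuple[str, int, bool]:
    """Choose the next microflow at LAST.CYCLE.

    Returns (source, value, select_iq). 'value' is a handler address,
    except for source "iq_dispatch", where it is the DISPATCH field of
    the instruction queue entry. 'select_iq' models the assertion of
    E_USQ_LST%SELECT_IQ_H (a valid queue entry is consumed).
    """
    if interrupt_request:
        return ("interrupt", INTERRUPT_HANDLER, False)
    if local_tp:
        return ("trace_fault", TRACE_FAULT_HANDLER, False)
    if not iq_valid:
        # Empty queue: stall, whether or not PSL<FPD> is set.
        return ("iq_stall", IQ_STALL_ADDRESS, False)
    if psl_fpd:
        # Entry is removed, but the FPD handler address is used.
        return ("first_part_done", FIRST_PART_DONE_HANDLER, True)
    return ("iq_dispatch", iq_dispatch, True)
```

The model assumes that an interrupt or trace fault does not consume an instruction queue entry; that is an assumption of this sketch, inferred from the statement that the select signal is asserted only when a valid entry is taken from the queue.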
9.2.3.3.3 First Part Done

The first part done handler is selected when PSL<FPD> is asserted and the instruction queue output is valid. The top entry in the instruction queue is removed (E_USQ_LST%SELECT_IQ_H is asserted), but the last cycle address is the first part done handler address, rather than the dispatch taken from the instruction queue. If PSL<FPD> is asserted and the instruction queue is empty, the Instruction Queue Stall microword is selected.

9.2.3.3.3.1 Interaction with Reserved Instructions

The Ibox detects unimplemented instructions (such as POLYx), and causes the microcode to enter the reserved instruction fault handler by placing the microaddress for that handler in the dispatch field of the instruction queue entry for the unimplemented instruction. However, if PSL<FPD> is asserted, the last cycle logic selects the first part done handler rather than the reserved instruction fault handler. The first part done handler detects this case and branches to the reserved instruction fault handler.

9.2.3.3.4 Instruction Queue

The instruction queue is a FIFO filled by the Ibox. This queue permits the Ibox to fetch and decode instructions ahead of Ebox execution.

The instruction queue is 6 entries deep. Each entry is 22 bits long. The format of each entry is as follows:

Figure 9-3: Instruction Queue Entry Format

 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|          OPCODE          |  DL |FI|         DISPATCH         | V|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 9-14: Instruction Queue Entry Format Field Definitions

Name       Extent   Description
OPCODE     21:13    9-bit opcode of the instruction.
DL         12:11    Initial data length of instruction operands.
FI         10       Set if entry is an Fbox instruction.
DISPATCH   9:1      Microcode address of the instruction's microflow.
V          0        Set if entry is valid.

The instruction queue entry indicated by the write pointer is written in Φ4. The write pointer is advanced in Φ2 if the valid bit is set in the new queue entry.

The instruction queue entry indicated by the read pointer is read in Φ1. The address used to access the control store is derived from the instruction queue entry as follows:

Table 9-15: Control Store Address Formation

Bit(s)   Value
<10>     0
<9:1>    IQ entry DISPATCH field
<0>      0

If the valid bit of the entry being read is set, and the instruction queue is selected as the CAL Input Bus source, E_USQ%MACRO_1ST_CYCLE_H is asserted and driven to the Ebox in Φ1 of S3. This signal is cleared on a microtrap, and stalls on a microsequencer stall. If the first cycle of an Fbox instruction is detected (<FI> is asserted), the signal E%FBOX_1ST_CYCLE_L is also asserted, and driven to the Fbox in Φ23 of S2. This signal is only asserted once per instruction; it is not stalled on a microsequencer stall.

If the valid bit of the entry to be read is not set, and the instruction queue is selected as the CAL Input Bus source, the last cycle logic selects the Instruction Queue Stall microaddress (030 hex), which is used to look up the stall microword in the control store. The stall microword is a NOP for the Ebox; it selects the last cycle logic again in the microsequencer. In addition to driving the stall microword to the Ebox, E_USQ%IQ_STALL_H is asserted in Φ1 of S2. This signal, in conjunction with memory management and hardware error signals driven by the Ibox, is used by the Ebox to detect instruction stream referencing errors.

The read pointer is advanced in Φ3 if E_USQ_CSM%UMIB_H selects the last cycle logic, the last cycle logic selects the instruction queue, and the valid bit in the queue entry that was read out is set.
When the read pointer is advanced, the valid bit in the entry read out is cleared. The read pointer is stalled on a microsequencer stall.

The instruction queue is flushed when the Ebox decodes RESET.CPU from the MIB (E_MSC%EARLY_FLUSH_EBOX_H is asserted). The pointers are reset, and the entry valid bits are cleared.

Table 9-16 shows the phase-by-phase events that occur on an instruction queue stall. Initially, the read and write pointers both have a value of 4; the queue is empty.

Table 9-16: Instruction Queue Operation

Phase   Action

Microcycle 1
1       E_USQ_CSM%UMIB_H = LAST.CYCLE
        E_USQ%IQ_STALL_H asserted
2       Last microword of instruction flow driven to S3
3       CAL = IQ stall address
4       Write I%IQ_BUS_H to Entry[4] (value = valid data)

Microcycle 2
1       E_USQ_CSM%UMIB_H = LAST.CYCLE
        E_USQ_INQ%IQ_OUT_H = Entry[4]
        E_USQ_LST%SELECT_IQ_H asserted
        E_USQ%IQ_STALL_H deasserted
2       NOP microword driven to S3
        Increment write pointer (pointer=5)
3       CAL = Entry[4]
        Increment read pointer (pointer=5)
        Clear valid bit in Entry[4]
4       Write I%IQ_BUS_H to Entry[5] (value = valid data)

Microcycle 3
1       E_USQ_CSM%UMIB_H = microsequencer field of first microword
        E_USQ%MACRO_1ST_CYCLE_H asserted
2       First microword of new instruction flow driven to S3
        Increment write pointer (pointer=6)
3
4

9.2.3.3.4.1 Instruction Context Latches

The instruction queue drives the dispatch address to the last cycle logic. The remainder of the queue entry (DL, OPCODE, FI) is latched in the instruction context (ICTX) latches. The format is as follows:

Figure 9-4: Instruction Context Format

 11 10 09 08 07 06 05 04 03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+
|          OPCODE          |  DL |FI|
+--+--+--+--+--+--+--+--+--+--+--+--+

Table 9-17: Instruction Context Format Field Definitions

Name     Extent   Description
OPCODE   11:3     9-bit opcode of the instruction.
DL       2:1      Initial data length of instruction operands.
FI       0        Set if entry is an Fbox instruction.

The output of the queue is latched every Φ2 for hold-time reasons. The ICTX master latch operates in Φ4 of S2, and is loaded from the queue output latch only when a valid entry is removed from the instruction queue (E_USQ_LST%SELECT_IQ_H is asserted). The ICTX slave latch operates in Φ1 of S3; its output (E_USQ%ICTX_H) is driven to the Ebox. The instruction context latches are only valid when their respective pipeline stages are executing macroinstructions.

Both the master and slave latches are stalled on a microsequencer stall. The slave latch is stalled holding the correct value for the current S3 cycle, and the master latch is stalled holding the correct value for the next cycle.

The opcode portion of the instruction context (E%FOPCODE_H) is driven to the Fbox from the instruction queue output latch, in Φ2 of S2.

9.2.3.4 Microstack

Frequently used microcode can be made into microsubroutines. When a microsubroutine is called, the return address is pushed onto the microstack. The output of the microstack is driven on the CAL Input Bus when a RETURN is decoded from E_USQ_CSM%UMIB_H, no microtraps are detected, and no test address is driven.

The microstack is 6 entries deep. It is a circular stack, with the write pointer always one entry ahead of the read pointer. Each entry is an 11-bit control store address. The addresses stored in the microstack incorporate any modification done by the microtest bus.

Every Φ1, the entry indicated by the microstack read pointer is read out into a Φ1 latch, where it is held to be driven on the CAL Input Bus in Φ3. Also in Φ1, the RETURN address is written into the entry ahead of the microstack read pointer. The RETURN address is formed by adding 1 to bits <3:0> of the CALL address in the CAL. Bits <10:4> are unchanged.
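The RETURN address formation and the circular organization of the microstack can be modeled in a few lines. The Python sketch below is illustrative only; the names return_address and Microstack are hypothetical. It assumes that the increment of bits <3:0> wraps within the 4-bit field, since bits <10:4> are stated to be unchanged. The hardware writes the entry ahead of the read pointer every Φ1; the sketch simplifies this by writing only on a CALL.

```python
STACK_DEPTH = 6  # the microstack is 6 entries deep


def return_address(call_address: int) -> int:
    """Add 1 to bits <3:0> of the CALL address, leaving <10:4> unchanged.

    Assumption: a carry out of bit 3 is discarded (4-bit wraparound).
    """
    return (call_address & 0x7F0) | ((call_address + 1) & 0x00F)


class Microstack:
    """Simplified circular microstack with a single pointer.

    The pointer indicates the read entry; the write entry is always one
    entry ahead of it, modulo the stack depth.
    """

    def __init__(self) -> None:
        self.entries = [0] * STACK_DEPTH
        self.pointer = 0  # reset value of the microstack pointer

    def call(self, call_address: int) -> None:
        """Model a CALL: write the RETURN address, then advance."""
        write_index = (self.pointer + 1) % STACK_DEPTH
        self.entries[write_index] = return_address(call_address)
        self.pointer = write_index

    def ret(self) -> int:
        """Model a RETURN: read the top entry, then move back."""
        address = self.entries[self.pointer]
        self.pointer = (self.pointer - 1) % STACK_DEPTH
        return address
```

With the pointer at 2, a CALL from address X writes the RETURN address to entry 3 and leaves the pointer at 3; an immediately following RETURN reads entry 3 and restores the pointer to 2.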
The microstack pointer is incremented in Φ4 on a CALL or CONDITIONAL CALL microinstruction; it is decremented on a RETURN microinstruction. The microstack pointer is stalled on a microsequencer stall. It is only reset when the chip reset signal, K_E%RESET_L, is asserted.

Figure 9-5: Microstack Organization

POINTER          ARRAY
+-----+        +------------------------------------+
|  0  |        |                                    |
+-----+        +------------------------------------+
|  1  |        | First Call writes here             |
+-----+        +------------------------------------+
|  2  |---+--->| Pointer = 2 read entry             |
+-----+   |    +------------------------------------+
|  3  |   +--->| Pointer = 2 write entry            |
+-----+        +------------------------------------+
|  4  |        |                                    |
+-----+        +------------------------------------+
|  5  |        |                                    |
+-----+        +------------------------------------+

Consider a CALL followed immediately by a RETURN with an initial microstack pointer value of 2. Table 9-18 shows the phase-by-phase operation of the microstack during the next three cycles.

X:    CALL Y
X+1:  {next microword}
Y:    RETURN

Table 9-18: Microstack Pointer Example

Phase  Action
Microcycle 1
1
2
3      CAL=X
4
Microcycle 2
1      Write X+1 (1) to Array[3]
       USTACK_OUT<10:0>=Array[2]
       E_USQ_CSM%UMIB_H = CALL
2
3      CAL=Y
4      Increment microstack pointer (pointer=3)
Microcycle 3
1      Write Y+1 to Array[4]
       USTACK_OUT<10:0>=Array[3] (value = X+1)
       E_USQ_CSM%UMIB_H = RETURN
2
3      CAL=X+1
4      Decrement microstack pointer (pointer=2)

(1) Assumption: the result of the increment to bits <3:0> of X is X+1.

9.2.4 Stall Logic

The microsequencer is stalled whenever S3 is stalled. The Ebox derives the signal E_STL%USEQ_STALL_H, which is used to stall the microsequencer.
The microsequencer creates delayed versions of this signal as needed to stall various latches. The signals E_USQ%PE_ABORT_L (asserted on initiation of a microtrap) and E_USQ_TST%FORCE_TEST_ADDR_L (asserted on detection of the Test Address Generator driving a control store microaddress, see Section 9.5) break a microsequencer stall by clearing the delayed versions of E_STL%USEQ_STALL_H. The following table shows the timing for all stallable logic in the microsequencer.

Table 9-19: Stall Timing in the Microsequencer

Phase What Stalls Φ1 ICTX slave latch to S3 E_USQ%MACRO_1ST_CYCLE_H latch to S3 E_USQ%MIB_H to S3 Current Address Latch E_USQ%MACRO_1ST_CYCLE_H master latch Instruction queue read pointer ICTX master latch to S3 Microstack pointer

9.3 Initialization

A reset (assertion of K_E%RESET_L) causes the microsequencer to initialize in the following state:

• A powerup microtrap is initiated (see Table 9-12 for microtrap ABORT effects).
• The microstack pointer is reset to zero.
• The instruction queue valid bits are flushed and its pointers are reset by E_MSC%EARLY_FLUSH_EBOX_H.
• The Patchable Control Store CAM is disabled, since PCSCR<PCS_ENB> is cleared in the Ebox.
• The MIB scan chain is controlled by the Parallel Test Port command pins, since PCSCR<PAR_PORT_DIS> is cleared in the Ebox.
• The Test Address Generator is reset to an address value of zero.

9.4 Microcode Restrictions

1. Every microtrap except Branch Mispredict must contain a RESET.CPU in order to reset the Instruction Queue. (The Ebox is flushed automatically, clearing the queues, on detection of branch mispredict.)
2. RESET.CPU must not be issued within the 3 microwords preceding LAST.CYCLE in order to allow time for the Instruction Queue to be cleared (if RESET.CPU is present in microword N, LAST.CYCLE cannot be present until microword N+4).
3. For correct operation of Trace Fault and First Part Done in the Last Cycle Logic, PSL<T,TP,FPD> must not be changed within the 2 microwords preceding LAST.CYCLE (if any of these PSL bits are changed in microword N, LAST.CYCLE cannot be present until microword N+3).
4. No Ebox-initiated memory requests can be made in the last cycle of a microflow, other than writes with the translation already known to be valid.
5. No Ebox-initiated memory requests can be outstanding when the microcode references an operand (queue entry or register file location).
6. The instruction queue stall microword must indicate LAST.CYCLE.
7. PSL<TP> must be cleared by the interrupt handler before it allows execution of an interrupted instruction to resume.
8. The Patchable Control Store (PCS) WRITE command, issued by writing a "1" into PCSCR<PCS_WRITE> in microinstruction N, must not be followed by a PCS ENABLE command (issued by writing a "1" into PCSCR<PCS_ENB>) before microinstruction N+2. Following the writing of the Patchable Control Store ENABLE bit (PCSCR<PCS_ENB>) in S5 by microinstruction N, the first microinstruction for which Patchable Control Store can be considered enabled is microinstruction N+4.
9. The First Part Done microflow must check for the case in which an unimplemented instruction begins execution with PSL<FPD> set. In this case, microcode must branch to the Reserved Instruction Fault microflow, rather than executing the normal First Part Done microflow.

9.5 Testability

9.5.1 Test Address

The control store microaddress is both controllable and observable. A microcode address can be driven to the microsequencer from the Test Address Generator. The Test Address Generator is an 11-bit counter which is initialized to a value of zero on assertion of K_E%RESET_L. It increments its address counter once on each deassertion of T%CS_TEST_H, thus cycling through all possible control store addresses. This microaddress source takes priority over all others.
To ensure immediate control store lookup using this microaddress, assertion of T%CS_TEST_H sets an S/R latch whose output is E_USQ_TST%FORCE_TEST_ADDR_L. Assertion of this signal breaks any stall on Φ2, Φ3, and Φ4 latches in the microsequencer. This allows the control store to operate, driving the selected microword into the MIB scan chain (see Section 9.5.2). The Ebox stall(s), if any, are unaffected, along with stalls on Φ1 latches in the microsequencer. E_USQ_TST%FORCE_TEST_ADDR_L is deasserted when the Test Address Generator has completed generation of all possible addresses (when its counter overflows).

The microaddress driven from the CAL can be observed on the Parallel Test Port data pins under control of the Parallel Test Port command pins. The microsequencer drives to the Parallel Test Port in Φ1.

Figure 9-6: Parallel Port Output Format

 11 10 09 08 07 06 05 04 03 02 01
+--+--+--+--+--+--+--+--+--+--+--+
|           CAL OUTPUT           |
+--+--+--+--+--+--+--+--+--+--+--+

Table 9-20: Parallel Port Output Format Field Definitions

Name        Extent  Description
CAL OUTPUT  11:1    Microaddress driven from CAL

9.5.2 MIB Scan Chain

A 92-bit scan chain is present at the output of the MIB, allowing the complete microword to be latched and scanned out of the chip. The scan chain master latches operate in Φ4; the slave latches operate in Φ2. In observe mode, the scan chain is loaded and shifted under control of the Parallel Test Port command pins. When scanning out, MIB scan chain bit <91> is the first bit to reach the Parallel Test Port. Note that control of the MIB scan chain must be given to the parallel port during this operation, by writing a 0 to PCSCR<PAR_PORT_DIS>. See Section 8.5.22.1 for details.

Table 9-21: Contents of MIB Scan Chain, in Observe Mode

Position Description Comment <91> Microword Field BRANCH.OFFSET (1)
<90> <89> E_UsqyaBJI<2> <88> E_UsqyaBJI<3> <87> E_UsqyaBJI<4> <86> E_~JI<6> <85> E_UsqyaB..B<IB> <84> E_USQCQUB..B<'7> lSee Chapter 6 for details on microword fields. 9-26 The Mlcrosequencer DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 9-21 (Cont.): Contents of MIB Scan Chain, In Observe Mode Position Description Comment <83> E_U8QMUBJI<M> Microword Field L <82> E_U8QMUBJI~ Microword Field MISC1 <81> E_U8QMUBJl4B> <80> E_U8QMUBJI<47> <79> E_U8QMUBJI<.M> <78> E_U8QMUB_B4O> Microword Field FMT <77> E_USQUDB_B<19> Microword Field MISC <76> E_U8QClIDMlBJI<18> <75> E_U8QMUB_B<l7> <74> E_U8QMUB_B<l8> <73> E_~JI<1a> <72> E_USQll(tMlB_B<3l> <71> E_USQIl(tMIB_B<3O> <70> E_U8QMUB_B<28> <69> E_U8QMUB_B<2B> <68> E_U8QMUB_B4'7> <67> E_U8QMUBJl4I> <66> E_U8QMUBJl4I> Microword Field DST Microword Field A <65> E..USQ'HIIB_B<M> <64> E_~JI<I3> <63> E_U8Q9IIBJI<D> <62> E_USQUDBJI<ll> <61> E_USQIl(tMIB_B<ID> <60> Value Undefined No Observe Input <59> Value Undefined No Observe Input <58> Value Undefined No Observe Input <57> Value Undefined No Observe Input <56> Value Undefined No Observe Input <55> Value Undefined No Observe Input <54> Value Undefined No Observe Input <53> Value Undefined No Observe Input <52> Value Undefined No Observe Input <51> Value Undefined No Observe Input <50> Value Undefined No Observe Input <49> E_U8QMUB_B<lO> Microword Field SEQ.COND <48> E_~_B<I> DIGITAL CONFIDENTIAL The Mlcrosequencer 9-27 NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 9-21 (Cont.): Contents of MIB Scan Chain, In Observe Mode Position Description <47> E_USQQUB-.&4> <46> <45> E_VSQCI!MIB-.&<1,o Microword Field SEQ.FMT E_tJSQClQIIB_B<13> Microword Field SEQ. 
CALL E_US~_B<12> Microword Field SEQ.COND <44> <43> <42> <41> <40> <39> <38> <37> <36> <35> <34> <33> <32> <31> <30> Comment E_VSQCI!MIB-.&<1l> E_VSQCJ&MIB_B<89> Microword Field B E_VSQ'QIIB-.&<38> E_tJSQClQIIBJ!<3'7> E_tJSQClQIIBJ!<88> E_tJSQClQIIB_B<36> E_tJSQClQIIBJ!~> Microword Field MISC2 ~US~_B<43> E_VSQ'QIIBJ!<42> ~tJSQClQIIB_B<4l> E_VSQCJ&MIB-.&<4I> Microword Field LIT E_tJSQClQIIB_B<4O> Microword Field D E_USQCY!JBJ!<6b Microword Field MRQ E_USQCJ&MIB-.&43> <29> <28> <27> E_USQCY!JBJ!<D> <26> <25> <24> <23> E_USQCJ&MIB_B<33> Microword Field W E_USQ'QIIB_B42> Microword Field V E_USQCJ&MIBJ!4B> Microword Field ALU <22> <21> <20> <19> E_USQ'UIIB-.&47> <18> <17> <16> <15> <14> <13> <12> E_tJSQClQIIB_B41> E_USQCY!JB-.&<50> E_USQCQIIBJ!<I58> E_USQCJ&MIBJ!<I58> E_USQCQIIBJ!<A> Value Undefined No Observe Input Value Undefined No Observe Input Value Undefined No Observe Input Value Undefined No Observe Input Value Undefined No Observe Input Value Undefined No Observe Input Value Undefined No Observe Input Value Undefined No Observe Input 9-28 The Mcrosequencer DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 9-21 (Cont.): Contents of MIB Scan Chain, In Observe Mode Position Description Comment <11> Value Undefined No Observe Input <10> Value Undefined No Observe Input <9> Value Undefined No Observe Input <8> Value Undefined No Observe Input <7> Value Undefined No Observe Input <6> Value Undefined No Observe Input <5> Value Undefined No Observe Input <4> Value Undefined No Observe Input <3> Value Undefined No Observe Input <2> Value Undefined No Observe Input <1> Value Undefined No Observe Input <0> Value Undefined No Observe Input DIGITAL CONFIDENTIAL The Mlcrosequencer 9-29 NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 9.6 Signal Cross Reference Note that the signal names used in this specification are the schema tic signal names. 
Table 9-22: Schematic Signal Names, in Alphabetical Order

Schematic Signal Name             Behavioral Model Signal Name
E%FBOX_1ST_CYCLE_L                E%FBOX_1ST_CYCLE_L
E%FOPCODE_H                       E%FOPCODE_H
E%INT_REQ_H                       E%INT_REQ_H
E_BUS%UTEST_L                     E_BUS%UTEST_H
E_FLT%FLOATING_FAULT_H            E%FLOATING_FAULT_H
E_FLT%HW_ERR_H                    E%HW_ERR_H
E_FLT%IOVFL_L                     E%IOVFL_H
E_FLT%MME_ERR_H                   E%MME_ERR_H
E_FLT%RSVD_ADDR_MODE_H            E%RSVD_ADDR_MODE_H
E_FLT%RSVD_INSTR_L                E%RSVD_INSTR_FAULT_H
E_MSC%EARLY_FLUSH_EBOX_H          E_MSC%EARLY_FLUSH_EBOX_H
E_PSL%BRANCH_MISPREDICT_H         E%BRANCH_MISPREDICT_H
E_PSL%PSL_H                       E_PSL%PSL_H
E_STL%USEQ_STALL_H                E_STL%USEQ_STALL_H
E_TIM%S3_TIMEOUT_H                E_TIM%S3_TIMEOUT_H
E_USQ%ICTX_H                      E_USQ%ICTX_H
E_USQ%IQ_STALL_H                  E_USQ%IQ_STALL_H
E_USQ%MACRO_1ST_CYCLE_H           E_USQ%MACRO_1ST_CYCLE_H
E_USQ%MIB_H                       E_USQ%MIB_H
E_USQ%PE_ABORT_L                  E_USQ%PE_ABORT_H
E_USQ%UTSEL_H                     E_USQ%UTSEL_H
E_USQ_BUS%CAL_INPUT_L             E_USQ_BUS%CAL_INPUT_L
E_USQ_CAL%CAL_H                   E_USQ_CAL%CAL_H
E_USQ_CSM%UMIB_H                  E_USQ_CSM%UMIB_H
E_USQ_INQ%IQ_OUT_H                E_USQ_INQ%IQ_OUT_H
E_USQ_LST%SELECT_IQ_H             E_USQ_LST%SELECT_IQ_H
E_USQ_STL%LATE_USQ_STALL_L        E_USQ_STL%LATE_USQ_STALL_L
E_USQ_STL%VERY_LATE_USQ_STALL_L   E_USQ_STL%VERY_LATE_USQ_STALL_L
E_USQ_TST%FORCE_TEST_ADDR_L       E_USQ_TST%FORCE_TEST_ADDR_L
I%IMEM_MEXC_H                     I%IMEM_MEXC_H
I%IQ_BUS_H                        I%IQ_BUS_H
K_E%RESET_L                       K%RESET_L
M%MME_TRAP_L                      M%MME_TRAP_H
M%TB_PERR_TRAP_L                  M%TB_PERR_TRAP_H
T%CS_TEST_H                       T%CS_TEST_H

Table 9-23: Behavioral Model Signal Names, in Alphabetical Order

Behavioral Model Signal Name      Schematic Signal Name
E%BRANCH_MISPREDICT_H             E_PSL%BRANCH_MISPREDICT_H
E%FBOX_1ST_CYCLE_L                E%FBOX_1ST_CYCLE_L
E%FLOATING_FAULT_H                E_FLT%FLOATING_FAULT_H
E%FOPCODE_H                       E%FOPCODE_H
E%HW_ERR_H                        E_FLT%HW_ERR_H
E%INT_REQ_H                       E%INT_REQ_H
E%IOVFL_H                         E_FLT%IOVFL_L
E%MME_ERR_H                       E_FLT%MME_ERR_H
E%RSVD_ADDR_MODE_H                E_FLT%RSVD_ADDR_MODE_H
E%RSVD_INSTR_FAULT_H              E_FLT%RSVD_INSTR_L
E_BUS%UTEST_H                     E_BUS%UTEST_L
E_MSC%EARLY_FLUSH_EBOX_H          E_MSC%EARLY_FLUSH_EBOX_H
E_PSL%PSL_H                       E_PSL%PSL_H
E_STL%USEQ_STALL_H                E_STL%USEQ_STALL_H
E_TIM%S3_TIMEOUT_H                E_TIM%S3_TIMEOUT_H
E_USQ%IQ_STALL_H                  E_USQ%IQ_STALL_H
E_USQ%MACRO_1ST_CYCLE_H           E_USQ%MACRO_1ST_CYCLE_H
E_USQ%MIB_H                       E_USQ%MIB_H
E_USQ%PE_ABORT_H                  E_USQ%PE_ABORT_L
E_USQ%UTSEL_H                     E_USQ%UTSEL_H
E_USQ_BUS%CAL_INPUT_L             E_USQ_BUS%CAL_INPUT_L
E_USQ_CAL%CAL_H                   E_USQ_CAL%CAL_H
E_USQ_CSM%UMIB_H                  E_USQ_CSM%UMIB_H
E_USQ%ICTX_H                      E_USQ%ICTX_H
E_USQ_INQ%IQ_OUT_H                E_USQ_INQ%IQ_OUT_H
E_USQ_LST%SELECT_IQ_H             E_USQ_LST%SELECT_IQ_H
E_USQ_STL%LATE_USQ_STALL_L        E_USQ_STL%LATE_USQ_STALL_L
E_USQ_STL%VERY_LATE_USQ_STALL_L   E_USQ_STL%VERY_LATE_USQ_STALL_L
E_USQ_TST%FORCE_TEST_ADDR_L       E_USQ_TST%FORCE_TEST_ADDR_L
I%IMEM_MEXC_H                     I%IMEM_MEXC_H
I%IQ_BUS_H                        I%IQ_BUS_H
K%RESET_L                         K_E%RESET_L
M%MME_TRAP_H                      M%MME_TRAP_L
M%TB_PERR_TRAP_H                  M%TB_PERR_TRAP_L
T%CS_TEST_H                       T%CS_TEST_H

9.7 Revision History

Table 9-24: Revision History

Rev   Who                                   When         Description of change
0.0   Elizabeth M. Cooper                   06-Mar-1989  Release for external review.
0.1   Elizabeth M. Cooper                   14-Sep-1989  Post-modelling update.
0.5   Elizabeth M. Cooper                   10-Dec-1989  Updates for Rev 0.5 spec release.
0.5A  Elizabeth M. Cooper                   5-Jan-1990   Remove vector microtrap and V bit from IQ.
0.5B  Elizabeth M. Cooper                   20-Jun-1990  Accumulated updates.
0.6A  Elizabeth M. Cooper                   26-Nov-1990  Final updates.
0.6B  Elizabeth M. Cooper, Tim C. Fischer   12-Dec-1990  Final final updates.
0.6C  Elizabeth M. Cooper                   1-Jan-1991   Add signal cross reference tables.
0.6D  Elizabeth M. Cooper                   13-Feb-1991  Add description of patch revision.

Chapter 10 The Interrupt Section

10.1 Overview

The interrupt section receives interrupt requests from both internal and external sources, and compares the IPL associated with the interrupt request to the current interrupt level in the PSL. If the interrupt request is for an IPL that is higher than the current PSL IPL, the interrupt section signals an interrupt request to the microsequencer, which will initiate a microcode interrupt handler at the next macroinstruction boundary.

When an interrupt is serviced by the Ebox microcode, the interrupt section provides an encoded interrupt ID on E_BUS%ABUS_L<20:16>, which allows the microcode to determine the highest priority interrupt request that is pending. Interrupt requests are cleared in one of three ways, depending on the type of request.

Software interrupt requests are supported via a 15-bit SISR register, which is read and written by the microcode, and which makes requests to the interrupt generation logic.

Both full and subset interval timer support is provided, based on the state of the ICCS_EXT bit in the ECR processor register, as described in Section 8.5.22. If ECR<ICCS_EXT>=0, a subset interval timer is supported by implementing the interrupt enable bit of the ICCS processor register in internal logic.
If ECR<ICCS_EXT>=1, a full interval timer is supported, and external logic must implement the full ICCS, ICR, and NICR processor registers. In this instance, reads from and writes to these registers are converted to I/O space addresses and transmitted off-chip, as described in Section 2.12, Processor Registers.

10.2 Interrupt Summary

Interrupt requests received from external logic are divided into two categories: those received by edge-sensitive logic, and those received by level-sensitive logic. Both are synchronized to internal clocks. In addition, there are several internal sources of interrupt requests.

10.2.1 External Interrupt Requests Received by Edge-Sensitive Logic

Five of the external interrupt requests are received by edge-sensitive logic and synchronized to internal clocks. These signals request the following special-purpose interrupts.

• P%HALT_L: The assertion of P%HALT_L causes the CPU to enter the console at IPL 1F (hex) at the next macroinstruction boundary. This interrupt is not gated by the current IPL, and always results in console entry, even if the IPL is already 1F (hex). Note that the implementation of this event is different from a normal interrupt in which a PC/PSL pair is pushed onto the interrupt stack. For this event, the current PC, PSL, and halt code are stored in the SAVPC and SAVPSL processor registers. The mechanism by which the console is entered, and a description of the SAVPC and SAVPSL processor registers, is given in Section 15.4, Console Halt and Halt Interrupt.
• P%PWRFL_L: The assertion of P%PWRFL_L indicates that a power failure is pending. This results in the dispatch of the interrupt to the operating system at IPL 1E (hex) through SCB vector 0C (hex).
• P%H_ERR_L: The assertion of P%H_ERR_L indicates that a hard error has been detected in the system environment. This results in the dispatch of the interrupt to the operating system at IPL 1D (hex) through SCB vector 60 (hex).
• P%S_ERR_L: The assertion of P%S_ERR_L indicates that a soft error has been detected in the system environment. This results in the dispatch of the interrupt to the operating system at IPL 1A (hex) through SCB vector 54 (hex).
• P%INT_TIM_L: The assertion of P%INT_TIM_L indicates that the interval timer period has expired. If the interrupt enable bit in the ICCS processor register is set (whether this bit is implemented internally or externally), an interrupt is dispatched to the operating system at IPL 16 (hex) through SCB vector C0 (hex). If ICCS<6> is not set, no interrupt is dispatched.

Each signal must make a high-to-low transition to assert the interrupt request. A pseudo-edge detect circuit is used to capture this transition asynchronously. Details of the edge detect logic are given in Section 10.3.1.

Because these are special-purpose interrupt requests with an implied SCB vector, no acknowledgement of the interrupt is required. Ebox microcode explicitly clears the interrupt request when the interrupt is serviced.

10.2.2 External Interrupt Requests Received by Level-Sensitive Logic

Four of the external interrupt requests are received by level-sensitive logic and synchronized to internal clocks. These signals request general-purpose interrupts at the following IPLs.

Interrupt Request  Request IPL
                   (Hex)  (Dec)
P%IRQ_L<3>         17     23
P%IRQ_L<2>         16     22
P%IRQ_L<1>         15     21
P%IRQ_L<0>         14     20

Each signal must be driven low and remain low to assert the interrupt request. When one of these interrupts is to be serviced, the Ebox microcode acknowledges the interrupt by issuing an
NDAL read of word length to one of four longword-aligned interrupt vector offset registers to obtain the SCB offset through which the interrupt should be dispatched. The address of the register depends on the interrupt being serviced, as shown in Table 10-1.

Table 10-1: Interrupt Vector Offset Registers

Interrupt Request  Vector Offset Register Address  Processor Register (1)
P%IRQ_L<3>         E100010C                        IAK17
P%IRQ_L<2>         E1000108                        IAK16
P%IRQ_L<1>         E1000104                        IAK15
P%IRQ_L<0>         E1000100                        IAK14

(1) Direct access to the interrupt vector offset registers is provided via processor register reads for system test. Software references to these processor registers during normal system operation can result in UNDEFINED behavior.

In response, the microcode expects to receive an interrupt SCB vector offset, which is shown in Figure 10-1. The fields are described in Table 10-2.

Figure 10-1: Interrupt SCB Vector Offset

 31                            16 15                          02 01 00
+--------------------------------+------------------------------+--+--+
| x x x x x x x x x x x x x x x x| System Control Block Offset  |PR|IL|  :IAK1x
+--------------------------------+------------------------------+--+--+

Table 10-2: Interrupt SCB Vector Offset

Name        Extent  Description
IL          0       Interrupt Level Override. In normal operation, the IPL at which the interrupt is serviced is implied by the request signal that was asserted. If the IL bit is set in the interrupt vector offset, the IPL at which the interrupt is taken is forced to 17 (hex). This capability supports external buses, such as the Q-bus, that cannot guarantee that the device that responds with the interrupt SCB vector offset is the device that originally requested the interrupt. For example, the Q-bus has four separate interrupt request signals that correspond to P%IRQ_L<3:0> but only one signal to daisy chain the interrupt grant. Furthermore, devices on the Q-bus are ordered so that higher priority devices are electrically closer to the bus master. If a P%IRQ_L<1> request is being serviced, there is no guarantee that a higher priority device will not intercept the grant. Software must determine the level of the device that was serviced and set the IPL to the correct value.
PR          1       Passive Release Flag. In certain circumstances, notably in multi-processor configurations, an interrupt may be requested but removed by the time the microcode acknowledges it by reading the interrupt vector offset register. If the PR bit is set in the interrupt SCB vector offset, the microcode treats this interrupt as an internal passive release and resumes the interrupted instruction stream without dispatching the interrupt. If the interrupt request is deasserted before the microcode reads the interrupt ID, the ID will be zero, indicating that no interrupt is pending. In that instance, no read of the interrupt vector offset register is done, and the microcode generates an immediate passive release.
SCB Offset  15:2    Longword offset from the start of the SCB of the vector to use to dispatch this interrupt. After zero-extending to longword length, microcode adds this value to the contents of the SCBB register, reads that location, and uses it as the SCB vector with which to dispatch the interrupt to the operating system.

NOTE
If both the PR and IL bits are set in the interrupt SCB vector offset, the PR bit takes priority and a passive release is done.

10.2.3 Internal Interrupt Requests

The Cbox, Ibox, and Mbox report error conditions by asserting internal interrupt request signals that are logically ORed with the synchronized versions of P%H_ERR_L and P%S_ERR_L. These requests are then handled in exactly the same manner as requests generated by external sources, as specified above.
The following table details the internal interrupt sources.

Table 10-3: Internal Interrupt Requests

Signal            Source  Type
C%CBOX_H_ERR_H    CBOX    H_ERR_L
C%CBOX_S_ERR_H    CBOX    S_ERR_L
I%IBOX_S_ERR_L    IBOX    S_ERR_L
M%MBOX_S_ERR_H    MBOX    S_ERR_L

The performance monitoring facility requests an interrupt at IPL 1B (hex) when the performance counters become half full. The performance monitoring hardware asserts the signal E_PMN%PMON_L to perform this request. This request is serviced entirely by microcode, and cleared by writing to the appropriate bit in the ISR. Chapter 18 should be consulted for more details about the Performance Monitoring facilities.

Architecturally defined software interrupt requests are implemented through an internal register in the interrupt section. Under control of the SISR and SIRR processor registers, which are described in Chapter 2, the Ebox microcode sets the appropriate bit in this register, which then results in the dispatch of the interrupt to the operating system at an IPL and through the SCB vector implied by the interrupt request. The association between the interrupt request, requested IPL, and SCB vector for these requests is shown in the following table.

Table 10-4: Software Interrupts

SISR bit   Request IPL     SCB Vector
           (Hex)  (Dec)    (Hex)
SISR<15>   0F     15       BC
SISR<14>   0E     14       B8
SISR<13>   0D     13       B4
SISR<12>   0C     12       B0
SISR<11>   0B     11       AC
SISR<10>   0A     10       A8
SISR<09>   09     09       A4
SISR<08>   08     08       A0
SISR<07>   07     07       9C
SISR<06>   06     06       98
SISR<05>   05     05       94
SISR<04>   04     04       90
SISR<03>   03     03       8C
SISR<02>   02     02       88
SISR<01>   01     01       84

Ebox microcode explicitly clears the interrupt request when the interrupt is serviced.
10.2.4 Special Considerations for Interval Timer Interrupts

The NVAX CPU may be configured to support either a subset interval timer or a full interval timer, depending on the state of ECR<ICCS_EXT>, as described in Section 8.5.22, Ebox IPRs. Console firmware initializes this bit to the correct state based on the system environment in which the CPU chip is used.

The internal implementation of the interval timer interrupt request gates the assertion of P%INT_TIM_L with the internal copy of the interrupt enable bit of the ICCS processor register (ICCS<6>). The CPU chip does not know the source of the signal driving P%INT_TIM_L, and this fact is used to allow the implementation of both a subset and full interval timer.

If ECR<ICCS_EXT>=0, an SRM-approved subset interval timer may be implemented by driving P%INT_TIM_L with an oscillator whose period is 10 ms. In this mode, the NICR and ICR processor registers are neither required nor implemented, and microcode maintains the subset ICCS processor register with an internal copy of only the interrupt enable bit from ICCS<6>. References to the ICCS processor register affect only ICCS<6>, and are handled internally without being transmitted on the NDAL.

If ECR<ICCS_EXT>=1, a full interval timer consisting of the ICCS, NICR, and ICR processor registers may be implemented in external logic. P%INT_TIM_L is asserted when the programmed interval has expired. Processor register references to the ICCS, NICR, and ICR processor registers are converted to I/O space references and transmitted onto the NDAL, as described in Section 2.12, Processor Registers. However, even in this mode, microcode maintains the internal copy of ICCS<6> consistent with a write to ICCS that is transmitted onto the NDAL.
As a result, if interrupts are enabled in the off-chip ICCS register, they are also allowed by the internal ICCS interrupt enable bit. Conversely, if interrupts are disabled in the off-chip ICCS register, they are also disabled by the internal bit. External logic is expected to return all 32 bits when the ICCS processor register is read, including the correct state of the interrupt enable bit. Microcode does not attempt to merge the external data with the internal copy of ICCS<6> to satisfy a processor register read of ICCS.

It should be noted that ECR<ICCS_EXT> has no effect on the operation of the interrupt section hardware. It is used strictly as a control bit which directs the microcode operation of references to the ICCS processor register. Independent of the state of ECR<ICCS_EXT>, processor register writes to ICCS cause microcode to update the internal copy of the interrupt enable bit. If ECR<ICCS_EXT>=1, references to the ICCS processor register are also transmitted onto the NDAL. References to the NICR and ICR processor registers are always transmitted onto the NDAL; they are simply not used if the system implements a subset interval timer.

Table 10-5 gives a summary of the results of references to the ICCS, NICR, and ICR processor registers, with both states of ECR<ICCS_EXT>.

Table 10-5: References to Interval Timer Processor Registers

Operation           ECR<ICCS_EXT>=0                          ECR<ICCS_EXT>=1
MTPR x,#PR$_ICCS    Update internal ICCS<6>                  Update internal ICCS<6>, write data to E1000060 (1)
MFPR #PR$_ICCS,x    Return internal ICCS<6>                  Read and return data from E1000060 (1)
MTPR x,#PR$_NICR    Write data to E1000064 (1)               Write data to E1000064 (1)
MFPR #PR$_NICR,x    Read and return data from E1000064 (1)   Read and return data from E1000064 (1)
MTPR x,#PR$_ICR     Write data to E1000068 (1)               Write data to E1000068 (1)
MFPR #PR$_ICR,x     Read and return data from E1000068 (1)   Read and return data from E1000068 (1)

(1) See Section 2.12.

10.2.5 Priority of Interrupt Requests

When multiple interrupt requests are pending, the interrupt section prioritizes the requests.
Table 10-6 shows the relative priority (from highest to lowest) of all interrupt requests. For reference, this table also includes the IPL at which the interrupt is taken, and the SCB vector through which the interrupt is dispatched.

Table 10-6: Relative Interrupt Priority

Interrupt Request   Request IPL     SCB Vector
                    (Hex)  (Dec)    (Hex)
P%HALT_L            1F     31       None (1)                  Highest priority
P%PWRFL_L           1E     30       0C
P%H_ERR_L (2)       1D     29       60
E_PMN%PMON_L        1B     27       None (5)
P%S_ERR_L (2)       1A     26       54
P%IRQ_L<3>          17     23       Specified by device (3)
P%IRQ_L<2>          16     22       Specified by device (3)
P%INT_TIM_L (4)     16     22       C0
P%IRQ_L<1>          15     21       Specified by device (3)
P%IRQ_L<0>          14     20       Specified by device (3)
SISR<15>            0F     15       BC
SISR<14>            0E     14       B8
SISR<13>            0D     13       B4
SISR<12>            0C     12       B0
SISR<11>            0B     11       AC
SISR<10>            0A     10       A8
SISR<09>            09     09       A4
SISR<08>            08     08       A0
SISR<07>            07     07       9C
SISR<06>            06     06       98
SISR<05>            05     05       94
SISR<04>            04     04       90
SISR<03>            03     03       8C
SISR<02>            02     02       88
SISR<01>            01     01       84                        Lowest priority

(1) Direct dispatch to console; PC, PSL placed in SAVPC, SAVPSL processor registers
(2) Includes Cbox, Ibox, and Mbox internally generated requests
(3) SCB vector offset supplied by the device
(4) When enabled by the internal ICCS<6>
(5) Interrupt processed entirely by microcode

The P%IRQ_L<2> request takes priority over the P%INT_TIM_L request, both of which are at IPL 16 (hex). Inter-processor interrupts in multi-processor systems are requested via P%IRQ_L<2>, and they must take priority over interval timer requests.

10.3 Interrupt Section Structure

The interrupt section consists of three basic components: the edge detect and synchronization logic, the interrupt state register (ISR), and the interrupt generation logic. A block diagram of the interrupt section is shown in Figure 10-2.
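As a rough behavioral illustration of the prioritization performed by the interrupt generation logic, the sketch below walks the Table 10-6 ordering and returns the first pending request that is allowed at the current IPL. It is a sketch under stated assumptions, not the hardware algorithm: `PRIORITY`, `select_interrupt`, and the `iccs_enable` parameter are hypothetical names, requests are identified by signal-name strings purely for readability, and the IL override and passive release handling of Table 10-2 are not modeled.

```python
# Requests in priority order (highest first) with their request IPLs,
# transcribed from Table 10-6.
PRIORITY = [
    ("P%HALT_L", 0x1F),
    ("P%PWRFL_L", 0x1E),
    ("P%H_ERR_L", 0x1D),
    ("E_PMN%PMON_L", 0x1B),
    ("P%S_ERR_L", 0x1A),
    ("P%IRQ_L<3>", 0x17),
    ("P%IRQ_L<2>", 0x16),     # listed ahead of the interval timer at IPL 16
    ("P%INT_TIM_L", 0x16),
    ("P%IRQ_L<1>", 0x15),
    ("P%IRQ_L<0>", 0x14),
] + [(f"SISR<{n:02d}>", n) for n in range(15, 0, -1)]


def select_interrupt(pending, current_ipl, iccs_enable=True):
    """Return (name, ipl) of the request to service, or None if there is none.

    pending     -- set of request names that are currently asserted
    current_ipl -- current PSL IPL
    iccs_enable -- assumed stand-in for the internal copy of ICCS<6>
    """
    for name, ipl in PRIORITY:
        if name not in pending:
            continue
        if name == "P%INT_TIM_L" and not iccs_enable:
            continue          # timer request is gated by the enable bit
        # P%HALT_L is not gated by the current IPL; every other request
        # must be for an IPL higher than the current PSL IPL.
        if name == "P%HALT_L" or ipl > current_ipl:
            return name, ipl
    return None
```

For example, with both P%IRQ_L<2> and P%INT_TIM_L pending at a current IPL of 0, the function selects P%IRQ_L<2>, which matches the tie-break at IPL 16 (hex) described above.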
Figure 10-2: Interrupt Section Block Diagram

[Block diagram not reproduced.]

10.3.1 Edge Detect and Synchronization Logic

10.3.1.1 Edge Detect Circuitry

The pads for the five special-purpose external interrupt request signals contain logic which detects high-to-low transitions on these signals. A falling edge sets an SR flip-flop which begins the interrupt request process. This interrupt request process involves setting another SR flip-flop to register the interrupt. This second flip-flop may only be cleared by microcode. Microcode clears this flip-flop while servicing the interrupt request.
The edge detect circuitry resets itself automatically (clearing the first SR) within two NDAL cycles following the low-to-high transition of the pin.

10.3.1.2 Interrupt Synchronization

The pads for all external interrupt request signals (both the edge-sensitive and level-sensitive types) contain synchronizers to allow the use of asynchronous signals for interrupt requests. The pin signals are synchronized to the internal NVAX clocks and are then passed to the ISR.

More deterministic timing behavior may be desired in some applications, such as during test. This may be achieved by driving the signals synchronously with respect to the input clocks. The chapter on Electrical Characteristics should be consulted for details about setup and hold times.

10.3.2 Interrupt State Register

The interrupt state register is a composite register that implements the 15-bit architecturally defined SISR register, the internal copy of the interrupt enable bit from the ICCS processor register, the interrupt latch for the performance monitoring facility interrupt, and the interrupt request latches for the 5 special-purpose and 4 general-purpose interrupts. The ISR contains two kinds of elements: SR flops for the special-purpose interrupt requests, and latches for the other requests. The following table lists the types and positions of all elements in the ISR.

ISR bit   State Element   Description
31        SR              Interrupt request for P%HALT_L interrupt
30        SR              Interrupt request for P%PWRFL_L interrupt
29        SR              Interrupt request for P%H_ERR_L and internal hard error interrupts
28        SR              Interrupt request for E_PMN%PMON_L, the performance monitoring facility interrupt
27        SR              Interrupt request for P%S_ERR_L and internal soft error interrupts
26        L               Interrupt request for P%IRQ_L<3> interrupt
25        L               Interrupt request for P%IRQ_L<2> interrupt
24        SR              Interrupt request for P%INT_TIM_L interrupt
23        L               Interrupt request for P%IRQ_L<1> interrupt
22        L               Interrupt request for P%IRQ_L<0> interrupt
15:1      L               SISR<15:1> latches and requests for software interrupts
0         L               Internal ICCS<6> latch

State Element: SR = SR flop, L = Latch

Synchronized inputs from the external special-purpose interrupt requests are logically ORed with the internal requests from the Cbox, Ibox, and Mbox. The assertion of one of these signals causes the appropriate request flop to be set in ISR<31:29,27,24>. These request flops are cleared under Ebox microcode control when written with a 1 from the corresponding bits of E_BUS%WBUS_L.

Synchronized inputs from the general-purpose interrupt requests are loaded into the appropriate latch in ISR<26:25,23:22>. These request latches are cleared when the interrupting device deasserts the interrupt request in response to a CPU request for an interrupt vector offset.

The performance monitoring facility interrupt request is loaded into the request flop in ISR<28>. The request is cleared under Ebox microcode control when written with a 1 from E_BUS%WBUS_L<28>.

SISR<15:1> is implemented via ISR<15:1>, and is loaded from bits <15:1> of E_BUS%WBUS_L under Ebox microcode control. These request latches are cleared under Ebox microcode control when a new value is loaded from E_BUS%WBUS_L.

The internal copy of the interrupt enable bit in the ICCS processor register (ICCS<6>) is implemented via ISR<0>, and is loaded from E_BUS%WBUS_L<0> under Ebox microcode control. Local logic gates the interval timer request from ISR<24> with the state of ISR<0>.
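
The write behavior described above can be sketched as a small software model. This is only an illustration under stated assumptions, not the chip's logic: the names W1C_MASK, LOAD_MASK, write_isr, and timer_request are hypothetical, the ISR is treated as a plain 32-bit integer, and signal polarity on E_BUS%WBUS_L is ignored.

```python
# Hypothetical software model of an ISR write; names are illustrative only.
W1C_MASK = (0b11111 << 27) | (1 << 24)  # ISR<31:27,24>: write-one-to-clear request flops
LOAD_MASK = 0xFFFF                      # ISR<15:0>: SISR<15:1> and the ICCS<6> copy


def write_isr(isr, wbus):
    """Return the new ISR value after microcode writes wbus to it."""
    # A 1 written to a request flop clears it; a 0 leaves it unchanged.
    cleared = isr & ~(wbus & W1C_MASK)
    # SISR<15:1> and ICCS<6> are simply loaded from the write data.
    # ISR<26:25,23:22> are not touched: devices clear those latches.
    return (cleared & ~LOAD_MASK) | (wbus & LOAD_MASK)


def timer_request(isr):
    """Interval timer request: ISR<24> gated with the enable bit in ISR<0>."""
    return bool((isr >> 24) & 1 and isr & 1)
```

For example, writing a value with bit 24 set clears a pending interval timer request while leaving a pending P%HALT_L request in ISR<31> untouched.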
The interrupt request elements of the interrupt state register (ISR<31:22,15:1>) go to the interrupt generation logic. ISR<0> and ISR<15:1> may also be read onto E_BUS%ABUS_L for return to the Ebox.

10.3.3 Interrupt Generation Logic

The interrupt generation logic priority encodes all interrupt requests from the interrupt state register to determine the highest priority request. The output of the encoder is the request IPL and the interrupt ID of the highest priority request.

If any request is pending, the request IPL is compared against E_PSL%PSL_H<20:16> from the Ebox. If the request IPL is higher than the PSL IPL, or if the request is for P%HALT_L (P%HALT_L is not gated by the IPL), E%INT_REQ_H is asserted to the microsequencer.

The assertion of E%INT_REQ_H causes the microsequencer to initiate a microcode interrupt handler at the next macroinstruction boundary. The same signal is available on the microtest bus (E_BUS%UTEST_L<0>) as a microbranch condition, which is checked by the Ebox microcode during long instructions.

Along with the request IPL, the interrupt generation logic provides an encoded interrupt ID that identifies the highest priority interrupt. The interrupt ID is read onto bits <20:16> of E_BUS%ABUS_L along with ISR<0> and ISR<15:1> when microcode references the A/INT.SYS source. For each interrupt, the interrupt ID encoding, request IPL, ISR bit number, method for clearing the interrupt, and SCB vector are shown in Table 10-7.
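
The priority encoding and IPL comparison described in this section can be sketched as follows. This is a minimal model under stated assumptions: the REQUESTS list and the interrupt_request function are hypothetical names, the ISR bit numbers, request IPLs, and interrupt IDs are taken from the tables in this chapter, and the handling of requests that are deasserted after dispatch is not modeled.

```python
# Hypothetical model of the interrupt generation logic; names are illustrative.
# Each entry is (ISR bit, request IPL, interrupt ID), highest priority first.
REQUESTS = [
    (31, 0x1F, 0x1F),  # P%HALT_L
    (30, 0x1E, 0x1E),  # P%PWRFL_L
    (29, 0x1D, 0x1D),  # P%H_ERR_L and internal hard errors
    (28, 0x1B, 0x1B),  # E_PMN%PMON_L
    (27, 0x1A, 0x1A),  # P%S_ERR_L and internal soft errors
    (26, 0x17, 0x17),  # P%IRQ_L<3>
    (25, 0x16, 0x16),  # P%IRQ_L<2>
    (24, 0x16, 0x1C),  # P%INT_TIM_L (interrupt ID differs from IPL)
    (23, 0x15, 0x15),  # P%IRQ_L<1>
    (22, 0x14, 0x14),  # P%IRQ_L<0>
] + [(n, n, n) for n in range(15, 0, -1)]  # SISR<15> down to SISR<1>


def interrupt_request(isr, psl_ipl):
    """Return (int_req, request_ipl, int_id) for the highest priority request."""
    timer_enabled = isr & 1  # internal copy of ICCS<6> in ISR<0>
    for bit, ipl, int_id in REQUESTS:
        if not (isr >> bit) & 1:
            continue
        if bit == 24 and not timer_enabled:
            continue  # interval timer request is gated by ISR<0>
        # P%HALT_L is not gated by the IPL; every other request must exceed it.
        int_req = bit == 31 or ipl > psl_ipl
        return int_req, ipl, int_id
    return False, 0, 0
```

With both P%IRQ_L<2> and an enabled interval timer request pending, the model selects P%IRQ_L<2>, matching the ordering in which inter-processor interrupts take priority over interval timer requests at the same IPL.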
Table 10-7: Summary of Interrupts

Interrupt Request   Int ID          Request IPL     ISR Bit   Reset Method         SCB Vector (Hex)
                    (Hex)  (Dec)    (Hex)  (Dec)    (Dec)
P%HALT_L            1F     31       1F     31       31        Write 1 to ISR bit   Console Halt
P%PWRFL_L           1E     30       1E     30       30        Write 1 to ISR bit   0C
P%H_ERR_L [1]       1D     29       1D     29       29        Write 1 to ISR bit   60
E_PMN%PMON_L        1B     27       1B     27       28 [2]    Write 1 to ISR bit   Handled by microcode
P%S_ERR_L [1]       1A     26       1A     26       27 [2]    Write 1 to ISR bit   54
P%IRQ_L<3>          17     23       17     23       26        Read IAK17 IPR       Supplied by device
P%IRQ_L<2>          16     22       16     22       25        Read IAK16 IPR       Supplied by device
P%INT_TIM_L         1C [3] 28       16     22       24 [2]    Write 1 to ISR bit   C0
P%IRQ_L<1>          15     21       15     21       23        Read IAK15 IPR       Supplied by device
P%IRQ_L<0>          14     20       14     20       22        Read IAK14 IPR       Supplied by device
SISR<15>            0F     15       0F     15       15        Write 0 to ISR bit   BC
SISR<14>            0E     14       0E     14       14        Write 0 to ISR bit   B8
SISR<13>            0D     13       0D     13       13        Write 0 to ISR bit   B4
SISR<12>            0C     12       0C     12       12        Write 0 to ISR bit   B0
SISR<11>            0B     11       0B     11       11        Write 0 to ISR bit   AC
SISR<10>            0A     10       0A     10       10        Write 0 to ISR bit   A8
SISR<09>            09     09       09     09       09        Write 0 to ISR bit   A4
SISR<08>            08     08       08     08       08        Write 0 to ISR bit   A0
SISR<07>            07     07       07     07       07        Write 0 to ISR bit   9C
SISR<06>            06     06       06     06       06        Write 0 to ISR bit   98
SISR<05>            05     05       05     05       05        Write 0 to ISR bit   94
SISR<04>            04     04       04     04       04        Write 0 to ISR bit   90
SISR<03>            03     03       03     03       03        Write 0 to ISR bit   8C
SISR<02>            02     02       02     02       02        Write 0 to ISR bit   88
SISR<01>            01     01       01     01       01        Write 0 to ISR bit   84
No Interrupt        00     00                                                      Dismiss interrupt

[1] Includes Cbox, Ibox, and Mbox internally generated requests
[2] Write-1-to-clear ISR bit is different than IPL and interrupt ID
[3] Interrupt ID is different than IPL

The interrupt ID is the same as the request IPL for all interrupt requests except
for the interval timer request.

DESIGN CONSTRAINT

A value of zero for the interrupt ID must be returned if an interrupt is no longer present, or if the highest priority interrupt request is no longer higher than the PSL IPL. Normally, once an interrupt request is made, it remains until it is cleared by the microcode. However, the level-sensitive interrupt requests may be deasserted after the interrupt is dispatched, but before the microcode reads the interrupt ID. Therefore, it is possible that the highest remaining interrupt has a request IPL lower than the current PSL IPL. If zero is not returned for the interrupt ID in this instance, the processor will not function correctly.

10.4 Ebox Microcode Interface

The Ebox microcode interfaces with the interrupt section primarily through reads (via E_BUS%ABUS_L) and writes (via E_BUS%WBUS_L) of the ISR, accomplished through the A/INT.SYS and DST/INT.SYS decodes. These decodes provide access to the so-called INT.SYS register, which is shown in Figure 10-3. The fields of the register are listed in Table 10-8.

Figure 10-3: IPR 7A (hex), INTSYS

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  |  |  |  |  | 0| 0|  | 0| 0| 0|    INT.ID    |                 SISR<15:1>                 |  | :INTSYS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |  |  |  |  |        |                                                                       |
 |  |  |  |  |        |                                                                       +-- ICCS<6>
 |  |  |  |  |        +-- INT_TIM_RESET
 |  |  |  |  +-- S_ERR_RESET
 |  |  |  +-- PMON_RESET
 |  |  +-- H_ERR_RESET
 |  +-- PWRFL_RESET
 +-- HALT_RESET

Table 10-8: INT.SYS Register Fields

Name            Extent   Type   Description
ICCS<6>         0        RW,0   This field contains the internal copy of the interrupt enable bit
                                from the ICCS processor register. It is set to 0 by microcode at
                                powerup.
SISR            15:1     RW,0   This field contains the 15 architecturally-defined software
                                interrupt request bits. It is set to 0 by microcode at powerup.
INT.ID          20:16    RO     This field contains the encoding of the highest priority interrupt
                                request as listed in Table 10-7. Writes to this field are ignored.
INT_TIM_RESET   24       WC,0   Writing a 1 to this field clears the P%INT_TIM_L interrupt request.
                                Writing a 0 has no effect on the request. The field is read as a 0
                                and the interrupt request is cleared by microcode at powerup.
S_ERR_RESET     27       WC,0   Writing a 1 to this field clears the P%S_ERR_L interrupt request.
                                Writing a 0 has no effect on the request. The field is read as a 0
                                and the interrupt request is cleared by microcode at powerup.
PMON_RESET      28       WC,0   Writing a 1 to this field clears the E_PMN%PMON_L interrupt request.
                                Writing a 0 has no effect on the request. The field is read as a 0
                                and the interrupt request is cleared by microcode at powerup.
H_ERR_RESET     29       WC,0   Writing a 1 to this field clears the P%H_ERR_L interrupt request.
                                Writing a 0 has no effect on the request. The field is read as a 0
                                and the interrupt request is cleared by microcode at powerup.
PWRFL_RESET     30       WC,0   Writing a 1 to this field clears the P%PWRFL_L interrupt request.
                                Writing a 0 has no effect on the request. The field is read as a 0
                                and the interrupt request is cleared by microcode at powerup.
HALT_RESET      31       WC,0   Writing a 1 to this field clears the P%HALT_L interrupt request.
                                Writing a 0 has no effect on the request. The field is read as a 0
                                and the interrupt request is cleared by microcode at powerup.

DESIGN CONSTRAINT

When read onto E_BUS%ABUS_L, INT.SYS<31:27,24> must be zero. Microcode updates the internal copy of ICCS<6> and SISR<15:1> by reading the INT.SYS register, modifying the appropriate bits, and writing the updated value back.
The write-one-to-clear bits must be read as zero because the microcode does not mask them out before writing them back.

MICROCODE RESTRICTION

The INT.SYS register is not bypassed. A write to INT.SYS in microinstruction n must not be followed by a read of INT.SYS sooner than microinstruction n+4.

MICROCODE RESTRICTION

Changes to machine state that affect the generation of interrupts (PSL<IPL>, ICCS<6>, or SISR<15:1>) done by microinstruction n must not be followed by a LAST CYCLE microinstruction sooner than microinstruction n+4 if the change is to be observed by the next macroinstruction.

10.5 Processor Register Interface

Software can interact with the interrupt section hardware and microcode via references to processor registers, as follows:

• ICCS: References to the ICCS processor register allow access to the copy of ICCS<6> that is implemented in INT.SYS<0>, as described in Section 10.2.4.
• NICR, ICR: References to the NICR and ICR processor registers are transmitted off-chip to an optional full interval timer implementation, as described in Section 10.2.4.
• SISR, SIRR: References to the architecturally-defined SISR and SIRR processor registers allow access to SISR<15:1>, which are implemented in INT.SYS<15:1>.
• ECR: References to ECR<ICCS_EXT> select the interval timer configuration, as described in Section 10.2.4.
• IAK14, IAK15, IAK16, IAK17: Reads of the IAK processor registers allow diagnostic and test software direct access to device interrupt vectors, as described in Section 10.2.2. References to these processor registers during normal system operation can result in UNDEFINED behavior.
• INTSYS: References to the INTSYS processor register allow diagnostic and test software direct access to the INT.SYS register. Reads of the INTSYS processor register return the format shown in Figure 10-3.
Writes of the INTSYS processor register are internally masked by microcode such that only the left-half write-to-clear bits are written. Other bits remain unchanged. Writes to the INTSYS processor register during normal system operation can result in UNDEFINED behavior.

10.6 Interrupt Section Interfaces

10.6.1 Ebox Interface

10.6.1.1 Signals From Ebox

• E_BUS%WBUS_L: Write data bus, from which ICCS<6> and SISR<15:1> are loaded, and from which the write-one-to-clear interrupt latches are cleared.
• E_PMN%PMON_L: Performance monitoring facility interrupt request.
• E_PSL%PSL_H<20:16>: IPL field from the current PSL.
• E_STL%F_NOP_S5_H: Force a NOP into S5 of the MIB decode pipe when an S3 or S4 stall exists.
• E_STL%LATE_F_NOP_S4_H: Force a NOP into S4 of the MIB decode pipe when an S3 stall exists.
• E_STL%LATE_STALL_S4_H: Stall the MIB decode pipe when an S4 stall exists.

10.6.1.2 Signals To Ebox

• E_BUS%ABUS_L: A-port operand bus, on which ICCS<6>, SISR<15:1>, and the interrupt ID are returned.

10.6.2 Microsequencer Interface

10.6.2.1 Signals From Microsequencer

• E_USQ%MIB_H<31:20>: MIB lines used to decode the writes/reads to INT.SYS.
• E_USQ%MIB_L<31:20>: MIB lines used to decode the writes/reads to INT.SYS.
• E_USQ%UTSEL_H<4:0>: Microtest bus select code.
• E_USQ%UTSEL_L<4:0>: Microtest bus select code.

10.6.2.2 Signals To Microsequencer

• E%INT_REQ_H: Interrupt pending.
• E_BUS%UTEST_L<0>: Microtest bus.

10.6.3 Cbox Interface

10.6.3.1 Signals From Cbox

• C%CBOX_H_ERR_H: Hard error interrupt request.
• C%CBOX_S_ERR_H: Soft error interrupt request.

10.6.4 Ibox Interface

10.6.4.1 Signals From Ibox

10.6.5 Mbox Interface

10.6.5.1 Signals From Mbox

10.6.6 Pin Interface

10.6.6.1 Input Pins

• P%HALT_L: Special-purpose halt "interrupt" signal, sampled by edge-sensitive logic.
• P%H_ERR_L: Special-purpose hard error interrupt signal, sampled by edge-sensitive logic.
• P%INT_TIM_L: Special-purpose interval timer interrupt signal, sampled by edge-sensitive logic.
• P%IRQ_L<3:0>: General-purpose interrupt signals, sampled by level-sensitive logic.
• P%PWRFL_L: Special-purpose power failure interrupt signal, sampled by edge-sensitive logic.
• P%S_ERR_L: Special-purpose soft error interrupt signal, sampled by edge-sensitive logic.

10.6.7 Signal Dictionary

Table 10-9: Cross-reference of all names appearing in the Interrupt chapter

Schematic Name          Behavioral Model Name
                        C%CBOX_S_ERR_H
                        E%INT_REQ_H
                        E_BUS%ABUS_H
                        E_BUS%UTEST_H
                        E%WBUS_H
E_PMN%PMON_L            E_PMN%PMON_H
E_PSL%PSL_H<20:16>      E_PSL%PSL_H
E_STL%F_NOP_S5_H        E_STL%F_NOP_S5_H
                        E_STL%LATE_F_NOP_S4_H
                        E_STL%LATE_STALL_S4_H
E_USQ%MIB_H<31:20>      E_USQ%MIB_H
E_USQ%MIB_L<31:20>      E_USQ%MIB_H
E_USQ%UTSEL_H<4:0>      E_USQ%UTSEL_H
E_USQ%UTSEL_L<4:0>      E_USQ%UTSEL_H
                        I%IBOX_S_ERR_L
                        M%MBOX_S_ERROR_H
                        P%HALT_L
                        P%H_ERR_L
                        P%S_ERR_L
                        P%INT_TIM_L
                        P%PWRFL_L
                        P%S_ERR_L

10.7 Revision History

Table 10-10: Revision History

Who           When          Description of change
Mike Uhler    06-Mar-1989   Release for external review.
Mike Uhler    14-Dec-1989   Update for second-pass release.
Ron Preston   09-Jan-1990   Changes to simplify implementation.
Mike Uhler    20-Jul-1990   Update for change to performance monitoring interrupt request and reflect implementation.
Ron Preston   07-Feb-1991   Update to reflect Pass 1 implementation.

Chapter 11

The Fbox

11.1 Overview

This chapter describes the floating point unit of the NVAX CPU chip.
Only the major functional blocks, their interfaces to each other, and the interface to the rest of the NVAX system are described here. Circuit level implementation details are not of primary concern in this document.

11.2 Introduction

The Fbox is the floating point unit in the NVAX CPU chip. The Fbox is a 4 stage pipelined floating point processor, with an additional stage devoted to assisting division. It interacts with three different segments of the main CPU pipeline: the microsequencer in S2 and the Ebox in S3 and S4. The Fbox runs semi-autonomously to the rest of the CPU chip and supports the following operations:

• VAX Floating Point Instructions and Data Types
  The Fbox provides instruction and data support for VAX floating point instructions. VAX F-, D-, and G-floating point data types are supported.

• VAX Integer Instructions
  The Fbox implements longword integer multiply instructions.

• Pipelined Operation
  Except for the divide instructions, DIV{F,D,G}, the Fbox can start a new single precision floating point instruction every cycle and a double precision floating point or an integer multiply instruction every two cycles. The Ebox can supply two 32-bit operands or one 64-bit operand to the Fbox every cycle on two 32-bit input operand buses. The Fbox drives the result operand to the Ebox on a 32-bit result bus.

• Conditional "Mini-Round" Operation
  Result latency is conditionally reduced by one cycle for the most frequently used instructions. Stage 3 can perform a "mini-round" operation on the LSBs of the fraction for all ADD, SUB, and MUL floating instructions. If the "mini-round" operation does not fail, then stage 3 drives the result directly to the output, bypassing stage 4 and saving a cycle of latency.

• Fault and Exception Handling
  The Ebox coordinates the fault and exception handling with the Fbox. Any fault or exception condition received from the Ebox is retired in the proper order.
If the Fbox receives or generates any fault or exception condition, it does not change the flow of instructions in progress within the Fbox pipe.

Figure 11-1 is a top level block diagram of the Fbox showing the six major functional blocks within the Fbox and their interconnections.

Figure 11-1: Fbox block diagram

[Block diagram not reproduced. Blocks, from input to output: Control and Data Bus, Interface Input Section, Divider, Stage 1, Stage 2, Stage 3, Stage 4, Interface Output Section, Control and Data Bus. Paths between stages are labeled F = Fraction Data, E = Exponent Data, S = Sign Data, C = Control.]

11.3 Fbox Functional Overview

The Fbox is the floating point accelerator for the NVAX CPU. Its instruction repertoire includes all VAX base group floating point instructions. The data types that are supported are F, D, and G. The additional integer instructions that are supported are MULL2 and MULL3.
The number of internal execution cycles and the total number of cycles to complete an instruction within the Fbox are measured as shown in Figure 11-2.

Figure 11-2: Fbox Execute Cycle Diagram

[Timing diagram not reproduced. For L and F data types the diagram spans cycles 1 through 7: opcode cycle, operand cycle, Fbox internal execute cycles FS1 through FS4, and result cycle to Ebox. For D and G data types the diagram spans cycles 1 through 9: opcode cycle, two operand cycles, Fbox internal execute cycles FS1 through FS4, and two result cycles to Ebox.]

The internal execution time for all instructions except MUL{D,G,L} and DIV{F,D,G} is four cycles. The internal execution time of the various Fbox operations is given in Table 11-1.

Table 11-1: Fbox Internal Execute Cycles

INSTRUCTION   F    D    G    L
MUL           4    5    5    5
DIV           14   25   24
ALL OTHER     4    4    4    4

The total number of cycles taken by the Fbox to complete an instruction is given in Table 11-2. Note that this includes the cycles taken for opcode and operand transfer; in particular, the dead cycle between the opcode and the first operand is counted.

Table 11-2: List of the Fbox Total Execute Cycles

INSTRUCTION   F    D    G    L
MUL           7    10   10   8
DIV           17   30   29
ALL OTHER     7    9    9

11.3.1 Fbox Interface

This section is responsible for overseeing the protocol with the Ebox. This includes the sequence of receiving the opcode, operands, exceptions, and other control information, and also outputting the result with its accompanying status. The opcode and operands are transferred from the input interface to stage 1 in all operations except division.
The result is conditionally received from either stage 3 or stage 4.

11.3.2 Divider

The divider receives its inputs from the interface and drives its outputs to stage 1. It is used only to assist the divide operation, for which it computes the quotient and the remainder in a redundant format.

11.3.3 Stage 1

Stage 1 receives its inputs from either the interface or the divider section and drives its outputs to stage 2. It is primarily used for determining the difference between the exponents of the two operands, subtracting the fraction fields, performing the recoding of the multiplier and forming three times the multiplicand, and selecting the inputs to the first two rows of the multiplier array.

11.3.4 Stage 2

Stage 2 receives its inputs from stage 1 and drives its outputs to stage 3. Its primary uses are: right shifting (alignment), multiplying the fraction fields of the operands, and zero and leading one detection of the intermediate fraction results.

11.3.5 Stage 3

Stage 3 receives most of its inputs from stage 2 and drives its outputs to stage 4 or, conditionally, to the output. Its primary uses are: left shifting (normalization), and adding the fraction fields for the aligned operands or the redundant multiply array outputs. This stage can also perform a "mini-round" operation on the LSBs of the fraction for ADD, SUB, and MUL floating instructions. If the "mini-round" does not overflow, and if there are no possible exceptions, then stage 3 drives the result directly to the output, bypassing stage 4 and saving a cycle of latency.

11.3.6 Stage 4

Stage 4 receives its inputs from stage 3 and drives its outputs to the interface section. It is used for performing the terminal operations of the instruction such as rounding, exception detection (overflow, underflow, etc.), and determining the condition codes.

11.4 Fbox-Ebox Interface

The Fbox depends on the Ebox for the delivery of instruction opcodes and source operands and for the storing of results.
However, the Fbox does not require any assistance from the Ebox in executing the Fbox instructions. The Fbox macroinstructions are decoded by the Ibox just like any other macroinstruction, and the Ebox is dispatched to an execution flow which transfers the source operands, fetched during S3 of the CPU pipeline, to the Fbox early in S4. Once all the operands are delivered, the Fbox executes the macroinstruction. Upon completion, the Fbox requests to transfer the results back to the Ebox. When the current retire queue entry in the Ebox indicates an Fbox result and the Fbox has requested a result transfer, the result is transferred to the Ebox late in S4 of the CPU pipeline, and the macroinstruction is retired in S5.

The Fbox input interface has two input operand registers, which can hold all of the data for one instruction, and a three segment opcode pipeline. If the Fbox input machine is unable to handle new opcodes or operands, then F%INPUT_STALL_H is asserted to the Ebox, causing the next Fbox data input operation to stall the CPU pipeline at the end of its S3.

The Fbox output interface has a format mux and two result queues, the data queue and the control queue. The format mux is used to transform the result data into VAX storage format. The queues are used to hold data results and control information whenever result transfers to the Ebox become stalled.

11.4.1 Opcode Transfers to the Fbox

Whenever the Fbox indicates that it is ready to receive new information by negating F%INPUT_STALL_H, the Ebox may initiate the next opcode or operand transfer. The Fbox receives instructions from the Microsequencer (S2 of the CPU pipeline) on a 9-bit opcode bus. The opcode bus is made up of the 8 MSBs of the macroinstruction along with a single bit which, when set, indicates a G data type operation (i.e., the low order macroinstruction opcode byte was FD (hex)).
The Microsequencer indicates the presence of a new opcode by asserting the opcode valid flag, E%FBOX_1ST_CYCLE_H. This opcode valid flag is only asserted once for each new instruction. In particular, if the Microsequencer was stalled during an opcode transfer cycle, then the same opcode could be driven for multiple cycles; however, E%FBOX_1ST_CYCLE_H is only asserted for one of those stalled cycles. A complete list of the instructions executed by the Fbox and the opcodes received from the Microsequencer is contained in Table 11-3.

NOTE

The Fbox does not check for an illegal opcode. However, if an illegal opcode is received, then the Fbox will interpret it as if it were an ADDF. No indication is given that this error has occurred; the Fbox simply assumes that an ADDF has been started. When the instruction is retired (assuming that it actually was not an ADDF) it will be possible for diagnostic software to determine that an error has occurred. This processing of illegal opcodes is done entirely to keep the Fbox internal control signals in a predictable state and thus avoid any "catastrophic" failure.

Once a valid opcode has been received from the Microsequencer, it is processed in a three element pipeline/queue by the Fbox input logic. The first level, I1, is a static register which feeds the re-code PLA. The second level, I2, is the recoded opcode. The third level, I3, is the current instruction; this register is output to both the Divider and Fbox stage 1. Any operand being sent to the Fbox is always for the instruction that is in I3. Each level has a corresponding valid bit which indicates the presence of an instruction to be executed.

When the Fbox input is not stalled, opcodes and operands flow in the following order:

a. Opcode from the Microsequencer is loaded into I1 during Φ4 (CPU S2).
b. Re-code PLA runs during the following Φ12.
c. Re-coded opcode is loaded into I2 at the end of Φ2.
d. I2 is loaded into I3 during the following Φ3.
e. Input operand latches are loaded during the next Φ12, at the earliest.
f. Fbox internal Data Valid is set on Φ3 following the last operand reception.

If the final data is not received during phase 12, then the I3 register stalls. This back pressures the Fbox input instruction pipeline: if there is a valid instruction in I2, then it will also stall. Once I2 is stalled, I1 will stall on the next instruction from the Microsequencer. When the final operand for the instruction in I3 is received, the stall is removed and new instructions are allowed to advance within the input pipeline.

Besides stalling when waiting for operands from the Ebox, the input instruction pipeline stalls for a fixed number of cycles during MUL{D,G,L} and DIV{F,D,G} instructions. These internally generated stalls, termed opcode stalls, are needed to allow multiple passes in the multiply and the divide arrays. The opcode stalls not only keep the Fbox input pipeline from advancing, but also cause F%INPUT_STALL_H to be asserted back to the Ebox.

Because an opcode stall cannot be started until all the operands for the stalling opcode have been received, a three level instruction pipeline/queue is needed in the Fbox input stage (refer to Section 11.4.3, Figure 11-3). It is possible for the Fbox to receive two additional new opcodes before the opcode stall can be asserted and take effect at the Ebox. These two additional opcodes, along with the original stalling opcode, must be held in the Fbox input stage until the stall is finished.

11.4.2 Operand Transfers to the Fbox

Source operands, which were accessed in the Ebox during S3, are transferred from the Ebox to the Fbox early in S4. There will always be at least one cycle between the opcode transfer and the corresponding operands, during which the Fbox decodes the opcode.
The data type of the source operand, contained in the I3 register of the input instruction pipeline, is used to select the proper data input format. There are two 32-bit input data busses, E%ABUS_H and E%BBUS_H, which are used to transfer operands to the Fbox. If the instruction is either a single operand type, or an integer or floating F type, then all of the operands are transferred in one cycle. If the instruction is a floating D or G type, then one complete 64-bit operand is transferred on the concatenated input busses at a rate of one per cycle. For a floating D or G data type, the lower longword (i.e., sign, exponent, and fraction MSBs) is transferred on E%ABUS_H and the upper longword is transferred on E%BBUS_H.

Each 32-bit input operand bus has a related short literal flag which indicates the presence of a short literal on bits <5:0> of the corresponding bus. If a double precision operand is being transferred, a short literal is detected using the flag associated with E%ABUS_H, and the floating short literal data is taken from E%ABUS_H<5:0>. The remaining E%ABUS_H and E%BBUS_H bits are zero; however, the Fbox ignores them. When receiving an integer short literal, the integer is on bits <5:0> and the Fbox depends on the remaining bits of that bus being zero. The Fbox must transform all short literals to the proper format based on the instruction data type.

When all of the input operand information for both input data busses is valid, the Ebox asserts an input valid flag, E%FDATA_VALID_H. If the flag is not asserted, the Fbox input machine enters an input stalled state.

Along with the operands, the Ebox sends three different operand fault flags. These are the memory management, hardware error, and reserved address mode faults. Once an operand fault has been sent to the Fbox, it is unpredictable whether the Ebox will or will not assert the E%FDATA_VALID_H signal.
It is also unpredictable whether or not any other outstanding operands will be sent. When the Fbox receives an input fault, two actions take place:

1. The Fbox asserts data valid into the Fbox pipeline. This breaks any internal stall conditions, thus allowing the instruction to complete.
2. The Fbox asserts F%INPUT_STALL_H. This halts the transfer of any other operands and prevents the Fbox and Ebox from getting out of synchronization. This stall normally continues until after the faulting instruction has been retired by the Fbox. It is cleared by the assertion of E%FLUSH_FBOX_H or K%RESET_H.

Since the faulting operand data values used by the Fbox are undetermined, it is possible that the Fbox may generate additional faults. However, the Ebox prioritizes the faults on retirement, and the three input operand faults are at the highest priority. Therefore, any Fbox generated fault is ignored if the Fbox received an input operand fault. On completion, the faulting instruction will be handled by the Ebox in the proper order, ensuring compliance with the VAX architecture standard. In addition, the Ebox will flush the Fbox; this will cause F%INPUT_STALL_H to be negated, releasing the stalled state.

Besides the operand fault flags, the Ebox also sends the current value of the PSL floating underflow enable bit, E%PSL_FU_H. If the FU bit is set, the Fbox will cause a fault on floating underflow. Whether the FU bit is set or clear, the Fbox will return a floating zero data value on the result bus if underflow is detected.

11.4.3 Summary of Fbox Input Stage Stall Rules

The following list is a set of input stall rules for the Fbox input stage. They center around opcode transfers and the actions related to the assertion and negation of F%INPUT_STALL_H.

1. Floating opcodes are transferred from the Microsequencer to the Fbox during the CPU's S2 cycle. There will always be at least one cycle between an opcode transfer, OPC1, and the first data transfer for that opcode. In addition, there can only be one new opcode transfer, OPC2, between OPC1 and OPC1's last data transfer. It is possible that a new opcode transfer, OPC3, could take place in the same cycle as OPC1's last data transfer. Refer to Figure 11-3.

Figure 11-3: Opcode Transfers to the Fbox

 Cycle     n        n+1       n+2             m        m+1       m+2
      +---------+---------+---------+ ... +---------+---------+---------+
      |  OPC1   |         |1ST DATA |     |         |         |LAST DATA|
      +---------+---------+---------+ ... +---------+---------+---------+--...
                                          |  OPC2   |         |         |
                                          +---------+---------+---------+--...
                                                              |  OPC3
                                                              +---------+---...

2. Assertion of F%INPUT_STALL_H implies that the next data transfer cycle will stall; i.e., if F%INPUT_STALL_H is asserted during a data transfer cycle then that cycle will not stall but the next data transfer cycle will. That next data transfer cycle cannot have either E%FBOX_1ST_CYCLE_H or E%FDATA_VALID_H asserted. The Ebox will repeat the stalled transfer cycle, keeping E%ABUS_H, E%BBUS_H, E%FDATA_VALID_H, and any faults unchanged.

3. If F%INPUT_STALL_H is released in the current data transfer cycle, the current data transfer cycle will be repeated once more in the next cycle, this time with E%FDATA_VALID_H asserted. In that next cycle it is also possible to have E%FBOX_1ST_CYCLE_H asserted, indicating a new opcode transfer.

11.4.4 Fbox Result Transfers to the Ebox

Data is returned to the Ebox on one 32-bit output bus. A single integer or floating F type result can be returned in one cycle. Floating D/G data requires two cycles: the lower 32 bits (i.e., sign, exponent, and mantissa MSBs) are returned in the first cycle, followed by the upper 32 bits in the next cycle.
A two-bit data length field and a two-bit condition code map field are also returned with each result transfer, as are all of the result status bits. The data length field is used to indicate a result data length of Byte, Word, Longword, or Quadword. The condition code map field informs the Ebox which PSL condition code bits must be updated for the retiring instruction. If the Fbox is not trying to retire an instruction, the condition code map is forced to a value of "no update". For double precision results, which require two transfers, the data length is set to Quadword during both transfers. The condition code map is forced to a value of "no update" during the first transfer of a double precision result and then to the proper instruction dependent code during the second transfer. The other result status is broadcast during both transfers. The Ebox uses the result status to detect microtrap conditions before any store of result data occurs.

The Fbox supplies 12 bits of status information with the retirement of each instruction. These are made up of:

a. Operand faults received with the input operands.
   1. F%MMGT_FLT_H - memory management faults
   2. F%MERR_H - hardware read faults, etc.
   3. F%RSVD_ADDR_MODE_H - reserved address mode fault
b. Fault conditions detected by the Fbox.
   1. F%RSV_H - reserved operand
   2. F%FOV_H - floating overflow
   3. F%FU_H - floating underflow
   4. F%FDBZ_H - floating divide by zero
c. Fbox condition code values.
   1. F%CC_N_H - result is negative
   2. F%CC_Z_H - result is zero
   3. F%CC_V_H - result caused an integer overflow
   4. F%CC_MAP_H<1:0> - cc update map select

If multiple exceptions are detected by the Fbox for an instruction that it is executing, all of the exceptions for that instruction are reported to the Ebox. The Ebox and Microsequencer will prioritize these faults. The source operand faults are at the highest priority. Refer to Section 8.5.19.7 for the priority of the Fbox detected faults.
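The per-transfer data length and condition code map rules above can be illustrated with a small behavioral sketch. It is not taken from the specification: the names `Transfer` and `result_transfers` are hypothetical, and the field encodings are assumed to be the ones given in the legend of Table 11-3 (DL 11 = Quad, CC MAP 00 = No Update).

```python
from dataclasses import dataclass

# Encodings assumed from the Table 11-3 legend.
CC_MAP_NO_UPDATE = 0b00
DL_BYTE, DL_WORD, DL_LONG, DL_QUAD = 0b00, 0b01, 0b10, 0b11


@dataclass
class Transfer:
    """One cycle on the 32-bit result bus (hypothetical model)."""
    data: int         # 32-bit value driven on the result bus
    data_length: int  # two-bit data length field
    cc_map: int       # two-bit condition code map field


def result_transfers(result: int, data_length: int, cc_map: int) -> list:
    """Split a result into the bus transfers described in the text.

    Quadword results take two transfers: the lower longword first with
    the cc map forced to "no update", then the upper longword with the
    instruction-dependent cc map. Both transfers report Quadword length.
    """
    if data_length == DL_QUAD:
        low = result & 0xFFFFFFFF
        high = (result >> 32) & 0xFFFFFFFF
        return [
            Transfer(low, DL_QUAD, CC_MAP_NO_UPDATE),
            Transfer(high, DL_QUAD, cc_map),
        ]
    # Integer and F_floating results fit in a single transfer.
    return [Transfer(result & 0xFFFFFFFF, data_length, cc_map)]
```

For example, a D or G result with CC MAP 10 yields two `Transfer` records, and only the second one carries the 10 code.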
There are two signals from the Ebox to the Fbox that control the transfer of results by the Fbox. E%RETIRE_OK_H informs the Fbox that it may be possible to retire an instruction. E%STORE_OK_H indicates that it is possible for the Fbox to store data. When the Fbox wants to store a result, the request signal F%STORE_H is asserted. Similarly, if the Fbox wants to retire an instruction, F%RETIRE_H is asserted. All instructions must be retired on completion; most instructions (with the exception of TST and CMP) also need to store data. Single precision and integer instructions which store a result request both store and retire in a single transfer cycle. Double precision instructions which store a result need two transfer cycles: the first transfer requests only to store, and the second transfer requests both store and retire. All TST and CMP instructions, regardless of data type, request to retire without a store in one transfer cycle.

The completion of a result transfer from the Fbox to the Ebox is recognized when the appropriate request and its corresponding OK signal are both asserted. Conversely, if the corresponding OK signal is not asserted, the Fbox stalls (repeats) the current transfer.

When an instruction is completed by the Fbox core, the Fbox output stage transforms the data result back into VAX memory format. The VAX formatted data, along with ten bits of result status, is then always written into the output data queue. This queue has seven entries, each of which is 74 bits wide. The data from this queue is transferred to the Ebox on the F%RESULT_H bus in a first-in/first-out fashion, one longword at a time. If the data queue is empty at the time that the core is retiring, the low word of the formatted data, along with the result status, is also selected to bypass directly to the result bus.
This action is performed by the result multiplexer, which can select one of three sources: the queue bypass bus, the output queue low word, or the output queue high word.

The data queue is written every cycle, but its input (write) pointer is advanced only after writing valid data. Whenever an instruction is retired, the data queue output (read) pointer is advanced. When the input and output pointers are selecting the same entry, the queue is empty. If the input pointer is only one entry ahead of the output pointer, a condition called empty next is detected. The empty and empty next conditions are used to generate result transfer requests from the data queue, and also in selecting between the queue bypass bus and the queue read data. Because double precision results retire from the Fbox core in one cycle but require two cycles to be transferred back to the Ebox, the high word of a double precision result is always sourced from the data queue. This allows the core to retire quadword results in consecutive cycles (which could happen when CVTx{D,G} instructions are executing).

Besides the data queue, the Fbox output also has a control queue. This queue is seven bits wide by seven entries deep. It contains information derived from the opcode: the result data length, the condition code map, whether the instruction writes a result or not, and how many transfer cycles will be required to retire the instruction. Since the opcodes precede the data through the Fbox pipeline by one cycle, there is no need to have a bypass bus for the control queue. The output machine is always able to write the control information into this queue and read it back before it is needed. Like the data queue, the control queue is written every cycle. Its input pointer is advanced after a new instruction has been passed through the pipeline and written into this queue.
Its output pointer is advanced after a valid entry has been read into the control latch (i.e., the control queue's output latch). Because the request information is needed early in the transfer cycles, the control queue often is running ahead of the data queue.

Result transfers to the Ebox can be initiated by one of three sources: from the Fbox stage 3 bypass request line, from a data valid in Fbox stage 4, or from the Fbox output queue. The output queue takes precedence over the Fbox core. If the queue is not empty, the current queue output is transferred to the Ebox and any concurrent results from the Fbox core are written into the output queue. Fbox stages 3 and 4 perform their own prioritization. If stage 4 is retiring an instruction, stage 3 will not attempt to bypass stage 4. Instead, stage 3 passes its unrounded result to stage 4 and stage 4 will retire that result in the next cycle.

11.4.5 Fbox Pipeline Stalls

The Fbox input can request to stall the Ebox for one of two reasons. The Ebox does not actually stall until the next time it is ready to transfer data to the Fbox.

Fbox Input Stall
1. Opcode Stalls
2. Fault Stalls

As was mentioned earlier at the end of Section 11.4.1, the implementation of some instructions requires more than one cycle of execution within some stages of the Fbox pipeline. These instructions must be followed by a sufficient number of bubbles in the pipeline such that they cannot be overrun by succeeding instructions. In particular, MUL{D,G,L} require two cycles in the stage 2 multiply array, and DIV{F,D,G} require 10, 21, and 20 cycles, respectively, in the divide array. In order to guarantee proper operation, the Fbox input generates an input stall of the appropriate length for each of these instructions.
The multiply stalls are controlled by a simple state machine in the Fbox input; it starts when all of the multiply operands have been received and continues for one cycle. The divide stalls are started by the input interface, as soon as all of the divide operands are received, and ended by a divide done signal which is received from the Fbox divider stage.

Whenever the Fbox receives an operand from the Ebox for which the Ebox has signaled a fault, the Fbox will request an input stall. This is done because it is unpredictable whether or not the Ebox will complete any other outstanding data transfers for this instruction. Therefore, to prevent the Fbox from entering an unpredictable state, F%INPUT_STALL_H is asserted and any new data transfers after the faulting source operand are blocked. When the instruction with the faulting operand is retired, the Ebox will flush the Fbox; this will release the fault stall condition.

The Fbox output can cause a stall at the Ebox for one of two reasons:

Fbox Output Stall
1. Result not ready
2. Stage 4 bypass abort

If the Fbox does not have any results ready to retire and it is the selected source for the RMUX in the Ebox, the Ebox is stalled until the Fbox is ready to transfer the result.

Stage 3 in the Fbox has the ability to perform "mini-round" operations for floating ADD, SUB, and MUL instructions. When stage 3 detects that it may be possible to round its fraction result and bypass stage 4, it makes a request to store data to the Fbox output interface. If the data queue is empty, this store request is passed on to the Ebox. Later in the same transfer cycle, stage 3 may detect a "mini-round" overflow or some other error condition. If this occurs, stage 3 signals an abort of the stage 4 bypass. If the data queue was empty, this abort causes F%STORE_STALL_H to be asserted to the Ebox. The current store is stalled, by the Fbox, for one cycle until the correct result can be obtained from stage 4.
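The pointer rules for the output data queue described in Section 11.4.4 (written every cycle, write pointer advanced only on valid data, read pointer advanced on retire, empty and empty next detection) can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class and method names are hypothetical, entries are opaque Python objects rather than 74-bit words, and queue overflow is not modeled because the text does not describe it.

```python
class OutputDataQueue:
    """Behavioral sketch of the seven-entry Fbox output data queue."""

    DEPTH = 7  # number of entries, per the text

    def __init__(self):
        self.entries = [None] * self.DEPTH
        self.wr = 0  # input (write) pointer
        self.rd = 0  # output (read) pointer

    @property
    def empty(self) -> bool:
        # Both pointers select the same entry.
        return self.wr == self.rd

    @property
    def empty_next(self) -> bool:
        # The write pointer is exactly one entry ahead of the read pointer.
        return (self.rd + 1) % self.DEPTH == self.wr

    def write(self, entry, valid: bool) -> None:
        # The queue is written every cycle; the pointer only moves
        # when the written data was valid.
        self.entries[self.wr] = entry
        if valid:
            self.wr = (self.wr + 1) % self.DEPTH

    def retire(self):
        # Retiring an instruction advances the read pointer.
        if self.empty:
            raise RuntimeError("nothing to retire")
        entry = self.entries[self.rd]
        self.rd = (self.rd + 1) % self.DEPTH
        return entry
```

In this model an invalid write leaves the queue empty, which corresponds to the case where a retiring result would be taken from the queue bypass bus instead of the queue read data.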
11.4.6 Fbox Reset and Flush

The Fbox can be initialized by the assertion of two different signals. At powerup time K%RESET_H is asserted for several cycles. This signal initializes all of the instruction registers and the output queue pointers in the Fbox interface. Any outstanding transfers and all stalls are terminated. At the completion of reset the Fbox is properly initialized and ready to receive opcodes and operands.

The Ebox can also initialize the Fbox by asserting the E%FLUSH_FBOX_H signal. This has the same effect as resetting the Fbox: the Fbox pipeline is cleared of all operations. Operations already under way anywhere in the pipeline are lost. E%FLUSH_FBOX_H is updated during phase 1 and is asserted for only one cycle. The Fbox is ready to receive new opcodes in the very next cycle.

11.4.7 Summary of Fbox-Ebox Signals

The following signals are driven by the Ebox to the Fbox.

• E%FLUSH_FBOX_H
  This signal causes the Fbox to clear its pipeline of all operations.
• E%FBOX_1ST_CYCLE_H
  This signal tells the Fbox that the opcode is valid.
• E%FOPCODE_H<8:0>
  This 9-bit opcode bus carries the 8-bit opcode byte of the macroinstruction along with a single bit that indicates G-type data.
• E%FDATA_VALID_H
  This signal tells the Fbox that all data on the operand busses is valid. The Fbox knows, from decoding the opcode, exactly what data to expect.
• E%ABUS_H<31:0> and E%BBUS_H<31:0>
  These 32-bit busses carry the source operand(s).
• E%A_SHLIT_H and E%B_SHLIT_H
  These signals indicate that the data on E%ABUS_H or E%BBUS_H, respectively, is a 6-bit short literal value extracted from the instruction stream. Special data formatting is required by the Fbox.
• E%PSL_FU_H
  The current PSL<FU> value, for use by the Fbox in deciding whether or not to signal floating point underflow faults.
• E%F_MMGT_FLT_H, E%F_MEM_ERR_H, and E%F_RSVD_ADDR_MODE_H
  These signals tell the Fbox that there is a fault or error associated with the source operands. The Fbox carries this status down the pipeline so that it is handled after instructions which are already in the Fbox pipeline.
• E%FBOX_S4_BYPASS_ENB_H
  This signal is used to control the Fbox stage 4 bypass option. Assertion of this signal enables stage 3 to conditionally bypass stage 4. This signal is normally cleared at system startup, disabling the bypass option. This signal has the additional function of selecting between FD1R and FD2R as the output of stage 3 while the Fbox is in FBOX_Test mode.
• E%RETIRE_OK_H, E%STORE_OK_H
  These signals inform the Fbox of any stalls when attempting to transfer a result to the Ebox.

The following signals are driven by the Fbox to the Ebox.

• F%INPUT_STALL_H
  This control signal stalls the Ebox from issuing any more operands to the Fbox.
• F%RETIRE_H
  This control signal tells the Ebox the Fbox is attempting to retire an instruction in this cycle.
• F%STORE_H
  This control signal tells the Ebox the Fbox is attempting to store a result in this cycle.
• F%STORE_STALL_H
  This control signal tells the Ebox the Fbox is stalling the current store request this cycle.
• F%RESULT_H<31:00>
  This 32-bit bus carries Fbox results to the Ebox.
• F%FBOX_DL_H<1:0>
  This is the data length used by the Ebox for an Fbox store.
• F%CC_N_H, F%CC_Z_H, F%CC_V_H
  These 3 signals carry Fbox condition code bits to the Ebox. They are Negative, Zero, and Overflow.
• F%CC_MAP_H<1:0>
  This is the map specifier which tells the Ebox how to update the PSL condition code bits.
• F%MMGT_FLT_H
  Signals a memory management fault for one of the currently retiring instruction's source operands.
• F%MERR_H
  Signals a memory access hardware error for one of the currently retiring instruction's source operands.
• F%RSVD_ADDR_MODE_H
  Signals a reserved address mode fault for one of the currently retiring instruction's source operands.
• F%RSV_H
  Signals a reserved operand fault for one of the currently retiring instruction's source operands.
• F%FOV_H
  Signals that a floating point overflow fault resulted from the currently retiring instruction.
• F%FU_H
  Signals that a floating point underflow fault resulted from the currently retiring instruction.
• F%FDBZ_H
  Signals that a floating point divide-by-zero fault resulted from the currently retiring instruction.

11.4.8 Fbox Instruction Set

The instructions listed in Table 11-3 constitute the VAX integer and floating point instructions supported by the Fbox datapath.

Table 11-3: Fbox Floating Point and Integer Instructions

Fbox Opc  Instruction                       NZVC  CC MAP  DL  Exceptions
04C       CVTBF src.rb, dst.wf              **00  10      10
06C       CVTBD src.rb, dst.wd              **00  10      11
14C       CVTBG src.rb, dst.wg              **00  10      11
04D       CVTWF src.rw, dst.wf              **00  10      10
06D       CVTWD src.rw, dst.wd              **00  10      11
14D       CVTWG src.rw, dst.wg              **00  10      11
04E       CVTLF src.rl, dst.wf              **00  10      10
06E       CVTLD src.rl, dst.wd              **00  10      11
14E       CVTLG src.rl, dst.wg              **00  10      11

048       CVTFB src.rf, dst.wb              ***0  11      00  rsv, iov
049       CVTFW src.rf, dst.ww              ***0  11      01  rsv, iov
04A       CVTFL src.rf, dst.wl              ***0  11      10  rsv, iov
068       CVTDB src.rd, dst.wb              ***0  11      00  rsv, iov
069       CVTDW src.rd, dst.ww              ***0  11      01  rsv, iov
06A       CVTDL src.rd, dst.wl              ***0  11      10  rsv, iov
148       CVTGB src.rg, dst.wb              ***0  11      00  rsv, iov
149       CVTGW src.rg, dst.ww              ***0  11      01  rsv, iov
14A       CVTGL src.rg, dst.wl              ***0  11      10  rsv, iov
04B       CVTRFL src.rf, dst.wl             ***0  11      10  rsv, iov
06B       CVTRDL src.rd, dst.wl             ***0  11      10  rsv, iov
14B       CVTRGL src.rg, dst.wl             ***0  11      10  rsv, iov

056       CVTFD src.rf, dst.wd              **00  10      11  rsv
199       CVTFG src.rf, dst.wg              **00  10      11  rsv
076       CVTDF src.rd, dst.wf              **00  10      10  rsv, fov
133       CVTGF src.rg, dst.wf              **00  10      10  rsv, fov, fuv

040       ADDF2 add.rf, sum.mf              **00  10      10  rsv, fov, fuv
041       ADDF3 add1.rf, add2.rf, sum.wf    **00  10      10  rsv, fov, fuv
060       ADDD2 add.rd, sum.md              **00  10      11  rsv, fov, fuv
061       ADDD3 add1.rd, add2.rd, sum.wd    **00  10      11  rsv, fov, fuv
140       ADDG2 add.rg, sum.mg              **00  10      11  rsv, fov, fuv
141       ADDG3 add1.rg, add2.rg, sum.wg    **00  10      11  rsv, fov, fuv

042       SUBF2 sub.rf, dif.mf              **00  10      10  rsv, fov, fuv
043       SUBF3 sub.rf, min.rf, dif.wf      **00  10      10  rsv, fov, fuv
062       SUBD2 sub.rd, dif.md              **00  10      11  rsv, fov, fuv
063       SUBD3 sub.rd, min.rd, dif.wd      **00  10      11  rsv, fov, fuv
142       SUBG2 sub.rg, dif.mg              **00  10      11  rsv, fov, fuv
143       SUBG3 sub.rg, min.rg, dif.wg      **00  10      11  rsv, fov, fuv

0C4       MULL2 mulr.rl, prod.ml            ***0  11      10  iov
0C5       MULL3 mulr.rl, muld.rl, prod.wl   ***0  11      10  iov
044       MULF2 mulr.rf, prod.mf            **00  10      10  rsv, fov, fuv
045       MULF3 mulr.rf, muld.rf, prod.wf   **00  10      10  rsv, fov, fuv
064       MULD2 mulr.rd, prod.md            **00  10      11  rsv, fov, fuv
065       MULD3 mulr.rd, muld.rd, prod.wd   **00  10      11  rsv, fov, fuv
144       MULG2 mulr.rg, prod.mg            **00  10      11  rsv, fov, fuv
145       MULG3 mulr.rg, muld.rg, prod.wg   **00  10      11  rsv, fov, fuv

046       DIVF2 divr.rf, quo.mf             **00  10      10  rsv, fov, fuv, fdvz
047       DIVF3 divr.rf, divd.rf, quo.wf    **00  10      10  rsv, fov, fuv, fdvz
066       DIVD2 divr.rd, quo.md             **00  10      11  rsv, fov, fuv, fdvz
067       DIVD3 divr.rd, divd.rd, quo.wd    **00  10      11  rsv, fov, fuv, fdvz
146       DIVG2 divr.rg, quo.mg             **00  10      11  rsv, fov, fuv, fdvz
147       DIVG3 divr.rg, divd.rg, quo.wg    **00  10      11  rsv, fov, fuv, fdvz

050       MOVF src.rf, dst.wf               **0-  01      10  rsv
070       MOVD src.rd, dst.wd               **0-  01      11  rsv
150       MOVG src.rg, dst.wg               **0-  01      11  rsv

052       MNEGF src.rf, dst.wf              **00  10      10  rsv
072       MNEGD src.rd, dst.wd              **00  10      11  rsv
152       MNEGG src.rg, dst.wg              **00  10      11  rsv

051       CMPF src1.rf, src2.rf             **00  10      xx  rsv
071       CMPD src1.rd, src2.rd             **00  10      xx  rsv
151       CMPG src1.rg, src2.rg             **00  10      xx  rsv

053       TSTF src.rf                       **00  10      xx  rsv
073       TSTD src.rd                       **00  10      xx  rsv
153       TSTG src.rg                       **00  10      xx  rsv

CC MAP: Condition Code Map
  00 = No Update
  01 = MOV Floating
  10 = All Other Floating
  11 = Integer

DL: Result Data Length
  00 = Byte
  01 = Word
  10 = Long
  11 = Quad

11.5 DIVIDER

11.5.1 Introduction

The divider stage in the Fbox performs the floating point divide operations. The inputs to the divider stage are the divisor and the dividend operands, source data type, opcode, data valid, and abort from the input interface section. The divider computes the quotient, and outputs to stage 1 of the pipeline: the quotient as two vectors, the final remainder, also as two vectors, and division done signals. The divider also supplies the division done signal to the input interface section. The input interface stalls after issuing a divide instruction and defers further issue of instructions to Divider/Stage 1 until the division is completed in the divider.

The final quotient and the final remainder are computed in the pipe stages. The sign of the final remainder is used for correcting the quotient. This correction is done in stage 3 of the pipeline. The terminal operations for floating point divide (quotient overflow, rounding), and the detection of floating overflow, underflow, and reserved operand are done in the pipeline stages.

The execution time within the divider stage is data independent for divide instructions. The table below lists execution time within the Fbox for divide instructions.
Table 11-4: Total Fbox Execute Cycles for Divide Operation

Instruction  Execution time in cycles
DIVF         17
DIVD         30
DIVG         29

The execution cycles are counted beginning with the cycle in which the Fbox receives the divide opcode through the cycle in which the Fbox retires the result to the Ebox. A typical cycle count for a DIVD instruction would have 1 opcode transfer cycle, 1 dead cycle, 2 operand transfer cycles, 1 divide PLA cycle, 20 divider array cycles (retiring 60 bits of quotient), 1 cycle each through stage 1, stage 2, and stage 3, and finally 2 cycles for the result transfer from stage 4 (lower longword) and the output interface (upper longword), for a total count of 30 Fbox cycles.

11.5.2 Overview

The divider uses the Radix-2 SRT division algorithm using the following recursive relation:

  P(i+1) = 2 * (P(i) - q(i+1) * D)

where P(i) is the partial remainder, q(i+1) is the quotient bit, and D is the divisor (assumed to be normalized).

The partial remainder is computed using carry save addition and the quotient is selected using an estimate of the partial remainder. The boundary conditions for the partial remainder and the estimated partial remainder are as follows:

a. -2D <= partial remainder < 2D
b. 0 <= Max. error < 1.0
c. -2.5 <= estimated partial remainder < 2.0
d. Quotient selection:
   q = -1 if estimated partial remainder < (-0.5)
   q =  0 if (-0.5) <= estimated partial remainder < 0
   q = +1 if estimated partial remainder >= 0

To compute the estimated partial remainder, condition (b) together with condition (c) above implies that a Carry Propagate Adder (CPA) of 4 bits (3 bits above the binary point and 1 bit below the binary point) is required.
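The quotient selection rules above can be exercised with a short numerical sketch. It is an illustration under stated assumptions rather than a model of the hardware: the recurrence is taken to be P(i+1) = 2 * (P(i) - q(i+1) * D), operands are assumed normalized to [0.5, 1.0), and the 4-bit CPA estimate over the carry save vectors is replaced by flooring the exact partial remainder to a multiple of 0.5 (an estimate error in [0, 0.5), which is within boundary condition b). The function name is hypothetical. The final step applies the quotient correction for a negative final remainder that the specification assigns to stage 3.

```python
from fractions import Fraction
import math


def srt_radix2_divide(dividend, divisor, steps):
    """Radix-2 SRT division with digits -1, 0, +1 (exact arithmetic).

    Returns (quotient, remainder, ulp) such that
    dividend == divisor * quotient + remainder * ulp and
    0 <= remainder < divisor.
    """
    n, d = Fraction(dividend), Fraction(divisor)
    half = Fraction(1, 2)
    if not (half <= n < 1 and half <= d < 1):
        raise ValueError("operands must be normalized to [0.5, 1.0)")

    pr = n                 # partial remainder, kept within [-2D, 2D)
    quotient = Fraction(0)
    weight = Fraction(1)   # weight of the next quotient digit
    for _ in range(steps):
        # Estimate: partial remainder floored to a multiple of 0.5.
        est = Fraction(math.floor(pr * 2), 2)
        if est < -half:
            q = -1
        elif est < 0:
            q = 0
        else:
            q = 1
        quotient += q * weight
        weight /= 2
        pr = 2 * (pr - q * d)
        assert -2 * d <= pr < 2 * d  # boundary condition a

    ulp = weight * 2       # weight of the last quotient digit
    remainder = pr / 2     # undo the final left shift
    if remainder < 0:      # one subtraction too many: correct the quotient
        quotient -= ulp
        remainder += d
    return quotient, remainder, ulp
```

With 27 steps (the DIVF quotient width quoted in Section 11.7), dividing 3/4 by 5/8 returns a quotient within one unit in the last place below the exact value 1.2.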
The division process essentially consists of the following two steps to retire each bit:

• Compute the estimated partial remainder using the CPA, and the quotient.
• Compute the new partial remainder using the CSA by adding +D, -D, or 0 to the partial remainder based on the quotient from step 1.

Figure 11-4: Divider Array Block Diagram

In order to speed up the time for retiring each bit, step 1 and step 2 are performed in parallel, as there are only three choices for the quotient. As shown in the block diagram, Figure 11-4, the divider array computes (PR+1*D), (PR-1*D), and (PR+0*D) for all the possible values of quotient, q = -1, +1, and 0, in parallel while the quotient is being calculated. The correct new partial remainder is selected using the computed quotient. In the divide array, there are three rows of CSAs. Thus three bits are retired with each pass through the divide array.

11.6 Interface Signal Timing Diagrams

11.7 Divider Operation

For a valid divide operation, the divisor is loaded into the Divisor (DVR) register and the dividend into the Dividend Feedback (DFB) register, both during PHI_4. The CFB is initialized to zero. The control then sequences the datapath with appropriate control signals to load DFB and QM for the required number of divide steps. For the DIVF instruction, the divide array generates 27 bits of quotient. For the DIVG instruction, the divide array produces 57 bits of quotient. For the DIVD instruction, the divide array produces 60 bits of quotient. In general, since the quotient is greater than or equal to 0.5 and less than 2.0, the number of quotient bits to generate is the number of bits in the data type, plus one bit above the binary point and an additional bit in the least significant end for rounding. Since the divider array has three rows, one to two additional bits are generated.

Figure 11-5: Input Signals from Input Interface (operand transfer to the divider)

The divider control receives the F_I%DSEQ_START_L signal from the input interface indicating a valid DIV instruction. This signal should remain valid from the trailing edge of PHI_2 (input to the Divider PLA) through to the trailing edge of PHI_4 (Divisor and Dividend Latches). The divisor and dividend operand latches are conditioned by the F_I%DSEQ_START_L signal. The source data type field from the input interface determines whether the division is a DIVF, DIVD, or DIVG. At the conclusion of the required divider steps, the signals F_D_C2%DSEQ_DONEDAT4_H (to the Input Interface) and F_D_C%DIVDONE_DAT_H (to Stage 1) are asserted.
First the quotient components are driven on F_I%FD1R_H and F_I%FD2R_H together with the exponent and sign registers on their respective buses. Then the remainder sum and carry vectors are driven on F_I%FD1R_H and F_I%FD2R_H with exponents and signs.

Figure 11-6: Result Transfer to Stage-1 (divider result transfer timing; signals driven by the Divider)

The final quotient and the final remainder are computed in the pipeline stages. In stage 1, the two parts of the quotient are added and, in the following cycle, the two parts of the remainder are added. The final quotient requires correction if the sign of the final remainder is negative, as one too many subtractions were performed. Thus, if the sign of the final remainder is negative, the quotient is decremented in stage 3. If the quotient is GEQ 1.0, it is shifted down and the rounding constant is added in stage 4.

11.8 Divider Implementation

The divider stage consists of fraction data path, control, exponent, and sign sections.
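The quotient widths quoted in Section 11.7 (27, 60, and 57 bits for DIVF, DIVD, and DIVG) can be reproduced with a few lines of arithmetic. The sketch below rests on one assumption that the text does not state explicitly: "the number of bits in the data type" is read as the VAX fraction width including the hidden bit (24 for F, 56 for D, 53 for G). The names are hypothetical.

```python
import math

# Assumed fraction widths, hidden bit included (VAX F, D, G formats).
FRACTION_BITS = {"DIVF": 24, "DIVD": 56, "DIVG": 53}
BITS_PER_PASS = 3  # three CSA rows retire three quotient bits per pass


def quotient_bits(instruction):
    """Return (bits generated, array passes) for a divide instruction."""
    # Fraction bits, plus one bit above the binary point, plus one
    # rounding bit at the least significant end.
    needed = FRACTION_BITS[instruction] + 2
    passes = math.ceil(needed / BITS_PER_PASS)
    return passes * BITS_PER_PASS, passes


for name in ("DIVF", "DIVD", "DIVG"):
    bits, passes = quotient_bits(name)
    print(f"{name}: {bits} quotient bits in {passes} array passes")
```

Under this reading the rounding up to a multiple of three adds one bit for DIVF and two bits for DIVD and DIVG, and the 20 passes for DIVD agree with the 20 divider array cycles listed in Section 11.5.1.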
11.8.1 Divider Fraction Data Path

The divider fraction data path is composed of the divisor register, divider array, quotient logic, quotient/remainder selector, and the fraction data path drivers. A block diagram of the divider fraction data path is shown in Figure 11-7. The divider fraction data path is shifted down by three bits relative to the interface and stage 1 fraction data path, as shown in the figure.

11.8.1.1 Divisor Register - DVR

The divisor register DVR<B1:B55> stores the divisor from the interface for divide operations. The DVR register is loaded during PHI_4 when the input interface asserts F_I%DSEQ_START_L (asserted in PHI_2 and held by the divider through PHI_4) and the divider control asserts DVR_WR_FD1 (asserted in PHI_4). DVR<A2:A0> are forced to zero and DVR<B0> is forced to one. The output of this register is shifted down by two bits (for topological reasons, to create space for the Estimated Partial Remainder logic at the left of the datapath) and is used by the divider array to compute the partial remainder in the DCSA cells. The dividend operand is also latched in PHI_4.

11.8.1.2 Divider Array

The divider array consists of three rows of carry save adders (CSAs), three carry propagate adders (CPAs), and latches for the dividend and intermediate results. The cells that make up the divider array are DCSA, DSEL, LAT1, R2D, DCSAF, DFB and CPA. The least significant bits of the array are different from the others and are described later.

11.8.1.2.1 DCSA and DSEL

The DCSA, the carry save adder cell, computes in parallel (partial remainder + divisor), (partial remainder - divisor) and (partial remainder + 0), corresponding to the quotient values of -1, 1, and 0, as sum (S) and carry (C). The correct new partial remainder is selected in DSEL using the three select lines from the CPA.

    PR:       partial remainder
    S:        sum input
    C:        carry input
    D:        divisor
    S_PLUS0:  sum output of PR+0
    S_PLUSD:  sum output of PR+1*D
    S_MINUSD: sum output of PR-1*D
    C_PLUS0:  carry output of PR+0
    C_PLUSD:  carry output of PR+1*D
    C_MINUSD: carry output of PR-1*D

    SUM      = S XOR C
    SANDC_L  = NOT(S AND C)
    SORC_L   = NOT(S OR C)
    S_PLUS0  = SUM
    S_PLUSD  = NOT((D AND SUM) OR (NOT D AND NOT SUM))
    S_MINUSD = NOT(S_PLUSD)
    C_PLUS0  = NOT(SANDC_L)
    C_PLUSD  = NOT((D AND SORC_L) OR (NOT D AND SANDC_L))
    C_MINUSD = NOT((NOT D AND SORC_L) OR (D AND SANDC_L))

The inputs to the first row of the divider array are DVR, SFB_H and CFB_H. During the first step of the divide, the SFB and CFB contain the dividend and zero respectively, and during subsequent steps they carry the outputs of the third row. The second row of the divider also uses the DCSA and DSEL cells.

In the least significant bits of the array, since the S vector is shifted left by 1 and the C vector is shifted left by 2, the S and C inputs to the DCSA are zero except for the first step of the division. For the first step of the division, the least significant bit contains dividend<B55>. For the computation of PR-1*D, the divisor is complemented and a one is forced in the C input position (to complete the 2's complement), as illustrated in Table 11-5.

Table 11-5: CSA Inputs

    CSA Ports   PR+0   PR+1*D    PR-1*D
    Input S     S      S         S
    Input C     0      0         1
    Input D     0      D         NOT D
    Output S    S      S XOR D   S XOR D
    Output C    0      S AND D   S OR (NOT D)

11.8.1.2.2 LAT1

The outputs of the first row are latched every cycle in the LAT1 cell to avoid corrupting the third row inputs. The LAT1 cell is also used to latch the select lines from the row 1 CPA in bit position <B56> for the formation of the quotient. The LAT1 outputs are shifted left, S by one and C by two, to form the 2*partial remainder for the second row DCSA.
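The DCSA cell equations of section 11.8.1.2.1 can be checked against ordinary full-adder arithmetic. The sketch below is a Python transcription of those equations for a single bit position; the function name `dcsa`, the helper `check_dcsa` and the dictionary keys are illustrative choices, not names from the design. For the PR-D outputs only the complement of D is added inside the cell; the extra one that completes the two's complement is the forced carry input described for the least significant bit.

```python
from itertools import product


def dcsa(s, c, d):
    """One DCSA bit cell; arguments and results are single bits (0 or 1)."""
    inv = lambda x: x ^ 1
    sum_ = s ^ c
    sandc_l = inv(s & c)            # NOT(S AND C)
    sorc_l = inv(s | c)             # NOT(S OR C)
    s_plusd = inv((d & sum_) | (inv(d) & inv(sum_)))
    return {
        "S_PLUS0": sum_,
        "S_PLUSD": s_plusd,
        "S_MINUSD": inv(s_plusd),
        "C_PLUS0": inv(sandc_l),
        "C_PLUSD": inv((d & sorc_l) | (inv(d) & sandc_l)),
        "C_MINUSD": inv((inv(d) & sorc_l) | (d & sandc_l)),
    }


def check_dcsa():
    """Compare every input combination with plain integer addition."""
    for s, c, d in product((0, 1), repeat=3):
        out = dcsa(s, c, d)
        assert s + c == out["S_PLUS0"] + 2 * out["C_PLUS0"]
        assert s + c + d == out["S_PLUSD"] + 2 * out["C_PLUSD"]
        assert s + c + (1 - d) == out["S_MINUSD"] + 2 * out["C_MINUSD"]
    return True


all_cases_ok = check_dcsa()
```

Each output pair is the sum and carry of a three-input addition, so the three candidate partial remainders are available at the same time and DSEL only has to pick one of them.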
During reset, LAT1 is loaded with the row 1 outputs to prevent illegal data from making multiple select lines valid in the second and third rows of the divider array.

11.8.1.2.3 R2D and DCSAF

The cell R2D buffers the outputs of the second row, and consequently the S and C vectors for the third row are asserted low. The cell DCSAF used for the third row is similar to the DCSA cell except that it takes S and C in complement form.

11.8.1.2.4 DFB and SHF

The DFB register contains static latches to hold the S and C outputs from the third row of the divider array and to store the dividend. The dividend is loaded into SFB from the input bus during PHI_4 using the control signal DFB_WR_FD2 and RESET_H, while the CFB is cleared. The S and C vectors from the third row are loaded into DFB using the control signal DFB_WR_R3 at the end of each pass through the array. The outputs of the DFB cell, SFB_H and CFB_H, are fed back to the first row of the array for the next pass. In addition, at the end of the required division steps, the DFB holds the final remainder to be transmitted to stage 1. The sign of the final remainder is used to correct the final quotient. Since the sign is derived from the <A0> bit of the stage 1 adder, the final remainder is shifted down and buffered. The SHF cell accomplishes this, and its outputs are RSR_L and RCR_L.

Figure 11-7: Divider Fraction Data Path
[Block diagram: F_B%FD1_L and F_B%FD2_L<A2:B58> from the input interface; the DVR latch (written by DVR_WR_FD1); three rows (ROW1 through ROW3) of CSA, CPA and SEL cells; the LAT1, DFB and SHF cells; the QS21, QM/QS and QSEL (quotient/remainder select) cells; and the tristate drivers (TS) driving F_I%FD1R_H and F_I%FD2R_H to stage 1.]

11.8.1.2.5 CPA

The CPA in each row of the divider array computes the estimated partial remainder (EPR) and generates the three select lines for selecting one of PR+0, PR+D and PR-D in the array. The inputs to the CPA are the four MSBs of S and C from the divider array. The CPA is implemented as a carry select adder, as shown in Figure 11-8. The carry select adder computes the sign of the EPR, SIGN_H, and the zero detect logic detects if the 4-bit sum is exactly -0.5 (1111#2), Z_H. The three select lines are derived as follows:

    ESTIMATED PARTIAL REMAINDER    SELECT ACTION
    011.1                          SELECT PR+0 OUTPUT (NOT POSSIBLE)
    010.0, 010.1, 011.0            SELECT PR-D OUTPUT (NOT POSSIBLE)
    001.X                          SELECT PR-D OUTPUT
    000.X                          SELECT PR-D OUTPUT
    111.1                          SELECT PR+0 OUTPUT
    111.0                          SELECT PR+D OUTPUT
    110.X                          SELECT PR+D OUTPUT
    101.1                          SELECT PR+D OUTPUT
    101.0, 100.X                   SELECT PR+D OUTPUT (NOT POSSIBLE)

    SEL_ZD_R*_H - select PR+0 output
    SEL_PD_R*_H - select PR+D output
    SEL_MD_R*_H - select PR-D output

    SEL_ZD_R*_H = Z_H
    SEL_PD_R*_H = NOT Z_H AND SIGN_H
    SEL_MD_R*_H = NOT SIGN_H

The three select lines are also used to form the quotient.

Figure 11-8: CPA Block Diagram
[Block diagram: inputs S<A2>, C<A2>, S<A0:B0>, C<A0:B0>, S<A1:B0> and C<A1:B0>; the QSELECT block; outputs SEL_ZD_R*_L, SEL_PD_R*_H and SEL_MD_R*_L.]

11.8.1.3 Quotient Recoding and Quotient Registers

11.8.1.3.1 QS21 and QREC

The select lines SEL_PD_R* and SEL_MD_R* indicate the selected quotient value.
Each pass through the array generates three pairs of quotient bits. These can be expressed as the number of additions and the number of subtractions performed. These bits need to be accumulated in two shift registers. The final quotient is the total number of effective subtractions performed. In order to minimize the number of bits to accumulate and to reduce the shift register bits, the three pairs of quotient bits from each pass through the divider array are encoded into four bits. The encoding is accomplished by generating the magnitude of the number of subtractions in each pass as three bits, and a carry bit if the number of additions is greater than the number of subtractions. These four bits, instead of the six bits before the encoding, are accumulated in the shift register QM/QS. The carry vector, after shifting left by 1, is subtracted from the number of effective subtractions to form the final quotient.

Since the row 3 computation is done last, two sets of quotient bits are generated from the first two rows, one for each possibility, and the final quotient bits are selected based on the row 3 quotient bits. The cell QS21 performs the recoding and generates QSB21, QSB20, QSB10 (QSB11), QCA1 and QCA0. In the equations below, S2:S0 and A2:A0 denote the subtraction and addition selections made in the three rows.

    X2    = S2 XOR A2
    QSB21 = NOT(X2 XOR S1)          QSB20 = X2 XOR A1
    QSB11 = S1 XOR A1               QSB10 = NOT QSB11
    QSB0  = S0 OR A0
    QCA1  = A2 OR (NOT S2 AND NOT S1)
    QCA0  = A2 OR (NOT S2 AND A1)

The cell QREC selects the final quotient bits, and its outputs are QSUB_H<2:0> and QC_L<0>, corresponding to the effective subtractions and the carry from one pass through the array. These bits are shifted in to accumulate the final quotient in the QM/QS cells.

11.8.1.3.2 QM and QS registers

The QM and QS form a master/slave shift register that holds the two components of the quotient, the number of subtractions performed and the carry vector respectively.
After each pass through the array, the quotient bits are loaded into QM at various positions depending on the data type. For the DIVF instruction the quotient bits are shifted in at position <B25>. For the DIVD instruction the quotient bits are shifted in at position <B58>. For the DIVG instruction the quotient bits are shifted in at position <B55>. The quotient carry component, QC, is shifted left by one position when it is loaded into QM. The QM register is initialized to zero before beginning a new divide instruction so that the pipeline stages can operate on all the bits of the quotient. The QM register is loaded either from QSUB<2:0> and QC<0> or from the slave QS after a shift of three bits in PHI_4. The QS latch is loaded every PHI_2. The QM cells use six control signals to clear, load or shift in the data. These control signals are derived as shown in Table 11-6.

Table 11-6: QM Cell Control Signals

                      Operation
    Bit Positions     INIT    DIVF    DIVD    DIVG    DONE    Cells
    A0:B22,B26:B52    CLEAR   SHFL    SHFL    SHFL    NOP     QMC,QMFC,QMGC
    B23:B25           CLEAR   FLOAD   FSHF    FSHF    NOP     QMF
    B53:B55           CLEAR   NOP     SHFD    GLOAD   NOP     QMG
    B56:B58           CLEAR   NOP     DLOAD   NOP     NOP     QMD

    Control Signals*  INIT    DIVF    DIVD    DIVG    DONE
    DQM_SHFL          0       1       1       1       0
    DQM_FLD           0       1       0       0       0
    DQM_FSHF          0       0       1       1       0
    DQM_DLD           0       0       1       0       0
    DQM_GLD           0       0       0       1       0
    DQM_CLR           1       0       0       0       0

    * - asserted HIGH.

During reset, all the above control signals except DQM_CLR are deasserted. In order to simplify the stage 1 control, the ones complement of the QC component is transferred to stage 1 so that the stage 1 adder performs the same operation for both the final quotient and the final remainder computation.

11.8.1.3.3 QSEL and TSF

The QSEL selects the divider results to be driven to the stage 1 fraction data path.
At the end of the required division steps, first the two components of the quotient are selected, and in the following cycle the RSR and RCR of the final remainder are selected, using the control signals DIV_SEL_REM_*. Since the carry component of the quotient is only one bit per three quotient bits, zeros are forced into the other two bits. The TSF cell consists of a tristate driver that drives the divider results on the F_B%FD1_L and F_B%FD2_L busses during PHI_2 and PHI_3 using the control signal F_D_C2%DIVDONE_DATF_H. The TSF also contains buffers to drive F_I%FD1R_H and F_I%FD2R_H to stage 1.

11.8.2 Divider Control

The divider control is responsible for all sequencing and control of the divider data path. It receives F_I%DSEQ_START_L, SRC_DT_H<1:0>, F_I%DATA_VALIDR_H and F_I%ABORT_H from the Input Interface. The divider control generates all control signals for the data path, the F_D_C2%DSEQ_DONEDAT4_H signal for the input interface, and F_D_C%DIVDONE_DAT_H to stage 1 of the pipeline. The early signal F_D_C2%DSEQ_DONEDAT4_H to the input interface stays valid through two cycles, for both the quotient and remainder transfers.

The F_I%DSEQ_START_L signal obtained from the input interface must be valid by the trailing edge of PHI_2. A latched version of this signal is used in PHI_4 to latch in the divisor and dividend.

11.8.2.1 Divider Control Blocks

The divider control consists of the control sequencer and miscellaneous logic, source data type latches, and buffers for driving the various control signals to the fraction data path.

11.8.2.1.1 Control Sequencer

The control sequencer is implemented as a PLA. The inputs to the PLA are the latched versions of F_I%ABORT_H, F_I%DSEQ_START_L, F_I%SRC_DT_H<1:0> and state information. The PLA essentially implements a counter and a sequencer to control the data path.
The divider control stays in the NOP state until a valid divide opcode and valid operands are received. The signal F_I%DSEQ_START_L obtained from the input interface combines these two conditions. The state transition table shows the sequencer state, inputs and outputs.

Figure 11-9: Divider Sequencer State Transition Table
[State transition table: sequencer states NOP, PASS1 through PASS18, LAST2, SDONE, QDONE and RDONE, with their inputs and outputs.]

The divider control PLA has 10 inputs, 14 outputs and 26 minterms. These numbers include one spare input, one spare output and three spare minterms.
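The amount of counting the sequencer has to do follows from the quotient-length rule given earlier in this chapter: the bits in the data type, plus one bit above the binary point, plus one rounding bit, rounded up to a multiple of the three array rows. The Python sketch below works that rule through. The fraction widths are those of the VAX F, D and G floating formats including the hidden bit, and the function name `quotient_bits` is illustrative. How the resulting pass count maps onto the individual PASSn, LAST2 and done states is not modeled here.

```python
ROWS = 3  # quotient bits produced per pass through the divider array

# Fraction widths including the hidden bit (VAX F, D and G floating formats).
FRACTION_BITS = {"DIVF": 24, "DIVD": 56, "DIVG": 53}


def quotient_bits(instruction):
    """Return (quotient bits generated, passes through the array)."""
    # Data type bits + one bit above the binary point + one rounding bit.
    needed = FRACTION_BITS[instruction] + 2
    passes = -(-needed // ROWS)          # ceiling division
    return passes * ROWS, passes


for name in FRACTION_BITS:
    print(name, quotient_bits(name))
# DIVF (27, 9)
# DIVD (60, 20)
# DIVG (57, 19)
```

The DIVD result agrees with the 60 quotient bits quoted for that instruction, and in each case one or two bits beyond the strictly required count are generated, as the text states.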
11.8.2.1.2 Opcode Information Latches

The divider latches the source data type signal from the input interface. If the divider is not busy, the source data type information is latched into a static PHI_2 latch (cell LS). The output of this latch is used as an input to the sequencer PLA.

11.8.2.1.3 Divider Behavior during ABORT

The divider starts execution upon receipt of the F_I%DSEQ_START_L signal from the Input Interface. Assertion of F_I%ABORT_H from the Input Interface while the divider is retiring quotient bits automatically forces the divider to reset its control sequencer to its initial NOP state and to maintain the data valid enable in its deasserted state. It is expected that the Input Interface also deasserts the F_I%DATA_VALIDR_H signal during the ABORT cycle.

Assertion of the F_I%ABORT_H signal from the Input Interface during quotient and result transfers to Stage-1 also stops the divider from driving the F_D%DATA_VALIDR_H line to Stage-1. As above, it is expected that the Input Interface also deasserts the F_I%DATA_VALIDR_H signal during the ABORT cycle.

11.8.2.1.4 Data path Control Drivers

The various control signals to the data path are combined with the appropriate clock signals and driven to the data path.

11.8.2.2 Summary of Divider Stage Outputs

The following table shows the divider stage outputs for the divide operations:

Table 11-7: Divider Output Stages

                   Divider Outputs
    Instruction    Q(C,S)                  R
    DIVF           Q(C,S)<A0:B25> = Q      Remainder
                   Q(C,S)<B26:B58> = 0
    DIVD           Q(C,S)<A0:B58> = Q      Remainder
    DIVG           Q(C,S)<A0:B55> = Q      Remainder
                   Q(C,S)<B56:B58> = 0

    Q(C,S) - Quotient vectors QC, QS
    R      - Remainder vectors, carry and sum

NOTE:

- The divider stage saves the exponent and the sign parts of the operands and passes them unchanged during the result transfer.
- Floating divide by zero, reserved operand, floating overflow and underflow are not detected by the divider stage. In these cases, the Q(C,S) and R outputs are undefined.
- The control outputs generated by the divider stage, the DIVDONE_DAT_H and DSEQ_DONEDAT4_H signals, are deasserted for non-divide operations.

11.8.2.3 Data Valid Logic

The divider output signal F_D%DATA_VALIDR_H driven to Stage-1 is a logical OR of the F_I%DATA_VALID_H signal from the Input Interface and the F_D_C_DV%EN_H signal from the Divider. These signals are mutually exclusive. The Input Interface deasserts its data valid after issuing a divide instruction and awaits the F_D_C2%DSEQ_DONEDAT4_H signal from the Divider before it asserts the data valid again. The presence of the global ABORT signal F_I%ABORT_H disables the driving of the F_D%DATA_VALIDR_H signal by the Divider.

11.8.3 Exponent and Sign Data Path

The exponent data path in the divider consists of registers to save the exponents and signs of the divisor and the dividend. The divider does not operate on the exponent and sign parts of the divisor and the dividend. The exponents and signs are saved to pass them to stage 1 of the pipe along with the quotient and final remainder components so that, for floating point divide operations, the exponent result and exception conditions can be detected. The L1 cell is a static latch and is loaded with sign and exponent data from the interface during PHI_4 if a valid F_I%DSEQ_START_L is detected. At the end of the divide operation the exponent and sign data are driven to the stage 1 exponent data path. The cell TSE contains the tristate driver and the driver. The exponent and sign data, as in the case of the fraction data path, are actively driven during PHI_2 and PHI_3 using the control signal F_D_C2%DSEQ_DONEDAT_H and PHI_23.
11.9 Stage 1

Stage 1 of the pipeline is primarily used to perform the addition of the two inputs, to compute the encoded shift amount, or to perform the recoding for the multiplier array, generate the initial partial product, and select the row one and row two inputs to the multiplier in stage 2. Stage 1 receives its inputs from either the interface section or the divider section. All outputs of stage 1 are driven to stage 2 of the pipeline. The sign of the adder result is driven to stage 3 as well as stage 2. Stage 3 requires the sign of the remainder, for the divide operation, to determine if the quotient result should be incremented.

The fraction datapath portion of stage 1 primarily consists of an input selector, an adder, the multiplier recoder, and two output selectors. The adder in stage 1 is used for many functions. For multiply operations it is used to compute three times the multiplicand; for divide operations it is used to add the sum and carry vectors for the quotient; for other operations it is used to add two vectors. The recoder logic is used to select the appropriate bits of the multiplier and recode them. The recoded bits are inputs to the multiplier array in stage 2.

The exponent datapath of stage 1 primarily consists of an input selector, two adders, detection logic, and an output selector. The main purpose of the exponent section in stage 1 is to compute the exponent difference. The detection logic is used to determine the range of the exponent difference.

The sign datapath portion in stage 1 performs no operation on the sign bits. They are passed unchanged to stage 2.

11.10 Section Implementation Description

11.10.1 Fraction Datapath

Figure 11-10 is a top level block diagram of the Fbox stage 1 fraction datapath.
Figure 11-10: Fraction Datapath Block Diagram
[Block diagram: inputs F_I%FD1R, F_I%FD2R and zero; ISEL and IOVF; the adder with inputs AIN and BIN; RSEL, the RECODER and the SRECODER; and the selectors and registers MIPPR, MTCR, MRW1R, MRW2R, MRECR, FD1R and FD2R. Outputs: MIPPR_L, MTCR_L<18:0>, MRW2R, MRW1R and MRECR_H[0:6]<5:0>.]

Table 11-8 lists what is required to be loaded into the stage 1 fraction datapath registers, FD1R and FD2R, for each operation.

Table 11-8: Stage 1 Fraction Register Operations

    Category  Operation                          Condition
    F0        FD1R <- OP1 + NOT(OP2) + 1         Effective SUB (DeltaE = 0), CMP
    F1        FD1R <- OP1 + NOT SHR1(OP2) + 1    Effective SUB (DeltaE = +1)
    F2        FD1R <- NOT SHR1(OP1) + OP2 + 1    Effective SUB (DeltaE = -1)
    F3        FD1R <- OP1                        Effective ADD or effective SUB (DeltaE > 1), and ED1R < ED2R
    F4        FD1R <- OP2                        Effective ADD or effective SUB (DeltaE > 1), and ED1R >= ED2R
    F5        FD1R <- OP1 + OP2                  DIV (after the divide array operation, done once for the quotient and once for the remainder)*
    F6        FD1R <- OP1 + 0                    CVTff, CVTfi, MOV, MNEG, TST, and CVTif (if input integer is positive)
    F7        FD1R <- NOT(OP1) + 0 + 1           CVTif (if input integer is negative)
    F8        FD1R <- OP2 + SHL1(OP2)            MUL, MULL
    F9        FD2R <- OP1                        Effective ADD or effective SUB (DeltaE > 1), and ED1R >= ED2R
    F10       FD2R <- OP2                        Effective ADD or effective SUB (DeltaE > 1), and ED1R < ED2R, or MUL, MULL

    * The divider supplies stage 1 with QA. This allows the stage 1 adder to perform the same operation on the quotient and the remainder inputs.
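Several rows of Table 11-8 reduce to one adder operation with optional input inversion and a carry-in. The Python sketch below illustrates this under an explicit assumption: that the effective-subtract and negate entries mean "add the bitwise complement with a carry-in of one", which is the usual two's complement identity. The 61-bit width is taken from the adder description in this chapter; the function names are hypothetical, and the result is truncated to the input width here simply to show the wrap-around, whereas the adder itself produces one extra result bit.

```python
WIDTH = 61                    # adder input width
MASK = (1 << WIDTH) - 1


def stage1_add(a, b, carry_in=0, invert_a=False, invert_b=False):
    """Add two WIDTH-bit vectors with optional input inversion and carry-in."""
    if invert_a:
        a = ~a & MASK
    if invert_b:
        b = ~b & MASK
    return (a + b + carry_in) & MASK      # extra result bit dropped (assumption)


def effective_sub(op1, op2):
    """Category F0 as read here: OP1 + NOT(OP2) + 1, i.e. OP1 - OP2."""
    return stage1_add(op1, op2, carry_in=1, invert_b=True)


def negate(op1):
    """Category F7 as read here: NOT(OP1) + 0 + 1, i.e. -OP1."""
    return stage1_add(op1, 0, carry_in=1, invert_a=True)


def times_three(op2):
    """Category F8: OP2 + SHL1(OP2), three times the multiplicand."""
    return stage1_add(op2, (op2 << 1) & MASK)


difference = effective_sub(10, 3)
```

Because every case is an add with the same structure, only the selector inversion controls and the carry-in differ between operations, which is consistent with the input selectors being able to invert their selected input and the carry-in being set by the stage 1 control.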
11.10.2 Integer Overflow - IOVF

The integer overflow logic in stage 1 is used to help detect an integer overflow condition during a CVTFI operation.

11.10.3 Input Selector - ISEL

ISEL consists of two 3 to 1 selectors. The inputs to the A selector are FD1R%I<bI>, FD1R%I<bI+1>, and FD2R%I<bI-1>. The inputs to the B selector are FD2R%I<bI>, FD2R%I<bI+1>, and zero. Both selectors can invert the selected input.

11.10.4 Adder

The adder uses two 61-bit inputs to derive a 62-bit result. The 61-bit inputs have two bits above the binary point and 59 bits below; the 62-bit result has an additional bit above the binary point. The main carry acceleration technique used is carry select. The adder is broken up into nine small groups, with all but the least significant group having duplicate carry chains. These carry chains operate in parallel in the first half of the stage 1 cycle. Propagate and generate logic operates before the carry chains. These parts of the adder are fully static.

In the second half of the cycle, the sum logic executes. Just as for the carry logic, there is duplicate sum logic for all groups except the least significant one. The carry out signals are used to select the correct sum values. These parts of the adder are also fully static. The carry in to bit position <B58> is set directly by the stage 1 control.

11.10.5 Recoder Selector - RSEL

RSEL is a 2 to 1 selector which selects either F_I%FD1R_H<a0:b28> or F_I%FD1R_H<b26:b55>. When F_1_C%MRW_UPPER_H is asserted, bits <a0:b28> are selected, and when F_1_C%MRW_UPPER_H is deasserted, bits <b26:b55> are selected.

11.10.6 SRECODER

The srecoder uses the radix 8 modified Booth algorithm to compute the recoded sign bits of the partial products. The srecoder receives F_I%FD1R_H<a0:b26> as an input and outputs 9 recoded sign bits, F_1_R%SREC_H<8:0>.
If either F_1_E%E1Z_H or F_1_E%E2Z_H is asserted, the srecoder will force the outputs to a one. The recoded sign bit is asserted when the partial product is positive.

11.10.7 Multiplier Two's Complement Register - MTCR<18:0>

The MTCR register is a 19 bit 2 to 1 selector and register. When F_1_C%MRW_UPPER_H is asserted the A inputs to the selector are selected, and when F_1_C%MRW_UPPER_H is deasserted the B inputs to the selector are selected. Bit zero of the A input is tied to VDD, bits <9:1> are driven by the SRECODER, and bits <18:10> are tied to VDD. Bits <9:0> of the B input are driven by the RECODER and bits <18:10> are driven by the SRECODER.

11.10.8 Recoder

There are 31 inputs to the recoder: F_1_R%RSEL_H<29:0> and zero. The least significant bit of the recoder input is always zero. The recoder performs the recoding using the radix 8 modified Booth algorithm. The recoder generates 60 recoded bits, F_1_R%MREC_H<59:0>. Of the 60 bits, 18 are used in stage 1. F_1_R%MREC_H<5:0> are used to select the MIPP. F_1_R%MREC_H<11:6> are used to select the row one input to the multiplier array. F_1_R%MREC_H<17:12> are used to select the row two input to the multiplier array. If either F_1_E%E1Z_H or F_1_E%E2Z_H is asserted, the recoder will force the recoder outputs to recode zero.

11.10.9 PHI_4 LATCHES

The PHI_4 latches are used to latch F_I%FD1R<a2:b58>, F_I%FD2R<a2:b58>, F_1_R%MREC_H<59:0>, and F_1_R%SREC_H<8:0>.

11.10.10 Recoder Register - MRECR[0:6]<5:0>

The MRECR register is a 42 bit register. The latch is written every cycle with the upper 42 bits of the recoder output (F_1_R%MREC_3R_H<59:18>). The output of the register is driven to stage 2 as MRECR[0:6]<5:0>.

11.10.11 Multiplier Initial Partial Product Selector and Register - MIPPR

The MIPP selector is a 1 of 5 selector.
It uses the RECODER output bits F_1%MREC_3R_H<5:0> to select the initial partial product. The inputs to the selector are: plus/minus (1X, 2X, 3X, 4X) the multiplicand, and zero. The selected input will be latched at the end of the stage 1 execute cycle.

11.10.12 Multiplier Row 1 Selector and Register - MRW1R

The MRW1R selector is a 1 of 5 selector. It uses the RECODER output bits F_1%MREC_3R_H<11:6> to select the row 1 input to the multiplier array. The inputs to the selector are: plus/minus (1X, 2X, 3X, 4X) the multiplicand, and zero. The selected input will be latched at the end of the stage 1 execute cycle.

11.10.13 Multiplier Row 2 Selector and Register - MRW2R

The MRW2R selector is a 1 of 5 selector. It uses the RECODER output bits F_1%MREC_3R_H<17:12> to select the row 2 input to the multiplier array. The inputs to the selector are: plus/minus (1X, 2X, 3X, 4X) the multiplicand, and zero. The selected input will be latched at the end of the stage 1 execute cycle.

11.10.14 Selector and Register - FD1R

The FD1R selector is a 4 to 1 selector. The inputs to the selector are FD1_3R, FD2_3R, the output of the adder, and zero. The selected input is latched at the end of the stage 1 execute cycle.

11.10.15 Selector and Register - FD2R

The FD2R selector is a 3 to 1 selector. The inputs to the selector are FD1_3R, FD2_3R, and zero. The selected input is latched at the end of the stage 1 execute cycle.

Figure 11-11 is a block diagram of the recoder logic in stage 1 of the Fbox fraction datapath.

Figure 11-11: Recoder Block Diagram
[Block diagram: RSEL, controlled by MRW_UPPER; the SRECODE block fed by FD1R_H%I<a0:b26>; the recoder; the MTCR A/B selector, controlled by MRW_UPPER_H; outputs MTCR<18:10>, MTCR<9:0> and MRECR[0:6]<5:0>.]

11.11 Exponent Datapath

11.11.1 Stage 1 Exponent Processor Block Diagram

Figure 11-12 is a block diagram of the exponent processor logic in stage 1.

Figure 11-12: Stage 1 Exponent Processor Block Diagram
[Block diagram: inputs F_I_E%ED1R_H and F_I_E%ED2R_H; zero detector; input selector for adder 1; exponent adder 1; input selector for adder 2; complement logic for exponent adder 2; exponent adder 2; exponent difference logic; exponent difference detection logic; PHI_4 latches, output select and PHI_2 latches.]

11.11.2 Exponent Adders

The operations performed by the adders in stage 1 are listed in Table 11-9. E1 refers to F_I_E%ED1R_H, E2 refers to F_I_E%ED2R_H, and K refers to the constants generated in the control section of stage 1.

Table 11-9: Exponent Adder Operations

    Category  Adder #1    Adder #2    Condition
    E0        -           -           CVTif, MULL
    E1        E1 - E2     E2 - E1     SUBf, ADDf, CMPf
    E2        -E1 + E2    -           DIVf
    E3        -E1 + K     -K + E1     CVTfi
    E4        E1 - K      -           CVTff, MOVf, MNEGf
    E5        E1 + E2     -           MULf
    E6        E1 + K      -           TSTf

11.11.3 Constants

The constants are driven from the control section into the exponent datapath. The constants needed for stage 1 are listed below.
   0000000000000 = 0    ; TSTf
   0000010000000 = 128  ; CVTff, MOV, MNEG {F,D}
   0010000000000 = 1024 ; CVTff, MOV, MNEG {G}
   0000010111000 = 184  ; CVTfi {F,D}
   0010000111000 = 1080 ; CVTfi {G}

Table 11-10 shows the required carry-in to the exponent adders.

Table 11-10: Exponent Adder Carry-In Operations

Category   Cin E_AD1   Cin E_AD2
--------   ---------   ---------
E0         d           d
E1         1           1
E2         1           d
E3         1           1
E4         1           d
E5         0           d
E6         0           d

d = don't care

11.11.4 Zero Detection

The zero detectors check to see if an exponent operand has a value of zero. They are enabled by EKA_E1Z and EKA_E2Z. The detection is done in the second half of the execute cycle and driven into the control as E1Z and E2Z. E1Z detects zero on ed1r and E2Z detects zero on ed2r.

11.11.5 Exponent Adder 1

The exponent adder is a 13-bit static adder used to add or subtract two inputs. Each input is passed through a 2 to 1 selector and inversion logic prior to the adder.

INP_1A can be selected from ED1R or K. If ISEL1_ED1R_A is asserted, then ED1R is passed through the selector. If ISEL1_K_A is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EA_AD1.

INP_1B can be selected from ED2R or K. If ISEL1_ED2R_B is asserted, then ED2R is passed through the selector. If ISEL1_K_B is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EB_AD1.

The adder also contains a carry-in to the LSB cell, CIN_E_AD1_H. The carry-in is primarily used for performing subtraction operations. Since the adder is static, it begins its operation when the input data is valid at the start of the stage 1 execute cycle. Intermediate results in the exponent adder are latched in the second half of the execute cycle and sent to the detection logic and output selector.

11.11.6 Exponent Adder 2

Exponent adder 2 is almost identical to exponent adder 1.
The only real difference is found in the input selection logic.

INP_2A can be selected from ED2R or K. If ISEL2_ED2R_A is asserted, then ED2R is passed through the selector. If ISEL2_K_A is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EA_AD2.

INP_2B can be selected from ED1R or K. If ISEL2_ED1R_B is asserted, then ED1R is passed through the selector. If ISEL2_K_B is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EB_AD2.

The adder also contains a carry-in to the LSB cell, CIN_E_AD2_H. The carry-in is primarily used for performing subtraction operations. Since the adder is static, it begins its operation when the input data is valid at the start of the stage 1 execute cycle. Intermediate results in the exponent adder are latched in the second half of the execute cycle and sent to the detection logic and output selector.

11.11.7 Exponent Difference Detection

The exponent difference detection is used to detect certain exponent values. The detection is done on the output of both exponent adders, adder 1 and adder 2, and then selection of the exponent difference is based on E_N. E_N is bit 12 of adder 1. It is used to select the detection results from the positive adder output. The detection logic detects the following conditions:

   Condition                          Signal
   --------------------------------   -----------------
   Exponent Difference = 0            E_DIFF_EQL_0
   Exponent Difference > 1            E_DIFF_GTR_1
   Exponent Difference = 24           E_DIFF_EQL_24
   Exponent Difference = 25           E_DIFF_EQL_25
   Exponent Difference > 57           E_DIFF_GTR_57
   E2 > E1                            E_N
   Exponent Difference <5:1> NEQ 0    E_DIFF_5_1_NEQ_0

The detection and latching is done at the start of the execute cycle. In addition, the absolute value of the exponent difference is determined at the start of the stage 1 execute cycle.
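The behaviour of the two exponent adders and the difference detection can be illustrated with a small software model. This is only a sketch under stated assumptions, not the chip's logic: the function names `exp_adder` and `exponent_difference` are hypothetical, subtraction is modelled as inversion of one input plus a carry-in of 1 (as the adder descriptions suggest), and the operands are assumed small enough that bit 12 of the 13-bit result acts as a sign bit.

```python
MASK13 = (1 << 13) - 1  # the stage 1 exponent adders are 13 bits wide


def exp_adder(inp_a, inp_b, invert_a=False, invert_b=False, cin=0):
    """13-bit add with optional inversion of either input and an LSB carry-in."""
    if invert_a:
        inp_a = ~inp_a & MASK13
    if invert_b:
        inp_b = ~inp_b & MASK13
    return (inp_a + inp_b + cin) & MASK13


def exponent_difference(e1, e2):
    """Model of the ADDf/SUBf/CMPf case: both adders subtract, E_N picks the positive result."""
    e_ad1 = exp_adder(e1, e2, invert_b=True, cin=1)  # adder 1: E1 - E2
    e_ad2 = exp_adder(e2, e1, invert_b=True, cin=1)  # adder 2: E2 - E1
    e_n = (e_ad1 >> 12) & 1                          # bit 12 of adder 1, set when E2 > E1
    diff = e_ad2 if e_n else e_ad1                   # absolute exponent difference
    return {
        "E_N": e_n,
        "E_DIFF_EQL_0": diff == 0,
        "E_DIFF_GTR_1": diff > 1,
        "E_DIFF_EQL_24": diff == 24,
        "E_DIFF_EQL_25": diff == 25,
        "E_DIFF_GTR_57": diff > 57,
        "E_DIFF_5_1_NEQ_0": ((diff >> 1) & 0x1F) != 0,  # bits <5:1> of the difference
        "E_DIFFR": diff & 0x3F,                          # the six bits E_DIFFR<5:0>
    }


if __name__ == "__main__":
    print(exponent_difference(128, 153))
```

Computing both differences in parallel and selecting on the sign of one of them avoids a separate negation step, which appears to be the reason the text describes two nearly identical adders.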
The absolute difference is carried on E_DIFFR<5:0>, which drives the inputs to the shift decoders in stage 2. The exponent block also generates a signal called EDIFF_5_1_NEQ_0. This signal is asserted when bits <5:1> of the positive exponent difference are not equal to zero.

11.11.8 Output Selector

The output data (ED1R) can be selected from four sources: ed1r, ed2r, e_ad1, or zero. The selection is done based on the assertion of the output select control signals. If OSEL1_ED1R is asserted, then ed1r is selected. If OSEL1_ED2R is asserted, then ed2r is selected. If OSEL1_E_AD1 is asserted, then e_ad1 is selected. If OSEL1_ZERO is asserted, then the output of the selector is zeroed. The output of the selector is latched every cycle at the end of the stage 1 execute cycle and driven into the following stage.

11.12 Sign Datapath

The sign bits of both the operands are not modified within stage 1. They are used by the stage 1 control. The two sign bits S1 and S2 are latched in stage 1 and are passed to stage 2 of the pipeline.

Figure 11-13: Sign Datapath Block Diagram
[Block diagram not recoverable from the scan. Legible labels: S1R, S2R, "To Stage 1 Control".]

11.13 Stage 1 Control

The control section in stage 1 receives the opcode from the interface. The control section unconditionally decodes it every cycle. After a minimum of a one cycle delay, stage 1 will receive operands from the input interface. If it is a one operand instruction the input interface will assert data valid, and stage 1 will perform the instruction. If it is a two operand instruction, both operands are driven in the same cycle along with data valid.

11.13.1 Divide Instruction

During a divide operation, the opcode, data valid, and two operands are passed to the divider and stage 1 by the interface.
The divider and stage 1 will perform their portion of the divide operation. Stage 1 will deassert data valid. When the divider completes the divide operation, stage 1 will again receive the opcode. The following cycle stage 1 will receive data valid, divdone_dat, and quotient bits QS and QA. Stage 1 will compute the quotient and pass data valid and the quotient result to stage 2. The next cycle stage 1 will receive divdone_dat and the sum and carry vectors for the remainder. Stage 1 will compute the remainder and pass the sign of the remainder to stage 3.

11.14 Fraction Datapath Operation Summary

Figure 11-14: Fraction Datapath Operation Table
[Table contents not recoverable from the scan. Columns: OPERATION; Condition (E1Z, E2Z, EDIFF_5_1, IDIV, ZERO, E_N); Data (FD1R, FD2R, F_N, MRECR, MIPPR, MRWR, EDIFF). Legible operation rows include EFF SUB, MOV/MNEG, TST, CVTff, CVTif and CVTfi.]

   X  -- Don't care
   V  -- Valid
   F1 -- Fraction portion of operand 1
   F2 -- Fraction portion of operand 2

11.15 Fraction Datapath Exception Summary

Figure 11-15: Fraction Datapath Exception Summary
[Table contents not recoverable from the scan. Columns as in Figure 11-14.]

11.16 Exponent Datapath Operation Summary

Figure 11-16: Exponent Datapath Operation Table
[Table contents not recoverable from the scan. Columns: OPERATION; Condition (E1Z, E2Z, EDIFF_5_1, IDIV, ZERO, E_N); Data (ED1R, EDIFF = 0, EDIFF = 1, EDIFF > 1, EDIFF = 24, EDIFF = 25). Legible operation rows include EFF SUB, MOV/MNEG, CVTff, CVTif and CVTfi.]

   X   - Don't care
   V   - Valid
   K   - Constant
   E1  - Exponent portion of operand 1
   E2  - Exponent portion of operand 2
   MAX - Maximum exponent [E1, E2]

11.17 Exponent Datapath Exception Summary

Figure 11-17: Exponent Datapath Exception Table
[Table contents not recoverable from the scan. Columns as in Figure 11-16.]

NOTE: Stage 1 will not assert F_1%E2ZR_H during one operand instructions.

NOTE: The exponents and signs are driven to stage 1 during both quotient and final remainder transfers to stage 1.
11.17.1 Passthru Signals

The MMGT_FLT_L, MEM_ERR_L, RSV_ADR_L and PSL_FU_H signals are simply passed through stage 1 without change. They are latched coming in from the Input Interface during PHI_[illegible], and driven to stage 2 during PHI_2.

The NEW_FOP_H signal also passes through to stage 2 unaffected. It is latched during PHI_1 coming from the Input Interface and driven to stage 2 during PHI_[illegible]. This signal is gated with the global purge signal F_I%PURGE_H from the input interface, which clears it on a PURGE from the input interface. This signal is used by the Output Interface to manipulate its control-queue and data-queue pointers.

11.18 STAGE 2

11.18.1 Introduction

Stage 2 of the Fbox pipeline is composed of a fraction datapath, an exponent datapath, a sign datapath and a control block. Stage 2 receives all its data inputs from stage 1 and passes all its data outputs to stage 3. Stage 2 receives control inputs from stage 1 and the interface section, and passes control information to stage 3.

The stage 2 fraction datapath has an array multiplier, a right shifter and detection logic. The detection logic is used to detect the bit position of the most significant bit in a number and whether a number is equal to zero. The detection logic is also used to generate the sticky bit associated with the right shifter. The exponent datapath is composed of the standard exponent block, of which only the adder and the output selector are used, and an additional 6 bit data register. The sign bits are passed from stage 1 to stage 3 unchanged.

The stage 2 fraction datapath performs operations on its input data for the following instructions: ADDf, SUBf, CMPf, TSTf, MULf, MULL, CVTif, CVTfi and CVTRfi. The ADDf and SUBf instructions use the output of the right shifter and the detection logic. CMPf, TSTf, CVTif use the output of the detection logic.
The MULf and MULL instructions use the output of the array multiplier. The CVTfi and CVTRfi instructions use the output of the right shifter. For all other instructions the stage 2 fraction output registers are either written with the unchanged input data passed from stage 1 or the contents are undefined.

The stage 2 exponent datapath performs operations on its input data for the MULf and DIVf instructions. The adder in the exponent datapath is used to either add or subtract the appropriate exponent bias from the exponent data passed from stage 1. The output selector selects between the adder output, the input data from stage 1, and zero. For all instructions other than MULf and DIVf, the output selector passes the data passed from the stage 1 exponent datapath.

The stage 2 control block generates all conditional datapath control signals and passes control information to stage 3. The control block must sequence the fraction multiplier for MULD/G and MULL instructions, which require two consecutive cycles of execution in the stage 2 fraction datapath for generating the two vectors (carry and sum) used for forming the final product in stage 3.

11.18.2 MUL Instruction Flows

Stage 2 is the stage of the Fbox pipeline that executes most of the computation needed for MULf and MULL instructions. To clarify the need for the multiply hardware in stage 2, the basic MUL flow is described.

The multiplication algorithm implemented in the Fbox is the modified Booth algorithm which retires 3 multiplier bits at a time. The steps for calculating the product, or the fraction portion of the product in the case of floating point operands, are as follows. First, the multiples of the multiplicand that are required by the Booth algorithm are calculated and the multiplier is recoded.
Then the summands (a summand must be one of the calculated multiples of the multiplicand) which are to be added together to form the product are selected based on the recoded bits of the multiplier. Finally, the summands are added together and all terminal operations (rounding, etc.) are performed as required by the particular instruction and datatype.

The stage 1 fraction datapath basically calculates the required multiples of the multiplicand and recodes the multiplier. The stage 1 exponent datapath adds the exponents of the operands for floating point instructions. The signs of the operands are passed from stage 1 to stage 2 unchanged.

The stage 2 fraction datapath selects the summands and performs carry-save addition on the summands. The stage 2 exponent datapath subtracts the appropriate exponent bias from the sum of the exponents calculated in stage 1. The signs of the operands are passed from stage 2 to stage 3 unchanged.

The stage 3 fraction datapath forms the final product by doing a carry-propagate addition of the carry and sum vectors output from stage 2. The stage 3 exponent datapath decrements the exponent of the product if the fraction portion of the product needs to be normalized. Stage 3 also checks for potential data dependent stage 4 bypassable cases by carrying out a miniround on the lower 3 bits and the round bit. If the rounding operation doesn't carry past the 4 bits then stage 4 is bypassed. This bypass is aborted should stage 3 detect any exception or potential exception conditions. For a more detailed explanation refer to the stage 3 specification. For all non stage 4 bypassable instructions, stage 3 passes the signs of the operands unchanged to stage 4.

The stage 4 fraction datapath performs all terminal operations on the product.
For MULf instructions stage 4 rounds the fraction of the product and increments the exponent if the fraction overflows. Stage 4 also checks for floating overflow and underflow. For MULL instructions stage 4 checks for integer overflow and forms and aligns the product for outputting to the interface. Stage 4 generates the correct sign bit for floating and integer MUL instructions.

Two consecutive cycles of execution in stage 2 are needed to complete all MUL instructions except MULF. This is due to the fact that 1 cycle is required for each pass through the multiply hardware in stage 2 and only F floating datatype multipliers can be completely retired in one pass.

A more detailed description of the operations executed in the fraction datapaths of stages 1 through 4 is given below.

Stage 1 passes the recoded multiplier, +1*multiplicand and +3*multiplicand to stage 2. The multiples of the multiplicand required by the Booth algorithm are 0, +/-1*multiplicand, +/-2*multiplicand, +/-3*multiplicand and +/-4*multiplicand. Stage 1 only calculates +3*multiplicand because the other multiples are obtained by a simple shift of +1*multiplicand, and all negative multiples are generated by two's complementing the positive multiples. In order to reduce the number of computations executed in stage 2 for MULD, stage 1 also passes the summand selected from the recoded 3 LSBs (assuming D datatype) of the multiplier. The initial partial product is zero for all MUL instructions except for MULD. Stage 1 also passes two summands which are input to two rows of CS adders in stage 2 called MROW1 and MROW2, and a vector which facilitates generating the two's complement of the selected summands. The logic in stage 1 which determines the summands for MROW1 and MROW2 examines different multiplier bits depending on the operand datatype and whether it needs to output summands for the first or the second pass through the stage 2 multiply hardware.
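The recoding step described above can be sketched in software as a generic radix-8 (3 bits per digit) modified Booth multiplier. This is an illustration under stated assumptions rather than the Fbox recoder itself: the function names are hypothetical, the multiplier is treated as an unsigned integer, and the digit formula is the textbook radix-8 one. It does show why only the 3X multiple needs a real addition, while 2X and 4X are shifts and the negative multiples are negations.

```python
def booth_radix8_digits(multiplier, nbits):
    """Recode an unsigned nbits-wide multiplier into radix-8 digits in -4..+4.

    Each digit looks at three new multiplier bits plus the top bit of the
    previous group (taken as 0 below the least significant bit).
    """
    digits = []
    prev_top = 0
    ngroups = nbits // 3 + 1  # one extra group so the top bit of the last group is 0
    for i in range(ngroups):
        group = (multiplier >> (3 * i)) & 0b111
        top = group >> 2
        digits.append((group & 0b011) + prev_top - 4 * top)
        prev_top = top
    return digits


def select_multiple(multiplicand, three_x, digit):
    """Pick 0, 1X, 2X, 3X or 4X of the multiplicand, negated for negative digits."""
    magnitude = abs(digit)
    if magnitude == 0:
        value = 0
    elif magnitude == 1:
        value = multiplicand
    elif magnitude == 2:
        value = multiplicand << 1   # simple shift of 1X
    elif magnitude == 3:
        value = three_x             # the only multiple that needs an adder
    else:
        value = multiplicand << 2   # simple shift of 1X
    return -value if digit < 0 else value


def booth_multiply(multiplicand, multiplier, nbits):
    """Sum the selected multiples, each weighted by 8**i for digit i."""
    three_x = multiplicand + (multiplicand << 1)
    product = 0
    for i, digit in enumerate(booth_radix8_digits(multiplier, nbits)):
        product += select_multiple(multiplicand, three_x, digit) << (3 * i)
    return product


if __name__ == "__main__":
    print(booth_radix8_digits(0x123456, 24))
    print(booth_multiply(0xABCDEF, 0x123456, 24) == 0xABCDEF * 0x123456)
```

In this sketch a 24-bit multiplier recodes into nine digits, so nine summands are enough to form the product.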
The initial partial product is latched in the MIPPR, and the two summand inputs to the MROW1 and the MROW2 are latched in the MRW1R and the MRW2R, respectively. The vector used in stage 2 for two's complementing the selected summands is latched in the MTCR.

Stage 2 selects all the summands which are needed to form the product, with the exception of the summands provided by stage 1. Stage 2 performs carry-save addition on the summands and outputs a carry and a sum vector to stage 3 for the formation of the final product. The multiply hardware in stage 2 can be thought of as a 9 row, 3 bit retirement, carry-save multiply array which is capable of feeding its outputs back to its inputs for executing MULD/G and MULL instructions. Each multiply array cell is composed of a selector which selects a summand and a carry-save adder which adds the summand to the partial product. The first two physical rows of the array are called MROW1 and MROW2, and are different from the other 7 rows in that they have no selector. The selected summands for MROW1 and MROW2 are passed from stage 1 in the MRW1R and MRW2R. During the first execute cycle of a MUL instruction, MROW1 adds the following three inputs from stage 1: the MIPPR output, the MRW1R output, and the MTCR output. During the second execute cycle, MROW1 adds the MRW1R output and the fed back MARRAY sum and carry outputs.

Stage 3 does the carry propagate addition on the carry and sum vectors passed from stage 2 to form the final product and normalizes the product if necessary. Note that a left shift of 1 bit position is the maximum normalization possible. Actually two separate carry propagate additions are performed in stage 3. A 60 bit carry propagate addition is performed to form the fraction portion of floating point products and the high order 58 bits of integer products.
A separate 6 bit carry propagate addition is performed to form the 6 least significant bits of integer products. The carry out generated from the 6 bit addition is accounted for in the 60 bit addition so the 6 bit sum can be concatenated to the high order 58 bits. Stage 3 passes the results of both additions to stage 4.

Stage 4 performs all the terminal operations (rounding, etc.) on the final product (except when stage 3 bypasses stage 4 operations) before passing the product to the interface section. Stage 4 handles detection of floating underflow, floating overflow, integer overflow, and the proper alignment of the product.

11.19 Stage 2 Implementation Description

11.19.1 Fraction Datapath

Figure 11-18: Stage 2 Fraction Datapath Block Diagram
[Block diagram not recoverable from the scan. Legible inputs: MTCR_L, MIPPR_L, MRW1R_L, MRW2R_L, MRECR, FD1R, FD2R. Legible blocks and signals include: MARRAY, MILSBCR, MILSBSR, MCR_L, MSR_L, RSHIFT, RSHFTOR, SDECO, L1DETLO, LSSEL, LSENC, LSHR.]

11.19.2 MSEL - Multiplier Selector

The MSEL is composed of two 62 bit, 2 to 1 selectors. One selector selects the carry input to the MROW1, the other selects the sum input. The two possible carry inputs to the MROW1 are the F_1%MTCR_L and zero in some bit positions, or the MCR fed back from the bottom of the MARRAY. The two possible sum inputs to MROW1 are the F_1%MIPPR_L or the MSR fed back from the bottom of the MARRAY. If the signal MSEL_PASS_FB_H is asserted, then the MCR and MSR outputs are passed to the MSEL outputs, MCSELO and MSSELO, respectively. Otherwise the F_1%MTCR_L and the F_1%MIPPR_L are passed to MCSELO and MSSELO, respectively. The MSEL_PASS_FB_H signal is asserted during the second execute cycle of MULD/G and MULL. MCSELO<A2:B58> and MSSELO<A2:B58> are driven to the MROW1.

11.19.3 MROW1 - Multiplier Row 1

The MROW1 is composed of a row of 59 CS adders and a 3 bit carry propagate adder. The MROW1 is actually the first physical row of the multiplier array but since the summand selection is performed in stage 1, the MROW1 has no summand selector. The CS adders perform a carry-save addition on MRW1R_L<A2:B55> (the summand), MCSELO<A2:B55> and MSSELO<A2:B55>. The 3 bit carry propagate adder adds MCSELO<B56:B58> and MSSELO<B56:B58> and is needed to maintain a correct partial product.
The 3 bit carry propagate adder ensures that if the bits of the C and the S vector which are shifted out of the array cause a carry of bit significance B55, that carry is correctly added to the partial product. The MROW1 generates a carry output, MRW1_C, and a sum output, MRW1_S, which are input to the MROW2.

11.19.4 MROW2 - Multiplier Row 2

The MROW2 is composed of a row of 59 CS adders and a 3 bit carry propagate adder. The MROW2 is the second physical row of the multiplier array, and like the MROW1, it has no summand selector. The summand selection for the MROW2 is done in stage 1. The CS adders perform a carry-save addition on MRW2R_L<A2:B55> (the summand), and sign extended MRW1_C<A2:B53> and MRW1_S<A2:B52>. The 3 bit carry propagate adder adds MRW1_C<B54:B55>, MRW1_S<B53:B55>, and the carry out of the 3 bit carry propagate adder in the MROW1, MRW1_C<B56>. This 3 bit carry propagate adder is needed to maintain a correct partial product. The 3 bit carry propagate adder ensures that if the bits of the C and the S vector which are shifted out of the array cause a carry of bit significance B55, that carry is correctly added to the partial product. The MROW2 generates a carry output, MRW2_C, and a sum output, MRW2_S, which are input to the first row of the MARRAY.

11.19.5 MARRAY - Multiplier Array

The MARRAY is a 3 bit retirement per row multiplier array which has 7 rows of multiplier cells. The MROW1, MROW2, and the MARRAY are used together to generate a carry and a sum vector which are added in stage 3 to produce the final product. The inputs to the MARRAY are F_1%FD1R, F_1%FD2R, MRECR, MRW2_C, and MRW2_S. The F_1%FD2R and F_1%FD1R contain 1*multiplicand and 3*multiplicand respectively for MUL instructions. The MRECR contains the recoded multiplier bits. The MRW2_C and MRW2_S signals are the carry and sum outputs of the MROW2.
Each multiplier cell is composed of a selector and a CS adder. The selector selects the summand input and the CS adder adds the summand to the partial product. The MRECR[0:6]<5:0> control the summand selectors. The selector inputs are F_1%FD2R, F_1%FD2R left shifted by 1 bit position, F_1%FD1R, F_1%FD2R left shifted by 2 bit positions, or zero. The selector can generate the ones complement of any of the previously mentioned inputs for generating negative summands. The ones complement of zero is never generated. The MARRAY selector outputs are unconditionally latched in PHI_4.

The least significant bit positions <B56:B58> in MARRAY, as in the MROW1 and MROW2, are populated by three bit carry propagate adder cells which are used to calculate carries which have the weight of the <B55> bit position. The carry and sum outputs from the second row of the MARRAY, MA_C[1] and MA_S[1], are latched unconditionally in PHI_4. The least significant 5 carry and 6 sum outputs of the MARRAY cells in the fifth and sixth rows of the MARRAY are latched in the MILSBCR and MILSBSR. The MILSBCR and the MILSBSR are used in stage 3 to form the 6 least significant bits of longword products. The carry and sum outputs from the last row of the MARRAY, MA_C[6] and MA_S[6], are latched unconditionally in PHI_2. The latched versions of MA_C[6] and MA_S[6] are the MCR and MSR signals and are driven to the MSEL and stage 3.

11.19.6 MILSBSR<5:0> - Multiplier Integer LSB Sum Register

The MILSBSR is a 6 bit register. This register holds a 6 bit sum vector which is used to form the least significant 6 bits of the 64 bit product of longword operands. MILSBSR<5:3> are written with MA_S[5]<B53:B55>, and MILSBSR<2:0> are written with MA_S[4]<B53:B55> unconditionally in PHI_2. The contents of this register are undefined for instructions other than MULL. The MILSBSR is driven to stage 3.
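The carry-save addition performed by the CS adder rows of the MROW1, MROW2 and MARRAY can be illustrated with a small behavioral sketch in Python. This is only a bit-vector model of the arithmetic identity the rows rely on, not a description of the chip's cells: the function name csa is hypothetical, and sign extension, the 3 bit retirement per row and the 3 bit carry propagate adders are deliberately left out.

```python
from typing import Tuple


def csa(summand: int, carry_in: int, sum_in: int) -> Tuple[int, int]:
    """Carry-save add three non-negative bit vectors.

    Returns (sum_vector, carry_vector) with the property
    sum_vector + carry_vector == summand + carry_in + sum_in.
    No carry ripples between bit positions; each position is an
    independent full adder (hypothetical model, not the chip's cell).
    """
    # Per-bit sum: XOR of the three inputs.
    sum_vector = summand ^ carry_in ^ sum_in
    # Per-bit carry: majority of the three inputs, weighted one position up.
    carry_vector = ((summand & carry_in) |
                    (summand & sum_in) |
                    (carry_in & sum_in)) << 1
    return sum_vector, carry_vector
```

The two returned vectors play the role of the sum and carry vectors that are passed from row to row; a single carry-propagate addition of the two is deferred until the end, which the text places in stage 3.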
11.19.7 MILSBCR<4:0> - Multiplier Integer LSB Carry Register

The MILSBCR is a 5 bit register. This register holds a 5 bit carry vector which is used to form the least significant 6 bits of the 64 bit product of longword operands. MILSBCR<4:3> are written with MA_C[5]<B54:B55>, and MILSBCR<2:0> are written with MA_C[4]<B54:B56> unconditionally in PHI_2. The contents of this register are undefined for instructions other than MULL. The MILSBCR output is driven to stage 3.

11.19.8 RSHIFT - Right Shifter

The RSHIFT shifts F_1%FD1R to the right by 0 to 57 bit positions depending on the control signal FORCE_SHFT_0 and the output of the shift decoder, SDECO. The RSHIFT is used for pre-aligning operands in ADD and SUB instructions under certain conditions (for details see the description of the stage 2 control) and right shifting the fraction of a floating point operand in CVTFI/CVTRFI instructions. If the signal FORCE_SHFT_0 is asserted, the RSHIFT will pass F_1%FD1R<A0:B58> to its output RSHFTO<A0:B58> unshifted. If FORCE_SHFT_0 is deasserted, F_1%FD1R is passed to RSHFTO right shifted by 0 to 57 bit positions, depending on the state of SDECO<57:0>, which has exactly 1 bit asserted. F_1%FD1R<A0> is always passed to the RSHFTO<A0> output. The RSHFTO<B0:B57> bits which become vacant due to the right shift of F_1%FD1R<B0:B58> are zero filled. The RSHIFT output, RSHFTO, is driven to the RSHFTOR.

11.19.9 RSHFTOR<A0:B58> - Right Shifter Output Register

The RSHFTOR is a 60 bit register which is written with RSHFTO<A0:B58> unconditionally in PHI_4. The RSHFTOR is driven to the FD1SEL.

11.19.10 SDEC - Shift Decoders

The SDEC decodes the F_1%E_DIFFR_H<5:0> from the stage 1 exponent datapath to a 58 bit output, SDECO<57:0>, which has exactly one bit asserted.
The SDECO is the fully decoded right or left shift amount which is used to control the RSHIFT or the normalizer in stage 3, for ADD, SUB and CVTFI instructions under certain conditions (for details see the description of the stage 2 control). The assertion of SDECO<57> corresponds to a shift of zero. The assertion of SDECO<56> corresponds to a shift of 1, and so on. The SDEC output is driven to the RSHIFT, the SDECOR and the DETL.

11.19.11 SDECOR<57:0> - Shift Decoder Output Register

The SDECOR is a 58 bit register which is written with SDECO<57:0> unconditionally in PHI_4. The SDECOR is driven to the LSSEL.

11.19.12 DETL - Detection Logic

The DETL detects if F_1%FD1R is equal to zero, generates outputs which are used by the leading one detection logic, L1DETL, and calculates the sticky bit for the adder in stage 3. The sticky bit is needed for ADD and SUB instructions under certain conditions (for details see the description of the stage 2 control). If the control signal DETL_EN_STKY_L is asserted, the DETL takes F_1%FD1R and SDECO as its inputs and calculates the sticky bit, STKYR. The sticky bit is set if a one in the F_1%FD1R is right shifted out of the B58 bit position by the RSHIFT. If F_2%SET_STKYR_H is asserted, STKYR is set independent of the DETL inputs. The STKY latch is written unconditionally in PHI_2 and is driven to stage 3.

If DETL_EN_STKY_L is deasserted, the DETL takes only the F_1%FD1R as its input and it generates outputs which are used by the L1DETL. The DETL has two outputs, FZ and DETLO<65:0>. The FZ is the zero detection output and is driven to the stage 2 control block. The DETLO<65:0> outputs are driven to the DETLOR. FZ is asserted if F_1%FD1R<B0:B57> are all zeros. The FZ output is conditionally loaded in PHI_2 in the stage 2 control.
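The interaction of the shift decoder, the right shifter and the sticky bit calculation described in 11.19.8 through 11.19.12 can be sketched as a behavioral model. The Python below is an illustration under stated assumptions, not the chip's logic: the function names are hypothetical, the B0:B58 field is held in an integer whose least significant bit stands for B58, the A0 bit (which the text says is passed through unchanged) is omitted, and shift amounts above 57 are rejected because the text does not say how the decoder treats them.

```python
def sdec(shift: int) -> int:
    """One-hot decode of a shift amount into a 58 bit vector.

    Bit 57 asserted means a shift of 0, bit 56 a shift of 1, and so on,
    following the SDECO<57:0> convention given in the text.
    """
    if not 0 <= shift <= 57:
        # Assumption: behaviour outside 0..57 is not described in the source.
        raise ValueError("shift amount outside the 0..57 range")
    return 1 << (57 - shift)


def shift_amount(sdeco: int) -> int:
    """Recover the shift amount from a one-hot SDECO value."""
    if sdeco <= 0 or sdeco & (sdeco - 1) or sdeco >= (1 << 58):
        raise ValueError("SDECO must have exactly one of bits 57..0 asserted")
    return 57 - (sdeco.bit_length() - 1)


def rshift(fraction: int, sdeco: int, force_shft_0: bool = False) -> int:
    """Right shift the B0:B58 field with zero fill, or pass it unshifted."""
    if force_shft_0:
        return fraction
    return fraction >> shift_amount(sdeco)


def sticky(fraction: int, sdeco: int) -> bool:
    """True if any one bit is shifted out past the B58 position."""
    dropped_mask = (1 << shift_amount(sdeco)) - 1
    return (fraction & dropped_mask) != 0
```

For example, shifting the low-order pattern 1011 right by two positions leaves 10 and sets the sticky bit, because the two dropped bits are not both zero. The enable (DETL_EN_STKY_L) and force (F_2%SET_STKYR_H) controls are not modeled here.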
11.19.13 DETLOR<B0:B57> - Detection Logic Output Register

The DETLOR is a 58 bit register which is written with DETLO unconditionally in PHI_4. The DETLOR is driven to the L1DETL.

11.19.14 L1DETL - Leading 1 Detection Logic

The L1DETL is used to determine the bit position of the leading or most significant bit of the F_1%FD1R<B0:B57>. If F_1%FD1R<A0> is a 1 then leading 1 detection is performed on the ones complement of F_1%FD1R<B0:B57>, otherwise leading 1 detection is performed on F_1%FD1R<B0:B57>. The L1DETL output, L1DETLO, is 58 bits wide and has exactly one bit asserted. The L1DETLO output determines the shift required to normalize (the normalizer is in stage 3) the F_1%FD1R in CVTIF and, under certain conditions, ADD and SUB instructions (for details see the description of the stage 2 control). If L1DETLO<B0> is set, the left shift amount is zero. If L1DETLO<B1> is set, the left shift amount is one, and so on. If the signal E1Z_E2Z is asserted, L1DETLO<B0> is set independent of the DETLOR outputs. If E1Z_E2Z is deasserted, the L1DETL outputs depend on the DETLOR outputs. The L1DETLO is driven to the left shift selector LSSEL.

11.19.15 LSSEL - Left Shift Selector

The LSSEL is a 58 bit 2 to 1 selector which selects between L1DETLO<B0:B57> and SDECOR<57:0>. If the signal LSSEL_PASS_SDECOR_H is asserted, then SDECOR<57:0> is passed to LSSELO<B0:B57>. Otherwise L1DETLO<B0:B57> is passed to LSSELO<B0:B57>. LSSEL_PASS_SDECOR_H is asserted if a CVTFI instruction is decoded, or if an effective subtraction and an exponent difference greater than 1 are detected. LSSELO is driven to the LSHR.

11.19.16 LSENC - Left Shift Encoder

The LSENC does a binary encoding of the LSSELO<B0:B57> and drives the encoded signal, LSENCO_DYN<5:0>, to the ED2R.
The LSENCO_DYN signal is used in CVTIF and, under certain conditions, ADD and SUB (for details see the description of the stage 2 control). LSENCO_DYN is used to form the result exponent in CVTIF. LSENCO_DYN is used to correct the result exponent due to normalizing the result in ADD and SUB. LSENCO_DYN<5:0> is driven to the ED2R.

11.19.17 LSHR<57:0> - Left Shifter Control Register

The LSHR is a 58 bit register which is written with LSSELO<B0:B57> unconditionally in PHI_2. The contents of this register determine the number of bit positions the normalizer in stage 3 will shift its input data. Exactly one bit of LSHR<57:0> is asserted. The LSHR output is driven to stage 3.

11.19.18 FD1SEL - Fraction Data 1 Selector

The FD1SEL is a 60 bit 2 to 1 selector that selects the input to the stage 2 FD1R. If FD1_SEL_PASS_0_L is asserted, zero is passed to the FD1SEL output, FD1SELO. If FD1_SEL_PASS_0_L is deasserted, then the RSHFTOR is passed to FD1SELO. The FD1SEL output is driven to the FD1R.

11.19.19 FD1R<A0:B58> - Stage 2 Fraction Data 1 Register

The FD1R is a 60 bit register which is written with FD1SELO unconditionally in PHI_2. The contents of this register for all instruction flows are given in the description of the stage 2 control. The FD1R output is driven to stage 3.

11.19.20 FD2R<A0:B58> - Stage 2 Fraction Data 2 Register

The FD2R is a 60 bit master/slave register. The master register is written with F_1%FD2R unconditionally in PHI_4, and the slave register is written with the output of the master unconditionally in PHI_2. The contents of this register for all instruction flows are given in the description of the stage 2 control. The FD2R output is driven to stage 3.
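The leading 1 detection, left shift selection and encoding described in 11.19.14 through 11.19.16 can be modeled in a few lines. The following Python sketch is an assumption-laden illustration rather than the chip's implementation: the function names are hypothetical, the B0:B57 field is stored in an integer with B0 as integer bit 57 (so that the one-hot vectors line up with SDECOR<57:0>, where bit 57 means a shift of zero), and an all-zero detection input raises an error because the text does not describe that case.

```python
WIDTH = 58                  # B0 (most significant) .. B57
MASK = (1 << WIDTH) - 1


def l1detl(fraction: int, a0: int = 0, e1z_e2z: bool = False) -> int:
    """Return a one-hot vector marking the leading 1 of the B0:B57 field."""
    if e1z_e2z:
        return 1 << (WIDTH - 1)          # force the B0 position (shift of zero)
    if a0:
        fraction = ~fraction & MASK      # detect on the ones complement
    fraction &= MASK
    if fraction == 0:
        # Assumption: the all-zero case is not specified in the text.
        raise ValueError("no leading 1 in the detection input")
    return 1 << (fraction.bit_length() - 1)


def lssel(l1detlo: int, sdecor: int, pass_sdecor: bool) -> int:
    """2 to 1 selection between the leading 1 vector and the shift decoder."""
    return sdecor if pass_sdecor else l1detlo


def lsenc(lsselo: int) -> int:
    """Binary encode a one-hot LSSELO into a 6 bit left shift amount."""
    if lsselo <= 0 or lsselo & (lsselo - 1) or lsselo > MASK:
        raise ValueError("LSSELO must have exactly one bit asserted")
    return (WIDTH - 1) - (lsselo.bit_length() - 1)
```

With this convention a field whose leading 1 sits seven positions below B0 yields an encoded shift of 7, which is the amount the stage 3 normalizer would need to shift left and the amount by which the result exponent is corrected.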
11.19.21 Exponent Datapath

Figure 11-19: Stage 2 Exponent Datapath Block Diagram

(Block diagram not recoverable from the scan. Blocks legible in the residue: zero detection; input selector for the exponent adder; exponent adder E_AD1; overflow and underflow detection logic; output select and PHI latches; ED2R.)
11.19.22 Zero Detection

This functional block is not used in stage 2.

11.19.23 Exponent Adder 1

The exponent adder is a 13-bit static adder used to add or subtract two inputs. Each input is passed through a 2 to 1 selector and inversion logic prior to the adder. INP_1A can be selected from ED1R or K. If ISEL1_ED1R_A is asserted, then ED1R is passed through the selector. If ISEL1_K_A is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EA_AD1. INP_1B can be selected from ED2R or K. If ISEL1_ED2R_B is asserted, then ED2R is passed through the selector. If ISEL1_K_B is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EB_AD1. The adder also contains a carry-in to the LSB cell, CIN_E_AD1_H. The carry-in is primarily used for performing subtraction operations. Since the adder is static, it begins its operation when the input data is valid near the falling edge of phase 1. Intermediate results in the exponent adder are latched in phase 3 and sent to the detection logic and output selector.

For stage 2, INP_1A always selects ED1R, not inverted. INP_1B always selects K. Inversion of INP_1B is done based on the assertion of CIN_E_AD1. In other words, in stage 2 INVERT_EB_AD1 is shorted to and named CIN_E_AD1.

11.19.24 Floating Overflow and Underflow Detection

This functional block is not used in stage 2.

11.19.25 Output Selector

The output selector is used to select the output data from three different sources: ED1R, E_AD1 or zero.
This selection is done for the exponent output data (ED1R), the floating overflow (F_OVFR) and the floating underflow (F_UNFR). The selection is based on the assertion of two control signals, OSEL1_ZERO and OSEL1_E_AD1. If OSEL1_E_AD1 is asserted, the output is selected from E_AD1; for overflow and underflow, OSEL1_E_AD1 selects E_AD1_UNF and E_AD1_OVF. If OSEL1_E_AD1 is deasserted, then the output is selected from ED1R; for overflow and underflow, OSEL1_E_AD1 deasserted selects ED1R_OVF and ED1R_UNF. This selection is done using a 2 to 1 selector.

The selection of zero is done prior to the 2 to 1 selector described above. If OSEL1_ZERO is asserted, then the inputs from E_AD1 and ED1R entering the 2 to 1 selector are both forced to zero. Then, since only one select line is used to control the selector, the zero value will be transferred to the output regardless of the assertion of OSEL1_E_AD1. The output of the selector is latched every phase 1 and driven into the following stage.

11.19.26 ED2R<5:0> - Exponent Data 2 Register

The ED2R is a 6 bit register which is written with LSENCO_DYN<5:0> unconditionally in PHI_2. The ED2R output is driven to the stage 3 exponent datapath.

11.19.27 Sign Datapath

Figure 11-20: Sign Datapath Block Diagram

(Block diagram not recoverable from the scan. Labels legible in the residue: inputs F_1%S1R_H and F_1%S2R_H, a path to the stage 2 control, and latches clocked in PHI_4 and PHI_2.)

The 2 inputs to the stage 2 sign datapath are F_1%S1R_H and F_1%S2R_H. These bits correspond to the sign of operand 1 and the sign of operand 2, respectively. The stage 2 datapath does not perform any operations on the sign bits. S1R and S2R are 1 bit master-slave registers.
The master latches are written unconditionally in PHI_4 and the slave latches are written with the master latch outputs unconditionally in PHI_2. The register outputs F_2%S1R_H and F_2%S2R_H are driven to stage 3.

11.19.28 Control

Figure 11-21: Control Block Diagram

(Block diagram not recoverable from the scan. Labels legible in the residue: inputs from stage 1 and from the interface, including DATA_VALIDR, SRC_DTR<2>, DST_DTR, FOP_FLOWR, E1ZR, E2ZR, the E_DIFF comparison bits and ABORT; a stage 2 control block; control outputs to the stage 2 datapaths such as CIN_E_AD1, the OSEL1 selects, DETL_EN_STKY_L, FORCE_SHFT_0, E1Z_E2Z, LSSEL_PASS_SDECOR and FD1SEL_PASS_0_L; the FZ input from the DETL; and registered outputs to stage 3.)
The stage 2 control block generates control signals for the stage 2 fraction and exponent datapaths based on opcode and control information passed from stage 1. The control block decodes the datapath control signals one cycle prior to the cycle in which they are needed in the datapaths. The control signals are latched in master slave latches to allow control decoding to overlap with datapath execution and to prevent races. The control block also loads control information output from stage 1 into master slave registers and passes the information to stage 3. The master slave latches which hold the SRC_DTR<2>, FOP_FLOWR<5:0>, DATA_VALIDR, and DST_DTR<2:0> signals are written in PHI_1 (master strobe) and PHI_3 (slave strobe). All other master slave latches are written unconditionally in PHI_4 and PHI_2.

If the interface section asserts F_I%ABORT_H, the signals F_2%LAT_MUL2_H and F_2%DATA_VALIDR_H are deasserted.

The internal signal F_2%LAT_MUL2_H is used to facilitate the sequencing of the fraction multiplier and stalling of the DATA_VALIDR bit transfer for MULD/G and MULL instructions. F_2%LAT_MUL2_H is asserted during the second decode cycle of MULD, MULG, and MULL instructions. If F_2%LAT_MUL2_H is asserted, the DATA_VALIDR bit will not be passed to stage 3, and the multiplier will select fed-back outputs as its inputs. F_2%LAT_MUL2_H is unconditionally deasserted one cycle after it is asserted.
The stage 2 control block also modifies one bit of the internal opcode encoding, F_1%FOP_FLOWR_H<1>, before passing it to the stage 3 control if the conditions effective sub and exponent difference greater than 1 are detected. It also contains the FZR bit latch and some logic to conditionally clear the latch. If an effective subtraction is decoded, and E1ZR XOR E2ZR is true, the FZR bit latch will be cleared. If this condition is not true, the FZR bit latch will be loaded with the FZ output of the DETL in the fraction datapath.

11.19.28.1 Datapath Control Signals Output from Control Block

CIN_E_AD1_H: This is the carry-in to the LSB position of the exponent datapath adder, E_AD1. This signal also controls the ones complementing of the exponent bias. If asserted, the ones complement of the exponent bias is passed to the EB_AD1 output of the exponent complement logic. If deasserted, the true exponent bias is passed to the EB_AD1 output unchanged. F_2_C%CIN_E_AD1_H is asserted if a MULf instruction is decoded by the stage 2 control.

F_2_C%DETL_EN_STKY_L: This enables the DETL to detect conditions for setting the sticky bit which is used by the stage 3 fraction adder. This signal is asserted if an effective subtraction is decoded and the exponent difference between the operands is greater than one.

F_2_C%E_K_H<7>: This signal is an exponent bias which is driven to the INP_1B input of the exponent complement logic. This signal is the complement of F_2_C%E_K_H<10>.

F_2_C%E_K_H<10>: This signal is an exponent bias which is driven to the INP_1B input of the exponent complement logic. This signal is asserted if the F_1_C%DST_DTR_H<2:0> decodes to G datatype.

F_2_C%E1Z_E2Z_H: If this signal is asserted, the L1DETLO<B0> bit will be set (which indicates the contents of the F_1%FD1R is a normalized number) independent of the other inputs to the L1DETL. This signal is asserted if (F_1%E1ZR OR F_1%E2ZR) AND (effective sub) is detected.
F_2_C%FD1SEL_PASS_0_L: If this signal is asserted then the FD1SEL will pass zeros to its outputs and the stage 2 FD1R will load in all zeros. This signal is asserted if F_1_E%E_DIFF_GTR_57R_H is asserted, and ADDf or SUBf or CVTfi or CVTRfi is decoded.

F_2_C%LSSEL_PASS_SDECOR_H: If this signal is asserted, F_2_P%SDECOR_H is passed to the LSSEL output, F_2%LSSELO_H. If deasserted, F_2_L%L1DETLO_H is passed to F_2_L%LSSELO_H. F_2_C%LSSEL_PASS_SDECOR_H is asserted if a CVTfi or CVTRfi instruction is decoded, or if an effective subtraction and an exponent difference greater than 1 are detected.

F_2_C%MSEL_PASS_FB_H: If this signal is asserted, F_2_M%MCR_L is passed to the MSEL carry output, MCSELO, and the F_2_M%MSR_L is passed to the MSEL sum output. If deasserted, the F_1%MTCR_L is passed to MCSELO, with zeros in the vacant bit positions, and the F_1%MIPPR_L is passed to MSSELO. F_2_C%MSEL_PASS_FB_H is asserted if the internal signal F_2_C%LAT_MUL2_H is asserted.

F_2_C%OSEL1_E_AD1_H / F_2_C%OSEL1_ED1R_H: The OSEL1_E_AD1_H and the OSEL1_ED1R_H signals are complementary signals. If OSEL1_E_AD1_H is asserted, the exponent output selector, OSEL1, passes E_AD1, E_AD1_OVF and E_AD1_UNF to its outputs. If OSEL1_ED1R_H is asserted, ED1R, ED1R_OVF and ED1R_UNF are passed to the OSEL1 outputs. OSEL1_E_AD1_H is asserted if a MUL or DIV is decoded by the stage 2 control.

F_2_C%OSEL1_ZEROS_L: If this signal is asserted, the exponent output selector, OSEL1, passes zero to its output. If deasserted, zero is not passed to the OSEL1 outputs. This signal is asserted if a MUL or a DIV is decoded, and F_1_E%E1ZR_H or F_1_E%E2ZR_H is asserted.

F_2_C%SET_STKYR_H: If this signal is asserted, the STKYR is forced to 1, independent of the state of the F_1%FD1R and the SDECOR. If deasserted, the state of the STKYR depends on the instruction flow and the data.
F_2_C%SET_STKYR_H is asserted if F_1_E%E_DIFF_GTR_57R_H AND NOT (F_1%E1ZR OR F_1%E2ZR) is true.

F_2_C%FORCE_SHFT_0_H: This signal forces the RSHIFT to pass the F_1%FD1R to its output unshifted. If this signal is deasserted, then the RSHIFT shifts the F_1%FD1R by the number of bit positions decoded by the SDEC. This signal is deasserted if an effective sub is decoded and the F_1_E%E_DIFF_GTR_1R_H is asserted, or if an effective add is decoded, or a CVTfi or a CVTRfi is decoded and F_1_E%E_NR_H is low.

11.19.29 Stage 2 Fraction Datapath Operation Summary

The following tables summarize the operation of the stage 2 fraction and exponent datapaths.

Figure 11-22: Fraction Datapath Operation Summary

(Table body not recoverable from the scan. Condition columns: OPC, EFF A/S, E1Z, E2Z, E_N, EDIFF, M2. Register and output columns: FD1R, FD2R, F_ZR, LSHR, STKYR, MC/SR, MILSBC/SR. Rows cover the ADDf/SUBf, DIVf, MOVf/MNEGf/CVTff, CVTfi/CVTRfi, CVTif, TSTf, MULF, MULD/G and MULL flows.)

Figure 11-23: Stage 2 Exponent Datapath Operation Summary

(Table body not recoverable from the scan. Condition columns: OPC, EFF A/S, E1Z, E2Z, E_N, EDIFF, M2, K. Register columns: ED1R, ED2R.)

Legend:
OPC      Opcode.
EFF A/S  Denotes that the operation is an effective ADD or SUB.
ED       Exponent difference.
UD       Undefined.

11.19.30 Passthru Signals

The MMGT_FLT_L, MEM_ERR_L, RSV_ADR_L and PSL_FU_H signals are simply passed through stage 2 without change. They are latched coming in from stage 1 during PHI_4 and driven to stage 3 during PHI_2.

The NEW_FOP_H signal also passes through to stage 3 unaffected. It is latched during PHI_1 coming from stage 1 and driven to stage 3 during PHI_3. This signal is gated with the global purge signal F_I%PURGE_H from the input interface, which clears it on a PURGE from the input interface. This signal is used by the output interface to manipulate its control-queue and data-queue pointers.

11.20 STAGE 3

11.20.1 Introduction

Stage 3 of the pipeline is used primarily to left shift an input, or to perform the addition of two inputs. Stage 3 contains a control section and portions of the fraction, exponent, and sign datapaths. In addition, stage 3 has the capability to bypass the stage 4 rounding operation for certain instructions. Stage 3 takes virtually all of its inputs from stage 2 of the pipeline, and drives its outputs either to stage 4 or to the output interface directly.

The fraction datapath portion of stage 3 consists of a left shifter, an instantiation of the generic adder and some mini-rounding incrementers. The left shifter is used for convert and effective subtraction-like operations. The adder is used by all other operations either to pass an input to the output (by adding zero), or to add two vectors, for example the two input operands (correctly aligned) for addition/subtraction, or the sum and carry vectors for multiplication. The mini-rounding incrementers are used to round the fraction result during a stage 4 bypass operation. Stage 3 also performs the injection of the sticky bit and increments the quotient, dependent on the sign of the remainder. The output of stage 3 is always normalized, where relevant.
The exponent datapath consists of the generic exponent block. In this stage, the input selector, adder, and output selector are primarily used. For addition, subtraction, multiplication, and division, the adder is used to increment/decrement the input exponent according to whether the fraction addition can overflow/underflow. It also subtracts the left shift amount when the fraction portion performs a left shift.

The sign datapath portion in stage 3 will generate the correct sign for the result during a successful stage 4 bypass. No operation is performed on the sign bit that is sent to stage 4.

Some integer overflow detection logic is included in the control path. Additionally the six LSB's generated for MULL are combined, and a few stage 4 signals are generated.

11.20.2 Stage 4 Bypass

For a specific set of instructions and conditions, stage 3 can supply a result to the output interface directly. This is referred to as a "stage 4 bypass" and improves Fbox latency by supplying a result one full cycle earlier than the stage 4 supplied result. In order to bypass stage 4, stage 3 must perform the required operations that stage 4 would normally perform under the same conditions. This includes rounding the fraction, supplying the correct exponent and generation of the condition codes and status information that is related to the result for floating ADD, SUB and MUL instructions.

Stage 3 performs the rounding operation through the use of incrementers. These incrementers are much smaller in width than the number of fraction bits for a particular data type due to timing constraints. Because of the limited size of the incrementers, not all fraction datums can be correctly rounded by stage 3. (The mini-round succeeds if the selected incrementer for a bypassable instruction does not generate carry out.) If the mini-round fails, the unmodified fraction is driven and the stage 4 bypass is aborted.
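The mini-round success and failure rule can be illustrated with a short behavioral sketch. The Python below is a hedged model, not the chip's circuit: the function name mini_round and the parameters round_bit and inc_width are hypothetical, the incrementer widths and rounding positions for each data type are not given in this section, and the sketch assumes that rounding is done by adding one at the rounding position.

```python
from typing import Tuple


def mini_round(fraction: int, round_bit: int, inc_width: int) -> Tuple[bool, int]:
    """Model a narrow rounding incrementer.

    The incrementer covers inc_width bits of the fraction starting at
    round_bit (both hypothetical parameters). Returns (ok, result):
    ok is False when the incrementer would carry out of its most
    significant bit, in which case the unmodified fraction is returned.
    """
    field_mask = (1 << inc_width) - 1
    field = (fraction >> round_bit) & field_mask
    if field == field_mask:
        # Adding one would carry out of the incrementer: mini-round fails.
        return False, fraction
    # No carry out, so the increment only changes bits inside the field.
    return True, fraction + (1 << round_bit)
```

In this model a failed mini-round leaves the fraction untouched, matching the statement that the unmodified fraction is driven when the bypass is aborted, so that the full-width rounding can still be done in stage 4.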
Stage 3 and stage 4 share common busses to drive results to the output interface. Stage 4 drives the busses, during phi3, if it has valid data. Stage 3 drives the busses, during phi3, if it can successfully bypass an instruction and stage 4 does not have valid data.

The stage 3, stage 4 common busses are listed below:

f_b%f_out_l<bO:bSS>    fraction result bus
f_b%e_out_l<10:0>      exponent result bus
f_b%s_out              sign result bus
f_b%n_out              psl n result bus
f_b%z_out              psl z result bus
f_b%v_out              psl v result bus

11.20.2.1 Stage 4 Bypass Request

When stage 3 has detected that a stage 4 bypass may be possible, it signals the output interface by asserting the signal F_3%S4_BYPASS_REQR_H during phi4. All of the following conditions must be met in order to generate a stage 4 bypass request.

o The signal F_2%DATA_VALIDR_H is asserted, indicating that the data present at the stage 3 input is valid.
o The signal F_3%DATA_VALIDR_H is NOT asserted, indicating that [remainder of this condition illegible in source].
o Neither of the two input operands is a reserved operand.

11.20.2.2 Stage 4 Bypass Abort

In order to abort a stage 4 bypass, the signal F_3%S4_BYPASS_ABORTR_H must be asserted during phi2. Assuming the bypass request was generated, either of the two following conditions aborts a stage 4 bypass.

o Mini-round failure. The selected mini-round incrementer carried out of its most significant bit position.
o Exponent overflow or underflow is detected on either of the two exponent results in stage 3's exponent section, irrespective of the possible 1-bit left or right shift required for the fraction adder result.

11.20.2.3 Stage 3 Response to FBOX Purge

Stage 3 responds to the FBOX purge by clearing the data_valid flag and the new_fop flag from stage 3.
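The sharing rule for the common result busses, together with the request and abort signals of 11.20.2.1 and 11.20.2.2, can be summarized in a short Python sketch. It is a simplified behavioral model: the function name result_bus_driver and its boolean arguments are hypothetical stand-ins for the stage 4 data_valid state and the stage 3 bypass request and abort signals, and phase timing is ignored.

```python
def result_bus_driver(stage4_valid: bool,
                      bypass_request: bool,
                      bypass_abort: bool):
    """Return which stage drives the shared result busses, or None.

    Hypothetical model: stage 4 has priority whenever it holds valid data;
    stage 3 drives only for a requested bypass that was not aborted.
    """
    if stage4_valid:
        return "stage4"
    if bypass_request and not bypass_abort:
        return "stage3"
    return None  # busses stay precharged; nobody drives a result


cases = {
    "stage 4 busy": result_bus_driver(True, True, False),
    "clean bypass": result_bus_driver(False, True, False),
    "aborted bypass": result_bus_driver(False, True, True),
}
```

An aborted bypass leaves the busses undriven in that cycle; the instruction then produces its result from stage 4 one cycle later.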
11.20.3 Section Implementation Description

11.20.3.1 Block Diagrams

Stage 3 is made up of three sections: control, fraction, and exponent. On the following pages, block diagrams of the fraction and exponent datapaths are shown.

Figure 11-24: Stage 3 Fraction Datapath Block Diagram
[Diagram not legible in source. Recoverable labels: selectors, output latch, outputs to stage 4 and the output interface.]

Figure 11-25: Stage 3 Fraction Mini-round Block Diagram
[Diagram not legible in source.]

Figure 11-26: Stage 3 Exponent Datapath Block Diagram
[Diagram not legible in source. Recoverable labels: zero detection, input selector for exponent adder, complement logic for exponent adder, ED1R latches, outputs to stage 4 and the output interface.]

11.20.4 Fraction Datapath

The operations performed in the fraction datapath in this stage are shown in the following table.

Table 11-11: Stage 3 Fraction Datapath Operations

Category  Operation                               Condition
F0        LSHFT_OUT <- FD1R.SHL.[LSHR]            EFF.SUBf, deltaE < 2, neither operand 0; CVTif; CVTfi, left shift; CVTRfL, left shift
F1        LSHFT_OUT <- FD2R.SHL.[LSHR]            EFF.SUBf, deltaE < 2, operand(s) = 0
F2        SUM <- FD2R + FD1R                      EFF.ADDf; CVTff; MOVf; MNEGf; CMPf; TSTf; CVTfi, right shift
F3        SUM <- FD2R + .not.FD1R + .not.STKYR    EFF.SUBf, deltaE > 1
F4        SUM <- FD2R + FD1R + Rnddi              CVTRfL, right shift
F5        SUM <- FD2R + FD1R + .not.F_NR          DIVf
F6        SUM <- .not.MCR + .not.MSR              MULf; MULL

11.20.4.1 Normalizer Input Selection

The data to be left-shifted may be contained in F_2%FD1R_H or F_2%FD2R_H. The normalizer input selector is used to select between these two input registers.

11.20.4.2 Left Shifter

The left shifter is capable of performing zero to fifty-seven bit left shifts. The shift amount is driven on the LSHR lines in decoded form. The output of the left shifter is driven on LSHFT_OUT to the stage 3 output selector. For effective subtraction with an exponent difference equal to zero, the output of the left shifter may be negative. The shift amount is forced to "shift of zero" if stage 3 is in FBOX_Test mode or the chip is reset.
11.20.4.3 Adder Input Selection

The adder is driven with two input vectors: AIN and BIN. AIN can be FD2R or MSR; BIN can be FD1R or MCR. Note that for several operations, either FD1R or FD2R must be zero; the data is contained in the other register. These operations are:

CVTff
MOVf
MNEGf
CMPf
TSTf
CVTfi, right shift
CVTRfL, right shift
DIVf

11.20.4.4 Adder

The adder uses two 61-bit inputs to derive a 62-bit result. The 61-bit inputs have two bits above the binary point and 59 bits below; the 62-bit result has an extra bit above the binary point. In this stage, the most significant bit of each input is not used; neither are the two most significant bits of the output.

The main carry acceleration technique used is carry select. The adder is broken up into nine small groups, with all but the least significant group having duplicate carry chains. These carry chains operate in parallel during the early part of the execute cycle. Propagate and generate logic operates before the carry chains. These parts of the adder are fully static.

During the late part of the execute cycle, the sum logic executes. Just as for the carry logic, there is duplicate sum logic for all groups except the least significant one. In addition, logic to derive the true group carry out signals executes in these phases. These carry out signals are used to select the correct sum values. These parts of the adder are also fully static.

NOTE FOR MULL: The adder in stage 3 adds the 58 MSB's generated by the multiplier array. <B58> of AIN and BIN is forced to zero for multiply operations.

Shift Detection Logic: The most significant group of adder bits, bit positions <A2:B1>, is different from the groups below it. In this group, both the carry and sum logic execute during the early part of the execute cycle. Late in the execute cycle, shift detection logic executes.
If enabled, it examines the sum bits <A0:B1> to determine whether a one bit shift right or left is needed to normalize the result. The possible values of sum bits <A0:B1> are given in the table below for each operation which may yield a non-normalized adder result.

Table 11-12: Possible Values For Sum Bits <A0:B1>

Result #1   Result #2   Result #3   Operation
0.1x        1.xx        0.00        EFF.ADDf
0.1x        0.01        0.00        EFF.SUBf, deltaE>1
0.1x        0.01        0.00        MULf
0.1x        1.xx        0.00        DIVf

If the shift detection logic is disabled, then the signal indicating "no shift needed" will be asserted. This logic is also conditioned with another signal (sel_other) which is used to deassert all of the shift detection signals. Since these shift detection signals are used to drive the output selector for the stage, this feature permits the selection of a stage output other than the shifted or unshifted adder result.

The logic used to control the shifting is as follows:

Det_shr1 detects the case in which the fraction result is 1.xx..xx, and thus the fraction must be shifted right by one bit to be normalized.

Det_pass detects several cases: first, the case in which the fraction result is 0.1xx..xx; second, the case in which the fraction result is zero (0.00xx..xx); last, the case in which shifting is disabled.

Det_shl1 detects the case in which the fraction result is 0.01xx..xx, and the fraction must thus be shifted left by one bit to be normalized.

The detection logic is duplicated, with one copy for each of the two sets of sum bits. This logic is fully static. The correct shift signals are selected dynamically by the true group carry out of the previous group, and driven out of the adder. A signal indicating whether a shift was done is driven to the exponent section, where it is used in selecting the proper exponent output.
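The carry-select scheme and the shift detection rules above can be sketched together in Python. This is a behavioral illustration only. The group widths in GROUP_WIDTHS are a hypothetical split of the 61 input bits into nine groups (the text does not give the real widths), the integer bit numbering (B58 as bit 0, B0 as bit 58, A0 as bit 59) is an assumption made for the model, and all function names are invented for the example.

```python
# Hypothetical split of 61 bits into nine groups, least significant first.
GROUP_WIDTHS = [5] + [7] * 8


def carry_select_add(a, b, cin=0, widths=GROUP_WIDTHS):
    """Add two 61-bit vectors group by group; returns the 62-bit sum."""
    result, carry, shift = 0, cin, 0
    for index, width in enumerate(widths):
        mask = (1 << width) - 1
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        if index == 0:
            total = ga + gb + carry        # lowest group: single carry chain
        else:
            sum_if_0 = ga + gb             # duplicate chain, carry-in of 0
            sum_if_1 = ga + gb + 1         # duplicate chain, carry-in of 1
            total = sum_if_1 if carry else sum_if_0  # true carry selects
        result |= (total & mask) << shift
        carry = total >> width
        shift += width
    return result | (carry << shift)


def top_bits(total):
    """Extract sum bits A0, B0, B1 under the assumed bit numbering."""
    return (total >> 59) & 1, (total >> 58) & 1, (total >> 57) & 1


def shift_detect(a0, b0, b1, shift_en=True, sel_other=False):
    """Return (det_shr1, det_pass, det_shl1) as described in the text."""
    if sel_other:
        return (False, False, False)       # all detection signals deasserted
    if not shift_en:
        return (False, True, False)        # disabled: "no shift needed"
    det_shr1 = a0 == 1                                # 1.xx
    det_shl1 = a0 == 0 and b0 == 0 and b1 == 1        # 0.01
    det_pass = a0 == 0 and (b0 == 1 or b1 == 0)       # 0.1x or 0.00
    return (det_shr1, det_pass, det_shl1)


half = 1 << 58                             # the fraction value 0.100...0
overflowed = shift_detect(*top_bits(carry_select_add(half, half)))
```

Adding 0.1 to 0.1 gives 1.0, so the example asserts only the right-shift signal, which matches the 1.xx entry for EFF.ADDf in Table 11-12.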
Bit Injection Within Adder: The adder performs rounding and two's complementing for all datatypes. The following table shows the bit positions into which injection is done. The bit positions are defined as c(y), meaning the carry in of the yth bit position. This carry in is derived by forcing a carry out to be generated in bit position (y-1). Only Rnddi is used in this stage.

Table 11-13: Bit Injection Within Adder

[Column alignment lost in source; entries are given in source order.]
Type of Injection:   Rndf     Rnddi    Rndg
Bit positions:       c(B24)   c(B56)   c(B53)   c(B55)

The carry in to bit position <B58> is set directly by the stage's control section.

11.20.4.5 Mini-Round Incrementers

These incrementers are used to round the fraction result supplied by either the left shifter or the adder. The incrementer for D and G type is four bits wide, while the incrementer for F type is three bits wide.

11.20.4.6 Output Selector

The output selector is a precharged 1-of-4 selector. It selects either the left shifter output or the adder output (shifted left one bit position, passed unshifted, or shifted right one bit position). Three of the four selector control signals (the three adder output selection signals) are driven from the adder to the output selector; the fourth (the left shifter output selection signal) is driven from the control section.

11.20.4.7 Fraction Datapath Operation Summary (Normal Operating Mode):

Figure 11-27: Fraction Datapath Operation Summary
[Table body not legible in source. Column groups: Conditions (OPC, EFF A/S, E_N, EDIFF, E1Z, E2Z); FDP Inputs (FD1R, FD2R, MCR, MSR, LSHR, STKYR, F_N); FDP Output (FD1R).]

OPC - Opcode
EFF A/S - Effective Addition (A) or Effective Subtraction (S)
SUM - Adder output, shifted left/passed unshifted/shifted right as needed
ED - Exponent Difference
V - Valid data
x - Don't care

11.20.5 Exponent Datapath

The operations performed in the stage 3 exponent datapath adder are shown in the table below. Note that the exponent operation category numbers are unrelated to the fraction operation category numbers.

Table 11-14: Exponent Datapath Operation Summary

Category  Operation done in Adder                             Condition
E0        NOOP                                                CMPf; TSTf
E1        E_AD1 <- ED1R + K + 1          (ER + 0 + 1)         DIVf; EFF.ADDf
E2        E_AD1 <- ED1R + .not.K         (ER - 1)             MULf; MULL; EFF.SUBf, deltaE>1
E3        E_AD1 <- ED1R + .not.ED2R + 1  (ER - NORM)          EFF.SUBf, deltaE<2
E4        E_AD1 <- K + .not.ED2R + 1     (BIAS1 - NORM)       CVTif
E5        E_AD1 <- ED1R + .not.K + 1     (ER - BIAS2)         CVTfi, CVTRfL
E6        E_AD1 <- ED1R + K              (ER + BIAS3)         CVTff, MOVf, MNEGf

11.20.5.1 Constants

Five bits (bits <10>, <7>, and <5:3>) of the exponent constants are driven from the control section into the exponent section. The other eight constant bits are hardwired to ground within the exponent block. The constants needed in stage 3 are:

K0 = 0000000000000 =    0
-1 = 1111111111111 = NOT(K0)
K1 = 0000010100000 =  160   ; CVT{B,W,L}{F,D}
K2 = 0010000100000 = 1056   ; CVT{B,W,L}G
K3 = 0000000011000 =   24   ; CVT{F,D,G}L/CVTR{F,D,G}L
K4 = 0000000101000 =   40   ; CVT{F,D,G}W
K5 = 0000000110000 =   48   ; CVT{F,D,G}B
K6 = 0000010000000 =  128   ; CVT{D,G}F/CVTFD
K7 = 0010000000000 = 1024   ; CVTFG

K1 and K2 are the BIAS1 constants, used in CVTif; K3, K4, and K5 are the BIAS2 constants, used in CVTfi and CVTRfL; K6 and K7 are the BIAS3 constants, used in CVTff, MOVf, and MNEGf.

11.20.5.2 Zero Detection

The zero detectors are not used in stage 3.

11.20.5.3 Exponent Adder 1

The exponent adder is a 13-bit static adder used to add or subtract two inputs. Each input is passed through a 2 to 1 selector and inversion logic prior to the adder.

INP_1A can be selected from ED1R or K. If ISEL1_ED1R_A is asserted, then ED1R is passed through the selector. If ISEL1_K_A is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EA_AD1. INP_1A is never inverted in this stage.

INP_1B can be selected from ED2R or K. If ISEL1_ED2R_B is asserted, then ED2R is passed through the selector. If ISEL1_K_B is asserted, then K is passed through the selector. Inversion of the adder input is then done based on the assertion of INVERT_EB_AD1.

The adder also contains a carry-in to the LSB cell, CIN_E_AD1_H. The carry-in is primarily used for performing subtraction operations.
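The way inversion of the B input and the LSB carry-in combine to give subtraction can be shown with a small Python model of a 13-bit adder. The function name exp_adder, the example exponent value 200, and the list K are illustrative assumptions; the constant values and the operation forms (ED1R + K + 1, ED1R + .not.K, ED1R + .not.K + 1, ED1R + K) are the ones listed in the text above.

```python
WIDTH = 13
MASK = (1 << WIDTH) - 1

# Constants K0..K7 as listed in the text.
K = [0, 160, 1056, 24, 40, 48, 128, 1024]

# Bits <10>, <7> and <5:3>: the only constant bits driven by the control section.
DRIVEN_BITS = (1 << 10) | (1 << 7) | (0b111 << 3)


def exp_adder(inp_1a, inp_1b, invert_b=False, cin=0):
    """13-bit add of INP_1A and the (optionally inverted) INP_1B plus carry-in."""
    if invert_b:
        inp_1b = ~inp_1b & MASK            # one's complement within 13 bits
    return (inp_1a + inp_1b + cin) & MASK


ed1r = 200                                            # arbitrary example exponent
e1 = exp_adder(ed1r, K[0], invert_b=False, cin=1)     # ED1R + 0 + 1  = ER + 1
e2 = exp_adder(ed1r, K[0], invert_b=True, cin=0)      # ED1R + NOT(0) = ER - 1
e5 = exp_adder(ed1r, K[3], invert_b=True, cin=1)      # ED1R - 24     = ER - BIAS2
e6 = exp_adder(ed1r, K[6])                            # ED1R + 128    = ER + BIAS3
```

Inverting an input and adding a carry-in of one is ordinary two's complement subtraction, which is why the subtracting categories use a carry-in of 1, while the decrement in category E2 relies on NOT(0) being -1 and uses a carry-in of 0.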
The table below gives the carry value for each exponent operation category:

Table 11-15: LSB Carry-In Values

Category   Carry In
E0         d
E1         1
E2         0
E3         1
E4         1
E5         1
E6         0

Since the adder is static, it begins its operation when the input data is valid near the falling edge of phase 2. Intermediate results in the exponent adder are valid by the middle part of the execute cycle and sent to the detection logic and output selector.

11.20.5.4 Output Selector

The output selector is used to select the output data from three different sources: ed1r, e_ad1 or zero. This selection is done for the exponent output data (ED1R), the floating overflow (F_OVFR) and the floating underflow (F_UNFR). The selection is based on the assertion of two control signals, OSEL1_ZERO and OSEL1_E_AD1.

OSEL1_E_AD1, if asserted, selects the output from E_AD1; for overflow and underflow, OSEL1_E_AD1 selects E_AD1_UNF and E_AD1_OVF. If OSEL1_E_AD1 is deasserted, then the output is selected from ED1R; for overflow and underflow, OSEL1_E_AD1 deasserted selects ED1R_OVF and ED1R_UNF. This selection is done using a 2 to 1 selector.

The selection of zero is done prior to the 2 to 1 selector described above. If OSEL1_ZERO is asserted, then the inputs from E_AD1 and ED1R entering the 2 to 1 selector are both forced to zero. Then, since only one select line is used to control the selector, the zero value will be transferred to the output regardless of the assertion of OSEL1_E_AD1. The output of the selector is latched every cycle and driven into the following stage.

The selection of the exponent output is shown in the following table.
Table 11-16: Exponent Output Selection

[Row and column alignment lost in source. Column headings: Exponent Operation Category; Select ED1R if:; Force Zeros if:. Categories listed: E0 through E6. Legible selection entries: "always select", "fraction passed unshifted", "fraction shifted". Legible force-zero entries, in source order: EFF.ADDf: (e1zr * e2zr); DIV: (e1zr + e2zr); MUL: (e1zr + e2zr); EFF.SUBf: (e1zr * e2zr); EFF.SUBf, deltaE=0: (f_zr); EFF.SUBf: (e1zr * e2zr); (f_zr); (e1zr).]

As shown in the table above, some selection operations are dependent only on the operation category, while others also depend on whether the fraction adder result needed a one bit normalization. The control section implements the following equation:

[Equation illegible in source.]

GSEL1_E_AD1 and GSEL1_ED1R are generated in the control section, based on the opcode. SHFT_DONE is generated in the adder, based on the value of the adder output. If RESET is asserted, SHFT_DONE selects the exponent output. The overflow and underflow outputs that are selected are never used.

11.20.5.5 Exponent Datapath Operation Summary (Normal Operating Mode):

[Table body not legible in source. Column groups: Conditions (OPC, EFF A/S, EDIFF, E1Z, E2Z, F_Z, SHFT_DONE); EDP Inputs (K, ED1R, ED2R); EDP Output (ED1R).]

11.20.6 Sign Datapath

The operation done in the sign datapath portion of stage 3 is shown in the table below.

Table 11-17: Stage 3 Sign Datapath Operations

Category   Operation                      Condition
S0         f_3%s1r_h <- f_2%s1r_h         always performed
           f_3%s2r_h <- f_2%s2r_h
S1         f_b%s_out_l <- f_3%bp_plsn     performed during stage 4 bypass

11.20.7 Control

The control section generates all the control signals needed for stage 3, based on the opcode and several condition signals, such as E1ZR and F_ZR. It sends the opcode and necessary condition signals to stage 4. In addition, it contains some integer overflow detection logic, a 6-bit adder used in MULL, and logic to generate some control signals needed by stage 4.

The following table shows which categories of operations are performed in the fraction, exponent, and sign datapath portions of stage 3 for each opcode. Each category indicates a unique set of control signals to be driven. The control section generates these combinations of categories.
Table 11-18: Categories of Datapath Operations

                                      Categories of Datapath Operations
Opcode Operation                      Fraction   Exponent   Sign
CVTff, MOVf, MNEGf                    F2         E6         S0
CVTif                                 F0         E4         S0
CVTfi, right shift                    F2         E5         S0
CVTfi, CVTRfL: left shift             F0         E5         S0
CVTRfL, right shift                   F4         E5         S0
DIVf                                  F5         E1         S0
EFF.ADDf                              F2         E1         S0
EFF.SUBf, deltaE<2, opnds <> 0        F0         E3         S0
EFF.SUBf, deltaE<2, opnd(s) = 0       F1         E3         S0
EFF.SUBf, deltaE>1                    F3         E2         S0
CMPf; TSTf                            F2         E0         S0
MULf, MULL                            F6         E2         S0

11.20.7.1 Miscellaneous Control Signals

Most of the stage 3 control signals are generated in the control decoders, but some are generated or conditioned external to the decoders. These signals are described in this section.

11.20.7.2 Data_Valid

The data_valid signal sent to stage 4 is received from stage 2 and is enabled when there is no FBOX flush occurring and a stage 4 bypass is also not occurring. The equation for enabling F_3_C%8_DATA_VALIDR_H is as follows:

NOT f_.1.abort_h AND (f_3%s4_bypass_abortr_h OR NOT (f_3%s4_bypass_enb AND f_3%s4_bypass_reqr_h))

This operation is performed before the end of the execute cycle.

11.20.7.3 Fault Bits and NEW_FOP

There are three fault signals associated with each valid data item that flows through the FBOX pipe. In addition to these three fault signals there is one more signal (new_fop), which indicates that a new FBOX operation is coming through the FBOX pipe. The three fault signals are named F_3%MMGT_FLT_L, F_3%MEM_ERR_L, and F_3%RSV_ADR_L. A stage 4 bypass request cannot be generated if any of the fault lines are asserted. The new_fop signal is cleared out of the FBOX pipe whenever an FBOX purge occurs.

11.20.7.4 Signs_Not_Eql, Fb_Neg4

Stage 3 generates two signals for use in stage 4. These signals are signs_not_eqlr_h and fb_neg4r_h.

The equation for signs_not_eql is:

[Equation missing in source.]

FB_NEG4 is the signal used to negate the B input to the stage 4 fraction adder.
The input is negated if stage 4 needs to perform a two's complement. The equation implemented is:

FB_NEG4 = [equation illegible in source; legible terms: EFFSUB, E_DIFF_EQL_0, S1R, SIGNS_NOT_EQL, F_I%FBOX_BYPASS_H]

11.20.7.5 Integer Overflow Logic

Some of the logic used to detect the integer overflow condition for CVTfi and CVTRfL is located in stage 3. This static logic operates unconditionally, and its outputs are used by stage 4 when needed.

The first function is IOVFL3. It implements the equation

IOVFL3 <-- (DESTDT<WORD> * MNEG~1R) + (DESTDT<LONG> * [term illegible in source])

IOVFL3 detects integer overflow for CVTfi and CVTRfL (no round up), in the case where the hidden bit of the fraction becomes the MSB of the integer, and the sign is negative. In this case, a two's complement must be performed on the integer. If the integer is 100...00, no overflow will occur since the result of the two's complement will be 100...00, a negative number. This happens because in N bits, more negative numbers (one more) can be represented using two's complement than positive numbers. Thus, there is no positive equivalent of the most negative number (100...00). If the integer is not 100...00, overflow will occur since the result of the two's complement will be 0xx...xx, a positive number.

The second function is IOVFL4. It implements the equation

IOVFL4  <-- (IOVFL4A + IOVFL4B) * CVTRfL
IOVFL4A <-- LNEGIR * S2R * E_DIFF_EQL_25R
IOVFL4B <-- S1R * E_DIFF_EQL_24R * CRE'L_RNDR * HNEGLR

IOVFL4 detects integer overflow for CVTRfL, in the case when rounding causes the integer to be incremented. IOVFL4A detects the case where the integer is 011...11, the result should be positive, and a round up occurs. IOVFL4B is used to detect a case not covered by IOVFL3. In general, if the hidden bit of the fraction becomes the MSB of the integer and the sign bit is negative, overflow will occur unless the integer is 100...00. However, for CVTRfL, overflow will also occur for an integer equal to 100...00 if the integer must be rounded up. IOVFL4B covers this case.

IOVFL3 and IOVFL4 are sent to stage 4, which calculates the final integer overflow result.

11.20.7.6 Cin_B58

The carry in to bit position <B58> of the fraction adder is generated outside the control decoders, using control signals generated by the decoders. (See Fraction Datapath Operation Summary.)

CIN_B58 = F_ltF_N    if operation is DIVIDE
CIN_B58 = STKY       if operation is EFFSUB, EDIFF>1
CIN_B58 = 0          otherwise

The decoders generate the signals indicating the operation type.

11.20.7.7 Sel_Other

The sel_other signal is used in the adder and output selector in order to permit selection of the normalizer output as the stage output. For all operations except CVTfi, the value of this signal is determined by the operation. For CVTfi, it is determined by the sign of the exponent difference obtained in stage 1. If the exponent difference is negative, a left shift is performed on the fraction, and stage 3 must select the normalizer output. If the exponent difference is positive, a right shift is performed, and stage 3 selects the adder output. (See Fraction Datapath Operation Summary.) Finally, the normalizer output is always selected in FBOX_Test mode and when the chip is reset. The equation implemented is the following:

([term illegible in source] * F_3%SEL_OTHER_H) + F_I%FBOX_BYPASS_H + [remainder illegible in source]

The control decoders generate the signal F_3_C%SEL_OTHER_H, used for all operations except CVTfi.

11.20.7.8 Left Shifter Input Selection Signals

There are two left shifter input selection signals: F_3%LSHFT_FD1R_H and F_3%LSHFT_FD2R_H. Either F_2%FD1R or F_2%FD2R may hold the input to be left-shifted. (See Fraction Datapath Operation Summary.) F_2%FD2R holds the input if the operation is effective subtraction, with either input equal to zero. For all other operations, F_2%FD1R holds the input to be shifted.
The equations implemented are the following:

LSHFT_FD1R = (F_I%FBOX_BYPASS_H * (EFFSUB * (E1Z + E2Z))) + (F_I%FBOX_BYPASS_H * F_I%S4_BYPASS_ENB_H)
LSHFT_FD2R = (F_I%FBOX_BYPASS_H * (EFFSUB * (E1Z + E2Z))) + (F_I%FBOX_BYPASS_H * F_I%S4_BYPASS_ENB_H)

[Complement bars are not legible in the source, so the two equations read identically here.]

11.20.7.9 Osel1_Zero

This signal is used to force the stage 3 exponent output to zero. The equation implemented is as follows (see description of the output selector in the exponent section):

OSEL1_ZERO = [ (DIV * (E1Z + E2Z)) +
               (MUL * (E1Z + E2Z)) +
               (EFFADD * (E1Z * E2Z)) +
               (EFFSUB * (E1Z * E2Z)) +
               (EFFSUB * E_DIFF_EQL_0 * F_Z) +
               (CVTif * F_Z) +
               (CVTff * E1Z) +
               (MOVf * E1Z) +
               (MNEGf * E1Z) ] * F_I%FBOX_BYPASS_H

[Complement bars, if any, are not legible in the source.]

11.20.7.10 Osel1_Ed1r

This signal is used to select the stage 3 exponent output. If it is asserted, the contents of F_2%ED1R are chosen as the stage output; otherwise, the exponent adder output is chosen as the stage output.

OSEL1_ED1R = 1          if [CMPf + TSTf + F_I%FBOX_BYPASS_H + RESET]
OSEL1_ED1R = SHFT_DONE  if [((EFFSUB * E_DIFF_GTR_1) + EFFADD + MUL + DIV) * F_I%FBOX_BYPASS_H * RESET]

[Partially reconstructed; complement bars and some operators are not legible in the source.]

11.20.7.11 MULL Adder

The multiplier array in stage 2 generates 64-bit sum and carry vectors for MULL. The 58 MSB's are combined in the fraction adder in stage 3. The 6 LSB's (<B58:B63>) of each vector must be added together in the control section of stage 3. The six sum bits generated are sent to stage 4 (as are the MSB sum bits). Any carry out of the six LSB's has been previously incorporated in the MSB's in stage 2.

11.21 STAGE 4

Stage 4 of the pipe is used to do various terminal operations of an instruction. It does a round or a 2's complement on the result of stage 3. The result of stage 4 is the final result, which is sent to the interface section. Stage 4 finds the sign of the final floating result and outputs it to the interface.
Stage 4 also detects the following conditions: integer overflow, floating overflow, floating underflow, zero result, negative result, reserved operand and floating divide by zero. In addition to this, it sets the correct condition codes (PSL.Z and PSL.N). Stage 4 also checks whether the condition for the CMP and TST instructions is met or not. For CMP, the correct condition codes are set. During any CMP instruction, stage 4 forces the fraction and exponent datapath output to zero. When reset is asserted, only one path of the selector will be enabled in the fraction adder selector logic.

11.22 FRACTION DATAPATH

Figure 11-28: Fraction Datapath Block Diagram
[Diagram not legible in source. Recoverable blocks, from input to output: F_3%FD1R<A0:B58> input; zero detection logic / MULL overflow logic (to control); selector; adder (SUM<b48>, SUM<b40>, SUM<b24> to misc logic; DET_SHR1); RESS, RESN and ZERO inputs to the RSELECTOR; FD1R output latch (PHI_2); F_4%FD1R; bus drivers (PHI_3, DATA_VALID, F_UNFR_H); F_B%F_OUT_L to the output interface.]

11.22.1 Fraction Implementation Description

FRACTION DETECTION LOGIC

The detection logic in the fraction datapath is connected directly to the output from stage 3. The F_3%FD1R_H and F_3%MILSBR_H outputs from stage 3 are necessary for the detection logic. The detection logic works unconditionally and no control signals are provided to the logic except clocks. The detection logic detects a zero result for MULL and CVTfi instructions. It also detects overflow for the MULL instruction. Overflow for MULL occurs when the 32 msb's of the 64-bit result are not equal to the sign extension of the low half (32 lsb's).

SELECTOR

The selector drives the selected input into the adder. The selector either selects F_3%FD1R_H unshifted or shifted left by eight bits. It can also negate the selected input.
The control inputs to the selector are SEL_MULL_L, SEL_MULL_H, FB_NEGR_H, and FB_NEGR_L. If SEL_MULL_H is high, the instruction is MULL, and F_3%FD1R_H and F_3%MILSBR_H are selected, shifted left by eight bits. If SEL_MULL_H is low, F_3%FD1R_H is selected without any shifting. If FB_NEGR_H is high, the selected input is complemented. The complement is needed to form a 2's complement when certain conditions are satisfied for EFFSUB and CVTfi instructions.

ADDER
The adder performs the terminal operation on the result: rounding, finding the 2's complement of the result, or adding zero to the input. The last case is used when the input to stage 4 is to be passed as the output of stage 4. The adder also drives the result selection signals. One input (FB) to the adder is F_4_A%BIN_H, and the other input (FA) is always zero. The RND*, CIN_B58, and CINB55_ONE signals are driven to the adder by the stage 4 control.

SHIFT DETECTION LOGIC OF ADDER
If enabled, the adder examines the sum bits <A0:B1> to determine whether a one-bit right shift is needed to normalize the result. The instructions which may require a one-bit right shift are: EFF.ADDf, EFF.SUBf, MULf, DIVf, CVTif, and CVTff. For all these instructions the result from the stage 4 fraction adder can be of the form 0.1XX..., 0.00..., or 1.XX... If the shift detection logic is disabled, the signal indicating "no shift needed" is forced valid. This logic is also conditioned with another signal, which is used to force all of the shift detection signals to their invalid value. Since these shift detection signals drive the output selector for the stage, this feature permits the selection of a stage output other than the shifted or unshifted adder result.
The logic used to control the shifting is as follows:

f_4_a%det_shr1_h = A0 * shift_en * NOT sel_other
f_4_a%det_pass_h = {[(NOT A0 * B0 + NOT A0 * NOT B0 * NOT B1) * shift_en] + NOT shift_en} * NOT sel_other

DET_SHR1 detects the case in which the fraction result is 1.XX...XX, and thus the fraction must be shifted right by one bit to be normalized. DET_PASS detects several cases: first, the case in which the fraction result is 0.1XX...XX; second, the case in which the fraction result is zero (0.00XX...XX); last, the case in which the shifter is disabled.

1-BIT RIGHT SHIFTER
The input to the 1-bit shifter is the adder result. The outputs of the 1-bit shifter are the adder result unshifted (RESN) and the adder result shifted right by one bit (RESS). The 1-bit shifter works unconditionally. The shifter is used to right-shift the result when fraction overflow occurs for floating point instructions. If fraction overflow has occurred, the shifted result is used; otherwise the unshifted result is used.

RSELECTOR
The RSELECTOR selects the final result for an instruction. The output of the selector is latched in PHI_2 and passed to the interface. The inputs to the RSELECTOR are the two outputs from the 1-bit shifter and zero. For the CMP instruction, and for floating destination type instructions whose final result is zero, it selects zero. For all other instructions the selector selects one of the 1-bit right shifter outputs (RESN or RESS).

BUS DRIVERS
The BUS DRIVER section drives the final stage 4 result to the output interface on an active-low precharged bus, F_B%F_OUT_L<B1:B55>. This bus is shared with stage 3, which uses it to bypass stage 4 for certain instructions. The input to the BUS DRIVER section is F_4%FD1R_H<B1:B55>. During PHI_3, if stage 4's data_valid bit is set and the underflow condition is not detected, the inverted value of F_4%FD1R_H<B1:B55> is driven onto the bus.
If underflow is detected then the bus is not driven. This represents a zero being driven to the output interface. The fraction sign bit (SlR)~the PSL.N bit, and the exponent data bits are all driven to the output interface in the same manner. 11.22.2 Fraction Operation The operations performed in the fraction datapath are shown in the table below. Table 11-19: Fraction Datapath Operations Condition ADDER FloatiDg Operation SHIFT_EN EFF_SUB AND FN=l AND DeltaE::O FDlR <- 0 + NOT FB + 1 y EFF_SUB AND FN::O AND DeltaE::O FDIR<-O+FB y EFF_.ADD OR (EFF_SUB AND NOT DelatE::O) FDlR <- 0 + FB + Rnch y MULf FDlR <- 0 + FB + Rnch y DIVf FDlR <- 0 + FB + Rnch y CV'Iif FDlR <- 0 + FB + Rnch y CVTffIMOV FDlR <- 0 + FB + Rnch y MNEG instruction FDlR<-O+FB N CVTfi AND SlR::O FDlR<-O+FB N CVTfi AND SlR=l FDIR<-O + NOTFB + 1 N CMPtrsT and PIPELINED CMP inst. FDlR<-O N MULL FDIR<-O+FB N 11-84 The Fbox DIGITAL CONFIDENTIAL NVAX CPU Chip FunctioDal Specification, Revision 1.0, February 1991 Table 11-20: Fraction Datapath Operation Summary Inputs Conditions ope EDIFF Value ElZ E2Z F_Z FDlR MILSBR Output FDIR E_A X 0 X X V X SUM E_A E_A E_S E_S E_S E_S X X 0 X V X SUM =0 1 1 X X 0 =0 X X 0 X V X SUM =0 X X =0 1 1 X X >0 0 X E_S >0 X 0 1 X X X ~rt~Lf X 0 0 :MULf X 1 ~nJLf X X X 1 X X DrVf X X DIVf X CVTif V X X X X SUl\1 V X SUM X 0 X X X X 0 0 X V X stJM X 1 X X X 0 X X 0 V X SUM CVTif X X X X 1 X X 0 CVTff X X 0 X X X X V X SUM X X 0 0 X X X SUM 1 X X V MOVIN X X X X 0 CVTfi X 0 X X V SUM CVT.fi X 1 X X X X X MULL X X X X V V SUM eMP V V V V X X 0 CVTff MOVIN 1 V 0 0 SUM 0 E A/E S - Eff add/Eff 8ubstarct. MOV/N- - MOV/MNEG instruction 0 - Zero result. X - Don't care V - Valid DIGITAL CONFIDENTIAL The Fbox 11~5 .,.. NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 The "::0" and ">0" under the EDIFF value column for E_A or E_S refers to the exponent difference value being equal to zero or greater than zero respectively. 
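As an illustration of the shift detection and result selection described in 11.22.1, the following Python sketch models the decision on a small fixed-point value. It is a behavioral sketch only, not the circuit. The function name, the integer encoding of the fraction (A0 as the top bit, followed by B0, B1, ...), and the width parameter are assumptions made for the example. The complemented terms follow the prose description of DET_SHR1 and DET_PASS, and the sketch assumes that zero is the output chosen when sel_other is asserted, zero being the only other RSELECTOR input named in the text.

```python
def select_fraction(sum_bits: int, width: int, shift_en: bool, sel_other: bool):
    """Behavioral model of shift detection, the 1-bit shifter, and RSELECTOR.

    sum_bits holds the adder result as an integer whose top bit is A0,
    followed by B0, B1, ... (hypothetical encoding, `width` bits in total).
    Returns (det_shr1, det_pass, selected_value).
    """
    a0 = (sum_bits >> (width - 1)) & 1
    b0 = (sum_bits >> (width - 2)) & 1
    b1 = (sum_bits >> (width - 3)) & 1

    # DET_SHR1: result is 1.XX..., so a one-bit right shift is needed.
    det_shr1 = bool(a0) and shift_en and not sel_other

    # DET_PASS: result is 0.1XX..., or 0.00..., or shift detection is disabled.
    pattern_ok = (not a0 and b0) or (not a0 and not b0 and not b1)
    det_pass = bool(((pattern_ok and shift_en) or not shift_en) and not sel_other)

    resn = sum_bits        # adder result, unshifted
    ress = sum_bits >> 1   # adder result, shifted right by one bit

    if sel_other:
        selected = 0       # assumption: the "other" output is zero
    elif det_shr1:
        selected = ress
    elif det_pass:
        selected = resn
    else:
        # 0.001... is not one of the result forms listed in the text.
        raise ValueError("adder result pattern not covered by the description")
    return det_shr1, det_pass, selected
```

With an 8-bit example, select_fraction(0b10110000, 8, True, False) reports that a right shift is needed and returns the shifted value, while the same input with shift_en deasserted is passed through unshifted.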
11.23 EXPONENT DATAPATH

Figure 11-29: Block Diagram of Exponent Processor
[Block diagram not recoverable from the scan. Legible elements: inputs F_3_E%ED1R and F_3_E%ED2R from stage 3; zero detection; selector for the adder; exponent adder producing E_AD1; PHI4 latches; overflow and underflow detection logic with enables EN_G_TYPE_L and EN_FD_TYPE_L; output select and PHI2 latches controlled by OSEL1_ZERO and SHFT_DONE; F_OVFR_H and F_UNFR_H; bus drivers gated by PHI_3 and DATA_VALID, driving F_B%E_OUT_L<10:0>.]

11.23.1 Exponent Block Description

The exponent block can be used for various functions. In stage 4 it is used to increment the stage 3 exponent result. It is also used to detect the floating underflow and floating overflow conditions on the final result. The zero detector result is used by the CVTfi overflow detection logic. The final exponent result is the stage 3 result, the stage 3 result incremented by one (if there is overflow), or zero. Because the final result is selected near the end of a cycle, floating overflow and underflow are computed for all possible results, and the correct one is chosen along with the result.

11.23.2 Exponent Operation

In the exponent datapath, the stage 3 exponent result is incremented unconditionally for each instruction.
Then, depending on the instruction and the fraction result, the correct exponent is selected. The three possible exponent results are: the stage 3 exponent result, the stage 3 exponent result incremented by one, and zero. For instructions having an integer as the final output, the exponent is a don't care.

11.23.3 Floating Overflow and Underflow Detection

Floating point overflow and underflow are detected on the output of the exponent adder as well as on the exponent data (ED1R). Floating point overflow requires detecting the case in which the exponent is larger than the largest biased exponent: 255 for F and D, and 2047 for G. The overflow is detected as follows, where e<12:0> represents the exponent:

For F and D: overflow = NOT e<12> * (e<11> + e<10> + e<9> + e<8>)
For G:       overflow = NOT e<12> * e<11>

The floating overflow signals, ED1R_OVF and E_AD1_OVF, are asserted only if an overflow is detected and the appropriate enable signal is asserted. The enable signals, en_fd_type_l and en_g_type_l, indicate whether a floating point operation is being performed and what the data type is.

Floating point underflow requires detecting the case in which the exponent is smaller than the minimum exponent. Since the smallest biased exponent is 1 for F, D, and G, the following logic detects underflow:

For F, D, and G: underflow = e<12> + NOR (e<0> to e<12>)
which reduces to: underflow = e<12> + NOR (e<0> to e<11>)

As with overflow, the underflow signals, ED1R_UNF and E_AD1_UNF, are asserted only if an underflow is detected and one of the enable signals is asserted. The overflow and underflow signals are selected as described in the output selector section.

11.23.4 Output Selector

The output selector is used to select the output data from three different sources: ED1R, E_AD1, or zero.
This selection is done for the exponent output data (ED1R), the floating overflow (F_OVFR), and the floating underflow (F_UNFR). The selection is based on the assertion of two control signals, OSEL1_ZERO and SHFT_DONE. SHFT_DONE, if asserted, selects the output from E_AD1; for overflow and underflow, SHFT_DONE selects E_AD1_UNF and E_AD1_OVF. If SHFT_DONE is deasserted, the output is selected from ED1R; for overflow and underflow, SHFT_DONE deasserted selects ED1R_OVF and ED1R_UNF. This selection is done using a 2-to-1 selector. The selection of zero is done prior to the 2-to-1 selector described above.

The selection for the exponent result is done as follows. If the final result is known to be zero, a zero result is selected. The PSL.Z bit (see below under miscellaneous logic) is asserted if the final result is zero, which asserts OSEL1_ZERO. If OSEL1_ZERO is asserted, the inputs from E_AD1 and ED1R entering the 2-to-1 selector are both forced to zero. Then, since only one select line is used to control the selector, the zero value is transferred to the output regardless of the state of SHFT_DONE. The output of the selector is latched during PHI_2 of every cycle and driven to the BUS DRIVER section.

BUS DRIVERS
The BUS DRIVER section drives the final stage 4 result to the output interface on an active-low precharged bus, F_B%E_OUT_L<10:0>. This bus is shared with stage 3, which uses it to bypass stage 4 for certain instructions. The input to the BUS DRIVER section is F_4_E%ED1R_H<10:0>. During PHI_3, if stage 4's data_valid bit is set and the underflow condition is not detected, the inverted value of F_4_E%ED1R_H<10:0> is driven onto the bus. If underflow is detected, the bus is not driven. This represents a zero being driven to the output interface.
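The overflow and underflow checks of 11.23.3 and the selection described in 11.23.4 can be modeled as below. This is a behavioral sketch under stated assumptions: the 13-bit exponent e<12:0> is held in a Python integer, e<12> is treated as a sign bit that must be clear for overflow, the function names are hypothetical, and the two enable signals are reduced to a single data-type argument. How the zero selection interacts with the overflow and underflow flags is not modeled.

```python
def bit(e: int, n: int) -> int:
    """Return bit n of the exponent value e."""
    return (e >> n) & 1


def overflow(e: int, dtype: str) -> bool:
    """Overflow check on e<12:0>; dtype is 'F', 'D', or 'G'."""
    if bit(e, 12):
        return False                       # e<12> set: not an overflow
    if dtype == "G":
        return bool(bit(e, 11))            # exponent exceeds 2047
    return any(bit(e, n) for n in (8, 9, 10, 11))   # exponent exceeds 255


def underflow(e: int) -> bool:
    """Underflow check: e<12> set, or e<11:0> all zero."""
    return bool(bit(e, 12)) or (e & 0xFFF) == 0


def select_flags(ed1r: int, dtype: str, shft_done: bool):
    """Pick the (overflow, underflow) pair that goes with the chosen exponent."""
    e_ad1 = (ed1r + 1) & 0x1FFF            # unconditional increment
    chosen = e_ad1 if shft_done else ed1r
    return overflow(chosen, dtype), underflow(chosen)


def select_exponent(ed1r: int, shft_done: bool, osel1_zero: bool) -> int:
    """Output selector: zero, E_AD1, or ED1R."""
    if osel1_zero:
        return 0                           # both selector inputs forced to zero
    e_ad1 = (ed1r + 1) & 0x1FFF
    return e_ad1 if shft_done else ed1r
```

For an F-type exponent of 255, the unincremented value is in range, but select_flags(255, "F", True) reports overflow because the incremented value 256 sets e<8>.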
Table 11-21: Exponent Datapath Operation Summary

                 Input Conditions                 Output
OPC      EDIFF Value   E1Z   E2Z   F_Z   ED1R     ED1R
E_A      X             0     X     X     V        V
E_A      X             X     0     X     V        V
E_A      =0            1     1     X     X        0
E_S      =0            X     X     0     V        V
E_S      =0            X     X     1     X        0
E_S      =0            1     1     X     X        0
E_S      >0            0     X     X     V        V
E_S      >0            X     0     X     V        V
MULf     X             0     0     X     V        V
MULf     X             1     X     X     X        0
MULf     X             X     1     X     X        0
DIVf     X             X     0     X     V        V
DIVf     X             X     1     X     X        0
CVTif    X             X     X     0     V        V
CVTif    X             X     X     1     X        0
CVTff    X             0     X     X     V        V
CVTff    X             1     X     X     X        0
MOV/N    X             0     X     X     V        V
MOV/N    X             1     X     X     X        0
CVTfi    X             X     X     X     X        X
MULL     X             X     X     X     X        X
CMP      X             X     X     X     X        0

E_A/E_S - Eff add/Eff subtract
MOV/N - MOV/MNEG instruction
X - Don't care
V - Valid

The "=0" and ">0" under the EDIFF value column for E_A or E_S refer to the exponent difference value being equal to zero or greater than zero respectively.

11.24 Control

Figure 11-30: Control Block Diagram
[Block diagram not recoverable from the scan. Legible elements: inputs F_3_C%FOP_FLOWR_H<5:0> and the FBOX bypass signal; a control block with seven outputs; latches on PHI2; output F_4_C%FOP_FLOWR_H<5:0> (to MISC logic of stage 4).]

11.24.1 Control Block Description

The control block supplies all the control signals for the various operations in stage 4 and also sends the control information to the interface, delayed by a cycle. The control block gets its input from stage 3.

11.24.2 Control Block Implementation

The main control is implemented with a PLA. The inputs to the PLA are the opcode and bypass signals. All the instruction information is encoded in FOP_FLOWR_H. The following control information is decoded in the PLA: EFF_SUB, SHIFT_EN, MULL, CVTFI, RND, ENA_DET, and PCMPR. SHIFT_EN is asserted for CVTif, CVTDF, ADD/SUB, DIVf, MULf.
RND is asserted for CVTif, CVTff, ADDISUB, DIVf, MULf. ENA...DET is asserted for CVTff, ADDISUB, DIVf, MUL£, CVTff, CVTi£. The destination data type is decoded to get six signals for each datatype. FrYPE,DTYPE, GTYPE, BYTE, WORD and LONG. 11-90 The Fbox They are: DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 The logic used to generate other control signals in stage 4 are as follows: md en h - fop flowr h<2> ;: md h ;: (eff sub AND e d1f! eql Or) fop:flowr_h<2> : If-~h1s ~1gnal is low for 1n~true~ion in p1pe1ined mode requiring rnd, then nld is disabled, i.e. it is a truncate mode. rndt h mddI h mdg_h cin bS8 h cinb55_one_h - rnd_en_h ;: ftype rnd_en_h ;: dtype rnc_en_h ;: gtype e d1f! eel Or h ;: eft sub ;: f nr h (e~fi:h~*-Slr_h) + (~igns_no~_eqlr_h) fb_neg_h is generated by stage 3 and sent as fb_neg4r_h to stage 4. The equation in stage 3 is : irr~lemen~ed sel other h psl:_f_h - 11.25 - OR osl% f h :hi~ si;n~: ~I:I be high if the result for a flcat~ng destination result is C and if for a eM? instr".lction if both tn .. :;perancs are sa:ne. ~=m~r MISCELLANEOUS AND SIGN LOGIC Figure 11-31: Miscellaneous Pia Block Diagram v itV it itV V vvv 1--> PSLZ_F_H MIse PIA 1 1--> PSLN2_F_H 1 1--> PSLNl F H 1 -- 1--> RESERVED_OPD 1 1--> F DIV BY 0 1 - -- +------------------------------------------------------+ 11.25.1 Miscellaneous Sign Logic Implementation Stage 4 is used to find the sign of the final result, condition codes and exceptions. Specifically it does sign computation, integer overflow detection, zero result detection, negative result detection, reserved operands and floating divide by zero detection by utilizing the information provided from the previous stages of the pipe. H the result is zero, stage 4 will force its output to zero. In the case of floating underflow, the sign, PSLN_F_H, fraction, and exponent of the result are forced to zero. 
11.25.2 Sign and Negative Result Logic

The sign of the final result and the PSL.N status bit are the same except for CMP and TST instructions. For CMP and TST instructions, the sign bit is a don't care, and the PSL.N bit is high if the first operand is strictly less than the second operand. If the final result is zero, the sign bit and the PSL.N bit are forced to zero. The PSL.Z bit (see below) is set if the result is zero and is used to force the sign and the PSL.N bit to zero. For integer instructions the sign is already in the result, so the sign is computed only for floating results. Hence the PSL.N bit for a floating result, PSLN_F_H, is the same as the sign bit.

The signals PSL.N and PSL.Z are driven to the output interface on the active-low precharged bus which is shared with stage 3. During PHI_3, if stage 4's data_valid bit is set and the underflow condition is not detected, the inverted value of PSL.N is driven onto the bus. The inverted value of the PSL.Z signal is also driven onto the bus during PHI_3 if the data_valid bit is set, regardless of the underflow condition. The interface uses these signals to determine whether the CMP condition is met.

The PSL.N bit is obtained as below.

If PSL.Z then PSL.N = 0

For EFFADD/EFFSUB the PSL.N bit of the result is given as follows.

[equation illegible in the scan]

For MULf and DIVf the PSL.N bit of the result is the XOR of the signs of the input operands. For MOV, CVTff, and CVTif the PSL.N bit of the result is the sign of the input operand. For the MNEG instruction the PSL.N bit of the result is the inverse of the sign of the input operand.

PSL.N = s1r * (MOV + CVTff + CVTif) + NOT s1r * MNEG

For CMP and TST instructions the PSL.N bit is

PSL.N = [signs_not_eql * s1r + signs_eql * {e_diff_eql_0 * (f_n XOR s1r) * f_z + [remainder illegible in the scan]

All the above computations are done in the miscellaneous PLA.
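The floating PSL.N rules that section 11.25.2 states in words can be summarized in a short Python sketch. It covers only the cases whose rule is given in prose (MULf, DIVf, MOV, CVTff, CVTif, MNEG, and the zero override). The function name and the string opcodes are assumptions made for the example, and the EFFADD/EFFSUB and CMP/TST cases are deliberately left out.

```python
def psl_n_float(op: str, s1: int, s2: int = 0, psl_z: bool = False) -> int:
    """PSL.N for a floating result; s1 and s2 are operand sign bits (0 or 1)."""
    if psl_z:
        return 0            # a zero result forces PSL.N to 0
    if op in ("MULf", "DIVf"):
        return s1 ^ s2      # XOR of the operand signs
    if op in ("MOV", "CVTff", "CVTif"):
        return s1           # sign of the input operand
    if op == "MNEG":
        return s1 ^ 1       # inverse of the input operand's sign
    raise ValueError(f"rule for {op} is not modeled in this sketch")
```

For example, psl_n_float("MULf", 1, 0) gives 1 for operands of opposite sign, and any opcode gives 0 once psl_z is set.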
As the number of minterms for psln logic was large, two signals are generated in the PLA, which are OR'ed outside and AND'ed with PSLZ_F_H, to give the final PSLN_F_H. Sign has to be computed only for instructions considered above. For all the above instruction the final sign is either the PSLN bit or it is a don't care, hence For CVTfi and MULL, the PSL.N bit is the MSB of the final result. For MULL and CVTfi. (destination long), the MBB is SUM<B24>. For CVTfi with destination of word the MSB is SUM<B40> and with destination of byte the MBB is SUM<B48>. Also when the destination is byte and word, the only instruction possible is CVTfi. Hence the PSL.N bit is PSL.N - SUM<B24> * (LONG * CVTfi + MOLL ) + SUM<B40> * WORD + SUM<B48> * BYTE 11-92 The Fbox DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 11.25.3 Integer Over1low Integer overflow is possible for MULL and CVTfi instructions. The overflow condition for the convert floating to integer instruction is determined in stages 1, 3 and 4 of the pipe. For MULL instruction, the overflow is determined in stage 4. All these conditions are combined to give the integer overflow signal to the interface stage. OVERFLOW DETECTION FOR CVTfi The CVTfi instruction overflow detection operation performed in all the stages is given below. All constants are in decimal. Let the exponent of input operand be El, then the actual exponent of the Hoating number is El-bias. Let that number be ACTUAL_EXP. hence, Let DEST_LEN equal the length, in bits, of the destination result hence, :c= -::::"'.e~ =:c·a-:.~:':.: ~: _. _'= :"::s-::-.::-:.:"::: :'-=5":_:'::: c.ss':_.l..:: -:6 :~s":_:~~ !:= !:= - __ C~:lVE.::' ':::a.':.~=.; ::~~~~ =::a~~~; ~: -:.~ . . . cr: ins':.=-.;=':.:..c~ ::~~~:== !~s~=~=~~:~ For convert from floating to integers of length 8(B), 16(W), 32(LW) integer overfio\v occurs under the following condition. 1. !! a:~ual_exp > des~_len :. !! a:~ua:_exp - QeS~_le~ an= sl=-O ~. !! 
a:~~a:_eXp - des~_le~ ~Q sl=-: L~C t~e in~ege= ?o~ic~ is nc~ equal ~o ~he mos~ nega~ive numbe= ,. fo: CVT rounded to long only, in addition to the above concitions the !ollowing concitions haVe to be checkec: a} if actual exo - 31 anc slr - 0 and the 32 bits of the integer part are of the form- 01111 ••• 111 and the remaining fraction 1s greater than or equal to 0.5. b} if actual exp - 32 anc slr - 1 and the 32 bits of tbe integer part 1s of the for.m 10000 ••• 000 and tbe remaining fraction is greater than or equal to 0.5. The actual detection of the above conditions are done in stages 1, 3 and 4. In stage 1 the following signals are generated. lnegir mnegbr mnegwr mnegbr crfl rndr e diff eql 24r e:diff:eql:2Sr - Least negative integer; high if <BO:B31> of F_I%FD1R_H are 1 Most negative byte; high if <B1:B7> of F_I'FD1R_H are 0 Most negative word; high if <Bl:B1S> of F_I%FD1R_H are 0 Most negative longword; high if <B1:B31> of F_I%FD1R_H are 0 Convert floating to longword round bit; <B32> of F_I'FD~_H exponent difference equals 24 exponent difference equals 25 In stage 3 an exponent difference (see below) is done to determine the first three conditions for CVTfi overflow. The fourth condition for CVTRtL is also determined in stage 3. DIGITAL CONFIDENTIAL The Fbox 11-93 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 Let El be the exponent of the incoming operand in stage 3 and ER be the result of the subtraction in stage 3. ER is send to stage 4. ER - (bias + dest_len) - E1 - constant - E1 Above Constant Values F,D --> B F,D --> W F,D --> L G --> B G --> W C-l36 C-144 C-160 C-1032 C-1040 C-1056 G --> L Stage 3 sends out two signals, IV3 and IV4, to stage 4 for CVTfi overflow detection. They are generated as follows. iv' - (:n_;~= ~ y !:= y c_;!!=__ ;:_:: =nE;:= ~ !:= • c=!:_=::: • __ =~!~_.~:_~4) * (C\~~~) In stage 4 the following operations are performed. Let, !:e: s~a=. S s~s~=a=~icn _1<:2:~> - .. . . . - -::. .. S~;-:: ::;.:: :'! .. :.. 
~ .....:<::'> elz - 1, el<:::O> is zero _x~~~_~~ ~.!~~~ The first two conditions is determined as The third condition for CVTfi overfiow is determined as: The fourth condition for overfiow is given by iv4. Finally the CVTfi overflow is determined as OVERFLOW DETECTION FOR MULL For MULL integer overflow occm-s if the high half of the double length result is not equal to the sign extension of the low half. The following condition is determined on MULL result to detect integer overflow. The register F _3%FDlR_H<BO ... .B32> contains the high 33 bits of the MULL 64-bit result. mull_zero - NOR OF BITS fdlr(BO) THROUGH fdlr(B32) ;33 BITS mull_one - AND OF BITS fdlr(BO) THROUGH fdlr{B32) ;33 BITS The integer overflow is defined as: 11-94 The Fbox DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 11.25.4 Zero Result 'When the final result is zero then the zero flag (PSL.Z) bit has to be set. Different instructions are analyzed. For EFFADDIEFFSUB a zero result is possible when a) Both the input operands are equal and it is an effective SUB operation or b) Both the input operands are zero. For a floating multiply instruction, zero result is possible only when one or both the input operands is zero. For a floating divide instruction, a zero result is possible only when the dividend is zero. i.e. the second operand is zero. "'nen the first operand, the divisor, is zero then it is floating divide by zero. "'ben a floating dh-ide by zero occurs then the PSL.Z bit is a don't care. :~:..z - ~:_: ~ ::~..= For ~fOV&~LG/CVTffinstructions zero result is possible only when the input operand is zero. For C1-IPITST instruction zero flag has to be set when operand 1 is equal to operand 2. For the TST instruction operand 2 is zero. For convert integer to floating instructions the result is zero if the input integer is zero. Ps~.z - =_Z * CVTif All the above computation is done in the miscellaneous PLA. 
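The per-instruction zero-result rules stated above for floating instructions can be collected into one decision function. This Python sketch is illustrative only: the function name, the string opcodes, and the boolean arguments (op1_zero, op2_zero, operands_equal) are assumptions made for the example and are not signal names from the specification.

```python
def psl_z_float(op: str, op1_zero: bool, op2_zero: bool = False,
                operands_equal: bool = False) -> bool:
    """PSL.Z for floating instructions, following the prose rules."""
    if op == "EFFSUB":
        # equal operands under an effective subtract, or both operands zero
        return operands_equal or (op1_zero and op2_zero)
    if op == "EFFADD":
        return op1_zero and op2_zero
    if op == "MULf":
        return op1_zero or op2_zero
    if op == "DIVf":
        # Operand 2 is the dividend. A zero operand 1 (the divisor) is a
        # floating divide by zero, for which PSL.Z is a don't care.
        return op2_zero
    if op in ("MOV", "MNEG", "CVTff", "CVTif"):
        return op1_zero
    if op in ("CMP", "TST"):
        # For TST operand 2 is zero, so the caller passes operands_equal
        # as "operand 1 is zero".
        return operands_equal
    raise ValueError(f"rule for {op} is not modeled in this sketch")
```

Integer results (MULL and CVTfi) are not covered here, because the text checks those on the final result rather than on the input operands.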
The output of the miscellaneous PLA is PSLZ_F_H, as only the PSL.Z bit for floating instructions has been considered so far. For integer multiply instructions and all convert floating to integer instructions, a zero result is possible for many different input operands. Hence the final result is checked for zero. For the CVTfi instruction, stage 4 is used to do a 2's complement. The 2's complement of zero is again zero, and the 2's complement of any non-zero number is not zero. Hence the zero condition can be detected at the input of stage 4 rather than at its output. For MULL the low-order 32 bits of the result need to be checked for zero. The register MILSBR has the 6 low bits of the 32-bit lsb result, and register FD1R<B32:B57> has the other 26 bits.

The conditions which are generated are as follows:

f_4_d%zero_mil_h = NOR of f_3%fd1r_h<B56:B57> * NOR of f_3%milsbr_h<5:0>
f_4_d%zero_byt_h = NOR of f_3%fd1r_h<B48:B55>
f_4_d%zero_wor_h = NOR of f_3%fd1r_h<B40:B47>
f_4_d%zero_mul_h = NOR of f_3%fd1r_h<B32:B39>
f_4_d%zero_lon_h = NOR of f_3%fd1r_h<B24:B31>

The zero detection is done as follows:

zero_b    = NOR OF FD1R(B48) THROUGH FD1R(B55)    ;8 bits
          = f_4_d%zero_byt_h
zero_w    = NOR OF FD1R(B40) THROUGH FD1R(B55)    ;16 bits
          = f_4_d%zero_byt_h * f_4_d%zero_wor_h
zero_l    = NOR OF FD1R(B24) THROUGH FD1R(B55)    ;32 bits
          = zero_w * f_4_d%zero_mul_h * f_4_d%zero_lon_h
zero_mull = (NOR OF FD1R(B32) THROUGH FD1R(B57)) * (NOR OF MILSBR(0) THROUGH MILSBR(5))
          = zero_w * f_4_d%zero_mul_h * f_4_d%zero_mil_h

PSL.Z = zero_l * (long * CVTfi) + zero_w * word + zero_b * byte + zero_l * zero_mull * MULL

During PHI_3, if stage 4's data_valid bit is set, the inverted value of the PSL.Z bit is driven onto the active-low shared bus.

11.25.5 Reserved Operand

The reserved operand fault is checked in stage 4 of the pipe. A reserved operand fault is possible only when the input operand is floating type.
When a reserved operand fault occurs, the other condition codes are overridden. The reserved operand detection is done in the miscellaneous PLA.

For one-operand instructions:

[equation illegible in the scan]

For two-operand instructions:

RES_OPD = (f_3_c%e1zr_h * f_3%s1r_h + f_3_c%e2zr_h * f_3%s2r_h) * (ADD + SUB + DIVf + MULf + CMP)

11.25.6 Floating Divide by Zero

When a floating divide by zero occurs, the f_div_by_zero bit has to be set. The floating divide by zero fault occurs if operand 1 is zero. The logic is done in the miscellaneous PLA.

11.26 FBOX TESTABILITY

This section describes the FBOX_Test mode of operation. FBOX_Test mode would primarily be used during chip debug and possibly during manufacturing tests.

11.26.1 FBOX_Test Control Signals

Two FBOX input signals are associated with FBOX_Test mode. E%FBOX_TEST_ENB_H is received from the EBOX, latched during PHI1, and driven down the FBOX pipe as F_I%FBOX_BYPASS_H. Assertion of E%FBOX_TEST_ENB_H puts the FBOX into FBOX_Test mode. A second signal, E%FBOX_S4_BYPASS_ENB_H, selects between two slightly different variants of FBOX_Test mode. E%FBOX_S4_BYPASS_ENB_H is received from the EBOX by a PHI1 latch and driven into the Fbox core as F_I%S4_BYPASS_ENB_H by a following PHI3 latch.

11.26.2 FBOX_Test Mode Description

FBOX_Test mode allows simple testing of the FBOX fraction and exponent datapaths. When in FBOX_Test mode, the basic operation of each stage is to pass fraction and exponent data, unchanged, from its input to its output. Thus, the test mode features allow FD1R or FD2R to be passed through the fraction datapath and ED1R to be passed through the exponent datapath. Selection of whether to pass FD1R or FD2R to the Fbox output is done, in Stage3, by looking at the value of F_I%S4_BYPASS_ENB_H. SIGN bit processing is not affected by FBOX_Test mode.
11.26.2.1 FBOX Section Operation During FBOX_Test Mode

Input and Output - The Input and Output sections of the FBOX operate as normal.

Divider - In the Divider, F_I%FBOX_BYPASS_H assertion forces F_D_C%DIVDONE_DAT_H to be asserted to Stage1, effectively bypassing the Divider. This enables Stage1 to use data supplied by the Input interface as the result of the Divider stage.

Stage1 - In Stage1, F_I%FBOX_BYPASS_H assertion forces the Stage1 output register select signals to a state that writes the Stage1 FD1R, FD2R, and ED1R output registers with the contents of the Input interface FD1R, FD2R, and ED1R respectively.

Stage2 - In Stage2, F_I%FBOX_BYPASS_H assertion forces the right-shifter control to a "shift_of_zero" in order to pass FD1R through Stage2. Output register select signals are forced to a state which writes the Stage2 FD1R and ED1R output registers with the contents of the Stage1 FD1R and ED1R. Stage2 FD2R is always written with the contents of Stage1 FD2R, irrespective of FBOX_Test mode.

Stage3 - In Stage3, F_I%FBOX_BYPASS_H assertion forces the left-shifter control to a "shift_of_zero" in order to pass FD1R or FD2R through Stage3. Selection of whether to pass FD1R or FD2R is done by the value on F_I%S4_BYPASS_ENB_H, and the output is on Stage3's FD1R. The Stage3 ED1R output is written with the Stage2 ED1R input while in FBOX_Test mode. Stage3 fraction output selectors are forced to output the contents of the left shifter during FBOX_Test mode. The following table describes Stage3 operation modes and the data driven on various busses for different modes of operation.

The main features of this implementation are:

o Either FD1R or FD2R can be selected to pass directly through the FBOX.
o The two shared busses between Stages 3/4 and the output interface can be selectively driven by Stage 3 or Stage 4.
o Provides visibility of the Stage3 miniround incrementer results.
F_I%FBOX_SYPASS_H I IF_I%S4_BYPASS_ENB_H Value Appearing On Susses II II II II II Miniround Incrementer F_B%F_OUT_L<Bl:BSS> Input. Stage 3 Operation Mode F_3%FD1R_H<AO:BS8> F_B%E_OOT_~<lO:O> -----------------------------+-----------+--------------------+---------------~-.--------------------~-----. I 00 i N~~.a: Operation w/ Opcode I Stage 4 !raction IStage 3 !raetior. I Stage .; expone~t I Stage I Depe."':dent I re:sult I resu:t I result I re I S';_B:tpass - crr -----------------------------~-----------~----------------------------------------------------------~-----N~;::r..a;. ~per~::.ion w/ Ope ode :St.age 3 !raet.ion St.age 3 expone~t. 01 S.;_=~~·ass Dependent C!~ - Ires~lt i! Stage .; I!:.y?ilssed, else rQs~!~ r .. s:ll-: !! S~ag. ~ b::-pass.:, .15. S-:age ~ czpe::.e=.-: S~a~. , D=iv~ '* I =::'::,,-::<:"0: ~> ~----------------------------~----------~--------------------------------------~--------------------~------S':.a.;-.. ~ :-:!. ·w"e~ ;:::?-:-::<;'~:=5S> =::::·:_=l:ASS, I D.?e."':Qe~,: ::::>-:-::<3:::55> S':_=~-;ass - :.:~, !"or see: =~":pass~=l. ~::=~Qe " ! ; :) ! Cp:o:. I S~a;. ~ ~=~v_~ i ~.?e.~Q.n-: ! F:::P'-.E<=: :::5> I !oo~note ==~X_=Y:~~S, S';_E~-;ass - .~::: =::i':_=l-:;'.'sS, S",_:E<ypass - ON, ~c!!-:~-pass2.!:,!e Dependent ~==:?~E<A0:~5S> j s':.!.;-.. ~ =-=:.~_:: FD2R_H<Bl:BSS> epeo:1. '* ;::::?,:-::<A·:':E5E> I I I I I '* I $'::.;. ~ ==~~~:: ~::: E:D1R_H<lO:O> ~-------------------------------------------------------------+-----------------~--------------------~------- ;.:~ fraet.icn cat.a bits are passec ~hrough Stage 3, as rec.ived, by way c! the left shi!t.er. - In FBOX_Test meae,with 54_Bypass on and a bypassable opcode in 5tage3 the majority but not all o! frae bits are passed through 5~age 3,as received, by way of the left shif~er and the output seleetor ehoosi shifter output. 
For F-type data, two fraction bits (B22:B23) are passed through Stage3 by way of the miniround incrementers. Similarly, for D-type data six fraction bits (B50:B55) and for G-type data three fraction bits (B50:B5...) are passed through the Stage3 miniround incrementers. It is important to note that the control logic for the miniround input selectors makes its selection based on opcode information and the signal F_3_A%SHFT_DONE_R. FBOX_Test mode is not factored into the miniround incrementer's input selector control. Depending on the opcode and exponent difference, the miniround input selectors could choose left shifter output or fraction adder output to be fed to the miniround incrementers. The simplest way to pass FD2R through Stage3 (unchanged) is to select the proper opcode and data such that an effective subtract with exponent difference of zero will enter Stage3. This will select Stage 3's left shifter output as the source for the miniround incrementer input, and the round bit position will be zero.

Stage4 - In Stage4, assertion of F_I%FBOX_BYPASS_H forces the fraction adder carry-in and round signals to zero to allow FD1R to pass through Stage3 unchanged. Stage4 FD1R and ED1R are written with the contents of Stage3 FD1R and ED1R respectively.

11.26.3 Revision History

Table 11-22: Revision History

Who             When          Description of change
Anil Jain       17-Mar-1989   Initial Release
Anil Jain       18-Dec-1989   Updated to reflect the Fbox implementation
Dave Deverell   25-Jan-1991   Updated to reflect PASS1 implementation and FBOX_Test section added

Chapter 12
The Mbox

12.1 INTRODUCTION

The Mbox performs three primary functions:

• VAX memory management: The Mbox, in conjunction with the operating system memory management software, is responsible for the allocation and use of physical memory.
The Mbox performs the hardware functions necessary to implement VAX memory management. It performs translations of virtual addresses to physical addresses, performs access violation checks on all memory references, and initiates the invocation of software memory management code when necessary.

• Reference processing: Due to the macropipeline structure of NVAX, and the coupling between NVAX and its memory subsystem, the Mbox can receive memory references from the Ibox, Ebox and Cbox simultaneously. Thus, the Mbox is responsible for prioritizing, sequencing, and processing all references in an efficient and logically correct fashion and for transferring references and their corresponding data to/from the Ibox, Ebox, Pcache, and Cbox.

• Primary Cache Control: The Mbox maintains an 8KB physical address cache of I-stream and D-stream data. This cache, called the Pcache (Primary Cache), exists in order to provide a two cycle pipeline latency for most I-stream and D-stream data requests. It is the fastest D-stream storage medium for NVAX and represents the first level of D-stream memory hierarchy and the second level of I-stream memory hierarchy for the NVAX computer system. The Mbox is responsible for controlling Pcache operation.

12.2 MBOX STRUCTURE

This section presents a block diagram of the Mbox and defines the function of the basic Mbox components. This section neither explains why the functions of each component exist nor discusses the interactions among the components. The intent of this section is only to define the function and interconnection of the components for future discussion. Subsequent sections will deal with component interaction.

The following block diagram illustrates the basic components of the Mbox.
Figure 12-1: Mbox Block Diagram

[Block diagram. Labelled components: EM_LATCH and rotator, ARBITRATION LOGIC, VAP_LATCH and incrementer, IREF_LATCH, SPEC_QUEUE, abort logic, RTY_DMISS_LATCH, MME_DATAPATH, MME_LATCH, unaligned detect logic, cross-page detect logic, PA_QUEUE, CBOX_LATCH, TB, byte mask generator, S6 pipe latch, DMISS_LATCH, IMISS_LATCH, Pcache, MD_BUS rotator and driver, parity generator and checker.]

The Mbox is implemented as a two-stage pipeline located in the fifth and sixth segments of the NVAX macropipeline (S5 and S6). References processed by the Mbox are first executed in S5. Upon successful completion in S5, the reference is transferred into S6. At this point, the reference has either completed or is transferred to the Ibox, Ebox, or Cbox.

During any cycle, the fundamental state of the S5 and S6 stages can be defined by the particular references which currently reside in these two stages. For the purposes of describing the Mbox, all references can be viewed as a packet of information which is transferred on the S5 and S6 buses.
The S5 reference packet and the corresponding S5 buses are defined as:

• ADDRESS: The M_QUE%S5_VA_H<31:0> bus transfers all virtual addresses and some physical addresses into the S5 pipe. The M_QUE%S5_PA_H<31:0> bus transfers some physical addresses into the S5 pipe and transfers all addresses out of the S5 pipe.

• DATA: M_QUE%S5_DATA_H<31:0> transfers data originating from the Ebox through the S5 pipe.

• COMMAND: M_QUE%S5_CMD_H<4:0> transfers the type of reference through the S5 pipe. This command field is defined in Section 12.3.1.

• TAG: M_QUE%S5_TAG_H<4:0> transfers the Ebox register file destination address corresponding to the reference through the S5 pipe.

• DEST_BOX: M_QUE%S5_DEST_H<1:0> transfers the reference destination information through the S5 pipe. This field is defined as follows:

  M_QUE%S5_DEST_H   Definition
  00                the reference requests data destined for the Mbox.
  01                the reference requests data destined for the Ibox.
  10                the reference requests data destined for the Ebox.
  11                the reference requests data destined for the Ebox and Ibox.

• AT: M_QUE%S5_AT_H<1:0> transfers the access type of the reference. This field is defined as follows:

  M_QUE%S5_AT_H     Definition
  00                passive query access (see PROBE command)
  01                read access
  10                write access
  11                modify access (read with write check for future write to same address)

• DL: M_QUE%S5_DL_H<1:0> transfers the data length of the reference. This field is defined as follows:

  M_QUE%S5_DL_H     Definition
  00                byte
  01                word
  10                longword
  11                quadword

• REF_QUAL: M_QUE%S5_QUAL_H<6:0> transfers information which further qualifies the reference for the purpose of Mbox processing. This field is defined as follows:

  M_QUE%S5_QUAL_H<6>   address of reference is currently a virtual address.
  M_QUE%S5_QUAL_H<5>   reference has been tested for cross-page condition.
  M_QUE%S5_QUAL_H<4>   reference is first part of an unaligned reference.
  M_QUE%S5_QUAL_H<3>   reference is second part of an unaligned reference.
  M_QUE%S5_QUAL_H<2>   enable ACV and M=0 checks.
  M_QUE%S5_QUAL_H<1>   reference has or is forced to have a hard error.
  M_QUE%S5_QUAL_H<0>   reference has or is forced to have a memory management fault (ACV/TNV/M=0).

The S6 reference packet and the corresponding S6 buses are defined as:

• ADDRESS: The M%S6_PA_H<31:0> bus transfers a physical address through the S6 pipe.

• DATA: B%S6_DATA_H<63:0> transfers data through the S6 pipe.

• COMMAND: M%S6_CMD_H<4:0> transfers the type of reference through the S6 pipe. This command field is defined in Section 12.3.1.

• DEST_BOX: M_QUE_MS2%S6_DEST_H<1:0> transfers the reference destination information through the S6 pipe. This field is defined as follows:

  M_QUE_MS2%S6_DEST_H   Definition
  00                    the reference requests data destined for the Mbox.
  01                    the reference requests data destined for the Ibox.
  10                    the reference requests data destined for the Ebox.
  11                    the reference requests data destined for the Ebox and Ibox.

• S6_BYTE_MASK: M%S6_BYTE_MASK_H<7:0> transfers the byte mask information through the S6 pipe. The byte mask field is used to indicate which bytes of a longword or quadword write should actually be written to a cache or memory.

• REF_QUAL: M_QUE_MS2%S6_QUAL_H<3:0> transfers information which further qualifies the reference for the purpose of Mbox processing. This field is defined as follows:

  M_QUE_MS2%S6_QUAL_H<3>   reference is first part of an unaligned reference.
  M_QUE_MS2%S6_QUAL_H<2>   reference is second part of an unaligned reference.
  M_QUE_MS2%S6_QUAL_H<1>   reference has or is forced to have a hard error.
  M_QUE_MS2%S6_QUAL_H<0>   reference has or is forced to have a memory management fault (ACV/TNV/M=0).

12.2.1 IREF_LATCH

The IREF_LATCH is a latch which stores all I-stream read references (IREADs) requested by the Ibox. Each IREAD is stored in the IREF_LATCH until the reference successfully completes in S5. The following figure illustrates the structure of the IREF_LATCH:

Figure 12-2: Iref Latch

[Figure: structure of the IREF_LATCH, showing the IREF_REQ input and the valid bit.]

The output of the address field of the IREF_LATCH has an incrementer associated with it in order to increment the quadword address. The output of this structure can be tristated.

See Section 12.3.5.2 for a more complete understanding of IREF_LATCH function in the context of overall Mbox operation.

12.2.2 SPEC_QUEUE

The SPEC_QUEUE is a 2-entry FIFO structure which stores D-stream read and write references associated with specifier source and destination operands decoded by the Ibox. Each reference latched in the SPEC_QUEUE is stored until the reference successfully completes in S5. If the reference is unaligned, the entire reference must complete in S5 before the corresponding entry is invalidated. The following figure illustrates the structure of the SPEC_QUEUE:

Figure 12-3: Spec Queue

[Figure: two entries loaded on SPEC_REQ from the Ibox command, address, tag, destination, access type, and data length buses. Each entry holds a valid bit, command, address, tag, destination, access type, and data length.]

The output of this structure can be tristated.
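The hold-until-complete behaviour of the SPEC_QUEUE can be sketched as a small software model. This is a minimal illustration in Python, not the hardware design: the names SpecRef, SpecQueue, and the "unaligned" flag, and the idea of counting one S5 pass per aligned half, are assumptions made for the example. Only the 2-entry FIFO depth and the rule that an unaligned reference must complete entirely in S5 before its entry is invalidated come from the text above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SpecRef:
    """Hypothetical software view of one SPEC_QUEUE entry."""
    cmd: int                 # 5-bit command code
    addr: int                # 32-bit address supplied by the Ibox
    unaligned: bool = False  # assumed flag: reference splits into two aligned halves


class SpecQueue:
    DEPTH = 2  # the text describes a 2-entry FIFO

    def __init__(self):
        self._fifo = deque()  # items are [SpecRef, S5 passes still required]

    def is_full(self):
        return len(self._fifo) == self.DEPTH

    def push(self, ref):
        if self.is_full():
            raise OverflowError("SPEC_QUEUE full")
        self._fifo.append([ref, 2 if ref.unaligned else 1])

    def head(self):
        return self._fifo[0][0] if self._fifo else None

    def complete_in_s5(self):
        """Record one successful S5 pass for the head entry.

        Returns True when the entry has been invalidated (removed).
        """
        entry = self._fifo[0]
        entry[1] -= 1
        if entry[1] == 0:
            self._fifo.popleft()
            return True
        return False
```

In this sketch an unaligned reference stays at the head of the queue after its first S5 pass and is removed only after the second.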
12.2.3 EM_LATCH

The EM_LATCH latches and stores all commands originating from the Ebox. Each reference is stored until the following two conditions are satisfied: 1) the "complete logical reference" (i.e. the pair of aligned references required if the EM_LATCH reference is unaligned) clears memory management access checks, and 2) the EM_LATCH reference successfully completes in S5. The following figure illustrates the structure of the EM_LATCH:

Figure 12-4: EM_LATCH

[Figure: the latch holds a valid bit, address, data, tag, access type, data length, and virtual/physical indicator, and drives the corresponding S5 buses.]

A 4-way byte barrel shifter is connected to the data portion of the EM_LATCH. This enables the write data to be byte-rotated into longword alignment. The EM_LATCH output can be tristated.

12.2.4 VAP_LATCH

The function of the VAP_LATCH is to create and store the second reference of an unaligned reference pair. Each reference is stored until the reference successfully completes in S5. The following figure illustrates the structure of the VAP_LATCH:

Figure 12-5: VAP_LATCH

[Figure: the latch is loaded from the S5 buses and holds a valid bit, command, address, data, tag, destination, access type, data length, and virtual/physical indicator; it drives the same S5 buses.]

The VAP_LATCH transforms the current S5 reference into a new reference. Thus, input for the VAP_LATCH is taken off of the S5 buses.
An incrementer exists on the input side of the address field which adds eight to M_QUE%S5_VA_H<31:0> in order to create the second reference in an unaligned pair of references. The VAP_LATCH output can be tristated.

See Section 12.3.17 for a more complete understanding of VAP_LATCH function in the context of overall Mbox operation.

12.2.5 MME_LATCH

The MME_LATCH (Memory Management Exception Latch) stores references associated with memory management processing. It acts as a buffer between the S5 processing pipe and the MME_DATAPATH. The MME_LATCH is the S5 source for PTE references (page table entry reads), PTE data, and Mbox internal processor registers and TB fill operations. The following figure illustrates the structure of the MME_LATCH:

Figure 12-6: MME_LATCH

[Figure: the latch is loaded from the MME_DATAPATH and the MD_BUS and holds a valid bit, command, address, data, tag, destination, access type, data length, and virtual/physical indicator; it drives the S5 buses.]

Each reference is stored until the reference successfully completes in S5. The MME_LATCH output can be tristated.

12.2.6 RTY_DMISS_LATCH

The RTY_DMISS_LATCH stores D-stream reads which missed in the Pcache when a previous D-stream fill sequence has not yet completed. This latch is the mechanism by which a D-stream read, which missed in the S6 pipe during another D-stream fill sequence, can be retried in the S5 pipe at some later point.
An S6 D-stream read is loaded into the RTY_DMISS_LATCH when it misses in the Pcache while a previous D-stream fill sequence is in progress. A RTY_DMISS_LATCH reference is driven into the S5 pipe during or after the point when the final D_CF reference is executing in S6 to complete the previous fill sequence. A RTY_DMISS_LATCH reference is invalidated when its read is retired from S5. The following figure illustrates the structure of the RTY_DMISS_LATCH:

Figure 12-7: RTY_DMISS_LATCH

[Figure: the latch is loaded from the S6 command, physical address, tag, destination, and data length buses and holds a valid bit, command, address, tag, destination, and data length; it drives the corresponding S5 buses.]

The RTY_DMISS_LATCH output can be tristated.

See Section 12.3.5.3.1 for a more complete understanding of RTY_DMISS_LATCH function in the context of overall Mbox operation.

12.2.7 CBOX_LATCH

The CBOX_LATCH stores references originating from the Cbox. These references are I-stream Pcache fills, D-stream Pcache fills, or Pcache hexaword invalidates. Each reference is stored until the reference successfully completes in S5. The following figure illustrates the structure of the CBOX_LATCH:

Figure 12-8: CBOX_LATCH

[Figure: the latch holds a valid bit, command, and address, and drives the S5 physical address bus; the data length is fixed at quadword.]

Note that no data field is present in this latch even though this latch services cache fill commands. Cache fill data will be supplied to the Pcache on the B%S6_DATA_H bus by the Cbox during the appropriate S6 cache fill cycle.
The C%CBOX_ADDR_H bus is driven by the Cbox during invalidate commands. During cache fill commands, all but two bits of the C%CBOX_ADDR_H bus are driven by the DMISS_LATCH or IMISS_LATCH. The Cbox will drive C%CBOX_FILL_QW_H<4:3> during cache fill commands in order to supply the quadword alignment of the fill data within the hexaword block. The CBOX_LATCH output can be tristated.

12.2.8 PA_QUEUE

The PA_QUEUE (Physical Address Queue) stores the physical addresses associated with destination specifier references made by the Ibox via a DEST_ADDR or READ_MODIFY command. The Ebox will supply the corresponding data at some later time via a STORE command. When the STORE data is supplied, the PA_QUEUE address is matched with the STORE data and the reference is turned into a physical WRITE operation. The following figure illustrates the structure of the PA_QUEUE:

Figure 12-9: PA_QUEUE

[Figure: each entry holds a valid bit, address, and data length; the figure also shows the PA_QUEUE conflict output of the address comparators.]

The PA_QUEUE is organized as an 8-entry FIFO. Addresses from the Ibox are expected in the same order as the corresponding data from the Ebox.

The PA_QUEUE has address comparators built into all FIFO entries. These comparators detect when physical address bits <8:3> of a valid PA_QUEUE entry match the corresponding physical address bits of an Ibox D-stream read.

See Section 12.3.6.1 and Section 12.3.18.1.1 for a more complete understanding of PA_QUEUE function in the context of overall Mbox operation.

12.2.9 TB

The TB (translation buffer) is the mechanism by which the Mbox performs quick virtual-to-physical address translations. It is a 96-entry fully associative cache of PTEs (Page Table Entries). Bits 31 through 9 of all S5 virtual addresses act as the TB tag. The replacement algorithm implemented is Not-Last-Used.
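The tag arithmetic implied by the TB description can be sketched as follows. This is an illustrative Python model under stated assumptions, not the TB implementation: it relies on the VAX page size of 512 bytes (so VA<8:0> is the byte offset and VA<31:9> is the tag), it reduces each PTE to a page frame number, and the names tb_tag and TranslationBuffer are invented for the example. The eviction step is a placeholder only and does not model the Not-Last-Used algorithm.

```python
PAGE_SHIFT = 9            # VAX pages are 512 bytes, so VA<8:0> is the byte offset
TB_ENTRIES = 96           # fully associative, per the text
TAG_MASK = (1 << 23) - 1  # VA<31:9> is 23 bits wide


def tb_tag(va):
    """Return VA<31:9>, the field the text says acts as the TB tag."""
    return (va >> PAGE_SHIFT) & TAG_MASK


class TranslationBuffer:
    def __init__(self):
        self._entries = {}  # tag -> page frame number (PTE reduced to its PFN)

    def fill(self, va, pfn):
        tag = tb_tag(va)
        if tag not in self._entries and len(self._entries) >= TB_ENTRIES:
            # Placeholder eviction: drops the oldest inserted entry.
            # The real Not-Last-Used policy is not modeled here.
            self._entries.pop(next(iter(self._entries)))
        self._entries[tag] = pfn

    def translate(self, va):
        """Return the physical address, or None on a TB miss."""
        pfn = self._entries.get(tb_tag(va))
        if pfn is None:
            return None
        return (pfn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))
```

A fully associative lookup is modeled here with a dictionary keyed by tag; in hardware every entry's tag would be compared in parallel.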
See Section 12.5.1.3 for more information.

12.2.10 MME_DATAPATH

The MME_DATAPATH (Memory Management Datapath) is used to process most memory management functions performed by the Mbox. Specifically, it performs the following functions:

• Creates read references of PTEs in order to obtain virtual address translations not currently cached in the TB.
• Creates TB fill references in order to write PTE data into the TB.
• Stores memory management internal processor registers.
• Stores virtual addresses associated with memory management faults or TB parity errors.

The MME_DATAPATH implements these functions with a register file and an ALU. See Section 12.5.1 for a more complete description of the MME_DATAPATH.

12.2.11 ARBITRATION LOGIC

The ARBITRATION LOGIC is responsible for determining which reference source drives its reference packet into the S5 pipe. (See Section 12.3.4 for more information about reference arbitration.)

12.2.12 S6_PIPELATCH

The S6_PIPELATCH is the buffer between the S5 and S6 stages of the Mbox pipeline. It latches the S5 reference packet, modifies it appropriately, and drives it as an S6 reference packet into the S6 pipe. M_QUE%S5_DATA_H<31:0> is driven onto both the upper and lower halves of B%S6_DATA_H<63:0>. M%S6_CMD_H<4:0> is either:

1. driven by M_QUE%S5_CMD_H<4:0>, or
2. changed into a NOP.

12.2.13 DMISS_LATCH and IMISS_LATCH

The DMISS_LATCH stores the currently outstanding D-stream read. That is, a D-stream read which missed in the Pcache is stored in the DMISS_LATCH until the corresponding Pcache block fill operation completes. The DMISS_LATCH also stores IPR_RDs to be processed by the Cbox until the Cbox supplies the data. I-stream reads are handled analogously by the IMISS_LATCH except that IPR_RDs are never handled by the IMISS_LATCH.
The following figure illustrates the structure of the DMISS_LATCH and the IMISS_LATCH:

Figure 12-10: DMISS_LATCH and IMISS_LATCH

[Figure: each latch holds a valid bit, address, tag, destination, first-unaligned and second-unaligned indicators, a non-cacheable bit, and a first-fill bit, and produces the PCACHE_BLK_MATCH and HEXAWORD_ADDR_MATCH comparator outputs.]

These two latches have comparators built in, in order to detect the following conditions:

• If the hexaword address of an invalidate matches the hexaword address stored in either MISS_LATCH, the corresponding MISS_LATCH sets a bit to indicate that the corresponding fill operation is no longer cacheable in the Pcache.

• Address<11:5> addresses a particular Pcache index (corresponding to two Pcache blocks). If address<8:5> of the DMISS_LATCH matches the corresponding bits of the physical address of an S5 I-stream read, the S5 I-stream read is stalled until the entire D-stream fill operation completes. This prevents a D-stream fill sequence to a given Pcache block from happening simultaneously with an I-stream fill sequence to the same Pcache block. By the same argument, address<8:5> of the IMISS_LATCH is compared against S5 D-stream reads to prevent another simultaneous I-stream/D-stream fill sequence to the same Pcache block.

• Address<8:5> of both MISS_LATCHes is compared against any S5 memory write operation. This is necessary to prevent the write from interfering with the cache fill sequence.

See Section 12.3.5.1 for a more complete understanding of the DMISS_LATCH/IMISS_LATCH functions in the context of overall Mbox operation.

12.2.14 MD_BUS_ROTATOR

The function of the MD_BUS_ROTATOR is to right-justify read data and drive it on the M%MD_BUS_H. For unaligned reads (see Section 12.3.17.1) the MD_BUS_ROTATOR is designed to assemble read data from two read references and drive it on the M%MD_BUS_H in right-justified form.
This rotator, coupled with the Mbox decomposition of unaligned references into two aligned references, allows the Ibox and Ebox to issue unaligned D-stream reads and receive the requested data aligned to the Ebox datapath. The MD_BUS_ROTATOR is illustrated below:

Figure 12-11: MD_BUS_ROTATOR

[Figure: the eight bytes of B%S6_DATA<63:0> pass through a latch and an 8-way byte barrel shifter, under rotator control, to drive the eight bytes of M%MD_BUS<63:0>.]

Although the diagram above describes the MD_BUS_ROTATOR as an 8-way byte barrel shifter, its actual design is a functional subset of a full barrel shifter. The lower four bytes of the output of the rotator are designed as a full 8-way byte barrel shifter in order to right-justify D-stream longword data. However, the upper four bytes always directly pass M%MD_BUS_H<63:32>, since these bytes are only used when aligned I-stream quadword data is sent to the VIC.

12.2.15 Pcache

The Pcache is a two-way set associative, read allocate, no-write allocate, write through, physical address cache of I-stream and D-stream data. It stores 8192 bytes (8K) of data and 256 tags corresponding to 256 hexaword blocks (1 hexaword = 32 bytes). Each tag is 20 bits wide, corresponding to bits <31:12> of the physical address. There are four quadword subblocks per block with a valid bit associated with each subblock. The access size for both Pcache reads and writes is one quadword. Byte parity is maintained for each byte of data (32 bits per block). One bit of parity is maintained for every tag.
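The Pcache geometry fixes how a physical address divides into fields, which can be sketched as below. This is an illustrative Python example, not part of the specification: the function name split_pa and the returned field names are invented here. The field boundaries follow from the figures in the text: 8192 bytes in 32-byte blocks gives 256 blocks, two ways give 128 sets (index PA<11:5>), the tag is PA<31:12>, and PA<4:3> selects one of the four quadword subblocks.

```python
CACHE_BYTES = 8192
BLOCK_BYTES = 32                           # one hexaword
WAYS = 2
SETS = CACHE_BYTES // BLOCK_BYTES // WAYS  # 128 sets, so 7 index bits


def split_pa(pa):
    """Split a 32-bit physical address into Pcache fields."""
    return {
        "tag": (pa >> 12) & 0xFFFFF,   # PA<31:12>, 20 bits
        "index": (pa >> 5) & 0x7F,     # PA<11:5>, selects one of 128 sets
        "quadword": (pa >> 3) & 0x3,   # PA<4:3>, subblock within the hexaword
        "byte": pa & 0x7,              # PA<2:0>, byte within the quadword
    }
```

For example, split_pa(0x12345678) yields tag 0x12345, index 0x33, quadword 3, and byte 0.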
The Pcache has a one cycle access and a one cycle repetition rate for both reads and writes (note, however, that the entire Mbox latency is two cycles due to the two-stage Mbox pipeline).

The Pcache represents the first level of D-stream memory hierarchy and the second level of I-stream memory hierarchy in all NVAX computer systems. Pcache entries must be invalidated in order to maintain cache coherency with higher levels of the memory hierarchy.

See Section 12.4 for more information on the Pcache.

12.3 REFERENCE PROCESSING

This section discusses how references are processed by the Mbox, and how the Mbox functional components interact to carry out reference processing.

12.3.1 REFERENCE DEFINITIONS

The following table describes all types of references processed by the Mbox:

Table 12-1: Reference Definitions

Name            Value (hex)   Reference Source    Description
IREAD           0E            Ibox                Aligned quadword I-stream read
DREAD           1C            Ibox, Ebox, Mbox    Variable length D-stream read
DREAD_MODIFY    1D            Ibox                Variable length D-stream read with modify intent as a result of Ibox-decoded modify specifiers
DREAD_LOCK      1F            Ebox                Variable length D-stream read with atomic memory lock
WRITE_UNLOCK    1A            Ebox                Variable length write with atomic memory unlock
WRITE           1B            Ebox                Variable length write
DEST_ADDR       0D            Ibox                Supplies address of a write-only destination specifier
STORE           19            Ebox                Supplies write data corresponding to a previously translated destination specifier address
IPR_WR          06            Ebox                Internal Processor Register Write
IPR_RD          07            Ebox                Internal Processor Register Read
IPR_DATA        04            Mbox                Transfers Mbox IPR data to Ebox
LOAD_PC         05            Ebox                Transfers a PC value to Ibox via M%MD_BUS_H<31:0>
PROBE           09            Ebox                Mbox returns ACV/TNV/M=0 status of specified address to Ebox
MME_CHK         08            Ebox, Mbox          Performs ACV/TNV/M=0 check on specified address and invokes the appropriate memory management exception
TB_TAG_FILL     0C            Ebox, Mbox          Writes a TB tag into a TB entry
TB_PTE_FILL     14            Ebox, Mbox          Writes PTE data into a TB entry
TBIS            10            Ebox                Invalidates a specific PTE entry in the TB
TBIA            18            Ebox, Mbox          Invalidates all entries in TB
TBIP            11            Ebox                Invalidates all PTE entries in TB corresponding to process-space translations
D_CF            03            Cbox                D-stream quadword Pcache fill
(illegible)     02            Cbox                I-stream quadword Pcache fill
(illegible)     01            Cbox                Hexaword invalidate of a Pcache entry
(illegible)     0F            Ibox                Stops processing of specifier references
NOP             00            Ibox, Ebox, Mbox    No operation

12.3.2 SIMPLE MBOX PIPELINE FLOW

A major Mbox design consideration was to return requested read data to the Ibox and Ebox as quickly as possible in order to minimize macropipeline stalls. If the Ebox pipeline is stalled because it is waiting for a memory operand to be loaded into its register file (md_stall condition), then the amount of time the Ebox remains stalled is related to how quickly the Mbox can return the data.

In order to minimize Mbox read latency, a two-cycle pipeline organization is used. This organization allows requested read data to be returned in a minimum of two cycles after the read reference is shipped to the Mbox. The timing diagram below illustrates the basic sequential processing within the two-cycle Mbox pipeline.
Figure 12-12: Basic Mbox Timing

[Timing diagram spanning the S5 and S6 pipe stages, showing the TB lookup, the Pcache access (read, write, fill, invalidate), and the rotate and return of data to the Ibox and Ebox.]

At the start of the S5 cycle, the Mbox drives the highest priority reference into the S5 pipe. The Mbox arbitration logic determines which reference should be driven into S5 at the end of the previous cycle. The first half of the S5 cycle is used to translate the virtual address to a physical address via the TB. The Pcache access is started during phase two of S5 and continues into the first quarter of S6. If the reference should cause data to be returned to the Ibox or Ebox, the first three phases of the S6 cycle are used to rotate the read data (if the data is not right-justified) and to transfer the data back to the Ibox and/or Ebox.

Thus, assuming an aligned read reference is issued in cycle x by the Ibox or Ebox, the Mbox can return the requested data in cycle x+2 provided that 1) the translated read address was cached in the TB, 2) no memory management exceptions occurred, 3) the read data was cached in the Pcache, and 4) no other higher priority or pending reference inhibited the immediate processing of this read.

12.3.3 REFERENCE ORDER RESTRICTIONS

Due to the macropipeline structure of NVAX, the Mbox can receive "out-of-order" references from the Ibox and Ebox. That is, the Ibox can send a reference corresponding to an opcode decode before the Ebox has sent all references corresponding to the previous opcode. Issuing references "out-of-order" in a macropipeline introduces complexities in the Mbox to guarantee that all references will be processed correctly within the context of the VAX architecture, the NVAX
macropipeline, and the Mbox hardware. Many of these complexities take the form of restrictions on how and when references can be processed by the Mbox.

The following synchronization example is useful to illustrate several of the reference order restrictions.

Figure 12-13: 2 Processor Synchronization Example

    PROCESSOR 1          PROCESSOR 2

    MOVL #1,C       10$: BLBC T,10$
    MOVL #1,T            MOVL C,R0

This example illustrates two processors operating in a multiprocessor environment. Initially, processor 1 owns the critical section corresponding to memory location T. Processor 1 will modify memory location C since it currently has ownership. Subsequently, processor 1 will release ownership by writing a 1 into T. Meanwhile, processor 2 is "spinning" on location T waiting for T to become non-zero. Once T is non-zero, processor 2 will read the value of C.

Note that this example is not the preferred way to implement synchronization. A better way would be to use VAX interlocked instructions, which guarantee atomicity. This is, however, a valid example under current SRM rules because they do not disallow an NVAX multiprocessor system from supporting this synchronization structure.

The following discussion explains the Mbox reference order restrictions.

12.3.3.1 No D-stream hits under D-stream misses

"No D-stream hits under D-stream misses" refers to the fact that the Mbox will not allow a D-stream read reference, which hits in the Pcache, to execute as long as requested data for a previous D-stream read has not yet been supplied.

Consider the code that processor 2 executes in the example above. If the Mbox allowed D-stream hits under D-stream misses, then it is possible for the Ibox read of C to hit in the Pcache during a pending read miss sequence to T. In doing so, the Mbox could supply the value of C before processor 1 modified C.
Thus, processor 2 would get the old C with the new T, causing the synchronization code to operate improperly.

Note that, while D-stream hits under D-stream misses are prohibited, the Mbox will execute a D-stream hit under a D-stream fill operation. In other words, the Mbox will supply data for a read which hit in the Pcache while a Pcache fill operation to a previous missed read is in progress, provided that the missed read data has already been supplied.

I-stream and D-stream references are handled independently of each other. That is, I-stream processing can proceed regardless of whether a D-stream miss sequence is currently executing, assuming there is no Pcache index conflict.

12.3.3.2 No I-stream hits under I-stream misses

This is the analogous case for I-stream read references. This restriction is necessary to guarantee that the Ibox will always receive its requested I-stream reference first, before any other I-stream data is received.

12.3.3.3 Maintain the order of writes

Consider the example shown above. If the Mbox of processor 1 were to reorder the write to C with the write to T, then processor 2 could read the old value of C before processor 1 updated C. Thus, the Mbox must never re-order the sequence of writes generated by the Ebox microcode.

12.3.3.4 Maintain the order of Cbox references

Again consider the example above. Processor 2 will receive an invalidate for C as a result of the write done by processor 1 in the MOVL #1,C instruction. If this invalidate were not processed until after processor 2 did the read of C, then the wrong value of C would be placed in R0. Strictly speaking, we must guarantee that the invalidate to C happens before the read of C. However, since C may be in the Pcache of processor 2, there is nothing to stop the read of C from occurring before the invalidate is received.
Thus, from the point of view of processor 2, the real restriction here is that the invalidate to C must happen before the invalidate to T, which must happen before the READ of T which causes processor 2 to fall through the loop. As long as the Mbox does not re-order Cbox references, the invalidate to C will occur before a non-zero value of T is read.

12.3.3.5 Preserve the order of Ibox reads relative to any pending Ebox writes to the same quadword address

Consider the following example:

Figure 12-14: Memory Scoreboard Example

     MOVL #1,C
     MOVL C,R0

In the NVAX macropipeline, the Ibox prefetches specifier operands. Thus, the Mbox receives a read of C corresponding to the "MOVL C,R0" instruction. This read, however, cannot be done until the write to C from the previous instruction completes. Otherwise, the wrong value of C will be read.

In general, the Mbox must ensure that Ibox reads will only be executed once all previous writes to the same location have completed.

12.3.3.6 I/O Space Reads from the Ibox must only be executed when the Ebox is executing the corresponding instruction

Unlike memory reads, reads to certain I/O space addresses can cause state to be modified. As a result, these I/O space reads must only be done in the context of the instruction execution to which the read corresponds. Due to the macropipeline structure of NVAX, the Ibox can issue an I/O space read to prefetch an operand of an instruction which the Ebox is not currently executing. Due to branches in instruction execution, the Ebox may in fact never execute the instruction corresponding to the I/O space read. Therefore, in order to prevent improper state modification, the Mbox must inhibit the processing of I/O space reads issued by the Ibox until the Ebox is actually executing the instruction corresponding to the I/O space read.
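The restrictions of Sections 12.3.3.5 and 12.3.3.6 can be illustrated with a small behavioral sketch in Python. This is not a description of the Mbox hardware: the function names, the list of pending write addresses, and the ebox_waiting_on_read flag are hypothetical and exist only for illustration. The I/O space test assumes the definition given in Section 12.3.5.4 (physical address bits <31:29> all set), and the conflict check compares quadword addresses as stated in the title of Section 12.3.3.5.

```python
from typing import Iterable

IO_SPACE_BITS = 0x7 << 29  # physical address bits <31:29>


def is_io_space(pa: int) -> bool:
    """True when addr<31:29> are all set (I/O space, per Section 12.3.5.4)."""
    return (pa & IO_SPACE_BITS) == IO_SPACE_BITS


def quadword_address(pa: int) -> int:
    """Drop the three byte-offset bits to obtain the quadword address."""
    return pa >> 3


def ibox_read_may_issue(pa: int,
                        pending_write_pas: Iterable[int],
                        ebox_waiting_on_read: bool) -> bool:
    """Hypothetical model: decide whether an Ibox read may be processed now."""
    # Section 12.3.3.5: hold the read while an Ebox write to the same
    # quadword address is still pending.
    if any(quadword_address(w) == quadword_address(pa)
           for w in pending_write_pas):
        return False
    # Section 12.3.3.6: an I/O space read is held until the Ebox is
    # actually executing (waiting on) the corresponding instruction.
    if is_io_space(pa) and not ebox_waiting_on_read:
        return False
    return True
```

In this model a memory read of 0x1004 is held while a write to 0x1000 is pending, because both fall in the same quadword, whereas a read of 0x1008 is not.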
12.3.3.7 Reads to the same Pcache block as a pending read/fill operation must be inhibited

The organization of the Pcache is such that one address tag corresponds to four subblock valid bits. Therefore, the validated contents of all four subblocks must always correspond to the tag address. If two distinct Pcache fill operations are simultaneously filling the same Pcache block, it is possible for the fill data to be intermixed between the two fill operations. As a result, an IREAD to the same Pcache block as a pending D-stream read/fill is inhibited until the pending read/fill operation completes. Similarly, a D-stream read to the same Pcache block as a pending I-stream read/fill is also inhibited until the fill completes.

12.3.3.8 Writes to the same Pcache block as a pending read/fill operation must be inhibited until the read/fill operation completes

As in the above, this restriction is necessary in order to guarantee that all valid subblocks contain valid up-to-date data. Consider the following situation. The Mbox executes a write to an invalid subblock of a Pcache block which is currently being filled. One cycle later, the cache fill to that same subblock arrives at the Pcache. Thus, the latest subblock data, which came from the write, is overwritten by older cache fill data. This subblock is now marked valid with "old" data. To avoid this situation, writes to the same Pcache block as a pending read/fill operation are inhibited until the cache fill sequence completes.

12.3.4 REFERENCE ARBITRATION

The Mbox maintains seven different reference storage devices in S5. The purpose of these devices is to buffer pending references, which originate from different sections of the chip, until they can be processed by the Mbox.
In order to optimize performance of the NVAX pipeline, and to maintain functional correctness of reference processing in light of the Mbox hardware configuration and reference order restrictions, the Mbox services references from these queues in a prioritized fashion.

12.3.4.1 Arbitration Priority

During every Mbox cycle, the reference arbitration logic is responsible for determining which unserviced references should be processed next cycle. The reference sources are listed below from highest to lowest priority:

1. CBOX_LATCH
2. RTY_DMISS_LATCH
3. MME_LATCH
4. VAP_LATCH
5. EM_LATCH
6. SPEC_QUEUE
7. IREF_LATCH
8. nothing can be driven ==> Mbox drives a NOP command into S5

This prioritized scheme does not directly indicate which pending reference will be driven next, but instead indicates in what order the pending references should be tested to determine which one will be processed. Conceptually, the highest priority pending reference which satisfies all conditions for driving the reference is the one which is allowed to execute during the subsequent cycle.

The rationale behind this priority scheme can be explained as follows.

All references coming from the Cbox are always serviced as soon as they are available. Since Cbox references are guaranteed to complete in S5 in one cycle, we eliminate the need to queue up Cbox references and to provide a back-pressure mechanism to notify the Cbox to stop sending references.

A D-stream read reference in the RTY_DMISS_LATCH is guaranteed to have cleared all potential memory management problems. Therefore, any reference stored in this latch is the second consideration for processing.
If a reference related to memory management processing is pending in the MME_LATCH, it is given priority over the remaining four sources because the Mbox is designed to clear all memory management exceptions through the use of the MME_LATCH before normal processing can resume.

The VAP_LATCH stores the second reference of an unaligned reference pair. Since we desire to complete the entire unaligned reference before starting another reference, the VAP_LATCH has next highest priority in order to complete the unaligned sequence that was initiated from a reference of lesser priority.

The EM_LATCH stores references from the Ebox. It is given priority over the SPEC_QUEUE and IREF_LATCH sources because Ebox references are physically further along in the pipe than Ibox references. The presumed implication of this fact is that the Ebox has a more immediate need to satisfy its reference requests than the Ibox, since the Ebox is always performing real work and the Ibox is prefetching operands that may, in fact, never be used.

The SPEC_QUEUE stores Ibox operand references. It is next in line for consideration. The SPEC_QUEUE has priority over the IREF_LATCH because specifier references are again considered further along in the pipeline than I-stream prefetching.

If no other reference can currently be driven, the IREF_LATCH can drive an I-stream read reference in order to supply data to the Ibox.

If no reference can currently be driven into S5, the Mbox automatically drives a NOP command.

12.3.4.2 Arbitration Algorithm

Based on the priority scheme discussed above, the arbitration logic tests each reference to see whether it can be processed next cycle by evaluating the current state of the Mbox. The test associated with each latch is described below:

• CBOX_LATCH: Since Cbox references always want to be processed immediately, a validated CBOX_LATCH always causes the Cbox reference to be driven before all other pending references.
• RTY_DMISS_LATCH: A pending D-stream read reference will be driven from this latch once the final D_CF command has been retired from the S5 pipe.
• MME_LATCH: A pending MME reference will be driven when the contents of the MME_LATCH is validated.
• VAP_LATCH: A reference from the VAP_LATCH will be driven provided that the VAP_LATCH is validated.
• EM_LATCH: A reference from the EM_LATCH will be driven provided that the EM_LATCH is validated.
• SPEC_QUEUE: A validated reference in the SPEC_QUEUE will be driven provided that the SPEC_QUEUE has not been stopped due to explicit Ebox writes in progress (see Section 12.3.20).
• IREF_LATCH: A reference from the IREF_LATCH will be driven provided that the IREF_LATCH has not been stopped due to a pending READ_LOCK/WRITE_UNLOCK sequence (see Section 12.3.19.2).

If none of the conditions above are satisfied, the Mbox will drive a NOP command onto M_QUE%S5_CMD_H<4:0>, causing the S5 pipe to become idle.

12.3.5 READS

12.3.5.1 Generic Read-hit and Read-miss/Cache_fill Sequences

In order to orient the reader as to how memory reads are processed by the Mbox, this section will describe the "vanilla" read sequence. It does not discuss reads which TB_MISS, or otherwise are stalled for a variety of different reasons.

The byte mask generator generates the corresponding byte mask by looking at M_QUE%S5_VA_H<2:0> and M_QUE%S5_DL_H<1:0> and then drives the byte mask data onto M%S6_BYTE_MASK_H<7:0> during the subsequent cycle. Byte mask data is generated on a read operation in order to supply the byte alignment information to the Cbox on an I/O space read.

When a read reference is initiated in the S5 pipe, the address is translated by the TB (assuming the address was virtual) to a physical address during the first half of the S5 cycle.
The Pcache initiates a cache lookup sequence using this physical address during the second half of the S5 cycle. This cache access sequence overlaps into the following S6 cycle. During phase four of the S6 cycle, the Pcache determines whether the read reference is present in its array.

If the Pcache determined that the requested data is present, a "cache hit" or "read hit" condition occurs. In this event, the Pcache drives the requested data onto B%S6_DATA_H<63:0>. The signal M%CBOX_REF_ENABLE_L is de-asserted to inform the Cbox that it should not process the S6 read since the Mbox will supply the data from the Pcache.

If the Pcache determined that the requested data is not present, a "cache miss" or "read miss" condition occurs. In this event, the read reference is loaded into the IMISS_LATCH or DMISS_LATCH (depending on whether the read was I-stream or D-stream) and the Cbox is instructed to continue processing the read by the Mbox assertion of M%CBOX_REF_ENABLE_L. At some point later, the Cbox obtains the requested data. The Cbox will then send four quadwords of data using the I_CF (I-stream cache fill) or D_CF (D-stream cache fill) commands. The four cache fill commands together are used to fill the entire Pcache block corresponding to the hexaword read address. In the case of D-stream fills, one of the four cache fill commands will be qualified with C%REQ_DQW_H, indicating that this quadword fill contains the requested D-stream data corresponding to the quadword address of the read. When this fill is encountered, it will be used to supply the requested read data to the Mbox, Ibox and/or Ebox.

If, however, the physical address corresponding to the I_CF or D_CF command falls into I/O space, only one quadword fill is returned and the data is not cached in the Pcache. Only memory data is cached in the Pcache.

Each cache fill command sent to the Mbox is latched in the CBOX_LATCH. Note that neither the entire cache fill address nor the fill data are loaded into the CBOX_LATCH. The address in the IMISS_LATCH or DMISS_LATCH, together with two quadword alignment bits latched in the CBOX_LATCH, are used to create the quadword cache fill address when the cache fill command is executed in S5. When the fill operation propagates into S6, the Cbox drives the corresponding cache fill data onto B%S6_DATA_H<63:0> in order for the Pcache to perform the fill.

12.3.5.1.1 Returning Read Data

Data resulting from a read operation is driven on B%S6_DATA_H by the Pcache (in the cache hit case) or by the Cbox (in the cache miss case). This data is then driven on M%MD_BUS_H<63:0> by the MD_BUS_ROTATOR in right-justified form. The signals M%VIC_DATA_L, M%IBOX_DATA_L, M%IBOX_IPR_WR_H, M%EBOX_DATA_H, and M%MBOX_DATA are conditionally asserted with the data to indicate the destination(s) of the data.

12.3.5.1.1.1 Pcache Data Bypass

In order to return the requested read data to the Ibox and/or Ebox as soon as possible, the Cbox implements a Pcache Data Bypass mechanism. When this mechanism is invoked, the requested read data can be returned one cycle earlier than when the data is driven for the S6 cache fill operation.

The bypass mechanism works by having the Mbox inform the Cbox that the next S6 cycle will be idle, and thus the B%S6_DATA_H bus will be available to the Cbox. When the Cbox is informed of the S6 idle cycle, it drives the B%S6_DATA_H bus with the requested read data if read data is currently available (if no read data is available during a bypass cycle, the Cbox drives some indeterminate data and no valid data is bypassed). The read data is then formatted by the MD_BUS_ROTATOR and transferred onto the M%MD_BUS_H to be returned to the Ibox and/or Ebox, qualified by M%VIC_DATA_L, M%IBOX_DATA_L, and/or M%EBOX_DATA_H.
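The right-justification performed on read data, on both the normal and the bypass path, can be approximated by the following Python sketch. It is illustrative only and makes several assumptions not stated in this specification: the function name and arguments are hypothetical, the reference is assumed to lie entirely within one quadword, and bytes above the requested data length are simply cleared, whereas the actual MD_BUS_ROTATOR rotates the data and the contents of the unused upper bytes are not described here.

```python
def right_justify(qw_data: int, byte_offset: int, length_bytes: int) -> int:
    """Hypothetical model of right-justifying read data from a quadword.

    qw_data      -- 64-bit value as it would appear on B%S6_DATA_H<63:0>
    byte_offset  -- addr<2:0> of the reference within the quadword
    length_bytes -- data length of the reference (1, 2, 4, or 8)
    """
    if byte_offset + length_bytes > 8:
        raise ValueError("reference crosses the quadword boundary")
    mask = (1 << (8 * length_bytes)) - 1
    # Shift the addressed byte down to bit 0 and keep only the requested bytes.
    return (qw_data >> (8 * byte_offset)) & mask
```

For example, with the quadword 0x8877665544332211 (byte 0 = 0x11), a word read at byte offset 2 yields 0x4433 in this model.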
12.3.5.2 I-stream Read Processing

Memory access to all I-stream code is implemented by the Mbox on behalf of the Ibox. The Ibox uses the I-stream data to load its prefetch queue and to fill the VIC (Virtual Instruction Cache). When the Ibox requires I-stream data which is not stored in the prefetch queue or the VIC, the Ibox issues an I-stream read request which is latched by the IREF_LATCH. The Ibox address is always interpreted by the Mbox as being an aligned quadword address. Depending on whether the read hits or misses in the Pcache, the amount of data returned varies. The Ibox continually accepts I-stream data from the Mbox until the Mbox qualifies I-stream MD_BUS data with the M%LAST_FILL_H signal. M%LAST_FILL_H informs the Ibox that the current fill terminates the initial IREAD transaction.

12.3.5.2.1 I-stream Read Hits

When the requested data hits in the Pcache, the Mbox turns the IREF_LATCH reference into a series of I-stream reads to implement a VIC "fill forward" algorithm. The fill forward algorithm generates increasing quadword read addresses from the original address to the highest quadword address of the original hexaword address. In other words, the Mbox generates read references so that the hexaword VIC block corresponding to the original address is filled from the point of the request to the end of the block. The theory behind this fill forward scheme is that it only makes sense to supply I-stream data following the requested reference since I-stream execution causes monotonically increasing I-stream addresses (neglecting branches).

The fill forward scheme is implemented by the IREF_LATCH. Once the IREF_LATCH read completes in S5, the IREF_LATCH quadword address incrementor modifies the stored address of the IREF_LATCH so that its contents become the next quadword IREAD.
Once this "new" reference completes in S5, the next IREAD reference is generated. When the IREF_LATCH finally issues the IREAD corresponding to the highest quadword address of the hexaword address, the forward fill process is terminated by invalidating the IREF_LATCH.

12.3.5.2.2 I-stream Read Misses

The fill forward algorithm described above is always invoked upon receipt of an IREAD. However, when one of the IREADs is found to have missed in the Pcache, the subsequent IREAD references are flushed out of the S5 pipe and the IREF_LATCH. The missed IREAD causes the IMISS_LATCH to be loaded and the Cbox to continue processing the read. When the Cbox returns the resulting four quadwords of Pcache data, all four quadwords are transferred back to the Ibox qualified by M%VIC_DATA_L. This, in effect, results in a VIC "fill full" algorithm since the entire VIC block will be filled.

Fill full is done instead of fill forward because it costs little to implement. The Mbox must allocate a block of cycles to process the four cache fills; therefore, all the Pcache fill data can be shipped to the VIC with no extra cost in Mbox cycles since the M%MD_BUS_H would otherwise be idle during these fill cycles.

Note that the Ibox is unaware of what fill mode the Mbox is currently operating in. The VIC continues to fill I-stream data from the M%MD_BUS_H whenever M%VIC_DATA_L is asserted regardless of the Mbox fill mode. The Mbox asserts the M%LAST_FILL_H signal to the Ibox during the cycle in which the Mbox is driving the last I-stream fill to the Ibox. M%LAST_FILL_H informs the Ibox that it is receiving the final VIC fill this cycle and that it should not expect any more. In fill forward mode, the Mbox asserts M%LAST_FILL_H when the quadword alignment equals 11 (i.e., the uppermost quadword of the hexaword). In fill full mode, the Mbox receives the last fill information from the Cbox and transfers it to the Ibox through the M%LAST_FILL_H signal.
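The address sequence produced by the fill forward scheme of Section 12.3.5.2.1 can be sketched in Python as follows. This is a behavioral illustration only: the function name and the returned list of pairs are hypothetical, and the sketch assumes every IREAD in the chain hits in the Pcache (a miss would switch the Mbox to fill full, which is not modeled). A hexaword is 32 bytes, so the quadword alignment is taken from address bits <4:3>.

```python
def fill_forward_reads(addr: int) -> list[tuple[int, bool]]:
    """Hypothetical model of the IREF_LATCH fill forward sequence.

    Returns (quadword_address, last_fill) pairs, where last_fill mirrors the
    condition under which M%LAST_FILL_H is asserted in fill forward mode.
    """
    qw = addr & ~0x7  # the Ibox address is treated as an aligned quadword
    reads = []
    while True:
        # Quadword alignment 11 marks the uppermost quadword of the hexaword.
        last = ((qw >> 3) & 0x3) == 0x3
        reads.append((qw, last))
        if last:
            break
        qw += 8  # the quadword address incrementor forms the next IREAD
    return reads
```

A request at the start of a hexaword produces four reads in this model, while a request to the uppermost quadword produces a single read with the last-fill indication set.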
It is possible to start processing I-stream reads in fill forward mode, but then switch to fill full. This could occur because one of the references in the chain of fill forward IREADs misses due to a recent invalidate or due to displacement of Pcache I-stream data by a D-stream cache fill. In this case, the Ibox will receive more than four fills but will remain in synchronization with the Mbox because it continually expects to see fills until M%LAST_FILL_H is asserted.

12.3.5.2.3 I/O Space I-stream Reads

See Section 12.3.5.4.

12.3.5.3 D-stream Read Processing

Memory access to all D-stream references is implemented by the Mbox on behalf of the Ibox (for specifier processing), the Mbox (for PTE references), and the Ebox (for all other D-stream references).

In general, D-stream read processing behaves the same way as I-stream read processing except that there is no fill forward or fill full scheme. In other words, only the requested data is shipped to the initiator of the read. From the Pcache point of view, however, a D-stream fill full scheme is implemented since four D_CF commands are still issued to the Pcache.

D-stream reads can have a data length of byte, word, longword or quadword. With the exception of the cross-page check function, a quadword read is treated as if its data length were a longword. Thus a D-stream quadword read returns the lower half of the referenced quadword. The source of most D-stream quadword reads is the Ibox. The Ibox will issue a D-stream longword read to the upper half of the referenced quadword immediately after issuing the quadword read. Thus, the entire quadword of data is accessed by two back-to-back D-stream read operations.

A DREAD_LOCK command always forces a Pcache read miss sequence regardless of whether the referenced data was actually stored in the Pcache.
This is necessary in order that the read propagate out to the Cbox so that the memory lock/unlock protocols can be properly processed.

12.3.5.3.1 Reads under Fills

The Mbox will attempt to process a DREAD after the requested fill of a previous D-stream fill sequence has completed. This mechanism, called "reads under fills", is done to try to return read data to the Ibox and/or Ebox as quickly as possible, without having to wait for the previous fill sequence to complete. If the attempted read hits in the Pcache, the data is returned and the read completes. If the read misses in the S6 pipe, the corresponding fill sequence is not immediately initiated for two reasons:

• A D-stream cache fill sequence for this read cannot be started because the DMISS_LATCH is full corresponding to the currently outstanding cache fill sequence.
• The D-stream read may hit in the Pcache once the current fill sequence completes because the current fill sequence may supply the data necessary to satisfy the new D-stream read.

Because this DREAD has already propagated through the S5 pipe, the read must be stored somewhere in order that it can be restarted in S5. The RTY_DMISS_LATCH is the mechanism by which the S6 read is saved and restarted in the S5 pipe. Once the read is stored in the RTY_DMISS_LATCH, it will be retried in S5 after the final D_CF reference is retired from S5 (the final D_CF completes the previous D-stream fill sequence). The RTY_DMISS_LATCH is invalidated when the retried reference is retired from S5.

12.3.5.4 I/O Space Reads

I/O space reads are defined as reads which address I/O space. Therefore, a read is an I/O read when the physical address bits, addr<31:29>, are set. I/O space reads are treated by the Mbox in exactly the same way as any other read, except for the following differences:

• I/O space data is never cached in the Pcache. Therefore, an I/O space read always generates a read-miss sequence and causes the Cbox to process the reference.
• Unlike a memory space miss sequence, which returns a hexaword of data via four I_CF or D_CF commands, an I/O space read returns only one piece of data via one I_CF or D_CF command. Thus the Cbox always asserts C%LAST_FILL_H on the first and only I_CF or D_CF I/O space operation. If the I/O space read is D-stream, the returned D_CF data is always less than or equal to a longword in length.
• I/O space D-stream reads are never prefetched ahead of Ebox execution. An I/O space D-stream read issued from the Ibox is only processed when the Ebox is known to be stalling on that particular I/O space read (see Section 12.3.18.1.1).

NVAX RESTRICTION
I-stream I/O space reads must return a quadword of data. Execution of an I-stream I/O space read which does not return a quadword of data is unpredictable.

12.3.6 WRITES

All writes are initiated by the Mbox on behalf of the Ebox. The Ebox microcode is capable of generating write references with data lengths of byte, word, longword, or quadword. With the exception of cross-page checks (see Section 12.5.1.5.4), the Mbox treats quadword write references as longword write references because the Ebox datapath only supplies a longword of data per cycle. Ebox writes can be unaligned.

The Mbox performs the following functions during a write reference:

• Memory Management checks: The Mbox checks to be sure the page or pages referenced have the appropriate write access and that the valid virtual address translations are available (see Section 12.5).
• The supplied data is properly rotated to the memory aligned longword boundary.
• Byte Mask Generation: The Mbox generates the byte mask of the write reference by examining the write address and the data length of the reference.
• Pcache writes: The Pcache is a write-through cache.
Therefore, writes are only written into the Pcache if the write address matches a validated Pcache tag entry. The one exception to this rule is when the Pcache is configured in force D-stream hit mode. In this mode, the data is always written to the Pcache regardless of whether the tag matches or mismatches.
• All write references which pass memory management checks are transferred to the Cbox via B%S6_DATA_H<63:0>. The Cbox is responsible for processing writes in the Bcache and for controlling the protocols related to the write-back memory subsystem.

When write data is latched in the EM_LATCH, the 4-way byte barrel shifter associated with the EM_LATCH rotates the EM_LATCH data into proper alignment based on the lower two bits of the corresponding address. The diagram below illustrates the barrel shifter function:

Figure 12-15: Barrel Shifter Function

                                        +-----+-----+-----+-----+
original 4 bytes of Ebox data           |  4  |  3  |  2  |  1  |
                                        +-----+-----+-----+-----+

barrel shifter output when              +-----+-----+-----+-----+
M_QUE%S5_VA_H<1:0> = 01                 |  3  |  2  |  1  |  4  |
                                        +-----+-----+-----+-----+

barrel shifter output when              +-----+-----+-----+-----+
M_QUE%S5_VA_H<1:0> = 10                 |  2  |  1  |  4  |  3  |
                                        +-----+-----+-----+-----+

barrel shifter output when              +-----+-----+-----+-----+
M_QUE%S5_VA_H<1:0> = 11                 |  1  |  4  |  3  |  2  |
                                        +-----+-----+-----+-----+

The result of this data rotation is that all bytes of data are now in the correct byte positions relative to memory longword boundaries. When write data is driven from the EM_LATCH, M_QUE%S5_DATA_H<31:0> is driven by the output of the barrel shifter so that data will always be properly aligned to memory longword addresses.

Note that, while the M_QUE%S5_DATA_H bus is a longword wide, the B%S6_DATA_H bus is a quadword wide. B%S6_DATA_H is a quadword wide due to the quadword Pcache access size. The quadword access size facilitates Pcache and VIC fills. However, for all writes, at most half of B%S6_DATA_H<63:0> is ever used to write the Pcache since all write commands modify a longword or less of data.
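The rotation shown in Figure 12-15 is equivalent to a left byte rotation of the longword by the value of the lower two address bits. The following Python sketch models that behavior; it is illustrative only, the function name is hypothetical, and it assumes that byte 1 in the figure is the least significant byte of the Ebox longword.

```python
def barrel_shift(data: int, va_low2: int) -> int:
    """Hypothetical model of the 4-way byte barrel shifter.

    data    -- 32-bit longword supplied by the Ebox
    va_low2 -- lower two address bits, as on M_QUE%S5_VA_H<1:0>
    """
    data &= 0xFFFFFFFF
    shift = 8 * (va_low2 & 0x3)
    # Rotate left by whole bytes; a rotation of 0 leaves the data unchanged.
    return ((data << shift) | (data >> (32 - shift))) & 0xFFFFFFFF
```

With the input 0x04030201, the model returns 0x03020104, 0x02010403, and 0x01040302 for address bits 01, 10, and 11, matching the byte orders drawn in the figure.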
When a write reference propagates from S5 to S6, the longword aligned data on M_QUE%S5_DATA_H<31:0> is transferred onto both the upper and lower halves of B%S6_DATA_H<63:0> to guarantee that the data is also quadword aligned to the Pcache and Cbox. The byte mask corresponding to the reference will control which bytes of B%S6_DATA_H<63:0> actually get written into the Pcache or Bcache.

Write references are formed through two distinct mechanisms described below.

12.3.6.1 Destination Specifier Writes

Destination specifier writes are those writes which are initiated by the Ibox upon decoding a destination specifier of an instruction. When a destination specifier to memory is decoded, the Ibox issues a reference packet corresponding to the destination address. Note that no data is present in this packet because the data is generated when the Ebox subsequently executes the instruction. The command field of this packet is either a DEST_ADDR command (when the specifier had access type of write) or a DREAD_MODIFY command (when the specifier had access type of modify).

The address of this command packet is translated by the TB, memory management access checks are performed, and the corresponding byte mask is generated. The physical address, DL and other qualifier bits are loaded into the PA_QUEUE. When the DEST_ADDR command completes in S5, it is turned into a NOP command in S6 because no further processing can take place without the actual write data.

When the Ebox executes the opcode corresponding to the Ibox destination specifier, the corresponding memory data to be written is generated. This data is sent to the Mbox by a STORE command. The STORE packet contains only data. When the Mbox executes the STORE command in S5, the corresponding PA_QUEUE packet is driven into the S5 pipe.
The data in the EM_LATCH is rotated into proper longword alignment using the byte rotator and the lower two bits of the corresponding PA_QUEUE address and is then driven into S5. In effect, the DEST_ADDR and STORE commands are merged together to form a complete physical address WRITE operation. This WRITE operation propagates through the S5/S6 pipeline to perform the write in the Pcache (if the address hits in the Pcache) and in the memory subsystem.

12.3.6.2 Explicit Writes

The term explicit writes defines writes generated solely by the Ebox. That is, writes which do not result from the Ibox decoding a destination specifier but rather writes which are explicitly initiated and fully generated by the Ebox. An example of an explicit write is a write performed during a Move instruction. In this example, the Ebox generates the virtual write address of every write as well as supplying the corresponding data. The PA_QUEUE is never involved in processing an explicit write.

Explicit writes are transferred to the Mbox in the form of a WRITE command issued by the Ebox. These writes directly execute in S5 and S6 in the same manner as when a write packet is formed from the PA_QUEUE contents and the STORE data.

12.3.6.3 Writes to I/O Space

An I/O space write is defined as a write command which addresses I/O space. Therefore, a write is an I/O space write when the physical address bits, addr<31:29>, are set. I/O space writes are treated by the Mbox in exactly the same way as any other write, except for the following difference:

• I/O space data is never cached in the Pcache; therefore, an I/O space write always misses in the Pcache.

12.3.6.4 Byte Mask Generation

Since memory is byte-addressable, all memory storage devices must be able to selectively write specified bytes of data without writing the entire set of bytes made available to the storage device.
The byte mask field of a write reference packet specifies which bytes within the quadword Pcache access size get written. The byte mask is generated in the Mbox by the byte mask generation logic based on M_QUE%S5_VA_H<2:0> and the data length of the reference. Byte mask data is generated on a read as well as a write in order to supply the byte alignment information to the Cbox on an I/O space read.

The following table illustrates the behavior of the byte mask generator for all aligned reads and writes:

12-36 The Mbox DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991

Table 12-2: Byte Mask Logic for Aligned References

addr<2:0>   BM (DL=byte)   BM (DL=word)   BM (DL=long)   BM (DL=quad)
---------   ------------   ------------   ------------   ------------
000         00000001       00000011       00001111       00001111
001         00000010       00000110       00011110       00011110
010         00000100       00001100       00111100       00111100
011         00001000       00011000       01111000       01111000
100         00010000       00110000       11110000       11110000
101         00100000       01100000       unaligned      unaligned
110         01000000       11000000       unaligned      unaligned
111         10000000       unaligned      unaligned      unaligned

See Section 12.3.17.3 for a description of the byte mask generator for unaligned references.
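Table 12-2 can be reproduced by a short Python sketch. It is offered as an illustration of the aligned cases only: the function name, the string data-length codes, and the use of None to stand for the "unaligned" entries are assumptions made for this example, and the handling of unaligned references (Section 12.3.17.3) is not modeled. A quadword reference is given a four-byte mask because the Mbox treats quadword references as longword references.

```python
from typing import Optional

# Number of mask bits for each data length; quad is handled as a longword.
DL_BYTES = {"byte": 1, "word": 2, "long": 4, "quad": 4}


def byte_mask(addr_low3: int, dl: str) -> Optional[int]:
    """Hypothetical model of the aligned-reference byte mask of Table 12-2.

    addr_low3 -- addr<2:0> of the reference
    dl        -- data length: "byte", "word", "long", or "quad"
    Returns the 8-bit mask, or None where the table shows "unaligned".
    """
    n = DL_BYTES[dl]
    if addr_low3 + n > 8:
        return None  # the reference would spill out of the quadword
    # n consecutive mask bits starting at the addressed byte.
    return ((1 << n) - 1) << addr_low3
```

For instance, format(byte_mask(1, "long"), "08b") gives "00011110", the entry in the DL=long column for addr<2:0> = 001.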
12.3.7 IPR PROCESSING

12.3.7.1 MBOX IPRs

The Mbox maintains the following internal processor registers:

Table 12-3: Mbox IPRs

Register Name                                                        IPR Address (in hex)
-------------------------------------------------------------------  --------------------
MP0BR (Mbox P0 Base Register) [1]                                    E0
MP0LR (Mbox P0 Length Register) [1]                                  E1
MP1BR (Mbox P1 Base Register) [1]                                    E2
MP1LR (Mbox P1 Length Register) [1]                                  E3
MSBR (Mbox System Base Register) [1]                                 E4
MSLR (Mbox System Length Register) [1]                               E5
MMAPEN (Map Enable Bit) [1]                                          E6
PAMODE (Address Mode)                                                E7
MMEADR (MME Faulting Address Register) [1]                           E8
MMEPTE (PTE Address Register) [1]                                    E9
MMESTS (status of memory management exception) [1]                   EA
TBADR (address of reference causing TB parity error)                 EC
TBSTS (status of TB parity error)                                    ED
PCADR (address of reference causing Pcache parity error)             F2
PCSTS (status of Pcache parity error and PTE hard errors)            F4
PCCTL (control state of Pcache operation)                            F8
PCTAG                                                                01800000..01801FE[illegible]
PCDAP                                                                01C00000..01C01FF

[1] Testability and diagnostic use only; not for software use in normal operation.

The first thirteen IPRs listed above (memory management IPRs) are stored in the S5 pipe in the register file of the MME_DATAPATH. All other IPRs are stored in the S6 pipe. Note that when an Mbox IPR, other than a Pcache tag, is addressed, the actual IPR address is received on M_QUE%S5_VA_H<9:2> (the table above is written such that all addresses start at bit<0>).

The following is the format description of each Mbox IPR. Each format illustrates the format visible at the programmer level. The formats do not necessarily illustrate the internal hardware storage format.
Figure 12-16: IPR E0 (hex), MP0BR

  Bits     Contents
  <31>     1
  <30>     0
  <29:9>   system virtual page address of P0 page table
  <8:0>    0

Figure 12-17: IPR E1 (hex), MP0LR

  Bits     Contents
  <31:22>  0
  <21:0>   length of P0 page table in longwords

Figure 12-15: Barrel Shifter Function

  original 4 bytes of Ebox data                         | 4 | 3 | 2 | 1 |
  barrel shifter output when M_QUE%S5_VA_H<1:0> = 01    | 3 | 2 | 1 | 4 |
  barrel shifter output when M_QUE%S5_VA_H<1:0> = 10    | 2 | 1 | 4 | 3 |
  barrel shifter output when M_QUE%S5_VA_H<1:0> = 11    | 1 | 4 | 3 | 2 |

The result of this data rotation is that all bytes of data are now in the correct byte positions relative to memory longword boundaries. When write data is driven from the EM_LATCH, M_QUE%S5_DATA_H<31:0> is driven by the output of the barrel shifter so that data will always be properly aligned to memory longword addresses.

Note that, while the M_QUE%S5_DATA_H bus is a longword wide, the B%S6_DATA_H bus is a quadword wide. B%S6_DATA_H is a quadword wide due to the quadword Pcache access size.
The quadword access size facilitates Pcache and VIC fills. However, for all writes, at most half of B%S6_DATA_H<63:0> is ever used to write the Pcache since all write commands modify a longword or less of data. When a write reference propagates from S5 to S6, the longword aligned data on M_QUE%S5_DATA_H<31:0> is transferred onto both the upper and lower halves of B%S6_DATA_H<63:0> to guarantee that the data is also quadword aligned to the Pcache and Cbox. The byte mask corresponding to the reference will control which bytes of B%S6_DATA_H<63:0> actually get written into the Pcache or Bcache.

Write references are formed through two distinct mechanisms described below.

12.3.6.1 Destination Specifier Writes

Destination specifier writes are those writes which are initiated by the Ibox upon decoding a destination specifier of an instruction. When a destination specifier to memory is decoded, the Ibox issues a reference packet corresponding to the destination address. Note that no data is present in this packet because the data is generated when the Ebox subsequently executes the instruction. The command field of this packet is either a DEST_ADDR command (when the specifier had access type of write) or a DREAD_MODIFY command (when the specifier had access type of modify).

The address of this command packet is translated by the TB, memory management access checks are performed, and the corresponding byte mask is generated. The physical address, DL and other qualifier bits are loaded into the PA_QUEUE. When the DEST_ADDR command completes in S5, it is turned into a NOP command in S6 because no further processing can take place without the actual write data.

When the Ebox executes the opcode corresponding to the Ibox destination specifier, the corresponding memory data to be written is generated. This data is sent to the Mbox by a STORE command.
The STORE packet contains only data. When the Mbox executes the STORE command in S5, the corresponding PA_QUEUE packet is driven into the S5 pipe. The data in the EM_LATCH is rotated into proper longword alignment using the byte rotator and the lower two bits of the corresponding PA_QUEUE address, and is then driven into S5. In effect, the DEST_ADDR and STORE commands are merged together to form a complete physical address WRITE operation. This WRITE operation propagates through the S5/S6 pipeline to perform the write in the Pcache (if the address hits in the Pcache) and in the memory subsystem.

12.3.6.2 Explicit Writes

The term explicit writes defines writes generated solely by the Ebox. That is, writes which do not result from the Ibox decoding a destination specifier but rather writes which are explicitly initiated and fully generated by the Ebox. An example of an explicit write is a write performed during a Move instruction. In this example, the Ebox generates the virtual write address of every write as well as supplying the corresponding data. The PA_QUEUE is never involved in processing an explicit write.

Explicit writes are transferred to the Mbox in the form of a WRITE command issued by the Ebox. These writes directly execute in S5 and S6 in the same manner as when a write packet is formed from the PA_QUEUE contents and the STORE data.

12.3.6.3 Writes to I/O Space

I/O space writes are defined as a write command which addresses I/O space. Therefore, a write is an I/O space write when the physical address bits, addr<31:29>, are set. I/O space writes are treated by the Mbox in exactly the same way as any other write, except for the following differences:

• I/O space data is never cached in the Pcache; therefore, an I/O space write always misses in the Pcache.
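The write data path described in this section can be summarized in a short Python sketch: rotate the longword by the low two address bits (following the byte order shown in Figure 12-15), copy the rotated longword onto both halves of the quadword bus, and test addr<31:29> for I/O space. The function names are hypothetical and the model ignores all timing and pipeline detail.

```python
# Hypothetical software model of the write data alignment steps.
LONGWORD_MASK = 0xFFFFFFFF


def rotate_longword(data, va_low2):
    """Rotate a 32-bit value left by va_low2 bytes, as in Figure 12-15."""
    shift = 8 * (va_low2 & 0b11)
    data &= LONGWORD_MASK
    return ((data << shift) | (data >> (32 - shift))) & LONGWORD_MASK


def replicate_to_quadword(longword):
    """Place the same longword on the upper and lower halves of a 64-bit bus."""
    longword &= LONGWORD_MASK
    return (longword << 32) | longword


def is_io_space(physical_address):
    """A reference addresses I/O space when addr<31:29> are all set."""
    return ((physical_address >> 29) & 0b111) == 0b111


if __name__ == "__main__":
    ebox_data = 0x04030201  # bytes labelled 4, 3, 2, 1 as in the figure
    for offset in range(4):
        rotated = rotate_longword(ebox_data, offset)
        print(offset, format(rotated, "08x"),
              format(replicate_to_quadword(rotated), "016x"))
```

The byte mask, not the data replication, then decides which of the eight byte lanes are actually written.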
12.3.6.4 Byte Mask Generation

Since memory is byte-addressable, all memory storage devices must be able to selectively write specified bytes of data without writing the entire set of bytes made available to the storage device. The byte mask field of a write reference packet specifies which bytes within the quadword Pcache access size get written. The byte mask is generated in the Mbox by the byte mask generation logic based on M_QUE%S5_VA_H<2:0> and the data length of the reference. Byte mask data is generated on a read as well as a write in order to supply the byte alignment information to the Cbox on an I/O space read. Table 12-2 illustrates the behavior of the byte mask generator for all aligned reads and writes.

Figure 12-18: IPR E2 (hex), MP1BR

  Bits     Contents
  <31>     1
  <30>     0
  <29:9>   system virtual page address of P1 page table
  <8:0>    0

Figure 12-19: IPR E3 (hex), MP1LR

  Bits     Contents
  <31:22>  0
  <21:0>   (2**21) - length of P1 page table in longwords

Figure 12-20: IPR E4 (hex), MSBR

  Bits     Contents
  <31:9>   physical page address of system page table
  <8:0>    0
Figure 12-21: IPR E5 (hex), MSLR

  Bits     Contents
  <31:22>  0
  <21:0>   length of system page table in longwords

Figure 12-22: IPR E6 (hex), MMAPEN

  Bits     Contents
  <31:1>   0
  <0>      M

Table 12-4: MMAPEN Field Descriptions

Name    Extent  Type  Description
M       0       RW    When 0, disables Mbox memory management. When 1, enables Mbox memory management.

Figure 12-23: IPR E7 (hex), PAMODE

  Bits     Contents
  <31:1>   0
  <0>      MODE

Table 12-5: PAMODE Field Descriptions

Name    Extent  Type  Description
MODE    0       RW    When 0, maps addresses from a 30-bit physical address space. When 1, maps addresses from a 32-bit physical address space.

Figure 12-24: IPR E8 (hex), MMEADR

  Bits     Contents
  <31:0>   address associated with recorded MME fault

Figure 12-25: IPR E9 (hex), MMEPTE

  Bits     Contents
  <31:0>   PTE address associated with an address corresponding to a modify fault

Figure 12-26: IPR EA (hex), MMESTS

  Bits     Contents
  <31:29>  LOCK
  <28:26>  SRC
  <25:16>  0
  <15:14>  FAULT
  <13:3>   0
  <2>      M
  <1>      (see Table 12-6)
  <0>      LV

Table 12-6: MMESTS Field Descriptions

Name    Extent  Type  Description
LV      0       RO    Indicates ACV fault occurred due to length violation.
        1       RO    Indicates ACV/TNV fault occurred on PTE reference corresponding to MMEADR.
M       2       RO    Indicates corresponding reference had write or modify intent.
FAULT   15:14   RO    Indicates nature of memory management fault. See FAULT bit encodings below.
SRC     28:26   RO    Complemented shadow copy of LOCK bits. However, the SRC bits do not get reset when the LOCK bits are cleared.
LOCK    31:29   RO,0  Indicates the lock status of MMESTS. See LOCK encodings below. This field is cleared on E%FLUSH_MBOX_H.

See Section 12.5.1.5.3.5 for information on how these fields are encoded.

Figure 12-27: IPR EC (hex), TBADR

  Bits     Contents
  <31:0>   virtual address associated with the recorded TB parity error

Figure 12-28: IPR ED (hex), TBSTS

  Bits     Contents
  <31:29>  SRC
  <28:9>   0
  <8:4>    CMD
  <3>      EM_VAL
  <2>      TPERR
  <1>      DPERR
  <0>      LOCK

Table 12-7: TBSTS Field Descriptions

Name    Extent  Type  Description
LOCK    0       WC    Lock Bit. When set, validates TBSTS contents and prevents any other field from further modification. When clear, indicates that no TB parity error has been recorded and allows TBSTS and TBADR to be updated.
DPERR   1       RO    Data Error Bit. When set, indicates a TB data parity error.
TPERR   2       RO    Tag Error Bit. When set, indicates a TB tag parity error.
EM_VAL  3       RO    EM_LATCH valid bit. Indicates if EM_LATCH was valid at the time of the TB parity error detection. This helps the software error handler determine if a write operation may have been lost due to the TB parity error.
CMD     8:4     RO    S5 command corresponding to TB parity error.
SRC     31:29   RO    Indicates the original source of the reference causing TB parity error.

See Section 12.6.4.1 for information on how these fields are encoded.
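A status register such as TBSTS can be unpacked in software by applying the field extents of Table 12-7 to the 32-bit value returned by an IPR read. The sketch below is only illustrative: `TBSTS_FIELDS` and `decode_fields` are hypothetical names, and the CMD and SRC values are returned as raw numbers because their encodings are defined in Section 12.6.4.1.

```python
# Field extents (high bit, low bit) taken from Table 12-7.
TBSTS_FIELDS = {
    "LOCK": (0, 0),
    "DPERR": (1, 1),
    "TPERR": (2, 2),
    "EM_VAL": (3, 3),
    "CMD": (8, 4),
    "SRC": (31, 29),
}


def decode_fields(value, fields):
    """Split a 32-bit register value into named fields."""
    decoded = {}
    for name, (high, low) in fields.items():
        width = high - low + 1
        decoded[name] = (value >> low) & ((1 << width) - 1)
    return decoded


if __name__ == "__main__":
    # Made-up example value: SRC=5, CMD=19, TPERR and LOCK set.
    sample = (0b101 << 29) | (0b10011 << 4) | 0b0101
    print(decode_fields(sample, TBSTS_FIELDS))
```

The same helper works for the other status registers if given their extents from the corresponding field description tables.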
Figure 12-29: IPR F2 (hex), PCADR

  Bits     Contents
  <31:3>   quadword physical address associated with the recorded Pcache parity error
  <2:0>    0

Figure 12-30: IPR F4 (hex), PCSTS

  Bits     Contents
  <31:11>  1
  <10>     PTE_ER
  <9>      PTE_ER_WR
  <8:4>    CMD
  <3>      LEFT_BANK
  <2>      RIGHT_BANK
  <1>      DPERR
  <0>      LOCK

Table 12-8: PCSTS Field Descriptions

Name        Extent  Type  Description
LOCK        0       WC    Lock Bit. When set, validates PCSTS<8:1> contents and prevents modification of these fields. When clear, invalidates PCSTS<8:1> and allows these fields and PCADR to be updated.
DPERR       1       RO    Data Error Bit. When set, indicates a Pcache data parity error.
RIGHT_BANK  2       RO    Right Bank Tag Error Bit. When set, indicates a Pcache tag parity error on the right bank.
LEFT_BANK   3       RO    Left Bank Tag Error Bit. When set, indicates a Pcache tag parity error on the left bank.
CMD         8:4     RO    S6 command corresponding to Pcache parity error.
PTE_ER_WR   9       WC    Indicates a hard error on a PTE DREAD which resulted from a TB miss on a WRITE or WRITE_UNLOCK.
PTE_ER      10      WC    Indicates a hard error on a PTE DREAD.
Note that the state of PCSTS<31:11> are "don't cares" during an IPR write operation.

Figure 12-31: IPR F8 (hex), PCCTL

  Bits     Contents
  <31:10>  1
  <9>      RED_ENABLE
  <8>      ELEC_DISABLE
  <7:5>    PMM
  <4>      P_ENABLE
  <3>      BANK_SEL
  <2>      FORCE_HIT
  <1>      I_ENABLE
  <0>      D_ENABLE

Table 12-9: PCCTL Field Descriptions

Name          Extent  Type  Description
D_ENABLE      0       RW,0  When set, enables Pcache for all INVAL operations and for all D-stream read/write/fill operations, qualified by other control bits. When clear, forces a Pcache miss on all Pcache D-stream read/write/fill operations. Note, however, that an ACV/TNV/M=0 condition overrides a deasserted D_ENABLE in that it will force a Pcache hit condition with D_ENABLE=0.
I_ENABLE      1       RW,0  When set, enables Pcache processing of INVAL, IREAD and I_CF commands. When clear, forces a Pcache miss on IREAD operations and prevents state modification due to an I_CF operation. Note, however, that an ACV/TNV/M=0 condition overrides a deasserted I_ENABLE in that it will force a Pcache hit condition with I_ENABLE=0.
FORCE_HIT     2       RW,0  When set, forces a Pcache hit on all reads and writes when Pcache is enabled for I or D-stream operation.
BANK_SEL      3       RW,0  When set with FORCE_HIT=1, selects the "right bank" of the addressed Pcache index. When clear with FORCE_HIT=1, selects the "left bank" of the addressed Pcache index. BANK_SEL is a don't care when FORCE_HIT=0. NOTE: BANK_SEL never affects bank selection during IPR reads and IPR writes to the Pcache tags or Pcache data parity bits; bank selection for these commands is always determined by the specified IPR address.
P_ENABLE      4       RW,0  When set, enables detection of Pcache tag and data parity errors. When deasserted, disables Pcache parity error detection.
PMM           7:5     RW,0  Specifies Mbox performance monitor mode (see Section 12.10). Note that this field does not control or affect the operation of the Pcache in any way. PMM is placed in PCCTL for the convenience of the hardware implementation.
ELEC_DISABLE  8       RW,0  When set, the Pcache is disabled electrically to reduce power dissipation. NOTE: This bit should only be set when the Pcache is functionally turned off by the deassertion of both I_ENABLE and D_ENABLE. UNPREDICTABLE operation will result when this bit is set when either I_ENABLE or D_ENABLE is also set. Also note that Pcache tag or parity IPRs will not function properly when this bit is unconditionally set.
RED_ENABLE    9       RO    When set, indicates that one or more Pcache redundancy elements are enabled (see Section 12.4.11 for more information).

Note that the state of PCCTL<31:10> are "don't cares" during an IPR write operation.

Figure 12-32: IPRs 01800000 thru 01801FE0 (hex), PCTAG

  Bits     Contents
  <31:12>  tag
  <11:6>   1
  <5>      P
  <4:1>    valid bits
  <0>      A

Table 12-10: PCTAG Field Descriptions

Name        Extent  Type  Description
A           0       RW    Allocation Bit corresponding to index of this tag.
valid bits  4:1     RW    Valid Bits corresponding to the four data subblocks. PCTAG<4> corresponds to uppermost quadword in block. PCTAG<1> corresponds to lowermost quadword in block.
P           5       RW    Even Tag Parity
tag         31:12   RW    Tag Data

Note that the state of PCTAG<11:6> are "don't cares" during an IPR write operation.

Figure 12-33: IPRs 01C00000 thru 01C01FF8 (hex), PCDAP

  Bits     Contents
  <31:8>   1
  <7:0>    DATA_PARITY

Table 12-11: PCDAP Field Descriptions

Name         Extent  Type  Description
DATA_PARITY  7:0     RW    Even byte parity corresponding to addressed quadword of data. Bit n represents parity for byte n of addressed quadword.

Note that the state of PCDAP<31:8> are "don't cares" during an IPR write operation.

12.3.7.2 Hardware MBOX IPR Format

The IPR formats listed above reflect the formats used by the programmer to execute IPR read and write operations. However, due to the specific structure of the Mbox memory management datapath, four memory management registers are internally stored in a different format in order to facilitate all length violation checks and P1 space PTE calculations.
The following describes the hardware formats of these registers:

Figure 12-34: MP0LR Register

  Bits     Contents
  <31>     0
  <30:9>   length of P0 page table in longwords
  <8:0>    0

Figure 12-35: MP1LR Register

  Bits     Contents
  <31:9>   ((2**21) - length of P1 page table in longwords) + lr_bias
  <8:0>    0

Figure 12-36: MSLR Register

  Bits     Contents
  <31>     0
  <30:9>   length of system page table in longwords
  <8:0>    0

The re-formatting operation necessary to convert the program-level format to the hardware-level format is handled by microcode. When IPR writes are done to these registers, the microcode shifts the length register data 9 bits to the left before delivering the IPR_WRITE reference to the Mbox. In the MP1LR case, the microcode adds a bias value to the data following the shift operation. This is done in order to compensate for the "1" which will occur in virtual_addr<30> position during length check subtraction operations for all P1 space virtual references.
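The conversion between the program-level and hardware-level length register formats amounts to a 9-bit shift plus, for MP1LR only, a bias. The following Python sketch shows the forward and reverse operations under stated assumptions: the function names are hypothetical, the arithmetic is modeled as 32-bit wraparound, and the bias is passed in as a parameter because its actual value is not given in this section.

```python
# Hypothetical model of the microcode length register re-formatting.
LONGWORD_MASK = 0xFFFFFFFF


def to_hardware_length(program_value, bias=0):
    """Shift the program-level length left 9 bits, then add the bias
    (bias is nonzero only in the MP1LR case)."""
    return ((program_value << 9) + bias) & LONGWORD_MASK


def to_program_length(hardware_value, bias=0):
    """Reverse the conversion for an IPR read: remove the bias, shift right."""
    return ((hardware_value - bias) & LONGWORD_MASK) >> 9


if __name__ == "__main__":
    example_bias = 0x00400000  # placeholder only, not the real lr_bias
    hw = to_hardware_length(0x1234, example_bias)
    print(format(hw, "08x"), format(to_program_length(hw, example_bias), "x"))
```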
The microcode reverses the format operation to convert the Mbox IPR data back into the program-level format during MxLR IPR_READ operations.

The hardware format for MP1BR is shown below:

Figure 12-37: MP1BR Register

  Bits     Contents
  <31:9>   system virtual page address of P1 page table - br_bias
  <8:0>    0

Before sending the IPR_WRITE data to the Mbox, the microcode subtracts a different bias value from the P1 space base register. This is done in order to compensate for the "1" which will occur in virtual_addr<30> position during P1 space PTE address calculations. The microcode reverses this format operation to convert the Mbox IPR data back into the program-level format during MP1BR IPR_READ operations.

12.3.7.3 IPR Reads

IPR reads (internal processor register reads) are issued to the Mbox by the Ebox using the IPR_RD command. The Ebox issues an IPR_RD in order to obtain the contents of an NVAX internal processor register existing somewhere in the system other than the Ebox.

12.3.7.3.1 Mbox IPR Reads

When the Ebox issues an IPR_RD to an Mbox S5 IPR, the MME_DATAPATH will respond by accessing the appropriate register and loading it into the data field of the MME_LATCH. The MME_LATCH is then validated with an IPR_DATA command. Subsequently, the IPR_DATA command will execute in the Mbox pipe by passing the requested IPR data back to the Ebox on M%MD_BUS_H<31:0>, qualified by M%EBOX_DATA_H. All Mbox S6 IPRs return their data directly on M%MD_BUS_H<31:0>, qualified by M%EBOX_DATA_H, during the S6 execution of the IPR_RD command.
Any IPR address in the range E0-FF which is not specified above is called a reserved Mbox IPR (reserved for any future Mbox IPR functional requirements). An IPR_RD to a reserved Mbox IPR will cause the assertion of M%EBOX_DATA_H in order to unstall the Ebox, which is waiting for IPR data to be returned. Note, however, that the returned data is UNPREDICTABLE.

12.3.7.3.2 Non-Mbox IPR Reads

The Ebox will issue an IPR_RD command to the Mbox to access IPRs existing in other sections of the NVAX computer system. Specifically, IPR_RD commands are issued to address IPRs in the Ibox, Cbox, NDAL and memory subsystem.

IPR_RDs to the Ibox (IPR addresses D0-DF) are treated as NOPs. That is, execution of an Ibox IPR_RD command performs no Mbox function and does not modify any Mbox state. This behavior facilitates the Ebox microcode decode of IPR commands by allowing Ibox IPR_RDs to be issued to the Mbox even though the Mbox does not play a role in returning Ibox IPR data.

IPR_RDs which do not address the Ibox or the Mbox are transferred to the Cbox for further processing by asserting M%CBOX_REF_ENABLE_L when the IPR_RD is in S6. These IPR_RDs are handled by the Mbox in a manner similar to a DREAD which misses in the Pcache. The IPR_RD command is loaded into the DMISS_LATCH as the command is transferred to the Cbox. DMISS_LATCH state is set to indicate that the reference is not cacheable. Subsequently, the Cbox responds to the IPR_RD by sending back the requested data via one D_CF command. The IPR_RD sequence is similar to an I/O space READ miss sequence in that only one D_CF command is sent rather than four, and the returned data is not loaded in the Pcache even though a D_CF command was used to return the data.

12.3.7.4 IPR WRITES

IPR writes (internal processor register writes) are issued to the Mbox by the Ebox using the IPR_WR command.
The IPR_WR command modifies the contents of an internal processor register which is located in the Ibox, Mbox, Cbox, NDAL or memory subsystem. The addressed register is modified using the longword of data associated with the IPR_WR command.

12.3.7.4.1 Mbox IPR Writes

All Mbox IPRs located in S5 reside in the MME_DATAPATH. These IPRs are written by the IPR_WR command during the cycle after the IPR_WR executes in S5. All other Mbox IPRs reside in S6 and are written during the cycle when the IPR_WR executes in S6. See Table 12-3 for a description of the Mbox IPR registers. An IPR_WR to an Mbox reserved IPR causes no action to be taken.

12.3.7.4.2 Non-Mbox IPR Writes

Unlike Ibox IPR reads, the Mbox plays a role in processing Ibox IPR writes. The Mbox recognizes all Ibox IPR writes (addresses D0-DF) and passes the data through the Mbox pipeline onto M%MD_BUS_H<31:0> qualified by M%IBOX_IPR_WR. The Ibox receives the IPR write data and stores it in the Ibox IPR specified by information received directly from the Ebox. Processing Ibox IPR writes via the Mbox allows the M%MD_BUS_H to be used to transfer Ibox IPR write data without the need for a special Ebox-Ibox data bus.

The Mbox asserts M%CBOX_REF_ENABLE_L to the Cbox when the addressed IPR falls outside of the Ibox and Mbox IPR address space. This causes the Cbox to continue processing the IPR_WR.

The LOAD_PC command is used to transfer a new Program Counter value from the Ebox to the Ibox via the Mbox. This PC value propagates through the Mbox in order to transfer the Ibox data across M%MD_BUS_H<31:0>. Using the M%MD_BUS_H for this purpose eliminates the need for a special Ebox-Ibox data bus. The LOAD_PC command operates in a manner identical to an Ibox IPR_WR command. The only difference between a LOAD_PC and an Ibox IPR_WR command is that no IPR address need be decoded.
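The address-based handling of IPR_RD and IPR_WR commands described in Sections 12.3.7.3 and 12.3.7.4 can be collected into one decision function. The Python below is a rough sketch under stated assumptions, not a description of the actual decode logic: the function name, the returned labels, and the treatment of the PCTAG and PCDAP address ranges as Mbox registers are all illustrative choices based on Table 12-3.

```python
# Hypothetical summary of how the Mbox treats an IPR command by address.
IMPLEMENTED_MBOX_IPRS = set(range(0xE0, 0xEB)) | {0xEC, 0xED, 0xF2, 0xF4, 0xF8}
PCTAG_RANGE = (0x01800000, 0x01801FE0)
PCDAP_RANGE = (0x01C00000, 0x01C01FF8)


def route_ipr(ipr, is_write):
    """Return a label naming how the Mbox handles the IPR command."""
    if 0xD0 <= ipr <= 0xDF:
        # Ibox reads are NOPs; Ibox write data is passed over the MD bus.
        return "ibox_via_md_bus" if is_write else "nop"
    if 0xE0 <= ipr <= 0xFF:
        # Unlisted addresses in E0-FF are reserved Mbox IPRs.
        return "mbox" if ipr in IMPLEMENTED_MBOX_IPRS else "mbox_reserved"
    for low, high in (PCTAG_RANGE, PCDAP_RANGE):
        if low <= ipr <= high:
            return "mbox"
    # Everything else is forwarded to the Cbox.
    return "cbox"


if __name__ == "__main__":
    for address in (0xD4, 0xE6, 0xEB, 0x10, 0x01800020):
        print(hex(address), route_ipr(address, False), route_ipr(address, True))
```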
The LOAD_PC command directly specifies the destination of the data as being the Ibox PC.

12.3.9 INVALIDATES

The Pcache must always be a coherent cache with respect to the Bcache. In other words, the Pcache must always contain a strict subset of the data cached in the Bcache. If cache coherency were not maintained, incorrect computational sequences could result from reading "stale" data out of the Pcache in multi-processor system configurations.

An invalidate is the mechanism by which the Pcache is kept coherent with the Bcache. A Pcache invalidate operation occurs when data is displaced from the Bcache or when Bcache data is invalidated. The Cbox initiates an invalidate by specifying a hexaword physical address qualified by the INVAL command. The INVAL command is latched by the Mbox in the CBOX_LATCH.

Execution of an INVAL command guarantees that data corresponding to the specified hexaword address will not be valid in the Pcache. If the hexaword address of the INVAL command does not match either Pcache tag in the addressed index, no operation takes place. If the hexaword address matches one of the tags, the four corresponding subblock valid bits are cleared to guarantee that any subsequent Pcache accesses of this hexaword will miss until this hexaword is re-validated by a subsequent Pcache fill sequence. If a cache fill sequence to the same hexaword address is in progress when the INVAL is executed, a bit in the corresponding MISS_LATCH is set to inhibit any further cache fills from loading data or validating data for this cache block.

Also note that an assertion of C%CBOX_HARD_ERR_H during a cache fill command causes the cache fill operation to be processed as if it were an INVAL operation.

12.3.10 CACHE FILL COMMANDS

See Section 12.3.5.1 for a discussion of cache fill operations.
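The effect of an INVAL on one Pcache index, as described in Section 12.3.9, can be modeled in a few lines. This is a simplified sketch with hypothetical names (`TagEntry`, `execute_inval`): it represents the two tags of an index as a list of entries with four subblock valid bits each, and it leaves out the MISS_LATCH interaction and all parity handling.

```python
from dataclasses import dataclass, field


@dataclass
class TagEntry:
    """One bank's tag at a given Pcache index, with four subblock valid bits."""
    tag: int
    valid: list = field(default_factory=lambda: [False] * 4)


def execute_inval(index_entries, inval_tag):
    """Clear all four valid bits of the entry whose tag matches.

    Returns True when a tag matched, False when no operation took place.
    """
    for entry in index_entries:
        if entry.tag == inval_tag:
            entry.valid = [False] * 4
            return True
    return False


if __name__ == "__main__":
    left = TagEntry(0x12345, [True] * 4)
    right = TagEntry(0x54321, [True, False, True, True])
    print(execute_inval([left, right], 0x54321), left.valid, right.valid)
```

After a matching INVAL, every access to that hexaword misses in this model until a fill sets the valid bits again.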
12.3.11 MME CHECK COMMANDS Two commands exist for the purpose of checking references for possible memory management exceptions. 12.3.11.1 MME_CHK The fWlction of the MME_ CRK command is to obtain the allowed access rights for a specified page, and to compare it against an intended access mode specified by M_QUE%S5_AT_B<l:O>. The MME_ CHK command causes a TB access of the PTE corresponding to the MME_CHK address. If the PTE is not cached in the TB, the Mbox first fetches the PTE from memory. Once the PTE information is accessed, ACVITl\TVIM=O checks are performed. If an Ac\~ Th"V or M=O fault is detected, the appropriate memory management fault response is invoked (See Section 12.5.1.5.3 for a description of ACV/TNV!.M=O faults). 12.3.11.2 PROBE The PROBE command is used when the microcode must determine the accessibility of a page before changing any state (e.g. PROBER, PROB~T~ CH!\Ix macro instructions). It'functions exactly as an MME_CHK command except for three differences: • • • If an Ac\~ TNv, or M::O condition is detected, no Acv, TNv, or M=O response is invoked. That is, a PROBE merely detects the condition without actually causing a memory management exception. The PROBE command will update MMESTS based on the probe information if MMESTS is unlocked. However, a PROBE command will never lock MMESTS. The PROBE command returns status to the Ebox which indicates the nature of any memory management condition the PROBE may have detected. If M_QUE%S5..AT_B<l:O>=OO corresponding to the PROBE reference, then the MME_DATAPATH tb_miss sequence is not invoked when the TB detects a miss. Status is returned to the Ebox on the MtQID_BUS_H in the following format: • Mo/tMD_BUS_B<3> is set when the PROBE reference hits in the TB. • M%MD_BUS_B<2> is set when the PROBE reference corresponds to an ACV fault. • M%MD_BUS_B<l> is set when the PROBE reference corresponds to an TNV fault. • M%MD_BUS_B<O> is set when the PROBE reference corresponds to an M=O fault. 
• All other M%MD_BUS_H bits are undefined.

NOTE
One exception to this PROBE status format exists. When M%MD_BUS_H<2:0> = 011, this code indicates that a TNV has occurred on the PPTE (Process Page Table Entry) corresponding to the PROBE address. It does NOT mean that a TNV and M=0 fault have simultaneously occurred on the PROBE address (this would not make sense).

The following table summarizes all possible PROBE status encodings.

Table 12-12: Probe Status Encodings

M%MD_BUS_H<3:0>   AT        Meaning
X000              AT != 00  No fault.
X001              AT != 00  Modify fault.
X010              AT != 00  TNV fault.
X011              AT != 00  TNV fault on PPTE reference.
X100              AT != 00  ACV fault.
X101              AT != 00  Illegal status (will never be generated)
X110              AT != 00  Illegal status (will never be generated)
X111              AT != 00  Illegal status (will never be generated)
0XXX              AT = 00   PROBE missed in TB. Lower three bits are a don't care.
1XXX              AT = 00   PROBE hit in TB. Lower three bits are a don't care.

If memory management is turned off (i.e. MAPEN=0), execution of the PROBE command returns a status of M%MD_BUS_H<2:0>=0, indicating that no fault was detected (M%MD_BUS_H<3> will vary based on hit/miss TB status).

12.3.12 TB Fills

12.3.12.1 TB Tag Fills

The TB_TAG_FILL command is used in conjunction with the TB_PTE_FILL command to cache a PTE in the TB. The data associated with the TB_TAG_FILL command corresponds to a virtual byte address in some virtual page. The TB_TAG_FILL command causes the page address on M_QUE%S5_VA_H<31:9> of the TB_TAG_FILL data to be written into the tag field of the TB entry pointed to by the NLU TB allocation pointer (see Section 12.5.1.3 for information about the NLU TB allocation pointer). The TB valid bit (TBV) of the entry is cleared.
When TB_TAG_FILLs occur from the MME_LATCH, the tag data is driven onto M_QUE%S5_VA_H in the following format:

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                VPN                                 | 0| 0| 0| 0| 0| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-13: TB_TAG_FILL Definition

Name   Extent   Type   Description
VPN    31:9     W      Virtual page address used to fill a TB tag field.

During the TB_TAG_FILL, the TB logic will automatically generate even tag parity corresponding to PTE<31:9>. This parity will be written into the TB during the TB_TAG_FILL operation.

When TB_TAG_FILLs occur from the Ebox, the tag data is supplied from the address field of the EM_LATCH and is driven onto M_QUE%S5_VA_H in the following format:

Figure 12-39: TB_TAG_FILL Format (from EM_LATCH): IPR 7E (hex), MTBTAG

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                VPN                                 | 0| 0| 0| 0| 0| 0| 0| 0|TP| :MTBTAG
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-14: MTBTAG Field Descriptions

Name   Extent   Type   Description
TP     0        W      Even tag parity bit.
VPN    31:9     W      Virtual page address used to fill a TB tag field.

In this case, the even tag parity corresponding to the VPN is specified in bit<0> of the data field for the TB_TAG_FILL. This mechanism allows correct or incorrect parity to be deliberately written into the TB tag array for testability purposes by invoking the TB_TAG_FILL operation through the appropriate MTPR instruction.
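As an illustration of the MTBTAG format above, the following sketch builds the longword that software would write to IPR 7E. It is a sketch only: the function names are hypothetical, and it assumes the usual convention that "even parity" means the TP bit is chosen so that VPN<31:9> plus TP contain an even number of ones.

```python
VPN_MASK = 0xFFFFFE00  # bits <31:9>


def even_parity_bit(value):
    # 1 when the value has an odd number of one bits, so that the
    # value plus the parity bit together hold an even number of ones.
    return bin(value).count("1") & 1


def mtbtag_word(va, force_bad_parity=False):
    """Build an MTBTAG data word for a virtual address (hypothetical helper)."""
    vpn = va & VPN_MASK          # bits <8:1> are written as zero
    tp = even_parity_bit(vpn)
    if force_bad_parity:
        tp ^= 1                  # deliberately wrong parity, for testability
    return vpn | tp
```

Passing force_bad_parity=True models the testability use described above, where an incorrect tag parity is written on purpose.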
12.3.12.2 TB PTE Fills

The TB_PTE_FILL operation drives the PTE data onto M_QUE%S5_VA_H<31:0> in order that this data can be written into the data array of the TB. The data is written into the entry pointed to by the NLU TB allocation pointer. The TB valid bit (TBV bit) of the entry is set. (Note that a TB_TAG_FILL command will not be issued by the Mbox if PTE<31> is clear, in order to guarantee that only validated PTEs are ever cached in the TB.) The NLU TB allocation pointer is incremented after the fill is done.

When TB_PTE_FILLs occur from the MME_LATCH, the PTE data is driven onto M_QUE%S5_VA_H during a TB_PTE_FILL in the following format:

Figure 12-40: TB_PTE_FILL Data Format (from MME_LATCH)

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1|   PROT    | M| 0| 0| 0|                                PFN                                 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-15: TB_PTE_FILL Definition

Name   Extent   Type   Description
PFN    22:0     W      Page frame address
0      25:23           Forced to 0 by MME_LATCH
M      26       W      PTE modify bit.
PROT   30:27    W      PTE protection field.
1      31              Valid bit of PTE (must be a "1". See below)

Only bits <30:26>, <22:0> and the corresponding PTE parity bit are actually written into the TB array during a TB_PTE_FILL. TB_PTE_FILLs from the MME_LATCH will only be issued for validated PTEs. Therefore, PTE<31> will always be set.

The TB logic will automatically generate even parity to be written during the fill corresponding to PTE<31:0>. Note that the parity generator includes PTE<31> in this calculation even though this bit is not written into the TB. Since PTE<31> is always a "1" during a TB_PTE_FILL, the stored parity can be thought of as odd parity on bits <30:0>.
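The remark that even parity over PTE<31:0> is equivalent to odd parity over bits <30:0> can be checked with a short sketch. The helper names are hypothetical, and the code assumes that the even parity bit is the XOR of the covered data bits (so that data plus parity hold an even number of ones).

```python
PTE_VALID = 1 << 31
FORCED_ZERO = 0x7 << 23                  # bits <25:23>, forced to 0 by the MME_LATCH
STORED_MASK = 0x7FFFFFFF & ~FORCED_ZERO  # bits <30:26> and <22:0>


def parity_of(value):
    # XOR of all bits: 1 if the number of one bits is odd.
    return bin(value).count("1") & 1


def tb_pte_fill(pte):
    """Return (bits written to the TB data array, stored parity bit)."""
    if not pte & PTE_VALID:
        raise ValueError("TB_PTE_FILL is only issued for validated PTEs")
    pte &= ~FORCED_ZERO & 0xFFFFFFFF
    stored = pte & STORED_MASK
    parity = parity_of(pte)              # even parity over PTE<31:0>
    return stored, parity


stored, parity = tb_pte_fill(0xF8001234)
# Because PTE<31> is always 1, the same bit is odd parity over bits <30:0>.
assert parity == 1 ^ parity_of(stored)
```

The final assertion is the equivalence stated in the text: including the always-set valid bit in an even parity calculation inverts the parity of the remaining bits.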
When TB_PTE_FILLs occur from the EM_LATCH, the PTE data is driven onto M_QUE%S5_VA_H during a TB_PTE_FILL in the following format:

Figure 12-41: TB_PTE_FILL Data Format (from EM_LATCH): IPR 7F (hex), MTBPTE

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1|   PROT    | M| 0| P| 0|                                PFN                                 | :MTBPTE
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-16: MTBPTE Field Descriptions

Name   Extent   Type   Description
PFN    22:0     W      Page frame address
0      23              Assumed to be a "0" for parity calculation.
P      24       W      User-settable even parity corresponding to PTE<31:26> and PTE<22:0>.
0      25              Assumed to be a "0" for parity calculation.
M      26       W      PTE modify bit.
PROT   30:27    W      PTE protection field.
1      31              Assumed to be a "1" for parity calculation. (See below)

Bits <30:26>, <22:0> are written into the TB array during a TB_PTE_FILL. Bit<24> is interpreted as the corresponding PTE parity and is directly written into the TB as such. This gives the user the flexibility of writing correct or incorrect PTE parity for testability purposes. Note however that while PTE<31> is not written into the TB, it must be assumed that this bit is set when the user calculates even parity on PTE<31:0>. Similarly, PTE<25> and PTE<23> must be cleared for proper parity calculation.

See Section 12.5.1.5.2 for a description of TB fill sequences.

12.3.13 TBIS

The TBIS (TB Invalidate Single) command invalidates the PTE entry corresponding to the specified virtual address, provided that the PTE is cached in the TB. If the PTE is not cached in the TB, no action is taken.
12.3.14 TBIP

The TBIP (TB Invalidate Process) command invalidates all the PTE entries corresponding to P0 or P1 space translations which are currently cached in the TB. This command is used when the CPU changes process context. It allows a new process translation state to be set up for the new process context without being polluted by old translations corresponding to the old process context. TBIP does not invalidate PTEs corresponding to system space translations because these translations are valid across all processes.

12.3.15 TBIA

The TBIA (TB Invalidate All) command invalidates all PTE entries in the TB and resets the NLU TB allocation pointer to a known state. This is done for CPU initialization purposes, when the operating system reconfigures its system space translations, and when the Mbox clears the TB after encountering a TB parity error.

12.3.16 STOP_SPEC_Q

The STOP_SPEC_Q command is sent by the Ibox to inform the Mbox that no subsequent Ibox specifier references should be processed until the Ebox sends the proper synchronization. This command decrements the SPEC_Q_SYNC_CTR. In all other respects, it is treated as a NOP by the Mbox. See Section 12.3.20 to understand the context of the use of STOP_SPEC_Q.

12.3.17 UNALIGNED REFERENCES

An unaligned reference is a D-stream memory read or memory write reference that refers to data which crosses a quadword-aligned boundary (note that unaligned I/O space references are defined to cause UNPREDICTABLE behavior). A quadword boundary is the appropriate address resolution because the Pcache and Cbox read and write aligned quadwords of data. If a reference crosses a quadword-aligned boundary, the unaligned reference must be translated into two references, one for each distinct quadword memory access.
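The quadword-crossing definition just given can be expressed as a short sketch. The names are hypothetical. The transfer sizes are an assumption drawn from this chapter: a reference with DL=quadword is treated as a longword-sized transfer, since quadword operands are fetched as two references that each return a longword. The second address is formed by adding eight to the original address, following the description of the VAP_LATCH incrementor.

```python
# Bytes actually transferred per reference, keyed by data length (DL).
ACCESS_BYTES = {"byte": 1, "word": 2, "longword": 4, "quadword": 4}


def is_unaligned(va, dl):
    # True when the bytes touched extend past the end of the aligned quadword.
    return (va & 0x7) + ACCESS_BYTES[dl] > 8


def split_reference(va, dl):
    """Return the address(es) of the aligned-quadword references needed."""
    if not is_unaligned(va, dl):
        return [va]
    # Second reference: quadword-incremented address, low three bits unchanged.
    return [va, (va + 8) & 0xFFFFFFFF]


for dl in ("byte", "word", "longword", "quadword"):
    offsets = [format(a, "03b") for a in range(8) if is_unaligned(a, dl)]
    print(dl, offsets)
```

Under these assumptions the loop lists offset 111 for words and offsets 101, 110 and 111 for longwords and quadwords; byte references are never unaligned.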
Detection of an unaligned reference is done in S5 by the unaligned detection logic and is a function of M_QUE%S5_VA_H<2:0> and M_QUE%S5_DL_H<1:0> of the S5 reference packet. The following table summarizes all possible unaligned configurations:

DL         ADDR<2:0>
word       111
longword   101, 110, 111
quadword   101, 110, 111

When an unaligned D-stream read, STORE or WRITE is detected, the Mbox does the following:

• The address of the unaligned reference is used to reference the aligned quadword corresponding to the lower portion of the data.
• The Mbox generates a second reference to the aligned quadword corresponding to the upper portion of the reference.
• In the case of reads, once both references have been executed, the requested data is extracted from the two quadwords and aligned to M%MD_BUS_H<31:0>.

The implication of unaligned processing by the Mbox is that unaligned references are functionally invisible to the Ibox and Ebox. That is, the Ibox and Ebox can perform reads and writes without regard to alignment. Note that Mbox-generated references and I-stream reads are always aligned references.

12.3.17.1 Unaligned Reads

When an S5 read is determined to be unaligned, the S5 command packet is loaded into the VAP_LATCH. However, M_QUE%S5_VA_H<31:0> is not directly loaded. Instead, the quadword incrementor associated with the VAP_LATCH increments the M_QUE%S5_VA_H quadword address. This new address is loaded and is used to reference the upper half of the unaligned data. Meanwhile, the current S5 read command is allowed to execute. When this read successfully completes in S5, the VAP_LATCH is validated, indicating that it contains the upper half of the unaligned reference and that it can now be executed. Subsequently, the VAP_LATCH reference will be processed in the S5 pipe. Once it successfully completes in S5, the VAP_LATCH is invalidated.
Note that if the read originated from the EM_LATCH, the EM_LATCH was invalidated as the first reference of the unaligned pair successfully completed. However, if the read came from the SPEC_QUEUE, the SPEC_QUEUE is not invalidated until the VAP_LATCH reference successfully completes (see Section 12.3.19.1).

When data for the first read is available on B%S6_DATA_H<63:0> (either from the Pcache or the Cbox), the data is rotated by the MD_BUS_ROTATOR based on M%S6_PA_H<2:0> and latched in the MD_BUS_ROTATOR latches. Since the VAP_LATCH read was executed after the initial read, its data is guaranteed to be available during some cycle after the initial data is latched by the MD_BUS_ROTATOR. When the second data arrives in S6, the data is rotated by the same number of bytes as was done for the first reference. The lower one, two, or three bytes of the M%MD_BUS_H are then driven from the MD_BUS_ROTATOR latches, which contain valid data from the first reference, while the remaining bytes of M%MD_BUS_H are driven directly from the rotator. The effect of this sequence is to assemble the data from the two reads in a right-justified manner on the M%MD_BUS_H. When the assembled data is driven, M%IBOX_DATA_L and/or M%EBOX_DATA_H are asserted to indicate the destination of the data.

The RTY_DMISS_LATCH always contains a physical address because it stores retried reads from the S6 pipe. The implication of this fact on unaligned reads is that an unaligned sequence is never initiated from the RTY_DMISS_LATCH because the RTY_DMISS_LATCH address is physical. If an unaligned reference crosses a page boundary, the physical address of the second reference is not guaranteed to be a quadword-incremented version of the first reference, since the first and second references are associated with different address translations.
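The rotate-and-merge sequence described for unaligned reads can be modelled in a few lines. This is a behavioral sketch with hypothetical function names, assuming little-endian byte numbering within the quadword and a rotation toward the low-order end by PA<2:0> bytes; it is not a description of the rotator circuit.

```python
MASK64 = (1 << 64) - 1


def rotate_right_bytes(quadword, nbytes):
    """Rotate a 64-bit value right by nbytes bytes."""
    shift = 8 * (nbytes & 7)
    return ((quadword >> shift) | (quadword << (64 - shift))) & MASK64


def assemble_unaligned_read(first_qw, second_qw, pa_low):
    """Merge two aligned quadwords into the right-justified 32-bit result.

    pa_low is PA<2:0> of the original reference (5, 6 or 7 for an
    unaligned longword).
    """
    latched = rotate_right_bytes(first_qw, pa_low)   # held from the first read
    rotated = rotate_right_bytes(second_qw, pa_low)  # same rotation, second read
    low_bytes = 8 - pa_low                           # bytes supplied by the first read
    low_mask = (1 << (8 * low_bytes)) - 1
    merged = (latched & low_mask) | (rotated & ~low_mask & MASK64)
    return merged & 0xFFFFFFFF                       # 32-bit data bus


memory = bytes(range(16))  # bytes 0x00 .. 0x0F at addresses 0 .. 15
q0 = int.from_bytes(memory[0:8], "little")
q1 = int.from_bytes(memory[8:16], "little")
assert assemble_unaligned_read(q0, q1, 6) == int.from_bytes(memory[6:10], "little")
```

Rotating both quadwords by the same amount is what makes the merge a simple byte select: the first read supplies the low one, two or three bytes and the second read supplies the rest.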
12.3.17.2 Unaligned Writes

Like unaligned reads, unaligned writes are processed by breaking the reference into two aligned quadword references such that the VAP_LATCH always generates and stores the upper portion. When this EM_LATCH command successfully completes in S5, the VAP_LATCH generates the upper portion of the unaligned write reference in the same manner as an unaligned read. The data driven on M_QUE%S5_DATA_H<31:0> from the EM_LATCH byte rotator during the first write is latched in the VAP_LATCH. Thus, when the VAP_LATCH write executes, the same data is again driven onto M_QUE%S5_DATA_H<31:0>. It is the different byte masks and addresses of the two aligned writes which cause the proper bytes to be written into the proper bytes of memory.

12.3.17.3 Byte Mask Generation for Unaligned Writes

The byte mask generator must understand whether a given reference is the first or second reference of an unaligned reference pair in order to generate the appropriate byte mask. M_QUE%S5_QUAL_H<3> is used to determine this. The following table illustrates examples of the behavior of the byte mask generator for aligned and unaligned writes:

Table 12-17: Byte Mask Logic for Aligned and Unaligned References

ref   addr<2:0>   BM (DL=byte)   BM (DL=word)   BM (DL=long)   BM (DL=quad)
1st   000         00000001       00000011       00001111       00001111
2nd   000
1st   001         00000010       00000110       00011110       00011110
2nd   001
1st   010         00000100       00001100       00111100       00111100
2nd   010
1st   011         00001000       00011000       01111000       01111000
2nd   011
1st   100         00010000       00110000       11110000       11110000
2nd   100
1st   101         00100000       01100000       11100000       11100000
2nd   101                                       00000001       00000001
1st   110         01000000       11000000       11000000       11000000
2nd   110                                       00000011       00000011
1st   111         10000000       10000000       10000000       10000000
2nd   111                        00000001       00000111       00000111

Since the VAP_LATCH always increments the virtual address by eight, the lower three bits of the VAP_LATCH address will always be the same as the original address. However, the lower three bits of the address sent to the Cbox (M%C_S6_PA_H) are always zeroed on the second half of an unaligned reference in order that the address that is sent off chip is consistent with the corresponding byte mask value.

12.3.17.4 Unaligned Destination Specifier Writes

When an unaligned DEST_ADDR or unaligned DREAD_MODIFY command is latched in the SPEC_QUEUE, the unaligned detection logic flags the unaligned condition and thus the reference is split into two aligned references by the mechanism described previously. As each one of the pair of commands executes, one entry will be added to the PA_QUEUE.

When the corresponding data arrives in the EM_LATCH via the STORE command, the data is rotated based on the lower two address bits output from the PA_QUEUE. The rotated data is then matched up with the reference driven from the PA_QUEUE to form a newly assembled WRITE command. Since the reference driven from the PA_QUEUE indicates that M_QUE%S5_QUAL_H<4>=1 (i.e. this reference is the first part of an unaligned pair), the VAP_LATCH latches and validates a copy of the STORE command with the rotated STORE data. When this newly assembled WRITE command successfully completes in S5, the bottom entry of the PA_QUEUE is retired. When the VAP_LATCH subsequently executes the second STORE reference, the second entry in the PA_QUEUE is matched with it and retired. In effect, the STORE data is split into two STORE commands so that each STORE is merged with each PA_QUEUE entry to form two WRITE commands.
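The byte masks of Table 12-17 follow a simple pattern that can be reproduced with a shift. The sketch below is one way to express that pattern, not the byte mask generator's actual logic; the function name is hypothetical, and DL=quad is treated as a four-byte transfer because the table gives it the same masks as DL=long.

```python
ACCESS_BYTES = {"byte": 1, "word": 2, "long": 4, "quad": 4}


def byte_masks(addr_low, dl):
    """Return (first_mask, second_mask) for addr<2:0> and a data length.

    second_mask is 0 when the reference is aligned and no second
    reference is generated.
    """
    span = ((1 << ACCESS_BYTES[dl]) - 1) << (addr_low & 0x7)
    first = span & 0xFF   # bytes that fall in the addressed quadword
    second = span >> 8    # bytes that spill into the next quadword
    return first, second


for addr_low in range(8):
    row = [format(addr_low, "03b")]
    for dl in ("byte", "word", "long", "quad"):
        first, second = byte_masks(addr_low, dl)
        row.append(format(first, "08b") + "/" + format(second, "08b"))
    print("  ".join(row))
```

Each printed cell shows the first and second mask separated by a slash, which can be compared directly against the 1st and 2nd rows of the table.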
12.3.17.5 Implication of Ebox unaligned references on M%EM_LAT_FULL_H

The EM_LATCH is invalidated whenever the EM_LATCH reference successfully completes in S5. However, if the EM_LATCH reference was unaligned, the second half of the reference still awaits processing in the VAP_LATCH even though the EM_LATCH has been invalidated. Clearing the EM_LATCH while the second half of an unaligned Ebox reference is still pending could release the EM_STALL condition, causing the Ebox microcode to advance even though the Mbox has not completed processing of the second part of the previous unaligned reference. This scenario is undesirable since the Ebox microcode makes synchronization assumptions based on references being retired from the EM_LATCH. To preserve these assumptions, the Mbox will assert M%EM_LAT_FULL_H until both halves of the unaligned reference have been retired, even though the EM_LATCH will have been invalidated earlier. Note that this applies to both unaligned reads and unaligned writes.

12.3.18 ABORTING REFERENCES

The Mbox abort operation is used to cancel the current S5 operation. When an abort is executed, the S5 state, which would normally be updated due to execution of the current S5 reference, is not updated. The aborted S5 reference is not propagated into S6. Instead, a NOP is introduced into the S6 pipe. In effect, an aborted S5 reference is equivalent to a NOP command being executed in S5.

Note that the abort operation should be viewed as only cancelling the current execution of a reference. In most cases, aborting an operation does not invalidate the existence of the corresponding reference, which will still be stored in one of the reference sources and retried at a later point.

The abort operation is executed when M_S5C_ABT%ABORT_L is asserted.
The following changes to Mbox state are inhibited during the cycle in which M_S5C_ABT%ABORT_L is asserted:

• The reference source which drove the aborted command into S5 does not invalidate the corresponding command. Thus, the reference still exists to be retried during a subsequent cycle.

NOTE
There are two exceptions to this rule. The CBOX_LATCH is always invalidated after it drives a command into S5. The EM_LATCH will be invalidated if the Ebox has explicitly requested it to be (via the E%EM_ABORT_L signal).

• Loading the PA_QUEUE with a DEST_ADDR or DREAD_MODIFY command is inhibited.
• Emptying the PA_QUEUE when a STORE command is driven in S5 is inhibited.
• If the unaligned detection logic detected an unaligned reference during the aborted cycle, the VAP_LATCH is not validated to contain the second portion of the unaligned sequence.

12.3.18.1 Conditions for Aborting References

In general, references are aborted for five reasons:

• The reference is aborted to prevent a reference order restriction from being violated (see Section 12.3.18.1.1).
• The reference is aborted because insufficient hardware resources are available to complete processing of the current command.
• The reference is aborted because a memory management operation must be performed prior to execution of the current reference.
• The reference is aborted in order to avoid a deadlock condition related to unaligned references.
• The reference is aborted due to an external flush condition.

The following describes the specific conditions which can invoke an abort operation for each of the five categories listed above.

12.3.18.1.1 Aborting to Maintain Reference Order Restrictions

• Aborting D-stream hits under D-stream misses: Consider the case where two D-stream reads are executed in back-to-back cycles.
In this case, the second D-stream read will be aborted in S5 if the first D-stream read misses in the Pcache in S6. This prevents the possibility of propagating the second read into S6 and having it hit and return data before the first read returns data. Note that this condition applies to all D-stream "read_like" references (i.e. references which return data to the Ebox). Specifically, this condition applies to DREAD, DREAD_MODIFY, DREAD_LOCK, IPR_RD, and PROBE commands.

• Aborting I-stream hits under I-stream misses: The Mbox initiates an IREAD sequence by issuing consecutive IREAD commands via the I-stream "fill forward" mode (see Section 12.3.5.2.1). If the first IREAD in this sequence misses in the Pcache in S6 while the second IREAD is executing in S5, the second IREAD is aborted. This is done to handle I-stream reads in an analogous fashion to D-stream reads.

• Aborting to preserve order of Ibox reads relative to Ebox writes: As explained previously, the PA_QUEUE is the structure used to store pending destination specifier addresses until the Ebox can supply the corresponding data to complete the write reference. Once the Ebox supplies the data, the write executes and the corresponding entry in the PA_QUEUE is invalidated.

The comparator function built into the PA_QUEUE is used to detect address matches on bits <8:3> between Ibox D-stream read references and any of the valid PA_QUEUE entries. Consider the example shown in Figure 12-14. In this example, the Ibox would decode the destination specifier of the first MOVL instruction, which causes a DEST_ADDR command to be sent to the PA_QUEUE. Subsequently, the Ibox would decode the first specifier of the second MOVL, causing a read to be issued to the Mbox. When this read is started in the S5 pipe, a PA_QUEUE comparator will detect an address conflict between the read and the pending destination address.
As a result, the read is aborted and is not successfully executed until the write completes. Thus, all reads originating from the SPEC_QUEUE are aborted if the PA_QUEUE detects an address conflict.

Note that the PA_QUEUE must always detect physical address conflicts. Detecting virtual address conflicts is not sufficient since two or more different virtual pages could be mapped to the same physical page, causing two or more different virtual addresses to conflict on the same physical longword.

Also note that the PA_QUEUE is capable of detecting false conflicts because only address bits <8:3> are compared rather than the entire address. Performance data indicates that the number of false conflicts using addr<8:3> is low enough that the performance degradation is insignificant. Bits <8:3> are used since they are untranslated address bits and, therefore, are immediately available for use without waiting for the address to be translated. The lower three bits are not used because the PA_QUEUE must detect conflicts at quadword resolution. The following diagram illustrates why quadword resolution must be used:

Figure 12-42: PA_QUEUE conflict detection

<------------------------------- memory aligned quadword ------------------------------->
|  byte 0  |  byte 1  |  byte 2  |  byte 3  |  byte 4  |  byte 5  |  byte 6  |  byte 7  |
|----------|----------|----------|----------|----------|----------|----------|----------|
                      |<- PA_QUEUE entry addresses this longword->|
         A DREAD is issued which addresses this byte ->|<-------->|

PA_QUEUE addr<2:0>: 010
DREAD addr<2:0>:    101

The diagram above illustrates eight bytes of memory within a memory aligned quadword. In this example, the PA_QUEUE contains a destination address which references a longword. While this reference is not longword aligned, it is handled as an aligned reference because the reference does not cross an aligned quadword boundary. Consider the byte DREAD shown above which is issued by the SPEC_QUEUE and is executed in S5 in the presence of the PA_QUEUE entry.
While a PA_QUEUE address conflict clearly exists on the fifth byte within this quadword, the lower three bits of the PA_QUEUE address do not match the lower three bits of the DREAD address. Thus, the lower three bits cannot be used for the purposes of PA_QUEUE conflict detection.

DREAD_MODIFY references with DL=quadword pose a special problem for the PA_QUEUE conflict logic. Quadword memory operands are requested by the Ibox by issuing a D-stream reference with DL=quadword followed by another D-stream reference with DL=longword. The first reference causes the lower half of the quadword operand to be returned on M%MD_BUS_H<31:0> (i.e. all quadword DREADs only return a longword of data). The second reference addresses the upper half of the quadword, causing the upper half of the operand to be returned on M%MD_BUS_H<31:0>. If the quadword operand is aligned, both the quadword and the longword references have the same quadword address. Thus, when the DREAD_MODIFY longword reference is executed in S5, a PA_QUEUE address conflict could be detected against the DREAD_MODIFY quadword reference previously loaded. If this were to happen, a deadlock state would exist within the NVAX chip because the corresponding STORE data for the quadword operand cannot be generated to clear the PA_QUEUE until the Ebox receives the entire requested quadword operand, which cannot happen as long as a PA_QUEUE address conflict is detected. A similar deadlock situation could result from an unaligned DREAD_MODIFY quadword operand.

To avoid this deadlock problem, the PA_QUEUE control logic stores a state bit for each entry to indicate whether the DL is quadword. If the last entry loaded contains a quadword, the PA_QUEUE address conflict logic associated with that PA_QUEUE entry is inhibited.
This avoids deadlock by preventing the PA_QUEUE conflict logic from detecting a conflict between the first half and the second half of the same DREAD_MODIFY quadword specifier.

• I/O space reads prefetched by the Ibox which are destined for the Ebox must be inhibited until the Ebox is stalling on that particular I/O space read: Since certain I/O devices can change their state based on a read reference to that device, the possibility exists for I/O device state to be improperly modified based on Ibox prefetching of operands. We must guarantee that any state change only occurs within the context of Ebox execution of the corresponding instruction. Thus, I/O space reads are aborted in S5 until we can guarantee that the Ebox is executing the instruction corresponding to the I/O space read. This function is implemented by aborting any I/O space read originating from the SPEC_QUEUE which returns data to the Ebox when either of the following two conditions is true:

1. E%START_IBOX_IO_RD_H is deasserted. E%START_IBOX_IO_RD_H is an Ebox signal that informs the Mbox that the S3 Ebox pipe is currently in MD_STALL waiting for an operand to be returned. Thus, the deassertion of this signal indicates that the Ebox cannot currently be stalling on the I/O space operand.

2. A NOP command does not currently exist in the S6 pipe. This condition is necessary to account for a timing boundary condition which can exist between the Mbox and Ebox. It is possible for the Ebox to be MD_STALLing on an S6 reference corresponding to a previous instruction when the I/O read is in S5. In this case, E%START_IBOX_IO_RD_H could be asserted in reference to the previous MD data which may exist in the S6 pipe while the I/O space reference exists in the S5 pipe. To avoid this potential problem, the I/O space reference is aborted until a NOP is detected in S6, which indicates that this boundary condition cannot exist.
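The two numbered conditions above combine into a single abort term, which can be written out as a boolean sketch. The function and parameter names are hypothetical stand-ins for the signals and reference attributes named in the text; this is an illustration of the stated rule, not the actual abort equation.

```python
def abort_ibox_io_read(is_io_space_read, from_spec_queue, returns_data_to_ebox,
                       start_ibox_io_rd, s6_is_nop):
    """Return True when an S5 reference must be aborted under the I/O read rule.

    start_ibox_io_rd models E%START_IBOX_IO_RD_H (Ebox is in MD_STALL).
    s6_is_nop is True when the S6 pipe currently holds a NOP.
    """
    # The rule only applies to I/O space reads from the SPEC_QUEUE
    # that return data to the Ebox.
    if not (is_io_space_read and from_spec_queue and returns_data_to_ebox):
        return False
    # Abort when either condition 1 or condition 2 holds.
    return (not start_ibox_io_rd) or (not s6_is_nop)
```

The read is therefore allowed to proceed only when the Ebox is stalled waiting for an operand and the S6 pipe holds a NOP, so the stall cannot refer to an earlier reference.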
Note that it is necessary to stipulate that this abort condition only affects Ibox I/O space DREAD references which directly return data to the Ebox. This is because it is conceivable that a deferred mode destination specifier could cause the DREAD of the address of the operand to map to I/O space. In this situation, the Ebox will never MD_STALL on this reference since it corresponds to a destination specifier. Thus, the pipeline could hang if the Mbox unconditionally aborted all Ibox I/O space DREADs. By conditioning M_QUE%S5_DEST_H into this abort equation, this deadlock condition is avoided by only applying this abort condition to DREADs which return data to the Ebox.

• Aborting reads to the same Pcache index as a pending read/fill operation: As stated in Section 12.2.13, allowing two Pcache fill sequences to simultaneously operate on the same Pcache block creates the possibility of corrupting this Pcache block. To prevent this, address bits <8:5> of the DMISS_LATCH are compared against M_QUE%S5_PA_H<8:5> when S5 contains an IREAD and the DMISS_LATCH is validated. If there is a match, the S5 IREAD is aborted in order that a potential I-stream fill sequence does not pollute the Pcache block associated with the D-stream fill already in progress.

Note that address bits <8:5> are used to detect a Pcache index address conflict even though bits <11:5> represent the entire Pcache index. The upper three bits of the Pcache index are not used because these can be translated address bits which are not available in time for the address comparator circuit. By only using bits <8:5>, some false address conflicts may occur. A false address conflict will needlessly delay processing of a read or write reference; however, the NVAX performance model has shown that this has a negligible impact on overall performance.
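The partial-index comparison and the false conflicts it can produce are easy to see in a small sketch. The function names and example addresses are hypothetical; only the compared bit range <8:5> and the full index range <11:5> come from the text.

```python
def partial_index(pa):
    return (pa >> 5) & 0xF    # PA<8:5>, untranslated bits


def full_index(pa):
    return (pa >> 5) & 0x7F   # PA<11:5>, the entire Pcache index


def fill_conflict(miss_latch_valid, miss_latch_pa, s5_pa):
    """Model of the miss-latch comparator: match on PA<8:5> only."""
    return miss_latch_valid and partial_index(miss_latch_pa) == partial_index(s5_pa)


# True conflict: both addresses fall in the same Pcache index.
assert fill_conflict(True, 0x00001220, 0x00001238)

# False conflict: PA<8:5> matches but the full index differs.
assert fill_conflict(True, 0x00000220, 0x00000A20)
assert full_index(0x00000220) != full_index(0x00000A20)

# No conflict when PA<8:5> differs or the latch is not valid.
assert not fill_conflict(True, 0x00000220, 0x00000240)
assert not fill_conflict(False, 0x00001220, 0x00001238)
```

Comparing four bits instead of seven means that eight distinct indexes alias onto each comparator value, which is the source of the false conflicts the text describes.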
Even if a true Pcache index conflict is detected, it is possible that there is no block conflict because the 2-way set associative Pcache contains two blocks per index. In order to reduce hardware complexity, however, a block conflict is assumed to have occurred whenever an index conflict is detected, even though the references may address different blocks within the index.

By the same rationale, the same address bits of a valid IMISS_LATCH are compared against M_QUE%S5_PA_H<8:5> when S5 contains a D-stream read. If a match is found, the S5 read is aborted in order to let the I-stream fill proceed without possible corruption.

• Aborting writes or STOREs to the same Pcache index as a pending read/fill operation: As stated in Section 12.3.18.1.1, writes should be inhibited from executing if they map to the same Pcache block as a Pcache fill already in progress. Otherwise, the memory write could miss in the Pcache block during a fill sequence before the Cbox supplied the fill data. When this subblock is filled by the Cbox, this Pcache subblock would be validated with old data. Therefore, the write data which was processed by the Mbox would not be reflected in the Pcache.

Avoiding this situation is accomplished by the comparators built into the DMISS_LATCH and IMISS_LATCH. If either of these latches is valid, and bits <8:5> of the fill address equal M_QUE%S5_PA_H<8:5> of an S5 write or S5 STORE, then the S5 write is aborted. Note that since the entire write address is not compared, we may abort writes when there was not a true address conflict. This is done, however, for circuit speed reasons and does not affect the overall CPU performance appreciably.

12.3.18.1.2 Aborting due to lack of hardware resources

• Aborting a "read_like" reference when the RTY_DMISS_LATCH is full: Consider the situation where a D-stream fill is executing and the RTY_DMISS_LATCH stores the next read to be executed. If a third read is started in S5, it is automatically aborted.
If the third read were not aborted, two incorrect scenarios would result. The third read could miss in S6 with nowhere to put it, since both the DMISS_LATCH and the RTY_DMISS_LATCH are full. If the third read hit, its data would be returned before the data of the second read, which is equivalent to an illegal "hit under miss" scenario.

For the purposes of the above discussion, a "read-like" reference is defined as any reference which returns data to the Ebox. Thus, a read-like reference is a DREAD, DREAD_MODIFY, DREAD_LOCK, IPR_RD, or PROBE command.

• Aborting DEST_ADDR or DREAD_MODIFY due to insufficient room in PA_QUEUE: If a destination specifier reference is executing in S5, but there are insufficient PA_QUEUE entries to store the reference, the Mbox has no choice but to abort the S5 reference and retry it later when more PA_QUEUE entries free up. If the S5 reference is unaligned, the abort logic tests for two empty slots in the PA_QUEUE since two will be required for the unaligned reference. If the S5 reference is aligned, only one slot need be available.

• Aborting an S5 write, STORE or Cbox IPR_WR due to Cbox back-pressure: All S6 Cbox writes are automatically transferred to a write buffer in the Cbox. The Cbox uses this write buffer to store the writes until they can be written into the Cbox, Bcache or main memory. If this write buffer becomes sufficiently full so that we cannot guarantee that the S5 write or STORE can be loaded into the write buffer when it propagates to S6, the S5 command is aborted. The Cbox asserts write buffer back-pressure to the Mbox by asserting C%WR_BUF_BACK_PRES_H.

12.3.18.1.3 Aborting due to memory management operation

When a tb_miss or cross-page condition is detected, a memory management operation must be processed before the S5 reference can be allowed to complete.
Thus, detection of a tb_miss or cross-page condition causes the S5 command to be aborted until the memory management operation finishes. This also prevents the possibility of having to handle a second memory management sequence before the first memory management sequence completes. The two specific abort conditions are:

• Aborting an S5 reference due to TB_MISS condition: If the virtual address of the S5 reference is not found in the TB, the corresponding physical address cannot be immediately derived. Therefore, the reference is aborted until the translation can be cached in the TB (See Section 12.5.1.5.2 for information on memory management).

• Aborting an S5 reference due to CROSS_PAGE condition: If an unaligned S5 reference references two pages, a CROSS_PAGE condition has been detected. In this situation, access checks of both pages must be made before the reference is allowed to complete. Therefore, the reference is aborted and retried after the CROSS_PAGE check has tested the upper page (See Section 12.5.1.5.4).

In either situation described above, all but two reference types from the Ibox or Ebox will be continually aborted until the memory management sequence completes. The two exceptions are the STOP_SPEC_Q and STORE commands. Since these references are guaranteed not to require any memory management function, these references are allowed to proceed. Note that while a STOP_SPEC_Q command is never aborted, it is transformed into a NOP command as it enters the S6 pipe. This is allowable since no S6 function is performed by this command and it offers an extra S6 data bypass opportunity.

12.3.18.1.4 Aborting due to an external flush condition

This abort condition will be explained in the discussion of flushes.

12.3.19 MBOX PIPELINE DEADLOCK AVOIDANCE SCENARIOS

Two special considerations have been designed into the Mbox in order to avoid two possible pipeline deadlock conditions.
12.3.19.1 Unaligned Reference Deadlock Condition

Consider the situation where the second part of an unaligned D-stream read is driven into S5 from the VAP_LATCH. If this read conflicts with the quadword address of a valid PA_QUEUE entry, this read will be aborted based on PA_QUEUE address conflict detection. If the VAP_LATCH is not cleared, a pipeline deadlock situation has occurred because the VAP_LATCH command will always execute before an EM_LATCH command. However, a STORE command originating from the EM_LATCH is the only way the PA_QUEUE conflict can be eliminated. Therefore, in addition to aborting the VAP_LATCH reference during a PA_QUEUE conflict, the VAP_LATCH must be invalidated in order that the arbitration logic can select the EM_LATCH STORE command to clear the PA_QUEUE conflict condition.

Clearing the VAP_LATCH due to PA_QUEUE conflict detection has several implications. It means that the unaligned sequence must be restarted from the beginning in order to re-generate the VAP_LATCH reference. This is why the corresponding SPEC_QUEUE entry is not invalidated until the entire unaligned sequence successfully completes in S5. A side effect of this is that the first read of the unaligned sequence will be re-executed, causing two read references to the same data. This, however, is harmless if the read is to memory. This may not be harmless if the read is to I/O space; however, unaligned I/O space reads are defined to yield UNPREDICTABLE results.

Another implication of avoiding this pipeline deadlock is that the bottom entry of the PA_QUEUE must be invalidated if the VAP_LATCH command was a DREAD_MODIFY command. If it was a DREAD_MODIFY, the first reference of the unaligned pair had already introduced an entry into the PA_QUEUE.
Since the first reference will be re-executed, the corresponding PA_QUEUE entry is invalidated to avoid replicating the same PA_QUEUE entry twice.

12.3.19.2 READ_LOCK/WRITE_UNLOCK Deadlock Condition

Once a READ_LOCK command has been passed to the Cbox, the Cbox will not process any subsequent D-stream read references until the corresponding WRITE_UNLOCK command has been executed. This behavior introduces a deadlock consideration. Consider the situation where a DREAD_LOCK has been sent to the Cbox. Before the EM_LATCH is loaded with the corresponding WRITE_UNLOCK, the Mbox starts processing an IREAD reference which misses in the TB. The resulting memory management sequence will issue a D-stream PTE read which the Cbox will not process until it has received the WRITE_UNLOCK command. However, the Mbox will never send the WRITE_UNLOCK (or any other Ebox or Ibox reference) until the memory management sequence completes, which cannot occur until the PTE DREAD completes.

This deadlock condition is avoided by the arbitration logic by disabling IREF_LATCH selection once a DREAD_LOCK command has successfully been retired from the S5 pipe. Thus, no IREAD TB_MISS can occur between the READ_LOCK and WRITE_UNLOCK, thus avoiding the deadlock situation. The arbitration logic will re-enable IREF_LATCH selection on either of the following two conditions:

1. A WRITE_UNLOCK reference has been retired from the S5 pipe. This will cause the Cbox to resume D-stream read processing, thus eliminating the deadlock condition.

2. E%FLUSH_MBOX_H is asserted by the Ebox due to a hard error. This condition should occur much more infrequently than the above condition because a WRITE_UNLOCK must normally be issued after a READ_LOCK. However, if an error occurred sometime between the READ_LOCK and WRITE_UNLOCK, a hard error microtrap will result, preventing a WRITE_UNLOCK from being issued.
The microtrap will generate E%FLUSH_MBOX_H, which re-enables IREF_LATCH selection because no WRITE_UNLOCK will follow. Note that the Cbox state, which prevents subsequent D-stream reads from being processed before the WRITE_UNLOCK, will be cleared by an IPR_WRITE during the error handler.

Note that the analogous deadlock condition involving a SPEC_QUEUE reference cannot occur because Ibox processing will have been halted prior to the READ_LOCK/WRITE_UNLOCK sequence. The analogous deadlock condition involving an EM_LATCH reference will not occur because Ebox microcode will never issue a D-stream read in the middle of a READ_LOCK/WRITE_UNLOCK sequence.

The PA_QUEUE address comparator function can maintain the relative order of specifier reads and destination specifier writes because both the reads and the writes originate from the same Ibox pipeline stage and are loaded into the same reference queue. However, when the Ebox issues reads or writes independently of the Ibox destination specifier decodes, the PA_QUEUE cannot be used since there is no implied ordering between the Ibox reads and Ebox reads or writes from two different pipeline stages. In this case, an 8-state counter, called the SPEC_Q_SYNC_CTR, is used to prevent Ibox memory operand prefetching when the Ebox can be writing to memory.

When the Ibox decodes an instruction that can cause explicit Ebox writes which are independent of the Ibox destination specifier decodes (e.g. MOVC), the Ibox loads the SPEC_QUEUE with a STOP_SPEC_Q command after all specifier references for the same instruction have been loaded. Execution of STOP_SPEC_Q in S5 causes the SPEC_Q_SYNC_CTR to be decremented. The nominal state of this counter is one. Whenever the value of SPEC_Q_SYNC_CTR is zero, the arbitration logic will not select a SPEC_QUEUE reference as the source for the S5 pipe for the next cycle.
The effect achieved is to stop all Ibox specifier references from occurring after the STOP_SPEC_Q command has executed.

When the Ebox completes all explicit writes for the instruction which caused the Ibox to issue the STOP_SPEC_Q command, the Ebox asserts the E%RESTART_SPEC_QUEUE_H signal. Each assertion of E%RESTART_SPEC_QUEUE_H causes the SPEC_Q_SYNC_CTR to be incremented. Subsequent specifier reference processing resumes when the value of SPEC_Q_SYNC_CTR is positive. Thus, the SPEC_Q_SYNC_CTR acts as a synchronization device to stop processing of specifier references whenever the Ebox may be independently modifying memory state.

Note that a value of zero in the SPEC_Q_SYNC_CTR only prevents the arbitration logic from selecting the SPEC_QUEUE as the S5 reference source. It does not prevent the Ibox from loading additional references into empty SPEC_QUEUE entries.

The SPEC_Q_SYNC_CTR is an 8-state unsigned counter which can store values from 0 to 7. A counter function must be used for this synchronization function because pipeline behavior can cause the Ebox to assert E%RESTART_SPEC_QUEUE_H multiple times before the Mbox ever processes any STOP_SPEC_Q commands. For example, if the Mbox is executing a TB_MISS flow while the Ebox is retiring multiple instructions associated with this synchronization scheme, multiple assertions of E%RESTART_SPEC_QUEUE_H will result even though no STOP_SPEC_Q commands have been processed yet due to the on-going memory management sequence.

Thus, the SPEC_Q_SYNC_CTR buffers up the E%RESTART_SPEC_QUEUE_H assertions until the corresponding STOP_SPEC_Q commands are processed from the SPEC_QUEUE. Note that there is no need for the SPEC_Q_SYNC_CTR to buffer up multiple instances of STOP_SPEC_Q because the SPEC_QUEUE intrinsically buffers these instances.
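The counter behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the overflow check is an illustrative guard, not a statement about what the hardware does at the upper bound.

```python
class SpecQSyncCtr:
    """Toy model of the 8-state SPEC_Q_SYNC_CTR (values 0..7, nominal value 1).

    Names are hypothetical; only the increment/decrement/selection rules
    are taken from the text.
    """

    MAX_VALUE = 7

    def __init__(self):
        self.value = 1  # nominal state of the counter

    def spec_queue_selectable(self):
        # The arbitration logic may select the SPEC_QUEUE as the S5 source
        # only while the counter is positive.
        return self.value > 0

    def stop_spec_q(self):
        # STOP_SPEC_Q is itself a SPEC_QUEUE reference, so it can only
        # execute in S5 while the SPEC_QUEUE is selectable.
        if not self.spec_queue_selectable():
            raise RuntimeError("SPEC_QUEUE not selectable; STOP_SPEC_Q cannot execute")
        self.value -= 1

    def restart_spec_queue(self):
        # Models one assertion of E%RESTART_SPEC_QUEUE_H.
        if self.value >= self.MAX_VALUE:
            # Assumption: treated as a modelling error here.
            raise OverflowError("more restart assertions than the counter can buffer")
        self.value += 1


if __name__ == "__main__":
    ctr = SpecQSyncCtr()
    # Two restart assertions arrive early (for example during a TB_MISS flow).
    ctr.restart_spec_queue()
    ctr.restart_spec_queue()
    # The matching STOP_SPEC_Q commands execute later and consume them.
    ctr.stop_spec_q()
    ctr.stop_spec_q()
    print(ctr.value, ctr.spec_queue_selectable())
```

In this model an early restart assertion simply raises the counter above its nominal value, so the STOP_SPEC_Q that arrives later does not stall the SPEC_QUEUE.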
The 8-state SPEC_Q_SYNC_CTR can buffer up to six E%RESTART_SPEC_QUEUE_H assertions (SPEC_Q_SYNC_CTR values 2 through 7). Six buffer states are sufficient to buffer all pending instructions which could result in the Ebox assertion of E%RESTART_SPEC_QUEUE_H because at most six of these instructions can be issued to the Ebox before the Ibox is back-pressured from decoding the next instruction of this type. Six buffered states are derived from the fact that the Ibox must fill its four-stage pipeline in addition to the 2-entry SPEC_QUEUE before it is back-pressured by the SPEC_QUEUE from issuing any further instructions to which the Ebox could respond by asserting E%RESTART_SPEC_QUEUE_H.

12.3.21 FLUSHING REFERENCES FROM THE MBOX PIPE

Flushing the Mbox pipeline refers to altering the state of the Mbox in a controlled way so that certain pending and currently executing references are eliminated from the Mbox. There are two distinct mechanisms that cause different types of references to be flushed. One type of flush originates from the Ibox and the other type from the Ebox.

12.3.21.1 Ibox Flushes

If the Ibox VIC is in the process of being filled by a previously requested IREAD, and the Ibox has determined, or has been forced, to start decoding instructions at a new point in the I-stream requiring another VIC fill, the Ibox asserts the signal I%FLUSH_IREF_LAT_H to the Mbox. From the Ibox point of view, assertion of I%FLUSH_IREF_LAT_H indicates that the current VIC fill operation will be immediately cancelled. This allows the Ibox to invoke a new VIC fill operation via a new IREAD, without having to wait for the current VIC fill operation to complete.

From the Mbox point of view, assertion of I%FLUSH_IREF_LAT_H aborts all pending and currently executing I-stream activity by performing the following actions:

1. The IREF_LATCH is invalidated. Any IREAD sent to the Mbox during the cycle I%FLUSH_IREF_LAT_H is asserted is not validated.

2. If the current S5 reference is an IREAD or an I_CF, it is aborted.

3. The IMISS_LATCH is invalidated and all state indicating an outstanding I-stream fill is cleared. If the IMISS_LATCH is being loaded during the cycle that I%FLUSH_IREF_LAT_H is asserted, the IMISS_LATCH is not validated.

4. The signal M%ABORT_CBOX_IRD_H is asserted to the Cbox to indicate that the Mbox does not want any more I_CF references which may have been pending in the Cbox.

If I%FLUSH_IREF_LAT_H is asserted during a cycle with an outstanding I-stream read or fill, the Mbox logic guarantees that the M%VIC_DATA_L signal will not be asserted in response to the IREAD during any subsequent cycles. However, M%VIC_DATA_L may be asserted during the same cycle that I%FLUSH_IREF_LAT_H is asserted. It is the responsibility of the Ibox to ignore the corresponding data in this case.

12.3.21.2 Ebox Flushes

12.3.21.2.1 Flushing due to E%EM_ABORT_L

Due to the construction of the microcode, it is possible for the Ebox to issue a reference to the Mbox only to discover during the following cycle that the reference should not have been issued. In this case, the Ebox asserts E%EM_ABORT_L during the cycle following when the reference was issued. E%EM_ABORT_L causes the Mbox to unconditionally clear the EM_LATCH and to abort the S5 reference if that reference was driven from the EM_LATCH. The net effect is to flush out all Mbox state associated with this Ebox reference.

12.3.21.2.2 Flushing due to E%FLUSH_MBOX_H

When the Ebox determines that a branch misprediction took place, or that process context is to be changed, or that an exception or interrupt has occurred, the macropipeline must be flushed in order that no processor state changes as a result of subsequent pipeline operations.
As part of this flush operation, all pending or currently executing references in the Mbox which correspond to flushed instructions are immediately and permanently aborted. The Ebox informs the Mbox of this situation by asserting E%FLUSH_MBOX_H. The assertion of E%FLUSH_MBOX_H invokes the following Mbox actions:

1. The SPEC_QUEUE is invalidated. Any reference sent to the Mbox SPEC_QUEUE during the cycle in which E%FLUSH_MBOX_H is asserted is not validated.

2. The SPEC_Q_SYNC_CTR is unconditionally reset to the value of 0. The effect of this is to inhibit further SPEC_QUEUE reference processing by never selecting the SPEC_QUEUE as the S5 reference source (See Section 12.3.20). It does not inhibit the Ibox from loading references into the SPEC_QUEUE during subsequent cycles, however. This function is associated with the scheme for flushing the PA_QUEUE. See Section 12.3.21.2.3.

3. If the current S5 reference was driven from the SPEC_QUEUE, it is aborted.

4. If the EM_LATCH contains any type of read, IPR_RD, probe or MME_CHK, it is invalidated. Any reference sent to the EM_LATCH during the cycle that E%FLUSH_MBOX_H is asserted is not validated.

5. If the current S5 reference was driven from the EM_LATCH, and this reference is any type of read, IPR_RD, probe or MME_CHK, it is aborted.

6. If the VAP_LATCH contains any type of read or DEST_ADDR, it is invalidated. If a read or DEST_ADDR is being loaded into the VAP_LATCH during the cycle that E%FLUSH_MBOX_H is asserted, the VAP_LATCH is not validated.

7. If the current S5 reference was driven from the VAP_LATCH, and this reference is any type of read or DEST_ADDR, it is aborted.

8. If the RTY_DMISS_LATCH contains any type of an Ibox or Ebox read, it is invalidated. If an Ibox or Ebox read is being loaded into the RTY_DMISS_LATCH during the cycle that E%FLUSH_MBOX_H is asserted, the RTY_DMISS_LATCH is not validated.

9. If the current S5 reference was driven from the RTY_DMISS_LATCH, and this reference is an Ibox or Ebox read, it is aborted.

10. If the DMISS_LATCH contains a currently outstanding Ibox or Ebox read, the DMISS_LATCH state is modified to indicate that the data should not be sent to the Ibox or Ebox when the data becomes available.

11. MMESTS<31:29> are cleared. This unlocks the MMESTS register.

The effect of items 1 through 10 above can be summarized as follows. All Ibox and Ebox D-stream reads which have not yet propagated into S6 are blown away. Note that Mbox D-stream reads (PTE references) are not affected by E%FLUSH_MBOX_H. Any outstanding D-stream fill sequence corresponding to an Ibox or Ebox D-stream read is allowed to complete in order that the D-stream data is filled in the Pcache. However, the requested data will not be returned to the Ibox and/or Ebox.

Any WRITE or STORE reference which existed in one of the Mbox reference sources PRIOR to the E%FLUSH_MBOX_H assertion is allowed to complete in the presence of the E%FLUSH_MBOX_H assertion. This is necessary because any write data existing in the Mbox prior to the E%FLUSH_MBOX_H assertion represents a memory modification corresponding to an action before the Ebox decided to flush.

If E%FLUSH_MBOX_H is asserted during a cycle with an outstanding D-stream read or D-stream fill, the Mbox logic guarantees that the M%IBOX_DATA_L and M%EBOX_DATA_H signals will not be asserted in response to the D-stream read/fill during any subsequent cycles. However, M%IBOX_DATA_L or M%EBOX_DATA_H may be asserted during the same cycle that E%FLUSH_MBOX_H is asserted. It is the responsibility of the Ibox and Ebox to ignore the corresponding data in this case.
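The per-source rules in the list above can be condensed into a small lookup, shown here as a hedged sketch. The function name, the string encodings of sources and commands, and the `originator` parameter are all hypothetical. Treating "any type of read" as DREAD, DREAD_MODIFY and DREAD_LOCK is likewise an assumption made for illustration.

```python
# Assumption: "any type of read" is modelled as these three D-stream reads.
READS = frozenset({"DREAD", "DREAD_MODIFY", "DREAD_LOCK"})

# Commands invalidated in each reference source when E%FLUSH_MBOX_H is
# asserted. None means every command held in that source is invalidated.
FLUSHED_COMMANDS = {
    "SPEC_QUEUE": None,
    "EM_LATCH": READS | {"IPR_RD", "PROBE", "MME_CHK"},
    "VAP_LATCH": READS | {"DEST_ADDR"},
    "RTY_DMISS_LATCH": READS,
}


def flushed_by_e_flush_mbox(source, command, originator="EBOX"):
    """Return True if a pending reference is invalidated by E%FLUSH_MBOX_H.

    source     -- one of the keys of FLUSHED_COMMANDS
    command    -- reference type, e.g. "DREAD", "WRITE", "STORE"
    originator -- "IBOX", "EBOX" or "MBOX" (hypothetical encoding)
    """
    if originator == "MBOX":
        # Mbox D-stream reads (PTE references) are not affected.
        return False
    if source not in FLUSHED_COMMANDS:
        raise ValueError("unknown reference source: " + source)
    flushed = FLUSHED_COMMANDS[source]
    return True if flushed is None else command in flushed


if __name__ == "__main__":
    for src, cmd in [("EM_LATCH", "DREAD"), ("EM_LATCH", "STORE"),
                     ("VAP_LATCH", "DEST_ADDR"), ("VAP_LATCH", "WRITE")]:
        print(src, cmd, flushed_by_e_flush_mbox(src, cmd))
```

The table makes the asymmetry explicit: reads and address-only references are dropped, while WRITE and STORE references already held in the EM_LATCH or VAP_LATCH are allowed to complete.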
Note that I%FLUSH_IREF_LAT_H causes an outstanding I-stream fill sequence to be completely stopped, but E%FLUSH_MBOX_H allows an outstanding D-stream fill sequence to continue without returning data to the Ibox and/or Ebox. These two cases are handled differently based on performance model data which indicates that it is beneficial to future references to complete the D-stream fill, but allowing the I-stream fill to complete only hinders the immediate need of accessing different I-stream data.

12.3.21.2.3 Ebox Flushing of the PA_QUEUE

The function of E%FLUSH_MBOX_H described above is to clear out reference state associated with instructions that had not yet been started by the Ebox. Note, however, that E%FLUSH_MBOX_H does not flush the PA_QUEUE even though the PA_QUEUE may contain reference state that should be logically flushed by E%FLUSH_MBOX_H. This is because the PA_QUEUE may also contain reference state associated with the currently executing Ebox instruction. The PA_QUEUE entries associated with the currently executing Ebox instruction must be retired from the PA_QUEUE in the normal fashion before the remaining PA_QUEUE entries may be flushed. Thus, flushing the PA_QUEUE is a two-step process described as follows:

1. As described in Section 12.3.21.2, E%FLUSH_MBOX_H inhibits the Mbox arbitration logic from selecting SPEC_QUEUE references for processing during subsequent cycles. This function guarantees that no more PA_QUEUE entries can be filled during subsequent cycles.

2. Once the Ebox has issued all STOREs corresponding to state modifications that must occur before the Mbox is completely flushed, the Ebox issues another reference which is qualified with the E%FLUSH_PA_QUEUE_H signal. Once this EM_LATCH reference executes in S5, the Mbox is guaranteed to have completed all subsequent STORE references. Thus, when this EM_LATCH reference executes, the remaining entries in the PA_QUEUE are flushed.
Note that both halves of an unaligned STORE will complete before the "E%FLUSH_PA_QUEUE" reference is executed because the second half of the reference is stored in the VAP_LATCH, which has higher priority than the EM_LATCH.

The Ebox will assert E%RESTART_SPEC_QUEUE_H once the "E%FLUSH_PA_QUEUE" reference has been latched in the EM_LATCH. E%RESTART_SPEC_QUEUE_H re-enables Mbox processing of SPEC_QUEUE references during subsequent cycles.

MICROCODE RESTRICTION

Once E%FLUSH_MBOX_H has been asserted, E%FLUSH_PA_QUEUE_H and E%RESTART_SPEC_QUEUE_H must be asserted before the Ibox or Ebox require further Mbox processing of Ibox or Ebox D-stream references. E%FLUSH_PA_QUEUE_H and E%RESTART_SPEC_QUEUE_H must be asserted during a cycle subsequent to the assertion of E%FLUSH_MBOX_H, and only when the microcode guarantees that all corresponding STORE commands have been retired by the EM_LATCH.

12.4 THE PCACHE

The Pcache is a two-way set associative, read allocate, no-write allocate, write through, physical address cache of I-stream and D-stream data. The Pcache has a one cycle access and a one cycle repetition rate for both reads and writes. It stores 8192 bytes (8K) of data and 256 tags corresponding to 256 hexaword blocks (1 hexaword = 32 bytes). Each tag is 20 bits wide, corresponding to bits <31:12> of the physical address. There are four quadword subblocks per block, with a valid bit associated with each subblock. The access size for both Pcache reads and writes is one quadword. Even byte parity is maintained for each byte of data (32 bits per block). One bit of even parity is maintained for every tag.
The logical organization of the Pcache is shown below:

Figure 12-43: Logical Pcache Organization

          <------------------ left bank ------------------> <----------------- right bank ----------------->
     +---+----+-----+----+------+------+------+------+----+-----+----+------+------+------+------+
     | A | TP | TAG | VB | D/DP | D/DP | D/DP | D/DP | TP | TAG | VB | D/DP | D/DP | D/DP | D/DP |
     +---+----+-----+----+------+------+------+------+----+-----+----+------+------+------+------+
       :                                                                                        :
     +---+----+-----+----+------+------+------+------+----+-----+----+------+------+------+------+
 127 | A | TP | TAG | VB | D/DP | D/DP | D/DP | D/DP | TP | TAG | VB | D/DP | D/DP | D/DP | D/DP |
     +---+----+-----+----+------+------+------+------+----+-----+----+------+------+------+------+

where:
  A    - Allocation bit. Indicates whether the left or right bank was last allocated.
  TP   - 1 bit of even tag parity.
  TAG  - 20 bits of tag address.
  VB   - 4 valid bits. Each bit corresponds to 8 bytes of data.
  D/DP - 8 bytes of data with 8 bits of even byte parity (72 total bits).

The Pcache is logically organized into 128 direct mapped indexes, where each index consists of two blocks, and each block consists of: 20-bit tag, 1-bit tag parity, 4 valid bits, 256 bits of data, and 32 bits of data parity. In addition, each index also contains a one bit allocation pointer.
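The organization described above (20-bit tags taken from bits <31:12>, 128 indexes, and four quadword subblocks per 32-byte block) implies a fixed split of the physical address. The following is a minimal sketch of that split. The function and constant names are hypothetical and the code is a software illustration, not a description of the decoder circuitry.

```python
BLOCK_BYTES = 32        # one hexaword block
SUBBLOCK_BYTES = 8      # one quadword subblock
NUM_INDEXES = 128       # direct mapped indexes
NUM_BANKS = 2           # two blocks (left/right bank) per index

TAG_SHIFT = 12          # tag is physical address bits <31:12>
INDEX_SHIFT = 5         # index is bits <11:5>
SUBBLOCK_SHIFT = 3      # subblock is bits <4:3>


def decode_pcache_address(pa):
    """Split a 32-bit physical address into (tag, index, subblock)."""
    pa &= 0xFFFFFFFF
    tag = pa >> TAG_SHIFT
    index = (pa >> INDEX_SHIFT) & (NUM_INDEXES - 1)
    subblock = (pa >> SUBBLOCK_SHIFT) & (BLOCK_BYTES // SUBBLOCK_BYTES - 1)
    return tag, index, subblock


# Sanity check on the stated capacity: 128 indexes x 2 banks x 32 bytes.
assert NUM_INDEXES * NUM_BANKS * BLOCK_BYTES == 8192

if __name__ == "__main__":
    tag, index, subblock = decode_pcache_address(0x12345678)
    print(hex(tag), index, subblock)
```

Two addresses that differ only in bits <31:12> select the same index and therefore compete for the two banks of that index.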
The breakdown of address bits for Pcache decoding is shown below:

Figure 12-44: Pcache Address Breakdown

 31                                    12 11               05 04 03 02    00
+----------------------------------------+-------------------+-----+--------+
|              tag address               |   index address   |     |        |
+----------------------------------------+-------------------+-----+--------+
                                                              <-+->
                                                                |
                                                        subblock address

where:
  tag address    - bits loaded into or compared with tag.
  index address  - addresses 1 of 128 indexes.
  subblk address - addresses 1 of 4 aligned quadwords within the hexaword data block.

12.4.1 PCCTL

The PCCTL controls the mode of operation of the Pcache. PCCTL is accessible by IPR_RD and IPR_WR operations. See Figure 12-31 for the definition of this register.

Note that Pcache operation is further qualified by the state of PCSTS<0> (See Section 12.6 for more information about PCSTS). If this bit is non-zero, Pcache operation is automatically forced to behave as if I_ENABLE=0 and D_ENABLE=0, regardless of the actual state of I_ENABLE and D_ENABLE. Effectively, this shuts down normal Pcache operation due to the presence of a previous Pcache parity error. Note that Pcache invalidate operations are only disabled if both D_ENABLE=0 and I_ENABLE=0, or if PCSTS<0> is set.

Note that the ELEC_DISABLE bit of PCCTL is intended for debug use only. This bit electrically disables the Pcache to reduce power dissipation. This bit should only be set when the Pcache is functionally turned off by the deassertion of both I_ENABLE and D_ENABLE. UNPREDICTABLE operation will result when this bit is set when either I_ENABLE or D_ENABLE is also set. Any further discussion concerning Pcache function assumes that ELEC_DISABLE is inactive.
Also note that all Pcache IPR_RD and IPR_WR operations will function correctly regardless of the state of I_ENABLE or D_ENABLE or PCSTS<0>. However, Pcache array IPRs will not function if ELEC_DISABLE is set.

If either D_ENABLE or I_ENABLE is to be toggled to the on state, the Pcache array must be initialized prior to such action. See Section 12.8.2.1 for more information about Pcache initialization.

When the FORCE_HIT (Force Hit) bit is set and I-stream or D-stream operation is enabled, all enabled memory space read and write references are forced to hit in the Pcache regardless of the value of the stored tag. The BANK_SEL bit specifies which tag of the pair of tags addressed is forced to hit. Thus when FORCE_HIT=1, the Pcache becomes a 4K direct mapped cache with all reads and writes forced to hit in the Pcache. Toggling BANK_SEL causes the other half of the 8K Pcache to become accessible in this direct mapped mode. Note that BANK_SEL never affects bank selection during IPR reads and IPR writes to the Pcache tags or Pcache data parity bits; bank selection for these commands is always determined by the specified IPR address. Also note that the FORCE_HIT bit only affects memory space references. I/O space references still miss in the Pcache regardless of the state of the FORCE_HIT bit.

The FORCE_HIT feature is designed to facilitate testing the Pcache data array and to make diagnostic tests easily loadable within the Pcache by simple WRITE operations.

When FORCE_HIT=0, the Pcache is configured as an 8K 2-way set associative cache, no reads or writes are forced to hit, and the BANK_SEL bit is a don't care.

The P_ENABLE (Parity Enable) bit allows the detection of Pcache tag and data parity errors to be enabled or disabled. If P_ENABLE=0, Pcache parity errors will not be detected.
Thus when P_ENABLE=0, no Pcache error will be recorded in PCSTS or will be reported to the Ebox. Note, however, that when FORCE_HIT=1, Pcache tag parity is never checked regardless of the state of P_ENABLE.

12.4.2 Pcache Hit/Miss Determination

12.4.2.1 Hit/Miss Determination by Tag Comparison

When an IREAD, DREAD, DREAD_MODIFY, WRITE, WRITE_UNLOCK, or INVAL operation is executed, the Pcache must determine if the referenced data is present in its array. To do this, physical address bits<11:5> are input to the Pcache row decoders in order to determine which one of the 128 direct mapped indexes is being addressed. Subsequently, all 629 bits within the addressed index are accessed by the assertion of the corresponding word line. The two accessed tag values are simultaneously compared to physical address bits<31:12>. A Pcache hit condition occurs when all of the following conditions are simultaneously true:

• The contents of one of the two addressed tags matches the data on M%S6_PA_H<31:12>.

• The valid bit corresponding to both the matched tag and to the addressed subblock (specified by physical address bits<4:3>) is set.

• The stored tag parity corresponding to the matched tag is the same as the value calculated off of M%S6_PA_H<31:12>.

If an address match is detected on one of the tags and the valid bit which corresponds to both the matched tag and the addressed subblock (specified by physical address bits<4:3>) is set, then a Pcache hit condition has been detected on the corresponding Pcache tag. The absence of the Pcache hit condition causes a Pcache miss condition.

12.4.2.2 Conditions which force Pcache Miss

The Pcache miss condition is forced to override the tag determination of hit/miss described above when any one of the following conditions is satisfied:

• If PCSTS<0> is set, the Pcache miss condition is forced due to a previous Pcache parity error.
• If an IREAD or I_CF operation is accessing the Pcache and I_ENABLE=0, the Pcache miss condition is forced.

• If a D-stream read or D_CF operation is accessing the Pcache and D_ENABLE=0, the Pcache miss condition is forced.

• If a DREAD_LOCK operation is executing, the Pcache miss condition is forced. This guarantees that the read will propagate to the Cbox for synchronization purposes.

• If an I_CF operation is executing and the IMISS_LATCH state indicates that the reference cannot be cached, the Pcache miss condition is forced.

• If a D_CF operation is executing and the DMISS_LATCH state indicates that the reference cannot be cached, the Pcache miss condition is forced.

12.4.2.3 Conditions which force Pcache Hit

The Pcache hit condition is forced to override the tag determination of hit/miss described above when any one of the following conditions is satisfied. Note that unless explicitly stated to the contrary, the forced Pcache miss conditions above take precedence over the forced Pcache hit conditions described below.

• If a read reference is tagged as having a memory management fault or hard error associated with it (i.e. M_QUE_MS2%S6_QUAL_H<0> = 1 or M_QUE_MS2%S6_QUAL_H<1> = 1), a Pcache hit condition is forced. NOTE: This force hit condition takes precedence over any force miss condition described above.

• If the operation is a DREAD, DREAD_MODIFY, WRITE, or WRITE_UNLOCK, and D_ENABLE=1 and FORCE_HIT=1, the Pcache hit condition is forced on the tag corresponding to both the addressed Pcache index and the bank specified by the BANK_SEL bit EXCEPT when the address maps to I/O space. I/O references must never hit in the Pcache regardless of the state of FORCE_HIT.

• If the operation is an IREAD and I_ENABLE=1 and FORCE_HIT=1, the Pcache hit condition is forced on the tag corresponding to both the addressed Pcache index and the bank specified by the BANK_SEL bit.

• If the operation is a D_CF and D_ENABLE=1 and the DMISS_LATCH state indicates that the reference is cacheable, the Pcache hit condition is forced and the bank is specified by the allocation field of the DMISS_LATCH.

• If the operation is an I_CF and I_ENABLE=1 and the IMISS_LATCH state indicates that the reference is cacheable, the Pcache hit condition is forced and the bank is specified by the allocation field of the IMISS_LATCH.

12.4.3 Pcache Read Operation

A Pcache read operation is initiated by a DREAD, DREAD_MODIFY, or IREAD reference. A Pcache read begins by determining the Pcache hit or miss condition described above. If a Pcache hit is detected, the quadword of data corresponding to both the tag in which the hit occurred and to physical address bits<4:3> is driven out of the Pcache.

If a Pcache miss condition is asserted, all the data driven out of the Pcache is ignored except for the allocation bit. The allocation bit is stored in the DMISS_LATCH (in the case of a D-stream read) or in the IMISS_LATCH (for an IREAD). This bit will be used during a cache fill operation to select the appropriate block to be filled (See Section 12.4.6 for information about allocating and filling blocks).

12.4.4 Pcache Write Operation

A Pcache write operation is initiated by a STORE, WRITE or WRITE_UNLOCK reference. A Pcache write begins by determining the Pcache hit or miss condition described above. If a Pcache hit is detected, the data present on B%S6_DATA_H<63:0> is selectively written into the quadword corresponding to both the tag in which the hit occurred and to physical address bits<4:3>.
The data is selectively written by using M%S6_BYTE_MASK_H<7:0> as a write enable for the eight respective bytes of data. The corresponding data parity is also written in the same manner for each corresponding byte which is written. If a Pcache miss condition occurs, no Pcache write operation takes place. However, the write reference is forwarded to the Cbox for processing regardless of the hit/miss condition in the Pcache.

12.4.5 Pcache Replacement Algorithm

When a Pcache miss occurs during a read operation, it must be decided which one of two blocks will be allocated for the subsequent Pcache fill sequence. When the Pcache miss occurred because no validated tag field matched the read address, the state of the corresponding allocation bit indicates which bank (left or right) should be used for the resulting fill sequence.

The value of each allocation bit changes according to the "not-last-used" algorithm. That is, the allocation bit always points to the bank within the index that was not last accessed. When a read miss occurs because no validated tag field matched the read address, the value of the allocation bit is latched in the MISS_LATCH corresponding to the read miss. This latched value will be used as the bank select input during the subsequent fill sequence. As each fill operation takes place, the inverse of the allocation value stored in the MISS_LATCH is written into the allocation bit of the addressed Pcache index. During Pcache read or write operations, the value of the allocation bit is set to point to the bank opposite the one that was just referenced, because this is now the new "not-last-used" bank.

The one exception to this algorithm occurs during an invalidate. When an invalidate clears the valid bits of a particular tag within an index, it only makes sense to set the allocation bit to point to the bank select used during the invalidate regardless of which bank was last allocated.
By doing so, we guarantee that the next allocated block within the index will not displace any valid tag because the allocation bit points to the tag that was just invalidated.

12.4.6 Pcache Fill Operation

A Pcache fill operation is initiated by the I_CF (I-stream cache fill) or D_CF (D-stream cache fill) reference. A fill operation can be considered to be a specialized form of a write operation. A fill is functionally identical to a Pcache write operation except for the following differences:

• The bank within the addressed Pcache index is selected by the following algorithm. If a validated tag field within the addressed index matches the cache fill address, then the block corresponding to this tag is used for the fill operation. If this is not true, then the value of the corresponding allocation bit selects which block will be used for the fill.
• The first fill operation to a block causes all four valid bits of the selected bank to be written such that the valid bit of the corresponding fill data is set and the other three are cleared. All subsequent fills cause only the valid bit of the corresponding fill data to be set.
• Any fill operation causes the fill address bits<31:12> to be written into the tag field of the selected bank. Tag parity is also written in an analogous fashion.
• A fill operation causes the allocation bit to be written with the complement of the value latched by the corresponding MISS_LATCH during the initial read miss event.
• A fill operation forces every bit of the corresponding byte mask field to be set. Thus, all eight bytes of fill data are always written into the Pcache array on a fill operation.

12.4.7 Pcache Invalidate Operation

A Pcache invalidate operation is initiated by the INVAL reference.
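The allocation-bit behavior of Sections 12.4.5 and 12.4.6, together with the invalidate exception noted in Section 12.4.5, can be sketched as a small software model. This is an illustration only, not the hardware implementation. The class and method names are hypothetical, bank 0 is assumed to be the left bank, a tag is assumed to be "validated" when any of its four valid bits is set, and tag parity is not modelled.

```python
from dataclasses import dataclass, field

LEFT, RIGHT = 0, 1  # assumed encoding of the two banks


@dataclass
class PcacheIndex:
    """Hypothetical model of one Pcache index (two banks, one allocation bit)."""
    tags: list = field(default_factory=lambda: [None, None])
    valid: list = field(default_factory=lambda: [[False] * 4, [False] * 4])
    alloc: int = LEFT  # bank that the next allocation will use

    def _validated_bank(self, tag):
        # Assumption: a tag is "validated" when any sub-block valid bit is set.
        for bank in (LEFT, RIGHT):
            if self.tags[bank] == tag and any(self.valid[bank]):
                return bank
        return None

    def read(self, tag, subblock):
        """Return (hit, latched_alloc); latched_alloc models the MISS_LATCH."""
        bank = self._validated_bank(tag)
        if bank is not None and self.valid[bank][subblock]:
            self.alloc = bank ^ 1          # point at the not-last-used bank
            return True, None
        return False, self.alloc           # miss: latch the allocation bit

    def fill(self, tag, subblock, latched_alloc):
        """Fill one quadword and return the bank that was written."""
        bank = self._validated_bank(tag)
        if bank is None:                   # first fill to this block
            bank = latched_alloc
            self.valid[bank] = [False] * 4
        self.tags[bank] = tag
        self.valid[bank][subblock] = True
        self.alloc = latched_alloc ^ 1     # complement of the latched value
        return bank

    def invalidate(self, tag):
        """Clear a matching tag's valid bits; the match ignores valid bits."""
        for bank in (LEFT, RIGHT):
            if self.tags[bank] == tag:
                self.valid[bank] = [False] * 4
                self.alloc = bank          # next allocation reuses this bank
                return bank
        return None                        # no tag match: treated as a NOP
```

In this model, a block allocated immediately after an invalidate lands in the bank that was just invalidated, so the valid tag in the other bank is left in place.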
The invalidate operation is interpreted as a NOP by the Pcache if the address does not match either tag field in the addressed Pcache index. If a match is detected on either tag, an invalidate will occur on that tag. Note that this determination is based only on a match of the tag field bits rather than on satisfying all criteria for the Pcache hit condition (the Pcache hit condition also factors in the valid bits and verified tag parity). When an invalidate is to occur, the four valid bits of the matched tag are written with zeros and the allocation bit is written with the value of the bank select used during the current invalidate operation. Also note that an assertion of C%CBOX_HARD_ERR_H during a cache fill command causes the cache fill operation to be processed as if it were an INVAL operation.

12.4.8 Pcache IPR Access

For testability reasons it is important to verify that every Pcache storage bit can be read and written in both "0" and "1" states. The easiest way to do this is to provide a mechanism to directly read and write every bit in the Pcache array. The data field is already accessible through read and write commands.
The tag field, tag parity, valid bits and data parity are directly accessible through IPR_RD and IPR_WR operations to the Pcache IPRs defined below:

Figure 12-45: IPR Address Space Mapping

Each format is a 32-bit address (bits 31 through 00). Fields are listed from the most significant bit to the least significant bit.

Normal IPR Address:              SBZ | 0 | SBZ | IPR number
Pcache TAG IPR Address:          SBZ | 1 | 1 | 0 | SBZ | B | Pcache index addr | SBZ
Pcache Data Parity IPR Address:  SBZ | 1 | 1 | 1 | SBZ | B | Pcache index addr | SBA | SBZ

where:
B = 0 selects the left bank of the specified index.
B = 1 selects the right bank of the specified index.
SBA = subblock address selection.

The format of a Pcache tag IPR is shown in Figure 12-32. The tag parity bit is included in the Pcache tag IPR format to allow the user to write bad tag parity into the array in order to verify the tag parity logic. Further, the valid bits and allocation bit are also included so that the Pcache can be initialized to a known state.
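As an illustration of the address formats above, the following sketch builds Pcache tag and data parity IPR addresses. The bit positions are assumptions inferred from the PCTAG and PCDAP address ranges listed in Table 12-18 (B at bit 12, the 7-bit Pcache index at bits <11:5>, SBA at bits <4:3>), and the function names are hypothetical.

```python
PCTAG_BASE = 0x01800000  # first Pcache tag IPR address (Table 12-18)
PCDAP_BASE = 0x01C00000  # first Pcache data parity IPR address (Table 12-18)

# Assumed field positions, inferred from the Table 12-18 address ranges.
BANK_SHIFT = 12   # B: 0 = left bank, 1 = right bank
INDEX_SHIFT = 5   # Pcache index address, 128 indices
SBA_SHIFT = 3     # subblock (quadword) address within the block


def _check(index, bank):
    if not 0 <= index < 128:
        raise ValueError("Pcache index must be in 0..127")
    if bank not in (0, 1):
        raise ValueError("bank must be 0 (left) or 1 (right)")


def pctag_ipr_address(index, bank):
    """Address of the tag IPR for one bank of one Pcache index."""
    _check(index, bank)
    return PCTAG_BASE | (bank << BANK_SHIFT) | (index << INDEX_SHIFT)


def pcdap_ipr_address(index, bank, subblock):
    """Address of the data parity IPR for one quadword of one block."""
    _check(index, bank)
    if not 0 <= subblock < 4:
        raise ValueError("subblock must be in 0..3")
    return (PCDAP_BASE | (bank << BANK_SHIFT)
            | (index << INDEX_SHIFT) | (subblock << SBA_SHIFT))
```

Under these assumptions, a loop over all 128 indices and both banks visits every tag IPR once, which is the kind of sweep needed to initialize the Pcache to a known state.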
The format of a Pcache Data Parity IPR is shown in Figure 12-33. This IPR allows the Pcache data parity to be directly read and written for testability purposes.

12.4.9 Pcache IPR Summary

The following table summarizes all IPRs associated with the Pcache:

Table 12-18: Pcache IPRs

Register Name                                                       IPR Address (in hex)
PCADR (quadword address of reference causing Pcache parity error)   F0
PCSTS (status of Pcache parity error)                               F1
PCCTL (control state of Pcache operation)                           F2
PCTAG                                                               01800000 .. 01801FE0
PCDAP                                                               01C00000 .. 01C01FF8

See Section 12.6 for a description of the PCADR and PCSTS registers. Note that with the exception of the Pcache tag IPRs, the addresses of the three other Pcache IPRs are driven into the Mbox shifted left two bits. This fact is not reflected in the above table.

12.4.10 Pcache States Resulting in UNPREDICTABLE operation

The capability of arbitrarily altering Pcache state through IPR write operations allows for the possibility of putting the Pcache into obscure states which cannot be achieved by "normal" operation. Two of these states will cause UNPREDICTABLE behavior:

1. Setting the ELEC_DISABLE bit in PCCTL will cause IPR read operations to the Pcache tag or Pcache data parity bits to return incorrect data. Setting the ELEC_DISABLE bit will cause IPR write operations to the Pcache tag or Pcache data parity bits to be disabled. Setting ELEC_DISABLE with either I_ENABLE or D_ENABLE set may cause Pcache read operations to return incorrect data. Setting ELEC_DISABLE with either I_ENABLE or D_ENABLE set will cause Pcache write, invalidate and cache fill functions to be disabled.
2. Through explicit Pcache tag IPR write operations, a user could write both blocks of a Pcache index with the same tag, tag parity and valid bit data.
If this condition occurs with one or more sub-block valid bits set, the Pcache will return invalid data on references corresponding to the written tag (note that normal Pcache operation precludes this situation from ever occurring).

12.4.11 Pcache Redundancy Logic

Due to the extreme density of the Pcache array, the Pcache has a high susceptibility to manufacturing defects. As a result, redundancy logic was designed in order to provide a mechanism which would allow the Pcache to function correctly in the presence of a small number of manufacturing defects.

The redundancy logic consists of hardware which supports the operation of sixteen extra indices which exist in addition to the 128 "regular" indices. If a defect exists in an index which does not disturb the function of any column logic, the redundancy logic allows the bad index to be replaced by one of the 16 extra indices. If an index is determined to be malfunctioning during chip test, a redundant index can be substituted for the bad index by blowing specific fuses on the chip through the use of a laser. Blowing these fuses creates logic state transitions on redundancy control signals which disable the operation of a set of 4 "regular" indices and enable the operation of 4 redundant indices in their place.

Four sets of four redundancy fuses exist. Each set controls 4 of the 16 redundant indices. Each set can map its 4 redundant indices into one of 8 different sets of 4 "regular" indices.
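The replacement scheme can be sketched in software as follows. This is a hedged illustration, not the chip's logic: it assumes a regular index has already been decomposed into a fuse-set selector (the RS field, 0..3), a 3-bit group address (compared against the fuse-programmed RED_ADDR) and a 2-bit position within the group of four (the X bits). The `FuseSet` type, the function name and the numbering of the redundant indices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FuseSet:
    """One set of four redundancy fuses (hypothetical representation)."""
    enable: bool = False  # 1 fuse: use the redundant group
    red_addr: int = 0     # 3 fuses: which of 8 regular groups is replaced


def select_index(rs, group, x, fuse_sets):
    """Return ("redundant", n) or ("regular", (rs, group, x)).

    rs    - which of the four fuse sets covers this index (0..3)
    group - which of the 8 groups of four regular indices (0..7)
    x     - position within the group of four (0..3)
    """
    fuses = fuse_sets[rs]
    if fuses.enable and fuses.red_addr == group:
        # Assumed numbering: fuse set rs owns redundant indices 4*rs .. 4*rs+3.
        return ("redundant", 4 * rs + x)
    return ("regular", (rs, group, x))
```

With all fuses intact every access uses a regular index. Programming one fuse set redirects exactly four regular indices and leaves the other 124 untouched.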
The redundancy mapping is shown below:

Figure 12-46: Pcache Address Redundancy Mapping

Address fields, listed from the most significant to the least significant: RS | X | RED_ADDR | X

where:
RS represents the address bits corresponding to the four sets of four redundancy fuses.
The two X's represent the address bits corresponding to the set of four indices which get replaced.
RED_ADDR represents the laser-programmable address bits that specify which one of 8 sets of 4 "regular" indices are to be replaced.

Each set of 4 redundancy fuses consists of three bits to specify the address mapping (specified by RED_ADDR above) and 1 bit to enable the redundant indices to operate in place of the specified set of "regular" indices. When one or more redundancy elements are blown, another fuse is also blown which will set the RED_ENABLE bit in PCCTL (see Figure 12-31). Thus, by reading the PCCTL IPR one can determine if one or more redundancy elements has been enabled.

12.5 MEMORY MANAGEMENT

The Mbox, the Ebox microcode, and the VMS memory management software implement VAX memory management. The Mbox performs the hardware memory management functions necessary to process most references in a quick, efficient manner. The operating system software performs all other functions. For a description of the hardware end of VAX memory management, the reader is referred to the Memory Management chapter of the "VAX Architecture Standard" (DEC STD 032).
For a complete description of the software end of VAX/VMS memory management, the reader is referred to the Memory Management chapters of "VAX/VMS Internals and Data Structures".

The Mbox is responsible for the following memory management functions:

• Performing virtual-to-physical address translations.
• Maintaining a cache of PTEs to perform the quick translations.
• Performing access mode checks on memory references.
• Performing TNV checks on memory references.
• Performing M=0 checks on memory references.
• Directly or indirectly invoking a software memory management exception handler due to ACV (Access Violation) or TNV (Translation not Valid) or M=0 faults.
• Detecting cross-page conditions and performing the corresponding access mode checks.

12.5.1 NVAX MEMORY STRUCTURE

12.5.1.1 Virtual Address Space

The NVAX virtual address space conforms with the description of the VAX virtual address space. The space contains four gigabytes (2**32) of memory divided into four regions as shown below:

Figure 12-47: Virtual Address Space Layout

00000000 .. 3FFFFFFF   P0 Region       length in pages: P0LR; grows toward higher addresses
40000000 .. 7FFFFFFF   P1 Region       length in pages: 2**21 - P1LR; grows toward lower addresses
80000000 ..            System Region   length in pages: SLR; grows toward higher addresses
FFFFFE00 .. FFFFFFFF   Reserved Page

NOTE
NVAX CPU chips at revision 1 implement the original VAX memory management architecture in which any reference to a virtual address above BFFFFFFF (hex) falls into a reserved region and causes a length violation.
NVAX CPU chips at revision 2 or later implement the extended S0 space addressing described above.

12.5.1.2 Physical Address Spaces

The NVAX hardware addresses a physical address space defined by another four gigabyte region. The first seven-eighths of it addresses physical memory. The top one-eighth of this space addresses I/O space. Thus, all I/O space addresses can be distinguished by physical address bits<31:29> = 111 (binary).

Figure 12-48: Physical Address Space of the NVAX Hardware

00000000 .. DFFFFFFF   Memory Space   3.5 Gigabytes
E0000000 .. FFFFFFFF   I/O Space      512 Megabytes

12.5.1.2.1 Physical Address Space Mappings

The Mbox is designed to accommodate both a 30-bit and 32-bit physical address space as seen at the program level while maintaining one physical address space as seen by all NVAX hardware external to the Mbox (shown above). These two program-level physical address spaces are mapped by Mbox hardware into the NVAX physical address space according to the value of the PAMODE register. See Figure 12-23 for a description of PAMODE. The PAMODE register is accessible by the IPR_RD and IPR_WR commands.

When PAMODE=0, the 30-bit physical address space seen at the program level is translated into the NVAX
physical address space as follows:

Figure 12-49: 30-bit Physical Address Mapping

Program Level 30-bit Address Space               NVAX Physical Address Space
(mapped when PAMODE=0)
00000000 .. 1FFFFFFF   Memory Space        ->    00000000 .. 1FFFFFFF   Memory Space
20000000 .. 3FFFFFFF   I/O Space           ->    E0000000 .. FFFFFFFF   I/O Space
40000000 .. FFFFFFFF   Outside of 30-bit space
                                                 20000000 .. DFFFFFFF   Inaccessible region when PAMODE=0

Logically speaking, this mapping is accomplished by the Mbox by sign-extending physical address<29> into physical address<31:29>.

When PAMODE=1, the 32-bit physical address space seen at the program level is directly translated into the NVAX physical address space:

Figure 12-50: 32-bit Physical Address Mapping

Program Level 32-bit Address Space               NVAX Physical Address Space
(mapped when PAMODE=1)
00000000 .. DFFFFFFF   Memory Space        ->    00000000 .. DFFFFFFF   Memory Space
E0000000 .. FFFFFFFF   I/O Space           ->    E0000000 .. FFFFFFFF   I/O Space

12.5.1.3 ADDRESS TRANSLATION AND THE TB

For a complete description of VAX virtual address translation, the reader is referred to the Memory Management chapter of the "VAX Architecture Standard" (DEC STD 032).
An overview of this process can be found in Section 2.6 of this specification.

The Mbox performs virtual-to-physical address translations in the S5 pipe when the following two conditions are satisfied:

1. The MAPEN bit is set (MAPEN enables virtual address translations).
2. M_QUE%S5_QUAL_H<6> indicates that the S5 reference is a virtual reference.

When both of these conditions are met, the address in M_QUE%S5_VA_H<31:0> is translated by the Mbox, and the resulting physical address is driven on M_QUE%S5_PA_H<31:0>. If both these conditions are not satisfied, the contents of M_QUE%S5_VA_H<31:0> is treated as a physical address and is directly transferred to M_QUE%S6_PA_H<31:0>.

The TB (translation buffer) is the mechanism by which the Mbox performs quick virtual-to-physical address translations. It is a 96-entry read allocate fully associative cache of PTEs (Page Table Entries). The format of a page table entry and a TB entry are shown below.

Figure 12-51: PTE and TB format

Page Table Entry (32 bits, fields listed from the most significant bit): V | PROT | M | S | Physical Page Frame Address

where:
V    - valid bit
PROT - authorized access modes
M    - modify bit
S    - reserved bit

TB Entry (fields listed from the most significant bit): TBV | TP | TP_BAR | TAG | PROT | M | DP | PFN

where:
TBV    - TB entry valid bit
TP     - even tag parity bit
TP_BAR - complement of TP
TAG    - virtual address<31:9>
PROT   - authorized access modes
M      - modify bit
DP     - even parity for validated PTE field
PFN    - physical page frame address

Note that
the TB entry stores all but three bits of the PTE field. The TB entry does not store the S bit because it is not used, and the TB entry does not store the upper two bits of the PTE PFN because these bits correspond to a larger physical address space than NVAX uses. The tag field stores the virtual page frame address. The TBV bit indicates whether the corresponding entry is valid. If TBV is set, then PTE<31> is valid because the TB only caches PTEs whose valid bit is set.

The associativity of each TB entry is implemented by the use of comparators on the TBV and tag fields. When a virtual address is driven onto M_QUE%S5_VA_H<31:0> at the start of a cycle, each TB tag comparator whose corresponding TBV bit is set looks for a match between the M_QUE%S5_VA_H virtual page frame address and its corresponding tag. If no comparator finds a match, the TB_MISS condition has occurred, indicating that no TB entry contains a translation for the specified address (see Section 12.5.1.5.2 for discussion of TB_MISSes).

If one of the entries detects a match (TB_HIT condition), the PFN, PROT, and M fields of the corresponding TB entry are read out of the TB. M_QUE%S5_PA_H<31:9> are driven with the contents of the accessed PFN. M_QUE%S5_PA_H<8:0> are the untranslated bits addressing a byte within a page; therefore, these bits are driven directly from M_QUE%S5_VA_H<8:0>. The PROT and M fields, which were driven out of the TB with the PFN, are used by the memory management exception detection logic to determine ACV and M=0 conditions (see Section 12.5.1.5.3).

TB entries are allocated using a NLU (Not-Last-Used) TB allocation pointer. The TB entry pointed to by the NLU allocation pointer is allocated and validated during a TB_TAG_FILL/TB_PTE_FILL sequence.
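The TB hit path described above (tag compare on virtual address<31:9>, PFN substituted for the upper bits, byte offset passed through) can be sketched as follows. This is a software illustration with hypothetical names. The hardware compares all valid entries in parallel, which the loop here only imitates, and parity fields are omitted.

```python
from dataclasses import dataclass

PAGE_SHIFT = 9                       # VA<8:0> addresses a byte within a page
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class TBEntry:
    tbv: bool = False  # TB entry valid bit
    tag: int = 0       # virtual address<31:9>
    prot: int = 0      # authorized access modes
    m: bool = False    # modify bit
    pfn: int = 0       # physical page frame address, PA<31:9>


def tb_lookup(tb, va):
    """Return (pa, prot, m) on a TB hit, or None on a TB miss."""
    vpn = (va & 0xFFFFFFFF) >> PAGE_SHIFT
    for entry in tb:
        if entry.tbv and entry.tag == vpn:
            pa = (entry.pfn << PAGE_SHIFT) | (va & OFFSET_MASK)
            return pa, entry.prot, entry.m
    return None
```

A `None` result corresponds to the TB_MISS condition. The returned `prot` and `m` values are what the exception detection logic would examine for ACV and M=0 conditions.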
The allocation pointer increments in round robin fashion around every TB entry when a TB lookup accesses the entry pointed to by the allocation pointer or when a TB_PTE_FILL operation is done. Because the allocation pointer is guaranteed not to point to the last entry referenced, this scheme implements a not-last-used allocation scheme.

TB entries can be invalidated in the following ways:

• An entry can be invalidated by being displaced from the TB by allocation of another PTE to the same TB entry.
• An entry can be invalidated by execution of the TBIS (TB Invalidate Single) command. If the specified TBIS virtual address matches a TB tag, the TBV bit corresponding to the matched tag is cleared. Clearing the TBV bit invalidates the TB entry (see Section 12.3.13).
• Entries can be invalidated by execution of the TBIP (TB Invalidate Process) command. TBIP causes the most significant bit of all the tag fields to be examined. If this bit is cleared, the corresponding TBV bit is cleared. The effect of this operation is to invalidate all PTEs corresponding to P0 or P1 space translations (see Section 12.3.14).
• All entries can be invalidated by the execution of the TBIA (TB Invalidate All) command. This command resets the TBV bit of every TB entry (see Section 12.3.15).

12.5.1.4 30-bit to 32-bit Physical Address Translations

When PAMODE=0, the NVAX system is configured such that only 30-bit physical addresses are processed at the program level. Since the Mbox and Cbox hardware is designed assuming a 32-bit hardware address space, the Mbox must appropriately translate all 30-bit physical addresses into 32-bit physical addresses based on the mapping scheme shown in Figure 12-49. This is done in two ways.

1.
When the Mbox receives a physical address from one of its reference sources, the mapping is implemented by an address sign extension scheme involving the upper three address bits. In this scheme, address<31:30> are forced to the state of address<29>.

2. When the Mbox receives a virtual address, virtual address translation occurs normally without any sign extension of the resulting physical address. This is possible because the corresponding sign extension function is preprocessed on the upper three bits of the page frame address which is written into the TB during the TB_TAG_FILL operation.

Note that restrictions exist about how the PAMODE register can be modified. See Section 12.8.2 for more information.

12.5.1.5 MEMORY MANAGEMENT EXCEPTIONS

12.5.1.5.1 MME_DATAPATH

The MME_DATAPATH (Memory Management Datapath) is used to process most memory management functions performed by the Mbox. Specifically, it performs the following functions:

• Creates read references of PTEs in order to obtain virtual address translations not currently cached in the TB (see VAX Architecture Standard, DEC STD 032, for a description of this process).
• Creates TB fill operations in order to fill tag and PTE data in the TB.
• Stores most Mbox internal processor registers.
• Stores virtual addresses associated with memory management faults.
• Stores PTE addresses associated with M=0 faults.

The MME_DATAPATH is illustrated below:

Figure 12-52: MME Datapath
(Block diagram showing the MME_ADDR latch, the MME_DATA latch, the MME register file, the ALU with its A and B inputs, and the MME_LATCH.)

12.5.1.5.1.1 MME Register File

The register file has one write port and two read ports (one for each input to the ALU).
The register file contains the following longword registers: Reg Name PAMODE 12-86 The Mbox DefiDitioD Address Mode Register: enables 30 or 32-bit address mapping DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Reg Name MMAPEN12 12 MSLR MSBR1 MPOLR12 MPOBRI MPILR12 Definition Mbox Map Enable Register: turns on/off virtual translations Mbox System Length Register: Length of System Page Table Mbox System Base Register: Addr of System Page Table Mbox PO Length Register: Length of PO Page Table Mbox PO Base Register: Addr of PO Page Table Mbox PI Length Register: Length of PI Page Table :MPIBRI MMEADRI Mbox PI Base Register: Addr of PI Page Table MMEPrEl PrE Address Register MMESTS 1 Status of memory management exception TBADR Address of reference causing TB parity error TBSTS Status of TB parity error TMPI Scratch Register 1 TMP2 Scratch Register 2 MME Faulting Address Register 1 Testability and diagnostic use only; not for software use in normal operation. 2Ebox ucode sends and receives this data toIfrom the MME reg file shifted left 9-bits. Note that the datapath associated with this register file performs all bit shifts associated with MME processing except for 9-bit shifts required on MMAPEN, MSLR, MPOLR, and MP1LR registers. The Ebox microcode sends pre-formatted data to these registers such that the data has been pre-shifted left nine bit positions. This facilitates the M:M:E datapath implementation. IPR_RD operations from these registers send data back to the Ebox in the same format. Thus, the Ebox microcode will re-format the data back into the standard formats illustrated in Table 12-3. Note that a 9-bit left shift is performed on MMAPEN so that the contents of MMAPEN can be used to increment a virtual address by a page in order to perform cross page check operations. The MME_ADDR latch stores the address which was driven on M_QUEo/0S5_VA....H<31:0> during the previous cycle. 
The MME_DATA latch stores the data which was driven on M_QUE%S5_DATA_H<31:0> during the previous cycle. The A input to the ALU is either driven from MME_ADDR, MME_DATA, or the A read port of the register file.

12.5.1.5.1.2 MME ALU

The ALU (Arithmetic Logic Unit) performs the following functions:

• pass A: used for receiving addresses and data from main S5 pipe.
• pass B: used for reading/writing registers.
• A + B: used to generate PTE addresses (note 9-bit right shift on A input).
• A - B: used for page table length checks of P0 and S0 space references (note 7-bit right shift on A input).

The output of the ALU can write the following:

• address field of the MME_LATCH (to generate PTE reads, TB tag fills and TB PTE fills)
• data field of the MME_LATCH (to return requested IPR read data)
• the register file

12.5.1.5.1.3 MME_SEQ

The MME_SEQ is a state machine which controls sequencing of the MME_DATAPATH. It controls which devices drive and latch data in the MME_DATAPATH, what ALU function is to be executed, and what command gets generated and latched in the MME_LATCH. The possible state sequences of the MME_SEQ are illustrated by the following two diagrams:

Figure 12-53: MME Sequences

TB_MISS sequence:
1. Load TMP1.
2. Perform page table length violation check. On a length violation, go to the ACV/TNV sequence.
3. Issue TB tag fill to allocate a new TB entry.
4. Issue PTE DREAD. On a TB_MISS on the PTE DREAD, enter the double TB miss sequence.
5. Receive PTE data. On an ACV/TNV violation, go to the ACV/TNV sequence.
6. Issue TB PTE fill to validate the new TB entry.

Double TB miss sequence:
1. Load TMP2 with the PPTE DREAD address.
2. Perform page table length violation check. On a length violation, go to the ACV/TNV sequence.
3. Issue TB tag fill to allocate a new TB entry.
4. Issue SPTE DREAD (physical read).
5. Receive PTE data. On a TNV violation, go to the ACV/TNV sequence.
6. Issue TB PTE fill to validate the new TB entry.
7. Re-issue TB tag fill to re-allocate the PTE for MME_FAULT_ADDR.
8. Re-issue PTE DREAD.

Figure 12-54: MME Sequences Cont'd

MME IPR_WR sequence:
1. Load the addressed MME IPR from the MME_DATA latch.

ACV/TNV/M=0 sequence:
1. Conditionally load TMP1 from MME_ADDR.
2. Conditionally update MME_FAULT_ADDR and MME_STAT.
3. If M=0 condition: generate the PTE address and conditionally update MME_PTE_ADDR.

MME IPR_RD sequence:
1. Load the MME_LATCH with the addressed MME IPR and issue the IPR_DATA command.

There are five distinct entry points into the MME sequences:

• TB_MISS Entry Point: Whenever a TB_MISS condition is detected on an Ibox or Ebox reference, the MME_SEQ executes the sequence defined by the TB_MISS Entry Point.
• Cross Page Entry Point: The MME_SEQ executes the Cross Page Sequence in order to check for MME faults which may exist on the upper page of a reference that crosses a page boundary.
• ACV/TNV/M=0 Entry Point: The MME_SEQ can execute this sequence when an ACV, TNV, or M=0 condition is detected on an S5 reference, or when an ACV or TNV condition is detected during the TB miss sequence.
• MME IPR_RD Entry Point: The MME_SEQ executes this flow when an Mbox IPR register located in the MME_DATAPATH is addressed by an IPR_RD command.
• MME IPR_WR Entry Point: The MME_SEQ executes this flow when an Mbox IPR register located in the MME_DATAPATH is addressed by an IPR_WR command.

Once an MME sequence starts, the processing of all Ibox and Ebox references is inhibited until the sequence completes.
Once the MME sequence terminates, normal processing resumes and the original reference which initiated the MME sequence will be retried.

12.5.1.5.2 TB MISS SEQUENCE

When memory management is enabled (MAPEN=1) and no valid tag entry in the TB matches the corresponding virtual page frame address applied on M_QUE%S5_VA_H<31:9>, the TB does not contain the necessary translation information to convert the address to physical space. In this situation, the TB asserts its TB_MISS signal, which initiates a series of sequential events that will cause the proper PTE to be written into the TB.

12.5.1.5.2.1 Single Miss Sequence

A single miss sequence is defined as a TB miss sequence with only one TB miss occurring during the sequence. The following series of events characterizes a single TB miss sequence (see Figure 12-53 for a flow chart description of this sequence):

• cycle 1: TB asserts TB_MISS. S5 reference is aborted (will be retried later). MME_ADDR latches M_QUE%S5_VA_H.
• cycle 2: TMP1 is loaded from MME_ADDR in order to store the TB miss address in the MME register file.
• cycle 3: The proper page table length check is performed using TMP1, the appropriate XLR and a subtract ALU operation. If a length violation exists, the execution sequence continues in the ACV/TNV/M=0 sequence (see Section 12.5.1.5.3.6).
• cycle 4: The address field of the MME_LATCH is loaded with the TMP1 fault address and the MME_LATCH is validated with a TB_TAG_FILL command.
• cycle 5: The TB_TAG_FILL command executes in S5 (assuming no Cbox reference took priority) to allocate a TB entry corresponding to the TB miss address. The corresponding PTE address is formed using TMP1, the appropriate XBR and the A+B ALU operation. The PTE DREAD is loaded into the MME_LATCH.
• cycle 6: The PTE DREAD is started in S5 (assuming no Cbox reference took priority).
If this is an SPTE (System Page Table Entry) DREAD, this reference is physical and, therefore, cannot have a TB_MISS and/or TNV condition associated with it. If this is a PPTE (Process Page Table Entry) DREAD, this reference is virtual and can have a TB_MISS and/or TNV condition associated with it. Since a single miss sequence is being described here, a PPTE DREAD hits in the TB by definition (see Section 12.5.1.5.2.2 for a description of when this reference misses). Note that no ACV protection checks are performed on this DREAD because it is an Mbox PTE DREAD. No TNV checks are performed because only PTEs with PTE<31> set are cached in the TB. No M=0 check is performed since this is strictly a read operation. Assuming no TB miss problems occurred, the address is now properly translated and the DREAD continues into S6.
• cycle x: The PTE data is available on M%MD_BUS_H<31:0>. This data is latched in the address field of the MME_LATCH. ACV/TNV checks are performed on the protection and valid bit fields of the incoming PTE data. If an ACV/TNV condition is detected, the memory management sequence continues in the ACV/TNV/M=0 sequence (see Section 12.5.1.5.3.6). If neither condition is detected, the MME_LATCH is validated with the TB_PTE_FILL command.
• cycle x+1: The TB_PTE_FILL command is executed in S5 (assuming no other Cbox command took priority) to load the PTE into the TB and validate the TB entry. Normal processing resumes and the reference which caused the original TB miss will be retried.

12.5.1.5.2.2 Double Miss Sequence

When the MME_DATAPATH generates a PPTE DREAD in order to resolve a TB miss, the PPTE address is itself a system virtual address. Therefore, it is possible for the PPTE DREAD to generate a second TB miss. In this case, the PPTE DREAD TB miss must be processed first in order to translate the PPTE DREAD address.
Following this, the original TB miss sequence can resume in order to translate the initial faulting address. This scenario is called a double TB miss and is shown below (see Figure 12-53 for a flow chart description of this sequence):

• cycle 1: TB asserts TB_MISS. S5 reference is aborted (will be retried later). MME_ADDR latches M_QUE%S5_VA_H.
• cycle 2: TMP1 is loaded from MME_ADDR in order to store the TB miss address in the MME register file.
• cycle 3: The proper page table length check is performed using TMP1, the appropriate PXLR and a subtract ALU operation. If a length violation exists, the execution sequence continues in the ACV/TNV/M=0 sequence (see Section 12.5.1.5.3.6).
• cycle 4: The data field of the MME_LATCH is loaded with the TMP1 fault address as the MME_LATCH is validated with a TB_TAG_FILL command.
• cycle 5: The TB_TAG_FILL command executes in S5 (assuming no Cbox reference took priority) to allocate a TB entry corresponding to the TB miss address. The corresponding PPTE address is formed using TMP1, the appropriate PXBR and the A+B ALU operation. The PPTE DREAD is loaded into the MME_LATCH. Note that because the Mbox generated a PPTE DREAD as part of a TB miss sequence, the virtual reference is loaded into the MME_LATCH with the ACV/M=0 reference qualifier cleared so that ACV checks will not be performed on the reference.
• cycle 6: The PPTE DREAD is started in S5 (assuming no Cbox reference took priority). The TB asserts TB_MISS again because the PPTE address translation was not present in the TB. MME_ADDR latches the PPTE DREAD address and the DREAD is aborted.
• cycle 7: TMP2 is loaded from the MME_ADDR with the PPTE DREAD address.
• cycle 8: The system page table length check is performed using TMP2, SLR and the A-B ALU operation. If a length violation exists, the execution sequence continues in the ACV/TNV/M=0 sequence (see Section 12.5.1.5.3.6).
• cycle 9: The address field of the MME_LATCH is loaded with the TMP2 PPTE fault address as the MME_LATCH is validated with a TB_TAG_FILL command.
• cycle 10: The TB_TAG_FILL command executes in S5 (assuming no Cbox reference took priority) to allocate a TB entry corresponding to the TB miss address. Note that the TB entry that is allocated destroys the previous TB entry allocation for the original TB miss because the NLU TB allocation pointer has not moved. The corresponding SPTE address is formed using TMP2, SBR and the A+B ALU operation. The SPTE DREAD is loaded into the MME_LATCH.
• cycle 11: The SPTE DREAD is started in S5 (assuming no Cbox reference took priority). Note that this DREAD has a physical address. Therefore, no memory management problem can occur on this read.
• cycle x: The SPTE data is available on M%MD_BUS_H<31:0>. This data is latched in the address field of the MME_LATCH. ACV/TNV checks are performed on the protection and valid bit fields of the incoming PTE data. If an ACV/TNV condition is detected, the memory management sequence continues in the ACV/TNV/M=0 sequence (see Section 12.5.1.5.3.6). If neither condition is detected, the MME_LATCH is validated with the TB_PTE_FILL command.
• cycle x+1: The TB_PTE_FILL command is executed in S5 (assuming no other Cbox command took priority) to load the SPTE into the TB and validate the TB entry. Note that the NLU TB allocation pointer is incremented on a TB_PTE_FILL operation. In order to re-allocate a TB entry for the original TB miss address, the address field of the MME_LATCH is loaded with TMP1 while the command field is loaded with a TB_TAG_FILL command.
• cycle x+2: The TB_TAG_FILL command is executed in S5 (assuming no other Cbox command took priority) to re-allocate a TB entry corresponding to the original TB miss.
The original PPTE address is re-generated using TMP1, the appropriate PXBR and the A+B ALU operation. The PPTE DREAD is loaded into the MME_LATCH (ACV checks are once again disabled for this reference).
• cycle x+3: The PPTE DREAD is started in S5 (assuming no Cbox reference took priority). Note that no ACV protection checks are performed on this DREAD because it is an Mbox PTE DREAD. No TNV checks are performed because only PTEs with PTE<31> set are cached in the TB. No M=0 check is performed since this is strictly a read operation. The PPTE DREAD address is now properly translated.
• cycle y: The PPTE data is available on M%MD_BUS_H<31:0>. This data is latched in the address field of the MME_LATCH. ACV/TNV checks are performed on the protection and valid bit fields of the incoming PTE data. If an ACV/TNV condition is detected, the memory management sequence continues in the ACV/TNV/M=0 sequence (see Section 12.5.1.5.3.6). If neither condition is detected, the MME_LATCH is validated with the TB_PTE_FILL command.
• cycle y+1: The TB_PTE_FILL command is executed in S5 (assuming no other Cbox command took priority) to load the PPTE into the TB and validate the TB entry. Normal processing resumes and the reference which caused the original TB miss will be retried.

MICROCODE RESTRICTION
To avoid a potential infinite loop case whereby the Mbox is stuck in the TB double miss sequence forever, the Ebox microcode must guarantee that it issues a non-STORE instruction other than TBIA, TBIS, or TB_TAG_FILL during the cycle immediately preceding the cycle it issues either a TBIA, TBIS or TB_TAG_FILL instruction.

12.5.1.5.3 ACV/TNV/M=0

12.5.1.5.3.1 ACV/TNV/M=0 Fault Handling:

In order for an ACV, TNV, or M=0 fault to be processed, the following steps must occur:

1. The Mbox must detect the ACV/TNV/M=0 condition.
2. The Ebox microcode must be invoked to start processing the condition.
3. The Ebox microcode must probe Mbox state in order to determine which fault occurred and how it should be processed.
4. The Ebox microcode must service the fault condition directly, or it must invoke an operating system memory management service routine to service the fault.
5. If the memory management fault was not fatal to the process, normal instruction execution resumes by restarting the instruction corresponding to the memory management fault after servicing the fault.

12.5.1.5.3.2 ACV detection:

The protection field of a PTE indicates the authorized access rights for each execution mode. When a reference causes the TB to access a PTE, the protection field of the PTE corresponding to the reference is driven out of the TB. The ACV (Access Violation) detection logic uses the PTE protection field, M_QUE%S5_AT_H<1:0>, and the appropriate CPU execution mode from the Ebox (i.e. user, supervisor, executive, kernel) to detect access violations. If, for example, the protection field indicates a "read-only" access in user mode, the CPU execution mode specifies user mode, and M_QUE%S5_AT_H<1:0> indicates write access, then an ACV condition is flagged since a write reference is not allowed to this page in user mode.

A 2:1 MUX controls the source of the CPU execution mode. The CPU execution mode information is normally taken directly from the current mode field of the PSL (PSL<25:24>). On PROBE references, however, the CPU execution mode is driven from E%MMGT_MODE_H<1:0> in order to check for ACV conditions for an execution mode which the CPU is not currently in.

An ACV condition is also generated when a PTE reference fails to satisfy the page length check corresponding to the virtual space of the reference or when the virtual reference falls into the reserved page region of virtual memory (FFFFFE00-FFFFFFFF). Either condition is reported as an ACV length violation.
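
The protection check and the mode-select MUX described above can be sketched in software. This is only an illustrative model, not the chip's logic: the function names, the boolean `is_write` (standing in for the access type on M_QUE%S5_AT_H<1:0>, whose encoding is not given here), and the `PROT` table are assumptions. The table follows the standard VAX protection-code assignments as commonly documented and should be checked against the SRM before being relied on.

```python
# Illustrative model only; names and table layout are assumptions.
KERNEL, EXECUTIVE, SUPERVISOR, USER = 0, 1, 2, 3

# Protection code -> (least privileged mode that may read,
#                     least privileged mode that may write).
# None means no mode has that access. Code 0001 is reserved and omitted.
PROT = {
    0b0000: (None, None),              # NA
    0b0010: (KERNEL, KERNEL),          # KW
    0b0011: (KERNEL, None),            # KR
    0b0100: (USER, USER),              # UW
    0b0101: (EXECUTIVE, EXECUTIVE),    # EW
    0b0110: (EXECUTIVE, KERNEL),       # ERKW
    0b0111: (EXECUTIVE, None),         # ER
    0b1000: (SUPERVISOR, SUPERVISOR),  # SW
    0b1001: (SUPERVISOR, EXECUTIVE),   # SREW
    0b1010: (SUPERVISOR, KERNEL),      # SRKW
    0b1011: (SUPERVISOR, None),        # SR
    0b1100: (USER, SUPERVISOR),        # URSW
    0b1101: (USER, EXECUTIVE),         # UREW
    0b1110: (USER, KERNEL),            # URKW
    0b1111: (USER, None),              # UR
}


def select_mode(psl, is_probe, mmgt_mode):
    """Model of the 2:1 MUX: PSL<25:24> normally, the Ebox-supplied mode on PROBE."""
    return (mmgt_mode & 0b11) if is_probe else ((psl >> 24) & 0b11)


def acv_protection(prot_code, mode, is_write):
    """Return True if the access violates the page protection."""
    if prot_code not in PROT:
        raise ValueError("reserved protection code")
    read_limit, write_limit = PROT[prot_code]
    limit = write_limit if is_write else read_limit
    # A violation occurs if nobody has the access, or the current mode
    # is less privileged (numerically larger) than the limit.
    return limit is None or mode > limit
```

With this model, the example in the text (a page that is read-only in user mode, written from user mode) corresponds to `acv_protection(0b1111, USER, True)`, which reports a violation, while the same page read from user mode does not.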
An ACV check is also performed on the protection field of all PTEs which have just been sent to the Mbox due to an earlier Mbox DREAD issued during the TB_MISS sequence.

ACV protection and length checks are performed on all Ibox and Ebox references and on all MME_CHKs. ACV page length checks are performed on all PTE addresses. However, ACV protection checks are never performed on PTE read references generated by the Mbox. Note that the ACV protection condition is disabled from occurring during any cycle where the reference is aborted.

When an ACV condition occurs, the MME_SEQ is invoked to execute the ACV/TNV/M=0 sequence. ACV checks only occur on virtual addresses when memory management is enabled and when the reference indicates that memory management checks should be done (i.e. M_QUE%S5_QUAL_H<2> = 1).

12.5.1.5.3.3 TNV detection

When the PTE valid bit is clear, it indicates that the corresponding PTE page frame address translation is not valid. This is called a Translation Not Valid Fault (TNV). TNV detection only occurs during the TB_MISS sequence when the Mbox receives PTE data from the Pcache or Cbox such that the PTE valid bit (PTE<31>) is clear. When a TNV fault is detected, the MME_SEQ interrupts the TB_MISS sequence and invokes the ACV/TNV/M=0 sequence. By doing so, the invalid PTE is never cached in the TB and a memory management fault is recorded (see Section 12.5.1.5.3.5 on recording memory management faults).

12.5.1.5.3.4 M=0 detection:

When a virtual reference causes the TB to access a PTE, the modify bit of the PTE is read out of the TB. A cleared modify bit indicates that the corresponding page has not been written to. If the valid bit of the PTE is set, and the modify bit is clear and the access type of the S5 reference indicates an intention to modify the page (e.g. write or modify access type), then the Mbox must initiate the proper sequence of events to process this "M=0" condition. The M=0 check is performed when memory management is enabled and a virtual reference hits in the TB. Note that the M=0 condition is disabled from occurring during any cycle where the reference is aborted.

12.5.1.5.3.5 Recording ACV/TNV/M=0 Faults

In order for the microcode to determine the nature of the memory management fault detected by the Mbox, the Mbox must record the necessary fault information. The fault information is recorded in Mbox IPRs which can be read by Ebox microcode. The fault information is stored in three of the registers in the MME register file which are accessible to microcode by IPR reads and writes:

• The MMEADR register stores the virtual address associated with the ACV, TNV or M=0 fault. As per SRM requirements, if the ACV/TNV fault occurred by referencing a PTE during a TB miss sequence, the MMEADR stores the original address and not the PTE address.
• The MMEPTE register stores the virtual or physical address of the Page Table Entry corresponding to a virtual reference upon which an M=0 condition has been detected.
• The MMESTS register stores state which indicates to the microcode the context and type of fault corresponding to the ACV/TNV/M=0 condition. The format of MMESTS is shown below:

Figure 12-55: IPR EA (hex), MMESTS

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  LOCK  |  SRC   | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|FAULT| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| M|  |LV| :MMESTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 12-19: MMESTS Field Descriptions

Name    Extent  Type  Description
LV      0       RO    Indicates ACV fault occurred due to length violation.
        1       RO    Indicates ACV/TNV fault occurred on PTE reference corresponding to MMEADR.
M       2       RO    Indicates corresponding reference had write or modify intent.
FAULT   15:14   RO    Indicates nature of memory management fault. See FAULT encodings below.
SRC     28:26   RO    Complemented shadow copy of LOCK bits. However, the SRC bits do not get reset when the LOCK bits are cleared.
LOCK    31:29   RO    Indicates the lock status of MMESTS. See LOCK encodings below. This field is cleared on E%FLUSH_MBOX_H.

Table 12-20: LOCK Encodings

LOCK value (binary)  Definition
000                  MMESTS, MMEADR and MMEPTE are unlocked.
001                  Valid IREAD fault is stored (no other IREAD fault can overwrite MMESTS, MMEADR, or MMEPTE).
011                  Valid Ibox specifier fault is stored (only an Ebox reference fault can overwrite MMESTS, MMEADR, or MMEPTE).
111                  Valid Ebox fault is stored (MMESTS, MMEADR, and MMEPTE are completely locked).

Note that the encodings for the SRC bits are the complemented version of the LOCK bits. Thus, for example, a fully locked SRC encoding is 000.

Table 12-21: FAULT Encodings

FAULT value (binary)  Definition
01                    ACV Fault. This is the highest priority fault in the presence of multiple simultaneous faults.
10                    TNV Fault. This is the next highest priority fault.
11                    M=0 Fault. This is the lowest priority fault.

Due to the macropipeline design, the MMEADR, MMEPTE and MMESTS registers must be conditionally loaded in a prioritized fashion. These registers are loaded depending on the relative states of their current contents and on the context of the current fault. If the MMESTS register is empty, the current fault state is always loaded.
If the MMESTS register contains a valid fault condition, the MMEADR, MMEPTE and MMESTS are only loaded if the current fault is associated with a pipe stage further along in the pipe than the stage corresponding to the stored MMESTS state. This loading priority is necessary because these memory management faults must be reported within the context of the execution of the instruction they are associated with. A fault detected on an Ebox reference is loaded provided that another Ebox reference fault is not already loaded. Faults detected on Ibox specifier references are only loaded if no Ebox or Ibox specifier reference fault is currently stored. Faults on Ibox I-stream references are only loaded if the MMESTS register is empty. In effect, the MMESTS register captures the first memory management exception that will be associated with Ebox execution. Stated differently, it captures the fault which occurs farthest along in the macropipeline.

The LOCK field of MMESTS specifies the source of the faulting reference currently stored in MMESTS. Thus, the decision to load another faulting reference into MMESTS is made by examining the bits of the LOCK field.

The FAULT field is set in a prioritized manner. That is, an ACV fault takes precedence over a TNV or M=0 fault. A TNV fault takes precedence over an M=0 fault. Therefore, if multiple pending fault conditions are true, only the fault condition with the highest priority is reported in the MMESTS register.

When the Ebox starts the memory management exception microflow, it issues an IPR_RD to the MMESTS to determine the nature of the memory management fault. The MMESTS register is automatically unlocked by resetting the LOCK field when the E%FLUSH_MBOX_H signal is asserted by the Ebox.
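
The MMESTS field layout, the LOCK-based loading rule, and the FAULT priority described in this section can be modeled roughly as follows. This is a sketch for illustration, not the hardware implementation. The function names are hypothetical, and `pte_ref` is an assumed label for bit 1, which is described but not legibly named here. The comparison in `may_record` relies on the fact that the LOCK encodings 000, 001, 011 and 111 increase numerically with pipeline depth.

```python
# Illustrative model of MMESTS; all identifiers are hypothetical.
LOCK_NONE, LOCK_IREAD, LOCK_SPEC, LOCK_EBOX = 0b000, 0b001, 0b011, 0b111
FAULT_ACV, FAULT_TNV, FAULT_M0 = 0b01, 0b10, 0b11


def decode_mmests(value):
    """Split a 32-bit MMESTS value into its documented fields."""
    return {
        "lv": value & 1,               # bit 0: length violation
        "pte_ref": (value >> 1) & 1,   # bit 1: fault was on a PTE reference (assumed name)
        "m": (value >> 2) & 1,         # bit 2: write or modify intent
        "fault": (value >> 14) & 0b11,
        "src": (value >> 26) & 0b111,
        "lock": (value >> 29) & 0b111,
    }


def encode_mmests(lock, fault, low_bits):
    """Build an MMESTS value; SRC is the complement of LOCK at load time."""
    src = ~lock & 0b111
    return (lock << 29) | (src << 26) | (fault << 14) | (low_bits & 0b111)


def may_record(stored_lock, source_lock):
    """A new fault is recorded only if its source is strictly further along the pipe."""
    return source_lock > stored_lock


def pick_fault(acv, tnv, m0):
    """ACV beats TNV, and TNV beats M=0, when several conditions are pending."""
    if acv:
        return FAULT_ACV
    if tnv:
        return FAULT_TNV
    if m0:
        return FAULT_M0
    return None
```

For example, an Ebox fault may overwrite a stored Ibox specifier fault (`may_record(LOCK_SPEC, LOCK_EBOX)` is true), but a second Ebox fault may not overwrite the first.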
12.5.1.5.3.6 ACV/TNV/M=0 MME_DATAPATH Sequence

When an ACV/TNV/M=0 condition occurs, the MME_DATAPATH performs the following actions in order to record the fault for subsequent use by the Ebox microcode.

• cycle 1: ACV, TNV, or M=0 condition is detected. MME_ADDR latches the M_QUE%S5_VA_H address. Note that the S5 reference is NOT aborted. If the faulting reference is associated with an Ebox reference, M%MME_TRAP_L is asserted to the microsequencer to generate a memory management microtrap. If the faulting reference was associated with a DEST_ADDR command, the MME fault is logged in the corresponding PA_QUEUE entry. In all other cases (IREADs and Ibox D-stream reads) M%MME_FAULT_H qualifies the M%MD_BUS_H, indicating that the requested data had a memory management problem.
• cycle 2: If this ACV/TNV/M=0 sequence was not invoked from a previous MME_SEQ flow, the contents of MME_ADDR are loaded into TMP1. If this sequence was invoked from another MME_SEQ flow, TMP1 is not loaded because it already contains the original address that must be reported for this ACV/TNV condition.
• cycle 3: The source of the reference which directly or indirectly invoked the MME fault is compared to MMESTS<31:29> (the LOCK field) to determine whether this fault should be recorded in MMEADR, MMEPTE, and in MMESTS. If a previous fault of equal or greater priority is already stored in MMESTS, then MMESTS, MMEADR, and MMEPTE are not updated.
If the LOCK field indicates that this fault should be recorded, MMEADR is loaded from TMP1 and MMESTS is updated as follows:

Table 12-22: MMESTS State Update

fault type                                                                                  MMESTS<15:14>  MMESTS<2:0>
ACV without MME_SEQ active (no modify intent)                                               01             000
ACV without MME_SEQ active (modify intent)                                                  01             100
M=0                                                                                         11             100
length violation on ref during TB_MISS seq (no modify)                                      01             001
length violation on ref during TB_MISS seq (modify intent)                                  01             101
length violation on PTE ref during TB_MISS seq (no modify intent on original reference)     01             011
length violation on PTE ref during TB_MISS seq (modify intent on original reference)        01             111
TNV on PTE ref during TB_MISS seq (no modify intent on original reference)                  10             010
TNV on PTE ref during TB_MISS seq (modify intent on original reference)                     10             110

The LOCK field of MMESTS is updated appropriately.
• cycle 4: If MMESTS was updated during cycle 3, and the fault was M=0, the corresponding PTE address is formed using TMP1, the appropriate XBR and the A+B ALU operation. The PTE address is then loaded into MMEPTE.

12.5.1.5.3.7 Microcode Invocation of ACV/TNV/M=0

Microcode is invoked for ACV/TNV/M=0 faults in three different ways:

• If the faulting reference originated from the Ebox, then the Mbox asserts M%MME_TRAP_L to invoke a memory management microtrap. M%MME_TRAP_L is asserted at the end of the cycle in which the ACV/TNV/M=0 fault was detected. Thus, from a microcode point of view, the microtrap happened before the EM_LATCH contents were retired. This microtrap invokes the ACV/TNV/M=0 microflow which handles the fault in the context of the reference executing in the Ebox.
• If the faulting reference is a read sourced by the Ibox (either a D-stream or I-stream read), M_QUE%S5_QUAL_H<0> is set, indicating that a memory management fault should be forced on this read.
When the read propagates into S6, the Mbox forces the Pcache to hit and returns invalid data. This data, however, will be qualified with the M%MME_FAULT_H signal to indicate that the data is invalid and that an ACV/TNV/M=0 fault is associated with this data. When the Ebox references the corresponding D-stream operand, or requires the decode of the corresponding I-stream data, a microtrap is generated by the Ebox to invoke the ACV/TNV/M=0 microflow.

If an MME fault occurs on the address of the address of an operand (i.e. Ibox decoding a deferred specifier), the Mbox records the fault in MMEADR and MMESTS in the usual way and returns data qualified by M%MME_FAULT_H. In some instances, the Ibox must issue a second reference to the Mbox based on the address returned by the first reference. Due to the fault, however, the Ibox cannot issue a valid operand read address since the data returned by the first reference was invalid. In this case, the Ibox issues a read qualified with the I%FORCE_MME_FAULT_H signal. This causes the Mbox to "fake" an ACV/TNV violation by qualifying the returned data with M%MME_FAULT_H. This reference is trapped on when the Ebox references the operand. Note that when the Mbox "fakes" an ACV/TNV/M=0 violation, the MME_DATAPATH does not invoke a memory management response to either an ACV/TNV/M=0 problem or to a TB_MISS. Further, no state update is performed for either the MMESTS or MME_ADDR. Thus, these registers still record the true ACV/TNV error.
• If the faulting reference is a DEST_ADDR, an ACV/TNV/M=0 bit is set in the corresponding PA_QUEUE entry. When the Ebox microcode checks for the validity of the PA_QUEUE in order to send the corresponding STORE data, the Ebox detects the ACV/TNV/M=0 condition and generates the microtrap.
The PA_QUEUE hardware must guarantee that the first PA_QUEUE entry of an unaligned pair of entries is marked with the ACV/TNV/M=0 condition regardless of which of the two references caused the fault. This is necessary so that the microcode takes the proper action at the start of the reference.

If an ACV length violation or a TNV fault is generated on an Mbox PTE reference, the original reference (i.e. the reference that caused the memory management sequence which generated the PTE reference) must be marked as having an MME fault associated with it. Thus, when the original reference is retried after the memory management sequence completes, the reference will be treated as if the MME fault was associated with it. Note that the MMESTS register records the fact that the actual fault was associated with the PTE reference and not the original reference.

12.5.1.5.3.8 Microcode Processing of ACV/TNV/M=0:

The NVAX macropipeline design can cause synchronization problems related to operating system processing of PTEs. The SRM states that "software is not required to flush TB entries after changing PTEs that were already invalid." Consider the case where an Ibox read prefetches an invalid PTE from a page table. Just after this read, the Ebox completes the previous macroinstruction by updating, validating and writing the same PTE back to memory. When the Ebox references the prefetched PTE operand, an invalid TNV fault will be generated because the PTE has just been validated.

To prevent this scenario from occurring, the memory management fault microcode must re-test for fault conditions before invoking the actual fault sequence. If no fault is detected at this time, no fault processing occurs. Microcode re-tests the fault conditions by first asserting E%FLUSH_MBOX_H, which unlocks MMESTS and clears pending Mbox references.
Following this, the microcode reads the fault address from MMEADR via an IPR_RD command and then issues a TBIS command corresponding to this faulting reference. The TBIS will clear out the potentially out-of-date PTE in the TB which is associated with the fault. The microcode will then issue a PROBE command to the same address. The PROBE will cause the updated PTE to be cached in the TB (unless a TNV fault is detected) and will record the new fault status in MMESTS and return the status to the Ebox. Note that the PROBE command does not lock MMESTS. If the microcode detects a valid fault upon reading the PROBE status, microcode fault processing continues. Otherwise, the instruction is restarted without causing a memory management fault.

If a real ACV or TNV fault was detected, the microcode re-reads MMESTS to get the updated status based on the last PROBE operation. The microcode constructs and pushes the memory management fault stack frame consisting of the fault status, the contents of MMEADR, the PC of the corresponding instruction, and the PSL at the time of the fault. The microcode then reads the appropriate SCB (System Control Block) vector corresponding to either the ACV or TNV fault. Based on this vector, the microcode sets the appropriate CPU execution mode and redirects the PC to the appropriate operating system memory management macrocode fault handler. This software fault handler reads the fault status and the faulting address from the stack and processes the ACV or TNV fault based on this information. Once the fault is processed, an REI is executed, the macropipeline is flushed, and normal instruction processing resumes by restarting the instruction that originally caused the fault.

If the microcode read MMESTS and determined the fault to be an M=0 condition, the microcode processes the fault without the aid of operating system software.
To do this, the microcode performs the following actions:

1. A TBIS command is issued which references the faulting address. This reference will cause the PTE, which was used to detect the M=0 fault, to be invalidated from the TB.
2. The microcode will then test the faulting address to determine whether it was a process or system space reference. If it was a system space reference, the corresponding SPTE address must be a physical address. If it was a process space reference, the corresponding PPTE address must be a virtual address.
3. The microcode then issues a DREAD using the PTE address it read from MMEPTE. If the microcode determined the PTE to be an SPTE, the read is issued with M_QUE%S5_QUAL_H<6>=0 indicating a physical read. If the microcode determined the PTE to be a PPTE, the read is issued with M_QUE%S5_QUAL_H<6>=1 and M_QUE%S5_QUAL_H<2>=1 indicating a virtual read with ACV and M=0 checks disabled because the Mbox must not perform M=0 checks and ACV protection checks on PTE references.
4. When the PTE data is received, the Ebox sets the modify bit of the PTE, indicating that the corresponding page is written. The new PTE is then written back into the page table in memory by issuing a physical WRITE or a virtual write with ACV/M=0 checks disabled, depending on the physical or virtual nature of the PTE.
5. The microcode then flushes the macropipeline and resumes normal instruction processing by restarting the instruction corresponding to the M=0 fault.

Note that when the address which caused the M=0 fault is restarted after the M=0 fault was serviced, the Mbox will generate a TB_MISS condition since the old PTE was invalidated from the TB. Subsequently, a TB_MISS sequence will be invoked which will cause the new PTE to be read into the Mbox and cached in the TB.
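
The M=0 condition and the PTE update performed by the microcode in steps 2 through 4 can be sketched as below. This is a minimal model under stated assumptions, not the microcode itself. PTE<31> as the valid bit comes from the text above; the position of the modify bit (PTE<26>) and the use of VA<31> to distinguish system space from process space are taken from the general VAX architecture and are assumptions here. All function names are hypothetical.

```python
# Illustrative model of M=0 detection and servicing; identifiers are hypothetical.
PTE_V = 1 << 31   # valid bit, PTE<31> (stated in the text)
PTE_M = 1 << 26   # modify bit; PTE<26> per the VAX architecture (assumption here)


def m0_condition(pte, modify_intent):
    """M=0 is flagged for a valid PTE with M clear and a write/modify access."""
    return bool(pte & PTE_V) and not (pte & PTE_M) and modify_intent


def is_system_space(va):
    """Assumes VA<31> = 1 selects system space, as in the VAX address map."""
    return ((va >> 31) & 1) == 1


def service_m0(fault_va, pte):
    """Return (pte_access_is_physical, updated_pte) for steps 2 through 4.

    System space references use an SPTE, which is accessed physically;
    process space references use a PPTE, which is accessed virtually
    with ACV and M=0 checks disabled.
    """
    physical = is_system_space(fault_va)
    return physical, pte | PTE_M
```

After the updated PTE is written back, `m0_condition` no longer holds for it, which matches the text: the retried reference misses in the TB, refills with the new PTE, and proceeds.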
12.5.1.5.3.9 Pipeline Implications of ACV/TNV/M=0 condition

12.5.1.5.3.9.1 Pipeline Effects for MME Faults on Write References

If an ACV, TNV or M=0 condition occurs on a write reference, the faulting write is transformed into a NOP command in the S6 pipe. Thus, the Pcache and Bcache are prevented from modifying any memory state as a result of a memory management fault detected in S5.

12.5.1.5.3.9.2 Pipeline Effects for MME Faults on Read References

If the faulting reference is a read, the read must be prevented from leaving the Mbox pipe since a read to I/O space could cause detrimental state changes. This is handled by forcing the deassertion of M%CBOX_REF_ENABLE_L, which causes the Cbox to ignore the read.

12.5.1.5.3.9.3 Pipeline Effects of E%FLUSH_MBOX_H on MME State

A more subtle implication involving the NVAX macropipeline exists which affects updating recorded Mbox MME state. Since the MME_SEQ executes independently of the Ebox microcode, the MME_SEQ must appropriately synchronize to Ebox execution such that MME state will not be updated for references that will never be processed by the Ebox.

Consider the following situation. A TB_MISS sequence has begun on a specifier reference. During this sequence, the Ebox detects a branch mispredict which causes redirection of the processing stream. As the PTE data is returned to the Mbox, a TNV condition is detected. This TNV must not be recorded because it corresponds to a reference which the Ebox will not see due to the redirection of the execution stream.

From the Mbox point of view, handling this scenario can be generalized as follows. If the Mbox receives an E%FLUSH_MBOX signal during any memory management sequence which may update MME state, one of three possibilities will happen:

1. If E%FLUSH_MBOX is received after MME state has been updated, E%FLUSH_MBOX will unlock MMESTS so that only MME state corresponding to the redirected execution stream will be recorded.
2. If E%FLUSH_MBOX is received during the cycle that an MME state update is being done, the functional effect of E%FLUSH_MBOX will predominate, thus causing the MMESTS to be unlocked.
3. If E%FLUSH_MBOX is received before the state update, MMESTS will be cleared by E%FLUSH_MBOX and a state bit will be set which will inhibit any MME state updates during the remaining MME sequence.

Note that the analogous problem exists when processing a memory management sequence on an IREAD when I%FLUSH_IREF_LAT_H is asserted. In this case, the following three possibilities can occur:

1. If I%FLUSH_IREF_LAT_H is asserted when MMESTS contains a validated fault on an IREAD, I%FLUSH_IREF_LAT_H will unlock MMESTS.
2. If I%FLUSH_IREF_LAT_H is asserted during the cycle that an MME state update is being done on an IREAD reference, the functional effect of I%FLUSH_IREF_LAT_H will predominate, thus causing MMESTS to be unlocked.
3. If I%FLUSH_IREF_LAT_H is received before an MMESTS update but during a memory management fault sequence invoked from an IREAD, MMESTS will be cleared by I%FLUSH_IREF_LAT_H and a state bit will be set which will inhibit the subsequent MME state update.

Note that while a special state bit is necessary to synchronize MME updates with Ebox execution stream redirection, no special mechanism is required to keep TB state synchronized. There are two reasons for this. First, the TB never validates a PTE whose PTE valid bit is clear. Secondly, the Mbox arbitration logic prevents Ebox references such as TBIS, TBIP, and TBIA from executing when a memory management sequence is executing. Therefore, TB state updates are always serialized with respect to TB invalidates generated by the Ebox microcode.
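
The three E%FLUSH_MBOX cases above can be exercised with a small cycle-level model. This is a simplified sketch, not the Mbox logic: the class, its `inhibit` flag (standing in for the state bit mentioned in case 3), and the per-cycle interface are all hypothetical. One simplification is that the inhibit flag is set on every flush and cleared only when a new sequence starts.

```python
# Simplified, hypothetical model of MMESTS locking under a flush signal.
class MmeState:
    def __init__(self):
        self.lock = 0b000      # MMESTS LOCK field; 000 means unlocked
        self.inhibit = False   # stands in for the "inhibit updates" state bit

    def start_sequence(self):
        """A new memory management sequence begins with updates enabled."""
        self.inhibit = False

    def cycle(self, flush, update_lock=None):
        """Advance one cycle.

        flush       -- True if the flush signal is asserted this cycle
        update_lock -- LOCK value the sequence wants to record this cycle,
                       or None if no state update is attempted
        """
        if update_lock is not None and not flush and not self.inhibit:
            # Record only a fault from further along the pipe.
            if update_lock > self.lock:
                self.lock = update_lock
        if flush:
            # The flush predominates: MMESTS is unlocked and later
            # updates in this sequence are suppressed.
            self.lock = 0b000
            self.inhibit = True
```

Driving the model with a flush after an update, in the same cycle as an update, and before an update leaves `lock` at 000 in all three cases, which is the behavior the numbered cases describe.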
12.5.1.5.3.9.4 Pipeline Effects of E%FLUSH_MBOX_H on M%MME_TRAP_L

Just as E%FLUSH_MBOX_H must be examined in order that MME state remains synchronized to Ebox execution, E%FLUSH_MBOX_H must also be factored into the logic which generates M%MME_TRAP_L. This prevents the following scenario from occurring.

If the Ebox has issued a DREAD which misses in the Pcache as a result of a MOVC instruction, the Mbox will propagate the reference forward to the Cbox. While the read is pending, the Ebox issues an MME_CHK command which misses in the TB, causing the Mbox to initiate a TB miss sequence. During this sequence, the Cbox returns the read data qualified by C%CBOX_HARD_ERR_H. This causes the Ebox to microtrap into the error handler, resulting in the assertion of E%FLUSH_MBOX_H. If the Mbox were to subsequently assert M%MME_TRAP_L based on a memory management fault on the MME_CHK command, the Ebox would microtrap out of the error handler and initiate an MME fault sequence that should never occur. Thus, the assertion of E%FLUSH_MBOX_H during a memory management sequence inhibits the assertion of M%MME_TRAP_L during that cycle or any subsequent cycles of the memory management sequence.

12.5.1.5.4 Cross Page Sequence

When an unaligned virtual reference falls across a page boundary, ACV/TNV/M=0 checks must be performed on both pages before the Mbox can determine if the reference passes or fails ACV checks. The function of the cross-page sequence is to generate an MME_CHK reference to check the second page (i.e. the upper page) for ACV/TNV/M=0 problems. As long as the MME_CHK clears memory management checks before the reference is allowed to execute, the reference can be processed in the normal manner because ACV/TNV/M=0 checks on the first page (i.e. the lower page) will naturally occur as they do on all virtual references.
If an ACV/TNV problem is found on either page, an ACV/TNV condition is flagged for the reference.

When the cross-page detection logic flags a cross-page condition, the following cross-page sequence is invoked:

• cycle 1: The cross-page condition is detected. The S5 reference is aborted. The MME_ADDR latches the M_QUE%S5_VA_H address.
• cycle 2: The MME_DATAPATH adds 512 to the address in MME_ADDR. The resulting address is guaranteed to fall into the upper page of the original reference for all byte, word, longword and quadword references. This address is loaded into the MME_LATCH qualified by an MME_CHK command. The MME_CHK reference (with DL=byte) will perform memory management checks on the upper page.
• cycle 3: The MME_CHK is executed in S5 (assuming no Cbox reference took priority). If a TB_MISS occurs, the TB_MISS sequence is first invoked to obtain the proper translation. Once the TB has been updated based on the TB_MISS, the original MME_CHK reference will be restarted and the cross-page sequence will be re-invoked from the beginning. When the translation of the MME_CHK reference has properly occurred, ACV/M=0 checks are performed (note that TNV checks are only performed when the PTE is to be filled in the TB). If an ACV/TNV/M=0 fault is detected during the MME_CHK processing, M_QUE%S5_QUAL_H<0> of the original reference, which caused the cross-page sequence, is set. Thus, when this reference is restarted, an MME fault will be reported. If no ACV/TNV/M=0 condition was detected on the upper page, the original reference is marked as having passed the cross-page condition (M_QUE%S5_QUAL_H<5> is set).
• cycle x: The original reference is restarted. If no ACV/TNV/M=0 fault occurred on the upper page, the reference executes normally without further cross-page checks. If the reference was marked as having an MME fault, the reference fault will be reported in the previously-described fashion (see Section 12.5.1.5.3.7).
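The address arithmetic behind cycles 1 and 2 can be illustrated with a short sketch. It assumes 512-byte pages (consistent with the 512 added in cycle 2 and with bits <31:9> forming the virtual page address) and 32-bit virtual addresses; the function names are hypothetical and are not taken from this specification.

```python
PAGE_SIZE = 512          # bytes per page; VA<31:9> is the virtual page address
VA_MASK = 0xFFFFFFFF     # 32-bit virtual addresses


def crosses_page(va: int, length: int) -> bool:
    """True if a reference of `length` bytes starting at `va` spans two pages."""
    return (va % PAGE_SIZE) + length > PAGE_SIZE


def upper_page_check_address(va: int) -> int:
    """Address used for the MME_CHK in cycle 2: the original address plus 512."""
    return (va + PAGE_SIZE) & VA_MASK


def page_number(va: int) -> int:
    """Virtual page number, i.e. VA<31:9>."""
    return va >> 9
```

Because the reference is at most a quadword (8 bytes) and a page is 512 bytes, adding 512 to the starting address always lands in the page that holds the last byte of a cross-page reference, whatever the starting offset.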
The cross-page sequence is only invoked on a virtual reference when memory management is enabled.

12.6 MBOX ERROR HANDLING

12.6.1 Types of Errors Handled

The Mbox plays a role in the processing of the following types of errors:

• TB tag parity errors.
• TB data parity errors.
• Pcache tag parity errors.
• Pcache data parity errors.
• Errors encountered by the Cbox while processing a memory read, I/O space read, or IPR_RD which was transferred from the Mbox to the Cbox. Note that these errors could originate from the Bcache, NDAL or memory subsystem.

All other possible errors are handled without Mbox involvement.

12.6.2 TB parity error detection

12.6.2.1 TB tag parity error detection

Conceptually, a single bit of even parity representing TB tag parity is stored in each TB entry. Whenever a valid tag entry matches the S5 virtual page address, the corresponding tag parity data is accessed and driven out of the TB array for a subsequent parity check. Thus tag parity errors are only detected on the entry which causes a TB hit condition.

The tag parity value against which the stored parity is compared is calculated in parallel with the TB access by using the virtual page address found on M_QUE%S5_VA_H<31:9>. This scheme eliminates the need to drive out the matched tag entry in order to calculate parity. If the tag matched the virtual page address, then the correct parity value can be derived from M_QUE%S5_VA_H<31:9> instead of from the stored tag. This scheme is called predicted parity.

Tag parity in a fully associative cache can cause several different failure modes since the tag state directly determines which entry (or entries) are selected during each TB access. Assuming a single bit soft failure occurs in a single TB tag (i.e. a tag bit accidentally toggles due to some transient failure mode), three failure modes are possible:

1. A single bit tag error can cause no TB entry to match because the tag no longer compares with the virtual page address that it should have compared to. Thus, a TB_MISS condition is generated which causes the PTE data to be accessed from memory. This PTE data, along with its corresponding tag, will be written into a TB entry. In effect, this scenario causes the single bit tag error to remain undetected, but does not corrupt the virtual address translation process.
2. A single bit tag error may cause exactly one TB entry to match because the incorrect tag entry happens to match a virtual page address which is not already cached in the TB. In this situation, the tag parity read out of the TB is guaranteed not to match the virtual page address parity. Thus a TB tag parity error will be correctly detected.
3. A single bit tag error may cause two TB entries to match because the incorrect tag entry happens to match a virtual page address which is already cached in the TB. Thus, the correct tag entry detects a match at the same time as the incorrect tag entry detects a match.

Due to the wired OR function implicit in accessing data off of a shared bit line within the TB array, it is possible that the tag parity read out of the array matches the parity of the virtual page address, causing no tag parity error to be detected. In this case, the wired OR function on the PTE bit lines will OR the two accessed PTE entries together, causing an incorrect PTE to be read out. If an even number of PTE bits were corrupted by the simultaneous PTE access, the parity logic associated with the PTE data will not detect a problem. This is a disastrous situation for the currently-executing CPU process because the TB will produce an incorrect translation without producing a parity error.
As a result of the undetected fatal parity error discussed in this third case, a single bit of tag parity is stored in both its true and complement form in each TB entry. For a single entry match, these two parity lines always produce a "01" or "10" value. Due to the wired OR access, a two-entry TB match caused by a single bit tag error produces a "11" parity access, indicating a multiple tag match and a tag parity error.

TB tag parity is written along with the tag during a TB_TAG_FILL operation.

12.6.2.2 TB data parity error detection

Data parity error detection is conceptually simpler than tag parity detection. When a TB hit condition occurs, the accessed PTE data is driven out of the TB along with the corresponding stored data parity. Parity is then calculated on the data and compared with the stored parity. A miscompare results in a TB data parity error. TB data parity is a single bit corresponding to the entire stored PTE field.

TB data parity is written along with the PTE data during a TB_PTE_FILL operation.

12.6.3 Pcache parity error detection

12.6.3.1 Pcache tag parity error detection

Pcache tag parity is stored and checked as a single bit representing even parity across the entire 20-bit tag field. Unlike the TB implementation, however, true and complement versions of single bit tag parity are not implemented; only the true version is implemented.

There are two separate aspects to Pcache tag parity error detection. The first aspect employs the "predicted parity" scheme which was used for the TB. However, the Pcache does not use predicted parity to directly detect tag parity errors. Instead, predicted tag parity is factored into the Pcache hit logic such that a Pcache miss will be forced if the tag parity does not agree with the parity calculated on the input address. By doing so, the tag parity design does not have to handle the case of a Pcache hit causing data to be returned to the Ibox, Ebox or Mbox in the presence of a Pcache tag parity error.
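The forced-miss behavior of the first aspect can be sketched as follows. This is a minimal illustration, assuming the 20-bit tag is physical address bits <31:12> and modeling a single tag entry; `even_parity` and `pcache_hit` are hypothetical names, not signals or functions from this specification.

```python
TAG_SHIFT = 12       # assumption: the tag is physical address bits <31:12>
TAG_MASK = 0xFFFFF   # 20-bit tag field


def even_parity(value: int) -> int:
    """Parity bit that makes the total count of 1s (value plus bit) even."""
    return bin(value).count("1") & 1


def pcache_hit(valid: bool, stored_tag: int, stored_parity: int, pa: int) -> bool:
    """Hit only if the tag matches AND the stored parity agrees with the
    parity predicted from the input address; otherwise a miss is forced."""
    tag = (pa >> TAG_SHIFT) & TAG_MASK
    predicted = even_parity(tag)          # computed from the address, not the stored tag
    return valid and stored_tag == tag and stored_parity == predicted
```

A stored tag that has suffered a single bit flip can only match an address whose parity differs from the stored parity bit, so in this model the corrupted entry never produces a hit.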
Pcache predicted tag parity works by generating parity on M%S6_PA_H<31:12> at the same time as the Pcache access is taking place. If a validated tag matches the address on M%S6_PA_H<31:12>, but the tag parity does not match the predicted parity, a Pcache miss is forced.

The second aspect of Pcache tag parity error detection explicitly detects the tag error condition after the Pcache access has completed. Both banks of the tag store have their own tag parity generator. When both tags of the addressed Pcache index are driven out of the tag store, the two parity generators calculate tag parity based on the two accessed tags. These calculated values are compared to the corresponding stored tag parity which was accessed from the tag store with the tag data. If a miscompare occurs, a tag parity error is flagged. Note that this mechanism allows miscomparing tags to be flagged as tag parity errors while the other tag may simultaneously generate a Pcache hit or miss.

Pcache tag parity is checked on both tags on all Pcache I-stream read operations only when I_ENABLE=1 and FORCE_HIT=0 in the PCCTL. Pcache tag parity is checked on both tags on all Pcache D-stream read and write operations only when D_ENABLE=1 and FORCE_HIT=0 in the PCCTL. When FORCE_HIT=1, tag parity is never checked. Pcache tag parity is never checked on an IPR_RD operation to a Pcache tag.

Tag parity is written on a cache fill operation or on an IPR_WR to a Pcache tag.

12.6.3.2 Pcache data parity error detection

Byte parity is maintained for each Pcache hexaword block. Therefore, each block contains 32 bits of parity, one bit of even parity for each byte of data.

Pcache data parity is checked on the same conditions as Pcache tag parity checks except for two differences:

1. Unlike tag parity, Pcache data parity errors are only detected during a Pcache hit condition.
One exception to this rule exists, though. If the Pcache force hit condition exists due to a memory management fault or hard fault, then Pcache data parity is not checked in spite of the Pcache hit condition.
2. Unlike tag parity, data parity is written into the array during a Pcache write operation rather than checked. M%S6_BYTE_MASK_H<7:0> enables writing data parity into the Pcache in the same manner as M%S6_BYTE_MASK_H<7:0> enables writing data into the Pcache. Therefore, each data parity bit is only updated as its corresponding byte of data is updated in the Pcache array.

The Pcache data parity check begins following the completion of the Pcache read access. Correct parity is generated on all eight data bytes read out of the Pcache. Each bit of generated data parity is compared to its corresponding stored parity. If one or more mismatches are found, a Pcache data parity error has occurred. Note that the parity check is independent of which of the eight accessed bytes were actually requested by the read reference. Therefore, a Pcache data parity error can occur even though the requested bytes of data have correct parity.

12.6.4 Recording Mbox errors

When any hard error is detected within the system, the error is recorded in one of many error status registers located throughout the NVAX system. When the operating system error handler routine is invoked from a microtrap or interrupt, the handler can read the state of all the error registers through IPR_RD operations to determine what error or errors were present when the error handler was invoked.

The Mbox contains four of these error registers. Two are used to record TB parity errors and the other two are used to record Pcache parity errors.
12.6.4.1 TBSTS and TBADR

The TB status register is shown below:

Figure 12-56: IPR ED (hex), TBSTS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  SRC   | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|     CMD      |  |  |  |  | :TBSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                      |  |  |  |
                                                                       EM_VAL---------+  |  |  |
                                                                       TPERR-------------+  |  |
                                                                       DPERR----------------+  |
                                                                       LOCK--------------------+

Table 12-23: TBSTS Field Descriptions

Name     Extent   Type   Description
LOCK     0        WC     Lock Bit. When set, validates TBSTS contents and prevents any other field from further modification. When clear, indicates that no TB parity error has been recorded and allows TBSTS and TBADR to be updated.
DPERR    1        RO     Data Error Bit. When set, indicates a TB data parity error.
TPERR    2        RO     Tag Error Bit. When set, indicates a TB tag parity error.
EM_VAL   3        RO     EM_LATCH valid bit. Indicates if EM_LATCH was valid at the time of the TB parity error detection. This helps the software error handler determine if a write operation may have been lost due to the TB parity error.
CMD      8:4      RO     S5 command corresponding to TB parity error.
SRC      31:29    RO     Indicates the original source of the reference causing TB parity error.

Table 12-24: SRC Encodings

Defined SRC values   Definition
110                  valid IREAD error is stored
100                  valid Ibox specifier reference error is stored
000                  valid Ebox reference error is stored

See Figure 12-27 for the format description of TBADR.

When a TB parity error is detected with LOCK=0, TBADR is loaded with the virtual address which caused the TB parity error, and all fields of TBSTS are updated to record the nature of the TB parity error.
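As an illustration of how an error handler might interpret a TBSTS value once it has been read through IPR_RD, the sketch below unpacks the fields of Table 12-23 and Table 12-24. The function name, the dictionary layout and the "undefined" label for unlisted SRC codes are assumptions made for this example, and the pairing of SRC values with their definitions follows the order in which they are listed in Table 12-24.

```python
# SRC<31:29> encodings as listed in Table 12-24.
SRC_NAMES = {
    0b110: "IREAD",
    0b100: "Ibox specifier reference",
    0b000: "Ebox reference",
}


def decode_tbsts(value: int) -> dict:
    """Split a 32-bit TBSTS value into its named fields."""
    return {
        "LOCK": bool(value & 1),             # bit 0: contents are valid
        "DPERR": bool((value >> 1) & 1),     # bit 1: TB data parity error
        "TPERR": bool((value >> 2) & 1),     # bit 2: TB tag parity error
        "EM_VAL": bool((value >> 3) & 1),    # bit 3: EM_LATCH was valid
        "CMD": (value >> 4) & 0x1F,          # bits 8:4: S5 command
        "SRC": SRC_NAMES.get((value >> 29) & 0x7, "undefined"),
    }


example = decode_tbsts((0b100 << 29) | (0x0A << 4) | 0b0101)
```

In line with the register description, a handler would only trust the other fields when the decoded LOCK field is set.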
Note that both the TPERR and DPERR bits can be set at the same time if these two error conditions occurred during the same cycle.

When a TB parity error is recorded, the LOCK bit is set to validate the contents of both TBSTS and TBADR registers. When LOCK is set, all bits of both registers are frozen and cannot be changed until the LOCK bit is cleared. Thus, any subsequent error is not recorded if LOCK=1.

When the operating system error handler is invoked, TBSTS and TBADR will be read through an IPR_RD command in order to determine if any TB parity errors were recorded. If the state of the LOCK bit was read to be a zero, then no error has occurred and the remaining state information in these two registers is invalid. If the LOCK bit was found to be set, then the remaining error state of these two registers characterizes the nature of the recorded error. Once the error handler has read these registers, it re-enables TBSTS to record any new errors by clearing the LOCK bit. Clearing the LOCK bit is accomplished by writing a "1" to LOCK through an IPR_WR operation.

12.6.4.2 PCSTS and PCADR

The PCSTS register is shown below:

Figure 12-57: IPR F4 (hex), PCSTS

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |     CMD      |  |  |  |  | :PCSTS
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                 |  |                 |  |  |  |
                                                  PTE_ER---------+  |                 |  |  |  |
                                                  PTE_ER_WR---------+                 |  |  |  |
                                                  LEFT_BANK---------------------------+  |  |  |
                                                  RIGHT_BANK-----------------------------+  |  |
                                                  DPERR-------------------------------------+  |
                                                  LOCK-----------------------------------------+

Table 12-25: PCSTS Field Descriptions

Name         Extent   Type   Description
LOCK         0        WC     Lock Bit. When set, validates PCSTS<8:1> contents and prevents modification of these fields. When clear, invalidates PCSTS<8:1> and allows these fields and PCADR to be updated.
DPERR        1        RO     Data Error Bit. When set, indicates a Pcache data parity error.
RIGHT_BANK   2        RO     Right Bank Tag Error Bit. When set, indicates a Pcache tag parity error on the right bank.
LEFT_BANK    3        RO     Left Bank Tag Error Bit. When set, indicates a Pcache tag parity error on the left bank.
CMD          8:4      RO     S6 command corresponding to Pcache parity error.
PTE_ER_WR    9        WC     Indicates a hard error on a PTE DREAD which resulted from a TB miss on a WRITE.
PTE_ER       10       WC     Indicates a hard error on a PTE DREAD.

The PCSTS and PCADR record Pcache tag and data parity errors. The function and operation of these registers is identical to the TBSTS and TBADR registers except that the PCADR stores physical quadword addresses rather than virtual byte addresses, and PCSTS also records PTE hard error events. The definitions of these registers are shown in Figure 12-29 and Figure 12-30.

Note, however, that when PCSTS<0> is set, Pcache memory reads, writes and invalidates are disabled.

The PCSTS is a partial misnomer in that it also records hard error state associated with fatal errors occurring on Mbox PTE DREAD references. These hard errors have nothing to do with Pcache parity errors; however, they are included in PCSTS for implementation simplicity.

The PTE_ER bit of PCSTS will set whenever the Cbox has returned fatal error status on a requested PTE DREAD. The PTE_ER_WR bit of PCSTS will set whenever the Cbox has returned fatal error status on a requested PTE DREAD which was due to a TB miss on a WRITE reference. Both of these bits may be set independently of the LOCK bit of PCSTS. Further, the state of these bits is always valid regardless of the state of the LOCK bit.
These two bits can only be cleared by a write-one-to-clear operation to each bit.

12.6.5 Mbox Error Processing

12.6.5.1 Processing TB parity errors

TB tag parity errors can be detected on all commands which cause a TB tag lookup to occur (see Section 12.6.5.4). TB data parity errors can be detected on all commands in which data can be read out of the TB (see Section 12.6.5.4).

For hardware simplicity, the detection of any TB parity error will cause the Mbox to generate a hard error microtrap and will cause the faulting reference and all pending Ibox, Ebox and Mbox references to be cleared. Thus, any TB parity error is fatal in the sense that it is non-recoverable and will cause a machine check.

The following describes the specific sequence of events which occur following the detection of a TB tag parity error, or a TB data parity error:

1. If the TBSTS register is locked, TBSTS state is not updated. Assuming the TBSTS is not locked, the TB parity condition is recorded in the TBSTS and the associated virtual address is loaded into TBADR. TBSTS and TBADR are subsequently locked by setting TBSTS<0>. The Mbox asserts M%TB_PERR_TRAP_L to invoke a hard error microtrap. The valid bits of the IREF_LATCH, SPEC_QUEUE, EM_LATCH, VAP_LATCH, and RTY_DMISS_LATCH are unconditionally cleared to eliminate all pending references which might involve a subsequent TB operation.
2. The TB parity error detection causes the MME_DATAPATH to invoke the TB parity error sequence. As a result, the MME_DATAPATH issues a TBIA command. The reference which caused the TB parity error is transformed into a NOP command as it propagates into the S6 pipe. Thus, this reference will not modify any Pcache, Bcache or Cbox state.
3. The TBIA command executes in S5, causing all TB entries to be invalidated and the NLU pointer to be reset. All TB entries are invalidated rather than just the one which caused the parity error.
This is done based on the premise that a single soft failure in the TB may affect more than one entry. Thus, each distinct soft failure will only be detected and reported once.

12.6.5.2 Processing Pcache parity errors

Pcache tag parity errors can be detected on all commands which cause a Pcache tag lookup to occur (see Section 12.6.5.4). Pcache data parity errors can be detected on all commands in which data is read out of the Pcache (see Section 12.6.5.4).

The strategy behind processing Pcache parity errors is to turn off the Pcache and let the Cbox process the reference from the Bcache or from main memory. Thus, in the absence of any errors from the Cbox or memory subsystem, a Pcache parity error never causes an error fatal to the currently executing process.

The following describes the specific sequence of events which occur following the detection of a Pcache tag parity error:

1. The Pcache tag parity error is recorded in PCSTS and the corresponding physical address is recorded in PCADR. PCADR and PCSTS are subsequently locked by setting the LOCK bit of PCSTS. Locking PCSTS automatically disables the Pcache from performing any subsequent non-IPR operations. The Mbox asserts M%MBOX_S_ERROR_H to flag an interrupt which will guarantee that the parity error will be recorded as a soft error at some future time. If the Pcache operation is a write, the Cbox will automatically continue processing the reference independent of any parity error condition. In the case of read operations, the predicted parity mechanism guarantees that a Pcache miss condition will occur when a tag parity error is detected. Thus, M%CBOX_REF_ENABLE_L is asserted in response to the Pcache miss condition, causing the Cbox to continue to process the read reference.

The following describes the specific sequence of events which occur following the detection of a Pcache data parity error:

1. The Pcache data parity error is recorded in PCSTS and the corresponding physical address is recorded in PCADR. PCADR and PCSTS are subsequently locked by setting the LOCK bit of PCSTS. Locking PCSTS automatically disables the Pcache from performing any subsequent non-IPR operations. The Mbox asserts M%MBOX_S_ERROR_H to flag an interrupt which will guarantee that the parity error will be recorded as a soft error at some future time. If the Pcache operation was a read in the absence of an outstanding fill operation, then M%CBOX_LATE_EN_H is asserted to inform the Cbox that it must continue to process the S6 reference because of the Pcache data parity error. M%CBOX_LATE_EN_H may be asserted in spite of the fact that M%CBOX_REF_ENABLE_L was deasserted earlier in the cycle because M%CBOX_REF_ENABLE_L is dependent on the Pcache hit condition but not on the parity error detection. The Pcache read reference is loaded into the corresponding MISS_LATCH and the read is treated in subsequent cycles as a normal Pcache miss sequence. If the Pcache operation was a D-stream read which occurred during an outstanding fill operation, M%CBOX_LATE_EN_H is not asserted because the Mbox and Cbox are unable to handle another fill at this point. When the fill sequence completes, this reference will be retried (from the RTY_DMISS_LATCH), and M%CBOX_LATE_EN_H will be issued. Note that M%CBOX_LATE_EN_H is never asserted during a Pcache write operation.

12.6.5.3 Processing Cbox errors on Mbox-initiated read-like sequences

The Cbox detects errors that occur in the Bcache, NDAL or memory subsystem. When the Cbox detects one of these errors, and it is associated with an Mbox-initiated reference that requires data to be returned (e.g. memory read, I/O space read, or IPR read), the Mbox must transfer the error status of the reference back to the destination corresponding to the reference. The Mbox never records a Cbox-detected error in Mbox error registers because the error is logged in Cbox error registers.

12.6.5.3.1 Cbox-detected ECC errors

The Cbox returns requested data through an I_CF or D_CF command to the Mbox while simultaneously checking the error-correction code for a possible Bcache error. If an ECC error is found, the Cbox asserts C%CBOX_ECC_ERR_H. This causes the Mbox to latch a NOP in the CBOX_LATCH rather than the cache fill. As a result, the Mbox does not perform any Pcache state updates resulting from the bad data, nor does it assert M%VIC_DATA_L, M%IBOX_DATA_L, M%EBOX_DATA_H, or M%MBOX_DATA to indicate the presence of valid data.

During subsequent cycles, the Cbox will determine if the ECC error is correctable or not. If it is, the data will be corrected and returned. If the data is not correctable, a Cbox-detected hard error has occurred and will be dealt with as described below.

Note that the ECC detection mechanism is what verifies the validity of the data. The Cbox does not send any parity information in order for the Mbox to check the validity of the received data.

12.6.5.3.2 Cbox-detected hard errors on requested fill data

If the Cbox has determined that the requested data cannot be returned for some reason, the Cbox drives a cache fill command qualified by C%CBOX_HARD_ERR_H. When this happens, the Mbox performs the following actions:

1. The assertion of C%CBOX_HARD_ERR_H indicates to the Mbox that the cache fill data is invalid. Thus, the Mbox returns the invalid data on the M%MD_BUS_H in the same manner that all data is returned except that the data is further qualified by M%HARD_ERR_H. M%HARD_ERR_H informs the receiver that the data is invalid and that the requested data cannot be returned due to a hard error.
2. Once the Cbox detects a hard error on the requested data, the Cbox immediately terminates the pending fill sequence by the assertion of C%LAST_FILL_H. Thus, no further data corresponding to the same fill sequence will be returned and the Mbox fill sequence corresponding to the error is terminated by invalidating the corresponding MISS_LATCH.
3. An I_CF or D_CF command which is qualified by C%CBOX_HARD_ERR_H is interpreted by the Pcache as an INVAL command. Thus the invalid data is not filled in the Pcache.

12.6.5.3.3 Cbox-detected hard errors on non-requested fill data

The Cbox performs the same actions as described above to indicate the presence of a hard error regardless of whether the data is the requested data or just one of the other three pieces of fill data for the corresponding Pcache block. If the data is non-requested fill data, the Mbox performs the following actions:

1. Once the Cbox detects a hard error on the non-requested data, the Cbox immediately terminates the pending fill sequence by the assertion of C%LAST_FILL_H. Thus, no further data corresponding to the same fill sequence will be returned and the Mbox fill sequence corresponding to the error is terminated by invalidating the corresponding MISS_LATCH.
2. An I_CF or D_CF command which is qualified by C%CBOX_HARD_ERR_H is interpreted by the Pcache as an INVAL command. Thus the invalid fill data is not filled in the Pcache and all previous fills to the same block are invalidated. This is necessary in order to maintain coherency between the Pcache and Bcache because a Bcache data block will only be validated if all the data within the block is error-free.
12.6.5.3.4 Microcode Invocation on Cbox-detected Hard Errors

When the Cbox indicates a hard error on requested read data, invalid data is driven on the M%MD_BUS_H qualified by M%HARD_ERR_H to indicate that the data is invalid due to a hard error. When the Ebox references the corresponding data, a microtrap is generated by the Ebox to invoke the hard error microflow.

If the hard error occurs on the address of an operand (i.e. the Ibox is decoding a deferred specifier), the Mbox returns data qualified by M%HARD_ERR_H in the normal manner. However, in some instances, the Ibox must issue a second reference to the Mbox based on the address returned by the first reference. Due to the hard error, however, the Ibox cannot issue a valid operand read since the data returned by the first reference was invalid. In this case, the Ibox issues a read qualified with the I%FORCE_HARD_FAULT_H signal. If this deferred specifier is a source operand, the Mbox "fakes" a hard error on this read by forcing a Pcache hit and by qualifying the returned data with M%HARD_ERR_H. The Ebox traps on this reference when it references the operand. If this deferred specifier is a destination specifier, the Mbox sets the corresponding hard error bit in the PA_QUEUE. The hard error condition is then propagated to the Ebox through M%PA_Q_STATUS_H<2>.

If a hard error is generated on an Mbox PTE reference, this fact is recorded in the PCSTS register (see Section 12.6.4.2), the tb_miss sequence is immediately terminated, and the original reference (i.e. the reference that caused the memory management sequence which generated the PTE reference) is tagged as having the hard error associated with it. When the original reference is retried after the memory management sequence completes, the reference will be treated as if the hard error actually occurred on it.
If the original reference was a read from the Ibox, the Mbox asserts M%HARD_ERR_H as it returns the invalid data to notify the Ibox or Ebox of the problem. The error handler will be invoked by the Ebox once the Ebox references the invalid data. The error handler will then read all error registers in the system to determine the nature of the error (note that the Cbox has recorded the physical PTE address of the fatal read).

Hard errors on PTE DREADs resulting from a TB miss on a DEST_ADDR get reported through the M%PA_Q_STATUS_H<2> mechanism described above. Thus, any hard error on a PTE reference invoked by an Ibox reference will always be reported within the context of the executing instruction.

However, fatal errors on PTE DREADs resulting from MME_CHK and WRITE references pose a more difficult problem than PTE errors resulting from reads. Since neither of these references causes the Ebox to wait for a response from the Mbox, a more involved sequence is implemented in order to maximize the ability to report the fatal error within the context of the corresponding instruction execution. Thus, when a PTE error is detected on ANY Ebox reference except for PROBEs, the following sequence will take place:

1. The Mbox will immediately assert M%MME_TRAP_L (unless the Ebox has previously asserted E%FLUSH_MBOX_H during the tb miss sequence). The MME sequencer will update MMEADR to record the original address of the reference which resulted in the tb miss sequence; it does not record the PTE address. The MME sequencer will update MMESTS<2> to indicate whether the original address had modify intent. The FAULT, PTE_REF, and LV fields of MMESTS are UNPREDICTABLE in this context.
2. The assertion of M%MME_TRAP_L will cause the Ebox to immediately trap to the mme microflow.
3.
The mme microflow will examine MMESTS<2> and issue a PROBE command to the address in MMEADR to determine the nature of the mme fault.

4. The PROBE will invoke another TB miss. If the PTE error does not recur, valid PROBE status will be returned to the Ebox indicating the absence or presence of a true mme fault. In this case, Ebox processing of the current instruction will continue with no consequences due to the transient hard error. If the PTE error does recur on the TB miss during PROBE processing, the PROBE status returned to the Ebox will be qualified with M%HARD_ERR_H, indicating that a fatal error occurred during the PROBE reference. This will invoke the error handler within the context of the executing instruction.

12.6.5.4 Mbox Error Processing Matrix

The following table summarizes all Mbox error handling. A blank entry in the table means that the corresponding error cannot occur for the given reference.

Table 12-26: Mbox Error Handling Matrix

Column order: Command, TB tag parity error, TB data parity error, Pcache tag parity error, Pcache data parity error, Cbox hard error

Ibox references
IREAD                 A A B D F
DREAD                 A A B D F
DREAD_MODIFY          A A B D F
DEST_ADDR             A A
STOP_SPEC_Q

Ebox references
DREAD                 A A B D
DREAD_LOCK            A A B F F
STORE                 C
WRITE                 A A C
WRITE_UNLOCK          A A C
IPR_RD (to Pcache)    F
IPR_RD (non-Mbox)
IPR_WR (to Pcache)
IPR_WR (non-Mbox)
PROBE                 A A
MME_CHK               A A A A
TB_TAG_FILL
TB_PTE_FILL
TBIS
TBIP
TBIA
LOAD_PC

Mbox references
PTE DREAD             B D G
TB_TAG_FILL
TB_PTE_FILL           A
                      A A
Cbox references       E H H

LEGEND:

A.
• The Mbox microtraps the Ebox by asserting M%TB_PERR_TRAP_L during the cycle in which the error was detected.
• The faulting reference and all pending Ibox and Ebox references are blown away.
• A TBIA command is issued to invalidate the entire TB.
• TBSTS and TBADR are updated appropriately.

B.
• A Pcache miss condition is forced to occur on this read reference, causing the assertion of M%CBOX_REF_ENABLE_L. This instructs the Cbox to continue processing the read reference.
• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt.
• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off the Pcache).

C.
• The Cbox continues to process the write reference, as is done on all write operations regardless of a Pcache parity error.
• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt.
• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off the Pcache).

D.
• M%CBOX_LATE_EN_H is asserted to instruct the Cbox to continue processing the reference which caused the Pcache parity error.
• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt.
• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off the Pcache).

E.
• The invalidate operation takes place in spite of the tag parity error because the invalidate is only a function of matching all tag bits.
• M%MBOX_S_ERROR_H is asserted to post a soft error interrupt.
• PCSTS and PCADR are updated appropriately (a side effect of this operation turns off the Pcache).

F.
• The Cbox indicated a hard error for a non-PTE read or IPR_RD operation by the assertion of C%CBOX_HARD_ERR_H and C%LAST_FILL_H.
• If the hard error corresponded to the data explicitly requested by the Mbox reference, M%HARD_ERR_H qualifies the M%MD_BUS_H data, indicating to the M%MD_BUS_H receiver that a hard error occurred while accessing the requested data.
• The fill sequence is immediately terminated by the assertion of C%LAST_FILL_H, and the entire Pcache block corresponding to the fill is invalidated.

G.
• The hard error detected by the Cbox on this Mbox-issued PTE DREAD is recorded in PCSTS. The tb miss sequence is immediately terminated. If the error resulted from an Ibox reference, the error is tagged back to the appropriate Ibox reference latch. The error is then signaled via M%HARD_ERR_H when the requested data is returned on M%MD_BUS_H, or is reported through M%PA_Q_STATUS_H<2> (for DEST_ADDR commands). If the original reference came from the Ebox, M%MME_TRAP_L is asserted (in all cases except for PROBE references). This will invoke the memory management fault handler in order to try to report the hard error within the context of the execution of the instruction (see Section 12.6.5.3.4 for more information).
• The fill sequence is immediately terminated by the assertion of C%LAST_FILL_H, and the entire Pcache block corresponding to the fill is invalidated.

H.
C%CBOX_HARD_ERR_H was asserted by the Cbox during an I_CF or D_CF command. This is the mechanism by which the Cbox informs the Mbox of a hard error during a read or IPR_RD operation where the Cbox must return data. Thus, see the error responses specified by F and G for the error response within the context of the original read operation.

12.7 MBOX INTERFACES

The Mbox passes data and/or control information to four other sections of the NVAX chip. These sections are: 1) Ibox, 2) Ebox, 3) Useq and 4) Cbox. This section will describe the interfaces to each of these sections.
12.7.1 IBOX INTERFACE

12.7.1.1 Signals from Ibox

• I%IBOX_CMD_L<4,1:0>: Command field of reference sent by Ibox.
• I%IBOX_ADDR_H<31:0>: Transfers addresses of Ibox references to Mbox.
• I%IBOX_TAG_L<2:0>: Ebox reg file destination of reference sent by Ibox.
• I%IBOX_AT_L<1:0>: Access type of reference sent by Ibox.
• I%IBOX_DL_L<1:0>: Data length of reference sent by Ibox.
• I%IBOX_REF_DEST_L<1:0>: Indicates the destination(s) of the requested Ibox reference.
• I%IREF_REQ_H: When asserted, indicates that a valid IREAD reference is present on the I%IBOX_ADDR_H<31:0> bus.
• I%SPEC_REQ_H: When asserted, indicates that a valid specifier reference is being issued to the Mbox.
• I%FORCE_MME_FAULT_H: Indicates that the associated Ibox reference should be forced to "look" like a memory management fault from the Ibox point of view.
• I%FORCE_HARD_FAULT_H: Indicates that the associated Ibox reference should be forced to "look" like a hardware fault from the Ibox point of view.
• I%FLUSH_IREF_LAT_H: Indicates that any current IREAD sequence in Mbox should be immediately cleared.

12.7.1.2 Signals to Ibox

• M%SPEC_Q_FULL_H: Informs Ibox that the SPEC_QUEUE is full and cannot accept any new references.
• M%LAST_FILL_H: Qualifies I_CF data being returned to Ibox. It indicates that this data is the last fill data for the current fill sequence.
• M%MD_BUS_H<63:0>: Transfers data back to Ibox.
• M%MD_BUS_QW_PARITY_L: Quadword parity for M%MD_BUS_H.
• M%QW_ALIGNMENT_H<1:0>: Indicates the relative aligned quadword position of VIC fill data within the aligned hexaword.
• M%VIC_DATA_L: When asserted, indicates that M%MD_BUS_H<63:0> contains VIC fill data.
• M%IBOX_DATA_L: When asserted, indicates that M%MD_BUS_H<31:0> contains requested Ibox data.
• M%IBOX_IPR_WR_H: When asserted, indicates that M%MD_BUS_H<31:0> contains Ibox IPR write data.
• M%MME_FAULT_H: When asserted in conjunction with M%VIC_DATA_L or M%IBOX_DATA_L, indicates that data on M%MD_BUS_H is invalid and that the corresponding reference was associated with a memory management exception.
• M%HARD_ERR_H: When asserted in conjunction with M%VIC_DATA_L or M%IBOX_DATA_L, indicates that data on M%MD_BUS_H is invalid and that the corresponding reference was associated with a hard error condition.

12.7.2 EBOX INTERFACE

12.7.2.1 Signals from Ebox

• E%EBOX_CMD_H<4:0>: Command field of reference sent by Ebox.
• E%VA_BUS_L<31:0>: Transfers addresses of Ebox references to Mbox.
• E%WBUS_H<31:0>: Transfers data of Ebox references to Mbox.
• E%EBOX_TAG_H<4:0>: Ebox reg file destination of reference sent by Ebox.
• E%EBOX_AT_H<1:0>: Access type of reference sent by Ebox.
• E%EBOX_DL_H<1:0>: Data length of reference sent by Ebox.
• E%EBOX_VIRT_ADDR_H: Indicates whether address is virtual or physical.
• E%MMGT_MODE_H<1:0>: Execution mode to be used for ACV checks on PROBE references.
• E%CUR_MODE_H<1:0>: Execution mode to be used for ACV checks on all non-PROBE references.
• E%EREF_REQ_H: When asserted, indicates that a valid Ebox reference is currently being issued.
• E%EM_ABORT_L: Indicates that the current EM_LATCH reference should be disregarded.
• E%FLUSH_MBOX_H: Indicates that certain references and reference state in the Mbox should be cleared (see Section 12.3.21.2).
• E%FLUSH_PA_QUEUE_H: Indicates that the PA_QUEUE should be flushed (see Section 12.3.21.2).
• E%START_IBOX_IO_RD_H: Indicates that the Ebox is md stalling on the corresponding SPEC_QUEUE read. If this SPEC_QUEUE read is an I/O space read and E%START_IBOX_IO_RD_H is not asserted, the read is aborted until it is asserted.
• E%RESTART_SPEC_QUEUE_H: Indicates that Ebox has sent all explicit writes for the current instruction to the Mbox and, therefore, causes the SPEC_Q_SYNC_CTR to be incremented.
• E%NO_MME_CHECK_H: Indicates that the corresponding EM_LATCH reference should not be tested for ACV or M=0 conditions.

12.7.2.2 Signals to Ebox

• M%EM_LAT_FULL_H: Indicates that EM_LATCH is currently full and cannot accept any new references.
• M%PA_Q_STATUS_H<2>: Indicates that the corresponding address in the PA_QUEUE is associated with a hard error.
• M%PA_Q_STATUS_H<1>: Indicates that the corresponding address in the PA_QUEUE is associated with a memory management exception.
• M%PA_Q_STATUS_H<0>: Indicates that sufficient physical address data is present in the PA_QUEUE to initiate an Ebox STORE command.
• M%MD_BUS_H<31:0>: Transfers data back to Ebox.
• M%MD_TAG_H<4:0>: Ebox reg file destination of reference on M%MD_BUS_H<31:0>.
• M%EBOX_DATA_H: When asserted, indicates that M%MD_BUS_H<31:0> contains requested Ebox data.
• M%MME_FAULT_H: When asserted in conjunction with M%EBOX_DATA_H, indicates that data on M%MD_BUS_H is invalid and that the corresponding reference was associated with a memory management exception.
• M%HARD_ERR_H: When asserted in conjunction with M%EBOX_DATA_H, indicates that data on M%MD_BUS_H is invalid and that the corresponding reference was associated with a hard error condition.
• M%PMUX0_H: Mbox performance data signal (see Section 12.10).
• M%PMUX1_H: Mbox performance data signal (see Section 12.10).

12.7.3 INTERRUPT SECTION INTERFACE

12.7.3.1 Signals to Interrupt Section

• M%MBOX_S_ERROR_H: Indicates that the Mbox has logged a hard error in the PCSTS register and thus, is posting an interrupt.
12.7.4 USEQ INTERFACE

12.7.4.1 Signals to Useq

• M%MME_TRAP_L: Indicates to the Useq that a memory management exception is to be invoked.
• M%TB_PERR_TRAP_L: Indicates to the Useq that a tb parity error has been detected.

12.7.5 CBOX INTERFACE

12.7.5.1 Signals from Cbox

• C%CBOX_CMD_H<1:0>: Command field of Cbox reference sent to Mbox.
• C%CBOX_ADDR_H<31:5>: Hexaword address of Cbox reference sent to Mbox.
• C%MBOX_FILL_QW_H<4:3>: Indicates the aligned quadword within the aligned hexaword.
• C%REQ_DQW_H: Qualifies the current D_CF to indicate that this is the requested data.
• B%S6_DATA_H<63:0>: Data of Mbox reference seen by Cbox.
• C%S6_DP_H<7:0>: Even data parity corresponding to B%S6_DATA_H<63:0> during cache fill references.
• C%LAST_FILL_H: When asserted, indicates that this is the last fill sent for the current sequence.
• C%CBOX_HARD_ERR_H: When asserted while the Cbox is driving data onto the B%S6_DATA_H bus, indicates that data on M%MD_BUS_H is associated with a non-recoverable hard error.
• C%CBOX_ECC_ERR_H: Indicates that an ECC error is associated with the Cbox data being returned.
• C%WR_BUF_BACK_PRES_H: Indicates that Cbox cannot accept any more entries in its write buffer.

12.7.5.2 Signals to Cbox

• M%S6_CMD_H<4:0>: Command field of Mbox reference seen by Cbox.
• M%S6_PA_H<31:3>: Quadword physical address of Mbox reference seen by Cbox.
• M%C_S6_PA_H<2:0>: Address within addressed quadword of Mbox reference seen by Cbox.
• B%S6_DATA_H<63:0>: Data of Mbox reference seen by Cbox.
• M%S6_BYTE_MASK_H<7:0>: Byte mask field of Mbox reference seen by Cbox.
• M%CBOX_REF_ENABLE_L: Indicates that the current S6 read reference packet should be latched and processed by the Cbox. This signal is a don't care on write operations.
• M%CBOX_LATE_EN_H: Asserted at the end of a cycle to indicate that a Pcache parity error was detected. As a result, the Cbox must continue to process this reference regardless of what M%CBOX_REF_ENABLE_L indicated.
• M%ABORT_CBOX_IRD_H: Indicates that any IREAD which the Cbox may be processing should be immediately terminated.
• M%CBOX_BYPASS_ENABLE_H: Indicates that the Cbox may drive B%S6_DATA_H<63:0> during the following cycle in order to attempt a data bypass.

12.8 INITIALIZATION

12.8.1 Power-up Initialization

The signal K_M%RESET_L is asserted during the power-up reset sequence. The following state is forced whenever K_M%RESET_L is asserted:

• EM_LATCH valid bit is cleared.
• VAP_LATCH valid bit is cleared.
• MME_LATCH valid bit is cleared.
• RTY_DMISS_LAT valid bit is cleared.
• DMISS_LATCH valid bit is cleared.
• MME state machine is forced to the home state.
• PCCTL<8:0> are cleared (this disables the Pcache).

The power-up reset sequence also causes the assertion of E%FLUSH_MBOX. E%FLUSH_MBOX will cause the following state to be forced within the context of the power-up sequence:

• The SPEC_QUEUE valid bits are cleared.
• The SPEC_Q_SYNC_CTR is reset to 0. Note that a subsequent E%RESTART_SPEC_Q signal is expected to enable SPEC_QUEUE arbitration.
• MMESTS<31:29> are cleared. This invalidates and unlocks the MMESTS register.

See Section 12.3.21.2 for a complete description of all state changes due to E%FLUSH_MBOX.

Once E%FLUSH_MBOX has been asserted, E%FLUSH_PA_QUEUE will be asserted during a subsequent cycle. E%FLUSH_PA_QUEUE will cause all PA_QUEUE valid bits to be cleared.

The power-up reset sequence also causes the assertion of I%FLUSH_IREF_LAT. I%FLUSH_IREF_LAT will cause the following state to be forced within the context of the power-up sequence:

• The IREF_LATCH valid bit is cleared.
• The IMISS_LATCH valid bit is cleared.

See Section 12.3.21.1 for a complete description of all state changes due to I%FLUSH_IREF_LAT.

12.8.2 Initialization by Microcode and Software

It is the responsibility of the power-up microcode to perform an IPR_WRITE operation to clear MAPEN before any virtual memory references are issued to the Mbox from either the Ebox or Ibox. Failure to clear MAPEN could result in UNDEFINED behavior prior to complete memory management state initialization.

PAMODE is also cleared by the power-up microcode via an IPR_WRITE command. If the system configuration requires a 32-bit program-visible physical address space, setting the PAMODE value via an IPR_WRITE must be done under very controlled conditions because writes to the PAMODE processor register affect both physical address generation and interpretation of PTEs.

With the possible exception of certain diagnostic code, writes to the PAMODE processor register should not be performed while memory management is enabled. With memory management disabled, writes to the PAMODE processor register should not be performed unless the PC of the MTPR instruction which writes to the register is in one of the following (hex) address ranges:

00000000..1FFFFFFF
E0000000..FFFFFFFF

By restricting PC to one of these address ranges, changes to the PAMODE register do not cause the generated physical address to change in going from 30-bit mode to 32-bit mode, or vice versa. At power-up, microcode fetches the initial instruction from the boot ROM at address E0040000 (hex), which is in the second of the ranges shown above. Therefore, the console code in the boot ROM may write to the PAMODE processor register, and it is expected that this is the place where the PAMODE processor register will be initialized.

In uncontrolled conditions, writes to the PAMODE processor register can cause UNDEFINED results.
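The PC restriction on PAMODE writes amounts to a simple range check. The following is a minimal sketch in Python, not part of the NVAX microcode or console code; the constant and function names are hypothetical and only the two hex ranges are taken from the text. The second function records an observation that follows directly from those ranges: they contain exactly the 32-bit addresses whose bits <31:29> are all equal.

```python
# Hypothetical helper names; only the two hex ranges come from the text above.
PAMODE_SAFE_PC_RANGES = (
    (0x00000000, 0x1FFFFFFF),
    (0xE0000000, 0xFFFFFFFF),
)


def pc_allows_pamode_write(pc):
    """Return True if pc lies in one of the two listed address ranges."""
    if not 0 <= pc <= 0xFFFFFFFF:
        raise ValueError("pc must be a 32-bit address")
    return any(low <= pc <= high for low, high in PAMODE_SAFE_PC_RANGES)


def pc_top_bits_all_equal(pc):
    """Equivalent test: bits <31:29> of pc are 000 or 111."""
    return ((pc >> 29) & 0x7) in (0b000, 0b111)


if __name__ == "__main__":
    for pc in (0x00000000, 0x1FFFFFFF, 0x20000000, 0xDFFFFFFF, 0xE0040000):
        print(f"{pc:08X}: {pc_allows_pamode_write(pc)}")
```

The boot ROM entry address E0040000 passes the check, which is consistent with the statement that console code in the boot ROM may write PAMODE.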
12.8.2.1 Pcache Initialization

The Pcache is disabled by the power-up initialization sequence. In order to enable the Pcache, the following sequential actions must be performed:

1. Pcache IPR_WRITE operations must be performed to each Pcache tag to write the tag field to a known state, set the tag parity bit to the corresponding value, and clear the subblock valid bits.

2. The lock bit in PCSTS must be cleared so that a locked PCSTS will not inhibit turning on the Pcache.

3. An IPR_WRITE to the PCCTL must be done to enable the Pcache in the desired operation mode. This step effectively turns the Pcache on.

Note that the data array need not be initialized because correct parity will be written into the data array whenever fill data is validated, and data parity is only checked on validated sub-blocks.

12.8.2.2 Memory Management Initialization

Memory management is disabled by MAPEN being cleared by the power-up microcode. Before memory management can be turned on, the following actions must be performed:

• The Ebox must issue a TBIA command to invalidate the TB and reset the NLU pointer to a known state. This is done as part of the microcode processing of an MTPR to MAPEN.
• The Ebox must write the appropriate values into the six memory base and length registers via IPR_WRITE commands.

Once this is done, the Ebox may turn on memory management by setting MAPEN through an IPR_WRITE command.

12.9 Mbox Testability Features

This section describes which testability features are used for Mbox testability, and which Mbox signals are used for each testability function. For a global understanding of NVAX testability, and for a detailed description of each testability strategy and hardware mechanism, the reader is referred to Chapter 19.

12.9.1 Internal Scan Register and Data Reducers

The following lists Mbox signals which are captured in the internal scan chain.
The signals are listed in the order in which they are serially shifted out. Therefore, the first signal listed is the first signal shifted out. If a bus of signals is listed in the form signal<x:y>, y represents the first bit to be shifted out; x represents the last bit of the bus to be shifted out.

Captured Signal Name                        Description
M;..QUB_QtJ3'IJWMILVALO..LAST_B<>           cycle-delayed valid bit for 0th entry Spec Queue
M;..QUB_QtJ3'IJWMILVAL1..LAST_B<>           cycle-delayed valid bit for 1st entry Spec Queue
M;..QUJLQtncr.D(.VAlJASTJI<>                cycle-delayed valid bit for EM_LATCH
JrJ...QtJB..QUKPA~STATV8_PS_B<2:0>          Status bits for PA_QUEUE
M;..QUB_QUKMMB_TBAPYI_B<>                   Memory Management Exception Trap signal
JrJ...QUB_QUKR'lT_VAlJASTJ2_B<>             cycle-delayed valid bit for RTY_DMISS_LATCH
JrJ...SIC_TS'l'CJ&:CBOx;.)IBF_BN_P2..B<>    Indicates S6 read reference is for Cbox
M%CBOX_BYPASS_ENABLE_H<>                    Enables bypassing of Cbox cache fill data
M...8IC_TSTYM...LAT~~B<>                    Indicates EM_LATCH backpressure status to Ebox
                                            cycle-delayed valid bit for VAP_LATCH
M_QUB_QlJ3'IHI1EF_VAL..lA.ST-.8<>           cycle-delayed valid bit for IREF_LATCH
M...QUB_QUltQJMB_VAIJ,.AST-.8<>             cycle-delayed valid bit for MME_LATCH
M;..QUJUMILfU5...PA....L3_B<9:31>           samples S5_PA Bus
M...QUE_S6LfU5...P~«>:8>                    samples S5_PA Bus
M_QUE%S5_AT_H<1:0>                          Access type for S5 reference
M_QUE%S5_TAG_H<4:0>                         Ebox tag address for S5 reference
M_QUE%S5_DEST_H<1:0>                        Box destination code for S5 reference
M_QUE%S5_CMD_H<4:0>                         Command for S5 reference
M_QUE%S5_DL_H<1:0>                          Data length for S5 reference
M_QUE%S5_QUAL_H<6:0>                        Qualifier bits for S5 reference

Note that only M_QUE%S5_PA_H<31:0> contains a data reducer. Implementing a data reducer on this bus should provide coverage for the Mbox S5 pipe as well as coverage for the Ibox, Ebox and Cbox logic which issue references to the Mbox.
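The signal<x:y> shift-order convention stated at the start of this section can be made concrete with a short sketch. This is an illustration only, written in Python; the function name is hypothetical and nothing here models the scan hardware itself. It simply enumerates bit numbers starting at y and ending at x, in whichever direction that requires.

```python
def shift_out_order(x, y):
    """Bit numbers of signal<x:y> in scan-out order: y first, x last."""
    step = 1 if x >= y else -1
    return list(range(y, x + step, step))


if __name__ == "__main__":
    # A bus written <2:0> shifts out bit 0, then bit 1, then bit 2.
    print(shift_out_order(2, 0))
    # A bus written <9:31> shifts out bit 31 first and bit 9 last.
    order = shift_out_order(9, 31)
    print(order[0], "...", order[-1])
```

Under this reading, a bus listed as <9:31> leaves the chain high-order bit first, while a bus listed as <2:0> leaves it low-order bit first.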
12.9.2 Nodes on Parallel Port

The following signals are observable via the Parallel Port:

M_QUE%S5_CMD_H<4:0>

Current Reference Source (3 encoded bits). The encodings are as follows:

Reference Source                        Encoding
NOP or PA_QUEUE (when cmd = STORE)      000
IREF_LATCH                              001
SPEC_QUEUE                              010
EM_LATCH (when cmd is not STORE)        011
VAP_LATCH (when cmd is not STORE)       100
MME_LATCH                               101
RTY_DMISS_LATCH                         110
CBOX_LATCH                              111

M_QUE_QU5%ABORT_P4_H

M_MME_MMD%TB_MISS_L3_H

M_PC_BSL%PCACHE_HIT_P4_H

MME state machine state bits (4 encoded bits). The encodings are as follows:

State Name            Encoding
home                  0000
tb_miss_1             0001
tb_miss_2             0010
tb_miss_3             0011
tb_miss_4             0100
tb_miss_5             0101
doub_tb_miss_1        0110
doub_tb_miss_2        0111
doub_tb_miss_3        1000
doub_tb_miss_4        1001
mme_1                 1010
mme_2                 1011
ipr_rd_1_tb_per_2     1100
xpage_1               1101
tb_per_1              1110
undefined             1111

MD_BUS Qualifiers (3 encoded bits). The encodings are as follows:

Event                 Encoding
undefined             000
Ibox data             001
Ebox data             010
Ibox and Ebox data    011
VIC data              100
Ibox IPR data         101
undefined             110
Mbox data             111

M%MME_FAULT_H

12.9.3 Nodes on Top Metal

tbd

12.9.4 Architectural features

The following is a brief description of all the Mbox architectural features which are relevant to verification, debug, and chip test. All of these features are invoked through the use of IPRs which are defined at the NVAX instruction set level. All of these IPRs can be invoked through the use of MTPR or MFPR macroinstructions. See the Architectural Summary Chapter for a list of all Mbox IPR addresses. Note that Mbox IPR addresses referenced through the MxPR instruction are translated by the Ebox microcode into IPR_RD, IPR_WR, TBIS, TBIA, or PROBE operations before being issued to the Mbox.
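When examining Parallel Port captures, the three encoded fields listed in Section 12.9.2 lend themselves to simple lookup tables. The following Python sketch is one possible way to decode them; the dictionary and function names are hypothetical, the table contents are copied from Section 12.9.2, and how the raw field values are obtained from the port is outside the scope of this sketch.

```python
# Lookup tables transcribed from Section 12.9.2; names are illustrative only.
REFERENCE_SOURCE = {
    0b000: "NOP or PA_QUEUE",
    0b001: "IREF_LATCH",
    0b010: "SPEC_QUEUE",
    0b011: "EM_LATCH",
    0b100: "VAP_LATCH",
    0b101: "MME_LATCH",
    0b110: "RTY_DMISS_LATCH",
    0b111: "CBOX_LATCH",
}

MME_STATE = {
    0b0000: "home",
    0b0001: "tb_miss_1",
    0b0010: "tb_miss_2",
    0b0011: "tb_miss_3",
    0b0100: "tb_miss_4",
    0b0101: "tb_miss_5",
    0b0110: "doub_tb_miss_1",
    0b0111: "doub_tb_miss_2",
    0b1000: "doub_tb_miss_3",
    0b1001: "doub_tb_miss_4",
    0b1010: "mme_1",
    0b1011: "mme_2",
    0b1100: "ipr_rd_1_tb_per_2",
    0b1101: "xpage_1",
    0b1110: "tb_per_1",
    0b1111: "undefined",
}

MD_BUS_QUALIFIER = {
    0b000: "undefined",
    0b001: "Ibox data",
    0b010: "Ebox data",
    0b011: "Ibox and Ebox data",
    0b100: "VIC data",
    0b101: "Ibox IPR data",
    0b110: "undefined",
    0b111: "Mbox data",
}


def decode(table, value):
    """Look up an encoded field value; raises KeyError if out of range."""
    return table[value]


if __name__ == "__main__":
    print(decode(REFERENCE_SOURCE, 0b010))
    print(decode(MME_STATE, 0b0110))
    print(decode(MD_BUS_QUALIFIER, 0b100))
```

Keeping the three tables separate mirrors the way the fields are listed and avoids assuming anything about how they are packed on the port.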
12.9.4.1 Translation Buffer Testability

The diagnostic user can invalidate the entire TB array by executing an MTPR instruction which addresses the TBIA IPR. This operation will also reset the NLU pointer. The user can invalidate any virtual page address which may be cached in the TB by executing an MTPR addressing the TBIS IPR.

The diagnostic user can explicitly query the TB to determine if a given tag is validated and stored in the TB. This is accomplished by addressing the Translation Buffer Check IPR through the MTPR instruction.

Every TB entry can be explicitly filled and validated by the diagnostic user through the use of the TB_TAG_FILL and TB_PTE_FILL commands. The entry on which these two commands operate at any given time is addressed by the NLU pointer. The NLU pointer is a round-robin pointer which increments when a TB_PTE_FILL is executed or when a tag match is detected on the entry to which the NLU pointer is currently pointing. The NLU pointer is reset to point to the 0th entry whenever a TBIA command is executed.

It is the responsibility of the diagnostic user to set up tests such that normal I-stream and D-stream references generated in the macropipeline do not interfere with the TB state under test. Specifically, the user must guarantee that all relevant pages of the diagnostic program reside in the TB before the test begins, such that accessing these pages will not cause modification of the TB state while the diagnostic program is explicitly probing and changing TB state.

See Section 12.5.1.3 for a complete description of TB function as it relates to testability. See Section 12.3.11.2 for a description of the PROBE command which can be invoked through the Translation Buffer Check IPR.

12.9.4.2 Pcache Testability

Every bit in the Pcache can be read and written by the user through DREAD, WRITE, IPR_RD and IPR_WR operations.
Pcache data is accessed by DREADs and WRITEs. All other bits (tag, valid bits and parity bits) are accessed through Mbox IPRs.

The operational mode of the Pcache can be changed to accommodate testing the array. The mode is controlled by the Pcache Control Register (PCCTL), which can be read and written as an Mbox IPR. The PCCTL allows the user to:

1. Enable/disable D-stream and/or I-stream operations to the Pcache.
2. Allow the Pcache to operate in a direct mapped force hit mode.
3. Enable/disable Pcache parity checks.

See Section 12.4 for a complete description of Pcache function as it relates to testability.

12.9.5 M-BOX Miscellaneous Features

tbd

12.10 Mbox Performance Monitor Hardware

Hardware exists in the Mbox to support the NVAX Performance Monitoring Facility. See Chapter 18 for a global description of this facility.

The Mbox hardware generates two signals, M%PMUX0_H and M%PMUX1_H, which are driven to the central performance monitoring hardware residing in the Ebox. These two signals are used to supply Mbox performance data for the purpose of recording performance statistics.

Seven Mbox performance monitoring functions exist. The function to be executed is specified by the PMM field of the PCCTL register (see Figure 12-31).
The following describes the seven Mbox performance monitor modes:

Table 12-27: Mbox Performance Monitor Modes

PCCTL<7:5>    Performance Monitor Mode
000           TB hit rate for S0 Space I-stream Reads [1]
001           TB hit rate for P0/P1 Space I-stream Reads [1]
010           TB hit rate for S0 Space D-stream Reads [1]
011           TB hit rate for P0/P1 Space D-stream Reads [1]
100           Pcache hit rate for I-stream Reads
101           Pcache hit rate for D-stream Reads
110           illegal mode; results are UNPREDICTABLE
111           ratio of unaligned virtual reads and virtual writes to total virtual reads and virtual writes

[1] TB hit count is unconditionally incremented when MAPEN=0

12.10.1 TB hit rate Performance Monitor Modes

The TB hit rate modes work by asserting M%PMUX0_H during the cycle in which a specific type of virtual read reference is first attempted in the S5 execution pipe. During the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to this read execution event. It is important to capture this data only on the first execution of the read so that the TB hit statistics are not skewed by multiple retries of the same reference due to aborted cycles and tb_miss sequences.

One low-probability scenario exists in which this scheme will not accurately record the TB hit/miss data for the reference. Consider the case where the read is initially executed and is found to hit in the TB while simultaneously being aborted due to some abort condition (e.g., Pcache Index Conflict). During the following cycle, another reference is executed which invokes a TB miss sequence. If the TB miss sequence displaces the PTE corresponding to the first read, then the read will subsequently be retried as a TB miss event even though it has already been recorded as a TB hit event. However, the frequency of this scenario should normally be so low that the accuracy of the TB hit ratio statistics will not be affected.
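The way the two performance signals combine into a hit rate can be modeled in a few lines. This is a sketch only, written in Python: the actual counting is done by the central performance monitoring hardware in the Ebox (see Chapter 18), and the per-cycle sample format and the function name used here are assumptions made for illustration.

```python
def hit_rate(samples):
    """Compute hits / events from per-cycle (pmux0, pmux1) samples.

    pmux0 marks a cycle in which a qualifying read is first attempted;
    pmux1 carries the hit status for that same cycle. The tuple format
    is an assumption of this sketch, not a hardware interface.
    """
    events = 0
    hits = 0
    for pmux0, pmux1 in samples:
        if pmux0:
            events += 1
            # Hit status is only meaningful in cycles where pmux0 is set.
            if pmux1:
                hits += 1
    return (hits / events) if events else None


if __name__ == "__main__":
    trace = [(1, 1), (0, 1), (1, 0), (1, 1), (0, 0)]
    print(hit_rate(trace))
```

Cycles in which the first signal is not asserted are ignored entirely, which reflects the requirement that only the first execution attempt of a read is recorded.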
12.10.1.1 TB hit rate for P0/P1 I-stream Reads

In this mode, M%PMUX0_H is asserted during the cycle in which the IREF_LATCH first attempts to drive a virtual process space IREAD into the S5 pipe. Note that M%PMUX0_H is only asserted in response to IREAD execution events caused by Ibox-generated IREADs. This avoids recording Mbox-generated "fill forward" IREADs which would abnormally boost the TB hit rate. During the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the same IREAD execution event.

12.10.1.2 TB hit rate for P0/P1 D-stream Reads

In this mode, M%PMUX0_H is asserted during the cycle in which the SPEC_QUEUE, EM_LATCH, VAP_LATCH or MME_LATCH first attempts to drive a virtual process space read into the S5 pipe. During the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the same read execution event.

12.10.1.3 TB hit rate for S0 I-stream Reads

In this mode, M%PMUX0_H is asserted during the cycle in which the IREF_LATCH first attempts to drive a system space IREAD into the S5 pipe. Note that M%PMUX0_H is only asserted in response to IREAD execution events caused by Ibox-generated IREADs. This avoids recording Mbox-generated "fill forward" IREADs which would abnormally boost the TB hit rate. During the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the same IREAD execution event.

12.10.1.4 TB hit rate for S0 D-stream Reads

In this mode, M%PMUX0_H is asserted during the cycle in which the SPEC_QUEUE, EM_LATCH, VAP_LATCH or MME_LATCH first attempts to drive a virtual system space read into the S5 pipe. During the same cycle, M%PMUX1_H will transfer the TB hit status corresponding to the same read execution event.
12.10.2 Pcache hit rate Performance Monitor Modes

The Pcache hit rate modes work by asserting M%PMUX0_H during the cycle in which a specific type of S6 physical read reference is executed in the Pcache. During the same cycle, M%PMUX1_H will transfer the Pcache hit status corresponding to this read execution event.

12.10.2.1 Pcache hit rate for I-stream Reads

In this mode, M%PMUX0_H is asserted during the cycle in which an IREAD is executing in the S6 pipe. M%PMUX0_H is only asserted in response to IREAD execution events caused by Ibox-generated IREADs. This avoids recording Mbox-generated "fill forward" IREADs which would abnormally boost the Pcache hit rate. M%PMUX1_H will transfer the Pcache hit status corresponding to the same IREAD execution event during the cycle in which M%PMUX0_H is asserted.

12.10.2.2 Pcache hit rate for D-stream Reads

In this mode, M%PMUX0_H is asserted during the cycle in which a D-stream read is executing in the S6 pipe. M%PMUX0_H is only asserted in response to the first Pcache lookup attempt of a D-stream read executing in the S6 pipe. This avoids skewing the performance data based on the same reference being retried in the Pcache due to the "read under fill" function. Therefore, S6 reads originating from the RTY_DMISS_LATCH do not cause the assertion of M%PMUX0_H. M%PMUX1_H will transfer the Pcache hit status corresponding to the same read execution event during the cycle in which M%PMUX0_H is asserted.

12.10.3 Unaligned reference statistics

This mode allows the user to obtain the percentage of references processed by the Mbox which are unaligned. In this mode, M%PMUX0_H is asserted on any virtual read, virtual DEST_ADDR, or virtual WRITE reference driven from the SPEC_QUEUE or EM_LATCH. The reference must be virtual to be recorded due to the nature of the hardware implementation.
M%PMUX1_H is asserted on the same conditions as M%PMUX0_H, except that it is further qualified by the fact that the reference is unaligned.

12.11 Mbox Signal Name Cross-Reference

All signal names referenced in this chapter have appeared in bold and reflect the actual name appearing in the NVAX schematic set. For each signal appearing in this chapter, the table below lists the corresponding name which exists in the behavioral model.

Table 12-28: Cross-reference of all names appearing in the Mbox chapter

Schematic Name                Behavioral Model Name
B%S6_DATA_H<63:0>             B%S6_DATA_H<63:0>
C%CBOX_CMD_H<1:0>             C%CBOX_CMD_H<1:0>
C%CBOX_ADDR_H<31:5>           C%CBOX_ADDR_H<31:5>
C%MBOX_FILL_QW_H<4:3>         C%MBOX_FILL_QW_H<4:3>
C%REQ_DQW_H                   C%REQ_DQW_H
C%S6_DP_H<7:0>                C%S6_DP_H<7:0>
C%LAST_FILL_H                 C%LAST_FILL_H
C%CBOX_HARD_ERR_H             C%CBOX_HARD_ERR_H
C%CBOX_ECC_ERR_H              C%CBOX_ECC_ERR_H
C%WR_BUF_BACK_PRES_H          C%WR_BUF_BACK_PRES_H
E%EBOX_CMD_H<4:0>             E%EBOX_CMD_H<4:0>
E%VA_BUS_L<31:0>              E%VA_BUS_H<31:0>
E%WBUS_H<31:0>                E%WBUS_H<31:0>
E%EBOX_TAG_H<4:0>             E%EBOX_TAG_H<4:0>
E%EBOX_AT_H<1:0>              E%EBOX_AT_H<1:0>
E%EBOX_DL_H<1:0>              E%EBOX_DL_H<1:0>
E%EBOX_VIRT_ADDR_H            E%EBOX_VIRT_ADDR_H
E%MMGT_MODE_H<1:0>            E%MMGT_MODE_H<1:0>
E%CUR_MODE_H<1:0>             E%CUR_MODE_H<1:0>
E%EREF_REQ_H                  E%EREF_REQ_H
E%EM_ABORT_L                  E%EM_ABORT_H
E%FLUSH_MBOX_H                E%FLUSH_MBOX_H
E%FLUSH_PA_QUEUE_H            E%FLUSH_PA_QUEUE_H
E%START_IBOX_IO_RD_H          E%START_IBOX_IO_RD_H
E%RESTART_SPEC_QUEUE_H        E%RESTART_SPEC_QUEUE_H
E%NO_MME_CHECK_H              E%NO_MME_CHECK_H
I%IBOX_CMD_L<4,1:0>           I%IBOX_CMD_H<4:0>
I%IBOX_ADDR_H<31:0>           I%IBOX_ADDR_H<31:0>
I%IBOX_TAG_L<2:0>             I%IBOX_TAG_H<2:0>
I%IBOX_AT_L<1:0>              I%IBOX_AT_H<1:0>
Mbox chapter Schematic Name Behavioral Model Name I~IBO~IJ.<1:0> I'YPB<Ul.BQ_B I%IBOX_DL_H<1:0> I%IBOX_REF_DEST_H<1:0> I%IREF_RE'LH I%SPEC_RE'LH I~roBCE...MMB-"AULTJI I%FORCE_~_FAULT_H I~IBOx:..;aBFJ)B8TJ.< 1:0> I~IB.D'.JlEQ..lI ~RT_CBO~.JI I%FORCE_HARD_FAULT_H I%FLUSH_IREF_LAT_H M%ABORT_CBOX_IRD_H M.c_S8_PA....B<2:O> M%C_S6_P~H<2:0> M'lCBOIJIYPASS_BNABLJUI M%CBOX_BYPASS_ENABLE_H M%CBOX_LATE_EN_H M%CBOX_REF_ENABLE_H M%EBOX_DATA_H M%EM_LAT_FULL_H M%HARD_ERR_H M%IBOx....DATA_H M%IBOX_IPR_WR_H M%LAST_FILL_H M%MBOX_S_ERROR_H M%MD_BUS_H<63:0> M%MD_BUS_QW_PARITY_H M%MD_TAG_H<4:O> M%MME_FAULT_H I~roBCE..BAB.D_FAULTJI I~FLUSBJBBlI'_LAT_B M'ICBO:x..I..ATE...,_B M'ICBOlUIBFJl:NABLB..L MCMmox..DATA...B ~JI'VLIJI ~_BB.B..B WSBOXJ)ATA....L WQBOXJPB._WRJI ~.,;FD:.IJI M'QIBO:x..S_II:RBORJI MIQID_BUS_B<63:O> M...m_BUS_QW..PABl"IT_L ~_TAG_B<4:O> ~-"AVLT_B ~A...Q...STATV8.J1<2:0> I8PJrroXDJI M~PMUXlJJ M~w_ALIGNMBNTJI<1:0> ~_Q...ruLL...B MCJWM_BTrE...MASI...B<7 :0> MCJWM_CMD_B<4:0> MH8_pA..ll<31:0> I8V1CJ)A'n\..L M:..QtJBY5..ATJI<l:O> M:..QtJBY5_CMDJJ<4:0> M:..QtJBY5J)ATAJI<31:0> M:..QtJBY5_DEBT_B<l:D> 12-130 The Mbox M%PA_'LSTATUS_H<2:0> M%PMUXO_H M%PMUX1_H M%QW_ALIGNMENT_H<1:0> M%SPEC_'LFULL_H M%S6_BYTE_MASK_H<7:0> M%S6_C:MD_H<4:0> M%S6_PA_H<31:0> M%VIC_DATA_H M..QUE%S5_AT_H<1:0> M_QUE%S5_CMD_H<4:0> M_QUE%S5_DATA_H<31:0> M_QUE%S5_DEST_H<1:0> DIGITAL CONFIDENTlAl NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 12-28 (Cont.): Cross-reference of all names appearing In the Mbox chapter Schematic Name Behavioral Model Name M...QUEU&J)LJI<1:0> M_QUE%S5_DL_H<1:0> M...QUEU&_PAJI<31:0> M_QUE%S5_PADP_H<31:0> M...QUEU5_QUAL_B<6:0> M_QUE%S5_QUAL_H<6:O> M...QUEU5_TAG_B<4:0> M_QUE%S5_TAG_H<4:0> M...QUEU&_vA.,.B<31:0> M_QUE%S5_VA_H<31:0> M_S5C%ABORT_H DIGITAL CONFIDENTIAL The Mbox 12-131 NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 12.12 Revision History Who When Description of change Mike Uhler 12-Sep-I991 Correct TB selections in the PMM field of 
                              PCCTL
Bill Wheeler    26-Jul-1991   correct an inconsistency in spec
Bill Wheeler    21-May-1991   add an mbox ucode restriction
Bill Wheeler    26-Apr-1991   final tweaks
Bill Wheeler    25-Feb-1991   tweaked description of ucode biasing of mm regs
Bill Wheeler    22-Feb-1991   described ucode biasing of mm regs
Bill Wheeler    20-Feb-1991   Changed text to reflect expanded S0 space configuration
Bill Wheeler    20-Sep-1990   Other tweaks; add signal ref table
Bill Wheeler    8-May-1990    Other tweaks
Bill Wheeler    27-Feb-1990   Add perf monitor hardware. Other tweaks
Bill Wheeler    15-Jan-1990   Signal name change
Bill Wheeler    20-Nov-1989   Final Changes prior to review for Rev 1.0 Release
Bill Wheeler    23-Aug-1989   More Updates
Bill Wheeler    31-Jul-1989   Spec Update
Bill Wheeler    06-Mar-1989   For External Release
Bill Wheeler    30-Nov-1988   Initial Release

Chapter 13 The Cbox

13.1 Terminology

Term                            Meaning

Error transition mode (ETM)
    Mode where the backup cache only services CPU requests to blocks which are valid-owned. All other CPU requests, including those to valid-unowned blocks, are ignored by the backup cache and are forwarded to memory. The purpose is to use the cache as little as possible because of previously detected errors.

Cache coherence transaction
    A transaction from the external system which interrogates the backup cache and may cause a block invalidate and/or a block writeback.

Deallocate
    The actions necessary to allocate a new block because of a read miss or a write miss. A writeback is required if the block is valid-owned. An invalidate is required if the block is valid, whether owned or unowned. A cache coherency request which results in a hit also causes a deallocate.

Longword
    4 bytes of data

Quadword
    8 bytes of data

Hexaword
    32 bytes of data

13.2 Functional Overview of the Cbox and Backup cache

The Cbox is that section of the NVAX CPU chip which controls the backup cache and interfaces to the external bus. The Cbox includes the BIU functions for the NVAX CPU.
The backup cache is a writeback cache. Cache tags and cache data are stored in off-chip static RAMs (off-the-shelf parts). The Cbox implements the control for the cache tags; control for the cache data; and control for the external pin bus, the NDAL.

The Mbox sends read requests and writes to the Cbox; the Cbox sends fills and invalidates to the Mbox. The Cbox ensures that the Pcache is a subset of the backup cache through invalidates.

The Cbox communicates with the memory subsystem (everything beyond the backup cache) via the NDAL. The Cbox generates reads and receives fills; it receives cache coherence transactions from the NDAL to which it responds with invalidates and writebacks, as appropriate. The reader is assumed to be familiar with Chapter 3, which describes the NDAL.

DIGITAL CONFIDENTIAL The Cbox 13-1 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991

Cache coherence in an NVAX system is based upon the concept of ownership. A hexaword block of memory may be owned either by memory or by an NVAX backup cache. In a multiprocessor system, only one of the caches or memory can own the block at a time. Several of the planned NVAX systems implement an explicit ownership bit for each hexaword block of memory; it would also be possible to build an NVAX system without explicit ownership bits in memory.

13.2.1 The Cbox and the System

The Cbox has a tightly coupled internal interface with the Mbox. It has separate external busses which communicate with the backup cache tag RAMs, the backup cache data RAMs, and the memory interface, as shown in Figure 13-1.

Figure 13-1: The Cbox in the System
[Block diagram. On-chip: MBOX, PCACHE, CBOX/BIU. Off-chip: TAG RAMS and DATA RAMS of the BACKUP CACHE, and the NDAL to the MEMORY INTERFACE, which connects to the SYSTEM BUS, MEMORY AND I/O.]

13.2.2 Writeback Cache and Ownership Concepts

There is one fundamental difference between a writeback cache and a writethrough cache. When a write is received by a writethrough cache, the data may be written into the cache and is always written to memory as well. When a write is received by a writeback cache, the write is not necessarily forwarded to memory; the write may be done only into the cache. The data is written back to memory only if another element in the system needs that data, or if the block is displaced (deallocated) from the cache.

The NVAX backup cache is a writeback design in which a cache block may exist in one of three states: invalid, valid-unowned, and valid-owned. A block which is valid-unowned is a read-only copy of memory data. A block which is valid-owned may be written by NVAX, and if it has been written since being put into the cache, is the only up-to-date copy of the data in the system. The NVAX cache makes no distinction between valid-owned blocks it has written and those which it has not written.

A valid-unowned copy of a given cache block may reside in one or more backup caches in an NVAX multiprocessor system. No NVAX backup cache may contain a valid cache block which is valid-owned by another backup cache in the system.

The Cbox design relies upon the system bus and/or the system bus interface to support NDAL Ownership Read/Disown Write pairs to ensure cache coherency. The most straightforward way to implement a memory for NVAX is to have an ownership bit associated with each hexaword of data. When this memory receives an Ownership Read (OREAD) for a hexaword, ownership is passed to the requesting CPU, and the data is returned to the CPU.
If another Ownership Read arrives for that hexaword from a second CPU, memory does not return the data since the hexaword is not owned by memory but by the first CPU. The first CPU recognizes the second OREAD as a cache coherence transaction and writes back the data from its cache, using the Disown Write command. The data is then available for the second CPU.

During normal operation, the Cbox issues an OREAD to the memory interface and receives ownership of the block before it performs a write to that block in the backup cache. The Cbox relinquishes ownership of the data when a cache coherence transaction requesting a writeback appears on the NDAL.

13.2.3 Backup cache Operating Modes

The backup cache has four distinct modes of operation.

• Cache ON. Normal operation. Most of this chapter describes Cbox operation when the backup cache is on.
• Cache OFF. Reset puts the backup cache into the OFF state. The backup cache may be enabled/disabled (turned ON/OFF) by software through the Cbox control IPR. Cache off mode is described in Section 13.9.1.
• Force Hit. The Cbox forces all memory space reads and writes to hit in the backup cache. This mode is used for testing and initialization purposes. Force Hit mode is described in Section 13.9.2.
• Error Transition Mode. The Cbox enters Error Transition Mode upon recognition of some error conditions or when put into ETM explicitly by an IPR write. Error Transition Mode is described in Section 13.9.3.

13.3 NVAX Backup Cache Organization and Interface

The backup cache is configurable based on the size and speed of the cache RAMs used to implement the cache on the board. The backup cache may be configured to be one of four sizes: 128 kilobytes, 256 kilobytes, 512 kilobytes, or 2 megabytes. This is controlled by the SIZE field in the CCTL register, as described in Section 13.5.1.
The smallest RAMs which may be used to achieve each configuration are shown in Table 13-1.

Table 13-1: Backup Cache Size and RAMs Used

Cache size      Tag RAM Size   Data RAM Size   Number of Tags   Valid Bits Per Tag
128 Kilobytes   4Kx4           16Kx4           4K               1
256 Kilobytes   8Kx8 (1)       32Kx8 (1)       8K               1
512 Kilobytes   16Kx4          64Kx4           16K              1
2 Megabytes     64Kx4          256Kx4          64K              1

(1) Using x8 parts means the cache no longer takes advantage of the nibble protection feature of the cache ECC design.

Regardless of configuration, the cache has a block size of 32 bytes and has no subblocks. The data bus to the cache is 8 bytes wide, so in order to read out an entire block, 4 accesses are done. Each block contains 32 bytes of data and has associated with it a tag, a valid bit, and an owned bit. ECC protection is provided on each quadword in the cache. ECC protection is also provided on the tag store.

Each of address bits <20:17> serves either as an index bit or as a tag bit, based on the cache size configured. Table 13-2 shows how the bits are used.

Table 13-2: Tag and Index Interpretation based on cache size

Cache size      Tag bits used   Index bits used
128 kilobytes   Tag<31:17>      Index<16:5>
256 kilobytes   Tag<31:18>      Index<17:5>
512 kilobytes   Tag<31:19>      Index<18:5>
2 megabytes     Tag<31:21>      Index<20:5>

The backup cache speed may also be configured based on the access time of the RAMs used to implement the tag store and the data store. The TAG_SPEED and DATA_SPEED fields of the Cbox control register, CCTL, are used to control the number of NVAX cycles used by the Cbox to access the RAMs. The relationship between TAG_SPEED, DATA_SPEED, NVAX cycle time, and the cache RAM access times required is shown in Table 13-3.

NOTE
Table 13-3 is based upon simulations of the XNP (XMI-based system) board. These numbers may only be applied directly to an environment which is very close to that of the XNP.
Table 13-3: Backup Cache RAM Speeds and NVAX Cycle Time

Tag RAM
NVAX cycle time   CCTL TAG_SPEED   Tag RAM access time   Tag read (access) rep rate   Tag write rep rate
16 ns             0                0 - 21 ns             (2) 3 cycles                 3 cycles
16 ns             1 (1)            22 - 37 ns            (3) 4 cycles                 4 cycles
14 ns             0                0 - 17.5 ns           (2) 3 cycles                 3 cycles
14 ns             1 (1)            18 - 31.5 ns          (3) 4 cycles                 4 cycles
12 ns             0                0 - 14 ns             (2) 3 cycles                 3 cycles
12 ns             1 (1)            15 - 26 ns            (3) 4 cycles                 4 cycles
10 ns             0                0 - 10.5 ns           (2) 3 cycles                 3 cycles
10 ns             1 (1)            11 - 20.5 ns          (3) 4 cycles                 4 cycles

Data RAM
NVAX cycle time   CCTL DATA_SPEED   Data RAM access time   Data read rep rate   Data write rep rate
16 ns             00 (1)            0 - 19.5 ns            2 cycles             3 cycles
16 ns             01                20 - 35.5 ns           3 cycles             4 cycles
16 ns             10                36 - 51.5 ns           4 cycles             5 cycles
14 ns             00 (1)            0 - 16 ns              2 cycles             3 cycles
14 ns             01                17 - 30 ns             3 cycles             4 cycles
14 ns             10                31 - 44 ns             4 cycles             5 cycles
12 ns             00 (1)            0 - 13 ns              2 cycles             3 cycles
12 ns             01                14 - 25 ns             3 cycles             4 cycles
12 ns             10                26 - 37 ns             4 cycles             5 cycles
10 ns             00 (1)            0 - 9.5 ns             2 cycles             3 cycles
10 ns             01                10 - 19.5 ns           3 cycles             4 cycles
10 ns             10                20 - 29.5 ns           4 cycles             5 cycles

(1) TAG_SPEED=1 cannot be used with DATA_SPEED=00, as the NVAX Cbox cannot function with tag RAMs whose read access time is longer than the data RAM read access time.

Extensive simulations of the NVAX chip, package, and XNP board were done in order to determine the drive times of the cache pins in this environment. The drive times are measured from the internal NVAX clock to the signal being valid at the cache pin. The drive times for TT (typical speed) parts under worst-case conditions are shown in Table 13-4. These drive times would be met under worst-case conditions in the 14 ns system. These drive times only apply to the XNP board, and cache drive times and performance would be different in a different environment.
Table 13-4: Cache pin drive times in the XNP environment

NVAX Cache Interface Pin                                                Starting clock                                                              Time to signal valid at cache RAM
P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, P%TS_OWNED_H, P%TS_VALID_H                                                                                      8.5 ns
P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, P%TS_OWNED_H, P%TS_VALID_H                                                                                      1.5 ns (tristate time)
P%TS_INDEX_H<10:5>                                                      E...PA.DCH'BI..S_H                                                          8.0 ns
P%TS_INDEX_H<20:11>                                                     E...PADL4IIiIPBI_S_H                                                        8.0 ns
P%TS_OE_L                                                               E...PADLVIII_lJI (assertion), E...PADL4IIiIPBI_"_H (deassertion)            8.0 ns
P%TS_WE_L                                                               E...PADIJIi.PHI_S_H (assertion), E...PADL4IIiIPBI_lJI (deassertion)         8.0 ns
P%DR_INDEX_H<20:3>                                                      E...PADLVHI_S_H                                                             8.0 ns
P%DR_OE_L                                                               E...PADLfii>PHI_l_H (assertion), E...PADLfii>PBC,,-H (deassertion)          8.0 ns
P%DR_WE_L                                                               E...PADLVHI_S_H (assertion), E...PADLCiCPHI_l_H (deassertion)               8.0 ns
P%DR_DATA_H<63:0>, P%DR_ECC_H<7:0>                                      E..PADLfii>PHI_"J                                                           8.5 ns
P%DR_DATA_H<63:0>, P%DR_ECC_H<7:0>                                      E.,PADLfii>PHI_"_H                                                          1.5 ns (tristate time)

Figure 13-2 and Figure 13-3 show the timing of cache tag transactions and of cache data transactions. The symbols shown in the timing diagrams are defined in Table 13-5.

Table 13-5: Cache pin timing symbol definitions

Symbol   Meaning
Taa      RAM address access time: valid index to RAM output valid
Toe      Assertion of output enable to RAM output valid
Toh      RAM output hold from address change
Tohz     Output disable to RAM output in high Z
Taw      Valid index to end of RAM write
Tdw      Data valid to end of RAM write
Tnz      NVAX tristate time
Twr      Write enable deassert to address change (write recovery)
Tdh      NVAX data hold time after write enable deassert
Twp      Write enable pulse width
Tas      RAM address setup time to write enable assertion

Figure 13-2: NVAX Backup Cache Tag RAM Pad Timing
[Timing diagram. 14 ns NVAX cycle time, 3.5 ns per phase; all phases are relative to the NVAX internal clocks. Panels: tag RAM read followed by another read; tag RAM quadword write followed by read. Signals shown: P%TS_INDEX_H<20:5>, P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, P%TS_OWNED_H, P%TS_VALID_H, P%TS_OE_L, P%TS_WE_L.]

Figure 13-3: NVAX Backup Cache Data RAM Pad Timing
[Timing diagram. 3.5 ns per phase; all phases are relative to the NVAX internal clocks. Panels: data RAM read; data RAM read-modify write; data RAM quadword write followed by read. Signals shown: P%DR_INDEX_H<20:3>, P%DR_DATA_H<63:0>, P%DR_ECC_H<7:0>, P%DR_OE_L.]

13.3.1 Backup Cache Interface

This section describes the NVAX pins dedicated to the backup cache interface. These are listed in Table 13-6.

Table 13-6: NVAX Backup Cache Interface Pins

Signal                Number   Input/output   Type

BACKUP CACHE TAG STORE SIGNALS (41 total)
P%TS_INDEX_H<20:5>    16       Output         One driver, six receivers
P%TS_OE_L             1        Output         One driver, six receivers
P%TS_WE_L             1        Output         One driver, six receivers
P%TS_TAG_H<31:17>     15       Input/Output   Tristate, seven drivers/receivers
P%TS_ECC_H<5:0>       6        Input/Output   Tristate, seven drivers/receivers
P%TS_OWNED_H          1        Input/Output   Tristate, seven drivers/receivers
P%TS_VALID_H          1        Input/Output   Tristate, seven drivers/receivers

BACKUP CACHE DATA RAM SIGNALS (92 total)
P%DR_INDEX_H<20:3>    18       Output         One driver, eighteen receivers
P%DR_OE_L             1        Output         One driver, eighteen receivers
P%DR_WE_L             1        Output         One driver, eighteen receivers
P%DR_DATA_H<63:0>     64       Input/Output   Tristate, nineteen drivers/receivers
P%DR_ECC_H<7:0>       8        Input/Output   Tristate, nineteen drivers/receivers

The pins listed are described in the sections which follow.

13.3.1.1 P%TS_INDEX_H<20:5>

These pins drive the address lines of the tag RAMs, thus indexing into one row of the tag store.
The value driven depends upon the corresponding bits in the address of the memory or IPR reference being done. P%TS_INDEX_H<16:5> are used for every cache configuration. P%TS_INDEX_H<20:17> are used based on the cache size selected. When the cache size selected is smaller than 2 megabytes, some or all of these four bits are driven to 0 rather than to the value given in the address. This is shown in Table 13-7.

Table 13-7: Usage of P%TS_INDEX_H<20:5> based on cache size

Cache size      P%TS_INDEX_H bits driven unconditionally to 0   P%TS_INDEX_H bits used
128 kilobytes   P%TS_INDEX_H<20:17>                             P%TS_INDEX_H<16:5>
256 kilobytes   P%TS_INDEX_H<20:18>                             P%TS_INDEX_H<17:5>
512 kilobytes   P%TS_INDEX_H<20:19>                             P%TS_INDEX_H<18:5>
2 megabytes     None                                            P%TS_INDEX_H<20:5>

P%TS_INDEX_H<20:5> are driven by NVAX and received by up to 6 RAM chips.

13.3.1.2 P%TS_OE_L

P%TS_OE_L (Tag Store Output Enable) is an output pin which controls the tag store RAMs. It enables the RAMs to drive their outputs. It is asserted (driven low) when the tag store is being read, and allows the tag store to drive P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, P%TS_OWNED_H and P%TS_VALID_H. When the tag store is being written, P%TS_OE_L is deasserted (driven high). P%TS_OE_L is driven by NVAX and received by up to 6 RAM chips.

13.3.1.3 P%TS_WE_L

P%TS_WE_L (Tag Store Write Enable) is an output pin which, when asserted, enables the tag store RAMs to be written. It is asserted (driven low) during writes of the tag store. P%TS_WE_L is driven by NVAX and received by up to 6 RAM chips.

13.3.1.4 P%TS_TAG_H<31:17>

P%TS_TAG_H<31:17> are I/O pins which are used to transfer the cache tag to and from the tag store RAMs. When the tag store is being written, P%TS_TAG_H<31:17> are used as outputs; when the tag store is being read, P%TS_TAG_H<31:17> are used as inputs.
Some of the tag lines are not used when the cache is bigger than 128 kilobytes, as shown in Table 13-8. When this is the case, the board designer does not need to connect the pin at all on the board. The pin is pulled low through a resistor in the pad so that internal to the Cbox, the unused tag lines are recognized as zeros when the tag is read.

Table 13-8: Usage of P%TS_TAG_H<20:17> based on cache size

Cache size      P%TS_TAG_H bits not used
128 kilobytes   None
256 kilobytes   P%TS_TAG_H<17>
512 kilobytes   P%TS_TAG_H<18:17>
2 megabytes     P%TS_TAG_H<20:17>

All of the P%TS_TAG_H<31:17> pads are built with internal resistors, for chip layout consistency. Each P%TS_TAG_H pin is connected to one RAM I/O pin.

A system designer who intends to run NVAX only in 30-bit mode can leave P%TS_TAG_H<31:29> unconnected, and they will be pulled low internally so that the Cbox sees a zero value.

13.3.1.5 P%TS_ECC_H<5:0>

P%TS_ECC_H<5:0> are I/O pins which are used to transfer the ECC check bits to and from the tag store RAMs. When the tag store is being written, P%TS_ECC_H<5:0> are used as outputs; when the tag store is being read, P%TS_ECC_H<5:0> are used as inputs. Each P%TS_ECC_H pin is connected to one RAM I/O pin.

13.3.1.6 P%TS_OWNED_H

P%TS_OWNED_H is an I/O pin which is used to transfer the ownership bit to and from the tag store RAMs. When the tag store is being written, P%TS_OWNED_H is used as an output; when the tag store is being read, P%TS_OWNED_H is used as an input. P%TS_OWNED_H is connected to one RAM I/O pin.

13.3.1.7 P%TS_VALID_H

P%TS_VALID_H is an I/O pin which is used to transfer the valid bit to and from the tag store RAMs. When the tag store is being written, P%TS_VALID_H is used as an output; when the tag store is being read, P%TS_VALID_H is used as an input. P%TS_VALID_H is connected to one RAM I/O pin.
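The way the tag store index and tag pins depend on the configured cache size (Tables 13-2, 13-7, and 13-8) can be illustrated with a small software model. This is only a sketch and is not part of the chip design: the names INDEX_TOP_BIT, field, and tag_store_pins are hypothetical, and the model assumes that the pin values are simply the address bits listed in those tables, with every unused bit read as 0.

```python
# Illustrative model only; all names here are hypothetical.
# Bit ranges follow Tables 13-2, 13-7, and 13-8.

KB = 1024
MB = 1024 * KB

# Highest address bit used as an index bit for each supported cache size.
INDEX_TOP_BIT = {128 * KB: 16, 256 * KB: 17, 512 * KB: 18, 2 * MB: 20}


def field(value, hi, lo):
    """Return bits <hi:lo> of value, right-justified."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def tag_store_pins(addr, cache_size):
    """Return (P%TS_INDEX_H<20:5>, P%TS_TAG_H<31:17>) as integers."""
    top = INDEX_TOP_BIT[cache_size]
    # Index bits above `top` are driven unconditionally to 0 (Table 13-7),
    # so the 16-bit pin value is just address bits <top:5>.
    index_pins = field(addr, top, 5)
    # Tag pins at or below `top` are unused and read as 0 (Table 13-8);
    # the used tag bits stay in their original pin positions.
    tag_pins = field(addr, 31, top + 1) << (top + 1 - 17)
    return index_pins, tag_pins


if __name__ == "__main__":
    for size in sorted(INDEX_TOP_BIT):
        index_pins, tag_pins = tag_store_pins(0xFFFFFFFF, size)
        print(f"{size // KB:5d} KB  index={index_pins:#06x}  tag={tag_pins:#06x}")
```

Running the model with an all-ones address shows, for each cache size, which pin positions can carry address information and which are always zero.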
13.3.1.8 P%DR_INDEX_H<20:3>

These pins drive the address lines of the data RAMs, thus indexing into one row of the data store. The value driven depends upon the corresponding bits in the address of the memory reference being done. P%DR_INDEX_H<16:5> are used for every cache configuration. P%DR_INDEX_H<20:17> are used based on the cache size selected. When the cache size selected is smaller than 2 megabytes, some or all of these four bits are driven to 0 rather than to the value given in the address. This is shown in Table 13-9.

Table 13-9: Usage of P%DR_INDEX_H<20:5> based on cache size

Cache size      P%DR_INDEX_H bits driven unconditionally to 0   P%DR_INDEX_H bits used
128 kilobytes   P%DR_INDEX_H<20:17>                             P%DR_INDEX_H<16:5>
256 kilobytes   P%DR_INDEX_H<20:18>                             P%DR_INDEX_H<17:5>
512 kilobytes   P%DR_INDEX_H<20:19>                             P%DR_INDEX_H<18:5>
2 megabytes     None                                            P%DR_INDEX_H<20:5>

P%DR_INDEX_H<16:5> are driven by NVAX and received by 18 RAM chips.

13.3.1.9 P%DR_OE_L

P%DR_OE_L (Data RAM Output Enable) is an output pin which controls the data RAMs. It enables the RAMs to drive their outputs. It is asserted (driven low) when the data RAMs are being read, and allows the data RAMs to drive P%DR_DATA_H<63:0> and P%DR_ECC_H<7:0>. When the data RAMs are being written, P%DR_OE_L is deasserted (driven high). P%DR_OE_L is driven by NVAX and received by 18 RAM chips.

13.3.1.10 P%DR_WE_L

P%DR_WE_L (Data RAM Write Enable) is an output pin which, when asserted, enables the data RAMs to be written. It is asserted (driven low) during writes of the data RAMs. P%DR_WE_L is driven by NVAX and received by 18 RAM chips.

13.3.1.11 P%DR_DATA_H<63:0>

P%DR_DATA_H<63:0> are I/O pins which are used to transfer the cache data to and from the data RAMs.
When the data RAMs are being written, P%DR_DATA_H<63:0> are used as outputs; when the data RAMs are being read, P%DR_DATA_H<63:0> are used as inputs. Each one of P%DR_DATA_H<63:0> is connected to one RAM I/O pin.

13.3.1.12 P%DR_ECC_H<7:0>

P%DR_ECC_H<7:0> are I/O pins which are used to transfer the data ECC to and from the data store. When the data store is being written, P%DR_ECC_H<7:0> are used as outputs; when the data store is being read, P%DR_ECC_H<7:0> are used as inputs. Each one of P%DR_ECC_H<7:0> is connected to one RAM I/O pin.

13.3.2 Backup Cache Block Diagrams

Figure 13-4 and Figure 13-5 show the connections to the tag store and data RAMs and the way the address is used for the 128-kilobyte cache.

Figure 13-4: Tags and Data for 128-Kilobyte Cache
[Block diagram. Tag store: 6 parts, 4K x 4, connected to P%TS_OE_L, P%TS_WE_L, the valid bit, and P%TS_ECC_H<5:0>. Data RAMs: 18 parts, 16K x 4, connected to P%DR_INDEX_H<16:3>, P%DR_OE_L, P%DR_WE_L, and P%DR_DATA_H<63:0>.]

Figure 13-5: Address as used for 128-Kilobyte Cache
[Address layout, bits 31 to 00. Tag: 15 bits. Data and tag store index: 12 bits. The next bits are used to address the data quadword within the hexaword and are unused for the tag store. The remaining low-order bits are unused.]

Figure 13-6 and Figure 13-7 show the connections to the tag store and data RAMs and the way the address is used for the 256-kilobyte cache.
Figure 13-6: Tags and Data for 256-Kilobyte Cache
[Block diagram. Tag store: 3 parts, 8K x 8, connected to P%TS_OE_L, P%TS_WE_L, the valid bit, and P%TS_ECC_H<5:0>. Data RAMs: 9 parts, 32K x 8, connected to P%DR_INDEX_H<17:3>, P%DR_OE_L, P%DR_WE_L, and P%DR_DATA_H<63:0>.]

Figure 13-7: Address as used for 256-Kilobyte Cache
[Address layout, bits 31 to 00. Tag: 14 bits. Data and tag store index: 13 bits. The next bits are used to address the data quadword within the hexaword and are unused for the tag store. The remaining low-order bits are unused.]

Figure 13-8 and Figure 13-9 show the connections to the tag store and data RAMs and the way the address is used for the 512-kilobyte cache.

Figure 13-8: Tags and Data for 512-Kilobyte Cache
[Block diagram. Tag store: 6 parts, 16K x 4, connected to P%TS_INDEX_H<18:5>, P%TS_OE_L, P%TS_WE_L, the valid bit, and P%TS_ECC_H<5:0>. Data RAMs: 18 parts, 64K x 4, connected to P%DR_INDEX_H<18:3>, P%DR_OE_L, P%DR_WE_L, and P%DR_DATA_H<63:0>.]

Figure 13-9: Address as used for 512-Kilobyte Cache
[Address layout, bits 31 to 00. Tag: 13 bits. Data and tag store index: 14 bits. The next bits are used to address the data quadword within the hexaword and are unused for the tag store. The remaining low-order bits are unused.]

Figure 13-10 and Figure 13-11 show the connections to the tags and data RAMs and the way the address is used for the 2-megabyte cache.
Figure 13-10: Tags and Data for 2-Megabyte Cache
[Block diagram. Tag store: 5 parts, 64K x 4, connected to P%TS_INDEX_H<20:5>, P%TS_OE_L, and P%TS_WE_L. Data RAMs: 18 parts, 256K x 4, connected to P%DR_INDEX_H<20:3>, P%DR_OE_L, P%DR_WE_L, and P%DR_DATA_H<63:0>.]

Figure 13-11: Address as used for 2-Megabyte Cache
[Address layout, bits 31 to 00. Tag: 11 bits. Data and tag store index: 16 bits. The next bits are used to address the data quadword within the hexaword and are unused for the tag store. The remaining low-order bits are unused.]

13.4 The Cbox Datapath

The Cbox includes datapath and control for interfacing to the Mbox, the cache RAMs, and to the NDAL. The portion of the Cbox which primarily interfaces to the Mbox and the cache RAMs will be referred to here as the Cbox proper, while the portion of the Cbox which primarily interfaces to the NDAL will be referred to as the BIU.

The Cbox datapath is organized around a number of queues and latches, an address bus and a data bus in the Cbox proper, and an address bus and a data bus in the BIU. Separate access is provided to the tag store and the data RAMs.

Table 13-10 lists the Cbox queues and the major latches. Each is covered in more detail later in the section. The IPRs are not covered here, as they are covered in Section 13.5.

Table 13-10: Cbox Queues and Major Latches

Queue/Latch: CM_OUT_LATCH
    Entries: 1
    Address/Data: Address<31:3> and data<63:0>
    Function: Holds fill data or an invalidate address being sent to the Mbox.

Queue/Latch: FILL_DATA_PIPEs
    Entries: 2
    Address/Data: Data<63:0>
    Function: Pipeline data destined for the Mbox.

Queue/Latch: DREAD_LATCH
    Entries: 1
    Address/Data: Address<31:0>
    Function: Holds a data-stream read request from the Mbox.

Queue/Latch: IREAD_LATCH
    Entries: 1
    Address/Data: Address<31:0>
    Function: Holds an instruction-stream read request from the Mbox.

Queue/Latch: WRITE_PACKER
    Entries: 1
    Address/Data: Address<31:0> and data<63:0>
    Function: Compresses sequential memory writes to the same quadword.

Queue/Latch: WRITE_QUEUE
    Entries: 8
    Address/Data: Address<31:0> and data<63:0>
    Function: Queues write requests from the Mbox.

Queue/Latch: FILL_CAM
    Entries: 2
    Address/Data: Address<31:3>
    Function: Holds addresses for read or write misses which have resulted in a read to memory; one may hold the address of an in-progress DREAD_LOCK which has no memory request outstanding.

Queue/Latch: NDAL_IN_QUEUE
    Entries:
    Address/Data: Address<31:5> or data<63:0>
    Function: Holds up to 8 quadword fills and up to 2 coherence transactions from the NDAL.

Queue/Latch: WRITEBACK_QUEUE
    Entries: 2
    Address/Data: Address<31:3> and data<63:0> times 4
    Function: Holds writeback addresses and data to be driven on the NDAL. The queue holds up to 2 hexaword writebacks. It is also used for quadword WDISOWNs.

Queue/Latch: NON_WRITEBACK_QUEUE
    Entries:
    Address/Data: Address<31:0> and data<63:0>
    Function: The NON_WRITEBACK_QUEUE holds all non-WDISOWN transactions destined for the NDAL. This includes reads, I/O space transactions, and normal writes which are done when the cache is off or in ETM.

It can be seen from Table 13-10 that some of the queues contain address and data entries in parallel (CM_OUT_LATCH, WRITE_PACKER, WRITE_QUEUE, WRITEBACK_QUEUE, NON_WRITEBACK_QUEUE), some contain either addresses or data (NDAL_IN_QUEUE), some contain only data (FILL_DATA_PIPE), and some contain only addresses (DREAD_LATCH, IREAD_LATCH, FILL_CAM).

The Cbox is organized around an address datapath and a data datapath. A block diagram of the data datapath is given in Figure 13-12, and a block diagram of the address datapath is given in Figure 13-13.

There are five major busses in the Cbox: C_BUS%DBUS_H<63:0>, C_BUS%BIU_DATA_H<63:0>, C_ADC%ABUS_H<31:0>, C_ADC%BIU_ADDR_OUT_H<31:0> and C_BIU%ADC_ADDR_IN_H<31:0>. The first two transfer data, and the last three transfer addresses.
From the block diagrams, it can be seen which of the latches and queues are connected to which busses. Transfers between address and data are connected only through the Abus/Dbus Xfer block, which is in the BIU.

The data flows may be understood by examining Figure 13-12. Write data enters the Cbox through the WRITE_QUEUE and is written into the data RAMs. When a writeback of a block occurs, data is read out of the data RAMs, transferred to the WRITEBACK_QUEUE in the BIU, and driven onto the NDAL. When read data is read from the backup cache, it is sent to the Mbox through the CM_OUT_LATCH. When read data returns from memory, it enters the Cbox through the NDAL_IN_QUEUE, is driven across C_BUS%BIU_DATA_H<63:0> to C_BUS%DBUS_H<63:0> and into the data RAMs, as well as to the Mbox through the CM_OUT_LATCH. When the Bcache is off, write data is sent from the WRITE_QUEUE directly to the NON_WRITEBACK_QUEUE and to memory, bypassing the cache entirely.

The last data flow of significance has to do with the reading and writing of IPRs. The Dbus IPRs and the NDAL IPRs are read and written directly from the data datapath.

The address flows may be understood by examining Figure 13-13. Address bits <31:3> are used for memory space reads and writes, which always address a quadword boundary. Address bits <31:0> are used for I/O space reads and writes, which may address individual bytes. Read addresses arrive through the IREAD_LATCH and the DREAD_LATCH, and write addresses arrive via the WRITE_QUEUE. Each address is driven across C_ADC%ABUS_H<31:0> to the tag RAMs, where it is looked up so that hit may be calculated. The index portion of the address is also driven to the data RAMs in case of a hit. If a read or a write results in a hit, the data is sent back to the Mbox via the CM_OUT_LATCH. The requested quadword is always sent first on a Bcache hit.
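As an illustration of how an address is used to index the tag RAMs and data RAMs, the following Python sketch splits a physical address into the fields implied by Figure 13-10 and Figure 13-11 for the 2-Megabyte configuration. It is only a software model under stated assumptions: the function name and dictionary keys are hypothetical, and the bit boundaries (11-bit tag, 16-bit tag store index <20:5>, data RAM index <20:3>) are taken from the two figures.

```python
def split_address_2mb(pa):
    """Split a 32-bit physical address for the 2-Megabyte cache.

    Hypothetical helper; field boundaries follow Figures 13-10 and 13-11.
    """
    pa &= 0xFFFFFFFF
    return {
        "tag": (pa >> 21) & 0x7FF,        # 11 tag bits, <31:21>
        "ts_index": (pa >> 5) & 0xFFFF,   # tag store index, <20:5> (one entry per hexaword)
        "dr_index": (pa >> 3) & 0x3FFFF,  # data RAM index, <20:3> (one row per quadword)
        "fill_qw": (pa >> 3) & 0x3,       # quadword within the hexaword, <4:3>
    }


fields = split_address_2mb(0x12345678)
# The data RAM index is the tag store index extended by the quadword select.
assert fields["dr_index"] == (fields["ts_index"] << 2) | fields["fill_qw"]
```

The data RAM index is simply the tag store index with the two quadword-select bits appended, which is why the tag store ignores bits <4:3>.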
Bits <4:3> are driven onto C%MBOX_FILL_QW_H<4:3> to enable the Mbox to distinguish between quadwords within a hexaword. The most significant bits are not driven for fill data, as the Mbox knows from its miss latches and the fill command (D_CF or I_CF) which hexaword address the data corresponds to.

If the read or write does not result in a Bcache hit, the miss address is loaded into the FILL_CAM, which holds addresses of outstanding read and write misses; the address is also driven to the BIU, where it enters the NON_WRITEBACK_QUEUE to be driven onto the NDAL. When the fill data returns, the value of the NDAL signal P%ID_H<0> is used to locate the correct one of the two addresses in the FILL_CAM so that the data RAMs and the tag RAMs may be written. The address is driven out of the FILL_CAM to index the tag and data RAMs.

Another address-type operation occurs when a cache coherency transaction appears on the NDAL. In this case, the address comes in through the NDAL_IN_QUEUE and is driven from the BIU to the Cbox proper through the CBOX_BIU_INTERFACE. The address is looked up in the tag RAMs, and if it hits, the address is sent through the CM_OUT_LATCH to the Mbox for a Pcache invalidate. If necessary, the VALID and/or OWNED bit is cleared for the Bcache entry. Only address bits <31:5> are used for invalidates, as the invalidate is always to a hexaword. If a writeback is required, the index is driven to the data RAMs so the data can be read out. The address is then driven to the WRITEBACK_QUEUE for the writeback; it is followed shortly by the writeback data on the data busses.

When Abus IPRs are read or written, both the address busses and the data busses come into play. When an Abus IPR is read, the data is driven onto C_ADC%ABUS_H<31:0> and then to C_ADC%BIU_ADDR_OUT_H<31:0>. The BIU uses the Abus/Dbus XFER block to transfer the data to C_BUS%BIU_DATA_H<63:0>; it then goes to C_BUS%DBUS_H<63:0> and back to the Mbox through the CM_OUT_LATCH. When an Abus IPR is written, the data is driven from the Mbox through the WRITE_QUEUE, to C_BUS%DBUS_H<63:0>, and to C_BUS%BIU_DATA_H<63:0>. The Abus/Dbus XFER block transfers the data to C_ADC%BIU_ADDR_OUT_H<31:0>, and it is then driven to C_ADC%ABUS_H<31:0> so that it can be written into the register.

The byte mask is received from the Mbox for writes and I/O space reads. It is passed through the Cbox and onto the NDAL for writes when the cache is off or in ETM, and it is passed through to the NDAL for all I/O space transactions.

Figure 13-12: Cbox Block Diagram with DATA_BUS

[Block diagram. Labeled elements: C%S6_DP_H<7:0>, data RAMs, Dbus IPRs, Abus/Dbus XFER, CBOX_BIU_INTERFACE (data XFER, addr XFER), NDAL IPRs, WRITEBACK_QUEUE, NON_WRITEBACK_QUEUE.]

Figure 13-13: Cbox Block Diagram with ADDRESS_BUS

[Block diagram. Labeled elements: C%CBOX_ADDR_H<31:5>, C%MBOX_FILL_QW_H<4:3>, address <31:3>, data RAMs (index <20:3>), tag RAMs (index <20:5>, tag <31:17>), Abus IPRs, Dbus IPRs, Abus/Dbus XFER, CBOX_BIU_INTERFACE (data XFER, addr XFER), NDAL IPRs, WRITEBACK_QUEUE, NON_WRITEBACK_QUEUE.]

13.4.1 Mbox Interface

All NVAX CPU chip transactions for the Cbox arrive through the Cbox-Mbox interface. Reads come from the Mbox to the Cbox through the read latches. Writes arrive through the WRITE_PACKER and the WRITE_QUEUE. All fills returning from the Cbox to the Mbox go through the CM_OUT_LATCH.
A block diagram of the Mbox interface is shown in Figure 13-14.

Figure 13-14: Mbox Interface

[Block diagram. Inputs from the Mbox: M%S6_PA_H<31:3>, M%C_S6_PA_H<2:0>. Outputs to the Mbox: C%CBOX_ADDR_H<31:5> (invalidates), C%MBOX_FILL_QW_H<4:3> (fills). Elements: FILL_DATA_PIPE1, FILL_DATA_PIPE2, DREAD_LATCH, IREAD_LATCH, WRITE_PACKER, WRITE_QUEUE (8 entries), CM_OUT_LATCH (CM_ADDR_LATCH and CM_DATA_LATCH), C_BUS%DBUS_H<63:0>, C_ADC%ABUS_H<31:0>.]

When the Mbox has a command for the Cbox, the command appears on M%S6_CMD_H<4:0>. The Cbox loads the address from M%S6_PA_H<31:3> and M%C_S6_PA_H<2:0> into either the IREAD_LATCH, the DREAD_LATCH, or the WRITE_PACKER. If the command is a write, the Cbox loads the data from B%S6_DATA_H and the byte enable from M%S6_BYTE_MASK_H into the WRITE_PACKER. M%CBOX_REF_ENABLE_L is asserted for all reads, IPR_RDs, and IPR_WRs. It is not asserted for writes, since the Cbox accepts all writes from the Mbox.

Table 13-11 shows the commands which pass between the Mbox and the Cbox.

Table 13-11: Mbox-Cbox Commands

Command        Description                                             Cbox datapath element involved
IREAD          Instruction stream read                                 IREAD_LATCH
DREAD          Data stream read                                        DREAD_LATCH
DREAD_MODIFY   Data stream read with modify intent                     DREAD_LATCH
DREAD_LOCK     Interlocked data stream read                            DREAD_LATCH
WRITE_UNLOCK   Write which releases lock                               WRITE_PACKER, WRITE_QUEUE
WRITE          Normal write                                            WRITE_PACKER, WRITE_QUEUE
IPR_RD         Read of an internal or external processor register      DREAD_LATCH
IPR_WR         Write of an internal or external processor register     WRITE_PACKER, WRITE_QUEUE
D_CF           Data stream cache fill                                  CM_OUT_LATCH
I_CF           Instruction stream cache fill                           CM_OUT_LATCH
INVAL          Hexaword invalidate                                     CM_OUT_LATCH
NOP            No operation.

13.4.1.1 Mbox to Cbox Transactions

The Mbox commands and accompanying control and data signals are shown in Table 13-12. M%CBOX_REF_ENABLE_L and M%CBOX_LATE_EN_H are used to enable certain transactions coming to the Cbox; M%CBOX_LATE_EN_H is only used for transactions which may hit in the Pcache. From the table, it may be seen that the assertion of M%CBOX_REF_ENABLE_L is not necessary for writes and write unlocks, and that M%CBOX_LATE_EN_H is only used for DREADs, IREADs, and READ MODIFYs. M%S6_BYTE_MASK_H<7:0> is valid for all transactions, although B%S6_DATA_H<63:0> is not valid for read transactions.

Table 13-12: Mbox to Cbox Command Matrix

[Matrix of Mbox-driven signals against commands. Signals: M%S6_PA_H<31:3>, M%C_S6_PA_H<2:0>, M%S6_CMD_H<4:0>, M%CBOX_REF_ENABLE_L, M%CBOX_LATE_EN_H, M%S6_BYTE_MASK_H<7:0>, B%S6_DATA_H<63:0>. Commands: DREAD, READ MODIFY, IREAD, READ LOCK, IPR READ, IPR WRITE, WRITE, WRITE UNLOCK, OTHER. Each entry is "valid", "X", or "0", as defined in the notes below.]

1 "valid" denotes that the signal is either asserted or deasserted by the Mbox, and the Cbox interprets it appropriately.
2 "X" denotes that the Mbox may drive any value to the Cbox, and the Cbox does not care what value is driven.
3 "0" denotes that the Mbox never asserts the signal in this case.

13.4.1.1.1 The IREAD_LATCH and the DREAD_LATCH

When the Mbox has a read command for the Cbox, the Cbox loads the address from M%S6_PA_H<31:3> and M%C_S6_PA_H<2:0> into either the IREAD_LATCH or the DREAD_LATCH, depending on the command. Only IREADs are loaded into the IREAD_LATCH. The DREAD_LATCH is used for DREAD, DREAD_MODIFY, DREAD_LOCK, and IPR_READ. The Mbox only has one outstanding IREAD and one outstanding DREAD at a time, so no backpressure for the latches is needed.
When the DREAD_LATCH is valid, the Mbox does not start the next DREAD-type transaction until all fill data from the previous command is returned to the Mbox. When the IREAD_LATCH is valid, the Mbox does not start the next IREAD transaction until either the IREAD has been aborted or all fill data from the IREAD is returned to the Mbox.

The Cbox services a read hit from the read latch; a read miss is transferred to the FILL_CAM, where it awaits the arrival of data from memory.

Table 13-13 and Table 13-14 show the fields which are contained in the two read latches.

Table 13-13: IREAD_LATCH Fields

Field           Purpose
CMD<4:0>        Specific command being done (IREAD).
ADDRESS<31:0>   Physical address of the read request.

Table 13-14: DREAD_LATCH Fields

Field           Purpose
CMD<4:0>        Specific command being done (DREAD, DREAD_MODIFY, DREAD_LOCK, IPR_READ).
ADDRESS<31:0>   Physical address of the read request.

When the Mbox asserts M%ABORT_CBOX_IRD_H, the Cbox clears the IREAD_LATCH entry if the reference has not yet started. If the Cbox is in the middle of the tag store lookup, or in the middle of a hit sequence and returning the IREAD fill data, it aborts the lookup or the data sequence. If a miss has already been initiated, the Cbox continues with the fills to the backup cache but does not send any data to the Mbox.

13.4.1.1.2 WRITE_PACKER and WRITE_QUEUE

Writes from the Mbox go through the WRITE_PACKER and into the WRITE_QUEUE. The WRITE_PACKER holds one quadword of data; the WRITE_QUEUE consists of 8 entries, each of which contains a quadword of data.

The purpose of the WRITE_PACKER is to accumulate memory-space writes to the same quadword which arrive sequentially, so that only one write has to be done into the cache. Performance modelling shows that this can reduce by 70% the number of writes done to the backup cache. Only normal WRITE commands to the same quadword are packed together.
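The packing behavior can be sketched with a small Python model. This is a minimal illustration, not the hardware design: the class and method names are hypothetical, a plain list stands in for the WRITE_QUEUE, and the assumption that packed writes are merged byte by byte under the byte enable is an interpretation of "accumulate", not something this section states explicitly. Only the "write to a different quadword" flush condition is modeled.

```python
class WritePacker:
    """Toy model of a one-entry write packer (names are hypothetical)."""

    def __init__(self):
        self.entry = None    # (quadword address, byte enable, 8 data bytes)
        self.flushed = []    # stands in for the WRITE_QUEUE

    def write(self, addr, byte_en, data):
        """Accept a normal memory-space WRITE of up to 8 bytes."""
        qw = addr >> 3                       # quadword address, bits <31:3>
        if self.entry is not None and self.entry[0] != qw:
            self.flush()                     # a different quadword flushes the packer
        if self.entry is None:
            self.entry = (qw, 0, bytearray(8))
        _, en, buf = self.entry
        for i in range(8):
            if byte_en & (1 << i):           # merge only the enabled bytes
                buf[i] = data[i]
        self.entry = (qw, en | byte_en, buf)

    def flush(self):
        """Move the packed entry, if any, to the write queue."""
        if self.entry is not None:
            self.flushed.append(self.entry)
            self.entry = None


packer = WritePacker()
packer.write(0x1000, 0x0F, bytes([1, 2, 3, 4, 0, 0, 0, 0]))
packer.write(0x1004, 0xF0, bytes([0, 0, 0, 0, 5, 6, 7, 8]))  # same quadword: packed
packer.write(0x2000, 0x01, bytes([9, 0, 0, 0, 0, 0, 0, 0]))  # new quadword: flush
```

In this example the two longword writes to quadword 0x1000 leave the packer as a single queue entry with all eight byte enables set.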
Other writes pass immediately from the WRITE_PACKER into the WRITE_QUEUE.

The WRITE_PACKER is flushed at the following times:

• When a memory-space WRITE to a different quadword arrives. The new quadword then remains in the write packer until a write packer flush condition is met.
• When a WRITE_UNLOCK arrives. The WRITE_UNLOCK is then passed immediately from the WRITE_PACKER to the WRITE_QUEUE.
• When an I/O space write arrives. The I/O space write is then passed immediately from the WRITE_PACKER to the WRITE_QUEUE.
• When an IPR_WRITE arrives. The IPR_WRITE is then passed immediately from the WRITE_PACKER to the WRITE_QUEUE.
• If an IREAD or a DREAD arrives to the same hexaword as that of the entry in the WRITE_PACKER.
• Whenever any condition for flushing the write queue is met on the entry in the WRITE_PACKER.
• If the DISABLE_PACK bit in the CCTL IPR is set. In this case, every write passes directly through the WRITE_PACKER without delay.

THREE-CYCLE LATENCY THROUGH THE WRITE_QUEUE
If the WRITE_QUEUE and the WRITE_PACKER are empty, the latency of any write through them is 3 cycles. The implication of this is that if any reads which flush the WRITE_QUEUE are done alternately with writes, their execution will be greatly slowed. This applies to IPR reads and writes and may be an issue in testing the chip.

Table 13-15 describes the fields in the WRITE_QUEUE.

Table 13-15: WRITE_QUEUE Fields

Field           Purpose
VALID           Indicates that the entry contains valid information.
DWR_CONFLICT    Indicates that this write conflicts with a DREAD, giving the WRITE_QUEUE priority. Check is done using hexaword address.
IWR_CONFLICT    Indicates that this write conflicts with an IREAD, giving the WRITE_QUEUE priority. Check is done using hexaword address.
CMD<4:0>        Specific command being done.
ADDRESS<31:0>   Physical address of the write.
BYTE_EN<7:0>    Byte enable for the write.
DATA<63:0>      Data to be written.

When a quadword of data is moved into the WRITE_QUEUE, it is serviced by the Cbox arbiter as the lowest-priority task, unless special conditions exist. Servicing writes separately from reads allows reads to take higher priority and gets read data back to the CPU faster. However, a read which follows a write to the same hexaword must not be allowed to complete before the write completes. To prevent this there are conflict bits, DWR_CONFLICT<8:0> and IWR_CONFLICT<8:0>, implemented in the WRITE_QUEUE and WRITE_PACKER, one for each entry. The conflict bits ensure correct ordering between writes and a DREAD or an IREAD to the same hexaword.

When a DREAD arrives, the hexaword address is checked against all entries in the WRITE_QUEUE and WRITE_PACKER. Any entry with a matching hexaword address has its corresponding DWR_CONFLICT bit set. The DWR_CONFLICT bit is also set if the WRITE_QUEUE entry is an IPR_WRITE, a WRITE_UNLOCK, or an I/O space write. If any DWR_CONFLICT bit is set, the WRITE_QUEUE takes priority over DREADs, allowing the writes up to the point of the conflicting write to complete first.

When an IREAD arrives, the hexaword address is checked against all entries in the WRITE_QUEUE and WRITE_PACKER. Any entry with a matching hexaword address has its corresponding IWR_CONFLICT bit set. The IWR_CONFLICT bit is also set if the WRITE_QUEUE entry is an IPR_WRITE, a WRITE_UNLOCK, or an I/O space write. If any IWR_CONFLICT bit is set, the WRITE_QUEUE takes priority over IREADs, allowing the writes up to the point of the conflicting write to complete first.

As each write is done, the conflict bits and valid bit of the entry are cleared. When the last write which conflicts with a DREAD finishes, there are no more DWR_CONFLICT bits set, and the DREAD takes priority again, even if other writes arrived after the DREAD.
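The DWR_CONFLICT mechanism can be illustrated with a short Python sketch. It is a simplified software model under stated assumptions: the class, function, and command-string names are hypothetical, the WRITE_PACKER entry is not modeled separately, and writes are assumed to retire strictly oldest first.

```python
from collections import deque

# Command kinds that always set the conflict bit, per the text above.
# The string names are hypothetical labels for this sketch.
FORCE_CONFLICT = {"IPR_WRITE", "WRITE_UNLOCK", "IO_WRITE"}


class WriteQueue:
    """Toy model of write queue entries with a DWR_CONFLICT bit each."""

    def __init__(self):
        self.entries = deque()   # oldest entry first

    def push(self, cmd, addr):
        self.entries.append({"cmd": cmd, "addr": addr, "dwr_conflict": False})

    def dread_arrives(self, addr):
        """Set DWR_CONFLICT on entries that conflict with the new DREAD."""
        for entry in self.entries:
            same_hexaword = (entry["addr"] >> 5) == (addr >> 5)
            if same_hexaword or entry["cmd"] in FORCE_CONFLICT:
                entry["dwr_conflict"] = True

    def writes_have_priority(self):
        return any(entry["dwr_conflict"] for entry in self.entries)

    def retire_oldest(self):
        # Retiring an entry discards its valid and conflict bits.
        return self.entries.popleft()


def service_order(queue, dread_addr):
    """Return the order in which queued writes and the DREAD are serviced."""
    queue.dread_arrives(dread_addr)
    order = []
    while queue.writes_have_priority():
        order.append(("WRITE", queue.retire_oldest()["addr"]))
    order.append(("DREAD", dread_addr))
    return order


q = WriteQueue()
q.push("WRITE", 0x100)
q.push("WRITE", 0x208)   # same hexaword as the DREAD below
q.push("WRITE", 0x300)
order = service_order(q, 0x210)
```

Here the writes to 0x100 and 0x208 are serviced before the DREAD, while the younger write to 0x300 stays queued because no conflict bit remains set once the conflicting write has retired.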
In this way a DREAD which conflicts with previous writes is not done until those writes are done, but once those writes are done, the DREAD proceeds. The analogous statement is true for an IREAD which has a conflict. If IWR_CONFLICT is set and the IREAD is aborted before the conflicting write queue entry is processed, the WRITE_QUEUE continues to take precedence over the IREAD_LATCH until the conflicting entry is retired.

If both a DREAD and an IREAD have a conflict in the WRITE_QUEUE, writes take priority until one of the reads no longer has a conflict. If the DREAD no longer has a conflict, the DREAD is then done. Then the WRITE_QUEUE continues to have priority over the IREAD_LATCH since the IREAD has a conflict, and when the conflicting writes are done, the IREAD may proceed. If another DREAD arrives in the meantime, it may be allowed to bypass both the writes and the IREAD if it has no conflicts.

This mechanism is used for other cases to enforce read/write ordering. Cases where the WRITE_QUEUE (and the WRITE_PACKER) must be flushed before proceeding are listed below:

1. DREAD_LOCK and WRITE_UNLOCK.
2. All IPR_READs and IPR_WRITEs (includes Clear Write Buffer).
3. All I/O space reads and I/O space writes.
4. Dread or Iread conflict with a write (checked to hexaword granularity, on address bits <31:5>).

When a DREAD_LOCK arrives from the Mbox, DWR_CONFLICT bits for all valid writes in the WRITE_QUEUE and WRITE_PACKER are set so that all writes preceding the DREAD_LOCK are done before the DREAD_LOCK is done.

When any IPR_READ arrives, all DWR_CONFLICT bits for valid entries in the WRITE_QUEUE and WRITE_PACKER are set, forcing the writes to complete before the IPR_READ completes. This ensures that IPR reads and writes are executed in order.

When any D-stream I/O space read arrives, all DWR_CONFLICT bits for valid entries in the WRITE_QUEUE and WRITE_PACKER are set, so that previous writes complete first.
When any I-stream I/O space read arrives, all IWR_CONFLICT bits for valid entries in the WRITE_QUEUE and WRITE_PACKER are set, so that previous writes complete first.

Note that when a WRITE_UNLOCK arrives, the WRITE_QUEUE is always empty, as it was previously flushed before the READ_LOCK was serviced.

When a new entry for the DREAD_LATCH arrives, it is checked for conflicts with the WRITE_QUEUE. At this time the DWR_CONFLICT bit is set on any WRITE_QUEUE entry which is an I/O space write, an IPR_WRITE, or a WRITE_UNLOCK. Similarly, when a new entry for the IREAD_LATCH arrives, it is checked for conflicts with the WRITE_QUEUE. At this time the IWR_CONFLICT bit is set on any WRITE_QUEUE entry which is an I/O space write, an IPR_WRITE, or a WRITE_UNLOCK.

Thus, all transactions from the Mbox except memory space reads and writes unconditionally force the flushing of the WRITE_QUEUE. Memory space reads cause a flush if they conflict with a previous write.

If the WRITE_QUEUE fills up, the Cbox asserts C%WR_BUF_BACK_PRES_H. The Mbox then stops sending more writes to the Cbox until C%WR_BUF_BACK_PRES_H is deasserted.

13.4.1.2 Cbox to Mbox Transactions

The Cbox sends fills and invalidates to the Mbox. The signals which the Cbox drives in doing this are shown in Table 13-16.

Table 13-16: Cbox to Mbox Interface Signals

Field                     Purpose
C%CBOX_CMD_H<1:0>         Specific command being done: either D_CF, I_CF, INVAL, or NOP.
C%CBOX_ADDR_H<31:5>       Hexaword address for invalidate sent to Mbox.
C%REQ_DQW_H               Indicates that the quadword of fill data being returned was the requested quadword of data: the quadword to which the original address corresponded. It is also asserted if C%CBOX_HARD_ERR_H is asserted and the requested quadword has not yet been returned; the Mbox then notifies the Ibox and/or Ebox that the requested data has been returned so that the machine does not hang.
C%LAST_FILL_H             Indicates that this is the last data being sent for the read request.
C%CBOX_HARD_ERR_H         Indicates that an unrecoverable error is associated with the data. This bit only qualifies fills, not invalidates. When C%CBOX_HARD_ERR_H is asserted, the Cbox also asserts C%LAST_FILL_H, as no more fills follow. C%CBOX_HARD_ERR_H may be asserted as the result of an uncorrectable error in the Bcache or as the result of RDE on the NDAL.
C%CBOX_ECC_ERR_H          Indicates that a correctable backup cache ECC error is associated with the current fill data and the data should be ignored. Valid for fills only, not invalidates. Corrected data will follow.
C%MBOX_FILL_QW_H<4:3>     Address bits to indicate to which quadword within the hexaword the current fill data belongs.
B%S6_DATA_H<63:0>         Bus used to receive data from the Mbox and to send fill data to the Mbox.
C%S6_DP_H<7:0>            Byte data parity for B%S6_DATA_H<63:0>.

Table 13-17 shows what signals are driven and valid for every Cbox-to-Mbox transaction. If an error in the backup cache or on the NDAL happens while fill data is being retrieved, the Cbox notifies the Mbox using C%CBOX_HARD_ERR_H or C%CBOX_ECC_ERR_H. Table 13-18 shows how both normal cases and error cases are handled by the Mbox.

Table 13-17: Cbox to Mbox Command Matrix

[Matrix of Cbox-driven signals against the command on C%CBOX_CMD_H<1:0>. Commands: NOP (00), INVAL (01), I_CF (10), D_CF (11). Signals: C%CBOX_ADDR_H<31:5>, C%REQ_DQW_H, C%LAST_FILL_H, C%CBOX_HARD_ERR_H, C%CBOX_ECC_ERR_H, C%MBOX_FILL_QW_H<4:3>, B%S6_DATA_H<63:0>, C%S6_DP_H<7:0>. Each entry is "X", "valid", "0", "driven", or "not driven", as defined in the notes below.]

1 "X" denotes that the Cbox may drive any value to the Mbox, and the Mbox does not care what value is driven.
2 "valid" denotes that the signal is either asserted or deasserted by the Cbox, and the Mbox interprets it appropriately.
3 "0" denotes that the Cbox never asserts the signal in this case.
4 The Mbox ignores the value driven by the Cbox in this case.

Table 13-18: Cbox to Mbox Commands and Resulting Mbox Actions

Command                    Qualifiers                                 Mbox Action
NOP                        Qualifiers do not apply.                   Take no action.
I_CF or D_CF               None asserted.                             Accept fill data for outstanding IREAD or DREAD; expect more.
I_CF or D_CF               C%LAST_FILL_H asserted                     Accept fill data for outstanding IREAD or DREAD; expect no more.
I_CF or D_CF               C%CBOX_HARD_ERR_H, C%LAST_FILL_H           Perform invalidate, expect no more fills for this read. (C%LAST_FILL_H is always asserted when C%CBOX_HARD_ERR_H is asserted.)
I_CF or D_CF               C%CBOX_ECC_ERR_H                           Ignore this fill data, expect fill later.
I_CF or D_CF               C%CBOX_ECC_ERR_H and C%LAST_FILL_H         Ignore this fill data, expect fill later.
I_CF or D_CF               C%CBOX_ECC_ERR_H and C%CBOX_HARD_ERR_H     This case never happens, and is disallowed.
INVAL                      Qualifiers do not apply.                   Perform invalidate.
INVAL to outstanding fill  Qualifiers do not apply.                   Perform invalidate, expect fill data. Do not validate the data in the Pcache when it returns.

13.4.1.2.1 CM_OUT_LATCH

The CM_OUT_LATCH holds fill data and invalidate addresses which are destined for the Mbox. The Mbox never backpressures the Cbox (it can always receive a command from the Cbox), so a queue is not needed. The latch has an address portion and a data portion. The fields are shown in Table 13-19.

Field           Purpose
CMD<1:0>        Specific command being done.
ADDR<31:5>      Physical address of the invalidate. This field is not used for fills.
FILL_QW<4:3>    Quadword alignment of the fill. This field is not used for invalidates.
DATA<63:0>      Fill data.

The CM_OUT_LATCH is loaded with an invalidate when the backup cache deallocates a valid block or when it performs an invalidate due to a cache coherency transaction on the NDAL.
The CM_OUT_LATCH is loaded with cache fill data when the NDAL returns fill data which was requested by the Mbox or when a read request hits in the backup cache. Cbox control ensures that both events never happen in the same cycle.

The command from the CM_OUT_LATCH is driven on C%CBOX_CMD_H<1:0>. If the command is an invalidate, the address is driven on C%CBOX_ADDR_H<31:5>, and no data is driven to the Mbox. If the command is a fill, the quadword alignment is driven on C%MBOX_FILL_QW_H<4:3>. (The Mbox has the hexaword address during these cycles.) Fill data is piped through the FILL_DATA_PIPEs and driven on B%S6_DATA_H<63:0>. The Cbox calculates byte parity on the fill data and drives it on C%S6_DP_H<7:0>.

If an IREAD is in progress in the Cbox and the Mbox asserts M%ABORT_CBOX_IRD_H, the Cbox prevents any further command, address, or data for that IREAD from being driven to the Mbox, as described in Section 13.4.1.2.3.

13.4.1.2.2 FILL_DATA_PIPE1 and FILL_DATA_PIPE2

The FILL_DATA_PIPEs are used to pipeline the fill data for two cycles so that the Cbox drives B%S6_DATA_H<63:0> coincidentally with the write-enable of the Pcache. If there is a free cycle on B%S6_DATA_H<63:0>, the Cbox may bypass the fill data from FILL_DATA_PIPE1 (to achieve a one-cycle bypass). This allows the Mbox to return data to the Ibox or the Ebox one cycle early. The cache fill to the Pcache is done in the normal cycle, driven from FILL_DATA_PIPE2, even if Ebox or Ibox data was bypassed in an earlier cycle.

The timing relationships for one cache fill are shown in Figure 13-15.

Figure 13-15: B%S6_DATA_H<63:0> Bypass Timing

[Timing diagram, cycles 1 through 4. Signals shown: C%CBOX_CMD_H, C%MBOX_FILL_QW_H<4:3>, M%CBOX_BYPASS_ENABLE_H, B%S6_DATA_H<63:0> valid (to bypass), B%S6_DATA_H valid (for Pcache fill). Annotations: one-cycle data bypass; data written to Pcache.]

In this example, a fill is just arriving in cycle 1, so the Cbox drives C%CBOX_CMD_H and C%MBOX_FILL_QW_H<4:3>. The Mbox drives M%CBOX_BYPASS_ENABLE_H to the Cbox in cycle 2 to indicate that B%S6_DATA_H is free during the current cycle. This causes the Cbox to bypass data from FILL_DATA_PIPE1 to B%S6_DATA_H to achieve a one-cycle bypass. In cycle 3 the Cbox drives the data from FILL_DATA_PIPE2 to the Pcache for the write. It does this even though the bypass was done previously, because the Pcache is always written in the third cycle after C%CBOX_CMD_H is driven with the fill command.

The rules for the Cbox driving data on B%S6_DATA_H are as follows:

1. IF FILL_DATA_PIPE2 contains valid data, drive B%S6_DATA_H from FILL_DATA_PIPE2.
2. ELSE IF M%CBOX_BYPASS_ENABLE_H is asserted and FILL_DATA_PIPE1 contains valid data, drive from FILL_DATA_PIPE1 to achieve a one-cycle bypass.

The Mbox keeps enough state to know what the Cbox will be bypassing in any given cycle.

When the Cbox drives B%S6_DATA_H, it also generates byte parity and drives C%S6_DP_H with the same timing.

The fields of the FILL_DATA_PIPEs are shown in Table 13-20.

Field         Purpose
IREAD         Indicates that fill data is for an IREAD.
DATA<63:0>    Fill data.

The IREAD field is necessary in case of an IREAD abort, as described in Section 13.4.1.2.3. If M%ABORT_CBOX_IRD_H is asserted and the data in either FILL_DATA_PIPE1 or FILL_DATA_PIPE2 is for an IREAD, that FILL_DATA_PIPE must be cleared so that data is not driven back to the Mbox.
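The two drive rules and the abort-time clearing of the pipe stages can be sketched in Python as follows. This is an illustrative model only: the function names and the dictionary representation of a pipe stage (keys "data" and "iread", with None meaning the stage holds no valid data) are assumptions made for the example.

```python
def select_fill_data(pipe1, pipe2, bypass_enable):
    """Choose the source driving B%S6_DATA_H in one cycle.

    pipe1 and pipe2 are None (no valid data) or dicts with "data" and
    "iread" keys. Returns (source name, data), or None if nothing is driven.
    """
    if pipe2 is not None:
        # Rule 1: valid data in FILL_DATA_PIPE2 is always driven.
        return ("FILL_DATA_PIPE2", pipe2["data"])
    if bypass_enable and pipe1 is not None:
        # Rule 2: one-cycle bypass from FILL_DATA_PIPE1.
        return ("FILL_DATA_PIPE1", pipe1["data"])
    return None


def abort_iread(pipe1, pipe2):
    """Clear any pipe stage whose fill data belongs to an aborted IREAD."""
    def keep(stage):
        if stage is not None and stage["iread"]:
            return None
        return stage
    return keep(pipe1), keep(pipe2)


iread_fill = {"data": 0x1111, "iread": True}
dread_fill = {"data": 0x2222, "iread": False}
source = select_fill_data(iread_fill, None, bypass_enable=True)
remaining = abort_iread(iread_fill, dread_fill)
```

In the usage lines, the IREAD fill is bypassed from FILL_DATA_PIPE1 because FILL_DATA_PIPE2 is empty, and an abort clears only the stage holding IREAD data.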
13.4.1.2.3 IREAD Aborts

The Mbox asserts the signal M%ABORT_CBOX_IRD_H to notify the Cbox to abort any IREAD which it is currently processing. This may happen because of a branch mispredict where the Istream has been prefetching from one branch and has to change over to the other. The Mbox then aborts all outstanding IREADs so that a new IREAD can begin.

When the Cbox receives the abort signal, the read in question may be anywhere in the Cbox read sequence. The exact action taken depends on where the read is, as shown in Table 13-21.

Table 13-21: Cbox Action Upon Receiving M%ABORT_CBOX_IRD_H

State of the IREAD                                      Action Taken by the Cbox
No IREAD outstanding                                    No action taken.
IREAD_LATCH valid but not started                       Clear the IREAD_LATCH so the request will not be started.
IREAD_LATCH valid and hit calculation in progress       Abort the hit calculation immediately. This frees the tag store and data RAMs for another request.
IREAD_LATCH valid and read hit in progress              Abort the data RAM sequence immediately. The tag store and data RAMs are freed up for another request.
IREAD valid in FILL_CAM                                 Clear the TO_MBOX bit in the FILL_CAM entry. When the fill data returns from memory, validate it in the backup cache but don't send the data to the Mbox.
IREAD fill data in CM_OUT_LATCH or FILL_DATA_PIPEs      Clear the entry containing IREAD data so that the data is not returned to the Mbox.

Figure 13-16 shows an example of timing for the Cbox abort response. In cycle 1, M%ABORT_CBOX_IRD_H is asserted during phase 2. The Cbox is ready to drive the I_CF command and B%S6_DATA_H during phase 4. The assertion of M%ABORT_CBOX_IRD_H prevents both of those actions. The next IREAD may appear two cycles after the abort.

[Figure 13-16 timing diagram, cycles 1 through 3. Signals shown: M%ABORT_CBOX_IRD_H; C%CBOX_CMD_H = I_CF, not driven due to abort; B%S6_DATA_H for I_CF, not driven due to abort. Annotation: Mbox may send next IREAD.]

13.4.2 ECC Datapaths [1]

The backup cache tag store and data store are both protected by error-detect-and-correct codes (ECC). ECC was chosen for its capability to correct errors because the cache is writeback and may contain the only copy of data in the system. The codes employed detect and/or correct the following errors:

1. Detect and correct single-bit errors.
2. Detect double-bit errors.
3. Detect three and four bit failures if within one nibble.
4. Detect some addressing failures.
5. Detect all-zeros failure on all protected bits.
6. Detect all-ones failure on all protected bits.

In general, ECC works as follows: Some number of check bits are generated. Each check bit is parity calculated over some subset of the data bits to be protected. The data bits and the check bits together are known as a code word. When data is written, the check bits are calculated and stored with the data; when data is read, the check bits are regenerated and compared against the stored check bits. The result of the comparison is called the syndrome; if it is all zeros there is no error. The syndrome is passed through the syndrome decoder, which decodes one of N states. Each of the N states corresponds to one of the data or check bits being protected by ECC. If the syndrome does not decode successfully, the error is recognized as uncorrectable. If it does decode successfully, the output of the decoder indicates which bit is in error, and that bit is inverted to achieve the data correction.
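The general scheme described above can be sketched in Python with a deliberately small toy code: four data bits protected by four check bits. The column values below are an assumption chosen for illustration (odd-weight columns, so that double-bit errors produce syndromes matching no column); they are not the NVAX tag store or data RAM codes, and the function names are hypothetical.

```python
# Toy parity-check columns (illustrative only, NOT the NVAX codes).
# Each value lists which check bits cover the corresponding bit.
DATA_COLUMNS = [0b0111, 0b1011, 0b1101, 0b1110]   # data bits 0..3
CHECK_COLUMNS = [0b0001, 0b0010, 0b0100, 0b1000]  # check bits 0..3


def gen_check(data):
    """Generate check bits: each is parity over a subset of the data bits."""
    check = 0
    for bit, column in enumerate(DATA_COLUMNS):
        if (data >> bit) & 1:
            check ^= column
    return check


def read_word(data, stored_check):
    """Regenerate check bits, form the syndrome, and decode it."""
    syndrome = gen_check(data) ^ stored_check
    if syndrome == 0:
        return data, "no error"
    if syndrome in DATA_COLUMNS:
        # The syndrome names the failing data bit; invert it to correct.
        return data ^ (1 << DATA_COLUMNS.index(syndrome)), "corrected data bit"
    if syndrome in CHECK_COLUMNS:
        return data, "corrected check bit"
    return data, "uncorrectable"


stored_data = 0b1010
stored_check = gen_check(stored_data)
result = read_word(stored_data ^ 0b0100, stored_check)   # single-bit flip on read
```

In the usage lines, a single flipped data bit yields a syndrome equal to that bit's column, so the decoder returns the original value; flipping two bits yields a syndrome that matches no column and is reported as uncorrectable.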
13.4.2.1 Backup Cache Tag Store ECC

Figure 13-17 shows a block diagram of the ECC data path for tag store ECC. P%TS_TAG_H<31:17>, P%TS_OWNED_H, and P%TS_VALID_H are protected directly by ECC. When the tag store is written, the generated check bits are written into the RAMs with the tag, valid, and owned bits. When the tag store is read, the check bits are regenerated on the stored tag, valid, and owned bits and compared with the stored check bits. The result of the comparison is the syndrome, which decodes to tell the hardware which bit is in error.

[1] See Steve Elkind's memo of 31 January 1989, ECC Codes for NVAX Bcache, for more detail about the codes chosen.

Figure 13-17: Tag Store ECC Block Diagram

[Block diagram. TAG<31:17>, VALID, OWNED, and ADDRESS PARITY feed six parity trees. The parity tree outputs are XORed with the stored check bits; a non-zero result signals ERROR. The syndrome decoder produces ERROR, UNCORRECTABLE_ERROR, CORRECT_TAG<31:17>, CORRECT_VALID, and CORRECT_OWNED; these are XORed with the individual TAG<31:17>, VALID, and OWNED bits to give the corrected tag, valid, and owned bits. Note: each parity tree has different subsets of data lines as inputs; each parity tree produces one check bit.]

A failure in addressing the RAMs is covered indirectly in the following way: When an entry is written into the tag store, even parity is generated on the on-chip version of P%TS_INDEX_H<20:5>. This is the address parity bit. (Those bits of P%TS_INDEX_H<20:17> which are not required to address the RAMs, based on cache size selection, are zeroed during parity generation.) The address parity bit, P%TS_TAG_H<31:17>, P%TS_OWNED_H, and P%TS_VALID_H are all used in generating the check bits. The address parity bit itself is not actually stored.
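A minimal Python sketch of the address parity calculation follows. It is illustrative only: the function name is hypothetical, the index is assumed to be passed right-justified (bit 0 of the argument is index bit <5>), and `index_bits_used` is a hypothetical parameter standing in for the cache size selection, whose encoding is not described in this section.

```python
def address_parity(ts_index, index_bits_used=16):
    """Even parity over the tag store index, with unused upper bits zeroed.

    ts_index holds P%TS_INDEX_H<20:5> right-justified. index_bits_used is a
    hypothetical stand-in for the cache size selection: only that many
    low-order index bits take part in the parity.
    """
    masked = ts_index & ((1 << index_bits_used) - 1)
    # Even parity: the parity bit makes the total number of ones even,
    # so it equals the XOR of the participating index bits.
    return bin(masked).count("1") & 1


parity_full = address_parity(0x8001)         # two bits set
parity_small = address_parity(0x8001, 13)    # upper bits zeroed, one bit set
```

With all 16 index bits in use the two set bits give a parity bit of 0; when the upper bits are masked off, only one set bit remains and the parity bit becomes 1.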
When an entry is read from the tag store, parity on P%TS_INDEX_H<20:5> is recalculated and used in the regeneration of the check bits, which are then compared with the stored check bits. If there was an addressing failure in either reading or writing the RAMs, and the regenerated check bits do not match the stored check bits, the output of the syndrome decoder indicates that the address bit is in error. Addressing failures are only detected if the failure was such that incorrect parity is produced from the address.

The ECC datapath makes possible a "predictive" ECC, which is used in the hit calculation. While the tag RAMs are being accessed, the six predictive ECC check bits are calculated on the expected tag, valid, and owned bits. This predictive ECC is then compared with the actual ECC check bits read from the tag RAMs during the hit calculation. In this way, an ECC error prevents a cache hit, so that a hit is never detected and then rescinded due to an error.

The code used for tag ECC is shown in Figure 13-18. The check bit which is marked with a "1" in each row is generated by a parity tree whose inputs are the Tag, Valid, Owned, and AP (address parity) bits which are marked with a "1" in that row.
Figure 13-18: Tag Store Error Correcting Code Matrix

         | generated check bits      |------------------- tag bits -------------------|
Syndrome | C0 C1 C2 C3 | C4 C5  O  V | 31 30 29 28 | 27 26 25 24 | 23 22 21 20 | 19 18 17 | AP
---------+-------------+-------------+-------------+-------------+-------------+----------+----
S0       |  1  0  0  0 |  0  0  1  0 |  0  0  1  0 |  1  1  1  1 |  0  1  0  1 |  1  0  1 |  0
S1       |  0  1  0  0 |  0  0  0  1 |  0  0  0  1 |  0  0  0  1 |  0  0  0  1 |  1  1  1 |  1
S2       |  0  0  1  0 |  0  0  1  1 |  1  1  1  0 |  0  0  1  1 |  1  0  0  0 |  1  0  1 |  1
S3       |  0  0  0  1 |  0  0  1  0 |  1  0  0  1 |  0  1  0  1 |  1  1  1  0 |  0  1  1 |  1
S4       |  0  0  0  0 |  1  0  1  1 |  1  1  0  1 |  1  0  1  1 |  0  1  1  1 |  1  0  0 |  1
S5       |  0  0  0  0 |  0  1  1  0 |  0  1  1  0 |  1  1  0  0 |  1  0  1  0 |  1  1  1 |  1
---------+-------------+-------------+-------------+-------------+-------------+----------+----
         |  nibble 0   |  nibble 1   |  nibble 2   |  nibble 3   |  nibble 4   | nibble 5 | not
                                                                                 (three     stored
                                                                                 bits only)

O = owned bit, V = valid bit, AP = address parity.
Even parity - C0, C2, C3, C5
Odd parity - C1, C4
Sn = (generated Cn) XOR (stored Cn)

In a tag store read operation, a non-zero syndrome indicates an error. If the syndrome generated matches one of the columns in the matrix, the error is correctable and the matching column indicates the bit to be corrected. For example, if syndrome<5:0> equals 011100 (BIN), then tag bit <31> must be inverted to correct the problem. Any syndrome value which is non-zero and does not match a column in the matrix indicates an uncorrectable error.
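The syndrome decode described above can be sketched in a few lines of Python. This is an illustrative model only, not the chip's logic: the column values are transcribed from Figure 13-18 as it survives in this copy, and the dictionary representation and the names generate_check_bits and decode are hypothetical.

```python
# Columns of the tag-store code matrix, transcribed from Figure 13-18.
# Each string gives one column's bits for syndrome rows S0..S5 (left to right).
COLUMNS = {
    "C0": "100000", "C1": "010000", "C2": "001000",
    "C3": "000100", "C4": "000010", "C5": "000001",
    "OWNED": "101111", "VALID": "011010",
    "TAG31": "001110", "TAG30": "001011", "TAG29": "101001", "TAG28": "010110",
    "TAG27": "100011", "TAG26": "100101", "TAG25": "101010", "TAG24": "111110",
    "TAG23": "001101", "TAG22": "100110", "TAG21": "000111", "TAG20": "110010",
    "TAG19": "111011", "TAG18": "010101", "TAG17": "111101",
    "AP": "011111",
}

# Integer syndrome for each column; S0 is bit 0, so the value reads as syndrome<5:0>.
SYNDROME_OF = {
    name: sum(int(bit) << row for row, bit in enumerate(bits))
    for name, bits in COLUMNS.items()
}
BIT_FOR_SYNDROME = {value: name for name, value in SYNDROME_OF.items()}

CHECK_BITS = ("C0", "C1", "C2", "C3", "C4", "C5")
# C1 and C4 use odd parity, modelled here as inverting those two check bits.
ODD_PARITY_MASK = SYNDROME_OF["C1"] | SYNDROME_OF["C4"]


def generate_check_bits(protected):
    """Return check bits <5:0> for a dict mapping protected bit names to 0 or 1."""
    checks = 0
    for name, value in protected.items():
        if name in CHECK_BITS:
            raise ValueError("check bits are outputs, not inputs")
        if value:
            checks ^= SYNDROME_OF[name]
    return checks ^ ODD_PARITY_MASK


def decode(protected_as_read, stored_checks):
    """Return None (no error), the name of the bit in error, or 'UNCORRECTABLE'."""
    syndrome = generate_check_bits(protected_as_read) ^ stored_checks
    if syndrome == 0:
        return None
    return BIT_FOR_SYNDROME.get(syndrome, "UNCORRECTABLE")
```

In this model an address failure shows up as a syndrome equal to the AP column, because AP is recomputed from the index on every read and is never stored. Every column has odd weight, so any two-bit error yields an even-weight syndrome that cannot match a column and is reported as uncorrectable.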
This code has the property that if any three or four bits in one nibble are in error, the syndrome produced will not match any matrix column. This means that an uncorrectable error will be flagged for a single 4-bit-wide RAM failure. It does not necessarily protect against single RAM failures if 8-bit-wide RAMs are used.

NOTE
Nibble protection only works if the bits in each nibble shown in the matrix are physically stored in the same RAM chip. The board designer must ensure that this is the case.

Odd parity is used for check bits 1 and 4 to protect against the all-zeros failure mode. Otherwise, all-zeros would be a valid code word. The choice of odd and even parity bits prevents all-ones from being a valid code word as well.

13.4.2.2 Backup Cache Data Store ECC

Figure 13-19 shows a block diagram of the ECC datapath for data RAM ECC. P%DR_DATA_H<63:0> are protected directly by ECC. Address failure is covered indirectly in the same manner as it is covered on the tag store. When data is written into the data RAMs, parity is generated on the on-chip version of P%DR_INDEX_H<20:3> and used as an additional data bit in generating the check bits to be stored. The address parity bit is not actually stored.

When an entry is read from the data RAMs, parity on P%DR_INDEX_H<20:3> is recalculated and used in the regeneration of the check bits, which are then compared (XORed) with the stored check bits to produce the syndrome for the transaction. (If a cache size is selected which does not use some or all of P%DR_INDEX_H<20:17>, those bits are zeroed during the parity calculation.) In many cases an address failure is detected because the check bits will not match and an error is flagged. The syndrome is used to determine whether there was an error; if there was, and it was a correctable error, the syndrome tells which bit needs to be corrected.
The code used for data ECC is shown in Figure 13-20. The check bit (C) which is marked with a "1" in each row is generated by a parity tree whose inputs are the data bits marked with a "1" in that row. As in tag store ECC, any syndrome value which is non-zero and does not match a column in the table indicates an uncorrectable error. A correctable error is indicated when the syndrome matches a column in the table. For example, data bit <44> must be inverted to correct the error if syndrome<7:0> equals 10000011 (BIN).

This code has the property that if any three or four bits in one nibble are in error, the syndrome produced will not match any matrix column. This means that an uncorrectable error will be flagged for a single 4-bit-wide RAM failure.

NOTE
Nibble protection only works if the bits in each 4-bit nibble shown in the matrix are physically stored in the same RAM chip. The board designer must ensure that this is the case. If x8 RAMs are used, the failure of an entire RAM chip is not protected by the code.

Odd parity is used in check bits 3 and 7 to prevent all-ones and all-zeros from being valid code words.

Figure 13-19: Data RAM ECC Block Diagram

NOTE: Each parity tree has different subsets of data lines as inputs; each parity tree produces one check bit.
[Block diagram. DATA<63:0> and ADDRESS_PARITY feed eight parity trees, which produce GENERATED_CHECK_BITS<7:0>. These are XORed with the stored check bits; a non-zero result signals ERROR. The syndrome decoder produces UNCORRECTABLE_ERROR and the per-bit correction signals, which are XORed with the individual data bits to give CORRECTED_DATA<63:0>.]

Figure 13-20: Backup Cache Data Store Error Correcting Code Matrix

Columns, left to right: D0 through D60, C0 through C7, D61 through D63, AP.
Rows: S0 through S7.

Matrix entries as extracted, in 4-bit groups:

1101 0001 0001 0011 1110 1010 0010 0010 0100 1011 0111 1100 1100 1000 0101 0100 1111 1111
0001 0010 1111 0101 1001 0111 0001 1111 1010 0110 1101 0100 1011 1111 0000 0100 1000 0000
1111 0000 1111 0000 1101 1101 1100 0011 0010 1010 1010 1011 0100 0100 0111 0111 0111 1000
1001 0001 0001 0001 0001 0001 0100 0100 1000 0111 1101 1000 1000 0010 1101 1011 1101 0010
0100 1011 1000 1111 0000 0000 0000 1111 0010 1101 1110 1010 1110 0100 0000 0000 0100 1011
1011 1101 1101 1010 0000 0000 1001 0110 0101 0111 0011 0001 0000 0000 0001 0001 1111 0001
1111 1000 1000 0100 1101 0010 0110 1000 0110 0000 0100 0010 1011 0100 1001 0100 1001 0000
0010 0001 0111 0111 1111 1101 0000 1000 0001 0111 0000 1111 0000 1111 1111 0000 0000 1111

AP column: 1 1 1 0 1 1 0 0

AP is not stored in the RAMs.
Even parity - C0, C1, C2, C4, C5, C6
Odd parity - C3, C7
Sn = (generated Cn) XOR (stored Cn)

13.4.3 The BIU

The BIU contains the NDAL pads, the NDAL_IN_QUEUE, the WRITEBACK_QUEUE, the NON_WRITEBACK_QUEUE, the BIU IPRs, and timeout counters for outstanding reads. The pads are run on the NDAL clocks, while the rest of the BIU is run on the NVAX internal clocks.
The BIU IPRs are described in Section 13.5; the rest of the BIU is described here.

13.4.3.1 NDAL_IN_QUEUE

The NDAL_IN_QUEUE receives fill data and cache coherency requests from the NDAL. It consists of eight quadword entries for fill data and two entries for cache coherency addresses. Queue control ensures that each entry is processed in the order in which it was received, so that fills and coherency requests are always processed in order. The BIU also uses the NDAL_IN_QUEUE mechanisms to inform the FILL_CAM that a read transaction was not acknowledged or timed out before the fill data returned.

The eight fill data slots ensure that there is always room in the queue for CPU fill data being returned from memory. The two cache coherency slots are managed through the assertion of P%CPU_SUPPRESS_L. The BIU asserts P%CPU_SUPPRESS_L on the NDAL to prevent the cache coherency slots from overflowing. When one slot fills, the BIU must assert P%CPU_SUPPRESS_L immediately, because the next NDAL cycle may be another cache coherency cycle, which would fill both queue slots. This means that two cache coherency commands may be received only if they are on back-to-back cycles; if only one is received, P%CPU_SUPPRESS_L is asserted until that one is handled by the Cbox. This should happen quickly, since the Cbox services the NDAL_IN_QUEUE as its highest priority task. The BIU deasserts P%CPU_SUPPRESS_L when it is able to accept more cache coherency commands.

Note that fill data may always return, whether or not P%CPU_SUPPRESS_L is asserted, as there is always room in the queue for fill data.
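The slot management described above can be modelled with a short Python sketch. It is a behavioural illustration under stated assumptions, not the BIU logic: the class and method names are hypothetical, and the model assumes that P%CPU_SUPPRESS_L is asserted whenever at least one coherency slot is occupied, which is one reading of the text.

```python
from collections import deque

FILL_SLOTS = 8        # quadword entries for fill data
COHERENCY_SLOTS = 2   # entries for cache coherency addresses


class NdalInQueue:
    """Behavioural model of the NDAL_IN_QUEUE; entries leave in arrival order."""

    def __init__(self):
        self.entries = deque()  # (kind, payload) tuples, oldest first

    def _count(self, kind):
        return sum(1 for entry_kind, _ in self.entries if entry_kind == kind)

    @property
    def cpu_suppress(self):
        # Assumption: suppress is asserted as soon as one coherency slot fills,
        # because a back-to-back coherency cycle could still take the second slot.
        return self._count("coherency") >= 1

    def receive_fill(self, data):
        if self._count("fill") >= FILL_SLOTS:
            raise OverflowError("more fill data than the fill slots can hold")
        self.entries.append(("fill", data))

    def receive_coherency(self, address):
        if self._count("coherency") >= COHERENCY_SLOTS:
            raise OverflowError("cache coherency slots overflowed")
        self.entries.append(("coherency", address))

    def service(self):
        """Hand the oldest entry to the Cbox, or return None if the queue is empty."""
        return self.entries.popleft() if self.entries else None
```

Fill data is accepted whether or not the suppress signal is asserted, and a second coherency request is accepted even while suppress is asserted, which models the back-to-back case.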
The NDAL_IN_QUEUE is loaded with a valid entry to be processed by the Cbox in the following cases:

(1) whenever there is a valid memory address cycle on the NDAL, where P%ID_H<2:1> is not equal to the NVAX ID, and which is accompanied by one of the following commands: IREAD, DREAD, OREAD, or WRITE (cache coherency cycles);
(2) whenever there is a Read Data Return or Read Data Error cycle on the NDAL and P%ID_H<2:1> indicates that it belongs to the CPU;
(3) when the BIU detects NACK for an outgoing read;
(4) when a read transaction times out before data is returned.

The fields of the two portions of the NDAL_IN_QUEUE are shown in Table 13-22.

Table 13-22: NDAL_IN_QUEUE Fields

Field            Purpose
Fill entries
VALID            Indicates that the entry contains valid information.
DATA<63:0>       Fill data being returned.
Cache coherency entries
VALID            Indicates that the entry contains valid information.
ADDRESS<31:5>    The address of the cache coherency request.

When the BIU sends a transaction from the NDAL_IN_QUEUE to the Cbox proper, it is accompanied by one of the commands shown in Table 13-23.

Table 13-23: BIU commands sent to Cbox proper

Command name          Meaning
C_BIU%%NOP_CMD        No operation.
C_BIU%%FILL_0_CMD     Fill for FILL_CAM entry 0.
C_BIU%%FILL_1_CMD     Fill for FILL_CAM entry 1.
C_BIU%%RDE_0_CMD      Read Data Error for FILL_CAM entry 0.
C_BIU%%RDE_1_CMD      Read Data Error for FILL_CAM entry 1.
C_BIU%%NACK_0_CMD     No NDAL acknowledgement received for read from FILL_CAM entry 0.
C_BIU%%NACK_1_CMD     No NDAL acknowledgement received for read from FILL_CAM entry 1.
C_BIU%%TIMO_0_CMD     Read from FILL_CAM entry 0 has timed out.
C_BIU%%TIMO_1_CMD     Read from FILL_CAM entry 1 has timed out.
C_BIU%%INVAL_R_CMD    Cache coherency request resulting from a DREAD or an IREAD on the NDAL.
C_BIU%%INVAL_O_CMD    Cache coherency request resulting from an OREAD or a WRITE on the NDAL.
No address is returned for fills, as the NDAL P%ID_H<0> which is returned tells the Cbox which FILL_CAM entry was used for the read address. This information is encoded in the commands in Table 13-23. The Cbox uses the backup cache index from the FILL_CAM to write the correct locations in the tag store and data RAMs.

There are four separate NDAL Read Data Return commands to allow the Cbox to identify the quadwords within the hexaword as they return. The lower two bits of the NDAL command are encoded to represent bits <4:3> of a quadword address. The BIU passes these bits to the CBOX_BIU_INTERFACE, which drives them onto C_ADC%ABUS_B<4:3> when the data is driven onto C_BUS%DBUS_B<63:0>. The information is then driven to the Bcache and to the Mbox. In this way the correct quadword cache entry is written in both caches.

13.4.3.2 NON_WRITEBACK_QUEUE

All outgoing commands except disown writes pass through the NON_WRITEBACK_QUEUE. When the backup cache is on, the NON_WRITEBACK_QUEUE contains read misses, OREADs due to write misses, and I/O space reads and writes. When the backup cache is off, all transactions except quadword disown writes (which result from WRITE_UNLOCKs) go out through the NON_WRITEBACK_QUEUE.

The NON_WRITEBACK_QUEUE has two entries. The fields of each entry in the queue are shown in Table 13-24.

Table 13-24: NON_WRITEBACK_QUEUE Fields

Field                 Purpose
VALID                 Indicates that the entry contains valid information.
CMD<3:0>              Specific command being done.
ID                    Identification, driven onto P%ID_H<0>, for outgoing reads only.
ADDRESS<31:0>         Address of the outgoing command.
LENGTH<63:62>         Length of the outgoing command.
BYTE_ENABLE<47:40>    Byte enable.
DATA<63:0>            Data, used if the outgoing command is a write.

The format of the address field corresponds to that of an address cycle on the NDAL, which is described in Section 3.3.4.1.
Writes from this queue are always byte-enabled quadword writes, whether to memory space or I/O space.

The NON_WRITEBACK_QUEUE has a backpressure signal so that when it gets full, the Cbox stalls transactions from the Mbox until there is room in the queue to proceed. Fills and cache coherency transactions continue normally.

13.4.3.3 WRITEBACK_QUEUE

The WRITEBACK_QUEUE holds addresses and data for write disowns to memory. It contains two entries, each consisting of address and data for either a hexaword or a quadword disown write. Table 13-25 shows the fields in the WRITEBACK_QUEUE.

Table 13-25: WRITEBACK_QUEUE Fields

Field               Purpose
VALID               Indicates that the entry contains valid information.
CMD<3:0>            Specific command being done.
ADDRESS<63:0>       Address cycle for the writeback.
DATA0<63:0>         First quadword of writeback data.
DATA1<63:0>         Second quadword of writeback data.
DATA2<63:0>         Third quadword of writeback data.
DATA3<63:0>         Fourth quadword of writeback data.
BYTE_ENABLE<7:0>    Byte enable for quadword disown writes.

The format of the address field corresponds to that of an address cycle on the NDAL, which is described in Section 3.3.4.1.

When a disown write is done, the ADDRESS field is first loaded. CMD<3:0> is loaded with the WDISOWN command. Four quadwords of write data are loaded if the transaction is hexaword length; if the transaction is quadword length, one quadword of data is loaded. All writeback data is read from the data RAMs before the NDAL transaction is started, to simplify error handling. If a quadword of data is read out with an uncorrectable error, the command field sent with that data cycle is changed from WDATA to BADWDATA.

The WRITEBACK_QUEUE always takes priority over the NON_WRITEBACK_QUEUE in driving the NDAL.
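As a rough illustration of the two rules above (bad data is flagged per data cycle, and the WRITEBACK_QUEUE is always drained first), the following Python sketch may help. The command names WDISOWN, WDATA, and BADWDATA come from the text; the tuple representation of a cycle, the function names, and the arguments are hypothetical and do not describe the NDAL encoding.

```python
from collections import deque


def disown_write_cycles(address, quadwords, uncorrectable=()):
    """Build the cycle list for one disown write.

    quadwords holds one quadword (quadword write) or four (hexaword write).
    uncorrectable lists the indexes of quadwords that were read from the
    data RAMs with an uncorrectable error.
    """
    if len(quadwords) not in (1, 4):
        raise ValueError("a disown write carries one or four quadwords")
    cycles = [("WDISOWN", address)]
    for index, data in enumerate(quadwords):
        # A data cycle with an uncorrectable error is sent as BADWDATA.
        command = "BADWDATA" if index in uncorrectable else "WDATA"
        cycles.append((command, data))
    return cycles


def next_ndal_transaction(writeback_queue, non_writeback_queue):
    """Pick the next transaction to drive; the WRITEBACK_QUEUE always wins."""
    if writeback_queue:
        return writeback_queue.popleft()
    if non_writeback_queue:
        return non_writeback_queue.popleft()
    return None
```

For example, a hexaword disown write whose third quadword had an uncorrectable error would be built with disown_write_cycles(address, data, uncorrectable={2}).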
The WRITEBACK_QUEUE backpressures the Cbox control when it gets full, causing the following:

1. All reads from the Mbox are prevented.
2. All writes from the Mbox are prevented.
3. All fills are prevented.
4. All cache coherency lookups are prevented.

13.4.3.4 Timeout counters

The BIU has two timeout counters, one for each read request which may be outstanding. If all the fills for an outstanding read have not completed when the associated timeout counter expires, the BIU notifies the FILL_CAM of the error and it is handled as described in Chapter 3.

The NVAX timeout counters are shown in Figure 13-21. The Ebox contains the Ebox base counter and the Ebox counter, which counts Ebox stall cycles. The Cbox contains two read counters which, in normal mode, are driven from the Ebox base counter. The Ebox counters are described in detail in Chapter 8.

Three IPR bits control the operation of the timeout counters. When ECR<TIMEOUT_EXT>, ECR<S3_TIMEOUT_TEST>, and CCTL<TIMEOUT_TEST> are all cleared, the counters are in normal mode. When ECR<TIMEOUT_EXT> is set, an external timebase may be used to lengthen the timeout period; when CCTL<TIMEOUT_TEST> is set, the read timeout counters are placed in test mode, under which the read timeout values are shortened; and when ECR<S3_TIMEOUT_TEST> is set, the Ebox counter is put in test mode, under which the S3 timeout value is shortened.
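The timeout periods follow directly from the counter widths this section gives for Figure 13-21 and Table 13-28: a 16-bit base counter, 8-bit read counters, and a 12-bit Ebox counter. The Python sketch below reproduces that arithmetic for the internal timebase only; the function names are hypothetical. It assumes a counter may be enabled at any point between two ticks of its clock, which is why the minimum is one tick shorter than the maximum.

```python
BASE_COUNTER_BITS = 16   # Ebox base counter
READ_COUNTER_BITS = 8    # each Cbox read counter
EBOX_COUNTER_BITS = 12   # Ebox (S3 stall) counter


def timeout_cycles(counter_bits, test_mode=False):
    """Return (minimum, maximum) timeout in NVAX cycles.

    In normal mode the counter advances once per base-counter overflow
    (2**16 cycles); in test mode it advances every NVAX cycle.
    """
    tick = 1 if test_mode else 2 ** BASE_COUNTER_BITS
    maximum = tick * 2 ** counter_bits
    # The first tick after enabling may arrive almost immediately,
    # so the shortest timeout is one tick less than the longest.
    return maximum - tick, maximum


def timeout_ns(counter_bits, cycle_ns, test_mode=False):
    """Return (minimum, maximum) timeout in nanoseconds for a given cycle time."""
    minimum, maximum = timeout_cycles(counter_bits, test_mode)
    return minimum * cycle_ns, maximum * cycle_ns
```

For a 10-ns part, timeout_ns(READ_COUNTER_BITS, 10) gives 167,116,800 ns to 167,772,160 ns, which matches the 167.117 to 167.772 millisecond read timeout quoted for normal mode.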
Figure 13-21: NVAX Timeout Counters

[Block diagram. Legible labels: NOT(RESET); EBOX BASE COUNTER, 16 bits; E%TIMEOUT_BASE_H (master update enable); VDD; synchronized P%OSC_TC1_H; ECR<TIMEOUT_EXT>; E_FLT%S3_TIMEOUT_STALL_H; ECR<TIMEOUT_TEST>; READ0 COUNTER, 8 bits, and READ1 COUNTER, 8 bits, each with CLR and ENABLE inputs.]

In normal mode, the Cbox and the Ebox share the base counter, which is run from the internal NVAX clock. The 12-bit Ebox counter and the 8-bit Cbox read counters are clocked with the global signal, E%TIMEOUT_ENABLE_H, which is generated from the 16-bit base counter. In normal mode, E%TIMEOUT_ENABLE_H is asserted for one NVAX internal cycle when the Ebox base counter overflows; if an external timebase is used (if ECR<TIMEOUT_EXT> is asserted), E%TIMEOUT_ENABLE_H is asserted for one cycle of the external timebase when the counter overflows. E%TIMEOUT_BASE_H is always asserted when the timeout counter is in normal mode; if ECR<TIMEOUT_EXT> is asserted, E%TIMEOUT_BASE_H is asserted for one NVAX internal cycle when the input clock transitions high. The timeout values for normal mode are shown in Table 13-26.

Table 13-26: NVAX Timeout Values in Normal Mode

Cycle time    Timeout granularity   Read timeout [1]                                        Ebox timeout [1]
10-ns NVAX    655 microseconds      167.117 (minimum) to 167.772 (max) milliseconds        2.6837 (minimum) to 2.68435 (max) seconds
12-ns NVAX    786 microseconds      200.54 (minimum) to 201.327 (max) milliseconds         3.22044 (minimum) to 3.22123 (max) seconds
14-ns NVAX    917 microseconds      233.964 (minimum) to 234.881 (max) milliseconds        3.75718 (minimum) to 3.7581 (max) seconds

[1] The timeout logic is in normal mode when ECR<TIMEOUT_EXT>, CCTL<TIMEOUT_TEST>, and ECR<S3_TIMEOUT_TEST> are all cleared.

Each Cbox read counter is initialized to zero when it is not enabled with either C_BIU_NOC_5%BXI_TIMO_0_EN_H or C_BIU_NOC_5%BXI_TIMO_1_EN_H, and counts as long as the read is outstanding. If all the fills do not return within the timeout period, the counter overflows and C_BIU_NOC%BXI_TIMO_0_LAT_H or C_BIU_NOC%BXI_TIMO_1_LAT_H is asserted. As a result, the read is aborted, the timeout counter is reset to zero, and the error is handled as described in Chapter 3.

If a system designer needs to lengthen the timeout values, an external timebase, K%EXT_TMBS_H, can be selected by setting ECR<TIMEOUT_EXT> in the Ebox control register. In this case, the Ebox base counter is clocked with the external timebase, which enters the chip through P%OSC_TC1_H.

The counters are configurable for use at chip test and at power-up test. At chip test and/or during power-up diagnostics, the read counters can be tested in the following way. Set CCTL<TIMEOUT_TEST> so that the Cbox counters run off the internal NVAX clock. Clear ECR<S3_TIMEOUT_TEST>. Do a read of a memory or I/O space location which will not respond within the timeout period. A read timeout should occur. This must be done for each timeout counter.

The timeout values for the Cbox and Ebox counters in test mode are shown in Table 13-27.

Table 13-27: NVAX Timeout Values in Test Mode

Cycle time    Timeout granularity   Read timeout [1]                                   Ebox timeout [2]
10-ns NVAX    10 nanoseconds        2.55 (minimum) to 2.56 (max) microseconds          40.95 (minimum) to 40.96 (max) microseconds
12-ns NVAX    12 nanoseconds        3.06 (minimum) to 3.072 (max) microseconds         49.14 (minimum) to 49.152 (max) microseconds
14-ns NVAX    14 nanoseconds        3.57 (minimum) to 3.584 (max) microseconds         57.33 (minimum) to 57.344 (max) microseconds

[1] Read timeout test is done under these conditions: ECR<TIMEOUT_EXT> and ECR<S3_TIMEOUT_TEST> cleared; CCTL<TIMEOUT_TEST> set.
[2] Ebox timeout test is done under these conditions: ECR<TIMEOUT_EXT> and CCTL<TIMEOUT_TEST> cleared; ECR<S3_TIMEOUT_TEST> set.

Timeouts cannot be forced by reading nonexistent memory or I/O: NDAL designers respond to nonexistent memory and I/O space with either NACK or RDE, which happen well before the timeout counters expire. A timeout can be accomplished in the following way:

1. Do a write or a read-modify-write which causes an OREAD to bring owned data into the backup cache.
2. Do an IPR WRITE to clear the owned bit in the backup cache tag store.
3. Perform another operation which requires ownership in the Bcache. This OREAD will time out because it won't hit in the backup cache, and memory won't respond because it believes the backup cache owns the block.
4. Do an IPR WRITE to the Bcache tag store to put it back into the owned state.

The list which follows describes a scenario in which read data takes a long time to return to the Ebox. This case should not approach the Ebox timeout value; it is given to illustrate what can keep data from returning quickly to the Ebox.

1. The Cbox write queue is full.
2. A Dread, call it Dread A, enters the Cbox and has a conflict with the last write queue entry, Write A, which means that the whole write queue must be cleared out before Dread A can proceed.
3. The writes in the write queue all miss in the Bcache, and each one requires a writeback from another CPU which owns the block. As each writeback is done, the data is returned to the Bcache, ownership is passed to the Bcache, and the write queue is emptied of one write. In this scenario, eight writebacks are required before Dread A can be processed.
4. After the Oread for Write A reaches the NDAL, an invalidate arrives for A. After the data is returned and Write A is processed, the block will be written back, due to the previous invalidate.
5. Now Dread A will miss in the Bcache, and it will have to wait for another writeback. Eventually this read data will return, and the Ebox gets its data.

DERIVATION OF TIMEOUT VALUES

The timeout values given on the previous pages were derived from NVAX cycles as follows:

Table 13-28: Derivation of NVAX Timeout Values

NVAX mode   Timeout granularity   Read timeout                                Ebox timeout
            (in NVAX cycles)      (in NVAX cycles)                            (in NVAX cycles)
Normal      2**16                 2**24-2**16 (minimum) to 2**24 (max)        2**28-2**16 (minimum) to 2**28 (max)
Test        1                     2**8-1 (minimum) to 2**8 (max)              2**12-1 (minimum) to 2**12 (max)

13.4.3.5 BIU clocking: Relating internal cycles to external cycles

Three NVAX internal cycles take place in the time of one NDAL cycle. The BIU relates internal cycles to external cycles by naming the internal cycles according to where they fall relative to the external cycle. This is shown in Figure 13-22.

Figure 13-22: BIU cycle counts

[Diagram. One NDAL external cycle spans BIU cycle counts 1, 2, and 3. Each BIU cycle is one NVAX internal cycle of four phases (PHI1 through PHI4). Each of the four NDAL phases (NDAL PHI1 through NDAL PHI4) spans three NVAX internal phases.]

The BIU has a shift register which asserts only one of the signals C_BIU%CYCLE_1_H, C_BIU%CYCLE_2_H, and C_BIU%CYCLE_3_H during any given NVAX cycle. This shift register is initialized properly by K_CE%RESET_H, which comes from the clock section of the chip. During reset, the clock section asserts K_CE%RESET_H during every NDAL phase 4, allowing the BIU to initialize the shift register properly.

Only the NVAX internal clocks are used in the Cbox and BIU, while only the external clocks are used in the pad ring. Through the use of C_BIU%CYCLE_1_H, C_BIU%CYCLE_2_H, and C_BIU%CYCLE_3_H, the BIU is able to properly drive and receive the NDAL to and from the pad ring.

There is a delay in the NDAL clocks as they travel from NVAX to the other NDAL chips and also back to NVAX.
The delay from the NVAX output pin, P%PHI12_OUT_H, to the NVAX input pin, P%PHI12_IN_H, may be as little as 0 ns or as much as three internal NVAX phases (one NDAL phase). This delay is shown graphically in Figure 13-23.

Figure 13-23: NVAX time relative to NDAL time

[Timing diagram. Waveforms shown: the internal clocks K_MCB%PHI_1_H, K_MCB%PHI_2_H, K_MCB%PHI_3_H, and K_MCB%PHI_4_H over NVAX cycles 1, 2, and 3; one NDAL cycle divided into NDAL phases 1 through 4; the output clocks P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, and P%PHI41_OUT_H; the input clocks P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H for the earliest possible and the latest possible NDAL cycle; P%NDAL_H for the earliest and latest cases, with the interval in which the pad latch is open to receive the NDAL; and the NVAX PHI4, cycle 3 latch, which is open to bring the NDAL into internal NVAX time.]

K_MCB%PHI_1_H, K_MCB%PHI_2_H, K_MCB%PHI_3_H, and K_MCB%PHI_4_H are the internal NVAX clocks which are used in the Cbox. Figure 13-23 shows that the NDAL clocks at the input pins (P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H) may be delayed by up to three internal NVAX phases. The NDAL always operates with respect to the clocks as received at each NDAL driver/receiver, so if the NDAL clocks are delayed, the entire operation of the NDAL is delayed.

The Cbox BIU is designed so that even if the NDAL is operating with three full phases of delay from the internal NVAX clocks, the BIU is able to drive and receive the NDAL properly. For example, P%NDAL_H<63:0> are valid at the beginning of NDAL phase 3. NVAX receives this bus using an NDAL latch which is open while P%PHI23_IN_H is asserted. The output of this latch is sent from the NVAX pad ring to a latch in the NVAX BIU which is open during NVAX phase 4 of BIU cycle 3. This timing allows 2 NVAX phases of delay to get the signal from the pad ring to the BIU. Thus, the NDAL is properly received for the entire range of possible NDAL delay. Once the NDAL is latched by the phase 4, cycle 3 latch, the BIU operates entirely using the internal NVAX clocks; the NDAL clocks are only used in the pad ring itself.
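The receive-timing argument above can be checked with a small Python sketch. It is a simplified model, not a timing analysis: it assumes that, with zero clock delay, NDAL phase 1 begins together with PHI1 of BIU cycle 1, and it measures everything in whole internal phases. The function names are hypothetical.

```python
PHASES_PER_INTERNAL_CYCLE = 4
INTERNAL_CYCLES_PER_NDAL_CYCLE = 3
INTERNAL_PHASES_PER_NDAL_PHASE = 3   # 12 internal phases / 4 NDAL phases
INTERNAL_PHASES_PER_NDAL_CYCLE = (
    PHASES_PER_INTERNAL_CYCLE * INTERNAL_CYCLES_PER_NDAL_CYCLE
)


def internal_index(biu_cycle, phase):
    """Position (0..11) of an internal phase within one NDAL cycle."""
    return (biu_cycle - 1) * PHASES_PER_INTERNAL_CYCLE + (phase - 1)


def ndal_phase(biu_cycle, phase, delay=0):
    """NDAL phase (1..4) seen at the input pins during an internal phase.

    delay is the clock delay in internal phases (0 to 3).
    Assumption: with delay 0, NDAL phase 1 starts at BIU cycle 1, PHI1.
    """
    index = (internal_index(biu_cycle, phase) - delay) % INTERNAL_PHASES_PER_NDAL_CYCLE
    return index // INTERNAL_PHASES_PER_NDAL_PHASE + 1


def latch_margin(delay):
    """Internal phases between the start of NDAL phase 3 and the BIU latch.

    The BIU latch is open during PHI4 of BIU cycle 3.
    """
    phase3_start = 2 * INTERNAL_PHASES_PER_NDAL_PHASE + delay
    return internal_index(3, 4) - phase3_start
```

With the maximum delay of three internal phases the margin is 2, which agrees with the two NVAX phases the text allows for getting the signal from the pad ring to the BIU; with no delay the margin is 5.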
13.4.4 The FILL_CAM

The FILL_CAM has two entries, each of which is used for an outstanding read to memory or for a DREAD_LOCK in progress. Its depth limits the number of outstanding reads to memory at a time. The fields in each FILL_CAM entry are described in Table 13-29.

Table 13-29: FILL_CAM Fields

Field             Purpose
ADDRESS<31:3>     Quadword-aligned address of the read request.
RDLK              Indicates that a READ_LOCK is in progress.
IREAD             This is an Istream read from the Mbox which may be aborted.
OREAD             This is an outstanding OREAD; block ownership bit should be set when the fill returns.
WRITE             This read was done for a write; write is waiting to be merged with the fill.
TO_MBOX           Data is to be returned to the Mbox.
RIP               READ invalidate pending.
OIP               OREAD invalidate pending.
DNF               Do not fill - data is not to be written into the cache or validated when the fill returns.
RDLK_FL_DONE      Indicates that the last fill for a READ_LOCK arrived.
REQ_FILL_DONE     Indicates that the requested quadword of data was received from the NDAL.
COUNT<1:0>        Counts the number of fill quadwords that have been successfully returned.
VALID             Indicates that the entry contains valid information.

The FILL_CAM backpressures the Cbox control so that if it is full, any read or write request stalls until an entry is free.

When the read miss first occurs and the FILL_CAM entry is loaded, the following bits are cleared: RIP, OIP, RDLK_FL_DONE, and REQ_FILL_DONE. VALID is set and the ADDRESS field is loaded. IREAD, RDLK, OREAD, WRITE, and TO_MBOX are loaded with the correct information. If the cache is off, in ETM, or the miss is for an I/O reference, DNF is set; otherwise it is cleared. COUNT is set to 0 if four fill quadwords are expected; it is set to 3 if only one quadword is expected.
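The entry-loading rules above can be sketched as follows. This Python model is illustrative only: the class, function, and argument names are hypothetical, only a few of the Table 13-29 fields are modelled, and it treats COUNT as a 2-bit counter in which the fill that arrives while COUNT is 3 is the last one expected.

```python
from dataclasses import dataclass


@dataclass
class FillCamEntry:
    address: int          # quadword-aligned, models ADDRESS<31:3>
    dnf: bool = False     # do not fill
    count: int = 0        # models COUNT<1:0>
    rip: bool = False
    oip: bool = False
    valid: bool = True


def allocate(address, quadwords_expected, cache_off=False, etm=False, io_ref=False):
    """Load a FILL_CAM entry when a read miss first occurs."""
    if quadwords_expected not in (1, 4):
        raise ValueError("a read returns one or four quadwords")
    return FillCamEntry(
        address=address & ~0x7,                 # drop bits <2:0>
        dnf=cache_off or etm or io_ref,         # DNF set in any of these cases
        count=0 if quadwords_expected == 4 else 3,
    )


def fill_returned(entry):
    """Record one successful fill; return True if it was the last one expected."""
    last = entry.count == 3
    if not last:
        entry.count += 1
    return last
```

Starting COUNT at 3 for a single-quadword read lets the same "COUNT equals 3" test mark the final fill for both transaction lengths.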
As each fill returns successfully, COUNT is incremented so that when the final fill returns and COUNT=3, the Cbox updates the tag store appropriately.

If an abort request arrives from the Mbox, and the entry is marked IREAD, the TO_MBOX bit is cleared. When the data returns, it will be written into the backup cache (if DNF is not set) but it will not be sent to the Mbox.

If a coherence request arrives from the NDAL which matches the address of a FILL_CAM entry, RIP or OIP may be set. Table 13-30 shows when each is set.

Table 13-30: Cbox Response to Coherence Transactions to FILL_CAM Entries

State of OREAD bit             Coherence Transaction     Cbox Action
OREAD set or clear (any)       OREAD, READLK, write      Set OIP. Send invalidate immediately to the Pcache.
OREAD set                      DREAD, IREAD              Set RIP.
OREAD clear                    DREAD, IREAD              Take no action.

When all the fills for an outstanding miss have completed, a cache coherence transaction is initiated if either RIP or OIP is set and DNF is not set. This is done immediately after the fill and the validate of the cache are done, and cannot be interrupted by any other transaction. When a WRITE_UNLOCK completes successfully and RIP or OIP is set, the cache coherence transaction is initiated immediately.

There are several error cases where RIP or OIP may be set, indicating the need for a cache coherence transaction, but the Cbox will not perform the transaction, possibly causing the system element to time out. These cases are as follows:

1. The fill sequence fails by ending in RDE or timeout. If the fill was meant for the Pcache and ends in an error, the Pcache invalidates itself.
2. A READ_LOCK sequence does not conclude with a WRITE_UNLOCK but with a write-one-to-clear to the RDLK bit in CEFSTS.

As shown in the table above, when an ownership-type coherence transaction arrives, an invalidate is sent immediately to the Pcache and OIP is set.
When the cache coherence transaction to the tag store is processed immediately after all the fills have arrived, a second invalidate will be issued to the Pcache, although it is not strictly necessary. The first invalidate is sent immediately so that the block in the Pcache is invalidated as soon as possible, to prevent the stale data from being accessed before the rest of the fills return.

13.4.4.1 Block-conflict in the FILL_CAM

Every new read or write from the Mbox is checked against valid FILL_CAM entries so that any transaction with a cache block conflict is stalled until all the fills return for the outstanding read, clearing the conflicting FILL_CAM entry. In this way, cache accesses to a block with an outstanding fill are prevented. When the cache is off or in ETM, writes are not checked for block conflict but are sent immediately to memory.

13.4.4.2 The FILL_CAM and DREAD_LOCKs

Each DREAD_LOCK from the Mbox is held in the FILL_CAM until the associated WRITE_UNLOCK completes, regardless of whether the read hits or misses in the backup cache. Only one DREAD_LOCK/WRITE_UNLOCK transaction is in progress at a time. A DREAD_LOCK which does not produce an owned hit in the backup cache results in an OREAD on the NDAL to gain ownership of the block so that the write can be done.

By holding the DREAD_LOCK address in the FILL_CAM from the time the DREAD_LOCK starts until the WRITE_UNLOCK completes, the Cbox prevents the block from being written back to memory during that time. This guarantees that the DREAD_LOCK/WRITE_UNLOCK sequence will not be interrupted by another CPU requesting ownership of the block. The CPU depends on no other state in memory once the OREAD is done in order to complete the WRITE_UNLOCK, so no deadlock can arise.
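The entry loading, fill counting, and block-conflict check described in Section 13.4.4 can be sketched in software as follows. This is only an illustrative model, not the hardware implementation. The names `FillCamEntry`, `load_entry`, `fill_returned`, and `has_block_conflict` are hypothetical. The model assumes one reading of the COUNT rule (a fill that arrives while COUNT=3 is the final fill) and assumes that a cache block is 32 bytes (four quadwords), so that two addresses conflict when they agree in bits <31:5>.

```python
from dataclasses import dataclass

BLOCK_SHIFT = 5  # assumption: a block is four quadwords (32 bytes)


@dataclass
class FillCamEntry:
    """Simplified model of one FILL_CAM entry (only some fields shown)."""
    valid: bool = False
    address: int = 0   # quadword-aligned address, ADDRESS<31:3>
    oread: bool = False
    dnf: bool = False  # do not fill
    rip: bool = False
    oip: bool = False
    count: int = 0     # COUNT<1:0>


def load_entry(address, oread, expected_quadwords, cache_usable):
    """Load an entry on a read miss.

    cache_usable is False when the cache is off, in ETM, or the miss is
    an I/O reference; DNF is set in those cases.
    """
    if expected_quadwords not in (1, 4):
        raise ValueError("expected_quadwords must be 1 or 4")
    return FillCamEntry(
        valid=True,
        address=address & ~0x7,  # force quadword alignment
        oread=oread,
        dnf=not cache_usable,
        rip=False,
        oip=False,
        count=0 if expected_quadwords == 4 else 3,
    )


def fill_returned(entry):
    """Account for one successful fill; return True if it was the final one."""
    if entry.count == 3:
        return True
    entry.count += 1
    return False


def has_block_conflict(cam, address):
    """Return True if address falls in a block held by a valid entry."""
    return any(
        e.valid and (e.address >> BLOCK_SHIFT) == (address >> BLOCK_SHIFT)
        for e in cam
    )
```

In this model a four-quadword miss reports completion on its fourth fill, and a single-quadword miss reports completion on its first fill.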
Every new transaction is checked against the FILL_CAM to ensure that the block is not inaccessible due to an outstanding fill or DREAD_LOCK. If either RDLK bit is set in the FILL_CAM, IREADs and DREADs are not processed. Incoming fills and coherency transactions continue normally, and the WRITE_QUEUE is serviced normally.

The only transaction which should appear in the WRITE_QUEUE (when either RDLK bit is set) is the WRITE_UNLOCK corresponding to the READ_LOCK. The one exception to this is when the READ_LOCK terminates in an error. In this case an IPR_WRITE to CEFSTS is the next transaction which appears in the WRITE_QUEUE. Specifically, a write-one-to-clear of the RDLK bit in CEFSTS has the side effect of clearing any RDLK bit in the FILL_CAM which is set.

If one of the RDLK bits is cleared in the FILL_CAM, hardware also clears the corresponding valid bit, freeing the entry for a new transaction. When the RDLK bit is cleared by a normal WRITE_UNLOCK, a cache coherency transaction is initiated if RIP or OIP was set on the entry. RIP and OIP are ignored when the RDLK bit is cleared by the "IPR write unlock" method.

13.5 Cbox Internal Processor Registers

The processor registers that are implemented by the NVAX Cbox are logically divided into three groups, as follows:

• Normal: Those IPRs that address individual registers in the NVAX CPU chip or system environment.
• Bcache tag IPRs: The read-write block of IPRs that allow direct access to the Bcache tags.
• Bcache deallocate IPRs: The write-only block of IPRs by which a Bcache block may be deallocated.

Each group of IPRs is distinguished by a particular pattern of bits in the IPR address, as shown in Figure 13-24.
Figure 13-24: IPR Address Space Decoding as seen by Software

Fields are listed from bit 31 (left) to bit 00 (right).

Normal IPR Address:              | SBZ | 0 | SBZ | IPR Number |
Bcache Tag IPR Address:          | SBZ | 1 | 0 | 0 | x | Bcache Tag Index | SBZ |
Bcache Deallocate IPR Address:   | SBZ | 1 | 0 | 1 | x | Bcache Tag Deallocate Index | SBZ |

The numeric range for each of the three groups is shown in Table 13-31.

Table 13-31: IPR Address Space Decoding

IPR Group           Mnemonic(1)   IPR Address Range (hex)    Contents
Normal                            00000000..000000FF         256 individual IPRs.
Bcache Tag          BCTAG         01000000..011FFFE0 (2)     64k Bcache tag IPRs, each separated by 20 (hex) from the previous one.
Bcache Deallocate   BCFLUSH       01400000..015FFFE0 (2)     64k Bcache tag deallocate IPRs, each separated by 20 (hex) from the previous one.

(1) The mnemonic is for the first IPR in the block.
(2) Unused fields in the IPR addresses for these groups should be zero. Neither hardware nor microcode detects and faults on an address in which these bits are non-zero.
Although non-contiguous address ranges are shown for these groups, the entire IPR address space maps into one of these groups. If these fields are non-zero, the operation of the CPU is UNDEFINED.

NOTE
The address ranges shown above are those used by the programmer. When processing normal IPRs, the microcode shifts the IPR number left by 2 bits for use as an IPR command address. This positions the IPR number to bits <9:2> and modifies the address range as seen by the hardware to 0..3FC, with bits <1:0>=00. No shifting is performed for the other groups of IPR addresses.

Because of the sparse addressing used for IPRs in groups other than the normal group, valid IPR addresses are not separated by one. Rather, valid IPR addresses are separated by 20 (hex). For example, the IPR address for Bcache tag 0 is 01000000 (hex), and the IPR address for Bcache tag 1 is 01000020 (hex). In this group, bits <4:0> of the IPR address are ignored, so IPR numbers 01000001 through 0100001F all address Bcache tag 0.

Processor registers in all groups except the normal group are processed entirely by the NVAX CPU chip and will never appear on the NDAL. This is also true for a number of the IPRs in the normal group. IPRs in the normal group that are not processed by the NVAX CPU chip are converted into I/O space references and passed to the system environment via a read or write command on the NDAL.

The processor registers implemented by the NVAX Cbox are shown in Table 13-32.
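The IPR address arithmetic described above can be illustrated with a short sketch. It is a software illustration only, and all function and constant names are hypothetical. The bit positions are assumptions inferred from the address ranges in Table 13-31: bit <24> distinguishes the normal group from the Bcache groups, bit <22> distinguishes deallocate from tag, and the 64k-entry index occupies bits <20:5>.

```python
NORMAL, BCTAG, BCFLUSH = "normal", "bctag", "bcflush"

BCTAG_BASE = 0x01000000    # first Bcache tag IPR (from Table 13-31)
BCFLUSH_BASE = 0x01400000  # first Bcache deallocate IPR (from Table 13-31)


def ipr_group(addr):
    """Classify a software-visible IPR address into one of the three groups."""
    if not addr & (1 << 24):
        return NORMAL
    return BCFLUSH if addr & (1 << 22) else BCTAG


def hw_command_address(ipr_number):
    """Normal IPR number shifted left by 2, as the hardware sees it."""
    if not 0 <= ipr_number <= 0xFF:
        raise ValueError("normal IPR numbers are 00..FF (hex)")
    return ipr_number << 2


def bcache_ipr_address(base, index):
    """Software address of the Bcache IPR with the given index (spacing 20 hex)."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index must fit in 16 bits")
    return base | (index << 5)


def bcache_index(addr):
    """Index selected by a Bcache IPR address; bits <4:0> are ignored."""
    return (addr >> 5) & 0xFFFF
```

For example, under these assumptions `hw_command_address(0xA0)` gives 280 (hex), which agrees with the Cbox address listed for CCTL, and every address from 01000001 through 0100001F selects Bcache tag 0.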
Table 13-32: Cbox Processor Registers

                                            Number
Register Name               Mnemonic   (Dec)  (Hex)  Type  Cbox Loc(1)  Cbox Addr(2)
Cbox Control Register       CCTL       160    A0     RW    Abus         280
Reserved for Cbox                      161    A1
Bcache Data ECC             BCDECC     162    A2     W     Dbus         288
Bcache Error Tag Status     BCETSTS    163    A3     RW    Abus         28C
Bcache Error Tag Index      BCETIDX    164    A4     R     Abus         290
Bcache Error Tag            BCETAG     165    A5     R     Abus         294
Bcache Error Data Status    BCEDSTS    166    A6     RW    Dbus         298
Bcache Error Data Index     BCEDIDX    167    A7     R     Abus         29C
Bcache Error Data ECC       BCEDECC    168    A8     R     Dbus         2A0
Reserved for Cbox                      169    A9
Reserved for Cbox                      170    AA
Fill Error Address          CEFADR     171    AB     R     Abus         2AC
Fill Error Status           CEFSTS     172    AC     RW    Abus         2B0
Reserved for Cbox                      173    AD
NDAL Error Status           NESTS      174    AE     RW    BIU          2B8
Reserved for Cbox                      175    AF
NDAL Error Output Address   NEOADR     176    B0     R     BIU          2C0
Reserved for Cbox                      177    B1
NDAL Error Output Command   NEOCMD     178    B2     R     BIU          2C8
Reserved for Cbox                      179    B3
NDAL Error Data High        NEDATHI    180    B4     R     BIU          2D0
Reserved for Cbox                      181    B5
NDAL Error Data Low         NEDATLO    182    B6     R     BIU          2D8
Reserved for Cbox                      183    B7
NDAL Error Input Command    NEICMD     184    B8     R     BIU          2E0
Reserved for Cbox                      185    B9
Reserved for Cbox                      186    BA
Reserved for Cbox                      187    BB
Reserved for Cbox                      188    BC
Reserved for Cbox                      189    BD
Reserved for Cbox                      190    BE
Reserved for Cbox                      191    BF
Bcache Tag                  BCTAG      (01000000 - 011FFFE0 hex)  RW  Abus

(1) Each Cbox IPR is located in the Cbox Abus datapath, the Cbox Dbus datapath, or the Cbox BIU datapath.
(2) The address given is as it is seen in the Cbox, after microcode has shifted the software address left by two bits.
Table 13-32 (Cont.): Cbox Processor Registers

Register Name               Mnemonic   Number                     Type  Cbox Loc
Bcache Deallocate           BCFLUSH    (01400000 - 015FFFE0 hex)  W     Abus

IPRs in the system and in the Cbox are accessed through IPR_READs and IPR_WRITEs from the Mbox to the Cbox. When the Cbox recognizes a valid IPR_READ on M%S6_CMD_H<4:0>, it loads the read into the DREAD_LATCH to be processed. The Mbox guarantees that only one DREAD or IPR_READ may be outstanding at a time, so that the DREAD_LATCH will not be overwritten. A valid IPR_WRITE is loaded into the WRITE_PACKER and proceeds immediately to the WRITE_QUEUE.

All IPR reads and writes to the Cbox flush the WRITE_QUEUE before completing. Any IPR_READ sets DWR_CONFLICT bits in all valid entries in the WRITE_QUEUE so that any preceding writes of any kind will complete before the IPR_READ. An IPR_WRITE is placed in the WRITE_QUEUE after the preceding writes so that the ordering takes place naturally. If a read arrives after the IPR_WRITE and before it has been processed, the WRITE_QUEUE conflict bits are set so that the WRITE_QUEUE takes priority over the read.

If the IPR_READ addresses one of the Cbox registers, the Cbox returns the data from the register through the CM_OUT_LATCH, in the usual way that it would return data for a read hit. The only difference is that it returns just one quadword or less of data, rather than the usual 4 quadwords. The Cbox asserts C%LAST_FILL_H so the Mbox does not expect any more fills.
If a write-only Cbox register is read, the Cbox returns UNPREDICTABLE data. Reading an unimplemented Cbox register returns UNPREDICTABLE data; if an unimplemented register is written, the write is discarded by the Cbox and normal operation continues.

If the Cbox receives an IPR access to a legal IPR address which is not within the Cbox block of IPR addresses, it converts it to an I/O space read or write. The Cbox merges the IPR address with E1000000 hex, effectively adding the base I/O space address of the IPR block to the IPR address. This is done in hardware by forcing bits <31:29> and bit <24> to 1's. (The other upper bits are expected to be received as zeros.) From this point on, the transaction is treated as an I/O space transaction by the Cbox. It sends the request off-chip to the NDAL through the NON_WRITEBACK_QUEUE. When the fill data returns, the data is returned to the Mbox but is not cached by the Cbox. I/O space reads and writes are never cached in the primary cache or the backup cache.

13.5.1 Cbox Control IPR (CCTL)

CCTL is a read/write register which contains bits controlling the behavior of the Cbox. The bits are detailed in Figure 13-25 and Table 13-33.
Figure 13-25: IPR A0 (hex), CCTL

Fields are listed from bit 31 (left) to bit 00 (right); x marks unused bits.

| HW_ETM | SW_ETM | x (13 bits) | FORCE_NDAL_PERR | PM_HIT_TYPE | PM_ACCESS_TYPE | DISABLE_PACK | TIMEOUT_TEST | SW_ECC | DISABLE_ERRORS | FORCE_HIT | SIZE | DATA_SPEED | TAG_SPEED | ENABLE |

Table 13-33: CCTL Field Descriptions

Name              Extent  Type   Description
ENABLE            0       RW,0   Turns the Bcache on and off.
TAG_SPEED         1       RW,0   Controls time to access the tag RAMs.
DATA_SPEED        3:2     RW,0   Controls time to access the data RAMs.
SIZE              5:4     RW,0   Selects one of four backup cache sizes.
FORCE_HIT         6       RW,0   Forces memory reads and writes to hit in the backup cache.
DISABLE_ERRORS    7       RW,0   Disables all backup cache ECC errors.
SW_ECC            8       RW,0   Enables use of ECC check bits as given by software for the tag store and data RAMs.
TIMEOUT_TEST      9       RW,0   Puts the Cbox read timeout counters into test mode.
DISABLE_PACK      10      RW,0   Disables the Cbox write packer.
PM_ACCESS_TYPE    13:11   RW,0   Selects type of Bcache access for the performance monitoring hardware.
PM_HIT_TYPE       15:14   RW,0   Selects type of Bcache hit for the performance monitoring hardware.
FORCE_NDAL_PERR   16      RW,0   Forces a parity error in the command field of the next outgoing NDAL transaction.
SW_ETM            30      RW,0   Used by software to put the backup cache into ETM.
HW_ETM            31      WC     Used by hardware to put the backup cache into ETM.
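The field extents in Table 13-33 can be turned into a small decode/encode helper, for example for use in diagnostic software or a simulator. The sketch below is illustrative only; `CCTL_FIELDS`, `decode_cctl`, and `encode_cctl` are hypothetical names, and the field positions are taken directly from the table.

```python
# Field name -> (least significant bit, width in bits), from Table 13-33.
CCTL_FIELDS = {
    "ENABLE": (0, 1),
    "TAG_SPEED": (1, 1),
    "DATA_SPEED": (2, 2),
    "SIZE": (4, 2),
    "FORCE_HIT": (6, 1),
    "DISABLE_ERRORS": (7, 1),
    "SW_ECC": (8, 1),
    "TIMEOUT_TEST": (9, 1),
    "DISABLE_PACK": (10, 1),
    "PM_ACCESS_TYPE": (11, 3),
    "PM_HIT_TYPE": (14, 2),
    "FORCE_NDAL_PERR": (16, 1),
    "SW_ETM": (30, 1),
    "HW_ETM": (31, 1),
}


def decode_cctl(value):
    """Split a 32-bit CCTL value into its named fields."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in CCTL_FIELDS.items()
    }


def encode_cctl(**fields):
    """Build a CCTL value from named fields; unnamed fields are left at 0."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = CCTL_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        value |= field_value << lsb
    return value
```

A call such as `encode_cctl(ENABLE=1, DATA_SPEED=0b01, SIZE=0b10)` produces the value that enables a 512-kilobyte cache with the middle data RAM speed. Note that HW_ETM is a write-one-to-clear bit, so writing a 1 to it clears it rather than setting it.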
13.5.1.1 ENABLE

When ENABLE=1, the backup cache is enabled for operation. When ENABLE=0, the backup cache is off and all references are treated as misses and are not looked up in the backup cache. When the backup cache is off, FORCE_HIT, SW_ETM and HW_ETM are ignored. Reset clears this bit so that the Bcache is off when the chip is reset.

13.5.1.2 TAG_SPEED

The Cbox provides this bit to select the speed of the tag RAMs. Table 13-34 shows the relationship of the value of TAG_SPEED and the access time of the tag RAMs, given in NVAX cycles. This is the total RAM access time including internal Cbox processing time. For information on the actual cache RAM access times required, see Section 13.3.1. Reset clears this bit so that the tag access repetition rate is 3 cycles when the chip is reset.

Table 13-34: TAG_SPEED

TAG_SPEED   tag read rep rate   tag write rep rate   comments
0           3 cycles            3 cycles
1           4 cycles            4 cycles             may not be used when DATA_SPEED=00

13.5.1.3 DATA_SPEED

The Cbox provides these bits to select the speed of the data RAMs. Table 13-35 shows the relationship of the value of DATA_SPEED and the access time of the data RAMs, given in NVAX cycles. This is the total RAM access time including internal Cbox processing time. For information on the actual cache RAM access times required, see Section 13.3.1. Reset clears these bits so that the data read rep rate is 2 cycles when the chip is reset.

Table 13-35: DATA_SPEED

DATA_SPEED   data read rep rate   data write rep rate   comments
00           2 cycles             3 cycles              may not be used when TAG_SPEED=1
01           3 cycles             4 cycles
10           4 cycles             5 cycles
11           unused(1)            unused(1)

(1) Cbox response in this mode is UNDEFINED.

The fastest DATA_SPEED may not be selected with the slowest TAG_SPEED, for in this configuration the result of the cache hit calculation is not known in time for the Cbox state machines to operate correctly.
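The restrictions in Tables 13-34 and 13-35 lend themselves to a simple configuration check that software could run before writing CCTL. The following is a minimal sketch under that assumption; `check_speeds` and the two lookup tables are hypothetical names, and the cycle counts are copied from the tables above.

```python
# TAG_SPEED -> (tag read rep rate, tag write rep rate), in NVAX cycles.
TAG_REP_RATE = {0: (3, 3), 1: (4, 4)}

# DATA_SPEED -> (data read rep rate, data write rep rate), in NVAX cycles.
# DATA_SPEED=11 is unused; Cbox response in that mode is UNDEFINED.
DATA_REP_RATE = {0b00: (2, 3), 0b01: (3, 4), 0b10: (4, 5)}


def check_speeds(tag_speed, data_speed):
    """Return ((tag read, tag write), (data read, data write)) rep rates.

    Raises ValueError for an unused encoding or for the disallowed pairing
    of the slowest TAG_SPEED with the fastest DATA_SPEED.
    """
    if tag_speed not in TAG_REP_RATE:
        raise ValueError("TAG_SPEED must be 0 or 1")
    if data_speed not in DATA_REP_RATE:
        raise ValueError("DATA_SPEED=11 is unused (UNDEFINED response)")
    if tag_speed == 1 and data_speed == 0b00:
        raise ValueError("TAG_SPEED=1 may not be used with DATA_SPEED=00")
    return TAG_REP_RATE[tag_speed], DATA_REP_RATE[data_speed]
```

The reset state, TAG_SPEED=0 with DATA_SPEED=00, passes this check and yields a 3-cycle tag repetition rate with a 2-cycle data read repetition rate.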
13.5.1.4 SIZE

Four backup cache sizes are selectable by using the SIZE bits, as shown in Table 13-36. These bits are cleared on reset so that when the chip is reset, the 128-kilobyte cache is selected by default.

Table 13-36: SIZE

SIZE<1:0>   Backup cache size
00          128 kilobytes
01          256 kilobytes
10          512 kilobytes
11          2 megabytes

13.5.1.5 FORCE_HIT

When FORCE_HIT is set, all memory references, both Dstream and Istream reads and writes, are forced to hit in the backup cache. The tag store state is not changed but data is always read or written. Reset clears this bit. The backup cache must be enabled when the cache is used in FORCE_HIT mode. This mode is expected to be used for testing purposes only.

13.5.1.6 DISABLE_ERRORS

When DISABLE_ERRORS is set, all ECC errors from the backup cache are ignored. Neither C%CBOX_H_ERR_H nor C%CBOX_S_ERR_H is asserted. C%CBOX_HARD_ERR_H is not asserted for data returning to the Mbox. The backup cache data syndrome is loaded into BCEDECC on every cache access; the behavior of BCETSTS, BCETIDX, BCETAG, BCEDSTS, and BCEDIDX is unpredictable. This feature allows operation of the backup cache even if the error detection and correction logic is faulty. It also allows access to the backup cache syndrome for the purposes of testing the ECC logic. Reset clears this bit.

13.5.1.7 SW_ECC

When SW_ECC is clear, the Cbox generates correct ECC check bits for all writes to the tag store and data RAMs. When SW_ECC is set, the Cbox does not generate the check bits when the backup cache is written with data, but uses the check bit values as specified by software and written in the BCDECC register. Note that if a read or write reference misses in the Bcache when SW_ECC is set, all four fills will be written with the ECC given in BCDECC when they return.
When SW_ECC is set and the tag store is written using an IPR write to BCTAG, the Cbox uses the check bits for the tag store as given through the IPR write. The value of SW_ECC does not affect tag store transactions other than IPR writes. Reset clears this bit.

13.5.1.8 TIMEOUT_TEST

When TIMEOUT_TEST is set, the Cbox uses the internal clock to clock its read timeout counters. When TIMEOUT_TEST is clear, the Cbox uses E%TIMEOUT_BASE_H to clock its timeout counters. Reset clears this bit.

13.5.1.9 DISABLE_PACK

When DISABLE_PACK is set, the Cbox does not pack quadword writes together. Instead, the write packer passes every write it receives directly into the write queue. When the bit is clear, the Cbox write packer operates normally. DISABLE_PACK is intended for testing purposes only. Reset clears this bit.

13.5.1.10 PM_ACCESS_TYPE

PM_ACCESS_TYPE selects the type of Bcache access for the performance monitoring hardware. The function of these three bits is fully described in Section 13.11. Reset clears these bits.

13.5.1.11 PM_HIT_TYPE

PM_HIT_TYPE selects the type of Bcache hit for the performance monitoring hardware. The function of these two bits is fully described in Section 13.11. Reset clears these bits.

13.5.1.12 FORCE_NDAL_PERR

When a 1 is written to FORCE_NDAL_PERR, a parity error is caused in the command field of the next outgoing NDAL transaction. The parity error is caused by inverting the value of P%PARITY_H<2>. Setting this bit causes only one parity error. The parity error does not occur until NVAX is granted the NDAL for its next outgoing transaction. If software sets FORCE_NDAL_PERR and clears it before NVAX is granted the bus, NVAX will still force a parity error on the next transaction. In order to produce a second parity error on the bus, FORCE_NDAL_PERR must be cleared and set again by software. Reset clears this bit.
13.5.1.13 SW_ETM

This is a software-writeable bit to put the backup cache into Error Transition Mode. When the cache is on and software ascertains that the cache is producing errors, it can set this bit in order to turn off the cache while ensuring cache coherency. Software can then flush owned data through use of the Bcache Deallocate IPR, BCFLUSH. In this manner, the unique data can be extracted from the cache before it is turned off completely. Reset clears this bit.

13.5.1.14 HW_ETM

Hardware sets this bit when an uncorrectable error is detected in the backup cache tag store or data RAMs, unless DISABLE_ERRORS is set. Hardware sets the bit to put the backup cache into Error Transition Mode. Software clears HW_ETM by writing a one to it.

13.5.2 IPR A2 (hex), BCDECC

Figure 13-26: Format of the BCDECC

Fields are listed from bit 31 (left) to bit 00 (right); x marks unused bits.

| x (6 bits) | ECCHI | x (12 bits) | ECCLO | x (6 bits) |

The ECCHI field corresponds to data check bits <7:4>. The ECCLO field corresponds to data check bits <3:0>.

This register is written by software. It is a write-only register. Software writes BCDECC using an IPR_WRITE. The value in the register is then used to explicitly write ECC into the data RAMs during any write of the data RAMs, but only if SW_ECC is set in the control register. If SW_ECC is not set, hardware ignores the value in BCDECC and generates the check bits to be written using the ECC syndrome generator.

BCDECC is expected to be used during testing only. It allows software to explicitly write bad ECC into the data RAMs in order to test Cbox error detection logic.
Note that BCDECC will be used as the source of the ECC check bits during any write of the backup cache data RAMs, including those done for fills. Cache transactions must be carefully controlled while this register is being used in order to obtain the expected results. BCDECC will probably be most useful when used in FORCE_HIT mode, so that no fills are generated. Reset does not affect this register.

13.5.3 Backup Cache Tag Store Error Registers (BCETSTS, BCETIDX, BCETAG)

On some tag store errors, hardware overwrites the corrupted values so that they cannot be diagnosed by reading the tag store directly. For this reason there are tag store error registers which hold the relevant data, so that software can understand the problem. The tag store error registers are loaded when any tag store error occurs. Their contents are not changed during reset.

The status bits in BCETSTS indicate what sort of error happened. Correctable errors are indicated by the CORR bit; the UNCORR and BAD_ADDR errors are both uncorrectable-type errors.

If no error is yet logged in the registers, the registers are loaded when either a correctable or an uncorrectable error occurs. Once the registers are loaded with information from a correctable error, they are locked against further correctable errors, and are only loaded again if an uncorrectable error happens. At this time either UNCORR or BAD_ADDR is set. The LOCK bit in BCETSTS is set as well. In this way, information from the first correctable error is held in the registers, and is only overwritten if an uncorrectable error happens later. The error registers are cleared and unlocked by software.

If the error registers hold data from a non-correctable error and yet another non-correctable error happens before the error registers are unlocked, the LOST_ERR bit is set.
This indicates to software that it does not have sufficient information in the error registers to recover from all uncorrectable errors which have occurred.

13.5.3.1 Bcache Error Tag Status (BCETSTS)

The BCETSTS register gives the general status of an error in the tag store, indicating the transaction which was taking place at the time and the type of error. The register is written by hardware and read by software. Hardware does not clear the error bits in this register; this must be done by software using write-one-to-clear to the bottom 5 bits of the register. The contents of the register are not changed during reset.

Figure 13-27: IPR A3 (hex), BCETSTS

Fields are listed from bit 31 (left) to bit 00 (right); x marks unused bits.

| x (22 bits) | TS_CMD | LOST_ERR | BAD_ADDR | UNCORR | CORR | LOCK |

Table 13-37: BCETSTS Field Descriptions

Name       Extent  Type  Description
LOCK       0       WC    Lock bit. Indicates that BCETSTS (except LOST_ERR), BCETIDX, and BCETAG are locked.
CORR       1       WC    Indicates that a correctable ECC error was encountered.
UNCORR     2       WC    Indicates that an uncorrectable ECC error was encountered.
BAD_ADDR   3       WC    Indicates that an addressing error was detected. This is an uncorrectable error.
LOST_ERR   4       WC    Indicates that more than one uncorrectable error occurred which was not recorded in the error registers.
TS_CMD     9:5     R     Indicates what tag store command was being processed at the time the error occurred.
13.5.3.1.1 LOCK

Whenever the tag store error registers are locked due to an uncorrectable error, the LOCK bit is set. At this time either UNCORR or BAD_ADDR is also set to indicate the type of uncorrectable error. When the LOCK bit is set, the BCETSTS, BCETIDX, and BCETAG registers are all locked. Clearing the lock bit unlocks all three registers. The LOCK bit is set by hardware and it is cleared by software. It is a write-one-to-clear bit.

13.5.3.1.2 CORR

CORR is set when the tag store ECC decoder detects a correctable error. When this occurs, the Bcache Tag Store Error registers are loaded and are locked against further correctable errors. They are not locked against an uncorrectable error which follows. BCETSTS<LOCK> is not set. If a correctable error is followed by an uncorrectable error, the CORR bit remains set. The CORR bit is set by hardware and it is cleared by software. It is a write-one-to-clear bit.

13.5.3.1.3 UNCORR

UNCORR is set when the tag store ECC decoder detects an uncorrectable error. When this occurs, the Bcache Tag Store Error registers are loaded and locked. The UNCORR bit and the BAD_ADDR bit are exclusive: only one of them is set for a given error which sets the LOCK bit. If the other type of error occurs later, the related bit is not set since the register is already locked. In this case, LOST_ERR is set instead. The UNCORR bit is set by hardware and it is cleared by software. It is a write-one-to-clear bit.

13.5.3.1.4 BAD_ADDR

BAD_ADDR is set when the tag store ECC decoder detects an error in the address bit, indicating some problem with the address lines going to the tag RAMs. This is an uncorrectable error; thus, when it occurs, the Bcache Tag Store Error registers are loaded and locked. The UNCORR bit and the BAD_ADDR bit are exclusive: only one of them is set for a given error which sets the LOCK bit. If the other type of error occurs later, the related bit is not set since the register is already locked.
In this case, LOST_ERR is set instead. The BAD_ADDR bit is set by hardware and it is cleared by software. It is a write-one-to-clear bit.

13.5.3.1.5 LOST_ERR

LOST_ERR indicates that after the first uncorrectable error was recorded in the tag store error registers, an additional uncorrectable error occurred for which state was not saved. LOST_ERR is set by hardware and is cleared by software. It is a write-one-to-clear bit.

13.5.3.1.6 TS_CMD

The TS_CMD field indicates what the tag store was doing when the error was detected. Its values are listed in Table 13-38.

Table 13-38: Interpretation of TS_CMD

TS_CMD   Name      Tag Store Operation
00111    DREAD     Data-stream (DREAD) or DREAD_MODIFY tag lookup
00011    IREAD     Instruction-stream tag lookup
00010    OREAD     Ownership-read tag lookup for a write or a READ_LOCK
01000    WUNLOCK   Ownership-read tag lookup for a WRITE_UNLOCK (lookup done only in ETM)
01101              Cache coherency tag lookup as the result of NDAL DREAD or IREAD
01001              Cache coherency tag lookup as the result of NDAL OREAD or WRITE
01010              Tag lookup for an explicit IPR deallocate operation

There are three tag store operations which do not cause any sort of errors: tag store update after a fill, IPR write of the tag store, and IPR read of the tag store. Thus, these commands will not appear in BCETSTS.

13.5.3.2 Bcache Error Tag Index (BCETIDX)

This register is loaded and locked when a tag store error occurs. If a correctable error is followed by a second error which is not correctable, the register is loaded with information from the second, more serious error. Except for this case, once it is locked, it is not changed until software explicitly unlocks the register. This register is written by hardware and read by software. Its contents are not changed during reset.
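The loading and locking rules for the tag store error registers can be summarized as a small behavioral model. The sketch below is an interpretation for illustration, not a description of the hardware. The class `TagErrorRegs` and its methods are hypothetical, the bit positions come from Table 13-37, and the model assumes that a correctable error arriving while LOCK or CORR is already set is simply not recorded.

```python
# Low five bits of BCETSTS, bit positions from Table 13-37.
LOCK, CORR, UNCORR, BAD_ADDR, LOST_ERR = (1 << n for n in range(5))


class TagErrorRegs:
    """Behavioral model of BCETSTS<4:0> together with BCETIDX."""

    def __init__(self):
        self.status = 0    # BCETSTS<4:0>
        self.index = None  # stands in for the address captured in BCETIDX

    def log_error(self, kind, address):
        """Record a tag store error of kind CORR, UNCORR, or BAD_ADDR."""
        if kind == CORR:
            # Assumption: held against further correctable errors once
            # any error has been captured.
            if self.status & (LOCK | CORR):
                return
            self.status |= CORR
            self.index = address
        elif kind in (UNCORR, BAD_ADDR):
            if self.status & LOCK:
                # Already locked: only LOST_ERR is set, nothing is reloaded.
                self.status |= LOST_ERR
                return
            # An uncorrectable error overrides an earlier correctable one;
            # CORR, if set, remains set.
            self.status |= kind | LOCK
            self.index = address
        else:
            raise ValueError("unknown error kind")

    def write_status(self, value):
        """Software write: each 1 in the low five bits clears that bit."""
        self.status &= ~(value & 0x1F)
```

In this model a correctable error followed by an uncorrectable error leaves CORR, UNCORR, and LOCK all set with the index of the uncorrectable error captured, and a further uncorrectable error sets only LOST_ERR.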
Figure 13-28: IPR A4 (hex), BCETIDX

Fields are listed from bit 31 (left) to bit 00 (right).

| Backup Cache Tag Store Address | 0 | 0 | 0 | 0 | 0 |

BCETIDX contains the complete hexaword address corresponding to a tag store request which resulted in an error. Since the full address is saved, both the cache index and the cache tag of the request are known. Thus, this register shows what index was being accessed when the error occurred as well as showing what the tag of the request was. Software can compare this tag with the actual tag read from the RAMs, which is saved in BCETAG.

On a BCFLUSH which incurs an error, the address used to flush the cache is captured in BCETIDX, not the memory address of the block.

13.5.3.3 Bcache Error Tag (BCETAG)

This register is loaded when a tag store error occurs. It is locked when an uncorrectable error occurs on a tag store access. Once the register is locked, it is not overwritten until it is unlocked by software. BCETAG is written by hardware and read by software. It is a read-only register from the software point of view. The contents of BCETAG are not changed during reset.

The register holds the data which was read from the tag store and produced the error, as shown in Figure 13-29.
Figure 13-29: IPR A5 (hex), BCETAG

Fields are listed from bit 31 (left) to bit 00 (right). The low-order bits of the TAG field hold tag bits or 0, based on cache size.

| TAG | ECC | OWNED | VALID | 0 (9 bits) |

Table 13-39: BCETAG Field Descriptions

Name    Extent  Type  Description
VALID   9       RO    Valid bit
OWNED   10      RO    Ownership bit
ECC     16:11   RO    ECC check bits
TAG     31:17   RO    Backup cache tag

13.5.3.3.1 VALID

VALID is the bit read from the tag RAMs which indicates whether the block is valid in the Bcache.

13.5.3.3.2 OWNED

OWNED is the bit read from the tag RAMs which indicates whether the Bcache owns the block in question.

13.5.3.3.3 ECC

The ECC field contains the check bits as read from the tag RAMs during the tag access which produced the error.

13.5.3.3.4 TAG

The TAG field of BCETAG is the cache tag as read from the tag RAMs. It must be interpreted based on the cache size being used, as shown in Table 13-40. When certain address bits are not used as tag bits for the cache size given, their value in BCETAG is 0.

Table 13-40: TAG Interpretation

Cache size      Tag bits used   Unused tag bits
128 kilobytes   TAG<31:17>      None
256 kilobytes   TAG<31:18>      TAG<17>
512 kilobytes   TAG<31:19>      TAG<18:17>
2 megabytes     TAG<31:21>      TAG<20:17>

13.5.4 Backup Cache Data RAM Error Registers (BCEDSTS, BCEDIDX, BCEDECC)

The data RAM error registers hold data relevant to errors in the backup cache data RAMs, so that software can understand the problem. BCEDSTS holds the general status of the problem. BCEDIDX holds the data RAM index being used when the problem occurred.
BCEDECC holds the syndrome bits as calculated on the data which was read from the RAMs when the problem occurred.

If no error is yet logged in the data RAM error registers, the registers are loaded when either a correctable or an uncorrectable error occurs. Once the registers are loaded with information from a correctable error, they are locked against further correctable errors, and are only loaded again if an uncorrectable error happens. If an uncorrectable error happens, the LOCK bit in BCEDSTS is set and the registers are not overwritten until software clears the error bits. In this way, information from the first correctable error is held in the registers, and is only overwritten if an uncorrectable error happens later.

If the registers are locked, any subsequent uncorrectable error causes the LOST_ERR bit to be set, but does not modify any other information in the registers. LOST_ERR indicates to software that it does not have sufficient information in the error registers to recover from all uncorrectable errors which have occurred.

Of the backup cache data RAM error registers, only BCEDSTS is writable by software. Software clears the error and lock bits, which re-enables all the data RAM error registers to record the next error which occurs.

The contents of BCEDSTS, BCEDIDX, and BCEDECC are not affected by reset.

13.5.4.1 Bcache Error Data Status (BCEDSTS)

Figure 13-30: IPR A6 (hex), BCEDSTS

    Bits       Contents
    <31:12>    x
    <11:8>     DR_CMD
    <7:5>      0
    <4>        LOST_ERR
    <3>        BAD_ADDR
    <2>        UNCORR
    <1>        CORR
    <0>        LOCK

Table 13-41: BCEDSTS Field Descriptions

    Name       Extent   Type   Description
    LOCK       0        WC     Lock bit. Indicates that the BCEDSTS, BCEDIDX, and BCEDECC registers are locked.
    CORR       1        WC     Indicates that a correctable ECC error was encountered.
    UNCORR     2        WC     Indicates that an uncorrectable ECC error was encountered.
    BAD_ADDR   3        WC     Indicates that an addressing error was detected.
    LOST_ERR   4        WC     Indicates that a second uncorrectable error occurred; it was not recorded in the error registers.
    DR_CMD     11:8     RO     Indicates what command was being processed by the data RAMs at the time the error occurred.

The LOCK bit is set when an error which was not correctable has occurred. If the CORR bit is set, the data RAM error registers are locked against further correctable errors, but are still loaded if an uncorrectable error occurs. On an uncorrectable error, the LOCK bit is set and the registers remain locked until unlocked by software.

The contents of BCEDSTS are not affected by reset.

13.5.4.1.1 LOCK

Whenever the data RAM error registers are loaded with an uncorrectable error, the LOCK bit is set. At this time either UNCORR or BAD_ADDR is also set to indicate the type of uncorrectable error. (A correctable error does not set BCEDSTS<LOCK>.)

When the LOCK bit is set, the BCEDSTS, BCEDIDX, and BCEDECC registers are all locked. Clearing the LOCK bit unlocks all three registers.

The LOCK bit is set by hardware and cleared by software. It is a write-one-to-clear bit.

13.5.4.1.2 CORR

CORR is set when the data ECC decoder detects a correctable error. When this occurs, the Bcache Data Error registers are loaded and locked against further correctable errors; BCEDSTS<LOCK> is not set.

The CORR bit is set by hardware and cleared by software. It is a write-one-to-clear bit.

13.5.4.1.3 UNCORR

UNCORR is set when the data ECC decoder detects an uncorrectable error. When this occurs, the Bcache Data Error registers are loaded and locked.
The UNCORR bit is set by hardware and cleared by software. It is a write-one-to-clear bit.

13.5.4.1.4 BAD_ADDR

BAD_ADDR is set when the data ECC decoder detects an error in the address bit, indicating some problem with the address lines going to the data RAMs. This is an uncorrectable error; thus, when it occurs, the Bcache Data Error registers are loaded and locked.

The BAD_ADDR bit is set by hardware and cleared by software. It is a write-one-to-clear bit.

13.5.4.1.5 LOST_ERR

LOST_ERR indicates that after the first uncorrectable error was recorded in the data error registers, an additional uncorrectable error occurred for which state was not saved.

LOST_ERR is set by hardware and cleared by software. It is a write-one-to-clear bit.

13.5.4.1.6 DR_CMD

The DR_CMD field indicates what the data RAMs were doing when the error was detected. Its values are listed in Table 13-42.

Table 13-42: Interpretation of DR_CMD

    DR_CMD   Command   Data RAM operation
    0111     DREAD     Data lookup for a Dstream read
    0011     IREAD     Data lookup for an Istream read
    0100     WBACK     Data lookup for a writeback
    0010     RMW       Data lookup for a read-modify-write (done for normal writes and WRITE_UNLOCKs)

There are two data RAM operations which do not cause any sort of error: full quadword writes and fills. Thus, these commands will not appear in BCEDSTS.

DR_CMD is only written by hardware. It is read-only for software.

13.5.4.2 Bcache Error Data Index (BCEDIDX)

This register holds the index of a data RAM transaction; it is loaded when an error is detected on a data RAM access. The index loaded due to a correctable error is not overwritten unless an uncorrectable error occurs afterwards. If an uncorrectable error occurs, BCEDIDX is loaded and locked. BCEDIDX is unlocked by software; the lock bit is in the BCEDSTS register.

BCEDIDX is read-only from software's point of view. Its contents are not affected by reset.
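As an illustration only, the BCEDSTS layout of Table 13-41 and the DR_CMD encodings of Table 13-42 could be decoded by diagnostic software along the following lines. This is a minimal sketch, not part of the chip or its microcode: the function names `decode_bcedsts` and `w1c_mask` are hypothetical, and the sketch assumes the raw 32-bit register value has already been obtained by an IPR read.

```python
# Bit positions follow Table 13-41; DR_CMD encodings follow Table 13-42.
# All names below are illustrative and do not come from the specification.
BCEDSTS_FLAGS = {"LOCK": 0, "CORR": 1, "UNCORR": 2, "BAD_ADDR": 3, "LOST_ERR": 4}
DR_CMD_NAMES = {0b0111: "DREAD", 0b0011: "IREAD", 0b0100: "WBACK", 0b0010: "RMW"}


def decode_bcedsts(value):
    """Split a 32-bit BCEDSTS value into its flag bits and DR_CMD field."""
    flags = {name: bool((value >> bit) & 1) for name, bit in BCEDSTS_FLAGS.items()}
    dr_cmd = (value >> 8) & 0xF  # DR_CMD occupies bits <11:8>
    # Encodings not listed in Table 13-42 are reported as None rather than guessed.
    return {"flags": flags, "dr_cmd": dr_cmd, "dr_cmd_name": DR_CMD_NAMES.get(dr_cmd)}


def w1c_mask(*names):
    """Build the value to write to BCEDSTS to clear the named write-one-to-clear bits."""
    return sum(1 << BCEDSTS_FLAGS[name] for name in names)
```

For example, an error handler that has finished logging an uncorrectable error might write `w1c_mask("LOCK", "UNCORR")` back to BCEDSTS to unlock all three data RAM error registers.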
Figure 13-31: IPR A7 (hex), BCEDIDX

    Bits       Contents
    <31:21>    0
    <20:3>     Backup cache data RAM index (index or undefined, based on cache size)
    <2:0>      0

BCEDIDX must be interpreted based on the cache size being used, as shown in Table 13-43. When certain address bits are not used as index bits for the cache size given, their value in BCEDIDX is undefined.

Table 13-43: BCEDIDX Interpretation

    Cache size       Index bits used   Undefined index bits
    128 kilobytes    BCEDIDX<16:3>     BCEDIDX<20:17>
    256 kilobytes    BCEDIDX<17:3>     BCEDIDX<20:18>
    512 kilobytes    BCEDIDX<18:3>     BCEDIDX<20:19>
    2 megabytes      BCEDIDX<20:3>     None

13.5.4.3 Bcache Error Data ECC (BCEDECC)

This register holds the syndrome as calculated on the backup cache data and check bits. It is loaded when an error occurs on a data RAM access, and then follows the same lock rules as the other Bcache Data Error registers. It is unlocked by software; the lock bit is in the BCEDSTS register. The contents of BCEDECC are not affected by reset.

When DISABLE_ERRORS is set, BCEDECC is loaded on every quadword read from the cache. This provides a way of testing the ECC logic by reading the results of the syndrome calculation. Note that because four quadwords are read from the Bcache at a time, BCEDECC will contain the syndrome from the LAST quadword read after the four-quadword transaction is complete. Software can control which quadword is read last by varying the requested quadword of a transaction; the Bcache controller always returns the requested quadword first, then returns the remaining three quadwords in wraparound order.
For example, if the programmer wants to see the contents of BCEDECC after quadword 2, she would do a read to quadword 3 of the block, and the quadwords would be read out in the order 3-0-1-2.

Software can use BCDECC to write known check bits to the data RAMs; when the RAMs are read, the syndrome is captured by BCEDECC. Once the syndrome is known, the check bits which were calculated by the ECC hardware can be deduced, because the check bits read from the RAMs were known. The syndrome is simply the XOR of the calculated check bits and the check bits which were read from the RAMs.

If the programmer wants to learn what the correct check bits for a particular data pattern should be, she can write data to the cache while BCDECC contains all zeros and CCTL<SW_ECC> is set. This forces check bits of zero to be written to the cache with the data. When the data is read back, BCEDECC will contain the correct check bits for the data (the XOR of the check bits read and the check bits calculated by hardware).

BCEDECC is read-only from software's point of view.

Figure 13-32: IPR A8 (hex), BCEDECC

    Bits       Contents
    <31:26>    x
    <25:22>    ECCHI
    <21:10>    x
    <9:6>      ECCLO
    <5:0>      x

The ECCHI field corresponds to syndrome bits <7:4>. The ECCLO field corresponds to syndrome bits <3:0>.

13.5.5 Fill Error Registers (CEFADR, CEFSTS)

Some errors are related to outstanding reads to memory. These errors may be diagnosed using the CEFSTS and CEFADR registers.
CEFSTS holds general information about the type of read outstanding; CEFADR holds the address of the outstanding read.

The contents of these registers are not changed during reset.

13.5.5.1 Cbox Error Fill Status (CEFSTS)

The CEFSTS register holds information related to a problem on a read which was sent to memory. If a read request to memory times out or is terminated with RDE, the CEFSTS register and the CEFADR register are loaded and locked.

The register is read-write. Only the lowest five bits and the UNEXPECTED_FILL bit may be written, and then only to clear them after an error.

CEFSTS is not affected by reset.

Figure 13-33: IPR AC (hex), CEFSTS

    Bits       Contents
    <31:22>    x
    <21>       UNEXPECTED_FILL
    <20:17>    x
    <16:15>    COUNT
    <14>       REQ_FILL_DONE
    <13>       RDLK_FL_DONE
    <12>       DNF
    <11>       OIP
    <10>       RIP
    <9>        TO_MBOX
    <8>        WRITE
    <7>        OREAD
    <6>        IREAD
    <5>        ID0
    <4>        LOST_ERR
    <3>        RDE
    <2>        TIMEOUT
    <1>        LOCK
    <0>        RDLK

Table 13-44: CEFSTS Field Descriptions

    Name              Extent   Type   Description
    RDLK              0        WC     Indicates that a READ_LOCK was in progress.
    LOCK              1        WC     Indicates that an error occurred and the register is locked.
    TIMEOUT           2        WC     FILL failed due to transaction timeout.
    RDE               3        WC     FILL failed due to Read Data Error.
    LOST_ERR          4        WC     Indicates that more than one error related to fills occurred.
    ID0               5        RO     NDAL identification bit for the read request.
    IREAD             6        RO     This is an Istream read from the Mbox which may be aborted.
    OREAD             7        RO     This is an outstanding OREAD.
    WRITE             8        RO     This read was done for a write.
    TO_MBOX           9        RO     Data is to be returned to the Mbox.
    RIP               10       RO     READ invalidate pending.
    OIP               11       RO     OREAD invalidate pending.
    DNF               12       RO     Do not fill: data is not to be written into the cache or validated when the fill returns.
    RDLK_FL_DONE      13       RO     Indicates that the last fill for a READ_LOCK arrived.
    REQ_FILL_DONE     14       RO     Indicates that the requested quadword was successfully returned from the NDAL.
    COUNT             16:15    RO     For a memory space transaction, indicates how many of the fill quadwords have been successfully returned. For I/O space, is set to 11(BIN) when the transaction starts, as only one quadword will be returned.
    UNEXPECTED_FILL   21       WC     Set to indicate that an unexpected fill was received from the NDAL.

13.5.5.1.1 RDLK

RDLK is set to show that a READ_LOCK is in progress. This bit is write-one-to-clear. The side effect of performing a write-one-to-clear to this bit is to clear the VALID bit for an entry which had its RDLK bit set; this has the effect of clearing out the FILL_CAM entry. This is the same action which is taken when a WRITE_UNLOCK is received. Microcode uses this functionality during certain error sequences; the bit is implemented in the zero position to make the microcoding as efficient as possible.

This bit is normally not read as a one by software, because the microcode ensures that the READ_LOCK-WRITE_UNLOCK sequence is an indivisible operation. If, however, the first quadword of a READ_LOCK is returned successfully and then the transaction either times out or is terminated in RDE, CEFSTS is loaded with the RDLK bit set.

13.5.5.1.2 LOCK

The LOCK bit is set when a read transaction which has been sent to memory terminates in Read Data Error or in Timeout. At the same time, all information corresponding to the read is loaded from the FILL_CAM into the CEFSTS register. When the LOCK bit is set, one of TIMEOUT, RDE, or UNEXPECTED_FILL is also set to indicate the type of error.
Once the LOCK bit is set, none of the information in CEFSTS or CEFADR changes, with the possible exception of LOST_ERR, until the LOCK bit is cleared.

Hardware sets the LOCK bit and software clears it by writing a one to that location.

13.5.5.1.3 TIMEOUT

TIMEOUT is set when a read transaction which was sent to the NDAL times out for some reason. When TIMEOUT is set, the LOCK bit is also set.

Hardware sets the TIMEOUT bit and software clears it by writing a one to that location.

13.5.5.1.4 RDE

RDE (Read Data Error) is set when a read transaction which was sent to the NDAL terminates in RDE. When the RDE bit is set, the LOCK bit is also set. The UNEXPECTED_FILL bit will be set as well if the RDE was actually unexpected (no read corresponding to the RDE was outstanding when that RDE was received).

Hardware sets the RDE bit and software clears it by writing a one to that location.

13.5.5.1.5 LOST_ERR

The LOST_ERR bit is set when CEFSTS is already locked and another RDE, timeout, or unexpected fill error occurs. This indicates to software that multiple errors have happened and state has not been saved for every error.

Hardware sets the LOST_ERR bit and software clears it by writing a one to that location.

13.5.5.1.6 ID0

ID0 corresponds to the NDAL signal P%ID_H<0> which was issued with the read that failed. It also indicates which one of the two FILL_CAM entries was used to save information about the transaction while it was outstanding.

13.5.5.1.7 IREAD

IREAD indicates that the transaction in error was an IREAD.

13.5.5.1.8 OREAD

OREAD indicates that the transaction in error was an OREAD; the OREAD may have been done for a write, a READ_LOCK, or a read modify.

13.5.5.1.9 WRITE

WRITE indicates that the transaction in error was an OREAD done because of a write request.

13.5.5.1.10 TO_MBOX

TO_MBOX indicates that data returning for the read was to be sent to the Mbox.
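The load, lock, and write-one-to-clear rules described above for LOCK, TIMEOUT, RDE, and LOST_ERR can be summarized with a small behavioral model. This is a simplified sketch for reasoning about error-handler logic, not a description of the hardware implementation: the class `FillErrorModel` and its methods are hypothetical, and the side effect of clearing RDLK on the FILL_CAM entry is deliberately not modeled.

```python
# Bit masks follow the CEFSTS field extents given in Table 13-44.
LOCK, TIMEOUT, RDE, LOST_ERR = 1 << 1, 1 << 2, 1 << 3, 1 << 4
UNEXPECTED_FILL = 1 << 21
ERROR_BITS = TIMEOUT | RDE | UNEXPECTED_FILL
W1C_BITS = 0x1F | UNEXPECTED_FILL   # lowest five bits plus UNEXPECTED_FILL
FILL_STATE_MASK = 0x1FFE1           # RDLK (bit 0) and the read-only bits <16:5>


class FillErrorModel:
    """Simplified model of how CEFSTS and CEFADR load, lock, and clear."""

    def __init__(self):
        self.cefsts = 0
        self.cefadr = 0

    def record_error(self, kind_bits, address, fill_state=0):
        """Hardware side: an RDE, timeout, or unexpected fill has occurred."""
        if self.cefsts & LOCK:
            # Already locked: only LOST_ERR may change.
            self.cefsts |= LOST_ERR
            return
        self.cefsts = (fill_state & FILL_STATE_MASK) | (kind_bits & ERROR_BITS) | LOCK
        self.cefadr = address

    def write_cefsts(self, value):
        """Software side: writing a one clears a write-one-to-clear bit."""
        self.cefsts &= ~(value & W1C_BITS)
```

In this model a second error arriving while LOCK is set leaves CEFADR and the error-type bits untouched, which is why software must treat LOST_ERR as a sign that the saved state describes only the first error.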
13.5.5.1.11 RIP

RIP (Read Invalidate Pending) is set when a cache coherency transaction due to a read on the NDAL is requested for a block which has OREAD fills outstanding at the time. This triggers a writeback of the block when the fill data arrives; a valid copy of the data is kept in the cache.

13.5.5.1.12 OIP

OIP (OREAD Invalidate Pending) is set when a cache coherency transaction due to an OREAD or a WRITE on the NDAL is requested for a block which has OREAD fills outstanding at the time. This triggers a writeback and invalidate of the block when the fill data arrives.

13.5.5.1.13 DNF

DNF (Do Not Fill) is set when data for a read is not to be written into the Bcache. This is the case when the cache is off, in ETM, or when the read is to I/O space. The assertion of this bit prevents the block from being validated in the cache.

13.5.5.1.14 RDLK_FL_DONE

This bit is set in the FILL_CAM when a READ_LOCK hits in the Bcache or the last fill arrives from the BIU for a READ_LOCK. Once this is set, the corresponding WRITE_UNLOCK is allowed to proceed. This overrides the FILL_CAM block conflict on the WRITE_UNLOCK, which is inevitable since the READ_LOCK is held in the FILL_CAM until the WRITE_UNLOCK is done.

13.5.5.1.15 REQ_FILL_DONE

REQ_FILL_DONE is set when the requested quadword of data was successfully received from the NDAL. This allows error handling software to differentiate between an error which occurred before the requested data was received and an error which occurred after the requested data was received.

If the error occurs while the requested data is being returned, such as the requested data being returned with RDE, it is as if the requested data was not received. REQ_FILL_DONE will not be set because the requested data was not successfully received.
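To illustrate how error handling software might use these fields, the following sketch unpacks a CEFSTS value using the extents of Table 13-44 and applies the REQ_FILL_DONE distinction described above. The function names are hypothetical and the sketch assumes the 32-bit register value has already been read.

```python
# Single-bit fields and their positions, taken from Table 13-44.
CEFSTS_BITS = {
    "RDLK": 0, "LOCK": 1, "TIMEOUT": 2, "RDE": 3, "LOST_ERR": 4, "ID0": 5,
    "IREAD": 6, "OREAD": 7, "WRITE": 8, "TO_MBOX": 9, "RIP": 10, "OIP": 11,
    "DNF": 12, "RDLK_FL_DONE": 13, "REQ_FILL_DONE": 14, "UNEXPECTED_FILL": 21,
}


def decode_cefsts(value):
    """Return every CEFSTS field as a bool, plus COUNT as an integer 0..3."""
    fields = {name: bool((value >> bit) & 1) for name, bit in CEFSTS_BITS.items()}
    fields["COUNT"] = (value >> 15) & 0x3  # COUNT occupies bits <16:15>
    return fields


def requested_data_arrived_before_error(value):
    """True if an error is logged and the requested quadword had already arrived."""
    fields = decode_cefsts(value)
    return fields["LOCK"] and fields["REQ_FILL_DONE"]
```

A handler could use the second function to decide whether the instruction that requested the data saw good data before the fill sequence failed.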
13.5.5.1.16 COUNT

These two bits indicate how many of the expected four quadwords have been returned successfully from memory for this read. If they are 00(BIN), no quadwords have returned; if they are 01(BIN), one quadword has returned; and so on. If the entry was for a quadword read, the count bits are set to 11(BIN) when the reference is sent out.

As an example, if RDE is returned before any other RDR returns for a hexaword request, COUNT will be 00(BIN), to indicate that no quadwords of data were successfully returned.

13.5.5.1.17 UNEXPECTED_FILL

UNEXPECTED_FILL is set to indicate that an RDE or an RDR cycle was received from the NDAL with an ID for which the FILL_CAM entry was not valid. When UNEXPECTED_FILL is set, CEFSTS and CEFADR are loaded and locked. RDE will also be set if the unexpected fill was an RDE rather than an RDR.

UNEXPECTED_FILL is a write-one-to-clear bit which is set by hardware and cleared by software.

13.5.5.2 Fill Error Address (CEFADR)

The CEFADR register holds the original quadword read address of a fill which ended in an error condition. It is loaded when an error is detected on a fill. It is a read-only register.

CEFADR is locked when CEFSTS is locked. Its contents are not changed during reset.

Figure 13-34: IPR AB (hex), CEFADR

    Bits      Contents
    <31:3>    Fill error address
    <2:0>     0

13.5.6 NDAL Error Registers (NESTS, NEOADR, NEOCMD, NEDATHI, NEDATLO, NEICMD)

The NDAL error registers hold information related to NDAL errors.
NESTS, NDAL Error Status, holds error bits relating to any problems encountered. NEOADR, NDAL Error Output Address, holds the address corresponding to the cycle which was in error. NEOCMD, NDAL Error Output Command, holds the command bits corresponding to the cycle in error. NEDATHI, NDAL Error Data High Longword, and NEDATLO, NDAL Error Data Low Longword, hold the data from an NDAL cycle where NVAX detected a parity error on the bus. NEICMD, NDAL Error Input Command, holds the command bits corresponding to a cycle with a parity error.

The contents of the NDAL error registers are not changed during reset.

13.5.6.1 NDAL Error Status IPR (NESTS)

The NESTS register holds information about any errors which happened on the NDAL. All six bits in this register are write-one-to-clear.

Reset does not affect this register. Power-up does not initialize the register.

Figure 13-35: IPR AE (hex), NESTS

    Bits      Contents
    <31:6>    x
    <5>       LOST_PERR
    <4>       INCON_PERR
    <3>       PERR
    <2>       LOST_OERR
    <1>       BADWDATA
    <0>       NOACK

Table 13-45: NESTS Field Descriptions

    Name         Extent   Type   Description
    NOACK        0        WC     Indicates that P%ACK_L was not asserted for an outgoing NVAX cycle. This bit locks NEOADR and NEOCMD.
    BADWDATA     1        WC     Indicates that an outgoing data cycle was accompanied by the BADWDATA command. This bit locks NEOADR and NEOCMD.
    LOST_OERR    2        WC     Indicates that multiple outgoing errors, either NOACK or BADWDATA, were detected.
    PERR         3        WC     Indicates that a parity error was detected on the NDAL. This bit locks NEDATHI, NEDATLO, and NEICMD.
    INCON_PERR   4        WC     Inconsistent parity error.
    LOST_PERR    5        WC     Indicates that multiple NDAL parity errors were detected.

13.5.6.1.1 NOACK

NOACK is set when NVAX detects that P%ACK_L was not asserted on the NDAL for an outgoing NVAX cycle. When NOACK is set, NEOADR and NEOCMD are locked so that software can read them to see what transaction was being attempted when the error occurred.

NOACK is set on any outgoing NVAX cycle which is not acknowledged, whether it was an address cycle or a data cycle. The information which is locked in NEOADR and NEOCMD corresponds to the address cycle of the transaction. For example, if an outgoing write data cycle is not acknowledged, the address cycle for that write operation is saved in NEOADR and NEOCMD.

NOACK is not set if there was a previous BADWDATA. If a BADWDATA cycle is NOACK'd, both BADWDATA and NOACK are set.

NOACK is cleared by a write-one-to-clear.

13.5.6.1.2 BADWDATA

BADWDATA is set when the BIU receives data for a writeback from the cache which had an uncorrectable ECC error, and thus is being issued on the NDAL with the BADWDATA command. When BADWDATA is set, NEOADR and NEOCMD are locked so that software can read them to retrieve the information about the failure. The address for the write operation is captured in NEOADR, and the command information for the cycle is captured in NEOCMD.

BADWDATA is not set if there was a previous NOACK. If a BADWDATA cycle is NOACK'd, both BADWDATA and NOACK are set.

13.5.6.1.3 LOST_OERR

LOST_OERR is set when NOACK or BADWDATA is already set and another one of those errors occurs. It notifies software that state was saved only for the first outgoing error.

LOST_OERR is cleared by a write-one-to-clear.

13.5.6.1.4 PERR

PERR is set when NVAX detects a parity error on the NDAL.
When PERR is set, NEDATHI, NEDATLO, and NEICMD are locked so that software can read them to see what was on the NDAL when the error occurred. Since NVAX calculates parity on every cycle, PERR will be set on both its own transfers and the transfers of other devices which fail the parity check.

PERR is cleared by a write-one-to-clear.

13.5.6.1.5 INCON_PERR

INCON_PERR (inconsistent parity error) is set when an NDAL parity error is detected on a cycle which is also acknowledged with P%ACK_L. This means that NVAX detected a parity error but some other device acknowledged the transfer.

INCON_PERR is only set in conjunction with PERR. If one NDAL parity error has already occurred, setting PERR, but INCON_PERR was not set for that cycle, a subsequent cycle with an inconsistent parity error will not cause INCON_PERR to be set.

INCON_PERR is cleared by a write-one-to-clear.

13.5.6.1.6 LOST_PERR

LOST_PERR is set when PERR is already set and another NVAX transfer fails the parity check. LOST_PERR notifies software that multiple NVAX transfers have failed the parity check; state was saved only for the first.

LOST_PERR is cleared by a write-one-to-clear.

13.5.6.2 NDAL Error Output Address IPR (NEOADR)

The NEOADR register is loaded for every address cycle which the Cbox drives onto the NDAL, unless it is locked. It is loaded during the cycle when the corresponding P%ACK_L should be asserted on the NDAL. It is locked when the NOACK bit in the NESTS register is set.

When NEOADR is locked, it contains the address information for the first transaction which failed. If it is read when it is not locked, it contains information from the last address cycle which was acknowledged on the NDAL.

The format of NEOADR matches the low longword of the NDAL during an address cycle.

NEOADR is read-only to software. Its contents are not changed during reset.
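The relationship between the NESTS error bits and the registers they lock (Table 13-45) can be expressed compactly. The following is an illustrative sketch only; `decode_nests` and `locked_registers` are hypothetical helper names, and the raw NESTS value is assumed to have been read already.

```python
# Bit positions follow Table 13-45.
NESTS_BITS = {
    "NOACK": 0, "BADWDATA": 1, "LOST_OERR": 2,
    "PERR": 3, "INCON_PERR": 4, "LOST_PERR": 5,
}


def decode_nests(value):
    """Return each NESTS error bit as a bool."""
    return {name: bool((value >> bit) & 1) for name, bit in NESTS_BITS.items()}


def locked_registers(value):
    """List the NDAL error registers whose contents are held by the set bits."""
    fields = decode_nests(value)
    locked = []
    if fields["NOACK"] or fields["BADWDATA"]:
        locked += ["NEOADR", "NEOCMD"]              # outgoing-cycle state
    if fields["PERR"]:
        locked += ["NEDATHI", "NEDATLO", "NEICMD"]  # parity-error state
    return locked
```

Software would normally read the registers returned by `locked_registers` before writing ones to the corresponding NESTS bits, since clearing those bits allows the registers to be reloaded.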
Figure 13-36: IPR B0 (hex), NEOADR

    Bits      Contents
    <31:0>    NDAL address

13.5.6.3 NDAL Error Output Command (NEOCMD)

The NEOCMD register is loaded and locked exactly as NEOADR is loaded and locked.

The format of NEOCMD is similar to that of the high longword of the NDAL during an address cycle. The high quadword byte enable positions are NOT included, since NVAX only uses quadword byte-enabled transactions; and the NDAL ID and command are added in the low-order bits of the longword (bits <6:0>).

The contents of NEOCMD are not affected by reset.

Figure 13-37: IPR B2 (hex), NEOCMD

    Bits       Contents
    <31:30>    LEN
    <29:16>    x
    <15:8>     BYTE_EN
    <7>        0
    <6:4>      ID
    <3:0>      CMD

Table 13-46: NEOCMD Field Descriptions

    Name      Extent   Type   Description
    CMD       3:0      RO     NDAL command as driven by NVAX during the transaction. For specific values, see Section 3.3.4.2.
    ID        6:4      RO     Commander ID as driven by NVAX during the transaction. For specific values, see Section 3.3.4.3.
    BYTE_EN   15:8     RO     Byte enable as driven by NVAX during the transaction. For specific values, see Section 3.3.4.1.
    LEN       31:30    RO     Length of the NDAL transaction. For specific values, see Section 3.3.4.1.

The meanings of these fields are described in Chapter 3.

13.5.6.4 NDAL Error Input Command (NEICMD)

NEICMD, NEDATHI, and NEDATLO are loaded at the same time and they are locked at the same time.
They are all loaded when a parity error occurs; at this time the PERR bit is set in NESTS, which locks the three registers. If a second NDAL parity error happens, the registers are not loaded again; they remain locked until software clears PERR.

NEICMD contains the P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0> bits from the failed transfer.

NEICMD is a read-only register. Its contents are not changed during reset.

Figure 13-38: IPR B8 (hex), NEICMD

    Bits       Contents
    <31:10>    x
    <9:7>      PARITY
    <6:4>      ID
    <3:0>      CMD

13.5.6.4.1 PARITY

The PARITY field corresponds to the NDAL lines P%PARITY_H<2:0>.

13.5.6.4.2 ID

The ID field corresponds to the NDAL lines P%ID_H<2:0>.

13.5.6.4.3 CMD

The CMD field corresponds to the NDAL lines P%CMD_H<3:0>.

13.5.6.5 NDAL Error Data High and NDAL Error Data Low (NEDATHI and NEDATLO)

NEDATHI and NEDATLO behave analogously to NEICMD. They capture P%NDAL_H<63:0> during a cycle with a parity error. NEDATHI contains the high longword of data from the NDAL (P%NDAL_H<63:32>); NEDATLO contains the low longword of data from the NDAL (P%NDAL_H<31:0>).

The format of NEDATHI and NEDATLO must be interpreted based on the CMD found in NEICMD. If the CMD field shows that the cycle was a data cycle, the registers contain two longwords of data. If the CMD field shows that the cycle was an address cycle, the registers are in the format of an NDAL address cycle, as shown in Figure 13-39 and Figure 13-40.

The contents of NEDATHI and NEDATLO are not affected by reset.
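As a rough illustration of how the three parity-error registers fit together, the sketch below unpacks NEICMD and reassembles the 64 NDAL lines captured in NEDATHI and NEDATLO. The field positions are those read off the NEICMD register figure (PARITY <9:7>, ID <6:4>, CMD <3:0>); the function names are hypothetical, and the decision whether a given CMD value denotes an address cycle or a data cycle is left to the command encodings in Chapter 3, which are not reproduced here.

```python
def decode_neicmd(value):
    """Split NEICMD into the captured P%CMD_H, P%ID_H, and P%PARITY_H values."""
    return {
        "CMD": value & 0xF,            # bits <3:0>
        "ID": (value >> 4) & 0x7,      # bits <6:4>
        "PARITY": (value >> 7) & 0x7,  # bits <9:7>
    }


def ndal_quadword(nedathi, nedatlo):
    """Recombine the two captured longwords into P%NDAL_H<63:0>."""
    return ((nedathi & 0xFFFFFFFF) << 32) | (nedatlo & 0xFFFFFFFF)
```

For an address cycle, the low longword returned by `ndal_quadword` carries the address and the high longword carries the length and byte-enable information, as the two register figures indicate.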
Figure 13-39: IPR B4 (hex), NEDATHI

    Fields, from bit 31 down to bit 0: LEN, UNDEFINED, BYTE_EN, UNDEFINED

Figure 13-40: IPR B6 (hex), NEDATLO

    Bits      Contents
    <31:0>    address

13.5.7 Backup Cache Tag Store Access Through IPR Reads and Writes (BCTAG)

Direct access to the backup cache tag store is provided to aid in error recovery and diagnosis and to assist testing. These accesses work whether the cache is on or off, in ETM, or in force hit mode.

If there is a valid FILL_CAM entry for the same cache block which is being accessed through an IPR read or write, the IPR read or write is stalled until the fills return and the FILL_CAM entry is no longer valid.

When the backup cache tag store is being accessed through IPR reads and writes, address bits <24:22> = 100 (BINARY). Address bits <20:5> are used as the index into the tag store RAMs; these indicate which backup cache location is to be written or read.
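The addressing rule above amounts to a fixed select code in bits <24:22> combined with an index in bits <20:5>. A minimal sketch of forming such an IPR address follows; the helper name `bctag_ipr_address` and the constant names are hypothetical, and the sketch assumes all other address bits are written as zero.

```python
BCTAG_SELECT = 0b100 << 22  # address bits <24:22> = 100 (binary)
INDEX_MASK = 0x001FFFE0     # address bits <20:5>


def bctag_ipr_address(block_address):
    """Form the IPR address that selects the tag store entry for a block."""
    # Only the index bits of the block address are kept; the rest are zeroed.
    return BCTAG_SELECT | (block_address & INDEX_MASK)
```

With an index of zero this yields 01000000 (hex), and with every index bit set it yields 011FFFE0 (hex).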
Figure 13-41: Backup Cache Tag Store IPR Addressing Format

    Bits       Contents
    <31:25>    SBZ
    <24:22>    1 0 0
    <21>       x
    <20:5>     BCTAG Index (BCTAG Index or SBZ, based on cache size)
    <4:0>      SBZ

Some or all of bits <20:17> are not actually used as the index if the cache is smaller than 2 megabytes. This is set out explicitly in Table 13-48.

The format for reading and writing the backup cache tag store as an IPR is described in Figure 13-42 and Table 13-47.

Figure 13-42: IPRs 01000000 thru 011FFFE0 (hex), BCTAG

    Bits       Contents
    <31:17>    TAG (TAG or 0, based on cache size)
    <16:11>    ECC
    <10>       OWNED
    <9>        VALID
    <8:0>      x

Table 13-47: BCTAG Field Descriptions

    Name     Extent   Type   Description
    VALID    9        RW     Valid bit
    OWNED    10       RW     Ownership bit
    ECC      16:11    RW     ECC check bits (see note 1)
    TAG      31:17    RW     Tag data

    Note 1: The ECC bits are written from the value given in the IPR_WRITE only if the SW_ECC bit of the CCTL IPR is set. Otherwise, the Cbox generates and writes correct ECC for the tag, owned, and valid values being written.

Some or all of TAG<20:17> are not actually used as tag if the cache is larger than 128 kilobytes. This is set out in Table 13-48.

Table 13-48: Tag and Index Interpretation for BCTAG IPR

Cache size      Tag bits used   Index bits used
128 kilobytes   TAG<31:17>      Index<16:5>
256 kilobytes   TAG<31:18>      Index<17:5>
512 kilobytes   TAG<31:19>      Index<18:5>
2 megabytes     TAG<31:21>      Index<20:5>

The tag store must be initialized to a known state when the chip is powered up. This is done through IPR_WRITEs to BCTAG.

When the tag store is read, the ECC check bits are read out directly from the tag store in the format shown. ECC is not checked on IPR accesses to the tag store; no errors can occur during these accesses.

Some care must be taken if IPR reads of the tag store are done while other transactions are in progress. The tag information read out may not be what the programmer expects if cache misses or cache coherency transactions are in progress on the block which is being read. For example, if a cache miss is in progress, the new tag will be in the tag store but the valid and owned bits will be clear.

13.5.8 Backup cache deallocates through IPR access (BCFLUSH)

The Backup Cache Deallocate IPR is a write-only register which software uses to explicitly request the deallocation of a cache block. For example, this register may be used when hardware has put the cache into ETM and software wants to request writeback of the owned blocks to memory.

If there is a valid FILL_CAM entry for the same cache block which is being flushed, the flush is stalled until the fills return and the FILL_CAM entry is no longer valid.

Figure 13-43: IPRs 01400000 thru 015FFFE0 (hex), BCFLUSH

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|        SBZ         | 1  0  1| x|          Bcache Tag Deallocate Index          |     SBZ      | :BCFLUSH
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

When BCFLUSH is written, the Cbox accesses the tag store. If the block is invalid, no further action is taken. If the block is valid but not owned, the Cbox sends a block invalidate to the Mbox and invalidates the entry in the Bcache tag store. If the block is valid and owned, it sends a block invalidate to the Mbox, performs a writeback of the data, and invalidates the entry in the tag store.

This behavior takes place whether the cache is on, off, in ETM, or in FORCE_HIT mode. In FORCE_HIT mode, BCFLUSH does a real lookup of the tag store and does not force the access to hit.

Software must take care not to force deallocates when cache state is not consistent with the state of memory. For example, when the cache is off, valid and owned bits may be set for blocks which are no longer up-to-date with respect to memory.

When a deallocate is done, the VALID and OWNED bits will be cleared as necessary, and the value of the stored TAG is modified. Its value is UNPREDICTABLE. Correct ECC is stored on the tag store entry. A BCFLUSH operation never changes the data stored in the data RAMs.

Errors are detected and reported during BCFLUSH operations.

The index given is interpreted as in Table 13-48, based on the size of the cache.

BCFLUSH may be used when the Bcache is on, as the Pcache is kept a subset of the Bcache during these operations. However, new blocks may be allocated due to memory reads and writes as the cache is being flushed.
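
The BCFLUSH behavior described in this section can be summarized in a small sketch. The names and the action strings are hypothetical labels chosen for this example; the address bits (<24:22> = 101 binary, index in <20:5>) and the three cases come from the text above.

```python
from typing import List

BCFLUSH_SELECT = 0b101  # address bits <24:22>, per Figure 13-43


def bcflush_ipr_address(index: int) -> int:
    """Build the IPR address that requests deallocation of entry `index`."""
    if not 0 <= index < (1 << 16):
        raise ValueError("index must fit in address bits <20:5>")
    return (BCFLUSH_SELECT << 22) | (index << 5)


def bcflush_actions(valid: bool, owned: bool) -> List[str]:
    """Return the Cbox actions for a BCFLUSH write, given the tag store state."""
    if not valid:
        return []                          # invalid block: nothing to do
    actions = ["send block invalidate to Mbox"]
    if owned:
        actions.append("write back data")  # only owned blocks are written back
    actions.append("invalidate tag store entry")
    return actions
```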

13.6 Cbox Control Description

The Cbox control consists of the following sections:

• Mbox Interface. Controls receiving commands from the Mbox, including checking for read/write conflicts, and sending data and invalidates back to the Mbox.
• Cbox Arbiter. Decides which Cbox request should be serviced next.
• Tag Store Control. Controls access to the tag store RAMs, hit calculation, ECC generation and checking for tag RAMs, tag RAM error handling.
• Data RAM Control. Controls access to the data RAMs, ECC generation and checking for data RAMs, data RAM error handling.
• NDAL Interface. Controls access to the NDAL queues and implements the NDAL protocol described in Chapter 3.

The tag store controller is a state machine which executes any of the following tasks, upon instruction from the arbiter:

• C_TAG%%DREAD_CMD. Performs a lookup for a data-stream read. Hits if tag matches and is valid.
• C_TAG%%IREAD_CMD. Performs a lookup for an instruction-stream read. Hits if tag matches and is valid. The operation may be cancelled midstream if the IREAD is aborted.
• C_TAG%%OREAD_CMD. Performs a lookup which requires ownership. Hits if tag matches and is valid and owned.
• C_TAG%%R_INVAL_CMD. Performs a cache coherency lookup as the result of an NDAL DREAD or IREAD; clears OWNED if necessary.
• C_TAG%%O_INVAL_CMD. Performs a cache coherency lookup as the result of an NDAL OREAD or WRITE; clears VALID and/or OWNED if necessary.
• C_TAG%%FILL_CMD. Sets the VALID and/or OWNED bit for a fill which has completed.
• C_TAG%%IPR_DEALLOC_WRITE_CMD. Performs a lookup for a deallocate; clears VALID and OWNED bits if the block was owned.
• C_TAG%%IPR_TAG_WRITE_CMD. Writes the tag store with given data.
• C_TAG%%IPR_TAG_READ_CMD. Reads the tag store from the location requested.

When the command given has been executed, the tag store controller notifies the arbiter that it has finished.
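
The hit conditions stated for the three lookup commands can be written down as a small predicate. This is only a sketch: the function name and the plain-string command names are assumptions made for the example, and the other tag store commands are deliberately not modeled.

```python
def lookup_hits(command: str, tag_matches: bool, valid: bool, owned: bool) -> bool:
    """Hit condition for the DREAD, IREAD and OREAD tag store lookups."""
    if command in ("DREAD", "IREAD"):
        # Data-stream and instruction-stream reads need a valid matching tag.
        return tag_matches and valid
    if command == "OREAD":
        # A lookup which requires ownership also needs the OWNED bit.
        return tag_matches and valid and owned
    raise ValueError("command not modeled in this sketch: " + command)
```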

The data RAM controller is a state machine which executes any of the following tasks, upon instruction from the arbiter:

• C_DAT%%DREAD_CMD. Reads four quadwords of data-stream data from the Bcache and sends them to the Mbox interface.
• C_DAT%%IREAD_CMD. Reads four quadwords of instruction-stream data from the Bcache and sends them to the Mbox interface. The operation may be cancelled midstream if the IREAD is aborted.
• C_DAT%%WB_CMD. Reads four quadwords of data from the Bcache and sends them to the WRITEBACK_QUEUE.
• C_DAT%%RM_WRITE_CMD. Performs a read-modify-write operation on the Bcache quadword.
• C_DAT%%WRITE_BMO_CMD. Performs a full quadword write on the Bcache.
• C_DAT%%FILL_CMD. Writes fill data into the Bcache; merges write data with the fill if necessary.

When the command given has been executed, the data RAM controller notifies the arbiter that it has finished.

The arbiter looks at the DREAD_LATCH, the IREAD_LATCH, the WRITE_QUEUE, and incoming transactions from the CBOX_BIU_INTERFACE to decide which to service next. It notifies the tag store controller and data RAM controller of which command to execute next.

Fills and cache coherency requests both arrive in the NDAL_IN_QUEUE and are sent to the Cbox proper through the CBOX_BIU_INTERFACE. They are processed in order; therefore, one does not have priority over the other.

When a transaction such as a read miss causes a cache block to be deallocated, the deallocate always takes place as the next data RAM transaction. Transactions in the CBOX_BIU_INTERFACE take next-highest priority. In the normal case, the DREAD_LATCH takes next priority, the IREAD_LATCH next, and the WRITE_QUEUE takes lowest priority. These priorities change if there are special circumstances, as shown in the tables which follow.

Table 13-49: Cbox Task Priority Under Normal Conditions

Priority  Source of Transaction
1         Deallocate caused by previous transaction
2         CBOX_BIU_INTERFACE (fills and cache coherency requests)
3         DREAD_LATCH
4         IREAD_LATCH
5         WRITE_QUEUE

Table 13-50: Cbox Task Priority When DWR_CONFLICT Bits Are Set in the WRITE_QUEUE

Priority  Source of Transaction
1         Deallocate caused by previous transaction
2         CBOX_BIU_INTERFACE (fills and cache coherency requests)
3         IREAD_LATCH
4         WRITE_QUEUE
5         DREAD_LATCH - not serviced until DWR_CONFLICT bits are clear

Table 13-51: Cbox Task Priority When IWR_CONFLICT Bits Are Set in the WRITE_QUEUE

Priority  Source of Transaction
1         Deallocate caused by previous transaction
2         CBOX_BIU_INTERFACE (fills and cache coherency requests)
3         DREAD_LATCH
4         WRITE_QUEUE
5         IREAD_LATCH - not serviced until IWR_CONFLICT bits are clear

Table 13-52: Cbox Task Priority When a DREAD_LOCK Is in Progress, Until the WRITE_UNLOCK Is Done

Priority  Source of Transaction
1         Deallocate caused by previous transaction
2         CBOX_BIU_INTERFACE (fills and cache coherency requests)
3         WRITE_QUEUE - the WRITE_UNLOCK corresponding to the DREAD_LOCK is the only write which will arrive unless an error occurs; in this case the IPR_WRITE clearing the RDLK bit in the FILL_CAM is the next write to arrive
4         DREAD_LATCH - not serviced until the WRITE_UNLOCK completes or the FILL_CAM RDLK bit is cleared
5         IREAD_LATCH - not serviced until the WRITE_UNLOCK completes or the FILL_CAM RDLK bit is cleared

There are various resources in the Cbox which must be available for the start of a transaction. The necessary conditions vary, depending on the transaction in question.

Necessary conditions before servicing a fill from the CBOX_BIU_INTERFACE are as follows:

1. The data RAMs and the tag store must be free. The tag store is only strictly necessary for the last fill, but for implementation simplicity, both are required for all fills.
2. The WRITEBACK_QUEUE must not be full. A writeback may be necessary at the completion of the fill.

Necessary conditions before servicing a cache coherency request from the CBOX_BIU_INTERFACE are as follows:

1. The tag store must be free.
2. The WRITEBACK_QUEUE must not be full.

Necessary conditions before servicing a transaction from the DREAD_LATCH or the IREAD_LATCH are as follows:

1. The data RAMs and the tag store must be free.
2. A FILL_CAM entry must be available, in case the read misses.
3. There must be an available entry in the NON_WRITEBACK_QUEUE, in case the read misses.
4. There must be no valid entry in the FILL_CAM for the same cache block as that of the new request.
5. There must be no RDLK bit set in the FILL_CAM, indicating that a READ_LOCK/WRITE_UNLOCK sequence is in progress.
6. There must be no block conflict with any WRITE_QUEUE entry.
7. The WRITEBACK_QUEUE must not be full.

Necessary conditions before servicing a full quadword write from the WRITE_QUEUE are as follows:

1. The tag store must be free.
2. If a read lock is not outstanding, a FILL_CAM entry must be available, in case the write misses and requires an OREAD.
3. If a read lock is not outstanding, there must be an available entry in the NON_WRITEBACK_QUEUE, in case the write misses.
4. There must be no valid entry in the FILL_CAM for the same cache block as that of the new request, unless the new request is a WRITE_UNLOCK.
5. If there is a READ_LOCK in the FILL_CAM, the fills for the READ_LOCK must have completed.
6. The WRITEBACK_QUEUE must not be full.

The tag store lookup for a full quadword write may be done while the data RAMs are busy with another transaction. When the data RAMs free up, the full quadword write is done.
If full quadword writes are streaming through the WRITE_QUEUE, this effectively pipelines the tag store accesses and the data RAM accesses so that the writes take place at the maximum write repetition rate of the data RAMs. This would not be the case if the arbiter required both the data RAMs and the tag store to be free before starting the full quadword write.

Necessary conditions before servicing any WRITE_QUEUE entry other than a full quadword write are as follows:

1. The tag store and the data RAMs must be free.
2. If a read lock is not outstanding, a FILL_CAM entry must be available, in case the write misses and requires an OREAD.
3. If a read lock is not outstanding, there must be an available entry in the NON_WRITEBACK_QUEUE, in case the write misses.
4. There must be no valid entry in the FILL_CAM for the same cache block as that of the new request, unless the new request is a WRITE_UNLOCK.
5. If there is a READ_LOCK in the FILL_CAM, the fills for the READ_LOCK must have completed.
6. The WRITEBACK_QUEUE must not be full.

From the above lists, the following is true:

1. When the data RAMs are busy, the only tag store operations which may proceed are cache coherency requests and full quadword write requests.
2. No transaction from the Mbox which produces a block conflict with the FILL_CAM may proceed, except a WRITE_UNLOCK. This includes I/O space transactions and IPR transactions, for implementation simplicity.

13.7 Transaction Descriptions

13.7.1 IPR Reads and IPR Writes

These transactions are described in Section 13.5.

13.7.2 I/O Space

I/O space references are recognized when address bits <31:29> are equal to all ones. Address bits <31:0> are used for I/O space reads and writes, which may reference bytes. All bits of the address are driven onto the NDAL. In addition, the byte enable field is valid for all I/O space reads and writes, as described in Chapter 3.
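
The I/O space recognition rule can be expressed as a one-line check. The function name is hypothetical; the rule itself (address bits <31:29> all ones) is the one stated above.

```python
def is_io_space(address: int) -> bool:
    """True when address bits <31:29> of a 32-bit address are all ones."""
    return ((address >> 29) & 0b111) == 0b111
```

Under this rule, every address from E0000000 (hex) upward is an I/O space reference.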

When the Cbox receives an I/O space read or write, it passes the byte enable from the Mbox out through the BIU to the NDAL.

I/O space references are never cached in the Bcache. All such references are passed directly to the NDAL. I/O space fill data which returns is passed directly to the Mbox.

I/O space references are always quadword length. When the quadword returns on the NDAL, the Cbox returns it directly to the Mbox and asserts C%LAST_FILL_B so the Mbox does not expect any more fills.

I/O space references also result from IPR_READs and IPR_WRITEs to the Cbox which are not in Cbox register space. The Cbox converts these to I/O space reads and writes, as described in Section 13.5.

Before an I/O space read is allowed to proceed, the WRITE_QUEUE is flushed. I/O space writes are naturally ordered with respect to previous I/O space writes since they go into the WRITE_QUEUE behind any previous I/O space writes. They are also ordered with respect to previous reads and subsequent reads through the write conflict bit mechanism.

There are situations where I/O space writes will appear out of order with respect to memory space writes. See Section 13.14 for an explanation of when this may happen.

READ_LOCKs and WRITE_UNLOCKs to I/O space are not supported by the Cbox. If software issues these transactions through the Mbox, the Cbox converts them to normal DREADs and WRITEs on the NDAL.

13.7.3 Clear Write Buffer

In previous systems, Clear Write Buffer (CWB) was implemented as a separate command. NVAX implements this as an IPR read or write which the Cbox converts into an I/O space read or write on the NDAL. As this transaction passes through the Cbox, it has the effect of clearing previous entries in the WRITE_QUEUE, the NON_WRITEBACK_QUEUE, and the WRITEBACK_QUEUE.

An IPR_READ to clear the write buffers causes all the DWR_CONFLICT and IWR_CONFLICT bits in the WRITE_QUEUE to be set.
All writes are flushed as top priority, and then the I/O space read is issued to the NDAL and system. Which device responds to the read is system-dependent.

An IPR_WRITE to clear the write buffers goes into the WRITE_QUEUE. If any reads are outstanding, they complete first due to their higher priority and then the writes complete. If a new read arrives while the IPR_WRITE is still in the WRITE_QUEUE, the conflict bit is set for that entry so the read does not complete until after the IPR_WRITE to clear the write buffer. After that IPR_WRITE completes, read/write priority goes back to the default behavior.

The Clear Write Buffer has the effect of clearing both the WRITEBACK_QUEUE and the NON_WRITEBACK_QUEUE, as follows: the CWB, whether issued as an IPR_READ or an IPR_WRITE, enters the NON_WRITEBACK_QUEUE. Since the WRITEBACK_QUEUE takes priority over the NON_WRITEBACK_QUEUE, any previous writebacks will be issued to the NDAL before the CWB is issued from the NON_WRITEBACK_QUEUE. Any entries which were already in the NON_WRITEBACK_QUEUE will be issued before the CWB, as transactions in the queue are always issued in order. Thus, before the CWB completes, both outgoing NDAL queues are flushed of all previous transactions. If the CWB is issued as an IPR_READ, software receives positive acknowledgement that the queues were cleared when the fill returns.

The IPR_WRITE is issued to the NDAL as an I/O space write. As with the I/O space read to clear the write buffers, the device which responds is system-dependent.

13.7.4 Memory Read Hit

Several different kinds of memory reads may arrive from the Mbox, as shown in the following table.

Read            Cbox action
IREAD           hits if tag matches and valid bit is set
DREAD           hits if tag matches and valid bit is set
DREAD_MODIFY    hits if tag matches and valid bit is set
DREAD_LOCK      hits if tag matches, valid bit is set, and ownership bit is set

When the Mbox asserts M%CBOX_REF_ENABLE_L, the Cbox takes the command from M%S6_CMD_H. If the backup cache is occupied with another transaction, the Cbox puts an IREAD into the IREAD_LATCH or a DREAD into the DREAD_LATCH for later processing. Otherwise, the read bypasses the read latches and is started immediately.

When both the tag store and the data RAMs are free, the transaction starts. The tag lookup is done in parallel with the data lookup. If the read hits, data is driven from the backup cache RAMs back through the CM_OUT_LATCH. The fill command is sent to the Mbox on C%CBOX_CMD_H<1:0>. Two cycles later, the Pcache fill is done while the Cbox drives data onto B%S6_DATA_H<63:0>.

Using the fastest RAM speed configuration, the backup cache access incurs an additional 4-cycle latency penalty beyond the Pcache access. Each subsequent quadword in the block takes an extra two cycles from the previous quadword.

On a read hit in the backup cache, the requested quadword is always returned first to the Mbox. The subsequent quadwords are sent in wrapped order as shown in Table 13-53.

Table 13-53: Order of quadwords read from the Bcache

Requested QW   2nd QW returned   3rd QW returned   4th QW returned
QW0            QW1               QW2               QW3
QW1            QW2               QW3               QW0
QW2            QW3               QW0               QW1
QW3            QW0               QW1               QW2

13.7.5 Read Miss and Fill

At the same time the tag store access is done for a read, the address is put in the FILL_CAM. If the read misses, that entry is validated and the address is sent to the NON_WRITEBACK_QUEUE. If the read command was DREAD_MODIFY and missed, it is converted to an OREAD on the NDAL. All other reads are sent as either IREADs or DREADs on the NDAL.
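
The wrapped return order of Table 13-53 in Section 13.7.4 is a modulo-4 rotation starting at the requested quadword. A minimal sketch, with a hypothetical function name:

```python
from typing import List


def bcache_return_order(requested_qw: int) -> List[int]:
    """Quadword numbers in the order they are returned for a read hit."""
    if requested_qw not in (0, 1, 2, 3):
        raise ValueError("a block holds quadwords 0 through 3")
    # The requested quadword comes first; the rest follow in wrapped order.
    return [(requested_qw + i) % 4 for i in range(4)]
```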

From the NON_WRITEBACK_QUEUE the request goes across the NDAL to the memory interface. When the memory interface returns the fill, the Cbox puts the fill into the NDAL_IN_QUEUE. Since the block size is 32 bytes and the NDAL is 8 bytes wide, four fill transactions on the NDAL result from the read request.

The arbiter services the CBOX_BIU_INTERFACE, and thus the fill, as highest priority. At this time, Cbox control takes the fill from the CBOX_BIU_INTERFACE and puts the data in the CM_OUT_LATCH. At the same time it starts writing the backup cache RAMs with the data, which takes at least three cycles, depending on RAM access time. The fill data is driven to the Mbox from the CM_OUT_LATCH as described in the cache hit section preceding.

As fill data returns, the Cbox keeps track of how many quadwords have been received with a two-bit counter in the FILL_CAM. If two read misses are outstanding, fills from the two misses may return interleaved, so each entry in the FILL_CAM has a separate counter. When the last quadword of a read miss arrives, the new tag is written and the valid bit is set in the cache. The owned bit is set if the fill was for an Ownership Read. The FILL_CAM is made available for the next cache miss.

If the RIP or OIP bit is set (and DNF is not set) in the FILL_CAM when the last fill returns, the arbiter immediately notifies the tag store control to start a cache coherency transaction on that block; nothing intervenes between the last fill and the cache coherency transaction.

13.7.6 Write Hit

A write from the WRITE_QUEUE is begun by accessing the tag store. It is a write hit if the tag matches, the valid bit is set, and the ownership bit is set. In this case the write data may be written into the data RAMs. The data RAMs are not accessed for the write until it is determined that the write hit.

The write is somewhat complicated because we have ECC across 8 bytes in the data RAMs.
If all bytes in the quadword are not to be written with new data, the old data is read out of the data RAMs during the tag store lookup and before the write is done. The new data is merged with the old so that ECC can be calculated across the new quadword. This action is known as read-modify-write.

If byte enable indicates that the write is a full quadword write, the read-modify-write is not necessary. In this case, the tag store lookup may proceed even if the data RAMs are not available; when the RAMs then become available, the write is done (assuming the tag store access resulted in hit-owned). This allows sequential full quadword writes to be effectively pipelined, as the tag store lookup for the next write may proceed while the current write is being done into the data RAMs. If the fastest RAM configuration is used, this achieves a three-cycle repetition rate for full quadword writes.

When the write is complete, the entry is removed from the WRITE_QUEUE.

13.7.7 Write Miss

If the tag store lookup for a write is done and the ownership bit is not set or the tag does not match, an ownership read is issued to the memory subsystem through the NON_WRITEBACK_QUEUE. At the same time, the new tag is written to the backup cache tag store with cleared VALID and OWNED bits.

When the requested quadword returns through the NDAL_IN_QUEUE, the write data is merged with the fill data, ECC is calculated, and the new data is written to the cache RAMs. At this time the write is removed from the WRITE_QUEUE. When the fourth quadword returns, the valid bit and the ownership bit are set in the tag store. None of the fill data is sent to the Mbox, since the request originated from a write rather than from an Mbox read.
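
The byte merge used for a read-modify-write (Section 13.7.6), and for merging write data with fill data on a write miss, can be sketched as follows. The function names are hypothetical, and the byte enable is assumed here to be an 8-bit mask in which bit i enables byte i of the quadword; the specification defers the actual byte enable format to Chapter 3.

```python
FULL_QUADWORD = 0xFF  # assumed encoding: one enable bit per byte, all set


def needs_read_modify_write(byte_enable: int) -> bool:
    """A partial write must read the old quadword so ECC covers all 8 bytes."""
    return (byte_enable & FULL_QUADWORD) != FULL_QUADWORD


def merge_quadword(old: bytes, new: bytes, byte_enable: int) -> bytes:
    """Take enabled bytes from `new` and all other bytes from `old`."""
    if len(old) != 8 or len(new) != 8:
        raise ValueError("a quadword is 8 bytes")
    return bytes(
        new[i] if (byte_enable >> i) & 1 else old[i]
        for i in range(8)
    )
```

ECC generation over the merged quadword is outside the scope of this sketch.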

13.7.8 Deallocates Due to CPU Reads and Writes

When any tag lookup for a read or a write results in a miss, the cache block is deallocated to allow the fill data to take its place. If the block is not valid, no action is taken for the deallocate. If the block is valid but not owned, the block is invalidated in the backup cache tag store and an invalidate is sent to the Pcache. If the block is valid and owned, the block is written back to memory, invalidated in the tag store, and an invalidate is sent to the Pcache. The Hexaword Disown Write command is used to write the data back.

If a writeback is necessary, it is done immediately after the read or write miss occurs. The miss and the deallocate are contiguous events and are not interrupted for any other transaction.

When the block is invalidated or deallocated at the time of the miss, the VALID and OWNED bits are cleared. The TAG is written with a value corresponding to the address of the read or write which just missed. When the fill returns, the VALID and OWNED bits are written appropriately.

The four quadwords for the deallocate are read out from the Bcache in the order shown in Table 13-53. They are driven on the NDAL in order from QW0 to QW3, however, as required by the NDAL protocol for hexaword writes.

13.7.9 DREAD_LOCK and WRITE_UNLOCK

The Cbox receives DREAD_LOCK/WRITE_UNLOCK pairs from the Mbox. It never issues those commands on the NDAL. The Cbox always uses Ownership Read-Disown Write on the NDAL and depends on use of the ownership bit in memory to accomplish interlocks.

When the cache is on, a DREAD_LOCK which produces an owned hit in the backup cache causes no memory access. All four quadwords are read out of the Bcache and sent to the Mbox. The address is placed in the FILL_CAM to prevent any access of the block until the WRITE_UNLOCK is done.

A DREAD_LOCK which does not produce an owned hit in the backup cache results in an OREAD on the NDAL, whether the cache is on or off.

When the cache is on, the WRITE_UNLOCK is written into the backup cache and is only written to memory if requested through a coherence transaction or due to a deallocate. When the cache is off, the WRITE_UNLOCK becomes a Quadword Disown Write on the NDAL.

When a DREAD_LOCK arrives in the DREAD_LATCH, the WRITE_QUEUE is flushed before the DREAD_LOCK is started. All transactions from the IREAD_LATCH or the DREAD_LATCH are prevented until the WRITE_UNLOCK takes place or until the RDLK bit in the FILL_CAM is cleared through an IPR_WRITE to the CEFSTS IPR.

During READ_LOCK/WRITE_UNLOCK processing, the NDAL_IN_QUEUE is serviced normally, so if the cache is on, the NDAL may see some writebacks while the DREAD_LOCK/WRITE_UNLOCK is in progress.

When the Bcache is running in normal mode, a WRITE_UNLOCK is not looked up in the tag store as it is guaranteed to be owned in the cache. The arbiter initiates a read-modify-write directly to the data RAMs without any tag store access at all. If the Bcache is in ETM, the WRITE_UNLOCK is looked up, as the block may or may not be owned in the cache.

When the Bcache is off, a WRITE_UNLOCK which is done without a preceding READ_LOCK will be sent directly to the NDAL. In any other mode of Bcache operation, the WRITE_UNLOCK is expected to be preceded by a READ_LOCK. When the cache is off, a WRITE_UNLOCK without a preceding READ_LOCK may be useful for error handling (this is not currently implemented in the microcode).

13.8 Cache Coherency

Since NVAX is used in multiprocessor systems, cache coherency requests requiring invalidates and/or writebacks arrive on the NDAL. These may require action in the Bcache and/or the Pcache.

Under normal conditions, the Cbox ensures that the Pcache is a subset of the Bcache, as explained below. Thus, it is able to filter invalidate requests so that not all are sent to the Pcache.

Table 13-54 shows the actions taken in the Bcache, based on the NDAL command which arrives and matches a cache block.

Table 13-54: NVAX Backup Cache Invalidates and Writebacks

NDAL Command    Invalid block   Valid & Unowned   Valid & Owned
IREAD, DREAD    -               -                 Writeback, set Bcache to valid-unowned state
OREAD           -               Invalidate        Writeback, Invalidate
WRITE           -               Invalidate        Writeback, Invalidate
WDISOWN         -               -                 -

Whenever an invalidate is necessary in the Bcache, according to Table 13-54, an invalidate is also sent to the Pcache.

Invalidates are sent to the Pcache under the following circumstances:

1. When an invalidate is necessary in the Bcache, due to a cache coherency request, the invalidate is also forwarded to the Pcache.
2. When a cache miss causes a Bcache deallocate, a corresponding invalidate is forwarded to the Pcache.
3. When a write to BCFLUSH causes a Bcache deallocate, a corresponding invalidate is forwarded to the Pcache.
4. When an OREAD or WRITE cache coherency request matches an entry in the FILL_CAM, the invalidate is forwarded immediately to the Pcache. When the last fill returns, a second invalidate is forwarded to the Pcache.
5. When the Bcache is off or in FORCE_HIT mode, ALL cache coherency requests result in invalidates to the Pcache. It is not strictly necessary to send invalidates for IREAD and DREAD cache coherency requests, as multiple caches may contain read-only copies of data, but for implementation reasons they ARE sent as invalidates to the Pcache.
6. When the Bcache is in ETM, all OREAD and WRITE cache coherency requests result in invalidates to the Pcache. (IREAD and DREAD cache coherency requests do not result in invalidates to the Pcache.) A second invalidate is passed to the Pcache if the normal Bcache lookup conditions are met.
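
The Bcache actions of Table 13-54 can be restated as a small lookup function. This is a sketch under the assumption that the NDAL address already matches a cache block; the function name and action strings are labels invented for the example, and only the Bcache ON case is covered.

```python
from typing import List


def bcache_coherency_actions(ndal_command: str, valid: bool, owned: bool) -> List[str]:
    """Bcache actions for an NDAL command that matches a cache block."""
    if not valid:
        return []                          # invalid block: no action
    if ndal_command in ("IREAD", "DREAD"):
        # Another node reads the block: give up ownership but keep the data.
        return ["writeback", "set valid-unowned"] if owned else []
    if ndal_command in ("OREAD", "WRITE"):
        # Another node takes ownership or writes: the local copy must go.
        return ["writeback", "invalidate"] if owned else ["invalidate"]
    if ndal_command == "WDISOWN":
        return []                          # no Bcache action is listed
    raise ValueError("unknown NDAL command: " + ndal_command)
```

Whenever the result contains an invalidate, item 1 of the list above implies that an invalidate is forwarded to the Pcache as well.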

NOTE
When a cache coherency request hits in the cache and either VALID or OWNED is modified, the tag which is written to the cache is the same as the tag which was there originally.

13.9 Abnormal conditions

This section describes the various modes of Bcache behavior as well as Cbox response when it detects an error.

The Bcache has four operating states which are controlled by the following bits in the CCTL register: ENABLE, FORCE_HIT, SW_ETM, and HW_ETM. The four states are ON, OFF, ETM, and FORCE_HIT. The four states are determined and prioritized as follows:

1. OFF. If the ENABLE bit is cleared in CCTL, the Bcache is OFF and those conditions take precedence.
2. FORCE_HIT. If the ENABLE bit is set and FORCE_HIT is set, the Bcache is in FORCE_HIT mode and those conditions take precedence.
3. ETM. If the ENABLE bit is set, FORCE_HIT is cleared, and either SW_ETM or HW_ETM is set, the cache is in ETM mode and those conditions take precedence.
4. ON. If the ENABLE bit is set and FORCE_HIT, SW_ETM, and HW_ETM are cleared, the cache is ON.

The ON state is the normal operating condition of the cache. OFF, FORCE_HIT, and ETM modes are described in the sections which follow.

A summary of the backup cache behavior when it is ON and incurring no errors is given in Table 13-55.
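
The prioritized state selection in the numbered list above can be sketched as a chain of tests in priority order. The function name is hypothetical; the four CCTL bits are passed as plain booleans rather than decoded from the register, since the bit positions are not given in this section.

```python
def bcache_state(enable: bool, force_hit: bool, sw_etm: bool, hw_etm: bool) -> str:
    """Resolve the Bcache operating state from the four CCTL control bits."""
    if not enable:
        return "OFF"        # ENABLE clear overrides everything else
    if force_hit:
        return "FORCE_HIT"  # takes precedence over either ETM bit
    if sw_etm or hw_etm:
        return "ETM"
    return "ON"
```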

Table 13-55: Backup cache behavior while it is ON

Cache Transaction: CPU IREAD, DREAD
  Miss Invalid:  Read memory
  Miss Valid:    Read memory, Pcache inval
  Miss Owned:    Read memory, Pcache inval, Bcache dealloc
  Hit Valid:     Read cache
  Hit Owned:     Read cache

Cache Transaction: CPU Read Modify
  Miss Invalid:  OREAD memory
  Miss Valid:    OREAD memory, Pcache inval
  Miss Owned:    OREAD memory, Pcache inval, Bcache dealloc
  Hit Valid:     Read cache
  Hit Owned:     Read cache

Cache Transaction: CPU READ_LOCK
  Miss Invalid:  OREAD memory
  Miss Valid:    OREAD memory, Pcache inval
  Miss Owned:    OREAD memory, Pcache inval, Bcache dealloc
  Hit Valid:     OREAD memory, Pcache inval
  Hit Owned:     Read cache

Cache Transaction: CPU Write
  Miss Invalid:  OREAD memory
  Miss Valid:    OREAD memory, Pcache inval
  Miss Owned:    OREAD memory, Pcache inval, Bcache dealloc
  Hit Valid:     OREAD memory, Pcache inval
  Hit Owned:     Write cache

Cache Transaction: CPU WRITE_UNLOCK
  All cases:     No tag store lookup; write Bcache unconditionally

Cache Transaction: Fill for OREAD for Write
  All cases:     Write cache with fill data and write data; set TS valid-owned

Cache Transaction: Fill for OREAD
  All cases:     Write cache with fill data; set TS valid-owned

Cache Transaction: Fill for READ
  All cases:     Write cache with fill data; set TS valid

Cache Transaction: NDAL IREAD, DREAD
  Any miss:      No action
  Hit Valid:     No action
  Hit Owned:     Writeback, set Bcache valid-unowned

Cache Transaction: NDAL OREAD, WRITE
  Any miss:      No action
  Hit Valid:     Bcache inval, Pcache inval
  Hit Owned:     Writeback, Bcache inval, Pcache inval

13.9.1 Cbox Behavior When the Backup Cache is OFF

The backup cache may be off for three reasons: the chip has just powered up, the system contains no backup cache, or software has disabled the cache by clearing the ENABLE bit in the Cbox control register.

When the cache is off, no accesses to the backup cache are done. Errors are not detected and cache state is UNCHANGED unless explicitly changed by software through IPR reads and writes.

When the backup cache is off, all Ownership-Invalidate cache coherency requests (as the result of OREADs or WRITEs) which arrive are forwarded as invalidates to the Mbox, as the data may be valid in the Pcache.

All reads from the Mbox go directly to the NON_WRITEBACK_QUEUE, and an entry in the FILL_CAM is allocated. Fills which return are sent directly to the Mbox without accessing the Bcache, and when the last fill for a block arrives, the FILL_CAM entry is cleared. All writes except WRITE_UNLOCKs go directly to the NON_WRITEBACK_QUEUE. (1)

When the cache is off, a DREAD_LOCK/WRITE_UNLOCK pair from the Mbox becomes Hexaword Ownership Read/Quadword Disown Write on the NDAL.

All writes issued from NVAX when it is operating without a backup cache are of quadword length. Memory reads are of hexaword length since the Pcache block size is a hexaword. Even if the Pcache is off, a hexaword of data is returned to the Mbox.

A DREAD_MODIFY command from the Mbox normally becomes an OREAD on the NDAL when it misses in the cache. However, when the cache is off, a normal DREAD is used on the NDAL.

(1) If P%CPU_WB_ONLY_L is asserted, the WRITE_UNLOCK must be allowed to proceed. Only the WRITEBACK_QUEUE continues when P%CPU_WB_ONLY_L is asserted, so the WRITE_UNLOCK must go through the WRITEBACK_QUEUE.

13.9.2 Cbox Behavior When the Backup Cache is in FORCE_HIT Mode

FORCE_HIT mode is intended to be used for testing purposes only. It is used when the cache is enabled.

When FORCE_HIT is set, all memory space reads and writes to the Bcache, both Istream and Dstream, are forced to hit. Tag store state is not changed at all; the data RAMs are accessed as if the tag store access produced an owned-valid hit.
Cache coherency transactions are treated as they are when the cache is off: they are not looked up in the backup cache, they are all forwarded to the Mbox, and cache state is not changed as the result of the cache coherency requests.

When the Bcache is in FORCE_HIT mode, deallocates are not done. Even if the tag matches and the VALID and OWNED bits are set, the block is not written back. The implication is that if FORCE_HIT mode is to be used in a multiprocessor environment, the Bcache must be flushed of all owned blocks beforehand.

Tag store and data RAM ECC errors are detected in FORCE_HIT mode if DISABLE_ERRORS in the CCTL register is not set, resulting in the usual error handling.

Suppose the ECC logic for the data RAMs is to be tested. Put the cache in FORCE_HIT mode. Set SW_ECC in the Cbox control register. Write the desired ECC into BCDECC. Do a Dstream write to the desired location; the location will be written using the ECC from BCDECC rather than Cbox-generated ECC. Suppose the ECC written is such that when the data is read, an ECC error will be flagged. Now perform a read of the location while FORCE_HIT is still set. The read will result in an ECC error, showing that the logic is working correctly. The data RAM error registers may be read and will correspond to the induced error.

13.9.3 Cbox Behavior When the Backup Cache is in Error Transition Mode

When the Cbox detects certain errors, as described in Chapter 3 and Section 13.4.2, it puts itself into Error Transition Mode (ETM). The goals of the Cbox design during ETM are the following:

1. Preserve the state of the cache as much as possible for diagnostic software.
2. Honor Mbox references which hit owned blocks in the backup cache since this is the only source of data in the system.
3. Respond to NDAL cache coherency requests normally.

Once the Cbox enters Error Transition Mode, it remains in ETM until software explicitly disables or enables the cache.
To ensure cache coherency, the cache must be completely flushed of valid blocks before it is re-enabled because some data can become stale while the cache is in ETM. Table 13-56 describes how the backup cache behaves while it is in ETM.

Table 13-56: Backup cache behavior during ETM

Cache Transaction                     Miss           Valid hit      Owned hit
CPU IREAD, DREAD                      Read memory    Read memory    Read cache
CPU Read Modify                       Read memory    Read memory    Read cache
CPU READ_LOCK                         OREAD memory   OREAD memory   OREAD memory, Bcache dealloc [1]
CPU Write                             Write memory   Write memory   Write memory, Bcache dealloc [1]
CPU WRITE_UNLOCK                      Write memory   Write memory   Write cache [1]
Fill (from read started before ETM)   Normal cache behavior (all cases)
Fill (from read started during ETM)   Do not update backup cache; return data to Mbox (all cases)
NDAL cache coherency request          Normal cache behavior except that o-inval always goes to Pcache [2] (all cases)

[1] Done to preserve write ordering; no invalidate is sent to the Pcache. For the READ_LOCK (or WRITE), the block writeback may be done before OR after the OREAD (or WRITE).

[2] The tag store controller looks up the invalidate request normally; if the lookup was an o-inval (due to an OREAD or a WRITE on the NDAL), the Cbox arbiter unconditionally forwards an invalidate to the Pcache. If the hit conditions are met in the cache, a second invalidate for the same block is forwarded to the Pcache (the tag store controller behaves as it does in normal mode).

Any reads or writes which do not hit valid-owned during ETM are sent to memory: read data is retrieved from memory, and writes are written to memory, bypassing the cache entirely. The cache supplies data for Ireads, Dreads, and Dread Modifys which hit valid-owned; this is normal cache behavior. If a write hits a valid-owned block in the cache, the block is written back to memory and the write is also sent to memory.
The write leaves the Cbox through the NON_WRITEBACK_QUEUE, enforcing write ordering with previous writes which may have missed in the cache. If a READ_LOCK hits valid-owned in the cache, a writeback of the block is forced and the READ_LOCK is sent to memory (as an OREAD on the NDAL). This behavior enforces write ordering between previous writes which may have missed in the cache and the WRITE_UNLOCK which will follow the READ_LOCK.

The write ordering problem to which the previous two paragraphs allude is as follows: Suppose the cache is in ETM. Also suppose that under ETM, writes which hit owned in the cache are written to the cache while writes which miss are sent to memory. Write A misses in the cache and is sent to the non-writeback queue, on its way to memory. Write B hits owned in the cache and is written to the cache. A cache coherency request arrives for block B and that block is placed in the writeback queue. If Write A has not yet reached the NDAL, Writeback B can pass it since the writeback queue has priority over the non-writeback queue. If that happens, the system sees Write B while it is still reading old data in block A, because Write A has not yet reached memory.

Referring again to Table 13-56, note that a WRITE_UNLOCK that hits owned during ETM is written directly to the cache. There is only one case where a WRITE_UNLOCK will hit owned during ETM: if the READ_LOCK which preceded it was performed before the cache entered ETM. (Either the READ_LOCK itself or an invalidate performed between the READ_LOCK and the WRITE_UNLOCK caused the entry into ETM.) In this case, we know that no previous writes are in the non-writeback queue because writes are not put into the non-writeback queue when we are not in ETM. (There may be I/O space writes in the non-writeback queue but ordering with I/O space writes is not a constraint.)
Therefore there is not a write ordering problem as in the previous paragraph.

Table 13-56 shows that during ETM, cache coherency requests are treated as they are during normal operation. Fills as the result of any type of read originated before the cache entered ETM are processed in the usual fashion. If the fill is the result of a write miss, the write data is merged, as usual, as the requested fill returns. Fills caused by any type of read originated during ETM are not written into the cache or validated in the tag store.

During ETM, the state of the cache is modified as little as possible. Table 13-57 shows how each transaction modifies the state of the cache.

Table 13-57: Backup cache state changes during ETM

Cache State Modified

Cache Transaction                     Miss    Valid hit   Owned hit
CPU IREAD, DREAD, Read Modify         None.   None.       None.
CPU READ_LOCK                         None.   None.       Clear VALID & OWNED; change TS_ECC accordingly.
CPU Write                             None.   None.       Clear VALID & OWNED; change TS_ECC accordingly.
CPU WRITE_UNLOCK                      None.   None.       Write new data, change DR_ECC accordingly.
Fill (from read started before ETM)   Normal cache behavior (all cases)
Fill (from read started during ETM)   None. (all cases)
NDAL cache coherency request          Clear VALID & OWNED; change TS_ECC accordingly. (all cases)

13.9.4 Cbox transition into Error Transition Mode

When the BIU encounters an error which induces ETM, it sends an explicit transaction to the arbiter requesting that the Cbox enter ETM. When the arbiter services this transaction, CCTL<HW_ETM> is set. The next transaction serviced by the arbiter will be under ETM.

When the backup cache tag store or data RAM controller encounters an ETM-inducing error, it sets CCTL<HW_ETM> immediately. The arbiter picks up the new value of CCTL<HW_ETM> whenever it starts a new transaction. The tag store controller picks up the new value whenever the arbiter instructs it to start a new transaction.
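The sampling rule just described can be sketched as a small Python model. This is only an illustration under stated assumptions: the class `CboxModel` and its method names are hypothetical and do not correspond to any real Cbox interface; the only behavior taken from the text is that HW_ETM is set immediately on an error but is observed only when a new transaction starts.

```python
from dataclasses import dataclass, field


@dataclass
class CboxModel:
    """Toy model (hypothetical): HW_ETM is set at once, sampled per transaction."""
    hw_etm: bool = False                      # stands in for CCTL<HW_ETM>
    log: list = field(default_factory=list)   # (transaction name, handled under ETM?)

    def start_transaction(self, name: str) -> bool:
        # The arbiter latches the current HW_ETM value when the transaction starts;
        # that latched value governs the whole transaction.
        under_etm = self.hw_etm
        self.log.append((name, under_etm))
        return under_etm

    def report_etm_error(self) -> None:
        # A tag store or data RAM error sets HW_ETM immediately.
        self.hw_etm = True


cbox = CboxModel()
cbox.start_transaction("READ A")    # sampled before the error: normal mode
cbox.report_etm_error()             # error occurs while READ A is in flight
cbox.start_transaction("WRITE B")   # next transaction starts under ETM
print(cbox.log)
```

In this sketch a transaction that has already started keeps the mode it sampled, which is why an error changes the handling of the following transaction rather than the current one.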
For a given transaction, the arbiter and the tag store always see the same value of ETM. Since they pick up the state of ETM at the beginning of every transaction, the Cbox always enters ETM in a predictable way. Although the data RAM controller may cause the assertion of HW_ETM, it does not use ETM in processing its transactions.

In general, if a transaction starts when the Bcache is operating normally and it encounters an ETM-inducing error, the next transaction is handled in ETM. There is one exception: If a read is looked up in the tag store and hits, the data RAM controller looks up the data in the backup cache. While the data is being read out of the RAMs, the tag store controller may start a lookup for a quadword write. If the quadword write hits, the write WILL be done to the backup cache even if the read data encounters an ETM-inducing error before the write is done to the Bcache. This sequence would be as follows:

1. Tag store lookup and Data RAM lookup for Read A start.
2. Tag store lookup for Read A completes.
3. Tag store lookup for Quadword Write B starts.
4. Data RAM lookup for Read A encounters an ETM-inducing error.
5. Tag store lookup for Quadword Write B completes; it was a hit.
6. Data RAM lookup for Read A completes.
7. Data RAM write for Quadword Write B is carried out to the Bcache.

Quadword Write B completed as if the Bcache were operating normally. If the tag store lookup for the Quadword Write had not started until after the ETM-inducing error had been encountered, then the Quadword Write would have been carried out under ETM, and the write would have been done directly to memory.

13.9.5 How to turn the Bcache off

Because the Bcache is a writeback cache, care must be taken to maintain cache coherency when turning it off. If the cache is running normally and software wishes to turn it off, it must do the following:

1. Write CCTL to set SW_ETM. In this mode, the Bcache will not allocate any new blocks and will send all cache coherency requests to the Mbox as invalidates.
2. Use the BCFLUSH register to flush all owned blocks out of the cache.
3. Turn off the Bcache by writing CCTL to clear ENABLE and SW_ETM simultaneously. If an error was encountered during the deallocate process, HW_ETM may be set and, if so, should be cleared as well.

If the Bcache encounters an uncorrectable ECC error, the Cbox sets HW_ETM in the CCTL register. If software wishes to turn off the cache, it must do the following:

1. Use the BCFLUSH register to flush all owned blocks out of the cache.
2. Write CCTL to clear ENABLE and clear HW_ETM simultaneously. This turns off the Bcache.

If Bcache errors are happening, but only in part of the cache, software may be able to avoid the errored portion of the cache by disabling it through use of the SIZE field in CCTL. If part of the cache is failing, a smaller cache size may be selected so that only part of the cache RAMs are being used. The cache must be flushed before changing the cache size so that the tags are correct. This only works if the smallest cache size is not being used to begin with, and if the failing areas of cache do not fall within the range of the smaller cache size selected.

13.9.6 How to turn the Bcache on

When NVAX powers up, garbage data is stored in the Bcache tags and data. This would result in ECC errors if the cache were turned on immediately. Through IPR writes, every Bcache tag store entry must be written with cleared OWNED and VALID bits. The value written to the TAG is irrelevant, as long as correct ECC is written to the tag store.

The Bcache data RAMs must also be initialized with correct ECC on powerup. FORCE_HIT mode may be used to initialize the Bcache data RAMs with correct ECC. If full quadword writes are used, no data RAM errors will be detected during this process, since the RAMs are written without being read first.
If partial quadword writes are used, errors will be detected because of the read-modify-write which is necessary. If the programmer sets the DISABLE_ERRORS bit in the CCTL register, the Cbox will ignore these errors. Once the tag store and data RAMs have been initialized, the cache may be enabled by setting ENABLE in the CCTL register.

If the Bcache is in ETM, it may be incoherent with respect to other CPUs and memory because of how it treats writes which hit valid but not owned in the cache (see Table 13-56). In addition, the Pcache, if enabled, is no longer a subset of the backup cache. The procedure for turning on the Pcache and the Bcache described in Chapter 16 must be followed.

If the Bcache is operating normally and is turned off for some reason, the programmer must ensure that when it is reenabled, all the OWNED and VALID bits are cleared.

When P%CPU_WB_ONLY_L is asserted, NVAX may only arbitrate in order to issue Disown Writes on the NDAL. The Cbox continues to process transactions from the NDAL_IN_QUEUE normally, performing writebacks as necessary. With one exception described below, the Cbox arbiter prevents all new reads and writes from the Mbox while P%CPU_WB_ONLY_L is asserted. Therefore, if P%CPU_WB_ONLY_L is asserted for long periods of time, CPU performance could be adversely impacted.

The exception to the rule is the following: If a READ_LOCK from the Mbox is in progress when P%CPU_WB_ONLY_L is asserted, the WRITE_UNLOCK from the write queue must be allowed to complete. Otherwise, deadlock could occur if the system asserted P%CPU_WB_ONLY_L until it received data from the WRITE_UNLOCK. Therefore, while P%CPU_WB_ONLY_L is asserted, the write queue is permitted to continue if a READ_LOCK is in progress.
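The gating rule above can be written as a single predicate. The following Python sketch is illustrative only: the function name, the string labels for the request source, and the boolean inputs are assumptions made for this example, and NDAL_IN_QUEUE processing and writebacks (which continue regardless) are deliberately not modeled.

```python
def mbox_request_allowed(source: str,
                         wb_only_asserted: bool,
                         read_lock_in_progress: bool) -> bool:
    """Return True if the arbiter may service a new Mbox request.

    source is 'read' or 'write_queue' (hypothetical labels for this sketch).
    """
    if source not in ("read", "write_queue"):
        raise ValueError(f"unknown request source: {source!r}")
    if not wb_only_asserted:
        # Without P%CPU_WB_ONLY_L asserted there is no restriction.
        return True
    # With P%CPU_WB_ONLY_L asserted, new Mbox reads and writes are blocked,
    # except that the write queue continues while a READ_LOCK is in progress
    # so that the matching WRITE_UNLOCK can complete.
    return source == "write_queue" and read_lock_in_progress


print(mbox_request_allowed("read", True, True))          # False
print(mbox_request_allowed("write_queue", True, True))   # True
print(mbox_request_allowed("write_queue", True, False))  # False
```

The single exception is what prevents the deadlock described above: the system may be holding P%CPU_WB_ONLY_L asserted while waiting for the very WRITE_UNLOCK that the restriction would otherwise block.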
The READ_LOCK is completed when either the WRITE_UNLOCK is issued and completed, or an "IPR WRITE_UNLOCK" to CEFSTS is issued and completed.

During the cycle in which P%CPU_WB_ONLY_L is asserted, the Cbox may issue a non-writeback command on the NDAL. It is up to the NDAL arbiter not to grant to NVAX again during that cycle, so that the Cbox does not issue another non-writeback command in the following cycle. If the NDAL arbiter does assert P%CPU_GRANT_L during the same cycle in which P%CPU_WB_ONLY_L is asserted, NVAX may drive another non-writeback command on the NDAL in the following cycle, which was granted.

There is one interesting error case which can occur when P%CPU_WB_ONLY_L is asserted. It is as follows: Normally, when the Cbox has a READ_LOCK outstanding and it receives an O_INVAL cache coherency request (OREAD or WRITE to the block), it sets the OIP bit in the FILL_CAM (O_INVAL pending). If the Cbox receives an R_INVAL cache coherency request, it sets the RIP bit in the FILL_CAM (R_INVAL pending). When the Ebox issues the WRITE_UNLOCK and the Cbox arbiter sees that RIP or OIP is set, it issues a block writeback to the NDAL. This is done even if P%CPU_WB_ONLY_L is asserted.

If some error occurs which prevents the Ebox from issuing the WRITE_UNLOCK, it sends the Cbox an "IPR WRITE_UNLOCK" to clear the READ_LOCK out of the FILL_CAM. This "IPR WRITE_UNLOCK" clears the FILL_CAM entry but the Cbox arbiter DOES NOT check the status of RIP and OIP to see if a writeback is needed. The implication is that if the Cbox is in the middle of a READ_LOCK-WRITE_UNLOCK sequence and a cache coherency transaction arrives for the block, AND the Ebox never issues the WRITE_UNLOCK due to some error (see below), the Cbox will NOT write back that block in response to the former invalidate. (The Cbox would write the block back if a subsequent cache coherency request arrived.)
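The difference between the two unlock paths can be made concrete with a short Python sketch. The class and function names here are hypothetical, and treating "clearing the entry" as resetting RIP and OIP along with the READ_LOCK is an assumption of this model; the text only states that the entry is cleared and that RIP and OIP are not checked on the IPR path.

```python
from dataclasses import dataclass


@dataclass
class FillCamEntry:
    """Hypothetical model of one FILL_CAM entry holding a READ_LOCK."""
    read_lock: bool = True
    rip: bool = False   # R_INVAL pending
    oip: bool = False   # O_INVAL pending

    def clear(self) -> None:
        # Assumption: clearing the entry drops the pending-invalidate bits too.
        self.read_lock = self.rip = self.oip = False


def write_unlock(entry: FillCamEntry) -> bool:
    """Normal WRITE_UNLOCK: returns True if a block writeback is issued."""
    writeback = entry.rip or entry.oip   # arbiter checks RIP and OIP
    entry.clear()
    return writeback


def ipr_write_unlock(entry: FillCamEntry) -> bool:
    """IPR WRITE_UNLOCK: clears the entry WITHOUT checking RIP or OIP."""
    entry.clear()
    return False


normal = FillCamEntry(oip=True)
errored = FillCamEntry(oip=True)
print(write_unlock(normal))       # True: writeback issued
print(ipr_write_unlock(errored))  # False: pending invalidate is dropped
```

The second call shows the hazard: an invalidate that arrived during the READ_LOCK is forgotten, so a system element waiting for the corresponding writeback would not receive it unless the coherency request is reissued.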
Any of the following errors would cause this situation: a TB parity error after issuing the read lock; an Ebox S3 stall timeout after issuing the read lock; an uncorrectable error in the backup cache data RAMs on the first quadword of the read lock. This could cause a deadlock in a system if the system had asserted P%CPU_WB_ONLY_L because it was waiting for the writeback. NVAX might never issue the writeback, and the Cbox stops processing after the "IPR WRITE_UNLOCK" until P%CPU_WB_ONLY_L is deasserted.

One solution to the deadlock is for the system element which is waiting for the writeback to have a timeout counter, so that it does not wait forever. Once the element times out, P%CPU_WB_ONLY_L should be deasserted and the system can continue to operate. Alternatively, if the cache coherency transaction is reissued on the NDAL after the completion of the "IPR WRITE_UNLOCK", the Cbox WILL service it.

13.9.8 Backup Cache Errors

In general, the Cbox logs as much state as possible concerning errors and notifies the Ebox and/or Mbox that an error has occurred. For every error, the Cbox asserts either C%CBOX_H_ERR_H or C%CBOX_S_ERR_H to notify the interrupt section of a hard error or a soft error, respectively. The Cbox also notifies the Mbox if the error occurred on a fill to the Mbox. The backup cache goes into Error Transition Mode when it detects any uncorrectable error from the cache RAMs.

Table 13-58: Backup Cache ECC Errors and NVAX CPU Error Responses

General Problem: Correctable ECC error in the data RAMs

    read hit for writeback or read hit for deallocate IPR
        Cbox asserts C%CBOX_S_ERR_H. The data for the writeback is corrected and the writeback continues normally.

    read hit for Mbox
        Cbox asserts C%CBOX_S_ERR_H. C%CBOX_ECC_ERR_H is asserted to tell the Mbox to ignore the uncorrected data. When the data has been corrected, it is driven to the Mbox. Hardware does not correct the error in the cache.

    read for write hit
        Cbox asserts C%CBOX_S_ERR_H. The corrected data is merged with the write data and written into the RAMs.

    miss
        No error is reported.

General Problem: Correctable ECC error in the tag store

    any read or write except WUNLOCK (hit or miss)
        Cbox asserts C%CBOX_S_ERR_H, assumes the transaction missed, and sends a READ or an OREAD to memory. If the location was owned, making a deallocate necessary, the outgoing address is corrected for the writeback. Note that if the transaction actually hit owned, the READ or OREAD is sent to the NDAL followed by a writeback of the same block. The errored location is corrected by hardware when the tag and valid bit are written for the fill.

    WRITE_UNLOCK
        No tag store lookup is done, so this case does not occur.

    cache coherence transaction miss
        Cbox asserts C%CBOX_S_ERR_H. Hardware does not correct the bad location; it may be done by software.

    cache coherence transaction hit
        Cbox asserts C%CBOX_S_ERR_H. Writes the corrected tag, valid, and owned bits back into the tag store when invalidating the entry. Uses the corrected address for the writeback if necessary.

General Problem: Uncorrectable ECC error in the data RAMs (includes addressing errors)

    read for writeback or deallocate IPR
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. The data cycle command for the NDAL is changed to BADWDATA and the writeback continues normally.

    VALID-OWNED or VALID-UNOWNED read for Mbox
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. The CM_OUT_LATCH is loaded with the data and marked bad by asserting C%CBOX_HARD_ERR_H.

    VALID-OWNED DREAD_LOCK for Mbox, first quadword fails
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. The CM_OUT_LATCH is loaded with the data and marked bad by asserting C%CBOX_HARD_ERR_H. The DREAD_LOCK entry remains in the FILL_CAM until microcode issues the "IPR WRITE_UNLOCK". If RIP or OIP is set, it is not processed.

    VALID-OWNED DREAD_LOCK for Mbox, quadword other than the first one fails
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. The CM_OUT_LATCH is loaded with the data and marked bad by asserting C%CBOX_HARD_ERR_H. The Ebox/Mbox issues the WRITE_UNLOCK since data for the DREAD_LOCK was returned.

    read for write or write-unlock, valid-owned hit
        Cbox asserts C%CBOX_H_ERR_H, puts backup cache into ETM. When the error is detected, write data has already been merged with the corrupted data. The Cbox inverts two of the ECC check bits (bits 3,7), which gives a high probability that when the data is read again, an uncorrectable error will be detected. See description after this table.

    miss
        No error is reported.

General Problem: Uncorrectable ECC error in the tag store (includes addressing errors)

    read for Mbox
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. The read is sent to memory; if the backup cache actually owned the block, the read will time out. If fill data is returned, the fill is done to the Bcache and the fill data is sent to the Mbox.

    write
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. The OREAD for the write is sent to memory. If the cache actually owned the block, the read will time out and the write will then be sent to memory. The write will then time out as well unless error handling software cleans up the problem. If the cache did not own the block, the OREAD will complete, the write will be merged with it, and the merged data will be written to the cache.

    WRITE_UNLOCK
        No tag store lookup is done, so this case does not occur.
    cache coherence transaction
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. Transaction is treated as a miss with regard to the backup cache; the invalidate is forwarded to the Mbox if the cache coherence transaction was due to an OREAD or a WRITE.

One action noted in the table deserves further explanation. When an uncorrectable ECC error is detected in the data RAMs during a read-modify-write, the Bcache controller has already begun to write the new data into the cache, overwriting the errored data. The new data may have been corrupted by the errored data which was read from the cache. If this were allowed to be written into the cache with correct ECC, it might be read back later with no errors and incorrect data would be returned to the CPU. To prevent this, the Bcache controller inverts two of the check bits which are being written to the cache, deliberately causing errored data to be written. This increases the likelihood that when the data is read back, an uncorrectable error will be detected, whether the data is read back as written or with single-bit or multiple-bit errors.

Due to layout constraints, only check bits 3, 6, and 7 were potential candidates to be inverted in the circumstance described. The probabilities for reading the data back as uncorrectable are shown in Table 13-59.
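The check-bit inversion itself is a simple XOR with a fixed mask. The Python sketch below shows only that masking step for the chosen pair (bits 3 and 7) of an 8-bit check-bit field; the function name is hypothetical, bit 0 is assumed to be the least significant bit, and the ECC code that generates or checks the bits is not modeled.

```python
INVERT_MASK = (1 << 3) | (1 << 7)   # check bits 3 and 7, i.e. 0x88


def invert_checkbits(ecc: int) -> int:
    """Return the 8-bit check-bit field with bits 3 and 7 inverted."""
    if not 0 <= ecc <= 0xFF:
        raise ValueError("check-bit field must fit in 8 bits")
    return ecc ^ INVERT_MASK


good_ecc = 0b0101_0101
bad_ecc = invert_checkbits(good_ecc)
print(f"{good_ecc:08b} -> {bad_ecc:08b}")
# Two bits differ from the correct check bits, so the stored word
# deliberately no longer matches its ECC.
print(bin(good_ecc ^ bad_ecc).count("1"))
```

Because the operation is an XOR, applying it twice restores the original check bits, which is a convenient property when experimenting with a software model.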
Table 13-59: Probability of reading data with an uncorrectable error after writing it with inverted check bits

Bits       no error    single bit   double bit   triple single   quad single
inverted   read back   error read   error read   nibble error    nibble error
                       back         back         read back       read back
3,6        1.00        .3425        .9909        .4306           .6111
3,7        1.00        .3973        .9916        .4861           .6667
6,7        1.00        .1233        .9878        1.0000          1.0000
3,6,7      0.00        .9863        .4429        1.0000          1.0000

Choosing bits 3 and 7 results in uncorrectable errors a high percentage of the time if you assume a high likelihood that the data will be read back with no error (as it would be if the original error were transient) or with a double-bit error (as it would be if the original error were a hard double-bit error).

13.9.9 Backup Cache Errors Incurred While in Error Transition Mode

Table 13-60 describes error handling when the backup cache is already in ETM.

NOTE: The table below only describes ETM error cases which differ from error handling when the cache is in normal mode.

Table 13-60: Backup Cache ECC Error handling during ETM

General Problem: Correctable ECC error in the tag store

    WRITE_UNLOCK
        The error is corrected and the WRITE_UNLOCK is handled as it normally is in ETM: it is written to the Bcache if it hits owned, and it is written to memory if it misses or hits valid.

General Problem: Uncorrectable ECC error in the tag store (includes addressing errors)

    read for Mbox
        Cbox asserts C%CBOX_S_ERR_H, puts backup cache into ETM. The read is sent to memory; if the backup cache actually owned the block, the read will time out. If fill data is returned, the fill is not done to the Bcache but is sent to the Mbox.

    write
        Cbox asserts C%CBOX_S_ERR_H. The write is sent to memory. If the cache actually owned the block, the write will time out in the memory interface unless software forces the Cbox to disown the block. If the cache did not own the block, the system handles the write as it normally does for a cache which is off.

    WRITE_UNLOCK
        Cbox asserts C%CBOX_S_ERR_H. The write is sent to memory as a QW WDISOWN. Since the READ_LOCK was done just previously, memory always believes that we own the block. In most cases, the cache itself does not have a record of owning the block since a READ_LOCK to an owned block during ETM forces a writeback of the block. In these cases the WRITE_UNLOCK handling is consistent. There is only one case where the cache does own the block: if we entered ETM on or after the READ_LOCK and before the WRITE_UNLOCK. In this case, the cache may contain previously written data which is not now reflected in memory. This may be handled by software.

13.9.10 NDAL Parity Errors

The Cbox response to NDAL parity errors is described in Chapter 3.

13.10 Testability

The testability features provided in the Cbox make key Cbox control visible for debug purposes. The testability features do not specifically address fault coverage for manufacturing, since Cbox activity is very visible on the NDAL and cache interface pins.

Many of the Cbox IPRs should be useful for testing and debug. The IPRs are described in Section 13.5. This section describes additional Cbox testability features.

13.10.1 Parallel port

The parallel port is useful for real-time debugging and for manufacturing test. The Cbox does not control any nodes using the parallel port; it is used for observation only. C%PP_DATA_H<11:7> are driven as shown in Table 13-61. The Mbox contains the circuitry which enables C%PP_DATA_H<11:7> to drive the parallel port when 'N'cMB0X-DR_PP_H is asserted.
Table 13-61: Cbox Parallel Port Connections

Parallel port signal   Cbox signal    Meaning
C%PP_DATA_H<11>        BC_TS_CMD<2>   Given in Table 13-62
C%PP_DATA_H<10>        BC_TS_CMD<1>   Given in Table 13-62
C%PP_DATA_H<9>         BC_TS_CMD<0>   Given in Table 13-62
C%PP_DATA_H<8>         DEALLOC        Asserted when the tag store starts a deallocate.
C%PP_DATA_H<7>         BC_HIT         Backup cache hit; factors in the type of request with VALID, OWNED, and the result of the tag compare.

Table 13-62: Interpretation of BC_TS_CMD<2:0>

BC_TS_CMD   Name          Tag store operation
000         DREAD         Data-stream tag lookup
001         IREAD         Instruction-stream tag lookup
010         OREAD         Ownership-read tag lookup for a write or a READ_LOCK
011         WUNLOCK       Ownership-read tag lookup for a WRITE_UNLOCK (done only under ETM)
100         R_INVAL       Cache coherency tag lookup as the result of NDAL DREAD or IREAD
101         O_INVAL       Cache coherency tag lookup as the result of NDAL OREAD or WRITE
110         IPR_DEALLOC   Tag lookup for an explicit IPR deallocate operation
111         unused

13.10.2 Internal scan chain

A scan chain is provided on both entries of the FILL_CAM. A Linear Feedback Shift Register (LFSR) is provided on this scan chain. This serves two purposes: it helps the debug effort and it increases fault coverage in manufacturing. The scan chain bits are loaded when M%C_ISR_LOAD_L is asserted; they are shifted out when it is deasserted. The LFSR is enabled when M%C_ISR_LFSR_L is asserted. When M%C_ISR_LFSR_L is not asserted, the scan chain becomes an observe-only register.

The FILL_CAM gives cycle-by-cycle information on what is happening in the Cbox, as every potential cache miss is loaded into the FILL_CAM before the miss actually occurs. There is information relating to cache coherency requests as well.
The Cbox scan chain covers the following bits of the FILL_CAM:

Table 13-63: FILL_CAM scan chain

Name              Extent   Type   Description
RDLK_0            0        WC     Indicates that the outstanding read is a READ_LOCK.
IREAD_0           1        RO     This is an Istream read from the Mbox which may be aborted.
OREAD_0           2        RO     This is an outstanding OREAD.
WRITE_0           3        RO     This read was done for a write.
TO_MBOX_0         4        RO     Data is to be returned to the Mbox.
RIP_0             5        RO     READ invalidate pending.
OIP_0             6        RO     OREAD invalidate pending.
DNF_0             7        RO     Do not fill - data not to be written into the cache or validated when the fill returns.
RDLK_FL_DONE_0    8        RO     Indicates that the last fill for a READ_LOCK arrived.
REQ_FILL_DONE_0   9        RO     Indicates that the requested quadword was successfully received.
COUNT_0           11:10    RO     How many of the fill quadwords have been returned successfully.
VALID_0           12       WC     Indicates that an error occurred and the register is locked.
RDLK_1            13       WC     Indicates that the outstanding read is a READ_LOCK.
IREAD_1           14       RO     This is an Istream read from the Mbox which may be aborted.
OREAD_1           15       RO     This is an outstanding OREAD.
WRITE_1           16       RO     This read was done for a write.
TO_MBOX_1         17       RO     Data is to be returned to the Mbox.
RIP_1             18       RO     READ invalidate pending.
OIP_1             19       RO     OREAD invalidate pending.
DNF_1             20       RO     Do not fill - data not to be written into the cache or validated when the fill returns.
RDLK_FL_DONE_1    21       RO     Indicates that the last fill for a READ_LOCK arrived.
REQ_FILL_DONE_1   22       RO     Indicates that the requested quadword was successfully received.
COUNT_1           24:23    RO     How many of the fill quadwords have been returned successfully.
VALID_1           25       WC     Indicates that an error occurred and the register is locked.

There are two FILL_CAM entries. Thirteen bits in each are covered, for a total of 26 bits in this scan path.
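As an illustration of the layout in Table 13-63, the following Python sketch unpacks the 26 scan bits into named fields. It is a reading aid under stated assumptions, not a description of any existing tool: the function name and the input format (a list of bits in the order they are shifted out, bit <0> first) are invented for this example, and the name VALID_1 for bit 25 is assumed by symmetry with entry 0.

```python
# (field name, least significant bit within the entry, width in bits)
ENTRY_FIELDS = [
    ("RDLK", 0, 1), ("IREAD", 1, 1), ("OREAD", 2, 1), ("WRITE", 3, 1),
    ("TO_MBOX", 4, 1), ("RIP", 5, 1), ("OIP", 6, 1), ("DNF", 7, 1),
    ("RDLK_FL_DONE", 8, 1), ("REQ_FILL_DONE", 9, 1),
    ("COUNT", 10, 2), ("VALID", 12, 1),
]
BITS_PER_ENTRY = 13
NUM_ENTRIES = 2


def decode_fill_cam_scan(shifted_bits):
    """Decode 26 scan bits (bit <0> first) into a dict of FILL_CAM fields."""
    total = BITS_PER_ENTRY * NUM_ENTRIES
    if len(shifted_bits) != total:
        raise ValueError(f"expected {total} bits, got {len(shifted_bits)}")
    # Rebuild the 26-bit word: element i of the stream is scan bit <i>.
    word = sum((bit & 1) << i for i, bit in enumerate(shifted_bits))
    fields = {}
    for entry in range(NUM_ENTRIES):
        base = entry * BITS_PER_ENTRY
        for name, lsb, width in ENTRY_FIELDS:
            fields[f"{name}_{entry}"] = (word >> (base + lsb)) & ((1 << width) - 1)
    return fields


bits = [0] * 26
bits[0] = 1    # RDLK_0
bits[11] = 1   # upper bit of COUNT_0 (bits 11:10), so COUNT_0 == 2
bits[18] = 1   # RIP_1
decoded = decode_fill_cam_scan(bits)
print(decoded["RDLK_0"], decoded["COUNT_0"], decoded["RIP_1"])
```

Keeping the field table separate from the decoding loop means the same thirteen-bit layout is applied to both entries, mirroring the way the table repeats for entry 0 and entry 1.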
The Cbox scan chain is connected in the order shown in the table, with bit <0> shifted out first and sent to the Mbox scan chain. When the Cbox scan chain is in shift mode, a "0" is shifted into bit <25> of the Cbox scan chain. Bit <0> is driven onto C%ISR2_TDO_H, which is input to the Mbox scan chain.

13.11 Performance Monitoring

The Cbox sends two signals, C%PMUX0_H and C%PMUX1_H, to the performance counters. CCTL<PM_ACCESS_TYPE> controls the mux which outputs C%PMUX0_H. CCTL<PM_HIT_TYPE> controls the mux which outputs C%PMUX1_H.

The correspondence between CCTL<PM_ACCESS_TYPE> and C%PMUX0_H is shown in Table 13-64.

Table 13-64: Cbox Performance Monitoring Control CCTL<PM_ACCESS_TYPE>

CCTL<PM_ACCESS_TYPE>   Signal muxed onto C%PMUX0_H   Signal functionality
000                    BC_COR                        Bcache coherency access (as a result of an NDAL DREAD, IREAD, OREAD, or WRITE)
001                    BC_COR_READ                   Bcache coherency access as a result of an NDAL DREAD or IREAD
010                    BC_COR_OREAD                  Bcache coherency access as a result of an NDAL OREAD or WRITE
011                    unused
100                    BC_CPU                        Bcache CPU access (as a result of an NVAX Iread, Dread, or Oread)
101                    BC_CPU_IREAD                  Bcache CPU access as a result of an NVAX Iread
110                    BC_CPU_DREAD                  Bcache CPU access as a result of an NVAX Dread or Dread-modify
111                    BC_CPU_OREAD                  Bcache CPU access as a result of an NVAX Oread due to a read lock, a write, or a write unlock

The correspondence between CCTL<PM_HIT_TYPE> and C%PMUX1_H is shown in Table 13-65.

Table 13-65: Cbox Performance Monitoring Control CCTL<PM_HIT_TYPE>

CCTL<PM_HIT_TYPE>   Functionality of the signal muxed onto C%PMUX1_H
00                  Bcache hit; factors in VALID and OWNED as necessary, based on the transaction.
01                  Bcache hit owned; tag matched, VALID and OWNED were set.
10                  Bcache hit valid; tag matched, VALID was set, OWNED was either set or clear.
11                  Bcache miss; tag did not match, VALID and OWNED were set (triggers writeback).
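The four hit-type selections in Table 13-65 can be expressed as a small Python function over the tag compare result and the VALID and OWNED bits. This is a sketch with assumptions labeled: the function and parameter names are hypothetical, and the handling of selection 00 (a request that needs ownership hits only if OWNED is set, any other request hits if VALID is set) is an interpretation of "as necessary, based on the transaction" rather than something the table spells out.

```python
def pmux1_value(hit_type: int, tag_match: bool, valid: bool, owned: bool,
                needs_ownership: bool = False) -> bool:
    """Model the condition selected by CCTL<PM_HIT_TYPE> (hypothetical helper)."""
    if hit_type == 0b00:
        # Assumption: ownership requests require OWNED; other requests need only VALID.
        return tag_match and valid and (owned or not needs_ownership)
    if hit_type == 0b01:
        return tag_match and valid and owned        # hit owned
    if hit_type == 0b10:
        return tag_match and valid                  # hit valid, OWNED ignored
    if hit_type == 0b11:
        return (not tag_match) and valid and owned  # miss on an owned block
    raise ValueError("hit_type must be a 2-bit value")


# A valid but unowned block whose tag matches:
print(pmux1_value(0b10, tag_match=True, valid=True, owned=False))  # True
print(pmux1_value(0b01, tag_match=True, valid=True, owned=False))  # False
print(pmux1_value(0b00, True, True, False, needs_ownership=True))  # False
```

Selection 11 is the only one that reports on a tag mismatch; it counts the misses that displace an owned block and therefore trigger a writeback.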
The HIT signals which produce C%PMUX1_H are valid during the same cycle in which the ACCESS signals which produce C%PMUX0_H are asserted. They must be valid at the same time because, in the central performance monitoring hardware, C%PMUX1_H is conditioned with C%PMUX0_H.

13.12 Initialization

When the CPU powers up, K_C%RESET_L and K%RESET_CCTL_L are asserted, clearing the main queues and latches in the Cbox and putting the Cbox state machines into their idle states. The only Cbox IPR which is initialized on reset is the Cbox control register, CCTL. It is initialized as described in Section 13.5.1.

K_C%RESET_L is also asserted when the Ebox timeout counter expires. At this time K%RESET_CCTL_L is not asserted. Thus, the Cbox is initialized just as on power-up except that CCTL is not changed. K_C%RESET_L must be asserted for 18 internal cycles (6 NDAL cycles) in order to properly reset the Cbox.

The backup cache must be initialized and turned on as described in Section 13.9.6. Software must write CCTL to the desired state. The W1C error registers should be cleared so that they start with no error bits set.

When the CPU powers up, K%EXT_RESET_L is asserted, which puts the pads into their reset state:

• Tristates P%NDAL_H<63:0>, P%CMD_H<3:0>, P%ID_H<2:0>, and P%PARITY_H<2:0>. This occurs when internal reset is asserted, and is not qualified with any clock.
• Releases P%ACK_L. This occurs when internal reset is asserted, and is not qualified with any clock.
• Deasserts P%CPU_REQ_L, P%CPU_HOLD_L, and P%CPU_SUPPRESS_L. This occurs when K%EXT_RESET_L is asserted, and is not qualified with any clock.
• Deasserts P%TS_OE_L, P%TS_WE_L, P%DR_OE_L, and P%DR_WE_L. This occurs when K%EXT_RESET_L is asserted, and is not qualified with any clock.
• Tristates P%TS_TAG_H<31:17>, P%TS_ECC_H<5:0>, P%TS_VALID_H, P%TS_OWNED_H, P%DR_DATA_H<63:0>, and P%DR_ECC_H<7:0>.
  This occurs when K%EXT_RESET_L is asserted, and is not qualified with any clock.

13.13 Cbox Interfaces

The Cbox interfaces with the Mbox, the NDAL, the backup cache, the Interrupt section, and the Clock section. The signals the Cbox uses for each of these interfaces are listed here.

Table 13-66: CBOX interface signals

Signal                  Number  I/O  Description

NDAL SIGNALS (80 total)
P%CPU_REQ_L             1       O    Requests the NDAL.
P%CPU_HOLD_L            1       O    Holds the NDAL.
P%CPU_GRANT_L           1       I    Grants NVAX the NDAL.
P%CPU_SUPPRESS_L        1       O    Suppresses the NDAL.
P%CPU_WB_ONLY_L         1       I    Suppresses non-writeback NVAX transactions.
P%NDAL_H<63:0>          64      I/O  NDAL address/data, multiplexed lines.
P%CMD_H<3:0>            4       I/O  NDAL command.
P%ID_H<2:0>             3       I/O  Identifies the NDAL driver.
P%PARITY_H<2:0>         3       I/O  Parity on the NDAL.
P%ACK_L                 1       I/O  Acknowledges NDAL cycles as correctly received.

BACKUP CACHE TAG STORE SIGNALS (41 total)
P%TS_INDEX_H<20:5>      16      O    Index into the tag store.
P%TS_OE_L               1       O    Tag Store Output Enable.
P%TS_WE_L               1       O    Tag Store Write Enable.
P%TS_TAG_H<31:17>       15      I/O  Backup cache tag.
P%TS_ECC_H<5:0>         6       I/O  Tag store ECC.
P%TS_OWNED_H            1       I/O  Indicates ownership of the block.
P%TS_VALID_H            1       I/O  Indicates the block is valid.

BACKUP CACHE DATA RAM SIGNALS (92 total)
P%DR_INDEX_H<20:3>      18      O    Index into the data rams.
P%DR_OE_L               1       O    Data RAM output enable.
P%DR_WE_L               1       O    Data RAM write enable.
P%DR_DATA_H<63:0>       64      I/O  Backup cache data.
P%DR_ECC_H<7:0>         8       I/O  Backup cache data ECC.

CLOCK PINS (4 total)
P%PHI12_IN_H            1       I    NDAL clock used in the pads.
P%PHI23_IN_H            1       I    NDAL clock used in the pads.
P%PHI34_IN_H            1       I    NDAL clock used in the pads.
P%PHI41_IN_H            1       I    NDAL clock used in the pads.

CLOCK SECTION INTERFACE (5 total)
K_MCB%PHI_1_H           1       I    Clock used in the Cbox.
K.,.MC:n,PRU_H 1 I Clock used in the Cbox. K__C:n,P~3_H 1 I Clock used in the Cbox. KJlC:n,PHL,,-H 1 I Clock used in the Cbox. It.P~PHI_IJ! 1 I Clock used in the upper pad ring. K...P~3_H 1 I Clock used in the upper pad ring. It.P~"-H 1 I Clock used in the upper pad ring. It.PADLVBJ...l_H 1 I Clock used in the lower pad ring. It.PADL~.":LH 1 I Clock used in the lower pad ring. It.PADLVBJ...3_H 1 I Clock used in the lower pad ring. It.PADLVBJ..."J! 1 I Clock used in the lower pad ring. K%BXT_REBET.J. 1 I Puts the cache and NDAL pads into their reset state. It.CCJl>RESET_L 1 I Resets the Cbox except for CCTL. K%RESET_CCTL..L 1 I Resets the Cbox control register, CCTL. I Resets the BID cycle counter which relates internal to external time. It.CE'*'BESETJ! 1 EBOX INTERFACE SIGNALS (2 total) C%CBO~_H 1 o Indicates a hard error in the backup cache or on the NDAL. C%CBOx...S_ERR_H 1 o Indicates a soft error in the backup cache or on the NDAL. ECJ&.TJMEOUT_BASBJ! 1 I Controls the NDAL read timeout counters. ECJ&.TJMEOUT~H 1 I Controls the NDAL read timeout counters. 13-120 The Cbox DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 13-66 (Cont.): CBOX Interface signals S~ Nwmb~ VO Description TEST AND PERFORMANCE MONITORING SIGNALS (10 total) 5 o Cbox internal state, driven to the Mbox, where it is driven to the parallel port when selected. 1 o Cbox internal scan chain output which hooks up to the Mbox scan chain. 1 I Tells the Cbox LFSRJinternal scan chain whether to load or shift. 1 I Puts the Cbox LFSRlinternal scan chain into Linear Feedback Shift Register mode. 1 I Clocks the boundary scan cells. 1 I Clocks the boundary scan cells. 1 I When asserted, the boundary scan cells are in load mode; otherwise, they are in shift mode. 1 I When asserted, the pins are driven with data from the boundary scan cells rather than with NVAX internal data. 1 I Controls the update of the cache I/O pads, when driven by JTAG. 
C_PAD.-N%BSR..-.ND~H<S3> 1 o Boundary scan chain output from the Cbox pads. E_PAD_IN'NBSR.-MACHlNE_CBECK.-L 1 I Boundary scan chain input from the Ebox pads. :s:..PAD_CKHDISABLE_O'VT_H 1 I Asynchronously disables all NVAX outputs from driving; equivalent to the inversion of P%DISABLE_OUT_L. 1 1 DIGITAL CONFIDENTIAL o o Cbox performance monitoring output. Cbox performance monitoring output. The Cbox 13-121 NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 13-66 (Cont.): Signal CBOX Interface signals 110 Description 5 I Mbox reference command field. 29 I Physical address of Mbox reference. 3 I Physical address of Mbox reference, lower three bits. 8 I Byte enable field of Mbox reference. 1 I Indicates that the current 86 reference packet should be latched and processed by the CboX; not asserted for writes as all writes are processed by the COOL 1 I This is equivalent to M'*CB01:..REFJ£NABLEJ., but driven to the COOx with later timing, after the Mbox detects a Pcache parity error. It indicates that the S6 reference packet should be processed by the COOL 1 I Indicates that any IREAD which the Cbox may be processing should be immediately terminated. 1 I Indicates that the Cbox may drive nS~LDATA...H<AO> during the following cycle in order to attempt a fill data bypass. 64 I/O Bus used to receive data from the Mbox and to send data to the MboL 8 o Byte data parity for BU6J)ATA,..H<63.cb. 2 Hexaword address for invalidate sent to Mbox 2 o o o 1 o Indicates that the requested quadword of data is being returned. This is asserted for both DREADs and !READs; it is also asserted if a hard error occurs on fill data and the requested quadword has not yet been returned. 1 o Indicates that this is the last fill sent for the read being processed 1 o Indicates that a hard error is associated with the data being returned. The Mbox treats this as a fill with an error. 1 o Indicates that an ECC error is associated with the data being returned. 
The Mbox ignores the data and waits for another fill from the Cbox. 1 O Indicates that the Cbox cannot accept any more entries in its WRITE_QUEUE. Number MBOX INTERFACE SIGNALS (157 total) 27 Command field of Cbox reference sent to Mbox. Address bits to indicate to which quadword within the hexaword the current fill data belongs.

13.14 Resolved Issues

1. Issue: Does the Cbox need to check for conflicts between writes into the Istream and IREADs?

   Resolution: Yes, it does. The following case illustrates why.

   Suppose that the Cbox did not check for conflicts between writes and Istream reads. Also note that the SRM requires that an REI be done after any write into the instruction stream. REI flushes all write buffers and flushes the VIC.

   Suppose that the Ibox is prefetching and issues IREAD A, IREAD A+1. A and A+1 are adjacent hexawords. Around the same time the Ebox is doing unaligned WRITE A,A+1, REI, which was caused by Istream previous to that now being fetched by the Ibox.

   Suppose the sequence as seen by the Mbox is IREAD A, unaligned WRITE A,A+1, IREAD A+1, REI. The first IREAD is prefetching Istream data and should retrieve the new A. If the first IREAD misses in the VIC and the Pcache, the Bcache will return old data for the IREAD. The write will then be done into the Pcache, since it is write-through, and into the WRITE_QUEUE. At this point the new data for A is in the Pcache.

   Now the second IREAD misses the Pcache and appears in the IREAD_LATCH in the Cbox. It is serviced before the write since no conflict checking is done for IREADs, and they take priority over writes. Old data is returned to the Pcache for the second IREAD. Then the Clear Write Buffer command appears in the Cbox because the Ebox is executing the REI, so the write is done.

   At this point the VIC has old data for the IREADs. This is OK because the REI flushes the VIC.
   Location A is updated in the Pcache because the write was done after the first IREAD. However, the Pcache has old data for A+1 because the Bcache returned the old data after the write missed into the Pcache. When the Istream re-fetches A+1, it will get old data from the Pcache. This is not the behavior we want.

   Thus, the Cbox implements conflict checking for IREADs and prevents the IREAD of A+1 from bypassing the write to A+1.

2. Issue: Is it OK that the Cbox reorders I/O space writes with respect to memory space writes?

   Resolution: Yes, it is OK per VAX ECO 95, Allow Write-and-Run to I/O space.

   This is the scenario where the Cbox may reorder I/O space writes with respect to memory writes: The Mbox issues Memory Write A followed by I/O Write B. Memory Write A hits owned in the backup cache and is written. I/O Write B goes to the NON_WRITEBACK_QUEUE. The NDAL is busy or P%CPU_WB_ONLY_L is asserted, so I/O Write B stays in the NON_WRITEBACK_QUEUE. Meantime, a cache coherency request arrives for memory location A. The data is retrieved from the backup cache and put into the WRITEBACK_QUEUE. Since the WRITEBACK_QUEUE contains a cache coherency request (or P%CPU_WB_ONLY_L is asserted), the WRITEBACK_QUEUE has priority over the NON_WRITEBACK_QUEUE. Therefore Memory Data A reaches the NDAL before I/O Write B, effectively reordering the writes.

13.15 NVAX CBOX Signal Name Cross-Reference

All CBOX signal names and pin names referenced in this chapter have appeared in bold and reflect the actual name appearing in the NVAX schematic set, with the exception of K%EXT_RESET_L, which is a behavioral model name only. For each signal and pin appearing in this chapter, the table below lists the corresponding name which exists in the behavioral model.
Table 13-67: Cross-reference of all names appearing In the CBOX chapter Schematic Name Behavioral Model Name C'l£BOI..,CMD...B<11O> C'I£BOI..,ECC_ERR.ft C'I£BOI..,HAB.D_BRR_H CCJ5oCBOx....:a.JmR_H C'I£BOI..,H..ERR-H CCliCBOI..,S.,;ERR..H C'I£BOI..,S_EBB._H OJIIIS~TDO...B OI.ISB2_TDOJI OJIILASTJILL_H OM.AST_FILL.ft CCQOK)I..,~QW_H<4rS> OQDI()~QW_H<4rS> CClWMUXOJI OJIIPMUXl.ft OJIIPPJ)ATA..H<1117> OJIIWR...BVFJlACK...PREB_H C..AJ)C1aABVS...B<31aO> C..AJ)C4Ji>BIU-.ADDR_OVT.ft411O> C_~-.ADDR_OVTJl411O> C_B~-.ADDR.-lNJl411O> C_BItJilU.DC..,.ADDR_lNJl41aO> C_B~CLE_l...B C_BIO'«:YCLE_l...B C_BltJCJlCYCLE_2J1 C_BIO'«:YCLE_2_H C_BltJCJlCYCI..B..3J1 C_BIO'«:YCLB_S_H C_BIU_~TIMO_O..LAT...B C_BIU..NOCYXLTIMO_O_LAT_H C_BIU_~TJMO_l..LAT...B C_BIU..NOCYXLTIMO_l..LATJI C_BIU..NOC_5YXL'l'DIO_O..;EN..H C_BIU..NOCYXI_TIMO_O..EN_H C_BIU..,.NOC_5YXL'l'DIO_1..;EN_H CJIIt1..NOCYXLTIMO_l..EN_H C_BVSUIUJ)ATA...H<I3IO> C_BVSYIUJ)ATA....H<83IO> 13-124 The Cbox DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 13-67 (Cont.): Cross-reference of all names appearing in the CBOX chapter Schematic Name Behavioral Model Name E_PAD_IN'I'%BSR_MACBlNE_CHECK....L T_BSR%MACBINE_CBECK.-H K%EXT_'1MBS_H K_P.AD%EXT_TMBS_B K%RESET_CCTL_L K%RESET_L K....C%RESET_L K_C%RESET_L K....CE%RESET_H K_CE%RESET_H K....MCB%PBI_CH K%PBI_CB K....MCB%PBI_2_H K%PBI_2_B K....MCB%PBI_3_H K%PBI_3_B K....MCB%PBI_4_H K%PBI_"_B K....PAD%PBI_l_H K%PBI_CB K....PAD%PBI_3_H K%PBI_3_B K....PAD%PBI_4_H K%PBI_4_B K_PADL%PBI_CB K%PBI_l_B K....PADL%PBI_2_B K%PBI_2_H K....PADL%PBI_3..B K%PBl.-3_B K....PADL%PBI_4_B K%PBI_"_B K....PAD_CK2%DISABLE_OUT_B P%DISABLE_OUTJ.. 
K....PAD%EXT_RESET_TOP_L K%EXT_RESET_L K....PAD%EXT_RESET_BOT_L K%EXT_RESET_L M%ABORT_CBOXJRD....& M%ABORT_CBO%.,.m:o..H M%CBO~BypASS_ENABLE_B M%CBO~BYPASS_ENABLE_B M%CBO~LATE_EN_B M%CBO%.,.LATE_EN_H M%CBOOEF_ENABLE_L M%CBO%.,.REF_ENABLE_B M%C_ISR_LFSR_L T%ISR_LFSR_B M%C_ISR_LOAD_L T%ISR...LOAD_B M%C_S6_PA_B<2aO> M%C_S6_PA_H<2IO> M%S6_BYTE..MASK....BdsO> M%S6_BYTE...MABK....B<7sO> M%S6_CMD_B<4aO> M%S6_CMD_H<4sO> M%S6_PA.,.B<3113> M%S6_PAJI<3h3> T%MBO~DR_PP_B "rQIBO%.,.DR_PP_B T_JTG%BSR_:EXTEST_L T%BSR_EX'rEST_B T_JTG%BSR_UPDATE_L T%BSR_UPDATE_H T_.J'l:'Go%CAPTURE_L T%CAPTURE_H T_JTG%DRCLK..,.B T_.rrG_TAP%DR_CLKEN'_B T_JTG%DRCLK..,.L T_.rrG_TAP%DR_CLKEN'_H DIGITAL CONFIDENTIAL The Cbox 13-125 NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 13-67 (Cont.): Cross-reference of all names appearing In the CBOX chapter Schematic Name Behavioral Model Name Po/oACK_L Po/oACK_L Po/tCMD_H<3:0> Po/tCPU_GRANT_L Po/oCMD_H<3:0> P%CPU_GRANT_L Po/tCPU_HOLD_L Po/oCPU_HOLD_L Po/tCPU_RE<LL Po/tCPU_SUPPRESS_L Po/oCPU_RE<LL Po/oCPU_SUPPRESS_L Po/tCPU_WB_ONLY_L Po/oCPU_WB_ONLY_L P%DISABLE_OUT_L P%DISABLE_OUT_L P%DR_DATA_H<63:0> P%DR_DATA_H<63:0> P%DR_ECC_H<7:O> P%DR_ECC_H<7:O> P%DR_INDEX_H<20:3> P%DR_INDEX_H<20:3> P%DR_OE_L P%DR_OE_L P%DR_WE_L P%DR_WE_L P%ID_H<2:O> Po/olD_H<2:O> P%NDAL_H<63:O> Po/oNDAL_H<63:O> Po/DOSC_TCl_H Po/oOSC_TCl_H P%PARITY_H<2:O> P%PHI12_IN_H P%PBI23_IN_H p%pm34_IN_H P%P.ARITY_H<2:O> P%PBI12_IN_H P%PBI23_IN_H p%pm34_IN_H p%pm41_IN_H p%pm41_IN_H P%PHI12_0UT_H P%PBI12_OUT_H P%TS_ECC_H<5:O> P%TS_ECC_H<5:O> P%TS_INDEX_H<20:5> P%TS_INDEX_H<20:5> P%TS_OE_L P%TS_OE_L P%TS_OWNED_H P%TS_OWNED_H P%TS_TAG_H<31:17> P%TS_TAG_H<31:17> P%TS_VALID_H P%TS_VALID_H P%TS_WE_L P%TS_WE_L 13-126 The Cbox DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 13.16 Revision History Table 13-68: Revision History Who When Description of change Rebecca Stamm 9-0ct-1991 Made the following change: Bcache data MUST be initialized with 
correct ECC on powerup, contrary to what was in previous revisions.

Rebecca Stamm  16-Aug-1991
    Minor updates and clarifications. RDE and UNEXPECTED_FILL are both set if an unexpected RDE arrives on the NDAL. During ETM, a read modify that does not hit owned causes a read to memory, NOT an OREAD to memory. On uncorr error on RMW, checkbits 3 and 7 are inverted rather than 3,6,7. Added description of why the bits are inverted.

Rebecca Stamm  20-Feb-1991
    Correct TS_CMD and DR_CMD encodings. Clarify some sections. Add description of NVAX-NDAL timing. Add statements that the contents of the Cbox error registers are not changed during reset. Added cache timing information. Added table of cache behavior while it is ON. Appended P% to the beginning of all pin names, since those match the schematics and the beh model. Add assertion levels to signal names.

Rebecca Stamm  14-Aug-1990
    Remove E%MEMORY_RESET, add K%RESET_CCTL_L.

Rebecca Stamm  4-Jul-1990
    Correct description of E%MEMORY_RESET. Added K_CE%RESET_H. Added CCTL<FORCE_NDAL_PERR>. Update description of Cbox behavior when P%CPU_WB_ONLY_L is asserted. Update conditions for servicing the write queue. Update cache coherency section with bug correction. Added 16ns to the cache RAM speed table. Clarify CEFSTS<COUNT>. Clarify BCFLUSH during FORCE_HIT mode. Update handling of DREAD lock which fails on an uncorrectable error on the first quadword. Clarify handling of correctable error in the tag store. Added section about the FILL_CAM and block conflicts.

Rebecca Stamm  3-Jun-1990
    Clarify handling of write, readlock in ETM. Make CEFSTS<UNEXPECTED_FILL> W1C.

Rebecca Stamm  17-May-1990
    Clarify invalidate handling sections. Always give the WRITEBACK_QUEUE priority over the NON_WRITEBACK_QUEUE. Change bit definitions in CEFSTS. Change WR_MRG_DONE to REQ_FILL_DONE in CEFSTS and FILL_CAM. Clarify stalling of IPR accesses to the tag store while a FILL_CAM entry to the same block is valid.

Rebecca Stamm  20-Feb-1990
    Update error table. Add complete description of timeout counters. Change CCTL<TIMEOUT_EXT> to CCTL<TIMEOUT_TEST>, update description of that bit. Add E%TIMEOUT_BASE_H to Cbox interface signal list. Add control signal names for scan chain, updated scan chain section, removed two bits from the scan chain. Add control signal names for parallel port, updated parallel port section. Update description of CEFSTS RDLK bit. Clarified description of CEFADR. Clarified tag store actions on deallocates. Update performance monitoring hardware section and added control bits to CCTL. Correct clock names. Bcache read quadwords returned in wrapped order rather than in Gray code order. WRITEBACK_QUEUE full prevents all transactions from starting. Add BC_TS_CMD decodings for the parallel port. Added TS_CMD encodings to BCETSTS. Added DR_CMD encodings to BCEDSTS. More detail on NESTS bit descriptions. Better explanation of use of BCDECC register. Add detail to WRITE_UNLOCK explanation.

Rebecca Stamm  3-Feb-1990
    External release. Eliminated BCEDHI and BCEDLO IPRs. Made updates based on internal review.

Rebecca Stamm  19-Jan-1990
    Release for internal review.

Rebecca Stamm  13-Jan-1990
    Intermediate release. Many edits. Eliminated backup cache data RAM access through IPR reads and writes. Updated Cbox internal bussing diagrams and description. Write queue is 8 entries.

Rebecca Stamm  21-Mar-1989
    Release for external review.

Rebecca Stamm  16-Mar-1989
    Release for internal review.

Chapter 14

Vector Interface

14.1 Description

The NVAX CPU chip does not fully support the VAX vector instruction set, and any attempt to execute a vector instruction will result in a reserved instruction fault. Vector instructions are listed in Table 14-1.
Table 14-1: Vector Instruction Set

Opcode  Instruction
31FD    MFVP regnum.rw, dst.wl
34FD    VLDL cntrl.rw, base.ab, stride.rl
35FD    VGATHL cntrl.rw, base.ab
36FD    VLDQ cntrl.rw, base.ab, stride.rl
37FD    VGATHQ cntrl.rw, base.ab
80FD    VVADDL cntrl.rw
81FD    VSADDL cntrl.rw, scal.rl
82FD    VVADDG cntrl.rw
83FD    VSADDG cntrl.rw, scal.rq
84FD    VVADDF cntrl.rw
85FD    VSADDF cntrl.rw, scal.rl
86FD    VVADDD cntrl.rw
87FD    VSADDD cntrl.rw, scal.rq
88FD    VVSUBL cntrl.rw
89FD    VSSUBL cntrl.rw, scal.rl
8AFD    VVSUBG cntrl.rw
8BFD    VSSUBG cntrl.rw, scal.rq
8CFD    VVSUBF cntrl.rw
8DFD    VSSUBF cntrl.rw, scal.rl

NVAX CPU Chip Functional Specification, Revision 1.0, February 1991

Table 14-1 (Cont.): Vector Instruction Set

Opcode  Instruction
8EFD    VVSUBD cntrl.rw
8FFD    VSSUBD cntrl.rw, scal.rq
9CFD    VSTL cntrl.rw, base.ab, stride.rl
9DFD    VSCATL cntrl.rw, base.ab
9EFD    VSTQ cntrl.rw, base.ab, stride.rl
9FFD    VSCATQ cntrl.rw, base.ab
A0FD    VVMULL cntrl.rw
A1FD    VSMULL cntrl.rw, scal.rl
A2FD    VVMULG cntrl.rw
A3FD    VSMULG cntrl.rw, scal.rq
A4FD    VVMULF cntrl.rw
A5FD    VSMULF cntrl.rw, scal.rl
A6FD    VVMULD cntrl.rw
A7FD    VSMULD cntrl.rw, scal.rq
A8FD    VSYNC regnum.rw
A9FD    MTVP regnum.rw, src.rl
AAFD    VVDIVG cntrl.rw
ABFD    VSDIVG cntrl.rw, scal.rq
ACFD    VVDIVF cntrl.rw
ADFD    VSDIVF cntrl.rw, scal.rl
AEFD    VVDIVD cntrl.rw
AFFD    VSDIVD cntrl.rw, scal.rq
C0FD    VVCMPL cntrl.rw
C1FD    VSCMPL cntrl.rw, scal.rl
C2FD    VVCMPG cntrl.rw
C3FD    VSCMPG cntrl.rw, scal.rq
C4FD    VVCMPF cntrl.rw
C5FD    VSCMPF cntrl.rw, scal.rl
C6FD    VVCMPD cntrl.rw
C7FD    VSCMPD cntrl.rw, scal.rq
C8FD    VVBISL cntrl.rw
C9FD    VSBISL cntrl.rw, scal.rl
CCFD    VVBICL cntrl.rw
Table 14-1 (Cont.): Vector Instruction Set

Opcode  Instruction
CDFD    VSBICL cntrl.rw, scal.rl
E0FD    VVSRLL cntrl.rw
E1FD    VSSRLL cntrl.rw, scal.rl
E4FD    VVSLLL cntrl.rw
E5FD    VSSLLL cntrl.rw, scal.rl
E8FD    VVXORL cntrl.rw
E9FD    VSXORL cntrl.rw, scal.rl
ECFD    VVCVT cntrl.rw
EDFD    IOTA cntrl.rw, scal.rl
EEFD    VVMERGE cntrl.rw
EFFD    VSMERGE cntrl.rw, scal.rq

Although the vector instruction set is not fully implemented, some residual support is included in the NVAX CPU chip and should be considered:

• The Ibox, under control of the IROM, decodes the vector instructions listed above, including parsing and processing the instruction specifiers. If a memory management exception is detected on the instruction or one of the specifiers, the Ibox will report it to the Ebox, which will ignore it in favor of reporting a reserved instruction fault instead. However, if a hardware error is detected during the processing of the vector instruction or specifiers, that error will be reported in the usual way.
• The ECR<VECTOR_PRESENT> bit remains in the hardware, but a reserved instruction fault will result if a vector instruction is executed, independent of the state of this bit.
• A vector disabled fault will never be generated by the NVAX CPU chip microcode.
• References to vector processor registers in the range 90-97 (hex) are intercepted by the microcode and are not transmitted on the NDAL as is the normal case for an unimplemented processor register. Rather, writes to these registers are ignored, and reads from these registers return 0. The operating system depends on this behavior.

14.2 Revision History

Table 14-2: Revision History

Who         When         Description of change
Mike Uhler  06-Jan-1990  Initial release
Mike Uhler  02-Feb-1991  Update after pass 1 PG.
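The handling of vector processor registers described in Section 14.1 (references in the range 90-97 hex are intercepted, writes are ignored, reads return 0) can be modeled in a few lines. The sketch below is an illustrative software model only, for example for a simulator or a test harness; the class and function names are hypothetical and nothing here reflects how the microcode is actually written.

```python
# Hypothetical model of the microcode intercept for vector processor registers.
VECTOR_IPR_FIRST = 0x90
VECTOR_IPR_LAST = 0x97


def is_vector_ipr(regnum):
    """True if regnum falls in the intercepted range 90-97 (hex)."""
    return VECTOR_IPR_FIRST <= regnum <= VECTOR_IPR_LAST


class VectorIprModel:
    """Reads return 0 and writes are dropped; no NDAL traffic is modeled."""

    def __init__(self):
        self.ignored_writes = 0  # bookkeeping for tests only

    def read(self, regnum):
        if not is_vector_ipr(regnum):
            raise ValueError("not a vector processor register")
        return 0

    def write(self, regnum, value):
        if not is_vector_ipr(regnum):
            raise ValueError("not a vector processor register")
        self.ignored_writes += 1  # the value is intentionally discarded
```

A harness could use the model to confirm that operating system probing code, which depends on reads returning 0, behaves the same after any write.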
Chapter 15

Error Handling

This chapter describes the NVAX CPU error exceptions and interrupts as seen from the macrocoder's point of view. It is organized with respect to the SCB vectors through which the event is dispatched. The SCB layout and SCB vector format are described in Chapter 2. Exceptions and interrupts that are a result of normal system operation are described in Chapter 2.

15.1 Terminology

Term                Meaning
Fill                Any quadword of data returned to the NVAX CPU chip in response to a read-type operation. The quadword containing the requested data is a fill.
Ownership bit       In the Bcache and the memory, a bit is stored with each hexaword called the ownership bit. In the Bcache it indicates the Bcache owns the block; it has the one valid copy of the data. In memory it indicates some cache or bus interface has the one good copy of the block, not the memory.
Memory cache state  In memory in various system environments, a certain amount of state is kept for each hexaword in memory. This state always includes the ownership bit. In some system environments, it includes additional information.
ETM                 Error transition mode in the Bcache: in this mode the Bcache is not used except if it owns the addressed block. It continues to respond to NDAL coherency requests which require writeback.

15.2 Error Handling Introduction and Summary

This chapter discusses all levels of hardware and microcode-detected errors. Error notification occurs through one of the following events, listed in order of decreasing severity.

• Console error halt: A halt to console mode is caused by one of several errors such as Interrupt Stack Not Valid. For certain halt conditions, the console prompts for a command and waits for operator input. For other halt conditions, the console may attempt a system restart or a system bootstrap as defined by DEC Standard 032. The actual algorithms used are outside of the scope of this document.
• Machine check: A hardware error occurred synchronously with respect to the execution of instructions. Instruction-level recovery and retry may be possible.
• Power fail: The power supply asserted the power fail signal.
• Hard error interrupt: A hardware error occurred asynchronously with respect to the execution of instructions. Usually, data is lost or state is corrupted, and instruction-level recovery may not be possible.
• Soft error interrupt: A hardware error occurred asynchronously with respect to the execution of instructions. The error is not fatal to the execution of instructions, and instruction-level recovery is usually possible.
• Kernel stack not valid: During exception processing, a memory management exception occurred while trying to push information on the kernel stack.

This chapter explains in detail several of the SCB entry points. The purpose is to help the operating system programmer determine exactly what error occurred and to recommend an error recovery method. Since this chapter is only concerned with errors which are generic to all system environments, it may be used as the basis for a specification of error handling and recovery for particular systems based on the NVAX CPU chip.

The following information is given in this chapter for each SCB entry point:

• What parameters are pushed on the stack.
• What failure codes are defined.
• What additional information exists and should be collected for analysis.
• How to determine what error(s) actually occurred.
• How to restore the state of the machine, and what level of recovery is possible.

Table 15-1 shows the general error categories associated with each of these error notifications.
Table 15-1: Error Summary By Notification Entry Point

Entry Point           SCB Index (hex)  General Error Categories
Console Halt          N/A              Interrupt stack not valid, kernel-mode halt, double error, illegal SCB vector, initial power up, HALT_L assertion
Machine Check         04               Memory management, interrupt, microcode detected CPU errors, CPU stall timeout, TB parity errors, VIC tag or data parity errors, Bcache uncorrectable data read errors, memory/NDAL read errors (no-ACK, timeout, or RDE from system environment)
Power Fail            0C               System environment notification via PWRFL_L
Soft Error Interrupt  54               VIC tag or data parity errors, Pcache tag or data parity errors, Bcache uncorrectable tag errors, Bcache uncorrectable data read errors, Bcache uncorrectable data errors in writebacks, Bcache correctable tag and data errors, memory/NDAL read errors (no-ACK, timeout, or RDE on reads), NDAL parity errors, system environment notification via S_ERR_L
Hard Error Interrupt  60               Bcache uncorrectable data errors on write operations, NDAL no-ACK on writes, Bcache fill errors in NDAL ownership reads after merging write data in the cache data RAMs, system environment notification via H_ERR_L

15.3 Error Handling and Recovery

All errors (except those resulting in console halt) go through SCB vector entry points and are handled by service routines provided by the operating system. A console halt transfers control to a hardware-prescribed I/O-space address. Software driven recovery or retry is not recommended for errors resulting in console halt.

Software error handling (by operating system routines) can be logically divided into the following steps:

• State collection.
• Analysis.
• Recovery.
• Retry.

These steps are discussed in general in the next four sections.
After that, details are supplied on analysis, recovery and retry for each error event which results in an exception or interrupt. This information is organized by SCB entry point.

15.3.1 Error State Collection

Before error analysis can begin, all relevant state must be collected. The stack frame provides the PC/PSL pair for all exceptions and interrupts. For machine checks, the stack frame also provides details about the error.

In addition to the stack frame, machine checks and hard and soft error interrupts usually require analysis of other registers. It is strongly recommended that all the state listed below be read and saved in these cases. State is saved prior to analysis so that analysis is not complicated by changes in state in the registers as the analysis progresses, and so that errors incurred during analysis and recovery can be processed with that context.

Ibox
  ICSR: Ibox (VIC) control and status register.
  VMAR: VIC memory address register.

Ebox
  ECR: Ebox control and status register.

Mbox
  TBSTS: TB status register.
  TBADR: TB address register.
  PCSTS: Pcache status register.
  PCADR: Pcache address register.

Cbox
  CCTL: Cbox Control Register.
  BCEDSTS: Bcache data error status register.
  BCEDIDX: Bcache data error index register.
  BCEDECC: Bcache data error ECC/syndrome register.
  BCETSTS: Bcache tag error status register.
  BCETIDX: Bcache tag error index/address register.
  BCETAG: Bcache errored tag register.
  CEFSTS: Read and Bcache fill status register.
  CEFADR: Read and Bcache fill address register.
  NESTS: NDAL error status register.
  NEOADR: NDAL error output address register.
  NEOCMD: NDAL error output command register.
  NEICMD: NDAL error input command register.
  NEDATHI: NDAL error input data register (HI).
  NEDATLO: NDAL error input data register (LO).

System environment
  All states (i.e., CSRs) which report error conditions or events.
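The principle of saving all state before analysis begins can be sketched as a one-pass snapshot over the register list above. The sketch is a model under stated assumptions, not the operating system's implementation: `read_register` is a hypothetical callable standing in for whatever mechanism reads a processor register by name, and the system environment CSRs are omitted because they are system-specific.

```python
# Hypothetical snapshot helper; register names follow the list in this section.
ERROR_STATE_REGISTERS = {
    "Ibox": ["ICSR", "VMAR"],
    "Ebox": ["ECR"],
    "Mbox": ["TBSTS", "TBADR", "PCSTS", "PCADR"],
    "Cbox": ["CCTL", "BCEDSTS", "BCEDIDX", "BCEDECC", "BCETSTS", "BCETIDX",
             "BCETAG", "CEFSTS", "CEFADR", "NESTS", "NEOADR", "NEOCMD",
             "NEICMD", "NEDATHI", "NEDATLO"],
}


def collect_error_state(read_register):
    """Read every listed register once and return a name -> value snapshot.

    Analysis then works on the returned copy, so later changes in the live
    registers do not disturb it.
    """
    snapshot = {}
    for names in ERROR_STATE_REGISTERS.values():
        for name in names:
            snapshot[name] = read_register(name)
    return snapshot
```

Taking the reader as a parameter keeps the sketch testable with a fake register file, as in `collect_error_state(lambda name: fake_registers[name])`.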
For the purposes of the rest of this chapter, it is assumed that each of these states is saved in a variable whose name is constructed by prepending "S_" to the register name. For example, the ICSR would be saved in the variable S_ICSR. The following example shows allocation of memory storage for the error state.

    ; ERROR STATE COLLECTION DATA STORAGE

                                ;IBOX
    S_ICSR:     .LONG   0       ; IBOX VIC CONTROL AND STATUS REGISTER
    S_VMAR:     .LONG   0       ; IBOX VIC ERROR ADDRESS REGISTER

                                ;EBOX
    S_ECR:      .LONG   0       ; EBOX CONTROL AND STATUS REGISTER

                                ;MBOX
    S_TBSTS:    .LONG   0       ; TB STATUS REGISTER
    S_TBADR:    .LONG   0       ; TB ERROR ADDRESS REGISTER
    S_PCSTS:    .LONG   0       ; PCACHE STATUS REGISTER
    S_PCADR:    .LONG   0       ; PCACHE ERROR ADDRESS REGISTER

                                ;CBOX
    S_CCTL:     .LONG   0       ; CBOX CONTROL REGISTER
    S_BCEDSTS:  .LONG   0       ; BCACHE DATA RAM ERROR STATUS REGISTER
    S_BCEDIDX:  .LONG   0       ; BCACHE DATA RAM ERROR INDEX REGISTER
    S_BCEDECC:  .LONG   0       ; BCACHE DATA RAM ECC/SYNDROME REGISTER
    S_BCETSTS:  .LONG   0       ; BCACHE TAG RAM ERROR STATUS REGISTER
    S_BCETIDX:  .LONG   0       ; BCACHE TAG RAM ERROR INDEX REGISTER
    S_BCETAG:   .LONG   0       ; BCACHE TAG RAM ERRORED TAG REGISTER
    S_CEFSTS:   .LONG   0       ; READ AND BCACHE FILL ERROR STATUS REGISTER
    S_CEFADR:   .LONG   0       ; READ AND BCACHE FILL ERROR ADDRESS REGISTER
    S_NESTS:    .LONG   0       ; NDAL ERROR STATUS REGISTER
    S_NEOADR:   .LONG   0       ; NDAL OUTPUT ERROR ADDRESS REGISTER
    S_NEOCMD:   .LONG   0       ; NDAL OUTPUT ERROR COMMAND REGISTER
    S_NEICMD:   .LONG   0       ; NDAL INPUT ERROR COMMAND REGISTER
    S_NEDATHI:  .LONG   0       ; NDAL INPUT ERROR ADDRESS REGISTER (HI)
    S_NEDATLO:  .LONG   0       ; NDAL INPUT ERROR ADDRESS REGISTER (LO)

                                ;SYSTEM ENVIRONMENT
                                ; REGISTERS FROM THE SYSTEM ENVIRONMENT (MODULE,
                                ; MEMORY(S), BUS INTERFACE(S)) ARE SAVED HERE

The following example shows collection of error state which would normally be done early in the error handling routine.
Note the handling of error registers which may be overwritten in the event of a more severe error. For example, after a correctable Bcache data RAM error, BCEDIDX would hold the index of the correctable error. If an uncorrectable Bcache data RAM error occurs, BCEDIDX would be reloaded with the index of the more severe uncorrectable error. To ensure the data in BCEDIDX and BCEDECC matches the report in BCEDSTS, a conditional test is performed and these two registers are recaptured if both an uncorrectable and a correctable error are reported in BCEDSTS. Otherwise, BCEDIDX and BCEDECC could reflect a previous correctable error even though BCEDSTS reports a more severe error.

SAVE_STATE:
                                        ;IBOX
        MFPR    #PR19$_ICSR,S_ICSR
        MFPR    #PR19$_VMAR,S_VMAR
                                        ;EBOX
        MFPR    #PR19$_ECR,S_ECR
                                        ;MBOX
        MFPR    #PR19$_TBSTS,S_TBSTS
        MFPR    #PR19$_TBADR,S_TBADR
        MFPR    #PR19$_PCSTS,S_PCSTS
        MFPR    #PR19$_PCADR,S_PCADR
                                        ;CBOX
        MFPR    #PR19$_CCTL,S_CCTL
        MFPR    #PR19$_BCEDIDX,S_BCEDIDX
        MFPR    #PR19$_BCEDECC,S_BCEDECC
        MFPR    #PR19$_BCEDSTS,S_BCEDSTS
        BICL3   #^C<BCEDSTS$M_CORR ! BCEDSTS$M_LOCK>,S_BCEDSTS,R0
        CMPL    R0,#<BCEDSTS$M_CORR ! BCEDSTS$M_LOCK>
        BNEQ    10$
        MFPR    #PR19$_BCEDIDX,S_BCEDIDX    ; RECAPTURE INDEX AND SYNDROME
        MFPR    #PR19$_BCEDECC,S_BCEDECC
10$:    MFPR    #PR19$_BCETIDX,S_BCETIDX
        MFPR    #PR19$_BCETAG,S_BCETAG
        MFPR    #PR19$_BCETSTS,S_BCETSTS
        BICL3   #^C<BCETSTS$M_CORR ! BCETSTS$M_LOCK>,S_BCETSTS,R0
        CMPL    R0,#<BCETSTS$M_CORR ! BCETSTS$M_LOCK>
        BNEQ    20$
        MFPR    #PR19$_BCETIDX,S_BCETIDX    ; RECAPTURE INDEX AND TAG
        MFPR    #PR19$_BCETAG,S_BCETAG
20$:    MFPR    #PR19$_CEFSTS,S_CEFSTS
        MFPR    #PR19$_CEFADR,S_CEFADR
        MFPR    #PR19$_NESTS,S_NESTS
        MFPR    #PR19$_NEOADR,S_NEOADR
        MFPR    #PR19$_NEOCMD,S_NEOCMD
        MFPR    #PR19$_NEICMD,S_NEICMD
        MFPR    #PR19$_NEDATHI,S_NEDATHI
        MFPR    #PR19$_NEDATLO,S_NEDATLO
                                        ;SYSTEM ENVIRONMENT
                                        ; COLLECTION OF SYSTEM ENVIRONMENT ERROR REGISTERS GOES HERE

Additional state collection is recommended while and after flushing the Bcache because certain errors may occur as a result of the flush operation. The following state should be collected immediately after flushing each Bcache location.

Cbox
    CCTL: Cbox control register.
    BCEDSTS: Bcache data error status register.
    BCEDIDX: Bcache data error index register.
    BCEDECC: Bcache data error ECC/syndrome register.
    BCETSTS: Bcache tag error status register.
    BCETIDX: Bcache tag error index/address register.
    BCETAG: Bcache errored tag register.
    NESTS: NDAL error status register.
    NEOADR: NDAL error output address register.
    NEOCMD: NDAL error output command register.
System environment
    All states (i.e., CSRs) which report the event of NVAX sending a BADWDATA cycle on the NDAL.

For the purposes of the rest of this chapter, it is assumed that each of these states is saved in a variable whose name is constructed by prepending "SS_" to the register name. For example, the BCEDSTS register would be saved in the variable SS_BCEDSTS.

The following example shows allocation of memory storage for additional error state collected while and after flushing the Bcache.
; ADDITIONAL ERROR STATE COLLECTION DATA STORAGE FOR AFTER BCACHE FLUSH
                                ;CBOX
SS_CCTL:        .LONG   0       ; CBOX CONTROL REGISTER
SS_BCEDSTS:     .LONG   0       ; BCACHE DATA RAM ERROR STATUS REGISTER
SS_BCEDIDX:     .LONG   0       ; BCACHE DATA RAM ERROR INDEX REGISTER
SS_BCEDECC:     .LONG   0       ; BCACHE DATA RAM ECC/SYNDROME REGISTER
SS_BCETSTS:     .LONG   0       ; BCACHE TAG RAM ERROR STATUS REGISTER
SS_BCETIDX:     .LONG   0       ; BCACHE TAG RAM ERROR INDEX REGISTER
SS_BCETAG:      .LONG   0       ; BCACHE TAG RAM ERRORED TAG REGISTER
SS_NESTS:       .LONG   0       ; NDAL ERROR STATUS REGISTER
SS_NEOADR:      .LONG   0       ; NDAL OUTPUT ERROR ADDRESS REGISTER
SS_NEOCMD:      .LONG   0       ; NDAL OUTPUT ERROR COMMAND REGISTER
                                ;SYSTEM ENVIRONMENT: ADDITIONAL ERROR STATE COLLECTION
                                ; DATA STORAGE FOR AFTER BCACHE FLUSH
                                ; REGISTERS WHICH ARE AFFECTED BY A BADWDATA CYCLE FROM
                                ; NVAX ARE SAVED HERE AFTER THE BCACHE FLUSH

The following example shows collection of error state which would normally be collected during and just after flushing the Bcache.

                                        ;CBOX
        MFPR    #PR19$_CCTL,SS_CCTL
        MFPR    #PR19$_BCEDIDX,SS_BCEDIDX
        MFPR    #PR19$_BCEDECC,SS_BCEDECC
        MFPR    #PR19$_BCEDSTS,SS_BCEDSTS
        BICL3   #^C<BCEDSTS$M_CORR ! BCEDSTS$M_LOCK>,SS_BCEDSTS,R0
        CMPL    R0,#<BCEDSTS$M_CORR ! BCEDSTS$M_LOCK>
        BNEQ    30$
        MFPR    #PR19$_BCEDIDX,SS_BCEDIDX   ; RECAPTURE INDEX AND SYNDROME
        MFPR    #PR19$_BCEDECC,SS_BCEDECC
30$:    MFPR    #PR19$_BCETIDX,SS_BCETIDX
        MFPR    #PR19$_BCETAG,SS_BCETAG
        MFPR    #PR19$_BCETSTS,SS_BCETSTS
        BICL3   #^C<BCETSTS$M_CORR ! BCETSTS$M_LOCK>,SS_BCETSTS,R0
        CMPL    R0,#<BCETSTS$M_CORR ! BCETSTS$M_LOCK>
        BNEQ    40$
        MFPR    #PR19$_BCETIDX,SS_BCETIDX   ; RECAPTURE INDEX AND TAG
        MFPR    #PR19$_BCETAG,SS_BCETAG
40$:    MFPR    #PR19$_NESTS,SS_NESTS
        MFPR    #PR19$_NEOADR,SS_NEOADR
        MFPR    #PR19$_NEOCMD,SS_NEOCMD

15.3.2 Error Analysis

With the error state obtained during the collection process, the error condition can be analyzed.
The purpose is to determine what error event caused the particular notification being handled (to the extent possible), and what other errors may also have occurred. Analysis of machine checks and hard and soft error interrupts should be guided by the parse trees given in the appropriate sections below.

NOTE
Errors detected in or by one of the caches usually result in the cache automatically being disabled. However, to minimize the possibility of nested errors, it is suggested that error analysis and recovery for memory or cache-related errors be performed with the Pcache disabled and the Bcache in ETM.

In some cases, a notification for a single error occurs in two ways. For example, an uncorrectable error in the Bcache data RAMs will cause a soft error interrupt and may also cause a machine check. Software should handle cases where a machine check handler clears error bits and then the soft error handler is entered with no error bits set.

In certain cases one error event results in two related reports. For example, a Bcache uncorrectable data error during a writeback will be reported in NESTS as a BADWDATA event. In this case, the BADWDATA event captures the full address of the errored data (that is why BADWDATA is an error event). Cases like this are handled as single error events.

In general an error reporting register can report events which lead to machine check, soft error, or hard error. A given error event can result in both a machine check and a soft error interrupt, or in just one or the other. Events which lead to hard error interrupts generally cannot also cause a machine check or soft error interrupt. Sometimes an error event which leads to a machine check or soft error interrupt is closely related to an event which leads to a hard error interrupt (e.g., a Bcache fill error on the first quadword of a fill for an OREAD done for a write causes a soft error interrupt, but the same error on a later quadword causes a hard error interrupt).

Multiple simultaneous errors may make useful recovery impossible. However, in cases where no conflict exists in the reporting of the multiple errors (i.e., no one error register is used to report two errors), and recovery from each error is possible, then recovery from the set of errors is accomplished by recovering from all of them. For example, recovery from a Pcache tag parity error and a Bcache correctable data error being reported together is possible by following the recovery procedures for each error in sequence.

The error cause determination parse tree for machine check exceptions is directed at causes or possible causes of machine checks. It ignores errors which lead to hard or soft error interrupts but not to machine checks. Similarly, the hard error interrupt cause determination ignores errors which lead to machine check or soft error interrupt, and the soft error interrupt cause determination ignores errors which lead to machine check or hard error interrupt.

There is a natural order between machine check, hard error interrupt, and soft error interrupt because the IPL for hard error interrupts is higher than that of soft error interrupts and the IPL in the machine check exception is higher than either of the error interrupts. This hierarchy is important because knowledge of which notification event occurred is used to discriminate between certain error events (e.g., an error on the initial fill quadword for a read-lock is distinguished from a fill error on a subsequent quadword by the fact of machine check notification).
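The condition for recovering from several simultaneous errors can be sketched as a simple check. This is an illustrative Python model only: the tuple layout and the function name are assumptions, and the register names are used purely as labels.

```python
from collections import Counter


def can_recover_together(errors):
    """Decide whether a set of simultaneous errors is jointly recoverable.

    errors is a list of (description, reporting_register, recoverable)
    tuples. Joint recovery is possible only if no single error register
    reports two of the errors and every error is recoverable on its own.
    """
    per_register = Counter(register for _, register, _ in errors)
    no_conflict = all(count == 1 for count in per_register.values())
    all_recoverable = all(recoverable for _, _, recoverable in errors)
    return no_conflict and all_recoverable
```

With the example from the text, a Pcache tag parity error (reported in PCSTS) and a Bcache correctable data error (reported in BCEDSTS) use different registers, so the check passes and each recovery procedure is then run in sequence.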
15.3.3 Error Recovery

Recovery from errors consists of clearing any latched error state, repairing damaged state (if necessary and possible), and restoring the system to normal operation. There are special considerations involved in analysis and recovery from cache or memory errors, which are covered in the next sections.

Recovery from multiple error scenarios is possible when there is no conflict in the error registers which report the errors and there is no conflict in the recovery procedures for the errors. However, all recovery procedures in this chapter assume that only one error is present. None of the procedures are valid in multiple error scenarios without further analysis.

In some instances, it may be desirable to stop using the hardware which is the source of a large number of errors. For example, if a cache reports a large number of errors, it may be better to disable it. It is suggested that software maintain error counts which are compared against error thresholds on every error report. If the count (per unit time) exceeds the threshold, the hardware should be disabled.

NOTE
Hard failures of one bit in the tag store can lead to unrecoverable errors requiring a full system crash. It would be appropriate to have an extremely low threshold for tag store correctable errors, especially if they recur in the same location or bit position.

NOTE
NVAX CPU utilization of the NDAL and memory is extremely high if the Bcache is disabled. In multiprocessor systems a CPU should probably be removed from the system rather than being used with the Bcache off. In a single processor system there may be effects on I/O subsystem performance and latency due to the high NDAL and memory utilization.
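The suggested error counting can be sketched as a sliding-window counter. This is a minimal Python illustration under stated assumptions: the class name, the window length, and the limit are all hypothetical, and the specification does not prescribe any particular threshold values.

```python
from collections import deque


class ErrorThreshold:
    """Count error reports per unit time for one piece of hardware."""

    def __init__(self, limit, window_seconds):
        self.limit = limit              # assumed: max errors tolerated per window
        self.window = window_seconds    # assumed: length of the "unit time"
        self.times = deque()

    def record(self, now):
        """Record an error at time `now`; return True if the count in the
        current window exceeds the limit (hardware should be disabled)."""
        self.times.append(now)
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()        # forget errors older than the window
        return len(self.times) > self.limit
```

A tag store counter would be constructed with a very small `limit`, in line with the first note above.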
15.3.3.1 Special Considerations for Cache and Memory Errors

Cache and memory error recovery requires special considerations:

• Cache and memory error recovery should always be done with the Pcache and VIC off and the Bcache in error transition mode (ETM). (In certain cases, the last part of recovery must be done with the Bcache off.) See Section 15.3.3.1.1.1, Cache Enable, Disable, and Flush Procedures.
• Bcache flush is necessary before re-enabling the Bcache whenever it is in ETM. See Section 15.3.3.1.1, Cache Coherence in Error Handling.
• Bcache flush should always be done one block at a time, recapturing the relevant error registers between each block flush.
• Cache coherence requires a specific procedure for re-enabling the caches. See Section 15.3.3.1.1, Cache Coherence in Error Handling.
• Error recovery should be performed starting with the most distant component and working toward the CPU and Ebox. System environment memory errors should be processed first, followed by NDAL errors, Bcache fill errors, Bcache tag store and data RAM errors, Pcache errors, TB errors, and, finally, VIC errors.
• NDAL errors are cleared by writing the write-one-to-clear bits in NESTS. The suggested way to do this is to write a one to the specific error bit.
• Bcache fill errors are cleared by writing the write-one-to-clear bits in CEFSTS. The suggested way to do this is to write a one to the specific error bit. Special recovery procedures may be necessary after Bcache fill errors. See Section 15.3.3.1.2, Special Writeback Cache Recovery Situations and Procedures.
• Bcache tag store errors are cleared by writing the write-one-to-clear bits in BCETSTS. The suggested way to do this is to write a one to the specific error bit. Special recovery procedures may be necessary after Bcache uncorrectable tag store errors. See Section 15.3.3.1.2, Special Writeback Cache Recovery Situations and Procedures.
• Bcache data RAM errors are cleared by writing the write-one-to-clear bits in BCEDSTS. The suggested way to do this is to write a one to the specific error bit. Special recovery procedures may be necessary after Bcache uncorrectable data RAM errors. See Section 15.3.3.1.2, Special Writeback Cache Recovery Situations and Procedures.
• Hardware ETM is cleared by writing the write-one-to-clear bit in CCTL. The suggested way to do this is to write the value saved during error state collection back to the register.
• Pcache tag and data store errors are cleared by writing the write-one-to-clear bits in PCSTS. The suggested way to do this is to write a one to the specific error bit. Pcache flush is necessary after Pcache tag store parity errors. See Section 15.3.3.1.1.1, Cache Enable, Disable, and Flush Procedures.
• TB errors are cleared by writing the write-one-to-clear bits in TBSTS. The suggested way to do this is to write a one to the specific error bit.
• PTE read errors are cleared by writing the PTE error write-one-to-clear bits in PCSTS. The suggested way to do this is to write a one to the specific error bit.
• VIC errors are cleared by writing the write-one-to-clear bits in ICSR. The suggested way to do this is to write a one to the specific error bit. VIC flush and re-enable is necessary after VIC tag store parity errors. See Section 15.3.3.1.1.1, Cache Enable, Disable, and Flush Procedures.

15.3.3.1.1 Cache Coherence in Error Handling

Certain procedures must be followed in order to maintain cache coherence while enabling NVAX caches. Since many errors cause caches to be disabled, and since cache and memory error recovery is normally done with the Pcache and VIC off and the Bcache in ETM, the complete cache enable procedure is done as part of recovery from all cache and memory errors.
Once the Bcache is in ETM, it will not be coherent with memory if it is re-enabled before being flushed. This is because writes (from the Mbox) to blocks which happen to be VALID_UNOWNED in the Bcache are not copied into the Bcache data RAMs. These writes are only sent out on the NDAL. Once the Bcache is put in ETM by hardware or software action, a Bcache flush must be done before re-enabling the Bcache. The procedure is described in the next section.

While the Bcache is in ETM or off, the Pcache will stay coherent with memory. However, before the Bcache is re-enabled, the Pcache must be disabled. After the Bcache is re-enabled, the Pcache must be flushed before it is re-enabled. The procedure is described in the next section. If a Pcache tag parity error occurred, the flush procedure given is sufficient to clean up the Pcache tag store.

The VIC (virtual instruction cache) is not automatically kept coherent with memory. It is flushed as a side effect of the REI instruction (as required by the VAX architecture). Normally in error recovery, there is no definite need to flush the VIC. For consistency and for the sake of beginning error retry in a known state, flushing the VIC during error recovery is recommended. However, in the event of VIC tag parity errors, the complete VIC flush procedure described in the next section must be done.

The TB is not automatically kept coherent with memory. Software uses the TBIS and TBIA functions to maintain coherence, and the LDPCTX instruction clears the process PTEs in the TB. Normally in error recovery, there is no definite need to flush the TB. For consistency and for the sake of beginning error retry in a known state, flushing the TB during error recovery is recommended. When a TB parity error occurs, Mbox hardware flushes the TB by itself (via an internally generated TBIA), but it would be appropriate for software to test the TB after a parity error. This is discussed in Section 15.3.3.1.3.
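The reason a flush is required after ETM can be shown with a toy model. This Python sketch is a deliberate simplification and not a description of the Cbox: memory and the Bcache are plain dictionaries keyed by a hypothetical block address, only VALID_UNOWNED blocks are modeled, and all function names are assumptions.

```python
def etm_write(memory, cache, address, data):
    """Model a write made while the Bcache is in ETM.

    For a VALID_UNOWNED block the write goes out to memory only; the
    cached copy is left untouched, which is what makes it stale.
    """
    memory[address] = data


def stale_blocks(memory, cache):
    """Return addresses whose cached copy no longer matches memory."""
    return sorted(a for a, d in cache.items() if memory.get(a) != d)


def flush(cache):
    """Model a Bcache flush: discard every cached block."""
    cache.clear()
```

If the cache were re-enabled while `stale_blocks` is non-empty, reads could hit the old data. Flushing first empties the cache, so no stale copy can be returned.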
15.3.3.1.1.1 Cache Enable, Disable, and Flush Procedures

To enable the NVAX caches, the caches are flushed and enabled in a specific order. The ordering is necessary for coherence between the Bcache, Pcache, and memory. For simplicity, one procedure is given for enabling the NVAX caches, even though variations on the procedure may also produce correct results.

Disabling the caches can be done in any order, though one procedure is given here. In error handling, the VIC and Pcache are disabled while the Bcache is placed in ETM. The Bcache flush from ETM procedure is done to turn off the Bcache altogether. The cache enable procedure assumes that the Bcache is completely off at the start.

15.3.3.1.1.1.1 Disabling the NVAX Caches for Error Handling (Leaving the Bcache in ETM)

This is the procedure for disabling the NVAX caches (placing the Bcache in ETM):

NOTE
These procedures will be supplied with MACRO coding examples.

• Disable the VIC: TBS (MTPR to ICSR)
• Disable the Pcache: TBS (MTPR to PCCTL)
• Put the Bcache in software ETM:

15.3.3.1.1.1.2 Flushing and Disabling the Bcache

This is the procedure for flushing the Bcache and disabling it:

• Flush and disable the Bcache:

Errors can occur as a result of flushing the Bcache. Before carrying out the procedure, BCEDSTS and BCETSTS should be clear of unrecoverable errors, and NESTS should be clear of unrecoverable outgoing errors. The MTPRs to BCFLUSH IPRs should be done one block at a time, checking the BCEDSTS and BCETSTS error registers after each one. (The MFPR from BCEDSTS or BCETSTS will not finish until all the Bcache accesses which result from the MTPR to BCFLUSH are done.) Otherwise any unrecoverable error which occurs during the flush may become a lost unrecoverable error and a system crash will most likely be necessary.
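The block-at-a-time flush described above has a simple control structure, sketched here in Python. This is not the to-be-specified MACRO procedure: `flush_block` and `read_error_status` are hypothetical callbacks standing in for one MTPR to a BCFLUSH IPR and for the MFPRs from BCEDSTS and BCETSTS, respectively.

```python
def flush_bcache_by_block(block_indices, flush_block, read_error_status):
    """Flush one block at a time, checking for errors after each block.

    flush_block(index) flushes a single block (hypothetical callback).
    read_error_status() returns a dict of error registers that currently
    report an error, empty if none (hypothetical callback).
    Returns a list of (index, status) pairs for blocks whose flush
    produced an error, so each can be handled as a separate error.
    """
    errors_seen = []
    for index in block_indices:
        flush_block(index)
        status = read_error_status()   # checked before the next block is flushed
        if status:
            errors_seen.append((index, status))
    return errors_seen
```

Checking after every block ties each error to the block that caused it, which is the property the text relies on to avoid a lost unrecoverable error.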
Errors which occur while flushing the Bcache are separate errors and should be handled independently of the initial error. However, certain errors may be expected during the flush procedure, based on the initial error. Also, the successful outcome of the Bcache flush procedure is important in determining whether to retry or restart the interrupted or machine checked instruction stream.

15.3.3.1.1.1.3 Enabling the NVAX Caches

The procedure for enabling the NVAX caches after an error is the same as is used to initialize the caches after power-up. (See Section 16.4, Cache Initialization.) This procedure ensures that error retry/restart occurs with the caches in a known state. The procedure is outlined below.

• The caches must all be disabled and the Bcache must be disabled (not just in ETM). Follow the above procedures to reach this state.
• Flush the Bcache (loop on MTPR to BCTAG IPRs).
• Enable the Bcache (MTPR to CCTL).
• Flush the Pcache (loop on MTPR to PCTAG IPRs).
• Enable the Pcache (MTPR to PCCTL).
• Flush the TB:
• Flush the VIC (loop on MTPRs to VMAR and VTAG, writing an initial value).
• Enable the VIC (MTPR to ICSR).

15.3.3.1.2 Special Writeback Cache Recovery Situations and Procedures

Writeback caching can lead to a couple of special error cases. Some of them can be recovered. Sometimes, further state determination or state capture is required after the error cause determination guided by the parse trees in the sections on machine check exceptions and hard and soft errors. Further analysis may also be necessary.

15.3.3.1.2.1 Bcache Uncorrectable Error During Writeback

When a Bcache uncorrectable data RAM error occurs in a writeback, the status, cache index, and error syndrome are captured in BCEDSTS, BCEDIDX, and BCEDECC. As it is written back, the data is tagged bad via the BADWDATA NDAL command.
However, the address of the lost data is not captured in the Bcache error registers (for implementation reasons). For this reason, sending BADWDATA on the NDAL is treated as if it were an error by the bus interface unit (BIU). This means the full address is captured in NEOADR while the status is captured in NESTS.

This writeback can sit in the writeback queue in the BIU for an indefinite amount of time. If a Bcache uncorrectable error on writeback is detected, but NESTS does not show any outgoing error status, the writeback queue must be drained to continue the analysis and recovery. This is most easily accomplished by the following IPR write.

S_NESTS should be reloaded from NESTS after this operation. If S_NESTS does not show the BADWDATA error status after draining the writeback queue, and it shows no other outgoing error, then there is a serious inconsistency and the system should be crashed.

15.3.3.1.2.2 Memory State

Memory in NVAX systems supports the writeback cache by maintaining some amount of state for each hexaword (each cacheable block) in memory. In XMI2 systems with XMA2 memory modules, an ownership bit, an interlock bit, and an owner ID are stored for each hexaword. In OMEGA systems, only an ownership bit is stored for each block. Other system environments are possible.

The effect of a given error on the stored ownership bit in memory is system specific. Since the system environment is not directly aware of errors which occur inside the NVAX CPU chip, the system specific behavior is limited to the result of system environment errors. It is always assumed that an ownership read command no-ACKed on the NDAL does not affect the ownership bit in memory. Depending on the system, the state of memory's ownership bit (and other such state) may be UNPREDICTABLE or determinate after errors in data returned for ownership reads.
If it is determinate, it may be set or reset, possibly depending on which fill quadword had the error and on the sort of error that occurred. This specification assumes that memory does not reset a set ownership bit on a WDISOWN until all four quadwords have been successfully received by memory (as is stated in Chapter 3).

15.3.3.1.2.2.1 Accessing Memory State

In recovering from certain errors it is necessary to read (or access by some means) the state memory has stored with each hexaword. This specification assumes a routine called MEMORY_STATE exists which returns this state given a block address.

MEMORY_STATE may have system specific errors and side effects. For example, in XMI2 systems this routine may cause a read timeout error in the memory module and a corresponding machine check. Software must be prepared to handle this. Before calling MEMORY_STATE, software should confirm that all registers which may end up reporting expected errors are clear of errors. This helps minimize the possibility that an unrelated error event is ignored because it appears to be an expected error. In the XMI2 example, within the NVAX CPU, CEFSTS is the register to check because a memory read timeout is the only error which is expected as a side effect of MEMORY_STATE.

15.3.3.1.2.2.2 Repairing Memory State (Fill Errors)

In recovering from various Bcache fill errors it is necessary to reset the ownership state in memory. In some system environments, this can be done without writing the data in memory. In others, resetting the ownership state may have the side effect of altering the data stored in the memory block.

In cases where the fill error resulted from lost(1) data which cannot be recovered, the ownership bit may still be set in memory while no cache owns the block.
If the data is private to one process, then the system may be able to continue operating after killing that one job. The system dependent procedure is then used to reset the ownership bit.

For certain Bcache fill errors, an attempt is made to reset the ownership bit in memory, while maintaining or restoring the correct data to the memory block.

• All the data is in memory. One or more quadwords of (the same) data are also in the cache. Memory's ownership bit is set (meaning it "thinks" a cache owns the block). The owner ID stored with the block in memory indicates this CPU. The cache tag for the block does not indicate the block is owned. (In general, if no writes to this block time out, and the block is private to one process, then the repair can be done.)
• All the data is in memory. One or more quadwords of data are also in the cache, and one quadword has been altered by the Cbox in processing a write to that block from the Mbox. Memory's ownership bit is set (meaning it "thinks" a cache owns the block). The owner ID stored with the block in memory indicates this CPU. The cache tag for the block does not indicate the block is owned. (In general, if no writes to this block time out, and the block is private to one process, then the repair can be done.)

NOTE
If an owner ID for each block is not stored in memory, then recovery of the lost data is not recommended. The data should be treated as lost, and the appropriate system actions should be taken.

(1) In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.
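The conditions shared by the two repairable situations above can be collected into one predicate. This is a hedged Python sketch only: the `BlockState` fields and the function name are hypothetical, and how each field is obtained (for example from MEMORY_STATE or from the Bcache tag) is system specific and not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockState:
    memory_ownership_set: bool        # memory "thinks" a cache owns the block
    owner_id: Optional[int]           # None if memory stores no owner ID
    cache_tag_owned: bool             # this cache's tag marks the block owned
    writes_timed_out: bool            # any write to the block timed out
    private_to_one_process: bool      # block is private to a single process


def ownership_repair_possible(state, this_cpu_id):
    """Return True when the ownership bit repair described above applies."""
    return (
        state.memory_ownership_set
        and state.owner_id == this_cpu_id   # False when no owner ID is stored
        and not state.cache_tag_owned
        and not state.writes_timed_out
        and state.private_to_one_process
    )
```

When memory stores no owner ID, the predicate returns False, matching the note that recovery of the lost data is not recommended in that case.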
To recover from the first situation listed above in an XMI2 system, for instance, one of the correct quadwords in the Bcache is accessed (see Section 15.3.3.1.2.3) and used in the XMI2 procedure for resetting memory's ownership bit. The side effect of this procedure is that the data extracted from the Bcache is written to memory. Given that the block is private to one process and no writes have timed out in memory, this data is still correct. (Note that software must somehow ensure that no writes to this block are pending in the memory before beginning the repair. This can be done by waiting an amount of time equal to an XMA2 write timeout time.)

To recover from the second situation listed above in an XMI2 system, the same procedure is followed, but the data written back is part of the known-altered quadword. The remainder of the known-altered quadword is written to the block after the repair.

15.3.3.1.2.2.3 Repairing Memory State (Tagged-Bad Locations)

In recovering from Bcache uncorrectable data RAM errors on writebacks it is necessary to reset the tagged-bad-data state for a block in memory. This is a system specific procedure. In general, before clearing the tagged-bad-data state of memory, software must first ensure that no more accesses to the block can occur. Otherwise there is the danger that some process on some other processor or a DMA I/O device will see incorrect data and not detect an error.

In XMI2, a sequence of operations involving writes to registers in a memory module followed by a write to the memory block in question is required. To do this the Bcache should be off, because NVAX will not issue a write to memory when the cache is enabled (or is in ETM and the block's tag indicates VALID_OWNED).

In OMEGA, resetting tagged-bad-data state in memory requires that a full quadword write to the tagged-bad quadword be accomplished.
The most straightforward way for NVAX software to do this is to fill in the Bcache tag store and data RAMs with a VALID_OWNED block and force a writeback (via an MTPR to BCFLUSH).

15.3.3.1.2.3 Extracting Data from the Bcache

To extract data from the Bcache, the Bcache is placed in FORCE_HIT mode. Before this is done, the Bcache must be off. With the Bcache flushed and disabled, set the Bcache in FORCE_HIT mode and extract the data.

Note that the code which executes this procedure and its local data must be in I/O space. The TB entries (PTEs) which map this code and local data must be fixed in the TB. (This is most easily done by flushing the TB via an MTPR to TBIA and then accessing all the relevant pages in sequence.) Otherwise Bcache FORCE_HIT will interfere with instruction fetch, operand access, and PTE fetches in TB miss sequences.

The following instruction places the Bcache in FORCE_HIT mode:

TBS (MTPR to CCTL)

With the Bcache in FORCE_HIT mode, a read in memory space of any address whose index portion matches the index of the cache data will return the data (provided there is no uncorrectable data RAM error). This is most easily accomplished by reading from the true address of the data.

NOTE
In FORCE_HIT mode, Bcache data RAM ECC errors are detected (unless CCTL<DISABLE_ERRORS> is set). Software should prepare for an ECC error (BCEDSTS unrecoverable error bits should be clear).

The Bcache is restored to the disabled state by:

TBS (MTPR to CCTL)

15.3.3.1.2.4 Address Determination Procedure for Recovery from Uncorrectable Bcache Data RAM Errors

After an uncorrectable data RAM error in the Bcache, only the index of the block is stored, not the complete physical address. The procedure for constructing the physical address of the error is given here. It depends on the assumption that the block has not been replaced.
The detailed error descriptions only refer to this procedure when this assumption is valid.

This is the procedure for constructing a physical address from the contents of S_BCEDIDX and the tag indicated by that register. It uses the Bcache tag ECC check routine found in Section 15.10. If an unrecoverable ECC error is found in the tag, then the address cannot be determined directly.

(Read the tag indicated by S_BCEDIDX from the BCTAG IPRs. Check that the tag data and check bits are correct or correctable. The address is formed by taking the corrected tag and combining it with S_BCEDIDX.)

NOTE
The above procedure is used in the event of a Bcache data RAM error. If it fails because the tag also has an uncorrectable error, then the error should be considered unrecoverable. However, the search procedure described in the next section could be used to obtain useful information for the error log (specifically, which blocks this CPU has marked owned in memory for this cache index).

15.3.3.1.2.5 Special Address Determination Procedure for Recovery from Uncorrectable Bcache Tag Store Errors

An uncorrectable tag store error in the Bcache can cause certain interesting error cases. In some of these cases data may be lost (the copy in the Bcache was overwritten). In other cases, the data is still good in the cache. In all cases, the address of the lost data is not directly known. A special procedure must be used to determine this address.

This section describes the generic address determination procedure for use in recovering from uncorrectable tag store errors. Specific error event descriptions in Section 15.5.2, Section 15.7.1, and Section 15.8.1 refer to this procedure for address determination. The possible outcomes of this procedure are:

• The single address of a lost data block is found.
Retry and recovery information for the error is found in the specific error event description which referred to this address determination procedure.
• No address is found. It can be assumed that no block was owned by the Bcache (or the error was transient). Retry and recovery information for the error is found in the specific error event description which referred to this address determination procedure.
• Multiple addresses are found. This is a multiple unrecoverable error situation, and the system should be crashed.

The procedure for determining the address of a lost data block follows. Note that this procedure assumes the relevant tag in the Bcache is not (correct or correctable) VALID_OWNED. This procedure is for analyzing the result of errors in that tag.

This procedure assumes that MEMORY_STATE will return the ownership state and the physical ID of the CPU which memory "thinks" owns the block. If memory does not store an owner ID and there is exactly one writeback cache in the system, then the lack of an owner ID might not prevent error recovery.

• The Bcache should be in ETM.
• Search for the address:

(Search all memory block addresses whose index portion matches the index of the Bcache tag with the error. Check memory state for the block. If this CPU is the owner of the block, then the block is lost. Continue the search even if one lost block is found. Zero, one, or multiple lost blocks could be present. Note that in systems with no owner ID stored in memory and exactly one CPU, it may or may not be possible to assume that every owned block is owned by the cache that is taking the error. [The remainder of this parenthetical note is illegible in the source.])

NOTE
This procedure is specific to recovering from tag store errors in one CPU. So when the memory state for a block indicates another cache in the system owns a particular block, that block is not counted as lost. That block may be "lost" in the more general sense (if the cache indicated as the owner no longer "knows" that it owns the block or is somehow unable to write it back). The purpose here is only to find blocks that are definitely lost as a result of errors involving this CPU.

15.3.3.1.3 Cache and TB Test Procedures

TBS

OUTLINE OF TO-BE-SPECIFIED TEST PROCEDURES

Testing is generally done using the force hit mode of a cache. The code and data of the test procedure must reside in I/O space. Assuming memory management is enabled during this procedure, the needed PTEs must be in the TB before entering force hit mode in the Pcache or Bcache.

For the Bcache, testing should be done with errors disabled. The ECC logic should be tested thoroughly on one location by forcing various check bit patterns and examining the syndrome latched on the read (BCEDECC is loaded on every read in Bcache disable-errors mode).

Pcache and VIC parity checking should be tested by writing bad parity into the arrays.

TB testing may be accomplished by writing to MTBTAG and MTBPTE (with care not to change any TB entry necessary for the test code and data and not to cause two TB entries to exist for one address). PROBER and PROBEW (setting PSL<PRV_MOD>) are then used to verify the protection bits. Testing the modify bit would be difficult, though approaches exist.

15.3.4 Error Retry

Error retry is a function of the error notification (machine check or error interrupt), error type, and error state. The sections below specify the conditions under which the instruction stream may be restarted.
If retry is to be attempted, the stack must be trimmed of all parameters except the PC/PSL pair. This is necessary only for machine checks, because error interrupts do not provide any additional parameters on the stack. An REI will then restart the instruction stream and retry the error. Some form of software loop control should be provided to limit the possibility of an error loop. Note that pending error interrupts may be taken before the retry occurs, depending on the IPL of the interrupted or machine checked code.

Strictly speaking, an REI from a hard or soft error interrupt handler is not a retry since these interrupts are recognized between macroinstructions. A machine check exception is an instruction abort, and an REI from the handler will cause the failing instruction to be retried (provided retry is indicated by analysis). What these cases all have in common is that the interrupted instruction stream is restarted. This is only done when the result of error analysis and recovery is such that all damaged state has been repaired and there is no reason to suspect that incorrect results will be produced if the image is restarted and another error does not occur.

If complete recovery from one or more errors is not possible (i.e., some state is lost or it is impossible to determine what state is lost), possibly the entire system will have to be crashed, a single process will have to be deleted, or some other action will have to be taken. Software must determine if the error is fatal to the current process, to the processor, or to the entire system, and take the appropriate action.

It is expected that software handles machine checks, soft error interrupts, and hard error interrupts independently. For example, after handling a machine check from which retry is to occur, software does not check for errors which might cause a pending hard or soft error interrupt.
The machine check handler is exited via REI (after trimming the machine check information off the stack). If the IPL of the machine checked instruction stream is low enough, any pending hard or soft error interrupt is taken before the retry occurs. However, if the interrupted instruction stream was running at high IPL, then it will continue oblivious of remaining errors.

15.3.4.1 General Multiple Error Handling Philosophy

Multiple errors may be reported at the same time. In some cases the NVAX CPU pipeline will contain multiple operand prefetches to the same memory block. This can cause multiple errors from a single non-transient failure. It could also occur that two separate errors occur at nearly the same time and are thus reported simultaneously.

Multiple error scenarios may be grouped into the following three classes:

1. Multiple distinct errors for which no error report interferes with the analysis of any other (e.g., no lost error bits set).
2. Multiple errors which could have been caused by the NVAX CPU pipeline issuing more than one reference to a given block before the error interrupt or machine check forced a pipeline flush.
3. Multiple errors for which analysis is complicated because the reports interfere with each other.

It is the intent of this chapter to recover from class 1 (above) by simply treating the errors as separate and recovering from each in turn. Retry or restart evaluation is based on the cumulative result of the recovery and repair procedures for each error.

For class 2, specific cases are identified in which lost errors are tolerated. These cases are selected because the NVAX pipeline can easily cause them (given one error), and because sufficient safeguards exist to ensure that correct operation is maintained. Section 15.3.4.2 lists these cases.

Class 3 scenarios are generally not considered recoverable.
The system is simply crashed in those cases.

Note that lost correctable errors are not considered serious problems since hardware recovers from those automatically.

15.3.4.2 Retry Special Cases

The multiple error scenarios which are handled are listed below. They are made likely by the NVAX pipeline's tendency to prefetch operands. The safeguard that exists in all cases is that errors inconsistent with correct operation after the error (such as lost data) will invariably cause a hard error interrupt or be detectable by the analysis accompanying the machine check or soft error interrupt.

• Lost Bcache data RAM uncorrectable ECC errors and addressing errors. (BCEDSTS<LOST_ERR>)
• Lost Bcache fill errors (timeouts and RDEs). (CEFSTS<LOST_ERR>)
• Lost NDAL output errors (No-ACKs). (NESTS<LOST_OERR>)

NOTE
Retry from a machine check is done even when a hard error interrupt might be pending. If the machine checked I-stream were running at high enough IPL, it would not be interrupted immediately. Typical hard error causes are write errors. They cannot cause a machine check. So the fact that a serious error is ignored in the machine check retry equation is not considered a problem. The other error would probably have occurred anyway and it would not have interrupted the I-stream until IPL was lowered.

15.4 Console Halt and Halt Interrupt

A console halt is not an exception, but rather a transfer of control by the NVAX CPU microcode directly into console macrocode at the boot ROM address E0040000 (hex). Console halts are initiated at powerup, by certain microcode-detected double error conditions, and by the assertion of the external halt interrupt pin, HALT_L.

There is no exception stack frame associated with a console halt. Instead, the SAVPC and SAVPSL processor registers provide the necessary information. The format of SAVPC (IPR 42) is shown in Figure 15-1.
Figure 15-1: IPR 2A (hex), SAVPC

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                           Saved PC                                            | :SAVPC
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The PSL, halt code, MAPEN<0>, and a validity bit are saved in SAVPSL (IPR 43). The format of SAVPSL is shown in Figure 15-2. The halt codes are shown in Table 15-2.

Figure 15-2: IPR 2B (hex), SAVPSL

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  PSL<31:16>                   |  |  |    Halt Code    |       PSL<7:0>        | :SAVPSL
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   Bit 15: MAPEN<0>
   Bit 14: Invalid SAVPSL if 1

The possible halt codes that may appear in SAVPSL<13:8> are listed in Table 15-2.
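The field boundaries in Figure 15-2 can be illustrated with a short decoding sketch. This is not part of the specification: the function name decode_savpsl and the sample value are hypothetical, and the sketch assumes console software has already read SAVPSL into an ordinary 32-bit integer.

```python
def decode_savpsl(savpsl):
    """Split a 32-bit SAVPSL value into the fields of Figure 15-2."""
    savpsl &= 0xFFFFFFFF
    return {
        "psl_31_16": (savpsl >> 16) & 0xFFFF,   # saved PSL<31:16>
        "mapen": (savpsl >> 15) & 0x1,          # MAPEN<0>
        "invalid": bool((savpsl >> 14) & 0x1),  # 1 means the saved PSL is not valid
        "halt_code": (savpsl >> 8) & 0x3F,      # console halt code, SAVPSL<13:8>
        "psl_7_0": savpsl & 0xFF,               # saved PSL<7:0>
    }


# Hypothetical value: PSL<31:16> = 041F, MAPEN<0> = 1, valid PSL, halt code 02.
fields = decode_savpsl(0x041F8200)
print(fields)
```

The halt code returned here is the raw 6-bit value; its meaning is looked up in Table 15-2.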
Table 15-2: Console Halt Codes

Mnemonic                 Code (Hex)  Meaning
ERR_HLTPIN               02          HALT_L pin asserted
ERR_PWRUP                03          Initial power up
ERR_INTSTK               04          Interrupt stack not valid
ERR_DOUBLE               05          Machine check during exception processing
ERR_HLTINS               06          HALT instruction in kernel mode
ERR_ILLVEC               07          Illegal SCB vector (bits <1:0> = 11)
ERR_WCSVEC               08          WCS SCB vector (bits <1:0> = 10)
ERR_CHMFI                0A          CHMx on interrupt stack
ERR_IE0                  10          ACV/TNV during machine check processing
ERR_IE1                  11          ACV/TNV during kernel-stack-not-valid processing
ERR_IE2                  12          Machine check during machine check processing
ERR_IE3                  13          Machine check during kernel-stack-not-valid processing
ERR_IE_PSL_26_24_101     19          PSL<26:24> = 101 during interrupt or exception
ERR_IE_PSL_26_24_110     1A          PSL<26:24> = 110 during interrupt or exception
ERR_IE_PSL_26_24_111     1B          PSL<26:24> = 111 during interrupt or exception
ERR_REI_PSL_26_24_101    1D          PSL<26:24> = 101 during REI
ERR_REI_PSL_26_24_110    1E          PSL<26:24> = 110 during REI
ERR_REI_PSL_26_24_111    1F          PSL<26:24> = 111 during REI

NOTE
In certain error conditions detected during the execution of a string instruction, the state packup sequence leaves the FPD bit set in the SAVPSL register, but leaves the SAVPC register pointing at the instruction following the string instruction, rather than at the string instruction itself. If the FPD bit is not set in the SAVPSL register, SAVPC is correct. As error halts are not normally restartable, this is not a problem. For a console halt due to the assertion of the HALT_L pin, which is the only normally restartable console halt, SAVPC is always correct, even if the halt interrupt was detected during the execution of a string instruction.

At the time of the halt, the current stack pointer is saved in the appropriate IPR (0 to 4), and SAVPSL<31:16,7:0> are loaded from PSL<31:16,7:0>.
SAVPSL<15> is set to MAPEN<0>. SAVPSL<14> is set to 0 if the PSL is valid and to 1 if it is not (SAVPSL<14> is undefined after a halt due to a system reset). SAVPSL<13:8> is set to the console halt code.

To complete the hardware restart sequence and thereby pass control to the console macrocode, the state shown in Table 15-3 is initialized.

Table 15-3: CPU State Initialized on Console Halt

State          Initialized Value
SP             IPR 4 (IS)
PSL            041F0000 (hex)
PC             E0040000 (hex)
MAPEN          0
ICCS           0 (after reset, code=3, only)
SISR           0 (after reset, code=3, only)
ASTLVL         4 (after reset, code=3, only)
PAMODE         0 (after reset, code=3, only)
BPCR<31:16>    FECA (hex) (after reset, code=3, only)
CPUID          0 (after reset, code=3, only)
all else       undefined

15.5 Machine Checks

The machine check exception indicates a serious system error. Under certain conditions, the error may be recoverable by restarting the instruction. The recoverability is a function of the machine check code, the VAX Restart bit (VR) in the machine check stack frame, the opcode, the state of PSL<FPD>, the state of certain second-error bits in internal error registers, and most probably, the external error state.

A machine check results from an internally detected consistency error (e.g., the microcode reaches an "impossible" state), or a hardware detected error (e.g., an uncorrectable Bcache ECC error on a data read).

A machine check is technically a macroinstruction abort. The NVAX CPU microcode attempts to convert the condition to a fault by unwinding the current instruction, but there is no guarantee that the instruction can be properly restarted.
As much diagnostic information as possible is pushed on the stack and provided in other error registers. The rest of the error parsing is then left to the operating system.

When the software machine check handler receives control, it must explicitly acknowledge receipt of the machine check via a write of any value to the MCESR processor register with the following instruction:

[The instruction text is missing from the source scan.]

Figure 15-3: IPR 26 (hex), MCESR

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x| :MCESR
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

15.5.1 Machine Check Stack Frame

The machine check stack frame is shown in Figure 15-4. The fields of the stack frame are described in Table 15-4, and the possible machine check codes are listed in Table 15-5. The contents of all fields not explicitly defined in Table 15-4 are UNDEFINED.
Figure 15-4: Machine Check Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  24 (byte count of parameters, not including this longword)                   | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ASTLVL | x  x  x  x  x|  Machine Check Code   | x  x  x  x  x  x  x  x|         CPUID         |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                       INT.SYS register                                        |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                        SAVEPC register                                        |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                          VA register                                          |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                          Q register                                           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|    Rn     | x  x|Mode |        Opcode         | x  x  x  x  x  x  x  x|VR| x  x  x  x  x  x  x|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Table 15-4: Machine Check Stack Frame Fields

Longword  Bits   Contents
(SP)+0    31:0   Byte count: This longword contains the size of the stack frame in bytes, not including the PC, PSL, or the byte count longword. Stack frame PC and PSL values should always be referenced using this count as an offset from the stack pointer.
(SP)+4    31:29  ASTLVL: This field contains the current value of the VAX ASTLVL register.
          23:16  Machine check code: This field contains the reason for the machine check, as listed in Table 15-5.
          7:0    CPUID: This field contains the current value of the VAX CPUID register.
(SP)+8    31:0   INT.SYS register: This longword contains the value of the INT.SYS register as read onto the Abus by the microcode. The fields in this register are described in Chapter 10.
(SP)+12   31:0   SAVEPC: This field contains the SAVEPC register, which is loaded by microcode with the PC value in certain circumstances. It is used in error handling for PTE read errors with PSL<FPD> set in this stack frame.
(SP)+16   31:0   VA register: This longword contains the contents of the Ebox VA register, which may be loaded from the output of the ALU.
(SP)+20   31:0   Q register: This longword contains the contents of the Ebox Q register, which may be loaded from the output of the shifter.
(SP)+24   31:28  Rn: This field contains the value of the Rn register, which is used to obtain the register number for the CVTPL and EDIV instructions. In general, the value of this field is UNPREDICTABLE.
          25:24  Mode: This field contains a copy of PSL<CUR_MOD>.
          23:16  Opcode: This field contains bits <7:0> of the instruction opcode. The FD bit is not included.
          7      VR: This field contains the VAX Restart bit, which is used to communicate restart information between the microcode and the operating system. If this bit is set, no architectural state has been changed by the instruction which was executing when the error was detected. If this bit is not set, architectural state was modified by the instruction.
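The field positions in Table 15-4 can be illustrated with a small decoding sketch. It is not part of the specification: the function name decode_mcheck_frame and the sample frame are hypothetical, the frame is assumed to have been copied from the stack into a list of 32-bit integers with (SP)+0 first, and the parameter layout is assumed to be the 24-byte one shown in Figure 15-4. The PC and PSL are located through the byte count, as Table 15-4 advises.

```python
def decode_mcheck_frame(longwords):
    """Decode a machine check stack frame given as 32-bit longwords, (SP)+0 first."""
    byte_count = longwords[0]
    # The count excludes the count longword itself, so the PC sits one
    # longword past the parameters and the PSL follows it.
    pc_index = 1 + byte_count // 4
    if byte_count < 24 or len(longwords) < pc_index + 2:
        raise ValueError("frame too short for the layout in Figure 15-4")
    w1 = longwords[1]   # (SP)+4
    w6 = longwords[6]   # (SP)+24
    return {
        "astlvl": (w1 >> 29) & 0x7,
        "mcheck_code": (w1 >> 16) & 0xFF,
        "cpuid": w1 & 0xFF,
        "int_sys": longwords[2],
        "savepc": longwords[3],
        "va": longwords[4],
        "q": longwords[5],
        "rn": (w6 >> 28) & 0xF,
        "mode": (w6 >> 24) & 0x3,
        "opcode": (w6 >> 16) & 0xFF,
        "vr": bool((w6 >> 7) & 0x1),
        "pc": longwords[pc_index],
        "psl": longwords[pc_index + 1],
    }


# Hypothetical frame: ASTLVL 4, machine check code 06, CPUID 1, opcode D0, VR set.
frame = [24, 0x80060001, 0x0, 0x1000, 0x2000, 0x0, 0x30D00080,
         0x80001234, 0x041F0000]
decoded = decode_mcheck_frame(frame)
print(decoded)
```

Indexing the PC through the byte count rather than a fixed offset keeps the sketch in line with the rule stated for (SP)+0.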
Table 15-5: Machine Check Codes

Mnemonic               Code (Hex)  Meaning
MCHK_UNKNOWN_MSTATUS   01          Unknown memory management fault parameter returned by the Mbox (see Section 15.5.2.1)
MCHK_INT.ID_VALUE      02          Illegal interrupt ID value returned in INT.SYS (see Section 15.5.2.2)
MCHK_CANT_GET_HERE     03          Illegal microcode dispatch occurred (see Section 15.5.2.3)
MCHK_MOVC.STATUS       04          Illegal combination of state bits detected during string instruction (see Section 15.5.2.4)
MCHK_ASYNC_ERROR       05          Asynchronous hardware error occurred (see Section 15.5.2.5)
MCHK_SYNC_ERROR        06          Synchronous hardware error occurred (see Section 15.5.2.6)

15.5.2 Events Reported Via Machine Check Exceptions

This section describes all the errors which can cause a machine check exception. A parse tree is given which shows how to determine the cause of a given machine check. After that, there is a description of each error. For each error, the recovery procedure is given. Where appropriate, the conditions for retry are given. See Section 15.3.3 and Section 15.3.4 for more on error recovery and error retry.

Figure 15-5 is a parse tree which should be used to analyze the cause of a machine check exception. The errors shown in the parse tree are described in detail in the sections following the figure. The section is indicated in parentheses with each error.

Note that it is assumed that the state being analyzed is the saved state, as described in Section 15.3.1. Otherwise the state could change during the analysis procedure, leading to possibly incorrect conclusions. (See Section 15.3.2 for general information about error analysis.)
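The first decision in the analysis, selecting a branch by machine check code, can be sketched as a table lookup. This is only an illustration: the names MCHK_CODES and classify_mcheck are hypothetical, the mnemonics and section numbers are taken from Table 15-5, and an unlisted code is mapped to the inconsistent-status case of Section 15.5.2.7, which is how Figure 15-5 treats an unknown machine check code.

```python
# Machine check codes from Table 15-5, mapped to (mnemonic, describing section).
MCHK_CODES = {
    0x01: ("MCHK_UNKNOWN_MSTATUS", "15.5.2.1"),
    0x02: ("MCHK_INT.ID_VALUE", "15.5.2.2"),
    0x03: ("MCHK_CANT_GET_HERE", "15.5.2.3"),
    0x04: ("MCHK_MOVC.STATUS", "15.5.2.4"),
    0x05: ("MCHK_ASYNC_ERROR", "15.5.2.5"),
    0x06: ("MCHK_SYNC_ERROR", "15.5.2.6"),
}


def classify_mcheck(code):
    """Return (mnemonic, section) for a machine check code from the stack frame."""
    # Any code outside the table is treated as inconsistent status.
    return MCHK_CODES.get(code, ("INCONSISTENT_STATUS", "15.5.2.7"))


print(classify_mcheck(0x05))
print(classify_mcheck(0x7F))
```

For the asynchronous and synchronous codes, the lookup only selects the subtree; the saved error registers still have to be examined as the figure describes.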
Figure 15-5: Cause Parse Tree for Machine Check Exceptions

MACHINE CHECK (select one)
|
+-- MCHK_UNKNOWN_MSTATUS
|       ---> Unknown memory management status error (Section 15.5.2.1)
+-- MCHK_INT.ID_VALUE
|       ---> Illegal interrupt ID error (Section 15.5.2.2)
+-- MCHK_CANT_GET_HERE
|       ---> Presumed impossible microcode address reached (Section 15.5.2.3)
+-- MCHK_MOVC.STATUS
|       ---> MOVCx status encoding error (Section 15.5.2.4)
|
+-- MCHK_ASYNC_ERROR (select all, at least one)
|   +-- S_TBSTS<LOCK> (select all)
|   |   +-- S_TBSTS<DPERR>
|   |   |       ---> TB PTE data parity error (Section 15.5.2.5.1)
|   |   +-- S_TBSTS<TPERR>
|   |   |       ---> TB tag parity error (Section 15.5.2.5.1)
|   |   +-- none of the above
|   |           ---> Inconsistent status (no TBSTS error bits set) (Section 15.5.2.7)
|   +-- S_ECR<S3_STALL_TIMEOUT>
|   |       ---> S3 stall timeout error (Section 15.5.2.5.2)
|   +-- none of the above
|           ---> Inconsistent status (no asynchronous machine check error set) (Section 15.5.2.7)
|
+-- MCHK_SYNC_ERROR (select all, at least one)
|   +-- S_ICSR<LOCK> (select all, at least one)
|   |   +-- S_ICSR<DPERR>
|   |   |       ---> VIC (virtual instruction cache) data parity error (Section 15.5.2.6.1)
|   |   +-- S_ICSR<TPERR>
|   |   |       ---> VIC tag parity error (Section 15.5.2.6.1)
|   |   +-- none of the above
|   |           ---> Inconsistent status (no ICSR error bits set) (Section 15.5.2.7)
|   |
|   +-- S_BCEDSTS<LOCK> AND NOT S_PCSTS<PTE_ER> (select one)
|   |   +-- S_BCEDSTS<BAD_ADDR> (select one)
|   |   |   +-- S_BCEDSTS<DR_CMD>=DREAD
|   |   |   |       ---> Bcache data RAM addressing error on D-stream read or read-lock (Section 15.5.2.6.2)
|   |   |   +-- S_BCEDSTS<DR_CMD>=IREAD
|   |   |   |       ---> Bcache data RAM addressing error on I-stream read (Section 15.5.2.6.2)
|   |   |   +-- otherwise
|   |   |           ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- S_BCEDSTS<UNCORR> (select one)
|   |   |   +-- S_BCEDSTS<DR_CMD>=DREAD
|   |   |   |       ---> Bcache data RAM uncorrectable ECC error on D-stream read or read-lock (Section 15.5.2.6.2)
|   |   |   +-- S_BCEDSTS<DR_CMD>=IREAD
|   |   |   |       ---> Bcache data RAM uncorrectable ECC error on I-stream read (Section 15.5.2.6.2)
|   |   |   +-- otherwise
|   |   |           ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- none of the above
|   |           ---> Inconsistent status (no BCEDSTS unrecoverable error bits set) (Section 15.5.2.7)
|   |
|   +-- S_BCEDSTS<LOST_ERR> AND NOT S_PCSTS<PTE_ER>
|   |       ---> Lost unrecoverable Bcache data RAM error (Section 15.5.2.6.3)
|   |
|   +-- S_CEFSTS<LOCK> AND NOT S_PCSTS<PTE_ER> (select one)
|   |   +-- S_CEFSTS<TIMEOUT> (select one)
|   |   |   +-- S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
|   |   |   |   +-- S_CEFSTS<IREAD>
|   |   |   |   |       ---> I-stream NDAL read timeout error (Section 15.5.2.6.4)
|   |   |   |   +-- S_CEFSTS<OREAD>
|   |   |   |   |       ---> D-stream NDAL ownership read timeout error (Section 15.5.2.6.4)
|   |   |   |   +-- otherwise
|   |   |   |           ---> D-stream NDAL read timeout error (read only operand) (Section 15.5.2.6.4)
|   |   |   +-- otherwise
|   |   |           ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- S_CEFSTS<RDE> (select one)
|   |   |   +-- S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
|   |   |   |   +-- S_CEFSTS<IREAD>
|   |   |   |   |       ---> I-stream NDAL read data error (Section 15.5.2.6.5)
|   |   |   |   +-- S_CEFSTS<OREAD>
|   |   |   |   |       ---> D-stream NDAL ownership read data error (modify operand or read-lock) (Section 15.5.2.6.5)
|   |   |   |   +-- otherwise
|   |   |   |           ---> D-stream NDAL read data error (read only operand) (Section 15.5.2.6.5)
|   |   |   +-- otherwise
|   |   |           ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- S_CEFSTS<UNEXPECTED_FILL>
|   |   |       ---> Not a synchronous machine check cause (see soft error interrupt events)
|   |   +-- otherwise
|   |           ---> Inconsistent status (either CEFSTS<RDE>, CEFSTS<TIMEOUT>, or CEFSTS<UNEXPECTED_FILL> should be set) (Section 15.5.2.7)
|   |
|   +-- S_CEFSTS<LOST_ERR> AND NOT S_PCSTS<PTE_ER>
|   |       ---> Lost Bcache fill error (Section 15.5.2.6.6)
|   |
|   +-- S_NESTS<NOACK> AND NOT S_PCSTS<PTE_ER>
|   |   +-- S_NEOCMD<CMD>=IREAD
|   |   |       ---> Unacknowledged I-stream NDAL read (Section 15.5.2.6.7)
|   |   +-- S_NEOCMD<CMD>=DREAD
|   |   |       ---> Unacknowledged D-stream NDAL read (read only operand) (Section 15.5.2.6.7)
|   |   +-- S_NEOCMD<CMD>=OREAD
|   |   |       ---> Unacknowledged D-stream NDAL read (modify operand or read-lock) (Section 15.5.2.6.7)
|   |   +-- S_NEOCMD<CMD>=WRITE OR WDISOWN
|   |   |       ---> Not a synchronous machine check cause (see hard error interrupt events)
|   |   +-- otherwise
|   |           ---> Inconsistent status (invalid command in NEOCMD<CMD>) (Section 15.5.2.7)
|   |
|   +-- S_NESTS<LOST_OERR> AND NOT S_PCSTS<PTE_ER>
|   |       ---> Lost unrecoverable NDAL output error (Section 15.5.2.6.8)
|   |
|   +-- S_BCEDSTS<LOCK> AND S_PCSTS<PTE_ER> [1] (select one)
|   |   +-- S_BCEDSTS<BAD_ADDR> (select one)
|   |   |   +-- S_BCEDSTS<DR_CMD>=DREAD
|   |   |   |       ---> Bcache data RAM addressing error on PTE read (Section 15.5.2.6.9.2)
|   |   |   +-- S_BCEDSTS<DR_CMD>=IREAD (select one)
|   |   |   |   +-- S_BCEDSTS<LOST_ERR>
|   |   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   |   +-- otherwise
|   |   |   |           ---> Bcache data RAM addressing error on I-stream read (Section 15.5.2.6.2)
|   |   |   +-- otherwise (select one)
|   |   |       +-- S_BCEDSTS<LOST_ERR>
|   |   |       |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |       +-- otherwise
|   |   |               ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- S_BCEDSTS<UNCORR> (select one)
|   |   |   +-- S_BCEDSTS<DR_CMD>=DREAD
|   |   |   |       ---> Bcache data RAM uncorrectable ECC error on PTE read (Section 15.5.2.6.9.2)
|   |   |   +-- S_BCEDSTS<DR_CMD>=IREAD (select one)
|   |   |   |   +-- S_BCEDSTS<LOST_ERR>
|   |   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   |   +-- otherwise
|   |   |   |           ---> Bcache data RAM uncorrectable ECC error on I-stream read (Section 15.5.2.6.2)
|   |   |   +-- otherwise (select one)
|   |   |       +-- S_BCEDSTS<LOST_ERR>
|   |   |       |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |       +-- otherwise
|   |   |               ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- none of the above
|   |           ---> Inconsistent status (no BCEDSTS unrecoverable error bits set) (Section 15.5.2.7)
|   |
|   +-- S_CEFSTS<LOCK> AND S_PCSTS<PTE_ER> [1] (select one)
|   |   +-- S_CEFSTS<TIMEOUT> (select one)
|   |   |   +-- S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
|   |   |   |   +-- S_CEFSTS<IREAD> (select one)
|   |   |   |   |   +-- S_CEFSTS<LOST_ERR>
|   |   |   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   |   |   +-- otherwise
|   |   |   |   |           ---> I-stream NDAL read timeout error (Section 15.5.2.6.4)
|   |   |   |   +-- S_CEFSTS<OREAD> (select one)
|   |   |   |   |   +-- S_CEFSTS<LOST_ERR>
|   |   |   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   |   |   +-- otherwise
|   |   |   |   |           ---> D-stream NDAL ownership read timeout error (Section 15.5.2.6.4)
|   |   |   |   +-- otherwise
|   |   |   |           ---> D-stream NDAL read timeout error (PTE read) (Section 15.5.2.6.9.3)
|   |   |   +-- otherwise (select one)
|   |   |       +-- S_CEFSTS<LOST_ERR>
|   |   |       |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |       +-- otherwise
|   |   |               ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- S_CEFSTS<RDE> (select one)
|   |   |   +-- S_CEFSTS<TO_MBOX> AND (NOT S_CEFSTS<REQ_FILL_DONE>) (select one)
|   |   |   |   +-- S_CEFSTS<IREAD> (select one)
|   |   |   |   |   +-- S_CEFSTS<LOST_ERR>
|   |   |   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   |   |   +-- otherwise
|   |   |   |   |           ---> I-stream NDAL read data error (Section 15.5.2.6.5)
|   |   |   |   +-- S_CEFSTS<OREAD> (select one)
|   |   |   |   |   +-- S_CEFSTS<LOST_ERR>
|   |   |   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   |   |   +-- otherwise
|   |   |   |   |           ---> D-stream NDAL ownership read data error (Section 15.5.2.6.5)
|   |   |   |   +-- otherwise
|   |   |   |           ---> D-stream NDAL read data error (PTE read) (Section 15.5.2.6.9.4)
|   |   |   +-- otherwise (select one)
|   |   |       +-- S_CEFSTS<LOST_ERR>
|   |   |       |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |       +-- otherwise
|   |   |               ---> Not a synchronous machine check cause (see soft and hard error interrupt events)
|   |   +-- S_CEFSTS<UNEXPECTED_FILL> (select one)
|   |   |   +-- S_CEFSTS<LOST_ERR>
|   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   +-- otherwise
|   |   |           ---> Not a synchronous machine check cause (see hard error interrupt events)
|   |   +-- otherwise
|   |           ---> Inconsistent status (either CEFSTS<RDE>, CEFSTS<TIMEOUT>, or CEFSTS<UNEXPECTED_FILL> should be set) (Section 15.5.2.7)
|   |
|   +-- S_NESTS<NOACK> AND S_PCSTS<PTE_ER> [1]
|   |   +-- S_NEOCMD<CMD>=IREAD (select one)
|   |   |   +-- S_NESTS<LOST_OERR>
|   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   +-- otherwise
|   |   |           ---> Unacknowledged I-stream NDAL read (Section 15.5.2.6.7)
|   |   +-- S_NEOCMD<CMD>=DREAD
|   |   |       ---> Unacknowledged D-stream NDAL read (PTE read) (Section 15.5.2.6.9.5)
|   |   +-- S_NEOCMD<CMD>=OREAD (select one)
|   |   |   +-- S_NESTS<LOST_OERR>
|   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   +-- otherwise
|   |   |           ---> Unacknowledged D-stream NDAL read (modify operand or read-lock) (Section 15.5.2.6.7)
|   |   +-- S_NEOCMD<CMD>=WRITE OR WDISOWN (select one)
|   |   |   +-- S_NESTS<LOST_OERR>
|   |   |   |       ---> Multiple errors in context of PTE read error (Section 15.5.2.6.9.6)
|   |   |   +-- otherwise
|   |   |           ---> Not a synchronous machine check cause (see hard error interrupt events)
|   |   +-- otherwise
|   |           ---> Inconsistent status (invalid command in NEOCMD<CMD>) (Section 15.5.2.7)
|   |
|   +-- none of the above
|           ---> Inconsistent status (no cause found for synchronous machine check) (Section 15.5.2.7)
|
+-- otherwise
        ---> Inconsistent status (unknown machine check code) (Section 15.5.2.7)

[1] At least one potential PTE cause must be found or the status is inconsistent (see Section 15.5.2.7). Some of the outcomes indicate a potential synchronous machine check cause which is not a potential PTE read error cause. These errors should be treated separately.

Notation:

(select one) - Exactly one case must be true.
If zero or more than one is true, the status is inconsistent.
(select all) - More than one case may be true.
(select all, at least one) - All the cases are possible causes of a particular machine check. More than one may be true. At least one must be true or the status is inconsistent. A case is not considered true if it evaluates to "Not a machine check cause".
otherwise - fall-through case for (select one) if no other case is true.
none of the above - fall-through case for (select all) or (select all, at least one) if no other case is true.

NOTE
References to VR and PSL<FPD> in the "retry condition" parts of the following descriptions of machine check causes should be understood to refer to the named bit in the machine check stack frame.

15.5.2.1 MCHK_UNKNOWN_MSTATUS

Description: An unknown memory management status was returned from the Mbox in response to a microcode memory management probe. This is probably due to an internal error in the Mbox, Ebox, or microsequencer.

Recovery procedures: No explicit error recovery is required in response to this error.

Retry condition: This error can only happen in microcode processing of memory management faults for a virtual memory reference. Retry if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.2 MCHK_INT.ID_VALUE

Description: An illegal interrupt ID was returned in INT.SYS during interrupt processing in microcode. This is probably due to an internal error in the interrupt hardware, Ebox, or microsequencer.

Recovery procedures: No explicit error recovery is required in response to this error.
Retry condition: This error can only happen in microcode processing of interrupts, which occurs between instructions or in the middle of interruptable instructions. Retry if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.3 MCHK_CANT_GET_HERE

Description: Microcode execution reached a presumably impossible address. This is probably due to a microcode bug or an internal error in the Ebox or microsequencer.

Recovery procedures: No explicit error recovery is required in response to this error.

Retry condition: Retry if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.4 MCHK_MOVC.STATUS

Description: During the execution of MOVCx, the two state bits that encode the state of the move (forward, backward, fill) were found set to the fourth (illegal) combination. This is probably due to an internal error in the Ebox or microsequencer.

Recovery procedures: No explicit error recovery is required in response to this error.

Retry condition: Because the state bits encode the operation, the instruction can not be restarted in the middle of the MOVCx. If software can determine that no specifiers have been over-written (MOVCx destroys R0-R5 and memory due to string writes), the instruction may be restarted from the beginning by clearing PSL<FPD>. This should be done only if the source and destination strings do not overlap and if: (PSL<FPD> = 1).

15.5.2.5 MCHK_ASYNC_ERROR

This machine check code reports serious errors which interrupt the microcode at an arbitrary point. Many internal machine states (e.g., bits in the PSL, the PC or SP) are questionable. Recovery is typically not possible.

15.5.2.5.1 TB Parity Errors

Description: Parity errors in tags and PTE data in the TB cause an asynchronous machine check by directly forcing a microtrap in the microsequencer.
The reference being processed by the Mbox may be for an explicit Ebox reference, an operand prefetch or DEST_ADDR reference from the specifier queue, or an instruction prefetch from the IREF latch. Also the reference could be a read generated by the Mbox within a TB miss for a process space virtual address since process page tables are stored in virtual memory (system space).

Description (TB PTE Data Parity Error): The PTE data portion of a TB entry which hit had a parity error.

Description (TB Tag Parity Error): The tag portion of a TB entry which hit had a parity error.

Recovery procedures: To recover, clear TBSTS<LOCK>.

Retry condition: Since the Ibox is nearly always able to issue instruction prefetches, TB parity errors could occur at practically any time. This makes it impossible to determine what machine state is incorrect. There is no guarantee that all writes with a different PSL<CUR_MOD> completed successfully. Therefore even the stack frame PSL<CUR_MOD> can't be used to determine whether system data is uncorrupted. So retry is not possible. Crash the system.

15.5.2.5.2 Ebox S3 Stall Timeout Error

Description: S3 stall timeout errors occur when the Ebox microcode is stalled waiting for some result or action which will probably never occur. S4 stalls in the Ebox cause S3 stalls and therefore can lead to S3 stall timeout. Additionally, field queue stall and instruction queue stall can cause this timeout. (These last two situations are not Ebox pipeline stalls, but they are similar in effect.) The timeout can occur in any microflow for a number of reasons. Machine state may be corrupted.

This timeout is probably due to an internal error in the NVAX CPU such that one box is waiting for another to do something which it isn't going to do. An example would be if the Ebox microcode expected one more source specifier than the Ibox delivered.
The Ebox will stall until the timeout occurs waiting for the Ibox to deliver one more source operand via the source queue. S3 timeout errors can be caused by failures of various pipeline control circuits in the Ebox. Also a deadlock within a box or across multiple boxes can cause this error.

Recovery procedures: To recover, clear the S3_STALL_TIMEOUT bit in ECR.

Retry condition: Because this error can occur at any time, it is not possible to determine what machine state is incorrect. Also, this error should never happen and indicates either a serious failure in the NVAX CPU chip or a design bug. So retry is not possible. Crash the system.

15.5.2.6 MCHK_SYNC_ERROR

This machine check code reports errors which occur in memory or IO space instruction fetches or data reads. Except in the case of PTE read errors, core machine state should be consistent since microcode has to explicitly access an operand or instruction in order to incur this error. Microcode does not access memory results or dispatch for a new instruction execution with core machine state in an inconsistent state. PTE read errors on write transactions can cause a microtrap at an arbitrary time, and so core machine state may be inconsistent.

Many of the error events described below for synchronous machine check are possible causes. If more than one is present, there is no way to determine which actually caused the machine check. If exactly one possible cause is discovered, then the machine check may be attributed to that cause.

The reason multiple causes may be present is that the NVAX CPU prefetches instructions and data. If the CPU branches or takes an exception before using data it has requested, then the pending machine check is taken as a soft error interrupt (though it might not be recoverable in the final analysis).

If multiple errors occur, recovery and retry may be possible.
It is recommended that retry from multiple errors be done only if one error report does not interfere with analysis of, and recovery from, another error. An example of such interference is when S_BCEDSTS reports a Bcache data RAM uncorrectable error on a writeback while S_NESTS is reporting an NDAL command no-ACK error. Normally, S_NESTS<BADWDATA> would be reported by the writeback error and S_NEOADR would report the address of the lost writeback. The no-ACK error makes recovery from the writeback error much more difficult. But it is unlikely that these two errors would occur together since they are understood to be uncorrelated events. So this case is considered unrecoverable.

If two errors are entirely separate, neither interfering with the analysis and recovery of the other, then it is acceptable to retry from these errors provided all the error analyses and recovery procedures result in a retry indication.

In several cases, lost errors are tolerated. See Section 15.3.4.2 for a list of these special cases. In each case, the strong tendency to prefetch data exhibited by the NVAX pipeline makes the particular lost error likely, given that one error of that kind occurred. Also, in each case, if data is lost in the lost error, a hard error interrupt is posted. So these errors are tolerated as long as they do not cause a hard error interrupt.

Errors in opcode or operand specifier fetching are always detected before architecturally visible state within the CPU is modified. This means the VR bit from the machine check stack frame should be 1. This error handling analysis attempts to recover from multiple errors, so the retry condition for each error is made as general as possible. If the machine check handler finds only errors of the kind listed here, then VR should be 1 and it is an inconsistent report if it is not (see Section 15.5.2.7).

• VIC parity errors.
• Bcache data RAM uncorrectable ECC and addressing errors in I-stream reads.
• Bcache timeout errors and fill read data errors in I-stream reads.
• Unacknowledged NDAL I-stream reads.

15.5.2.6.1 VIC Parity Errors

Description: A parity error was detected in the VIC tag or data store in the Ibox. VIC parity errors cause a machine check when the Ebox microcode requests dispatch to a new instruction execution microflow or attempts to access an operand within an instruction execution microflow.

VIC Data Parity Errors: A parity error occurred in the data portion of the VIC.

VIC Tag Parity Errors: A parity error occurred in the tag portion of the VIC.

In all cases, the quadword virtual address of the error is in VMAR.

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures: To recover, disable and flush the VIC by re-writing all the tags (using the procedure in Section 15.3.3.1.1.1). Also, clear ICSR<LOCK>.

Retry condition: Retry if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.6.2 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors

Description (addressing errors): A Bcache addressing error was detected by the Cbox in an I-stream or D-stream read during a Bcache hit. Addressing errors are the result of a mismatch between the address the Cbox drives to the RAMs for a read access and the address used to write that location. A multiple bit data error can appear to be an addressing error, though it is extremely unlikely.

Description (uncorrectable ECC errors): A Bcache uncorrectable data error was detected by the Cbox in an I-stream or D-stream read during a Bcache hit. Uncorrectable data errors are the result of a multiple bit error in the data read from the Bcache. An addressing error with a single bit data error will appear as an uncorrectable data error.

Description (all cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of the error, and S_BCEDECC contains the syndrome calculated by the ECC logic.
The physical address of the reference can be found by reading the tag for the data block (using the procedure in Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.5.2.7.) If the block's tag is found to contain an uncorrectable ECC error, then the address can not be determined.

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> are set. If they are, it is an inconsistent status (see Section 15.5.2.7).

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures (addressing errors): To recover, clear BCEDSTS<LOCK, BAD_ADDR>.

Recovery procedures (uncorrectable ECC errors): To recover, clear BCEDSTS<LOCK, UNCORR>.

Recovery procedures (both cases): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). If the data is owned by the Bcache and if the error repeats itself (is not transient), then a writeback error will result from the flush procedure. Software should prepare for this by clearing NESTS and BCEDSTS errors.

Retry condition: If no writeback error occurs in the Bcache flush, retry if: (VR = 1) OR (PSL<FPD> = 1).

If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. See Section 15.8.1.10 for a description of handling an error in a writeback. Given that the address is available (no error in the tag store), software should determine if the error is fatal to one process or the whole system and take appropriate action. Otherwise, crash the system.

15.5.2.6.3 Bcache Lost Data RAM Access Error

Description: A lost Bcache data RAM error may have been a machine check cause. It also might not have been. Lost Bcache data RAM errors which cause machine checks are always read errors, and can be retried unless the aborted instruction has altered essential state.
Whether or not it is a machine check cause, the error will have caused either a soft or hard error interrupt. Lost Bcache data RAM errors which can not have caused a machine check are dealt with in the sections on hard and soft error interrupts. Lost Bcache data RAM errors may be caused by more than one operand prefetch to the same cache block.

Recovery for lost Bcache data RAM errors depends on whether the pending interrupt is a hard or soft error interrupt. The machine check error handling software should defer recovery until the expected hard or soft error interrupt occurs. Once the interrupt is taken, the error recovery and restart instructions found in the hard error interrupt and soft error interrupt sections should be referenced. See Section 15.7.1.3.2 and Section 15.8.1.15.

Software should employ some mechanism to record that an interrupt for a lost Bcache data RAM error is pending. This mechanism should allow detection of a case in which an expected interrupt does not occur (once IPL is lowered). If the expected interrupt does not occur when IPL is lowered, then a serious inconsistency exists and the system should be crashed.

The Bcache is in ETM.

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is required. Note that BCEDSTS<LOST_ERR> is not cleared. It will be cleared by the hard or soft error interrupt handler. Also, the Bcache must remain in ETM until the error interrupt occurs.

Retry condition: Retry only if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.6.4 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Timeout Errors

Description: An I-stream or D-stream read or D-stream ownership read timed out before any fill quadword was received. This is not an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". The error could be caused by an error in the system environment or an NDAL parity error on the returned data.
It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical address is in S_CEFADR.

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

I-stream read: The Bcache is not in ETM. I-stream errors cause a machine check when the Ebox microcode requests dispatch to a new instruction execution microflow or attempts to access an operand within an instruction execution microflow where the I-stream data with the error is required for the dispatch or access.

D-stream read: The Bcache is not in ETM. D-stream read errors cause a machine check when the Ebox microcode accesses prefetched operand data or when the Mbox returns data tagged with an error indication to the Ebox register file.

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the returning fills. The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.5.2.7). D-stream ownership read errors cause a machine check when the Ebox microcode accesses prefetched operand data or when the Ebox issues a read-lock.

Pending Interrupts (all cases): A soft error interrupt should be pending.

Recovery procedures (all cases): Clear CEFSTS<LOCK,TIMEOUT>.

Additional Recovery procedures for D-stream ownership read: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Depending on the system environment, memory may have set its ownership bit for this block. The data in memory is presumably still good. The Bcache block is marked invalid in the Bcache tag store.
If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. In general, it is not possible to determine which quadwords are valid. However, if the S_CEFSTS<COUNT> is 11 (binary) and S_CEFSTS<REQ_FILL_DONE> is not set, then the three quadwords in the Bcache block other than the quadword pointed to by S_CEFADR are valid.

If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space, then the block was not owned before the operation began. In this case, use the procedures in Section 15.3.3.1.2.2 to determine if memory's ownership bit is set and this CPU owns the block. If so, use the system specific procedure (see Section 15.3.3.1.2.2.2) to reset it. In some systems (the XMI2 for example) this may require a quadword of correct data be written to memory to reset the ownership bit. Section 15.3.3.1.2.3 describes procedures for extracting data from the Bcache data RAMs in this case.

If memory's ownership bit was left set as a result of this error and no non-destructive procedure exists for restoring it, then the hexaword block is lost.

Retry condition (I-stream or D-stream read): Retry if the address is not in IO space and: (VR = 1) OR (PSL<FPD> = 1).

Retry condition (D-stream ownership read): Given that no data is lost, retry if the memory state repair procedure is successful or not called for and if: (VR = 1) OR (PSL<FPD> = 1).

If the hexaword block could not be repaired or data is lost, software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost".[1] Software must determine if the error is fatal to one process or the whole system and take appropriate action.
(If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then retry once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the retry actually occurs, fortuitously repairing the cause of the fill error.

15.5.2.6.5 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Data Errors

Description: An I-stream or D-stream read or D-stream ownership read ended with an RDE (read data error) NDAL cycle before all the fill quadwords were received. If S_CEFSTS<COUNT> is 0 or the address in S_CEFADR is an IO space address, this is an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". Otherwise, the error could be caused by an error in the system environment. It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical address is in S_CEFADR.

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

I-stream read: The Bcache is not in ETM. I-stream errors cause a machine check when the Ebox microcode requests dispatch to a new instruction execution microflow or attempts to access an operand within an instruction execution microflow where the I-stream data with the error is required for the dispatch or access.

D-stream read: The Bcache is not in ETM.
D-stream read errors cause a machine check when the Ebox microcode accesses prefetched operand data or when the Mbox returns data tagged with an error indication to the Ebox register file.

1 In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the returning fills. The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.5.2.7). D-stream ownership read errors cause a machine check when the Ebox microcode accesses prefetched operand data or when the Ebox issues a read-lock.

Pending Interrupts (all cases): A soft error interrupt should be pending.

Recovery procedures (all cases): Clear CEFSTS<LOCK,RDE>.

Additional Recovery procedures for D-stream ownership read: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Depending on the system environment, memory may have set its ownership bit for this block. The data in memory could still be good. The Bcache block is marked invalid in the Bcache tag store. If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. In general, it is not possible to determine which quadwords are valid. However, if the S_CEFSTS<COUNT> is 11 (binary) and S_CEFSTS<REQ_FILL_DONE> is not set, then the three quadwords in the Bcache block other than the quadword pointed to by S_CEFADR are valid.

If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space, then the block was not owned before the operation began.
In this case, use the procedures in Section 15.3.3.1.2.2 to determine if memory's ownership bit is set and this CPU owns the block. If so, use the system specific procedure (see Section 15.3.3.1.2.2.2) to reset it. In some systems (the XMI2 for example) this may require a quadword of correct data be written to memory to reset the ownership bit. Section 15.3.3.1.2.3 describes procedures for extracting data from the Bcache data RAMs in this case.

If memory's ownership bit was left set as a result of this error and no non-destructive procedure exists for restoring it, then the hexaword block is lost.

Retry condition (I-stream or D-stream read): Retry if the address is not in IO space and: (VR = 1) OR (PSL<FPD> = 1).

Retry condition (D-stream ownership read): Given that no data is lost, retry if the memory state repair procedure is successful or not called for and if: (VR = 1) OR (PSL<FPD> = 1).

If the hexaword block could not be repaired or data is lost, software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost".[1] Software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

1 In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.
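
The retry conditions stated above for NDAL read data errors can be summarized as a small decision function. The following is only an illustrative sketch in Python, not handler code from this specification: the names ReadKind, Outcome, StackFrame, restartable, and read_data_error_outcome are hypothetical, and the inputs (whether the address is in IO space, whether the memory state repair succeeded or was not called for, whether data was lost) are assumed to have been determined already by the procedures referenced in the text.

```python
from dataclasses import dataclass
from enum import Enum


class ReadKind(Enum):
    ISTREAM = "I-stream read"
    DSTREAM = "D-stream read"
    OWNERSHIP = "D-stream ownership read"


class Outcome(Enum):
    RETRY = "retry the instruction"
    NO_RETRY = "retry condition not met"
    ASSESS_FATALITY = "decide whether fatal to one process or the whole system"
    INCONSISTENT = "inconsistent status (Section 15.5.2.7)"


@dataclass
class StackFrame:
    """Hypothetical view of the two machine check stack frame bits used here."""
    vr: bool       # VR bit from the machine check stack frame
    psl_fpd: bool  # PSL<FPD> from the machine check stack frame


def restartable(frame: StackFrame) -> bool:
    # The recurring condition: (VR = 1) OR (PSL<FPD> = 1).
    return frame.vr or frame.psl_fpd


def read_data_error_outcome(kind, frame, addr_in_io_space,
                            memory_state_ok=True, data_lost=False):
    """Apply the retry conditions for one read data error report.

    memory_state_ok means the memory state repair procedure was
    successful or not called for (an assumption supplied by the caller).
    """
    if kind is ReadKind.OWNERSHIP:
        # An ownership read to IO space is an inconsistent status.
        if addr_in_io_space:
            return Outcome.INCONSISTENT
        # Unrepaired block or lost data: software must judge the scope.
        if data_lost or not memory_state_ok:
            return Outcome.ASSESS_FATALITY
        return Outcome.RETRY if restartable(frame) else Outcome.NO_RETRY
    # I-stream or D-stream read: retry only for non-IO-space addresses.
    if not addr_in_io_space and restartable(frame):
        return Outcome.RETRY
    return Outcome.NO_RETRY
```

For example, read_data_error_outcome(ReadKind.DSTREAM, StackFrame(vr=True, psl_fpd=False), addr_in_io_space=False) yields Outcome.RETRY. The sketch deliberately stops at NO_RETRY when the retry condition fails, since the text above does not prescribe a single follow-up action for that case.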
NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then retry once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the retry actually occurs, fortuitously repairing the cause of the fill error.

15.5.2.6.6 Lost Bcache Fill Error

Description: Some number of fill errors occurred and were not latched because CEFSTS and CEFADR already contained a report of an unrecoverable error. There is no guarantee this error could have caused a machine check, though it may be a cause. Lost Bcache fill errors which cause machine checks are always read errors, and can be retried unless the aborted instruction has altered essential state. If it is a machine check cause, the error will have caused a soft error interrupt. Lost Bcache fill errors which can not have caused a machine check are dealt with in the sections on hard and soft error interrupts. Lost Bcache fill errors may be caused by more than one operand prefetch to the same cache block.

Recovery for lost Bcache fill errors depends on whether the pending interrupt is a hard or soft error interrupt. The machine check error handling software should defer recovery until the expected hard or soft error interrupt occurs. Once the interrupt is taken, the error recovery and restart instructions found in the hard error interrupt and soft error interrupt sections should be referenced. See Section 15.7.1.3.2 and Section 15.8.1.15.

Software should employ some mechanism to record that an interrupt for a lost Bcache fill error is pending. This mechanism should allow detection of a case in which an expected interrupt does not occur (once IPL is lowered).
If the expected interrupt does not occur when IPL is lowered, then a serious inconsistency exists and the system should be crashed.

The Bcache may be in ETM (S_CCTL<HW_ETM> will be set if it is).

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is required. Note that CEFSTS<LOST_ERR> is not cleared. It will be cleared by the hard or soft error interrupt handler. Also, the Bcache must remain in ETM (if it is already) until the error interrupt occurs.

Retry condition: Retry only if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.6.7 Unacknowledged NDAL I-Stream or D-Stream Read or D-Stream Ownership Read

Description: An I-stream or D-stream read or D-stream ownership read was no-ACKed by the system environment. This could be because the external component(s) received bad NDAL parity or it could be due to a system-specific notification of "non-existent memory or IO location".

The physical address is in S_NEOADR.

I-stream read: The Bcache is not in ETM. I-stream errors cause a machine check when the Ebox microcode requests dispatch to a new instruction execution microflow or attempts to access an operand within an instruction execution microflow where the I-stream data with the error is required for the dispatch or access.

D-stream read: The Bcache is not in ETM. D-stream read errors cause a machine check when the Ebox microcode accesses prefetched operand data or when the Mbox returns data tagged with an error indication to the Ebox register file.

D-stream ownership read: The Bcache is in ETM. The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.5.2.7). D-stream ownership read errors cause a machine check when the Ebox microcode accesses prefetched operand data.

Pending Interrupts (all cases): A soft error interrupt should be pending.
Recovery procedures (all cases): Clear NESTS<NOACK>.

Additional Recovery procedure for D-stream ownership read: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Retry condition: Retry if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.6.8 Lost NDAL Output Error

Description: Some number of NDAL output errors occurred and were not latched because NESTS, NEOADR, NEDATHI, and NEDATLO already contained a report of an unrecoverable error. There is no guarantee this error could have caused a machine check, though it may be a cause. Lost NDAL output errors which cause machine checks are always read errors, and can be retried unless the aborted instruction has altered essential state. If it is a machine check cause, the error will have caused a soft error interrupt. Lost NDAL output errors which can not have caused a machine check are dealt with in the sections on hard and soft error interrupts.

Recovery for lost NDAL output errors depends on whether the pending interrupt is a hard or soft error interrupt. The machine check error handling software should defer recovery until the expected hard or soft error interrupt occurs. Once the interrupt is taken, the error recovery and restart instructions found in the hard error interrupt and soft error interrupt sections should be referenced. See Section 15.7.1.5 and Section 15.8.1.17.

Software should employ some mechanism to record that an interrupt for a lost NDAL output error is pending. This mechanism should allow detection of a case in which an expected interrupt does not occur (once IPL is lowered). If the expected interrupt does not occur once IPL is lowered, then a serious inconsistency exists and the system should be crashed.

The Bcache may be in ETM (S_CCTL<HW_ETM> will be set if it is).

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is required. Note that NESTS<LOST_ERR> is not cleared.
It will be cleared by the hard or soft error interrupt handler. Also, the Bcache must remain in ETM (if it is already) until the error interrupt occurs.

Retry condition: Retry only if: (VR = 1) OR (PSL<FPD> = 1).

15.5.2.6.9 PTE Read Errors

The following sections describe error handling for PTE read errors. PTE read errors are read errors which happen in reads issued by the Mbox in handling a TB miss. Handling of these errors is different from handling the same underlying error (Bcache data RAM error, Bcache fill error, or NDAL no-ACK error) when a PTE read isn't the cause.

If S_PCSTS<PTE_ER> is set, then a PTE read issued by the Mbox in processing a TB miss had an unrecoverable error. The TB miss sequence was aborted because of the error. The original reference can be any I-stream or D-stream read or write. If the original reference was issued by the Ebox, then the PTE read which incurred the error will have been retried once (because of a special hardware/microcode mechanism for handling PTE read errors on Ebox references).

PTE read errors are difficult to analyze, partly because the read error report in the Cbox does not directly indicate that the failing read was a PTE read. Because of this and because PTE read errors should be rare (a very small percentage of the reads issued by the Mbox are PTE reads), multiple errors which interfere with the analysis of the PTE error are not considered recoverable.

The mechanism for reporting PTE read errors on Ebox references involves the Mbox forcing the Ebox (via a microtrap) into the microcode routine which normally handles memory management faults. This routine probes the address of the original reference, effectively retrying the failing PTE read. Assuming the error is not transient, the probe by microcode will cause a machine check.
If the error does not occur on the probe, microcode restarts the current instruction stream. So machine checks caused by PTE read errors can easily occur with the particular PTE read error having occurred twice (with a lost error bit set in the relevant Cbox error register). The analysis here tolerates these particular multiple error reports and allows retry in those cases, provided the remainder of the error analysis indicates retry is appropriate. (Note that there is no way to tell from the information available to the machine check handler whether the original reference was an Ebox or Ibox reference.)

If the reference which incurs the PTE read error is a write, S_PCSTS<PTE_ER_WR> will be set. In this case the original write is lost. No retry is possible, partly because the instruction which took the machine check may be subsequent to the one which issued the failing write. Also, PTE read errors on write transactions can cause a machine check at a practically arbitrary time in a microcode flow, and core machine state may not be consistent.

15.5.2.6.9.1 PTE Read Errors In Interruptable Instructions

Another special case associated with PTE read errors exists for interruptable instructions (specifically CMPC3, CMPC5, LOCC, MOVC3, MOVC5, SCANC, SKPC, and SPANC). For these instructions, if the PTE read error occurred for an Ebox reference, the PC in the machine check stack frame points to the instruction following the interrupted instruction. In this case, the SAVEPC element in the machine check stack frame is the PC of the interrupted instruction. However, in all other cases, SAVEPC is UNPREDICTABLE.

This case is not considered recoverable because analysis of the error information can not unambiguously conclude that this case is present. To tell that this case might be present, the error handler examines the FPD bit in the PSL in the machine check stack frame.
If FPD is set in the stack frame (in the case of a PTE read error) then one of the following is true:

• One of the interruptable instructions listed above incurred the PTE read error. In this case, SAVEPC from the machine check stack frame points to the interrupted instruction, and PC in the stack frame points to the next instruction.
• An REI instruction loaded a PSL with FPD set and a certain PC. The Ibox incurred the PTE read error in fetching the opcode pointed to by that PC. In this case, the PC in the stack frame points to the instruction which was the target of the REI and SAVEPC from the stack frame is unpredictable.

It is not possible to determine with certainty which of the two above cases is the cause of a machine check with S_PCSTS<PTE_ER> set and stack frame PSL<FPD> set. Retry is not possible since software can not tell which PC to restart with. However, software may wish to probe the location pointed to by the PC in the stack frame, expecting a possible machine check as a result. If a machine check does occur, that is information indicating that the second case occurred (not totally unambiguously, of course). A very good guess may be made by a person examining the error report if the machine check stack frame and the result of this probe are available in the report.

15.5.2.6.9.2 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on PTE Reads

Description (addressing errors): A Bcache addressing error was detected by the Cbox in a PTE read during a Bcache hit. Addressing errors are the result of a mismatch between the address the Cbox drives to the RAMs for a read access and the address used to write that location. A multiple-bit data error can appear to be an addressing error, though it is extremely unlikely.

Description (uncorrectable ECC errors): A Bcache uncorrectable data error was detected
by the Cbox in a PTE read during a Bcache hit. Uncorrectable data errors are the result of a multiple-bit error in the data read from the Bcache. An addressing error with a single-bit data error will appear as an uncorrectable data error.

Description (all cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of the error, and BCEDECC contains the syndrome calculated by the ECC logic. The physical address of the PTE read can be found by reading the tag for the data block (using the procedure in Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.5.2.7.) If the block's tag is found to contain an ECC error, then the address can not be determined.

S_BCEDSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> are set. If they are, it is an inconsistent status (see Section 15.5.2.7).

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures (addressing errors): To recover, clear BCEDSTS<LOCK, BAD_ADDR>.

Recovery procedures (uncorrectable ECC errors): To recover, clear BCEDSTS<LOCK, UNCORR>.

Recovery procedures (both cases): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Clear PCSTS<PTE_ER>. If the data is owned by the Bcache and if the error repeats itself (is not transient), then a writeback error will result from the flush procedure. Software should prepare for this by clearing NESTS and BCEDSTS errors.

Retry condition: If no writeback error occurs in the Bcache flush, retry if: (VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0). Otherwise, crash the system.
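The retry decision for this error can be sketched as a small predicate. This is only an illustrative sketch, not part of the specification: the function name, the `Action` enumeration, and the boolean parameters are hypothetical, and it assumes the handler has already extracted VR and PSL<FPD> from the machine check stack frame and PTE_ER_WR from the saved S_PCSTS, and knows whether the Bcache flush produced a writeback error.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for the machine check handler."""
    RETRY = "retry the instruction"
    CRASH = "crash the system"
    WRITEBACK_ERROR = "handle as a writeback error (Section 15.8.1.10)"


def pte_bcache_error_action(vr: bool, psl_fpd: bool, pte_er_wr: bool,
                            writeback_error_in_flush: bool) -> Action:
    """Choose the handler action for a Bcache data RAM error on a PTE read.

    vr                       -- VR bit from the machine check stack frame
    psl_fpd                  -- PSL<FPD> from the machine check stack frame
    pte_er_wr                -- S_PCSTS<PTE_ER_WR> from the saved state
    writeback_error_in_flush -- True if the Bcache flush hit a writeback error
    """
    if writeback_error_in_flush:
        # The data is presumed unrecoverable; software decides the scope.
        return Action.WRITEBACK_ERROR
    # Retry only if (VR = 1) AND (PSL<FPD> = 0) AND (PTE_ER_WR = 0).
    if vr and not psl_fpd and not pte_er_wr:
        return Action.RETRY
    return Action.CRASH
```

The same three-term condition appears in the other PTE read error sections, so a handler could reuse the final test for those cases as well.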
If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. See Section 15.8.1.10 for a description of handling an error in a writeback (software must determine if the error is fatal to one process or the whole system and take appropriate action).

15.5.2.6.9.3 NDAL PTE Read Timeout Errors

Description: A PTE read timed out before any fill quadword was received. This is not an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". The error could be caused by an error in the system environment or an NDAL parity error on the returned data. It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical address is in S_CEFADR. CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

The Bcache is not in ETM. The read was not an ownership read, so this error can not have caused the ownership bits in memory to be left in the wrong state.

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures: Clear CEFSTS<LOCK, TIMEOUT>. Clear PCSTS<PTE_ER>.

Retry condition: Retry if: (VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0). Otherwise, crash the system.

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost".¹ Software must determine if the error is fatal to one process or the whole system and take appropriate action.
(If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

¹ In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then retry once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the retry actually occurs, fortuitously repairing the cause of the fill error.

15.5.2.6.9.4 NDAL PTE Read Data Errors

Description: A PTE read ended with an RDE (read data error) NDAL cycle before any of the fill quadwords were received. If S_CEFSTS<COUNT> is 0 or the address in S_CEFADR is an IO space address, this is an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". Otherwise, the error could be caused by an error in the system environment. It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical address is in S_CEFADR. CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

The physical address of the PTE is in S_CEFADR. The Bcache is not in ETM.
The read could not have been an ownership read, so this error can not have caused the ownership bits in memory to be left in the wrong state.

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures: Clear CEFSTS<LOCK, RDE>. Clear PCSTS<PTE_ER>.

Retry condition: Retry if: (VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0). Otherwise, crash the system.

Post Retry Recovery: If the same fill error recurs on retry, then the block is probably "lost".¹ Software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

¹ In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then retry once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the retry actually occurs, fortuitously repairing the cause of the fill error.

15.5.2.6.9.5 Unacknowledged NDAL PTE Read

Description: A PTE read was no-ACKed by the system environment.
This could be because the external component(s) received bad NDAL parity or it could be due to a system-specific notification of "non-existent memory or IO location". The physical address of the PTE is in S_NEOADR. The Bcache is not in ETM.

S_NESTS<LOST_OERR> may be set. This error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

Pending Interrupts: A soft error interrupt should be pending.

Recovery procedures: Clear NESTS<NOACK>. Clear PCSTS<PTE_ER>.

Retry condition: Retry if: (VR = 1) AND (PSL<FPD> = 0) AND (S_PCSTS<PTE_ER_WR> = 0). Otherwise, crash the system.

15.5.2.6.9.6 Multiple Errors Which Interfere with Analysis of PTE Read Error

Because PTE read errors lead to several unusual cases, retry is not recommended in the event that other errors cloud the analysis of the PTE read error.

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is called for.

Retry condition: No retry is possible. Crash the system.

15.5.2.7 Inconsistent Status in Machine Check Cause Analysis

Description: A presumed impossible error report was found in the error registers. This could be due to a hardware failure or bug, or to incomplete analysis in this spec.

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is called for.

Retry condition: No retry is possible. The integrity of the entire system is questionable. Crash the system.

15.6 Power Fail Interrupt

Power fail interrupts are requested to report imminent loss of power to the CPU. Power fail interrupts are requested via the PWRFL_L pin at IPL 1E (hex) and are dispatched to the operating system through SCB vector 0C (hex).
The stack frame for a power fail interrupt is shown in Figure 15-6.

Figure 15-6: Power Fail Interrupt Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

15.7 Hard Error Interrupts

Hard error interrupts are requested to report an error that was detected asynchronously with respect to instruction execution. This results in an interrupt at IPL 1D (hex) to be dispatched through SCB vector 60 (hex). Typically, these errors indicate that machine state has been corrupted and that retry is not possible.

The stack frame for a hard error interrupt is shown in Figure 15-7.

Figure 15-7: Hard Error Interrupt Stack Frame

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PC                                               | :(SP)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                              PSL                                              |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

15.7.1 Events Reported Via Hard Error Interrupts

This section describes all the errors which can cause a hard error interrupt. A parse tree is given which shows how to determine the cause of a given hard error. After that, there is a description of each error. For each error, the recovery procedure is given. Where appropriate, the conditions for restart are given. See Section 15.3.3 and Section 15.3.4 for more on error recovery and error retry.
Figure 15-8 is a parse tree which should be used to analyze the cause of a hard error interrupt. It is assumed that the state being analyzed is the saved state, as described in Section 15.3.1. Otherwise the state could change during the analysis procedure, leading to possibly incorrect conclusions. (See Section 15.3.2 for general information about error analysis.)

Figure 15-8: Cause Parse Tree for Hard Error Interrupts

HARD ERROR INTERRUPT
----+ (select all, at least one)
    |
    | S_BCEDSTS<LOCK>
    +----+ (select one)
    |    |
    |    | S_BCEDSTS<BAD_ADDR>
    |    +----+
    |    |    |
    |    |    +----> Bcache data RAM addressing error on a write or
    |    |    |      write-unlock from Mbox (Section 15.7.1.1)
    |    |    | otherwise
    |    |    +----> Not a hard error interrupt cause (see soft error
    |    |           interrupt events)
    |    |
    |    | S_BCEDSTS<UNCORR>
    |    +----+
    |    |    |
    |    |    +----> Bcache data RAM uncorrectable ECC error on a write or
    |    |    |      write-unlock from Mbox (Section 15.7.1.1)
    |    |    | otherwise
    |    |    +----> Not a hard error interrupt cause (see soft error
    |    |           interrupt events)
    |    |
    |    | none of the above
    |    +----> Inconsistent status (no BCEDSTS unrecoverable error bits
    |           set) (Section 15.7.1.7)
    |
    | S_BCEDSTS<LOST_ERR>
    +----> Lost unrecoverable Bcache data RAM error (Section 15.7.1.2)
    |
    | S_CEFSTS<LOCK>
    +----+ (select one)
    |    |
    |    | S_CEFSTS<TIMEOUT> AND S_CEFSTS<REQ_FILL_DONE>
    |    | AND S_CEFSTS<WRITE> AND S_CEFSTS<OREAD>
    |    +----> NDAL timeout on OREAD for write from Mbox after write data
    |    |      merged with fill data in cache (Section 15.7.1.3)
    |    |
    |    | S_CEFSTS<RDE> AND S_CEFSTS<REQ_FILL_DONE>
    |    | AND S_CEFSTS<WRITE> AND S_CEFSTS<OREAD>
    |    +----> NDAL read data error on OREAD for write from Mbox after
    |    |      write data merged with fill data in cache (Section 15.7.1...)
    |    |
    |    +----> Unexpected NDAL fill received (Section 15.7.1.3.1)
    |    |
    |    | otherwise
    |    +----> Not a hard error interrupt cause (see soft error
    |           interrupt events)
    |
    +----> Lost Bcache fill error (Section 15.7.1.3.2)
    |
    | S_NESTS<NOACK>
    +----+ (select one)
    |    |
    |    | S_NEOCMD<CMD>=WRITE
    |    +----> no-ACK on WRITE command or data cycle (Section 15.7.1.4)
    |    |
    |    | S_NEOCMD<CMD>=WDISOWN
    |    +----> no-ACK on WDISOWN command or data cycle (Section 15.7.1.4)
    |    |
    |    | otherwise
    |    +----> Not a hard error interrupt cause (see soft error
    |           interrupt events)
    |
    | S_NESTS<LOST_OERR>
    +----> Lost no-ACK error (Section 15.7.1.5)
    |
    | (status consistent with hard error interrupt
    | in system environment error registers)
    +----> Hard error interrupt from system environment (Section 15.7.1.6)
    |
    | otherwise
    +----> Inconsistent status (Section 15.7.1.7)

Notation:
(select one) - Exactly one case must be true. If zero or more than one is true, the status is inconsistent.
(select all) - More than one case may be true.
(select all, at least one) - All the cases are possible causes of a hard error interrupt. More than one may be true. At least one must be true or the status is inconsistent. A case is not considered true if it evaluates to "Not a hard error interrupt cause".
otherwise - fall-through case for (select one) if no other case is true.
none of the above - fall-through case for (select all) or (select all, at least one) if no other case is true.

15.7.1.1 Uncorrectable Data Errors and Addressing Errors During Write or Write-Unlock Processing

Description: In processing a write or write-unlock, the Cbox detected an addressing error or an uncorrectable ECC error on the data read from the Bcache data RAMs. The write data has already been merged with the corrupted Bcache data and the write of the merged ("bad") data occurred. Data from the write is lost.

There are two types of uncorrectable Bcache data RAM errors: addressing errors and uncorrectable ECC errors. Both are detected through the ECC check logic.

Uncorrectable ECC errors indicate that two or more bits of the stored data quadword have changed and the error correcting code can not correct the data. A multiple-bit data error can appear to be an addressing error, though it is extremely unlikely. A single-bit error combined with an addressing error appears as an uncorrectable error.

Addressing errors indicate that the location read from the data RAM was probably written using a different address than the one used to read it out. The actual hardware failure could have occurred in the previous data RAM write or the current read.
Addressing errors are more serious than uncorrectable ECC errors since they indicate the integrity of the entire Bcache is questionable. Also, there is less than a 100% chance that a given addressing error will result in recognition of an addressing error. This is because addressing errors are recognized by encoding the parity of the address with the data and checking it on read back. All single-bit addressing errors are detectable. Note that addressing errors on writes are never detected if that data is never read out again.

The Cbox inverts three of the check bits being written back into the data RAMs to ensure that if the data is read again an uncorrectable error will be detected. If a subsequent read occurs, S_BCEDSTS<LOST_ERR> should be set, and the instruction which issued the read will machine check. However, this mechanism is not fully reliable at ensuring that a subsequent read will detect the error (see Section 15.11.1, Note On Tagged-Bad Data Mechanisms).

For either case, the physical address is determined from the contents of S_BCEDIDX using the procedure in Section 15.3.3.1.2.4. (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.7.1.7.) S_BCEDECC contains the syndrome calculated by the ECC logic. The Bcache is in ETM. If the block's tag is found to contain an ECC error, then the address can not be determined.

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> are set. If they are, it is an inconsistent status (see Section 15.7.1.7).

Recovery procedures (addressing error): Clear BCEDSTS<BAD_ADDR, LOCK>.

Recovery procedures (uncorrectable ECC error): Clear BCEDSTS<UNCORR, LOCK>.

Recovery procedures (both cases): The data in this block is lost. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).
Flushing the Bcache should cause a writeback error (in which BADWDATA will be sent on the NDAL), so BCEDSTS and NESTS should be cleared beforehand. Then use the system specific procedure to clear the tagged-bad state from this block in memory.

It is possible that no writeback error will occur, or that it will happen at the wrong address. This would occur if an error in the data RAMs caused the data to appear as correctable or without error even though it was written with three ECC bits inverted. Also, this could occur if the data was written to a different location than intended (addressing error). If this happens, then the block in memory will incorrectly appear to be good data.

NOTE
When clearing the tagged-bad data state of memory, software must first ensure that no more accesses to the block can occur. Otherwise there is the danger that some process on some other processor or a DMA IO device will see incorrect data and not detect an error.

Restart condition (addressing error): Addressing errors occur on data RAM reads and writes. Because the Cbox writes "bad" data back into the location, there is no way to distinguish transient read errors from transient write errors. Therefore, the worst case has to be assumed: some previous write was written to the wrong place in the Bcache or the failing write has been written to the wrong location in the Bcache. In other words, not only is the block whose address is known corrupted, but another block is as well. No restart is possible. The integrity of the entire system is questionable. Crash the system.

Restart condition (uncorrectable ECC error): If the address of the data is available and no unexpected writeback errors occurred during the Bcache flush, software must determine if the lost data is fatal to one process or the whole system and take the appropriate action. If the address of the data could not be determined or unexpected errors occurred during the Bcache flush, crash the system.
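The two restart conditions above can be summarized as a small decision function. The following is a hedged sketch only: the function name, the `Restart` enumeration, and the boolean inputs are hypothetical and assume the handler has already decoded the saved S_BCEDSTS bits, attempted to recover the physical address from the tag, and completed the Bcache flush.

```python
from enum import Enum


class Restart(Enum):
    """Hypothetical restart outcomes for Section 15.7.1.1 errors."""
    CRASH_SYSTEM = "crash the system"
    SOFTWARE_DECIDES = "fatal to one process or the whole system"
    INCONSISTENT = "inconsistent status (Section 15.7.1.7)"


def bcache_write_error_restart(bad_addr: bool, uncorr: bool,
                               address_known: bool,
                               unexpected_flush_errors: bool) -> Restart:
    """Map the saved error state to a restart decision.

    bad_addr / uncorr       -- S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR>
    address_known           -- False if the block's tag had an ECC error
    unexpected_flush_errors -- True if the Bcache flush produced unexpected
                               writeback errors
    """
    if bad_addr == uncorr:
        # Both set, or neither set, is a presumed-impossible report.
        return Restart.INCONSISTENT
    if bad_addr:
        # Another block may also be corrupted; no restart is possible.
        return Restart.CRASH_SYSTEM
    if address_known and not unexpected_flush_errors:
        return Restart.SOFTWARE_DECIDES
    return Restart.CRASH_SYSTEM
```

An inconsistent status also ends in a system crash; it is kept as a separate value here only so that the error log can distinguish the two cases.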
15.7.1.2 Lost Bcache Data RAM Hard Errors

Description: Some number of unrecoverable Bcache data RAM errors occurred and were not latched because BCEDSTS already contained a report of an unrecoverable error. There is no guarantee this error could have caused a hard error interrupt, though it may be a cause. Lost Bcache data RAM errors may be caused by more than one operand prefetch to the same cache block.

Bcache data RAM errors which cause a hard error interrupt indicate that write data has been lost. Specifically, a read-modify-write operation for a write or write-unlock had an uncorrectable ECC error or an addressing error. The data was written back into the RAMs with three check bits inverted. The Bcache is in ETM.

Pending interrupts: A soft error interrupt may be pending.

Recovery procedures: Clear BCEDSTS<LOST_ERR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Restart condition: No restart is possible since the errors which were not recorded could potentially have caused lost write data and no indication of what data is lost exists (based on the fact that this error was reported by hard error interrupt). Also, the possibility exists that a subsequent read to any location which had this error could receive incorrect data with no error indication. Crash the system.

NOTE
The lost data should be marked bad through the Bcache tagged-bad scheme. But there is a significant probability of an error converting that tagged-bad location back to good data. This is because precisely the location which had the data error is being depended on to store a different value without an error. The Bcache tagged-bad scheme does not reliably preserve the bad data status of the location in the presence of errors (see Section 15.11.1, Note On Tagged-Bad Data Mechanisms). So the tagged-bad locations may appear good to a subsequent reader.
This is why the system must be crashed.

15.7.1.3 Bcache Timeout or Read Data Error In Quadword OREAD Fill After Write Data Merged

Description: A D-stream ownership read for a write or write-unlock timed out or terminated receiving an RDE fill response after the requested quadword was received. The error could be due to an error in the system environment or to any previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

The quadword physical address is in S_CEFADR. The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.7.1.7). The merged data is in the Bcache in the quadword indicated in S_CEFADR. The ownership and valid bits in the Bcache are not set. CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.7.1.7).

Recovery procedures: Clear CEFSTS<LOCK>. Clear CEFSTS<TIMEOUT> if the error is a timeout, and CEFSTS<RDE> if it is a read data error. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Depending on the system environment, memory may have set its ownership bit for this block. This should be predictable for the given system environment because at least one quadword of data was received successfully. If the bit is set, then subsequent reads and writes to the same location may fail while the error is being handled. The data in memory should be unchanged.

The quadword containing the merged data is in the Bcache. In general, the memory block can not be repaired. However, assuming the memory block is left owned, no writes to the block have timed out in memory, and the block is private to the interrupted job, it can be repaired by the following procedure.

• Extract the addressed quadword from the Bcache (see Section 15.3.3.1.2.3).
• Reset memory's ownership state (see Section 15.3.3.1.2.2.2) and write the extracted quadword to memory.

NOTE
Software must somehow ensure that no writes to this block are pending in the memory before beginning the repair. This can be done by waiting an amount of time equal to a memory subsystem write timeout time.

If memory's ownership bit is not set, the block can not be repaired.

Restart condition: If memory state repair is successful, restart. Otherwise, software must determine if the lost data is fatal to one process or the whole system and take the appropriate action.

15.7.1.3.1 Unexpected Fill Error

Description: At least one fill was received when none for that transaction ID was expected by the NVAX CPU. This can only occur if a serious NDAL error has occurred. Reads previous to this event may have received incorrect data. If S_CEFSTS<RDE> is set, the unexpected fill was an RDE NDAL transaction. The Bcache is in ETM. S_CEFADR is UNPREDICTABLE.

Recovery procedures: Clear CEFSTS<LOCK, UNEXPECTED_FILL>. Flush the Bcache and clear CCTL<HW_ETM> (in that order).

Restart condition: Data may have been corrupted in memory because of incorrect read data being processed. Crash the system.

15.7.1.3.2 Lost Bcache Fill Error

Description: Either at least one fill error occurred in an OREAD after write data was merged or an unexpected fill was received. The error was not latched because CEFSTS and associated registers already contained a report of an unrecoverable error. There is no guarantee this error could have caused a hard error interrupt, though it may be a cause.

The Bcache may be in ETM. Read S_CCTL<HW_ETM> to find out.

Pending interrupts: A soft error interrupt may be pending.

Recovery procedures: Clear CEFSTS<LOST_ERR>. If the Bcache is in ETM, flush it and clear CCTL<HW_ETM> (in that order).
Restart condition: Data has been corrupted but the address is unknown. Crash the system.

15.7.1.4 NDAL No-ACK During WRITE or WDISOWN

Description: When the Cbox issues a WRITE or WDISOWN on the NDAL and it is not acknowledged, the Cbox requests a hard error interrupt. This could be because the external component(s) received bad NDAL parity or it could be due to a system-specific notification of "non-existent memory or IO location". The transaction is not retried by hardware, so the data is lost. Typically, for writebacks, the Bcache location is overwritten soon after this error, so there is no way to recover the data from the Bcache. The Bcache is in ETM.

S_NEOADR contains the physical address. S_NEOCMD contains the byte mask and NDAL command.

Recovery procedures: Clear NESTS<NOACK>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Retry condition: Software must determine if the lost data is fatal to one process or the whole system and take the appropriate action.

15.7.1.5 Lost NDAL No-ACK Hard Errors

Description: Some number of outgoing NDAL WRITE or WDISOWN commands were not acknowledged and were not latched because NESTS, NEOCMD, and NEOADR already contained a report of an NDAL output error. There is no guarantee this error could have caused the hard error interrupt, though it may be a cause.

Pending interrupts: A soft error interrupt may be pending.

Recovery procedures: Clear NESTS<LOST_NOACK>.

Restart condition: No restart is possible since the errors which were not recorded could potentially have caused lost write data. No indication of what data is lost exists. Crash the system.

15.7.1.6 System Environment Hard Error Interrupts

Description: Errors which occur in the system environment and result in loss of data and which can not notify the NVAX CPU by returning RDE notify the CPU of the error by asserting H_ERR_L (e.g., write errors). Errors which can be signaled by RDE should not use hard error interrupt notification.
Errors which are corrected automatically by hardware and do not result in loss of data should use soft error interrupt notification instead.

NOTE
It is very important that components in the system environment which assert H_ERR_L have a CPU accessible register which unambiguously reports the H_ERR_L assertion. Otherwise, system specific error handling for the hard error interrupt would always crash the system.

It is also strongly recommended that an address be stored where applicable. This may allow the operating system to kill one process or job instead of crashing the system in the event of that hard error.

Recovery procedures: Clear the error status bits in the system registers and perform any necessary system dependent recovery procedure.

Restart condition: Depends on the error. If the system environment reports the address of the lost data (where applicable), software may be able to kill just one process instead of crashing the system.

15.7.1.7 Inconsistent Status In Hard Error Interrupt Cause Analysis

Description: A presumed impossible error report was found in the error registers. This could be due to a hardware failure or bug.

Recovery procedures: No specific recovery action is called for.

Restart condition: No retry is possible. The integrity of the entire system is questionable. Crash the system.

15.8 Soft Error Interrupts

Soft error interrupts are requested to report errors which were detected, but did not affect instruction execution. Each request results in an interrupt at IPL 1A (hex), dispatched through SCB vector 54 (hex). The stack frame for a soft error interrupt is shown in Figure 15-9.
Figure 15-9: Soft Error Interrupt Stack Frame

 31                                                                0
+------------------------------------------------------------------+
|                                PC                                | :(SP)
+------------------------------------------------------------------+
|                                PSL                               |
+------------------------------------------------------------------+

15.8.1 Events Reported Via Soft Error Interrupts

This section describes all the errors which can cause a soft error interrupt. A parse tree is given which shows how to determine the cause of a given soft error. After that, there is a description of each error. For each error, the recovery procedure is given. Where appropriate, the conditions for restart are given. See Section 15.3.3 and Section 15.3.4 for more on error recovery and error retry.

Figure 15-10 is a parse tree which should be used to analyze the cause of a soft error interrupt. It is assumed that the state being analyzed is the saved state, as described in Section 15.3.1. Otherwise the state could change during the analysis procedure, leading to possibly incorrect conclusions. (See Section 15.3.2 for general information about error analysis.)

Note that many errors which cause a soft error interrupt may also lead to a machine check exception. For this reason, a soft error interrupt with no apparent cause is not an inconsistent state unless the CPU has executed an instruction while IPL was lower than 1A (hex) since the most recent machine check exception.

When a soft error interrupt is the only notification for any memory read error which could cause a machine check, the error did not cause a machine check for one of the following reasons:

• The error did not occur on the quadword the Ebox or Ibox requested (Pcache fill error).
• The Ebox took an interrupt before accessing an instruction or operand which was prefetched by the Ibox.
  (The interrupt could be this soft error interrupt.)
• A prefetched instruction or operand belonged to an instruction following a mispredicted branch, so the Ebox never executed the instruction (and it was flushed from the pipeline when the branch mispredict was recognized).
• The Ebox took an exception for a different reason before attempting to use an instruction execution dispatch or access an operand prefetched by the Ibox. (The pipeline was flushed because of the exception.)

Figure 15-10: Cause Parse Tree for Soft Error Interrupts

SOFT ERROR INTERRUPT (select all, at least one)

S_ICSR<LOCK> (select all, at least one)
    S_ICSR<DPERR> ---> VIC (virtual instruction cache) data parity error (Section 15.8.1.1)
    S_ICSR<TPERR> ---> VIC tag parity error (Section 15.8.1.1)
    none of the above ---> Inconsistent status (no ICSR error bits set) (Section 15.8.1.22)

S_PCSTS<LOCK> (select all, at least one)
    S_PCSTS<DPERR> ---> Pcache data parity error (Section 15.8.1.2)
    S_PCSTS<RIGHT_BANK> ---> Pcache tag parity error in right bank (Section 15.8.1.2)
    S_PCSTS<LEFT_BANK> ---> Pcache tag parity error in left bank (Section 15.8.1.2)
    otherwise ---> Inconsistent status (no PCSTS error bits set) (Section 15.8.1.22)

S_BCETSTS<LOCK> (select one)
    S_BCETSTS<UNCORR> (select one)
        S_BCETSTS<TS_CMD>=DREAD ---> Bcache tag store uncorrectable ECC error on D-stream read (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=IREAD ---> Bcache tag store uncorrectable ECC error on I-stream read (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=OREAD ---> Bcache tag store uncorrectable ECC error on write or read-lock (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=WUNLOCK ---> Bcache tag store uncorrectable ECC error on write-unlock (done only in ETM) (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=R_INVAL ---> Bcache tag store uncorrectable ECC error on writeback request type of NDAL operation (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=O_INVAL ---> Bcache tag store uncorrectable ECC error on writeback-and-invalidate type of NDAL operation (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=IPR_DEALLOCATE ---> Bcache tag store uncorrectable ECC error on software forced deallocate (Section 15.8.1.3)
        otherwise ---> Inconsistent status (invalid command) (Section 15.8.1.22)
    S_BCETSTS<BAD_ADDR> (select one)
        S_BCETSTS<TS_CMD>=DREAD ---> Bcache tag store addressing error on D-stream read (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=IREAD ---> Bcache tag store addressing error on I-stream read (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=OREAD ---> Bcache tag store addressing error on write or read-lock (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=WUNLOCK ---> Bcache tag store addressing error on write-unlock (done only in ETM) (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=R_INVAL ---> Bcache tag store addressing error on writeback request type of NDAL operation (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=O_INVAL ---> Bcache tag store addressing error on writeback-and-invalidate type of NDAL operation (Section 15.8.1.3)
        S_BCETSTS<TS_CMD>=IPR_DEALLOCATE ---> Bcache tag store addressing error on software forced deallocate (Section 15.8.1.3)
        otherwise ---> Inconsistent status (invalid command) (Section 15.8.1.22)
    otherwise ---> Inconsistent status (no BCETSTS error bits set) (Section 15.8.1.22)

S_BCETSTS<LOST_ERR> ---> Lost unrecoverable Bcache tag store error (Section 15.8.1.4)

S_BCETSTS<CORR> (select one)
    S_BCETSTS<LOCK> ---> Lost Bcache tag store correctable error (Section 15.8.1.6)
    otherwise (select one)
        S_BCETSTS<TS_CMD>=DREAD ---> Bcache tag store correctable ECC error on D-stream read (Section 15.8.1.5)
        S_BCETSTS<TS_CMD>=IREAD ---> Bcache tag store correctable ECC error on I-stream read (Section 15.8.1.5)
        S_BCETSTS<TS_CMD>=OREAD ---> Bcache tag store correctable ECC error on write or read-lock (Section 15.8.1.5)
        S_BCETSTS<TS_CMD>=WUNLOCK ---> Bcache tag store correctable ECC error on write-unlock (done only in ETM) (Section 15.8.1.5)
        S_BCETSTS<TS_CMD>=R_INVAL ---> Bcache tag store correctable ECC error on writeback request type of NDAL operation (Section 15.8.1.5)
        S_BCETSTS<TS_CMD>=O_INVAL ---> Bcache tag store correctable ECC error on writeback-and-invalidate type of NDAL operation (Section 15.8.1.5)
        S_BCETSTS<TS_CMD>=IPR_DEALLOCATE ---> Bcache tag store correctable ECC error on software forced deallocate (Section 15.8.1.5)
        otherwise ---> Inconsistent status (invalid command) (Section 15.8.1.22)

S_BCEDSTS<CORR> (select one)
    S_BCEDSTS<LOCK> ---> Lost Bcache data RAM correctable error (Section 15.8.1.8)
    otherwise (select one)
        S_BCEDSTS<DR_CMD>=DREAD ---> Bcache data RAM correctable error on D-stream read (Section 15.8.1.7)
        S_BCEDSTS<DR_CMD>=IREAD ---> Bcache data RAM correctable error on I-stream read (Section 15.8.1.7)
        S_BCEDSTS<DR_CMD>=WRITEBACK ---> Bcache data RAM correctable error on writeback (Section 15.8.1.7)
        S_BCEDSTS<DR_CMD>=[label illegible] ---> Bcache data RAM correctable error on read-modify-write for write or write-unlock (Section 15.8.1.7)
        otherwise ---> Inconsistent status (invalid command) (Section 15.8.1.22)

S_BCEDSTS<LOCK> AND NOT S_PCSTS<PTE_ER> (select one)
    S_BCEDSTS<UNCORR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD ---> Bcache data RAM uncorrectable ECC error on D-stream read (or Pcache fill for read-lock) (Section 15.8.1.9)
        S_BCEDSTS<DR_CMD>=IREAD ---> Bcache data RAM uncorrectable ECC error on I-stream read (Section 15.8.1.9)
        S_BCEDSTS<DR_CMD>=WRITEBACK ---> Bcache data RAM uncorrectable ECC error on writeback (Section 15.8.1.10)
        otherwise ---> Inconsistent status (all other cases cause hard error interrupt) (Section 15.8.1.22)
    S_BCEDSTS<BAD_ADDR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD ---> Bcache data RAM addressing error on D-stream read (or Pcache fill for read-lock) (Section 15.8.1.9)
        S_BCEDSTS<DR_CMD>=IREAD ---> Bcache data RAM addressing error on I-stream read (Section 15.8.1.9)
        S_BCEDSTS<DR_CMD>=WRITEBACK ---> Bcache data RAM addressing error on writeback (Section 15.8.1.10)
        otherwise ---> Inconsistent status (all other cases cause hard error interrupt) (Section 15.8.1.22)
    otherwise ---> Inconsistent status (no error bits set in BCEDSTS) (Section 15.8.1.22)

S_BCEDSTS<LOST_ERR> AND NOT S_PCSTS<PTE_ER> (select one)
    S_NESTS<BADWDATA> OR S_NESTS<LOST_OERR> ---> Lost unrecoverable Bcache data RAM error with possible lost writeback error (Section 15.8.1.11)
    otherwise ---> Lost unrecoverable Bcache data RAM error (Section 15.8.1.12)

S_CEFSTS<LOCK> AND NOT S_PCSTS<PTE_ER> (select one)
    S_CEFSTS<TIMEOUT> (select one)
        S_CEFSTS<OREAD> (select one)
            S_CEFSTS<WRITE> AND NOT S_CEFSTS<TO_MBOX> (select one)
                S_CEFSTS<REQ_FILL_DONE> ---> Inconsistent status (should cause hard error interrupt) (Section 15.8.1.22)
                otherwise ---> D-stream NDAL ownership read for Mbox write timeout error before write data merged with fill data (Section 15.8.1.13)
            S_CEFSTS<TO_MBOX> ---> D-stream NDAL ownership read timeout error (modify operand or read-lock) (Section 15.8.1.13)
            otherwise ---> Inconsistent status (either WRITE or TO_MBOX, but not both, should be set) (Section 15.8.1.22)
        otherwise (select one)
            S_CEFSTS<IREAD> ---> I-stream NDAL read timeout error (Section 15.8.1.13)
            S_CEFSTS<TO_MBOX> ---> D-stream NDAL read timeout error (read only operand) (Section 15.8.1.13)
            otherwise ---> Inconsistent status (TO_MBOX should be set) (Section 15.8.1.22)
    S_CEFSTS<RDE> (select one)
        S_CEFSTS<OREAD> (select one)
            S_CEFSTS<WRITE> AND NOT S_CEFSTS<TO_MBOX> (select one)
                S_CEFSTS<REQ_FILL_DONE> ---> Inconsistent status (should cause hard error interrupt) (Section 15.8.1.22)
                otherwise ---> D-stream NDAL ownership read for Mbox write read data error before write data merged with fill data (Section 15.8.1.14)
            S_CEFSTS<TO_MBOX> ---> D-stream NDAL ownership read read data error (modify operand or read-lock) (Section 15.8.1.14)
            otherwise ---> Inconsistent status (either WRITE or TO_MBOX, but not both, should be set) (Section 15.8.1.22)
        otherwise (select one)
            S_CEFSTS<IREAD> ---> I-stream NDAL read read data error (Section 15.8.1.14)
            S_CEFSTS<TO_MBOX> ---> D-stream NDAL read read data error (read only operand) (Section 15.8.1.14)
            otherwise ---> Inconsistent status (TO_MBOX should be set) (Section 15.8.1.22)
    otherwise ---> Inconsistent status (either CEFSTS<RDE> or CEFSTS<TIMEOUT> should be set or, if CEFSTS<UNEXPECTED_FILL> is set, it should cause a hard error interrupt) (Section 15.8.1.22)

S_CEFSTS<LOST_ERR> AND NOT S_PCSTS<PTE_ER> ---> Lost Bcache fill error (Section 15.8.1.15)

S_NESTS<NOACK> AND NOT S_PCSTS<PTE_ER> (select one)
    S_NEOCMD<CMD>=IREAD ---> Unacknowledged I-stream NDAL read (Section 15.8.1.16)
    S_NEOCMD<CMD>=DREAD ---> Unacknowledged D-stream NDAL read (read only operand) (Section 15.8.1.16)
    S_NEOCMD<CMD>=OREAD ---> Unacknowledged D-stream NDAL read (modify operand or read-lock) (Section 15.8.1.16)
    S_NEOCMD<CMD>=WRITE or WDISOWN ---> Inconsistent status (should cause hard error interrupt) (Section 15.8.1.22)
    otherwise ---> Inconsistent status (invalid command in NEOCMD<CMD>) (Section 15.8.1.22)

S_NESTS<LOST_OERR> AND NOT S_PCSTS<PTE_ER> ---> Lost NDAL output error (Section 15.8.1.17)

S_BCEDSTS<LOCK> AND S_PCSTS<PTE_ER> [1] (select one)
    S_BCEDSTS<UNCORR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD ---> Bcache data RAM uncorrectable ECC error on PTE read (Section 15.8.1.18.1)
        S_BCEDSTS<DR_CMD>=IREAD (select one)
            S_BCEDSTS<LOST_ERR> ---> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
            otherwise ---> Bcache data RAM uncorrectable ECC error on I-stream read (Section 15.8.1.9)
        S_BCEDSTS<DR_CMD>=WRITEBACK (select one)
            S_BCEDSTS<LOST_ERR> ---> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
            otherwise ---> Bcache data RAM uncorrectable ECC error on writeback (Section 15.8.1.10)
        otherwise ---> Inconsistent status (all other cases cause hard error interrupt) (Section 15.8.1.22)
    S_BCEDSTS<BAD_ADDR> (select one)
        S_BCEDSTS<DR_CMD>=DREAD ---> Bcache data RAM addressing error on PTE read (Section 15.8.1.18.1)
        S_BCEDSTS<DR_CMD>=IREAD (select one)
            S_BCEDSTS<LOST_ERR> ---> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
            otherwise ---> Bcache data RAM addressing error on I-stream read (Section 15.8.1.9)
        S_BCEDSTS<DR_CMD>=WRITEBACK (select one)
            S_BCEDSTS<LOST_ERR> ---> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
            otherwise ---> Bcache data RAM addressing error on writeback (Section 15.8.1.10)
        otherwise ---> Inconsistent status (all other cases cause hard error interrupt) (Section 15.8.1.22)
    otherwise ---> Inconsistent status (no error bits set in BCEDSTS) (Section 15.8.1.22)

S_CEFSTS<LOCK> AND S_PCSTS<PTE_ER> [1] (select one)
    S_CEFSTS<TIMEOUT> (select one)
        S_CEFSTS<OREAD> (select one)
            S_CEFSTS<WRITE> AND NOT S_CEFSTS<TO_MBOX> (select one)
                S_CEFSTS<REQ_FILL_DONE> ---> Inconsistent status (should cause hard error interrupt) (Section 15.8.1.22)
                otherwise (select one)
                    S_CEFSTS<LOST_ERR> ---> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
                    otherwise ---> D-stream NDAL ownership read for Mbox write timeout error before write data merged with fill data (Section 15.8.1.13)
            S_CEFSTS<TO_MBOX> (select one)
                S_CEFSTS<LOST_ERR> ---> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
                otherwise ---> D-stream NDAL ownership read timeout error (modify operand or read-lock) (Section 15.8.1.13)
            otherwise ---> Inconsistent status (either WRITE or TO_MBOX, but not both, should be set) (Section 15.8.1.22)
        otherwise (select one)
            S_CEFSTS<IREAD> (select one)
                S_CEFSTS<LOST_ERR> ---> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
                otherwise ---> I-stream NDAL read timeout error (Section 15.8.1.13)
            S_CEFSTS<TO_MBOX> ---> D-stream NDAL read timeout error (PTE read) (Section 15.8.1.18.2)
            otherwise ---> Inconsistent status (TO_MBOX should be set) (Section 15.8.1.22)

Figure 15-10 Cont'd on next page

[1] At least one potential PTE cause must be found or the status is inconsistent (see Section 15.8.1.22).
Some of the outcomes indicate a potential soft error interrupt cause which is not a potential PTE read error cause. These errors should be treated separately.

NVAX CPU Chip Functional Specification, Revision 1.0, February 1991

Figure 19-2: Internal Scan Register Operation Timing
[Timing diagram: NDAL_PHI12, NDAL_PHI23 and NDAL_PHI34 clock phases against NVAX cycles; waveform detail is not recoverable.]

Note that the initial packets of ISR data contain data from before the load event from the last bit on the chain. After one or two samples, this data is all valid sampled data.

The bits from the scan chain are serial-to-parallel converted as shown in Table 19-3. Note that for ISR1, 9 bits are always visible. Every third NVAX cycle, they shift up by three bit positions.
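The ISR1 behavior described above (three new bits per group, with older bits moving up by three positions) can be sketched in software. This is only an illustrative model, not part of the specification: the function name `isr1_snapshots`, the zero reset value, and the choice to take one snapshot per group of three scan bits are all assumptions, and the pin name PP_DATA_H is inferred from the surrounding text.

```python
def isr1_snapshots(scan_bits, reset_value=0):
    """Model the nine visible ISR1 bits on PP_DATA_H<11:3>.

    scan_bits: iterable of 0/1 values in the order they leave the scan chain.
    Returns one dict (pin index -> bit) per complete group of three bits.
    The reset value of the window is an assumption for illustration only.
    """
    window = {pin: reset_value for pin in range(3, 12)}
    snapshots = []
    group = []
    for bit in scan_bits:
        group.append(bit)
        if len(group) < 3:
            continue
        oldest, middle, newest = group
        window = {
            # <11:9> takes the previous <8:6>
            11: window[8], 10: window[7], 9: window[6],
            # <8:6> takes the previous <5:3>
            8: window[5], 7: window[4], 6: window[3],
            # <5> is the most recent bit, <3> the third most recent
            5: newest, 4: middle, 3: oldest,
        }
        snapshots.append(dict(window))
        group = []
    return snapshots


if __name__ == "__main__":
    for snap in isr1_snapshots([1, 0, 0, 0, 1, 0, 0, 0, 1]):
        print("".join(str(snap[pin]) for pin in range(11, 2, -1)))
```

A debug tool that logs PP_DATA_H every third NVAX cycle could use a model like this to check that each captured nibble group reappears three positions higher in the next sample.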
Table 19-3: Serial to Parallel Conversion of Scan Data

Scan Chain  Pin                Bit from Scan Chain
----------  -----------------  ------------------------------------------------
ISR1        PP_DATA_H<5>       Most recently received bit
            PP_DATA_H<4>       Second most recently received bit
            PP_DATA_H<3>       Third most recently received bit
            PP_DATA_H<8:6>     Last PP_DATA_H<5:3> (from 3 NVAX cycles ago)
            PP_DATA_H<11:9>    Last PP_DATA_H<8:6> (from 3 NVAX cycles ago)
ISR2        PP_DATA_H<2>       Most recently received bit
            PP_DATA_H<1>       Second most recently received bit
            PP_DATA_H<0>       Least recently received bit

Observe MAB

For full speed MAB observation, an internal clock is provided which will allow synchronous capture by a DAS in any debug environment. Figure 19-3 shows the self-relative timing during Observe MAB mode.

Figure 19-3: Self Relative Timing In Observe MAB Mode
[Timing diagram: PP_DATA<10:0> and PP_DATA<11> over one NVAX cycle; waveform detail is not recoverable.]

Force MAB

During Force MAB mode an internal 11 bit counter forces addresses onto the microaddress bus. The counter is initialized internally by the Ebox. It is incremented each time Force MAB mode is entered, thus allowing it to go through all control store addresses. Refer to the testability sections of the Micro-Sequencer chapter for further details of Force MAB operation.

Observe Box Signals

The timing for observing internal signals from boxes follows the same basic pattern as that for observing MAB. Note that PP_DATA_H<11> may be used for observing a box-specific signal. Details of the signals observed may be found in the testability section of each box chapter.

19.5 Test Pads

This port consists of strategic internal nodes brought out to the top level of metal in the form of 3x3 micron test pads. These pads will be accessed by probes during chip debug and wafer probe manufacturing tests.
The access primarily provides observability of these nodes; controllability may also be provided where appropriate. See the testability sections in the box chapters for the list of nodes brought out on the top metal layer.

19.6 System Port

This is simply the normal system I/O of the chip. It is identified as a test access port for two reasons:

• It is used to provide the read/write access to testability features via the VAX architecture's MFPR and MTPR instructions.
• It provides the natural resource for testing the chip via the macro-code based tests.

See the individual box chapters for the list of specific architectural features provided.

19.7 Serial P-cache Port

Instruction stream data may be serially loaded into the P-cache by supplying data on the TEST_DATA_H pin and strobing it with the TEST_STROBE_H pin. Chip microcode collects the bit-serial data, packs it into longwords, and writes the longwords into the P-cache. After loading the P-cache, the microcode passes control to the first MACRO instruction in P-cache. The serial load follows this flow:

• TEST_STROBE_H is de-asserted while ASYNCH_RESET_L is asserted. TEST_STROBE_H is normally pulled up through on-chip resistors.
• When ASYNCH_RESET_L is de-asserted, the on-chip power-up microcode enters the special burn-in flow.
• When MCHK_H is asserted, TEST_STROBE_H should be de-asserted. The chip is now ready to receive serial data input.
• The first bit of instruction stream data should be placed on TEST_DATA_H. Then TEST_STROBE_H should be asserted.
• TEST_STROBE_H should then be de-asserted. TEST_DATA_H can change on the same edge as the TEST_STROBE_H de-assertion.
• TEST_STROBE_H may transition at a maximum rate of 1/10 the internal chip clock frequency. There is no minimum rate.
• 32K bits of instruction stream data must be loaded into cache.
At this time, MCHK_H will be de-asserted, signifying the cache load is complete. The chip then jumps to the first location in P-cache, attempting to execute an instruction at that location.

It is difficult to achieve high test coverage in the burn-in and life-test environments due to limited test pattern bandwidth and the difficulty in synchronizing test equipment to the NVAX chip. Using this serial port, burn-in and life-test programs can load the real "test program" into P-cache, where the chip can perform a self-test. This scheme minimizes test pattern bandwidth, allows for asynchronous transmission of the serial data, provides a means to stimulate multiple chips under test which are running asynchronously, and supplies a means to achieve high test coverage.

19.8 IEEE 1149.1 (JTAG) Serial Test Port

The Serial Test Port is a 4-pin test access interface based on the IEEE 1149.1 standard. (See [2], [3].) In NVAX it is used for accessing and controlling the boundary scan register. The port supports the EXTEST, SAMPLE and BYPASS instructions.

Figure 19-4: Serial Port Timing
[Timing diagram: NDAL_PHI12, NDAL_PHI23, NDAL_PHI34 and NDAL_PHI41 clock phases, NVAX internal cycles, TEST_STROBE_H and SYS_RESET_L; waveform detail is not recoverable.]

The block diagram of the port logic together with the boundary scan register is shown in Figure 19-5. The port logic shown represents all the logic used in the definition of the Common Test Interface (see [2]). It consists of the four-wire Test Access Port (TAP), a TAP controller, an instruction register (IR) and a bypass register (BPR). The four pins in the test access port are TDI_H, TDO_H, TMS_H, and TCK_H.
These pins conform to all requirements of the standard. The port also uses the PP_CMD_H<0> pin as a pseudo-TRST_L pin. When asserted low, this pin resets the JTAG test logic. See Section 19.8.5 for more details.

The TAP Controller is a state machine which interprets IEEE 1149.1 protocols received on the TMS line and generates appropriate clocks and control signals for the testability features under its jurisdiction.

Figure 19-5: IEEE 1149.1 Serial Port (the Basic CTI)
[Block diagram: TCK_H feeding the TAP Controller State Machine, the Instruction Register, and the Control Dispatch Logic, with control signals going to the Boundary Scan Register; detail is not recoverable.]

The Instruction Register resides on a scan path. Its contents are interpreted as test instructions and are used to select the testability modes and features.

The Bypass Register is a one bit shift register which provides a single-bit serial connection through the port (chip) when no other test path is selected.

19.8.1 TAP Controller State Machine

The TAP Controller is a synchronous finite-state machine that interprets IEEE 1149.1 protocols received on the TMS line. The state transitions in the controller are caused by the TMS signal on the rising edge of TCK. In each state, the controller generates appropriate clocks and control signals that control the operation of the testability features. Appropriate actions of the testability features are initiated on the rising edge of TCK following the entry into a state.
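The TMS-driven transitions can be illustrated with a small software model. The sketch below uses the sixteen-state transition table defined by IEEE 1149.1 rather than anything specific to NVAX; the names `TAP_NEXT`, `step` and `walk` are hypothetical helpers introduced only for this illustration, and the model ignores the clocks and control signals that the real controller generates in each state.

```python
# Next-state table assumed from IEEE 1149.1:
# state -> (next state if TMS = 0, next state if TMS = 1),
# evaluated on each rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Return the state entered on one rising TCK edge with the given TMS."""
    return TAP_NEXT[state][1 if tms else 0]


def walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK edge; return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


if __name__ == "__main__":
    # From reset, TMS = 0,1,1,0,0 reaches the instruction register shift state.
    print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

One property worth checking with such a model is that holding TMS high for five TCK edges returns the controller to Test-Logic-Reset from any state, which is how a tester can synchronize to a chip whose TAP state is unknown.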
The TAP controller states provide the four basic actions required for testing: transportation of test data (Shift), stimulus application (Update), test execution (Run-Test), and response capture (Capture). Test data are generally transported at the beginning and at the end of a test. The state diagram for the TAP controller is shown in Figure 19-6.

The TAP controller causes appropriate actions to occur only in the testability features selected by the current instruction in the instruction register. All other testability features maintain status quo. Status quo means that the registers either retain their previous state or continue to operate in their previously selected mode.

A Scan Sequence begins with entry into the Capture state and ends with the exit from the Update state. The Scan Sequence entered from Select-IR-Scan controls the instruction register, and the one entered from Select-DR-Scan controls the testability feature selected by the instruction register. The actions caused by the states in the two scan sequences are identical. The following is a brief description of each state.

Figure 19-6: TAP Controller State Machine
[State diagram; the values shown on the transitions are TMS. Detail is not recoverable.]

• Test-Logic-Reset: This state disables the test logic. The chip performs normal system operation. Testability features are either inactive or are performing normal system operation. The TAP controller is forced into this state at power-up and it remains in this state as long as TMS is held high.
• Run-Test/Idle: This is a combined controller state between scan operations when the test logic is either idle or a particular test is running. For example, upon entry into this state, an internal test (such as self-test or macrocode test involving data reducers etc.) selected by the current instruction is executed.
All other testability features (not involved in the current test) maintain status quo.
• Select-DR-Scan: This is a temporary controller state in which all test registers (instruction register as well as testability features) maintain status quo. If TMS is held low when the controller is in this state, then a scan sequence for the selected test feature is initiated.
• Select-IR-Scan: This is a temporary controller state in which all test registers maintain status quo. If TMS is held low when the controller is in this state, then a scan sequence for the Instruction Register is initiated.
• Capture: In this controller state, the chip data is parallel loaded into the selected test register (Instruction Register or testability feature). This is the state in which the observe action takes place.
• Shift: In this state the selected test register shifts data one stage towards its serial output on each rising edge of TCK.
• Exit1: This is a temporary controller state where all test registers maintain status quo.
• Pause: This controller state allows shifting of the selected test register to be temporarily halted. All test registers maintain status quo.
• Exit2: This is a temporary controller state. All test registers maintain status quo.
• Update: The selected test register updates its outputs by transferring data from the shifter stage into the parallel output stage. This update action is initiated on the first falling edge of TCK upon entry into the state. All other registers maintain status quo.

19.8.2 Instruction Register

The JTAG Instruction Register on NVAX CPU consists of 2 bits. The two bits are interpreted as per Table 19-4 to select and control the operation of the boundary scan register. During the Capture-IR state, the shift register stage of IR is loaded with data '01'.
This automatic load feature is useful for testing the integrity of the JTAG scan chain on module.

Table 19-4: Instruction Register

IR<1:0>   Test Register Selected    Test Instruction/Operation
00        Boundary Scan Register    EXTEST. Also forces reset to internal chip logic.
01        Boundary Scan Register    SAMPLE
10        Bypass Register           BYPASS
11        Bypass Register           BYPASS. Default

A cell used in the instruction register is shown in Figure 19-7. The ir_cell operations are controlled by the IR_CAPTURE_H, IR_SHIFT_C1, IR_SHIFT_C2, IR_UPDATE_H and IR_RESET_L signals. These signals are described later.

Figure 19-7: JTAG Instruction Register Cell
[Figure: instruction register cell with a status/info input, the IR output bit, tdi and tdo scan connections, and control inputs ir_reset_l, ir_update_h, ir_shift_c1, and ir_shift_c2.]

19.8.3 Bypass Register

The bypass register provides a one-bit scan route through the NVAX chip during scan-shift operations. It provides a means for effectively bypassing the NVAX CPU chip's test logic during testing at module and system levels. When the bypass register is selected, a CAPTURE-DR controller state loads a '0' in the bypass register. When the JTAG instruction selects the Bypass operation, the Bypass register is selected for the scan operation.

19.8.4 Control Dispatch Logic

Dispatch logic generates signals to control operations of JTAG circuitry, including the instruction register and the driver on the TDO_H pin. It decodes the current instruction in the IR and the current TAP controller state information and dispatches the control signals to the bypass and boundary scan registers. The control signals dispatched are described below.

Dispatch to Boundary Scan Register

• BSR_EXTEST_H: Asserted high when the instruction selects EXTEST instruction.
This allows boundary scan cells to drive data on output and I/O pins. BSR_EXTEST_H also forces an internal reset to chip logic. This makes the chip's internal logic insensitive to test patterns used for interconnection test.
• BSR_CAPTURE_H: The signal is asserted when the TAP controller enters the CAPTURE-DR state and deasserted when the TAP Controller exits the CAPTURE-DR state. The signal causes data to be observed into the boundary scan register.
• BSR_SHIFT_C1: Issues a pulse with the falling edge of TCK_H during CAPTURE-DR and SHIFT-DR states.
• BSR_SHIFT_C2: Unconditionally issues a pulse with each rising edge of TCK_H.
• BSR_UPDATE_H: Issues a pulse with the falling edge of TCK_H during UPDATE-DR state. This pulse loads new data into the parallel output latch in md_bcells described later.

Dispatch to Bypass Register

Dispatch to Bypass Register consists of BSR_CAPTURE_H, BSR_SHIFT_C1, and BSR_SHIFT_C2 signals. Note that these are a subset of the signals dispatched to the boundary scan register.

Dispatch to Instruction Register

• IR_CAPTURE_H: The signal is asserted when the TAP controller enters the CAPTURE-IR state and deasserted when the TAP Controller exits the CAPTURE-IR state. The signal causes status data ('01') to be observed into IR.
• IR_SHIFT_C1: Issues a pulse with the falling edge of TCK_H during CAPTURE-IR and SHIFT-IR states.
• IR_SHIFT_C2: Unconditionally issues a shift pulse with each rising edge of TCK_H. Note that the data shifts from the most significant bit to the least significant bit. The least significant bit is at TDO_H.
• IR_UPDATE_H: Issues a pulse with the falling edge of TCK_H during UPDATE-IR state. This pulse loads the new instruction into the parallel output latches of IR.
• IR_RESET_L: This signal initializes the instruction register's output latches. When asserted low, all IR output latches are set high to force BYPASS instruction.
IR_RESET_L is asserted low when the TAP Controller enters the Test-Logic-Reset state.

Dispatch to TDO Multiplexers and Driver

Multiplexer control is dispatched by decoding the instruction register as per Table 19-4. ENABLE_TDO_H is generated as follows.

• ENABLE_TDO_H: This signal is asserted high when the TAP controller is in SHIFT-IR or SHIFT-DR states. The signal enables the TDO_H pin driver whenever a shift operation is in progress and keeps it disabled at all other times.

Figure 19-8 and Figure 19-9 show the timing diagram of the signals dispatched by the Control Dispatch Logic and the behavior of the Boundary Scan Register and the Instruction Register during the IR-Scan and DR-Scan sequences.

Notice that the implementation must meet the standard's requirement that the changes on TDO_H occur with the falling edge of the TCK_H signal. In NVAX CPU this requirement is met by including a timing latch at the TDO_H pin. The latch opens when TCK_H is low and closes when TCK_H is high.

Figure 19-8: Instruction Register Scan (Example: Load EXTEST Instruction)
[Figure: timing diagram. Traces: TCK_H, TMS_H, STATE; Instruction Register (IR_CAP_INPUT, IR_SO_OUT, IR_PAR_OUT); IR control dispatch (IR_CAPTURE_H, IR_SHIFT_C1, IR_SHIFT_C2, IR_UPDATE_H); BSR (BSR_CAP_IN, BSR_SO_OUT, BSR_PAR_OUT); BSR dispatch (BSR_CAPTURE_H, BSR_SHIFT_C1, BSR_SHIFT_C2, BSR_UPDATE_H, BSR_EXTEST_H); miscellaneous (TDI_H, TDO_H, TDO_MUX, TDO_DRIVER).]

Figure 19-9: Data Register Scan (Example: Assume EXTEST Instruction)
[Figure: timing diagram. Traces: TCK_H, TMS_H, STATE; BSR (BSR_CAP_IN, BSR_SO_OUT, BSR_PAR_OUT); BSR dispatch (BSR_CAPTURE_H, BSR_SHIFT_C1, BSR_SHIFT_C2, BSR_UPDATE_H, BSR_EXTEST_H); IR (IR_CAP_INPUT, IR_SO_OUT, IR_PAR_OUT, holding the current EXTEST instruction); IR dispatch (IR_CAPTURE_H, IR_SHIFT_C1, IR_SHIFT_C2, IR_UPDATE_H); miscellaneous (TDI_H, TDO_H, TDO_MUX, TDO_DRIVER).]

19.8.5 Initialization

The TAP Controller and the Instruction Register's output latches are initialized by PP_CMD_H<0>. When the PP_CMD_H<0> pin is asserted low, the TAP controller is forced to enter the Test-Logic-Reset state and the IR is forced to the BYPASS instruction.
During the Test-Logic-Reset state, all JTAG logic, including the boundary scan register, is in an inactive state. That is, the chip performs normal system functions. The boundary scan logic is set to a passive sample (observe) mode. The TAP controller leaves this state only when a JTAG test operation is desired and the appropriate sequence is sent on the TMS_H and TCK_H pins.

NOTE
Note that the PP_CMD_H<0> pin on NVAX CPU acts like a pseudo-TRST_L pin. Since this pin is internally pulled up, a system designer must make provision to assert the pin low, at least during the power-up operation. This will keep all JTAG circuits inactive and allow the system to wake up normally in system mode.

19.9 Boundary Scan Registers

The NVAX CPU chip's boundary scan register primarily facilitates interconnection test on module during module manufacturing and field service. Uses during other life cycle testing phases may also be possible. The boundary scan register is a single shift register formed by boundary scan cells placed at most of the chip's signal pins. The register is accessed via the JTAG port's TDI_H and TDO_H pins. Its operation is controlled by the control dispatch received from the JTAG Port.

19.9.1 Boundary Scan Register Cells

The NVAX chip uses four main types of boundary scan cells.

in_bcell: Used on input-only pins. Figure 19-10 shows the block diagram. The bcell basically consists of a 1-bit shift register. The cell supports Sample and Shift functions. The cell is used at input-only pins.

out_bcell: Used on output-only pins. Figure 19-11 shows the block diagram. Besides the shift register, the cell has an output multiplexer. The cell supports the following functions: Sample, Shift, Drive outputs. The cell is used at miscellaneous output-only pins.

io_bcell: Used on bi-directional pins. Figure 19-12 shows the block diagram. The cell is identical to the out_bcell cell except that it captures test data from the incoming data line. The cell supports Sample, Shift, Drive output functions.
It is used at all I/O pins.

md_bcell: Used on certain special pins and internal signals. For example, this cell is used on TS_WE_L, TS_OE_L, DR_WE_L, DR_OE_L pins and on internal driver enable signals for bi-directional busses. Figure 19-13 shows the block diagram of an md_bcell. The cell builds upon the out_bcell. It has a third output latch which holds data at output steady while a shift operation is in progress. The cell supports Sample, Shift, Drive output, and Hold output functions.

Figure 19-10: in_bcell Boundary Scan Cell
[Figure: block diagram. Labels: pin, to sys logic, par-in, tdi, tdo, bsr_capture_h, bsr_shift_c1, bsr_shift_c2.]

Figure 19-11: out_bcell Boundary Scan Cell
[Figure: block diagram. Labels: from sys logic, pin, par-in, tdi, tdo, bsr_capture_h, bsr_shift_c1, bsr_shift_c2.]

Figure 19-12: io_bcell Boundary Scan Cell
[Figure: block diagram. Labels: to sys logic, from sys logic, pin, par-in, tdi, bsr_extest_h, bsr_shift_c1, bsr_shift_c2.]

Figure 19-13: md_bcell Boundary Scan Cell
[Figure: block diagram. Labels: from sys logic, pin, tdi, tdo, bsr_capture_h, bsr_extest_h, bsr_update_h, bsr_shift_c1, bsr_shift_c2.]

NOTE
Caution: In the NVAX CPU chip, when the Boundary Scan Register is shifting data in EXTEST mode (that is, when bsr_extest_h is asserted), the shifting data is transferred to the pins and is visible to the other components connected to the pins. Since the back-up cache interface pins are connected to RAMs which do not have boundary scan on them, protection is provided by extra logic in the bcells on the R/W bits. This is explained later.

19.9.2 Boundary Scan Register Organization

The boundary scan register on the NVAX CPU chip is 243 bits long. Table 19-5 lists all the signal pins and the associated boundary scan register cell type. The pins are listed in the order of their connection from the TDI_H pin to the TDO_H pin. Thus, the cell on the internal signal C_PAD_N%NDAL_OUT_DRV_H is closest to the TDI_H pin and the cell on pin CHIP_ID_H<11> is farthest from the TDI_H pin. In an entry with more than one pin, the cell on the first pin is closer to the TDI_H pin.

On-chip fuses provide a means to program each die with a unique ID number which can be used to trace a packaged part back to the lot, wafer, and die location of origin for yield analysis. Although it is not part of the chip boundary, the twelve-bit CHIP_ID_H<11> is connected to the boundary scan chain so that the ID can be easily accessed through the JTAG port.
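The 243-bit length can be cross-checked against the Count column of Table 19-5 by summing the entries that carry a boundary scan cell. The sketch below is illustrative only: the signal names and counts are transcribed from Table 19-5 (entries whose cell type is "none" are left out), and the names BSR_ENTRIES and first_cell_offsets are hypothetical.

```python
# Cell counts transcribed from Table 19-5, in scan order from TDI_H to TDO_H.
# Entries whose BSR cell type is "none" are omitted because they add no cells.
BSR_ENTRIES = [
    ("C_PAD_N%NDAL_OUT_DRV_H", 1), ("NDAL_H<32:63>", 32),
    ("SYS_RESET_L", 1), ("ASYNC_RESET_L", 1), ("DISABLE_OUT_L", 1),
    ("TEST_STROBE_H", 1), ("TEST_DATA_H", 1), ("IRQ_L<0:3>", 4),
    ("H_ERR_L, S_ERR_L", 2), ("INT_TIM_L", 1), ("PWRFL_L, HALT_L", 2),
    ("MACHINE_CHECK_H", 1), ("TS_TAG_H<17:31>", 15), ("TS_ECC_H<0:5>", 6),
    ("TS_OWNED_H, TS_VALID_H", 2), ("C_PAD_T%EN_TS_DRV_H", 1),
    ("TS_INDEX_H<5:20>", 16), ("TS_OE_L, TS_WE_L", 2),
    ("DR_INDEX_H<3:20>", 18), ("DR_OE_L, DR_WE_L", 2),
    ("C_PAD_D%EN_DR_DRV_H", 1), ("DR_DATA_H<0:23>", 24),
    ("DR_ECC_H<0:7>", 8), ("DR_DATA_H<24:63>", 40), ("CPU_WB_ONLY_L", 1),
    ("ACK_L", 1), ("CPU_SUPRESS_L", 1), ("CPU_HOLD_L", 1), ("CPU_REQ_L", 1),
    ("CPU_GRANT_L", 1), ("CMD_H<0:3>", 4), ("ID_H<0:2>", 3),
    ("PARITY_H<0:2>", 3), ("NDAL_H<0:31>", 32), ("CHIP_ID_H<0:11>", 12),
]


def first_cell_offsets(entries):
    """Map each entry to the number of cells between TDI_H and its first cell."""
    offsets, position = {}, 0
    for name, count in entries:
        offsets[name] = position
        position += count
    return offsets


bsr_length = sum(count for _, count in BSR_ENTRIES)
offsets = first_cell_offsets(BSR_ENTRIES)
```

With these counts, bsr_length evaluates to 243, and the twelve CHIP_ID_H cells occupy the last positions of the chain, after 231 other cells.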
Table 19-5: Boundary Scan Register Organization

Signal Name                  Count  Pin type              BSR Cell Type  Remarks
C_PAD_N%NDAL_OUT_DRV_H       1      Int signal            md_bcell       Int Signal
NDAL_H<32:63>                32     I/O, tri, 4 pts       io_bcell
OSC_H, OSC_L                 2      In                    none
OSC_TEST_H                   1      In                    none
OSC_TC1_H, OSC_TC2_H         2      In                    none
PHI12_OUT_H, PHI23_OUT_H     2      Out, 1D, 4R           none
PHI41_OUT_H, PHI34_OUT_H     2      Out, 1D, 4R           none
SYS_RESET_L                  1      Out                   md_bcell
ASYNC_RESET_L                1      In, 1D, 3R            in_bcell
DISABLE_OUT_L                1      In, 1D, 3R            in_bcell
TEST_STROBE_H                1      In, ptp               in_bcell
TEST_DATA_H                  1      In, ptp               in_bcell
IRQ_L<0:3>                   4      In, Op dr, 3D, 1R     in_bcell
H_ERR_L, S_ERR_L             2      In, Op dr, 3D, 1R     in_bcell
INT_TIM_L                    1      In, ptp               in_bcell
PWRFL_L, HALT_L              2      In, ptp               in_bcell
MACHINE_CHECK_H              1      Out, ptp              out_bcell
TEMP_H                       1      Out                   none
PP_CMD_H<0:2>                3      In, pull-up           none
PP_DATA_H<0:11>              12     Out                   none
TS_TAG_H<17:31>              15     I/O, tri, 7 pts       io_bcell
TS_ECC_H<0:5>                6      I/O, tri, 7 pts       io_bcell
TS_OWNED_H, TS_VALID_H       2      I/O, tri, 7 pts       io_bcell
C_PAD_T%EN_TS_DRV_H          1      Int signal            md_bcell       Int Signal
TS_INDEX_H<5:20>             16     Out, 6 pts            out_bcell
TS_OE_L, TS_WE_L             2      Out, 6 pts            md_bcell
DR_INDEX_H<3:20>             18     Out, 8 pts            out_bcell
DR_OE_L, DR_WE_L             2      Out, 8 pts            md_bcell
C_PAD_D%EN_DR_DRV_H          1      Internal sig          md_bcell       Int Signal
DR_DATA_H<0:23>              24     I/O, tri, 19 pts      io_bcell
DR_ECC_H<0:7>                8      I/O, tri, 19 pts      io_bcell
DR_DATA_H<24:63>             40     I/O, tri, 19 pts      io_bcell
CPU_WB_ONLY_L                1      In, Op dr             in_bcell
ACK_L                        1      I/O, Op dr, 4 pts     io_bcell
CPU_SUPRESS_L                1      Out, ptp              out_bcell
CPU_HOLD_L                   1      Out, ptp              out_bcell
CPU_REQ_L                    1      Out, ptp              out_bcell
CPU_GRANT_L                  1      In, ptp               in_bcell
CMD_H<0:3>                   4      I/O, tri, 4 pts       io_bcell
ID_H<0:2>                    3      I/O, tri, 4 pts       io_bcell
PARITY_H<0:2>                3      I/O, tri, 4 pts       io_bcell
NDAL_H<0:31>                 32     I/O, tri, 4 pts       io_bcell
CHIP_ID_H<0:11>              12     Int signal            in_bcell       Int Signal
PHI12_IN_H, PHI23_IN_H       2      In, 1D, 4R            none
PHI41_IN_H, PHI34_IN_H       2      In, 1D, 4R            none
TMS_H                        1      In, pull-up           none
TCK_H                        1      In, pull-down         none
TDO_H                        1      Out, tri, 2D          none
TDI_H                        1      In, ptp, pull-up      none

Some of the boundary scan register cells in NVAX are grouped together to form sections. A section is simply a collection of pins that are identical in nature and have identical boundary scan cells on them. A section is generally controlled and operated identically during certain test modes. The pins in a section may also be logically related and may be located physically together. Some such sections are described below.

BSR at TAG Store Interface

The boundary scan register at the TAG Store interface consists of 4 sections: WE/OE bits, a driver enable bit on the C_PAD_T%EN_TS_DRV_H signal, 23 data bits (tag, ECC and own), and 16 address (index) bits. Figure 19-14 shows the block diagram. The boundary scan cell type used in each segment is listed in Table 19-5. (The figure does not show the actual order of connection.)

Figure 19-14: Boundary Scan Register at TAG Store Interface
[Figure: block diagram. Labels: disable_out_l, WE Bit, Data Bits (TAG), Address Bits (Index), BSR Control Dispatch (bsr_shift_c1, bsr_shift_c2, bsr_capture_h), to tdo_h, IEEE P1149.1 Port.]

The following are some specific requirements.

WE/OE Bits: The WE/OE bits use md_bcells with additional logic to allow proper operation of RAMs during interconnection testing. When bsr_extest_h is asserted, the test data injected on the pins is as follows:

• TS_WE_L bit: Data injected is the logical OR of the value stored in the md_bcell's output latch and the complement of the bsr_update_h signal.
• TS_OE_L bit: Data injected is the logical OR of the value stored in the md_bcell's output latch and the complement of the bsr_capture_h signal.

The idea is to assert these two signals appropriately in a non-overlapping manner and only when the boundary scan is not shifting the data. This enhancement allows the test operation to meet the timing constraints in accessing RAMs. (See reference [4].) It also protects the RAM interface from the shifting data pattern.

BSR at Data RAM Interface

The BSR section at the Data RAM interface also consists of 4 segments: WE/OE bits, a driver enable bit on the C_PAD_D%EN_DR_DRV_H signal, 72 Data bits, and 18 Index bits. The block diagram and operation of the BSR at the Data RAM interface are exactly the same as those of the BSR at the TAG Store interface.

BSR at NDAL Interface

The BSR section at the NDAL data interface has a driver enable bit on the internal signal C_PAD_N%NDAL_OUT_DRV_H. It allows the drivers on bi-directional NDAL pins to be controlled by JTAG during testing.

19.10 Internal Scan Register and LFSR Reducer

The NVAX CPU chip has several internal nodes observable via internal scan registers. This observability facilitates chip debug. Some internal scan register sections are turned into LFSR Reducers to enhance fault coverage and reduce test vectors during chip manufacturing tests.

19.10.1 Internal Scan Register Cells

Figure 19-15 shows the block diagrams of the two types of cells used in NVAX. The ISR cell is used for Scan-only registers and the ISL cell is used for implementing Scan-cum-LFSR registers.

Figure 19-15: Cells for Internal Scan Registers
[Figure: two block diagrams. ISR Cell (cell for Scan-only Register) with PI, SI, SO, and load_h; ISL Cell (cell for Scan-cum-LFSR Register) with PI, SI, and SO.]

Figure 19-16 shows how an LFSR is constructed by using ISL cells and an ISR cell. The ISR cell used in the left-most bit position represents a dummy bit.
The cell provides the multiplexer function required to enable feedback during LFSR operation. (Note that this cell can be replaced by an ordinary multiplexer.) The feedback taps for the LFSRs are based on a primitive characteristic polynomial. (The actual taps used will be documented in respective box chapters when LFSR size and other constraints are known.)

The Internal Scan Register's operations are controlled by internal NVAX clocks and by two signals received from the parallel port as follows:

PHI_4_H and PHI_2_H are internal NVAX clocks. PHI_4_H loads the master and PHI_2_H loads the slave.

Figure 19-16: An ISR section turned into LFSR
[Figure: an ISR cell followed by a chain of ISL cells. Labels: PI<n-1>, SI, SO, PHI_2, PHI_4, load_h, lfsr_h, Feedback.]

When ISR_LOAD_H is asserted high, the master latches in the ISR and ISL cells capture/observe data from internal signals. When ISR_LOAD_H is low, the internal scan register shifts data. Note that the shift occurs independent of the assertion of ISR_LFSR_H. ISR_LOAD_H is latched in phase PHI_3 before it is used to control the ISRs.

When both ISR_LFSR_H and ISR_LOAD_H are asserted high, the internal scan register sections containing ISL cells operate as LFSRs and compress data. ISR_LFSR_H is also latched in phase PHI_3 before it is used to control the internal LFSRs.

19.10.2 Internal Scan Register Organization

The Internal Scan Registers are divided into 2 groups: ISR1 and ISR2. ISR1 consists of the scan register on the control store. It is used for patching the control store as well as reading out the control store during testing. ISR2 consists of all the other internal scan registers.
Specific nodes included on the internal scan registers are listed in individual box chapters under their testability sections. The individual box scan registers are chained together, and are shifted out in the following order: Ibox, Ebox, Mbox, Cbox.

Both ISR1 and ISR2 operate at the internal clock rate. However, they are read out at the parallel port at the NDAL clock rate. See Section 19.4.1 for details of ISR1 and ISR2 operation.

19.11 Output Pin Tri-state Control

The NVAX CPU chip has a dedicated pin, disable_out_l. When it is asserted low, the CPU chip tri-states the output drivers on all output-only and bi-directional pins, except those listed below. When asserted, the pin also internally forces a reset to the NVAX CPU chip.

The only exceptions are the TDO_H pin and the NDAL clock output pins, which are not tristated by the disable_out_l pin. Not tristating the clock output pins has been approved by the stage-1 module test engineers. Leaving out the TDO_H pin allows the JTAG circuits to operate while chip tristate is in effect. This affords additional flexibility for the module manufacturing test. For example, during the interconnection test, the NVAX outputs may be allowed to drive only during the CAPTURE-DR state and kept in tristate in all other states. This eliminates the effect of shifting patterns and drastically reduces the time for which the drivers may see an interconnect short fault.

The single pin tristate function is used only during testing. Note that the drivers on bi-directional I/O pins are also tristated by internal Cbox logic during RESET and by the boundary scan register during the interconnection test (EXTEST mode). The order of precedence is as follows: DISABLE_OUT_L, Boundary scan register, and the Cbox logic.
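The order of precedence for the drivers on bi-directional I/O pins can be sketched as a small decision function. This is a minimal illustration in Python, not the chip's logic: the function name and parameter names are hypothetical, bsr_drive_enable stands for the value held in the boundary scan cell on the relevant internal driver enable signal, and cbox_drive_enable stands for the enable the Cbox logic would otherwise produce.

```python
def io_driver_enabled(disable_out_l, bsr_extest_h, bsr_drive_enable, cbox_drive_enable):
    """Resolve whether a bi-directional pin's driver is enabled.

    Precedence, highest first: DISABLE_OUT_L, boundary scan register, Cbox logic.
    All parameter names are illustrative assumptions, not NVAX net names.
    """
    if disable_out_l == 0:
        # DISABLE_OUT_L asserted low: the driver is tri-stated unconditionally.
        return False
    if bsr_extest_h:
        # EXTEST mode: the boundary scan register controls the driver.
        return bool(bsr_drive_enable)
    # Normal operation: the Cbox logic controls the driver.
    return bool(cbox_drive_enable)
```

The sketch applies only to pins that disable_out_l actually controls; the TDO_H pin and the NDAL clock output pins are outside its scope.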
19.12 Operating Speed of Test Logic

The IEEE 1149.1 Port and the boundary scan register are designed to be operable in the range 0 to 10 MHz at least. Internal scan registers operate at the internal clock rate. A higher speed of 10 MHz (instead of 5 MHz) has been set to make the boundary scan register usable during wafer probe testing.

NOTE
The JTAG circuitry design must account for the fact that TCK_H will not be driven in the running system.

References

1. "mUG's Testability Document V1.0," NVAX Testability User's Group, December 1988.
2. "Common Test Architecture: Adaptations and Compatible Applications of IEEE P1149.1 Specification, Revision 1.0," Semiconductor Design & Engineering/Advanced Test Technology Group, February 1990.
3. IEEE Standard 1149.1-1990: "IEEE Standard Test Access Port and Boundary-Scan Architecture draft D3," January 1989.
4. "Testing connections to non-JTAG Static RAMs with JTAG Boundary Scan," D. K. Bhavsar, DEC Internal Report, December 1989.

19.13 Revision History

Table 19-6: Revision History

Who             When          Description of change
Dilip Bhavsar   06-Mar-1989   Release for external review.
Dilip Bhavsar   15-Jul-1989   First Update of specific details.
Dilip Bhavsar   15-Jan-1990   3rd Release.
Dilip Bhavsar   16-Mar-1990   Spec error in WE and E bcells corrected.
Dilip Bhavsar   21-May-1990   (3.2) bcell on SYS_REST pin changed to md_bcell. IR and speed spec updated. Clock tristating removed. Serial port for PCache introduced.
Dilip Bhavsar   14-June-1990  (3.3) Parallel port modes changed. JTAG Reset added. Box controllability removed.
Dilip Bhavsar   03-July-1990  (3.4) JTAG Reset finalized. bcell and other figures updated to reflect actual implementation. Timing on JTAG control signals changed to be consistent with edge-trigger design. Pins listed in order of their connection in BSR.
John F. Brown   03-Jul-1990   Serial PCache Port details added.
Dilip Bhavsar   30-Jul-1990   Reset actions by JTAG EXTEST instruction and DISABLE_OUT_L pin added. Timing diagrams added. Parallel Port operation details added. ISR/ISL clocking changed to PHI_4 (master) and PHI_2 (slave). Final Edits for Rev 3.4. (See NITS 314, 330, 337, 351, 360)
John F. Brown   23-Aug-90     Timing diagram for Serial P-cache port added.
Dilip Bhavsar   28-Sep-90     Rev 3.5. PPort timing changed (NITS # 385). More description for PPort operation. Also, the boundary scan order updated to reflect implementation.
John F. Brown   20-Feb-91     Rev 3.6. Updates for spec release: PPort fields & scan chain order.

Figure 15-10 (Cont.): Cause Parse Tree for Soft Error Interrupts
[Figure: continuation of the parse tree, entered from connectors 1 and 2 of the preceding page. The branch labels and outcomes are listed below in reading order.]

(select one)
S_CEFSTS<OREAD> (select one)
S_CEFSTS<WRITE> AND NOT S_CEFSTS<TO_MBOX> (select one)
--> Inconsistent status (should cause hard error interrupt) (Section 15.8.1.22)
otherwise (select one)
S_CEFSTS<LOST_ERR> --> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
otherwise --> D-stream NDAL ownership read for Mbox write read data error before write data merged with fill data (Section 15.8.1.14)
(select one)
S_CEFSTS<LOST_ERR> --> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
otherwise --> D-stream NDAL ownership read read data error (modify operand or read-lock) (Section 15.8.1.14)
otherwise --> Inconsistent status (either WRITE or TO_MBOX, but not both, should be set) (Section 15.8.1.22)
otherwise
S_CEFSTS<IREAD> (select one)
S_CEFSTS<LOST_ERR> --> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
otherwise --> I-stream NDAL read read data error (Section 15.8.1.14)
--> D-stream NDAL read read data error (PTE read) (Section 15.8.1.18.3)
otherwise --> Inconsistent status (TO_MBOX should be set) (Section 15.8.1.22)
otherwise --> Inconsistent status (either CEFSTS<RDE> or CEFSTS<TIMEOUT> should be set or, if CEFSTS<UNEXPECTED FILL> is set, it should cause a hard error interrupt) (Section 15.8.1.22)

Footnote 1: At least one potential PTE cause must be found or the status is inconsistent (see Section 15.8.1.22). Some of the outcomes indicate a potential soft error interrupt cause which is not a potential PTE read error cause. These errors should be treated separately.

S_NESTS<NOACK> AND S_PCSTS<PTE_ER> (footnote 1) (select one)
S_NEOCMD<CMD>=IREAD (select one)
S_NESTS<LOST_OERR> --> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
otherwise --> Unacknowledged I-stream NDAL read (Section 15.8.1.16)
S_NEOCMD<CMD>=DREAD --> Unacknowledged D-stream NDAL read (PTE read) (Section 15.8.1.18.4)
S_NEOCMD<CMD>=OREAD (select one)
S_NESTS<LOST_OERR> --> Multiple errors in context of PTE read error (Section 15.8.1.18.5)
otherwise --> Unacknowledged D-stream NDAL read (modify operand or re...) (Section 15.8.1.16)
S_NEOCMD<CMD>=WRITE or WDISOWN --> Inconsistent status (should cause hard error interrupt) (Section 15.8.1.22)
otherwise --> Inconsistent status (invalid command in NEOCMD<CMD>) (Section 15.8.1.22)
S_NESTS<PERR> (select one)
S_NESTS<INCON_PERR> --> NDAL inconsistent parity error (Section 15.8.1.19)
otherwise --> NDAL parity error (Section 15.8.1.19)
S_NESTS<LOST_PERR> --> Lost NDAL parity error or inconsistent parity error (Section 15.8.1.20)
(status consistent with soft error interrupt in system environment error registers) --> Soft error interrupt from system environment (Section 15.8.1.21)
none of the above --> Inconsistent status (possible machine check or hard error interrupt during soft error interrupt processing) (Section 15.8.1.22)

Notation:

(select one) - Exactly one case must be true. If zero or more than one is true, the status is inconsistent.
(select all) - More than one case may be true.
(select all, at least one) - All the cases are possible causes of a soft error interrupt. More than one may be true. At least one must be true or the status is inconsistent.
A case is not considered true if it evaluates to "Not a soft error interrupt cause".
otherwise - Fall-through case for (select one) if no other case is true.
none of the above - Fall-through case for (select all) or (select all, at least one) if no other case is true.

15.8.1.1 VIC Parity Errors

Description: A parity error was detected in the VIC tag or data store in the Ibox.

VIC Data Parity Errors: A parity error occurred in the data portion of the VIC.

VIC Tag Parity Errors: A parity error occurred in the tag portion of the VIC.

In all cases, the quadword virtual address of the error is in S_VMAR.

Recovery procedures: To recover, disable and flush the VIC by re-writing all the tags (using the procedure in Section 15.3.3.1.1.1). Also, clear ICSR<LOCK>.

15.8.1.2 Pcache Parity Errors

Description: A parity error was detected in the Pcache. Either a tag parity error or a data parity error is reported, though tag parity errors in both the left and right banks may be reported simultaneously. The reference, whether it was a read or write, was passed to the Cbox as if the Pcache had missed. No data is lost. The Pcache is disabled because PCSTS<LOCK> is set.

S_PCADR contains the physical address of the operation incurring the error. The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22).

Recovery procedures: Clear PCSTS<LOCK>. Flush the Pcache and initialize the Pcache tag store (see Section 15.3.3.1.1.1.2).

15.8.1.3 Bcache Tag Store Uncorrectable Errors

Description: An uncorrectable ECC error or an addressing error resulted from reading the Bcache tag store. The Bcache is in ETM. The hexaword physical address of the transaction incurring the error is in S_BCETIDX. (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.8.1.22.) S_BCETAG contains the actual tag data and check bits read during the failing access.
Software may use the routine TAG_ECC_CHECK in Section 15.10 to check the tag data and determine the syndrome. The result of this check should agree with the status reported in S_BCETSTS<UNCORR,BAD_ADDR>. It should never be the case that both S_BCETSTS<BAD_ADDR> and S_BCETSTS<UNCORR> are set. If they are, it is an inconsistent status (see Section 15.8.1.22).

For any normal Mbox command (i.e., not BCFLUSH), this error leads to a fill of the block whose tag had the error. This is because the Cbox converts uncorrectable tag store errors into misses and sends the associated reference to memory. For reads, the reference sent out is a read or an ownership read, and when the data returns it is loaded in the Bcache. For writes, an ownership read is sent, and when the data returns the write is merged with it and it is loaded in the Bcache. When the fill finishes successfully, the tag is updated (overwriting the bad tag). If the fill times out, the tag is not overwritten.

In some cases, this error leads to an NVAX CPU read timeout and/or a write timeout in memory. This occurs when the block was VALID-OWNED in the Bcache and is the same block that is being accessed by the failing operation. Errors resulting from these lost blocks are handled separately.

Write-unlocks are a special case. No tag lookup is done for write-unlocks unless the Bcache is in ETM. If the Bcache is in ETM, and the tag store error occurs for that transaction, the write-unlock is sent to memory.

Recovery procedure (all cases): Clear BCETSTS<LOCK>. If it is an addressing error, clear BCETSTS<BAD_ADDR>. Otherwise, clear BCETSTS<UNCORR>.

15.8.1.3.1 Case: BCETSTS<TS_CMD>=WUNLOCK

Recovery procedure: Write an INVALID tag with good ECC to the tag with the error (using the BCTAG access path). Then flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).
Software should prepare for another tag error during the Bcache flush by clearing BCETSTS of unrecoverable errors.

Restart conditions: The Bcache was in ETM at the time the write-unlock arrived. The data in memory may be corrupt and memory's ownership bit was cleared. Memory is corrupted at the location indicated by S_BCETIDX. Software must determine if the error is fatal to one process or the whole system and take appropriate action.

15.8.1.3.2 Case: BCETSTS<TS_CMD>=DREAD,IREAD,OREAD

Recovery procedure: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Software should prepare for another tag error during the Bcache flush by clearing BCETSTS of unrecoverable errors. After flushing the Bcache, it is necessary to determine if any block is "lost". If a block's memory ownership bit is set and no writeback cache in the system has it owned, then the block is said to be lost. Use the procedure in Section 15.3.3.1.2.5. This procedure can result in finding no lost blocks, one lost block, or multiple lost blocks.

Restart conditions: If there is one lost block, it is not recoverable. Software must determine if the lost data was fatal to one process or the whole system and take appropriate action. If multiple blocks are lost (this isn't expected), crash the system.

15.8.1.3.3 Case: BCETSTS<TS_CMD>=R_INVAL,O_INVAL,IPR_DEALLOCATE

Recovery procedure: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Software should prepare for another tag error during the Bcache flush by clearing BCETSTS of unrecoverable errors. After flushing the Bcache, it is necessary to determine if any block is "lost". If a block's memory ownership bit is set and no writeback cache in the system has it owned, then the block is said to be lost. Use the procedure in Section 15.3.3.1.2.5. This procedure can result in finding no lost blocks, one lost block, or multiple lost blocks.
If exactly one block is lost and memory's owner ID information indicates this CPU, write a VALID-OWNED tag with the address of the lost block into the tag which had the error (using the BCTAG access means). Then flush this location to memory. An error could occur with this flush, in which case the data is not recoverable.

NOTE
If memory does not store an owner ID with each block in a particular system, then this recovery method is not recommended. Instead, the data should be considered lost.

Restart conditions: If there is one lost block, and the repair procedure didn't incur an error, restart. If the repair procedure was not successful, the data is not recoverable. Software must determine if the lost data was fatal to one process or the whole system and take appropriate action. If multiple blocks are lost (this shouldn't result from one tag store error), crash the system.

15.8.1.4 Lost Bcache Tag Store Errors

Some number of unrecoverable Bcache tag store errors occurred and were not latched because BCETSTS already contained a report of an unrecoverable error. All unrecoverable tag store errors cause a soft error interrupt, so this is definitely a cause of the soft error interrupt. Lost Bcache tag store errors may be caused by more than one operand prefetch to the same cache block.

The Bcache is in ETM. Unrecoverable tag store errors can cause lost data by overwriting blocks in the Bcache. Unrecoverable tag store errors in ETM on write-unlocks can cause corrupted memory data.

Recovery procedure: Clear BCETSTS<LOST_ERR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Software should prepare for another tag error during the Bcache flush by clearing BCETSTS of unrecoverable errors.

Restart conditions: Lost write-unlock errors may have corrupted memory. Crash the system.

15.8.1.5 Bcache Tag Store Correctable ECC Errors

Description: A correctable error occurred in accessing the Bcache tag store. The Bcache is not in ETM.
S_BCETIDX contains the physical address of the error. (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.8.1.22.) (The index portion of S_BCETIDX indicates which tag store entry had the error.) S_BCETAG contains the actual tag data and check bits read during the failing access. Software may use the routine TAG_ECC_CHECK in Section 15.10 to check the tag data and determine the syndrome. The result of this check should be a correctable single-bit error.

Recovery procedures: Clear BCETSTS<CORR>. If the operation was anything but a tag lookup for an explicit IPR deallocate operation (i.e., BCFLUSH), software should flush that one location by writing the BCFLUSH IPR.

TBS (MTPR to (BCFLUSH + (S_BCETIDX & INDEX_MASK)))

This effectively scrubs the Bcache tag store location by invalidating it and forcing it to be written back if it is owned. This may be done without putting the Bcache in software ETM.

15.8.1.6 Lost Bcache Tag Store Correctable ECC Errors

Description: A correctable error occurred in accessing the Bcache tag store, but it is lost because of an uncorrectable tag store error which also occurred.

Recovery procedures: Clear BCETSTS<CORR>. The Bcache should be flushed (and it would be because of the uncorrectable error in any case). This effectively scrubs the Bcache tag store location by invalidating it.

15.8.1.7 Bcache Data RAM Correctable ECC Errors

Description: A correctable error occurred in accessing the Bcache data RAM. The Bcache is not in ETM. S_BCEDIDX contains the cache index of the error, and S_BCEDECC contains the syndrome calculated by the ECC logic. It is not possible to reliably determine the physical address of the error, since the Bcache is not in ETM and therefore the block can be overwritten at any time after the error.

Recovery procedures: Clear BCEDSTS<CORR>.
If the operation was a read (S_BCEDSTS<DR_CMD>=DREAD or IREAD), software should flush that one location using the BCFLUSH IPR.

TBS (MTPR to (BCFLUSH_BASE + (BCEDIDX & INDEX_MASK)))

This effectively scrubs the Bcache data RAM location by invalidating it and forcing it to be written back if it is owned. This may be done without putting the Bcache in software ETM.

15.8.1.8 Lost Bcache Data RAM Correctable ECC Errors

Description: A correctable error occurred in accessing the Bcache data RAM, but it is lost because of an uncorrectable data RAM error which also occurred. The address and syndrome of the error are not known.

Recovery procedures: Clear BCEDSTS<CORR>. The Bcache should be flushed (and it would be because of the uncorrectable error in any case). This effectively scrubs the Bcache data RAM location by invalidating it and forcing it to be written back if it is owned.

15.8.1.9 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on I-Stream or D-Stream Reads

Description (addressing error): A Bcache addressing error was detected by the Cbox in an I-stream or D-stream read during a Bcache hit. Addressing errors are the result of a mismatch between the address the Cbox drives to the RAMs for a read access and the address used to write that location. A multiple bit data error can appear to be an addressing error, though it is extremely unlikely.

Description (uncorrectable ECC error): A Bcache uncorrectable ECC error was detected by the Cbox in an I-stream or D-stream read during a Bcache hit. Uncorrectable data errors are the result of a multiple bit error in the data read from the Bcache. An addressing error with a single bit data error will appear as an uncorrectable data error.

Description (both cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of the error, and S_BCEDECC contains the syndrome calculated by the ECC logic.
The physical address of the reference can be found by reading the tag for the data block (using the procedure in Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.8.1.22.) If the block's tag is found to contain an ECC error, then the address cannot be determined.

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> are set. If they are, it is an inconsistent status (see Section 15.8.1.22).

Recovery procedures: To recover, clear BCEDSTS<LOCK>. Also, if it is an addressing error, clear BCEDSTS<BAD_ADDR>. Otherwise, clear BCEDSTS<UNCORR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). If the data is owned by the Bcache and if the error repeats itself (is not transient), then a writeback error will result from the flush procedure. Software should prepare for this by clearing NESTS and BCEDSTS errors.

Restart Conditions: If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. See the next section for a description of handling an error in a writeback. Software must determine if the error is fatal to one process or the whole system and take appropriate action. If the address of the error in the flush is not the same as that of the original error, this is a multiple error case in the data RAMs and is a serious failure. Crash the system.

15.8.1.10 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on Writebacks

Description (addressing error): A Bcache addressing error was detected by the Cbox in a writeback. Addressing errors are the result of a mismatch between the address the Cbox drives to the RAMs for a read access and the address used to write that location. A multiple bit data error can appear to be an addressing error, though it is extremely unlikely. The NDAL WDATA cycle was converted to a BADWDATA cycle.
Memory should have tagged the location as bad and unreadable by an implementation specific mechanism.

Description (uncorrectable ECC error): A Bcache uncorrectable ECC error was detected by the Cbox in a writeback. Uncorrectable data errors are the result of a multiple bit error in the data read from the Bcache. An addressing error with a single bit data error will appear as an uncorrectable data error. The NDAL WDATA cycle was converted to a BADWDATA cycle. Memory should have tagged the location as bad and unreadable by an implementation specific mechanism.

Description (both cases): The Bcache is in ETM. S_NESTS<BADWDATA> should be set. If it isn't, and S_NESTS<LOST_OERR> and S_NESTS<NOACK> aren't set, then the writeback which incurred the error is still in the writeback queue in the BIU. Software should force the writeback queue to be drained (causing the second error event to occur) by reading from the CWB register. After this, NESTS, NEOADR, and NEOCMD should be captured again.

If S_NESTS<BADWDATA> is set, then S_NEOADR contains the physical address of the lost writeback data. (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.8.1.22.) If S_NESTS<BADWDATA> isn't set but S_NESTS<LOST_OERR> is, then the address of the lost writeback data is not available. If, after draining the writeback queue, S_NESTS<BADWDATA> isn't set, then an inconsistency exists (see Section 15.8.1.22).

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> are set. If they are, it is an inconsistent status (see Section 15.8.1.22).

Recovery procedures: To recover, clear BCEDSTS<LOCK> and NESTS<BADWDATA>, if it is set. If it is an addressing error, clear BCEDSTS<BAD_ADDR>, otherwise clear BCEDSTS<UNCORR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).
Then use the system specific memory repair procedure to undo the tagged-bad data in memory (see Section 15.3.3.1.2.2.3).

NOTE
When clearing the tagged-bad data state of memory, software must first ensure that no more accesses to the block can occur. Otherwise there is the danger that some process on some other processor or a DMA IO device will see incorrect data and not detect an error.

Restart Conditions: The data is lost. Software must determine if the error is fatal to one process or the whole system and take appropriate action. If the address of the lost data could not be determined, crash the system.

15.8.1.11 Lost Bcache Data RAM Errors With Possible Lost Writebacks

Description: Lost Bcache data RAM errors which cause only a soft error interrupt (when S_NESTS indicates the possibility of a lost writeback error) indicate that data errors occurred on reads or writebacks, but no new write data was lost. S_NESTS reports the writeback error, unless multiple NDAL output errors have occurred. The Bcache is in ETM.

Lost Bcache data RAM errors of this kind can be caused by an operand prefetch from a Bcache block followed by a write to the same block.

If S_NESTS<BADWDATA> is set, then S_NEOADR contains the physical address of a writeback. (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.8.1.22.)

Recovery procedures: To recover, clear BCEDSTS<LOST_OERR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Writeback errors may occur during the flush. Software should prepare for this by clearing NESTS and BCEDSTS errors. If S_NESTS<BADWDATA> is set, clear NESTS<BADWDATA>. Use the system specific memory repair procedure to undo the tagged-bad data in memory (see Section 15.3.3.1.2.2.3) (the Bcache must be flushed before this repair procedure).
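The status checks that Section 15.8.1.10 uses to locate a lost writeback can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not part of the specification: the function name, the result strings, and the `drained` flag are hypothetical, and the inputs are assumed to be the saved S_NESTS bits already decoded to booleans. The combination of NOACK set with neither BADWDATA nor LOST_OERR set is not covered by the text, so the sketch reports it as such rather than guessing.

```python
def locate_lost_writeback(badwdata, lost_oerr, noack, drained):
    """Decide where the address of a lost writeback can be found.

    badwdata, lost_oerr, noack: saved S_NESTS bits as booleans.
    drained: True once the writeback queue has been drained (by reading
             the CWB register) and NESTS, NEOADR, and NEOCMD have been
             captured again. This flag is a hypothetical helper input.
    """
    if badwdata:
        # S_NEOADR holds the physical address of the lost writeback data.
        return "address in S_NEOADR"
    if lost_oerr:
        # Multiple NDAL output errors: the address was not latched.
        return "address not available"
    if drained:
        # BADWDATA still clear after draining the queue.
        return "inconsistent status"
    if not noack:
        # The failing writeback is still in the writeback queue in the BIU.
        return "drain writeback queue and recapture"
    # ASSUMPTION: the text does not describe this combination.
    return "not covered by Section 15.8.1.10"
```

For example, `locate_lost_writeback(False, False, False, False)` asks the caller to drain the writeback queue and capture the registers again before deciding.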
NOTE
When clearing the tagged-bad data state of memory, software must first ensure that no more accesses to the block can occur. Otherwise there is the danger that some process on some other processor or a DMA IO device will see incorrect data and not detect an error.

Restart condition (S_NESTS<LOST_OERR> set): There is no way to determine how many writebacks failed. They all should have gone to memory with BADWDATA cycles, where memory would have them marked as tagged-bad data. So an unknown block may be tagged-bad in memory. If so, the next access to that block could come from the system itself, even if it belonged only to one process. This will cause the system to crash. But there is a chance that the next access will come from a user process. This would allow the system to stay up, though that process would have to be deleted.

If the system's implementation of tagged-bad data is not reliable (see Section 15.11.1, Note On Tagged-Bad Data Mechanisms), software should crash the system. If it is reliable, restart.

Restart condition (S_NESTS<LOST_OERR> not set): The writeback data is lost but the address is known. Software must determine if the error is fatal to one process or the whole system and take appropriate action.

15.8.1.12 Lost Bcache Data RAM Errors Without Lost Writebacks

Description: Lost Bcache data RAM errors which cause only a soft error interrupt (when S_NESTS indicates no possibility of writeback error) indicate that data errors occurred on reads. No write data was lost. Lost Bcache data RAM errors may be caused by more than one operand prefetch to the same cache block. The Bcache is in ETM.

Recovery procedures: To recover, clear BCEDSTS<LOST_OERR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Writeback errors may occur during the flush. Software should prepare for this by clearing NESTS and BCEDSTS errors.

Restart condition: Only reads from the Bcache failed.
Restart is possible unless any error encountered during Bcache flush is fatal.

15.8.1.13 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Timeout Errors

Description: An I-stream or D-stream read or D-stream ownership read timed out before all the fill quadwords were received. This is not an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". The error could be caused by an error in the system environment or an NDAL parity error on the returned data. It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space. If the address is in memory space, S_CEFSTS<COUNT> indicates the number of quadwords received.) The physical address is in S_CEFADR.

I-stream or D-stream read: The Bcache is not in ETM.

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the returning fills. The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22). If the ownership read was for an Mbox write, the write was sent on the NDAL after the OREAD timed out. If the ownership read was for a read-lock, the corresponding write-unlock should have been received from the Ebox. The write-unlock is sent as a quadword WDISOWN by the Cbox, so no memory location is left owned. (If the error was on the requested quadword, a machine check would definitely have resulted. If a separate error prevents the write-unlock, that will be reported in other error registers.)

Recovery procedures (all cases): Clear CEFSTS<LOCK, TIMEOUT>.
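The two consistency rules stated in the description above (the expected value of S_CEFSTS<COUNT> for an IO space address, and the ban on IO space addresses for ownership reads) can be sketched as follows. This is a minimal Python illustration, not part of the specification. The function names and the `ownership_read` flag are hypothetical, COUNT is assumed to be a 2-bit field, and the IO space boundary used here (physical address bits <31:29> all ones) is an assumption that is not stated in this section.

```python
# ASSUMPTION: IO space is the top eighth of a 32-bit physical address
# space (PA<31:29> = 111). This boundary is not given in this section.
IO_SPACE_BASE = 0xE0000000


def in_io_space(physical_address):
    """Return True if the address falls in the assumed IO space range."""
    return (physical_address & 0xFFFFFFFF) >= IO_SPACE_BASE


def fill_timeout_inconsistencies(s_cefadr, count, ownership_read):
    """List the inconsistent-status conditions for a fill timeout.

    s_cefadr: saved physical address from S_CEFADR.
    count: saved S_CEFSTS<COUNT> value (assumed 2 bits, 0..3).
    ownership_read: True if the failing read was a D-stream ownership
                    read (hypothetical, already-decoded flag).
    """
    if not 0 <= count <= 3:
        raise ValueError("COUNT is assumed to be a 2-bit field")
    problems = []
    io = in_io_space(s_cefadr)
    if io and count != 0b11:
        problems.append("COUNT should be 11 (binary) for an IO space address")
    if io and ownership_read:
        problems.append("ownership read address should not be in IO space")
    return problems
```

An empty list means neither rule was violated; any entry would send the handler to the inconsistent-status handling of Section 15.8.1.22.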
Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Depending on the system environment, memory may have set its ownership bit for this block. If so, the write data must have been lost, and a hard error interrupt is expected. Use the system dependent procedure for resetting the ownership bit in memory. If memory would not have set its ownership bit for this block, memory's state may be correct and up to date.

Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> not set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Depending on the system environment, memory may have set its ownership bit for this block. The data in memory is presumably still good. The Bcache block is marked invalid in the Bcache tag store. However, if the error occurred on a read-lock, the corresponding write-unlock should have occurred and it will have cleared the ownership bit for this block.

If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. In general, it is not possible to determine which quadwords are valid. However, if S_CEFSTS<REQ_FILL_DONE> is set, then the quadword in the Bcache block pointed to by S_CEFADR is valid (except in the case of a read-lock, but the data shouldn't be needed for memory repair in that case).

If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space, then the block was not owned before the operation began. In this case, use the system dependent procedures (see Section 15.3.3.1.2.2.1) to determine if memory's ownership bit is set and this CPU owns the block. If so, use the system specific procedure (see Section 15.3.3.1.2.2.2) to reset it.
In some systems (the XMI2 for example) this may require a quadword of correct data be written to memory to reset the ownership bit. Section 15.3.3.1.2.3 describes procedures for extracting data from the Bcache data RAMs in this case. If memory's ownership bit was left set as a result of this error and no non-destructive procedure exists for restoring it, then the hexaword block is lost.

Restart condition: Restart if the memory state repair procedure is successful or no repair is called for, no data is lost, and the address is not in IO space. If the hexaword block could not be repaired or data is lost, software must determine if the error is fatal to one process or the whole system and take appropriate action.

Post Restart Recovery: If the same fill error recurs on restart, then the block is probably "lost" (see footnote 1). Software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then restart once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the restart actually occurs, fortuitously repairing the cause of the fill error.

15.8.1.14 NDAL I-Stream or D-Stream Read or D-Stream Ownership Read Data Errors

Description: An I-stream or D-stream read or D-stream ownership read terminated with an RDE (read data error) NDAL cycle before all the fill quadwords were received. If S_CEFSTS<COUNT> is 0 or the address is an IO space address, this is an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". Otherwise, the error could be caused by an error in the system environment.
It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) In any case, the physical address is in S_CEFADR.

I-stream or D-stream read: The Bcache is not in ETM.

D-stream ownership read: The Bcache is in ETM. No write data has been merged with the returning fills.

Footnote 1: In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22). If the ownership read was for an Mbox write, the write was sent on the NDAL after the OREAD was aborted. If the ownership read was for a read-lock, the corresponding write-unlock should have been received from the Ebox. The write-unlock is sent as a quadword WDISOWN by the Cbox, so no memory location is left owned. (If the error was on the requested quadword, a machine check would definitely have resulted. If a separate error prevents the write-unlock, that will be reported in other error registers.)

Recovery procedures (all cases): Clear CEFSTS<LOCK, RDE>.

Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Depending on the system environment, memory may have set its ownership bit for this block.
If so, the write data must have been lost, and a hard error interrupt is expected. Use the system dependent procedure for resetting the ownership bit in memory. If memory would not have set its ownership bit for this block, memory's state may be correct and up to date.

Additional Recovery procedures for D-stream ownership read (S_CEFSTS<WRITE> not set): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Depending on the system environment, memory may have set its ownership bit for this block. The data in memory could still be good. The Bcache block is marked invalid in the Bcache tag store. However, if the error occurred on a read-lock, the corresponding write-unlock should have occurred and it will have cleared the ownership bit for this block.

If S_CEFSTS<COUNT> is greater than 0, then part of the data also is in the Bcache. In general, it is not possible to determine which quadwords are valid. However, if S_CEFSTS<REQ_FILL_DONE> is set, then the quadword in the Bcache block pointed to by S_CEFADR is valid (except in the case of a read-lock, but the data shouldn't be needed for memory repair in that case).

If S_CEFSTS<COUNT> is greater than 0, and the address in S_CEFADR is not in IO space, then the block was not owned before the operation began. In this case, use the procedures in Section 15.3.3.1.2.2 to determine if memory's ownership bit is set. If so, use the system specific procedure (see Section 15.3.3.1.2.2.2) to reset it. In some systems (the XMI2 for example) this may require a quadword of correct data be written to memory to reset the ownership bit. Section 15.3.3.1.2.3 describes procedures for extracting data from the Bcache data RAMs in this case. If memory's ownership bit was left set as a result of this error and no non-destructive procedure exists for restoring it, then the hexaword block is lost.
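The ownership-bit check described above can be sketched as a pair of small helpers. This is an illustrative Python sketch under stated assumptions, not part of the specification: the function names and result strings are hypothetical, the inputs are assumed to be already decoded from the saved registers, and `memory_ownership_bit_set` stands in for the outcome of the system dependent procedure in Section 15.3.3.1.2.2.

```python
def ownership_repair_step(count, address_in_io_space, memory_ownership_bit_set):
    """Pick the memory-state repair step after a fill error.

    count: saved S_CEFSTS<COUNT> (quadwords received before the error).
    address_in_io_space: True if the S_CEFADR address is in IO space.
    memory_ownership_bit_set: result of the system dependent check of
        memory's ownership bit (hypothetical, supplied by the caller).
    """
    if count > 0 and not address_in_io_space:
        # The block was not owned before the operation began.
        if memory_ownership_bit_set:
            return "reset memory's ownership bit"
        return "no repair called for"
    # The text gives this rule only for COUNT > 0 and a non-IO address.
    return "not covered by this rule"


def requested_quadword_valid(req_fill_done, read_lock):
    """True if the quadword pointed to by S_CEFADR is known to be valid."""
    return req_fill_done and not read_lock
```

Keeping the ownership query as an input reflects that the actual check is system specific; the sketch only captures the order of the decisions.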
Restart condition: Restart if the memory state repair procedure is successful or no repair is called for, no data is lost, and the address is not in IO space. If the hexaword block could not be repaired or data is lost, software must determine if the error is fatal to one process or the whole system and take appropriate action.

Post Restart Recovery: If the same fill error recurs on restart, then the block is probably "lost" (see footnote 1). Software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then restart once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the restart actually occurs, fortuitously repairing the cause of the fill error.

15.8.1.15 Lost Bcache Fill Error

Description: Some number of fill errors occurred and were not latched because CEFSTS and CEFADR already contained a report of an unrecoverable error. Lost Bcache fill errors which do not cause hard error interrupts are always read errors. Lost Bcache fill errors may be caused by more than one operand prefetch to the same cache block. Lost Bcache fill errors may leave blocks marked owned by this CPU in memory without the Bcache actually owning the block. The Bcache may be in ETM. Read S_CCTL<HW_ETM> to find out.

Recovery procedures: Clear CEFSTS<LOST_ERR>. If the Bcache is in ETM, flush the Bcache and clear CCTL<HW_ETM> (in that order).

Restart condition: Lost Bcache fill errors may leave blocks marked owned by this CPU in memory without the Bcache actually owning the block.
In systems where the ownership bits are very reliably maintained (see Section 15.11.2, Note On Ownership Mechanism), restart. In systems where the ownership bits are not very reliably maintained, crash the system.

Footnote 1: In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

15.8.1.16 Unacknowledged NDAL I-Stream or D-Stream Read or D-Stream Ownership Read

Description: An I-stream or D-stream read or D-stream ownership read was no-ACKed by the system environment. This could be because the external component(s) received bad NDAL parity or it could be due to a system-specific notification of "non-existent memory or IO location". The physical address is in S_CEFADR.

I-stream or D-stream read: The Bcache is not in ETM.

D-stream ownership read: The Bcache is in ETM. The address should not be in IO space. If it is, it is an inconsistent status (see Section 15.8.1.22). If the ownership read was for an Mbox write, the write was sent on the NDAL after the OREAD timed out. If the write was also no-ACKed, a hard error interrupt would have been posted. That is handled as a separate error.

Recovery procedures (all cases): Clear NESTS<NOACK>.

Additional Recovery procedure for D-stream ownership read: Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). No error is expected during the Bcache flush.

15.8.1.17 Lost NDAL Output Error

Description: Some number of NDAL output errors occurred. Some number of read no-ACKs and/or BADWDATAs were missed. A hard error interrupt would have occurred if a write or writeback was no-ACKed.
Lost NDAL output errors may be caused by more than one operand prefetch to the same cache block. The Bcache may be in ETM. Read S_CCTL<HW_ETM> to find out.

Recovery procedure: Clear NESTS<LOST_OERR>. If CCTL<HW_ETM> is set, flush the Bcache and clear CCTL<HW_ETM> (in that order).

Restart conditions: Lost NDAL output errors may leave tagged-bad locations in memory. In systems where the method of implementing tagged-bad data is reliable (see Section 15.11.1, Note On Tagged-Bad Data Mechanisms), restart. If a tagged-bad block is not reliable in the particular system, crash the system.

15.8.1.18 PTE read errors

The following sections describe error handling for PTE read errors. PTE read errors are read errors which happen in reads issued by the Mbox in handling a TB miss. Handling of these errors is different from handling the same underlying error (Bcache data RAM error, Bcache fill error, or NDAL no-ACK error) when a PTE read isn't the cause.

If S_PCSTS<PTE_ER> is set, then a PTE read issued by the Mbox in processing a TB miss had an unrecoverable error. The TB miss sequence was aborted because of the error. The original reference can be any I-stream or D-stream read or write.

PTE read errors are difficult to analyze, partly because the read error report in the Cbox does not directly indicate that the failing read was a PTE read. Because of this and because PTE read errors should be rare (a very small percentage of the reads issued by the Mbox are PTE reads), multiple errors which interfere with the analysis of the PTE error are not considered recoverable.

If the reference which incurs the PTE read error is a write, S_PCSTS<PTE_ER_WR> will be set. In this case the original write is lost. No retry is possible partly because the instruction which took the machine check may be subsequent to the one which issued the failing write.
Also, PTE read errors on write transactions can cause a machine check at a practically arbitrary time in a microcode flow, and core machine state may not be consistent.

15.8.1.18.1 Bcache Data RAM Uncorrectable ECC Errors and Addressing Errors on PTE Reads

Description (addressing errors): A Bcache addressing error was detected by the Cbox in a PTE read during a Bcache hit. Addressing errors are the result of a mismatch between the address the Cbox drives to the RAMs for a read access and the address used to write that location. A multiple bit data error can appear to be an addressing error, though it is extremely unlikely.

Description (uncorrectable ECC errors): A Bcache uncorrectable data error was detected by the Cbox in a PTE read during a Bcache hit. Uncorrectable data errors are the result of a multiple bit error in the data read from the Bcache. An addressing error with a single bit data error will appear as an uncorrectable data error.

Description (all cases): The Bcache is in ETM. S_BCEDIDX contains the cache index of the error, and S_BCEDECC contains the syndrome calculated by the ECC logic. The physical address of the PTE read can be found by reading the tag for the data block (using the procedure in Section 15.3.3.1.2.4). (If the physical address is found to be in IO space, it is an inconsistent status. See Section 15.8.1.22.) If the block's tag is found to contain an ECC error, then the address cannot be determined.

S_BCEDSTS<LOST_ERR> may be set. This lost error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

It should never be the case that both S_BCEDSTS<BAD_ADDR> and S_BCEDSTS<UNCORR> are set. If they are, it is an inconsistent status (Section 15.5.2.7).
Recovery procedures (addressing errors): To recover, clear BCEDSTS<LOCK, BAD_ADDR>.

Recovery procedures (uncorrectable ECC errors): To recover, clear BCEDSTS<LOCK, UNCORR>.

Recovery procedures (both cases): Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache). Clear PCSTS<PTE_ER>. If the data is owned by the Bcache and if the error repeats itself (is not transient), then a writeback error will result from the flush procedure. Software should prepare for this by clearing NESTS and BCEDSTS errors.

Restart condition: If no writeback error occurs in the Bcache flush, restart if: (S_PCSTS<PTE_ER_WR> = 0). Otherwise, crash the system. If a writeback error occurs in the Bcache flush, then the data is presumed to be unrecoverable. See Section 15.8.1.10 for a description of handling an error in a writeback (software must determine if the error is fatal to one process or the whole system and take appropriate action).

15.8.1.18.2 NDAL PTE Read Timeout Errors

Description: A PTE read timed out before any fill quadword was received. This is not an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". The error could be caused by an error in the system environment or an NDAL parity error on the returned data. It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical address is in S_CEFADR.

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

The physical address of the PTE is in S_CEFADR. The Bcache is not in ETM.
The read could not have been an ownership read, so this error cannot have caused the ownership bits in memory to be left in the wrong state.

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

Recovery procedures: Clear CEFSTS<LOCK, TIMEOUT>. Clear PCSTS<PTE_ER>.

Restart condition: Restart if:
Otherwise, crash the system.

Post Restart Recovery: If the same fill error recurs on restart, then the block is probably "lost" (see footnote 1). Software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then restart once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the restart actually occurs, fortuitously repairing the cause of the fill error.

Footnote 1: In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location. In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

15.8.1.18.3 NDAL PTE Read Data Errors

Description: A PTE read ended with an RDE (read data error) NDAL cycle before any of the fill quadwords were received. This is an accepted means for a system environment to notify the NVAX CPU of "non-existent memory or IO location". Otherwise, the error could be caused by an error in the system environment.
It also could be caused by some previous error in the system environment or this CPU which leaves a cache block marked as owned in memory and not marked as owned in any cache in the system.

S_CEFSTS<COUNT> indicates the number of quadwords received before the error. (S_CEFSTS<COUNT> should always be 11 (binary) if the address is in IO space.) The physical address is in S_CEFADR.

CEFSTS<WRITE> should not be set. If it is, it is an inconsistent status (see Section 15.5.2.7).

The physical address of the PTE is in S_CEFADR. The Bcache is not in ETM. The read could not have been an ownership read, so this error cannot have caused the ownership bits in memory to be left in the wrong state.

S_CEFSTS<LOST_ERR> may be set. This error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

Recovery procedures: Clear CEFSTS<LOCK, RDE>. Clear PCSTS<PTE_ER>.

Restart condition: Restart if:
Otherwise, crash the system.

Post Restart Recovery: If the same fill error recurs on restart, then the block is probably "lost" (see footnote 1). Software must determine if the error is fatal to one process or the whole system and take appropriate action. (If it is fatal only to one process, use the system dependent procedure for resetting memory's ownership bit.)

NOTE
It may be appropriate in this case to first cause each CPU in the system to flush its Bcache, and then restart once more.

NOTE
It may be that another error (such as an uncorrectable tag store error on a coherence request) will be repaired by the soft error interrupt handler before the restart actually occurs, fortuitously repairing the cause of the fill error.

Footnote 1: In this case the more general sense of "lost" is implied. That is, memory's ownership bit is set but no cache writes the data back when a read is done to that location.
In some systems, it may be possible to identify which CPU memory "thinks" owns the data, but it is often not possible to determine which error caused this situation to arise.

15.8.1.18.4 Unacknowledged NDAL PTE Read

Description: A PTE read was no-ACKed by the system environment. This could be because the external component(s) received bad NDAL parity or it could be due to a system-specific notification of "non-existent memory or IO location". The physical address of the PTE is in S_NEOADR. The Bcache is not in ETM.

S_NESTS<LOST_OERR> may be set. This error is probably due to the same PTE error occurring more than once. This is an acceptable assumption unless a hard error interrupt occurs after handling this error.

Recovery procedures: Clear NESTS<NOACK>. Clear PCSTS<PTE_ER>.

Restart condition: Restart if:
Otherwise, crash the system.

15.8.1.18.5 Multiple Errors Which Interfere with Analysis of PTE Read Error

Because PTE read errors lead to several unusual cases, restart is not recommended in the event that other errors cloud the analysis of the PTE read error.

Pending Interrupts: A hard or soft error interrupt should be pending, or possibly both.

Recovery procedures: No specific recovery action is called for.

Restart condition: No restart is possible. Crash the system.

15.8.1.19 NDAL Parity Errors

Description: A cycle with a parity error was received by the NVAX CPU chip from the NDAL. If it is an inconsistent parity error, another node acknowledged the transaction despite the parity error seen by the NVAX chip. The Bcache is in ETM. The Bcache is coherent with memory because it only accesses VALID-OWNED locations in the Bcache data RAMs once in ETM. Some other node's request may time out because the Cbox missed a coherency request for writeback. The Pcache may now be incoherent since an NDAL write to a Bcache VALID-UNOWNED location may have been missed.
In some systems (e.g., OMEGA), a no-ACK on an NDAL command implies that the command had no effect. This makes NDAL parity errors very recoverable. In other systems (e.g., XMI2), a no-ACK on an NDAL command does not imply this (for invalidates forwarded from the XMI2 bus), and all parity errors imply possible lost invalidates and an incoherent Pcache.

Recovery procedure: Clear NESTS<PERR> and NESTS<INCON_PERR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Restart condition: If no-ACK in the specific system implies a command was not effective, and if the error was not an inconsistent parity error, restart. Otherwise, it isn't possible to determine whether the interrupted instruction stream may have seen the effect of out of order writes because of the Pcache missing an invalidate. Crash the system.

15.8.1.20 Lost Parity Errors

Description: Some number of cycles with parity errors were received by the NVAX CPU chip from the NDAL. Some may have been inconsistent parity errors. The Bcache is in ETM. The Bcache is coherent with memory because it only accesses VALID-OWNED locations in the Bcache data RAMs once in ETM. Some other node may time out because the Cbox missed a coherency request for writeback. The Pcache may now be incoherent since an NDAL write to a Bcache VALID-UNOWNED location may have been missed.

Recovery procedure: Clear NESTS<LOST_PERR>. Flush the Bcache. Clear CCTL<HW_ETM> (after flushing the Bcache).

Restart condition: It isn't possible to determine whether the interrupted instruction stream may have seen the effect of out of order writes because of the Pcache missing an invalidate. Crash the system.
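The recovery and restart rules for NDAL parity errors and lost parity errors (Sections 15.8.1.19 and 15.8.1.20) can be summarized in a short sketch. This is an illustration in Python, not handler code from this specification: the function name, parameter names, and returned strings are hypothetical, and `noack_means_no_effect` stands in for the system property described above (a no-ACK implying the command had no effect).

```python
def parity_error_plan(incon_perr, lost_perr, noack_means_no_effect):
    """Return (recovery_steps, action) for an NDAL parity error report.

    incon_perr            -- NESTS<INCON_PERR> was set (hypothetical flag name)
    lost_perr             -- NESTS<LOST_PERR> was set (hypothetical flag name)
    noack_means_no_effect -- system property: a no-ACKed command had no effect
    """
    if lost_perr:
        steps = ["clear NESTS<LOST_PERR>"]
    else:
        steps = ["clear NESTS<PERR>", "clear NESTS<INCON_PERR>"]
    # The flush must precede clearing HW_ETM, as the text requires.
    steps += ["flush Bcache", "clear CCTL<HW_ETM>"]

    # Restart only for a single, consistent parity error on a system where
    # a no-ACK guarantees the command was not effective.
    restart = (not lost_perr) and (not incon_perr) and noack_means_no_effect
    return steps, ("restart" if restart else "crash")


steps, action = parity_error_plan(incon_perr=False, lost_perr=False,
                                  noack_means_no_effect=True)
print(action)
```

The sketch treats a lost parity error as an unconditional crash, matching the restart condition in Section 15.8.1.20.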
15.8.1.21 System Environment Soft Error Interrupts

Description: Errors which occur in the system environment and do not result in loss of data, or which can notify the NVAX CPU by returning RDE, also notify the CPU of the error by asserting S_ERR_L (e.g., read errors). Errors which are corrected automatically by hardware and do not result in loss of data should use soft error interrupt notification.

NOTE
It is important that components in the system environment which assert S_ERR_L have a CPU accessible register which reports the S_ERR_L assertion.

Attention should be given to the robustness of tagged-bad data schemes. If error detection for these schemes is good enough, then error recovery may be able to ignore lost soft errors. Lost soft errors are very possible in NVAX systems because the first error doesn't normally prevent NVAX from continuing to issue new requests (due to macropipelining). Similarly, good error detection schemes on the ownership bits in memory may facilitate recovery from lost soft errors.

It is also recommended that an address be stored where applicable. Stored addresses allow software to improve the system's chance of surviving an error event without crashing by cleaning up tagged-bad locations and the like. For example, a write timeout clearing a page in the VMS page handler may be unrecoverable, while clearing that tagged-bad data location before it ever got to the page handler might be quite recoverable.

Recovery procedures: Clear the error status bits in the system registers and perform any necessary system dependent recovery procedure.

Restart condition: Typically, restart is possible, though in cases where data is lost software may have to kill one process or crash the system.

15.8.1.22 Inconsistent Status In Soft Error Interrupt Analysis

Description: A presumed impossible error report was found in the error registers. This could be due to a hardware failure or bug.

Recovery procedures: No specific recovery action is called for.
Restart condition: No restart is possible. The integrity of the entire system is questionable. Crash the system.

NOTE
This status can result if a machine check occurs. Software may employ some mechanism for determining that this occurred, but it must be sure that mechanism can't ever falsely indicate that an inconsistent status is acceptable. Inconsistent status is a serious problem and should not be ignored.

15.9 Kernel Stack Not Valid Exception

A Kernel Stack Not Valid Exception occurs when a memory management exception is detected while attempting to push information on the kernel stack during microcode processing of another exception. Note that a console halt with an error code of ERR_INTSTK is taken if a memory management exception is encountered while attempting to push information on the interrupt stack.

The Kernel Stack Not Valid exception is dispatched through SCB vector 08 (hex) with the stack frame shown in Figure 15-11.

Figure 15-11: Kernel Stack Not Valid Stack Frame

 31                                                            00
+----------------------------------------------------------------+
|                               PC                               | :(SP)
+----------------------------------------------------------------+
|                               PSL                              |
+----------------------------------------------------------------+

15.10 Error Recovery Coding Examples

To be supplied.

15.11 Miscellaneous Background Information

This section contains miscellaneous background information relevant to this error handling chapter.
15.11.1 Note On Tagged-Bad Data Mechanisms

Writebacks which are sent as BADWDATA are supposed to appear as tagged-bad data in memory, and further reads to that block should fail. In some systems, tagged-bad data is implemented by a mechanism as reliable as that used to store data. In at least one system (OMEGA), tagged-bad data is implemented by altering the ECC code of the data as it is written. Some single-bit and many double-bit errors in this data can make it appear to be correctable or correct when read. This is less protection from error than valid data has. In such a system, an error which results in a lost tagged-bad-data block is reason to crash the system. In systems with reliable storage of "tagged-bad-data", operation can continue after such an error because it is essentially certain that any process which accesses that data will see an RDE error for that block and will machine check before it uses the bad data.

The Bcache data RAMs in NVAX use the above relatively unreliable mechanism for tagged-bad data. Three ECC check bits are flipped in the stored value. This mechanism would often prevent a subsequent read from succeeding, but it is not sufficiently reliable to allow missing tagged-bad blocks in the Bcache to be tolerated. As a result, all errors which may have left a tagged-bad block in the Bcache without some error address register pointing it out are cause to crash the system.

15.11.2 Note On Ownership Mechanism

In the absence of additional errors, the memory/cache ownership mechanism ensures that no other process can access the block whose ownership bit is set in memory and is not owned by any cache. Cache coherence in the system depends on this mechanism. In some systems, memory error detection and correction for ownership bits is as reliable as for data. This is true of XMI2 based systems. However, in some systems the mechanism is less reliable.
One example is the OMEGA system, where the ownership bits are stored with a single-bit-error-detect-and-correct scheme which cannot detect most double bit errors and therefore interprets most double bit errors as correctable single bit errors. In such a system, error situations in which unknown blocks in memory may be owned should be taken as a system crash.

In OMEGA, there is a proposal to make up for the non-robust ownership bit error detection scheme by flushing the cache on every "correctable" ownership bit error in the NMC. If the "correctable" error really is an uncorrectable error, this may be detected by a WDISOWN to an unowned memory location. This is because some uncorrectable errors are seen as correctable errors, so one ownership bit is flipped by memory's error correction hardware and at least two bits were wrong to start with. There is a chance that the "correction" flips one of the bad bits, but it could also flip one of the remaining correct bits. This leaves the memory with one or three incorrect ownership bits after an uncorrectable "correctable" error.

If every cache is flushed immediately after a "correctable" error, then writebacks to apparently unowned locations may result if the error is inadvertently made worse by the correction scheme. These are detectable protocol errors and should lead to a system crash. If the effect of the error correction was to mark block(s) as owned when no cache owns them, then eventually some process will attempt to access that data and time out. If the error was successfully corrected, then flushing the caches causes a pause in processing and no bad effects. If these errors are infrequent, this seems an acceptable loss in performance in exchange for increased reliability.
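The "one or three incorrect ownership bits" outcome described above is a matter of counting, which the following Python sketch enumerates. It deliberately does not model the OMEGA encoding: the 4-bit field width and the function name are assumptions chosen only for illustration. It starts from every possible pair of wrong bits and applies a single-bit "correction" at every possible position.

```python
from itertools import combinations


def wrong_bits_after_flip(width=4):
    """Enumerate how many bits remain wrong after a one-bit 'correction'
    is applied to a field that started with exactly two wrong bits.

    width is a hypothetical field size, not the real OMEGA layout.
    """
    outcomes = set()
    for bad in combinations(range(width), 2):
        for flipped in range(width):
            # Flipping a bad bit repairs it; flipping a good bit breaks it.
            remaining = set(bad) ^ {flipped}
            outcomes.add(len(remaining))
    return outcomes


print(sorted(wrong_bits_after_flip()))
```

Whatever position the correction hardware picks, the count of wrong bits moves from two to either one or three; it never returns to zero, which is why the flush-and-check proposal is needed to expose the miscorrection.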
15.12 Revision History

Table 15-6: Revision History

Who              When         Description of change
Mike Uhler       06-Mar-1989  Release for external review.
Mike Uhler       19-Dec-1989  Update for second-pass release.
John Edmondson   12-Feb-1990  Update with error handling information.
John Edmondson   30-Jun-1990  Update further after internal review and resolution of many issues.
John Edmondson   31-May-1991  Minor updates for pass 2 changes.

Chapter 16
Chip Initialization

16.1 Overview

This chapter describes the hardware initialization process for the NVAX CPU chip. The hardware and microcode start the initialization, and then pass control to the console macrocode at address E0040000 for further initialization.

Much of the job of initialization involves setting the NVAX internal processor registers (IPRs) to a known state, or using NVAX IPRs to perform functions such as cache initialization. See Chapter 2 for a list of the NVAX IPRs. Also, see the individual box chapters for a more in-depth definition of many of the IPRs.

16.2 Hardware/Microcode initialization

The NVAX Chip hardware initializes to the following state on powerup or the assertion of chip reset:

1. The VIC, Pcache, and Bcache are disabled.
2. The RLOG is cleared.
3. The Fbox and vector unit are disabled.
4. The microstack is cleared.
5. The Mbox and Cbox are reset, and all previous operations are flushed.
6. The Fbox is reset.
7. The Ibox is stopped, waiting for a LOAD PC.
8. All instruction and operand queues are flushed.
9. All MD valid bits are cleared, and all Wn valid bits are set.
10. A powerup microtrap is initiated which starts the Ebox at the label IE~POWERUP..

The NVAX Chip microcode then does the following:

1. Hardware interrupt requests are cleared.
2. ICCS<6> is set to 0.
3. SISR<15:1> is set to 0.
4. ASTLVL is set to 4.
5. The Mbox PAMODE IPR is set to 30-bit physical address mode.
6. CPUID is set to 0.
7. The BPCR branch history algorithm is reset to the default value.
8. Backup PC is retrieved from the Ibox and saved in SAVPC.
9. PME is cleared.
10. The current PSL, halt code, and value of MAPEN are saved in SAVPSL.
11. MAPEN is cleared (memory management is disabled).
12. All state flags are cleared.
13. PSL is loaded with 041F0000.
14. PC is loaded with E0040000 (the address of the start of the console code).

16.3 Console initialization

The console macrocode has the job of filling the gap between the initialized state described above and the initial state needed for the operating system. To that end, the console code does the following:

1. Set CPUID to the correct value from the system environment.
2. Set ECR (Ebox Control Register) as follows:
   1. Set FBOX_ENABLE to enable the Fbox.
   2. Set S3_TIMEOUT_EXT as required by the system environment.
   3. Set FBOX_ST4_BYPASS_ENABLE to enable Fbox stage 4 bypass.
   4. Write one to S3_STALL_TIMEOUT to clear any error.
   5. Set ICCS_EXT as required by the system environment.
3. Set ICSR (Ibox Control Status Register) as follows:
   1. Clear ENABLE to leave the VIC disabled.
   2. Write one to LOCK to clear any error.
4. Set the PAMODE register MODE bit as required by the system.
5. Write one to clear the LOCK bit in TBSTS (Translation Buffer Status).
6. Initialize the PCSTS (Pcache Status) Register:
   1. Write one to clear the LOCK bit.
   2. Write one to clear PTE_ER_WR.
   3. Write one to clear PTE_ER.
7. Set CCTL (Cbox Control) as follows:
   1. Clear ENABLE to leave the Bcache disabled.
   2. Set TAG_SPEED, DATA_SPEED, and SIZE to reflect the Bcache RAM configuration in the system.
   3. Clear FORCE_HIT.
   4. Clear DISABLE_ERRORS.
   5. Clear SW_ECC.
   6. Clear TIMEOUT_TEST.
   7. Clear DISABLE_PACK to allow the write packing feature.
   8. Clear SW_ETM.
   9. Write one to clear HW_ETM.
8. Clear the various Cbox error registers:
   1. BCETSTS (Bcache Error Tag Status): Write one to LOCK, CORR, UNCORR, BAD_ADDR, and LOST_ERR to clear any errors.
   2. BCEDSTS (Bcache Error Data Status): Write one to LOCK, CORR, UNCORR, BAD_ADDR, and LOST_ERR to clear any errors.
   3. CEFSTS (Cbox Error Fill Status): Write one to RDLK, LOCK, TIMEOUT, RDE, and LOST_ERR to clear any errors.
   4. NESTS (NDAL Error Status): Write one to NOACK, BADWDATA, LOST_OERR, PERR, INCON_PERR, and LOST_PERR to clear any errors.

16.4 Cache initialization

Either the console code or the operating system will do the following final initialization steps (code examples are given):

1. Initialize the VIC

This code initializes the VIC by writing all 128 tags with good parity and all valid bits clear.

        movl    #^x00000020, r0         ; tag index increment = 1 hexaword block
        movl    #0, r1                  ; tag init value
        movl    #0, r2                  ; VIC tag starting address
        movl    #^x00000800, r3         ; VIC tag ending address + 1 block
        movl    #PR19$_VMAR, r4         ; VIC memory address register (VMAR)
        movl    #PR19$_VTAG, r5         ; VIC tag register (VTAG)
vic_loop:
        mtpr    r2, r4                  ; write current index to VMAR
        mtpr    r1, r5                  ; write the tag via VTAG
        addl2   r0, r2                  ; increment index by the block size
        cmpl    r3, r2                  ; check if done
        bneq    vic_loop

2. Enable the VIC

        mtpr    #<icsr$m_enable+icsr$m_lock>, #PR19$_ICSR

3. Initialize the Bcache tags

This code initializes the Bcache by writing all tags with good ECC and all valid and owned bits clear. This example initializes a 512Kb Bcache. This code can be changed to init the other legal Bcache sizes by changing the value in R3. SW_ECC in CCTL is clear, so the Cbox will generate correct ECC for the tag/valid/owned bits.
        movl    #^x00000020, r0         ; tag index increment = 1 hexaword block
        movl    #0, r1                  ; tag init value
        movl    #^x01000000, r2         ; Bcache tag starting address
        movl    #^x01080000, r3         ; Bcache tag ending address + 1 block for 512Kb Bcache
bcache_loop:
        mtpr    r1, r2                  ; write tag to current tag address
        addl2   r0, r2                  ; increment index by the block size
        cmpl    r3, r2                  ; check if done
        bneq    bcache_loop

4. Initialize the Bcache data

        .SBTTL  ZERO_BCACHE_DATA
;++
; ZERO_BCACHE_DATA - Write zero data and good ECC to the BCACHE data rams
;--
BYTES_PER_QUADWORD = 8
BYTES_PER_PAGE = 512
QUADWORDS_PER_PAGE = BYTES_PER_PAGE/BYTES_PER_QUADWORD

ZERO_BCACHE_DATA:
        PUSHR   #^M<R0,R1,R2,R3,R4,R5,R6>          ; Save registers
        MFPR    #PR$_CPUID,R5                      ; XMI node id
        MOVL    SYSL$L_BACKUP_CACHE_CONSTANT[R5],R1 ; Formative cache constant
        MTPR    R1,#PR13$_CCTL                     ; Set cache with default constant
        EXTZV   #PR13_CCTL$V_SIZE,#PR13_CCTL$S_SIZE,R1,R2 ; Extract backup cache size
        MOVL    SYSL$L_BCACHE_PAGE_CONSTANT[R2],R5 ; Cache page count
        CLRL    R6                                 ; "AOB" index
        CLRQ    R1                                 ; Quadword data to be written to BCACHE rams
10$:    MULL3   #BYTES_PER_PAGE,R6,R3              ; BCACHE page index to write
        BSBW    MAP_PHYSICAL_ADDRESS               ; Map R3 PA to R4 VA
        CLRL    R3                                 ; "AOB" index
20$:    JSB     @IO_WRITE_BCACHE_DATA              ; Write BCACHE data
        ADDL2   #BYTES_PER_QUADWORD,R4             ; Update VA
        AOBLSS  #QUADWORDS_PER_PAGE,R3,20$         ; Loop 'til done
        AOBLSS  R5,R6,10$                          ; Loop 'til done
        MFPR    #PR$_CPUID,R5                      ; XMI node id
        MOVL    SYSL$L_BACKUP_CACHE_CONSTANT[R5],R1 ; Formative cache constant
        MTPR    R1,#PR13$_CCTL                     ; Set cache with default constant
        POPR    #^M<R0,R1,R2,R3,R4,R5,R6>          ; Restore registers
        RSB                                        ; Return

;++
; MAP_PHYSICAL_ADDRESS - Map a physical address with a system VA
;
; INPUTS:
;       R3 - Physical address to map to system VA
;
; OUTPUTS:
;       R4 - System VA of physical address in R3
;--
MAP_PHYSICAL_ADDRESS:
        PUSHR   #^M<R0,R1,R9>                      ; Save registers
        BSBW    GET_XNP_NUMBER                     ; CPU number to R9
        MOVAL   @SYSLOA_SPTE[R9],R0                ; Address of this CPU's SPTE
        BICL2   #PTE$M_VALID,(R0)                  ; Invalidate SPTE
        INVALIDATE_TB ENVIRON=UNMAPPED             ; Invalidate this TB
        MOVL    (R0),R1                            ; SPTE
        EXTZV   #VA$V_VPG,#PTE$S_PFN,R3,R4         ; Address PFN
        INSV    R4,#PTE$V_PFN,#PTE$S_PFN,R1        ; Insert the PFN
        BISL3   #<PTE$M_VALID!PTE$M_MODIFY!PTE$C_KW>,R1,(R0) ; Map PFN
        EXTZV   #VA$V_BYTE,#VA$S_BYTE,R3,R0        ; Address byte offset
        MULL3   #512,R9,R1                         ; This CPU's page offset
        MOVAB   @SYSLOA_SPTE_VA[R1],R4             ; VA that this CPU's SPTE maps
        INSV    R0,#VA$V_BYTE,#VA$S_BYTE,R4        ; A VA that maps physical address
        POPR    #^M<R0,R1,R9>                      ; Restore registers
        RSB                                        ; Return

;++
; INPUTS:
;       R1 - Lo longword data to be written to BCACHE
;       R2 - Hi longword data to be written to BCACHE
;       R4 - Virtual address that maps physical address corresponding to
;            secondary cache index to be written.
;
; OUTPUTS:
;       R0 - LBS indicates BCACHE data written, otherwise clear
;--
;*******************************************************************************
; This routine cannot be stepped through using XDELTA. The FORCEHIT bit
; in the backup cache control is set and will cause erroneous hits to
; occur in the secondary cache.
;*******************************************************************************
        .ALIGN  LONG
IO_WRITE_BCACHE_DATA_ROUTINE:
        MOVL    R3,IO_SAVED_REGISTER               ; Save register
        MTPR    #0,#PR$_TBIA                       ; Reset TB allocation pointer
        CLRL    R0                                 ; Signal failure
        TSTL    10$                                ; Ensure TB hit
        TSTL    30$                                ; Ensure TB hit
        TSTL    (R4)                               ; Ensure TB hit
        TSTL    B^4(R4)                            ; Ensure TB hit
        MOVAB   10$,R3                             ; Address to check
        MTPR    R3,#PR$_TBCHK                      ; In TB
        BVC     20$                                ; If VC no
        MOVAB   30$,R3                             ; Address to check
        MTPR    R3,#PR$_TBCHK                      ; In TB
        BVC     20$                                ; If VC no
        MOVAL   (R4),R3                            ; Address to check
        MTPR    R3,#PR$_TBCHK                      ; In TB
        BVC     20$                                ; If VC no
        MOVAL   B^4(R4),R3                         ; Address to check
        MTPR    R3,#PR$_TBCHK                      ; In TB
        BVC     20$                                ; If VC no
10$:    MFPR    #PR13$_CCTL,R3                     ; Read CCTL
        BICL2   #<-                                ; Form a mask
                <1@PR13_CCTL$V_FORCE_HIT>!-        ; Force hit mode
                <1@PR13_CCTL$V_DISABLE_ERRORS>!-   ; Disable errors
                <0>>,R3                            ; Local copy control register
        BISL3   #<-                                ; Form a mask
                <1@PR13_CCTL$V_ENABLE>!-           ; Enable BCACHE
                <1@PR13_CCTL$V_FORCE_HIT>!-        ; Force hit mode
                <1@PR13_CCTL$V_DISABLE_ERRORS>!-   ; Disable errors
                <0>>,R3,R0                         ; Local copy control register
        MTPR    R0,#PR13$_CCTL                     ; Enable Bcache - FORCE HIT, DISABLE ERRORS
        MFPR    #PR13$_CCTL,R0                     ; Allow the dust to settle...
        MOVQ    R1,(R4)                            ; Write BCACHE data
        MTPR    R3,#PR13$_CCTL                     ; BCACHE off
        MFPR    #PR13$_CCTL,R3                     ; Allow the dust to settle...
        MOVL    #SS$_NORMAL,R0                     ; Signal success
20$:    MOVL    IO_SAVED_REGISTER,R3               ; Restore register
        RSB                                        ; Return
30$:

5. Initialize the Pcache

This code initializes the Pcache by writing all 256 tags with good parity and all valid bits clear.

        movl    #^x00000020, r0         ; tag index increment = 1 hexaword block
        movl    #0, r1                  ; tag init value
        movl    #^x01800000, r2         ; Pcache tag starting address
        movl    #^x01802000, r3         ; Pcache tag ending address + 1 block
pcache_loop:
        mtpr    r1, r2                  ; write tag to current tag address
        addl2   r0, r2                  ; increment index by the block size
        cmpl    r3, r2                  ; check if done
        bneq    pcache_loop

6. Enable the Bcache and the Pcache

NVAX cache coherency requires that the Pcache is always a subset of the Bcache.
This code to enable the caches is arranged to ensure that this is true. Thus, the Bcache is enabled first, and an REI is executed between the Bcache enable and the Pcache enable. The purpose of the REI is to synchronize data prefetching such that the Pcache will not perform any fills to addresses that were not also filled in the Bcache.

        mfpr    #PR19$_CCTL, r6         ; get current value in Cbox CTL IPR
        bisl2   #<cctl$m_enable>, r6    ; set the Bcache enable bit
        mtpr    r6, #PR19$_CCTL         ; write the new Cbox CTL IPR
        movpsl  -(sp)                   ; push the psl and the next PC
        moval   init_cont,-(sp)
        rei                             ; branch to the next PC flushing the VIC
                                        ; and aborting all previous IREADS
                                        ; Now that state is synchronized, enable the Pcache

16.5 Miscellaneous Information

There is no need to explicitly initialize the Translation Buffer as the NVAX microcode performs an internal TBIA on any MTPR to the MAPEN IPR.

There is no need to explicitly initialize the data portions of the VIC or Pcache as long as the tags are initialized with all valid bits clear.

Both Bcache tags and Bcache data must be initialized before the cache is enabled.

16.6 Revision History

Table 16-1: Revision History

  Who              When          Description of change
  Debra Bernstein  9-May-1990    Initial edit
  Debra Bernstein  19-Nov-1990   Add Miscellaneous Information section. Add true code examples for cache init. Add information on the ordering of cache enable.
  Debra Bernstein  11-Mar-1991   Update to pcsts, tbsts
  Rebecca Stamm    9-Oct-1991    Bcache data must be initialized as well as the Bcache tags.

Chapter 17 Chip Clocking

17.1 Overview of the NVAX Clocking System

The NVAX CPU generates all the clock signals required to operate the CPU and the NDAL interface. The clocks are derived from a high frequency oscillator signal that is supplied to the chip.
To allow for flexible logic design the chip implements a four phase clocking system. The four internal NVAX clock phases are generated on-chip by dividing the frequency of the external oscillator by four. The NVAX chip generates and drives the NDAL clocks which are used to clock the peripheral chips on the NDAL bus. The NDAL also uses a four phase clock scheme, but runs three times slower than the internal NVAX clocks.

17.2 Receiving the NVAX External Oscillator Signal

The NVAX chip can receive the external clock from one of two sources depending on the state of the OSC_TEST_H pin. When OSC_TEST_H is asserted the clock is received by the OSC_TC1_H and OSC_TC2_H pins. These pins are configured to use standard 3V CMOS signal levels. When OSC_TEST_H is deasserted the clock is received by the OSC_H and OSC_L pins. These pins use a differential amplifier circuit to receive the clock signal from an ECL oscillator. Figure 17-1 shows the NVAX clock interface circuitry.

EXTERNAL OSCILLATOR
Detailed information concerning the design of the external oscillator can be found in the NVAX Signal Integrity Specification.

17.2.1 The System Environment

During normal system operation OSC_TEST_H is tied low and the OSC_H and OSC_L pins are used to receive the external clock source. The NVAX CPU is designed to operate at a maximum internal clock speed of 100 MHz. This requires the external oscillator to deliver a 400 MHz clock. At these frequencies the generation and interconnection of signals is extremely complex and specialized circuitry must be used.

NVAX CPU Chip Functional Specification, Revision 1.0, February 1991

Figure 17-1: NVAX CPU Interface Circuitry

[Circuit diagram not recoverable from the scan. Labels that survive: OSC_H pad and OSC_L pad, each with an external pull-up resistor to VDD, an AC coupler, and a VDD/2 bias network; a buffer producing K%MASTER_CLK, the on-chip master clock driven to the global clock generator; the OSC_TEST_H pad with a pull-down transistor and CMOS input buffer; CMOS input pads for the test clocks.]

The NVAX oscillator generates a pair of clock signals that are 180 degrees out of phase. The oscillator does not supply standard CMOS logic levels. The signals have a peak to peak voltage swing of 0.5 volts centered at 3.5 volts. Therefore, a standard CMOS input buffer cannot be used on the chip to receive the signals. Instead, a differential amplifier is used and the signals are AC coupled and level shifted before they are received by the amplifier.

17.2.2 The Chip Test Environment

The chip tester, used during chip manufacturing to functionally verify the part, cannot supply a 400 MHz clock. The two pins OSC_TC1_H and OSC_TC2_H offer an alternative method for supplying the chip with clocks. The pin OSC_TEST_H is used to select between the system and test clocking modes. When the pin is asserted the test clock pins supply the clock to the chip.

The test clock pins are supplied with two clock signals that are 90 degrees out of phase. They are XORed on-chip to generate the internal 2X clock signal K_PAD_CK1%ZZ. Figure 17-2 shows the relationship between K_PAD_CK1%ZZ and the test clock input signals.
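The frequency doubling performed by the on-chip XOR can be illustrated with a small simulation. This is only a sketch: the sample resolution, the helper names `square_wave` and `rising_edges`, and the ideal 50% duty cycle are assumptions made for illustration and are not taken from the specification.

```python
def square_wave(period, phase_shift, n_samples):
    """Ideal 50% duty-cycle square wave, one sample per time step."""
    half = period // 2
    return [1 if ((t - phase_shift) % period) < half else 0
            for t in range(n_samples)]


def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


PERIOD = 8                      # samples per test clock period (assumed)
N = PERIOD * 4 + 1              # four full periods plus the closing sample

osc_tc1 = square_wave(PERIOD, 0, N)
osc_tc2 = square_wave(PERIOD, PERIOD // 4, N)   # 90 degrees behind osc_tc1

# XOR of the two quadrature inputs models the internal 2X clock.
pad_ck = [a ^ b for a, b in zip(osc_tc1, osc_tc2)]

print("OSC_TC1_H   ", "".join(map(str, osc_tc1)))
print("OSC_TC2_H   ", "".join(map(str, osc_tc2)))
print("K_PAD_CK1%ZZ", "".join(map(str, pad_ck)))
```

In this model the XOR output repeats every four samples while each input repeats every eight, which is the 2X relationship the text describes.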
Figure 17-2: On-Chip XOR Test Functionality Waveforms

[Waveform diagram not recoverable from the scan. The only trace label that survives is OSC_TC1_H.]

In addition to the frequency doubling feature of the test clock input circuitry, the pins use CMOS differential amplifiers to receive the clock signals. Hence, the test oscillator clock inputs can be used to drive the chip at slower than maximum speeds using standard 3 volt CMOS logic levels.

17.3 On-Chip Clocks

17.3.1 Clock Generation/Distribution Overview

Figure 17-3 illustrates the overall structure of the clock generation/distribution system. The clocks are distributed across the chip in two stages. The global clock generator receives the master clock signal and generates the following global clocks that are driven to various sections of the NVAX chip:

• eight single phase matched clocks (true and complement) K%PHI_1:4_H & L
• four double phase matched clocks K%PHI_12:41_H
• four NDAL matched clocks K_GLB%PHI12:41_OUT_H
• two specially tuned single phase clocks K_P%PHI_3E and K_V%PHI_3E

For purposes of defining clock specifications in different parts of the NVAX chip, clock section (or simply, section) is defined for the remainder of the chapter to be one of the chip sections shown in Table 17-1.

Table 17-1: NVAX CPU Clock Sections

  Section Name          Symbol
  Cbox                  MCB
  Ebox                  IEB
  Fbox                  F
  Ibox                  I
  Mbox                  M
  Pcache                P
  VIC                   V
  Upper I/O Pad Logic   PAD
  Lower I/O Pad Logic   PADL
  Global Reset Logic    K

Figure 17-3: On-Chip Clock Distribution

[Block diagram not recoverable from the scan. Labels that survive: a Global Clock Gen block fed by a ~400 MHz master clock, with a divide by 12 for the NDAL clocks sent to the I/O pad drivers; outputs of 8 single phase clocks, 4 double phase clocks, and 4 NDAL clocks (K_GLB%PHI12:41_OUT_H); global distribution of K%PHI_1:4_H to all sections through section clock drivers; distribution of K%PHI_1:4_L and K%PHI_12:41_H to the Fbox only; special phase 3 clocks K_V%PHI_3E_H and K_P%PHI_3E_H; all groups marked "Matched RC and Load".]

The global clock signals are received and driven into each section by local clock buffers. It is these local clocks that are used to control logic sequencing throughout the chip. Note that the active high single phases are used by all sections, while the double phases and active low single phases are used only in the Fbox. NDAL clocks are driven to the pads where they are buffered and driven off chip.

1 where X is a clock section symbol.

17.3.2 Global Clock Distribution

In this two stage distribution scheme, clock generation and distribution are very tightly controlled at the global level. Delays seen by each section are minimized and equalized to reduce global skew. Global clock signals have matched buffer delays from the generator, matched interconnect, and matched section loads. Load matching on global clock signals is implemented using dummy loads - MOSFET capacitors added to global distribution lines to balance section driver loads seen by the global clock generator.
Dummy loads are added to global clock signals at each section input, matching the section load on that signal with the most heavily loaded clock signal at that section input. Global routing of the clock signals is carefully controlled to both minimize RC delays and to match the delays of all signals arriving at a common receiver.

To provide flexibility in global clock distribution, global clock signals are organized into four groups. Interconnect and loads are matched between the signals within each group. The four groups are designed to have very similar edge rates and delay characteristics. These groups consist of:

1. K%PHI_1:4_H - active high CPU clocks
2. K%PHI_1:4_L and K%PHI_12:41_H - active low and double phase CPU clocks
3. K_GLB%PHI12:41_OUT_H - double phase NDAL clocks
4. K_P%PHI_3E_H and K_V%PHI_3E_H - special CPU clocks

17.3.3 Section Clock Distribution

Section clock distribution rules are more flexible than global rules to allow for stringent routing requirements at the section level. Primary requirements for section-level distribution are 1) maximum 125 pS RC delay between section drivers and any receiver, and 2) adherence to NVAX methodology which specifies the use of only fully complementary receivers. A detailed description of the rules relating to the use of the NVAX on-chip clocks can be found in the NVAX CPU Chip Design Methodology document.

17.3.4 Global Clock Waveforms

Eight single phase and four double phase clock signals are globally distributed on the chip. Four NDAL clocks are driven to the pads where they are buffered and driven off chip. The single and double phase CPU clocks have a period of one NVAX cycle. The NDAL clock cycle is three NVAX cycles in length. Both rising and falling clock transitions occur at the boundaries of each of the four phases of an NVAX clock cycle. Waveforms for the globally-distributed clock signals are shown in Figure 17-4.
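The periods described above can be sketched as text waveforms. The model below is illustrative only and rests on assumptions that the specification does not state in this section: that K%PHI_n_H is high during NVAX phase n, that a double phase clock such as K%PHI_12_H is high during both named phases, that each NDAL phase spans three NVAX phases, and that NDAL phase 1 starts together with NVAX phase 1. The helper names are hypothetical.

```python
def single_phase_high(n, phase):
    """Assumed: K%PHI_n_H is high only during NVAX phase n (1..4)."""
    return phase == n


def double_phase_high(first, phase):
    """Assumed: a double phase clock is high during phase `first`
    and the phase that follows it (wrapping from 4 back to 1)."""
    return phase in (first, first % 4 + 1)


def phases_at(step):
    """Map a count of elapsed NVAX phases to (NVAX phase, NDAL phase).
    One NDAL phase is taken to last three NVAX phases."""
    return step % 4 + 1, (step // 3) % 4 + 1


def trace(is_high, steps):
    """Render one character per NVAX phase: '#' high, '_' low."""
    return "".join("#" if is_high(s) else "_" for s in range(steps))


STEPS = 24  # six NVAX cycles, i.e. two NDAL cycles

k_phi_1 = trace(lambda s: single_phase_high(1, phases_at(s)[0]), STEPS)
k_phi_12 = trace(lambda s: double_phase_high(1, phases_at(s)[0]), STEPS)
ndal_phi12 = trace(lambda s: double_phase_high(1, phases_at(s)[1]), STEPS)

print("K%PHI_1_H          ", k_phi_1)
print("K%PHI_12_H         ", k_phi_12)
print("K_GLB%PHI12_OUT_H  ", ndal_phi12)
```

Under these assumptions the CPU clocks repeat every four phases (one NVAX cycle) and the NDAL clock repeats every twelve (three NVAX cycles), matching the periods given in the text.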
The use of these global clock signals is RESTRICTED to interconnecting the section clock drivers.

Clock signals K_P%PHI_3E_L and K_V%PHI_3E_L are used for sense amplifier timing within the Pcache and VIC, respectively. These signals are "early" versions of K%PHI_3_H and are carefully tuned in relation to other clock signals. For this reason, waveforms for these clocks are not depicted in Figure 17-4. These signals are discussed further in the next section.

Figure 17-4: Global Clock Waveforms

[Waveform diagram not recoverable from the scan. It spans three NVAX cycles and shows the single phase clocks (K%PHI_1:4_H and K%PHI_1:4_L), the double phase clocks (K%PHI_12:41_H), and the NDAL clocks (K_GLB%PHI12:41_OUT_H).]

17.3.5 Section Clock Waveforms

The section clocks are buffered versions of the globally distributed clock signals. Ten sections on the chip receive clocks K%PHI_1:4_H, while the Fbox is the only receiver of K%PHI_1:4_L and K%PHI_12:41_H. NDAL clocks are received only at the pads and are driven off chip as PHI12:41_OUT_H.

Clock signals K_P%PHI_3E_L and K_V%PHI_3E_L are received only in the Pcache and VIC sections, respectively. These clocks are used to trigger sense amplifiers and must be tuned such that their buffered, section level edges precede the normal section level phase 3 edges (e.g. K_P%PHI_3_H and K_V%PHI_3_H) by approximately 1.2 nS.

All section buffers have identical internal delays.
To ensure this, standard clock drivers are used in each section (except the Fbox). The standard clock driver is designed to be used in a distributed fashion: multiple identical parallel drivers are used, with inputs, outputs, and primary internal nodes being individually strapped together within each section.

17.3.6 Clock Skews and Rise/Fall Times of the Section Clocks

Because of the tightly controlled delays in the first stage of the distribution network, clock skew specifications are the same in most sections of the chip. The only exception to this is in the Fbox, where leverage of the layout from the Rigel Fbox necessitates specification of a looser skew tolerance. This higher skew figure is due to larger allowable RC delays in the Fbox section level clock distribution network.

Table 17-2 specifies the skews and rise/fall times for the edges of the single phase clock signals. These values are for a TT part running at 100°C and 3.0 volts. Clock skew is the uncertainty in time from when any clock edge crosses the 50% Vdd point to when any other clock edge crosses the 50% Vdd point. The rise and fall times are measured from the 10% to 90% points of the full voltage transition of the clock signal. Adjacent clock phases can overlap or underlap due to clock skew.

Table 17-2: Skews and Rise/Fall Times (1)

  Parameter                              Value
  Skew Within Any Section (2)            0.5 nS
  Skew Within Fbox                       1.0 nS
  Skew Between Any Two Sections (2)      0.5 nS
  Skew Between Fbox and Any Section      1.0 nS
  Rise/Fall Times                        0.5 nS

17.4 The NDAL Interface Timing System

17.4.1 NDAL Clocks

The NVAX CPU provides four double phase low skew clocks that are used by the memory interface to communicate with the CPU via the NDAL. The NDAL runs at one third the speed of the internal CPU cycle. The NDAL clocks are generated by dividing the internal clock frequency by three.
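The divider relationships can be checked with a few lines of arithmetic. This sketch uses the 400 MHz oscillator figure quoted in Section 17.2.1 for maximum-speed operation; the variable and function names are illustrative only, and the per-phase durations simply assume four equal phases per cycle.

```python
from fractions import Fraction

OSC_MHZ = Fraction(400)          # external oscillator at maximum speed

internal_mhz = OSC_MHZ / 4       # four internal phases per NVAX cycle
ndal_mhz = internal_mhz / 3      # NDAL runs at one third of the CPU rate


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return Fraction(1000) / freq_mhz


nvax_cycle_ns = period_ns(internal_mhz)
ndal_cycle_ns = period_ns(ndal_mhz)

print("NVAX clock :", float(internal_mhz), "MHz, cycle", float(nvax_cycle_ns), "ns")
print("NVAX phase :", float(nvax_cycle_ns / 4), "ns")
print("NDAL clock :", float(ndal_mhz), "MHz, cycle", float(ndal_cycle_ns), "ns")
print("NDAL phase :", float(ndal_cycle_ns / 4), "ns")
```

Taken together, the divide by four and the divide by three give an overall divide by twelve from the oscillator to the NDAL clocks.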
The interconnect used for these signals must be well controlled to avoid excessive delay, ringing, and skew. The relationship of the four clocks to the internal CPU clock cycle is shown in Figure 17-5. The timing diagram also indicates the timing of the NDAL signals. The NDAL changes in Φ12, is valid during Φ3, and goes tristate in Φ4. All NDAL signal transitions are referenced to the RISING transitions of the clocks.

(1) These skews are not valid for the NDAL clocks. See Section 17.4 for specific NDAL clock skew information.
(2) Excluding the Fbox.

17.4.2 Controlling Inter-Chip Clock Skew

The distribution of the NDAL clocks across the module is critical to the performance and functionality of the CPU. At the specified operating frequencies of the CPU, the module interconnect acts as a transmission line. It has a characteristic impedance and delay. The interconnect used for the clock signals must be carefully matched to avoid skew. Note that skews and signal delays are measured from the point where the waveform reaches VDD/2 (nominally 1.65V).

MODULE INTERCONNECT
Detailed information concerning the design of the module interconnectivity can be found in the NVAX Module Signal Integrity Handbook.

Figure 17-5: Relationship of Internal and NDAL Clock Cycles

[Waveform diagram not recoverable from the scan. It shows PHI12_OUT_H, PHI23_OUT_H, PHI34_OUT_H, and PHI41_OUT_H against the internal CPU cycles, together with an NDAL trace annotated "tristate" and "valid at input pin of receiver".]

17.4.2.1 Self Skew

Each NDAL clock is distributed to a number of receivers on the CPU board. In a perfect electrical environment each chip would receive the clocks at exactly the same time.
Unfortunately, due to mismatched interconnect lengths and variations in the electrical properties of the interconnect, a clock signal will not arrive at the different receivers at the same time. For example, refer to Figure 17-6. The clock signal is driven from the NVAX CPU to four clock receivers. Due to interconnect length mismatches it will be received at points A, B, C, and D at different times.

Figure 17-6: Self Skew

[Block diagram not recoverable from the scan. It shows the NVAX PHI12 output driving four clock receivers, with the receiver inputs labeled as points A, B, C, and D.]

The maximum difference in the arrival time of a particular clock transition at different locations is defined as the self-skew of the clock. Self-skew is the maximum possible difference between the actual clock transition and the specified clock transition. For the NVAX CPU to operate at its maximum performance the following rules must be obeyed.

1. The rising transition of each NDAL clock occurs at any receiver within 1.0ns of when it occurs at any other receiver. For example, refer to the diagram above. The Φ12 rising transition occurs at point A, point B, point C and point D, and the transitions at each separate point occur within 1.0ns of the transitions at every other point.
2. Rule 1 must also hold for NDAL falling edge transitions.

These rules imply that if a clock transition appears at one receiver 0.5ns before the specified time, the same clock transition cannot appear at another receiver more than 0.5ns after the specified time: this would violate rule 1.

17.4.2.2 Inter-Clock Skew

At the clock receivers, each NDAL clock transition is specified to appear at some time relative to any of the other NDAL clock transitions.
In an ideal design, all clock transitions would occur at the specified time. Unfortunately, due to device, processing, and interconnect mismatches, the clock signals will arrive at times different from those specified. The uncertainty in arrival times is defined as the inter-clock skew. For the NVAX CPU to operate at its maximum performance the following inter-clock skew rules must be obeyed.

1. The skew between any two rising NDAL clock transitions at any two receivers is +/-0.5ns. For example, if the transitions are defined to be 15ns apart, the clock design guarantees that they are between 14.5 and 15.5 ns apart.
2. Skew between falling clock transitions is +/-0.5ns.
3. The skew between a rising transition and a falling transition is +/-0.75ns.

17.4.3 Driving and Receiving NDAL Signals

Detailed information regarding NDAL clocking and NDAL skew considerations can be found in Chapter 3.

17.4.4 Information Transfer between the NDAL Clock System and the On-Chip Clock System

Detailed information regarding information transfer between the NDAL clock system and the on-chip clock system can be found in Chapter 13.

17.5 Initializing the NVAX System

ASYNC_RESET_L is an asynchronous input to the NVAX chip. It is used to force the NVAX CPU into a known state. The assertion of ASYNC_RESET_L occurs during NVAX system initialization. ASYNC_RESET_L must be asserted for a minimum of 7 NDAL cycles.

SYS_RESET_L is both an asynchronous and synchronous output. SYS_RESET_L is asynchronously asserted whenever ASYNC_RESET_L is asserted. When asserted, it places the NVAX system chips in their initial power-up states. SYS_RESET_L is asserted for a minimum of 7 NDAL cycles. The deassertion of the signal is synchronized to the NDAL clocks. It is deasserted on the rising edge of PHI12_OUT_H and is valid at the NDAL receivers in time to be latched in NDAL Φ4.
Figure 17-7 shows the relationship between ASYNC_RESET_L and SYS_RESET_L signals.

Figure 17-7: System Reset Timing

[Waveform diagram not recoverable from the scan. It spans three NDAL cycles (phases P1 to P4 each) and shows ASYNC_RESET_L, SYS_RESET_L, PHI12_OUT_H, PHI23_OUT_H, PHI34_OUT_H, and PHI41_OUT_H. Annotations that survive: "Asserted for a minimum of 7 NDAL Cycles" and "ASYNC_RESET_L asynchronous assertion causes asynchronous assertion of SYS_RESET_L."]

17.5.1 Internal NVAX Reset

The ASYNC_RESET_L pin is used to generate several internal reset signals which reset various parts of the NVAX chip. ASYNC_RESET_L is synchronized with NDAL Φ3, then latched after settling with NDAL Φ1. This synchronized signal is piped to NVAX Φ4 to produce K_PAD%SYNC_RESET. The internal, buffered version of ASYNC_RESET_L is K_PAD%ASYNC_RESET.

To satisfy various logic timing constraints, several reset signals are produced and distributed throughout the NVAX chip. The primary internal NVAX reset is K%RESET. This signal is asserted asynchronously and deasserted synchronously following assertion/deassertion of ASYNC_RESET_L or DISABLE_OUT_L, or during the BSR External Test. Buffered versions of K%RESET are used by the Ebox, Ibox, VIC, and Fbox to reset local logic. Detailed information regarding the functions of DISABLE_OUT_L and the BSR External Test can be found in the NVAX Testability Specification.
The Mbox, Pcache, and Cbox (excluding BIU) receive buffered versions of K_MC%RESET. This signal functions the same as K%RESET, except it is also asserted following an Ebox S3 timeout (see individual box chapters of this specification for detailed information).

The I/O Pad logic receives buffered versions of K%EXT_RESET. This signal is the same as K%RESET, except it is not asserted with DISABLE_OUT_L or during BSR External Test as K%RESET is.

K_CE%RESET is asserted during NDAL Φ3 and piped to NVAX internal Φ1. A buffered version of this is used to reset BIU logic in the Cbox to the proper NDAL sequencing state during the reset sequence (see CBOX chapter for detailed information).

17.5.2 Generation of Clocks During Power-up

The NVAX chip generates its internal clocks and the NDAL clocks by dividing down a high frequency external oscillator signal. The external system oscillator is powered from the module 5 volt power supply. Its clock signals must be valid before 3 volt power is supplied to the NVAX chip. The oscillator takes a maximum of 10 mS of initialization time before its clocks can be considered free running. Hence, the module power supply must be designed to guarantee that the 3 volt supply is not valid until 10 mS after the 5 volt supply is stable.

The NVAX clock generator derives free running clocks from the external oscillator clock. The clock generator is self-initializing and is not affected by the assertion of ASYNC_RESET_L, except for clock generator reset test features (see CLOCK GENERATOR RESET, next section). The clock generator requires a maximum of 3 oscillator clock cycles to initialize itself after the 3 volt module power supply has become valid. Figure 17-8 shows the NVAX Chip and external oscillator power-up sequence.

Figure 17-8: Clock State During Initial Power-up

[Waveform diagram not recoverable from the scan. It shows the 5 volt and 3 volt supplies (separated by a 10 mS interval), OSC_TEST_H, OSC_H and OSC_L (indeterminate at first, then free running), the internal chip clocks K%PHI_12_H, K%PHI_23_H, K%PHI_34_H, and K%PHI_41_H (valid within 3 OSC cycles maximum), and the NDAL system clocks PHI_12_OUT, PHI_23_OUT, PHI_34_OUT, and PHI_41_OUT.]

17.5.3 Clock Generator Reset

The NVAX chip incorporates a clock generator reset feature for use in verifying chip timing. The generator can be reset to a known cycle and phase in order to verify various signals against their specified timing.

WARNING
Use of the clock generator reset feature must follow these specific sequencing and timing constraints. Deviation from these specifications will have undesirable results, and can result in physical damage to the NVAX chip. Contact a member of the NVAX clock design team for further information about this feature.

Figure 17-9 shows the proper signal timing for effecting a reset of the clock generator. To begin the clock generator reset sequence, the chip is powered up using normal high speed oscillator inputs supplied through OSC_H and OSC_L. This is the normal powerup mode, and allows the internals of the chip to reach a deterministic operating state. Following a normal powerup reset sequence, the oscillator input is turned off briefly (1-2 mS) to switch the oscillator input to the test clocks. Following the switch to the test clocks, the chip is again reset to restore any internal state lost during the test clock switch. Note that ASYNC_RESET_L is held asserted through the duration of the clock generator reset sequence.

Following this second chip reset sequence, the test clocks are stopped briefly (500 nS MAX). The states of test clocks OSC_TC1_H and OSC_TC2_H when stopped must be the same, either both high or both low (as shown).
TEST_DATA_H should be driven low as shown in Figure 17-9 to effect the clock generator reset. This immediately places the clock generator into NVAX Φ2 and NDAL Φ1. TEST_DATA_H is then driven high and clocking of the chip is resumed. On the first oscillator cycle following resumption of clocking, the generator will transition into NVAX Φ3 and begin normal sequencing. ASYNC_RESET_L must remain asserted for at least 7 NDAL cycles following resumption of clocking.

Figure 17-9: Clock Generator Reset Timing

[Waveform diagram not recoverable from the scan. It shows the CPU phase and NDAL phase sequences together with the oscillator inputs, the test clocks OSC_TC1_H and OSC_TC2_H, and the reset and test control signals. S indicates a static (non-changing) input.]

K_SEC%OSC1_H is the [illegible] clock generated from either the OSC_H and OSC_L inputs, or the OSC_TC1_H and OSC_TC2_H inputs. OSC_TEST_H is used to select the clock source [remainder illegible].

Timing Notes:

1. ECL pin inputs OSC_H and OSC_L must be used to supply clocks to the chip prior to and during power-up. Inputs OSC_TC1_H and OSC_TC2_H must be held low in order to prevent latch-up.
2. Switch to test clocks OSC_TC1_H and OSC_TC2_H.
3. Clocks restarted to restore internal chip signals prior to clock-reset sequence. Start measure out lpat on chip tester.
4. ASYNC_RESET_L must remain asserted for a minimum of 7 NDAL cycles following restart of clocks.

17.6 NVAX Clock Section Signal/Pin Dictionary

17.6.1 Schematic - Behavioral Translation

  Schematic Name (1)      Behavioral Model Name (1)

  - Signals
  K%EXT_RESET             K%EXT_RESET
  K%PHI_1_H               K%PHI_1_H
  K%PHI_2_H               K%PHI_2_H
  K%PHI_3_H               K%PHI_3_H
  K%PHI_4_H               K%PHI_4_H
  K%PHI_1_L               K%PHI_1_L
  K%PHI_2_L               K%PHI_2_L
  K%PHI_3_L               K%PHI_3_L
  K%PHI_4_L               K%PHI_4_L
  K%PHI_12_H              K%PHI_12_H
  K%PHI_23_H              K%PHI_23_H
  K%PHI_34_H              K%PHI_34_H
  K%PHI_41_H              K%PHI_41_H
  K%RESET                 K%RESET
  K_CE%RESET              K_CE%RESET
  K_GLB%PHI12_OUT_H       K%NDAL_PHI_12_H
  K_GLB%PHI23_OUT_H       K%NDAL_PHI_23_H
  K_GLB%PHI34_OUT_H       K%NDAL_PHI_34_H
  K_GLB%PHI41_OUT_H       K%NDAL_PHI_41_H
  K_MC%RESET              K_MC%RESET
  K_P%PHI_3E              non-existent (2)
  K_PAD%ASYNC_RESET       K_PAD%ASYNC_RESET
  K_PAD%SYNC_RESET        K_PAD%SYNC_RESET
  K_SEC%OSC1_H            module calls (3)
  K_V%PHI_3E              non-existent (2)

  - Pins
  ASYNC_RESET_L           P%ASYNC_RESET_L
  DISABLE_OUT_L           P%DISABLE_OUT_L
  OSC_TEST_H              P%OSC_TEST_H
  OSC_H                   P%OSC_H
  OSC_L                   P%OSC_L
  OSC_TC1_H               P%OSC_TC1_H
  OSC_TC2_H               P%OSC_TC2_H
  PHI12_OUT_H             P%PHI12_OUT_H
  PHI23_OUT_H             P%PHI23_OUT_H
  PHI34_OUT_H             P%PHI34_OUT_H
  PHI41_OUT_H             P%PHI41_OUT_H
  SYS_RESET_L             P%SYS_RESET_L
  TEST_DATA_H             P%TEST_DATA_H

(1) Signals without specified assertion levels may exist in _H and/or _L versions.
(2) These signals are not modeled in the behavioral code.
(3) Any transition is represented in the behavioral model by a call to routine n_%master_clock_transition.

17.6.2 Behavioral-Schematic Translation

Behavioral Model Name (1)  Schematic Name (1)

Signals
K%EXT_RESET               K%EXT_RESET
K%PHI_1_H                 K%PHI_1_H
K%PHI_2_H                 K%PHI_2_H
K%PHI_3_H                 K%PHI_3_H
K%PHI_4_H                 K%PHI_4_H
K%PHI_1_L                 K%PHI_1_L
K%PHI_2_L                 K%PHI_2_L
K%PHI_3_L                 K%PHI_3_L
K%PHI_4_L                 K%PHI_4_L
K%PHI_12_H                K%PHI_12_H
K%PHI_23_H                K%PHI_23_H
K%PHI_34_H                K%PHI_34_H
K%PHI_41_H                K%PHI_41_H
K%RESET                   K%RESET
K_CE%RESET                K_CE%RESET
K%NDAL_PHI_12_H           K_GLB%PHI12_OUT_H
K%NDAL_PHI_23_H           K_GLB%PHI23_OUT_H
K%NDAL_PHI_34_H           K_GLB%PHI34_OUT_H
K%NDAL_PHI_41_H           K_GLB%PHI41_OUT_H
K_MC%RESET                K_MC%RESET
K_PAD%ASYNC_RESET         K_PAD%ASYNC_RESET
K_PAD%SYNC_RESET          K_PAD%SYNC_RESET

Pins
P%ASYNC_RESET_L           ASYNC_RESET_L
P%DISABLE_OUT_L           DISABLE_OUT_L
P%OSC_TEST_H              OSC_TEST_H
P%OSC_H                   OSC_H
P%OSC_L                   OSC_L
P%OSC_TC1_H               OSC_TC1_H
P%OSC_TC2_H               OSC_TC2_H
P%PHI12_OUT_H             PHI12_OUT_H
P%PHI23_OUT_H             PHI23_OUT_H
P%PHI34_OUT_H             PHI34_OUT_H
P%PHI41_OUT_H             PHI41_OUT_H
P%SYS_RESET_L             SYS_RESET_L
P%TEST_DATA_H             TEST_DATA_H

(1) Signals without specified assertion levels may exist in _H and/or _L versions.

17.7 Revision History

Table 17-3: Revision History

Who            When          Description of change
Bill Bowhill   28-Jan-1990   Initial Release
Tim Fischer    28-Jan-1991   Pass 1 Updates Complete

Chapter 18
Performance Monitoring Facility

18.1 Overview

The NVAX CPU chip contains a facility by which privileged software may obtain performance information about the dynamic behavior of the CPU. The facility is implemented with a combination of hardware and microcode, and controlled by software using privileged instructions. Two 64-bit performance counters called PMCTR0 and PMCTR1 are maintained in memory for each CPU in the system.
The lower 16 bits of each counter are implemented in hardware in the CPU, and at specified points, microcode updates the quadwords in memory with the contents of the hardware counters. The performance monitoring facility may be configured by privileged software to count a number of events in the system, from which performance analysis data such as cache and TB hit rates, cycles-per-instruction, and stall frequencies may be calculated.

18.2 Software Interface to the Performance Monitoring Facility

The performance monitoring facility makes use of a data structure in memory, and must be configured and enabled via a location in the System Control Block, processor register references, and the LDPCTX instruction.

18.2.1 Memory Data Structure

The two 64-bit performance counters for each CPU are maintained in a data structure in memory. This data structure consists of a pair of quadwords for every CPU in the system. The physical address of the base of the data structure is obtained from offset 58 (hex) in the System Control Block. The format of this location is shown in Figure 18-1.

Figure 18-1: Performance Monitoring Data Structure Base Address

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|            Physical Address of Performance Monitoring Data Structure             |SBZ| 0| 1| 1| :SCB+58 (hex)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

NOTE
A quadword-aligned physical base address is constructed by clearing the lower 3 bits of the longword fetched from offset 58 (hex) in the SCB. Microcode will not update the block in memory unless bits <2:0> of this longword contain 011 (binary).
If these bits are found to contain another value, a machine check with code MCHK_PMF_CONFIG is performed to notify software that the performance monitoring facility was incorrectly configured.

It is strongly suggested that the physical address be at least octaword aligned, and preferably page aligned.

The address of the pair of quadwords for an individual CPU is computed by shifting the CPUID value left 4 bits and adding this value to the base address. This calculation is shown in equation form below (all numbers in these equations are hex).

    phys_base_addr = SCB[58] AND FFFFFFF0;

The format of the pair of quadwords for each CPU is shown in Figure 18-2.

Figure 18-2: Per-CPU Performance Monitoring Data Structure

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR0, low longword                                      | :+00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR0, high longword                                     | :+04
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48|47 46 45 44|43 42 41 40|39 38 37 36|35 34 33 32

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR1, low longword                                      | :+08
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                     PMCTR1, high longword                                     | :+12
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 63 62 61 60|59 58 57 56|55 54 53 52|51 50 49 48|47 46 45 44|43 42 41 40|39 38 37 36|35 34 33 32

18.2.2 Memory Data Structure Updates

When the performance monitoring facility is enabled, the memory data structure is updated from the hardware counters if the PMCTR0 counter is more
than half full and the current processor IPL is below 1B (hex), if a LDPCTX instruction is executed and the PME bit in the new PCB is off, or if the performance monitoring facility is disabled via a write to the PME processor register. The PME bit is internally implemented as ECR<PMF_ENABLE>, with conversion handled by microcode.

When the PMCTR0 counter reaches half full, an interrupt at IPL 1B (hex) is requested. This interrupt request is serviced like any other interrupt if the IPL of the processor is below that of the interrupt request IPL. Like any other interrupt, it is serviced between instructions (or in the middle of the interruptable string instructions). Unlike other interrupts, the performance monitoring interrupt is serviced entirely by microcode, with no software interrupt handler required.

When a performance monitoring interrupt occurs, microcode temporarily disables the facility, reads and clears the hardware counters, then updates the memory data structure with the hardware counts. The facility is then re-enabled, the interrupt is dismissed, and the interrupted instruction stream is restarted.

NOTE
Although the performance monitoring facility is disabled during the memory update process, it is re-enabled for the restart of the interrupted instruction stream. Therefore, depending on what events were selected, the facility may count events that are part of the restart process.

At the maximum rate (one increment every 14ns CPU cycle), an interrupt is requested every 459 microseconds.

If a LDPCTX is executed and the PME bit in the new PCB is off, or if the performance monitoring facility is disabled via a write to the PME processor register, the microcode disables the performance monitoring facility, reads and clears the hardware counters, and updates the memory data structure for the CPU with the hardware counts.
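
The address rules of Section 18.2.1 and the interrupt interval quoted above can be checked with a short sketch. This is an illustration in Python, not NVAX microcode; every function and constant name is an assumption made for the example. The base mask follows the printed equation (FFFFFFF0), which gives the same result as clearing only the lower 3 bits whenever bit 3 is zero.

```python
# Illustrative sketch only: names are assumptions, not NVAX microcode names.

CONFIG_MASK = 0x7            # bits <2:0> of the longword at SCB+58 (hex)
CONFIG_VALUE = 0b011         # value that bits <2:0> must contain
BASE_MASK = 0xFFFFFFF0       # mask taken from the printed equation
HALF_FULL_COUNT = 1 << 15    # PMCTR0 count at which the interrupt is requested
CPU_CYCLE_NS = 14            # CPU cycle time quoted in the text


def pmf_base_address(scb_longword: int) -> int:
    """Return the data structure base address, or raise if misconfigured.

    The ValueError stands in for the MCHK_PMF_CONFIG machine check.
    """
    if (scb_longword & CONFIG_MASK) != CONFIG_VALUE:
        raise ValueError("MCHK_PMF_CONFIG: bits <2:0> must contain 011 (binary)")
    return scb_longword & BASE_MASK


def per_cpu_counter_address(scb_longword: int, cpuid: int) -> int:
    """Address of the quadword pair for one CPU: base + (CPUID shifted left 4)."""
    return pmf_base_address(scb_longword) + (cpuid << 4)


def min_interrupt_period_us() -> float:
    """Shortest interval between interrupts: 2**15 increments of one cycle each."""
    return HALF_FULL_COUNT * CPU_CYCLE_NS / 1000
```

With one increment per 14 ns cycle, 32768 increments take 458.752 microseconds, which rounds to the 459 microseconds stated above. PMCTR0 for a given CPU sits at the returned address and PMCTR1 at that address plus 8, following Figure 18-2.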

NOTE
The hardware counters are not cleared, and the memory data structures are not updated when the performance monitoring facility is disabled via a direct write to ECR<PMF_ENABLE>.

18.2.3 Configuring the Performance Monitoring Facility

Before the performance monitoring facility is enabled, software must select the source of the event to be counted. This is accomplished first by selecting the box that reports the event, and then by selecting the event that is to be counted. The box selection is made by writing to the PMF_PMUX field in the ECR processor register, as indicated by Table 18-1.

Table 18-1: Performance Monitoring Facility Box Selection

ECR<PMF_PMUX> (binary)   Source of Information
00                       Ibox
01                       Ebox
10                       Mbox
11                       Cbox

The event selection within the box is made by writing to a processor register within the box, as described in subsequent sections, and in the box chapters elsewhere in this specification.

The hardware used to implement the 16-bit counters is constructed such that the PMCTR1 counter increments only if both its selected event, and the PMCTR0 selected event are true simultaneously. As such, PMCTR1 is a strict subset of PMCTR0. As a result, some combinations of event selections will not cause PMCTR1 to be incremented. In some boxes, the event selection is specified in such a way that compatible events are automatically selected. In other boxes, the user must specify compatible events. Where they are required, compatible events are described in the sections below.

18.2.3.1 Ibox Event Selection

The Ibox reports only one event, so if the Ibox is selected, that event is also selected. The Ibox inputs to the PMCTR0 and PMCTR1 hardware counters are shown in Table 18-2.

Table 18-2: Ibox Event Selection

PMCTR0 Input   PMCTR1 Input   Description; Use
VIC Access     VIC Hit        VIC hits compared to total VIC accesses; VIC hit ratio.
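
The subset rule of Section 18.2.3 and the ratio in Table 18-2 can be illustrated with a small model. This is a sketch under stated assumptions: the per-cycle sample list and both function names are hypothetical and do not describe the hardware interface.

```python
# Illustrative model only: the sample format and names are assumptions.

def count_events(samples):
    """Count events the way the gating rule describes.

    samples: iterable of (pmctr0_event, pmctr1_event) booleans, one per cycle.
    PMCTR1 increments only when its own event and the PMCTR0 event are both
    true in the same cycle, so PMCTR1 can never exceed PMCTR0.
    """
    pmctr0 = 0
    pmctr1 = 0
    for event0, event1 in samples:
        if event0:
            pmctr0 += 1
            if event1:
                pmctr1 += 1
    return pmctr0, pmctr1


def event_ratio(pmctr0: int, pmctr1: int) -> float:
    """Ratio of PMCTR1 to PMCTR0, for example VIC hits per VIC access."""
    if pmctr1 > pmctr0:
        raise ValueError("PMCTR1 cannot exceed PMCTR0 under the gating rule")
    if pmctr0 == 0:
        return 0.0
    return pmctr1 / pmctr0
```

A cycle in which only the PMCTR1 event is true contributes to neither counter, which is why incompatible event selections leave PMCTR1 at zero.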

18.2.3.2 Ebox Event Selection

The Ebox reports several events, as selected by the PMF_EMUX field in the ECR processor register. The Ebox inputs to the PMCTR0 and PMCTR1 counters are shown in Table 18-3.

Table 18-3: Ebox Event Selection

ECR<PMF_EMUX>  PMCTR0 Input        PMCTR1 Input        Description; Use
(binary)
000            Cycles              S3 stall            S3 stalls (source queue, MD, Wn, Fbox scoreboard hit,
                                                       Fbox input) compared to total cycles; S3 stalls per
                                                       unit time.
001            Cycles              EM+PA queue stall   EM latch and PA queue stalls compared to total cycles;
                                                       EM+PA queue stalls per unit time.
010            Cycles              Instruction retire  Ebox and Fbox instructions retired compared to total
                                                       cycles; CPI.
011            Cycles              Total stall         Total Ebox stalls compared to total cycles; stalls per
                                                       unit time.
100            Total stall         S3 stall            S3 stalls compared to total stalls; S3 stalls as a
                                                       percentage of all stalls.
101            Total stall         EM+PA queue stall   EM latch and PA queue stalls compared to total stalls;
                                                       EM and PA queue stalls as a percentage of all stalls.
111            S5 microword event  S5 microword event  Number of times a microinstruction whose MISC field
                                                       contained INCR.PERF.COUNT reached S5. By using the
                                                       patchable control store, one may count microcode events
                                                       by setting the MISC field of selected microwords to
                                                       this value. If this event is selected, writing to the
                                                       PMFCNT processor register will increment the counters
                                                       via the MISC field decode.

18.2.3.3 Mbox Event Selection

The Mbox reports several events, as selected by the PMM field in the PCCTL processor register. The Mbox inputs to the PMCTR0 and PMCTR1 counters are shown in Table 18-4.

Table 18-4: Mbox Event Selection

PCCTL<PMM>  PMCTR0 Input               PMCTR1 Input               Description; Use
(binary)
000         S0 I-stream TB access      S0 I-stream TB hit (1)     TB hits for S0 I-stream references compared to total
                                                                  TB accesses for S0 I-stream references; S0 I-stream
                                                                  TB hit ratio.
001         S0 D-stream TB access      S0 D-stream TB hit (1)     TB hits for S0 D-stream references compared to total
                                                                  TB accesses for S0 D-stream references; S0 D-stream
                                                                  TB hit ratio.
010         P0/P1 I-stream TB access   P0/P1 I-stream TB hit (1)  TB hits for P0 and P1 I-stream references compared to
                                                                  total TB accesses for P0 and P1 I-stream references;
                                                                  P0/P1 I-stream TB hit ratio.
011         P0/P1 D-stream TB access   P0/P1 D-stream TB hit (1)  TB hits for P0 and P1 D-stream references compared to
                                                                  total TB accesses for P0 and P1 D-stream references;
                                                                  P0/P1 D-stream TB hit ratio.
100         I-stream Pcache access     I-stream Pcache hit        Pcache hits for I-stream references compared to total
                                                                  Pcache accesses for I-stream references; I-stream
                                                                  Pcache hit ratio.
101         D-stream Pcache access     D-stream Pcache hit        Pcache hits for D-stream references compared to total
                                                                  Pcache accesses for D-stream references; D-stream
                                                                  Pcache hit ratio.
110         Selection causes UNPREDICTABLE behavior of the performance monitoring hardware.
111         Total reads and writes     Unaligned reads and writes Unaligned virtual reads and writes compared to total
                                                                  virtual reads and writes; unaligned references as a
                                                                  percentage of all references.

(1) TB hit count is unconditionally incremented when MAPEN=0.

18.2.3.4 Cbox Event Selection

The Cbox reports several events, as selected by the PM_ACCESS_TYPE and PM_HIT_TYPE fields in the CCTL processor register. The Cbox inputs to the PMCTR0 counter are shown in Table 18-5 and the Cbox inputs to the PMCTR1 counter are shown in Table 18-6.
For the Cbox, all of the PMCTR1 selections shown in Table 18-6 are compatible with all of the PMCTR0 selections shown in Table 18-5.

Table 18-5: Cbox PMCTR0 Event Selection

CCTL<PM_ACCESS_TYPE>  PMCTR0 Input
(binary)
000                   Bcache coherency access. PMCTR0 increments when the Bcache processes any
                      coherency request from the NDAL.
001                   Bcache coherency READ access. PMCTR0 increments when the Bcache processes an
                      IREAD or DREAD coherency request from the NDAL.
010                   Bcache coherency OREAD access. PMCTR0 increments when the Bcache processes an
                      OREAD or WRITE coherency request from the NDAL.
011                   Selection causes UNPREDICTABLE behavior of the performance monitoring hardware.
100                   Bcache CPU access. PMCTR0 increments when the Bcache processes any reference
                      from the CPU.
101                   Bcache CPU IREAD access. PMCTR0 increments when the Bcache processes an
                      instruction-stream read request from the CPU.
110                   Bcache CPU DREAD access. PMCTR0 increments when the Bcache processes a
                      data-stream read, or read-with-modify-intent request from the CPU.
111                   Bcache CPU OREAD access. PMCTR0 increments when the Bcache processes a
                      data-stream read lock, write, or write unlock request from the CPU.

Table 18-6: Cbox PMCTR1 Event Selection

CCTL<PM_HIT_TYPE>     PMCTR1 Input
(binary)
00                    Bcache hit. PMCTR1 increments when a Bcache access results in any hit.
01                    Bcache hit owned. PMCTR1 increments when a Bcache access results in an owned hit.
10                    Bcache hit valid. PMCTR1 increments when a Bcache access results in a valid hit.
11                    Bcache miss owned. PMCTR1 increments when a Bcache access results in a miss in
                      which both the valid and owned bits were set.

18.2.4 Enabling and Disabling the Performance Monitoring Facility

The performance monitoring facility is enabled or disabled by setting or clearing the Performance Monitor Enable (PME) bit in the CPU.
This bit may be written in one of three ways: with a write to the PME processor register, by loading a new value with a LDPCTX instruction from the PME bit in the new PCB, or by a direct write of the ECR<PMF_ENABLE> bit.

The format of the PME processor register is shown in Figure 18-3.

Figure 18-3: IPR 3D (hex), PME

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                            SBZ                                             |  | :PME
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                                                                                              |
                                                                                     ENABLE --+

If PME<0> is written with a 1, the performance monitoring facility is enabled. If PME<0> is written with a 0, the performance monitoring facility is disabled. Direct writes to ECR<PMF_ENABLE> are similar to writes to PME<0>, with the exception that the hardware counters are not automatically cleared, and the memory counters are not updated on an explicit write to ECR<PMF_ENABLE>. The CPU PME bit is also loaded by the LDPCTX instruction from PCB+92<31>.

CAUTION
The longword at offset 58 (hex) from the SCB and the correct unique CPUID value for each CPU must be initialized before the performance monitoring facility is enabled. Failure to do so will result in UNDEFINED behavior of the system.

The CPU PME bit is cleared, and the performance monitoring facility is disabled, at powerup.

18.2.5 Reading and Clearing the Performance Monitoring Facility Counts

In normal operation, microcode automatically updates the memory counters by reading the current value of the hardware counters, adding these values to the memory counters, and clearing the hardware counters. This is the preferred mode of operation. However, there may be some situations in which software wishes to directly read or clear the hardware counters.
The current value of the hardware counters may be read from the PMFCNT processor register, whose format is shown in Figure 18-4.

Figure 18-4: IPR 7B (hex), PMFCNT in PMF Format

 31 30 29 28|27 26 25 24|23 22 21 20|19 18 17 16|15 14 13 12|11 10 09 08|07 06 05 04|03 02 01 00
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|         Current Hardware PMCTR1 Value         |         Current Hardware PMCTR0 Value         | :PMFCNT
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The current value of the 16-bit hardware PMCTR1 counter is returned in PMFCNT<31:16> and the current value of the 16-bit hardware PMCTR0 counter is returned in PMFCNT<15:0>.

The two 16-bit hardware counters may be explicitly cleared by software by writing a 1 to ECR<PMF_CLEAR>. If the counters are explicitly cleared, any outstanding interrupt request is also cleared. It is strongly suggested that the hardware counters not be cleared while the performance monitoring facility is enabled.

If the performance monitoring facility is configured to select the Ebox microword event (ECR<PMF_PMUX>=Ebox, ECR<PMF_EMUX>=S5 microword event, ECR<PMF_ENABLE>=1), a write of any value to the PMFCNT processor register will increment both hardware counters.

TEST NOTE
The performance monitoring facility hardware incrementers may be tested by clearing them via ECR<PMF_CLEAR>, selecting the Ebox S5 microword event, and enabling the facility. Each write to the PMFCNT processor register will then increment both hardware counters, and the result may be observed by reading the PMFCNT register. The interrupt request may be tested by incrementing the PMCTR0 hardware counter into bit<15>, which will cause an interrupt to be requested.
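
The PMFCNT layout and the test procedure above can be sketched as follows. This is a hedged software model, not a description of the hardware: the function names are assumptions, and the interrupt check is simplified to a test of PMCTR0 bit <15> rather than the edge-sensitive request logic.

```python
# Illustrative model only: function names are assumptions for this example.

def split_pmfcnt(value: int):
    """Split a PMFCNT longword into (PMCTR0, PMCTR1).

    PMCTR0 is returned in bits <15:0>, PMCTR1 in bits <31:16>.
    """
    pmctr0 = value & 0xFFFF
    pmctr1 = (value >> 16) & 0xFFFF
    return pmctr0, pmctr1


def pmfcnt_after_writes(writes: int) -> int:
    """Expected PMFCNT after clearing the counters and writing PMFCNT
    `writes` times with the S5 microword event selected and enabled.

    Both counters increment on each write, so both hold the same count.
    The range is limited to 16 bits so no overflow handling is modeled.
    """
    if not 0 <= writes <= 0xFFFF:
        raise ValueError("writes must fit in a 16-bit counter")
    return (writes << 16) | writes


def pmctr0_bit15_set(pmctr0: int) -> bool:
    """True once PMCTR0 has incremented into bit <15> (half full)."""
    return bool(pmctr0 & 0x8000)
```

For example, five writes give a PMFCNT value of 00050005 (hex), and bit <15> of PMCTR0 first becomes set on write number 8000 (hex).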

NOTE
If the 16-bit hardware counters are explicitly cleared by writing a 1 to ECR<PMF_CLEAR>, any count in these registers is lost and will not be included in the memory counters.

CAUTION
The performance monitoring hardware also provides the Wbus LFSR function under control of ECR<PMF_LFSR>. The operation of the hardware is UNDEFINED if both ECR<PMF_ENABLE> and ECR<PMF_LFSR> are on, or if software uses a single MTPR write to turn off one bit and turn on the other simultaneously. That is, if either bit is on, software must turn off both bits with one MTPR and turn on the other with a second MTPR.

18.3 Hardware and Microcode Implementation of the Performance Monitoring Facility

The performance monitoring facility is implemented via both CPU chip hardware and microcode. A block diagram of the performance monitoring hardware is shown in Figure 18-5.

Figure 18-5: Performance Monitoring Hardware Block Diagram

[Block diagram not recoverable from the scan. Legible labels include the PMCTR0 and PMCTR1 16-bit incrementer/LFSR blocks, the mux selects ECR<PMF_PMUX>, ECR<PMF_EMUX>, and CCTL<PM_HIT_TYPE>, and the event inputs VIC access, VIC hit, Ebox S3 stall, Ebox EM+PA stall, instruction retire, Ebox stall, Bcache hit, Bcache hit owned, Bcache hit valid, and Bcache miss owned.]
[Figure 18-5 labels, continued: MISC/INCR.PERF.COUNT, TB P0/P1 DREAD access, TB S0 DREAD access, Pcache IREAD access, Pcache DREAD access, any reference.]

The lower 16 bits of the PMCTR0 and PMCTR1 performance counters are implemented as two 16-bit incrementers in the Ebox. Both incrementers have a common clear line which is driven from an S5 decode of MISC/CLR.PERF.COUNT, and each has a separate carry-in input to cause an increment in the appropriate counter. The 32-bit concatenated value from the incrementers can be read onto E_BUS%ABUS_L (the active-low variant of MlABUS_H), and the upper bit of PMCTR0 is used to generate E_PMN%PMON_L, the performance monitoring facility interrupt request.

The PMCTR0 and PMCTR1 carry-in inputs are supplied by PMUX0 and PMUX1, with the PMCTR1 carry-in signal gated with the PMCTR0 carry-in signal. This makes the PMCTR1 counter a strict subset of the PMCTR0 counter. Increments of both counters are suppressed if the performance monitoring facility is not enabled, or if the PMCTR0 counter has reached its maximum value.

The top-level selection of events is determined by ECR<PMF_PMUX>, which selects the source to PMUX0 and PMUX1. This selects the source (Ibox, Ebox, Mbox, Cbox) of the carry-in signals to each counter. Distributed in the appropriate boxes are second-level muxes which are selected to provide the actual source of the increment events for PMCTR0 and PMCTR1.

18.3.1 Hardware Implementation

The two 16-bit hardware counters are implemented as side-by-side incrementers in the Ebox datapath (this hardware also implements the Wbus LFSR reducer that is described in the testability section of Chapter 8). The carry-in signals for each of the counters are driven from two 4-to-1 muxes that are selected by ECR<PMF_PMUX>, and which select the appropriate source of inputs to the incrementers.
Logic in the Ibox, Mbox, and Cbox selects the appropriate values to drive the two carry-in signals based on processor register fields in the box. The Ebox carry-in signals are selected locally and provide the fourth input to the muxes. The PMCTR1 carry-in signal is forced to be a subset of the PMCTR0 carry-in signal by ANDing the raw PMCTR1 carry-in signal with the PMCTR0 carry-in signal to produce the final PMCTR1 carry-in signal.

Because the PMCTR1 increment is a strict subset of the PMCTR0 increment, the ultimate source of the two carry-in signals aligns them such that they are valid in the same cycle. For example, if the selected conditions are IREAD PCACHE ACCESS and PCACHE HIT, these two signals are valid in the same cycle, and they refer to the same reference. Therefore the assertion of IREAD PCACHE ACCESS is delayed until the cycle in which PCACHE HIT is valid. In addition to this, the source of the carry-in signals guarantees that any events that may be retried are only recorded once. For example, a particular Pcache access causes only one increment, even if it is retried multiple times.

When the 16-bit PMCTR0 counter increments into the high-order bit, an interrupt is requested by asserting the E_PMN%PMON_L signal to the interrupt section, unless the hardware is configured to enable LFSR mode. This signal is sampled by edge-sensitive logic, so the interrupt request is maintained until it is cleared by writing a 1 to the appropriate bit in the INT.SYS register, even if the performance monitoring facility hardware counters are subsequently cleared.

When the 16-bit PMCTR0 incrementer reaches its maximum value, subsequent increments of either counter are inhibited by blocking the clocks to the logic when a carry-out is detected from PMCTR0.
In normal operation, this should not occur, but the counter may overflow if the interrupt request isn't serviced within several hundred microseconds, as would be the case if software spent an extended period of time at high IPL with the performance monitoring facility enabled.

The 32-bit concatenated value of the two 16-bit hardware incrementers can be read onto E_BUS%ABUS_L when selected by an S3 decode of A/PERF.COUNT. This is the mechanism by which microcode retrieves the current values of the two incrementers. The 32-bit concatenated value is cleared by an S5 decode of MISC/CLR.PERF.COUNT. The clear is done independently of whether the logic is enabled for performance counting or LFSR mode.

18.3.2 Microcode Interaction with the Hardware

There are several points at which the microcode interacts with the performance monitoring facility hardware. At powerup, microcode clears both of the 16-bit hardware incrementers and any potential interrupt request.

MICROCODE RESTRICTION
If the performance monitoring facility hardware incrementers are cleared in cycle 'n' via MISC/CLR.PERF.COUNT, INT.SYS<28> must be written with a 1 no earlier than cycle 'n+3' to guarantee that the interrupt request is cleared. This delay is due to latency introduced between the performance monitoring facility hardware and the interrupt section.

Microcode reads the current value of the hardware incrementers via A/PERF.COUNT as a byproduct of a read of the PMFCNT processor register, and as part of the process of updating the memory counters. Microcode clears the hardware incrementers via MISC/CLR.PERF.COUNT when ECR<PMF_CLEAR> is written with a 1. Microcode also clears the incrementers after reading and updating the memory counters.

Microcode uses the CPUID processor register value to find the pair of quadwords that contain the performance counter values for this CPU.
This value must be correctly initialized by either console firmware or software before the performance monitoring facility is enabled. The operation of the processor is UNDEFINED if CPUID is not correctly initialized.

The memory counters are updated under three circumstances: when a performance monitoring facility interrupt is serviced, when the facility is disabled via a write to the PME processor register, and when the facility is disabled by loading a new value of PME in LDPCTX. The memory updates are done in a common subroutine by disabling the facility by clearing ECR<PMF_ENABLE>, reading the current value of the hardware incrementers and then clearing them, and updating each quadword in memory with the appropriate 16-bit hardware value.

18.4 Revision History

Table 18-7: Revision History

Who          When          Description of change
Mike Uhler   12-Sep-1990   Reverse the definition of the TB selections for the Mbox performance monitoring mux
Mike Uhler   12-Jan-1990   Initial release
Mike Uhler   02-Jul-1990   Update to reflect implementation
Mike Uhler   13-Feb-1991   Update to reflect pass 1 design
Mike Uhler   12-Aug-1991   Minor updates to clarify interrupt request

Chapter 19
Testability Micro-Architecture

19.1 Chapter Overview

This chapter describes the NVAX CPU chip's Testability Micro-Architecture, a framework of testability features implemented throughout the NVAX CPU chip. The chapter does not detail the motivation for testability features or discuss the actual method of their use in various life cycle testing phases. These are covered elsewhere. (For example, see [1].)

19.2 The Testability Strategy

The NVAX CPU chip's testability strategy addresses the broad issue of providing cost-effective and thorough testing during many life cycle testing phases.
The strategy specifically implements test features to support

• chip debug
• high fault coverage test at wafer probe and packaged chip test
• support "reduced probe contact" wafer probe test
• support for effective chip burn-in test
• support module interconnection test via boundary scan and in-circuit-test (ICT) via a single pin tristate feature.

The strategy uses a combination of a variety of testability techniques and approaches that are best suited to address the specific functional elements in the chip. The cost-effective implementation is realized by the appropriate consideration of global issues, by unifying the test objectives, by sharing test resources and by exploiting features inherent in the chip. The strategy also relies on leveraging off the design verification patterns in developing production test patterns to meet the fault coverage goals. The test features are implemented such that they have no effect on the targeted performance of the chip.

19.3 Test Micro-Architecture Overview

The NVAX CPU chip's Test Micro-Architecture consists of two principal elements: the Test Interface Unit and the Testability Features.

Test Interface Unit

The Test Interface Unit (TIU) implements a comprehensive test access strategy for the NVAX CPU. It permits efficient access to the testability features implemented on the chip.

Figure 19-1: Test Interface Unit

[Block diagram not recoverable from the scan. It shows the testability features connected to the IEEE 1149.1 (JTAG) serial test access port, the serial PCache port (2 pins), the parallel test port (15 pins), and the system port (system pins).]

The TIU, shown in Figure 19-1, consists of three ports: an IEEE P1149.1 (JTAG) serial test port, a parallel test port and an "invisible" port consisting of test pads.
The serial test port is a 4-pin dedicated test access port conforming to the IEEE P1149.1 (JTAG) standard. It is used for accessing the boundary scan register. The parallel test port consists of 15 dedicated pins. This port is used for accessing internal scan registers and test features which benefit from parallel access (for example, the microaddress bus). The Test Pads primarily facilitate micro-probing during chip debug. These pads are located at strategic nodes throughout the chip.

The NVAX CPU also has a special 2-pin serial port consisting of TEST_DATA_H and TEST_STROBE_H that allows the PCache to be loaded serially under control from special microcode. This feature has been provided to support convenient self-test operation during the chip burn-in test. For more details see Section 19.7.

In addition to these test ports, NVAX also uses the normal system port (pins) for test access. This access consists of using the VAX instructions to manipulate a testability feature or to perform the actual tests on the chip's logic. Table 19-1 summarizes the dedicated test pins for NVAX.

Table 19-1: NVAX CPU's Test Pins

Pin Name               Pin Type                        Pin Function
[name not legible]     Input, Pull-up                  IEEE 1149.1 Serial Test Data Input
[name not legible]     Output, Tri-state, 2 receivers  IEEE P1149.1 Serial Test Data Output
TMS_H                  Input, Pull-up                  IEEE 1149.1 Test Mode Select
TCK_H                  Input, Pull-down                IEEE 1149.1 Test Clock
PP_CMD_H<2:0>          Input, Pull-up                  Parallel Port: Command Pins
PP_DATA_H<11:0>        Output                          Parallel Port: Data Pins
DISABLE_OUT_L          Input, Pull-up                  Disables (tristate) all output drivers
TEST_DATA_H            Input, Pull-up                  Data for serially loading PCache
TEST_STROBE_H          Input, Pull-up                  Strobe for serially loading PCache
OSC_TEST_H             Input                           Test clock enable. See Section 3.2.2
OSC_TC1_H, OSC_TC2_H   Input                           Test clocks. See Section 3.2.2
TEMP_H                 Output                          Temperature sensor.
                                                          See Section 3.2.5

Testability Features

The testability features facilitate the testing of the chip, module, or system. They are distributed throughout the NVAX CPU chip and primarily use internal scan registers, LFSR reducers, and the boundary scan register.

19.4 Parallel Test Port

This port allows the critical chip nodes to be either controlled or monitored in parallel. The port consists of 15 dedicated test pins as follows:

PP_DATA_H<11:0>: Twelve output pins that provide control of, or observability of, various internal nodes.
PP_CMD_H<2:0>: Selects up to eight different test configurations at the parallel port.

Table 19-2 lists the parallel port's configurations.

NOTE
1. When the parallel port is not in use, internal pull-ups on the PP_CMD_H<2:0> pins force the port into an inactive (Ebox observe MAB) state.
2. The PP_CMD_H<0> pin is also used as a pseudo-TRST_L pin to reset the JTAG circuits.

Table 19-2: Parallel Port Operating Modes

Command Pins
PP_CMD_H<2:0>   Port Mode               Data Pins             Signals Controlled/Observed
111             Observe MAB (Default)   PP_DATA_H<11>         Internal PHI_2.
                                        PP_DATA_H<10:0>       Ebox MAB. See Section 9.5.
110             Observe Mbox            PP_DATA_H<11:9>       S5 Reference Source. See Section
                                        PP_DATA_H<8:4>        S5 command. See Table 12-1.
                                        PP_DATA_H<3>          S5 Abort.
                                        PP_DATA_H<2>          S5 TB Miss.
                                        PP_DATA_H<1>          S5 PCache Hit.
                                        PP_DATA_H<0>
101             Observe Cbox/Mbox       PP_DATA_H<11:9>       Cbox BC_TS_CMD<2:0>. See Table 13
                                        PP_DATA_H<8>          Cbox DEALLOC.
                                        PP_DATA_H<7>          Cbox BC_HIT.
                                        PP_DATA_H<6:4>        Mbox MD Destination. See Section
                                        PP_DATA_H<3:0>        Mbox MME State. See Section 12.
100             Observe Ibox            PP_DATA_H<11>         Internal PHI_2.
                                        PP_DATA_H<10:7>       Undefined.
                                        PP_DATA_H<6:0>        I-MAB. See Section 7.11.3.
011             Enable LFSR Mode        PP_DATA_H<11:0>       Undefined.
010             Undefined               PP_DATA_H<11:0>       Undefined.
001             Shift ISRs              PP_DATA_H<11:3>       ISR1 (Control Store data).
                                        PP_DATA_H<2:0>        ISR2 (Other internal scan data).
000             Force MAB               PP_DATA_H<11:0>       Undefined. See Section 9.5.

19.4.1 Parallel Port Operation

Internal Scan Registers

When shifting, the ISR bits are serial-to-parallel converted. They change every third cycle on internal PHI_4. This gives usable time with respect to the NDAL clocks. The parallel port commands are captured synchronously with respect to the NDAL clocks, in NDAL phase 3. In order to give full flexibility in capturing a given internal cycle, a mechanism is provided to delay the capture-and-start-shifting event by 0, 1, or 2 cycles. This delay is determined by the state of the parallel port bits PP_CMD<1:0> immediately before entering the Shift ISR mode. ('00' corresponds to zero delay, '01' corresponds to a one-cycle delay, and '10' corresponds to a two-cycle delay.) See the timing diagrams in Figure 19-2.

Chapter 20
Electrical Characteristics

20.1 Introduction

This chapter specifies the electrical characteristics to which one must adhere in order to incorporate the chip in a system. Related information may be obtained from the following documents:

1. NVAX Module Signal Integrity Handbook.
2. CMOS-4 Technology File, revision 2.3.
3. NVAX CPU Module Inter-chip Specification.
4. NVAX CPU Chip Functional Specification, Chapter 3, Chapter 13, and Chapter 17.
20.2 NVAX DC Operating Characteristics

20.2.1 Maximum Ratings

Table 20-1: Maximum Ratings

Parameter                         sym    min   max     units   comments
internal supply voltage           VDDi   3.0   3.465   Vdc     3.3V +5%/-10% including power supply ripple
external supply voltage           VDDe   3.0   3.465   Vdc     3.3V +5%/-10% including power supply ripple
power dissipation @ 10ns cycle                 16.3    watts   measured at VDDi=VDDe=3.465V
power dissipation @ 12ns cycle                 13.8    watts   measured at VDDi=VDDe=3.465V
power dissipation @ 14ns cycle                 12.0    watts   measured at VDDi=VDDe=3.465V
power dissipation @ 18ns cycle                 9.7     watts   measured at VDDi=VDDe=3.465V
junction temperature              Tj     0     100     degC    specific ambient temperature depends on board design and air flow

Table 20-2: Power Dissipation Across Voltage and Cycle Time

Cycle time    min@3.2V   max@3.2V   max@3.465V   max@3.6V   units
10ns cycle    8.3        13.9       16.3         17.6       watts
12ns cycle    7.1        11.8       13.8         14.9       watts
14ns cycle    6.2        10.3       12.0         13.0       watts
18ns cycle    5.0        8.3        9.7          10.4       watts

The power dissipation numbers given are worst-case average power dissipation measurements; they do not represent the peak instantaneous power dissipated on NVAX. The worst-case average power values were developed from the measured power dissipated when a worst-case pattern was run on an NVAX chip in a Neptune system.

20.2.2 Pin Driver Impedance

Table 20-3 contains the acceptable range for output driver impedance, assuming worst case environmental skews.

Table 20-3: NVAX Pin Driver Impedance

                       Rterm   Rterm   Rds    Rds    Z      Z
Name                   low     high    low    high   low    high
P%ACK_L                10      15      19     37     29     52
P%CMD_H<3:0>           10      15      65     125    75     140
P%CPU_HOLD_L           12      18      20     41     32     59
P%CPU_REQ_L            12      18      20     41     32     59
P%CPU_SUPPRESS_L       12      18      20     41     32     59
P%DR_DATA_H<63:0>      11      17      20     41     31     58
P%DR_ECC_H<7:0>        11      17      20     41     31     58
P%DR_INDEX_H<20:3>     4       6       12     25     16     31
P%DR_OE_L              4       6       12     25     16     31
P%DR_WE_L              4       6       12     25     16     31
P%ID_H<2:0>            10      15      65     125    75     140
P%MACHINE_CHECK_H      10      15      65     125    75     140
P%NDAL_H<63:0>         10      15      65     125    75     140
P%PARITY_H<2:0>        10      15      65     125    75     140
P%PHI12_OUT_H          8       12      8      23     16     35
P%PHI23_OUT_H          8       12      8      23     16     35
P%PHI34_OUT_H          8       12      8      23     16     35
P%PHI41_OUT_H          8       12      8      23     16     35
P%PP_DATA_H<11:0>      10      15      65     125    75     140
P%SYS_RESET_L          12      18      20     41     32     59
P%TDO_H                10      15      65     125    75     140
P%TS_ECC_H<5:0>        11      17      20     41     31     58
P%TS_INDEX_H<20:5>     12      18      20     41     32     59
P%TS_OE_L              12      18      20     41     32     59
P%TS_OWNED_H           11      17      20     41     31     58
P%TS_TAG_H<31:17>      11      17      20     41     31     58
P%TS_WE_L              12      18      20     41     32     59

Key to pin characteristics:
Rterm - termination resistance
Rds - device resistance
Z - sum of resistance range

Conditions of test:
Vdd = 3.465V
Tj = 0 and 100 degrees Centigrade
Rds measured with the pin shorted to Vdd = 3.465V for measuring N-MOS characteristics
Rds measured with the pin shorted to Vss = 0.0V for measuring P-MOS characteristics

Pins cannot tolerate shorts for prolonged periods. The above information is provided for test purposes only.

20.2.3 Pin Capacitance

Table 20-4: Maximum Pin Capacitance

Pin Types                                    Rating   Unit
I/O and output only pins                     12.0     pF
input only pins except for P%PHIXX_IN_H      7.5      pF
P%PHIXX_IN_H                                 8.5      pF

Conditions of test (in simulation):
measured as pin capacitance to VSSi with all other pins returned to VSSi
Tj = 27 degrees Centigrade
measured at DC, zero bias for the junction capacitors

20.2.4 Pin Operating Levels

Table 20-5 summarizes the electrical characteristics for various pin operating levels.
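The key to Table 20-3 in Section 20.2.2 describes Z as the "sum of resistance range". The following is a minimal sketch, assuming that this means Z low = Rterm low + Rds low and Z high = Rterm high + Rds high; it checks a representative subset of rows copied from the table. The function name and the row layout are illustrative only.

```python
# A representative subset of Table 20-3 rows, laid out as
# (pin, Rterm_low, Rterm_high, Rds_low, Rds_high, Z_low, Z_high).
TABLE_20_3 = [
    ("P%ACK_L",            10, 15, 19,  37, 29,  52),
    ("P%CMD_H<3:0>",       10, 15, 65, 125, 75, 140),
    ("P%CPU_HOLD_L",       12, 18, 20,  41, 32,  59),
    ("P%DR_DATA_H<63:0>",  11, 17, 20,  41, 31,  58),
    ("P%DR_INDEX_H<20:3>",  4,  6, 12,  25, 16,  31),
    ("P%PHI12_OUT_H",       8, 12,  8,  23, 16,  35),
    ("P%TS_WE_L",          12, 18, 20,  41, 32,  59),
]


def inconsistent_rows(rows):
    """Return the pins whose Z range is not Rterm + Rds at both ends.

    Assumption: Z is the plain sum of the two resistance columns.
    """
    bad = []
    for pin, rt_lo, rt_hi, rds_lo, rds_hi, z_lo, z_hi in rows:
        if rt_lo + rds_lo != z_lo or rt_hi + rds_hi != z_hi:
            bad.append(pin)
    return bad


print(inconsistent_rows(TABLE_20_3))  # prints []
```

Under that assumption every listed row is self-consistent, which is a useful sanity check when transcribing the table into a test program.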
Table 20-6 identifies the operating level associated with each unique pin group. Table 20-5: NVAX Pin Levels VJl Vlh Vol 101 Voh loh MaxVm1 Leakage 0.8 0.8 TrL IN 3 0.8 TrLINPU P%PP_Cl\ID_ 0.8 2.0 2.0 2.0 2.0 0.4 +2mA 2.5 -2mA 6V 4.5V Vdd+0.5V Vdd+O.5V 100uAmps 100uAmps ±200-900uAmps 1000uAmps 2.0 2.0 +0.3V 2.0 0.4 +2mA Vss+O.IV +40uA 2.6 -2mA Vdd-0.1V -40uA 4.5V 4.5V Vdd+0.5V Vdd+O.5V 100uAmps 100uAmps 100uAmps 100uAmps Level Type TrLI0 2 H<2:O> CMOS' CMOS 5 ECLIN ACKI~ 0.8 0.8 -0.3V 0.8V 0.4 +17mA lm.aximum voltage tolerable without incurring damage 25-volt tolerant 3 pins with active pull-up or pull-down 'with TIL load liwith CMOS load 6active pull-up to 3.3 volts 1 c:' 20-4 Electrical Characteristics r DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.1, August 1991 Table 20-6: NVAX Pin Characteristics Name Type Level Voltage Po/oACK_L P%ASYNC_RESET_L Po/oCMD_H<3:0> PO/OCPU_GRANT_L Po/oCPU_HOLD_L Po/oCPU_RE(LL Po/oCPU_StnPPIUESS_L P%CPU_WB_ONLY_L P%DISABLE_OUT_L P%DR_DATA_H<63:0> P%DR_ECC_H<7:O> P%DR_INDEX_H<20:3> P%DR_OE_L P%DR_WE_L P%HALT_L P%H_ERR_L P%ID_H<2:0> P%INT_TIM_L P%m<LL<3:O> P%MACmNE_CBECK_H P%NDAL_H<63:O> P%OSC_H P%OSC_L P%OSC_TCl_H P%OSC_TC2_H Po/oOSC_TEST_H P%PARITY_H<2:0> P%PHl12_IN_H P%PBl12_0UT_H P%Pin23_IN_H P%PBl23_0UT_H p%pm34_IN_H p%pm34_0UT_H p%pm41_IN_H p%pm41_0UT_H P%PP_CMD_H<2:O> P%PP_DATA_H<ll:O> P%PWRFL_L P%SYS_RESET_L P%S_ERR_L B,OD I B I 0 0 0 I I B ACK TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL ECL ECL CMOS CMOS CMOS TTL CMOS CMOS CMOS CMOS CMOS CMOS CMOS CMOS TTL TTL TTL TTL TTL 3 3 5 3 5 5 5 3 3 5 5 3 3 3 3 3 5 3 3 5 5 3 3 3 3 3 5 3 B 0 0 0 I I B I I 0 B I I I I I B I 0 I 0 I 0 I 0 I 0 I 0 I Pull·x + + 3 3 3 3 3 3 3 3 5 3 5 3 + Key to pin characteristics: LEVEL - threshold levels as per Table 20-5 VOLTAGE - (5) 5V tolerant driver, (3) 3V tolerant driver - must not be exposed to 5V signals PULL-X - (+) active pull-up, (-) active pull-down TYPE - (B) bidirectional, (I) 
input, (O) output, (OD) open drain

Table 20-6 (Cont.): NVAX Pin Characteristics

Name Type Level Voltage Pull-x P%TCK_H P%TDI_H P%TDO_H P%TEMP_H P%TEST_DATA_H P%TEST_STROBE_H P%TMS_H P%TS_ECC_H<5:0> P%TS_INDEX_H<20:5> P%TS_OE_L P%TS_OWNED_H P%TS_TAG_H<31:17> P%TS_VALID_H P%TS_WE_L I I O O I I I B O O B B B O TTL TTL TTL 3 < 3V TTL TTL TTL TTL TTL TTL TTL TTL TTL TTL 3 5 3 3 3 3 5 5 5 5 5 5 5 + + + +

20.3 NVAX AC Operating Characteristics

This section specifies AC timing parameters, but is not intended to illustrate detailed transactional operation.

20.3.1 AC Conditions of Test

1. Tj = 0 to 70 degrees Centigrade
2. VDDi = 3.3 volts +3.9%/-3% (3.2 to 3.43V)
3. VDDe = 3.3 volts +3.9%/-3% (3.2 to 3.43V)
4. Voltage levels used for timing specifications as per Table 20-5.
5. Pin loading used for timing specifications as per Table 20-7.

Table 20-7: Pin Loading for AC Tests

                                            Loading required for chip test on Takeda 3381
Pin                    Total Pin Loading    Series Resistor    Series Capacitor
P%DR_INDEX_H<20:3>     140 pF               10 ohms            100 pF
P%DR_OE_L              140 pF               10 ohms            100 pF
P%DR_WE_L              140 pF               10 ohms            100 pF
P%TS_INDEX_H<20:5>     60 pF                15 ohms            20 pF
P%TS_OE_L              60 pF                15 ohms            20 pF
P%TS_WE_L              60 pF                15 ohms            20 pF
P%PHIXX_OUT_H          70 pF                22 ohms            30 pF
all others             40 pF                none               none

The AC conditions of test given were designed specifically with the Neptune and Omega systems in mind, in order to maximize chip yield.
The AC conditions of test may be changed in the future depending upon chip yields and the needs of the system partners.

20.3.2 NDAL Timing Specification

NDAL signal timing is specified as phase and constant offsets from the NDAL clock inputs. The chip operating frequency determines the phase time.

Figure 20-1: NDAL Pin Timing Relative to the NDAL CLOCKS
[Timing diagram spanning two NDAL cycles (phases P1 to P4), showing P%PHI12_IN_H, P%PHI23_IN_H, P%PHI34_IN_H, and P%PHI41_IN_H. Annotations recoverable from the diagram:
P%ID_H<2:0>, P%PARITY_H<2:0>, P%NDAL_H<63:0>, P%CMD_H<3:0> as driven by NVAX CPU: driven from P%PHI12_IN_H rising edge; released with P%PHI41_IN_H rising edge.
P%ID_H<2:0>, P%PARITY_H<2:0>, P%NDAL_H<63:0>, P%CMD_H<3:0> as received by NVAX CPU: latch closes with P%PHI41_IN_H rising edge (latch open during phi23).
As pulled low by NVAX CPU and pulled high through board pullup resistor: NVAX pulls low with P%PHI23_IN_H rising; NVAX releases with P%PHI23_IN_H falling.
As required by NVAX CPU: latch closes with P%PHI34_IN_H rising edge (latch open during phi12).
P%CPU_HOLD_L, P%CPU_SUPPRESS_L, P%CPU_REQ_L as driven by NVAX CPU: driven with P%PHI12_IN_H rising edge.
P%CPU_WB_ONLY_L, P%CPU_GRANT_L as required by NVAX CPU: latch closes with P%PHI41_IN_H rising edge (latch open during phi23).]

Table 20-8: NDAL AC timing specs

Input Pin                Hold Time
P%NDAL_H<63:0>
P%CMD_H<3:0>
P%ID_H<2:0>
P%PARITY_H<2:0>
P%CPU_WB_ONLY_L
P%CPU_GRANT_L

Output Pin               Output Valid        Output Tristate
P%NDAL_H<63:0>
P%CMD_H<3:0>
P%ID_H<2:0>
P%PARITY_H<2:0>
                         P%PHI23_IN_H R + 1 phase (low transition), P%PHI23_IN_H F + 3 phases (high transition) (3)
P%CPU_HOLD_L
P%CPU_SUPPRESS_L
P%CPU_REQ_L

(1) R means the rising edge of the clock is used; F means the falling edge of the clock is used.
(2) Data may be held capacitively during the hold time.
(3) P%ACK_L is pulled up to 3.3V through a resistor in the system and in the test environment.

20.3.3 BCACHE Timing Specification

Due to the chip's clocking structure, BCACHE timing is specified relatively between the various BCACHE signals.

Figure 20-2 and Figure 20-3 show the Bcache timing for a generic NVAX system. Table 20-9 and Table 20-10 specify the RAM timing constraints under the "NVAX Input Requirement" subtitle, and the guaranteed chip output under the "NVAX Output Response" subtitle. This data should be used to establish chip input requirements and output responses in a generic system environment. Signal delays are dependent on chip packaging and board design.

For the OMEGA system, the chip must meet the input constraints and the output responses specified in Table 20-11 and Table 20-12. The OMEGA system operates at a 14ns clock cycle, using a 128 KB cache with 16Kx4 25ns data RAMs and 4Kx4 25ns tag RAMs.
This configuration requires the Bcache processor register settings CCTL(DATA_SPEED)=01 and CCTL(TAG_SPEED)=1 to allow one slip cycle for both data and tag RAM access. See Chapter 13.

The specific timing for XNP systems is shown in Figure 20-4 and Figure 20-5. The chip must meet the input constraints and the output responses specified in Table 20-13 and Table 20-14. The XNP system operates at a 14, 12, or 10ns clock cycle, using a 2 MB cache with 256Kx4 20ns data RAMs and 64Kx4 15ns tag RAMs. This configuration requires the Bcache processor register settings CCTL(DATA_SPEED)=01 and CCTL(TAG_SPEED)=1 to allow one slip cycle for both data and tag RAM access.

The timing constraints for both the OMEGA and XNP systems are based upon the RAM specifications rather than upon NVAX predicted behavior. Actual signal delays are dependent on chip packaging and board design.

Figure 20-2: Generic Data RAM Pad Timing
[Timing diagram. Panels: Data RAM read, aborted due to tag miss; Data RAM read-modify-write; Data RAM quadword write followed by read. Signals: STATE, P%DR_INDEX_H, P%DR_DAT_H, P%DR_OE_L, P%DR_WE_L.]

Table 20-9: Generic Data RAM Timing Specification

Param   Function                                             NVAX Input Requirement
Taa     address access to RAM data valid                     ≤ (7 phases - P%DR_INDEX_H drive-to-valid delay)
Toe     OE assertion to RAM data valid                       ≤ (5 phases - P%DR_OE_L deasserted-to-asserted delay)
Toh     RAM data output hold from INDEX change               ≥ 2.0ns
Tohz    OE deassertion to RAM data high-z                    ≤ (4 phases - P%DR_OE_L asserted-to-deasserted delay)

Param   Function                                             NVAX Output Response
Tto     data high-z to OE assertion                          ≥ 0.0ns
Tdw     data valid to WE deassertion                         ≥ (5 phases - P%DR_DAT_H drive-to-valid delay)
Twp     WE pulse                                             ≥ (6 phases - P%DR_WE_L deasserted-to-asserted delay)
Taw     address valid to end of write                        ≥ (10 phases - P%DR_INDEX_H drive-to-valid delay)
Tnz     NVAX tristate time                                   ≤ 1 phase
Twr     write recovery (WE deassertion to INDEX change)      ≥ 0.0ns
Tdh     data hold after WE deassertion                       ≥ 0.0ns
Tas     address setup                                        ≥ 0.0ns
Tow     OE deassertion to WE assertion                       ≥ 0.0ns

Figure 20-3: Generic Tag RAM Pad Timing
[Timing diagram. Panels: TAG RAM read followed by another read; TAG RAM quadword write followed by read. Signals: STATE, P%TS_INDEX_H, P%TS_TAG_H, P%TS_OE_L, P%TS_WE_L.]
Table 20-10: Generic Tag RAM Timing Specification

Param   Function                            NVAX Input Requirement
Taa     address access to RAM data valid    ≤ (7 phases - P%DR_INDEX_H drive-to-valid delay - 1.5ns)
Toe     OE assertion to RAM data valid      ≤ (5 phases - P%DR_OE_L drive-to-valid delay - 1.5ns)

Param   Function                            NVAX Output Response
Tto     high-z to OE assertion              ≥ 1.5ns
Tdw     data valid to WE deassertion        ≥ (5 phases - P%TS_TAG_H drive-to-valid delay)
Twp     WE pulse                            ≥ (6 phases - P%TS_WE_L drive-to-valid delay)
Twr     write recovery                      ≥ -2.0ns
Tdh     data hold time                      ≥ 1.0ns
Tas     address setup                       ≥ (4 phases - P%TS_INDEX_H drive-to-valid delay)

Table 20-11: OMEGA-Specific Data RAM Timing Specification

128KB Bcache, 25ns Data RAMs, 25ns Tag RAMs, 14ns cycle

NVAX Test Input Requirements
Param    Function                           Timing     Measuring Point     Notes
Taa      address access to data valid       ≤ 25.0ns   INDEX 2.4H/.4L      must be met before tester drives data
Toe      OE assertion to data valid         ≤ 12.0ns   OE .4L              must be met before tester drives data
Toh      output hold                        ≥ 3.0ns    INDEX .4H/2.4L      tester hold time
Tohz     OE deassertion to data high-z      ≤ 10.0ns   OE 2.4H             tester hold time, chip overdrives
Tcycle   internal cycle time                14.0ns

NVAX Test Output Responses
Param    Function                           Timing     Measuring Point             Notes
Tto      high-z to OE assertion             ≥ 0.0ns    OE 2.4L
Tdw      data valid to WE deassertion       ≥ 10.0ns   DAT 2.4H/.4L
Twp      WE pulse                           ≥ 15.0ns   WE .4L, WE 2.4H
Twr      write recovery                     ≥ 0.0ns    WE .4L, INDEX .4H/2.4L
Tdh      data hold time                     ≥ 0.0ns    WE 2.4H
Tas      address setup                      ≥ 0.0ns    INDEX 2.4H/.4L, WE 2.4L
Tow      OE deassertion to WE assertion     ≥ 0.0ns    OE 2.4H, WE .4L

Table 20-12: OMEGA-Specific Tag RAM Timing Specification

512KB Bcache, 12ns Data RAMs, 12ns Tag RAMs

NVAX Test Input Requirements
Param    Function                           Timing     Measuring Point     Notes
Taa      address access to data valid       ≤ 12.0ns   INDEX 2.4H/.4L      must be met before tester drives data
Toe      OE assertion to data valid         ≤ 6.0ns    OE .4L              must be met before tester drives data
Tcycle   internal cycle time                14.0ns

NVAX Test Output Responses
Param    Function                           Timing     Measuring Point             Notes
Tto      high-z to OE assertion             ≥ 0.0ns    OE 2.4L
Tdw      data valid to WE deassertion       ≥ 6.0ns    DAT 2.4H/.4L
Twp      WE pulse                           ≥ 12.0ns   WE .4L, WE 2.4H
Twr      write recovery                     ≥ 0.0ns    WE .4L, INDEX .4H/2.4L
Tdh      data hold time                     ≥ 0.0ns    WE 2.4H
Tas      address setup                      ≥ 0.0ns    INDEX 2.4H/.4L, WE 2.4L

Figure 20-4: Data RAM Pad Timing (XNP systems)
[Timing diagram. Panels: Data RAM read, aborted due to tag miss; Data RAM read-modify-write; Data RAM quadword write followed by read. Signals: STATE, P%DR_INDEX_H, P%DR_DAT_H, P%DR_OE_L, P%DR_WE_L.]
Table 20-13: XNP-Specific Data RAM Timing Specification

2MB Bcache, 20ns Data RAMs, 15ns Tag RAMs

NVAX Test Input Requirements
Param    Function                           Timing                    Measuring Point     Notes
Taa      address access to data valid       ≤ 20.0ns                  INDEX 2.4H/.4L      must be met before tester drives data
Toe      OE assertion to data valid         ≤ 10.0ns                  OE .4L              must be met before tester drives data
Toh      output hold                        ≥ 5.0ns                   INDEX .4H/2.4L      tester hold time
Tohz     OE deassertion to data high-z      ≤ 10.0ns                  OE 2.4H             tester hold time, chip overdrives
Tcycle   internal cycle time                14.0, 12.0, or 10.0ns

NVAX Test Output Responses
Param    Function                           Timing     Measuring Point             Notes
Tto      high-z to OE assertion             ≥ 0.0ns    OE 2.4L
Tdw      data valid to WE deassertion       ≥ 12.0ns   DAT 2.4H/.4L
Twp      WE pulse                           ≥ 14.0ns   WE .4L, WE 2.4H
Twr      write recovery                     ≥ 0.0ns    WE .4L, INDEX .4H/2.4L
Tdh      data hold time                     ≥ 0.0ns    WE 2.4H
Tas      address setup                      ≥ 0.0ns    INDEX 2.4H/.4L, WE 2.4L
Tow      OE deassertion to WE assertion     ≥ 0.0ns    OE 2.4H, WE .4L

Figure 20-5: Tag RAM Pad Timing (XNP systems)
[Timing diagram. Panels: TAG RAM read followed by another read; TAG RAM quadword write followed by read. Signals: STATE, P%TS_INDEX_H, P%TS_TAG_H, P%TS_OE_L, P%TS_WE_L.]
Table 20-14: XNP-Specific Tag RAM Timing Specification

2MB Bcache, 20ns Data RAMs, 15ns Tag RAMs

NVAX Test Input Requirements
Param    Function                           Timing                    Measuring Point     Notes
Taa      address access to data valid       ≤ 15.0ns                  INDEX 2.4H/.4L      must be met before tester drives data
Toe      OE assertion to data valid         ≤ 8.0ns                   OE .4L              must be met before tester drives data
Tcycle   internal cycle time                14.0, 12.0, or 10.0ns

NVAX Test Output Responses
Param    Function                           Timing     Measuring Point             Notes
Tto      high-z to OE assertion             ≥ 0.0ns    OE 2.4L
Tdw      data valid to WE deassertion       ≥ 7.0ns    TAG 2.4H/.4L
Twp      WE pulse                           ≥ 15.0ns   WE .4L, WE 2.4H
Twr      write recovery                     ≥ 0.0ns    WE .4L, INDEX .4H/2.4L
Tdh      data hold time                     ≥ 0.0ns    WE 2.4H
Tas      address setup                      ≥ 0.0ns    INDEX 2.4H/.4L, WE 2.4L

20.3.4 Other Pin Timing Specifications

20.3.4.1 Clock Timing

When P%OSC_TEST_H is not asserted the chip receives the master clock through the P%OSC_H and P%OSC_L pins. Operation of the chip at the maximum internal clock speed of 100 MHz requires an input clock frequency of 400 MHz. These pins require capacitively-coupled, differential waveforms, 180 degrees out of phase at inverted ECL levels. The peak-to-peak differential voltage must be at least 600mV, with a differential symmetry of 60/40 or better. The voltage should not exceed the absolute value of Vdd plus 500 mV during operation.

When P%OSC_TEST_H is asserted the chip receives the master clock through the P%OSC_TC1_H and P%OSC_TC2_H pins. Operation of the chip at the maximum internal clock speed of 100 MHz requires an input clock frequency of 200MHz.
These pins require waveforms that are 90 degrees out of phase at CMOS input levels (Table 20-5). Each edge must be placed within an accuracy of ± 24 degrees.

The chip provides four double-phase NDAL clocks on the P%PHIXX_OUT_H pins. The chip also receives these clocks through the P%PHIXX_IN_H pins. The relationship of the four clocks to the internal CPU clock cycle is shown in Figure 20-6.

Figure 20-6: Relationship of Internal and NDAL Clock Cycles
[Timing diagram. Rows: CPU CYCLE (phases 1 to 4, repeated), NDAL CYCLE (PHI1 to PHI4), PHI12_OUT_H, PHI23_OUT_H, PHI34_OUT_H, PHI41_OUT_H.]

The following skew specifications must be met for all NDAL clock receivers. Inter-clock skew is dependent on the electrical characteristics of the chip environment.

1. The rising edge of any clock will be present at all receivers within ± 0.5 ns, as measured from the CMOS Vih level (see Table 20-5).
2. The falling edge of any clock will be present at all receivers within ± 0.5 ns, as measured from the CMOS Vil level.
3. The skew between the rising edge of any phase and the falling edge of any other phase will be no more than ± 0.75 ns, as measured from Voh to Vol.
4. The NDAL clocks will have an edge rate of 2.0 ns or better, measured at the receiver, between the 10% and 90% points.

20.3.4.2 Reset Timing

P%ASYNC_RESET_L is an asynchronous input. It must be asserted for a minimum of 7 NDAL cycles. The P%SYS_RESET_L output is asserted asynchronously whenever P%ASYNC_RESET_L is asserted. P%SYS_RESET_L is deasserted synchronously with the rising edge of P%PHI12_OUT_H. Figure 20-7 shows the relationship between the reset signals and the clocks.
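The 7-NDAL-cycle minimum for P%ASYNC_RESET_L can be converted into nanoseconds for a given CPU cycle time. The sketch below assumes that one NDAL cycle spans three CPU cycles; that ratio is read from Figure 20-6 rather than stated in the text, so treat it as an assumption. The function name is illustrative.

```python
# Assumption: one NDAL cycle = 3 CPU cycles (read from Figure 20-6, not stated in the text).
CPU_CYCLES_PER_NDAL_CYCLE = 3
# From Section 20.3.4.2: reset must be asserted for a minimum of 7 NDAL cycles.
MIN_RESET_NDAL_CYCLES = 7


def min_reset_assertion_ns(cpu_cycle_ns):
    """Minimum P%ASYNC_RESET_L assertion time in ns for a given CPU cycle time."""
    return MIN_RESET_NDAL_CYCLES * CPU_CYCLES_PER_NDAL_CYCLE * cpu_cycle_ns


# CPU cycle times taken from the power dissipation tables (10, 12, 14, 18 ns).
for cycle_ns in (10, 12, 14, 18):
    print(f"{cycle_ns} ns CPU cycle: reset asserted for at least "
          f"{min_reset_assertion_ns(cycle_ns)} ns")
```

If the actual NDAL-to-CPU cycle ratio differs, only the first constant needs to change.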
Figure 20-7: System Reset Timing
[Timing diagram over three NDAL cycles (phases P1 to P4). Signals: P%ASYNC_RESET_L, P%SYS_RESET_L, P%PHI12_OUT_H, P%PHI23_OUT_H, P%PHI34_OUT_H, P%PHI41_OUT_H. Annotations: P%ASYNC_RESET_L is asserted for a minimum of 7 NDAL cycles; P%ASYNC_RESET_L asynchronous assertion causes asynchronous assertion of P%SYS_RESET_L.]

The clock generator can be reset to a known state by using the P%TEST_DATA_H input as shown in Figure 20-8. With P%ASYNC_RESET_L asserted, all clock inputs are stopped briefly (500 ns maximum). The states of test clocks P%OSC_TC1_H and P%OSC_TC2_H when stopped must be the same, either both high or both low. P%TEST_DATA_H should be driven low to effect the clock generator reset. This immediately places the clock generator into NVAX Φ2 and NDAL Φ1. P%TEST_DATA_H is then driven high and clocking of the chip is resumed. On the first oscillator cycle following resumption of clocking, the generator will transition into NVAX Φ3 and begin normal sequencing. P%ASYNC_RESET_L must remain asserted for at least 7 NDAL cycles following resumption of clocking.
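The clock generator reset rules above can be collected into a simple checklist. The following is a minimal sketch of such a check, not part of the chip or of any test program; the record type, its field names, and the function name are hypothetical and exist only to illustrate the rules stated in the paragraph.

```python
from dataclasses import dataclass

MAX_CLOCK_STOP_NS = 500      # "500 ns maximum" clock stop, from the text
MIN_RESET_NDAL_CYCLES = 7    # reset must stay asserted this long after clocks resume


@dataclass
class ClockResetAttempt:
    """Hypothetical record of one clock generator reset attempt."""
    async_reset_asserted: bool      # P%ASYNC_RESET_L asserted throughout
    clock_stop_ns: float            # how long all clock inputs were stopped
    tc1_level: int                  # P%OSC_TC1_H level while stopped (0 or 1)
    tc2_level: int                  # P%OSC_TC2_H level while stopped (0 or 1)
    test_data_pulsed_low: bool      # P%TEST_DATA_H driven low, then high again
    reset_cycles_after_resume: int  # NDAL cycles of reset after clocks restart


def violations(attempt):
    """Return the rules from the text that the attempt breaks (empty if none)."""
    problems = []
    if not attempt.async_reset_asserted:
        problems.append("P%ASYNC_RESET_L not asserted")
    if attempt.clock_stop_ns > MAX_CLOCK_STOP_NS:
        problems.append("clocks stopped longer than 500 ns")
    if attempt.tc1_level != attempt.tc2_level:
        problems.append("test clocks stopped at different levels")
    if not attempt.test_data_pulsed_low:
        problems.append("P%TEST_DATA_H not pulsed low")
    if attempt.reset_cycles_after_resume < MIN_RESET_NDAL_CYCLES:
        problems.append("reset released before 7 NDAL cycles")
    return problems


good = ClockResetAttempt(True, 200.0, 0, 0, True, 7)
bad = ClockResetAttempt(True, 800.0, 1, 0, True, 3)
print(violations(good))
print(violations(bad))
```

The first attempt satisfies every rule; the second stops the clocks too long, leaves the test clocks at different levels, and releases reset early.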
Figure 20-8: Clock Generator Reset Timing
[Timing diagram showing CPU phase, NDAL phase, the internal oscillator, P%ASYNC_RESET_L, P%SYS_RESET_L, and the clock-reset input through the chip power-up and clock reset sequences. Setup, assert, and hold intervals are each marked 10 ns minimum. Markers Note 1 to Note 4 refer to the timing notes below.]

- K_SEC%OSC1_H is the internal master clock produced from either the P%OSC_H and P%OSC_L inputs, or the P%OSC_TC1_H and P%OSC_TC2_H inputs. P%OSC_TEST_H is used to select the clock source as described in this clock specification.
- S indicates a static (non-changing) NDAL Φ1.

Timing Notes:
1. ECL pin inputs P%OSC_H and P%OSC_L must be used to supply clocks to the chip prior to and during power-up. Inputs P%OSC_TC1_H and P%OSC_TC2_H must be held low in order to prevent latch-up.
2. Switch to test clocks P%OSC_TC1_H and P%OSC_TC2_H. Start measure out lpat on chip tester.
3. Clocks restarted to restore internal chip signals prior to clock-reset sequence.
4. P%ASYNC_RESET_L must remain asserted for a minimum of 7 NDAL cycles following restart of clocks.

20.3.4.3 Interrupt, Error, and Test Pin Timing

P%DISABLE_OUT_L and P%TCK_H are asynchronous inputs. P%TEMP_H is an asynchronous output. When P%PP_CMD_H<2:0> selects Φ2 on P%PP_DATA_H<11> (see Chapter 19), the output is asynchronous. The timing for the interrupt, parallel port, serial port, boundary scan, and error pins is shown in Table 20-15.

Table 20-15: Interrupt, Test, and Boundary Scan Pin AC timing specs

Input Pin Hold Time P%PWRFL_L P%HALT_L P%H_ERR_L 3.0 ns to P%PHI41_IN_H R P%INT_TIM_L P%S_ERR_L 3.0 ns to P%PHI41_IN_H R P%TEST_DATA_H P%TEST_STROBE_H 1 phase to P%PHI41_IN_H R 3.0 ns to P%PHI41_IN_H R 3.0 ns to P%PHI41_IN_H R 3.0 ns to P%PHI41_IN_H R 1 phase to P%PHI41_IN_H R P%PHI41_IN_H R + 0.0 ns P%PHI41_IN_H R + 0.0 ns P%PHI41_IN_H R + 0.0 ns P%PHI41_IN_H R + 0.0 ns P%PHI41_IN_H R + 0.0 ns P%PHI41_IN_H R + 1 phase P%PHI41_IN_H R + 1 phase 1 phase to P%PHI41_IN_H R 3.0 ns to P%TCK 3.0 ns to P%TCK Output Pin P%TCK F + 3.0 ns P%TCK F + 3.0 ns Output Valid P%TCK R + 10.0 ns

(1) R means the rising edge of the clock is used; F means the falling edge of the clock is used.

20.4 Revision History

Table 20-16: Revision History

Who              When           Description of change
Rebecca Stamm    S-Dec-1991     Revision 1.2, correct power supply numbers, leakage numbers, AC conditions of test
Rebecca Stamm    9-Oct-1991     Revision 1.1, update power numbers and AC test conditions
John F. Brown    30-Aug-1991    Revision 1.0, first edition released
John F. Brown    20-Jun-1991    Revision 0.1, first edition for review
Mike Uhler       lS-Feb-1991    Revision 0.0, add template

Appendix A
Processor Register Definitions

This appendix contains the SDL (Structure Definition Language) definitions for the NVAX processor registers. These definitions are used by chip verification code, and it is strongly recommended that software groups use the same definitions to minimize errors in generating new definitions.

NOTE
The file shown below is maintained by the NVAX CPU chip design group and is constantly being updated as changes are made to the design. It is included here simply as a means to document processor register definitions used in examples throughout this specification. The latest machine-readable version of this file should always be obtained from the NVAX CPU chip design group.

module $PR19DEF;

{* NVAX-Specific Processor Register Definitions
{*
{* To convert this file to a macro library, do the following:
{*
{*      SDL/LANGUAGE=MACRO/COPYRIGHT/VMS_DEVELOPMENT/LIST PR19DEF
{*      LIBRARY/CREATE/MACRO/SQUEEZE PR19DEF PR19DEF
{*-

constant REVISION equals 30 prefix PR19$;       /* Revision number of this file

/* In the definitions below, registers are annotated with one of the following
/* symbols:
/*
/*      RW - The register may be read and written
/*      RO - The register may only be read
/*      WO - The register may only be written
/*
/* For RO and WO registers, all bits and fields within the register are also
/* read-only or write-only.  For RW registers, each bit or field within
/* the register is annotated with one of the following:
/*
/*      RW - The bit/field may be read and written
/*      RO - The bit/field may be read; writes are ignored
/*      WO - The bit/field may be written; reads return an UNPREDICTABLE result.
Wz - The bit/field may be written; reads return a 0 WC - The bit/field may be read; writes cause state to clear RC - The bit/field may be read, which also causes state to clear; writes are ignored DIGITAL CONFIDEt\'TIAL Processor Register Definitions A-1 NVAX CPU Chip Functional SpecificatioD-t Revision 1.0, February 1991 aggregate PR19DEF union prefix PR19; /* Architecturally-defined registers which have different characteristics /* on this CPuo constant TODR equals %xlB tag $; /* Time Of Year Register (RW) constant MCESR equals %%26 tag $; cons~ant SAVPC equals %%2A tag $; /* Machine check error register (WO) /* Console saved PC (RO) SAoV?SL equals %x2B tag $; /* Console saved PSL (RO) PRl9SAV?5L_B!TS structure fill prefix SAVPSL$; PS~ LO bitfield lenath 8 mask; /- Saved PSL bits <7:0> lir~TCODE ~i~!ield l;ngth 6 mask; /* Halt code containing one of the following values cons~an~ ~~~_K~TP!N equals %%2; 1* HAL!_L pin asserted constant ~T_?WROP equals %x3; /* Initial powerup ccnstant ~T_!~~STK equals %x4; /* Interrupt stack not valie c~ns~ar.~ r~: DOU:~E equals %x:; /* Mac~ine cheer. =~rin; exception processing c~nstant nA!::H~:!NS equals %x6; /* Halt instruction in kernel mode c=nstan~ ~~r_:~VEC e~~als %xi; /- ::lega1 5=3 vector (b!ts<l:O>-ll; ::=nstr.~ E..:.~r_'i';,=so\l~: eq:.:als %x8; /* ;;-:5 S:B vector (bi~s<:: 0>-:0) constan~ ::::.!:':c:::, ::_:'':'':_:?!''::: Q~;.a~s %L;';' ==·=.s':. &'::::. F..A::=_:~O eq,",a.~s %zlC'; _::-:2.:5 ::::.s-:.a::.-: ::.;...:::_:!:: %z:!; :::::s":.a::o: ::.;...:..:_:::: .. ;-':'2.:'5 %z.:~; /7 =:-:!~ on i=;:._:::'1.:~': s~a:k /* ~~';/TNV c.~=ing machine ehe:k ~:-oeessing /.,. A~-/:l:\" c:~:=:!.::.; p:s:-:-. ~ ;=C:fiSS!!'l'; /., l-:&:r.:":le :he:!: :i::.=:'::;- : ..a.:::':'::.. ~:::.=k p=o:ess!.::.g /? 
!-!&::!':lw ·:ne:k :-==!.::; ::~::-: ;==:.ss:'::; ::::s':.c:: F";''':':_:~3 Eq-~a:s J!z:~; ::ns-:a::.-: ::_:..:.: :E PE~ ::: e~a.;,.s %j;:i'; /+ PS:<:€:2~>-::: :::.:r:'n; ::'nter:.:~-: cr ex::ep~!cn ::=:-_~-='3.::.': =:.----:--:e----.~ e;--.:a:s ~::.:;._; l · ?S:.:<:€::.;>-:.:C ::..;=!.::.;- !.::~ ...==.;~~ ___ j:CQ~':!==' ::::.s~a::.~ ~:~~:-:!-;s:-::~ _q~a:s %z::; j ' :S~<:€::~>-::: :~=~~: !~~£==~;~ __ ~:_?':.!=~ ::::'S":.:::'-: ::;.:.:':?z:~:S:_:c:. e~.;als %'z::-; ::::s~a::-: ::;':'=_?~:_:E:_:::' £::-:.:=.~s %:::.:~; ::::.s":.£:.":. S;.:.=_~:_=S:_::: e;~a:s %~:=; :£::::.:: : ==s~; :~:.:...:!.:: =!":.=~_:: :,-=:;-::::. 1 mask; :::-~-;..:..:: =~ =~':.!~£:~ , .. /-r I'" /Y :::~c.:!: PS:~<:ZE ::.;>-::: :"".:~:'::g __ _ ?S:..<:::: ::.;>-:..::. ::.:.=:..::; __ _ :S~<:€::4>-::: :~=!~; : __ sA....:,;:. ~! - :.. __~_ ;!~!~_:: :_~=~:. :6 =~S~; c::':' :?~:~S:;''':.?S:':_=::S; co~s-:~r.t IO?ZSET.equals %x37 tag $; /* I/O system reset register CWO) /* Per!o:mar.ee ~;nitoring enable (RW) S:: eq~als %x3E tag S; /w System ider.tification register (RO) structure fill prefix SIDS; UCODE REV bitfiele length 8 mask; /* Microcode (chip) revision number NONSTANOA?~ PATCE bi~field length 1 mask; /* PCS loadee with a non-standard patch PATCH REV bItfield length 5 mask; /* Patch revision number F!LL I bit!ield length 10 fill tag S$; TY?E-bitfleld length 8 mask; /* CPU type code (19 decimal for NVAX) ene PR~9SID_BITS; c~ns~an-: PR~95:D_B!TS A-2 Processor Register Definitions DIGITAL CONRDENTlAL NVAX CPU Chip Functional Specification, Revision 1.0t February 1991 1* System-level required registers. 1* These registers are for testability and diagnostics use only. 1* They should not be referenced in normal operation. 
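The SID layout given earlier in this listing (UCODE_REV 8 bits, NONSTANDARD_PATCH 1 bit, PATCH_REV 5 bits, 10 fill bits, TYPE 8 bits) shows how the SDL bitfield lengths add up to one 32-bit register. The following is a minimal sketch in Python, not part of the SDL file, of how such a value might be unpacked. It assumes that bitfields are allocated from bit 0 upward in declaration order; the names `SidFields` and `decode_sid` and the sample value are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

NVAX_CPU_TYPE = 19  # "CPU type code (19 decimal for NVAX)" per the TYPE field comment


class SidFields(NamedTuple):
    """Hypothetical container for the SID fields named in the SDL listing."""
    ucode_rev: int           # UCODE_REV, 8 bits
    nonstandard_patch: bool  # NONSTANDARD_PATCH, 1 bit
    patch_rev: int           # PATCH_REV, 5 bits
    cpu_type: int            # TYPE, 8 bits


def decode_sid(value: int) -> SidFields:
    """Split a 32-bit SID value into fields.

    Assumption: fields occupy bits <7:0>, <8>, <13:9>, fill <23:14>, <31:24>,
    i.e. they are packed from bit 0 upward in the order they are declared.
    """
    return SidFields(
        ucode_rev=value & 0xFF,
        nonstandard_patch=bool((value >> 8) & 0x1),
        patch_rev=(value >> 9) & 0x1F,
        cpu_type=(value >> 24) & 0xFF,
    )


# Made-up sample value, not taken from real hardware.
sample = decode_sid(0x13000202)
is_nvax = sample.cpu_type == NVAX_CPU_TYPE
```

The same shift-and-mask pattern would apply to any other register structure in this appendix, provided the same field-ordering assumption holds.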
constant IAK14 equals %x40 tag $;       /* Level 14 interrupt acknowledge (RO)
constant IAK15 equals %x41 tag $;       /* Level 15 interrupt acknowledge (RO)
constant IAK16 equals %x42 tag $;       /* Level 16 interrupt acknowledge (RO)
constant IAK17 equals %x43 tag $;       /* Level 17 interrupt acknowledge (RO)

    PR19IAK_VECTOR structure fill prefix IAK$;  /* Vector returned in response to IAK1x read
        IPL17 bitfield length 1 mask;           /* Force IPL 17, independent of actual level
        PR bitfield length 1 mask;              /* Passive release
        SCB_OFFSET bitfield length 14 mask;     /* LW offset in SCB of interrupt vector
        FILL_1 bitfield length 16 fill tag $$;
    end PR19IAK_VECTOR;

/* Ebox registers.

/* Ebox register definition

constant INTSYS equals %x7A tag $;      /* Interrupt system status register (RW)

    PR19INTSYS_BITS structure fill prefix INTSYS$;
        ICCS6 bitfield length 1 mask;           /* ICCS<6> (RO)
        SISR bitfield length 15 mask;           /* SISR<15:1> (RO)
        INT_ID bitfield length 5 mask;          /* ID of highest pending interrupt (RO)
            constant INT_ID_HALT equals %x1F;       /* Halt pin
            constant INT_ID_PWRFL equals %x1E;      /* Power fail
            constant INT_ID_H_ERR equals %x1D;      /* Hard error
            constant INT_ID_INT_TIM equals %x1C;    /* Interval timer
            constant INT_ID_PMON equals %x1B;       /* Performance monitor
            constant INT_ID_S_ERR equals %x1A;      /* Soft error
            constant INT_ID_IRQ3 equals %x17;       /* IPL 17 device interrupt
            constant INT_ID_IRQ2 equals %x16;       /* IPL 16 device interrupt
            constant INT_ID_IRQ1 equals %x15;       /* IPL 15 device interrupt
            constant INT_ID_IRQ0 equals %x14;       /* IPL 14 device interrupt
            constant INT_ID_SISR15 equals %x0F;     /* SISR<15>
            constant INT_ID_SISR14 equals %x0E;     /* SISR<14>
            constant INT_ID_SISR13 equals %x0D;
            constant INT_ID_SISR12 equals %x0C;
!!.~: ::::s":.::::-: :::".s":.a::~ =:::S-:"3.:::: =:: eq'.:!.ls S:S?~l 4iq''':e.:s .XC~; "s:..::'::: :::-::=:.=:.5:':CI _:;-'':&:5 %:-::.;..: :::-:_::_5:5?. ;. e;--:.a:s Itx':'~; :!:::~::-_S:S~e 17 /9 /9 I~ _;-':2.:5 %zC'E: S:S?~<:3> S:.5?~<:2> S:.5~<::> S:.s?~<:~> ::S:.<9> , .. E:S?~<E.> :!:-:_=_S:S:. - -=;-.:.a:s ~=:-; .: • .:::?~<-> ::::S-:'a:':t :::-:_::_S:S?6 -;-';£':$ ~x:'£: ::::s-;.r..~ ::::_::'_::::": .;-..:=.:s ~z·:·:; ::::s-;.a.::::. _... __ --_.:-.:~. 6;-":'=':5 ~j;:..;; ::::S-:'a:':t :r,:_::_S:S?.3 _;-,;a.:'s ~x:::: :::::so:a::-: ::::_::_::::: -=;-.:.a:s ~=:.:: S:S?<6> 5:.5?. <:> ::::05";:':':.-: =:.s?~<.;> S:S?<:3> S:S?<:> ==::.s-:.a=:.":. _.... __ --_~-.::"'- .. ~.:.:.:s %z::; / .. $:S:?. <:> ::::5':.:::::' :::-:_::_:::_:::-: ..;::.~:s %::::.; /'" ::: ::.-:..==::;':. =::::_: =:.~=:..:::. :6~9''::: 3 !!:: -:.a; 5;; ::~_::~_?ZS~: ~it=i.lo l.ngth : mask; /9 :::-:..~a: t~~&: ir.terru?~ rese-:. (WC) !'::.:._= ri-:.!ie1o length 2 !111 tag Ss; S_~?2~?~SET bi~!iela lL~;th 1 mask; /* So!t error interrupt reset (We) F!·:~:'_?ZS~'! b;,t!ield leng':.h 1 mask; I· Pe:!o::m&nee monitoring int.rr..p-c res.t E_~?~_?ZS~: :!t!i.l: l~g':.= : :ask: 1 ~a=: &=r~r !r.':._r~~~t reset (v~l Ph~~ RESET bitfield length 1 mask: 1* Power fail interrupt reset (WC) HALT_~SET bitfiela length 1 mask: 1* Halt pin interrupt reset (WC) ena PR19INTSYS_BITS: I~ (v~C) 9 A-4 Processor Register Definitions DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Ebox registers, continued. constant PMFCNT equals %x7B tag $; 1* Perfor.mance monitoring facility count register (RW) PR19PMFCNT BITS structure fill prefix PMFCNT$: PMeTRO bitfield length 16 mask; /'* PMCTRO word PMCTRl bitfield length 16 mask; /'* PMCTRl word end PR19PMFCNT_BITS; constant PcseR equals %x7C tag $; /'* Patchable control store control register (RW) PR19PCSCR BITS structure fill prefix PCSCR$; FI~L 1 bTtfield length e fill tag $S; PA? 
PORT-DIS bitfield length 1 mask: /* Disable parallel port control of scan chain (RW) peS:EN~ bitfield length 1 mask; /"" Enable use of patchable control store (RWl peS_WRITE bitfield length 1 mask; /"" Write scan chair. to patchable control store (WO) Rr.."L_SE!:T bitfield length 1 mask; /* Shift read-write later. scan ::hain by one bit (WO) D~.TA bi-:field lengtr. 1 mask; /'" Data to be shifted into the pes scan chain (:<:rn :::':_2 l:·!-:!ield length 10 fil: -::.ag SS; !;~:~S:Al\,":)A?Z_PATCE b!-:!ield ler:.T-t. ::. mask; /* pes loaded with a non-standard pa-:::h (R;;) F;'.:'=-::_PZY b1tfi.10 lengtr. 5 :ask; I"" Patch re ..... ision number (RN) :::"~_3 er.d }:,:::!!£:d lE!'l;-:h 3 !!:: ~ag $$; rR:g?=S=?~=:~S; ?~l~~=?~=::S S~=~=~~=E !~:: p:c=!~ ~:?$; l:~':.~:'_:~ :.::..;-:::. :. :r4;S):: lor ·~~e:':.:-r u.~!-; ;=.Sc:~:: -.~::O?. _:?~S~!~ (~i\) =:.:):_:::.:;,-=:..~ :::!.'~=io&::' :e~;-:.:: : :--.2.51-:; /Y ===.~: E.:la!:~.:' {R't·~~ :=~Z:::-:_U':: =:. -=:!:.. .. :: :'c::;-:':: : =:as}:; / or 56:.:-:' c:C.=:4a.: -:.:'::-...:a5. ::: S3 s-;a.:: ":.~:r:e:::-; -:.:':r==:- (?,:t-~~ =E:·X_S:~_=!:;'...s.:_~:;...::.! :t!.-:.:!..:: :.::.;-:.:: : :.ask; /9 =:·:·z s':.!.;. "' c::l·:':'::ic:la: b::'Pass &:lab:'e (?,.:;':) :::.Z:::-:_:::::-?..;z:: =:..-:.::.. ..:: : ..::.;-:.:: : ::.as!:; / ... S~ s'::.:: -:.!.:r...:':.:,: :::::.==_: \~::; :-=·:=::::"=_:~S= =:'::=:'.c:=' :... ~;--:::. : ~s::'; /'" 5.::".:-: ~es':. Ir.o:. !er S:: s-:.a.:': ":.:.r,eo".:~ (?..i:) ::!~!::::.:_=::=:: :::.-;=:.... :=. :s::::!: : ~~k; :l::k S~ -:.!.:r.... :::-: {?..v-i} :::S_:::i:: =':::.!i.e.:: ~s::;-:.:' : :~S}:; / .. :~:: ~:::;:a:Q=::'£: :.:: CA':..:::a: ::;!: (?"l{} =. =. =:~~_: =~~=~c:: :~~=~:. 5 ==::r:_=~~:_!:::;'''::''E ::~-:=:..c:= =:~:_: =:..":.=!.~:: :~~;":.:: :!~~ _~:~;.-=:.:: !~:: 1 ':.a; 5!; 9 ==:S :s::,;-":.!: : =.asr.: /", : !~:: '::; S;; =,:!.':::,i.f::=' :6::';-:':: : :!i.as!:.; .I" ~a;:6 ":.es':. == :;··:x (R~-;) :&==:-::.2.::.:. ::L:::'!~:=!.::; !a.:!.:'!. -:::- ,=.a= 1. {?~i;.! 
:!·Z_::·.::;:.: ;::'t::.!.;::c. ;;.;:::g-::.:: : ::t>asr.; /" :~rf:=~cc Irloni-:~ring fa:!:"i-:y mas~er se:";::t (?vO cons~ant ?~~~ !BOX equals %bOO; /'" Select !box constar.~ PMUX-~BOX equals %bOl; /* Selec~ ~ox constar.t PMUi:-l-SOX equals %blO; /+ Select Moox constan~ P~~:CBOX equals %bll; /* Select ebox Pl~_~; bi~::ielc length 3 mask: /* Perfor.mance monitoring facility Ebox mux selec~ (RW) cons~ant EMUX_S3_STALL equals %bOOO; /* Measure S3 stall against total cycles cons-:ant EMUX_E.~_PA_STALL equals %b001: /* Measure EM+PA queue stall against total cycles cons~ant EMUX_CPI equals %b010; /'" Measure instructions retired against to~al cycles cons~ant EMUX STALL equals %bOll: /* Measure total stalls against total cycles constant EMUX-S3 STALL PCT equals %blOO: /* Measure S3 stall against total stalls constant EMUX:EY.:PA_STALL_PCT equals %blOl: /* Measure EM+PA queue stall against total stalls constant EMUX_OWORD equals %blll: /* Count microword increments PMF LFSR bitfield length 1 mask: /* Perfor.mance monitoring facility Wbus LFSR enable (RW) FILL 3 bitfield length e fill tag $$: PMF_CLEAR bi~field length 1 mask: /'* Clear performance monitoring hardware counters (WO) end PR19ECP~BITS: DIGITAL CONFIDENTIAL Processor Register Definitions A-5 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Mbox TB registers. /* These registers are for testability and diagnostics use only. /* They should not be referenced in normal operation. 
constant MTBTAG equals %x7E tag $;      /* Mbox TB tag fill (WO)

    PR19MTBTAG_BITS structure fill prefix MTBTAG$;
        TP bitfield length 1 mask;              /* Tag parity bit
        FILL_1 bitfield length 8 fill tag $$;
        VPN bitfield length 23 mask;            /* Virtual page number of address (VA<31:9>)
    end PR19MTBTAG_BITS;

constant MTBPTE equals %x7F tag $;      /* Mbox TB PTE fill (WO)

    PR19MTBPTE_BITS structure fill prefix MTBPTE$;  /* Format is normal PTE format, except for PTE parity bit
        PFN bitfield length 23 mask;            /* Page frame number (PA<31:9>)
        FILL_1 bitfield length 1 fill tag $$;
        P bitfield length 1 mask;               /* PTE parity
        FILL_2 bitfield length 1 fill tag $$;
        M bitfield length 1 mask;               /* Modify bit
        PROT bitfield length 4 mask;            /* Protection field
        V bitfield length 1 mask;               /* PTE valid bit
    end PR19MTBPTE_BITS;

/* Vector architecture registers

constant VPSR equals %x90 tag $;        /* Vector processor status register (RW)

    PR19VPSR_BITS structure fill prefix VPSR$;
        VEN bitfield length 1 mask;             /* Vector processor enabled (RW)
        RST bitfield length 1 mask;             /* Vector processor state reset (WO)
        FILL_1 bitfield length 5 fill tag $$;
        AEX bitfield length 1 mask;             /* Vector arithmetic exception (WC)
        FILL_2 bitfield length 16 fill tag $$;
        IMP bitfield length 1 mask;             /* Implementation-specific hardware error (WC)
        FILL_3 bitfield length 6 fill tag $$;
        BSY bitfield length 1 mask;             /* Vector processor busy (RO)
    end PR19VPSR_BITS;

constant VAER equals %x91 tag $;        /* Vector arithmetic exception register (RO)

    PR19VAER_BITS structure fill prefix VAER$;
        F_UNDF bitfield length 1 mask;          /* Floating underflow
        F_DIVZ bitfield length 1 mask;          /* Floating divide-by-zero
        F_ROPR bitfield length 1 mask;          /* Floating reserved operand
        F_OVFL bitfield length 1 mask;          /* Floating overflow
        FILL_1 bitfield length 1 fill tag $$;
        I_OVFL bitfield length 1
ruask; =:~~_: =!~=!~:: :~~;~r. ?Z'~:S=~~_l~':;'..s:~ /. !=~::.e;e: ove:!!ow :0 !~:~ ~a; Ss: =:'-;!:"i::c 1&::;-:.:: 1£ :rl.ask; /"" \'!'e:O:C:- ~eS':.!:la~':o:: :~;!.s:.£= :nask e::~ :?_::"::;'~,"-::::E; ......... :,.-: -:- :- . _---: -, DIGITAL CONFIDENTIAL Processor Register Definitions A-7 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Cbox registers. constant CCTL equals %xAO tag $; 1* Cbox control register (RN) PR19CCTL_B!TS structure fill prefix CCTL$; ENABLE bitfield length 1 mask: /* Enable Bcache (RW) TAG_SPEED bitfield length 1 mask; /* Tag RAM speed (RW) constant TAG_3_CYCLES equals 0; /* Select tag RAM speed: 3-cycle read rep/3-cycle write rep constant TAG 4 CYCLES eauals 1: /* Select tag RAM speed: 4-cycle read rep/4-cycle write rep DATA_SPEED bitfield length i mask: /* Data RAM speed (RW) constant DATA_:_CYC-~S equals 0: /* Select data RAM speed: 2-cycle read rep/3-eycle write rep co~stant DATA_3_CYCLES equals 1: /* Select aata RAM speed: 3-cyele reaa rep/4-cycle write rep eo~stant DATA , CYC~S eeuals:; /* Select aata RAM speed: 4-eycle read rep/5-cycle write rep S=ZE bitfield length 2 mask; /* Beaehe size eRW) co~stant S~ZE :2SKB eaua.ls 0: /* S.. lect 128l".B Beaehe co~stant S:Z£:256KS e~als 1; /* Select 2S6KE Bcaehe co~sta~t S:ZE_512KB e~~als 2: /* Select 512F~ Beache cc·~stant S::::E_2*3 egwals 3; /"" Select 2M3 Beache =OR==:_~:: ::,~-==icl: /"11: ForCE =:a.ehQ hit le:l;-:'!i. :. mask; (RW) bitfielC: len;'th 1 mask: /* ~isu·le Beach.. ECC errors (RW) SX_=::: ::itfiel: len;t!l 1 r,ask: / ... £nable use of software E~~ (Rl-v) ::::.Z::::_~S': bio:!!.&:c l_r.;:.:: : mask; /- !.=.a:l.. tes't. e! :b:x =.. 2.:: -:.!mect:~ :;".!:l':.e:s :::s;.:::.::_=;..::: ::itfiel: le~r-h :. xr.uk: /y Disa=le "'rite pa:kin; (RW) =!·:_;"::£SS_:'Y:?~ l::!..t!!.ele le:l;~h :3 ::lask; /'" ?r!or:m&nce mo~it~r:!..n~ access ':1"'PQ (Riil =:::.s-:z.::-: :!·:;..:_=~E .. ~-=.:.:s C'; /7: C:.h.=c:lCY 2.:=.5.5 c! 
.!':.n.= ~~-pe ::::.s-:.r.-: ?!-j..:_=:.!!_?.:;;.:: e.q:;a!s 1; /." ::=.e=..:::::-" &:=655 '!e: ?.!.;..:: ::::.s-;a::-:. :::":'.:_:::':'_:'?':::;''':: e;-..;:.~s:~ /- ::::6=-=::':::- 2.::.5S ! : : :.?z;...:. ::::s':.r.-: :!~:;...:_:::: ,;-.:.a.:s 4: I Y c:-: aceess : ! a::::' -:::7'= ::S;'-=!'£_~.RC·?.s :::-.. s,,:a::-: ::·~:_:=-:-_:?L:''':' e=-';!.~S _, ::::S':a!::' ::~:;....:_==::_:.?z;.~ £;--:'20:5 t.; ::~s':~=::; =l~:":'_:_=:::_:'?~:'~ . :~" _~::_=~:::: ::':.':.::"~:: ::::s:.a::-:. :~~:_E:: C'; ! ... a::.ss ::= ::~~.: /-- C::'- a:e.ss !:: ~?...!:A: a:cess :0: C?~.=t :'£=!:~~=_ :::::.':-:.::!::; ...... - /- Crt! eq-.:.a.:s -;; :.::;--::: : !:~sk; .~.:.::.:s =?~ (:~v:} j. E':-:. '":.~",? {?:;";; E::: :::. .~..--::.. :. =:":::: ::::~-:.r.,: ?::':_E:':_-~·;':':~ .. ~~a:"s:: /'91: Ei-:. C:l va:~: ~l;·:.k ::::.s-:.c-; =:~':':_!·::SS_:~·::~:' fi~.:.a:'.s::: / .. !..::'ss ::: =,:o:k <:a-";5.5 ft·=:'::' .. =·a.::-:"· =:'~~_!'=.;''':''_:::?''?. . =~-:.!:. .. ::: : ...~::::: : =.as}:; i'ft =e,==es : ;'E.:!-::; e=:~r en -:.h.. !~.;..:., _ ......... ~_ :::.:._: :::'':.!~c:,e. len;o:r. :"3 :!ll ~a9 5$; S~ =:=~ bi~!ielc len~~h 1 ~asy.: /* £nter s:ftw~re error ~rL~s!~!Qn moae (RW) Ht(!:':'1f. ritfielc length 1 mask; /'" Error transition moae enterea aue to error (i':~) ::::.s-:.r.~ =!~:':_F.::_:'t-::~ 6~":!.:S:; /'W 0.,.-::.= ~-":-;'~:.!.== -:':2.::'S2.:-:!..::: end PRl9C=~_E:TS; BCDECC equals %~ tag $; /* Beache data ram ECC (WO) :?R19BCDECC_EITS s~ructure fill prefix BCDECC$: ::!.L_l bit!ield length 6 fill tag S$: ECCLC bitfiela length 4 mask: /* ECC cheek bits <3:0> :ILL_2 bi~field length 12 fill tag SS: ECCH! 
bitfield length 4 mask; /* ECC check bits <7:4> FILL 3 bitfielC: leng~h 6 fill tag SS; end PR19BCDECC_BITS: cons~ant A-8 Processor Register Definitions DIGITAL CONFlDEt..'TIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Cbox registers, continued constant BCETSTS equals %xA3 tag $; /* Bcache error tag status (RW) PR19BCETSTS BITS structure fill prefix BCETSTS$; LOCK bitfield length 1 mask; /* Tag store registers are locked due to an error (WC) CORR bitfield length 1 mask: /* Correctable error occurred (WC) UNCORR bitfield length 1 mask; /* Uncorrectable error occurred (WC) BAD_ADDR bitfield length 1 mask; /* Addressing error occurred (WC) LOST ERR bitfield length 1 mask: /* Error occured while register was locked (we) TS_CMD bit!ield length 5 mask; /* Tag store command which caused error (RO) const~~t ~ DREAD equals %bOOlll: 1* Command was D-stream tag lookup const~~t ~-!~ eouals %b0001~; 1* Command was I-stream tag look~p constant CMD:O~ e~als %b00010; 1* Command was OREAD tag lookup for write or read lock constant eM!:: WUlu.OCK eouals %b0100D; /* Command was write unlock tag lookup (done only uncier E:M) consta::t o=~? !~-Ii'"J...J.. e~uals %bOllOl; /* Command was inval tag lookup for l~;"::' IiP..EAD or !P'=:;'.Ii consta::t o-:=:c::ar;;.:. equals %b01001; /Y: Command was :"nval tag lookup fer 1~!)A:.. O?z;.:J or v-<?':'!::: consta::t CMD_:?P__Ii:::;';;:':"OC equals %b01C1D; /* Command was tag loor-up to::: :?R deallocate ~!t!ield lengt~ :: !!ll tag 55; .:':0. ;:::.. 19S=:::'S:S_E:=S; /"" E-c:a::he errc::: tag (RO) . :?. :;:::E=';'~ ::-::S s-;:-,..;c::.~rc .:!.:: ;=_!~>: =~=;. ~~; =:~~_: =~~=~~:: :~~~r. ~ !!:: ~a; SS; : :.ask; ".-;''':':: :~":=i~:':' ::.:::~: :::.::.::"-=:"::' :c::;--:,:-.. 
: ::-.ask; :e~;--:.:: ~== =~~=~~:d :E~==~ ;~ oo;.':'a::': ::.-; 6 ~ask; DIGITAL CONFIDENTIAL Processor Register Definitions A-9 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Cbox registers, continued constant BCEDSTS equals %xA6 tag $; /* Bcache error data status (aw) PRl9BCEDSTS BITS struct~re fill prefix BCEDSTS$; LOCK bitfield length 1 mask; /* Data RAM registers are locked due to an error (WC) CORR bitfield length 1 mask; /* Correctable ECC error occurred (WC) ONCORR bitfield length 1 mask; /* Oncorrectable ECC error occurred (WC) BAD_ADDR bitfield length 1 mask; /* AddreSSing error oceurred (WC) ~OS!_ERR bitfield length 1 mask; /* Error occurred While register was loeked (w~) F::;:'!,_:' bi~field leng~h :3 fill tag SS; DF~CMD bi~fielc length 4 mask; /* Data RAM command which ea~sed error (RO) CO~StL~t CMD D~ equals %bOlll; /* Comm&nd was D-stream data lookup constant CMD:I?~ equals %bOOll; /* Command was I-stream data lookup const~t ~m_W=ACK e~~als %b0100; /* Command was writeback data lookup cc~stant om ?,MW eauals %b0010; /* Command was read-mocify-write data lookup F::;:'!,_: bitfield length "20 fill tag 55; en: P~19B=EDS=S_=:TS; ::. :'9::~'=:=:_=:=S s'::-~=':~=. ~!.:: ::'r.. !!z ==~~==S; =:~~_: b~~!i_:c :e:~h 6 !~~: ~a~ S$; :=c::!lw :'a.~a ~CC .; :ask; =:~:._: =,:. -:.!!... :c :£..~=-:. :: E==:': ~:=~: A-10 =~ -:!i .. :.::. :.=.~r.. =~~=~&:: :E~~r. ~ ~~sk; s~"=.:'rom.. ::,,':'5 <~: c> 2.;- !s; :=..:!l6 :'a::.a ~:: Processor Register Definitions DIGITA·L CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Cbox registers, continued cons~ant ~FADR equals %xAB tag $: /* Fill error address (RO) constant CEFSTS equals %xAC tag $: /* Fill error status (RW) PR19CEFSTS BITS structure fill pre~ix CEFSTS$: RDLK bitfield length 1 mask: 1* Error occurred during a read lock (WC) LOCr. 
bi~field length 1 mask: /* CEFSTS & CEFADR registers are locked due to an error (We) TIMEOUT bitfield length 1 mask: 1* Fill failed due to transaction timeout (WC) ?DE b!.~!ield length 1 mask; /'* Fill failed due to Read Data Error (WC) LOST ~?~, bitfield length 1 mask: 1* Error occurred while register was locked (WC) ZDO ;itfield length 1 mask: I'" NIl;':' id<O> for failed read (R:l) IPD.D bitfield length 1 mask: 1* Error occured during an I?EAtl (RO) O?~ Litfield len~h 1 mask: I'" Error occurred during an O?~ (RO) w"?..!':E b!.tfield len~h 1 mask: /'" Error occurred dur!.ng a write (RO) ':'::_1-130:;: bitfield length 1 mask; /'" Da':.a was destinec for ':.he Mbox (RO) R:? bitfield length _ ~~sk; /"'?~ invalidate was pending (RO) ::P =itfield length :. ~sk; 1'* O~ inva:idate ~as pending (RO) ::~ ::!.tfie:'c length 1 mask; Iy Data was not to be va:'idate: when fill com;:<i.eted (RO) ?.:;:.?._:::._:;:!~ t!.t!ield length 1 mask: /'* Last f1:1 f::= reac lock receive: (R~l ?~·::_;::":"_DO!:E: ~!-;:iclc len;-:.::' 1 mask; /~ Aeques~e= :!.:: q':.l2:W:'!"= was re:.s!vec !C~ -:.~is r.~d. :::::::: =:'~=ic::"Q :.::;-:.:' ~ ::las}:: /<ft !1~&= e! requ.S':.~:' =-::.:._: :::::i.;::c. lc!'l;~!: .: =~:: ~ag SS; ::!:=::.:?~=:-=::_=:~:. =-:~:_: !:.!-:.!:... :c. =!~!~_:: :~~=~~ :...::..;-:.:: : raask; j?: ?=.~ ~i; : ! !!.:: =c::.a:"'V": (?.. ~; =: :?.:?~ ...-a.s =~:c:.'";.!'.. : !=::. -:h. :;:;...:. "-"'!:c::' =!.::_:a.~.... __ ".7a:!.c (~.;:) :C !~:: ~a; S~; i:::: :?. 
:~=~:S:S_::=S; DIGITAL CONFIDENTIAL Processor Register Definitions A-11 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 1* Cbox registers, continued constant NESTS equals %xAE tag $; 1* NDAL error status CRW) PR19NESTS BITS structure fill prefix NESTS$; NOACK bitfield length 1 mask: 1* OUtgoing command was NACKed (WC) BADWDATA bitfield length 1 mask: 1* BADWDATA cycle transmitted (WC) :LOS'!'_OERR bitfield length 1 mask; 1* OUtgoing error was lost while register was locked (WC) PERR bitfield length 1 mask: 1* NDAL parity error detected (We) INCON PERR bitfielc leng-:.h 1 mask; 1* Inconsistent parity error (parity error detected on ~OS!~P£RR bitfield length 1 mask; 1* NDAL parity error detected while register was locked (W=) /* ACKee transaction) (WC) F:~:_l bit!ield len;th 26 fill tag SS; ene PK19NESTS_S=TS; ccnstant l'EOADR eg-.:.als %xBO tag $; /* ~~AL error output address (RO) constant }~OCMD eg-~als %xB2 tag $; /* h1D~ errcr OUtput command (RO) P?~:$th~O~!:'_3:TS s-::uc-:.'c:e !ill pr£!!.x l~~~·=>S; a=· :·:::.!i.:'c lEr.~h 'mask; /-- 1m;...:.. e:::r.:n.a.=:: O~ ·:n:::.;cin; .==o~ t=L~sc.=~ion (see below) :~ bi t!iele le:::gt:: :: mask; /* ~'"D1.1. :~ on cutgcing error tra:lsactic:l :~:._: l:,i-:.!!.ld E!-:~_~!': r'!~=iclc :'_n;~h :_~.g:.h ~:'ll -:'2.; 1 S mask: =:~~~: ~~-:.=iw:c :.~;~~ :~ ~E!\ =~':.=iclci S5; /<fr Sy-:.e .::=.=·:.. s C!". ..==:: ,:=c:sa:':.~~n c::::'~;j:!.:l; =~ll ~a~ S~: :w::;-:.r. 2 mask; /. ~-e::;':.h ,. /9 :::. ::::.;::"::; .::.:.: ~=2.::52.:':.:':::' (54ft :· .. ~"-'.:"'- .:===: :.a::.a ---=_ . C~..: ."0_.:0"_ ~-- .. ==== - -:-: .. _.. ~ --'''''' . :'.- cc;::s-:.a::-: !~::!.=, ;:q~a:'s %z=c -=&; S; .'y :=;...:. .. !.-:.;:.:-: ::="2=.: (?..: : :P. :?:~::!·=_=:=S s-:.:-.::-:.::.=_ ':.1 - .. :;:=....:!.z :~:=·=S; ::!-:~:.£:c :...s=:-=- ~ r..ast:; i'ft :=;..:. =o=.a::.~ =s=_:"v.. ~ .::: .==:.: -:.=a.::.sa:-:.:':::. ::= A-12 Processor Register Definitions :0.. 
) .J __ (S~ :'~:"C-W) DIGITAL CONFIDENTIAL NV~ CPU Chip Functional Specification, Revision 1.0, February 1991 /* Cbox registers, continued /* Encoded NDAL length values cons~ant LEN HW equals %bOO prefix NDAL$; cons~ant LEN:OW equals %blO prefix NDAL$; constant LEN_OW equals %bll prefix NDAL$; /* Length - hexaword /* Length - quadword /* Length octaword /* encoded NDAL command values constan~ CMD rmp equals %bOOOO prefix NDALS; /* Command - NO? constant CMD:W?,,:TE equals %::'0010 prefix NDA!.S; /'" Commanc. - virite constan~ ~_~~!SOWN equals %b001~ prefix 1~A!.$; /* Co~nc - Write disown constan~ ~ID_!P~ equalS %b0100 prefix ~~ALS; /'" Comm~~c !-read cons~ant CMD_DREAD eq'.lals %b010l prefix lI...:>AL$; /'" Command - :C'-read cons":ant C!.~ O~.D eouals %b0110 'Orefix 10;':'$; /* Commanc. - Co-read =c::.s~a::-:. ::!-=::?.=oE. £qu~ls %b100:' ~:~=!.~:. l\-:,;''':'S; /"tr Co:r.:nan: - R.a.: cia::.: e::o= .. equals %~:":lC !=,=Q!:!.~ l"~;'':'';; /'" =~!:'.::la.~c - v;=!~Q da-=.a eq-.;a.:s fs;:,:O:: ;-=s!!x :~;'~$; /"" Ce':::::la::=- - :a::' ·...·::!.":e d.a~a. c:.::s":.a:~:: ~'_?':>R:; eq-..la.:"s %~!:OO F·:£!:!.z l~';:'$: 1* C=::.-rj,a~= Rca:' :iao:a =.":.i.::-::' 0 ;"" CC:oI:J.a~= ?".a: da-:.a =&~t:..:n 1 :o::s-:.an":. :::'=_?":1?~ sqL;als .:;::..~:. F=~=!~ l:=;'~S; c::ls-:a::-: =.::;_?=.:::..: _;-.:a:"5 %=,:::'0 ::·=£=~z !~;'~S; :==-~::= :<.sa:' ::!.":.!. _s ___ ... :2 =.::::.s-:.a:::: ::!~=_P~?~ _~.;als %l:::::: P=-£!:"j: :=;'':''S; /or :=,==.:.::~ ?.aa:' :a":.a =£":.:;=. 
~ c.~~s-:.a::,: CIv=_v=~ :A ::::S-:'=.::::: C!~=_:s..ADr:-DA=A DIGITAL CONFIDENTIAL Processor Register Definitions A-13 NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Cbox registers, continued constant BCTAG equals %x01000000 tag $; /* First of 64K Bcache tag IPRs (RW) constant BCTAG_128KB_MAX equals %x010lFFEO tag $; /* Last tag IPR for 128KB Bcache constant BCTAG_256KB_MAX equals %x0103FFEO tag $; /* Last tag IPR for 256KB Bcache constant BCTAG_S12KB_MAX equals %x0107F:EO tag S; /* Last tag IPR for S12KB Bcache constant BCTAG_2MB_~: equals %xOllFFFEO tag $; /* Last tag IPR for 2MB Bcache constant IPR_INCR equals %x20 prefix BCTAGS; /* Increment between ~cache tag IPR numbers PR19BCTAG_BITS structure fill prefix BCTAGS; FILL_~ bitfielcl length 9 fill tag SSi VALID ~!t!ield length 1 mask; /* Valid bit eRW) Orm-":.D bit!ield 1enr-h ~ mas):; /* OWnership bit (RW) ECC ritfield length 6 mask; /* ECC bits (RW) TAG bitfield length 15 mask; /* tag data (RW) end PR19B:TAG_E!TS; co!:.s-:.a:l': BCFLUSH equals %x01400000 tag $; /'" :irs"; c! 64:': :seache tag dea::'ocate !PRs (WO) cc!:.s-:.a!:.': BC=:.tiSF._:.2S::=_::;:'.>: eq~als Iu:Cl';:'ITEC ,:ag $; '''' :'ast aea:locC!':.e IPP. fe: ::':SY.E Beach. cc-ns':.a!:.t 5:=:':1SF._:56Y~_!-~: e;;,:a18 %xCl';:SITEC ':.ag; /'" :'a8t aeal.:"'c:a':.e :PR fe: 256Y.E Bcache c:·::,stant SC=:.tiSF._::ZF''=_~: eq-.!als %zOl-,"7ITEO tag $ :. :'ast c.alloca':e Ii'? fe: 5:'2F'': Bcache c~~s~a~~ EC:LUSE_:~~~~_X .q~&ls %x:::=::~O ~ag S; ~ ~as~ eea:l~~a~. ::~ fo= 2~ ~~a=hQ eC'!1s-:,a.::~ :::?,,_::='=?. . _~t:!.~s %z: ~ ;·=e.:iz =:=:'::5E:: / .. :::'==_=~E:':-:' :'_::~_L-:' =:a:!lQ d..2.::o:a-:.w =:::?. :l;:rJ:.. :-s A-14 Processor Register Definitions DIGITAL CONFIDENTIAL NVAX CPU Chip Functional Specification, Revision 1.0, February 1991 /* Ibox registers. 
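The BCTAG constants above fix the first Bcache tag IPR number, the increment between tag IPRs, and the last tag IPR for each cache size. As a worked check of that arithmetic, the sketch below (Python, not part of the SDL file) derives the number of tag IPRs per cache size from those constants. The values are those read from the listing with OCR digits normalized (for example `%x010lFFEO` taken as 0x0101FFE0); the helper names `bctag_ipr` and `bctag_count` are hypothetical.

```python
BCTAG_BASE = 0x01000000   # constant BCTAG: first Bcache tag IPR
BCTAG_IPR_INCR = 0x20     # constant IPR_INCR: increment between tag IPR numbers

# Last tag IPR for each supported Bcache size, as read from the listing.
BCTAG_MAX = {
    "128KB": 0x0101FFE0,
    "256KB": 0x0103FFE0,
    "512KB": 0x0107FFE0,
    "2MB": 0x011FFFE0,
}


def bctag_ipr(index: int, size: str) -> int:
    """Return the IPR number of the index-th tag entry for a given cache size."""
    ipr = BCTAG_BASE + index * BCTAG_IPR_INCR
    if index < 0 or ipr > BCTAG_MAX[size]:
        raise ValueError(f"tag index {index} out of range for {size} Bcache")
    return ipr


def bctag_count(size: str) -> int:
    """Number of tag IPRs implied by the base, increment, and maximum."""
    return (BCTAG_MAX[size] - BCTAG_BASE) // BCTAG_IPR_INCR + 1


counts = {size: bctag_count(size) for size in BCTAG_MAX}
# The 2MB case yields 65536 entries, consistent with the
# "First of 64K Bcache tag IPRs" comment on the BCTAG constant.
```

The increment of %x20 equals 32 bytes, so each count is simply the cache size divided by 32; this is an observation from the constants, not a statement taken from the listing.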
constant VMAR equals %xDO tag $; /* VIC memo%1-' adaress register PR~9VMAR BITS structure fill prefix VMAR$; FILL 1 bitfleld length 2 fill tag $$; LW bltfield length 1 mask; /* longword within quadword SUB BLOCK bitfield length 2 mask; /* Bubo-block indicator ROW-INDEX bitfield length 6 mask: /* cache row index ADDR bitfield length 21 mask: /* error address end PR19VMAR_BITS: constant '~AG equals %~l tag $: /* v~C tag register PR19~~AG ===5 str~cture fill prefix \~AG$: Y bitfield leng":h "lI'.ask; /'* da't.a valid. l:,~tS I:.p b!tfield length" mask: 1'* data parit::' bits ':? b:!.::'!ielc. lE.r.;th ~ mas;:; /." :::.:._:. bi~=ic:'c lc!'l;t.h 2 !!.:l ~ag _c:.~ ?a=!.':.::~ $5; b.!::. /-rr ::nusac b~ts (zero) =A~ t!~=i&:c lEn~~ :: mask; £::: ~?.. 19-:.~';'~_3::S; :::SR £;-.:a15 %;C'.3 -:.a; ~: c:::s~a::~ !:-:=-';:~::.:=" ~!:.:.-=:.~ !:~-:'=~E::= :a::;-:.::' : =.£S}.:; ~;~~_;~;;~~~::~~;~~t~ ;.a~~;: ::~?..?.. =!.':.!:'-=:~ :::£:::. =:. -;=:..~::. I"':.: :~: ::!'"'::.:.:: !-::.: s-:,a-:.us =-=;is-;.c: \~'t.;; !:.:: ;=c:iz ::S?S; F?:9:=S~~_=:=S :.s.::--.;-":.:: : ::..as}:; :":::=-::' : ::..as::; =!':=~~:= ::==~~ is::.=. :F. :S-:::S?_=:::: - for ~.-:: .:::a:::~ =~-:. (?~i;) ... =~":.!. ;:=:..~:. £==== '~::. :-; :a; ~!.=:..':~. -===:= {?:: ::::. --, I. ::. ::::=:._=::-E S-:':-':':-:"':'=i: ::..:: ;=-=!:..z =:::,.~; . 1::.':'::..::: :£::~:: ~ :as~-:; ==a::::' :::!s'::-~- =!~S ~:':=:? !. : !:..:: -;a; ::; !..::S::?Z==:-= =:.-:':=-.::=' :-a:-.:-::. ::. ::-. :s!:; ,'" :::.s-:.:=:. -:! !as,,: ==1.:::::' =:~~_: =~~!i~:: :e=;~~ =:-.:,s:_ ==a.::.:h ::':"s-:.c:::", -:a=.:.e /- =::::s:;' :'=~eh :is-=. a=:: :O~::::'E= LOAD H::STO?Y b!";!ielo len;-:.h 1 mask: I" w::!-:e neloi history ':.c a::ra::' :::':;SF._:P.: bi-:!ield :cr.;-:.:: : Ir~sk; :!/JSE_CTR l::'::!:'c:Q len;-:.~ :. :.ask; =::':1.:). bit!ielc lengt.h -: fill tag $5': :s:?-r_;':"GO?":,!:~ :":::.!i&l: lE---:~:: 1': :=ask; constant BPU ALGOR:THM equals end. 
PR19BPCR_BITS;  /* unused bits (must be zero)  /* branch prediction algorithm  %xFECA;  /* default value for BPU_ALGORITHM field

/* The following two registers are for testability and diagnostics use only.
/* They should not be referenced in normal operation.

constant BPC equals %xD6 tag $;         /* Ibox Backup PC (RO)
constant BPCUNW equals %xD7 tag $;      /* Ibox Backup PC with RLOG unwind (RO)

/* Mbox internal memory management registers.
/* These registers are for testability and diagnostics use only.
/* In normal operation, the equivalent architecturally-defined registers
/* should be used instead.

constant MP0BR equals %xE0 tag $;       /* Mbox P0 base register (RW)
constant MP0LR equals %xE1 tag $;       /* Mbox P0 length register (RW)
constant MP1BR equals %xE2 tag $;       /* Mbox P1 base register (RW)
constant MP1LR equals %xE3 tag $;       /* Mbox P1 length register (RW)
constant MSBR equals %xE4 tag $;        /* Mbox system base register (RW)
constant MSLR equals %xE5 tag $;        /* Mbox system length register (RW)
constant MMAPEN equals %xE6 tag $;      /* Mbox memory management enable (RW)

/* Mbox registers.
constant PAMODE equals %xE7 tag S; /* Mbox physical address mode (RW) PR19PAMODE BITS structure fill prefix PAMODES: MODE bitfield-length 1 mask: /* Addressing mode(l - 32bit addressing) (RW) constant PA_30 equals 0: /* 30-bit PA mode constant PA 3: equals 1; /* 32-bit PA mode FILL_l b1tfield-length 31 fill tag SS: end PR19PAMODE_BITS; constant ~~R equals ~xES tag $; /* Mbox memory management fault address (RO) constant MME?T£ equals %x~9 tag $; 1* MOo>: memory management fault PTE address (RO) MMESTS equals %xEA tag $; /~ Mbox memory management fault status (RO) :R:9!~S=S B:TS St:~cture !!:l prefix ~STSS; ~".: :::'-:.!'ie~: len;-:!: 1 =~ask; j . AC\- =a~1:: ::a.,;c ':.c- ler.;~r. viola':.ion ?=~_?Z: t!tfielc :e~;t~ : mask; j. A=V/T~-V fa~:t occurred on PPTE reference ccnsta~t ~; !:':'::.:i.:; :e:l.;-:'r.. : ::.:.s}:; /'fI: ~6=.a=E::=" had w:!.":.E: cr !rt~dify i:l't.er.~ :::::..:._: ;·:.::!:'c:c lE::;-:!l :.: :!:l -:.a.; S:~; :A:-:": l::!::.!ic~ci ~e::;-:.!'! : =.ask; /'fI' :a-:;~...:. ':.::~, one c! th. !e:'lowin;: ::::5-:::. . -; ::';"~:~;'..:"",,. -e;--.:a:s:; /..". 1"__ " :!s:..:.:"-= ::::s-:£::-: :;._:-:.:_=~. £;:':3.:s :::; :'''' :::-,,~ !a:.:.: ': ::::'S':.!.::::. =;'.:-:':_!'-~: _::.:.a.:'s~; ;Off !~-.:. :a::.:::. =:~:_: S?.. =:: ==::'S'::.::':' : _...... =~~=~6:: :&~r~~ ~-:!~-=:~ :£::~::. :: !!:: ~~= ! ; ; ~ :""""':'5;:; S::a:':·i-: =:.;.::'" : ! :.:::~ !:~':S : .. _=.r_.:'. -=~.:.~.:.s .;~: :': - -; :.:::z __ ;·a=:' ':::.. - . ...':. -- --- -=;:.:a..:s ~~~: ':.:.;' -; (Swc !·:s~=S c::ls-:a::~s ::c:::....·) _==:= ac.:.=css \?..:; :.~ :~:::E;:~_=::; S~=~:~~=~ =~:: ;=c!~x :.:=:: =~-:.=~o:::. :.-==.=~:: : =':S~':i ::~:.?~ ;.:.'==:.-=:: :.a::.==:: : =.:S}:i .... ::S:S$; :.-s;-:"s':..:= :"s :::::.a:' ::;.. -:.: :- '. :.s::= ;:a=:':.~· -===== i?..:; ~ ____ (1"::: : ~~~: . .;,.:. :::'-:::'-='::' :'-s!':;-::: : =;,:.si:; ,' ... E,.: :2::':::' as ~-!.::.c. y."'he:: ... ===:- c::-.:.==_= (?..O) C!..=, =~-:!:.. .. :c :_::~:: :: = . ask; ! .. 
    [illegible]                                 /* S5 command when TB parity error occurred (RO)
    FILL_1 bitfield length [illegible] fill tag $$;
    SRC bitfield length 3 mask;                 /* Source of original reference (see MSRC$ constants below) (RO)
end PR19TBSTS_BITS;

constant IREF_LATCH equals 6 prefix MSRC$;      /* Source of fault was IREF latch
constant SPEC_QUEUE equals 4 prefix MSRC$;      /* Source of fault was spec queue
constant EM_LATCH equals 0 prefix MSRC$;        /* Source of fault was EM latch

/* Mbox Pcache registers

constant PCADR equals %xF2 tag $;               /* Mbox Pcache parity error address (RO)
constant PCSTS equals %xF4 tag $;               /* Mbox Pcache parity error status (RW)

PR19PCSTS_BITS structure fill prefix PCSTS$;
    LOCK bitfield length 1 mask;                /* Register is locked due to an error (WC)
    DPERR bitfield length 1 mask;               /* Data parity error occurred (RO)
    RIGHT_BANK bitfield length 1 mask;          /* Right bank tag parity error occurred (RO)
    LEFT_BANK bitfield length 1 mask;           /* Left bank tag parity error occurred (RO)
    CMD bitfield length 5 mask;                 /* S6 command when Pcache parity error occurred (RO)
    PTE_ER_WR bitfield length 1 mask;           /* Hard error on PTE DREAD occurred (orig ref was WRITE) (WC)
    PTE_ER bitfield length 1 mask;              /* Hard error on PTE DREAD occurred (WC)
    FILL_1 bitfield length 21 fill tag $$;
end PR19PCSTS_BITS;

constant PCCTL equals %x[illegible]8 tag $;     /* Mbox Pcache control (RW)

PR19PCCTL_BITS structure fill prefix PCCTL$;
    [illegible] bitfield length 1 mask;         /* Enable for invalidate, D-stream read/write/fill
    [illegible] bitfield length 1 mask;         /* Enable for invalidate, I-stream read/fill (RW)
    [illegible] bitfield length 1 mask;         /* Enable force hit on Pcache references (RW)
    [illegible] bitfield length 1 mask;         /* Select left bank if 0, right bank if 1 (RW)
    [illegible] bitfield length [illegible] mask;   /* Enable parity checking (RW)
    [illegible] bitfield length 3 mask;         /* Mbox performance monitor mode (RW)
    [illegible]
end PR19PCCTL_BITS;

[illegible]

constant PCDAP equals %x01C00000 tag $;         /* First of 1024 Pcache data parity IPRs (RW)
constant [illegible] equals %x01C01FF8 tag $;   /* Last of 1024 Pcache data parity IPRs
constant IPR_INCR equals %x8 prefix PCDAP$;     /* Increment between Pcache data parity IPR numbers

PR19PCDAP_BITS structure fill prefix PCDAP$;
    DATA_PARITY bitfield length 8 mask;         /* Even byte parity for the addressed quadword (RW)
    FILL_1 bitfield length 24 fill tag $$;
end PR19PCDAP_BITS;

end PR19DEF;
end_module $PR19DEF;

A.1 Revision History

Table A-1: Revision History

Who          When        Description of change
Mike Uhler   14-Aug-90   Update to latest version
Mike Uhler   01-Dec-89   Initial version

Index

A
Addressing Modes
  General Register · 2-10
  PC-Relative · 2-11
Address Translation · 12-79
  P0 Process Space · 2-28
  P1 Process Space · 2-29
  System Space · 2-26
ALU · 8-18
ASTLVL
  See DEC Standard 032

B
Backup Cache
  See Bcache
Bcache
  Addressing 128KB Cache · 13-15
  Addressing 256KB Cache · 13-16
  Addressing 2MB Cache · 13-18
  Addressing 512KB Cache · 13-17
  Data Store ECC Matrix · 13-40
  Disabling · 13-108
  Enabling · 13-108
  Interface Pin Descriptions · 13-10
  IPR Access · 13-56, 13-87
  Organization · 13-4
  Pin Timing · [illegible], 20-11
  RAM Speeds · 13-5
  Tag and Index Interpretation · 13-4
  Tag Store ECC Matrix · 13-38
BCDECC · 13-65
BCEDECC · 13-75
BCEDIDX · 13-74
BCEDSTS · 13-71
BCETAG · 13-69
BCETIDX · 13-69
BCETSTS · 13-66
BCFLUSH · 13-89
BCTAG · 13-87
Boundary Scan Register · 19-16
BPCR · 7-60
BPU · 7-55
  Branch History Table · 7-56
  Branch Mispredict Processing · 7-58
  Branch Prediction Algorithm · 7-55
  Branch Prediction Sequence · 7-56
  Branch Queue · 7-57
  Branch Stall · 7-58
  PC Loads · 7-58
Branch Condition Evaluator · 8-34
Branch Prediction Unit
  See BPU
Branch Queue · 8-46
Byte Mask Generation · 12-36
  Unaligned References · 12-57

C
Cache Coherency · 13-99
CCTL · 13-61
CEFADR · 13-80
CEFSTS · 13-76
Chip Clocking
  Clock Domain Crossing · 17-9
  Clock Skew · 17-7
  Controlling Inter-Chip Skew · 17-7
  External Oscillator · 17-1
  Generation and Distribution · 17-3
  Global Clock Distribution · 17-5
  Global Clock Waveforms · 17-5
  Initialization · 17-9
  Inter-Clock Skew · 17-9
  NDAL Clocks · 17-7
  NDAL Signals · 17-9
  Rise and Fall Times · 17-7
  Section Clock Distribution · 17-5
  Section Clock Waveforms · 17-6
  Self Skew · 17-8
  Test Environment · 17-2
Chip Initialization · 16-1
  Cache · 16-3
  Cbox · 13-119
  Console · 16-2
  Ebox · 8-84
  Hardware and Microcode · 16-1
  Ibox · 7-64
  Mbox · 12-120
  Microsequencer · 9-24
Chip Overview
  Box and Section Description · 4-1
  Major Buses · 4-4
  The Cbox · 4-4
  The Ebox and Microsequencer · 4-3
  The Fbox · 4-3
  The Ibox · 4-2
  The Mbox · 4-4
Chip Reset · 17-10
Clocking
  See Chip Clocking
Complex Specifier Unit
  See CSU
Console Halt · 2-40, 15-19
  Halt Codes · 15-19
CPUID · 2-44, 18-2, 18-7, 18-11
CSRD
  See DEC Standard 032
CSRS
  See DEC Standard 032
CSTD
  See DEC Standard 032
CSTS
  See DEC Standard 032
CSU · 7-40
  Branch Mispredict Effects · 7-52
  Ibox IPR Transactions · 7-53
  Microcode Control · 7-40
  Microcode Restrictions · 7-53
  Pipeline · 7-41
  RLOG · 7-51

D
Data Types · 2-6 to 2-8
Destination Queue · 7-32, 8-44

E
ECR · 8-81
Electrical Characteristics
  AC Characteristics · 20-7
  AC Conditions of Test · 20-7
  DC Characteristics · 20-1
  Maximum Ratings · 20-1
  Pin Capacitance · 20-4
  Pin Driver Impedance · 20-3
  Pin Levels · 20-4
  Power Dissipation Across Voltage and Cycle Time · 20-2
Error Handling and Recovery · 15-3
  Cache and Memory Errors · 15-9
  Cache Coherence · 15-10
  Error Analysis · 15-7
  Error Recovery · 15-8
  Retry · 15-17
  State Collection · [illegible]
Errors
  Bcache · 13-111
  Dstream Memory · [illegible]
  Istream Memory · 7-64
  Pcache Parity Error · 12-104
  TB Parity Error · 12-103
Error Transition Mode
  See ETM
ESP
  See DEC Standard 032
ETM · 13-104
Exceptions · 2-35
  Arithmetic · 2-36
  Ebox Handling · 8-67 to 8-72
  Emulated Instruction · 2-38
  Fbox Detected · 8-56
  Ibox Detected · 8-51
  Machine Check · 2-40
  Memory Management · 2-37
  Reserved Addressing Mode · 7-65
  Reserved Opcode · 7-65
  Vector · 2-40
Exception Stack Frame
  General · 2-33
  Minimum · 2-33

F
Fbox Destination Scoreboard · 8-54
Fbox Disabled Mode · 8-58
Fbox Result Handling · 8-53
Fbox Stage 4 Bypass · 11-63
Field Queue · 8-48

G
GPR · 2-4

H
Hard Error Interrupts · 15-49
  Event Descriptions · 15-51 to 15-56
  Parse Tree · 15-50 to 15-51
  Stack Frame · 15-49

I
I/O Space Read Synchronization · [illegible], 12-33
IAK14 · 10-3, 10-14
IAK15 · 10-3, 10-14
IAK16 · 10-3, 10-14
IAK17 · 10-3, 10-14
Ibox IPR Access · 8-66
IBU · 7-19
  Branch Displacement Processing · 7-25
  DL Stall · 7-24
  Ebox Assist Processing · 7-25
  Exception and Error Processing · 7-28
  FPD Processing · 7-29
  Index Mode Specifiers · 7-27
  Instruction Context · 7-21
  Instruction Parse Completion · 7-28
  Loading New Opcode · 7-27
  Operand Access Types · 7-23
  PC and Delta PC · 7-24
  Quadword Immediate Specifiers · 7-26
  Reserved Addressing Modes · 7-26
  Reserved Opcodes · 7-28
  Specifier Identification · 7-21
  SPEC_CTRL Bus · 7-24
  Stop and Restart Conditions · 7-29
  V Access Mode Operands · 7-28
ICCS · 10-6, 10-13
ICR · 10-6, 10-13
ICSR · 7-16
IIU · 7-30
  Issue Stall · 7-30
  PC Queue and PC Loads · 7-31
Initialization
  See Chip Initialization
Instruction Burst Unit
  See IBU
Instruction Context · 7-21, 8-38, 9-22
Instruction Issue Unit
  See IIU
Instruction Parsing · 7-17
Instruction Queue · 9-20
Instruction Set · 2-11 to 2-24
INT.SYS Register · 8-30
Internal Processor Registers
  See IPRs
Internal Scan Register
  Cbox · 13-116
  Chip · 19-4
  Ebox · 8-87
  Ibox · [illegible]
  Mbox · 12-122
  Microsequencer · 9-26
Interrupts · 2-33
Interrupt State Register · 10-9
Interrupt Summary · 10-10
Interrupt Vector · 10-3
Interval Timer · 10-5
INTSYS · 10-12, 10-14
IORESET
  See DEC Standard 032
IPL · 2-34
IPRs
  ASTLVL
    See DEC Standard 032
  BCDECC · 13-65
  BCEDECC · 13-75
  BCEDIDX · 13-74
  BCEDSTS · 13-71
  BCETAG · 13-69
  BCETIDX · 13-69
  BCETSTS · 13-66
  BCFLUSH · 13-89
  BCTAG · 13-87
  BPCR · 7-60
  CCTL · 13-61
  CEFADR · 13-80
  CEFSTS · 13-76
  CPUID · 2-44, 18-2, 18-7, 18-11
  CSRD
    See DEC Standard 032
  CSRS
    See DEC Standard 032
  CSTD
    See DEC Standard 032
  CSTS
    See DEC Standard 032
  ECR · 8-81
  ESP
    See DEC Standard 032
  Full listing · 2-52 to [illegible]
  IAK14 · 10-3, 10-14
  IAK15 · 10-3, 10-14
  IAK16 · 10-3, 10-14
  IAK17 · 10-3, 10-14
  ICCS · 10-6, 10-13
  ICR · 10-6, 10-13
  ICSR · 7-16
  INTSYS · 10-12, 10-14
  IORESET
    See DEC Standard 032
  IPL · 2-34
  ISP
    See DEC Standard 032
  KSP
    See DEC Standard 032
  MAPEN · 2-25
  MCESR · 15-22
  MMAPEN · 12-40
  MMEADR · 12-41
  MMEPTE · 12-41
  MMESTS · 12-41, 12-95
  MP0BR · 12-38
  MP0LR · 12-39, 12-47
  MP1BR · 12-39, 12-47
  MP1LR · 12-39, 12-47
  MSBR · 12-39
  MSLR · 12-40, 12-47
  MTBPTE · 12-54
  MTBTAG · 12-52
  NEDATHI · 13-86
  NEDATLO · 13-86
  NEICMD · 13-85
  NEOADR · 13-83
  NEOCMD · 13-84
  NESTS · 13-81
  NICR · 10-8, 10-13
  P0BR · 2-29
  P1BR · 2-30
  P1LR · 2-30
  PAMODE · 2-4, 12-40
  PCADR · 12-43
  PCBB · 2-46
  PCCTL · 12-44, 12-71
  PCDAP · 12-46
  PCSCR · 8-80
  PCSTS · 12-43, 12-107
  PCTAG · 12-45
  PME · 18-7
  PMFCNT · 18-8
  RXCS
    See DEC Standard 032
  RXDB
    See DEC Standard 032
  SAVPC · 2-40, 15-19
  SAVPSL · 2-40, 15-19
  SBR · 2-26
  SCBB · 2-41
  SID · 2-44
  SIRR · 2-35, 10-13
  SISR · 2-35, 10-13
  SLR · 2-26
  SSP
    See DEC Standard 032
  TBADR · 12-42
  TBCHK
    See DEC Standard 032
  TBIA · 2-25, 12-55
  TBIS · 2-25, 12-54
  TBSTS · 12-42, 12-106
  TODR
    See DEC Standard 032
  TXCS
    See DEC Standard 032
  TXDB
    See DEC Standard 032
  USP
    See DEC Standard 032
  VAER
    See DEC Standard 032
  VDATA · 7-15
  VMAC
    See DEC Standard 032
  VMAR · 7-14
  VPSR
    See DEC Standard 032
  VTAG · 7-15
  VTBIA
    See DEC Standard 032
ISP
  See DEC Standard 032

J
JTAG Test Port · 19-7

K
Kernel Stack Not Valid · 15-87
  Stack Frame · 15-87
KSP
  See DEC Standard 032

L
LFSR [illegible]

M
Machine Check · 2-40, 15-22
  Codes · 15-24
  Event Descriptions · [illegible]
  Parse Tree · 15-25 to 15-32
  Stack Frame · 15-22
MAPEN · 2-25
Mask Processing Unit
  See MPU
Mbox Commands · 12-23
Mbox Reference Order Restrictions · 12-25
MCESR · 15-22
MD Bus Rotator · 12-20
Memory Management Probe Status Encodings · 12-51
Microcode Format
  Ebox · 6-1 to 6-4, 8-4 to [illegible]
  Ibox CSU · 6-4 to 6-5
  Ibox IROM and Control PLAs · 6-5 to 6-7
Microcode Restrictions
  Ebox · 8-91 to 8-96
Microstack · 9-22
Microtest Fields · 8-39
Microtraps · 9-13 to 9-18
MMAPEN · 12-40
MMEADR · 12-41
MMEPTE · 12-41
MMESTS · 12-41, 12-95
MMGT.MODE Register · 8-30
MP0BR · 12-38
MP0LR · 12-39, 12-47
MP1BR · 12-39, 12-47
MP1LR · 12-39, 12-47
MPU · 8-32
MSBR · 12-39
MSLR · 12-40, 12-47
MTBPTE · 12-54
MTBTAG · 12-52

N
NDAL
  Arbitration · 3-18
  Cache Coherency · 3-54
  Clear Write Buffer · 3-55
  Clocking · 3-18
  Description · 3-15
  Errors · 3-57 to 3-63
  Field Description · 3-27 to 3-37
  Information Transfer · 3-27
  Initialization · 3-64
  Interlock Support · 3-56
  Interrupts · 3-55
  Terms · 3-17
  Transactions · 3-38 to 3-53
NEDATHI · 13-86
NEDATLO · 13-86
NEICMD · 13-85
NEOADR · 13-83
NEOCMD · 13-84
NESTS · 13-81
NICR · 10-8, 10-13

O
Operand Queue Unit
  See OQU
Operand Specifier Processing · 7-32
OQU · 7-32
  Destination Queue · 7-32
  Destination Queue Interface · 7-37
  MD Allocation · 7-39
  Queue Entry Allocation · 7-38
  Source Queue · 7-32
  Source Queue Interface · 7-34

P
P0BR · 2-29
P0LR · 2-29
P1BR · 2-30
P1LR · 2-30
Page Table Entry Format · 2-31
PAMODE · 2-4, 12-40
Parallel Port
  Observe Cbox/Mbox · 12-123, 13-115
    Cbox Tag Store Command · 13-116
    Mbox MD Destination · 12-123
    Mbox MME State · 12-123
  Observe Ibox · [illegible]
  Observe MAB · 8-87, 9-25
  Observe Mbox · 12-123
    S5 Command · 12-23
    S5 Reference Source · 12-123
  Operating Modes · 19-4
Patchable Control Store
  Loading · 9-3
  Overview · 9-3
Pcache · 12-21, 12-70
  Addressing · 12-70
  Address Redundancy Mapping · 12-77
  IPR Access · 12-75
  Logical Organization · 12-70
  Redundancy Logic · 12-77
  Replacement Algorithm · 12-74
PCADR · 12-43
PCB · 2-48
PCBB · 2-46
PCCTL · 12-44, 12-71
PCDAP · 12-46
PCSCR · 8-80
PCSTS · 12-43, 12-107
PCTAG · 12-45
Performance Monitoring Facility
  Base Address · 18-2
  Block Diagram · 18-8
  Cbox Event Selection · 13-61, 13-118, [illegible]
  Configuring · 18-3
  Ebox Event Selection · 8-83, 18-4
  Enabling and Disabling · 18-6
  Ibox Event Selection · 7-16, 18-4
  Mbox Event Selection · 12-44, 12-126, [illegible]
  Memory Data Structure Format · 18-2
  Memory Data Structure Updates · 18-2
PFQ · 7-17
Physical Address Space · 2-2, 12-81
Pin Description
  Cache Interface Pins · 3-10 to [illegible]
  Clocking Pins · 3-8 to 3-9
  Clocks · 20-22
  Interrupt and Error Pins · 3-9 to 3-10
  Interrupts · 20-24
  NDAL · 3-4 to 3-7, 20-9
  Reset · 20-23
  Test · 20-24
  Test Pins · 3-13 to 3-14
Pinout · 3-1 to 3-3
Pipeline
  Exceptions · 5-16 to 5-21
  Fundamentals · 5-1 to 5-6
  Microtraps, Exceptions, and Interrupts · [illegible]
  NVAX Overview · 5-6 to 5-11
  Stalls · 5-11 to 5-16, 8-10
PME · 18-7
PMFCNT · 18-8
PMFCNT Register · 8-37, 8-91
Population Counter · 8-36
Power Failure Interrupts · 15-48
Prefetch Queue
  See PFQ
Primary Cache
  See Pcache
Process Control Block · 2-48
PSL · 2-5, [illegible]
PTE · 12-83

Q
[illegible]

R
Register File · 8-13
  Bypass · 8-14
  Valid, Fault, and Error Bits · 8-16
Reset
  See Chip Reset
Result Bypass · 8-26
Retire Queue · 8-47
RMUX · 8-23
RXCS
  See DEC Standard 032
RXDB
  See DEC Standard 032

S
S3 Stall Timeout · [illegible]
S5 Reference Packet
  [illegible] · 12-4
  Address · 12-4
  Command · 12-4
  Data · 12-4
  Data Length · [illegible]
  Reference Destination · 12-4
  Reference Qualifiers · 12-5
  Tag · [illegible]
S5 Reference Source
  Arbitration · [illegible], 12-28
  Cbox Latch · 12-16
  EM Latch · 12-9
  Iref Latch · 12-6
  MME Latch · 12-12
  PA Queue · 12-17
  Retry Dmiss Latch · 12-14
  Spec Queue · 12-8
  VAP Latch · 12-11
S6 Reference Packet
  Address · 12-5
  Byte Mask · 12-5
  Command · 12-5
  Data · 12-5
  Reference Destination · 12-5
  Reference Qualifiers · 12-5
SAVPC · 2-40, 15-19
SAVPSL · 2-40, 15-19
SBR · 2-26
SBU · 7-54
SCB · 2-41
SCBB · 2-41
Scoreboard Unit
  See SBU
SC Register · 8-29
Serial Test Port · [illegible]
Shifter · 8-21
SID · 2-44
SIRR · 2-35, 10-13
SISR · 2-35, 10-13
SLR · 2-26
Soft Error Interrupts · 15-57
  Event Descriptions · 15-69 to 15-86
  Parse Tree · 15-58 to 15-69
  Stack Frame · 15-57
Source Queue · 7-32, 8-43
SSP
  See DEC Standard 032
Stalls
  Ebox · 8-72 to 8-77
State Flags · 8-30
System Control Block · 2-41
  Vector · 2-41

T
TBADR · 12-42
TBCHK
  See DEC Standard 032
TBIA · 2-25, 12-55
TBIS · 2-25, 12-54
TBSTS · 12-42, 12-106
Timeout Counters · 13-46
TODR
  See DEC Standard 032
TXCS
  See DEC Standard 032
TXDB
  See DEC Standard 032

U
Unaligned Reference Processing · 12-55
USP
  See DEC Standard 032

V
VAER
  See DEC Standard 032
VA Register · 8-25
VAX Restart Bit · 8-37
VDATA · 7-15
Vector Instruction Support Limitations · 14-3
VISA · [illegible]
VIC · 7-5
  Bypass · 7-9
  Control · 7-7
  Control and Error Registers · 7-14 to 7-16
  E%STOP_IBOX_H Effects · 7-11
  Enable · 7-13
  Exceptions and Errors · 7-10
  Fills · 7-8
  Flushing · 7-13
  Hits Under Miss · 7-10
  PC Load Effects · 7-10
  Performance Monitoring Hardware · 7-16
  Prefetch Start Conditions · 7-12
  Prefetch Stop Conditions · 7-12
  Reads · 7-8
  Writes · 7-9
Virtual Address Space · 2-1, 12-80
Virtual Instruction Cache
  See VIC
VMAC
  See DEC Standard 032
VMAR · 7-14
VPSR
  See DEC Standard 032
VTAG · 7-15
VTBIA
  See DEC Standard 032