Document: VAX 9000 Family IBox Technical Description
Order Number: EK-KA90I-TD
Revision: 001
Pages: 182

VAX 9000 Family IBox Technical Description

Order Number EK-KA90I-TD-001

digital equipment corporation
maynard, massachusetts

DIGITAL INTERNAL USE ONLY

First Edition, May 1990

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.

Copyright © Digital Equipment Corporation 1990
All Rights Reserved.
Printed in U.S.A.

The postpaid Reader's Comment Card included in this document requests the user's critical evaluation to assist in preparing future documentation.

FCC NOTICE: The equipment described in this manual generates, uses, and may emit radio frequency energy. The equipment has been type tested and found to comply with the limits for a Class A computing device pursuant to Subpart J of Part 15 of FCC Rules, which are designed to provide reasonable protection against such radio frequency interference when operated in a commercial environment. Operation of this equipment in a residential area may cause interference, in which case the user at his own expense may be required to take measures to correct the interference.

The following are trademarks of Digital Equipment Corporation: BI, CI, DEC, DECmate, DECUS, DECwriter, DHB32, DIBOL, DRB32, EDT, KDB50, KDM, KLESI, MASSBUS, MicroVAX, NI, PDP, P/OS, Professional, RA, Rainbow, RD, RSTS, RSX, RT, RV20, RV64, TA, TK, ULTRIX, UNIBUS, VAX, VAX C, VAX FORTRAN, VAX MACRO, VAXBI, VAXcluster, VAXELN, VMS, VT, Work Processor, XMI.

® IBM is a registered trademark of International Business Machines Corporation.
® Intel is a registered trademark of Intel Corporation.
TM Hubbell is a trademark of Harvey Hubbell, Inc.
® Motorola is a registered trademark of Motorola, Inc.

This document was prepared and published by Educational Services Development and Publishing, Digital Equipment Corporation.

Contents

About This Manual

1 General Description
  1.1 IBox Introduction
  1.2 Basic Hardware Implementation
    1.2.1 Program Counter Unit
    1.2.2 Virtual Instruction Cache
    1.2.3 Instruction Buffer
    1.2.4 Branch Prediction
      1.2.4.1 Branch Prediction Cache
    1.2.5 Multiple Specifier Decode Unit
    1.2.6 Specifier Handlers
      1.2.6.1 Complex Specifier Unit
      1.2.6.2 Short Literal Unit
      1.2.6.3 Free Pointer Logic
    1.2.7 Read/Write Scoreboards
  1.3 Physical Organization
    1.3.1 VIC MCU
    1.3.2 XBR MCU
    1.3.3 OPU MCU
  1.4 Pipeline Overview
  1.5 IBox Interfaces
    1.5.1 MBox Interface
      1.5.1.1 Instruction Buffer Port
      1.5.1.2 OPU Port
    1.5.2 EBox Interface
      1.5.2.1 IBox-to-EBox Interface
      1.5.2.2 EBox-to-IBox Interface
    1.5.3 VBox Interface
    1.5.4 Service Processor Interface

2 Program Counter
  2.1 [title illegible]
  2.2 Major PCs
    2.2.1 Prefetch PC
    2.2.2 Prefetch PC Data Path
      2.2.2.1 Prefetch PC Control
    2.2.3 Target PC
    2.2.4 Decode PC
    2.2.5 Decode PC Data Path
      2.2.5.1 Delta PC
    2.2.6 Branch PC
    2.2.7 Branch PC Data Path
    2.2.8 Unwind PC
    2.2.9 Unwind PC Data Path
  2.3 Cache Control
    2.3.1 BPC Control
    2.3.2 VIC Control
  2.4 PCU Error Detection
    2.4.1 PCVC Error Logic
    2.4.2 PCBP Error Logic
    2.4.3 PCLO Error Logic
    2.4.4 PCHI Error Logic
  2.5 PCU Inputs
  2.6 PCU Outputs

3 Instruction Fetch
  3.1 VIC
    3.1.1 [title illegible]
    3.1.2 VIC Data Write
      3.1.2.1 VIC Address Selection
      3.1.2.2 VIC Data
    3.1.3 VIC Tag Write
      3.1.3.1 Tag Write
      3.1.3.2 Quadword Valid Write
      3.1.3.3 Block Valid Write
      3.1.3.4 Tag Parity
    3.1.4 VIC Flush
    3.1.5 VIC Data Read
      3.1.5.1 VIC STRAM Bypass
    3.1.6 VIC Parity Coverage
    3.1.7 Disabling VIC Hits
  3.2 Instruction Buffer
    3.2.1 IBEX2
    3.2.2 IBEX
      3.2.2.1 IBEX Valid Count
    3.2.3 [title illegible]
      3.2.3.1 IBEX2 Rotate Data
    3.2.4 [title illegible]
    3.2.5 [title illegible]
    3.2.6 [title illegible]
      3.2.6.1 Simple Decode
      3.2.6.2 Branch Decode
      3.2.6.3 YREG Decode
      3.2.6.4 Short Literal Decode
      3.2.6.5 Register Mode Decode
    3.2.7 Instruction Buffer Parity
  3.3 Instruction Buffer Interface
    3.3.1 Instruction Buffer Requests
      3.3.1.1 Aborting Requests
      3.3.1.2 Page Faults

4 Instruction Decode
  4.1 XBAR
    4.1.1 DRAM
    4.1.2 XBRAM
    4.1.3 Simple Decode Logic
    4.1.4 Decode Tree Logic
    4.1.5 Request Logic
    4.1.6 Specifier Count Logic
    4.1.7 Shift Count Logic
      4.1.7.1 FD Shift Opcode
    4.1.8 Fork Logic
    4.1.9 XBAR Displacement Data Path
      4.1.9.1 Displacement
      4.1.9.2 Extended Immediate Mode (X8F) Detection
      4.1.9.3 XREG
      4.1.9.4 YREG
    4.1.10 XBAR Short Literal Data Path
      4.1.10.1 Short Literal Data Select
      4.1.10.2 Short Literal Specifier Number
      4.1.10.3 Short Literal Valid
    4.1.11 XBAR Source and Destination Logic
    4.1.12 XBAR Source 1 Data Path
    4.1.13 XBAR Source 2 Data Path
    4.1.14 XBAR Destination
      4.1.14.1 Destination Valid
      4.1.14.2 Destination Register Valid
    4.1.15 Register Masks
      4.1.15.1 Read Mask
      4.1.15.2 Write Mask
      4.1.15.3 Implied Mask
    4.1.16 Intra-Instruction Read Conflicts
    4.1.17 XBAR Stalls
  4.2 Branch Prediction
    4.2.1 Primary Predictions
      4.2.1.1 Primary Hits
      4.2.1.2 Tag Match Enable
    4.2.2 Demote
    4.2.3 Secondary Predictions
    4.2.4 BPC Correction
    4.2.5 BPC Unwind
  4.3 PCU Microcode
    4.3.1 PCU Microaddress
    4.3.2 PCU Microword
    4.3.3 Writing the BPC
      4.3.3.1 BPC Write Enable
      4.3.3.2 Cache Tag Write
      4.3.3.3 Instruction Length Field Write
      4.3.3.4 Prediction PC Write
      4.3.3.5 BP Displacement Write
      4.3.3.6 BP Prediction Bit
      4.3.3.7 BPC Address Selection

5 Specifier Decode
  5.1 Overview
    5.1.1 Stall Logic
  5.2 Complex Specifier Unit
    5.2.1 OPUA Data Path
      5.2.1.1 AMUX Inputs
      5.2.1.2 BMUX Inputs
      5.2.1.3 Adder
    5.2.2 OPUB Data Path
    5.2.3 Current PC Generation
      5.2.3.1 OPUA Current PC[15:00]
      5.2.3.2 OPUB Current PC[31:16]
    5.2.4 CSU Microcode
      5.2.4.1 CSU Microaddress
      5.2.4.2 CSU Microword
    5.2.5 CSU Stalls
      5.2.5.1 Scoreboard Stalls
      5.2.5.2 Branch Under Branch Stall
      5.2.5.3 AUTOxx Under Branch Stall
      5.2.5.4 OPU Port Grant Wait Stall
  5.3 Short Literal Unit
    5.3.1 Short Literal Processing
    5.3.2 Integer Expansion
    5.3.3 Floating-Point Expansion
    5.3.4 Outputs to the EBox Interface
      5.3.4.1 [illegible] Order
    5.3.5 [title illegible]
      5.3.5.1 Source List Full
      5.3.5.2 SLU Stalled
      5.3.5.3 EBox Interface Output Stall
    5.3.6 Parity Coverage and Errors
  5.4 Free Pointer Logic
    5.4.1 Source 1 Pointer
    5.4.2 Source 2 Pointer
    5.4.3 Free Pointer
      5.4.3.1 Free Pointer Initialization
    5.4.4 Destination Pointer
      5.4.4.1 Register Destination
      5.4.4.2 Destination Valid
      5.4.4.3 Destination Memory
  5.5 Operand Control Unit
    5.5.1 Read/Write Masks
      5.5.1.1 Mask Valid
      5.5.1.2 Instruction Done
      5.5.1.3 [title illegible]
    5.5.2 Read/Write Mask Parity
    5.5.3 Flushes
    5.5.4 [title illegible]
      5.5.4.1 Scoreboard Stall
      5.5.4.2 Mask Stall
  5.6 OPU Port Interface
  5.7 IBox-to-EBox Interface
  5.8 EBox-to-IBox Interface
  5.9 VBox Interface

6 IBox Error Descriptions
  6.1 IBox Error Registers
  6.2 Fetch Error Register 1
  6.3 Fetch Error Register 2
  6.4 Decode Error Register 1
  6.5 XBAR Decode Error Register
  6.6 Specifier Error Register 1
  6.7 Specifier Error Register 2

A IBox Input and Output Listing

Index

Figures

1-1 Basic IBox Block Diagram
1-2 Planar Module Layout
1-3 VIC MCU Content
1-4 XBR MCU Content
1-5 OPU MCU Content
1-6 IBox Pipeline: No Stalls
1-7 IBox Pipeline: CSU Stall
1-8 IBox Pipeline: Branch Taken, Secondary Prediction
1-9 IBox Pipeline: BP Cache Hit, Prediction Taken
2-1 PCU Data Path
2-2 Prefetch PC Data Path
2-3 Target PC Sources
2-4 Decode PC Data Path
2-5 Branch PC Data Path
2-6 Unwind PC Data Path
2-7 PCU Cache Control
2-8 PCVC Error Logic
2-9 PCBP Error Logic
2-10 PCLO Error Logic
2-11 PCHI Error Logic
3-1 VIC
3-2 VIC Data Write
3-3 VIC Address Selection
3-4 VIC Tag Write
3-5 VIC Match Logic
3-6 Instruction Buffer
3-7 Rotator
3-8 IBEX2 Rotate Data
3-9 Simplified Merger
3-10 [title illegible]
3-11 IBUF Data
3-12 Simple Decode
3-13 Instruction Buffer Interface
3-14 Instruction Buffer Request
4-1 XBAR Block Diagram
4-2 DRAM Logic
4-3 Simple Decode Logic
4-4 XBAR Decode Trees
4-5 R2BW Decode Tree
4-6 Request Logic
4-7 Specifier Count Logic
4-8 Shift Counts
4-9 XBAR Displacement
4-10 Short Literal Logic
4-11 Source and Destination Logic
4-12 XBAR Source 1 Logic
4-13 XBAR Source 2 Logic
4-14 XBAR Destination
4-15 XBAR Read and Write Masks
4-16 IRC Detection: Read Mask
4-17 IRC Detection: IRC Mask
4-18 BPC Organization
4-19 BPC Compare: Hit
4-20 BPC Compare: Demote
4-21 BP Tag Match Enable
4-22 Branch Bias Logic
4-23 PCU Microword Format
4-24 BPC Write Enable
4-25 BPC Tag Write
4-26 BP Instruction Length
4-27 BP Prediction PC Write
4-28 BP Displacement Write
4-29 BP Prediction Bit Write
4-30 BPC Address Selection
5-1 OPU Stall Logic
5-2 CSU Organization
5-3 OPUA
5-4 OPUB
5-5 CSU Current PC (High Slice)
5-6 CSU Current PC (Low Slice)
5-7 CSU Microaddress Format
5-8 CSU Microword Format
5-9 CSU Microword Select
5-10 Input and Outputs of the Short Literal Unit
5-11 Short Literal Unit Block Diagram
5-12 SLU Integer Expansion
5-13 SLU Floating Point Literal Format
5-14 SLU F-Floating Expansion
5-15 SLU D-Floating Expansion
5-16 SLU G-Floating Expansion
5-17 SLU H-Floating Expansion
5-18 Free Pointer Logic
5-19 FPL Source 1 Pointer Logic
5-20 FPL Source 2 Pointer Logic
5-21 Free Pointer
5-22 Destination Pointer Logic
5-23 [title illegible]
5-24 OCTL Read/Write Masks
5-25 OCTL Mask Correction Logic
5-26 OCTL Flush Logic
5-27 OCTL Mask Stall Logic
5-28 OPU Port Interface
5-29 IBox-to-EBox Interface
5-30 EBox-to-IBox Interface
5-31 VBox Interface
6-1 Fetch Error Register 1
6-2 Fetch Error Register 2
6-3 Decode Error Register 1
6-4 XBAR Decode Error Register
6-5 Specifier Error Register 1
6-6 Specifier Error Register 2

Tables

2-1 [title illegible]
2-2 PCBP Errors
2-3 PCLO Errors
2-4 PCHI Errors
3-1 Sample Merge Select
3-2 Instruction Buffer Interface Signals
4-1 Case Output
4-2 XBAR Decode Trees
4-3 PCU Microaddress Descriptions
4-4 PCU Microword Field Descriptions
5-1 CSU Microaddress Descriptions
5-2 CSU Microword Field Descriptions
5-3 Modulo 6 Subtraction Logic
5-4 Register Valid Fields
5-5 EBox Flush Codes
5-6 OPU Port Interface Signals
5-7 IBox-to-EBox Interface Signals
5-8 EBox-to-IBox Interface Signals
5-9 VBox Interface Signals
6-1 Fetch Error Register 1
6-2 Fetch Error Register 2
6-3 Decode Error Register 1
6-4 XBAR Decode Error Register
6-5 Specifier Error Register 1
6-6 Specifier Error Register 2
A-1 IBox-VIC Signals
A-2 IBox-XBR Signals
A-3 IBox-OPU Signals

About This Manual

This manual describes the functions of the IBox in the VAX 9000 family system. It is a reference manual for Customer Service personnel as well as a training resource for Educational Services.

Intended Audience

The content, scope, and level of detail in this manual assume that the reader:

• Is familiar with the VAX architecture and VMS operating system at the user level
• Has experience maintaining midrange and large VAX systems

Manual Structure

This manual has six chapters, an appendix, a glossary of IBox terms, and an index.

Chapter 1 contains an introduction to the IBox, describing the major features, functions, and physical organization of the unit. Chapter 2 describes the program counter. Chapters 3, 4, and 5 provide a detailed description of the functions of the IBox, with each chapter emphasizing one of the three main pipeline stages of the unit.
Chapter 6 is a summary of the errors that can be generated in the IBox. Appendix A provides a listing of the input and output signals of the three multichip units (MCUs) that comprise the IBox.

1 General Description

This chapter provides an overview of the VAX 9000 family IBox, including descriptions of the major hardware features, physical characteristics, and interbox interfaces.

1.1 IBox Introduction

The IBox is an independent functional unit that fetches and decodes instructions and their specifiers from the MBox and passes them to the EBox for execution. The EBox is provided with data and control functions so that it can execute several instructions in one cycle.

The IBox provides all required instruction data to the EBox: source, destination, PC, fork address, as well as pointers to the source data and destination. The instruction data is stored in the EBox source list or general-purpose registers (GPRs). In the case of a memory source operand, the IBox generates the operand address and passes it to the MBox requesting a memory read operation. The MBox prefetches the data and then writes it into the EBox source queue.

Using the source pointers provided by the IBox, the EBox accesses its source list or GPRs and executes the instruction. Depending on the destination, the EBox writes the results to a GPR or transfers the results to the MBox to be subsequently written to memory. In effect, the EBox deals only with data (no opcodes or operand specifiers).

In addition, the IBox can decode and store several instructions ahead of the EBox. However, given the capabilities of the EBox and MBox, it is difficult for the IBox to remain several instructions ahead.

1.2 Basic Hardware Implementation

This section describes the major new hardware implementations incorporated into the IBox. Figure 1-1 shows the basic IBox block diagram.

1.2.1 Program Counter Unit

The program counter unit (PCU) is responsible for directing the I-stream that the IBox will decode, and for generating the PC address used by the EBox and MBox. The PCU provides the control logic for the two IBox caches: the virtual instruction cache (VIC) and the branch prediction cache (BPC). In addition, the PCU controls the secondary branch prediction mechanism.

Figure 1-1 Basic IBox Block Diagram

1.2.2 Virtual Instruction Cache

The virtual instruction cache (VIC) is a virtually addressed, 8 Kbyte, direct-mapped, one-way associative cache that reduces the number of I-stream requests issued to the MBox. By flushing the VIC on every REI, the IBox can ignore writes to memory. Having a virtually addressed instruction cache eliminates the need for address translation and translation logic. Because the VIC has its own MBox request port, it is refilled from the MBox data cache, rather than memory.

1.2.3 Instruction Buffer

The IBox incorporates a 25-byte instruction buffer that latches the I-stream. The instruction buffer is partitioned into a 9-byte instruction buffer (IBUF), an 8-byte extended instruction buffer (IBEX), and an additional 8-byte extended instruction buffer (IBEX2). The contents of the nine IBUF bytes are latched, decoded, and shifted.
The remaining 16 bytes of IBEX and IBEX2 contain additional prefetched I-stream from the VIC, which is passed to the IBUF as required.

1.2.4 Branch Prediction

When branch instructions are encountered in the I-stream, the IBox predicts the direction the I-stream will follow (redirects the I-stream by taking a branch or continues decoding sequential I-stream by not taking the branch). When initially encountering a branch, the IBox decides to take or not take the branch by accessing the logic that contains fixed predictions for each branch instruction. Branch predictions are validated by the EBox and the correct predictions are stored in a branch prediction cache that is accessed when the same branch is subsequently encountered.

1.2.4.1 Branch Prediction Cache

To minimize the idle time spent flushing and refilling the pipeline after every branch instruction, the instruction decode stage includes a branch prediction cache (BPC). This 1K virtual cache increases performance by storing information about the branch validity and the target address. As a branch instruction is being decoded, it is referenced, in parallel, in the BPC and a prediction is made whether to take the branch. The IBox uses the cached target address to redirect the instruction fetch stage to the new I-stream if the branch is taken. For performance reasons, the BPC is never flushed.

1.2.5 Multiple Specifier Decode Unit

The multiple specifier decode unit (XBAR, or "crossbar") is implemented as a set of multiplexers that provide the capability of simultaneously decoding up to three operand specifiers. The I-stream is presented to the XBAR nine bytes at a time from the IBUF. The actual number of specifiers decoded depends on the specifier type. The XBAR can decode up to three specifiers (for example, two simple specifiers and one complex specifier, or three simple specifiers). Register mode and short literal specifiers are considered simple, while all other specifiers are considered complex.
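
The simple/complex distinction above can be illustrated with a short Python sketch. It classifies a VAX operand specifier by the addressing-mode field in bits [7:4] of its first byte; in the VAX architecture, modes 0 through 3 are short literal and mode 5 is register mode. The function names and the limit of one complex specifier per cycle are assumptions made for illustration: the text states only the three-specifier maximum and gives two example combinations. The sketch does not model opcode bytes, displacement bytes, or the 9-byte IBUF window, and it is not a description of the XBAR logic itself.

```python
# Illustrative sketch only; names and the one-complex-per-cycle limit are
# assumptions, not taken from the manual.
MAX_SPECIFIERS_PER_CYCLE = 3   # stated in the text
MAX_COMPLEX_PER_CYCLE = 1      # assumption inferred from the examples given


def classify(spec_byte):
    """Classify a VAX operand specifier by its mode field, bits [7:4]."""
    mode = (spec_byte >> 4) & 0xF
    if mode <= 3:
        return "short_literal"   # modes 0-3: 6-bit literal in bits [5:0]
    if mode == 5:
        return "register"        # mode 5: the operand is register Rn
    return "complex"             # every other addressing mode


def decodable_this_cycle(spec_bytes):
    """Count how many leading specifiers fit in one decode cycle.

    spec_bytes holds the first byte of each specifier, in instruction order.
    """
    count = 0
    complex_seen = 0
    for byte in spec_bytes:
        if count == MAX_SPECIFIERS_PER_CYCLE:
            break
        if classify(byte) == "complex":
            if complex_seen == MAX_COMPLEX_PER_CYCLE:
                break            # a second complex specifier waits a cycle
            complex_seen += 1
        count += 1
    return count
```

Under these assumptions, three register-mode specifiers such as 0x50, 0x51, 0x52 are all counted in one cycle, while two consecutive register-deferred specifiers such as 0x61, 0x62 are split across cycles.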

1.2.6 Specifier Handlers

The specifier handlers are in the operand processing unit (OPU). The logic comprises three dedicated specifier evaluation units:

• Complex specifier unit
• Short literal unit
• Free pointer logic

In addition, the OPU maintains a set of GPRs. The GPRs are implemented in self-timed register (STREG) files that provide multiported, read/write access capabilities. The OPU maintains a virtually addressed MBox request port.

1.2.6.1 Complex Specifier Unit

The complex specifier unit (CSU) is responsible for evaluating complex specifiers. The XBAR supplies the CSU with the register, register index, and up to 32 bits of displacement. The CSU calculates branch target addresses, immediate operands, and memory addresses of operands supplied to the EBox from the MBox.

The CSU contains the OPU port interface to the MBox and the interface to the EBox. Operand addresses are sent to the MBox, while the EBox receives the immediate operands directly from the CSU.

The CSU also contains the IBox GPRs. The GPRs are read and written by the CSU, and written by the EBox.

1.2.6.2 Short Literal Unit

The short literal unit (SLU) receives decoded short literal specifiers from the XBAR and expands them for entry into the EBox source list. The SLU can produce a single longword of expansion each cycle. The literal expansion depends on the specifier data type.

1.2.6.3 Free Pointer Logic

The free pointer logic (FPL) manages pointers into the EBox source list. The FPL tracks the available (free) source list addresses and associated pointers. The FPL establishes the correct source 1 and source 2 pointers for operands the EBox will use to execute an instruction. The FPL also generates the correct destination pointer for an instruction result.

1.2.7 Read/Write Scoreboards

The IBox processes specifiers while the EBox is executing previously decoded instructions.
The IBox must be prevented from performing an address calculation that depends on the result of a currently executing instruction. This is accomplished by recording the register numbers to be written by the EBox, and by matching those numbers against any register selected for use in an operand specifier. When a match occurs, the CSU stalls and waits for the instruction to be completed before calculating the operand address.

The XBAR generates the read and write masks for conflict checking in the OPU. The masks represent GPR 0 through 14. The masks prevent reading or writing a GPR that is scheduled to be modified or read by the EBox.

The CSU maintains the read and write register scoreboards for up to six instructions in the pipeline. Both scoreboards contain 15-bit registers, representing GPR 0 through 14.

The read scoreboard tracks the GPRs designated to be read by a previously decoded instruction. These GPRs cannot be written by the OPU for autoincrement and autodecrement until their corresponding instructions are completed by the EBox.

The write scoreboard tracks GPRs designated as destinations by instructions in the pipeline. These GPRs cannot be used by the OPU for address calculations until the instruction is completed by the EBox.

1.3 Physical Organization

The IBox logic is physically contained in three multichip units (MCUs) (Figure 1-2). This section introduces each MCU and its related macrocell arrays (MCAs).

Figure 1-2 Planar Module Layout

1.3.1 VIC MCU

The VIC MCU comprises two MCAs and forty-two 1K x 4-bit STRAMs. Two bit slices of the program counter and the data STRAMs for branch prediction and the VIC are resident in this MCU. Figure 1-3 shows the MCU content.
The following list introduces the MCA and STRAM functions of the VIC MCU:

• PCBP MCA - The program counter/branch prediction control MCA contains bits [07:00] of the PC and provides branch prediction control.
• PCVC MCA - The program counter/VIC control MCA contains bits [15:05] of the PC and provides VIC control.
• VIC data STRAMs - These eighteen 1K x 4-bit STRAMs are dedicated to the VIC data and associated byte parity.
• Branch prediction STRAMs - These twenty-four 1K x 4-bit STRAMs are dedicated to the branch prediction function. The STRAMs store the branch PC tag, prediction PC, branch instruction length, and prediction bit.

Figure 1-3 VIC MCU Content

1.3.2 XBR MCU

The XBR MCU contains the instruction buffer, XBAR, two of the four PC slices, and tag STRAMs for the VIC. Figure 1-4 shows the MCU content.

The following list introduces the MCA and STRAM functions of the XBR MCU:

• PCLO MCA - The program counter low MCA contains bits [23:13] of the PC.
• PCHI MCA - The program counter high MCA contains bits [31:24] of the PC.
• IBFA MCA - The instruction buffer A MCA contains the low-order nibble of the instruction buffer. IBFA also provides the VIC hit logic.
• IBFB MCA - The instruction buffer B MCA contains the high-order nibble of the instruction buffer. Parity checking is performed by combining IBFB parity with a partial parity from IBFA.
• XDTA MCA - The XBAR data A MCA contains the low-order nibble of the XBAR. The major MCA outputs are displacements for the OPU.
• XDTB MCA - The XBAR data B MCA contains the high-order nibble of the XBAR data path.
• XSCA MCA - The XBAR control MCA is the XBAR control unit.
  It receives I-stream data from the IBUF and performs some simple instruction decoding. The instruction buffer shift control is generated from the number of specifiers decoded and the number of specifiers the instruction contains.
• VICT STRAMs - Five of the nine 1K x 4-bit STRAMs contain the 19-bit VIC tags. Two of the STRAMs provide the 4-bit VIC quadword valid fields and associated parity bits and parity for the VIC quadword valid bits. The remaining two provide the VIC block valid field.

Figure 1-4 XBR MCU Content

1.3.3 OPU MCU

The OPU MCU contains the logic responsible for the specifier decode process. The operand port interface to the MBox also resides in this MCU. The MCU also contains a pair of self-timed registers (STREGs) that provide the IBox GPRs. Figure 1-5 shows the MCU content.

The following list introduces the MCA and STRAM functions of the OPU MCU:

• OPUx MCA - The OPUA and OPUB MCAs provide the data path for the complex specifier unit (CSU). (OPUA provides the low-order word; OPUB provides the high-order word.) The CSU receives up to 32 bits of displacement from the XBAR, operand data from the MBox, and result data from the EBox and directs the outputs to the MBox, EBox, or a loopback to the CSU.
• OSQA MCA - This MCA provides control for the GPR STREGs. The OPUA and OPUB multiplexers (AMUX, BMUX) are also provided by this MCA. This MCA also controls the EBox and OPU port interfaces and stall logic.
• OSQB MCA - This MCA receives short literal, source, and destination data from the XBAR. Short literal specifiers are expanded into the correct context and passed to the EBox. The source and destination pointers are also passed to the EBox.
• OCTL MCA — The operand control MCA maintains the read/write scoreboard and directs flush signals to other appropriate IBox functional units.

[Figure 1-5 OPU MCU Content. Blocks: OPUA and OPUB (CSU address calculation, low-order and high-order words); two GPR STREGs (GPR [15:00] and GPR [31:16]); OSQA (RLOG, GPR, CSU, and OPU port control; stall logic); OSQB (SL expansion and FPL); OCTL (flush and scoreboard logic).]

1.4 Pipeline Overview

The IBox consists of three main pipeline stages that closely relate to the MCU physical structure. Each stage can generate indicators to aid in isolating errors to an MCU FRU. The three pipeline stages are defined as follows:

• Instruction fetch — This stage consists of the logic involved in fetching the I-stream before it is latched in the instruction buffer and includes these components:
  - VIC
  - Program counter logic (PCU)
  - IBEX, IBEX2, and part of the IBUF
  - IBUF-to-MBox interface
• Instruction decode and branch prediction — This stage consists of the logic involved in decoding the instruction in the IBUF and includes these components:
  - XBAR
  - Branch prediction unit (BPU)
• Specifier evaluation — This stage consists of the OPU logic involved in the evaluation of operand specifiers and includes these components:
  - Complex specifier unit (CSU)
  - Branch prediction logic
  - Free pointer logic (FPL)
  - Short literal unit (SLU)
  - Operand control unit (OCTL)
  - OPU-to-MBox interface
  - OPU-to-EBox interface

Because the IBox is pipelined, when a stage cannot complete its operation, the previous stages must be stalled; that is, their operations may be suspended. Figures 1-6 through 1-9 provide examples of basic pipeline flow.

Figure 1-6 shows the IBox pipeline decoding sequential instructions where no stalls are encountered in any of the pipeline stages.
The following events occur in each of the machine cycles of this example:

• During the first machine cycle (T0), the IBUF is loaded with the I-stream to be presented to the decode units of the IBox.
• During the second cycle (T1), the logic representing the decode pipeline stage decodes the opcode and the specifier bytes in the IBUF, and passes them to the specifier handlers. The logic representing the fetch stage replenishes the decoded bytes of the IBUF.
• During the third cycle (T2) of this example, the decoded specifiers are processed by the specifier handlers and passed to the EBox. The decode logic and the fetch logic continue to perform their operations simultaneously.

The parallel operations of the three pipeline stages continue until one of the units stalls or until the IBox is flushed to a new I-stream. A stall in one of the units is caused by instruction specifiers that require more than a single cycle to be processed or by conflicts in the instructions (for example, an instruction that contains two sequential complex specifiers and the first one requires more than a single cycle to be processed in the specifier pipeline stage).

[Figure 1-6 IBox Pipeline: No Stalls. Timing diagram over cycles T0 through T5 showing overlapped FETCH, DECODE, and SPECIFIER stages for successive instructions.]

Figure 1-7 shows the IBox pipeline when the processing of a specifier requires more than a single cycle and a subsequent specifier requires processing by the same specifier handler. This stall occurs when the CSU is processing an autodecrement-deferred indexing mode specifier and another complex specifier is decoded by the XBAR. The following events occur in this example of the IBox pipeline:

• During the first machine cycle (T0), the IBUF is loaded with the I-stream to be presented to the decode units of the IBox.
• During the second machine cycle (T1), the decode unit decodes an autodecrement-deferred indexing mode specifier and the fetch unit replenishes the decode units with I-stream.
• During the third machine cycle (T2), the CSU begins processing the autodecrement-deferred indexing mode specifier, the decode unit decodes a complex specifier and is replenished by the buffers representing the fetch stage of the IBox pipeline.

Because the autodecrement-deferred indexing mode specifier requires multiple cycles to process and a subsequent complex specifier is to be processed by the CSU, the IBox pipeline stalls and cannot resume operations until the first complex specifier is processed. The autodecrement-deferred indexing mode specifier requires a minimum of four cycles to be processed.

[Figure 1-7 IBox Pipeline: CSU Stall. Timing diagram over cycles T0 through T7 showing FETCH and DECODE followed by repeated SPECIFIER cycles while the CSU is busy.]

Figure 1-8 shows the pipeline when a branch is taken, but is predicted by the secondary prediction mechanism. The following events occur when a branch is predicted by the secondary prediction mechanism:

• During the second cycle (T1) of this example, a branch is decoded by the XBAR. The BPC is accessed, but no information regarding this branch is valid. The branch displacement is passed to the CSU to calculate the branch target address.
• During the third cycle (T2), the secondary branch prediction logic is accessed and the branch is predicted taken. The branch target PC calculation is completed by the CSU and the PC is passed to the fetch logic the next cycle.
• During the fourth cycle (T3), the fetch logic requests the data at the address supplied by the branch target PC to be loaded into the instruction buffer.
[Figure 1-8 IBox Pipeline: Branch Taken, Secondary Prediction. Timing diagram over cycles T0 through T5 showing FETCH, DECODE, and SPECIFIER for the branch, followed by a FETCH at the branch target.]

Figure 1-9 shows the IBox pipeline during the execution of a branch instruction. The branch, in this case, is stored in the BPC and the history bit indicates that the branch is to be taken. The BPC lookup requires one extra cycle to predict the branch and load the correct target PC.

[Figure 1-9 IBox Pipeline: BP Cache Hit, Prediction Taken. Timing diagram over cycles T0 through T5 showing FETCH, DECODE, and SPECIFIER for the branch and for the instruction at the predicted target.]

1.5 IBox Interfaces

The IBox interfaces to each of the other CPU functional units through dedicated interfaces (or ports). This section describes each interface.

1.5.1 MBox Interface

The IBox interfaces to the MBox through two ports in the MBox:

• Instruction buffer port — This port requests data from the MBox when a miss is encountered in the VIC.
• Operand processing unit (OPU) port — This port requests memory-related source operands from the MBox and passes operand addresses that specify memory destinations to the MBox.

1.5.1.1 Instruction Buffer Port

The instruction buffer port is a read-only port to the MBox. The IBox uses the instruction buffer port to issue requests to the MBox for I-stream data — 64 bits at a time with byte parity. The I-stream is retrieved from the MBox cache. In the case of a cache miss, the request is forwarded to memory through the system control unit (SCU). Typically, a request is for four (aligned) quadwords to fill the VIC. Requests are initiated by the instruction buffer on a VIC miss.

1.5.1.2 OPU Port

The OPU port is a read/write operand access port to the MBox. The port has a 32-bit wide data path with byte parity. Any rotation of the data (justification) is performed by the MBox.
The port provides the following functions:

• Operand prefetch from cache or memory on behalf of the EBox and VBox
• Queuing of addresses for operands destined to cache and memory
• Prefetching operands that are address deferred (indirect) from cache or memory

The IBox issues requests on behalf of the EBox and VBox for operands that come from memory. The operands are passed directly from the MBox to the EBox source list. The IBox sends the destination address to the MBox. In turn, the MBox performs a translation buffer (TB) lookup and stores the physical address in the write queue. The MBox then waits for the result data from the EBox. For deferred addressing, the MBox returns the address of the operand to the IBox for a successive fetch for the data (operand). The data operand is returned to the EBox.

1.5.2 EBox Interface

The IBox interfaces to the EBox through the queue functional unit port in the EBox. This unit contains a set of FIFO buffers (queues) that accept instruction control information and operands from the IBox.

1.5.2.1 IBox-to-EBox Interface

This interface is used to send operands to the EBox 32 bits at a time with byte parity and their respective pointers. These operands are handled by the OPU within the IBox (sign or zero-extended data, integer and floating short literal operands, immediate mode data). The source and destination pointers are maintained in queues within the EBox, and they allow the EBox to access the correct data. In addition, control information is passed to the EBox for microcode control, RLOG information, program count, and errors.

1.5.2.2 EBox-to-IBox Interface

This interface is used to transfer:

• Result data
• A starting or flushing PC
• RLOG unwind data
• Control information (branch valid, queue full, keep masks)

This interface transfers 32-bit EBox result data (including byte parity) to the IBox. The result data is generally an operand that has a destination of a GPR.
The data, including the byte parity, is written to the IBox GPR set. In addition to the various control signals, the EBox result data may also be of a controlling function, for example, an address passed to the IBox to initiate instruction fetch.

1.5.3 VBox Interface

The VBox issues requests for operands through the IBox. It sends the address (30 bits) with byte parity and control data. Control data is the reference data size, type of reference (read or write), and whether it is a block read.

1.5.4 Service Processor Interface

The service processor unit (SPU) interfaces through the scan latches throughout the IBox data and control paths, and error logic. The SPU can retrieve and store the state of the IBox scan logic for error reporting and possible recovery. Field Service can then perform analysis on the stored error symptom data for identification of the failing FRU. The scan paths can implement symptom-directed diagnosis (SDD) as well as test-directed diagnosis (TDD). The scan latches have scan data/clock inputs and the usual system data/clock inputs. The scan inputs can be controlled by the SPU and diagnostics to shift test patterns in and test for the correct pattern that is subsequently retrieved.

2 Program Counter

This chapter describes the functions of the IBox program counter unit (PCU). It identifies the major PCs that control the IBox and the MCAs that comprise them. This chapter also provides a detailed explanation of the PC parity coverage and the construction of the 32-bit PCs across four MCAs.

2.1 Overview

The PCU directs the flow of the I-stream through the IBox and provides control to the branch prediction cache (BPC) and the virtual instruction cache (VIC). The PCU also directs the EBox and MBox in instruction fetch and instruction execution by supplying a copy of the PC to the EBox and by providing an address to the MBox on a VIC refill operation.
2.2 Major PCs

The IBox generates four major PCs:

• Prefetch PC
• Decode PC
• Branch PC
• Unwind PC

The primary inputs to the PCU are from the EBox, the OPU MCU, and the branch prediction cache (BPC). The EBox input (EBOX_RESULT_H[31:00]) directs the IBox to begin decoding a new I-stream as the result of a flush. The OPU input (OPU_RESULT_H[31:00]) provides the branch target address when a branch in the I-stream is not stored in the BPC. The target address is calculated by the complex specifier unit (CSU) and loaded into the decode PC (DECODE_PC_H[31:00]). When a branch stored in the BPC is decoded, BP_PREDICTION_PC_H[31:00] is loaded into the prefetch PC. Figure 2-1 is a block diagram of the PCU data path.

2.2.1 Prefetch PC

The prefetch PC is 32 bits wide and is used during the instruction fetch stage of the pipeline. This PC is the address of the next quadword that the instruction buffer receives. This address is used to request data from the VIC. When a cache miss is encountered, this address is the address of a request to the MBox. Each cycle, the prefetch PC is incremented by a quadword until the buffers of the instruction buffer are completely full. While the instruction buffer is full, the prefetch PC is held. The PC begins incrementing again when IBEX2 is empty.
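The increment-or-hold behavior of the prefetch PC can be illustrated with a minimal Python sketch. It is a behavioral model only, not the MCA logic: the function name `next_prefetch_pc`, the `buffers_full` flag, and the 32-bit wraparound are assumptions introduced for illustration.

```python
QUADWORD_BYTES = 8          # one quadword is 8 bytes
PC_MASK = 0xFFFF_FFFF       # the prefetch PC is 32 bits wide


def next_prefetch_pc(prefetch_pc: int, buffers_full: bool) -> int:
    """Return the prefetch PC for the next cycle.

    buffers_full is a hypothetical flag standing in for the instruction
    buffer reporting that its buffers are full; while it is set the PC
    is held, otherwise the PC advances by one quadword.
    """
    if buffers_full:
        return prefetch_pc
    # Wraparound at 32 bits is an assumption of this sketch.
    return (prefetch_pc + QUADWORD_BYTES) & PC_MASK


# Example: two fetch cycles, then one cycle held while the buffers are full.
pc = 0x0000_1000
pc = next_prefetch_pc(pc, buffers_full=False)
pc = next_prefetch_pc(pc, buffers_full=False)
pc = next_prefetch_pc(pc, buffers_full=True)
```

In this model the hold condition is a single flag; the hardware derives it from the state of IBEX and IBEX2.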
[Figure 2-1 PCU Data Path]

When there is a request for I-stream, the prefetch PC is used as the address to the MBox. All requests for I-stream are quadword aligned, so it is not necessary to send bits 0 through 2 of the PC to address the MBox data cache. On a flush, the EBox provides the new PC to the IBox. The IBox then uses the PC provided to address the VIC and to address the MBox on a VIC miss.

2.2.2 Prefetch PC Data Path

The prefetch PC is loaded from the target PC, BP prediction PC, incremented PC, or MTAG. When a target PC is selected as the prefetch PC, it redirects the IBox from decoding sequential I-stream. The target PC is loaded from the:

• EBox during a flush
• OPU as a calculated branch PC
• Unwind PC when a branch prediction is incorrect

BP_PREDICTION_PC_H[31:00] is selected as the prefetch PC when the branch prediction cache is supplying the PC of the branch being decoded. MTAG is loaded into the prefetch PC on each cycle of an MBox response. The MTAG is the memory tag of the returning data from the MBox. MTAG is incremented by eight each cycle of the MBox response and, when the response is complete, is loaded into the prefetch PC.
The prefetch PC gets loaded with the MTAG because, during the MBox response, the prefetch PC can increment beyond the next quadword it is to receive. The prefetch PC outputs control the VIC and provide the addresses of requests to the MBox. Figure 2-2 describes the data path of the prefetch PC. The prefetch PC is selected from:

• MTAG — At the completion of a VIC refill, MTAG is loaded into the prefetch PC to ensure that the prefetch PC points to the next quadword loaded into the instruction buffer.
• Target PC — A target PC from one of the sources described in Section 2.2.3 can be loaded into the prefetch PC to redirect the I-stream.
• Incremented PC — This input increments the prefetch PC by a quadword.

[Figure 2-2 Prefetch PC Data Path. The BP prediction PC, incremented PC (+8), target PC, and MTAG are selected by PC_PREFETCH_PC_SEL[01:00] into the prefetch PC latch.]

2.2.2.1 Prefetch PC Control

The prefetch PC is incremented each cycle until the instruction buffer informs the PCU that IBEX and IBEX2 are full. When these units are full, the prefetch PC is held until IBEX2 again requires an I-stream. This continues until a request is made for an I-stream that is not in the VIC or a new target PC redirects the prefetch PC from sequential decoding.

2.2.3 Target PC

The target PC is loaded into both the prefetch PC and the decode PC to redirect the I-stream from decoding sequential instructions. A branch in the I-stream or a flush loads a new target PC into the decode PC and prefetch PC. Figure 2-3 describes the sources of the target PC. TARGET_PC_SELECT_H[01:00] is controlled by the PCU microcode and selects an input to redirect the I-stream for each of the following conditions:

• Prediction PC is selected when a branch that is being decoded is stored in the BPC (BP hit).
• OPU target PC provides the target PC when a branch is predicted taken but not stored in the BPC (BP miss).
• EBox result provides the target PC when the EBox directs the IBox to flush to a new I-stream.
• Unwind PC is loaded into the target PC after a bad branch prediction.

[Figure 2-3 Target PC Sources. The prediction PC, unwind PC, EBox result, and OPU target PC are selected by TARGET_PC_SELECT_H[01:00] into the target PC, which feeds the prefetch PC and the decode PC.]

2.2.4 Decode PC

The decode PC is the PC of the instruction whose opcode is in the opcode byte (byte 0) of the instruction buffer. That is, it is the PC of the instruction currently being decoded. The decode PC does not reflect intermediate shift counts that result from partially decoded instructions. The decode PC is used in the instruction specifier stage of the IBox and is passed to the EBox to be placed in the PC history buffer. In the EBox it is called the VAX PC because it is the PC of the instruction to be executed and is used by the EBox when handling exceptions. This PC is also used by the CSU to evaluate implied specifiers and branch displacements.

The decode PC is loaded from two sources. The next or sequential PC provides the decode PC when instruction decode is following a sequential path (straight line) and when branches are encountered that are predicted not taken. When a branch is predicted taken or a flush is encountered, a new (nonsequential) PC is loaded into the decode PC. This PC is supplied by the OPU or the branch prediction cache on branch predictions or is supplied by the EBox on a flush.

2.2.5 Decode PC Data Path

The decode PC is loaded from the next PC, when decoding sequential I-stream, or is loaded from the target PC. The target PC is loaded into the decode PC at the same time a new target PC is loaded into the prefetch PC. The next PC is generated each cycle by adding the number of bytes the XBAR decodes to the decode PC. When the instruction under decode is completely decoded, this addition yields a new decode PC.
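The next PC generation just described can be sketched in Python as follows. This is one plausible reading of the text rather than a description of the MCA logic: the class `DecodePC`, the `pending_bytes` accumulator, and the `instruction_done` flag are hypothetical names, and the sketch assumes that the decode PC itself changes only when an instruction has been completely decoded.

```python
from dataclasses import dataclass

PC_MASK = 0xFFFF_FFFF  # 32-bit PC; wraparound is an assumption of this sketch


@dataclass
class DecodePC:
    decode_pc: int          # PC of the instruction currently being decoded
    pending_bytes: int = 0  # bytes of this instruction decoded so far

    def step(self, shift_count: int, instruction_done: bool) -> int:
        """Model one decode cycle and return the next PC for that cycle.

        shift_count is the number of bytes the XBAR decoded this cycle.
        instruction_done marks the cycle in which the last byte of the
        instruction is decoded; only then is the decode PC replaced.
        """
        self.pending_bytes += shift_count
        next_pc = (self.decode_pc + self.pending_bytes) & PC_MASK
        if instruction_done:
            self.decode_pc = next_pc
            self.pending_bytes = 0
        return next_pc


# Example: a 7-byte instruction decoded over two cycles (3 bytes, then 4).
unit = DecodePC(decode_pc=0x2000)
first = unit.step(shift_count=3, instruction_done=False)
held = unit.decode_pc                      # still the original decode PC
second = unit.step(shift_count=4, instruction_done=True)
```

The accumulator keeps the decode PC free of intermediate shift counts, which matches the statement that partially decoded instructions do not change it.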
The decode PC is output to the EBox (IBOX_PC_H[31:00]) to be placed in a queue until the instruction is executed. The CSU of the OPU receives a copy of the decode PC (PC_OPU_DECODE_PC_H[31:00]) to identify the instruction that is being decoded. When a branch instruction is being processed, and the branch meets the criteria required to be written in the branch prediction cache, the decode PC is written to that cache. The BP tag is written with DECODE_PC_H[31:10] at the address supplied by DECODE_PC_H[09:00].

Each cycle, the decode PC is added to the number of bytes decoded by the XBAR (XBAR_SHIFTCOUNT_H[03:00]). When all of the specifier bytes of an instruction are decoded (XSCA_SHIFTOPCODE_H asserted), the next sequential PC or next PC is loaded as the decode PC. Each new decode PC that is loaded asserts IBOX_PC_H[31:00] in the EBox and OPU_DECODE_PC_H[31:00] in the OPU MCU. The EBox copy of the decode PC is stored in a queue for instructions to be executed. The OPU copy of the decode PC is used to calculate specifier PCs and branch displacement.

[Figure 2-4 Decode PC Data Path. The target PC and the next PC (decode PC plus XBAR shift count) feed the decode PC latch; XSCA_SHIFTOPCODE_H controls the load.]

2.2.5.1 Delta PC

The XBAR shift count (XSCA_SHIFTCOUNT_H[03:00]) is accumulated to produce the decode delta. That is, the total number of bytes in an instruction yields the decode delta. The decode delta provides the OPU and the BPC with the instruction length.

2.2.6 Branch PC

The branch PC is the address of the branch instruction that is currently under evaluation. When the IBox encounters a branch, the decode PC is loaded into the branch PC. The branch PC is used to address the branch prediction cache. The branch PC is compared with the branch prediction (BP) tag to produce a hit or miss in the branch prediction cache.

2.2.7 Branch PC Data Path

The branch PC is loaded from either the decode PC or the second branch PC.
When a branch instruction is being processed, the address of the branch is saved until the branch is shifted out of the instruction buffer. After the branch is shifted out of the instruction buffer, the branch PC provides the information pertaining to the branch that is written to the branch prediction cache. Because the IBox can process two branches simultaneously, a second branch PC must be generated for the second branch. The second branch PC (SECOND_BRANCH_PC_H[31:00]) is stored in a scan latch until the first branch is completely processed. The second branch PC is then loaded into the branch PC and used as data to be written to the branch prediction cache. Figure 2-5 describes the data path of the branch PC. The decode PC is loaded into the branch PC for each branch the IBox encounters. The address is saved until the branch is completely processed. If two branches are being processed, each address is saved.

[Figure 2-5 Branch PC Data Path. The decode PC feeds the branch PC latch and the second branch PC latch.]

2.2.8 Unwind PC

The unwind PC is the PC of the direction not taken in a branch prediction. When the IBox makes a branch prediction, two PCs relate to the prediction: the branch target PC and the next or sequential PC. When a branch is predicted taken, the branch target PC or the prediction PC is the path that the instruction execution follows. The next PC is the path that is not taken. The IBox saves the path that is not taken (in this case, the next sequential PC) in case the IBox predicted the branch incorrectly. If the EBox informs the IBox that a branch has been incorrectly predicted, the IBox can correct the prediction by loading the unwind PC into the decode and prefetch PCs and start decoding in the correct path. Because the IBox can process two branches simultaneously, it is necessary to have two unwind PCs. These PCs are called the unwind PC and the second unwind PC.
These PCs are both used when a branch is predicted and a second branch is predicted before the first has been validated by the EBox. Once the first branch is validated, the second unwind PC is loaded into the first unwind PC. The original, first unwind PC is discarded.

2.2.9 Unwind PC Data Path

When the IBox predicts a branch, the PC of the instruction that will not be executed is saved as the unwind PC. In the event of a wrong prediction, the PCU can return to the PC that was saved and continue with the correct I-stream. The PCU can maintain information pertaining to two conditional branches; therefore, two unwind PCs are stored. Figure 2-6 describes the data path of the unwind PC. The unwind PC is loaded from one of three sources:

• Next PC — This PC is loaded when a branch is predicted taken.
• OPU result — This PC is loaded when a branch is predicted not taken.
• Second unwind PC — This PC is loaded by latching the next PC and loading it as the unwind PC. The second unwind PC is loaded into the unwind PC only after the first branch that has been predicted has been validated by the EBox.

[Figure 2-6 Unwind PC Data Path. The next PC and the OPU result feed the unwind PC latch; a latched next PC supplies the second unwind PC.]

2.3 Cache Control

Control of the branch prediction cache (BPC) and the virtual instruction cache (VIC) is provided by the PCU. The VIC is addressed by the prefetch PC on read and write operations, the VIC tag is compared to the prefetch PC to determine a match when requesting data, and the prefetch PC provides the address for MBox requests on a VIC refill. The decode PC provides both the BP tag and the BP address. Figure 2-7 shows the relationship of the cache control signals to the PCU outputs.

[Figure 2-7 PCU Cache Control. Bit-field diagram of PC bits [31:00] showing the IBOX IB ADDRESS, VIC TAG, VIC ADDRESS, BP TAG, and BP ADDRESS fields.]

2.3.1 BPC Control

The branch PC provides BPC fields. The BP tag is bits 31 through 10 of the branch PC and is written when the branch is first encountered. The tag is written at the cache address that is supplied by decode PC bits 0 through 9. When a branch is encountered, the virtual address of the branch (DECODE_PC_H[31:00]) is compared to the BP tag to determine a hit or miss in the BPC.

2.3.2 VIC Control

The prefetch PC provides control for the VIC. This PC provides the address of a request to the MBox (IBOX_IB_ADDRESS_H[31:03]), the VIC tag (VIC_TAG_H[31:13]), and the address of the VIC when replenishing the instruction buffer. Bits 31 through 13 provide the VIC tag. The tag is written on a VIC refill and subsequently compared with bits 31 through 13 of the prefetch PC when the instruction buffer is requesting VIC data. Bits 0 through 9 of the prefetch PC provide the cache address on VIC requests. When data is requested from the MBox data cache, bits 31 through 3 of the prefetch PC provide the requested address (IBOX_IB_ADDRESS_H[31:03]).

2.4 PCU Error Detection

Each of the MCAs that comprise the PCU receives inputs from most of the other functional units in the IBox and from the MBox and EBox. Each PCU MCA contains parity detection logic that detects errors in the internally generated signals and inputs to the PCU. Errors detected in the PCU MCAs result in both fetch and decode errors being asserted in the IBox and EBox. This section contains block diagrams of the error logic of each PCU MCA and lists the decode and fetch errors that can occur in each MCA.

2.4.1 PCVC Error Logic

PCVC fetch errors are latched in IBFB as an IBFB_FETCH_ERROR and then asserted in the EBox as IBOX_FORK_ERROR_H.
During this process, the scan latches in PCVC receive a hold signal. The hold signal preserves the state of the error scan latches so that the intermediate error can be determined. PCVC decode errors are latched in XDTB and OSQB as XDTB_DECODE_ERROR_H. OSQB asserts IBOX_POINTER_ERROR_H and DATA_ERROR_H in the EBox. Figure 2-8 shows a block diagram of the PCVC error logic and Table 2-1 lists the error signals generated in this MCA.

Table 2-1 PCVC Errors

Input Signal                    Intermediate Error

Fetch Errors
DECODE_PC[15:08]                PCVC_DECODE_PC_ERROR
PREFETCH_PC_H[15:08]            PCVC_PREFETCH_PC
PCBP_DECODE_PC[05:00]           PCVC_PCBP_DECODE_PC_ERROR
BP_PREDICTION                   PCVC_BP_PREDICTION_ERROR
BP_PREDICTION_TAG_H[15:00]      PCVC_BP_PRED_TAG_ERROR

Decode Errors
XSCA_SHIFTCOUNT_H[03:00]        PCVC_DECODE_ERROR
XSCA_SHIFTOPCODE                PCVC_DECODE_ERROR

[Figure 2-8 PCVC Error Logic]

2.4.2 PCBP Error Logic

Figure 2-9 shows a block diagram of the PCBP error logic and Table 2-2 lists the error signals generated in this MCA.
Table 2-2 PCBP Errors

Input Signal                    Intermediate Error

Fetch Errors
DECODE_PC[07:00]                PCBP_DECODE_PC_ERROR
PREFETCH_PC_H[07:00]            PCBP_PREFETCH_PC_ERROR
INSTRUCTION_LENGTH[05:00]       INSTRUCTION_LENGTH_ERROR
DECODE_PC[23:16]                PCBP_PCLO_DECODE_PC_ERROR
DECODE_PC[31:24]                PCBP_PCHI_DECODE_PC_ERROR
BP_PREDICTION                   BP_PREDICTION_ERROR

Decode Errors
XSCA_SHIFTCOUNT_H[03:00]        XSCA
XSCA_SHIFTOPCODE_H              XSCA

2.4.3 PCLO Error Logic

Figure 2-10 shows a block diagram of the PCLO error logic and Table 2-3 lists the error signals generated in this MCA.

Table 2-3 PCLO Errors

Input Signal                    Intermediate Error

Fetch Errors
DECODE_PC[23:13]                PCLO_DECODE_PC_ERROR
PREFETCH_PC_H[23:13]            PCLO_PREFETCH_PC_ERROR
DECODE_PC[05:00]                PCLO_PCBP_DECODE_PC_ERROR
EBOX_RESULT[15:08]              PCLO_EBOX_RESULT_ERROR
BP_PREDICTION_PC[15:13]         PCLO_PRED_PC_15_13_ERROR
BP_PREDICTION                   PCLO_BP_PREDICTION_ERROR

Decode Errors
XSCA_SHIFTCOUNT_H[03:00]        PCLO_DECODE_ERROR
XSCA_SHIFTOPCODE_H              PCLO_DECODE_ERROR

[Figure 2-9 PCBP Error Logic]

[Figure 2-10 PCLO Error Logic]

2.4.4 PCHI Error Logic

Figure 2-11 shows a block diagram of the PCHI error logic and Table 2-4 lists the error signals generated in this MCA.
Table 2-4 PCHI Errors

Input Signal                    Intermediate Error

Fetch Errors
DECODE_PC[31:24]                PCHI_DECODE_PC_ERROR
PREFETCH_PC_H[31:24]            PCHI_PREFETCH_PC_ERROR
DECODE_PC[05:00]                PCHI_PCBP_DECODE_PC_ERROR
DISPLACEMENT[11:08, 03:00]      PCHI_XDTA_DISP_ERROR
DISPLACEMENT[15:12, 07:04]      PCHI_XDTB_DISP_ERROR
BPTD_TAG_DISPLACEMENT[15:00]    PCHI_BP_TAG_DISP_ERROR[01:00]
BP_PREDICTION                   PCHI_BP_PREDICTION_ERROR

Decode Errors
XSCA_SHIFTCOUNT_H[03:00]        PCHI_DECODE_ERROR
XSCA_SHIFTOPCODE_H              PCHI_DECODE_ERROR

2.5 PCU Inputs

The MBox inputs to the PCU are in response to an instruction buffer request to the MBox. The two signals the PCU receives are as follows:

• MBOX_IB_RESPONSE_H directs the control of the MTAG and the prefetch PC when the MBox is returning data for a VIC refill.
• MBOX_IB_PAGEFAULT_H inhibits writing page faulted data to the VIC.

The EBox inputs to the PCU validate predicted branches and, when the IBox is flushed, provide a starting address. The primary EBox inputs to the PCU are as follows:

• EBOX_RESULT_H[31:00] provides the starting address after a flush.
• EBOX_BRANCH_A_H, when asserted, indicates a bad branch prediction.
• EBOX_BRANCH_VALID_A_H validates EBOX_BRANCH_A_H.

The XBAR provides shift counts and error status to the PCU. The XBAR inputs are as follows:

• XSCA_SHIFTCOUNT_H[03:00] is used to increment the decode PC.
• XDTB_DECODE_ERROR_H informs the PCU of an error in the instruction decode pipeline stage of the IBox. The signal sets the PCU in a hold state.

[Figure 2-11 PCHI Error Logic]

The OPU provides flush signals and the target PC for branch instructions. The OPU inputs are as follows:

• OCTL_PC_FLUSH flushes the IBox to a new decode PC, provided by the EBox.
• OCTL_IBUF_FLUSH holds the prefetch PC until the IBox can resume decoding an I-stream.
The signal is sent when the IBox has to wait for one of the VIC block valids to be flushed.
• OPU_TARGET_PC_H[31:00] is provided by the CSU on branch instructions.
• OCTL_VIC_FLUSH_H flushes the VIC.

2.6 PCU Outputs

The PCU directs outputs to the MBox, EBox, XBAR, and OPU. The outputs direct the flow and execution of instructions through the functional units that receive them.

The MBox receives a byte parity protected address that directs the prefetching of I-stream. These two outputs are as follows:

• IBOX_IB_ADDRESS[31:03] is the prefetch PC. (The lower three bits are always zero.)
• IBOX_IB_ADDRESS_PARITY[03:00] is the byte parity for the address that the PCU is sending the MBox.

The EBox receives a copy of the decode PC and information pertaining to branches. The signals the EBox receives are as follows:

• IBOX_PC[31:00] is a copy of the decode PC. The EBox, after execution of the instruction to which this PC points, stores it in the PC history buffer, or uses it during exception processing.
• IBOX_PC_PARITY[03:00] is the byte parity for the IBox PC.
• IBOX_CORRECTION_H is asserted if the IBox can correct a bad branch prediction before it is shifted out of the IBUF.
• IBOX_PC_VALID_H is asserted when a new valid opcode is shifted into the opcode byte of the IBUF.
• IBOX_PREDICTION_H, asserted, informs the EBox that the IBox has predicted a branch to be taken.

The XBAR receives two signals from the PCU:

• PCHI_UNWIND_H is asserted when the EBox detects a bad branch prediction. The signal asserts a restart signal in the three XBAR MCAs.
• PCHI_DECODE_ERROR_H is asserted by a parity error on XSCA_SHIFTCOUNT_C_H or XSCA_SHIFTOPCODE_C_H.

The following PCU outputs provide the OPU with the decode PC and branch information:

• DECODE_PC_H[31:00] is the CSU’s copy of the decode PC.
• PCHI_UNWIND_H is asserted on a bad branch prediction. This signal initiates a flush of the CSU.
• PCHI_CORRECTION_H is input to the branch count logic of the CSU.
• PCVC_VIC_FIP_H is sent to OCTL when the VIC is being flushed. Asserting this signal directs OCTL to stall the IBox if another flush is received before the first flush is complete. (The VIC flush requires 256 cycles.)

3 Instruction Fetch

This chapter describes the functional units representing the instruction fetch pipeline stage: the VIC, the instruction buffer, and the MBox interface.

3.1 VIC

The VIC is a direct-mapped, 8-Kbyte cache with a block size of 32 bytes and a fill size of 8 bytes. Physically, the VIC is made up of four groups of STRAMs:
• Data STRAMs: store one quadword of data per location.
• Tag STRAMs: store bits [31:13] of the address of each block in the VIC.
• Block valid STRAMs: store one bit to indicate that the VIC block is valid.
• Quadword valid STRAMs: store four bits, each indicating that the corresponding quadword is valid.

VIC data STRAMs are on the VIC MCU and the VIC tag STRAMs are on the XBR MCU. A VIC block consists of four quadwords of VIC data. There are 1024 VIC data locations and 256 VIC tags, VIC block valid bits, and VIC quadword valid bits. There are two separate block valid bits for the VIC. Figure 3-1 is a simplified block diagram of the VIC.

Figure 3-1 VIC

3.1.1 VIC Hit

The instruction buffer determines a VIC hit (VIC_MATCHA_H or VIC_MATCHB_H asserted) if the following conditions are met:
• The VIC block is valid.
• The prefetch PC matches the VIC tag.
• The quadword is valid.

If these conditions are not met, a VIC miss is signaled and an instruction buffer request is made to the MBox.
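The three hit conditions can be illustrated with a small software model. This is a minimal sketch, not the hardware implementation: the names `split_address` and `VicModel` are hypothetical, the block index field (bits [12:05]) is inferred from the 256-block, 32-byte organization rather than stated in the text, and clearing the quadword valid bits when a new tag is written is an assumption made only to keep the model self-consistent.

```python
from dataclasses import dataclass, field

BLOCKS = 256        # 8 Kbytes / 32-byte blocks
QUADWORDS = 4       # quadwords per block


def split_address(pc):
    """Return (tag, block index, quadword number) for a 32-bit prefetch PC."""
    tag = (pc >> 13) & 0x7FFFF    # bits [31:13], as stored in the tag STRAMs
    index = (pc >> 5) & 0xFF      # bits [12:05], assumed block index
    quadword = (pc >> 3) & 0x3    # bits [04:03]
    return tag, index, quadword


@dataclass
class VicModel:
    tags: list = field(default_factory=lambda: [0] * BLOCKS)
    block_valid: list = field(default_factory=lambda: [False] * BLOCKS)
    quadword_valid: list = field(
        default_factory=lambda: [[False] * QUADWORDS for _ in range(BLOCKS)]
    )

    def fill(self, pc):
        """Record one returned quadword: write tag, set block and quadword valid."""
        tag, index, quadword = split_address(pc)
        if not self.block_valid[index] or self.tags[index] != tag:
            # Assumption: a new block starts with all quadwords invalid.
            self.quadword_valid[index] = [False] * QUADWORDS
        self.tags[index] = tag
        self.block_valid[index] = True
        self.quadword_valid[index][quadword] = True

    def hit(self, pc):
        """True only when all three hit conditions hold."""
        tag, index, quadword = split_address(pc)
        return (
            self.block_valid[index]
            and self.tags[index] == tag
            and self.quadword_valid[index][quadword]
        )
```

A miss on any one of the three checks corresponds to the case in which the instruction buffer would issue a request to the MBox.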
3.1.2 VIC Data Write

Control of the VIC is provided by the prefetch PC and the instruction buffer. VIC match and instruction buffer request logic are in the instruction buffer, while the requested address is provided by the prefetch PC. Figure 3-2 shows a block diagram of the VIC data write function. The instruction buffer request initiates the following sequence:
• One cycle after the request is made, the instruction buffer asserts IBUF_REQUEST_IN_PROCESS. This signal selects the memory tag (MTAG) to address the VIC.
• MBOX_IB_RESPONSE is asserted when the MBox responds to the request. If no stall is present and the requested data has not page faulted, this signal provides the write enable for the VIC data.

Figure 3-2 VIC Data Write

3.1.2.1 VIC Address Selection

The VIC is addressed by the prefetch PC on VIC requests and by the MTAG when the I-stream is being returned from the MBox. The MTAG is the address that is returned from the MBox in response to an instruction buffer request. Figure 3-3 shows the generation of the VIC address.

The prefetch PC provides the requested address to the MBox and is then held from incrementing until the MBox returns the requested data. When the data is returned (MBOX_IB_RESPONSE_H asserted), the prefetch PC begins incrementing again. VIC_ADDRESS_SELECT_H is asserted to select MTAG when the MBox returns the requested I-stream.

The prefetch PC does not address the VIC when data is returning from the MBox because the XBAR does not always consume a quadword of I-stream in a single cycle. If the prefetch PC did address the VIC and the XBAR did not consume a quadword of I-stream each cycle of the response, the prefetch PC would be incremented past the quadword that the instruction buffer is to receive next.
The prefetch PC is loaded with the MTAG at the end of an instruction buffer request so that it points to the next quadword that the instruction buffer requests.

Figure 3-3 VIC Address Selection

3.1.2.2 VIC Data

With the VIC data write signal (PCVC_VIC_DATA_WRITE_H) enabled and the VIC address selected, the MBox returns the I-stream and writes it to the STRAMs. The I-stream arrives one quadword per cycle and is byte parity protected.

Most requests for VIC data are for four quadwords, but requests for fewer can be issued. If a request is for fewer than four quadwords, the MBox returns the requested quadword and the remaining quadwords in the block. That is, the request does not wrap around the block. For example, if the requested address is for the third quadword in a block, quadwords 2 and 3 are returned, but quadwords 0 and 1 are not.

3.1.3 VIC Tag Write

As VIC data is being written to the cache, the tags are also written. As the requested I-stream is returned from the MBox, the PCU writes the tag field (bits [31:13] of the MTAG) and validates the block and quadword valid bits for that location in the VIC. Figure 3-4 is a block diagram of the VIC tag write function.

MBOX_IB_RESPONSE_H is asserted in the first cycle of a response from the MBox and enables writing the tag and valid fields to the VIC tag (VICT) STRAMs.

The following sections describe the four write functions associated with the VIC tag field. The four fields to be written are as follows:
• VIC tag
• Quadword valid bits
• Block valid bit
• Tag parity
Figure 3-4 VIC Tag Write

3.1.3.1 Tag Write

When the requested I-stream is returned from the MBox, the PCU writes MTAG bits [31:13] to the tag field. The address of the request was originally supplied by the prefetch PC, but when the I-stream is returning, the VIC is addressed by the MTAG.

When the VIC is receiving multiple responses from the MBox, the address of the VIC must be incremented by a quadword in each cycle of the MBox response. The prefetch PC is not incremented by a quadword until a quadword of I-stream is consumed by the XBAR. Because of this, a new address (MTAG) must be used when the I-stream comes from the MBox to the VIC.

To address subsequent returning quadwords, bits [04:03] of the VIC address are incremented each cycle. The VIC address is then loaded into the MTAG and used as the returning quadword address.

3.1.3.2 Quadword Valid Write

As each quadword returns, bits [04:03] of the MTAG are decoded to identify the quadword being written. The bit field is decoded and the single-bit valid field is written as the data is written to the data field. Validating this field as the data is being written provides valid data for the instruction buffer in the first cycle of the MBox response.

3.1.3.3 Block Valid Write

The VIC contains two sets of the block valid field. The 2-bit field, when validated, indicates that valid data is in the VIC. The purpose of the two copies of the field is to enable a VIC flush in a single cycle.

To validate this field, the correct block valid is selected, then the valid bit is latched into the STRAM. The field is validated when the first quadword of a request is returned from the MBox.
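The two-copy block valid scheme can be sketched in software as follows. This is an illustrative model under stated assumptions, not the PCU logic: the class `BlockValid` and its method names are hypothetical, `use_a` stands in for USE_BLOCK_A_VALID, and the background clear of one entry per cycle over 256 cycles follows the flush duration quoted for PCVC_VIC_FIP_H.

```python
class BlockValid:
    """Two banks of block valid bits; only the selected bank is consulted."""

    ENTRIES = 256

    def __init__(self):
        self.bank = {"A": [False] * self.ENTRIES, "B": [False] * self.ENTRIES}
        self.use_a = True          # models USE_BLOCK_A_VALID
        self.clear_index = None    # None when no background clear is running

    def _active(self):
        return self.bank["A" if self.use_a else "B"]

    def _stale(self):
        return self.bank["B" if self.use_a else "A"]

    def validate(self, index):
        """Mark a block valid in the bank currently in use."""
        self._active()[index] = True

    def is_valid(self, index):
        return self._active()[index]

    def flush(self):
        """Switch banks in one step; return False if a clear is still running."""
        if self.clear_index is not None:
            return False           # caller must stall until the clear completes
        self.use_a = not self.use_a
        self.clear_index = 0
        return True

    def tick(self):
        """Advance one cycle of the background clear of the stale bank."""
        if self.clear_index is None:
            return
        self._stale()[self.clear_index] = False
        self.clear_index += 1
        if self.clear_index == self.ENTRIES:
            self.clear_index = None
```

The design point the sketch tries to show is that the cache appears empty as soon as the selection is inverted, while the old bank is cleared in the background so that it is ready for the next flush.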
A block valid selection and write is initiated by the assertion of MBOX_IB_RESPONSE_H. The selection of the block valid is achieved by inverting USE_BLOCK_A_VALID. This selects the block that was not in use since the last VIC flush.

3.1.3.4 Tag Parity

All VIC tag parity checking is performed in IBFA. When a VIC tag parity error is detected in IBFA, IBFA_VIC_ERROR_H is asserted and passed to the IBFB fetch error circuitry. IBFB then asserts IBFB_FETCH_ERROR_H and forwards it to the EBox.

3.1.4 VIC Flush

The VIC is flushed on every REI and, in most cases, the flush is achieved in a single cycle. A flush is initiated when OCTL_VIC_FLUSH_H is asserted. This signal is generated in OCTL and is the result of a flush code sent by the EBox. Receipt of this signal inverts the selection of the block valid being used and initiates the flush of the original block valid that was in use. Clearing the original block valid requires 256 cycles.

Each cycle of the block valid flush, VIC_CLEAR_ADDRESS_H[08:00] is incremented. The flush is complete when bit 8 is set (VIC_CLEAR_ADDRESS_H[08:00] = 1FF).

Once a flush is initiated by OCTL, the PCU asserts VIC_FLUSH_IN_PROGRESS_H. This signal remains asserted until the flush is complete and prohibits issuing another flush request until it is negated.

3.1.5 VIC Data Read

When the instruction buffer requires a new I-stream, compare logic on the IBFA MCA checks the VIC tag field for a resident block and valid quadwords, then compares the requested address with the VIC address tag field. If there is a match, a VIC hit (VIC_MATCH_H) is signaled and the requested I-stream is loaded into the instruction buffer. Figure 3-5 is a simplified block diagram showing the VIC compare logic.
Figure 3-5 VIC Match Logic

The block valid bit is checked to detect a valid VIC block. The prefetch PC (bits [04:03]) selects the quadword valid to be compared. The tag address is compared with prefetch PC [31:13]. A match of the two address fields enables VIC_MATCH_H. A VIC hit asserts READ_VIC_DATA_H and the requested I-stream is written to the instruction buffer.

3.1.5.1 VIC STRAM Bypass

During a VIC refill, it is possible to write and validate the data and latch the data in the instruction buffer in the same cycle. When the MBox is returning the first quadword of an I-stream, the block and quadword are marked valid and the tag and data are written to the STRAMs. By placing the data on the output latches of the STRAMs, the instruction buffer is allowed to receive the valid data late in the same cycle. If the XBAR consumes the instruction buffer data in a single cycle and the MBox is simultaneously returning VIC data, subsequent quadwords are latched in the instruction buffer through the VIC STRAM bypass.

3.1.6 VIC Parity Coverage

The VIC can cause two errors: fetch errors and decode errors. The VIC tag (VICT) fields are parity checked on the IBFA MCA. VIC data (VICD) is parity checked on IBFA and IBFB.

VIC data is byte parity protected and is checked on the output, as the data is latched into the instruction buffer. IBFB_FETCH_ERROR_H is asserted when a data parity error is detected.

Block valid parity, tag parity, and quadword valid parity are checked on IBFA. An error in any of these checks asserts IBFA_VIC_ERROR_H. This signal is latched to IBFB and asserts IBFB_FETCH_ERROR_H.
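Byte parity checking of a returned quadword can be sketched as below. The text does not state whether the VIC uses odd or even parity, so the `odd` parameter is an assumption exposed to the caller, and the function names are hypothetical. The sketch only shows the general technique of one parity bit per byte, eight per quadword.

```python
def byte_parity(byte, odd=True):
    """Parity bit for one byte; with odd parity the total count of ones is odd."""
    ones = bin(byte & 0xFF).count("1")
    return (ones + 1) % 2 if odd else ones % 2


def quadword_parity(quadword, odd=True):
    """Eight parity bits for a 64-bit quadword, byte 0 first."""
    return [byte_parity((quadword >> (8 * i)) & 0xFF, odd) for i in range(8)]


def fetch_error(quadword, parity_bits, odd=True):
    """True when the stored parity bits do not match the data read back."""
    return quadword_parity(quadword, odd) != list(parity_bits)
```

In this model a `True` result from `fetch_error` plays the role of the data parity error that asserts IBFB_FETCH_ERROR_H. Any single-bit change in the data is detected, while a two-bit change within one byte is not, which is the usual limitation of byte parity.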
The SPU can disable the VIC parity error detection logic. When detection is disabled, VIC parity errors are not reported.

3.1.7 Disabling VIC Hits

The SPU can disable the VIC hit logic. When the hit logic is disabled, every VIC access results in a VIC miss. The instruction buffer is then forced to initiate an MBox request each time it needs an I-stream. This feature can be used as a troubleshooting aid or as a temporary solution to an excessive error rate in the VIC.

3.2 Instruction Buffer

The instruction buffer logic is physically partitioned across the IBFA and IBFB MCAs on the XBR MCU, with IBFA handling the low nibble and IBFB handling the high nibble. The primary function of the instruction buffer is to present the I-stream to the XBAR for decoding. The instruction buffer comprises six major functional units (Figure 3-6):
• IBUF
• IBEX
• IBEX2
• Rotator
• Shifter
• Merger

Under usual conditions, the instruction buffer is filled in the following manner:
• The first quadword of VIC data passes through the rotator and merger and is loaded into the IBUF.
• The second quadword is loaded into IBEX, and IBEX2 receives the third quadword.
• As the IBUF bytes are decoded, they are shifted out and replenished with bytes from IBEX. The rotator, shifter, and merger align the remaining IBUF bytes and the new IBEX bytes in sequential order in the IBUF.
• When the IBEX becomes empty, the quadword in IBEX2 is loaded into IBEX and IBEX2 requests more data from the VIC. If no valid data is in the VIC, then a request is passed to the MBox.
Figure 3-6 Instruction Buffer

IBEX and IBEX2 are quadword buffers between the VIC and IBUF. Data can be loaded into IBUF from either of these registers, with valid data in IBEX taken before any valid data from IBEX2. Data taken from either of these sources is rotated by the rotator to provide the correct byte position in IBUF.

As the data in IBUF is consumed, decoded bytes are shifted out and replenished with new valid data from IBEX, IBEX2, or the VIC. The shifter provides this function by shifting the decoded bytes out and shifting remaining bytes down into vacant lower byte positions.

The merger ties the functions of the rotator and shifter together by replenishing IBUF with valid data from both the shifter and rotator.

Control for the instruction buffer depends primarily on three signals:
• IBUF_VALID_COUNT_H[03:00] is the number of valid bytes in the 9-byte IBUF.
• IBEX_VALID_COUNT_H[03:00] is the number of valid bytes in the 8-byte IBEX.
• XSCA_SHIFTCOUNT_H is the number of IBUF bytes the XBAR has decoded in a single cycle.

These three signals direct the flow of data through the rotator and shifter, and they initiate requests to the VIC and MBox for replenishment.

The shift count (XSCA_SHIFTCOUNT_H[03:00]) is the number of bytes the XBAR has decoded and directs the shifter to shift decoded bytes out of IBUF.

The IBUF valid count (IBUF_VAL_H[03:00]) selects rotate and merger data for replenishing IBUF. This valid count is calculated by subtracting the XBAR shift count from the previous IBUF valid count.

The IBEX valid count (IBEX_VALID_COUNT_H[03:00]) is the number of valid bytes in the 8-byte IBEX.
This valid count is used by the rotator to select valid bytes to be loaded into IBUF. When no valid bytes are in IBEX (IBEX_VALID_COUNT_H = 0), data is loaded into IBUF from IBEX2 or the VIC, or a request is made to the MBox.

3.2.1 IBEX2

IBEX2 is an 8-byte buffer between the VIC and IBEX. IBEX2 receives and outputs data eight bytes per cycle.

IBEX2 receives input data from the VIC and outputs to either IBEX or IBUF. Which unit receives IBEX2 data depends on the valid counts of IBEX and IBUF. For example, if IBEX is empty (IBEX_VALID_COUNT_H = 0) and IBUF is full (IBUF_VALID_COUNT_H = 9), then the eight bytes of IBEX2 data are loaded into IBEX. This load is accomplished by selecting IBEX2 data at the rotator multiplexer (IBEX_ROTATE_SELECT_H[01:00] = 1). The data is not passed through the rotator because the IBUF valid count indicates that IBUF is full.

Validating IBEX2 requires only a single bit, as the buffer is either full or empty. IBEX2_VALID_H is asserted when valid data is in IBEX2. To load data into IBEX2 and assert the IBEX2 valid bit, IBEX must contain valid data (IBEX_EMPTY_TB_H is negated) and a VIC match must occur (VIC_MATCHA_H or VIC_MATCHB_H asserted).

When IBEX2 contains valid data (IBEX2_VALID_H asserted), the prefetch PC cannot be incremented past the next sequential quadword it is addressing. A hold signal (IBUF_HOLD_PREFETCH_PC) is asserted to stop incrementing the PC. This signal is asserted when IBEX2_VALID_H is asserted and IBEX_EMPTY_H is negated. The PC is held because it must continue to point to the next quadword the instruction buffer will receive. When IBEX2 becomes empty, the quadword that the prefetch PC is addressing is loaded into IBEX2 and the PC is incremented.

3.2.2 IBEX

IBEX, unlike IBEX2, can contain valid data in any of its byte locations. This allows IBEX to replenish IBUF with any number of bytes as they are decoded by XBAR.
Data that is loaded into IBEX is sent to the rotator of the instruction buffer. IBEX data is passed to the rotator in a low-to-high byte order, providing IBUF with a sequential I-stream.

IBEX receives data from either IBEX2 or the VIC. If IBEX2 is valid, then it replenishes IBEX. If IBEX2 is not valid, then the VIC (if valid) replenishes IBEX.

3.2.2.1 IBEX Valid Count

Because IBEX replenishes IBUF with as many bytes as needed, it can contain any number of valid bytes between zero and eight. The IBEX valid count (IBEX_VALID_COUNT_H[03:00]) is calculated by subtracting the number of bytes loaded into IBUF (MERGE_COUNT_H[03:00]) from the valid count of the previous cycle.

When IBEX contains valid data, and the IBUF valid count (IBUF_VC_H[03:00]) is less than nine, IBEX data is selected at the rotator multiplexer. The IBEX data is placed on the input of the rotator. The rotator rotates the bytes needed to fill IBUF (ROTATE_SELECT_H[03:00]) and passes the rotated IBEX bytes to the merger. The merger passes shift data and rotate data to IBUF and produces a merge count (MERGE_COUNT_H[03:00]). The merge count is subtracted from the previous IBEX valid count to produce a new IBEX valid count.

When the IBEX valid count is decremented to zero, IBEX_EMPTY_H is asserted and directs IBEX_ROTATE_SELECT_H[01:00] to select VIC data or IBEX2 data at the IBEX rotator multiplexer. When either of these sources is supplying data to IBUF, and IBUF does not require all eight bytes, the remaining bytes are stored and validated in IBEX.

3.2.3 Rotator

The rotator aligns the bytes from IBEX, IBEX2, or the VIC so that they are correctly placed in IBUF. This logic consists of a bank of multiplexers that can select any of the eight bytes and insert them into IBUF in the correct byte position. The rotator is controlled by the IBEX valid count, the IBUF valid count, and the XBAR shift count.
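The valid count bookkeeping for a refill of IBUF from IBEX can be sketched with byte lists. This is a simplified model under stated assumptions: the function name `refill_from_ibex` is hypothetical, the buffers are represented as Python lists holding only their valid bytes in low-to-high order, and the new IBEX valid count is taken to be the previous count minus the merge count, which is the reading that agrees with the Figure 3-7 example (seven valid IBEX bytes before the refill, three after).

```python
IBUF_SIZE = 9   # bytes in IBUF


def refill_from_ibex(ibuf, ibex):
    """Top up IBUF with IBEX bytes in low-to-high order.

    ibuf and ibex hold only their valid bytes.
    Returns (new_ibuf, new_ibex, merge_count).
    """
    needed = IBUF_SIZE - len(ibuf)            # vacant IBUF byte positions
    merge_count = min(needed, len(ibex))      # bytes actually loaded into IBUF
    new_ibuf = ibuf + ibex[:merge_count]      # rotated bytes land above the shifted ones
    new_ibex = ibex[merge_count:]             # remaining bytes stay valid in IBEX
    return new_ibuf, new_ibex, merge_count
```

For example, an IBUF left with five valid bytes after a shift of four takes four bytes from a seven-byte IBEX, leaving IBUF full and IBEX with three valid bytes.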
The rotator receives data from the IBEX rotator multiplexer, which selects IBEX data, IBEX2 data, or VIC data to be input to the rotator. The source of the data at the IBEX rotator multiplexer is selected by IBEX_ROTATE_SELECT_H[01:00]. This select signal is derived from IBEX_EMPTY_H (asserted when IBEX contains no valid data), IBEX2_VALID_H (the single valid bit for IBEX2), and READ_VIC_DATA_H (the read enable signal for the VIC).

IBEX_ROTATE_SELECT_H[01:00] selects valid data from IBEX before selecting IBEX2 or VIC data. If the IBEX data is not valid, IBEX2 data is selected (if valid) before the VIC data. If neither IBEX nor IBEX2 data is valid, VIC data is selected.

Figure 3-7 shows the rotator supplying IBEX data to IBUF to replenish four bytes that have been decoded and shifted out.
• During cycle 1, four bytes, including the opcode, are decoded and shifted out of IBUF. IBEX contains seven valid bytes and IBEX2 is valid.
• During cycle 2, IBUF bytes 4 through 8 are shifted into bytes 0 through 4, and IBEX bytes 1 through 4 are rotated into the empty IBUF bytes. IBUF is full, IBEX contains three valid bytes, and IBEX2 is valid.

Figure 3-7 Rotator

3.2.3.1 IBEX2 Rotate Data

When IBEX is empty, IBEX2 (if valid) replenishes IBUF with an I-stream. Figure 3-8 shows the loading of IBEX2 data into IBUF.
The quadword that IBEX2 contained is partially loaded into both IBUF and IBEX.

In Figure 3-8, the initial IBUF valid count is nine, the IBEX valid count is zero, and IBEX2 is valid.
• During cycle 1, seven bytes of IBUF data are decoded and shifted out of IBUF. The new IBUF valid count is two.
• During cycle 2, IBEX2 data is selected at the rotator multiplexer. Seven bytes of IBEX2 data are loaded into IBUF; one byte is loaded into IBEX. The IBEX2 valid bit (IBEX2_VALID_H) is negated, the IBEX valid count (IBEX_VALID_COUNT_H[03:00]) is one, and the IBUF valid count (IBUF_VC_H[03:00]) is nine. When the IBEX2 valid bit is negated, a read request is sent to the VIC and the prefetch PC is incremented (IBFA_HOLD_PREFETCH_PC_H is negated).

Figure 3-8 IBEX2 Rotate Data

3.2.4 Merger

The merger consists of nine 8-bit, 2-to-1 multiplexers, where each multiplexer replenishes one of the nine bytes of the IBUF. Either rotate data or shift data can be selected as the input, with IBUF as the destination of the output.

Figure 3-9 shows a simplified block diagram of the merger. Each merger multiplexer receives SHIFT_DATA_H and ROTATE_DATA_H as inputs. MERGE_SELECT_H[08:00] selects each byte to be loaded into IBUF.
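The per-byte selection performed by the merger can be sketched as follows. This is an illustrative model, not the select logic itself: the function names are hypothetical, a one in a select bit is taken to choose rotate data and a zero to choose shift data, and the mask formula is an assumption chosen to agree with the Figure 3-10 example, in which an IBUF valid count of nine and a shift count of four give a merge select of 1E0 (hex).

```python
def merge_select(ibuf_valid_count, shift_count, width=9):
    """Return the select mask: bit i set means byte i is taken from the rotator."""
    remaining = ibuf_valid_count - shift_count   # bytes the shifter still supplies
    full = (1 << width) - 1
    return (full << remaining) & full


def merge(shift_data, rotate_data, select):
    """Model nine 2-to-1 multiplexers, one per IBUF byte."""
    return [
        rotate_data[i] if (select >> i) & 1 else shift_data[i]
        for i in range(len(shift_data))
    ]
```

With five bytes remaining after the shift, byte positions 0 through 4 come from the shifter and positions 5 through 8 come from the rotator.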
Figure 3-9 Simplified Merger

Merge select is calculated by subtracting the XBAR shift count from the IBUF valid count. The select logic outputs a 9-bit field (MERGE_SELECT[08:00]), with each bit controlling one of the multiplexers of the merger. A logical one in the field selects rotate data, while a logical zero selects shift data.

Figure 3-10 shows the merger supplying IBUF with data from both the shifter and the rotator. With an XBAR shift count of 4, bytes 1 through 4 are shifted out of IBUF and bytes 0 (opcode) and 5 through 8 are inputs to the merger from the shifter. The merge select logic receives the XBAR shift count and the IBUF valid count, and outputs the 9-bit field that selects shifter bytes 1 through 4 and rotator bytes 5 through 8. The opcode byte, because it is not shifted, is recirculated through the shifter and merger until the instruction is completely decoded.

Table 3-1 is a sample of the merge select output. All outputs are based on an initial IBUF valid count of nine with different shift counts.
Figure 3-10 Merger (example shown: XBAR_SHIFTCOUNT[03:00] = 4, MERGE_SELECT[08:00] = 1E0)

Table 3-1 Sample Merge Select
Columns: XBAR Shift Count, MERGE_SELECT_H[08:00], IBUF Valid Count.
MERGE_SELECT_H[08:00] values, in table order: 1FF, 1FE, 1FC, 1F8, 1F0, 1E0, 1C0, 180, 100, 0.

3.2.5 Shifter

The instruction buffer shifter is responsible for shifting out decoded bytes, holding the opcode byte (byte 0), and realigning the remaining valid bytes in IBUF. XBAR provides three signals that control the shifter:
• XSCA_SHIFTCOUNT[03:00] provides the number of IBUF bytes decoded in the last cycle.
• XSCA_SHIFTOPCODE shifts the opcode out of byte 0. The signal is received when all of the specifier bytes of an instruction have been decoded.
• XSCA_FD_SHIFTOPCODE is asserted when the XBAR detects an FD (extended opcode) in the opcode byte of IBUF. The FD byte is shifted out of byte 0 and the second byte of the opcode is shifted into byte 0.

Figure 3-11 provides examples of the state of IBUF after the completion of three cycles.

NOTE: For simplification, no new I-stream is placed in IBUF at the completion of each cycle.

• Cycle 1: XBAR decodes one byte and produces a shift count of one. Byte 1 is shifted out of IBUF and the remaining bytes are shifted down to replace it.
• Cycle 2: XBAR decodes the two remaining bytes of the instruction, asserts XSCA_SHIFTOPCODE_H, and produces a total shift count of three. The opcode and the two remaining bytes of the instruction are shifted out, with the remaining bytes in IBUF again being shifted down to replenish those shifted out.
• Cycle 3: XBAR decodes the FD in the opcode byte of IBUF, asserts XSCA_FD_SHIFTOPCODE_H, and produces a total shift count of one. The FD in the opcode byte is shifted out and the remaining bytes are shifted down one byte.
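The three cycles above can be reproduced with a short model of the shifter. This is a sketch under stated assumptions: the function name `shift_ibuf` is hypothetical, IBUF is modeled as a list with the opcode at index 0, a single flag stands in for either XSCA_SHIFTOPCODE_H or XSCA_FD_SHIFTOPCODE_H, and the byte values are illustrative rather than read from Figure 3-11.

```python
def shift_ibuf(ibuf, shift_count, shift_opcode=False):
    """Return IBUF contents after one shift, with no new I-stream merged in.

    ibuf[0] is the opcode byte. Without shift_opcode the opcode is held and
    bytes 1..shift_count are removed. With shift_opcode the total shift count
    includes the opcode byte, so bytes 0..shift_count-1 are removed.
    """
    if shift_opcode:
        return ibuf[shift_count:]
    return ibuf[:1] + ibuf[1 + shift_count:]


# Illustrative contents: opcode C1, three specifier bytes, then an FD opcode.
ibuf = [0xC1, 0x00, 0x44, 0x55, 0xFD, 0x7C, 0xD4, 0x05, 0x00]
cycle1 = shift_ibuf(ibuf, 1)                         # one specifier byte decoded
cycle2 = shift_ibuf(cycle1, 3, shift_opcode=True)    # opcode plus two bytes leave
cycle3 = shift_ibuf(cycle2, 1, shift_opcode=True)    # FD byte leaves byte 0
```

After cycle 2 the FD byte occupies byte 0, and after cycle 3 the second opcode byte has moved into byte 0, matching the sequence described in the list.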
Figure 3-11 IBUF Data

3.2.6 IBUF

IBUF holds nine bytes of I-stream that are passed to XBAR for decoding. Byte 0 always contains the opcode of the instruction that is being decoded. A copy of this opcode is sent to the EBox, OPU, XBAR, and the branch prediction logic. Specifier bytes are parsed by XBAR and handled by the XBAR decode units, while the opcode byte remains in IBUF until the instruction has been completely decoded.

3.2.6.1 Simple Decode

As the I-stream is presented to XBAR for decoding, logic in the instruction buffer performs a small amount of decode. The decoded I-stream provides branch information for the PCU and CSU. Register, short literal, and YREG information is also decoded and passed to XBAR. Figure 3-12 shows a block diagram of the instruction buffer decode units.
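The YREG, short literal, and register mode decodes detailed in Sections 3.2.6.3 through 3.2.6.5 can be sketched as one function. This is a hedged illustration only: the function name `simple_decode` is hypothetical, and the bit ordering within each output field (lowest-numbered byte in the least significant bit) is an assumption based on the register mode example in which a value of 3 indicates bytes 5 and 6.

```python
def simple_decode(ibuf):
    """Return (yreg_f, sl_mode, register_mode) for a 9-byte IBUF list.

    ibuf[0] is the opcode byte. Each result is a small bit field with one
    bit per examined byte, lowest-numbered byte in bit 0 (assumed ordering).
    """
    # YREG: low nibble of bytes 1-4 all ones, so PC addressing is possible.
    yreg_f = sum(1 << (i - 1) for i in range(1, 5) if (ibuf[i] & 0x0F) == 0x0F)
    # Short literal: upper two bits of bytes 5-7 are zero.
    sl_mode = sum(1 << (i - 5) for i in range(5, 8) if (ibuf[i] & 0xC0) == 0)
    # Register mode: high nibble of bytes 5-8 equals 5.
    register_mode = sum(1 << (i - 5) for i in range(5, 9) if (ibuf[i] >> 4) == 0x5)
    return yreg_f, sl_mode, register_mode
```

The three results correspond loosely to IBFA_YREG_F_H[04:01], IBFB_SL_MODE_H[07:05], and IBFB_REGISTER_MODE_H[08:05]. Each bit only indicates that a byte may hold such a specifier; the XBAR decides which bytes are actually specifier bytes.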
Figure 3-12 Simple Decode

3.2.6.2 Branch Decode

Each time a new opcode is loaded into IBUF, the opcode is decoded to detect a branch instruction. The decoded branch opcode has two destinations, the CSU and the PCU. The CSU receives a single signal (IBFB_BRANCH_INSTRUCTION_H) that indicates a branch instruction is currently being decoded.

The PCU receives more detailed information relevant to the branch. The four signals provide information to aid in directing the PCU and branch prediction cache (BPC) handling. The four signals provided by the branch decode logic are as follows:
• IBFB_BRANCH_OPCODE_H is asserted if the opcode is a branch instruction.
• IBFB_UNCONDITIONAL_H is asserted if the opcode is an unconditional branch.
• IBFB_LOOP_BRANCH_H is asserted for any loop branch instruction, except for an emulated branch. (Loop branches are AOBLEQ, AOBLSS, SOBGEQ, SOBGTR, ACBL, ACBW, and ACBB.)
• IBFB_CACHEABLE_H informs the PCU that the branch opcode in IBUF can be cached in the BPC. (Noncacheable branches are RSB, JSB, JMP, CALLG, and CALLS. These instructions do not have normal branch displacement specifiers.)

3.2.6.3 YREG Decode

The low nibbles of bytes 1, 2, 3, and 4 are decoded to provide YREG information for XBAR. When all four bits are set, PC addressing is possible.

3.2.6.4 Short Literal Decode

Bytes 5, 6, and 7 are decoded to detect short literal specifiers. If the upper two bits of bytes 5, 6, and 7 are zeros, a short literal specifier is detected.
Each bit in the signal corresponds to the byte number in the I-stream of IBUF.

3.2.6.5 Register Mode Decode

Bytes 5 through 8 are decoded to detect a value of 5 in the high nibble. The high nibble of each IBUF I-stream byte is input to the decoder, producing an output with each bit in the signal corresponding to the byte being decoded (for example, if IBFB_REGISTER_MODE_H[08:05] = 3, then bytes 5 and 6 may contain register specifiers).

3.2.7 Instruction Buffer Parity

Instruction buffer data is byte parity protected and is checked at two locations. The outputs of IBEX and IBUF contain parity detection circuitry.

As data is output from IBEX, a parity check is performed. IBFB receives partial parity from IBFA and performs the check. Detected errors assert IBFB_IBEX_ERROR_H, which asserts IBFB_FETCH_ERROR_H and is forwarded to the EBox as IBOX_FORK_ERROR_H.

The data that is being presented to the XBAR is checked at the output of the instruction buffer. IBFB receives partial parity from IBFA and performs the check. Detected errors assert IBUF_ERROR_H, which is latched to the XBAR as DECODE_ERROR_H. This error is forwarded to the EBox as IBOX_POINTER_ERROR_H.

3.3 Instruction Buffer Interface

The instruction buffer interface is a read-only port to the MBox. Requests are made across this port for the I-stream to be loaded into the VIC. Most requests are for a VIC block (four aligned quadwords). Figure 3-13 summarizes the instruction buffer interface to the MBox. Table 3-2 describes each signal line.
Figure 3-13 Instruction Buffer Interface

Table 3-2 Instruction Buffer Interface Signals

Name: IBOX_IB_REQUEST_H
Description: The request for data from the MBox. Detection of a VIC miss or flush of the VIC asserts this line.

Name: IBOX_IB_ADDRESS_H[31:03]
Description: Address lines for the requested quadword. The address is generated by the PCU and, because all requests are quadword aligned, the lower three bits of the address are assumed to be zero.

Name: IBOX_IB_ADDRESS_PARITY_H[03:00]
Description: Parity protection for the address that the IBox sends.

Name: IBOX_IB_ABORT_H
Description: Asserted to abort a request. The signal aborts a request only if it is in the TB stage of the MBox.

Name: IBOX_ABORT_H
Description: Overriding abort signal for the interface. When asserted, it asserts IBOX_IB_ABORT. An EBox flush or branch prediction unwind asserts this signal. This signal also aborts OPU port transactions.

Name: MBOX_IB_DATA_H[63:00]
Description: Returning quadwords requested by the IBox.

Name: MBOX_IB_DATA_PARITY_H[07:00]
Description: Byte parity for the quadword being returned by the MBox.

Name: MBOX_IB_RESPONSE
Description: Informs the IBox that data will be returned in the next cycle. The signal is negated late in the cycle that the last quadword is being returned.

Name: MBOX_IB_PAGE_FAULT_H, L
Description: Asserted to inform the IBox that the requested data has page faulted in the MBox. The instruction buffer and the XBAR receive this signal and inform the EBox of the page fault if decode of the data is attempted.

3.3.1 Instruction Buffer Requests

Figure 3-14 shows an instruction buffer request that is honored, with three quadwords written to the VIC.

Usually, IBOX_IB_ADDRESS_H[31:03], IBOX_IB_REQUEST_H, and IBOX_IB_ABORT_H are asserted.
In response to these signals, the MBox latches the address, and detects and aborts the request each cycle. A request is initiated by negating the abort signal (IBOX_IB_ABORT_H) early in the cycle after the address has been latched in the MBox. This signal is negated when REQUEST_CONDITION_H is asserted, which occurs when IBEX_EMPTY_H is asserted and IBEX2_VALID_H, VIC_MATCHA_H, and VIC_MATCHB_H are all negated. The MBox detects the request early in this same cycle and the IBox lowers the request signal.

The request is detected in the MBox translation buffer (TB) stage. In this stage, the MBox latches the request and address, and arbitrates for the TB. When control of the TB is received, a TB lookup and a validation of the translation are performed. Figure 3-14 shows the request in the TB stage for a single cycle. Figure 3-14 is a best-case example, because TB arbitration could cause several cycles of delay if the MBox were honoring other requests. The instruction buffer port has the lowest priority of all ports that arbitrate for the TB.

Cache arbitration results in a cache hit or miss. A cache hit asserts MBOX_IB_RESPONSE_H late in the cycle. In the following cycle, the MBox returns the requested quadword (MBOX_IB_DATA[63:00]) and subsequent quadwords to the end of the block. Late in the same cycle that the last quadword is being returned, the MBox response signal is negated. In the same cycle, the abort signal and then the request signal are asserted again.

[Figure 3-14 timing diagram. Traces: IBOX_IB_ADDRESS_H[31:03], IBOX_IB_REQUEST_H, IBOX_IB_ABORT_H, MBOX TB, MBOX_IB_RESPONSE_H, MBOX CACHE, MBOX_IB_DATA[63:00] (QW0, QW1, ...), MBOX_IB_PAGE_FAULT_H, IBOX_ABORT_H.]

Figure 3-14 Instruction Buffer Request

3.3.1.1 Aborting Requests

Two abort signals are associated with the instruction buffer interface:

• IBOX_IB_ABORT_H
• IBOX_ABORT_H

IBOX_IB_ABORT_H, when asserted, attempts to abort the request.
This signal is not asserted for a request that has entered the cache stage because of the penalties the MBox must pay to clean up. The MBox returns the first quadword and then aborts. The quadword is not written to the VIC because IBOX_IB_ABORT_H, asserted, negates VIC_DATA_WRITE_H.

IBOX_ABORT_H, when asserted, unconditionally aborts instruction buffer requests. This signal is asserted by an EBox flush (EBOX_FLUSH_H[02:00]) due to an error, interrupt, or branch unwind situation.

3.3.1.2 Page Faults

When a page fault is detected, the MBox asserts MBOX_IB_RESPONSE_H and MBOX_IB_PAGE_FAULT_H in the same cycle. The page fault signal is sent to the instruction buffer and the XBAR. The XBAR notifies the EBox of page faults if decode of the data is attempted. If the VIC is flushed, the data is not accessed and the MBox clears the page fault and associated registers.

4 Instruction Decode

This chapter describes the two functional units representing the instruction decode pipeline stage: the XBAR and branch prediction logic.

4.1 XBAR

The XBAR decodes the individual macroinstructions and determines the following:

• Number of instruction specifiers
• Destination of each decoded specifier
• Number of specifiers decoded in a single cycle

Each cycle, the XBAR attempts to decode and pass specifier data to the specifier handling units of the IBox: SLU, CSU, and FPL. Because each unit processes a unique specifier type, the number of combinations of specifiers the XBAR can successfully decode each cycle is restricted.

The XBAR can decode up to three specifiers in a single cycle. The specifiers may be all register mode, two register mode and one short literal or complex, or one of each (register, complex, or short literal) in any order. Complex specifiers are branch displacements and all specifiers other than short literal mode and register mode specifiers. The XBAR concurrently processes specifiers of a single instruction only.
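The combination rule above can be written as a small predicate. The Python sketch below is illustrative only: the function name and the 'R', 'SL', and 'C' labels are hypothetical, and it assumes the rule reduces to "at most three specifiers, of which at most one is short literal and at most one is complex." That reading is an interpretation of the text, not a description of the XBAR gate logic.

```python
from collections import Counter

def can_decode_in_one_cycle(kinds):
    """Return True if the specifier kinds fit the stated per-cycle limits.

    kinds: sequence of 'R' (register), 'SL' (short literal), 'C' (complex).
    Assumption: one short literal path and one complex path per cycle,
    so at most one 'SL' and at most one 'C' among at most three specifiers.
    """
    counts = Counter(kinds)
    if set(counts) - {"R", "SL", "C"}:
        raise ValueError("unknown specifier kind")
    return len(kinds) <= 3 and counts["SL"] <= 1 and counts["C"] <= 1
```

Under this reading, three register specifiers, or one of each kind in any order, are accepted, while two short literals or two complex specifiers in the same cycle are rejected.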
The XBAR is also responsible for generating read and write register masks for conflict checking by the CSU. In special cases where conflicts occur in a single instruction, the XBAR redirects register specifiers to the CSU for processing.

Figure 4-1 is a block diagram of the XBAR. As shown in Figure 4-1, the XBAR receives 72 bits of instruction buffer data and distributes it throughout the associated MCAs (XSCA, XDTA, and XDTB). The following list introduces the major XBAR functional blocks.

Figure 4-1 XBAR Block Diagram

• DRAM and XRAM: The DRAM and XRAM decode the opcode and produce the following:

  - Specifier count: The total number of specifiers the instruction contains.
  - Specifier data type: The data type of each specifier in the instruction (byte, word, longword, etc.).
  - Specifier access type: The access type of each specifier in the instruction (read, write, modify, etc.).

  The XRAM specifier count output controls the instruction buffer shifter in two ways: it provides a running total of specifiers decoded each cycle, and it informs the instruction buffer to load a new opcode into the opcode byte (byte 0) of the IBUF when all of the specifiers of an instruction have been decoded by the XBAR.

  The specifier attribute outputs (data type and access type) are distributed throughout the XBAR to the individual data path units (displacement, short literal, and source and destination logic units) to validate outputs to the specifier handlers. These outputs are also used to generate a read and write mask and to detect intra-instruction read conflicts (IRC).
• Simple decode logic: Simple decode logic decodes the I-stream and provides the addressing mode (register mode, absolute mode, and so on) for the first four bytes of the instruction. The addressing mode outputs are used to validate the data path unit outputs, and are used by the decode tree logic and request logic when generating their shift count and specifier count outputs.

• Decode tree and request logic: Each cycle, the decode tree logic and request logic perform a parallel operation that determines the number of specifiers the XBAR will decode and the number of specifier bytes that will be decoded.

  The decode trees produce shift count and specifiers decoded outputs based on the decoded addressing modes and data types and by decoding the I-stream. There are 14 pairs of outputs (one shift count and one number of specifiers decoded) generated each decode cycle.

  Request logic decodes addressing modes, specifier access types, and the specifier count. It produces an output that selects the correct decode tree output for the instruction currently being decoded.

  For example, for the instruction ADDL2 R0, R1, the request logic determines that there are two specifiers, the access types are read and modify, and there are two register specifiers in the instruction. The request logic selects the R2 output (R2_SC_H[02:00] shift count and R2_N_H[01:00] specifiers decoded count) from the decode tree outputs. The R2 output provides the correct shift count to the instruction buffer and the correct number of specifiers decoded to the specifier count logic.

  The decode tree logic has produced 14 outputs for ADDL2 R0, R1, with only one of the outputs carrying the correct specifiers decoded count and the correct shift count. The R2 output is produced by logic that assumes the instruction contains two specifiers. In the same cycle that the R2 output is selected, outputs based on different numbers and structures of specifiers are produced.
If the R3BW output were selected, the shift count and number of specifiers decoded would be equal to those based on decoding three specifiers, with the third specifier being a branch word displacement.

  The parallel operation of the decode tree logic and the request logic is implemented to reduce cycle time. The alternative is serial: first decode the specifiers of an instruction (to determine their addressing modes, access types, and data types), and then calculate the number of bytes that will be decoded and the number of specifiers that can be decoded.

• Displacement logic: The displacement logic receives the complex specifiers (other than register and short literal specifiers) and, when a specifier is valid, passes up to 32 bits of displacement to the CSU of the OPU MCU.

• Short literal logic: The short literal unit decodes the short literal specifiers and passes six bits of short literal data to the SL specifier handler in the OPU for expansion.

• Source and destination logic: The source and destination logic outputs source 1 pointers, source 2 pointers, destination pointers, and a read and write register field to the OPU specifier handlers and to the read/write mask logic. The source and destination pointers define the operand addresses (register number or memory location) of the specifiers. The read/write masks record the reading and writing of registers by the EBox during the execution of an instruction.

• IRC logic: The IRC logic detects intra-instruction read conflicts. These conflicts occur when there are read conflicts in the specifiers of an instruction. For example, an instruction specifier directs the EBox to read R0 and a subsequent specifier directs the IBox to autoincrement or autodecrement R0. When IRCs occur, the IBox decodes each specifier separately and does not update the EBox GPRs until instruction execution is completed by the EBox.
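The IRC condition described in the last item can be illustrated with a short Python sketch. It is a behavioral model under stated assumptions, not the XBAR logic: the function name and the (mode, register, is_read) tuple layout are hypothetical, only autoincrement and autodecrement are treated as modifying modes (as in the text), and the mode numbers are the standard VAX specifier encodings.

```python
# Standard VAX addressing mode numbers (high nibble of the specifier byte).
REGISTER = 0x5        # Rn
AUTODECREMENT = 0x7   # -(Rn)
AUTOINCREMENT = 0x8   # (Rn)+

def has_intra_instruction_read_conflict(specifiers):
    """Detect a register read followed by an autoinc/autodec of that register.

    specifiers: iterable of (mode, register, is_read) tuples in I-stream
    order. The tuple layout is an assumption made for this sketch.
    """
    read_registers = set()
    for mode, register, is_read in specifiers:
        if mode in (AUTODECREMENT, AUTOINCREMENT) and register in read_registers:
            return True
        if mode == REGISTER and is_read:
            read_registers.add(register)
    return False
```

For instance, a sequence such as R0 (read), (R0)+, R1 is flagged, while R0, R1 is not. The sketch checks only the ordering the text describes, a read followed by a later modification.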
4.1.1 DRAM

The XBAR DRAM is implemented as a functional logic block in the XSCA MCA. That is, the DRAM is composed of logic gates instead of the RAM structure used in previous VAX systems. The inputs to the DRAM logic are the instruction opcode, the extended opcode bit, and the specifiers remaining in the instruction currently being decoded. Based on these inputs, the number of specifiers the instruction contains and the specifier attributes (access type and data type of each specifier) are determined. The outputs of the DRAM are passed to the other functional units of the XBAR, where they validate the specifier handler outputs and feed the specifier count and shift count logic. Figure 4-2 shows a block diagram of the DRAM logic.

[Figure 4-2 block diagram. Inputs: EXTENDED_H, I_DATA_H[07:00], SPECIFIERS_COMPLETED_H[02:00]. Outputs: SP1_ACCESS_TYPE through SP3_ACCESS_TYPE and SPEC1_DATA_TYPE_H[02:00] through SPEC3_DATA_TYPE_H[02:00], selected from the decodes for specifiers 1 through 6. Access types shown include ASRC, VSRC read, read, write, modify, implied read, and branch byte; data types shown include byte, word, longword, F floating, G floating, and H floating.]

Figure 4-2 DRAM Logic

DRAM data type logic decodes the opcode (I_DATA_H[07:00]) and the extended opcode bit (EXTENDED_H), and produces SP1_DATA_TYPE_H[02:00] through SP6_DATA_TYPE_H[02:00]. Each output describes the data type of the specifier it represents. (For example, SP1_DATA_TYPE_H[02:00] describes the data type of the first specifier of the instruction being decoded.) Each cycle, SPECIFIERS_COMPLETED_H[02:00] selects up to three of the data type outputs to be used by the XBAR decode units. SPECIFIERS_COMPLETED_H[02:00] is the number of specifiers that have been decoded in the instruction that is currently being decoded.

DRAM access type outputs are also based on the decode of the instruction opcode and the extended opcode bit.
The access type of each specifier (for example, SPEC1_READ_H and SPEC1_WRITE_H) is placed on the DRAM output multiplexers and is also selected by SPECIFIERS_COMPLETED_H[02:00].

4.1.2 XRAM

The XRAM is physically structured like the DRAM and is contained on XDTB. The XRAM decodes the instruction opcodes and the extended bit, and produces five fields:

• SPECIFIER_COUNT_H[02:00] is the total number of specifiers the instruction contains.

• IMPLIED_MASK_H is asserted when a character string instruction is detected.

• VSRC_FORK_MODIFY_H is asserted when a variable length bit field instruction is decoded.

• SUSPEND_H is asserted when the instruction that is currently being decoded does not leave predictable results in memory or registers. The XBAR continues to decode specifiers until it encounters a complex specifier and then suspends (stops all decoding) and waits for the EBox to restart it (EBOX_UNSUSPEND_H). Some of the instructions that assert SUSPEND_H are as follows:

    ADAWI (add aligned word interlocked)
    ADDP4 (add packed 4-operand)
    EDITPC (edit packed to character string)

• STOP_H is similar to SUSPEND_H. STOP_H is asserted by certain instructions after all of the specifiers of the instruction have been decoded. When STOP_H is asserted, the XBAR does not process any specifiers until the EBox asserts EBOX_UNSUSPEND_H. Some instructions that assert STOP_H are as follows:

    HALT
    CASEx (case byte, word, longword)
    CHMx (change mode)

4.1.3 Simple Decode Logic

The simple decode logic receives the high nibble of the first four bytes of instruction buffer data and I_YREG_F[04:01] (used to determine which bytes are relative addressing mode or immediate addressing mode) from the instruction buffer simple decode logic. Outputs defining the addressing modes of bytes 1 through 4 of the I-stream are produced from these inputs. Figure 4-3 shows the inputs and outputs of the simple decode logic.
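As an illustration of the kind of classification the simple decode logic performs, the Python sketch below sorts a specifier byte by its high nibble and register field. It relies on the standard VAX operand specifier encoding (mode in the high nibble, register in the low nibble, register F being the PC). The function name and returned labels are hypothetical, and the sketch returns a single label per byte. How the hardware combines its REGISTER_MODE, SL_MODE, INDEX_MODE, COMPLEX, ABSOLUTE_MODE, and IMMEDIATE_MODE outputs is not modeled.

```python
def classify_specifier_byte(byte):
    """Classify one specifier byte using the standard VAX encoding.

    Returns one of: 'short_literal', 'index', 'register', 'immediate',
    'absolute', 'complex'. Labels are illustrative, not signal names.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("specifier byte must be in the range 0..255")
    mode = byte >> 4          # high nibble: addressing mode
    register = byte & 0xF     # low nibble: register number
    if mode <= 0x3:
        return "short_literal"            # modes 0-3: 6-bit literal
    if mode == 0x4:
        return "index"                    # mode 4: index prefix [Rx]
    if mode == 0x5:
        return "register"                 # mode 5: Rn
    if register == 0xF and mode == 0x8:
        return "immediate"                # (PC)+ is immediate mode
    if register == 0xF and mode == 0x9:
        return "absolute"                 # @(PC)+ is absolute mode
    return "complex"                      # everything else

# Specifier bytes for ADDL3 R1, #43, R5 are 0x51, 0x2B, 0x55.
kinds = [classify_specifier_byte(b) for b in (0x51, 0x2B, 0x55)]
```

The register field check for value F plays the role the text assigns to I_YREG_F: it separates the PC-based modes from the general register modes that share the same mode nibble.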
The simple decode logic also decodes the I-stream to produce CASE_H[01:00]. This 2-bit signal defines where in the I-stream the complex specifier is located. CASE_H[01:00] is valid when any addressing mode other than short literal or register is detected. Table 4-1 describes the four case outputs. CASE_H[01:00] is input to the decode tree logic to produce shift counts for complex specifiers. Because CASE_H[01:00] determines only the location of the complex specifier, it must be decoded with the addressing mode to produce the length of the specifier for the shift count logic.

[Figure 4-3 block diagram. Inputs: I_DATA_H[39:36, 31:28, 23:20, 15:12], I_DATA_L[39:36, 31:28, 23:20, 15:12], I_YREG_F_L[04:01]. Outputs: REGISTER_MODE[04:01], SL_MODE[04:01], INDEX_MODE[03:01], COMPLEX[04:01], ABSOLUTE_MODE[04:01], IMMEDIATE_MODE[04:01], CASE[01:00].]

Figure 4-3 Simple Decode Logic

Table 4-1 Case Output

CASE_H[01:00]    Byte Position of Complex Specifier

00               [not legible]
01               [not legible]
10               [not legible]
11               [not legible]

4.1.4 Decode Tree Logic

Each cycle, the decode tree logic decodes the I-stream and produces 14 unique outputs. One of the outputs is selected, at the end of a cycle, to produce a shift count and the number of specifiers decoded. Figure 4-4 shows the inputs and outputs of the decode tree logic.
[Figure 4-4 block diagram. Inputs: I_DATA_H and I_DATA_L (high-order bits of bytes 1 through 4), I_VALID[08:01], REGISTER_MODE, SL_MODE, INDEX_MODE, ABSOLUTE_MODE, IMMEDIATE_MODE, CASE[01:00], SP1_DATATYPE_H[02:00], SP2_DATATYPE_H[02:00], IRC_L, OPU_STALL, and SL_STALL. Decode trees shown: R1, R1BB, R1BW, R1I, R2, R2BB, R2BW, R2I, R2R, R3, R3BB, R3BW, and R3XR, each producing a shift count (Rx_SC) and a specifiers decoded count (Rx_N).]

Figure 4-4 XBAR Decode Trees

Each of the 14 outputs contains two fields. The two fields determine the number of specifiers decoded and a shift count for the XBAR:

• Rx_SC_H[02:00] is the shift count. The shift count is the number of IBUF bytes that will be decoded when this tree output is selected.

• Rx_N_H[01:00] is the specifier count. This count represents the number of specifiers that are decoded when this tree output is selected.

The decode tree logic receives the following inputs:

• I-stream: The high nibble of bytes 1 through 4 (I_DATA_H[38:37, 31:29, 23:21, 15:13]) and their valid bits (I_VALID_H[04:01]). The valid signals are received from the IBUF valid count.

• Addressing modes: The addressing mode logic decodes the I-stream and supplies the addressing mode.

• Case: Indicates the location of the complex specifier in the I-stream.

• Data type: The DRAM decodes the opcode and outputs the specifier data types to the tree logic.
• IRC: IRC decode logic passes this signal when an IRC is detected in the I-stream currently being decoded.

• Stall: If the CSU or SLU is stalled, the decode tree logic is notified. The stall signal from either of these two units (OPU_STALL_H and SL_BUSY_STALL) influences decode tree output. If the SLU is stalled, the decode tree cannot produce an output that selects data to be sent to the SLU of the OPU.

Figure 4-5 shows a block diagram of the R2BW decode tree. This example of the tree logic is used because it is one of the less complex.

[Figure 4-5 logic diagram. Inputs: OPU_STALL_H, OPU_STALL_L, I_VALID_H[01], I_VALID_L[01], I_VALID_H[03], I_VALID_L[03]. Outputs: R2BW_SC_H[01], R2BW_SC_H[00], R2BW_N_H[01], R2BW_N_H[00].]

Figure 4-5 R2BW Decode Tree

The outputs of the R2BW tree logic are based on an I-stream that contains two valid specifiers, with the second specifier being a word displacement of a branch instruction.

As shown in Figure 4-5, I_VALID_H[03] (IBUF valid bit for byte 3) selects R2BW_N_H[00] from I_VALID_L[01] or OPU_STALL_L (the stall signal for the CSU). The valid bits for IBUF bytes 1 and 3 are ORed to produce R2BW_SC_H[00], and the valid bit for IBUF byte 3 and the CSU stall signal produce R2BW_SC_H[01] and R2BW_N_H[01].

This tree logic output would be selected if the XBAR is decoding an ACBW R1, R2, R3, word displacement and the first two specifier bytes of the instruction had been decoded in a previous cycle. The output would be valid because the XBAR is decoding two specifiers, with the second specifier being a branch word displacement. The output for this decode cycle of the instruction would be R2BW_SC_H = 2 and R2BW_N_H = 2.

Table 4-2 describes the I-stream addressed by each decode tree.

Table 4-2 XBAR Decode Trees

Name    Function

R0      No specifiers are decoded.
R1      Decodes one specifier.
R1BB    Decodes one branch byte specifier.
R1BW    Decodes one branch word specifier.
R1I     Decodes one implied specifier.
R2      Decodes two specifiers.
R2BB    Decodes two specifiers; the first is a register or short literal specifier and the second is a branch byte displacement.
R2BW    Decodes two specifiers; the first is a register or short literal specifier and the second is a branch word displacement.
R2I     Decodes two specifiers; the second is implied.
R2R     Decodes two specifiers; the second is a register specifier.
R3      Decodes three nonconflicting specifiers.
R3BB    Decodes three specifiers; the third is a branch byte displacement.
R3XR    Decodes three specifiers but not in the same cycle.
R3BW    Decodes three specifiers; the third is a branch word displacement.

4.1.5 Request Logic

The request logic decodes the specifier addressing modes and access types, and it outputs the 4-bit field (REQUEST_H[03:00]) that selects 1 of the 14 decode tree outputs (Rx_SC_H[02:00] and Rx_N_H[01:00]). Figure 4-6 shows the relationship of these units and shows the inputs and outputs of the request logic.

[Figure 4-6 block diagram. Request logic inputs: READ, WRITE, MODIFY, BRANCH_BYTE, BRANCH_WORD, VSRC_READ, VSRC_MODIFY, ASRC, IMPLIED_READ, IMPLIED_WRITE, REGISTER_MODE, SL_MODE, INDEX_MODE, COMPLEX, SPECIFIERS_NEEDED[01:00], and XBAR_STALL_H. Outputs: REQUEST_H[03:00] and REQUEST_L[03:00], which select one decode tree output pair (Rx_SC_H[02:00] and Rx_N_H[01:00]) from the R1 through R3XR trees.]

Figure 4-6 Request Logic

The request logic uses SPECIFIERS_NEEDED_H[01:00] to determine the number of specifiers to decode.
(For example, if only one specifier is to be decoded, the request logic selects only one of the R1 trees.) The addressing mode logic provides four inputs:

• Register mode
• Short literal mode
• Index mode
• Complex

These inputs determine the number of specifiers that can be decoded and passed to the specifier handlers. For example, for ADDL3 R1, #43, R5, the request logic and decode tree logic perform their parallel operations as follows:

• Request logic: Receives SPECIFIERS_NEEDED_H[01:00] = 3 (the number of specifiers in the instruction) from the XRAM and specifier count logic. The addressing modes of the three specifiers in the instruction are provided by the simple decode logic (REGISTER_MODE_H[03:01] = 5, so specifiers 1 and 3 are register specifiers, and SL_MODE_H[02:01] = 2, so specifier 2 is an SL specifier). The access type of the specifiers is also input to the request logic. These inputs, supplied by the DRAM, would be READ_H[03:01] = 011 and WRITE_H[03:02] = 10, signifying read, read, write as the order of access in the instruction. Specifier data types (SP1_DATATYPE_H[02:00] and SP2_DATATYPE_H[02:00] = 0, longword) are also input to the request logic. These signals are also from the DRAM. Based on the above inputs, the request logic output (REQUEST_H[03:00]) selects R3_SC_H[03:00] and R3_N_H[01:00].

• Tree logic functions: In parallel with the request logic functions, the decode trees decode the I-stream, the addressing modes and data types of the specifiers, and the stall signals from the specifier handlers of the OPU to produce shift counts and specifier decode counts for the instruction. The tree logic (specifically the R3 tree) produces shift counts (R3_SC_H[03:00]) and the number of specifiers decoded (R3_N_H[01:00]) for an instruction that contains three specifiers. The tree logic produces the R3_N_H[01:00] outputs first and uses them as a basis for generating R3_SC_H[03:00].
To generate R3_N_H[01:00], the logic inputs the valid counts for the instruction buffer data that is being decoded (I_VALID_H[08:00]).

4.1.6 Specifier Count Logic

Specifier counts are produced every cycle to determine the initial number of specifiers an instruction contains (SPECIFIER_COUNT_H[02:00]), the number decoded in a cycle (SPECIFIERS_DECODED_H[02:00]), and the number that remain to be decoded (SPECIFIERS_REMAINING_H[02:00]). Figure 4-7 shows the specifier count logic.

Initially, SPECIFIER_COUNT_H[02:00] is produced by the XRAM and input to the specifier count logic. The XRAM generates the specifier count based on the opcode of the instruction.

• The decoded specifier output of the decode tree logic (N_L[01:00]) is subtracted from the specifier count to produce SPECIFIERS_REMAINING_H[02:00].

• SPECIFIERS_NEEDED_H[01:00] is the number of specifiers needed in the current decode cycle. This signal is decoded so that it represents a maximum of three, because the maximum number of specifiers decoded in a single cycle is three. This signal is loaded into an adder and a comparator.

• XSCA_SPECIFIERS_DECODED_H[01:00] is the latched value of N_L[01:00] and is loaded into an adder with XDTB_SPECIFIERS_REMAINING_H[02:00] to produce XSCA_SPECIFIERS_REMAINING_H[02:00].

• The comparator receives N_L[01:00] and XSCA_SPECIFIERS_REMAINING_H[02:00], and outputs XSCA_ALL_SPECIFIERS_COMPLETED_H when they are equal.

[Figure 4-7 block diagram. Signals shown: SPECIFIER_COUNT_H[02:00], SPECIFIERS_NEEDED_H[02:00], N_L[01:00], XSCA_SPECIFIERS_DECODED_H[01:00], XDTB_SPECIFIERS_REMAINING_H[02:00], XDTB_SPECIFIERS_COMPLETED[02:00], OPCODE_REQUIRED, OPCODE_REQUIRED_L, and ALL_SPECIFIERS_COMPLETED_H. Blocks shown: latch, subtract, add, and compare.]

Figure 4-7 Specifier Count Logic

4.1.7 Shift Count Logic

Each cycle, the XBAR passes a shift count (XSCA_SHIFTCOUNT_H[03:00]) to the instruction buffer.
The shift count directs shifting of decoded bytes out of the instruction buffer and replenishment with a new I-stream. Three signals provide control to the instruction buffer shifter:

• XSCA_SHIFTCOUNT_H[03:00]
• XSCA_FD_SHIFTOPCODE_H
• XSCA_SHIFTOPCODE_H

Figure 4-8 shows the generation of these three signals.

XSCA_SHIFTCOUNT_H[03:00] is generated from the decode tree logic SC_H[02:00] and is passed as the value to be loaded into the instruction buffer shifter. When an instruction is completely decoded, XSCA_SHIFTOPCODE_H is asserted and selects the incremented SC_H[02:00] to be sent to the shifter.

4.1.7.1 FD Shift Opcode

XSCA decodes the FD opcodes and asserts XSCA_FD_DETECTED_H. This signal is passed to the shift count logic, which asserts XSCA_FD_SHIFTOPCODE_H and directs the instruction buffer to shift out the FD opcode. Shifting the FD out of the opcode byte results in a shift count of one, as no specifiers are decoded until the FD is shifted.

[Figure 4-8 block diagram. Signals shown: SC_H[02:00], SPECIFIERS_REMAINING_H[02:00], N_L[01:00], XSCA_FD_DETECTED_H, XSCA_SHIFTCOUNT_H[03:00], XSCA_SHIFTOPCODE_H, and XSCA_FD_SHIFTOPCODE_H. Blocks shown: compare and latch.]

Figure 4-8 Shift Counts

4.1.8 Fork Logic

The EBox receives a fork address for each instruction that is decoded by the XBAR. The fork address is supplied by the XBAR and the instruction buffer. The instruction buffer sends two fork signals:

• IBOX_FORK_ADDRESS_H[07:00]
• IBOX_FORK_ADDRESS_PARITY_H

The fork address is a copy of the opcode. The XBAR outputs three fork signals:

• IBOX_FORK_VALID_H is the valid signal for the fork address.

• IBOX_FORK_ADDRESS_H[08] is asserted when an FD opcode is decoded.

• IBOX_REGISTER_FORK_H distinguishes between a register and a memory reference (asserted = register) when a VSRC specifier is decoded. When a VSRC specifier is to be decoded, the fork is not validated until the determination between memory or register reference is made.
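The way the fork address combines the opcode copy with the FD indication can be sketched in Python. This is a minimal model under assumptions that the text does not state outright: bit 08 is treated as the high-order bit of a 9-bit value whose low eight bits are the opcode byte, and for an FD-prefixed instruction the byte following FD is taken as the opcode copy. The function name and return shape are hypothetical. Parity is omitted because its polarity is not given.

```python
FD_PREFIX = 0xFD  # first byte of a VAX two-byte (extended) opcode

def fork_address(i_stream):
    """Return (fork_address, opcode_length_in_bytes) for the next instruction.

    Assumed layout: bits 07:00 = opcode byte, bit 08 = FD opcode detected.
    """
    if len(i_stream) == 0:
        raise ValueError("empty I-stream")
    if i_stream[0] == FD_PREFIX:
        if len(i_stream) < 2:
            raise ValueError("FD prefix without a second opcode byte")
        # The FD byte is shifted out first, then the next byte is the opcode.
        return (1 << 8) | i_stream[1], 2
    return i_stream[0], 1
```

A single-byte opcode yields a fork address below 0x100, and an FD-prefixed opcode yields one in the range 0x100 through 0x1FF.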
4.1.9 XBAR Displacement Data Path

Up to 32 bits of displacement can be passed by the XBAR to the CSU in a single cycle. Figure 4-9 shows the logic that decodes the complex specifiers and passes the related displacement to the CSU. The XBAR displacement logic outputs four fields to the CSU:

• DISPLACEMENT_H[31:00] is 32 bits of branch displacement or complex specifier data.

• XREG_H[03:00] identifies the GPR used as the index register for indexed operand specifiers.

• YREG_H[03:00] identifies the GPR used as the base register for the operand specifier being delivered.

• XDTB_INDEXED_H indicates that the specifier under decode is indexed mode.

4.1.9.1 Displacement

DISPLACEMENT_H[31:00] is provided by XDTA and XDTB. The 32-bit field is nibble sliced as follows:

• XDTA_DISPLACEMENT_H[27:24, 19:16, 11:08, 03:00]
• XDTB_DISPLACEMENT_H[31:28, 23:20, 15:12, 07:04]

The displacement field is selected from the instruction buffer data by decoding:

• XSCA_REQUEST_H[03:00]
• CASE_H[01:00]
• XSCA_X8F_H

Decoding these three fields determines where in the I-stream the complex specifier is located and whether the specifier is an extended immediate mode specifier (XSCA_X8F_H).

[Figure 4-9 block diagram. Displacement select: XSCA_REQUEST_H[03:00], CASE_H[01:00], and XSCA_X8F_H select DISPLACEMENT_H[31:00] from I_DATA_H[71:00]. XREG_H[03:00] and YREG_H[03:00] are selected from the low nibbles of the specifier bytes; YREG select inputs are XSCA_REQUEST_H[03:00], CASE_H[01:00], and XSCA_IRC_H. Index decode inputs are REGISTER_MODE_H[01], CASE_H[01:00], IRC_REGISTER_H, IMPLIED_H, and INDEX_MODE_H[02:00]; the output is XDTB_INDEXED_H.]

Figure 4-9 XBAR Displacement

4.1.9.2 Extended Immediate Mode (X8F) Detection

Most complex specifiers are decoded by the crossbar and passed to the CSU in a single cycle across the 32-bit XBAR to OPU data path. Extended immediate mode specifiers are of data types longer than 32 bits and require more than a single cycle to be decoded and passed to the OPU.
Because of the size of these specifiers, special handling by the XBAR is required to process them. The special handling involves manipulating the shift counts and specifier counts produced by the XBAR.

XSCA contains logic that detects extended immediate mode specifiers. This logic decodes the specifiers' data types (SP1_DATATYPE_H[02:01] and so on) from the DRAM, together with IMMEDIATE_MODE_H[04:01], INDEX_MODE_H[02:01], and CASE_H[01:00] (all from the simple decode logic). When extended immediate mode is detected, X8F_H, X8F_INHIBIT_SHIFTOPCODE_H, and X8F_SC_H[02] are asserted.

Asserting these X8F signals forces N_H[01:00] (from the decode trees, signifying the number of specifiers decoded) to equal 1 and also forces SC_H[02:00] (also from the decode trees, signifying the number of bytes to shift out of the instruction buffer) to equal 4. This allows the XBAR to pass the decoded specifier to the OPU over multiple cycles without incorrectly affecting the specifier count logic and without asserting SHIFTOPCODE_H before the specifier is completely decoded.

4.1.9.3 XREG

When an indexed specifier is decoded, XREG_H[03:00] is asserted and sent with XDTB_INDEXED_H to the CSU. XREG_H[03:00] identifies the index register. XDTB_INDEXED_H is generated by decoding the following:

• IRC_REGISTER_MODE is asserted when the IRC is detected and the specifier is register mode.

• IMPLIED_H is asserted by decoding the request logic field.

• CASE_H[01:00] identifies the location of the complex specifier in the I-stream.

• INDEX_MODE_H[02:00], from the addressing mode logic, defines which specifier, if any, is decoded as an index specifier.

4.1.9.4 YREG

YREG_H[03:00] is generated in XDTA and sent to OSQA (CSU) to indicate which GPR the CSU references for an operand address calculation. Byte 1, 2, 3, or 4 of instruction buffer data is selected by CASE_H[01:00] to produce the YREG output.
The YREG generation logic also receives inputs from the request logic, DRAM, and IRC detection logic. When an IRC is detected, this logic supplies the GPR number for specifiers during IRC handling.

4.1.10 XBAR Short Literal Data Path

The XBAR short literal logic outputs a 6-bit short literal specifier, a valid bit, and the number of the short literal specifier to the short literal expansion unit of the specifier evaluation logic. Figure 4-10 shows the organization of the XBAR short literal logic.

4.1.10.1 Short Literal Data Select

The XBAR SL data path receives the following:

• Instruction buffer data (I_DATA_H[67:64, 59:56, 51:48, 39:32, 31:24, 23:16, 15:08]).

• SL_MODE_H[08:01] is from the XBAR simple decode logic and the instruction buffer simple decode logic. It identifies the bytes that contain short literal specifiers. That is, if bit 1 is asserted, then a short literal specifier is detected in the byte 1 position of the I-stream.

• XSCA_REQUEST_H[03:00] is the output of the request logic.

The specifier decode logic decodes the instruction buffer data and the addressing mode to produce the following:

• SPx_SL_H[05:00] contains the 6-bit short literal data for three specifiers (x = 1, 2, or 3).

• SPx_SL_H corresponds to the three byte positions that the SL data could be in. SP1_SL_H, asserted, denotes that specifier 1 is a short literal specifier.

The byte position of the short literal specifier (SPx_SL_H) is decoded with XSCA_REQUEST_H[03:00] to produce the short literal specifier offset (SL_SPECIFIER_OFFSET_H[02:00]). This signal selects the short literal data and outputs the field (XDTA_SL_H[05:04] and XDTB_SL_H[03:00]) to the SL specifier handler.
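The data select described above can be illustrated with a short Python sketch that extracts the 6-bit literal from one specifier byte and splits it into a 2-bit upper field and a 4-bit lower field, mirroring the [05:04] and [03:00] split named in the text. It assumes the standard VAX short literal encoding (upper two bits of the specifier byte are 00). The function name and return shape are hypothetical, and the sketch does not model which MCA drives which slice.

```python
def short_literal_fields(byte):
    """Return (literal, bits_05_04, bits_03_00), or None if not a short literal.

    A VAX short literal specifier has 00 in bits 7:6 and the literal
    value in bits 5:0.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("specifier byte must be in the range 0..255")
    if byte >> 6 != 0:
        return None                      # not short literal mode
    literal = byte & 0x3F                # SPx_SL_H[05:00] equivalent
    return literal, (literal >> 4) & 0x3, literal & 0xF

# The literal #43 is encoded as specifier byte 0x2B.
fields = short_literal_fields(0x2B)
```

Here `fields` carries the literal value 43 together with its two slices; a register specifier such as 0x51 returns None because its upper two bits are not zero.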
Figure 4-10 Short Literal Logic

4.1.10.2 Short Literal Specifier Number
XDTB_SL_SPECIFIER_NUMBER_H[02:00] defines which specifier of the instruction is being passed to the SL specifier handler. This signal is generated by adding the specifier byte position to OPU_SPECIFIERS_COMPLETED_H[02:00].

4.1.10.3 Short Literal Valid
XDTB_SL_VALID_H is sent with the valid short literal data to the SLU of the OPU. This signal is generated by decoding the input from the request logic (XSCA_REQUEST_H[03:00]) and selecting the valid byte containing the short literal specifier.

4.1.11 XBAR Source and Destination Logic
Each specifier decoded and passed to the specifier handlers has an associated source and destination field to identify the function of the specifier. The source and destination fields are processed by the specifier handlers and loaded into the EBox pointer queues until the instruction they represent is executed. Figure 4-11 shows the inputs and outputs of the source and destination logic.
Figure 4-11 Source and Destination Logic

4.1.12 XBAR Source 1 Data Path
The source 1 data path logic passes three fields to the FPL and, if the specifier is a register, also passes a register field to the mask logic. Figure 4-12 shows the XBAR source 1 logic.

The XBAR source 1 logic passes the low nibble of instruction buffer byte 1 to the FPL as the register field (XDTA_SOURCE1_REG_H[03:00]). The source 1 validation logic receives the following:
* REGISTER_MODE_H[01]
* XSCA_REQUEST_H[03:00]
* SPECIFIER1_ACCESS_TYPE_H
* XSCA_SPECIFIERS_DECODED_H[02:00]

These fields are decoded to output the two source 1 valid fields:
* XDTA_SOURCE1_REG_VALID_H is asserted when the present source 1 operand is a register specifier. This field validates XDTA_SOURCE1_REG_H[03:00].
* XDTA_SOURCE1_VALID_H validates the specifier as a source operand. When SOURCE1_REG_VALID_H is negated and SOURCE1_VALID_H is asserted, the source 1 specifier is a memory operand.
Figure 4-12 XBAR Source 1 Logic

4.1.13 XBAR Source 2 Data Path
Source 2 fields are passed to the operand handlers if two source specifiers are decoded. The source 2 fields are similar to the source 1 fields in structure, but they rely on the source 1 specifier characteristics when they are generated. That is, if source 1 is a complex specifier, then the location in the I-stream that contains source 2 is determined by case, source 1 and source 2 data types, and the number of specifiers completed. Figure 4-13 shows the generation of the source 2 fields.

The source 2 select logic selects the byte position that contains the specifier. If the specifier is not a register, the byte position is selected, but the field (SP2_REGISTER_H[03:00]) is negated. The inputs to the select logic are as follows:
* CASE_H[01:00] identifies the location in the instruction of the complex specifier.
* XSCA_SP1_DATATYPE_H[02:00] identifies the data type of the source 1 specifier.
* REGISTER_MODE_H[07:01], from register mode logic and instruction buffer simple decode logic, identifies which bytes contain register specifiers.
* INDEX_MODE_H[01], IMMEDIATE_MODE_H[01], and ABSOLUTE_MODE_H[01] identify the source 2 specifier as one of these three addressing modes.
* SP2_ACCESS_TYPE_H provides the access type of the source 2 specifier.
* XSCA_SPECIFIERS_DECODED_H[02:00] identifies the total number of specifiers decoded in the instruction currently being decoded.

The source 2 register field is validated if the specifier is a valid register mode specifier.
The specifier 2 access type is input into the valid logic to differentiate between the destination and the source specifiers. A write or modify access negates the validation of a specifier as a source operand.

Figure 4-13 XBAR Source 2 Logic

4.1.14 XBAR Destination
The XBAR destination fields are generated for destination specifiers that are passed to the specifier handlers. Figure 4-14 shows a simplified block diagram of the destination logic.

Figure 4-14 XBAR Destination

4.1.14.1 Destination Valid
XDTA_DESTINATION_VALID_H is sent to the FPL of the OPU to validate a destination specifier. This signal is generated from access type and specifiers decoded fields. The access type of the destination specifier must be write, modify, implied write, or VSRC modify. Any specifier with an access type of read is a source specifier.
To validate the destination specifier, the specifiers decoded must indicate that the specifier under decode (destination specifier) is the last specifier of the instruction.

4.1.14.2 Destination Register Valid
Destination registers are validated when the destination specifier is a register. SPx_REGISTER_DESTINATION_H is asserted when the selected destination specifier is a register mode specifier.

4.1.15 Register Masks
The register mask logic receives a register number for any register that is read or written to by the EBox during the execution of an instruction. All of the register accesses in an instruction are accumulated in one of two registers and passed to the OCTL unit of the OPU at the completion of the instruction decode. Figure 4-15 shows the XBAR mask logic.

Figure 4-15 XBAR Read and Write Masks

4.1.15.1 Read Mask
An XBAR read mask is generated by decoding each valid source operand that is passed to the FPL. The read mask logic receives the following:
* B_SOURCEx_REG_H[03:00]
* B_SOURCEx_VALID_H
* B_SOURCEx_REG_VALID_H
* XSCA_SPx_DATATYPE_H[02:00]

The source valid and source register fields are decoded to detect valid register specifiers. When a valid register specifier is detected, the register field is stored in the read mask logic.
The data type of the specifier (XSCA_SPx_DATATYPE_H[02:00]) further defines the read access of the registers by asserting subsequent register numbers in the read mask when the base register data type is quadword or octaword. For example, a quadword read of register 1 would generate XDTA_READ_MASK_H[14:00] = 0006. This mask identifies register 1 and register 2 as being read during the execution of this instruction. READ_MASK_H[14:00] is passed to OCTL when XSCA_SHIFTOPCODE_H is asserted.

4.1.15.2 Write Mask
The XBAR write mask field contains entries pertaining to registers that are written to during the execution of an instruction. The write mask is generated by decoding the following:
* DESTINATION_REG_VALID_H
* DESTINATION_VALID_H
* DESTINATION_REG[03:00]
* XSCA_SP3_DATATYPE_H[02:00]

The data type and the register field generate DESTINATION_MASK_H[14:00] when the access type is one of the following:
* Write
* Modify
* Implied write
* VSRC modify

When both DESTINATION_REG_VALID_H and DESTINATION_VALID_H are asserted, an entry is added to the write mask.

4.1.15.3 Implied Mask
When a character string instruction is detected, the XRAM asserts IMPLIED_MASK_H, which asserts XDTA_WRITE_MASK_H[05:00]. R0 through R5 are asserted in the write mask because they contain the control block that maintains updated addresses and state information during the execution of the instruction.

4.1.16 Intra-Instruction Read Conflicts
Intra-instruction read conflicts (IRCs) occur when a read conflict is in the specifiers of a single instruction. These conflicts occur when a register specifier is to be used as data in the EBox and is subsequently used as the base register for an autoincrement or autodecrement. An IRC occurs in the instruction ADDL3 R0, (R0)+, R1. This instruction directs the EBox to read R0 and also directs the IBox to use R0 as the base register for an autoincrement.
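The read mask example and the IRC condition described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the XBAR hardware: the names register_mask and has_irc, the REGS_PER_TYPE table, and the choice to drop any bit above bit 14 are hypothetical. The text specifies only the 15-bit mask width, the quadword and octaword expansion, the example value 0006, and the ADDL3 example.

```python
# Illustrative model only; names and structure are assumptions, not the XBAR logic.
MASK_BITS = 15  # XDTA_READ_MASK_H[14:00]: one bit per register, R0 through R14

# Number of consecutive 32-bit GPRs touched by an access of each data type.
REGS_PER_TYPE = {"byte": 1, "word": 1, "longword": 1, "quadword": 2, "octaword": 4}


def register_mask(reg: int, datatype: str) -> int:
    """Return a mask with one bit set for each GPR the access touches."""
    mask = 0
    for offset in range(REGS_PER_TYPE[datatype]):
        mask |= 1 << (reg + offset)
    # Assumption: bits that fall outside the 15-bit mask are simply dropped.
    return mask & ((1 << MASK_BITS) - 1)


def has_irc(read_mask: int, autoinc_base_reg: int) -> bool:
    """True when a register already read as data is also an autoincrement/autodecrement base."""
    return bool(read_mask & (1 << autoinc_base_reg))


if __name__ == "__main__":
    # Quadword read of R1 marks R1 and R2, as in the 0006 example in the text.
    print(f"{register_mask(1, 'quadword'):04X}")
    # ADDL3 R0, (R0)+, R1: R0 is read as data, then used as an autoincrement base.
    print(has_irc(register_mask(0, "longword"), autoinc_base_reg=0))
```

The model checks only a single base register against the read mask. The hardware builds a composite IRC mask from several inputs, which this sketch does not attempt to reproduce.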
The IBox detects this IRC by monitoring the read and write masks that are generated for each instruction. When the XBAR detects an IRC, it notifies the OPU, passes the autoincrement or autodecrement specifier to the CSU of the OPU, and also passes all subsequent register specifiers through the CSU instead of through only the FPL. The CSU processes the autoincrement specifier and updates the IBox copy of the GPRs, but it does not update the EBox copy of the GPRs. Subsequent register specifiers are processed by the CSU, passed to the EBox as data, and placed in the source list. The data passed to the source list is the content of the IBox copy of the register. The OPU records the modified register numbers until the instruction has been completed and then writes the registers to the EBox GPRs using the data in the IBox GPRs. The XBAR does not process any subsequent specifiers until the instruction is complete (in the EBox) and then writes the IBox (modified) GPRs to the EBox GPRs. This process is called a delayed GPR update.

The IRC mask decode logic (shown in Figure 4-16) generates a composite IRC mask (IRC_MASK_H[08:00]). The composite IRC mask identifies the data types of GPR accesses for the IRC detection logic. IRC_MASK_H[08:00] is generated by decoding the read mask and decoding the I-stream for autoincrement and autodecrement mode specifiers. IRC_YREG_H[03:00] is also input into the IRC mask logic. IRC_YREG_H[03:00] identifies the base register of register specifiers in the instruction.

The IRC composite mask, n (number of specifiers decoded), CASE_H[01:00], index mode, and the specifier data types are input to the IRC detection logic (Figure 4-17). The R1xx shift count and specifiers decoded count are also input to this logic. The output of the IRC detection logic produces the specifier counts and specifier decoded counts during IRC handling.
When an IRC is detected, three outputs are produced to direct the special handling required for the instruction:
* IRC_H
* IRC_N_H[00]
* IRC_SC_H[02:00]

IRC_N_H[00] provides the specifiers decoded count for the IRC and can equal only zero or one. IRC_SC_H[02:00] provides the shift count for the IRC. During an IRC, the shift counts are based only on the decode tree outputs of R1, R1BB, R1BW, and R1I.

The XBAR continues to decode one specifier at a time until the instruction is complete. All decoded specifiers are passed through the CSU. Autoincrements or autodecrements are performed only on the IBox GPRs. The XBAR stalls (IRC_STALL_H) when the instruction is completely decoded and waits for the EBox to complete the instruction. When the EBox completes the instruction, a delayed GPR update is performed and the XBAR resumes decoding (OSQA_DGPR_UPDATE_FINISHED_H negates IRC_STALL_H).

Figure 4-16 IRC Detection: Read Mask

Figure 4-17 IRC Detection: IRC Mask

4.1.17 XBAR Stalls
XSCA detects stalls in the XBAR. Most of the stalls in the XBAR are related to specifiers that require special handling or are related to situations external to the XBAR logic.
The external situations result from limitations of the logic units that receive decoded specifiers from the XBAR, or arise because the instruction buffer, the VIC, or both are not providing valid I-stream to the XBAR. This section describes the logic that initiates XBAR stalls, and it describes the conditions external to the XBAR that cause the XBAR to stall.

When XBAR_STALL_H is asserted, the XBAR forces the decode tree logic and the request logic to produce outputs that generate no shift counts and produce zero for a specifiers decoded output. During an XBAR stall, the request logic produces an output that selects a nonexistent decode tree output (REQUEST_H[03:00] = F). Selecting the nonexistent decode tree output generates a shift count of zero (XSCA_SHIFTCOUNT_H[03:00] = 0) and generates a number of specifiers decoded count of zero (N_H[01:00] = 0). The shift count of zero directs the instruction buffer to hold (not shift) the I-stream it is presenting for decode, and the specifiers decoded count of zero results in no valid specifier data being passed to the OPU.

XBAR_STALL_H is asserted when any of the following conditions is true:
* IRC_STALL_H is asserted after the XBAR passes all the specifiers of an instruction that contains an IRC to the OPU.
* RAF_H is asserted, signifying a reserved addressing fault. This fault is detected in the XBAR.
* FPD_FLUSH_H is asserted when the EBox stalls the execution of an instruction to service an exception or an interrupt.
* PCHI_IBUF_FLUSH_H is asserted. This signal is asserted when there is a change in I-stream.
* IVALID_L[00] is negated, indicating that there is no valid opcode in the instruction buffer.
* OCTL_MASK_STALL_H is asserted when the OPU read/write register mask logic is full.
* OSQA_DECODE_STALL_H is asserted because the OPU cannot receive decoded specifiers because one or more of the following conditions exist:
  * EBox source list or source pointers are full.
  * Read or write conflict stall exists (inter-instruction conflict or scoreboard stall).
  * Autoincrement or autodecrement follows a branch.
  * Branch follows a branch.
  * Two conditional branches are buffered.
  * GPR update is delayed.

The XBAR generates two stall signals, OPU_STALL_H and SL_STALL_H, that are related to the availability of resources in the OPU. These signals inhibit the XBAR from decoding complex or short literal specifiers when they are asserted.

When asserted, OPU_STALL_H informs the XBAR that the OPU is processing a complex specifier and has another complex specifier buffered in the OPU stall logic. This stall signal inhibits the XBAR from decoding and passing another complex specifier to the OPU. This signal is input to the decode tree logic and affects the shift count and number of specifiers decoded outputs. When OPU_STALL_H is asserted and another complex specifier is decoded, XBAR_STALL_H is asserted.

SL_STALL_H is asserted to inform the XBAR that two short literal specifiers are currently in the SLU. This stall inhibits decoding another short literal specifier and asserts XBAR_STALL_H when another short literal is encountered.

4.2 Branch Prediction
This section describes the VAX 9000 family branch prediction functions. It describes the three methods used to predict a branch and the microcode that controls them. The three methods used to predict branches are as follows:
* Primary mode is used when the branch being decoded is stored in the branch prediction cache (BPC). The fields in the BPC direct the decision of taking or not taking the branch and provide the branch PC.
* Demote is related to primary mode and occurs when the cached displacement or instruction length does not match that of the branch being decoded. Under demote, the cached prediction bit is used and the branch PC is provided by the CSU.
* Secondary mode is used when the branch is not stored in the BPC.
The branch may not be cached because it has not been encountered before or because it is not a cacheable branch. In this mode, the branch PC is provided by the CSU.

4.2.1 Primary Predictions
Primary branch predictions use the BPC to direct the flow of I-stream when branches are encountered. These predictions are based on the content of the BPC. The BPC contains five fields that are written when a cacheable branch is first encountered and read when the branch is subsequently encountered. Figure 4-18 shows the organization of the BPC.

Figure 4-18 BPC Organization

The BPC has 1024 locations and is never flushed. The following fields describe the contents and functions of the branch prediction cache fields:
* Branch PC tag: This field contains bits [31:10] of the virtual PC of the branch.
* Prediction PC: This 32-bit field is the destination PC of the branch.
* Branch displacement: This 16-bit field is the actual displacement of the branch.
* Branch instruction length: This 6-bit field contains the actual instruction length of the branch.
* Prediction bit: This bit is set when the branch is predicted taken.

4.2.1.1 Primary Hits
A branch prediction hit occurs when the BP tag field matches the corresponding bits in the decode PC. A hit directs the cached prediction PC to be loaded into the decode PC if the prediction bit is asserted. Figures 4-19 and 4-20 show the compare logic of the BPC and the results of the comparisons.
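The BPC fields and the hit, miss, and demote outcomes described in this section can be modeled in a few lines. The following is a minimal sketch under stated assumptions rather than a description of the hardware: the names BpcEntry and bpc_lookup are hypothetical, the use of the low 10 PC bits as the cache index is an assumption (the text gives only the 1024-location size and the [31:10] tag), and empty slots are represented by None purely for the convenience of the model.

```python
from dataclasses import dataclass

BPC_ENTRIES = 1024  # the text states that the BPC has 1024 locations


@dataclass
class BpcEntry:
    tag: int             # bits [31:10] of the branch's virtual PC
    prediction_pc: int   # 32-bit destination PC of the branch
    displacement: int    # 16-bit branch displacement
    length: int          # 6-bit branch instruction length
    predict_taken: bool  # prediction bit


def bpc_lookup(cache, decode_pc, displacement, length):
    """Classify a branch as 'miss', 'hit', or 'demote'; return (result, entry)."""
    index = decode_pc % BPC_ENTRIES  # assumption: low 10 PC bits select the location
    entry = cache[index]
    if entry is None or entry.tag != (decode_pc >> 10):
        return "miss", None          # secondary prediction takes over
    if entry.length != length or entry.displacement != displacement:
        return "demote", entry       # cached prediction bit is used; CSU supplies the PC
    return "hit", entry              # prediction PC is used if predict_taken is set


if __name__ == "__main__":
    cache = [None] * BPC_ENTRIES
    pc = 0x00012345
    cache[pc % BPC_ENTRIES] = BpcEntry(pc >> 10, 0x00012400, 0x00B8, 3, True)
    print(bpc_lookup(cache, pc, 0x00B8, 3)[0])                # same branch
    print(bpc_lookup(cache, pc, 0x0010, 3)[0])                # displacement differs
    print(bpc_lookup(cache, pc + BPC_ENTRIES, 0x00B8, 3)[0])  # same index, other tag
```

Returning the entry alongside the result lets a caller read the prediction bit in both the hit and the demote case, which matches the statement that demote still uses the cached prediction bit.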
Figure 4-19 BPC Compare: Hit

Figure 4-20 BPC Compare: Demote

When a branch is encountered, the BPC is accessed to perform a compare. The compare results in one of three outputs: hit, miss, or demote. Two cycles are required to produce a predicted taken branch from the branch prediction cache.
1. In the first cycle, DECODE_PC[31:10] and PREDICTION_PC[31:10] are compared and produce a hit or miss. A miss directs the prediction to the secondary mechanism and a hit requires subsequent comparisons to produce a hit or a demote.
2. If a hit is encountered in the PC comparisons, DELTA_PC[05:00] and TAG_INSTRUCTION_LENGTH[05:00] are compared, producing a hit or demote.
3. The XBAR_DISPLACEMENT[15:00] and TAG_DISPLACEMENT[15:00] are compared to produce a demote if they do not match.
4. If a hit is encountered, BP_PREDICTION_H must be set for the branch to be predicted taken.

4.2.1.2 Tag Match Enable
The IBUF simple decode logic decodes branch instructions and informs the PCU if the branch is cacheable. When IBFB_CACHEABLE_H is asserted and the BPC is not being written, a tag match is enabled. Figure 4-21 shows the logic that enables the tag match.

Figure 4-21 BP Tag Match Enable

4.2.2 Demote
A primary prediction hit is demoted if the instruction length or the displacement of the branch do not match those stored in the BPC on a predicted taken branch. This occurs when the virtual PCs of the branch match, but they are not the same branch.

If the displacement and instruction length do not match, the history bit is checked before the branch is demoted. If the branch is predicted not taken (history bit = 0), no demote is necessary and decoding of the sequential I-stream continues. When the prediction bit is set, the branch is demoted, the target PC is provided by the CSU, and the BPC is written. That is, the new branch PC, instruction length, and displacement are written. If the prediction is incorrect, the BPC is rewritten with the prediction bit cleared.

4.2.3 Secondary Predictions
Secondary predictions are performed by the branch bias logic of the XBAR. A secondary prediction is based on the opcode of the branch instruction. When a secondary prediction directs the IBox to take the branch, the branch target PC is supplied by the CSU. Secondary mode is used when the branch has not been cached or when the BPC cannot be accessed because it is busy being written. Figure 4-22 shows the inputs and outputs of the branch bias logic.

The branch bias logic (BRAM) consists of 22 scan latches and selection logic. The latches are loaded with fixed predictions for branches and accessed when a branch that has not been cached is encountered. The opcode is used to select the prediction that is stored in the branch bias latches.

The fixed predictions for the branches are based on a bias for that instruction. For example, a BEQL tests two conditions for equality. Because equality in mathematical situations is rare, this branch would be predicted not taken. A branch that is predicted taken in the BRAM logic asserts XDTB_BRAM_BIAS_H. This signal is passed to the branch prediction logic.
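The opcode-selected bias described above amounts to a small fixed lookup. The sketch below is illustrative only and rests on assumptions: the names BRANCH_BIAS and secondary_predict are hypothetical, the table holds only the BEQL case that the text gives (predicted not taken), and treating an opcode with no entry as not taken is an assumption made to keep the model total. BEQL is opcode 13 (hex) in the VAX instruction set.

```python
# Illustrative model of an opcode-indexed bias table; not the BRAM latch contents.
BEQL = 0x13  # VAX opcode for BEQL

# Only the BEQL entry comes from the text (predicted not taken). Any further
# entries a reader adds would be that reader's own assumption.
BRANCH_BIAS = {BEQL: False}


def secondary_predict(opcode: int, bias_table=BRANCH_BIAS) -> bool:
    """Return True when the fixed bias for this opcode is 'taken'.

    The result plays the role of XDTB_BRAM_BIAS_H in the text. An opcode
    with no entry defaults to not taken (assumption).
    """
    return bias_table.get(opcode, False)


if __name__ == "__main__":
    print(secondary_predict(BEQL))  # BEQL is biased not taken in the text
```

Because the bias depends only on the opcode, the same branch opcode always receives the same secondary prediction, regardless of its address or history.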
Figure 4-22 Branch Bias Logic

4.2.4 BPC Correction
Correction refers to the actions taken to correct an incorrect branch prediction before the branch instruction is shifted out of the instruction buffer. Branch instructions are predicted when they are shifted into the instruction buffer and are acted upon when they are shifted out of the instruction buffer. If the EBox validates the branch prediction before it is shifted out of the instruction buffer, and the prediction is incorrect, a correction is initiated.

A correction causes the rewriting (inverting) of the BPC prediction bit and the writing of the unwind PC (UNWIND_PC_H[31:00]) to the prediction PC (PREDICTION_PC_H[31:00]). If the branch was predicted to be taken and was incorrect, the IBox continues following sequential I-stream. If the branch was predicted to be not taken and was incorrect, the IBox redirects the I-stream to the PC supplied by the CSU. In both cases, the correct PC (sequential or nonsequential) is loaded from the unwind PC to the prediction PC.

4.2.5 BPC Unwind
Unwind refers to the actions necessary to correct an incorrect branch prediction whose validation arrives after the instruction buffer has shifted the branch instruction out and has started preprocessing the wrong I-stream. A BPC unwind initiates the following events:
* Asserts IBOX_ABORT to abort any instruction buffer requests for new I-stream.
* Flushes the instruction buffer of any I-stream loaded since the branch instruction.
* Invalidates any pointers to the EBox source list that have been entered after the branch prediction was made.
* Invalidates any OCTL GPR read/write masks that pertain to instructions decoded after the bad branch prediction.
* Rewrites (inverts) the BPC history bit to reflect the correct prediction for the branch.
* The unwind PC is loaded into the prediction PC so that the IBox can start processing the correct I-stream.

4.3 PCU Microcode
The PCU and BPC are controlled by PCU microcode. This microcode is implemented in hard-coded microcode. That is, the microcode structure is generated in hard logic as opposed to a conventional RAM structure.

The PCU microcode consists of 39 bits that are partitioned into 22 fields. The fields are addressed by an 11-bit address field that contains PCU state information and the control signals to govern the flow of the microcode.

4.3.1 PCU Microaddress
The address for the PCU microcode is 11 bits wide. The address is organized as follows:
* UADDRESS[03:00] provides PCU control signals.
* UADDRESS[06:04] indicates where to store OPU_TARGET_PC.
* UADDRESS[10:07] defines the state of the PCU at the beginning of the current cycle.

Table 4-3 describes the three address fields.

Table 4-3 PCU Microaddress Descriptions

UADDRESS[03:00]  PCU Control
UADDRESS[00]  This bit is set when the CSU has delivered a target PC to the PCU and there was not an unwind last cycle.
UADDRESS[01]  This signal indicates whether an IBox branch prediction was correct. The signal is only valid when UADDRESS[02] is asserted.
UADDRESS[02]  Asserted, this bit validates UADDRESS[01].
UADDRESS[03]  Asserted when an unconditional branch is in the IBUF.

UADDRESS[06:04](1)  OPU Target PC Destination
UADDRESS[04]  This bit directs the target PC to be stored in the unwind PC in the PCU.
UADDRESS[05]  This bit directs the target PC to be stored in the decode PC.
UADDRESS[06]  This bit is asserted if the PCU has a branch prediction, predict taken hit. This informs the microcode to ignore the target PC.

(1) These three bits are mutually exclusive. Only one of these bits may be set in a cycle.

UADDRESS[10:07]  Label  Description
0000  IDLE  The PCU has not encountered a branch and the IBox is executing sequential code.
0001  BSHOP VAL  PCU has received a target PC from the CSU before the branch has been shifted out of the IBUF and before EBox has validated prediction.
0010  VAL UNC TAR VAL TAR  XBAR has decoded a conditional, unconditional, and another conditional branch, in that order. There may have been sequential code between the branches. The PCU is waiting for EBox validation of the first branch, a target PC for the unconditional branch, and EBox validation and target PC for the third branch.
0011  VAL  The PCU is waiting for EBox validation of a branch prediction.
0100  VAL VAL TAR  Two conditional branches have been shifted out of IBUF. The first needs EBox validation and the second needs EBox validation and target PC from the CSU.
0101  BSHOP TAR  This state occurs when the branch is validated before it is shifted out of the IBUF. (EBox is ahead of IBox.)
0110  VAL TAR  XBAR decodes a conditional branch and shifts it out of the IBUF. The validation and target PC follow a few cycles later. (IBox is ahead of EBox.)
0111  VAL TAR VAL TAR  Two conditional branches have been decoded by XBAR with both awaiting validation and target PCs.
1000  TAR VAL TAR  Two branches are being processed. The first has been validated but needs a target PC. The second needs both validation and a target PC.
1001  TAR  A target PC is required from the CSU to process a branch completely.
1010  VAL UNC TAR TAR  The PCU is processing three branches. The first is conditional and needs validation. The second and third are unconditional and need target PCs from the CSU.
1011  TAR TAR  The CSU has not delivered target PCs for two branches.
1100  VAL BSHOP VAL  An outstanding branch is in the PCU and the CSU has provided a target PC before the second branch has been shifted out of the instruction buffer. This state occurs only with JSB or BSB.
1101  VAL UNC TAR  The PCU is waiting for a validation for one branch and a target PC for a subsequent unconditional branch.
1110  VAL TAR TAR  The PCU needs validation and a target PC for a conditional branch and a target PC for a subsequent unconditional branch.
1111  VAL VAL TAR UNC TAR  The IBox has shifted out two conditional and one unconditional branches. The target PC has been delivered for only the first conditional branch and the EBox has not validated either conditional branch.

4.3.2 PCU Microword
The PCU microword is 39 bits wide and is partitioned into 22 fields. Figure 4-23 provides a breakdown of the microword. Table 4-4 describes each microcode field.

Figure 4-23 PCU Microword Format

Table 4-4 PCU Microword Field Descriptions

[03:00]  Next State  Description
0000  IDLE  See Table 4-3 for field descriptions.
0001  BSHOP VAL
0010  VAL UNC TAR VAL TAR
0011  VAL
0100  VAL VAL TAR
0101  BSHOP TAR
0110  VAL TAR
0111  VAL TAR VAL TAR
1000  TAR VAL TAR
1001  TAR
1010  VAL UNC TAR TAR
1011  TAR TAR
1100  VAL BSHOP VAL
1101  VAL UNC TAR
1110  VAL TAR TAR
1111  VAL VAL TAR UNC TAR

[04]  Load Decode  Description
1  -  Bad branch prediction. Load unwind PC into decode PC or PCU waits for target PC from CSU. Load decode is set when destination PC arrives.

[05]  Load Unwind  Description
1  -  Unwind PC is loaded when predict taken branch is shifted out of IBUF. On predict not taken, unwind PC is loaded when CSU provides destination PC.

[06] Load Delayed Target
  1  -  When the CSU delivers a target PC before the branch is shifted out, this signal instructs the PCU to latch and hold the target PC.

[07] Load Second Unwind
  1  -  Same as load unwind, except the PC is loaded into second unwind PC (on second branch prediction).

[08] Load Second Branch
  1  -  Asserted when a second branch is shifted out of the IBUF. The PCU must latch and hold the following:
        Virtual address
        Prediction bit
        Cacheable
        Loop branch
        Instruction length
        Branch displacement

[09] Use Branch OK
  1  -  Set when EBox sends validation of branch before shifting the branch out of IBUF. Remains set until branch is shifted.

[13:10] BP Write
  000  FALSE   BPC is not written when this state is chosen.
  001  NO HIT  BPC is written if present branch is not stored and cacheable.
  010  PT NL   BPC is written in this case if the branch is incorrectly predicted taken and is not a loop branch.
  011  CA      Cacheable branch is encountered for the first time and is written to the BPC.
  100  NL CA   Not loop, cacheable. Branch predicted incorrectly from BPC is rewritten correctly (prediction bit) if it is not a loop branch.
  101  NEXT    This field enables the write of a second branch being processed, to BPC, if the branch is cacheable.

[15:14] IBUF Flush
  00  NOOP  Hold previous values.
  01  TRUE  IBUF flush.
  10  NPT   Not predict taken.
  11  PT    Predict taken.

[18:16] Branch Taken
  000  NOOP    Hold previous value.
  001  NPT     Not predict taken.
  010  PT      Predict taken.
  011  TRUE    Predict taken.
  100  NOT BT  Not branch taken.
  101  SBT     Second branch taken.

[19] Correction
  1  -  Asserted when the EBox informs the IBox of a bad branch before it has been shifted out of the IBUF.

[20] Invert
  1  -  When the EBox informs the IBox of a bad prediction, this signal directs the PCU to invert its fields.

[22:21] BP Address
  00  -  If the branch is still in the IBUF, the decode PC provides the address.
  01  -  If the branch has been shifted out of the IBUF, the stored branch PC provides the address.
  10  -  Second branch PC is selected by this value.

[23] Branch PC
  0  Decode PC         The branch PC is loaded into the decode PC.
  1  Second branch PC  The branch PC is loaded into the second branch PC (two branches being processed).

[27:24] Load Markers
  0000  NOOP  Load markers with values held last cycle.
  0001  TRUE  The marker set depends on the prediction of the branch and if the branch hits in the BPC. The following table provides the value dependent on these two variables.

        BP Hit  Predict Taken  Marker Set
        0       0              Store in unwind.
        0       1              Store in decode.
        1       0              Store in unwind.
        1       1              Ignore PC.

  0010  FALSE  All markers, first and second, are cleared when all branches are completely processed or when EBox detects a bad branch prediction.
  0011  SET SID  This state clears all markers and sets STORE_IN_DECODE.
  0100  TAR CASES  This state is set when the validation of a branch arrives before the destination PC. This state depends on three signals: BRANCH_OK, BP_HIT, and PREDICT_TAKEN.

        Branch OK  BP Hit  Predict Taken  Marker Set
        0          0       0              Store in decode.
        0          0       1              Ignore PC.
        0          1       0              Store in decode.
        0          1       1              Ignore PC.
        1          0       0              Ignore PC.
        1          0       1              Store in decode.
        1          1       0              Ignore PC.
        1          1       1              Ignore PC.

  0101  FROM SECOND  When processing two branches, at the completion of the first, this field is selected to move the markers stored in the second position into the first position.
  0110  SET IG  When the PCU is awaiting a destination PC from the CSU, and the EBox indicates the prediction is wrong, markers are switched from store in decode to ignore PC.
  0111  SECOND  When the microcode selects the second set of markers, the TRUE state is used as the input.
  1000  PUSH  This field is only used when going to state VAL VAL TAR UNC TAR. The first markers get the content of the second marker and the second marker gets loaded with appropriate markers for the unconditional branch that has just been shifted out of the IBUF.

[29:28] Target PC Select
  00  OPU results        Target PC provided by the CSU.
  01  Unwind PC          Unwind PC provides target PC.
  10  Delayed target PC  Wait for target PC.

[31:30] Decode PC Select
  00  Target PC  Target PC is selected on predict taken.
  01  Next PC    Next PC is selected on predict not taken.

[33:32] Unwind PC Select
  00  Next PC
  01  OPU result
  10  Second unwind PC

[35:34] Set Unwind
  On a bad branch prediction, this signal informs the IBox to unwind from the prediction. This signal is used only if the branch has been shifted out of the IBUF.
  01  Next           Next selects next PC on a bad predict taken.
  10  OPU result     Load OPU result on bad not taken prediction.
  11  Second unwind  Load second unwind PC.

[36] Load Branch
  1  -  This field is the same as load second branch. The same information is stored but pertains to the first branch.

[37] Branch OK Select
  1  -  This field can force the validation signal from the EBox. The signal indicates that the PCU has acknowledged the bad prediction and has corrected it without causing any damage to the integrity of the IBUF.

[38] Second Markers
  0  -  When a demote occurs, demote the first branch.
  1  -  When a demote occurs, demote the second branch.
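
The two marker-selection truth tables in the Load Markers field (TRUE and TAR CASES) can be read as simple lookups. The following is a minimal sketch only, not the PCU logic itself: the row order of the TAR CASES table is reconstructed from the printed table, STORE_IN_DECODE is the only marker name that appears in the text, and the other marker names and both function names are assumptions made for illustration.

```python
# Marker names: STORE_IN_DECODE appears in the text; the other two
# names are assumed here for illustration.
STORE_IN_UNWIND = "STORE_IN_UNWIND"
STORE_IN_DECODE = "STORE_IN_DECODE"
IGNORE_PC = "IGNORE_PC"

# Load Markers = TRUE: keyed by (BP_HIT, PREDICT_TAKEN).
TRUE_MARKERS = {
    (0, 0): STORE_IN_UNWIND,
    (0, 1): STORE_IN_DECODE,
    (1, 0): STORE_IN_UNWIND,
    (1, 1): IGNORE_PC,
}

# Load Markers = TAR CASES: keyed by (BRANCH_OK, BP_HIT, PREDICT_TAKEN).
# Row order is a reading of the printed table, not a verified netlist.
TAR_CASES_MARKERS = {
    (0, 0, 0): STORE_IN_DECODE,
    (0, 0, 1): IGNORE_PC,
    (0, 1, 0): STORE_IN_DECODE,
    (0, 1, 1): IGNORE_PC,
    (1, 0, 0): IGNORE_PC,
    (1, 0, 1): STORE_IN_DECODE,
    (1, 1, 0): IGNORE_PC,
    (1, 1, 1): IGNORE_PC,
}


def marker_for_true(bp_hit, predict_taken):
    """Marker set chosen when the Load Markers field selects TRUE."""
    return TRUE_MARKERS[(int(bool(bp_hit)), int(bool(predict_taken)))]


def marker_for_tar_cases(branch_ok, bp_hit, predict_taken):
    """Marker set chosen when the Load Markers field selects TAR CASES."""
    key = (int(bool(branch_ok)), int(bool(bp_hit)), int(bool(predict_taken)))
    return TAR_CASES_MARKERS[key]
```

For example, a branch that misses in the BPC and is predicted taken has its target PC stored in the decode PC, while a BPC hit that is predicted taken ignores the arriving target PC.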

4.3.3 Writing the BPC

The PCU writes the information pertaining to cacheable branches it has encountered. Certain criteria must be met for a branch to be cacheable:

• Branches must have displacements that will not change during the execution of the instruction.
• Branches must have a displacement as part of the I-stream.

A branch that has already been written to cache will not be rewritten unless the prediction is incorrect. The EBox informs the IBox of an incorrect prediction and the prediction bit is inverted.

4.3.3.1 BPC Write Enable

The BP write field of the PCU microcode selects the field that enables the BPC write signal. Figure 4-24 shows the write enable logic.

Figure 4-24 BPC Write Enable (signals shown: BP_HIT_L, IBFB_CACHEABLE, PREDICT_TAKEN, BP_LP_BRANCH_L, CACHEABLE, LOOP_BRANCH_L, NEXT_CACHEABLE, BP_WRITE_SEL[02:00] (microcode [13:10]), CTL_PC_FLUSH_L (enable); output: PCBP BP WRITE)

4.3.3.2 Cache Tag Write

Figure 4-25 shows the BP tag address write logic. To write the tag, either the branch PC, decode PC, or second branch PC is selected at the BP tag multiplexer. Selection is as follows:

• Decode PC is selected if the branch is still in the instruction buffer. The branch remains in the instruction buffer if the address must be read from or written to.
• Branch PC is selected after the branch has been shifted out of the instruction buffer. The virtual address of the branch is loaded into the branch PC.
• Second branch PC is used when the PCU is processing two branches. This address can be loaded directly into the BPC or into the branch PC when the first branch address is no longer needed.

The selected PC provides bits [31:10] of the virtual address of the branch to be written to the BPC tag field.
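
The tag selection above can be sketched as a three-way multiplexer followed by a bit-field extract. This is an illustrative model only: the select encoding (00 decode PC, 01 branch PC, 10 second branch PC) is taken from the BP Address field of Table 4-4, and the function name and argument names are assumptions.

```python
def bpc_tag(bp_address_sel, decode_pc, branch_pc, second_branch_pc):
    """Return bits [31:10] of the PC chosen by BP_ADDRESS_SEL[01:00].

    Encoding assumed from Table 4-4, BP Address field:
    00 = decode PC, 01 = branch PC, 10 = second branch PC.
    """
    sources = {
        0b00: decode_pc,
        0b01: branch_pc,
        0b10: second_branch_pc,
    }
    if bp_address_sel not in sources:
        raise ValueError("BP_ADDRESS_SEL value 11 is not described in Table 4-4")
    pc = sources[bp_address_sel] & 0xFFFFFFFF  # 32-bit virtual address
    return pc >> 10                            # 22-bit tag, bits [31:10]
```

The remaining low-order bits [09:00] of the same PC are the ones used to address the BPC, as described in Section 4.3.3.7.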

Figure 4-25 BPC Tag Write (inputs: BRANCH_PC, DECODE_PC, SECOND_BRANCH_PC; select: BP_ADDRESS_SEL[01:00] (microcode [22:21]); output: BP ADDRESS)

4.3.3.3 Instruction Length Field Write

The instruction length field is loaded from the delta PC. This PC is derived from the accumulated shift counts since the last SHIFT_OPCODE_H. This value, when the instruction is shifted out of the instruction buffer, yields the instruction length. Figure 4-26 shows the logic generating the delta PC. The delta PC represents the current instruction length or, if a second branch is being processed, it is selected as the second branch instruction length.

Figure 4-26 BP Instruction Length (signals shown: XBAR_SHIFTCOUNT[05:00], DELAYED_DELTA_PC, CURRENT_INSTRUCTION_LENGTH_H[05:00], PC_INSTRUCTION_LENGTH_H[05:00], BRANCH_IL, 2ND_BRANCH_IL, BP_ADDRESS_SELECT_H (microcode [22:21]))

4.3.3.4 Prediction PC Write

Figure 4-27 shows the inputs that are written to the BP prediction PC field. The time when the PCU writes the prediction PC field depends on the prediction of the branch. When a branch is predicted not taken, the field is written when the branch is shifted out of the instruction buffer. The value that is written for a branch predicted not taken is unpredictable. The field that is written is not valuable because the branch is predicted not taken the next time it is encountered. A branch that is predicted taken must wait for the target PC to arrive from the CSU before it can be written.

When a branch is predicted incorrectly, the BPC must be updated with the correct information. In this case, the unwind PC supplies the correct prediction PC.

Figure 4-27 BP Prediction PC Write (inputs: TARGET PC, DECODE PC, NEXT PC; select: DECODE PC SELECT; output: BP TARGET PC)

4.3.3.5 BP Displacement Write

The branch displacement is calculated by the XBAR and, when the branch is cacheable, is written to the BPC.
Figure 4-28 shows the logic that selects from two inputs to be written to the BPC. XBAR_DISPLACEMENT_H[15:00] is written to the BPC or latched as SECOND_BRANCH_DISPLACEMENT_H[15:00] when two branches are being processed.

Figure 4-28 BP Displacement Write (inputs: XBAR_DISPLACEMENT[15:00], A_SECOND_BRANCH_DISPLACEMENT[15:00]; output: DISPLACEMENT[15:00])

4.3.3.6 BP Prediction Bit

The BP prediction bit is the bit that is stored in the BP cache and determines the history of the branch. This bit is the reference to whether the branch was predicted taken or predicted not taken the last time it was encountered. This section provides a detailed description of the reading and writing of the prediction bit.

4.3.3.6.1 BP Prediction Bit Read

Two inputs supply the BPC prediction bit during a read operation (refer to Figure 4-29):

• BPP_PREDICT - the history bit that is stored in the BPC.
• XDTB_BRAM_BIAS - the prediction bit supplied by the BRAM bias logic. This prediction is the fixed prediction that is based on the opcode of the branch instruction.

The selection between these two inputs depends on whether a branch prediction tag match occurs in the BPC. When a tag match occurs, the prediction bit from the BPC is used. When there is no tag match in the BPC or the BPC cannot be read because it is currently being written, the branch prediction is supplied by the branch bias logic. Loop branches and unconditional branches assert FORCE_OUTPUT, which always selects PREDICT_TAKEN.

4.3.3.6.2 BP Prediction Bit Write

The prediction bit is written to the BPC when the branch is initially encountered and may also require being rewritten (inverted) when a branch prediction is determined to be incorrect. The logic that writes the prediction bit can write a prediction bit for a second branch prediction while the preceding branch is still under evaluation.
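
The read-side selection described in Section 4.3.3.6.1 can be sketched as a priority choice among three sources. This is a simplified model under stated assumptions, not the actual gate-level logic: BPP_PREDICT, XDTB_BRAM_BIAS, FORCE_OUTPUT, and PREDICT_TAKEN are names from the text, while the function name and the `bpc_readable` flag (false while the BPC is being written) are introduced here for illustration.

```python
def predict_taken(tag_match, bpc_readable, bpp_predict, bram_bias, force_output):
    """Model of the PREDICT_TAKEN selection on a BPC read.

    force_output : FORCE_OUTPUT, asserted for loop and unconditional branches
    tag_match    : a BPC tag match occurred for this branch
    bpc_readable : assumed flag, False while the BPC is being written
    bpp_predict  : BPP_PREDICT, the history bit stored in the BPC
    bram_bias    : XDTB_BRAM_BIAS, the fixed opcode-based prediction
    """
    if force_output:
        return True               # always predict taken
    if tag_match and bpc_readable:
        return bool(bpp_predict)  # use the stored history bit
    return bool(bram_bias)        # fall back to the branch bias logic
```

The ordering gives FORCE_OUTPUT priority over both other sources, which matches the statement that it always selects PREDICT_TAKEN.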

In Figure 4-29, TAKEN or PCBP_BP_TAKEN is the output that is used to access the BPC history bit during read and write operations. For reads, BP_ADDRESS_SEL_H[01:00] (PCU microcode bits [22:21]) selects A_PREDICT_TAKEN to access the BPC. For writes to the BPC, BP_ADDRESS_SEL_H[01:00] selects either BRANCH_TAKEN or A_SECOND_BRANCH_TAKEN. Either of these fields is selected when writing the BPC on a miss, or for rewriting the BPC on a correction or an unwind.

BRANCH_TAKEN is selected from one of five different fields by BRANCH_TAKEN_SEL_H[02:00]. The selection of the source for this field depends on timing and different states or conditions related to the branch instruction. The following list describes the five sources and when they are used:

• The previous output is selected during a noop.
• The inverted output of PREDICT_TAKEN is selected for a correction.
• PREDICT_TAKEN is selected for the write path when the branch misses in the BPC and it is an unconditional branch.
• On a bad not taken prediction, rewriting the BPC is done by selecting the field that forces the history bit to be asserted (shown as 1 in Figure 4-29).
• An unwind uses the latched and inverted version of BRANCH_TAKEN to rewrite the history bit.
• When two branches are being processed, PREDICT_TAKEN is latched and then used to rewrite the history bit if it is required (correction or unwind occurs).

BRANCH_TAKEN_SELECT[02:00] controls the selection of the write field and is supplied by the PCU microcode bits [18:16].

Figure 4-29 BP Prediction Bit Write

4.3.3.7 BPC Address Selection

On read and write references, the BPC is addressed by PCBP_BP_ADDRESS[05:00]. When reading the BPC, DECODE_PC[09:00] provides the address (Figure 4-30). When writing the BPC, BRANCH_PC[09:00] is used. BRANCH_PC is also used to address the BPC when rewriting the BPC during a correction or an unwind.

When a second branch is to be written before the first is validated, the decode PC is latched and selected as SECOND_BRANCH_PC[09:00] to address the BPC.

The ten bits of the address are supplied by two of the PCU MCAs:

PCBP provides bits [05:00].
PCVC provides bits [09:06].

The field that selects the BP address (BP_ADDRESS_SEL[01:00]) is provided by the PCU microcode bits [22:21].

Figure 4-30 BPC Address Selection (inputs: DECODE PC, BRANCH PC, SECOND BRANCH PC; outputs: PCBP_BP_ADDRESS[05:00], PCVC_BP_ADDRESS[09:06])

5 Specifier Decode

This chapter describes the logic representing the IBox specifier decode pipeline stage. It also describes the three specifier handlers; the interfaces to the MBox, EBox, and VBox; PC generation; and related stalls and flushes.

5.1 Overview

The XBAR decodes macroinstructions and passes individual specifiers to the OPU MCU. Within the OPU, the specifiers are processed by a specifier handler and passed as operand data to the EBox or passed as an operand address to the MBox. The OPU MCU contains three specifier handlers:

• Complex specifier unit - The CSU handles all specifiers other than short literal and register specifiers.
• Short literal unit - The SLU expands the short literal and floating short literal specifiers into a format and passes them to the EBox source list.
• Free pointer logic - The FPL manages the pointer to the EBox source list. The free pointer points to the next location in the source list that the IBox can write to. Source and destination pointers are passed through this unit to the EBox source and destination pointer queues.
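
The division of work among the three handlers can be illustrated by classifying a specifier byte by its VAX addressing-mode nibble. This is a rough sketch under assumptions rather than the XBAR decode logic: it relies on the standard VAX encoding (modes 0 through 3 are short literals, mode 5 is register mode), it treats an index prefix (mode 4) as part of a complex specifier, and the function name is hypothetical.

```python
def specifier_handler(specifier_byte):
    """Name the OPU handler for a single VAX operand specifier byte.

    The high nibble of the byte is the addressing mode:
      0-3  short literal   -> SLU expands the literal
      5    register        -> FPL passes the register pointer
      else complex modes   -> CSU (mode 4, the index prefix, is treated
                              here as part of a complex specifier)
    """
    mode = (specifier_byte >> 4) & 0xF
    if mode <= 0x3:
        return "SLU"
    if mode == 0x5:
        return "FPL"
    return "CSU"
```

For the instruction discussed next, R0 and R5 encode as 0x50 and 0x55 (register mode) and the short literal 54 encodes as 0x36, so the first and third specifiers go through the FPL and the second through the SLU.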

The following list describes the processing of the instruction ADDL3 R0 #54 R5 in the specifier decode pipeline stage. The example is based on the XBAR having decoded the instruction in a single cycle.

• The FPL receives R0, register valid, and source 1 valid for the first instruction specifier.
• The SLU receives the short literal data (54) and short literal valid bit. The FPL receives source 2 valid and register not valid.
• R5, the register valid bit, and the destination valid bit are sent to the FPL for the destination specifier.
• OCTL receives a 31-bit register mask indicating R0 will be read and R5 will be written.

From these inputs, pointers for each of the specifiers are passed to the EBox in the following manner:

• The source 1 logic passes a 5-bit field to the source pointer queue. The bit field indicates that the operand is a register and contains the register number.
• The short literal specifier (source 2) is expanded by the SLU and passed to the EBox source list. The free pointer provides the address in the source list that will receive the operand.
• The 5-bit source 2 pointer is placed in the source pointer queue and contains the location in the source list of the short literal operand.
• The free pointer is incremented by 1. This sets the free pointer to the next free location in the source list.
• The destination pointer is passed to the destination pointer queue and contains a 5-bit field indicating that the destination is a register and indicating the register number.
• The register mask is stored in the register mask logic and is not discarded until the instruction is executed by the EBox.

If the destination specifier were a register-deferred specifier, it would be processed by the CSU. When processing ADDL3 R0 #54 (R5), the two source specifiers would be processed as previously described. The destination specifier (R5) would be processed by the CSU.
To process this specifier, the CSU sends the contents of R5 (the destination address) to the MBox write queue and the destination pointers to the EBox destination queue. The destination pointer contains a bit signifying a memory destination.

For the instruction CLRW @(R4)+[R5], the CSU performs the following steps:

1. Sends an OPU port request to the MBox to return the contents of the address supplied by R4 and autoincrements R4 by four.
2. Multiplies the contents of the index register (R5) by two and then adds the data returned from the MBox to the scaled contents of R5. The result of the addition produces the address of the operand, which is sent as an OPU request with the data being returned to the EBox source list.

5.1.1 Stall Logic

Figure 5-1 shows a block diagram of the stall buffer that each of the specifier handlers has to buffer the inputs from the XBAR. These stall circuits enable the specifier handlers to receive two specifiers from the XBAR before they inform the XBAR that it must stall.

The stall buffer consists of two scan latches, a latch, and a multiplexer. The multiplexer provides selection of input data or stalled data to be processed by the specifier handler. This data is held in a scan latch until the present specifier sequence is completed. The present specifier sequence is represented by current data and can also be stalled data if the sequence requires multiple cycles or if the specifier handler has stalled.

At the start of a specifier sequence, the input data is selected by the following:

OPU_SEQUENCE_START_H in the CSU
SL_SEQUENCE_START_H in the SLU
FPL_STALLED_L in the FPL

The input data is selected as current data and passed to the functional units of the specifier handler. Current data is also passed through the latch and scan latch, and placed on the input of the multiplexer.
Any subsequent specifier cycles for this operation will continue to select stalled data at the multiplexer until the next sequence start.

Figure 5-1 OPU Stall Logic (elements shown: INPUT DATA, STALLED DATA, CURRENT DATA, SPECIFIER DATA, SEQUENCE START, two scan latches, a latch, and a multiplexer)

5.2 Complex Specifier Unit

The complex specifier unit (CSU) provides the following functions:

Handles complex specifiers.
Calculates operand addresses and branch target addresses.
Controls the MBox OPU port.
Provides IBox data control to the EBox.
Performs GPR and RLOG write operations.

The CSU is contained in the OPU MCU and is divided across five MCAs. The MCAs provide the following functions:

• OPUA - Low word of the data path
• OPUB - High word of the data path
• OSQA - Control unit for RLOG, GPRs, OPUA, OPUB, and the OPU port and EBox interfaces
• STG2 - Low word of the GPRs
• STG3 - High word of the GPRs

Figure 5-2 is a basic block diagram showing the organization of the CSU.

Figure 5-2 CSU Organization (blocks shown: OPUA/OPUB; OSQA with OPU port control, OPU control, and GPR control; STG2/STG3; EBox interface; OPU port interface to the MBox. Signals shown: EBOX_DATA[31:00], OP_ADDRESS[31:00], XBAR_DISPLACEMENT[31:00], MBOX_OP_DATA[31:00], YGPR_DATA[31:00], XGPR_DATA[31:00], OPU_RESULT[31:00], VBOX_ADDRESS[31:00], EBOX_RESULT[31:00])

The OPUA and OPUB MCAs contain the adders that calculate the target addresses of operands for the EBox. A context shifter (multiplier) is contained in the OPUA logic. The OPU port and the EBox interfaces are physically contained in these two MCAs and receive control from OSQA and OSQB.

The GPRs are dual port STREGs that can be read from and written to under control provided by OSQA.
These GPRs can also receive inputs from the VBox. The VBox sends addresses to the MBox through the OPU.

5.2.1 OPUA Data Path

The OPUA MCA controls bits [15:00] of the address and data path of the CSU. OPUA contains an adder that receives inputs that are added together to calculate the operand data or address. Two multiplexers, AMUX and BMUX, supply the data to the adder from a variety of sources. When processing a displacement mode specifier, the AMUX provides the register (YREG) to the adder and the BMUX provides the sign-extended data to the adder. The adder adds the two inputs together and produces the operand address. (For example, OPUA produces the low 16 bits of the address while OPUB produces the high 16 bits.)

OPUA receives displacement data from the XBAR, sign extends the data, and presents it to the adder of the MCA as an operand. OPUA also accesses the GPRs. The CSU microcode selects the relevant operands for the specifier that is currently being evaluated and adds them together. The output of this operation can be used as an address for an OPU port request, passed as data to the EBox, sent to the PCU as a target PC, or placed on the input of the adder for the next cycle operation. Figure 5-3 shows a block diagram of the data and address paths of the OPUA MCA.

The AMUX receives XBAR_DISPLACEMENT_H[15:00], the low 16 bits of the 32 bits of XBAR displacement. This input (XBAR_DISPLACEMENT_H[15:00]) represents the following:

• Byte, word, or longword displacement for addressing modes A through F
    A  Byte displacement
    B  Byte displacement deferred
    C  Word displacement
    D  Word displacement deferred
    E  Longword displacement
    F  Longword displacement deferred
• Immediate data for addressing mode 8F
• Absolute address for addressing mode 9F
• Byte or word displacements for branch instructions
• Byte, word, or longword displacement for relative addressing modes (AF through FF)

XBAR_DISPLACEMENT_H[15:00] is sign extended to produce SXDISP_H[15:00], which is then placed on the input of the two multiplexers (AMUX and BMUX) that provide the data to the adder circuit.

Figure 5-3 OPUA

5.2.1.1 AMUX Inputs

The AMUX receives six inputs, one of which is selected by the OPU microcode (AMUX_SELECT_H[02:00]) and passed as one half of the operand to the adder. The inputs to the AMUX are as follows:

• SXDISP[15:00] - XBAR displacement that has been sign extended.
• CURRENT_PC_H[15:00] - The current updated PC. This PC is generated by adding XBAR_DECODE_PC and DECODE_DELTA and is used for calculating branch target PCs and in PC relative addressing.
• OPU data - Data that has been returned from the MBox in response to an OPU port request when processing an indirect addressing mode specifier.
• STG2_YGPR_H[15:00] - YGPR is the GPR that is being referenced in the present specifier cycle. YGPR represents any GPR reference except for indexed specifiers.
• Delayed GPR data - YGPR data that has not been written to the EBox GPRs. Delayed GPR data is used when processing intra-instruction read conflicts (IRCs).
• OPUA result - The result of the last addition (adder output) that is placed back on the input of the AMUX for further processing.

5.2.1.2 BMUX Inputs

The BMUX receives six inputs, one of which is selected by the OPU microcode (BMUX_SELECT_H[02:00]) and passed as the operand to the context shifter and then the adder. The inputs to the BMUX are as follows:

• OPUA result - The result of the last addition (adder output) that is placed back on the input of the BMUX for further processing.
• SXDISP[15:00] - Displacement data provided by the XBAR.
• STG2 XGPR - The input to the BMUX to represent which register is being accessed for an indexed operation.
• Constant -1 - Selected for autodecrement operations.
• Constant +1 - Selected for autoincrement operations.

The BMUX output is passed to the context shifter, which multiplies the output by 0, 1, 2, 4, 8, or 16. The multiplication type is selected by the OPU microcode (CSHFT_SEL_H[02:00]) for index, autoincrement, autodecrement, and autoincrement deferred operations. BMUX inputs are passed unchanged through the CSHFT logic by selecting zeros as the function.

5.2.1.3 Adder

The output of the AMUX multiplexer (AMUX_H[15:00]) is input to a 16-bit adder and added with the output of the context shifter (CSFT_H[15:00]) to produce RESULT_H[15:00]. Any carry produced from the addition is sent to a similar add circuit in OPUB.
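
The low-slice operation described above (AMUX input plus the context-shifted BMUX input, with a carry out to OPUB) can be sketched as follows. This is a simplified 16-bit model, not the OPUA circuit: the function name is an assumption, and product bits that overflow the low word are simply dropped here, whereas the hardware would have to account for them in the high slice.

```python
MASK16 = 0xFFFF
VALID_FACTORS = (0, 1, 2, 4, 8, 16)  # multiples listed for the context shifter


def opua_add(amux, bmux, factor):
    """Return (RESULT[15:00], carry_out) for one low-slice add.

    amux   : value selected by the AMUX (16 bits)
    bmux   : value selected by the BMUX (16 bits)
    factor : multiple applied by the context shifter
    """
    if factor not in VALID_FACTORS:
        raise ValueError("context shifter supports only 0, 1, 2, 4, 8, or 16")
    shifted = ((bmux & MASK16) * factor) & MASK16  # simplified: overflow bits dropped
    total = (amux & MASK16) + shifted
    return total & MASK16, total >> 16             # 16-bit sum and carry to OPUB
```

As a usage example, a displacement-mode specifier with a base register low word of 0xFFF0 and a sign-extended displacement of 0x0020 would be evaluated as `opua_add(0xFFF0, 0x0020, 1)`, and an autoincrement of a longword operand as `opua_add(ygpr_low, 1, 4)`.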

The output of the OPUA adder (RESULT_H[15:00]) is passed as one of the following signals to the other functional units of the MBox, EBox, and IBox:

• IBOX_DATA_H[15:00] is passed to the EBox source list as an operand.
• IBOX_OP_ADDRESS_H[15:00] sends result data to the MBox as the address of an operand with data being returned to the IBox or to the EBox source list, or as the destination address of an operand where the EBox is to write the destination data.
• OPUA_RESULT_H[15:00] is selected as the output and sent to the PCU as the target address of a branch instruction. This output (OPUA_RESULT_H[15:00]) can also be selected and written to a GPR (STG2) that was updated (autoincrement or autodecrement operation). OPUA_RESULT_H[15:00] can also be selected as an input to AMUX or BMUX for subsequent add operations.

5.2.2 OPUB Data Path

The OPUB MCA performs functions similar to OPUA. This MCA receives the high-order bytes of XBAR displacement data (XBAR_DISPLACEMENT_H[31:16]) and produces the high-order bytes of the operands or operand addresses and GPR update data. Figure 5-4 shows a block diagram of the OPUB MCA.

OPUB contains an AMUX, a BMUX, and a context shifter. As in OPUA, the output of the AMUX is added to the output of the BMUX and context shifter to produce the operand address or the operand data.

OPUB uses two adder circuits. The two outputs they produce are "add with carry" and "add without carry." (For example, the high-order 16 bits of the operand are added together, producing one output that assumes a carry and one that assumes no carry from the low-order addition.) The outputs of the two adders are placed on a multiplexer and selected by the carry bit from the adder of OPUA (OPUA_ADDER_COUNT_H).
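
The two-adder arrangement in OPUB is a carry-select scheme: both possible high-word sums are computed without waiting for the low word, and the OPUA carry picks one. The sketch below models that idea for a plain 32-bit add; the function names are assumptions and the context shifter is left out for brevity.

```python
MASK16 = 0xFFFF


def opub_add(a_hi, b_hi, carry_from_opua):
    """High-slice add: compute both candidate sums, then select by carry."""
    sum_without_carry = (a_hi + b_hi) & MASK16
    sum_with_carry = (a_hi + b_hi + 1) & MASK16
    return sum_with_carry if carry_from_opua else sum_without_carry


def add32(a, b):
    """32-bit add built from a low slice and a carry-select high slice."""
    low = (a & MASK16) + (b & MASK16)
    carry = low >> 16                       # carry out of the low-order adder
    high = opub_add((a >> 16) & MASK16, (b >> 16) & MASK16, carry)
    return (high << 16) | (low & MASK16)
```

The benefit of this structure is that the high-order additions proceed in parallel with the low-order addition, so only a final multiplexer delay is added once the carry is known.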

Figure 5-4 OPUB

5.2.3 Current PC Generation

The CSU calculates the current PC for branch instruction target addresses and for PC relative addressing mode specifiers. The current PC generation logic is divided between the OPUA and OPUB MCAs. OPUA generates the low-order word of the current PC while OPUB generates the high-order word.

This section provides a detailed description of the current PC logic. Figures 5-5 and 5-6 are detailed block diagrams describing the current PC generation in the CSU.

Figure 5-5 CSU Current PC (High Slice) (signals shown: PC_OPU_DECODE_PC[31:16], SPECIFIER_DECODE_PC[31:16], PC_PLUS_CIN[31:16], CURRENT_PC[31:16], STALLED_PC[31:16], INVERT_CURRENT_PC_PARITY_H[03:02], OPUA_PC_COUT, OSQA_OPU_SEQ_START)

Figure 5-6 CSU Current PC (Low Slice) (signals shown: OSQA_OPU_SEQ_START_H, PC_OPU_DECODE_PC_H[15:00], SPECIFIER_DECODE_PC[15:00], XDTB_DECODE_DELTA_H[05:00], SPECIFIER_DECODE_DELTA_H[15:00], SPECIFIER_PC_H[15:00], CURRENT_PC_H[15:00], STALLED_PC_H[15:00], INVERT_SPECIFIER_PC_PARITY_H[01:00])

5.2.3.1 OPUA Current PC [15:00]

OPUA receives PCxx_OPU_DECODE_PC_H[15:00] from the PCU. These 16 bits are delivered in the following manner:

PCBP sends bits [07:00].
PCVC sends bits [12:08].
PCLO sends bits [15:13].

The XBAR sends XDTB_DECODE_DELTA_H[05:00], which equals the total number of instruction buffer bytes that have been shifted out of the instruction buffer since the last SHIFTOPCODE was asserted.

These two signals are latched and then added (Figure 5-6) together to produce SPECIFIER_PC_H[15:00], which is latched and produces an output of CURRENT_PC_H[15:00].
A carry from the addition results in a carry bit being asserted in the logic that generates the high-order word of the current PC. CURRENT_PC_H[15:00] is an AMUX input in the OPUA data path of the CSU.

This logic also contains a stall buffer that controls selection of a current PC output or a stalled output (STALLED_PC_H[15:00]). The stalled PC is selected when the CSU is in sequence.

5.2.3.2 OPUB Current PC [31:16]

The high-order word for the current PC (CURRENT_PC_H[31:16]) is generated in OPUB and is an input to the AMUX in the OPUB data path of the CSU. Figure 5-5 shows a block diagram of the logic that generates the high-order word of the current PC.

OPUB receives PCxx_OPU_DECODE_PC_H[31:16] from the PCU. These 16 bits are delivered in the following manner:

PCHI supplies bits [31:24].
PCLO supplies bits [23:16].

These two PC fields are latched and produce SPECIFIER_DECODE_PC_H[31:16], which is input to a multiplexer for final selection of the current PC. Also input to the multiplexer is an incremented version of SPECIFIER_DECODE_PC_H[31:16] (PC_PLUS_CIN_H[31:16]). The incremented PC is selected when the calculation of the low-order word of the current PC results in a carry.

This logic also contains stall circuitry similar to that in the logic that generates the low-order word of the current PC.

5.2.4 CSU Microcode

The CSU is controlled by microcode logic that is resident on the OSQA MCA. The microcode controls the multiplexers that supply inputs to the adders of the CSU. Control of the OPU port interface, GPR writes, YGPR incrementing, and source list writes is also provided under microcode control.

5.2.4.1 CSU Microaddress

The CSU microaddress is 10 bits wide and contains 4 fields that govern the flow of the microcode (Figure 5-7). Table 5-1 lists the fields of the microaddress.
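
The 10-bit microaddress layout can be illustrated by packing and unpacking its four fields. This sketch assumes the bit positions given in Table 5-1 (index [09], mode [08:05], access type [04:02], count [01:00]); the function names are hypothetical and the code says nothing about how OSQA actually forms the address.

```python
def pack_microaddress(indexed, mode, access_type, count):
    """Assemble a 10-bit CSU microaddress from its four fields."""
    return (
        ((indexed & 0x1) << 9)        # [09]    index
        | ((mode & 0xF) << 5)         # [08:05] specifier mode
        | ((access_type & 0x7) << 2)  # [04:02] access type
        | (count & 0x3)               # [01:00] count
    )


def unpack_microaddress(uaddr):
    """Split a 10-bit CSU microaddress into (indexed, mode, access_type, count)."""
    return (
        (uaddr >> 9) & 0x1,
        (uaddr >> 5) & 0xF,
        (uaddr >> 2) & 0x7,
        uaddr & 0x3,
    )
```

For instance, a nonindexed displacement-mode specifier (mode 0010) with read access (001) on its first microword (count 00) packs to binary 0001000100.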

Figure 5-7 CSU Microaddress Format (fields shown: INDEX [09], MODE [08:05], ACCESS TYPE [04:02], COUNT [01:00])

Table 5-1 CSU Microaddress Descriptions

Index [09]
  Value  Indexed Indication    Label
  0      Nonindexed specifier  NI
  1      Indexed specifier     I

Mode [08:05]
  Value  Specifier Mode          Label    Original Mode Values
  0000   Autoincrement           AINC     8x
  0001   Autoincrement deferred  AINCDEF  9x
  0010   Displacement            DISP     Ax, Cx, Ex
  0011   Displacement deferred   DISPDEF  Bx, Dx, Fx
  0100   Unused                  4        0x, 1x, 2x, 3x, 4x
  0101   Register                REG      5x¹
  0110   Register deferred       REGDEF   6x
  0111   Autodecrement           ADEC     7x
  1000   Immediate               IMM      8F
  1001   Absolute                ABS      9F
  1010   Relative                REL      AF, CF, EF
  1011   Relative deferred       RELDEF   BF, DF, FF
  1100   Unused                  12       0F, 1F, 2F, 3F, 4F
  1101   Unused                  13       5F
  1110   PC deferred             PCDEF    6F
  1111   PC autodecrement        PCADEC   7F

  ¹ This specifier mode is selected when IRC is asserted in the XBAR.

Access Type [04:02]
  Value  Specifier Access Type  Label
  000    Address (ASRC)         A
  001    Read                   R
  010    Write                  W
  011    Modify                 M
  100    Vfield source (VSRC)   V
  101    Branch displacement    B
  110    Implied                I
  111    Callx specifier        C

Count [01:00]
  Value  Microword in Sequence  Label
  00     First microword        0
  01     Second microword       1
  10     Third microword        2
  11     Fourth microword       3

Count [01:00] (Register Mode)
  Value  Register Mode Size  Label
  00     Longword data type  0
  01     Octaword data type  1
  10     Three longwords     2
  11     Quadword data type  3

5.2.4.2 CSU Microword

The CSU microword is 19 bits wide and is divided into 12 fields (Figure 5-8). Table 5-2 lists the fields of the microword.

Figure 5-8 CSU Microword Format (fields shown: AMUX SELECT, BMUX SELECT, MUL FACTOR, SLIST WRITE, EOS, MREQ, MREQ ADDR, MREQ SIZE, MREQ DEST, WRITE PC, WRITE GPRS, INC YREG, MICROWORD PARITY)

Table 5-2 CSU Microword Field Descriptions

Bits   Name                 Value  Description
03:00  AMUX select          000    Previous cycle's adder output.
                            001    Sign-extended I-stream data.
                            010    Current specifier's PC.
                            011    OPU port data returned to OPU.
                            100    Contents of specifier's base GPR.
                            101    Contents of current delayed GPR.
07:04  BMUX select          000    Previous cycle's adder output.
                            001    Sign-extended I-stream data.
                            010    Constant value of one.
                            100    Contents of specifier's index GPR.
                            101    Constant value of minus one.
11:08  Multiply factor      000    Multiply BMUX output by zero.
                            001    Multiply BMUX output by one.
                            010    Multiply BMUX output by two.
                            011    Multiply BMUX output by four.
                            100    Use specifier's context.
12     Write source list    0      Don't write adder result to source list.
                            1      Write adder result to source list.
13     EOS                  0      Specifier sequence continues.
                            1      End of specifier sequence.
14     MREQ                 0      Don't issue an MBox OPU port request.
                            1      Issue an MBox OPU port request.
16:15  MREQ address select  0      Use adder result as OPU port address.
                            1      Use AMUX output as OPU port address.
19:17  MREQ size            00     Use specifier's context as request size.
                            01     Force request size to word.
                            10     Force request size to longword.
                            11     Force request size to quadword.
21:20  MREQ destination     0      Return data to OPU.
                            1      Write data to EBox source list.
22     Write PC             0      Don't send target PC.
                            1      Send target PC if appropriate.
23     Write GPRs           0      Don't write GPRs.
                            1      Write IBox and EBox GPRs.
24     Increment YREG       0      Don't increment base GPR pointer.
                            1      Increment base GPR pointer.
25     Microword parity     0      Parity bit disabled, microword has odd number of bits.
                            1      Parity bit enabled, microword has even number of bits.

Figure 5-9 shows signals involved in generating a CSU microword.
The following list describes the source of each signal used to generate the CSU microword:

• OSQB_ACCESS_TYPE_H[03:00] is from the OSQB DRAM and defines the access type of the current specifier.
• XDTB_INDEXED_H is from the XBAR and is asserted when the specifier is in indexed mode.
• XDTB_MODE_H[03:00] is from the XBAR and provides the addressing mode of the specifier.
• XDTA_XREG_H[03:00] is from the XBAR and provides the base register for any register operands.

A count field is also input to the microaddress generation to control the count of the CSU sequence. The count field is generated by decoding the inputs to the microaddress generation logic.

5.2.5 CSU Stalls

In the CSU, the OSQA MCA contains the logic that monitors conditions that may initiate stalls. The stalls that occur in this unit can be related to handling certain specifiers (read and write conflict stalls) or related to the units that the CSU is supplying data to. This section describes the stalls that occur in the CSU, and it describes how each stall is detected and how each stall is cleared.

5.2.5.1 Scoreboard Stalls

The OCTL MCA contains the scoreboard logic that tracks the reading and writing of GPRs by the EBox during the execution of macroinstructions. Read and write conflicts occur when the IBox is directed to read or write a GPR before the EBox has performed a required operation on the same GPR. When these conflicts occur, the CSU must stall and wait for the EBox to complete the conflicting instruction before it can proceed.

Two types of read and write conflict stalls can be asserted in the CSU. Both types are caused by the same conflicts and differ only in the timing of the detection of the stall. For example, a read conflict can initiate a stall by asserting SPECIFIER_READ_CONFLICT_H or CURRENT_READ_CONFLICT_H. A specifier read conflict occurs when the conflicting data is latched from the XBAR but has not yet been acted on by the CSU.
A current read conflict occurs after the CSU begins evaluating the data (OPU_SEQ_START_H is asserted). This section describes the initiation of read conflict and write conflict stalls, and it describes the clearing of these conditions.

Figure 5-9 CSU Microword Select

5.2.5.1.1 Read Conflict Stall

Read conflicts are detected by OSQA and initiate scoreboard stalls. A read conflict occurs when the CSU is directed to update (autoincrement or autodecrement) a GPR that the EBox is directed to read. For example:

    MOVL  R0, xx       (read R0)
    ADDL2 yy, (R0)+    (autoincrement R0)

The above sequence of instructions causes a read stall because the EBox is directed to read R0 and the IBox is directed to update R0. Depending on how many instructions ahead of the EBox the IBox is, the IBox could possibly update R0 before the EBox reads R0.

Read conflicts are detected by monitoring the read mask (READ_MASK_H[15:00]), the specifier GPR (YREG_H[03:00]), and the occurrence of an autoincrement or autodecrement (AUTOXXX_MODE_H). When the read mask and YREG both point to the same register and an autoincrement or autodecrement is to be performed by the CSU, SPECIFIER_READ_CONFLICT_H and SPECIFIER_READ_STALL_H are asserted in OSQA. These stall signals also assert OPU_STALLED_H and SCOREBOARD_STALL_H.
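
The read conflict check described above can be sketched in a few lines of Python. This is only an illustrative model, not the OSQA logic itself: the function name read_conflict and the use of a plain integer for the mask are assumptions made for the example, and the mask is treated as one bit per GPR as the signal width READ_MASK_H[15:00] suggests.

```python
def read_conflict(read_mask: int, yreg: int, autoxxx_mode: bool) -> bool:
    """Model of the read conflict condition (hypothetical helper).

    read_mask    -- one bit per GPR the EBox still has to read
                    (stands in for READ_MASK_H[15:00])
    yreg         -- GPR number used by the specifier (YREG_H[03:00])
    autoxxx_mode -- True for an autoincrement or autodecrement specifier
                    (AUTOXXX_MODE_H)
    """
    if not 0 <= yreg <= 15:
        raise ValueError("yreg must be a 4-bit GPR number")
    # A conflict exists only when the CSU would update the register
    # and the read mask marks that same register.
    return bool(autoxxx_mode) and bool((read_mask >> yreg) & 1)


# MOVL R0, xx followed by ADDL2 yy, (R0)+ : R0 is pending a read.
pending_reads = 0b0000_0000_0000_0001
stall = read_conflict(pending_reads, yreg=0, autoxxx_mode=True)
```

In this model the stall condition disappears as soon as the mask bit for the register is cleared, which corresponds to OCTL discarding the mask of the completed instruction.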
The stall is cleared by a flush or unwind or when the instruction containing the conflict is executed by the EBox. When a flush or BP unwind occurs, INIT_OPU_AND_SL_H is asserted and any read or write conflicts are negated. When the EBox completes an instruction, EBOX_INSTRUCTION_DONE_H is asserted. This signal directs OCTL to discard the masks related to the completed instruction. If the mask that is discarded is the one that asserted the stall, then the CSU resumes operation.

5.2.5.1.2 Write Conflict Stall

Write conflicts are similar to read conflicts in the CSU. Write conflicts occur when the EBox is directed to write a GPR and the IBox is directed to read the same GPR.

Write conflicts are detected by monitoring the write mask (WRITE_MASK_H[15:00]) and the specifier GPR (YREG_H[03:00]) for register mode addressing. When a match of the write mask and a YREG is detected, a write conflict is asserted.

A write conflict can also occur when processing index mode specifiers. OSQA compares the index register (XREG_H[03:00]) with the write mask to detect this conflict.

Write conflicts assert CURRENT_WRITE_STALL_H or SPECIFIER_WRITE_STALL_H, which asserts OPU_STALLED_H to stall the CSU and asserts SCOREBOARD_STALL_H in the OCTL MCA. These stalls are cleared by an EBox flush, a PCU unwind, or when the EBox has completed the conflicting instruction and directs the OCTL mask logic to discard the write mask associated with it.

5.2.5.2 Branch Under Branch Stall

The CSU stalls when a conditional branch is encountered, if an outstanding branch has not yet been validated by the EBox. OSQA maintains a count of conditional branches that have been processed by the CSU. The conditional branch count is incremented each time the CSU sends a target PC to the PCU (TARGET_PC_H[31:00] and TARGET_VALID_H) and the current instruction is not an unconditional branch (CURRENT_UNCOND_BRANCH_TB_L negated).
The branch under branch stall (CURRENT_BUB_STALL_H) occurs when the following conditions exist:

• BRANCH_CNT_EQL_1_H is asserted.
• SPECIFIER_BRANCH_ACCESS_TYPE_H is asserted. This signal is from the OSQB DRAM access type.
• SPECIFIER_DATA_AVAILABLE_H is asserted. This signal is asserted when XSCA_VALID_H is asserted and XDTA_RAF_L is negated.
• SPECIFIER_UNCOND_BRANCH_L is asserted in the instruction buffer simple decode logic.
• OPU_SEQ_START_H is asserted.

CURRENT_BUB_STALL_H asserts OPU_STALLED_H. The stall is cleared by an EBox flush, PCU unwind, or when the first branch is validated by the EBox (EBOX_BRANCH_VALID_H).

5.2.5.3 AUTOxx Under Branch Stall

The CSU does not process autoincrement and autodecrement specifiers when an outstanding branch prediction is awaiting EBox validation. When an autoincrement, autodecrement, or autoincrement-deferred mode specifier is detected and the branch count is equal to one, the CSU stalls and waits for the branch prediction to be validated before it continues operation.

CURRENT_AUB_STALL_H is asserted to initiate this stall when the following conditions exist:

• SPEC_NOT_BRANCH_OR_IMPLIED_H is asserted.
• SPECIFIER_DATA_AVAILABLE_H is asserted.
• SPECIFIER_AUTOXX_MODE_H is asserted.
• BRANCH_COUNT_EQL_1_H is asserted.
• OPU_SEQ_START_H is asserted.

Asserting CURRENT_AUB_STALL_H also asserts OPU_STALLED_H. The stall is cleared by an EBox flush, a PCU unwind, or when the EBox validates the outstanding branch prediction.

During this stall, it is possible for the CSU to initiate an OPU port request to handle the AUTOxx specifier. For an autoincrement specifier, the CSU can read the GPR and then write (autoincrement) the GPR when the stall is negated.

5.2.5.4 OPU Port Grant Wait Stall

When the CSU issues an OPU port request, the MBox must respond to the request before the CSU can continue processing. When the MBox does not respond, the CSU stalls.
When an OPU port request is issued, the CSU selects stalled A and B addends as input to the CSU adder by asserting OSQA_OP_GRANT_WAIT_H. This holds the port request information until the MBox responds. OP_GRANT_WAIT_H asserts OPU_STALLED_H if another OPU port request is issued. The stall is cleared when the MBox responds to the request by asserting MBOX_OP_GRANT_TA_L or if an EBox flush or PCU unwind occurs.

5.3 Short Literal Unit

The short literal unit (SLU) expands the 6-bit integer and floating short literal (SL) operands into a relevant field (32-, 64-, or 128-bit field).

Figure 5-10 shows the inputs and outputs of the SLU. This unit receives XBAR_SL[05:00] and a copy of the opcode (OPCODE[08:00]) from the XBAR. The opcode is decoded to determine the SL data type. The data type and the SL data are input to the SL data formatter, which outputs the expanded SL data. The SLU can produce a single longword of expansion per cycle. Data formats larger than a longword require successive cycles to produce their output.

Figure 5-10 Inputs and Outputs of the Short Literal Unit (inputs: XBAR_SL_H[05:00], XBAR_SL_VALID_H, SL_DATA_TYPE_H[03:00]; output: EXPANDED_SL_H[31:29, 14, 09:00])

5.3.1 Short Literal Processing

Figure 5-11 shows a block diagram of the SLU. The SL formatter provides integer and F-, D-, G-, and H-floating formats to the format select logic. The SL formatter receives the 6-bit short literal data (XBAR_SL_H[05:00]) and expands it into four formats (INTEGER_H[31:00], F/D_FLOAT_H[31:00], G_FLOAT_H[31:00], and H_FLOAT_H[31:00]).

The expansion select logic selects one of the four SL formatter outputs, PREVIOUS_EXPANSION_H[31:00], or zero. PREVIOUS_EXPANSION_H[31:00] is selected when short literal data that was previously expanded was not written to the EBox source list (for example, when the IBox-to-EBox interface is busy).
Zero is selected in cycles other than the first in multiple cycle expansion (for example, D-, G-, and H-floating formats). The data type of the specifier being processed (SL_DATA_TYPE_H[03:00]) and a signal monitoring the status of the EBox source list (SLIST_FULL_H) are input to expansion select. The data type is generated by decoding instruction opcodes in the OSQB access type and data type logic and provides selection of the expanded short literal data.

Figure 5-11 Short Literal Unit Block Diagram

Short literal sequences start when SL_SEQUENCE_START_H is asserted. Assertion of this signal selects valid short literal data (XBAR_SL_H[05:00]) from the input stall buffer to be loaded into the SL formatting logic. SL_SEQUENCE_START_H is asserted when the following occurs:

• XDTB_SL_VALID_H is asserted, signifying that the XBAR is passing valid short literal data to the SLU.
• XDTA_RAF_L is negated, signifying that the XBAR did not detect an RAF.
• SL_LWORDS_LEFT_TA_EQL_0_H is asserted, signifying that no short literal specifiers are currently being processed.
• PREV_SLIST_FULL_L is negated, signifying that the source list is not full.
• INIT_OPU_AND_SL_L is negated, signifying that no flush or PCU unwind is in progress.

At the start of a short literal sequence, SL_LWORDS_H[02:00] is loaded into SL_LWORDS_LEFT_H[02:00].
SL_LWORDS_H[02:00] is originally loaded with the value or count that is derived from the data type of the current SL specifier being evaluated. For byte, word, and longword formats, the count equals one. For quadword formats, the count equals two; for octaword formats, the count equals four.

When CURRENT_SL_LWORDS_H[02:00] equals SL_LWORDS_LEFT_H[02:00], OSQA_SL_WRITES_SLIST_H is asserted to write expanded short literal data to the source list. Each time the write signal is asserted, SL_LWORDS_LEFT_H[02:00] is decremented. When the longword count is decremented to zero, the SL sequence is complete.

5.3.2 Integer Expansion

Integer short literal operands represent values from 0 to 63. The short literal operands are zero extended, according to their data type, and passed to the EBox source list.

Figure 5-12 shows the outputs of the SLU when processing an integer short literal operand. In this example, the zero expansion of 63 occurs in a single cycle for the byte, word, and longword contexts. Quadword outputs occur in a minimum of two cycles. Producing the octaword format requires a minimum of four cycles.

Figure 5-12 SLU Integer Expansion (example input XBAR_SL[05:00] = 3F; data type codes shown: 0001 byte, 0010 word, 0011 longword, 0100 quadword, 0101 octaword, 1000 G-floating, 1011 F-floating, 1100 D-floating, 1101 H-floating)

5.3.3 Floating-Point Expansion

Figure 5-13 shows the format of the 6-bit short literal field when representing a floating-point number. The 6-bit floating short literal field is expanded to produce the relevant F-, D-, G-, and H-floating operands. Figures 5-14, 5-15, 5-16, and 5-17 show the format of each type of floating output from the SLU.
Bit 3 of SL_DATA_TYPE_H[03:00] is asserted to indicate a floating-point literal field and also provides the most significant bit (MSB) of the exponent. The actual expansion is performed by correctly positioning the exponent and fraction fields.

Figure 5-13 SLU Floating-Point Literal Format (fields: EXPONENT [05:03], FRACTION [02:00])

Figure 5-14 SLU F-Floating Expansion (SL_DATA_TYPE[03:00] = 1011)

Figure 5-15 SLU D-Floating Expansion (SL_DATA_TYPE[03:00] = 1100)

Figure 5-16 SLU G-Floating Expansion (SL_DATA_TYPE[03:00] = 1000)

Figure 5-17 SLU H-Floating Expansion (SL_DATA_TYPE[03:00] = 1101)

5.3.4 Outputs to the EBox Interface

The 14-bit short literal operand is passed as 32 bits of short literal data to the EBox across the EBox interface. This field is constructed by zeroing out the remaining 18 bits as the data is passed to the EBox source list.

5.3.4.1 Order

When specifiers from the CSU and SLU are both valid, XBAR_SL_ORDER is used to keep the operand entries to the free pointer in order. When asserted, the short literal specifier precedes the complex specifier. This signal is generated by the XBAR.
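
The integer and floating-point expansions of Sections 5.3.2 and 5.3.3 can be illustrated with a short Python sketch that builds the first longword of the expanded operand. It is a model under stated assumptions rather than a description of the SL formatter: the function name is invented for the example, the floating data type codes are taken from the figure captions, and the bit positions of the exponent and fraction are derived from the standard VAX F-, D-, G-, and H-floating layouts, since the expansion figures are not legible here. The resulting positions are consistent with the 14 output bits EXPANDED_SL_H[31:29, 14, 09:00].

```python
# Floating data type codes as read from Figures 5-14 through 5-17.
G_FLOAT, F_FLOAT, D_FLOAT, H_FLOAT = 0b1000, 0b1011, 0b1100, 0b1101


def expand_short_literal(sl: int, data_type: int) -> int:
    """Return the first longword of the expanded short literal.

    Hypothetical helper. Later longwords of quadword and octaword
    operands are zero and are not produced here.
    """
    if not 0 <= sl <= 0x3F:
        raise ValueError("short literal must fit in 6 bits")
    if not data_type & 0b1000:
        return sl  # integer types: zero-extended value 0..63
    exponent = (sl >> 3) & 0b111   # literal bits [05:03]
    fraction = sl & 0b111          # literal bits [02:00]
    exp_msb = 1 << 14              # MSB of the exponent in every format
    if data_type in (F_FLOAT, D_FLOAT):
        # 8-bit exponent in bits [14:07], fraction starts at bit 6.
        return exp_msb | (exponent << 7) | (fraction << 4)
    if data_type == G_FLOAT:
        # 11-bit exponent in bits [14:04], fraction starts at bit 3.
        return exp_msb | (exponent << 4) | (fraction << 1)
    if data_type == H_FLOAT:
        # 15-bit exponent in bits [14:00], fraction starts at bit 31.
        return (fraction << 29) | exp_msb | exponent
    raise ValueError("unknown floating data type code")


# Literal 110 100 (exponent 6, fraction 4), the value used in Figure 5-17.
f_longword = expand_short_literal(0b110100, F_FLOAT)
```

For the F-floating case this yields an exponent field of 128 + 6 and a fraction whose top three bits are 100, which is the value 48.0 in VAX floating-point notation.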
5.3.5 Stalls

The SLU stalls for three reasons:

• The unit is processing a multicycle SL specifier.
• The EBox interface cannot accept the expanded SL data because the OPU is using the interface.
• The EBox queues are full.

5.3.5.1 Source List Full

The FPL tracks the available free slots in the source list. When the source list is full, OSQB_SLIST_FULL_H is asserted in the OSQB MCA and passed to the OSQA MCA. The receipt of this signal disables any SLU or CSU writes across the EBox interface to the source list.

5.3.5.2 SLU Stalled

The XBAR must be informed of the status of the SLU so that when the SLU is busy processing one specifier, another is not passed to it. OSQA_SL_BUSY_STALL_H is asserted and sent to the XBAR to signify that the SLU is busy processing a specifier. This signal is an output of the comparator, in the SLU, that tracks the number of longwords remaining in an SL expansion.

5.3.5.3 EBox Interface Output Stall

The EBox interface cannot pass both SL and CS data to the EBox simultaneously. If SL data is available to be written to the source list and the EBox interface is busy with CS data, SL_BUSY_STALL_H is asserted until the interface is free to accept the SL data.

5.3.6 Parity Coverage and Errors

Three key signals of the SLU are parity checked before the SL specifier is processed:

• XDTB_SL_SPECIFIER_NUMBER_H[02:00]
• XDTB_ORDER_H
• XBAR_SL_H[05:00]

5.4 Free Pointer Logic

The free pointer logic (FPL) processes and passes pointers that are used by the EBox to direct the execution of instructions:

• Source 1 pointer
• Source 2 pointer
• Destination pointer
• Free pointer

The EBox maintains queues for the source and destination pointers it receives from the IBox. The source pointers point to entries in the source list or to GPRs. The source list is a queue for storing operands. These operands have either been passed by the IBox or prefetched from memory on behalf of the EBox.
To manage the source list, the IBox generates a free pointer. The free pointer points to the next free location in the source list that the IBox will write to.

The structure of the EBox queues is as follows:

• Source pointer queue: A 5-bit wide x 16-location deep queue. The top bit of the entry specifies if the operand is in the source list or a GPR. The remaining bits contain the address, in the source list, or the GPR number.
• Destination pointer queue: A 5-bit wide x 8-location deep queue. The top bit in the queue specifies if the destination is memory or a GPR, while the remaining bits contain the GPR number when applicable.
• Source list: A 16-entry circular queue for memory, immediate, and short literal operands. Each time an entry is inserted into the source list, the free pointer must be incremented to point to the next free location. If the operand size is a quadword or octaword, the free pointer must be incremented to reflect the size.

Figure 5-18 shows a simplified block diagram of the FPL.

5.4.1 Source 1 Pointer

The source 1 pointer is generated and validated in XDTA and passed to the FPL logic in OSQB. Three signals deliver the source 1 information:

• XDTA_SOURCE_REG_H[03:00] is the register number containing the operand if it is a register operand.
• XDTA_SOURCE_REG_VALID_H is asserted when the operand is a register.
• XDTA_SOURCE_VALID_H is asserted when a valid operand is being passed.

These signals allow the FPL to generate a pointer, validate the pointer, and, if the operand is not a register, allocate the source list entry and update the free pointer.

Figure 5-19 shows a block diagram of the source 1 pointer logic. The source 1 register field (XDTA_SRC1_REG_H[03:00]) is placed on the input of the pointer select multiplexer, if no stalls are present.

Figure 5-18 Free Pointer Logic

Figure 5-19 FPL Source 1 Pointer Logic

The register valid signal (XDTA_SRC1_REG_VALID) is latched from the XBAR and provides the select at the pointer multiplexer. Register valid asserted denotes a register operand and selects the register field to be entered in the source queue. If negated, the free pointer is selected and an entry is allocated in the source list for the source 1 operand.

Aside from providing a pointer select, the register valid signal is latched as bit 4 of the source 1 pointer and, when negated, generates SRC1_NOT_REGISTER_H. When asserted, SRC1_NOT_REGISTER_H initiates the process of incrementing the free pointer.

To pass a valid source 1 pointer, IBOX_SRC1_VALID_H must be asserted. This signal is passed from the XBAR, and, if no stalls or flushes are present, is passed to the EBox. This signal enables the EBox to enter the source 1 pointer into the source queue.

5.4.2 Source 2 Pointer

The source 2 pointer and valid signals are generated in a similar manner as the source 1 pointer. Figure 5-20 is a detailed block diagram of the source 2 pointer logic.

Figure 5-20 FPL Source 2 Pointer Logic

The pointer multiplexer of this logic provides the selection of three sources for the source 2 pointer.

• CURRENT_SOURCE2_REGISTER[03:00]: When source 2 is a register operand, the source 2 register field is passed.
• FREE_POINTER[03:00]: Selected when source 1 is a valid register and source 2 is valid but not a register operand. This entry is submitted to the source list.
• FREE_POINTER_PLUS_SOURCE1_SIZE[03:00]: When both source 1 and source 2 are valid but not registers, the entry for the source list (free pointer) must be allocated after the source list entries for the source 1 operand.

The select for the pointer multiplexer is provided by the source 1 and source 2 not register signals.

5.4.3 Free Pointer

As entries are added to the source list, the free pointer must be incremented to reflect the size of the entries. Figure 5-21 is a block diagram depicting the management of the free pointer.
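
The pointer selection of Sections 5.4.1 and 5.4.2 and the resulting advance of the free pointer can be modeled with a small Python sketch. It is a simplified illustration only: the Operand type and the function name are invented for the example, sizes are expressed in source list entries (assumed here to be longwords), and stalls, flushes, and the short literal ordering signal are ignored.

```python
from typing import NamedTuple, Optional, Tuple


class Operand(NamedTuple):
    """Hypothetical description of one valid source operand."""
    is_register: bool
    reg: int = 0     # GPR number, used when is_register is True
    size: int = 1    # source list entries, used when not a register


def make_source_pointers(
    free_ptr: int, src1: Optional[Operand], src2: Optional[Operand]
) -> Tuple[Optional[int], Optional[int], int]:
    """Return (source 1 pointer, source 2 pointer, new free pointer).

    A pointer is 5 bits wide: bit 4 set means GPR, clear means source list.
    None stands for an operand that is not valid in this cycle.
    """
    pointers = []
    next_free = free_ptr & 0xF
    for operand in (src1, src2):
        if operand is None:
            pointers.append(None)
        elif operand.is_register:
            pointers.append(0x10 | (operand.reg & 0xF))
        else:
            # Memory operand: take the next free entry, then advance the
            # pointer by the operand size around the 16-entry circular list.
            pointers.append(next_free)
            next_free = (next_free + operand.size) % 16
    return pointers[0], pointers[1], next_free


# Both operands in the source list: source 2 lands after source 1.
p1, p2, new_free = make_source_pointers(
    14, Operand(is_register=False, size=2), Operand(is_register=False, size=1)
)
```

With a free pointer of 14 and a two-entry source 1 operand, source 2 is placed at entry 0 because the list wraps, and the free pointer ends at 1. When source 1 is a register, source 2 simply receives the unmodified free pointer.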

Figure 5-21 Free Pointer

The opcode of each instruction is decoded in the access/data type logic, with the data type providing the specifier size for each valid memory operand. Register operands have no effect on the free pointer because they are not entered into the source list.

When source 1 and source 2 pointers have both been passed as valid entries into the source list, the free pointer is equal to the size of the two operands (SRC1_PLUS_SRC2_SIZE) plus the previous free pointer. If either source 1 or source 2 is a register operand, then only the size of the memory operand must be added to the free pointer to produce a new free pointer. When both source 1 and source 2 are register operands, the free pointer is not affected.

5.4.3.1 Free Pointer Initialization

On a flush of the free pointer logic, the EBox copies the free pointer into the last pointer in the source list, and the IBox increments the free pointer (FREE_POINTER_PLUS1[03:00]) so that the source list appears empty.

5.4.4 Destination Pointer

The destination pointer is generated and validated in XDTA and passed to the FPL in OSQB. Three signals deliver the destination pointer information:

• XDTA_DESTINATION_REG_H[03:00] is the register number of the destination operand when it is a register operand.
• XDTA_DESTINATION_REG_VALID_H is asserted when the destination operand is a register.
• XDTA_DESTINATION_VALID_H is asserted when a valid destination operand is being passed.

Figure 5-22 shows a block diagram of the destination pointer logic.

5.4.4.1 Destination Register

The XBAR supplies the destination register field (XDTA_DESTINATION_REG[03:00]) and, when the register valid bit is set (XDTA_DESTINATION_REG_H), the operand is a register and the field is placed in the destination queue as the address of the GPR that receives the destination data.

5.4.4.2 Destination Valid

The destination valid signal (XDTA_DESTINATION_VALID) is passed by the XBAR. This signal validates the destination pointer provided that none of FPL_FLUSH, EBOX_QUE_FULL, and DISABLE_VALIDS is asserted.

5.4.4.3 Destination Memory

To differentiate between a register destination and a memory destination, IBOX_DESTINATION_MEMORY_H is used. When asserted, this signal signifies that a memory destination is being passed. This signal is asserted when XDTA_DESTINATION_VALID_H is asserted, XDTA_DESTINATION_REGISTER_H is negated, and no flush is present. All destination pointers are sent to the EBox destination pointer queue.

Figure 5-22 Destination Pointer Logic

5.5 Operand Control Unit

The operand control (OCTL) unit provides control and distribution of stall and flush signals from the EBox and from other IBox functional units. OCTL also stores the read and write register masks that are generated for each instruction. Figure 5-23 is a basic block diagram of OCTL.

Figure 5-23 OCTL Unit

5.5.1 Read/Write Masks

Up to six read and write masks can be stored in OCTL. Each mask is passed by the XBAR when the instruction it represents has been completely decoded. The masks are generated to prevent the EBox and IBox from using stale data during the execution of instructions.
Figure 5-24 is a detailed block diagram of the read and write mask generation, storage, and unwind logic. Each instruction that is decoded in the XBAR generates a 31-bit mask that is passed to the OCTL unit. This field is passed at the completion of the instruction decode with a valid bit (XDTB_MASK_VALID_H) and is broken down as follows:

• XDTA_MASK_H[30] is the odd parity for the mask field.
• XDTA_MASK_H[29:15] is the write mask field.
• XDTA_MASK_H[14:00] is the read mask field.

If the first instruction the IBox decoded were ADDL2 R0, R1, the mask the XBAR generates would contain bits 0 and 1 set in the read mask and bit 1 set in the write mask. This mask would be selected by REG_INSERT[02:00] to be stored in REG_MASK0.

The following list describes the three basic operations that can be performed with the masks. The signals that initiate the function precede the descriptions.

• XDTB_MASK_VALID_H: Accept a new mask from the XBAR.
• EBOX_INSTRUCTION_DONE_H: Discard the oldest mask.
• EBOX_KEEP_MASKS[02:00]: Unwind the masks on a bad branch prediction.

These three functions affect the following two fields:

• REGISTER_SIZE_H[02:00] is the field that indicates the number of the masks that are valid.
• REGISTER_INSERT_H[02:00] is the field that points to the position where the next mask should be inserted.

Figure 5-24 OCTL Read/Write Masks

5.5.1.1 Mask Valid

XDTB_MASK_VALID_H, asserted, directs REGISTER_SIZE and REGISTER_INSERT to be incremented. The mask field is then inserted into MASKx. Incrementing is done in modulo 6 (count = 0 through 5). The six sets of registers that hold the masks are cyclic. That is, if the last mask was MASK5, the next mask to be used would be MASK0.

5.5.1.2 Instruction Done

The EBox, at the completion of an instruction, asserts INSTRUCTION_DONE_H. This signal directs the mask logic to discard the mask associated with the instruction that was decoded.

Discarding the mask is accomplished by decrementing (modulo 6) REGISTER_SIZE_H[02:00]. This function directs the oldest mask to be deleted while REGISTER_INSERT_H[02:00] still points to the next valid position for a mask to be written.

5.5.1.3 Correction

In the event of a bad branch prediction, the EBox must direct the IBox masks to be unwound back to the point of the branch. The EBox directs the unwind operation by sending EBOX_KEEP_MASK_L[01:00] to the mask logic. Figure 5-25 shows the logic involved in the correction.

Figure 5-25 OCTL Mask Correction Logic

NEW_REG_INSERT_H[02:00] is generated by subtracting CORRECTION_H[02:00] from REG_INSERT_H[02:00]. The subtraction is modulo 6. Table 5-3 describes the generation of NEW_REG_INSERT_H[02:00] with the possible combinations of REG_INSERT[02:00] and CORRECTION_H[02:00].
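
The modulo 6 subtraction, and the register valid decode that Table 5-4 tabulates, can be expressed compactly in Python. The sketch below is a behavioral model only; the function names are invented for the example, and the rule that the valid masks are the REGISTER_SIZE registers immediately preceding the insert position is inferred from the entries of Table 5-4 rather than stated in the text.

```python
def new_reg_insert(reg_insert: int, correction: int) -> int:
    """Modulo 6 subtraction of CORRECTION from REG_INSERT (Table 5-3)."""
    if not 0 <= reg_insert <= 5:
        raise ValueError("REG_INSERT values 6 and 7 are not possible")
    if not 0 <= correction <= 6:
        raise ValueError("CORRECTION value 7 is not possible")
    return (reg_insert - correction) % 6


def register_valid(insert: int, size: int) -> int:
    """Return a 6-bit field, bit n set when REGn holds a valid mask.

    The 'size' most recently written registers are valid, counting
    backwards (modulo 6) from the insert position.
    """
    if not 0 <= insert <= 5 or not 0 <= size <= 6:
        raise ValueError("insert must be 0..5 and size 0..6")
    valid = 0
    for back in range(1, size + 1):
        valid |= 1 << ((insert - back) % 6)
    return valid


# Insert position 1 with two valid masks: REG0 and REG5 hold them.
example = format(register_valid(1, 2), "06b")
```

Unwinding two masks from an insert position of 0 gives new_reg_insert(0, 2), which wraps to position 4, in agreement with the table.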
Table 5-3 Modulo 6 Subtraction Logic

                              REG_INSERT[02:00]
CORRECTION_H[02:00]    0    1    2    3    4    5    6    7
        0             000  001  010  011  100  101  XXX  XXX
        1             101  000  001  010  011  100  XXX  XXX
        2             100  101  000  001  010  011  XXX  XXX
        3             011  100  101  000  001  010  XXX  XXX
        4             010  011  100  101  000  001  XXX  XXX
        5             001  010  011  100  101  000  XXX  XXX
        6             000  001  010  011  100  101  XXX  XXX
        7             XXX  XXX  XXX  XXX  XXX  XXX  XXX  XXX

Legend
0-7 = Input count for CORRECTION_H[02:00] or REG_INSERT[02:00]
000-101 = NEW_REG_INSERT_H[02:00] binary output
XXX = Not possible

NEW_REG_INSERT_H[02:00] is decoded with REGISTER_SIZE_H[02:00] to produce the register valid field. Table 5-4 provides the output of the decode logic for the register valid fields.

Table 5-4 Register Valid Fields

NEW_REG_INSERT_H                    REGISTER_SIZE_H[02:00]
[02:00]             000     001     010     011     100     101     110
000               000000  100000  110000  111000  111100  111110  111111
001               000000  000001  100001  110001  111001  111101  111111
010               000000  000010  000011  100011  110011  111011  111111
011               000000  000100  000110  000111  100111  110111  111111
100               000000  001000  001100  001110  001111  101111  111111
101               000000  010000  011000  011100  011110  011111  111111

Legend
000-101 = Input count for REGISTER_SIZE_H[02:00] or NEW_REG_INSERT_H[02:00]
000000-111111 = REG5 valid through REG0 valid output, where a 1 indicates valid and a 0 indicates invalid
For example, 100000 = REG5 valid; 000011 = REG1 valid and REG0 valid

Using Table 5-4, if NEW_REGISTER_INSERT_H[02:00] were 001 and REGISTER_SIZE_H[02:00] were 010, REGISTER_VALID_H would equal 100001. This value represents REG5 and REG0 as valid.

5.5.2 Read/Write Mask Parity

REG_MASK0[30:00] through REG_MASK5_H[30:00] are parity checked at their outputs in the OCTL MCA. An error detected within one of the masks asserts REG_MASK_PERR_0 through REG_MASK_PERR_5, with 0 through 5 corresponding to the register mask. REG_MASK_PERRx asserts OCTL_ERROR, which is passed to the OSQA MCA.
OSQA, on receipt of this error, asserts SPECIFIER_ERROR_H and forwards it to the EBox.

5.5.3 Flushes

The OCTL unit receives EBOX_FLUSH_H[02:00], decodes the field, and distributes flush signals to the appropriate functional units. Figure 5-26 shows the flush logic in the OCTL unit.

The EBox redirects the flow of the IBox with a 3-bit flush code. Table 5-5 provides the breakdown of these flush fields.

Figure 5-26 OCTL Flush Logic

Table 5-5 EBox Flush Codes

Code  Name         Action
010   VBox mode    Directs the IBox to enter VBox mode.
011   Unsuspend    Directs the XBAR to resume decoding specifiers.
100   IBUF flush   Flushes the instruction buffer.
110   VIC flush    Directs the PCU to flush the VIC. If, during the 256 cycles
                   that it takes the VIC to complete the flush, a subsequent
                   flush is received, the IBox stalls.
111   FPD flush    First part done (FPD) flush is used by the EBox to stall the
                   execution of lengthy instructions. The instruction, under
                   EBox microcode control, is stalled during execution to
                   service interrupts or exceptions. The XBAR receives this
                   signal.
1xx   PC flush     Each flush directed to the IBox functional units is also
                   copied to the PCU. This copy of the flush redirects the PCU
                   to a new prefetch PC.

5.5.4 OCTL Stalls

Two stalls are related to the read/write mask logic: scoreboard stall and mask stall. A scoreboard stall (SCOREBOARD_STALL_H) occurs when instructions containing GPR conflicts are being processed. Scoreboard stalls inhibit updating the EBox GPRs until the instruction containing the conflicts is completely processed. Mask stall (OCTL_MASK_STALL_H) is asserted when the IBox is ahead of the EBox by six instructions. The stall inhibits the IBox from processing more I-stream.

5.5.4.1 Scoreboard Stall

OSQA, in controlling the EBox interface, detects this stall and informs the mask logic. The stall directs the mask to be submitted again for a write to the EBox.

5.5.4.2 Mask Stall

Figure 5-27 is a detailed block diagram describing the mask stall logic. On power-up, the SPU loads a value of six (MASK_COUNT_H[02:00]) into one comparator and MASK_COUNT_H[02:00] minus one into a second comparator.

Each time a new mask is loaded, REG_SIZE_H[02:00] is compared to MASK_COUNT_H[02:00]. The two comparators used in this process output MASK_FULL_NEXT_CYCLE_H when the count reaches five and MASK_CURRENTLY_FULL_H when the count reaches six. OCTL_MASK_STALL_H is asserted to inhibit passing read and write masks to the EBox when the IBox is six instructions ahead of the EBox.

The value that is passed to the mask logic (MASK_COUNT_H[02:00]) is programmable. This value can be set at the console from one to six. Setting the value to one allows the IBox to get one instruction ahead of the EBox.

Figure 5-27 OCTL Mask Stall Logic

5.6 OPU Port Interface

The OPU port interface is a read/write interface to the MBox. The interface is 32 bits wide and is byte parity protected.
The interface is used for four reasons:

• Prefetching operands, from memory, for the EBox
• Queuing destination operand addresses for data from the EBox
• Fetching operands that are indirectly addressed
• Passing VBox requests to the MBox

Figure 5-28 lists the signals that comprise the OPU port interface and Table 5-6 describes them.

Signal (IBox to MBox)                      IBox MCA
IBOX_OP_REQUEST_H                          OSQA
IBOX_OP_ADDRESS_H[31:16, 15:00]            OPUB, A
IBOX_OP_ADDRESS_PARITY_H[03:02, 01:00]     OPUB, A
IBOX_OP_CONTEXT_H[02:00]                   OSQA
IBOX_OP_CONTROL_H[02:00]                   OSQA
IBOX_OP_CONTROL_PARITY_H                   OSQA
IBOX_OP_INDIRECT_H[01:00]                  OSQA
IBOX_OP_TAG_H[03:00]                       OSQA
IBOX_ABORT_H, L                            PCVC
IBOX_FLUSH_ABORT_H, L                      OCTL

Signal (MBox to IBox)                      IBox MCA
MBOX_OP_DATA_H[31:16, 15:00]               OPUB, A
MBOX_OP_DATA_PARITY_H[03:02, 01:00]        OPUB, A
MBOX_OP_GRANT_H                            OSQA
MBOX_OP_RESPONSE_H                         OSQA, OPUA, B

Figure 5-28 OPU Port Interface

Table 5-6 OPU Port Interface Signals

IBOX_OP_REQUEST_H
  When asserted, indicates that an OPU port request is being made and that all other OPU port fields are valid.

IBOX_OP_ADDRESS_H[31:00]
  These lines are the 32-bit address sent to the MBox.

IBOX_OP_ADDRESS_PARITY_H[03:00]
  Byte parity for the OPU address.

IBOX_OP_CONTEXT_H[02:00]
  Defines the reference size of the OPU port request.

  Field  Reference Size
  000    Longword
  001    Byte
  010    Word
  011    Unused
  100    Quadword
  101    Octaword
  110    Unused
  111    Block (16 quadwords)

IBOX_OP_CONTROL_H[02:00]
  Provides the reference type of the transaction.

  Field  Reference Type
  000    Read (lock status)
  001    Read with write check (lock status)
  010    Read with write check (no conflict check)
  011    Write check (lock status)
  100    Read (don't lock status)
  101    Read with write check (don't lock status)
  110    Unused
  111    Unused

IBOX_OP_INDIRECT_H[01:00]
  Indicates the final destination of the MBox reference.

  Field  Destination
  00     Nonwrite reference for the EBox
  01     Write reference for the EBox
  10     Indirect OPU read for a nonwrite specifier
  11     Indirect OPU read for a write specifier

IBOX_OP_TAG_H[03:00]
  Provides the address in the source list to which the returning MBox data should be written.

IBOX_OP_CONTROL_PARITY_H
  Odd parity for IBOX_OP_CONTEXT_H[02:00], IBOX_OP_CONTROL_H[02:00], IBOX_OP_INDIRECT_H[01:00], and IBOX_OP_TAG_H[03:00].

5.7 IBox-to-EBox Interface

The IBox interface to the EBox is used to send the CSU and SLU operand data, control, and error information to the EBox. Figure 5-29 lists the signals that comprise the IBox-to-EBox interface and Table 5-7 describes them.

Signals (IBox to EBox):
IBOX_DATA_H[31:16, 15:00]
IBOX_DATA_PARITY_H[03:02, 01:00]
IBOX_DATA_TAG_H, L[03:00]
IBOX_DST_DATA_TAG_H[03:00]
IBOX_DATA_VALID_H, L
IBOX_OSQA_ISSA_PARITY_H
IBOX_SOURCE1_POINTER_H, L[04:00]
IBOX_SOURCE1_VALID_H, L
IBOX_SOURCE2_POINTER_H, L[04:00]
IBOX_SOURCE2_VALID_H, L
IBOX_OSQB_SRCS_PARITY_H
IBOX_DESTINATION_MEMORY_H
IBOX_DESTINATION_POINTER_H[03:00]
IBOX_DESTINATION_VALID_H
IBOX_OSQB_ISSB_PARITY_H
IBOX_OSQB_QPTR_PARITY_H
IBOX_FREE_POINTER_H[03:00]
IBOX_FORK_H[08, 07:04, 03:00]
IBOX_FORK_ADDRESS_PARITY_H
IBOX_FORK_VALID_H
IBOX_GPR_H[03:00]
IBOX_DST_GPR_H[03:00]
IBOX_GPR_WRITE_H
IBOX_REGISTER_FORK_H
IBOX_RLOG_CONTEXT_H[03:00]
IBOX_RLOG_TAG_H[01:00]
IBOX_RLOG_WRITE_H
IBOX_RLOG_COMPLETE_H
IBOX_OSQA_RLOG_PARITY_H
IBOX_INSTRUCTION_DECODED_H
IBOX_CORRECTION_H
IBOX_PREDICTION_H
IBOX_PC_H[31:24, 23:13, 12:08, 07:00]
IBOX_PC_PARITY_H[03, 02, 01, 00]
IBOX_PC_VALID_H
IBOX_FORK_ERROR_H
IBOX_POINTER_ERROR_H
IBOX_DATA_ERROR_H
IBOX_IB_PAGE_FAULT_H
IBOX_RAF_H

Figure 5-29 IBox-to-EBox Interface
Table 5-7 IBox-to-EBox Interface Signals

IBOX_DATA_H[31:00]
  This bus delivers the short literal or immediate data to the EBox source list. GPR updates, due to autoincrement and autodecrement specifiers, are also delivered to the EBox GPRs with this bus.

IBOX_DATA_PARITY_H[03:00]
  Byte parity for IBOX_DATA_H[31:00].

IBOX_DATA_TAG_H, L[03:00]
  Address in the source list that the IBOX_DATA is to be written to.

IBOX_DATA_VALID_H
  When asserted, indicates that there is short literal or immediate data for the source list.

IBOX_DATA_ERROR_H
  Asserted to inform the EBox that a specifier error has been detected.

IBOX_OSQA_ISSA_PARITY_H
  Odd parity for IBOX_DATA_TAG_H[03:00] and IBOX_DATA_VALID.

IBOX_SOURCE1_POINTER_H, L[04:00]
  Provides the EBox source queue with the address, or the GPR, of the source 1 operand. If the operand is SLU, CSU, or MBox data, the field provides the source list address for the operand.

IBOX_SOURCE1_VALID_H, L
  Validates the source 1 pointer.

IBOX_SOURCE2_POINTER_H, L[04:00]
  The same as the source 1 pointer, except it is the pointer for the source 2 operand.

IBOX_SOURCE2_VALID_H, L
  Valid bit for the source 2 pointer.

IBOX_OSQB_SRCS_PARITY_H
  Parity bit for source 1 and 2 pointers and valid bits.

IBOX_DESTINATION_MEMORY_H
  Asserted to inform the EBox that the destination operand is memory (not register).

IBOX_DESTINATION_POINTER_H[03:00]
  Provides the destination operand address. The address contains a GPR reference or write queue address in the MBox.

IBOX_DESTINATION_VALID_H
  When asserted, validates all destination signals.

IBOX_OSQB_ISSB_PARITY_H
  Parity bit for destination pointers and valid bit.

IBOX_OSQB_QPTR_PARITY_H
  Parity bit for source 1, source 2, and destination valid bits and pointers.

IBOX_FREE_POINTER_H[03:00]
  Provides the current free pointer value in the source list.

IBOX_FORK_H[08:00]
  Copy of the opcode.

IBOX_FORK_PARITY_H
  Parity bit for IBOX_FORK_H[08:00].

IBOX_FORK_VALID_H
  Asserted to indicate that IBOX_FORK_H[08:00] is valid.

IBOX_GPR_H[03:00]
  The address of the EBox GPR that is to be written. This copy of the address is used by the RLOG.

IBOX_DST_GPR_H[03:00]
  Contains the address of the EBox GPR that is to be written. This copy of the address is for the STREG.

IBOX_WRITE_H
  Valid bit for IBOX_GPR_H and IBOX_DATA_H.

IBOX_REGISTER_FORK_H
  Asserted to inform the EBox that a USRC specifier is being decoded.

IBOX_RLOG_CONTEXT_H[03:00]
  Indicates the context for an RLOG request. The field supplies size and direction of changes.

  Field  Change
  0000   No change to GPR
  0001   GPR incremented by one (byte)
  0010   GPR incremented by two (word)
  0011   GPR incremented by four (longword)
  0100   GPR incremented by eight (quadword)
  0101   GPR incremented by sixteen (octaword)
  0110   Unused
  0111   Unused
  1000   Unused
  1001   GPR decremented by one (byte)
  1010   GPR decremented by two (word)
  1011   GPR decremented by four (longword)
  1100   GPR decremented by eight (quadword)
  1101   GPR decremented by sixteen (octaword)
  1110   Unused
  1111   Unused

  No change is used when the IBox detects an IRC. This records the fact that the GPR was modified, but only in the IBox.

IBOX_RLOG_TAG_H[01:00]
  Indicates the instruction to which the RLOG information pertains. The field is matched with 3-bit counters that indicate which instruction is being executed in the EBox and IBox.

IBOX_RLOG_WRITE_H
  Asserted to indicate that RLOG information is to be written and to validate all other RLOG-related signals.

IBOX_RLOG_COMPLETE_H
  Asserted to inform the EBox that the CSU has stopped evaluating specifiers.

IBOX_OSQA_RLOG_PARITY_H
  Parity bit for the RLOG-related signals.

IBOX_INSTRUCTION_DECODED_H
  Informs the EBox that the XBAR has completed decoding an instruction.

IBOX_CORRECTION_H
  Asserted by the PCU on a bad branch prediction. Informs the EBox that an unwind is not necessary, as the branch has not yet been shifted out of the instruction buffer.

IBOX_PREDICTION_H
  Asserted by the PCU to inform the EBox that the branch under decode is predicted taken.

IBOX_PC_H[31:00]
  These lines deliver a copy of the decode PC to the EBox.

IBOX_PC_PARITY_H[03:00]
  Byte parity for the EBox copy of the decode PC.

IBOX_PC_VALID_H

IBOX_FORK_ERROR_H
  Asserted to inform the EBox that a fork error has been detected.

IBOX_POINTER_ERROR_H
  Asserted to inform the EBox of a detected pointer error.

IBOX_IB_PAGE_FAULT_H
  Asserted to inform the EBox that the data being decoded page faulted in the MBox. The signal is not sent unless the data is accessed.

IBOX_RAF_H
  The XBAR informs the EBox of an RAF by asserting this line.

5.8 EBox-to-IBox Interface

The EBox interface to the IBox is used to write result data to GPRs and to provide a variety of control functions for the IBox. The control signals can include starting or flushing PCs, RLOG unwinds, interrupts, and branch prediction status signals. Figure 5-30 lists the signals that comprise the EBox-to-IBox interface and Table 5-8 describes them.
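As an illustration of the GPR write path on this interface, the following Python sketch merges EBox result data into an IBox copy of a GPR under the three write enables that Table 5-8 describes (EBOX_GPR_BYTE0_WRITE_H, EBOX_GPR_BYTE1_WRITE_H, and EBOX_GPR_WORD1_WRITE_H). The function name merge_gpr_write is hypothetical, and the lane assignment (byte 0 = bits 07:00, byte 1 = bits 15:08, word 1 = bits 31:16) is an assumption drawn from the signal names, not a statement of the STREG implementation.

```python
def merge_gpr_write(old: int, result: int,
                    byte0_write: bool, byte1_write: bool,
                    word1_write: bool) -> int:
    """Merge 32-bit EBOX_RESULT data into a GPR copy, lane by lane.

    Assumed lanes: byte 0 = bits 07:00, byte 1 = bits 15:08,
    word 1 (high-order word) = bits 31:16.
    """
    lane_mask = 0
    if byte0_write:
        lane_mask |= 0x000000FF
    if byte1_write:
        lane_mask |= 0x0000FF00
    if word1_write:
        lane_mask |= 0xFFFF0000
    keep_mask = ~lane_mask & 0xFFFFFFFF
    # Enabled lanes take the result data; all other lanes keep the old value.
    return (old & keep_mask) | (result & lane_mask)
```

Under these assumptions, a byte write asserts only the byte 0 enable, a word write asserts both byte enables, and a longword write asserts all three.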
Signals (EBox to IBox):
EBOX_BRANCH_A_H&L, B_H
EBOX_BRANCH_VALID_A_H&L, B_H
EBOX_FLUSH_H[02:00]
EBOX_GPR_BYTE0_WRITE_H
EBOX_GPR_BYTE1_WRITE_H
EBOX_GPR_H[03:00]
EBOX_GPR_WORD1_WRITE_H
EBOX_KEEP_MASKS_H, L[01:00]
EBOX_INSTRUCTION_DONE_H
EBOX_INTERRUPT_H
EBOX_LAST_POINTER_H[03:00]
EBOX_QUEUE_FULL_H
EBOX_RESULT_H, L[31:00]
EBOX_RESULT_PARITY_H, L[03:00]
EBOX_RLOG_FULL_H
EBOX_RLOG_POINTER_H[01:00]
EBOX_UNSUSPEND_H

Figure 5-30 EBox-to-IBox Interface

Table 5-8 EBox-to-IBox Interface Signals

EBOX_BRANCH_A,B_H
  Asserted in the PCU and OSQA when a bad branch prediction is detected.

EBOX_BRANCH_VALID_H
  Asserted to inform the IBox that a conditional branch has been retired. The CSU decrements the branch count if the branch was predicted correctly.

EBOX_FLUSH_H[02:00]
  This field passes EBox flush signals to the IBox.

EBOX_GPR_BYTE0_WRITE_H
  Asserted to indicate that the EBox wishes to write byte 0 of the GPR pointed to by EBOX_GPR_H[03:00].

EBOX_GPR_BYTE1_WRITE_H
  Asserted to indicate that the EBox wishes to write byte 1 of the GPR pointed to by EBOX_GPR_H[03:00].

EBOX_GPR_H[03:00]
  These lines indicate which GPR the EBox will write to.

EBOX_GPR_WORD1_WRITE_H
  Asserted to indicate that the EBox wishes to write the high-order word of the GPR that EBOX_GPR_H[03:00] is addressing.

EBOX_KEEP_MASK_H[01:00]
  This field informs the IBox of how many register masks to keep on a bad branch prediction. The field reflects the number of instructions still in progress in the EBox.

EBOX_INSTRUCTION_DONE_H
  Asserted when an instruction is complete in the EBox. Directs the register mask logic to delete the oldest mask.

EBOX_INTERRUPT_H
  Asserted when the EBox is taking an exception or an interrupt and directs the IBox to stop processing specifiers.

EBOX_LAST_POINTER_H[03:00]
  This field provides a pointer to the last used location in the source list.

EBOX_QUEUE_FULL_H
  Asserted when the EBox queues are full (except RLOG).

EBOX_RESULT_H[31:00]
  These lines deliver the result data to be written to the IBox GPRs and are also the path used to supply a new PC to the IBox.

EBOX_RESULT_PARITY_H[03:00]
  Byte parity for the EBox result data.

EBOX_RLOG_FULL_H
  Asserted to inform the IBox that the EBox RLOG queue is full.

EBOX_RLOG_POINTER_H[02:00]
  This field is sent to XDTB and describes the current RLOG entry.

EBOX_UNSUSPEND_H
  This signal is sent to XSCA to negate XBAR_SUSPEND_H.

5.9 VBox Interface

The VBox interfaces to the MBox through the EBox and the IBox. The MBox prefetches the operand requested and the VBox, through the IBox STREGs and the OPU port, requests the operands. The operand data is returned to the EBox source list. Figure 5-31 lists the signals that comprise the VBox interface and Table 5-9 describes them.

Signal (VBox to IBox)                      IBox MCA
VBOX_ADDRESS_H[31:16, 15:00]               STG3, 2
VBOX_ADDRESS_PARITY_H[03:02, 01:00]        STG3, 2
VBOX_ADDRESS_VALID_H                       OSQA
VBOX_REFERENCE_SIZE_H                      OSQA
VBOX_REFERENCE_TYPE_H                      OSQA
VBOX_BLOCK_READ_H                          OSQA
VBOX_READ_NOP_H                            OSQA

Figure 5-31 VBox Interface

Table 5-9 VBox Interface Signals

VBOX_ADDRESS_H[31:02]
  Address lines of the request the VBox is making to the MBox. Bits 0 and 1 are always cleared.

VBOX_ADDRESS_PARITY_H[03:00]
  Byte parity for the VBox address.

VBOX_ADDRESS_VALID_H
  Asserted to indicate that the VBox is making an MBox request across the OPU port. The signal validates all other VBox signals received by the IBox.

VBOX_BLOCK_READ_H
  Asserted when the VBox request is for a block of data (16 longwords). When asserted, it is assumed that a read request is being made.

VBOX_REFERENCE_SIZE_H
  When asserted, indicates that the reference size of the VBox request is a quadword. When negated, the reference is a longword.

VBOX_REFERENCE_TYPE_H
  When asserted, indicates that the VBox reference type is write. When negated, the reference type is read.

6 IBox Error Descriptions

This chapter describes the IBox error registers. Six registers report errors. The tables in this chapter contain a description of each error, including the register bit, the error mnemonic, the error signal that asserts the error bit, and a description of the failing data or control signals that cause the error.

6.1 IBox Error Registers

The six error registers that report IBox errors are as follows:

• IBOX_FETCH_ERROR_REG1_H[31:00]
• IBOX_FETCH_ERROR_REG2_H[31:00]
• IBOX_DECODE_ERROR_REG1_H[31:00]
• IBOX_XBR_DECODE_ERROR_REGISTER_H[31:00]
• IBOX_SPECIFIER_REG1_H[31:00]
• IBOX_SPECIFIER_REG2_H[31:00]

Each register corresponds to one of the three IBox pipeline stages: fetch, decode, or specifier. Data and control flow mainly from the fetch stage to the decode stage, then to the specifier stage in the IBox. Because of this relationship, if an error is detected in a previous stage, only that error is reported.

Once an error is detected in the IBox, the state elements in the IBox are held until the console can scan in new values and restart the IBox.

6.2 Fetch Error Register 1

Two 32-bit registers report errors occurring in the fetch stage of the IBox pipeline. The errors reported are those detected in the following MCAs:

• PCBP
• PCVC
• PCLO
• PCHI
• IBFA
• IBFB

Figure 6-1 shows the bit field breakdown of fetch error register 1. Table 6-1 describes each error that this register reports.
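As a rough illustration of how diagnostic software might apply the bit assignments of Table 6-1, the Python sketch below splits a 32-bit fetch error register 1 value into named fields. The names FETCH_ERROR_REG1_FIELDS and decode_fetch_error_reg1 are hypothetical, the field labels are the Table 6-1 error names prefixed with the reporting MCA, and nothing here describes how the console actually reads or presents the register.

```python
# (high bit, low bit, label) transcribed from Table 6-1.
FETCH_ERROR_REG1_FIELDS = [
    (0, 0, "PCBP DEC PC"), (1, 1, "PCBP PRE PC"),
    (3, 2, "PCBP BP TAG"), (4, 4, "PCBP BP LENGTH"),
    (5, 5, "PCBP BP PREDICT"), (6, 6, "PCBP PCLO DEC PC"),
    (7, 7, "PCBP PCHI DEC PC"), (8, 8, "PCVC DEC PC"),
    (9, 9, "PCVC PRE PC"), (10, 10, "PCVC BP TAG"),
    (11, 11, "PCVC BP PRED"), (12, 12, "PCVC PCBP DEC PC"),
    (13, 13, "PCLO DEC PC"), (14, 14, "PCLO PRE PC"),
    (15, 15, "PCLO PREDICT PC [15:13]"), (16, 16, "PCLO BP PREDICT"),
    (17, 17, "PCLO EBOX RESULT"), (18, 18, "PCLO PCBP DEC PC"),
    (19, 19, "PCHI DEC PC"), (20, 20, "PCHI PRE PC"),
    (22, 21, "PCHI BP DISP"), (23, 23, "PCHI BP PREDICT"),
    (24, 24, "PCHI PCBP DEC PC"), (25, 25, "PCHI XDTA DISP"),
    (26, 26, "PCHI XDTB DISP"), (29, 27, "IBFA PRE PC"),
    (31, 30, "IBFA VIC TAG"),
]


def decode_fetch_error_reg1(value: int) -> list:
    """Return (label, field value) pairs for every nonzero field."""
    hits = []
    for high, low, label in FETCH_ERROR_REG1_FIELDS:
        width = high - low + 1
        field = (value >> low) & ((1 << width) - 1)
        if field:
            hits.append((label, field))
    return hits
```

Fields are reported from bit 00 upward, so a register value with several bits set yields the PCBP errors first and the IBFA errors last.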
[Bit-field diagram: bits 31:27 IBFA parity errors, 26:19 PCHI parity errors, 18:13 PCLO parity errors, 12:08 PCVC parity errors, 07:00 PCBP parity errors]

Figure 6-1 Fetch Error Register 1

Table 6-1 Fetch Error Register 1

Bits   Error Name and Description

00     DEC PC (PCBP_DECODE_PC_ERROR_H)
       Occurs when a parity error is detected in data being latched in DECODE_PC_TA_H[07:00] in PCBP.

01     PRE PC (PCBP_PREFETCH_PC_ERROR_H)
       Occurs when a parity error is detected in the data being latched in PREFETCH_PC_H[07:00] in PCBP.

03:02  BP TAG (PCBP_BP_PRED_TAG_ERROR_H[03:02])
       Occurs when a parity error is detected in bits [31:16] of the branch prediction cache tag field. Parity checking is performed in PCBP.

04     BP LENGTH (PCBP_BP_INSTR_LENGTH_ERROR_H)
       Occurs when a parity error is detected in the instruction length field stored in the BP STRAMs on the VIC MCU. Parity checking is performed after the field is passed to PCBP.

05     BP PREDICT (PCBP_BP_PREDICTION_ERROR_H)
       Occurs when the two prediction bits from the BP STRAMs are not the same. PCBP receives these bits from the VIC MCU.

06     PCLO DEC PC (PCBP_PCLO_DECODE_PC_ERROR_H)
       Occurs when a parity error is detected in PCLO_DECODE_PC_TB_H[23:16]. PCBP receives this field from PCLO and performs the parity check.

07     PCHI DEC PC (PCBP_PCHI_DECODE_PC_ERROR_H)
       Occurs when a parity error is detected in PCHI_DECODE_PC_TB_H[31:24]. PCBP receives this field from PCHI and performs the parity check.

08     DEC PC (PCVC_DECODE_PC_ERROR_H)
       Occurs when a parity error is detected in the data being latched in DECODE_PC_TA_H[15:08].

09     PRE PC (PCVC_PREFETCH_PC_ERROR_H)
       Occurs when a parity error is detected in the data being latched in PREFETCH_PC_H[15:08] in PCVC.

10     BP TAG (PCVC_BP_PRED_TAG_ERROR_H)
       Occurs when a parity error is detected on bits [15:10] of the branch prediction tag. The parity check is performed when the STRAM data is passed to PCVC.

11     BP PRED (PCVC_BP_PREDICTION_ERROR_H)
       Occurs when the BP prediction bits sent to PCBP are not the same. The bits are stored in the BP STRAMs of the VIC MCU.

12     PCBP DEC PC (PCVC_PCBP_DECODE_PC_ERROR_H)
       Occurs when a parity error is detected on data being latched in PCBP_DECODE_PC_TA_H[05:00].

13     DEC PC (PCLO_DECODE_PC_ERROR_H)
       Occurs when the data being latched in DECODE_PC_TA_H[23:16] causes a parity error in PCLO.

14     PRE PC (PCLO_PREFETCH_PC_ERROR_H)
       Occurs when the data being latched in PREFETCH_PC_H[23:16] asserts a parity error in PCLO.

15     PREDICT PC [15:13] (PCLO_PRED_PC_15_13_ERROR_H)
       Occurs when BPPC_PREDICTION_PC_B_H[15:13] from BPST asserts a parity error in PCLO.

16     BP PREDICT (PCLO_BP_PREDICTION_ERROR_H)
       Occurs when the two BP prediction bits sent to PCBP are not the same. The bits are stored in the BP STRAMs in the VIC MCU.

17     EBOX RESULT (PCLO_EBOX_RESULT_ERROR_H[01])
       Occurs when EBOX_RESULT_H[15:08] from the EBox asserts a parity error in PCLO.

18     PCBP DEC PC (PCLO_PCBP_DECODE_PC_ERROR_H)
       Occurs when PCBP_DECODE_PC_TA_H[05:00] or PCBP_DECODE_PC_PARITY_TA_H asserts a parity error in PCBP.

19     DEC PC (PCHI_DECODE_PC_ERROR_H)
       Occurs when the data being latched in DECODE_PC_TA_H[31:24] causes a parity error in PCHI.

20     PRE PC (PCHI_PREFETCH_PC_ERROR_H)
       Occurs when the data being latched in PREFETCH_PC_H[31:24] asserts a parity error in PCHI.

22:21  BP DISP (PCHI_BP_TAG_DISP_ERROR_H[01:00])
       Occurs when BPTD_TAG_DISPLACEMENT_H[15:00] asserts a parity error in PCHI.

23     BP PREDICT (PCHI_BP_PREDICTION_ERROR_H)
       If this error occurs, the two prediction bits sent to PCBP from the BP STRAMs are not the same value.

24     PCBP DEC PC (PCHI_PCBP_DECODE_PC_ERROR_H)
       Occurs if a parity error is detected when checking PCBP_DECODE_PC_TA_H[05:00] on PCBP.

25     XDTA DISP (PCHI_XDTA_DISP_ERROR_H)
       Asserted in PCHI when a parity error is detected in XDTA_DISPLACEMENT_L[11:08, 03:00].

26     XDTB DISP (PCHI_XDTB_DISP_ERROR_H)
       Asserted in PCHI when a parity error is detected when checking XDTB_DISPLACEMENT_L[15:12, 07:04] and XDTB_DISPLACEMENT_PARITY_H[01].

29:27  PRE PC (IBFA_PREF_PC_ERROR_H[03:01])
       Asserted in IBFA when a parity error is detected in the following signals: PCBP_PREFETCH_PC_H[04:00] from PCBP to IBFA, PCHI_PREFETCH_PC_H[31:24] from PCHI to IBFA, and PCLO_PREFETCH_PC_H[23:16] from PCLO to IBFA.

31:30  VIC TAG (IBFA_VICT_TAG_ERROR_H[02])
       Bit 30 of this error is asserted when VICT_TAG_H[23:13] asserts a parity error. Bit 31 is set when VICT_TAG_H[31:24] asserts a parity error. Both VIC tag fields are checked in IBFA when they are passed from the VICT STRAMs to the XBR MCU.

6.3 Fetch Error Register 2

Fetch error register 2 is 11 bits wide and reports errors occurring in IBFA and IBFB. Figure 6-2 shows the bit field breakdown of this register and Table 6-2 describes each error that this register reports.
[Bit-field diagram: bits 31:11 not used, 10:03 IBEX ERROR [07:00], 02 VICB BLOCK VAL, 01 VICA BLOCK VAL, 00 VIC QW VAL]

Figure 6-2 Fetch Error Register 2

Table 6-2 Fetch Error Register 2

Bits   Error Name and Description

00     VIC QW VAL (IBFA_VIC_QUADWORD_ERROR_H)
       Occurs when a parity check of VICQ_QUADWORD_VALIDS_H[03:00] fails.

01     VICA BLOCK VAL (IBFA_VICA_BLOCK_VALID_ERROR_H)
       Occurs when VICA_BLOCK_VALID_H[01:00] fails the parity check performed after it is passed from VICT to IBFA on the XBR MCU.

02     VICB BLOCK VAL (IBFA_VICB_BLOCK_VALID_ERROR_H)
       Occurs when VICB_BLOCK_VALID_H[01:00] fails parity testing after being passed from VICT to IBFA on the XBR MCU.

10:03  IBEX ERROR (A_IBFB_IBEX_ERROR_H[07:00])
       Occurs when IBEX_DATA_H[63:00] fails a parity test that is performed as the data is passed to IBUF. The failing data is held partly in IBFA, partly in IBFB. Each byte of IBEX data has one parity bit. Each byte of IBEX data is split into nibbles. The low nibbles are stored in IBFA and the high nibbles are stored in IBFB.

6.4 Decode Error Register 1

Decode errors are reported in two registers. Decode error register 1 reports errors that occur on instruction buffer data that is passed to XBAR and on control signals that are passed from XBAR to the PCU (shift counts, shift opcode). Figure 6-3 shows the bit field breakdown of this register and Table 6-3 describes each error that this register reports.

[Bit-field diagram: bits 31:15 not used, 14:06 IBUF ERROR [08:00], 05 IBFB SC PE, 04 IBFA DEC PE, 03 PCHI DEC PE, 02 PCLO DEC PE, 01 PCVC DEC PE, 00 PCBP DEC PE]

Figure 6-3 Decode Error Register 1

Table 6-3 Decode Error Register 1

Bits   Error Name and Description

00     PCBP DEC PE (A_PCBP_DECODE_ERROR_H)
       Occurs when an error is detected in XSCA_SHIFTCOUNT_C_L[03:00], XSCA_SHIFTOPCODE_C_L, or XSCA_PCBP_B_PARITY_H. Parity checking is performed on PCBP as these signals are passed from XBAR.

01     PCVC DEC PE (A_PCVC_DECODE_ERROR_H)
       Occurs when an error is detected in A_XBAR_SHIFTCOUNT_H[03:00], A_XBAR_SHIFTOPCODE_H, and A_XSCA_PCVC_B_PARITY_H as they are passed to PCVC.

02     PCLO DEC PE (A_PCLO_DECODE_ERROR_H)
       Occurs when an error is detected in XSCA_SHIFTCOUNT_C_H[03:00], XSCA_SHIFTOPCODE_C_H, and XSCA_PCLO_B_PARITY_H as they are passed to PCLO.

03     PCHI DEC PE (A_PCHI_DECODE_ERROR_H)
       Occurs when an error is detected in XSCA_SHIFTCOUNT_B_H[03:00], XSCA_SHIFTOPCODE_B_H, XSCA_DISPLACEMENT_VALID_H, and XSCA_PCHI_B_PARITY_H when they are passed to PCHI.

04     IBFA DEC PE (A_IBFA_DECODE_ERROR_H)
       Occurs when an error is detected in XSCA_SHIFTCOUNT_A_H[03:00], XSCA_SHIFTOPCODE_A_H, XSCA_FD_SHIFTOPCODE_H, and XSCA_IBFA_B_PARITY_H as they are passed to IBFA.

05     IBFB DEC PE (A_IBFB_SC_ERROR_H)
       Occurs when an error is detected in XSCA_SHIFTCOUNT_A_L[03:00], XSCA_SHIFTOPCODE_A_L, XSCA_FD_SHIFTOPCODE_L, XSCA_EXTENDED_L, and XSCA_IBFB_B_PARITY_H as they are passed to IBFB.

14:06  IBUF ERROR (A_IBFB_IBUF_ERROR_H[08:00])
       Covers the IBFA and IBFB data paths for IBUF data to XBAR. Detection of a parity error in any of the nine bytes of IBUF data as it is passed to XBAR asserts the respective bit in this signal. (For example, when a parity error is detected in byte 0 of the IBUF data, bit 0 of A_IBFB_IBUF_ERROR_H[08:00] is asserted.)
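The byte-per-bit relationship of the IBUF ERROR field can be illustrated with a short Python sketch that lists the failing IBUF bytes for a given decode error register 1 value. The function name failing_ibuf_bytes is hypothetical, and the sketch assumes that A_IBFB_IBUF_ERROR_H[08:00] maps in order onto register bits 14:06, so that register bit 06 corresponds to byte 0 and bit 14 to byte 8.

```python
IBUF_ERROR_LOW_BIT = 6   # register bit that holds A_IBFB_IBUF_ERROR_H[00] (assumed)
IBUF_BYTES = 9           # nine bytes of IBUF data are passed to XBAR


def failing_ibuf_bytes(decode_error_reg1: int) -> list:
    """Return the IBUF byte numbers flagged in bits 14:06 of the register."""
    field = (decode_error_reg1 >> IBUF_ERROR_LOW_BIT) & ((1 << IBUF_BYTES) - 1)
    return [byte for byte in range(IBUF_BYTES) if field & (1 << byte)]
```

Bits 05:00 of the register report the shift count and shift opcode errors and are ignored by this function.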
6.5 XBAR Decode Error Register

The XBAR decode error register reports decode errors that occur, primarily, in the XBAR. Failing data passing between XSCA, XDTA, and XDTB asserts errors that are reported in this register. Control signals with sources external to XBAR are also reported (for example, OSQA_BRANCH_COUNT_H[01:00] and OSQA_SL_BUSY_STALL_H). Figure 6-4 shows the bit field breakdown of this register and Table 6-4 describes each error that this register reports.

[Bit-field diagram: XSCA parity errors (OSQA PE, XDTA PE, IBFB PE, IBFA PE), XDTB parity errors (XSCA PE, IBFB PE, IBFA PE), and XDTA parity errors (XSCA PE, IBFB PE, IBFA PE); remaining bits not used]

Figure 6-4 XBAR Decode Error Register

Table 6-4 XBAR Decode Error Register

Bits   Error Name and Description

00     IBFA PE (IBFA_XDTA_PARITY_ERROR_H)
       Occurs when an error is detected in IBFA_YREG_F_A_L[03:01], IBFA_VALID_A_L[03:01], or IBFA_DATA_A_L[67:64, 59:56, 51:48, 43:40, 35:32, 27:24, 19:16, 11:08] as they are passed from IBFA to XBAR.

01     IBFB PE (IBFB_XDTA_PARITY_ERROR_H)
       Occurs when an error is detected in IBFB_DATA_A_L[39:36, 31:28, 23:20, 15:12] or IBFB_REGISTER_MODE_L[08:05] as they are passed from IBFB to XBAR.

02     XSCA PE (XSCA_XDTA_PARITY_ERROR_H)
       Occurs when an error is detected on the interconnect between XSCA and XDTA. The signals that can cause this error are as follows:
         XSCA_IMPLIED_MASK_H
         XSCA_IRC_L
         XSCA_REQUEST_H[03:00]
         XSCA_SPECIFIERS_DECODED_H[01:00]
         XSCA_SP1_ACCESS_H[09:00]
         XSCA_SP1_DATATYPE_H[02:00]
         XSCA_SP2_ACCESS_H[06:00]
         XSCA_SP2_DATATYPE_H[02:00]
         XSCA_SP3_ACCESS_H[01:00]
         XSCA_SP3_DATATYPE_H[02:01]
         XSCA_X8F_H

04     IBFA PE (IBFA_XDTB_PARITY_ERROR_H)
       Occurs when an error is detected in IBFA_DATA_A_H[03:00] or IBFA_YREG_F_A_H[04:01] as they are passed from IBFA to XDTB.

05     IBFB PE (IBFB_XDTB_PARITY_ERROR_H)
       Occurs when a parity error is detected in IBFB_DATA_A_H[71:68, 63:60, 55:52, 47:44, 39:36, 31:28, 23:20, 15:12, 07:04] after it is passed to XDTB.

06     XSCA PE (XSCA_XDTB_PARITY_ERROR_H)
       Occurs when an error is detected on the interconnect between XSCA and XDTB. An error detected in one or more of the following signals can assert this error:
         XSCA_REQUEST_H[03:00]
         XSCA_IRC_L
         XSCA_X8F_H
         XSCA_SP1_DATATYPE_H[02:00]
         XSCA_SP2_DATATYPE_H[02:00]
         XSCA_SP1_DATATYPE_H[02, 00]
         XSCA_SPECIFIERS_REMAINING_L[02:01]
         XSCA_SPECIFIERS_DECODED_H[01:00]

08     IBFA PE (IBFA_XSCA_PARITY_ERROR_H)
       Occurs when an error is detected in one or more of the following signals:
         IBFA_DATA_B_H[03:00]
         IBFA_VALID_H[08:00]
         IBFA_YREG_F_B_H[04:01]
         IBFA_IB_PAGE_FAULT_H

09     IBFB PE (IBFB_XSCA_PARITY_ERROR_H)
       Occurs when an error is detected on the interconnect between IBFB and XSCA. The following signals can assert this signal:
         IBFB_DATA_B_H[39:36, 31:28, 23:20, 15:12, 07:04]
         IBFB_REGISTER_MODE_H[08:05]
         IBFB_SL_MODE_H[07:05]
         IBFB_UNCONDITIONAL_B_H

10     XDTA PE (XDTA_XSCA_PARITY_ERROR_H)
       Occurs when an error is detected in XDTA_IRC_MASK_H[08:00] as it is passed from XDTA to XSCA. This error may propagate to a specifier error.

11     OSQA PE (OSQA_XSCA_PARITY_ERROR_H)
       Occurs when OSQA_DECODE_STALL_H, OSQA_BRANCH_COUNT_H[01:00], OSQA_SL_BUSY_STALL_H, or OSQA_IN_SEQUENCE_H asserts a parity error in XSCA. This error may propagate to a specifier error.

6.6 Specifier Error Register 1

Specifier error register 1 reports errors that occur when data is being passed from XBAR to OSQA, OSQB, and OCTL. Errors occurring with the control signals passed to the OPU MCU and the data paths for the FPL and SLU are reported through this register.

Figure 6-5 shows the bit field breakdown of this register and Table 6-5 describes the errors that this register reports.

[Figure 6-5 Specifier Error Register 1: bit field diagram with three groups, OCTL PARITY ERRORS (REG MASK PE [05:00]), OSQB PARITY ERRORS (OPCODE PE, XDTB PE, XDTA PE), and OSQA PARITY ERRORS (MASK PE, XDTB PE, XDTA PE); the remaining bits are NOT USED.]

Table 6-5 Specifier Error Register 1

Bits  Error Name  Description

00  XDTA PE (XDTA_OSQA_PERR_TA_H)
    Asserted by detection of a parity error in XDTA_XREG_H[03:00], XDTA_OSQA_A_PARITY_H, or XDTA_YREG_H[03:00] after they are passed from XDTA to OSQA.

01  XDTB PE (XDTB_OSQA_PERR_TA_H)
    Occurs when a parity error is detected in XDTB_ORDER_H, XDTB_SL_VALID_H, XDTB_OSQA_B_PARITY_H, XDTB_INDEXED_H, XDTB_MODE_H[03:00], or XDTB_RLOG_TAG_H[02:00] in OSQA.

02  MASK PE (OSQA_MASK_PERR_TA_H)
    Occurs when a parity error is detected when checking OCTL_EBOX_READ_MASK_H[14:00], OCTL_EBOX_WRITE_MASK_H[14:00], and OCTL_EBOX_MASK_PARITY_H.

04  XDTA PE (XDTA_OSQB_PERR_TA_H)
    Asserted by a parity error on the interconnect between XDTA and OSQB. An error in one or more of the following signals asserts this error:
        XDTA_SL_H[03:00]
        XDTA_DESTINATION_REG_H[03:00]
        XDTA_SOURCE1_REG_H[03:00]
        XDTA_SOURCE2_REG_H[03:00]
        XDTA_DESTINATION_REG_VALID_H
        XDTA_DESTINATION_VALID_H
        XDTA_SOURCE1_REG_VALID_H
        XDTA_SOURCE1_VALID_H
        XDTA_SOURCE2_REG_VALID_H
        XDTA_SOURCE2_VALID_H
        XDTA_OSQB_B_PARITY_H

05  XDTB PE (XDTB_OSQB_A_PERR_TA_H)
    Asserted by a parity error in XDTB_OPU_SPECIFIERS_COMPLETED_H[02:00], XDTB_RSL_H[02:01], and XDTB_OSQB_A_PARITY_H. Parity checking is performed in OSQB when these signals are passed from XDTB.

06  XDTB PE (XDTB_OSQB_B_PERR_TA_H)
    Asserted by a parity error in XDTB_SL_SPECIFIER_NUMBER_H[02:00], XDTB_ORDER_H, XDTB_SL_VALID_H, XDTB_SL_H[05:04], and XDTB_OSQB_B_PARITY_H.
    Parity checking is performed in OSQB as these signals are passed from XDTB.

07  OPCODE PE (OPCODE_PERR_TA_H)
    Asserted by a parity error in IBFA_OPCODE_A_L[03:00], IBFB_OPCODE_L[07:04], and IBFB_OPCODE_PARITY_H. Parity checking is performed in OSQB when these signals are received from IBFB and IBFA.

08  REG MASK PE [00] (REG_MASK_PERR_TA_H[00])
    Occurs when an error is detected in register read/write mask 0 in OCTL (REG_MASK0_TA_H[30:00]). Each mask (mask 0 through 5) is parity checked in OCTL after being passed from XDTA.

09  REG MASK PE [01] (REG_MASK_PERR_TA_H[01])
    Occurs when an error is detected in register read/write mask 1 in OCTL (REG_MASK1_TA_H[30:00]).

10  REG MASK PE [02] (REG_MASK_PERR_TA_H[02])
    Occurs when an error is detected in register read/write mask 2 in OCTL (REG_MASK2_TA_H[30:00]).

11  REG MASK PE [03] (REG_MASK_PERR_TA_H[03])
    Occurs when an error is detected in register read/write mask 3 in OCTL (REG_MASK3_TA_H[30:00]).

12  REG MASK PE [04] (REG_MASK_PERR_TA_H[04])
    Occurs when an error is detected in register read/write mask 4 in OCTL (REG_MASK4_TA_H[30:00]).

13  REG MASK PE [05] (REG_MASK_PERR_TA_H[05])
    Occurs when an error is detected in register read/write mask 5 in OCTL (REG_MASK5_TA_H[30:00]).

6.7 Specifier Error Register 2

Specifier error register 2 is 26 bits wide and reports errors detected in the OPUA and OPUB MCAs. Figure 6-6 shows the bit field breakdown of this register and Table 6-6 describes each error that this register reports.

[Figure 6-6 Specifier Error Register 2: bit field diagram with two groups, OPUB PARITY ERRORS and OPUA PARITY ERRORS, each containing OSQB SL EXPANSION, CTRL, AMUX PE, BMUX PE, DEC PC, and SPEC I-STREAM fields; the OPUA group also contains DEC DELTA. The remaining bits are NOT USED.]

Table 6-6 Specifier Error Register 2

Bits  Error Name  Description

00  DEC DELTA (SPEC_DECODE_DELTA_PERR_TA_H)
    Occurs when a parity check on XDTB_DECODE_DELTA_H[05:00] and XDTB_DECODE_DELTA_PARITY_H fails.

01  SPEC I-STREAM (SPEC_ISTREAM_LWORD_PERR_TA_H[00])
    Occurs when a parity error is detected when testing XDTA_DISPLACEMENT_H[11:08], XDTA_DISPLACEMENT_H[03:00], and XDTA_DISPLACEMENT_PARITY_H[00].

02  SPEC I-STREAM (SPEC_ISTREAM_LWORD_PERR_TA_H[01])
    Occurs when a parity error is detected when testing XDTB_DISPLACEMENT_H[15:12], XDTB_DISPLACEMENT_H[07:04], and XDTB_DISPLACEMENT_PARITY_H[01].

03  DEC PC [00] (SPECIFIER_DECODE_PC_PERR_TA_H[00])
    Occurs when a parity error is detected when testing PCBP_OPU_DECODE_PC_H[07:00] and PCBP_OPU_DECODE_PC_PARITY_H[00].

04  DEC PC [01] (SPECIFIER_DECODE_PC_PERR_TA_H[01])
    Occurs when a parity error is detected while testing PCVC_OPU_DECODE_PC_H[12:08] and PCVC_OPU_DECODE_PC_PARITY_H.

05  DEC PC [02] (SPECIFIER_DECODE_PC_PERR_TA_H[02])
    Occurs when a parity error is detected while testing PCLO_OPU_DECODE_PC_H[15:13] and PCLO_OPU_DECODE_PC_PARITY_H[01].

06  BMUX PE [00] (BMUX_PERR_TA_H[00])
    Asserted by an error detected in the low BMUX data byte in OPUA (BMUX_H[07:00]).

07  BMUX PE [01] (BMUX_PERR_TA_H[01])
    Asserted by an error in the high BMUX data byte in OPUA (BMUX_H[15:08]).

08  AMUX PE [00] (AMUX_PERR_TA_H[00])
    Asserted by an error in the low AMUX data byte in OPUA (AMUX_H[07:00]).

09  AMUX PE [01] (AMUX_PERR_TA_H[01])
    Asserted by an error in the high AMUX data byte in OPUA (AMUX_H[15:08]).

10  OPUA CTRL (OPUA_CONTROL_PERR_TA_H)
    Occurs when a parity error is detected in OSQA_AMUX_SEL_H[02:00], OSQA_BMUX_SEL_H[02:00], OSQA_CSHFT_SEL_H[02:00], OSQA_OP_GRANT_WAIT_H, OSQA_OPU_SEQ_START_H, or OSQA_OPUX_CONTROL_PARITY_H.
    Parity checking is performed in OPUA.

11  SL EXPANSION (SL_EXPANSION_PERR_TA_H[00])
    Occurs when an error is detected while testing OSQB_SL_EXPANSION_H[07:00] and OSQB_SL_EXPANSION_PARITY_H[00].

12  SL EXPANSION (SL_EXPANSION_PERR_TA_H[01])
    Occurs when an error is detected while testing OSQB_SL_EXPANSION_H[14], OSQB_SL_EXPANSION_H[09:08], and OSQB_SL_EXPANSION_PARITY_H[01].

16  SPEC I-STREAM (SPEC_ISTREAM_LWORD_PERR_TA_H[02])
    Occurs when an error is detected while testing XDTA_DISPLACEMENT_H[27:24], XDTA_DISPLACEMENT_H[19:16], and XDTA_DISPLACEMENT_PARITY_H[02].

17  SPEC I-STREAM (SPEC_ISTREAM_LWORD_PERR_TA_H[03])
    Occurs when an error is detected while testing XDTB_DISPLACEMENT_H[31:28], XDTB_DISPLACEMENT_H[23:20], and XDTB_DISPLACEMENT_PARITY_H[03].

18  DEC PC [03] (SPECIFIER_DECODE_PC_PERR_TA_H[02])
    Occurs when an error is detected while testing PCLO_OPU_DECODE_PC_H[23:16] and PCLO_OPU_DECODE_PC_PARITY_H[02].

19  DEC PC [04] (SPECIFIER_DECODE_PC_PERR_TA_H)
    Occurs when an error is detected while testing PCHI_OPU_DECODE_PC_H[31:24] and PCHI_OPU_DECODE_PC_PARITY_H[03].

20  BMUX PE [02] (BMUX_PERR_TA_H[02])
    Asserted by an error detected in the low BMUX data byte in OPUB (BMUX_H[23:16]).

21  BMUX PE [03] (BMUX_PERR_TA_H[03])
    Asserted by an error detected in the high BMUX data byte in OPUB (BMUX_H[31:24]).

22  AMUX PE [02] (AMUX_PERR_TA_H[02])
    Asserted by an error detected in the low AMUX data byte in OPUB (AMUX_H[23:16]).

23  AMUX PE [03] (AMUX_PERR_TA_H[03])
    Asserted by an error detected in the high AMUX data byte in OPUB (AMUX_H[31:24]).

24  OPUB CTRL (OPUB_CONTROL_PERR_TA_H)
    Occurs when an error is detected in one or more of the following signals:
        OSQA_AMUX_SEL_L[02:00]
        OSQA_BMUX_SEL_L[02:00]
        OSQA_CSHFT_SEL_L[02:00]
        OSQA_OP_GRANT_WAIT_H
        OSQA_OPU_SEQ_START_H
        OSQA_OPUX_CONTROL_PARITY_H

25  OSQB SL EXPANSION (SL_EXPANSION_PERR_TA_H[03])
    Occurs when an error is detected while testing OSQB_SL_EXPANSION_H[31:29] and OSQB_SL_EXPANSION_PARITY_H[03].

A IBox Input and Output Listing

Tables A-1, A-2, and A-3 list the input and output signals of the IBox. The signals are grouped by their MCU and MCA origination or destination and are listed alphabetically. All signals represent communication between the IBox and the VBox, MBox, and EBox. The box origination of input signals is defined in the prefix of the signal (for example, EBOX_BRANCH_B_H[00] is an input from the EBox).

Table A-1 IBox-VIC Signals

Input                             Destination   Origination
EBOX_BRANCH_A_H[00]               VIC-PCBP      USQ-USQC
EBOX_BRANCH_A_H[00]               VIC-PCVC      USQ-USQC
EBOX_BRANCH_VALID_A_H[00]         VIC-PCBP      USQ-USQC
EBOX_BRANCH_VALID_A_H[00]         VIC-PCVC      USQ-USQC
EBOX_RESULT_L[07:00]              VIC-PCBP      MUL-RET0
EBOX_RESULT_L[07:05]              VIC-PCVC      MUL-RET0
EBOX_RESULT_PARITY_L[03:00]       VIC-PCBP      MUL-RET0+RET1
MBOX_IB_DATA_H[03:00]             VIC-CL#4      DTA-DTM0
MBOX_IB_DATA_H[07:04]             VIC-CL#7      DTA-DTM1
MBOX_IB_DATA_H[11:08]             VIC-CL#4      DTA-DTM1
MBOX_IB_DATA_H[15:12]             VIC-CL#4      DTA-DTM1
MBOX_IB_DATA_H[19:16]             VIC-CL#4      DTA-DTM2
MBOX_IB_DATA_H[23:20]             VIC-CL#7      DTA-DTM2
MBOX_IB_DATA_H[27:24]             VIC-CL#4      DTA-DTM3
MBOX_IB_DATA_H[31:28]             VIC-CL#7      DTA-DTM3
MBOX_IB_DATA_H[35:32]             VIC-CL#4      DTA-DTM0
MBOX_IB_DATA_H[39:36]             VIC-CL#7      DTA-DTM0
MBOX_IB_DATA_H[43:40]             VIC-CL#4      DTA-DTM1
MBOX_IB_DATA_H[47:44]             VIC-CL#7      DTA-DTM1
MBOX_IB_DATA_H[51:48]             VIC-CL#4      DTA-DTM2
MBOX_IB_DATA_H[55:52]             VIC-CL#7      DTA-DTM2
MBOX_IB_DATA_H[59:56]             VIC-CL#4      DTA-DTM3
MBOX_IB_DATA_H[63:60]             VIC-CL#7      DTA-DTM3
MBOX_IB_DATA_PARITY_H[01:00]      VIC-CL#7      DTB-DTM0, DTM1
MBOX_IB_DATA_PARITY_H[03:02]      VIC-CL#7      DTA-DTM2, DTM3
MBOX_IB_DATA_PARITY_H[05:04]      VIC-CL#7      DTB-DTM0, DTM1
MBOX_IB_DATA_PARITY_H[07:06]      VIC-CL#7      DTA-DTM2, DTM3
MBOX_IB_PAGE_FAULT_L[00]          VIC-PCVC      VAP-FALT
MBOX_IB_RESPONSE_H[00]            VIC-PCVC      CTU-CTMV
MBOX_IB_RESPONSE_TA_H[00]         VIC-PCBP      CTU-CTMV

Output                            Origination   Destination
IBOX_ABORT_H[00]                  VIC-PCVC      MBOX-VAP-VAPO
IBOX_ABORT_L[00]                  VIC-PCVC      MBOX-VAP-CCSQ
IBOX_IB_ABORT_H[00]               VIC-PCVC      MBOX-VAP-VAPO
IBOX_IB_ABORT_L[00]               VIC-PCVC      MBOX-VAP-CCSQ
IBOX_IB_ADDRESS_H[05:00]          VIC-PCBP      MBOX-VAP-VAPO
IBOX_IB_ADDRESS_H[12:06]          VIC-PCVC      MBOX-VAP-VAPO
IBOX_IB_ADDRESS_H[31:13]          VIC-PCVC      MBOX-VAP-FXUP
IBOX_IB_ADDRESS_PARITY_H[00]      VIC-PCBP      MBOX-VAP-VAPO
IBOX_IB_ADDRESS_PARITY_H[01]      VIC-PCVC      MBOX-VAP-VAPO
IBOX_PC_H[05:00]                  VIC-PCBP      EBOX-CTL-QPCS
IBOX_PC_H[12:06]                  VIC-PCVC      EBOX-CTL-QPCS
IBOX_PC_PARITY_H[00]              VIC-PCBP      EBOX-CTL-QPCS
IBOX_PC_PARITY_H[01]              VIC-PCVC      EBOX-CTL-QPCS
IBOX_CORRECTION_H[00]             XBR-PCHI      EBOX-INT-USQC

Table A-2 IBox-XBR Signals

Input                             Destination   Origination
EBOX_BRANCH_A_L[00]               XBR-PCHI      INT-USQC
EBOX_BRANCH_A_L[00]               XBR-PCLO      INT-USQC
EBOX_BRANCH_VALID_A_L[00]         XBR-PCLO      INT-USQC
EBOX_BRANCH_VALID_A_L[00]         XBR-PCHI      INT-USQC
EBOX_KEEP_MASKS_L[01:00]          XBR-XDTB      CTL-ISSA
EBOX_RESULT_L[23:08]              XBR-PCLO      MUL-RET0
EBOX_RESULT_L[31:24]              XBR-PCHI      MUL-RET1
EBOX_RESULT_PARITY_L[02:01]       XBR-PCLO      MUL-RET0
EBOX_RESULT_PARITY_L[03]          XBR-PCHI      MUL-RET1
EBOX_RLOG_POINTER_H[02:00]        XBR-XDTB      DST-SRCS
EBOX_UNSUSPEND_H[00]              XBR-XDTA      INT-USQA
MBOX_IB_PAGE_FAULT_H[00]          XBR-IBFA      VAP-FALT
MBOX_IB_RESPONSE_TA_L[00]         XBR-IBFA      CTU-CTMV
MBOX_IB_RESPONSE_TA_L[00]         XBR-PCHI      CTU-CTMV
MBOX_IB_RESPONSE_TA_L[00]         XBR-PCLO      CTU-CTMV

Output                            Origination   Destination
IBOX_FORK_ADDRESS_H[03:00]        XBR-IBFA      EBOX-CTL-FRAMX
IBOX_FORK_ADDRESS_H[07:04]        XBR-IBFB      EBOX-CTL-FRAMX
IBOX_FORK_ADDRESS_H[08]           XBR-XSCA      EBOX-CTL-FRAMX
IBOX_FORK_ADDRESS_PARITY_H[00]    XBR-IBFB      EBOX-CTL-QPTR
IBOX_FORK_ERROR_H[00]             XBR-XSCA      EBOX-CTL-ISSE
IBOX_FORK_VALID_H[00]             XBR-XSCA      EBOX-CTL-QTPR
IBOX_IB_ADDRESS_H[21:13]          XBR-PCLO      MBOX-VAP-FXUP
IBOX_IB_ADDRESS_H[31:22]          XBR-PCHI      MBOX-VAP-FXUP
IBOX_IB_ADDRESS_PARITY_H[02]      XBR-PCLO      MBOX-VAP-FXUP
IBOX_IB_ADDRESS_PARITY_H[03]      XBR-PCHI      MBOX-VAP-FXUP
IBOX_IB_PAGE_FAULT_H[00]          XBR-XSCA      EBOX-CTL-ISSA, ISSB, ISSC
IBOX_IB_REQUEST_H[00]             XBR-IBFA      MBOX-VAP-VAPO
IBOX_INSTRUCTION_DECODED_H[00]    XBR-XSCA      EBOX-CTL-ISSA, ISSB, ISSC
IBOX_PC_H[21:13]                  XBR-PCLO      EBOX-CTL-QPCS
IBOX_PC_H[31:22]                  XBR-PCHI      EBOX-CTL-QPCS
IBOX_PC_PARITY_H[02]              XBR-PCLO      EBOX-CTL-QPCS
IBOX_PC_PARITY_H[03]              XBR-PCHI      EBOX-CTL-QPCS
IBOX_PC_VALID_H[00]               XBR-PCLO      EBOX-CTL-QPCS
IBOX_PREDICTION_H[00]             XBR-PCHI      EBOX-CTL-QPTR
IBOX_RAF_H[00]                    XBR-XDTA      EBOX-CTL-ISSA, ISSB, ISSC
IBOX_REGISTER_FORK_H[00]          XBR-XDTA      EBOX-CTL-QTPR

Table A-3 IBox-OPU Signals

Input                             Destination             Origination
EBOX_BRANCH_B_H[00]               OPU-OSQA                USQ-USQC
EBOX_BRANCH_VALID_B_H[00]         OPU-OSQA                USQ-USQC
EBOX_FLUSH_H[02:00]               OPU-OCTL, OSQA, OSQB    USQ-USQA
EBOX_GPR_BYTE0_WRITE_H[00]        OPU-STG2                CTL-ISSC
EBOX_GPR_BYTE1_WRITE_H[00]        OPU-STG2                CTL-ISSC
EBOX_GPR_H[03:00]                 OPU-STG2                CTL-ISSC
EBOX_GPR_H[03:00]                 OPU-STG3                CTL-ISSC
EBOX_GPR_WORD1_WRITE_H[00]        OPU-STG3                CTL-ISSC
EBOX_INSTRUCTION_DONE_H[00]       OPU-OCTL                CTL-ISSA
EBOX_INTERRUPT_H[00]              OPU-OSQA                USQ-USQA
EBOX_KEEP_MASKS_H[01:00]          OPU-OCTL                CTL-ISSA
EBOX_LAST_POINTER_H[03:00]        OPU-OSQB                CTL-ISSA
EBOX_QUEUE_FULL_H[00]             OPU-OSQB                CTL-QPTR
EBOX_RESULT_H[15:00]              OPU-STG2                MUL-RET0+RET1
EBOX_RESULT_H[31:16]              OPU-STG3                MUL-RET0+RET1
EBOX_RESULT_PARITY_H[01:00]       OPU-STG2                MUL-RET0+RET1
EBOX_RESULT_PARITY_H[03:02]       OPU-STG3                MUL-RET0+RET1
EBOX_RLOG_FULL_H[00]              OPU-OSQB                INT-RLOG
MBOX_OP_DATA_H[15:00]             OPU-OPUA                DTA-DTM0, DTM1
MBOX_OP_DATA_H[31:16]             OPU-OPUB                DTA-DTM2, DTM3
MBOX_OP_DATA_PARITY_H[01:00]      OPU-OPUA                DTA-DTM0, DTM1
MBOX_OP_DATA_PARITY_H[03:02]      OPU-OPUB                DTA-DTM2, DTM3
MBOX_OP_GRANT_H[00]               OPU-OSQA                VAP-VAPO
MBOX_OP_RESPONSE_H[00]            OPU-OSQA, OPUA, OPUB    CTU-CTMV
VBOX_ADDRESS_H[15:00]             OPU-STG2                VBOX-VAD-VMKA
VBOX_ADDRESS_H[31:16]             OPU-STG3                VBOX-VAD-VMKA
VBOX_ADDRESS_PARITY_H[01:00]      OPU-STG2                VBOX-VAD-VMKA
VBOX_ADDRESS_PARITY_H[03:02]      OPU-STG3                VBOX-VAD-VMKA
VBOX_ADDRESS_VALID_H[00]          OPU-OSQA, OSQB          VBOX-VAD-VMKB
VBOX_BLOCK_READ_H[00]             OPU-OSQA, OSQB          VBOX-VAD-VMKB
VBOX_READ_NOP_H[00]               OPU-OSQA, OSQB          VBOX-VAD-VMKB
VBOX_REFERENCE_SIZE_H[00]         OPU-OSQA, OSQB          VBOX-UCS-VCTC
VBOX_REFERENCE_TYPE_H[00]         OPU-OSQA, OSQB          VBOX-UCS-VCTA

Output                              Origination   Destination
IBOX_DATA_H[15:00]                  OPU-OPUA      EBOX-DST-STG0
IBOX_DATA_H[31:16]                  OPU-OPUB      EBOX-DST-STG1
IBOX_DATA_PARITY_H[01:00]           OPU-OPUA      EBOX-DST-STG0
IBOX_DATA_PARITY_H[03:02]           OPU-OPUB      EBOX-DST-STG1
IBOX_DATA_ERROR_H[00]               OPU-OSQB      EBOX-CTL-ISSE
IBOX_DATA_TAG_H[03:00]              OPU-OSQA      EBOX-CTL-ISSA, ISSB, ISSC, DST-SRCS
IBOX_DATA_VALID_H[00]               OPU-OSQA      EBOX-CTL-ISSA, ISSB, ISSC, DST-SRCS
IBOX_DATA_VALID_L[00]               OPU-OSQA      EBOX-CTL-ISSA, ISSB, ISSC, DST-SRCS
IBOX_DESTINATION_MEMORY_H[00]       OPU-OSQB      EBOX-CTL-ISSA, ISSB, ISSC,
IBOX_DESTINATION_POINTER_H[03:00]   OPU-OSQB      EBOX-CTL-ISSA, ISSB, ISSC, QPTR
IBOX_DESTINATION_VALID_H[00]        OPU-OSQB      EBOX-CTL-ISSA, ISSB, ISSC,
IBOX_DST_DATA_TAG_H[03:00]          OPU-OSQA      EBOX-DST-STG0, STG1
IBOX_DST_GPR_H[03:00]               OPU-OSQA      EBOX-DST-STG0, STG1
IBOX_FLUSH_ABORT_H[00]              OPU-OCTL      MBOX-VAP-VAPO
IBOX_FLUSH_ABORT_L[00]              OPU-OCTL      MBOX-VAP-CCSQ
IBOX_FREE_POINTER_H[03:00]          OPU-OSQB      EBOX-ISSA, ISSB, ISSC
IBOX_GPR_H[03:00]                   OPU-OSQA      EBOX-DST-SRCS
IBOX_GPR_WRITE_H[00]                OPU-OSQA      EBOX-DST-SRCS
IBOX_OP_ADDRESS_H[31:16]            OPU-OPUB      MBOX-VAP-FXUP
IBOX_OP_ADDRESS_H[15:00]            OPU-OPUA      MBOX-VAP-FXUP
IBOX_OP_ADDRESS_PARITY_H[01:00]     OPU-OPUA      MBOX-VAP-VAPO
IBOX_OP_ADDRESS_PARITY_H[03:02]     OPU-OPUB      MBOX-VAP-FXUP
IBOX_OP_CONTEXT_H[02:00]            OPU-OSQA      MBOX-VAP-VAPO
IBOX_OP_CONTROL_H[02:00]            OPU-OSQA      MBOX-VAP-VAPO
IBOX_OP_CONTROL_PARITY_H[00]        OPU-OSQA      MBOX-VAP-VAPO
IBOX_OP_INDIRECT_H[01:00]           OPU-OSQA      MBOX-VAP-VAPO
IBOX_OP_REQUEST_H[00]               OPU-OSQA      MBOX-VAP-VAPO
IBOX_OP_TAG_H[03:00]                OPU-OSQA      MBOX-VAP-VAPO
IBOX_OSQA_ISSA_PARITY_H[00]         OPU-OSQA      EBOX-CTL-ISSA
IBOX_OSQA_RLOG_PARITY_H[00]         OPU-OSQA      EBOX-INT-RLOG
IBOX_OSQB_ISSB_PARITY_H[00]         OPU-OSQB      EBOX-CTL-ISSB
IBOX_OSQB_QPTR_PARITY_H[00]         OPU-OSQB      EBOX-CTL-QPTR
IBOX_OSQB_SRCS_PARITY_H[00]         OPU-OSQB      EBOX-DST-SRCS
IBOX_POINTER_ERROR_H[00]            OPU-OSQB      EBOX-CTL-ISSE
IBOX_RLOG_COMPLETE_H[00]            OPU-OSQA      EBOX-DST-SRCS
IBOX_RLOG_CONTEXT_H[03:00]          OPU-OSQA      EBOX-DST-SRCS
IBOX_RLOG_TAG_H[02:00]              OPU-OSQA      EBOX-DST-SRCS
IBOX_RLOG_WRITE_H[00]               OPU-OSQA      EBOX-DST-SRCS
IBOX_SOURCE1_POINTER_H[04:00]       OPU-OSQB      EBOX-CTL-QPTR
IBOX_SOURCE1_POINTER_L[04:00]       OPU-OSQB      EBOX-CTL-SRCS
IBOX_SOURCE1_VALID_H[00]            OPU-OSQB      EBOX-CTL-QPTR
IBOX_SOURCE1_VALID_L[00]            OPU-OSQB      EBOX-CTL-SRCS
IBOX_SOURCE2_POINTER_H[04:00]       OPU-OSQB      EBOX-CTL-QPTR
IBOX_SOURCE2_POINTER_L[04:00]       OPU-OSQB      EBOX-CTL-SRCS
IBOX_SOURCE2_VALID_H[00]            OPU-OSQB      EBOX-CTL-QPTR
IBOX_SOURCE2_VALID_L[00]            OPU-OSQB      EBOX-CTL-SRCS

Index

B
Branch bias
Branch prediction
  demote
  primary prediction hit
  primary predictions
  secondary predictions
Branch prediction cache match enable

C
Complex specifier unit (CSU)
  adder
  AMUX
  BMUX
  CSU microcode control
  current PC generation
  OPUA data path
  OPUB data path

E
EBox-to-IBox interface

F
Free pointer logic (FPL)
  destination pointer
  free pointer
  free pointer initialization
  source 1
  source 2

I
IBox error registers
  decode error register 1
  fetch error register 1
  fetch error register 2
  specifier error register 1
  specifier error register 2
  XBAR decode error register
IBox-to-EBox interface
Instruction buffer
  IBEX
  IBEX2
  IBUF
  instruction buffer parity
  merger
  rotator
  shifter
Instruction buffer interface
  aborting requests
  page faults
Instruction decode
Instruction fetch
Intra-instruction read conflict
  XBAR IRC logic
Intra-instruction read conflicts (IRC)

M
MCU
  OPU
  VIC
  XBR

O
Operand control (OCTL)
  flush
  read and write register mask parity
  read and write register masks
  stalls
OPU port interface

P
PCU microcode
Pipeline description
Program counter
  branch PC
  branch PC data path
  cache control
  decode PC
  decode PC data path
  PCU errors
  prefetch PC
  prefetch PC data path
  target PC
  unwind PC
  unwind PC data path

S
Short literal specifier handler (SLU)
  block diagram
  floating-point expansion
  integer expansion
  SLU output
  SLU parity protection
  stalls
Specifier decode

V
VBox interface
Virtual instruction cache
  disabling VIC hit
  VIC flush
  VIC hit
  VIC parity
  VIC read
  VIC write

X
XBAR
  decode tree logic
  DRAM
  fork logic
  request logic
  shift counts
  simple decode
  specifier counts
  XBAR displacement data path
  XBAR IRC logic
  XBAR short literal data path
  XBAR source and destination logic
  XRAM