EK-ORA90-SV-003
June 1990
256 pages
Original
11MB
Document:
EK-ORA90-SV-003 RA90 RA92 Service Jun90
Order Number:
EK-ORA90-SV
Revision:
003
Pages:
256
Original Filename:
OCR Text
RA90/RA92 Disk Drive Service Manual
Order Number EK-ORA90-SV-003

Digital Equipment Corporation
Maynard, Massachusetts

First Edition: June 1988
Second Edition: June 1989
Third Edition: June 1990

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

The software described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license.

No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.

Copyright © 1989, 1990 by Digital Equipment Corporation
All Rights Reserved.
Printed in U.S.A.

The postpaid READER'S COMMENTS card requests the user's critical evaluation to assist in preparing future documentation.

FCC NOTICE: The equipment described in this manual generates, uses, and may emit radio frequency energy. The equipment has been type tested and found to comply with the limits for a Class A computing device pursuant to Subpart J of Part 15 of FCC Rules, which are designed to provide reasonable protection against such radio frequency interference when operated in a commercial environment. Operation of this equipment in a residential area may cause interference, in which case the user at his own expense may be required to take measures to correct the interference.
The following are trademarks of Digital Equipment Corporation: DEC, DECUS, DECnet, HSC, KDA, MASSBUS, MicroVAX, MSCP, PDP, RA, RC25, RQDX3, RSTS/E, RSX, RT-11, SA, TA, TK, TU, UDA50, ULTRIX, UNIBUS, VAX, VAXsimPLUS, VMS, TOPS-10, TOPS-20.

RA90 ©Digital Equipment Corporation 1987. Covered by one or more U.S. PAT. Nos. 4,475,212 4,150,172 4,503,420 4,434,487 and other patents pending.

This document was prepared using VAX DOCUMENT, Version 1.1

Contents

About This Manual

1 Introduction
  1.1 RA90 and RA92 Disk Drive Descriptions
    1.1.1 Physical and Logical Media Layout
  1.2 Maintenance Strategy
    1.2.1 Service Delivery Strategy
      1.2.1.1 Six-Step Maintenance Strategy
    1.2.2 Tools Required for Maintenance
    1.2.3 Preventative Maintenance
  1.3 RA90/RA92 Disk Drive Specifications
  1.4 Electrostatic Protection

2 Installation
  2.1 Introduction
  2.2 Site Preparation and Planning
    2.2.1 Power and Safety Precautions
    2.2.2 Three-Phase Power Requirements
    2.2.3 AC Power Wiring
    2.2.4 Thermal Stabilization
    2.2.5 Floor Loading
    2.2.6 Operating Temperature and Humidity
  2.3 Unpacking the Cabinet
    2.3.1 Deskidding the Cabinet
  2.4 Installing SDI Cables and Power Cords
    2.4.1 Removing the Front and Rear Access Panels
      2.4.1.1 Front Access Panel Removal
      2.4.1.2 Removing the Rear Access Panel
    2.4.2 SDI Cable Connections and Routing
    2.4.3 Power Cord Connections and Routing
  2.5 Locating the RA90/RA92 Disk Drive Power Supply
    2.5.1 Plugging in the Power Cord
  2.6 International Operator Control Panel Labeling
  2.7 RA90/RA92 Disk Drive Acceptance Testing Procedures
    2.7.1 Voltage Selection
    2.7.2 Applying Power to the Drive
  2.8 Power-Up Resident Diagnostics
    2.8.1 OCP Lamp Testing
    2.8.2 Test Selection from the OCP
    2.8.3 RA90/RA92 Idle Loop Acceptance Testing
    2.8.4 Testing Spun-Down Drive
    2.8.5 Testing Spun-Up Drive
  2.9 Placing the Drive On Line
    2.9.1 Programming the Drive Unit Address
  2.10 Installing RA90/RA92 Add-On Disk Drives in 60-Inch Cabinets

3 Operating Instructions
  3.1 Introduction
  3.2 RA90/RA92 Disk Drive Components
    3.2.1 Electronic Control Module (ECM)
      3.2.1.1 I/O-R/W Module
      3.2.1.2 Servo Module
    3.2.2 Preamp Control Module (PCM)
    3.2.3 Head Disk Assembly and Carrier Assembly
    3.2.4 Dual Outlet Blower Motor
    3.2.5 Power Supply
    3.2.6 Drive Functional Microcode
    3.2.7 OCP Functions
  3.3 RA90/RA92 Operating Modes
    3.3.1 Normal Mode Setup
    3.3.2 Fault Display Mode Setup
    3.3.3 Test Mode Setup
  3.4 Programming the Drive Unit Address
    3.4.1 Alternate Unit Address Display Mode

4 Drive-Resident Diagnostics and Utilities
  4.1 Introduction
  4.2 Power-Up and Idle Loop Diagnostics
    4.2.1 Power-Up (Hardcore) Diagnostics
    4.2.2 Idle Loop Tests (Drive Spun Down)
    4.2.3 Idle Loop Tests (Drive Spun Up)
  4.3 Sequence Diagnostics
  4.4 Standard OCP Displays Indicating Procedural Problems
  4.5 Software Jumper
  4.6 Temperature's Affect on Drive Performance
  4.7 Diagnostics Descriptions
    4.7.1 Seek Timing Tests
    4.7.2 Time, Seeks, and Spinups Display Interpretation

5 Troubleshooting and Error Codes
  5.1 Troubleshooting Reference Material
    5.1.1 Customer Support Training for the RA90/RA92 Disk Drive
  5.2 RA90/RA92 Troubleshooting Aids
    5.2.1 VAXsimPLUS
    5.2.2 Host Error Logs
    5.2.3 Extended Status Bytes
      5.2.3.1 Response Opcode (Byte 1)
      5.2.3.2 Unit Number Low Byte (Byte 2) and Subunit Mask (Byte 3)
      5.2.3.3 Request Byte (Byte 4)
      5.2.3.4 Mode Byte (Byte 5)
      5.2.3.5 Error Byte (Byte 6)
      5.2.3.6 Controller Byte (Byte 7)
      5.2.3.7 Retry Count (Byte 8)
      5.2.3.8 Previous Command Opcode (Byte 9)
      5.2.3.9 HDA Revision Bits (Byte 10)
      5.2.3.10 Cylinder Address (Bytes 11 and 12)
      5.2.3.11 Error Recovery Level (Selected Group) (Byte 13)
      5.2.3.12 Error Code (Byte 14)
      5.2.3.13 Manufacturing Fault Code (Byte 15)
    5.2.4 Drive Internal Error Log
      5.2.4.1 Running DKUTIL From the HSC Console or KDM70 Controller
      5.2.4.2 Running the Drive-Resident Utility Dump (T41) From the OCP
    5.2.5 OCP Fault Indicator/Error Codes
    5.2.6 Drive Power Supply Indicator
    5.2.7 Drive Error Reporting Mechanisms
      5.2.7.1 Detailed Description of Error Reporting Mechanisms
    5.2.8 Host-Level Diagnostics and Utilities
  5.3 General Troubleshooting Information
    5.3.1 Drive-Resident Diagnostics Limitations
  5.4 Step-by-Step Troubleshooting Procedure
    5.4.1 Troubleshooting Worksheet
  5.5 Identifying the Problem Drive
    5.5.1 Talking to the System Operator/Checking the OCP Fault Indicator
    5.5.2 Using VAXsimPLUS to Identify the Problem Drive
    5.5.3 Using the Host Error Log to Identify the Problem Drive
    5.5.4 Using the HSC Console Log to Identify the Problem Drive
    5.5.5 Using the Host Console/User Terminal Trails to Identify the Problem Drive
    5.5.6 Using Other Means to Identify the Problem Drive
  5.6 Identifying the Problem FRU
    5.6.1 Pre-Verifying Drive Symptoms
    5.6.2 Using OCP Error Codes to Identify the Problem FRU
    5.6.3 Using VAXsimPLUS to Identify the Problem FRU
    5.6.4 Using the Host Error Log to Identify the Problem FRU
    5.6.5 Using the HSC Console Log to Identify the Problem FRU
    5.6.6 Using the Drive Internal Error Log to Identify the Problem FRU
  5.7 Priority Order of Troubleshooting DSA Errors
    5.7.1 Drive-Detected Drive Errors and Diagnostic Faults
      5.7.1.1 Drive-Detected Protocol Errors Without Communication Errors
      5.7.1.2 Drive-Detected Pulse or State Parity Errors
    5.7.2 Controller-Detected EDC Error
      5.7.2.1 Controller-Detected Protocol and Transmission Errors Without Communication Errors (Status/Event Codes 14B or 4B)
      5.7.2.2 Controller-Detected Pulse or State Parity Errors (Status/Event Code 10B)
    5.7.3 Controller-Detected Communication Events and Faults
      5.7.3.1 Controller-Detected: LOSS OF READ/WRITE READY (Status/Event Code: 8B)
      5.7.3.2 Controller-Detected: LOST RECEIVER READY (Status/Event Code: CB)
      5.7.3.3 Controller-Detected: RECEIVER READY COLLISION (Status/Event Code: 1AB)
      5.7.3.4 Controller-Detected: DRIVE CLOCK DROPOUT (Status/Event Code: AB)
      5.7.3.5 Controller-Detected: DRIVE FAILED INITIALIZATION (Status/Event Code: 16B)
      5.7.3.6 Controller-Detected: DRIVE IGNORED INITIALIZATION (Status/Event Code: 18B)
      5.7.3.7 Controller-Detected: SERDES OVERRUN ERROR (Status/Event Code: 2A)
      5.7.3.8 SDI Drive Command Timeout (Status/Event Code: 2B)
  5.8 Media-Related Errors
    5.8.1 Repeating LBNs/RBNs
    5.8.2 Excessive Number of Blocks Replaced Because of R/W Path Problems
    5.8.3 LBN Correlation to Single Group/Track
    5.8.4 LBN Correlation to Head Groups
      5.8.4.1 LBNs Correlated to Zone Write Boundaries
      5.8.4.2 LBN Correlation to a Physical Cylinder
    5.8.5 Multiple Controllers Report Same Error Types
    5.8.6 Only Single Controller Port Affected
    5.8.7 Isolating Random R/W Transfer Errors
      5.8.7.1 Not Defined to a Specific Drive/Controller Port
  5.9 Miscellaneous Checks
  5.10 Are You Lost?
  5.11 Using Host-Level Diagnostics as a Last Resort
    5.11.1 HSC-Based Diagnostics
    5.11.2 KDM-Based Diagnostics
      5.11.2.1 On Line from VMS
      5.11.2.2 Running Standalone Programs from the VAX Diagnostic Supervisor
    5.11.3 xDA Controller-Based Diagnostics
  5.12 Exiting Data Collection: Action Item List Process
  5.13 FRU Replacement
    5.13.1 Multiple Error Codes
    5.13.2 Service Post-Verification
    5.13.3 Return Disk Drive to User
  5.14 Performance Issues When No Errors Are Being Logged
  5.15 Troubleshooting VMS Mount Verification
    5.15.1 VMS Mount Verification
    5.15.2 VMS Problems Surrounding Diagnosis of "Why a Drive Mount-Verifies"
    5.15.3 Non-VMS Mount Verification
  5.16 Troubleshooting ECC Errors on RA90/RA92 Disk Drives
    5.16.1 Uncorrectable ECC Errors - MSCP Status/Event E8
      5.16.1.1 Hard Uncorrectable ECC Errors
      5.16.1.2 Soft Uncorrectable ECC Errors
    5.16.2 Correctable ECC Errors - MSCP Status/Event Codes 1A8, 1C8, 1E8
      5.16.2.1 BBR Packet
  5.17 Troubleshooting Controller-Detected Positioner Errors - MSCP Status/Event 6B
    5.17.1 RA92 Disk Drive With MSCP Status/Event 6B
    5.17.2 Evaluating MSCP 6B Events
  5.18 Conclusion
  5.19 Error Codes and Descriptions

6 Removal and Replacement Procedures
  6.1 Introduction
  6.2 Sequence for FRU Removal
  6.3 Electrostatic Sensitivity
  6.4 Power Precautions
  6.5 Tools Checklist
  6.6 Removing/Replacing Cabinet Front and Rear Access Panels
    6.6.1 Removing/Replacing the Front Access Panel
    6.6.2 Removing/Replacing the Rear Access Panel
  6.7 Removing the Operator Control Panel
  6.8 Removing the Blower/Bezel Motor Assembly
    6.8.1 Separating the Bezel and Blower Motor Assembly
  6.9 Removing the Electronic Control Module
  6.10 Removing the Preamp Control Module
  6.11 Removing/Replacing the Head Disk Assembly
    6.11.1 Removing the HDA
    6.11.2 HDA Thermal Stabilization Procedure
    6.11.3 Replacing the HDA
    6.11.4 Separating the HDA and Carrier
    6.11.5 Removing the Spindle Ground Brush
    6.11.6 Removing the Brake Assembly
    6.11.7 Spindle Lock Solenoid Failure
  6.12 Removing the Power Supply
  6.13 Removing/Replacing the Rear Flex Cable Assembly
  6.14 Media Removal Service for Customers

7 Microcode Update Procedure
  7.1 Introduction
  7.2 Microcode Update Cartridge Description
  7.3 Microcode Update Port Description
  7.4 Running Test 40 (T40)
  7.5 Updating the Microcode
    7.5.1 Error Codes/Common Problems During Microcode Update

A Capturing Information for LARS and CHAMPS

B RA90/RA92 Error Recovery Levels

C Customer Equipment Maintenance
  C.1 Customer Responsibilities
    C.1.1 Cleaning Supplies
    C.1.2 Ongoing Equipment Care
    C.1.3 Monthly Equipment Maintenance
    C.1.4 Maintenance Records

D Customer Services' Preventative Maintenance
  D.1 PM Checklist for RA90/RA92 Disk Drives

Index

Examples
  5-1 RA90 Cylinder Address and Group (Head)
  5-2 RA92 Cylinder Address and Group (Head)
  5-3 VMS Uncorrectable ECC Error Log - Hard
  5-4 VMS Uncorrectable ECC Error Log - Soft
  5-5 VMS BBR Packet
  5-6 Positioner Mis-Seek MSCP Status/Event 6B

Figures
  1-1 Example of Sector Format
  1-2 RA90 Physical and Logical Media Layout
  1-3 RA92 Physical and Logical Media Layout
  1-4 ESD Wrist Strap
  2-1 Electrical Plug Configurations
  2-2 Unpacking the 60-Inch Cabinet
  2-3 Cabinet Deskidding
  2-4 Ramp Installation of Shipping Pallet
  2-5 Leveler Adjustment
  2-6 Front Panel Removal
  2-7 Rear Access Panel Removal
  2-8 SDI Cable Connections and Routing - SA600 Example
  2-9 Power Cord Connections and Routing - SA600 Example
  2-10 RA90/RA92 Power Supply Controls and Indicators
  2-11 RA90/RA92 Operator Control Panel
  2-12 Location of Voltage Selector Switch
  2-13 Location of Power Controller Controls - 881 Example
  2-14 Test Selection Flowchart
  2-15 OCP Displays During Testing
  2-16 Unit Selection Flowchart
  3-1 RA90/RA92 Disk Drive Block Diagram
  3-2 I/O-R/W Module Block Diagram
  3-3 Servo Module Block Diagram
  3-4 PCM Block Diagram
  3-5 PCM Switch Pack Location
  3-6 HDA Block Diagram
  3-7 Power Supply OK LED
  3-8 RA90/RA92 OCP
  3-9 OCP Fault Display Error Code Example
  3-10 Fault Display Mode Flowchart
  3-11 OCP Display After Test Selection
  3-12 OCP Display While Running Test
  3-13 Unit Address Selection Flowchart
  3-14 Alternate Unit Address Display Mode Flowchart
  4-1 Using Loopback Connectors
  4-2 Hardware Revision Switches
  4-3 Hardware Revision Byte
  4-4 T65 FCY OCP Display
  4-5 T65 LCY OCP Display
  4-6 T65 INC OCP Display
  4-7 T65 DLY OCP Display
  5-1 RA90/RA92 Extended Drive Status Bytes
  5-2 RA90/RA92 Drive Internal Error Log Memory Layout
  5-3 RA90/RA92 Drive Internal Error Log Header Format
  5-4 RA90/RA92 Drive Internal Error Log Descriptor Format
  5-5 Drive Internal Error Log
  5-6 Power Supply Indicators
  5-7 Step-by-Step Troubleshooting Flowchart
  5-8 Power Supply Cover Removal
  5-9 WRT/CMD Data Format
  6-1 RA90/RA92 Disk Drive - Exploded View
  6-2 FRU Removal Sequence
  6-3 Front Access Panel Removal
  6-4 Rear Access Panel Removal
  6-5 OCP Removal
  6-6 Blower Motor Assembly Removal Sequence
  6-7 Bezel and Blower Motor Assembly Separation
  6-8 ECM Removal
  6-9 PCM Removal
  6-10 HDA Removal
  6-11 HDA Carrier Separation
  6-12 Spindle Ground Brush Removal
  6-13 Contact Extraction Tool
  6-14 RA90/RA92 Brake Assembly Removal/Replacement
  6-15 Disabling the Solenoid for In-Field Data Recovery
  6-16 Power Supply Removal
  6-17 Rear Flex Cable Assembly Removal
  6-18 HDA Media Removal - Top View
  6-19 HDA Media Removal - Bottom View
  7-1 Microcode Update Cartridge
  7-2 Microcode Update Port
  A-1 LARS Example
  C-1 Customer Equipment Maintenance Log for Storage Array Cabinets

Tables
  1-1 Specifications for RA90 and RA92 Disk Drives
  1-2 Additional Electrical Specifications by Model for RA90 and RA92 Disk Drives
  1-3 RA90/RA92 Environmental Limits
  2-1 OCP Error Codes
  3-1 ECM Module Types - Compatibility Matrix
  3-2 I/O-R/W Module - Hardware Revision Matrix
  3-3 Servo Module - Hardware Revision Matrix
  3-4 PCM Switch Pack Setup
  3-5 PCM Module - Hardware Revision Matrix
  3-6 RA90/RA92 HDA Hardware Compatibility Matrix
  3-7 RA90/RA92 Microcode Compatibility With Drive FRUs
  3-8 Power-Up: Normal Mode Operations
  5-1 Reference Material for Troubleshooting
  5-2 Two-Board Controller Diagnostics
  5-3 Summary of Controller-Detected Communication Errors
  5-4 RA90/RA92 Write Zones
  5-5 VDS-Based Off-Line Diagnostics
  5-6 MDM-Based Off-Line Diagnostics
  5-7 XXDP-Based Off-Line Diagnostics
  5-8 Serial Number
  5-9 Power Supply Voltage Measurements
  5-10 HDA Connector Pin Designations
  5-11 HDA Resistance Measurements
  6-1 Digital Part Numbers for Recommended Tools
  7-1 Common Error Codes/Problems During Microcode Update
  B-1 RA90/RA92 Hardware Error Recovery Circuits
  B-2 RA90/RA92 Error Recovery Levels
About This Manual

The information contained in this manual is intended for Digital Customer Services personnel responsible for RA90/RA92 disk drive maintenance and service calls.

This manual contains checkout, servicing, and troubleshooting information for RA90 and RA92 disk drives. Procedures for unpacking, deskidding, and cabling 60-inch cabinets are also included. Procedures for installing RA90 and RA92 add-on disk drives in 60-inch cabinets are not included in this manual. Refer to product-specific documentation.

Related documentation is listed below, in alphabetical order.

Document Title                                                    Order Number
DSA Controller Documentation Kit                                  QP905-GZ
DSA Drive Documentation Kit                                       QP907-GZ
DSA Error Log Manual                                              EK-DSAEL-MN
DSA Error Log Pocket Service Guide                                EK-DSAEL-PG
Getting Started With VAXsimPLUS                                   AA-KN79A-TE
HSC Service Manual                                                EK-HSCMA-SV
RA90 Disk Drive Illustrated Parts Breakdown                       EK-ORA90-IP
RA90 Disk Drive Technical Description Manual                      EK-ORA90-TD
RA90 Field Maintenance Print Set                                  MP-01424-01
RA90/6000 Cabinet Series Upgrade Installation Guide               EK-RA9CK-IN
RA90/H9643 Cabinet Installation Guide                             EK-RA90H-IN
RA90/RA92 Disk Drive Pocket Service Card                          EK-ORA90-PS
RA90/RA92 Disk Drive User Guide                                   EK-ORA90-UG
SA600/SA800 Storage Array Family Configuration Guide              EK-SA600-CG
SA650/SA850 Storage Array Family Configuration Guide              EK-SA650-CG
VAXsimPLUS Field Service Manual                                   AA-KN82A-RE
VAXsimPLUS User Guide                                             AA-KN80A-TE

1 Introduction

1.1 RA90 and RA92 Disk Drive Descriptions

The RA90 and RA92 disk drives are high-density, fixed-media disk drives that use nonremovable, thin film media and thin film heads. The RA90/RA92 heads, disks, rotary actuator, and filtering system are encased in a single unit called the Head Disk Assembly (HDA).
The RA90 disk drive has a formatted data storage capacity of 1.216 gigabytes and an unformatted data storage capacity of 1.604 gigabytes in a 16-bit word format. The RA92 disk drive has a formatted data storage capacity of 1.506 gigabytes and an unformatted data storage capacity of 1.987 gigabytes in a 16-bit word format.

Thirteen surfaces contain data and embedded servo information. The embedded servo information is within the intersector gaps and accomplishes fine positioning of the read/write heads over the data tracks. Figure 1-1 is an example of the sector format used for RA90/RA92 disk drives.

The fourteenth surface is a dedicated servo surface that, when decoded by the drive electronics, provides information on:

• Coarse radial position
• Track crossing (velocity)
• Rotational index and sector position
• Generation of clock sync pulse
• Inner and outer guardband detection

DIGITAL INTERNAL USE ONLY

Figure 1-1 Example of Sector Format

Field                    Length
A burst                  11 bytes
Header preamble          17 bytes
Header sync              2 bytes
Header                   16 bytes
Data preamble            37 bytes
B burst                  11 bytes
Data sync                2 bytes
Data                     512 bytes
EDC                      2 bytes
Pad                      1 byte
ECC                      23 bytes
Write/read recovery      26 bytes

1.1.1 Physical and Logical Media Layout

The physical structure of the media is transparent to the user. Figures 1-2 and 1-3 represent the layout of logical information for the RA90 and RA92 media.
Figure 1-2 RA90 Physical and Logical Media Layout

Values shown in the figure:
  LBNs per cylinder                          897
  Replacement sectors                        13 RBNs per cylinder
  Host application area                      cylinders 0 to 2648
  Revector control tables                    cylinders 2649 to 2650
  Format area                                cylinders 2651 to 2653
  Diagnostic area                            cylinders 2654 to 2655
  LBNs visible to host applications          0 to 2,376,152
  LBNs visible to host operating systems     0 to 2,377,946
  RBNs                                       0 to 34,462
  XBNs (visible to controller)               0 to 2729
  DBNs (visible to controller)               0 to 1819

1.2 Maintenance Strategy

The RA90 and RA92 disk drives introduce a new approach to repairing peripheral equipment. In most cases, RA90/RA92 disk drives afford easy access to field replaceable units (FRUs) without the use of tools. Additional drive maintenance features include the following:

• A microprocessor-controlled operator control panel (OCP) interface, eliminating the need for external test equipment
• EEPROM where an internal error log is stored
• Twelve error recovery levels
• Extensive drive-resident diagnostics
• Drive microcode that can be updated by way of the microcode update port

Figure 1-3 RA92 Physical and Logical Media Layout

Values shown in the figure:
  LBNs per cylinder                          949
  Replacement sectors                        13 RBNs per cylinder
  Host application area                      cylinders 0 to 3098
  Revector control tables                    cylinders 3099 to 3100
  Format area                                cylinders 3101 to 3105
  Diagnostic area                            cylinders 3106 to 3107
  LBNs visible to host applications          0 to 2,940,950
  LBNs visible to host operating systems     0 to 2,942,848
  RBNs                                       0 to 40,312
  XBNs (visible to controller)               0 to 4809
  DBNs (visible to controller)               0 to 1923

1.2.1 Service Delivery Strategy

Real-time subsystem (drive) faults detected by the drive are recorded in the RA90/RA92 drive internal error log. Real-time faults detected in the disk subsystem are recorded in the supporting system host error log. Controller-detected errors (such as ECC errors) are also logged to the host error log and not the RA90/RA92 drive-resident error log.
Use utility programs to obtain a print-out of the drive internal error log and isolate faults, provided the error was drive-detected. Additionally, you can run the RA90/RA92 drive-resident utility T41 to access the drive internal error log. This provides the drive LED error codes only. Use of other utility programs provides additional error information.

Use drive-resident diagnostics to validate repairs to RA90/RA92 disk drives. For more information on drive-resident diagnostics and utilities, refer to Chapter 4.

1.2.1.1 Six-Step Maintenance Strategy

This section describes the maintenance strategy for RA90 and RA92 disk drives. Become familiar with it, as it determines the course of action necessary to successfully service RA90/RA92 disk drives.

Implement the following six-step maintenance strategy on each service call for a drive problem:

1. Examine and analyze VAXsimPLUS.
2. Examine and analyze system error logs.
3. Examine and analyze the drive internal error log.
4. Correlate failure symptoms to the probable failing FRU through service documentation.
5. Replace the FRU only after a prime FRU is identified from previous steps.
6. Verify device repair through drive-resident diagnostics. (Running host-level diagnostics to verify repairs is unnecessary and penalizes the customer by tying up the system.)

Use host-based diagnostics only as a last resort, to obtain symptomatic failure information, and only if system and drive error logs are unavailable.

Verify the drive is on line and operational through normal system-level commands that access the unit under repair.

1.2.2 Tools Required for Maintenance

Tools required for maintaining RA90/RA92 disk drives are identified in the procedures where they are needed and in Chapter 6.

1.2.3 Preventative Maintenance

Customer responsibilities for preventative maintenance (60-inch cabinets only) are described in Appendix C.
Digital Customer Services responsibilities for cabinet and RA90/RA92 disk drive maintenance are described in Appendix D.

1.3 RA90/RA92 Disk Drive Specifications

Table 1-1 lists important operating and nonoperating specifications for RA90 and RA92 disk drives.

Table 1-1 Specifications for RA90 and RA92 Disk Drives

Characteristic                    RA90 Disk Drive               RA92 Disk Drive

Head Disk Assembly (HDA)
Storage capacity, formatted       1.216 gigabytes               1.506 gigabytes
Storage capacity, unformatted     1.604 gigabytes               1.987 gigabytes
HDA word format                   16-bit only                   Same as RA90
Bits/square inch                  40 megabits                   49.4 megabits
Tracks/inch                       1750                          2045
Disk recording method             Rate 2/3 modulation code      Same as RA90
Number of disks                   7                             Same as RA90
Disk surfaces                     14 (13 data and 1 servo)      Same as RA90
Number of heads                   14                            Same as RA90
Heads per surface                 1                             Same as RA90
Data tracks                       34,437                        40,287
Logical cylinders                 2656                          3101
User logical cylinders            2649                          3099
Number of sectors                 69 + 1 spare                  73 + 1 spare
Number of logical blocks          2,376,153                     2,942,849

Table 1-1 (Cont.)
Specifications for RA90 and RA92 Disk Drives

Characteristic                        RA90 Disk Drive                  RA92 Disk Drive

Seek Times
One cylinder                          5.5 milliseconds                 3.0 milliseconds
Average seek                          18.5 milliseconds                16.0 milliseconds
Maximum cylinder seek                 31.5 milliseconds                29.0 milliseconds

Latency
Rotation speed                        3600 r/min                       3405 r/min
Average latency                       8.33 milliseconds                8.81 milliseconds
Maximum latency                       16.67 milliseconds               17.62 milliseconds

Single Start/Stop Time
Start (maximum)                       40 seconds                       Same as RA90
Inhibit between stop and restart      40 seconds                       Same as RA90

Data Rates
Transfer rate                         2.77 megabytes/sec               Same as RA90

Physical Characteristics
Height                                26.56 cm (10.42 inches)          Same as RA90
Width                                 22.19 cm (8.74 inches)           Same as RA90
Depth                                 68.47 cm (26.96 inches)          Same as RA90
Weight                                31.8 kg (70 pounds)              Same as RA90

Inrush Current
120 Vac                               60 amperes peak @ 132 Vac        Same as RA90
220-240 Vac                           70 amperes peak @ 264 Vac        Same as RA90

Current
120 Vac                               4.6 amps                         Same as RA90
220-240 Vac                           2.4 amps                         Same as RA90

Power factor
120 Vac                               0.7                              Same as RA90
220-240 Vac                           0.58                             Same as RA90

Line cord length (from the cabinet)   2.74 meters (9 feet)             Same as RA90

Table 1-2 contains additional electrical specifications by model for RA90 and RA92 disk drives.

NOTE
The RA90 and RA92 disk drives are not line-frequency dependent.

Table 1-2 Additional Electrical Specifications by Model for RA90 and RA92 Disk Drives

                                      Input Current (Amps)(1)
Model              Nominal Voltage    Start-Up    PH1     Neutral    Power Dissipation    BTUs/Hour [kJ/Hour](2)
RA90-xx/RA92-xx    120 volts          5.0         3.4     3.4        281 watts            960
RA90-xx/RA92-xx    240 volts          2.85        1.45    1.45       271 watts            [976]

(1) Currents are for nominal voltages of 120 Vac phase to neutral or for 240 Vac phase to neutral. For 101 Vac and 220 Vac nominal voltages, the drives will have proportionately higher phase currents by a ratio of 120/101 or 240/220 to the currents specified in this table.
(2) Bracketed figures indicate kilojoules per hour.
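Several figures in Tables 1-1 and 1-2 follow from other entries by simple arithmetic, which makes them easy to cross-check in the field. The sketch below is illustrative only and is not part of the original manual. It assumes 512 data bytes per logical block (the data field in Figure 1-1), decimal gigabytes, and the standard conversion factors 3.412 BTU and 3.6 kilojoules per watt-hour. All function names are hypothetical.

```python
"""Cross-check derived values in Tables 1-1 and 1-2 (illustrative sketch)."""

BYTES_PER_BLOCK = 512      # assumption: data field size from Figure 1-1
BTU_PER_WATT_HOUR = 3.412  # standard conversion factor, not from the manual
KJ_PER_WATT_HOUR = 3.6     # 1 watt for 1 hour = 3600 joules


def max_latency_ms(rpm):
    """Maximum latency: time for one full revolution, in milliseconds."""
    return 60000.0 / rpm


def average_latency_ms(rpm):
    """Average latency: time for half a revolution, in milliseconds."""
    return max_latency_ms(rpm) / 2.0


def lbns_per_cylinder(sectors_per_track, data_surfaces):
    """Logical blocks per cylinder, excluding the spare sector on each track."""
    return sectors_per_track * data_surfaces


def formatted_capacity_gb(logical_blocks):
    """Formatted capacity in decimal gigabytes."""
    return logical_blocks * BYTES_PER_BLOCK / 1e9


def scaled_current(table_amps, table_volts, actual_volts):
    """Table 1-2, footnote 1: current rises in proportion at lower line voltage."""
    return table_amps * table_volts / actual_volts


def watts_to_btu_per_hour(watts):
    return watts * BTU_PER_WATT_HOUR


def watts_to_kj_per_hour(watts):
    return watts * KJ_PER_WATT_HOUR


if __name__ == "__main__":
    drives = (
        # name, r/min, sectors per track, logical blocks (all from Table 1-1)
        ("RA90", 3600, 69, 2376153),
        ("RA92", 3405, 73, 2942849),
    )
    for name, rpm, sectors, blocks in drives:
        print(name,
              "avg latency %.2f ms" % average_latency_ms(rpm),
              "max latency %.2f ms" % max_latency_ms(rpm),
              "LBNs/cyl %d" % lbns_per_cylinder(sectors, 13),
              "formatted %.3f GB" % formatted_capacity_gb(blocks))
    print("PH1 current at 101 Vac: %.2f A" % scaled_current(3.4, 120, 101))
    print("281 W = %.0f BTU/hour" % watts_to_btu_per_hour(281))
    print("271 W = %.0f kJ/hour" % watts_to_kj_per_hour(271))
```

Under these assumptions the latency and capacity results agree with Table 1-1 to the precision printed there, and the per-cylinder block counts match the 897 and 949 LBNs per cylinder shown in Figures 1-2 and 1-3. The power conversions land within rounding of the 960 and [976] entries in Table 1-2.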
Table 1-3 shows the maximum environmental limits and the recommended environmental operating ranges to optimize equipment performance and reliability.

Table 1-3 RA90/RA92 Environmental Limits

Characteristic                          RA90/RA92 Disk Drive

Maximum Environmental Limits
Temperature (required)
  Operating                             10°C to 40°C (50°F to 104°F) with a temperature gradient of 20°C/hour (36°F/hour)
  Nonoperating                          -40°C to +60°C (-40°F to +140°F)
Relative humidity
  Operating                             10% to 90% (noncondensing) with a minimum wet bulb temperature of 28°C (82°F) and a minimum dew point of 2°C (36°F)
  Nonoperating                          10% to 90% with no condensation

Recommended Environmental Operating Ranges
Temperature                             18°C to 24°C (64.4°F to 75.2°F) with an average rate of change of 3°C/hour maximum and a step change of 3°C or less
Relative humidity                       40% to 60% (noncondensing) with a step change of 10% or less (noncondensing)
Air quality (maximum particle count)    Not to exceed 500,000 particles per cubic foot of air at a size of 0.5 micron or larger
Air volume (at inlet)                   50 cubic feet per minute (.026 cubic meters per second)
Altitude
  Operating                             Sea level to 2400 meters (8000 feet); maximum allowable operating temperatures are reduced by a factor of 1.8°C/1000 meters (1°F/1000 feet) for operation above sea level
  Nonoperating                          300 meters (1000 feet) below sea level to 7500 meters (16,000 feet) above sea level (actual or effective by means of cabin pressurization)

1.4 Electrostatic Protection

Electrostatic discharge (ESD) is the result of electrostatic buildup and its subsequent release. The surface storage of an electrostatic charge from a person or object can damage hardware components and may result in premature device or option failure.
The basic concept of static protection for electronic components is the prevention of static buildup, where possible, and the safe release of existing electrostatic charge buildup. If the charged body is a conductor, whether a person or an object, it can be completely discharged by grounding it.

Use the following guidelines when handling static-sensitive components and modules:

CAUTION
Always use grounding straps to avoid product damage when handling static-sensitive components and modules.

1. Read all instructions and installation procedures included with static control materials and kits.
2. Use static-protective containers to transfer modules and components (including bags and tote boxes).
3. Wear a properly grounded ESD wrist strap when handling components, modules, or other static-sensitive devices. Figure 1-4 shows the ESD wrist strap in use.

When using an ESD wrist strap:

• Ensure the wrist strap fits snugly for proper conductivity.
• Attach the alligator clip securely to a clean, unpainted, grounded metal surface such as the drive chassis or cabinet frame.
• Do not overextend the grounding cord.

Figure 1-4 ESD Wrist Strap
Callouts: chassis stabilizer bracket, grounding, ESD wrist strap.

2 Installation

2.1 Introduction

The SA600 and SA650 cabinets are the most commonly used cabinets for RA90 and RA92 disk drives. Procedures for unpacking, deskidding, and cabling 60-inch† cabinets are contained in this chapter. This chapter also covers site preparation and planning considerations, drive acceptance testing procedures, and power-up diagnostics.

Information on unpacking and installing add-on RA90 and RA92 disk drives in 60-inch cabinets can be found in product-specific documentation and is not covered here.

2.2 Site Preparation and Planning

Site preparation and planning are necessary before installing an RA90 or RA92 disk drive subsystem.
Chapter 1 contains a full range of recommended environmental specifications. In addition, consider the following items before attempting installation.

2.2.1 Power and Safety Precautions

The RA90/RA92 disk drives do not present any unusual fire or safety hazards. It is recommended, however, that you check ac power wiring for the computer system to determine adequate capacity for expansion.

2.2.2 Three-Phase Power Requirements

The RA90 and RA92 disk drives use a single-phase power supply; however, the 881 power controller uses three phases. It is very important that the correct phase requirements for this product be met. Refer to Chapter 1 for power specifications.

WARNING
Hazardous voltages are present in this equipment. Installation and service must be performed by trained service personnel. Bodily injury or equipment damage may result from incorrect servicing.

To prevent damage to equipment and personnel, ensure power sources meet the specifications required for this equipment.

† The SA600 and the SA650 are both 60-inch cabinets.

Power cords going to power controller:
  120 V, 60 Hz power cord; DEC No. A-PS-1700083-23 (plug, power controller end)
  240 V, 50 Hz power cord; DEC No. A-PS-1700083-24 (plug, power controller end)
  120/240 V, 47-63 Hz, 10 A/6 A power cord; DEC No. A-PS-1700442-18 or A-PS-1700442-19 (plug, drive end)

Plugs going to wall outlet (from controller):
  120 V, 60 Hz, 24 A, 1-phase; 40-inch cabinet; NEMA No. L5-30P; DEC No. 12-11193 (874-D)
  220/240 V, 50-60 Hz, 16 A, 1-phase; IEC 309 320-P6W; DEC No. 12-14379-03 (874-F)
  120/208 Vac, 60 Hz, 30 A, 3-phase wye; used with 881-A and 881-C power controllers; 5-wire; NEMA No. L21-30P; 60-inch cabinet
  220-240/380-415 Vac, 50 Hz, 20 A or 16 A, 3-phase wye; used with 881-B power controller; 5-wire, 4-pole,
IEC 309

Figure 2-1 Electrical Plug Configurations

2.2.3 AC Power Wiring

The wiring used by Digital Equipment Corporation conforms to UL, CSA, and ISE standards. Figure 2-1 shows the ac plug configurations for RA90 and RA92 disk drives and 881 and 874 power controllers.

2.2.4 Thermal Stabilization

Thermal stabilization prevents temperature differences between the equipment and its environment from damaging disk drive components. Prior to installation, a 60-inch cabinet subsystem and the RA90/RA92 add-on drive must be stored at a temperature of 60°F (16°C) or higher for a minimum of 24 hours. These units may be stored either in the computer room or in another storage room under controlled temperature conditions. If stored in another storage room, each unit must sit for an additional hour in the computer room in which it is to be installed.

CAUTION
The thermal stabilization procedure is mandatory. Do not open the moisture barrier bag until after the thermal stabilization period. Failure to thermally stabilize the equipment may cause premature equipment failure.

After the thermal stabilization criteria have been met, carefully cut the moisture barrier bag and proceed with the installation.

2.2.5 Floor Loading

Consider the placement of this equipment, especially if a fully loaded 60-inch configuration is used. A fully loaded 60-inch cabinet weighs approximately 390 kilograms (860 pounds). Each RA90 or RA92 disk drive weighs approximately 31.8 kilograms (70 pounds).

2.2.6 Operating Temperature and Humidity

The required relative humidity range is between 10 percent and 90 percent with a minimum wet bulb temperature of 28°C (82°F) and a minimum dew point of 2°C (36°F) (noncondensing) with a step change of 10 percent or less.

The RA90 and RA92 disk drives can be operated within temperatures of 10°C to 40°C (50°F to 104°F).
However, it is highly recommended that RA90 and RA92 disk drives be operated in a temperature range below 25°C (77°F) to increase reliability and extend product life.

2.3 Unpacking the Cabinet

The 60-inch cabinet configuration is packed in a cardboard carton attached to a wooden shipping pallet. Refer to Figure 2-2 and use the following procedure to unpack the cabinet:

1. Inspect the shipping carton for any sign of external damage. Report any damage to the local carrier and to the Digital Customer Services or sales office.
2. Remove the two cardboard U-sections but leave the sealed moisture barrier with desiccant in place during thermal stabilization.

CAUTION
This equipment must be thermally stabilized in the site environment for at least 24 hours before operation.

Figure 2-2 Unpacking the 60-Inch Cabinet
Callouts: machine bolts (2 each side), shipping straps, cardboard U-section, shipping bolt. Note on the moisture barrier: do not open until the thermal stabilization procedure is complete.

2.3.1 Deskidding the Cabinet

Three people are required to deskid the 60-inch cabinet. See Figure 2-3.

WARNING
Serious injury could result if the cabinet is improperly handled.

Figure 2-3 Cabinet Deskidding

1. Remove the two unloading ramps from their carton located under the carton top cover.
2. Inspect the ramps, ramp side rails, and metal hardware for defects described in the following list:

   • Cracks more than 25 percent of the ramp depth, either across or lengthwise on the ramp.
   • Knots or knotholes going through the thickness of the ramp and greater than 50 percent of the ramp width.
   • Loose, missing, or broken ramp side rails.
   • Loose, missing, or bent metal hardware.

   If any of these conditions exist, do not use that ramp. Investigate alternate means of removing the cabinet and/or order a new ramp. The part number for the left ramp is ~768~1; the part number for the right ramp is ~768~2.
3. Remove shipping bolts from the shipping brackets on each of the four levelers. See inset in Figure 2-2.
4. Remove shipping brackets from the four cabinet levelers.
5. Fasten unloading ramps onto the pallet by fitting the grooved end of each ramp over the metal mating strip on the pallet. See Figure 2-4.
6. Screw the cabinet levelers (Figure 2-5) all the way up until the cabinet rests on its rollers on the pallet.
7. Carefully roll the cabinet down the ramps (three people are required).
8. Move the cabinet into its final position.
9. Turn each leveler hex nut clockwise until the leveler foot contacts the floor (no weight on the casters) and the cabinet is level.

Figure 2-4 Ramp Installation of Shipping Pallet
Callouts: shipping pallet, right unloading ramp, steel dowel, right ramp attachment point.

2.4 Installing SDI Cables and Power Cords

Generally, SDI cables and power cords are installed in the 60-inch cabinet prior to shipping. Use this section as a reference should you need to remove or reinstall the power cords or SDI cables.

Figure 2-5 Leveler Adjustment
Callouts: locknut, leveler hex nut, leveler foot.

2.4.1 Removing the Front and Rear Access Panels

Use the following procedure to remove front and rear cabinet access panels.

2.4.1.1 Front Access Panel Removal

Refer to Figure 2-6 while performing this procedure:

1. Use a hex wrench or flat-bladed screwdriver to unlock the two quarter-turn fasteners at the top of the panel. Turn the fasteners counterclockwise.
2. Grasp the panel by its edges, tilt it toward you, and lift it up about 2 inches. Remove the panel and store it in a safe place.

To reinstall the front panel, lift it into place and lower it straight down until the tabs on the panel's lower edge engage the slots in the cabinet support bracket. Hold the panel flush with the cabinet and use a hex wrench to lock the fasteners.
Figure 2-6 Front Panel Removal
Callouts: quarter-turn fastener, front panel, support bracket.

2.4.1.2 Removing the Rear Access Panel

Refer to Figure 2-7 while performing this procedure:

1. Use a hex wrench or flat-bladed screwdriver to unlock the two quarter-turn fasteners at the top of the panel. Turn the fasteners counterclockwise.
2. Tilt the panel toward you and lift it up to disengage the pins at the bottom.
3. Lift the panel clear of the enclosure and store it in a safe place.

When replacing the rear panel, lift it into place and fit the pins into the holes at the top of the I/O bulkhead. Push the top of the panel into place and turn the quarter-turn fasteners clockwise.

Figure 2-7 Rear Access Panel Removal
Callouts: quarter-turn fastener, cabinet rear bustle, hex wrench, rear access panel, I/O bulkhead, pins.

2.4.2 SDI Cable Connections and Routing

Both external and internal cables are connected to the I/O bulkhead located at the base of the drive cabinet. See Figure 2-8. Refer to product-specific documentation for more information.

Figure 2-8 SDI Cable Connections and Routing - SA600 Example
Callouts: cable troughs, SDI cables, I/O bulkhead.

2.4.3 Power Cord Connections and Routing

Figure 2-9 shows drive power cord connections and the recommended power cord routing for an SA600 storage array cabinet. Refer to product-specific documentation for power cord connections and routing for other subsystems.
Figure 2-9 Power Cord Connections and Routing - SA600 Example
Callouts: disk drive power cords, power controller, ac power cord.

2.5 Locating the RA90/RA92 Disk Drive Power Supply

To access the RA90 or RA92 disk drive power supply, remove the cabinet rear access panel (Figure 2-7). Figure 2-10 shows the location of the RA90/RA92 disk drive power supply, circuit breaker, and the Power OK LED.

Figure 2-10 RA90/RA92 Power Supply Controls and Indicators
Callouts: drive rear, circuit breaker, green LED (Power OK).

2.5.1 Plugging in the Power Cord

The drive power cords in a fully configured cabinet are already plugged into the power controller. Only the ac power cord from the cabinet power controller needs to be plugged into an external power source.

NOTE
Do not apply power to the power controller until proper voltage has been selected. (Refer to Section 2.7.1.)

2.6 International Operator Control Panel Labeling

Each drive unit or cabinet configuration is shipped with a set of international labels for the operator control panel (OCP). The labels come in a packet or on a single sheet. Select and apply the set of labels applicable to the country in which the equipment is being installed.

2.7 RA90/RA92 Disk Drive Acceptance Testing Procedures

The following sections cover RA90/RA92 disk drive acceptance testing procedures. Follow each procedure to completion before starting the next. Refer to Figure 2-11 while performing acceptance testing on RA90 and RA92 disk drives. A more detailed description of the RA90/RA92 OCP and its functions can be found in Chapter 3.

Figure 2-11 callouts: four-character alphanumeric display, unit number, Test switch, Fault switch, state LED indicators, Run switch.
NOTE: RA90 part no. 74-35109-02; RA92 part no. 74-39769-01.
Figure 2-11 RA90/RA92 Operator Control Panel

2.7.1 Voltage Selection

Before applying power to RA90 or RA92 disk drives, ensure the proper operating voltage has been selected for your area of operation. The voltage selector is a slide switch capable of selecting 120 volts or 240 volts. (The frequency, 60 Hz or 50 Hz, is universal.) To select the proper voltage, perform the following steps:

1. Remove the cabinet rear access panel (refer to Section 2.4.1.2).
2. Verify the ac circuit breaker on the power controller is off.
3. Verify the circuit breaker on each disk drive is off (0).
4. Locate the voltage selector switch (Figure 2-12).
5. Using a nonconductive pointed object, slide the voltage selector switch into the position applicable to your site.

Figure 2-12 Location of Voltage Selector Switch
Callouts: Port B, Port A, voltage selector switch (120 V or 240 V), power supply.
NOTE: Voltage markings on some power supplies read 115/230V.

2.7.2 Applying Power to the Drive

Use the following procedure to apply power to RA90/RA92 disk drives:

1. Verify the drive voltage selector switch has been properly set (see Section 2.7.1).
2. Verify the ac circuit breaker on the power controller is off. Also verify the circuit breaker on each disk drive is off. See Figures 2-10 and 2-13 for circuit breaker locations.

Figure 2-13 Location of Power Controller Controls - 881 Example
Callouts: power controller, power control bus connectors (undelayed; delayed, 0.5 sec), fuse, grommeted cord opening, serial/logo label.

3. Verify the Local/Remote switch on the 881 power controller is in the Local position.
4. Verify the drive power cord is plugged into the power controller.
5. Verify the external power source is correct.
6. Plug the ac power cord from the power controller into an external power receptacle.
7. Switch the ac circuit breaker on the power controller to the on position.
8.
Switch the ac circuit breaker on the RA90 or RA92 disk drive to the on position.

2.8 Power-Up Resident Diagnostics

A sequence of drive-resident diagnostics runs at power-up. The sequence consists of hardcore tests with basic processor tests. Successful completion of the hardcore tests is indicated by the following OCP displays:

1. Blank (1 second)
2. WAIT (16 seconds)
3. [0000] (If previously programmed, the drive unit number is displayed; otherwise, zeros are displayed.)

2.8.1 OCP Lamp Testing

Before continuing with acceptance testing, perform an OCP lamp test to ensure the LED state indicators and alphanumeric display are working properly. Perform the following procedure before selecting any other OCP switches (refer to Figure 2-11):

1. Select the Test switch. The Test LED indicator lights.
2. Select the Fault switch. All lamps light momentarily.
3. Deselect the Test switch.

All lamps should momentarily light. If not, ensure the OCP is seated properly and power is applied to the drive. Repeat the test. Replace the OCP if any lamps fail (refer to Section 6.7).

2.8.2 Test Selection from the OCP

It is necessary to select and run resident diagnostics from the OCP to complete acceptance testing. Use the following procedure to select and run diagnostics from the OCP. Figure 2-14 is a flowchart of this procedure.

1. Power up the drive (if not done previously).
2. Select the Test switch (test defaults to zero; no other operator action is required).
3. Select the Write Protect switch.
4. Select the diagnostic to run by using Port A and Port B switches. See the test selection flowchart (Figure 2-14).
5. Start the test by selecting the Write Protect switch.
6. Stop the test by selecting either the Port A or Port B switch.
7. Restart the test by selecting the Write Protect switch again.
8. Select the Test switch to exit the test mode.
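The switch sequence in steps 2 through 5 can be written out for any test number. The sketch below is a hypothetical model for illustration, not drive firmware and not part of the original manual. It assumes two-digit decimal test numbers (such as T00 or T60), that each digit starts at 0, that one Port B selection advances the flashing digit by one, and that the most significant digit is entered first as Figure 2-14 indicates. The function name `plan_test_selection` is invented for this example.

```python
from typing import List


def plan_test_selection(test_number: int) -> List[str]:
    """Return the OCP switch selections needed to start a resident diagnostic.

    Assumptions (not confirmed by the manual): digits start at 0, each
    Port B selection adds one to the flashing digit, and the most
    significant digit is entered before the least significant digit.
    """
    if not 0 <= test_number <= 99:
        raise ValueError("test number must be in the range 0 to 99")

    msd, lsd = divmod(test_number, 10)

    presses = ["Test", "Write Protect"]        # steps 2 and 3: enter test mode
    presses += ["Port A"] + ["Port B"] * msd   # step 4: set the tens digit
    presses += ["Port A"] + ["Port B"] * lsd   # step 4: set the ones digit
    presses.append("Write Protect")            # step 5: start the test
    return presses


if __name__ == "__main__":
    # Example: the loop-on-test utility T60 used later in acceptance testing.
    for number, switch in enumerate(plan_test_selection(60), start=1):
        print(number, switch)
```

Treat the output as a checklist to compare against the display while working at the panel. If a digit already shows a nonzero value, the number of Port B selections will differ from what this model predicts.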
2.8.3 RA90/RA92 Idle Loop Acceptance Testing

After the hardcore diagnostics have successfully run, the drive automatically enters an idle loop diagnostic test sequence. Do not select any front panel switches. Allow the drive to remain in the idle loop test for 5 minutes.

Figure 2-14 Test Selection Flowchart
Flowchart boxes: select port switches to deselect port(s); select Write Protect switch; select Port A switch (MSD begins flashing); increment numbers 0-9 by selecting Port B switch; select Port A switch (LSD begins flashing); increment numbers 0-9 by selecting Port B switch; select Write Protect switch (test starts); start and complete displays. The diagnostic can be stopped by selecting Port A or Port B, restarted by selecting Write Protect, or exited by selecting the Test switch. An asterisk indicates a flashing readout.

If an error occurs during power-up or during idle loop testing, the drive attempts to display an error code. Table 2-1 lists error codes and required operator actions. Error codes not found in Table 2-1 indicate a problem requiring additional troubleshooting. Refer to Chapter 5 for troubleshooting strategy.

Table 2-1 OCP Error Codes

Error 0F: Drive write protected
  Action: Disable write protection with the OCP Write Protect switch or turn off software write protection.

Error 22: Drive over-temperature condition
  Action: Spin down and remove power from the drive. Ensure the cabinet air vent grill is clean and room temperature is within recommended limits. Call Digital Customer Services if neither a dirty air vent grill nor room temperature caused the over-temperature condition.

Error 2D: Power supply over-temperature condition
  Action: Spin down and remove power from the drive. Ensure the cabinet air vent grill is clean and room temperature is within recommended limits.
  Call Digital Customer Services if neither a dirty air vent grill nor room temperature caused the over-temperature condition.

Errors 3A, 6F: Write protect errors
  Action: Disable write protection with the OCP Write Protect switch or turn off software write protection.

2.8.4 Testing Spun-Down Drive

To invoke resident diagnostics while the drive is still spun down:

1. Select the Test switch (Test indicator lights).
2. Select the Write Protect switch: [T 00] is displayed.
3. Input [T 60] into the display. This is a loop-on-test utility.
4. Start T60 by selecting the Write Protect switch a second time. The following occurs:
   [S.60] [LOT] [C.60] [T 00] (LSD flashing)
5. Input [T 00] into the display.
6. Start T00 by selecting the Write Protect switch a second time.

The drive is now running a sequence of resident diagnostics. A number of displays are seen during the execution of the diagnostics. These displays are normal. Examples of these displays are shown in Figure 2-15.

Figure 2-15 OCP Displays During Testing
Displays shown: [T 00], [S 01] (start), [C 01] (complete), [T 01] (flashing).

Allow drive tests to run for 5 minutes before continuing acceptance testing. To halt testing, select the Test switch (Test LED extinguishes).

2.8.5 Testing Spun-Up Drive

To spin up the RA90 or RA92 disk drive, select the Run switch. The Run indicator lights and an [R.-] appears in the display. Allow the drive to come to the ready state as indicated by the front panel Ready indicator. If either of the ports (A/B) is selected when the drive reaches the ready state, deselect the port switches, then proceed as follows:

1. Select the Test switch. The Test indicator lights.
2. Select the Write Protect switch. [T 00] is displayed.
3. Input [T 60] into the display. This is a loop-on-test utility.
4. Start T60 by selecting the Write Protect switch a second time. [LOT] is displayed in the OCP.
5.
Select the Write Protect switch. 6. Input [T 00] into the display. 7. Start TOO by selecting the Write Protect switch a second time. The above steps invoke a sequence of resident-diagnostic tests. The tests check drive functions in the following areas: Processor Servo bus Positioner Head select Read/write circuitry Fault detection circuitry DIGITAL INTERNAL USE ONLY 2-20 Installation Allow the tests to run for 30 minutes to complete ac..ceptance testing, then select the Test switch to exit the test mode. The Test LED extinguishes, an [R..••] appears in the display and the Ready and Run indicators light. Additionally, if either port switch is selected, it will be displayed after the unit address: [R AB]. If an error occurs during power-up or during the idle loop diagnostics, the drive attempts to display an error code. Table 2-1 lists error codes and required operator actions. If no problems are encountered, place the drive on line. NOTE In an HSe cluster environment, you can duplicate system usage by running n,EXER for a few minutes; in a non-HSe environment, a successful operating system disk initialization and mount operation are sufficient for verifying subsystem operation. 2.9 Placing the Drive On Line The following procedure assumes drive acceptance testing and cabling procedures have been completed. If not, refer to the appropriate sections of this manual for details. 2.9.1 Programming the Drive Unit Address The unit address can be set once power has been applied to the drive. The unit address is programmable in the range of 0 to 4094. t Enter the test mode to set the unit address. In the test mode, Port A and Port B switches have the added function of selecting both the unit address numbers and test numbers. After applying power, follow this procedure to set the drive unit address. Figure 2-16 is a flowchart of this procedure. 1. Select the Test switch. The Test LED lights and zeros are displayed. 
(Something other than zeros may be displayed if the unit address has been previously programmed.)
2. Select the Port A switch for the ones position. Position zero will blink.
3. Select the Port B switch. Position zero increments by one, from 1 through 9, each time Port B is selected.
4. Select the Port A switch for the tens position. Position one will blink.
5. Select the Port B switch. Position one increments by one, from 1 through 9, each time Port B is selected.
6. Select the Port A switch for the hundreds position. Position two will blink.
7. Select the Port B switch. Position two increments by one, from 1 through 9, each time Port B is selected.
8. Select the Port A switch for the thousands position. Position three will blink.
9. Select the Port B switch. Position three increments by one, from 1 through 4, each time Port B is selected.
10. Select the Test switch to exit the unit selection function.

† The KDA50/UDA50/KDB50 support drive logical unit addresses only up to 255.

Figure 2-16 Unit Selection Flowchart (CXO-2138A). Flowchart notes: deselect Port A and Port B to set the unit address. The change prompt is a scrolling display; to stop the display, select the Run switch. Select the Port B switch to toggle (Y) or (N). To save the old address, select [N] and exit by selecting the Test switch. To save the new address, select [Y] and exit by selecting the Test switch. An asterisk (*) indicates a flashing readout.

Before exiting, you will be prompted to verify that you want the unit number changed. The OCP displays the following prompt: CHG UNT ? [N]
1. If you do not want to change the unit address, select the Test switch a second time.
2. To change the unit address, proceed as follows:
   • Toggle the Port B switch. CHG UNT ? [Y] displays.
   • Select the Test switch. The old unit address will be overwritten, and the new unit address will be displayed in the OCP.

NOTE
The unit address number is written to EEPROM and is not lost if the drive loses power.

2.10 Installing RA90/RA92 Add-On Disk Drives in 60-Inch Cabinets

Information for unpacking and installing RA90/RA92 add-on disk drives into 60-inch cabinets can be found in product-specific documentation. Refer to the preface, About This Manual, for a list of related documentation.

3 Operating Instructions

3.1 Introduction

This chapter describes each of the RA90 and RA92 disk drive components. Module compatibility tables are provided to explain the relationships between RA90 and RA92 disk drive hardware. Drive block diagrams are included to illustrate component relationships. This chapter also explains various operating modes of RA90/RA92 disk drives, and covers drive unit address programming, test functions, and fault functions.
3.2 RA90/RA92 Disk Drive Components

The main components of RA90 and RA92 disk drives are:
• The electronic control module (ECM)
• The preamp control module (PCM)
• The blower motor assembly
• The head disk assembly (HDA)
• The drive power supply
• The operator control panel (OCP)

RA90/RA92 disk drives use three microprocessors to accomplish drive functions. The processors are the master (or I/O), the servo (or DSP), and the operator control panel (OCP) processor. Figure 3-1 shows a simplified block diagram of RA90/RA92 disk drives.

Figure 3-1 Simplified block diagram of the RA90/RA92 disk drive (CXO-2185A), showing the power supply, OCP, blower, HDA, I/O-R/W module, servo module, spindle motor, actuator, and SDI interface.

3.2.1 Electronic Control Module (ECM)

The ECM field replaceable unit (FRU) consists of two modules mounted back-to-back on a slide carrier. One module contains the input/output-read/write (I/O-R/W) circuitry and is referred to as the I/O-R/W module. The second module contains the servo circuitry and is referred to as the servo module.

Each module has a set of four physical jumpers that are hard-wired at the factory. These jumpers are ECO-controlled and are used to mark the differences in functionality between the two hardware versions of the ECM modules. (These jumpers allow the microcode to display the correct hardware revision codes for the I/O-R/W and servo modules when running drive utility T45.) The two 70-class ECM module set versions and related 54-class component part numbers are listed in Table 3-1.

Table 3-1 ECM Module Types - Compatibility Matrix

ECM P/N        I/O-R/W P/N    Servo P/N      Comments
70-22942-01¹   54-17771-01    54-17769-01    RA90 with HDA 70-22951-01 or 70-27268-01
70-22942-02¹   54-17771-02    54-17769-02    RA92-compatible

¹ The ECM FRU is available as a 70-class part. The individual 54-class parts are not field/customer available due to repair and error log history strategies implemented by Digital.

The Digital circuit schematic (CS) revision alphanumeric marking on the ECM and its 54-class component modules does not reflect the microcode loaded into the non-volatile EEPROM as firmware code. This code is loaded in the field through the use of a microcode update cartridge.
The microcode can then configure itself (enabled by the physical jumpers on the ECM modules) to assure the correct functionality of that particular ECM module. The functions of the I/O-R/W and servo modules are described in the sections that follow.

3.2.1.1 I/O-R/W Module

Functionally, the I/O-R/W module can be divided into three primary areas: SDI interface, control, and read/write. Figure 3-2 provides a block diagram of the I/O-R/W module. The control circuitry on this module contains the following:
• MC6801 microprocessor (master processor)
• Memory (ROM and RAM)
• Output control registers
• Input status registers

Figure 3-2 Block diagram of the I/O-R/W module (CXO-2186A), showing SDI ports A and B, the master processor, the R/W ENDEC, and the signals exchanged with the servo module.

The master microcode software controls drive functions through the control and status registers.
Functional and diagnostic software for the master is stored in ROM, RAM, EEPROM and PROM memories. The master is the logic processor, and it controls and performs the following tasks:
• OCP communications
• Drive fault detection (including error recovery)
• Servo processor communications
• Functional servo microcode loading
• Standard disk interconnect (SDI) processing

The master processor controls the servo processor through the use of software. The servo processor's response to master processor commands is also accomplished through the use of software. Upon power-up, the master processor (after self-testing the logic on the I/O-R/W) has the ability to test portions of the servo processor logic, including the servo processor RAM memory. After a successful test of the servo RAM, the master processor will execute a load of the functional servo microcode from the EEPROM located on the I/O-R/W module.

RA90/RA92 disk drives are equipped with special error recovery circuits which the master processor controls. If the drive receives error recovery commands from the disk controller, the master processor software activates combinations of error recovery signals. As a result, drive read/write and servo characteristics are altered in an attempt to recover drive data. Appendix B contains a more detailed description of the RA90/RA92 error recovery mechanisms.

The master processor retains the drive OCP switch state information and drive unit number in memory. This state information is saved into non-volatile EEPROM memory if a power loss is detected. Upon restoration of drive power, the original state of the drive can be resumed.

Functional microcode in the drive provides base level revision information concerning the I/O-R/W module. Drive utility T45 (refer to Chapter 4) displays a decimal numeric code that translates to the module's hardware revision. The display format is [IOP=xx].
Table 3-2 presents the displayed codes and the corresponding module part numbers and revisions.

Table 3-2 I/O-R/W Module - Hardware Revision Matrix

T45 Displayed       I/O-R/W Module    CS Part     Etch
Revision IOP=xx     Part Number       Revision    Revision    Compatibility
00                  54-17771-01       Lx-Nx       E           RA90 only
01                  54-17771-01       Rx-xx       F           RA90 only
03                  54-17771-02       Ax-         F           RA92-compatible

3.2.1.2 Servo Module

Figure 3-3 is a simplified block diagram of the servo portion of the ECM module. The servo portion of the ECM uses a digital signal processor of the Texas Instruments TMS family. The digital signal processor is called the servo processor (or sometimes the DSP processor).

Figure 3-3 Simplified block diagram of the servo module (CXO-2187A), showing the servo processor, DSP memory, DMA control, actuator power amplifier, and spindle driver.

The servo processor communicates with the master processor and does the following:
• Obtains embedded servo information from the I/O-R/W module for offset calibration of the read/write heads.
• Obtains dedicated servo information for positioning the read/write heads.
• Controls spindle motor spin-up and spin-down operations.
• Monitors HDA spindle speed and servo positioning (including errors).
• Controls servo-related internal diagnostics.

Additionally, the servo processor controls the following:
• Retract (moving heads off data surface)
• Return to zero (RTZ)
• Fine track (keeping heads on track centerline)
• Seeks

Functional microcode in the drive provides base level revision information concerning the servo module. Drive utility T45 (refer to Chapter 4) displays a decimal numeric code that translates to the module's hardware revision. The display format is [SRV=xx]. Table 3-3 presents the displayed codes and the corresponding module part numbers and revisions.

Table 3-3 Servo Module - Hardware Revision Matrix

T45 Displayed       Servo Module      CS Part     Etch
Revision SRV=xx     Part Number       Revision    Revision    Compatibility
00                  54-17769-01       Ax-Nx       E           RA90 only
01                  54-17769-01       Px-xx       F           RA90 only
03                  54-17769-02       Ax-         F           RA92-compatible

3.2.2 Preamp Control Module (PCM)

The PCM FRU is part of the HDA/carrier assembly, which is also an FRU. Figure 3-4 is a simplified block diagram of the PCM. The PCM performs the following operations:
• Decodes head select signals sent from the master to select the appropriate read/write head matrix chips (located inside the HDA), and the appropriate output from each matrix chip.
• Monitors unsafe read/write conditions.
• Provides differential write pulses to the preamplifiers.
• Passes through the HDA vendor type bits from the HDA to the master processor.
• Passes the type of format bits from the PCM switch pack to the master processor.

Two different PCM modules exist in the RA90/RA92 disk drive family. The two PCM types are electrically incompatible in the interconnect between the PCM and the internal HDA electronics. However, the PCMs are functionally compatible between the PCM and internal HDA and the ECM variants that may be attached. A physical mechanism prevents the use of an incompatible PCM with an HDA. Table 3-4 describes the PCM switch pack settings with regard to the type of PCM, HDA, and RA9x model.
Figure 3-4 Simplified block diagram of the PCM (CXO-2188A), showing the head select latch and decoders, the read data multiplexer, the write data switch, and the write current generator.

Table 3-4 PCM Switch Pack Setup

                                PCM SW Pack Settings*
PCM P/N         HDA P/N         S1-4  S1-3  S1-2  S1-1    Comments
54-17758-01     70-22951-01     0     0     0     0       RA90 long arm only
54-19724-01¹    70-27268-01     0     1     0     1       RA90 short arm
54-19724-01¹    70-27492-01     1     0     0     1       RA92 only
54-19724-01¹    70-27492-01     0     0     0     1       Incompatible setup²
54-19724-01¹    70-27492-01     1     1     0     1       Incompatible setup²
54-19724-01¹    70-27268-01     0     0     0     1       Incompatible setup²
54-19724-01¹    70-27268-01     1     1     0     1       Incompatible setup²

* 0 = ON = CLOSED, 1 = OFF = OPEN
¹ PCM spares shipped from logistics are configured by default to declare an incompatible situation. This forces the field person to properly configure the replacement PCM to indicate the proper HDA format type. The drive microcode uses the switch setting information to properly configure servo operations.
² Drive LED error code C0 signifies that the microcode has determined an incompatible situation between the hardware and/or microcode components of the drive configuration, or a hardware failure has caused the drive to believe the configuration is improper.

Functional microcode in the drive provides base level revision information concerning the PCM module. Drive utility T45 (refer to Chapter 4) displays a decimal numeric code that translates to the module's hardware revision. The display format is [PCM=xx].
Table 3-5 presents the displayed codes and the corresponding module part numbers and revisions.

Table 3-5 PCM Module - Hardware Revision Matrix

T45 Displayed       PCM Module        CS Part     Etch
Revision PCM=xx¹    Part Number       Revision    Revision    Compatibility
00                  54-17758-01²      Ex-Hx       E           HDA 70-22951-01 only
01                  54-19724-01²      Ax-         A           HDA 70-27268-01 and 70-27492-01

¹ Switch positions S1-3 and S1-4 of switch pack S1 determine the displayed PCM hardware revision.
² These modules have a mechanical interlock that prevents the inadvertent mating of electrically incompatible PCMs to the HDA.

There is a four-position switch pack on the PCM. Switch pack switches S1-3 and S1-4 determine the PCM hardware revision (not CS revision) through OCP display T45. Switches S1-1 and S1-2 are used to tell the drive functional microcode the format type written on the HDA. There are two planned format types: RA90-compatible and RA92-compatible.

A new HDA/carrier assembly FRU should have the switch pack set correctly by the manufacturing plant. If the PCM is defective, set the switch pack switches appropriately. Figure 3-5 shows the location of the switch pack on the PCM.

Figure 3-5 PCM Switch Pack Location (CXO-2963A). Callouts: PCM switch pack, preamp control module (PCM), PCM-ECM cable connector. PCM switch legend: O = OPEN/OFF = LOGICAL 1; C = CLOSED/ON = LOGICAL 0.

3.2.3 Head Disk Assembly and Carrier Assembly

Figure 3-6 is a simplified block diagram of the RA90/RA92 head disk assembly (HDA) and its relationship to the rest of the drive. The HDA consists of the following components:
• The spindle motor, spindle, and recording media
• The actuator motor to position the read/write heads
• The Hall sensors to monitor spindle speed
• The preamp/select chips
• The brake assembly
• The ground brush
• The positioner lock mechanism

Currently, there are three different HDAs in the RA9x disk drive products family. Two different PCMs are available for these three HDAs.
Table 3-6 is a compatibility matrix for HDA types, PCM types, and RA9x models.

Figure 3-6 Simplified block diagram of the HDA and its connections to the servo module, the I/O-R/W module, and the preamp control module (CXO-2189A).

Table 3-6 RA90/RA92 HDA Hardware Compatibility Matrix

HDA P/N and Type                   RA90            RA90            RA92            PCM P/N
                                   70-23899-01¹    70-23899-02     70-27490-01
70-22951-01, Long-arm RA90 HDA     Original*       Compatible      Incompatible    54-17758-01
70-27268-01, Short-arm RA90 HDA    Compatible      Original*       Incompatible    54-19724-01²
70-27492-01, Short-arm RA92 HDA    Incompatible    Incompatible    Original*       54-19724-01²

* Original = HDA type original to drive
¹ The RA90 disk drive was originally made from the base drive part number 70-23899-01. With the introduction of the short-arm HDA, the variant of the base part number for the RA90 disk drive was changed. The size and SDI disk topology of the 70-23899-01 and 70-23899-02 variant RA90 disk drives are identical.
Drive serial numbers are not duplicated between the 70-class numbers. Architecturally, the drives are identical. At the HDA FRU level, the short-arm HDA is electrically compatible with the original long-arm HDA. However, microcode compatibility issues must be watched.
² The PCM switch pack must be set to indicate the type of HDA.

3.2.4 Dual Outlet Blower Motor

The blower motor assembly provides drive cooling. In addition, the blower motor contains speed control circuitry to activate higher throughput if the ambient air temperature exceeds 23°C (75°F). If the drive is operating without problems at or below this temperature, blower speed is reduced for better acoustic levels.

3.2.5 Power Supply

The power supply provides the following voltages to RA90/RA92 disk drives:
• ±12 Vdc
• ±5.1 Vdc
• ±24 Vdc
• -5.2 Vdc

Normal power supply operation is indicated by the presence of a green Power OK LED located at the rear of the drive. Refer to Figure 3-7 for the location of the Power OK LED.

The power supply operates on any line frequency within the range of 47 Hz to 63 Hz. It is switch-selectable to either of two ranges: 120 Vac or 240 Vac.

CAUTION
If a unit has its voltage selector switch in the 120 Vac position and is plugged into 240 Vac, the power supply will be damaged. If a unit has its voltage selector switch in the 240 Vac position and is plugged into 120 Vac, it may work, but would be very sensitive to low line voltage.

Figure 3-7 Power Supply OK LED (CXO-2134B). Callouts: drive rear, circuit breaker, green LED (Power OK).

This power supply has two vendors, designated Vendor A and Vendor B. Power supplies from Vendor A have a serial number with a CX site code. Power supplies from Vendor B have a serial number with a KB site code. (Voltage markings on some power supplies may read 115/230 Vac.) The power supplies from both vendors are functionally identical and carry the same Digital part number.
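
The line-voltage CAUTION and the 47 Hz to 63 Hz frequency range in Section 3.2.5 can be summarized as a small pre-power-up check. The following Python sketch is illustrative only: the names `SelectorOutcome`, `selector_outcome`, and `line_frequency_ok` are hypothetical and do not correspond to any Digital tool, and the sketch assumes nominal line voltages of exactly 120 or 240 Vac, because the manual does not state tolerance bands.

```python
from enum import Enum

# Selector positions and frequency range as stated in Section 3.2.5.
SELECTOR_POSITIONS_VAC = (120, 240)
LINE_FREQUENCY_RANGE_HZ = (47.0, 63.0)


class SelectorOutcome(Enum):
    """Hypothetical labels for the outcomes described in the CAUTION."""
    MATCHED = "selector matches the line voltage"
    DAMAGE = "power supply will be damaged"
    MARGINAL = "may work, but very sensitive to low line voltage"


def selector_outcome(selector_vac: int, line_vac: int) -> SelectorOutcome:
    """Return the expected outcome for a selector setting and nominal line voltage."""
    for value in (selector_vac, line_vac):
        if value not in SELECTOR_POSITIONS_VAC:
            # Assumption: only the two nominal values are modeled here.
            raise ValueError(f"expected 120 or 240 Vac, got {value}")
    if selector_vac == line_vac:
        return SelectorOutcome.MATCHED
    if selector_vac == 120:
        # Selector at 120 Vac, unit plugged into 240 Vac.
        return SelectorOutcome.DAMAGE
    # Selector at 240 Vac, unit plugged into 120 Vac.
    return SelectorOutcome.MARGINAL


def line_frequency_ok(frequency_hz: float) -> bool:
    """True when the line frequency is within the 47 Hz to 63 Hz range."""
    low, high = LINE_FREQUENCY_RANGE_HZ
    return low <= frequency_hz <= high


if __name__ == "__main__":
    print(selector_outcome(120, 240).value)
    print(line_frequency_ok(50.0))
```

A check of this kind only restates the manual's warning; the selector switch on the power supply itself must still be inspected before power is applied.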
3.2.6 Drive Functional Microcode

The drive functional microcode can be field loaded and upgraded using the OCP microcode update port. ROM-based utility programs contained on the ECM module (I/O-R/W) allow microcode loading. Table 3-7 is a compatibility matrix for microcode cartridges, microcode levels, and ECM and HDA FRUs.

Table 3-7 RA90/RA92 Microcode Compatibility With Drive FRUs

Column headings: Microcode Cart. P/N and Rev; Microcode Level; ECM FRU P/N 70-22942-01; ECM FRU P/N 70-22942-02; HDA FRU P/N 70-22951-01; HDA FRU P/N 70-27268-01; HDA FRU P/N 70-27492-01

70-24432-02 A1 8 Yes Yes 70-24432-02 A1 9 Yes 70-24432-02 B1 10 Yes No¹ No¹ No¹ 70-24432-02 C1 11 Yes Yes 70-24432-02 D1 13 Yes No² No² Yes No No No No No No No No No No 70-27950-01 A1 20 Yes Yes No Yes Yes 70-27950-01 B1 25 Yes Yes Yes Yes Yes Yes Yes

¹ Results in LED Code 13
² Results in LED Code E2

NOTE
Microcode compatible with an ECM FRU means the code can be loaded into the ECM FRU without error and will function, provided there is a compatible HDA with the appropriate PCM and the PCM switch settings are correct. (This does not apply to Hard faults, because the microcode cannot be loaded into the ECM.) Microcode compatible with an HDA FRU means that the code (when loaded into a compatible ECM) will support the HDA identified in Table 3-7.

To determine total compatibility, you must verify the following:
- Code compatibility to ECM (Table 3-7)
- Code compatibility to HDA (Table 3-7)
- ECM compatibility to HDA (Table 3-1)
- PCM and HDA compatibility (Table 3-4)
- PCM switch pack setup (Table 3-4)

3.2.7 OCP Functions

The operator control panel (OCP) shown in Figure 3-8 functions as the interface to the RA90/RA92 disk drive. The OCP performs the following functions:
• Selects and displays the unit address.
• Selects Run, Write Protect, Port A, and Port B.
• Displays fault indication and error codes.
• Selects tests in the test mode.
• Controls the drive software update process.
• Communicates with the RA90/RA92 master processor.
• Monitors momentary contact switches for closure.

Figure 3-8 RA90/RA92 OCP (CXO-2962A). Callouts: four-character alphanumeric display (unit number), Test switch, and state LED indicators. NOTE: RA90 part no. 74-35109-02; RA92 part no. 74-39769-01.

To execute or select these functions, you must be familiar with the following OCP features (refer to Figure 3-8):
• Six input switches (Run, Fault, Write Protect, Port A, Port B, and Test).
• Seven LED indicators (Ready, Run, Fault, Write Protect, Port A, Port B, and Test).
• A four-character alphanumeric display.
• A software update port (refer to Chapter 7).

3.3 RA90/RA92 Operating Modes

RA90/RA92 disk drives operate in three setup modes: normal, fault display, and test. The following sections describe the function of each of these modes.

3.3.1 Normal Mode Setup

The normal mode setup is the usual operating mode of the RA90 and RA92 disk drives. Switch selection during normal operation usually consists of the Run switch, Write Protect switch (for normal write protection), and Port A or Port B switch. No Fault or Test indicators are lit. The switch states are displayed in the alphanumeric display, and the state of the drive relative to the controller is displayed in the LED indicators.

In the normal operating mode:
1. Selecting the Run switch causes an R to appear in the OCP display and causes the drive to spin up. Additionally, the Run LED indicator lights. The Ready LED indicator lights once the drive is up to speed.
2. Selecting the Port A or Port B switch causes an A or B to appear in the OCP display and logically makes the drive available to the controller.
3. Selecting the Write Protect switch logically write protects the drive and lights the Write Protect LED indicator.
4. Selecting the Fault switch:
   • (Without a fault indicator) causes a 2-second OCP lamp test.
   • (With a fault indicator) causes an error code to display. Selecting the Fault switch a second time (with a fault indicator) clears the fault. (Refer to Section 3.3.2.)
5. Selecting the Test switch:
   • (With the Port A or Port B switch selected) causes a 2-second display of the unit address. (Refer to Section 3.4.1 for information on the alternate unit address display mode.)
   • (Without the Port A or Port B switch selected) causes the drive to enter the test mode. (At this time the Ready LED is extinguished.)

Table 3-8 details operator actions and the result of OCP switch selection(s) in the normal mode. Power-up OCP functions and normal switch selection functions are covered.

Table 3-8 Power-Up: Normal Mode Operations

Operator Action    OCP Result    Drive Function
<Power-up>         [WAIT]        Drive is running power-up diagnostics
Default            [0000]        Unit number displayed may be something other than zero
<RUN>              [R. •• ]      Spinup command issued to spindle
<A>                [R.A.]        Port A is enabled
<B>                [R.AB]        Port B is enabled

3.3.2 Fault Display Mode Setup

The fault display mode can only be entered if the Fault indicator is lit; otherwise, selecting the Fault switch causes a 2-second OCP lamp test. To enter the fault display mode, select the Fault switch. An error code is displayed in the format shown in Figure 3-9. To exit the fault display mode and clear the fault, select the Fault switch a second time.

NOTE
Hard faults will not clear.

Figure 3-9 shows a characteristic alphanumeric fault display error code. Figure 3-10 is a fault display mode flowchart.

Figure 3-9 OCP Fault Display Error Code Example (CXO-2190A)

Figure 3-10 Fault Display Mode Flowchart (CXO-2191B). Flowchart notes: from normal mode, selecting the Fault switch with no fault present runs the OCP lamp test (2 seconds). With a fault present, selecting the Fault switch displays the error code*, and selecting the Fault switch again clears the fault. *NOTE: any combination of legal alphanumeric error codes (hex).

3.3.3 Test Mode Setup

You must enter the test mode to set the RA90 or RA92 disk drive unit address or to run resident diagnostic tests. In this mode, Port A and Port B switches have the function of selecting both the unit address numbers and test numbers. In addition, the port switches are used to abort running diagnostics. The Write Protect switch starts the tests and the Port A or Port B switch stops selected tests.

The test mode is characterized by three displays. Figure 3-11 shows an OCP after test selection is made. Figure 3-12 shows a display while the test is running.

Figure 3-11 OCP Display After Test Selection (CXO-2192A). An asterisk (*) indicates a flashing display.

Figure 3-12 OCP Display While Running Test (CXO-2195A). Example displays: the selected test number, the (START) display, and the (COMPLETE) display. An asterisk (*) indicates a flashing display.

3.4 Programming the Drive Unit Address

The unit address can be set once power has been applied to the drive. You must set the drive unit address before placing the drive on line. The RA90 or RA92 unit address is programmable from 0 to 4094. (Note that the operating system or subsystem type can limit the unit address range.)

Use the following procedure to set the drive unit address. (Refer to Figure 3-13 for a flowchart of this procedure.)
1. Select the Test switch. (The Test LED indicator lights and a unit address (if previously programmed) is displayed; otherwise, zeros are displayed.)
2. Select the Port A switch for the ones position. (Position zero blinks.)
3. Select the Port B switch. (Position zero increments 1 through 9 every time Port B is selected.)
4. Select the Port A switch for the tens position. (Position one blinks.)
5. Select the Port B switch.
(Position one increments 1 through 9 every time Port B is selected.)
6. Select the Port A switch for the hundreds position. (Position two blinks.)
7. Select the Port B switch. (Position two increments 1 through 9 every time Port B is selected.)
8. Select the Port A switch for the thousands position. (Position three blinks.)
9. Select the Port B switch. (Position three increments 1 through 4 every time Port B is selected.)
10. Select the Test switch to exit.
At this point, the OCP prompts you to verify that you want to change the unit address. The following prompt scrolls through the OCP display: CHG UNT I {? [N]}
• If you do not want to change the unit address, select the Test switch a second time. The drive returns to normal mode.
• To change the unit address:
1. Toggle the Port B switch. CHG UNT , {? [Y]} displays.
2. Select the Test switch. The old unit address is overwritten with the new address. The new unit address is displayed in the OCP.
NOTE The new unit address is written to EEPROM and is not lost if the drive loses power.

Figure 3-13 Unit Address Selection Flowchart
(Flowchart notes: deselect Port A and Port B to set the unit address; the confirmation prompt is a scrolling display that can be stopped by selecting the Run switch; select [N] and exit with the Test switch to save the old address, or select [Y] and exit with the Test switch to save the new address; * indicates a flashing readout.)

3.4.1 Alternate Unit Address Display Mode
Future RA90 and RA92 disk drives will incorporate a microcode enhancement that will provide an alternate unit address display mode. To display the unit address, refer to Figure 3-14 while performing the following procedure:
1. The OCP display shows an R, A, and/or B.
2. While in normal mode, select the Port A and/or Port B switch.
3. Select the Test switch. At this point, the Run, Fault, Write Protect, Port A, and Port B switches are disabled.
4. The unit address is displayed until:
• The Test switch is deselected.
• Power is cycled.
• An SDI HARD INIT occurs, or the drive forces a hard initialization due to a fatal error.
Any of these conditions will clear the OCP from the alternate display mode.

Figure 3-14 Alternate Unit Address Display Mode Flowchart

4 Drive-Resident Diagnostics and Utilities

4.1 Introduction
This chapter describes drive-resident diagnostic fault detection, power-up and idle loop diagnostic routines, and sequenced or chained diagnostics. The RA90/RA92 drive-resident diagnostics and utilities are described individually.
These drive-resident diagnostics test for and detect errors in the following field replaceable units (FRUs):
• Electronic Control Module (ECM) (input/output-read/write (I/O-R/W) and servo modules)
• Preamp Control Module (PCM)
• Head Disk Assembly (HDA)

4.2 Power-Up and Idle Loop Diagnostics
Resident diagnostics execute any time the drive is powered up or the master processor is reset. Additionally, diagnostic routines execute during idle loop with the drive spun up or down. The Test LED, when lit, indicates the drive is in idle loop testing. The following sections describe power-up (reset) and idle loop diagnostic sequences.

4.2.1 Power-Up (Hardcore) Diagnostics
The following hardcore tests are run at power-up or upon reset of the master processor (refer to Section 4.7 for a description of each test):
• Master CPU test (POR)
• Master ROM test (T01)
• Master RAM test (POR)
• Master timer test (T02)
• Serial communication test (SCI) (POR)
• Servo data bus loopback test (T03)
• Servo RAM test (POR)

4.2.2 Idle Loop Tests (Drive Spun Down)
Idle loop is defined as the drive being off line to the controller.
The following sequence is executed every 30 seconds during idle loop (refer to Section 4.7 for a description of each test):
• Master ROM test (T01)
• Master timer test (T02)
• Servo data bus loopback test (T03)
• Head select test (T06)
• Sector/byte counter test (T07)
• SDI loopback test (internal) (T08)

4.2.3 Idle Loop Tests (Drive Spun Up)
The following tests are run during idle loop with the drive spun up (refer to Section 4.7 for a description of each test):
• Master ROM test (T01)
• Master timer test (T02)
• Servo data bus loopback test (T03)
• Head select test (T06)
• Gray code (track counter) test (T29)
• Guardband test (T30)
• Incremental seek test (quick verify mode) (T31)
• Random seek test (quick verify mode) (T33)

4.3 Sequence Diagnostics
A number of tests are sequenced together to form a chain of tests. The test [chain] numbers and the individual test numbers that make up the chain are listed here. An example of the information seen in the OCP alphanumeric display is also included. Refer to Section 4.7 for a description of each test.
• T00 and T23 are the same when the drive is spun down, and include: T01 T02 T03 T06 T07 T08
Duration: 12 seconds
• T00 and T22 are the same when the drive is spun up, and include: T01 T02 T03 T06 T29 T30 T31 T33 T14 T15 T16
Duration: 7:10 minutes
The following is an example of the information seen in the alphanumeric display as the drive sequences through a chain:
1. [T 00] Enter test T00 from the OCP front panel.
2. [S 00] Start T00.
3. [S 01] T01 starts.
4. [C 01] T01 completes.
5. [S 02] T02 starts.
6. [C 02] T02 completes. (and so on until each diagnostic in the chain is completed)
7. [T 00] Concludes with this display and the least significant digit (LSD) blinking.
The OCP display is read from left to right with the LSD on the right side.
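The display sequence listed above can be sketched in a few lines of Python. This is only an illustration of the pattern in the example (chain selected, chain started, then a start and a complete readout per test, then back to the chain number); the function name `chain_display` and the list `SPUN_DOWN_CHAIN` are hypothetical and are not part of the drive microcode. Whether a completion readout for the chain itself appears is not stated in the example, so none is generated here.

```python
def chain_display(chain, tests):
    """Return the OCP readouts shown while a test chain runs.

    Mirrors the example sequence in Section 4.3: [T nn] when the chain is
    loaded, [S nn] when it starts, [S tt]/[C tt] for each member test, and
    [T nn] again when the chain concludes.
    """
    frames = [f"[T {chain:02d}]", f"[S {chain:02d}]"]
    for test in tests:
        frames.append(f"[S {test:02d}]")  # member test starts
        frames.append(f"[C {test:02d}]")  # member test completes
    frames.append(f"[T {chain:02d}]")     # chain concludes (LSD blinks on the real OCP)
    return frames


# Member tests of chain T00 when the drive is spun down (from the list above).
SPUN_DOWN_CHAIN = [1, 2, 3, 6, 7, 8]

if __name__ == "__main__":
    for frame in chain_display(0, SPUN_DOWN_CHAIN):
        print(frame)
```

Reading the output top to bottom gives the order in which the readouts would appear on the four-character display.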
The majority of tests are of a relatively short duration, with the following exceptions:
• T31 (2.5 minutes; indefinite when standalone)
• T32 (1 minute; indefinite when standalone)
• T33 (55 seconds; indefinite when standalone)
Additional test chains are:
• T18: T01, 02, 03, 06 (8 seconds total)
• T19: T14, 15, 16 (20 seconds total)
• T20: (4 seconds if spun down; 2 seconds if spun up)
• T21: T03, 29, 30, 31 (4.5 minutes), 32, 33 (7:10 minutes total); error if spun down
• T22: Same as T00 except T31 (4.5 minutes) (7 minutes total)
• T23: T01, 02, 03, 06, 07, 08 (20 seconds total)

4.4 Standard OCP Displays Indicating Procedural Problems
If you attempt to load and run a nonexistent test, [INVL] (invalid) displays in the OCP, followed by an error code. For example, if you attempt to run T10 (an invalid test number), the following occurs:
1. [T 10] (Display)
2. [S 10] (Display)
3. [INVL] (2 seconds; indicates invalid test)
4. [C 10]
5. [T 10]
No error code is generated. To continue, simply select another diagnostic.
If you attempt to run a diagnostic while the drive is faulted and that particular diagnostic cannot be run under fault conditions, the OCP displays [NRUN]. For example, read/write or seek tests cannot be run while the drive is faulted. However, ROM or RAM tests can be run.
If you attempt to run a test that requires the drive be spun up (but the drive is spun down), the following occurs:
1. [T 14] Load T14.
2. [S 14] Start T14.
3. [T 14] (with fault light)
4. Select the Fault switch.
5. [E CA] error code indicates the drive must be spinning for the test to run successfully. Unless otherwise indicated, this is the format for all errors.
Select the Fault switch again to clear the fault and continue.
If you attempt to run a test that requires the drive be spun down (but the drive is spun up), the following occurs:
1. [T 07] Load T07.
2. [S 07] Start T07.
3. [T 07] (with fault light)
4. Select the Fault switch.
5. [E 7B] Invalid-test-while-drive-is-spinning error. This is the format for all tests that are invalid while the drive is spinning.
Select the Fault switch again to clear the fault and continue.
Some diagnostic test numbers call up other tests. These are displayed in the OCP after the diagnostic starts. An example of this is T24. The following is displayed in the OCP:
1. [T 24] Load T24.
2. Start test.
3. [S 63] See test T63.
4. After the head(s) are selected, select Write Protect.
5. [T 31] Loaded by the drive.
6. [S 31] See test T31.
The reverse is not true. T63 does not start T24.

4.5 Software Jumper
References to a software jumper are frequently made throughout the discussion of diagnostics. To use the software jumper, simply select the Run/Stop switch within 1.5 seconds of starting a diagnostic requiring the jumper's use.
CAUTION Do not use the jumper unless it is required. Valuable drive component information can be accidentally lost. Use the jumper only when instructed to do so.

4.6 Temperature's Effect on Drive Performance
The RA90/RA92 drive utilities T36, T38, and T39 measure various seek time parameters. Compare measured times to drive specifications in cases where seek time is in question.
At the areal densities of the RA90/RA92 disk drives, variations of mechanical responses within the HDA mechanical structures change significantly over the wide temperature ranges acceptable to the drive. To control these variations and their impact on the subsystem, the drive monitors and compensates its seek profile to optimize the seek time performance. This compensation is a dynamic process that assures top seek performance of the disk drive.

4.7 Diagnostics Descriptions
This section describes each of the diagnostic tests and utilities resident in the RA90/RA92 disk drive. Tests are listed by a test number (where applicable), a name,
an explanation of how the test is invoked, and a test description. Conventions include the following:
• (T00): test number
• (POR): power-up or reset
• (SDI): initialization performed by the controller over the SDI cable
• ([0000]): items enclosed in square brackets represent the OCP alphanumeric display
NOTE Some diagnostics implement a scrolling display pattern. To stop the scrolling display pattern, select the Run switch; this halts the display until you are ready to continue. Select the Run switch again to continue the display. Some tests run for several seconds and then have results to display. These tests stop the scrolling display and send an asterisk to the display. Press the Run switch to display test results.

Master CPU Test (POR)
The Master CPU test verifies the basic functions of the drive master processor. Accumulator functions, conditional codes, and other MCU chip functions are tested.

Master RAM Test (POR/SDI)
The Master RAM test runs at power-up only. It verifies the master processor internal and static RAM. The test reads and writes, then reads each RAM location again to verify data integrity of the component. The test is executed in both forward and reverse directions.

Serial Communications Interface (SCI) Test (POR)
The SCI test checks the master processor serial communication interface by looping a data pattern from the serial output back to the serial input. It compares data out to data in for integrity. Additionally, the serial port is tested for overrun error detection and overrun recovery. The test simulates OCP MCU communication with the master MCU.

Servo RAM Test (POR)
The Servo RAM test checks the servo processor RAM by writing a pattern of ones and zeros through RAM. The entire 16 Kbytes of RAM is tested.
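A pattern test of the kind described for the RAM tests above can be sketched as follows. This is a generic illustration, not the drive's actual routine: the manual does not state which bit patterns are used, so the alternating patterns 0x55 and 0xAA are an assumption, as are the function name `ram_pattern_test` and the use of a Python `bytearray` to stand in for the 16 Kbytes of servo RAM.

```python
def ram_pattern_test(ram, patterns=(0x55, 0xAA)):
    """Write each pattern through RAM and read it back.

    The sweep is done in both forward and reverse address order.
    Returns the first failing address, or None if every location verifies.
    The patterns are assumed values chosen so that every bit is exercised
    as both a one and a zero.
    """
    size = len(ram)
    for pattern in patterns:
        for order in (range(size), range(size - 1, -1, -1)):
            for addr in order:          # write pass
                ram[addr] = pattern
            for addr in order:          # verify pass
                if ram[addr] != pattern:
                    return addr
    return None


if __name__ == "__main__":
    servo_ram = bytearray(16 * 1024)    # stand-in for the 16 Kbytes under test
    print(ram_pattern_test(servo_ram))  # None means no failing location was found
```

Using complementary patterns is a common way to catch bits stuck at either level; a real implementation would address the hardware directly rather than a byte array.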
(T01) Master ROM Test
The Master Processor ROM test verifies the master processor internal ROM, EEPROM, and the associated address decode logic. A checksum is done on each ROM. Next, the test verifies that the consistency codes match between the MCU ROM and the master processor EPROM and EEPROM. If a failure occurs, the master processor attempts to display an error code to the OCP.

(T02) Master Timer Test (POR/SDI)
The Master Timer test verifies the output compare timer in the master processor by checking the Output Compare Flag (OCF) for stuck bits. Additionally, the test operates the timer in polling and interrupt modes. In polling mode, the output compare register generates a compare every 50 ms and ensures that the OCF sets within 60 ms. In interrupt mode, the output compare register generates a compare every 50 ms and checks for one interrupt within a 75 ms period.

(T03) Servo Data Bus Loopback Test (POR/SDI)
The Servo Data Bus Loopback test checks the data bus interface between the I/O-R/W module and the servo module (ECM) by rotating a single bit through each bit position on the servo data bus. The data pattern is written to the GASP register #1 and read back through GASP register #7.

(T04) Drive S/N Bus Test (POR/SDI)
The Drive S/N Bus test checks the drive serial number bus (the rear flex cable between the HDA and the servo module with the PCM switch pack). The three drive ID ports hardwired on the rear flex cable assembly are read and concatenated into one 20-bit binary encoded serial number. Bits 19 and 18 represent the manufacturing plant code of the drive (CX or KB). Bits 17 through 0 are the alphanumeric serial number (00000-Z9999).
The numbering scheme is displayed using the following:

Encoded Serial Number Displayed    Decimal Drive Serial Number
00000-99999                        0-99,999
A0000-A9999                        100,000-109,999
B0000-B9999                        110,000-119,999
C0000-C9999                        120,000-129,999
D0000-D9999                        130,000-139,999
E0000-E9999                        140,000-149,999
F0000-F9999                        150,000-159,999
H0000-H9999                        160,000-169,999
J0000-J9999                        170,000-179,999
K0000-K9999                        180,000-189,999
L0000-L9999                        190,000-199,999
M0000-M9999                        200,000-209,999
N0000-N9999                        210,000-219,999
P0000-P9999                        220,000-229,999
R0000-R9999                        230,000-239,999
S0000-S9999                        240,000-249,999
T0000-T9999                        250,000-259,999
U0000-U9999                        260,000-269,999*
V0000-V9999                        270,000-279,999
W0000-W9999                        280,000-289,999
Y0000-Y9999                        290,000-299,999
Z0000-Z9999                        300,000-309,999

*NOTE U2143 is the maximum serial number that can be coded for the KB manufacturing site because only the bottom 18 binary bits are used for the serial number range.

The test passes or fails based on the following valid and invalid bit-encoded binary information:

VALID DRIVE S/N CODES
Bit 19    Bit 18    Meaning
0         0         CXO-built drive (serial number 1 through 262,143)
0         1         CXO-built drive (serial number 262,144 through 309,999). Limitation is based upon the number of alphabetic characters available.
1         0         KBO-built drive (serial number 1 through 262,143)

INVALID DRIVE S/N CODES
Bit 19    Bit 18    Min Binary Value, Bits<17:00>    Max Binary Value, Bits<17:00>
0         1         001011101011110000               111111111111111111
1         1         000000000000000000               111111111111111111

NOTE Do not alter these switches in the field unless you are instructed to do so during an ECO/FCO installation.

(T06) Head Select Test
The Head Select test checks the SDI gate array (SGA) head select register for stuck-at conditions. The test writes a head select pattern to an SGA internal register and verifies the pattern by reading it back through another SGA internal register.
Each head select pattern is clocked to the preamp control module (PCM) verifying the correct head select chip can be enabled.

(T07) Sector/Byte Counter Test
The Sector/Byte Counter test checks the sector preset by writing and reading each bit in the sector preset register. The test checks the byte preset counter by presetting the byte counter. A full counting sequence is needed to increment the sector count by one. Finally, the sector/byte counter is checked with the actual preset values used in the functional code. A diagnostic clocking signal is used.

(T08) SDI Loopback Test (Internal)
The internal SDI Loopback test is executed with the TSID gate array in loopback mode. The State Frame part of this test asserts the state bits (RDY, ATN, R/W, SEC) in the Real Time Drive State (RTDS) frame and checks the corresponding state bits (RDY, WRT, RD, 00) in the Real Time Controller State (RTCS) frame for accuracy. The Response Serializer part of this test sends framing codes (START, CONTINUE, END) by way of the CMDREG register, along with response data (a pattern) by way of the RSPDAT register. The test checks the correct framing codes by way of the INSTR2 register and the correct command data through the CMDATA register. This test is executed on Ports A and B.

(T09) SDI Loopback Test (External)
The external SDI Loopback test is the same as the internal SDI Loopback test except the SDI signals are looped back via connectors at the end of the SDI cables. See Figure 4-1.

(T14) Read-Only Test
The Read-Only test compares prerecorded data information from cylinder 2659 to the data read by each head. The data pattern is different for each head. If the compare fails, an error code is generated. In addition, if five off-track errors are detected while reading with any one head, an error is generated.
Errors are analyzed in the following manner:
• A sector is considered bad if the same sector fails to read the correct data three out of five times.
• A head is considered bad if the same head contains nine bad sectors.
If no errors are detected during this test, a compare error is induced to ensure that the I/O chip compare circuitry can detect a compare error.

(T15) Write/Read Test
The Write/Read test executes only after the read-only (T14) test has passed. This test writes and reads dedicated cylinder 2660 using all read/write heads. Two patterns are used during this test:
1. First, all the heads are written with an all-zeros-plus-a-SYNC-BIT pattern and read to verify that the data compares. If there are no errors, a NO SYNC detection test is run verifying that the I/O sync detection circuitry is working correctly and that it can detect a NO SYNC error.
2. Second, a ones-plus-a-SYNC-BIT pattern is written to cylinder 2660 and read back using each data head. Data is compared to ensure data integrity.

Figure 4-1 Using Loopback Connectors

(T16) Read/Write Force Fault Test
Read/write safety detection circuits are tested by software and hardware routines that force read/write faults.

(T17) Read-Only Cylinder Formatter
Read-only cylinder 2659 is written with a zeros-plus-a-SYNC-BIT pattern (all heads) and read back to verify data. Then another pattern is written and read back, and the data is compared for accuracy. This cylinder is not formatted by any other subsystem formatter.
NOTE Use a software jumper to execute this utility. This protects the stored information from unintentional clearing. Refer to Section 4.5.
Reformatting this cylinder is sometimes necessary in the field.
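The error-analysis rules given earlier for the Read-Only test (T14) can be expressed as two small checks. The sketch below is illustrative only: the function names `sector_is_bad` and `head_is_bad` are hypothetical, and it assumes that "three out of five" means at least three failed reads and that "nine bad sectors" means nine or more.

```python
READS_PER_SECTOR = 5       # each sector is read five times
BAD_SECTOR_FAILURES = 3    # assumed: three or more failed reads marks the sector bad
BAD_HEAD_SECTORS = 9       # assumed: nine or more bad sectors marks the head bad


def sector_is_bad(read_ok):
    """read_ok holds five booleans, True where the read returned correct data."""
    if len(read_ok) != READS_PER_SECTOR:
        raise ValueError("expected exactly five read attempts per sector")
    failures = sum(1 for ok in read_ok if not ok)
    return failures >= BAD_SECTOR_FAILURES


def head_is_bad(sectors):
    """sectors is a list of per-sector read results for one head."""
    bad = sum(1 for attempts in sectors if sector_is_bad(attempts))
    return bad >= BAD_HEAD_SECTORS


if __name__ == "__main__":
    good = [True] * 5
    marginal = [True, False, True, False, True]   # two failures: still good
    failing = [False, False, True, False, True]   # three failures: bad
    print(sector_is_bad(marginal), sector_is_bad(failing))
    print(head_is_bad([failing] * 9 + [good] * 20))
```

A sector with two failed reads out of five is still treated as good, which is why intermittent read errors alone do not condemn a head.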
(T18) Hardcore Sequence Test
This sequence diagnostic consists of T01, 02, 03, and 06. Duration: 20 seconds. Drive may be spun up or down.

(T19) Read/Write Sequence Test
The drive must be spun up to run the Read/Write sequence test. This sequence diagnostic consists of T14, 15, and 16. Duration: 25 seconds.

(T20) Servo Spinup Sequence Test
See T03.

(T21) Total Servo Sequence Test
The drive must be spun up to run the Total Servo Sequence test. This sequence diagnostic consists of T03, 29, 30, 31, 32, and 33. Duration: 4.5 minutes.

(T22) Total Drive Sequence Test (Spinning)
The drive must be spun up to run this test. This sequence diagnostic consists of T01, 02, 03, 06, 29, 30, 31, 33, 14, 15, and 16. Duration: 7 minutes.

(T23) Total Drive Sequence Test (Spun down)
The drive must be spun down to run this test. This sequence diagnostic consists of T01, 02, 03, 06, 07, and 08. Duration: 20 seconds.

(T24) Head Select and One Seek Test Sequence
See T63.

(T28) Drive-Sensed Temperature Display Utility
This utility was implemented with version 25 of the drive microcode to display the drive-sensed temperature in degrees Fahrenheit, in a scrolling display on the OCP. Version 26 of the microcode displays this temperature in degrees Fahrenheit and Celsius. The OCP scrolling display is as follows:
[*TEMP=xxxF/xxC*]

(T29) Gray Code (Track Counter) Test
The Gray Code test checks that the correct gray code is generated from the two least significant bits of the track counter as the drive seeks from cylinder 0 to 3 and 3 to 0. This test is executed on the dedicated servo surface only.

(T30) Guardband Test
The Guardband test checks the drive's ability to find inner and outer guardbands during seeks to these areas.
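As an illustration of the check described under T29 above, the sketch below derives a 2-bit Gray code from the two least significant bits of a track (cylinder) number and confirms that successive cylinders differ in exactly one bit. It assumes the standard reflected binary Gray code; the manual does not state the actual encoding used on the servo surface, and the names `gray2` and `sweep_is_valid` are hypothetical.

```python
def gray2(cylinder):
    """Reflected binary Gray code of the two low bits of the track counter."""
    low = cylinder & 0b11
    return low ^ (low >> 1)


def sweep_is_valid(cylinders):
    """True if each step in the sweep changes exactly one Gray code bit."""
    codes = [gray2(c) for c in cylinders]
    return all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))


if __name__ == "__main__":
    for cyl in range(4):
        print(cyl, format(gray2(cyl), "02b"))
    print(sweep_is_valid([0, 1, 2, 3]), sweep_is_valid([3, 2, 1, 0]))
```

The single-bit-change property is what makes a Gray code useful for track counting: a read taken while crossing a track boundary is off by at most one track.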
(T31) Incremental Seek Test
The Incremental Seek test exercises the servo by seeking between two cylinders using an incremental seek pattern. The starting cylinder, ending cylinder, and incremental value can be default or user defined. Default seek parameters are: starting cylinder 0, ending cylinder 2655 (last data cylinder), and an incremental value of 1. An example of the seek algorithm using the default seek parameters:
BEG: 0-1-2-3-4-5- ... -2653-2654-2655 :END

(T32) Toggle Seek Test
The Toggle Seek test does repetitive seeks between two cylinders. The starting and ending cylinders can be user defined or default cylinder addresses. Default seek parameters are: starting cylinder 0 and ending cylinder 2655 (last data cylinder). An example of the seek algorithm using the default seek parameters:
BEG: 0-2655-0-2655-0-2655- etc. :END

(T33) Random Seek Test
The Random Seek test does repetitive seeks between two cylinders. The starting and ending cylinders can be user defined or default cylinder addresses. Default seek parameters are: starting cylinder 0 and ending cylinder 2655 (last data cylinder).

(T34) Tapered Seek Test
The Tapered Seek test exercises the servo by seeking between two cylinders using a tapered seek pattern. The pattern starts at the cylinder with the longest stroke and ends at the cylinder with the shortest stroke. The starting and ending cylinders can be user defined or default cylinders. Default seek parameters are: starting cylinder 0 and ending cylinder 2660 (diagnostic write cylinder).
This example has the reference cyl=0 and ending cyl=2660:
BEG: 0-2660-0-2659-0-2658-0-2657-0-2656- etc. 0-6-0-5-0-4-0-3-0-2-0-1-0 :END
This example has the reference cyl=2660 and ending cyl=2660:
BEG: 2660-0-2660-1-2660-2-2660-3-2660-4 etc. 2660-2658-2660-2659-2660 :END
This example has the reference cyl=1330 and ending cyl=2660:
BEG: 1330-2660-1330-2659-1330-2658 etc. 1330-1332-1330-1331-1330 :END
BEG: 0-1330-1-1330-2-1330-3-1330-4 etc. 1327-1330-1328-1330-1329-1330 :END

4.7.1 Seek Timing Tests
The following diagnostics are classified as seek timing tests. Seek timing tests can be executed through the OCP or through the SDI level 2 DIAGNOSE command.
At the completion of a timing test, position three is blank, positions two and one contain a timing test acronym (MH, MX, AV, HD), and position zero contains an asterisk (*). At this point, the results can be displayed.
A scrolling message display reports the test results to the user. The message is scrolled, one character at a time, starting at the right side of the OCP and continuing off to the left side of the OCP. The Run switch is used to start and stop the scrolling display by pressing it once to start the display, and once to stop the display.
All the timing tests use a 1-microsecond clock to calculate seek times. Because of this, the short seek and head switch times are not as accurate as the long seek times.

(T36) Minimum Seek Timing Test
This test executes the minimum seek timing algorithm and displays the results of the test in the OCP. Test time is approximately 75 seconds. The following scrolling message format is used to display test results:
[MIN TIM FWD=xx.xMS] [MIN TIM REV=xx.xMS]
where xx.x is the seek time (in milliseconds).
The minimum seek time is defined as the average of 2655 single cylinder seeks (forward and reverse). This test uses the default incremental seek pattern.
NOTE If the time exceeds 99.9, the decimal point is shifted one digit to the right. The OCP displays [999].

(T38) Average Seek Timing Test
This test executes the average seek timing algorithm and displays the test results to the OCP.
This test takes 5-7 minutes to complete. The following message is scrolled across the OCP display:
[AVG TIM FWD=xx.xMS] [AVG TIM REV=xx.xMS]
where xx.x is the seek time (in milliseconds).
The average seek time is defined as the average of 512 one-third-length seeks. For the RA90 disk drive, the seek length is 855 cylinders. For the RA92 disk drive, the seek length is 1035 cylinders.
Average seek time: < 21 milliseconds for RA90.
Average seek time: < 19 milliseconds for RA92.

(T39) Head Switch Timing Test
This test executes the head switch timing algorithm and displays the test results to the OCP. This test takes approximately 2 seconds to run. The following message is scrolled across the OCP display:
[HD SWT TIME=xx.xMS]
where xx.x is the head switch time (in milliseconds).
The head switch time is defined as the summation of all possible head switches divided by the total number of head switches.

(T40) Update Cartridge Utility (Spun Down)
The drive must be spun down to run the Update Cartridge utility. This internal microcode update utility is used in the field to update the following internal drive microcode functions:
• Diagnostics microcode
• Servo microcode
• Functional microcode
New microcode is loaded in the following sequence:
1. Load update cartridge into update port.
2. Load test T40. (Drive must be spun down.)
3. Start test T40.
The following occurs in the OCP display once this test has begun (S = start, P = pass, C = complete):
• [S 40] (2 seconds).
• [P 1] (20 seconds) Pass one checks PROM to be loaded.
• [P 2] (20 seconds) Pass two writes the new code into the even pages in EEPROM.
• [P 3] (20 seconds) Pass three writes the new code into the odd pages in EEPROM.
• [C 40] (1 second) Update is complete.
• [WAIT] (10 seconds) Exits test mode and goes through power-up hardcore sequence.
• [0000] Returns to display the drive unit address.
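Adding up the per-display durations listed above gives a rough figure for how long a T40 microcode update takes before the unit address reappears. The short sketch below does that arithmetic; the table name `T40_STEPS` and the function `total_seconds` are illustrative names only, and the durations are the approximate values quoted in the list.

```python
# (OCP display, approximate seconds shown) taken from the T40 list above.
T40_STEPS = [
    ("[S 40]", 2),    # test starts
    ("[P 1]", 20),    # pass one: check the PROM to be loaded
    ("[P 2]", 20),    # pass two: write even EEPROM pages
    ("[P 3]", 20),    # pass three: write odd EEPROM pages
    ("[C 40]", 1),    # update complete
    ("[WAIT]", 10),   # power-up hardcore sequence
]


def total_seconds(steps):
    """Sum the display durations for a sequence of (display, seconds) pairs."""
    return sum(seconds for _, seconds in steps)


if __name__ == "__main__":
    print(total_seconds(T40_STEPS))  # 73
```

In other words, allow a little over a minute for the update once T40 has been started.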
(T41) Display Error Log Errors
This utility displays the RA90/RA92 drive-resident error log. When initiated, it first verifies the integrity of the error log by reading the first four bytes of the error log header and comparing them to expected values. If the compare fails, the utility exits and an error code displays.
The error log is displayed starting with the latest entry first and continuing until all entries are displayed. Positions three and two represent the error log entry in decimal. Positions one and zero represent the two-digit LED hex error code. Each entry is displayed for 1.5 seconds with the option of starting and stopping the display using the Run switch.
NOTE Null entries are displayed as 00 and should be ignored.

4.7.2 Time, Seeks, and Spinups Display Interpretation
The time, seeks, and spinups display utilities all use the following format to display the counts to the OCP:
(Figure: OCP positions 3 through 0 shown across six successive displays, OCP 1 through OCP 6. OCP 1 shows X X 9 8, OCP 3 shows 7 6 5 4, and OCP 5 shows 3 2 1 0.)
The following conventions are used:
TM = time
SK = seeks
SP = spinups
OCP 1 contains either TM, SK, or SP, and the binary digits 9 and 8. OCP 3 contains binary digits 7, 6, 5, and 4. OCP 5 contains binary digits 3, 2, 1, and 0. OCP displays 2, 4, and 6 are used as separators to indicate the display is changing.

(T42) Display Time Utility
A 10-digit decimal number representing time is displayed when this utility is run. This number is time, in minutes, since the drive was first powered up. See Section 4.7.2 for display interpretation.

(T43) Display Seeks Utility
When this utility is run, the OCP displays the number of total seeks (times a thousand) since the drive was first powered up. A 10-digit decimal number is displayed in six segments at the OCP.
Each segment is displayed for 1.5 seconds unless the display is halted by selecting the Run switch. See Section 4.7.2 for display interpretation.

(T44) Display Spinups Utility
This utility displays the total number of spinups since the drive was first powered up. When this utility is run, the total number of spinups is displayed on the OCP in six segments. Each segment is displayed for 1.5 seconds unless the display is halted by selecting the Run switch. See Section 4.7.2 for display interpretation.

(T45) Drive Revision Level Utility
This utility uses the following mnemonics to display drive component hardware and/or microcode revisions:
• DRV = Drive hardware revision
• DCD = Drive microcode revision (microcode)
• IOP = Master processor module (hardware)
• SRV = Servo module (hardware)
• PCM = Preamp control module (hardware)
• ORV = Operator control panel (hardware)
• OCD = Operator control panel (microcode)
Running this utility displays the revision level for each module in a scrolling message format across the OCP. The following scrolling message format is used to display the information to the drive OCP:
• DRV=www where www is the decimal hardware revision (0 to 255) of the drive.
• DCD= yyy where yyy is the decimal revision number (0 to 255) of the combined functional, servo, and diagnostic microcode. The microcode is loaded from the microcode update cartridge.
NOTE If a drive microcode revision (in the OCP display) contains an alpha character, for example, DCD=L200, this signifies unreleased code. The drive microcode should be updated with a formally released microcode revision.
• IOP= xx where xx is the decimal revision number (0 to 15) of the appropriate module.
• SRV= xx
• PCM= xx
• ORV= xx
• OCD= z.z where z.z is the decimal revision number (0.0 to 9.9) for the OCP microcode.
NOTE If the OCD is displayed as version 5.1 (OCD= 5.1), the drive has an OCP that allows the alternate unit address display mode to be used. Refer to Chapter 3.
The hardware revision switches (Figure 4-2) provide the subsystem with the ability to determine base-level module revision compatibilities. The hardware switches are changed only by direction of a drive ECO/FCO. All ECO and FCO activity will take into account the impact of the changes to the drive and to the subsystem to which it is attached.

Figure 4-2 Hardware Revision Switches (four-position hardware revision DIP switch at the front of the RA90/RA92 drive chassis)

NOTE Do not alter these switches in the field unless you are instructed to do so during an ECO/FCO installation.
The hardware revision switches make up only part of the total reported hardware revision. The total reported hardware revision is a byte of information determined as shown in Figure 4-3.

Figure 4-3 Hardware Revision Byte
The byte is made up of the following fields:
• Hardware revision switches
• Servo system implemented: 00 = dedicated (only servo system); 01 = embedded (blend) servo system
• HDA configuration: 00 = RA90 long-arm HDA (P/N 70-22951-01); 01 = RA90 short-arm HDA (P/N 70-27268-01); 10 = RA92 HDA (P/N 70-27492-01); 11 = not used

(T46) HDA Revision Utility
This utility allows you to display the HDA revision/vendor bits in the OCP display. The first year of production will reflect HDA revision/vendor bit 0.
(Figure: OCP display showing VN followed by the vendor code.)
The two left-most places of the OCP display contain a VN for the vendor code. The right-most place of the OCP display contains a vendor code of 0 through 3.
These revision/vendor bits are used to distinguish the HDA type to the drive microcode. These bits, in conjunction with PCM switches S1-1 and S1-2, tell the microcode how the servo system should be configured.

(T47) Display Drive Serial Number Utility

This utility displays the drive serial number on the OCP. The following message is scrolled (left to right) across the OCP display:

[DRV S/N xxy_zzzz]

where:

- xx is the manufacturing location of the drive (CX=CXO, KB=KBO)
- y is the alphanumeric digit 0-9 or A-Z (G, I, O, Q, and X are not allowed)
- zzzz is 0000-9999

(T50) Error Log Checkpoint Utility

The Error Log Checkpoint utility allows you to enter a checkpoint entry into the internal drive error log. This is similar to a place marker.

(T53) Clear Seeks Utility

The Clear Seeks utility clears the total number of seeks since the drive was first powered up. Run this test any time the HDA is replaced.

NOTE
Use a software jumper to execute this utility. This protects the stored information from unintentional clearing. Refer to Section 4.5. This test causes [INVL] to be displayed if you fail to use the software jumper.

(T54) Clear Spinups Utility

The Clear Spinups utility clears the total number of spinups since the drive was first powered up. Run this test any time the HDA is replaced.

NOTE
Use a software jumper to execute this utility. This protects the stored information from unintentional clearing. Refer to Section 4.5. This test causes [INVL] to be displayed if you fail to use the software jumper.

(T55) Clear DD Bit Utility

The Clear DD Bit utility clears the DD bit set by the diagnostics or a controller.

(T60) Loop-On-Test Utility

This utility enables looping on a test. It can be set to loop on a diagnostic test or a diagnostic sequence of tests. [LOT] is displayed on the OCP for 1.5 seconds when the loop utility is run.
The utility loops until an error is encountered or until the Test switch on the OCP is selected.

[Illustration CXO-2149A: OCP display showing LOT]

(T61) Loop-On-Error Utility

This utility loops continuously on errors encountered during the execution of drive internal diagnostics. The test loops as long as the error is present. [LOE] is displayed on the OCP for 1.5 seconds when the loop utility is run.

(T62) Loop-Off Utility

The Loop-Off utility terminates all loop-on conditions. [LOF] is displayed on the OCP for 1.5 seconds when this utility is run.

[Illustration CXO-2150A: OCP display showing LOF]

The effects of the LOT or LOE utilities may be canceled manually (LOF) or by exiting OCP test mode and letting the idle loop routine execute at least one time.

(T63) Head Select Utility

The Head Select utility allows you to select or change the head to be tested. When the utility is first run, the currently selected head number is displayed in decimal (0-12) in the OCP display, and the least significant digit (LSD) blinks. The format is as follows:

[Illustration CXO-2151A: OCP display showing H= followed by the two-digit head number]

The head number may be changed by selecting the Port B switch to increment the blinking digit. When the desired head number is displayed in the OCP, pressing the Write Protect switch causes that head to be selected and the head number to be changed in RAM. If the Test switch is pressed, the test is aborted and the change does not take place. The head remains selected until changed by this utility, power-up or reset, I/O processor reset, SDI INIT, or controller intervention.

(T64) One Seek Utility

The One Seek utility can be used to seek and lock on a cylinder.
When run, the following OCP display is seen:

OCP 1:  C Y L =    (cylinder, 1.5 sec)
OCP 2:  X X X X    (cylinder value: 0-2660)
CXO-2152A

The right-most digit blinks to indicate cylinder value selection can begin. Selecting the Port A switch selects the next desired digit position, which starts blinking upon selection. Digit position is from right to left (LSD to MSD). A wrap back to the LSD occurs if the Port A switch is selected enough times. Selecting the Port B switch increments the blinking digit. After the cylinder value is set, select the Write Protect switch to cause the heads to position themselves at the desired cylinder. Selecting the Test switch aborts the process without changing the cylinder value. The selected cylinder value is stored in RAM until T64 is run again or a power-up reset, master processor reset, or SDI INIT occurs.

(T65) Seek Parameter Input Utility

Four seek parameters can be examined or changed when using the seek timing tests T36, T38, and T39. They are:

• FCY (first cylinder)
• LCY (last cylinder)
• INC (increment)
• DLY (delay)

Seek parameters are changed the same way as the seek utility parameters. Refer to tests T36, T38, and T39 for a discussion on altering parameters for diagnostics.

The following describes the sequence of events which occur when test T65 is run:

FCY= is the first display seen when this utility is started (Figure 4-4). The first cylinder value follows 1.5 seconds later. The FCY can be any decimal number between 0 and 2660.

OCP 1:  F C Y =    (first cylinder value, 1.5 sec)
OCP 2:  X X X X    (desired value: 0-2660)
CXO-2153A

Figure 4-4 T65 FCY OCP Display

Next, select the Write Protect switch. LCY= is displayed (Figure 4-5). The last cylinder value follows 1.5 seconds later. The LCY can be any decimal value between 0 and 2660.
OCP 1:  L C Y =    (last cylinder value, 1.5 sec)
OCP 2:  X X X X    (desired value: 0-2660)
CXO-2154A

Figure 4-5 T65 LCY OCP Display

Select the Write Protect switch again. INC= is displayed (Figure 4-6). The incremental value follows 1.5 seconds later. The INC value can be any decimal number between 1 and 2660. If a value of 0 is chosen, the test loops indefinitely.

OCP 1:  I N C =    (current increment value, 1.5 sec)
OCP 2:  X X X X    (desired value: 0-2660)
CXO-2155A

Figure 4-6 T65 INC OCP Display

Select the Write Protect switch and DLY= is displayed (Figure 4-7). The delay value between seeks is displayed 1.5 seconds later. A delay value can be between 0 and 2999 milliseconds.

OCP 1:  D L Y =    (current delay value, 1.5 sec)
OCP 2:  X X X X    (desired value: 0-2999)
CXO-2156A

Figure 4-7 T65 DLY OCP Display

The seek parameters remain changed until this utility is run again or a power-up reset, I/O processor reset, or SDI INIT occurs.

NOTE
T65 does not check for out-of-range values. Do not exceed the maximum specified input values. Also, the last cylinder parameter must always be equal to or greater than the first cylinder parameter. If an invalid cylinder value is entered, a (servo) seek failed error (F5) occurs.

(T66) Variable Average Seek Timing Test

This test executes the average seek timing algorithm and allows you to time any length seek. To set the seek length, modify the first (FCY) and last (LCY) cylinder addresses through the seek parameter input utility (T65). The run time for this test varies, depending on the length of the seek used. The run time should not take longer than 45 seconds, regardless of the length of the seek. The following message is scrolled across the OCP display:

[AVG TIM FWD=xx.xMS] [AVG TIM REV=xx.xMS]

where xx.x is the seek time (in milliseconds).
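T66 takes its seek length from the FCY and LCY values entered through T65, and T65 itself does not range-check what is entered. The limits stated for the four parameters can be checked on paper or in a small script before the values are keyed in at the OCP. The Python sketch below is an illustration under that assumption; the function name and the message wording are hypothetical and do not correspond to anything in the drive microcode.

```python
def check_seek_parameters(fcy, lcy, inc, dly):
    """Check T65 inputs against the limits given in the T65 description.

    Returns a list of problem descriptions; an empty list means all four
    values are within the documented limits.
    """
    problems = []
    if not 0 <= fcy <= 2660:
        problems.append("FCY must be between 0 and 2660")
    if not 0 <= lcy <= 2660:
        problems.append("LCY must be between 0 and 2660")
    if lcy < fcy:
        # The last cylinder must be equal to or greater than the first.
        problems.append("LCY must be equal to or greater than FCY")
    if inc == 0:
        # Documented behavior: an increment of 0 loops indefinitely.
        problems.append("INC of 0 makes the test loop indefinitely")
    elif not 1 <= inc <= 2660:
        problems.append("INC must be between 1 and 2660")
    if not 0 <= dly <= 2999:
        problems.append("DLY must be between 0 and 2999 milliseconds")
    return problems
```

For example, check_seek_parameters(0, 2660, 1, 0) returns an empty list, while a last cylinder below the first cylinder is reported before it can cause a seek failed error (F5).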
The variable average seek time is defined as the average (AVG) of 512 seeks in forward (FWD) and reverse (REV) directions.

5 Troubleshooting and Error Codes

5.1 Troubleshooting Reference Material

When running diagnostics and interpreting error logs, you will need the documents listed (alphabetically) in Table 5-1.

Table 5-1 Reference Material for Troubleshooting

Document Title                             Order Number
DSA Error Log Manual                       EK-DSAEL-MN
DSA Error Log Pocket Service Guide         EK-DSAEL-PG
Getting Started With VAXsimPLUS            AA-KN79A-TE
HSC Service Manual                         EK-HSCMA-SV
VAXsimPLUS Field Service Manual            AA-KN82A-RE
VAXsimPLUS User Guide                      AA-KNSOA-TE

Refer to Section 5.19 for RA90/RA92 disk drive error codes and descriptions.

5.1.1 Customer Support Training for the RA90/RA92 Disk Drive

You must have the proper training to efficiently support the RA disk family. This training is available at most Customer Services Training Centers, category A and B sites. Consult with your Customer Services unit managers for training information.

DSA Level I and HSC Level I courses are prerequisites to the RA90 IVIS training.

Although support organizations are available to assist in problem solving, there is no substitute for proper training. Support training resources include DSA Level II and DSA Troubleshooting courses, and the RA90 Disk Drive Technical Description Manual.
5.2 RA90/RA92 Troubleshooting Aids

The following aids are available for disk drive troubleshooting:

• VAXsimPLUS (VMS systems) (see Section 5.2.1)
• Host error logs (see Section 5.2.2)
• Drive internal error log (see Section 5.2.4)
• Operator control panel (OCP) fault indicator/error codes (see Section 5.2.5)
• Drive power supply indicator (see Section 5.2.6)
• Drive error reporting mechanisms (see Section 5.2.7)
• Host-level diagnostics/utilities (see Section 5.2.8)

5.2.1 VAXsimPLUS

The VAX System Integrity Monitor (VAXsimPLUS) provides access to VMS error log data. The three VAXsimPLUS manuals needed to operate VAXsimPLUS effectively are listed in Section 5.1.

5.2.2 Host Error Logs

Refer to the appropriate system error logs for error interpretation. The DSA Error Log Manual and the DSA Error Log Pocket Service Guide contain system error log descriptions for most operating systems.

5.2.3 Extended Status Bytes

Extended status bytes are part of the response to the SDI GET STATUS/TOPOLOGY command or any unsuccessful response to a level 2 command. These bytes are passed through the controller to the host for error logging purposes.

Figure 5-1 shows a breakdown of the RA90/RA92 extended drive status bytes. Extended status bytes are described in detail in the sections that follow.
Byte   Contents                    Class
01     Response opcode
02     Unit number low byte
03     Subunit mask
04     Request byte                Generic drive status byte
05     Mode byte                   Generic drive status byte
06     Error byte                  Generic drive status byte
07     Controller byte             Generic drive status byte
08     Retry count
09     Previous cmd opcode         Extended drive status byte
10     HDA revision bits           Extended drive status byte
11     Cylinder addr (lo)          Extended drive status byte
12     Cylinder addr (hi)          Extended drive status byte
13     Recovery lvl/group no       Extended drive status byte
14     Error code                  Extended drive status byte
15     Mfg fault code              Extended drive status byte
CXO-2157B

Figure 5-1 RA90/RA92 Extended Drive Status Bytes

5.2.3.1 Response Opcode (Byte 1)

Response Opcode (Byte 1) is the drive-to-controller response opcode and indicates the success or failure of the previous controller-to-drive command. Generally, this is transparent to the user.

5.2.3.2 Unit Number Low Byte (Byte 2) and Subunit Mask (Byte 3)

[Illustration CXO-3017A: byte 3 (0 0 0 1 x x x x) and byte 2 (x x x x x x x x), with the following fields]

• Unit number (of subunit reporting this status)
• Subunit 0 mask
• Subunit 1 mask (not used)
• Subunit 2 mask (not used)
• Subunit 3 mask (not used)

5.2.3.3 Request Byte (Byte 4)

[Illustration CXO-1281A: request byte (byte 4), x x x x x x x x, with the following bit definitions]

(RU) 0 = Run/Stop switch out
     1 = Run/Stop switch in
(PS) 0 = Port switch out
     1 = Port switch in
(PB) 0 = Port A receivers enabled
     1 = Port B receivers enabled
(EL) 0 = No loggable information in extended status area
     1 = Loggable information in extended status area
(SR) 0 = Spindle not ready (not up to speed)
     1 = Spindle ready
(DR) 0 = No diagnostic is being requested from the host
     1 = There is a request for a diagnostic to be loaded into the drive microprocessor memory
(RR) 0 = Drive requires no RECALIBRATE command
     1 = Drive requests RECALIBRATE command
(OA) 0 = Drive on line or available to current controller
     1 = Drive unavailable (it is already on line to another controller)

5.2.3.4 Mode Byte (Byte 5)

[Illustration CXO-2193A: mode byte (byte 5), with the following bit definitions]

(S7)  0 = 512-byte sector format (16-bit)
      1 = 576-byte sector format (18-bit) (no current plan to implement 18-bit)
(DB)  0 = DBN area access disabled
      1 = DBN area access enabled
(FO)  0 = Formatting operations disabled
      1 = Formatting operations enabled
(DD)  0 = Drive enabled by controller error routine or diagnostic
      1 = Drive disabled by controller error routine or diagnostic (Fault light = on)
(W1)  0 = Write-protect switch for subunit 0 is out
      1 = Write-protect switch for subunit 0 is in
(W2)  Not implemented
(ED1) Error log disable (set by two-board controller diagnostics)
(ED0) Error log disable (set by two-board controller diagnostics)

Bits ED1 and ED0 can only be set by two-board controller diagnostics. If either ED1 or ED0 is set (EDx=1), the RA90/RA92 disk drive turns off internal error logging.
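When request and mode bytes are read out of an error log as hex values, it can help to expand them into named flags. The Python sketch below does this under one loudly stated assumption: bits are numbered in the order the two figures list them, least significant bit first (RU and S7 as bit 0, OA and ED0 as bit 7). That ordering is inferred from the figure listings and should be confirmed against the illustrations before it is relied on. All function and variable names are hypothetical.

```python
# ASSUMPTION: bit 0 is the first flag listed in each figure and bit 7 the
# last. This ordering is inferred from the listing order, not stated in text.
REQUEST_BITS = ("RU", "PS", "PB", "EL", "SR", "DR", "RR", "OA")
MODE_BITS = ("S7", "DB", "FO", "DD", "W1", "W2", "ED1", "ED0")


def decode_flags(value, names):
    """Expand one status byte into a dict of flag name -> True/False."""
    if not 0 <= value <= 0xFF:
        raise ValueError("status byte must be in the range 0x00-0xFF")
    return {name: bool((value >> bit) & 1) for bit, name in enumerate(names)}


def error_logging_disabled(mode_byte):
    """True if either ED1 or ED0 is set, which turns off internal error logging."""
    flags = decode_flags(mode_byte, MODE_BITS)
    return flags["ED1"] or flags["ED0"]
```

Under the stated bit ordering, decode_flags(0x11, REQUEST_BITS) reports RU and SR set and every other request flag clear.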
5.2.3.5 Error Byte (Byte 6)

[Illustration CXO-1283C: error byte (byte 6), x x x 0 x 0 0 0, with the following bit definitions]

(WE) 0 = No error
     1 = Write lock error (attempt to write while write-protected)
Not used
(DF) 0 = No error
     1 = Drive failure during INIT
(PE) 0 = No error
     1 = Level 2 protocol error (improper command codes or parameters issued to drive)
(RE) 0 = No error
     1 = SDI receive error on SDI transmission line(s) from controller
(DE) 0 = No error
     1 = Drive error (drive Fault light may be on; can be cleared via DRIVE CLEAR command)

The error byte is one of four generic status bytes. Error bits in the error byte are set by the drive for drive-detected errors. The controller clears the bits with the SDI DRIVE CLEAR command. The bits are described as follows:

• The DE bit reports any internal drive error that requires explicit controller recovery action other than simple command retransmission or context readjustment.
• The RE bit reports transmission errors detected by the drive.
• The PE bit reports level 2 protocol errors detected by the drive.
• The DF bit indicates the drive did not pass its initialization/diagnostics the last time it was initialized or powered up.
• The WE bit reports the drive received a SELECT TRACK AND WRITE command or a FORMAT command while the drive was write protected.

NOTE
Drive-detected errors fit into one of the five classes described above and are reported as such. Controller-detected drive errors are logged without any of these bits being set. For example, the drive actuator has positioned itself to a cylinder other than the one the controller requested. The controller detects this failure as a drive positioner error or an invalid header error.
5.2.3.6 Controller Byte (Byte 7)

[Illustration CXO-2158A: controller byte (byte 7), with the following definitions]

0000 = Normal drive operation
1000 = Drive is off line and under control of a diagnostic
1001 = Drive is off line due to another drive having the same unit select identifier
(S1) 1 = Not used
(S2) 1 = Not used
(S3) 1 = Not used
(S4) 1 = Not used

5.2.3.7 Retry Count (Byte 8)

Byte 8 is the retry count during the last SEEK or RECALIBRATE command. (The retry count is the number of times the command was retried, internal to the drive, in an attempt to successfully complete the SEEK or RECALIBRATE operation.)

5.2.3.8 Previous Command Opcode (Byte 9)

[Illustration CXO-1285B: byte 9, last opcode (extended drive status byte), x x x x x x x x]

Byte 9 holds the opcode of the last level 2 drive command decoded by the drive (received from the SDI controller):

81 = CHANGE MODE
82 = CHANGE CONTROLLER FLAGS
03 = DIAGNOSE
84 = DISCONNECT (DRIVE)
05 = DRIVE CLEAR
06 = ERROR RECOVERY
87 = GET COMMON CHARACTERISTICS
88 = GET SUBUNIT CHARACTERISTICS
0A = INITIATE SEEK
8B = ON LINE
0C = RUN
8D = READ MEMORY
8E = RECALIBRATE
90 = TOPOLOGY
0F = WRITE MEMORY
FF = SELECT GROUP (level 1 command, processed by firmware seek head select subroutines)

5.2.3.9 HDA Revision Bits (Byte 10)

Byte 10, bits 0 and 1, indicate which vendor heads are used in the HDA. Bit 7 is the UNCALIBRATED bit and indicates the drive failed during drive recalibration.

5.2.3.10 Cylinder Address (Bytes 11 and 12)

Decoding bytes 11 and 12 gives you the cylinder address from the last SDI SEEK command issued to the drive. See Examples 5-1 (for the RA90 disk drive) and 5-2 (for the RA92 disk drive) to determine cylinder address and group (head).
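The LBN conversion that Examples 5-1 and 5-2 work through by hand can also be sketched in a few lines of Python. The divisors (897 and 69 for the RA90, 949 and 73 for the RA92) come from those examples; the dictionary keys and function names are hypothetical labels chosen for this sketch. The helper that joins bytes 11 and 12 assumes they form a plain 16-bit value with byte 12 as the high-order byte, which the byte names (LO and HI) suggest but the text does not spell out.

```python
# Divisors taken from Examples 5-1 (RA90) and 5-2 (RA92).
# The key names are labels invented for this sketch.
GEOMETRY = {
    "RA90": {"lbns_per_cylinder": 897, "lbns_per_head": 69},
    "RA92": {"lbns_per_cylinder": 949, "lbns_per_head": 73},
}


def cylinder_from_status(byte11, byte12):
    """Join byte 11 (low) and byte 12 (high) into one cylinder address.

    ASSUMPTION: the two bytes form an unsigned 16-bit value.
    """
    return (byte12 << 8) | byte11


def lbn_to_cylinder_head(lbn, drive):
    """Convert an LBN to (physical cylinder, head), discarding fractions."""
    geometry = GEOMETRY[drive]
    cylinder = lbn // geometry["lbns_per_cylinder"]
    remainder = lbn - cylinder * geometry["lbns_per_cylinder"]
    head = remainder // geometry["lbns_per_head"]
    return cylinder, head
```

For LBN 23609 this reproduces the worked results in the examples: cylinder 26 and head 4 on an RA90, cylinder 24 and head 11 on an RA92.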
The RA90 implements the following geometry for logical addressing:

The RA90 has 1 logical track    = 1 physical track
The RA90 has 1 logical group    = 1 logical track
The RA90 has 1 logical cylinder = 13 logical groups

The current cylinder address and current group bytes indicate the cylinder address and group where the read/write heads are positioned. The following formula outlines how to obtain the cylinder and head from the logical block number (LBN).

LBN to physical cylinder and head number conversion:

  Cylinder (cyl) = LBN/897 = cyl.fraction (discard fraction)
  Head = (LBN - (cyl * 897))/69 = head.fraction (discard fraction)

  If LBN = 23609
  Then 23609/897 = 26.32 (discard fraction)
  cyl = 26

  To find the head, use the following example:

  Head = (23609 - (26 * 897))/69
  Head = 4.16 (discard fraction)
  Head = 4

  As you can see, LBN 23609 = head 4 and physical cylinder 26.

DBN to physical cylinder and track (head on RA90 disk drives) conversion:

  cyl = 2654 + DBN/910 = cylinder.fraction (discard fraction)
  Head = (DBN - ((cyl - 2654) * 910))/70 = head.fraction (discard fraction)

XBN to physical cylinder and head conversion:

  cyl = 2651 + XBN/910 = cylinder.fraction (discard fraction)
  Head = (XBN - ((cyl - 2651) * 910))/70 = head.fraction (discard fraction)

RBN: to convert an RBN to the associated physical cylinder and head, use the following formula:

  cyl = RBN/13 = cylinder.fraction (discard fraction)
  Head = RBN - (cyl * 13)

Example 5-1 RA90 Cylinder Address and Group (Head)

The RA92 implements the following geometry for logical addressing:

The RA92 has 1 logical track    = 1 physical track
The RA92 has 1 logical group    = 1 logical track
The RA92 has 1 logical cylinder = 13 logical groups

The current cylinder address and current group bytes indicate the cylinder address and group where the read/write heads are positioned. The following formula outlines how to obtain the cylinder and head from the logical block number (LBN).

LBN to physical cylinder and head number conversion:

  Cylinder (cyl) = LBN/949 = cyl.fraction (discard fraction)
  Head = (LBN - (cyl * 949))/73 = head.fraction (discard fraction)

  If LBN = 23609
  Then 23609/949 = 24.88 (discard fraction)
  cyl = 24

  To find the head, use the following example:

  Head = (23609 - (24 * 949))/73
  Head = 11.411 (discard fraction)
  Head = 11

  As you can see, LBN 23609 = head 11 and physical cylinder 24.

DBN to physical cylinder and track (head on RA90 disk drives) conversion:

  cyl = 3104 + DBN/962 = cylinder.fraction (discard fraction)
  Head = (DBN - ((cyl - 3104) * 962))/74 = head.fraction (discard fraction)

XBN to physical cylinder and head conversion:

  cyl = 3101 + XBN/962 = cylinder.fraction (discard fraction)
  Head = (XBN - ((cyl - 3101) * 962))/74 = head.fraction (discard fraction)

RBN: to convert an RBN to the associated physical cylinder and head, use the following formula:

  cyl = RBN/13 = cylinder.fraction (discard fraction)
  Head = RBN - (cyl * 13)

Example 5-2 RA92 Cylinder Address and Group (Head)

5.2.3.11 Error Recovery Level (Selected Group) (Byte 13)

[Illustration CXO-2159A: byte 13, error recovery level (selected group), x x x x x x x x, with the following fields]

• Group number for last GROUP SELECT command, or last successful group select during a SEEK command (R/W head number)
• Current error recovery level

5.2.3.12 Error Code (Byte 14)

Refer to Section 5.19 for drive error codes and their descriptions.

5.2.3.13 Manufacturing Fault Code (Byte 15)

Byte 15 contains the manufacturing repair code and is used by the repair depot.

5.2.4 Drive Internal Error Log

All drive-detected disk subsystem errors are recorded in the RA90/RA92 drive internal error log. Power-related errors are also recorded. ECC errors are not recorded in the drive internal error log.
Figure 5-2 shows the RA90/RA92 drive internal error log memory layout; Figure 5-3 shows the RA90/RA92 drive internal error log header format; and Figure 5-4 shows the RA90/RA92 drive internal error log descriptor format.

There are three ways to extract the RA90/RA92 drive internal error log:

1. Run DKUTIL from the HSC console or KDM controller (see Section 5.2.4.1).
2. Run utilities for two-board controllers. (Table 5-2 lists the systems that use two-board controllers.)
3. Run drive-resident utility T41 from the RA90/RA92 OCP (see Section 5.2.4.2).

Table 5-2 Two-Board Controller Diagnostics

Monitor    KDA/KDB/UDA
XXDP       ZUDM
VDS        EVRLL
MDM        Test drive internal error log utility at the device utility menu

Label     Address   Byte-wide memory
LOGBUF    0A006H    Start of error log header
SAVESET   0A010H    Start of power down page; first 8 bytes are drive generic
SAVEO     0A018H    Second 8 bytes are drive specific
          0A025H    Last byte of header
          0A026H    Unused
          0A02FH    Unused
DSCBEG    0A030H    Start of error log descriptors
          0A42FH    Last byte of last descriptor
DSCEND    0A430H    End of descriptor marker; from here on EEPROM is not used for error log
CXO-2162A

Figure 5-2 RA90/RA92 Drive Internal Error Log Memory Layout

LOGBUF (address label) = 0A006H

WORDS 00-04   FFFB, SIZE, DEVICE TYPE, ERRORLOG SIZE, LO ORDER SEEKS SINCE LAST POWERUP, HI ORDER SEEKS SINCE LAST POWERUP

SAVESET (address label) = 0A010H

WORD 05   LO order cumulative seeks
WORD 06   HI order cumulative seeks
WORD 07   LO order total elapsed time (min)
WORD 08   HI order total elapsed time (min)
WORD 09   OCP switch status | Unit number ones digit
WORD 10   Unit number tens digit | Unit number 100s digit
WORD 11   Unit number 1000s digit | S.SA2 status byte
WORD 12   Cumulative number of spinups
WORD 13   Not used = 0000H
WORD 14   Bad error log flag | Fault table pointer
WORD 15   Pointer to descriptor entry that failed

Words 05 through 15 are the power down data*.

*Must be saved at an EEPROM page boundary (XXX0H).
CXO-2160A

Figure 5-3 RA90/RA92 Drive Internal Error Log Header Format

DSCBEG (address label) = 0A030H

WORD 00   Error type | Error code
WORD 01   FRU/DRU number | Number of ASCII bytes
WORD 02   LO number of seeks at time of error
WORD 03   HI number of seeks at time of error
WORD 04   Entry write count
WORD 05   Number of spinups since first powerup
WORD 06   Curr group | Err rcvry lvl
WORD 07   Desired cylinder
WORD 08   LO order total elapsed time (min)
WORD 09   HI order total elapsed time (min)
WORD 10   ASCII byte | ASCII byte
WORD 11   ASCII byte | ASCII byte
WORD 12   ASCII byte | ASCII byte
WORD 13   ASCII byte | ASCII byte
WORD 14   ASCII byte | ASCII byte
WORD 15   ASCII byte | ASCII byte

Bracket labels in the figure: DRIVE GENERIC INFORMATION, TBD, DRIVE SPECIFIC INFORMATION.
CXO-2161A

Figure 5-4 RA90/RA92 Drive Internal Error Log Descriptor Format

5.2.4.1 Running DKUTIL From the HSC Console or KDM70 Controller

Running DKUTIL from the HSC console dumps the drive internal error log to the HSC console. The same capability exists for the KDM70 controller.

To display the drive internal error log, enter the DISPLAY ERROR command at the DKUTIL prompt (see the example below).

First do:
  DKUTIL> GET Dxxxx            (if drive is capable of being put on line)
OR
  DKUTIL> GET Dxxxx/NOONLINE   (if drive is incapable of being put on line)
THEN
  DKUTIL> DISPLAY ERROR

Figure 5-5 shows an example of a formatted drive internal error log.
The data in this example will help you determine the time elapsed since a failure occurred.

ERROR LOG ENTRIES FOR DRIVE D090
SELECT STARTING ENTRY LOCATION (1-32) [20]?
ENTER HOW MANY ERROR LOG ENTRIES TO DISPLAY (0-32) [32]?
PAUSE AND PROMPT AFTER 10 ERROR LOG ENTRIES [(Y),N]? Y

DRIVE TYPE   MAX#ENTRYS (D)   SEEKS/POWER ON (D)   CUM. SEEKS (D)   CUMULATIVE POWER-ON MINUTES (D) (H)
RA90         64               328                  9065             0000042695  0000A6C7*

ENTRY   ENTRY   ERR   ERR    SEEK    MFG    DRIVE SPECIFIC HEX DATA          DRIVE ERR
LOCTN   COUNT   TYP   CODE   COUNT   CODE   BYTE 0-9, RIGHT TO LEFT          MESSAGE
(D)     (D)     (A)   (H)    (D)     (H)    (H)                              (A)
20      4       PE    2B     8751    00     00 00 3F 1C 00 00 00 00 00 17    inv.dmr.num.
19      4       DE    F5     8751    11     00 00 3F 1C 00 00 00 00 00 17    dsp.sek.flt.
18      4       RE    07     8731    0E     00 00 3E 95 05 2C 06 2C 00 15    frm.seq.err.

(D) = decimal   (A) = ASCII   (H) = hex

Callouts on the drive-specific hex data: TIME**, CYL, HEAD/GROUP, ERROR REC LEVEL, SPIN-UPS SINCE FIRST POWER-UP.

             *  0000A6C7 (H)   CUMULATIVE POWER-ON MINUTES
(SUBTRACT) - ** 00003F1C (H)   LEFT-MOST FOUR "TIME" BYTES
(EQUALS)   =    000067AB (H)   TIME LAPSE SINCE LAST ERROR
                        (D) =  26,539 MINUTES

CONVERT HEX TIME LAPSE TO DECIMAL MINUTES, THEN CONVERT TO HOURS, THEN CONVERT TO DAYS.
CXO-2994A

Figure 5-5 Drive Internal Error Log

The ten bytes of drive-specific hex data printed by the DKUTIL utility are divided by the RA90/RA92 into five data fields. The drive-specific hex data fields are:

1. Time (minutes)
2. Cylinder
3. Head/group
4. Undefined
5. Spinups since the last power-up

NOTE
All five data fields represent the drive state at the time of the error.

5.2.4.2 Running the Drive-Resident Utility Dump (T41) From the OCP

Run drive-resident utility T41 to display the drive internal error log. (Refer to Chapter 4 for instructions on how to run this utility.) The drive internal error log is displayed starting with the latest entry and continuing until all entries are displayed.
Positions three and two represent the error log entry in decimal. Positions one and zero represent the two-digit LED hex error code. Each entry is displayed for 1.5 seconds. You can start or stop the display using the Run switch.

5.2.5 OCP Fault Indicator/Error Codes

The OCP Fault indicator lights when a hard fault is detected. Select the Fault switch to display an error code. These error codes are described in Section 5.19. Each description includes fault isolation information.

5.2.6 Drive Power Supply Indicator

The drive power supply has a green LED that, when lit, indicates the power supply is operating normally. If the LED is not lit and the drive is experiencing problems, begin troubleshooting in this area. Figure 5-6 shows the location of the green LED.

[Illustration CXO-2134B: rear of the drive, showing the drive circuit breaker and the green LED (power OK)]

Figure 5-6 Power Supply Indicators

If the green LED appears to be at about half brilliance and the OCP has no display, the power supply is in a crowbar state. Recycling the circuit breaker may clear the condition.

5.2.7 Drive Error Reporting Mechanisms

The RA90/RA92 detects and reports the majority of real-time errors and faults in the drive, including intermittent failures. All drive-detected errors are reported to the controller. If error logging is available and enabled, the controller reports errors to the host.

5.2.7.1 Detailed Description of Error Reporting Mechanisms

RA90/RA92 disk drives have five mechanisms available to report error conditions to the controller. The mechanism used is based on the state of the drive, the drive activity at the time of the error, and the error that occurred. The five mechanisms are listed below. As described in this list, it is assumed that a port or ports have been selected from the OCP port select switches.

1. STOP TRANSMITTING CLOCKS AND DATA OVER ANY SDI LINE. The drive stops transmitting clocks and data over any SDI line connected to either port if any of the following conditions exist:
   • The drive is off line to the controller.
   • Power is failing.
   • A failure is detected that prevents communication between the drive and the controller.

2. TRANSMIT CLOCKS BUT NO STATE INFORMATION. The drive transmits drive clock but does not transmit state (RTDS) information if it is off line to the controller or if it failed resident diagnostics. The only time a drive executes resident diagnostics is at power-up or reset and when an SDI INIT is received on the real time controller state (RTCS) line. If a drive receives an SDI INIT, it executes resident diagnostics verifying processor and communications paths to the controller.

3. ASSERT ATTENTION IN THE RTDS. The drive uses the RTDS attention mechanism to report error conditions if the drive is on line to the controller. The RTDS attention mechanism is used when the command timer expires or when one of the generic status bits changes, with the following exceptions: when a generic status bit changes as a result of a correct operation during an SDI level 2 command, or when an error occurs in an SDI level 2 command.

4. SEND UNSUCCESSFUL RESPONSE. An unsuccessful response to an SDI level 2 command is sent to the controller if any of the following conditions exist:
   • The execution of an SDI level 2 command could not be completed successfully. (For example, a level 2 DRIVE CLEAR command was issued but the error condition could not be cleared.)
   • A transmission error occurred during an SDI level 1 exchange and the drive successfully received a valid SDI level 1 end frame.
   • A protocol error occurred.
   • A fault occurred while the drive was executing a topology command.

5. CONTROLLER RESPONSE TIMEOUT. This is not a drive mechanism, but it indicates to the controller that the drive has an error condition.
DIGITAL INTERNAL USE ONLY 5-16 Troubleshooting and Error Codes 5.2.8 Host-Level Diagnostics and Utilities If possible, avoid running host-level diagnostics to recreate the symptoms. You only extend the service period. However, under certain conditions you may need to run host-level diagnostics. Refer to Section 5.11. Do not use host-level diagnostics to verify drive repair; use resident diagnostics tests. Use systemlevel commands to ensure the drive is on line and operating normally. 5.3 General Troubleshooting Information The drive internal error log records all drive-detected (DD) faults as error codes. Use the recorded error codes to help isolate faults to a failing or failed FRU. Run the RA9OIRA92 disk drive utility program T41, Display Drive Error Log, to extract drive internal error log information. Real-time faults detected by the disk subsystem are recorded in the host error log of the supporting operating system software. Host error logs contain detailed information on intermittent and hard drive errors and can also be used to isolate the failing field replaceable unit (FRU). ECC-type errors are detected by controllers and logged in the host (or HSC) level error logs. These errors are not recorded in the drive internal error log. The drive only reports drive-detected errors. Once a disk drive fault has been isolated to an FRU and repairs have been made, use drive-resident diagnostics to verify proper drive operation. 5.3.1 Drive-Resident Diag nostics Limitations The following disk functions or areas are not covered by resident diagnostic testing: 1. Customer data areas (are never read or written to during testing). 2. Data paths between the drive and controller. 3. Internalloopback testing (only tests the SDI loopback through the TSID gate array). External SDI testing can be accomplished with resident diagnostic T09 and use of a loopback connector (Digital part number 7~19074-01). "At-speed" testing of the SDI circuitry is not done. 
SDI interface testing is accomplished by internally looping the SDI signals within the SDI gate array and TSID. Transformer couplings are not tested.

If you suspect media, go to Section 5.8. Drive-resident diagnostics descriptions are in Chapter 4.

5.4 Step-by-Step Troubleshooting Procedure

Use this troubleshooting procedure when you are reasonably certain the problem is in a disk drive. Some troubleshooting procedures may require that you follow the entire procedure before isolating the problem. If you have an error code, go to Section 5.19 for a description of the error and an FRU replacement list.

Included in this section is a step-by-step troubleshooting flowchart (Figure 5-7). Each section heading that follows this flowchart contains a number, enclosed within a box, that corresponds to those in the step-by-step troubleshooting flowchart.

[Figure 5-7 Step-by-Step Troubleshooting Flowchart (CXO-2163C, Sheets 1 through 6). Stages shown in the flowchart: 1. Start, identify problem drive (steps 1.1 through 1.6, consider remote support, other means); 2. Identify problem FRU (2.1 pre-verify drive symptoms, steps 2.3 through 2.6); 3. DSA errors (see the sections on error codes and descriptions and on the order of troubleshooting DSA errors); 4. Media (steps 4.1 and 4.2, run DKUTIL, R/W path problems, may need to format after repair); 5. Misc checks; 7. Use host-level diags as last resort; FRU replacement; post-verification (execute drive sequence tests: power-up, spinup) and return drive to user (mount, access, basic applications). Sheet 3 legend: DDDE = drive-detected drive error; DDDF = drive-detected diagnostic fault; DDPE = drive-detected protocol error; RE = transmission error.]

5.4.1 Troubleshooting Worksheet

Develop a worksheet to aid in collecting error data. Identify only those errors being reported against the identified drive. Arrange a piece of wide, line-printer paper with columns identified as follows:
• MSCP Status/Event Code
• Comment Area
• Block Number
• Block Type (LBN or RBN)
• Cylinder
• Group
• Sector
• Drive LED Error
• Drive-Reported Previous/Current Group
• Date/Time of Error

5.5 Identifying the Problem Drive

Identifying the cause of local drive error problems generally requires minimal analysis. These problems can be identified by noting that the drive is not performing basic operational functions (power-up, spinup, spindown, and so on), by incorrect lamp indications, or by OCP error codes. Once you have isolated the problem drive, proceed to Section 5.6.
If you have not isolated the problem drive, refer to Sections 5.5.1 through 5.5.6. These sections describe procedures to use for problem drive identification.

5.5.1 Talking to the System Operator/Checking the OCP Fault Indicator

Discuss drive errors with the system operator/manager and users. Operators or users can provide valuable information concerning system activity at the time of the error (such as applications that were running, disks the data is stored on, affected users, and impact on other applications). Check the OCP for fault indications.

5.5.2 Using VAXsimPLUS to Identify the Problem Drive

Use VAXsimPLUS to obtain a summary of information that may lead to direct identification of the failing drive. Section 5.1 lists appropriate VAXsimPLUS documentation. If the problem drive is identified using information obtained with VAXsimPLUS, go directly to Section 5.6.

5.5.3 Using the Host Error Log to Identify the Problem Drive

Study available host error logs. Host error logs provide failing drive and error code information. Use this information to identify failing FRUs. Refer to the DSA Error Log Manual for detailed descriptions of most system-level host error logs.

5.5.4 Using the HSC Console Log to Identify the Problem Drive

Drives attached to HSC controllers send drive state information to the HSC console log. Use the HSC console log to identify problem drives. Correlate time-of-error information to user operations.

5.5.5 Using the Host Console/User Terminal Trails to Identify the Problem Drive

If no host error log or VAXsimPLUS resource is available, check host console trails or user terminal trails. These may indicate drive problems and identify the problem drive.

5.5.6 Using Other Means to Identify the Problem Drive

If no hard fault indications, error logs, or console logs are available to identify the problem drive, refer to Section 5.9.
It is important to identify the failing disk drive before attempting to isolate the failing subsystem component. If more than one drive exhibits the same failure symptoms, examine the possibility of a controller or system problem.

NOTE
Using DSA utilities such as Error Log Dumper (ZUDM/EVRLL/MDM/DKUTIL) to dump the RA90/RA92 drive internal error log may identify problem hardware areas. However, there may be a significant negative impact on the availability of hardware and data to the customer. Consider off-line diagnostics only as a last resort.

DSA utilities (Bad Block Replacement or HSC Verify) verify that the logical structures of the user data are correct. Additionally, these utilities check the status of any revectored blocks, blocks with forced error flags set, blocks marked bad in the RCT area, the number of primary and non-primary replaced blocks, and blocks that exceed symbol error thresholds. User data areas that have flagged forced error conditions are identified as disk areas that cannot be accessed due to media or drive problems.

Transient problems may require the use of off-line diagnostics. EVRL, ZUD, and MDM frequently miss a problem when executing in the DBN area of a disk. You may have to exercise the customer data area of the disk to increase the chances of generating an error.

CAUTION
Back up customer data before executing diagnostics on customer data areas of the disk.

Refer to Section 5.11 for host-level diagnostics information.

5.6 Identifying the Problem FRU

After identifying the problem drive, you must identify the failing FRU. The following sections describe procedures to use for identifying the problem FRU.

Use the host error log or HSC console log to fill in the troubleshooting worksheet (described in Section 5.4.1). Calculate the logical cylinder, group, and sector from the targeted LBN or RBN and add that information to the worksheet.
Drive-reported errors (SDI error packet) include valid extended drive status bytes that call out the logical cylinder, the previous and current group select, and the master drive error code.

After the data is collected, analyze the data to select the most logical replacement FRU. Proceed to Section 5.7 and compare the collected data to determine troubleshooting priority.

5.6.1 Pre-Verifying Drive Symptoms

After identifying the drive, you should verify drive failure symptoms by performing pre-verification testing of the drive. Pre-verification of drive symptoms using resident diagnostics has the following benefits:
• Establishes a basis for post-verification and:
  - Ensures that no new problems have been introduced.
  - Ensures that a replaced FRU corrected the problems detected during pre-verification testing.
• Establishes a more reliable error code or condition to troubleshoot.

Generally, errors detected while performing drive-resident diagnostics have a higher priority than errors or symptoms derived from any source previously mentioned.

To complete pre-verification testing, perform the following steps:
1. Spin up the drive.
2. Execute resident diagnostic test T60 (loop-on-test utility).
3. Execute resident diagnostic test T00 (sequence test).

Examine the drive internal error log and note the type of errors. Compare the generated errors to the error symptoms originally encountered. The following sections help isolate the failure symptoms to the failing FRU.

5.6.2 Using OCP Error Codes to Identify the Problem FRU

Correlate error codes displayed in the OCP, host error logs, or drive internal error logs to error descriptions given in Section 5.19. Each error description includes a list of suggested replacement FRUs. Use this list to repair the drive. Verify repairs using the post-verification procedures defined in Section 5.13.2.
5.6.3 Using VAXsimPLUS to Identify the Problem FRU

VAXsimPLUS identifies FRU replacements based upon an analysis of the errors being recorded by the VMS error logging system. VAXsimPLUS identifies the failing FRU through a theory number. The procedure for cross-referencing theory numbers to drive FRUs is determined by individual Digital service areas. Each service area has the responsibility of defining and implementing VAXsimPLUS in line with individual area service goals and strategies.

If VAXsimPLUS identifies a failing FRU, replace the FRU, then proceed with post-verification testing. Refer to Chapter 6 for FRU removal and replacement procedures.

5.6.4 Using the Host Error Log to Identify the Problem FRU

If the system does not support host error logs, or if a host error log cannot be obtained, go to Section 5.6.5.

If you are working in a cluster environment, it may be easier to use the HSC console log. The HSC console log is a condensed version of the host error log. Proceed to Section 5.6.5 for information on using the HSC console log.

The following is a data collection step: Access the host error log. Obtain the drive and controller event (error) codes. Note the LBNs involved in read/write disk transfer errors.

Note the LBN being reported in the data transfer error packet. Also note if any of the following error types have been detected by the controller:
• Data errors
• ECC errors
• Uncorrectable ECC errors
• Header-not-found errors
• Invalid header errors
• Header compare errors
• Format errors
• Data sync timeout errors

Study the SDI error packet of the error log for drive-detected errors and check for the following information:
• Error code
• Drive group number
• Logical cylinder number

For controller-detected (communication) errors, such as protocol or transmission errors, note the controller-reported error code in the status/event code field.
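
The data collection step above feeds the troubleshooting worksheet of Section 5.4.1. As a minimal sketch, assuming the worksheet is kept electronically rather than on line-printer paper, each logged error could be held as one record and the columns tallied to look for a common group or cylinder. The class name, field names, and the `tally` helper are hypothetical and are not part of any Digital utility; the sample values are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorksheetEntry:
    """One worksheet row; the fields mirror the columns listed in Section 5.4.1."""
    status_event_code: str                 # MSCP status/event code, for example "8B"
    block_number: Optional[int] = None
    block_type: Optional[str] = None       # "LBN" or "RBN"
    cylinder: Optional[int] = None
    group: Optional[int] = None
    sector: Optional[int] = None
    drive_led_error: Optional[str] = None
    previous_group: Optional[int] = None   # drive-reported previous group
    current_group: Optional[int] = None    # drive-reported current group
    date_time: str = ""
    comment: str = ""


def tally(entries, column):
    """Count how often each recorded (non-empty) value of one column occurs."""
    values = (getattr(entry, column) for entry in entries)
    return Counter(value for value in values if value is not None)


# Illustrative rows only; these are not taken from a real error log.
rows = [
    WorksheetEntry("8B", block_number=1200, block_type="LBN", cylinder=4, group=3),
    WorksheetEntry("8B", block_number=1950, block_type="LBN", cylinder=7, group=3),
    WorksheetEntry("CB", comment="no block reported"),
]
print(tally(rows, "group").most_common())
```

Tallying one column at a time keeps the sketch close to how the paper worksheet is read: a value that dominates a column is a candidate for the correlations discussed in Section 5.8.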
5.6.5 Using the HSC Console Log to Identify the Problem FRU

If the disk drive is not attached to an HSC or KDM and no supporting error data is available, go to Section 5.6.6.

The amount of subsystem error information reported by the HSC console log depends upon the HSC error threshold level setting. Use the HSC SETSHO utility to set the error threshold level to one of the following:
• Information
• Warning
• Error
• Fatal

Execute the HSC SHO SYSTEM command to display the error threshold parameter setting. If the error threshold is set sufficiently high (fatal), no error information may be available from the HSC console log. Refer to Section 5.6.6 to continue error analysis.

If the drive is attached to an HSC, check the HSC console log. Use the HSC Service Manual to decode the console error log. Obtain status/event codes, drive extended status bytes for the drive LED error codes, and the LBN addresses at the time of the error.

Organize the gathered information on the troubleshooting worksheet to help isolate the failing FRU. Proceed to Section 5.7 and compare the collected data to determine troubleshooting priority.

If the information from the HSC console log does not identify the problem FRU, go to Section 5.6.4 to examine the host error log, or Section 5.6.6 to examine the drive internal error log.

5.6.6 Using the Drive Internal Error Log to Identify the Problem FRU

If the drive is connected to a cluster, it is strongly recommended that you dump the drive internal error log before troubleshooting or attempting FRU replacement.

To extract the RA90/RA92 drive internal error log, use one of the following methods:
• Run DKUTIL from the HSC console or KDM controller (see Section 5.2.4.1).
• Run drive-resident utility T41 from the RA90/RA92 OCP (see Section 5.2.4.2).
• As a last resort, run utilities such as MDM, EVRLL, or ZUDMxx.
NOTE
Off-line diagnostics remove system availability from the user and should only be used as a last resort.

Media problems such as ECC errors are not logged in the drive internal error log. Proceed to Section 5.8 for media errors.

If you cannot access the drive internal error log, verify the physical connection between the drive and the controller. If the drive is attached to an HSC, type a SHOW DISK command at the HSC console to verify that the drives are on line to the controller.

If no errors have been logged, or the drive internal error log is inaccessible, proceed to Section 5.9. If a host error log or an HSC console trail has been acquired, proceed to Section 5.9.

5.7 Priority Order of Troubleshooting DSA Errors

The priority order of troubleshooting DSA errors is important. The following sections describe the importance of each error type and DSA reporting mechanisms.

5.7.1 Drive-Detected Drive Errors and Diagnostic Faults

Give error codes in this category top priority. Drive-detected drive errors (DDDEs) appear in host error logs and HSC console logs provided the error threshold is set low enough. DDDEs are also available in the drive internal error log.

Drive-detected diagnostic faults (DDDFs) appear in the drive internal error log, although they may be seen at the host level. This error type is top priority.

5.7.1.1 Drive-Detected Protocol Errors Without Communication Errors

The occurrence of drive-detected protocol errors (such as errors 07, 0C, and so on) without the occurrence of transmission errors (errors 20, 21) indicates a controller problem or an electronic control module (ECM) failure. Troubleshooting must be done on that basis.

The occurrence of drive-detected transmission errors with error codes 08, 09, 0D, 0E, 0F, 10, 16, 19, 1A, 29, 2A, 2B, 2E, or 2F without communication errors generally indicates a controller problem. The drive detects these errors by analyzing packet frames as they are being received.
If the drive is at fault (in other words, replacing the controller did not fix the problem), replace the drive ECM module.

5.7.1.2 Drive-Detected Pulse or State Parity Errors

Transient, drive-detected communication errors occasionally cause a protocol error. This is generally a manifestation of communications problems. Determine if the problems occur on the transmit or receive lines from the controller to the drive.

Drive error codes associated with pulse or parity errors are 0A, 20, or 21.

If the drive is seeing drive-detected communication errors, then the drive ECM receive circuitry, SDI port transmit circuitry (controller), or SDI cabling is suspect. Reconfiguration might further isolate the problem (use different drive/controller ports and cable combinations).

If the controller is seeing communication errors (these also show up as ECC errors) and the drive is also seeing communication errors, then the whole path (drive to controller) is suspect.

It is important to make a distinction between communication errors and ECC errors. If an SDI subsystem is having communication errors, one of the manifestations (not the cause) is ECC errors. If the communication errors are severe enough, data transfers are halted.

NOTE
Fix communication problems before concentrating on ECC or positioner errors. Ensure SDI cable connections are secure enough to provide proper electrical and mechanical continuity.

5.7.2 Controller-Detected EDC Error

NOTE
EDC errors are not caused by drives.

EDC is a data protection mechanism to ensure data integrity within a disk controller. In contrast, the ECC mechanism ensures data integrity from the controller through the drive, to the media, and back again. ECC protects both the customer data and the EDC code.
It is important to note the differences in how controllers implement the EDC mechanism:

• For the KDA/KDB/UDA family of controllers, EDC is generated on a sector of data at the bus interface as the data is initially read from host memory. EDC is verified on a sector basis as the data is written to host memory from the controller memory. Therefore, xDA/xDB controllers generate and check EDC. The microcode engine of the controller performs this check at the bus interface.

• For HSC controllers, EDC is generated on a sector of data at the K.pli port processor module as the data streams in from host memory over the CI bus. EDC then becomes an integral part of the user data as the data is transferred to the HSC data memory. As this data is read out of HSC data memory by the K.sdi modules and transmitted to the drive, user data EDC is regenerated and checked in the K.sdi and compared to the EDC characters appended to the data by the K.pli. The EDC must check OK, or the write-transfer-to-disk will be aborted. The HSC again requests the data from host memory and again queues the write-transfer-to-disk when data becomes available in the HSC data memory.

If the EDC checks OK at the K.sdi on a write-to-disk, the EDC and ECC codes are appended to the data stream and written to disk, with ECC ensuring data integrity of the customer data and the EDC code.

For a disk read, the data, as it is read by the K.sdi (over the SDI read/response line), is checked for good ECC, then the data plus EDC characters are stored in HSC data memory. As the data is sent to host memory, the K.pli verifies that good EDC exists for the customer data block but does not transfer EDC characters to host memory. If EDC is bad, the K.pli informs the HSC functional code to again request the same data from the disk.

• For KDM controllers, EDC is generated on a sector of data at the bus interface as the data is initially read from host memory.
EDC is verified on a sector basis at the SDI SERDES port interface as the data is written to disk. On a read, EDC is checked by the SDI SERDES at the completion of each sector read (and data correction, if applicable). EDC is checked again as the data is written to host memory from the controller memory.

If EDC errors are detected, the problem is a controller problem. The ECC is protecting the data to and from the disk and checking the integrity of the data at the SDI port module logic.

NOTE
A properly functioning controller always detects bad EDC that was written to a disk. If an improperly functioning controller writes bad EDC to a disk, EDC errors are logged against the drive each time the block with bad EDC is read. The errors go away only after a good controller restores or rewrites the data to the disk with good EDC.

5.7.2.1 Controller-Detected Protocol and Transmission Errors Without Communication Errors (Status/Event Codes 14B or 4B)

The troubleshooting process for this type of error is very similar to the discussion in Section 5.7.1.1. It is important to determine that the controller detected protocol errors without basic communications errors such as:

• Protocol errors - A level 2 response from the drive had correct framing codes and checksum but was not a valid response under SDI protocol rules. If the opcode on the read/response line has an odd number of bits, it is an unknown opcode; if the response packet is bad, it is also classified as a protocol error.

• Transmission errors - The controller detected an invalid framing code or a checksum error in a level 2 response from the drive. The UDA50 also returns the same status/event code for controller-detected protocol errors.
Table 5-3 Summary of Controller-Detected Communication Errors

                                              Status/Event Code
Controller-Detected Communication Errors      HSC    UDA    KDA    KDB    KDM
Protocol                                      14B    4B     14B    14B    14B
Invalid frame code, level 2 checksum          4B     4B     4B     4B     4B
Pulse/state parity (wire)                     10B    10B    10B    10B    10B

Communication (wire) errors are described in Section 5.7.2.2.

5.7.2.2 Controller-Detected Pulse or State Parity Errors (Status/Event Code 10B)

The procedure for handling controller-detected communication errors is very similar to the one described in Section 5.7.1.2.

The controller detected a pulse error on the state or data line, or the controller detected a parity error in a state frame from the drive. This error is associated with the controller and drive SDI port electronics (including interconnecting cables). The symptoms indicate a basic (wire) communications problem within the SDI pathway, including drive or controller port electronics.

Noise can be injected through the port electronics or the cabling between the controller and the drive. Additionally, bad cables (bent, walked on) or loose connecting hardware (bulkhead connections) can contribute to the problem.

Pulse errors are caused by two consecutive pulses of the same polarity. SDI signal lines use an NRZ transmission technique where no two adjacent pulses can be of the same polarity. This is detected on either the state or read/response line.

A state parity error is the occurrence of bad parity over the length of a single SDI RTDS state frame or SDI read/response frame. This type of error may also result in the detection of ECC errors during data transfer times. This occurs when the read/response line and the write/command line are functioning as the data line.

Controller-detected transmission errors (4B) occur if an invalid framing code or a checksum error is detected during a level 2 response from the drive.
NOTE
The UDA50 also returns this status/event code for controller-detected protocol errors.

5.7.3 Controller-Detected Communication Events and Faults

Controller-detected communication events include:
• Loss of read/write ready - MSCP Status/Event 8B
• Loss of receiver ready - MSCP Status/Event CB
• Receiver ready collisions - MSCP Status/Event 1AB
• Drive clock dropout - MSCP Status/Event AB
• Failure of drive initialization process - MSCP Status/Event 16B
• Failure of drive to respond to controller-requested initialization - MSCP Status/Event 18B
• SERDES overrun error (in controller) - MSCP Status/Event 2A
• SDI drive command time-out - MSCP Status/Event 2B

Communication systems have faults and event irregularities. Communication faults are events, but not all events are faults. The difference is related to timing between events and system operations occurring at the time of the event. For example, a loss of read/write ready is an event if no write activity is occurring at the time of the loss. During a write, however, a loss of read/write ready is an error (fault) event.

5.7.3.1 Controller-Detected: LOSS OF READ/WRITE READY (Status/Event Code: 8B)

The controller event is LOST READ/WRITE READY DURING OR BETWEEN TRANSFERS. This error indicates read/write ready (RTDS status bit) was negated when R/W ready had been previously asserted (indicating completion of a preceding seek) and:
• The controller attempted to initiate a transfer, or
• R/W ready was found negated at the completion of a transfer

This event usually results from a drive-detected transfer error, in which case an additional error log message may be generated containing the drive-detected error event code.

This error may be symptomatic of a fine track servo problem in the RA90/RA92 disk drive. If there are no other such subsequent error log entries, the loss of fine track was probably responsible for the loss of read/write ready. Examine the drive internal error log for evidence of servo problems.
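
When annotating worksheet entries, the status/event codes from Table 5-3 and from the list at the start of Section 5.7.3 can be collected into a single lookup. The following is a minimal sketch under that assumption; the dictionary name and the `describe_event` function are hypothetical, and only codes that appear in this chapter are included.

```python
# Controller-detected status/event codes as given in Table 5-3 and Section 5.7.3.
# The names in this sketch are illustrative; they are not part of any Digital utility.
CONTROLLER_DETECTED_EVENTS = {
    "14B": "Protocol error",
    "4B": "Invalid framing code or level 2 checksum error "
          "(the UDA50 also reports protocol errors with this code)",
    "10B": "Pulse or state parity error",
    "8B": "Loss of read/write ready",
    "CB": "Loss of receiver ready",
    "1AB": "Receiver ready collision",
    "AB": "Drive clock dropout",
    "16B": "Drive failed initialization",
    "18B": "Drive ignored initialization",
    "2A": "SERDES overrun error (in controller)",
    "2B": "SDI drive command timeout",
}


def describe_event(code: str) -> str:
    """Return the description for a status/event code, ignoring case and padding."""
    key = code.strip().upper()
    return CONTROLLER_DETECTED_EVENTS.get(key, "Code not listed in this chapter")


for logged in ("8b", "1AB", "7F"):
    print(logged, "-", describe_event(logged))
```

Codes are stored as strings because the logs print them in hexadecimal, and normalizing case avoids mismatches between log formats.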
5.7.3.2 Controller-Detected: LOST RECEIVER READY (Status/Event Code: CB)

RECEIVER READY (RTDS status bit) was negated when the controller attempted to initiate a transfer, or RECEIVER READY was not asserted at the completion of a transfer. This includes all cases of the controller timeout expiring for a transfer operation (level 1 real-time command).

As a consequence of this condition, the controller performs an SDI INIT then attempts to request a GET STATUS. The extended status error log entry returned in the GET STATUS command may indicate what the problem is.

If no information is being reported by the drive as a part of the error log sequence, approach the problem as a drive ECM failure. Examine the drive internal error log for extended error information.

5.7.3.3 Controller-Detected: RECEIVER READY COLLISION (Status/Event Code: 1AB) [3.10]

The controller attempted to assert RECEIVER READY (RTCS status bit), indicating it was ready to receive a drive response. The drive RECEIVER READY (RTDS status bit) was still asserted, indicating it was ready to receive a command from the controller.

This is not an error, but an event within the subsystem. All DSA drives and controllers occasionally log this event. There is no performance impact because of the occasional occurrence of this event. No data corruption is associated with the occurrence of this event if no other SDI bus errors occur at the same time.

Acceptable event rates for RECEIVER READY collisions are less than ten per day, provided the following events are not contributing:
• Broken physical SDI interconnects (plugging and unplugging SDI cables).
• Controller (node) initializations or HSC failovers.

NOTE
RECEIVER READY collisions occur primarily when both Ports A and B are enabled at the drive.
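
The daily rate criterion for RECEIVER READY collisions can be checked mechanically once the events have been copied out of the host error log or HSC console log. The sketch below assumes the events are already available as (timestamp, status/event code) pairs; that input format and the function name are assumptions made for illustration. The text calls fewer than ten per day acceptable and more than ten per day unacceptable, so this sketch flags only days with more than ten events.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

COLLISION_CODE = "1AB"       # RECEIVER READY collision, Section 5.7.3.3
MAX_EVENTS_PER_DAY = 10      # days with more events than this are flagged


def days_over_collision_limit(events: Iterable[Tuple[datetime, str]]):
    """Return the dates on which collisions exceeded the daily limit, in order."""
    per_day = Counter(
        stamp.date()
        for stamp, code in events
        if code.strip().upper() == COLLISION_CODE
    )
    return sorted(day for day, count in per_day.items() if count > MAX_EVENTS_PER_DAY)
```

Before acting on a flagged day, discount events that coincide with cable changes or controller initializations, since the text excludes those from the acceptable-rate criterion.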
Resolve unacceptable event rates of more than ten a day by replacing the ECM, the controller port interface module, cables, or bulkheads.

5.7.3.4 Controller-Detected: DRIVE CLOCK DROPOUT (Status/Event Code: AB) [3.11]

Either data (read/response line) or state clock (RTDS) was missing when it should have been present. This is usually detected through a timeout. A fatal drive condition can cause the drive to drop the drive clocks.

After performing a drive initialization, the drive should reassert clocks to the controller to re-establish communications and state information between the drive and controller. The sequence of getting status and error information then occurs. Analysis of error log message packets usually indicates that the above sequence has occurred.

If such message packets are not being processed or received, it is possible that the condition cannot be detected by the drive. Execute drive SDI loopback tests to try to find subtle SDI problems. The order of emphasis is:
• ECM
• Controller port module
• Cabling (including bulkhead connectors)

5.7.3.5 Controller-Detected: DRIVE FAILED INITIALIZATION (Status/Event Code: 16B) [3.12]

The drive clock failed to resume following a controller-attempted drive initialization. This implies the drive encountered a fatal initialization error. It may also indicate the drive was attempting its own initialization or that the drive is looping in an initialization state or routine.

5.7.3.6 Controller-Detected: DRIVE IGNORED INITIALIZATION (Status/Event Code: 18B) [3.13]

The drive clock continued running even though the controller attempted to perform a drive initialization. This implies the drive did not recognize the INIT command from the controller. It may also indicate the drive was performing an initialization caused by some drive-detected condition and, in the course of initialization, ignored the controller's attempt to initialize the drive.
5.7.3.7 Controller-Detected: SERDES OVERRUN ERROR (Status/Event Code: 2A) [3.14]

SERDES overrun (or underrun) errors indicate that the drive is too fast for the controller or, more typically, a controller hardware fault is preventing the controller microcode from keeping up with data transfers to or from the drive.

Because of the speed with which the RA90/RA92 disk drive handles data transfers, some SDI controller ports may not be able to keep up with data transfers to and from the drive. This speed sensitivity may even show up on drive ports that have successfully run other RA-type disk drives.

There is not a universal problem with Digital SDI controller port boards. The controller port board design supports RA90/RA92 operating speeds.

The SERDES overrun problem manifests itself as transient occurrences of the error or as solid SERDES problems preventing execution of read/write operations to the drive.

For all controllers, the SERDES occurrence looks like a single controller port failure and is seldom related to a particular drive port.

5.7.3.8 SDI Drive Command Timeout (Status/Event Code: 2B) [3.15]

A controller may report an SDI command timeout when it issues a command to the drive and the drive does not respond within the required timeout period. The timeout period is command-dependent.

SDI command timeouts are associated with Status/Event Code 2B. These events will frequently occur under the following conditions:
• Powering up a drive with one or both port switches depressed, then hitting the Run switch.
• Spinning down a drive with one or both port switches depressed.

Under these two conditions, the SDI command timeout event reports can be ignored. However, under other conditions, you should examine SDI command timeout events by looking at the logged errors around the time of the event.
The drive internal error log may also reveal clues to the problem; however, you should verify that the time of the error, as logged in the drive, corresponds to the time of the event.

If the controller is an HSC, verify that the device priority is correctly managed. The RA90/RA92 disk drive's place in the priority scheme is as follows:
TA90 (highest priority)
RA90/RA92
ESE2x
RA82
RA81
RA70
RA80 (lowest priority)

5.8 Media-Related Errors

Media and read/write transfer problems manifest themselves in many ways. Symptoms include:
• ECC errors (refer to Section 5.16)
• Uncorrectable ECC errors (refer to Section 5.16.1)
• Header-not-found errors (refer to Section 5.16.1)
• Invalid header errors (refer to Section 5.16.1)
• Header compare errors (refer to Section 5.16.1)
• Format errors
• Data sync timeout errors

Read/write errors may involve the read/write data path or defective media. For the SDI disk subsystem, the read/write data path includes:
• SDI controller read/write data path circuits
• SDI cables and bulkhead connectors
• Disk drive read/write data path hardware
• Disk drive media

Use the following process to analyze read/write transfer errors:
1. Isolate the LBNs associated with the logged transfer errors in the host or HSC error log. If there are many, randomly select 10 to 20. Use the appropriate algorithm to decode targeted LBN numbers to the logical cylinder, group, and head. Refer to Example 5-1 for RA90 LBN conversion, and Example 5-2 for RA92 LBN conversion.
2. Decode the LBNs in question to physical cylinders, tracks, and groups (physical read/write heads).

5.8.1 Repeating LBNs/RBNs

LBNs or RBNs that consistently recur in the host error log should be replaced. If the controller or system has not marked these for replacement, replace them manually by running HSC DKUTIL, EVRLK, ZUDLx, or MDM. This is a useful procedure for blocks that consistently report ECC or data errors.
This symptom occurs when the host bad block replacement (BBR) software does not use customer data as a pattern to test the suspect block. The block is initially flagged for replacement. The host executes a test of the block and finds nothing wrong. It does not revector the block, but instead restores the original data to the block. The user then attempts to access the data and may get another ECC error severe enough to invoke the BBR activity again.

5.8.2 Excessive Number of Blocks Replaced Because of R/W Path Problems

Read/write data path problems may cause the replacement of a high number of good blocks. This may lead to logical fragmentation of the disk. If this happens, the number of blocks in the RCT recorded as revectored differs substantially from FCT information. For example, the RCT may show a doubling of replaced blocks occurring over a short period of time. Use EVRLB, MDM, ZUDKxx, or HSC FORMAT to reformat the disk and recover these good blocks.

NOTE
Back up customer data before executing the reformat.

Use the host error log to identify replacement blocks and to show if BBR activity is complete. Use HSC DKUTIL to dump the factory scan (FCT) and RCT areas of the disk. Look for differences between the FCT and what is currently in the RCT. The contents of the RCT only show which blocks were replaced; the host error log and HSC console logs supply the time of replacement.

Keep good records in the site management/cluster guide. Include results of VERIFY and BBR scans of each disk. This information helps identify changes in block replacement activity and is part of good site management practices.

5.8.3 LBN Correlation to Single Group/Track

Consistent failures involving one or two read/write heads usually indicate an HDA failure.

5.8.4 LBN Correlation to Head Groups

Consistent failures within head groups are usually due to head selection logic within the HDA.
The groups are as follows:

RA90 (LA)        RA90 (SA)        RA92
70-22951-01      70-27268-01      70-27492-01
HDA Rev 00       HDA Rev 01       HDA Rev 10
0-3              0-2              0-2
4-7              3-6              3-6
8-11             7-9              7-9
12               10-12            10-12

Replace in the following order:
1. PCM
2. HDA

5.8.4.1 LBNs Correlated to Zone Write Boundaries

Failures showing no consistency to a group or head may show consistency in write current zones. DSA drives divide the media into different write current amplitude zones. The RA90/RA92 divides the media into four write current amplitude zones as listed in Table 5-4.

Table 5-4 RA90/RA92 Write Zones

        RA90                                 RA92
Zone    Cylinder Range   LBN                 Cylinder Range   LBN
0       0000-1722        0-1546428           0000-2014        0-1912234
1       1723-2020        1546429-1813724     2015-2363        1912235-2243435
2       2021-2335        1813725-2096289     2364-2731        2243436-2592667
3       2336-2660        2096290-2377747     2732-3112        2592668-2954237

To verify this correlation, you need a substantial number of errors (greater than 100) and knowledge of the user disk space being used. A customer using more than 50 percent of the available disk space is probably accessing all zones of the disk. A disk using less than 25 percent of the disk space may only be accessing a single zone. Knowledge of operating system utilization of disk space is necessary to make this troubleshooting procedure effective.

Zone-related problems encountered with the RA90/RA92 disk drives generally are resolved by replacing the PCM, ECM, or HDA (in that order).

5.8.4.2 LBN Correlation to a Physical Cylinder

Failures consistently related to a specific cylinder may be the result of a head touchdown. Problems involving servo detection information (dedicated and/or embedded) that prevent head tracking to cylinders usually indicate media corruption. These problems include HDA and ECM electronics. Failures caused by a head crash are usually confined to specific cylinders and may include an area as wide as ten cylinders. Failures confined to one to three cylinders usually indicate servo data failures. In the RA90/RA92, logical cylinders correlate to physical cylinders.
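The zone correlation described in Section 5.8.4.1 amounts to a range lookup against Table 5-4 followed by a tally. The following is a minimal sketch in Python, not a Digital-supplied tool: the names ZONE_LAST_LBN, write_zone, and zone_histogram are hypothetical, and the only data used are the zone boundary LBNs copied from Table 5-4.

```python
from bisect import bisect_left
from collections import Counter

# Last LBN of each write current zone (zones 0-3), copied from Table 5-4.
# The dictionary and function names are illustrative, not part of any utility.
ZONE_LAST_LBN = {
    "RA90": (1546428, 1813724, 2096289, 2377747),
    "RA92": (1912234, 2243435, 2592667, 2954237),
}


def write_zone(drive, lbn):
    """Return the write current zone (0-3) containing lbn for the given drive."""
    limits = ZONE_LAST_LBN[drive]
    if not 0 <= lbn <= limits[-1]:
        raise ValueError(f"LBN {lbn} is outside the Table 5-4 range for {drive}")
    # bisect_left finds the first zone whose last LBN is >= lbn.
    return bisect_left(limits, lbn)


def zone_histogram(drive, lbns):
    """Count how many of the logged LBNs fall in each write current zone."""
    return Counter(write_zone(drive, lbn) for lbn in lbns)


if __name__ == "__main__":
    # Made-up LBNs for illustration only; real input comes from the error log.
    sample = [1200, 1546429, 1700000, 1813725, 2300000]
    print(dict(zone_histogram("RA90", sample)))
```

A histogram that is heavily skewed toward one zone is only meaningful with the sample size the text calls for (more than 100 errors) and with knowledge of how much of the disk the customer actually uses.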
5.8.5 Multiple Controllers Report Same Error Types

If multiple controllers report the same error types and only one drive port (after cable swap) reports the error, it is likely an ECM problem.

If multiple controllers report the same error types and both drive ports report the same error, replace drive components in the following order:
1. PCM
2. ECM
3. SDI cabling/interconnects
4. Power source
5. Spindle ground brush
6. HDA

5.8.6 Only Single Controller Port Affected

If errors occur to a single controller port and both drive ports have been tested to a known good controller interface, then the problem is in the controller or cable.

5.8.7 Isolating Random R/W Transfer Errors

NOTE
You are here only because the disk drive is experiencing random read/write transfer errors or because your checklist has led you here. If you have not pinpointed the failure, see Section 5.9.

Random physical cylinder and head failures are generally caused by ECM/SDI/SDI-controller interface problems. A faulty spindle ground mechanism or a power supply exceeding noise specifications may also cause a drive to exhibit random errors.

Intermittent read/write problems involving random read/write heads and cylinders may be the result of intermittent failures through the read/write data path. This includes SDI cabling or read/write data path hardware in the controller.

5.8.7.1 Not Defined to a Specific Drive/Controller Port

This is a decision point for the first-time call effort with random read/write errors. If working from a miscellaneous check or action item list, proceed to Section 5.9.

For the RA90/RA92 drive, replace parts in the following order:
1. PCM
2. ECM
3. Cabling (reconfigure)
4. Power supply
5. Spindle ground brush
6. HDA

5.9 Miscellaneous Checks

Miscellaneous checks are provided as an alternative when:
• No host error log is available.
• No HSC console trail is available.
• No errors are logged in the drive internal error log.
• Errors are transient or not reproducible through standalone diagnostics.

If you cannot access the RA90/RA92 drive internal error log from the OCP, replace FRUs in the following order:
1. ECM
2. OCP
3. Power supply

If you cannot access the RA90/RA92 drive internal error log with DKUTIL or EVRLL/ZUDM/MDM, perform the following:
1. Execute resident diagnostic test T00 (drive spun down).
2. Execute resident diagnostic test T00 (drive spun up).
3. Execute external loopback SDI test T09 (use loopback connector Digital part number 70-19074-01).
4. Check drive power supply and indicators. See Section 5.2.6 for the location of power supply indicators and their meanings.
5. Check drive power supply for proper voltages and ripple (noise). See Chapter 1 for power supply operating specifications.
6. Check spindle ground brush for excessive wear.
7. Check the SDI cable by changing the cable.
8. Check the controller port by connecting the SDI cable to another port.

Unreliable power from the power supply, controller, or source power may cause the drive to exhibit a variety of unrelated errors. Ensure source power is within tolerances and follow suggested drive power checks.

If all checks have been made and no problem is found, replace the ECM. The ECM is the most likely FRU to fail, provided the failing drive has been correctly identified. Use the Customer Support Center for problems beyond the scope of your experience or this manual.

NOTE
For transient disk subsystem errors, running host-level diagnostics on xDA/xDB controllers seldom isolates errors without long run times. This seriously impacts system availability to the customer. Use system-level and drive internal error logs whenever possible.

5.10 Are You Lost?

If you feel that the problem is beyond your capabilities and you have spent too much time trying to isolate it, use available support resources.
Digital Customer Services should operate within the Management Action Planning (MAP) guidelines for each respective area of the country/world. If you are in the process of performing action items, complete those items and reenter the drive fault evaluation phase after collecting new error data.

5.11 Using Host-Level Diagnostics as a Last Resort

There are significant concerns about running standalone diagnostics in troubleshooting RA90/RA92 disk problems. Running standalone diagnostics extends site time and makes the system unavailable to the customer. Customer Services goals are to maximize system or device availability to the customer and minimize repair time.

Consider running host-level diagnostics only if you have exhausted all options. Tables 5-5 through 5-7 contain the names of diagnostics that are compatible with the RA90/RA92 disk drives.

CAUTION
Back up customer data before executing diagnostics on customer data areas of the disk. Protection of customer data is your responsibility.

Follow the strategy which is in place to provide quick and accurate diagnosis, repair, and validation. This strategy minimizes the impact on system or device availability.

5.11.1 HSC-Based Diagnostics

Use HSC utilities (DKUTIL) and diagnostics (ILEXER and ILDISK) in a cluster environment. Though the diagnostics are in line and do not cause a loss of system availability, device availability is an issue. With that in mind, examine the drive internal error log prior to running standalone diagnostics.

To execute the in-line tests or utilities, the drive must first be dismounted. The rest of the disk subsystem will not be affected. DKUTIL, ILEXER, and ILDISK do not adversely affect the drive; however, ensure customer data is protected. While running these tests, give errors detected by the drive or controller top priority.
5.11.2 KDM-Based Diagnostics

Use KDM utilities (DKUTIL) and diagnostics (ILEXER and ILDEVO) in a cluster environment. Though the diagnostics are in line and do not cause a loss of system availability, device availability is an issue. With that in mind, examine the drive internal error log before running standalone diagnostics.

To execute the in-line tests or utilities, the drive must first be dismounted. The rest of the disk subsystem will not be affected. DKUTIL, ILEXER, and ILDEVO do not adversely affect the drive; however, ensure customer data is protected. While running these tests, give errors detected by the drive or controller top priority.

5.11.2.1 On Line from VMS

Use the following procedure to access and run on-line programs on a KDM controller. See Section 5.11.2.2 for instructions on accessing and running programs in standalone mode.

NOTE
You cannot run on-line diagnostics, exercisers, and utilities without first running EVRLN.KDM. Follow the procedure shown here.

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> CONNECT FYA0 /NOADAPTER
    SYSGEN> EXIT
    $ SET DEFAULT SYS$[illegible]
    $ SET HOST/DUP /SERVER=DUP /LOAD=EVRLN.KDM PUA0/DEVICE
    $ SET HOST/DUP /SERVER=DUP /TASK=ILDEVO PUA0/DEVICE

5.11.2.2 Running Standalone Programs from the VAX Diagnostic Supervisor

    DS> ATTACH KDM70 HUB DUx 11 BR
                             |  |
                             |  +---- BUS REQUEST
                             +------- NODE NUMBER
    DS> SELECT DUx
    DS> RUN EVRLN
    EVRLN> RUNL ILDEVO

5.11.3 xDA Controller-Based Diagnostics

To run standalone diagnostics or utilities (excluding EVRAE) through any UDA, KDA, or KDB controller, the operating system must be shut down and the appropriate diagnostic/supervisor loaded.

Some diagnostics force error conditions to validate the drive's ability to detect error conditions. Error conditions detected by the drives are logged to the drive internal error log as a normal course of operation.
Therefore, through several iterations of a standalone diagnostic, the drive internal error log may be overwritten and the real drive-detected errors lost. For example, running a single iteration of MDM on a MicroVAX may result in 13 error events. These events are logged to the drive's internal error log (EEPROM) and may overwrite important error information. With that in mind, examine the drive internal error log before running standalone diagnostics.

A recent SDI specification change addresses this issue by having the controller disable drive error logging during drive testing. The following diagnostic software releases incorporate the SDI specification changes:
• XXDP-Release 135 (Q3FY88)
  - ZUDG rev C0
  - ZUDH rev C0
• MDM-Release 122 (Q3FY88)
  - NAKDAH
• VDS-Release 31 (Q4FY88)
  - EVRLF version 8.3
  - EVRLG version 8.3

If any errors occur while running disk diagnostics, go to Section 5.6. If multiple errors occur, go to Section 5.13.1. If no errors occur, go to Section 5.10 and call remote support.

Table 5-5 VDS-Based Off-Line Diagnostics

Diagnostic   Title
EVRLB        Drive formatter
EVRLF        Tests 1-3
EVRLG        Test 4
EVRLJ        Test 5
EVRLK        Bad block replacement utility (Scrubber)
EVRLL        Drive-resident error log utility
EVRAE        MSCP disk exerciser
EVSBA        VAX autosizer

Table 5-6 MDM-Based Off-Line Diagnostics

Diagnostic   Title
MDM          MicroVAX diagnostic supervisor (1)

(1) Currently has a problem identifying drive unit number.

Table 5-7 XXDP-Based Off-Line Diagnostics

Diagnostic   Title
ZUDH (2)     Tests 1-3
             Test 1: UNIBUS interrupt/address test
             Test 2: Executes drive-resident diagnostics
             Test 3: Disk function test (rd/wrt)
ZUDI (2)     Test 4: Disk exerciser
ZUDJ         Test 5: UDA/KDA50 subsystem exerciser
ZUDK         Formatter
ZUDL         Bad block replacement utility
ZUDM         Disk-resident error log utility

(2) Forces errors during run that are logged in the drive internal error log.
5.12 Exiting Data Collection: Action Item List Process

Your goal during the data collection phase is to collect logged subsystem events including:
• Status/event codes from error log packets
• Drive-detected master error codes
• Identified target LBN numbers

When no host or HSC error log information is available, use the drive internal error log or operator/system console trail to identify the problem drive. In some isolated cases (less than one percent), you will have to use a troubleshooting worksheet (described in Section 5.4.1) in place of system logged information.

You should leave this phase ready to analyze collected data or with an action item list.

5.13 FRU Replacement

Replace an FRU only after:
• Analysis of VAXsimPLUS directed a replacement FRU based upon its analysis of occurring errors or error rates.
• Analysis of host error logs resulted in a list of error codes with particular emphasis placed on identifying drive-detected error codes. The error codes should predominantly be drive error codes. In some circumstances, error codes are generated by the controller.
• Analysis of the HSC console log resulted in a list of drive error codes used in identifying replacement FRUs.
• Analysis of the drive internal error log led to an identification of a replacement FRU.
• Analysis of miscellaneous checks or the process of elimination identified an FRU replacement.

Once an error code has been established from one of the previously mentioned sources, refer to Section 5.19 for error code descriptions and suggested FRU replacement(s).

5.13.1 Multiple Error Codes

If a number of different error codes are detected, consider the following to decide which error code(s) to use for troubleshooting:
• Give error codes obtained from running internal drive diagnostics top priority.
• Select an error code or symptom that indicates the least number of FRUs.
  Drive-detected errors of this type will have been derived using the least amount of circuitry to isolate the particular failure.
• Select the error code that occurs most often.
• Select the FRU that is most commonly indicated by different error codes.
• Select the FRU that most commonly indicates the same manufacturing code (Section 5.2.3.13).

5.13.2 Service Post-Verification

After replacing an FRU or repairing a drive, execute drive-resident diagnostics. You can do this through power-up and spin-up cycles or by using tests which exercise the repaired FRUs. Compare the results to the diagnostics executed during pre-verification testing (Section 5.6.1).

Post-verification testing accomplishes the following:
• Verifies that no new problems have been introduced when servicing or replacing FRUs.
• Verifies that a repair or replaced part corrected any problems detected during pre-verification testing.

If the same error code(s) occur during post-verification testing, reinstall the original FRU. Continue troubleshooting procedures, or replace the next identified FRU in the appropriate list.

If the diagnostics pass successfully, the problem has most likely been resolved, with the following exception:
• If the original error codes used for FRU isolation were the result of host, controller, or drive internal error log entries (not duplicated by running pre-verification testing), the problem may be due to an intermittent failure. Proceed to Section 5.13.3.

If any errors occur, you may want to reinstall the original drive FRU and go to Section 5.6.1.

5.13.3 Return Disk Drive to User

After checkout is complete, return the disk drive to the user. Have the user exercise the repaired disk drive through customer applications. If customer applications appear to be functioning normally, the call can be closed. If the drive fails, return to Section 5.6 or call remote support.
If there is a question as to the correct identity of the failing disk drive, return to Section 5.5.

5.14 Performance Issues When No Errors Are Being Logged

Customer complaints of disk performance can require a fair amount of analysis. Often the performance complaints are quite subjective. The following list of questions may help analyze performance complaints:
1. Do the performance issues relate to all or most of the disks? If so, ensure that system parameters comply with suggested guidelines. Cluster size of disks, working set size parameters, paging parameters, and ACP/XQP-related parameters all can affect performance.
2. Do the performance problems occur during image activation (when a large application program is initially started)? Many layered products require some time to fully activate. This is not a disk problem.
3. Is the performance problem noticed by users of the same image, layered product, or file on the (same) disk? If the disk is attached to a local controller (UDA/KDA/KDB) but is a VAX node member in a cluster, then request that the file/image/layered software product be moved to a disk on the HSC. Local serving of disks creates bus, VAX, and I/O overhead that impacts performance.
4. Is the performance problem noticed by users of a file/image/layered product that resides on the same disk as the swap and page files? If so, request that the system manager monitor paging and swapping activity. High page/swap rates decrease VMS response and create an I/O bottleneck for the page/swap disk. Request that the file/image/layered product be moved to another disk.

In addition to system parameter settings, two areas of the architecture (hardware-related) contribute to actual loss of performance. These include:

1. Nonprimary replacements in a critical file or directory structure, such as the following examples:
• Nonprimary replacement in VMS disk: [000000]INDEXF.SYS
• Nonprimary replacement in a frequently used directory file

The two examples are of files that may affect the perceived performance of a disk. However, the location of a block of data within a file and how the operating system is set up equally affect nonprimary replacement which, in turn, impacts system or disk drive performance.

A nonprimary replaced block in the INDEXF.SYS file of a disk could be very significant if it is in the front of the file. However, if it is the last block within the file, it might not have as large an impact on system performance.

A nonprimary replacement in a block within SYS.EXE that is loaded once by VMS into memory (at startup) and stays resident in memory has no effect on performance. However, if the block is within a portion of SYS.EXE that is frequently brought in by VMS, it could impact performance. A solution is to increase the VMS working set size.

A nonprimary replaced block within a swap or paging file has little performance impact. If the system is doing enough paging and swapping to notice the occurrence of nonprimary replacements, the real problem may be with the user or system working set size. Performance may improve if the system manager adjusts system parameters around paging and swapping.

VMS uses virtual block file structures, not logical blocks. VBNs do not correlate to LBNs. To correlate an LBN to the affected file, contact someone familiar with the operating system file structure, such as VMS ODS-2. Identifying affected files within ODS-2 is very complicated.

2. Difficulty (but success) in achieving fine track following a seek. The RA90/RA92 disk drive utilities T36, T3B, and T39 measure various seek time parameters.
Compare measured times to drive specifications in cases where seek time is in question. Temperature can affect the performance of T36 and T3B.

5.15 Troubleshooting VMS Mount Verification

EXE$MOUNTVER is the VMS executable mount verification process that brings disks back on line after a problem has made the drives inaccessible to a host VAX. It is a very complicated process. If any failure to reinitialize the disk occurs, or if EXE$MOUNTVER exceeds its allowed timeout period (default 10 minutes), the host logs a mount verification error to the host error log.

5.15.1 VMS Mount Verification

The mount-verification feature of Files-11 disk handling generally leaves users unaware that a mounted disk has gone off line and returned on line (or in some other way has been unreachable and then restored).

Mount verification is the default parameter for EXE$MOUNTVER, with the following exceptions: Disks mounted /FOREIGN and disks mounted /NOMOUNT_VERIFICATION do not undergo mount verification except during cluster state transitions. Drives dual-ported through HSC controllers should never be mounted /NOMOUNT_VERIFICATION because this may prevent VMS from failing the drive over to the secondary HSC controller.

EXE$MOUNTVER sends status messages to OPCOM. Because there are cases when mount verification messages are needed at the operator console and OPCOM might not be able to provide them, mount verification also sends special messages with the prefix %SYSTEM-I-MOUNTVER to the operator console, OPA0.

5.15.2 VMS Problems Surrounding Diagnosis of "Why a Drive Mount-Verifies"

VMS calls EXE$MOUNTVER if a drive loses contact with the system. (For example, the controller sends a command to the drive but does not get a successful response back within the controller-specific timeout period.) The process verifies that the disk VMS reestablished contact with is the same disk originally connected. Sending the drive to the mount-verify state involves:
1. The host initiating an MSCP ONLINE command to the drive modifier followed by a GET UNIT STATUS (GUS).
2. The host reading the home block and comparing the volume information (serial number, name, etc.) for the drive before VMS lost contact and after VMS reestablished contact with the drive during mount verification.

This sequence is repeated until success or timeout. This sequence is made evident by the drive having a port light on and the Ready light blinking quite slowly as the controller accesses the FCT for the on line and LBN block for the media ID, effectively doing full-stroke seeks.

The MVTIMEOUT system parameter defines the time (in seconds) allowed for a pending mount verification to complete before it is aborted. This dynamic parameter should always be set to a reasonable value for the typical operations at the site.

NOTE
Do not use values less than the recommended default of 600 seconds (10 minutes).

After a mount verification times out, any pending I/O requests to the volume will fail. Try to execute the DISMOUNT/ABORT command, which allows a subsequent mount to be successful if the MV-timer has previously expired. In some extreme cases, drive failures may require a reboot of the controller; some require a reboot of the system.

Entry and exit to or from MOUNT VERIFY are time stamped. VAXcluster time stamps may vary across the cluster nodes due to differences in the TOY clocks and the initial clock times. Slight variations in time stamps do not indicate multiple drive or controller failures causing MOUNT VERIFICATION, but rather one drive or controller failure causing every node to enter MOUNT VERIFICATION at its own locally specified time.

Some reasons why a drive enters mount verification:
• Disk drive dropped off line because of:
  - Port switch glitch.
  - Drive fault.
• Lost communications with controller or cable fault (drive temporarily went away and came back).
• Drive status changed (operator physically did something with the drive).
• Someone accidentally pushed the Write Protect switch.

By noting the time duration of the mount verification and other circumstances surrounding the mount verify status, you can determine some valuable troubleshooting information.

How long did the mount verify take?

Less than MVTIMEOUT means the drive eventually succeeded.

A few seconds implies a glitch or a recoverable fault. Did the drive appear on another controller after the mount verification? If so, it could be a port-related problem.

Thirty seconds to a minute to remount probably means the drive was spun down and had to be spun back up. Was this due to a drive fault? Did it run its spin-up diagnostics error free?

Infinite time probably means that, along with the drive disappearing, it also changed its media_ID, or it is a different drive, or it continually fails its spin-up diagnostics, or there is a hard fault on the drive.

What happened?

VMS does not log errors during the MOUNT VERIFY process, although it may log some before or after, depending on how the drive failed.

Did the drive see a fault during this period? (Examine the drive internal error log for error information.)

Were any errors logged to the host or HSC console log before or after the mount verify?

Is it always the same drive?

Do any nonexistent drive numbers appear which may characterize a unit select problem?

Was there a last-fail packet from the xDA/xDB shortly after, meaning the controller faulted/initialized as well?

Did all the drives on a port/K/controller fail?

5.15.3 Non-VMS Mount Verification

RSTS 9.x is tolerant of DSA drives dropping off line. It reinitializes the drive and puts it back on line. Most other drives remain off line unless the driver is patched to reissue on line before every command (as RSX does).
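The duration guidance in Section 5.15.2 can be expressed as a small triage helper. The following Python sketch is illustrative only and is not part of VMS or any Digital utility: the function name classify_mount_verify and the 10-second cutoff for "a few seconds" are assumptions, since the manual gives no exact figure. The 30 to 60 second band and the 600-second MVTIMEOUT default come from the text.

```python
MVTIMEOUT_DEFAULT_S = 600   # recommended default from Section 5.15.2
FEW_SECONDS_CUTOFF_S = 10   # assumption: the manual only says "a few seconds"


def classify_mount_verify(duration_s, mvtimeout_s=MVTIMEOUT_DEFAULT_S):
    """Map a mount verification duration to the likely cause named in the text.

    duration_s is the elapsed time in seconds between the MOUNT VERIFY entry
    and exit time stamps, or None if mount verification never completed.
    """
    if duration_s is None or duration_s >= mvtimeout_s:
        # "Infinite time": media_ID changed, different drive, repeated
        # spin-up diagnostic failure, or hard drive fault.
        return "timeout"
    if duration_s <= FEW_SECONDS_CUTOFF_S:
        # Glitch or recoverable fault; check for a port-related problem.
        return "glitch"
    if 30 <= duration_s <= 60:
        # Drive was probably spun down and had to spin back up.
        return "spin_cycle"
    # The manual gives no specific guidance for other durations.
    return "inconclusive"


if __name__ == "__main__":
    for seconds in (3, 45, 200, None):
        print(seconds, classify_mount_verify(seconds))
```

The result is only a starting point; the follow-up questions in Section 5.15.2 (drive internal error log entries, last-fail packets, whether every drive on the port failed) still have to be answered from the logs.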
5.16 Troubleshooting ECC Errors on RA90/RA92 Disk Drives

Disks are getting bigger and faster. As disk bit and track density increases, the electronics and mechanical components of the subsystem operate under tighter constraints. This means that error recovery mechanisms within the architecture may be called upon more frequently to compensate for these narrow tolerances.

This is one of the significant advantages of a Digital storage solution. Digital integrates into the design of the controller and the drive error recovery attributes that enhance and ensure data integrity and delivery to the user. Plug-compatible manufacturers (PCMs) of storage devices, by not owning the design of both ends of the subsystem (controller and drive), are left with little capacity to implement such techniques.

The RA90/RA92 disk drive has 14 different error recovery mechanisms (reference Appendix B) and, therefore, affords excellent recovery potential for data errors. These error recovery mechanisms provide the margins necessary to protect customer data at increased densities and to ensure that the data is always delivered successfully.

To better determine the significance of logged correctable and uncorrectable ECC errors, and for assistance in troubleshooting either, note the discussions and error log examples in the sections that follow.

5.16.1 Uncorrectable ECC Errors-MSCP Status/Event E8

An uncorrectable ECC error is architecturally defined as the occurrence of a controller logging an MSCP status/event E8 as a result of a read data error. There are two uncorrectable ECC error types: hard and soft. Both types are reflected by a single MSCP status/event code. The next two sections help the engineer determine whether the status/event was hard or soft and whether it was significant or insignificant.
5.16.1.1 Hard Uncorrectable ECC Errors

A hard uncorrectable ECC error is the occurrence of an uncorrectable ECC error that renders the drive unable to recover data through any retry or recovery mechanism. An uncorrectable ECC error is not considered "hard" until all attempts at getting the data are exhausted and the controller has to terminate its attempts.

Example 5-3 shows a VMS error log error packet where the data was lost due to a hard error. The fields of note are emphasized in bold.

******************************* ENTRY 29. *******************************
ERROR SEQUENCE 3885.                      LOGGED ON: SID 0200620E
DATE/TIME 30-JAN-1989 19:54:03.77                    SYS_TYPE 00000000
SCS NODE: PICKUP

ERL$LOGMESSAGE ENTRY   KA750 REV# 14. UCODE REV# 98.

I/O SUB-SYSTEM, UNIT _HSC013$DUA36:

MESSAGE TYPE        0001                  DISK MSCP MESSAGE
MSLG$L_CMD_REF      AF66000F
MSLG$W_UNIT         0024                  UNIT #36.
MSLG$W_SEQ_NUM      0054                  SEQUENCE #84.
MSLG$B_FORMAT       02                    DISK TRANSFER ERROR
MSLG$B_FLAGS        E0                    BAD BLK REPLACEMENT REQUEST
                                          OPERATION CONTINUING
                                          OPERATION SUCCESSFUL
MSLG$W_EVENT        00E8                  DATA ERROR
                                          UNCORRECTABLE ECC ERROR
MSLG$Q_CNT_ID       0000F20D 01010000
MSLG$B_UNIT_SVR     0B                    UNIT SOFTWARE VERSION #11.
MSLG$B_UNIT_HVR     01                    UNIT HARDWARE REVISION #1.
MSLG$B_LEVEL        01                    <Last retry level
MSLG$B_RETRY        05                    <Fifth retry at last retry level
MSLG$L_VOL_SER      0000036C              VOLUME SERIAL #876.
MSLG$L_HDR_CODE     000E75BD              LOGICAL BLOCK #947645.
                                          GOOD LOGICAL SECTOR

CONTROLLER DEPENDENT INFORMATION

ORIG ERR            8010                  EDC ERROR
                                          ECC ERROR
ERR RECOV FLGS      0003                  LBN REPLACEMENT INDICATED
                                          ERR LOGGED TO CONSOLE AND HOST
LVL A RETRY         00
LVL B RETRY         00
BUF DAT MEM ADR     C41B
SRC REQ #           03
DET REQ #           03
******************************************************

Example 5-3 VMS Uncorrectable ECC Error Log-Hard

The disk subsystem will attempt to recover from an uncorrectable ECC error by retrying the transfer five times.
For an RA90/RA92 disk drive, the controller would then invoke drive recovery level 14 and execute that recovery mechanism up to five times, then invoke drive recovery level 13, and so on, until executing the last recovery level (1). Note that for UDA controllers, the reported recovery levels from the controller will differ from what the other controllers will report.

5.16.1.2 Soft Uncorrectable ECC Errors

A soft uncorrectable ECC error is the occurrence of an uncorrectable ECC error on the first read attempt, where a subsequent recovery level and/or retry succeeded and the data was read successfully (with eight or fewer symbols in error). In such a case, the block is flagged as a BBR candidate for testing purposes by the HSC controller (or, in the case of a UDA/KDA/KDB controller, by the host operating system driver).

For uncorrectable ECC errors (MSCP status/event E8), the following items should be considered:
• For the RA90/RA92 disk drive, examine the error log and determine that MSLG$B_LEVEL and MSLG$B_RETRY (for VMS) are being reported as follows:
  If the recovery level is reported as 0 and the retry count is 1 for the uncorrectable ECC errors, an occasional error under high I/O rates may be considered normal. The normal recovery will occur on the first retry with a recovery level of 0.
  If more than a single retry is necessary, and especially if other levels of recovery are necessary, this indicates potentially more serious error conditions, including the legitimate condition whereby a block is going bad and needs replacement. The RA90 short-arm HDA and the RA92 HDA will show improved (decreased) ECC error rates.
  The nominal distribution of uncorrectable ECC errors for an RA90 disk drive with a long-arm HDA operating at very high I/O rates should appear as follows:
  - Ninety percent of the errors occur in the top five heads (heads 0 through 4).
  - One of the heads (in the 0-4 range) will have no errors logged.
  - At least three of the top five heads will have errors of this type.
  - You should have a sample size of at least 16 uncorrectable ECC errors for examination.

  If this distribution of errors is not met, then further analysis should be done. For example, if 10 of the 13 heads are logging these data errors, then consider it a general read path problem and troubleshoot accordingly. If distribution is to a single head, then consider the likelihood of a defective HDA. If error log information indicates that data recovery was accomplished by utilizing a drive error recovery level of 7 through 14 (head offset mechanism), then consider HDA replacement (especially if 9A, 9B, or 9C errors are being logged in the drive as well).

• Each error log entry of an uncorrectable ECC error should be followed by a BBR packet (reference Section 5.16.2.1). The MSCP status/event code should reflect a 34 (BBR replacement attempted but block tested okay). Blocks in a normal drive will be retired at a very low rate (less than 20 percent of the time) for the normal transient occurrence of uncorrectable ECC errors on RA90 disk drives.

Example 5-4 has three fields of note (emphasized in bold). The first emphasized field denotes the actual MSCP status/event logged (00E8), and a bit-to-text decode denoting that the read error was an uncorrectable ECC error.

The second field of note indicates how the subsystem recovered from the error condition; in this case, a single retry was successful with no special error recovery mechanism being invoked to aid in the recovery of the data.

The third emphasized field is the field within an error log packet that, for an ECC-type MSCP status/event packet, typically has no meaning and will in almost all cases indicate zeros. This section of an error log packet will, however, contain significant information for the interpretation of MSCP status/event 6B error packets.

    ENTRY 29.
    *************************************************************************
    ERROR SEQUENCE 3885.                  LOGGED ON: SID 0200620E
    DATE/TIME 30-JAN-1989 19:54:03.77                SYS_TYPE 00000000
    SCS NODE: PICKUP

    ERL$LOGMESSAGE ENTRY    KA750 REV# 14.  UCODE REV# 98.

    I/O SUB-SYSTEM, UNIT _HSC013$DUA36:

    MESSAGE TYPE        0001        DISK MSCP MESSAGE
    MSLG$L_CMD_REF      AE66000F
    MSLG$W_UNIT         0024        UNIT #36.
    MSLG$W_SEQ_NUM      0054        SEQUENCE #84.
    MSLG$B_FORMAT       02          DISK TRANSFER ERROR
    MSLG$B_FLAGS        E0          BAD BLK REPLACEMENT REQUEST
                                    OPERATION CONTINUING
                                    OPERATION SUCCESSFUL
    MSLG$W_EVENT        00E8        DATA ERROR
                                    UNCORRECTABLE ECC ERROR
    MSLG$Q_CNT_ID       0000F20D
                        01010000
    MSLG$B_UNIT_SVR     0B          UNIT SOFTWARE VERSION #11.
    MSLG$B_UNIT_HVR     01          UNIT HARDWARE REVISION #1.
    MSLG$B_LEVEL        00          <No drive recovery invoked
    MSLG$B_RETRY        01          <Single retry was successful
                                    <Minimal impact event
    MSLG$L_VOL_SER      0000036C    VOLUME SERIAL #876.
    MSLG$L_HDR_CODE     000E75BD    LOGICAL BLOCK #947645.
                                    GOOD LOGICAL SECTOR

    CONTROLLER DEPENDENT INFORMATION

    ORIG ERR            8010        EDC ERROR
                                    ECC ERROR
    ERR RECOV FLGS      0003        LBN REPLACEMENT INDICATED
                                    ERR LOGGED TO CONSOLE AND HOST
    LVL A RETRY         00          <For data problems, these fields
    LVL B RETRY         00           should contain 'zeros'.
    BUF DAT MEM ADR     C41B
    SRC REQ #           03
    DET REQ #           03
    *************************************************************************

Example 5-4 VMS Uncorrectable ECC Error Log-Soft

5.16.2 Correctable ECC Errors-MSCP Status/Event Codes 1A8, 1C8, 1E8

Correctable ECC errors are those where the data was read with symbols in error above the drive threshold (6-8 symbols for the RA90/RA92 disk drive).

For ECC errors (MSCP status/event codes 1A8, 1C8, and 1E8), consider the following:

• For an RA90 disk drive with a long-arm HDA, an occasional ECC error (including 6-8 symbols in error and soft uncorrectable errors) may be considered normal when the drive has sustained or burst I/O rates greater than 30 I/Os per second.

  The RA90 short-arm HDA and the RA92 HDA show a marked improvement (decrease) in ECC error rates.
  The nominal distribution of correctable ECC errors for an RA90 disk drive with a long-arm HDA should appear as follows:

  - Ninety percent of the errors occur in the top five heads (heads 0 through 4).
  - One of the heads (in the 0-4 range) will have no errors logged.
  - At least three of the top five heads will have errors of this type.
  - You have a sample size of at least 16 uncorrectable ECC errors for examination.

  If this distribution of errors is not met, then further analysis should be done. For example, if 10 of the 13 heads are logging these data errors, then consider it a general read path problem and troubleshoot accordingly. If distribution is to a single head, then consider the likelihood of a defective HDA. If error log information indicates that data recovery was accomplished by utilizing a drive error recovery level of 7 through 14 (head offset mechanism), then consider HDA replacement (especially if 9A, 9B, or 9C errors are being logged in the drive as well).

• Each error log entry of an ECC (6-8 symbol) error should be followed by a BBR packet (reference Section 5.16.2.1). The MSCP status/event code should reflect a 34 (BBR replacement attempted but block tested okay). Blocks in a normal drive will be retired at a very low rate (less than 20 percent of the time) for the normal transient occurrence of correctable ECC errors on RA90 disk drives.

5.16.2.1 BBR Packet

ECC errors that exceed the drive threshold initiate BBR algorithms. The BBR algorithms are provided to test, verify, and replace (if needed) defective media spots or marginal media/head spot combinations (assuming no data path problems).

In those instances where the BBR algorithms do not determine a need for block replacement, the cause may be a transient error situation or mechanisms not attributable to actual head/media margins. These above-drive-threshold ECC errors (or uncorrectable ECC errors) may be caused by drive phenomena other than bad media/heads.
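The head-distribution guidelines given in Section 5.16.2 can be sketched as a small screening routine. This is only an illustrative sketch, not a Digital-supplied tool: the function name, the return strings, and the order in which the checks are applied are assumptions, and the "10 of the 13 heads" example from the text is treated here as a threshold of ten or more heads. The input is assumed to be a list of head numbers, one per logged ECC error, already extracted from the error log by hand.

```python
from collections import Counter

TOP_HEADS = range(0, 5)   # "top five heads (heads 0 through 4)"
MIN_SAMPLE = 16           # minimum number of errors to examine
MANY_HEADS = 10           # assumption: "10 of the 13 heads" used as a threshold


def assess_head_distribution(error_heads):
    """Classify a list of head numbers (one entry per logged ECC error)."""
    counts = Counter(error_heads)
    total = sum(counts.values())
    if total < MIN_SAMPLE:
        return "insufficient sample"

    heads_with_errors = len(counts)
    if heads_with_errors == 1:
        return "single head: consider defective HDA"
    if heads_with_errors >= MANY_HEADS:
        return "many heads: consider general read path problem"

    # Nominal pattern: at least 90 percent of errors on heads 0-4, with
    # three or four of those five heads logging errors (one head clean).
    top_errors = sum(counts[h] for h in TOP_HEADS)
    top_heads_hit = sum(1 for h in TOP_HEADS if counts[h] > 0)
    if top_errors / total >= 0.90 and 3 <= top_heads_hit <= 4:
        return "nominal"
    return "not nominal: further analysis"
```

For example, 17 errors spread over heads 0, 1, and 2 with a single error on head 7 would be reported as nominal, while 20 errors all on head 3 would point toward the HDA. The routine does not look at recovery levels, so the separate guideline about recovery levels 7 through 14 still has to be checked against the error log.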
The BBR packet, which is generated at the completion of the BBR algorithm, will contain several important clues about the nature of the ECC error. Included in the packet is whether the block tested good or bad, and whether the original data was recovered or restored with the FORCED ERROR flag set, indicating the data was lost.

The following MSCP status/event codes are applicable for a BBR packet:

MSCP status/event 14 - Bad block successfully replaced.
MSCP status/event 34 - Block verified okay; not a bad block.
MSCP status/event 54 - Replacement failure; replace command failed.
MSCP status/event 74 - Replacement failure; inconsistent RCT.
MSCP status/event 94 - Replacement failure; drive access failure.
MSCP status/event B4 - Replacement failure; no block available.
MSCP status/event D4 - Replacement failure; two successive RBNs were bad.

Example 5-5 illustrates the resulting status of the BBR replacement algorithm. In this example, the block in question did go through BBR; however, the block was not replaced. Further in the example, the replace flags demonstrate that the block was not replaced because the block "verified good." The last segment of the BBR log packet reveals why the block was even tested. In this example, the block was thought to contain a data error with a severity level of "uncorrectable ECC."

5.17 Troubleshooting Controller-Detected Positioner Errors-MSCP Status/Event 6B

MSCP status/event 6B is a positioner unintelligible header error (also referred to as a positioner error mis-seek). Several considerations must be weighed when troubleshooting the MSCP 6B event. These include:

• For RA90/RA92 disk drives, what is the I/O rate on the drive?
• Is only one SDI path noting the problem?
• Are other errors being logged at or near the same frequency as the MSCP 6B?
• For RA92 disk drives, what is the write-to-read ratio?
• What recovery level/mechanism is the controller using in order to recover from the situation?

With the RA90/RA92 disk drive, if in the examination of the error log it can be determined that:

• the Level A retry mechanism is successful on first retry, and
• the Level B retry mechanism is not being used (reported Level B retry count = 0), and
• "all" errors are being recovered on a single retry,

then an error rate of six per day may be considered nominal for the RA90/RA92 disk drives operating near or above 30 I/Os per second.

Example 5-6 illustrates a typical RA90 error log on a VMS system. The fields of note are emphasized in bold.

5.17.1 RA92 Disk Drive With MSCP Status/Event 6B

RA92 disk drives may log more occurrences of MSCP status/event 6B than RA90 disk drives in applications during which long sequences of write activity are occurring. This phenomenon, as a contributor to 6B events, was recently discovered and identified. Though it occurs more often with the RA92 disk drive, heavy write-to-read ratios could be a contributor to logged MSCP 6B events by RA90 disk drives.

The problem originates in the design of the heads and occurs while a head is involved in large sequential write transfers. When the head has to switch back to read (for next header identification), noise can result in the head that essentially disrupts the header signal as it is read. No identifiable damage to the actual header information is exhibited on the media. Customer data is not at risk. The noise merely disrupts the read chain momentarily as the header is being read. By the time the next sector comes around, the read chain will have stabilized.

This head phenomenon will result in additional 6B errors being logged when the write-to-read ratios are heavily weighted in favor of writes. Typical VMS environments may not provide this scenario. It has been noted that typical ULTRIX/UNIX applications appear to have a higher mix of write-to-read activity than VMS applications.
However, regardless of the operating system, certain applications may increase the potential of this phenomenon occurring when those applications, by their nature, offer heavy write-to-read ratios.

    ****** ENTRY 6. ******
    ERROR SEQUENCE 4709.                  LOGGED ON SID 05283914

    ERL$LOGMESSAGE ENTRY    KA820 REV# E  PATCH REV# 28.  UCODE REV# 20.
                            BI NODE # 2.

    I/O SUB-SYSTEM, UNIT _HSC015$DUA36:

    MESSAGE TYPE        0001        DISK MSCP MESSAGE
    MSLG$L_CMD_REF      6BBC000A
    MSLG$W_UNIT         0024        UNIT #36.
    MSLG$W_SEQ_NUM      0002        SEQUENCE #2.
    MSLG$B_FORMAT       09          BAD BLOCK REPLACEMENT ATTEMPT
    MSLG$B_FLAGS        80          OPERATION SUCCESSFUL
    MSLG$W_EVENT        0034        BAD BLOCK REPLACEMENT
                                    BLOCK VERIFIED GOOD
    MSLG$Q_CNT_ID       0000FC15
                        01200000    UNIQUE IDENTIFIER, 00000000FC15(X)
                                    MASS STORAGE CONTROLLER
                                    HSC70
    MSLG$B_CNT_SVR      27          CONTROLLER SOFTWARE VERSION #39.
    MSLG$B_CNT_HVR      00          CONTROLLER HARDWARE REVISION #0.
    MSLG$W_MULT_UNT     0060
    MSLG$Q_UNIT_ID      000003F6
                        02130000    UNIQUE IDENTIFIER, 0000000003F6(X)
                                    DISK CLASS DEVICE (166)
                                    RA90
    MSLG$B_UNIT_SVR     0B          UNIT SOFTWARE VERSION #11.
    MSLG$B_UNIT_HVR     01          UNIT HARDWARE REVISION #1.
    MSLG$W_RPL_FLGS     0000
    MSLG$L_VOL_SER      0000036C    VOLUME SERIAL #876.
    MSLG$L_BAD_LBN      00175A52    BAD LOGICAL BLOCK NUMBER = 1530450.
    MSLG$L_OLD_RBN      00000000
    MSLG$L_NEW_RBN      000056A4
    MSLG$W_CAUSE        00E8        DATA ERROR
                                    UNCORRECTABLE ECC ERROR
    *******************************************

Example 5-5 VMS BBR Packet

    VMS LOGGED MSCP '6B' POSITIONER ERRORS

    ******************************* ENTRY 1. *******************************
    ERROR SEQUENCE 2151.                  LOGGED ON: SID 1105009C
    DATE/TIME 26-JUL-1990 11:12:49.31                SYS_TYPE 00000000

    ERL$LOGMESSAGE ENTRY    KA88 REV# 5.  CPU # 0.

    I/O SUB-SYSTEM, UNIT _BSC4$DUA39:

    MESSAGE TYPE        0001        DISK MSCP MESSAGE
    MSLG$L_CMD_REF      56310024
    MSLG$W_UNIT         0027        UNIT #39.
    MSLG$W_SEQ_NUM      001B        SEQUENCE #27.
    MSLG$B_FORMAT       02          DISK TRANSFER ERROR
    MSLG$B_FLAGS        81          SEQUENCE NUMBER RESET
                                    OPERATION SUCCESSFUL
    MSLG$W_EVENT        006B
    MSLG$Q_CNT_ID       0017F20D
                        01010000    UNIQUE IDENTIFIER, 00000017F20D(X)
                                    MASS STORAGE CONTROLLER
                                    HSC50

    CONTROLLER DEPENDENT INFORMATION

    ORIG ERR            1800        HEADER COMPARE ERROR
                                    HEADER SYNC TIMEOUT
                                    SUSPECTED LOW HEADER MISMATCH
    ERR RECOV FLGS      0002        ERR LOGGED TO CONSOLE AND HOST
    LVL A RETRY         01          <--- 1 "A" retry
    LVL B RETRY         00          <--- but no "B" retries
    BUF DAT MEM ADR     C4BF
    SRC REQ #           02
    DET REQ #           02

Example 5-6 Positioner Mis-Seek MSCP Status/Event 6B

The occurrence of 6B errors caused by this phenomenon has been more pronounced on the KDM/HSC controllers than on the KDA/KDB/UDA controllers. Since experience and engineering evaluation have shown that the occasional occurrence of the MSCP status/event 6B, when recovered on a single retry, is inconsequential, extra error management code has been implemented as follows:

• HSC software released after the 39x series will contain special 6B error management code that will look for this error signature and will not report this event characteristic of the RA90/RA92 product.
• The KDM70 controller with microcode at revision level 2 will also contain this enhanced error management code for 6B errors on RA90/RA92 disk drives.

This phenomenon is being aggressively pursued by Digital and resolution details will be communicated to the field.

5.17.2 Evaluating MSCP 6B Events

When converting a sample (20 to 30) of the target LBNs identified in MSCP 6B events, look for the following:

• Single head but quite random cylinder addresses - consider the HDA.
• Single head but narrow band of cylinder addresses - consider mapping out suspect LBNs with DKUTIL or HDA replacement. Before manually forcing replacement of a perceived bad block, make sure a current disk backup exists.
• Repeating LBNs - consider "mapping" out suspect LBNs with the BBR utility (DKUTIL).
• Random heads (10 of 13 heads) - consider data path including controller SDI module.

Troubleshoot MSCP status/event 6B as follows:

• Update the drive with the latest drive microcode version.
• If errors are only happening on one port, pursue a port path problem, including ECM, SDI cables between drive and bulkhead, cabinet to controller cabinet, and within the controller cabinet and the port interface module in the controller.
• Note whether more than one drive on the requester is reporting consistent 6B events. This would more definitely suggest a port interface problem within the controller.
• If errors are clearly happening on both drive ports, pursue the problem as a drive problem first, when the event rate exceeds the guidelines indicated above and/or customer satisfaction dictates.

5.18 Conclusion

The DSA architecture defines a very reliable and flexible storage subsystem. This subsystem can be maintained efficiently and effectively when consistent and methodical troubleshooting procedures are followed. Poorly trained or untrained Customer Services engineers are at a serious disadvantage.

The cost of supporting incorrectly identified FRUs is very high. Many of the FRU units are expensive to replace. Some very expensive FRUs are not repairable FRUs.

The impact to a customer can be substantial. Impacts include:

• Necessity to back up and restore potentially large amounts of data on misdiagnosed HDA replacements.
• Loss of system availability when using standalone diagnostics with controllers such as UDA/KDA/KDB.
• Loss of drive availability when performing extensive subsystem diagnostics using an HSC controller.
• Increased frustration and inconvenience of dealing with repeated calls.
• Loss of confidence in Digital as a quality supplier of storage systems.
• Increased potential of data loss if improper diagnosis is made and the failure mode continues or gets worse.
SERVICE GOAL

The Customer Services engineer's number one goal in service efforts is to correctly diagnose a problem on the first call and replace the correct part so the customer's disk and data availability is minimally impacted.

5.19 Error Codes and Descriptions

This section describes RA90/RA92 disk drive error codes. Included in each error code description is a list of suggested replacement FRUs for repairing drive problems. Careful analysis of both system and drive internal error logs, along with drive-generated error codes, should lead to problem isolation and correction.

Error codes are listed in hex numerical order starting with error code 01 through error code FD (hex). The general format of the error code listings is as follows:

    01 (1)  Spindle Motor Transducer Timeout (2)

    Error Type: DE (3)
    Error Description: (4) The spindle was given the command to spin up by an SDI command or from the front panel Run switch and no movement was detected by the spindle motor transducer. See error code 13 for possible isolation help before replacing FRUs.
    Fault Isolation/Correction: (5)
    1. ECM
    2. HDA
    3. Rear flex cable assembly

Where:

(1) 01 is the error code.
(2) SPINDLE MOTOR TRANSDUCER TIMEOUT is the error message.
(3) DE is the error type.
(4) Error Description: is a brief summary of the error event.
(5) Fault Isolation/Correction: is the suggested FRU replacement order for troubleshooting.

01 Spindle Motor Transducer Timeout

Error Type: DE
Error Description: The spindle was given the command to spin up by an SDI command or from the front panel Run switch, and no movement was detected by the spindle motor transducer. See error code 13 before replacing FRUs.
Fault Isolation/Correction:
1. ECM
2. HDA
3. Rear flex cable assembly

02 Spinup Too Slow

Error Type: DE
Error Description: The spindle did not reach 1000 r/min within 20 seconds. See error code 13 before replacing FRUs.
Fault Isolation/Correction:
1. ECM
2. HDA
3.
Rear flex cable assembly

03 Spindle Not Accelerating During Spinup

Error Type: DE
Error Description: The spindle did not accelerate above 1000 r/min in the allotted spinup timeout period. See error code 13 before replacing FRUs.
Fault Isolation/Correction:
1. ECM
2. HDA
3. Rear flex cable assembly

04 Spinup Too Long to Lock on Speed

Error Type: DE
Error Description: The spindle did not reach 3600 r/min (±18 r/min) within 30 seconds. See error code 13 before replacing FRUs.
Fault Isolation/Correction:
1. ECM
2. HDA
3. Rear flex cable assembly

05 Invalid Drive Serial Number Code

Error Type: DF
Error Description: The drive serial number is out of acceptable range or an invalid manufacturing plant code was read by the drive microcode. Switches are set (or read) incorrectly on the rear flex cable assembly (S1/S2). This is neither a fatal error nor a hard error. Clearing the fault allows the drive to continue operation. The drive serial number is checked during the power-up sequence.

Table 5-8 Serial Number

Bits <19:18>   Mfg   Serial Number Range in Decimal   Max Binary Value Bits <17:00>
0 0 0 1 CX CX 0 1 1 0 1 1 KB
0-262,143            1111111111111111111
262,144-309,999      001011101011101111
310,000-524,287      1111111111111111111
invalid
0-262,143            1111111111111111111
0-262,143            1111111111111111111
invalid

Fault Isolation/Correction:
1. Incorrect S1/S2 bits set on rear flex cable assembly
2. Rear flex cable assembly
3. ECM seating problem
4. ECM

06 Microcode Fault

Error Type: DF
Error Description: A hardware/software failure caused the master processor addressing to point to a null EEPROM area.
Fault Isolation/Correction:
1. Reload drive microcode
2. ECM

07 SDI Frame Sequence Error

Error Type: RE
Error Description: Level 1 SDI commands were detected in the wrong sequence. If the same drive is reporting errors from two controllers, start troubleshooting at the drive.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3.
ECM

08 SDI Lvl 2 Checksum Error

Error Type: RE
Error Description: The calculated checksum did not compare with the checksum field sent by the controller to the drive for SDI level 2 commands. If the same drive is reporting errors from two controllers, start troubleshooting the drive.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

09 SDI Lvl 1 Framing Error

Error Type: RE
Error Description: A sync pattern was detected by the drive on the SDI WRITE/COMMAND line, but no SDI level 1 control message transmission or single frame command was detected.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

0A SDI Incorrect Command Opcode Parity Error

Error Type: PE
Error Description: The wrong parity was detected on the opcode byte of a level 1 or level 2 command.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

0B SDI Invalid Opcode

Error Type: PE
Error Description: The decoded opcode is not a valid (level 2) SDI opcode.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

0C SDI Command Length Error (LVL2)

Error Type: RE
Error Description: This error indicates the controller caused the drive SDI input command buffer to overflow.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

0D SDI Invalid Command with Drive Error

Error Type: PE
Error Description: The controller issued an INITIATE SEEK command, an ERROR RECOVERY command, or a RECALIBRATE command while the drive was faulted.
Fault Isolation/Correction:
1. Controller
2. ECM
3. SDI cable

0E SDI Lvl 1 Invalid Select Group Number

Error Type: RE
Error Description: Indications are the controller attempted to select a nonexistent group. For RA90 and RA92 disk drives, group = head.
Fault Isolation/Correction:
1. Controller
2. ECM
3. SDI cable

0F SDI
Write Enable on a Write-Protected Drive

Error Type: PE
Error Description: A drive write-protected from the OCP (front panel) was issued a WRITE ENABLE command through an SDI CHANGE MODE command. The OCP switch state has priority over any SDI CHANGE MODE commands.
Fault Isolation/Correction:
1. Disable Write Protect switch
2. Controller
3. ECM
4. OCP

10 SDI Command Length Error (LVL2)

Error Type: PE
Error Description: An SDI command length error, LVL2, indicates the number of bytes expected did not equal the number of bytes received for an SDI level 2 command.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

11 Microcode Cartridge Load Occurred

Error Type: Informational Only
Error Description: This logged event indicates that a drive microcode update successfully occurred. This new event occurred with the introduction of the Etch-F I/O-R/W module. Etch-F revision ECM boards are indicated by revision 1 or later in the IOP and SRV values reported with drive internal test T45. (There are a minimal number of Etch-E revision modules that provide this information.)
Fault Isolation/Correction:
1. Information only

12 Spindle Speed Unsafe Error

Error Type: DE
Error Description: During idle loop, a spindle speed check indicated the drive was not up to speed at 3600 r/min (±18 r/min). The servo processor will also detect this condition dynamically and have the master processor log this error as well.
Fault Isolation/Correction: Disabling the brake circuit may aid in troubleshooting. The brake can be disabled by opening either pin 4 or 5 of the rear HDA connector. Use the pin extraction tool (P/N 29-26655-00) to avoid breaking pins.

CAUTION
The female pins in the HDA connector are delicate and must be handled with care. When disabling the brake, cover loose pins with electrical tape to prevent them from shorting.

1. Reseat HDA
2. ECM
3. Power supply
4. Brake
5.
HDA

13 Spindle Motor Control Fault

Error Type: DE
Error Description: The motor control IC detected a condition that prevented the spindle from getting up to speed.
Fault Isolation/Correction:
1. Reseat ECM/HDA
2. ECM
3. HDA

A number of checks are made to detect this fault. A failure of any of the following checks results in this error:

1. If no Hall effect is seen within 700 ms after current is applied to the spindle motor.
2. If the SSI chip on the servo module which controls spindle speed rotation is operating at less than 6.8 volts.
3. If the brake circuit is activated at the same time that current is applied to the spindle.
4. If the Hall sensor input from the spindle motor is not occurring at a 700 ms rate.

Additionally, any open condition in the spindle circuitry, including Hall sense phase or spindle motor phase circuitry, causes this error to be asserted.

Although power supply voltages cannot be adjusted, they can be measured by removing the small cover as shown in Figure 5-8 (power supplies bearing a serial number starting with CX only). On the back of the connector, the pin numbers are visible. A very small electrical probe is required to make connection.

Figure 5-8 Power Supply Cover Removal (callouts: power supply, quarter-turn hold-down screws, power supply access cover)

Removal of this cover allows access to the power supply output voltage connector. To remove the power supply cover, use a quarter-inch hex driver. Remove the hold-down screws. Next, use a DVM or oscilloscope to measure the points to ground (black lead) as shown in Table 5-9.
Table 5-9 Power Supply Voltage Measurements

Pin   Wire Color   Voltage Measurement        Deviation
1     Orange       +12 Vdc                    ±.6 V
2     Black        ±12 Vdc (return)
3     Black        ±12 Vdc (return)
4     Blue         -12 Vdc                    ±.6 V
5     Red          +5.1 Vdc                   ±.25 V
6     Red          +5.1 Vdc                   ±.25 V
7     Red          +5.1 Vdc                   ±.25 V
8     Red          +5.1 Vdc                   ±.25 V
9     Black        +5.1 Vdc (return)
10    Black        +5.1 Vdc (return)
11    Black        +5.1 Vdc (return)
12    Black        +5.1 Vdc (return)
13    Purple       -5.2 Vdc                   ±.17 Vdc
14    Purple       -5.2 Vdc                   ±.17 Vdc
15    Brown        -24 Vdc                    ±2.4 Vdc
16    Brown        -24 Vdc                    ±2.4 Vdc
17    Brown        -24 Vdc                    ±2.4 Vdc
18    Black        ±24 Vdc (return)
19    Black        ±24 Vdc (return)
20    Yellow       +24 Vdc                    ±2.4 Vdc
21    Yellow       +24 Vdc                    ±2.4 Vdc
22    Yellow       +24 Vdc                    ±2.4 Vdc
23    Brown        40 kHz H
24    Blue         -5.2 Vdc (sense)
25    Black        -5.2 Vdc (sense return)
26    Orange       DCOK H
27    Red          OVTEMP H
28    Blue         POCK H
29    White        ON/OFF L

In addition to these measurements, error codes 2D and FF indicate power problems.

Along with the power supply measurements, a number of resistance checks can be made to the HDA. The HDA must first be removed from the drive chassis. Exercise care when handling the HDA so that connector pins are not damaged during measurements.

DO NOT jam probes into the connector housing from the front of the connector because it is easy to damage the pins in these sockets. Access the pins from the rear of the connector or use the pin insert/extract tool (P/N 29-26655-00) to remove pins from connectors for easier measurements. Refer to Table 5-10 to locate opens in the circuits. Table 5-10 lists pin-to-circuit connections.
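The tolerance check implied by Table 5-9 can be sketched as follows. This is a minimal sketch rather than a service tool: the names RAILS and out_of_tolerance are hypothetical, readings are assumed to be entered by hand from a DVM (in volts, referenced to ground), and the pairing of deviations with pins 4, 5, and 6 is an assumption read from the table layout. Only pins that carry a deviation in the table are included.

```python
# (nominal volts, allowed deviation in volts), keyed by connector pin.
# Values are taken from Table 5-9; return, sense, and signal pins are omitted.
RAILS = {}
for pins, nominal, deviation in [
    ((1,), 12.0, 0.6),
    ((4,), -12.0, 0.6),          # assumption: the +/-.6 V deviation applies to pin 4
    ((5, 6, 7, 8), 5.1, 0.25),
    ((13, 14), -5.2, 0.17),
    ((15, 16, 17), -24.0, 2.4),
    ((20, 21, 22), 24.0, 2.4),
]:
    for pin in pins:
        RAILS[pin] = (nominal, deviation)


def out_of_tolerance(readings):
    """Return {pin: measured} for each reading outside nominal +/- deviation.

    Raises KeyError for a pin that has no deviation listed in RAILS.
    """
    bad = {}
    for pin, measured in readings.items():
        nominal, deviation = RAILS[pin]
        if abs(measured - nominal) > deviation:
            bad[pin] = measured
    return bad
```

Called with readings such as {1: 12.3, 5: 5.4, 13: -5.25, 20: 21.0}, the function flags pins 5 and 20 and passes pins 1 and 13. Readings that sit exactly on a limit are best rechecked with the meter rather than trusted to floating-point comparison.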
Table 5-10 HDA Connector Pin Designations

Pin    Wire Color   Circuit
1      Blue         Positioner lock solenoid (-)
2      Blue         Positioner lock solenoid (+)
4      White        Brake (-)
5      White        Brake (+)
6      Green        S3
7      Violet       S2
8      Flex         Positioner actuator flx (-)
9      Orange       S1
10     Flex         Positioner actuator flx (+)
11     Brown        Hall sensor ground
12     Gray         Spindle motor coil C
13     Red          Hall sensor 5 V input
14     Blue         Spindle motor coil B
16     Black        Spindle motor coil A
Grnd   Yellow       Spindle motor lamination lead exits HDA and is grounded on HDA.

Resistance measurements are checked according to Table 5-11.

Table 5-11 HDA Resistance Measurements

(-)Pin to (+)Pin    Circuit                    Measured Value
16-14               Coil A - Coil B            1.4 ohm
16-12               Coil A - Coil C            1.4 ohm
14-12               Coil B - Coil C            1.4 ohm
16 - HDA ground     Coil A - ground            20 megohm
14 - HDA ground     Coil B - ground            20 megohm
12 - HDA ground     Coil C - ground            20 megohm
9-7                 S1 - S2                    20 megohm
9-6                 S1 - S3                    20 megohm
7-6                 S2 - S3                    20 megohm
9-13                S1 - Hall 5 V              20 megohm
7-13                S2 - Hall 5 V              20 megohm
6-13                S3 - Hall 5 V              20 megohm
9-11                S1 - Hall ground           ~4.50 megohm
7-11                S2 - Hall ground           ~4.30 megohm
6-11                S3 - Hall ground           ~4.50 megohm
11-13               Hall ground - Hall 5 V     ~7 megohm
1-2                 Positioner lock solenoid   ~30 ohm
8-10                Actuator coil              ~4 ohms

14 Head Offset Margin Event

Error Type: DE
Error Description: This is not an error condition. Manufacturing sets the enable flag for the detection of this event. If this code shows up in the field, reset the flag by taking the drive off line and powering it down and then up.
Fault Isolation/Correction:
1. Power drive off and back on.

15 Head Offset Out-of-Band Error

Error Type: DE
Error Description: Head offset has exceeded normal head offset parameters for this drive. This is a serious problem. Data is in danger of being lost. Do not use the drive for further writes. Initiate prompt backup. Head offset errors can result from an over-temperature condition.
Check drive airflow and ambient room temperature. If temperature appears to be normal, replace the HDA.

The amount of offset necessary before this error is flagged is ±3/4 of a track. After each offset table rebuild, the servo processor tests each head value against this threshold. If a head exceeds offset limits, the master processor asserts ATTENTION and uses the GET STATUS response to identify which head or heads are involved. The drive specific bytes of the drive internal error log should indicate which head has marginal offsets.
Fault Isolation/Correction:
1. HDA
2. ECM
3. PCM

16 SDI Invalid Group Select LVL2

Error Type: PE
Error Description: The controller attempted to select a nonexistent group. A group refers to a head in the RA90 and RA92 disk drives. If the drive is dual-ported and logging this error from both controllers, troubleshoot the drive.
Fault Isolation/Correction:
1. Controller
2. ECM

17 SDI Port A Command/Response Timeout

Error Type: Informational Only
Error Description: The Port A controller did not accept message response data from the drive. This is typically a communications event and not a drive error.
Fault Isolation/Correction:
1. Communications event (typically not a drive problem)
2. Controller on Port A
3. ECM

18 SDI Port B Command/Response Timeout

Error Type: Informational Only
Error Description: The Port B controller did not accept message response data from the drive. This is typically a communications event and not a drive error.
Fault Isolation/Correction:
1. Communications event (typically not a drive problem)
2. Controller on Port B
3. ECM

19 SDI Invalid Format Request

Error Type: PE
Error Description: The controller requested that the drive place itself in 576-byte format. The RA90/RA92 only accepts 512-byte format. This error can also be caused by someone trying to format the drive in 576-byte mode.
Fault Isolation/Correction:
1. Controller
2.
ECM

1A SDI Invalid Cylinder Address

Error Type: PE
Error Description: The drive decoded a nonexistent cylinder address during a controller-initiated SEEK command. This error also occurs when a controller, while running diagnostics, attempts to test the DBN area of the disk without first setting the drive's DB bit. This error also occurs if an attempt is made to access cylinders beyond the DBN space if the DB bit is set.
Fault Isolation/Correction:
1. Controller
2. ECM

1B Inner Guardband Error

Error Type: DE
Error Description: The drive hardware detected servo inner guardband information instead of servo data information or outer guardband information. The only time the servo head is positioned in the inner guardband area and does not generate an error is during execution of diagnostics.
Fault Isolation/Correction:
1. ECM
2. HDA

NOTE
If an actuator current error or actuator speed error is also indicated, it is probable that the inner guardband error is secondary. Reference the respective actuator error.

1C Outer Guardband Error

Error Type: DE
Error Description: Outer guardband information was decoded when servo or inner guardband information was expected. The only time the servo head is positioned in the outer guardband area and does not generate an error is during execution of a head load operation, a recalibrate, or internal diagnostics.
Fault Isolation/Correction:
1. ECM
2. HDA

NOTE
If an actuator current error or actuator speed error is also indicated, it is probable that the inner guardband error is secondary. Reference the respective actuator error.

1D Illegal Servo Fault

Error Type: DE
Error Description: A servo fault was detected by the GASP array; however, when the master processor examined the register information, the error was invalid.
Fault Isolation/Correction:
1. ECM

1E Power-Up After
AC Power Loss
Error Type: Information only
Error Description: Information event noting that the drive performed a power-up sequence after ac power loss. This may be the result of turning the drive power off at the breaker, or loss of ac power to the drive/cabinet. This new event occurred with the introduction of the Etch-F I/O-R/W module. Etch-F revision ECM boards are indicated by revision 1 or later in the IOP and SRV values reported with drive internal test T45. (There are a minimal number of Etch-E revision modules that provide this information.) This event is different from the event logged as a result of the power supply being over temperature.
Fault Isolation/Correction:
1. Information Only

1F Sector Overrun Error
Error Type: DE
Error Description: When a sector or index pulse occurs with either WRITE GATE or READ GATE asserted, an overrun error is asserted. This indicates a write or read operation was attempted through a sector/index boundary.
Fault Isolation/Correction:
1. Controller
2. ECM

20 SDI RTCS Parity Error
Error Type: DE
Error Description: A bit was dropped or picked up in data transferred on the SDI Real Time Controller State (RTCS) line.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

21 SDI Transfer (Pulse) Error
Error Type: DE
Error Description: An extra or missing pulse was detected on the SDI WRT/CMD line or the RTCS line. If this error occurs across both ports and/or more than one controller, troubleshoot the drive. If only one port is involved, troubleshoot the SDI cables or the controller. See Figure 5-9.
[Figure 5-9 WRT/CMD Data Format: encoded data waveform; bit cell time 86.2 ns, pulse width 12 +/-2 ns]

On the WRT/CMD and RTCS lines, a positive transition at the leading edge of a bit cell indicates a one; a negative transition indicates a zero. If the next bit cell contains the same data (a one followed by a one or a zero followed by a zero), the line switches polarity in the middle of the bit cell.

The error is detected by the TSID gate array and is passed to the SDI gate array as a PLS ERR error. A pulse error should only be reported when the drive is executing a data transfer operation. If a pulse error occurs during a TRANSFER command, PLS ERR will set bit 0 of Fault Register 3 of the SDI gate array.
Fault Isolation/Correction:
1. ECM
2. SDI cables
3. Controller
4. Power supply
5. Spindle ground brush

22 Electronic Control Module Over-Temperature Error
Error Type: DE
Error Description: An over-temperature condition exists in the drive. Drive over-temperature conditions result from high room temperature or a dirty air vent inhibiting airflow through the drive. Additionally, a bad blower motor could cause the internal temperature of the drive to increase, but a 2D error is more likely in this case. This over-temperature condition happens when the detector senses 43°C (110°F).
Fault Isolation/Correction:
1. Ambient air temperature is too high
2. Cabinet door air vent needs cleaning
3. Blower assembly
4. ECM

24 Loss of Fine Track During Data Transfer
Error Type: DE
Error Description: A loss of fine track was detected when a read or write operation was ready to begin, but not actually started. This error code is not implemented in microcode revision 7 and later. Refer to servo event 9A.
Fault Isolation/Correction:
1.
Install RA90X-0001 FCO

25 Servo Fault Error
Error Type: DE
Error Description: A servo error was detected but no condition was found that would cause the error condition. The master processor, while in its idle loop, was scanning the servo GASP gate array and discovered error bit(s) set. Valid conditions include:
• Actuator fault
• PLO error
• Actuator over current error
• Actuator over speed error
• Track counter error
• Off track error
• Guardband error
• Heat sink 1 error - over-temperature
• Heat sink 2 error - over-temperature
Fault Isolation/Correction:
1. ECM

26 Spindle Speed Error (Servo Processor)
Error Type: DE
Error Description: Spindle is not within ±0.5% of 3600 r/min. The servo processor monitors spindle speed. This error is different from the loss of PLO, which can occur separately from the error. Upon detection of the loss of PLO, the master processor examines the servo processor status to determine if it has valid servo-detected error information. If it does, this error is asserted.
Fault Isolation/Correction:
1. ECM
2. Brake
3. HDA

27 Servo Over-Temperature Error at S1
Error Type: DE
Error Description: The thermal sensor (S1) on the servo module detected an over-temperature condition. This results in the master processor spinning the disks down and setting this error condition. If the over-temperature clears, the controller can initialize the drive and try to spin it back up.
Fault Isolation/Correction:
1. Ambient air temperature too high
2. Cabinet door air vent needs cleaning
3. Blower assembly
4. ECM

28 Servo Over-Temperature Error at S2
Error Type: DE
Error Description: The thermal sensor (S2) on the ECM detected an over-temperature condition. This results in the master processor spinning the disks down and setting this error condition. If the over-temperature clears, the controller can initialize the drive and try to spin it back up.
Fault Isolation/Correction:
1.
Ambient air temperature too high
2. Cabinet door air vent needs cleaning
3. Blower assembly
4. ECM

29 SDI Invalid Error Recovery Level Specified
Error Type: PE
Error Description: The controller issued an SDI ERROR RECOVERY command with an illegal recovery level. The RA90/RA92 supports 14 error recovery mechanisms. This value is passed to the controller during a GET COMMON CHARACTERISTICS command. The controller in this case asked for a level greater than 14. Not all controllers report the error recovery levels in the same manner.
Fault Isolation/Correction:
1. Controller
2. ECM

2A SDI Invalid Subunit Specified
Error Type: PE
Error Description: The controller attempted a GET STATUS command to a subunit address other than zero. (The RA90/RA92 is a single unit drive with a subunit address of zero.)
Fault Isolation/Correction:
1. Controller
2. ECM

2B SDI Invalid Diagnose Memory Region Location
Error Type: PE
Error Description: The controller or the operator attempted to execute a nonexistent internal drive test or internal diagnostics while the drive was on line to the controller.
Fault Isolation/Correction:
1. Use valid diagnostic
2. Controller
3. ECM

2C SDI Spindle Not Ready with Seek/Recalibration Command
Error Type: PE
Error Description: A RECALIBRATE or SEEK command was issued to a spun-down disk drive.
Fault Isolation/Correction:
1. Controller
2. ECM

2D Power Supply Over-Temperature
Error Type: DE
Error Description: A critical over-temperature condition exists in the power supply. This condition is detected by the master processor through the OVER TEMP L signal. Within 15 ms of detection, the dc voltages are removed in an orderly fashion.
The error is stored in EEPROM and can be read when power is restored to the drive after the over-temperature condition is corrected or the power supply cools down sufficiently to allow power to be reapplied.
Fault Isolation/Correction:
1. Ambient air temperature too high
2. Blower assembly
3. Power supply
4. ECM
5. Rear flex cable assembly

2E SDI Spinup Inhibited by Controller Flags
Error Type: PE
Error Description: The drive cannot be spun up from the OCP while the drive is in the AVAILABLE or ONLINE state to the controller.

NOTE
If the Run switch is selected prior to the Fault switch, a 2E LED code will be indicated.

Fault Isolation/Correction:
1. Check Run switch
2. ECM

2F SDI RUN Command With Run Switch In Stop Position
Error Type: PE
Error Description: An SDI RUN command was issued to the drive when the OCP Run switch was in a logical stop state.
Fault Isolation/Correction:
1. Check OCP switch state
2. Controller
3. ECM

30 Write Current and No Write Gate
Error Type: DE
Error Description: Current was detected at the read/write heads and WRITE GATE was not asserted. The PCM provides the current source for the write chips in the HDA. Drive firmware tests for this condition during diagnostics.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

31 Read Gate and Write Gate Both Asserted
Error Type: DE
Error Description: SDI gate array detected that READ GATE and WRITE GATE were asserted at the same time.
Fault Isolation/Correction:
1. ECM
2. Controller

32 Read or Write While Faulted
Error Type: DE
Error Description: A READ or WRITE command was issued to a drive that has a fault condition.
Fault Isolation/Correction:
1. Check error log for fault condition
2. Controller
3. ECM

33 Attempt to Write Through Bursts
Error Type: DE
Error Description: An attempt was made to assert WRITE GATE while the read/write heads were positioned over embedded servo burst information.
Fault Isolation/Correction:
1. ECM
2.
Controller
3. HDA

34 ENDEC Encoder Error
Error Type: DE
Error Description: Data to be written to media has been improperly 2/3 encoded.
Fault Isolation/Correction:
1. ECM
2. PCM

35 Write and Write Unsafe
Error Type: DE
Error Description: A problem in the write data path prevented the drive from correctly writing data to the disk surface. One or more of the following conditions cause this error:
• No write data transitions
• No write current
• No SSI 283 (head select chip) selected
• SSI 283 stuck in read mode
The unsafe conditions are wire ORed together and are detected on the PCM.
Fault Isolation/Correction:
1. PCM
2. HDA
3. ECM

36 Write and Servo Uncalibrated
Error Type: DE
Error Description: The firmware routines used to calibrate the read/write heads and the servo system failed to complete successfully. The subsequent write was attempted with the servo uncalibrated.
Fault Isolation/Correction:
1. PCM
2. ECM
3. HDA

37 Write Gate and No Write Current
Error Type: DE
Error Description: WRITE GATE was asserted but no write current was detected at the read/write heads. The PCM sources the current when WRITE GATE is asserted.
Fault Isolation/Correction:
1. PCM
2. ECM
3. HDA

38 Read Gate and Multiple Head Chips Selected
Error Type: DE
Error Description: During a read operation, the master processor determined that more than one head and/or more than one SSI 283 chip was selected.
Fault Isolation/Correction:
1. PCM
2. ECM
3. HDA

39 Write Gate and Off Track
Error Type: DE
Error Description: A loss of fine track was detected when WRITE GATE was asserted. This error code is not implemented in microcode revisions 7 and later. This error code is used exclusively with the dedicated-only servo system found on earlier drives. Refer to error code 9B.
Fault Isolation/Correction:
1. Install RA90X-0001 unless superseded by a later FCO.
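As an illustration only, the read/write path entries above (codes 35 through 38) can be held in a small lookup table that returns the components in the order they are listed under Fault Isolation/Correction. The following Python sketch is not part of the drive firmware or of any Digital tool; the names WRITE_PATH_ERRORS and isolation_order are hypothetical, and the table data is transcribed from the entries above.

```python
# Hypothetical lookup table for illustration; the names are not from the manual.
# Each entry: hex error code -> (title, error type, components in listed order),
# transcribed from error codes 35-38 above.
WRITE_PATH_ERRORS = {
    0x35: ("Write and Write Unsafe", "DE", ["PCM", "HDA", "ECM"]),
    0x36: ("Write and Servo Uncalibrated", "DE", ["PCM", "ECM", "HDA"]),
    0x37: ("Write Gate and No Write Current", "DE", ["PCM", "ECM", "HDA"]),
    0x38: ("Read Gate and Multiple Head Chips Selected", "DE", ["PCM", "ECM", "HDA"]),
}


def isolation_order(code):
    """Return the components to check, in the order listed for this error code.

    Raises KeyError for codes that are not in the table.
    """
    _title, _error_type, components = WRITE_PATH_ERRORS[code]
    return list(components)


if __name__ == "__main__":
    for code in sorted(WRITE_PATH_ERRORS):
        title = WRITE_PATH_ERRORS[code][0]
        print(f"{code:02X} {title}: {', '.join(isolation_order(code))}")
```

Keeping the component order as a list preserves the sequence given in each entry, which is the only ordering information the entries provide.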
3A Write Gate and Write-Protected
Error Type: WE
Error Description: A write-protected drive detected the assertion of WRITE GATE.
Fault Isolation/Correction:
1. Controller
2. ECM

3B Hard INIT Occurred to Drive
Error Type: DE
Error Description: This is not typically an error condition. It is a record of initializations (initializations the controller started by the RTCS logical signal INIT, and initializations started by the drive). Initializations stop mechanical movements, and the drive performs a power-up initialize and reloads the servo processor code. Examine previous error conditions.

With drive microcode revisions 10 or earlier, if the drive performs a hard initialization on its own (for example, when new drive microcode has just been reloaded), this error entry will be recorded into the EEPROM. Microcode revisions later than 10 give a new indication of microcode reload. Refer to drive LED code 11.
Fault Isolation/Correction:
1. Look for previous errors
2. ECM
3. Controller

3D HDA Read/Write Interlock Broken
Error Type: DE
Error Description: The cable between the PCM and the ECM is disconnected or broken.
Fault Isolation/Correction:
1. Disconnected ECM-to-PCM cable
2. Bad ECM-to-PCM cable
3. PCM
4. ECM

3E OCP Interlock Broken
Error Type: DE
Error Description: The operator control panel was removed with dc voltages still applied to the drive.
Fault Isolation/Correction:
1. OCP flex circuit connectors
2. Bezel/blower flex circuit/servo module connectors
3. Servo module/ECM connectors
4. OCP
5. ECM

40 SDI Invalid Read Memory Region Error
Error Type: PE
Error Description: The controller issued an SDI level 2 READ MEMORY REGION command to an invalid region of drive read memory.
Fault Isolation/Correction:
1. Operator attempted to write or read a nonexistent or protected memory location.
2. Controller
3.
ECM

42 Drive Not On Line/SEEK Command Issued
Error Type: PE
Error Description: The controller issued an SDI level 2 INITIATE SEEK command and the drive was not on line to the controller.
Fault Isolation/Correction:
1. Controller
2. SDI cable
3. ECM

43 TCR and Not Read/Write Ready Fault
Error Type: RE
Error Description: The SDI gate array has decoded a data transfer command from the controller, but the drive is not ready to read/write; or the drive detected a loss of READ/WRITE READY during a data transfer.
Fault Isolation/Correction:
1. Controller
2. SDI cable (poor SDI connection)
3. ECM

44 Format Command and Format Not Enabled
Error Type: RE
Error Description: A FORMAT ON SECTOR OR INDEX command or a SELECT TRACK AND FORMAT ON INDEX command was decoded by the drive without the format bit (FO) being set in the drive.
Fault Isolation/Correction:
1. Controller
2. ECM

45 Read Gate and Off Track Both Asserted
Error Type: DE
Error Description: A loss of fine track was detected when read gate was asserted. This error code is not implemented in microcode revisions 7 and later. This error code is used exclusively with the dedicated-only servo system found on earlier drives. Refer to error code 9B.
Fault Isolation/Correction:
1. Install RA90X-0001 unless superseded by a later FCO.

46 Invalid Hardware Fault
Error Type: DE
Error Description: A failure was detected for unused fault inputs to the SDI gate array.
Fault Isolation/Correction:
1. ECM

47 Invalid Disconnect Command/TT Bit Error
Error Type: PE
Error Description: An SDI DISCONNECT command was issued to the drive and the TT modifier bit was in an incorrect state.
Fault Isolation/Correction:
1. Controller
2.
ECM

48 Invalid Write Memory Byte Counter/Offset Error
Error Type: PE
Error Description: The drive detected an incorrect number of data bytes to be written in drive memory; or the directed offset into the memory region was incorrect.
Fault Isolation/Correction:
1. Controller
2. ECM

49 Invalid Command During TOPOLOGY Command
Error Type: PE
Error Description: During the execution of an SDI level 2 TOPOLOGY command, the drive received an illegal SDI level 2 command from another controller.
Fault Isolation/Correction:
1. Controller
2. ECM

4A Drive Disabled by Controller (DD Bit Set)
Error Type: Informational Only
Error Description: The controller issued an SDI level 2 CHANGE MODE command to a drive with its DD bit asserted. When the controller asserts the DD bit, it disables the drive from further I/O activity.
Fault Isolation/Correction:
1. Controller (controller error routine determined the drive should be taken out of service)
2. ECM

4B Index Error
Error Type: DE
Error Description: No index pulse was detected for one revolution of the disk.
Fault Isolation/Correction:
1. ECM
2. HDA

4C SDI Invalid Write Memory Region Error
Error Type: PE
Error Description: An SDI level 2 command was issued to a drive-defined invalid memory region.
Fault Isolation/Correction:
1. Operator (attempting to write a nonexistent or protected memory location in drive)
2. Controller
3. ECM

4D Write Gate and Bad Embedded Servo Information
Error Type: DE
Error Description: The servo processor discovered incorrect embedded servo information while WRITE GATE was asserted.
Fault Isolation/Correction:
1. HDA
2. PCM
3. ECM

4F Invalid Select Group (Level 1 Command) - Not Read/Write Ready
Error Type: RE
Error Description: The controller issued a level 1 SELECT GROUP command to a drive when the drive was not read/write ready.
Fault Isolation/Correction:
1.
Check OCP for drive state
2. Controller
3. ECM

50 Servo Data Bus Failure
Error Type: DF
Error Description: A communication path to the GASP array failed during resident diagnostic testing.
Fault Isolation/Correction:
1. ECM

51 Sector/Byte Counter Error
Error Type: DF
Error Description: A resident diagnostic failure occurred during testing of the sector counter register or byte counter register.
Fault Isolation/Correction:
1. ECM

52 Servo RAM Test Failure (Low Byte of Address)
Error Type: DF
Error Description: At power-up, the drive-resident diagnostics failed during testing of RAM located on the servo portion of the ECM.
Fault Isolation/Correction:
1. ECM

53 Servo Processor Offset Error
Error Type: DE
Error Description: The servo system failed to offset the heads during error recovery.
Fault Isolation/Correction:
1. ECM

54 Head Select Register Loopback Error
Error Type: DF
Error Description: A drive-resident diagnostic detected a failure in the head select register. The head select register is inside the SDI gate array.
Fault Isolation/Correction:
1. ECM

55 DSP Sanity Timeout After Load
Error Type: DE
Error Description: The servo processor microcode was reloaded from the EEPROM on the I/O-R/W module because of a fault condition. After the microcode was reloaded in servo RAM, the master processor initiated a servo sanity test. The sanity test timed out, indicating a problem with the servo processor.
Fault Isolation/Correction:
1. ECM

56 Servo RAM Test Failure (High Byte of Address)
Error Type: DF
Error Description: A drive-resident diagnostic failed when testing RAM that resides on the servo module.
Fault Isolation/Correction:
1. ECM

57 Master Processor Timer Failure
Error Type: DF
Error Description: A drive-resident diagnostic failed when testing the time count register or output compare register. Both are located internal to the master processor.
Fault Isolation/Correction:
1.
ECM

58 Dedicated Head Gain Calibration Error
Error Type: DE
Error Description: The servo processor timed out while attempting to measure and compensate for the gain of the dedicated servo head.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

59 Embedded Servo Offset Calibration Error
Error Type: DE
Error Description: The servo processor timed out during a calibrate of the read/write head offsets. This calibration occurs during all head loads and periodically thereafter.
Fault Isolation/Correction:
1. HDA (most probable, especially if only one head is involved)
2. ECM (10 of 13 heads affected)
3. PCM

5A Embedded Head Gain Calibration
Error Type: DE
Error Description: The servo processor timed out while attempting to calibrate the head gain relative to the read/write head embedded burst information. The drive calculates this gain for each of the read/write heads.
Fault Isolation/Correction:
1. ECM (if most heads show problem)
2. PCM (if most heads show problem)
3. HDA (most probable, especially if only 1 head is involved)

5B Bias Calibration Error
Error Type: DE
Error Description: The servo processor timed out during a bias force adjustment to the actuator.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

5C Incorrect Diagnostic Index or Sector Pulse
Error Type: DF
Error Description: In testing the sector and byte counters, the master processor detected that the sector counter was not working properly.
Fault Isolation/Correction:
1. ECM

60 Read/Write Head Select Failure
Error Type: DE
Error Description: A failure occurred when attempting to select a group (head). When a group selection is requested, logic and firmware in the drive verify that the correct SSI 283 chip and head in the HDA have been selected. This verification takes place during functional operation and in diagnostic mode.
Fault Isolation/Correction:
1. PCM
2.
PCM-to-ECM cable
3. ECM

61 Diagnostic Index Sync Timeout Error
Error Type: DF
Error Description: A drive-resident diagnostic failed to detect an index pulse.
Fault Isolation/Correction:
1. ECM
2. HDA

62 Read Test Overall Read Failure (Three or More Bad Heads)
Error Type: DF
Error Description: During the execution of a resident diagnostic read-only test or write/read test, data read by three or more heads from diagnostic cylinders did not compare to the originally written patterns. The RA90 drive has two diagnostic cylinders (2659 and 2660) located in the inner guardband area of the media. Only the drive can access these two cylinders; they cannot be accessed by the controller. These are not the same cylinders used by the controller to execute controller-based diagnostics (DBN space). Refer to drive-resident diagnostic 17.
Fault Isolation/Correction:
1. Reformat the read-only cylinder by running drive-resident diagnostic 17
2. PCM
3. ECM
4. HDA

63 Read Test Partial Failure (One or Two Bad Heads)
Error Type: DF
Error Description: During the execution of a resident diagnostic read-only test or write/read test, data read by one or two heads from diagnostic cylinders did not compare to the originally written patterns. Refer to error code 62.
Fault Isolation/Correction:
1. Reformat the read-only cylinder by running drive-resident diagnostic 17
2. PCM
3. ECM
4. HDA

64 Cannot Clear I/O Error Bits
Error Type: DF
Error Description: Error detection logic internal to the I/O gate array cannot be cleared.
Fault Isolation/Correction:
1. ECM

65 Diagnostic Index or Sector Not Detected
Error Type: DF
Error Description: No index pulse was detected during the execution of resident diagnostics that read or write media.
Fault Isolation/Correction:
1. ECM
2. HDA

66 Read Test Servo Failure
Error Type: DF
Error Description: The drive internal diagnostic read or write/read test failed because of an off-track condition.
Fault Isolation/Correction:
1. PCM
2.
ECM
3. HDA

67 Cannot Execute Write Test (Read-Only Test Failed or Not Run First)
Error Type: DF
Error Description: This indicates an operator error, not a drive problem. Service personnel must run the read-only test before attempting to run the write test. Additionally, the read test must be successful before the write/read diagnostic is executed.
Fault Isolation/Correction:
1. Service personnel attempted to execute the write/read diagnostic without first executing the read-only diagnostic.
2. The read-only diagnostic failed and the write/read diagnostic was attempted anyway.
3. ECM

68 This Diagnostic Cannot Execute Without Software Jumper
Error Type: DF
Error Description: A diagnostic or utility was attempted without having first selected the Run/Stop switch. The Run/Stop switch must be selected within 1.5 seconds after initiating certain tests with the Write Protect switch.
Fault Isolation/Correction:
1. Procedural error
2. ECM
3. OCP

69 Unable to Force Compare Error
Error Type: DF
Error Description: The drive failed to force a data compare error during a read-only diagnostic.
Fault Isolation/Correction:
1. ECM

6A Unable to Force No-Sync Error
Error Type: DF
Error Description: The diagnostic firmware was unable to force a no-sync error.
Fault Isolation/Correction:
1. ECM

6B R/W Write/Read Test Overall Failure (Three or More Bad Heads)
Error Type: DF
Error Description: The data read from three or more heads during execution of resident diagnostics was incorrect. The heads are positioned at the drive-reserved diagnostic cylinders during these tests.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

6C R/W Write/Read Test Partial Failure (One or Two Bad Heads)
Error Type: DF
Error Description: The data read from one or two heads was incorrect. The heads were positioned at the drive-reserved diagnostic cylinders.
Fault Isolation/Correction:
1. ECM
2. PCM
3.
HDA

6D Unable to Force Read Gate and Write Gate T...
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force the simultaneous assertion of READ GATE and WRITE GATE.
Fault Isolation/Correction:
1. ECM

6E Unable to Force Write Gate and Write Protect Error
Error Type: DF
Error Description: A write-protected drive has WRITE GATE asserted but no error was detected.
Fault Isolation/Correction:
1. ECM

6F Diagnostic Write Attempted While Write-Protected
Error Type: DF
Error Description: Either the Write/Read Diagnostic or the Diagnostic Track Format Utility was attempted on a write-protected drive.
Fault Isolation/Correction:
1. Drive write-protected from the OCP
2. Drive write-protected by the controller
3. ECM

70 Servo Processor Spinup Timeout
Error Type: DE
Error Description: The master processor timed out after issuing a SPINUP command to the servo processor.
Fault Isolation/Correction:
1. ECM

71 Recalibrate Timeout Error
Error Type: DE
Error Description: The master processor timed out during a RECALIBRATE command issued to the servo processor.
Fault Isolation/Correction:
1. ECM

72 Servo Processor Seek Timeout
Error Type: DE
Error Description: The servo processor timed out the execution of a SEEK command. This is a gross seek error in that the servo subsystem never sensed that it got even within a cylinder of the desired cylinder within 100 ms.
Fault Isolation/Correction:
1. ECM
2. HDA

73 Servo Processor Head Switch Timeout
Error Type: DE
Error Description: The master processor timed out before the servo processor responded to a head switch status request.
Fault Isolation/Correction:
1. ECM

74 Offset Timeout Error
Error Type: DE
Error Description: The master processor timed out during an offset check or OFFSET command to the servo processor.
Fault Isolation/Correction:
1. ECM
2.
HDA

75 Servo Processor Unload Timeout
Error Type: DE
Error Description: The master processor timed out after issuing an UNLOAD (head) command to the servo processor.
Fault Isolation/Correction:
1. ECM

76 Servo Processor Sanity Timeout
Error Type: DE
Error Description: The master processor timed out while waiting for a response from the servo processor after issuing a SANITY CHECK command.
Fault Isolation/Correction:
1. ECM

77 Head Load Timeout Error
Error Type: DE
Error Description: The master processor timed out waiting for a response from the servo processor after issuing a HEAD LOAD command.
Fault Isolation/Correction:
1. ECM

78 Servo Processor Bias Force Calibration Timeout
Error Type: DE
Error Description: The master processor issued a BIAS CALIBRATION command (diagnostic opcode) to the servo processor. The master processor timed out while waiting for a servo processor response.
Fault Isolation/Correction:
1. ECM

79 Dedicated Servo Calibration Timeout Error
Error Type: DE
Error Description: The master processor timed out waiting for the servo processor to respond to a DEDICATED SERVO CALIBRATION command issued as part of a diagnostic opcode.
Fault Isolation/Correction:
1. ECM

7A Embedded Offset/Gain Calibration Timeout
Error Type: DE
Error Description: The master processor timed out while waiting for the servo processor to respond to an EMBEDDED OFFSET CALIBRATION or EMBEDDED HEAD GAIN CALIBRATION command issued by a diagnostic opcode.
Fault Isolation/Correction:
1. ECM

7B Invalid Test While Spindle Running
Error Type: DF
Error Description: The drive was spun up and the operator selected a diagnostic that can only be run when the drive is spun down. (Certain diagnostics can only be executed on a spun-down drive.) Refer to Chapter 4 for a complete listing of diagnostics and execution requirements.
Fault Isolation/Correction:
1. Spin down drive to run selected tests
2.
ECM

7C Gray Code Match Error After Settling
Error Type: DE
Error Description: Head settling on a track normally occurs following a SEEK command. A gray code comparison is made to ensure the heads have settled on the requested track. In this case, the servo was settling within 1/4 track of the desired track (but fine track had not been asserted) when suddenly the servo gray coded information indicated that movement of >1 cylinder has taken place away from the desired target cylinder. Such an occurrence may be related to an intermittent open of the coil actuator circuitry or transient spike in voltage establishing the holding current for the positioner.
Fault Isolation/Correction:
1. ECM
2. HDA

7D Embedded Interrupt Timeout
Error Type: DE
Error Description: The servo processor failed to detect a BURST PROTECT transition (asserted to de-asserted state) as generated from the master processor (ECM).
Fault Isolation/Correction:
1. ECM

7E Fine Track Lost After Settling
Error Type: DE
Error Description: The actuator initially settled on track but has now moved off track and loss of fine track has been declared by the servo subsystem. This condition has persisted for 2 seconds. Examine head and/or cylinder correlation when considering this error. This information should be derivable from the host error log or by doing a complete dump of the drive internal error log with a controller. Other contributors to this condition might be sustained vibration to the drive unit, an HDA runout condition, or an HDA mechanical resonance problem.
Fault Isolation/Correction:
1. ECM (if totally random cylinders and heads)
2. HDA (first choice if same cylinder(s))
3. HDA (first choice if same head(s))

7F Servo Settling Timer Expired
Error Type: DE
Error Description: The actuator was not able to settle on track within the allotted settling timeout period. The servo system was able to relocate to within 1/4 track
of the desired track/cylinder; however, it could not meet the fine track threshold stability criteria within the time allotted (1.8 seconds). Examine head and/or cylinder correlation when considering this error.
Fault Isolation/Correction:
1. ECM (if totally random cylinders and heads)
2. HDA (first choice if same cylinder(s))
3. HDA (first choice if same head(s))

80 Master Processor ROM Consistency Code Mismatch
Error Type: DF
Error Description: The master processor microcode is inconsistent with the microcode stored in EPROM.
Fault Isolation/Correction:
1. Reload microcode
2. ECM

81 Servo Processor Settle State Timeout
Error Type: DE
Error Description: The actuator was not able to settle on track within the allotted settling timeout period.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

82 Servo Processor Coarse Velocity State Timeout
Error Type: DE
Error Description: The servo processor timed out when commanded to move the actuator 256 or more cylinders.
Fault Isolation/Correction:
1. ECM
2. HDA

83 Servo Processor Fine Velocity State Timeout
Error Type: DE
Error Description: The servo processor timed out when commanded to move the actuator fewer than 256 cylinders.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

84 Servo Processor Seek Direction Error
Error Type: DE
Error Description: Servo processor actuator (positioner) and dedicated servo information indicated that the seek direction was wrong.
Fault Isolation/Correction:
1. ECM
2. HDA

85 Master Processor RAM Test Failure
Error Type: DF
Error Description: The drive-resident diagnostics detected bad RAM internal to the master processor.
Fault Isolation/Correction:
1. ECM

86 Static RAM Failure
Error Type: DF
Error Description: Drive-resident diagnostics detected bad RAM external to the master processor.
Fault Isolation/Correction:
1.
ECM

87 Master Processor ROM Checksum Failure
Error Type: DF
Error Description: Drive-resident diagnostics detected bad ROM internal to the master processor.
Fault Isolation/Correction:
1. ECM

88 Master Processor EEPROM Write Violation Error
Error Type: DE
Error Description: EEPROM was addressed and written to while in read-only mode.
Fault Isolation/Correction:
1. ECM

89 Seek Speed Out of Range
Error Type: DE
Error Description: While monitoring the speed of the actuator, the servo processor determined that seek velocity is beyond prescribed speed.
Fault Isolation/Correction:
1. ECM
2. Power supply
3. HDA

8A Servo Processor Inside of Destination Track During Settle State
Error Type: DE
Error Description: Servo processor has determined that the positioner has placed heads inside of the destination track during settle state.
Fault Isolation/Correction:
1. ECM

8B Gray Code Error After Settling With Fine Track
Error Type: DE
Error Description: Head settling on a track normally occurs following a SEEK command. A gray code comparison is made to ensure the heads have settled on the requested track. In this case, the servo was settling within 1/4 track of the desired track and fine track had been asserted when suddenly the servo gray coded information indicated that movement of >1 cylinder has taken place away from the desired target cylinder. Such an occurrence may be related to a significant amount of vibration in the vertical axis of the drive, or electrical transients from the positioner control voltage and holding current circuitry.
Fault Isolation/Correction:
1. ECM
2. HDA

8C Uncalibrated and PLO Error
Error Type: DE
Error Description: A PLO error occurred and the head offsets were uncalibrated.
Fault Isolation/Correction:
1. ECM
2. PCM
3.
HDA

8D Polarity Error on Velocity Command During a Multi-Track Seek
Error Type: DE
Error Description: The polarity indication bit in a velocity command profile was clear (zero) during a multi-track seek. This bit should have been set. (This is one of the setup functions the servo processor checks before it executes the digital servo seek profiles.)
Fault Isolation/Correction:
1. ECM

8E Master Processor ROM/EEPROM Consistency Code Mismatch
Error Type: DF
Error Description: Master processor microcode is incompatible with EEPROM microcode.
Fault Isolation/Correction:
1. Reload microcode
2. ECM

8F EEPROM Checksum Failure
Error Type: DF
Error Description: Drive-resident diagnostics detected bad EEPROM external to the master processor. The calculated checksum did not match the stored checksum.
Fault Isolation/Correction:
1. ECM

90 Unable to Force Index Error
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force and/or detect an index error.
Fault Isolation/Correction:
1. ECM

91 No Interrupt Detected During R/W Force Fault
Error Type: DF
Error Description: No interrupt to the master processor was detected by the drive during the read/write force fault diagnostic.
Fault Isolation/Correction:
1. ECM

92 Inner Guardband Without a Servo Fault Set
Error Type: DF
Error Description: The actuator was positioned in the inner guardband area and the inner guardband flag was set; however, a servo fault condition was not detected.
Fault Isolation/Correction:
1. ECM

93 Inner Guardband/Servo Fault: No Interrupt Detected
Error Type: DF
Error Description: The actuator was positioned at a cylinder in the inner guardband area, the inner guardband flag was set, and a servo fault error was detected, but the master processor was not interrupted.
Fault Isolation/Correction:
1.
ECM

94 SDI Loopback Test Failure on Both Ports
Error Type: DF
Error Description: Drive-resident diagnostics detected an SDI gate array or TSID gate array failure involving both SDI ports A and B logic. If the drive internal test 09 fails, the failure could be in the hardware external to the SDI/TSID gate array as well. During internal T09, the testing expects SDI loopback connectors to be attached to the ECM or at the cabinet bulkhead.
Fault Isolation/Correction:
If test number 08 fails:
1. ECM
If test number 09 fails:
1. Loopback connectors are not installed
2. Defective SDI cable
3. Defective bulkhead connector
4. ECM
5. SDI connectors J101 or J102

95 SDI Test Failure: Port A
Error Type: DF
Error Description: A drive-resident diagnostic detected a failure with the SDI gate array or the TSID gate array involving SDI Port A. If the drive internal test 09 fails for this error code, the failure could be in the SDI Port A hardware external to the SDI/TSID gate array as well. During internal T09, the testing expects SDI loopback connectors to be attached to the ECM or at the cabinet SDI bulkhead.
Fault Isolation/Correction:
If test number 08 fails:
1. ECM
If test number 09 fails:
1. Port A loopback connectors are not installed
2. Defective SDI cable (Port A)
3. Defective bulkhead connector (Port A)
4. ECM
5. SDI connector J102 (Port A)

96 SDI Failure: Port B
Error Type: DF
Error Description: A drive-resident diagnostic detected a failure with the SDI gate array or the TSID gate array involving SDI Port B. If the drive internal test 09 fails for this error code, the failure could be in the SDI Port B hardware external to the SDI/TSID gate array as well. During internal T09, the testing expects SDI loopback connectors to be attached to the ECM or at the cabinet SDI bulkhead.
Fault Isolation/Correction:
If test number 08 fails:
1.
ECM
If test number 09 fails:
1. Port B loopback connectors are not installed
2. Defective SDI cable (Port B)
3. Defective bulkhead connector (Port B)
4. ECM
5. SDI connector J101 (Port B)

98 Can't Execute Diagnostic/Jumper
Error Type: DF
Error Description: A diagnostic test could not be run because a hardware jumper was not installed. If this error is seen in the field, do not attempt to alter jumpers.
Fault Isolation/Correction:
1. Operator (do not attempt to alter jumpers)

9A Positioner Corrected Event During Data Transfer
This is typically an event unless analyzed by VAXsimPLUS to be worthy of correction. Reference expanded discussion of 9A, 9B, and 9C events under error code 9C.
Error Type: DE
Error Description: Heads were not fine positioned or locked on track (relative to the embedded servo information) at the time a read or write operation was ready to start. The drive took necessary procedures to re-establish on-track condition. The drive command was received but READ GATE or WRITE GATE had not yet been asserted.
Fault Isolation/Correction:
1. HDA (if only one head)
2. ECM (if 10 of 13 heads)

9B Write and Positioner Corrected Event
This is typically an event unless analyzed by VAXsimPLUS to be worthy of correction. Reference expanded discussion of 9A, 9B, and 9C events under error code 9C.
Error Type: DE
Error Description: The master processor determined that the selected read/write head moved off track when WRITE GATE was asserted. The condition was corrected. (The read/write heads must be within 57.1 microinches from track centerline.)
Fault Isolation/Correction:
1. HDA (if only one head)
2. ECM (if 10 of 13 heads)

9C Read Gate and Positioner Corrected Event
Error Type: DE
Error Description: The master processor determined that the selected read/write head moved off track when READ GATE was asserted. The condition was corrected.
(The read/write heads must be within 57.1 microinches from track centerline.)

TROUBLESHOOTING 9A, 9B, AND 9C: This is typically an event unless analyzed by VAXsimPLUS to be worthy of correction.

For HSC/KDM controllers, event rates of <5 per day may be considered normal for disks that operate with fairly high I/O rates (continually or in significant bursts) provided that the following pattern is noted:
• Ninety percent of occurrences are with the top five heads (heads 0 through 4).
• One of the top five heads will have few if any errors.
• No one head in the top five has 90 percent of the errors. (This might point to a track/surface problem.)

If the event pattern matches this, and the event rate exceeds these guidelines, then HDA replacement may be necessary. If the event pattern does not match this, then further analysis is required.

For KDA/KDB/UDA controllers, event rates should not exceed 16 per day on heavily used disks (I/O rates of 30 per second).

If these events occur over 10 of the 13 heads, then the occurrence may be related to a general servo/read path problem. This is possibly an electronics problem that may not involve the HDA. If these errors occur primarily on one head, there is strong head/surface correlation and possible HDA replacement is warranted.

The above number of events to be expected was determined by analysis and experience with the RA90 HDA 70-22951-01. With the introduction of the RA92 (HDA 70-27492-01), the number of 9A, 9B, and 9C events has decreased significantly. The phase-in of the RA92 HDA hardware mechanics (resulting in an RA90-compatible HDA 70-27268-01) into RA90 production has substantially reduced the occurrence of these events because of the new design.
Fault Isolation/Correction:
1. HDA (if only one head)
2. ECM (if 10 of 13 heads)

9D Error Log Header Corrupted
Error Type: DF
Error Description: A location in EEPROM containing drive-resident error log identifier information, device type, or descriptor size is invalid.
Fault Isolation/Correction:
1. Attempt to load new microcode
2. ECM

9E Drive Faulted, Test Cannot Run
Error Type: DF
Error Description: Drive-resident diagnostics cannot run while the drive is faulted.
Fault Isolation/Correction:
1. Check fault condition
2. ECM

9F Error Log Check Point Code
Error Type: Informational Only.
Error Description: If drive-resident diagnostic T50 has been used to place a checkpoint between errors in the drive internal error log, a 9F entry will be seen in the drive internal error log. This makes drive troubleshooting easier by placing a null field between errors in the drive internal error log to partition repair activity.
Fault Isolation/Correction:
1. None (read Error Description above)

A0 Unable to Clear SDI Array Safety Status Register
Error Type: DF
Error Description: Drive-resident diagnostics attempted to clear the SDI gate array safety status registers but were unsuccessful.
Fault Isolation/Correction:
1. To isolate the stuck bit, check the preceding error in the drive internal error log storage silo. Base corrective action on the preceding error.
2. ECM

A1 Unable to Force Encoder Error
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force a read/write encoder/decoder (RWENDEC) error.
Fault Isolation/Correction:
1. ECM

A2 Unable to Force Multiple Head Select While Reading
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force read gate and multi-chips error.
Fault Isolation/Correction:
1. PCM
2. ECM
3. HDA

A3 Unable to Force Write Gate and Write Unsafe
Error Type: DF
Error Description: A drive-resident diagnostic was unable to force write gate and write unsafe error conditions.
Fault Isolation/Correction:
1. ECM
2.
PCM

A4 Unable to Force Write Current and No Write Gate
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force write current and no write gate error conditions and detect such a condition.
Fault Isolation/Correction:
1. ECM
2. PCM

A5 Unable to Force Write Gate and No Write Current
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force write gate and no write current error conditions.
Fault Isolation/Correction:
1. ECM
2. PCM

A6 Unable to Force Read Gate and Off Track Error
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force read gate and off track error conditions.
Fault Isolation/Correction:
1. ECM

A7 Unable to Force Write Gate and Off Track Error
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force write gate and off track error conditions.
Fault Isolation/Correction:
1. R/W cable to PCM
2. ECM

A8 Unable to Force Read and Write Fault While Writing
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force a read/write-while-faulted error condition.
Fault Isolation/Correction:
1. ECM

A9 Servo Fault/Force Fault Test
Error Type: DF
Error Description: A servo check occurred while the diagnostic firmware was attempting to execute the force fault subtest.
Fault Isolation/Correction:
1. ECM
2. HDA

AB Forced Read and Write Fault While Reading
Error Type: DF
Error Description: Drive-resident diagnostics were unable to force a read/write-while-faulted error condition.
Fault Isolation/Correction:
1. ECM

AD UART Overrun or Framing Error
Error Type: DE
Error Description: The master processor internal UART detected an overrun condition or a framing error condition on data received from the OCP.
Fault Isolation/Correction:
1. OCP
2.
ECM
3. Blower/bezel assembly

AE OCP Data Packet Checksum Error
Error Type: DE
Error Description: Data packets transmitted between the master processor and the OCP processor are in error.
Fault Isolation/Correction:
1. ECM
2. OCP
3. Blower/bezel assembly

AF OCP Start Byte Is Not a Sync Character
Error Type: DE
Error Description: The first byte the master processor expects in a data packet transfer is a sync character. This error indicates no sync character was received.
Fault Isolation/Correction:
1. ECM
2. OCP
3. Blower/bezel assembly

B0 OCP Invalid Response
Error Type: DE
Error Description: The OCP processor did not acknowledge a command from the master processor.
Fault Isolation/Correction:
1. OCP
2. ECM
3. Blower/bezel assembly

B2 OCP Retransmit Failure
Error Type: DE
Error Description: The OCP processor can request three retransmits from the master processor. This error indicates the OCP requested more than three consecutive retransmit responses.
Fault Isolation/Correction:
1. OCP
2. ECM
3. Blower/bezel assembly

B3 OCP Command Unsuccessful
Error Type: DE
Error Description: An incorrect response was received from the OCP processor after the master processor issued a SEND STATUS command to the OCP.
Fault Isolation/Correction:
1. OCP
2. ECM

B4 OCP Command Timeout
Error Type: DE
Error Description: The OCP processor did not respond to a master processor command within its allotted timeout period. As a result of this error, the master processor logs a B6 error condition into EEPROM and latches B4 into the display.
Fault Isolation/Correction:
1. OCP
2. ECM
3. Blower/bezel assembly

B6 Master Processor UART Loopback Test Failure
Error Type: DF
Error Description: Drive-resident diagnostics were unable to transmit and receive data through the master processor serial communications interface (SCI).
Fault Isolation/Correction:
1.
ECM

B8 Master Processor UART Transmitter/Receiver Error
Error Type: DE
Error Description: The OCP failed to transmit or receive data through its master processor serial port.
Fault Isolation/Correction:
1. OCP

B9 OCP-to-Master Processor Communications Timeout Failure
Error Type: OCP Error Code
Error Description: The master processor failed to communicate with the OCP processor within 4 seconds after power-up.
Fault Isolation/Correction:
1. OCP
2. ECM
3. Blower/bezel assembly

BA OCP INIT Timeout Failure
Error Type: OCP Error Code
Error Description: The master processor failed to communicate with the OCP processor within 4 seconds after issuing an initialize request to the OCP processor.
Fault Isolation/Correction:
1. OCP
2. ECM
3. Blower/bezel assembly

BB OCP Processor ROM Checksum Failure
Error Type: OCP Error Code
Error Description: The OCP processor performed a ROM checksum, and the calculated checksum did not match the stored checksum.
Fault Isolation/Correction:
1. OCP

BC Cartridge Checksum Failure
Error Type: DF
Error Description: Invalid microcode was detected in the microcode update cartridge.
Fault Isolation/Correction:
1. Reseat update cartridge (retry T40)
2. Defective cartridge
3. OCP
4. ECM

BD Microcode Update Cartridge Detection Failure
Error Type: DF
Error Description: The microcode update utility (T40) was attempted without an update cartridge in place.
Fault Isolation/Correction:
1. Cartridge is not inserted
2. Defective cartridge
3. OCP
4. ECM

BE Cartridge/EEPROM/Master Processor Consistency Check
Error Type: DF
Error Description: Microcode contained within the cartridge is inconsistent with the microcode in the master processor, EPROM, or EEPROM. The microcode update process is halted to prevent loading incompatible microcode. The product revision matrix documentation shows which codes are compatible.
Fault Isolation/Correction:
1. Reseat update cartridge
2.
Replace update cartridge with a compatible cartridge
3. ECM

BF Error Log Write Compare Error
Error Type: DE
Error Description: Each time the drive writes an error log entry into the error silo, it verifies the data written. The microcode got a data compare error on the page (16 bytes) that was written. This is not a fatal error. Should that particular silo entry be rewritten, it may or may not fail again. This error code is not written to EEPROM but may be displayed at the time of the error if the fault button is depressed.
Fault Isolation/Correction:
1. ECM

C0 Hardware Revision and Microcode Incompatibility
Error Type: DE
Error Description: The microcode has determined that there is an incompatible hardware and/or software combination from the revision information that it has visibility to. The microcode looks at the following hardware revisions in a drive:
• I/O-R/W module hardware revision jumpers
• Servo module hardware revision jumpers
• PCM switch pack (S1-1 through S1-4)
• HDA revision bits information

Most of this hardware revision information can be determined by executing drive internal test T45 (see Chapter 4), then decoding the reported revision information. The microcode, after checking this internal revision information, will modify the final drive reported hardware revision that is reported to the subsystem and host as the drive hardware revision.

Microcode revision 9 was the first release that checked for HDA revision. Subsequent microcode revisions have been expanding on the compatibility testing. With the RA92 (microcode revisions 20 and later), a significant amount of revision checking/testing is necessary for the microcode to properly configure itself as to the type of drive (RA90 vs RA92), type of HDA (short arm vs long arm), type of format (RA90 vs RA92), and type of ECM (70-22942-01 vs 70-22942-02).
To determine TOTAL compatibility, you must verify:
• Code compatibility to ECM
• Code compatibility to HDA
• ECM compatibility to HDA
• PCM and HDA compatibility
• PCM switch pack setup. Reference the compatibility tables in Chapter 3.

With microcode revisions 20 and later, the C0 LED error is a very significant fault to the drive and must be resolved. The error type was redefined to a drive error.
Fault Isolation/Correction:
If the HDA has just been replaced, replace it again with a compatible revision or load compatible drive microcode in the ECM.
If the drive HDA and microcode were operational before the failure, then revision bits are now being detected in error. This will require careful troubleshooting. A series of tables in the RA90/RA92 Disk Drive Pocket Reference Card have been prepared to assist in the determining and resolving of this error condition. Additional tables are provided in Chapter 3.
1. If the HDA has just been replaced: load compatible microcode
2. If the PCM has just been replaced: check PCM switch pack S1-1 through S1-4 for correct switch settings. Refer to the RA90/RA92 Disk Drive Pocket Reference Card and the tables in Chapter 3.
3. If the ECM has just been replaced: check microcode compatibility. Refer to the RA90/RA92 Disk Drive Pocket Reference Card and the tables in Chapter 3.
4. R/W cable
5. PCM
6. ECM

C1 Outer Guardband Detected After HEAD LOAD Command
Error Type: DF
Error Description: The GASP gate array detected outer guardband after a HEAD LOAD command.
Fault Isolation/Correction:
1. ECM

C2 Inner Guardband Detected After HEAD LOAD Command
Error Type: DF
Error Description: The GASP gate array detected inner guardband after a HEAD LOAD command.
Fault Isolation/Correction:
1. ECM

C3 Seek to Outer Guardband Failed
Error Type: DF
Error Description: The servo processor was issued a SEEK command to the outer guardband area of the disk but failed the seek.
Fault Isolation/Correction:
1. Clean cabinet air vent grill
2. ECM
3. PCM
4. Blower/bezel assembly
5. HDA

C4 Seek to Outer Guardband Not Detected
Error Type: DF
Error Description: The servo processor was issued a SEEK command to the outer guardband area of the disk, but the OGB flag was not detected.
Fault Isolation/Correction:
1. ECM

C5 HDA and ECM Incompatibility
Error Type: DF
Error Description: The microcode has determined that the reported HDA type and ECM type are incompatible. Specifically, the incompatible combination is an old ECM type (70-22942-01) and an RA92 HDA. Microcode revision 25 was the first release to check specifically for this error. Previous microcode revisions (revision 9 and later) will report this condition as error code C0.
Fault Isolation/Correction:
If the HDA or ECM has just been replaced, make sure compatible part numbers have been used.
If the PCM has just been replaced (part of the HDA FRU assembly), make sure switches S1-1 through S1-4 are set correctly. (See Chapter 3 or compatibility tables in the RA90/RA92 Disk Drive Pocket Reference Card.)
If HDA, PCM, ECM and drive microcode were operational before the failure, then the switch pack S1 on the PCM and/or the I/O-R/W and servo revision jumpers are now being detected in error. This will require careful troubleshooting. See drive error code C0 for additional troubleshooting information.
1. R/W cable
2. PCM (check S1 switch pack setting)
3. ECM (replace with P/N 70-22942-02)

C6 PLO Failure
Error Type: DE
Error Description: The voltage controlled oscillator (VCO) is not synchronized to the dedicated servo information read from the media.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

C7 Seek to Inner Guardband Failed
Error Type: DF
Error Description: The servo processor was issued a SEEK command to the inner guardband area of the disk but failed the seek.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

C8 Inner Guardband Not Detected After Seek to Inner Guardband
Error Type: DF
Error Description: A SEEK command, issued to the servo processor to seek to the inner guardband area, failed to detect the inner guardband flag.
Fault Isolation/Correction:
1. ECM
2. HDA

C9 Analog Loop Test Failure
Error Type: DE
Error Description: The D/A and A/D circuitry did not respond correctly while tested in a loop. The servo processor performs the analog testing on these circuits.
Fault Isolation/Correction:
1. ECM
2. PCM

CA Media Not Spinning
Error Type: DF
Error Description: Selected drive-resident diagnostics could not be executed because the drive was spun down.
Fault Isolation/Correction:
1. Spin up drive
2. ECM

CC Servo Processor Recalibrate Failed
Error Type: DE
Error Description: A RECALIBRATE command issued to the servo processor failed.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

CD Track Counter (Gray Code)
Error Type: DE
Error Description: During coarse positioning, both gray code bits (X and Y) changed during the same servo frame; or the same gray code changed (X or Y) during two consecutive servo frames.
Fault Isolation/Correction:
1. HDA
2. ECM

CE EEPROM Write Cycle Timeout
Error Type: DE
Error Description: During an EEPROM write operation, a location in EEPROM could not be written within 20 milliseconds.
Fault Isolation/Correction:
1. ECM

CF Invalid Data in EEPROM
Error Type: DE
Error Description: Error information in EEPROM was found to be invalid.
Fault Isolation/Correction:
1. ECM

E0 Spindle Rotation Not Detected
Error Type: DE
Error Description: The servo system has not detected Hall sensor signal transitions. This indicates either the spindle motor is not turning or the Hall sensor circuitry has failed.
An open motor coil (or drive circuitry) will show this symptom if that particular phase is needed to start the spindle drive. See error code 13 before replacing FRUs.

With microcode revisions 19 and earlier, this error was spindle speed unsafe (basically the same error detection). After microcode revision 20, this error is simply failure to detect that the spindle has performed any motion. The servo monitors the Hall sense S1 signal (reference error code 13). If it detects any transition on this specific motor control signal, then this check is okay.
Fault Isolation/Correction:
1. ECM
2. Rear flex cable assembly (visually inspect for damage (HDA removal necessary); the rear flex cable assembly should be neatly dressed along the sides of the chassis at the rear)
3. Servo-to-spindle motor interconnect
4. Brake failure (on/open all the time)
5. HDA
6. Rear flex cable assembly

E1 Spindle Speed Out of Range
Error Type: DE
Error Description: Spindle speed is monitored initially by input from the Hall sensors inside the HDA spindle motor. Improper spindle speed, as detected by the Hall sensors, may prevent proper speed control until the PLO frequency lock range is attained. Once the spindle speed is within the PLO range, the servo system begins to look for servo data to which it can lock its frequency. This error implies that the drive is unable to establish spindle speed rotation within the range required (RA90=3600 rpm, RA92=3405 rpm). An open failure of a spindle motor coil winding, or a motor drive circuitry failure, or a bad Hall sense S1 or S2 circuit will cause this type of error. See error code 13 for measurement points and troubleshooting aids before replacing FRUs.
Fault Isolation/Correction:
1. Rear flex cable assembly (visually inspect for damage (HDA removal necessary); the rear flex cable assembly should be neatly dressed along the sides of the chassis at the rear)
2. ECM
3.
Continuity checks (refer to Table 5-10)
4. HDA

E2 A/D or D/A Converter Insane
Error Type: DE
Error Description: The servo processor detected a failure in its A/D or D/A converters during a precheck before the head load was initiated.
Fault Isolation/Correction:
1. ECM
2. If you load microcode revision 13 (or earlier) into a 70-22942-02 (RA92-compatible) ECM, a solid E2 error will be seen upon drive spinup.

E3 Excessive Positioner Current During Test
Error Type: DE
Error Description: The servo processor detected a failure in the power amp circuitry that indicates a shorted condition.
Fault Isolation/Correction:
1. ECM

E4 Open Circuit Detected During Power Amp Toggle Test
Error Type: DE
Error Description: An open was detected in the power amp circuitry during a head load sequence. Power is applied to the positioner in a toggle fashion during the head load sequence. Reference error code 13 for information that may be useful for isolating an open circuit of the actuator. An ohmmeter measurement might verify this condition at the HDA.
Fault Isolation/Correction:
1. Rear flex cable assembly (visually inspect for damage (HDA removal necessary); the rear flex cable assembly should be neatly dressed along the sides of the chassis at the rear)
2. ECM
3. HDA

E5 Overcurrent Detected During Actuator Test
Error Type: DE
Error Description: The servo processor detected an overcurrent condition before attempting a head load process.
Fault Isolation/Correction:
1. ECM

E6 Track Counter Clear Failure
Error Type: DE
Error Description: The track counter failed to clear indicating establishment of cylinder 0. This is the final phase of the head load/RTZ process that must be accomplished. Loss of PLO during this portion of the head load/RTZ process will also cause this error. See the note in the error description for error code E8.
Fault Isolation/Correction:
1. ECM (most likely)
2. PCM
3.
HDA

E7 Illegal Zone Detected
Error Type: DE
Error Description: The servo system is executing a head load or RTZ operation. For microcode revisions 19 and earlier, the order of band detection is: outer guardband, data area, then inner guardband area. For microcode revisions 20 and later, the order of band detection that the servo system is looking for is OGB, data area, then back to OGB. In this case (without an E9 error), the servo system could not re-establish finding the OGB area (the second time). The servo system will spend up to one second trying to re-establish the OGB area. Loss of PLO during this portion of the head load/RTZ process will also cause this error. See the note in the error description for error code E8.
Fault Isolation/Correction:
1. ECM (most likely)
2. HDA
3. PCM

E8 Outer Guardband Timeout
Error Type: DE
Error Description: Servo is in the outer guardband (OGB) of the disk and wants to be able to detect this region by looking for the OGB pattern from the dedicated servo information. At this time, however, the servo cannot establish PLO lock and faults. Interruption of the servo data stream is likely. Up to 3.4 seconds is allocated to trying to find servo data.

NOTE
PLO Loss During Head Load/RTZ: The PLO coming unlocked is a fairly serious error to a servo system. It causes all the servo information to become unreadable. There are now four different codes for the PLO being unlocked, depending on when it happens:
• At the beginning of RTZ, if unable to establish lock, an E8 is reported.
• Midway through the RTZ, if lock is lost while scanning the disk for the OGB, an E7 is reported.
• Late in the RTZ, while going from the OGB to cylinder 0, lost lock results in an E6.
• During normal track following and seeking, lost lock causes an EC.
These are the error codes reported by the servo and logged in the error log while functional I/O code is running.
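The four PLO-loss cases in this note amount to a lookup keyed on when lock was lost. The sketch below is only an illustration for log analysis and is not drive microcode; the phase names and the function plo_loss_code are hypothetical labels introduced here, and only the four error codes come from the note.

```python
# Illustrative lookup of the servo-reported PLO-loss codes listed in the note.
# The phase names are hypothetical labels; the manual describes the phases
# only in prose.
PLO_LOSS_CODES = {
    "rtz_start": "E8",        # unable to establish lock at the beginning of RTZ
    "rtz_ogb_scan": "E7",     # lock lost while scanning the disk for the OGB
    "rtz_ogb_to_cyl0": "E6",  # lock lost going from the OGB to cylinder 0
    "track_following": "EC",  # lock lost during normal track following/seeking
}


def plo_loss_code(phase: str) -> str:
    """Return the error code the note associates with PLO loss in this phase."""
    try:
        return PLO_LOSS_CODES[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None


if __name__ == "__main__":
    for phase in PLO_LOSS_CODES:
        print(phase, plo_loss_code(phase))
```

Read in reverse, the same table suggests that an E8, E7, E6, or EC entry in the error log may share a common cause (loss of PLO lock), with the code indicating when in the sequence it occurred.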
Diagnostic I/O code may log (and the OCP may display) the I/O's error code of C6 for a PLO failure.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

E9 Gray Code Timeout During the Turnaround State
Error Type: DE
Error Description: No gray code transitions were detected during a hold sequence. The drive is attempting a head load (NRZ), is in the OGB, and has PLO locked, reading its OGB position. At this point, the servo is attempting to move forward to look for track crossings and the eventual detection of the data area of the disk. However, the servo cannot get the positioner to move. The servo will spend up to 3.4 seconds trying to move the positioner. A sticky (dragging) actuator lock pin or faulty actuator lock solenoid will also cause this error.
Fault Isolation/Correction:
1. HDA (positioner lock solenoid failure; see error code 13)
2. Rear flex cable assembly (visually inspect for damage (HDA removal necessary); the rear flex cable assembly should be neatly dressed along the sides of the chassis at the rear)
3. ECM

EA Gray Code Timeout During Outer Guardband State
Error Type: DE
Error Description: No gray code transitions were detected during a head load sequence.
Fault Isolation/Correction:
1. Visually inspect rear flex cable assembly
2. HDA (positioner lock solenoid failure; see error code 13)
3. ECM

EB Sector Pulse Timeout During Sync-Up State
Error Type: DE
Error Description: An index pulse was detected but no sector pulse was detected in the data area of the disk. Heads may not be positioned over the data area.
Fault Isolation/Correction:
1. ECM
2. HDA

EC Servo Fault and PLO Fault Bit Set in GASP
Error Type: DE
Error Description: The servo fault and PLO fault bits are both set in the GASP, but it was noted by the servo processor that the PLO had come unlocked. Similar to error code 25, however, the servo processor did see the PLO deassert, which in turn caused the servo fault bit to set.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

ED Servo Watchdog Timeout
Error Type: DE
Error Description: The digital signal processor (DSP) was not interrupted on time by the GASP. Possibly, the servo clock signal is not present or is not being detected properly. The timeout is 820 microseconds.
Fault Isolation/Correction:
1. ECM

EE Servo Digital Signal Processor Reset
Error Type: DE
Error Description: The servo DSP has been reset. As a result, the profiles for the drive have not been loaded by the master processor. The DSP is sane, but has not been told what type of HDA is present in the drive; it may be an RA90 long arm, RA90 short arm, or an RA92. Therefore, the servo will not load its servo tables or move the actuator. This is an unusual error condition. The master processor should have reinitialized the drive characteristics into the servo system.
Fault Isolation/Correction:
1. Turn drive power off and on
2. ECM

EF Head Unload Failed
Error Type: DE
Error Description: The servo processor responded with an error condition to a HEAD UNLOAD command.
Fault Isolation/Correction:
1. ECM
2. HDA

F0 Servo Microcode Update Failed
Error Type: DE
Error Description: The servo processor did not send a SUCCESSFUL acknowledgment when the master processor attempted to load external servo processor RAM with new microcode. When the drive powered up, a microcode update occurred or a servo timeout took place. The master processor did a compare of EEPROM to RAM microcode. The data did not compare.
Fault Isolation/Correction:
1. I/O-R/W to servo cable connection
2. ECM

F1 Command to Servo Processor Timed Out
Error Type: DE
Error Description: The master processor attempted to issue an UNLOAD command to the servo processor; however, the command timed out during its execution.
Fault Isolation/Correction:
1.
ECM

F3 Servo Spinup Failed
Error Type: DE
Error Description: The master processor issued a SPINUP command to the servo processor and the servo processor responded with an error condition.
Fault Isolation/Correction:
1. ECM
2. Brake assembly
3. HDA

F4 Servo Spindown Failed
Error Type: DE
Error Description: The master processor issued a command to spin down the drive. The servo processor responded with an error condition.
Fault Isolation/Correction:
1. ECM

F5 Seek Failed
Error Type: DE
Error Description: The servo processor returned an error condition in response to a SEEK command from the master processor.

NOTE
T65 does not check for out-of-range values. Do not exceed the maximum specified input values. Also, the last cylinder parameter must always be equal to or greater than the first cylinder parameter. If an invalid cylinder value is entered, a (servo) Seek Failed error (F5) occurs.

Fault Isolation/Correction:
1. HDA
2. ECM (if 10 of 13 heads)

F6 Head Switch Failed
Error Type: DE
Error Description: The servo processor responded with an error condition to a HEAD SWITCH command initiated by the master processor.
Fault Isolation/Correction:
1. HDA
2. ECM (if 10 of 13 heads)

F7 RTZ Failed
Error Type: DE
Error Description: The master processor issued a RETURN TO ZERO (RTZ) command to the servo processor, and the servo processor responded with an error condition.
Fault Isolation/Correction:
1. ECM
2. HDA

F8 Head Load Failed
Error Type: DE
Error Description: The master processor issued a HEAD LOAD command to the servo processor, and the servo processor responded with an error condition and no specific error information with it, or the head load timed out.
Fault Isolation/Correction:
For microcode revisions 19 or earlier:
1. ECM (if 10 of 13 heads)
2. PCM
3. HDA
For microcode revisions 20 or later:
1.
ECM

F9 Diagnostic Command Failed
Error Type: DF
Error Description: The servo processor responded with an error or a timeout condition to a DIAGNOSE command issued by the master processor.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

FA Servo Processor Failed Seek to DGN Write Cylinder
Error Type: DF
Error Description: A seek to the diagnostic (DGN) write/read cylinder failed while under diagnostics control.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

FB Servo Processor Failed Seek to DGN Read Cylinder
Error Type: DF
Error Description: A seek to the diagnostic (DGN) read-only cylinder failed while under diagnostics control.
Fault Isolation/Correction:
1. ECM
2. PCM
3. HDA

FD EEPROM Checksum Error
Error Type: DF
Error Description: An incorrect checksum was detected in one of the EEPROMs.
Fault Isolation/Correction:
1. Reload microcode
2. ECM

6 Removal and Replacement Procedures

6.1 Introduction
This chapter describes the removal and replacement procedures for RA90 and RA92 disk drive components. No tools are needed to remove or replace the six major field replaceable units (FRUs) that make up the RA90/RA92 disk drive. However, tools are required for the removal and/or replacement of some drive components. A tools checklist is included to identify these tools. Tools are also identified in procedures where needed. Figure 6-1 shows an exploded view of the RA90/RA92 disk drive.
Figure 6-1 RA90/RA92 Disk Drive - Exploded View (callouts: power supply, RA90/RA92 disk drive chassis, ribbon cable, blower motor assembly)

6.2 Sequence for FRU Removal
Remove RA90/RA92 FRUs in the following sequence:

Figure 6-2 FRU Removal Sequence (flowchart; boxes in reading order: cabinet front panel/drive grill, OCP, blower motor assembly, ECM, PCM, HDA, spindle ground brush, brake, solenoid, cabinet rear panel, power supply)

Use care when removing and replacing drive components. Never force fit drive modules or components. Generally, steady, firm pressure and correct alignment ensure proper seating of drive components. If you encounter resistance during FRU removal or replacement, check for bent pins, obstructions, or improper alignment of parts.

6.3 Electrostatic Sensitivity
Drive components and FRUs are highly sensitive to electrostatic shock. Use proper ESD methods when handling drive components. (Refer to Section 1.4, Electrostatic Protection.)

6.4 Power Precautions
Since hazardous voltages are present in this equipment, it is recommended that only trained service personnel attempt to service this equipment.

WARNING
Always remove power from the unit before removing or replacing any internal part or cable. Bodily injury or equipment damage may result from improper servicing.

6.5 Tools Checklist
Most RA90 and RA92 disk drive repairs can be performed without the use of tools. However, the following tools are required during some procedures:
• 5/32 Hex wrench
• 1/16 Allen wrench
• 3/32 Allen wrench
• 5/32 Allen wrench
• 3/16 Allen wrench
• Pliers
• Needlenose pliers
• Medium Phillips screwdriver
• Flat-blade screwdriver

6.6 Removing/Replacing Cabinet Front and Rear Access Panels
Procedures contained in this chapter require the removal of cabinet front and rear access panels.
Panel removal and replacement procedures follow.

6.6.1 Removing/Replacing the Front Access Panel
To remove the cabinet front access panel (refer to Figure 6-3):
1. Use a hex wrench or flat-bladed screwdriver to unlock the two quarter-turn fasteners at the top of the panel. Turn the fasteners counterclockwise.
2. Grasp the panel by its edges, tilt it toward you, and lift it up about 2 inches. Remove the panel and store it in a safe place.

To reinstall the front access panel:
1. Lift the panel into place and lower it straight down until the tabs on the panel's lower edge engage the slots in the cabinet support bracket.
2. Holding the panel flush with the cabinet, use a hex wrench to lock the quarter-turn fasteners. Turn the fasteners clockwise.

6.6.2 Removing/Replacing the Rear Access Panel
To remove the cabinet rear access panel (refer to Figure 6-4):
1. Use a hex wrench or flat-bladed screwdriver to unlock the two quarter-turn fasteners at the top of the panel. Turn the fasteners counterclockwise.
2. Tilt the panel toward you and lift it up to disengage the pins at the bottom.
3. Lift the panel clear of the enclosure and store it in a safe place.

To reinstall the rear access panel:
1. Lift the panel into place and fit the pins into the holes at the top of the I/O bulkhead.
2. Push the top of the panel into place and use a hex wrench to lock the quarter-turn fasteners. Turn the fasteners clockwise.

Figure 6-3 Front Access Panel Removal (callouts: quarter-turn fastener, hex wrench, panel, support bracket)

Figure 6-4 Rear Access Panel Removal (callouts: hex fastener, cabinet rear bustle, rear access panel, I/O bulkhead, pins)

6.7 Removing the Operator Control Panel
The operator control panel (OCP) is secured to the bezel/blower assembly by the OCP-to-blower connector and by flexible metal retention clips.

NOTE
Note the orientation of the OCP before removing.

To remove the OCP (refer to Figure 6-5):
1. Remove power from the drive.
2. Grip the OCP in the middle and gently pull it towards you.
3. Note OCP-to-blower connector orientation.

Reverse this process to replace the OCP. (Check for bent pins before replacing.)

Figure 6-5 OCP Removal (callouts: drive front, operator control panel, bezel, connector cap)

6.8 Removing the Blower/Bezel Motor Assembly
Although the bezel and blower motor assembly are removed as one unit from the drive chassis, the bezel and blower motor assembly are two separate units. The blower motor assembly is the FRU. Pay particular attention to the blower motor orientation and blower motor-to-ECM connection.

To remove the blower motor assembly (refer to Figure 6-6):
1. Remove power from the drive.
2. Remove the OCP (refer to Section 6.7).
3. Note blower motor orientation before removing.
4. Locate the four wing nuts.
5. Rotate lower then upper wing nuts counterclockwise to loosen.
6. Grasp the assembly sides and pull the assembly toward you.

Figure 6-6 Blower Motor Assembly Removal Sequence (callouts: drive chassis, blower assembly, wing nuts)

To replace the blower motor assembly:
1. Ensure a good connection exists between the blower motor assembly and the ECM.
2. Check for proper connector alignment.
3. Use steady, gentle pressure to replace the blower motor assembly. Do not force the blower assembly into position. If resistance is encountered, check for bent pins.
4. Tighten the upper and lower wing nuts in a clockwise direction.
6.8.1 Separating the Bezel and Blower Motor Assembly
Use the following procedure to separate the blower motor assembly from the bezel (refer to Figure 6-7):
1. Place the assembly grill-side down.
2. Locate and disconnect the +24 V blower motor connector (red and black leads).
3. Locate the Phillips-head screws; loosen and remove.
4. Separate the bezel and blower motor assembly.

Reverse this procedure to reconnect the bezel and blower motor assembly. Return the assembly to the chassis.

Figure 6-7 Bezel and Blower Motor Assembly Separation (callouts: bezel, blower, Phillips-head screws, +24 V blower motor connector, blower connection)

6.9 Removing the Electronic Control Module
Ensure proper grounding before beginning this procedure. To remove the electronic control module (ECM) (refer to Figure 6-8):
1. Remove power from the drive.
2. Remove the OCP (refer to Section 6.7).
3. Remove the blower motor assembly (refer to Section 6.8).
4. Remove the ribbon cable from the preamp control module (PCM).
5. Locate the lock/release lever on the side of the ECM.
6. Grasp the ECM handle and apply pressure to the lock/release lever with your thumb.

NOTE
Do not use extreme force when applying pressure to the lock/release lever. Only firm, steady pressure is required to remove the ECM.

7. Pull the ECM toward the front of the chassis.
8. If resistance is encountered, apply a small amount of back pressure to the ECM and, at the same time, apply pressure to the lock/release lever. Pull the ECM toward the front of the chassis.

Figure 6-8 ECM Removal (callouts: RA90/RA92 disk drive chassis, red band, lock/release lever)

Reverse this procedure to replace the ECM. Apply firm (not excessive) pressure until the carrier latch engages its detent.
Reconnect the ECM-to-PCM ribbon cable.

NOTE
Do not force the ECM. If necessary, remove and examine rear connector pins to verify nothing is bent or jammed. In very extreme cases, it may be necessary to remove the SDI cables from the rear of the drive before inserting the ECM.

6.10 Removing the Preamp Control Module
It is not necessary to remove the HDA in order to remove the preamp control module (PCM). Refer to Figure 6-9 while performing this procedure. Ensure proper grounding before beginning PCM removal.

Figure 6-9 PCM Removal (callouts: PCM-to-HDA connector, switch pack, PCM)

1. Remove power from the drive.
2. Remove the OCP (refer to Section 6.7).
3. Remove the blower motor assembly (refer to Section 6.8).
4. Remove the ribbon cable from the PCM.
5. Remove the Phillips-head screws securing the PCM to the HDA.
6. Note the orientation of the PCM-to-HDA connector. Place your fingers on the sides and near the PCM-to-HDA connector. Use steady, firm pressure to dismount the PCM from the HDA.

Reverse this procedure to replace the PCM. Ensure proper alignment between the HDA and PCM-to-HDA connectors. (Check for bent pins prior to reinstalling.)

6.11 Removing/Replacing the Head Disk Assembly
This section documents the procedures for removing and replacing the HDA. Use extreme care during HDA removal/replacement procedures to prevent damage to the HDA. As with all static-sensitive components, ensure proper grounding when handling. Place components on a grounded, anti-static work surface. Prior to installation, a replacement HDA must be thermally stabilized.

WARNING
The thermal stabilization procedure is mandatory. Failure to thermally stabilize this equipment could cause premature equipment failure.

6.11.1 Removing the HDA
Run tests T43 and T44 before replacing the HDA, to capture seek and spinup information.
Record this information on the red tag when returning the HDA. Run tests T53 and T54 to clear stored parameters from the old HDA.

WARNING
An HDA weighs 15 kilograms (33 pounds). Use both hands during this procedure. The positioner/head assembly must never be rotated in a counterclockwise direction. Damage to the media and heads could occur.

Place the HDA on a grounded, anti-static work surface after it has been removed. Use proper grounding techniques when working with drive components.

To remove the HDA (refer to Figure 6-10):
1. Remove power from the drive.
2. Remove the OCP (refer to Section 6.7).
3. Remove the blower motor assembly (refer to Section 6.8).
4. Remove the ribbon cable from the HDA.
5. Locate the baseplate latch assembly.
6. To unlock the HDA from the drive chassis, grasp the baseplate latch assembly and pull up and turn until the lock is in its top position.
7. Grasp the HDA carrier handle and pull the HDA toward the front of the drive.
8. Place one hand under the HDA as you remove it from the drive chassis.
9. If resistance is encountered, attempt to carefully reinsert the HDA and try this procedure again. It may be necessary to apply a small amount of back pressure before the HDA can be removed from the chassis.
10. Place the HDA on a grounded, anti-static work surface.

Figure 6-10 HDA Removal (callouts: RA90/RA92 disk drive chassis, baseplate latch assembly)

6.11.2 HDA Thermal Stabilization Procedure
The replacement HDA must be thermally stabilized before its moisture barrier bag is opened. Prior to installation, a replacement HDA must be stored at a temperature of 16°C (60°F) or higher for a minimum of 24 hours. The HDA may be stored in the computer room or in another storage room under controlled temperature conditions. If stored in another storage room, the HDA must sit for an additional hour in the computer room in which it will be installed.
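The stabilization rule above can be expressed as a short checklist function. This is a minimal sketch for illustration only: the record type, field names, and function name are hypothetical, and the only values taken from the text are the 16°C minimum, the 24-hour minimum, and the additional hour in the computer room when the HDA was stored elsewhere.

```python
from dataclasses import dataclass

# Values taken from the thermal stabilization text.
MIN_STORAGE_TEMP_C = 16.0           # 16 degrees C (60 degrees F) or higher
MIN_STORAGE_HOURS = 24.0            # minimum of 24 hours in controlled storage
EXTRA_COMPUTER_ROOM_HOURS = 1.0     # additional hour if stored in another room


@dataclass
class HdaStorageRecord:
    """Hypothetical record of how a replacement HDA was stored."""
    lowest_temp_c: float             # lowest temperature seen during storage
    hours_stored: float              # hours in temperature-controlled storage
    stored_in_computer_room: bool    # True if stored in the installation room
    hours_in_computer_room: float = 0.0  # time in computer room after a move


def is_thermally_stabilized(rec: HdaStorageRecord) -> bool:
    """Return True if the record meets the stabilization criteria."""
    if rec.lowest_temp_c < MIN_STORAGE_TEMP_C:
        return False
    if rec.hours_stored < MIN_STORAGE_HOURS:
        return False
    if (not rec.stored_in_computer_room
            and rec.hours_in_computer_room < EXTRA_COMPUTER_ROOM_HOURS):
        return False
    return True


if __name__ == "__main__":
    example = HdaStorageRecord(18.0, 24.0, False, hours_in_computer_room=0.5)
    print(is_thermally_stabilized(example))
```

In the usage example, the HDA was stored warm enough and long enough in another room but has spent only half an hour in the computer room, so under the stated rule the moisture barrier bag should not be opened yet.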
CAUTION
Under no circumstances should the HDA be left overnight in an uncontrolled temperature environment where cold temperatures could occur (for example, in a car) and then opened/installed without a 24-hour thermal stabilization period.

6.11.3 Replacing the HDA
After the thermal stabilization criteria have been met, open the HDA box and carefully cut the heat-sealed end of the moisture barrier bag. Remove the desiccant from the moisture barrier bag and the HDA from the foam bag. Save all HDA packing material to repackage the failing HDA.

Use the following procedure to install the replacement HDA:
• Slide the HDA into the chassis until the spring-loaded latch locks into place.

WARNING
When reinserting the HDA into the drive chassis, take care not to pinch your fingers. There is limited clearance between the HDA handle and chassis edges.

• Turn the baseplate latch assembly until the latch drops into place and the HDA is secure. To ensure the HDA is secure, try sliding the drive in and out of the chassis.
• Reconnect the ECM-to-PCM ribbon cable.
• Run tests T53 and T54 to clear stored (replaced) HDA-related information.

Return the defective HDA in the replacement HDA's shipping package. Place desiccant inside the moisture barrier bag before folding and sealing the package. Tape the red tag to the outside of the sealed HDA package.

6.11.4 Separating the HDA and Carrier
A number of repairs require separating the HDA and carrier. Use the following procedure to accomplish this:
1. Remove the HDA from the chassis and set it carrier-side up on a grounded, anti-static work surface (Section 6.11.1).
2. Locate the rear HDA connector and remove the retaining C clips shown in Figure 6-11.

NOTE
Remove the C clips by pressing against the spring-loaded rear HDA connector and, at the same time, using a small, flat-bladed screwdriver or small needlenose pliers to loosen and remove the clips.

3.
Remove the rear HDA connector.
4. Use a Phillips screwdriver to remove the two screws securing the HDA carrier to the damper bracket assembly.
5. Completely loosen but do not remove the four Torx-head screws with a Torx T-15 screwdriver. Refer to Figure 6-11 for the location of the Torx-head screws.

Reverse this procedure to reassemble the HDA and the carrier.

Figure 6-11 HDA Carrier Separation (callouts: damper bracket assembly)

6.11.5 Removing the Spindle Ground Brush
This section documents the procedure for removing and replacing the RA90/RA92 spindle ground brush. Because handling the HDA is necessary, extreme caution must be used. Refer to Figure 6-12 during this procedure.

Figure 6-12 Spindle Ground Brush Removal (callouts: hex-head screws, spindle ground brush cover)

1. Remove power from the drive.
2. Remove the OCP (refer to Section 6.7).
3. Remove the blower motor assembly (refer to Section 6.8).
4. Disconnect the ribbon cable from the PCM.
5. Remove the HDA from the chassis (Section 6.11.1) and set it on a grounded, anti-static work surface, carrier-side up.
6. Locate the rear HDA connector and remove the retaining C clips shown in Figure 6-11.

NOTE
Remove the C clips by pressing against the spring-loaded rear HDA connector and, at the same time, using a small, flat-bladed screwdriver or small needlenose pliers to loosen and remove the clips.

7. Remove the rear HDA connector.
8. Use a Phillips screwdriver to remove the two screws securing the HDA carrier to the damper bracket assembly (refer to Figure 6-11).
9. Loosen the four Torx-head screws with a Torx T-15 screwdriver. Refer to Figure 6-11 for the location of the Torx-head screws.
10. Separate the HDA and carrier (refer to Section 6.11.4).
11.
Locate and remove the spindle ground brush cover shown in Figure 6-12.
12. Locate and remove the spindle ground brush by removing the two hex-head screws that hold it in place.

Replace the ground brush, then reassemble the HDA and drive assemblies.

6.11.6 Removing the Brake Assembly
This section documents the procedures for removing and replacing the RA90/RA92 brake assembly. Because handling the HDA is necessary, extreme caution must be used. You will need a contact extraction tool (Digital Part Number 29-26655-00) to perform this procedure. Refer to Figures 6-12, 6-13, and 6-14 while performing this procedure.

CAUTION
Never rotate the actuator or positioner shaft counterclockwise. HDA damage could occur.

1. Remove power from the drive.
2. Remove the OCP (refer to Section 6.7).
3. Remove the blower motor assembly (refer to Section 6.8).
4. Disconnect the ribbon cable from the PCM.
5. Remove the HDA from the chassis (Section 6.11.1) and set it on a grounded, anti-static work surface, carrier side up.
6. Locate the rear HDA connector and remove the retaining C clips shown in Figure 6-11.

NOTE
Remove the C clips by pressing against the spring-loaded rear HDA connector and, at the same time, using a small, flat-bladed screwdriver or small needlenose pliers to loosen and remove the clips.

Figure 6-13 Contact Extraction Tool (callouts: rear HDA connector, handle, contact cavity, lance release tip)

7. Remove the rear HDA connector.
8. Use a Phillips screwdriver to remove the two screws securing the HDA carrier to the damper bracket assembly (refer to Figure 6-11).
9. Loosen the four Torx-head screws with a Torx T-15 screwdriver. Refer to Figure 6-11 for the location of the Torx-head screws.
10. Separate the HDA and carrier (refer to Section 6.11.4).
11. Locate and trace the brake electrical contacts to the rear HDA connector.
12.
Extract the brake electrical contacts (contacts 4 and 5) from the rear HDA connector using the contact extraction tool from the kit.
13. Align the contact extraction tool with the front of the connector. Align the lance release tip with the lance release slot, making sure to align the tip with the contact cavity. Refer to Figure 6-13.

Figure 6-14 RA90/RA92 Brake Assembly Removal/Replacement (callouts: brake hold-down screws, brake hub)

14. Push the lance release tip in until the locking lance (metal tip inside contact pin) is released from the slot.
15. Hold the connector firm and push the handle of the contact extraction tool forward. The contact should back out of the rear of the connector.
16. Remove the contact extraction tool and pull the brake contact from the back of the connector.
17. Locate and remove the spindle ground brush cover (refer to Figure 6-12).
18. Locate and remove the spindle ground brush.
19. Use a 5/32 Allen wrench to remove brake hold-down screws (refer to Figure 6-14).
20. Note the hex shape of the spindle and matching hex shape of the brake hub.
21. Orient the brake hub to the spindle and fit them together. Do not rotate spindle counterclockwise.
22. Secure the brake to the baseplate with the brake hold-down screws. Refer to Figure 6-14.
23. Replace the spindle ground brush.
24. Reinstall the spindle ground brush cover.
25. Insert brake electrical contacts into slots 5 and 6 in the connector. (Ensure a secure fit by tugging on leads.)
26. Reassemble the HDA to the HDA carrier.
27. Attach the rear HDA connector and C clips.
28. Reassemble the drive.
29. Install the HDA into the drive chassis.

6.11.7 Spindle Lock Solenoid Failure
This section covers solenoid failures. The solenoid is not a replaceable FRU; however, its failure prevents the heads from loading and data from being recovered.
To preclude the loss of data because of a solenoid failure, this procedure allows you to bypass the solenoid long enough to recover the data and back it up onto another disk drive or tape unit.

CAUTION
Attempt this procedure only under the worst possible situations; that is, if customer backup data is not current or work in progress must be recovered. After performing this procedure and recovering the data, replace the HDA according to Section 6.11.3.

Refer to Figure 6-15 while performing this procedure.
1. Remove power from the drive.
2. Remove the OCP (refer to Section 6.7).
3. Remove the blower motor assembly (refer to Section 6.8).
4. Remove the HDA from the chassis (Section 6.11.1) and set it on a grounded, anti-static work surface, carrier side up.
5. Locate the rear HDA connector and remove the retaining C clips shown in Figure 6-11.

NOTE
Remove the C clips by pressing against the spring-loaded rear HDA connector and, at the same time, using a small, flat-bladed screwdriver or small needlenose pliers to loosen and remove the clips.

6. Remove the rear HDA connector.
7. Use a Phillips screwdriver to remove the two screws securing the HDA carrier to the damper bracket assembly.
8. Loosen the four Torx-head screws with a Torx T-15 screwdriver. Refer to Figure 6-11 for the location of the Torx-head screws.
9. Separate the HDA and carrier (refer to Section 6.11.4).
10. Locate the solenoid (refer to Figure 6-15).

Figure 6-15 Disabling the Solenoid for In-Field Data Recovery (callouts: tape contacts, solenoid armature, solenoid hold-down screws)

11. Disconnect the electrical leads from the solenoid and place electrical tape over the lead contacts to prevent shorting.
12. Loosen and remove the positioner lock solenoid hold-down screws with a T-15 Torx wrench.
13. Remove the solenoid and set it aside.
14.
Reinstall the solenoid hold-down screws to the baseplate and tighten slightly.
15. Loop a piece of 20-gauge wire (or equivalent) approximately 6 inches long through the solenoid armature as shown in Figure 6-15.
16. Secure one end of the wire around one of the solenoid hold-down screws and tighten the screw securely onto the wire.
17. After looping the wire through the solenoid armature, gently pull the solenoid plunger away from the positioner/actuator assembly until it stops (approximately a quarter inch).
18. Loop the loose end of wire around the second hold-down screw and tighten the screw securely onto the looped wire.

CAUTION
Ensure both sides of the wire are secure and that the solenoid plunger is held back. The aim of this procedure is to recover customer data. If the solenoid plunger slips back, it will cause the solenoid armature to allow the positioner/actuator assembly to lock. Data recovery will then be unsuccessful.

Reassemble the HDA, carrier, and drive. After data has been recovered, replace the HDA according to the HDA replacement procedure in Section 6.11.3. When returning the old HDA from the field, also return the failed solenoid.

6.12 Removing the Power Supply
This section documents the procedures for removing and replacing the RA90/RA92 power supply. Ensure you have removed power from the correct drive. Proceed with caution whenever working with high voltages. Refer to Figure 6-16 while performing this procedure.

WARNING
When removing and replacing drive components, take care not to pinch your fingers. There is limited clearance between the HDA handle and chassis edges.

1. Spin down the drive.
2. Turn off the drive circuit breaker to remove power from the drive.
3. Note port cable connector locations when removing the power supply.
4. Remove the power cord from the rear of the drive.
5. Remove other cables that may interfere with the power supply removal.
6.
Loosen the bottom two quarter-turn fasteners by turning in a counterclockwise direction.
7. Support the bottom of the power supply with one hand.
8. Loosen the top two quarter-turn fasteners by turning in a counterclockwise direction.
9. Remove the power supply.

Figure 6-16 Power Supply Removal (callouts: RA90/RA92 disk drive rear, quarter-turn fasteners)

CAUTION
The power supply weighs approximately 6.8 kilograms (15 pounds). It must be supported when being removed from the drive.

Reverse this process to replace the power supply. Check the line voltage selector switch to ensure you have the correct voltage for your area.

6.13 Removing/Replacing the Rear Flex Cable Assembly
This section documents the procedures for removing and replacing the RA90 and RA92 rear flex cable assembly. To facilitate the removal of the rear flex cable assembly, first remove the drive HDA, power supply, and ECM. After these drive components have been removed, remove the drive chassis from the cabinet and place it on a grounded, anti-static work surface.

To remove the rear flex cable assembly (refer to Figure 6-17):
1. Loosen the four Allen screws holding the rear panel assembly to the drive chassis.
2. Remove the 15 contact springs. Set the contact springs aside.
3. Remove the four Allen screws and set the rear panel assembly aside. (Set aside the drive serial number label bracket.)

Figure 6-17 Rear Flex Cable Assembly Removal (callouts: black female ECM connector, drive S/N label bracket, adhesive-backed cable clamp (remove), helical-split washer (4), green male rear connector, drive hardware revision switch pack)

The next step requires the removal of the rear flex cable assembly. There are a number of adhesive-backed cable clamps used to secure the rear flex cable assembly in place.
The cable clamps all open toward the rear of the drive with one exception; locate and remove this "one" cable clamp to facilitate removal of the rear flex cable assembly. (See Figure 6-17 for the location of this clamp.)
4. Remove the two Allen-head screws that secure the green male rear connector to its bracket.
5. Remove the two C clips that secure the black ECM female connector to its bracket.
6. Remove the rear flex cable assembly.

The next step requires the replacement of the rear flex cable assembly. Lay the replacement rear flex cable assembly out next to the one being replaced. Set the dip switches on the new rear flex cable assembly to the exact settings from the replaced one. By hand, bend the rear flex cable assembly 90 degrees in the same places as the original assembly.

NOTE
Future flex cable assemblies may use dip shunt switch packs rather than dip switch packs. A shunt open = switch open or off.

7. Place the rear flex cable assembly on the rear panel assembly with the two connectors on their proper brackets.
8. Secure the green male rear connector to its bracket with the two (previously removed) Allen-head screws.
9. Secure the black female ECM connector to its bracket with the two (previously removed) C clips.
10. Replace the previously removed adhesive-backed cable clamp.
11. Loosely attach the rear panel assembly to the rear of the drive chassis.
12. Replace the 15 contact springs.
13. Secure the rear panel assembly by tightening the Allen screws.
14. Return the drive chassis to the cabinet.
15. Return the drive components to the drive chassis.

6.14 Media Removal Service for Customers
The on-site media removal and disposal service is an exclusive Digital Customer Services offering. The following tools are needed to remove drive media from the HDA. Digital part numbers for these tools are listed in Table 6-1:
1. 1/16 Allen wrench
2. 3/32 Allen wrench
3. 5/32 Allen wrench
4.
3/16 Allen wrench
5. Torx size T-15 wrench
6. Torx size T-15 socket wrench
7. Pliers
8. Diagonal cut pliers
9. Needlenose pliers
10. Medium Phillips screwdriver
11. Flat-bladed screwdriver

Table 6-1 Digital Part Numbers for Recommended Tools

Technical Description                       Part Number
Ballpoint hex screwdriver blade, 1/16"      29-26111-00
Ballpoint hex screwdriver blade, 3/32"      29-26113-00
Ballpoint hex screwdriver blade, 5/32"      29-26117-00
Ballpoint hex screwdriver blade, 3/16"      29-26118-00
Pliers, diagonal cutters, 4"                29-19328-00
Pliers, long needlenose                     29-13461-00
Socket, Torx T-15                           29-27275-01
Screwdriver, Torx T-15                      29-22772-00
Screwdriver blade, Phillips #1              29-11001-00
Screwdriver blade, slotted, 3/16"           29-1098S-00
Screwdriver blade, Torx T-15                29-22772-00
Screwdriver blade, Torx T-10                29-26947-01

To remove the media from the HDA (refer to Figures 6-18 and 6-19):
1. Remove the PCM from the HDA and store it in an ESD bag for return to Customer Services Logistics. Use proper ESD procedures.
2. Remove the four Torx-head screws, or three Torx-head screws and one medium Phillips-head screw, that secure the PCM plug to the HDA chassis.
3. Remove the HDA from the drive chassis (refer to Section 6.11.1).
4. Separate the HDA and carrier (refer to Section 6.11.4).
5. Use a Phillips screwdriver to remove the actuator counterweight located at the end of the positioner shaft.
6. Use a 3/8-inch open-end wrench or a pair of medium-sized needlenose pliers to hold the 3/8-inch nut on the positioner motor assembly located near the center of the shaft. This is a locking nut for an expander bolt holding the positioner coil assembly to the positioner shaft. Hold the nut and, at the same time, loosen the 3/32 Allen screw with a 3/32 Allen wrench. Turn counterclockwise until the 3/32 Allen screw, the 3/8-inch nut, and expander bolt assembly can be removed.
7.
Use a medium-sized Phillips screwdriver to remove the three retaining screws holding the positioner motor assembly to the HDA baseplate.
8. Cut the flex leads from the positioner motor to the HDA electrical socket with diagonal cutters.
9. Firmly grasp the positioner motor assembly at the end of the positioner shaft and lift up. If you have difficulty sliding the positioner motor assembly off the end of the positioner shaft:
   • Loosen the four crash stop Allen screws using a 5/32 and 1/16 Allen wrench. Turn the screws counterclockwise.
   • Reattempt to remove the positioner motor assembly from the positioner shaft.

Figure 6-18 HDA Media Removal - Top View (CXO-2991A; callouts: top cover Torx-head screw (13), HDA top cover, male Torx-head screw (6), air filter assembly, top clamp ring, heads, positioner/head assembly screws (to secure positioner/head assembly))

Figure 6-19 HDA Media Removal - Bottom View (CXO-2992A; callouts: 3/32 Allen screw, retaining screw (3), positioner motor assembly, expander bolt, crash-stop Allen screw (4), positioner shaft, air foil screws, PCM plug Torx-head or Phillips-head screw, positioner collar)

10. Use a flat-bladed screwdriver to detach the spring clip that secures the positioner lock pin into the positioner collar.
11. Remove the solenoid armature that connects the lock pin to the solenoid from the lock pin. Using a pair of needlenose pliers, remove the lock pin from the positioner shaft.
12. Use a Torx T-15 wrench to remove the three screws used to secure the positioner/head assembly to the baseplate.
13. Use a Torx T-15 wrench to remove the two (in some cases three) screws that secure the internal airfoil. All Torx screws should now be removed from the bottom of the baseplate.
14.
Turn the HDA over to access the top cover Torx-head screws.
15. Use a Torx T-15 head wrench to remove the 13 top cover Torx-head screws (refer to Figure 6-18).
16. Remove the top cover of the HDA.
17. Remove the internal air filter assembly from the HDA.
18. Remove the HDA filter fence from the HDA assembly. It may be necessary to rotate the positioner/head assembly so the heads are toward the inner guardband area of the media.
19. Push the loose PCM plug out of the chassis and maneuver the PCM plug and its attached cable assembly so the positioner/head assembly can be removed from the chassis.
20. Rotate the positioner out of the way as you manually unload the heads from the media.
21. Lift the entire positioner/head assembly out of the HDA chassis.
22. Use a Torx T-15 internal socket wrench to remove the six (6) male Torx-head screws securing the top clamp ring on the media stack, and lift clamp rings, media, and spacer rings from the spindle hub.
23. Give the media to the customer.
24. Collect all loose pieces of hardware and remove them from the site. Return hardware to Customer Services Logistics for proper disposal.

7 Microcode Update Procedure

7.1 Introduction

This chapter describes the procedure for updating RA90/RA92 disk drive microcode when a new version of the microcode is released.

7.2 Microcode Update Cartridge Description

The microcode update cartridge is a ROM assembly that contains updated microcode for the RA90/RA92 disk drive microprocessor. Figure 7-1 shows the microcode update cartridge. To update the RA90/RA92 microcode, insert the cartridge in the microcode update port and run T40.

Figure 7-1 Microcode Update Cartridge (CXO-2164A)

7.3 Microcode Update Port Description

The microcode update port is a cutout in the operator control panel (OCP). It is located below and to the left of the Run switch.
Figure 7-2 shows the location of the RA90/RA92 microcode update port. To access the microcode update port, it is necessary to remove the cabinet front access panel.

Figure 7-2 Microcode Update Port (CXO-2165C; callouts: microcode update port, microcode update cartridge)

7.4 Running Test 40 (T40)

T40 is a microcode subroutine used to load the new microcode from the microcode update cartridge into the master processor. The new microcode may be intended as a servo microcode update, a diagnostic update, or a functional microcode update. During update, the new microcode is downloaded to its destination EEPROM in three separate passes. Each pass takes approximately 20 seconds. The pass count is displayed in the OCP alphanumeric display during the update procedure.

Pass one reads the cartridge, calculates and verifies the checksum in the cartridge, and verifies the microcode consistency codes. If pass one fails, the update is aborted and an appropriate error code is generated.

Pass two writes the even pages in EEPROM (16 bytes). An even page is defined as BIT04 of the EEPROM address equal to zero.

Pass three writes the odd pages in EEPROM. An odd page is defined as BIT04 of the EEPROM address equal to one.

After the microcode is fully loaded (indicated by [C 40]), the drive performs a reset and goes through its normal power-up sequence of internal diagnostics. The OCP performs a reset, returns the drive to its normal operating state, and displays the unit address.

7.5 Updating the Microcode

Remove the cabinet front access panel before beginning the microcode update procedure. Refer to Section 6.6 for the front access panel removal procedure. Use the following procedure when updating drive microcode:
1. Load the microcode update cartridge in the microcode update port.
2. Load test T40 (drive must be spun down).
3. Start test T40.

The following occurs in the OCP display (where S = start, P = pass, C = completed):
1.
[S 40] (2 seconds).
2. [P 1] (20 seconds) Pass one checks PROM to be loaded.
3. [P 2] (20 seconds) Pass two writes the new code into the even pages in EEPROM.
4. [P 3] (20 seconds) Pass three writes the new code into the odd pages in EEPROM.
5. [C 40] (1 second) Update is complete.
6. [WAIT] (10 seconds) Exits test mode and goes through power-up hardcore sequence.
7. [0000] Returns to display the drive unit address.

Remove the microcode update cartridge from the OCP and replace the cabinet front access panel. Select the appropriate port switches to return the drive to the available state.

7.5.1 Error Codes/Common Problems During Microcode Update

The most common problems encountered during a microcode update are listed by error code in Table 7-1.

Table 7-1 Common Error Codes/Problems During Microcode Update

Error Code: BD
Reason: The microcode cartridge was not detected.
Solution: Reseat the microcode update cartridge.

Error Code: BC
Reason: The cartridge checksum was incorrect.
Solution: Reseat the cartridge and retry the update. If it still fails, either replace the OCP or try the cartridge in another drive. Acquire a new microcode cartridge if necessary.

Error Code: BE
Reason: Cartridge and EPROM consistency check failed.
Solution: Reseat the cartridge and try again. If the same error occurs, replace the cartridge with one containing compatible code.

Error Code: FD
Reason: An EEPROM checksum error occurred.
Solution: Attempt to reload the cartridge code. If the failure occurs again, electronic control module (ECM) replacement may be necessary.

A Capturing Information for LARS and CHAMPS

This appendix contains sample LARS for installation and general troubleshooting of field replaceable units (FRUs) in the field.

[Figure: sample LARS forms, sheets 1 through 3. Each line item has the columns LINE, ACT, REPAIR TIME, DEC OPTION, VAR, DEC OPT. S/N, TYP CAL, ACT TAK, FAIL AREA - MODULE - FCO - COMMENTS, and AUTHORIZED TESTS. Sheet 1 shows installation entries and sample line items for ECM replacement; sheet 2 covers HDA, PCM, and blower replacement; sheet 3 covers power supply replacement, OCP replacement, miscellaneous parts, and SMOO repairs. The ECM, HDA, PCM, blower, and power supply samples each include a variant with a VAXsimPLUS theory code, where XX and YY are VAXsimPLUS-supplied numbers.]

B RA90/RA92 Error Recovery Levels

RA90 and RA92 disk drives incorporate hardware error recovery as part of the RA90/RA92 circuitry. Read data circuitry is altered any time the controller issues error recovery commands. Generally, error recovery is used to assist the controller during unrecoverable or uncorrectable errors. The intent is to enhance the controller/disk interaction to recover data that might otherwise be lost. The RA90/RA92 hardware recovery circuitry is divided into six functional areas, as shown in Table B-1.

Table B-1 RA90/RA92 Hardware Error Recovery Circuits

Circuit, followed by Description

READ THRESHOLD GAIN
There are two ways to increase the chances of reading data from a potentially bad spot on a disk: increase read threshold or decrease read threshold. The drive determines whether information coming off the disk is either too weak or too strong and consequently increases or decreases the read circuitry amplitude in an attempt to recover information.
HOLD-OVER ONE-SHOT
VCO control voltage is held stable to prevent large phase errors during a momentary loss of read pulses from the disk.

SKEW READ GATE
A delay of one or two byte times is introduced between the moment the SDI gate array (on the I/O-R/W module) receives the READ GATE signal from the SDI controller and the time the I/O-R/W module acts upon the READ GATE signal. The amount of delay (skew) changes for each revolution of the disk when the index pulse is received. The skew time is one byte time for odd revolutions of the disk and two byte times for even revolutions of the disk.

FAST LOCK DELAY
Fast lock delay is accomplished by the R/W ENDEC chip. The drive software enables fast lock delay through Misc. I/O Port 0 (bit <4>) with a 2.24-microsecond delay in addition to the delayed gate signal.

OFFSET OF HEADS
Positive and negative offsets can be applied to the servo circuitry during attempted reads. Six combinations of offsets are utilized in the RA90. These include plus or minus offsets of 5%, 10%, 12.4%, or 20% of the track width.

WRITE DIAGNOSTICS
Thin-film heads can sometimes take on the characteristics of the magnetic media. The buildup of this magnetic field in the heads interferes with the drive's ability to read the surface of the disk. Running write current through the heads usually breaks up the magnetic alignment of the thin-film heads' substrata layers. This level of error recovery writes internal diagnostics within the dedicated inner guardband to eliminate this problem. With normal drive operations, this should rarely be a problem.

The RA90/RA92 error recovery circuits are activated when the SDI controller issues an SDI ERROR RECOVERY command to the drive. This occurs after the controller has exhausted its read retry count (five for the RA90/RA92). An error recovery level is specified by the controller in the SDI ERROR RECOVERY command.
The level number specifies which combination of error recovery circuits the drive is to employ. There is no controller intervention in the actual drive error recovery process. RA90 and RA92 disk drives employ 14 levels of error recovery, as shown in Table B-2.

Table B-2 RA90/RA92 Error Recovery Levels

Level  Description
14     Offset of heads by dedicated servo to +5% (offset is towards outer guardband).
13     Offset of heads by dedicated servo to -5% (offset is towards inner guardband).
12     Offset of heads by dedicated servo to +10%.
11     Offset of heads by dedicated servo to -10%.
10     Offset of heads by dedicated servo to +12.4%.
9      Offset of heads by dedicated servo to -12.4%.
8      Offset of heads by dedicated servo to +20%.
7      Offset of heads by dedicated servo to -20%.
6      Enable hold-over one shot.
5      Fast lock delay level.
4      Turn on low threshold.
3      Turn on high threshold.
2      Turn on read gate delay.
1      Diagnostic writes (to clear head domain cluttering).
0      NOP: This is the normal default state of the drive. No error recovery circuits are activated.

The drive supplies the controller with the number of error recovery levels it has at its command. This is done by the drive in response to a GET COMMON CHARACTERISTICS command from the controller.

The actual mechanism is transparent to the user, but works as follows:

During a read data operation, the controller reads a block of data from the disk. If there are no ECC errors, data is passed to the host operating system. However, if the controller detects an ECC error, it compares the number of ECC symbols in error to the drive's ECC error symbol threshold. The RA90/RA92 disk drive has an error symbol threshold of six. As long as the error symbol threshold has not been reached, the controller can correct the data. If the error symbol threshold is equaled or exceeded, the drive then sends an error to the host error log and sets the BBR (bad block replacement) flag. The BBR process is actually implemented at a later time.
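As an illustration only, Table B-2 and the error symbol threshold check described above can be captured in a few lines of Python. The names ERROR_RECOVERY_LEVELS, ECC_SYMBOL_THRESHOLD, threshold_reached, and head_offset_levels are hypothetical and were chosen for this sketch; they do not come from the drive or controller microcode.

```python
# Illustrative sketch only: all names here are assumptions made for this
# example, not identifiers from RA90/RA92 or controller microcode.

# Table B-2, keyed by error recovery level number.
ERROR_RECOVERY_LEVELS = {
    14: "Offset of heads by dedicated servo to +5%",
    13: "Offset of heads by dedicated servo to -5%",
    12: "Offset of heads by dedicated servo to +10%",
    11: "Offset of heads by dedicated servo to -10%",
    10: "Offset of heads by dedicated servo to +12.4%",
    9: "Offset of heads by dedicated servo to -12.4%",
    8: "Offset of heads by dedicated servo to +20%",
    7: "Offset of heads by dedicated servo to -20%",
    6: "Enable hold-over one shot",
    5: "Fast lock delay level",
    4: "Turn on low threshold",
    3: "Turn on high threshold",
    2: "Turn on read gate delay",
    1: "Diagnostic writes",
    0: "NOP (normal default state)",
}

# Error symbol threshold stated for the RA90/RA92 in this appendix.
ECC_SYMBOL_THRESHOLD = 6


def threshold_reached(symbols_in_error: int) -> bool:
    """True when the symbol count equals or exceeds the threshold,
    the case in which an error is logged and the BBR flag is set."""
    return symbols_in_error >= ECC_SYMBOL_THRESHOLD


def head_offset_levels() -> list[int]:
    """Return the level numbers that apply a head offset, lowest first."""
    return sorted(
        level
        for level, text in ERROR_RECOVERY_LEVELS.items()
        if text.startswith("Offset of heads")
    )


if __name__ == "__main__":
    for count in (0, 5, 6, 9):
        print(count, "symbols in error -> threshold reached:", threshold_reached(count))
    print("Head offset levels:", head_offset_levels())
```

The table is held as a plain dictionary so that a level number received in an ERROR RECOVERY command could be looked up directly; level 0 is included because the manual lists it as the default state even though it activates no recovery circuit.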
The controller then determines if it can correct the data. If the data is uncorrectable, the controller examines the drive's common characteristics to determine the drive's read retry count parameter. The RA90/RA92 disk drive has a read retry count of five. If, after exhausting the read retry count on a block of data, the data is still uncorrectable, the controller determines if the drive has error recovery capabilities. The RA90/RA92 disk drive has 14 error recovery levels (see Table B-2). The controller issues an ERROR RECOVERY command to the drive. The drive then initiates the first level of error recovery. In the case of the RA90/RA92, level 14 is used first and the drive decrements down to zero. The RA90/RA92 activates the appropriate hardware circuits corresponding to a level 14 error recovery. The controller repeats the entire read data block process including, if necessary, the read retry process.

If the data has still not been recovered, the controller issues another ERROR RECOVERY command, this time specifying level 13. Again, the drive error recovery process starts and continues until the data has been recovered or all the error recovery levels have been tried. If the read retry operation fails and the error recovery levels fail, the controller returns an error to the host and BBR is implemented on that block of data.

The error recovery mechanism is not restricted to ECC errors encountered during reads. Header-related errors may also cause the hardware error recovery levels to be implemented.

C Customer Equipment Maintenance

This appendix will assist customers in maintaining their equipment to ensure the highest level of equipment performance and reliability. Specifically, this appendix addresses the maintenance of 60-inch storage array cabinet systems.
C.1 Customer Responsibilities

The customer is directly responsible for:
• Supplying accessories, including storage racks, cabinetry, tables and chairs, as required.
• Making the appropriate documentation available in a location convenient to the system.
• Obtaining cleaning supplies specified in this appendix.
• Performing the specific equipment maintenance described in this appendix.

C.1.1 Cleaning Supplies

To properly maintain the equipment, the customer must acquire the following items and supplies:
• Vacuum cleaner with flexible hose and nonmetallic, soft-bristle brush attachment
• Isopropyl alcohol (at least 91%) (Digital P/N 29-19665)
• Lint-free tissues or cloths
• All-purpose spray cleaner

CAUTION
When using spray cleaner, do not spray cleaner directly into computer equipment. This could adversely affect equipment reliability or damage electrical components.

C.1.2 Ongoing Equipment Care

The following should be performed on an ongoing basis:
• Keep the immediate area in front of the storage array cabinets free of obstructions.
• Keep the exterior of the cabinets and the surrounding area clean. Use a lint-free cloth and isopropyl alcohol to remove sticky residue left on painted surfaces by customer cabinet number labels, and so forth.
• Maintain the site temperature/humidity to comply with Digital's recommended environmental range (reference product-specific documentation). This will ensure the highest product reliability and product life goals are achieved.

C.1.3 Monthly Equipment Maintenance

The following tasks should be performed on a monthly basis, or more often if environment warrants:

CAUTION
Avoid touching the operator control panel switches during cleaning operations. The state of the drives could change and affect the operation of the subsystem.

• Vacuum and/or wipe top of storage array cabinet with a lint-free cloth.
• With a soft-bristle brush attachment, vacuum the air vent grill on the front door of the storage array cabinet. Leave the front door assembly attached to the storage array cabinet while vacuuming.

C.1.4 Maintenance Records

Digital suggests the customer keep an accurate log of all equipment maintenance. A maintenance log form for 60-inch storage array cabinets is included in this appendix for customer use. This form may be reproduced and inserted in the customer's site management guide for record-keeping purposes. Refer to Figure C-1.

Figure C-1 Customer Equipment Maintenance Log for Storage Array Cabinets (blank log form; columns include cabinet S/N and type of service performed)

D Customer Services' Preventative Maintenance

The information contained in this appendix will assist Digital Customer Services engineers in performing and planning preventative maintenance (PM) procedures for RA90/RA92 disk drive products.

D.1 PM Checklist for RA90/RA92 Disk Drives

The following preventative maintenance steps should be performed by Digital Customer Services on a scheduled basis at specified intervals. The PM checklist is a per storage element checklist. Due to the frequency of this activity, we suggest that you record this activity on the RA90/RA92 Preventative Maintenance Activity Log provided in this section. This log sheet may be reproduced and inserted in the site management guide, as appropriate.

One-Year Interval

Perform the following PM steps at 1-year intervals:
1. Utilize VAXsimPLUS to obtain the repair history of each disk drive. Examine the drive error profile over various lengths of time to determine whether a proactive repair may be warranted. Examination may include opening up the time window for the last week, last month, and last 3 months.
Deeper examination of error logs may be necessary if there are any error rate trends of concern. (Time: 10:00 minutes for basic error analysis with VAXsimPLUS)
2. Remove the drive(s) from service. (Time: 2:00 minutes)
3. Remove the cabinet front access panel or bezel assembly. Remove and clean each cabinet pre-filter or air vent grill as necessary. (Time: 5:00 minutes)
4. Determine the drive microcode revision levels by examining subsystem printouts or running drive test T45. Update microcode to the latest compatible functional revision as necessary. (Time: 3:00 minutes)
5. From the rear of the cabinet at the I/O bulkhead panel, verify the SDI cables are dressed and routed in an orderly fashion to prevent the cables from being tripped over or stepped on.
6. Verify the SDI connectors are securely attached to the I/O bulkhead panel.
7. Return the drive(s) to service.

The yearly PM steps can be accomplished in approximately 20 minutes per drive. Servicing more than one drive at a time will result in reduced time per drive.

Two-Year Interval

Perform the following PM steps at 2-year intervals:
1. Remove the drive(s) from service. (Time: 2:00 minutes)
2. Remove drive power.
3. Remove the OCP and blower bezel assembly. Visually inspect the drive chassis interior for debris. If considerable dirt/lint is present, remove the electronic control module (ECM) assembly and head disk assembly (HDA), then vacuum the chassis. Reassemble the drive. (Time: 10:00 minutes)
4. Power up the drive and determine whether the blower motor quickly attains its speed and the drive becomes ready. (Time: 2:00 minutes)
5. Execute drive internal test T00 for one pass. (Time: 10:00 minutes)
6. Return the drive(s) to service.

The 2-year interval PM steps can be accomplished in approximately 24 minutes per drive. Servicing more than one drive at a time will result in reduced time per drive.
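The per-drive figures quoted for the 1-year and 2-year checklists can be cross-checked by adding up the "(Time: ...)" notes attached to the individual steps. The following is a minimal sketch of that arithmetic; the dictionary keys and the function name total_minutes are hypothetical labels invented for this example, and steps without a time note in the checklists are left out.

```python
# Illustrative sketch: the step labels and function name are assumptions.
# Minutes are copied from the "(Time: ...)" notes in the checklists above;
# steps that carry no time note in the manual are not included.

ONE_YEAR_STEP_MINUTES = {
    "basic error analysis with VAXsimPLUS": 10,
    "remove drive from service": 2,
    "clean cabinet pre-filter or air vent grill": 5,
    "check and update microcode revision": 3,
}

TWO_YEAR_STEP_MINUTES = {
    "remove drive from service": 2,
    "inspect and vacuum drive chassis": 10,
    "power up and check blower and drive ready": 2,
    "run drive internal test for one pass": 10,
}


def total_minutes(step_minutes: dict[str, int]) -> int:
    """Sum the listed step times for a single drive."""
    return sum(step_minutes.values())


if __name__ == "__main__":
    print("1-year PM:", total_minutes(ONE_YEAR_STEP_MINUTES), "minutes per drive")
    print("2-year PM:", total_minutes(TWO_YEAR_STEP_MINUTES), "minutes per drive")
```

The sums come to 20 and 24 minutes, which agree with the approximate per-drive times stated for the two intervals. Because servicing several drives at once reduces the time per drive, multiplying these totals by the number of drives gives an upper estimate rather than an expected duration.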
Five-Year Interval (for the HDA)

In addition to the 1- and 2-year interval PM steps previously described, perform the following step at 5-year intervals:
1. Remove and replace the spindle ground brush using procedures contained in this manual.

The 5-year interval PM steps should be accomplished within 40 minutes per drive.

RA90/RA92 Preventative Maintenance Activity Log for Each RA90/RA92 Storage Element (blank log form; fields for drive type (circle one: RA90 / RA92), drive serial number, cabinet, and cabinet S/N; columns for date of service, microcode rev level, maintenance activity, ECM rev and S/N, and HDA S/N and rev)

Index

A
Acceptance testing, 2-13
  drive spun down, 2-18
  drive spun up, 2-19
Add-on installation, 2-22
Address selection, 2-20

B
BBR algorithms, 5-48
BBR packet, 5-48
Blower/bezel motor assembly removal, 6-7
Blower motor, dual outlet, 3-12
Brake assembly removal, 6-17

C
Cluster installation note, 2-20
Controller byte, 5-5
Correctable ECC errors, 5-48
Cylinder address bytes, 5-6

D
Data rates, 1-6
Data storage capacity
  RA90 disk drive, 1-1
  RA92 disk drive, 1-1
Deskidding cabinets, 2-5
Diagnostics
  host-level, 5-37
  HSC-based, 5-37
  KDM-based, 5-37
  off-line, 5-37
  power-up, 2-16
  standalone, 5-37
  XDA controller-based, 5-38
Diagnostics and utilities
  Average Seek Timing test (T38), 4-14
  Clear DD Bit utility (T55), 4-21
  Clear Seeks utility (T53), 4-21
  Clear Spinups utility (T54), 4-21
Display Drive Serial Number utility (T47), 4-20 Display Error Log Errors utility (T41), 4-16 Display Seeks utility (T43), 4-17 Display Spinups utility (T44), 4-18 Display Time utility (T24), 4-17 Drive Revision Level utility (T45), 4-18 Drive SIN Bus test (T04), 4-7 Drive-Sensed Temperature Display utility (T29), 4-12 Error Log Checkpoint utility (T50), 4-21 Gray Code (Track Counter) test (T29), 4-12 Guardband test (TSO), 4-12 Hardcore Sequence Test (T18), 4-11 HDA Revision utility (T46), 4-20 Head Select and One Seek test sequence (T24), 4-12 Head Select test (".r06), 4-8 Head Select utility (T63), 4-22 Head Switch Timing test (T39), 4-15 idle loop tests (spun down), 4-2 idle loop tests (spun up), 4-2 Incremental Seek test (T3l), 4-13 individual descriptions, 4-5 Loop-Off utility (T62), 4-22 Loop-On-Error utility (T61), 4-22 Loop-On-Test utility (T6O), 4-21 Master CPU test, 4-5 Master RAM test, 4-5 Master ROM test (T01), 4-6 Master Timer test (T02), 4-6 Minimum Seek Timing test (T36), 4-14 One Seek utility (T64), 4-23 power-up, 4-1 . 
problem OCP displays, 4-3 Random Seek test (T33), 4-13 ReadJWrite Force Fault test (T16), 4-11 Index 1 2 Index Diagnostics and utilities (cont'cL) ReadlWrite Sequence test <T19), 4-11 Read-Only Cylinder Formatter test (T17), 4-11 Read-Only test (T14), 4-9 SDI Loopback test (external) (T09), 4-9 SDI Loopback test (internal) ('1'08), 4-9 SectorlByte Counter test ('1'07), 4-8 Seek Parameter Input utility (T65), 4-23 sequence tests, 4-2 Serial Communications Interface test, 4-6 Servo Data Bus Loopback test (T03), 4-6 Servo RAM test, 4-6 Servo Spinup Sequence test (T20), 4-11 'l8pered Seek test ('1'34), 4-13 'lbUle Seek test (T32), 4-13 'lbtal Drive Sequence test (spinning) (T22), 4-12 'lbtal Drive Sequence test (spun down) (T23), 4-12 'lbtal Servo Sequence test (T21), 4-11 Update Cartridge utility (spun down) (T40), 4-15, 7-3 Variable Average Seek Timing test (T66), 4-25 WritelRead test (T15), 4-9 Documentation related, xiii troubleshooting, 5-1 Drive unit address alternate display mode, 3-21 programming, 3-19 E ECM description, 3-3 JlO-RIW module, 3-3 module types, compatibility, 3-3 removal, 6-10 servo module, 3-5 Electrical specifications, 1-7 Electronic control module SeeECM Electrostatic protection See ESD protection Environmental limits, 1-7 Error byte, 5-4 Error code byte, 5-9 Error codes during acceptance testing, 2-20 Error codes (cont'cL) OCP, 2-18 Error descriptions AO Unable to Clear SDI Array Safety Status Register, 5-93 A1 Unable to Force Encoder Error, 5-93 A2 Unable to Force Multiple Head Select While Reading, 5-93 A3 Unable to Force Write Gate and Write Unsafe, 5-94 A4 Unable to Force Write Current and No Write Gate, 5-94 AS Unable to Force Write Gate and No Write Current, 5-94 A6 Unable to Force Read Gate and Off '!rack Error, 5-94 A7 Unable to Force Write Gate and Off Track Error, 5-94 AS Unable to Force Read and Write Fault While Writing, 5-95 AS Servo FaultIForce Fault Test, 5-95 AB Forced Read and Write Fault While Reading, 5-95 4A. 
Drive Disabled by Controller (DD Bit Set), 5-75 AD UART Overrun or Framing Error, 5-95 5A Embedded Head Gain Calibration, 5-78 7A Embedded OffsetJGain Cah"bration Timeout, 5-84 AE OCP Data Packet Checksum Error, 5-95 AF OCP Start Byte is Not a Sync Character, 5-96 9A Positioner Correeted Event During Data 'lransfer, 5-91 OA SDI Incorrect Command Opcode Parity Error, 5-56 1A SDI Invalid Cylinder Address, 5-63 2A SDI Invalid Subunit Specified, 5-68 SA Servo Processor Inside of Destination '!rack During Settle State, 5-88 33 Attempt to Write Through Bursts, 5-70 SA Unable to Force No-Sync Error, 5-81 SA Write Gate and Write-Protected, 5-72 BO OCP Invalid Response, 5-96 B2 OCP Retransmit Failure, 5-96 B3 OCP Command Unsuccessful, 5-96 B4 OCP Command Timeout, 5-97 Index 3 Error descriptions (cont'cL) B6 Master Processor UART Loopback Test Failure, 5-97 B8 V.iBE..er Processor UART TransmitterlReceiver Error, 5-97 B9 OCP-to-Master Processor Communications Timeout Failure, 5-97 BA OCP NMI Timeout Failure, 5-97 5B Bias Calibration Error, 5-78 BB OCP Processor ROM Checksum Failure, 5-98 BC Cartridge Checksum Failure, 5-98 BD Miaccode Update CArtridge Detection Failure, 5-98 BE Cartridge/EEPROMlMaster Processor Consistency Check, 5-98 BF Error Log Write Compare Error, 5-99 8B Gray Code Error After Settling With Fine Track, 5-88 3B Hard INIT OCCUITed to Drive, 5-72 4B Index Error, 5-75 IB Inner Guardband Error, 5-64 7B Invalid Test While Spindle Running, 5-84 6B BJW WritelRead Test Overall Failure (Three or More Bad Heads), 5-81 2B SDI Invalid Diagnose Memory Region Location, 5-68 OB SDI Invalid Opcode, 5-56 9B Write and Positioner Corrected Event, 5-91 CO HL~WL-re Revision and Microcode Incompatibility, 5-99 Cl Outer Guardband Detected After HEAD LOAD Command, 5-100 C2 Inner Guardband Detected After HEAD LOAD Command, 5-100 C3 Seek to Outer Guardband Failed, 5-100 C4 Seek to Outer Guardband Not Detected, 5-101 C5 HDA and EOM Incompatibility, 5-101 C6 PLO Failure, 5-101 C7 
Seek to Inner Guardband Failed, 5-101 C8 Inner Guardband Not Detected After Seek to Inner Guardband, 5-102 C9 Analog Loop Test Failure, 5-102 CA Media Not Spinning, 5-102 98 Can't Execute Diagnostic/Jumper, 5-91 64 Cannot Clear lID Error Bits, 5-80 67 Cannot Execute Write Test (Read-Only Test Failed or Not Run First), 5-80 CC Servo Processor Recalibrate Failed, 5-102 CD Track Counter (Gray Code), 5-103 CE EEPROM Write Cycle Timeout, 5-103 CF Invalid Data in EEPROM, 5-103 7C Gray Code Match Error After Settling, 5-85 5C Incorrect Diagnostic Index or Sector Pulse, 5-78 1C Outer Guardband Error, 5-64 6C R/W Write/Read Test Partial Failure (One or Two Bad Heads), 5-81 9C Read Gate and Positioner Corrected Event, 5-92 0C SDI Command Length Error (LVL2), 5-57 4C SDI Invalid Write Memory Region Error, 5-75 2C SDI Spindle Not Ready with Seek/Recalibration Command, 5-68 8C Uncalibrated and PLO Error, 5-88 58 Dedicated Head Gain Calibration Error, 5-77 79 Dedicated Servo Calibration Timeout Error, 5-84 7D Embedded Interrupt Timeout, 5-85 9D Error Log Header Corrupted, 5-92 3D HDA Read/Write Interlock Broken, 5-72 65 Diagnostic Index or Sector Not Detected, 5-80 61 Diagnostic Index Sync Timeout Error, 5-79 1D Illegal Servo Fault, 5-64 8D Polarity Error on Velocity Command During a Multi-Track Seek, 5-88 2D Power Supply Over-Temperature, 5-69 42 Drive Not On Line/SEEK Command Issued, 5-73 0D SDI Invalid Command with Drive Error, 5-57 55 DSP Sanity Timeout After Load, 5-77 6D Unable to Force Read Gate and Write Gate Together, 5-82 4D Write Gate and Bad Embedded Servo Information, 5-76 E0 Spindle Rotation Not Detected, 5-103
E1 Spindle Speed Out Of Range, 5-104 E2 A/D or D/A Converter Insane, 5-104 E3 Excessive Positioner Current During Test, 5-104 E4 Open Circuit Detected During Power Amp Toggle Test, 5-104 E5 Overcurrent Detected During Actuator Test, 5-105 E6 Track Counter Clear Failure, 5-105 E7 Illegal Zone Detected, 5-105 E8 Outer Guardband Timeout, 5-106 E9 Gray Code Timeout During the Turnaround State, 5-106 EA Gray Code Timeout During Outer Guardband State, 5-107 EB Sector Pulse Timeout During SyncUp State, 5-107 EC Servo Fault and PLO Fault Bit Set in GASP, 5-107 9E Drive Faulted, Test Cannot Run, 5-93 ED Servo Watchdog Timeout, 5-107 EE Servo Digital Signal Processor Reset, 5-107 EF Head Unload Failed, 5-108 7E Fine Track Lost After Settling, 5-85 22 Electronic Control Module Over-Temperature Error, 5-66 8E Master Processor ROM/EEPROM Consistency Code Mismatch, 5-89 59 Embedded Servo Offset Calibration Error, 5-78 34 ENDEC Encoder Error, 5-70 3E OCP Interlock Broken, 5-73 1E Power-Up After AC Power Loss, 5-64 0E SDI Lvl 1 Invalid Select Group Number, 5-57 2E SDI Spinup Inhibited by Controller Flags, 5-69 6E Unable to Force Write Gate and Write Protect Error, 5-82 F0 Servo Microcode Update Failed, 5-108 F1 Command to Servo Processor Timed Out, 5-108 F3 Servo Spinup Failed, 5-108 F4 Servo Spindown Failed, 5-109 F5 Seek Failed, 5-109 F6 Head Switch Failed, 5-109 F7 RTZ Failed, 5-109 F8 Head Load Failed, 5-109
F9 Diagnostic Command Failed, 5-110 FA Servo Processor Failed Seek to DGN Write Cylinder, 5-110 FB Servo Processor Failed Seek to DGN Read Cylinder, 5-110 FD EEPROM Checksum Error, 5-111 6F Diagnostic Write Attempted While Write-Protected, 5-82 8F EEPROM Checksum Failure, 5-89 9F Error Log Check Point Code, 5-93 4F Invalid Select Group (Level 1 Command) - Not Read/Write Ready, 5-76 44 Format Command and Format Not Enabled, 5-74 2F SDI RUN Command with Run Switch in Stop Position, 5-69 0F SDI Write Enable on a Write-Protected Drive, 5-57 1F Sector Overrun Error, 5-65 7F Servo Settling Timer Expired, 5-85 77 Head Load Timeout Error, 5-84 14 Head Offset Margin Event, 5-62 15 Head Offset Out-of-Band Error, 5-62 54 Head Select Register Loopback Error, 5-77 93 Inner Guardband/Servo Fault: No Interrupt Detected, 5-89 92 Inner Guardband Without a Servo Fault Set, 5-89 49 Invalid Command During TOPOLOGY Command, 5-75 47 Invalid Disconnect Command/TT Bit Error, 5-74 05 Invalid Drive Serial Number Code, 5-55 46 Invalid Hardware Fault, 5-74 48 Invalid Write Memory Byte Counter/Offset Error, 5-75 24 Loss of Fine Track During Data Transfer, 5-66 88 Master Processor EEPROM Write Violation Error, 5-87 85 Master Processor RAM Test Failure, 5-87 87 Master Processor ROM Checksum Failure, 5-87 80 Master Processor ROM Consistency Code Mismatch, 5-86 57 Master Processor Timer Failure, 5-77 11 Microcode Cartridge Load Occurred, 5-58 06 Microcode Fault, 5-55
91 No Interrupt Detected During R/W Force Fault, 5-89 74 Offset Timeout Error, 5-83 60 Read/Write Head Select Failure, 5-79 38 Read Gate and Multiple Head Chips Selected, 5-71 45 Read Gate and Off Track Both Asserted, 5-74 31 Read Gate and Write Gate Both Asserted, 5-70 32 Read or Write While Faulted, 5-70 62 Read Test Overall Read Failure (Three or More Bad Heads), 5-79 63 Read Test Partial Failure (One or Two Bad Heads), 5-79 66 Read Test Servo Failure, 5-80 71 Recalibrate Timeout Error, 5-82 10 SDI Command Length Error (LVL2), 5-58 96 SDI Failure: Port B, 5-90 07 SDI Frame Sequence Error, 5-55 29 SDI Invalid Error Recovery Level Specified, 5-68 19 SDI Invalid Format Request, 5-63 16 SDI Invalid Group Select LVL2, 5-62 40 SDI Invalid Read Memory Region Error, 5-73 94 SDI Loopback Test Failure on Both Ports, 5-90 09 SDI Lvl 1 Framing Error, 5-56 08 SDI Lvl 2 Checksum Error, 5-56 17 SDI Port A Command/Response Timeout, 5-63 18 SDI Port B Command/Response Timeout, 5-63 20 SDI RTCS Parity Error, 5-65 95 SDI Test Failure: Port A, 5-90 21 SDI Transfer (Pulse) Error, 5-65 51 Sector/Byte Counter Error, 5-76 89 Seek Speed Out of Range, 5-87 50 Servo Data Bus Failure, 5-76 25 Servo Fault Error, 5-66 27 Servo Over-Temperature Error at S1, 5-67 28 Servo Over-Temperature Error at S2, 5-67 78 Servo Processor Bias Force Calibration Timeout, 5-84 82 Servo Processor Coarse Velocity State Timeout, 5-86 83 Servo Processor Fine Velocity State Timeout, 5-86 73 Servo Processor Head Switch Timeout, 5-83
53 Servo Processor Offset Error, 0-1 1 76 Servo Processor Sanity Timeout, 5-83 84 Servo Processor Seek Direction Error, 5-87 72 Servo Processor Seek Timeout, 5-83 81 Servo Processor Settle State Timeout, 5-86 70 Servo Processor Spinup Timeout, 5-82 75 Servo Processor Unload Timeout, 5-83 56 Servo RAM Test Failure (High Byte of Address), 5-77 52 Servo RAM Test Failure (Low Byte of Address), 5-76 13 Spindle Motor Control Fault, 5-59 01 Spindle Motor Transducer Timeout, 5-54 01. Spindle Motor Transducer Timeout 8, 5-53 03 Spindle Not Accelerating During Spinup, 5-54 26 Spindle Speed Error (Servo Processor), 5-67 12 Spindle Speed Unsafe Error, 5-58 04 Spinup Too Long to Lock on Speed, 5-54 02 Spinup Too Slow, 5-54 86 Static RAM Failure, 5-87 43 TCR and Not Read/Write Ready Fault, 5-74 68 This Diagnostic Cannot Execute Without Software Jumper, 5-81 69 Unable to Force Compare Error, 5-81 90 Unable to Force Index Error, 5-89 36 Write and Servo Uncalibrated, 5-71 35 Write and Write Unsafe, 5-71 30 Write Current and No Write Gate, 5-69 37 Write Gate and No Write Current, 5-71 39 Write Gate and Off Track, 5-72 Error logs, 1-4 Error recovery level byte, 5-9 Error recovery levels, B-1 Error recovery levels NOP: no operation, B-2 Errors related to media See Media errors ESD protection, 1-8 wrist strap use, 1-8 F Fault display mode setup, 3-16 Floor loading, 2-3 Front access panel, removal, 2-7 H HDA brake assembly removal, 6-17 carrier separation, 6-14 description, 3-10 hardware compatibility, 3-12 installation, 6-14 removal, 6-12 spindle ground brush removal, 6-16 HDA preventative maintenance, D-2 HDA revision bits byte, 5-6 Host error logs, 5-2 I I/O-R/W module description, 3-3 hardware revision matrix, 3-5 Idle loop testing, 2-16 Input current (amps), 1-7 Inrush current, 1-6 Installation note, cluster, 2-20 L Labeling, OCP, 2-13 Lamp test, OCP, 2-16 LARS examples, A-1 Latency, 1-6 Level A Retry, 5-49 Level B Retry, 5-49 Leveling cabinets, 2-6 Logical media layout, 1-3
M Maintenance activity log, 0-3, D-2 Maintenance strategy, 1-3, 1-4 Manufacturing fault code, 5-9 Media errors, 5-32 drive or controller port not defined (random R/W errors), 5-35 excessive number of blocks replaced because of R/W path problems, 5-33 isolating random R/W transfer errors, 5-35 LBN correlated to a physical cylinder, 5-34 LBN correlation to a single group (head), 5-33 LBN correlation to multiple groups (heads), 5-34 LBNs correlated to zone write boundaries, 5-34 multiple controllers report same errors, 5-35 repeating LBNs/RBNs, 5-33 single controller port affected, 5-35 Media removal service, 6-25 Microcode compatibility with drive FRUs, 3-13 Microcode update procedure, 7-3 microcode update cartridge description, 7-1 running T40, 7-3 update port description, 7-2 Mode byte, 5-4 MSCP status/event 6B, 5-49 MSLG$_LEVEL, 5-46 MSLG$_RETRY, 5-46 N Normal mode setup, 3-15 O OCP functions, 3-14 removal, 6-6 OCP error codes, 2-18 OCP labeling, international, 2-13 OCP lamp test, 2-16 On line placing drive on line, 2-20 Operating temperature and humidity, 2-3 Operator Control Panel See OCP P Part numbers, ECM components, 3-3 Parts removal sequence, 6-3 PCM description, 3-7 removal, 6-11 switch pack settings, 3-9 Phase requirements, 2-1 Physical characteristics, 1-6 Physical media layout, 1-3 Positioner errors, 5-49 Power, applying to drive, 2-14 Power and safety precautions, 2-1 Power cord connections, 2-11 Power dissipation, 1-7 Power supply available voltages, 3-12 removal, 6-22 Power supply location, drive, 2-12 Power-up resident diagnostics, 2-16 Preamp control module See PCM Preventative maintenance customer responsibilities, D-1 Customer Services' responsibilities, D-1 maintenance activity log, D-2 Previous command opcode byte, 5-6 Programming the unit address, 2-20
R Rear access panel, removal, 2-9 Rear flex cable removal, 6-23 Removal/replacement procedures bezel and blower motor assembly separation, 6-9 blower/bezel motor assembly removal, 6-7 brake assembly removal, 6-17 contact extraction tool, 6-20 ECM removal, 6-10 front access panel removal, 6-4 FRUs, sequence for removal, 6-3 HDA and carrier separation, 6-14 HDA installation, 6-14 HDA removal, 6-12 media removal service, 6-25 OCP removal, 6-6 PCM removal, 6-11 power supply removal, 6-22 rear access panel removal, 6-4 rear flex cable removal, 6-23 solenoid removal, 6-22 spindle ground brush removal, 6-16 spindle lock solenoid failure, 6-20 tools checklist, 6-3 Request byte, 5-3 Response opcode byte, 5-3 Retry count byte, 5-5 S SDI cable connections, 2-10 Sector format, 1-1 Seek times, 1-5, 4-5 Sequence diagnostics, 4-2 Service delivery strategy, 1-4 Servo module description, 3-5 hardware revision matrix, 3-7 Site preparation and planning, 2-1 Software jumper, 4-4 Specifications, RA90/RA92, 1-5 Spindle ground brush removal, 6-16 Spindle lock solenoid failure, 6-20 Start/stop time, 1-6 Status/event codes 14, 5-48 34, 5-46, 5-48, 5-49 54, 5-48 74, 5-48 94, 5-48 2A, 5-32 1AB, 5-48 1AB, 5-31 AB, 5-31 14B, 5-29 4B, 5-29 10B, 5-29 8B, 5-30 16B, 5-31 18B, 5-31 2B, 5-32 6B, 5-49 B4, 5-48 1C8, 5-48 CB, 5-30 D4, 5-48 E8, 5-44, 5-49 1E8, 5-48 Status bytes extended, 5-2 generic, 5-4 Subunit mask byte, 5-3 T Temperature, affect on drive performance, 4-5 Test selection from OCP, 2-16 Theory drive operations and theory, 3-1 Thermal stabilization, 2-3 Tools checklist, 6-3 Training, 5-1 Troubleshooting bad block replacement (BBR), 5-24 controller byte, iHS controller-detected communication events and faults, 5-30 controller-detected drive clock dropout, 5-31 controller-detected drive failed initialization, 5-31
controller-detected drive ignored initialization, 5-31 controller-detected EDC errors, 5-28 controller-detected loss of read/write ready, 5-30 controller-detected lost receiver ready, 5-30 controller-detected protocol and transmission errors without communications errors, 5-29 controller-detected pulse or state parity errors, 5-29 controller-detected receiver ready collision, 5-31 controller-detected SERDES error, 5-32 correctable ECC errors, 5-48 cylinder address bytes, 5-6 data collection steps, 5-26 DBN conversion, RA90, 5-6 DBN conversion, RA92, 5-8 drive-detected drive errors and diagnostic faults (DDDE), 5-27 drive-detected protocol errors without communication errors (DDPE), 5-27 drive-detected pulse or state parity errors, 5-27 drive internal error log, 5-9, 5-27 drive-resident utility dump (T41), 5-14 error byte, 5-4 error code byte, 5-9 error recovery level byte, 5-9 error reporting mechanisms, 5-1, 5-15 exiting data collection/action list process, 5-39 extended status bytes, 5-2 FRU replacement stage, 5-40 general information, 5-16 HDA revision bits byte, 5-6 host console/user terminal trails, 5-24 host error log, 5-25 host error logs, 5-2, 5-23 host-level diagnostics, 5-37 host-level diagnostics and utilities, 5-16 HSC-based diagnostics, 5-37 HSC console log, 5-24, 5-26 HSC console utility: DKUTIL, 5-12 identifying the problem drive, 5-23 identifying the problem FRU, 5-24 KDM-based diagnostics, 5-37 LBN conversion, RA90, 5-6 LBN conversion, RA92, 5-8 manufacturing fault code, 5-9
miscellaneous checks, 5-36 mode byte, 5-4 OCP fault indicator/error codes, 5-14, 5-25 off-line diagnostics, 5-37 other means (to identify problem drive), 5-24 performance issues when no errors are being logged, 5-41 post-verification testing, 5-40 Power OK indicator, 5-14 pre-verifying drive symptoms, 5-25 previous command opcode byte, 5-6 priority order of DSA errors, 5-27 RBN conversion, RA90, 5-6 RBN conversion, RA92, 5-8 receiver ready collisions: acceptable rates, 5-31 receiver ready collisions: unacceptable rates, 5-31 recommended training, 5-1 reference material, 5-1 request byte, 5-3 resident diagnostics limitations, 5-16 response opcode byte, 5-3 retry count byte, 5-5 returning disk to customer, 5-41 SDI drive command timeout, 5-32 standalone diagnostics, 5-37 status/event 6B, 5-52 step-by-step procedure, 5-16 subunit mask byte, 5-3 uncorrectable ECC errors, 5-44 unit number low byte, 5-3 unusual problems, 5-36 VAXsimPLUS, 5-2, 5-23, 5-25 VMS mount verification, 5-42 worksheet, 5-23 XBN conversion, RA90, 5-6 XBN conversion, RA92, 5-8 XDA controller-based diagnostics, 5-38 U Uncorrectable ECC errors, 5-44 hard, 5-44 soft, 5-46 Unit address See Drive unit address Unit number low byte, 5-3 Unpacking, 60-inch cabinets, 2-3 Updating microcode See Microcode update procedure V VAXsimPLUS, 5-2 Voltage (frequency) selection power supply, 2-13