Document:	VAXft Systems Model 810 Service Information
Order Number:	EK-VXFTA-SI
Revision:	A01
Pages:	236
Original Filename:	vxftasia.pdf

OCR Text

VAXft Systems
Model 810
Service Information
Order Number: EK-VXFTA-SI.A01
June 1993
This manual is intended for use by trained personnel responsible for
maintaining VAXft Model 810 systems.

Digital Equipment Corporation

June 1993
The information in this document is subject to change without notice and should not be construed
as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no
responsibility for any errors that may appear in this document.
No responsibility is assumed for the use or reliability of software on equipment that is not supplied
by Digital Equipment Corporation or its affiliated companies.
Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions
as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software
clause at DFARS 252.227-7013.
© Digital Equipment Corporation June 1993.
All Rights Reserved.
Printed in Canada
The following are trademarks of Digital Equipment Corporation: CompacTape, OpenVMS, ThinWire,
TK, UETP, VAX, VAXft, VMS, VAXELN, and the DIGITAL logo.
FCC NOTICE: This equipment generates, uses, and may emit radio frequency energy. It has been
tested and found to comply with the limits for a Class A computing device pursuant to Subpart J
of Part 15 of FCC rules for operation in a commercial environment. This equipment, when operated
in a residential area, may cause interference to radio/TV communications. In such event the user
(owner), at his own expense, may be required to take corrective measures.
This document is available on CDROM.

Documentation Map

[Figure MR−6230−RA: map of the VAXft Systems documentation set. Titles shown: Overview Information (VAXft Systems); Hardware Information (VAXft Systems); Software Information (VAXft System Services); Operating System (VMS); Software Product Description; Configuration Guide (Models 110, 410, 610, 612); Configuring the Model 810; Cover Letter; Before You Install Letter; Release Notes; Site Prep and Installation Guide; Site Prep Information; Installation Information; Owner's Manual; Operating Information; Maintenance Guide; Service Information; VMS Upgrade and Installation Manual; VMS Upgrade and Installation Supplement: VAXft Systems; Using Factory-Installed Software with VAXft Systems; Wide Area Network Device Drivers; Manager's Guide; Reference Manual; Online Help; VMS Volume Shadowing Manual. Legend: Book, Tape, Bookreader, Online, Letter, Order Separately.]

Contents
1 Cabinet and Component Descriptions
1.1 In This Chapter . . . . . 1–1
1.2 CPU and Expansion Cabinets . . . . . 1–1
1.3 Zone Control Panel . . . . . 1–6
1.4 Power Modules . . . . . 1–8
1.5 Domestic and International Power Distribution Boxes . . . . . 1–9

2 Console Operations
2.1 In This Chapter . . . . . 2–1
2.2 Console Description . . . . . 2–1
2.3 Console Operating Modes . . . . . 2–3
2.3.1 Entering CIO Mode . . . . . 2–4
2.3.2 Exiting CIO Mode . . . . . 2–4
2.4 Console Control Characters . . . . . 2–5
2.5 Console Command Language Syntax . . . . . 2–6
2.6 Bootstrap Procedures . . . . . 2–7
2.7 Entering CIO Mode . . . . . 2–8
2.8 CIO Mode Console Commands . . . . . 2–9
2.8.1 BOOT . . . . . 2–9
2.8.2 CLEAR . . . . . 2–10
2.8.3 CONTINUE . . . . . 2–11
2.8.4 DEPOSIT . . . . . 2–11
2.8.5 DUP . . . . . 2–13
2.8.6 EXAMINE . . . . . 2–13
2.8.7 FIND . . . . . 2–15
2.8.8 HELP . . . . . 2–15
2.8.9 INITIALIZE . . . . . 2–16
2.8.10 MOVE . . . . . 2–16
2.8.11 MATCH_ZONES . . . . . 2–16
2.8.12 REPEAT . . . . . 2–17
2.8.13 SET . . . . . 2–17
2.8.13.1 SET BOOT . . . . . 2–18
2.8.14 SHOW . . . . . 2–18
2.8.15 START . . . . . 2–19
2.8.16 TEST . . . . . 2–20
2.8.17 X (transfer) . . . . . 2–21
2.8.18 Z . . . . . 2–22
2.8.19 ! (comment) . . . . . 2–22

3 System Maintenance
3.1 In This Chapter . . . . . 3–1
3.2 Maintenance Strategy . . . . . 3–1
3.3 Operating Rules and Cautions . . . . . 3–2
3.4 General Troubleshooting Procedure . . . . . 3–4
3.5 Module Fault LEDs . . . . . 3–6
3.6 Power System Overview . . . . . 3–7
3.7 Power System Maintenance . . . . . 3–12
3.8 Device Status and Fault Indicators . . . . . 3–19
3.8.1 RF35 Disk Drawer . . . . . 3–19
3.8.2 SF35 Storage Array . . . . . 3–21
3.8.3 SF73 Storage Array . . . . . 3–24
3.8.4 TF85C Tape Drive . . . . . 3–26
3.8.5 TF857 Tape Loader . . . . . 3–27
3.8.5.1 Power-On Process . . . . . 3–27
3.8.5.2 Operator Control Panel Controls and Indicators . . . . . 3–27
3.9 ROM-Based Diagnostics . . . . . 3–29
3.9.1 TEST . . . . . 3–30
3.9.2 Z . . . . . 3–31
3.9.3 CPU ROM-Based Diagnostics . . . . . 3–31
3.9.4 I/O ROM-Based Diagnostics . . . . . 3–34

4 Error Handling and Analysis
4.1 In This Chapter . . . . . 4–1
4.2 Error Handling Services Overview . . . . . 4–1
4.2.1 Basic Error Isolation and Handling . . . . . 4–2
4.2.2 EHS Structure . . . . . 4–3
4.2.3 System Operating Modes . . . . . 4–4
4.2.4 Error Types . . . . . 4–5
4.2.5 VAXELN Error Handling . . . . . 4–10
4.3 Field Replaceable Units (FRUs) . . . . . 4–12
4.3.1 Isolation . . . . . 4–12
4.3.2 Deconfiguration . . . . . 4–13
4.3.2.1 I/O Attachment Module . . . . . 4–13
4.3.2.2 CPU Module and Memory . . . . . 4–14
4.3.2.3 I/O Expansion Module . . . . . 4–14
4.3.2.4 Interface Module . . . . . 4–15
4.3.2.5 Zone . . . . . 4–16
4.3.2.6 Cross-Link Cable . . . . . 4–16
4.3.3 Application of Thresholds . . . . . 4–17
4.4 OpenVMS Error Log . . . . . 4–19
4.4.1 Fault Summary . . . . . 4–20
4.4.2 FRU Information . . . . . 4–22
4.4.3 Deconfiguration Information . . . . . 4–24
4.4.4 Threshold Information . . . . . 4–26
4.4.5 Fault Data . . . . . 4–27
4.4.5.1 System Registers . . . . . 4–27
4.4.5.2 End Actions . . . . . 4–28
4.4.5.3 End Action Timeouts . . . . . 4–29
4.4.5.4 VAXELN Detected Errors . . . . . 4–30
4.4.5.5 Software Detected Errors . . . . . 4–34
4.4.5.6 Unsynchable Events . . . . . 4–36
4.5 Module NVRAM Status and LED Indicators . . . . . 4–38
4.6 FTSS Event Reporting Interface . . . . . 4–40
4.6.1 Event Reporting Interface Routines . . . . . 4–40
4.6.2 Error Event Messages . . . . . 4–40
4.6.2.1 Deconfiguration Messages . . . . . 4–49
4.7 Firmware Interfaces . . . . . 4–50
4.7.1 System Console and Diagnostics . . . . . 4–50
4.7.1.1 System Resets . . . . . 4–51
4.7.1.2 CCA Fields . . . . . 4–53
4.7.2 I/O Expansion Module Console and Diagnostics . . . . . 4–53
4.8 Firmware and OpenVMS Interface Data Structures . . . . . 4–54
4.8.1 Console Communications Area . . . . . 4–55
4.8.1.1 Duplex Compatibility Test . . . . . 4–57
4.8.1.2 Dispatch Block Description . . . . . 4–59
4.8.1.3 Boot Parameter Block Description . . . . . 4–60
4.8.2 Device Configuration Block . . . . . 4–61
4.8.2.1 Sub-Device Configuration Blocks . . . . . 4–63
4.8.2.2 CPU Module SubDCB . . . . . 4–64
4.8.3 Page Frame Number Bitmap . . . . . 4–65
4.9 Error Log Analysis . . . . . 4–66
4.9.1 CPU/MEM Fault Error Log Entry . . . . . 4–66
4.9.2 CPU/MEM Fault End Action Error Log Entry . . . . . 4–69
4.9.3 CPU or Zone Unsynchable Error Log Entry . . . . . 4–72

5 FRU Removal and Replacement Procedures
5.1 In This Chapter . . . . . 5–1
5.2 Field Replaceable Unit List . . . . . 5–1
5.3 Before You Begin . . . . . 5–3
5.3.1 Handling FRUs . . . . . 5–4
5.3.2 Shutting Down a Zone . . . . . 5–4
5.3.3 Verifying Zone Shutdown . . . . . 5–5
5.3.4 Starting Up a Zone . . . . . 5–5
5.3.5 Accessing the FRUs . . . . . 5–5
5.4 FRU Removal and Replacement . . . . . 5–6
5.4.1 CPU and ATM Modules . . . . . 5–7
5.4.2 SIMMs . . . . . 5–8
5.4.3 MMBs . . . . . 5–9
5.4.4 Fan and FCSB . . . . . 5–10
5.4.5 RF35 Disk Drive Removal and Replacement . . . . . 5–12
5.4.6 DSSI Disk Drawer . . . . . 5–14
5.4.7 Zone Control Panel . . . . . 5–14
5.4.8 FEU, 3.3V Regulator, 5V Regulator, PSC Modules . . . . . 5–16
5.4.9 Cross-Link Assembly . . . . . 5–18
5.4.10 Console Extender Module . . . . . 5–20
5.4.11 DSSI Extender Module . . . . . 5–22
5.4.12 CAMP Module . . . . . 5–24
5.4.13 DSSI Interface Module (DIM) . . . . . 5–26
5.4.14 Ethernet Interface Module (EIM) . . . . . 5–28
5.4.15 DSSI Cable Removal and Replacement . . . . . 5–29
5.4.16 TF85C-BA Tape Drive . . . . . 5–30
5.4.17 SF73 Disk Drive . . . . . 5–32
5.4.18 SF35 Storage Array . . . . . 5–36
5.4.19 TF857-CA Tape Drive . . . . . 5–39
5.4.20 Power Distribution Box . . . . . 5–42

6 Managing Integrated Storage Elements
6.1 In This Chapter . . . . . 6–1
6.2 Loading the DUP Driver . . . . . 6–1
6.3 Using VMS DUP . . . . . 6–1
6.4 Using the Server Setup Switch . . . . . 6–2
6.5 Assigning DSSI Unit Numbers . . . . . 6–2
6.6 Warm Swapping an ISE . . . . . 6–3
6.6.1 Setting ISE Parameters . . . . . 6–5
6.6.2 ISE Removal . . . . . 6–7
6.6.3 ISE Replacement . . . . . 6–8
6.6.4 Installing an ISE in a Running System . . . . . 6–11

A Miscellaneous System Information
A.1 In This Appendix . . . . . A–1
A.2 Processor Halt Codes . . . . . A–1
A.3 Console Halt Codes . . . . . A–3
A.4 Error Register Descriptions . . . . . A–4
A.4.1 System Fault (SYSFLT) Register . . . . . A–4
A.4.2 System Error Address (SYSADR) Register . . . . . A–7
A.4.3 DMA Error Address (DMAADR) Register . . . . . A–7
A.4.4 Reset Reason 0013 Fault Analysis . . . . . A–8
A.5 I/O Physical Address Space . . . . . A–8
A.6 System Control Block Description . . . . . A–10

B ISE Parameter Worksheets
B.1 In This Appendix . . . . . B–1
B.2 Individual ISE Parameter Worksheets . . . . . B–1
B.3 ISE Zone Parameter Worksheets . . . . . B–3

Index

Examples
2–1 Indirect Addressing . . . . . 2–12
5–1 How to Shut Down a Zone . . . . . 5–5
5–2 How to Verify Zone Shutdown . . . . . 5–5

Figures
1–1 Cabinet Layout, Front View . . . . . 1–2
1–2 Cabinet Layout, Rear View . . . . . 1–4
1–3 Zone Control Panel . . . . . 1–6
1–4 Power Module Controls and Indicators . . . . . 1–8
1–5 Domestic Power Distribution Box . . . . . 1–10
1–6 International Power Distribution Box . . . . . 1–11
2–1 System Components . . . . . 2–2
2–2 Console Operating Modes . . . . . 2–4
2–3 Boot Procedure . . . . . 2–7
3–1 Module Fault LEDs . . . . . 3–6
3–2 Power System Block Diagram (1 of 2) . . . . . 3–8
3–3 Power System Block Diagram (2 of 2) . . . . . 3–9
3–4 Power Module Controls and Indicators . . . . . 3–13
3–5 RF35 Disk Drawer Controls and Indicators . . . . . 3–20
3–6 SF35 Operator Control Panel . . . . . 3–21
3–7 SF35 Rear Panel Fault Indicator . . . . . 3–23
3–8 Location of SF73 Storage Array LEDs and Switchpacks . . . . . 3–24
3–9 Rear of the SF73 Storage Array . . . . . 3–25
3–10 TF85C Cartridge Tape Drive . . . . . 3–26
3–11 TF857 Operator Control Panel . . . . . 3–28
4–1 Hardware Error Handling Flowchart . . . . . 4–2
4–2 EHS Architectural Position . . . . . 4–4
4–3 OpenVMS Error Log Format . . . . . 4–19
4–4 Fault Summary Block . . . . . 4–20
4–5 FRU Information Block . . . . . 4–22
4–6 Deconfiguration Information Block . . . . . 4–24
4–7 Threshold Information Block . . . . . 4–26
4–8 Fault Data Block . . . . . 4–27
4–9 End Action Timeout Block . . . . . 4–30
4–10 VAXELN Detected Error Block . . . . . 4–30
4–11 Software Detected Error Block . . . . . 4–35
4–12 Unsynchable Event Block . . . . . 4–37
4–13 Firmware and OpenVMS Data Structure Memory Map . . . . . 4–54
4–14 Dispatch Block Structure . . . . . 4–59
4–15 SubDCB Links to DCB . . . . . 4–64
5–1 Latches . . . . . 5–6
5–2 CPU Module and ATM Module Locations . . . . . 5–7
5–3 SIMM Locations . . . . . 5–8
5–4 MMB Locations . . . . . 5–9
5–5 Fan Location . . . . . 5–10
5–6 FCSB Location . . . . . 5–11
5–7 RF35 Disk Drive Location . . . . . 5–12
5–8 Zone Control Panel . . . . . 5–15
5–9 FEU, 3.3V Regulator, 5V Regulator, and PSC Locations . . . . . 5–16
5–10 Cross-Link Assembly . . . . . 5–18
5–11 Module Extraction Tool . . . . . 5–19
5–12 Console Extender Module Location . . . . . 5–20
5–13 Console Extender Module Layout . . . . . 5–21
5–14 DSSI Extender Module Locations . . . . . 5–22
5–15 CAMP Module Locations . . . . . 5–24
5–16 DIM Location . . . . . 5–26
5–17 DIM Removal . . . . . 5–27
5–18 EIM Removal . . . . . 5–28
5–19 TF85C-BA Tape Drive, Rear View . . . . . 5–30
5–20 TF85C-BA Tape Drive Removal . . . . . 5–31
5–21 SF73 Disk Drive, Rear View . . . . . 5–32
5–22 SF73 Disk Drive, Front View . . . . . 5–33
5–23 SF73 Disk Drive Enclosure Removal . . . . . 5–34
5–24 SF73 Disk ISE Removal . . . . . 5–35
5–25 SF35 Storage Array, Rear View . . . . . 5–36
5–26 SF35 Storage Array, Front View . . . . . 5–37
5–27 SF35 Disk ISE Removal . . . . . 5–38
5–28 TF857-CA Tape Drive, Rear View . . . . . 5–39
5–29 Loosening the Shipping Restraint Screw . . . . . 5–40
5–30 Setting the TF857 Tape Loader Node ID . . . . . 5–41
5–31 Domestic Power Distribution Box . . . . . 5–42
5–32 International Power Distribution Box . . . . . 5–43
6–1 VAXft Model 810 Front View . . . . . 6–3
6–2 VAXft Model 810 Rear View . . . . . 6–4
A–1 System Fault Register . . . . . A–4
A–2 JXD System Error Address Register . . . . . A–7
A–3 JXD DMA Error Address Register . . . . . A–7
A–4 I/O Physical Address Space . . . . . A–9
A–5 System Control Block Base Register . . . . . A–10
A–6 System Control Block Vector Format . . . . . A–10

Tables
1–1 Key to Figure 1–1, Cabinet Layout, Front View . . . . . 1–3
1–2 Key to Figure 1–2, Cabinet Layout, Rear View . . . . . 1–5
1–3 Key to Figure 1–3, Zone Control Panel . . . . . 1–7
1–4 Key to Figure 1–4, Power Module Controls and Indicators . . . . . 1–9
1–5 Key to Figure 1–5, Domestic Power Distribution Box . . . . . 1–10
1–6 Key to Figure 1–6, International Power Distribution Box . . . . . 1–11
2–1 Key to Figure 2–1, System Components . . . . . 2–2
2–2 Function of the Console Components . . . . . 2–3
2–3 Console Control Characters and Function Keys . . . . . 2–5
2–4 Console Command Language Syntax . . . . . 2–6
2–5 Qualifiers for BOOT . . . . . 2–9
2–6 VMB Program /R5:<flag> Values . . . . . 2–10
2–7 Qualifier for CLEAR . . . . . 2–10
2–8 Qualifiers for DEPOSIT . . . . . 2–11
2–9 Address-Spec Symbolic Addresses . . . . . 2–12
2–10 Qualifiers for DUP . . . . . 2–13
2–11 Qualifiers for EXAMINE . . . . . 2–14
2–12 Address-Spec Symbolic Addresses . . . . . 2–14
2–13 Qualifiers for FIND . . . . . 2–15
2–14 INITIALIZE Steps . . . . . 2–16
2–15 SET Variables and Values . . . . . 2–17
2–16 SHOW Variables . . . . . 2–18
2–17 Qualifiers for TEST Selection . . . . . 2–20
2–18 Qualifiers for TEST Control . . . . . 2–21
2–19 Qualifier for Z . . . . . 2–22

3–1  Before Stopping a Zone
3–2  After a Zone is Repaired
3–3  Before Leaving the Site
3–4  Cautions
3–5  General Troubleshooting Procedure
3–6  Key to Figure 3–1, Module Fault LEDs
3–7  Power System Functional Summary
3–8  System DC Voltage Summary
3–9  Key to Figure 3–4, Power Module Controls and Indicators
3–10  Fan, LDC, Temperature Error Codes
3–11  FEU Error Codes
3–12  PSC Error Codes
3–13  12 V DC to DC Converter Error Codes
3–14  2 V DC to DC Converter Error Codes
3–15  3 V DC to DC Converter Error Codes
3–16  5 V DC to DC Converter Error Codes
3–17  12 V DC to DC Converter Error Codes
3–18  RF35 Disk Drawer Controls and Indicators
3–19  SF35 Operator Control Panel Description
3–20  SF35 Rear Panel Controls and Indicator
3–21  SF73 Front Panel Controls and Indicators
3–22  TF85C Tape Drive Problems
3–23  TF85C Cartridge Tape Drive Indicators
3–24  TF857 OCP Controls and Indicators
3–25  Qualifiers for TEST Selection
3–26  Qualifiers for TEST Control
3–27  Qualifier for Z
3–28  CPU ROM-Based Diagnostic Descriptions
3–29  I/O ROM-Based Diagnostic Descriptions
4–1  EHS Error Notification
4–2  Error Handling Flowchart Definitions
4–3  System Operating Modes
4–4  Error Types
4–5  VAXELN Error Classes
4–6  System FRUs
4–7  ATM Deconfiguration Actions
4–8  CPU Deconfiguration Actions
4–9  I/O Expansion Module Deconfiguration Actions
4–10  Interface Module Deconfiguration Actions
4–11  Zone Deconfiguration Actions
4–12  Cross-Link Cable Deconfiguration Actions
4–13  FRU Thresholds
4–14  OpenVMS Error Log Sizes
4–15  Fault Summary Block Entry Descriptions
4–16  FRU Information Block Entry Descriptions
4–17  Deconfiguration Information Block Entry Descriptions
4–18  Threshold Information Block Entry Descriptions

4–19  System Register Entry Descriptions
4–20  End Actions Register Descriptions
4–21  End Action Timeout Block Entry Description
4–22  VAXELN Detected Error Block Entry Descriptions
4–23  Software Detected Error Block Entry Descriptions
4–24  Unsynchable Event Block Entry Descriptions
4–25  Module ID NVRAM/DCB Status Codes
4–26  System Reset Action Codes
4–27  System Reset Reason Codes
4–28  Error Handler Reset Reasons
4–29  I/O Reset Action Code Description
4–30  I/O Reset Reason Code Descriptions
4–31  CCA Component Descriptions
4–32  Duplex Compatibility Test Failure Codes
4–33  Dispatch Block Components
4–34  BPB Components
4–35  BPB Entry Components
4–36  DCB Components
4–37  DCB Entry Components
4–38  CPU SubDCB Components
4–39  CPU SubDCB Entry Components
5–1  Model 810 FRUs
5–2  Handling FRUs
5–3  CPU Module and ATM Module Removal Procedure
5–4  SIMM Removal Procedure
5–5  MMB Removal Procedure
5–6  Fan and FCSB Removal Procedure
5–7  RF35 Disk Drive Removal Procedure
5–8  DSSI Disk Drawer Removal Procedure
5–9  Zone Control Panel Removal Procedure
5–10  FEU, 3.3V Regulator, 5V Regulator, and PSC Removal Procedure
5–11  Cross-Link Assembly Removal Procedure
5–12  Console Extender Module Removal Procedure
5–13  DSSI Extender Module Removal Procedure
5–14  CAMP Module Removal Procedure
5–15  DIM Removal Procedure
5–16  EIM Removal Procedure
5–17  DSSI Cable Removal Procedure
5–18  TF85C-BA Tape Drive Removal Procedure
5–19  SF73 Disk Drive Enclosure Removal Procedure
5–20  SF35 Storage Array Removal Procedure
5–21  TF857-CA Tape Drive Removal Procedure
5–22  Power Distribution Box Removal Procedure
6–1  PARAMS Commands
6–2  Switches For Disabling the MSCP
6–3  ISE Parameters
6–4  Disabling the MSCP

6–5  Disabling the MSCP
A–1  Processor Halt Code Definitions
A–2  Processor Halt Reason Code Definitions
A–3  Console Halt Reason Code Definitions
A–4  Xlink Mode Coding
A–5  Code Field Definition
A–6  SCB Layout

1 Cabinet and Component Descriptions
1.1 In This Chapter
This chapter includes descriptions of the:
• CPU and expansion cabinets
• Zone control panel
• Power modules
• Domestic power distribution box
• International power distribution box

1.2 CPU and Expansion Cabinets
Figure 1–1 shows the front layout of an expanded system. Table 1–1 describes
the components shown in Figure 1–1. Figure 1–2 shows the rear layout of an
expanded system. Table 1–2 describes the components shown in Figure 1–2.

Cabinet and Component Descriptions 1–1

Figure 1–1 Cabinet Layout, Front View (front view of the CPU cabinet and the expansion cabinet; callouts are keyed in Table 1–1)


Table 1–1 Key to Figure 1–1, Cabinet Layout, Front View
Item  Component                 Description
1     Zone A                    Complete computer with enough elements to run an operating system.
2     Zone B                    Complete computer with enough elements to run an operating system.
3     Fan assembly              Cooling device.
4     Disk drawer               Optional SF35 disk drive(s).
      System Module Card Cage
5     Slot 0 - CPU module       Logic chips and memory.
6     Slot 1 - ATM module       I/O logic supporting up to eight interface adapter cards.
7     Slot 2 - Not used         For future expansion.
8     Zone control panel        Zone controls and indicators.
9     Blank panel               Not used.
10    Disk device               Location for disk device.
11    Disk/tape device          Location for disk or tape device.
12    Disk/tape/tape loader     Location for disk, tape, or tape loader device.
13    Power distribution box A  AC power source for Zone A.
14    Power distribution box B  AC power source for Zone B.
15    UPS A                     Optional uninterruptible power supply for Zone A.
16    UPS B                     Optional uninterruptible power supply for Zone B.


Figure 1–2 Cabinet Layout, Rear View (rear view of the CPU cabinet, the expansion cabinet, and the expansion cabinet option; callouts are keyed in Table 1–2)


Table 1–2 Key to Figure 1–2, Cabinet Layout, Rear View
Item  Component                        Description
1     Zone A                           Complete computer with enough elements to run an operating system.
2     Zone B                           Complete computer with enough elements to run an operating system.
3     Fan assembly                     Cooling device.
4     Blank panel                      Not used.
5     Front End Unit (FEU)             AC input circuit breaker.
6     FEU                              Converts ac power to 48 Vdc.
7     FEU                              AC input connector.
8     Regulator                        Provides +3.3 Vdc at 30 A, +12 Vdc at 12.5 A, and bias.
9     Regulator                        Provides +5 Vdc at 90 A.
10    Power system controller          Provides interface signals to the ATM module.
      Miscellaneous Module Card Cage
11    Blank panel                      Not used.
12    Slot 0 - Not used                For future expansion.
13    Slot 1 - Cross-link assembly     Connects Zone A and Zone B.
14    Slot 2 - Console module          Module with console port.
15    Slot 3 - Not used                Factory test module.
16    Slot 4 - Disk In/Disk Out module Permits zone interconnections to access all configured disks.
17    Slot 5 - CAMP module             Provides custom power control circuits.
      Interface Module Card Cage
18    Slots 10 to 17                   DSSI and NI interface modules.
19    Slots 20 to 27                   For future expansion.
20    Disk device                      Location for disk device.
21    Disk/tape/tape loader            Location for disk, tape, or tape loader device.
22    Disk/tape device                 Location for disk or tape device.
23    Power distribution box A         AC power source for Zone A.
24    Power distribution box B         AC power source for Zone B.
25    UPS A                            Optional uninterruptible power supply for Zone A.
26    UPS B                            Optional uninterruptible power supply for Zone B.


1.3 Zone Control Panel
Figure 1–3 shows the layout of the zone control panel. Table 1–3 describes the
functions of the zone control panel controls and indicators.
Figure 1–3 Zone Control Panel (callouts 1 to 10 are keyed in Table 1–3)


Table 1–3 Key to Figure 1–3, Zone Control Panel
Item  Control/Indicator    Function
1     Logic Power - OFF    Two switches with amber indicators. Pressing both switches removes 48 V power and disables the zone. Pressing only one switch has no effect on the operation of the zone. (CPU cabinet disk power is not affected when logic power is removed by pressing these switches.)
2     Logic Power - ON     One switch with a green indicator. Pressing this switch applies 48 V power to the zone. (CPU cabinet disk power is not affected when logic power is applied by pressing this switch.)
3     Local Console        One switch with a green indicator. Pressing this switch connects the system to the console local port for communication.
4     Remote Console       One switch with a green indicator. Pressing this switch connects the system to the remote port for communication.
5     Secure               One switch with a green indicator. Pressing this switch disables the console Break key function. (You cannot use the console Break key to halt the zone or system.)
6     Zone Halt Enable     One switch with a green indicator. Pressing this switch enables the console Break key function. (You can use the console Break key to halt the zone.)
7     System Halt Enable   One switch with a green indicator. Pressing this switch enables the console Break key function. (You can use the console Break key to halt both zones.)
                           Note: System Halt Enable is NOT supported in Simplex mode.
8     System OK            Green indicator. On when the system power is on and the system is operational.
9     System Fault         Amber indicator. On when the system is not operational.
10    OS Running           Green indicator. On when the system is operational and running a customer or diagnostic application.


1.4 Power Modules
Figure 1–4 shows the location of the power module controls and indicators.
Table 1–4 describes their functions.
Figure 1–4 Power Module Controls and Indicators (modules shown: FEU, DC3, DC5, PSC, and CAMP; callouts are keyed in Table 1–4)


Table 1–4 Key to Figure 1–4, Power Module Controls and Indicators
Control/Indicator          Function
AC Circuit Breaker
FEU Failure                When on, indicates the dc output voltages for the FEU are below the specified minimum.
FEU OK                     When on, indicates the dc output voltages for the FEU are above the specified minimum.
DC3 Failure                When on, indicates that one of the +3 Vdc output voltages is not within the specified tolerances.
DC3 OK                     When on, indicates that the +3 Vdc output voltages are within the specified tolerances.
AC Present                 When on, indicates ac power is present at the ac input connector, regardless of the position of the circuit breaker.
DC5 Failure                When on, indicates that one of the +5 Vdc output voltages is not within the specified tolerances.
DC5 OK                     When on, indicates that the +5 Vdc output voltages are within the specified tolerances.
PSC Failure                When on, indicates a PSC fault.
PSC OK                     When on, indicates the PSC is functioning. When blinking, indicates the PSC is performing power-on self-tests.
Over Temperature Shutdown  When on, indicates that the PSC shut down the system because of an internal overtemperature condition.
Fan Failure                When on, indicates a fan failure. Use the hexadecimal number in the Fault ID Display to isolate the fan.
Disk Power Failure         When on, indicates a disk power failure. Use the hexadecimal number in the Fault ID Display to isolate the storage compartment that houses the disk.
Fault ID Display           Displays power subsystem fault codes.
PSC Reset Button           When out, indicates a PSC fault condition. Press in to reset.
CAMP Fan Fault             When on, indicates that a fan fault caused all disk drives and tape drives to shut down.

1.5 Domestic and International Power Distribution Boxes
The domestic power distribution box (PN 30-24374-01) is shown in Figure 1–5.
Table 1–5 describes the components shown in the figure. The international power
distribution box (PN 30-35415-02) is shown in Figure 1–6. Table 1–6 describes
the components shown in the figure.


Figure 1–5 Domestic Power Distribution Box (callouts are keyed in Table 1–5)

Table 1–5 Key to Figure 1–5, Domestic Power Distribution Box
Component               Description
Three-phase power cord  Connects the power distribution box to ac power. The power cord may be repositioned by moving the locking arm.
Circuit breaker         When set to on, ac power is applied to the distribution box.
Local/Remote switch     The switch has icons representing Remote, Off, and Local. When set to:
                        • Local, the internal bus controls the operation of ac power.
                        • Off, the distribution box is turned off.
                        • Remote, the distribution box is turned on (if the power cord is connected to ac power and the circuit breaker is set to on).
For power cords         Used to dress the power cords.
Eight ac outlets        Reserved for the FEU and expansion cabinet.


Figure 1–6 International Power Distribution Box (callouts are keyed in Table 1–6)

Table 1–6 Key to Figure 1–6, International Power Distribution Box
Component                Description
Single-phase power cord  Connects the power distribution box to ac power.
Circuit breaker          When set to on, ac power is applied to the distribution box.
Local/Remote switch      The switch has icons representing Remote, Off, and Local. When set to:
                         • Local, the internal bus controls the operation of ac power.
                         • Off, the distribution box is turned off.
                         • Remote, the distribution box is turned on (if the power cord is connected to ac power and the circuit breaker is set to on).
For power cords          Used to dress the power cords.
Six ac outlets           Reserved for the expansion cabinet.


2 Console Operations
2.1 In This Chapter
This chapter describes the console, console operating modes and commands, and
booting information.
This chapter includes:
• Console description
• Console operating modes
• Console control characters
• Console command language syntax
• Bootstrap procedures
• Entering CIO mode
• CIO mode console commands

2.2 Console Description
The system architecture (Figure 2–1 and Table 2–1) supports in each zone:
• A local console terminal
• The console firmware (programs located in ROM) residing on:
  - The primary NCIO module
  - The CPU module
• A remote console terminal
The remote console terminal and the local console terminal are connected to the
zone through the primary NCIO module.
The console operates a terminal that may be:
• Connected to the CPU serial port
• On the system console port


Figure 2–1 System Components (callouts are keyed in Table 2–1)

Table 2–1 Key to Figure 2–1, System Components
Number  Component
        CPU cabinet
        Zone (A or B)
        CPU module
        To memory
        Primary NCIO module
        Cross-link cable
        Local console terminal
        Remote console terminal (optional)


Table 2–2 describes the function of each console component.
Table 2–2 Function of the Console Components
Part                    Function
Local console terminal  Terminal located with the system that is used for console input and display output.
Remote console port     One remote port is available in each zone. The port may be connected to a remote console terminal through a modem. There is no built-in modem control. The remote console port provides the same functions as the local console port.
Console firmware        The console firmware resides on the primary NCIO module and on the CPU module.

You can use any one of the four console terminals (local or remote) for input
commands, but use only one terminal at a time. All of the console terminals echo
the response of the system to a console command.
If the system is operating with a single zone running, you must use a console
terminal (local or remote) that is connected to that zone for input commands.

2.3 Console Operating Modes
Operators communicate with the system in one of the following input/output
modes:
• Program I/O (PIO) mode
• Console I/O (CIO) mode
Normal operation takes place in the PIO mode. From PIO mode, the operator
uses the console to:
•
• Use the mail facility
• Create and edit files
From CIO mode, the operator executes the console commands. These commands
are described in Section 2.8.


2.3.1 Entering CIO Mode
The CIO mode is entered when you turn on system power if:
• The Zone Halt Enable switch is pressed
• A STOP/ZONE instruction is executed
• A severe processor condition occurs
• An external halt is detected
Once CIO mode is entered, the console displays the >>> prompt and is ready to
execute commands entered at the prompt.

2.3.2 Exiting CIO Mode
The CIO mode is exited by issuing one of the following console commands:
• BOOT
• START
• CONTINUE

These commands are described in Section 2.8. Figure 2–2 shows how to move
between PIO and CIO modes.
Figure 2–2 Console Operating Modes (STOP/ZONE moves from PIO mode to CIO mode; BOOT, CONTINUE, or START moves from CIO mode to PIO mode)
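
The transitions between the two modes can be sketched as a small lookup table. This is an illustrative Python model only, not console firmware: the names MODE_TRANSITIONS and next_mode are hypothetical, and only the commands named in Sections 2.3.1 and 2.3.2 are modeled (halts caused by power-on, processor conditions, or external halts are left out).

```python
# Hypothetical model of the PIO/CIO transitions described in Section 2.3.
# Keys are (current mode, command); values are the resulting mode.
MODE_TRANSITIONS = {
    ("PIO", "STOP/ZONE"): "CIO",
    ("CIO", "BOOT"): "PIO",
    ("CIO", "START"): "PIO",
    ("CIO", "CONTINUE"): "PIO",
}


def next_mode(mode, command):
    """Return the mode after `command`; unmodeled commands leave the mode unchanged."""
    return MODE_TRANSITIONS.get((mode, command.upper()), mode)


if __name__ == "__main__":
    mode = "PIO"
    for command in ("STOP/ZONE", "EXAMINE", "CONTINUE"):
        mode = next_mode(mode, command)
        print(command, "->", mode)
```

In this sketch a command that is not in the table (EXAMINE, for example) simply leaves the console in its current mode.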


2.4 Console Control Characters
The ASCII control characters and function keys listed in Table 2–3 have special
meanings when typed on a console terminal.
Table 2–3 Console Control Characters and Function Keys
Character/Key  Function
Break          In CIO mode, acts like Ctrl/C. In PIO mode, causes the processor to halt and begin running the console program. If the system is in a secure mode when you press the Break key, the halt is suppressed. If you press the Zone Halt Enable or System Halt Enable switch, the halt initiated by pressing the Break key earlier is enabled.
Ctrl/C         Echoes ^C and causes the console to abort processing of a command, if possible.
Ctrl/O         Alternately enables and disables output.
Ctrl/Q         Resumes output previously suspended by Ctrl/S.
Ctrl/R         Echoes ^R and retypes the command line.
Ctrl/S         Stops transmission until Ctrl/Q is typed.
Ctrl/U         Echoes ^U and ignores the current command line. The console prompt is displayed on the next line. This affects only the entry of the current line. Pressing Ctrl/U does not abort a command that is executing.
<x (delete)    Deletes the character to the left of the cursor. On video terminals, the deleted characters disappear. On hard-copy terminals, the deleted characters are typed within a pair of backslash delimiters as they are deleted.
Esc or Ctrl/[  Suppresses any special meaning associated with a given character.
Return         Terminates a command line and executes the command.
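
The Break key behavior described in Table 2–3 can be sketched as a small state model. This is a hedged illustration in Python, not the console implementation: the class BreakKeyModel, its method names, and its return strings are hypothetical, and the assumption that pressing a halt-enable switch takes the zone out of secure mode is inferred from the switch descriptions rather than stated in the table.

```python
class BreakKeyModel:
    """Hypothetical model of Break key handling in PIO and CIO modes."""

    def __init__(self, mode="PIO", secure=False):
        self.mode = mode            # "PIO" or "CIO"
        self.secure = secure        # True when the Secure switch is selected
        self.pending_halt = False   # a Break was pressed while secure

    def press_break(self):
        if self.mode == "CIO":
            return "abort"          # acts like Ctrl/C
        if self.secure:
            self.pending_halt = True
            return "suppressed"     # halt is suppressed in secure mode
        self.mode = "CIO"
        return "halt"               # processor halts and runs the console program

    def press_halt_enable(self):
        # Assumption: selecting Zone Halt Enable or System Halt Enable
        # deselects Secure.
        self.secure = False
        if self.pending_halt and self.mode == "PIO":
            self.pending_halt = False
            self.mode = "CIO"
            return "halt"           # the earlier Break now takes effect
        return "enabled"


if __name__ == "__main__":
    console = BreakKeyModel(secure=True)
    print(console.press_break())        # Break while secure
    print(console.press_halt_enable())  # earlier Break is honored
    print(console.press_break())        # now in CIO mode
```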


2.5 Console Command Language Syntax
The console commands accept qualifiers. Qualifiers specify a numerical value or
select an option from a list of options. Command elements may be abbreviated
and any extra tabs or spaces are ignored. Unless otherwise noted, numerical
values must be given in hexadecimal notation. The command length may not
exceed 80 characters.
Table 2–4 lists the console command language syntax rules. The console
commands available for the system are listed in Section 2.8.
Table 2–4 Console Command Language Syntax
Command Element                    Rule
Abbreviations                      A command verb or argument may be abbreviated to the extent that it remains unique.
Multiple adjacent spaces and tabs  Are treated as a single space.
Qualifiers                         May appear after a command verb, option, or symbol. They must be preceded by a slash (/).
Numbers                            Must be hexadecimal.
No characters                      Are treated as a null command. No action is taken.
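
The syntax rules above can be illustrated with a short Python sketch. It is not the console parser: the function names are hypothetical, the 80-character limit is assumed to apply to the line as typed, and the abbreviation check applies the "remains unique" rule literally, whereas the console's actual minimum abbreviations are the ones listed in Section 2.8. Comment stripping and uppercase conversion follow the behavior described in Section 2.7.

```python
MAX_COMMAND_LENGTH = 80  # from Section 2.5


def normalize_command(line):
    """Strip a trailing '!' comment, collapse spaces and tabs, and uppercase."""
    if len(line) > MAX_COMMAND_LENGTH:
        raise ValueError("command exceeds 80 characters")
    text = line.split("!", 1)[0]          # comments start at '!'
    return " ".join(text.upper().split())  # runs of spaces/tabs become one space


def resolve_abbreviation(word, candidates):
    """Return the single candidate that `word` abbreviates, or raise ValueError."""
    word = word.upper()
    if word in candidates:
        return word
    matches = [name for name in candidates if name.startswith(word)]
    if len(matches) != 1:
        raise ValueError("abbreviation is not unique: " + word)
    return matches[0]


def parse_number(text):
    """Console numbers are hexadecimal unless otherwise noted."""
    return int(text, 16)


if __name__ == "__main__":
    line = normalize_command("  deposit   r0\t200   ! store 200 in R0")
    print(repr(line))
    verb, register, value = line.split(" ")
    print(resolve_abbreviation("EXA", ["EXAMINE", "DEPOSIT", "SET", "SHOW"]))
    print(parse_number(value))
```

An empty or all-blank line normalizes to an empty string, which corresponds to the null command in Table 2–4.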


2.6 Bootstrap Procedures
The BOOT command initializes the system and then loads and starts the virtual
memory bootstrap (VMB) program from read-only memory (ROM). The VMB
program, in turn, loads and starts the operating system from the specified boot
device. Figure 2–3 shows the steps in the boot procedure.
Figure 2–3 Boot Procedure
1. Enter BOOT command at the >>> console prompt.
2. Boot procedure initializes the system.
3. Boot procedure loads VMB into main memory.
4. VMB loads the operating system.

The VMB program is the primary bootstrap program. VMB:
• Resides in ROM on the ATM module.
• Is loaded into memory and initiated by the system console firmware.
• Provides the necessary parameters for successful operation of the OpenVMS secondary bootstraps.
• Allows you to boot from DSSI-compatible disk and tape devices over the Ethernet.


2.7 Entering CIO Mode
To recognize and process CIO commands:
• The System Halt Enable switch on both zone control panels must be pressed
• The operating software must be halted
• The processor must be running the console firmware

The example below shows how to use the Break key to enter CIO mode from PIO
mode and then return to PIO mode by using the CONTINUE command. The
System Halt Enable switch on both zone control panels must be pressed.
Caution
Use CONTINUE to continue from a system halt. Use START/ZONE to
continue from a zone halt.

A remote operator can use CIO mode only when full access privileges for the
remote console have been set at the local console.
Example
$                     ! Press the System Halt Enable switch on
$                     ! both zone control panels.
$
$
$ Break               ! From PIO mode, press the Break key once.
>>>                   ! This puts the processor in HALT mode.
?002 External halt
PC = 01E01473
>>> CONTINUE          ! This command resumes execution of the
                      ! operating system software.
$                     ! The console returns to PIO mode.

Notice that comments (characters following an exclamation point (!)) are allowed
on a command line. Comments are ignored by the console when the Return key is
pressed. This may be useful when you document a console session on a hardcopy
terminal.
Notice also that lowercase characters are accepted, but the console converts all
characters to uppercase.


2.8 CIO Mode Console Commands
This section describes the CIO mode console commands. The console commands
are listed below, with command abbreviations shown in capital letters.
Boot
CLEAR
Continue
Deposit
DUP
Examine
Find
HElp
Initialize
Move
MATCH_ZONES
Repeat
SEt
SHow
Start
Test
X(transfer)
Z
!(comment)

2.8.1 BOOT
BOOT initializes the system, loads a program image from a specified boot device,
and transfers control to that program image.
When you do not supply a boot-spec, the default boot device is used. When you do
not supply flag(s), a value of 0 is assumed.
The console program accepts a terminating colon on the boot-spec, but ignores the
colon when the name is processed.
The BOOT syntax is:
BOOT[/OVER][[/R5:]<flag(s)> boot-spec]
The boot-spec format may be dduuu/PATH=path-list . . . dduuu/PATH=path-list,
where:
dd is a device mnemonic.
uuu is a unit number (0 to 999).
/PATH=path-list is a qualifier. See Table 2–5.
Or, the boot-spec format may be a variable that specifies the boot devices and
paths. See Section 2.8.13.1.
Table 2–5 describes the qualifiers. Table 2–6 lists the VMB program /R5:<flag>
values.
Table 2–5 Qualifiers for BOOT
Qualifier        Function
/R5:<flag>       Passes parameters to the virtual memory bootstrap (VMB) program. See Table 2–6.
/PATH=path-list  Specifies a path to a boot device. The path-list specifies zones and slot numbers in the path. When the path-list has more than one slot, you separate the slots by commas. The path-list format is zss, where:
                 z is a zone ID (A or B).¹
                 ss is a slot number (10 to 17, 20 to 27) of an adapter connecting to a boot device.
/OVER            Overrides the results of the bootability test to allow a Simplex mode boot.
¹ The console validates this field before invoking VMB.
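
As an illustration of the zss format, the following Python sketch checks a path-list before it is typed at the console. It is a hypothetical helper, not part of the console firmware; it assumes that each comma-separated element carries its own zone letter, and it treats the slot number as a two-digit string.

```python
import re

# One path element: zone A or B followed by a slot from 10 to 17 or 20 to 27.
_PATH_ELEMENT = re.compile(r"^([AB])([12][0-7])$")


def parse_path_list(path_list):
    """Split a path-list such as 'A10,B21' into (zone, slot) pairs.

    Raises ValueError for an element that does not match the zss format.
    """
    pairs = []
    for element in path_list.upper().split(","):
        match = _PATH_ELEMENT.match(element.strip())
        if match is None:
            raise ValueError("invalid path element: " + element)
        pairs.append((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    print(parse_path_list("A10,b21"))
```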


Table 2–6 VMB Program /R5:<flag> Values
Bit    Hex Value  Function               Action
                  Conversational boot    Returns to the SYSBOOT> prompt.
                  Debug                  Maps the XDELTA program into the system page table.
                  Initial breakpoint     Operating system issues a breakpoint after turning on memory management.
                  Secondary boot         Boots from boot block specified in /R4:n.
                  Bootstrap breakpoint   Transfers control to the XDELTA program.
       100        Solicit file name      VMB issues a prompt for the secondary boot procedure.
       200        Halt before transfer   VMB executes a halt before transferring control to the secondary bootstrap procedure.
31:28  x0000000   Top-level system boot  Specifies the top-level directory number for a system disk with multiple system roots, where x = a hex value from 0 to F.
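
A /R5 value is formed by combining the individual flag values. The Python sketch below shows the arithmetic using only the values stated in Table 2–6 (hex 100, hex 200, and the root number in bits 31:28); the constant and function names are hypothetical.

```python
# Flag values taken from Table 2-6; the names are illustrative only.
SOLICIT_FILE_NAME = 0x100
HALT_BEFORE_TRANSFER = 0x200


def r5_value(root, *flags):
    """Combine a system root number (0 to F, bits 31:28) with flag values."""
    if not 0 <= root <= 0xF:
        raise ValueError("root must be a hex digit from 0 to F")
    value = root << 28
    for flag in flags:
        value |= flag
    return value


if __name__ == "__main__":
    # Root 1 with a prompt for the secondary bootstrap file name.
    print("/R5:{:X}".format(r5_value(1, SOLICIT_FILE_NAME)))
```

For root 1 with the solicit flag, the sketch yields the hexadecimal value 10000100.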

2.8.2 CLEAR
CLEAR BOOT deletes a boot-spec. CLEAR ERRORS clears the error frame of the
previously detected error. If you do not clear the error frame, the next error is not
recorded in the error frame. CLEAR BROKE clears the broke bit in EEPROM.
The following CLEAR syntax deletes a boot-spec:
CLEAR BOOT <name>
The following CLEAR syntax clears the error frame:
CLEAR ERRORS
The following CLEAR syntax clears the broke bit ID in EEPROM:
CLEAR BROKE[/PATH=path-number]
Table 2–7 describes the /PATH=path-number qualifier.
Table 2–7 Qualifier for CLEAR
Qualifier          Function
/PATH=path-number  Specifies the zone and slot number of the module to clear. The path-number format is zss, where:
                   z is the zone ID (A or B).
                   ss is the slot number (0 to 2, 10 to 17, 20 to 27) of an adapter connecting to a DSSI device.

CLEAR BROKE clears the module ID EEPROM in the zone that is running.


2.8.3 CONTINUE
CONTINUE exits the CIO mode and returns operation to the PIO mode.
Caution
Use CONTINUE to continue from a system halt. Use START/ZONE to
continue from a zone halt.

The CONTINUE syntax is:
CONTINUE

2.8.4 DEPOSIT
DEPOSIT stores the specified data in the specified address.
When the system is initialized or when any transition from a running to a halted
state occurs, the defaults are physical address space 0 and data size longword.
The DEPOSIT syntax is:
DEPOSIT[/{B,W,L,Q}][/{G,I,M,P,V,U}][/N:count] address-spec data-spec
The address-spec identifies a physical or virtual hexadecimal memory address. A
qualifier may be placed before or after an address-spec or data-spec.
The data-spec identifies a hexadecimal number to be stored, unless the default
radix has been changed with a %D introducer. When you do not supply a
data-spec, a value of 0 is assumed.
Table 2–8 describes the qualifiers. Table 2–9 lists the address-spec symbolic
addresses.
Table 2–8 Qualifiers for DEPOSIT
Qualifier  Function
/B         Sets the data size to byte.
/W         Sets the data size to word.
/L         Sets the data size to longword.
/Q         Sets the data size to quadword.
/G         Sets general purpose register address space R0 through PC.
/I         Sets internal processor register (IPR) address space accessed by the MTPR and MFPR instructions.
/P         Sets physical address space.
/V         Sets virtual address space. An EXAMINE to virtual memory returns the translated physical address. A DEPOSIT to virtual memory sets the PTE <M> bit.
/U         Sets access to console private memory. This qualifier must be specified for each command.
/N:count   Specifies the number of consecutive locations to modify. The console deposits to the first address, then to the specified number of succeeding addresses. This qualifier must be specified for each command.
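
Because /N:count deposits to the first address and then to count succeeding addresses, a command touches count + 1 locations. The Python sketch below lists them for a memory address space; the names are hypothetical, and it assumes that succeeding addresses advance by the current data size (1, 2, 4, or 8 bytes for byte, word, longword, or quadword).

```python
# Data sizes in bytes for the /B, /W, /L, and /Q qualifiers.
DATA_SIZES = {"B": 1, "W": 2, "L": 4, "Q": 8}


def deposit_addresses(start, count, size="L"):
    """Return the addresses written by a deposit with /N:count.

    The first address is written, followed by `count` succeeding
    addresses, so the result has count + 1 entries.
    """
    step = DATA_SIZES[size]
    return [start + index * step for index in range(count + 1)]


if __name__ == "__main__":
    for address in deposit_addresses(0x200, 3, "L"):
        print("{:08X}".format(address))
```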


Table 2–9 Address-Spec Symbolic Addresses
Symbolic Address  Description
R<n>              General purpose register number n, where n is a decimal number 0 to 15.
FP                Frame pointer.
AP                Argument pointer.
SP                Stack pointer.
PC                Program counter.
PSL               Program status longword.
+                 A location following the last location accessed by an EXAMINE or DEPOSIT. The location is the last address plus the size of the last reference (1 for byte, 2 for word, 4 for longword).
-                 A location preceding the last location accessed by an EXAMINE or DEPOSIT. The location is the last address minus the size of the last reference (1 for byte, 2 for word, 4 for longword).
*                 The last location referenced by an EXAMINE or DEPOSIT.
@                 Indirect addressing. The address-spec is used as a pointer to the data. The format is @address-spec, where address-spec can be any valid address except another @. See Example 2–1.

Note
Remember that the symbolic addresses from the previous command are
used for indirect addressing. See Example 2–1.

Example 2–1 Indirect Addressing
>>> DEPOSIT R0 200     ! The value 200 is stored directly in R0. The defaults
                       ! are set to longword, general purpose register.
>>> DEPOSIT/P @R0 200  ! The value 200 is stored directly in the address pointed
                       ! to by R0. The /P qualifier tells the parser that the
                       ! value in R0 should be treated as a physical address.
                       ! The defaults are set to longword, physical.
>>> DEPOSIT/V @R0 200  ! The value 200 is stored directly in the address pointed
                       ! to by R0. The /V qualifier tells the parser that the
                       ! value in R0 should be treated as a virtual address.
                       ! The defaults are set to longword, virtual.
>>> DEPOSIT @200       ! The value 200 is stored in the address specified in
                       ! the previous command. The defaults are set to longword,
                       ! virtual.


2.8.5 DUP
DUP connects to the DSSI DUP service on a selected node. DUP is used to
examine and modify the parameters of a DSSI device.
DUP syntax is:
DUP[/PATH:<path-number>] node-id [/TASK:task]
The node-id identifies the node number (0 to 7) of a DSSI device attached to
the console. Table 2–10 describes the qualifiers.
Table 2–10 Qualifiers for DUP
Qualifier            Function

/PATH=path-number    Specifies the zone and slot number of an adapter
                     connecting to a DSSI device. The path-number format is
                     zss, where:
                       z is the zone ID (A or B).
                       ss is the slot number (10 to 17, 20 to 27) of an
                       adapter connecting to a DSSI device.
node-id              Specifies the DSSI node connecting to a DSSI device.
                     Valid node-ids are 0 to 5.
/TASK:task           Invokes a task from a DSSI device. Valid DUP tasks are:
                       DRVEXR
                       DRVTST
                       HISTRY
                       DIRECT
                       ERASE
                       VERIFY
                       DKUTIL
                       PARAMS
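
The zss path-number format can be checked mechanically before a command is typed at the console. The following is a minimal Python sketch, not part of the console firmware; the names DssiPath and parse_path_number are hypothetical, and accepting lowercase input is an assumption of the sketch.

```python
import re
from typing import NamedTuple


class DssiPath(NamedTuple):
    """Hypothetical container for a parsed path-number (zss)."""
    zone: str   # zone ID, "A" or "B"
    slot: int   # adapter slot number


# Slot numbers listed for DSSI adapters: 10 to 17 and 20 to 27.
VALID_SLOTS = set(range(10, 18)) | set(range(20, 28))
_PATH_RE = re.compile(r"([AB])(\d{2})")


def parse_path_number(text: str) -> DssiPath:
    """Split a path-number such as 'A10' into zone and slot, or raise ValueError."""
    # Assumption: input is folded to uppercase before matching.
    match = _PATH_RE.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"path-number must have the form zss: {text!r}")
    zone, slot = match.group(1), int(match.group(2))
    if slot not in VALID_SLOTS:
        raise ValueError(f"slot {slot} is outside 10 to 17 and 20 to 27")
    return DssiPath(zone, slot)


if __name__ == "__main__":
    print(parse_path_number("A10"))
    print(parse_path_number("B27"))
```

A value such as A18 or C10 is rejected by this sketch because it falls outside the zone and slot ranges given in Table 2–10.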

2.8.6 EXAMINE
EXAMINE displays the contents of the specified memory location or register. The
display line consists of:
• A single-character address specifier
• The hexadecimal physical address to be examined
• The examined data in hexadecimal

When the system is initialized or when any transition from a running to a halted
state occurs, the defaults are physical address space 0 and data size longword.
The EXAMINE syntax is:
EXAMINE[/{B,W,L,Q}][/{G,I,M,P,V,U}][/N:count][/A][address-spec]
The address-spec identifies a physical or virtual hexadecimal memory address. A
qualifier may be placed before or after the address-spec.
Table 2–11 describes the qualifiers. Table 2–12 lists the address-spec symbolic
addresses.


Table 2–11 Qualifiers for EXAMINE
Qualifier   Function

/B          Sets the data size to byte.
/W          Sets the data size to word.
/L          Sets the data size to longword.
/Q          Sets the data size to quadword.
/G          Sets general purpose register address space R0 through PC.
/I          Sets internal processor register (IPR) address space accessed
            by the MTPR and MFPR instructions.
/P          Sets physical address space.
/V          Sets virtual address space. An EXAMINE to virtual memory
            returns the translated physical address. A DEPOSIT to virtual
            memory sets the PTE <M> bit.
/U          Sets access to console private memory. This qualifier must be
            specified for each command.
/N:count    Specifies the number of consecutive locations to examine. The
            console examines the first address, then the specified
            number of succeeding addresses. This qualifier must be
            specified for each command.
/A          Interprets and displays the data as ASCII characters.
            Nonprinting characters are displayed as periods.

Table 2–12 Address-Spec Symbolic Addresses
Symbolic Address   Description

R<n>               General purpose register number n, where n is a decimal
                   number 0 to 15.
FP                 Frame pointer.
AP                 Argument pointer.
SP                 Stack pointer.
PC                 Program counter.
PSL                Program status longword.
+                  The location following the last location accessed by an
                   EXAMINE or DEPOSIT. The location is the last address plus
                   the size of the last reference (1 for byte, 2 for word,
                   4 for longword).
-                  The location preceding the last location accessed by an
                   EXAMINE or DEPOSIT. The location is the last address minus
                   the size of the last reference (1 for byte, 2 for word,
                   4 for longword).
*                  The last location referenced by an EXAMINE or DEPOSIT.
@                  Indirect addressing. The address-spec is used as a pointer
                   to the data. The format is @address-spec, where
                   address-spec can be any valid address except another @.
                   See Example 2–1.

Note
Remember that the symbolic addresses from the previous command are
used for indirect addressing. See Example 2–1.
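
The address arithmetic behind the "following", "preceding", and "last" symbolic addresses can be illustrated with a short Python sketch. It only models the rule stated in the table (last address plus or minus the size of the last reference); the function name and the "next", "previous", and "last" keywords are hypothetical labels chosen for the sketch, and quadword references are left out because the table gives no size for them.

```python
# Size in bytes of the last reference, as listed in the table.
SIZE_BYTES = {"byte": 1, "word": 2, "longword": 4}


def resolve_relative(which: str, last_address: int, last_size: str) -> int:
    """Return the address selected relative to the last EXAMINE or DEPOSIT.

    which is a hypothetical keyword: "next", "previous", or "last".
    """
    step = SIZE_BYTES[last_size]
    if which == "next":
        return last_address + step
    if which == "previous":
        return last_address - step
    if which == "last":
        return last_address
    raise ValueError(f"unknown selector: {which!r}")


if __name__ == "__main__":
    # After a longword reference to 200 (hex), the following location is 204.
    print(format(resolve_relative("next", 0x200, "longword"), "X"))
```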


2.8.7 FIND
FIND searches the main memory beginning at physical address space 0 for either
a page-aligned 512-Kbyte segment of memory, or a restart parameter block (RPB).
When FIND is successful, it saves the address plus the segment of memory (or
RPB) in the stack pointer. When FIND is unsuccessful, an error message is
displayed and the contents of the stack pointer are unpredictable.
The FIND syntax is:
FIND
Table 2–13 describes the qualifiers.
Table 2–13 Qualifiers for FIND
Qualifier   Function

/MEMORY     Searches main memory for a page-aligned 512-Kbyte segment
            of memory.
/RPB        Searches main memory for a restart parameter block. The
            search leaves memory unchanged.

2.8.8 HELP
HELP displays a summary of the commands, their arguments, and qualifiers.
When you supply a command name, HELP displays the arguments and qualifiers
for that command only. HELP does not provide complete descriptions of the
commands.
The HELP syntax is:
HELP [command]
Or:
? [command]


2.8.9 INITIALIZE
INITIALIZE performs the steps shown in Table 2–14.
Table 2–14 INITIALIZE Steps
Step  Action

1.    Do hard reset of zone (the cross-link state is set to off).
2.    Do hard reset of all available ATMs.
3.    Initialize hardware.
4.    Reconfigure the zone and update the device configuration block
      (DCB) to reflect the zone status.
5.    Execute the Duplex Compatibility Test.
6.    Load the firmware into the console main loop.

The INITIALIZE syntax is:
INITIALIZE

2.8.10 MOVE
MOVE transfers the specified number of bytes (count) from the source-address to
the destination-address.
The MOVE syntax is:
MOVE source-address destination-address count
The source-address is the starting address of the data. The destination-address
is the starting address of the destination. The count is the number of bytes to be
moved.

2.8.11 MATCH_ZONES
MATCH_ZONES copies the system-wide module data EEPROM from the other
zone. MATCH_ZONES does not copy the zone-specific module data EEPROM.
Use MATCH_ZONES only when:
• The cross-link state is set to off, and
• The path to the other zone is available. (The cross-link cables are
  connected and the other zone power is on.)

The MATCH_ZONES syntax is:
MATCH_ZONES


2.8.12 REPEAT
REPEAT continuously executes the specified command. REPEAT applies to the
following commands only.
• DEPOSIT
• EXAMINE

REPEAT can be aborted by pressing Ctrl/C at the console keyboard.
The REPEAT syntax is:
REPEAT command

2.8.13 SET
SET modifies the value of the specified variable.
The SET syntax is:
SET variable value [value]
Note
SET does not allow abbreviations. You must enter the name of the
variable completely.

Table 2–15 lists the variables with the acceptable values.
Table 2–15 SET Variables and Values
Variable       Description                  Acceptable Values

BOOT DEFAULT   Default boot specification.  Up to 80 characters of ASCII text
MODE           Boot mode.                   FAILSTOP = Simplex mode
                                            FAILSAFE = Duplex mode
RESTART        Halt action switch.          HALT = Enter console mode
                                            BOOT = Boot
                                            RESTART = Restart
BAUD           Console port speed.          300, 600, 1200, 2400, 4800, 9600,
                                            19200, 38400
ZONE           Zone identification.         A = Zone A
                                            B = Zone B
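
The acceptable values in Table 2–15 lend themselves to a simple lookup when a service script needs to check a SET command before it is sent to the console. The following Python sketch is illustrative only; the function name is hypothetical, and the exact-match comparison (no abbreviations, no case folding) is an assumption based on the note that SET does not allow abbreviations.

```python
# Acceptable values copied from Table 2-15.
ACCEPTABLE_VALUES = {
    "MODE": {"FAILSTOP", "FAILSAFE"},
    "RESTART": {"HALT", "BOOT", "RESTART"},
    "BAUD": {"300", "600", "1200", "2400", "4800", "9600", "19200", "38400"},
    "ZONE": {"A", "B"},
}
MAX_BOOT_DEFAULT_CHARS = 80  # "Up to 80 characters of ASCII text"


def is_acceptable(variable: str, value: str) -> bool:
    """Return True when value is listed as acceptable for the SET variable."""
    if variable == "BOOT DEFAULT":
        return value.isascii() and len(value) <= MAX_BOOT_DEFAULT_CHARS
    allowed = ACCEPTABLE_VALUES.get(variable)
    # Unknown or abbreviated variable names are rejected.
    return allowed is not None and value in allowed


if __name__ == "__main__":
    print(is_acceptable("BAUD", "9600"))
    print(is_acceptable("MODE", "DUPLEX"))
```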


2.8.13.1 SET BOOT
SET BOOT saves the values of boot-specs. Space for nine boot-specs is available
on the CPU module EEPROM. The first space is reserved for the default
boot-spec. The other eight spaces are available to the user.
The SET BOOT syntax is:
SET BOOT DEFAULT value
Or:
SET BOOT boot-spec value
The boot-spec may be up to 8 characters of ASCII text. The value is the ASCII
text assigned to the boot-spec.

2.8.14 SHOW
SHOW displays information about the specified variable. When the cross-link
state is off (Simplex mode), information about the current zone is displayed.
When the cross-link state is on (Duplex mode), information about both zones is
displayed.
The SHOW syntax is:
SHOW variable
Table 2–16 lists the variables. You must supply a variable.
Table 2–16 SHOW Variables
Variable                Description                    Acceptable Values

DEFAULT                 Default specification.         Up to 80 characters of ASCII text
MODE                    Boot mode.                     FAILSTOP = Simplex mode
                                                       FAILSAFE = Duplex mode
RESTART                 Halt action switch.            HALT = Enter console mode
                                                       BOOT = Boot
                                                       RESTART = Restart
BAUD                    Console port speed.            300, 600, 1200, 2400, 4800, 9600,
                                                       19200, 38400
ZONE                    Zone identification.           A = Zone A
                                                       B = Zone B
BOOT                    Displays the saved boot
                        specifications.
CONFIGURATION           Displays the current system
                        configuration, including the
                        identity and status of any
                        modules in the system.
VERSION                 Displays the firmware
                        revision of all ROMs in the
                        system.
DSSI/PATH=path-number   Specifies the zone and
                        slot number of an adapter
                        connecting to a DSSI device.
                        The path-number format is
                        zss, where:
                          z is the zone ID (A or B).
                          ss is the slot number
                          (10 to 17, 20 to 27) of an
                          adapter connecting to a
                          DSSI device.
ETHERNET                Displays the physical
                        Ethernet addresses.
MEMORY                  Displays system memory
                        information.
STATE                   Displays the state of the
                        cross-link and the system
                        cables.
ERRORS                  Displays the diagnostic error
                        frames. Not allowed if the
                        cross-link state is on.
ALL                     Displays the contents of all
                        variables.

2.8.15 START
START begins execution of the operating software from the specified address.
START is equivalent to DEPOSIT PC followed by CONTINUE.
The START syntax is:
START address-spec
You must supply an address-spec.


2.8.16 TEST
TEST enables the user to test:
• The system
• A zone
• The CPU and memory

Use TEST only when the cross-link state is set to off.
The TEST syntax is:
TEST [qualifier(s)]
Tables 2–17 and 2–18 describe the TEST selection and control qualifiers.
Table 2–17 Qualifiers for TEST Selection
Qualifier        Function

/GROUP:n (1)     Specifies a decimal number from 0 to 5 that identifies the
                 group of tests to be run.
/TEST:n (1)      Specifies a decimal number from 0 to 32 that identifies the
                 tests to be run.
/SUBTEST:n (1)   Specifies a decimal number from 0 to 32 that identifies the
                 subtests to be run.
/VERBOSE         Enables a display of all individual tests during execution.
/NOTRACE         Disables test traces.

(1) n can be a:
• Single value
• Range separated by a colon (1:5)
• List separated by commas (1,5,9)
• Combination of range and list (1:6,8,10,11:29)
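
The forms that n can take (single value, colon-separated range, comma-separated list, or a combination) can be expanded into an explicit list of test numbers. The following Python sketch shows one way to do that; it is an illustration, not the console's own parser, and the function name and the low/high limit arguments are hypothetical.

```python
from typing import List


def expand_selection(spec: str, low: int, high: int) -> List[int]:
    """Expand a selection such as '1:6,8,10,11:29' into a list of numbers.

    low and high are the inclusive limits for the qualifier
    (for example 0 and 32 for /TEST).
    """
    numbers: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            first_text, last_text = part.split(":", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError(f"range is reversed: {part!r}")
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(part))  # int('') raises ValueError for empty items
    for number in numbers:
        if not low <= number <= high:
            raise ValueError(f"{number} is outside {low} to {high}")
    return numbers


if __name__ == "__main__":
    print(expand_selection("1:5", 0, 32))
    print(expand_selection("1,5,9", 0, 32))
```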


Table 2–18 Qualifiers for TEST Control
Qualifier      Function

/PASSCOUNT:n   n is a decimal number from 0 to MAXINT. When n is 0, the
               passcount is infinite.
/NOTRACE       Disables the test traces.
/COE           Continues on error.
/NOCONFIRM     Disables the test confirmation on destructive tests.
/EXTENDED      Enables extended error reports.
/NOSTATUS      Disables status messages and reports.
/LIST          Lists the available tests, but does not run them.

When you do not supply the qualifier(s), TEST runs all the nonextended tests
(except those that require confirmation).

2.8.17 X(transfer)
X is used by automatic systems communicating with the console. X is not
intended for use by operators.
X loads or unloads the count of bytes beginning at the specified address.
When the high-order bit of the count longword is 1, the data is read from physical
memory to the console terminal. When the high-order bit of the count longword
is 0, the data is written from the console terminal to physical memory.
The X syntax is:
X address-spec count Return data-stream checksum

The address-spec is a hexadecimal number that specifies a physical address.
The count is an 8-bit hexadecimal number that specifies a number of bytes.
The data-stream contains the bytes to be transferred by X. The checksum is a
2-digit hexadecimal number that specifies the 2’s complement checksum of the
data-stream. The checksum verifies the data-stream.
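
The two rules above (transfer direction from the high-order bit of the count longword, and a 2's complement checksum over the data-stream) can be sketched in Python as follows. The function names are hypothetical. The checksum is assumed to be the 8-bit negation of the byte sum, so that the sum of the data bytes and the checksum is 0 modulo 256; the manual does not spell out the exact arithmetic.

```python
HIGH_ORDER_BIT = 0x80000000  # bit 31 of the count longword


def transfer_direction(count_longword: int) -> str:
    """Return the direction selected by the high-order bit of the count."""
    if count_longword & HIGH_ORDER_BIT:
        return "memory to console"   # data is read from physical memory
    return "console to memory"       # data is written to physical memory


def twos_complement_checksum(data_stream: bytes) -> int:
    """Return an 8-bit checksum (assumed: negated byte sum, modulo 256)."""
    return (-sum(data_stream)) & 0xFF


if __name__ == "__main__":
    data = bytes([0x01, 0x02, 0x03])
    checksum = twos_complement_checksum(data)
    print(format(checksum, "02X"))                 # 2-digit hexadecimal form
    print((sum(data) + checksum) & 0xFF == 0)      # checksum cancels the byte sum
```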


2.8.18 Z
Z connects to the firmware of another module in the system.
The Z syntax is:
Z[/PATH=path-number]
Table 2–19 describes the qualifier.
Table 2–19 Qualifier for Z
Qualifier           Function

/PATH=path-number   Specifies the zone and slot number of a module. The
                    path-number format is zss, where:
                      z is the zone ID (A or B).
                      ss is the slot number of the module.

When you do not supply a path, Z tries to connect to the module in slot 1 of the
zone that is running.
Note
Z performs a hard reset on the ATMs, but you need to issue a programmed
reset to load and start the functional firmware. After Z, you must issue a
BOOT from the same zone, or a START/ZONE from the other zone (if that
zone is running the operating system).

2.8.19 !(comment)
The ! (exclamation point) prefixes a comment. The text following the ! is ignored.
The ! syntax is:
!(comment)
Or:
command!(comment)


3
System Maintenance
3.1 In This Chapter
This chapter includes:
• Maintenance strategy
• Operating rules and cautions
• General troubleshooting procedure
• Module fault LEDs
• Power system overview
• Power system maintenance
• Device status and fault indicators
• ROM-based diagnostics

3.2 Maintenance Strategy
When a hardware component fails, the Model 810 system uses self-diagnosis
through ROM-based diagnostics (RBDs) to isolate the faulty FRU. Once isolated,
the system automatically:
• Places the faulty FRU off line
• Reports the error in the error log
• Identifies the faulty FRU on the console terminal
• Turns on the faulty FRU fault LED


3.3 Operating Rules and Cautions
Table 3–1, Table 3–2, and Table 3–3 contain operating rules for use during a
service call. Table 3–4 provides cautions.
Table 3–1 Before Stopping a Zone
Step  Action

1.    Do not depend on the accuracy of a zone ID label. Issue SHOW ZONE before
      STOP/ZONE to check the states of both zones.
2.    Issue SHOW SYSTEM to make sure that the FTSS$SERVER process is running
      before turning off zone power, or pressing the Break key.
3.    Check both zone control panels. The System Fault indicator in the failing
      zone should be on.
4.    Check console messages and error log for related problem information.
5.    Always issue SHOW DEV D before STOP/ZONE to make sure that shadow set
      copying is not in progress.
6.    Issue STOP/ZONE. Wait for the zone to initialize, and then turn off zone
      power.
7.    Remove the cross-link assembly.

Table 3–2 After a Zone is Repaired
Step  Action

1.    Replace the cross-link assembly.
2.    Turn on zone power.
3.    Issue SHOW MODE to make sure that the zone is set to: MODE = FAILSAFE.
4.    Issue START/ZONE.
5.    Check the running zone console for the following message:
      % FTSS-S-ZONEAVAIL.
6.    If the message in step 5 does not appear on the console, consider
      replacing the cross-link assembly.
7.    Monitor the console for the following environmental information messages:
      "OPERATING ON EXTERNAL POWER"
      "OPERATING BATTERY POWER" (Life approx 1 hr.)
      "NORMAL ZONE TEMPERATURE"
      "YELLOW ZONE TEMPERATURE"
      "BATTERY TEST PASSED IN CABINET....."
      "BATTERY TEST FAILED" (Battery not present)
      FTSS messages....


Table 3–3 Before Leaving the Site
Step  Action

1.    Issue SHOW DEVICE D to make sure that all disks are either shadow set
      members or in the process of being copied.
2.    Issue SHOW DEVICE E to make sure that all EP/EF drivers are on line.
3.    Use FTSS$FSM to show the failover set status:

      MCR FTSS$FSM Return
      FSM> SHOW ADAPTER Return

4.    Issue SHOW DEV PW to make sure the PW driver is on line.
5.    Issue SHOW CLUSTER/CONTINUE (ADD CIRCUITS, CONNECTIONS,LPORT,RPORT)
      to check for correct DSSI configuration:

      $ SHOW CLUSTER/CONTINUE Return
      COMMAND> ADD CIRCUITS, CONNECTIONS,LPORT,RPORT Return

      [Sample SHOW CLUSTER display. Classes: SYSTEMS, MEMBERS, CIRCUITS,
      CONNECTIONS. Columns: NODE, SOFTWARE, STATUS, LPORT, RPORT, RP_TYP,
      CIR_STA, LOC_PROC_NAME, CON_STA. The sample lists node FTSYS
      (VMS V5.4) with circuits on local ports PWA0, PWB0, PWF0, and PWG0
      to remote ports 6 and 7 (RP_TYP SWIFT), and nodes SYSA, USERS, FTTA,
      SYSB, DISK1, and FTTB (software RFX V200 or RFX V246, RP_TYP RF35).
      Every circuit state shown is OPEN. The connections shown include
      SCS$DIRECTORY (LISTEN), MSCP$TAPE, MSCP$DISK, and VMS$DISK_CL_DRVR
      (OPEN).]

6.    Make sure that the Break keys on both zones are disabled (zone control
      panel SECURE LED is on).


Table 3–4 Cautions
1.    Do not press ZONE HALT ENABLE and the Break key to stop a running zone.
      Use STOP/ZONE. If ZONE HALT ENABLE is used, CONTINUE will not resume
      zone operation.
2.    Do not press the Break key or cycle power during the power on or RBD
      tests. This action may corrupt the EEPROM.
3.    Do not perform a Simplex boot (MODE = FAILSTOP) from a disk used by the
      running zone. This action may corrupt the disk.
4.    Do not turn off zone power or halt a zone if the FTSS$SERVER is not
      loaded and running.

3.4 General Troubleshooting Procedure
Table 3–5 provides a general procedure for isolating and replacing a faulty FRU.
While the repair is being performed, the user application continues to run.
Table 3–5 General Troubleshooting Procedure
Step  Action

1.    Check both zone control panels. The System Fault indicator in the failing
      zone should be on.
2.    If the zone is not already stopped, ask the system manager or other
      responsible system person to perform a SHOW ZONE and STOP/ZONE.
      After the system manager stops the zone, remove the cross-link assembly.
      If you are given permission to stop the zone, use the procedure specified
      in Table 3–1.
3.    Check all fault LEDs and the console messages.
      To verify that the correct FRU has been isolated, check the error log.
      If a fault LED is on or a console message indicates that an FRU has been
      removed from service, replace the FRU. (See Chapter 5, FRU Removal and
      Replacement Procedures.)

      Note
      Before removing and replacing any module, check the Power Module
      indicators (Table 3–9) to rule out any potential power problems.

4.    If the replaced FRU corrected the problem, turn on zone power.
5.    If the repaired zone passes the power on diagnostics, turn off zone power
      and reconnect the cross-link assembly.
6.    Turn on zone power. If the power on diagnostics and the duplex
      compatibility test pass with the cross-link assembly connected, turn the
      system over to the system manager.
      The system manager is responsible for synchronizing the system and
      returning it to duplex operation.

7.    If the replaced FRU did not correct the problem, open the system cabinet
      front door. Check all module and disk drawer fault LEDs.
      If any fault LED is on, replace the associated module or device. (See
      Chapter 5, FRU Removal and Replacement Procedures.)
8.    If no module or disk fault LED is on, open the system cabinet rear door.
      Check all module LEDs in the miscellaneous and interface module card
      cages.
      If a fault LED is on, replace the associated module. (See Chapter 5, FRU
      Removal and Replacement Procedures.)
9.    If no module fault LED is on, open the expansion cabinet rear door. Check
      the disk power fault indicators to eliminate any potential power
      problems. (See Figure 3–7 and Figure 3–9.)
      If a power fault indicator is on, replace the device. (See Chapter 5, FRU
      Removal and Replacement Procedures.)
10.   If no power fault indicator is on, open the expansion cabinet front door
      and check all disk and tape unit fault LEDs and indicators. (See
      Figure 3–6, Figure 3–8, and Table 3–23.)
      If any LED or fault indicator is on, replace or repair the failing
      device. (See Chapter 5, FRU Removal and Replacement Procedures.)
11.   If no fault LEDs or indicators are on, run the error log utility. (See
      Chapter 4, Error Handling and Analysis.)
      Use the OpenVMS HELP facility to help you run the utility as shown in the
      following example.
      Qualifier examples can be displayed at the ANALYZE Subtopic? prompt as
      shown at the end of the code example.

$ HELP ANALYZE/ERROR_LOG
ANALYZE
  /ERROR_LOG
    Invokes the Errorlog Report Formatter (ERF) to selectively report
    the contents of an error log file. The /ERROR_LOG qualifier is
    required. For a complete description of the OpenVMS Analyze Error
    Log Utility, including more information about the ANALYZE/ERROR_LOG
    command and its qualifiers, see the OpenVMS Error Log Utility Reference
    Manual.
    Format:
      ANALYZE/ERROR_LOG [file-spec[,...]]
  Additional information available:
  Parameters  Command_Qualifiers
  /BEFORE     /BINARY     /BRIEF      /ENTRY      /EXCLUDE    /FULL
  /INCLUDE    /LOG        /OUTPUT     /REGISTER_DUMP          /REJECTED
  /SID_REGISTER           /SINCE      /STATISTICS             /SUMMARY
  Examples

ANALYZE /ERROR_LOG Subtopic? Return
ANALYZE Subtopic? Examples Return
12.   If the problem cannot be isolated and repaired, escalate the service call
      to the Customer Service Center for further action.

3.5 Module Fault LEDs
Figure 3–1 shows all module fault LED locations. Table 3–6 identifies each
module.
Figure 3–1 Module Fault LEDs

[Figure: rear and front views of the CPU cabinet with numbered callouts for
the module fault LED locations. MR−0049−93RAGS]


Table 3–6 Key to Figure 3–1, Module Fault LEDs
Key   Module

1     CPU module
2     ATM module
3     System Fault (zone control panel)
4     Front end unit
5     DC3 converter
6     DC5 converter
7     Power system controller
8     Console module
9     CAMP module
10    DSSI and Ethernet interface modules

3.6 Power System Overview
The following sections describe the power distribution and power components.
Figure 3–2 and Figure 3–3 are basic block diagrams of the system power and
power distribution.
Table 3–7 provides a functional summary of the power components. Table 3–8 is
a DC voltage summary.


Figure 3–2 Power System Block Diagram (1 of 2)

[Block diagram. Labels shown in the figure:]

UTILITY POWER INPUT
  120 Vac, 60 Hz
  240 Vac, 50 Hz
  Optional Uninterruptible Power System
AC POWER OUTPUT AND DISTRIBUTION
  Power Distribution Boxes
  With UPS: AC Power Distributed to System and Expansion Cabinets
  Without UPS: AC Power Distributed to Expansion Cabinet
DC POWER OUTPUT AND DISTRIBUTION
  Blocks: Front End Unit, DC5, DC3, CAMP Module, Zone A and B LDCs,
  Zone A and B Disk Extender Modules, Console Extender Module,
  Interface Module
  Signals and voltages:
    48V_DRCT
    48V_SWD
    5 Vdc to Centerplane to CPU/IO ATM/Console Extender/Interface Modules
    3.3 Vdc and 12 Vdc to Centerplane to CPU/IO ATM/Console
    Extender/Interface Modules
    Thermal Emulator Output to Power System Control (PSC)
    2 Vdc Output Not Used on Model 810
    21 Vdc to CPU and IO ATM Module Clock Logic
    48V_PSC to PSC
    I2C Bus Power to Module Fault LEDs
    5 Vdc In−Zone Disk Control Panel
    5 Vdc Terminal DC Power
    12 Vdc LDC Control Card
    LDC Control and Status to CAMP Module
    DC3 3.3 Vdc/12 Vdc Input
    DC5 5 Vdc Input
    Console Extender Module −12 Vdc Input
    −12 Vdc to Centerplane/CPU/IO ATM/Interface Modules
MR−0500−92RAGS−A


Figure 3–3 Power System Block Diagram (2 of 2)

[Block diagram. Labels shown in the figure:]

DC POWER OUTPUT AND DISTRIBUTION
  Blocks: CPU Module, IO ATM Module (with Internal DC to DC Converter,
  12 Vdc and 3.3 Vdc), Power System Control
  Inputs shown for the CPU Module and the IO ATM Module:
    DC3 3.3 Vdc/12 Vdc Input
    DC5 5 Vdc Input
    Console Extender Module −12 Vdc Input
  Power System Control labels:
    I2C Bus−Power Status to System
    Power Fail Function (POK_H) to I/O Devices and Options
    From CAMP Module: Initiate Power On Sequence
    DC3 Thermal Emulator Input
    System Temperature Monitor
    Centerplane DC Voltage Monitor
    LDC Status Monitor
    Initiate Overtemperature Power Off Sequence
    Initiate Overvoltage Power Off Sequence
    Initiate Undervoltage Power Off Sequence
    Fan Speed Commands to CAMP Module
    Report LDC Status and Faults to System
MR−0500−92RAGS−B


Table 3–7 Power System Functional Summary
FRU                     Functional Summary

Local Disk Converter    An LDC is located in each in-zone disk drawer. It provides
(LDC)                   +12 Vdc with fast transient response and tolerance to
                        short-term loading during disk spinup. It also provides
                        +5 Vdc for power logic, and EMI filtering for the 48 V
                        bus.
                        It provides VTERM, which is a 5 V diode-isolated,
                        current-limited output for powering the I/O bus
                        terminators. Fusing is included to prevent a fault on
                        one LDC from loading the 48 V bus and crashing the
                        entire power system.

Front End Unit (FEU)    Provides the main ac circuit breaker, and generates two
H7884-AA                +48 V outputs:
                        • Unswitched (DRCT), which supports the CAMP and disk
                          extender modules, LDCs, DC3, and DC5
                        • Switched (SWD), which supports the interface modules,
                          and console and disk extender modules
                        Also provides programmable fan power output from +11 to
                        +27 Vdc, which allows the system to adjust the fan speed
                        based on system temperature. The PSC monitors the system
                        temperature through a thermal emulator in DC3, and sends
                        fan speed commands through the CAMP module to the FEU to
                        adjust the fan power output.

Power System            An I2C bus allows the PSC to write power status
Controller (PSC)        information to the system, and provides a power fail
H7851-AA                signal (POK_H) to the mass storage devices and I/O
                        options. Receives commands from the CAMP module to
                        initiate the logic power on sequence by commanding the
                        FEU to turn on the +48 V switched output and enable the
                        DC3 and DC5 outputs.
                        The PSC also drives the power system visual status
                        indicators. It monitors system temperature through the
                        thermal emulator in DC3 and sends fan speed commands
                        through the CAMP module to the FEU for fan power and fan
                        speed control. Provides a warning when system
                        temperatures are beyond the normal operating range:
                          Green Zone = 5°C (41°F) to 52°C (126°F)
                          Yellow Zone = 5°C (41°F) to 62°C (144°F)
                          Red Zone = 5°C (41°F) to 75°C (167°F)
                        Initiates the power off sequence when system temperature
                        reaches the red zone.
                        The PSC monitors the centerplane voltages and initiates
                        a power off on an undervoltage fault; fires the crowbar
                        and initiates a power off on an overvoltage fault.
                        Also initiates a power off if the FEU indicates a 48 V
                        output is out of tolerance, or there is less than
                        4 milliseconds of reserve power, and on a fan failure.
                        The PSC monitors the LDC status and reports failures to
                        the system.

DC5 H7179-AA            DC-to-dc converter that provides +5 Vdc to the CPU, MMB,
                        SIMMs, I/O ATM, interface and console extender modules,
                        as well as +5 Vdc to the I/O ATM internal +5 Vdc to
                        +3.3 Vdc converter for the SOC.
                        Provides EMI filtering on the 48 V bus, and fusing to
                        prevent the power system from crashing due to a short
                        circuit on a converter input. Supports the crowbar SCR
                        on a 5 V overvoltage or undervoltage fault.

DC3 H7178-AA            DC-to-dc converter that provides +3.3 Vdc to the CPU,
                        I/O ATM, interface and console extender modules.
                        Provides +12 Vdc to the console extender module +12 V to
                        -12 V converter for the CPU and I/O ATM modules, and the
                        +21 V converter for the CPU and I/O ATM clock logic.
                        Provides EMI filtering on the 48 V bus, and fusing to
                        prevent the power system from crashing due to a short
                        circuit on a converter input. Supports the crowbar SCR
                        on a 3.3 V or 12 V undervoltage or overvoltage fault.
                        Provides system temperature sensing through the thermal
                        emulator.
                        The emulator provides system temperature information to
                        the PSC for system cooling fan speed control and for
                        power off in the event of an overtemperature condition.

CAMP module             Control and Miscellaneous Power module. Provides
                        miscellaneous custom power control circuits.

Console extender        Provides local and remote console terminal ports, modem
module                  port, and zone control panel interface.

Fan current sense       Monitors the fan current and rotation, and generates a
board (FCSB)            rotation signal to the CAMP module. The CAMP module in
                        turn generates a tachometer signal to the PSC for fan
                        speed monitoring and control.

Zone A and B power      Provide ac utility power to the peripheral devices.
controllers             Power controllers are located in the expansion cabinet.

Power I2C bus           Provides serial communication between the PSC, console
                        extender, and I/O ATM modules. The PSC uses the bus to
                        write power status information.
                        The I/O ATM uses the bus to control the zone control
                        panel LEDs through the console extender module. It also
                        writes the Ethernet hardware addresses.
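
The PSC temperature zones in Table 3–7 can be expressed as a small classifier. The Python sketch below is an interpretation, not the PSC firmware: the table lists every zone as starting at 5°C, so the sketch assumes the upper limit of each zone (52°C, 62°C, 75°C) is the threshold at which the next zone begins. The function names are hypothetical. The second helper only converts Celsius to Fahrenheit, which reproduces the Fahrenheit figures printed in the table.

```python
GREEN_MAX_C = 52.0    # upper limit listed for the green zone
YELLOW_MAX_C = 62.0   # upper limit listed for the yellow zone
RED_MAX_C = 75.0      # upper limit listed for the red zone


def temperature_zone(temp_c: float) -> str:
    """Classify a temperature, assuming the zones are contiguous bands."""
    if temp_c <= GREEN_MAX_C:
        return "green"
    if temp_c <= YELLOW_MAX_C:
        return "yellow"
    return "red"   # the PSC initiates the power off sequence in the red zone


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


if __name__ == "__main__":
    for limit in (GREEN_MAX_C, YELLOW_MAX_C, RED_MAX_C):
        print(limit, round(celsius_to_fahrenheit(limit)))
```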


Table 3–8 System DC Voltage Summary
Component                   Supplies . . .       To . . .

DC5 (H7179-AA)              +5 Vdc               CPU, I/O ATM, console extender, and
                                                 interface modules
DC3 (H7178-AA)              +3.3 Vdc             CPU, I/O ATM, console extender, and
                                                 interface modules
DC3 (H7178-AA)              +12 Vdc              CPU, I/O ATM, console extender, and
                                                 interface modules
FEU (H7884-AA)              +48V_DRCT (direct)   CAMP and disk extender modules, LDCs,
                                                 DC3, and DC5
FEU (H7884-AA)              +48V_SWD (switched)  Console extender, disk extender, and
                                                 interface modules
CAMP 48V_DRCT to 12 V       VBIAS12              I2C bus power to drive module fault
converter                                        LEDs
CAMP 48V_DRCT to 12 V       VBIAS5               CAMP module internal bias voltage
converter
Console extender module     -12 Vdc              CPU and I/O ATM modules
+48V_SWD to -12 V
converter
CAMP +12 V to +21 V         +21 Vdc              CPU and I/O ATM module clock logic
converter
FEU (H7884-AA)              11 Vdc to 27 Vdc     Programmable fan control power
Local disk converter        +5 Vdc               In-zone disk control panel
(LDC)
LDC                         +12 Vdc              LDC control card
LDC                         +5 VTERM             Terminal dc power

3.7 Power System Maintenance
Figure 3–4 shows the location of the power module controls and indicators.
Table 3–9 describes module functions and repair action.
Tables 3–10 through 3–17 describe the Fault ID Display codes of the PSC.


Figure 3–4 Power Module Controls and Indicators

[Figure: the FEU, DC3, DC5, PSC, and CAMP power modules with numbered
callouts for the controls and indicators. MR−0483−92RAGS]

Table 3–9 Key to Figure 3–4, Power Module Controls and Indicators
Item  Control/Indicator     Function                          Repair Action

1     AC Circuit Breaker
2     FEU Failure           When on, indicates the dc         Replace the FEU. See
                            output voltages for the FEU       Chapter 5.
                            are below the specified
                            minimum.
3     FEU OK                When on, indicates the dc
                            output voltages for the FEU
                            are above the specified
                            minimum.
4     DC3 Failure           When on, indicates that one of    Replace the dc converter.
                            the output voltages is not        See Chapter 5.
                            within the specified
                            tolerances.
5     DC3 OK                When on, indicates that the
                            output voltages are within the
                            specified tolerances.
6     AC Present            When on, indicates ac power is    If the system will not
                            present at the ac input           power on, and the ac LED
                            connector, regardless of the      is the only LED on, check
                            position of the circuit           the circuit breaker.
                            breaker.                          If ac power is not
                                                              present, check the power
                                                              source and power cord.
7     DC5 Failure           When on, indicates that one of    Replace the dc converter.
                            the output voltages is not        See Chapter 5.
                            within the specified
                            tolerances.
8     DC5 OK                When on, indicates that the
                            output voltages are within the
                            specified tolerances.
9     PSC Failure           When on, indicates a PSC fault.   Replace the PSC. See
                                                              Chapter 5.
10    PSC OK                When blinking, indicates the
                            PSC is performing power-on
                            self-tests.
                            When on, indicates the PSC is
                            functioning.
11    Over Temperature      When on, indicates that the       Set the circuit breaker
      Shutdown              PSC shut down the system          to off and wait 1 minute
                            because of an internal            before turning system
                            overtemperature condition.        power on.
                                                              Make sure the air intake
                                                              is unobstructed and that
                                                              the room temperature does
                                                              not exceed the maximum
                                                              requirement.
12    Fan Failure           When on, indicates a fan          Replace the fan. See
                            failure. Use the hexadecimal      Chapter 5.
                            number in the Fault ID Display
                            to isolate the fan.
13    Disk Drive Power      When on, indicates a disk drive   The faulty unit is
      Failure               power failure. Use the            probably the local disk
                            hexadecimal number in the         converter (LDC). To
                            Fault ID Display to isolate       isolate the LDC,
                            the storage compartment that      disconnect the drives on
                            houses the disk drive.            the specified bus, and
                                                              turn on system power.
                                                              If the indicator stays on
                                                              with the drives
                                                              disconnected, replace the
                                                              failing LDC. See
                                                              Chapter 5.
                                                              A cable or drive may also
                                                              be at fault.
14    Fault ID Display      Displays the power subsystem
                            fault codes.
15    PSC Reset Button      When out, indicates a PSC fault   Press in to reset.
                            condition.
16    CAMP Fan Fault        When on, indicates that a fan     Replace the fan. See
                            fault caused all disk drives      Chapter 5.
                            and tape drives to shut down.

Table 3–10 Fan, LDC, Temperature Error Codes
Error
Code

PSC
OK

PSC
Failure

LDC
Fault

FAN
Failure

Off

—1

—

Normal operation, displayed after PSC
passes self-test

—

Fan 1 failed

—

Fan 2 failed

—

Fan 3 failed

—

Fan 4 failed

—

Access door opened, or two or more fans
failed

—

LDCA (LDC0) failed

—

LDCB (LDC1) failed

—

LDCC (LDC2) failed

—

LDCD (LDC3) failed

—

LDCE (LDC4) failed

—

LDCF (LDC5) failed

—

LDCG (LDC6) failed

—

LDCH (LDC7) failed

Off

—

Temperature sensor failed, low reading

Off

—

Temperature sensor failed, high reading

—

Temperature in red zone

Error Description

1 Dash entries = LED state NOT changed by error

The PSC Fault ID Display provides a continuous, 1-character rotating display of
the 4-character error codes listed in Tables 3–11 to 3–17. Character display time
is approximately 1/2 second.
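
The rotating display described above can be modeled in a few lines. This is a minimal sketch, not PSC firmware: the function names are hypothetical, and `read_code` assumes that the letter E appears only as the first character of a code, which holds for every code listed in Tables 3–11 to 3–17.

```python
from itertools import cycle, islice

CHAR_TIME_S = 0.5  # approximate time each character stays on the display


def rotating_display(code, n_chars):
    """Return the first n_chars characters the display would show for a code."""
    if len(code) != 4:
        raise ValueError("fault codes are 4 characters long")
    return list(islice(cycle(code), n_chars))


def read_code(observed):
    """Recover a code from observed characters (assumes 'E' appears only first)."""
    start = observed.index("E")          # raises ValueError if no 'E' was seen
    window = observed[start:start + 4]
    if len(window) < 4:
        raise ValueError("keep watching: fewer than 4 characters after 'E'")
    return window


shown = "".join(rotating_display("E200", 10))
print(shown)                       # E200E200E2
print(read_code("00E200E2"))       # E200
print(len(shown) * CHAR_TIME_S)    # 5.0 seconds of display time
```

Because the display rotates continuously, an observer who starts watching mid-code only needs to wait for the next E and then read four characters, which takes about 2 seconds at the stated rate.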


Table 3–11 FEU Error Codes
Error Code  FEU OK/FEU Failure  Error Description
E200        Off                 48V_SWITCHED OK before enabling
E201        Off                 Fan converter operating before enabling
E202        Off                 HVDC is OK, but POWER is not OK (contradictory status)
E203        Off                 The ac current is not OK (in idle state/loop)
E204        Off                 48V_DIRECT is not OK and POWER is OK (IRQ18)
E205        Off                 48V_SWITCHED is not OK and switched bus requested (IRQ19)
E206        Off                 HVDC is OK, but POWER is not OK (IRQ20)
E210        Off                 SWITCHED BUS did not turn on at startup
E211        Off                 SWITCHED BUS did not turn off at shutdown
E212        Off                 The ac current is high for the second time (in startup or run loop)
E220        Off                 Fan converter voltage is low

Table 3–12 PSC Error Codes
Error Code  PSC OK/PSC Failure  Error Description
EFFF        Off                 Invalid error number (in display_error procedure)
E000        Off                 Unused error condition
E001        Off                 PSC bias supply not OK
E002        Off                 80C196 internal register test failed
E003        Off                 80C196 operational test failed
E004        Off                 80C196 on-chip RAM test failed
E005        Off                 ROM checksum test failed
E006        Off                 External RAM test failed
E007        Off                 Port FF20 (PSC/FEU LEDs) not initially zero
E008        Off                 Port FF22 (Module enable) not initially zero
E009        Off                 Port FF23 (DC-DC LEDs) not initially zero
E010        Off                 Port FF24 (LDC enable) not initially zero
E011        Off                 External interrupt test failed (8259 did not clear test bit)
E012        Off                 Masked interrupt occurred (A/D conversion complete)
E013        Off                 Masked interrupt occurred (HSI data available)
E014        Off                 Masked interrupt occurred (HSO)
E015        Off                 Masked interrupt occurred (HSI pin 0)
E016        Off                 Masked interrupt occurred (Serial I/O)
E017        Off                 Software trap interrupt occurred (F7 instruction executed)
E018        Off                 Unimplemented opcode interrupt occurred (invalid instruction)
E019        Off                 Masked interrupt occurred (HSI FIFO 4th entry)
E020        Off                 Masked interrupt occurred (Timer 2 capture)
E021        Off                 Masked interrupt occurred (Timer 2 overflow)
E022        Off                 PSC bias supply failed (NMI occurred)
E023        Off                 Invalid interrupt number (>31) received from 8259
E024        Off                 IRQ4 occurred (slave 0 to master 8259)
E025        Off                 IRQ5 occurred (slave 1 to master 8259)
E026        Off                 IRQ6 occurred (slave 2 to master 8259)
E027        Off                 Masked IRQ13 occurred (FEU DIRECT 48 became OK)
E028        Off                 Masked IRQ14 occurred (FEU SWITCHED 48 became OK)
E029        Off                 Masked IRQ16 occurred (FEU POWER became OK)
E030        Off                 External interrupt test, not enabled (IRQ22)
E031        Off                 External interrupt test, bit not set (IRQ22)
E032        Off                 Masked IRQ25 occurred (OCP DC ON, turned on)
E033        Off                 Masked IRQ26 occurred (PSC DC ON, turned on)
E034        Off                 Invalid converter number (start of enable_converter procedure)
E035        Off                 Invalid converter number (end of enable_converter procedure)
E036        Off                 Invalid converter number (start of disable_converter procedure)
E037        Off                 Invalid converter number (end of disable_converter procedure)
E047        Off                 Unused error condition
E078        Off                 Unused error condition
E079        Off                 Unused error condition
E086        Off                 Unused error condition
E087        Off                 Unused error condition
E088        Off                 Unused error condition
E091        Off                 Unused error condition
E092        Off                 Unused error condition
E093        Off                 Unused error condition
E094        Off                 Unused error condition
E095        Off                 Unused error condition
E096        Off                 Unused error condition
E097        Off                 Unused error condition
E098        Off                 Unused error condition
E099        Off                 Unused error condition

Table 3–13 12 V DC to DC Converter Error Codes
Error
Code

12V
OK

12V
Fault

5V
OK

5V
Fault

3V
OK

3V
Fault

2V
OK

2V
Fault

E010

—

Error Description

—

Off

—

Delta 0 V

E101

—

Off

—

Off

—

Off

—

Indeterminate
converter
overvoltage
(IRQ7)

E102

Off

—

Off

—

Off

—

Off

—

Indeterminate
converter
overvoltage/
undervoltage
(IRQ15)

E103

Off

Unknown
converter
overvoltage/
undervoltage
condition

1 Dash entries = LED state NOT changed by error

Table 3–14 2 V DC to DC Converter Error Codes
Error Code  2V OK/2V Fault  Error Description
E110        Off             Out of regulation low
E111        Off             Out of regulation high
E112        Off             Undervoltage
E113        Off             Overvoltage
E114        Off             Voltage present when disabled
E115        Off             Did not turn off

Note
The 2 V converter output is not used on the Model 810.


Table 3–15 3 V DC to DC Converter Error Codes
Error Code  3V OK/3V Fault  Error Description
E120        Off             Out of regulation low
E121        Off             Out of regulation high
E122        Off             Undervoltage
E123        Off             Overvoltage
E124        Off             Voltage present when disabled
E125        Off             Did not turn off

Table 3–16 5 V DC to DC Converter Error Codes
Error Code  5V OK/5V Fault  Error Description
E130        Off             Out of regulation low
E131        Off             Out of regulation high
E132        Off             Undervoltage
E133        Off             Overvoltage
E134        Off             Voltage present when disabled
E135        Off             Did not turn off

Table 3–17 12 V DC to DC Converter Error Codes
Error Code  12V OK/12V Fault  Error Description
E140        Off               Out of regulation low
E141        Off               Out of regulation high
E142        Off               Undervoltage
E143        Off               Overvoltage
E144        Off               Voltage present when disabled
E145        Off               Did not turn off
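
The codes in Tables 3–14 to 3–17 follow a regular pattern: the third character selects the converter output and the fourth selects the fault condition. The lookup below is a minimal sketch of that pattern for note-taking or log triage. The function and dictionary names are hypothetical, and the sketch covers only the E11x to E14x codes listed in those four tables.

```python
# Third character of the code -> converter output (Tables 3-14 to 3-17).
CONVERTERS = {"1": "2 V", "2": "3 V", "3": "5 V", "4": "12 V"}

# Fourth character of the code -> fault condition (same in all four tables).
CONDITIONS = {
    "0": "Out of regulation low",
    "1": "Out of regulation high",
    "2": "Undervoltage",
    "3": "Overvoltage",
    "4": "Voltage present when disabled",
    "5": "Did not turn off",
}


def decode_converter_code(code):
    """Return (converter, condition) for a code such as 'E132'."""
    code = code.strip().upper()
    if len(code) != 4 or not code.startswith("E1"):
        raise ValueError(f"not a DC to DC converter code: {code!r}")
    converter = CONVERTERS.get(code[2])
    condition = CONDITIONS.get(code[3])
    if converter is None or condition is None:
        raise ValueError(f"code not listed in Tables 3-14 to 3-17: {code!r}")
    return converter, condition


print(decode_converter_code("E132"))  # ('5 V', 'Undervoltage')
```

Codes outside these tables, such as the E10x codes in Table 3–13, are rejected rather than guessed at.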

3.8 Device Status and Fault Indicators
The following sections describe the device status and fault indicators.

3.8.1 RF35 Disk Drawer
Figure 3–5 shows the RF35 disk drawer controls and indicators. Table 3–18
describes their functions.


Figure 3–5 RF35 Disk Drawer Controls and Indicators

[Figure: RF35 disk drawer front panel for drives D0 through D4. Each drive position has Fault, Write Prot, On Line, Pwr On/Off, and Set Up (SU) controls and indicators. Drawing MR−0436−92RAGS.]

Table 3–18 RF35 Disk Drawer Controls and Indicators
Control/Indicator  Color  State     Operating Condition
Fault              Red    On        Drive is faulty.
                          Off       Drive is functioning correctly.
Write Protect      Amber  Out, off  System can read from the disk and write to the disk.
                          In, on    System cannot write to the disk, but can read from the disk.
On Line            Green  Out, off  Drive is disabled.
                          In, on    Drive is enabled.
Power On/Off       Green  In, on    Power is on.
                          Out, off  Power is off.
Set Up Switch             In        Prevents the drive from joining the DSSI cluster. Also allows you to set the DSSI parameters for a new drive or a drive you replace in the system after repair. (If you want to set the DSSI parameters, you press the Set Up switch and the Power On/Off switch at the same time.)
                          Out       Has no effect on the drive.

3.8.2 SF35 Storage Array
Figure 3–6 shows the operator control panel. Table 3–19 describes the functions of its controls and indicators.
Figure 3–7 shows the rear of the storage array. Table 3–20 describes the functions
of the controls and indicator located at the rear of the storage array.
Figure 3–6 SF35 Operator Control Panel
[Figure: front and rear views of the SF35 storage array, showing the operator control panel (OCP) with its Ready, Write Protect, and Fault switches and the fault indicators. Drawing MR-0017-93DG.]

Table 3–19 SF35 Operator Control Panel Description
Control/Indicator

Function

Ready

Push-to-set switch with green indicator. Brings the integrated storage
element (ISE) on-line in about 10 seconds. The indicator remains on
while the ISE is on-line.

Write Protect

Push-to-set switch with amber indicator. Write protects the data on
the ISE. The data cannot be overwritten, nor can new data be written
to the ISE.

Fault

Recessed switch with multi-color indicator. Controls the MSCP.
This switch is equivalent to the SU switch. The colors indicate the
following conditions:
Green (in) = MSCP is disabled.
Green (out)= MSCP is enabled.
Amber = Fault is detected while the MSCP is disabled.
Red = ISE fault.
Off = Normal MSCP operation.

Drive DC Power
Switches

One switch/indicator for each ISE. Apply power to the ISEs. Each
ISE spins up and runs a self-test. The indicator shows that nominal
power is being applied to the ISE. (If you want to bring the ISE
on-line, you press the Ready switch next.)

Figure 3–7 SF35 Rear Panel Fault Indicator

[Figure: rear of the SF35 storage array, showing the DSSI connectors, the AC power switch (1/0), the power supply fault indicator (FAULT, behind panel), and the line voltage selector switch (230/115, behind panel). Drawing MR-0421-92DG.]

Table 3–20 SF35 Rear Panel Controls and Indicator
Control/Indicator

Function

AC Power Switch

Applies power to the ac power supply.

Line Voltage
Selector Switch

Selects 120 Vac (60 Hz) or 240 Vac (50 Hz) line voltage.

Power Supply
Fault Indicator

When on, indicates an overtemperature condition.


3.8.3 SF73 Storage Array
Figure 3–8 shows the SF73 storage array status and fault indicators. Table 3–21
describes their functions. Figure 3–9 shows the controls and indicator located at
the rear of the storage array.
Figure 3–8 Location of SF73 Storage Array LEDs and Switchpacks

[Figure: front of the SF73 storage array, showing the Ready, Write Protect, and Fault switches for each drive and the DSSI ID switchpacks. Drawing MR-0423-92DG.]

Table 3–21 SF73 Front Panel Controls and Indicators
Control/Indicator

Function

Ready

Push-to-set switch with green indicator. Brings the integrated storage
element (ISE) on-line in about 10 seconds. The indicator remains on
while the ISE is on-line.

Write Protect

Push-to-set switch with amber indicator. Write protects the data on
the ISE. The data cannot be overwritten, nor can new data be written
to the ISE.

Fault

Switch with red indicator. When the indicator is on, the ISE failed.
Press the switch to display the fault codes and clear the ISE fault.
The indicator is off during normal operation.

TERM PWR LED

When on, indicates that the correct termination power is being
supplied.

SPLIT LEDs (2)

When on, indicates that the storage array is operating in split-bus
mode.

Switchpacks (4)

One for each of the drives in the storage array. Each switchpack is
used to set the DSSI ID number. The icon on the front of the door
indicates the location of the drive. The three rightmost switches of
each switchpack are the DSSI ID switches. The leftmost switch is the
SU switch.

Drive DC Power
Switches

One switch/indicator for each ISE. Each switch applies power to an
ISE. Each ISE spins up and runs a self-test. The indicator shows
that nominal power is being applied to the ISE. (If you want to bring
the ISE on-line, you press the Ready switch next.)


Figure 3–9 Rear of the SF73 Storage Array

[Figure: rear of the SF73 storage array, showing the DSSI connectors, the AC power switch (1/0), the power supply fault indicator (FAULT, behind panel), and the line voltage selector switch (230/115, behind panel). Drawing MR-0422-92DG.]

3.8.4 TF85C Tape Drive
Table 3–22 may help you define and correct TF85C tape drive problems.
Table 3–22 TF85C Tape Drive Problems
Problem

Possible Solution

Correctable failure
during operation

If the TF85C drive fails during operation, reset the drive, then
rewind, unload, and remove the cartridge.
If all four indicators are blinking, press the Unload button. If the
failure is correctable, the tape begins to rewind and the yellow
indicator blinks. When the tape is unloaded, the green indicator
turns on and the beeper sounds. Then pull the Insert/Remove
handle to open the drive and remove the cartridge.

Noncorrectable
failure during tape
motion

If the tape does not rewind when the Unload button is pushed, and
all indicators continue to blink, the failure is not correctable. The
drive must be serviced or replaced.

Failure during
cartridge insertion

A cartridge failure occurs if a cartridge is damaged or if internal
portions of the drive that handle the cartridge are not working.
Suspect a cartridge failure if the green indicator blinks, but the
tape does not move (the yellow indicator does not blink). Remove
the cartridge and try another one, or inspect the tape leader and
drive takeup leader.

Figure 3–10 shows the front of the TF85C tape drive. Table 3–23 describes the
indicators shown in Figure 3–10.
Figure 3–10 TF85C Cartridge Tape Drive

[Figure: front of the TF85C tape drive, showing the load and unload instruction labels, the four indicators (Write Protected, Tape in Use, Use Cleaning Tape, Operate Handle), and the Unload button. Drawing MR-0471-92DG.]

Table 3–23 TF85C Cartridge Tape Drive Indicators
Indicator            Color   State     Operating Condition
Write Protected      Orange  On        Tape is write-protected.
                             Off       Tape is write-enabled.
Tape in Use          Yellow  Blinking  Tape is moving.
                             On        Tape is loaded; ready for use.
Use Cleaning Tape    Orange  On        Drive head needs cleaning or tape is bad.
                                       If it remains on after you unload the cleaning tape, then the cleaning was not completed because the tape ended.
                                       If, after cleaning, it turns on again when the data cartridge is reloaded, then a data cartridge problem occurred. Try another cartridge.
Operate Handle       Green   On        Okay to operate the Insert/Remove handle.
                             Off       Do not operate the Insert/Remove handle.
All four indicators          On        Power-on self-test is in progress.
                             Blinking  A fault is occurring. Press the Unload button to unload the cartridge. If the fault is cleared, the yellow indicator blinks while the tape rewinds. When the green indicator turns on, you can move the Insert/Remove handle to remove the cartridge. If the fault is not cleared, all four indicators continue to blink. Do not attempt to remove the cartridge. Refer to the TF85C service guide.

3.8.5 TF857 Tape Loader
This section describes the power-on process and the operator control panel (OCP)
indicators.
3.8.5.1 Power-On Process
When the TF857 tape loader powers on, all of the indicators on the operator control
panel (OCP) turn on within 15 seconds while the power-on self-test (POST) initializes
the subsystem. When POST completes successfully, all OCP indicators except Power On,
including the Magazine Fault and Loader Fault indicators, turn off.
The elevator then scans the magazine to find slots that contain cartridges.
3.8.5.2 Operator Control Panel Controls and Indicators
Figure 3–11 shows the OCP controls and indicators. Table 3–24 describes their
functions.


Figure 3–11 TF857 Operator Control Panel

[Figure: TF857 operator control panel, showing the Mode Select key (OCP Disabled, Automatic Mode, Manual Mode, Service Mode), the Eject, Load/Unload, and Slot Select buttons and indicators, the Power On, Write Protected, Tape In Use, Use Cleaning Tape, Magazine Fault, and Loader Fault indicators, the current slot indicators, the OCP label, and the DSSI node ID label. Drawing MR-0472-92.]

Table 3–24 TF857 OCP Controls and Indicators
Control/Indicator

Color

Function

Eject button

–

Opens the receiver, allowing access to the
magazine for removal and insertion of
cartridges. Also can be used to unload the
tape from the drive to the magazine.

Eject indicator

Green

Indicates that pressing the Eject button opens
the receiver. If a cartridge is in the drive, the
cartridge unloads to the magazine and the
receiver opens. If no cartridge is in the drive,
the receiver opens.

Load/Unload button

–

Loads the currently selected cartridge into the
drive, or unloads the cartridge from the drive
to the magazine.
If the Loader Fault or Magazine Fault
indicators are on, can also be used to reset
the subsystem.

Load/Unload indicator

Green

Indicates you can press the Load/Unload
button.

Slot Select button

–

When pressed, increments the current slot
indicator to the next slot.

Slot Select indicator

Green

Indicates the Slot Select button can be used.
Pressing the button increments the current
slot indicator to the next slot.

Power On indicator

Green

When on, indicates the TF857-AA tape loader
power is on (ac and dc voltages are within
tolerance). When off, indicates the tape loader
power is off.

Write Protected indicator

Orange

When on, indicates the cartridge in the drive
is write protected. When off, indicates the
cartridge in the drive is write enabled.

Tape in Use indicator

Yellow

Indicates tape drive activity as follows:
• Slow blinking indicates tape is rewinding; rapid blinking indicates tape is reading or writing.
• When on steadily, indicates a cartridge is in the drive and the tape is not moving.
• When off, indicates no cartridge is in the drive.

Magazine Fault indicator

Red

Indicates a magazine failure.

Use Cleaning Tape indicator

Orange

Indicates the read/write head needs cleaning.

Loader Fault indicator

Red

Indicates a TF857-AA tape loader transfer
assembly error or drive error.

Current slot indicators 0–6

Green

Identify the current slot (see Slot Select
button). Each current slot indicator blinks
when its corresponding cartridge moves to or
from the drive. Also used with the Magazine
Fault or Loader Fault indicator to indicate the
type of fault.

3.9 ROM-Based Diagnostics
The following sections describe how to use the TEST and Z commands and to run
the ROM-based diagnostics (RBDs).


3.9.1 TEST
TEST enables the user to test:
• The system
• A zone
• The CPU and memory

Use TEST only when the cross-link state is set to off.
The TEST syntax is:
TEST [qualifier(s)]
Tables 3–25 and 3–26 describe the TEST selection and control qualifiers.
Table 3–25 Qualifiers for TEST Selection
Qualifier     Description
/GROUP:n¹     Specifies a decimal number from 0 to 5 that identifies the group of tests to be run.
/TEST:n¹      Specifies a decimal number from 0 to 32 that identifies the tests to be run.
/SUBTEST:n¹   Specifies a decimal number from 0 to 32 that identifies the subtests to be run.
/VERBOSE      Enables a display of all individual tests during execution.
/NOTRACE      Disables test traces.

¹ n can be a:
• Single value
• Range separated by a colon (1:5)
• List separated by commas (1,5,9)
• Combination of range and list (1:6,8,10,11:29)
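
The four forms that n can take expand to a plain list of test numbers. The parser below is a minimal sketch of that expansion, useful for checking which tests a qualifier value selects. The function name is hypothetical, the sketch assumes ranges are inclusive at both ends, and it does not enforce the 0 to 5 or 0 to 32 limits in Table 3–25.

```python
def parse_selection(spec):
    """Expand a selection such as '1:6,8,10,11:29' into a list of integers."""
    values = []
    for item in spec.split(","):
        item = item.strip()
        if ":" in item:
            low_text, high_text = item.split(":")   # exactly one colon per range
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"range is reversed: {item!r}")
            values.extend(range(low, high + 1))     # assumed inclusive
        else:
            values.append(int(item))
    return values


print(parse_selection("1:5"))    # [1, 2, 3, 4, 5]
print(parse_selection("1,5,9"))  # [1, 5, 9]
```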

Table 3–26 Qualifiers for TEST Control
Qualifier

Description

/PASSCOUNT:n

n is a decimal number from 0 to MAXINT. When n is 0, the
passcount is infinite.

/NOTRACE

Disables the test traces.

/COE

Continues on error.

/NOCONFIRM

Disables the test confirmation on destructive tests.

/EXTENDED

Enables extended error reports.

/NOSTATUS

Disables status messages and reports.

/LIST

Lists the available tests, but does not run them.

When you do not supply the qualifier(s), TEST runs all the nonextended tests
(except those that require confirmation).


3.9.2 Z
Z connects to the firmware of another module in the system. It is also used to
initiate I/O ROM-based diagnostics.
The Z syntax is:
Z[/PATH=path-number]
Table 3–27 describes the qualifier.
Table 3–27 Qualifier for Z
Qualifier

Function

/PATH=path-number

Specifies the zone and slot number of a module. The path-number format is zss, where:
z is the zone ID (A or B).
ss is the slot number of the module.
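
A path-number can be checked before it is typed at the console. The sketch below is illustrative only: the function name is hypothetical, and it assumes ss is exactly two decimal digits, which the zss format suggests but the table does not state.

```python
import re

# Zone letter A or B followed by a two-digit slot number (assumed decimal).
PATH_RE = re.compile(r"^([AB])(\d{2})$")


def parse_path_number(path):
    """Split a path-number such as 'A01' into (zone, slot)."""
    match = PATH_RE.match(path.strip().upper())
    if match is None:
        raise ValueError(f"expected zss with zone A or B, got {path!r}")
    return match.group(1), int(match.group(2))


print(parse_path_number("A01"))  # ('A', 1)
```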

3.9.3 CPU ROM-Based Diagnostics
Table 3–28 provides a brief description of the CPU ROM-based diagnostics
(RBDs).
Table 3–28 CPU ROM-Based Diagnostic Descriptions
Group  Test   Subtest  Description
G: 0                   Self-Test
G: 0   T: 0            NVRAM Test
G: 0   T: 0   S: 0     NVRAM CPU EEPROM Data Integrity Test
G: 0   T: 0   S: 1     NVRAM CPU EEPROM Checksum Test
G: 0   T: 0   S: 2     NVRAM I2C Bus Register Access Test
G: 0   T: 0   S: 3     NVRAM Module-ID PROM Access and Data Integrity R/W Test
G: 0   T: 0   S: 4     NVRAM Module-ID PROM Checksum Test
G: 0   T: 0   S: 5     NVRAM System Ethernet Access Test
G: 0   T: 0   S: 6     NVRAM System Ethernet PROM Checksum Test
G: 0   T: 1            P-CACHE Test
G: 0   T: 1   S: 0     P-CACHE Register Bit Test
G: 0   T: 1   S: 1     P-CACHE Tag Integrity Test
G: 0   T: 1   S: 2     P-CACHE Data Integrity Test
G: 0   T: 1   S: 3     P-CACHE Data/Tag Parity Test
G: 0   T: 2            VIC Test
G: 0   T: 2   S: 0     VIC Register Bit Test
G: 0   T: 2   S: 1     VIC Cache Tag Test
G: 0   T: 2   S: 2     VIC Cache Data Test
G: 0   T: 2   S: 3     VIC Cache Data Parity Error Test
G: 0   T: 2   S: 4     VIC Cache Tag Parity Error Test
G: 0   T: 2   S: 5     VIC Branch Prediction Test
G: 0   T: 3            JXD Test
G: 0   T: 4            Memory Test
G: 0   T: 4   S: 0     MEMORY Data Bus & Catastrophic Failure Test
G: 0   T: 4   S: 1     MEMORY Address Uniqueness Test
G: 0   T: 4   S: 2     MEMORY Bank Addressing Test
G: 0   T: 4   S: 3     MEMORY Chip Addressing Test
G: 0   T: 4   S: 4     MEMORY Chip Open Address Lines Test
G: 0   T: 4   S: 5     MEMORY Single-Bit ECC Error Logic Test
G: 0   T: 4   S: 6     MEMORY Double-Bit ECC Error Logic Test
G: 0   T: 4   S: 7     MEMORY ECC Error Logic Test
G: 0   T: 4   S: 8     MEMORY ECC Test
G: 0   T: 4   S: 9     MEMORY ECC Lines Test
G: 0   T: 5            BITMAP Test
G: 0   T: 5   S: 0     BITMAP March Test
G: 0   T: 6            B-CACHE Test
G: 0   T: 6   S: 0     B-CACHE Data RAM Test
G: 0   T: 6   S: 1     B-CACHE Tag RAM Test
G: 0   T: 6   S: 2     B-CACHE ECC RAM Test
G: 0   T: 6   S: 3     B-CACHE Write Test
G: 0   T: 6   S: 4     B-CACHE Data Integrity Test
G: 0   T: 6   S: 5     B-CACHE Data Test (error enabled)
G: 0   T: 7            DMA Test
G: 0   T: 7   S: 0     DMA Powerup State Test
G: 0   T: 7   S: 1     DMA Register Access Test
G: 0   T: 7   S: 2     DMA Address Decode Test
G: 0   T: 7   S: 3     DMA Interlock Access Test
G: 0   T: 7   S: 4     DMA Queue Processing Test
G: 0   T: 7   S: 5     DMA Sub-Transfer Length Test
G: 0   T: 7   S: 6     DMA I/O Byte Alignment Test
G: 0   T: 7   S: 7     DMA Memory Byte Alignment Test
G: 0   T: 7   S: 8     DMA Maximum Transfer Length Test
G: 0   T: 8            XLINK Test
G: 0   T: 8   S: 0     XLINK Serial Cross-link Internal Loopback Test - Part 1
G: 0   T: 8   S: 1     XLINK Serial Cross-link Internal Loopback Request Test
G: 0   T: 8   S: 2     XLINK Serial Cross-link Internal Loopback Reply Test
G: 0   T: 8   S: 3     XLINK Serial Cross-link Internal Loopback Query Test
G: 0   T: 8   S: 4     XLINK Serial Cross-link External Loopback Test
G: 0   T: 8   S: 5     XLINK Serial Cross-link Communication Register Test
G: 0   T: 9            RESET Test
G: 0   T: 9            RESET CPU Module Hard Reset Test
G: 1                   Zone Test
G: 1   T: 0            ACCESS Test
G: 1   T: 0   S: 0     ACCESS Parallel Xlink Loopback Test
G: 1   T: 0   S: 1     ACCESS I/O Module PATH ACCESS Test
G: 1   T: 0   S: 2     ACCESS I/O Module SSC Console Uart Test
G: 1   T: 1            DMA Test
G: 1   T: 2            INTERRUPT Test
G: 1   T: 3            ERROR Test
G: 1   T: 3   S: 0     ERROR I/O Crosscheck Test
G: 1   T: 4            RESET Test
G: 1   T: 4   S: 0     RESET CPU Module Zone Reset Test
G: 1   T: 4   S: 1     RESET I/O Module Reset Test
G: 2                   System Test
G: 2   T: 0            Cross-link Mode Test
G: 2   T: 0   S: 0     Zone A (MASTER -> RESYNC MASTER -> DUPLEX) Mode Test
G: 2   T: 0   S: 1     Zone B (MASTER -> RESYNC MASTER -> DUPLEX) Mode Test
G: 2   T: 1            Zone A MASTER - Zone B SLAVE Mode Test
G: 2   T: 1   S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 1   S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 1   S: 2     ERROR I/O Crosscheck Test
G: 2   T: 2            Zone A RESYNC_MASTER - Zone B RESYNC_SLAVE Mode Test
G: 2   T: 2   S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 2   S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 2   S: 2     ERROR I/O Crosscheck Test
G: 2   T: 3            Zone B MASTER - Zone A SLAVE Mode Test
G: 2   T: 3   S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 3   S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 3   S: 2     ERROR I/O Crosscheck Test
G: 2   T: 4            Zone B RESYNC_MASTER - Zone A RESYNC_SLAVE Mode Test
G: 2   T: 4   S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 4   S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 4   S: 2     ERROR I/O Crosscheck Test
G: 2   T: 5            DUPLEX Mode Test
G: 2   T: 5   S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 5   S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 5   S: 2     ERROR I/O Crosscheck Test

The following example shows a CPU RBD error frame.
>>> group: 0 test: 1 subtest:2
======================================================================
----------------------- DIAGNOSTIC TEST ERROR ---------------------
GROUP: 00
Test: 01 Sub: 02
Error: 01 Pass: 00000001
Addr: 00000000
Exp: 00000000 Rec: 000000ff
Xor: 000000ff
Data Miscompare
=======================================================================
The example shows that the P-CACHE Data Integrity Test (group 0, test 1, subtest 2)
was executed and failed. The XOR data specifies a data miscompare.
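
The Exp, Rec, and Xor values in the frame are consistent with Xor being the bitwise exclusive OR of the expected and received data, so each set bit in Xor marks a data bit that miscompared. The sketch below works through the values from the example. The function name and the 32-bit width are assumptions made for illustration.

```python
def miscompare_bits(expected, received, width=32):
    """Return the XOR of two values and the positions of the differing bits."""
    xor = (expected ^ received) & ((1 << width) - 1)
    bits = [bit for bit in range(width) if (xor >> bit) & 1]
    return xor, bits


xor, bits = miscompare_bits(0x00000000, 0x000000FF)
print(f"Xor: {xor:08x}")   # Xor: 000000ff
print(bits)                # [0, 1, 2, 3, 4, 5, 6, 7]
```

For the example frame, the low-order eight bits were received as ones where zeros were expected.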

3.9.4 I/O ROM-Based Diagnostics
Table 3–29 provides a brief description of the I/O ROM-based diagnostics (RBDs).
Table 3–29 I/O ROM-Based Diagnostic Descriptions
Group  Test   Subtest  Description
G: 0                   I/O Self-Test
G: 0   T: 0            I/O SSC Test
G: 0   T: 0   S: 0     SSC Toy Clock Test
G: 0   T: 0   S: 1     SSC Storage Uart Test
G: 0   T: 0   S: 2     SSC Bus Timeout Test
G: 0   T: 0   S: 3     SSC Interval Timer Test
G: 0   T: 1            I/O VIC Test
G: 0   T: 1   S: 0     VIC Register Test
G: 0   T: 1   S: 1     VIC Interrupt Test
G: 0   T: 2            I/O Firewall Test
G: 0   T: 2   S: 0     Firewall Register Test
G: 0   T: 2   S: 1     Firewall Rail Master Test
G: 0   T: 2   S: 2     Firewall Cross Check Error Test
G: 0   T: 3            I/O Cache Test
G: 0   T: 3   S: 0     CACHE Control Register Bit Test
G: 0   T: 3   S: 1     CACHE Minimum Bank Test
G: 0   T: 3   S: 2     CACHE Data Integrity Test
G: 0   T: 3   S: 3     CACHE Tag Integrity Test
G: 0   T: 3   S: 4     CACHE Tag Parity Detection Test
G: 0   T: 3   S: 5     CACHE Tag Parity Generation Test
G: 0   T: 3   S: 6     CACHE Data Parity Checking Test
G: 0   T: 4            I/O NVRAM Test
G: 0   T: 4   S: 0     Module Data EEPROM Integrity Test
G: 0   T: 4   S: 1     Module I2C EEPROM Integrity Test
G: 0   T: 5            I/O RAM Test
G: 0   T: 5   S: 0     SOC RAM Test
G: 1                   I/O Eself Pcard Test
G: 1   T: 0            I/O SLIM Test
G: 1   T: 0   S: 0     SLIM Register Test
G: 1   T: 0   S: 1     SLIM RAM Test
G: 1   T: 1            I/O SWIFT Test
G: 1   T: 1   S: 0     SWIFT Reset Test
G: 1   T: 1   S: 1     SWIFT Register Test
G: 1   T: 1   S: 2     SWIFT Interrupt Test
G: 1   T: 1   S: 3     SWIFT Internal Loopback Test
G: 1   T: 2            I/O LANCE Test
G: 1   T: 2   S: 0     LANCE Register Test
G: 1   T: 2   S: 1     LANCE Internal Loopback Test
G: 1   T: 2   S: 2     LANCE Interrupt Test

The following example shows an I/O RBD error frame.
>>> z
Connecting to target...Press Ctrl/P to end connection
I
IO1> group: 0 test: 4 subtest:1
======================================================================
----------------------- DIAGNOSTIC TEST ERROR ---------------------
GROUP: 00
Test: 04 Sub: 01
Error: 03 Pass: 00000001
Addr: 00000000
Exp: 00000000 Rec: 000000ff
Xor: 000000ff
Data Miscompare
=======================================================================
The example shows that the Module I2C EEPROM Integrity Test was executed
and failed. The XOR data specifies a data miscompare.
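
When error frames are captured from a console log, the labeled fields can be pulled out with a small parser. The following is a minimal sketch, not a Digital-supplied tool: the function names are hypothetical, the field layout is taken only from the two example frames in this chapter, and the Exp, Rec, and Xor fields are assumed to be hexadecimal.

```python
import re

SAMPLE_FRAME = """\
----------------------- DIAGNOSTIC TEST ERROR ---------------------
GROUP: 00
Test: 04 Sub: 01
Error: 03 Pass: 00000001
Addr: 00000000
Exp: 00000000 Rec: 000000ff
Xor: 000000ff
Data Miscompare
"""

# A field is a label, a colon, and a run of hexadecimal digits.
FIELD_RE = re.compile(r"(\w+):\s*([0-9A-Fa-f]+)")


def parse_error_frame(text):
    """Return the labeled fields of an RBD error frame as upper-case keys."""
    return {name.upper(): value for name, value in FIELD_RE.findall(text)}


def xor_is_consistent(frame):
    """Check that Xor equals Exp XOR Rec (fields assumed hexadecimal)."""
    return int(frame["EXP"], 16) ^ int(frame["REC"], 16) == int(frame["XOR"], 16)


frame = parse_error_frame(SAMPLE_FRAME)
print(frame["GROUP"], frame["TEST"], frame["SUB"])  # 00 04 01
print(xor_is_consistent(frame))                     # True
```

The group, test, and subtest values can then be looked up in Table 3–28 or Table 3–29 to name the failing test.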


4
Error Handling and Analysis
4.1 In This Chapter
This chapter includes:
• Error handling services overview
• Field replaceable units
• OpenVMS error log
• Module NVRAM status and LED indicators
• FTSS error reporting interface
• Firmware interfaces
• Firmware and OpenVMS interface data structures
• Error log analysis

4.2 Error Handling Services Overview
The primary function of the error handling services (EHS) is to handle and
recover from high-level system interrupts generated by the hardware when an
error is detected. When an error occurs, the EHS is invoked by hardware as an
interrupt service routine.
The interrupt service routine isolates the failure by examining various system
registers. The isolation process occurs at a high system priority level; it pauses
the OpenVMS operating system until it is complete.
After isolating the faulty FRU, the EHS determines the appropriate actions
to take. For solid errors, system deconfiguration is performed and the FRU is
removed from service. This usually involves performing module resets to invoke
diagnostics.


EHS error notification is described in Table 4–1.
Table 4–1 EHS Error Notification
Step  Action
1     Entries are made into the system error log.
2     Status information is written to the module ID NVRAM and the DCB, where applicable.
3     The LED indicator associated with a failed module is set.
4     A call is issued to the error reporting interface (ERI) which reports the event to the FTSS$SERVER. The server process generates OPCOM messages and reports the events to a mailbox.

4.2.1 Basic Error Isolation and Handling
Figure 4–1 and Table 4–2 describe the error isolation and handling procedure.
Figure 4–1 Hardware Error Handling Flowchart

[Figure: flowchart that starts at a hardware error and passes through 13 numbered events: 1 IPL29 Interrupt, 2 Fault Detection, 3 FRU Isolation, 4 Solid Failure (yes/no), 5 Deconfigure FRU, 6 Fork to IPL8, 7 Transient Error, 8 Threshold Error, 9 Over Threshold (yes/no), 10 Deconfigure FRU, 11 Make Error Log Entry, 12 Notify FTSS$SERVER through ERI, 13 Done. Drawing MR−0495−92RAGS.]

Table 4–2 Error Handling Flowchart Definitions
Event    Definition
1        Hardware reports error through a high-level interrupt and control is
         transferred to the EHS.
2        The EHS examines system registers to determine the type of failure
         which has occurred.
3        The EHS identifies the FRU that is the source of the error. FRU
         isolation is generally accomplished at the module level. In some
         cases, FRU isolation is to a set of modules. In all cases, the EHS
         isolates the error to an FRU or set of FRUs in one zone.
4        The EHS determines if the error is solid.
5        If the error is solid, the FRU is deconfigured from the system.
6        The EHS has successfully recovered from the error (either solid or
         transient) and execution is continued at IPL8.
7 and 8  If the error is transient, it is compared to its error rate threshold.
9        If the error is below the error rate threshold, an entry is made in
         the error log.
10       If the error is above the error rate threshold, the FRU is
         deconfigured from the system.
11       An entry is made in the error log.
12       The FTSS$SERVER is notified of the error through the ERI.
13       Error handling is complete.
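
The decision flow of Figure 4–1 can be sketched in a few lines of Python. This is an illustration only, not EHS code: the names ErrorEvent, HandlerResult, handle_error, and the over_threshold callback are hypothetical, and the step ordering is one reading of the numbered flowchart events.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ErrorEvent:
    fru: str     # FRU already isolated (flowchart event 3)
    solid: bool  # outcome of the solid-failure decision (event 4)


@dataclass
class HandlerResult:
    deconfigured: bool = False
    steps: List[str] = field(default_factory=list)


def handle_error(event: ErrorEvent,
                 over_threshold: Callable[[str], bool]) -> HandlerResult:
    """Walk the flowchart decisions for one isolated error (hypothetical)."""
    result = HandlerResult()
    if event.solid:
        # Events 4-5: a solid failure removes the FRU immediately.
        result.deconfigured = True
        result.steps.append(f"deconfigure {event.fru}")
    # Event 6: handling continues at the lower priority level.
    result.steps.append("fork to IPL 8")
    if not event.solid and over_threshold(event.fru):
        # Events 7-10: a transient error over its threshold removes the FRU.
        result.deconfigured = True
        result.steps.append(f"deconfigure {event.fru}")
    # Events 11-13: log the error, notify the server, finish.
    result.steps.append("error log entry")
    result.steps.append("notify FTSS$SERVER via ERI")
    return result
```

A transient error under its threshold therefore produces only the log entry and the notification, while a solid error or an over-threshold transient error also removes the FRU.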

4.2.2 EHS Structure
The EHS is packaged as part of the Fault Tolerant System Services (FTSS)
execlet (loadable image file). The FTSS execlet is loaded and initialized when
FTSS is started after the OpenVMS operating system is booted.
System errors are reported to software through an IPL 29 interrupt. When
an interrupt occurs, the hardware fetches the dispatch vector from the System
Control Block (SCB) and dispatches to the EHS interrupt service routine.
VAXELN errors are reported to the OpenVMS operating system through an IPL
22 interrupt. The interrupts are vectored by a combination of hardware and
software to the EHS interrupt service routine.
Figure 4–2 illustrates the position of the EHS relative to the major hardware,
system firmware, and other software components.


Figure 4–2 EHS Architectural Position

[Block diagram not reproduced. Labels shown in the figure: Error Handling
Services; Functions; System Utilities (Error Reporting Interface, System
Error Log); Error Event Notification; Remote Zone Interface (IZC Routines,
Zone Available); Firmware Interface (Resets, Status, Serial Interrupts,
Serial Transmit/Receive, VAXELN and Diagnostics, Console and Diagnostics);
Hardware Interface (Registers, Interrupts, System Hardware); VMS Interface
(Device Unavailable, FRU Deconfiguration, Device Drivers, FTSS
Reconfiguration).]

4.2.3 System Operating Modes
The error handler recognizes four modes of system operation. Each mode directly
relates to the supported hardware modes of the cross-link state as summarized in
Table 4–3.
Table 4–3 System Operating Modes
Mode

Definition

Simplex

The cross-link state in one zone is off and the CPU, memory, and I/O
subsystem of the other zone are not available for use. However, those
components in the other zone may be available and can run the OpenVMS
operating system. The system can be booted in this mode if one zone is not
physically present or is out of service. The system can also be degraded into
this mode after the failure of one zone.

Degraded Duplex

The cross-link state in one zone (the master zone) is set to master and the
cross-link state in the other zone is set to slave. The CPU and memory in the
master zone are running the OpenVMS operating system and the I/O from
the slave zone is configured and in use. However, the slave zone CPU and
memory are not in use. This mode can only be achieved as a result of the
deconfiguration of a CPU and memory set of one zone due to an error.

Resynch

This mode is similar to Degraded Duplex except that all memory writes in
the master zone are duplicated in the slave zone. That is, when a write to
memory is performed in the master zone, the same data is written to the
same memory location in the slave zone. The cross-link state in one zone
is Resynch master and in the other zone, Resynch slave. This mode is used
during the synchronization process to copy the master zone memory to the
slave zone before entering Duplex mode.

Duplex

The memories in both zones are identical and both CPUs are running in
lockstep. The I/O subsystems of both zones are available and in use. The
cross-link state in both zones is Duplex. The system can be booted in this
mode, or can transition to this mode as the result of the synchronization
process from either Simplex or Degraded Duplex modes.
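
The relationship between the cross-link states and the operating modes in Table 4–3 can be expressed as a small lookup. The sketch below is illustrative: the XLink and Mode enumerations and the operating_mode function are hypothetical names, and treating any pairing that includes an off state as Simplex is an assumption made for brevity.

```python
from enum import Enum


class XLink(Enum):
    OFF = "off"
    MASTER = "master"
    SLAVE = "slave"
    RESYNCH_MASTER = "resynch master"
    RESYNCH_SLAVE = "resynch slave"
    DUPLEX = "duplex"


class Mode(Enum):
    SIMPLEX = "Simplex"
    DEGRADED_DUPLEX = "Degraded Duplex"
    RESYNCH = "Resynch"
    DUPLEX = "Duplex"


def operating_mode(zone_a: XLink, zone_b: XLink) -> Mode:
    """Derive the system mode from the two cross-link states (hypothetical)."""
    states = {zone_a, zone_b}
    if states == {XLink.DUPLEX}:
        return Mode.DUPLEX
    if states == {XLink.RESYNCH_MASTER, XLink.RESYNCH_SLAVE}:
        return Mode.RESYNCH
    if states == {XLink.MASTER, XLink.SLAVE}:
        return Mode.DEGRADED_DUPLEX
    if XLink.OFF in states:
        # Assumption: an off cross-link in either zone means Simplex.
        return Mode.SIMPLEX
    raise ValueError(f"inconsistent cross-link states: {zone_a}, {zone_b}")
```

Combinations that Table 4–3 does not describe, such as two master zones, are rejected rather than guessed.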

4.2.4 Error Types
EHS recognizes 11 error types. All errors are classified as one of those described
in Table 4–4.
Table 4–4 Error Types
Error Type

Definition

CPU/MEM Faults

All data, ECC codes, and control signals flow over the primary rail. The
mirror rail exists primarily for the purpose of performing verification
checks against the primary rail. Some checks are performed by hardware
between these two rails to detect failures within the boundaries of the
CPU module. When such a condition is detected, a CPU/MEM fault is
generated by the hardware, and results in the following set of hardware
actions:
1. A high-level system interrupt occurs to report the error, causing an
entry into the error handler. In some cases, the failure may be severe
enough to prevent instructions from executing.
2. If the operating mode at the time of the failure is Duplex, it will
be changed to Degraded Duplex mode. In this case, the other zone is
interrupted as well by a report that a CPU/MEM fault occurred in the
failing zone.
3. Approximately 145 microseconds after the interrupt, the failing CPU
module will be reset by hardware, resulting in an entry into the system
console. The purpose of this brief delay is to allow the error handler to
store the contents of the CPU, JXD, and cross-link registers in the Console
Communications Area (CCA).
In non-Duplex modes, only one CPU is in use. This failure results in the
termination of the OpenVMS operating system.
CPU/MEM faults can be caused by solid or transient errors. Since
software cannot distinguish between the two, they are all treated as
transient. The CPU module requires service only when they exceed the
operating system’s threshold, when an end action timeout occurs, or when
diagnostics fail. In all cases, the FRU identified by software is the CPU
module which experienced the failure.

Double-Bit memory errors

Hardware reports a double-bit error (DBE) when the ECC checkers detect
this condition on a read from a main memory location. This read can occur
during a DMA or CPU cycle, with two possible error causes: a memory
failure or a programming error.
If system software attempts to access a location beyond the bounds of
physical memory, hardware will report a double-bit ECC error. This is a
programming error in the OpenVMS operating system and the EHS will
initiate a system crash. This will be seen as a FATMEMERR bugcheck.
If system software attempts to access a valid physical memory location
which does not respond, a DBE will be reported by the hardware. In
this case, the cause of the problem is failed memory. The CPU with this
memory failure is removed from the configuration.
If the system is operating in a non-Duplex mode, the OpenVMS operating
system is terminated by forcing an entry into the system console. In
Duplex, the failed CPU is removed and the system continues to operate in
Degraded Duplex mode.
DBEs due to memory failures are always treated as solid. The failed CPU
will not be reconfigured until the zone with the failure is removed and the
memory is repaired.
The FRU in most cases will be a pair of SIMMs on a memory mother board
(MMB). In all cases, FRU isolation is done at the time of the end action
when system registers are recovered from the failed CPU. In the case of
an end action timeout, the CPU module will be identified as the FRU.

Single-Bit memory errors

Single-Bit Errors (SBEs) can be detected by either the JXD during a DMA
read cycle which reads from main memory or the CPU during a memory
read. Software action varies depending upon the system operating mode
and where the error detection occurs.
If the SBE is detected by the JXD during a DMA cycle in any system
mode or by the CPU during a CPU cycle in any non-Duplex mode, the
actions of the EHS are the same. The error is always transient, and no
deconfiguration is performed. A pair of memory SIMM rows on an MMB
is isolated as the FRU, and the error is compared to its error rate threshold.
In Duplex mode (JXD detected) when the threshold is exceeded, the CPU
module on which the memory resides will be removed from service. In
non-Duplex mode, since there is only one CPU active and since SBEs are
always transient, the CPU is not removed from service when the threshold
is exceeded. The SBE is repaired in memory by hardware if detected by
the JXD, and by the EHS if detected by the CPU.
If the SBE is detected during a CPU cycle while the system is in Duplex
mode, the action differs due to hardware constraints. The CPU which
experiences the SBE will be removed from service by hardware at the time
of the error. An error log will be generated reporting the error, but FRU
isolation is done at the time of the end action. The error is then compared
to its error rate threshold by the OpenVMS operating system.
If the threshold is not exceeded, the CPU will be resynchronized
immediately by system software (FTSS$SERVER) at the time of the end
action. The process of resynchronization will repair the SBE in physical
memory since each location is rewritten during the memory copy.
If the failed CPU does not return for resynchronization after being
removed in the CPU-detected Duplex mode case, an end action timeout
event will be logged which identifies the failed CPU module as the FRU.
In most cases, a pair of SIMM rows and a memory mother board (MMB)
are identified as the FRU in the error log. However, in some cases, end
action data may not contain all the information needed to isolate to a pair
of memory SIMM rows. In this case the CPU module will be identified as
the FRU and will be subjected to the same threshold as a memory SIMM.

Cable failures

All traffic between the two zones of the system is performed across the
cross-link cable. If this cable is detached or broken, the hardware will
report a cable loss event to the EHS. This error can only happen in a
non-Simplex system, and when it occurs, communication between the zones is
lost.
In all cases, the system operating mode must be changed to Simplex. If
the mode before the error was not Duplex, then the slave zone is removed
from service. If the mode was Duplex, then Zone B is removed from
service.
The EHS indicates in the error log that this error is solid and service
is required, and the error is compared to its error rate threshold. If the
threshold is not exceeded, the zone will be resynchronized automatically. If
the threshold is exceeded, no automatic resynchronization will occur until
the cross-link cable is repaired. In all cases, the FRU is the cross-link
cable.

Power failures

If a zone loses power in a non-Simplex configuration, hardware generates
an interrupt to report the event to the EHS. In a non-Duplex mode,
software will detect this error only when the slave zone loses power. In
this case, the slave zone is removed from the configuration and the system
continues to run in Simplex mode.
In Duplex mode, the error is detected by software when either zone loses
power. Again, the failed zone is removed from the configuration and the
system continues in Simplex mode.
EHS indicates in the error log that this error is solid and service is
required, and the error is compared to its error rate threshold. If the
threshold is not exceeded, the zone will be resynchronized automatically.
If the threshold is exceeded, no automatic resynchronization will occur
until the zone is repaired and resynchronized manually. The failed zone is
identified as the FRU for all power failures.

Clock phase errors

If the clocks between zones begin to run out of phase, hardware generates
an interrupt to report the event to the EHS. This event can occur only
in non-Simplex modes. The cause of this type of failure can be either the
oscillator or the clock locking logic.
An oscillator failure will prevent the CPU and I/O module clocks in the two
zones from running in synchronization and will result in the termination
of the OpenVMS operating system on that zone.
Failure in the clock lock logic will result in two zones running diverged
if the system operating mode had been Duplex. In this case, EHS will
select one zone to remove, and the other zone will continue to run the
OpenVMS operating system in Simplex mode. (Zone selection is based on
timings within the system and could be either zone.) In Degraded Duplex
mode, the slave zone is removed from the configuration and the OpenVMS
operating system continues in Simplex mode.
In all cases of oscillator failure, the ATM in the zone which is removed
is identified as the FRU. If the error is caused by clock lock logic failure,
software cannot accurately determine in which zone the failure exists.
The EHS compares the error to its error rate threshold. An error log is
generated at the time of the error which identifies the ATM as the FRU. If
the threshold is exceeded, the error log indicates that service is required
for the ATM and the zone will not be resynchronized automatically. If the
threshold is not exceeded and the diagnostic tests complete successfully,
the zone will be resynchronized when it becomes available.
If the threshold is not exceeded and the diagnostics report a failure, the
end action error log will indicate that the ATM module requires service
and the zone will not be resynchronized automatically. If the zone fails to
return for service and the threshold had not been exceeded, an end action
timeout error log is generated which indicates the ATM requires service.

Halt errors

A halt error occurs when the system is operating in Duplex mode and either
the Zone Halt Enable switch on the zone control panel is pressed and the Break
key is pressed on one of the system consoles, or one zone experiences errors
on its halt lines.
The zone attached to the console terminal or with the error will be halted
and enter the system console. In the other zone, hardware generates
an interrupt to the EHS. The system operating mode will be degraded
to Simplex and the OpenVMS operating system will be continued after
deconfiguring the halted zone.
The failed zone is identified as the FRU in the error log. This error is
not subjected to thresholding. The halted zone must be resynchronized
manually to be returned to service.

Resynch abort errors

During memory resynchronization, all memory writes are mimicked to
both zones. The data is driven from the master zone across the resynch
bus (also referred to as the cross-link cables) to the slave zone. The
incoming data on the slave side is protected by ECC. An ECC failure on
the slave side results in a CPU/MEM fault on the slave and is handled as
that type of error. The data is protected on the master side by an ECC, a
cross-rail ECC comparison and a data cross-check.
The failure of any of these checks results in hardware generating an
interrupt to the EHS reporting a resynch abort error. Resynch mode is
terminated by the hardware and system operation continues in Degraded
Duplex mode.
Since all resynch abort errors indicate failures on the master side, the
master CPU module is isolated as the FRU. This error can occur only
when the system is in Resynch mode, so removal of the CPU would result
in termination of the OpenVMS operating system. The error log message
will indicate the master CPU as the FRU.
The EHS compares the error to its error rate threshold. If the threshold is
exceeded, the EHS will disable automatic resynchronization of the remote
zone. Manual intervention will be required to repair this situation. Since
Duplex mode cannot be achieved and the master CPU is the source of this
failure, the OpenVMS operating system must be manually terminated to
repair the CPU module.

Nonexistent I/O errors

Nonexistent I/O (NXIO) errors occur when a reference to an I/O module
times out. Such a timeout can occur during a DMA or CPU cycle. In
a CPU cycle, an automatic operation retry is attempted. If the retry
succeeds, hardware reports the failure as transient. Otherwise, it is
reported as a solid failure.
All timeouts during DMA cycles are transient errors. The error log
indicates if the error was solid or transient, and if it occurred on a DMA or
CPU cycle.
In all NXIO error cases, either an I/O or interface module will be identified
as the FRU. If the error is solid, the I/O or interface module will be
removed from system service by the EHS.
If the error is transient, it will be compared to its error rate threshold
by the EHS. If the threshold is exceeded and the system operating mode
is not Simplex, the I/O or interface module will be removed from system
service.
No I/O module will be removed due to transient errors from a Simplex
system (where alternate I/O paths are not normally available). Additional
transient errors on the I/O module will generate further error logs.

I/O errors

The ATM module contains a series of checkers that verify consistency
between the dual rails of the system during I/O accesses. When
discrepancies are detected, the hardware generates an interrupt, invoking
the EHS. System registers which reflect the state of the checkers are read
and analyzed to determine the source of the error.
These miscompare errors can be detected during a DMA operation or
a direct CPU I/O access. When miscompares occur on CPU cycles, the
hardware automatically retries the operation.
If the retry succeeds, hardware reports the error as transient. Otherwise,
the error is solid and the EHS deconfigures the system to remove the FRU.
The error log will indicate the FRU, describe the error as solid or
transient, and list any modules that were deconfigured as a result. If
the FRU is a zone or an ATM, the entire zone is removed.
These errors result in a CPU, ATM, interface module, or cross-link FRU.
Transient errors are compared to their error rate threshold by the EHS.
Errors that exceed the threshold may result in the removal of the FRU
from service.

Zone divergence

This error type occurs when the two zones begin executing separate
code paths while operating in Duplex mode. This situation is detected
by hardware when an access to I/O space is performed. At that time,
miscompares in the control and data signals will be detected in the cross-link
chips on the ATM.
This error is reported by hardware as an I/O error or an NXIO error, but
software recognizes the special case and identifies it as zone divergence in
the error log. When this error is detected, software will remove one zone
from service. Zone selection depends on how the divergence manifested
itself; either zone may be removed.
This error is usually due to a programming error or divergence between
the NVRAMs of the two zones. The error is treated as transient and the
threshold error count for that error is incremented.
If the threshold is not exceeded and if the diagnostics on the removed zone
complete successfully, the zone will be resynchronized back into the system
at end action time. If the threshold is exceeded or if the diagnostics on the
removed zone report a failure, the zone will not be resynchronized at end
action time. The end action error log will indicate that service is required.
If the removed zone fails to return from running diagnostics, an end action
timeout error log will be generated which identifies the zone as the FRU
and requests service. If the threshold is exceeded, the zone will not be
automatically resynchronized. Manual intervention will be required to
repair the zone and return it to service.
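
The nonexistent I/O (NXIO) rules in Table 4–4 combine the cycle type, the retry result, the threshold, and the operating mode. The following sketch restates those rules in Python for illustration; the Cycle enumeration and the functions classify_nxio and remove_module are hypothetical names, not EHS routines.

```python
from enum import Enum


class Cycle(Enum):
    CPU = "CPU"
    DMA = "DMA"


def classify_nxio(cycle: Cycle, retry_succeeded: bool = False) -> str:
    """Return 'solid' or 'transient' for an NXIO timeout."""
    if cycle is Cycle.DMA:
        # All timeouts during DMA cycles are reported as transient.
        return "transient"
    # On a CPU cycle the operation is retried once automatically.
    return "transient" if retry_succeeded else "solid"


def remove_module(severity: str, over_threshold: bool, simplex: bool) -> bool:
    """Decide whether the I/O or interface module leaves service."""
    if severity == "solid":
        return True
    # Transient errors remove a module only above threshold, and never
    # in a Simplex system, where no alternate I/O path is expected.
    return over_threshold and not simplex
```

For example, a DMA timeout that exceeds its threshold removes the module in Duplex mode but leaves it in service, with further error logs, in Simplex mode.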

4.2.5 VAXELN Error Handling
Failures detected by VAXELN software running on the I/O expansion module are
reported to the EHS through one of two mechanisms:
• An IPL 22 interrupt from the module reporting the error, which is
  dispatched into the EHS.
• The EHS detects the expiration of a watchdog timer maintained by VAXELN,
  signaling a termination of VAXELN execution.


Table 4–5 describes the VAXELN error classes and the actions taken by the EHS.
Table 4–5 VAXELN Error Classes
Error Class:  VAXELN Kernel Fatal
Description:  This error is reported when the VAXELN kernel detects a fatal
              error which prevents it from continuing operation.
EHS Actions:  The FRU is the I/O expansion module. This is a solid error and
              is not subjected to a threshold.

Error Class:  VAXELN Kernel Recoverable
Description:  A recoverable error was detected and handled by VAXELN
              software. Currently, this error is reported only when VAXELN
              software detects a repairable single-bit memory error.
EHS Actions:  The FRU is the I/O expansion module. The error is compared to
              its error rate threshold. If the threshold is exceeded, the I/O
              expansion module and all attached interface modules are
              deconfigured from the system.

Error Class:  I/O Expansion Module Master Fatal
Description:  A fatal error detected by the VAXELN I/O expansion module
              master job which results in the shutdown of all VAXELN
              processes.
EHS Actions:  The FRU is the I/O expansion module. This is considered a solid
              error; no threshold is applied. The I/O expansion module is
              deconfigured from the system.

Error Class:  I/O Expansion Module Master Recoverable
Description:  An error detected by the VAXELN I/O expansion module master job
              which resulted from the failure of a VAXELN job to initialize
              successfully. The Job ID field of the error message indicates
              which VAXELN job failed.
EHS Actions:  The FRU is an interface module. The EHS isolates the interface
              module by checking the Job ID field of the error message. The
              error is considered solid; no threshold is applied. The module
              is deconfigured from the system.

Error Class:  I/O Expansion Module Job Fatal
Description:  Similar to I/O Expansion Module Master Recoverable, this error
              indicates that a VAXELN job has experienced a fatal error and
              has been terminated. The Job ID field of the error message
              indicates which VAXELN job failed.
EHS Actions:  The FRU is an interface module. The EHS isolates the interface
              module by checking the Job ID field of the error message. The
              error is considered solid; no threshold is applied. The
              interface module is deconfigured from the system.
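
The five VAXELN error classes in Table 4–5 reduce to two properties: which FRU is called out, and whether a threshold is applied before deconfiguration. A minimal Python rendering of that summary follows. The names Handling, VAXELN_CLASSES, and should_deconfigure are hypothetical, and the table is transcribed from the text rather than from any EHS data structure.

```python
from typing import NamedTuple


class Handling(NamedTuple):
    fru: str
    thresholded: bool  # True only when an error rate threshold is applied


VAXELN_CLASSES = {
    "VAXELN Kernel Fatal": Handling("I/O expansion module", False),
    "VAXELN Kernel Recoverable": Handling("I/O expansion module", True),
    "I/O Expansion Module Master Fatal": Handling("I/O expansion module", False),
    "I/O Expansion Module Master Recoverable": Handling("interface module", False),
    "I/O Expansion Module Job Fatal": Handling("interface module", False),
}


def should_deconfigure(error_class: str, over_threshold: bool = False) -> bool:
    """Solid classes always deconfigure; thresholded ones only when exceeded."""
    handling = VAXELN_CLASSES[error_class]
    return over_threshold if handling.thresholded else True
```

Only the VAXELN Kernel Recoverable class consults the threshold; every other class is treated as solid.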

VAXELN software implements a watchdog timer which is a cell in the I/O
Expansion Module Communication Area (NCA). It is incremented periodically
by VAXELN and monitored by the EHS. If the value in the NCA cell stops
incrementing, VAXELN has crashed. This is referred to as a VAXELN kernel
fatal error.
The EHS examines the VAXELN NCA error log buffer area for a VAXELN error
message. When it finds the error message, the EHS identifies the I/O expansion
module as the FRU. The error is considered solid; no threshold is applied, and the
I/O expansion module is deconfigured from the system.
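
The watchdog mechanism described above amounts to checking that a counter keeps changing between polls. The sketch below shows the idea in Python. WatchdogMonitor and its max_stalled_polls parameter are hypothetical; the manual does not state how many unchanged readings the EHS tolerates before declaring a VAXELN kernel fatal error.

```python
from typing import Optional


class WatchdogMonitor:
    """Detect a stalled counter, such as the NCA watchdog cell (hypothetical)."""

    def __init__(self, max_stalled_polls: int = 3) -> None:
        # Assumption: the number of unchanged readings tolerated is configurable.
        self.max_stalled_polls = max_stalled_polls
        self._last: Optional[int] = None
        self._stalled = 0

    def poll(self, counter_value: int) -> bool:
        """Record one reading; return True once the counter is judged dead."""
        if self._last is not None and counter_value == self._last:
            self._stalled += 1
        else:
            # Any change in the counter shows the monitored software is alive.
            self._stalled = 0
        self._last = counter_value
        return self._stalled >= self.max_stalled_polls
```

With max_stalled_polls set to 2, the readings 1, 2, 2, 2 report a failure on the fourth poll.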


4.3 Field Replaceable Units (FRUs)
After analyzing error information and determining the error type, the EHS
isolates the source of the error to a FRU. If the error is solid, the system is
deconfigured to remove the FRU from service. If the error is transient, it is
compared against a threshold for the error type and FRU; if the threshold is
exceeded, the system is deconfigured to remove the FRU from service.
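
Because thresholds are kept per error type and per FRU, the bookkeeping can be pictured as a set of counters keyed by that pair. The sketch below is an assumption-heavy illustration: the manual does not say how the error rate is measured, so the sliding time window, the limit, and the ErrorRateTracker name are all hypothetical.

```python
from collections import defaultdict, deque


class ErrorRateTracker:
    """Count transient errors per (error type, FRU) in a sliding window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit            # assumed: errors tolerated per window
        self.window = window_seconds  # assumed: window length in seconds
        self._events = defaultdict(deque)

    def record(self, error_type: str, fru: str, now: float) -> bool:
        """Record one error; return True if the threshold is now exceeded."""
        times = self._events[(error_type, fru)]
        times.append(now)
        # Discard errors that have aged out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) > self.limit
```

Errors on different FRUs, or of different types on the same FRU, never count against each other in this model.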

4.3.1 Isolation
Table 4–6 describes the FRUs and lists the error types which could result in a
FRU being isolated.
Table 4–6 System FRUs
FRU:                 ATM module
Description:         I/O attachment module. Performs exchange and
                     verification of I/O control and data signals between
                     zones. The module includes an embedded I/O expansion
                     module.
Source Error Types:  I/O errors; Clock phase errors

FRU:                 CPU module
Description:         The CPU module is identified as the FRU when the failure
                     is attributable to a CPU problem or to a problem that
                     cannot be isolated between the CPU and memory.
Source Error Types:  Resynch abort errors; CPU/MEM faults; Double-Bit memory
                     errors; Single-Bit memory errors

FRU:                 Memory board
Description:         A pair of rows of memory SIMMs on a memory mother board
                     (MMB) will be identified as the FRU when the error can
                     be isolated beyond the CPU board to a specific piece of
                     memory.
Source Error Types:  Double-Bit memory errors; Single-Bit memory errors

FRU:                 I/O expansion module
Description:         An I/O expansion module can be identified as the FRU as
                     a result of a firewall miscompare during an I/O
                     operation or as a result of a nonexistent I/O error
                     during a reference to the I/O expansion module or an
                     attached interface module.
Source Error Types:  Nonexistent I/O errors; I/O errors; VAXELN errors

FRU:                 Interface module
Description:         An interface module can be identified as a FRU only as a
                     result of a nonexistent I/O error which occurs during a
                     reference to the interface module. It is also possible
                     that the I/O expansion module will be identified as the
                     FRU.
Source Error Types:  Nonexistent I/O errors; VAXELN errors

FRU:                 Zone
Description:         Some error cases involve failures not directly
                     attributable to a single module. The zone FRU is only
                     identified in the case of solid or reproducible errors,
                     so diagnostics should be able to isolate the failure
                     within the zone.
Source Error Types:  Power failures; Halt errors; Zone divergence

FRU:                 Cross-link cable
Description:         The cross-link cable is the identified FRU for any error
                     which isolates the connections between zones. This
                     includes the resynch and interzone buses, which are
                     packaged into the single physical cable.
Source Error Types:  Cable failures; I/O errors
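
Table 4–6 is organized by FRU, but during troubleshooting the question usually runs the other way: given an error type in the log, which FRUs can it call out? The Python sketch below inverts the table for that purpose. FRU_SOURCES and candidate_frus are hypothetical names, and the data is transcribed from Table 4–6 only.

```python
from typing import Dict, List

# FRU -> error types that can isolate to it, transcribed from Table 4-6.
FRU_SOURCES: Dict[str, List[str]] = {
    "ATM module": ["I/O errors", "Clock phase errors"],
    "CPU module": ["Resynch abort errors", "CPU/MEM faults",
                   "Double-Bit memory errors", "Single-Bit memory errors"],
    "Memory board": ["Double-Bit memory errors", "Single-Bit memory errors"],
    "I/O expansion module": ["Nonexistent I/O errors", "I/O errors",
                             "VAXELN errors"],
    "Interface module": ["Nonexistent I/O errors", "VAXELN errors"],
    "Zone": ["Power failures", "Halt errors", "Zone divergence"],
    "Cross-link cable": ["Cable failures", "I/O errors"],
}


def candidate_frus(error_type: str) -> List[str]:
    """List, alphabetically, every FRU that the error type can isolate to."""
    return sorted(fru for fru, sources in FRU_SOURCES.items()
                  if error_type in sources)
```

For instance, an I/O error can isolate to the ATM module, the cross-link cable, or the I/O expansion module, whereas a halt error always isolates to the zone.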


4.3.2 Deconfiguration
This section describes the actions taken by the EHS when a FRU is identified as
the source of a solid error or transient errors which exceed the FRU threshold. A
table is provided for each FRU that describes the actions taken by the EHS when
the FRU is deconfigured.
In non-Duplex modes, the EHS may respond to excessive transient failures by
calling out the FRU but not removing it from service. This action prevents loss of
system service due only to transient errors.

4.3.2.1 I/O Attachment Module
Table 4–7 describes the OpenVMS operating system actions taken when the
ATM is identified as the FRU and deconfigured by the EHS. Some actions are
dependent on the system operating mode.
Table 4–7 ATM Deconfiguration Actions
Action Taken:  Cross-link mode = off
Description:   The cross-link mode is set to off. The system will continue in
               Simplex mode. The action may be taken by the hardware when the
               error occurs or by software while handling the error.
Comments:      Done in non-Simplex mode only. Extraneous when the error
               occurs in Simplex mode.

Action Taken:  CPU/MEM fault
Description:   A CPU/MEM fault is forced on the zone with the failed ATM
               module. This results in an entry into the system console.
Comments:      Done when the error occurs in Duplex, Simplex or in the master
               zone of a Degraded Duplex configuration.

Action Taken:  Zone hard reset
Description:   A zone hard reset is issued to the zone with the failed ATM to
               force diagnostics to run.
Comments:      Done only when the error occurs in the slave zone of a
               Degraded Duplex configuration.

Action Taken:  Set ATM LED indicator
Description:   Use the module I2C bus to turn on the LED indicator for the
               failed ATM module.

Action Taken:  Set module status in ATM NVRAM and DCB
Description:   Update the status_os and status_sum fields in the module ID
               NVRAM and the DCB to indicate the module has experienced a
               failure. The code written depends on the failure type.

The entries in Table 4–7 apply when the module is being removed because of a
solid error or excessive transient errors, with one exception. When an ATM
module in a Simplex system experiences excessive transient errors, the module is
not fully deconfigured since that would result in the termination of the OpenVMS
operating system. In this case, the ATM LED indicator is turned on, and the
module status is written to the ATM NVRAM and DCB. The OpenVMS operating system
continues to run. Until the module is repaired, it will not be configured when
the system is booted or when the failed zone is synchronized.
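
The mode dependencies in Table 4–7, together with the Simplex exception, can be summarized as a small selection function. The Python below is a sketch under stated assumptions: atm_deconfiguration_actions and its string arguments are hypothetical, and only the three modes that Table 4–7 mentions are accepted.

```python
from typing import List, Optional

STATUS_ACTIONS = ["set ATM LED indicator",
                  "write module status to NVRAM and DCB"]


def atm_deconfiguration_actions(mode: str,
                                zone_role: Optional[str] = None,
                                transient_only: bool = False) -> List[str]:
    """Select the Table 4-7 actions for a failed ATM module (hypothetical)."""
    if mode not in ("Simplex", "Duplex", "Degraded Duplex"):
        raise ValueError(f"mode not covered by Table 4-7: {mode!r}")
    if mode == "Degraded Duplex" and zone_role not in ("master", "slave"):
        raise ValueError("zone_role must be 'master' or 'slave'")
    if mode == "Simplex" and transient_only:
        # Exception: the module is called out but stays in service.
        return list(STATUS_ACTIONS)
    actions = []
    if mode != "Simplex":
        actions.append("set cross-link mode to off")
    if mode == "Degraded Duplex" and zone_role == "slave":
        actions.append("zone hard reset")
    else:
        actions.append("force CPU/MEM fault")
    return actions + STATUS_ACTIONS
```

The LED and status updates appear in every result; only the cross-link change and the choice between a forced CPU/MEM fault and a zone hard reset vary with the mode.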


4.3.2.2 CPU Module and Memory
When memory is deconfigured from the system, it is done by removing the CPU
module on which the memory resides.
Table 4–8 describes the OpenVMS operating system actions taken when a CPU
module or memory is identified as the FRU and is deconfigured by the EHS.
These actions are identical for CPU and memory failures. Some actions are
dependent on the system operating mode.
Table 4–8 CPU Deconfiguration Actions
Action Taken:  Cross-link mode = Degraded Duplex
Description:   The cross-link mode is set to master on the zone with the
               surviving CPU and slave on the zone with the failed CPU. The
               action may be taken by the hardware when the error occurs or
               by software while handling the error.
Comments:      Done in Duplex mode only.

Action Taken:  CPU/MEM fault
Description:   A CPU/MEM fault is forced on the failed CPU module. This
               results in an entry into system console.

Action Taken:  Set CPU LED indicator
Description:   The module I2C bus is used to turn on the LED indicator for
               the failed CPU module.

Action Taken:  Set module status in CPU NVRAM and DCB
Description:   The status_os and status_sum fields in the module ID NVRAM and
               DCB are updated to indicate the module has experienced a
               failure. The code written depends on the failure type.

When one CPU is in use (Degraded Duplex, Simplex, or Resynch mode), excessive
transient failures will result in the EHS calling out the failed module, but not
removing it from service. Removing it from service would cause termination of
the OpenVMS operating system. In this case, the CPU module LED is turned
on, and the module status is written to the CPU module NVRAM and DCB. The
OpenVMS operating system continues to run. The CPU will not be configured
when the system is booted or when the failed zone is synchronized unless the
CPU is repaired.
4.3.2.3 I/O Expansion Module
Table 4–9 describes the actions taken by the OpenVMS operating system when
an I/O expansion module is identified as the FRU and is deconfigured by the
OpenVMS operating system.


Table 4–9 I/O Expansion Module Deconfiguration Actions
Action Taken:  I/O hard reset
Description:   The I/O expansion module which is being deconfigured is reset
               through the cross-link I/O hard reset register.

Action Taken:  Set I/O expansion module LED indicator
Description:   The module I2C bus is used to turn on the LED for the failed
               module.

Action Taken:  Set module status in I/O expansion module NVRAM and DCB
Description:   The status_os and status_sum fields in the module ID NVRAM and
               DCB are updated to indicate the module has experienced a
               failure. The actual code written depends on the failure type.

The entries in Table 4–9 apply when the module is being removed due to a
solid error or excessive transient errors, with one exception. When an I/O
expansion module in a Simplex system experiences excessive transient errors, the
module is not fully deconfigured since that would likely result in the loss of the
only I/O path to a device. In this case, the I/O expansion module LED is turned
on and the module status is written to the I/O expansion module NVRAM and the
DCB.
The I/O expansion module will remain in service. The module will not be
configured when the system is booted or when the failed zone is synchronized
until the module is repaired.
4.3.2.4 Interface Module
Table 4–10 describes the OpenVMS operating system actions taken when an
interface module is identified as the FRU and is deconfigured by the OpenVMS
operating system. Some actions are dependent on the system operating mode.
Table 4–10 Interface Module Deconfiguration Actions
Action Taken:  Reset interface module
Description:   The interface module being deconfigured is reset through the
               module I2C bus.

Action Taken:  Set interface module LED indicator
Description:   Use the module I2C bus to turn on the LED indicator for the
               failed interface module.

Action Taken:  Set module status in interface module NVRAM
Description:   Update the status_os and status_sum fields in the module ID
               NVRAM and the DCB to indicate the module has failed. The code
               written depends on the failure type.

The entries in Table 4–10 apply when the module is being removed because of
a solid error or excessive transient errors. There is one exception. When an
interface module in a Simplex system experiences excessive transient errors, the
module is not fully deconfigured since that would likely result in the loss of the
only I/O path to a device. In this case, the interface module LED indicator is
turned on, and the module status is written to the interface module NVRAM and
the DCB (See Section 4.8.2).
The interface module will remain in service. The module will not be configured
when the system is booted or when the failed zone is synchronized until the
module is repaired.

4.3.2.5 Zone
Table 4–11 describes the OpenVMS operating system actions taken when an
entire zone is identified as the FRU and is deconfigured by the EHS. Note that
some actions are dependent on the system operating mode.
Table 4–11 Zone Deconfiguration Actions

Action Taken            Description                             Comments

Cross-link mode = off   The cross-link mode is set to off.      Done only in non-Simplex
                        The system will continue in Simplex     mode.
                        mode. The action may be taken by the
                        hardware when the error occurs or by
                        software while handling the error.

CPU/MEM fault           A CPU/MEM fault is forced on the        Done when the error occurs
                        failed zone. This results in an entry   in Duplex, Simplex or in the
                        into system console.                    master zone of a Degraded
                                                                Duplex system.

Zone hard reset         A zone hard reset is issued to the      Done only in the slave zone
                        failed zone.                            of a Degraded Duplex or
                                                                Resynch mode system.

4.3.2.6 Cross-Link Cable
Table 4–12 describes the OpenVMS operating system actions taken when the
cross-link cable is identified as the FRU and is deconfigured by the EHS. The
cross-link cable is active only during non-Simplex modes.
Table 4–12 Cross-Link Cable Deconfiguration Actions

Action Taken            Description                             Comments

Cross-link mode = off   The cross-link mode is set to off.      Done only in non-Simplex
                        The system will continue in Simplex     modes.
                        mode. The action may be taken by the
                        hardware when the error occurs or by
                        software while handling the error.

CPU/MEM fault           A CPU/MEM fault is forced on Zone B.    Done only when the error
                        This results in an entry into system    occurs in Duplex mode.
                        console.

Zone hard reset         A zone hard reset is issued to the      Done in the slave zone when
                        slave zone.                             the error occurs in Degraded
                                                                Duplex or Resynch mode.

4.3.3 Application of Thresholds
Application of thresholds by the EHS is rate based. An FRU exceeds its threshold
when it accumulates a certain number of a given error type in a specified time
period. Table 4–13 lists the thresholds associated with each FRU and error type.
In most cases, more than one type of error can result in the isolation of an FRU.
For each FRU and error type, a separate threshold is applied. The threshold
for an error type of a specific FRU must be exceeded before the module is
deconfigured.
For example, both NXIO and I/O errors may isolate an ATM module. EHS
maintains separate thresholds for NXIO and I/O errors for each ATM module.
When one of the errors occurs and is isolated to an ATM, the threshold for that
error type on that ATM is applied. If the threshold is exceeded, the ATM is
deconfigured.
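
The rate-based behavior described above can be sketched in Python as a sliding-window counter kept separately for each FRU and error type. This is only an illustration: the class and function names, the limit of 3 errors, the one-hour period, and the choice to treat "exceeded" as "count greater than limit" are assumptions, not values or logic taken from the EHS.

```python
from collections import deque


class RateThreshold:
    """Sliding-window error counter for one (FRU, error type) pair."""

    def __init__(self, limit, period_seconds):
        self.limit = limit            # hypothetical error limit
        self.period = period_seconds  # hypothetical time period
        self.events = deque()         # timestamps of recent errors

    def record(self, now):
        """Record an error at time `now` (seconds); True if the limit is exceeded."""
        self.events.append(now)
        # Drop errors that have aged out of the time period.
        while self.events and now - self.events[0] > self.period:
            self.events.popleft()
        return len(self.events) > self.limit


# One independent threshold per (FRU, error type), as in the ATM example.
thresholds = {}


def report_error(fru, error_type, now, limit=3, period_seconds=3600):
    """Apply the threshold for this FRU and error type; True means deconfigure."""
    key = (fru, error_type)
    if key not in thresholds:
        thresholds[key] = RateThreshold(limit, period_seconds)
    return thresholds[key].record(now)
```

In this sketch, NXIO errors and I/O errors reported against the same ATM module are counted independently, so an I/O error does not bring the NXIO count closer to its limit.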
Table 4–13 FRU Thresholds
Error
Type

Error
Limit

Time
Period1 Comments
CPU Module

CPU/MEM
faults

A CPU/MEM fault results in the temporary removal
of the CPU module from service. The CPU will be
reconfigured into the system if this threshold is not
exceeded.

Resynch
abort errors

Resynch abort errors result in the termination of the
Resynch operation. When the threshold for this error
is exceeded, the CPU module is marked as broken.
System downtime must be scheduled to repair the
problem since the only CPU module has failed.
Memory SIMMs

Single-bit
memory
errors

Each single-bit memory error is attributed to a row
of memory SIMMs on a single MMB. Each SIMM row
has an individual threshold. When the threshold for
the SIMM row is exceeded, the CPU module on which
the SIMM resides will be removed from service if the
system operating mode is Duplex.
I/O ATM Module

Clock phase
errors

Each clock phase error results in the temporary
removal from service of a zone. When the zone returns
to service, it will be resynchronized automatically if the
threshold is not exceeded.

Transient
I/O errors

When this threshold is exceeded, the zone in which
the ATM resides is removed from service, except in a
Simplex system.
I/O Expansion Module

Transient
NXIO errors

When the threshold is exceeded, the module is
deconfigured, except in a Simplex system.

Transient
I/O errors

When the threshold is exceeded, the module is
deconfigured, except in a Simplex system.

VAXELN
kernel
recoverable
errors

When the threshold is exceeded, the module is
deconfigured, except in a Simplex system.

Interface Module
Transient
NXIO errors

When the threshold is exceeded, the interface module is
deconfigured, except in a Simplex system.
Zone

Power
failures

When power is lost, the zone is temporarily removed
from service and the error is compared to its error rate
threshold. When power is restored, the zone will be
resynchronized automatically if the threshold has not
been exceeded.

Zone
divergence

When the zones diverge, one zone is temporarily
removed from the configuration and the error is
compared to its error rate threshold. When the
zone returns to service, it will be reconfigured if the
threshold is not exceeded. This threshold is not applied
directly to any FRU. The selection of which zone to
remove is made based on how the error manifests itself
within the system.
Cross-Link

Cable
failures

When the cable between the zones is lost, the zone
is temporarily removed from service and the error is
compared to its error rate threshold. When the zone
returns, it will be resynchronized automatically if the
threshold has not been exceeded.

Transient
I/O errors

When the threshold is exceeded, the cross-link is
deconfigured, which results in the removal of one of the
zones from service.

1 In hours

4.4 OpenVMS Error Log
The EHS makes entries in the system error log for all system error interrupts.
Figure 4–3 shows the format of the error log. With the exception of the Fault
Data block, all blocks have fixed length.
Figure 4–3 OpenVMS Error Log Format

Number of Longwords
Fault Summary
FRU Information
Deconfiguration Information
Threshold Information
Fault Data
MR−0006−93RAGS

The first longword in the error log contains the count of longwords which follow.
This number is based on the fault class of the error log (see Section 4.4.1).
Table 4–14 lists the different values which will appear for each of the six different
fault classes.
Table 4–14 OpenVMS Error Log Sizes
Class Value

Fault Class

Decimal Size

Hexadecimal Size

System Error

End Action

End Action Timeout

VAXELN Error

Software Detected Error

CPU or Zone Unsynchable

4.4.1 Fault Summary
The Fault Summary block contains the fault ID, fault flags describing the nature
of the fault, the cross-link mode at the time the fault occurred, and the cross-link
mode after the error handling was completed. All fields in this block are valid for
all error entries. Figure 4–4 identifies each entry in the block and the offset from
the start of the block. Table 4–15 describes the content of each field.
Note
The 1-byte FAULT_ID field is composed of two 4-bit subfields. Bits [07:04]
indicate the class of the fault. Bits [03:00] identify the error type within
the fault class. There are six fault classes. Each class has a different fault
data block at the end of the error log. See Section 4.4.5 for a description
of each fault class and the fault data provided in the error log.
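
As a minimal sketch of the split described in this note, the following Python separates a FAULT_ID byte into its class and type subfields. The function names are hypothetical, and the mapping of class values 1 through 6 to class names is inferred from the ID values listed in Table 4–15 and the class names in Table 4–14.

```python
# Class names from Table 4-14; the numeric keys are inferred from the
# high-order digit of the FAULT_ID values in Table 4-15 (an assumption).
FAULT_CLASSES = {
    0x1: "System Error",
    0x2: "End Action",
    0x3: "End Action Timeout",
    0x4: "VAXELN Error",
    0x5: "Software Detected Error",
    0x6: "CPU or Zone Unsynchable",
}


def split_fault_id(fault_id):
    """Return (fault class, error type) from a 1-byte FAULT_ID."""
    if not 0 <= fault_id <= 0xFF:
        raise ValueError("FAULT_ID is a 1-byte field")
    fault_class = (fault_id >> 4) & 0xF  # bits [07:04]
    error_type = fault_id & 0xF          # bits [03:00]
    return fault_class, error_type


def describe_fault_id(fault_id):
    """Return a short text description of a FAULT_ID value."""
    fault_class, error_type = split_fault_id(fault_id)
    name = FAULT_CLASSES.get(fault_class, "unknown class")
    return f"class {fault_class:X} ({name}), type {error_type:X}"
```

For example, `describe_fault_id(0x1C)` reports class 1 (System Error) with error type C.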

Figure 4–4 Fault Summary Block

XLINK_MODE_AFTER
(Crosslink Mode After)

XLINK_MODE_ERROR
(Crosslink Mode Error)

FAULT_FLAGS
(Fault Flags)

FAULT_ID
(Fault Identification)
MR−0009−93RAGS

Table 4–15 Fault Summary Block Entry Descriptions
Entry

Contents

FAULT_ID

Fault Identification type. The hexadecimal ID values are defined
as:
10 - CPU-detected double-bit error
11 - JXD-detected double-bit error
12 - Cable gone between zones
13 - Power gone in other zone
14 - Clock error
15 - Other zone halted
16 - Resynch abort error
17 - CPU-detected single-bit error
18 - JXD-detected single-bit error
19 - CPU/MEM fault
1A - Nonexistent I/O
1B - I/O miscompare error
1C - Zone divergence
20 - CPU-detected DBE end action
21 - JXD-detected double-bit error end action
22 - Cable gone end action (reserved for future use)
23 - Power gone end action (reserved for future use)
24 - Clock error end action
25 - Other zone halted end action (reserved for future use)
26 - Resynch abort error end action (reserved for future use)
27 - CPU-detected single-bit error end action
28 - JXD-detected single-bit error end action (reserved for future
use)
29 - CPU/MEM fault end action
2C - Zone divergence end action
30 - CPU-detected DBE end action timeout
31 - JXD-detected DBE end action timeout
32 - Cable gone end action timeout (reserved for future use)
33 - Power gone end action timeout (reserved for future use)
34 - Clock error end action timeout
35 - Other zone halted end action timeout (reserved for future use)
36 - Resynch abort error end action timeout (reserved for future
use)
37 - CPU-detected SBE end action timeout
38 - JXD-detected single-bit error end action timeout (reserved for
future use)
39 - CPU/MEM fault end action timeout
3C - Zone divergence end action timeout (reserved for future
use)
40 - VAXELN kernel fatal error
41 - VAXELN kernel recoverable error
42 - VAXELN master job fatal error
43 - VAXELN master job recoverable error
44 - VAXELN job fatal error
45 - VAXELN job recoverable error (reserved for future use)
50 - Software-detected error
60 - CPU is unsynchable

FAULT_FLAGS

The following bits are defined within FAULT_FLAGS:
00 - Transient error
01 - Solid error
02 - Error threshold exceeded
03 - Service is required
[07:04] - Not used

XLINK_MODE_
ERROR

Cross-link mode at the time of error. The following values are
defined:
0 - Off (Simplex)
1 - Slave
2 - Master
3 - Duplex
4 - Not used
5 - RESYNCH_SLAVE
6 - RESYNCH_MASTER
7 - Not used

XLINK_MODE_
AFTER

Cross-link mode after error handling. The modes are as defined
for XLINK_MODE_ERROR.
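
The following Python sketch decodes a Fault Summary block under two stated assumptions: the four 1-byte fields are packed into one longword with FAULT_ID in the low-order byte (following the order shown in Figure 4–4), and the numbered FAULT_FLAGS entries are bit positions. The function and dictionary names are hypothetical.

```python
# Cross-link mode values from Table 4-15; values 4 and 7 are not used.
XLINK_MODES = {
    0: "Off (Simplex)",
    1: "Slave",
    2: "Master",
    3: "Duplex",
    5: "RESYNCH_SLAVE",
    6: "RESYNCH_MASTER",
}

# FAULT_FLAGS entries, treated here as bit positions (an assumption).
FAULT_FLAG_BITS = {
    0: "Transient error",
    1: "Solid error",
    2: "Error threshold exceeded",
    3: "Service is required",
}


def decode_fault_summary(longword):
    """Split the Fault Summary longword into its four 1-byte fields."""
    fault_id = longword & 0xFF            # assumed low-order byte
    flags = (longword >> 8) & 0xFF
    mode_error = (longword >> 16) & 0xFF
    mode_after = (longword >> 24) & 0xFF  # assumed high-order byte
    return {
        "fault_id": fault_id,
        "fault_flags": [name for bit, name in FAULT_FLAG_BITS.items()
                        if flags & (1 << bit)],
        "xlink_mode_error": XLINK_MODES.get(mode_error, "Not used"),
        "xlink_mode_after": XLINK_MODES.get(mode_after, "Not used"),
    }
```

Under these assumptions, a longword of 0003051C (hexadecimal) would describe a zone divergence (1C) that was transient, exceeded its threshold, occurred in Duplex mode, and left the system in Simplex mode.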

4.4.2 FRU Information
This block contains information on the isolated FRU and is valid for all error
events. Figure 4–5 identifies each entry in the block and the offset from the start
of the block. Table 4–16 describes the content of each entry.
Note
In some cases, an FRU is not identified in the error log for a system error
event. All fields in this block will be -1 (FFFFFFFF hexadecimal). In
these cases, the FRU will be identified in a subsequent end action or end
action timeout error log.

Figure 4–5 FRU Information Block

Offset  Entry
0       FRU_TYPE (FRU Type)
+4      FRU_DATA (FRU Data)

Table 4–16 FRU Information Block Entry Descriptions
Entry

Contents

FRU_TYPE

The following values are defined:
01 - The FRU is a module in Zone A (FRU_DATA has slot ID)
02 - The FRU is a module in Zone B (FRU_DATA has slot ID)
03 - Zone A is the FRU
04 - Zone B is the FRU
05 - The cross-link cable is the FRU
06 - The FRU is a Zone A SIMM (FRU_DATA has SIMM ID)
07 - The FRU is a Zone B SIMM (FRU_DATA has SIMM ID)

FRU_DATA

FRU specific data. The following bits are defined for FRU_TYPE values 01 and 02:
00 - CPU module in slot 0 is the FRU
01 - ATM module in slot 1 is the FRU
02 - I/O expansion module in slot 2 is the FRU
[09:03] - Not used
10 - Interface module in slot 10 is the FRU
11 - Interface module in slot 11 is the FRU
12 - Interface module in slot 12 is the FRU
13 - Interface module in slot 13 is the FRU
14 - Interface module in slot 14 is the FRU
15 - Interface module in slot 15 is the FRU
16 - Interface module in slot 16 is the FRU
17 - Interface module in slot 17 is the FRU
[19:18] - Not used
20 - Interface module in slot 20 is the FRU
21 - Interface module in slot 21 is the FRU
22 - Interface module in slot 22 is the FRU
23 - Interface module in slot 23 is the FRU
24 - Interface module in slot 24 is the FRU
25 - Interface module in slot 25 is the FRU
26 - Interface module in slot 26 is the FRU
27 - Interface module in slot 27 is the FRU
[31:28] - Not used

Note
The following fields define the SIMM ID for FRU_TYPEs 06 and 07:
[15:00] = MMB ID from 0 to 3.
[31:16] = SIMM row ID. Values 1 to 4 represent SIMM rows A to D,
respectively.
This field = -1 for all other FRU_TYPE values.
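
The SIMM ID layout in this note can be sketched in Python as follows. The function name and the returned tuple are hypothetical; the field positions and value ranges are the ones given in the note.

```python
# FRU_TYPE values that carry a SIMM ID in FRU_DATA, mapped to their zone.
SIMM_FRU_TYPES = {0x06: "A", 0x07: "B"}


def decode_simm_id(fru_type, fru_data):
    """Return (zone, MMB ID, SIMM row letter), or None if not a SIMM FRU."""
    if fru_type not in SIMM_FRU_TYPES:
        return None                      # no SIMM ID for other FRU types
    mmb_id = fru_data & 0xFFFF           # bits [15:00]
    row_id = (fru_data >> 16) & 0xFFFF   # bits [31:16]
    if mmb_id > 3 or not 1 <= row_id <= 4:
        raise ValueError("SIMM ID outside the documented ranges")
    return SIMM_FRU_TYPES[fru_type], mmb_id, "ABCD"[row_id - 1]
```

For example, FRU_TYPE 06 with FRU_DATA 00030002 (hexadecimal) identifies SIMM row C on MMB 2 in Zone A.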

4.4.3 Deconfiguration Information
This error log block contains information about any system deconfiguration
performed by the EHS. Figure 4–6 identifies each entry in the block and the
offset from the start of the block. Table 4–17 describes the content of each entry.
Note
For errors which require no system deconfiguration, only the FT_FLAGS
fields will be filled in. The last two longwords will contain 0.

Figure 4–6 Deconfiguration Information Block

Offset  Entry
0       FT_FLAGS_BEFORE (Fault Flags Before)
+4      FT_FLAGS_AFTER (Fault Flags After)
+8      DECONFIG_INFO (Entity Deconfigured)
+12     DECONFIG_MODULES (Modules Deconfigured)

Table 4–17 Deconfiguration Information Block Entry Descriptions
Entry

Contents

FT_FLAGS_
BEFORE

The contents of EXE$GL_FT_FLAGS at the time the system error
occurred. The field is valid for all errors.

FT_FLAGS_AFTER

The contents of EXE$GL_FT_FLAGS after error handling is
complete. If the EHS performs any system deconfiguration that
includes degraded system mode in the cross-link, this field will
differ from FT_FLAGS_BEFORE. Otherwise, they are the same.
The field is valid for all errors.

DECONFIG_INFO

This field shows the entity which was deconfigured as a result of
the error. This is either a module in a given zone or an entire zone.
The following bits are defined:
00 - Zone A deconfigured.
01 - Zone B deconfigured.
02 - CPU module in Zone A deconfigured.
03 - CPU module in Zone B deconfigured.
04 - ATM module in Zone A deconfigured.
05 - ATM module in Zone B deconfigured.
06 - I/O expansion module in Zone A deconfigured.
07 - I/O expansion module in Zone B deconfigured.
08 - Interface module in Zone A deconfigured.
09 - Interface module in Zone B deconfigured.

DECONFIG_
MODULES

This field shows the Zone A modules removed from service as
a result of error handling. For example, if the source of a solid
or excessive transient error is an I/O expansion module, all
attached interface modules are also removed from service. The
following bits are defined:
00 - CPU module in slot 0 has been removed from service.
01 - I/O expansion module in slot 1 has been removed from service.
Set when the expansion module portion of the ATM module in slot 1
is removed from service. Removal of this portion of the ATM module
does not require deconfiguring the entire zone.
02 - I/O expansion module in slot 2 has been removed from service.
03 - ATM module in slot 1 has been removed from service. Set when
the entire ATM module is removed from service. The bits for all
other modules present in the zone will also be set. The entire zone
is deconfigured.
[09:04] - Not used.
10 - Interface module in slot 10 has been removed from service.
11 - Interface module in slot 11 has been removed from service.
12 - Interface module in slot 12 has been removed from service.
13 - Interface module in slot 13 has been removed from service.
14 - Interface module in slot 14 has been removed from service.
15 - Interface module in slot 15 has been removed from service.
16 - Interface module in slot 16 has been removed from service.
17 - Interface module in slot 17 has been removed from service.
[19:18] - Not used.
20 - Interface module in slot 20 has been removed from service.
21 - Interface module in slot 21 has been removed from service.
22 - Interface module in slot 22 has been removed from service.
23 - Interface module in slot 23 has been removed from service.
24 - Interface module in slot 24 has been removed from service.
25 - Interface module in slot 25 has been removed from service.
26 - Interface module in slot 26 has been removed from service.
27 - Interface module in slot 27 has been removed from service.
[31:28] - Not used.
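
As an illustration of how the DECONFIG_MODULES bits might be read, the Python sketch below lists the modules whose bits are set in the longword. It assumes the bit numbers in the list above are decimal bit positions; the function name and the wording of the returned strings are hypothetical.

```python
def decode_deconfig_modules(mask):
    """Return descriptions of the modules whose bits are set in the mask."""
    names = {
        0: "CPU module in slot 0",
        1: "I/O expansion module in slot 1 (expansion portion of the ATM module)",
        2: "I/O expansion module in slot 2",
        3: "ATM module in slot 1 (entire zone deconfigured)",
    }
    # Bits 10-17 and 20-27 map directly to interface module slot numbers.
    for slot in list(range(10, 18)) + list(range(20, 28)):
        names[slot] = f"Interface module in slot {slot}"
    return [names[bit] for bit in sorted(names) if mask & (1 << bit)]
```

A value of 0, which is what the field contains when no deconfiguration was required, yields an empty list.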

4.4.4 Threshold Information
When the Transient Error flag is set in the FAULT_FLAGS field of the Fault
Summary block, the isolated FRU error is compared to its error rate threshold.
When the threshold is exceeded, the FRU will be removed from the system. In
addition, the Excessive Transient Errors flag is set in the FAULT_FLAGS field.
When the threshold comparison is completed, the threshold information is written
to the error log. Figure 4–7 identifies each entry in the block and the offset from
the start of the block. Table 4–18 describes the content of each entry.
Note
For errors which do not require a threshold comparison, all entries in this
block will be -1 (FFFFFFFF hex).

Figure 4–7 Threshold Information Block

Offset  Entry
0       THRESH_INT (Threshold Interval)
+4      THRESH_COUNT (Threshold Count)
+8      THRESH_LMT (Threshold Limit)
+12     THRESH_ZERO (Time Since Zeroed)
+16     THRESH_TOTAL (Total Error Types)

Table 4–18 Threshold Information Block Entry Descriptions
Entry

Content

THRESH_INT

The event threshold interval, expressed in seconds.

THRESH_COUNT

The number of events detected within the threshold interval,
expressed in decimal.

THRESH_LMT

The number of events which, if detected within the threshold
interval, will cause the event to be treated as a solid error by the
EHS. Expressed in decimal.

THRESH_ZERO

Time since the threshold count was last zeroed, expressed in
seconds.

THRESH_TOTAL

Total number of errors of this type since the threshold was zeroed,
expressed in decimal.
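
A minimal Python sketch of reading the Threshold Information block follows. It assumes the block is available as 20 raw bytes holding five longwords in the order shown in Figure 4–7, stored in the VAX little-endian byte order. The type and function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

NOT_APPLICABLE = 0xFFFFFFFF  # -1: no threshold comparison was required


class ThresholdInfo(NamedTuple):
    interval_s: int      # THRESH_INT, seconds
    count: int           # THRESH_COUNT
    limit: int           # THRESH_LMT
    since_zeroed_s: int  # THRESH_ZERO, seconds
    total: int           # THRESH_TOTAL


def parse_threshold_block(raw: bytes) -> Optional[ThresholdInfo]:
    """Decode the five longwords; return None when every entry is -1."""
    if len(raw) != 20:
        raise ValueError("Threshold Information block is five longwords")
    fields = struct.unpack("<5I", raw)
    if all(value == NOT_APPLICABLE for value in fields):
        return None
    return ThresholdInfo(*fields)
```

Returning `None` for the all -1 case keeps callers from mistaking FFFFFFFF for a real interval or count.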

4.4.5 Fault Data
The Fault Data block has a variable length specific to the class of the fault which
occurred. The error class can be determined by the high-order four bits of the
FAULT_ID field in the Fault Summary block (see Table 4–15). The six Fault Data
types based on these fault classes are shown in Figure 4–8 and described in the
following subsections.
Figure 4–8 Fault Data Block

System Registers
End Actions (End Action Registers)
End Action Timeouts
VAXELN Detected Errors
Software Detected Errors
Unsynchable Events

+108
+112
+1
+16
+8

MR−0005−93RAGS

4.4.5.1 System Registers
The EHS gathers system error information in the course of error handling. The
content of these registers is written to the error log. Table 4–19 lists each register
entry and its offset from the start of the block.
Note
For different system errors, different sets of system registers are collected.
A value of -1 (FFFFFFFF hex) in a system register location in the error
log indicates that the register was not recorded.

Table 4–19 System Register Entry Descriptions
Entry

Content

Offset

SYSFLT

JXD System Fault Register

SYSADR

JXD System Error Address Register

DMAADR

DMA Error Address Register

DMA_IO_ADDR

DMA Engine I/O Error Address Register

JCSR_A

JXD Control and Status Register - Zone A

JCSR_B

JXD Control and Status Register - Zone B

JDIAG_P_A

JXD Diagnostic Error Register - Zone A, primary rail

JDIAG_M_A

JXD Diagnostic Error Register - Zone A, mirror rail

JDIAG_P_B

JXD Diagnostic Error Register - Zone B, primary rail

JDIAG_M_B

JXD Diagnostic Error Register - Zone B, mirror rail

ATMERR0_A

JXD ROM BUS ATM Error Register - Zone A

ATMERR0_B

JXD ROM BUS ATM Error Register - Zone B

DMASTS_A

DMA Status Register - Zone A

DMASTS_B

DMA Status Register - Zone B

MMBERR0_A

JXD ROM BUS MMB Error Register 0 - Zone A

MMBERR0_B

JXD ROM BUS MMB Error Register 0 - Zone B

MMBERR1_A

JXD ROM BUS MMB Error Register 1 - Zone A

MMBERR1_B

JXD ROM BUS MMB Error Register 1 - Zone B

SERCRS_A

Serial Cross-Link Control and Status Register - Zone A

SERCRS_B

Serial Cross-Link Control and Status Register - Zone B

SERMODE_A

Serial Cross-Link Mode Register - Zone A

SERMODE_B

Serial Cross-Link Mode Register - Zone B

BIU_ADDR_A

CPU BIU Address Register - Zone A

BIU_ADDR_B

CPU BIU Address Register - Zone B

BIU_STAT_A

CPU Fill Syndrome - Zone A

BIU_STAT_B

CPU Fill Syndrome - Zone B

100

BIU_CTL_A

CPU Fill Address - Zone A

104

BIU_CTL_B

CPU Fill Address - Zone B

108
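
Most of the offsets in Table 4–19 are not legible here. The Python sketch below assumes that each entry occupies one longword in the order listed, so that an entry's offset is four times its position. The three offsets that do appear in the table (100, 104, and 108) agree with that assumption. The helper names are hypothetical.

```python
# Register entries in the order they appear in Table 4-19.
SYSTEM_REGISTERS = [
    "SYSFLT", "SYSADR", "DMAADR", "DMA_IO_ADDR",
    "JCSR_A", "JCSR_B",
    "JDIAG_P_A", "JDIAG_M_A", "JDIAG_P_B", "JDIAG_M_B",
    "ATMERR0_A", "ATMERR0_B", "DMASTS_A", "DMASTS_B",
    "MMBERR0_A", "MMBERR0_B", "MMBERR1_A", "MMBERR1_B",
    "SERCRS_A", "SERCRS_B", "SERMODE_A", "SERMODE_B",
    "BIU_ADDR_A", "BIU_ADDR_B", "BIU_STAT_A", "BIU_STAT_B",
    "BIU_CTL_A", "BIU_CTL_B",
]

NOT_RECORDED = 0xFFFFFFFF  # -1 marks a register that was not collected


def register_offsets(names):
    """Map each entry name to its assumed byte offset (one longword each)."""
    return {name: 4 * index for index, name in enumerate(names)}


def recorded_registers(longwords):
    """Pair names with values, skipping registers that were not recorded."""
    return {name: value
            for name, value in zip(SYSTEM_REGISTERS, longwords)
            if value != NOT_RECORDED}
```

With 28 entries, the last offset computed this way is 108, which matches the final row of the table.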

4.4.5.2 End Actions
End action data is provided after diagnostics have completed running on a zone
or CPU which was removed from service as a result of a system error. It is
composed of console and diagnostic status and the contents of registers from the
failed zone/CPU at the time the original system error occurred. Table 4–20 lists
each register entry and its offset from the start of the data block.

Table 4–20 End Actions Register Descriptions
Entry

Content

Offset

SYSFLT

JXD System Fault Register

SYSADR

JXD System Error Address Register

JCSR

JXD Control and Status Register

JDIAG_P

JXD Diagnostic Error Register - primary rail

JDIAG_M

JXD Diagnostic Error Register - mirror rail

MMBERR0

JXD ROM BUS MMB Error Register 0

MMBERR1

JXD ROM BUS MMB Error Register 1

ATMERR0

JXD ROM BUS ATM Error Register

DMASTS

DMA Status Register

DMAADR

DMA Error Address Register

SERCRS

Serial Cross-Link Control and Status Register

SERMODE

Serial Cross-Link Mode Register

SAVPC

CPU Saved PC

SAVPSL

CPU Saved PSL

ECR

CPU EBox Control Register

BIU_CTL

CPU BIU Control Register

BC_TAG

CPU B-cache Error Tag

BIU_STS

CPU BIU Status Register

BIU_ADDR

CPU BIU Address Register

FIL_SYN

CPU Fill Syndrome

FIL_ADDR

CPU Fill Address

VMAR

CPU VIC Memory Address Register

ICSR

CPU IBox Control and Status Register

TBADR

CPU MBox TB Parity Address

TBSTS

CPU MBox TB Parity Status

PCSTS

CPU P-cache Status Register

100

PCCTL

CPU P-cache Control Register

104

CONSOLE_STS

System Console Duplex Compatibility Status

108

DIAG_STS

System Diagnostics Status Longword

112

4.4.5.3 End Action Timeouts
This data is provided when a zone or CPU which was temporarily removed from
service due to a fault fails to communicate through the interzone communication
service (IZC) to the remaining zone after running diagnostics. In many cases,
such a situation results in the EHS declaring a solid error for the CPU or zone in
this error log.

Figure 4–9 shows the format of this Fault Data block entry and its offset.
Table 4–21 contains a brief description of the entry.
Figure 4–9 End Action Timeout Block

TIMEOUT_INT (Timeout Interval)

MR−0013−93RAGS

Table 4–21 End Action Timeout Block Entry Description
Entry

Content

Offset

TIMEOUT

End action timeout interval in seconds

4.4.5.4 VAXELN Detected Errors
This data is provided for errors detected by VAXELN software running on the I/O
expansion module. It is composed of data provided by VAXELN software when
the error was detected on the I/O expansion module.
Figure 4–10 shows the format of this Fault Data block and the offset of each
entry from the start of the block. Table 4–22 contains a brief description of each
entry.
Figure 4–10 VAXELN Detected Error Block

Offset  Entry
0       ERROR_CLASS (VAXELN Error Class)
+4      ERROR_TYPE (VAXELN Error Type)
+8      JOB_ID (ELN Component Job with Error)
+12     ERROR_CODE (Unique Error Designation Code)
+16     ERROR_DATA (Error Condition Specific Data)

Table 4–22 VAXELN Detected Error Block Entry Descriptions
Entry

Contents

ERROR_CLASS

VAXELN error class:
1 - VAXELN kernel fatal error
2 - VAXELN kernel recoverable error
3 - VAXELN master job fatal error
4 - VAXELN master job recoverable error
5 - VAXELN job fatal error
6 - VAXELN job recoverable error (reserved for future use)

ERROR_TYPE

VAXELN error type:
1 - Hardware error
2 - Software error
3 - Unknown error

JOB_ID

VAXELN component job with error:
0 - Interface module 0 driver job
1 - Interface module 1 driver job
2 - Interface module 2 driver job
3 - Interface module 3 driver job
4 - Interface module 4 driver job
5 - Interface module 5 driver job
6 - Interface module 6 driver job
7 - Interface module 7 driver job
8 - UART 0 driver job
9 - UART 1 driver job
10 - VAXELN master job
13 - VAXELN FIST job
14 - VAXELN background job
15 - VAXELN I/O expansion module error
17 - VAXELN kernel error

ERROR_CODE

Unique error designation code (in hexadecimal)

9000

Watchdog timer expired

FA03

Job initialization failed

FA04

Job initialization timeout

CA01

Unexpected command interrupt

CA02

Unexpected interface module interrupt

Machine check handler entered with unknown type code

Floating point accelerator error

Memory management - PTE in P0 space

Memory management - PTE in P1 space

Memory management - PTE in P0 space on M bit

Memory management - PTE in P1 space on M bit

Unused interrupt priority level

Microcode detected error

Unknown hardware error

10080

Bus timeout error. Read error - normal read

20080

DAL parity error. Read error - normal read

30080

Cache parity error. Read error - normal read

40080

Uncorrectable read data error. Read error - normal read

50080

DMA error. Read error - normal read

60080

Firewall SOC miscompare. Read error - normal read

Unknown hardware error. Read error - SPTE/PCB/SCB

10081

Bus timeout error. Read error - SPTE/PCB/SCB

20081

DAL parity error. Read error - SPTE/PCB/SCB

30081

Cache parity error. Read error - SPTE/PCB/SCB

40081

Uncorrectable read data error. Read error - SPTE/PCB/SCB

50081

DMA error. Read error - SPTE/PCB/SCB

60081

Firewall SOC miscompare. Read error - SPTE/PCB/SCB

Unknown hardware error. Write error - normal write

10082

Bus timeout error. Write error - normal write

20082

DAL parity error. Write error - normal write

30082

Cache parity error. Write error - normal write

40082

Uncorrectable read data error. Write error - normal write

50082

DMA error. Write error - normal write

60082

Firewall SOC miscompare. Write error - normal write

Unknown hardware error. Write error - SPTE/PCB

10083

Bus timeout error. Write error - SPTE/PCB

20083

DAL parity error. Write error - SPTE/PCB

30083

Cache parity error. Write error - SPTE/PCB

40083

Uncorrectable read data error. Write error - SPTE/PCB

50083

DMA error. Write error - SPTE/PCB

60083

Firewall SOC miscompare. Write error - SPTE/PCB

100

Correctable read data error

200

Polled machine bus timeout error

201

Polled machine DAL parity error

202

Polled machine cache parity error

203

Polled machine uncorrectable read data error

204

Polled machine DMA error

205

Polled machine Firewall SOC miscompare

206

Polled machine battery low

400

Fatal system bugcheck

401

Nonfatal system bugcheck

402

Bugcheck from process

800

Bugcheck during boot

Normal successful completion

7C04

Bad parameter count

7C0C

Bad job or process creation

7C14

Bad string parameter length

7C1C

Bad access mode

7C24

Bad stack

7C2C

Bad object state

7C34

Bad object type

7C3C

Bad parameter value

7C44

Connect circuit completed

7C4C

Connect circuit pending

7C54

Connect circuit timeout

7C5C

Count overflow

7C64

Count underflow

7C6C

Debug signal

7C74

Device already connected

7C7C

Circuit disconnected by partner

7C84

Duplicate name

7C8C

Kernel stack not valid

7C94

Machine check

7C9C

No access to parameter

7CA4

No destination port

7CAC

No job initialization specified

7CB4

No physical memory available

7CBC

No I/O mapping register available

7CC4

No message available

7CCC

No object table entry available

7CD4

No process page table available

7CDC

No data path register available

7CE4

No pool available

7CEC

No port available

7CF4

No exit status value specified

7CFC

No such device

7D04

No such name

7D0C

No such port

7D14

No such program

7D1C

No such service

7D24

No system page table entries available

7D2C

No virtual address space available

7D34

Power recovery signal

7D3C

Quit signal

7D44

Remote port value

7D4C

Process exit signal

7D54

Remote system currently unreachable

7D5C

Interprocess signal

7D64

Remote system rejected username or password

7D6C

Bad message size

7D74

Referenced shareable image not present

7D7C

Unsupported program image format

7D84

Internal consistency failure

7D8C

Port on another BI node

7D94

Third party disconnected circuit

7D9C

Network is in the off state

7DA4

No such job

7F01

Time has not been previously set

7F09

Expedited message

7F11

Previous job created area

7F19

Device already exists

ERROR_DATA

Error condition specific data. This entry is reserved for future
expansion.
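
The five-digit ERROR_CODE values in Table 4–22 (10080 through 60083) appear to combine a cause in the upper digits with an access type in the low-order byte. The Python sketch below decodes them on that basis. The split is a pattern inferred from the table, not a documented encoding, and the function name is hypothetical.

```python
# Upper part of the code: the error cause (inferred from Table 4-22).
CAUSES = {
    1: "Bus timeout error",
    2: "DAL parity error",
    3: "Cache parity error",
    4: "Uncorrectable read data error",
    5: "DMA error",
    6: "Firewall SOC miscompare",
}

# Lower part of the code: the access that failed (inferred from Table 4-22).
ACCESSES = {
    0x80: "Read error - normal read",
    0x81: "Read error - SPTE/PCB/SCB",
    0x82: "Write error - normal write",
    0x83: "Write error - SPTE/PCB",
}


def describe_machine_check(code):
    """Return the table text for a cause/access code, or None if no match."""
    cause = CAUSES.get(code >> 16)
    access = ACCESSES.get(code & 0xFFFF)
    if cause is None or access is None:
        return None  # not one of the cause/access codes; use the table
    return f"{cause}. {access}"
```

Codes outside this pattern, such as 9000 or the 7Cxx status values, return `None` and should be looked up in the table directly.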

4.4.5.5 Software Detected Errors
This data is provided for errors detected by the OpenVMS operating system
components. Such errors are not usually detected by hardware mechanisms. The
data is composed of information passed by the operating system component to the
EHS.
Figure 4–11 shows the format of this fault data block and the offset of each entry
from the start of the block. Table 4–23 contains a brief description of each entry.
Note
If the software component which detects the module failure does not
request the setting of the module ID NVRAM status code or does not
request a reset of the module, then these fields will contain -1 (FFFFFFFF
hexadecimal).

Figure 4–11 Software Detected Error Block

Offset  Entry
0       MODULE_STATUS
+4      RESET_REASON
+8      RESET_ACTION

Table 4–23 Software Detected Error Block Entry Descriptions
Entry

Contents

MODULE_STATUS

Hexadecimal module ID NVRAM status code. The following values
are defined:

Excessive CPU/MEM faults

Excessive resynchronization abort errors

Double-bit error

Excessive single-bit errors

Excessive clock phase errors

Excessive CPU I/O errors

Solid CPU I/O errors

Excessive transient NXIO errors

Solid NXIO error

VAXELN kernel fatal error

The module is good

Excessive VAXELN kernel recoverable errors

VAXELN master fatal error

VAXELN master recoverable error

VAXELN job fatal error

System software detected module failure

System software detected I/O expansion module primary UART
failure

System software detected I/O expansion module auxiliary UART
failure

Unexpected VAXELN error detected

RESET_REASON

Hexadecimal OpenVMS reset reason code. The following values are
defined:

Duplex zones have diverged

Fatal cross-link error has occurred

Fatal zone error has occurred

Fatal ATM module error has occurred

Fatal CPU module error has occurred

Fatal memory error has occurred

Single-bit error has occurred

User command issued to stop a zone

Unexpected machine check has occurred

Software detected failure has occurred

Solid NXIO error has occurred

Excessive transient I/O expansion module errors have occurred

A solid I/O error has occurred

Excessive transient I/O errors have occurred

Excessive VAXELN kernel recoverable errors have occurred

A VAXELN master fatal error has occurred

A VAXELN job fatal error has occurred

Not enough SPTEs could be allocated to boot the OpenVMS
operating system

Unexpected system error occurred

Interface module has occurred

Unexpected VAXELN error occurred

A VAXELN kernel fatal error has occurred

RESET_ACTION

Hexadecimal console reset action code. The following values are
defined:

Unexpected CPU reset

No diagnostic CPU reset

Dispatch request CPU reset

Resynchronization reset CPU reset

Run diagnostic CPU reset

Reconfigure console CPU reset

STOP/ZONE CPU reset

10000   Unexpected I/O reset

10001   No diagnostic I/O reset

10002   Dispatch request I/O reset

10003   Z command I/O reset

10004   Load and run (VAXELN) I/O reset

10005   Upgrade flash ROM I/O reset

10006   Run diagnostic I/O reset

10007   Reconfigure console I/O reset
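
The layout in Figure 4–11 can be illustrated with a short Python sketch that splits a 12-byte block into its three longwords. This is only an illustration under stated assumptions: the function names are hypothetical, the way the raw bytes are obtained from the error log is not shown, and only the I/O reset action codes that appear in Table 4–23 are included. VAX longwords are little-endian, which is why the `<` byte order is used.

```python
import struct

# Field order follows Figure 4-11: offsets 0, +4, and +8.
FIELDS = ("MODULE_STATUS", "RESET_REASON", "RESET_ACTION")

# -1 (FFFFFFFF hexadecimal) marks a field the detecting component did not request.
NOT_SET = 0xFFFFFFFF

# I/O reset action codes from Table 4-23 (hexadecimal values).
IO_RESET_ACTIONS = {
    0x10000: "Unexpected I/O reset",
    0x10001: "No diagnostic I/O reset",
    0x10002: "Dispatch request I/O reset",
    0x10003: "Z command I/O reset",
    0x10004: "Load and run (VAXELN) I/O reset",
    0x10005: "Upgrade flash ROM I/O reset",
    0x10006: "Run diagnostic I/O reset",
    0x10007: "Reconfigure console I/O reset",
}


def decode_sw_error_block(raw: bytes) -> dict:
    """Hypothetical helper: split the block into three little-endian longwords."""
    if len(raw) < 12:
        raise ValueError("block must be at least 12 bytes")
    values = struct.unpack_from("<3L", raw, 0)
    # Report unset fields as None instead of FFFFFFFF.
    return {name: (None if v == NOT_SET else v) for name, v in zip(FIELDS, values)}


def describe_reset_action(code) -> str:
    """Hypothetical helper: translate a RESET_ACTION value where the code is known."""
    if code is None:
        return "not set"
    return IO_RESET_ACTIONS.get(code, f"code {code:X} (not in the I/O list)")
```

A field that decodes to `None` corresponds to the case described in the Note before Figure 4–11.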

4.4.5.6 Unsynchable Events
This data is provided if the console reports that a zone or CPU is unsynchable
when no previous error had been associated with it. The error can occur when
diagnostics run on a zone which was not present in the system configuration, or
after a zone has been manually removed. The data is composed of console and
diagnostic status from the failed zone.

Figure 4–12 shows the format of this fault data block and the offset of each field
from the start of the block. Table 4–24 contains a brief description of each entry.
Figure 4–12 Unsynchable Event Block

Offset   Field
0        COMPAT_STS (Test Status)
+4       DIAG_STS (Diagnostic Status)

Table 4–24 Unsynchable Event Block Entry Descriptions
Bit

Description

COMPAT_STS

System console duplex compatibility test status. This field indicates
the results of the compatibility test performed by the console after
diagnostics have completed. The following bits are defined:

Self test failed

Zone test failed

System test failed

ATM module self test failed

Both zones have same zone ID

CPU ID EEPROM is bad

CPU ID EEPROM has bad OpenVMS status

CPU ID EEPROM has bad firmware status

CPU ID EEPROM module ID mismatches with other zone

CPU ID EEPROM module name mismatches with other zone

CPU ID EEPROM hardware revision not compatible with other zone

CPU ID EEPROM firmware revision not compatible with other zone

CPU ID EEPROM software revision not compatible with other zone

ATM module ID EEPROM is bad

ATM module ID EEPROM has bad OpenVMS status

ATM module ID EEPROM has bad firmware status

ATM module ID EEPROM module ID mismatches with other zone

ATM module ID EEPROM module name mismatches with other zone

ATM module ID EEPROM hardware revision not compatible with other
zone

ATM module ID EEPROM firmware revision not compatible with other
zone

ATM module ID EEPROM software revision not compatible with other
zone

CPU data EEPROM is bad

CPU data EEPROM system wide data area mismatches with other
zone

CPU memory configuration mismatches with other zone

Cables (cross-link/resynchronization)

CPU is in burn-in mode

Ethernet EEPROM mismatches with other zone

CPU console firmware cannot be run in Duplex

[31:28]

Not used

DIAG_STS

System diagnostic status longword. This field is valid when any of bits
[03:00] are set in COMPAT_STS. This longword gives additional detail
on the diagnostic failure indicated by those bits. The following bits are
defined:

[07:00]

Subtest number, expressed in decimal

[15:08]

Test number, expressed in decimal

[23:16]

Group number, expressed in decimal

[27:24]

Diagnostic flags, expressed in hexadecimal

[30:28]

Not used

Diagnostic status is valid
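
The DIAG_STS bit fields in Table 4–24 can be extracted with shifts and masks, as in the following Python sketch. The function names are hypothetical. The position of the "Diagnostic status is valid" flag is an assumption: it is taken to be bit 31, the only bit of the longword not covered by the other ranges.

```python
def decode_diag_sts(diag_sts: int) -> dict:
    """Hypothetical helper: split DIAG_STS into the fields listed in Table 4-24."""
    return {
        "subtest": diag_sts & 0xFF,          # bits [07:00], decimal
        "test": (diag_sts >> 8) & 0xFF,      # bits [15:08], decimal
        "group": (diag_sts >> 16) & 0xFF,    # bits [23:16], decimal
        "flags": (diag_sts >> 24) & 0xF,     # bits [27:24], hexadecimal
        # Assumption: bit 31 carries "Diagnostic status is valid".
        "valid": bool((diag_sts >> 31) & 1),
    }


def diag_sts_applies(compat_sts: int) -> bool:
    """DIAG_STS is valid only when any of COMPAT_STS bits [03:00] are set."""
    return (compat_sts & 0xF) != 0
```

For example, a DIAG_STS value of 83020105 (hexadecimal) would decode to group 2, test 1, subtest 5, with diagnostic flags 3 and the assumed valid bit set.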

4.5 Module NVRAM Status and LED Indicators
Each Model 810 zone contains multiple I2C buses that provide access to the
NVRAMs and LEDs on each module. The system I2C bus connects all the modules
in the primary backplane slots in a zone and has master controllers on the
I/O ATM module. This bus is used to access the NVRAMs and LEDs on the CPU
and I/O ATM modules and on the embedded primary I/O expansion module. The
primary I/O expansion module has its own I2C bus, with a master controller
and connections to each interface module, to access their NVRAMs and LEDs.
When the EHS identifies a module as the source of solid or excessive transient
errors, it removes the module from service. At the same time, it flags the module
as failed, turns on the module LED, and writes the error code to the module
NVRAM through its I2C bus. When the zone is removed for service, the LED
remains on.
When repair is complete and system power is turned on, diagnostics on the CPU
or I/O expansion module will examine the error code. If the OpenVMS operating
system flagged the module as failed, or diagnostics fail, the diagnostics will
not turn off the LED. The LED remains on until the module is replaced or the
NVRAM is cleared.
Table 4–25 lists the status codes that the EHS may write into the operating
system status field of the module ID NVRAM, as well as symbol names,
descriptions, and affected modules. The EHS sets the module LED every time it
writes one of these status codes.
Note
In the case of some catastrophic ATM failures, it may not be possible to
access the I2C bus for that zone to write the code and set the LED. In
such cases, diagnostics on the remote zone are relied on to report the
failure.

Table 4–25 Module ID NVRAM/DCB Status Codes
Status Code

Description

Affected Modules

The threshold for CPU/MEM faults for
this module has been exceeded.

CPU module

The threshold for resynch abort errors
for this module has been exceeded.

CPU module

The module experienced a double-bit
memory error.

CPU module

The threshold for single-bit errors for a
memory SIMM has been exceeded.

CPU module

The zone in which this module resides
has experienced excessive clock phase
errors.

ATM module

The module has experienced excessive
transient CPU I/O errors.

ATM and I/O expansion
modules

The module has experienced a solid CPU
I/O error.

ATM and I/O expansion
modules

The module has experienced excessive
transient NXIO errors.

ATM, I/O expansion, and
Interface modules

The module has experienced a solid
NXIO error.

ATM, I/O expansion, and
Interface modules

The module has experienced a VAXELN
kernel fatal error.

I/O expansion module

The module is good.

CPU, ATM, I/O expansion,
and Interface modules

The module has experienced excessive
VAXELN kernel recoverable errors.

I/O expansion module

The module has experienced a VAXELN
master fatal error.

I/O expansion module

The module has experienced a VAXELN
master recoverable error.

Interface module

The module has experienced a VAXELN
job fatal error.

Interface module

A failure of this module has been
detected by a system software
component.

ATM, I/O expansion, and
Interface modules

A failure of the system console UART
port in the SSC on the I/O expansion
module has been detected by a system
software component.

ATM and I/O expansion
module

A failure of the auxiliary UART port in
the SSC on the I/O expansion module
has been detected by a system software
component.

ATM and I/O expansion
module

4.6 FTSS Event Reporting Interface
The EHS externalizes events by reporting them to the event reporting interface
(ERI). The ERI, in turn, passes notification of the event to the FTSS$SERVER
process. The server reports the event in one of three ways:
1. Generating messages that are sent to the operator console.
2. Entering additional information into the system error log.
3. Reporting the event to an external mailbox which can be read by a user
application.

4.6.1 Event Reporting Interface Routines
The EHS reports events by calling the following ERI routines located in the
FTSS$CORE image.
FTSS$ZONE_AVAILABLE is called to report the availability of the other zone
or CPU. This occurs when the IZC notifies the EHS that the zone has completed
diagnostics and is available for use. The EHS adds a message code, which
causes the server to generate an OPCOM message and an error log entry.
FTSS$ERROR_REPORT is called by the EHS when a FRU is identified as the
error source. This can occur as a result of a hardware or software detected
failure. In this call the EHS passes error information through ERI to the server
process. The server generates the appropriate messages to the operator console
and user applications, and makes entries in the error log.

4.6.2 Error Event Messages
The following messages are passed to OPCOM and the system error log by
the server. Each message corresponds to an EHS error event and contains
information that identifies the FRU.
FTSS$_CABLEGONE, cross-link cable fault detected
Facility: FTSS
Explanation: The cross-link cable has been isolated as the cause of a system
failure. One zone will be removed from service by the operating system. For
transient failures, the error will be compared to its error rate threshold.
If the threshold is not exceeded, the zone will be resynchronized when it
completes diagnostics.
User Action: If the zone is automatically resynchronized, no action
is required on the part of the user. If the zone is not automatically
resynchronized, the system error log should be examined for entries which
correspond to the cross-link cable failure. These entries will identify an FRU.
FTSS$_CLOCK_END, Clock fault end action complete
Facility: FTSS
Explanation: Error processing for a clock fault has been completed and the
zone is available to be resynchronized.
User Action: If the zone is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the zone is not resynchronized,
the system error log should be examined for entries which correspond to the
clock fault. These error logs will identify an FRU.

FTSS$_CLOCK_ENDTMO, Clock fault end action timeout on zone [zone_id]
Facility: FTSS
Explanation: When a clock fault occurs in a non-Simplex system, diagnostics
normally run on the failed zone and, upon completion, report status back to
the zone running the operating system. If this end action does not occur
within a reasonable timeout period, the failure will be treated as solid and
the zone will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the clock fault and the end action timeout. These entries will
indicate an FRU.
FTSS$_CLOCKFLT, Clock fault detected on [module_id] in slot [slot_id], zone
[zone_id]
Facility: FTSS
Explanation: The clocks in each of the two zones operate in phase lock.
When this synchronization is lost, lockstep operation of the zones is lost. The
error is compared to its error rate threshold. If the threshold is exceeded, the
zone is not automatically resynchronized by FTSS.
User Action: If the removed zone is automatically resynchronized after
running diagnostics, no action is needed on the part of the user. If the zone
is not automatically resynchronized, the system error log should be examined
for entries which correspond to the clock fault. These entries will identify an
FRU which must be replaced.
FTSS$_CPMF_END, CPU/MEM fault end action complete
Facility: FTSS
Explanation: Error processing for a CPU/MEM fault has been completed
and the CPU is available to be resynchronized.
User Action: If the CPU is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the CPU is not resynchronized,
the system error log should be examined for entries which correspond to the
CPU/MEM fault. These error logs will identify an FRU.
FTSS$_CPMF_ENDTMO, CPU/MEM fault end action timed out on zone
[zone_id]
Facility: FTSS
Explanation: When a CPU/MEM fault occurs in a Duplex system,
diagnostics normally run on the failed CPU and, upon completion, report
status back to the zone running the operating system. If this end action does
not occur within a reasonable timeout period, the failure will be treated as
solid and the CPU will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the CPU/MEM fault and the end action timeout. These entries
will indicate an FRU.

FTSS$_CPUDBE, Double-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A double-bit memory error has occurred. This indicates a solid
memory failure. This error will only be reported in a Duplex system and a
CPU module will be removed from service when it occurs.
User Action: The system error log should be examined for entries which
correspond to the double-bit error. These logs will indicate the SIMM memory
row which must be replaced.
FTSS$_CPUSBE, A single-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A recoverable single-bit memory error has been detected and
handled by the operating system. These transient errors are repaired in
memory and compared to their error rate threshold. In a Duplex system, a
CPU module will be removed from service if the threshold is exceeded.
User Action: In most cases, no action by the user is necessary. If the rate of
single-bit errors becomes excessive, replacement of a SIMM memory row or
CPU module will be required. The system error log should be examined for
the entries which correspond to the single-bit errors.
FTSS$_CPUMEMFLT, CPU/MEM fault detected on [module_id] in slot [slot_id],
zone [zone_id]
Facility: FTSS
Explanation: A CPU/MEM fault in a Duplex system has been detected.
This results in the temporary removal of that CPU from service. This error
is compared to its error rate threshold. If the threshold is not exceeded and
the CPU completes diagnostics successfully, the CPU will be automatically
resynchronized. If the threshold is exceeded or diagnostics fail, the CPU will
not be automatically resynchronized.
User Action: If the CPU is automatically resynchronized after the
completion of diagnostics, no action is required on the part of the user. If
the CPU is not automatically resynchronized, the system error log should be
examined for entries which correspond to the CPU/MEM fault. These entries
will indicate an FRU.
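
The recovery rule described for FTSS$_CPUMEMFLT recurs in several of the messages in this section, and can be summarized in a small Python sketch. The function names and the two boolean inputs are hypothetical; how FTSS tracks error rates and diagnostic results internally is not described here.

```python
def will_auto_resync(threshold_exceeded: bool, diagnostics_passed: bool) -> bool:
    """Return True when the removed CPU or zone is expected to be resynchronized.

    Automatic resynchronization requires both conditions: the error rate
    threshold has not been exceeded and diagnostics completed successfully.
    """
    return (not threshold_exceeded) and diagnostics_passed


def suggested_user_action(resynchronized: bool) -> str:
    """Map the outcome to the User Action guidance given for these messages."""
    if resynchronized:
        return "No action required."
    return "Examine the system error log for entries that identify an FRU."
```

Either an exceeded threshold or a diagnostic failure is enough to prevent automatic resynchronization.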
FTSS$_CPUUNSYNC, [module_id] in slot [slot_id], zone [zone_id] is
unsynchable
Facility: FTSS
Explanation: When a CPU completes diagnostics with failure and reports
this status to the zone running the operating system, this message is
generated. The CPU with the failure will not be automatically resynchronized
by FTSS.
User Action: The system error log should be examined for the entry which
corresponds to the unsynchable event. This entry will indicate an FRU.

FTSS$_DBE_END, DBE end action complete
Facility: FTSS
Explanation: Error processing for a double-bit memory error has been
completed and the CPU is available to be resynchronized.
User Action: The system error log should be examined for entries which
correspond to the double-bit error. These error logs will identify an FRU.
FTSS$_DBE_ENDTMO, DBE end action timed out on zone [zone_id]
Facility: FTSS
Explanation: When double-bit memory errors occur in a Duplex system,
diagnostics run on the failed CPU and, upon completion, report status back
to the zone running the operating system. If this end action does not occur
within a reasonable timeout period, the failure will be treated as solid and
the CPU will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the double-bit error and the end action timeout. These entries
will indicate an FRU.
FTSS$_DIV_END, zone divergence end action complete
Facility: FTSS
Explanation: Error processing for a zone divergence error has been completed
and the zone is available to be resynchronized.
User Action: If the zone is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the zone is not resynchronized,
the system error log should be examined for entries which correspond to the
zone divergence error. These error logs will identify an FRU.
FTSS$_DIV_ENDTMO, zone divergence end action timed out on zone [zone_id]
Facility: FTSS
Explanation: When zones diverge in a Duplex system, diagnostics run on
the removed zone and, on completion, report status to the zone running
the OpenVMS operating system. If this end action does not occur within a
reasonable timeout period, the failure will be treated as solid and the zone
will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the zone divergence and the end action timeout. These entries
will indicate an FRU.
FTSS$_DIVERGED, A synchronized, dual zone configuration has diverged
Facility: FTSS
Explanation: Lockstep operation between the two zones of a Duplex system
has been lost. One of the zones is temporarily removed from service. The
error is compared to its error rate threshold. If the threshold is not exceeded,
the zone will be automatically resynchronized by FTSS after successfully
completing diagnostics. If the threshold is exceeded or diagnostics fail,
the zone is not automatically resynchronized.
User Action: If the zone is automatically resynchronized, no action
is necessary on the part of the user. If the zone is not automatically
resynchronized, the system error log should be examined for entries which
correspond to the zone divergence error. These entries will indicate an FRU.

FTSS$_ELNJOBFATAL, VAXELN job fatal error detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A VAXELN job running on an I/O Expansion module has
detected a fatal error and has terminated. This error results in the removal
of the associated Interface module from the system.
User Action: The system error log should be examined for entries which
correspond to the VAXELN job fatal error. These entries will indicate an
FRU.
FTSS$_ELNJOBRECOV, VAXELN job recoverable error detected on [module_id]
in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: A VAXELN job running on an I/O Expansion module has
detected a recoverable error. These errors are compared to their error
rate threshold by the operating system. If the threshold is exceeded in a
non-Simplex system, the associated Interface module is removed from the
system.
User Action: If the threshold is not exceeded, no action is required on the
part of the user. If the threshold is exceeded, the system error log should be
examined for entries which correspond to the VAXELN job recoverable error.
These entries will indicate an FRU.
FTSS$_ELNKERFATAL, VAXELN kernel fatal error detected on [module_id] in
slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN kernel running on an I/O Expansion module has
detected a fatal error and has terminated. This error results in the removal
of the indicated I/O Expansion module and associated Interface modules from
the system configuration.
User Action: The system error log should be examined for entries which
correspond to the VAXELN kernel fatal error. These entries will indicate an
FRU.
FTSS$_ELNKERRECOV, VAXELN kernel recoverable error detected on
[module_id] in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN kernel running on an I/O Expansion module
has detected a recoverable error. These errors are compared to their error
rate threshold by the operating system. If the threshold is exceeded in a
non-Simplex system, the indicated I/O Expansion module and associated
Interface modules are removed from service.
User Action: If the threshold is not exceeded, no action is required on the
part of the user. If the threshold is exceeded, the system error log should be
examined for entries which correspond to the VAXELN kernel recoverable
errors. These entries will indicate an FRU.

FTSS$_ELNMASFATAL, VAXELN master job fatal error detected on
[module_id] in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN master job running on an I/O Expansion module
has detected a fatal error and has terminated. This error results in the
removal of the indicated I/O Expansion module and associated Interface
modules from the system configuration.
User Action: The system error log should be examined for entries which
correspond to the VAXELN master job fatal error. These entries will indicate
an FRU.
FTSS$_ELNMASRECOV, VAXELN master job recoverable error detected on
[module_id] in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN master job running on an I/O Expansion module
has detected a recoverable error. These errors are compared to their threshold
by the operating system. If the threshold is exceeded in a non-Simplex
system, the indicated I/O Expansion module and associated Interface modules
are removed from service.
User Action: If the threshold is not exceeded, no action is required on the
part of the user. If the threshold is exceeded, the system error log should be
examined for entries which correspond to the VAXELN master job recoverable
errors. These entries will indicate an FRU.
FTSS$_JXDDBE, Double-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A double-bit memory error has occurred. This indicates a solid
memory failure. In a Duplex system, a CPU module will be removed from
service when this error occurs.
User Action: The system error log should be examined for entries which
correspond to the double-bit error. These logs will indicate the SIMM memory
row which must be replaced.
FTSS$_JXDSBE, Single-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A recoverable single-bit memory error has been detected and
handled by the operating system. These transient errors are repaired in
memory, and the errors are compared to their error rate threshold. In a
Duplex system, a CPU module will be removed from service if the threshold
is exceeded.
User Action: In most cases, no action by the user is necessary. If the rate of
single-bit errors becomes excessive, replacement of a SIMM memory row will
be required. The system error log should be examined for the entries which
correspond to the single-bit errors.

FTSS$_POWERGONE, Power gone fault detected on zone [zone_id]
Facility: FTSS
Explanation: Power has been lost in one of the zones. This error is
compared to its error rate threshold. If the threshold is not exceeded, the
zone will be automatically resynchronized when power returns.
User Action: If power is restored and the zone is automatically
resynchronized, no action is required on the part of the user. If power is
restored and the zone is not automatically resynchronized, the user should
examine the external system power source.
FTSS$_RESYNCHFLT, Resynch abort fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: During an attempt to resynchronize a CPU/Memory module,
an error occurred on the master CPU module. This error is compared to
its error rate threshold by the operating system. If the threshold is not
exceeded, FTSS will retry the resynchronization process. When the threshold
is exceeded, attempts to resynchronize will be terminated.
User Action: If the resynchronization retry is successful, no action is
required on the part of the user. If the threshold for retries is exceeded, the
system error log should be examined for entries which correspond to the
resynch abort failure. These entries will indicate an FRU.
FTSS$_SBE_END, SBE end action complete
Facility: FTSS
Explanation: Error processing for a single-bit memory error has been
completed and the CPU is available to be resynchronized.
User Action: If the CPU is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the CPU is not resynchronized, the
system error log should be examined for entries which correspond to the
single-bit error. These error logs will identify an FRU.
FTSS$_SBE_ENDTMO, SBE end action timed out on zone [zone_id]
Facility: FTSS
Explanation: When single-bit memory errors occur in a Duplex system,
diagnostics run on the failed CPU and, on completion, report status back
to the zone running the operating system. If this end action does not occur
within a reasonable timeout period, the failure will be treated as solid and
the CPU will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the single-bit error and the end action timeout. These entries
will indicate an FRU.
FTSS$_SOLIDIOMOD, Solid I/O fault detected on [module_type] in slot [slot_id],
zone [zone_id]
Facility: FTSS
Explanation: A fatal I/O miscompare error was detected and attributed to
the indicated module. The module is removed from service by the operating
system.
User Action: The system error log should be examined for entries which
correspond to the I/O miscompare errors. These entries will indicate an FRU.

FTSS$_SOLIDNXIO, Solid NXIO fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A fatal nonexistent I/O error has occurred when accessing the
indicated I/O module. The module is removed from service by the operating
system.
User Action: The system error log should be examined for entries which
correspond to the nonexistent I/O error. These entries will indicate an FRU.
FTSS$_SOLIDIOXLNK, Solid I/O fault detected on the cross-link
Facility: FTSS
Explanation: A fatal I/O miscompare error was detected and attributed
to the cross-link. One zone is selected and is removed from service by the
operating system.
User Action: The system error log should be examined for entries which
correspond to the I/O miscompare errors. These entries will indicate an FRU.
FTSS$_SOLIDIOZONE, Solid I/O fault detected on zone [zone_id]
Facility: FTSS
Explanation: A fatal I/O miscompare error was detected and attributed to
the indicated zone. The zone is removed from service by the operating system.
User Action: The system error log should be examined for entries which
correspond to the I/O miscompare errors. These entries will indicate an FRU.
FTSS$_SWMODERR, Software detected failure on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A system software component has detected the failure of a
system module. In most cases, these errors indicate the failure of an I/O
module which was detected by a device driver and not reported by a system
error interrupt. These errors indicate a fatal failure of the indicated module
and it is removed from service.
User Action: The system error log should be examined for entries which
correspond to the software detected module failure. These entries will
indicate an FRU.
FTSS$_SWZONERR, Software detected failure on zone [zone_id]
Facility: FTSS
Explanation: A system software component has detected the failure of
a zone. This error indicates a fatal failure of the indicated zone and it is
removed from service.
User Action: The system error log should be examined for entries which
correspond to the software detected zone failure. These entries will indicate
an FRU.

FTSS$_TRNSIOMOD, Transient I/O fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A transient I/O miscompare error was detected and attributed
to the indicated module. These errors are compared to their error rate
threshold. If the threshold is exceeded and the system mode is not Simplex,
the module is removed from service.
User Action: If the threshold is not exceeded and the module is not removed
from service, no action is needed on the part of the user. If the module is
removed from service, the system error log should be examined for entries
which correspond to the I/O miscompare errors. These entries will indicate an
FRU.
FTSS$_TRNSNXIO, Transient NXIO fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A transient nonexistent I/O error was detected when
accessing the indicated module. These errors are compared to their error
rate threshold. If the threshold is exceeded and the system mode is not
Simplex, the module is removed from service.
User Action: If the threshold is not exceeded and the module is not removed
from service, no action is needed on the part of the user. If the module is
removed from service, the system error log should be examined for entries
which correspond to the nonexistent I/O errors. These entries will indicate
an FRU.
FTSS$_TRNSIOXLNK, Transient I/O fault detected on the cross-link
Facility: FTSS
Explanation: A transient I/O miscompare error was detected and attributed
to the cross-link. These errors are compared to their error rate threshold. If
the threshold is exceeded and the system mode is not Simplex, then one zone
is removed from service.
User Action: If the threshold is not exceeded and a zone is not removed from
service, no action is needed on the part of the user. If a zone is removed from
service, the system error log should be examined for entries which correspond
to the I/O miscompare errors. These entries will indicate an FRU.
FTSS$_TRNSIOZONE, Transient I/O fault detected on zone [zone_id]
Facility: FTSS
Explanation: A transient I/O miscompare error was detected and attributed
to the indicated zone. These errors are compared to their error rate threshold.
If the threshold is exceeded and the system mode is not Simplex, the zone is
removed from service.
User Action: If the threshold is not exceeded and the zone is not removed
from service, no action is needed on the part of the user. If the zone is
removed from service, the system error log should be examined for entries
which correspond to the I/O miscompare errors. These entries will indicate an
FRU.

FTSS$_ZONEHALT, Zone Halt fault detected on zone [zone_id]
Facility: FTSS
Explanation: A single zone of a Duplex system has been halted. This can be
caused by a user command on the system console or by a system error.
User Action: If the Halt was caused by a user command on the system
console, a START/ZONE command must be executed to restore the zone to
service. If the Halt was not caused by a user command, the system error log
should be examined for entries which correspond to the zone halt error. These
entries will identify an FRU.
FTSS$_ZONEUNSYNC, Zone [zone_id] is unsynchable
Facility: FTSS
Explanation: When a zone completes diagnostics with failure and reports
this status to the zone running the operating system, this message is
generated. The zone with the failure will not be automatically resynchronized
by FTSS.
User Action: The system error log should be examined for the entry which
corresponds to the unsynchable event. This entry will indicate an FRU.
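
A user application that reads these messages might split the message code from the text and pick out the slot and zone, as in the following Python sketch. The sketch makes several assumptions: the function name is hypothetical, the sample values ("ATM", "2", "A") are invented for illustration, and the text actually delivered by OPCOM or the mailbox may carry additional header information not shown in this section. Only messages that end in "zone [zone_id]" are handled.

```python
import re

# Message code followed by a comma and the message text, as listed above.
CODE_RE = re.compile(r"^(?P<code>FTSS\$_[A-Z0-9_]+),\s*(?P<text>.*)$")

# Optional "in slot <slot>, " followed by "zone <zone>" at the end of the text.
LOCATION_RE = re.compile(r"(?:in slot (?P<slot>\S+), )?zone (?P<zone>\S+)$")


def parse_event(line: str):
    """Hypothetical helper: return code, text, slot, and zone, or None if no match."""
    m = CODE_RE.match(line.strip())
    if m is None:
        return None
    text = m.group("text")
    loc = LOCATION_RE.search(text)
    return {
        "code": m.group("code"),
        "text": text,
        "slot": loc.group("slot") if loc else None,
        "zone": loc.group("zone") if loc else None,
    }


# Sample line with invented argument values.
event = parse_event("FTSS$_CLOCKFLT, Clock fault detected on ATM in slot 2, zone A")
```

Messages that carry no zone identifier, such as FTSS$_DIV_END, yield `None` for both the slot and the zone.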
4.6.2.1 Deconfiguration Messages
The following messages can be passed to OPCOM and the system error log file
by the FTSS$SERVER at the request of EHS. Each message corresponds to a
deconfiguration activity performed by EHS. Each message contains information
(through FAO arguments) that identifies the entity deconfigured by EHS.
FTSS$_DECONFIG_ATMIO, I/O expansion subsystem on I/O attachment
module in slot [slot_id], zone [zone_id] has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the I/O expansion
subsystem on the indicated I/O ATM and its associated Interface modules
have been removed from service.
User Action: The system error log should be examined for entries which
correspond to the removal of the I/O expansion subsystem. These entries will
indicate an FRU.
FTSS$_DECONFIG_CPUMOD, CPU module in slot [slot_id], zone [zone_id] has
been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated CPU module
has been removed from service. In some cases, the CPU may be automatically
resynchronized by FTSS when it successfully completes the execution of
diagnostics.
User Action: If the CPU is automatically resynchronized by FTSS after
completing diagnostics, no action is required on the part of the user. If the
CPU is not automatically resynchronized, the system error log should be
examined for entries which relate to the removal of the CPU. These entries
will indicate an FRU.


FTSS$_DECONFIG_EXMOD, I/O expansion module in slot [slot_id], zone
[zone_id] has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated I/O Expansion
module and its associated Interface modules have been removed from service.
User Action: The system error log should be examined for entries which
correspond to the removal of the I/O expansion module. These entries will
indicate an FRU.
FTSS$_DECONFIG_INTMOD, Interface module in slot [slot_id], zone [zone_id]
has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated Interface
module has been removed from service.
User Action: The system error log should be examined for entries which
correspond to the removal of the Interface module. These entries will indicate
an FRU.
FTSS$_DECONFIG_ZONE, Zone [zone_id] has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated zone has
been removed from service. In some cases, the zone may be automatically
resynchronized by FTSS when it successfully completes the execution of
diagnostics.
User Action: If the zone is automatically resynchronized by FTSS after
completing diagnostics, no action is required on the part of the user. If the
zone is not automatically resynchronized, the system error log should be
examined for entries which relate to the removal of the zone. These entries
will indicate an FRU.

4.7 Firmware Interfaces
The EHS interacts with three firmware-based software entities: system console
and diagnostics, I/O expansion module console and diagnostics, and the I/O
expansion module VAXELN software. The system console and diagnostics and
I/O expansion module console and diagnostics interfaces are discussed in the
following sections.

4.7.1 System Console and Diagnostics
The EHS communicates with the system console through:
• System hardware resets combined with flags in the console communications
  area (CCA)
• CCA fields referenced using the IZC service


4.7.1.1 System Resets
When the EHS determines that a zone or CPU should be removed from the
configuration, it forces a reset on the CPU. The reset results in the system console
being invoked from serial ROM by the hardware. When system console runs, it
attempts to determine the reason for the reset, which in turn may determine
the actions performed by the console. The EHS uses the fields in the CCA reset
dispatch block (at offset CCA560$R_RESET_BLOCK) to pass reset reason codes
to the console. The fields are:
RDB$L_RESET_CODE - The reset reason code. This longword field is actually
composed of two one-word fields:
• RDB$W_ACTION - The reset action. This word instructs the console on the
  action that needs to be taken. The reset action codes used by the EHS are
  described in Table 4–26.
• RDB$W_REASON - The reset reason. This field is additional data supplied
  by the OpenVMS operating system which indicates the reason for the reset.
  The code is printed in hex on the operator console after the reset action
  is completed. The reset reason codes used by the EHS are described in
  Table 4–27.

RDB$L_REASON_VALID - The 1’s complement of the reset reason code longword.
RDB$L_DISPATCH - This field is used only if the system console is to continue
the OpenVMS operating system after completing reset actions. In all reset cases
by the EHS, it will be 0.
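
The relationship between the reset code longword and its validity longword can be
illustrated with a short Python sketch. It assumes the word order given in
Table 4–33 (action in the low-order word, reason in the high-order word). The
function name and the sample reason value are hypothetical and are not taken
from the EHS sources.

```python
def make_reset_code(action: int, reason: int) -> tuple[int, int]:
    """Return (RDB$L_RESET_CODE, RDB$L_REASON_VALID) as 32-bit integers."""
    if not (0 <= action <= 0xFFFF and 0 <= reason <= 0xFFFF):
        raise ValueError("action and reason must each fit in one 16-bit word")
    # Assumed layout (Table 4-33): reason in bytes 03:02, action in bytes 01:00.
    reset_code = (reason << 16) | action
    # The validity longword is the 1's complement of the reset code longword.
    reason_valid = ~reset_code & 0xFFFFFFFF
    return reset_code, reason_valid


# Hypothetical values, for illustration only.
code, valid = make_reset_code(action=0x0004, reason=0x0007)
print(f"{code:08X} {valid:08X}")
```

Because the two longwords are bitwise complements, their exclusive OR is always
FFFFFFFF (hex), which gives the console a simple way to reject a block that was
never written or was corrupted.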
Table 4–26 System Reset Action Codes
Decimal
Value

Description

This code will cause the system console to enter its halt loop, which will
establish IZC to the other zone, without invoking any diagnostics. Currently,
this reset action is requested only when the EHS is handling a single-bit
error.

This code will cause the system console to invoke diagnostics. The
diagnostics which run depend on the cross-link mode at the time. Following
diagnostics, the system console will enter its halt loop, and establish IZC to
the other zone. The code is used when a zone or CPU is being removed due
to a system error.

This code will cause the same actions as CPURESET$K_DIAGS. This code
is used when a zone is being removed by operator action (that is, a user
command).


Table 4–27 System Reset Reason Codes
Decimal
Value

Description

When the EHS detects zone divergence, it selects one zone to continue the
OpenVMS operating system and one zone to stop. Note that the OpenVMS
operating system is not indicating an error in this zone; it must stop one of
the two.

When the EHS isolates a failure to the cross-link cable (for example, a cable
gone error), it will reset one zone using this reason type.

When the EHS detects a fault in a zone that cannot be isolated to a single
module, it will reset the zone with this reason type. Usually, such errors are
the result of backplane failures.

The OpenVMS operating system will use this reset with an IO ATM module
failure. Before this reset, the operating system will write an error code to
the module ID EEPROM through the I2C bus.

The OpenVMS operating system will use this to reset a CPU module after
determining that it has failed. Before the reset, the OpenVMS operating
system will write an error code to the module ID EEPROM through the I2C
bus.

The OpenVMS operating system will use this to reset a CPU module after
determining that its memory has failed.

An SBE was detected by the CPU in Duplex mode. CPU lockstep between
zones is lost on this event and it should be reestablished as soon as possible.
This code is used in conjunction with the CPURESET$K_NO_DIAGS reset
action code.

This code is used as a result of a user-issued command to remove a zone
from service.

A fatal system machine check error has occurred.

A system software component detected a failure of this module.

Table 4–28 lists the events which might cause the EHS to issue the reset, and the
cross-link modes under which the reset might be issued.
Table 4–28 Error Handler Reset Reasons
Event                       Possible Cross-Link Modes

Double-Bit Error            OFF, MASTER
Single-Bit Error            SLAVE
Cross-Link Cable Failure    OFF
Clock Phase Errors          OFF
I/O Errors                  OFF, MASTER, SLAVE
Zone Divergence             OFF
Single-Bit Error            SLAVE
User Command                OFF, SLAVE


4.7.1.2 CCA Fields
When a CPU or zone completes diagnostics, it enters its halt loop, which reports
its status to the OpenVMS operating system in the other zone through the IZC
service. The IZC service will in turn call the OpenVMS operating system to report
the availability of the other zone. The operating system requires the following
information to be available from the console in the other zone:
• The IZC message to the operating system will contain a synchability status.
  If the status is unsynchable, the OpenVMS operating system will examine
  the CCA in the console zone. The field CCA560$L_COMPAT_STATUS will
  contain a reason mask which describes the reasons that the zone is not
  synchable. This information will be entered into the system error log.
  If the reason mask indicates a diagnostic failure, the CCA560$Q_DIAG_STATUS
  field will contain additional information on the failure. The EHS
  will use the IZC service to read this information for entry into the system
  error log.
• The EHS uses the IZC service to read system register information from
  the CCA of the other zone starting at offset CCA560$R_REG_BLOCK. The
  registers in this block were written by the EHS when the original error
  occurred. However, the console must preserve this area through all resets and
  during diagnostic execution, whenever possible (some catastrophic failures
  will prevent this from working).
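
As an illustration of how the diagnostic status quadword might be split into
its parts, the following Python sketch uses the bit ranges listed for the
diagnostic status component in Table 4–31. The test number is taken as bits
23:16, the only range that fits between the subtest and group fields. The
class and function names are hypothetical.

```python
from typing import NamedTuple


class DiagStatus(NamedTuple):
    valid: bool    # bit 31: bits 27:00 hold a valid failure code
    flags: int     # bits 30:28: diagnostic flags (firmware use only)
    group: int     # bits 27:24
    test: int      # bits 23:16
    subtest: int   # bits 15:08
    error: int     # bits 07:00


def decode_diag_status(quad: int) -> DiagStatus:
    """Decode the low-order longword of a diagnostic status quadword."""
    low = quad & 0xFFFFFFFF  # the high-order four bytes are reserved for firmware
    return DiagStatus(
        valid=bool((low >> 31) & 1),
        flags=(low >> 28) & 0x7,
        group=(low >> 24) & 0xF,
        test=(low >> 16) & 0xFF,
        subtest=(low >> 8) & 0xFF,
        error=low & 0xFF,
    )


# Hypothetical status value, for illustration only.
print(decode_diag_status(0x83120405))
```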

4.7.2 I/O Expansion Module Console and Diagnostics
When the EHS determines that an I/O expansion module should be removed from
the configuration, it forces an I/O hard reset on the modules. This results in the
I/O expansion module console being invoked by hardware. When the console runs,
it attempts to determine the reason for the reset, which in turn may determine
the actions performed by the diagnostics. The EHS uses two fields in the NCA
reset dispatch block (at offset NCA560$L_RESET_BLOCK) to pass reset reason
codes to the diagnostics. The fields are:
RDB$L_RESET_CODE - The reset reason code. This longword field is actually
composed of two 1-word fields:
• RDB$W_ACTION - The system reset action. This word instructs the console
  on the action that needs to be taken. The only reset action code used by the
  EHS is shown in Table 4–29.
• RDB$W_REASON - The reset reason. This field is additional data supplied
  by the operating system which indicates the reason for the reset. The reset
  reason codes used by the EHS are shown in Table 4–30.

RDB$L_REASON_VALID - The 1’s complement of the reset reason code longword.
RDB$L_DISPATCH - This field is used only if the console is to continue the
operating system after completing reset actions. In all cases of I/O resets by the
EHS, it will be 0.


Table 4–29 I/O Reset Action Code Description
Decimal Value

Description

This reset code will cause the I/O expansion module console to invoke
diagnostics. The diagnostics which run depend upon the mode of the
cross-link at the time. After diagnostics, console will enter its halt loop.

Table 4–30 I/O Reset Reason Code Descriptions
Decimal
Value

Description

The module has experienced a solid NXIO error.

The module has experienced excessive transient NXIO errors.

The module has experienced a solid I/O miscompare error.

The module has experienced excessive transient I/O miscompare errors.

The module has experienced excessive VAXELN kernel recoverable errors.

The module has experienced a VAXELN master fatal error.

4.8 Firmware and OpenVMS Interface Data Structures
Figure 4–13 shows the OpenVMS operating system and firmware data structure
memory map. The following sections describe the data structures used by the
console:
• Console Communication Area (CCA)
• Device Configuration Block (DCB)
• Page Frame Number Bitmap (PFN)

The firmware constructs, initializes, and shares the data structures with the
OpenVMS operating system.
Figure 4–13 Firmware and OpenVMS Data Structure Memory Map

Page Frame Number (PFN) Bitmap
Zone A Sub−Device Configuration Block (SubDCB)
Zone A Device Configuration Block (DCB)
Zone B Sub−Device Configuration Block (SubDCB)
Zone B Device Configuration Block (DCB)
Console Communications Area (CCA)
Remainder of Main Memory
MR−0019−93RAGS


4.8.1 Console Communications Area
The console communications area (CCA) is the main data structure used by the
console to interface with the OpenVMS operating system. Table 4–31 describes
the CCA components.
Table 4–31 CCA Component Descriptions
Parameter

Size

Description

CCA size

2 bytes

Size of the CCA in bytes. Initialized by firmware.

CCA
revision

1 byte

Revision of the CCA. Initialized by firmware.

CCA base

4 bytes

Physical address of the CCA. Initialized by firmware.

Header
flags

4 bytes

CCA flags. Field breakdown by bit:
•

00 = Bootstrap in progress. Set by firmware when
bootstrap operation is started. Cleared by the OpenVMS
operating system. Used to control the bootstrap
operation.

•

01 = Restart in progress. Set by firmware when restart
operation is started. Cleared by the OpenVMS operating
system. Used to control the restart operation.

•

02 = Automatic bootstrap. Set by firmware when a
manual bootstrap occurred.

•

03 = Reboot in progress. Set by the OpenVMS operating
system when a bootstrap operation is requested by the
operating system using the default boot specification.

•

04 = Failsafe mode. Set by firmware to indicate that the
zone is in Failsafe mode. (Failsafe mode refers to the
method used for bootstrapping.)

•

05 = Synchable status. Set by firmware to indicate that
the zone is synchable (Duplex compatibility test passed).
If bit is clear, test failed. Use the Duplex compatibility
test results component to obtain the reason for failure.

•

06 = Halted from bootstrap. Set by VMB to indicate to
the firmware that it is not to report a bootstrap error.
This bit overrides the state of the bootstrap in progress
bit 0 with respect to handling errors during the bootstrap
operation.

•

[31:07] = Reserved for firmware use.

Bootability
test results

4 bytes

Results of the bootstrap test. Written by the firmware. Field
breakdown by bit:
•

00 = CPU/ATM check. Set when the CPU and ATM are
good.

•

01 = Cable state. Set when cables are present and good.

•

02 = Other zone power state. Set when the power is on in
the other zone.

•

03 = Other zone OpenVMS operating system state. Set
when the other zone is running the OpenVMS operating
system.

•

04 = Other zone CPU/ATM check. Set when the CPU and
ATM in the other zone are good.

•

[31:07] = Reserved for firmware use.

PFN
bitmap
address

4 bytes

Physical address of the PFN bitmap. Initialized by firmware.

PFN
bitmap
size

4 bytes

Size of the PFN bitmap in bytes. Initialized by firmware.

PFN
bitmap
checksum

4 bytes

Checksum of the PFN bitmap. Checksum = integer sum of all
bytes in the PFN bitmap.

System
serial
number

12 bytes

System serial number. 12 ASCII characters. Initialized by
firmware. Copied from the CPU module data EEPROM.

Zone A
DCB offset

4 bytes

Offset to the Zone A DCB. Offset is the byte offset (signed)
from the CCA base. Initialized by firmware.

Zone A
DCB size

4 bytes

Size in bytes of the DCB for Zone A. The size includes the
DCB and any SubDCBs for Zone A. Initialized by firmware.

Zone B
DCB offset

4 bytes

Offset to the Zone B DCB. Offset is the byte offset (signed)
from the CCA base. Initialized by firmware.

Zone B
DCB size

4 bytes

Size in bytes of the DCB for Zone B. The size includes the
DCB and any SubDCBs for Zone B. Initialized by firmware.

Diagnostic
status

8 bytes

Results of the diagnostic tests. Initialized by firmware.
Breakdown of the status fields:
•

[07:00] = Error number

•

[15:08] = Subtest number

•

[23:16] = Test number

•

[27:24] = Group number

•

[30:28] = Diagnostic flags. For firmware use only.

•

31 = Set when bits 27:00 indicate a valid failure code.

The high-order four bytes are reserved for firmware.
Duplex
compatibility test
results

4 bytes

Results of the compatibility test. Written by firmware. See
Section 4.8.1.1 for the test descriptions and fault codes.

Reset
dispatch
block

16 bytes

Used by firmware and the OpenVMS operating system to
notify the firmware how to handle a reset entry to firmware.
See Section 4.8.1.2 for dispatch block description.

Boot
parameter
table

164 bytes

Boot parameter table. Initialized by firmware. See
Section 4.8.1.3 for the description.

Saved
register
block

132 bytes

Reserved

64 bytes

Reserved for future expansion.
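
The header flags and bootability test results in Table 4–31 are plain bit
masks. A minimal Python sketch of decoding them follows. The bit positions come
from the table; the dictionary and function names are hypothetical.

```python
# Bit positions from the "Header flags" component of Table 4-31.
HEADER_FLAGS = {
    0: "bootstrap in progress",
    1: "restart in progress",
    2: "automatic bootstrap",
    3: "reboot in progress",
    4: "Failsafe mode",
    5: "synchable",
    6: "halted from bootstrap",
}

# Bit positions from the "Bootability test results" component of Table 4-31.
BOOTABILITY_BITS = {
    0: "CPU/ATM good",
    1: "cables present and good",
    2: "other zone power on",
    3: "other zone running OpenVMS",
    4: "other zone CPU/ATM good",
}


def decode_bits(value: int, names: dict[int, str]) -> list[str]:
    """Return the names of all defined bits that are set in value."""
    return [name for bit, name in names.items() if (value >> bit) & 1]


# Hypothetical longword values, for illustration only.
print(decode_bits(0x21, HEADER_FLAGS))
print(decode_bits(0x07, BOOTABILITY_BITS))
```

Reserved bits are ignored by this sketch rather than reported, since the table
assigns them to firmware.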

4.8.1.1 Duplex Compatibility Test
On firmware entry, the console program verifies a number of conditions that are
required for system operation in Duplex mode. These conditions determine if the
zone is synchable, that is, able to join a partner zone in Duplex operation.
The IZC protocol is used by the console program to execute the Duplex
compatibility test. Once the console establishes the IZC service, it executes
the test and notifies the other zone of the results. A zone is considered synchable
if it passes the test.
The compatibility test is responsible for storing the results in the CCA. The
following items are test parameters.
• Diagnostic status:
    CPU self-test passes
    CPU zone test passes
    Primary I/O expansion module self-test passes
    CPU system test does not fail (not run assumes a passed condition)
• Zone identification:
    One Zone A, one Zone B.
• CPU module ID EEPROM:
    Valid checksum
    OpenVMS and firmware status byte is good
    Module ID and module name compatible with other zone
    Module hardware revision compatible with other zone (major)
    Firmware and software revisions compatible with other zone (major)
• I/O ATM module ID EEPROM:
    Valid checksum
    OpenVMS and firmware status byte is good
    Module ID and module name compatible with other zone
    Module hardware revision compatible with other zone (major)
    Firmware and software revisions compatible with other zone (major)
• CPU module data EEPROM:
    Valid checksum
    System data area must be the same in both zones
• Memory restrictions for synchronization:
    Same memory configuration on both zones
• Cross-link and resynch cables functional
• Operational modes must be compatible (that is, burnin state)
• Ability of the CPU console firmware to run in cross-link in Duplex mode

Table 4–32 lists the test failure codes. Each bit represents the results of checking
the given condition. The test will attempt to check all conditions, and updates the
bits as it performs the test (set bit indicates failure).
Table 4–32 Duplex Compatibility Test Failure Codes
Failure Code
Bit Number

Code Description

CPU self-test failed

CPU zone test failed

CPU system test failed

ATM self-test failed

Both zones have the same zone ID

CPU ID EEPROM is bad

CPU ID EEPROM OpenVMS status field shows module is bad

CPU ID EEPROM firmware status field shows module is bad

CPU ID EEPROM module type field mismatches between zones

CPU ID EEPROM module name field mismatches between zones

CPU ID EEPROM hardware revision (major) mismatches between
zones

CPU ID EEPROM firmware revision (major) mismatches between
zones

CPU ID EEPROM software revision (major) mismatches between
zones

ATM ID EEPROM is bad

ATM ID EEPROM OpenVMS status field shows module is bad

ATM ID EEPROM firmware status field shows module is bad

ATM ID EEPROM module type field mismatches between zones

ATM ID EEPROM module name field mismatches between zones

ATM ID EEPROM hardware revision (major) mismatches between
zones

ATM ID EEPROM firmware revision (major) mismatches between
zones

ATM ID EEPROM software revision (major) mismatches between
zones

CPU data EEPROM is bad

CPU data EEPROM system wide area mismatches between zones

CPU/memory configuration mismatches between zones

Cables (cross-link and/or resynch) are not functional

CPU is in burnin state

Ethernet EEPROM address mismatches between zones

CPU console firmware cannot be synchable (cannot run in Duplex
mode)

[31:28]

Reserved for future use
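
A reason mask of this kind can be turned into readable text with a small lookup.
The Python sketch below is illustrative only. It ASSUMES that the failure codes
are numbered from bit 0 in the order listed in Table 4–32, which is consistent
with the table having 28 entries followed by reserved bits [31:28], but the
individual bit numbers are not legible in this copy of the table. The names
used are hypothetical.

```python
# ASSUMPTION: bit n corresponds to the n-th description in Table 4-32 (origin 0).
FAILURE_BITS = [
    "CPU self-test failed",
    "CPU zone test failed",
    "CPU system test failed",
    "ATM self-test failed",
    "Both zones have the same zone ID",
    "CPU ID EEPROM is bad",
    "CPU ID EEPROM OpenVMS status field shows module is bad",
    "CPU ID EEPROM firmware status field shows module is bad",
    "CPU ID EEPROM module type field mismatches between zones",
    "CPU ID EEPROM module name field mismatches between zones",
    "CPU ID EEPROM hardware revision (major) mismatches between zones",
    "CPU ID EEPROM firmware revision (major) mismatches between zones",
    "CPU ID EEPROM software revision (major) mismatches between zones",
    "ATM ID EEPROM is bad",
    "ATM ID EEPROM OpenVMS status field shows module is bad",
    "ATM ID EEPROM firmware status field shows module is bad",
    "ATM ID EEPROM module type field mismatches between zones",
    "ATM ID EEPROM module name field mismatches between zones",
    "ATM ID EEPROM hardware revision (major) mismatches between zones",
    "ATM ID EEPROM firmware revision (major) mismatches between zones",
    "ATM ID EEPROM software revision (major) mismatches between zones",
    "CPU data EEPROM is bad",
    "CPU data EEPROM system wide area mismatches between zones",
    "CPU/memory configuration mismatches between zones",
    "Cables (cross-link and/or resynch) are not functional",
    "CPU is in burnin state",
    "Ethernet EEPROM address mismatches between zones",
    "CPU console firmware cannot be synchable (cannot run in Duplex mode)",
]


def decode_compat_failures(mask: int) -> list[str]:
    """List the failed conditions in a compatibility test result (set bit = failure)."""
    return [text for bit, text in enumerate(FAILURE_BITS) if (mask >> bit) & 1]


# Hypothetical mask, for illustration only.
for line in decode_compat_failures((1 << 0) | (1 << 24)):
    print(line)
```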

4.8.1.2 Dispatch Block Description
The firmware validates a reset entry using a dispatch block, located in memory,
to determine the next operation. Figure 4–14 shows the dispatch block structure.
Table 4–33 describes the block components.
Figure 4–14 Dispatch Block Structure

Base + 00

Dispatch Reason Code

Base + 04
Dispatch Address

Base + 0C

Dispatch Reason Complement
MR−0018−93RAGS


Table 4–33 Dispatch Block Components
Block Content

Offset

Description

Dispatch reason
code

Base + 00h
4 bytes

Code identifying reset reason. Bytes 03:02
identify the reason for the reset.
Bytes 01:00 identify the end action to be taken by
the console as specified below:
•

00 = POWERUP. Default or unexpected reset.
Run diagnostics and halt (enter the console).

•

01 = NO_DIAGS. Halt (enter the console).

•

02 = DISPATCH. Dispatch requested. Jump
to the dispatch address.

•

03 = RESYNCH. Resynch reset. Jump to the
dispatch address.

•

04 = DIAGS. Run diagnostics and halt (enter
the console).

•

05 = STOP_ZONE. OpenVMS issued a STOP_
ZONE. Run diagnostics and halt (enter the
console).

•

06 = RECONFIG. Reconfigure firmware (for
firmware use only).

Dispatch address

Base + 04h
8 bytes

Physical address where console will jump. In the
Model 810, only the first 4 bytes are used. Upper
4 bytes must be 0.

Dispatch reason
complement

Base + 0Ch
4 bytes

The 1’s complement of the dispatch reason code.
Used for checking the dispatch block validity.
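
The validity check and field extraction described in Table 4–33 can be sketched
in Python as follows. The offsets follow Figure 4–14 and the byte order is
little-endian, as on all VAX processors. The function name and the returned
dictionary are hypothetical, and what the firmware does with an invalid block
is not modeled here.

```python
import struct

# End action codes from Table 4-33 (bytes 01:00 of the dispatch reason code).
END_ACTIONS = {
    0: "POWERUP",
    1: "NO_DIAGS",
    2: "DISPATCH",
    3: "RESYNCH",
    4: "DIAGS",
    5: "STOP_ZONE",
    6: "RECONFIG",
}


def parse_dispatch_block(block: bytes) -> dict:
    """Split a 16-byte dispatch block into its components."""
    if len(block) != 16:
        raise ValueError("a dispatch block is 16 bytes long")
    # +00: reason code (4 bytes), +04: address (8 bytes), +0C: complement (4 bytes).
    code, address, complement = struct.unpack("<IQI", block)
    return {
        # The block is valid only if the last longword is the 1's complement
        # of the first.
        "valid": complement == (~code & 0xFFFFFFFF),
        "action": END_ACTIONS.get(code & 0xFFFF, "UNKNOWN"),
        "reason": code >> 16,
        "address": address,
    }


# Hypothetical block contents, for illustration only.
sample = struct.pack("<IQI", 0x00070002, 0x1000, 0xFFF8FFFD)
print(parse_dispatch_block(sample))
```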

4.8.1.3 Boot Parameter Block Description
The boot parameter block (BPB) is a structure built by firmware to reflect the
primary bootstrap code (VMB) of the boot device that is used during the bootstrap
sequence. Table 4–34 describes the BPB components. Table 4–35 describes the
entry components in the DCB structure.
Table 4–34 BPB Components
Component

Length

Description

Number of
entries

4 bytes

Number of entries in the BPB. Written by firmware. Is 0 if
no entries are present.

BPB entries

5 bytes
per entry

An entry describes a boot path. Written by firmware.
Maximum number of entries is 32. (See Table 4–35 for
entry description.)


Table 4–35 BPB Entry Components
Component

Length

Description

Unit number

2 bytes

Device unit number. Valid numbers are in the 0 to 999
(decimal) range.

Device

2 bytes

Device name in ASCII (for example, EP and DI).

Path identifier

1 byte

Path to device. Field breakdown is:
•

[06:00] = Slot number of the adapter module in the 10
to 17 (hex) and 20 to 27 (hex) range.

•

07 = Zone identification of the adapter module: 0 =
Zone A, 1 = Zone B.
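
The BPB layout in Tables 4–34 and 4–35 accounts for the 164-byte boot parameter
table in the CCA (a 4-byte count plus 32 entries of 5 bytes). The Python sketch
below walks such a table. It assumes that the entry fields are packed in the
order listed, that multi-byte values are little-endian, and that the two
device-name characters are stored in reading order. All names are hypothetical.

```python
import struct

MAX_ENTRIES = 32
ENTRY_SIZE = 5  # unit number (2) + device name (2) + path identifier (1)


def parse_bpb(table: bytes) -> list[dict]:
    """Return the boot paths described by a boot parameter block."""
    (count,) = struct.unpack_from("<I", table, 0)
    if count > MAX_ENTRIES:
        raise ValueError("a BPB holds at most 32 entries")
    entries = []
    for index in range(count):
        unit, device, path = struct.unpack_from("<H2sB", table, 4 + ENTRY_SIZE * index)
        entries.append({
            "unit": unit,                          # 0 to 999 decimal
            "device": device.decode("ascii"),      # for example "EP" or "DI"
            "slot": path & 0x7F,                   # bits 06:00
            "zone": "B" if path & 0x80 else "A",   # bit 07
        })
    return entries


# Hypothetical one-entry table padded to the full 164 bytes.
table = struct.pack("<I", 1) + struct.pack("<H2sB", 3, b"DI", 0x91)
table = table.ljust(4 + MAX_ENTRIES * ENTRY_SIZE, b"\x00")
print(len(table), parse_bpb(table))
```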

4.8.2 Device Configuration Block
The device configuration block (DCB) reflects the configuration of the available
modules in the system. There is a DCB in each zone. The DCB is built by
firmware during the power up sequence and updated each time INIT and BOOT
are executed. The OpenVMS operating system uses the DCB to configure the
system. Table 4–36 describes the DCB components. Table 4–37 describes the
DCB entry components.
Table 4–36 DCB Components
Component

Length

Description

Number of
entries

4 bytes

Number of entries in the DCB. Initialized by firmware. Is 0
if no entries are present.

DCB entries

168 bytes
per entry

An entry describes a module found by the firmware.
Initialized by firmware. Maximum number of entries is
eight. (See Table 4–37 for entry description.)

Table 4–37 DCB Entry Components
Component

Length

Description

Slot number

1 byte

Physical slot number of the module. Valid slot numbers are:
0 to 2 for CPU and I/O ATM modules
0 to 7 for interface modules attached to the I/O ATM

Module type

1 byte

Code identifying the module. Module types are copied from
the module ID EEPROM. Valid module types are:
1 = Not used
2 = SWIFT adapter card
3 = I/O ATM module
4 = DSF module
5 = CPU module
6 = LANCE adapter card
7 = Not used
8 = FDDI adapter card
F = Unknown module

Status
summary

1 byte

Module status summary. This field is a summary of the
OpenVMS and firmware status fields. The field should be
updated whenever OpenVMS or firmware status fields are
updated. Codes are initially copied from the module ID
EEPROM. Valid codes (in hex) are:
A5 = Module is good.
B4 = Module is bad, marked by OpenVMS. See
OpenVMS status field.
C3 = Module is bad, marked by firmware. See firmware
status field.
FF = Module is bad, marked by OpenVMS and
firmware.

OpenVMS
status

1 byte

Module status as marked by OpenVMS (and maintained by
OpenVMS). Codes are initially copied from the module ID
EEPROM. Valid codes (in hex) are:
A5 = module is good.
non A5 = module is bad.

Firmware
status

1 byte

Module status as marked by firmware (and maintained by
firmware). Codes are initially copied from the module ID
EEPROM. Valid codes (in hex) are:
A5 = Module is good.
non A5 = Module is bad.

Module name

4 bytes

ASCII module name. Copied from the module ID EEPROM.

Module serial
number

12 bytes

Module serial in ASCII. Copied from the module ID
EEPROM.

Hardware
revision

6 bytes

Identifies the module hardware revision. Copied from the
module ID EEPROM. Divided in:
Minor revision (bytes 02:00)
Major revision (bytes 05:03)

Firmware
revision

2 bytes

Console/diagnostic firmware revision of the module. Copied
from the module ID EEPROM. Divided in:
Minor revision (byte 00)
Major revision (byte 01)

Software
revision

2 bytes

Functional firmware revision of the module. Copied from
the module ID EEPROM. Divided in:
Minor revision (byte 00)
Major revision (byte 01)

Ethernet
address

32 bytes

Module Ethernet address. Follows the DEC STD format.
Valid only for CPU module and LANCE adapter card.
Copied from the Ethernet EEPROM by firmware for the
CPU. Copied from the LANCE ROM for the LANCE adapter
card.

Extended data

32 bytes

Module-specific data. The field is copied by firmware from
the functional firmware ROM.

Memory size

4 bytes

Size of the module’s memory in 512 byte segments.
For CPU refers to the size of main memory.
For I/O ATM refers to the size of local (SOC) memory.
For interface modules refers to the size of buffer RAM.

SubDCB

4 bytes

Offset to the module SubDCB (Sub-Device Configuration
Block). Offset is the byte offset (signed) from the base of the
DCB. Is 0 if no SubDCB available.

Reserved

64 bytes

Reserved for future use.
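
The status summary code is derived from the OpenVMS and firmware status fields.
One way to express that rule, using the hex codes listed in Table 4–37, is shown
in the Python sketch below. The function name is hypothetical, and the sketch
only illustrates the mapping; it is not the firmware's or the operating
system's own routine.

```python
GOOD = 0xA5           # module is good
BAD_OPENVMS = 0xB4    # marked bad by OpenVMS
BAD_FIRMWARE = 0xC3   # marked bad by firmware
BAD_BOTH = 0xFF       # marked bad by both


def status_summary(openvms_status: int, firmware_status: int) -> int:
    """Combine the two status bytes into the summary code."""
    openvms_bad = openvms_status != GOOD     # any value other than A5 means bad
    firmware_bad = firmware_status != GOOD
    if openvms_bad and firmware_bad:
        return BAD_BOTH
    if openvms_bad:
        return BAD_OPENVMS
    if firmware_bad:
        return BAD_FIRMWARE
    return GOOD


print(f"{status_summary(0xA5, 0x17):02X}")
```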

4.8.2.1 Sub-Device Configuration Blocks
The SubDCBs reflect the configuration of the interface or memory modules
attached to a module. SubDCBs may be available for the CPU and I/O ATM
modules. The SubDCB is built by firmware during the power up sequence and
updated each time INIT and BOOT are executed.
A SubDCB is present when there are interface modules attached to a given
module and its existence is represented in that module’s DCB entry. When the
SubDCB offset field on a DCB entry is nonzero, the value is used to calculate the
location of its SubDCB block. If the SubDCB offset field on a DCB entry is zero,
there is no SubDCB block present (that is, no interface modules are attached to
that module).
The format of a SubDCB is the same as that of the DCB: a number-of-entries
field followed by entries in the DCB entry format (the CPU module SubDCB is
the exception). Figure 4–15 shows how the SubDCBs are linked to the DCB.
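
The address arithmetic implied by these offsets can be sketched in Python. The
offsets are treated as signed 32-bit byte offsets, as stated in Tables 4–31 and
4–37. The function names and sample addresses are hypothetical.

```python
from typing import Optional


def to_signed32(value: int) -> int:
    """Interpret a 32-bit longword as a signed (two's complement) offset."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def dcb_address(cca_base: int, zone_dcb_offset: int) -> int:
    """DCB base = CCA base + signed zone DCB offset (from the CCA)."""
    return cca_base + to_signed32(zone_dcb_offset)


def subdcb_address(dcb_base: int, subdcb_offset: int) -> Optional[int]:
    """SubDCB base = DCB base + signed SubDCB offset, or None if the offset is 0."""
    offset = to_signed32(subdcb_offset)
    return None if offset == 0 else dcb_base + offset


# Hypothetical addresses, for illustration only.
dcb = dcb_address(0x01FF0000, 0xFFFFF000)   # offset of -4096 bytes
print(hex(dcb), subdcb_address(dcb, 0), hex(subdcb_address(dcb, 0x400)))
```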


Figure 4–15 SubDCB Links to DCB
SubDCB for DCB Entry 1

CCA

Number of Entries
DCB Entry 1
DCB Entry 2

Zone A DCB Offset

CCA Base
+ Offset

Zone B DCB Offset

DCB Entry n−1
DCB Entry n

Zone A DCB
Number of Entries
DCB Entry 1

DCB Entry n

DCB Base
+ Offset

SubDCB for DCB Entry n
Number of Entries
DCB Entry 1
DCB Entry 2

DCB Entry n−1
DCB Entry n
MR−0020−93RAGS

4.8.2.2 CPU Module SubDCB
The CPU SubDCB is used to represent the memory modules (MMBs) available on
the CPU module. Table 4–38 describes the CPU SubDCB components. Table 4–39
describes the CPU SubDCB entry components.


Table 4–38 CPU SubDCB Components
Component

Length

Description

Number of
entries

4 bytes

Number of entries in the SubDCB. Initialized by firmware.
Is 0 if no entries are present.

SubDCB
entries

16 bytes
per entry

An entry describes an MMB found by the firmware.
Initialized by firmware. Maximum number of entries is
four.

Table 4–39 CPU SubDCB Entry Components
Component

Length

Description

SIMM block

16 bytes

MMB SIMM description. This field is an array of eight
elements (SIMM0 to SIMM7). Each element is 2 bytes in
size and contains:
Byte 00 - SIMM size in Mbytes.
Byte 01 - SIMM status. Values for SIMM status (in
hex) are:
A5 = SIMM is good.
B4 = SIMM is broken.
C3 = SIMM is absent.
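
A 16-byte SIMM block can be read as eight (size, status) pairs. The Python
sketch below does this and totals the memory on SIMMs marked good. It follows
the byte layout in Table 4–39; the function names and sample data are
hypothetical.

```python
SIMM_STATUS = {0xA5: "good", 0xB4: "broken", 0xC3: "absent"}


def parse_simm_block(block: bytes) -> list[tuple[int, str]]:
    """Return (size in Mbytes, status) for SIMM0 through SIMM7."""
    if len(block) != 16:
        raise ValueError("a SIMM block is 16 bytes long")
    # Each element: byte 00 = size in Mbytes, byte 01 = status code.
    return [
        (block[i], SIMM_STATUS.get(block[i + 1], "unknown"))
        for i in range(0, 16, 2)
    ]


def good_megabytes(block: bytes) -> int:
    """Total size of all SIMMs marked good."""
    return sum(size for size, status in parse_simm_block(block) if status == "good")


# Hypothetical MMB: per half, two good 4-Mbyte SIMMs, one broken, one absent.
block = bytes([4, 0xA5, 4, 0xA5, 4, 0xB4, 0, 0xC3] * 2)
print(parse_simm_block(block)[2], good_megabytes(block))
```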

4.8.3 Page Frame Number Bitmap
The page frame number (PFN) bitmap is a data structure that indicates which
pages in memory are considered usable by the OpenVMS operating system. The
bitmap is built by diagnostics as a side effect of the memory tests run during the
power up sequence.
The bitmap starts on a page boundary and resides at the top of memory. The
bitmap requires 1 Kbyte for each 4 Mbytes of main memory, that is:
• A 32-Mbyte system requires an 8-Kbyte bitmap
• A 512-Mbyte system requires a 128-Kbyte bitmap

The bitmap does not map itself or anything above it. There may be memory
above the bitmap which has good and bad pages.
Each bit in the PFN bitmap corresponds to a page in main memory. There is
a one-to-one correspondence between a page frame number (origin 0) and a bit
index in the bitmap. A 1 in the bitmap indicates that the page is good and can
be used. A 0 indicates that the page is bad and should not be used. By default,
a page is flagged bad if a multiple bit error occurs when referencing the page.
Single-bit errors, regardless of frequency, will not cause a page to be flagged bad.
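
The sizing rule and the bit lookup can be checked with a few lines of Python.
The sketch uses the 512-byte VAX page, which gives one bit per page and
therefore 1 Kbyte of bitmap per 4 Mbytes of memory. Two points are assumptions
rather than statements from this manual: bit n is taken to be bit (n mod 8) of
byte (n div 8), with bit 0 the least significant, and the checksum is truncated
to the 4-byte field that holds it. The function names are hypothetical.

```python
PAGE_SIZE = 512  # bytes per VAX page


def bitmap_size_bytes(memory_bytes: int) -> int:
    """Bitmap size for the given amount of main memory (one bit per page)."""
    pages = memory_bytes // PAGE_SIZE
    return (pages + 7) // 8


def page_is_good(bitmap: bytes, pfn: int) -> bool:
    """True if the bit for page frame number pfn is 1 (assumed LSB-first order)."""
    return bool((bitmap[pfn >> 3] >> (pfn & 7)) & 1)


def bitmap_checksum(bitmap: bytes) -> int:
    """Integer sum of all bytes, truncated to 32 bits (assumed)."""
    return sum(bitmap) & 0xFFFFFFFF


MBYTE = 1024 * 1024
print(bitmap_size_bytes(32 * MBYTE), bitmap_size_bytes(512 * MBYTE))

# Hypothetical two-byte bitmap in which page 1 is flagged bad.
bitmap = bytes([0b11111101, 0xFF])
print(page_is_good(bitmap, 0), page_is_good(bitmap, 1), bitmap_checksum(bitmap))
```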


4.9 Error Log Analysis
4.9.1 CPU/MEM Fault Error Log Entry
V A X / V M S        SYSTEM ERROR REPORT         COMPILED 3-FEB-1993 09:33:44
                                                                     PAGE 40.

******************************* ENTRY 686. *******************************
ERROR SEQUENCE 1033.                            LOGGED ON:       SID 17000002
DATE/TIME 2-FEB-1993 18:15:45.55                            SYS_TYPE 02010101
SYSTEM UPTIME: 0 DAYS 01:47:45
SCS NODE: SIXSHL                                             VAX/VMS T5.5-D34

INT60 ERROR KA560 CPU FW REV# 2. CONSOLE FW REV# 0.1
REGISTER COUNT 00000028
Fault Summary Block
FAULT ID            CPU/mem fault
FAULT FLAG          Solid error
XLNK MODE ERROR     Duplex
XLNK MODE AFTER     Master

FRU Information Block
FRU TYPE

00000004

FRU DATA

00000001

Module in zone B

CPU in slot 0
Deconfiguration Information
FLT FLGS BEFORE 33003301
Full configuration active
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone B CPU in use
Zone A I/O in use
Zone A I/O in use
FLT FLGS AFTER 33003301
Full configuration active
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone B CPU in use
Zone A I/O in use
Zone A I/O in use
DECONFIG INFO

00000008
Zone B cpu removed from service

DECONFIG MODULE 00000001

CPU in slot 0 removed from service
Threshold Information Block


V A X / V M S

SYSTEM ERROR REPORT

COMPILED 3-FEB-1993 09:33:44
PAGE 41.

THRESHOLD INTER.  0000A8C0
                                THRESHOLD INTER. SECONDS = 43200.
THRESHOLD COUNT   00000001
                                THRESHOLD COUNT = 1.
THRESHOLD LIMIT   00000003
                                THRESHOLD LIMIT = 3.
THRESHOLD ZEROED  0000190E
                                THRESHOLD ZEROED SECONDS = 6414.
THRESHOLD TOTAL   00000001
                                THRESHOLD TOTAL = 1.
Fault Data Block

SYSTEM ERROR
SYSFLT

19
30020010
I/O error, zone A
CPU/memory fault, zone B
XLINK MODE = Duplex

SYSADR

61200034

DMAADR

0269BC00

SYSADR = 61200034(X)
DMAADR = 0269BC00(X)
DMA Address Register Invalid
JCSR_A CTL/STAT 00000088
System errors enabled
Bcache on
JCSR_B Register Invalid
DIAG_P_A REG

CAC00000
DMA most error (non-crc)
Burn-in mode
I/O divide = 6
CPU divide = A

DIAG_M_A REG

CAC00000
DMA most error (non-crc)
Burn-in mode
I/O divide = 6
CPU divide = A

DIAG_P_B Register Invalid
DIAG_M_B Register Invalid
ATMERR_A REG

00000000
Zone ID = A

ATMERR_B Register Invalid
DMA STAT REG A 00000040
CPU I/O error
DMASTS_B Register Invalid
MMBERR0_A REG 00000000
MMBERR0_B Register Invalid
MMBERR1_A REG 00000000

V A X / V M S

SYSTEM ERROR REPORT

COMPILED 3-FEB-1993 09:33:44
PAGE 42.

MMBERR1_B Register Invalid
SERCSR_A REG 00000080
Loopback request
Enable query interrupt
SERCSR_B Register Invalid
SERMODE_A REG 00200912
Master
Operating System is running
Clock fault enable
Clock select 0 = Master, 1 = Slave
Halt source 0 = A, 1 = B
SERMODE_B Register Invalid
BIU_ADDR_A Register Invalid
BIU_ADDR_B Register Invalid
BIU_STAT_A Register Invalid
BIU_STAT_B Register Invalid
BIU_CTL_A Register Invalid
BIU_CTL_B Register Invalid

(1) This block reflects the content of the four fields of the Fault Summary Block.

(2) The FAULT ID, FAULT FLAG, FRU TYPE, and FRU DATA fields should
always be reviewed. They generally provide the most immediate FRU
information.

(3) The system operating mode has been changed from Duplex to Degraded
Duplex, with Zone A as the master.

(4) A solid error has been identified and the FRU removed from service. However,
if the CPU has not exceeded its threshold and diagnostics pass, the CPU will
be reconfigured into the system.

(5) At this point, the Zone B CPU has not been removed from service.

(6) The Zone B CPU is being removed from service due to the solid error and
change in operating mode.

(7) OpenVMS is running in Zone A.
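
The Threshold Information Block in this entry pairs each raw hexadecimal longword with a decimal translation. The following is a minimal sketch, not part of the VAXft error log tooling, that reproduces those translations from the hexadecimal values. The dictionary keys and the function name are illustrative, and treating "count below limit" as "threshold not exceeded" is an assumption.

```python
# Hypothetical helper: decode the raw hex longwords copied from the
# Threshold Information Block of the CPU/MEM fault entry above.
RAW_THRESHOLD_FIELDS = {
    "THRESHOLD INTER.": "0000A8C0",
    "THRESHOLD COUNT": "00000001",
    "THRESHOLD LIMIT": "00000003",
    "THRESHOLD ZEROED": "0000190E",
    "THRESHOLD TOTAL": "00000001",
}


def decode_threshold(raw):
    """Return a dict mapping each field name to its decimal value."""
    return {name: int(value, 16) for name, value in raw.items()}


decoded = decode_threshold(RAW_THRESHOLD_FIELDS)

# The interval is reported in seconds; express it in hours for readability.
interval_hours = decoded["THRESHOLD INTER."] / 3600

# Assumption: the threshold is "not exceeded" while count < limit.
below_limit = decoded["THRESHOLD COUNT"] < decoded["THRESHOLD LIMIT"]

print(decoded)
print(interval_hours, below_limit)
```

The decoded interval and zeroed values agree with the SECONDS lines printed in the listing (43200 and 6414).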

4.9.2 CPU/MEM Fault End Action Error Log Entry
V A X / V M S

SYSTEM ERROR REPORT

COMPILED 3-FEB-1993 09:33:46
PAGE 56.

******************************* ENTRY 701. *******************************
ERROR SEQUENCE 1048.
DATE/TIME 2-FEB-1993 18:16:21.40
SYSTEM UPTIME: 0 DAYS 01:48:21
SCS NODE: SIXSHL
LOGGED ON:
SID 17000002
SYS_TYPE 02010101
VAX/VMS T5.5-D34

INT60 ERROR KA560 CPU FW REV# 2. CONSOLE FW REV# 0.1
REGISTER COUNT 00000029
Fault Summary Block
FAULT ID            CPU/mem fault end action
FAULT FLAG          Solid error
                    Service is required
XLNK MODE ERROR     Duplex
XLNK MODE AFTER     Master

FRU Information Block
FRU TYPE            00000004
                    Module in zone B
FRU DATA            00000001
                    CPU in slot 0

Deconfiguration Information
FLT FLGS BEFORE 33003301
Full configuration active
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone B CPU in use
Zone A I/O in use
Zone A I/O in use
FLT FLGS AFTER 31003300
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone A I/O in use
Zone A I/O in use

DECONFIG INFO 00000008
Zone B cpu removed from service
DECONFIG MODULE 00000001
CPU in slot 0 removed from service
Threshold Information Not Valid

V A X / V M S

SYSTEM ERROR REPORT

COMPILED 3-FEB-1993 09:33:46
PAGE 57.

Fault Data Block
END ACTION
SYSFLT

29
30020020
I/O error, zone B
CPU/memory fault, zone B
XLINK MODE = Duplex

SYSADR 61200034
SYSADR = 61200034(X)
CNTRL/STAT REG 00000008
System errors enabled
DIAG_P REG CAC08000
Memory double bit error
DMA most error (non-crc)
Burn-in mode
I/O divide = 6
CPU divide = A
DIAG_M REG CAC08000
Memory double bit error
DMA most error (non-crc)
Burn-in mode
I/O divide = 6
CPU divide = A

MMBERR0 REG 01010101
MMB #3 double bit error
MMBERR1 REG 00000000
ATMERR REG 40404040
Zone ID = B
DMA STAT REG 00000040
CPU I/O error
DMAADR 0269BC00
DMAADR = 0269BC00(X)
SERCSR REG 00000080
Loopback request
Enable query interrupt
SERMODE REG 00002101
Slave
Clock fault enable
Zone ID 0 = A, 1 = B

PCADR 00000000
SAVPSL REG 0000B039
C-BIT
N-BIT
T-BIT
INTEGER OVERFLOW TRAP ENABLE
INTERRUPT PRIORITY LEVEL = 00.
PREVIOUS MODE = KERNEL
CURRENT MODE = KERNEL
FIRST PART DONE CLEAR

ECR 0000004A
fbox enable
fbox st4 bypass enable
timeout clock
pmf pmux = 00
pmf emux = 00

V A X / V M S

SYSTEM ERROR REPORT

COMPILED 3-FEB-1993 09:33:46
PAGE 58.

BIU CTL DFE0DEF9
Generate/Expect ECC on check_h pins
output enable of cache rams
direct mapped
2X CPU Cycle
IO Map = 1(X)
512 Kbytes

BC TAG 07913800
tag_match
tag control V
tag control D
tag P
BC TAG = 03C8(X)

BIU STAT 500E3070
Bits 33,32 BIU Addr Reg = 1(X)
Bits 33,32 Fill Addr Reg = 1(X)

FILL SYN 00000000
Lo ECC Syn bits Low Longword = 0(X)
Hi ECC Syn bits High Longword = 0(X)

FILL ADDR 000002A8
FILL ADDR = 000002A8(X)

VMAR 000007E0
Sub Block Select = 0(X)
Row Index = 3F(X)
Error Address Field = 00000000(X)

ICSR 00000001
enable VIC

TBADR 00000000
TBSTS 00000000
s5 cmd corresp to tb perr = 00
source of ref causing tb perr = 00

PCSTS 00000000
PCSTS.LOCK(0) NOT SET

PCCTL 00000000
Performance Monitor Mode = 0(X)
COMPAT/STAT REG 00006008
ATM self test failed
ATM ID EEPROM is bad
ATM ID EEPROM has bad os status
DIAG STATUS REG 00000000
Register is not "VALID"

(1) This block reflects the content of the four fields of the Fault Summary Block.

(2) This entry type (end action) is provided after diagnostics have completed
running on a zone or CPU which has been removed from service as a result of
a system error.
This is the end action for the previous example (CPU/MEM Fault Error Log
Entry).

(3) This message specifies that a physical FRU replacement is required.

(4) The system operating mode has been changed from Duplex to Degraded
Duplex with Zone A as the master.

(5) The FRU may be one of five items: CPU module, or one of the four MMBs.

(6) The Zone B CPU has been removed from service.

(7) Double-bit errors are always treated as solid faults. The failed CPU will not
be reconfigured until Zone B memory is repaired. MMB 3 is the most likely
FRU.
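
When comparing the FLT FLGS BEFORE and FLT FLGS AFTER longwords, it can help to see which bits changed. The sketch below is illustrative only: it XORs the two values from this entry (33003301 and 31003300) and lists the bit positions that differ. The function name is hypothetical, and the manual does not state which bit carries which meaning, so no bit names are assigned here.

```python
# Hypothetical helper: list the bit positions that differ between two
# 32-bit flag longwords given as hexadecimal strings.
def changed_bits(before_hex, after_hex):
    diff = int(before_hex, 16) ^ int(after_hex, 16)
    return [bit for bit in range(32) if diff & (1 << bit)]


# Values copied from the end action entry above.
bits = changed_bits("33003301", "31003300")
print(bits)
```

Two bits change, which is consistent with the two translated lines ("Full configuration active" and "Zone B CPU in use") that appear under FLT FLGS BEFORE but not under FLT FLGS AFTER.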

4.9.3 CPU or Zone Unsynchable Error Log Entry
V A X / V M S

SYSTEM ERROR REPORT

COMPILED 3-FEB-1993 09:33:46
PAGE 56.

******************************* ENTRY 743. *******************************
ERROR SEQUENCE 1099.
DATE/TIME 2-FEB-1993 18:16:21.40
SYSTEM UPTIME: 0 DAYS 01:48:21
SCS NODE: SIXSHL
LOGGED ON:
SID 17000002
SYS_TYPE 02010101
VAX/VMS T5.5-D34

INT60 ERROR KA560 CPU FW REV# 2. CONSOLE FW REV# 0.1
REGISTER COUNT 0000000E
Fault Summary Block
FAULT ID            CPU or zone unsynchable
FAULT FLAG          Solid error
                    Service is required
XLNK MODE ERROR     Master
XLNK MODE AFTER     Master

FRU Information Block
FRU TYPE            00000004
                    Module in zone B
FRU DATA            00000001
                    CPU in slot 0
Deconfiguration Information
FLT FLGS BEFORE 31003300
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone A I/O in use
Zone A I/O in use

FLT FLGS AFTER 31003301
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone A CPU in use
Zone A I/O in use
Zone A I/O in use
DECONFIG INFO 00000008
Zone B cpu removed from service
DECONFIG MODULE 00000001
CPU in slot 0 removed from service
Threshold Information Not Valid
Fault Data Block

V A X / V M S

SYSTEM ERROR REPORT

COMPILED 3-FEB-1993 09:33:46
PAGE 57.

CPU or ZONE UNSYNCHABLE EVENTS
COMPAR/STAT REG 02000000
CPU is in burnin mode
DIAG STATUS REG FFFFFFFF
Diagnostic status is valid
DIAG ERR NUM

DIAG SUBTEST NUM

DIAG TEST NUM

DIAG GROUP NUM

DIAG ERR NUM = 255
DIAG SUBTEST NUM = 255
DIAG TEST NUM = 255
DIAG GROUP NUM = 15.
Diag Flag = 7(X)

(1) This block reflects the content of the four fields of the Fault Summary Block.

(2) The system was unable to synchronize and reach Duplex mode. Consequently,
the before and after XLINK_MODE fields (Fault Summary Block) reflect
Degraded Duplex mode.

(3) Since the Zone B CPU was unsynchable, it is not in use.

(4) The Zone B CPU was removed from service, and will remain out of service
until it is repaired.
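
The DIAG STATUS REG value in this entry (FFFFFFFF) is translated into an error number, subtest number, test number, group number, and flag. The sketch below shows one way such a longword could be split into fields. The bit layout used here (8 + 8 + 8 + 4 + 3 bits plus a top "valid" bit) is an assumption inferred only from the maximum values in the listing (255, 255, 255, 15, 7); the manual does not give the actual field positions, and the function name is hypothetical.

```python
# Assumed layout (NOT taken from the manual):
#   bits  7:0   error number
#   bits 15:8   subtest number
#   bits 23:16  test number
#   bits 27:24  group number
#   bits 30:28  diag flag
#   bit  31     valid
def decode_diag_status(value):
    return {
        "err_num": value & 0xFF,
        "subtest_num": (value >> 8) & 0xFF,
        "test_num": (value >> 16) & 0xFF,
        "group_num": (value >> 24) & 0xF,
        "diag_flag": (value >> 28) & 0x7,
        "valid": bool((value >> 31) & 1),
    }


unsynchable = decode_diag_status(0xFFFFFFFF)  # value from this entry
end_action = decode_diag_status(0x00000000)   # value from the end action entry
print(unsynchable)
print(end_action)
```

Because every bit is set in FFFFFFFF, the decoded values match the listing regardless of field order, so this example cannot confirm the layout. It only illustrates the field widths.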

5
FRU Removal and Replacement Procedures
5.1 In This Chapter
This chapter includes:
• Field replaceable unit list
• Before you begin
• FRU removal and replacement

5.2 Field Replaceable Unit List
A complete list of field replaceable units (FRUs) is given in Table 5–1.
Table 5–1 Model 810 FRUs

FRU                                                   Part Number

Modules:
  CPU                                                 54-21075-01
  Memory mother board (MMB)                           54-21085-01
  Single-sided SIMMs (4 Mbytes per SIMM)              54-21139-CA
  Double-sided SIMMs (8 Mbytes per SIMM)              54-21139-DA
  I/O attachment module (ATM)                         54-21083-01
  Zone control panel                                  54-22130-01
  Fan current sense board (FCSB)                      54-22126-01
  Console extender module                             54-21067-01
  Cross-link assembly                                 70-03710-01
  Fan                                                 12-27848-01

Power:
  AC front end unit (FEU)                             H7884-AA
  5V regulator (DC5)                                  H7179-AA
  3.3V regulator (DC3)                                H7178-AA
  Power system controller (PSC)                       H7851-AA
  Domestic power distribution box                     BA22J-AE
  International power distribution box                BA22J-AJ
  Control and miscellaneous power module (CAMP)       54-21073-01

Options:
  Ethernet interface module (EIM)                     54-21081-01
  DSSI extender module                                54-21063-01
  DSSI interface module (DIM)                         54-21065-01
  DSSI disk drawer assembly                           70-30569-01

Storage:
  18.2 Gbyte magazine tape subsystem                  TF857-AA/AB
  2.6 Gbyte cartridge tape drive                      TF85C-BA
  2 Gbyte disk drive                                  RF73-EA
  852 Mbyte disk drive                                RF35-EA
  2.6 Gbyte cartridge tabletop tape drive             TF85-TA
  Cable kit for the TF85-TA drive                     CK-KDXDA-BA
  4 Gbyte half-rack storage array with two RF73 drives and one
  SF73-HK assembly
  1.7 Gbyte half-rack storage array with two SF35 drives and one
  SF35-HK assembly

Cables:
  DIM to storage device with terminator (84 inches)   17-03537-03
  DIM to storage device with terminator (62 inches)   17-03537-02
  DIM to storage device with terminator (24 inches)   17-03537-01
  Fan to fan tray                                     17-03514-01
  Fan tray to FCSB                                    17-03513-01
  FCSB to centerplane                                 17-03512-01
  VT420 to UPS (power cable)                          17-00442-17
  Zone control panel to centerplane                   17-01148-03
  DSSI disk drawer to centerplane                     17-03805-01
  DSSI disk drawer power/signal to centerplane        17-03806-01

5.3 Before You Begin
Warning
Hazardous voltages exist within the system. Bodily injury or equipment
damage can result when service procedures are performed incorrectly.

Note
FRUs should be handled only by qualified maintenance personnel.

You do not need to shut down the entire system to remove and replace a FRU.
You can shut down the zone that houses the faulty FRU while the other zone
continues to operate. Section 5.3.2 explains how to shut down a zone.
There are two types of FRU removal and replacement procedures:
• Cold swaps
• Warm swaps

During a cold swap, you shut down the zone that houses the faulty FRU while
the operating system continues to run in the other zone. FRUs that require cold
swaps include:
Logic modules
Fan modules
Power supplies
DIM modules
EIM modules
Zone control panel
During a warm swap, the power remains on in both zones. The operating system
continues to run in both zones while the faulty FRU is replaced. FRUs that allow
a warm swap include:
RF35 disk drives
RF73 disk drives
SF35 disk drives
SF73 disk drives
TF85 tape drives
TF857 tape subsystems
DSSI disk drawer assemblies
Chapter 6 explains how to perform a warm swap procedure.
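
The two lists above can be captured as a simple lookup. The following sketch is illustrative only: the FRU names are copied from the cold swap and warm swap lists in this section, while the function name and the returned strings are assumptions.

```python
# FRU names copied from the cold swap and warm swap lists in Section 5.3.
COLD_SWAP_FRUS = {
    "Logic modules",
    "Fan modules",
    "Power supplies",
    "DIM modules",
    "EIM modules",
    "Zone control panel",
}

WARM_SWAP_FRUS = {
    "RF35 disk drives",
    "RF73 disk drives",
    "SF35 disk drives",
    "SF73 disk drives",
    "TF85 tape drives",
    "TF857 tape subsystems",
    "DSSI disk drawer assemblies",
}


def swap_type(fru):
    """Return 'cold' (zone must be shut down) or 'warm' (power stays on)."""
    if fru in COLD_SWAP_FRUS:
        return "cold"
    if fru in WARM_SWAP_FRUS:
        return "warm"
    raise KeyError(f"FRU not listed in Section 5.3: {fru}")


print(swap_type("Zone control panel"))
print(swap_type("RF73 disk drives"))
```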

5.3.1 Handling FRUs
Static electricity can damage FRUs. When you handle FRUs, follow the rules in
Table 5–2.
Table 5–2 Handling FRUs
Rule  Action
1     Wear an electrostatic discharge (ESD) wrist strap.
2     When possible, use a grounded ESD workmat.
3     Attach both the wrist strap and the workmat to the system chassis.
4     Before you remove the FRU from the antistatic box, be sure you ground the box
      to the system chassis.
5     Wear an ESD wrist strap when you remove the FRU from the antistatic box.
6     Ask the operator or system manager to shut down the zone you will be working
      in.

5.3.2 Shutting Down a Zone
Typically, the shutdown is performed by the operator or the system manager.
1. Enter the SHOW ZONE command to see the status of each zone.
• Active: The zone is running.
• Stopped: The zone is not running the operating system. It may be
  running diagnostics or be available for synchronizing.
• Absent: The zone is not available.
• Synchronizing: The zone is synchronizing with the other zone.
• Providing I/O only: The zone has detected a CPU/MEM fault, and has
  placed the CPU and memory off line.

2. Enter the STOP/ZONE zone-id command.
3. At the zone control panel (A or B), simultaneously press both Logic Power - OFF switches to remove logic power from the zone.
Note
Pressing the Logic Power - OFF switches does not affect the fan or the
expansion cabinet power unless the drives (disk or tape) are turned off. If
the drives are turned off, the fan will run for about 30 seconds after you
press the switches.

Example 5–1 How to Shut Down a Zone
$ SHOW ZONE                       ! Displays the status of each zone.
Zone A is ACTIVE                  ! Zone A is running.
Zone B is PROVIDING I/O ONLY      ! Zone B has a faulty component.

$ STOP/ZONE B                     ! Stops zone B.

At the console terminal of the zone that continues to run (in this case, zone A),
the OPCOM messages show that zone synchronization has been lost and virtual
circuits are closed.

5.3.3 Verifying Zone Shutdown
The SHOW ZONE command may be used to verify that the STOP/ZONE zone-id
command was successful.
Example 5–2 How to Verify Zone Shutdown
$ SHOW ZONE                       ! Displays the status of each zone.
Zone A is ACTIVE                  ! Zone A is running.
Zone B is ABSENT                  ! Zone B has been shut down.
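
If captured SHOW ZONE output needs to be checked by a script, the status lines can be parsed as in the sketch below. This is a minimal illustration, not a Digital-supplied tool: the function names are hypothetical, and it assumes the output has the "Zone X is STATE" form shown in Example 5–1 and Example 5–2, with ABSENT indicating a zone that has been shut down as in Example 5–2.

```python
import re

# Matches lines such as "Zone B is PROVIDING I/O ONLY".
ZONE_LINE = re.compile(r"^Zone ([AB]) is (.+)$")


def parse_show_zone(text):
    """Return a dict such as {'A': 'ACTIVE', 'B': 'ABSENT'}."""
    states = {}
    for line in text.splitlines():
        # Drop any trailing "! comment" annotation and surrounding blanks.
        line = line.split("!")[0].strip()
        match = ZONE_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2).strip()
    return states


def zone_is_shut_down(text, zone):
    """Assumption: a zone reported as ABSENT has been shut down."""
    return parse_show_zone(text).get(zone) == "ABSENT"


sample = """$ SHOW ZONE
Zone A is ACTIVE
Zone B is ABSENT
"""
print(parse_show_zone(sample))
print(zone_is_shut_down(sample, "B"))
```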

5.3.4 Starting Up a Zone
Typically, the startup is performed by the operator or the system manager.
1. At the zone control panel (A or B), press the Logic Power - ON switch.
2. Enter the SHOW ZONE command to verify that the zone is shut down.
3. Enter the START/ZONE command to start up the zone.

5.3.5 Accessing the FRUs
Figure 5–1 shows the latches at the front and rear of the system. To open a door,
pull the latch.
The electrostatic discharge (ESD) kit and module extraction tool are located
inside the rear door of the CPU cabinet.

Figure 5–1 Latches
[Figure: front and rear views showing the latch locations on the CPU cabinet and expander cabinets. MR-0457-92DG]

5.4 FRU Removal and Replacement
The following sections contain FRU removal and replacement procedures.
Caution
Service procedures may be performed only by qualified personnel. They
must be familiar with ESD procedures and power procedures for the
Model 810 system. Excessive shock or incorrect handling can damage the
logic modules.

Note
When specific replacement procedures are not given, replace the FRU by
reversing the steps in the removal procedure.

5.4.1 CPU and ATM Modules
You use the same steps to remove the CPU and ATM modules. Figure 5–2 shows
the locations of the modules. Table 5–3 describes the removal procedure.
Figure 5–2 CPU Module and ATM Module Locations
[Figure labels: captive screws, module release levers, ATM module, CPU module, CPU cabinet. MR-0435-92RAGS]

Table 5–3 CPU Module and ATM Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the front door of the cabinet.
3     Loosen the captive screws on the module. The CPU module has four captive
      screws; the ATM module has two captive screws.
4     Open the module release levers and slide the module out.

5.4.2 SIMMs
Figure 5–3 shows the locations of the SIMMs. Table 5–4 describes the removal
procedure.
Note
SIMMs are configured on the MMBs in rows, with a pair of SIMMs (two)
in each row. You always replace a pair of SIMMs (a two-SIMM row).

Figure 5–3 SIMM Locations
[Figure labels: retaining clip; SIMMs (Row A), SIMMs (Row B), SIMMs (Row C), SIMMs (Row D); MMB0, MMB1, MMB2, MMB3; CPU module. MR-0453-92DG]

Table 5–4 SIMM Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the front door of the cabinet.
3     Remove the CPU module using the procedure in Table 5–3.
4     Press the two retaining clips until the SIMM pops up at a 45-degree angle.
5     Remove the pair of SIMMs (a two-SIMM row) from the MMB.
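
Because SIMMs are always installed and replaced as two-SIMM rows, memory capacity changes in row-sized steps. The sketch below works through that arithmetic. It is illustrative only: the per-SIMM sizes come from Table 5–1, the counts of four rows per MMB and four MMBs per CPU module are read from the labels in Figure 5–3, and the assumption that every row may be populated with either SIMM type is not confirmed by this chapter.

```python
SIMMS_PER_ROW = 2      # stated in the note in Section 5.4.2
ROWS_PER_MMB = 4       # Rows A through D, per the Figure 5-3 labels
MMBS_PER_CPU = 4       # MMB0 through MMB3, per the Figure 5-3 labels

# Mbytes per SIMM, from Table 5-1.
SIMM_MBYTES = {"single-sided": 4, "double-sided": 8}


def row_mbytes(simm_type):
    """Capacity of one two-SIMM row."""
    return SIMMS_PER_ROW * SIMM_MBYTES[simm_type]


def total_mbytes(rows):
    """Sum the capacity of a list of populated rows, given by SIMM type."""
    return sum(row_mbytes(simm_type) for simm_type in rows)


all_rows = ROWS_PER_MMB * MMBS_PER_CPU
fully_double = total_mbytes(["double-sided"] * all_rows)
fully_single = total_mbytes(["single-sided"] * all_rows)
print(row_mbytes("single-sided"), row_mbytes("double-sided"))
print(fully_single, fully_double)
```

Under these assumptions, replacing one row removes or restores 8 Mbytes (single-sided) or 16 Mbytes (double-sided) at a time.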

5.4.3 MMBs
Figure 5–4 shows the locations of the MMBs. Table 5–5 describes the removal
procedure.
Figure 5–4 MMB Locations
[Figure labels: mounting bracket screws, mounting brackets, MMB0, MMB1, MMB2, MMB3, CPU module. MR-0414-92DG]

Table 5–5 MMB Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the front door of the cabinet.
3     Remove the CPU module using the procedure in Table 5–3.
4     The MMBs are tension mounted on the CPU module with two screws. These
      screws are located on the MMB mounting brackets. Loosen one screw by
      turning it two or three times. Then loosen the other screw the same way.
      Alternate between the two screws until the MMB is free from the CPU module.
5     Remove the three screws that secure each of the mounting brackets on the
      MMB.
6     Note the configuration of the SIMMs on the MMB. They must be removed from
      the faulty MMB and installed in the same locations on the replacement MMB.
7     Remove the SIMMs from the MMB using the procedure in Table 5–4.

5.4.4 Fan and FCSB
Figure 5–5 shows the location of the fan. Figure 5–6 shows the location of the
FCSB. Table 5–6 describes the removal procedure.
Figure 5–5 Fan Location
[Figure labels: front of CPU cabinet, captive screws, fan, handle. MR-0439-92RAGS]

Table 5–6 Fan and FCSB Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     Open the front door of the cabinet.
5     Loosen the three captive screws that secure the fan in the CPU cabinet.
6     Grasp the handle and pull the fan out of the cabinet.
7     Locate the FCSB inside the fan assembly.
8     Disconnect the FCSB from the fan tray to FCSB cable. See Figure 5–6.
9     Disconnect the FCSB from the FCSB to centerplane cable. See Figure 5–6.
10    Remove the FCSB from the four mounting standoffs. See Figure 5–6.

Figure 5–6 FCSB Location
[Figure labels: fan tray to FCSB cable, FCSB to centerplane cable, mounting standoffs, FCSB. MR-0437-92RAGS]

5.4.5 RF35 Disk Drive Removal and Replacement
Figure 5–7 shows an RF35 disk drive in the DSSI disk drawer. Table 5–7
describes the RF35 disk drive removal procedure.
Figure 5–7 RF35 Disk Drive Location
[Figure labels: release lever, bracket, Phillips screws (6), captive screws (4), release pin, captive screws, RF35 disk drive, LDC bracket. MR-0025-93DG]

Table 5–7 RF35 Disk Drive Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the front door of the cabinet.
3     Turn off the RF35 disk drive.
4     Loosen the four screws that secure the DSSI disk drive rack in the CPU
      cabinet.
5     Pull the DSSI disk drive rack out until it locks in place.
6     Swing the LDC bracket out until you can see the disk drives. See Figure 5–7.
7     Label the DSSI, power, and disk signal cables, and disconnect them from the
      RF35 drive you are removing.
8     Loosen the captive screws at the bottom of the drive.
9     Remove the drive and bracket.
10    Remove the six Phillips screws that secure the bracket on the drive.

5.4.6 DSSI Disk Drawer
Figure 5–7 shows the components in the DSSI disk drawer. Table 5–8 describes
the DSSI disk drawer removal procedure.
Table 5–8 DSSI Disk Drawer Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the drive.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     Open the front door of the cabinet.
5     Turn off all the RF35 disk drives.
6     Loosen the four screws that secure the DSSI disk drive rack in the CPU
      cabinet.
7     Pull the DSSI disk drive rack out until it locks in place.
8     Swing the LDC bracket out until you can see the disk drives. See Figure 5–7.
9     Label each of the RF35 disk drives. [1]
10    Label the DSSI, power, and disk signal cables, and disconnect them from each
      of the RF35 drives.
11    Loosen the captive screws at the bottom of each of the drives.
12    Remove all the drives from the DSSI disk drawer.
13    At the rear of the DSSI disk drawer, label the two DSSI cables and the power
      cable. Then disconnect them.
14    Press the release lever on the left side of the DSSI disk drawer and slide the
      drawer out of the cabinet.

[1] Label each drive before you remove it. The RF35 disk drives must be removed
from the DSSI disk drawer and installed in the same locations in the replacement
DSSI disk drawer.

5.4.7 Zone Control Panel
Figure 5–8 shows the zone control panel. Table 5–9 describes the removal
procedure.

Figure 5–8 Zone Control Panel
[Figure labels: captive screws, zone control panel bracket, signal cable, controller module, handle, Phillips screws (6). MR-0023-93RAGS]

Table 5–9 Zone Control Panel Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the front door of the cabinet.
3     Loosen the four captive screws that secure the zone control panel on the
      cabinet.
4     Grasp the handle and pull the zone control panel out until you can access the
      controller module signal cable.
5     Disconnect the signal cable from the controller module.
6     Remove the six Phillips screws that secure the controller module on the zone
      control panel bracket.

5.4.8 FEU, 3.3V Regulator, 5V Regulator, PSC Modules
You use the same steps to remove these four FRUs. Figure 5–9 shows the
locations of the modules. Table 5–10 describes the removal procedure.
Figure 5–9 FEU, 3.3V Regulator, 5V Regulator, and PSC Locations
[Figure labels: rear of CPU cabinet, +3.3V regulator, +5V regulator, PSC, circuit breaker, release handle, FEU. MR-0443-92RAGS]

Caution
Removing/replacing these four modules without shutting down 48V_DRCT
may cause damage to the power components.

Table 5–10 FEU, 3.3V Regulator, 5V Regulator, and PSC Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     If you are removing the FEU, disconnect the ac power cable from the FEU.
5     Loosen the screws that secure the module in the cabinet. The FEU is secured
      with four screws. The 3.3V regulator, 5V regulator, and PSC are secured with
      two screws.
6     Grasp the module release handles and pull the power module out of the cabinet.

5.4.9 Cross-Link Assembly
Figure 5–10 shows the location of the cross-link assembly. Table 5–11 describes
the removal procedure. Figure 5–11 shows you how to use the module extraction
tool.
Figure 5–10 Cross-Link Assembly
[Figure labels: rear of CPU cabinet, upper retaining bar, middle retaining bar, cross-link modules, cross-link cable. MR-0447-92RAGS]

Note
The cross-link assembly consists of two cross-link modules (one per zone)
and one cross-link cable. These three parts are considered to be one FRU.

Table 5–11 Cross-Link Assembly Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the upper retaining bar.
4     Remove the four screws from the middle retaining bar.
5     Insert the module extraction tool into the hole in the cross-link module. Turn
      the module extraction tool to the right until it is fastened to the module. See
      Figure 5–11.
6     Pull the cross-link module out of the cabinet.
7     Repeat steps 3 through 6 for the other zone.

Figure 5–11 Module Extraction Tool
[Figure labels: module extraction tool, tighten, loosen, pull to remove. MR-0024-93RAGS]

5.4.10 Console Extender Module
Figure 5–12 shows the location of the console extender module. Figure 5–13
shows the layout of the console extender module. Table 5–12 describes the
removal procedure.
Figure 5–12 Console Extender Module Location
[Figure labels: rear of CPU cabinet, upper retaining bar, console extender module, middle retaining bar. MR-0036-93RAGS]

Table 5–12 Console Extender Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the upper retaining bar.
4     Remove the four screws from the middle retaining bar.
5     Turn off any devices connected to the console extender module.
6     Label any cables connected to the console extender module. Then disconnect
      them. See Figure 5–13.
7     Insert the module extraction tool into the hole in the console extender module.
      Turn the tool to the right until it is fastened to the module. See Figure 5–11.
8     Pull the console extender module out of the cabinet.

Figure 5–13 Console Extender Module Layout
[Figure labels: Local, Remote, UPS, Modem, Alarm. MR-0456-92RAGS]

5.4.11 DSSI Extender Module
Figure 5–14 shows the locations of the DSSI extender modules. Table 5–13
describes the removal procedure.
Figure 5–14 DSSI Extender Module Locations
[Figure labels: rear of CPU cabinet, upper retaining bar, DSSI extender modules, DIMs, middle retaining bar, DSSI cables. MR-0032-93RAGS]

Table 5–13 DSSI Extender Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the upper retaining bar.
4     Remove the four screws from the middle retaining bar.
5     Turn off all the devices connected to the DSSI extender module.
6     Label the two DSSI cables and disconnect them from the module. See
      Figure 5–14.
7     Insert the module extraction tool into the hole in the DSSI extender module.
      Turn the tool to the right until it is fastened to the module. See Figure 5–11.
8     Pull the DSSI extender module out of the cabinet.

5.4.12 CAMP Module
Figure 5–15 shows the locations of the CAMP modules. Table 5–14 describes the
removal procedure.
Caution
Removing/replacing the CAMP module without shutting down 48V_DRCT
may cause damage to the CAMP module.

Figure 5–15 CAMP Module Locations
[Figure labels: rear of CPU cabinet, CAMP module. MR-0475-92RAGS]

Table 5–14 CAMP Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     Remove the four screws from the upper retaining bar.
5     Remove the four screws from the middle retaining bar.
6     Turn off all the devices connected to the CAMP module.
7     Insert the module extraction tool into the hole in the CAMP module. Turn the
      tool to the right until it is fastened to the module. See Figure 5–11.
8     Pull the CAMP module out of the cabinet.

5.4.13 DSSI Interface Module (DIM)
Figure 5–16 shows the location of the interface logic modules. Figure 5–17 shows
how to remove the DIMs. Table 5–15 describes the removal procedure.
Figure 5–16 DIM Location
[Figure labels: rear of CPU cabinet, middle retaining bar, interface logic modules (DIMs and EIMs), lower retaining bar. MR-0433-92RAGS]

Figure 5–17 DIM Removal
[Figure labels: rear view, connector, DSSI cable, CPU cabinet, expansion cabinet. MR-0046-93RAGS]

Table 5–15 DIM Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the middle retaining bar.
4     Remove the four screws from the lower retaining bar.
5     Turn off all the devices connected to the DIM you are removing.
6     Disconnect the DSSI cable from the DIM by loosening the two thumb screws.
      See Figure 5–17.
7     Insert the module extraction tool into the hole in the DIM. Turn the tool to the
      right until it is fastened to the module. See Figure 5–11.
8     Pull the DIM out of the cabinet.

5.4.14 Ethernet Interface Module (EIM)
Figure 5–16 shows the location of the interface logic modules. Figure 5–18 shows
how to remove the EIMs. Table 5–16 describes the removal procedure.
Figure 5–18 EIM Removal
[Figure labels: rear view, Ethernet switch, Ethernet cable connector, Ethernet cable, terminator, CPU cabinet, expansion cabinet. MR-0455-92RAGS]

Table 5–16 EIM Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the middle retaining bar.
4     Remove the four screws from the lower retaining bar.
5     Turn off all the devices connected to the EIM you are removing.
6     Disconnect the Ethernet cable from the EIM. See Figure 5–18.
7     Disconnect the terminator from the EIM, if one is present. See Figure 5–18.
8     Insert the module extraction tool into the hole in the EIM. Turn the tool to the
      right until it is fastened to the module. See Figure 5–11.
9     Pull the EIM out of the cabinet.

5.4.15 DSSI Cable Removal and Replacement
Table 5–17 describes the removal procedure.
Table 5–17 DSSI Cable Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Turn off all the devices connected to the DSSI cable you are removing.
4     Disconnect one end of the DSSI cable from the device by loosening the two
      screws on the DSSI connector.
5     Route the DSSI cable through the access hole between the system cabinets.
6     Disconnect the other end of the DSSI cable from the DIM by loosening the two
      screws on the DSSI connector.

5.4.16 TF85C-BA Tape Drive
Figure 5–19 and Figure 5–20 show how to remove a TF85C-BA tape drive from
the system. Table 5–18 describes the removal procedure.
Warning
Two people are required to lift and carry the TF85C-BA tape drive
enclosure.

Figure 5–19 TF85C-BA Tape Drive, Rear View
[Figure labels: DSSI connectors, power supply fault indicator (behind panel), line voltage selector switch (behind panel, 115/230). MR-0454-92DG]

Figure 5–20 TF85C-BA Tape Drive Removal
[Figure labels: tape drive enclosure, release tab, front plate screws (4), screws (3), TF85 tape drive, front plate. MR-0038-93DG]

Table 5–18 TF85C-BA Tape Drive Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the tape.
2     Ask the operator or system manager to dismount the tape drive.
3     Unload the tape magazine, if one is present.
4     At the front of the drive, set the power switch to off (0). All the indicators
      should be off.
5     Disconnect the power cable from the rear of the drive. See Figure 5–19.
6     Disconnect the two DSSI cables from the rear of the drive. See Figure 5–19.
7     At the front of the drive, remove the three screws that secure the tape drive
      enclosure in the cabinet. See Figure 5–20.
8     Slide the tape drive enclosure out of the expansion cabinet.
9     Remove the four screws that secure the front plate on the tape drive enclosure.
10    Push the release tab down and pull the drive straight out of the slot.

5.4.17 SF73 Disk Drive
Figure 5–21 and Figure 5–22 show how to remove the SF73 disk drives from the
system. Figure 5–23 shows how to remove an SF73 disk drive enclosure from
the system. Figure 5–24 shows how to remove an SF73 disk ISE from a drive.
Table 5–19 describes the removal procedure.
Warning
Two people are required to lift and carry the SF73 disk drive enclosure.

Figure 5–21 SF73 Disk Drive, Rear View
[Figure labels: DSSI connectors, ac power switch (1/0), power supply fault indicator (behind panel), line voltage selector switch (behind panel, 115/230). MR-0422-92DG]

Figure 5–22 SF73 Disk Drive, Front View
[Figure labels: Ready, Write Protect, and Fault indicators; DSSI ID; captive screws; front cover door. MR-0035-93DG]

Table 5–19 SF73 Disk Drive Enclosure Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the drive.
2     Turn off the disk drive enclosure.
3     Disconnect the power cable from the rear of the drive. See Figure 3–9.
4     Disconnect the two DSSI cables from the rear of the drive. See Figure 3–9.
5     Remove the mounting screws from the retainers that secure the drive enclosure
      in the cabinet. See Figure 5–23.
6     Slide the disk drive enclosure out of the expansion cabinet.
7     Remove the retainer screws that secure the retainers on the disk drive
      enclosure. See Figure 5–23.
8     Loosen the captive screws that secure the front cover on the disk drive
      enclosure. See Figure 5–22.
9     Disconnect all cables from the disk ISE. Slide the disk ISE out of the disk drive
      enclosure. See Figure 5–24.

Figure 5–23 SF73 Disk Drive Enclosure Removal
(Figure callouts: retainer screws; chassis; retainers; mounting screws.
Drawing MR-0484-92DG.)

Figure 5–24 SF73 Disk ISE Removal
(Figure callouts: DSSI cable; 10-pin OCP cable; 6-pin power cable; skid plate;
guide; disk ISE. Drawing MR-0034-93DG.)

5.4.18 SF35 Storage Array
Figure 5–23 shows how to remove an SF35 storage array from the system.
Figure 3–7 and Figure 5–26 show the rear and front views of the SF35 storage
array. Figure 5–27 shows how to remove an SF35 disk ISE from the storage
array. Table 5–20 describes the removal procedure.
Warning
Two people are required to lift and carry the SF35 storage array.

Figure 5–25 SF35 Storage Array, Rear View
(Figure callouts: DSSI connectors; ac power switch, 1/0; power supply fault
indicator, behind panel; line voltage selector switch, 115/230, behind panel.
Drawing MR-0421-92DG.)

Figure 5–26 SF35 Storage Array, Front View
(Figure callouts: operator control panel (OCP) with Ready, Write Protect, and
Fault indicators for the front and rear drives; drive dc power switches; front
and rear drive positions A through F. Drawing MR-0470-92DG.)

Figure 5–27 SF35 Disk ISE Removal
(Figure callouts: carrier lever; screw; front and rear drive positions A
through F. Drawing MR-0033-93DG.)

Table 5–20 SF35 Storage Array Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the disk.
2     Turn off the storage array.
3     Disconnect the power cable from the rear of the storage array. See Figure 3–7.
4     Disconnect the two DSSI cables from the rear of the storage array. See
      Figure 3–7.
5     Remove the mounting screws from the retainers that secure the storage array
      in the cabinet. See Figure 5–23.
6     Slide the storage array out of the expansion cabinet.
7     Remove the retainer screws that secure the retainers on the storage array. See
      Figure 5–23.
8     Remove the screw from the carrier lever. See Figure 5–27.
9     Pull the carrier lever forward and slide the disk ISE out of the slot. See
      Figure 5–27.

5.4.19 TF857-CA Tape Drive
Figure 5–28 shows how to remove the TF857-CA tape drive from the system.
Table 5–21 describes the removal procedure.
Warning
Two people are required to lift and carry the TF857-CA tape drive
enclosure.

Figure 5–28 TF857-CA Tape Drive, Rear View
(Figure callouts: DSSI cable; cable clip; tiewraps; power cable; push cable tie.
Drawing MR-0420-92DG.)

Table 5–21 TF857-CA Tape Drive Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure
      in Section 5.3.2.
2     Ask the operator or system manager to dismount the tape drive.
3     Unload the tape magazine, if one is present.
4     At the front of the drive, set the power switch to off (0). All the indicators
      should be off.
5     Disconnect the power cable from the rear of the drive. See Figure 5–28.
6     Disconnect the two DSSI cables from the rear of the drive. See Figure 5–28.
7     Remove the mounting screws from the retainers that secure the drive enclosure
      in the cabinet. See Figure 5–23.
8     Slide the tape drive enclosure out of the expansion cabinet.
9     Loosen the shipping restraint screw until the shipping bracket drops. See
      Figure 5–29. If the shipping bracket does not drop when you loosen the
      shipping restraint screw, push the shipping bracket down with a screwdriver.
10    Slide the tape drive enclosure out of the expansion cabinet.

Figure 5–29 Loosening the Shipping Restraint Screw
(Figure callouts: shipping bracket; shipping restraint screw.
Drawing MR-0466-92DG.)

Note
If you are replacing the TF857 tape loader, you must set the node ID.
Refer to Figure 5–30 for the node ID DIP switch location.

Figure 5–30 Setting the TF857 Tape Loader Node ID
(Figure callouts: node ID DIP switch, positions 4, 3, 2, and 1; drive enclosure
controller module; TF857 tape drive assembly. Drawing MR-0467-92DG.)

5.4.20 Power Distribution Box
Figure 5–31 shows a domestic power distribution box. Figure 5–32 shows
an international power distribution box. Table 5–22 describes the removal
procedure.
Figure 5–31 Domestic Power Distribution Box
(Figure callouts: ac power outlets (8); hex screws; circuit breaker; DEC power
bus switch; ac power cable; access hole. Drawing MR-0044-93DG.)

Figure 5–32 International Power Distribution Box
(Figure callouts: ac power outlets (6); hex screws; ac power connector; circuit
breaker; DEC power bus switch; access hole. Drawing MR-0045-93DG.)

Table 5–22 Power Distribution Box Removal Procedure
Step  Action
1     Turn off any devices connected to the power distribution box.
2     Set the circuit breaker to the off position. See Figure 5–31 or Figure 5–32.
3     Set the DEC power bus switch to the local position. See Figure 5–31 or
      Figure 5–32.
4     If you are removing a domestic power distribution box, disconnect the ac power
      cable from facility power. See Figure 5–31.
      If you are removing an international power distribution box, disconnect the
      ac power cable from the ac power connector and from facility power. See
      Figure 5–32.
5     Disconnect any ac power cables connected to the ac power outlets and route the
      cables through the access hole. See Figure 5–31 or Figure 5–32.
6     Remove the four hex screws that secure the power distribution box in the
      cabinet. See Figure 5–31 or Figure 5–32.
7     Remove the power distribution box from the cabinet.

6
Managing Integrated Storage Elements
6.1 In This Chapter
This chapter includes:
• Loading the DUP driver
• Using VMS DUP
• Using the server setup switch
• Assigning DSSI unit numbers
• Warm swapping an ISE

6.2 Loading the DUP Driver
If the VMS diagnostic utility protocol (DUP) class driver is not loaded, load it as
follows:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return

6.3 Using VMS DUP
Use the VMS DUP to change configuration data on mass storage devices. With
DUP, you can connect the terminal to a storage controller with the following DCL
command:
SET HOST/DUP/SERVER=MSCP$DUP/TASK=taskname nodename
where:
taskname – is the utility or diagnostic program name to be executed on the
           target storage system
nodename – is the node name of the ISE

You can use SET HOST/DUP to create a virtual terminal connection to the
MSCP$DUP server and to execute a utility or diagnostic program on the MSCP
storage controller that uses the DUP standard dialogue.
Once the connection is established, operations are under the control of the utility
or diagnostic program. When the utility or program ends, control returns to the
local system.
PARAMS is the DUP management utility to examine and change ISE parameters
such as node name, allocation class, and unit number. PARAMS is also used to
display the state of the ISE and performance statistics maintained by the ISE.
PARAMS prompts for a command with the PARAMS> prompt. Once you enter a
command, PARAMS executes it, and prompts you for another command.

To stop the PARAMS utility, press Ctrl/C, Ctrl/Y, Ctrl/Z, or type EXIT at the
PARAMS prompt.
Table 6–1 lists PARAMS commands.
Table 6–1 PARAMS Commands
Command  Description
EXIT     Stops the PARAMS utility
HELP     Displays information on how to use PARAMS commands
SET      Changes internal ISE parameters
SHOW     Displays the setting of a parameter or a class of parameters
WRITE    Records in nonvolatile RAM the device parameter changes you made
         with SET

Additional information is available on ISE tasks and commands in the
RF/TF-series installation guides.

6.4 Using the Server Setup Switch
The server setup (SU) switch facilitates the installation of a new or incorrectly
initialized ISE on a running system. Use SET HOST and configure parameters
for the ISE with DUP, before VMS recognizes the ISE as an available resource.
Table 6–2 explains how to disable RF-series and SF35, SF73, and SF72 disks.
Table 6–2 Switches for Disabling the MSCP

Disks: RF-series
To disable: Press the SU switch to disable the MSCP/TMSCP server within the ISE.
More information in: VAXft Systems Owner’s Manual

Disks: SF72 or SF73
To disable: Set the drive positions DSSI ID number and the left-most MSCP to
disable the ISE. The icon on the front of the door indicates the location of
the drive.
More information in: VAXft Systems Operating Information

Disks: SF35
To disable: Press the MSCP switch to disable the ISE. The MSCP switch is
located on the Operator Control Panel.
More information in: VAXft Systems Operating Information

6.5 Assigning DSSI Unit Numbers
By default, the disk drive forces the unit number to the same value as the DSSI
node address for the drive. Since the drives in zone A and zone B initially have
the same DSSI unit number, reassign unit numbers to remove configuration
conflicts and improve system management.
All unit numbers must be unique within an allocation class. Change the
UNITNUM and FORCEUNI ISE parameters (see Table 6–3) to override the
default values that assign the unit the same value as its node address.
Reassign unit numbers so that they have values greater than 99. For example,
Figure 6–1 and Figure 6–2 use a 100-, 200-, 300-, 400-, 500-, and 600- numbering
scheme for SF35s and SF73s.
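
The two rules above (unit numbers unique within an allocation class, and values greater than 99) can be checked against a planned configuration before any parameters are written. The following Python sketch is illustrative only. The record layout (node name, allocation class, unit number) and the function name are assumptions made for this example and are not part of VMS or the PARAMS utility.

```python
from collections import defaultdict


def find_unit_conflicts(ises):
    """Return a list of problems found in a planned ISE numbering scheme.

    `ises` is a list of (nodename, allclass, unitnum) tuples. This record
    shape is hypothetical; it mirrors the worksheet values, not a VMS API.
    """
    problems = []
    seen = defaultdict(list)  # (allclass, unitnum) -> list of node names

    for nodename, allclass, unitnum in ises:
        seen[(allclass, unitnum)].append(nodename)
        # The section recommends unit numbers greater than 99.
        if unitnum <= 99:
            problems.append(
                f"{nodename}: unit number {unitnum} is not greater than 99"
            )

    # Unit numbers must be unique within an allocation class.
    for (allclass, unitnum), names in sorted(seen.items()):
        if len(names) > 1:
            problems.append(
                f"allocation class {allclass}: unit {unitnum} used by "
                + ", ".join(names)
            )
    return problems


if __name__ == "__main__":
    plan = [("DISKA", 1, 100), ("DISKB", 1, 100), ("DISKC", 1, 21)]
    for line in find_unit_conflicts(plan):
        print(line)
```

An empty result means the plan satisfies both rules. The check does not confirm that the values match what is actually stored in each ISE.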

Figure 6–1 VAXft Model 810 Front View
(Figure: front view of the CPU cabinet and expansion cabinet with example unit
numbers. SF73 positions are labeled 100, 200, 300, 400, 500, 600, 700, 800, and
701. SF35 positions are labeled A = 100, B = 101, C = 102, D = 103, E = 104,
and F = 105. Drawing MR-0050-93RAGS.)

6.6 Warm Swapping an ISE
Warm swapping is the procedure by which an ISE can be replaced or added to a
running system without interrupting system operations.
Caution
The procedure must be followed carefully. If a parameter is not entered
correctly, then a system reboot is necessary or the ISE (and possibly the
system) is rendered unusable.
The VMS operating system recognizes an ISE by its unique values for
the NODENAME and SYSTEMID parameters. If only one of these
parameters is changed, VMS inhibits connections to the old and new
parameters for the ISE.

Variations of this procedure depend on the purpose for the warm swap. An ISE
can be warm swapped for the following reasons:
• Removal and replacement for storage
• Replacement in a system that is running
• Installation in a system that is running

Figure 6–2 VAXft Model 810 Rear View
(Figure: rear view of the CPU cabinet and expansion cabinet with example unit
numbers. SF73 positions are labeled 100, 200, 300, 400, 500, 600, 700, 800,
702, and 703. SF35 positions are labeled A = 106, B = 107, C = 108, D = 109,
E = 110, and F = 111. Drawing MR-0051-93RAGS.)

When replacing an ISE or installing a new ISE, determine the parameter values
for the ISE before performing the warm swap procedure. Assign values for each
of the ISE parameters described in Table 6–3.


Table 6–3 ISE Parameters
Parameter

Description
1

ALLCLASS

Allocation class. The default value is 0. Set the ALLCLASS value to the allocation
class chosen for the system. Note that shadowed disk devices must be set to a nonzero
allocation class.

FORCENAM

Force name parameter. Determines if the ISE is to use the NODENAME parameter
value instead of the manufacturing name given to the ISE. The value must be 0. If the
value is 1, the ISE uses a generic device name such as RF31x.

FORCEUNI

Force unit parameter. To use UNITNUM as the device unit number, set the FORCEUNI
parameter to 0. The factory default value of 1 uses the DSSI node address (hardwired
on the backplane) as the unit number.

NODENAME

Node name for an ISE. Each ISE has a node name that is stored in EEPROM. The node
name is determined in the manufacturing process and is unique to each ISE. The node
name can be changed depending on the needs of the site.

SYSTEMID

System identification number. All SYSTEMIDs must be unique within the system. Do
not change this parameter when introducing a new ISE to the system.

UNITNUM

Unit number. Specifies a numeric value for the device name. Use a unit number that is
unique within the allocation class to which you are configuring the unit. Follow the unit
numbering scheme described in Section 6.5 or use one that meets the requirements.

1 RF-series devices only

More information is available on ISE parameters in the RF/TF-series installation
guides.

6.6.1 Setting ISE Parameters
Digital Equipment Corporation recommends maintaining a worksheet of the
parameters for all ISEs, as well as the serial number of each ISE. This is
especially important at sites that maintain a set of spare drives that may be
stored for some time before they are used.
The worksheet aids in:
• Preventing duplicate parameters, which render an ISE unusable until the
  duplication is isolated and corrected
• Finding the parameter settings of a non-operational ISE to create a
  replacement unit with identical parameters

Use the ISE parameter worksheets in Appendix B to identify and record critical
parameter names and values. When installing a new ISE, select parameter
values that meet the site ISE configuration or guidelines. Then continue with
Section 6.6.4. When replacing an ISE, make sure the parameters selected are not
being used for another ISE in the configuration.
If the parameter values were not recorded, perform the following steps to extract
the information required from your system:
1. Enter SHOW DEVICE DI to display the following information:
• Device name
  The device names in the sample output below are $1$DIA22 and
  $1$DIA21.
• NODENAME
  The node name is shown in parentheses. In the sample output below,
  the node names are RIRRBA and RICYAA.
• ALLCLASS
  The allocation class is found in the device name between the dollar signs
  ($). In $1$DIA21, the ISE has an allocation class of 1. If the allocation
  class were 0, the device name would display as RICYAA$DIA21.
• UNITNUM
  The unit number is the number following the DIA. In $1$DIA21, the
  UNITNUM is 21. It is the MSCP unit number.
• FORCENAM
  The force name parameter is 0 if NODENAME is anything other than
  RF31x, where x corresponds to the DSSI node ID (A = 0, B = 1, and so on).
• FORCEUNI
  The force unit parameter is not shown in the sample, but it should be 0
  if the configuration rules given in the VAXft Systems Configuration Guide
  were followed.

2. Determine whether the VMS DUP class driver is loaded by entering the
following DCL command:
$ SHOW DEVICE FYA0 Return
If the driver is not loaded, load it as follows:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return

3. Enter SET HOST/DUP to establish a DUP connection with the ISE as follows:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS nodename
This invokes DUP on the ISE and runs the PARAMS utility. If a connection
cannot be established with the ISE DUP, use ANALYZE/SYSTEM to find
information on some of the parameters.
In the following sample output, the SYSTEMID is 94100302 and the
ALLCLASS is 1.
$ ANALYZE/SYSTEM Return
VMS System Analyzer
SDA> SHOW DEVICE $1$DIA21 Return
I/O data structures
-------------------
$1$DIA21                    RF31                 UCB address: 802D65D0

Device status:   00021810 online,valid,unload,lcl_valid
Characteristics: 1C4D4108 dir,rct,fod,shr,avl,mnt,elg,idv,odv,rnd
                 000022A1 clu,mscp,srv,nnm,loc

Owner UIC [000010,000001]   Operation count     1116   ORB address  802D6700
PID              00000000   Error count            0   DDB address  804DA680
Alloc. lock ID   00B000E5   Reference count        1   DDT address  80308BD8
Alloc. class            1   Online count           2   VCB address  802E2750
Class/Type          01/38   BOFF                0000   CRB address  8048C250
Def buf. size         512   Byte count          0000   PDT address  802A5F80
DEVDEPEND        00000000   SVAPTE          00000000   CDDB address 802D6410
DEVDEPND2        00000000   DEVSTS              0004   I/O wait queue  empty
FLCK index             34   RWAITCNT            0000
DLCK address     00000000

Press RETURN for more.
SDA> Return
I/O data structures
-------------------
--- Primary Class Driver Data Block (CDDB) 802D6410 ---
Status:          0040 alcls_set
Controller Flags 80D4 icf_mlths,cf_this,cf_misc,cf_attn,cf_replc

Allocation class        1   CDRP Queue         empty   DDB address  804DA860
System ID        94100302   Restart Queue      empty   CRB address  8048C250
                     4041   DAP count              3   CDDB link    80344C30
Contrl. ID       94100302   Contr. timeout        60   PDT address  802A5F80
                 01644041   Reinit Count           0   Original OCB 00000000
Response ID      00000000   Wait UCB Count         0   UCB chain    802D65D0
MSCP Cmd status  FFFFFFFF

*** I/O request queue is empty ***
Press RETURN for more.
SDA> EXIT Return
$
$ SHOW DEVICE DI Return
Device             Device   Error  Volume  Free    Trans  Mnt
Name               Status   Count  Label   Blocks  Count  Cnt
$1$DIA22 (RIRRBA)  Mounted      0  DISK22  744282      1    1
$1$DIA21 (RICYAA)  Online       5
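
When many devices must be recorded on the worksheet, the allocation class and unit number can be extracted from each device name mechanically. The following Python sketch shows one way to split the two name forms described in step 1 ($1$DIA21 and RICYAA$DIA21). The function name and the returned dictionary keys are assumptions made for this example. The pattern covers only the device-name shapes shown in this section.

```python
import re

# Accepts "$<allclass>$DIA21", "<nodename>$DIA21", or a bare "DIA21".
_DEVICE_RE = re.compile(
    r"^(?:\$(\d+)\$|([A-Z0-9]+)\$)?([A-Z]{2})([A-Z])(\d+):?$"
)


def parse_device_name(name):
    """Split a device name into allocation class, node name, and unit number.

    Hypothetical helper: the key names are chosen for this example only.
    """
    match = _DEVICE_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    allclass, nodename, devtype, controller, unit = match.groups()
    return {
        # A name without the $n$ prefix implies allocation class 0.
        "allclass": int(allclass) if allclass is not None else 0,
        "nodename": nodename,
        "device": devtype + controller,
        "unitnum": int(unit),
    }


if __name__ == "__main__":
    print(parse_device_name("$1$DIA21"))
    print(parse_device_name("RICYAA$DIA21"))
```

The node name is available from the device name only when the allocation class is 0. Otherwise, take it from the parentheses in the SHOW DEVICE display.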

6.6.2 ISE Removal
When you replace an ISE, initialize the new ISE with the same parameters as
the ISE being replaced. Refer to the worksheet maintained for that ISE. (See
Section 6.6.1.)
You can turn off power and replace an ISE in a running system without
interrupting system services or users. When the ISE is replaced, the new
ISE must be correctly initialized to:
• Supersede pre-set manufacturing values
• Store the modified values in EEPROM

To replace an ISE in a system that is running, perform the following steps:


Caution
You must use an ESD wrist strap, ground clip, and grounded ESD
workmat whenever you handle ISEs. Use the static protective service kit
(PN 29-262446).
Use great care when you handle an ISE; excessive shock can damage the
head-disk-assembly (HDA).

1. If the ISE is mounted, logically dismount it from the system.
2. Make the device unavailable to the system by entering the following DCL
command:
$ SET DEVICE/NOAVAILABLE devicename Return
3. Verify that the device has been marked as unavailable by entering the
following DCL command:
$ SHOW DEVICE $1$DIA21 Return
Device             Device       Error  Volume  Free    Trans  Mnt
Name               Status       Count  Label   Blocks  Count  Cnt
$1$DIA21 (RICYAA)  Unavailable      5

4. Set the ISE power switch to off (0). Wait 45 seconds for the drive to stop
spinning (and, on RF-series disks, for the interlock solenoid to release).
5. Remove the ISE from the slot. Follow the steps in the device owner’s manual,
and observe all FRU handling procedures.

6.6.3 ISE Replacement
When you replace an ISE in a system that is running, use the following steps to
restore the parameters from the ISE being replaced. When you install a new ISE
in a system that is running, use the steps described in Section 6.6.4.
Caution
You must use an ESD wrist strap, ground clip, and grounded ESD
workmat whenever you handle ISEs. Use the static protective service kit
(PN 29-262446).
Use great care when handling an ISE. Excessive shock can damage the
HDA.


1. Disable the MSCP server as described in Table 6–4.
Table 6–4 Disabling the MSCP
Disks         Action
RF-series     Press and hold the SU switch/button
SF72 or SF73  Set the MSCP enable switch
SF35          Press the MSCP/Fault switch (LED is green when enabled)

2. Set the ISE power switch to on (1). Wait for the drive to start spinning (and,
on RF-series disks, the interlock solenoid to lock).
3. If you have an RF-series disk, release the server setup switch. If you have an
SF-series disk, continue with Step 4.
4. Verify that the device has been marked as available by entering the following
DCL command:
$ SHOW DEVICE devicename Return
5. Find the NODENAME parameter for the replacement ISE by entering SHOW
CLUSTER. (SHOW DEVICE will not work at this time.) In the sample
output below, R1QSAA is the replacement ISE.
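$ SHOW CLUSTER Return
6. Determine whether the VMS DUP class driver is loaded by entering the
following DCL command:
$ SHOW DEVICE FYA0 Return
If the driver is not loaded, load it by entering the following:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return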

7. Enter SET HOST/DUP to establish a DUP connection with the ISE as follows:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS nodename
This invokes DUP on the ISE and runs the PARAMS utility.
8. Refer to the parameters listed in Table 6–3, and enter the SET command
to set appropriate values for the parameters. Be sure to record the new
parameters on the worksheet for the ISE.

1 Firmware version number


For example:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS R1QSAA Return
%HSCPAD-I-LOCPROGEXE, Local program executing - type ^\ to exit
Copyright (C) 1993 Digital Equipment Corporation
PARAMS> SHOW NODENAME Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
NODENAME   R1QSAA        RF31           String     Ascii
PARAMS> SET NODENAME RICYAA Return
PARAMS> SHOW SYSTEMID Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
SYSTEMID   593200495860  0000000000000  Quadword   Hex      B
PARAMS> SET SYSTEMID 0404194100302 Return
PARAMS> SHOW ALLCLASS Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
ALLCLASS   0             0              Byte       Dec      B
PARAMS> SET ALLCLASS 1 Return
PARAMS> SHOW FORCENAM Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
FORCENAM   0             0              Boolean    0/1      B
PARAMS> SHOW UNITNUM Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
UNITNUM    0             0              Word       Dec      U
PARAMS> SET UNITNUM 21 Return
PARAMS> SHOW FORCEUNI Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
FORCEUNI   1             1              Boolean    0/1      U
PARAMS> SET FORCEUNI 0 Return
PARAMS> WRITE Return
Changes require controller initialization, ok? [Y/(N)] Y
Initializing...
HSCPAD-S-REMPGMEND, Remote program terminated - message number 3
%HSCPAD-S-END, Control returned to CLOUDS
$
9. Make the device available to the system by entering the following DCL
command:
$ SET DEVICE/AVAILABLE devicename Return
10. Mount the ISE in the system and restore the shadow sets.
11. On SF-series drives, enable the MSCP switch.
When initialization is complete, the replacement ISE and its parameters are
made available to the VMS operating system.

Note
The SHOW CLUSTER command continues to show the name of the ISE
replaced. This does not harm the system. After the next reboot, the
replacement ISE name appears.
Note also that the following message is displayed if another node is
already assigned the same SYSTEMID and NODENAME:
%PWA0-REMOTE SYSTEM CONFLICTS WITH KNOWN SYSTEM
In this case, shut down the new node and issue a unique SYSTEMID and
NODENAME for the new node.

6.6.4 Installing an ISE in a Running System
When you install a new ISE in a system that is running, perform the following
steps to initialize the new ISE parameters:
1. Disable the MSCP server as described in Table 6–5.
Table 6–5 Disabling the MSCP
Disks         Action
RF-series     Press and hold the SU switch/button
SF72 or SF73  Set the MSCP enable switch
SF35          Press the MSCP/Fault switch (LED is green when enabled)

2. Set the ISE power switch to on (1). Wait for the drive to start spinning (and,
on RF-series disks, the interlock solenoid to lock).
3. If you have an RF-series disk, release the server setup switch. If you have an
SF disk, continue with Step 4.
4. Refer to Table 6–3 and Section 6.6.1, and select values for the following
parameters:
• ALLCLASS
• FORCENAM
• FORCEUNI
• NODENAME
• UNITNUM

5. Determine whether the VMS DUP class driver is loaded by entering the
following DCL command:
$ SHOW DEVICE FYA0 Return
If the driver is not loaded, load it by entering the following:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return

6. Enter SET HOST/DUP to establish a DUP connection with the ISE as follows:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS nodename


This invokes DUP on the ISE and runs the PARAMS utility.
7. Use SET to assign appropriate values for the parameters. Be sure to record
the new parameters on the worksheet for the ISE.
In the following sample output, the new ISE is configured to be device
$1$DIA22. The device is initialized with these parameters:
• ALLCLASS: 1
• FORCENAM: 0
• FORCEUNI: 0
• NODENAME: DISK22
• SYSTEMID: no change
• UNITNUM: 22

$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS R1QSAA Return
%HSCPAD-I-LOCPROGEXE, Local program executing - type ^\ to exit
Copyright (C) 1990 Digital Equipment Corporation
PARAMS> SHOW NODENAME Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
NODENAME   R1QSAA        RF31           String     Ascii
PARAMS> SET NODENAME DISK22 Return
PARAMS> SHOW ALLCLASS Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
ALLCLASS   0             0              Byte       Dec      B
PARAMS> SET ALLCLASS 1 Return
PARAMS> SHOW FORCEUNI Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- --------
FORCEUNI   1             1              Boolean    0/1      U
PARAMS> SET FORCEUNI 0 Return
PARAMS> WRITE Return
Changes require controller initialization, ok? [Y/(N)] Y
Initializing...
HSCPAD-S-REMPGMEND, Remote program terminated - message number 3
%HSCPAD-S-END, Control returned to CLOUDS
$

When initialization is complete, the new ISE and its parameters are made
available to the VMS operating system.
8. On SF-series drives, enable the MSCP switch.
Note
The SHOW CLUSTER command continues to show the name of the ISE
you replaced. This does not harm the system. After the next reboot, the
new ISE name appears.
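
The PARAMS dialogue for a warm swap follows a fixed pattern: one SET command per parameter, followed by WRITE. The following Python sketch builds that command list from a worksheet record so that it can be reviewed before it is typed at the PARAMS> prompt. It is illustrative only. The function name, the dictionary layout, and the include_systemid flag are assumptions made for this example. The sketch does not connect to an ISE or issue any commands.

```python
# Parameter order follows the sample PARAMS session in Section 6.6.3.
PARAM_ORDER = ["NODENAME", "SYSTEMID", "ALLCLASS", "FORCENAM", "UNITNUM", "FORCEUNI"]


def params_commands(worksheet, include_systemid=False):
    """Build the PARAMS command lines for one ISE worksheet record.

    `worksheet` maps parameter names to values (hypothetical layout).
    SYSTEMID is skipped unless requested, because Table 6-3 says not to
    change it when introducing a new ISE to the system.
    """
    commands = []
    for name in PARAM_ORDER:
        if name == "SYSTEMID" and not include_systemid:
            continue
        if name in worksheet:
            commands.append(f"SET {name} {worksheet[name]}")
    # WRITE stores the changed values in the ISE.
    commands.append("WRITE")
    return commands


if __name__ == "__main__":
    new_ise = {"NODENAME": "DISK22", "ALLCLASS": 1, "FORCENAM": 0,
               "FORCEUNI": 0, "UNITNUM": 22}
    for line in params_commands(new_ise):
        print("PARAMS>", line)
```

For a replacement ISE, where the SYSTEMID of the failed unit is restored, pass include_systemid=True and supply the SYSTEMID value from the worksheet.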


A
Miscellaneous System Information
A.1 In This Appendix
This appendix includes:
• Processor Halt codes
• Console Halt codes
• Error register descriptions
• I/O physical address space
• System control block description

A.2 Processor Halt Codes
Table A–1 provides the processor Halt code definitions.
Table A–1 Processor Halt Code Definitions
Halt Code            Number  Definition
CPM$K_EXT_HALT       ?02     External halt
CPM$K_RESET          ?03     Reset
CPM$K_BAD_ISP        ?04     Interrupt stack not valid
CPM$K_DBL_ERR1       ?05     Machine check during execution
CPM$K_HALT           ?06     Halt instruction executed
CPM$K_SCB_ERR3       ?07     SCB vector bits [01:00] = 11
CPM$K_SCB_ERR2       ?08     SCB vector bits [01:00] = 10
CPM$K_CHM_FRM_ISTK   ?0A     CHMx executed while on interrupt stack
CPM$K_CHM_TO_ISTK    ?0B     CHMx to interrupt stack
CPM$K_SCB_READ_ERR   ?0C     SCB read error
CPM$K_MERR_V         ?10     ACV or TNV during machine check
CPM$K_KSP_V          ?11     ACV or TNV during KSP exception
CPM$K_DBL_ERR2       ?12     Machine check during machine check
CPM$K_DBL_ERR3       ?13     Machine check during KSP not valid
CPM$K_PSL_EXC5       ?19     PSL [26:24] = 101 during interrupt or exception
CPM$K_PSL_EXC6       ?1A     PSL [26:24] = 110 during interrupt or exception
CPM$K_PSL_EXC7       ?1B     PSL [26:24] = 111 during interrupt or exception
CPM$K_PSL_REI5       ?1D     PSL [26:24] = 101 during REI
CPM$K_PSL_REI6       ?1E     PSL [26:24] = 110 during REI
CPM$K_PSL_REI7       ?1F     PSL [26:24] = 111 during REI

The following example shows a processor Halt code output. Table A–2 defines the
Halt Reason fields.
>>>
?03 Reset (Reason = 0017)
PC= 01E00000 PSL= 041F0300
Table A–2 Processor Halt Reason Code Definitions
Reason Code
(Hex)        Definition
0001         Duplex zones have diverged
0002         Fatal cross-link error has occurred
0003         Fatal zone error has occurred
0004         Fatal ATM error has occurred
0005         Fatal CPU module error has occurred
0006         Fatal memory error has occurred
0007         Single bit memory error has occurred
0008         User command issued to stop a zone
0009         Unexpected machine check has occurred
000A         Software detected failure has occurred
000B         Solid NXIO error has occurred
000C         Excessive transient NCIO errors have occurred
000D         A solid IO error has occurred
000E         Excessive transient IO errors have occurred
000F         Excessive VAXELN kernel recoverable errors have occurred
0010         A VAXELN master fatal error has occurred
0011         A VAXELN job fatal error has occurred
0012         Not enough SPTEs could be allocated to boot OpenVMS
0013         Unexpected system error occurred¹
0014         Interface module failure has occurred
0015         Unexpected VAXELN error occurred
0016         A VAXELN kernel fatal error has occurred
0017         Initializing VAXELN before starting reconfiguration

¹ Reset reason 0013 indicates that an unexpected system error occurred. The contents of the SYSFLT,
SYSADR, and DMAADR registers will be saved in the CCA area. See Figure A–4 for the CCA offsets
of these registers. Use the register bitmaps and description in Section A.4 to determine the cause of
the error.
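
When halt messages are collected from console logs, the reason field can be looked up automatically. The following Python sketch parses a halt line of the form shown in the example above and returns the reason text from Table A–2. It is illustrative only. The function and dictionary names are assumptions made for this example, and the pattern covers only the line format shown in this appendix.

```python
import re

# Reason codes and definitions copied from Table A-2.
HALT_REASONS = {
    0x0001: "Duplex zones have diverged",
    0x0002: "Fatal cross-link error has occurred",
    0x0003: "Fatal zone error has occurred",
    0x0004: "Fatal ATM error has occurred",
    0x0005: "Fatal CPU module error has occurred",
    0x0006: "Fatal memory error has occurred",
    0x0007: "Single bit memory error has occurred",
    0x0008: "User command issued to stop a zone",
    0x0009: "Unexpected machine check has occurred",
    0x000A: "Software detected failure has occurred",
    0x000B: "Solid NXIO error has occurred",
    0x000C: "Excessive transient NCIO errors have occurred",
    0x000D: "A solid IO error has occurred",
    0x000E: "Excessive transient IO errors have occurred",
    0x000F: "Excessive VAXELN kernel recoverable errors have occurred",
    0x0010: "A VAXELN master fatal error has occurred",
    0x0011: "A VAXELN job fatal error has occurred",
    0x0012: "Not enough SPTEs could be allocated to boot OpenVMS",
    0x0013: "Unexpected system error occurred",
    0x0014: "Interface module failure has occurred",
    0x0015: "Unexpected VAXELN error occurred",
    0x0016: "A VAXELN kernel fatal error has occurred",
    0x0017: "Initializing VAXELN before starting reconfiguration",
}

_HALT_RE = re.compile(
    r"\?([0-9A-Fa-f]{2})\s+(.*?)\s*\(Reason\s*=\s*([0-9A-Fa-f]{4})\)"
)


def decode_halt_line(line):
    """Parse a line such as '?03 Reset (Reason = 0017)' (hypothetical helper)."""
    match = _HALT_RE.search(line)
    if match is None:
        raise ValueError(f"not a halt message: {line!r}")
    code, text, reason = match.groups()
    reason_value = int(reason, 16)
    return {
        "halt_code": code.upper(),
        "halt_text": text,
        "reason": reason_value,
        "reason_text": HALT_REASONS.get(reason_value, "unknown reason code"),
    }


if __name__ == "__main__":
    print(decode_halt_line("?03 Reset (Reason = 0017)"))
```

For reason 0013, the lookup returns only the table text. The saved SYSFLT, SYSADR, and DMAADR register contents must still be examined as described in the table footnote.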

A.3 Console Halt Codes
The following example shows a console Halt code output. Table A–3 defines the
Halt Reason fields.
>>>
?03 Reset (Reason = 0013)
PC= 01E00000 PSL= 041F0300
Table A–3 Console Halt Reason Code Definitions
Reason Code
(Hex)        Definition
0000         Power-up reset
0001         Duplex zones have diverged
0002         Fatal cross-link error has occurred
0003         Fatal zone error has occurred
0004         Fatal ATM error has occurred
0005         Fatal CPU module error has occurred
0006         Fatal memory error has occurred
0007         Single bit memory error has occurred
0008         User command issued to stop a zone
0009         Unexpected machine check has occurred
000A         Software detected failure has occurred
000B         Solid NXIO error has occurred
000C         Excessive transient NCIO errors have occurred
000D         A solid IO error has occurred
000E         Excessive transient IO errors have occurred
000F         Excessive VAXELN kernel recoverable errors have occurred
0010         A VAXELN master fatal error has occurred
0011         A VAXELN job fatal error has occurred
0012         Not enough SPTEs could be allocated to boot OpenVMS
0013         Unexpected system error occurred¹
0014         Interface module failure has occurred
0015         Unexpected VAXELN error occurred
0016         A VAXELN kernel fatal error has occurred
0017         Initializing VAXELN before starting reconfiguration

¹ Reset reason 0013 indicates that an unexpected system error occurred. The contents of the SYSFLT,
SYSADR, and DMAADR registers will be saved in the CCA area. See Figure A–4 for the CCA offsets
of these registers. Use the register bitmaps and description in Section A.4 to determine the cause of
the error.

A.4 Error Register Descriptions
A.4.1 System Fault (SYSFLT) Register
This register is not rail or zone unique (Figure A–1). Software does not take
special precautions when reading this register. In addition, the register is
continuously updated. The setting of one error bit does not prevent other bits
from being set. The register contains bits which cause IPL29 interrupts.
All bits in this register have the following characteristics: default = 0, type = ro,
reset = hr.
Figure A–1 System Fault Register
[Figure: 32-bit register layout. Fields from bit 31 down to bit 00: SFB, XLM,
LCK, RSA, CBG, PWG, CPB, CPA, HTB, HTA, MFB, MFA, MDB, MDA, MSB, MSA, JDB,
JDA, JSB, JSA, NXB, NXA, IOB, IOA, DNB, DNA, DMB, DMA.]

Register Address: CPU = E110 1100 (CCA offset = 15C)
[31]: SFB - Solid Fault Bit. Latched when an automatic retry on an I/O
operation fails to complete properly.
[30:28]: XLM - Xlink Mode [2:0]. This field, sourced by the Xlink, is read-only
and indicates the Xlink mode specified in Table A–4.
Table A–4 Xlink Mode Coding
Code    Mode
000     Xlink Off
001     Xlink Slave
010     Xlink Master
011     Xlink Duplex
100     Not Used
101     Resync Slave
110     Resync Master
111     Not Used

[27:26]: - Not used.
[25]: LCK - Lock. Latched when an error occurs during an interlock I/O access.
(Interlock access refers to the special I/O access mode.)
[24]: RSA - Resync Abort. Latched when an error occurs during resync mode.
Resync mode is automatically canceled.
[23]: CBG - Cable Gone. Latched when a cable gone signal is detected. CBG set
will force the Xlink to the off mode.
[22]: PWG - Power Gone. Set when the other zone power gone signal is detected.
PWG set will force the Xlink to the off mode.
[21]: CPB - Clock Phase Error (Zone B). Latches a high level assertion on the
Clock Phase Error line coming from the Xlink. The high level will remain until a
1 is written to the bit. If the Clock Phase Error signal line is still high after the
write 1 to clear, the bit is again set to 1.
[20]: CPA - Clock Phase Error (Zone A). Latches a high level assertion on the
Clock Phase Error line coming from the Xlink. The high level will remain until a
1 is written to the bit. If the Clock Phase Error signal line is still high after the
write 1 to clear, the bit is again set to 1.
[19]: HTB - Halt Error (Zone B). Latches a high level assertion on the Halt
Request line coming from the Xlink. The high level will remain until a 1 is
written to the bit. If the Halt Error signal line is still high after the write 1 to
clear, the bit is again set to 1.
[18]: HTA - Halt Error (Zone A). Latches a high level assertion on the Halt
Request line coming from the Xlink. The high level will remain until a 1 is
written to the bit. If the Halt Error signal line is still high after the write 1 to
clear, the bit is again set to 1.
[17]: MFB - CPMF (Zone B). Set when the error logic determines that a CPMF
is required.
[16]: MFA - CPMF (Zone A). Set when the error logic determines that a CPMF
is required.
[15]: MDB - Memory Double-Bit Error (Zone B). Set when a double-bit ECC
error or single-bit ECC error is detected during memory writes on the internal
Jet Bus ECC checker. This causes a CPMF.
[14]: MDA - Memory Double-Bit Error (Zone A). Set when a double-bit ECC error
or single-bit ECC error is detected during memory writes on the internal Jet Bus
ECC checker. This causes a CPMF.
[13]: MSB - Memory Single-Bit Error (Zone B). Set when a single-bit ECC error
is detected in memory during a read and the JXD was not the requester of the
data. The bit is set regardless of the state of the Error Enable bit. The error
is automatically corrected at the CPU. An IPL26 interrupt is generated causing
a two-zone system to diverge. Hardware generates an IPL29 interrupt to both
zones within three clock cycles.
[12]: MSA - Memory Single-Bit Error (Zone A). Set when a single-bit ECC error
is detected in memory during a read and the JXD was not the requester of the
data. The bit is set regardless of the state of the Error Enable bit. The error
is automatically corrected at the CPU. An IPL26 interrupt is generated causing
a two-zone system to diverge. Hardware generates an IPL29 interrupt to both
zones within three clock cycles.
[11]: JDB - JXD Double-Bit Error (Zone B). Set when a double-bit ECC error is
detected on the internal Jet Bus ECC checker.
[10]: JDA - JXD Double-Bit Error (Zone A). Set when a double-bit ECC error is
detected on the internal Jet Bus ECC checker.
[09]: JSB - JXD Single-Bit Error (Zone B). Set when a single-bit ECC error is
detected on the internal Jet Bus ECC checker and is detected in memory. The
check operation is triggered during Jet Bus transactions. The bit is set regardless
of the state of the Error Enable bit. The error is automatically corrected on JXD
reads from memory. Detection of this error causes the current DMA address
to be latched. The DMA operation is allowed to complete. When finished, the
DMA driver will check this bit, and if set will force a mini resync by reading the
location pointed to by the DMA Error Address register.
[08]: JSA - JXD Single-Bit Error (Zone A). Set when a single-bit ECC error
is detected on the internal Jet Bus ECC checker and is detected in memory.
The check operation is only triggered during Jet Bus transactions. The bit is
set regardless of the state of the Error Enable bit. The error is automatically
corrected on JXD reads from memory. Detection of this error causes the current
DMA address to be latched. The DMA operation is allowed to complete. When
finished, the DMA driver will check this bit, and if set will force a mini resync by
reading the location pointed to by the DMA Error Address register.
[07]: NXB - Nonexistent I/O (Zone B). Set after any bus timeout. If the retry
passes, the Solid Fault bit will not be set.
[06]: NXA - Nonexistent I/O (Zone A). Set after any bus timeout. If the retry
passes, the Solid Fault bit will not be set.
[05]: IOB - I/O Error (Zone B). Set by errors that occur from nonfatal or
recoverable CPU initiated transactions. Errors resulting from CPU to I/O
transactions are retried.
[04]: IOA - I/O Error (Zone A). Set by errors that occur from nonfatal or
recoverable CPU initiated transactions. Errors resulting from CPU to I/O
transactions are retried.
[03]: DNB - DMA NXIO (Zone B). Set when a bus timeout occurs and the
CROME bus is performing a DMA operation.
[02]: DNA - DMA NXIO (Zone A). Set when a bus timeout occurs and the
CROME bus is performing a DMA operation.
[01]: DMB - DMA Error (Zone B). Set by DMA errors. If the bit is set, the DMA
is aborted. A DMA error may generate a CPMF.
[00]: DMA - DMA Error (Zone A). Set by DMA errors. If the bit is set, the DMA
is aborted. A DMA error may generate a CPMF.
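
The bit assignments above can be applied mechanically to a saved SYSFLT value. The following Python sketch is a hypothetical decoding aid, not a Digital-supplied tool: the names SYSFLT_FLAGS, XLINK_MODES, and decode_sysflt are introduced here for illustration. It returns the Xlink mode from bits [30:28] (Table A–4) and the mnemonics of all single-bit fields that are set.

```python
# Single-bit fields of the SYSFLT register, keyed by bit position.
SYSFLT_FLAGS = {
    31: "SFB", 25: "LCK", 24: "RSA", 23: "CBG", 22: "PWG",
    21: "CPB", 20: "CPA", 19: "HTB", 18: "HTA", 17: "MFB", 16: "MFA",
    15: "MDB", 14: "MDA", 13: "MSB", 12: "MSA", 11: "JDB", 10: "JDA",
    9: "JSB", 8: "JSA", 7: "NXB", 6: "NXA", 5: "IOB", 4: "IOA",
    3: "DNB", 2: "DNA", 1: "DMB", 0: "DMA",
}

# Xlink mode coding from Table A-4, held in bits [30:28].
XLINK_MODES = {
    0b000: "Xlink Off", 0b001: "Xlink Slave", 0b010: "Xlink Master",
    0b011: "Xlink Duplex", 0b100: "Not Used", 0b101: "Resync Slave",
    0b110: "Resync Master", 0b111: "Not Used",
}


def decode_sysflt(value):
    """Return (xlink_mode, [mnemonics of set bits, highest bit first])."""
    mode = XLINK_MODES[(value >> 28) & 0b111]
    flags = [name
             for bit, name in sorted(SYSFLT_FLAGS.items(), reverse=True)
             if value & (1 << bit)]
    return mode, flags


# 300000C0 is the SYSFLT value examined in Section A.4.4.
mode, flags = decode_sysflt(0x300000C0)
print(mode, flags)
```

For the value 300000C0 this yields Xlink Duplex with NXB and NXA set, which agrees with the annotations in the Section A.4.4 example.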


A.4.2 System Error Address (SYSADR) Register
This register latches when any error is detected at the JXD Jet Bus and below
(Figure A–2). It contains the address the CPU was accessing at the time the
error occurred. The register is read only and cleared by clearing errors.
All bits in this register have the following characteristics: default = 0, type = ro,
reset = hr.
Figure A–2 JXD System Error Address Register
[Figure: 32-bit register layout showing the ADR field.]

Register Address: CPU = E110 1030 (CCA_BASE+160)
[31:30]: DL - Data length:
00 - Hexword
01 - Longword
10 - Quadword
11 - Octaword
[29:00]: ADR - 30-bit error address latched on CPU operations to the JXD.
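
A minimal Python sketch of splitting a SYSADR value into its two fields follows. The names DATA_LENGTHS and decode_sysadr are hypothetical and exist only for this example. Because the DMAADR register is described with the same [31:30] and [29:00] layout, the same split should apply to it.

```python
# Data length coding from bits [31:30] of the register.
DATA_LENGTHS = {
    0b00: "Hexword",
    0b01: "Longword",
    0b10: "Quadword",
    0b11: "Octaword",
}


def decode_sysadr(value):
    """Return (data_length, 30-bit address) from a 32-bit SYSADR value."""
    data_length = DATA_LENGTHS[(value >> 30) & 0b11]
    address = value & 0x3FFFFFFF  # keep bits [29:00] only
    return data_length, address


# 799F0000 is the SYSADR value examined in Section A.4.4.
length, address = decode_sysadr(0x799F0000)
print(length, f"{address:08X}")
```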

A.4.3 DMA Error Address (DMAADR) Register
When a single-bit ECC error is detected at the JXD, the current DMA
subtransfer address into main memory is latched in this register and an IPL29
interrupt is generated. Software allows the DMA to complete and later uses this
information to fix the bad location in memory (Figure A–3).
All bits in this register have the following characteristics: default = 0, type = ro,
reset = hr.
Figure A–3 JXD DMA Error Address Register
[Figure: 32-bit register layout showing the DEA field.]

Register Address: CPU = E110 1040 (CCA_BASE+180)
[31:30]: DL - DMA data length:
00 - Hexword
01 - Longword
10 - Quadword
11 - Octaword
[29:00]: DEA - DMA 30-bit address latched during error.

A.4.4 Reset Reason 0013 Fault Analysis
The following example shows the content of the SYSFLT and SYSADR registers
after a Reset Halt. The following paragraph analyzes the register content and
identifies the faulty FRU.
?03 Reset (Reason = 0013)
PC= 01E00000 PSL= 041F0300
>>> E/P 1E9AD5C            ! examine saved SYSFLT register contents
                           ! from CCA_BASE+15C
P 01E9AD5C 300000C0        ! NXIO, Zone A (bus timeout)
                           ! NXIO, Zone B (bus timeout)
                           ! XLINK MODE = Duplex
>>> E/P 1E9AD60            ! examine saved SYSADR register contents
                           ! from CCA_BASE+160
P 01E9AD60 799F0000        ! Zone B, slot 17 P-card address

CCA Base Address
MEMORY SIZE    CCA_BASE
-----------    --------
32-Mbyte       1E9AC00
64-Mbyte       3E9AC00
96-Mbyte       5E9AC00
128-Mbyte      7E9AC00
160-Mbyte      9E9AC00
192-Mbyte      BE9AC00
224-Mbyte      DE9AC00
256-Mbyte      FE9AC00

The SYSFLT register indicates a NXIO (nonexistent I/O) error. The SYSADR
register contains a 30-bit address of 399F0000. When this address is
sign-extended to 32 bits, it becomes F99F0000.
Figure A–4 shows that F99F0000 is the address of an interface module in Zone
B, slot 17. The module failed to respond to its address causing a bus timeout.
Replace the module.
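
The address arithmetic in this analysis can be sketched in Python as follows. The CCA_BASE values and the offsets 15C and 160 come from the example above; the function names saved_register_addresses and sign_extend_30 are hypothetical helpers introduced for illustration.

```python
# CCA_BASE values from the CCA Base Address table, keyed by memory size in Mbytes.
CCA_BASE = {
    32: 0x1E9AC00, 64: 0x3E9AC00, 96: 0x5E9AC00, 128: 0x7E9AC00,
    160: 0x9E9AC00, 192: 0xBE9AC00, 224: 0xDE9AC00, 256: 0xFE9AC00,
}

SYSFLT_OFFSET = 0x15C  # saved SYSFLT contents are at CCA_BASE+15C
SYSADR_OFFSET = 0x160  # saved SYSADR contents are at CCA_BASE+160


def saved_register_addresses(memory_mbytes):
    """Return the physical addresses to examine for the saved SYSFLT and SYSADR."""
    base = CCA_BASE[memory_mbytes]
    return base + SYSFLT_OFFSET, base + SYSADR_OFFSET


def sign_extend_30(address):
    """Sign-extend a 30-bit address to 32 bits by copying bit 29 into bits [31:30]."""
    address &= 0x3FFFFFFF
    if address & (1 << 29):
        address |= 0xC0000000
    return address


sysflt_at, sysadr_at = saved_register_addresses(32)
print(f"E/P {sysflt_at:X}", f"E/P {sysadr_at:X}")
print(f"{sign_extend_30(0x399F0000):08X}")
```

For a 32-Mbyte system this gives the two examine addresses used in the example (1E9AD5C and 1E9AD60), and the 30-bit address 399F0000 extends to F99F0000.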

A.5 I/O Physical Address Space
Figure A–4 shows the I/O physical address space.

Figure A–4 I/O Physical Address Space

Address Range              Contents
0000 0000 to 1FFF FFFF     Main Memory (512 Mbytes, 30-bit; current VMS
                           addressable limit)
2000 0000 to 3FFF FFFF     Main Memory (512 Mbytes, 32-bit; supported by a
                           later VMS release)
4000 0000                  Unsupported Memory (1 Gbyte)
8000 0000                  B Cache Tags (1 Gbyte)
C000 0000                  Unsupported Memory (512 Mbytes)
E000 0000 to FFFF FFFF     I/O Space (512 Mbytes)

I/O Space Detail
Address Range              Contents
E000 0000 to EFFF FFFF     CPU Private Space
  E110 1030                  SYSADR Register (CCA offset = 160)
  E110 1040                  DMAADR Register (CCA offset = 180)
  E110 1100                  SYSFLT Register (CCA offset = 15C)
F000 0000                  Reserved for Zone A (M=0)
F100 0000                  Zone A I/O ATM, Slot 1 (M=1)
  F198 0000                  Zone A ATM Pcard, Slot 10 (*P=8)
  F199 0000                  Zone A ATM Pcard, Slot 11 (*P=9)
  F19A 0000                  Zone A ATM Pcard, Slot 12 (*P=A)
  F19B 0000                  Zone A ATM Pcard, Slot 13 (*P=B)
  F19C 0000                  Zone A ATM Pcard, Slot 14 (*P=C)
  F19D 0000                  Zone A ATM Pcard, Slot 15 (*P=D)
  F19E 0000                  Zone A ATM Pcard, Slot 16 (*P=E)
  F19F 0000                  Zone A ATM Pcard, Slot 17 (*P=F)
  F1A0 0000 to F1AF FFFF     Zone A I/O ATM Firewall Space
  F1B0 0000 to F1FF FFFF     Zone A I/O RAM/Flash ROM
F200 0000 to F2FF FFFF     Reserved for Zone A future I/O, Slot 2 (M=2)
FM00 0000 to FMAF FFFF     Unsupported Zone A I/O (M=3 to 7)
F800 0000                  Reserved for Zone B (M=8)
F900 0000                  Zone B I/O ATM, Slot 1 (M=9)
  F998 0000                  Zone B ATM Pcard, Slot 10 (*P=8)
  F999 0000                  Zone B ATM Pcard, Slot 11 (*P=9)
  F99A 0000                  Zone B ATM Pcard, Slot 12 (*P=A)
  F99B 0000                  Zone B ATM Pcard, Slot 13 (*P=B)
  F99C 0000                  Zone B ATM Pcard, Slot 14 (*P=C)
  F99D 0000                  Zone B ATM Pcard, Slot 15 (*P=D)
  F99E 0000                  Zone B ATM Pcard, Slot 16 (*P=E)
  F99F 0000                  Zone B ATM Pcard, Slot 17 (*P=F)
  F9A0 0000 to F9AF FFFF     Zone B I/O ATM Firewall Space
  F9B0 0000 to F9FF FFFF     Zone B I/O RAM/Flash ROM
FA00 0000 to FAFF FFFF     Reserved for Zone B future I/O, Slot 2 (M=A)
FM00 0000 to FMFF FFFF     Unsupported Zone B I/O (M=B to F)
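
The Pcard addresses in Figure A–4 follow a regular pattern, which the following Python sketch uses to map a 32-bit I/O address to a zone and slot. The decoding rule is an assumption inferred from the addresses listed in the figure (M in bits [27:24], the digit 9 in bits [23:20], P in bits [19:16], and slot = P + 2); it is not stated as a rule in this manual, and decode_io_address is a hypothetical name.

```python
def decode_io_address(address):
    """Return (zone, m, slot) for a 32-bit zone I/O address.

    slot is None when the address is not in a Pcard range listed in Figure A-4.
    """
    if (address >> 28) != 0xF:
        raise ValueError("address is below F000 0000, not zone I/O space")
    m = (address >> 24) & 0xF
    zone = "B" if m >= 8 else "A"  # M=0 to 7 is Zone A, M=8 to F is Zone B
    slot = None
    # Pcard space is listed only under the I/O ATM in Slot 1 (M=1 and M=9).
    if m in (0x1, 0x9) and ((address >> 20) & 0xF) == 0x9:
        p = (address >> 16) & 0xF
        if p >= 0x8:               # P=8 to F correspond to slots 10 to 17
            slot = p + 2
    return zone, m, slot


print(decode_io_address(0xF99F0000))
```

Applied to F99F0000, the address from the Section A.4.4 example, the sketch reports Zone B, M=9, slot 17.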


A.6 System Control Block Description
The System Control Block (SCB) contains vectors for servicing interrupts and
exceptions. The SCB address should be aligned on a page boundary. The
SCB address is contained in the System Control Block Base register (SCBB)
(Figure A–5). Microcode forces a longword-aligned SCBB by clearing bits [01:00]
of the new value before loading the register.
Figure A–5 System Control Block Base Register
[Figure: 32-bit register layout. Fields from bit 31 down: Physical Page
Address of SCB, SBZ.]

An SCB vector is an aligned longword in the SCB through which the CPU
microcode dispatches interrupts and exceptions. Each SCB vector has the format
shown in Figure A–6.
Figure A–6 System Control Block Vector Format
[Figure: 32-bit longword layout. Fields from bit 31 down: Longword Address of
Service Routine, Code.]

[31:02]: Longword Address - Virtual address of the service routine for the
interrupt or exception. The routine must be longword aligned since the microcode
forces the two low-order bits to 0.
[01:00]: Code - The code field is defined in Table A–5.
Table A–5 Code Field Definition
Code      Definition
00        The event is to be serviced on the kernel stack unless the CPU is
          already on the interrupt stack, in which case the event is serviced
          on the interrupt stack.
01        The event is to be serviced on the interrupt stack. If the event is
          an exception, the IPL is raised to 1F (hex).
10, 11    Unimplemented, results in a console error halt.
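
The vector format can be illustrated with a short Python sketch that separates an SCB vector longword into the service routine address and the code field. The name decode_scb_vector and the sample value 80001235 are hypothetical. The pairing of code values 00 and 01 with the first two definitions of Table A–5 is an assumption based on the usual VAX convention.

```python
# Meaning of the code field, bits [01:00]. The 00/01 pairing is an assumption
# based on the usual VAX convention; any other value is treated as unimplemented.
CODE_MEANINGS = {
    0b00: "kernel stack (interrupt stack if already on it)",
    0b01: "interrupt stack",
}


def decode_scb_vector(vector):
    """Return (service_routine_address, code, meaning) for an SCB vector."""
    routine_address = vector & 0xFFFFFFFC  # low-order two bits are forced to 0
    code = vector & 0b11
    meaning = CODE_MEANINGS.get(code, "unimplemented (console error halt)")
    return routine_address, code, meaning


address, code, meaning = decode_scb_vector(0x80001235)
print(f"{address:08X}", code, meaning)
```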

The SCB content is specified in Table A–6.
Table A–6 SCB Layout
Vector
(Hex)        Name                        Type         Parameters and Notes
00           Unused                      -
04           Machine check               Abort        Parameters reflect machine
                                                      state; must be serviced on
                                                      the interrupt stack
08           Unused                      -
0C           Unused                      -
10           Reserved privileged         Fault        -
             instruction
14           Customer reserved           Fault        XFC instruction
             instruction
18           Reserved operand            Fault/abort  Not always recoverable
1C           Reserved addressing mode    Fault        -
20           Access control violation/   Fault        Parameters are virtual
             vector alignment fault                   address and status code
24           Translation not valid       Fault        Parameters are virtual
                                                      address and status code
28           Trace pending               Fault        -
2C           Breakpoint instruction      Fault        -
30           Unused                      -            Compatibility mode in
                                                      other VAX systems
34           Arithmetic trap             Fault        Parameter is type code
38 to 3C     Unused                      -
40           CHMK                        Trap         Parameter is sign-extended
                                                      operand word
44           CHME                        Trap         Parameter is sign-extended
                                                      operand word
48           CHMS                        Trap         Parameter is sign-extended
                                                      operand word
4C           CHMU                        Trap         Parameter is sign-extended
                                                      operand word
50           Unused                      -
54           Soft error notification     Interrupt    IPL is 1A (hex)
58 to 5C     Unused                      -
60           Hard error notification     Interrupt    IPL is 1D (hex)
64           Unused                      -
68           Vector unit disabled        Fault        Vector instructions
6C to 80     Unused                      -
84           Software level 1            Interrupt
88           Software level 2            Interrupt    Ordinarily used for AST
                                                      delivery
8C           Software level 3            Interrupt    Ordinarily used for
                                                      process scheduling
90 to BC     Software levels 4 to 15     Interrupt    -
C0           Interval timer              Interrupt    IPL is 16 (hex)
C4           Unused                      -
C8           Emulation start             Fault        Same mode exception,
                                                      FPD=0; parameters are
                                                      opcode, PC, specifiers
CC           Emulation continue          Fault        Same mode exception,
                                                      FPD=1; parameters are
                                                      opcode, PC, specifiers
D0           Device vector               Interrupt    IPL is 14 (hex)
D4           Device vector               Interrupt    IPL is 15 (hex), includes
                                                      console interrupts
D8           Device vector               Interrupt    IPL is 16 (hex), includes
                                                      interprocessor interrupts
DC           Device vector               Interrupt    IPL is 17 (hex)
E0 to F4     Unused                      -
F8 to FC     Unused                      -
100 to FFCC  Unused                      -
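
Table A–6 places software levels 4 to 15 at vectors 90 to BC, one longword per level. A minimal Python sketch of that relationship follows, on the assumption that levels 1 to 3 occupy the three longwords immediately below (84, 88, and 8C). The function names are hypothetical, and the SCBB value in the usage line is an arbitrary illustration.

```python
def software_interrupt_vector(level):
    """Return the SCB vector offset for software interrupt levels 1 to 15."""
    if not 1 <= level <= 15:
        raise ValueError("software interrupt level must be 1 to 15")
    return 0x80 + 4 * level  # one longword per level


def vector_location(scbb, offset):
    """Return the physical address of a vector, given the SCBB contents."""
    return (scbb & 0xFFFFFFFC) + offset  # SCBB bits [01:00] are cleared


for level in (1, 4, 15):
    print(level, f"{software_interrupt_vector(level):02X}")
print(f"{vector_location(0x00012000, software_interrupt_vector(2)):08X}")
```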


B
ISE Parameter Worksheets
B.1 In This Appendix
This appendix includes:
•

Individual ISE parameter worksheets

•

ISE zone parameter worksheets

B.2 Individual ISE Parameter Worksheets
Use the following worksheets to record parameters for each ISE.

Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:

Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:

Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:

Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:

B.3 ISE Zone Parameter Worksheets
Use the following worksheets to record parameters for each ISE.

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

Serial No:

NODENAME:

UNITNUM:

UNITNUM:

Index
A
Application of thresholds, 4–17
ATM module
removal and replacement, 5–7
ATM module deconfiguration actions, 4–13

B
Before you begin, 5–3
Boot parameter block
data structures, 4–60
Bootstrap procedures, 2–7, 2–8

C
CAMP module
removal and replacement, 5–24
CCA fields
firmware interfaces, 4–53
CIO mode console commands
BOOT, 2–7
CIO mode, entering, 2–8
Console
command language syntax, 2–6
control characters, 2–5
description, 2–1, 2–3
entering console mode, 2–4
exiting console mode, 2–4
operating modes, 2–3, 2–4
operations, 2–1
Console commands, 2–22
BOOT, 2–9
CLEAR, 2–10
! (comment), 2–22
CONTINUE, 2–11
DUP, 2–13
EXAMINE, 2–13
FIND, 2–15
HELP, 2–15
INITIALIZE, 2–16
MATCH_ZONES, 2–16
MOVE, 2–16
REPEAT, 2–17
SET, 2–17
SET BOOT DEFAULT, 2–18
SHOW, 2–18

START, 2–19
TEST, 2–20, 3–30
X, 2–21
Z, 2–22, 3–31
Console communications area data structures,
4–55
Console extender module
removal and replacement, 5–20
Controls and indicators
disk drawer, 3–19
CPU and expansion cabinets
system component descriptions, 1–1
CPU and memory deconfiguration actions, 4–14
CPU module
removal and replacement, 5–7
CPU module subDCB
data structures, 4–64
CPU or zone unsynchable error log entry, 4–72
CPU ROM-based diagnostics
system diagnostics, 3–31
CPU/MEM fault end action error log entry, 4–69
CPU/MEM fault error log entry, 4–66
Cross-link assembly
removal and replacement, 5–18
Cross-link cable deconfiguration actions, 4–16

D
Deconfiguration information block, 4–24
Deconfiguration messages, 4–49
Device configuration block
data structures, 4–61
Device fault indicators, 3–19
Device status indicators, 3–19
DIM
removal and replacement, 5–26
Disk drawer
controls and indicators, 3–19
Disk drives
RF35 disk drawer, 3–19
SF35-BK/HK/JK, 3–21
SF73-HK/JK, 3–24
Dispatch block description
data structures, 4–59


Documentation road map, iii
DSSI cable
removal and replacement, 5–29
DSSI disk drawer
removal and replacement, 5–14
DSSI extender module
removal and replacement, 5–22
DSSI interface module
removal and replacement, 5–26
DUP, 6–1
PARAMS utility, 6–1
SET HOST, 6–1
Duplex compatibility test, 4–57

E
EHS, 4–1
EHS structure, 4–3
EIM
removal and replacement, 5–28
Eject button
unload function, 3–28
End action timeouts, 4–29
End actions, 4–28
Error event messages, 4–40
Error handling services (EHS), 4–1
Error isolation and handling, 4–2
Error log analysis, 4–66
Error register descriptions, A–4
DMA error address register, A–7
system error address register, A–7
system fault register, A–4
Error types, 4–5
ESD procedures, 5–4
Ethernet interface module
removal and replacement, 5–28
Event reporting interface routines, 4–40

F
Fan
removal and replacement, 5–10
Fault data, 4–27
Fault summary, 4–20
FCSB
removal and replacement, 5–10
FEU
removal and replacement, 5–16
Firmware and OpenVMS interface data structures,
4–54
Firmware interfaces, 4–50
FRU deconfiguration, 4–13
FRU handling, 5–4
FRU information, 4–22
FRU isolation, 4–12
FRU list, 5–1


FRUs, 4–12
access, 5–5
FTSS event reporting interface, 4–40

G
General troubleshooting procedure
system maintenance, 3–4

H
Halt codes
console halt codes, A–3
processor halt codes, A–1

I
I/O expansion module console and diagnostics
firmware interfaces, 4–53
I/O expansion module deconfiguration actions,
4–14
I/O physical address space, A–8
I/O ROM-based diagnostics
system diagnostics, 3–34
Interface module deconfiguration actions, 4–15
ISE, 6–1
finding parameter values, 6–5
individual parameter worksheet, B–1
installing new, 6–11
parameters, 6–4
replacing, 6–8
setting, 6–5
removal, 6–7
system parameter worksheet, B–3

L
Load/Unload button
reset function, 3–29

M
Maintenance strategy
system maintenance, 3–1
MMB
removal and replacement, 5–9
Module fault LEDs
system maintenance, 3–6
Module NVRAM status and LED indicators, 4–38

O
OpenVMS error log, 4–19
Operating rules and cautions
system maintenance, 3–2

P
Page frame number bitmap
data structures, 4–65
POST, 3–27
Power distribution box
removal and replacement, 5–42
Power distribution boxes
system component descriptions, 1–9
Power modules, 3–12
system component descriptions, 1–8
Power system maintenance, 3–12
Power system overview
system maintenance, 3–7
Power-on, 3–27
Power-on self-test (POST)
status of OCP indicators, 3–27
PSC
removal and replacement, 5–16

R
Removal and replacement
ATM module, 5–7
CAMP module, 5–24
console extender module, 5–20
CPU module, 5–7
cross-link assembly, 5–18
DIM, 5–26
DSSI cable, 5–29
DSSI disk drawer, 5–14
DSSI extender module, 5–22
DSSI interface module, 5–26
EIM, 5–28
Ethernet interface module, 5–28
fan, 5–10
FCSB, 5–10
FEU, 5–16
MMB, 5–9
power distribution box, 5–42
PSC, 5–16
RF35 disk drive, 5–12
SF35 storage array, 5–36
SF73 disk drive, 5–32
SIMM, 5–8
TF857-CA tape drive, 5–39
TF85C-BA tape drive, 5–30
5V regulator, 5–16
3.3V regulator, 5–16
zone control panel, 5–14
Reset
Load/Unload button, 3–29
Reset reason fault analysis
error register descriptions, A–8
RF35 disk drawer
disk drives, 3–19

RF35 disk drive
removal and replacement, 5–12
ROM-based diagnostics
system diagnostics, 3–29

S
SCB description, A–10
Server setup switch, 6–2
Services
error handling, 4–1
SET HOST, 6–1
SF35 storage array
removal and replacement, 5–36
SF35-BK/HK/JK storage array
disk drives, 3–21
SF73 disk drive
removal and replacement, 5–32
SF73-HK/JK storage array
disk drives, 3–24
Shutting down a zone, 5–4
SIMM
removal and replacement, 5–8
Software detected errors
fault data, 4–34
Starting up a zone, 5–5
Sub-device configuration block
data structures, 4–63
System console and diagnostics
firmware interfaces, 4–50
System control block description, A–10
System operating modes, 4–4
System registers
fault data, 4–27
System resets
firmware interfaces, 4–51

T
Tape devices
TF857 tape loader, 3–27
TF857 tape loader controls and indicators,
3–27
TF85C tape drive, 3–26
TEST command
system diagnostics, 3–30
TF857 tape loader controls and indicators
tape devices, 3–27
TF857-AA tape loader
operating procedures, 3–27
TF857-CA tape drive
removal and replacement, 5–39
TF85C tape drive
tape devices, 3–26
TF85C-BA tape drive
removal and replacement, 5–30

Threshold information block, 4–26
TK85C-BA cartridge tape drive indicators, 3–27

U
Unit number assignment, 6–2
Unsynchable events
fault data, 4–36

V
5V regulator
removal and replacement, 5–16
3.3V regulator
removal and replacement, 5–16
VAXELN detected errors
fault data, 4–30
VAXELN error handling, 4–10

W
Warm swapping, 6–3

Z
Z command
system diagnostics, 3–31
Zone control panel
removal and replacement, 5–14
system component descriptions, 1–6
Zone deconfiguration actions, 4–16