Order Number: EC-0100A-TE

This document, the "Compiler Writer's Guide for the 21264/21364," gives writers of compilers and other programs detailed guidance on optimizing software for Compaq's Alpha 21264 and 21364 microprocessors. It is one of three essential resources, alongside the Alpha Architecture Reference Manual and the processor-specific hardware reference manuals.

The guide first introduces the Alpha architecture as a 64-bit, load/store RISC design focused on high performance and multiple instruction issue. It then details the hardware features common to the 21264 and 21364, both of which are superscalar pipelined processors. Key architectural aspects include their seven-stage pipelines, on-chip memory management (Data and Instruction Translation Buffers), caches (I-cache, D-cache), multiple execution units (Integer Ebox, Floating-Point Fbox), and dynamic features such as register renaming and instruction issue/retire rules.

The core of the document lies in its "Guidelines for Compiler Writers," which advise on leveraging these hardware features for optimal performance. This includes:

Instruction and Data Alignment: Recommending octaword alignment for branch targets and natural alignment for data, with advice for unaligned operations.
Control Flow Optimization: Strategies such as laying out code so that the common path falls through, ensuring a single successor per octaword, and, most importantly, eliminating branches with CMOV (conditional move) or logical instructions to reduce branch mispredictions and improve instruction fetch efficiency.
SIMD Parallelism: Encouraging the use of MVI instructions for single instruction, multiple data (SIMD) style operations in registers.
Prefetching: Detailing different prefetch instructions (e.g., PREFETCH, WH64) and providing guidance on optimal prefetch distances specific to the 21264 and 21364.
Avoiding Replay Traps: A significant section dedicated to understanding and preventing performance-costly hardware replay traps (e.g., load-load order, store-load order, wrong-size, load-miss load, store queue overflow) by intelligent register usage, memory access patterns, and instruction choices.
Instruction Scheduling: General advice on software pipelining for loops, scheduling for specific functional unit latencies, and optimizing physical register allocation based on detailed pipeline modeling.
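
As an illustration of the branch-elimination guideline above, here is a minimal C sketch. It is not taken from the guide; the function names are hypothetical, and whether a given compiler actually emits a conditional move or logical instructions for these shapes depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: the data-dependent test may compile to one
   conditional branch per element. */
int64_t sum_positive_branchy(const int64_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0)
            sum += v[i];
    }
    return sum;
}

/* Branch-free form using logical instructions: the comparison result
   (0 or 1) is turned into an all-zeros or all-ones mask and ANDed in. */
int64_t sum_positive_branchless(const int64_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] > 0); /* -1 if positive, else 0 */
        sum += v[i] & mask;
    }
    return sum;
}

/* Simple select: a shape that compilers commonly lower to a
   conditional move rather than a branch (assumption, not guaranteed). */
int64_t select_max(int64_t a, int64_t b)
{
    return (a > b) ? a : b;
}
```

Both summation functions return the same result for the same input; the second simply has no data-dependent branch inside the loop body for the hardware to predict.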

Appendices provide further specifics, including a summary of instruction slotting rules, a detailed example of an optimized checksum inner loop schedule demonstrating SIMD and scheduling, and conformance details for IEEE floating-point operations.

January 2002

79 pages

Quality

Original

0.3MB
