Jupiter Performance

Order Number: XX-4C496-6D

This comprehensive set of interoffice memoranda details a proposed performance modeling project for the Jupiter CPU, which is currently underperforming original estimates. The ultimate goal is for Jupiter to achieve 2.3 times the performance of a KL10, specifically for general timesharing workloads, necessitating a modeling target of 2.5x KL10 to account for model optimism.

The documents identify two primary performance concerns:

  1. Inefficiency of "Slow" Instructions: A simple performance model revealed that a quarter (25%) of executed instructions account for over 50% of total execution time because of their low Jupiter-to-KL10 speed ratio (e.g., Class 1 instructions running at less than 3x KL10 speed). Key "slow" instruction classes identified are BLT, byte operations (LDB, IDPB), floating point operations, PUSHJ/POPJ, string operations (MOVSLJ, CVTxxx), and XCT. Optimizing these slow instructions, even though they execute less often, has a much greater impact on overall system performance than optimizing already fast instructions. Instructions with a Jupiter/KL10 ratio of 3x or greater are deemed not worth optimizing at this stage.
  2. Poor Performance with Extended Addressing, especially Indirection: The 2080 architecture's handling of extended addressing, particularly indirect addressing, significantly degrades performance. Indirection can be 5x slower than indexing on Jupiter, potentially causing a 27-45% throughput loss. Existing software (TOPS-20, the monitor, the languages) relies heavily on indirection. The IBOX (instruction box) is not equipped to efficiently decode indirect references, defeating the advantages of prefetch, and the page cache design (1-way associative, i.e., direct-mapped) is also a concern.
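The weight of the slow classes follows from the kind of simple model the memos describe. The sketch below uses purely hypothetical class frequencies and speed ratios (not the memos' OPHIST data) to show how a minority of slow instructions can dominate execution time:

```python
# Toy version of the simple performance model: each instruction class has a
# dynamic frequency and a Jupiter/KL10 speed ratio. Taking KL10 time as 1 unit
# per instruction, Jupiter time for a class is proportional to freq / ratio.
# All numbers below are illustrative assumptions, not figures from the memos.
classes = {
    "fast (MOVE, ADD, ...)":  {"freq": 0.75, "ratio": 4.0},  # >= 3x KL10
    "slow (BLT, bytes, ...)": {"freq": 0.25, "ratio": 1.2},  # <  3x KL10
}

jupiter_time = {name: c["freq"] / c["ratio"] for name, c in classes.items()}
total = sum(jupiter_time.values())

for name, t in jupiter_time.items():
    print(f"{name}: {t / total:.0%} of Jupiter execution time")

# Overall Jupiter/KL10 ratio = total KL10 time / total Jupiter time:
print(f"overall Jupiter/KL10 ratio: {1.0 / total:.2f}x")
```

With these made-up numbers, the 25% "slow" class consumes over half the machine's time, and the overall ratio lands near the 2.5x modeling target, which is exactly the sensitivity the memos exploit.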

To address these issues and ensure predictable performance, a performance modeling project is proposed, broken into two main phases:

Phase 1: Initial, Low-Risk Performance Improvements (Immediate/Parallel)

  • Goal: Implement "simple" microcode and hardware changes to speed up identified bottleneck instructions (EA-calc, byte pointer decode, BLT/XBLT, PUSHJ, XCT).
  • Methodology: Evaluate and design changes, implement them in microcode and the LISP simulator, measure individual instruction improvements, and predict overall CPU performance.
  • Expected Benefit: Preliminary investigations suggest up to a 30% performance gain at relatively low cost.

Phase 2: Data Gathering, Analysis, and Deeper Investigation (Parallel & Sequential)

This phase is critical for accurate predictions, guiding further design changes (including for the APA and Model B machines), and reducing project risk. It involves:

  1. Reduction of Current OPHIST Data:

    • Goal: Automate analysis of existing instruction histogram (OPHIST) data to identify correlations, produce ordered lists of "problem" instructions for APA and non-APA cases, and define a "measure of goodness" to predict performance impact of changes.
    • Justification: Current manual analysis is insufficient and lacks confidence in prioritization.
  2. Benchmark Selection:

    • Goal: Create a representative suite of benchmarks for Fortran, Cobol, and general timesharing job mixes to accurately measure and predict CPU performance.
    • Justification: Current predictions rely on hand-evaluated OPHIST; new benchmarks are needed to evaluate changes to cycle time and instruction performance. This task is crucial for the overall project.
  3. Additional Data Gathering and Investigation (Most Critical and Costly):

    • Goal: Produce more accurate performance data and analysis tools to quantify the effects of extended addressing, indirect addressing, IBOX conflicts/flushes/prefetch efficiency, translation buffer conflicts, and cache hits.
    • Methodology:
      • Collect additional OPHIST data from diverse sites (Fortran, Cobol, timesharing) to increase confidence.
      • Utilize TRACKS microcode on KL10 for verifying OPHIST results, exec mode measurements, and PC traces.
      • Perform conflict analysis using programs like CONF20.
      • Enhance the LISP simulator to gather more data on IBOX behavior.
      • Conduct translation buffer and cache hit analysis using tools like SIM20 for address traces and cache simulation, especially for extended addressing effects.
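One plausible form for the "measure of goodness" called for in the OPHIST reduction task is the expected total-time savings if an instruction were brought up to the 3x-KL10 threshold. The sketch below is an assumption about how such a ranking might work; the frequencies and ratios are hypothetical placeholders, not OPHIST measurements:

```python
# A candidate "measure of goodness" for ranking problem instructions:
# the execution-time savings if a class reached the 3x-KL10 target.
# OPHIST-style frequencies and current ratios here are hypothetical.
TARGET = 3.0  # Jupiter/KL10 ratio beyond which optimization is not worthwhile

ophist = {  # instruction: (dynamic frequency, current Jupiter/KL10 ratio)
    "BLT":   (0.02, 1.2),
    "LDB":   (0.05, 1.5),
    "PUSHJ": (0.04, 2.0),
    "MOVE":  (0.30, 4.0),
}

def goodness(freq, ratio):
    # Time per KL10 unit of work is freq/ratio; savings from reaching TARGET.
    # Already-fast instructions (ratio >= TARGET) score zero.
    return max(0.0, freq / ratio - freq / TARGET)

ranked = sorted(ophist, key=lambda op: goodness(*ophist[op]), reverse=True)
for op in ranked:
    print(op, round(goodness(*ophist[op]), 4))
```

Note how the ranking differs from raw frequency: MOVE dominates the instruction mix but scores zero because it already exceeds 3x KL10, which matches the memos' rule that such instructions are not worth optimizing.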

Key Recommendations:

  • Immediate action on simple microcode/hardware changes and OPHIST data reduction, running these tasks in parallel.
  • Prioritize benchmark selection as it underpins subsequent detailed data gathering.
  • Modify the IBOX to efficiently handle single-level global indirection (OP AC,@[EFIW BASE(X)]) to mitigate severe performance penalties.
  • Further study the paging cache and implement a microcode-supported cache in the EBOX for both 2080 and KL10.
  • Form a working group of experts (architecture, performance, microcode, hardware) to oversee changes.
  • Recognize that determining and minimizing the machine's cycle time is as critical as performance modeling.
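The concern about the 1-way associative (direct-mapped) page cache can be illustrated with a tiny set-associative cache simulator: two hot pages that map to the same set evict each other on every access in a direct-mapped cache, while a 2-way cache holds both. The cache sizes and address trace below are illustrative, not the 2080's actual parameters:

```python
# Tiny set-associative cache simulator with LRU replacement, showing why a
# 1-way (direct-mapped) page cache is a concern: conflicting hot pages
# thrash it. Sizes and the trace are illustrative, not 2080 parameters.
from collections import OrderedDict

def hit_rate(trace, sets, ways):
    cache = [OrderedDict() for _ in range(sets)]
    hits = 0
    for page in trace:
        s = cache[page % sets]         # set selected by low page bits
        if page in s:
            hits += 1
            s.move_to_end(page)        # mark as most recently used
        else:
            if len(s) >= ways:
                s.popitem(last=False)  # evict least recently used entry
            s[page] = True
    return hits / len(trace)

# Two pages that collide in the same set of an 8-set cache:
trace = [0, 8] * 50
print("1-way hit rate:", hit_rate(trace, sets=8, ways=1))
print("2-way hit rate:", hit_rate(trace, sets=8, ways=2))
```

On this worst-case trace the direct-mapped cache hits 0% of the time while the 2-way cache hits 98%, which is the kind of pathology that motivates further study of the paging cache design.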

The project emphasizes a top-down approach: every change is evaluated from a system viewpoint so that its impact is understood before implementation, avoiding past mistakes and ensuring the Jupiter CPU meets its performance targets reliably.

May 1982
47 pages
