This set of interoffice memoranda proposes a performance modeling project for the Jupiter CPU, which is currently performing below original estimates. The goal is for Jupiter to deliver 2.3 times the performance of a KL10 on general timesharing workloads. Because the model is optimistic, the modeling target is set at 2.5x KL10.
The documents identify two primary performance concerns:
- Inefficiency of "Slow" Instructions: A simple performance model showed that a minority of instructions (25%), though executed less frequently, account for over 50% of total execution time because their Jupiter-to-KL10 speed ratio is low (e.g., Class 1 instructions run at <3x KL10). The key "slow" instruction classes are BLT, Byte operations (LDB, IDPB), Floating Point operations, PUSHJ/POPJ, String operations (MOVSLJ, CVTxxx), and XCT. Optimizing these slow instructions affects overall system performance far more than optimizing instructions that are already fast. Instructions with a Jupiter/KL10 ratio of 3x or greater are deemed not worth optimizing at this stage.
- Poor Performance with Extended Addressing, especially Indirection: The 2080 architecture's handling of extended addressing, particularly indirect addressing, significantly degrades performance. Indirection can be 5x slower than indexing on Jupiter, potentially leading to a 27-45% throughput loss. Existing software (TOPS-20, monitor, languages) relies heavily on indirection. The IBOX (instruction box) cannot decode indirect references efficiently, which defeats the advantages of prefetch. The page cache design (1-way associative, i.e., direct-mapped) is also a concern.
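The first concern can be illustrated with a small weighted-time model. This is a minimal sketch, not the model used in the memoranda: the class names, frequencies, KL10 times, and speed ratios below are all hypothetical, chosen only to show how a less frequent class with a low Jupiter/KL10 ratio can dominate total execution time.

```python
from dataclasses import dataclass

@dataclass
class InstrClass:
    name: str
    freq: float       # fraction of executed instructions (fractions sum to 1.0)
    kl10_time: float  # mean KL10 execution time, arbitrary units
    ratio: float      # Jupiter/KL10 speed ratio (higher means faster on Jupiter)

def jupiter_time_shares(classes):
    """Return {class name: share of total Jupiter execution time}."""
    times = {c.name: c.freq * c.kl10_time / c.ratio for c in classes}
    total = sum(times.values())
    return {name: t / total for name, t in times.items()}

def overall_speedup(classes):
    """Overall Jupiter/KL10 speedup implied by the per-class ratios."""
    kl10 = sum(c.freq * c.kl10_time for c in classes)
    jupiter = sum(c.freq * c.kl10_time / c.ratio for c in classes)
    return kl10 / jupiter

# Hypothetical workload, NOT taken from the memoranda.
classes = [
    InstrClass("slow", freq=0.25, kl10_time=3.0, ratio=2.0),
    InstrClass("fast", freq=0.75, kl10_time=1.0, ratio=4.0),
]

shares = jupiter_time_shares(classes)
speedup = overall_speedup(classes)
print(f"slow share of Jupiter time: {shares['slow']:.1%}")
print(f"overall speedup vs KL10:    {speedup:.2f}x")
```

With these assumed numbers, the slow class is only a quarter of the instruction count but takes about two thirds of the Jupiter execution time, which is the shape of the effect described above.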
To address these issues and ensure predictable performance, a performance modeling project is proposed, broken into two main phases:
Phase 1: Initial, Low-Risk Performance Improvements (Immediate/Parallel)
- Goal: Implement "simple" microcode and hardware changes to speed up identified bottleneck instructions (EA-calc, byte pointer decode, BLT/XBLT, PUSHJ, XCT).
- Methodology: Evaluate and design changes, implement them in microcode and the LISP simulator, measure individual instruction improvements, and predict overall CPU performance.
- Expected Benefit: Preliminary investigations suggest up to a 30% performance gain with relatively low cost.
Phase 2: Data Gathering, Analysis, and Deeper Investigation (Parallel & Sequential)
This phase is critical for accurate predictions, guiding further design changes (including for APA and Model B machines), and reducing project risk. It involves:
Reduction of Current OPHIST Data:
- Goal: Automate analysis of existing instruction histogram (OPHIST) data to identify correlations, produce ordered lists of "problem" instructions for APA and non-APA cases, and define a "measure of goodness" to predict performance impact of changes.
- Justification: The current manual analysis is insufficient and gives little confidence in how problems are prioritized.
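One possible shape for such an automated reduction is sketched below. The "measure of goodness" used here (execution time saved if an instruction were brought up to the 3x threshold) is an assumption made for illustration, not the measure defined in the memoranda, and the histogram counts, KL10 times, and ratios are hypothetical.

```python
def rank_problem_instructions(ophist, target_ratio=3.0):
    """Rank opcodes by Jupiter time saved if each reached target_ratio.

    ophist maps opcode -> (execution count, KL10 time per execution,
    current Jupiter/KL10 ratio). Opcodes already at or above the
    target ratio are skipped as not worth optimizing.
    """
    rows = []
    for op, (count, kl10_time, ratio) in ophist.items():
        if ratio >= target_ratio:
            continue
        current = count * kl10_time / ratio
        improved = count * kl10_time / target_ratio
        rows.append((op, current - improved))
    # Largest potential saving first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

# Hypothetical OPHIST-style data, NOT measured values.
ophist = {
    "MOVE":  (5000, 1.0, 4.0),
    "LDB":   (800, 2.0, 2.0),
    "PUSHJ": (600, 2.0, 1.5),
    "BLT":   (50, 40.0, 2.5),
    "XCT":   (100, 1.5, 1.0),
}

ranked = rank_problem_instructions(ophist)
for op, saved in ranked:
    print(f"{op:6s} potential saving: {saved:8.1f}")
```

Running the same function on separate histograms for the APA and non-APA cases would give the two ordered lists the goal calls for.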
Benchmark Selection:
- Goal: Create a representative suite of benchmarks for Fortran, Cobol, and general timesharing job mixes to accurately measure and predict CPU performance.
- Justification: Current predictions rely on hand-evaluated OPHIST data; new benchmarks are needed to evaluate changes to cycle time and instruction performance. This task is crucial for the overall project.
Additional Data Gathering and Investigation (Most Critical and Costly):
- Goal: Produce more accurate performance data and analysis tools to quantify the effects of extended addressing, indirect addressing, IBOX conflicts/flushes/prefetch efficiency, translation buffer conflicts, and cache hits.
- Methodology:
  - Collect additional OPHIST data from diverse sites (Fortran, Cobol, timesharing) to increase confidence.
  - Utilize TRACKS microcode on KL10 for verifying OPHIST results, exec mode measurements, and PC traces.
  - Perform conflict analysis using programs like CONF20.
  - Enhance the LISP simulator to gather more data on IBOX behavior.
  - Conduct translation buffer and cache hit analysis using tools like SIM20 for address traces and cache simulation, especially for extended addressing effects.
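The kind of hit-rate analysis described in the last item can be sketched as a trace-driven simulation. This is not SIM20 and does not model the actual Jupiter page cache or translation buffer: the set-indexing rule, the LRU replacement policy, and the trace are assumptions made for illustration. It compares a 1-way (direct-mapped) organization with a 2-way organization of the same total capacity.

```python
from collections import OrderedDict

def hit_rate(trace, num_sets, ways):
    """Hit rate of a set-associative cache with LRU replacement.

    trace is a sequence of page numbers; ways=1 gives a direct-mapped
    (1-way associative) cache. The set index is page % num_sets.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for page in trace:
        entries = sets[page % num_sets]
        if page in entries:
            hits += 1
            entries.move_to_end(page)        # mark as most recently used
        else:
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict least recently used
            entries[page] = True
    return hits / len(trace) if trace else 0.0

# Hypothetical trace: two pages that map to the same set, alternating.
trace = [0, 8, 0, 8, 0, 8, 0, 8]

direct = hit_rate(trace, num_sets=8, ways=1)   # 8 entries, direct-mapped
two_way = hit_rate(trace, num_sets=4, ways=2)  # 8 entries, 2-way
print(f"direct-mapped hit rate: {direct:.2f}")
print(f"2-way hit rate:         {two_way:.2f}")
```

In this contrived trace the direct-mapped cache misses on every reference because the two pages keep evicting each other, while the 2-way cache misses only on the first reference to each page. Real address traces would be needed to judge how often such conflicts occur in practice.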
Key Recommendations:
- Immediate action on simple microcode/hardware changes and OPHIST data reduction, running these tasks in parallel.
- Prioritize benchmark selection as it underpins subsequent detailed data gathering.
- Modify the IBOX to efficiently handle single-level global indirection (OP AC,@[EFIW BASE(X)]) to mitigate severe performance penalties.
- Further study the paging cache and implement a microcode-supported cache in the EBOX for both 2080 and KL10.
- Form a working group of experts (architecture, performance, microcode, hardware) to oversee changes.
- Recognize that determining and minimizing the machine's cycle time is as critical as performance modeling.
The project emphasizes a top-down approach: every change is evaluated from a system viewpoint so that its impact is understood before it is implemented, both to avoid past mistakes and to ensure that the Jupiter CPU reliably meets its performance targets.