Quantifying Latency and Throughput Compromises in CMP Design
Report
Designers of chip multiprocessors will increasingly be called upon to optimize for a combination of design metrics under a variety of design constraints. The adoption of chip multiprocessors has also led to a shift in design metrics toward aggregate throughput and away from single-thread latency. We examine the compromises between latency and throughput under various power, thermal, area, and bandwidth constraints to quantify the latency penalties of a purely throughput-optimized design. We consider a large chip multiprocessor design space that includes core count, core complexity (pipeline dimensions, in-order versus out-of-order execution), and cache hierarchy sizes.
We demonstrate an approach to effectively assess trade-offs given a comprehensive core model, a set of optimization criteria, and a set of design constraints. We perform a number of case studies to evaluate these trade-offs, exposing significant single-thread latency penalties when optimizing solely for throughput and neglecting other measures of performance. As single-thread latency continues to be one of several design metrics, any choice to compromise latency should be well understood before implementation. Collectively, our results suggest single-thread latency is still a design metric of importance given that optimizing throughput alone will significantly compromise latency. Furthermore, the case for simple, in-order cores should be taken with caution given this balanced view of performance.
All rights reserved (no additional license for public reuse)
Li, Y., K. Skadron, B. Lee, and D. Brooks. "Quantifying Latency and Throughput Compromises in CMP Design." University of Virginia Dept. of Computer Science Tech Report (2006).
University of Virginia, Department of Computer Science