Date of Award
Doctor of Philosophy
Electrical and Computer Engineering
This dissertation aims to improve off-chip bandwidth utilization and energy efficiency in chip multiprocessor (CMP) architectures. The work consists of two main parts. The first part investigates the early write-back technique for a two-level cache hierarchy in a CMP with four processor cores. Early write-back can be viewed as a modified cache write policy that not only maintains data consistency between the on-chip and off-chip components of the memory hierarchy but also improves off-chip bandwidth utilization. It issues write-back operations for some dead, dirty cache lines, from the shared second-level (L2) cache to main memory, before those lines are evicted. These operations are issued only when the off-chip bus is free. The technique improves processor performance by avoiding or minimizing off-chip bus contention between write-back operations and demand fetch requests, such as read and write misses in the shared L2 cache. The efficiency of early write-back has been measured in terms of its impact on L2 cache miss latency. Simulation results demonstrate that early write-back improves the off-chip bandwidth utilization of a CMP, with the degree of performance improvement varying among benchmarks. The improvement that early write-back can achieve depends on two main factors: first, the sensitivity of the individual benchmark to changes in the available off-chip bandwidth; second, the severity of off-chip bus contention between demand fetch requests and write-back operations. The second part of this work tackles dynamic voltage and frequency scaling (DVFS) of the off-chip bus that handles communication between the processor chip and the off-chip memory.
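As an illustration of the early write-back policy from the first part, the following minimal Python sketch models one decision step. It is not the dissertation's implementation: the abstract does not say how dead lines are identified, so the idle-time predictor, the `dead_threshold` value, and all names (`CacheLine`, `is_predicted_dead`, `early_write_back`) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int
    dirty: bool = False
    last_access: int = 0  # cycle of the most recent access


def is_predicted_dead(line, now, dead_threshold):
    # Assumed predictor: a line idle for at least dead_threshold cycles
    # is treated as dead. The abstract does not specify the real one.
    return now - line.last_access >= dead_threshold


def early_write_back(lines, bus_free, now, dead_threshold=1000):
    """Return the tag of the line written back early, or None."""
    if not bus_free:
        # Only use the off-chip bus when it is free, so early
        # write-backs never compete with demand fetch requests.
        return None
    candidates = [
        line for line in lines
        if line.dirty and is_predicted_dead(line, now, dead_threshold)
    ]
    if not candidates:
        return None
    # Assumed tie-break: write back the line that has been idle longest.
    victim = min(candidates, key=lambda line: line.last_access)
    # The line stays in the L2 cache but is now clean, so its later
    # eviction needs no off-chip write.
    victim.dirty = False
    return victim.tag
```

For example, with lines last touched at cycles 0 and 900 and `now=1500`, only the first qualifies as dead under the assumed threshold of 1000 cycles, and it is written back only if `bus_free` is true.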
Off-chip bus DVFS dynamically varies the power parameters of the off-chip bus so that off-chip bus energy is minimized while the forward progress of the running application is maintained. The technique captures the CPU-boundedness and memory-boundedness of running applications at run time, so that a reasonable tradeoff between processor performance and off-chip bus energy can be attained. The off-chip bus can be tuned to low-energy settings in CPU-bound applications or in CPU-bound phases of program execution. The CPU-boundedness of an application is measured in terms of the off-chip access ratio (OCAR), defined as the ratio between the number of L2 cache misses and the number of instructions retired during a particular observation window. An application or a particular execution phase is said to be CPU-bound if its OCAR is less than a predefined threshold value. Off-chip bus DVFS has been evaluated in two types of processor configurations: first, a processor that relies on Instruction Level Parallelism (ILP) to improve its performance, such as a single-core superscalar processor; second, a processor that depends on Thread Level Parallelism (TLP) in conjunction with ILP to improve its performance, such as a CMP with multiple superscalar cores. For applications that exhibit high ILP even when executed on a single processor, the two systems achieved similar off-chip bus energy savings. For applications with limited ILP when executed on a single processor, on the other hand, the second system achieved better off-chip bus energy savings: executing the application as multiple concurrent threads contributed to its CPU-boundedness, allowing the off-chip bus DVFS triggering condition to be satisfied in a larger share of observation windows than in the single-processor case.
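The OCAR definition and the CPU-bound test above can be written down directly. The sketch below is illustrative only: the function names, the two-level "low"/"high" bus setting, and the threshold used in the usage note are assumptions, since the abstract gives neither the threshold value nor the set of available voltage and frequency levels.

```python
def ocar(l2_misses, instructions_retired):
    """Off-chip access ratio for one observation window."""
    if instructions_retired <= 0:
        raise ValueError("observation window retired no instructions")
    return l2_misses / instructions_retired


def select_bus_setting(l2_misses, instructions_retired, threshold):
    """Pick a bus setting for the next window (hypothetical two-level model)."""
    # CPU-bound windows (OCAR below the threshold) tolerate a slower,
    # lower-energy off-chip bus; otherwise keep the bus at full speed.
    if ocar(l2_misses, instructions_retired) < threshold:
        return "low"
    return "high"
```

With an assumed threshold of 0.001, a window with 50 L2 misses over 100,000 retired instructions has an OCAR of 0.0005 and selects the low-energy setting, while a window with 500 misses over the same instruction count has an OCAR of 0.005 and keeps the high setting.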
This dissertation is only available for download to the SIUC community. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.