Date of Award
Doctor of Philosophy
Electrical and Computer Engineering
With growing computing demands, power-aware computation has become a major concern in recent studies. We focus on scaling the clock speed (operating frequency) of multicore processors to optimize the overall power consumption for a specific workload. Our work consists of four major parts. In the first part, we propose a metric to determine core criticality. In multicore processing, certain threads/cores wait for other threads due to synchronization. Our proposed method combines the Instructions Per Cycle (IPC) of a thread and the sum of the IPCs of all co-running threads within a time quantum. We present a low-power, hardware-based technique to calculate scores in order to determine critical threads. We use our score metric to create stacks that break total execution time into each thread's score components, which makes it visually easier to identify optimization opportunities. Asymmetric multiprocessors have been proposed to scale the frequency by exploiting the stacks, achieving power optimization without losing much performance. To validate our proposed method, we used state-of-the-art simulators to design an asymmetric multi-core processor and executed different multi-threaded parallel applications from the SPLASH2 benchmark. We achieved 28.13% savings in average power consumption with a maximum of 7.1% performance loss. In the second part, we address the core optimization problem to reduce the overall power consumption. Multiple threads running on a multi-core processor can improve the performance of a parallel application significantly. However, effective scaling of threads and cores plays a key role in achieving an optimal power-performance tradeoff, because performance does not necessarily improve with an increasing number of cores. Multi-threaded applications suffer from thread synchronization and from negative interference in shared memory, including the last-level cache and main memory. Memory bandwidth also often limits the performance of a multi-threaded workload.
In this work, we propose a method to achieve optimal scalability on a multi-core platform and to predict the bandwidth requirement of parallel workloads for a given number of threads. We employ the proposed method to improve the performance of bandwidth-limited parallel applications. We find that DRAM access has various phases, and we use the highest bandwidth among all phases to predict the performance of a given workload in a multi-threaded environment. We evaluate our proposed method using a multi-core simulator, and the experimental results show that the phase-based bandwidth utilization method can estimate the optimal number of threads/cores for a given parallel workload. In the third and fourth parts, we perform an analytical study to obtain the power-efficient operating frequency for symmetric and asymmetric processors, respectively. In a multithreaded application, different threads execute tasks that may have different memory access behavior. One thread may execute a memory-bound task while another executes a compute-intensive task. Since both of them have to access data from the off-chip memory, the thread that executes the compute-intensive task may be delayed by the increasing amount of off-chip traffic. We propose a theoretical framework that captures this off-chip memory access delay, and we include its effect in our performance model. Using our derived formulas, we obtain the optimal operating frequency for a symmetric multicore processor and the optimal operating frequencies for the big core and the small cores of an asymmetric multicore processor. Using our proposed methodology, we achieved a maximum speedup of 2.071 over the base frequency at the cost of an extra 5.12% power.
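As a rough illustration of the phase-based bandwidth idea summarized above, the following Python sketch takes per-phase bandwidth measurements for a single thread, keeps the highest one, and estimates how many threads fit within the available memory bandwidth. It is not the dissertation's actual model: the function names, the GB/s units, and the assumption that bandwidth demand grows linearly with the number of threads are all hypothetical and introduced here only for illustration.

```python
from typing import Sequence


def peak_phase_bandwidth(phase_bandwidths_gbs: Sequence[float]) -> float:
    """Return the highest bandwidth demand observed across DRAM access phases."""
    if not phase_bandwidths_gbs:
        raise ValueError("need at least one phase measurement")
    return max(phase_bandwidths_gbs)


def estimate_thread_limit(
    phase_bandwidths_gbs: Sequence[float],
    system_bandwidth_gbs: float,
    max_threads: int,
) -> int:
    """Estimate the largest thread count that does not saturate memory bandwidth.

    Assumption (not from the source): each added thread contributes the same
    peak-phase demand, so aggregate demand is peak * thread_count.
    """
    peak = peak_phase_bandwidth(phase_bandwidths_gbs)
    if peak <= 0:
        # No measurable DRAM traffic: bandwidth does not constrain scaling.
        return max_threads
    fit = int(system_bandwidth_gbs // peak)
    # Always allow at least one thread, and never exceed the core count.
    return max(1, min(max_threads, fit))


# Hypothetical single-thread phase measurements in GB/s.
limit = estimate_thread_limit([1.5, 4.0, 2.5], system_bandwidth_gbs=20.0, max_threads=16)
```

Using the peak phase rather than the average is the conservative choice here: a thread count sized for the average demand could still saturate the memory bus during the most demanding phase.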
Available for download on Friday, October 25, 2019
This dissertation is only available for download to the SIUC community. Others should contact the interlibrary loan department of your local library or contact ProQuest's Dissertation Express service.