Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs

Hsiang-Yun Cheng; 鄭湘筠

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/45971

Title:	Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs (多核處理器架構下有效利用記憶體頻寬之排程策略)
Author:	Hsiang-Yun Cheng (鄭湘筠)
Advisor:	Chia-Lin Yang (楊佳玲)
Keywords:	memory task scheduling, stream programming, multi-core performance improvement, analytical model, shared resource contention
Year of publication:	2010
Degree:	Master's
Abstract:	The memory wall is a well-known obstacle to improving processor performance. The spread of multi-core architectures further exacerbates the problem, since memory resources are shared by all cores. Interference among requests from different cores can prolong memory access time and thereby degrade system performance. To tackle the problem, this thesis proposes decoupling applications into computation tasks and memory tasks, so that memory accesses are concentrated in the memory tasks, and restricting the number of threads that execute memory tasks concurrently to reduce contention. With this scheduling restriction, however, a CPU core may have nothing to run while it waits for permission to execute a memory task, which adversely affects overall performance. We therefore develop a memory thread throttling mechanism that dynamically tunes the number of allowed memory threads as the workload varies, to improve system performance.
The proposed run-time mechanism monitors the ratio of memory (data transfer) time to computation time in a program to detect phase changes. It then chooses the memory thread constraint for the next program phase using an analytical model that estimates system performance under different constraint values. As a proof of concept, we prototype the mechanism in several real-world applications as well as synthetic workloads and evaluate them on real hardware. The experimental results show up to 20% speedup on a pool of synthetic workloads on a quad-core Intel i7 (Nehalem) machine, which matches the speedup estimated by the proposed analytical model. Furthermore, the run-time scheduling yields a geometric-mean performance improvement of 12% for realistic applications on the same hardware.
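The scheduling restriction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `MemoryTaskGate`, `worker`, and `run` are hypothetical, the "memory task" is a stand-in list copy, and the thesis's analytical model is not reproduced here. The sketch only shows the structure: each thread alternates between a gated memory task and an ungated computation task, and the gate's limit can be changed at run time, which is where a throttling controller would plug in. Because of Python's global interpreter lock, it demonstrates the scheduling logic, not any real bandwidth effect.

```python
import threading


class MemoryTaskGate:
    """Admits at most `limit` threads into a memory task at once.

    Hypothetical helper for illustration; the limit can be changed at
    run time via set_limit().
    """

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self._limit = limit
        self._active = 0   # threads currently inside a memory task
        self._peak = 0     # highest concurrency observed so far
        self._cond = threading.Condition()

    def set_limit(self, limit):
        """Change the memory thread constraint and wake waiting threads."""
        if limit < 1:
            raise ValueError("limit must be at least 1")
        with self._cond:
            self._limit = limit
            self._cond.notify_all()

    def __enter__(self):
        with self._cond:
            # Wait for permission to execute a memory task.
            while self._active >= self._limit:
                self._cond.wait()
            self._active += 1
            self._peak = max(self._peak, self._active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._cond:
            self._active -= 1
            self._cond.notify()
        return False

    @property
    def peak(self):
        with self._cond:
            return self._peak


def worker(gate, source, rounds):
    """Alternate a gated memory task with an ungated computation task."""
    total = 0
    for _ in range(rounds):
        with gate:
            local = list(source)            # memory task (stand-in copy)
        total += sum(x * x for x in local)  # computation task
    return total


def run(num_threads=4, limit=2, rounds=20):
    """Run the workers and return (peak memory-task concurrency, results)."""
    gate = MemoryTaskGate(limit)
    source = list(range(1000))
    results = [0] * num_threads

    def body(i):
        results[i] = worker(gate, source, rounds)

    threads = [threading.Thread(target=body, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak, results
```

Calling `run(num_threads=4, limit=2)` returns the peak number of threads that were inside a memory task together, which never exceeds the limit. A dynamic mechanism along the lines of the abstract would call `set_limit` between program phases, using whatever performance estimate is available.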
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/45971
Full-text license:	Paid license
Appears in department:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-99-1.pdf (not currently authorized for public access)	1.27 MB	Adobe PDF

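Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.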
