Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57149
Title: | HCOREMU: Accelerating Multicore System Emulation and Reducing Hardware Shared Resource Contention (加速多核系統模擬暨減少硬體共享資源競爭) |
Author: | Pei-Chi Chen (陳培基) |
Advisor: | Pangfeng Liu (劉邦鋒) |
Keywords: | Parallel Emulation, System Mode Emulation, Multicores, LLVM, Multi-Threaded, Trace-based Dynamic Binary Translation Optimization, Shared Resource Contention |
Publication year: | 2014 |
Degree: | Master's |
Abstract: | We present HCOREMU, a high-performance parallel system mode emulator. Existing parallel system mode emulators focus on the correctness of emulation and on the synchronization mechanisms between VCPUs. However, two important factors usually impede their performance: (1) the quality of the machine code generated by the emulator and (2) contention among emulation threads for limited shared hardware resources. To improve the quality of the emulation code, we take advantage of today's ubiquitous multi-core platforms and, building on the trace-based multi-threaded optimization proposed in HQEMU, present two designs for bringing that optimization to multi-core system mode emulation in HCOREMU. To address thread contention on shared hardware resources, we reduce three kinds of contention-induced performance degradation. First, we find that the default Linux scheduler and memory allocator behave inconsistently on non-uniform memory access (NUMA) platforms, and we reduce the interconnect traffic and memory access latency that our threads incur as a result. Second, we reduce the interference that the optimization threads, which improve emulation code quality, impose on the emulation threads. Third, we find that some workloads have a memory hotspot, where multiple threads repeatedly access a particular memory region; we use hardware performance counters to detect this situation and reduce the interconnect traffic and access latency of the emulation threads for such workloads. HCOREMU improves on the performance of COREMU by a factor of 1.8X in uni-processor emulation and 1.3X in multi-core emulation. Our scheduling method reduces thread contention on shared resources and outperforms the default Linux scheduler by a factor of 1.1X. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57149 |
Full-text authorization: | Paid license |
Appears in departments: | Department of Computer Science and Information Engineering |
Files in this document:
File | Size | Format | |
---|---|---|---|
ntu-103-1.pdf (not currently authorized for public access) | 1.05 MB | Adobe PDF |
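All documents in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.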