Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57149
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 劉邦鋒 (Pangfeng Liu) |
dc.contributor.author | Pei-Chi Chen | en
dc.contributor.author | 陳培基 | zh_TW
dc.date.accessioned | 2021-06-16T06:36:12Z | -
dc.date.available | 2019-08-04 |
dc.date.copyright | 2014-08-04 |
dc.date.issued | 2014 |
dc.date.submitted | 2014-08-01 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57149 | -
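dc.description.abstract | We propose a high-performance parallel system emulator named HCOREMU. Existing system emulators mainly focus on execution correctness and on the synchronization mechanisms between VCPUs, but two important factors degrade their performance: the quality of the machine code the emulator generates, and the contention among emulation threads for limited shared hardware resources. To improve the quality of the emulated machine code, we exploit today's widely available multi-core machines and, building on the trace-based multi-threaded optimization proposed in HQEMU, present two ways of bringing it into HCOREMU. For thread contention on shared hardware resources, we reduce three kinds of contention-induced slowdown. First, we find that on non-uniform memory access (NUMA) machines the default Linux scheduler and the memory allocator behave inconsistently with each other. Second, the threads we use to improve the quality of the emulated machine code interfere with the emulation threads. Third, we find that certain applications cause multiple threads to keep accessing one particular memory region. We detect these situations with hardware assistance and propose a corresponding solution for each. Compared with COREMU, HCOREMU is 1.8 times faster in single-core emulation and 1.3 times faster in multi-core emulation. Our scheduling method is 1.1 times faster than the default Linux scheduler. | zh_TW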
dc.description.abstract | We present HCOREMU, a high-performance parallel system-mode emulator. Existing parallel system-mode emulators focus on the correctness of emulation and on synchronization mechanisms. However, two important factors usually impede performance: (1) the quality of the emulation code and (2) thread contention on shared hardware resources. In this thesis, we take advantage of ubiquitous multi-core platforms to improve the quality of our emulation code, and we propose two designs that accelerate multi-core system-mode emulation based on the trace-based multi-threaded optimization in HQEMU.
We reduce shared resource contention in three ways. First, we reduce the interconnect traffic and access latency that our threads incur because the default Linux scheduler and memory allocator behave inconsistently on NUMA platforms. Second, we reduce the contention between optimization threads and emulation threads. Third, we find that some workloads have a hotspot when accessing memory; we use hardware performance counters to detect this situation and reduce the interconnect traffic and access latency of emulation threads in such workloads.
HCOREMU improves on the performance of COREMU by a factor of 1.8 in uniprocessor emulation and 1.3 in multi-core emulation. Our scheduling also reduces thread contention on shared resources, outperforming the default Linux scheduling by a factor of 1.1. | en
dc.description.provenance | Made available in DSpace on 2021-06-16T06:36:12Z (GMT). No. of bitstreams: 1. ntu-103-R01922053-1.pdf: 1070868 bytes, checksum: c22c443bd347e1f606b0b6e53f3cd0f7 (MD5). Previous issue date: 2014 | en
dc.description.tableofcontents | Contents
Acknowledgement
Chinese Abstract
Abstract
1 Introduction
2 Related Work
  2.1 System Mode Emulation
  2.2 Shared Resource Contention
3 Designs of HCOREMU
  3.1 Overview of COREMU
  3.2 Multi-threaded Trace-based Optimization in HQEMU
  3.3 Private Queue and Global Queue Designs in HCOREMU
    3.3.1 Private Queue Design
    3.3.2 Global Queue Design
4 Shared Resource Contention
  4.1 Inconsistency of Linux Scheduler and Memory Allocator on NUMA platform
  4.2 LLVM Threads contend with VCPU threads
  4.3 Hotspot when VCPU access memory
  4.4 Overall Scheduling Policies of HCOREMU
5 Experiment Results
  5.1 Experiment Setup
    5.1.1 Uniprocessor Emulation Performance
    5.1.2 Multiprocessors Emulation Performance
    5.1.3 Effectiveness of Scheduling Policies
6 Conclusion
7 Bibliography
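dc.language.iso | zh-TW |
dc.subject | 多線程 (Multi-Threaded) | zh_TW
dc.subject | 系統模式模擬 (System Mode Emulation) | zh_TW
dc.subject | 多核心 (Multicores) | zh_TW
dc.subject | 平行模擬 (Parallel Emulation) | zh_TW
dc.subject | 共享資源競爭 (Shared Resources Contention) | zh_TW
dc.subject | 追蹤式動態二元碼轉換最佳化 (Trace-based Dynamic Binary Translation Optimization) | zh_TW
dc.subject | 底層虛擬機器 (LLVM) | zh_TW
dc.subject | LLVM | en
dc.subject | Shared Resources Contention | en
dc.subject | Trace-based Dynamic Binary Translation Optimization | en
dc.subject | Multi-Threaded | en
dc.subject | Parallel Emulation | en
dc.subject | System Mode Emulation | en
dc.subject | Multicores | en
dc.title | 加速多核系統模擬暨減少硬體共享資源競爭 (Accelerating Multicore System Emulation and Reducing Hardware Shared Resource Contention) | zh_TW
dc.title | HCOREMU: Accelerating Multicore System Emulation and Reducing Hardware Shared Resource Contention | en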
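dc.type | Thesis |
dc.date.schoolyear | 102-2 |
dc.description.degree | 碩士 (Master's) |
dc.contributor.coadvisor | 吳真貞 (Jan-Jan Wu) |
dc.contributor.oralexamcommittee | 徐慰中 (Wei-Chung Hsu), 洪鼎詠 (Ding-Yong Hong) |
dc.subject.keyword | 平行模擬 (Parallel Emulation), 系統模式模擬 (System Mode Emulation), 多核心 (Multicores), 底層虛擬機器 (LLVM), 多線程 (Multi-Threaded), 追蹤式動態二元碼轉換最佳化 (Trace-based Dynamic Binary Translation Optimization), 共享資源競爭 (Shared Resources Contention) | zh_TW
dc.subject.keyword | Parallel Emulation, System Mode Emulation, Multicores, LLVM, Multi-Threaded, Trace-based Dynamic Binary Translation Optimization, Shared Resources Contention | en
dc.relation.page | 25 |
dc.rights.note | 有償授權 (Paid authorization) |
dc.date.accepted | 2014-08-01 |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW
dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)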

Files in this item:
File | Size | Format
ntu-103-1.pdf (not authorized for public access) | 1.05 MB | Adobe PDF