Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43775
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 洪士灝 (Shih-Hao Hung)
dc.contributor.author: Yu-Wen Huang [en]
dc.contributor.author: 黃昱文 [zh_TW]
dc.date.accessioned: 2021-06-15T02:28:17Z
dc.date.available: 2016-08-26
dc.date.copyright: 2011-08-26
dc.date.issued: 2011
dc.date.submitted: 2011-08-22
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43775
dc.description.abstract: On today’s multicore systems, developing correct and efficient parallel programs can be very challenging. On a shared-memory system, inter-thread communication may cause cache contention and significant performance degradation that is difficult for application developers to analyze. Simulation tools are useful for identifying such problems. However, traditional approaches with detailed cache models scale poorly and become impractical for many-core systems.
In this thesis, we propose a method to estimate the coherence misses of parallel programs across and within synchronization barriers. We integrated the proposed method into an open-source system-level emulator, COREMU, to evaluate the scalability of the simulation. Our method analyzes the memory references and barrier operations of a multi-threaded program in parallel to estimate lower and upper bounds on the coherence misses in each parallel region. Our experimental results show that the approach is useful for finding the sources of coherence misses and for detecting data structures that suffer from false sharing in parallel programs. [en]
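To make the idea of bounding coherence misses per parallel region concrete, the following Python sketch shows one way such bounds could be computed. It is not the thesis's algorithm. The trace format, the 64-byte line size, the infinite-cache invalidation model, and every name used (`estimate_coherence_misses`, `seen`, `must`, `may`) are assumptions made purely for illustration. The sketch treats each barrier-delimited region as a set of per-thread traces whose relative ordering is unknown: a miss that happens under every interleaving counts toward the lower bound, and a miss that happens under at least one plausible interleaving counts toward the upper bound.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache-line size in bytes (hypothetical parameter)


def estimate_coherence_misses(regions, line_size=LINE_SIZE):
    """Return one (lower, upper) coherence-miss bound per barrier region.

    regions: list of dicts mapping thread id -> list of (op, address),
             where op is "R" or "W". Threads inside a region may interleave
             in any order. Caches are assumed infinite, so only cold and
             coherence misses exist; write upgrades are not counted as misses.
    """
    seen = defaultdict(set)  # line -> threads that have ever touched it
    must = defaultdict(set)  # line -> threads that surely hold a valid copy
    may = defaultdict(set)   # line -> threads that possibly hold a valid copy
    bounds = []
    for region in regions:
        accesses = defaultdict(lambda: defaultdict(int))  # line -> tid -> count
        writes = defaultdict(lambda: defaultdict(int))    # line -> tid -> count
        for tid, trace in region.items():
            for op, addr in trace:
                line = addr // line_size
                accesses[line][tid] += 1
                if op == "W":
                    writes[line][tid] += 1

        lower = upper = 0
        for line, per_thread in accesses.items():
            total_writes = sum(writes[line].values())
            for tid, n in per_thread.items():
                # Writes by other threads in this region; each one can
                # invalidate this thread's copy at most once.
                foreign = total_writes - writes[line].get(tid, 0)
                if tid not in seen[line]:
                    # First touch ever is a cold miss, not a coherence miss.
                    upper += min(n - 1, foreign)
                elif tid not in may[line]:
                    # Copy was surely invalidated earlier: unavoidable miss.
                    lower += 1
                    upper += min(n, 1 + foreign)
                elif tid not in must[line]:
                    # Copy may or may not have survived the earlier region.
                    upper += min(n, 1 + foreign)
                else:
                    upper += min(n, foreign)
        bounds.append((lower, upper))

        # Carry the sharing state across the barrier.
        for line, per_thread in accesses.items():
            touched = set(per_thread)
            writers = set(writes[line])
            seen[line] |= touched
            if not writers:
                must[line] |= touched
                may[line] |= touched
            else:
                # A single writer surely keeps a valid copy; with several
                # writers the final holder depends on the interleaving.
                must[line] = writers if len(writers) == 1 else set()
                may[line] = touched
    return bounds
```

As a usage example under the same assumptions, four regions in which thread 0 repeatedly writes address 0 while thread 1 reads the same line, `[{0: [("W", 0)], 1: [("R", 0)]}, {0: [("W", 0)]}, {1: [("R", 0), ("R", 8)]}, {0: [("W", 0)], 1: [("R", 0), ("R", 0)]}]`, yield the bounds `[(0, 0), (0, 0), (1, 1), (0, 1)]`. The third region shows a miss forced by a write in an earlier region, and the fourth shows a miss that depends on how the threads interleave within the region.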
dc.description.provenance: Made available in DSpace on 2021-06-15T02:28:17Z (GMT). No. of bitstreams: 1
ntu-100-R98922067-1.pdf: 3489507 bytes, checksum: e121c1d7431912dec6ac3f0686345a4f (MD5)
Previous issue date: 2011 [en]
dc.description.tableofcontents:
Acknowledgments
Chinese Abstract
Abstract
1 Introduction
1.1 Thesis Organization
2 Background and Related Works
2.1 Full-System Simulation
2.2 Multicore Simulation
2.2.1 COREMU
2.3 Coherent Cache Simulation
3 Overview of Fast Cache Coherence Emulation
3.1 Multi-threaded Cache Emulation
3.2 Characterization of Multi-threaded Programs
4 Implementation
4.1 Collecting Parallel Traces on Virtual Platform
4.1.1 Generating Parallel Traces
4.1.2 Generating Sequential Traces
4.2 Multi-threaded Coherence Cache Emulation
4.2.1 Required Communication Estimation
4.2.2 Optional Communication Estimation
5 Evaluation
5.1 Experimental Setup
5.2 Experimental Result
5.2.1 Emulation Speed Evaluation
5.2.2 Behaviors of Multi-Threaded Applications on A Many-Core Machine
6 Conclusion and Future Work
Bibliography
dc.language.iso: en
dc.subject: 快取記憶體 (cache memory) [zh_TW]
dc.subject: 多核心系統 (multicore system) [zh_TW]
dc.subject: 虛擬平台 (virtual platform) [zh_TW]
dc.subject: multicore [en]
dc.subject: coherence miss [en]
dc.subject: virtual platform [en]
dc.subject: cache coherence [en]
dc.title: 以多核心模擬多核心系統中快取記憶體之一致性協定 [zh_TW]
dc.title: Multicore-on-Multicore Simulation with Emphasis on Cache Coherence Protocols [en]
dc.type: Thesis
dc.date.schoolyear: 99-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 施吉昇 (Chi-Sheng Shih), 林風 (Phone Lin)
dc.subject.keyword: 虛擬平台, 多核心系統, 快取記憶體 [zh_TW]
dc.subject.keyword: virtual platform, multicore, cache coherence, coherence miss [en]
dc.relation.page: 31
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2011-08-22
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) [zh_TW]
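Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File: ntu-100-1.pdf (not authorized for public access)
Size: 3.41 MB
Format: Adobe PDF

Unless their copyright terms are specifically indicated otherwise, items in this system are protected by copyright, with all rights reserved.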
