Multicore-on-Multicore Simulation with Emphasis on Cache Coherence Protocols

Yu-Wen Huang; 黃昱文

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43775

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪士灝(Shih-Hao Hung)
dc.contributor.author	Yu-Wen Huang	en
dc.contributor.author	黃昱文	zh_TW
dc.date.accessioned	2021-06-15T02:28:17Z	-
dc.date.available	2016-08-26
dc.date.copyright	2011-08-26
dc.date.issued	2011
dc.date.submitted	2011-08-22
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43775	-
dc.description.abstract	On today’s multicore systems, developing correct and efficient parallel programs can be very challenging. On a shared-memory system, inter-thread communication may cause cache contention and significant performance degradation that is difficult for application developers to analyze. Simulation tools are useful for identifying such problems. However, traditional approaches with detailed cache models scale poorly and become impractical for many-core systems. In this thesis, we propose a method to estimate the coherence misses of parallel programs across and within synchronization barriers. We integrated the proposed method into an open-source system-level emulator, COREMU, to evaluate the scalability of the simulation. Our method analyzes the memory references and barrier operations of a multi-threaded program in parallel to estimate lower and upper bounds on the coherence misses in each parallel region. Our experimental results show that the approach is useful for finding the sources of coherence misses and for detecting data structures that exhibit false sharing in parallel programs.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T02:28:17Z (GMT). No. of bitstreams: 1 ntu-100-R98922067-1.pdf: 3489507 bytes, checksum: e121c1d7431912dec6ac3f0686345a4f (MD5) Previous issue date: 2011	en
dc.description.tableofcontents	Acknowledgments; Chinese Abstract (中文摘要); Abstract; 1 Introduction; 1.1 Thesis Organization; 2 Background and Related Works; 2.1 Full-System Simulation; 2.2 Multicore Simulation; 2.2.1 COREMU; 2.3 Coherent Cache Simulation; 3 Overview of Fast Cache Coherence Emulation; 3.1 Multi-threaded Cache Emulation; 3.2 Characterization of Multi-threaded Programs; 4 Implementation; 4.1 Collecting Parallel Traces on Virtual Platform; 4.1.1 Generating Parallel Traces; 4.1.2 Generating Sequential Traces; 4.2 Multi-threaded Coherence Cache Emulation; 4.2.1 Required Communication Estimation; 4.2.2 Optional Communication Estimation; 5 Evaluation; 5.1 Experimental Setup; 5.2 Experimental Result; 5.2.1 Emulation Speed Evaluation; 5.2.2 Behaviors of Multi-Threaded Applications on A Many-Core Machine; 6 Conclusion and Future Work; Bibliography
dc.language.iso	en
dc.subject	快取記憶體	zh_TW
dc.subject	多核心系統	zh_TW
dc.subject	虛擬平台	zh_TW
dc.subject	multicore	en
dc.subject	coherence miss	en
dc.subject	virtual platform	en
dc.subject	cache coherence	en
dc.title	以多核心模擬多核心系統中快取記憶體之一致性協定	zh_TW
dc.title	Multicore-on-Multicore Simulation with Emphasis on Cache Coherence Protocols	en
dc.type	Thesis
dc.date.schoolyear	99-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	施吉昇(Chi-Sheng Shih),林風(Phone Lin)
dc.subject.keyword	虛擬平台,多核心系統,快取記憶體,	zh_TW
dc.subject.keyword	virtual platform, multicore, cache coherence, coherence miss	en
dc.relation.page	31
dc.rights.note	Paid license
dc.date.accepted	2011-08-22
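dc.contributor.author-college	College of Electrical Engineering and Computer Science (電機資訊學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所)	zh_TW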
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-100-1.pdf (not authorized for public access)	3.41 MB	Adobe PDF


Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.