Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43775
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪士灝(Shih-Hao Hung) | |
| dc.contributor.author | Yu-Wen Huang | en |
| dc.contributor.author | 黃昱文 | zh_TW |
| dc.date.accessioned | 2021-06-15T02:28:17Z | - |
| dc.date.available | 2016-08-26 | |
| dc.date.copyright | 2011-08-26 | |
| dc.date.issued | 2011 | |
| dc.date.submitted | 2011-08-22 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43775 | - |
| dc.description.abstract | On today’s multicore systems, developing correct and efficient parallel programs can be very challenging. On a shared-memory system, inter-thread communication can cause cache contention and significant performance degradation that is difficult for application developers to analyze. Simulation tools can help identify such problems; however, traditional approaches with detailed cache models scale poorly and become impractical for many-core systems. In this thesis, we propose a method to estimate the coherence misses of parallel programs across and within synchronization barriers. We integrated the proposed method into an open-source system-level emulator, COREMU, to evaluate the scalability of the simulation. Our method analyzes the memory references and barrier operations of a multi-threaded program in parallel to estimate lower and upper bounds on the coherence misses in each parallel region. Our experimental results show that the approach is useful for finding the sources of coherence misses and for detecting data structures that suffer from false sharing in parallel programs. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T02:28:17Z (GMT). No. of bitstreams: 1 ntu-100-R98922067-1.pdf: 3489507 bytes, checksum: e121c1d7431912dec6ac3f0686345a4f (MD5) Previous issue date: 2011 | en |
| dc.description.tableofcontents | Acknowledgments; Chinese Abstract; Abstract; 1 Introduction; 1.1 Thesis Organization; 2 Background and Related Works; 2.1 Full-System Simulation; 2.2 Multicore Simulation; 2.2.1 COREMU; 2.3 Coherent Cache Simulation; 3 Overview of Fast Cache Coherence Emulation; 3.1 Multi-threaded Cache Emulation; 3.2 Characterization of Multi-threaded Programs; 4 Implementation; 4.1 Collecting Parallel Traces on Virtual Platform; 4.1.1 Generating Parallel Traces; 4.1.2 Generating Sequential Traces; 4.2 Multi-threaded Coherence Cache Emulation; 4.2.1 Required Communication Estimation; 4.2.2 Optional Communication Estimation; 5 Evaluation; 5.1 Experimental Setup; 5.2 Experimental Result; 5.2.1 Emulation Speed Evaluation; 5.2.2 Behaviors of Multi-Threaded Applications on A Many-Core Machine; 6 Conclusion and Future Work; Bibliography | |
| dc.language.iso | en | |
| dc.subject | 快取記憶體 (cache memory) | zh_TW |
| dc.subject | 多核心系統 (multicore system) | zh_TW |
| dc.subject | 虛擬平台 (virtual platform) | zh_TW |
| dc.subject | multicore | en |
| dc.subject | coherence miss | en |
| dc.subject | virtual platform | en |
| dc.subject | cache coherence | en |
| dc.title | 以多核心模擬多核心系統中快取記憶體之一致性協定 (Using Multicore Systems to Simulate Cache Coherence Protocols in Multicore Systems) | zh_TW |
| dc.title | Multicore-on-Multicore Simulation with Emphasis on Cache Coherence Protocols | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 99-2 (ROC academic year 99, second semester) | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 施吉昇 (Chi-Sheng Shih), 林風 (Phone Lin) | |
| dc.subject.keyword | 虛擬平台 (virtual platform), 多核心系統 (multicore system), 快取記憶體 (cache memory) | zh_TW |
| dc.subject.keyword | virtual platform, multicore, cache coherence, coherence miss | en |
| dc.relation.page | 31 | |
| dc.rights.note | Paid license | |
| dc.date.accepted | 2011-08-22 | |
| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW |
| Appears in Collections: | Department of Computer Science and Information Engineering | |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-100-1.pdf (not authorized for public access) | 3.41 MB | Adobe PDF |
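Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.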
