尋找多核心系統架構程式最佳化機會之新方法

Ching-Chi Lin; 林敬棋

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8582

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	劉邦鋒
dc.contributor.author	Ching-Chi Lin	en
dc.contributor.author	林敬棋	zh_TW
dc.date.accessioned	2021-05-20T19:58:37Z	-
dc.date.available	2012-07-20
dc.date.available	2021-05-20T19:58:37Z	-
dc.date.copyright	2010-07-20
dc.date.issued	2010
dc.date.submitted	2010-07-12
dc.identifier.citation	[1] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. SIGPLAN Not., 35(5):1–12, 2000. [2] D. Bruening, T. Garnett, and S. Amarasinghe. An infrastructure for adaptive dynamic optimization. In CGO ’03: Proceedings of the international symposium on Code generation and optimization, pages 265–275, Washington, DC, USA, 2003. IEEE Computer Society. [3] J. Lu, H. Chen, P.C. Yew, and W.C. Hsu. Design and implementation of a lightweight dynamic optimization system. Journal of Instruction-Level Parallelism, 6, April 2004. [4] W.K Chen, S. Lerner, R. Chaiken, and D.M. Gillies. Mojo: A dynamic optimization system. In Proceedings of the 3rd ACM Workshop on Feedback-Directed and Dynamic Optimization, 2000. [5] C. Pereira, J. Lau, B. Calder, and R. Gupta. Dynamic phase analysis for cycle-close trace generation. In CODES+ISSS ’05: Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pages 321–326, New York, NY, USA, 2005. ACM. [6] A.S. Dhodapkar and J.E. Smith. Managing multi-configuration hardware via dynamic working set analysis. In ISCA ’02: Proceedings of the 29th annual international symposium on Computer architecture, pages 233–244, Washington, DC, USA, 2002. IEEE Computer Society. [7] T. Kistler and M. Franz. Continuous program optimization: A case study. ACM Trans. Program. Lang. Syst., 25(4):500–548, 2003. [8] G. Hamerly, E. Perelman, and B. Calderd. How to use simpoint to pick simulation points. SIGMETRICS Perform. Eval. Rev., 31(4):25–30, 2004. [9] A. Das, J. Lu, and W.C. Hsu. Region monitoring for local phase detection in dynamic optimization systems. In CGO ’06: Proceedings of the International Symposium on Code Generation and Optimization, pages 124–134, Washington, DC, USA, 2006. IEEE Computer Society. [10] P. Nagpurkar, C. Krintz, M. Hind, P.F. Sweeney, and V.T. Rajan. Online phase detection algorithms. In CGO ’06: Proceedings of the International Symposium on Code Generation and Optimization, pages 111–123, Washington, DC, USA, 2006. IEEE Computer Society. [11] G. Fursin, A. Cohen, M. O’Boyle, and O. Temam. A practical method for quickly evaluating program optimizations. In Proceedings of the 1st International Conference on High Performance Embedded Architectures and Compilers, pages 29–46, November 2005. [12] J. Lau,M. Arnold, M. Hind, and B. Calder. Online performance auditing: using hot optimizations without getting burned. In PLDI ’06: Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation, pages 239–251, New York, NY, USA, 2006. ACM. [13] C. Dubach, J. Cavazos, B. Franke, G. Fursin, M.F.P. O’Boyle, and O. Temam. Fast compiler optimisation evaluation using code-feature based performance prediction. In CF ’07: Proceedings of the 4th international conference on Computing frontiers, pages 131–142, New York, NY, USA, 2007. ACM. [14] M. Annavaram, R. Rakvic, M. Polito, J.Y. Bouguet, R.A. Hankins, and B. Davies. The fuzzy correlation between code and performance predictability. In Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, pages 93–104. IEEE Computer Society, 2004. [15] J. Lau, J. Sampson, E. Perelman, G. Hamerly, and B. Calder. The strong correlation between code signatures and performance. In ISPASS ’05: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005, pages 236–247, Washington, DC, USA, 2005. IEEE Computer Society. [16] Y. Jiang, E.Z. Zhang, K. Tian, F.Mao, M. Gethers, X. Shen, and T. Gao. Exploiting statistical correlations for proactive prediction of program behaviors. In CGO ’10: Proceedings of the 8th annual IEEE/ACMinternational symposium on Code generation and optimization, pages 248–256, New York, NY, USA, 2010. ACM. [17] T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to find periodic behavior and simulation points in applications. In PACT ’01: Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, pages 3–14, Washington, DC, USA, 2001. IEEE Computer Society. [18] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, pages 45–57, New York, NY, USA, 2002. ACM. [19] T. Sherwood, E. Perelman, G. Hamerly, S. Sair, and B. Calder. Discovering and exploiting program phases. IEEE Micro, 23(6):84–93, 2003. [20] T. Sherwood, S. Sair, and B. Calder. Phase tracking and prediction. In ISCA ’03: Proceedings of the 30th annual international symposium on Computer architecture, pages 336–349, New York, NY, USA, 2003. ACM. [21] M.C. Merten, A.R. Trick, C.N. George, J.C. Gyllenhaal, and W.W. Hwu. A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization. In ISCA ’99: Proceedings of the 26th annual international symposium on Computer architecture, pages 136–147, Washington, DC, USA, 1999. IEEE Computer Society. [22] B. Davies, J. Bouguet, M. Polito, and M. Annavaram. iPART: An Automated Phase Analysis and Recognition Tool. Technical Report IR-TR-2004-1-iPART, Intel Corporation, February 2004. ftp://download.intel.com/research/library/IR-TR-2004-1-iPART.pdf. [23] E. Perelman, M. Polito, J.Y. Bouguet, J. Sampson, B. Calder, and C. Dulong. Detecting phases in parallel applications on shared memory architectures. In IPDPS ’06: Proceedings of the 20th International Parallel and Distributed Processing Symposium, April 2006. [24] G. Hamerly and C. Elkan. Alternatives to the k-means algorithm that find better clusterings. In CIKM ’02: Proceedings of the eleventh international conference on Information and knowledge management, pages 600–607, New York, NY, USA, 2002. ACM. [25] D. Pelleg and A.W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In ICML ’00: Proceedings of the Seventeenth International Conference on Machine Learning, pages 727–734, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. [26] C.K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI ’05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 190–200, New York, NY, USA, 2005. ACM. [27] Perfmon. http://perfmon2.sourceforge.net/.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8582	-
dc.description.abstract	程式最佳化主要可以分爲兩種方式－靜態最佳化以及動態最佳化，這兩種最佳化方式在處理單核心系統架構中的程式都有不錯的表現。但是在多核心系統新架構下，兩種最佳化方式在找尋程式中的最佳化機會時，都沒有將多執行緒程式執行緒間的互動列入考慮。我們的目的是要發展出一個能夠辨認出執行緒間互動的技術，並且利用這些資訊來幫助程式最佳化。經由觀察發現，執行緒間的互動像是競爭公用快取記憶體，可能導致“不穩定”的程式行爲。也就是說，執行程式中相同的程式碼片段，理論上會有相同或相似的效能，但實際上卻有很大的差異，就會被稱作“不穩定”。我們將這些不穩定的程式片段視爲“最佳化的機會”，希望能夠藉由最佳化這些片段，使它們恢穩定，進而提升執行效能。我們提出了一個簡單而且有效率的方法，藉由取樣以及分析基本區塊的效能變化，來分辨出哪些基本區塊是“穩定”，而哪些是“不穩定”。分析得到的結果能夠讓動態最佳化器用來分辨在程式執行的過程中，哪些基本區塊是不穩定的，需要特別注意或特殊處理。我們可以藉由將取樣到的指令指標對應回它所屬的基本區塊，進而找到出現次數最多的基本區塊，用來當作程式執行區間的代表。藉此可以比較不同執行緒中有相同代表的程式區段的效能，計算出每個區段的效能變化。這個方法也能夠應用在單一執行緒的程式。	zh_TW
dc.description.abstract	There have two groups of works in optimizing program execution in the literature – static and dynamic program optimization. To our best knowledge, neither of these optimizations, while looking for optimization opportunities, considers interactions among threads in multi-core architecture. Therefore we would like to develop techniques that can identify the presence of thread interactions and use it to guide possible optimization. We observe that interaction among threads, like competition for shared cache, can lead to “unstable” execution performance. That is, the same part of program will have very different performance characteristics, therefore we identify those parts of program as dynamic optimization opportunity, so that they can be optimized for better performance. We propose a simple and efficient sampling method that analyzes performance variance among basic blocks, so as to differentiate “unstable” and “stable” basic blocks. The results from the analysis can be used as a reference to determine which parts of the program on which dynamic optimizer should make extra efforts during execution. By mapping EIP of each sample back to its basic block, we are able to choose representative basic block for each interval during execution, and compare the performance of each thread, so as to calculate the performance variance of each　basic block. This sampling technique can also be applied to single-threaded programs.	en
dc.description.provenance	Made available in DSpace on 2021-05-20T19:58:37Z (GMT). No. of bitstreams: 1 ntu-99-R97922128-1.pdf: 637470 bytes, checksum: 6839348f488b7ab5e552b1b09ff1716e (MD5) Previous issue date: 2010	en
dc.description.tableofcontents	Certification i Acknowledgement ii Chinese Abstract iii Abstract iv 1 Introduction 1 2 Related Works 3 3 Choosing Representative 5 3.1 Average EIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 4 Methodology 9 4.1 Sampling and Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2 Representative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.3 Optimization Opportunity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5 Experimental Results 11 5.1 Experimental Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5.2 Classification Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5.3 Performance Comparison Results . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.3.1 Optimization Level: O0 vs O2 . . . . . . . . . . . . . . . . . . . . . . . 15 5.3.2 Unstable parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5.4 Case Study: Swim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.5 Summary of Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 18 6 Conclusion 19 Bibliography 20
dc.language.iso	en
dc.title	尋找多核心系統架構程式最佳化機會之新方法	zh_TW
dc.title	A New Prospect in Finding Optimization Opportunities for Multicore Architectures	en
dc.type	Thesis
dc.date.schoolyear	98-2
dc.description.degree	碩士
dc.contributor.coadvisor	游本中
dc.contributor.oralexamcommittee	吳真貞,徐慰中,陳添福
dc.subject.keyword	多核心架構,動態最佳化,程式行&#29234,	zh_TW
dc.subject.keyword	Multi-core architecture,Dynamic optimization,Program behavior,	en
dc.relation.page	22
dc.rights.note	同意授權(全球公開)
dc.date.accepted	2010-07-13
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-99-1.pdf	622.53 kB	Adobe PDF	檢視/開啟

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。