Dynamic Tuning of Applications using Restricted Transactional Memory

Shih-Kai Lin; 林士凱

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69406

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	徐慰中 (Wei-Chung Hsu)
dc.contributor.author	Shih-Kai Lin	en
dc.contributor.author	林士凱	zh_TW
dc.date.accessioned	2021-06-17T03:14:53Z	-
dc.date.available	2019-07-19
dc.date.copyright	2018-07-19
dc.date.issued	2018
dc.date.submitted	2018-07-09
dc.identifier.citation	[1] Y. Afek, A. Levy, and A. Morrison. Programming with hardware lock elision. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, Shenzhen, China, February 23-27, 2013, pages 295–296, 2013.
[2] C. S. Ananian, K. Asanovic, B. C. Kuszmaul, C. E. Leiserson, and S. Lie. Unbounded transactional memory. In High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on, pages 316–327. IEEE, 2005.
[3] C. S. Ananian and M. Rinard. Efficient object-based software transactions. In Proceedings, Workshop on Synchronization and Concurrency in Object-Oriented Languages, San Diego, CA. Citeseer, 2005.
[4] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Vancouver, British Columbia, Canada, June 18-21, 2000, pages 1–12, 2000.
[5] L. Baraz, T. Devor, O. Etzion, S. Goldenberg, A. Skaletsky, Y. Wang, and Y. Zemach. IA-32 execution layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems. In Proceedings of the 36th Annual International Symposium on Microarchitecture, San Diego, CA, USA, December 3-5, 2003, pages 191–204, 2003.
[6] F. Bellard. QEMU, a fast and portable dynamic translator. In Proceedings of the FREENIX Track: 2005 USENIX Annual Technical Conference, April 10-15, 2005, Anaheim, CA, USA, pages 41–46, 2005.
[7] I. Calciu, T. Shpeisman, G. Pokam, and M. Herlihy. Improved single global lock fallback for best-effort hardware transactional memory. In Transact 2014 Workshop. ACM, 2014.
[8] E. G. Cota, P. Bonzini, A. Bennée, and L. P. Carloni. Cross-ISA machine emulation for multicores. In Proceedings of the 2017 International Symposium on Code Generation and Optimization, CGO 2017, Austin, TX, USA, February 4-8, 2017, pages 210–220, 2017.
[9] J. C. Dehnert, B. Grant, J. P. Banning, R. Johnson, T. Kistler, A. Klaiber, and J. Mattson. The Transmeta Code Morphing software: Using speculation, recovery, and adaptive retranslation to address real-life challenges. In 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 23-26 March 2003, San Francisco, CA, USA, pages 15–24, 2003.
[10] D. Dice, Y. Lev, M. Moir, and D. Nussbaum. Early experience with a commercial hardware transactional memory implementation. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2009, Washington, DC, USA, March 7-11, 2009, pages 157–168, 2009.
[11] N. Diegues and P. Romano. Self-tuning Intel transactional synchronization extensions. In 11th International Conference on Autonomic Computing, ICAC ’14, Philadelphia, PA, USA, June 18-20, 2014, pages 209–219, 2014.
[12] M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, CA, May 1993, pages 289–300, 1993.
[13] R. J. Hookway and M. A. Herdeg. DIGITAL fx!32: Combining emulation and binary translation. Digital Technical Journal, 9(1), 1997.
[14] Intel. Intel 64 and IA-32 architectures software developer’s manual. 2016.
[15] J. Lu, H. Chen, P. Yew, and W. Hsu. Design and implementation of a lightweight dynamic optimization system. J. Instruction-Level Parallelism, 6, 2004.
[16] C. Luk, R. S. Cohn, R. Muth, H. Patil, A. Klauser, P. G. Lowney, S. Wallace, V. J. Reddi, and K. M. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005, pages 190–200, 2005.
[17] V. J. Marathe, W. N. Scherer, and M. L. Scott. Adaptive software transactional memory. In International Symposium on Distributed Computing, pages 354–368. Springer, 2005.
[18] C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In 4th International Symposium on Workload Characterization (IISWC 2008), Seattle, Washington, USA, September 14-16, 2008, pages 35–46, 2008.
[19] M. Moir, K. Moore, and D. Nussbaum. The adaptive transactional memory test platform: a tool for experimenting with transactional code for Rock (poster). In SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, Munich, Germany, June 14-16, 2008, page 362, 2008.
[20] K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. LogTM: log-based transactional memory. In 12th International Symposium on High-Performance Computer Architecture, HPCA-12 2006, Austin, Texas, February 11-15, 2006, pages 254–265, 2006.
[21] T. Nakaike, R. Odaira, M. Gaudet, M. M. Michael, and H. Tomari. Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In Proceedings of the 42nd Annual International Symposium on Computer Architecture, Portland, OR, USA, June 13-17, 2015, pages 144–157, 2015.
[22] N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007, pages 89–100, 2007.
[23] R. Quislant, E. Gutiérrez, E. L. Zapata, and O. G. Plata. Insights into the fallback path of best-effort hardware transactional memory systems. In Euro-Par 2016: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Grenoble, France, August 24-26, 2016, Proceedings, pages 251–263, 2016.
[24] N. Shavit and D. Touitou. Software transactional memory. Distributed Computing, 10(2):99–116, 1997.
[25] A. Wang, M. Gaudet, P. Wu, J. N. Amaral, M. Ohmacht, C. Barton, R. Silvera, and M. M. Michael. Evaluation of Blue Gene/Q hardware support for transactional memories. In International Conference on Parallel Architectures and Compilation Techniques, PACT ’12, Minneapolis, MN, USA - September 19 - 23, 2012, pages 127–136, 2012.
[26] C. Wang, S. Hu, H. Kim, S. R. Nair, M. B. Jr., Z. Ying, and Y. Wu. StarDBT: An efficient multi-platform dynamic binary translation system. In Advances in Computer Systems Architecture, 12th Asia-Pacific Conference, ACSAC 2007, Seoul, Korea, August 23-25, 2007, Proceedings, pages 4–15, 2007.
[27] R. M. Yoo, C. J. Hughes, K. Lai, and R. Rajwar. Performance evaluation of Intel® transactional synchronization extensions for high-performance computing. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC’13, Denver, CO, USA - November 17 - 21, 2013, pages 19:1–19:11, 2013.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69406	-
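dc.description.abstract	Transactional Synchronization Extensions (TSX) is the transactional memory implemented on Intel's 4th-generation processors. It provides two programming interfaces: Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM). HLE is easier to program and is backward compatible, so it can run on hardware without TSX support; RTM offers greater flexibility and scalability. Previous studies have shown that critical sections protected by RTM, combined with a well-designed retry mechanism, usually outperform HLE. In short, because HLE is easier to use, more parallel applications may be written with it, yet switching to RTM may deliver better performance. We propose a framework implemented on QEMU that transforms HLE instructions into RTM code fragments at run time and tunes their performance dynamically. Compared with the original HLE execution, our implementation based on dynamic binary translation achieves an average speedup of 1.15x with four threads and 1.56x with eight threads. Owing to the scalability of RTM, the performance gain becomes more pronounced as the number of threads increases.	zh_TW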
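The retry mechanism mentioned in the abstract decides, after each RTM abort, whether to re-execute the critical section transactionally or fall back to the lock. The sketch below models that decision in Python so it runs without TSX hardware. The status bits mirror the abort-status layout documented in the Intel SDM (exposed in C as `_XABORT_*` by `<immintrin.h>`, with the status returned by `_xbegin()`); the policy itself, the `LOCK_BUSY` code, and the scripted driver are hypothetical illustrations, not the mechanism used in the thesis.

```python
from enum import Enum

# RTM abort-status bits (EAX after an abort), as documented in the Intel SDM.
# In C, <immintrin.h> exposes the same values as _XABORT_EXPLICIT, etc.
ABORT_EXPLICIT = 1 << 0  # aborted by an XABORT instruction
ABORT_RETRY = 1 << 1     # hardware hints that a retry may succeed
ABORT_CONFLICT = 1 << 2  # memory conflict with another logical processor
ABORT_CAPACITY = 1 << 3  # transactional read/write set overflowed

COMMITTED = None  # hypothetical marker: the attempt committed
LOCK_BUSY = 0xFF  # hypothetical XABORT code: the fallback lock was held


def xabort_code(status: int) -> int:
    """Return the 8-bit XABORT argument kept in bits 31:24 of the status."""
    return (status >> 24) & 0xFF


class Decision(Enum):
    RETRY = "retry"  # re-execute the critical section transactionally
    LOCK = "lock"    # take the fallback lock and run non-transactionally


def on_abort(status: int, attempt: int, max_attempts: int) -> Decision:
    """Hypothetical policy applied after the attempt-th abort (1-based)."""
    if attempt >= max_attempts:
        return Decision.LOCK   # retry budget exhausted
    if status & ABORT_CAPACITY:
        return Decision.LOCK   # footprint too large; retrying rarely helps
    if status & ABORT_EXPLICIT and xabort_code(status) == LOCK_BUSY:
        return Decision.RETRY  # the fallback lock was held; worth retrying
    if status & (ABORT_RETRY | ABORT_CONFLICT):
        return Decision.RETRY  # likely a transient conflict
    return Decision.LOCK       # unknown cause: be conservative


def run_critical_section(outcomes, max_attempts: int):
    """Replay scripted outcomes (COMMITTED or an abort status) per attempt.

    Returns (attempts_made, used_fallback_lock).
    """
    for n, status in enumerate(outcomes, start=1):
        if status is COMMITTED:
            return n, False
        if on_abort(status, n, max_attempts) is Decision.LOCK:
            return n, True  # real code would acquire, run the body, release
    raise RuntimeError("outcomes ran out before the section completed")


print(run_critical_section([ABORT_CONFLICT | ABORT_RETRY, COMMITTED], 5))  # (2, False)
print(run_critical_section([ABORT_CAPACITY], 5))  # (1, True)
print(run_critical_section([ABORT_CONFLICT] * 3, 3))  # (3, True)
```

Treating capacity aborts as non-retryable and conflict aborts as retryable is one common heuristic; a real retry mechanism would also wait for the fallback lock to be free before re-entering the transaction.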
dc.description.abstract	Transactional Synchronization Extensions (TSX) provide hardware Transactional Memory (TM) on 4th-generation Intel Core processors. Two programming interfaces, Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM), are provided for developing software with TSX. HLE is easy to use and remains backward compatible with processors that lack TSX support, while RTM is more flexible and scalable. Previous research has shown that critical sections protected by RTM, with a well-designed retry mechanism as the fallback code path, can often outperform HLE. Because HLE is easier to use, more parallel programs are likely to be written with it, even though RTM may deliver better performance. To combine the productivity of HLE with the performance of RTM, we present a framework built on QEMU that dynamically transforms the HLE instructions in an application binary into RTM code fragments and tunes them adaptively at run time. Compared with HLE execution, our prototype achieves an average speedup of 1.15x with 4 threads and 1.56x with 8 threads. Because RTM scales better, the speedup becomes more significant as the number of threads increases.	en
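The abstract describes adaptive tuning on the fly, but this record does not give the tuning policy. One plausible shape, offered only as a hypothetical sketch, is a small controller that periodically adjusts the retry budget of each critical section from its observed outcomes; in a DBT framework, one instance could be kept per translated HLE critical section. The class name, window size, thresholds, and doubling/halving rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RetryTuner:
    """Hypothetical per-critical-section tuner for the RTM retry budget.

    Every `window` executions it compares how often retries rescued the
    transaction with how often execution still ended on the fallback lock,
    then doubles or halves `max_attempts` within [lo, hi].
    """
    max_attempts: int = 5
    lo: int = 1
    hi: int = 16
    window: int = 100
    runs: int = 0
    fallbacks: int = 0     # executions that finished on the lock path
    late_commits: int = 0  # commits that needed more than one attempt

    def record(self, attempts: int, used_lock: bool) -> None:
        """Log one execution of the critical section."""
        self.runs += 1
        if used_lock:
            self.fallbacks += 1
        elif attempts > 1:
            self.late_commits += 1
        if self.runs == self.window:
            self._adjust()

    def _adjust(self) -> None:
        fallback_rate = self.fallbacks / self.runs
        retry_payoff = self.late_commits / self.runs
        if retry_payoff > fallback_rate:
            # Retries often rescue the transaction: allow more of them.
            self.max_attempts = min(self.hi, self.max_attempts * 2)
        elif fallback_rate > 0.5:
            # Mostly ending on the lock anyway: stop wasting attempts.
            self.max_attempts = max(self.lo, self.max_attempts // 2)
        # Start a fresh observation window.
        self.runs = self.fallbacks = self.late_commits = 0


tuner = RetryTuner(window=10)
for _ in range(6):
    tuner.record(attempts=3, used_lock=False)  # committed on the 3rd attempt
for _ in range(4):
    tuner.record(attempts=1, used_lock=False)  # committed immediately
print(tuner.max_attempts)  # 10
for _ in range(10):
    tuner.record(attempts=10, used_lock=True)  # always fell back to the lock
print(tuner.max_attempts)  # 5
```

Multiplicative doubling and halving reacts quickly to phase changes in contention, and keeping separate counters per critical section lets hot and cold sections settle on different budgets.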
dc.description.provenance	Made available in DSpace on 2021-06-17T03:14:53Z (GMT). No. of bitstreams: 1 ntu-107-R05922043-1.pdf: 578561 bytes, checksum: 8bde84a711a28773005d8d7516cc74a5 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
2 Background
2.1 Intel Transactional Memory System
2.1.1 Hardware Lock Elision
2.1.2 Restricted Transactional Memory
2.1.3 Retry Mechanism with RTM
2.2 Dynamic Binary Translation
3 HLE-to-RTM Transformation
3.1 Transformation Method
3.2 Dynamic Tuning of Restricted Transactional Memory
4 Evaluation
4.1 Experimental Setup
4.2 Results of HLE-to-RTM Transformation
4.3 Native Performance Results with RTM
5 Related Work
6 Conclusion
Bibliography
dc.language.iso	en
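dc.subject	Hardware Transactional Memory	zh_TW
dc.subject	Transactional Synchronization Extensions	zh_TW
dc.subject	Dynamic Binary Translation	zh_TW
dc.subject	Retry Mechanism	zh_TW
dc.subject	Dynamic Performance Tuning	zh_TW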
dc.subject	Hardware Transactional Memory	en
dc.subject	Dynamic Binary Translation	en
dc.subject	Retry Mechanism	en
dc.subject	Dynamic Tuning	en
dc.subject	Intel Transactional Synchronization Extensions	en
dc.title	Dynamic Performance Tuning of Applications Using Restricted Transactional Memory	zh_TW
dc.title	Dynamic Tuning of Applications using Restricted Transactional Memory	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	張鈞法 (Chun-Fa Chang), 吳真貞 (Jan-Jan Wu), 洪鼎詠 (Ding-Yong Hong)
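dc.subject.keyword	Hardware Transactional Memory, Transactional Synchronization Extensions, Dynamic Performance Tuning, Retry Mechanism, Dynamic Binary Translation	zh_TW
dc.subject.keyword	Hardware Transactional Memory, Intel Transactional Synchronization Extensions, Dynamic Tuning, Retry Mechanism, Dynamic Binary Translation	en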
dc.relation.page	28
dc.identifier.doi	10.6342/NTU201800901
dc.rights.note	Paid license
dc.date.accepted	2018-07-10
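dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW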
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	565 kB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.