Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70984

Full metadata record (DC field: value [language]):
dc.contributor.advisor: 徐慰中 (Wei-Chung Hsu)
dc.contributor.author: Chih-Yung Liang [en]
dc.contributor.author: 梁智湧 [zh_TW]
dc.date.accessioned: 2021-06-17T04:47:00Z
dc.date.available: 2023-08-01
dc.date.copyright: 2018-08-01
dc.date.issued: 2018
dc.date.submitted: 2018-08-01
dc.identifier.citation:
[1] M. Amini, B. Creusillet, S. Even, R. Keryell, O. Goubier, S. Guelton, J. O. McMahon, F.-X. Pasquier, G. Péan, and P. Villalon. Par4All: From convex array regions to heterogeneous computing. In IMPACT 2012: Second International Workshop on Polyhedral Compilation Techniques, HiPEAC 2012, Paris, France, Jan. 2012.
[2] S. Baghdadi, A. Größlinger, and A. Cohen. Putting automatic polyhedral compilation for GPGPU to work. In Proceedings of the 15th Workshop on Compilers for Parallel Computers (CPC'10), Vienna, Austria, July 2010.
[3] M. M. Baskaran, J. Ramanujam, and P. Sadayappan. Automatic C-to-CUDA code generation for affine programs. In Compiler Construction, 19th International Conference, CC 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010, Proceedings, pages 244–263, 2010.
[4] A. Beletska, W. Bielecki, A. Cohen, M. Palkowski, and K. Siedlecki. Coarse-grained loop parallelization: Iteration space slicing vs affine transformations. Parallel Computing, 37(8):479–497, 2011.
[5] U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, June 7-13, 2008, pages 101–113, 2008.
[6] Google Inc. TensorFlow: Kernel implementations, 2018. https://www.tensorflow.org/extend/architecture.
[7] Google Inc. TensorFlow XLA overview, 2018. https://www.tensorflow.org/performance/xla/.
[8] T. Grosser, A. Größlinger, and C. Lengauer. Polly - performing polyhedral optimizations on a low-level intermediate representation. Parallel Processing Letters, 22(4), 2012.
[9] N. Hallou, E. Rohou, and P. Clauss. Runtime vectorization transformations of binary code. International Journal of Parallel Programming, 45(6):1536–1565, 2017.
[10] C. Lattner and V. S. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In 2nd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2004), 20-24 March 2004, San Jose, CA, USA, pages 75–88, 2004.
[11] S. I. Lee, T. A. Johnson, and R. Eigenmann. Cetus - an extensible compiler infrastructure for source-to-source transformation. In Languages and Compilers for Parallel Computing, 16th International Workshop, LCPC 2003, College Station, TX, USA, October 2-4, 2003, Revised Papers, pages 539–553, 2003.
[12] C. Liao, D. J. Quinlan, J. Willcock, and T. Panas. Semantic-aware automatic parallelization of modern applications using high-level abstractions. International Journal of Parallel Programming, 38(5-6):361–378, 2010.
[13] S. Liu, R. Lo, and F. C. Chow. Loop induction variable canonicalization in parallelizing compilers. In Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, PACT '96, Boston, MA, USA, October 20-23, 1996, pages 228–237, 1996.
[14] F. McMahon. The Livermore Fortran kernels: A computer test of the numerical performance range. Dec 1986.
[15] D. Mikushin, N. Likhogrud, E. Z. Zhang, and C. Bergstrom. KernelGen - the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs. In 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, Phoenix, AZ, USA, May 19-23, 2014, pages 1011–1020, 2014.
[16] B. Pradelle, A. Ketterlin, and P. Clauss. Polyhedral parallelization of binary code. TACO, 8(4):39:1–39:21, 2012.
[17] Radeon Open Compute. ROCm, 2018. https://github.com/RadeonOpenCompute/ROCm.
[18] I. RAS. Graphite-OpenCL: Generate OpenCL code from parallel loops. In GCC Developers' Summit, page 9. Citeseer, 2010.
[19] ROCm Core Technology. Heterogeneous Compute Compiler (HCC), 2016. https://github.com/RadeonOpenCompute/hcc.
[20] P. Rogers. Heterogeneous system architecture overview. In 2013 IEEE Hot Chips 25 Symposium (HCS), Stanford University, CA, USA, August 25-27, 2013, pages 1–41, 2013.
[21] S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado, and F. Catthoor. Polyhedral parallel code generation for CUDA. TACO, 9(4):54:1–54:23, 2013.
[22] T. Yuki and L.-N. Pouchet. PolyBench 4.2, May 2016.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70984
dc.description.abstract [zh_TW]: Heterogeneous System Architecture (HSA) is a hardware architecture for heterogeneous computing proposed by the HSA Foundation. Its HSA Unified Memory Architecture (hUMA) allows data to be shared among heterogeneous devices, and its user-level HSA Queuing Model (hQ) allows programs to be dispatched to different heterogeneous devices at low cost. These features let applications use heterogeneous computing more efficiently. However, most heterogeneous computing today does not benefit from hUMA and hQ, and most applications on the market are still implemented in the traditional sequential execution model.
This thesis builds a fully automatic framework that migrates sequential applications to the HSA platform, combining polyhedral memory-dependence analysis, staged dispatching prediction, and memory-coalescing optimization. The framework also exploits hUMA and hQ to achieve low-overhead job dispatching on HSA-compliant machines. On an HSA-compliant AMD Carrizo machine, the framework speeds up a sequential application by up to 8.66x on the same machine. In many cases where the workload is conventionally considered too small to benefit from non-HSA heterogeneous computing, the framework still delivers a measurable speedup; moreover, the speedup it achieves on the same Carrizo machine sometimes even exceeds that of manual migration targeting either HSA or non-HSA platforms. The framework thus enables many existing applications written in the sequential model to gain performance from HSA heterogeneous computing.
dc.description.abstract [en]: Heterogeneous System Architecture (HSA) is a hardware architecture for heterogeneous computing proposed by the HSA Foundation. Its Unified Memory Architecture (hUMA) enables data sharing between heterogeneous devices, and its user-level Queuing Model (hQ) enables low-overhead kernel launching. With these features, applications can enjoy more efficient and effective heterogeneous computing. However, most of today's heterogeneous-computing applications do not leverage the hUMA and hQ features, and the majority of applications on the market are implemented in traditional sequential models.
This thesis builds a fully automatic framework that migrates sequential applications to HSA. The framework includes polyhedral-guided memory-aliasing analysis, a staged dispatching predictor, and a memory-coalescing optimization. It also takes advantage of hUMA and hQ to achieve low-overhead job dispatching on HSA-compliant systems. On an HSA-compliant AMD Carrizo machine, a sequential application run through our framework can be up to 8.66x faster than before. In several cases where workloads are considered too small to benefit from conventional, non-HSA heterogeneous computing, our framework still delivers significant speedups. In addition, the performance obtained through our framework can sometimes exceed the gain from manual tuning for both HSA and non-HSA platforms running on the same Carrizo machine. With this framework, many existing applications coded in traditional sequential models can get a performance boost from HSA-based heterogeneous computing.
dc.description.provenance [en]: Made available in DSpace on 2021-06-17T04:47:00Z (GMT). No. of bitstreams: 1. ntu-107-R05944012-1.pdf: 903668 bytes, checksum: 1ecc72fa7bb0ef0be2bfec55d7be8538 (MD5). Previous issue date: 2018.
dc.description.tableofcontents:
Acknowledgements (in Chinese) iii
Acknowledgements v
Abstract (in Chinese) vii
Abstract ix
1 Introduction 1
2 Related Works 5
3 Background 7
3.1 SVM Granularity in the OpenCL 2.0 Specification 7
3.1.1 Coarse-grained Buffer SVM 7
3.1.2 Fine-grained Buffer SVM 8
3.1.3 Fine-grained System SVM 8
3.2 Heterogeneous System Architecture (HSA) 8
3.2.1 HSA Unified Memory Architecture (hUMA) 9
3.2.2 HSA Queuing Model (hQ) 9
3.2.3 HSA-enabled Programming Framework 12
4 Design 13
4.1 Loop Analysis 15
4.1.1 Invariant Iteration Count 15
4.1.2 Cross-iteration Dependence 15
4.2 Runtime Execution Flow 16
4.3 GPU Kernel Construction and Optimization 17
4.3.1 Transforming a Loop Body to a GPU Kernel Function 17
4.3.2 Machine-dependent Optimization and Code Generation 18
4.4 Staged Dispatching Predictor 19
4.4.1 Compilation-stage Prediction 20
4.4.2 Runtime-stage Prediction 20
5 Evaluation 25
5.1 Experiment Environment and Benchmark Suite 25
5.2 Performance Improvement and Dispatching Predictor 26
5.3 Overhead of Runtime-stage Prediction 30
6 Conclusion 31
Bibliography 33
dc.language.iso: en
dc.subject: 共享虛擬記憶體 (shared virtual memory) [zh_TW]
dc.subject: 自動轉移 (automatic migration) [zh_TW]
dc.subject: 異質系統架構 (Heterogeneous System Architecture) [zh_TW]
dc.subject: 細顆粒系統共享虛擬記憶體 (fine-grained system SVM) [zh_TW]
dc.subject: automatic migration [en]
dc.subject: Heterogeneous System Architecture [en]
dc.subject: shared virtual memory [en]
dc.subject: fine-grained system SVM [en]
dc.title: 將循序程式自動轉移至異質系統架構 (Automatically Migrating Sequential Applications to Heterogeneous System Architecture) [zh_TW]
dc.title: Automatically Migrating Sequential Applications to Heterogeneous System Architecture [en]
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 張鈞法 (Chun-Fa Chang), 洪鼎詠 (Ding-Yong Hong), 吳真貞 (Jan-Jan Wu)
dc.subject.keyword: 自動轉移, 異質系統架構, 共享虛擬記憶體, 細顆粒系統共享虛擬記憶體 [zh_TW]
dc.subject.keyword: automatic migration, Heterogeneous System Architecture, shared virtual memory, fine-grained system SVM [en]
dc.relation.page: 35
dc.identifier.doi: 10.6342/NTU201802161
dc.rights.note: 有償授權 (authorized for a fee)
dc.date.accepted: 2018-08-01
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
ntu-107-1.pdf — 882.49 kB, Adobe PDF (not authorized for public access)


All items in this system, unless otherwise indicated in their copyright terms, are protected by copyright, with all rights reserved.
