Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51468
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 王勝德 (Sheng-De Wang)
dc.contributor.author: Ming-Feng Wei [en]
dc.contributor.author: 魏名鋒 [zh_TW]
dc.date.accessioned: 2021-06-15T13:35:19Z
dc.date.available: 2018-02-16
dc.date.copyright: 2016-02-16
dc.date.issued: 2016
dc.date.submitted: 2016-01-28
dc.identifier.citation: [1] KHRONOS OPENCL WORKING GROUP, et al. OpenCL-The open standard for parallel programming of heterogeneous systems. [Online] http://www.khronos.org/opencl, 2011.
[2] AMD, AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK). URL http://developer.amd.com/sdks/amdappsdk/.
[3] AMD, AMD. Accelerated parallel processing: OpenCL optimization guide. URL http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/documentation/, 2015.
[4] TOMPSON, Jonathan; SCHLACHTER, Kristofer. An introduction to the opencl programming model. Person Education, 2012.
[5] KYRIAZIS, George. Heterogeneous system architecture: A technical review. AMD Fusion Developer Summit, 2012.
[6] SU, Lisa T. Architecting the future through heterogeneous computing. In: Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. IEEE, 2013. p. 8-11.
[7] JANG, Byunghyun; CHOI, Minsu; KIM, Kyung Ki. Algorithmic GPGPU memory optimization. In: SoC Design Conference (ISOCC), 2013 International. IEEE, 2013. p. 154-157.
[8] JANG, Byunghyun, et al. Exploiting memory access patterns to improve memory performance in data-parallel architectures. Parallel and Distributed Systems, IEEE Transactions on, 2011, 22.1: 105-118.
[9] JANG, Byunghyun. Evaluation and enhancement of memory efficiency targeting general-purpose computations on scalable data-parallel GPU architectures. PhD Thesis. Department of Electrical and Computer Engineering, Northeastern University. 2011.
[10] AMD, AMD Graphics Cores Next (GCN) architecture whitepaper. URL https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf , 2012.
[11] CHE, Shuai, et al. Rodinia: A benchmark suite for heterogeneous computing. In: Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on. IEEE, 2009. p. 44-54.
[12] CHE, Shuai, et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads. In: Workload Characterization (IISWC), 2010 IEEE International Symposium on. IEEE, 2010. p. 1-11.
[13] KAELI, David, et al. Heterogeneous Computing with OpenCL 2.0, Morgan Kaufmann, 2015.
[14] KHAN, Faiz, et al. Using javascript and webcl for numerical computations: A comparative study of native and web technologies. In: Proceedings of the 10th ACM Symposium on Dynamic languages. ACM, 2014. p. 91-102.
[15] APU, AMD. 101: All about AMD Fusion Accelerated Processing Units. 2011.
[16] LEON, Steven J. Linear Algebra with Applications, 8th Edition. Pearson, 2009.
[17] LEUNG, Shun-Tak; ZAHORJAN, John. Optimizing data locality by array restructuring. Department of Computer Science and Engineering, University of Washington, 1995.
[18] WOLF, Michael E.; LAM, Monica S. A loop transformation theory and an algorithm to maximize parallelism. Parallel and Distributed Systems, IEEE Transactions on, 1991, 2.4: 452-471.
[19] LIM, Amy W.; LAM, Monica S. Maximizing parallelism and minimizing synchronization with affine transforms. In: Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. ACM, 1997. p. 201-214.
[20] KREYSZIG, Erwin. Advanced engineering mathematics. Wiley, 2011.
[21] BANERJEE, Utpal. Data dependence in ordinary programs. 1976. PhD Thesis. Department of Computer Science, University of Illinois at Urbana-Champaign.
[22] WOLFE, Michael J. Techniques for Improving the Inherent Parallelism in Programs, University of Illinois at Urbana-Champaign, 1978. PhD Thesis. MS Thesis.
[23] GHOSH, Somnath; MARTONOSI, Margaret; MALIK, Sharad. Cache miss equations: An analytical representation of cache misses. In: Proceedings of the 11th international conference on Supercomputing. ACM, 1997. p. 317-324.
[24] CHOO, Kyoshin; PANLENER, William; JANG, Byunghyun. Understanding and Optimizing GPU Cache Memory Performance for Compute Workloads. In: Parallel and Distributed Computing (ISPDC), 2014 IEEE 13th International Symposium on. IEEE, 2014. p. 189-196.
[25] GASTER, Benedict R.; MATTSON, Tim. OpenCL: An Introduction to Heterogeneous Programming for HPC. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. 2010.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51468
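dc.description.abstract: Global memory accesses often incur latencies of hundreds of cycles, so the performance of applications running on heterogeneous many-core systems can drop significantly as the number of global memory accesses grows. This thesis proposes a mathematical model of memory access that captures the global memory accesses made by a group of threads, together with a metric that measures the degree of inefficient serial access in the GPU memory system. Based on a series of analyses of global memory access, we propose an approach to the memory access problem on GPUs. Evaluation results for a variety of kernels show that, without any change to the source code, kernels run with the work-group size we suggest perform better than with the size provided by the vendor. [translated from zh_TW]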
dc.description.abstract: Global memory accesses typically incur latencies of hundreds of cycles, so the performance of heterogeneous applications can degrade significantly as global memory accesses increase. In this thesis, we present a mathematical model that captures the global memory accesses made within a group of threads, and a metric that identifies the degree of inefficient serial accesses in the GPU memory system. Based on an analysis of the serial accesses in the memory system caused by global memory accesses within a work-group and among work-groups, we propose an approach to the memory access problem in GPUs. Evaluation on various kernel functions shows that kernels running with the work-group size suggested by our methodology outperform those running with the work-group size provided by hardware vendors. Heterogeneous applications executing on GPUs can therefore gain better performance without any code modification, solely through memory access optimization with the suggested work-group sizing. [en]
dc.description.provenance: Made available in DSpace on 2021-06-15T13:35:19Z (GMT). No. of bitstreams: 1
ntu-105-R02921043-1.pdf: 3064821 bytes, checksum: 56c30d9229eb5395f6f8e814844289cc (MD5)
Previous issue date: 2016 [en]
dc.description.tableofcontents:
Oral Examination Committee Certification
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Heterogeneous Computing
1.2 Global Memory Issues
1.3 Contributions of This Thesis
1.4 Organization of This Thesis
Chapter 2 Related Work
Chapter 3 Memory Access Modeling
3.1 Parallel Model Incorporating Thread Mapping
3.2 Memory Access Space
3.3 Global Memory Metric
Chapter 4 Work Group Sizing
4.1 Resource Limit and Search Space
4.2 Accessing Analyses
4.2.1 Intra Work Group Analyses
4.2.2 Inter Work Group Analyses
4.3 Work Group Sizing Algorithm
Chapter 5 Experimental Results
5.1 Experimental Setup
5.2 Evaluation of Work Group Sizing
Chapter 6 Conclusions and Future Work
REFERENCE
dc.language.iso: en
dc.subject: Memory Access Optimization (記憶體存取最佳化) [zh_TW]
dc.subject: Memory Access Optimization [en]
dc.title: 資料平行GPU架構之記憶體存取最佳化 [zh_TW]
dc.title: Memory Access Optimization for Data-parallel GPU Architectures [en]
dc.type: Thesis
dc.date.schoolyear: 104-1
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 雷欽隆 (Chin-Laung Lei), 徐慰中 (Wei-Chung Hsu), 洪士灝 (Shih-Hao Hung)
dc.subject.keyword: Memory Access Optimization (記憶體存取最佳化) [zh_TW]
dc.subject.keyword: Memory Access Optimization [en]
dc.relation.page: 50
dc.rights.note: Paid authorization
dc.date.accepted: 2016-01-28
dc.contributor.author-college: College of Electrical Engineering and Computer Science [zh_TW]
dc.contributor.author-dept: Graduate Institute of Electrical Engineering [zh_TW]
Appears in collections: Department of Electrical Engineering

Files in this item:
File, Size, Format
ntu-105-1.pdf (not authorized for public access), 2.99 MB, Adobe PDF
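Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.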
