Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52955
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 廖世偉 | |
dc.contributor.author | Shao-Yun Kuang | en |
dc.contributor.author | 鄺劭昀 | zh_TW |
dc.date.accessioned | 2021-06-15T16:35:55Z | - |
dc.date.available | 2017-08-31 | |
dc.date.copyright | 2015-08-31 | |
dc.date.issued | 2015 | |
dc.date.submitted | 2015-08-12 | |
dc.identifier.citation | [1] Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Fredo Durand. Decoupling algorithms from schedules for easy optimization of image processing pipelines. [2] Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Fredo Durand, Saman Amarasinghe. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation. [3] Prasanna Pandit, R. Govindarajan. Fluidic kernels: Cooperative execution of OpenCL programs on multiple heterogeneous devices. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization. [4] Khronos Group. OpenCL - the open standard for parallel programming of heterogeneous systems. 2013. [5] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Proceedings of the IEEE International Conference on Computer. [6] Sylvain Paris, Samuel W. Hasinoff, Jan Kautz. Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid. 2015. [7] Nvidia. New Top500 list: 4x more GPU supercomputers. 2012. [8] L. Nyland, M. Harris, J. Prins. Fast N-body simulation with CUDA. 2012. [9] V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey. Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. 2012. [10] Jonathan Ragan-Kelley. Decoupling Algorithms from the Organization of Computation for High Performance Image Processing: The design and implementation of the Halide language and compiler. PhD thesis, MIT. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52955 | - |
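dc.description.abstract | In this thesis, we build a cooperative computing framework as an add-on to Halide, using Halide's strengths to create a better heterogeneous computing environment. The framework provides two workload-distribution methods, which offer different degrees of speedup. We measure the speedup each method provides and how it varies with input size. In our experiments, using our framework with optimized Halide CPU and GPU code yields speedups of 1.21x, 1.55x, and 1.16x on the Nexus 7, an Intel CPU with an ATI GPU, and an Intel CPU with an Nvidia GPU, respectively. | zh_TW |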
dc.description.abstract | In this thesis, we create a synergistic framework as a Halide extension to build a better heterogeneous computing environment that takes advantage of Halide. Our framework supports two methods, dynamic dispatch and static dispatch, for assigning different workloads to the CPU and GPU; the two methods yield different levels of speedup. We measure both dispatch methods with different input sizes on a mobile device, an ATI GPU, and an NVIDIA GPU. In our experiments, using our framework with optimized Halide CPU and GPU functions achieves a 1.21x speedup on the Nexus 7, 1.55x on an Intel CPU with an ATI GPU, and 1.16x on an Intel CPU with an Nvidia GPU. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T16:35:55Z (GMT). No. of bitstreams: 1 ntu-104-R02922099-1.pdf: 776550 bytes, checksum: 66e9d1bc46a613a73d52d3e5a3073a9a (MD5) Previous issue date: 2015 | en |
dc.description.tableofcontents | Contents: Acknowledgements; Abstract; Abstract (in Chinese); 1 Motivation; 2 Background and Related Work; 2.1 Introduction of Halide; 2.2 Introduction of OpenCL; 2.3 Fluidic Kernels; 3 Implementation; 3.1 Overview; 3.2 Buffer setup; 3.3 Static Dispatch; 3.4 Dynamic Dispatch; 3.5 Usage of our framework; 4 Experiments; 4.1 Static Dispatch; 4.2 Dynamic Dispatch; 4.3 Profile OpenCL Runtime information; 4.4 CUDA and OpenCL; 4.5 Different Input Size; 4.6 OpenCL API execution time with larger size; 5 Conclusion; Bibliography | |
dc.language.iso | en | |
dc.title | 基於中央處理器及圖形處理單元之協同運算 (Cooperative Computing Based on the CPU and GPU) | zh_TW |
dc.title | A Synergistic Framework base on Halide | en |
dc.type | Thesis | |
dc.date.schoolyear | 103-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 徐慰中,梁伯嵩 | |
dc.subject.keyword | cooperative computing, central processing unit, graphics processing unit, heterogeneous computing | zh_TW |
dc.subject.keyword | synergistic computing, CPU, GPU, heterogeneous computing | en |
dc.relation.page | 30 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2015-08-12 | |
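dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
Appears in collections: | Department of Computer Science and Information Engineering |
Files in this item:
File | Size | Format | |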
---|---|---|---|
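ntu-104-1.pdf (not currently authorized for public access) | 758.35 kB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.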