Design and Optimization of OpenVX Applications with Halide framework

Bo-Ru Zhao; 趙柏儒

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74828

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	廖世偉
dc.contributor.author	Bo-Ru Zhao	en
dc.contributor.author	趙柏儒	zh_TW
dc.date.accessioned	2021-06-17T09:08:22Z	-
dc.date.available	2024-12-02
dc.date.copyright	2019-12-02
dc.date.issued	2019
dc.date.submitted	2019-11-14
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74828	-
dc.description.abstract	在本研究中，我們研究如何使用特定領域程式語言 – Halide來建構框架以便快速設計和優化OpenVX連通圖。Halide是一款高階影像處理語言，其提供開發者在寫程式時將演算法和排程分離，使開發環境變得更友善，Halide也已被證明是一種用於寫高效能影像處理程式的有效系統。我們利用OpenVX和Halide建構框架以執行影像處理，因為Halide具備OpenVX所缺乏的原語排程，故我們使用Halide來執行OpenVX kernels，此方法使開發者能增加更多開發性並達到更好的效能。我們使用五個實驗來測試，所得到的結果顯示使用Halide搭配OpenVX能顯著提升影像處理及卷積神經網路的效能。	zh_TW
dc.description.abstract	In this study, we investigate how to use Halide, a domain-specific language, to build a framework for fast prototyping and optimization of OpenVX graphs. Halide is a high-level language for image processing pipelines that lets developers separate a program into an algorithm and a schedule, which makes development friendlier. Halide has also proven to be an effective system for authoring high-performance image processing code. We built a framework with OpenVX and Halide to implement an image processing system. Because OpenVX lacks scheduling primitives and Halide provides them, we implemented OpenVX kernels in Halide and ran them within OpenVX graphs. Results from five experiments show that combining Halide with OpenVX can significantly improve the performance of image processing and convolutional neural networks.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T09:08:22Z (GMT). No. of bitstreams: 1 ntu-108-R06922013-1.pdf: 7505472 bytes, checksum: 7594f1e5fc7963b4c21f0c78a884da89 (MD5) Previous issue date: 2019	en
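dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Research Background; 1.2 Research Motivation; 1.3 Research Method; Chapter 2 Background and Literature Review; 2.1 OpenVX; 2.2 Halide; 2.2.1 Producer-Consumer Locality Scheduling; 2.2.2 Input Data Reuse Scheduling; 2.3 Research Background; Chapter 3 Experimental Design and Optimization Methods; 3.1 The Schedule Optimization Problem; 3.2 Scheduling Algorithm; 3.2.1 Function Preprocessing; 3.2.2 Function Grouping and Tiling; 3.2.3 Function Inlining; 3.2.4 Final Function Generation; 3.3 Experimental Design; Chapter 4 Experimental Results; 4.1 Comparison of Data Access Patterns in OpenVX and Halide; 4.2 Replacing a Single OpenVX Kernel with Halide; 4.3 Implementing Kernel Merging on OpenVX with Halide; 4.4 Optimizing Graphs on OpenVX with Halide; 4.5 Optimizing OpenVX NNE with Halide; Chapter 5 Conclusion; References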
dc.language.iso	zh-TW
dc.subject	Halide	zh_TW
dc.subject	影像處理	zh_TW
dc.subject	卷積神經網路	zh_TW
dc.subject	OpenVX	zh_TW
dc.subject	OpenVX	en
dc.subject	Halide	en
dc.subject	image processing	en
dc.subject	convolutional neural networks	en
dc.title	使用Halide框架設計和優化OpenVX應用	zh_TW
dc.title	Design and Optimization of OpenVX Applications with Halide framework	en
dc.type	Thesis
dc.date.schoolyear	108-1
dc.description.degree	Master's
dc.contributor.oralexamcommittee	洪士灝,游逸平,洪明郁,葉羅堯
dc.subject.keyword	OpenVX, Halide, 影像處理, 卷積神經網路	zh_TW
dc.subject.keyword	OpenVX, Halide, image processing, convolutional neural networks	en
dc.relation.page	41
dc.identifier.doi	10.6342/NTU201904278
dc.rights.note	Licensed for a fee
dc.date.accepted	2019-11-14
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
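Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	7.33 MB	Adobe PDF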

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.