Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59494

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳良基(Liang-Gee Chen) | |
| dc.contributor.author | Yu-Sheng Wu | en |
| dc.contributor.author | 吳昱陞 | zh_TW |
| dc.date.accessioned | 2021-06-16T09:25:34Z | - |
| dc.date.available | 2020-08-24 | |
| dc.date.copyright | 2020-08-24 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-08-14 | |
| dc.identifier.citation | 1. CHANG, Jia-Ren; CHEN, Yong-Sheng. Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. p. 5410-5418. 2. CHEN, Yu-Hsin; EMER, Joel; SZE, Vivienne. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Computer Architecture News, 2016, 44.3: 367-379. 3. LEE, Jinsu, et al. 7.7 LNPU: A 25.3 TFLOPS/W sparse deep-neural-network learning processor with fine-grained mixed precision of FP8-FP16. In: 2019 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2019. p. 142-144. 4. ALWANI, Manoj, et al. Fused-layer CNN accelerators. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016. p. 1-12. 5. LYM, Sangkug, et al. Mini-batch serialization: CNN training with inter-layer data reuse. arXiv preprint arXiv:1810.00307, 2018. 6. LECUN, Yann; BENGIO, Yoshua; HINTON, Geoffrey. Deep learning. Nature, 2015, 521.7553: 436-444. 7. SONG, Mingcong, et al. In-situ AI: Towards autonomous and incremental deep learning for IoT systems. In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2018. p. 92-103. 8. LU, Cheng-Hsun; WU, Yi-Chung; YANG, Chia-Hsiang. A 2.25 TOPS/W fully-integrated deep CNN learning processor with on-chip training. In: 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC). IEEE, 2019. p. 65-68. 9. CHOI, Seungkyu, et al. A 47.4 µJ/epoch trainable deep convolutional neural network accelerator for in-situ personalization on smart devices. In: 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC). IEEE, 2019. p. 57-60. 10. CHOI, Seungkyu, et al. An optimized design technique of low-bit neural network training for personalization on IoT devices. In: Proceedings of the 56th Annual Design Automation Conference 2019. 2019. p. 1-6. 11. HAN, Song, et al. Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems. 2015. p. 1135-1143. 12. RASTEGARI, Mohammad, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks. In: European Conference on Computer Vision. Springer, Cham, 2016. p. 525-542. 13. COURBARIAUX, Matthieu; BENGIO, Yoshua; DAVID, Jean-Pierre. BinaryConnect: Training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems. 2015. p. 3123-3131. 14. LI, Fengfu; ZHANG, Bo; LIU, Bin. Ternary weight networks. arXiv preprint arXiv:1605.04711, 2016. 15. LIN, Chien-Hung, et al. 7.1 A 3.4-to-13.3 TOPS/W 3.6 TOPS dual-core deep-learning accelerator for versatile AI applications in 7nm 5G smartphone SoC. In: 2020 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2020. p. 134-136. 16. HAMZAH, Rostam Affendi; IBRAHIM, Haidi. Literature survey on stereo vision disparity map algorithms. Journal of Sensors, 2016. 17. SCHARSTEIN, Daniel; SZELISKI, Richard. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 2002, 47.1-3: 7-42. 18. ŽBONTAR, Jure; LECUN, Yann. Stereo matching by training a convolutional neural network to compare image patches. The Journal of Machine Learning Research, 2016, 17.1: 2287-2318. 19. GEIGER, Andreas; LENZ, Philip; URTASUN, Raquel. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012. p. 3354-3361. 20. MENZE, Moritz; GEIGER, Andreas. Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. p. 3061-3070. 21. SCHARSTEIN, Daniel, et al. High-resolution stereo datasets with subpixel-accurate ground truth. In: German Conference on Pattern Recognition. Springer, Cham, 2014. p. 31-42. 22. MAYER, Nikolaus, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 4040-4048. 23. PANG, Jiahao, et al. Cascade residual learning: A two-stage convolutional neural network for stereo matching. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. 2017. p. 887-895. 24. LIANG, Zhengfa, et al. Learning for disparity estimation through feature constancy. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. p. 2811-2820. 25. KHAMIS, Sameh, et al. StereoNet: Guided hierarchical refinement for real-time edge-aware depth prediction. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018. p. 573-590. 26. TONIONI, Alessio, et al. Real-time self-adaptive deep stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. p. 195-204. 27. CHU, Brian, et al. Best practices for fine-tuning visual classifiers to new domains. In: European Conference on Computer Vision. Springer, Cham, 2016. p. 435-442. 28. NØKLAND, Arild. Direct feedback alignment provides learning in deep neural networks. In: Advances in Neural Information Processing Systems. 2016. p. 1037-1045. 29. HAN, Donghyeon; YOO, Hoi-jun. Direct feedback alignment based convolutional neural network training for low-power online learning processor. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. 2019. 30. HAN, Donghyeon, et al. A 1.32 TOPS/W energy efficient deep neural network learning processor with direct feedback alignment based heterogeneous core architecture. In: 2019 Symposium on VLSI Circuits. IEEE, 2019. p. C304-C305. 31. YUAN, Zhe, et al. Sticker: A 0.41-62.1 TOPS/W 8-bit neural network processor with multi-sparsity compatible convolution arrays and online tuning acceleration for fully connected layers. In: 2018 IEEE Symposium on VLSI Circuits. IEEE, 2018. p. 33-34. 32. FLEISCHER, Bruce, et al. A scalable multi-TeraOPS deep learning processor core for AI training and inference. In: 2018 IEEE Symposium on VLSI Circuits. IEEE, 2018. p. 35-36. 33. YANG, Guorun, et al. DrivingStereo: A large-scale dataset for stereo matching in autonomous driving scenarios. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. p. 899-908. 34. SONG, Xiao, et al. EdgeStereo: A context integrated residual pyramid network for stereo matching. In: Asian Conference on Computer Vision. Springer, Cham, 2018. p. 20-35. 35. DUGGAL, Shivam, et al. DeepPruner: Learning efficient stereo matching via differentiable PatchMatch. In: Proceedings of the IEEE International Conference on Computer Vision. 2019. p. 4384-4393. 36. SZEGEDY, Christian, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016. 37. PASZKE, Adam, et al. Automatic differentiation in PyTorch. 2017. 38. BOTTOU, Léon. Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT'2010. Physica-Verlag HD, 2010. p. 177-186. 39. MICIKEVICIUS, Paulius, et al. Mixed precision training. arXiv preprint arXiv:1710.03740, 2017. 40. WANG, Naigang, et al. Training deep neural networks with 8-bit floating point numbers. In: Advances in Neural Information Processing Systems. 2018. p. 7675-7684. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59494 | - |
| dc.description.abstract | Depth estimation has many applications, such as autonomous driving, robotics, and virtual reality. Among the approaches based on different sensors, stereo matching estimates depth from pairs of RGB images and is a comparatively low-cost method. Due to domain shift, together with quantized parameters and approximate functions, running a neural network model on-device usually leads to a performance gap; the demand for online training for on-device personalization has therefore grown in recent years. Sending local data to the cloud risks user privacy, and the model update latency is very long. Meanwhile, existing stereo matching models are still computationally intensive; given the limited computing power and resources of local devices, training all the parameters of the whole model on-device is impractical. In this thesis, we therefore propose a two-stage online stereo matching refinement system that uses an additional small neural network to learn the domain gap between local data and cloud training data. Compared with training the whole model, this refinement system offers a much better cost-performance ratio; moreover, relative to the original stereo matching network, it requires only 0.2% additional parameters and 0.7% additional computation, making it a suitable solution for the online training scenario. With rapid technological progress, today's stereo cameras already support full-HD, and supporting high-resolution depth estimation is the future trend. Combining this with the aforementioned online training requirement, we set the application scenario to full-HD with a depth update rate of 24 frames per second. Under this specification, existing training hardware architectures would require 36.73 GB/s of bandwidth. To handle this bottleneck, we analyze three common optimization directions for reducing bandwidth: quantization (parameter simplification), sparsity compression, and layer fusion. Since quantization and sparsity compression alone still cannot meet our hardware requirement, we apply layer fusion, re-scheduling the order of computation, and ultimately save 97% of the bandwidth. The architecture supporting this training schedule can serve as a new baseline for future optimization. | zh_TW |
| dc.description.abstract | Depth estimation has various applications, such as autonomous driving, robotics, and AR. Among the approaches using different kinds of sensors, stereo matching is typically a cost-effective one, exploiting stereo triangulation between a pair of rectified RGB images. Due to the domain shift issue, quantized parameters, and approximate functions, running CNN inference on-device usually causes performance degradation, so the need for device personalization has increased in recent years. Sending local data to cloud servers compromises user privacy and, moreover, incurs long update latency. Meanwhile, state-of-the-art stereo matching methods are still computationally demanding, and fine-tuning the whole model on-device is not a practicable solution given the limited power budget and computing ability of edge devices. In this thesis, we propose a two-stage online stereo matching refinement system that uses an additional lightweight network to learn the domain gap between local data and cloud training data. This refinement system has a much better cost-to-gain ratio than whole-model fine-tuning (0.02 TOPs per unit of accuracy gain vs. 3.32 TOPs per unit of accuracy gain). Moreover, we spend only 0.2% additional parameters and 0.7% additional computation compared with inference of the stereo matching model. Thus, it is a suitable choice for the online training scenario. With the rapid growth of stereo cameras that support resolutions at or above full-HD, high-resolution depth estimation will be a trend in future applications. For the aforementioned reasons, we set the application scenario to full-HD (1920 x 1080) at no less than 24 fps. A direct implementation using the current learning architecture incurs a bandwidth requirement of 36.73 GB/s. To handle this bottleneck, we analyze three computation strategies for bandwidth reduction: sparsity compression, quantization, and layer fusion. Applying sparsity compression and quantization alone cannot meet the specification; by re-scheduling the training pipeline with a patch-based layer-fusion technique, we obtain about 97% bandwidth reduction. The architecture supporting the proposed training pipeline can serve as a baseline for further optimization. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T09:25:34Z (GMT). No. of bitstreams: 1 U0001-1408202007132000.pdf: 22865761 bytes, checksum: 2908504502832fbe02fe2189e4c00589 (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Abstract ix; 1 Introduction 1; 1.1 Motivation 1; 1.2 Major Contributions 4; 1.3 Thesis Organization 5; 2 Background 7; 2.1 Depth Estimation 7; 2.1.1 Overview 7; 2.1.2 CNN-Based End-to-End Stereo Matching 9; 2.1.3 Comparison 11; 2.1.4 The Relationship of the Proposed Method and These Works 11; 2.2 On-chip Learning Accelerators 12; 2.2.1 Overview of Training Procedure 12; 2.2.2 Cell-Based Learning Accelerator 16; 2.2.3 Comparison 16; 2.2.4 Challenge of Designing a Training Hardware Architecture 19; 2.3 Summary 19; 3 Online Refinement System for Stereo Matching 21; 3.1 Introduction 21; 3.2 Proposed System Framework 23; 3.2.1 Pre-trained Stereo Matching Model 23; 3.2.2 Residual Learning Refinement Network 24; 3.2.3 Smoothness Loss 24; 3.3 Light-weight Trainable Refinement Network 25; 3.3.1 Input Consideration 25; 3.3.2 Network Architecture Consideration 26; 3.3.3 Multi-scale Technique Consideration 27; 3.3.4 Brief Summary 27; 3.4 Experiment 28; 3.4.1 Online Training Scenario Simulation 28; 3.4.2 Dataset 29; 3.4.3 Evaluation Metrics 29; 3.4.4 Implementation Details 30; 3.4.5 Experimental Setup 30; 3.5 Experiment Results 32; 3.5.1 Results on Baseline Refinement Network 32; 3.5.2 Overhead between Different Modes 38; 3.5.3 Ablation Studies 39; 3.6 Summary 40; 4 Hardware Architecture Analysis and Design of Refinement Network 41; 4.1 Introduction and Hardware Specification 42; 4.2 Design Consideration 42; 4.3 Direct Implementation 43; 4.3.1 Implementation Analysis and Challenge 45; 4.3.2 Optimization Direction and Analysis Result 48; 4.3.3 Comparison and Brief Summary 50; 4.4 Patch-based Layer-fused Training Procedure 52; 4.4.1 Multi-Layer Computation 52; 4.4.2 Layer Fusion Challenge on Training Procedure 53; 4.4.3 Proposed Training Procedure 55; 4.4.4 Overhead of Proposed Training Procedure 57; 4.4.5 Further Optimization Direction 59; 4.5 Summary 62; 5 Conclusion 63; Bibliography 65 | |
| dc.language.iso | en | |
| dc.subject | stereo matching | zh_TW |
| dc.subject | layer fusion | zh_TW |
| dc.subject | device personalization | zh_TW |
| dc.subject | refinement network | zh_TW |
| dc.subject | online training | zh_TW |
| dc.subject | patch-based layer fusion | en |
| dc.subject | online training | en |
| dc.subject | stereo matching | en |
| dc.subject | layer fusion | en |
| dc.subject | device personalization | en |
| dc.subject | refinement network | en |
| dc.title | Online Training Refinement Network and Architecture Design for Stereo Matching | zh_TW |
| dc.title | Online Training Refinement Network and Architecture Design for Stereo Matching | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 賴永康(Yeong-Kang Lai),黃朝宗(Chao-Tsung Huang),簡韶逸(Shao-Yi Chien),陳慶曄(Ching-Yeh Chen) | |
| dc.subject.keyword | stereo matching, online training, refinement network, device personalization, layer fusion | zh_TW |
| dc.subject.keyword | stereo matching, online training, refinement network, device personalization, layer fusion, patch-based layer fusion | en |
| dc.relation.page | 70 | |
| dc.identifier.doi | 10.6342/NTU202003369 | |
| dc.rights.note | Paid license (有償授權) | |
| dc.date.accepted | 2020-08-14 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
| Appears in Collections: | Graduate Institute of Electronics Engineering | |
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-1408202007132000.pdf (not authorized for public access) | 22.33 MB | Adobe PDF |
Unless a specific copyright statement indicates otherwise, all items in the repository are protected by copyright, with all rights reserved.
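As a sanity check, the headline bandwidth figures quoted in the abstracts above can be reproduced with simple arithmetic. The sketch below derives the per-frame DRAM traffic and the post-fusion bandwidth from the totals stated in this record only; it does not reproduce the thesis's internal layer-by-layer analysis, which is not part of this page.

```python
# Back-of-envelope check of the bandwidth figures quoted in the abstract.
# All constants come from the record itself; the derived quantities are
# illustrative, not taken from the thesis.

FPS = 24                      # target depth-map update rate (full-HD scenario)
TOTAL_BW_GB_S = 36.73         # direct-implementation bandwidth, per the abstract
REDUCTION = 0.97              # claimed saving from patch-based layer fusion

per_frame_gb = TOTAL_BW_GB_S / FPS           # DRAM traffic per training frame
fused_bw = TOTAL_BW_GB_S * (1 - REDUCTION)   # bandwidth remaining after fusion

print(f"traffic per frame: {per_frame_gb:.2f} GB")    # ~1.53 GB per frame
print(f"post-fusion bandwidth: {fused_bw:.2f} GB/s")  # ~1.10 GB/s
```

The roughly 1.5 GB of traffic per frame makes the bottleneck concrete: at 24 fps this exceeds typical edge-device DRAM budgets, which is why the thesis re-schedules the training pipeline rather than relying on quantization and sparsity compression alone.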
