Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59494
Title: | 應用於立體匹配之線上訓練優化網路及架構設計 Online Training Refinement Network and Architecture Design for Stereo Matching |
Author: | Yu-Sheng Wu 吳昱陞 |
Advisor: | 陳良基 (Liang-Gee Chen) |
Keywords: | stereo matching, online training, refinement network, device personalization, layer fusion, patch-based layer fusion |
Publication year: | 2020 |
Degree: | Master |
Abstract: | Depth estimation has many applications, such as autonomous driving, robotics, and AR/VR. Among approaches based on different kinds of sensors, stereo matching, which estimates depth by triangulation over a pair of rectified RGB images, is typically the most cost-effective. Due to domain shift, quantized parameters, and approximated functions, running CNN models on-device usually causes performance degradation, so the need for device personalization through online training has grown in recent years. Uploading local data to cloud servers risks user privacy and suffers from long model-update latency. Meanwhile, state-of-the-art stereo matching models remain computationally demanding, so fine-tuning an entire model on-device is impractical given the limited power budget and compute capability of edge devices. In this thesis, we propose a two-stage online stereo matching refinement system that uses an additional lightweight network to learn the domain gap between local data and cloud training data. This refinement system has a much better cost-to-gain ratio than whole-model fine-tuning (0.02 TOPs per unit of accuracy gain vs. 3.32 TOPs). Moreover, relative to inference of the original stereo matching model, it adds only 0.2% more parameters and 0.7% more computation, making it a suitable choice for the online training scenario. With the rapid progress of stereo cameras that now support full-HD resolution and beyond, high-resolution depth estimation is a trend for future applications. We therefore set the application scenario to full-HD (1920 x 1080) at no less than 24 fps. A direct implementation on the current training architecture incurs a bandwidth requirement of 36.73 GB/s. To address this bottleneck, we analyze three strategies for bandwidth reduction: sparsity compression, quantization, and layer fusion. Sparsity compression and quantization alone cannot meet the specification, so we reschedule the training pipeline with a patch-based layer-fusion technique, achieving about 97% bandwidth reduction. The architecture supporting the proposed training pipeline can serve as a baseline for further optimization. |
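The patch-based layer-fusion scheduling described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual network or tiling: the two pointwise (1x1) layers, channel counts, and patch size are all assumptions chosen so that patches need no halo overlap. The point is the schedule — each patch flows through every layer while its intermediate activations stay in on-chip memory, so only the network input and output touch DRAM.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise convolution + ReLU; x: (H, W, Cin), w: (Cin, Cout)."""
    return np.maximum(x @ w, 0.0)

def layer_by_layer(x, w1, w2):
    # Unfused schedule: the full intermediate map of layer 1 would be
    # written to DRAM and read back for layer 2.
    return conv1x1(conv1x1(x, w1), w2)

def patch_fused(x, w1, w2, ph=128, pw=128):
    # Fused schedule: process one patch through both layers at a time;
    # the per-patch intermediate never leaves on-chip memory.
    H, W, _ = x.shape
    out = np.empty((H, W, w2.shape[1]), dtype=x.dtype)
    for i in range(0, H, ph):
        for j in range(0, W, pw):
            patch = x[i:i + ph, j:j + pw]
            out[i:i + ph, j:j + pw] = conv1x1(conv1x1(patch, w1), w2)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((270, 480, 8)).astype(np.float32)
w1 = rng.standard_normal((8, 16)).astype(np.float32)
w2 = rng.standard_normal((16, 4)).astype(np.float32)

# Both schedules compute the same result; only the DRAM traffic differs.
assert np.allclose(layer_by_layer(x, w1, w2), patch_fused(x, w1, w2), atol=1e-4)
```

With spatial (e.g. 3x3) convolutions, each patch would additionally need a small halo of neighboring input pixels, but the rescheduling idea is the same.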
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59494 |
DOI: | 10.6342/NTU202003369 |
Full-text license: | Paid access |
Appears in collections: | Graduate Institute of Electronics Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
U0001-1408202007132000.pdf (not currently authorized for public access) | 22.33 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.