Image Stitching Based on Unsupervised Learning with a Convolutional Block Attention Module

張容誠; Jung-Cheng Chang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93784

Title:	Image Stitching Based on Unsupervised Learning with a Convolutional Block Attention Module (original title: 基於卷積注意力機制非監督式學習之影像拼接)
Author:	張容誠 Jung-Cheng Chang
Advisor:	丁肇隆 Chao-Lung Ting
Co-advisor:	張恆華 Herng-Hua Chang
Keywords:	Image stitching, Homography estimation, Image alignment, Image blending, Deep learning
Publication year:	2024
Degree:	Master's
Abstract:	Image stitching is widely used in autonomous vehicles to improve the success rate of object detection and recognition in camera blind spots. For this task, it is important to run in real time while retaining the complete stitching result. Traditional methods detect and match keypoints in the input images to compute the homography, then blend the overlapping regions to obtain the final stitched result, but they lack efficiency. Supervised and unsupervised learning methods offer faster homography estimation, but their accuracy still needs improvement. To address this issue, this study proposes a new model architecture that builds on the unsupervised deep homography network (UDHN) by incorporating a convolutional block attention module (CBAM) and multi-layer corner-shift blocks that progressively predict the positions of the image corners. A total of 18,000 pairs of 256×256 images were generated for training, with 4,000 pairs each for validation and testing. While retaining the computational efficiency of the homography estimation, the proposed method reduced the root mean square error (RMSE) of the corner shift by 27.44% and 75.06% compared with content-aware unsupervised deep homography estimation (CA-Homo) and UDHN, respectively. In addition, a multistage learning strategy trains the model on simpler data first and then gradually progresses to more complex data, which improves both prediction accuracy and training efficiency. Moreover, a new JBF-based blending method is introduced; experiments show a 13% improvement in peak signal-to-noise ratio (PSNR) over the normal blending method. Finally, ablation experiments show that each component of the model architecture and each loss term plays a crucial role in the overall training. In summary, the approach improves image stitching quality while maintaining the model's prediction speed, making it suitable for real-time applications.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93784
DOI:	10.6342/NTU202401256
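Full-text authorization:	Authorized (campus access only)
Electronic full-text release date:	2029-07-31
Appears in collections:	Department of Engineering Science and Ocean Engineering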
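
The abstract describes predicting shifts of the four image corners and evaluating them with corner-shift RMSE and PSNR. The following is a minimal Python sketch of those three calculations, not the thesis's implementation. The function names, the corner ordering (top-left, top-right, bottom-right, bottom-left), the use of pixel coordinates 0 to 255 for a 256×256 image, and the exact RMSE definition (root of the mean squared Euclidean corner distance) are all assumptions made for illustration.

```python
import numpy as np


def homography_from_corner_shifts(shifts, size=256):
    """Solve for the 3x3 homography that moves the four image corners by `shifts`.

    shifts: shape (4, 2), one (dx, dy) per corner, in the assumed order
    top-left, top-right, bottom-right, bottom-left.
    """
    s = float(size - 1)
    src = np.array([[0, 0], [s, 0], [s, s], [0, s]], dtype=np.float64)
    dst = src + np.asarray(shifts, dtype=np.float64)

    # Direct linear transform with h33 fixed to 1: 8 equations, 8 unknowns.
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.append(h, 1.0).reshape(3, 3)


def corner_rmse(pred_shifts, true_shifts):
    """Root of the mean squared Euclidean distance between corner shifts (assumed definition)."""
    diff = np.asarray(pred_shifts, dtype=np.float64) - np.asarray(true_shifts, dtype=np.float64)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=-1))))


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Parameterizing the homography by four corner shifts rather than by its nine matrix entries keeps all predicted quantities in pixel units, which is why an RMSE on corner shifts is a natural accuracy measure for this kind of model.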

Files in this item:

File	Size	Format
ntu-112-2.pdf (not currently authorized for public access)	3.65 MB	Adobe PDF

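Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.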