Remote Sensing Image Registration Based on Self Attention Mechanism and Depthwise Separable Convolution

周裕茗; Yu-Ming Chou

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91318

Title:	Remote Sensing Image Registration Based on Self Attention Mechanism and Depthwise Separable Convolution (結合自注意力機制與深度分離卷積套合衛星影像)
Author:	周裕茗 Yu-Ming Chou
Advisor:	張恆華 Herng-Hua Chang
Keywords:	image registration, deep learning, unsupervised learning, self-attention
Year of publication:	2023
Degree:	Master's
Abstract:	Image registration techniques are widely used in digital image processing, computer vision, and related fields. With deep learning now flourishing, many researchers have begun to apply deep learning models to image registration. However, image registration still faces many challenges. For example, it is often difficult to find accurate feature points for aligning two satellite images taken at different times, which results in suboptimal registration accuracy. Most existing methods use convolutional neural networks (CNNs) for image registration. To handle high-dimensional datasets, however, CNN architectures are often made progressively larger, which leads to longer training time, less stable training, and problems such as vanishing and exploding gradients. In this thesis, we combine the advantages of self-attention modules and depthwise separable convolution, avoiding the vanishing and exploding gradient problems caused by deepening a CNN and making model training more stable and effective. We use the self-attention modules of Swin Transformer for image feature extraction and compute the correlation matrix between the two feature maps obtained from the input images by matrix multiplication (dot products), which is used to predict the parameters of an affine transformation. The correlation matrix is fed into two regression model architectures to obtain the spatial transformation parameters used to warp the image and thereby register the image pair. To improve the model's generalization ability, we train with unsupervised learning, allowing the model to learn the relationships and features within complex datasets on its own so that it handles unseen datasets better.
The proposed method outperformed existing methods on the test datasets, achieving higher accuracy on seen datasets and still registering images correctly on unseen datasets. We not only address the unstable training that arises when a CNN trained with unsupervised learning is deepened to learn high-dimensional datasets, but also design a lightweight model architecture that achieves better results in less time. The method performed particularly well on real-world registration datasets of images taken at different times, and it obtained good results even on images with few feature points.
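The correlation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy feature values, and the L2 normalization of each feature vector are assumptions made for illustration. It only shows how a correlation matrix between two feature maps can be formed as a matrix product; the regression networks that would turn that matrix into affine parameters are not shown.

```python
# Illustrative sketch only: names, shapes, and the L2 normalization are
# assumptions for this example and are not taken from the thesis.
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def correlation_matrix(feat_a, feat_b):
    """Return C where C[i][j] is the dot product of position i in A and position j in B.

    feat_a and feat_b are lists of per-position channel vectors, i.e. feature
    maps flattened over their spatial grid. The result equals A @ B^T computed
    on the normalized vectors.
    """
    a = [l2_normalize(v) for v in feat_a]
    b = [l2_normalize(v) for v in feat_b]
    return [[sum(x * y for x, y in zip(va, vb)) for vb in b] for va in a]


# Two toy "feature maps", each with 3 spatial positions and 2 channels.
feat_a = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
feat_b = [[0.0, 5.0], [4.0, 0.0], [1.0, 1.0]]

corr = correlation_matrix(feat_a, feat_b)

# For each position in A, the index of the most similar position in B.
best_match = [max(range(len(row)), key=row.__getitem__) for row in corr]
```

In this toy input, each position in the first map has exactly one position in the second map pointing in the same feature direction, so every row of `corr` has a single clear maximum. A registration model would consume the whole matrix rather than only the per-row maxima.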
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91318
DOI:	10.6342/NTU202304477
Full-text authorization:	Not authorized
Appears in collections:	Department of Engineering Science and Ocean Engineering

Files in this item:

File	Size	Format
ntu-112-1.pdf (not currently authorized for public access)	6.23 MB	Adobe PDF


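Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.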
