Object Pose Estimation Using Image Sequence via Temporal Attention

Chun-Yu Chen; 陳竣宇

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84600

Title:	Object Pose Estimation Using Image Sequence via Temporal Attention (使用時序注意力機制之基於影像序列的物體位姿估測)
Author:	Chun-Yu Chen (陳竣宇)
Advisor:	Yi-Ping Hung (洪一平)
Keywords:	Object Pose Estimation, Semantic Segmentation, Temporal Attention, Deep Learning
Publication Year:	2022
Degree:	Master's
Abstract:	Object pose estimation is a technique for detecting objects of interest in images. A common challenge in 6D object pose estimation from a single RGB image is mutual occlusion between objects in cluttered scenes. Beyond the spatial information in the input frame, exploiting the temporal information between consecutive frames of video data may further improve performance on this task. For instance, when an object in the input frame is occluded by other objects from the current camera viewpoint, combining the camera viewpoints of neighboring frames makes it possible to recover the pose of the unseen object. In this thesis, we thoroughly analyze and experiment with an end-to-end, deep-learning-based single-frame pose estimation method, and we propose an end-to-end approach that extends single-frame pose estimation to a multi-frame version. The experimental results show that our method produces more accurate results than the baseline model.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84600
DOI:	10.6342/NTU202203024
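Full-Text Authorization:	Authorized (campus access only)
Full-Text Release Date:	2022-09-26
Appears in Collections:	Graduate Institute of Networking and Multimedia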

Files in this item:

File	Size	Format
U0001-3108202215385700.pdf Access restricted to NTU campus IP addresses (off campus, please use the VPN remote-access service)	7.92 MB	Adobe PDF


Unless a copyright statement is specifically indicated, all items in this system are protected by copyright, with all rights reserved.
