NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84600
Full metadata record

dc.contributor.advisor: 洪一平 (Yi-Ping Hung)
dc.contributor.author: Chun-Yu Chen [en]
dc.contributor.author: 陳竣宇 [zh_TW]
dc.date.accessioned: 2023-03-19T22:17:08Z
dc.date.copyright: 2022-09-26
dc.date.issued: 2022
dc.date.submitted: 2022-09-19
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84600
dc.description.abstract: Object pose estimation is a technique for detecting objects of interest in an image. A common challenge in 6D object pose estimation from a single RGB image is mutual occlusion between objects in cluttered scenes. Beyond the spatial information of a single input image, exploiting the temporal information between consecutive frames of video data can further improve performance on this task. For example, when an object in the input image is occluded by other objects under the current camera viewpoint, combining the viewpoints of neighboring frames makes it possible to recover the pose of the unseen object. In this thesis, we thoroughly analyze and experiment with an end-to-end single-image pose estimation method based on deep learning, and we propose an end-to-end approach that extends single-image pose estimation to a multi-frame version. Experimental results show that our method is more accurate than the baseline model. [zh_TW]
dc.description.abstract: Object pose estimation is a technique used to detect objects of interest in images. A common challenge in 6D pose estimation from a single RGB image is occlusion between objects in cluttered environments. In addition to using only the spatial information in the input frame, exploiting the temporal information between consecutive frames of video data may further improve performance on this task. For instance, when objects in the input frame are occluded by other objects under the current camera perspective, combining the camera perspectives of neighboring frames makes it possible to recover the poses of unseen objects. In this thesis, we thoroughly analyze and experiment with an end-to-end single-frame pose estimation method based on deep learning. We also propose an end-to-end approach that extends single-frame pose estimation to a multi-frame version. The experimental results show that our method provides more accurate results than the baseline model. [en]
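The abstract names a temporal memory attention mechanism for fusing neighboring frames, but this record does not describe its internals. As a rough illustration only, a scaled dot-product attention in which the current frame's feature queries a memory of past-frame features might look like the sketch below; all function names, shapes, and the residual fusion step are assumptions for illustration, not the thesis's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_memory_attention(curr_feat, memory_feats):
    """Fuse the current frame's feature with features from past frames.

    curr_feat:    (C,) feature vector of the current frame (the query).
    memory_feats: (T, C) feature vectors of T past frames (keys/values).
    Returns a (C,) feature enriched with temporal context.
    """
    d_k = curr_feat.shape[0]
    # Scaled dot-product attention: the current frame queries the memory.
    scores = memory_feats @ curr_feat / np.sqrt(d_k)   # (T,)
    weights = softmax(scores)                          # (T,) sums to 1
    context = weights @ memory_feats                   # (C,) weighted sum
    # Residual fusion: keep the current frame's own evidence.
    return curr_feat + context
```

In a real network the query/key/value projections would be learned layers and the features would be spatial maps rather than vectors; this sketch only conveys how neighboring-frame information could be weighted and mixed in.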
dc.description.provenance: Made available in DSpace on 2023-03-19T22:17:08Z (GMT). No. of bitstreams: 1. U0001-3108202215385700.pdf: 8110707 bytes, checksum: 3bfe6d8c0edf8cd8bf9bc96180c7f662 (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
Abstract (Chinese)  i
Abstract  ii
Contents  iii
List of Figures  v
List of Tables  vii
Chapter 1  Introduction  1
Chapter 2  Related Work  4
  2.1 Two-Stage Methods  4
    2.1.1 Keypoint-Based Methods  5
    2.1.2 Dense Methods  7
  2.2 End-to-End Methods  7
Chapter 3  Proposed Multi-Frame Method  10
  3.1 Single-Frame Pose Estimation  11
    3.1.1 Semantic Segmentation  12
    3.1.2 3D Translation Estimation  13
    3.1.3 3D Rotation Regression  14
  3.2 Multi-Frame Pose Estimation  14
    3.2.1 Temporal Memory Attention  16
  3.3 Loss Function  17
    3.3.1 Segmentation Loss  18
    3.3.2 Center Regression Loss  18
    3.3.3 Bounding Box Loss  19
    3.3.4 Pose Loss  19
  3.4 Training Process and Implementation Details  20
Chapter 4  Experiments  22
  4.1 Dataset  22
  4.2 Evaluation Metrics  23
  4.3 Quantitative Evaluation  24
  4.4 Ablation Studies  25
    4.4.1 Semantic Segmentation  26
    4.4.2 Center Regression  27
Chapter 5  Conclusions and Future Works  31
References  33
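The table of contents lists evaluation metrics (Section 4.2) that are not detailed in this record. Work on 6D object pose estimation conventionally reports the ADD metric (average distance of transformed model points); whether this thesis uses exactly that metric is not stated here, so the following is a sketch of the standard metric rather than the author's code:

```python
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, model_pts):
    """ADD: average distance between model points transformed by the
    predicted pose and by the ground-truth pose.

    R_pred, R_gt: (3, 3) rotation matrices.
    t_pred, t_gt: (3,) translation vectors.
    model_pts:    (N, 3) 3D points sampled from the object model.
    A pose is commonly counted correct when ADD is below 10% of the
    object model's diameter.
    """
    pred = model_pts @ R_pred.T + t_pred   # points under predicted pose
    gt = model_pts @ R_gt.T + t_gt         # points under ground-truth pose
    return np.linalg.norm(pred - gt, axis=1).mean()
```

For symmetric objects the ADD-S variant, which matches each predicted point to its nearest ground-truth point, is typically used instead.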
dc.language.iso: en
dc.title: 使用時序注意力機制之基於影像序列的物體位姿估測 [zh_TW]
dc.title: Object Pose Estimation Using Image Sequence via Temporal Attention [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 陳冠文 (Kuan-Wen Chen), 鄭文皇 (Wen-Huang Cheng)
dc.subject.keyword: 物體位姿估測, 語義分割, 時序注意力, 深度學習 [zh_TW]
dc.subject.keyword: Object Pose Estimation, Semantic Segmentation, Temporal Attention, Deep Learning [en]
dc.relation.page: 37
dc.identifier.doi: 10.6342/NTU202203024
dc.rights.note: Authorized for release (available on campus only)
dc.date.accepted: 2022-09-20
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
dc.date.embargo-lift: 2022-09-26
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
File: U0001-3108202215385700.pdf (access restricted to NTU campus IPs; off-campus users should use the library's VPN service)
Size: 7.92 MB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
