NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94323

Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 徐宏民 | zh_TW
dc.contributor.advisor | Winston H. Hsu | en
dc.contributor.author | 陳靖元 | zh_TW
dc.contributor.author | Ching-Yuan Chen | en
dc.date.accessioned | 2024-08-15T16:49:01Z | -
dc.date.available | 2024-08-16 | -
dc.date.copyright | 2024-08-15 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-08-01 | -
dc.identifier.citation | [1] J. T. Ash, C. Zhang, A. Krishnamurthy, J. Langford, and A. Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds, 2020.
[2] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Nießner, M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3d: Learning from rgb-d data in indoor environments, 2017.
[3] B. Chen, F. Xia, B. Ichter, K. Rao, K. Gopalakrishnan, M. S. Ryoo, A. Stone, and D. Kappler. Open-vocabulary queryable scene representations for real world planning. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 11509–11522. IEEE, 2023.
[4] J. Choi, I. Elezi, H.-J. Lee, C. Farabet, and J. M. Alvarez. Active learning for deep object detection via probabilistic modeling, 2021.
[5] Y. Gal and Z. Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning, 2016.
[6] G. Georgakis, B. Bucher, A. Arapin, K. Schmeckpeper, N. Matni, and K. Daniilidis. Uncertainty-driven planner for exploration and navigation, 2022.
[7] G. Georgakis, B. Bucher, K. Schmeckpeper, S. Singh, and K. Daniilidis. Learning to map for active semantic goal navigation, 2022.
[8] C. Huang, O. Mees, A. Zeng, and W. Burgard. Visual language maps for robot navigation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 10608–10615. IEEE, 2023.
[9] W. Huang, C. Wang, R. Zhang, Y. Li, J. Wu, and L. Fei-Fei. Voxposer: Composable 3d value maps for robotic manipulation with language models. In 7th Annual Conference on Robot Learning, 2023.
[10] K. M. Jatavallabhula, A. Kuwajerwala, Q. Gu, M. Omama, T. Chen, A. Maalouf, S. Li, G. S. Iyer, S. Saryazdi, N. V. Keetha, et al. Conceptfusion: Open-set multimodal 3d mapping. In ICRA2023 Workshop on Pretraining for Robotics (PT4R), 2023.
[11] H. Jiang, B. Huang, R. Wu, Z. Li, S. Garg, H. Nayyeri, S. Wang, and Y. Li. RoboEXP: Action-conditioned scene graph via interactive exploration for robotic manipulation. In First Workshop on Vision-Language Models for Navigation and Manipulation at ICRA 2024, 2024.
[12] A. Kendall and Y. Gal. What uncertainties do we need in bayesian deep learning for computer vision?, 2017.
[13] A. Kirsch, J. van Amersfoort, and Y. Gal. Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning, 2019.
[14] B. Li, K. Q. Weinberger, S. Belongie, V. Koltun, and R. Ranftl. Language-driven semantic segmentation. In International Conference on Learning Representations, 2022.
[15] L. H. Li, P. Zhang, H. Zhang, J. Yang, C. Li, Y. Zhong, L. Wang, L. Yuan, L. Zhang, J.-N. Hwang, K.-W. Chang, and J. Gao. Grounded language-image pretraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10965–10975, 2022.
[16] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. V. Dillon, B. Lakshminarayanan, and J. Snoek. Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift, 2019.
[17] X. Puig, E. Undersander, A. Szot, M. D. Cote, R. Partsey, J. Yang, R. Desai, A. W. Clegg, M. Hlavac, T. Min, T. Gervet, V. Vondruš, V.-P. Berges, J. Turner, O. Maksymets, Z. Kira, M. Kalakrishnan, J. Malik, D. S. Chaplot, U. Jain, D. Batra, A. Rai, and R. Mottaghi. Habitat 3.0: A co-habitat for humans, avatars and robots, 2023.
[18] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021.
[19] M. Savva, A. Kadian, O. Maksymets, Y. Zhao, E. Wijmans, B. Jain, J. Straub, J. Liu, V. Koltun, J. Malik, D. Parikh, and D. Batra. Habitat: A Platform for Embodied AI Research. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[20] B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.
[21] N. M. M. Shafiullah, C. Paxton, L. Pinto, S. Chintala, and A. Szlam. Clip-fields: Weakly supervised semantic fields for robotic memory. In ICRA2023 Workshop on Pretraining for Robotics (PT4R), 2023.
[22] A. Szot, A. Clegg, E. Undersander, E. Wijmans, Y. Zhao, J. Turner, N. Maestre, M. Mukadam, D. Chaplot, O. Maksymets, A. Gokaslan, V. Vondrus, S. Dharur, F. Meier, W. Galuba, A. Chang, Z. Kira, V. Koltun, J. Malik, M. Savva, and D. Batra. Habitat 2.0: Training home assistants to rearrange their habitat. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
[23] A. Takmaz, E. Fedele, R. Sumner, M. Pollefeys, F. Tombari, and F. Engelmann. Openmask3d: Open-vocabulary 3d instance segmentation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[24] K. Wang, X. Yan, D. Zhang, L. Zhang, and L. Lin. Towards human-machine cooperation: Self-supervised sample mining for object detection, 2018.
[25] K. Wang, D. Zhang, Y. Li, R. Zhang, and L. Lin. Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology, 27(12):2591–2600, Dec. 2017.
[26] J. Wu, J. Chen, and D. Huang. Entropy-based active learning for object detection with progressive diversity constraint, 2022.
[27] Z. Ye, P. Liu, J. Liu, X. Tang, and W. Zhao. Practice makes perfect: An adaptive active learning framework for image classification. Neurocomputing, 196:95–106, 2016.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94323 | -
dc.description.abstract | In training-free robotic applications, pre-explored semantic maps built with visual language models (VLMs) have proven highly effective as a foundational element. However, existing approaches assume the map is accurate and provide no effective mechanism for revising decisions based on an erroneous map. This thesis introduces the notion of uncertainty and uses it to estimate the accuracy and quality of the map, enabling the robot to locate regions that are more likely to be wrong and to revise erroneous decisions caused by an inaccurate map without additional labels. We demonstrate the effectiveness of the proposed method on two modern map backbones, VLMaps and OpenMask3D, and show improvements on both. | zh_TW
dc.description.abstract | Pre-explored semantic maps, constructed through prior exploration using visual language models (VLMs), have proven effective as a foundational element for training-free robotic applications. However, existing approaches assume the map's accuracy and do not provide effective mechanisms for revising decisions based on incorrect maps. This work introduces Uncertainty-Aware Memory Updating and Re-Proposing, which estimates the accuracy and quality of the map through uncertainty, enabling the agent to locate the areas with a higher chance of being incorrect and to revise erroneous decisions stemming from inaccurate maps without additional labels. We demonstrate the effectiveness of the proposed method using two modern map backbones, VLMaps and OpenMask3D, and show improvements on both. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T16:49:01Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-08-15T16:49:01Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements iii
Abstract (Chinese) v
Abstract vii
Contents ix
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Pre-Explored Semantic Map . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Perception with Visual Language Model . . . . . . . . . . . . . . . . 6
2.3 Active Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter 3 Method 9
3.1 Failure Case Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Framework Overview . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3.1 Efficient Map Update . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3.2 Re-Proposing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 Uncertainty Measures . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.1 Single-view: Entropy . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.2 Multi-view: Standard Error & KL-Divergence . . . . . . . . . . . . 15
Chapter 4 Experiments 17
4.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Main Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.3 Ablation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3.1 Efficacy of Each Module . . . . . . . . . . . . . . . . . . . . . . . 20
4.3.2 Efficient Map Update . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3.2.1 Selection of Uncertainty Measures . . . . . . . . . . . 21
4.3.2.2 Generalizability . . . . . . . . . . . . . . . . . . . . . 22
4.3.3 Re-proposing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3.3.1 Selection of Confidence and Uncertainty Measures . . 22
4.3.3.2 Generalizability . . . . . . . . . . . . . . . . . . . . . 23
Chapter 5 Conclusion 25
References 27
Appendix A — More Method Details 31
A.1 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
A.2 Limitation of KL Divergence . . . . . . . . . . . . . . . . . . . . . . 32
dc.language.iso | en | -
dc.subject | Pre-Exploration | zh_TW
dc.subject | Uncertainty | zh_TW
dc.subject | Navigation | zh_TW
dc.subject | Embodied Agent Navigation | en
dc.subject | Uncertainty | en
dc.subject | Pre-Explore | en
dc.title | Uncertainty-Based Map Updating and Re-Proposing | zh_TW
dc.title | Uncertainty-Aware Memory Updating and Re-Proposing | en
dc.type | Thesis | -
dc.date.schoolyear | 112-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 陳尚澤;陳文進;葉梅珍 | zh_TW
dc.contributor.oralexamcommittee | Shang-Tse Chen;WC Chen;Mei-Chen Yeh | en
dc.subject.keyword | Uncertainty, Pre-Exploration, Navigation | zh_TW
dc.subject.keyword | Uncertainty, Pre-Explore, Embodied Agent Navigation | en
dc.relation.page | 32 | -
dc.identifier.doi | 10.6342/NTU202402241 | -
dc.rights.note | Authorized (open access worldwide) | -
dc.date.accepted | 2024-08-04 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science | -
dc.contributor.author-dept | Department of Computer Science and Information Engineering | -
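The table of contents above names three uncertainty measures (single-view entropy, multi-view standard error, and KL divergence). Purely as a rough illustration of how such measures could be computed over per-cell class distributions in a semantic map — the exact formulations in the thesis may differ, and the function names here are invented for the sketch — consider:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Single-view uncertainty: Shannon entropy of one cell's class distribution."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two class distributions, e.g. two views of one cell."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def multi_view_uncertainty(view_probs):
    """Multi-view uncertainty for one cell: mean pairwise KL divergence across
    views, plus the standard error of the mean across views (averaged over
    classes). High values flag map regions more likely to be wrong."""
    view_probs = np.asarray(view_probs, dtype=float)  # shape (n_views, n_classes)
    n = len(view_probs)
    kls = [kl_divergence(view_probs[i], view_probs[j])
           for i in range(n) for j in range(n) if i != j]
    mean_kl = float(np.mean(kls))
    # standard error of the per-class mean over views, averaged across classes
    se = float(np.mean(view_probs.std(axis=0, ddof=1) / np.sqrt(n)))
    return mean_kl, se
```

Under this sketch, a cell whose views agree on a confident label scores low on all three measures, while a cell whose views disagree (or whose single-view distribution is flat) scores high and becomes a candidate for re-observation.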
Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
File | Size | Format
ntu-112-2.pdf | 4.74 MB | Adobe PDF


All items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
