Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80536
Full metadata record
DC Field / Value / Language
dc.contributor.advisor徐宏民(Winston Hsu)
dc.contributor.authorKe-Jyun Wangen
dc.contributor.author王科鈞zh_TW
dc.date.accessioned2022-11-24T03:08:48Z
dc.date.available2021-11-03
dc.date.available2022-11-24T03:08:48Z
dc.date.copyright2021-11-03
dc.date.issued2021
dc.date.submitted2021-10-28
dc.identifier.citation
Kazemzadeh, S., Ordonez, V., Matten, M., Berg, T. L. (2014). ReferIt Game: Referring to Objects in Photographs of Natural Scenes. EMNLP.
Tziafas, G., Kasaei, S. H. (2021). Few-Shot Visual Grounding for Natural Human-Robot Interaction. CoRR, abs/2103.09720. Retrieved from https://arxiv.org/abs/2103.09720
Shridhar, M., Mittal, D., Hsu, D. (2020, January). INGRESS: Interactive Visual Grounding of Referring Expressions. The International Journal of Robotics Research, 39. doi:10.1177/0278364919897133
Zhang, H., Lu, Y., Yu, C., Hsu, D., Lan, X., Zheng, N. (2021, July). INVIGORATE: Interactive Visual Grounding and Grasping in Clutter. Proceedings of Robotics: Science and Systems. doi:10.15607/RSS.2021.XVII.020
Wang, K.-J., Liu, Y.-H., Su, H.-T., Wang, J.-W., Wang, Y.-S., Hsu, W., Chen, W.-C. (2021, June). OCID-Ref: A 3D Robotic Dataset With Embodied Language For Clutter Scene Grounding. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 5333–5338. doi:10.18653/v1/2021.naacl-main.419
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N. D. (2009). Dataset Shift in Machine Learning. The MIT Press.
Wang, M., Deng, W. (2018). Deep Visual Domain Adaptation: A Survey. Neurocomputing, 312, 135–153. doi:10.1016/j.neucom.2018.05.083
Calli, B., Walsman, A., Singh, A., Srinivasa, S., Abbeel, P., Dollar, A. M. (2015). Benchmarking in Manipulation Research: Using the Yale-CMU-Berkeley Object and Model Set. IEEE Robotics & Automation Magazine, 22(3), 36–52. doi:10.1109/MRA.2015.2448951
Blender Online Community. (2018). Blender - a 3D Modelling and Rendering Package. Blender Foundation. Retrieved from http://www.blender.org
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 23–30.
Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., … Birchfield, S. (2018). Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1082–10828.
Qi, F., Yang, X., Xu, C. (2018). A Unified Framework for Multimodal Domain Adaptation. Proceedings of the 26th ACM International Conference on Multimedia, 429–437. Seoul, Republic of Korea. doi:10.1145/3240508.3240633
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., … Lempitsky, V. (2016). Domain-Adversarial Training of Neural Networks. Journal of Machine Learning Research, 17(59), 1–35. Retrieved from http://jmlr.org/papers/v17/15-239.html
Saito, K., Ushiku, Y., Harada, T., Saenko, K. (2019, June). Strong-Weak Distribution Alignment for Adaptive Object Detection. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6949–6958. doi:10.1109/CVPR.2019.00712
Huang, B., Lian, D., Luo, W., Gao, S. (2021, June). Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Perez, E., Strub, F., de Vries, H., Dumoulin, V., Courville, A. C. (2018). FiLM: Visual Reasoning with a General Conditioning Layer. AAAI.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention Is All You Need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 30). Retrieved from https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Tanwani, A. K. (2020). DIRL: Domain-Invariant Representation Learning for Sim-to-Real Transfer. CoRL.
Bousmalis, K., Irpan, A., Wohlhart, P., Bai, Y., Kelcey, M., Kalakrishnan, M., … Vanhoucke, V. (2018). Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping. 2018 IEEE International Conference on Robotics and Automation (ICRA), 4243–4250.
Park, T., Efros, A. A., Zhang, R., Zhu, J.-Y. (2020). Contrastive Learning for Unpaired Image-to-Image Translation. European Conference on Computer Vision (ECCV).
Mao, J., Huang, J., Toshev, A., Camburu, O., Yuille, A., Murphy, K. (2016). Generation and Comprehension of Unambiguous Object Descriptions. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 11–20. doi:10.1109/CVPR.2016.9
Nagaraja, V. K., Morariu, V. I., Davis, L. S. (2016). Modeling Context Between Objects for Referring Expression Understanding. ECCV.
Hu, R., Xu, H., Rohrbach, M., Feng, J., Saenko, K., Darrell, T. (2016). Natural Language Object Retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Yu, L., Lin, Z., Shen, X., Yang, J., Lu, X., Bansal, M., Berg, T. (2018, June). MAttNet: Modular Attention Network for Referring Expression Comprehension. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1307–1315. doi:10.1109/CVPR.2018.00142
Paul, R., Arkin, J., Roy, N., Howard, T. (2016, June). Efficient Grounding of Abstract Spatial Concepts for Natural Language Interaction with Robot Manipulators. Proceedings of Robotics: Science and Systems. doi:10.15607/RSS.2016.XII.037
Mees, O., Burgard, W. (2021). Composing Pick-and-Place Tasks by Grounding Language. In B. Siciliano, C. Laschi, O. Khatib (Eds.), Experimental Robotics (pp. 491–501). Cham: Springer International Publishing.
Liu, R., Liu, C., Bai, Y., Yuille, A. (2019, June). CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4180–4189. doi:10.1109/CVPR.2019.00431
Prakash, A., Boochoon, S., Brophy, M., Acuna, D., Cameracci, E., State, G., … Birchfield, S. (2019). Structured Domain Randomization: Bridging the Reality Gap by Context-Aware Synthetic Data. 2019 International Conference on Robotics and Automation (ICRA), 7249–7255.
Yue, X., Zhang, Y., Zhao, S., Vincentelli, A., Keutzer, K., Gong, B. (2019, October). Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2100–2110. doi:10.1109/ICCV.2019.00219
Chebotar, Y., Handa, A., Makoviychuk, V., Macklin, M., Issac, J., Ratliff, N., Fox, D. (2019, May). Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience. 2019 International Conference on Robotics and Automation (ICRA), 8973–8979. doi:10.1109/ICRA.2019.8793789
Irpan, A., Harris, C., Ibarz, J., Rao, K., Khansari, M., Levine, S. (2020). RL-CycleGAN: Improving Deep-RL Robotics With Simulation-To-Real. Retrieved from https://arxiv.org/abs/2006.09001
Ho, D., Rao, K., Xu, Z., Jang, E., Khansari, M., Bai, Y. (2021). RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer. 2021 IEEE International Conference on Robotics and Automation (ICRA), 10920–10926.
Hoffman, J., Tzeng, E., Park, T., Zhu, J.-Y., Isola, P., Saenko, K., Efros, A. A., Darrell, T. (2018). CyCADA: Cycle-Consistent Adversarial Domain Adaptation. International Conference on Machine Learning (ICML).
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80536
dc.description.abstractIn human-robot interaction, we expect an intelligent robot to adapt quickly to changes in its environment and to perform well on visual grounding tasks. Current solutions, however, retrain the robot on data collected from each new environment, which is inefficient and costly in both labor and money. To address this, we propose a sim-to-real domain adaptation approach that lets the robot learn from simulation data obtained at essentially zero cost. To generate the required training data, we use a powerful rendering engine to produce near-photorealistic synthetic images and combine them with annotations that come for free, building a new visual grounding dataset, YCB-Ref, for training the robot on visual grounding. Training directly on this generated data, however, runs into the sim-to-real gap, and our approach offers two remedies. The first is mixup domain randomization: we composite real-world backgrounds onto synthetic images rendered with empty backgrounds, strengthening the robot's ability to separate objects from background noise. The second is language-guided patch-wise domain adaptation: we reinforce the correspondence between the most important patches of the synthetic and real images, making the robot more sensitive to, and better informed about, the patches that deserve attention. Finally, our experimental results show that the proposed approach effectively helps the robot learn visual grounding from simulation data.en
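The first remedy described in the abstract, mixup domain randomization, composites a synthetic render that has an empty background onto a real-world background photograph. Below is a minimal sketch of that compositing step, assuming the render is an RGBA image whose alpha channel masks the objects; the function, parameter, and file names (mixup_background, ycb_ref/render_0001.png, and so on) are illustrative assumptions, not the thesis's released code.

    # Minimal sketch of mixup-style background compositing (illustrative, not the
    # thesis implementation): a synthetic RGBA render with an empty background is
    # pasted over a randomly chosen real-world photo, so the model sees realistic
    # background clutter while the rendered objects stay unchanged.
    import random
    from PIL import Image

    def mixup_background(synthetic_rgba_path: str, real_background_paths: list[str]) -> Image.Image:
        """Composite a synthetic foreground (RGBA) onto a random real background."""
        foreground = Image.open(synthetic_rgba_path).convert("RGBA")
        background = Image.open(random.choice(real_background_paths)).convert("RGBA")
        # Match sizes so alpha compositing is valid.
        background = background.resize(foreground.size)
        # The render's alpha channel acts as the object mask.
        composited = Image.alpha_composite(background, foreground)
        return composited.convert("RGB")

    # Example usage (paths are placeholders):
    # img = mixup_background("ycb_ref/render_0001.png", ["real/office_01.jpg", "real/table_02.jpg"])
    # img.save("train/mixed_0001.jpg")

In this reading, the real photo only supplies background clutter while the alpha mask keeps the rendered objects intact, so any bounding-box or referring-expression annotations generated with the synthetic scene remain valid after compositing.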
dc.description.provenanceMade available in DSpace on 2022-11-24T03:08:48Z (GMT). No. of bitstreams: 1
U0001-2610202112302600.pdf: 1335536 bytes, checksum: 5a600bc61d82bbe4843eeffbb228f24b (MD5)
Previous issue date: 2021
en
dc.description.tableofcontents
Verification Letter from the Oral Examination Committee
Acknowledgments
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Visual Grounding
  2.2 Sim2Real Transfer
Chapter 3 Problem Definition
  3.1 Cross Domain Challenge
  3.2 Problem Formulation
  3.3 Problem Statement
Chapter 4 Main Approach
  4.1 Overview of SimVG
  4.2 Synthetic Data - YCB-Ref
    4.2.1 HRI Scene Generation
    4.2.2 Mixup Domain Randomization
    4.2.3 Referring Expression Generation
  4.3 Patch-wise Domain Adaptation
  4.4 Training Strategy
Chapter 5 Experiments
  5.1 Setup
  5.2 Unsupervised Sim2Real Transfer
  5.3 Fine-tune on 100% Real Data
  5.4 Fine-tune on Real Data with Different Data Ratio
Chapter 6 Conclusion
References
dc.language.isoen
dc.subject領域隨機zh_TW
dc.subject模擬至現實之遷移學習zh_TW
dc.subject人機互動zh_TW
dc.subject視覺定位zh_TW
dc.subject領域適應zh_TW
dc.subjectHuman-Robot Interactionen
dc.subjectDomain Adaptationen
dc.subjectVisual Groundingen
dc.subjectDomain Randomizationen
dc.subjectSim2Real Transferen
dc.title基於模擬至現實之遷移學習解決視覺定位中透過語言引導的領域適應問題zh_TW
dc.titleSim2real Transfer Visual Grounding Knowledge Through Language-Guided Patch-wise Domain Adaptationen
dc.date.schoolyear109-2
dc.description.degree碩士
dc.contributor.oralexamcommittee陳文進(Hsin-Tsai Liu),陳奕廷(Chih-Yang Tseng),葉梅珍,余能豪
dc.subject.keyword領域隨機, 領域適應, 模擬至現實之遷移學習, 視覺定位, 人機互動zh_TW
dc.subject.keywordDomain Randomization, Domain Adaptation, Sim2Real Transfer, Visual Grounding, Human-Robot Interactionen
dc.relation.page28
dc.identifier.doi10.6342/NTU202104219
dc.rights.note同意授權(限校園內公開)
dc.date.accepted2021-10-29
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept資訊網路與多媒體研究所zh_TW
Appears in collections: 資訊網路與多媒體研究所

Files in this item:
File / Size / Format
U0001-2610202112302600.pdf
Access restricted to NTU campus IP addresses (use the library's VPN service from off campus)
1.3 MB / Adobe PDF


Except where otherwise noted, all items in this repository are protected by copyright, with all rights reserved.
