Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97512
Full metadata record (DC field: value [language])
dc.contributor.advisor: 徐宏民 [zh_TW]
dc.contributor.advisor: Winston H. Hsu [en]
dc.contributor.author: 葉佳峯 [zh_TW]
dc.contributor.author: Jia-Fong Yeh [en]
dc.date.accessioned: 2025-07-02T16:14:15Z
dc.date.available: 2025-07-03
dc.date.copyright: 2025-07-02
dc.date.issued: 2025
dc.date.submitted: 2025-06-02
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97512
dc.description.abstract:
策略學習是機器人學習中的重要領域,旨在為機器人尋找能有效完成任務的策略。近年來,策略學習已有許多應用,但在將其部署於真實場景中的機器人時,仍面臨諸多挑戰:現實場景變化多端且干擾因素眾多,現實環境中缺乏即時獎勵回饋以協助策略評估其表現,以及機器人在現實環境中發生失誤可能帶來嚴重的安全隱患。這些挑戰限制了策略學習的進一步發展。因此,本論文針對這些挑戰進行分析,並提出解決方案,期望能促進策略學習在真實世界中的應用落地。

針對訓練與部署環境差異的挑戰,我們研究少樣本模仿學習任務,旨在僅使用少量示範即可適應新場域。我們開發了一種策略,能夠應對多階段操作任務、示範長度不一致且關鍵資訊未時間對齊,以及示範者與機器人結構或外觀不同的情況。為此,我們設計了階段意識注意力模型,以分析機器人的當前階段並關注示範中相應階段的資訊,並採用基於示範的條件策略學習專家與機器人之間的動作映射。在實驗中,我們的方法在兩階段及三階段任務中的表現均優於其他少樣本學習方法,並展現對示範品質的高度魯棒性。

為解決缺乏即時獎勵的挑戰,我們探索了將長時間操作任務進行階層式拆解的方法,並開發基於視覺觀察與任務指令關聯的獎勵生成函數。我們利用大型語言模型的推理能力拆解任務並分析環境中物體的變化,透過階段判斷器定位所屬階段後,利用大型視覺語言模型評估機器人當前動作及完成程度。此外,我們設計了多個對比學習目標來輔助模型訓練。此階層化拆解方法使獎勵生成模型能夠提供細緻的獎勵訊號,幫助強化學習方法精確掌握任務進度。在相同的強化學習框架下,搭配我們的獎勵生成模型可完成更具挑戰性的長時間操作任務。

最後,為應對策略在新環境中的安全性挑戰,我們研究了策略行為監控的方法,確保其行為與示範意圖一致。我們定義了自適應錯誤偵測任務,並設計了一種基於模式分析的錯誤偵測模型,用於判斷策略抽取的特徵是正常還是異常。我們進一步引入了兩個對比學習目標以提升模型學習效果。實驗結果顯示,在我們構建的基準上,此錯誤偵測模型能精準地及時發現錯誤,並在七個任務與三種策略測試中展現最佳效能。同時,結合多個錯誤偵測器與修正策略後,僅我們的方法能有效偵測並修正錯誤,從而提升策略表現。

本論文旨在加速策略學習在真實環境中的應用,針對少樣本模仿學習、視覺與指令的獎勵生成模型以及策略行為異常監控等挑戰提出創新任務與方法,並取得超越現有方法的成果。我們亦探討了未來值得關注的方向,希望為策略學習研究的發展與突破提供更多啟發。
[zh_TW]
dc.description.abstract:
Policy learning is a crucial topic in robotics, aiming to develop policies that enable robots to accomplish tasks effectively. While policy learning has seen significant advances and applications in recent years, deploying it on real robots still faces several challenges: the variability and complexity of real-world environments, the lack of reward feedback to guide the policy during training, and the potential for execution failures to cause serious safety hazards. These challenges limit the progress of policy learning. This dissertation therefore examines them in depth and proposes solutions to accelerate the adoption of policy learning in practical applications.

To tackle the challenge of adapting to deployment environments that differ from the training ones, we investigate few-shot imitation learning tasks, which require adaptation to new domains with only a few demonstrations. We develop a policy that addresses multi-stage manipulation tasks, handles demonstrations of varying lengths with temporally misaligned key information, and bridges configuration or appearance differences between demonstrators and agents. To this end, we design a stage-conscious attention model that analyzes the robot's current stage and focuses on the corresponding stage information in the demonstrations, and we employ a demonstration-conditioned policy to learn the mapping between expert and agent actions. Experiments show that our method outperforms other few-shot imitation learning approaches on both two- and three-stage tasks and is markedly more robust to variations in demonstration quality.
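
As a rough illustration of the approach sketched above, the following PyTorch snippet shows one way a demonstration-conditioned policy with attention over demonstration frames could be wired up; the module names, dimensions, and overall architecture are illustrative assumptions, not the dissertation's actual implementation (see SCAN, Chapter 2, for the real method).

```python
# Minimal sketch (illustrative assumptions only): a demonstration-conditioned policy
# whose attention over demonstration frames locates the stage relevant to the
# agent's current observation.
import torch
import torch.nn as nn


class DemoConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=128, demo_dim=128, hidden=256, action_dim=7):
        super().__init__()
        self.obs_encoder = nn.Linear(obs_dim, hidden)    # embeds the agent's current observation
        self.demo_encoder = nn.Linear(demo_dim, hidden)  # embeds each demonstration frame
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )

    def forward(self, obs, demo_frames):
        # obs: (B, obs_dim); demo_frames: (B, T, demo_dim), T may differ across demonstrations
        q = self.obs_encoder(obs).unsqueeze(1)   # (B, 1, H) query built from the current state
        kv = self.demo_encoder(demo_frames)      # (B, T, H) keys/values built from the demo
        ctx, weights = self.attn(q, kv, kv)      # weights show which demo frames (stage) are attended
        fused = torch.cat([q.squeeze(1), ctx.squeeze(1)], dim=-1)
        return self.head(fused), weights         # predicted action + attention over demo frames


# Example with dummy tensors: one 50-frame demonstration conditions a single control step.
policy = DemoConditionedPolicy()
action, attn = policy(torch.randn(1, 128), torch.randn(1, 50, 128))
```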

To address the lack of reward feedback, we propose a hierarchical approach that decomposes long-horizon tasks and develops a reward generation model based on the correlation between visual observations and task instructions. Leveraging the reasoning capabilities of large language models, we decompose tasks and analyze changes in the states of objects in the environment. After a stage detector identifies the current task stage, a large vision-language model evaluates the robot's current motion and its progress. We also design multiple contrastive learning objectives to aid model training. This hierarchical decomposition enables our reward model to provide fine-grained reward signals that give reinforcement learning methods precise information about task progress. Under the same reinforcement learning framework, training with our reward model achieves better performance on challenging long-horizon tasks.
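
To make the hierarchical reward idea concrete, here is a minimal numerical sketch assuming precomputed embeddings of the current frame and of the LLM-generated per-stage sub-instructions; the function names (`hierarchical_reward`, `progress_fn`) and the exact formula are hypothetical placeholders rather than the reward model described in Chapter 3.

```python
# Illustrative sketch: stage detection by similarity to per-stage sub-instructions,
# plus an in-stage progress estimate, combined into a dense reward.
import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def hierarchical_reward(frame_emb, stage_text_embs, progress_fn):
    """frame_emb: embedding of the current visual observation.
    stage_text_embs: ordered embeddings of LLM-generated sub-instructions, one per stage.
    progress_fn: callable(stage_index) -> in-stage progress in [0, 1] (e.g. a VLM-based evaluator)."""
    scores = [cosine(frame_emb, s) for s in stage_text_embs]   # stage detection by similarity
    stage = int(np.argmax(scores))
    progress = float(np.clip(progress_fn(stage), 0.0, 1.0))    # fine-grained progress within the stage
    # Completed stages earn full credit and the current stage earns partial credit,
    # yielding a dense, monotone reward signal over the whole long-horizon task.
    return (stage + progress) / len(stage_text_embs)


# Example with random placeholders standing in for real encoder outputs.
rng = np.random.default_rng(0)
frame, stages = rng.normal(size=64), [rng.normal(size=64) for _ in range(4)]
reward = hierarchical_reward(frame, stages, progress_fn=lambda s: 0.3)
```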

Finally, to address the safety concerns of deploying policies in novel environments, we explore how to monitor policy behavior and ensure that it remains consistent with the demonstrated intent. We define the adaptable error detection task and design an error detection model based on pattern analysis that classifies the features extracted by the policy as normal or abnormal. Two contrastive learning objectives are introduced to enhance model training. On the benchmark we constructed, our error detection model identifies errors with precise timing and achieves the best performance across seven tasks and three policies. Moreover, when various error detectors are combined with an error correction policy, only ours detects and corrects errors effectively, improving overall policy performance.
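
The runtime-monitoring idea can be pictured with the short sketch below, in which a recurrent module summarizes the pattern of per-step policy features and a score head flags anomalous steps; all names, sizes, and the threshold are illustrative assumptions and not the PrObe model of Chapter 4.

```python
# Illustrative sketch: score each step of a rollout from the policy's internal
# features and flag steps whose anomaly score exceeds a threshold.
import torch
import torch.nn as nn


class ErrorDetector(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.pattern = nn.GRU(feat_dim, hidden, batch_first=True)  # temporal pattern over policy features
        self.score = nn.Linear(hidden, 1)                          # per-step anomaly logit

    def forward(self, policy_feats):
        # policy_feats: (B, T, feat_dim) features taken from the running policy at each step
        h, _ = self.pattern(policy_feats)
        return torch.sigmoid(self.score(h)).squeeze(-1)            # (B, T) anomaly scores in [0, 1]


# Example: score a 30-step rollout and flag the steps considered erroneous.
detector = ErrorDetector()
scores = detector(torch.randn(1, 30, 256))
error_steps = (scores > 0.5).nonzero()
```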

This dissertation aims to advance the practical deployment of policy learning by addressing challenges in few-shot imitation learning, vision- and instruction-based reward generation, and the detection of erroneous policy behavior. We propose novel tasks and methods that surpass existing approaches and achieve state-of-the-art performance. Additionally, we discuss promising future directions to inspire further advances and breakthroughs in policy learning research.
[en]
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-02T16:14:15Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2025-07-02T16:14:15Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Contents
Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents xi
List of Figures xv
List of Tables xxv
Chapter 1 Introduction 1
1.1 What Is Policy Learning? 1
1.2 Why Is Policy Learning Important? 3
1.3 Dissertation Contents and Contributions 4
Chapter 2 Learning Policies from a Few Demonstrations 5
2.1 Foreword 5
2.2 Introduction 6
2.3 Related Work 9
2.4 Few-Shot Imitation Learning (FSIL) 11
2.4.1 Problem Statement 12
2.4.2 Survey of Existing Methods 13
2.4.3 Demonstration-conditioned (DC) Policy 14
2.5 Methodology 16
2.6 Experiments 23
2.6.1 Evaluation Tasks 24
2.6.2 Baselines 26
2.6.3 Performance Comparison 28
2.6.4 Effectiveness of Stage Conscious Attention (SCA) 30
2.6.5 Ablation Study 34
2.7 Chapter Summary 37
Chapter 3 Learning Rewards from Vision and Instructions 39
3.1 Foreword 39
3.2 Introduction 40
3.3 Related Work 43
3.4 Preliminaries 44
3.5 Method 45
3.5.1 Task Knowledge Generation 46
3.5.2 Stage Detection 47
3.5.3 Motion Progress Evaluator 48
3.5.4 Reward Formulation & Policy Learning 52
3.6 Experiments 53
3.6.1 Evaluation Tasks 53
3.6.2 Baselines and Details of Model Implementation 56
3.6.3 Experimental Results in Our Benchmark 59
3.6.4 Experimental Results in Real-world Dataset 72
3.6.5 Conclusion 73
Chapter 4 Detecting Erroneous Policy Behavior 75
4.1 Foreword 75
4.2 Introduction 76
4.3 Related Work 79
4.3.1 Few-shot Imitation (FSI) 79
4.3.2 Few-shot Anomaly Detection (FSAD) 80
4.4 Preliminaries 81
4.5 Adaptable Error Detection (AED) 83
4.6 Pattern Observer (PrObe) 85
4.6.1 Rollout Augmentation 85
4.6.2 PrObe Architecture 86
4.7 Experiments 90
4.7.1 AED Benchmark 90
4.7.2 Details of Few-shot Imitation (FSI) Policies and AED Baselines 95
4.7.3 Analysis of Experimental Results 98
4.7.4 Pilot Study on Error Correction 108
4.7.5 Limitations 109
4.7.6 Conclusion 110
Chapter 5 Conclusion 113
References 115
dc.language.iso: en
dc.subject: 機器人學習 [zh_TW]
dc.subject: 具身智能 [zh_TW]
dc.subject: 策略學習 [zh_TW]
dc.subject: 獎勵生成 [zh_TW]
dc.subject: 行為監控 [zh_TW]
dc.subject: Behavior Monitoring [en]
dc.subject: Robot Learning [en]
dc.subject: Embodied AI [en]
dc.subject: Policy Learning [en]
dc.subject: Reward Generation [en]
dc.title: 針對現實場景的策略學習的發展與監控 [zh_TW]
dc.title: Development and Monitoring of Policy Learning for Real-world Scenarios [en]
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 博士
dc.contributor.oralexamcommittee: 葉梅珍;陳奕廷;陳駿丞;李濬屹;賴尚宏;孫民;王蒞君;連震杰 [zh_TW]
dc.contributor.oralexamcommittee: Mei-Chen Yeh; Yi-Ting Chen; Jun-Cheng Chen; Chun-Yi Lee; Shang-Hong Lai; Min Sun; Li-Chun Wang; Jenn-Jier Lien [en]
dc.subject.keyword: 機器人學習, 具身智能, 策略學習, 獎勵生成, 行為監控 [zh_TW]
dc.subject.keyword: Robot Learning, Embodied AI, Policy Learning, Reward Generation, Behavior Monitoring [en]
dc.relation.page: 141
dc.identifier.doi: 10.6342/NTU202500305
dc.rights.note: 同意授權(限校園內公開)
dc.date.accepted: 2025-06-02
dc.contributor.author-college: 電機資訊學院
dc.contributor.author-dept: 資訊工程學系
dc.date.embargo-lift: 2025-07-03
Appears in Collections: 資訊工程學系

Files in This Item:
File: ntu-113-2.pdf (25.07 MB, Adobe PDF)
Access is restricted to NTU campus IP addresses (for off-campus access, please use the VPN service).

