Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90168
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳信希 | zh_TW |
dc.contributor.advisor | Hsin-Hsi Chen | en |
dc.contributor.author | 任恬儀 | zh_TW |
dc.contributor.author | Tien-Yi Jen | en |
dc.date.accessioned | 2023-09-22T17:41:57Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-09-22 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-03 | - |
dc.identifier.citation | References
Aida Amini, Saadia Gabriel, Shanchuan Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. 2019. MathQA: Towards interpretable math word problem solving with operation-based formalisms. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2357–2367.
Mattia Atzeni, Jasmina Bogojeska, and Andreas Loukas. 2021. SQALER: Scaling question answering by decoupling multi-hop and logical reasoning. Advances in Neural Information Processing Systems, 34:12587–12599.
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web: 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007, Proceedings, pages 722–735. Springer.
Yefim Bakman. 2007. Robust understanding of word problems with extraneous information. arXiv preprint math/0701393.
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544.
Daniel G. Bobrow. 1964. Natural language input for a computer problem solving system. Technical report, USA.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.
Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075.
Antoine Bordes, Jason Weston, and Nicolas Usunier. 2014. Open question answering with weakly supervised embedding models. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014, Proceedings, Part I 14, pages 165–180. Springer.
Diane J. Briars and Jill H. Larkin. 1984. An integrated model of skill in solving elementary word problems. Cognition and Instruction, 1(3):245–296.
Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 423–433.
Shulin Cao, Jiaxin Shi, Liangming Pan, Lunyiu Nie, Yutong Xiang, Lei Hou, Juanzi Li, Bin He, and Hanwang Zhang. 2022. KQA Pro: A dataset with explicit compositional programs for complex question answering over knowledge base. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6101–6119.
Kezhen Chen, Qiuyuan Huang, Hamid Palangi, Paul Smolensky, Kenneth D. Forbus, and Jianfeng Gao. 2020. Mapping natural-language problems to formal-language solutions using structured neural representations. In Proceedings of ICML, volume 2020.
Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, and Quoc V. Le. 2019. Neural symbolic reader: Scalable integration of distributed and symbolic representations for reading comprehension. In International Conference on Learning Representations.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question answering over Freebase with multi-column convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 260–269.
Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2019. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of NAACL.
Mohnish Dubey, Debayan Banerjee, Abdelrahman Abdelkawi, and Jens Lehmann. 2019. LC-QuAD 2.0: A large dataset for complex question answering over Wikidata and DBpedia. In The Semantic Web – ISWC 2019: 18th International Semantic Web Conference, Auckland, New Zealand, October 26–30, 2019, Proceedings, Part II 18, pages 69–78. Springer.
Mor Geva, Ankit Gupta, and Jonathan Berant. 2020. Injecting numerical reasoning skills into language models. In ACL.
Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, and Yu Su. 2021. Beyond I.I.D.: Three levels of generalization for question answering on knowledge bases. In Proceedings of the Web Conference 2021, pages 3477–3488.
Steve Harris and Andy Seaborne. 2013. SPARQL 1.1 Query Language. W3C Recommendation 21 March 2013, World Wide Web Consortium.
Charles Hinson, Hen-Hsen Huang, and Hsin-Hsi Chen. 2020. Heterogeneous recycle generation for Chinese grammatical error correction. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2191–2201, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
Mohammad Javad Hosseini, Hannaneh Hajishirzi, Oren Etzioni, and Nate Kushman. 2014. Learning to solve arithmetic word problems with verb categorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 523–533.
Danqing Huang, Jing Liu, Chin-Yew Lin, and Jian Yin. 2018a. Neural math word problem solver with reinforcement learning. In Proceedings of the 27th International Conference on Computational Linguistics, pages 213–223.
Danqing Huang, Jin-Ge Yao, Chin-Yew Lin, Qingyu Zhou, and Jian Yin. 2018b. Using intermediate representations to solve math word problems. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 419–428.
Sarthak Jain. 2016. Question answering over knowledge base using factual memory networks. In Proceedings of the NAACL Student Research Workshop, pages 109–115.
Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, et al. 2019. Measuring compositional generalization: A comprehensive method on realistic data. In International Conference on Learning Representations.
Yunshi Lan and Jing Jiang. 2020. Query graph generation for answering multi-hop complex questions from knowledge bases. Association for Computational Linguistics.
Wang Ling, Dani Yogatama, Chris Dyer, and Phil Blunsom. 2017. Program induction by rationale generation: Learning to solve and explain algebraic word problems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 158–167.
Thang Luong, Ilya Sutskever, Quoc Le, Oriol Vinyals, and Wojciech Zaremba. 2015. Addressing the rare word problem in neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 11–19, Beijing, China. Association for Computational Linguistics.
OpenAI. 2021. ChatGPT: Large-scale language model. https://openai.com/research/chatgpt. Accessed: July 3, 2023.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
Srinivas Ravishankar, June Thai, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Tahira Naseem, Pavan Kapanipathi, Gaetano Rossiello, and Achille Fokoue. 2021. A two-stage approach towards generalization in knowledge base question answering. arXiv preprint arXiv:2111.05825.
Subhro Roy, Tim Vieira, and Dan Roth. 2015. Reasoning about quantities in natural language. Transactions of the Association for Computational Linguistics, 3:1–13.
Apoorv Saxena, Aditay Tripathi, and Partha Talukdar. 2020. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4498–4507.
Yiheng Shu, Zhiwei Yu, Yuhan Li, Börje F. Karlsson, Tingting Ma, Yuzhong Qu, and Chin-Yew Lin. 2022. TIARA: Multi-grained retrieval for robust question answering over large knowledge bases. arXiv preprint arXiv:2210.12925.
Jianlin Su. 2021. T5 PEGASUS - ZhuiyiAI. Technical report.
Yawei Sun, Lingling Zhang, Gong Cheng, and Yuzhong Qu. 2020. SPARQA: Skeleton-based semantic parsing for complex questions over knowledge bases. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8952–8959.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27.
Dung Thai, Srinivas Ravishankar, Ibrahim Abdelaziz, Mudit Chaudhary, Nandana Mihindukulasooriya, Tahira Naseem, Rajarshi Das, Pavan Kapanipathi, Achille Fokoue, and Andrew McCallum. 2022. CBR-iKB: A case-based reasoning approach for question answering over incomplete knowledge bases. arXiv preprint arXiv:2204.08554.
Priyansh Trivedi, Gaurav Maheshwari, Mohnish Dubey, and Jens Lehmann. 2017. LC-QuAD: A corpus for complex question answering over knowledge graphs. In The Semantic Web – ISWC 2017: 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part II 16, pages 210–218. Springer.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10):78–85.
Xu Wang, Shuai Zhao, Jiale Han, Bo Cheng, Hao Yang, Jianchang Ao, and Zhenzi Li. 2020. Modelling long-distance node relations for KBQA with global dynamic graph. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2572–2582.
Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, and Caiming Xiong. 2022. RnG-KBQA: Generation augmented iterative ranking for knowledge base question answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6032–6043.
Wen-tau Yih, Matthew Richardson, Christopher Meek, Ming-Wei Chang, and Jina Suh. 2016. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 201–206.
Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander Smola, and Le Song. 2018. Variational reasoning for question answering with knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90168 | - |
dc.description.abstract | 大型語言模型 (LLMs) 不僅革新了自然語言處理 (NLP) 領域,也為實際應用帶來了重大變革。儘管有這些進步,像程序生成這樣的領域仍然具有挑戰性。本論文專注於生成兩種類型的程序:數學程序和知識圖譜問答 (KGQA) 程序。
對於數學程序,我們的工作提出了一種新穎的回收數值數據擴增 (RNDA) 方法,該方法自動生成高質量的訓練實例與程序。實驗結果顯示,用擴增數據訓練的模型可以達到最先進的性能。 與此同時,在KGQA程序的領域,我們提出了一種反向生成的驗證方法以提高可靠性。實驗表明,這種方法也可以提高ChatGPT在此任務的性能。 總的來說,該研究通過引入新方法,描繪了程序生成的範式轉變,專注於改善數學和KGQA程序。這些發現為未來的研究提供了一個有前景的基礎,目標是充分利用大型語言模型。 | zh_TW |
dc.description.abstract | Large Language Models (LLMs) have not only revolutionized the field of Natural Language Processing (NLP) but also brought significant changes to real-world applications. Despite these advancements, certain areas, such as program generation, remain challenging. This thesis concentrates on the generation of two types of programs: math programs and Knowledge Graph Question Answering (KGQA) programs. For math programs, our work proposes a novel recycling numeracy data augmentation (RNDA) approach that automatically generates high-quality training instances with programs. Experimental results show that a model trained on the augmented data achieves state-of-the-art performance. Meanwhile, in the realm of KGQA programs, we propose a reverse generation-based validation method to enhance reliability. Experiments show that this approach also improves ChatGPT's performance on the task. In essence, this research delineates a paradigm shift in program generation through the introduction of new methods, focusing on improving math and KGQA programs. The findings offer a promising foundation for future research aimed at leveraging Large Language Models to their fullest potential. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:41:57Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-09-22T17:41:57Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Contents
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Math Word Problem
1.2 Knowledge Graph Question Answering
1.2.1 Background
1.2.2 Motivation
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Math Word Solving
2.2 Knowledge Graph Question Answering
2.2.1 Knowledge Graph
2.2.2 KGQA Dataset
2.2.3 Relation Extraction Approach
2.2.4 Chinese Knowledge Graph Question Answering Dataset
2.3 Models
2.3.1 LSTM
2.3.2 Transformer
2.3.3 BERT
2.3.4 T5
2.3.5 GPT-3.5
Chapter 3 Programs
3.1 Programs of Math Word Problem
3.2 Programs of KBQA
Chapter 4 Methodology
4.1 Method of Automatic Augmentation with Validation of Math Programs
4.1.1 Recycling Data Augmentation
4.1.2 Preprocessing
4.1.3 Numeracy Data Augmentation
4.1.4 RNDA Process
4.1.5 Program Generation
4.2 Validation of Knowledge Graph Answering via Reverse-Based Method
4.2.1 Overall Architecture
4.2.2 Reverse Generation Validation Process
4.2.2.1 Ranker
4.2.3 Reverse Generator
4.2.4 Generator
4.2.5 ChatGPT No-Training Approach
Chapter 5 Experiments
5.1 Experiment on Math Word Problem Solving
5.1.1 Experiment Setting
5.1.2 Overall Result of Math Word Problem
5.1.3 Result of RNDA
5.2 Experiment of Reverse Generation-Based Validation of KBQA
5.2.1 Experiment Setting
5.2.2 Result of Reverse Generation-Based Validation
5.2.3 Experiment Setting for ChatGPT
5.2.3.1 Prompt of ChatGPT Reverse Generator
5.2.3.2 Prompt of ChatGPT Generator
5.2.3.3 Corrector Prompt
5.3 Experiment of Generation-after-Ranking Approach of KGQA on Noisy CKBQA
5.3.1 Experiment Setting
5.3.2 Experiment Result
5.3.2.1 Ranker Result
5.3.2.2 Ranker + Generator (Pegasus T5-base)
5.3.2.3 Ranker + Generator (Pegasus T5-small)
Chapter 6 Discussion and Analysis
6.1 Analysis of Math Word Problem
6.1.1 Analysis of Augmentation
6.1.2 Analysis of Parameter h
6.2 Case of Augmentation-Based Validation Combined with ChatGPT
6.2.1 Question Generation by ChatGPT
6.2.2 ChatGPT Generator (G)
Chapter 7 Conclusion
References | - |
dc.language.iso | en | - |
dc.title | 應用自動驗證技術提升自然語言程式化問答系統可靠性之綜合研究 | zh_TW |
dc.title | Programming Natural Language for Strengthening QA Reliability through Automatic Validation | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 鄭卜壬;蔡宗翰;蔡銘峰 | zh_TW |
dc.contributor.oralexamcommittee | Pu-Jen Cheng;Tzong-Han Tsai;Ming-Feng Tsai | en |
dc.subject.keyword | 知識庫問答,數學問題,大型語言模型, | zh_TW |
dc.subject.keyword | Knowledge Graph Question Answering, Math Word Problem Solving, Large Language Model | en |
dc.relation.page | 59 | - |
dc.identifier.doi | 10.6342/NTU202302191 | - |
dc.rights.note | Authorization granted (open access worldwide) | - |
dc.date.accepted | 2023-08-07 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | - |
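The abstract describes a reverse generation-based validation: a generator proposes candidate programs for a question, a reverse generator maps each candidate back into a question, and the candidate whose reverse-generated question best matches the original is kept. The following is a minimal sketch of that loop; the toy lookup-table reverse generator, the `token_f1` similarity, and all example programs are illustrative stand-ins, not the thesis's actual models or data.

```python
def token_f1(a: str, b: str) -> float:
    """Token-overlap F1 between two strings (a simple similarity proxy)."""
    ta, tb = a.lower().split(), b.lower().split()
    common = sum(min(ta.count(t), tb.count(t)) for t in set(ta))
    if common == 0:
        return 0.0
    p, r = common / len(tb), common / len(ta)
    return 2 * p * r / (p + r)

def validate(question, candidates, reverse_generate):
    """Rank candidate programs by how well their reverse-generated
    question matches the original question; return the best program."""
    scored = [(token_f1(question, reverse_generate(p)), p) for p in candidates]
    scored.sort(reverse=True)
    return scored[0][1]

# Hypothetical reverse generator: program -> paraphrased question.
REVERSE = {
    "count(capital_of(France))": "how many capitals does France have",
    "capital_of(France)": "what is the capital of France",
}

best = validate(
    "what is the capital of France",
    ["count(capital_of(France))", "capital_of(France)"],
    REVERSE.get,
)
# Selects "capital_of(France)": its reverse-generated question
# matches the input exactly, while the count() candidate does not.
```

In the thesis this role is played by trained sequence-to-sequence models (or ChatGPT prompts), and the similarity check would use a learned ranker rather than token overlap; the sketch only shows the control flow of validating forward generations against their reverse generations.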
Appears in Collections: | Graduate Institute of Networking and Multimedia
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf | 617.15 kB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.