Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92439

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 陳信希 | zh_TW |
| dc.contributor.advisor | Hsin-Hsi Chen | en |
| dc.contributor.author | 黃哲韋 | zh_TW |
| dc.contributor.author | Che-Wei Huang | en |
| dc.date.accessioned | 2024-03-22T16:30:47Z | - |
| dc.date.available | 2025-01-17 | - |
| dc.date.copyright | 2024-03-22 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-12-25 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92439 | - |
| dc.description.abstract | 在當今充滿訊息的社會中,從龐大的數據中獲取所需要的訊息有重大挑戰性。人們期望能夠通過程式從廣泛的資料中檢索所需訊息並在推理判斷後提供所需答案。因此,我們在開放式檢索的框架下,建立了一個新的對話式機器閱讀理解資料集,名為WHITE-ShARC。同樣是對話式機器閱讀理解的設定下,與以往多數資料集不同,該資料集強調通過模型提問追問問題的方式逐步推理出用戶問題的答案,而不僅僅依賴代名詞指代以連接兩個不緊密關聯的問題及答案組。另外,相比OR-ShARC資料集,WHITE-ShARC在開放檢索和對話式機器閱讀理解的設定下擁有更廣泛的問題類型,並引入了無法回答的場景。
本論文將介紹WHITE-ShARC資料集的構建過程,並使用多種擴充方法,尤其是利用當前最火紅的大型語言模型ChatGPT。此外,我們使用retriever-reranker-reader的框架來處理對話式機器閱讀理解任務,並進行了全面的實驗和分析。特別的是,我們探索了ChatGPT在該任務中的應用,取得了一些正面的成果,同時也發現了一些潛在的改進空間。整體而言,通過建置這個新的資料集,我們試圖推進對話式機器閱讀理解領域的研究,促進其進一步的探索和進展。 | zh_TW |
| dc.description.abstract | In today's information-rich society, the ability to retrieve desired information from vast amounts of data is essential. There is a growing demand for systems that can effectively navigate complex information sources and provide accurate and relevant answers to user queries. In this thesis, we address this challenge by introducing a new conversational machine reading comprehension (CMRC) dataset, named WHITE-ShARC. Unlike previous datasets that rely on co-reference resolution to connect question-answer pairs, our dataset emphasizes the conversational aspect by requiring models to reason through follow-up questions to gradually arrive at the answers. WHITE-ShARC combines the characteristics of conversational dialogue and open-retrieval settings, enabling a broader range of question types and incorporating unanswerable instances.
The construction of WHITE-ShARC involves the use of various augmentation methods, including leveraging large language models like ChatGPT. We propose a retriever-reranker-reader framework to tackle the CMRC task and conduct comprehensive experiments and analyses, including exploring the capabilities of ChatGPT. The results demonstrate both positive contributions and potential areas for improvement. Overall, this research advances the field of CMRC by introducing a new dataset that encourages conversational reasoning and provides insights for future research in this area. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-22T16:30:47Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-03-22T16:30:47Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
摘要 ii
Abstract iii
Contents v
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Thesis Organization 6
Chapter 2 Dataset Construction 7
2.1 Rule Text 7
2.2 Difficulties 7
2.3 Main Annotation Stage 9
2.4 Human Augmentation Stage 13
2.5 Transfer Stage 14
2.5.1 Leaf Node Transfer 15
2.5.1.1 Logic 16
2.5.1.2 Answer Annotation 20
2.5.2 Internal Node Transfer 21
2.6 ChatGPT Augmentation Stage 23
Chapter 3 WHITE-ShARC 25
3.1 WHITE-ShARC Setup 25
3.2 WHITE-ShARC Statistics 26
3.3 Evaluation Metrics 27
Chapter 4 Methodology 31
4.1 Retriever 31
4.2 Reranker 32
4.3 Reader 33
Chapter 5 Experiments 35
5.1 Baseline Models 35
5.2 Results 36
5.3 Seen and Unseen Splits 40
5.4 ChatGPT 43
5.4.1 Reader 44
5.4.2 Data Augmentation 45
5.4.3 Rule Segmentation 47
5.4.4 Scenario Rewriting 48
5.5 Larger Model Scale 50
Chapter 6 Related Work 52
6.1 Question Answering Dataset 52
6.2 Conversational Question Answering Dataset 53
6.3 Open Retrieval Setting 54
Chapter 7 Conclusion 55
References 56 | - |
| dc.language.iso | en | - |
| dc.subject | 對話式機器閱讀理解 | zh_TW |
| dc.subject | 開放式檢索 | zh_TW |
| dc.subject | 問答系統 | zh_TW |
| dc.subject | question answering | en |
| dc.subject | open-retrieval | en |
| dc.subject | conversational machine reading comprehension | en |
| dc.title | 基於追問問題建模的開放檢索對話式機器閱讀模型研究 | zh_TW |
| dc.title | Follow-up Question Modeling for Open Retrieval Conversational Machine Reading with Wh-Questions | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-1 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 蔡宗翰;鄭卜壬;蔡銘峰 | zh_TW |
| dc.contributor.oralexamcommittee | Tzong-Han Tsai;Pu-Jen Cheng;Ming-Feng Tsai | en |
| dc.subject.keyword | 開放式檢索,對話式機器閱讀理解,問答系統 | zh_TW |
| dc.subject.keyword | open-retrieval, conversational machine reading comprehension, question answering | en |
| dc.relation.page | 60 | - |
| dc.identifier.doi | 10.6342/NTU202304553 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2023-12-26 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
Appears in Collections: 資訊工程學系
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-112-1.pdf (not authorized for public access) | 1.81 MB | Adobe PDF |