Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94676
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳縕儂 | zh_TW |
dc.contributor.advisor | Yun-Nung Chen | en |
dc.contributor.author | 黃兆緯 | zh_TW |
dc.contributor.author | Chao-Wei Huang | en |
dc.date.accessioned | 2024-08-16T17:28:01Z | - |
dc.date.available | 2024-08-17 | - |
dc.date.copyright | 2024-08-16 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-08-08 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94676 | - |
dc.description.abstract | 近年來,人工智慧應用程式越來越依賴大量的一般和專門知識庫來處理複雜的任務,如問答和資訊搜尋。傳統的資訊檢索(IR)技術在有效地從各種知識來源收集相關資訊方面發揮了關鍵作用,從而提高了這些應用程式的性能。儘管這些資訊檢索系統十分有效,但它們需要大量的訓練資料,這限制了它們在各種任務中的應用。另一方面,大型語言模型(LLMs)因其能夠儲存在廣泛預訓練過程中獲得的大量知識而廣受歡迎。此外,這些模型擅長遵循各種指令並執行廣泛的任務。然而,大型語言模型在事實準確性方面存在困難,且無法提供最新資訊,這大大限制了它們成為下一代資訊存取引擎的效能。
因此,本篇論文旨在探索並增強資訊檢索和大型語言模型之間的協同效應,以提升它們的相互效能。具體來說,我們提出了資料高效檢索與大型語言模型事實性對齊的技術。對於資料高效檢索,我們提出了從各個面向解決檢索系統資料效率問題的技術。透過利用大型語言模型的強大能力,我們提出了無需或僅需少量標註資料即可建立檢索系統的新方法。對於事實性對齊,我們提出了一種對齊演算法 FactAlign,該演算法利用長篇事實性評估器提供的細粒度訊號。我們的研究重點為開發同時發揮資訊檢索和大型語言模型優勢的技術,最終相互提升它們的效能。本篇論文中的每一項研究都展示了一個有效利用此協同效應的學習框架;總括而言,本論文探索了如何有效利用資訊檢索和大型語言模型之間的協同效應,以相互提升彼此的效能。 | zh_TW |
dc.description.abstract | In recent years, artificial intelligence applications have increasingly relied on substantial knowledge bases, both general and domain-specific, to address complex tasks such as question answering and information seeking. Traditional information retrieval (IR) techniques have played a critical role in enhancing the performance of these applications by effectively gathering relevant information from diverse knowledge sources. Despite their utility, these IR systems require an extensive amount of training data, which hinders their application to a wide range of tasks.
On the other hand, large language models (LLMs) have gained immense popularity due to their ability to store vast amounts of knowledge acquired during extensive pretraining. In addition, these models excel at following diverse instructions and performing a wide array of tasks. However, LLMs struggle with factual accuracy and are incapable of providing up-to-date information, which significantly limits their effectiveness as a next-generation engine for information access. In this thesis, we aim to explore and enhance the synergies between IR and LLMs, seeking to improve their mutual effectiveness. Specifically, we present techniques for data-efficient retrieval and factuality alignment of LLMs. For data-efficient retrieval, we present techniques that improve the data efficiency of retrieval systems in several aspects. By leveraging the robust capabilities of LLMs, we propose novel methodologies for building retrieval systems with little or no annotated data. For factuality alignment, we propose an alignment algorithm, FactAlign, that leverages the fine-grained signals provided by long-form factuality evaluators. In summary, this thesis introduces various techniques that leverage the strengths of both IR and LLMs, ultimately improving their mutual effectiveness. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-16T17:28:01Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-08-16T17:28:01Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgements i
摘要 iii
Abstract v
Contents vii
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Information Retrieval 5
2.1.1 Dense Passage Retrieval 5
2.1.2 Knowledge Distillation for Dense Retrieval 6
2.1.3 Passage Reranking 6
2.2 Large Language Models 7
2.2.1 Language Model Alignment 7
2.2.2 Factuality of Language Models 8
Chapter 3 Data-efficient Retrieval 9
3.1 Background 9
3.1.1 Dense Retrieval 9
3.1.2 Passage Reranking 10
3.1.3 Knowledge Distillation for Dense Retrieval 11
3.2 Query Likelihood for Unsupervised Multilingual Retrieval 12
3.2.1 Unsupervised Multilingual Reranking 12
3.2.2 Knowledge-Distilled Retriever Training 14
3.2.3 Iterative Training 15
3.2.4 Experiments 16
3.2.4.1 Datasets 16
3.2.4.2 Baseline Systems 17
3.2.4.3 Implementation Details 18
3.2.4.4 Main Results 19
3.2.5 Analysis and Discussion 20
3.2.5.1 Unsupervised Multilingual Reranking 20
3.2.5.2 Question Generation 21
3.2.5.3 In-batch Negative 22
3.2.5.4 Effect of Hyperparameters 22
3.2.5.5 Retrieval Performance on XOR-Full 23
3.2.6 Summary 24
3.3 Synthetic Data for Conversational Dense Retrieval 25
3.3.1 Proposed Framework 25
3.3.1.1 Few-Shot Conversational Query Generation 25
3.3.1.2 Two-Stage Generation 26
3.3.1.3 Passage Switching 27
3.3.1.4 Consistency Filtering 27
3.3.2 Experiments 28
3.3.2.1 Experimental Setup 28
3.3.2.2 Main Results 29
3.3.2.3 Ablation and Comparative Study 30
3.3.2.4 Effect of Generated Data Size 31
3.3.3 Qualitative Study 32
3.3.4 Summary 33
3.4 Instruction-based Unsupervised Passage Reranking 33
3.4.1 Proposed Framework 33
3.4.1.1 INSTUPR: Instruction-based Unsupervised Passage Reranking 34
3.4.1.2 Soft Relevance Score Aggregation 34
3.4.1.3 Pairwise Reranking 35
3.4.2 Experiments 36
3.4.2.1 Setup 36
3.4.2.2 Baseline Systems 37
3.4.3 Main Results 37
3.4.4 Ablation Study 38
3.4.5 Summary 38
3.5 Pairwise Relevance Distillation 39
3.5.1 Proposed Framework 39
3.5.1.1 Pairwise Reranking 39
3.5.1.2 Pairwise Relevance Distillation 41
3.5.1.3 Iterative Training 43
3.5.2 Experiments 43
3.5.2.1 Datasets 43
3.5.2.2 Baseline Models 45
3.5.2.3 Implementation Details 46
3.5.3 Main Results 48
3.5.3.1 In-domain Evaluation 48
3.5.3.2 Out-of-domain Evaluation 48
3.5.4 Discussions 49
3.5.4.1 Ablation Study 49
3.5.4.2 Zero-shot Domain Adaptation 52
3.5.5 Additional Analyses 53
3.5.5.1 Reranking Performance 53
3.5.5.2 Difference between Pairwise and Pointwise Reranking 53
3.5.5.3 Effect of Hyperparameters 53
3.5.6 Summary 54
Chapter 4 Factuality Alignment of LLMs 55
4.1 Introduction 55
4.2 Preliminaries 57
4.2.1 Long-form Factuality 58
4.2.2 Kahneman-Tversky Optimization 59
4.3 FACTALIGN: Aligning Language Models for Long-form Factuality 61
4.3.1 Automatic Long-form Factuality Evaluator 61
4.3.2 Long-form Factuality Alignment 63
4.3.2.1 Response-level Alignment 63
4.3.2.2 Sentence-level Alignment 64
4.3.3 Iterative Optimization 65
4.4 Experimental Setup 65
4.4.1 Datasets 65
4.4.2 Long-form Factuality Evaluator 66
4.4.3 Models 67
4.4.4 Evaluation Procedure 67
4.4.5 Implementation Details 68
4.5 Results 69
4.5.1 Ablation Study 71
4.5.2 Generalization to New Topics 73
4.5.3 Sensitivity of Hyperparameters 73
4.6 Summary 74
Chapter 5 Conclusion and Future Work 75
5.1 Conclusion 75
5.2 Future Work 76
References 79 | - |
dc.language.iso | en | - |
dc.title | 資訊檢索與語言模型的協同效應:從資料高效密集檢索到事實性優化 | zh_TW |
dc.title | Synergies in Information Retrieval and Language Models: From Data-efficient Dense Retrieval to Factuality Alignment | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-2 | - |
dc.description.degree | 博士 | - |
dc.contributor.oralexamcommittee | 林守德;李宏毅;陳尚澤;孫紹華;張碩尹 | zh_TW |
dc.contributor.oralexamcommittee | Shou-De Lin;Hung-Yi Lee;Shang-Tse Chen;Shao-Hua Sun;Shuo-Yiin Chang | en |
dc.subject.keyword | 資訊檢索,密集檢索,文件重排序,語言模型,事實性,語言模型對齊 | zh_TW |
dc.subject.keyword | information retrieval, dense retrieval, passage reranking, language models, factuality, alignment | en |
dc.relation.page | 98 | - |
dc.identifier.doi | 10.6342/NTU202403870 | - |
dc.rights.note | Authorized for release (open access worldwide) | - |
dc.date.accepted | 2024-08-12 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-112-2.pdf | 7.02 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their respective license terms.