Large Language Models Perform Diagnostic Reasoning

吳承光; Cheng-Kuang Wu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95138

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳信希	zh_TW
dc.contributor.advisor	Hsin-Hsi Chen	en
dc.contributor.author	吳承光	zh_TW
dc.contributor.author	Cheng-Kuang Wu	en
dc.date.accessioned	2024-08-29T16:15:48Z	-
dc.date.available	2024-08-30	-
dc.date.copyright	2024-08-29	-
dc.date.issued	2024	-
dc.date.submitted	2024-08-15	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95138	-
dc.description.abstract	醫療診斷推理是臨床工作中的重要能力，它讓醫師能從病人身上「蒐集關鍵資訊」，並且利用蒐集到的資訊「預測診斷」。本論文以問診作為研究案例，探討大型語言模型之診斷推理能力，並提出進一步提升此推理能力之方法論。此項研究之主要貢獻有三：一、發展大型語言模型角色扮演評估框架，用以評估大型語言模型之問診能力；二、提出少樣本、零樣本之提示工程方法論，此方法論結合醫師診斷推理之思考過程，並以實驗佐證其能提升大型語言模型「蒐集關鍵資訊」以及「預測診斷」之表現。三、顯示大型語言模型能透過儲存及擷取自生成之診斷推理過程，持續增進其診斷預測之能力。	zh_TW
dc.description.abstract	Medical diagnostic reasoning is a crucial capability in clinical practice, enabling physicians to "collect key information" from patients and "predict diagnoses" based on the collected information. This thesis investigates the diagnostic reasoning abilities of large language models (LLMs) using history taking as a case study and proposes methodologies to further enhance these reasoning abilities. The main contributions of this research are threefold: (1) Development of an LLM Role-Playing Evaluation Framework to assess the history-taking abilities of LLMs. (2) Introduction of few-shot and zero-shot prompting methodologies that integrate the diagnostic reasoning processes of physicians, with experimental evidence demonstrating their effectiveness in improving LLMs' performance in "collecting key information" and "predicting diagnoses". (3) Demonstration that LLMs can continuously improve their diagnostic prediction capabilities by storing and retrieving self-generated diagnostic reasoning processes.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-29T16:15:48Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2024-08-29T16:15:48Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Acknowledgements i; Chinese Abstract (摘要) iv; Abstract v; Contents vi; List of Figures viii; List of Tables ix; Chapter 1 Introduction 1; 1.1 Background 1; 1.2 Motivation 2; 1.3 Thesis Organization 4; Chapter 2 Related Work 5; 2.1 History Taking 5; 2.2 Reasoning Abilities of Large Language Models 6; Chapter 3 Datasets 8; 3.1 Patient Profile 8; 3.2 Data Statistics 9; Chapter 4 Collecting Diagnostic Information 10; 4.1 The LLM-Role-Playing Evaluation Framework 10; 4.2 Evaluation Metric 12; 4.3 Experiments 12; 4.3.1 Few-Shot Setting 12; 4.3.1.1 Methodology 12; 4.3.1.2 Results 13; 4.3.2 Zero-Shot Setting 15; 4.3.2.1 Methodology 15; 4.3.2.2 Results 16; Chapter 5 Diagnosis Prediction 21; 5.1 General Setup 21; 5.2 Methodology 22; 5.2.1 Non-streaming setting 22; 5.2.2 Streaming setting 23; 5.3 Experiments 25; 5.3.1 Implementation 25; 5.3.2 Results 25; 5.4 Discussion 27; 5.4.1 Confusion Matrices of Single-Agent and Multi-Agent Memory Methods 27; 5.4.2 Ablation Studies on Multi-Agent Memory 28; Chapter 6 Conclusions 30; References 32	-
dc.language.iso	en	-
dc.subject	醫療診斷推理	zh_TW
dc.subject	大型語言模型	zh_TW
dc.subject	問診	zh_TW
dc.subject	History Taking	en
dc.subject	Large Language Models	en
dc.subject	Medical Diagnostic Reasoning	en
dc.title	大型語言模型進行醫療診斷推理	zh_TW
dc.title	Large Language Models Perform Diagnostic Reasoning	en
dc.type	Thesis	-
dc.date.schoolyear	112-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	張嘉惠;王釧茹;鄭卜壬	zh_TW
dc.contributor.oralexamcommittee	Chia-Hui Chang;Chuan-Ju Wang;Pu-Jen Cheng	en
dc.subject.keyword	大型語言模型,醫療診斷推理,問診	zh_TW
dc.subject.keyword	Large Language Models, Medical Diagnostic Reasoning, History Taking	en
dc.relation.page	36	-
dc.identifier.doi	10.6342/NTU202403801	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2024-08-15	-
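dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Department of Computer Science and Information Engineering	-
Appears in Collections:	Department of Computer Science and Information Engineering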

Files in This Item:

File	Size	Format
ntu-112-2.pdf	1.67 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.