Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98390
Full metadata record
dc.contributor.advisor: 張智星 (zh_TW)
dc.contributor.advisor: Jyh-Shing Jang (en)
dc.contributor.author: 楊博勛 (zh_TW)
dc.contributor.author: Po-Hsun Yang (en)
dc.date.accessioned: 2025-08-05T16:10:57Z
dc.date.available: 2025-08-06
dc.date.copyright: 2025-08-05
dc.date.issued: 2025
dc.date.submitted: 2025-07-28
dc.identifier.citation:
[1] V. Karpukhin, B. Oguz, S. Min, et al., “Dense passage retrieval for open-domain question answering,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online: Association for Computational Linguistics, 2020, pp. 6769–6781.
[2] Y. Liu, T. Han, S. Ma, and J. Zhang, “Summary of chatgpt-related research and perspective towards the future of large language models,” arXiv preprint arXiv:2304.01852, 2023.
[3] M. Enis and M. Hopkins, “From llm to nmt: Advancing low-resource machine translation with claude,” arXiv preprint arXiv:2404.13813, 2024.
[4] H.-W. Cheng, “Challenges and limitations of chatgpt and artificial intelligence for scientific research: A perspective from organic materials,” AI , vol. 4, no. 2, pp. 401–405, 2023. doi: 10.3390/ai4020021.
[5] S. Farquhar, J. Kossen, L. Kuhn, et al., “Detecting hallucinations in large language models using semantic entropy,” Nature, vol. 630, pp. 625–630, 2024. doi: 10.1038/s41586-024-07421-0.
[6] S. Dhuliawala, M. Komeili, J. Xu, et al., “Chain-of-verification reduces hallucination in large language models,” arXiv preprint arXiv:2309.11495, 2023.
[7] A. Mishra, A. Asai, V. Balachandran, Y. Wang, G. Neubig, and Y. Tsvetkov, “Fine-grained hallucination detection and editing for language models,” arXiv preprint arXiv:2401.06855, 2024.
[8] L. Huang, W. Yu, W. Ma, et al., “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,” arXiv preprint arXiv:2311.05232, 2023.
[9] S. T. I. Tonmoy, S. M. M. Zaman, V. Jain, et al., “A comprehensive survey on hallucination in large language models: Definition, evaluation, detection, and mitigation,” arXiv preprint arXiv:2401.01313, 2024.
[10] Z. Ji, N. Lee, R. Frieske, et al., “Survey of hallucination in natural language generation,” arXiv preprint arXiv:2202.03629, 2022.
[11] A. J. Thirunavukarasu, D. S. J. Ting, K. Elangovan, et al., “Large language models in medicine,” Nature Medicine, vol. 29, pp. 1930–1940, 2023. doi: 10.1038/s41591-023-02448-8
[12] D. Wadden, S. Lin, K. Lo, et al., “Fact or fiction: Verifying scientific claims,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online: Association for Computational Linguistics, 2020, pp. 7534–7550.
[13] J. Vladika, P. Schneider, and F. Matthes, “Healthfc: Verifying health claims with evidence-based medical fact-checking,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italia: ELRA and ICCL, 2024, pp. 8095–8107.
[14] J. Vladika, P. Schneider, and F. Matthes, “Medreqal: Examining medical knowledge recall of large language models via question answering,” arXiv preprint arXiv:2406.05845, 2024.
[15] Y. Gao, Y. Xiong, X. Gao, et al., “Retrieval-augmented generation for large language models: A survey,” arXiv preprint arXiv:2312.10997, 2023.
[16] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of Machine Learning Research, vol. 3, pp. 1137–1155, 2003.
[17] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[18] A. Grover and J. Leskovec, “Node2vec: Scalable feature learning for networks,” arXiv preprint arXiv:1607.00653, 2016.
[19] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training.” Accessed: 2024-04-02. (2018), [Online]. Available: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[20] C. Raffel, N. Shazeer, A. Roberts, K. Lee, and S. Narang, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv preprint arXiv:1910.10683, 2019.
[21] DeepSeek-AI, D. Guo, D. Yang, H. Zhang, J. Song, and R. Zhang, “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025.
[22] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman, “GLUE: A multi-task benchmark and analysis platform for natural language understanding,” in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium: Association for Computational Linguistics, 2018, pp. 353–355.
[23] A. Wang, Y. Pruksachatkun, N. Nangia, and A. Singh, “Superglue: A stickier benchmark for general-purpose language understanding systems,” arXiv preprint arXiv:1905.00537, 2019.
[24] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: A method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, USA: Association for Computational Linguistics, 2002, pp. 311–318.
[25] C.-Y. Lin, “Rouge: A package for automatic evaluation of summaries,” in Text Summarization Branches Out, Barcelona, Spain: Association for Computational Linguistics, 2004, pp. 74–81.
[26] S. Lin, J. Hilton, and O. Evans, “Truthfulqa: Measuring how models mimic human falsehoods,” arXiv preprint arXiv:2109.07958, 2021.
[27] A. Srivastava, A. Rastogi, A. Rao, A. A. M. Shoeb, and A. Abid, “Beyond the imitation game: Quantifying and extrapolating the capabilities of language models,” arXiv preprint arXiv:2206.04615, 2022.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, and L. Jones, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[29] Z. Zhou, X. Ning, K. Hong, T. Fu, J. Xu, and S. Li, “A survey on efficient inference for large language models,” arXiv preprint arXiv:2404.14294, 2024.
[30] “Prompt engineering playbook.” Accessed: 2024-04-02. (2023), [Online]. Available: https://www.developer.tech.gov.sg/products/collections/data-science-and-artificial-intelligence/playbooks/prompt-engineering-playbook-beta-v3.pdf.
[31] “Wikipedia 資料下載頁面.” Accessed: 2025-04-02. (2025), [Online]. Available: https://zh.wikipedia.org/wiki/Wikipedia:%E6%95%B0%E6%8D%AE%E5%BA%93%E4%B8%8B%E8%BD%BD.
[32] “Bge-m3.” Accessed: 2024-04-02. (2023), [Online]. Available: https://huggingface.co/BAAI/bge-m3.
[33] X. Zhang, X. Ma, P. Shi, and J. Lin, “Mr. tydi: A multi-lingual benchmark for dense retrieval,” arXiv preprint arXiv:2108.08787, 2021.
[34] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, and A. Siddhant, “Mt5: A massively multilingual pre-trained text-to-text transformer,” arXiv preprint arXiv:2010.11934, 2020.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98390
dc.description.abstract: This study develops a retrieval-oriented system for generating health checkup reports. The system integrates the ChatGPT language model to generate report content, using standardized health examination data and structured questionnaires as the retrieval corpus. Through a multi-stage retrieval architecture, the BGE-M3 embedding model first performs initial semantic matching, and the BGE-reranker then precisely ranks the relevance between the items under verification (checkup recommendations) and the retrieved results (examination data and questionnaires), selecting highly relevant results as reinforced grounding for verification by the DeepSeek language model. This mechanism effectively identifies hallucinations and logical fallacies in checkup recommendations, significantly improving overall report reliability and clinical reference value. The main contributions are: (1) reducing hallucinations and misjudgments when LLMs process complex checkup data: BGE-M3 performs multi-dimensional dense vector retrieval of relevant references, BGE-reranker applies attention mechanisms to re-rank the retrieved results, and the precisely filtered high-relevance information serves as the LLM's verification basis, effectively detecting hallucinations; in the future, retrieval parameters and datasets can be dynamically adjusted to the characteristics and needs of different examination items; (2) expanding applications in smart healthcare: the system realizes highly customized automated checkup report generation and hallucination verification, improving report accuracy, professional consistency, and patient comprehension, effectively reducing physician workload while maintaining professional interpretation quality, providing a scalable empirical framework for integrating precision medicine with medical AI, and advancing medical AI from a decision-support tool toward an intelligent collaborative partner. (zh_TW; translated)
dc.description.abstract: This study proposes a retrieval-augmented system for generating personalized health checkup reports. By integrating the ChatGPT language model with standardized health examination data and structured questionnaires as the retrieval corpus, the system employs a multi-stage retrieval framework: BGE-M3 is used for initial semantic matching, followed by BGE-reranker to rank the relevance between checkup suggestions and retrieved content. The top-ranked results are then provided as contextual grounding for the DeepSeek language model to detect hallucinations and logical inconsistencies in the generated recommendations. Experimental results demonstrate that this approach effectively reduces hallucinations, enhances report reliability, and improves clinical value. The contributions of this study lie in reducing LLM-induced errors when processing complex medical data through dense retrieval and re-ranking, and in expanding the application of AI in precision healthcare by enabling automated, accurate, and explainable report generation. (en)
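The retrieve-then-rerank flow described in the abstract can be sketched as follows. This is a minimal, self-contained illustration only: the thesis uses BGE-M3 dense embeddings for the first stage and BGE-reranker for the second, which are replaced here with toy bag-of-words scorers so the sketch runs without model downloads. All function names (`retrieve_candidates`, `rerank`) and the sample data are hypothetical, not taken from the thesis.

```python
# Toy sketch of two-stage retrieval: coarse retrieval over the whole corpus,
# then finer reranking of a short candidate list. Stage 1 stands in for
# BGE-M3 dense retrieval; stage 2 stands in for BGE-reranker.
import math
from collections import Counter

def embed(text):
    """Bag-of-words 'embedding'; the real system would use BGE-M3 vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve_candidates(query, corpus, k=3):
    """Stage 1: coarse semantic matching; keep only the top-k candidates."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank(query, candidates):
    """Stage 2: finer-grained relevance scoring of the shortlist (Jaccard overlap)."""
    q = set(query.lower().split())
    def score(d):
        toks = set(d.lower().split())
        return len(q & toks) / len(q | toks)
    return sorted(candidates, key=score, reverse=True)

# Hypothetical retrieval corpus: checkup values and questionnaire answers.
corpus = [
    "fasting glucose 130 mg/dL above normal range",
    "blood pressure 118/76 within normal limits",
    "questionnaire reports daily exercise habit",
]
suggestion = "elevated fasting glucose suggests follow-up testing"

shortlist = retrieve_candidates(suggestion, corpus, k=2)
evidence = rerank(suggestion, shortlist)[0]
print(evidence)  # the glucose record ranks first
```

In the actual system, the top-ranked evidence would be supplied to the DeepSeek model as grounding for hallucination checking; the two-stage design keeps the expensive fine-grained scorer on a short candidate list rather than the whole corpus.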
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:10:57Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-05T16:10:57Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Abstract (Chinese) i
Abstract iii
Table of Contents v
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
1.1 Research overview and motivation 1
1.2 Research scope 2
1.2.1 Data sources 2
1.2.2 Model selection 2
1.2.3 System design 3
1.3 Research contributions 4
1.4 Chapter overview 5
Chapter 2 Literature Review 7
2.1 Hallucination detection 7
2.2 Retrieval-augmented generation 9
2.3 Language models 10
2.4 Evaluation 12
Chapter 3 Methodology 13
3.1 Research framework 13
3.1.1 Data processing 13
3.1.2 Recommendation generation 14
3.1.3 Manual verification 14
3.1.4 Automated hallucination detection 14
3.1.5 Result evaluation 14
3.1.6 Observations and conclusions 15
3.2 Research tools 15
3.2.1 Large language models 15
3.2.2 Prompt engineering 16
3.2.3 Retrieval data 17
3.2.4 Vector embedding models 19
3.2.5 Reranking models 21
3.3 Implementation and procedures 22
3.3.1 Data preprocessing 22
3.3.2 Data retrieval 23
Chapter 4 Experimental Setup 27
4.1 Model configuration 27
4.1.1 Knowledge retrieval models 27
4.1.2 Language models 29
4.2 Evaluation methods 30
4.2.1 Evaluation workflow 30
4.2.2 Binary classification metrics for hallucination detection 30
4.3 Experimental design 31
4.3.1 Experimental parameter settings 31
4.3.2 Experiment roadmap 31
4.4 Datasets 32
4.4.1 Basic information 32
4.4.2 Health examination numerical data 33
4.4.3 Health questionnaire data 35
Chapter 5 Experimental Results and Discussion 37
5.1 Generation and annotation of checkup recommendations 37
5.2 Experiment 1: Effect of temperature settings on verification performance 37
5.2.1 Experimental setup 37
5.2.2 Result analysis 38
5.3 Experiment 2: Effect of different retrieval data on verification performance 38
5.3.1 Experimental setup 39
5.3.2 Result analysis 39
5.4 Experiment 3: Performance evaluation of checkup recommendation generation 40
5.4.1 Experimental setup 40
5.4.2 Result analysis 40
5.5 Analysis of factors affecting hallucination detection performance 40
5.5.1 Temperature settings have a minor effect on verification results 41
5.5.2 Retrieval data quality contributes significantly to verification performance 41
5.5.3 Strengthening retrieval data sources should be the priority for improving verification 41
5.6 Case studies of verification results 41
5.6.1 Case 1 42
5.6.2 Case 2 43
5.6.3 Case 3 43
Chapter 6 Conclusion 45
6.1 Main contributions 45
6.1.1 Combining data retrieval to improve hallucination detection accuracy 45
6.1.2 Incorporating domain knowledge to improve the model's hallucination recognition 46
6.2 Future work 46
6.2.1 Differentiated identification and handling of hallucination types 46
6.2.2 Expanding medical knowledge sources to strengthen verification grounding 46
References 47
Appendix A Prompts 51
A.1 Round 1: assigning the task description to the model 51
A.2 Round 2: providing medical knowledge 52
A.3 Round 3: providing medical knowledge 52
A.4 Round 4: providing medical knowledge 52
A.5 Round 5: providing medical knowledge 52
A.6 Round 6: providing checkup data and questionnaire responses 53
A.7 Round 7: requesting results from the model 53
dc.language.iso: zh_TW
dc.subject: 幻覺查核 (zh_TW)
dc.subject: 醫療健康 (zh_TW)
dc.subject: 檢索增強生成 (zh_TW)
dc.subject: 大型語言模型 (zh_TW)
dc.subject: Retrieval-Augmented Generation (en)
dc.subject: Hallucination Detection (en)
dc.subject: Healthcare (en)
dc.subject: Large Language Model (en)
dc.title: 結合知識檢索與生成式 AI 之健檢報告生成 (zh_TW)
dc.title: Integrating Knowledge Retrieval with Generative AI for Health Checkup Report Generation (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 張傑帆;朱大維 (zh_TW)
dc.contributor.oralexamcommittee: Jie-Fan Chang;Ta-Wei Chu (en)
dc.subject.keyword: 檢索增強生成, 幻覺查核, 大型語言模型, 醫療健康 (zh_TW)
dc.subject.keyword: Retrieval-Augmented Generation, Hallucination Detection, Large Language Model, Healthcare (en)
dc.relation.page: 53
dc.identifier.doi: 10.6342/NTU202502096
dc.rights.note: 未授權 (not authorized)
dc.date.accepted: 2025-07-30
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: N/A
Appears in Collections:資訊工程學系

Files in This Item:
ntu-113-2.pdf (Restricted Access, 3.27 MB, Adobe PDF)