Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98390
Full metadata record
dc.contributor.advisor: 張智星 (zh_TW)
dc.contributor.advisor: Jyh-Shing Jang (en)
dc.contributor.author: 楊博勛 (zh_TW)
dc.contributor.author: Po-Hsun Yang (en)
dc.date.accessioned: 2025-08-05T16:10:57Z
dc.date.available: 2025-08-06
dc.date.copyright: 2025-08-05
dc.date.issued: 2025
dc.date.submitted: 2025-07-28
dc.identifier.citation:
[1] V. Karpukhin, B. Oguz, S. Min, et al., “Dense passage retrieval for open-domain question answering,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online: Association for Computational Linguistics, 2020, pp. 6769–6781.
[2] Y. Liu, T. Han, S. Ma, and J. Zhang, “Summary of chatgpt-related research and perspective towards the future of large language models,” arXiv preprint arXiv:2304.01852, 2023.
[3] M. Enis and M. Hopkins, “From llm to nmt: Advancing low-resource machine translation with claude,” arXiv preprint arXiv:2404.13813, 2024.
[4] H.-W. Cheng, “Challenges and limitations of chatgpt and artificial intelligence for scientific research: A perspective from organic materials,” AI , vol. 4, no. 2, pp. 401–405, 2023. doi: 10.3390/ai4020021.
[5] S. Farquhar, J. Kossen, L. Kuhn, et al., “Detecting hallucinations in large language models using semantic entropy,” Nature, vol. 630, pp. 625–630, 2024. doi: 10.1038/s41586-024-07421-0.
[6] S. Dhuliawala, M. Komeili, J. Xu, et al., “Chain-of-verification reduces hallucination in large language models,” arXiv preprint arXiv:2309.11495, 2023.
[7] A. Mishra, A. Asai, V. Balachandran, Y. Wang, G. Neubig, and Y. Tsvetkov, “Fine-grained hallucination detection and editing for language models,” arXiv preprint arXiv:2401.06855, 2024.
[8] L. Huang, W. Yu, W. Ma, et al., “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,” arXiv preprint arXiv:2311.05232, 2023.
[9] S. T. I. Tonmoy, S. M. M. Zaman, V. Jain, et al., “A comprehensive survey on hallucination in large language models: Definition, evaluation, detection, and mitigation,” arXiv preprint arXiv:2401.01313, 2024.
[10] Z. Ji, N. Lee, R. Frieske, et al., “Survey of hallucination in natural language generation,” arXiv preprint arXiv:2202.03629, 2022.
[11] A. J. Thirunavukarasu, D. S. J. Ting, K. Elangovan, et al., “Large language models in medicine,” Nature Medicine, vol. 29, pp. 1930–1940, 2023. doi: 10.1038/s41591-023-02448-8
[12] D. Wadden, S. Lin, K. Lo, et al., “Fact or fiction: Verifying scientific claims,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online: Association for Computational Linguistics, 2020, pp. 7534–7550.
[13] J. Vladika, P. Schneider, and F. Matthes, “Healthfc: Verifying health claims with evidence-based medical fact-checking,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italia: ELRA and ICCL, 2024, pp. 8095–8107.
[14] J. Vladika, P. Schneider, and F. Matthes, “Medreqal: Examining medical knowledge recall of large language models via question answering,” arXiv preprint arXiv:2406.05845, 2024.
[15] Y. Gao, Y. Xiong, X. Gao, et al., “Retrieval-augmented generation for large language models: A survey,” arXiv preprint arXiv:2312.10997, 2023.
[16] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of Machine Learning Research, vol. 3, pp. 1137–1155, 2003.
[17] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[18] A. Grover and J. Leskovec, “Node2vec: Scalable feature learning for networks,” arXiv preprint arXiv:1607.00653, 2016.
[19] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training.” Accessed: 2024-04-02. (2018), [Online]. Available: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[20] C. Raffel, N. Shazeer, A. Roberts, K. Lee, and S. Narang, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv preprint arXiv:1910.10683, 2019.
[21] DeepSeek-AI, D. Guo, D. Yang, H. Zhang, J. Song, and R. Zhang, “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025.
[22] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman, “GLUE: A multi-task benchmark and analysis platform for natural language understanding,” in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium: Association for Computational Linguistics, 2018, pp. 353–355.
[23] A. Wang, Y. Pruksachatkun, N. Nangia, and A. Singh, “Superglue: A stickier benchmark for general-purpose language understanding systems,” arXiv preprint arXiv:1905.00537, 2019.
[24] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: A method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, USA: Association for Computational Linguistics, 2002, pp. 311–318.
[25] C.-Y. Lin, “Rouge: A package for automatic evaluation of summaries,” in Text Summarization Branches Out, Barcelona, Spain: Association for Computational Linguistics, 2004, pp. 74–81.
[26] S. Lin, J. Hilton, and O. Evans, “Truthfulqa: Measuring how models mimic human falsehoods,” arXiv preprint arXiv:2109.07958, 2021.
[27] A. Srivastava, A. Rastogi, A. Rao, A. A. M. Shoeb, and A. Abid, “Beyond the imitation game: Quantifying and extrapolating the capabilities of language models,” arXiv preprint arXiv:2206.04615, 2022.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, and L. Jones, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[29] Z. Zhou, X. Ning, K. Hong, T. Fu, J. Xu, and S. Li, “A survey on efficient inference for large language models,” arXiv preprint arXiv:2404.14294, 2024.
[30] “Prompt engineering playbook.” Accessed: 2024-04-02. (2023), [Online]. Available: https://www.developer.tech.gov.sg/products/collections/data-science-and-artificial-intelligence/playbooks/prompt-engineering-playbook-beta-v3.pdf.
[31] “Wikipedia 資料下載頁面.” Accessed: 2025-04-02. (2025), [Online]. Available: https://zh.wikipedia.org/wiki/Wikipedia:%E6%95%B0%E6%8D%AE%E5%BA%93%E4%B8%8B%E8%BD%BD.
[32] “Bge-m3.” Accessed: 2024-04-02. (2023), [Online]. Available: https://huggingface.co/BAAI/bge-m3.
[33] X. Zhang, X. Ma, P. Shi, and J. Lin, “Mr. tydi: A multi-lingual benchmark for dense retrieval,” arXiv preprint arXiv:2108.08787, 2021.
[34] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, and A. Siddhant, “Mt5: A massively multilingual pre-trained text-to-text transformer,” arXiv preprint arXiv:2010.11934, 2020.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98390
dc.description.abstract: This study develops a retrieval-oriented system for generating health checkup reports. The system integrates the ChatGPT language model to generate report content, using standardized health examination data and structured questionnaires as the retrieval corpus. Through a multi-stage retrieval architecture, the BGE-M3 embedding model first performs initial semantic matching, and the BGE-reranker then precisely ranks the relevance between the items under verification (checkup recommendations) and the retrieved results (examination data and questionnaires), selecting highly relevant results as reinforced grounding for verification by the DeepSeek language model. This mechanism effectively identifies hallucinations and logical fallacies in checkup recommendations, significantly improving overall report reliability and clinical reference value. The main contributions are: (1) reducing hallucinations and misjudgments when LLMs process complex checkup data: BGE-M3 performs multi-dimensional dense vector retrieval of relevant references, BGE-reranker applies attention mechanisms to re-rank the retrieved results, and the precisely filtered high-relevance information serves as the LLM's verification basis, effectively detecting hallucinations; in the future, retrieval parameters and datasets can be dynamically adjusted to the characteristics and needs of different examination items; (2) expanding applications in smart healthcare: the system realizes highly customized automated checkup report generation and hallucination verification, improving report accuracy, professional consistency, and patient comprehension, effectively reducing physician workload while maintaining professional interpretation quality, providing a scalable empirical framework for integrating precision medicine with medical AI, and advancing medical AI from a decision-support tool toward an intelligent collaborative partner. (zh_TW; translated)
dc.description.abstract: This study proposes a retrieval-augmented system for generating personalized health checkup reports. By integrating the ChatGPT language model with standardized health examination data and structured questionnaires as the retrieval corpus, the system employs a multi-stage retrieval framework: BGE-M3 is used for initial semantic matching, followed by BGE-reranker to rank the relevance between checkup suggestions and retrieved content. The top-ranked results are then provided as contextual grounding for the DeepSeek language model to detect hallucinations and logical inconsistencies in the generated recommendations. Experimental results demonstrate that this approach effectively reduces hallucinations, enhances report reliability, and improves clinical value. The contributions of this study lie in reducing LLM-induced errors when processing complex medical data through dense retrieval and re-ranking, and in expanding the application of AI in precision healthcare by enabling automated, accurate, and explainable report generation. (en)
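The retrieve-then-rerank flow described in the abstract can be sketched as follows. This is a minimal, self-contained illustration only: the thesis uses BGE-M3 dense embeddings for the first stage and BGE-reranker for the second, which are replaced here with toy bag-of-words scorers so the sketch runs without model downloads. All function names (`retrieve_candidates`, `rerank`) and the sample data are hypothetical, not taken from the thesis.

```python
# Toy sketch of two-stage retrieval: coarse retrieval over the whole corpus,
# then finer reranking of a short candidate list. Stage 1 stands in for
# BGE-M3 dense retrieval; stage 2 stands in for BGE-reranker.
import math
from collections import Counter

def embed(text):
    """Bag-of-words 'embedding'; the real system would use BGE-M3 vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve_candidates(query, corpus, k=3):
    """Stage 1: coarse semantic matching; keep only the top-k candidates."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank(query, candidates):
    """Stage 2: finer-grained relevance scoring of the shortlist (Jaccard overlap)."""
    q = set(query.lower().split())
    def score(d):
        toks = set(d.lower().split())
        return len(q & toks) / len(q | toks)
    return sorted(candidates, key=score, reverse=True)

# Hypothetical retrieval corpus: checkup values and questionnaire answers.
corpus = [
    "fasting glucose 130 mg/dL above normal range",
    "blood pressure 118/76 within normal limits",
    "questionnaire reports daily exercise habit",
]
suggestion = "elevated fasting glucose suggests follow-up testing"

shortlist = retrieve_candidates(suggestion, corpus, k=2)
evidence = rerank(suggestion, shortlist)[0]
print(evidence)  # the glucose record ranks first
```

In the actual system, the top-ranked evidence would be supplied to the DeepSeek model as grounding for hallucination checking; the two-stage design keeps the expensive fine-grained scorer on a short candidate list rather than the whole corpus.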
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:10:57Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-05T16:10:57Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Abstract (Chinese) i
Abstract iii
Table of Contents v
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
1.1 Research overview and motivation 1
1.2 Research scope 2
1.2.1 Data sources 2
1.2.2 Model selection 2
1.2.3 System design 3
1.3 Research contributions 4
1.4 Chapter overview 5
Chapter 2 Literature Review 7
2.1 Hallucination detection 7
2.2 Retrieval-augmented generation 9
2.3 Language models 10
2.4 Evaluation 12
Chapter 3 Methodology 13
3.1 Research framework 13
3.1.1 Data processing 13
3.1.2 Recommendation generation 14
3.1.3 Manual verification 14
3.1.4 Automated hallucination detection 14
3.1.5 Result evaluation 14
3.1.6 Observations and conclusions 15
3.2 Research tools 15
3.2.1 Large language models 15
3.2.2 Prompt engineering 16
3.2.3 Retrieval data 17
3.2.4 Vector embedding models 19
3.2.5 Reranking models 21
3.3 Implementation and procedures 22
3.3.1 Data preprocessing 22
3.3.2 Data retrieval 23
Chapter 4 Experimental Setup 27
4.1 Model configuration 27
4.1.1 Knowledge retrieval models 27
4.1.2 Language models 29
4.2 Evaluation methods 30
4.2.1 Evaluation workflow 30
4.2.2 Binary classification metrics for hallucination detection 30
4.3 Experimental design 31
4.3.1 Experimental parameter settings 31
4.3.2 Experiment roadmap 31
4.4 Datasets 32
4.4.1 Basic information 32
4.4.2 Health examination numerical data 33
4.4.3 Health questionnaire data 35
Chapter 5 Experimental Results and Discussion 37
5.1 Generation and annotation of checkup recommendations 37
5.2 Experiment 1: Effect of temperature settings on verification performance 37
5.2.1 Experimental setup 37
5.2.2 Result analysis 38
5.3 Experiment 2: Effect of different retrieval data on verification performance 38
5.3.1 Experimental setup 39
5.3.2 Result analysis 39
5.4 Experiment 3: Performance evaluation of checkup recommendation generation 40
5.4.1 Experimental setup 40
5.4.2 Result analysis 40
5.5 Analysis of factors affecting hallucination detection performance 40
5.5.1 Temperature settings have a minor effect on verification results 41
5.5.2 Retrieval data quality contributes significantly to verification performance 41
5.5.3 Strengthening retrieval data sources should be the priority for improving verification 41
5.6 Case studies of verification results 41
5.6.1 Case 1 42
5.6.2 Case 2 43
5.6.3 Case 3 43
Chapter 6 Conclusion 45
6.1 Main contributions 45
6.1.1 Combining data retrieval to improve hallucination detection accuracy 45
6.1.2 Incorporating domain knowledge to improve the model's hallucination recognition 46
6.2 Future work 46
6.2.1 Differentiated identification and handling of hallucination types 46
6.2.2 Expanding medical knowledge sources to strengthen verification grounding 46
References 47
Appendix A Prompts 51
A.1 Round 1: assigning the task description to the model 51
A.2 Round 2: providing medical knowledge 52
A.3 Round 3: providing medical knowledge 52
A.4 Round 4: providing medical knowledge 52
A.5 Round 5: providing medical knowledge 52
A.6 Round 6: providing checkup data and questionnaire responses 53
A.7 Round 7: requesting results from the model 53
dc.language.iso: zh_TW
dc.subject: 幻覺查核 (zh_TW)
dc.subject: 醫療健康 (zh_TW)
dc.subject: 檢索增強生成 (zh_TW)
dc.subject: 大型語言模型 (zh_TW)
dc.subject: Retrieval-Augmented Generation (en)
dc.subject: Hallucination Detection (en)
dc.subject: Healthcare (en)
dc.subject: Large Language Model (en)
dc.title: 結合知識檢索與生成式 AI 之健檢報告生成 (zh_TW)
dc.title: Integrating Knowledge Retrieval with Generative AI for Health Checkup Report Generation (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 張傑帆;朱大維 (zh_TW)
dc.contributor.oralexamcommittee: Jie-Fan Chang;Ta-Wei Chu (en)
dc.subject.keyword: 檢索增強生成, 幻覺查核, 大型語言模型, 醫療健康 (zh_TW)
dc.subject.keyword: Retrieval-Augmented Generation, Hallucination Detection, Large Language Model, Healthcare (en)
dc.relation.page: 53
dc.identifier.doi: 10.6342/NTU202502096
dc.rights.note: 未授權 (not authorized)
dc.date.accepted: 2025-07-30
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: N/A
Appears in Collections:資訊工程學系

Files in This Item:
ntu-113-2.pdf (Restricted Access, 3.27 MB, Adobe PDF)