Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99109

| Title: | 克服自動化ICD編碼的挑戰:大型語言模型的微調與上下文增強方法之探討 Overcoming Challenges in Automated ICD Coding: Exploring Fine-Tuning and Context-Enhancement with Large Language Models |
| Author: | 施名軒 Ming-Xuan Shi |
| Advisor: | 陳信希 Hsin-Hsi Chen |
| Keywords: | Large Language Model, Automated Medical Coding |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | This study investigates the application of Large Language Models (LLMs) to ICD (International Classification of Diseases) code prediction, focusing on the efficacy of Parameter-Efficient Fine-Tuning (PEFT) with LoRA. Through comprehensive experiments comparing fine-tuned LLMs with established deep learning baselines and unadapted general-purpose LLMs, our findings demonstrate the superior performance of domain-adapted LLMs on this complex medical coding task. Key findings include: fine-tuned LLMs, especially QwQ 32B, consistently outperform traditional deep learning models across evaluation metrics. A notable result is the substantial improvement in Macro F1 scores for fine-tuned LLMs, indicating a strong ability to handle sparse codes, a long-standing challenge in medical coding. The research emphasizes that domain-specific fine-tuning is essential for LLMs to precisely grasp medical semantics and coding guidelines; few-shot or zero-shot prompting with general-purpose LLMs falls short of the required accuracy. Furthermore, we found a positive correlation between model scale and performance, with larger LLMs exhibiting better results when efficiently adapted. Experiments also validate the superiority of fine-tuned LLMs in Precision@8 (P@8), a metric highly relevant to practical clinical applications. Models also show excellent performance in predicting major ICD categories, indicating a robust understanding of broader disease classifications. However, the study also highlights challenges in building medical reasoning models with smaller LLMs, particularly the propensity of general-purpose LLMs toward overcoding and erroneous reasoning when generating Chain-of-Thought (CoT) training data. Concurrently, our experiments demonstrate the potential of Retrieval-Augmented Generation (RAG), which significantly boosts LLM performance by providing in-context ICD code information, pointing toward integrating external knowledge bases to overcome current context-length limitations and further enhance coding accuracy. In conclusion, this research shows that Parameter-Efficient Fine-Tuning is an effective strategy for adapting large, pre-trained LLMs to the specialized and highly regulated domain of ICD coding, offering a viable path toward automated, accurate, and resource-efficient medical coding solutions. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99109 |
| DOI: | 10.6342/NTU202503544 |
| Full-Text Authorization: | Not authorized |
| Electronic Full-Text Release Date: | N/A |
| Appears in Collections: | Department of Computer Science and Information Engineering |
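The abstract above centers on two technical components: LoRA-based parameter-efficient fine-tuning and retrieval-augmented generation. Since the full text is not publicly available, the following is a minimal sketch of LoRA fine-tuning for a causal LLM, assuming the Hugging Face `transformers` and `peft` libraries; the hyperparameters and target modules are illustrative, not the thesis's actual configuration.

```python
# Minimal LoRA (PEFT) setup sketch -- assumed libraries: transformers, peft.
# Hyperparameters and target modules are illustrative, not the thesis's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "Qwen/QwQ-32B"  # one of the models evaluated in the thesis
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# LoRA freezes the base weights and trains only low-rank adapter matrices
# injected into the attention projections, so a 32B model can be adapted
# with a small fraction of trainable parameters.
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update (illustrative)
    lora_alpha=32,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports how few weights are trainable
```

Below is a similarly hedged sketch of the retrieval step in a RAG pipeline for ICD coding: candidate code descriptions most similar to the clinical note are retrieved and prepended to the prompt. TF-IDF similarity and the three-code table are stand-ins for whatever retriever and code inventory the thesis actually used.

```python
# Toy RAG retrieval step: rank ICD code descriptions against a clinical note
# and build an augmented prompt. TF-IDF is a stand-in retriever (assumption).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

icd_codes = {
    "I10": "Essential (primary) hypertension",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "J18.9": "Pneumonia, unspecified organism",
}

vectorizer = TfidfVectorizer()
code_matrix = vectorizer.fit_transform(icd_codes.values())

def retrieve_candidates(note: str, top_k: int = 2) -> list[str]:
    """Return the top-k ICD codes whose descriptions best match the note."""
    scores = cosine_similarity(vectorizer.transform([note]), code_matrix)[0]
    ranked = sorted(zip(icd_codes, scores), key=lambda kv: kv[1], reverse=True)
    return [code for code, _ in ranked[:top_k]]

note = "Patient admitted with community-acquired pneumonia; history of hypertension."
candidates = retrieve_candidates(note)
prompt = f"Candidate ICD codes: {candidates}\nClinical note: {note}\nAssign ICD codes:"
```

In a full pipeline, `prompt` would be passed to the fine-tuned model above; the abstract suggests this in-context code information is what lets RAG mitigate context-length limits.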
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (restricted access) | 569.42 kB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
