NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99109
Full metadata record:
dc.contributor.advisor [zh_TW]: 陳信希
dc.contributor.advisor [en]: Hsin-Hsi Chen
dc.contributor.author [zh_TW]: 施名軒
dc.contributor.author [en]: Ming-Xuan Shi
dc.date.accessioned: 2025-08-21T16:25:23Z
dc.date.available: 2025-08-22
dc.date.copyright: 2025-08-21
dc.date.issued: 2025
dc.date.submitted: 2025-08-02
dc.identifier.citation:
Kimberly J. O'Malley, Karon F. Cook, Matt D. Price, Kimberly Raiford Wildes, John F. Hurdle, and Carol M. Ashton. Measuring diagnoses: ICD code accuracy. Health Services Research, 40(5 II):1620–1639, October 2005. ISSN 0017-9124. doi: 10.1111/j.1475-6773.2005.00444.x.
Elena Birman-Deych, Amy D. Waterman, Yan Yan, David S. Nilasena, Martha J. Radford, and Brian F. Gage. Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Medical Care, 43(5):480–485, 2005.
Alex Bottle and Paul Aylin. Intelligent information: A national system for monitoring clinical performance. Health Services Research, 43:10–31, March 2008. doi: 10.1111/j.1475-6773.2007.00742.x.
Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, and Benyou Wang. HuatuoGPT-o1, towards medical complex reasoning with LLMs. arXiv preprint arXiv:2412.18925, 2024.
Luciano R. S. de Lima, Alberto H. F. Laender, and Berthier A. Ribeiro-Neto. A hierarchical approach to the automatic categorization of medical documents. In Proceedings of the Seventh International Conference on Information and Knowledge Management, CIKM '98, pages 132–139, New York, NY, USA, 1998. Association for Computing Machinery. ISBN 1581130619. doi: 10.1145/288627.288649.
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
Chao-Wei Huang, Shang-Chi Tsai, and Yun-Nung Chen. PLM-ICD: Automatic ICD coding with pretrained language models. arXiv preprint arXiv:2207.05289, 2022.
Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1):1–9, 2016.
Bevan Koopman, Guido Zuccon, Anthony Nguyen, Anton Bergheim, and Narelle Grayson. Automatic ICD-10 classification of cancers from free-text death certificates. International Journal of Medical Informatics, 84(11):956–965, 2015.
Fei Li and Hong Yu. ICD coding from clinical text using multi-filter residual convolutional neural network. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8180–8187, 2020.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 2013.
Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori Hashimoto. s1: Simple test-time scaling. arXiv preprint arXiv:2501.19393, 2025.
James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, and Jacob Eisenstein. Explainable prediction of medical codes from clinical text. arXiv preprint arXiv:1802.05695, 2018.
Adler Perotte, Rimma Pivovarov, Karthik Natarajan, Nicole Weiskopf, Frank Wood, and Noémie Elhadad. Diagnosis code assignment: models and evaluation metrics. Journal of the American Medical Informatics Association, 21(2):231–237, 2014.
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023.
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
Ali Soroush, Benjamin S. Glicksberg, Eyal Zimlichman, Yiftach Barash, Robert Freeman, Alexander W. Charney, Girish N. Nadkarni, and Eyal Klang. Large language models are poor medical coders: benchmarking of medical code querying. NEJM AI, 1(5):AIdbp2300040, 2024.
Qwen Team. QwQ-32B: Embracing the power of reinforcement learning, March 2025. URL https://qwenlm.github.io/blog/qwq-32b/.
World Health Organization. International Statistical Classification of Diseases and Related Health Problems. World Health Organization, 10th edition, 2016. URL https://icd.who.int/browse10/2016/en.
Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, et al. Towards large reasoning models: A survey of reinforced reasoning with large language models. arXiv preprint arXiv:2501.09686, 2025.
Haoran Xu, Baolin Peng, Hany Awadalla, Dongdong Chen, Yen-Chun Chen, Mei Gao, Young Jin Kim, Yunsheng Li, Liliang Ren, Yelong Shen, et al. Phi-4-Mini-Reasoning: Exploring the limits of small reasoning language models in math. arXiv preprint arXiv:2504.21233, 2025.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99109
dc.description.abstract [zh_TW]:
本研究深入探討了大型語言模型(LLMs)在國際疾病分類(ICD)編碼預測任務中的應用,特別著重於使用參數高效微調(PEFT)技術,即LoRA。我們透過全面的實驗分析,將微調後的LLMs與現有的深度學習基準模型以及未經適應的通用LLMs進行比較,結果明確顯示領域適應型LLMs在處理此複雜的醫療編碼任務上展現出卓越的性能。
關鍵研究發現包括:微調後的LLMs(尤其是QwQ-32B)在各項評估指標上均優於傳統深度學習模型。我們觀察到,微調後的LLMs在處理稀疏編碼方面表現出顯著提升,這解決了長期以來在醫療編碼領域中的一大挑戰。研究強調,領域專屬的微調對於LLMs精確掌握醫療語義和編碼指南至關重要,僅憑通用LLMs的少樣本學習或零樣本提示遠不足以達到所需的精確度。此外,我們發現模型規模與性能之間存在正相關,較大的LLMs在高效適應後能展現更好的結果。實驗也證實了微調後LLMs在前8位精確度(P@8)上的優勢,這對於實際臨床應用具有高度相關性。在預測主要ICD類別時,模型展現出優異的性能,顯示其對更廣泛疾病分類的穩固理解。
然而,研究也揭示了在構建能進行醫療推理的小型LLMs時所面臨的挑戰,特別是通用LLMs在生成高質量思維鏈(CoT)數據方面存在過度編碼和錯誤推理的問題。同時,我們透過實驗證明,檢索增強生成(RAG)方法極具潛力,能透過提供相關ICD編碼資訊顯著提升LLMs的性能。這為未來透過整合外部知識庫來克服當前上下文長度限制並提高編碼準確性指明了方向。
總之,本研究證明了參數高效微調是將大型預訓練LLMs應用於ICD編碼這一高度專業化領域的有效策略,為實現自動化、準確且資源高效的醫療編碼解決方案提供了可行路徑。
dc.description.abstract [en]:
This study investigates the application of Large Language Models (LLMs) to ICD code prediction, focusing on the efficacy of Parameter-Efficient Fine-Tuning (PEFT) with LoRA. Through comprehensive experiments comparing fine-tuned LLMs against established deep learning baselines and unadapted general-purpose LLMs, our findings demonstrate the superior performance of domain-adapted LLMs on this complex medical coding task.
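As a concrete illustration of the PEFT-with-LoRA setup the abstract refers to, here is a minimal sketch using the Hugging Face transformers and peft libraries; the checkpoint name, adapter rank, and target modules are assumptions for illustration, not the configuration reported in the thesis.

```python
# Minimal LoRA fine-tuning setup (illustrative values, not the thesis's config).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/QwQ-32B"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA injects small trainable low-rank matrices into the attention
# projections while the base weights stay frozen.
lora = LoraConfig(
    r=16,                 # assumed adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of parameters train
```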
Key findings include: fine-tuned LLMs, especially QwQ-32B, consistently outperform traditional deep learning models across evaluation metrics. A notable result is the substantial improvement in Macro F1 for fine-tuned LLMs, indicating a strong ability to handle sparse codes, a long-standing challenge in medical coding. The research emphasizes that domain-specific fine-tuning is essential for LLMs to precisely grasp medical semantics and coding guidelines; few-shot or zero-shot prompting of general LLMs falls short of the required accuracy. We also found a positive correlation between model scale and performance, with larger LLMs yielding better results when efficiently adapted. Experiments further validate the superiority of fine-tuned LLMs on Precision@8 (P@8), a metric highly relevant to practical clinical use, and the models perform well when predicting major ICD categories, indicating a robust understanding of broader disease classifications.
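To make the metrics named above precise, the following is a minimal sketch of micro-/macro-averaged F1 and Precision@8 for multi-label code prediction, assuming binary indicator matrices and real-valued scores; it is an illustration, not the thesis's evaluation code.

```python
import numpy as np

def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: (n_notes, n_codes) binary indicator matrices."""
    tp = (y_true * y_pred).sum(axis=0)
    fp = ((1 - y_true) * y_pred).sum(axis=0)
    fn = (y_true * (1 - y_pred)).sum(axis=0)
    # Micro: pool counts over all codes, then compute one F1 (frequent codes dominate).
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # Macro: per-code F1 averaged unweighted, so sparse codes count equally.
    macro = (2 * tp / np.maximum(2 * tp + fp + fn, 1)).mean()
    return micro, macro

def precision_at_k(y_true, scores, k=8):
    """Average fraction of the k top-scoring codes per note that are correct."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(y_true, topk, axis=1).mean()
```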
However, the study also highlights challenges in building medical reasoning models with smaller LLMs, particularly the propensity of general LLMs toward overcoding and erroneous reasoning when generating Chain-of-Thought (CoT) training data. Our experiments also demonstrate the potential of Retrieval-Augmented Generation (RAG), which significantly boosts LLM performance by providing relevant ICD code information in context. This points toward integrating external knowledge bases to overcome current context-length limitations and further improve coding accuracy.
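Below is a sketch of the kind of retrieval-augmented prompting this describes: candidate ICD-9 code descriptions are retrieved by embedding similarity and placed in the model's context. Both embed and code_index are hypothetical stand-ins, and the thesis's actual retrieval pipeline may differ.

```python
import numpy as np

def build_rag_prompt(note, code_index, embed, top_k=20):
    """code_index: list of (code, description) pairs; embed: any function
    mapping text to a unit-norm vector. Both are hypothetical stand-ins."""
    desc_vecs = np.stack([embed(desc) for _, desc in code_index])
    sims = desc_vecs @ embed(note)  # cosine similarity for unit-norm vectors
    candidates = [code_index[i] for i in np.argsort(-sims)[:top_k]]
    context = "\n".join(f"{code}: {desc}" for code, desc in candidates)
    return (
        f"Candidate ICD-9 codes:\n{context}\n\n"
        f"Discharge summary:\n{note}\n\n"
        "List every ICD-9 code supported by this summary."
    )
```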
In conclusion, this research shows that Parameter-Efficient Fine-Tuning is an effective strategy for adapting large pre-trained LLMs to the specialized, highly regulated domain of ICD coding, offering a viable path toward automated, accurate, and resource-efficient medical coding.
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-21T16:25:22Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2025-08-21T16:25:23Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgements i
摘要 (Chinese Abstract) iii
Abstract v
Contents vii
List of Figures xi
List of Tables xii
Chapter 1 Introduction 1
1.1 The Challenge of Medical Coding 1
1.2 Evolution of Automated Coding Models 2
1.3 Motivation 4
1.4 Thesis Organization 5
Chapter 2 Related Work 6
2.1 Evolution of Automated Coding Research 6
2.2 Deep Learning Models for ICD Coding 7
2.2.1 CAML 7
2.2.2 MultiResCNN 8
2.2.3 LAAT 10
2.2.4 PLM-ICD 11
2.3 Challenges of Large Language Models in Automated Medical Coding 12
Chapter 3 Method 14
3.1 Datasets 14
3.2 Preprocessing 15
3.3 Task Definition 16
3.4 Overall Framework 16
Chapter 4 Experiments 20
4.1 Experiment Setup 20
4.1.1 Computational Resources 20
4.1.2 Model Selection and Configuration 20
4.1.3 Parameter-Efficient Fine-Tuning (PEFT) with LoRA 21
4.1.4 Training Hyperparameters 22
4.2 Evaluation Metrics 23
4.2.1 Micro-averaged Metrics 23
4.2.2 Macro-averaged Metrics 24
4.2.3 Precision at 8 (P@8) 25
4.3 Results and Discussion 25
4.3.1 Experimental Findings and Analysis 25
4.3.2 Overall Performance Superiority of Fine-tuned LLMs 26
4.3.3 Enhanced Performance on Sparse Codes: A Key LLM Advantage 27
4.3.4 The Indispensable Role of Domain-Specific Fine-Tuning 28
4.3.5 Impact of Model Scale on Fine-tuned Performance 28
4.3.6 Clinical Relevance and Practical Utility (P@8) 29
4.3.7 Generative Fidelity of Code Sequences (BLEU-L) 29
4.3.8 Resource Efficiency, Scale, and the Performance Landscape 29
4.3.9 Granularity of Prediction: Significant Gains in Major Categories and Future Directions 30
Chapter 5 Beyond Fine-tuning 33
5.1 Challenges in Training Reasoning Models 33
5.1.1 Empirical Validation: Large LLM Performance on ICD Coding Difficulty 35
5.1.2 Case Study: Overcoding in ICD-9 Assignment 36
5.1.3 Conclusion and Implications for Reasoning Model Development 37
5.2 Exploring Retrieval-Augmented Generation (RAG) for ICD Coding 38
5.2.1 Experimental Design for RAG Feasibility 38
5.2.2 Results and Discussion 39
5.2.2.1 Observation 1: Significant Performance Boost with In-Context Codes 39
5.2.2.2 Observation 2: Impact of Noise and Completeness of Relevant Codes 40
5.2.2.3 Observation 3: General LLMs Benefit Disproportionately from RAG 41
Chapter 6 Conclusion & Future Work 42
6.1 Conclusion 42
6.2 Future Work 44
6.2.1 Enhancing Reasoning Capabilities for Granular ICD Coding 44
6.2.2 Advancing Retrieval-Augmented Generation (RAG) for ICD Coding 45
6.2.3 Beyond ICD-9 and Broader Implications 47
References 48
dc.language.iso: en
dc.subject [zh_TW]: 醫療編碼自動化
dc.subject [zh_TW]: 大型語言模型
dc.subject [en]: Automated Medical Coding
dc.subject [en]: Large Language Model
dc.title [zh_TW]: 克服自動化ICD編碼的挑戰:大型語言模型的微調與上下文增強方法之探討
dc.title [en]: Overcoming Challenges in Automated ICD Coding: Exploring Fine-Tuning and Context-Enhancement with Large Language Models
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee [zh_TW]: 陳冠宇;黃乾綱;黃瀚萱
dc.contributor.oralexamcommittee [en]: Kuan-Yu Chen;Chien-Kang Huang;Hen-Hsen Huang
dc.subject.keyword [zh_TW]: 大型語言模型, 醫療編碼自動化
dc.subject.keyword [en]: Large Language Model, Automated Medical Coding
dc.relation.page: 51
dc.identifier.doi: 10.6342/NTU202503544
dc.rights.note: 未授權 (not authorized)
dc.date.accepted: 2025-08-06
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: N/A
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-113-2.pdf (未授權公開取用 / restricted access)
Size: 569.42 kB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
