Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98885
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 陳信希 (zh_TW)
dc.contributor.advisor: Hsin-Hsi Chen (en)
dc.contributor.author: 陳彥錞 (zh_TW)
dc.contributor.author: Yen-Chun Chen (en)
dc.date.accessioned: 2025-08-20T16:09:21Z
dc.date.available: 2025-08-21
dc.date.copyright: 2025-08-20
dc.date.issued: 2025
dc.date.submitted: 2025-08-13
dc.identifier.citation:
Hsin-Yu Tsai, Hen-Hsen Huang, Che-Jui Chang, Jaw-Shiun Tsai, and Hsin-Hsi Chen. Patient history summarization on outpatient conversation. In 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pages 364–370. IEEE, 2022.
Wen-wai Yim, Yujuan Fu, Asma Ben Abacha, Neal Snider, Thomas Lin, and Meliha Yetisgen. Aci-bench: a novel ambient clinical intelligence dataset for benchmarking automatic visit note generation. Scientific data, 10(1):586, 2023.
Xuehai He, Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, et al. Meddialog: Two large-scale medical dialogue datasets. arXiv preprint arXiv:2004.03329, 2020.
Guojun Yan, Jiahuan Pei, Pengjie Ren, Zhaochun Ren, Xin Xin, Huasheng Liang, Maarten De Rijke, and Zhumin Chen. Remedi: Resources for multi-domain, multiservice, medical dialogues. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3013–3024, 2022.
Junda Wang, Zonghai Yao, Zhichao Yang, Huixue Zhou, Rumeng Li, Xun Wang, Yucheng Xu, and Hong Yu. Notechat: a dataset of synthetic doctor-patient conversations conditioned on clinical notes. arXiv preprint arXiv:2310.15959, 2023.
Asma Ben Abacha, Wen-wai Yim, Griffin Adams, Neal Snider, and Meliha Yetisgen-Yildiz. Overview of the mediqa-chat 2023 shared tasks on the summarization & generation of doctor-patient conversations. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 503–513, 2023.
Marcus V Ortega, Michael K Hidrue, Sara R Lehrhoff, Dan B Ellis, Rachel C Sisodia, William T Curry, Marcela G Del Carmen, and Jason H Wasfy. Patterns in physician burnout in a stable-linked cohort. JAMA Network Open, 6(10):e2336745–e2336745, 2023.
Jeffrey Budd. Burnout related to electronic health record use in primary care. Journal of primary care & community health, 14:21501319231166921, 2023.
Yizhan Li, Sifan Wu, Christopher Smith, Thomas Lo, and Bang Liu. Improving clinical note generation from complex doctor-patient conversation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 209–221. Springer, 2025.
John Giorgi, Augustin Toma, Ronald Xie, Sondra S Chen, Kevin R An, Grace X Zheng, and Bo Wang. Wanglab at mediqa-chat 2023: Clinical note generation from doctor-patient conversations using large language models. arXiv preprint arXiv:2305.02220, 2023.
Yu-Wen Chen and Julia Hirschberg. Exploring robustness in doctor-patient conversation summarization: An analysis of out-of-domain soap notes. arXiv preprint arXiv:2406.02826, 2024.
Wei Chen, Zhiwei Li, Hongyi Fang, Qianyuan Yao, Cheng Zhong, Jianye Hao, Qi Zhang, Xuanjing Huang, Jiajie Peng, and Zhongyu Wei. A benchmark for automatic medical consultation system: frameworks, tasks and datasets. Bioinformatics, 39(1):btac817, 2023.
Asma Ben Abacha, Wen-wai Yim, Yadan Fan, and Thomas Lin. An empirical study of clinical note generation from doctor-patient encounters. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2291–2302, 2023.
Lin Long, Rui Wang, Ruixuan Xiao, Junbo Zhao, Xiao Ding, Gang Chen, and Haobo Wang. On llms-driven synthetic data generation, curation, and evaluation: A survey. arXiv preprint arXiv:2406.15126, 2024.
Huiyi Leong, Yifan Gao, Shuai Ji, Yang Zhang, and Uktu Pamuksuz. Efficient finetuning of large language models for automated medical documentation. In 2024 4th International Conference on Digital Society and Intelligent Systems (DSInS), pages 204–209. IEEE, 2024.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98885
dc.description.abstract: 大型語言模型近年來被視為簡化臨床工作流程的有力工具。然而,在高度敏感的醫療領域中應用模型面臨諸多挑戰,例如隱私保護,以及缺乏公開且高品質的臨床對話資料集。本研究聚焦於使用開源大型語言模型,在完全本地環境下,從實際的門診對話中產生臨床紀錄,以確保病患隱私不外洩。我們設計了一套完整的資料前處理流程,包含對實際醫療對話的摘要與翻譯,並重新標註對應的臨床紀錄內容。本文探討三種臨床紀錄生成方式:單階段端到端生成、兩階段檢索增強生成、以及單階段生成搭配合成對話擴充。我們的實驗顯示監督式微調在效能上表現優異,且小型模型在準確檢索關鍵證據方面亦展現潛力。儘管大型語言模型可在一定程度上協助摘要臨床紀錄,但要維持完全在地部署兼顧效能仍是一大挑戰。本論文突顯了當前大型語言模型應用於醫療資料的潛力與限制,特別是在隱私要求高、需本地部署的場景下。 (zh_TW)
dc.description.abstract: Large language models (LLMs) have emerged as a promising tool to streamline clinical workflows. However, applying LLMs in the highly sensitive domain of healthcare faces major challenges, such as strict privacy regulations and the scarcity of publicly available, high-quality clinical dialogue datasets. This work focuses on clinical note generation from real-world outpatient conversations using open-source LLMs in a fully local environment to preserve patient privacy. We developed a comprehensive data preprocessing pipeline involving summarization and translation of real-life medical dialogues, along with meticulous re-annotation of the corresponding clinical notes. Three approaches to note generation are explored: One-stage End-to-end Generation, Two-stage Retrieval-Augmented Generation, and One-stage Generation with Synthetic Dialogue Augmentation. Our experiments demonstrated the effectiveness of supervised fine-tuning methods and the potential of smaller models in accurately retrieving evidence. While LLM applications can assist in summarizing clinical notes to a certain extent, maintaining fully local models for privacy remains a significant challenge. This work highlights both the potential and the limitations of current LLM-based approaches in this specialized domain, particularly under local deployment constraints. (en)
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-20T16:09:21Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-20T16:09:21Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Clinical Note Generation
2.2 Medical Dialogue Datasets
2.3 Synthetic Data Generation
Chapter 3 Datasets
3.1 FamilyMed-Dialogue-Note Dataset
3.2 ACI-Bench Dataset
Chapter 4 Methodology
4.1 FamilyMed-Dialogue-Note Data Preprocessing
4.1.1 Translation
4.1.2 Summarization
4.1.3 Re-annotation
4.2 Clinical Note Generation
4.2.1 One-stage End-to-end Generation
4.2.2 Two-stage Retrieval-Augmented Generation
4.2.3 One-stage Generation with Synthetic Dialogue Augmentation
4.3 Evaluation Metrics
Chapter 5 Experiments
5.1 Experimental Setup
5.2 One-stage End-to-end Generation
5.2.1 Impact of Model Size
5.2.2 Effect of Few-Shot Prompting
5.2.3 Supervised Fine-tuning (SFT)
5.2.4 Input Format Comparison
5.2.5 Combining SFT with Few-Shot Prompts
5.3 Two-stage Retrieval-Augmented Generation
5.4 One-stage Generation with Synthetic Dialogue Augmentation
Chapter 6 Discussion
6.1 Case Study
6.2 Column-Wise Analysis
6.2.1 Chief Complaint
6.2.2 Patient History and Lifestyle
6.2.3 Assessment and Plan
Chapter 7 Conclusion
References
dc.language.iso: en
dc.subject: 大型語言模型 (zh_TW)
dc.subject: 臨床紀錄生成 (zh_TW)
dc.subject: 本地部署 (zh_TW)
dc.subject: 隱私保護 (zh_TW)
dc.subject: 醫療對話 (zh_TW)
dc.subject: Medical Dialogue (en)
dc.subject: Privacy (en)
dc.subject: Local Deployment (en)
dc.subject: Large Language Models (en)
dc.subject: Clinical Note Generation (en)
dc.title: 基於本地大型語言模型的門診對話臨床紀錄摘要 (zh_TW)
dc.title: Clinical Note Summarization from Outpatient Conversations Using Local LLMs (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 鄭卜壬; 陳建錦; 古倫維 (zh_TW)
dc.contributor.oralexamcommittee: PJ Cheng; Chien Chin Chen; Lun-Wei Ku (en)
dc.subject.keyword: 臨床紀錄生成, 大型語言模型, 醫療對話, 隱私保護, 本地部署 (zh_TW)
dc.subject.keyword: Clinical Note Generation, Large Language Models, Medical Dialogue, Privacy, Local Deployment (en)
dc.relation.page: 41
dc.identifier.doi: 10.6342/NTU202504251
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2025-08-15
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Graduate Institute of Networking and Multimedia
dc.date.embargo-lift: 2025-08-21
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File, Size, Format
ntu-113-2.pdf, 1.47 MB, Adobe PDF
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.
