Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101372
| Title: | Advancing Computer-Aided Diagnosis in Musculoskeletal Ultrasound: First Low-Rank Adaptation Based DETR for Real-Time Full Body Anatomical Structures Identification with Prospective Study in Intelligent Diagnosis Using Large Language Model |
| Author: | Jyun-Ping Kao (高駿平) |
| Advisor: | Charlie Chung-Ping Chen (陳中平) |
| Co-advisor: | Wen-Shiang Chen (陳文翔) |
| Keywords: | Deep Learning, Transformer, Fine-tune, Low-Rank Adaptation, Object Detection, Musculoskeletal Ultrasound, Large Language Model |
| Year of Publication: | 2025 |
| Degree: | Master's |
| Abstract: | Medical imaging models for object detection often rely on extensive pretraining data, which is difficult to obtain due to data scarcity and privacy constraints. In practice, hospitals typically have access only to pretrained model weights, not the original training data, severely limiting their ability to tailor models to specific patient populations and imaging devices. This weight-only sharing paradigm presents significant challenges: when recipient hospitals fine-tune these models on their limited local datasets, conventional approaches rewrite millions of parameters, triggering catastrophic forgetting and substantial domain shift. Such shifts frequently cause severe performance degradation, with accuracy declining by 6–24% across anatomical structures and by up to 100% for certain critical structures, rendering some previously detectable structures completely unrecognizable. We address these challenges with the first Low-Rank Adaptation (LoRA)-enhanced Real-Time Detection Transformer (RT-DETR) model for full-body musculoskeletal (MSK) ultrasound (US). By injecting LoRA modules into selected encoder and decoder layers of RT-DETR, we reduce the number of trainable parameters by 99.45% (RT-DETR-L) and 99.68% (RT-DETR-X) while preserving the model's representational power. This extreme reduction enables efficient fine-tuning with only minimal institution-specific data (about 10% of the pretraining dataset), maintaining robust performance on anatomical structures absent from the fine-tuning set while significantly improving detection accuracy on domain-specific structures. This approach effectively simulates a clinical scenario in which hospitals share only model weights, and recipient institutions successfully adapt these models using their own limited data without compromising performance on previously learned domains.
In extensive 5-fold cross-validation, our LoRA-enhanced model outperformed traditional full-model fine-tuning, maintaining or improving detection accuracy across a wide range of MSK structures while demonstrating strong resilience to domain shift. The proposed LoRA-enhanced RT-DETR significantly lowers the barrier to deploying transformer-based detection in clinics, offering a privacy-conscious, computationally lightweight solution for real-time, full-body MSK US identification that can be adapted across diverse clinical settings without extensive retraining or data sharing. Looking beyond detection, we also present pioneering research into intelligent diagnosis of MSK US using large language models (LLMs): in a prospective study, we fine-tuned the Llama 3.2 11B Vision model to interpret sequential ultrasound scans, mimicking clinician workflows for AI-assisted diagnostic reporting. |
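The abstract's core idea is that LoRA freezes each pretrained weight matrix and trains only a pair of small low-rank factors, which is why the trainable-parameter count collapses. A minimal NumPy sketch of one LoRA-adapted linear layer (illustrative only; the dimensions, rank, and scaling here are assumptions for demonstration, not the thesis's actual RT-DETR configuration):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a LoRA update.

    W: frozen pretrained weight, shape (d_out, d_in).
    A: trainable down-projection, shape (r, d_in), with r << min(d_out, d_in).
    B: trainable up-projection, shape (d_out, r), zero-initialized.
    Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    """
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

d_in, d_out, r = 256, 256, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # trainable, zero-init so the adapter
                                        # contributes nothing before training

x = rng.normal(size=(1, d_in))
# With B zero-initialized, the adapted layer reproduces the pretrained layer.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

full_params = d_out * d_in              # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)        # parameters updated by LoRA
print(f"trainable fraction: {lora_params / full_params:.4f}")  # → 0.0312
```

Even at this toy scale the trainable fraction is about 3%; applied only to selected encoder/decoder layers of a large detector, the same arithmetic yields reductions on the order of the 99.45–99.68% the abstract reports.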
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101372 |
| DOI: | 10.6342/NTU202504693 |
| Full-text license: | Not authorized |
| Electronic full-text release date: | N/A |
| Appears in collections: | Graduate Institute of Biomedical Electronics and Bioinformatics |
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (restricted access) | 2.92 MB | Adobe PDF |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
