NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98070
Title: A Spatial-Temporal Deep Learning Approach to Identify Laparoscopic Cholecystectomy Actions for Objective Surgical Performance Evaluation
(Chinese title: 發展時空深度學習模型以識別腹腔鏡膽囊切除術動作-手術表現之客觀化評估)
Author: Jia-Yuan Huang (黃嘉園)
Advisor: Jakey Blue (藍俊宏)
Keywords: laparoscopic cholecystectomy video analytics, surgical action identification, spatial-temporal deep learning model, surgical performance evaluation
Publication Year: 2025
Degree: Master's
Abstract:
This study proposes a spatial-temporal deep learning framework for recognizing surgical actions in laparoscopic cholecystectomy, aiming to facilitate objective and automated evaluation of surgical performance. Laparoscopic cholecystectomy is one of the most commonly performed minimally invasive surgeries in contemporary clinical practice, and its procedural complexity and precision make surgical outcomes highly dependent on the technical proficiency and experience of the operating surgeon. Traditional performance assessment relies primarily on manual video review and checklist-based scoring by experts, a process that is time-consuming, subjective, and inconsistent, posing challenges to surgical training and quality control.
Although prior studies have explored phase recognition to automate surgical workflow analysis, phase-level classification is often too coarse to capture nuanced surgical actions and hand movements. Furthermore, models that rely solely on short video snippets are vulnerable to temporal fluctuations and repetitive patterns, resulting in reduced accuracy. Thus, precise action-level recognition with temporal awareness is critical for improving surgical education, performance feedback, and the development of intelligent decision support systems.
To address these challenges, we propose two deep learning architectures that integrate spatial and temporal features: the Spatial-Temporal Encoding Memory Network (STEMNet) and the Spatial-Temporal Memory Transformer (STMT). STEMNet adopts a two-stage training strategy inspired by TMRNet, where a ResNet50-LSTM backbone extracts short-term features and stores them in a memory bank. These features are later fused with current inputs using a Transformer encoder to enhance contextual understanding. In contrast, STMT focuses on capturing fine-grained, rapidly changing actions by leveraging ResNet50 for spatial encoding followed by a Transformer encoder to model temporal dependencies in a non-recurrent, fully parallel manner.
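As a rough illustration of the STMT design described above, the following PyTorch sketch pairs a ResNet50 frame encoder with a Transformer encoder that attends across a clip in parallel. The action count, clip length, and layer sizes are illustrative assumptions, not the thesis's actual configuration, and STEMNet's memory bank and two-stage training are omitted for brevity.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class STMTSketch(nn.Module):
    """Minimal sketch of a ResNet50 + Transformer action classifier (not the thesis code)."""
    def __init__(self, num_actions=10, d_model=512, seq_len=16):
        super().__init__()
        backbone = resnet50(weights=None)    # spatial feature extractor per frame
        backbone.fc = nn.Identity()          # keep the 2048-d pooled features
        self.backbone = backbone
        self.proj = nn.Linear(2048, d_model)                        # project to Transformer width
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))   # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)  # non-recurrent temporal fusion
        self.head = nn.Linear(d_model, num_actions)                 # per-frame action logits

    def forward(self, clips):
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))                  # (b*t, 2048)
        feats = self.proj(feats).view(b, t, -1) + self.pos[:, :t]
        feats = self.temporal(feats)                                # attend across all time steps at once
        return self.head(feats)                                     # (batch, time, num_actions)

logits = STMTSketch()(torch.randn(2, 16, 3, 224, 224))              # -> shape (2, 16, 10)

Unlike an LSTM, the Transformer encoder sees every frame in the clip simultaneously, which is what allows the fully parallel temporal modeling mentioned above.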
To evaluate the effectiveness of the proposed models, experiments were conducted using the Cholec80 dataset. Various sequence lengths and pretraining strategies were compared using classification accuracy and recall as performance metrics. In addition, temporal visualization of model outputs was implemented to support clinical interpretation, enabling surgeons to efficiently review the frequency and distribution of actions and gain insights into their technical patterns and timing.
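To make the metric computation and timeline visualization concrete, here is a minimal sketch assuming scikit-learn for accuracy and per-class recall and matplotlib for the timeline; the action names and label arrays are hypothetical placeholders, not actual Cholec80 annotations.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, recall_score

actions = ["grasp", "dissect", "clip", "cut"]       # hypothetical action set, not Cholec80's
y_true = np.random.randint(0, len(actions), 300)    # stand-in ground-truth labels per frame
y_pred = np.random.randint(0, len(actions), 300)    # stand-in model predictions

print("accuracy:", accuracy_score(y_true, y_pred))
print("per-class recall:", recall_score(y_true, y_pred, average=None))

# Timeline view: one horizontal track per action, marked where it is predicted,
# so a reviewer can scan the frequency and distribution of each action at a glance.
fig, ax = plt.subplots(figsize=(8, 2))
for idx, name in enumerate(actions):
    frames = np.where(y_pred == idx)[0]
    ax.scatter(frames, np.full_like(frames, idx), s=4)
ax.set_yticks(range(len(actions)))
ax.set_yticklabels(actions)
ax.set_xlabel("frame index")
ax.set_title("Predicted action timeline (illustrative)")
plt.tight_layout()
plt.show()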
Experimental results demonstrate that the proposed architectures outperform traditional phase recognition models, particularly in distinguishing repetitive and heterogeneous actions. The models significantly reduce manual annotation burden and review time, offering promising utility for clinical deployment. Overall, this study highlights the potential of deep learning in surgical video analysis and contributes a practical framework for automated performance evaluation and educational support in minimally invasive surgery.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98070
DOI: 10.6342/NTU202502244
Full-Text License: Authorized (campus-only access)
Full-Text Release Date: 2030-07-22
Appears in Collections: Institute of Industrial Engineering

Files in This Item:
File: ntu-113-2.pdf (not authorized for public access)
Size: 3.09 MB
Format: Adobe PDF


Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
