Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98070

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 藍俊宏 | zh_TW |
| dc.contributor.advisor | Jakey Blue | en |
| dc.contributor.author | 黃嘉園 | zh_TW |
| dc.contributor.author | Jia-Yuan Huang | en |
| dc.date.accessioned | 2025-07-24T16:04:41Z | - |
| dc.date.available | 2025-07-25 | - |
| dc.date.copyright | 2025-07-24 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-07-22 | - |
| dc.identifier.citation | Padoy, N., Blum, T., Feussner, H., Berger, M. O., & Navab, N. (2008). On-line recognition of surgical activity for monitoring in the operating room. In AAAI (pp. 1718-1724).<br>Blum, T., Feußner, H., & Navab, N. (2010). Modeling and segmentation of surgical workflow from laparoscopic video. In MICCAI 2010, Part III (pp. 400-407). Springer Berlin Heidelberg.<br>Twinanda, A. P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., & Padoy, N. (2016). EndoNet: A deep architecture for recognition tasks on laparoscopic videos. IEEE Transactions on Medical Imaging, 36(1), 86-97.<br>Jin, Y., Dou, Q., Chen, H., Yu, L., Qin, J., Fu, C. W., & Heng, P. A. (2017). SV-RCNet: Workflow recognition from surgical videos using recurrent convolutional network. IEEE Transactions on Medical Imaging, 37(5), 1114-1126.<br>Jin, Y., Long, Y., Chen, C., Zhao, Z., Dou, Q., & Heng, P. A. (2021). Temporal memory relation network for workflow recognition from surgical video. IEEE Transactions on Medical Imaging, 40(7), 1911-1923.<br>Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.<br>Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2016). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222-2232.<br>Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.<br>He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).<br>Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.<br>Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M., & Schmid, C. (2021). ViViT: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6836-6846).<br>Tong, Z., Song, Y., Wang, J., & Wang, L. (2022). VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. Advances in Neural Information Processing Systems, 35, 10078-10093. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98070 | - |
| dc.description.abstract | 本研究旨在發展一套結合空間與時間特徵的深度學習模型,用以識別腹腔鏡膽囊切除手術中的各項動作,進而達成手術表現的客觀化與自動化評估。腹腔鏡膽囊切除術為當代臨床中最常見的微創手術之一,其操作精密、流程複雜,手術成效高度依賴執刀醫師的技巧與經驗。目前臨床上針對手術效能的評估大多仰賴醫師事後檢視完整影片並根據檢核表進行主觀評分,不僅耗時冗長,也存在評估準則不一致與主觀性過高等問題,對於醫師訓練與品質控管皆構成挑戰。儘管過往已有研究致力於自動化辨識手術階段,但此類階段分類多為高層次流程,難以細緻揭示醫師實際操作過程中的具體行為與手術手勢變化。此外,單純使用短時間片段進行判斷,模型易受時序波動與重複性動作的干擾而導致辨識誤差。因此,若能針對手術動作進行更細緻且具有時間關聯性的辨識,不僅能提升手術教學與研究效率,也可進一步推動手術品質監測與臨床決策輔助系統的發展。<br><br>本論文針對上述問題,提出兩種具備時空特徵融合能力之深度學習架構:STEMNet(Spatial-Temporal Encoding Memory Network)與STMT(Spatial-Temporal Memory Transformer)。STEMNet仿效TMRNet之兩階段設計,第一階段建立由ResNet50與LSTM組成之特徵提取器,預先建立記憶體庫以儲存長期時間資訊;第二階段進一步將短期序列與長期特徵整合於Transformer編碼器中,以提高模型辨識時的上下文理解能力。STMT則專注於辨識短時間內快速變化之細微動作,透過ResNet50擷取空間特徵後輸入至Transformer架構,以非遞迴方式捕捉多位置間的時間關聯性,提升辨識效率與敏感度。<br><br>為驗證所提模型之效能,本研究採用Cholec80腹腔鏡膽囊切除手術影片資料集作為實驗基礎,設計多組序列長度與預訓練策略進行比較,並以分類準確率與召回率作為效能指標進行量化分析。同時,為了協助臨床應用,亦導入資料視覺化方法,將模型輸出結果以時序圖形式呈現,幫助醫師快速掌握手術過程中各動作出現頻率與分布趨勢,進而評估醫師的操作節奏與技巧特徵。<br><br>實驗結果顯示,所提出模型在動作分類任務上均優於傳統手術階段識別方法,特別是在辨識重複性與異質性動作方面展現更高的準確性與穩定性,有效降低人工標註負擔並縮短審查時間。整體而言,本研究不僅展示深度學習技術於臨床手術影像分析之潛力,更為手術評估自動化與教學決策提供可行的技術路徑與應用價值。 | zh_TW |
| dc.description.abstract | This study proposes a spatial-temporal deep learning framework for recognizing surgical actions in laparoscopic cholecystectomy, aiming to facilitate objective and automated evaluation of surgical performance. Laparoscopic cholecystectomy is one of the most commonly performed minimally invasive surgeries in contemporary clinical practice. Its procedural complexity and precision make surgical outcomes highly dependent on the technical proficiency and experience of the operating surgeon. Traditional performance assessment relies primarily on manual video review and checklist-based scoring by experts, a process that is time-consuming, subjective, and inconsistent, posing challenges to surgical training and quality control.<br><br>Although prior studies have explored phase recognition to automate surgical workflow analysis, phase-level classification is often too coarse to capture nuanced surgical actions and hand movements. Furthermore, models that rely solely on short video snippets are vulnerable to temporal fluctuations and repetitive patterns, resulting in reduced accuracy. Precise, temporally aware action-level recognition is therefore critical for improving surgical education, performance feedback, and the development of intelligent decision support systems. To address these challenges, we propose two deep learning architectures that integrate spatial and temporal features: the Spatial-Temporal Encoding Memory Network (STEMNet) and the Spatial-Temporal Memory Transformer (STMT). STEMNet adopts a two-stage training strategy inspired by TMRNet: a ResNet50-LSTM backbone extracts short-term features and stores them in a memory bank, and these features are later fused with current inputs by a Transformer encoder to enhance contextual understanding. In contrast, STMT focuses on capturing fine-grained, rapidly changing actions, leveraging ResNet50 for spatial encoding followed by a Transformer encoder that models temporal dependencies in a non-recurrent, fully parallel manner.<br><br>To evaluate the proposed models, experiments were conducted on the Cholec80 dataset. Various sequence lengths and pretraining strategies were compared using classification accuracy and recall as performance metrics. In addition, temporal visualization of model outputs was implemented to support clinical interpretation, enabling surgeons to efficiently review the frequency and distribution of actions and gain insight into their technical patterns and timing.<br><br>Experimental results demonstrate that the proposed architectures outperform traditional phase recognition models, particularly in distinguishing repetitive and heterogeneous actions. The models significantly reduce the manual annotation burden and review time, offering promising utility for clinical deployment. Overall, this study highlights the potential of deep learning in surgical video analysis and contributes a practical framework for automated performance evaluation and educational support in minimally invasive surgery. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-24T16:04:41Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-07-24T16:04:41Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 中文摘要 i<br>Abstract iii<br>目次 v<br>圖次 vii<br>表次 viii<br>第一章 緒論 1<br>1.1 研究背景 1<br>1.2 研究動機與目的 3<br>1.3 研究架構 4<br>第二章 文獻探討 5<br>2.1 腹腔鏡膽囊切除手術於電腦視覺之背景 5<br>2.1.1 手術階段識別於統計模型 5<br>2.1.2 手術階段識別於深度學習 6<br>2.2 時間深度學習模型 7<br>2.2.1 長短期記憶網路 7<br>2.2.2 Transformer 8<br>2.3 空間深度學習模型 10<br>2.3.1 卷積神經網路 10<br>2.3.2 殘差神經網路 11<br>2.3.3 Vision Transformer 12<br>2.4 時空深度學習模型 13<br>2.4.1 TMRNet 13<br>2.4.2 Video Vision Transformer 14<br>2.4.3 VideoMAE 15<br>2.5 文獻總結 16<br>第三章 手術動作辨識模型設計 17<br>3.1 資料前處理 20<br>3.2 時空深度學習模型於腹腔鏡膽囊切除手術 22<br>3.2.1 STEMNet 22<br>3.2.2 STMT 25<br>3.3 損失函數 30<br>3.4 模型評估指標 31<br>第四章 案例研討 33<br>4.1 資料集說明 33<br>4.2 模型成效分析 36<br>4.2.1 基模型成效 37<br>4.2.2 STEMNet成效 41<br>4.2.3 STMT成效 44<br>4.2.4 各模型間成對比較與統計顯著性檢定 48<br>4.3 資料視覺化分析 51<br>第五章 結論與建議 54<br>5.1 研究結論 54<br>5.2 未來展望 55<br>參考文獻 56 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 手術效能評估 | zh_TW |
| dc.subject | 時空深度學習模型 | zh_TW |
| dc.subject | 手術動作識別 | zh_TW |
| dc.subject | 腹腔鏡膽囊切除手術影像分析 | zh_TW |
| dc.subject | surgical performance evaluation | en |
| dc.subject | laparoscopic cholecystectomy video analytics | en |
| dc.subject | surgical action identification | en |
| dc.subject | spatial-temporal deep learning model | en |
| dc.title | 發展時空深度學習模型以識別腹腔鏡膽囊切除術動作-手術表現之客觀化評估 | zh_TW |
| dc.title | A Spatial-Temporal Deep Learning Approach to Identify Laparoscopic Cholecystectomy Actions for Objective Surgical Performance Evaluation | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 何明志;顏宏軒 | zh_TW |
| dc.contributor.oralexamcommittee | Ming-Chih Ho;Hung-Hsuan Yen | en |
| dc.subject.keyword | 腹腔鏡膽囊切除手術影像分析, 手術動作識別, 時空深度學習模型, 手術效能評估 | zh_TW |
| dc.subject.keyword | laparoscopic cholecystectomy video analytics, surgical action identification, spatial-temporal deep learning model, surgical performance evaluation | en |
| dc.relation.page | 57 | - |
| dc.identifier.doi | 10.6342/NTU202502244 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2025-07-23 | - |
| dc.contributor.author-college | 工學院 | - |
| dc.contributor.author-dept | 工業工程學研究所 | - |
| dc.date.embargo-lift | 2030-07-22 | - |
| Appears in Collections: | 工業工程學研究所 | |
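The abstract reports classification accuracy and recall as the performance metrics for the frame-level action recognizers. As a minimal, dependency-free sketch of how those two metrics are conventionally computed (the action label names below are hypothetical illustrations, not taken from the thesis):

```python
from collections import Counter, defaultdict

def accuracy_and_recall(y_true, y_pred):
    """Overall accuracy and per-class recall for frame-level action labels."""
    assert len(y_true) == len(y_pred) and y_true, "need equal-length, non-empty labels"
    # Accuracy: fraction of frames whose predicted action matches the ground truth.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    support = Counter(y_true)          # number of ground-truth frames per class
    hits = defaultdict(int)            # correctly predicted frames per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    # Recall per class: correctly recognized frames / all frames of that class.
    recall = {cls: hits[cls] / n for cls, n in support.items()}
    return accuracy, recall

# Hypothetical five-frame example.
truth = ["grasp", "grasp", "cut", "clip", "cut"]
pred  = ["grasp", "cut",   "cut", "clip", "cut"]
acc, rec = accuracy_and_recall(truth, pred)
# acc == 0.8; recall: grasp 0.5, cut 1.0, clip 1.0
```

Per-class recall is the more informative of the two here, since repetitive actions dominate the frame count and overall accuracy alone can mask poor recognition of rare actions.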
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (restricted access) | 3.09 MB | Adobe PDF | View/Open |
Except where otherwise noted, all items in this repository are protected by copyright, with all rights reserved.
