An Integrated Deep Learning Framework for Biomarker Prediction from Histopathological Images in Colorectal Cancer

Geng-Yun Tien (田庚昀)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97153

Title:	An Integrated Deep Learning Framework for Biomarker Prediction from Histopathological Images in Colorectal Cancer (基於整合式深度學習框架於大腸直腸癌病理切片影像預測生物標記)
Author:	Geng-Yun Tien (田庚昀)
Advisor:	Eric Y. Chuang (莊曜宇)
Co-advisor:	Hsiang-Han Chen (陳翔瀚)
Keywords:	colorectal cancer, histopathological image, biomarker prediction, deep learning
Year of publication:	2025
Degree:	Master's
Abstract:	Colorectal cancer (CRC) is one of the deadliest cancers globally. Identifying biomarker status is essential for targeted therapy and immunotherapy, but current screening methods are often expensive and labor-intensive, particularly in low-income regions. Histopathological whole slide images (WSIs), which are routinely used in diagnosis, can provide valuable insight into cellular heterogeneity. In this study, we propose an integrated deep learning (DL) framework to predict important CRC biomarkers, including BRAF V600E mutation, KRAS mutation, and MSI-H. WSIs with biomarker labels were collected from the TCGA-COAD and CPTAC-COAD cohorts; TCGA-COAD was used for model training and cross-validation, and CPTAC-COAD served as the external test set. The WSIs were divided into patches to meet the input requirements of the DL models. First, we trained a tumor detection model to distinguish tumor patches from non-tumor patches. Then, the TCGA-COAD tumor patches were used to fine-tune a feature extractor based on self-supervised learning, which converts the tumor patches from a single WSI into a feature matrix. Finally, we trained three biomarker prediction models based on multiple instance learning: Att-MIL, Tran-MIL, and GNN-MIL, which use attention mechanisms, transformers, and graph neural networks, respectively. These models were trained on the feature matrices, and the predictions of all three were combined with an ensemble method to produce the final result. The proposed tumor detection model performed well, achieving at least 90% precision and recall in identifying tumor patches during testing. The stable convergence of the training loss curve during fine-tuning of the feature extractor further supported the model's ability to capture representative features from tumor patches. For biomarker prediction, cross-validation on TCGA-COAD showed that MSI-H prediction performed best, with the ensemble reaching an AUC of 90%. BRAF V600E mutation prediction followed with an AUC of 87%, and KRAS mutation prediction reached an AUC of 64%. The prediction performance for all three biomarkers surpassed that of previous studies. In conclusion, the proposed DL framework offers an efficient way to determine biomarker status and demonstrates the ability to uncover potential molecular aberrations from clinical images.
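The last two stages described in the abstract (pooling a slide's patch features into one slide-level representation, then combining three model outputs) can be illustrated with a minimal sketch. It is not the thesis implementation: the attention form is a simplified variant of attention-based multiple instance learning, the vector `w`, the toy features, and the three probabilities are hypothetical values, and plain averaging is only assumed as the ensemble rule because the abstract does not name the method used.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   w: Sequence[float]) -> List[float]:
    """Collapse a slide's (n_patches x dim) feature matrix into one vector.

    Each patch h gets a score tanh(w . h); softmax turns the scores into
    attention weights; the slide embedding is the weighted sum of patches.
    `w` is a hypothetical learned vector, fixed here for illustration.
    """
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h))) for h in features]
    attn = softmax(scores)
    dim = len(features[0])
    return [sum(a * h[d] for a, h in zip(attn, features)) for d in range(dim)]


def ensemble_mean(probabilities: Sequence[float]) -> float:
    """Soft voting (an assumption): average the per-model probabilities."""
    return sum(probabilities) / len(probabilities)


# Toy slide with three tumor patches and 2-dimensional features.
slide_features = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
slide_embedding = attention_pool(slide_features, w=[1.0, -1.0])

# Hypothetical outputs of Att-MIL, Tran-MIL and GNN-MIL for one slide.
final_probability = ensemble_mean([0.82, 0.74, 0.90])
```

Because the attention weights sum to one, the slide embedding stays within the range of the patch features regardless of how many patches a slide contributes, which is what lets slides of different sizes share one classifier.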
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97153
DOI:	10.6342/NTU202500414
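Full-text license:	Authorized (open access on campus only)
Full-text release date:	2030-02-07
Appears in collections:	Graduate Institute of Biomedical Electronics and Bioinformatics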

Files in this item:

File	Size	Format
ntu-113-1.pdf (not authorized for public access)	9.5 MB	Adobe PDF


Unless its copyright terms are specifically stated otherwise, every item in this system is protected by copyright, with all rights reserved.
