Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89732

| Title: | A novel machine learning pipeline for early detection of colorectal cancer and colorectal adenoma using gut microbiome data |
| Authors: | 廖乃勳 Nai-Shun Liao |
| Advisor: | 莊曜宇 Eric Y. Chuang |
| Co-Advisor: | 陳佩君 Pei-Chun Chen |
| Keyword: | Colorectal cancer (CRC), Gut microbiome, Machine learning, Microbial risk score (MRS), Stool-based screening |
| Publication Year : | 2023 |
| Degree: | Master's |
| Abstract: | Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the third leading cause of cancer death in the United States and Taiwan. The long-term risk of CRC can be managed by identifying high-risk patients through CRC screening and diagnosis. Many studies have shown associations between CRC and the gut microbiome. Machine learning models applied to gut microbiome data have the potential to detect CRC earlier than conventional stool screening tests. We constructed a novel machine learning pipeline that uses microbiome data to distinguish CRC, colorectal adenoma, and healthy groups, and to evaluate each person's risk of CRC. Stool samples with 16S rRNA sequencing data were collected from the NCBI SRA database or from supplementary data provided in published studies. In total, 109 CRC-associated genera were identified using the ANCOM-BC algorithm and the chi-square test. Random forest (RF) classifiers were trained with 10-fold cross-validation (CV), and model performance was evaluated on external validation data. In classifying control vs CRC groups, the RF model achieved an AUC of 90% in 10-fold CV and 82% in external validation. In the early-detection strategy of classifying control vs adenoma plus CRC groups, the RF model achieved a sensitivity of 87% in 10-fold CV and 97% in external validation. Finally, 7 biomarker genera identified by the ANCOM-BC algorithm were used to calculate a microbial risk score (MRS), which can serve as an indicator of CRC risk. In summary, we developed a new pipeline for CRC classification using 16S rRNA gut microbiome data and identified CRC-specific gut microbial genera. The pipeline and biomarkers could be used as a non-invasive tool for the early detection of CRC. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89732 |
| DOI: | 10.6342/NTU202303093 |
| Fulltext Rights: | Authorized (open access worldwide) |
| Embargo Lift Date: | 2025-08-31 |
| Appears in Collections: | Graduate Institute of Biomedical Electronics and Bioinformatics |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf | 1.51 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
