Development of a prognostic model for ovarian cancer through machine learning algorithms

Chun-Liang Tao; 陶俊良

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77357

Title:	Development of a prognostic model for ovarian cancer through machine learning algorithms (以機器學習方法建構卵巢癌病患之基因預後模型)
Author:	Chun-Liang Tao (陶俊良)
Advisor:	Tzu-Pin Lu (盧子彬)
Keywords:	ovarian cancer, overall survival, gene expression, machine learning, XGBoost, genetic algorithm
Publication Year:	2019
Degree:	Master's
Abstract:	Background: In 2018, ovarian cancer ranked eighth among causes of cancer death in women worldwide, with 295,414 new cases reported. The 5-year relative survival rate of epithelial ovarian cancer is 63.68% in Taiwan, whereas the 5-year survival rate in non-Hispanic whites is only 35%, so treatment of ovarian cancer is critical. The clinical practice guidelines for postoperative chemotherapy in Taiwan and from the National Comprehensive Cancer Network (NCCN) indicate that chemotherapy is not recommended for some patients with stage IA or IB ovarian cancer. Nevertheless, 60.65% of stage I ovarian cancer patients in Taiwan receive chemotherapy after surgery. Identifying high-risk patients (overall survival of less than 3 years) who need chemotherapy remains difficult, so this study developed a statistical model to identify the patients who would benefit from it.
Materials and Methods: A prediction model based on gene expression profiles was developed to identify patients at high risk of death, who may need intensive and aggressive medical care. The microarray gene expression dataset GSE26193 was used to fit the model. Probes with more than a twofold change in expression were identified first, and a t-test was then used to select the significant ones. Subsequently, a genetic algorithm combined with XGBoost was used to build a model, with 3-year overall survival as the outcome for evaluating predictive performance. To address overfitting, 70% of the training data was drawn at random and the model-building steps were repeated 15 times to produce 15 models. After the poorly performing model was removed, the remaining 14 models predicted high-risk patients by voting, forming the bagging model. The bagging model was then tested on three independent external microarray datasets (GSE30161, GSE19829, GSE63885).
Results: The bagging model showed 100% sensitivity on the validation data from GSE26193, and the Kaplan-Meier survival curves of the predicted high-risk and low-risk groups were clearly separated, with shorter overall survival in the high-risk group (log-rank test p-value = 0.0024 < 0.05). Across all three testing datasets combined, the model showed a high sensitivity of 83%, and the Kaplan-Meier curves of the combined testing data showed the same separation between high-risk and low-risk groups (log-rank test p-value = 0.014 < 0.05), consistent with the validation result.
Conclusion: The bagging prediction model could help select high-risk ovarian cancer patients who need chemotherapy. By excluding low-risk patients who do not need chemotherapy, roughly £403,376,060 in medical expenses could potentially be saved.
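The bagging-and-voting procedure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a nearest-centroid classifier stands in for the genetic algorithm combined with XGBoost, the fold-change and t-test probe filtering is omitted, and the data are synthetic. The record does not state how the poorly performing model was identified or how a tied vote is resolved, so ranking models by accuracy on the held-out 30% and sending ties to the low-risk class are assumptions. All function names are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Stand-in base learner (assumption): one mean expression vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids


def predict_one(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(predict_one(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)


def build_bagging_ensemble(X, y, n_models=15, frac=0.7, n_drop=1, seed=0):
    """Train n_models on random 70% subsamples, then drop the weakest n_drop.

    Assumption: "weakest" means lowest accuracy on the 30% left out of
    each subsample; the source does not specify the criterion.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    scored = []
    for _ in range(n_models):
        rng.shuffle(idx)
        cut = int(frac * len(idx))
        train, held = idx[:cut], idx[cut:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        score = accuracy(model, [X[i] for i in held], [y[i] for i in held])
        scored.append((score, model))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in scored[: n_models - n_drop]]


def predict_vote(models, x):
    """Majority vote; 1 = high risk, 0 = low risk. Ties go to 0 (assumption)."""
    votes = sum(predict_one(m, x) == 1 for m in models)
    return 1 if votes * 2 > len(models) else 0


# Synthetic two-gene data: 10 low-risk samples near (0, 0), 10 high-risk near (2, 2).
data_rng = random.Random(42)
X = [
    [data_rng.gauss(2.0 * label, 0.3), data_rng.gauss(2.0 * label, 0.3)]
    for label in (0, 1)
    for _ in range(10)
]
y = [label for label in (0, 1) for _ in range(10)]

models = build_bagging_ensemble(X, y)
print(len(models))  # 14 models remain after dropping one
print(predict_vote(models, [2.1, 1.9]), predict_vote(models, [0.1, -0.2]))
```

With an even number of voters (14), a 7-7 split is possible, so any real implementation has to fix a tie-breaking rule; choosing the high-risk class instead would favor sensitivity at the cost of specificity.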
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77357
DOI:	10.6342/NTU201901785
Full-text authorization:	Not authorized
Appears in Collections:	Institute of Epidemiology and Preventive Medicine

Files in This Item:

File	Size	Format
ntu-108-R06849010-1.pdf (not authorized for public access)	1.5 MB	Adobe PDF

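Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.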
