Development of a prognostic model for ovarian cancer through machine learning algorithms
ovarian cancer, overall survival, gene expression, machine learning, XGBoost, genetic algorithm
|Publication Year:||2019|
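|Abstract:||Background: In 2018, ovarian cancer ranked eighth among cancer deaths in women worldwide, with a total of 295,414 new cases. In terms of survival, the 5-year relative survival rate of epithelial ovarian cancer in Taiwan is 63.68%, whereas the 5-year survival rate of epithelial ovarian cancer among non-Hispanic whites is only 35%, so the treatment of ovarian cancer is critical. The clinical practice guidelines for post-surgical chemotherapy in Taiwan and from the National Comprehensive Cancer Network (NCCN) recommend that some patients with stage IA or IB ovarian cancer do not need chemotherapy, yet in Taiwan as many as 60.65% of stage I ovarian cancer patients receive chemotherapy after surgery. However, it is very difficult to identify the high-risk ovarian cancer patients (overall survival less than 3 years) who need chemotherapy. This study therefore developed a statistical model to identify the patients who would benefit from receiving chemotherapy.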
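Results: The bagging model not only showed 100% sensitivity on the validation data from the GSE26193 dataset, but its Kaplan-Meier survival curves also clearly separated high-risk and low-risk patients (log-rank test p = 0.0024 < 0.05). Across all testing data combined, it also showed a high sensitivity of 83%, and the Kaplan-Meier survival curves for the combined testing data showed the same trend for high-risk and low-risk patients (log-rank test p = 0.014 < 0.05).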
Background: Ovarian cancer is the eighth leading cause of cancer-related death among women worldwide, and about 295,414 new cases were reported in 2018. The 5-year survival rate of epithelial ovarian cancer is only 35% in non-Hispanic whites, whereas the 5-year relative survival rate of epithelial ovarian cancer in Taiwan is 63.68%. Currently, two clinical practice guidelines, from Taiwan and from the National Comprehensive Cancer Network (NCCN), indicate that chemotherapy is unnecessary for some ovarian cancer patients with stage IA or IB disease. However, 60.65% of stage I ovarian cancer patients in Taiwan received chemotherapy after surgery. To date, no reliable predictors have been developed to identify the high-risk patients (overall survival less than 3 years) who need to receive chemotherapy.
Materials and Methods: We developed a prediction model based on gene expression profiles to identify patients at high risk of death, who may therefore need more intensive and aggressive medical care. To develop the model, a t-test was used to select significant genes, and only genes showing at least a 2-fold change were retained. A genetic algorithm combined with XGBoost was then used to build a model on the training dataset. The outcome was 3-year overall survival, which was used to evaluate the performance of the prediction model. To reduce overfitting, we randomly selected 70% of the patients as training data and repeated the steps above 15 times to build a bagging model. Finally, the bagging model was validated on three independent microarray datasets (GSE30161, GSE19829, GSE63885).
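The gene filter and the bagging loop described in the methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code. Expression values are assumed to be log2-scaled, so a 2-fold change is a difference of at least 1.0 between group means. The t-test p-value uses a large-sample normal approximation to Welch's statistic rather than the exact t distribution, and the p-value cutoff of 0.05 is an assumption. The genetic algorithm + XGBoost step is swapped out for a hypothetical nearest-centroid classifier (`fit_nearest_centroid`). All function and parameter names (`select_genes`, `bagging_predict`, `p_cutoff`, `min_log2_fc`) are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_p_value(a, b):
    """Two-sided p-value for Welch's t statistic (normal approximation).

    Assumption: an exact test would use the t distribution with
    Welch-Satterthwaite degrees of freedom; the normal tail is used here
    to stay within the standard library. Each group needs >= 2 values.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 1.0  # no variation in either group: nothing to test
    t = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(t)))


def select_genes(expr, labels, p_cutoff=0.05, min_log2_fc=1.0):
    """Keep genes that pass the t-test AND change at least 2-fold.

    expr: dict gene -> list of log2 expression values, one per patient.
    labels: 1 = high risk (OS < 3 years), 0 = low risk.
    On a log2 scale a 2-fold change is a mean difference of 1.0.
    """
    kept = []
    for gene, values in expr.items():
        high = [v for v, y in zip(values, labels) if y == 1]
        low = [v for v, y in zip(values, labels) if y == 0]
        log2_fc = abs(mean(high) - mean(low))
        if welch_p_value(high, low) < p_cutoff and log2_fc >= min_log2_fc:
            kept.append(gene)
    return kept


def fit_nearest_centroid(rows, labels):
    """Hypothetical stand-in for the GA + XGBoost model: assign a patient
    to the class whose mean expression profile is closest."""
    centroids = {
        cls: [mean(col) for col in zip(*[r for r, y in zip(rows, labels) if y == cls])]
        for cls in (0, 1)
    }

    def predict(row):
        dist = {c: sum((x - m) ** 2 for x, m in zip(row, cen)) for c, cen in centroids.items()}
        return min(dist, key=dist.get)

    return predict


def bagging_predict(expr, labels, new_expr, n_models=15, frac=0.7, seed=0):
    """Bagging: repeat gene selection + model fitting on n_models random
    70% subsets of patients (drawn without replacement); majority vote."""
    rng = random.Random(seed)
    n_sub = round(frac * len(labels))
    n_new = len(next(iter(new_expr.values())))
    votes, used = [0] * n_new, 0
    for _ in range(n_models):
        idx = rng.sample(range(len(labels)), n_sub)
        sub_expr = {g: [v[i] for i in idx] for g, v in expr.items()}
        sub_labels = [labels[i] for i in idx]
        genes = select_genes(sub_expr, sub_labels)
        if not genes:
            continue  # skip a subset in which no gene passes the filter
        rows = [[sub_expr[g][j] for g in genes] for j in range(n_sub)]
        model = fit_nearest_centroid(rows, sub_labels)
        for j in range(n_new):
            votes[j] += model([new_expr[g][j] for g in genes])
        used += 1
    return [1 if 2 * v > used else 0 for v in votes]
```

The gene filter is re-run inside every subset. This mirrors the abstract's statement that the same analysis steps were repeated for each of the 15 training subsets. A faithful implementation would plug the genetic algorithm + XGBoost model in where `fit_nearest_centroid` is called.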
Results: The bagging model not only showed 100% sensitivity on the validation data from GSE26193, but also showed high sensitivity (83%) across all three testing datasets. In the Kaplan-Meier survival plots for the validation set, patients classified as high risk by the model had shorter overall survival than those classified as low risk (log-rank test p = 0.0024 < 0.05). The Kaplan-Meier survival plots for the three testing datasets combined (log-rank test p = 0.014 < 0.05) also show distinct curves for the high-risk and low-risk groups, consistent with the pattern in the training dataset.
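The two evaluation measures in the results, sensitivity and the log-rank test, can be illustrated with a short standard-library Python sketch. It assumes that "high risk" (overall survival under 3 years) is the positive class, so sensitivity is TP / (TP + FN). It implements the standard two-group log-rank test, whose statistic is compared with a chi-square distribution with 1 degree of freedom. The function names are hypothetical, and this sketch is not the code used in the thesis.

```python
import math


def sensitivity(y_true, y_pred):
    """TP / (TP + FN), with 1 (high risk: OS < 3 years) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = death, 0 = censored;
    groups: 1 = predicted high risk, 0 = predicted low risk.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t in death_times:
        n = sum(1 for tt in times if tt >= t)  # at risk, both groups
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("zero variance: both groups need patients at risk")
    stat = o_minus_e ** 2 / var
    # Upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))
```

Tied death times are pooled at each time point, as in the usual log-rank formulation. Follow-up times only need to share one unit (for example, months).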
Conclusion: The prediction model could help identify high-risk ovarian cancer patients who may need chemotherapy. In terms of medical expenses, roughly £403,376,060 could potentially be saved by not giving chemotherapy to low-risk patients.
|Appears in Collections:||Institute of Epidemiology and Preventive Medicine (流行病學與預防醫學研究所)|