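Patient Survival Data for Efficient Estimation of Mammography Screening Effectiveness with Bootstrap-based Machine Learning Methods: Taiwan Population-based Screening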

吳張瑀; Ariel Chang-Yu Wu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89629

Title:	Patient Survival Data for Efficient Estimation of Mammography Screening Effectiveness with Bootstrap-based Machine Learning Methods: Taiwan Population-based Screening (original title: 病例存活拔靴機器學習法有效估計傳統乳癌篩檢效益–台灣族群乳癌經驗)
Author:	吳張瑀 Ariel Chang-Yu Wu
Advisor:	陳秀熙 Hsiu-Hsi Chen
Keywords:	breast cancer screening, survival analysis, time-dependent, lead time bias, truncation, sampling
Publication year:	2023
Degree:	Master's
Abstract:	Background: Conventional big data analysis for evaluating the effectiveness of population-based screening requires processing very large screening datasets, which is time-consuming and complex. Evaluation of breast cancer screening has evolved from randomized clinical trials comparing breast cancer survival between invited and non-invited participants (intention-to-treat analysis) to, after the start of population-based mass screening, observational cohort studies comparing invited and non-invited participants or attendees and non-attendees (per-protocol analysis). All of these approaches require large-scale screening data. Our aim is to create an order statistic from the dates of death of breast cancer cases and, using a bootstrap machine learning method, to establish a time-dependent screening exposure history that represents the population screening data, so that the screening effectiveness obtained by conventional big data analysis can be estimated efficiently. We also use a computer-simulated nested case-control sampling design to evaluate how the approach can be applied to organized screening without bias and with greater efficiency.
Methods: We first conducted a conventional big data analysis of breast cancer screening effectiveness using a prospective cohort design and a time-dependent Cox proportional hazards model, based on large-scale population-based breast cancer screening data from three screening periods in Taiwan. We then applied the bootstrap machine learning method, using the order statistic of breast cancer deaths (sorted by date of death) to redefine the time-dependent screening exposure history as the distribution of screening exposure status in the population, and applied a time-dependent switched design to correct for truncation bias, lead-time bias, and length bias. Finally, in a computer-simulated nested case-control design, we drew random samples from cases who died of breast cancer (case group) and from censored cases (control group), applied the same bootstrap machine learning method, and examined the unbiasedness of the estimates across different sample sizes and sampling fractions of the case and control groups.
Results: In the conventional big data analysis, the relative risk of breast cancer mortality associated with population-based mammography screening in Taiwan was 0.67 (95% CI 0.62-0.72), indicating a 33% reduction in breast cancer mortality. The hazard ratio for stage II or higher breast cancer was 0.70 (95% CI 0.62-0.78), indicating a 30% reduction in the incidence of late-stage breast cancer. The bootstrap machine learning analysis of survival data from breast cancer cases estimated the intention-to-treat hazard ratio for breast cancer mortality, as would be obtained from a population-based randomized trial, to be 0.65 (95% CI 0.61-0.68), indicating a 35% reduction in breast cancer mortality. The two analyses gave similar estimates of the survival benefit of screening. In the simulated nested case-control study, estimates had better coverage and less bias when the sampling fraction of both the case and control groups reached 50% or more, or when the proportions of cases and controls in the sample were closer to those in the original data.
Conclusion: Conventional big data analysis combined with propensity score analysis showed that population-based mammography screening in Taiwan reduced breast cancer mortality by 33%. The bootstrap machine learning analysis of breast cancer cases showed a 35% reduction, indicating that bootstrapping combined with the time-dependent switched design can yield unbiased estimates of the effectiveness of population screening. Computer simulations based on the nested case-control design further identified effective sampling schemes. Together, these designs have the potential to provide unbiased and more efficient evaluation of population-based screening in the future.
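The reported percent reductions follow directly from the ratios (reduction = (1 - ratio) x 100), and the general idea of a bootstrap interval for a ratio can be illustrated with a short sketch. The following is a minimal illustration only: it is not the thesis's time-dependent bootstrap method, the record counts are synthetic values chosen so that the risk ratio equals 0.67, and the names `percent_reduction`, `risk_ratio`, and `bootstrap_ci` are hypothetical helpers introduced here.

```python
import random


def percent_reduction(ratio):
    """Convert a hazard or risk ratio into a percent reduction: (1 - ratio) * 100."""
    return round((1.0 - ratio) * 100.0, 1)


def risk_ratio(records):
    """Risk of death among screened divided by risk among unscreened.

    Each record is a (screened, died) pair of booleans.
    """
    n_screened = sum(1 for s, _ in records if s)
    n_unscreened = len(records) - n_screened
    d_screened = sum(1 for s, d in records if s and d)
    d_unscreened = sum(1 for s, d in records if not s and d)
    return (d_screened / n_screened) / (d_unscreened / n_unscreened)


def bootstrap_ci(records, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the risk ratio (plain resampling with replacement)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))
        try:
            estimates.append(risk_ratio(sample))
        except ZeroDivisionError:
            # Skip degenerate resamples with an empty group or no unscreened deaths.
            continue
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Ratios reported in the abstract, converted to percent reductions.
reductions = [percent_reduction(r) for r in (0.67, 0.70, 0.65)]  # [33.0, 30.0, 35.0]

# Synthetic data (an assumption, not study data): 134/2000 deaths among screened,
# 200/2000 among unscreened, so the risk ratio is 0.67.
records = (
    [(True, True)] * 134 + [(True, False)] * 1866
    + [(False, True)] * 200 + [(False, False)] * 1800
)
point_estimate = risk_ratio(records)
ci_lower, ci_upper = bootstrap_ci(records)
print(reductions, round(point_estimate, 2), (round(ci_lower, 2), round(ci_upper, 2)))
```

The sketch resamples whole records and ignores follow-up time, censoring, and time-dependent exposure, all of which the thesis handles with a time-dependent switched design; it is meant only to show how a percentile interval is read off the resampled estimates.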
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89629
DOI:	10.6342/NTU202302679
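Full-text authorization:	Authorized (open on campus only)
Appears in collections:	流行病學與預防醫學研究所 (Graduate Institute of Epidemiology and Preventive Medicine)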

Files in this item:

File	Size	Format
ntu-111-2.pdf (access restricted to NTU campus IP addresses; off campus, please use the VPN remote access service)	2.15 MB	Adobe PDF

