Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50368
Title: | 運用脆弱模型為基礎之隨機過程於區間設限資料 Frailty-based Stochastic Process for Interval-censoring Data |
Author: | Ying-Yu Chen 陳瑩瑜 |
Advisor: | 陳秀熙 |
Keywords: | Interval censoring data, Frailty model, Stochastic process, Breast Cancer Screening, EM algorithm |
Year of publication: | 2016 |
Degree: | Master's |
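Abstract: | Background
Interval-censored data are often difficult to handle in survival analysis, particularly when the effect of covariates on time to event must be considered. Statistical methods for interval-censored data have been proposed before: the non-parametric method of Peto (1973); the non-parametric algorithm of Turnbull (1976), based on equivalence sets defined by the left and right limits of the censoring intervals; the work of Finkelstein (1985), built on the semi-parametric proportional hazards regression model of Cox (1972); and the parametric models of Farrington (1996) and Collett (2003). However, when these methods are applied to large samples, the many overlapping event intervals cause heavy and complex computation in two-state regression models based on piecewise methods, and in turn produce too many parameters to estimate. Moreover, statistical models that apply interval-censored data to multi-state disease progression, such as nationwide cancer screening data, remain very limited.

Aims
The aims of this thesis are as follows:
1. To develop computer algorithms for the existing methods proposed by Finkelstein and Wolfe (1985) and Collett (2003), apply them to the three examples in Finkelstein and Wolfe (1985), and compare the results.
2. To develop a frailty-based random-effects exponential regression model that uses the equivalence intervals (interval subsets) formed by the left and right endpoints of interval-censored data to account for heterogeneity of the baseline hazard across equivalence intervals.
3. To develop a frailty-based multi-state Markov model and apply it to the Swedish breast cancer screening data, which contain interval censoring, left censoring, and left truncation.

Materials and methods
Methods
1. Demonstrating the existing methods with computer algorithms
The algorithms for the existing methods of Finkelstein and Wolfe (1985) and Collett (2003) are outlined below.
(1) Semi-parametric Cox (1972) proportional hazards regression model proposed by Finkelstein and Wolfe (1985): following the method of Finkelstein and Wolfe (1985), a self-consistent EM algorithm was programmed in SAS IML and applied to the leukaemia data from the 6-MP trial of Freireich (1963), the lung cancer animal experiment of Hoel and Walburg (1972), and the breast cosmetic data of Beadle et al. (1984a, 1984b).
(2) Collett (2003) piecewise exponential model (parametric method): the piecewise exponential model of Collett (2003) was reproduced with a nonlinear mixed-effects method and a Bayesian Markov chain Monte Carlo (MCMC) method, and applied to the breast cosmetic data.
2. Exponential regression model based on frailty and random effects
The new model combines a frailty term with a random effect under a gamma distribution to characterize the baseline hazard, which is derived from the equivalence sets of the left and right limits of interval-censored data defined by Turnbull (1976). We used Bayesian MCMC to estimate the parameters of the normal distribution, applied the model to the real breast cosmetic data, and compared the estimates with the results above.
3. Frailty-based multi-state model
The new two-state frailty model was extended to a multi-state model. A gamma distribution measures the frailty effect and is combined with a Markov process to estimate the transition-probability parameters and the effects of relevant covariates on the different transitions. We also applied this multi-state model to the Swedish breast cancer screening data.

Data sources
Breast cancer post-surgery follow-up data
These data come from the retrospective study of Beadle et al. (1984a, 1984b) and comprise 94 female patients, 46 in the radiotherapy-only group and 48 in the radiotherapy plus chemotherapy group. The study aimed to examine differences between treatments for patients with early breast cancer. Patients were followed every four to six months, with follow-up time inversely related to recovery status. Breast retraction was graded on four levels from mild to severe (none, minimal, moderate, and severe), and the event was defined as the first occurrence of moderate or severe retraction. Because patients were assessed only at scheduled visits, the exact event time is unknown and is known only to lie within an interval, which meets the definition of interval-censored data.
Swedish breast cancer screening data
These are mammography-based breast cancer screening data with a screening interval of 24 or 33 months, collected between 1977 and 1986 from women aged 40 to 69 in Kopparberg, Sweden. The study initially included 50,666 women, of whom 1,321 were cancer cases: 789 screen-detected cases and 532 interval cancers. The covariate of interest is body mass index (BMI), whose effect on multi-state disease progression we assessed with the frailty-based multi-state model.

Results
1. Estimates from the computer programs
For the semi-parametric Cox proportional hazards regression model proposed by Finkelstein and Wolfe, the regression coefficients estimated for the three examples were very close to the published values, and the smaller standard errors led to larger likelihood ratio test statistics (6-MP leukaemia data: 20.677 vs. 17.49; lung cancer animal experiment data: 26.161 vs. 2.4; breast cosmetic data: 33.215 vs. 6.83).
2. Results of the frailty-based two-state method for the breast cosmetic data
The treatment effect estimated by the piecewise exponential model with Bayesian MCMC (hazard ratio: 2.283, 95% confidence interval: 1.315-3.987) was similar to Collett's result (hazard ratio: 2.27, 95% confidence interval: 1.28-4.02), and the parameters related to the baseline hazard (θ1-θ9) were also similar to Collett's. For the frailty-based models, the estimated treatment effect under the logit link function was robust (ranging from 2.261 to 2.303) whether a gamma distribution, a lognormal distribution, or a random effect was placed on the baseline hazard, and was compatible with Collett's piecewise method. The DIC values of the three models were 336.8, 336.16, and 338.55. Results under the complementary log-log link function were similar and are not repeated here. Various diagnostic tests also indicated that these models performed well, and the simulation results indicated that the frailty-based model performed better than the other models as the sample size increased.
3. Results of the frailty-based multi-state method for the Swedish breast cancer screening data
In the three-state model with frailty and without covariates, the incidence of preclinical breast cancer (λ1) was estimated at 3 per 1,000 (standard error: 0.003, 95% confidence interval (C.I.): 0.002-0.003), the transition rate from the preclinical phase to the symptomatic phase (λ2) was estimated at 0.335 (standard error: 0.028, 95% CI: 0.282-0.390), and the frailty parameter representing heterogeneity (α) was estimated at 4.603 (standard error: 0.359, 95% CI: 3.867-5.000). When BMI was included in the model, BMI was positively associated with λ1 but negatively associated with λ2, while α changed little. When frailty acted on both transition rates, the frailty parameter for the incidence (α1) was estimated at 4.531 (standard error: 0.419) and that for the transition from the preclinical phase to the symptomatic phase (α2) at 3.984 (standard error: 0.713). When BMI was included, α1 was little affected, but α2 fell markedly to 1.026 (standard error: 0.408).

Conclusion
This thesis proposes a frailty-based model for interval-censored data that can be combined with the traditional semi-parametric and parametric models, effectively reduces the problems caused by a growing number of parameters, and can be extended to multi-state stochastic processes involving interval-censored and truncated data.

Background
Interval-censored data are often intractable in survival analysis, particularly when the effect of covariates on time to event is of interest. Numerous statistical methods have been proposed to deal with interval-censored data. Classical examples start with the proposal of Peto (1973) and the algorithm of Turnbull (1976), both non-parametric, the latter based on the concept of equivalence classes; continue with the semi-parametric method of Finkelstein (1985), which uses the Cox proportional hazards regression model to incorporate covariate effects; and extend to the parametric methods of Farrington (1996) and Collett (2003). Despite these methods, applying them to large datasets with many interval step times may be computationally intractable, because most parametric regression models for a two-state disease model are based on piecewise methods with too many parameters. 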
Moreover, statistical methods for multi-state disease processes with interval-censored data, such as population-based cancer screening data, are very limited.

Aims
The objectives of this thesis were (1) to develop computer algorithms for the methods proposed by Finkelstein and Wolfe (1985) and Collett (2003) and apply them to the three illustrations used in Finkelstein and Wolfe (1985) to demonstrate comparability with the previous methods; (2) to develop a frailty-based and random-effect-based exponential regression model that captures the heterogeneity of time-varying baseline hazards, built on the subsets of left and right endpoints of the censoring intervals that belong to an equivalence class; and (3) to develop a frailty-based multi-state Markov model and apply it to screening data consisting of interval-censored, left-censored, and left-truncated observations.

Methods and illustrations
Methods
1. Develop computer algorithms for previous methods
Based on the methods proposed by Finkelstein and Wolfe (1985) and Collett (2003), computer algorithms for estimation were developed as follows.
(1) Finkelstein and Wolfe (1985) semi-parametric Cox proportional hazards regression model: following the Finkelstein and Wolfe (1985) method, a self-consistent EM algorithm was written in SAS IML.
(2) Collett (2003) piecewise exponential model (parametric method): computer algorithms using nonlinear mixed-effects methods and Bayesian MCMC methods were also developed following the Collett (2003) piecewise exponential model and were applied to the breast cosmetic data.
2. Frailty-based and random-effect-based exponential regression model
The new model introduces a gamma-distributed frailty term on the random effect to capture the distribution of the baseline hazard arising from the subsets of left and right endpoints of the censoring intervals that belong to an equivalence class (Turnbull 1976). 
We used Bayesian Markov chain Monte Carlo (MCMC) to estimate the parameters under a variety of distributions. This model was applied to the breast cosmetic data.
3. Frailty-based multistate Markov model
The frailty-based two-state model was extended to the corresponding multistate Markov model by introducing gamma-distributed frailty terms in conjunction with a Markov process, in order to estimate the transition parameters and the effects of relevant covariates on each transition. This frailty-based multistate Markov model was applied to the Swedish breast cancer screening data.

Data source
The breast cosmetic data collected by Beadle et al. (1984a, 1984b), comprising 94 women, were used to illustrate the proposed models. The patients were followed at predetermined intervals of four to six months. The event was defined as the first occurrence of moderate or severe retraction. The Swedish breast cancer screening data come from a population-based randomized controlled trial of mammographic screening conducted in Kopparberg, Sweden. A total of 1321 breast cancers were identified. Of these, 789 were screen-detected: those detected at the prevalent screen were left-censored for the preclinical detectable phase (PCDP) and left-truncated for the clinical phase (CP), and those detected at subsequent screens were interval-censored for the PCDP. The other 532 were interval cancers, which were interval-censored for the PCDP and uncensored for the CP.

Results
1. Estimated results with the developed computer algorithm
The regression coefficients of the semi-parametric Cox proportional hazards regression model estimated with our algorithm were close to those in the literature, but the standard errors were smaller than their counterparts in Finkelstein and Wolfe (1985). 
2. Estimated results of the frailty-based method applied to the breast cosmetic data
The treatment effect estimated by Bayesian MCMC was similar to that from the Collett method. Under the frailty-based model with a logit link to the event probability, the estimated treatment effect was robust to whether a gamma-distributed baseline hazard, a lognormal-distributed baseline hazard, or a random effect was introduced, and was compatible with that derived from the piecewise method used by Collett. The estimates and model fit remained similar when the complementary log-log link function was used. The diagnostic tests indicated good performance of our MCMC simulation. Simulation results showed that the frailty-based model performed better than the previous methods as the sample size increased.
3. Estimated results of the frailty-based multistate method applied to the Swedish breast cancer screening data
The incidence of breast cancer (λ1) was estimated as 0.003 (sd: 0.0003, 95% CI: 0.002-0.003), and the rate of transition from the PCDP to the CP (λ2) was estimated as 0.335 (sd: 0.028, 95% CI: 0.282-0.390). The heterogeneity parameter (α) was estimated as 4.603 (sd: 0.359, 95% CI: 3.867-5.000). BMI was positively associated with λ1 but inversely associated with λ2. The estimated α did not change substantially in the model incorporating covariates. In the model with random effects on both transitions, the standard error of λ2 was elevated. The random effect estimates for the first transition (α1) and the second transition (α2) were 4.531 (sd: 0.419) and 3.984 (sd: 0.713), respectively. When BMI was considered, the estimated α1 (4.599, sd: 0.358) was similar to that in the model without covariates, but α2 was markedly reduced to 1.026 (sd: 0.408). 
Conclusion
The proposed frailty-based model for interval-censored data provides a novel alternative to the traditional semi-parametric and parametric models. It alleviates the disadvantage of an increased number of parameters. In addition, the frailty-based model can be easily extended to multistate stochastic processes involving interval-censored and truncated data. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50368 |
DOI: | 10.6342/NTU201601503 |
Full-text license: | Paid license |
Appears in collections: | Institute of Epidemiology and Preventive Medicine |
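The three-state progression named in the abstract (disease-free, PCDP, CP) can be illustrated with a minimal sketch. It assumes a time-homogeneous progressive Markov process with constant rates and no frailty term; the thesis's actual likelihood, frailty parameterization, and time unit are not given in this record, so the function name `three_state_probs` and the choice of time scale are hypothetical. The rates plugged in are the point estimates quoted in the abstract (λ1 = 0.003, λ2 = 0.335).

```python
import math


def three_state_probs(lam1, lam2, t):
    """Probabilities of being disease-free, in the PCDP, or in the CP at
    time t, starting disease-free, for a progressive model with constant
    rates lam1 (free -> PCDP) and lam2 (PCDP -> CP).

    Assumption: no frailty and no covariates (a simplification of the
    model described in the abstract).
    """
    if t < 0 or lam1 <= 0 or lam2 <= 0:
        raise ValueError("t must be >= 0 and both rates must be > 0")
    p_free = math.exp(-lam1 * t)
    if math.isclose(lam1, lam2):
        # Limiting form when the two rates coincide.
        p_pcdp = lam1 * t * math.exp(-lam1 * t)
    else:
        p_pcdp = lam1 / (lam2 - lam1) * (
            math.exp(-lam1 * t) - math.exp(-lam2 * t)
        )
    p_cp = 1.0 - p_free - p_pcdp
    return p_free, p_pcdp, p_cp


# Point estimates quoted in the abstract; the time unit is an assumption.
LAMBDA1, LAMBDA2 = 0.003, 0.335

if __name__ == "__main__":
    for t in (1.0, 2.0, 5.0):
        free, pcdp, cp = three_state_probs(LAMBDA1, LAMBDA2, t)
        print(f"t={t}: free={free:.5f}, PCDP={pcdp:.5f}, CP={cp:.5f}")
    # Mean sojourn time in the PCDP under a constant rate is 1 / lam2.
    print(f"mean sojourn time: {1.0 / LAMBDA2:.2f}")
```

Under these assumptions the mean sojourn time in the PCDP is 1/λ2, roughly 2.99 time units for the quoted estimate. A frailty term would replace these fixed rates with rates multiplied by a random effect, which this sketch deliberately leaves out.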
Files in this item:
File | Size | Format | |
---|---|---|---|
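ntu-105-1.pdf Not currently authorized for public access | 2.81 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.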