Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76882
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳正剛(Argon Chen) | |
| dc.contributor.author | Jia-Rong Li | en |
| dc.contributor.author | 李佳蓉 | zh_TW |
| dc.date.accessioned | 2021-07-10T21:39:21Z | - |
| dc.date.available | 2021-07-10T21:39:21Z | - |
| dc.date.copyright | 2020-09-10 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-08-12 | |
| dc.identifier.citation | [1] Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96(12), 6745-6750. [2] Al-Ramahi, I., Giridharan, S. S. P., Chen, Y. C., Patnaik, S., Safren, N., Hasegawa, J., ... Clarke, J. (2017). Inhibition of PIP4Kγ ameliorates the pathological effects of mutant huntingtin protein. eLife, 6, e29123. [3] Ament, S. A., Pearl, J. R., Cantle, J. P., Bragg, R. M., Skene, P. J., Coffey, S. R., ... Rosinski, J. (2018). Transcriptional regulatory networks underlying gene expression changes in Huntington's disease. Molecular Systems Biology, 14(3), e7435. [4] Anderson, A. N., Roncaroli, F., Hodges, A., Deprez, M., & Turkheimer, F. E. (2008). Chromosomal profiles of gene expression in Huntington's disease. Brain, 131(2), 381-388. [5] Bhattacharya, P., Chekmenev, E. Y., Reynolds, W. F., Wagner, S., Zacharias, N., Chan, H. R., ... Ross, B. D. (2011). Parahydrogen‐induced polarization (PHIP) hyperpolarized MR receptor imaging in vivo: a pilot study of 13C imaging of atheroma in mice. NMR in Biomedicine, 24(8), 1023-1028. [6] Borovecki, F., Lovrecic, L., Zhou, J., Jeong, H., Then, F., Rosas, H. D., ... Krainc, D. (2005). Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease. Proceedings of the National Academy of Sciences, 102(31), 11023-11028. [7] Chen, S., Yang, D., Liu, Z., Li, F., Liu, B., Chen, Y., ... Zheng, Y. (2020). Crucial Gene Identification in Carotid Atherosclerosis Based on Peripheral Blood Mononuclear Cell (PBMC) Data by Weighted (Gene) Correlation Network Analysis (WGCNA). Medical Science Monitor: International Medical Journal of Experimental and Clinical Research, 26, e921692-1. [8] Deng, Y., Ai, J., Guan, X., Wang, Z., Yan, B., Zhang, D., ... Yang, M. Q. (2014). MicroRNA and messenger RNA profiling reveals new biomarkers and mechanisms for RDX induced neurotoxicity. BMC Genomics, 15(S11), S1. [9] Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., ... Bloomfield, C. D. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439), 531-537. [10] Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67. [11] Izzo, A., Manco, R., Bonfiglio, F., Calì, G., De Cristofaro, T., Patergnani, S., ... Conti, A. (2014). NRIP1/RIP140 siRNA-mediated attenuation counteracts mitochondrial dysfunction in Down syndrome. Human Molecular Genetics, 23(16), 4406-4419. [12] Jiang, M., Wang, J., Fu, J., Du, L., Jeong, H., West, T., ... Seredenina, T. (2012). Neuroprotective role of Sirt1 in mammalian models of Huntington's disease through activation of multiple Sirt1 targets. Nature Medicine, 18(1), 153-158. [13] Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research, 35(1), 1-19. [14] Johnson, R. M. (1966). The minimal transformation to orthonormality. Psychometrika, 31(1), 61-66. [15] Lovrecic, L., Slavkov, I., Džeroski, S., & Peterlin, B. (2010). ADP-ribosylation factor guanine nucleotide-exchange factor 2 (ARFGEF2): a new potential biomarker in Huntington's disease. Journal of International Medical Research, 38(5), 1653-1662. [16] Lye, J. J., Latorre, E., Lee, B. P., Bandinelli, S., Holley, J. E., Gutowski, N. J., ... Harries, L. W. (2019). Astrocyte senescence may drive alterations in GFAPα, CDKN2A p14 ARF, and TAU3 transcript expression and contribute to cognitive decline. GeroScience, 41(5), 561-573. [17] Nayak, A., Salt, G., Verma, S. K., & Kishore, U. (2015). Proteomics approach to identify biomarkers in neurodegenerative diseases. In International Review of Neurobiology (Vol. 121, pp. 59-86). Academic Press. [18] Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M. E., ... Allen, J. C. (2002). Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870), 436-442. [19] Potashkin, J. A., Bottero, V., Santiago, J. A., & Quinn, J. P. (2019). Computational identification of key genes that may regulate gene expression reprogramming in Alzheimer’s patients. PLoS ONE, 14(9), e0222921. [20] Reis, K., Fransson, Å., & Aspenström, P. (2009). The Miro GTPases: at the heart of the mitochondrial transport machinery. FEBS Letters, 583(9), 1391-1398. [21] Salvador, S., & Chan, P. (2004, November). Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. In 16th IEEE International Conference on Tools with Artificial Intelligence (pp. 576-584). IEEE. [22] Sen, A., & Srivastava, M. (2012). Regression analysis: theory, methods, and applications. Springer Science & Business Media. [23] Shen, Z., & Chen, A. (2020). Comprehensive relative importance analysis and its applications to high dimensional gene expression data analysis. Knowledge-Based Systems, 106120. [24] Song, Q., Ni, J., & Wang, G. (2011). A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Transactions on Knowledge and Data Engineering, 25(1), 1-14. [25] Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., ... Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43), 15545-15550. [26] Sun, X., Li, P. P., Zhu, S., Cohen, R., Marque, L. O., Ross, C. A., ... Rudnicki, D. D. (2015). Nuclear retention of full-length HTT RNA is mediated by splicing factors MBNL1 and U2AF65. Scientific Reports, 5, 12521. [27] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288. [28] Yao, J., Le, T. C., Kos, C. H., Henderson, J. M., Allen, P. G., Denker, B. M., & Pollak, M. R. (2004). α-Actinin-4-mediated FSGS: an inherited kidney disease caused by an aggregated and rapidly degraded cytoskeletal protein. PLoS Biol, 2(6), e167. [29] Yiching, C. (2019). Relative importance based hierarchical clustering and its application to gene expression analysis. [30] Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76882 | - |
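| dc.description.abstract | Variable selection has long been an important and complex issue in data analysis. The first question is how to judge the importance of each explanatory variable to the response variable. Common indicators include the simple correlation coefficient and the standardized regression coefficient. However, when the explanatory variables are collinear, neither indicator can simultaneously account for a variable's own effect on the response variable (direct effect) and the effect arising from its interplay with the other explanatory variables (joint effect). J.W. Johnson therefore proposed Relative Weight, which computes the relative importance of each variable while taking the relationships and influences of all variables into account. This method, however, applies only when the explanatory variables are not perfectly collinear (non-singular) and the sample size exceeds the number of variables (n>p). Building on Relative Weight, Shen and Chen proposed "comprehensive" relative importance, which also applies when the variables are collinear or n≤p, so that the relative importance of the explanatory variables to the response variable can be determined under any data condition. In variable selection, besides the relative importance of each explanatory variable to the response variable, one must also consider how similar its contribution to the response is to those of the other explanatory variables in the model. Because relative importance splits a shared contribution equally among the variables that share it, it exhibits a "group effect": variables with highly similar contributions receive comparable relative importance, and selecting all of them into the model does not improve explanatory performance. To further explore the intricate relationships among the explanatory variables' contributions to the response variable, Chen decomposed relative importance into its components and used them for hierarchical clustering, called "relative-importance-based hierarchical clustering". Unlike ordinary unsupervised clustering, which groups explanatory variables only by their mutual similarity, this method considers both the importance of each explanatory variable to the response variable and the similarity between explanatory variables, so that variables with homogeneous effects on the response fall into the same cluster, providing a basis for judging how the variables relate to one another. This study therefore combines relative importance with relative-importance-based hierarchical clustering to rank variables for predicting the response variable. Because the clustering identifies similarity in how variables affect the response, it resolves the group effect of relative-importance ranking, so that variable ranking and selection for prediction account for both each variable's own importance and the similarity between variables. The study first defines a distance suitable for relative-importance-based hierarchical clustering and proposes an automatic method for determining the number of clusters, so that the clustering partitions the variables into reasonable groups. Within each cluster, a representative variable is selected by relative-importance ranking, and the clusters are ranked by the relative importance of their representative variables; this is defined as the "grouping structure". A ranking method called the "candidate variable rule" is then defined from the cluster ranking and the within-cluster variable ranking. Because ranking requires an indicator that weighs a variable's own importance against its similarity to the other variables, the study further uses the relative importance components to propose such an indicator, which measures the explanatory power of a variable together with the other variables in the model and yields the order in which variables enter the prediction model. The goal of this study is variable ranking and selection for response prediction in high-dimensional data. We combine the method with Ridge Regression, a regularized regression method, to build prediction models, validate it on high-dimensional gene expression data, and compare the prediction results with relative importance and with the regularized regression methods Lasso and Elastic Net to verify the advantages of the proposed method in predicting the response variable. To further select the variables of the prediction model, the study uses the prediction results together with Frequent Pattern Mining to complete variable selection and find the variable combination with the greatest explanatory power for predicting the response variable. | zh_TW |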
| dc.description.abstract | Variable selection has long been an important and complex issue in data analysis. The first question is how to judge the importance of each explanatory variable. Commonly used indicators are simple correlation coefficients and standardized regression coefficients. However, when the explanatory variables are collinear, neither indicator can simultaneously account for the direct effect of an individual variable on the response variable and the joint effect of multiple variables on the response variable. J.W. Johnson therefore proposed an indicator called Relative Weight, which computes the relative importance of each variable while accounting for the relationships among all the variables. However, this indicator applies only when the variables are not perfectly collinear (non-singular) and the sample size is greater than the number of variables (n>p). Shen and Chen then proposed 'comprehensive' relative importance to overcome this limitation of Relative Weight under low-rank conditions. In variable selection, besides the relative importance of the explanatory variables to the response variable, the similarity among the explanatory variables in their contributions to the response variable should also be considered. Because relative importance splits a common contribution equally among the variables that share it, forming a 'group effect', selecting such variables into the model at the same time does not improve prediction performance. To further explore the intricate relationships among the variables, Chen deconstructed relative importance into its components and used them for hierarchical clustering, called 'relative-importance-based hierarchical clustering'. Unlike unsupervised clustering methods, this method considers both the importance of the variables and the similarity between them, so it can place variables with a common structure of contributions to the response variable into the same group. This research therefore combines relative importance and relative-importance-based hierarchical clustering for variable ranking and selection to build a better prediction model. Because the hierarchical clustering identifies the similarity between variables, it resolves the group effect of relative importance, allowing both the importance of the variables and the similarity between them to be considered simultaneously when ranking and selecting variables for the prediction model. This research first defines a distance applicable to relative-importance-based hierarchical clustering and proposes a method for automatically determining the optimal number of clusters, so that the clustering groups the variables properly according to the similarity of their contributions to explaining the response variable. We then define the 'grouping structure' by selecting a representative variable in each group and ranking the groups by the relative importance of their representative variables. A variable ranking method, the 'candidate variable rule', is defined from this grouping structure. Variable ranking requires an indicator that weighs the importance of the variables against the similarity between them, so this study further uses the relative importance components to propose an indicator that measures the explanatory power of the variables and generates the variable ranking for the prediction model. The purpose of this research is to rank and select variables for predicting the response variable in high-dimensional data. We combine our method with Ridge Regression to build the prediction model and validate it on high-dimensional gene expression data. We also compare the prediction results of our method with those of relative importance, simple correlation, Lasso, and Elastic Net to verify its advantages in predicting the response variable. To identify specific variables for the prediction model, we further use the prediction results and frequent pattern mining to find the best variable combination for predicting the response variable. | en |
| dc.description.provenance | Made available in DSpace on 2021-07-10T21:39:21Z (GMT). No. of bitstreams: 1 U0001-1208202002445300.pdf: 4485496 bytes, checksum: 0f95ae586401647bd47613a4e3329e57 (MD5) Previous issue date: 2020 | en |
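| dc.description.tableofcontents | Acknowledgements; Abstract (Chinese); ABSTRACT; Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction: 1.1 Research Background; 1.2 Research Motivation and Objectives; 1.3 Thesis Structure; Chapter 2 Literature Review: 2.1 Unsupervised and Supervised Learning; 2.1.1 Hierarchical Clustering; 2.1.2 Frequent Pattern Mining; 2.1.3 Regression Analysis; 2.1.4 Regularized Regression; 2.2 Relative Importance of Variables; 2.2.1 Best Approximate Orthogonal Transformation (Johnson’s Transformation); 2.2.2 Relative Weight; 2.2.3 Relative Importance of Variables for Non-Full-Column-Rank Matrices (Relative Importance); 2.3 Relative Importance Hierarchical Clustering of Variables; 2.3.1 Relative Importance Components; 2.3.2 Hierarchical Variable Clustering Considering the Relationship between Explanatory and Response Variables; 2.3.3 Simplification of the Relative Importance Components; Chapter 3 Variable Ranking and Selection Based on Relative Importance Hierarchical Clustering: 3.1 Distance for Relative Importance Hierarchical Clustering; 3.2 Determining the Number of Clusters for Relative Importance Hierarchical Clustering; 3.2.1 Farthest Perpendicular Distance Method on Merge Distances; 3.2.2 Two-Stage Farthest Perpendicular Distance Method on Merge Distances; 3.3 Variable Ranking Using Relative Importance Clustering; 3.3.1 Grouping Structure; 3.3.2 Candidate Variable Rule; 3.4 Variable Selection Using Frequent Pattern Mining; Chapter 4 Method Application and Result Analysis: 4.1 Simulation Case; 4.2 Real Case; Chapter 5 Conclusions and Future Research; References | |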
| dc.language.iso | zh-TW | |
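| dc.subject | 相對重要性 (Relative Importance) | zh_TW |
| dc.subject | 變數排序 (Variable Ranking) | zh_TW |
| dc.subject | 變數選擇 (Variable Selection) | zh_TW |
| dc.subject | 基因預測 (Gene Prediction) | zh_TW |
| dc.subject | 階層式分群 (Hierarchical Clustering) | zh_TW |
| dc.subject | 相對權重 (Relative Weight) | zh_TW |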
| dc.subject | Hierarchical Clustering | en |
| dc.subject | Gene Prediction | en |
| dc.subject | Variable Ranking | en |
| dc.subject | Variable Selection | en |
| dc.subject | Relative Importance | en |
| dc.subject | Relative Weight | en |
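| dc.title | 利用相對重要性階層分群變數排序與選擇建構預測模型及其基因預測應用 (Constructing Prediction Models via Variable Ranking and Selection Based on Relative Importance Hierarchical Clustering, with Applications to Gene Prediction) | zh_TW |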
| dc.title | Variable Ranking and Selection for Prediction Model based on Relative Importance Hierarchical Clustering and Its Applications to Gene Prediction | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | 碩士 (Master's) | |
| dc.contributor.coadvisor | 徐治平(Jyh-Ping Hsu) | |
| dc.contributor.oralexamcommittee | 藍俊宏(Jakey Blue),吳沛遠(Pei-Yuan Wu),陳炯年(Chiung-Nien Chen) | |
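| dc.subject.keyword | 變數排序 (Variable Ranking), 變數選擇 (Variable Selection), 相對重要性 (Relative Importance), 相對權重 (Relative Weight), 階層式分群 (Hierarchical Clustering), 基因預測 (Gene Prediction) | zh_TW |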
| dc.subject.keyword | Variable Ranking, Variable Selection, Relative Importance, Relative Weight, Hierarchical Clustering, Gene Prediction | en |
| dc.relation.page | 91 | |
| dc.identifier.doi | 10.6342/NTU202003039 | |
| dc.rights.note | 未授權 (not authorized for public access) | |
| dc.date.accepted | 2020-08-12 | |
| dc.contributor.author-college | 共同教育中心 (Center for General Education) | zh_TW |
| dc.contributor.author-dept | 統計碩士學位學程 (Master's Program in Statistics) | zh_TW |
| Appears in collections: | 統計碩士學位學程 (Master's Program in Statistics) | |
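The Relative Weight indicator that the abstract attributes to J.W. Johnson can be illustrated with a short NumPy sketch. It follows the published construction (Johnson, 2000): replace the standardized predictors by their closest orthonormal approximation, regress the response on that approximation, and map the squared coefficients back to the original variables. The function name `relative_weights` and the simulated data are hypothetical; the sketch covers only the non-singular, n>p case, and it is neither Shen and Chen's comprehensive relative importance nor the thesis's own code.

```python
import numpy as np


def relative_weights(X, y):
    """Sketch of Johnson's (2000) relative weights (hypothetical helper).

    Assumes n > p, p >= 2, and a non-singular predictor correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if n <= p:
        raise ValueError("relative weights require n > p")
    # Joint correlation matrix of [X, y]: predictor block and predictor-response column
    C = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    Rxx, rxy = C[:p, :p], C[:p, p]
    # Eigen-decomposition Rxx = P diag(lam) P'
    lam, P = np.linalg.eigh(Rxx)
    if lam.min() <= 1e-12:
        raise ValueError("predictor correlation matrix is singular")
    # Lambda* = P diag(sqrt(lam)) P': correlations between the predictors and
    # their closest orthonormal approximation Z (Johnson, 1966)
    lam_star = (P * np.sqrt(lam)) @ P.T
    # Standardized coefficients of y regressed on Z
    beta = np.linalg.solve(lam_star, rxy)
    # eps_j = sum_k lam_star[j, k]^2 * beta_k^2
    return (lam_star ** 2) @ (beta ** 2)


# Hypothetical data: x2 is built from x1, so the two are strongly correlated
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + 0.5 * x3 + rng.normal(size=n)

eps = relative_weights(X, y)

# Ordinary least squares R^2 (with intercept) for comparison
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(eps, eps.sum(), r2)
```

Because the weights sum to the model R², they partition the explained variance among the predictors. Here `x2` receives part of the credit even though `y` does not depend on it directly, because it shares variance with `x1`; this sharing is the kind of common contribution behind the "group effect" the abstract describes.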
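The abstract says the number of clusters is chosen automatically from the hierarchical clustering, and the table of contents names a farthest perpendicular distance method on merge distances (聚合距離之最遠垂直距離法), but the record gives no formula. The sketch below is one common knee-detection reading of that idea, assumed rather than taken from the thesis: plot each merge distance against its merge index, draw the chord from the first to the last merge, and cut the tree right after the merge farthest from that chord. The function `farthest_point_cluster_count` and the sample heights are hypothetical.

```python
import numpy as np


def farthest_point_cluster_count(merge_heights):
    """Hypothetical knee rule for choosing the number of clusters.

    merge_heights: the m - 1 merge distances of an agglomerative clustering
    of m variables, assumed sorted in non-decreasing order (as produced by
    single, complete, or average linkage).
    """
    h = np.asarray(merge_heights, dtype=float)
    if h.size < 3:
        raise ValueError("need at least three merges")
    m = h.size + 1                                # number of leaves (variables)
    i = np.arange(1, h.size + 1, dtype=float)     # merge index on the x-axis
    x1, y1, x2, y2 = i[0], h[0], i[-1], h[-1]
    # Perpendicular distance from each point (i, h_i) to the chord
    # through the first and last merges
    num = np.abs((y2 - y1) * i - (x2 - x1) * h + x2 * y1 - y2 * x1)
    dist = num / np.hypot(x2 - x1, y2 - y1)
    k = int(i[np.argmax(dist)])                   # last merge kept before the cut
    return m - k                                  # clusters remaining after k merges


# Made-up heights: seven cheap merges followed by two expensive ones
heights = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 2.0, 2.4]
n_clusters = farthest_point_cluster_count(heights)
print(n_clusters)
```

With SciPy, the merge distances would be the third column of the linkage matrix returned by `scipy.cluster.hierarchy.linkage`. The thesis's own distance between variables, built from relative importance components, and its two-stage variant are not reproduced here.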
Files in this item:
| File | Size | Format |
|---|---|---|
| U0001-1208202002445300.pdf (not authorized for public access) | 4.38 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
