A Robust Random Subspace-based Supervised Dimension Reduction Method (基於隨機子集之穩健監督式降維法)

Hsun-Chen Chang; 張訓楨

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73276

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪弘(Hung Hung)
dc.contributor.author	Hsun-Chen Chang	en
dc.contributor.author	張訓楨	zh_TW
dc.date.accessioned	2021-06-17T07:25:58Z	-
dc.date.available	2019-08-26
dc.date.copyright	2019-08-26
dc.date.issued	2019
dc.date.submitted	2019-06-26
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73276	-
dc.description.abstract	癌症的精準預測在近幾年發展出許多不同方法，由於資料型態多為高維度資料，必須先將維度降低以利分析，一個基於隨機子集的局部和全局保持的降維法利用隨機子集去建構出半監督式模型，然而，其採用人為的參數設定直接將監督式與非監督式的資訊加起來，再建構出拉普拉斯矩陣表示出資料點間的關係，在此種人為設定下，該參數並無選擇標準，僅能依不同的處理經驗去做設定，因此參數設定高度影響各種資料型態的準確率。在本篇研究中，為了解決參數無法固定的狀況，本研究改良出另一穩健隨機子集監督式降維法RRS-SDR，改以利用伽馬邏輯斯回歸(γ-logistic Regression)直接估計該資料點被分為某一類別的機率，再計算兩資料點被分為同類的機率，並代入拉普拉斯矩陣中，以此取代需要比例混合參數的半監督式學習演算法，此外，對於有錯誤標記的資料集，RRS-SDR也有較佳的分類表現。	zh_TW
dc.description.abstract	Many methods for precise cancer classification have been developed in recent years. Because the data are typically high-dimensional, dimensionality reduction is an essential preprocessing step. The local and global preserving semi-supervised dimensionality reduction based on random subspace algorithm (RSLGSSDR) uses random subspaces to build a semi-supervised dimensionality reduction model. It relies on a tuning parameter to combine the supervised and unsupervised information when constructing the Laplacian matrix that encodes the relationships between data points. However, there is no principled way to select this tuning parameter, and the characteristics of datasets can be diverse, so the choice of parameter strongly influences classification accuracy. In this thesis, to remove the dependence on the tuning parameter, we develop the Robust Random Subspace-based Supervised Dimension Reduction method (RRS-SDR). We use γ-logistic regression to estimate the class probability of each data point, and then calculate the probability that two data points belong to the same class. By substituting this probability into the Laplacian matrix, we replace the semi-supervised learning step, which requires a mixing parameter, with our new method. We show that RRS-SDR has superior classification performance on datasets with mislabeled samples.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T07:25:58Z (GMT). No. of bitstreams: 1 ntu-108-R05849015-1.pdf: 2150371 bytes, checksum: 813742413cf00c998e2b2bd629f1e616 (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	Acknowledgements; Abstract; List of Figures; Chapter 1 Introduction; 1.1 Motivation; 1.2 Introduction of RSLGSSDR; 1.3 Drawbacks of RSLGSSDR; Chapter 2 The Proposed Methods; 2.1 The construction of RRS-SDR; Chapter 3 Experimental Results; 3.1 Datasets; 3.2 Settings; 3.2.1 Preparing the datasets; 3.2.2 Mislabeling; 3.2.3 Method comparison; 3.2.4 Classification process; 3.3 The number of subset size of random-partition; 3.4 The number of random-partition; 3.5 Results on different target dimensionalities; 3.6 Results on different mislabel rates; 3.7 ROC curve on different mislabel rates; Chapter 4 Conclusion and Future Work; Bibliography; Appendix; A. Robust γ–logistic Regression; B. Selection of γ
dc.language.iso	en
dc.subject	癌症分類	zh_TW
dc.subject	降維度	zh_TW
dc.subject	拉普拉斯矩陣	zh_TW
dc.subject	伽馬邏輯斯回歸	zh_TW
dc.subject	隨機子集演算法	zh_TW
dc.subject	γ-logistic regression	en
dc.subject	dimensionality reduction	en
dc.subject	Laplacian matrix	en
dc.subject	random subspace method	en
dc.subject	Cancer classification	en
dc.title	基於隨機子集之穩健監督式降維法	zh_TW
dc.title	A Robust Random Subspace-based Supervised Dimension Reduction Method	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master's (碩士)
dc.contributor.oralexamcommittee	蕭朱杏(Chu-Hsing Hsiao),盧子彬(Tzu-Pin Lu)
dc.subject.keyword	癌症分類,降維度,拉普拉斯矩陣,伽馬邏輯斯回歸,隨機子集演算法	zh_TW
dc.subject.keyword	Cancer classification,dimensionality reduction,Laplacian matrix,γ-logistic regression,random subspace method	en
dc.relation.page	47
dc.identifier.doi	10.6342/NTU201901081
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2019-06-27
dc.contributor.author-college	公共衛生學院	zh_TW
dc.contributor.author-dept	流行病學與預防醫學研究所	zh_TW
Appears in collections:	Institute of Epidemiology and Preventive Medicine (流行病學與預防醫學研究所)
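
The abstract describes turning estimated class probabilities into pairwise same-class probabilities that fill the Laplacian matrix. The sketch below illustrates that step only, under assumptions this record does not confirm: labels are binary, the two points' labels are treated as independent given their features, self-similarity is set to zero, and the unnormalized Laplacian L = D - W is used. The probabilities would come from the γ-logistic regression fit; here they are hypothetical numbers, and the function names are illustrative rather than taken from the thesis.

```python
def same_class_probability(p_i, p_j):
    """Probability that two points share a binary label.

    p_i and p_j are the estimated probabilities of class 1 for each point.
    Assumes the two labels are independent (an assumption of this sketch).
    """
    return p_i * p_j + (1.0 - p_i) * (1.0 - p_j)


def laplacian_from_probabilities(probs):
    """Build the weight matrix W and the unnormalized Laplacian L = D - W."""
    n = len(probs)
    # Off-diagonal weights are same-class probabilities; the diagonal is 0.
    w = [[0.0 if i == j else same_class_probability(probs[i], probs[j])
          for j in range(n)] for i in range(n)]
    # Degree of each node is the sum of its row in W.
    degrees = [sum(row) for row in w]
    lap = [[(degrees[i] if i == j else 0.0) - w[i][j] for j in range(n)]
           for i in range(n)]
    return w, lap


# Hypothetical class-1 probabilities for four samples.
probs = [0.9, 0.8, 0.2, 0.1]
W, L = laplacian_from_probabilities(probs)
```

Points with similar estimated probabilities receive large weights (samples 0 and 1 here), while points likely to differ in class receive small ones (samples 0 and 2), so no separate mixing parameter is needed to form W in this sketch.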

Files in this item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	2.1 MB	Adobe PDF


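Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.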