Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77325
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 陳正剛 | zh_TW
dc.contributor.author | 陳羿晴 | zh_TW
dc.contributor.author | Yi-Ching Chen | en
dc.date.accessioned | 2021-07-10T21:56:29Z | -
dc.date.available | 2024-08-06 | -
dc.date.copyright | 2019-08-07 | -
dc.date.issued | 2019 | -
dc.date.submitted | 2002-01-01 | -
dc.identifier.citation | [1] A. K. Sen, Regression Analysis: Theory, Methods and Applications (Springer Texts in Statistics). New York: Springer-Verlag, 1990.
[2] A. N. Zaied, M. G. Habishy, and M. A. Saleh, Acute Leukemia Classification using Bayesian Networks. 2012, pp. 1419-1426.
[3] A. Wai-Ho, K. C. C. Chan, A. K. C. Wong, and W. Yang, "Attribute clustering for grouping, selection, and classification of gene expression data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 2, pp. 83-101, 2005.
[4] D. Ghosh and A. M. Chinnaiyan, "Classification and selection of biomarkers in genomic data using LASSO," Journal of Biomedicine & Biotechnology, vol. 2005, no. 2, pp. 147-154, 2005.
[5] D. V. Budescu, "Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression," Psychological Bulletin, vol. 114, no. 3, pp. 542-551, 1993.
[6] G. Gamberoni, E. Lamma, F. Riguzzi, S. Storari, and S. Volinia, "Bayesian Networks Learning for Gene Expression Datasets," in Advances in Intelligent Data Analysis VI, Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 109-120.
[7] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene Selection for Cancer Classification using Support Vector Machines," Machine Learning, vol. 46, no. 1, pp. 389-422, 2002.
[8] J. J. Hughey and A. J. Butte, "Robust meta-analysis of gene expression using the elastic net," Nucleic Acids Research, vol. 43, no. 12, p. e79, 2015.
[9] J. W. Johnson, "A Heuristic Method for Estimating the Relative Weight of Predictor Variables in Multiple Regression," Multivariate Behavioral Research, vol. 35, no. 1, pp. 1-19, 2000.
[10] J. Zhu and T. Hastie, Classification of Gene Microarrays by Penalized Logistic Regression. 2004, pp. 427-43.
[11] N. Friedman, M. Linial, I. Nachman, and D. Pe'er, "Using Bayesian networks to analyze expression data," presented at the Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, Tokyo, Japan, 2000.
[12] P. E. Green, J. Douglas Carroll, and W. Desarbo, A New Measure of Predictor Variable Importance in Multiple Regression. 1978, pp. 356-360.
[13] R. M. Johnson, "The minimal transformation to orthonormality," Psychometrika, vol. 31, no. 1, pp. 61-66, 1966.
[14] S. Zixin and C. Argon, "Relative importance under low-rank condition and its applications to semiconductor yield analysis," in 2017 International Conference on Decision Support System Technology, Namur, Belgium, 2017, pp. 153-159.
[15] T. Hastie, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. (Springer Series in Statistics). New York, NY: Springer-Verlag New York, 2009.
[16] T. R. Golub et al., "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, vol. 286, no. 5439, p. 531, 1999.
[17] W. A. Gibson, "Orthogonal Predictors: A Possible Resolution of the Hoffman-Ward Controversy," Psychological Reports, vol. 11, no. 1, pp. 32-34, 1962.
[18] Y. Oshima et al., "DNA microarray analysis of hematopoietic stem cell-like fractions from individuals with the M2 subtype of acute myeloid leukemia," Leukemia, vol. 17, no. 10, pp. 1990-1997, 2003.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77325 | -
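dc.description.abstract | Variable selection is a long-standing topic in data analysis, and collinearity among the independent variables is the main concern when selecting variables for a linear model. As early as 1966, R.M. Johnson addressed the collinearity problem by proposing the "best-fitting orthogonal transformation", which converts the original variables into orthogonal variables. Green (1978) [12] and J.W. Johnson (2000) [9] later extended R.M. Johnson's orthogonal-variable idea to other methods of computing variable importance; J.W. Johnson's method is the Relative Weight. All of these methods, however, are bound by the requirements of the best-fitting orthogonal transformation: the independent variables must be free of perfect collinearity (non-singular), and the number of observations must exceed the number of variables (n>p). Zixin (2017) [14] therefore built on the relative weight to propose a "relative importance" measure that applies when the variables are collinear or when n≤p. This study extends that importance index to variable grouping.
Clustering is an unsupervised learning method for data that have no ground-truth labels: objects are grouped by similarity according to how their attributes describe them. Traditional hierarchical clustering is a common choice. Its algorithm is intuitive, it yields a result once a distance measure and a linkage rule are defined, and the result is displayed as a dendrogram, which makes the similarity relationships among objects easy to see. This thesis borrows these advantages for variable grouping.
In the earlier literature that combines variable importance with variable grouping, the aim is to select important factors with low homogeneity from each cluster so that the selected factors predict the outcome more comprehensively. Those methods all proceed in two stages: the factors are first grouped by mutual similarity, and only then is the relationship between each group of factors and the outcome analyzed, so that important factors covering different aspects can be screened out. Because such methods use the same data set twice during the analysis, this study seeks to merge the two stages and thereby reduce the repeated use of resources.
The goal of this study is to group factors by both their degree of influence on the outcome and their similarity to one another, so the factor-outcome relationship and the factor-factor relationships are both central to the grouping. By examining the geometric meaning of relative importance, the study decomposes it in a principled way into relative importance components, which serve as the basis for the distance computation in hierarchical clustering. These components are mutually independent and non-overlapping in meaning, and relative importance both carries the supervised notion of regression analysis and accounts for collinearity among the independent variables. The approach is therefore well suited to grouping factors in a single pass by their influence on the outcome and by their homogeneity.
The method is validated on simulated cases and on the gene expression and leukemia data provided by Oshima et al. (2003) [18], and the results are compared with those of unsupervised hierarchical clustering and Bayesian networks.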
zh_TW
dc.description.abstract | Variable selection is a long-standing problem in data analysis. In linear models, collinearity among the independent variables is the main concern when selecting variables. As early as 1966, R.M. Johnson proposed a method that transforms the original variables into orthogonal variables in order to resolve the multicollinearity problem in linear regression. Green (1978) [12] and J.W. Johnson (2000) [9] subsequently built on R.M. Johnson's idea to propose other methods of calculating relative importance; the method proposed by J.W. Johnson is called "Relative Weight". However, all of these methods inherit the restrictions of R.M. Johnson's (1966) transformation, which requires that the independent variables be non-singular (free of perfect collinearity) and that the number of samples exceed the number of variables (n>p). Zixin (2017) [14] therefore proposed a more general relative importance method that overcomes the difficulty relative weights face under the low-rank condition. This study extends relative weights, and the generalization proposed by Zixin, to variable grouping.
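As a point of reference, the relative weight of J.W. Johnson (2000) [9] is usually stated as below. This is a sketch of the standard formulation, not the thesis's own derivation, and the symbols X, y, P, Δ, Q, Z, Λ, β and ε_j are notation assumed here for illustration. X is the n×p matrix of centered predictors scaled so that X'X is their correlation matrix, and y is the response, centered and scaled to unit length.

```latex
\begin{align*}
X &= P \Delta Q^{\top} && \text{(thin singular value decomposition)} \\
Z &= P Q^{\top} && \text{(R.M. Johnson's orthogonal approximation of } X\text{)} \\
\Lambda &= Z^{\top} X = Q \Delta Q^{\top} && \text{(regression of } X \text{ on } Z\text{)} \\
\beta &= Z^{\top} y && \text{(regression of } y \text{ on } Z\text{)} \\
\varepsilon_j &= \sum_{k=1}^{p} \lambda_{jk}^{2}\, \beta_k^{2} && \text{(relative weight of predictor } x_j\text{)} \\
\sum_{j=1}^{p} \varepsilon_j &= R^{2} && \text{(the weights partition } R^{2}\text{)}
\end{align*}
```

The last line holds because each column of Λ has unit squared length (the diagonal of X'X is 1) and the squared entries of β sum to R². The construction assumes that Δ has p nonzero singular values, which corresponds to the n>p, non-singular restriction noted in the abstract.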
Clustering is an unsupervised method that is applied to unlabeled data and groups objects according to their similarity across different attributes. Hierarchical clustering is one of the most common clustering methods. Its algorithm is intuitive and its result is presented as a tree diagram, which makes the method easy to understand and its results easy to interpret. This research adopts these advantages of hierarchical clustering for variable grouping.
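Agglomerative hierarchical clustering is determined by two choices: a pairwise distance and a linkage rule for merging clusters. The minimal sketch below illustrates this with a generic, naive procedure in plain Python. It is not the thesis's relative-importance-based method; the function name `agglomerate` and the toy distance matrix are hypothetical.

```python
from itertools import combinations


def agglomerate(dist, linkage=max):
    """Naive agglomerative clustering on a precomputed distance matrix.

    dist    -- symmetric list-of-lists of pairwise distances
    linkage -- aggregates item-to-item distances between two clusters:
               max gives complete linkage, min gives single linkage
    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    which is the information a dendrogram displays.
    """

    def between(a, b):
        # Linkage distance between clusters a and b (tuples of item indices).
        return linkage(dist[i][j] for i in a for j in b)

    clusters = [(i,) for i in range(len(dist))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((a, b, between(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy example: items 0 and 1 are close, items 2 and 3 are close.
toy = [
    [0, 1, 6, 7],
    [1, 0, 5, 8],
    [6, 5, 0, 2],
    [7, 8, 2, 0],
]
for a, b, height in agglomerate(toy, linkage=max):
    print(a, b, height)
# (0,) (1,) 1
# (2,) (3,) 2
# (0, 1) (2, 3) 8
```

Swapping `linkage=max` for `linkage=min` changes only the height of the final merge in this example (from 8 to 5), which shows how the linkage rule alone can reshape the tree.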
Previous studies that aim to select important factors covering different aspects combine variable importance with variable grouping to obtain a more comprehensive variable selection. These methods proceed in two stages: the variables are first grouped according to their similarity, and the relationship between each group of variables and the outcome is then analyzed. Because the two-stage approach uses the same data set twice, this study aims to consolidate it into a single-pass analysis and avoid the repeated use of resources.
The purpose of this study is to group the factors by both their importance in explaining the outcome and their similarity to one another. The relationship between the factors and the outcome and the relationships among the factors themselves are therefore the two focal points of the grouping method. Based on the geometric meaning of relative weights, this study decomposes the relative weight into multiple components whose values are additive. Because these components are independent of one another in meaning, they serve as a sound basis for calculating the distance between variables.
Simulated cases and a real gene expression data set provided by Oshima et al. (2003) [18] are used to validate the method, and the results are compared with those of unsupervised hierarchical clustering and Bayesian networks.
en
dc.description.provenance | Made available in DSpace on 2021-07-10T21:56:29Z (GMT). No. of bitstreams: 1
ntu-108-R06546005-1.pdf: 5108153 bytes, checksum: 628cbfeaaadbd046b85a4c0e6421a8b0 (MD5)
Previous issue date: 2019
en
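dc.description.tableofcontents | Acknowledgements i
Abstract (in Chinese) ii
ABSTRACT iii
Table of Contents v
List of Figures viii
List of Tables xiii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
1.3 Thesis Organization 4
Chapter 2 Literature Review 5
2.1 Supervised and Unsupervised Learning 5
2.1.1 Hierarchical Clustering 6
2.1.2 Regression Analysis 9
2.1.3 Bayesian Network 12
2.2 Relative Importance of Variables 13
2.2.1 Best-Fitting Orthogonal Transformation (Johnson's Transformation) 14
2.2.2 Relative Weight 16
2.2.3 Relative Importance for Matrices without Full Column Rank 19
2.3 Gene Expression and Disease Research 24
2.3.1 Relationship between Leukemia Types and Gene Expression 24
2.3.2 Analysis Methods for Gene Expression Data 25
Chapter 3 Relative Importance based Hierarchical Clustering 26
3.1 Relative Importance Components and Their Geometric Meaning 26
3.1.1 Relative Importance Components and Their Geometric Meaning in the General Case 28
3.1.2 Relative Importance Components and Their Geometric Meaning in the Non-General Case 37
3.2 Hierarchical Variable Clustering That Accounts for the Relationship between Independent and Dependent Variables 44
3.2.1 Hierarchical Variable Clustering via Relative Importance Components 44
3.2.2 Interpretation of Clustering Results and Relative Importance of Variable Groups 48
3.3 Simplified Method for Relative Importance Components in the n≪p Case 49
Chapter 4 Application and Analysis of Results 57
4.1 Simulated Cases 57
4.1.1 n>p 59
4.1.2 n≤p 64
4.2 Real Case: Grouping Genes Related to Leukemia 71
Chapter 5 Conclusions and Future Research 78
References 80
-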
dc.language.iso | zh_TW | -
dc.subject | 階層式分群 (Hierarchical Clustering) | zh_TW
dc.subject | 相對重要性 (Relative Importance) | zh_TW
dc.subject | 變數選擇 (Variable Selection) | zh_TW
dc.subject | 基因表現 (Gene Expression) | zh_TW
dc.subject | 相對權重 (Relative Weight) | zh_TW
dc.subject | Gene Expression | en
dc.subject | Variable Selection | en
dc.subject | Relative Importance | en
dc.subject | Relative Weight | en
dc.subject | Hierarchical Clustering | en
dc.title | 變數相對重要性之階層分群方法及其於基因表現資料分析之應用 | zh_TW
dc.title | Relative Importance based Hierarchical Clustering and Its Application to Gene Expression Analysis | en
dc.type | Thesis | -
dc.date.schoolyear | 107-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 藍俊宏;陳炯年;何明志 | zh_TW
dc.subject.keyword | 變數選擇 (Variable Selection), 相對重要性 (Relative Importance), 相對權重 (Relative Weight), 階層式分群 (Hierarchical Clustering), 基因表現 (Gene Expression) | zh_TW
dc.subject.keyword | Variable Selection, Relative Importance, Relative Weight, Hierarchical Clustering, Gene Expression | en
dc.relation.page | 81 | -
dc.identifier.doi | 10.6342/NTU201902386 | -
dc.rights.note | Not authorized | -
dc.date.accepted | 2019-08-05 | -
dc.contributor.author-college | College of Engineering | -
dc.contributor.author-dept | Institute of Industrial Engineering | -
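Appears in collections: Institute of Industrial Engineering

Files in this item:
File | Size | Format
ntu-107-2.pdf (not authorized for public access) | 4.99 MB | Adobe PDF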


Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
