Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63915
Full metadata record
DC field: value [language]
dc.contributor.advisor: 陳正剛 (Argon Chen)
dc.contributor.author: Amos Hong [en]
dc.contributor.author: 洪士峰 [zh_TW]
dc.date.accessioned: 2021-06-16T17:22:57Z
dc.date.available: 2017-08-28
dc.date.copyright: 2012-08-28
dc.date.issued: 2012
dc.date.submitted: 2012-08-16
dc.identifier.citation: [1] J. W. Johnson, "A heuristic method for estimating the relative weight of predictor variables in multiple regression," Multivariate Behavioral Research, vol. 35, pp. 1-19, 2000.
[2] A. Sen and M. Srivastava, Regression Analysis: Theory, Methods and Applications. Berlin: Springer, 1990.
[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth, 1984.
[4] H. Hotelling, "Relations between two sets of variates," Biometrika, vol. 28, pp. 321-377, 1936.
[5] H. Wold, "Path models with latent variables: the NIPALS approach," in Quantitative Sociology: International Perspectives on Mathematical and Statistical Modeling, H. M. Blalock et al., Eds. New York: Academic Press, 1975, pp. 307-357.
[6] J. Gui and H. Z. Li, "Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data," Bioinformatics, vol. 21, pp. 3001-3008, Jul. 2005.
[7] V. E. McGee and W. T. Carleton, "Piecewise regression," Journal of the American Statistical Association, vol. 65, pp. 1109-1124, Sep. 1970.
[8] D. M. Hawkins, "Point estimation of the parameters of piecewise regression models," Applied Statistics, vol. 25, no. 1, pp. 51-57, 1976.
[9] J. Liu, S. Wu, and J. V. Zidek, "On segmented multivariate regression," Statistica Sinica, vol. 7, no. 2, pp. 497-525, Apr. 1997.
[10] J. Bai, "Estimation of a change point in multiple regression models," Review of Economics and Statistics, vol. 79, no. 4, pp. 551-563, Nov. 1997.
[11] Z. Qu and P. Perron, "Estimating and testing structural changes in multivariate regressions," Econometrica, vol. 75, no. 2, pp. 459-502, Mar. 2007.
[12] Y. Wu, "Simultaneous change point analysis and variable selection in a regression problem," Journal of Multivariate Analysis, vol. 99, no. 9, pp. 2154-2171, Oct. 2008.
[13] W. S. DeSarbo and W. L. Cron, "A maximum likelihood methodology for clusterwise linear regression," Journal of Classification, vol. 5, no. 2, pp. 249-282, Sep. 1988.
[14] H. Spath, "Algorithm 48: a fast algorithm for clusterwise linear regression," Computing, vol. 29, no. 2, pp. 175-181, Jun. 1982.
[15] Q. Shao and Y. Wu, "A consistent procedure for determining the number of clusters in regression clustering," Journal of Statistical Planning and Inference, vol. 135, no. 2, pp. 461-476, Dec. 2005.
[16] A. Fielding, "Binary segmentation: the automatic detector and related techniques for exploring data structure," in The Analysis of Survey Data, Volume I: Exploring Data Structure, C. A. O'Muircheartaigh and C. Payne, Eds. New York: Wiley, 1997, pp. 221-257.
[17] C. Y. Chen, S. W. Shyue, and C. J. Chang, "Association rule mining for evaluation of regional environments: case study of Dapeng Bay, Taiwan," International Journal of Innovative Computing, Information and Control, vol. 6, no. 8, pp. 3425-3436, 2010.
[18] Y. Kusunoki, M. Inuiguchi, and J. Stefanowski, "Rule induction via clustering decision classes," International Journal of Innovative Computing, Information and Control, vol. 4, no. 10, pp. 2663-2677, 2008.
[19] S. W. Han and J. Y. Kim, "A new decision tree algorithm based on rough set theory," International Journal of Innovative Computing, Information and Control, vol. 4, no. 10, pp. 2749-2757, 2008.
[20] J. R. Quinlan, "Learning with continuous classes," in Proc. Fifth Australian Joint Conference on Artificial Intelligence, 1992, pp. 343-348.
[21] A. Karalic, "Linear regression in regression tree leaves," in Proc. International School for Synthesis of Expert Knowledge, Bled, Slovenia, 1992, pp. 151-163.
[22] P. Chaudhuri, M. Huang, W. Loh, and R. Yao, "Piecewise-polynomial regression trees," Statistica Sinica, vol. 4, no. 1, pp. 143-167, Jan. 1994.
[23] A. Dobra and J. E. Gehrke, "SECRET: a scalable linear regression tree algorithm," in Proc. Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[24] C. Vens and H. Blockeel, "A simple regression based heuristic for learning model trees," Intelligent Data Analysis, vol. 10, no. 3, pp. 215-236, 2006.
[25] H. Tsuda, H. Shiri, O. Takagi, and R. Take, "Yield analysis and improvement by reducing manufacturing fluctuation noise," in Proc. ISSM 2000, pp. 249-251.
[26] D. Braha and A. Shmilovici, "On the use of decision tree induction for discovery of interactions in a photolithographic process," IEEE Transactions on Semiconductor Manufacturing, vol. 16, no. 4, pp. 644-652, Nov. 2003.
[27] D. Lubinsky, "Tree structured interpretable regression," in Learning from Data, Lecture Notes in Statistics, vol. 112, D. Fisher and H. Lenz, Eds. New York: Springer, 1994, pp. 387-398.
[28] D. Malerba, F. Esposito, M. Ceci, and A. Appice, "Top-down induction of model trees with regression and splitting nodes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 612-625, May 2004.
[29] W. Loh, "Regression trees with unbiased variable selection and interaction detection," Statistica Sinica, vol. 12, no. 2, pp. 361-386, Apr. 2002.
[30] A. R. Barron, A. Luttge, et al., Chemistry of Electronic Materials, A. R. Barron, Ed. Houston, TX: Rice University, 2011.
[31] J. A. Wegelin, "A survey of partial least squares (PLS) methods, with emphasis on the two-block case," Technical Report, Department of Statistics, University of Washington, Seattle, 2000.
[32] R. Rosipal and N. Kramer, "Overview and recent advances in partial least squares," in Subspace, Latent Structure and Feature Selection Techniques, C. Saunders et al., Eds. New York: Springer-Verlag, 2006, pp. 34-51.
[33] S. Wold, "PLS for multivariate linear modeling," in QSAR: Chemometric Methods in Molecular Design: Methods and Principles in Medicinal Chemistry, H. V. D. Waterbeemd, Ed. New York: Wiley-VCH, 1994, pp. 195-218.
[34] S. Wold, E. Johansson, and M. Cocchi, "PLS: partial least squares projections to latent structures," in 3D QSAR in Drug Design: Theory, Methods and Applications, H. Kubinyi, Ed. Leiden: ESCOM Science Publishers, 1993, pp. 523-550.
[35] A. Phatak and S. De Jong, "The geometry of partial least squares," Journal of Chemometrics, vol. 11, no. 4, pp. 311-338, 1997.
[36] I. T. Jolliffe, Principal Component Analysis, 2nd ed. New York: Springer, 2002.
[37] K. E. Muller, "Understanding canonical correlation through the general linear model and principal components," The American Statistician, vol. 36, no. 4, pp. 342-354, 1982.
[38] J. Bland and D. Altman, "Multiple significance tests: the Bonferroni method," British Medical Journal, vol. 310, p. 170, Jan. 1995.
[39] G. C. Chow, "Tests of equality between sets of coefficients in two linear regressions," Econometrica, vol. 28, no. 3, pp. 591-605, Jul. 1960.
[40] P. D. Allison, "Testing for interaction in multiple regression," The American Journal of Sociology, vol. 83, no. 1, pp. 144-153, Jul. 1977.
[41] Y. Benjamini and D. Yekutieli, "The control of the false discovery rate in multiple testing under dependency," Annals of Statistics, vol. 29, pp. 1165-1188, Aug. 2001.
[42] D. W. Hosmer and S. Lemeshow, Applied Logistic Regression. New York: Wiley, 1989, p. 184.
[43] E. W. Steyerberg, M. J. C. Eijkemans, F. E. Harrell Jr., and J. D. F. Habbema, "Prognostic modeling with logistic regression analysis: in search of a sensible strategy in small data sets," Medical Decision Making, vol. 21, pp. 45-56, Jan. 2001.
[44] A. C. Rencher, Multivariate Statistical Inference and Applications. New York: John Wiley and Sons, 1998.
[45] R. Gittins, Canonical Analysis: A Review with Applications in Ecology. New York: Springer-Verlag, 1985.
[46] D. V. Budescu, "Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression," Psychological Bulletin, vol. 114, pp. 542-551, 1993.
[47] J. M. LeBreton, R. E. Ployhart, and R. T. Ladd, "A Monte Carlo comparison of relative importance methodologies," Organizational Research Methods, vol. 7, pp. 258-282, 2004.
[48] Y. E. Chao, Y. Zhao, L. L. Kupper, and L. A. Nylander-French, "Quantifying the relative importance of predictors in multiple linear regression analyses for public health studies," Journal of Occupational and Environmental Hygiene, vol. 5, no. 8, pp. 519-529, 2008.
[49] W. Kruskal, "Relative importance by averaging over orderings," The American Statistician, vol. 41, no. 1, pp. 6-10, Feb. 1987.
[50] R. Azen and D. V. Budescu, "Comparing predictors in multivariate regression models: an extension of dominance analysis," Journal of Educational and Behavioral Statistics, vol. 31, pp. 157-180, 2006.
[51] Y. Huo and D. V. Budescu, "An extension of dominance analysis to canonical correlation analysis," Multivariate Behavioral Research, vol. 44, pp. 688-709, 2009.
[52] R. M. Johnson, "The minimal transformation to orthonormality," Psychometrika, vol. 31, no. 1, pp. 61-66, 1966.
[53] F. H. C. Marriott, "Tests of significance in canonical analysis," Biometrika, vol. 39, no. 1-2, pp. 58-64, 1952.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63915
dc.description.abstract: Stepwise regression analysis and regression-tree analysis are commonly used to build causal models relating a single response variable to multiple factors. Stepwise regression cannot automatically partition the samples to build piecewise linear regression models. A regression tree sequentially selects attributes to split the data and ultimately links each partition to a specific linear regression model, so it can be used to build piecewise linear models. However, existing regression trees select an attribute and split the data at every node; after a few levels the sample size shrinks rapidly, so subsequent attribute selection depends excessively on the small samples produced by earlier splits and becomes unreliable. This research first combines the strengths of regression trees and stepwise regression and proposes a sample-efficient regression-tree method to construct piecewise regression models effectively.
When the complex relationships between multiple correlated responses and multiple correlated potential factors are considered, examining the responses one at a time is no longer an effective way to discover important factors. Although the literature offers methods that directly analyze the many-to-many correlation between multiple responses and multiple factors, under multicollinearity these methods cannot reasonably explain each variable's contribution to that correlation. A general framework has been proposed for estimating variable-importance indices in one-to-many regression analysis; this research then extends that framework to estimate variable-contribution indices in many-to-many correlation analysis.
Hypothetical cases and actual semiconductor yield-analysis cases are used to illustrate the sample-efficient regression tree and many-to-many relative importance analysis and to verify their effectiveness for causal analysis. The results show that the sample-efficient regression tree can still effectively uncover the underlying causal model under limited sample sizes, and that many-to-many relative importance analysis discovers the causal relationships between two sets of variables more effectively than existing methods. [zh_TW]
dc.description.abstract: Forward stepwise regression analysis and regression trees are used for one-to-many causal analysis. Forward stepwise regression selects critical attributes all the way through with the same set of data; it is, however, not capable of splitting the data to construct piecewise regression models. Regression trees are known to be an effective data-mining tool for constructing piecewise models by iteratively splitting the data set and selecting attributes into a hierarchical tree model. However, the sample size shrinks sharply after a few levels of data splitting, causing unreliable attribute selection. In this research, we propose a sample-efficient regression tree (SERT) approach that combines the forward selection of regression analysis with regression-tree methodology to effectively construct piecewise linear causal models.
When multiple responses are mingled with potential causal factors, one-response-at-a-time correlation analysis is no longer sufficient to discover the critical factors behind changes in correlated responses. Though many-to-many correlation-analysis methodologies have been proposed in the literature, difficulties arise, especially under multicollinearity among variables, in measuring the relative importance of a variable's contribution to the association between a set of responses and a set of factors. Johnson's dominance analysis [1] offers a general framework for determining the relative importance of independent variables in linear multiple-regression models. In this research, we also extend Johnson's dominance index to many-to-many correlation analysis as a measure that summarizes the association between two sets of variables.
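For intuition on "relative importance," here is a minimal sketch of the averaging-over-orderings idea (general dominance, in the spirit of [46] and [49]) for just two predictors; the two-predictor R-squared identities use simple correlations only, and the data and names are made up for illustration. This is not the many-to-many extension developed in the thesis:

```python
# Illustrative sketch: relative importance of two predictors by averaging
# each predictor's R^2 increment over both possible orders of entry.
# Data and function names are hypothetical.

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def dominance_two(x1, x2, y):
    """General dominance weights for a two-predictor linear model."""
    r1, r2, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    r2_x1, r2_x2 = r1 ** 2, r2 ** 2                       # single-predictor R^2
    r2_full = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    w1 = (r2_x1 + (r2_full - r2_x2)) / 2                  # x1 enters first / second
    w2 = (r2_x2 + (r2_full - r2_x1)) / 2
    return w1, w2, r2_full

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [a + b for a, b in zip(x1, x2)]                       # y = x1 + x2 exactly
w1, w2, r2_full = dominance_two(x1, x2, y)
print(round(w1, 3), round(w2, 3), round(r2_full, 3))
```

The weights decompose the full model's R-squared (w1 + w2 = r2_full), which is what makes them usable as importance shares even when x1 and x2 are correlated; the multicollinearity difficulty noted above is precisely that raw regression coefficients lose this interpretability.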
Hypothetical and actual semiconductor yield-analysis cases are used to illustrate both SERT and many-to-many relative importance analysis. The case studies show that SERT is effective in discovering a dataset's underlying model when the sample size available for analysis is relatively small, and that the many-to-many relative importance method is more effective than conventional methods in analyzing two sets of variables. [en]
dc.description.provenance: Made available in DSpace on 2021-06-16T17:22:57Z (GMT). No. of bitstreams: 1; ntu-101-D93522002-1.pdf: 1256354 bytes, checksum: bf43f13cf5e7129f054e6b42f8ce5677 (MD5); Previous issue date: 2012 [en]
dc.description.tableofcontents: Acknowledgement i
中文摘要 ii
ABSTRACT iii
CONTENTS v
LIST OF FIGURES viii
LIST OF TABLES xi
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.1.1 The limitations of multiple regression analysis and regression tree analysis 1
1.1.2 The importance and the lack of many-to-many association methods 7
1.2 Problem Description and Research Objective 10
1.2.1 A unified framework to combine forward regression analysis and regression tree 10
1.2.2 Interpreting many-to-many relationship from viewpoint of regression relative importance analysis 20
1.2.2.1 Partial Least Squares 21
1.2.2.2 Canonical correlation analysis 25
1.3 Chapter Outlines 31
Chapter 2 Sample Efficient Regression Tree 33
2.1 Selection of Single Variable 34
2.1.1 Augmented models and selection criteria 34
2.1.2 Tree construction algorithm 40
2.1.3 Piecewise modeling 47
2.1.4 Selection of node type 51
2.2 Selection of Attribute Combination 55
2.2.1 Combination of continuous attributes 57
2.2.2 Combination of encoded attributes 59
2.3 The Stopping Criterion 61
Chapter 3 Case study for SERT 63
3.1 Hypothetical Case Study 63
3.1.1 A case of piecewise linear model 63
3.1.2 A case of two level full hierarchical interaction model 65
3.1.3 A case of piecewise constant model with attribute combination 67
3.2 Semiconductor Yield Learning 69
3.2.1 Yield analysis case 1 69
3.2.2 Yield analysis case 2 73
3.2.3 Yield analysis case 3 76
Chapter 4 Many-to-many Relative Importance Analysis 80
4.1 Relative importance for one-to-many correlation analysis 83
4.2 Relative importance for many-to-many correlation analysis 86
4.3 Remedy for singularity and small sample size 92
4.3.1 Remedy for singular data when n > p + q 93
4.3.2 Remedy for small sample size (n <= p + q) 99
4.3.2.1 Remedy for p < n <= p + q 100
4.3.2.2 Remedy for q < n <= p 104
4.3.2.3 Remedy for n <= q 106
Chapter 5 Case Study for Many-to-many Relative Importance Analysis 110
5.1 Hypothetical Case Study 110
5.1.1 Hypothetical case 1 110
5.1.2 Hypothetical case 2 113
5.1.3 Hypothetical case 3 116
5.2 Learning on Semiconductor ET Parameters 122
5.2.1 Real case 1 123
5.2.2 Real case 2 126
Chapter 6 Conclusions 133
6.1 Summary 133
6.2 Future Study 134
APPENDIX 135
A. Proof of proposition 1 135
B. Proof of proposition 3 138
C. Proof of proposition 4 139
D. Proof of proposition 6 140
E. Proof of proposition 7 142
F. Proof of proposition 8 143
REFERENCES 145
LIST OF PUBLICATIONS 150
dc.language.iso: en
dc.subject: 多對多相關分析 [zh_TW]
dc.subject: 因果分析方法 [zh_TW]
dc.subject: 變數選擇 [zh_TW]
dc.subject: 逐段迴歸模型建構 [zh_TW]
dc.subject: 機器學習 [zh_TW]
dc.subject: 資料探勘 [zh_TW]
dc.subject: 迴歸樹 [zh_TW]
dc.subject: 半導體良率分析 [zh_TW]
dc.subject: 回歸模型變數重要性分析 [zh_TW]
dc.subject: 相對重要性 [zh_TW]
dc.subject: 相對權重 [zh_TW]
dc.subject: variable selection [en]
dc.subject: causal analysis methods [en]
dc.subject: many-to-many correlation analysis [en]
dc.subject: relative weights [en]
dc.subject: relative importance [en]
dc.subject: dominance analysis [en]
dc.subject: yield analysis [en]
dc.subject: regression tree [en]
dc.subject: data mining [en]
dc.subject: machine learning [en]
dc.subject: piecewise model [en]
dc.title: 樣本高效迴歸樹及多對多相對重要性分析用於因果分析方法之研究 [zh_TW]
dc.title: Causal Analysis Methods by Sample-Efficient Regression Tree and Many-to-many Relative Importance Analysis [en]
dc.type: Thesis
dc.date.schoolyear: 100-2
dc.description.degree: 博士 (doctoral)
dc.contributor.oralexamcommittee: 桑慧敏 (Wheyming Tina Song), 汪上曉 (David Shan Hill Wong), 鄭順林 (Shuen-Lin Jeng), 蔡雅蓉 (Ya-Jung Tsai)
dc.subject.keyword: 因果分析方法, 變數選擇, 逐段迴歸模型建構, 機器學習, 資料探勘, 迴歸樹, 半導體良率分析, 回歸模型變數重要性分析, 相對重要性, 相對權重, 多對多相關分析 [zh_TW]
dc.subject.keyword: causal analysis methods, variable selection, piecewise model, machine learning, data mining, regression tree, yield analysis, dominance analysis, relative importance, relative weights, many-to-many correlation analysis [en]
dc.relation.page: 151
dc.rights.note: 有償授權 (access authorized for a fee)
dc.date.accepted: 2012-08-16
dc.contributor.author-college: 工學院 (College of Engineering) [zh_TW]
dc.contributor.author-dept: 機械工程學研究所 (Graduate Institute of Mechanical Engineering) [zh_TW]
Appears in collections: 機械工程學系 (Department of Mechanical Engineering)

Files in this item:
ntu-101-1.pdf, 1.23 MB, Adobe PDF (public access not authorized)