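Causal Analysis Methods by Sample-Efficient Regression Tree and Many-to-many Relative Importance Analysis (樣本高效迴歸樹及多對多相對重要性分析用於因果分析方法之研究)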

Amos Hong; 洪士峰

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63915

Title:	Causal Analysis Methods by Sample-Efficient Regression Tree and Many-to-many Relative Importance Analysis
Author:	Amos Hong (洪士峰)
Advisor:	陳正剛 (Argon Chen)
Keywords:	causal analysis methods, variable selection, piecewise model, machine learning, data mining, regression tree, semiconductor yield analysis, dominance analysis, relative importance, relative weights, many-to-many correlation analysis
Year of publication:	2012
Degree:	Doctoral
Abstract:	Forward stepwise regression analysis and regression trees are commonly used for one-to-many causal analysis, in which a single response variable is modeled against multiple candidate factors. Forward stepwise regression selects critical attributes using the same full data set at every step, but it cannot split the data automatically to construct piecewise linear regression models. Regression trees, by contrast, are an effective data-mining tool for constructing piecewise models: they iteratively select attributes and split the data into a hierarchical tree whose leaves are linked to specific linear regression models. However, because existing regression trees select an attribute and split the data at every node, the sample size shrinks sharply after a few levels of splitting, so later attribute selections depend heavily on the small samples left by earlier splits and become unreliable. In this research, we first propose the sample-efficient regression tree (SERT) approach, which combines forward selection in regression analysis with regression tree methodology to construct a piecewise linear causal model effectively. When multiple correlated responses are related to multiple correlated potential causal factors, one-response-at-a-time correlation analysis is no longer sufficient to discover the critical factors that drive changes in the correlated responses. Although methods for many-to-many correlation analysis have been proposed in the literature, they cannot reasonably measure the relative importance of each variable’s contribution to the association between a set of responses and a set of factors when there is multicollinearity among the variables.
Johnson’s dominance analysis [1] offers a general framework for determining the relative importance of independent variables in linear multiple regression models. In this research, we also extend Johnson’s dominance index to many-to-many correlation analysis as a measure that summarizes the association between two sets of variables. Hypothetical cases and actual semiconductor yield-analysis cases are used to illustrate both SERT and many-to-many relative importance analysis and to verify their effectiveness for causal analysis. The case studies show that SERT is effective in discovering the underlying model of a data set even when the sample size available for analysis is relatively small. They also show that many-to-many relative importance analysis is more effective than conventional methods at uncovering the causal relationships between two sets of variables.
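To make the one-to-many starting point concrete, the following is a minimal sketch of a relative-importance calculation for a single response. It assumes that the framework the abstract attributes to Johnson is the relative-weights procedure (the keywords list "relative weights"); it does not reproduce the many-to-many extension proposed in the thesis. The function name `relative_weights` and the simulated data are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def relative_weights(X, y):
    """Raw relative weights of the columns of X for a single response y.

    Assumption: this follows the relative-weights idea (orthogonalize the
    predictors, regress y on the orthogonal variables, then map the
    contributions back). The predictor correlation matrix must be non-singular.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]

    # Correlations among predictors (R_xx) and between predictors and y (r_xy).
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    R_xx = R[:p, :p]
    r_xy = R[:p, p]

    # Symmetric square root of R_xx: links the predictors to their closest
    # orthogonal counterparts.
    eigvals, eigvecs = np.linalg.eigh(R_xx)
    lam = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T

    # Standardized coefficients of y on the orthogonal counterparts.
    beta = np.linalg.solve(lam, r_xy)

    # Weight of predictor j: sum over k of lam[j, k]^2 * beta[k]^2.
    return (lam ** 2) @ (beta ** 2)


# Simulated data (hypothetical): x1 and x2 are correlated, x3 is irrelevant.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

weights = relative_weights(X, y)
print("raw relative weights:", weights)
print("sum of weights (equals R^2):", weights.sum())
```

The weights are non-negative and sum to the R² of the ordinary least-squares fit, which lets the explained variance be apportioned among correlated predictors.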
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63915
Full-text license:	Paid license
Appears in department:	Department of Mechanical Engineering

Files in this item:

File	Size	Format
ntu-101-1.pdf (not authorized for public access)	1.23 MB	Adobe PDF


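Items in this repository are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.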
