Discriminative Pattern Mining in Microbiomic Data

Nancy Huang; 黃安婷

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18925

Title:	Discriminative Pattern Mining in Microbiomic Data (微生物源資料之辨識模型探勘)
Author:	Nancy Huang 黃安婷
Advisor:	歐陽彥正
Keywords:	discriminative patterns, pattern mining, pattern relevancy, pattern redundancy, pattern selection, microbiomic data
Publication Year:	2016
Degree:	Doctoral
Abstract:	Machine learning classifiers have long been used to solve biological problems by predicting the target class (e.g., disease state or bacterial taxonomy) of unseen samples. A valuable byproduct of certain classifiers is “interpretability” (also known as “comprehensibility”), which can be used to explain why and how a sample is assigned to its predicted class. Interpretable classifiers produce “discriminative patterns” that lead to different prediction results, and they provide insight into critical properties of the biological problem by capturing more of the underlying semantics than single features do. Discriminative patterns can be used directly by pattern-based classifiers to predict unseen samples through a majority voting or aggregation mechanism. In this case, we are concerned not only with finding useful individual patterns but also with the effectiveness of the pattern set as a whole. Thus, it is imperative to ensure the relevancy and non-redundancy of the discriminative patterns. Few studies have evaluated pattern redundancy by examining the samples covered by the patterns, and those that do have focused mostly on the proportion of overlapping samples, so a great deal of information about non-overlapping samples has been overlooked. In addition, traditional pattern mining approaches often require generating a complete set of initial patterns and globally discretizing continuous attributes, both of which are impractical for high-dimensional, complex biological datasets. We address these issues by presenting a novel pattern selection algorithm that estimates pattern redundancy from not only the proportion of overlapping samples but also the resemblance of non-overlapping samples. The proposed method was applied to three real microbiomic datasets, with the aim of providing new insights into the interactions between microbial factors and their effects on the host.
When compared with other robust classifiers and feature selection heuristics, our pattern selection algorithm produced diverse and compact sets of final patterns with comparable or even superior predictive capability.
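The redundancy idea in the abstract can be illustrated with a rough sketch. The abstract does not give the thesis's actual formulas, so everything below is an assumption made for illustration: the Jaccard index stands in for the "proportion of overlapping samples", cosine similarity between the mean feature vectors of the exclusively covered samples stands in for the "resemblance of non-overlapping samples", and the function names, the `weight` parameter, and the toy data are hypothetical.

```python
from math import sqrt

def overlap_proportion(cover_a, cover_b):
    """Jaccard index of two coverage sets (assumed overlap measure)."""
    union = cover_a | cover_b
    return len(cover_a & cover_b) / len(union) if union else 0.0

def centroid(sample_ids, features):
    """Mean feature vector of the given samples."""
    rows = [features[i] for i in sample_ids]
    return [sum(col) / len(rows) for col in zip(*rows)]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def redundancy(cover_a, cover_b, features, weight=0.5):
    """Hypothetical redundancy score, not the thesis's definition.

    Mixes the overlap proportion with the resemblance between the
    samples covered by only one of the two patterns.
    """
    overlap = overlap_proportion(cover_a, cover_b)
    only_a, only_b = cover_a - cover_b, cover_b - cover_a
    if not (only_a and only_b):
        # At least one pattern has no exclusive samples, so there is
        # nothing to compare; fall back to the overlap alone.
        return overlap
    resemblance = cosine(centroid(only_a, features), centroid(only_b, features))
    return weight * overlap + (1 - weight) * resemblance

# Toy data (invented): sample id -> non-negative abundance vector.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
p1, p2, p3 = {0, 1}, {1, 2}, {1, 3}  # samples covered by three patterns

score_similar = redundancy(p1, p2, features)
score_dissimilar = redundancy(p1, p3, features)
print(score_similar > score_dissimilar)
```

In this toy case both pattern pairs share the same proportion of overlapping samples, yet the pair whose exclusive samples look alike scores as more redundant. That is the kind of information an overlap-only measure would miss.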
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18925
DOI:	10.6342/NTU201603476
Full-Text Authorization:	Not authorized
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-105-1.pdf (not currently authorized for public access)	2.04 MB	Adobe PDF

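Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.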