Extracting Classification Boolean Rules with Probabilities from Neural Networks (從類神經網路擷取具機率值的布林分類規則)

Chun-kai Hwang; 黃俊凱

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66773

Title:	Extracting Classification Boolean Rules with Probabilities from Neural Networks (從類神經網路擷取具機率值的布林分類規則)
Author:	Chun-kai Hwang (黃俊凱)
Advisor:	歐陽彥正
Keywords:	Neural networks, Classification, Rule extraction, Decision trees, UCI datasets
Publication year:	2020
Degree:	Master's
Abstract:	In classification problems, neural networks are well known for achieving higher accuracy than traditional statistical methods such as logistic regression and discriminant analysis, and even than other machine learning algorithms such as decision trees and Bayesian networks. However, the knowledge a neural network learns is stored in its layered functional mappings and in the weights and biases of its neuron connections, so its decision process is a black box whose rules people cannot readily understand. In this research, we propose an algorithm that extracts probabilistic Boolean classification rules from neural networks. Besides extracting classification rules, the ruleset model can be tuned to a specified sensitivity by adjusting the probability threshold. In addition, we can compute an importance factor between 0 and 1 for each attribute in the feature set. An importance factor of 0 means the corresponding attribute is noise, so features can likewise be selected by applying a given threshold. On linearly separable and linearly non-separable simulated datasets, we find that the proposed algorithms, PBCR1 and PBCR2, achieve better classification accuracy than neural networks even with a training ratio of only 1/10. On UCI machine learning datasets, the AUC of PBCR1 and PBCR2 is slightly lower than that of neural networks. On the accuracy metric, however, experiments on the red wine and white wine datasets show that PBCR1 and PBCR2 are statistically significantly better than decision trees (DT) and show no statistically significant difference from neural networks. On the F1 score, PBCR1 and PBCR2 are statistically significantly better than DT on the red wine, white wine, diabetes (PID), and cervical cancer datasets.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66773
DOI:	10.6342/NTU202000129
Full-text license:	Paid license
Appears in collections:	Master's Program in Statistics

Files in this item:

File	Size	Format
ntu-109-1.pdf (not currently authorized for public access)	2.93 MB	Adobe PDF
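
The abstract mentions two thresholding steps: tuning the rule model to a specified sensitivity via a probability threshold, and selecting features via a threshold on their importance factors. The sketch below illustrates these two ideas in general terms only. It is not the thesis's PBCR1 or PBCR2 algorithm, whose details are not given in this record. The function names, the example probabilities, and the feature names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sensitivity(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of true positives (label 1) predicted positive, i.e. with prob >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    true_pos = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    return true_pos / n_pos


def threshold_for_sensitivity(probs: Sequence[float], labels: Sequence[int], target: float) -> float:
    """Largest probability threshold whose sensitivity is at least `target` (hypothetical helper)."""
    pos_probs = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    if not pos_probs:
        raise ValueError("no positive examples to calibrate on")
    # Number of positives that must be captured; the small epsilon guards against float round-off.
    needed = max(1, math.ceil(target * len(pos_probs) - 1e-9))
    return pos_probs[needed - 1]


def select_features(importance: Dict[str, float], threshold: float) -> List[str]:
    """Keep attributes whose importance factor (assumed to lie in [0, 1]) reaches the threshold."""
    return [name for name, value in importance.items() if value >= threshold]


if __name__ == "__main__":
    # Made-up rule probabilities and true labels, for illustration only.
    probs = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
    labels = [1, 1, 0, 1, 0, 1]
    t = threshold_for_sensitivity(probs, labels, target=0.75)
    print(t, sensitivity(probs, labels, t))

    # Hypothetical importance factors; 0.0 marks an attribute treated as noise.
    importance = {"alcohol": 0.9, "sulphates": 0.4, "noise_attr": 0.0}
    print(select_features(importance, threshold=0.1))
```

Lowering the probability threshold can only keep sensitivity the same or raise it, which is why the helper returns the largest threshold that still meets the target.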
