Extracting Classification Boolean Rules with Probabilities from Neural Networks

Chun-kai Hwang; 黃俊凱

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66773

Title:	Extracting Classification Boolean Rules with Probabilities from Neural Networks (從類神經網路擷取具機率值的布林分類規則)
Authors:	Chun-kai Hwang 黃俊凱
Advisor:	歐陽彥正
Keyword:	Neural networks, Classification, Rule extraction, Decision trees, UCI datasets
Publication Year :	2020
Degree:	Master's
Abstract:	For classification problems, neural networks are known to achieve markedly higher accuracy than traditional statistical methods such as logistic regression and discriminant analysis, and than other machine learning algorithms such as decision trees and Bayesian networks. However, the knowledge a neural network learns is stored in its layered functional mappings and in the weights and biases of its connections, so its decision process is a black box that people cannot readily interpret. In this research, we propose an algorithm that extracts probabilistic Boolean classification rules from neural networks. Besides extracting rules, the ruleset model can be tuned to a specified sensitivity by adjusting a probability threshold. In addition, we compute an importance factor, a number between 0 and 1, for each attribute that appears in the Boolean rules. An importance factor of 0 means the corresponding attribute is noise, so features can be selected by applying a given threshold to the importance factors. On linearly separable and linearly non-separable simulated datasets, we find that the proposed algorithms, PBCR1 and PBCR2, achieve better accuracy than neural networks even with only 1/10 of the training data. On UCI machine learning datasets, the AUC of PBCR1 and PBCR2 is slightly lower than that of neural networks. On the accuracy metric, however, for the red wine and white wine datasets, PBCR1 and PBCR2 are statistically significantly better than decision trees (DT) and show no statistically significant difference from neural networks. For the F1 score, PBCR1 and PBCR2 are statistically significantly better than DT on the red wine, white wine, PID (diabetes), and cervical cancer datasets.
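The two threshold ideas mentioned in the abstract (tuning a probability threshold to reach a specified sensitivity, and selecting features by importance factor) can be illustrated with a minimal sketch. This is not the thesis's PBCR1 or PBCR2 algorithm, whose details are not given in this record. The function names, the sample probabilities, and the feature names below are all hypothetical, and the probabilities are assumed to have been produced already by some ruleset model.

```python
import math


def threshold_for_sensitivity(probs, labels, target):
    """Largest threshold t such that predicting positive when p >= t
    gives sensitivity (recall on the positive class) of at least `target`.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    positives = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no positive examples")
    # Number of positive examples that must be captured to reach the target.
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]


def sensitivity(probs, labels, threshold):
    """True positives divided by all actual positives at a given threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    return tp / (tp + fn)


def select_features(importance, cutoff):
    """Keep attributes whose importance factor (0 to 1) is at least `cutoff`."""
    return [name for name, value in importance.items() if value >= cutoff]


# Made-up rule probabilities and true labels for six samples.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1]
t = threshold_for_sensitivity(probs, labels, target=0.75)

# Made-up importance factors; 0.0 marks an attribute treated as noise.
kept = select_features({"alcohol": 0.8, "noise": 0.0, "pH": 0.3}, cutoff=0.5)
```

Lowering the threshold can only keep or raise sensitivity, usually at the cost of more false positives, which is why a single scalar threshold is enough to hit a requested sensitivity in this sketch.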
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66773
DOI:	10.6342/NTU202000129
Fulltext Rights:	Paid license (有償授權)
Appears in Collections:	Master's Program in Statistics (統計碩士學位學程)

Files in This Item:

File	Size	Format
ntu-109-1.pdf Restricted Access	2.93 MB	Adobe PDF
