NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66773
Full metadata record
dc.contributor.advisor: 歐陽彥正
dc.contributor.author: Chun-kai Hwang (en)
dc.contributor.author: 黃俊凱 (zh_TW)
dc.date.accessioned: 2021-06-17T01:08:16Z
dc.date.available: 2023-02-17
dc.date.copyright: 2020-02-17
dc.date.issued: 2020
dc.date.submitted: 2020-02-04
dc.identifier.citation:
[1] A. D. Flaxman and T. Vos (2018) "Machine learning in population health: Opportunities and threats," PLoS Medicine, 15(11), e1002702. DOI: 10.1371/journal.pmed.1002702.
[2] K. J. Danjuma (2015) "Performance evaluation of machine learning algorithms in post-operative life expectancy in the lung cancer patients," Department of Computer Science, Modibbo Adama University of Technology. [Online] Available: https://arxiv.org/abs/1504.04646.
[3] S. B. Kotsiantis et al. (2006) "Machine learning: A review of classification and combining techniques," Artificial Intelligence Review, 26, pp. 159-190. DOI: 10.1007/s10462-007-9052-3.
[4] S. N. Tran and A. d'Avila Garcez (2013) "Knowledge extraction from deep belief networks for images," in IJCAI-2013 Workshop on Neural-Symbolic Learning and Reasoning.
[5] S. K. Biswas et al. (2017) "Rule extraction from training data using neural network," Int. J. Artif. Intell. Tools, 26(3).
[6] G. Bologna (2019) "A simple convolutional neural network with rule extraction," Applied Sciences, 9(12), 2411.
[7] J. R. Zilke (2015) "Extracting rules from deep neural networks," Master thesis, Computer Science Department, Technische Universität Darmstadt.
[8] C. Matthews and I. Jagielska (1995) "Fuzzy rule extraction from a trained multilayered neural network," Proc. of ICNN'95, pp. 744-748.
[9] P. Smyth et al. (1995) "Retrofitting decision tree classifiers using kernel density estimation," Proc. 12th Int'l Conf. Machine Learning, pp. 506-514.
[10] Y. J. Liu (2019) "Combining decision tree and adaptive kernel density estimation to construct probability estimation tree," Master thesis, Master Program in Statistics, Center for General Education, National Taiwan University.
[11] F. C. Yang and C. K. Hwang (2004) "A Boolean algebra based rule extraction algorithm for neural networks with binary or bipolar inputs," Journal of the Chinese Institute of Industrial Engineers, 21(1), pp. 27-39.
[12] R. A. Iqbal (2012) "Eclectic rule extraction from neural networks using aggregated decision trees," 7th International Conference on Electrical and Computer Engineering (ICECE), pp. 129-132.
[13] K. Sethi et al. (2012) "KDRuleEx: A novel approach for enhancing user comprehensibility using rule extraction," in Third International Conference on Intelligent Systems, Modelling and Simulation, Kota Kinabalu, pp. 55-60.
[14] M. Augasta and T. Kathirvalavakumar (2011) "Reverse engineering the neural networks for rule extraction in classification problems," Neural Processing Letters, 35(2), pp. 131-150.
[15] I. A. Taha and J. Ghosh (1999) "Symbolic interpretation of artificial neural networks," IEEE Transactions on Knowledge and Data Engineering, 11(3), pp. 448-463.
[16] E. W. Saad and D. C. Wunsch (2007) "Neural network explanation using inversion," Neural Networks, 20(1), pp. 78-93.
[17] C. Umans et al. (2006) "Complexity of two-level logic minimization," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 25(7), pp. 1230-1246.
[18] C. Umans (2001) "The minimum equivalent DNF problem and shortest implicants," Journal of Computer and System Sciences, 63(4), pp. 597-611.
[19] E. Boros and O. Cepek (1994) "On the complexity of Horn minimization," RUTCOR Research Report RRR 1-94, Rutgers University, New Brunswick, NJ.
[20] T. Chang (2004) "Horn formula minimization," Master thesis, Rochester Institute of Technology.
[21] E. J. McCluskey (1956) "Minimization of Boolean functions," The Bell System Technical Journal, November.
[22] S. R. Petrick (1956) "A direct determination of the irredundant forms of a Boolean function from the set of prime implicants," AFCRC Technical Report TR-56-110, Air Force Cambridge Research Center, Bedford, MA.
[23] A. Duşa and A. Thiem (2015) "Enhancing the minimization of Boolean and multivalue output functions with eQMC," The Journal of Mathematical Sociology, 39(2), pp. 92-108.
[24] R. K. Brayton et al. (1984) Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers. ISBN 0-89838-164-9.
[25] B. Chlebus and S. H. Nguyen (1998) "On finding optimal discretizations for two attributes," Proc. First Int'l Conf. Rough Sets and Current Trends in Computing (RSCTC '98), pp. 537-544.
[26] U. M. Fayyad and K. B. Irani (1993) "Multi-interval discretization of continuous-valued attributes for classification learning," in Proc. 13th International Joint Conference on Artificial Intelligence, pp. 1022-1029, Chambéry, France.
[27] J. R. Quinlan (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers.
[28] R. Kerber (1992) "ChiMerge: Discretization of numeric attributes," Proc. Nat'l Conf. Artificial Intelligence (AAAI), pp. 123-128.
[29] H. Liu and R. Setiono (1997) "Feature selection via discretization," IEEE Trans. Knowledge and Data Eng., 9(4), pp. 642-645.
[30] S. Mehta et al. (2005) "Toward unsupervised correlation preserving discretization," IEEE Trans. Knowledge and Data Eng., 17(9), pp. 1174-1185.
[31] M. Boullé (2006) "MODL: A Bayes optimal discretization method for continuous attributes," Machine Learning, 65(1), pp. 131-165.
[32] S. García et al. (2013) "A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning," IEEE Transactions on Knowledge and Data Engineering, 25(4), pp. 734-750.
[33] P. Cortez et al. (2009) "Modeling wine preferences by data mining from physicochemical properties," Decision Support Systems, 47(4), pp. 547-553.
[34] M. Nilashi et al. (2017) "Accuracy improvement for diabetes disease classification: A case on a public medical dataset," Fuzzy Information and Engineering, 9(3), pp. 345-357.
[35] D. K. Choubey et al. (2017) "Classification of Pima Indian diabetes dataset using naïve Bayes with genetic algorithm as an attribute selection," in Communication and Computing Systems: Proceedings of the International Conference on Communication and Computing Systems (ICCCS 2016), pp. 451-455.
[36] R. P. Kauffman et al. (2013) "Current recommendations for cervical cancer screening: Do they render the annual pelvic examination obsolete?" Medical Principles and Practice, 22(4), pp. 313-322.
[37] K. Fernandes et al. (2015) "Temporal segmentation of digital colposcopies," in R. Paredes, J. Cardoso, and X. Pardo, eds., Iberian Conference on Pattern Recognition and Image Analysis, Santiago de Compostela: Springer, pp. 262-271.
[38] Centers for Disease Control and Prevention (CDC) (2013) "Cervical cancer screening among women aged 18-30 years, United States, 2000-2010," Morbidity and Mortality Weekly Report, 61(51-52), p. 1038.
[39] S. Coskun (2018) "Red Wine Quality Classification". [Online] Available: https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009
[40] J. Huang (2018) "Wine Quality Prediction". [Online] Available: http://rstudio-pubs-static.s3.amazonaws.com/438329_edfaab4011ce44a59fb9ae2d216d8dea.html
[41] Q. Wang et al. (2019) "DMP_MI: An effective diabetes mellitus classification algorithm on imbalanced data with missing values," IEEE Access, 7, pp. 102232-102238.
[42] K. Fernandes et al. (2018) "Supervised deep learning embeddings for the prediction of cervical cancer diagnosis," PeerJ Computer Science, 4, e154.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66773
dc.description.abstract (zh_TW): For classification problems, neural networks significantly outperform traditional statistical methods such as regression analysis and discriminant analysis, and even other machine learning algorithms such as decision trees and Bayesian networks, on the accuracy metric. However, because the knowledge they learn resides in the internal layer-by-layer mappings of the network architecture and in the weights and thresholds of the neuron connections, the decision process of a neural network is a black box whose decision rules cannot be understood. This study proposes an algorithm for extracting Boolean classification rules with probabilities; besides extracting classification rules, it can adjust the rule model according to a probability threshold so that it meets a specified sensitivity. In addition, each feature attribute can be assigned an importance factor between 0 and 1; a value of 0 indicates that the attribute is noise, and feature selection can likewise be decided by a given threshold.
Experiments on linearly separable and nonlinearly separable simulated datasets show that even with only 1/10 of the training data, the proposed algorithms PBCR1 and PBCR2 still achieve classification accuracy superior to neural networks. On the UCI machine learning datasets, the AUC of PBCR1 and PBCR2 is slightly lower than that of neural networks, but on the accuracy metric, experiments on the red wine and white wine datasets show that PBCR1 and PBCR2 are statistically significantly better than decision trees and show no statistically significant difference from neural networks. On the F1 metric, PBCR1 and PBCR2 are statistically significantly better than decision trees on the red wine, white wine, diabetes, and cervical cancer datasets.
dc.description.abstract (en): For classification problems, neural networks are well known for their high accuracy in comparison to traditional statistical methods such as logistic regression and discriminant analysis, and even to other algorithms such as decision trees and Bayesian networks. However, the knowledge learned by a neural network is stored in the hierarchical functional mappings of its structure and in its weight and bias parameters, so its black-box decision process is not easy for people to understand. In this research, we extract probabilistic Boolean classification rules from neural networks. The rule-set model can be tuned to a specified sensitivity by choosing different probability thresholds. In addition, we can compute an important factor for each attribute composing the Boolean rules: a numeric value between 0 and 1, where a value of 0 means the corresponding attribute is a noise signal. Hence, the important features can be selected with a given threshold.
On the linearly and nonlinearly separable simulation datasets, we find that the accuracies of PBCR1 and PBCR2 are better than those of neural networks even with a 1/10 training ratio. On the UCI machine learning datasets, we find that the AUC of PBCR1 and PBCR2 is a little lower than that of neural networks. However, on the accuracy metric for the red wine and white wine datasets, PBCR1 and PBCR2 are almost the same as neural networks, and their accuracies are superior to decision trees (DT) by a statistically significant margin. For the F1 score, PBCR1 and PBCR2 are statistically significantly better than DT on the red wine, white wine, PID, and cervical cancer datasets.
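A rough illustration of how a rule set of this kind can be applied may help. The sketch below is a minimal reading of the abstract, not the thesis's actual PBCR implementation: the `Rule` representation, `tune_threshold`, and `select_features` are hypothetical names. It shows the two mechanisms the abstract describes: lowering a probability threshold until the rule set meets a requested sensitivity, and dropping attributes whose importance factor falls below a cutoff.

```python
# Minimal sketch of applying probabilistic Boolean classification rules.
# The rule format, the threshold sweep, and the importance-factor filter
# are illustrative assumptions, not the thesis's PBCR algorithms.
from dataclasses import dataclass

@dataclass
class Rule:
    literals: dict   # conjunction of Boolean literals, e.g. {"x1": True, "x3": False}
    prob: float      # estimated P(class = 1 | rule fires)

def fires(rule, sample):
    """A rule fires when every literal in its conjunction is satisfied."""
    return all(sample.get(f) == v for f, v in rule.literals.items())

def predict(rules, sample, tau):
    """Classify as positive if some firing rule carries probability >= tau."""
    probs = [r.prob for r in rules if fires(r, sample)]
    return int(bool(probs) and max(probs) >= tau)

def tune_threshold(rules, samples, labels, target_sensitivity):
    """Pick the largest threshold that still reaches the target sensitivity
    on a validation split (labels are 0/1 class labels)."""
    for tau in sorted({r.prob for r in rules}, reverse=True):
        preds = [predict(rules, s, tau) for s in samples]
        true_pos = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
        positives = sum(labels)
        if positives and true_pos / positives >= target_sensitivity:
            return tau
    return 0.0  # accept every firing rule if the target is otherwise unreachable

def select_features(importance, cutoff):
    """Keep attributes whose importance factor (0 means noise) clears the cutoff."""
    return {f for f, w in importance.items() if w >= cutoff}
```

Sweeping candidate thresholds from the highest rule probability downward returns the largest cutoff that still reaches the sensitivity target; lowering the threshold can only add positive predictions, so sensitivity is monotone along the sweep.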
dc.description.provenance (en): Made available in DSpace on 2021-06-17T01:08:16Z (GMT). No. of bitstreams: 1. ntu-109-R06h41019-1.pdf: 2998187 bytes, checksum: 85481472f726b9e359f198e2a3981d7b (MD5). Previous issue date: 2020.
dc.description.tableofcontents:
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Research Objective 2
1.3 Thesis Organization 3
Chapter 2 Literature Review 5
2.1 Neural Network Rule Extraction Algorithms: An Overview 5
2.2 Boolean Function Reduction Methods 10
2.3 Numeric Attributes Discretization Methods 10
2.4 Discussion 11
Chapter 3 Methods and Algorithms 13
3.1 Probabilistic Boolean Classification Rules (PBCR) Algorithm 13
3.1.1 Probabilistic Boolean Classification Rules I (PBCR1) Algorithm 14
3.1.2 Probabilistic Boolean Classification Rules II (PBCR2) Algorithm 16
3.1.3 Probabilistic Boolean Classification Rules III (PBCR3) Algorithm 18
3.1.4 Probabilistic Boolean Classification Rules IV (PBCR4) Algorithm 20
3.2 Feature Important Factor (FIF) Algorithm 21
3.3 Chi-Merge Extension for Missing-value (Chi-Merge_MV) Algorithm 22
3.4 Experimental Procedures 26
Chapter 4 Results 29
4.1 Linearly Separable Simulation Dataset 29
4.2 Nonlinearly Separable Simulation Dataset 32
4.3 Red Wine Dataset 38
4.4 White Wine Dataset 45
4.5 Pima Indians Diabetes (PID) Dataset 51
4.6 Cervical Cancer Dataset 62
Chapter 5 Discussion 76
5.1 Linearly and Nonlinearly Separable Datasets 76
5.2 Red Wine Dataset 76
5.3 White Wine Dataset 77
5.4 PID Dataset 77
5.5 Cervical Cancer Dataset 79
5.6 Performance Comparison 80
5.6.1 AUC metric 80
5.6.2 Accuracy metric 83
5.6.3 F1 score metric 84
Chapter 6 Conclusions and Future Works 87
6.1 Conclusions 87
6.2 Future works 88
Reference 89
Appendix A. Proof of Step 4.3 in PBCR1 Algorithm (Algorithm 3.1) 95
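Section 3.3 in the table of contents above extends Kerber's ChiMerge discretization [28] to handle missing values. As background for that chapter, here is a minimal sketch of the standard ChiMerge procedure on complete data; the function names and stopping-rule details are illustrative assumptions, and the thesis's Chi-Merge_MV extension is not reproduced:

```python
# Minimal sketch of Kerber's ChiMerge discretization [28]; illustrative only.
from collections import Counter

def chi2(a, b, classes):
    """Chi-square statistic comparing the class counts of two adjacent intervals."""
    n = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        r = sum(row.values())
        for c in classes:
            expected = r * (a[c] + b[c]) / n or 0.1  # Kerber's fix for zero expectation
            stat += (row[c] - expected) ** 2 / expected
    return stat

def chimerge(values, labels, threshold, min_intervals=2):
    """Bottom-up discretization: one interval per distinct value, then repeatedly
    merge the adjacent pair with the lowest chi-square until every pair exceeds
    the threshold (or only min_intervals remain). Returns interval lower bounds."""
    classes = sorted(set(labels))
    intervals = []  # list of (lower_bound, Counter of class labels)
    for v, y in sorted(zip(values, labels)):
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][y] += 1
        else:
            intervals.append((v, Counter({y: 1})))
    while len(intervals) > min_intervals:
        stats = [chi2(intervals[i][1], intervals[i + 1][1], classes)
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] > threshold:  # every adjacent pair is significantly different
            break
        merged = intervals[i][1] + intervals[i + 1][1]
        intervals[i:i + 2] = [(intervals[i][0], merged)]
    return [lo for lo, _ in intervals]
```

ChiMerge merges the adjacent interval pair whose class distributions are most similar (lowest chi-square) and stops once all pairs differ significantly; with two classes (one degree of freedom), a threshold of 3.84 corresponds roughly to the 0.95 significance level.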
dc.language.iso: en
dc.title (zh_TW): 從類神經網路擷取具機率值的布林分類規則
dc.title (en): Extracting Classification Boolean Rules with Probabilities from Neural Networks
dc.type: Thesis
dc.date.schoolyear: 108-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 韓謝忱, 賴飛羆, 傅楸善
dc.subject.keyword (zh_TW): 類神經網路, 分類問題, 規則擷取, 決策樹, UCI 資料集
dc.subject.keyword (en): Neural networks, Classification, Rule extraction, Decision trees, UCI datasets
dc.relation.page: 95
dc.identifier.doi: 10.6342/NTU202000129
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-02-04
dc.contributor.author-college: 共同教育中心 (Center for General Education)
dc.contributor.author-dept: 統計碩士學位學程 (Master Program in Statistics)
Appears in Collections: 統計碩士學位學程 (Master Program in Statistics)

Files in This Item:
File: ntu-109-1.pdf (currently not authorized for public access)
Size: 2.93 MB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
