Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59724
Full metadata record
DC Field / Value / Language
dc.contributor.advisor王勝德(Sheng-De Wang)
dc.contributor.authorYen Suen
dc.contributor.author蘇彥zh_TW
dc.date.accessioned2021-06-16T09:34:55Z
dc.date.available2022-02-17
dc.date.copyright2017-02-17
dc.date.issued2017
dc.date.submitted2017-02-13
dc.identifier.citation[1] Breast Cancer Wisconsin (Original) dataset. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original).
[2] Ionosphere dataset. https://archive.ics.uci.edu/ml/datasets/Ionosphere.
[3] NSL-KDD dataset. http://www.unb.ca/research/iscx/dataset/iscx-NSL-KDD-dataset.html.
[4] Scikit-learn. http://scikit-learn.org/.
[5] TensorFlow. http://tensorflow.org/.
[6] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/.
[7] S. Albrecht, J. Busch, M. Kloppenburg, F. Metze, and P. Tavan. Generalized radial basis function networks for classification and novelty detection: self-organization of optimal Bayesian decision. Neural Networks, 13(10):1075–1093, 2000.
[8] M. F. Augusteijn and B. A. Folkert. Neural network classification and novelty detection. International Journal of Remote Sensing, 23(14):2891–2902, 2002.
[9] V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley & Sons Ltd., 2nd edition, 1978.
[10] S. D. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pages 29–38, New York, NY, USA, 2003. ACM.
[11] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD Rec., 29(2):93–104, May 2000.
[12] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):15:1–15:58, July 2009.
[13] H. A. Dau, V. Ciesielski, and A. Song. Anomaly detection using replicator neural networks trained on examples of one class. In Proceedings of the 10th International Conference on Simulated Evolution and Learning, Volume 8886, SEAL 2014, pages 311–322, New York, NY, USA, 2014. Springer-Verlag New York, Inc.
[14] P. Gogoi, B. Borah, D. Bhattacharyya, and J. Kalita. Outlier identification using symmetric neighborhoods. Procedia Technology, 6:239–246, 2012.
[15] D. Hawkins. Identification of Outliers. Monographs on Applied Probability and Statistics. Chapman and Hall, London, 1980.
[16] S. Hawkins, H. He, G. J. Williams, and R. A. Baxter. Outlier detection using replicator neural networks. In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2000, pages 170–180, London, UK, 2002. Springer-Verlag.
[17] Z. He, X. Xu, and S. Deng. Discovering cluster-based local outliers. Pattern Recogn. Lett., 24(9-10):1641–1650, June 2003.
[18] Z. He, X. Xu, J. Z. Huang, and S. Deng. FP-outlier: frequent pattern based outlier detection. Technical report, 2002.
[19] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[20] L. M. Ibrahin, D. T. Basheer, and M. S. Mahmod. A comparison study for intrusion database (KDD99, NSL-KDD) based on self organization map (SOM) artificial neural network. Journal of Engineering Science and Technology, 8(1):107–119, 2013.
[21] B. Ingre and A. Yadav. Performance analysis of NSL-KDD dataset using ANN. In 2015 International Conference on Signal Processing and Communication Engineering Systems, pages 92–96, Jan 2015.
[22] A. Javaid, Q. Niyaz, W. Sun, and M. Alam. A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (Formerly BIONETICS), BICT ’15, pages 21–26, Brussels, Belgium, 2016. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).
[23] E. M. Knorr and R. T. Ng. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pages 392–403, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
[24] T. Kohonen, M. R. Schroeder, and T. S. Huang, editors. Self-Organizing Maps. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 3rd edition, 2001.
[25] V. Kumar and A. K. Singh. Outlier detection: A clustering-based approach. International Journal of Science and Modern Engineering, 1:16–19.
[26] D. Martinez. Neural tree density estimation for novelty detection. Trans. Neur. Netw., 9(2):330–338, Mar. 1998.
[27] R. A. Maxion and R. R. Roberts. Proper Use of ROC Curves in Intrusion/Anomaly Detection. Technical Report CS-TR-871, School of Computing Science, University of Newcastle upon Tyne, Nov. 2004.
[28] A. Muñoz and J. Muruzábal. Self-organizing maps for outlier detection. Neurocomputing, 18(1–3):33–60, 1998.
[29] G. H. Orair, C. H. C. Teixeira, W. Meira, Jr., Y. Wang, and S. Parthasarathy. Distance-based outlier detection: Consolidation and renewed bearing. Proc. VLDB Endow., 3(1-2):1469–1480, Sept. 2010.
[30] S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00, pages 427–438, New York, NY, USA, 2000. ACM.
[31] M. Sakurada and T. Yairi. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, MLSDA ’14, pages 4:4–4:11, New York, NY, USA, 2014. ACM.
[32] R. Sang, P. Jin, and S. Wan. Discriminative Feature Learning for Action Recognition Using a Stacked Denoising Autoencoder, pages 521–531. Springer International Publishing, Cham, 2014.
[33] C. D. Stefano, C. Sansone, and M. Vento. To reject or not to reject: that is the question - an answer in case of neural classifiers. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 30(1):84–94, Feb 2000.
[34] I. Syarif, A. Prugel-Bennett, and G. Wills. Unsupervised Clustering Approach for Network Anomaly Detection, pages 135–145. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[35] P. Sykacek. Equivalent error bars for neural network classifiers trained by Bayesian inference. In Proc. ESANN, pages 121–126, 1997.
[36] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani. A detailed analysis of the KDD Cup 99 data set. In Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA ’09, pages 53–58, Piscataway, NJ, USA, 2009. IEEE Press.
[37] L. Tóth and G. Gosztolya. Replicator Neural Networks for Outlier Modeling in Segmental Speech Recognition, pages 996–1001. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
[38] Y. Xia, X. Cao, F. Wen, G. Hua, and J. Sun. Learning discriminative reconstructions for unsupervised outlier removal. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1511–1519, 2015.
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59724
dc.description.abstractOutlier detection aims to find, within a dataset, the outlying instances whose attribute values differ from those of normal instances. As data grows rapidly, analyzing massive datasets calls for more effective outlier detection. The autoencoder is an effective tool for this task: outliers can be identified by their reconstruction errors, which are larger than those of normal instances. However, in some datasets the reconstruction errors overlap severely, leading to inaccurate predictions.
This thesis proposes an autoencoder-based modification to solve this problem. During training, at periodic iterations, a threshold assigns each instance a positive or negative label to form a prediction vector, and the prediction vector is fed into the cost computation of the next iteration as discriminative information. With this discriminative learning, normal instances and outliers become more separable by reconstruction error, making the reconstruction error a better prediction criterion and improving outlier-detection performance. Finally, the proposed method is tested on three datasets commonly used in outlier-detection research, and the experimental results show that it achieves high accuracy in detecting outliers.
zh_TW
dc.description.abstractOutlier detection aims to find the instances in a given dataset that differ markedly from the defined normal instances. Autoencoders are effective tools for outlier detection because they can exploit reconstruction errors: outliers have relatively larger reconstruction errors than inliers. Nevertheless, the reconstruction errors overlap significantly in some datasets, which leads to inaccurate predictions.
In this thesis, we propose a modified autoencoder to solve this problem. Building on the autoencoder, we periodically assign a positive or negative label to each instance during training and feed the resulting prediction vector into the next iteration as discriminative information. With this discriminative learning, the reconstruction errors of inliers and outliers become more separable, leading to more accurate outlier detection. We have tested the approach on three datasets widely used for outlier detection: Ionosphere, Wisconsin Breast Cancer, and NSL-KDD, on which it achieves 94.30%, 97.07%, and 92.74% accuracy, respectively. The experimental results show that our approach reaches high performance in identifying outliers.
en
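The iterative scheme described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration on synthetic data, not the thesis's actual implementation: the linear autoencoder, the 10% flagging quantile, and the 0.1 down-weighting factor are illustrative assumptions rather than the thesis's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 inliers lying in a 2-D subspace of R^5,
# plus 20 scattered outliers. Label 1 marks a true outlier.
inliers = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
outliers = rng.uniform(-6, 6, size=(20, 5))
X = np.vstack([inliers, outliers])
y_true = np.array([0] * 200 + [1] * 20)

def train_autoencoder(X, w, hidden=2, epochs=1000, lr=0.01, seed=1):
    """Linear autoencoder trained by gradient descent on a weighted MSE.

    w holds per-instance weights; down-weighting suspected outliers keeps
    the model from spending capacity on reconstructing them."""
    local_rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = local_rng.normal(scale=0.1, size=(d, hidden))
    W2 = local_rng.normal(scale=0.1, size=(hidden, d))
    for _ in range(epochs):
        H = X @ W1                     # encode
        R = H @ W2                     # decode (reconstruction)
        E = (R - X) * w[:, None]       # weighted residual
        G2 = H.T @ E / n               # gradient w.r.t. decoder
        G1 = X.T @ (E @ W2.T) / n      # gradient w.r.t. encoder
        W1 -= lr * G1
        W2 -= lr * G2
    return W1, W2

def recon_errors(X, W1, W2):
    R = X @ W1 @ W2
    return ((R - X) ** 2).sum(axis=1)

# Round 1: plain training, every instance weighted equally.
w = np.ones(len(X))
W1, W2 = train_autoencoder(X, w)
err = recon_errors(X, W1, W2)

# Threshold the errors to form a prediction vector (top ~10% flagged).
pred = (err > np.quantile(err, 0.9)).astype(int)

# Round 2: feed the predictions back as discriminative information by
# down-weighting suspected outliers, then retrain and rescore.
w = np.where(pred == 1, 0.1, 1.0)
W1, W2 = train_autoencoder(X, w)
err = recon_errors(X, W1, W2)
pred = (err > np.quantile(err, 0.9)).astype(int)

acc = (pred == y_true).mean()
print(f"accuracy: {acc:.2f}")
```

The feedback step is the essential point: once suspected outliers contribute less to the cost, inliers are reconstructed more faithfully, so the two error distributions pull apart and the threshold becomes more reliable.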
dc.description.provenanceMade available in DSpace on 2021-06-16T09:34:55Z (GMT). No. of bitstreams: 1
ntu-106-R03921080-1.pdf: 924806 bytes, checksum: 5b1c389379e4298ccc4612091f6ee001 (MD5)
Previous issue date: 2017
en
dc.description.tableofcontents摘要 (Chinese Abstract) i
Abstract ii
1 Introduction 1
1.1 Overview of Outlier Detection Approach 2
1.2 Motivation 3
1.3 Contribution 3
1.4 Thesis Organization 4
2 Related Works 5
2.1 Neural Network for Outlier Detection 5
2.2 Autoencoder for Outlier Detection 6
2.3 Discriminative Autoencoder 7
3 Methodology 8
3.1 Basic Autoencoder 10
3.2 Outlier Detection using Autoencoder 11
3.3 Preprocessing 13
3.4 Discriminative Reconstructions Learning 14
3.5 Outlier Detection 17
4 Evaluation 19
4.1 Dataset 19
4.1.1 Ionosphere 19
4.1.2 Wisconsin Breast Cancer 20
4.1.3 NSL-KDD 20
4.2 Evaluation Metrics 21
4.3 Experimental Result 23
4.3.1 Ionosphere 23
4.3.2 Wisconsin Breast Cancer 26
4.3.3 NSL-KDD 28
5 Discussion 30
5.1 ROC Analysis 30
5.2 Analysis of Trade-off Value for Threshold 32
5.3 Future Work 33
6 Conclusion 35
References 36
dc.language.isoen
dc.subject深度學習zh_TW
dc.subject異常值偵測zh_TW
dc.subject自動編碼器zh_TW
dc.subjectautoencoderen
dc.subjectdeep learningen
dc.subjectoutlier detectionen
dc.title基於自動編碼器之重建值判別學習應用於異常值偵測zh_TW
dc.titleDiscriminative Reconstructions Learning for Outlier Detection Using Autoencodersen
dc.typeThesis
dc.date.schoolyear105-1
dc.description.degree碩士 (Master)
dc.contributor.oralexamcommittee雷欽龍(Chin-Laung Lei),于天立(Tian-Li Yu)
dc.subject.keyword深度學習,自動編碼器,異常值偵測zh_TW
dc.subject.keyworddeep learning,autoencoder,outlier detectionen
dc.relation.page40
dc.identifier.doi10.6342/NTU201700445
dc.rights.note有償授權 (licensed for a fee)
dc.date.accepted2017-02-13
dc.contributor.author-college電機資訊學院 (College of Electrical Engineering and Computer Science)zh_TW
dc.contributor.author-dept電機工程學研究所 (Graduate Institute of Electrical Engineering)zh_TW
Appears in Collections: Department of Electrical Engineering

Files in this item:
File: ntu-106-1.pdf (restricted access, not publicly available)
Size: 903.13 kB
Format: Adobe PDF


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in their respective license terms.
