Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72124
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林風 | |
dc.contributor.author | Yen-Cheng Lin | en |
dc.contributor.author | 林彥呈 | zh_TW |
dc.date.accessioned | 2021-06-17T06:24:33Z | - |
dc.date.available | 2028-08-17 | |
dc.date.copyright | 2018-08-20 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-08-17 | |
dc.identifier.citation | [1] J. H. Chen, C. W. Huang, and C. C. Shih. The exploration of machine learning for abnormal prediction model of telecom business support system. In Asia-Pacific Network Operations and Management Symposium, 2017.
[2] K. Zhang, J. Xu, and M. R. Min. Automated IT system failure prediction: A deep learning approach. In IEEE International Conference on Big Data, 2017.
[3] T. Islam and D. Manivannan. Predicting application failure in cloud: A machine learning approach. In IEEE International Conference on Cognitive Computing, 2017.
[4] A. Rosa, Y. Chen, and W. Binder. Failure analysis and prediction for big-data systems. IEEE Transactions on Services Computing, 10:984–998, 2017.
[5] I. Karakurt, S. Ozer, T. Ulusinan, and M. C. Ganiz. A machine learning approach to database failure prediction. In International Conference on Computer Science and Engineering, 2017.
[6] J. H. Chen, C. W. Huang, and C. W. Cheng. The monitoring system of business support system with emergency prediction based on machine learning approach. In Asia-Pacific Network Operations and Management Symposium, 2016.
[7] H. Kaur, G. Singh, and J. Minhas. A review of machine learning based anomaly detection techniques. International Journal of Computer Applications Technology and Research, 2:185–187, 2013.
[8] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.
[9] K. He, X. Zhang, and S. Ren. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[10] C. Nitesh. Data Mining for Imbalanced Datasets: An Overview. Nitesh V. Chawla, 2005.
[11] B. Mirza, S. Kok, and Z. Lin. Efficient representation learning for high-dimensional imbalance data. In IEEE International Conference on Digital Signal Processing (DSP), 2016.
[12] C. Zhang, K. C. Tan, and R. Ren. Training cost-sensitive deep belief networks on imbalance data problems. In International Joint Conference on Neural Networks (IJCNN), 2016.
[13] M. M. Al-Rifaie and H. A. Alhakbani. Handling class imbalance in direct marketing dataset using a hybrid data and algorithmic level solutions. In SAI Computing Conference (SAI), 2016.
[14] T. Ghosh, D. Sarkar, and T. Sharma. Real time failure prediction of load balancers and firewalls. In IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2016.
[15] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl. Algorithms for hyper-parameter optimization. In International Conference on Neural Information Processing Systems (NIPS), 2011.
[16] C. C. Chang, S. R. Yang, and E. H. Yeh. A kubernetes-based monitoring platform for dynamic cloud resource provisioning. In IEEE Global Communications Conference, 2017.
[17] T. Kanungo, D. M. Mount, and N. S. Netanyahu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:881–892, 2002.
[18] scikit-learn. [Online]. Available: http://scikit-learn.org/stable/.
[19] P. Baldi. Autoencoders, unsupervised learning and deep architectures. In International Conference on Unsupervised and Transfer Learning workshop, 2011.
[20] keras. [Online]. Available: https://keras.io/.
[21] L. Breiman. Random forests. Journal of Machine Learning, 45:5–32, 2001.
[22] B. Kegl. The return of adaboost.mh: multi-class hamming trees. In International Conference on Learning Representations (ICLR), 2013.
[23] C. Corinna and V. Vladimir. Support-vector networks. Journal of Machine Learning, 20:273–297, 1995.
[24] S. Hochreiter and J. Schmidhuber. Long short-term memory. Journal of Neural Computation, 9:1735–1780, 1997.
[25] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Conference on Empirical Methods on Natural Language Processing, 2014.
[26] T. G. Dietterich. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, 2000.
[27] C. Bergmeir and J. M. Benitez. On the use of cross-validation for time series predictor evaluation. Information Sciences: an International Journal, 191:192–213, 2012.
[28] tensorflow. [Online]. Available: https://www.tensorflow.org/.
[29] J. M. Navarro, H. A. Parada G., and J. C. Duenas. System failure prediction through rare-events elastic-net logistic regression. In the 2nd International Conference on Artificial Intelligence Modelling and Simulation, 2014.
[30] N. Gurumdimma, A. Jhumka, and M. Liakata. Towards detecting patterns in failure logs of large-scale distributed systems. In IEEE International Parallel and Distributed Processing Symposium Workshop, 2015.
[31] T. P. Minka. Automatic choice of dimensionality for pca. In International Conference on Neural Information Processing Systems (NIPS), 2000.
[32] D. Roobaert, G. Karakoulas, and N. V. Chawla. Information gain, correlation and support vector machines. In International Conference on Neural Information Processing Systems (NIPS), 2003.
[33] P. Flach and M. Kull. Precision-recall-gain curves: Pr analysis done right. In International Conference on Neural Information Processing Systems (NIPS), 2015. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72124 | - |
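dc.description.abstract | Data collected from the real world is usually imbalanced: the distribution across classes is extremely uneven. Traditional classification algorithms lean excessively towards learning the majority class (usually the less important class). In this thesis, we use anomaly prediction for a telecom business support system as an example to test the performance of machine learning and deep learning algorithms on imbalanced data. Telecom business support systems usually maintain good stability, so system anomalies are rare events, and it is therefore harder for machine learning and deep learning algorithms to achieve good results on this highly imbalanced data. To solve this problem, we propose Frequency-based Feature Creation, which generates new features to describe the distribution of one-hot-encoded features. In addition, we modify existing techniques to strengthen the influence of the minority class, such as Voting with Threshold and Classification Correction. | zh_TW |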
dc.description.abstract | Data collected from real systems is usually imbalanced, i.e., the classification categories are not equally represented. Existing classification algorithms usually introduce a bias towards the majority class (potentially the uninteresting class). In this thesis, we use anomaly prediction on a Business Support System (BSS) [1] of telecommunication service providers as a case study of the performance of machine learning [2, 3, 4, 5] and deep learning [2, 3, 4] algorithms on an imbalanced dataset. Reliability and stability are treated as the major requirements for a BSS [6]. In other words, anomalies are rare events in a BSS, and the distribution of its system log data is highly imbalanced. This makes it more challenging for machine learning and deep learning algorithms to perform well. To resolve the issue, we propose an approach, namely Frequency-based Feature Creation (FFC), that creates new features to describe the distributions of the one-hot-encoded features. Furthermore, we enhance some existing techniques to amplify the effect of the minority class, e.g., Voting with Threshold (VT) and Classification Correction (CC). | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T06:24:33Z (GMT). No. of bitstreams: 1 ntu-107-R05922045-1.pdf: 4263155 bytes, checksum: 867e2aa13e88bc15d3dea6f0c0d64d2d (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | Contents
Acknowledgements i
Chinese Abstract ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
1 Introduction 1
2 Business Support System 4
2.1 Data Extraction and Combination 7
2.2 Data Format 11
3 Main Flow of Anomaly Prediction 13
3.1 Training Phase 14
3.1.1 Preprocessing 14
3.1.2 Model Construction 18
3.2 Prediction Phase 25
3.2.1 Preprocessing 25
3.2.2 Model Inference 25
4 Performance Analysis 26
4.1 Time-Series Cross Validation 26
4.2 Evaluation Metrics 27
4.3 Results 28
4.3.1 Frequency-based Feature Creation 28
4.3.2 Feature Selection 29
4.3.3 Machine and Deep Learning Model 30
4.3.4 Ensemble 31
5 Conclusion 33
Appendix A Important Features 34
Bibliography 36 | |
dc.language.iso | en | |
dc.title | Performance Evaluation of Machine Learning and Deep Learning Algorithms on Imbalanced Data from a Telecom Business Support System | zh_TW |
dc.title | Performance Evaluation for Machine Learning and Deep Learning Algorithms on Imbalanced Dataset: Case Study of Business Support System | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 李宏毅,鐘嘉德,林一平,鄭枸澺 | |
dc.subject.keyword | anomaly prediction, failure prediction, rare events, imbalanced data, machine learning, deep learning, neural network | zh_TW |
dc.subject.keyword | anomaly prediction, failure prediction, rare events, imbalanced dataset, machine learning, deep learning, neural network | en |
dc.relation.page | 39 | |
dc.identifier.doi | 10.6342/NTU201801994 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2018-08-17 | |
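dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |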
Appears in collections: | Department of Computer Science and Information Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf (currently not authorized for public access) | 4.16 MB | Adobe PDF |
Unless their copyright terms are specifically stated, all items in the system are protected by copyright, with all rights reserved.