Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72124
Full metadata record
DC field: value (language)
dc.contributor.advisor: 林風
dc.contributor.author: Yen-Cheng Lin (en)
dc.contributor.author: 林彥呈 (zh_TW)
dc.date.accessioned: 2021-06-17T06:24:33Z
dc.date.available: 2028-08-17
dc.date.copyright: 2018-08-20
dc.date.issued: 2018
dc.date.submitted: 2018-08-17
dc.identifier.citation:
[1] J. H. Chen, C. W. Huang, and C. C. Shih. The exploration of machine learning for abnormal prediction model of telecom business support system. In Asia-Pacific Network Operations and Management Symposium, 2017.
[2] K. Zhang, J. Xu, and M. R. Min. Automated it system failure prediction: A deep learning approach. In IEEE International Conference on Big Data, 2017.
[3] T. Islam and D. Manivannan. Predicting application failure in cloud: A machine learning approach. In IEEE International Conference on Cognitive Computing, 2017.
[4] A. Rosa, Y. Chen, and W. Binder. Failure analysis and prediction for big-data systems. IEEE Transactions on Services Computing, 10:984–998, 2017.
[5] I. Karakurt, S. Ozer, T. Ulusinan, and M. C. Ganiz. A machine learning approach to database failure prediction. In International Conference on Computer Science and Engineering, 2017.
[6] J. H. Chen, C. W. Huang, and C. W. Cheng. The monitoring system of business support system with emergency prediction based on machine learning approach. In Asia-Pacific Network Operations and Management Symposium, 2016.
[7] H. Kaur, G. Singh, and J. Minhas. A review of machine learning based anomaly detection techniques. International Journal of Computer Applications Technology and Research, 2:185–187, 2013.
[8] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.
[9] K. He, X. Zhang, and S. Ren. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[10] N. V. Chawla. Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook, 2005.
[11] B. Mirza, S. Kok, and Z. Lin. Efficient representation learning for high-dimensional imbalance data. In IEEE International Conference on Digital Signal Processing (DSP), 2016.
[12] C. Zhang, K. C. Tan, and R. Ren. Training cost-sensitive deep belief networks on imbalance data problems. In International Joint Conference on Neural Networks (IJCNN), 2016.
[13] M. M. Al-Rifaie and H. A. Alhakbani. Handling class imbalance in direct marketing dataset using a hybrid data and algorithmic level solutions. In SAI Computing Conference (SAI), 2016.
[14] T. Ghosh, D. Sarkar, and T. Sharma. Real time failure prediction of load balancers and firewalls. In IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2016.
[15] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl. Algorithms for hyper-parameter optimization. In International Conference on Neural Information Processing Systems (NIPS), 2011.
[16] C. C. Chang, S. R. Yang, and E. H. Yeh. A kubernetes-based monitoring platform for dynamic cloud resource provisioning. In IEEE Global Communications Conference, 2017.
[17] T. Kanungo, D. M. Mount, and N. S. Netanyahu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:881–892, 2002.
[18] scikit-learn. [Online]. Available: http://scikit-learn.org/stable/.
[19] P. Baldi. Autoencoders, unsupervised learning and deep architectures. In International Conference on Unsupervised and Transfer Learning workshop, 2011.
[20] keras. [Online]. Available: https://keras.io/.
[21] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
[22] B. Kegl. The return of adaboost.mh: multi-class hamming trees. In International Conference on Learning Representations (ICLR), 2013.
[23] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
[24] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9:1735–1780, 1997.
[25] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing, 2014.
[26] T. G. Dietterich. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, 2000.
[27] C. Bergmeir and J. M. Benitez. On the use of cross-validation for time series predictor evaluation. Information Sciences: an International Journal, 191:192–213, 2012.
[28] tensorflow. [Online]. Available: https://www.tensorflow.org/.
[29] J. M. Navarro, H. A. Parada G., and J. C. Duenas. System failure prediction through rare-events elastic-net logistic regression. In the 2nd International Conference on Artificial Intelligence Modelling and Simulation, 2014.
[30] N. Gurumdimma, A. Jhumka, and M. Liakata. Towards detecting patterns in failure logs of large-scale distributed systems. In IEEE International Parallel and Distributed Processing Symposium Workshop, 2015.
[31] T. P. Minka. Automatic choice of dimensionality for pca. In International Conference on Neural Information Processing Systems (NIPS), 2000.
[32] D. Roobaert, G. Karakoulas, and N. V. Chawla. Information gain, correlation and support vector machines. In International Conference on Neural Information Processing Systems (NIPS), 2003.
[33] P. Flach and M. Kull. Precision-recall-gain curves: Pr analysis done right. In International Conference on Neural Information Processing Systems (NIPS), 2015.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72124
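dc.description.abstract: Data collected from the real world is usually imbalanced: the classes are very unevenly distributed. Traditional classification algorithms lean too heavily towards learning the majority class (usually the less important class). In this thesis, we use anomaly prediction for a telecom business support system as an example to test the performance of machine learning and deep learning algorithms on imbalanced data. Telecom business support systems usually maintain good stability, so system anomalies are rare events, which makes it even harder for machine learning and deep learning algorithms to achieve good results on this highly imbalanced data. To address this problem, we propose Frequency-based Feature Creation, which generates new features that describe the distribution of one-hot-encoded features. In addition, we modify existing techniques to strengthen the influence of the minority class, such as Voting with Threshold and Classification Correction. (translated from zh_TW)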
dc.description.abstract: The data collected from real systems is imbalanced, i.e., the classification categories are not equally represented. Existing classification algorithms usually introduce a bias towards the majority class (a potentially uninteresting class). In this thesis, we apply anomaly prediction to a Business Support System (BSS) [1] of telecommunication service providers as a case study of the performance of machine learning [2, 3, 4, 5] and deep learning [2, 3, 4] algorithms on an imbalanced dataset. Reliability and stability are treated as the major requirements for a BSS [6]. In other words, anomalies are rare events in a BSS, and the distribution of the BSS system log data is highly imbalanced. Thus, it is more challenging for machine learning and deep learning algorithms to perform well on such highly imbalanced datasets. To resolve the issue, we propose an approach, namely Frequency-based Feature Creation (FFC), that creates new features describing the distributions of the one-hot-encoded features. Furthermore, we enhance some existing techniques to amplify the effects of the minority class, e.g., Voting with Threshold (VT) and Classification Correction (CC). (en)
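The abstract names Frequency-based Feature Creation (FFC) and Voting with Threshold (VT) but this record does not define them. The following Python sketch shows one plausible reading of each idea and should not be taken as the thesis's actual method. It assumes that FFC summarizes a window of one-hot-encoded log records by the relative frequency of each category, and that VT flags an anomaly when the number of positive votes reaches a threshold that may be set below a simple majority. The function names, the window grouping, and the sample data are all hypothetical.

```python
from typing import List, Sequence


def frequency_features(one_hot_rows: Sequence[Sequence[int]]) -> List[float]:
    """Relative frequency of each one-hot column over a window of rows.

    Assumption: a "window" is a group of log records summarized together;
    the thesis may define the grouping and the features differently.
    """
    if not one_hot_rows:
        return []
    n_rows = len(one_hot_rows)
    n_cols = len(one_hot_rows[0])
    return [sum(row[c] for row in one_hot_rows) / n_rows for c in range(n_cols)]


def vote_with_threshold(votes: Sequence[int], threshold: int) -> int:
    """Return 1 (anomaly) if at least `threshold` models voted 1, else 0.

    Assumption: choosing `threshold` below a simple majority is what lets
    the minority (anomaly) class win more often.
    """
    return 1 if sum(votes) >= threshold else 0


# Hypothetical window: three log records, one-hot encoded over three categories.
window = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
features = frequency_features(window)

# Hypothetical votes from five classifiers; only two of them flag an anomaly.
votes = [1, 0, 0, 1, 0]
majority = vote_with_threshold(votes, threshold=3)  # simple majority of five
lowered = vote_with_threshold(votes, threshold=2)   # lowered threshold
print(features, majority, lowered)
```

With these sample votes, the simple-majority threshold rejects the anomaly while the lowered threshold accepts it, which illustrates how a threshold could amplify the minority class at the cost of more false alarms.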
dc.description.provenance: Made available in DSpace on 2021-06-17T06:24:33Z (GMT). No. of bitstreams: 1
ntu-107-R05922045-1.pdf: 4263155 bytes, checksum: 867e2aa13e88bc15d3dea6f0c0d64d2d (MD5)
Previous issue date: 2018 (en)
dc.description.tableofcontents: Contents
Acknowledgements i
Chinese Abstract ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
1 Introduction 1
2 Business Support System 4
2.1 Data Extraction and Combination 7
2.2 Data Format 11
3 Main Flow of Anomaly Prediction 13
3.1 Training Phase 14
3.1.1 Preprocessing 14
3.1.2 Model Construction 18
3.2 Prediction Phase 25
3.2.1 Preprocessing 25
3.2.2 Model Inference 25
4 Performance Analysis 26
4.1 Time-Series Cross Validation 26
4.2 Evaluation Metrics 27
4.3 Results 28
4.3.1 Frequency-based Feature Creation 28
4.3.2 Feature Selection 29
4.3.3 Machine and Deep Learning Model 30
4.3.4 Ensemble 31
5 Conclusion 33
Appendix A Important Features 34
Bibliography 36
dc.language.iso: en
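dc.subject: rare events (translated from zh_TW)
dc.subject: neural network (translated from zh_TW)
dc.subject: deep learning (translated from zh_TW)
dc.subject: failure prediction (translated from zh_TW)
dc.subject: anomaly prediction (translated from zh_TW)
dc.subject: machine learning (translated from zh_TW)
dc.subject: imbalanced data (translated from zh_TW)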
dc.subject: deep learning (en)
dc.subject: failure prediction (en)
dc.subject: rare events (en)
dc.subject: imbalanced dataset (en)
dc.subject: machine learning (en)
dc.subject: anomaly prediction (en)
dc.subject: neural network (en)
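dc.title: Performance Evaluation of Machine Learning and Deep Learning Algorithms on Imbalanced Data from a Telecom Business Support System (translated from zh_TW)
dc.title: Performance Evaluation for Machine Learning and Deep Learning Algorithms on Imbalanced Dataset: Case Study of Business Support System (en)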
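dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 李宏毅, 鐘嘉德, 林一平, 鄭枸澺
dc.subject.keyword: anomaly prediction, failure prediction, rare events, imbalanced data, machine learning, deep learning, neural network (translated from zh_TW)
dc.subject.keyword: anomaly prediction, failure prediction, rare events, imbalanced dataset, machine learning, deep learning, neural network (en)
dc.relation.page: 39
dc.identifier.doi: 10.6342/NTU201801994
dc.rights.note: Licensed for a fee
dc.date.accepted: 2018-08-17
dc.contributor.author-college: College of Electrical Engineering and Computer Science (translated from zh_TW)
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering (translated from zh_TW)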
Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
File: ntu-107-1.pdf (not authorized for public access)
Size: 4.16 MB
Format: Adobe PDF
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
