Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72124
Title: Performance Evaluation for Machine Learning and Deep Learning Algorithms on Imbalanced Dataset: Case Study of Business Support System
Original (Chinese) title: 機器學習與深度學習演算法於電信業務支援系統不平衡資料之效能評估
Author: Yen-Cheng Lin (林彥呈)
Advisor: 林風
Keywords: anomaly prediction, failure prediction, rare events, imbalanced dataset, machine learning, deep learning, neural network
Publication Year: 2018
Degree: Master's (碩士)
Abstract: Data collected from real-world systems is often imbalanced; that is, the classification categories are not equally represented. Conventional classification algorithms tend to be biased toward the majority class, which is usually the less interesting one. In this thesis, we use anomaly prediction on a telecommunication Business Support System (BSS) [1] as a case study to evaluate the performance of machine learning [2, 3, 4, 5] and deep learning [2, 3, 4] algorithms on imbalanced datasets. Reliability and stability are treated as the major requirements for a BSS [6], so anomalies are rare events and the system log data of a BSS is highly imbalanced. This makes it especially challenging for machine learning and deep learning algorithms to perform well. To address this issue, we propose Frequency-based Feature Creation (FFC), which creates new features that describe the distributions of the one-hot-encoded features. Furthermore, we modify existing techniques to amplify the influence of the minority class, e.g., Voting with Threshold (VT) and Classification Correction (CC).
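The abstract does not define FFC or VT in detail, so the following is only an illustrative sketch under stated assumptions, not the thesis's implementation. It assumes that FFC summarizes a one-hot-encoded categorical log field by the relative frequency of each category within a window of events (the column-wise mean of the one-hot vectors), and that VT predicts the minority (anomaly) class whenever at least a chosen number of ensemble members vote for it, rather than requiring a simple majority. The function names `one_hot`, `frequency_features`, and `vote_with_threshold`, and the log levels used, are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """One-hot encode a single categorical value against a fixed category list."""
    return [1 if value == category else 0 for category in categories]


def frequency_features(window: Sequence[str], categories: Sequence[str]) -> List[float]:
    """Hypothetical FFC-style features: the relative frequency of each category
    within a window of log events, i.e. the column-wise mean of the one-hot vectors."""
    if not window:
        return [0.0] * len(categories)
    totals = [0] * len(categories)
    for value in window:
        for i, bit in enumerate(one_hot(value, categories)):
            totals[i] += bit
    return [total / len(window) for total in totals]


def vote_with_threshold(votes: Sequence[int], threshold: int,
                        minority: int = 1, majority: int = 0) -> int:
    """Hypothetical VT-style rule: predict the minority class when at least
    `threshold` ensemble members voted for it, instead of a simple majority."""
    if Counter(votes)[minority] >= threshold:
        return minority
    return majority


if __name__ == "__main__":
    levels = ["INFO", "WARN", "ERROR"]           # hypothetical log levels
    window = ["INFO", "INFO", "WARN", "ERROR"]   # one hypothetical event window
    print(frequency_features(window, levels))    # [0.5, 0.25, 0.25]

    votes = [0, 0, 1, 0, 1]  # five classifiers; 1 = anomaly (minority class)
    print(vote_with_threshold(votes, threshold=3))  # 0: simple majority of five
    print(vote_with_threshold(votes, threshold=2))  # 1: lowered threshold
```

Lowering `threshold` makes the ensemble flag the rare class more often, which typically raises recall on anomalies at the cost of more false alarms.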
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72124
DOI: 10.6342/NTU201801994
Full-text Rights: Paid license (有償授權)
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File                                  Size      Format
ntu-107-1.pdf (Restricted Access)     4.16 MB   Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
