Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80286
Full metadata record
DC Field Value Language
dc.contributor.advisor 林智仁 (Chih-Jen Lin)
dc.contributor.author Zhongyi Queen
dc.contributor.author 闕中一 zh_TW
dc.date.accessioned 2022-11-24T03:03:50Z
dc.date.available 2022-02-16
dc.date.available 2022-11-24T03:03:50Z
dc.date.copyright 2022-02-16
dc.date.issued 2022
dc.date.submitted 2022-02-11
dc.identifier.uri http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80286
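dc.description.abstract One-class SVM is an extension of the support vector machine for handling unlabeled data. As a mature outlier detection method, it has been widely applied. However, compared with the standard two-class SVM used for classification, one-class SVM does not provide probabilistic outputs; that is, we cannot predict the probability that a data instance is an outlier. Several effective methods already exist for estimating the probabilistic outputs of two-class SVM, but the same problem for one-class SVM has received little attention, mainly because, as an unsupervised model, it has no labels to refer to, which makes probability estimation difficult. In this thesis, we aim to propose feasible methods for generating probabilistic outputs for one-class SVM, and we also discuss why the methods applicable to two-class SVM cannot be used for one-class SVM. Because one-class SVM lacks labels, we consider letting the probabilistic outputs mimic the distribution of the decision values to be a feasible direction, and we propose several new methods based on this idea. We then verify the feasibility of the new methods in experiments on artificial and real-world data sets.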
dc.description.provenance Made available in DSpace on 2022-11-24T03:03:50Z (GMT). No. of bitstreams: 1
U0001-1801202212365400.pdf: 1433368 bytes, checksum: 1b47794c3f3dc81619794d2e485165a9 (MD5)
Previous issue date: 2022 en
dc.description.tableofcontents "Contents
Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iii
Abstract iv
1 Introduction 1
2 Issues in Extending Platt's Two-class SVM Probabilistic Outputs to One-class SVM 4
  2.1 A Review of the Approach by Platt 4
  2.2 Why Fitting a Sigmoid Function is Appropriate for Two-class Problems 5
  2.3 Fitting a Sigmoid Function may not be Suitable for One-class SVM 6
    2.3.1 The Ideal Probabilistic Outputs 7
    2.3.2 An Approximation of Decision Value 9
    2.3.3 An Approximation of the Probabilistic Outputs by Using Decision Values 11
  2.4 Lack of Labels in Maximizing the Likelihood 13
3 Other Existing Probability Estimation Methods 15
  3.1 Isotonic Regression 15
  3.2 The Method of k Nearest Neighbors 16
  3.3 Modeling Sigmoid Probability Using an EM-based Algorithm 17
  3.4 Converting a Sequence of Scores into Probabilities by Regularization and Normalization 19
4 Methods to Generate Probabilistic Outputs for One-class SVM 22
  4.1 Binning by Decision Values 22
  4.2 EM-based Algorithm without Restriction on Variance 24
  4.3 A New Gamma Scaling 27
    4.3.1 Motivation 27
    4.3.2 Kernel Selection 28
    4.3.3 Distribution of the Decision Values 28
    4.3.4 Transformation to Probabilities 29
    4.3.5 Advantages of the New Gamma Scaling 31
5 Experiments 32
  5.1 Performance Measure 32
  5.2 Artificial Data 34
  5.3 Real-world Data 39
6 Conclusion 43
A Derivation of the Theoretical Probability Output for Artificial Multi-dimensional Data 44
Bibliography 46
List of Figures
  2.1 An illustration to explain that if P(f|y = 1) and P(f|y = -1) are like the right figure, then the probabilistic output is in a shape shown in the left figure. 7
  2.2 A conceptual illustration of one-class SVM probabilistic outputs. 12
  2.3 A conceptual illustration of one-class SVM probabilistic outputs. 14
  2.4 An illustration of the probability model if (2.25) holds. 14
  4.1 Illustrations for the two binning methods. The vertical dotted lines show the marks we selected, and the blue line segments represent the bins with different probabilistic outputs. 23
  5.1 Relationship between probabilistic outputs and decision values for artificial data sets ART1 and ART2. The ideal probability is calculated by (5.2) and (5.3) as ground-truth, respectively. 35
  5.2 The distribution of ART3. The black line shows the PDF of normal data, and the blue line shows the PDF of outliers. 38
  5.3 An illustration of ART4. Blue points indicate normal data and red points are outliers. The value next to each point is the scaled percentile in (5.1). 39
  5.4 Q-Q plots of probabilistic outputs obtained by the new Gamma scaling. 40
  5.5 Q-Q plots of probabilistic outputs obtained by binning according to density. 41
  5.6 Q-Q plots of probabilistic outputs obtained by Platt scaling. 41
  5.7 Q-Q plots for the evaluation on real-world data sets. The caption of each sub-figure shows the data set and the method to generate probabilistic outputs. 42
List of Tables
  5.1 Mean squared error (MSE) of the methods on data sets ART1 and ART2. 35
  5.2 Mean squared error (MSE) of the methods on data sets ART_5d and ART_10d. 35"
dc.language.iso en
dc.subject 支援向量機 zh_TW
dc.subject 機率估計 zh_TW
dc.subject 一類支援向量機 zh_TW
dc.subject 異常值檢測 zh_TW
dc.subject 普拉特縮放 zh_TW
dc.subject One-class SVM en
dc.subject Probability estimation en
dc.subject Support vector machines en
dc.subject Platt scaling en
dc.subject Outlier detection en
dc.title 一類支援向量機之機率輸出 zh_TW
dc.title One-class SVM Probabilistic Outputs en
dc.date.schoolyear 110-1
dc.description.degree Master's
dc.contributor.oralexamcommittee 林軒田 (James Hsieh), 李育杰 (Edward Hsieh)
dc.subject.keyword 機率估計, 一類支援向量機, 異常值檢測, 普拉特縮放, 支援向量機 zh_TW
dc.subject.keyword Probability estimation, One-class SVM, Outlier detection, Platt scaling, Support vector machines en
dc.relation.page 47
dc.identifier.doi 10.6342/NTU202200089
dc.rights.note Authorization granted (access limited to campus)
dc.date.accepted 2022-02-11
dc.contributor.author-college 電機資訊學院 (College of Electrical Engineering and Computer Science) zh_TW
dc.contributor.author-dept 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) zh_TW
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File Size Format
U0001-1801202212365400.pdf 1.4 MB Adobe PDF
Access is restricted to NTU campus IP addresses (off campus, please use the VPN off-campus connection service).
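The abstract describes letting the probabilistic outputs mimic the distribution of the decision values. The following is only a rough illustration of that general idea, not the thesis's actual methods (binning, the EM-based algorithm, or the new Gamma scaling, whose details are not given in this record). It assumes the decision values have already been produced by some one-class SVM, and it maps each value to its empirical percentile among the training decision values. The function names and sample values are hypothetical.

```python
from bisect import bisect_right


def fit_empirical_cdf(train_decision_values):
    """Build a map from a decision value to an estimated probability of being normal.

    Assumption (for illustration only): a larger decision value means "more
    normal", and the probability is the fraction of training decision values
    that are less than or equal to the queried value.
    """
    sorted_vals = sorted(train_decision_values)
    n = len(sorted_vals)
    if n == 0:
        raise ValueError("need at least one training decision value")

    def prob_normal(decision_value):
        # bisect_right counts how many training values are <= decision_value.
        return bisect_right(sorted_vals, decision_value) / n

    return prob_normal


# Hypothetical decision values, e.g. from a one-class SVM on training data.
train_values = [-0.8, -0.1, 0.0, 0.2, 0.5, 0.9, 1.3, 1.7]
prob_normal = fit_empirical_cdf(train_values)

for f in (-1.0, 0.2, 2.0):
    print(f, prob_normal(f))
```

By construction the resulting output is monotone in the decision value and needs no labels, which is the property the abstract points to as the obstacle for label-based approaches such as Platt scaling.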


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
