Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60705

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 蔡政安 (Chen-An Tsai) | |
| dc.contributor.author | Yu-Jing Chang | en |
| dc.contributor.author | 張育菁 | zh_TW |
| dc.date.accessioned | 2021-06-16T10:26:47Z | - |
| dc.date.available | 2022-08-01 | |
| dc.date.copyright | 2020-07-28 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-07-17 | |
| dc.identifier.citation | 1. Alon U., Barkai N., Notterman D., Gish K., Ybarra S., Mack D., and Levine A. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America 96: 6745-6750. 2. Brank J., Grobelnik M., Milić-Frayling N., and Mladenić D. 2003. Training text classifiers with SVM on very few positive examples. Microsoft Research Technical Report MSR-TR-2003-34. 3. Staelin C. 2003. Parameter selection for support vector machines. HP Laboratories Israel, HPL-2002-354. 4. Chapelle O., Vapnik V., Bousquet O., and Mukherjee S. 2002. Choosing multiple parameters for support vector machines. Machine Learning 46: 131-159. 5. Chawla N., Bowyer K., Hall L., and Kegelmeyer W. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16: 321-357. 6. Duan W., Jing L., and Lu X. 2014. Imbalanced data classification using cost-sensitive support vector machine based on information entropy. Advanced Materials Research 989-994: 1756-1761. 7. Filannino M. 2011. DBWorld e-mail classification using a very small corpus. Project of Machine Learning course, University of Manchester. 8. Fröhlich H., and Zell A. 2005. Efficient parameter selection for support vector machines in classification and regression via model-based global optimization. Proceedings of the 2005 IEEE International Joint Conference on Neural Networks 3: 1431-1436. 9. Hippo Y., Taniguchi H., Tsutsumi S., Machida N., Chong J., Fukayama M., Kodama T., and Aburatani H. 2002. Global gene expression analysis of gastric cancer by oligonucleotide microarrays. Cancer Research 62: 233-240. 10. Han H., Wang W., and Mao B. 2005. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. ICIC 2005: Advances in Intelligent Computing 3644: 878-887. 11. Huang C., and Wang C. 2006. A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications 31: 231-240. 12. Huang C., and Dun J. 2008. A distributed PSO-SVM hybrid system with feature selection and parameter optimization. Applied Soft Computing 8: 1381-1391. 13. Huang T., and Kecman V. 2004. Bias term b in SVMs again. Proceedings of the European Symposium on Artificial Neural Networks: 441-448. 14. Lin S., Ying K., Chen S., and Lee Z. 2008. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications 35: 1817-1824. 15. Lin W., and Chen J. 2012. Class-imbalanced classifiers for high-dimensional data. Briefings in Bioinformatics 14: 13-26. 16. Lichman M. 2013. UCI Machine Learning Repository. URL: http://archive.ics.uci.edu/ml. 17. Núñez H., Gonzalez-Abril L., and Angulo C. 2011. A post-processing strategy for SVM learning from unbalanced data. Proceedings of the European Symposium on Artificial Neural Networks: 195-200. 18. Nutt C., Mani D., Betensky R., Tamayo P., Cairncross J., Pohl C., Hartmann C., McLaughlin M., Batchelor T., Black P., Deimling A., Pomeroy S., Golub T., and Louis D. 2003. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research 63: 1602-1607. 19. Peng C., Zhao D., and Zaiane O. 2013. An optimized cost-sensitive SVM for imbalanced data learning. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2013): Advances in Knowledge Discovery and Data Mining 7819: 280-292. 20. Ren Y., and Bai G. 2008. Determination of optimal SVM parameters by using GA/PSO. Journal of Computers 5: 1160-1168. 21. Datta S., and Das S. 2015. Near-Bayesian support vector machines for imbalanced data classification with equal or unequal misclassification costs. Neural Networks 70: 39-52. 22. Han S., Cao Q., and Han M. 2012. Parameter selection in SVM with RBF kernel function. World Automation Congress: 1-4. 23. Syarif I., Prugel-Bennett A., and Wills G. 2016. SVM parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA (Telecommunication, Computing, Electronics and Control) 14: 1502-1509. 24. Veropoulos K., Campbell C., and Cristianini N. 1999. Controlling the sensitivity of support vector machines. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-99): 55-60. 25. Wu G., and Chang E. 2005. KBA: Kernel boundary alignment considering imbalanced data distribution. IEEE Transactions on Knowledge and Data Engineering 17: 786-795. 26. Yang K., Cai Z., Li J., and Lin G. 2006. A stable gene selection in microarray data analysis. BMC Bioinformatics 7: 228. 27. Yu H., Mu C., Sun C., Yang W., Yang X., and Zuo X. 2015. Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data. Knowledge-Based Systems 76: 67-78. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60705 | - |
| dc.description.abstract | 現實中,我們時常看到類別不平衡的資料,即其中一個類別其樣本數相較於其他類別特別低,然而這類型的類別通常也是我們所感興趣的。傳統上的分類器由於未考慮類別不平衡的情況,因此面對這類型的資料,分類器容易把資料歸類於多數量類別 (majority class)。在本次研究中,針對不平衡資料我們採用支援向量機 (support vector machine, SVM) 並使用高斯核函數 (Gaussian kernel) 進行二元分類,為了改進 SVM 的分類效果與整體效率,我們考慮兩個問題:不平衡的類別與參數選擇。在第一個問題中,我們基於調整閾值的概念提出兩個新方法,分別為 ROC-SVM 與 b-SVM;而在第二個問題中,我們提出一個快速且簡單的方法來挑選 SVM 的參數,且該方法並未使用交叉驗證。本研究中使用真實資料與模擬資料來評估我們所提出的方法,而結果顯示在大部分的情況下,ROC-SVM 與 b-SVM 表現皆優於先前的方法,且整體運算時間也有明顯下降。 | zh_TW |
| dc.description.abstract | Skewed class distributions often occur in a wide variety of real datasets in which at least one of the classes has a relatively small number of observations, usually the class of interest. A classifier induced from such an imbalanced dataset typically has high accuracy for the majority class but poor prediction for the minority class. In this study, we focus on the SVM classifier with a Gaussian radial basis kernel for binary classification problems. In order to take advantage of SVMs and to achieve the best generalization ability with satisfactory prediction power, we address two important problems: imbalanced datasets and parameter selection. For the first problem, we propose two novel adjustment methods, ROC-SVM and b-SVM, for adjusting the cutoff threshold of SVMs. For the second problem, we propose a fast and simple approach to optimize the model parameters of SVMs without carrying out extensive k-fold cross-validation. Extensive comparisons with the standard SVM and well-known existing methods are carried out to evaluate the performance of our proposed algorithms on simulated and real datasets. The experimental results show that our proposed algorithms outperform over-sampling techniques and existing SVM-based solutions. (Minimal, illustrative sketches of threshold movement and of CV-free kernel-parameter selection appear after the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T10:26:47Z (GMT). No. of bitstreams: 1 U0001-0307202010591500.pdf: 2761676 bytes, checksum: a1fe11c327f2c6cf3a3400353c5a0ac9 (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Contents: 1 Introduction 1; 1.1 Background 1; 1.2 Support Vector Machine 1; 2 Material and Methods 4; 2.1 Strategies for Imbalanced Datasets 4; 2.1.1 Background 4; 2.1.1.1 Performance measures 4; 2.1.1.2 Re-sampling 5; 2.1.1.3 Cost-Sensitive SVM 6; 2.1.1.4 Threshold Movement 7; 2.1.2 Proposed methods for imbalanced datasets 7; 2.1.2.1 ROC-SVM 7; 2.1.2.2 b-SVM 9; 2.2 Parameter Selection 10; 2.2.1 Background 10; 2.2.1.1 Grid Search 10; 2.2.1.2 Particle Swarm Optimization (PSO) 10; 2.2.1.3 Genetic Algorithm (GA) 12; 2.2.2 Proposed methods for parameter selection 14; 2.2.2.1 Gamma Parameter Selection: Min-max Gamma Selection 14; 2.2.2.2 Cost Parameter Selection using Min-max Gamma Selection 17; 2.3 Material 18; 2.3.1 Simulation Study 18; 2.3.2 Real Datasets 25; 3 Result 32; 3.1 Strategies for Imbalanced Datasets 32; 3.1.1 Simulation Study 32; 3.1.2 Real Datasets Analysis 34; 3.2 Parameters Selection 34; 3.2.1 Gamma Parameter Selection Result 34; 3.2.1.1 Simulation Study 34; 3.2.1.2 Real Datasets Analysis 39; 3.2.2 Cost Parameter Selection Using Min-max Gamma Selection Result 42; 3.2.2.1 Simulation Study 42; 3.2.2.2 Real Datasets Analysis 46; 4 Discussion and Conclusion 48; 4.1 Strategies for Imbalanced Datasets 48; 4.2 Parameters Selection 49; 4.2.1 Gamma Parameter Selection Result 49; 4.2.2 Cost Parameter Selection Using Min-max Gamma Selection Result 50; Reference 51 | |
| dc.language.iso | en | |
| dc.subject | ROC 曲線 | zh_TW |
| dc.subject | 參數挑選 | zh_TW |
| dc.subject | 支援向量機 | zh_TW |
| dc.subject | 偏差調整 | zh_TW |
| dc.subject | 不平衡資料集 | zh_TW |
| dc.subject | 閾值調整 | zh_TW |
| dc.subject | parameter selection | en |
| dc.subject | imbalanced datasets | en |
| dc.subject | threshold adjustment | en |
| dc.subject | ROC curve | en |
| dc.subject | Support Vector Machine (SVM) | en |
| dc.subject | bias adjustment | en |
| dc.title | 基於不平衡資料進行支援向量機的高斯核函數之參數選擇 | zh_TW |
| dc.title | Efficient selection of Gaussian kernel SVM parameters for imbalanced datasets | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | 碩士 (Master's) | |
| dc.contributor.oralexamcommittee | 蔡欣甫 (Shin-Fu Tsai); 邱春火 (Chun-Huo Chiu) | |
| dc.subject.keyword | 支援向量機,不平衡資料集,閾值調整,ROC 曲線,偏差調整,參數挑選 | zh_TW |
| dc.subject.keyword | Support Vector Machine (SVM), imbalanced datasets, threshold adjustment, ROC curve, bias adjustment, parameter selection | en |
| dc.relation.page | 174 | |
| dc.identifier.doi | 10.6342/NTU202001289 | |
| dc.rights.note | 有償授權 (authorized for a fee) | |
| dc.date.accepted | 2020-07-17 | |
| dc.contributor.author-college | 生物資源暨農學院 | zh_TW |
| dc.contributor.author-dept | 農藝學研究所 | zh_TW |
| Appears in Collections: | 農藝學系 | |
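
The abstract above centers on moving the SVM decision cutoff rather than resampling the data. As a point of reference only, the following is a minimal sketch of the general threshold-movement idea in Python with scikit-learn; it is not the thesis's ROC-SVM or b-SVM procedure, and the toy dataset, split names, and corner-distance criterion are illustrative assumptions.

```python
# Minimal sketch of decision-threshold movement for an imbalanced
# Gaussian-kernel SVM. NOT the thesis's ROC-SVM/b-SVM methods; the
# data, variable names, and selection criterion are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_curve

# Imbalanced toy data: class 1 is the ~5% minority class of interest.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standard RBF-kernel SVM; its default cutoff on the decision
# function is 0, which tends to favor the majority class.
clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_train, y_train)
scores = clf.decision_function(X_val)

# Move the cutoff: pick the ROC point closest to the ideal corner
# (FPR = 0, TPR = 1) on held-out data, then classify with it.
fpr, tpr, thresholds = roc_curve(y_val, scores)
best = np.argmin(fpr**2 + (1.0 - tpr)**2)
cutoff = thresholds[best]
y_pred = (scores >= cutoff).astype(int)
print(f"default cutoff 0.0 -> moved cutoff {cutoff:.3f}")
```

Choosing the cutoff on held-out data rather than the training set matters here: on training data the unmoved SVM can look deceptively accurate simply by predicting the majority class.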
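For the second problem, the record only names the proposed min-max gamma selection without details. As a stand-in under that caveat, the sketch below shows a different, classical way to set the RBF gamma without cross-validation, the median heuristic; `median_heuristic_gamma` is a hypothetical helper written for illustration, not the thesis's method.

```python
# Minimal sketch of choosing the RBF gamma without cross-validation
# via the classical median heuristic. NOT the thesis's min-max gamma
# selection; `median_heuristic_gamma` is a hypothetical helper.
import numpy as np
from scipy.spatial.distance import pdist

def median_heuristic_gamma(X):
    """Set sigma to the median pairwise Euclidean distance among the
    training points and return gamma = 1 / (2 * sigma**2)."""
    sigma = np.median(pdist(X))  # condensed vector of all pairwise distances
    return 1.0 / (2.0 * sigma**2)

# Usage with the sketch above (assumed names):
# clf = SVC(kernel="rbf", gamma=median_heuristic_gamma(X_train), C=1.0)
```

Like the thesis's stated goal, this kind of heuristic avoids the cost of a grid search with k-fold cross-validation by computing gamma directly from the training data.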
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-0307202010591500.pdf (restricted, no public access) | 2.7 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
