Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21342
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 歐陽彥正(Yen-Jen Oyang) | |
dc.contributor.author | Chun-Yu Huang | en |
dc.contributor.author | 黃鈞宥 | zh_TW |
dc.date.accessioned | 2021-06-08T03:31:32Z | - |
dc.date.copyright | 2019-08-20 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-08-12 | |
dc.identifier.citation | 1. M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017. 2. A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 3. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 4. G. Huang, Z. Liu, and K. Q. Weinberger. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016. 5. H. Sak, A. Senior, and F. Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proc. Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2014, pp. 338–342. [Online]. Available: http://193.6.4.39/~czap/letoltes/IS14/IS2014/PDF/AUTHOR/IS141304.PDF 6. J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555, 2014. 7. Y. Sun, A. C. Wong, and M. S. Kamel. Classification of imbalanced data: a review. International Journal of Pattern Recognition and Artificial Intelligence, 23(4):687–719, 2009. 8. Y. Yan, M. Chen, M.-L. Shyu, and S.-C. Chen. Deep learning for imbalanced multimedia data classification. In 2015 IEEE International Symposium on Multimedia (ISM), pp. 483–488, 2015. 9. S. H. Khan, M. Bennamoun, F. Sohel, and R. Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. arXiv preprint arXiv:1508.03422v1, 2015. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21342 | - |
dc.description.abstract | 深度學習在電腦視覺和語音辨識的領域舉足輕重,越來越多深度學習的應用程式在各個領域獲得突破。為了獲得足夠良好的效果,具有良好特性的資料集最為關鍵。資料量龐大、素質整齊、雜訊頻率低、分佈平均等特性能夠幫助深度學習模型在學習問題上獲得足夠好的表現。但是,真實世界的資料會因為充斥雜訊而錯誤分類或是資料量巨大難以逐一確定標籤,而正確的標籤極其昂貴而耗時的,這造就了大量的數量不平衡的資料。隨著大數據時代的來臨,各類型的應用有多元的需求,而不僅僅是要求準確率,諸如召回率和精確度。本篇論文討論了不平衡資料集的特性,並探討了召回率和精確率是否能夠被目標函數控制,最後使用了交叉熵和相對熵提升召回率。最後分類器在序列型資料上可以提升5%的召回率並討論了為什麼在影像資料上沒有獲得同等的效果。 | zh_TW |
dc.description.abstract | Deep learning has become important in speech recognition and computer vision, and it has been deployed in many applications. Typically, large, good-quality, clean, and balanced datasets are necessary for a deep learning model to achieve good performance. However, labeling a great deal of data is expensive and time-consuming, so large real-world datasets are generally extremely imbalanced and noisy. Moreover, researchers and users often target metrics such as recall and precision rather than accuracy alone. This thesis discusses the properties of imbalanced datasets and examines whether recall and precision can be controlled through the objective function. It proposes using cross entropy together with Kullback–Leibler divergence to boost recall. The resulting classifier raises recall by about 5% on a sequential dataset, but fails to do so on image classification, and the thesis discusses why. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T03:31:32Z (GMT). No. of bitstreams: 1 ntu-108-R06922104-1.pdf: 1781440 bytes, checksum: f2bf6232734ffabbba9551b4ba25ee1c (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Acknowledgements II
摘要 III
Abstract IV
Table of Contents V
Chapter 1. Introduction 1
Natural Language Processing 2
Image Process 3
Chapter 2. Related Work 4
Imbalance Dataset Composition 6
Gaussian Mixture Model (GMM) 6
Under-sampling 7
Oversampling/Resampling 9
Cost-sensitive Classification 9
Balanced Batch 11
Chapter 3. Relationship between Recall and Objective Function 12
Objective Function 12
Softmax 13
Kullback-Leibler Divergence 14
Experiment on Public Dataset 16
Dataset 16
Boosting Recall with KL-Divergence 16
Experiment and Empirical Result on Cifar-10 17
Experiment and Empirical Result on sentiment140 24
Chapter 4. Discussion and Conclusion 29
How Image Dataset Fails Our Assumption 29
Limitations for KL-Divergence on Classifier 30
Future Work 30
Reference 31 | |
dc.language.iso | en | |
dc.title | 提升不平衡輸入資料集下分類器之召回率 | zh_TW |
dc.title | Boosting recall of data classifiers with imbalanced input datasets | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 韓謝忱(HSIEH-CHEN HAN),黃乾綱(Chien-Kang Huang),金傳春(Chwan-Chuen King) | |
dc.subject.keyword | 深度學習,分類器,不平衡資料, | zh_TW |
dc.subject.keyword | Deep Learning,classifier,Imbalanced Dataset, | en |
dc.relation.page | 31 | |
dc.identifier.doi | 10.6342/NTU201903027 | |
dc.rights.note | Not authorized | |
dc.date.accepted | 2019-08-13 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
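The abstract above describes combining cross entropy with a Kullback–Leibler divergence term to boost recall on imbalanced data. A minimal sketch of what such a recall-weighted objective could look like is given below; the function names, the choice of target distribution, and the `beta` weighting are all illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def recall_weighted_loss(logits, labels, target_dist, beta=0.5, eps=1e-12):
    """Cross entropy plus a KL term pulling the batch-average predicted
    class distribution toward `target_dist` (e.g. one skewed toward the
    minority class). `beta` trades accuracy against recall. This is a
    hypothetical sketch, not the thesis's objective function."""
    probs = softmax(logits)
    n = len(labels)
    # Standard cross entropy on the true labels.
    ce = -np.mean(np.log(probs[np.arange(n), labels] + eps))
    # KL(target || mean predicted distribution over the batch).
    mean_pred = probs.mean(axis=0)
    kl = np.sum(target_dist * np.log((target_dist + eps) / (mean_pred + eps)))
    return ce + beta * kl

# Tiny example: two classes, with class 1 as the minority class.
logits = np.array([[2.0, 0.1], [1.5, 0.3], [0.2, 1.8]])
labels = np.array([0, 0, 1])
target = np.array([0.5, 0.5])  # assumed target favoring balance
loss = recall_weighted_loss(logits, labels, target, beta=0.5)
```

Because the KL term is non-negative, increasing `beta` can only increase the loss whenever the batch-average prediction deviates from the target, which is what pressures the classifier toward the minority class.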
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (Restricted Access) | 1.74 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.