Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21342
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 歐陽彥正(Yen-Jen Oyang) | |
dc.contributor.author | Chun-Yu Huang | en |
dc.contributor.author | 黃鈞宥 | zh_TW |
dc.date.accessioned | 2021-06-08T03:31:32Z | - |
dc.date.copyright | 2019-08-20 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-08-12 | |
dc.identifier.citation | 1. M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017. 2. A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 3. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 4. G. Huang, Z. Liu, and K. Q. Weinberger. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016. 5. H. Sak, A. Senior, and F. Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proc. Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2014, pp. 338–342. [Online]. Available: http://193.6.4.39/~czap/letoltes/IS14/IS2014/PDF/AUTHOR/IS141304.PDF 6. J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555, 2014. 7. Y. Sun, A. C. Wong, and M. S. Kamel. Classification of imbalanced data: a review. International Journal of Pattern Recognition and Artificial Intelligence, 23(4):687–719, 2009. 8. Y. Yan, M. Chen, M.-L. Shyu, and S.-C. Chen. Deep learning for imbalanced multimedia data classification. In 2015 IEEE International Symposium on Multimedia (ISM), pp. 483–488, 2015. 9. S. H. Khan, M. Bennamoun, F. Sohel, and R. Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. arXiv preprint arXiv:1508.03422v1, 2015. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21342 | - |
dc.description.abstract | 深度學習在電腦視覺和語音辨識的領域舉足輕重,越來越多深度學習的應用程式在各個領域獲得突破。為了獲得足夠良好的效果,具有良好特性的資料集最為關鍵。資料量龐大、素質整齊、雜訊頻率低、分佈平均等特性能夠幫助深度學習模型在學習問題上獲得足夠好的表現。但是,真實世界的資料會因為充斥雜訊而錯誤分類或是資料量巨大難以逐一確定標籤,而正確的標籤極其昂貴而耗時的,這造就了大量的數量不平衡的資料。隨著大數據時代的來臨,各類型的應用有多元的需求,而不僅僅是要求準確率,諸如召回率和精確度。本篇論文討論了不平衡資料集的特性,並探討了召回率和精確率是否能夠被目標函數控制,最後使用了交叉熵和相對熵提升召回率。最後分類器在序列型資料上可以提升5%的召回率並討論了為什麼在影像資料上沒有獲得同等的效果。 | zh_TW |
dc.description.abstract | Deep learning has become important in speech recognition and computer vision, and it has been deployed in many applications. Typically, large, good-quality, clean, and balanced datasets are necessary for a deep learning model to achieve good performance. However, labeling a great deal of data is expensive and time-consuming, so large real-world datasets are generally extremely imbalanced and noisy. Moreover, researchers and users often target metrics such as recall and precision rather than accuracy alone. This thesis discusses the properties of imbalanced datasets and examines whether recall and precision can be controlled through the objective function. It proposes using cross entropy together with Kullback–Leibler divergence to boost recall. The resulting classifier raises recall by about 5% on a sequential dataset, but fails to do so on image classification, and the thesis discusses why. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T03:31:32Z (GMT). No. of bitstreams: 1 ntu-108-R06922104-1.pdf: 1781440 bytes, checksum: f2bf6232734ffabbba9551b4ba25ee1c (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Acknowledgements II
摘要 III
Abstract IV
Table of Contents V
Chapter 1. Introduction 1
Natural Language Processing 2
Image Process 3
Chapter 2. Related Work 4
Imbalance Dataset Composition 6
Gaussian Mixture Model (GMM) 6
Under-sampling 7
Oversampling/Resampling 9
Cost-sensitive Classification 9
Balanced Batch 11
Chapter 3. Relationship between Recall and Objective Function 12
Objective Function 12
Softmax 13
Kullback-Leibler Divergence 14
Experiment on Public Dataset 16
Dataset 16
Boosting Recall with KL-Divergence 16
Experiment and Empirical Result on Cifar-10 17
Experiment and Empirical Result on sentiment140 24
Chapter 4. Discussion and Conclusion 29
How Image Dataset Fails Our Assumption 29
Limitations for KL-Divergence on Classifier 30
Future Work 30
Reference 31 | |
dc.language.iso | en | |
dc.title | 提升不平衡輸入資料集下分類器之召回率 | zh_TW |
dc.title | Boosting recall of data classifiers with imbalanced input datasets | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 韓謝忱(HSIEH-CHEN HAN),黃乾綱(Chien-Kang Huang),金傳春(Chwan-Chuen King) | |
dc.subject.keyword | 深度學習,分類器,不平衡資料, | zh_TW |
dc.subject.keyword | Deep Learning,classifier,Imbalanced Dataset, | en |
dc.relation.page | 31 | |
dc.identifier.doi | 10.6342/NTU201903027 | |
dc.rights.note | Not authorized | |
dc.date.accepted | 2019-08-13 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
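The abstract above describes combining cross entropy with a Kullback–Leibler divergence term to boost recall on imbalanced data. A minimal sketch of what such a recall-weighted objective could look like is given below; the function names, the choice of target distribution, and the `beta` weighting are all illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def recall_weighted_loss(logits, labels, target_dist, beta=0.5, eps=1e-12):
    """Cross entropy plus a KL term pulling the batch-average predicted
    class distribution toward `target_dist` (e.g. one skewed toward the
    minority class). `beta` trades accuracy against recall. This is a
    hypothetical sketch, not the thesis's objective function."""
    probs = softmax(logits)
    n = len(labels)
    # Standard cross entropy on the true labels.
    ce = -np.mean(np.log(probs[np.arange(n), labels] + eps))
    # KL(target || mean predicted distribution over the batch).
    mean_pred = probs.mean(axis=0)
    kl = np.sum(target_dist * np.log((target_dist + eps) / (mean_pred + eps)))
    return ce + beta * kl

# Tiny example: two classes, with class 1 as the minority class.
logits = np.array([[2.0, 0.1], [1.5, 0.3], [0.2, 1.8]])
labels = np.array([0, 0, 1])
target = np.array([0.5, 0.5])  # assumed target favoring balance
loss = recall_weighted_loss(logits, labels, target, beta=0.5)
```

Because the KL term is non-negative, increasing `beta` can only increase the loss whenever the batch-average prediction deviates from the target, which is what pressures the classifier toward the minority class.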
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (Restricted Access) | 1.74 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.