Boosting Recall of Data Classifiers with Imbalanced Input Datasets

Chun-Yu Huang; 黃鈞宥

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21342

Title:	Boosting Recall of Data Classifiers with Imbalanced Input Datasets (original title: 提升不平衡輸入資料集下分類器之召回率)
Author:	Chun-Yu Huang (黃鈞宥)
Advisor:	Yen-Jen Oyang (歐陽彥正)
Keywords:	Deep Learning, Classifier, Imbalanced Dataset
Year of publication:	2019
Degree:	Master's
Abstract:	Deep learning plays a central role in computer vision and speech recognition, and a growing number of deep learning applications are achieving breakthroughs in other fields. Good results depend most critically on datasets with good properties: a large volume of data, consistent quality, little noise, and a balanced class distribution all help a deep learning model perform well. Real-world data, however, is often noisy and therefore mislabeled, or too large for every label to be verified individually, and obtaining correct labels is expensive and time-consuming. As a result, large real-world datasets tend to be heavily imbalanced. Applications also have diverse requirements that go beyond accuracy, such as recall and precision. This thesis discusses the properties of imbalanced datasets, investigates whether recall and precision can be controlled through the objective function, and proposes using cross-entropy and Kullback–Leibler divergence (relative entropy) to boost recall. The resulting classifier typically improves recall by 5% on sequential data. The thesis also discusses why the same improvement was not obtained on image data.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21342
DOI:	10.6342/NTU201903027
Full-text authorization:	Not authorized
Appears in collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	1.74 MB	Adobe PDF

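Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.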
