Category Mapping for Zero-shot Text Classification

張秋霞; Zhang Qiuxia

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90115

Title:	Category Mapping for Zero-shot Text Classification
Author:	張秋霞 Zhang Qiuxia
Advisor:	張智星 Jyh-Shing Roger Jang
Keywords:	Natural Language Processing, Pretrained Language Models, Zero-shot Text Classification, Classification, GloVe
Year of publication:	2023
Degree:	Master's
Abstract:	Existing zero-shot text classification methods that prompt large pre-trained models benefit from the models' strong representational power and scalability, but their commercial usability is relatively limited. Fine-tuning smaller models on class labels and existing datasets for zero-shot classification is comparatively simple, but such models tend to generalize poorly. This thesis uses three methods to improve the accuracy and generalization of pre-trained models on zero-shot text classification: 1) using a pre-trained language model and converting its inputs into a unified multiple-choice format; 2) building a text classification training set from Wikipedia text and fine-tuning the pre-trained model on it; and 3) a proposed zero-shot category mapping method based on GloVe text similarity, which replaces the target text categories with Wikipedia categories. Without fine-tuning on the target classification labels, this approach achieves results comparable to the best model fine-tuned with those labels.
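The record describes the GloVe-based category mapping and the multiple-choice input format only at a high level. The sketch below is one plausible reading under stated assumptions, not the thesis's implementation. It assumes each label and each Wikipedia category name is embedded by averaging the GloVe vectors of its words, and that every target label is mapped to the Wikipedia category with the highest cosine similarity. It also shows one hypothetical way to render a text and its candidate categories as a multiple-choice prompt. The function names (`load_glove`, `phrase_vector`, `map_labels`, `to_multiple_choice`), the prompt template, and the 3-dimensional `TOY_GLOVE` vectors are illustrative assumptions. Real GloVe files are plain text with one word followed by its vector components per line, which is what `load_glove` parses.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical toy 3-d vectors standing in for real GloVe embeddings.
TOY_GLOVE: Dict[str, Vector] = {
    "sports": [1.0, 0.0, 0.0],
    "football": [0.9, 0.1, 0.0],
    "economy": [0.0, 1.0, 0.0],
    "business": [0.1, 0.9, 0.0],
    "science": [0.0, 0.0, 1.0],
    "technology": [0.1, 0.0, 0.9],
}


def load_glove(path: str) -> Dict[str, Vector]:
    """Read a GloVe text file: each line is a word followed by its vector components."""
    vectors: Dict[str, Vector] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            word, *values = line.rstrip().split(" ")
            vectors[word] = [float(v) for v in values]
    return vectors


def phrase_vector(phrase: str, glove: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of a phrase's in-vocabulary words (underscores treated as spaces)."""
    words = [w for w in phrase.lower().replace("_", " ").split() if w in glove]
    if not words:
        return None  # no word of the phrase has a vector
    dim = len(glove[words[0]])
    return [sum(glove[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_labels(labels: Sequence[str], categories: Sequence[str],
               glove: Dict[str, Vector]) -> Dict[str, Optional[str]]:
    """Map each target label to its most similar category (None if the label has no vector)."""
    cat_vecs = {c: phrase_vector(c, glove) for c in categories}
    usable = {c: v for c, v in cat_vecs.items() if v is not None}
    mapping: Dict[str, Optional[str]] = {}
    for label in labels:
        vec = phrase_vector(label, glove)
        if vec is None:
            mapping[label] = None
            continue
        # Pick the category whose averaged vector is closest in cosine similarity.
        mapping[label] = max(usable, key=lambda c: cosine(vec, usable[c]), default=None)
    return mapping


def to_multiple_choice(text: str, options: Sequence[str]) -> str:
    """Render a text and candidate categories as a lettered prompt (hypothetical template)."""
    lines = ["Which category best describes the text?", f"Text: {text}"]
    lines += [f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)


if __name__ == "__main__":
    wiki = ["Association_football", "Economy", "Science"]
    print(map_labels(["Sports", "Business", "Technology"], wiki, TOY_GLOVE))
    print(to_multiple_choice("The striker scored twice in the final.", wiki))
```

With the toy vectors, the example maps Sports to Association_football, Business to Economy, and Technology to Science. Averaging word vectors is a simple, commonly used way to embed short category names; the record does not state how the thesis builds phrase vectors or how the mapped categories are fed to the multiple-choice model.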
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90115
DOI:	10.6342/NTU202304127
Full-text license:	Authorized (open access worldwide)
Appears in:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-111-2.pdf	3.41 MB	Adobe PDF

