Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90115
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 張智星 [zh_TW]
dc.contributor.advisor: Jyh-Shing Roger Jang [en]
dc.contributor.author: 張秋霞 [zh_TW]
dc.contributor.author: Zhang Qiuxia [en]
dc.date.accessioned: 2023-09-22T17:28:37Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-09-22
dc.date.issued: 2023
dc.date.submitted: 2023-08-14
dc.identifier.citation:
[1] Y. Bengio, R. Ducharme, and P. Vincent. A neural probabilistic language model. Advances in Neural Information Processing Systems, 13, 2000.
[2] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[5] H. Ding, J. Yang, Y. Deng, H. Zhang, and D. Roth. Towards open-domain topic classification. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, pages 90–98, 2022.
[6] S. T. Dumais et al. Latent semantic analysis. Annu. Rev. Inf. Sci. Technol., 38(1):188–230, 2004.
[7] A. Gera, A. Halfon, E. Shnarch, Y. Perlitz, L. Ein-Dor, and N. Slonim. Zero-shot text classification with self-training. arXiv preprint arXiv:2210.17541, 2022.
[8] Z. S. Harris. Distributional structure. Word, 10(2-3):146–162, 1954.
[9] P. He, X. Liu, J. Gao, and W. Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654, 2020.
[10] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[11] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
[12] Y. LeCun, Y. Bengio, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10):1995, 1995.
[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[14] H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309–317, 1957.
[15] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[16] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
[17] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018.
[18] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al. Improving language understanding by generative pre-training. 2018.
[19] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
[20] G. Salton, A. Wong, and C.-S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.
[21] D. Svozil, V. Kvasnicka, and J. Pospichal. Introduction to multi-layer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems, 39(1):43–62, 1997.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
[23] P. Yang, J. Wang, R. Gan, X. Zhu, L. Zhang, Z. Wu, X. Gao, J. Zhang, and T. Sakai. Zero-shot learners for natural language understanding via a unified multiple choice perspective. arXiv preprint arXiv:2210.08590, 2022.
[24] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32, 2019.
[25] X. Zhang, J. Zhao, and Y. LeCun. Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems, 28, 2015.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90115
dc.description.abstract: 現有基於大型的預訓練模型並加入提示進行零樣本文本分類的方法,具有模型自身強大的表示能力和擴展性,但商業可用性相對較差。利用類別標籤和已有資料集微調較小的模型進行零樣本分類的方法相對簡便,但存在模型泛化能力較弱等問題。本文使用了三種方法來提高預訓練模型在零樣本文本分類任務上的準確性和泛化能力:1. 使用預訓練語言模型,將其輸入整理成統一的多項選擇格式;2. 利用維基百科文本數據構建文本分類訓練集,對預訓練模型進行微調;3. 提出了基於 GloVe 文本相似度的零樣本類別映射方法,使用維基百科類別代替文本類別。不使用待分類標籤進行微調的情況下,該方法取得了與使用待分類標籤進行微調的最佳模型相當的效果。 [zh_TW]
dc.description.abstract: The existing method of using large pre-trained models with prompts for zero-shot text classification has powerful representation ability and scalability. However, its commercial availability is relatively poor. The method of using class labels and existing datasets to fine-tune smaller models for zero-shot classification is relatively simple, but it may suffer from weaker model generalization ability. This paper proposes three methods to improve the accuracy and generalization ability of pre-trained models in zero-shot text classification tasks: 1) using pre-trained language models and formatting inputs into a unified multiple-choice format; 2) constructing a text classification training set using Wikipedia text data and fine-tuning the pre-trained model; and 3) proposing a zero-shot category mapping method based on GloVe text similarity, using Wikipedia categories to replace textual categories. Without using labeled samples for fine-tuning, the proposed method achieves results comparable to the best models fine-tuned with labeled samples. [en]
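To make the GloVe-based category mapping described in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the thesis's actual implementation: it assumes pre-trained GloVe vectors in the standard plain-text format (for example glove.6B.300d.txt), represents each category name by the average of its word vectors, and ranks candidate Wikipedia categories for each target label by cosine similarity. The file path, function names, and example labels are hypothetical.

import numpy as np

def load_glove(path):
    # Parse a GloVe text file: each line is "word v1 v2 ... vD".
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def phrase_vector(phrase, vectors):
    # Represent a (possibly multi-word) category name as the mean of its word vectors.
    words = [w for w in phrase.lower().split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0) if words else None

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def map_categories(target_labels, wiki_categories, vectors, top_k=3):
    # For each target label, rank Wikipedia categories by GloVe cosine
    # similarity and keep the top_k most similar ones as substitutes.
    mapping = {}
    for label in target_labels:
        label_vec = phrase_vector(label, vectors)
        scored = []
        for category in wiki_categories:
            cat_vec = phrase_vector(category, vectors)
            if label_vec is not None and cat_vec is not None:
                scored.append((cosine(label_vec, cat_vec), category))
        mapping[label] = [c for _, c in sorted(scored, reverse=True)[:top_k]]
    return mapping

# Hypothetical usage:
# vectors = load_glove("glove.6B.300d.txt")
# print(map_categories(["sports", "business"],
#                      ["Association football", "Economics", "Basketball"], vectors))

In such a sketch, the top-ranked Wikipedia categories would stand in for the original dataset labels when querying the zero-shot classifier, which is the substitution role the abstract attributes to category mapping.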
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:28:37Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-09-22T17:28:37Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Thesis Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Motivation
1.2 Research Contributions
1.3 Chapter Overview
Chapter 2: Literature Review
2.1 Word Vectors
2.1.1 Static Word Vector Models
2.1.1.1 LSA
2.1.1.2 Word2vec
2.1.1.3 GloVe
2.1.2 Dynamic Word Vector Models
2.2 Language Models
2.2.1 Traditional Language Models
2.2.2 Neural Network-Based Language Models
2.2.3 Pre-trained Language Models
2.2.4 Transformer-Based Pre-trained Language Models
2.2.4.1 GPT
2.2.4.2 BERT
2.2.4.3 ALBERT
2.2.4.4 RoBERTa
2.2.4.5 DeBERTa
2.3 Existing Zero-Shot Text Classification Models
2.3.1 Large Generative Language Models with Prompt-Based Methods
2.3.2 Smaller Pre-trained Language Models with Fine-Tuning
2.3.2.1 Methods Based on Natural Language Inference
2.3.2.2 UniMC
Chapter 3: Datasets
3.1 Yahoo! Answers
3.2 AG News
3.3 DBpedia
3.4 IMDB
Chapter 4: Methodology
4.1 Model Fine-Tuning
4.1.1 Acquiring Open-Domain Training Data
4.1.2 Model Input Formatting
4.2 Category Mapping
4.2.1 Category Mapping Preprocessing
4.2.2 Category Mapping
4.2.3 Using the Substitute Word List
4.2.4 Substitute Word Filtering Mechanism
Chapter 5: Experimental Design and Results
5.1 Experimental Tasks
5.2 Experimental Procedure and Settings
5.2.1 Experimental Procedure
5.2.2 Experimental Settings
5.3 Experimental Results
5.3.1 Experiment 1: Performance Comparison of Zero-Shot Text Classification Models
5.3.2 Experiment 2.1: Model Performance Before and After Fine-Tuning with Wikipedia Data
5.3.3 Experiment 2.2: Effect of the Number of Training Data Categories
5.3.4 Experiment 3.1: Effect of the Substitute Word List
5.3.5 Experiment 3.2: Effect of the Filtering Mechanism
5.3.6 Experiment 4: Ablation Study of Wikipedia Fine-Tuning and Category Mapping
5.3.7 Experiment 5.1: Model Performance Before and After Applying the Proposed Methods
5.3.8 Experiment 5.2: Performance Comparison of UniMC-WiKi and the Best Model
Chapter 6: Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
dc.language.iso: zh_TW
dc.subject: 自然語言處理 [zh_TW]
dc.subject: 零樣本文本分類 [zh_TW]
dc.subject: 預訓練語言模型 [zh_TW]
dc.subject: 分類 [zh_TW]
dc.subject: GloVe [zh_TW]
dc.subject: Pretrained Language Models [en]
dc.subject: Zero Shot Text Classification [en]
dc.subject: Classification [en]
dc.subject: Natural Language Processing [en]
dc.subject: GloVe [en]
dc.title: 使用類別映射的零樣本文本分類 [zh_TW]
dc.title: Category Mapping for Zero-shot Text Classification [en]
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 蔡宗翰; 陳縕儂 [zh_TW]
dc.contributor.oralexamcommittee: Richard Tzong-Han Tsai; Yun-Nung Chen [en]
dc.subject.keyword: 自然語言處理, 預訓練語言模型, 零樣本文本分類, 分類, GloVe [zh_TW]
dc.subject.keyword: Natural Language Processing, Pretrained Language Models, Zero Shot Text Classification, Classification, GloVe [en]
dc.relation.page: 70
dc.identifier.doi: 10.6342/NTU202304127
dc.rights.note: 同意授權(全球公開) (authorization granted; open access worldwide)
dc.date.accepted: 2023-08-14
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File | Size | Format
ntu-111-2.pdf | 3.41 MB | Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
