Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90115
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 張智星 [zh_TW]
dc.contributor.advisor: Jyh-Shing Roger Jang [en]
dc.contributor.author: 張秋霞 [zh_TW]
dc.contributor.author: Zhang Qiuxia [en]
dc.date.accessioned: 2023-09-22T17:28:37Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-09-22
dc.date.issued: 2023
dc.date.submitted: 2023-08-14
dc.identifier.citation:
[1] Y. Bengio, R. Ducharme, and P. Vincent. A neural probabilistic language model. Advances in Neural Information Processing Systems, 13, 2000.
[2] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[5] H. Ding, J. Yang, Y. Deng, H. Zhang, and D. Roth. Towards open-domain topic classification. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, pages 90–98, 2022.
[6] S. T. Dumais et al. Latent semantic analysis. Annu. Rev. Inf. Sci. Technol., 38(1):188–230, 2004.
[7] A. Gera, A. Halfon, E. Shnarch, Y. Perlitz, L. Ein-Dor, and N. Slonim. Zero-shot text classification with self-training. arXiv preprint arXiv:2210.17541, 2022.
[8] Z. S. Harris. Distributional structure. Word, 10(2-3):146–162, 1954.
[9] P. He, X. Liu, J. Gao, and W. Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654, 2020.
[10] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[11] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
[12] Y. LeCun, Y. Bengio, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10):1995, 1995.
[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[14] H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309–317, 1957.
[15] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[16] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
[17] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018.
[18] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al. Improving language understanding by generative pre-training. 2018.
[19] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
[20] G. Salton, A. Wong, and C.-S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.
[21] D. Svozil, V. Kvasnicka, and J. Pospichal. Introduction to multi-layer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems, 39(1):43–62, 1997.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
[23] P. Yang, J. Wang, R. Gan, X. Zhu, L. Zhang, Z. Wu, X. Gao, J. Zhang, and T. Sakai. Zero-shot learners for natural language understanding via a unified multiple choice perspective. arXiv preprint arXiv:2210.08590, 2022.
[24] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32, 2019.
[25] X. Zhang, J. Zhao, and Y. LeCun. Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems, 28, 2015.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90115
dc.description.abstract: 現有基於大型的預訓練模型並加入提示進行零樣本文本分類的方法,具有模型自身強大的表示能力和擴展性,但商業可用性相對較差。利用類別標籤和已有資料集微調較小的模型進行零樣本分類的方法相對簡便,但存在模型泛化能力較弱等問題。本文使用了三種方法來提高預訓練模型在零樣本文本分類任務上的準確性和泛化能力:1. 使用預訓練語言模型,將其輸入整理成統一的多項選擇格式;2. 利用維基百科文本數據構建文本分類訓練集,對預訓練模型進行微調;3. 提出了基於 GloVe 文本相似度的零樣本類別映射方法,使用維基百科類別代替文本類別。不使用待分類標籤進行微調的情況下,該方法取得了與使用待分類標籤進行微調的最佳模型相當的效果。 [zh_TW]
dc.description.abstract: The existing method of using large pre-trained models with prompts for zero-shot text classification has powerful representation ability and scalability. However, its commercial availability is relatively poor. The method of using class labels and existing datasets to fine-tune smaller models for zero-shot classification is relatively simple, but it may suffer from weaker model generalization ability. This paper proposes three methods to improve the accuracy and generalization ability of pre-trained models in zero-shot text classification tasks: 1) using pre-trained language models and formatting inputs into a unified multiple-choice format; 2) constructing a text classification training set using Wikipedia text data and fine-tuning the pre-trained model; and 3) proposing a zero-shot category mapping method based on GloVe text similarity, using Wikipedia categories to replace textual categories. Without using labeled samples for fine-tuning, the proposed method achieves results comparable to the best models fine-tuned with labeled samples. [en]
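To make the GloVe-based category mapping described in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the thesis's actual implementation: it assumes pre-trained GloVe vectors in the standard plain-text format (for example glove.6B.300d.txt), represents each category name by the average of its word vectors, and ranks candidate Wikipedia categories for each target label by cosine similarity. The file path, function names, and example labels are hypothetical.

import numpy as np

def load_glove(path):
    # Parse a GloVe text file: each line is "word v1 v2 ... vD".
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def phrase_vector(phrase, vectors):
    # Represent a (possibly multi-word) category name as the mean of its word vectors.
    words = [w for w in phrase.lower().split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0) if words else None

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def map_categories(target_labels, wiki_categories, vectors, top_k=3):
    # For each target label, rank Wikipedia categories by GloVe cosine
    # similarity and keep the top_k most similar ones as substitutes.
    mapping = {}
    for label in target_labels:
        label_vec = phrase_vector(label, vectors)
        scored = []
        for category in wiki_categories:
            cat_vec = phrase_vector(category, vectors)
            if label_vec is not None and cat_vec is not None:
                scored.append((cosine(label_vec, cat_vec), category))
        mapping[label] = [c for _, c in sorted(scored, reverse=True)[:top_k]]
    return mapping

# Hypothetical usage:
# vectors = load_glove("glove.6B.300d.txt")
# print(map_categories(["sports", "business"],
#                      ["Association football", "Economics", "Basketball"], vectors))

In such a sketch, the top-ranked Wikipedia categories would stand in for the original dataset labels when querying the zero-shot classifier, which is the substitution role the abstract attributes to category mapping.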
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:28:37Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-09-22T17:28:37Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Thesis Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Motivation
1.2 Research Contributions
1.3 Chapter Overview
Chapter 2: Literature Review
2.1 Word Vectors
2.1.1 Static Word Vector Models
2.1.1.1 LSA
2.1.1.2 Word2vec
2.1.1.3 GloVe
2.1.2 Dynamic Word Vector Models
2.2 Language Models
2.2.1 Traditional Language Models
2.2.2 Neural Network-Based Language Models
2.2.3 Pre-trained Language Models
2.2.4 Transformer-Based Pre-trained Language Models
2.2.4.1 GPT
2.2.4.2 BERT
2.2.4.3 ALBERT
2.2.4.4 RoBERTa
2.2.4.5 DeBERTa
2.3 Existing Zero-Shot Text Classification Models
2.3.1 Large Generative Language Models with Prompt-Based Methods
2.3.2 Smaller Pre-trained Language Models with Fine-Tuning
2.3.2.1 Methods Based on Natural Language Inference
2.3.2.2 UniMC
Chapter 3: Datasets
3.1 Yahoo! Answers
3.2 AG News
3.3 DBpedia
3.4 IMDB
Chapter 4: Methodology
4.1 Model Fine-Tuning
4.1.1 Acquiring Open-Domain Training Data
4.1.2 Model Input Formatting
4.2 Category Mapping
4.2.1 Category Mapping Preprocessing
4.2.2 Category Mapping
4.2.3 Using the Substitute Word List
4.2.4 Substitute Word Filtering Mechanism
Chapter 5: Experimental Design and Results
5.1 Experimental Tasks
5.2 Experimental Procedure and Settings
5.2.1 Experimental Procedure
5.2.2 Experimental Settings
5.3 Experimental Results
5.3.1 Experiment 1: Performance Comparison of Zero-Shot Text Classification Models
5.3.2 Experiment 2.1: Model Performance Before and After Fine-Tuning with Wikipedia Data
5.3.3 Experiment 2.2: Effect of the Number of Training Data Categories
5.3.4 Experiment 3.1: Effect of the Substitute Word List
5.3.5 Experiment 3.2: Effect of the Filtering Mechanism
5.3.6 Experiment 4: Ablation Study of Wikipedia Fine-Tuning and Category Mapping
5.3.7 Experiment 5.1: Model Performance Before and After Applying the Proposed Methods
5.3.8 Experiment 5.2: Performance Comparison of UniMC-WiKi and the Best Model
Chapter 6: Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
dc.language.iso: zh_TW
dc.subject: 自然語言處理 [zh_TW]
dc.subject: 零樣本文本分類 [zh_TW]
dc.subject: 預訓練語言模型 [zh_TW]
dc.subject: 分類 [zh_TW]
dc.subject: GloVe [zh_TW]
dc.subject: Pretrained Language Models [en]
dc.subject: Zero Shot Text Classification [en]
dc.subject: Classification [en]
dc.subject: Natural Language Processing [en]
dc.subject: GloVe [en]
dc.title: 使用類別映射的零樣本文本分類 [zh_TW]
dc.title: Category Mapping for Zero-shot Text Classification [en]
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 蔡宗翰; 陳縕儂 [zh_TW]
dc.contributor.oralexamcommittee: Richard Tzong-Han Tsai; Yun-Nung Chen [en]
dc.subject.keyword: 自然語言處理, 預訓練語言模型, 零樣本文本分類, 分類, GloVe [zh_TW]
dc.subject.keyword: Natural Language Processing, Pretrained Language Models, Zero Shot Text Classification, Classification, GloVe [en]
dc.relation.page: 70
dc.identifier.doi: 10.6342/NTU202304127
dc.rights.note: 同意授權(全球公開) (authorization granted; open access worldwide)
dc.date.accepted: 2023-08-14
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File | Size | Format
ntu-111-2.pdf | 3.41 MB | Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
