NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85492
Full metadata record
DC field / value / language
dc.contributor.advisor: 鄭卜壬 (Pu-Jen Cheng)
dc.contributor.author: XIAOYU SI [en]
dc.contributor.author: 斯曉宇 [zh_TW]
dc.date.accessioned: 2023-03-19T23:17:25Z
dc.date.copyright: 2022-07-15
dc.date.issued: 2022
dc.date.submitted: 2022-07-12
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85492
dc.description.abstract: 數位資訊時代,人們習慣於從網路中便捷地獲取即時新聞資訊。當前社群媒體中許多短篇新聞為了達到吸引眼球的目的會出現題不對文的問題,使得讀者很難通過標題直觀了解感興趣的新聞內容。近年來,隨著深度學習的興起,針對新聞標題生成任務的研究也逐漸從基於RNN的方法發展到基於Transformer的方法。然而,現有的工作中仍然存在以下問題:首先,我們發現很多模型對新聞內容的理解不足,很難最大化利用內容中的有效資訊。其次,由於一篇新聞中往往有多個重點,當前許多模型不一定有能力捕獲那些適合作為標題的相關資訊。針對以上兩點問題,本研究以GPT-2模型為基礎架構提出了相應的改進方法,並設計了一個兩階段訓練過程。在預訓練階段,我們通過重新設計注意力遮罩解決文章資訊利用不足的問題;在微調階段,模型將同時進行上下文和語言模型的學習,進一步引導模型關注文章內容中與標題最相關的資訊。最後在實驗部分,我們比較了基於注意力的序列到序列生成模型、指針神經網路、基礎GPT-2模型及本研究提出的改進式GPT-2模型在標題生成任務上的表現差異。結果通過機器評估與人工評價均驗證了改進式GPT-2模型有能力生成符合文意且品質較高的新聞標題。 [zh_TW]
dc.description.abstract: With the rapid development of the Internet, people have become accustomed to obtaining instant news conveniently from social media. Many short news items on social media use eye-catching titles that are inconsistent with the facts, making it hard for readers to judge the content from the title alone. In the past few years, neural text summarization has developed from RNN-based to Transformer-based methods. However, some existing works still have the following problems: (1) these models often suffer from an insufficient understanding of the content, which keeps them from fully exploiting the information in the article; (2) an article often contains multiple key points, and many current models are not necessarily capable of capturing the information that is most suitable for the title. In view of these two problems, we improve the original GPT-2 architecture and design a two-stage training scheme. In the pre-training phase, we redesign the attention mask to improve content understanding without disclosing title information. In the fine-tuning phase, the model learns both context prediction and language modeling. Finally, we compare the RNN-based Seq2Seq model, the Pointer-Generator network, the base GPT-2 model, and the improved GPT-2 model on the title generation task. Both automatic and human evaluation verify that the improved GPT-2 model can generate high-quality news titles that are faithful to the article. [en]
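The two-stage design described in the abstract hinges on a redesigned self-attention mask that lets the model read the article fully while keeping article positions from seeing the title. As a rough illustration only, the sketch below builds a UniLM-style mask under the assumption that the article tokens occupy the first src_len positions and the title tokens follow; the function name and token layout are hypothetical and not the thesis's published design (see Section 4.2.1 of the full text for that).

    import torch

    def build_title_generation_mask(src_len: int, tgt_len: int) -> torch.Tensor:
        # Illustrative sketch: article tokens attend to every article token
        # (bidirectional), title tokens attend to the whole article plus only
        # earlier title tokens (causal). True marks positions that may be attended.
        total = src_len + tgt_len
        mask = torch.zeros(total, total, dtype=torch.bool)
        mask[:src_len, :src_len] = True   # article block: bidirectional
        mask[src_len:, :src_len] = True   # title -> article: full visibility
        mask[src_len:, src_len:] = torch.tril(
            torch.ones(tgt_len, tgt_len, dtype=torch.bool))  # title -> title: causal
        return mask

    # Example: a 5-token article followed by a 3-token title.
    print(build_title_generation_mask(5, 3).int())

In a GPT-2 style implementation such a mask would typically be applied in every layer by setting the attention logits of the False positions to a large negative value before the softmax.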
dc.description.provenance: Made available in DSpace on 2023-03-19T23:17:25Z (GMT). No. of bitstreams: 1. U0001-1107202215074200.pdf: 2113998 bytes, checksum: 4b97c0363d9cd175434d56570e1b620a (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
  Acknowledgements / Abstract (Chinese) / Abstract (English) / Table of Contents / List of Figures / List of Tables
  Chapter 1  Introduction
  Chapter 2  Related Work
    2.1 Extractive summarization methods
    2.2 Abstractive summarization methods
    2.3 Transformer-based models
  Chapter 3  Research Problem
    3.1 Overview of the research data
    3.2 Problem definition
    3.3 Notation
  Chapter 4  Methodology
    4.1 The GPT-2 model
    4.2 Improved GPT-2
      4.2.1 Self-attention mask design
      4.2.2 Context learning task
    4.3 Two-stage training
      4.3.1 Pre-training
      4.3.2 Fine-tuning
  Chapter 5  Experiments and Analysis
    5.1 Dataset
    5.2 Preprocessing
    5.3 Baseline models
      5.3.1 Attention-based sequence-to-sequence model
      5.3.2 Pointer network
    5.4 Model performance and analysis
      5.4.1 Automatic evaluation
      5.4.2 Human evaluation
      5.4.3 Case study
  Chapter 6  Conclusions and Future Work
    6.1 Conclusions
    6.2 Future work
  References
dc.language.iso: zh-TW
dc.subject: 注意力遮罩 [zh_TW]
dc.subject: 上下文學習 [zh_TW]
dc.subject: GPT-2 [zh_TW]
dc.subject: 標題生成 [zh_TW]
dc.subject: 自然語言處理 [zh_TW]
dc.subject: GPT-2 [en]
dc.subject: Natural language processing [en]
dc.subject: Title Generation [en]
dc.subject: Attention Mask [en]
dc.subject: Context Learning [en]
dc.title: 基於改進式GPT-2的新聞標題生成 [zh_TW]
dc.title: An Improved GPT-2 Model for News Title Generation [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 魏志達 (Jyh-Da Wei), 林正偉 (Jeng-Wei Lin)
dc.subject.keyword: 自然語言處理, 標題生成, GPT-2, 注意力遮罩, 上下文學習 [zh_TW]
dc.subject.keyword: Natural language processing, Title Generation, GPT-2, Attention Mask, Context Learning [en]
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU202201398
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2022-07-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
dc.date.embargo-lift: 2022-07-15
Appears in collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
File: U0001-1107202215074200.pdf (2.06 MB, Adobe PDF)
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
