Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89147

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 莊裕澤 | zh_TW |
| dc.contributor.advisor | Yuh-Jzer Joung | en |
| dc.contributor.author | 陳恩穎 | zh_TW |
| dc.contributor.author | En-Ying Chen | en |
| dc.date.accessioned | 2023-08-16T17:19:34Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-08-16 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-09 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89147 | - |
| dc.description.abstract | 近年來網路以及通訊裝置以驚人的速度發展,同時也帶動了社群媒體的蓬勃發展,網路新聞成為民眾獲取新知的主要媒介。而在這個新媒體時代,對於媒體從業人員來說,若希望自己能在各個社群平台的激烈競爭下都保有一席之地,勢必要能夠創造出多樣化版本的新聞,以滿足不同平台的觀眾。此時,若有一個系統能將現有新聞快速產生出改寫版本,產生出一篇新的報導,勢必能為媒體從業人員提供很大的幫助,由此可知改寫系統對新聞領域有相當重要且急迫的需求。
因此,本論文設計了一個針對新聞的文本改寫系統,能夠讓使用者輸入一篇原始新聞之後,透過模型得到一篇文章結構、用字相異的改寫版本新聞,希望藉由這樣的模型來協助媒體從業人員,並為整體新聞產業做出一定程度的貢獻。 本論文除了使用強大的預訓練模型GPT-2,並使用了經過整理的TaPaCo以及PAWS-X資料集進行實驗,並結合規則以及語序重構模型,藉此讓模型可以產生出以篇章為單位且有明顯結構不同的改寫新聞文章,而非只是逐句的改寫或字詞置換。在最後自動評估以及人工評估兩種評估方式上,本論文所提出的方法對於新聞的改寫程度明顯優於Baseline Model的表現,說明了在以篇章為單位的改寫當中,本論文加入的語序重構技術相較於單純的逐句改寫可以獲得更好的效果,也更貼近人類既定印象中的改寫。 | zh_TW |
| dc.description.abstract | Recent years have witnessed astonishing advancements in internet and communication devices, which have also fueled the thriving growth of social media. Online news has become the primary medium for people to acquire new information. In this era of new media, media professionals strive to maintain a presence across various social platforms amidst fierce competition. It is imperative for them to create diverse versions of news to cater to different audiences on each platform. In such a scenario, a system capable of rapidly generating paraphrased versions of existing news, creating new reports, would undoubtedly provide significant assistance to media professionals. Hence, it is evident that a paraphrasing system is highly important and urgently needed in the field of news.
Consequently, this thesis presents a text paraphrasing system designed specifically for news articles. A user inputs an original news article, and the model produces a paraphrased version with a different document structure and vocabulary. The aim is to assist media professionals and contribute to the news industry as a whole. Besides employing the powerful pre-trained model GPT-2, we conducted experiments using the curated TaPaCo and PAWS-X datasets, and combined rules with a sentence reordering model so that the system generates document-level paraphrases with clearly distinct structures, rather than mere sentence-by-sentence rewrites or word replacements. In both automatic and human evaluations, the proposed method clearly outperformed the baseline model in the extent of news paraphrasing. This indicates that, for document-level paraphrasing, the sentence reordering technique introduced in this thesis yields better results than simple sentence-by-sentence rewriting and aligns more closely with human perceptions of paraphrasing. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T17:19:34Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-08-16T17:19:34Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Table of Contents v
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 3
1.3 Thesis Organization 4
Chapter 2 Literature Review 5
2.1 Natural Language Processing and Pre-trained Models 6
2.2 Text Paraphrasing 8
2.3 Sentence Order Reconstruction and Linguistic Analysis 10
2.4 Development and Automatic Generation of News 13
2.5 Paraphrase Evaluation Methods 15
2.5.1 Automatic Evaluation 15
2.5.2 Human Evaluation 17
2.5.3 Evaluation Metrics Suitable for This Study 18
2.6 Summary 19
Chapter 3 Methodology 21
3.1 Research Framework 21
3.1.1 Stage One 22
3.1.2 Stage Two 22
3.1.3 Stage Three 24
3.2 Datasets 26
3.3 GPT-2 Text Generation Model 26
3.4 Research Validation 28
3.4.1 Automatic Evaluation 28
3.4.2 Human Evaluation 31
Chapter 4 Results 33
4.1 Sentence Order Rule Construction and Reordering 33
4.1.1 Detailed Rule Description 33
4.1.2 Analysis of Reordering Results 35
4.1.3 Reordering Evaluation Results 37
4.2 Paraphrasing System Training Parameters 38
4.3 Analysis of Paraphrasing System Results 39
4.4 Automatic Evaluation Results 41
4.5 Human Evaluation Results 42
4.6 Summary 46
Chapter 5 Conclusion 48
5.1 Research Outcomes 48
5.2 Research Contributions 49
5.3 Research Limitations 50
5.3.1 Limited Data Volume 51
5.3.2 Difficulty of Collecting Paraphrase Data at Scale 51
5.4 Future Research Directions 52
References 54 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 新聞 | zh_TW |
| dc.subject | 語序重構 | zh_TW |
| dc.subject | 改寫生成 | zh_TW |
| dc.subject | 預訓練模型 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | Paraphrase generation | en |
| dc.subject | Sentence order reconstruction | en |
| dc.subject | Pre-trained model | en |
| dc.subject | News | en |
| dc.subject | Deep learning | en |
| dc.title | 以中文新聞報導為主題之文本分析與改寫生成 | zh_TW |
| dc.title | Document Analysis and Paraphrase Generation for Chinese News Reports | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 魏志平;陳文華 | zh_TW |
| dc.contributor.oralexamcommittee | Chih-Ping Wei;Wun-Hwa Chen | en |
| dc.subject.keyword | 新聞,改寫生成,語序重構,預訓練模型,深度學習 | zh_TW |
| dc.subject.keyword | News,Paraphrase generation,Sentence order reconstruction,Pre-trained model,Deep learning | en |
| dc.relation.page | 58 | - |
| dc.identifier.doi | 10.6342/NTU202303745 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2023-08-11 | - |
| dc.contributor.author-college | 管理學院 | - |
| dc.contributor.author-dept | 資訊管理學系 | - |
Appears in Collections: Department of Information Management
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf | 2.63 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
