Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89147
Title: | 以中文新聞報導為主題之文本分析與改寫生成 Document Analysis and Paraphrase Generation for Chinese News Reports |
Author: | 陳恩穎 En-Ying Chen |
Advisor: | 莊裕澤 Yuh-Jzer Joung |
Keywords: | News, Paraphrase generation, Sentence order reconstruction, Pre-trained model, Deep learning |
Year of publication: | 2023 |
Degree: | Master's |
Abstract: | In recent years, the internet and communication devices have advanced at an astonishing pace, fueling the rapid growth of social media, and online news has become the primary medium through which people obtain new information. In this new-media era, media professionals who want to hold their ground amid fierce competition across social platforms must produce diverse versions of news stories to serve the audience of each platform. A system that can quickly generate a paraphrased version of an existing news story, yielding a new report, would be of significant help to them, so the news field has an important and urgent need for a paraphrasing system.
This thesis therefore presents a text paraphrasing system designed specifically for news. A user inputs an original news article and obtains from the model a paraphrased version with a different article structure and wording. The aim is to assist media professionals and contribute to the news industry as a whole. In addition to the pre-trained model GPT-2, the experiments use curated versions of the TaPaCo and PAWS-X datasets, and the system combines rules with a sentence order reconstruction model so that it generates paraphrased news articles at the document level with clearly different structure, rather than sentence-by-sentence rewrites or word substitutions.
In both automatic and human evaluation, the proposed method paraphrases news to a clearly greater extent than the Baseline Model. This indicates that, for document-level paraphrasing, the sentence order reconstruction technique added in this thesis yields better results than sentence-by-sentence rewriting alone and is closer to what people commonly regard as paraphrasing. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89147 |
DOI: | 10.6342/NTU202303745 |
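Full-text license: | Authorized (publicly available worldwide) |
Appears in department: | Department of Information Management |
Files in this item:
File | Size | Format | |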
---|---|---|---|
ntu-111-2.pdf | 2.63 MB | Adobe PDF | View/Open |
Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.