Online Plagiarized Detection Through Exploiting Lexical, Syntactic, and Semantic Information (利用語彙、句法以及語義資訊偵測網路抄襲)

Wan-Yu Lin; 林琬瑜

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6398

Title:	利用語彙、句法以及語義資訊偵測網路抄襲 Online Plagiarized Detection Through Exploiting Lexical, Syntactic, and Semantic Information
Authors:	Wan-Yu Lin 林琬瑜
Advisor:	林守德(Shou-De Lin)
Keyword:	Plagiarism Detection, Lexical, Syntactic, Semantic
Publication Year :	2013
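Degree:	Master's
Abstract:	Many traditional plagiarism detection systems focus only on the lexical statistics of a text, at most also considering syntactic structure or using WordNet to capture semantic information, and most of them work offline. Our system instead integrates a search engine and draws on structural features at three levels: lexical, syntactic, and semantic. For each suspicious sentence pair it extracts the lexical duplication rate, reordering rate, and continuity, the part-of-speech and phrase tags of the words in each sentence, and the latent topics assigned by Latent Dirichlet Allocation (LDA), which stand in for the semantic information the sentences may carry. The resulting six plagiarism detection models are combined, and their predictions are merged with a weighting method of our own design, giving an online system that detects web plagiarism automatically. Experiments show that for both English and Chinese documents the system identifies a considerable number of likely plagiarism sources, and that it outperforms several current state-of-the-art algorithms. In this paper, we introduce a framework that identifies sentence-level and document-level online plagiarism by exploiting lexical, syntactic, and semantic features, including n-gram duplication, word reordering and alignment, POS and phrase tags, and the semantic similarity of sentences. We also enhance plagiarism detection by establishing an ensemble framework that combines the prediction scores of the individual models. Experiments performed on English and Chinese corpora demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms.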
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6398
Fulltext Rights:	Authorized (open access worldwide)
Appears in Collections:	Graduate Institute of Networking and Multimedia
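
The abstract describes scoring suspicious sentence pairs with several models and merging their predictions by weighting. As a rough illustration only, the Python sketch below computes one simple lexical feature (n-gram overlap) and a weighted average of per-model scores. The function names, the overlap definition, and the weights are assumptions made for this example; the record does not specify the thesis's actual features or weighting method.

```python
from typing import Dict, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(suspect: str, source: str, n: int = 2) -> float:
    """Fraction of the suspect sentence's n-grams that also occur in the source.

    Whitespace tokenization is an assumption; Chinese text would need a
    word segmenter first.
    """
    s = ngrams(suspect.lower().split(), n)
    t = ngrams(source.lower().split(), n)
    return len(s & t) / len(s) if s else 0.0


def ensemble_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-model scores (hypothetical weighting scheme)."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total


if __name__ == "__main__":
    lexical = ngram_overlap("the cat sat on the mat", "the cat sat on a mat")
    # The semantic score and both weights are made-up placeholders.
    combined = ensemble_score(
        {"lexical": lexical, "semantic": 0.9},
        {"lexical": 1.0, "semantic": 3.0},
    )
    print(lexical, combined)
```

A weighted average keeps the combined score in the same range as the individual scores, which makes a single decision threshold easier to reason about; other combination schemes would serve equally well for illustration.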

Files in This Item:

File	Size	Format
ntu-102-1.pdf	1.36 MB	Adobe PDF
