Online Plagiarized Detection Through Exploiting Lexical, Syntactic, and Semantic Information

Wan-Yu Lin; 林琬瑜

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6398

Title:	Online Plagiarized Detection Through Exploiting Lexical, Syntactic, and Semantic Information (利用語彙、句法以及語義資訊偵測網路抄襲)
Author:	Wan-Yu Lin (林琬瑜)
Advisor:	Shou-De Lin (林守德)
Keywords:	Plagiarism Detection, Lexical, Syntactic, Semantic
Year of publication:	2013
Degree:	Master's
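Abstract:	Many traditional plagiarism detection systems focus only on the lexical statistical features of a document, at most additionally considering syntactic structure or using WordNet to capture semantic information, and most of them perform offline detection. Our system instead integrates a search engine and draws on structural features at three levels: lexical, syntactic, and semantic. For each suspicious sentence pair it extracts the lexical duplication rate, reordering rate, and continuity; the part-of-speech and phrase tags of each word in the sentence; and the latent topics labeled by Latent Dirichlet Allocation (LDA), which represent the semantic information the sentences may carry. These features yield six distinct plagiarism detection models, whose predictions are merged with a weighting method we designed, giving an online system that automatically detects web plagiarism. Experimental results show that, for both English and Chinese documents, our system successfully detects a considerable number of likely plagiarism sources, and its measured performance exceeds that of several current state-of-the-art algorithms. In this paper, we introduce a framework that identifies sentence-level and document-level online plagiarism by exploiting lexical, syntactic, and semantic features, including n-gram duplication, word reordering and alignment, POS and phrase tags, and the semantic similarity of sentences. We also enhance plagiarism detection by establishing an ensemble framework that combines the prediction scores of each model. Experiments performed on English and Chinese corpora demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms.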
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6398
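Full-text authorization:	Authorized (open to the public worldwide)
Appears in department:	Graduate Institute of Networking and Multimedia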
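
The abstract names a lexical duplication feature and an ensemble that merges per-model prediction scores, but this record does not give their definitions. The following Python sketch is therefore only an illustration under stated assumptions: `duplication_rate` is a hypothetical clipped n-gram overlap, and `weighted_ensemble` is a plain normalized weighted mean standing in for the weighting method the thesis designs. Neither function name comes from the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def duplication_rate(suspect, source, n=2):
    """Fraction of the suspect's n-grams that also occur in the source.

    Hypothetical stand-in for a lexical duplication feature. Counts are
    clipped, so a repeated n-gram is matched at most as many times as it
    appears in the source.
    """
    suspect_counts = Counter(ngrams(suspect, n))
    source_counts = Counter(ngrams(source, n))
    total = sum(suspect_counts.values())
    if total == 0:
        return 0.0
    shared = sum(min(c, source_counts[g]) for g, c in suspect_counts.items())
    return shared / total


def weighted_ensemble(scores, weights):
    """Combine per-model scores with a normalized weighted average.

    Assumption: the thesis uses its own weighting scheme; a simple
    weighted mean is shown here only for illustration.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


if __name__ == "__main__":
    suspect = "the quick brown fox jumps".split()
    source = "a quick brown fox jumps high".split()
    lexical = duplication_rate(suspect, source, n=2)
    # The second score and both weights are made-up illustrative values.
    print(lexical, weighted_ensemble([lexical, 0.5], [2, 1]))
```

In a fuller system, each of the six models mentioned in the abstract would contribute one score to the list passed to `weighted_ensemble`, and the weights would be chosen by whatever procedure the thesis specifies.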

Files in this item:

File	Size	Format
ntu-102-1.pdf	1.36 MB	Adobe PDF

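Unless their copyright terms are specifically indicated otherwise, items in this system are protected by copyright, with all rights reserved.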
