Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6398
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林守德(Shou-De Lin) | |
dc.contributor.author | Wan-Yu Lin | en |
dc.contributor.author | 林琬瑜 | zh_TW |
dc.date.accessioned | 2021-05-16T16:28:11Z | - |
dc.date.available | 2013-02-10 | |
dc.date.available | 2021-05-16T16:28:11Z | - |
dc.date.copyright | 2013-02-01 | |
dc.date.issued | 2013 | |
dc.date.submitted | 2013-01-17 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6398 | - |
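dc.description.abstract | Many traditional plagiarism detection systems focus only on the lexical statistical features of a text, at most also considering syntactic structure or using WordNet to capture semantic information, and most of them work offline. Our system instead integrates a search engine and introduces structural features at the lexical, syntactic, and semantic levels. From each suspicious sentence pair it extracts the lexical duplication rate, reordering rate, and continuity; the POS and phrase tags of each word in the sentence; and the latent topics labeled by Latent Dirichlet Allocation (LDA), which represent the semantic information the sentences may carry. It combines these six plagiarism detection models and merges their predictions with a weighting method we designed, yielding an online system that automatically detects web plagiarism. Experimental results show that, for both English and Chinese documents, our system successfully detects a considerable number of likely plagiarism sources, and its measured performance exceeds that of several current state-of-the-art algorithms. | zh_TW |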
dc.description.abstract | In this paper, we introduce a framework that identifies sentence-level and document-level online plagiarism by exploiting lexical, syntactic, and semantic features, which include duplicated n-grams, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences. We also enhance plagiarism detection by establishing an ensemble framework that combines the prediction scores of the individual models. Experiments performed on English and Chinese corpora demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms. | en |
dc.description.provenance | Made available in DSpace on 2021-05-16T16:28:11Z (GMT). No. of bitstreams: 1 ntu-102-R99944016-1.pdf: 1388697 bytes, checksum: 48c3bc56c470a030fc511002d0b5e0b9 (MD5) Previous issue date: 2013 | en |
dc.description.tableofcontents | Abstract; Chapter 1 Introduction; Chapter 2 Related Work; Chapter 3 Methodology; 3.1 Query a Search Engine; 3.2 Sentence Level Plagiarism Detection; 3.2.1 Ngram Matching (NM); 3.2.2 Reordering of Words (RW); 3.2.3 Alignment of Words (AW); 3.2.4 POS and Phrase Tag of Words (POS, PT); 3.2.5 Semantic Similarity (LDA); 3.3 Ensemble Similarity Scores; 3.4 Document Level Plagiarism Detection; Chapter 4 Evaluation; 4.1 Dataset; 4.1.1 PAN-2010 Corpus; 4.1.2 Chinese Web Documents; 4.2 Sentence-based Evaluations on PAN-2010; 4.3 Full System Evaluations on Chinese Web Documents; 4.4 Discussion; Chapter 5 System Demonstration; Chapter 6 Conclusion; References; Appendix | |
dc.language.iso | en | |
dc.title | 利用語彙、句法以及語義資訊偵測網路抄襲 | zh_TW |
dc.title | Online Plagiarized Detection Through Exploiting Lexical, Syntactic, and Semantic Information | en |
dc.type | Thesis | |
dc.date.schoolyear | 101-1 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 張俊盛(Jyun-Sheng Chang),陳克健(Keh-Jiann Chen),曾元顯(Yuen-Hsien Tseng) | |
dc.subject.keyword | Plagiarism Detection, Lexical, Syntactic, Semantic | zh_TW |
dc.subject.keyword | Plagiarism Detection, Lexical, Syntactic, Semantic | en |
dc.relation.page | 30 | |
dc.rights.note | Authorization granted (publicly available worldwide) | |
dc.date.accepted | 2013-01-17 | |
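dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | zh_TW |
Appears in Collections: | Graduate Institute of Networking and Multimedia |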
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-102-1.pdf | 1.36 MB | Adobe PDF | View/Open |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.