Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79775

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 顏嗣鈞(Hsu-chun Yen) | |
| dc.contributor.author | Wei-Chun Wang | en |
| dc.contributor.author | 王韋鈞 | zh_TW |
| dc.date.accessioned | 2022-11-23T09:10:53Z | - |
| dc.date.available | 2021-08-20 | |
| dc.date.available | 2022-11-23T09:10:53Z | - |
| dc.date.copyright | 2021-08-20 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-08-16 | |
| dc.identifier.citation | Alberto Barrón-Cedeño, Paolo Rosso, Eneko Agirre, and Gorka Labaka. Plagiarism detection across distant language pairs. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 37–45, Beijing, China, August 2010. Coling 2010 Organizing Committee. Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211, 1997. Rudi L. Cilibrasi and Paul M. B. Vitányi. The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370–383, 2007. François Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369, 2007. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. Technical report, OpenAI, 2018. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. XLNet: Generalized autoregressive pretraining for language understanding. CoRR, abs/1906.08237, 2019. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. CoRR, abs/1909.11942, 2019. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692, 2019. Xiaodong Liu, Pengcheng He, Weizhu Chen, and Jianfeng Gao. Multi-task deep neural networks for natural language understanding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4487–4496, Florence, Italy, July 2019. Association for Computational Linguistics. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. TinyBERT: Distilling BERT for natural language understanding. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4163–4174, Online, November 2020. Association for Computational Linguistics. Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. VideoBERT: A joint model for video and language representation learning. CoRR, abs/1904.01766, 2019. Stephen Robertson. Understanding inverse document frequency: on theoretical arguments for IDF. Journal of Documentation, 2004. Akiko Aizawa. An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1):45–65, 2003. Khoo Khyou Bun and Mitsuru Ishizuka. Emerging topic tracking system. In Proceedings of the Third International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS 2001), pages 2–11. IEEE, 2001. Josef Sivic and Andrew Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, ICCV '03, page 1470, USA, 2003. IEEE Computer Society. Yang Liu and Mirella Lapata. Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3730–3740, Hong Kong, China, November 2019. Association for Computational Linguistics. Chen Qu, Liu Yang, Minghui Qiu, W. Bruce Croft, Yongfeng Zhang, and Mohit Iyyer. BERT with history answer embedding for conversational question answering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '19, pages 1133–1136, New York, NY, USA, 2019. Association for Computing Machinery. Zeynep Akkalyoncu Yilmaz, Shengjin Wang, Wei Yang, Haotian Zhang, and Jimmy Lin. Applying BERT to document retrieval with Birch. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, pages 19–24, Hong Kong, China, November 2019. Association for Computational Linguistics. Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. BERTweet: A pre-trained language model for English tweets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 9–14, Online, October 2020. Association for Computational Linguistics. Israa Alghanmi, Luis Espinosa Anke, and Steven Schockaert. Combining BERT with static word embeddings for categorizing social media. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 28–33, Online, November 2020. Association for Computational Linguistics. Minghao Zhu, Youzhe Song, Ge Jin, and Keyuan Jiang. Identifying personal experience tweets of medication effects using pre-trained RoBERTa language model and its updating. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, pages 127–137, Online, November 2020. Association for Computational Linguistics. Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, December 2014. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79775 | - |
| dc.description.abstract | Many companies write instruction manuals to assist their employees. These manuals try to record in detail the laws to follow and the points to heed in each business context, so that employees do not violate local government regulations and cause subsequent losses. However, regulations change so frequently that companies must spend enormous manpower checking whether the items in these manuals need updating. Although the powerful language model BERT has made great progress on many natural language tasks, it does not handle this particular legal semantic-similarity detection task well. This thesis therefore proposes a model that feeds keyword information to the model to improve performance. Our model achieves a high score on the dataset, and we find that keywords do help the model learn the relationship between sentences. (A minimal code sketch of this keyword-fusion idea appears after the file table below.) | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-23T09:10:53Z (GMT). No. of bitstreams: 1 U0001-1108202112031000.pdf: 2387612 bytes, checksum: 64e8175c7d5eef1a5d82115fb79066a6 (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | Acknowledgements i; Abstract (Chinese) ii; Abstract iii; List of Tables vi; List of Figures viii; List of Algorithms ix; 1 Introduction 1; 2 Preliminaries 5; 2.1 BERT 5; 2.1.1 Transformer 5; 2.1.2 Pre-training 7; 2.1.3 Fine-tuning 10; 2.1.4 After BERT 11; 2.2 TF-IDF 12; 2.3 Semantic Textual Similarity Detection 13; 3 Similarity Detection Model 15; 3.1 Dataset and Pre-process 15; 3.2 Baseline Model: BERT Classifier 16; 3.3 Model: TFIDF Classifier 17; 3.4 Model: BERT-TFIDF Classifier 19; 3.5 Model: BERT-TFIDF Regression 19; 4 Experiments 22; 4.1 Semantic Textual Similarity Classification 22; 4.1.1 Evaluation 22; 4.1.2 Training time 23; 4.1.3 Where does TFIDF help? 24; 4.1.4 What does the model learn? 26; 5 Conclusion 28; 6 References 29 | |
| dc.language.iso | en | |
| dc.subject | Semantic Similarity Detection | zh_TW |
| dc.subject | Natural Language Processing | zh_TW |
| dc.subject | Language Model | zh_TW |
| dc.subject | Language Model | en |
| dc.subject | Semantic Similarity Detection | en |
| dc.subject | NLP | en |
| dc.title | Semantic Similarity Detection for Legal Texts Between Acts and Administrative Rules | zh_TW |
| dc.title | Semantic Similarity Detection for Legal Texts Between Act and Direction | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 雷欽隆 (Chin-Laung Lei), 郭斯彥 (Sy-Yen Kuo) | |
| dc.subject.keyword | Natural Language Processing, Language Model, Semantic Similarity Detection | zh_TW |
| dc.subject.keyword | NLP, Semantic Similarity Detection, Language Model | en |
| dc.relation.page | 32 | |
| dc.identifier.doi | 10.6342/NTU202102272 | |
| dc.rights.note | Access authorized (open access worldwide) | |
| dc.date.accepted | 2021-08-16 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Electrical Engineering | zh_TW |
| Appears in collections: | Department of Electrical Engineering | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-1108202112031000.pdf | 2.33 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
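The abstract above describes fusing keyword (TF-IDF) features with a BERT encoding of a sentence pair before classification. Below is a minimal sketch of that idea, assuming the Hugging Face `transformers` and `scikit-learn` libraries; the [CLS] pooling, hidden sizes, and concatenation-based fusion are illustrative assumptions, not the thesis's exact BERT-TFIDF architecture.

```python
# Minimal sketch of a BERT + TF-IDF sentence-pair classifier in the
# spirit of the abstract. Assumptions (not from the thesis): [CLS]
# pooling, a 256-unit fusion layer, and concatenating the pair's
# joint TF-IDF vector with the BERT encoding.
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import AutoModel, AutoTokenizer


class BertTfidfClassifier(nn.Module):
    def __init__(self, tfidf_dim: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # Fuse the [CLS] encoding of the sentence pair with the
        # TF-IDF feature vector, then classify similar / not similar.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + tfidf_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, input_ids, attention_mask, tfidf_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token encoding
        fused = torch.cat([cls, tfidf_feats], dim=-1)
        return self.classifier(fused)


# Toy usage on a hypothetical pair of legal sentences.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pair = ["the employer shall keep wage records for five years",
        "wage records must be retained by the employer"]
vectorizer = TfidfVectorizer().fit(pair)
tfidf = torch.tensor(vectorizer.transform([" ".join(pair)]).toarray(),
                     dtype=torch.float)
enc = tokenizer(pair[0], pair[1], return_tensors="pt")
model = BertTfidfClassifier(tfidf_dim=tfidf.shape[1])
logits = model(enc["input_ids"], enc["attention_mask"], tfidf)  # shape (1, 2)
```

In practice the `TfidfVectorizer` would be fitted on the full training corpus rather than on a single pair, and the thesis may weight or select keywords differently.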
