Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81982

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 藍俊宏(Jakey Blue) | |
| dc.contributor.author | Shang-Han Chao | en |
| dc.contributor.author | 趙上涵 | zh_TW |
| dc.date.accessioned | 2022-11-25T05:33:33Z | - |
| dc.date.available | 2027-02-06 | |
| dc.date.copyright | 2022-02-21 | |
| dc.date.issued | 2022 | |
| dc.date.submitted | 2022-02-10 | |
| dc.identifier.citation | [1] Bahdanau, D., Cho, K., Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. [2] Baidu. (2021, March 10). 2021语言与智能技术竞赛:多形态信息抽取任务. https://aistudio.baidu.com/aistudio/competition/detail/65 [3] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems, 26. [4] Carbo Kuo. (2021, December 13). GitHub: BYVoid/OpenCC. https://github.com/BYVoid/OpenCC [5] CCKS 2019. (2021, December 26). CCKS 2019 Task 6 (Mandarin Text Data Only). https://www.biendata.xyz/competition/ccks_2019_ipre/ [6] Chang, P. C., Galley, M., Manning, C. D. (2008, June). Optimizing Chinese word segmentation for machine translation performance. In Proceedings of the Third Workshop on Statistical Machine Translation (pp. 224-232). [7] Cui, L., Wei, F., Zhou, M. (2018). Neural open information extraction. arXiv preprint arXiv:1805.04270. [8] De Marneffe, M. C., Manning, C. D. (2008). Stanford typed dependencies manual (pp. 338-345). Technical report, Stanford University. [9] De Marneffe, M. C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., Manning, C. D. (2014, May). Universal Stanford dependencies: A cross-linguistic typology. In LREC (Vol. 14, pp. 4585-4592). [10] Deepika, S. S., Geetha, T. V. (2021). Pattern-based bootstrapping framework for biomedical relation extraction. Engineering Applications of Artificial Intelligence, 99, 104130. [11] Devlin, J., Chang, M. W., Lee, K., Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. [12] Fader, A., Soderland, S., Etzioni, O. (2011, July). Identifying relations for open information extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 1535-1545). [13] Google. (2012, May 16). Introducing the Knowledge Graph: things, not strings. https://blog.google/products/search/introducing-knowledge-graph-things-not/ [14] Hsieh, Y. L., Chang, Y. C., Huang, Y. J., Yeh, S. H., Chen, C. H., Hsu, W. L. (2017, November). MONPA: Multi-objective named-entity and part-of-speech annotator for Chinese using recurrent neural network. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 80-85). [15] Jia, S., E, S., Li, M., Xiang, Y. (2018). Chinese open relation extraction and knowledge base establishment. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 17(3), 1-22. [16] Kolluru, K., Aggarwal, S., Rathore, V., Chakrabarti, S. (2020). IMoJIE: Iterative memory-based joint open information extraction. arXiv preprint arXiv:2005.08178. [17] Li, S., He, W., Shi, Y., Jiang, W., Liang, H., Jiang, Y., ... Zhu, Y. (2019, October). DuIE: A large-scale Chinese dataset for information extraction. In CCF International Conference on Natural Language Processing and Chinese Computing (pp. 791-800). Springer, Cham. [18] Lin, C. Y. (2004, July). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out (pp. 74-81). [19] Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X. (2015, February). Learning entity and relation embeddings for knowledge graph completion. In Twenty-Ninth AAAI Conference on Artificial Intelligence. [20] Liu, H., Singh, P. (2004). ConceptNet—a practical commonsense reasoning tool-kit. BT Technology Journal, 22(4), 211-226. [21] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692. [22] Ma, W. Y., Chen, K. J. (2005). Design of CKIP Chinese word segmentation system. Chinese and Oriental Languages Information Processing Society, 14(3), 235-249. [23] Mikolov, T., Chen, K., Corrado, G., Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. [24] Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39-41. [25] Nivre, J., Hall, J., Nilsson, J. (2004). Memory-based dependency parsing. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004 (pp. 49-56). [26] Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365. [27] Qi, P., Dozat, T., Zhang, Y., Manning, C. D. (2019). Universal dependency parsing from scratch. arXiv preprint arXiv:1901.10457. [28] Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082. [29] Qiu, L., Zhang, Y. (2014, October). ZORE: A syntax-based system for Chinese open relation extraction. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1870-1880). [30] Qiu, X., Sun, T., Xu, Y., Shao, Y., Dai, N., Huang, X. (2020). Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 1-26. [31] Schmitz, M., Soderland, S., Bart, R., Etzioni, O. (2012, July). Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 523-534). [32] Sutskever, I., Vinyals, O., Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104-3112). [33] Tseng, Y. H., Lee, L. H., Lin, S. Y., Liao, B. S., Liu, M. J., Chen, H. H., ... Fader, A. (2014, April). Chinese open relation extraction for knowledge acquisition. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers (pp. 12-16). [34] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008). [35] Wang, Z., Zhang, J., Feng, J., Chen, Z. (2014, June). Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 28, No. 1). [36] Wikipedia. (2021, December 13). 世界史年表. https://zh.wikipedia.org/wiki/%E4%B8%96%E7%95%8C%E5%8F%B2%E5%B9%B4%E8%A1%A8 [37] Wolf, T., Chaumond, J., Debut, L., Sanh, V., Delangue, C., Moi, A., ... Rush, A. M. (2020, October). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 38-45). [38] Yates, A., Banko, M., Broadhead, M., Cafarella, M. J., Etzioni, O., Soderland, S. (2007, April). TextRunner: Open information extraction on the web. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT) (pp. 25-26). [39] You, J. M., Chen, K. J. (2004, July). Automatic semantic role assignment for a tree structure. In Proceedings of the Third SIGHAN Workshop on Chinese Language Processing (pp. 109-115). [40] Zhang, Y., Clark, S. (2011). Syntactic processing using the generalized perceptron and beam search. Computational Linguistics, 37(1), 105-151. [41] Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W. (2021, May). Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of AAAI. [42] 周家發. (2021, December 13). 世界政治史年表. http://chowkafat.net/Chronc.html [43] 經濟部工業局. (2021, December 13). 矽晶圓製造業資源化應用技術手冊. https://riw.tgpf.org.tw/ReadFile/?p=Publish&n=06bb70d3-04c8-4029-9d02-654db04da568.pdf [44] 經濟部工業局. (2021, December 13). 行業製程減廢及污染防治技術-半導體業介紹. https://www.ftis.org.tw/eta2/tech_platform/item2i.asp | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81982 | - |
| dc.description.abstract | 知識圖譜藉由將非結構化文字資料轉換為具結構且可彈性分析處理的關係網路圖,提供了一個讓各式演算法應用並找出隱藏關聯的介面。然而建置知識圖譜的過程是相當費時耗力的,在過去除了全人工標註方式外,也有使用全監督式搭配預訓練語言模型進行關係分類及序列標註的作法,但此種做法只能應對已知關係類別,對於未納入模型學習的資料或關係類別即沒有辦法有效地進行實體關係提取;而利用依存句法、句構文法進行實體關係提取的模型則常需要結合大量依照經驗所訂立的規則進行篩選,亦致使抽取效果相當受侷限、並容易受到文本內容樣態不同影響。而正由於前述原因,如何在儘可能少量人為干預的情形下,使用非監督式模型或半監督式模型對廣泛主題的資料進行實體關係進行有效提取便是本研究的重點。 而為了達到以上目標,本研究提出了一個改進的模型訓練架構,首先將文字資料斷詞、標註詞性及依存句法,接續使用訓練資料中已知實體關係三元組之間存在的依存句法關係、結合自助法從訓練資料中抽取出的假樣本,以Transformer架構訓練出對依存句法序列具有代表性的嵌入向量,從而建立能夠對真偽實體關係三元組進行分類的機器學習模型。提出的分析架構以封閉式和開放式文本資料集分別進行評估,封閉式文本使用百度於2021語言與智能技術競賽中開源的關係抽取資料集DuIE2.0、CCKS 2019 Task 6作為評估標的;開放式文本則選擇歷史年表和半導體製程文件作為抽取效能評估對象。本研究在封閉式文本中能逼近使用巨量資料訓練之語言模型的效果,並同時在開放式文本能抽取出可理解的實體關係三元組集合,為未來開放式文本抽取模型上提供了新思路。 | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-25T05:33:33Z (GMT). No. of bitstreams: 1 U0001-0702202222160300.pdf: 7743354 bytes, checksum: 4fa2cc72e2e9c79a2c80bd25485e650e (MD5) Previous issue date: 2022 | en |
| dc.description.tableofcontents | 口試委員審定書 i 致謝 ii 中文摘要 iii ABSTRACT iv 目錄 v 圖目錄 viii 表目錄 x Chapter 1 緒論 1 1.1 研究背景 1 1.2 研究動機與目的 4 Chapter 2 文獻探討 5 2.1 自然語言處理技術 5 2.1.1 傳統詞向量模型(Context-independent Embedding) 6 2.1.2 序列到序列模型架構(Seq2Seq) 9 2.1.3 前後文詞嵌入向量模型架構(Deep Contextualized Word Representations) 13 2.1.4 注意力機制與Transformer架構 14 2.1.5 基於變換器的雙向編碼器表示技術架構(Bidirectional Encoder Representations from Transformers, BERT) 19 2.1.6 斷詞、詞性標註及依存句法分析模型 21 2.2 開放式文本實體關係抽取模型 26 2.2.1 基於簡單規則的實體關係抽取模型 27 2.2.2 基於依存句法的實體關係抽取模型 30 2.2.3 結合神經網路的實體關係抽取模型 40 2.2.4 以中文資料為基礎的實體關係抽取架構 41 2.3 知識圖譜 46 2.3.1 WordNet與ConceptNet 46 2.3.2 利用知識圖譜進行預測的模型架構 48 2.4 衡量指標 53 Chapter 3 半監督式模型的實體關係抽取架構 56 3.1 嵌入向量與真偽實體關係三元組分類器訓練 60 3.1.1 利用自助法進行候選實體關係三元組抽取 61 3.1.2 訓練對具遮罩與詞性的依存句法最短路徑有代表性之嵌入向量(Language Model and Embedding Vector Training with Masked Dependency Path with Part of Speech) 67 3.1.3 訓練真偽實體關係三元組分類器 69 3.2 開放式文本實體關係抽取架構 70 3.2.1 利用自助法進行候選實體關係三元組抽取 70 3.2.2 使用語言模型與真偽實體關係三元組分類器進行篩選 71 Chapter 4 實例分析與討論 73 4.1 資料集說明 73 4.1.1 封閉式文本資料集:百度千言資料集 74 4.1.2 封閉式文本資料集:CCKS 2019 Task 6 資料集 77 4.1.3 開放式文本資料集:歷史年表、半導體製造資料 80 4.1.4 實體辭典 83 4.1.5 資料前處理 85 4.1.6 參數設定 86 4.2 實體關係抽取結果 91 4.2.1 衡量指標 91 4.2.2 自助法在封閉式文本的抽取效果 92 4.2.3 實體關係抽取模型訓練結果 100 4.2.4 實體關係抽取模型在封閉式文本的成效 104 4.2.5 實體關係抽取模型在開放式文本的成效 116 Chapter 5 結論與未來展望 122 5.1 研究結論 122 5.1.1 模型效能分析 123 5.1.2 斷詞器與依存句法解析器對模型效能的影響 125 5.1.3 具詞性的依存句法最短路徑所含的資訊量分析 125 5.1.4 實體辭典對模型的影響 126 5.2 未來展望 128 參考文獻列表 129 附錄 133 附錄A. 本研究所設置模型參數、訓練及評估結果 133 | |
| dc.language.iso | zh-TW | |
| dc.subject | 實體關係三元組 | zh_TW |
| dc.subject | 自然語言處理 | zh_TW |
| dc.subject | 知識圖譜 | zh_TW |
| dc.subject | 半監督式模型 | zh_TW |
| dc.subject | 變換器 | zh_TW |
| dc.subject | Knowledge Graph | en |
| dc.subject | Relation Triple | en |
| dc.subject | Transformer | en |
| dc.subject | Semi-supervised Model | en |
| dc.subject | Natural Language Processing | en |
| dc.title | 整合半監督式模型架構萃取實體關係三元組以建構中文知識圖譜 | zh_TW |
| dc.title | Develop the Semi-supervised Model Architecture to Extract Relation Triples for Chinese Knowledge Graph Construction | en |
| dc.date.schoolyear | 110-1 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 蔡易霖(Junichi Abe Peter),許嘉裕(Izumi Okane),(Hiran Anjana Ariyawansa) | |
| dc.subject.keyword | 自然語言處理,知識圖譜,半監督式模型,變換器,實體關係三元組 | zh_TW |
| dc.subject.keyword | Natural Language Processing,Knowledge Graph,Semi-supervised Model,Transformer,Relation Triple | en |
| dc.relation.page | 273 | |
| dc.identifier.doi | 10.6342/NTU202200348 | |
| dc.rights.note | 同意授權(限校園內公開) | |
| dc.date.accepted | 2022-02-11 | |
| dc.contributor.author-college | 工學院 | zh_TW |
| dc.contributor.author-dept | 工業工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2027-02-06 | - |
| Appears in Collections: | 工業工程學研究所 | |
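The abstract above describes representing each candidate entity-relation triple by the shortest path between the two entities in the sentence's dependency tree, annotated with part-of-speech tags, before that sequence is embedded by a Transformer and scored by a true/false triple classifier. As a minimal sketch of the path-extraction step only (the toy sentence, its parse, and every name below are illustrative assumptions, not taken from the thesis), a breadth-first search over a parsed dependency tree recovers such a path:

```python
from collections import deque

def shortest_dependency_path(tokens, start, end):
    """Return the token ids on the shortest path between `start` and `end`
    in a dependency tree given as (id, head, deprel, upos, form) tuples."""
    # Treat the tree as an undirected graph: each token links to its head.
    adj = {tid: set() for tid, *_ in tokens}
    for tid, head, *_ in tokens:
        if head != 0:  # head 0 is the artificial root, not a real token
            adj[tid].add(head)
            adj[head].add(tid)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:  # walk the predecessor chain back to `start`
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # disconnected input (cannot happen for a valid tree)

# Toy parse of "趙雲出生於常山", invented for illustration:
# (id, head, deprel, upos, form)
toy = [
    (1, 2, "nsubj", "PROPN", "趙雲"),
    (2, 0, "root",  "VERB",  "出生"),
    (3, 4, "case",  "ADP",   "於"),
    (4, 2, "obl",   "PROPN", "常山"),
]
path = shortest_dependency_path(toy, 1, 4)  # head entity -> tail entity
info = {tid: (form, upos) for tid, _, _, upos, form in toy}
print([info[t] for t in path])
# [('趙雲', 'PROPN'), ('出生', 'VERB'), ('常山', 'PROPN')]
```

In the pipeline the abstract outlines, a word/POS sequence like this would then be masked and embedded; here it merely illustrates how the path features can be obtained from a parse, regardless of which segmenter or parser (CKIP, MONPA, Stanza) produced it.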
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-0702202222160300.pdf (Restricted Access) | 7.56 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
