NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71605

Full metadata record (DC field: value [language])
dc.contributor.advisor: 李宏毅 (Hung-Yi Lee)
dc.contributor.author: Chung-Yi Li [en]
dc.contributor.author: 李仲翊 [zh_TW]
dc.date.accessioned: 2021-06-17T06:04:23Z
dc.date.available: 2020-11-12
dc.date.copyright: 2020-11-12
dc.date.issued: 2020
dc.date.submitted: 2020-11-05
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71605
dc.description.abstract: Dependency parsing is a fundamental yet essential component of natural language processing systems. However, fewer than about 2% of the world's languages have the treebank data needed for dependency parsing. Current approaches to parsing low-resource languages mostly train multilingually on high-resource languages and then transfer the parameters to the low-resource languages. Such methods optimize for the high-resource languages during training, yet at test time they are expected to perform well after fine-tuning on unseen low-resource languages, creating a mismatch between the training and testing objectives. This thesis improves the multilingual pre-training algorithm with model-agnostic meta-learning: rather than optimizing the parsing accuracy of the shared parameters on each training language, it optimizes the parsing accuracy those parameters reach on each language after fine-tuning, which directly resolves the objective mismatch. We apply model-agnostic meta-learning to delexicalized dependency parsing, comparing the behavior of different meta-learning variants and the effect of different hyperparameter settings on parsing accuracy. We find that Reptile is suitable both for parsing unseen low-resource languages directly after training on the source languages and for further improving accuracy with a small amount of target-language data, while MAML and its first-order approximation adapt quickly once exposed to target-language data. Finally, we extend model-agnostic meta-learning to the practical setting of lexicalized dependency parsing, where the conventional multilingual joint-training baseline already covers most needs and the MAML-style methods still leave room for improvement. We also examine how these multilingual pre-training methods capture target-language characteristics during fine-tuning, providing useful observations for future improvements to model-agnostic meta-learning algorithms. [zh_TW]
dc.description.abstract: Dependency parsing is one of the fundamental yet essential components in natural language processing pipelines. However, fewer than 2% of the world's languages have dependency treebank data available for parsing. Existing methods for improving low-resource dependency parsing usually perform multilingual training on high-resource languages and then transfer the resulting parameters to low-resource parsers. These methods optimize parsing accuracy on the high-resource languages, yet are expected to perform well on low-resource languages after fine-tuning on each of them, which results in a mismatch between training- and testing-time objectives. In this thesis, we apply model-agnostic meta-learning methods (MAML) to low-resource dependency parsing. Instead of optimizing parsing accuracy on the training languages, MAML optimizes parsing accuracy on each language after fine-tuning, which effectively reduces the mismatch between training- and testing-time objectives. We first apply MAML to delexicalized dependency parsing to analyze the performance of different variants of MAML-based methods (MAML, Reptile, FOMAML) and the impact of various hyperparameter settings on parsing accuracy. We find that Reptile is suitable for both zero-shot transfer and low-resource fine-tuning, while MAML and FOMAML can quickly adapt to target languages. We then extend MAML-based methods to a real-world scenario, lexicalized dependency parsing, and find that in most cases conventional multilingual training works well enough, leaving room for improvement in the MAML-based methods. We also analyze how well different methods adapt to the characteristics of target languages, providing useful observations for improving MAML-based methods. [en]
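As context for the objectives contrasted in the abstract, here is a minimal sketch of the standard formulations, following the original MAML and Reptile papers (Finn et al., 2017; Nichol et al., 2018) rather than the thesis's own notation; the parsing loss \mathcal{L}_\ell on training language \ell, the inner-loop learning rate \alpha, the Reptile step size \epsilon, and the inner-loop step count k are assumed symbols, not taken from the thesis:

  Multi-task baseline:    \min_\theta \sum_\ell \mathcal{L}_\ell(\theta)
  MAML (one inner step):  \min_\theta \sum_\ell \mathcal{L}_\ell\bigl(\theta - \alpha \nabla_\theta \mathcal{L}_\ell(\theta)\bigr)
  Reptile update:         \theta \leftarrow \theta + \epsilon\,\bigl(U^{k}_\ell(\theta) - \theta\bigr), where U^{k}_\ell(\theta) denotes \theta after k SGD steps on \mathcal{L}_\ell

The multi-task objective evaluates the shared parameters directly on each training language, while the meta-learning objectives evaluate the parameters after a simulated fine-tuning step, mirroring the test-time procedure of fine-tuning on an unseen low-resource language.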
dc.description.provenance: Made available in DSpace on 2021-06-17T06:04:23Z (GMT). No. of bitstreams: 1; U0001-0511202013324700.pdf: 3643371 bytes, checksum: a18291e84893f61764ac4aac3f646782 (MD5); Previous issue date: 2020 [en]
dc.description.tableofcontents:
Abstract (Chinese) i
Abstract (English) ii
Chapter 1. Introduction 1
1.1 Motivation 1
1.2 Research Directions 4
1.3 Thesis Organization 4
Chapter 2. Background 5
2.1 Machine Learning 5
2.1.1 The Machine Learning Problem Setup 5
2.1.2 Machine Learning Models 7
2.2 Deep Neural Networks 8
2.2.1 Feedforward Neural Networks 8
2.2.2 Neural Network Training 9
2.2.3 Recurrent Neural Networks 11
2.2.4 Transformer Networks 12
2.3 Distributed Representations 15
2.3.1 Word Vectors 15
2.3.2 Contextualized Representations 17
2.4 Dependency Parsing 18
2.4.1 Introduction to Syntax 18
2.4.2 Definitions and Problem Statement 21
2.4.3 Graph-based Parsers 22
2.4.4 Head-directionality 25
2.5 Optimization-based Meta-learning 28
2.5.1 Model-agnostic Meta-learning (MAML) 28
2.5.2 First-order MAML (FOMAML) 30
2.5.3 Reptile 30
Chapter 3. Meta-learning for Low-resource Delexicalized Dependency Parsing 32
3.1 Overview 32
3.2 Multilingual Delexicalized Dependency Parsing 33
3.2.1 POS Tags 34
3.2.2 Graph-based Parser: Deep Biaffine Attention 34
3.2.3 Multi-task Baseline 35
3.2.4 Modified Reptile 35
3.3 Experimental Setup 36
3.4 Experimental Results 39
3.4.1 Comparison of Methods for Delexicalized Parsing 39
3.4.2 Effect of the Number of Inner-loop Steps per Method 45
3.4.3 Summary: Delexicalized Parsing 46
3.4.4 Comparison of Methods with Smaller Models 46
3.4.5 Effect of the Number of Inner-loop Steps with Smaller Models 52
3.4.6 Summary: Delexicalized Parsing with Smaller Models 52
3.5 Analysis and Discussion 53
3.5.1 Counting Model 53
3.5.2 Head-directionality Analysis of Trees Produced by Each Method 54
3.5.3 Head-directionality Analysis with Smaller Models 54
3.6 Summary 59
Chapter 4. Meta-learning for Low-resource Lexicalized Dependency Parsing 64
4.1 Overview 64
4.2 Model Architecture for Multilingual Lexicalized Dependency Parsing 65
4.2.1 Multilingual BERT 65
4.2.2 Adapters 67
4.3 Experimental Setup 69
4.4 Experimental Results 70
4.5 Analysis and Discussion 72
4.6 Summary 73
Chapter 5. Conclusion and Future Work 78
5.1 Contributions and Discussion 78
5.2 Future Work 78
5.2.1 Effect of Training-language Selection on Different Pre-training Methods 78
5.2.2 Effect of Different Tree-probability Definitions on Different Pre-training Methods 79
5.2.3 Effect of Different Parsing Algorithms on Different Pre-training Methods 79
5.2.4 Effect of Different Encoders on Different Pre-training Methods 80
References 81
Appendix 89
dc.language.iso: zh-TW
dc.title: 基於元學習的資料不足依存句法剖析 [zh_TW]
dc.title: Meta-Learning for Low-resource Dependency Parsing [en]
dc.type: Thesis
dc.date.schoolyear: 109-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 李琳山 (Lin-shan Lee), 李彥寰 (Yen-Huan Li), 陳尚澤 (Shang-Tse Chen)
dc.subject.keyword: 依存句法剖析, 元學習, 資料不足 [zh_TW]
dc.subject.keyword: Dependency parsing, Meta-learning, Low-resource [en]
dc.relation.page: 101
dc.identifier.doi: 10.6342/NTU202004322
dc.rights.note: 有償授權 (compensated authorization)
dc.date.accepted: 2020-11-05
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) [zh_TW]
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File: U0001-0511202013324700.pdf (currently not authorized for public access)
Size: 3.56 MB
Format: Adobe PDF