Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51054
Full metadata record (DC field: value [language])
dc.contributor.advisor: 林守德 (Shou-De Lin)
dc.contributor.author: Yi-Ting Lee [en]
dc.contributor.author: 李漪莛 [zh_TW]
dc.date.accessioned: 2021-06-15T13:24:28Z
dc.date.available: 2021-07-31
dc.date.copyright: 2020-08-25
dc.date.issued: 2020
dc.date.submitted: 2020-08-11
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51054
dc.description.abstract: 本文的目的在報告有關Seq2Seq模型的科學發現。眾所周知,由於RNN本質上具有遞歸機制,因此在神經元級別的分析會比分析DNN或CNN模型更具挑戰性。本文旨在提供神經元級的分析,以解釋為什麼基於單純GRU的Seq2Seq模型不需attention的機制即可成功地以很高的正確率、照順序輸出正確的token。我們發現了兩種神經元集合:存儲神經元和倒數神經元,分別存儲token和位置信息。通過分析這兩組神經元在各個時間點如何轉變以及它們的相互作用,我們可以揭開模型如何在正確位置產生正確token的機制。 [zh_TW]
dc.description.abstract: The goal of this paper is to report certain scientific discoveries about a Seq2Seq model. Analyzing the behavior of RNN-based models at the neuron level is considered more challenging than analyzing DNN or CNN models because of their inherently recursive mechanism. This paper provides a neuron-level analysis to explain why a vanilla GRU-based Seq2Seq model without attention can successfully output the correct tokens in the correct order with very high accuracy. We found two types of neuron sets, storage neurons and count-down neurons, which store token and position information respectively. By analyzing how these two groups of neurons evolve across time steps and how they interact, we uncover the mechanism by which the model produces the right tokens at the right positions. [en]
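The abstract above describes a vanilla GRU-based Seq2Seq autoencoder without attention, whose decoder must reproduce the input sequence from a single fixed-size hidden state. For orientation only, a minimal sketch of such a model follows; the framework (PyTorch), the vocabulary size, hidden size, and toy reconstruction setup are illustrative assumptions, not the thesis's actual configuration.

```python
# Minimal sketch (not the thesis code): a vanilla GRU-based Seq2Seq autoencoder
# without attention. The decoder receives only the encoder's final hidden state,
# so all token and position information must be packed into that single vector.
import torch
import torch.nn as nn


class GRUSeq2SeqAutoencoder(nn.Module):
    def __init__(self, vocab_size=100, emb_size=64, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.encoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src, dec_in):
        _, h = self.encoder(self.embed(src))      # final hidden state only; no attention
        dec_out, _ = self.decoder(self.embed(dec_in), h)
        return self.out(dec_out)                  # (batch, time, vocab) logits


# Toy usage: reconstruct a random token sequence (token id 0 reserved as BOS).
model = GRUSeq2SeqAutoencoder()
src = torch.randint(1, 100, (8, 12))              # batch of 8 sequences, length 12
bos = torch.zeros(8, 1, dtype=torch.long)
dec_in = torch.cat([bos, src[:, :-1]], dim=1)     # teacher forcing: shifted targets
logits = model(src, dec_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 100), src.reshape(-1))
```

Because no attention is used, every token and its output position must be encoded somewhere in the hidden state, which is what motivates looking for storage and count-down neurons.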
dc.description.provenance: Made available in DSpace on 2021-06-15T13:24:28Z (GMT). No. of bitstreams: 1. U0001-1008202016174700.pdf: 5799503 bytes, checksum: 42f653120e6acb5ac6e274a062fd2768 (MD5). Previous issue date: 2020 [en]
dc.description.tableofcontents:
Acknowledgements i
Chinese Abstract ii
Abstract iii
List of Figures vi
List of Tables ix
1 Introduction 1
2 Related Work 5
3 Experiment Setup 7
3.1 Data Collection 7
3.2 Model and Training Details 8
4 Neuron Identification Algorithm 10
4.1 Hypothesis formulation and candidate neuron generation 10
4.2 Filtering 12
4.3 Verification by manipulating the neuron values 13
5 Hypotheses Verification 15
5.1 In each hidden state, how many neurons are storing the information of "y_T = token_A"? 15
5.2 Do storage neurons change over different time steps? 18
5.3 If the same token is to be output at different positions T, what is the relationship between the two sets of storage neurons? 22
5.4 How does h_t store all token information efficiently? 25
5.5 Does each token have its own set of count-down neurons? 27
5.6 How do count-down neurons behave? 29
5.7 Why do the storage neurons remain unchanged and then start to change at T - k? 29
5.8 How do count-down neurons affect storage neurons? 33
5.9 Summary of findings 36
6 Conclusion 38
References 39
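Chapter 4 above outlines the neuron identification procedure, whose final step is verification by manipulating the neuron values (Section 4.3). As a purely illustrative sketch of that kind of intervention, reusing the assumed GRUSeq2SeqAutoencoder, `model`, and `src` from the earlier snippet, the code below clamps a chosen subset of hidden units during greedy decoding and measures how many output tokens change; the neuron indices, clamp value, and helper name are hypothetical, not findings from the thesis.

```python
# Illustrative sketch only: clamp a hypothetical subset of hidden units while
# decoding greedily, then compare against the unmanipulated output. Assumes
# `GRUSeq2SeqAutoencoder`, `model`, and `src` from the earlier sketch.
import torch


@torch.no_grad()
def greedy_decode(model, src, max_len=12, clamp_idx=None, clamp_val=0.0, bos_token=0):
    _, h = model.encoder(model.embed(src))        # encode the source sequence
    tok = torch.full((src.size(0), 1), bos_token, dtype=torch.long)
    outputs = []
    for _ in range(max_len):
        if clamp_idx is not None:
            h[..., clamp_idx] = clamp_val         # manipulate the selected neurons
        dec_out, h = model.decoder(model.embed(tok), h)
        tok = model.out(dec_out[:, -1]).argmax(dim=-1, keepdim=True)
        outputs.append(tok)
    return torch.cat(outputs, dim=1)


baseline = greedy_decode(model, src)                         # no intervention
ablated = greedy_decode(model, src, clamp_idx=[3, 17, 42])   # hypothetical neuron ids
print("fraction of output tokens changed:",
      (baseline != ablated).float().mean().item())
```

If clamping a small set of units reliably changes which tokens appear or when they appear, that is the kind of evidence such a verification step would look for.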
dc.language.iso: en
dc.title: 基於GRU的序列對序列自動編碼器的神經元功能之分析 [zh_TW]
dc.title: Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Autoencoder [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 林智仁 (Chih-Jen Lin), 林軒田 (Hsuan-Tien Lin), 李宏毅 (Hung-Yi Lee), 陳縕儂 (Yun-Nung Chen)
dc.subject.keyword: GRU, 序列對序列模型, 自動編碼器, 神經元功能 [zh_TW]
dc.subject.keyword: Gated Recurrent Unit, Sequence-to-Sequence Model, Autoencoder, Neuron Functionalities [en]
dc.relation.page: 41
dc.identifier.doi: 10.6342/NTU202002828
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-08-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
  File: U0001-1008202016174700.pdf
  Size: 5.66 MB
  Format: Adobe PDF
  Access: currently not authorized for public access

