Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51054
Full metadata record (DC field: value [language])
dc.contributor.advisor: 林守德 (Shou-De Lin)
dc.contributor.author: Yi-Ting Lee [en]
dc.contributor.author: 李漪莛 [zh_TW]
dc.date.accessioned: 2021-06-15T13:24:28Z
dc.date.available: 2021-07-31
dc.date.copyright: 2020-08-25
dc.date.issued: 2020
dc.date.submitted: 2020-08-11
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51054
dc.description.abstract: 本文的目的在報告有關Seq2Seq模型的科學發現。眾所周知,由於RNN本質上具有遞歸機制,因此在神經元級別的分析會比分析DNN或CNN模型更具挑戰性。本文旨在提供神經元級的分析,以解釋為什麼基於單純GRU的Seq2Seq模型不需attention的機制即可成功地以很高的正確率、照順序輸出正確的token。我們發現了兩種神經元集合:存儲神經元和倒數神經元,分別存儲token和位置信息。通過分析這兩組神經元在各個時間點如何轉變以及它們的相互作用,我們可以揭開模型如何在正確位置產生正確token的機制。 [zh_TW]
dc.description.abstract: The goal of this paper is to report certain scientific discoveries about a Seq2Seq model. Analyzing the behavior of RNN-based models at the neuron level is considered more challenging than analyzing DNN or CNN models because of their inherently recursive mechanism. This paper provides a neuron-level analysis to explain why a vanilla GRU-based Seq2Seq model without attention can successfully output the correct tokens in the correct order with very high accuracy. We found two types of neuron sets, storage neurons and count-down neurons, which store token and position information respectively. By analyzing how these two groups of neurons evolve across time steps and how they interact, we uncover the mechanism by which the model produces the right tokens at the right positions. [en]
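The abstract above describes a vanilla GRU-based Seq2Seq autoencoder without attention, whose decoder must reproduce the input sequence from a single fixed-size hidden state. For orientation only, a minimal sketch of such a model follows; the framework (PyTorch), the vocabulary size, hidden size, and toy reconstruction setup are illustrative assumptions, not the thesis's actual configuration.

```python
# Minimal sketch (not the thesis code): a vanilla GRU-based Seq2Seq autoencoder
# without attention. The decoder receives only the encoder's final hidden state,
# so all token and position information must be packed into that single vector.
import torch
import torch.nn as nn


class GRUSeq2SeqAutoencoder(nn.Module):
    def __init__(self, vocab_size=100, emb_size=64, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.encoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src, dec_in):
        _, h = self.encoder(self.embed(src))      # final hidden state only; no attention
        dec_out, _ = self.decoder(self.embed(dec_in), h)
        return self.out(dec_out)                  # (batch, time, vocab) logits


# Toy usage: reconstruct a random token sequence (token id 0 reserved as BOS).
model = GRUSeq2SeqAutoencoder()
src = torch.randint(1, 100, (8, 12))              # batch of 8 sequences, length 12
bos = torch.zeros(8, 1, dtype=torch.long)
dec_in = torch.cat([bos, src[:, :-1]], dim=1)     # teacher forcing: shifted targets
logits = model(src, dec_in)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 100), src.reshape(-1))
```

Because no attention is used, every token and its output position must be encoded somewhere in the hidden state, which is what motivates looking for storage and count-down neurons.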
dc.description.provenance: Made available in DSpace on 2021-06-15T13:24:28Z (GMT). No. of bitstreams: 1. U0001-1008202016174700.pdf: 5799503 bytes, checksum: 42f653120e6acb5ac6e274a062fd2768 (MD5). Previous issue date: 2020 [en]
dc.description.tableofcontents:
Acknowledgements i
Chinese Abstract ii
Abstract iii
List of Figures vi
List of Tables ix
1 Introduction 1
2 Related Work 5
3 Experiment Setup 7
3.1 Data Collection 7
3.2 Model and Training Details 8
4 Neuron Identification Algorithm 10
4.1 Hypothesis formulation and candidate neuron generation 10
4.2 Filtering 12
4.3 Verification by manipulating the neuron values 13
5 Hypotheses Verification 15
5.1 In each hidden state, how many neurons are storing the information of "y_T = token_A"? 15
5.2 Do storage neurons change over different time steps? 18
5.3 If the same token is to be output at different positions T, what is the relationship between the two sets of storage neurons? 22
5.4 How does h_t store all token information efficiently? 25
5.5 Does each token have its own set of count-down neurons? 27
5.6 How do count-down neurons behave? 29
5.7 Why do the storage neurons remain unchanged and then start to change at T - k? 29
5.8 How do count-down neurons affect storage neurons? 33
5.9 Summary of findings 36
6 Conclusion 38
References 39
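Chapter 4 above outlines the neuron identification procedure, whose final step is verification by manipulating the neuron values (Section 4.3). As a purely illustrative sketch of that kind of intervention, reusing the assumed GRUSeq2SeqAutoencoder, `model`, and `src` from the earlier snippet, the code below clamps a chosen subset of hidden units during greedy decoding and measures how many output tokens change; the neuron indices, clamp value, and helper name are hypothetical, not findings from the thesis.

```python
# Illustrative sketch only: clamp a hypothetical subset of hidden units while
# decoding greedily, then compare against the unmanipulated output. Assumes
# `GRUSeq2SeqAutoencoder`, `model`, and `src` from the earlier sketch.
import torch


@torch.no_grad()
def greedy_decode(model, src, max_len=12, clamp_idx=None, clamp_val=0.0, bos_token=0):
    _, h = model.encoder(model.embed(src))        # encode the source sequence
    tok = torch.full((src.size(0), 1), bos_token, dtype=torch.long)
    outputs = []
    for _ in range(max_len):
        if clamp_idx is not None:
            h[..., clamp_idx] = clamp_val         # manipulate the selected neurons
        dec_out, h = model.decoder(model.embed(tok), h)
        tok = model.out(dec_out[:, -1]).argmax(dim=-1, keepdim=True)
        outputs.append(tok)
    return torch.cat(outputs, dim=1)


baseline = greedy_decode(model, src)                         # no intervention
ablated = greedy_decode(model, src, clamp_idx=[3, 17, 42])   # hypothetical neuron ids
print("fraction of output tokens changed:",
      (baseline != ablated).float().mean().item())
```

If clamping a small set of units reliably changes which tokens appear or when they appear, that is the kind of evidence such a verification step would look for.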
dc.language.iso: en
dc.title: 基於GRU的序列對序列自動編碼器的神經元功能之分析 [zh_TW]
dc.title: Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Autoencoder [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 林智仁 (Chih-Jen Lin), 林軒田 (Hsuan-Tien Lin), 李宏毅 (Hung-Yi Lee), 陳縕儂 (Yun-Nung Chen)
dc.subject.keyword: GRU, 序列對序列模型, 自動編碼器, 神經元功能 [zh_TW]
dc.subject.keyword: Gated Recurrent Unit, Sequence-to-Sequence Model, Autoencoder, Neuron Functionalities [en]
dc.relation.page: 41
dc.identifier.doi: 10.6342/NTU202002828
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-08-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
  File: U0001-1008202016174700.pdf
  Size: 5.66 MB
  Format: Adobe PDF
  Access: currently not authorized for public access

