Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68233
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李宏毅 | |
dc.contributor.author | Po-Yu Wu | en |
dc.contributor.author | 吳柏瑜 | zh_TW |
dc.date.accessioned | 2021-06-17T02:15:21Z | - |
dc.date.available | 2019-01-04 | |
dc.date.copyright | 2018-01-04 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-10-20 | |
dc.identifier.citation | [1] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[2] 沈昇勳, "Improving learning efficiency through automatic structuring, classification, and comprehension of online courses" (藉助線上課程之自動結構化、分類與理解以提升學習效率), 2016.
[3] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[4] Yann LeCun, Yoshua Bengio, et al., "Convolutional networks for images, speech, and time series," The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, p. 1995, 1995.
[5] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.
[6] Tomas Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur, "Recurrent neural network based language model," in INTERSPEECH, 2010, pp. 1045–1048.
[7] Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[8] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015. Software available from tensorflow.org.
[9] Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al., "In-datacenter performance analysis of a tensor processing unit," arXiv preprint arXiv:1704.04760, 2017.
[10] Yoshua Bengio, Patrice Simard, and Paolo Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
[11] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[12] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[14] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in CVPR, 2009.
[15] Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, and Tomas Mikolov, "Towards AI-complete question answering: A set of prerequisite toy tasks," arXiv preprint arXiv:1502.05698, 2015.
[16] Matthew Richardson, Christopher J. C. Burges, and Erin Renshaw, "MCTest: A challenge dataset for the open-domain machine comprehension of text," in EMNLP, 2013, vol. 3, p. 4.
[17] Yi Yang, Wen-tau Yih, and Christopher Meek, "WikiQA: A challenge dataset for open-domain question answering," in EMNLP, 2015, pp. 2013–2018.
[18] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom, "Teaching machines to read and comprehend," in Advances in Neural Information Processing Systems, 2015, pp. 1693–1701.
[19] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang, "SQuAD: 100,000+ questions for machine comprehension of text," arXiv preprint arXiv:1606.05250, 2016.
[20] Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng, "MS MARCO: A human generated machine reading comprehension dataset," arXiv preprint arXiv:1611.09268, 2016.
[21] Volodymyr Mnih, Nicolas Heess, Alex Graves, et al., "Recurrent models of visual attention," in Advances in Neural Information Processing Systems, 2014, pp. 2204–2212.
[22] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[23] Jason Weston, Sumit Chopra, and Antoine Bordes, "Memory networks," arXiv preprint arXiv:1410.3916, 2014.
[24] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al., "End-to-end memory networks," in Advances in Neural Information Processing Systems, 2015, pp. 2440–2448.
[25] Caiming Xiong, Stephen Merity, and Richard Socher, "Dynamic memory networks for visual and textual question answering," arXiv, vol. 1603, 2016.
[26] Chin-Yew Lin, "ROUGE: A package for automatic evaluation of summaries," in Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, Barcelona, Spain, 2004, vol. 8.
[27] Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen, "ReasoNet: Learning to stop reading in machine comprehension," arXiv preprint arXiv:1609.05284, 2016.
[28] Dirk Weissenborn, Georg Wiese, and Laura Seiffe, "Making neural QA as simple as possible but not simpler," CoNLL 2017, p. 271, 2017.
[29] Shuohang Wang and Jing Jiang, "Machine comprehension using Match-LSTM and answer pointer," arXiv preprint arXiv:1608.07905, 2016.
[30] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 806–813.
[31] Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, and Jeff Clune, "Plug & play generative networks: Conditional iterative generation of images in latent space," arXiv preprint arXiv:1612.00005, 2016.
[32] Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio, "Generating sentences from a continuous space," arXiv preprint arXiv:1511.06349, 2015. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68233 | - |
dc.description.abstract | With advances in technology and the availability of massive data, technologies we once could only imagine have become practical. The emergence of voice assistants has made the progress of technology, and of speech recognition in particular, tangible to everyday users. As users grow fonder of interacting with voice assistants, they increasingly expect the assistant to understand what they mean rather than merely forwarding the recognized speech as a search query. The focus of this thesis is short-answer question answering, which spares users the time spent on search and retrieval and directly delivers the information they want.
Starting from retrieved documents, this thesis uses deep neural networks to generate candidate answers. An attention mechanism is added so the model learns which sentences to focus on, and a review (multi-pass) mechanism lets it repeatedly re-read the passage to grasp its meaning. The concepts of plug-and-play generation and the variational recurrent autoencoder are adopted to strengthen the fluency of the language model and its semantic coherence. Through these methods, we hope to remedy the shortcoming that voice assistants mostly just return search results, and thereby improve the user experience. | zh_TW |
dc.description.provenance | Made available in DSpace on 2021-06-17T02:15:21Z (GMT). No. of bitstreams: 1 ntu-106-R04921034-1.pdf: 3818744 bytes, checksum: 5fac5a4d6feffd4294b5072f4eab702e (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Acknowledgements
Chinese Abstract
1. Introduction
  1.1 Research Motivation
  1.2 Related Work
  1.3 Research Direction
  1.4 Thesis Organization
2. Background
  2.1 Deep Neural Networks
    2.1.1 Introduction
    2.1.2 Operating Principles
    2.1.3 Training Methods
    2.1.4 Dropout
  2.2 Recurrent Neural Network (RNN)
    2.2.1 Introduction
    2.2.2 Long Short-term Memory Network
    2.2.3 Sequence-to-Sequence Model
  2.3 Word Representation: Word Embedding
    2.3.1 Basic Introduction
    2.3.2 Skip-gram Model
  2.4 MAchine Reading COmprehension (MARCO) Dataset
    2.4.1 Question Answering Systems
    2.4.2 Corpus Description
3. Descriptive Question Answering with an Attentive Memory Encoder-Decoder
  3.1 Introduction
  3.2 Model Architecture
  3.3 Basic Experimental Setup
    3.3.1 Preprocessing
    3.3.2 Baseline Experiments
  3.4 Experimental Results and Discussion
    3.4.1 Memory Cell Size
    3.4.2 Number of Review Passes
    3.4.3 Number Replacement
    3.4.4 Model Comparison
    3.4.5 Comparison by Answer Type
  3.5 Examples and Analysis
  3.6 Chapter Summary
4. Query-based Detection with Recurrent and Convolutional Neural Networks
  4.1 Introduction
  4.2 Basic Experimental Setup
    4.2.1 Convolutional Neural Network
  4.3 Experimental Results and Discussion
  4.4 Chapter Summary
5. Question Answering Model Combined with a Variational Recurrent Autoencoder
  5.1 Introduction
  5.2 Variational Recurrent Autoencoder
    5.2.1 Recurrent Autoencoder
    5.2.2 Variational Mechanism
    5.2.3 KL Divergence
    5.2.4 Loss Function
  5.3 Experimental Results and Discussion
    5.3.1 Variational Recurrent Autoencoder Experiments
    5.3.2 Question Answering Model Results
  5.4 Chapter Summary
6. Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
References | |
dc.language.iso | zh-TW | |
dc.title | 以序列對序列網路為基礎的端對端短句回覆問答系統 | zh_TW |
dc.title | End-to-End Short Text Question Answering based on Sequence-to-Sequence Network | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-1 | |
dc.description.degree | Master's (碩士) | |
dc.contributor.oralexamcommittee | 賴穎暉,陳縕儂,曹昱,江振宇 | |
dc.subject.keyword | 問答系統, | zh_TW |
dc.subject.keyword | Question Answering System, | en |
dc.relation.page | 56 | |
dc.identifier.doi | 10.6342/NTU201704255 | |
dc.rights.note | Paid authorization (有償授權) | |
dc.date.accepted | 2017-10-20 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electrical Engineering (電機工程學研究所) | zh_TW |
Appears in Collections: | Department of Electrical Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 3.73 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their licensing terms.