
DSpace

The DSpace institutional repository system is dedicated to preserving digital materials of all kinds (e.g., text, images, PDF) and making them easy to access.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81309
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 李宏毅 (Hung-yi Lee)
dc.contributor.author: Po-Han Chien
dc.contributor.author: 紀伯翰 (zh_TW)
dc.date.accessioned: 2022-11-24T03:42:20Z
dc.date.available: 2021-08-04
dc.date.available: 2022-11-24T03:42:20Z
dc.date.copyright: 2021-08-04
dc.date.issued: 2021
dc.date.submitted: 2021-07-22
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81309
dc.description.abstract: In natural language processing, machine reading comprehension (MRC) asks a machine to extract the meaning of a document and answer questions about that document, and it remains a highly challenging application. With the progress of deep learning, techniques for MRC have matured and corresponding models have appeared one after another. Building on this foundation, this thesis studies a more complex setting: conversational machine reading comprehension. In this setting, besides understanding the passage and the current question, the model must also understand how the question depends on earlier turns of the conversation, so processing the conversational context and integrating it into the reading-comprehension task becomes essential. This thesis proposes using a hypernetwork to encode the conversational context into generated parameters, letting the current passage and question interact with those parameters to obtain the contextual information and thereby improve task performance (a hypothetical code sketch of this idea follows the metadata record below). In addition, the thesis analyzes the task and uses the analysis to understand the model's behavior, in the hope that future work can build effective solutions on these findings. (zh_TW)
dc.description.provenance: Made available in DSpace on 2022-11-24T03:42:20Z (GMT). No. of bitstreams: 1. U0001-2107202101322200.pdf: 2708733 bytes, checksum: 4e996bd16d2ab95cb4f382d1c1bfd183 (MD5). Previous issue date: 2021 (en)
dc.description.tableofcontents:
Table of Contents
Oral Examination Committee Approval
Abstract (Chinese)
Abstract (English)
1. Introduction
  1.1 Research Motivation
  1.2 Research Direction
  1.3 Main Contributions
  1.4 Thesis Organization
2. Background
  2.1 Deep Neural Networks
    2.1.1 Overview
    2.1.2 Recurrent Neural Networks
    2.1.3 Transformer Networks
  2.2 Distributed Representation
    2.2.1 Overview
    2.2.2 Word Vectors
    2.2.3 Contextualized Representation
    2.2.4 Summary of Distributed Representations
  2.3 Machine Reading Comprehension
    2.3.1 Overview
    2.3.2 Datasets
    2.3.3 Deep Neural Networks for Machine Reading Comprehension
  2.4 Hypernetwork-Generated Parameters
    2.4.1 Overview
  2.5 Chapter Summary
3. Conversational Machine Reading Comprehension with a Hypernetwork
  3.1 Task Overview
    3.1.1 Datasets
    3.1.2 Evaluation Metrics
  3.2 Baseline Methods
    3.2.1 Overview
    3.2.2 Machine Reading Comprehension Model (BERTQA)
    3.2.3 Pretrained Model with Prepended History Text (Prepend History Question / Answer, PHQA)
    3.2.4 History Answer Embedding (HAE)
    3.2.5 Conversation History Model (HisBERT)
  3.3 Motivation and Model Architecture
    3.3.1 Overview
    3.3.2 Motivation
    3.3.3 Machine Reading Comprehension Model
    3.3.4 Hypernetwork Model
    3.3.5 Overall Architecture (Hypernetwork for History Function, HHF)
  3.4 Experimental Setup
  3.5 Experimental Results
    3.5.1 Effect of Cross-Domain Data
    3.5.2 Effect of Training Data Size
  3.6 Chapter Summary
4. Using the Model's Own Predictions as History Information for Later Questions
  4.1 Problem Analysis
  4.2 Experimental Setup
    4.2.1 Maximum Score
    4.2.2 Beam Search
  4.3 Experimental Results
  4.4 Analysis and Discussion
    4.4.1 Discussion of Maximum-Score Results
    4.4.2 Discussion of Beam-Search Results
  4.5 Conclusion
5. Testing Whether Predicted Future Information Helps Model Performance
  5.1 Experiment Overview
  5.2 Results and Discussion
  5.3 Chapter Summary
6. Conclusion and Future Work
  6.1 Contributions and Discussion
    6.1.1 Contributions
    6.1.2 Discussion
  6.2 Future Work
References
dc.language.iso: zh-TW
dc.subject: 問題回答 (question answering) (zh_TW)
dc.subject: 對話 (conversation) (zh_TW)
dc.subject: 機器學習 (machine learning) (zh_TW)
dc.subject: machine learning (en)
dc.subject: question answering (en)
dc.subject: conversation (en)
dc.title: 使用超網路模型處理對話情境下的機器閱讀理解與錯誤傳導探討 (zh_TW)
dc.title: Hypernetwork for Conversational Question Answering and Error Propagation from History Information (en)
dc.date.schoolyear: 109-2
dc.description.degree: Master (碩士)
dc.contributor.author-orcid: 0000-0001-8183-3781
dc.contributor.oralexamcommittee: 李琳山 (Hsin-Tsai Liu), 王小川 (Chih-Yang Tseng), 陳信宏, 鄭秋豫, 簡仁宗
dc.subject.keyword: 問題回答, 機器學習, 對話 (question answering, machine learning, conversation) (zh_TW)
dc.subject.keyword: question answering, machine learning, conversation (en)
dc.relation.page: 77
dc.identifier.doi: 10.6342/NTU202101617
dc.rights.note: Authorization granted (access restricted to campus)
dc.date.accepted: 2021-07-22
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) (zh_TW)
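
The abstract above describes conditioning a reading-comprehension model on the conversation history by having a hypernetwork generate parameters that the current passage and question then interact with. The following is a minimal, hypothetical sketch of that general idea, not the thesis's HHF implementation: the module names, tensor sizes, the use of PyTorch, and the choice of generating a single linear layer's weights are all assumptions made for illustration.

```python
# Illustrative sketch only: a hypernetwork maps a pooled conversation-history
# representation to the weights of a small layer, and the current passage/question
# token representations interact with those generated weights before span prediction.
import torch
import torch.nn as nn

class HistoryHypernetwork(nn.Module):
    """Maps a pooled history vector to the parameters of one linear layer."""
    def __init__(self, hist_dim: int, hidden_dim: int):
        super().__init__()
        self.to_weight = nn.Linear(hist_dim, hidden_dim * hidden_dim)
        self.to_bias = nn.Linear(hist_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, hist_vec: torch.Tensor):
        # hist_vec: (batch, hist_dim), e.g. a pooled encoding of previous QA turns
        w = self.to_weight(hist_vec).view(-1, self.hidden_dim, self.hidden_dim)
        b = self.to_bias(hist_vec)
        return w, b

class ConvQAHead(nn.Module):
    """Applies the history-generated parameters to the current token representations,
    then predicts extractive-QA start/end logits."""
    def __init__(self, hidden_dim: int, hist_dim: int):
        super().__init__()
        self.hyper = HistoryHypernetwork(hist_dim, hidden_dim)
        self.span_head = nn.Linear(hidden_dim, 2)  # start / end logits

    def forward(self, token_reps: torch.Tensor, hist_vec: torch.Tensor):
        # token_reps: (batch, seq_len, hidden_dim) from a pretrained encoder (e.g. BERT)
        w, b = self.hyper(hist_vec)
        # Current input interacts with the generated parameters.
        mixed = torch.tanh(torch.einsum("bsh,bhk->bsk", token_reps, w) + b.unsqueeze(1))
        logits = self.span_head(mixed)          # (batch, seq_len, 2)
        return logits[..., 0], logits[..., 1]   # start logits, end logits

# Usage with random tensors standing in for encoder outputs:
head = ConvQAHead(hidden_dim=768, hist_dim=768)
tokens = torch.randn(2, 128, 768)   # current passage + question representations
history = torch.randn(2, 768)       # pooled representation of earlier turns
start_logits, end_logits = head(tokens, history)
```

In this sketch the history vector would come from pooling an encoder over the previous question-answer turns; how the thesis actually encodes the history and where the generated parameters are inserted into the model may differ.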
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File: U0001-2107202101322200.pdf, 2.65 MB, Adobe PDF
Access restricted to NTU campus IP addresses (off campus, please use the VPN service)


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
