Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72175

Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 李琳山 (Lin-shan Lee)
dc.contributor.author: Lu, Bo-Ru [en]
dc.contributor.author: 盧柏儒 [zh_TW]
dc.date.accessioned: 2021-06-17T06:27:12Z
dc.date.available: 2019-09-03
dc.date.copyright: 2018-09-03
dc.date.issued: 2018
dc.date.submitted: 2018-08-16
dc.identifier.citation:
[1] Alex Graves and Navdeep Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in International Conference on Machine Learning, 2014, pp. 1764–1772.
[2] “YouTube statistics,” https://fortunelords.com/youtube-statistics/, accessed Feb. 2018.
[3] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, “Spoken lecture summarization by random walk over a graph constructed with automatically extracted key terms,” in Twelfth Annual Conference of the International Speech Communication Association, 2011.
[4] Sz-Rung Shiang, Hung-yi Lee, and Lin-shan Lee, “Supervised spoken document summarization based on structured support vector machine with utterance clusters as hidden variables,” in INTERSPEECH, 2013, pp. 2728–2732.
[5] Hung-yi Lee, Sz-Rung Shiang, Ching-feng Yeh, Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-shan Lee, “Spoken knowledge organization by semantic structuring and a prototype course lecture system for personalized learning,” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 22, no. 5, pp. 883–898, 2014.
[6] Sameer Raj Maskey and Julia Hirschberg, Automatic Broadcast News Speech Summarization, Columbia University, 2008.
[7] Shih-Hsiang Lin, Berlin Chen, and Hsin-Min Wang, “A comparative study of probabilistic ranking models for Chinese spoken document summarization,” ACM Transactions on Asian Language Information Processing (TALIP), vol. 8, no. 1, p. 3, 2009.
[8] Xiaoyan Cai and Wenjie Li, “Ranking through clustering: An integrated approach to multi-document summarization,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1424–1433, 2013.
[9] Shasha Xie, Dilek Hakkani-Tür, Benoit Favre, and Yang Liu, “Integrating prosodic features in extractive meeting summarization,” in Automatic Speech Recognition & Understanding (ASRU 2009), IEEE Workshop on. IEEE, 2009, pp. 387–391.
[10] Yun-Nung Chen and Florian Metze, “Two-layer mutually reinforced random walk for improved multi-party meeting summarization,” 2012.
[11] Fei Liu and Yang Liu, “Towards abstractive speech summarization: Exploring unsupervised and supervised approaches for spoken utterance compression,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1469–1480, 2013.
[12] Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber, “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks,” in Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006, pp. 369–376.
[13] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 3104–3112.
[14] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[15] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
[16] Alexander M. Rush, Sumit Chopra, and Jason Weston, “A neural attention model for abstractive sentence summarization,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
[17] Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, and Lin-Shan Lee, “Order-preserving abstractive summarization for spoken content based on connectionist temporal classification,” Interspeech, 2017.
[18] Sepp Hochreiter and Jürgen Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[19] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning, “Effective approaches to attention-based neural machine translation,” arXiv preprint arXiv:1508.04025, 2015.
[20] Michele Banko, Vibhu O. Mittal, and Michael J. Witbrock, “Headline generation based on statistical translation,” in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2000, pp. 318–325.
[21] Bonnie Dorr, David Zajic, and Richard Schwartz, “Hedge Trimmer: A parse-and-trim approach to headline generation,” in Proceedings of the HLT-NAACL 03 Text Summarization Workshop, Volume 5. Association for Computational Linguistics, 2003, pp. 1–8.
[22] Songhua Xu, Shaohui Yang, and Francis Chi-Moon Lau, “Keyword extraction and headline generation using novel word features,” in AAAI, 2010, pp. 1461–1466.
[23] Konstantin Lopyrev, “Generating news headlines with recurrent neural networks,” arXiv preprint arXiv:1512.01712, 2015.
[24] Jianpeng Cheng and Mirella Lapata, “Neural summarization by extracting sentences and words,” arXiv preprint arXiv:1603.07252, 2016.
[25] Lang-Chi Yu, Hung-yi Lee, and Lin-Shan Lee, “Abstractive headline generation for spoken content by attentive recurrent neural networks with ASR error modeling,” in Spoken Language Technology Workshop (SLT). IEEE, 2016, pp. 151–157.
[26] David Graff and Ke Chen, “Chinese Gigaword,” LDC Catalog No. LDC2003T09, ISBN, vol. 1, pp. 58563–58230, 2005.
[27] Hsin-Min Wang, Berlin Chen, Jen-Wei Kuo, and Shih-Sian Cheng, “MATBN: A Mandarin Chinese broadcast news corpus,” International Journal of Computational Linguistics & Chinese Language Processing, vol. 10, no. 2 (Special Issue on Annotated Speech Corpora), pp. 219–236, June 2005.
[28] Jeffrey Pennington, Richard Socher, and Christopher Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[29] Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011, vol. EPFL-CONF-192584.
[30] Mike Schuster and Kuldip K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[31] Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky, “Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent,” 2012.
[32] Mike Paterson and Vlado Dančík, “Longest common subsequences,” in International Symposium on Mathematical Foundations of Computer Science. Springer, 1994, pp. 127–142.
[33] Chin-Yew Lin, “ROUGE: A package for automatic evaluation of summaries,” Text Summarization Branches Out, 2004.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72175
dc.description.abstract: With the rapid growth of multimedia on the Internet, the volume of data is exploding far faster than humans can absorb it. Unlike text documents, however, spoken content cannot be grasped at a glance or skimmed quickly; a listener must play the audio through. A good technique for automatically generating headlines for spoken documents can therefore describe each document's content with one concise headline, greatly reducing the time users spend searching spoken content for knowledge.
This thesis is the first work to apply the connectionist temporal classification (CTC) model, currently one of the best-performing approaches in speech recognition, to automatic summarization of spoken documents. A desirable property of the CTC algorithm is that the generated headline preserves the word order of the source document while skipping unimportant words, so the semantics are largely retained.
This thesis compares the CTC model, the sequence-to-sequence (seq2seq) model, and the attentive sequence-to-sequence (attentive seq2seq) model on both text documents (the Chinese Gigaword corpus) and spoken documents (Mandarin broadcast news), and analyzes, through experimental results and examples, the advantages of the CTC model for headline generation on text and spoken documents. [zh_TW]
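The order-preserving property mentioned in the abstract follows from how CTC paths are collapsed into an output sequence: consecutive repeated labels are merged, then blanks are removed, so the final labels can only appear in the order the network emitted them frame by frame. A minimal sketch of this standard collapsing rule (illustrative only, not code from the thesis):

```python
from itertools import groupby

def ctc_collapse(path, blank="-"):
    """Collapse a frame-level CTC path into an output label sequence:
    merge consecutive repeated labels, then drop blank symbols.
    Because labels are only merged or dropped, never reordered, the
    output always preserves the order of the input frames."""
    merged = [label for label, _ in groupby(path)]   # merge repeats
    return [label for label in merged if label != blank]  # drop blanks
```

For example, the path `a a - b - b b - c` collapses to `a b b c`: the blank between the two runs of `b` keeps them as two separate output tokens.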
dc.description.provenance: Made available in DSpace on 2021-06-17T06:27:12Z (GMT). No. of bitstreams: 1
ntu-107-R05942070-1.pdf: 3973798 bytes, checksum: 1e15e796e3f78d5d4ff4e094eb327d8c (MD5)
Previous issue date: 2018 [en]
dc.description.tableofcontents:
Thesis Committee Certification
Chinese Abstract
1. Introduction
  1.1 Motivation
  1.2 Research Direction and Contributions
  1.3 Chapter Organization
2. Background
  2.1 Fundamentals of Deep Learning
    2.1.1 Basic Introduction
    2.1.2 Deep Neural Networks
    2.1.3 Recurrent Neural Networks
  2.2 Deep Learning Models Used in This Thesis
    2.2.1 Connectionist Temporal Classification (CTC) Model
    2.2.2 Sequence-to-Sequence Model
    2.2.3 Attention Mechanism
  2.3 Decoding Algorithms
    2.3.1 Greedy Search
    2.3.2 Beam Search
    2.3.3 Prefix Beam Search
  2.4 Chapter Summary
3. Automatic Summarization and Headline Generation
  3.1 Automatic Summarization and Headline Generation
    3.1.1 Automatic News Headline Generation
    3.1.2 Corpora
  3.2 Data Preprocessing
    3.2.1 Chinese Characters and Words
    3.2.2 Word Vectors
    3.2.3 Automatic Speech Recognition System
  3.3 Deep Learning Models for Automatic Headline Generation
    3.3.1 Connectionist Temporal Classification Model
    3.3.2 Sequence-to-Sequence and Attentive Sequence-to-Sequence Models
  3.4 Chapter Summary
4. Experimental Setup and Results
    4.0.1 Model Input and Output
  4.1 Experimental Setup
    4.1.1 Model Parameters
    4.1.2 Training
  4.2 Evaluation Metrics
    4.2.1 Longest Common Subsequence
    4.2.2 ROUGE
  4.3 Results and Analysis
    4.3.1 Effect of Input Unit on the Results
    4.3.2 Effect of the Longest-Common-Subsequence Value on the Results
    4.3.3 Effect of Prefix Beam Search on the Results
  4.4 Example Predictions on the Test Set
    4.4.1 Example Headlines Predicted for Text Documents
    4.4.2 Example Headlines Predicted for Spoken Documents
    4.4.3 Example Headlines Predicted with Prefix Beam Search
  4.5 Chapter Summary
5. Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References
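Section 4.2.1 evaluates generated headlines with the longest common subsequence (LCS), the measure that also underlies ROUGE-L [33]. A minimal dynamic-programming sketch of LCS length (illustrative, not the thesis implementation):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b,
    computed with the classic O(len(a) * len(b)) dynamic program."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one symbol
    return dp[m][n]
```

Because an LCS keeps matched tokens in order, it directly measures how well a candidate headline preserves the reference word order.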
dc.language.iso: zh-TW
dc.subject: 自動摘要 [zh_TW]
dc.subject: 語音數位內容 [zh_TW]
dc.subject: 標題生成 [zh_TW]
dc.subject: 鏈結式時序分類器 [zh_TW]
dc.subject: Headline Generation [en]
dc.subject: Automatic Summarization [en]
dc.subject: Spoken Content [en]
dc.subject: Connectionist Temporal Classification [en]
dc.title: 基於鏈結式時序分類器之能保留語序之語音數位內容標題生成 [zh_TW]
dc.title: Order-Preserving Abstractive Headline Generation for Spoken Content Based on Connectionist Temporal Classification [en]
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 王小川, 陳信宏, 鄭秋豫, 李宏毅
dc.subject.keyword: 自動摘要, 語音數位內容, 標題生成, 鏈結式時序分類器 [zh_TW]
dc.subject.keyword: Automatic Summarization, Headline Generation, Spoken Content, Connectionist Temporal Classification [en]
dc.relation.page: 70
dc.identifier.doi: 10.6342/NTU201803826
dc.rights.note: 有償授權 (paid authorization required)
dc.date.accepted: 2018-08-17
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) [zh_TW]
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File: ntu-107-1.pdf (restricted; not authorized for public access)
Size: 3.88 MB
Format: Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
