Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67593

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李琳山(Lin-Shan Lee) | |
| dc.contributor.author | Zih-Wei Lin | en |
| dc.contributor.author | 林資偉 | zh_TW |
| dc.date.accessioned | 2021-06-17T01:39:17Z | - |
| dc.date.available | 2017-08-01 | |
| dc.date.copyright | 2017-08-01 | |
| dc.date.issued | 2017 | |
| dc.date.submitted | 2017-07-28 | |
| dc.identifier.citation | [1] Hung-Yi Lee, Bo-Hsiang Tseng, Tsung-Hsien Wen, and Yu Tsao, "Personalizing recurrent-neural-network-based language model by social network," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 25, no. 3, pp. 519–530, 2017.
[2] John Paolillo, "The virtual speech community: Social network and language variation on IRC," Journal of Computer-Mediated Communication, vol. 4, no. 4, 1999.
[3] Devan Rosen and Margaret Corbit, "Social network analysis in virtual environments," in Proceedings of the 20th ACM Conference on Hypertext and Hypermedia. ACM, 2009, pp. 317–322.
[4] Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Černocký, and Sanjeev Khudanpur, "Recurrent neural network based language model," in Interspeech, 2010, vol. 2, p. 3.
[5] Bo-Hsiang Tseng, Hung-Yi Lee, and Lin-Shan Lee, "Personalizing universal recurrent neural network language model with user characteristic features by social network crowdsourcing," in Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 2015, pp. 84–91.
[6] Tsung-Hsien Wen, Aaron Heidel, Hung-Yi Lee, Yu Tsao, and Lin-Shan Lee, "Recurrent neural network based personalized language modeling by social network crowdsourcing," in Proc. Interspeech, 2013.
[7] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[8] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[9] Christopher J. Leggetter and Philip C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech & Language, vol. 9, no. 2, pp. 171–185, 1995.
[10] Mark J. F. Gales and Philip C. Woodland, "Mean and variance adaptation within the MLLR framework," Computer Speech & Language, vol. 10, no. 4, pp. 249–264, 1996.
[11] Mark J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech & Language, vol. 12, no. 2, pp. 75–98, 1998.
[12] J.-L. Gauvain and Chin-Hui Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 291–298, 1994.
[13] Michael Finke, Petra Geutner, Hermann Hild, Thomas Kemp, Klaus Ries, and Martin Westphal, "The Karlsruhe-Verbmobil speech recognition engine," in Acoustics, Speech, and Signal Processing (ICASSP-97), 1997 IEEE International Conference on. IEEE, 1997, vol. 1, pp. 83–86.
[14] Roland Kuhn, J.-C. Junqua, Patrick Nguyen, and Nancy Niedzielski, "Rapid speaker adaptation in eigenvoice space," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 6, pp. 695–707, 2000.
[15] Jian Xue, Jinyu Li, Dong Yu, Mike Seltzer, and Yifan Gong, "Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 6359–6363.
[16] George Saon, Hagen Soltau, David Nahamoo, and Michael Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in ASRU, 2013, pp. 55–59.
[17] Peter F. Brown, Peter V. Desouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai, "Class-based n-gram models of natural language," Computational Linguistics, vol. 18, no. 4, pp. 467–479, 1992.
[18] Joshua T. Goodman, "A bit of progress in language modeling," Computer Speech & Language, vol. 15, no. 4, pp. 403–434, 2001.
[19] Jerome R. Bellegarda, "Statistical language model adaptation: review and perspectives," Speech Communication, vol. 42, no. 1, pp. 93–108, 2004.
[20] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, no. Feb, pp. 1137–1155, 2003.
[21] Rukmini M. Iyer and Mari Ostendorf, "Modeling long distance dependence in language: Topic mixtures versus dynamic cache models," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, pp. 30–39, 1999.
[22] Aaron Heidel, Hung-an Chang, and Lin-shan Lee, "Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm," in Interspeech, Antwerp, Belgium, 2007.
[23] Marcello Federico, "Efficient language model adaptation through MDI estimation," in Eurospeech, 1999.
[24] Ciprian Chelba and Frederick Jelinek, "Structured language modeling," Computer Speech & Language, vol. 14, no. 4, pp. 283–332, 2000.
[25] Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[26] George E. Dahl, Tara N. Sainath, and Geoffrey E. Hinton, "Improving deep neural networks for LVCSR using rectified linear units and dropout," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 8609–8613.
[27] Tasha Nagamine, Michael L. Seltzer, and Nima Mesgarani, "Exploring how deep neural networks form phonemic categories," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
[28] Vinod Nair and Geoffrey E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[29] Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[30] Grégoire Mesnil, Yann Dauphin, Kaisheng Yao, Yoshua Bengio, Li Deng, Dilek Hakkani-Tur, Xiaodong He, Larry Heck, Gokhan Tur, Dong Yu, et al., "Using recurrent neural networks for slot filling in spoken language understanding," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 23, no. 3, pp. 530–539, 2015.
[31] Tijmen Tieleman and Geoffrey Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, pp. 26–31, 2012.
[32] Tomas Mikolov and Geoffrey Zweig, "Context dependent recurrent neural network language model," in SLT, vol. 12, pp. 234–239, 2012.
[33] Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Xugang Lu, Chiori Hori, and Shigeru Katagiri, "Speaker adaptive training for deep neural networks embedding linear transformation networks," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4605–4609.
[34] Bo Li and Khe Chai Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," 2010.
[35] Matt Taddy, "Document classification by inversion of distributed language representations," CoRR, vol. abs/1504.07295, 2015. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67593 | - |
| dc.description.abstract | With the massive data generated in the Internet era and the advance of machine learning techniques, speech applications such as voice assistants are no longer mere gimmicks; with powerful recognition and steadily improving understanding capabilities, they have entered the daily lives of this generation. Today, large companies use vast amounts of speech data and enormous computing resources to build a universal speech recognition system on cloud servers to serve all users. Tens of thousands of users rely on the same universal speech technology every day: some receive satisfying service, while others are frustrated because the universal technology fails to recognize or understand them correctly. One possible reason is that users' utterances mix in personal speaking habits, such as catchphrases, slang, wording from special topics or domains, or the names of friends, which the recognition or understanding models cannot match well.
The remedy is to build, for each user, a dedicated personalized speech technology that captures his or her individual speaking habits, compensating for the limited recognition and understanding ability of the universal model. This thesis addresses the linguistic processing part: using corpora collected from social networks, it learns each user's distinct wording habits and linguistic features from this small amount of personalized data to support personalized speech processing and semantic understanding. The thesis first personalizes the language model in the speech recognition system: a deep neural network extracts from the personalized data a vector representing the user's linguistic features, and this user feature vector is used to personalize the universal language model and improve recognition accuracy. Second, the thesis personalizes the word vectors that represent semantics, so that the same word carries slightly different meanings for different users, coming closer to what each user intends to express and strengthening personalized speech processing and semantic understanding. Through personalization in these two directions, we hope to compensate for the shortcomings of universal speech recognition technology and improve the user experience of speech processing. (Minimal code sketches of these two ideas are given after the metadata table below.) | zh_TW |
| dc.description.provenance | Made available in DSpace on 2021-06-17T01:39:17Z (GMT). No. of bitstreams: 1 ntu-106-R04942111-1.pdf: 3466054 bytes, checksum: da2cd274b5b24371092c0192e6694306 (MD5) Previous issue date: 2017 | en |
| dc.description.tableofcontents | Acknowledgements
Chinese Abstract
Chapter 1: Introduction
  1.1 Background
  1.2 Motivation
  1.3 Research Directions
  1.4 Main Contributions
  1.5 Thesis Organization
Chapter 2: Background
  2.1 Personalized Speech Recognition Systems
    2.1.1 Acoustic Model Adaptation
  2.2 Language Models
    2.2.1 N-gram Language Models
    2.2.2 N-gram Language Model Adaptation
    2.2.3 Deep Neural Network Language Models
    2.2.4 Recurrent Neural Network Language Models
    2.2.5 Language Model Evaluation
  2.3 Word Vectors (Word2vec)
    2.3.1 Model Architecture
    2.3.2 Performance Evaluation by Word Vector Cosine Distance
    2.3.3 Performance Evaluation on Word Vector Cloze Tasks
Chapter 3: A Social Network Browser with a Speech Interface
  3.1 Introduction
  3.2 System Architecture
  3.3 System Ecosystem
Chapter 4: Personalized Language Models Based on Recurrent Neural Network Bottleneck Features
  4.1 Language Model Personalization Based on User Features
  4.2 User Linguistic Feature Extraction Model
    4.2.1 Predictive Feature Extraction Model Architecture
    4.2.2 Experimental Setup
    4.2.3 Experimental Results
  4.3 Personalized Language Models Incorporating the Feature Extraction Network
    4.3.1 Bottleneck Features
    4.3.2 Personalized Language Model Architecture
    4.3.3 Experimental Setup for the Personalized Language Model
    4.3.4 Experimental Results for the Personalized Language Model
    4.3.5 Analysis of User Linguistic Features
  4.4 Chapter Summary
Chapter 5: Personalized Language Models Based on Model Transformation
  5.1 Introduction
  5.2 Language Model Adaptation Methods
  5.3 Language Model Parameter Adaptation Experiments
  5.4 Analysis and Discussion of Language Model Parameter Adaptation
  5.5 A Two-Stage Personalized Language Model Architecture Based on Word Vectors
  5.6 Chapter Summary
Chapter 6: Personalized Word Vectors
  6.1 Mismatch between Background and Personalized Corpora
  6.2 Advantages of Personalized Word Vectors
  6.3 Model Architecture for Personalized Word Vectors
  6.4 Experimental Methods
    6.4.1 Basic Experimental Setup
    6.4.2 Language Modeling
    6.4.3 Cloze Task
    6.4.4 User Prediction
  6.5 Experimental Results
    6.5.1 Language Modeling Results
    6.5.2 Cloze Task Results
    6.5.3 User Prediction Results
  6.6 Visualization Examples
  6.7 Chapter Summary
Chapter 7: Handling Out-of-Vocabulary Words in Personalized Corpora
  7.1 Problem Description
  7.2 Training Method
  7.3 Experimental Setup
  7.4 Performance Examples for Out-of-Vocabulary Words
  7.5 Experimental Results on the Cloze Task
    7.5.1 Overall Results
    7.5.2 Effect of Data Size on Results
  7.6 Chapter Summary
Chapter 8: Conclusion and Future Work
  8.1 Main Research Contributions
  8.2 Future Outlook
References | |
| dc.language.iso | zh-TW | |
| dc.subject | 語音辨識 | zh_TW |
| dc.subject | 自然語言處理 | zh_TW |
| dc.subject | 個人化詞向量 | zh_TW |
| dc.subject | 個人化語言模型 | zh_TW |
| dc.subject | Personalized Language Model | en |
| dc.subject | Personalized Word Representation | en |
| dc.subject | Natural Language Processing | en |
| dc.subject | Speech Recognition | en |
| dc.title | 個人化語言處理:語言模型及理解 | zh_TW |
| dc.title | Personalized Linguistic Processing: Language Modeling and Understanding | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 105-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 鄭秋豫,王小川,陳信宏,李宏毅(Hung-Yi Lee) | |
| dc.subject.keyword | 個人化語言模型, 個人化詞向量, 自然語言處理, 語音辨識 | zh_TW |
| dc.subject.keyword | Personalized Language Model, Personalized Word Representation, Natural Language Processing, Speech Recognition | en |
| dc.relation.page | 84 | |
| dc.identifier.doi | 10.6342/NTU201702200 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2017-07-31 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
| Appears in Collections: | Graduate Institute of Communication Engineering | |
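The first direction described in the abstract, conditioning a universal language model on a user feature vector extracted by a deep neural network, can be pictured with a short sketch. The PyTorch code below illustrates that general idea only; the class name, the dimensions, and the choice to concatenate the user vector to the word embedding at every time step are assumptions made for illustration, not the thesis's actual architecture (which builds on RNN bottleneck features; see Chapter 4 in the table of contents).

```python
# Minimal sketch (assumed architecture, not the thesis's exact model):
# a universal RNN language model conditioned on a per-user feature vector.
import torch
import torch.nn as nn

class PersonalizedRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, user_dim=32, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The recurrent layer sees the word embedding concatenated with
        # the user feature, so its predictions depend on the user.
        self.rnn = nn.LSTM(emb_dim + user_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, user_vec):
        # word_ids: (batch, seq_len) word indices
        # user_vec: (batch, user_dim) per-user linguistic feature vector
        x = self.embed(word_ids)
        u = user_vec.unsqueeze(1).expand(-1, x.size(1), -1)  # repeat per step
        h, _ = self.rnn(torch.cat([x, u], dim=-1))
        return self.out(h)  # next-word logits at every position

# Illustrative usage with random data:
model = PersonalizedRNNLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (4, 20)), torch.randn(4, 32))
```

In this picture, the universal model corresponds to feeding the same (e.g. zero) user vector for everyone; personalization replaces it with a vector extracted from each user's social network data.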
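The second direction, personalized word vectors, can be sketched in the same spirit. The snippet below models a user-specific embedding as a shared base embedding plus a learned per-user offset, so the same word carries a slightly different vector for each user; this additive decomposition is an illustrative assumption, not the formulation developed in Chapter 6.

```python
# Minimal sketch (assumed decomposition): personalized word vectors as a
# shared base embedding plus a small learned per-user offset.
import torch
import torch.nn as nn

class PersonalizedEmbedding(nn.Module):
    def __init__(self, vocab_size, num_users, emb_dim=128):
        super().__init__()
        self.base = nn.Embedding(vocab_size, emb_dim)         # shared by all users
        self.user_offset = nn.Embedding(num_users, emb_dim)   # one shift per user
        nn.init.zeros_(self.user_offset.weight)  # start at the universal vectors

    def forward(self, word_ids, user_id):
        # word_ids: (batch, seq_len); user_id: (batch,)
        shift = self.user_offset(user_id).unsqueeze(1)  # broadcast over positions
        return self.base(word_ids) + shift

# Illustrative usage:
emb = PersonalizedEmbedding(vocab_size=10000, num_users=50)
vecs = emb(torch.randint(0, 10000, (4, 20)), torch.randint(0, 50, (4,)))
```

A single offset per user keeps the personalized parameter count small, which matters when each user contributes only a small corpus; richer variants (e.g. a per-user linear transform of the shared space) trade more parameters for more flexibility.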
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-106-1.pdf (Restricted Access) | 3.38 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
