Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/62252
Title: Voice Access of Cloud Applications: Language Model Personalization and Interactive Spoken Content Retrieval
(具備語音功能的雲端應用: 個人化語言模型與互動式語音文件檢索)
Authors: Tsung-Hsien Wen (溫宗憲)
Advisor: Lin-shan Lee (李琳山)
Keyword: Personalized Language Modeling, Social Network, Interactive Retrieval, Markov Decision Process, Reinforcement Learning
Publication Year: 2013
Degree: Master's
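Abstract: This thesis investigates two technologies for voice-enabled cloud applications: personalized language models and interactive spoken document retrieval.
In speech recognition, model mismatch has always severely degraded recognition accuracy. With personal mobile phones now widespread, personalized recognition systems have become feasible, and because language use differs from person to person, personalizing the language model is necessary. Personal corpora used to be hard to build, but more and more people now habitually leave large numbers of posts and comments on social network sites, so personal corpora are much easier to obtain than before. Data sparseness, however, remains hard to solve. In this thesis, we propose several ways of estimating the similarity in language use between different users of a social network site, and use these estimates to add other users' corpora when estimating a more robust personalized language model. We also compare implementations based on N-gram language models and recurrent neural network language models, and verify that the proposed methods do improve the ability to predict an individual's language.
In the second part, we investigate interactive spoken document retrieval. Spoken documents are hard to display and time-consuming to browse, and poor recognition accuracy can make retrieval results unsatisfactory. Interacting with the user, so that the system learns more about the information the user is looking for, is therefore an effective remedy. In this thesis, we model the interactive retrieval problem as a Markov Decision Process (MDP), learn the optimal system policy with a reinforcement learning algorithm, and implement the retrieval system with different retrieval models. Experiments show that the proposed method does assist retrieval and helps users find the information they want more efficiently.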
This thesis considers voice access of cloud applications in two parts: (1)
personalized language modeling and (2) interactive spoken document retrieval.
Model mismatch has been a major problem in speech recognition. With
hand-held devices widely used today, personalized models have become possible.
Because huge quantities of posts and comments with known owners have emerged on
social network websites, personal corpora are now practically available, but
the data sparseness problem remains unsolved. In the first part of this thesis, we
propose personalized language modeling approaches that estimate the language
similarities between different social network users and integrate the
corresponding personal corpora accordingly. We study both N-gram language
models and recurrent neural network language models, and the
experimental results support the concept.
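The idea of integrating other users' corpora according to language similarity can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the similarity measure (cosine similarity of word counts), the model order (unigram with add-alpha smoothing), and the names `cosine_similarity`, `personalized_unigram`, and `perplexity` are all hypothetical. The abstract does not say which similarity estimates or integration schemes were actually used.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (an assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalized_unigram(target: str,
                         corpora: Dict[str, List[str]],
                         alpha: float = 1.0) -> Dict[str, float]:
    """Pool every user's counts, weighted by similarity to the target user.

    The target's own corpus gets weight 1; other users get their cosine
    similarity to the target. Add-alpha smoothing covers unseen words.
    """
    counts = {user: Counter(tokens) for user, tokens in corpora.items()}
    pooled: Dict[str, float] = {}
    for user, user_counts in counts.items():
        weight = 1.0 if user == target else cosine_similarity(counts[target], user_counts)
        for word, n in user_counts.items():
            pooled[word] = pooled.get(word, 0.0) + weight * n
    vocab = set(pooled)
    total = sum(pooled.values()) + alpha * len(vocab)
    return {w: (pooled[w] + alpha) / total for w in vocab}


def perplexity(model: Dict[str, float], tokens: List[str]) -> float:
    """Perplexity of in-vocabulary tokens under a unigram model."""
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy corpora (invented): bob writes like alice, carol does not.
corpora = {
    "alice": "speech recognition model speech".split(),
    "bob": "speech model language model".split(),
    "carol": "football match goal".split(),
}
model = personalized_unigram("alice", corpora)
```

In this toy setup, words that only the similar user contributes (such as `language`) end up with more probability mass than words from the dissimilar user (such as `football`), which is the intended effect of similarity-weighted pooling against data sparseness.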
In the second part of this thesis, we study interactive spoken document
retrieval. Interaction helps spoken content retrieval because
retrieved spoken items are difficult to show on screen and to browse,
and because speech recognition output is uncertain. We model the
interaction process as a Markov Decision Process and train the policy with
reinforcement learning. Experimental results demonstrate that the interactions
improve retrieval performance.
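Modeling interaction as a Markov Decision Process and training a policy with reinforcement learning can be sketched on a toy problem. The states, actions, rewards, and transition probabilities below are invented for illustration, and tabular Q-learning is used as one common reinforcement learning algorithm. The abstract does not specify the state space, the reward design, or the algorithm the thesis actually used.

```python
import random

ACTIONS = ("ask", "show")  # hypothetical system actions
N_STATES = 4               # hypothetical retrieval-quality levels 0..3


def step(state, action, rng):
    """Toy dynamics: return (next_state, reward, done)."""
    if action == "show":
        # Showing results ends the session; reward grows with quality.
        return state, 4.0 * state, True
    # Asking the user costs effort and usually improves quality by one level.
    next_state = min(state + 1, N_STATES - 1) if rng.random() < 0.8 else state
    return next_state, -1.0, False


def train(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)
        for _ in range(50):  # cap on interaction turns per episode
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a], rng)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


q_table = train()
policy = greedy_policy(q_table)
```

Under these assumed rewards, the learned policy keeps asking the user while the retrieval quality is low and shows results once it is high. This mirrors the trade-off between user effort and retrieval quality that an interactive system has to balance.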
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/62252
Fulltext Rights: Paid license (有償授權)
Appears in Collections: Department of Electrical Engineering (電機工程學系)

Files in This Item:
File: ntu-102-1.pdf (Restricted Access)
Size: 4.97 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
