Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54331
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 李琳山
dc.contributor.author: Yuan-Ming Liou [en]
dc.contributor.author: 劉元銘 [zh_TW]
dc.date.accessioned: 2021-06-16T02:50:54Z
dc.date.available: 2016-07-20
dc.date.copyright: 2015-07-20
dc.date.issued: 2015
dc.date.submitted: 2015-07-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54331
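dc.description.abstract: This thesis investigates how to perform effective semantic retrieval of personal photos when users provide only sparse spoken annotations. With digital cameras and smartphones now widespread, users quickly accumulate large numbers of personal photos, which raises an important problem: how to browse and search a huge personal photo collection quickly. Most users prefer to find photos directly with semantic queries, such as "Mother's Day dinner". Earlier personal photo retrieval, however, was mostly content-based image retrieval (CBIR), which relies on low-level image descriptors and requires a photo as the query, so it is ill-suited to retrieval by high-level semantic concepts. Semantic-based image retrieval, in turn, depends heavily on tags or annotations associated with the images, yet users are unlikely to annotate every photo, and spoken annotation is more convenient than typing text annotations on a keyboard. This thesis therefore focuses on semantic retrieval of personal photos under sparse spoken annotation, that is, when only a small number of photos carry spoken annotations. The approach uses a topic model to integrate speech and image features, applies a random walk model for re-ranking, and finally proposes distributed word representations to alleviate the sparsity of the speech features.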
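First, because spoken annotations may be recorded anywhere and in a highly spontaneous speaking style, recognition accuracy is low. Expected term frequencies are therefore extracted from word lattices and used as the speech features. Since only a few photos have spoken annotations, local and global image features must also be extracted from every photo to supplement the information the speech features miss. This thesis uses a topic model to integrate the speech and image features, and builds the retrieval model on the latent topics that this model learns.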
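As a rough illustration of the speech feature described above, the expected term frequency of a word can be taken as the sum of the posterior probabilities of all lattice arcs carrying that word. The sketch below assumes the lattice has already been reduced to (word, posterior) pairs; the function name, the input layout, and the toy posteriors are illustrative assumptions, not details taken from the thesis.

```python
from collections import defaultdict


def expected_term_frequency(arcs):
    """Sum arc posteriors per word.

    arcs: iterable of (word, posterior) pairs, one per lattice arc.
    This flat layout is an assumption made for illustration.
    """
    etf = defaultdict(float)
    for word, posterior in arcs:
        etf[word] += posterior
    return dict(etf)


# Hypothetical arcs for one spoken annotation; the posteriors are made up.
arcs = [
    ("mother", 0.9),
    ("day", 0.7),
    ("they", 0.3),    # competing hypothesis for the same audio region
    ("dinner", 0.8),
    ("day", 0.5),     # a second arc carrying the same word
]
etf = expected_term_frequency(arcs)
```

Because competing hypotheses keep fractional weight instead of being discarded, a word that the one-best transcript misses can still contribute to the feature vector.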
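In addition, we found that the retrieval performance of the topic model leaves considerable room for improvement. Starting from the first-pass retrieval results of the topic model, we compute photo-to-photo similarity from the expected term frequencies and the local and global image features, then apply the random walk algorithm so that more similar photos receive closer relevance scores. This re-ranking improves retrieval performance considerably.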
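The re-ranking idea can be sketched as a single-layer random walk that interpolates each photo's first-pass score with scores propagated from similar photos. This is a minimal sketch under stated assumptions: the interpolation weight `alpha`, the row normalization, the fixed iteration count, and the toy similarity matrix are all illustrative choices, and the thesis's pruning step and two-layer variant are not modelled here.

```python
def random_walk_rerank(initial, sim, alpha=0.5, iters=100):
    """Re-rank items by propagating relevance over a similarity graph.

    initial: first-pass relevance scores (non-negative, not all zero).
    sim:     square matrix of pairwise similarities.
    alpha:   weight on propagated scores (assumed parameter name).
    """
    n = len(initial)
    total = sum(initial)
    base = [x / total for x in initial]  # normalize first-pass scores

    # Row-normalize similarities (without self-loops) into transition probabilities.
    trans = []
    for i in range(n):
        row = [sim[i][j] if i != j else 0.0 for j in range(n)]
        row_sum = sum(row)
        trans.append([x / row_sum if row_sum > 0 else 0.0 for x in row])

    scores = base[:]
    for _ in range(iters):
        scores = [
            (1 - alpha) * base[j]
            + alpha * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Toy example: photos 0 and 1 are near-duplicates, photo 2 is unrelated.
initial = [0.6, 0.1, 0.3]
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
scores = random_walk_rerank(initial, sim)
```

In this toy case photo 1 starts below photo 2 but ends above it, because it inherits relevance from the highly similar, highly ranked photo 0.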
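Furthermore, we found that because the speech features are very sparse, topic model training relies heavily on the image features, even though the speech features are the main source of personalized and semantic information. We therefore adopt distributed word representations, which in recent years have performed well on tasks of finding semantically and syntactically related words. Based on the expected term frequencies and the holistic image semantic concepts, related words are found and added to the speech features in a manner similar to automatic annotation, so that the originally sparse speech features are no longer sparse. This lets the topic model take more personalized and semantic information into account during training, and also improves the re-ranking performance of the random walk model. [zh_TW]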
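One way to picture the related-word expansion is a nearest-neighbour lookup in an embedding space: each annotated word contributes a discounted share of its weight to its closest neighbours by cosine similarity. The sketch below is an assumption-laden toy: the two-dimensional vectors, the `top_k` and `weight` parameters, and the weighting rule are hypothetical, and it uses only term weights, whereas the abstract also draws on image semantic concepts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_terms(etf, vectors, top_k=1, weight=0.5):
    """Add each term's nearest neighbours with a discounted weight.

    etf:     word -> expected term frequency.
    vectors: word -> embedding (toy values here; a real system would
             load vectors trained on a large corpus).
    """
    expanded = dict(etf)
    for word, freq in etf.items():
        if word not in vectors:
            continue
        neighbours = sorted(
            (
                (cosine(vectors[word], vec), other)
                for other, vec in vectors.items()
                if other != word
            ),
            reverse=True,
        )[:top_k]
        for similarity, other in neighbours:
            if similarity > 0:
                expanded[other] = expanded.get(other, 0.0) + weight * similarity * freq
    return expanded


# Hypothetical embeddings chosen so that "meal" lies close to "dinner".
vectors = {
    "dinner": [1.0, 0.0],
    "meal": [0.9, 0.1],
    "beach": [0.0, 1.0],
}
expanded = expand_terms({"dinner": 0.8}, vectors)
```

The original term keeps its full weight, while the added neighbour receives a smaller one, so expansion densifies the feature vector without letting inferred words outweigh what was actually spoken.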
dc.description.provenance: Made available in DSpace on 2021-06-16T02:50:54Z (GMT). No. of bitstreams: 1. ntu-104-R02942070-1.pdf: 5020514 bytes, checksum: 37c67e8c4b043d953e069dd9b561b4df (MD5). Previous issue date: 2015 [en]
dc.description.tableofcontents:
Acknowledgements
Chinese Abstract
1. Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Contributions of This Thesis
  1.4 Chapter Organization
2. Background
  2.1 Speech Recognition
    2.1.1 Front-End Processing
    2.1.2 Statistical Speech Recognition
    2.1.3 Acoustic Model
    2.1.4 Language Model
    2.1.5 Search Algorithms and Word Lattices
  2.2 Overview of Retrieval Systems
    2.2.1 Spoken Document Retrieval
    2.2.2 Content-Based Image Retrieval
  2.3 Topic Models
    2.3.1 Probabilistic Latent Semantic Analysis
    2.3.2 Non-negative Matrix Factorization
  2.4 Graph Theory
    2.4.1 Random Walk
  2.5 Word Representations
    2.5.1 Traditional Word Representations
    2.5.2 Distributed Word Representations
3. Semantic Retrieval of Personal Photos by Integrating Speech and Image Features with Topic Models
  3.1 Introduction
  3.2 System Architecture
  3.3 Image Feature Extraction
    3.3.1 Visual Words as Local Image Features
    3.3.2 Holistic Image Semantic Concepts as Global Image Features
  3.4 Speech Feature Extraction
  3.5 Constructing Photo Documents
  3.6 Building Retrieval Models with Topic Models
    3.6.1 Probabilistic Latent Semantic Analysis Retrieval Model
    3.6.2 Non-negative Matrix Factorization Retrieval Model
  3.7 Basic Experimental Setup
    3.7.1 Personal Photo Database
    3.7.2 Collection of Spoken Annotations for Personal Photos
    3.7.3 Speech Recognition Results
    3.7.4 Experimental Configuration
    3.7.5 Evaluation Method
  3.8 Experimental Results and Analysis
  3.9 Chapter Summary
4. Enhancing the Personal Photo Semantic Retrieval System with Random Walk Models
  4.1 Introduction
  4.2 System Architecture
  4.3 Random Walk Model
    4.3.1 Pruning
    4.3.2 Single-Layer Random Walk Model
    4.3.3 Two-Layer Random Walk Model
  4.4 Basic Experimental Setup
  4.5 Experimental Results and Discussion
    4.5.1 Overall Experimental Results
    4.5.2 Analysis of the Interpolation Parameter and Pruning Methods
  4.6 Chapter Summary
5. Enhancing the Personal Photo Semantic Retrieval System under Sparse Spoken Annotation with Distributed Word Representations
  5.1 Introduction
  5.2 System Architecture
  5.3 Distributed Word Representation Models
    5.3.1 Recurrent Neural Network Language Model
    5.3.2 Continuous Bag-of-Words Model
    5.3.3 Skip-gram Model
  5.4 Finding Related Words with Word Representation Models and Adding Them to Photo Documents
  5.5 Basic Experimental Setup
  5.6 Experimental Results and Analysis
    5.6.1 Using Distributed Word Representation Models
    5.6.2 Results with the Random Walk Model Added
  5.7 Chapter Summary
6. Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Research Directions
    6.2.1 Improving the Speech Recognition System
    6.2.2 Improving the Image Features
    6.2.3 Improving the Model that Integrates Speech and Image Features
    6.2.4 Research on Automatic Annotation
References
dc.language.iso: zh-TW
dc.subject: 個人相片 [zh_TW]
dc.subject: 隨機漫步 [zh_TW]
dc.subject: 詞彙表示法 [zh_TW]
dc.subject: 檢索 [zh_TW]
dc.subject: random walk [en]
dc.subject: retrieval [en]
dc.subject: word representation [en]
dc.subject: personal photos [en]
dc.title: 使用隨機漫步及分佈式詞彙表示法增強個人相片之語意檢索 [zh_TW]
dc.title: Enhanced Semantic Retrieval of Personal Photos Using Random Walk and Distributed Word Representations [en]
dc.type: Thesis
dc.date.schoolyear: 103-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 李宏毅, 鄭秋豫, 王小川, 陳信宏, 簡仁宗
dc.subject.keyword: 檢索, 隨機漫步, 詞彙表示法, 個人相片 [zh_TW]
dc.subject.keyword: retrieval, random walk, word representation, personal photos [en]
dc.relation.page: 89
dc.rights.note: Paid license
dc.date.accepted: 2015-07-14
dc.contributor.author-college: College of Electrical Engineering and Computer Science [zh_TW]
dc.contributor.author-dept: Graduate Institute of Communication Engineering [zh_TW]
Appears in Collections: Graduate Institute of Communication Engineering

Files in This Item:
File: ntu-104-1.pdf (not authorized for public access); Size: 4.9 MB; Format: Adobe PDF


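Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.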
