Enhanced Semantic Retrieval of Personal Photos Using Random Walk and Distributed Word Representations

Yuan-Ming Liou; 劉元銘

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54331

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李琳山
dc.contributor.author	Yuan-Ming Liou	en
dc.contributor.author	劉元銘	zh_TW
dc.date.accessioned	2021-06-16T02:50:54Z	-
dc.date.available	2016-07-20
dc.date.copyright	2015-07-20
dc.date.issued	2015
dc.date.submitted	2015-07-14
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54331	-
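dc.description.abstract	This thesis investigates how to perform effective semantic retrieval of personal photos when users provide only sparse speech annotations. With the spread of digital cameras and smartphones in recent years, users quickly accumulate large numbers of personal photos, which raises an important problem: how to browse and search quickly within a very large personal photo collection. Users generally prefer to find photos with semantic queries such as "Mother's Day dinner". Earlier personal photo retrieval, however, was mostly content-based image retrieval (CBIR), which relies on low-level image descriptors and requires a photo as the query, so it is not suited to retrieval by high-level semantic concepts. Semantic-based image retrieval, in turn, depends heavily on tags or annotations associated with the images, but users are unlikely to annotate every photo, and speech annotation is more convenient than typing text annotations on a keyboard. This thesis therefore addresses semantic retrieval of personal photos under the condition that users provide sparse speech annotations, that is, only a small number of photos carry speech annotations. The approach uses topic models to integrate speech and image features, applies a random walk model for re-ranking, and finally proposes distributed word representations to alleviate the sparsity of the speech features. First, because speech annotations may be recorded anywhere and may be highly spontaneous, recognition accuracy is low; expected term frequencies are therefore extracted from word graphs and used as speech features. Since only a few photos have speech annotations, local and global image features must also be extracted from every photo to supply the information that the speech features miss. Topic models integrate the speech and image features, and the latent topics they learn are used to build the retrieval model. Second, we found that the retrieval performance of the topic models leaves considerable room for improvement. For the first-pass retrieval results returned by the topic model, similarities between photos are computed from the expected term frequencies and the local and global image features, and a random walk algorithm is applied so that photos with higher similarity receive closer relevance scores; this re-ranking improves retrieval performance substantially. Finally, because the speech features are very sparse, topic model training relies mainly on the image features, even though the speech features are the main source of personalized and semantic information. We therefore use distributed word representations, which in recent years have performed well at finding semantically and syntactically related words. Based on the expected term frequencies and the whole-image semantic concepts, related words are found in a manner similar to automatic annotation expansion and added to the speech features, so that the originally sparse speech features are no longer sparse. The topic model then takes more personalized and semantic information into account during training, and random walk re-ranking also performs better.	en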
dc.description.provenance	Made available in DSpace on 2021-06-16T02:50:54Z (GMT). No. of bitstreams: 1 ntu-104-R02942070-1.pdf: 5020514 bytes, checksum: 37c67e8c4b043d953e069dd9b561b4df (MD5) Previous issue date: 2015	en
dc.description.tableofcontents	Acknowledgements
Chinese Abstract
1. Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Contributions of This Thesis
  1.4 Thesis Organization
2. Background
  2.1 Speech Recognition
    2.1.1 Front-End Processing
    2.1.2 Statistical Speech Recognition
    2.1.3 Acoustic Model
    2.1.4 Language Model
    2.1.5 Search Algorithms and Word Graphs
  2.2 Overview of Retrieval Systems
    2.2.1 Spoken Document Retrieval
    2.2.2 Content-Based Image Retrieval
  2.3 Topic Models
    2.3.1 Probabilistic Latent Semantic Analysis
    2.3.2 Non-Negative Matrix Factorization
  2.4 Graph Theory
    2.4.1 Random Walk
  2.5 Word Representations
    2.5.1 Traditional Word Representations
    2.5.2 Distributed Word Representations
3. A Semantic Retrieval System for Personal Photos Integrating Speech and Image Features with Topic Models
  3.1 Introduction
  3.2 System Architecture
  3.3 Image Feature Extraction
    3.3.1 Visual Words as Local Image Features
    3.3.2 Whole-Image Semantic Concepts as Global Image Features
  3.4 Speech Feature Extraction
  3.5 Constructing Photo Documents
  3.6 Building Retrieval Models with Topic Models
    3.6.1 Probabilistic Latent Semantic Analysis Retrieval Model
    3.6.2 Non-Negative Matrix Factorization Retrieval Model
  3.7 Basic Experimental Setup
    3.7.1 Personal Photo Database
    3.7.2 Collection of Speech Annotations for Personal Photos
    3.7.3 Speech Recognition Results
    3.7.4 Experimental Configuration
    3.7.5 Evaluation Method
  3.8 Experimental Results and Analysis
  3.9 Chapter Summary
4. Enhancing the Semantic Retrieval System for Personal Photos with Random Walk Models
  4.1 Introduction
  4.2 System Architecture
  4.3 Random Walk Models
    4.3.1 Pruning
    4.3.2 Single-Layer Random Walk Model
    4.3.3 Two-Layer Random Walk Model
  4.4 Basic Experimental Setup
  4.5 Experimental Results and Discussion
    4.5.1 Overall Experimental Results
    4.5.2 Interpolation Parameter and Pruning Methods
  4.6 Chapter Summary
5. Strengthening the Semantic Retrieval System for Personal Photos with Sparse Speech Annotations Using Distributed Word Representations
  5.1 Introduction
  5.2 System Architecture
  5.3 Distributed Word Representation Models
    5.3.1 Recurrent Neural Network Language Model
    5.3.2 Continuous Bag-of-Words Model
    5.3.3 Skip-Gram Model
  5.4 Adding Related Words Found by Word Representation Models to Photo Documents
  5.5 Basic Experimental Setup
  5.6 Experimental Results and Analysis
    5.6.1 Using Distributed Word Representation Models
    5.6.2 Results with Random Walk Models Added
  5.7 Chapter Summary
6. Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Research Directions
    6.2.1 Improving the Speech Recognition System
    6.2.2 Improving Image Features
    6.2.3 Improving the Model Integrating Speech and Image Features
    6.2.4 Automatic Annotation
References
dc.language.iso	zh-TW
dc.subject	個人相片	zh_TW
dc.subject	隨機漫步	zh_TW
dc.subject	詞彙表示法	zh_TW
dc.subject	檢索	zh_TW
dc.subject	random walk	en
dc.subject	retrieval	en
dc.subject	word representation	en
dc.subject	personal photos	en
dc.title	使用隨機漫步及分佈式詞彙表示法增強個人相片之語意檢索	zh_TW
dc.title	Enhanced Semantic Retrieval of Personal Photos Using Random Walk and Distributed Word Representations	en
dc.type	Thesis
dc.date.schoolyear	103-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	李宏毅,鄭秋豫,王小川,陳信宏,簡仁宗
dc.subject.keyword	檢索,隨機漫步,詞彙表示法,個人相片,	zh_TW
dc.subject.keyword	retrieval,random walk,word representation,personal photos,	en
dc.relation.page	89
dc.rights.note	Paid licensing (有償授權)
dc.date.accepted	2015-07-14
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電信工程學研究所	zh_TW
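Appears in Collections:	Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:

File	Size	Format
ntu-104-1.pdf (not authorized for public access)	4.9 MB	Adobe PDF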

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.