Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67246
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 李琳山 | -
dc.contributor.author | Hung-Tsung Lu | en
dc.contributor.author | 盧宏宗 | zh_TW
dc.date.accessioned | 2021-06-17T01:24:54Z | -
dc.date.available | 2017-08-10 | -
dc.date.copyright | 2017-08-10 | -
dc.date.issued | 2017 | -
dc.date.submitted | 2017-08-08 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67246 | -
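dc.description.abstract | This thesis investigates methods for semantic retrieval of personal photos when only sparse spoken annotations from the user are available. With the recent spread of digital cameras and mobile devices such as smartphones, users quickly accumulate large numbers of personal photos, so browsing and searching a very large photo collection efficiently is an important problem. Users generally prefer to search their photos with semantic queries, for example "Mother's Day dinner". A machine, however, has essentially no way to grasp the semantics a photo carries unless the user annotates it. We therefore assume that the user can annotate a photo by voice through the microphone at the moment it is taken. This is far more convenient than typing text on a keyboard, but users will not annotate every photo. The thesis accordingly addresses semantic retrieval of personal photos under sparse user spoken annotation, that is, the scenario in which only a small fraction of the photos carry spoken annotations.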
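The thesis adopts a topic-model-based implementation and introduces several deep-learning-based feature extraction methods. A deep convolutional neural network extracts image features from the photos, while distributed word representations and the paragraph vector model each extract speech features from the word lattices of the photos' spoken annotations. Finally, a multimodal deep autoencoder topic model integrates the image and speech features, and the bottleneck vectors learned by this model, which carry "latent topics", are used to build the retrieval model.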
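The fusion step described above can be illustrated with a minimal sketch. It shows only the forward pass of a two-branch autoencoder with a shared bottleneck, using untrained random weights. The class name, layer sizes, `tanh` activation, and the zero-filling of a missing speech input are all assumptions made for illustration; this record does not specify the thesis's actual architecture or how it handles unannotated photos.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Create an n_out x n_in weight matrix with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights: Matrix, x: Vector) -> Vector:
    """Apply a bias-free fully connected layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class MultimodalAutoencoder:
    """Hypothetical two-branch autoencoder with a shared bottleneck (untrained)."""

    def __init__(self, image_dim: int, speech_dim: int,
                 hidden_dim: int, bottleneck_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.speech_dim = speech_dim
        self.enc_image = init_layer(image_dim, hidden_dim, rng)
        self.enc_speech = init_layer(speech_dim, hidden_dim, rng)
        self.enc_joint = init_layer(2 * hidden_dim, bottleneck_dim, rng)
        self.dec_image = init_layer(bottleneck_dim, image_dim, rng)
        self.dec_speech = init_layer(bottleneck_dim, speech_dim, rng)

    def encode(self, image: Vector, speech: Optional[Vector] = None) -> Vector:
        """Map one photo to its bottleneck vector.

        Assumption: a photo without a spoken annotation gets an all-zero
        speech input, so the image branch alone determines the code.
        """
        if speech is None:
            speech = [0.0] * self.speech_dim
        hidden = dense(self.enc_image, image) + dense(self.enc_speech, speech)
        return dense(self.enc_joint, hidden)

    def decode(self, code: Vector) -> Tuple[Vector, Vector]:
        """Reconstruct both modalities from the shared bottleneck vector."""
        return dense(self.dec_image, code), dense(self.dec_speech, code)
```

In a trained model the weights would be fitted by minimizing the reconstruction error of both modalities. The value returned by `encode` would then play the role of the bottleneck vector from which the retrieval model is built.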
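In addition, starting from the first-pass retrieval results returned by the topic model, the thesis computes photo-to-photo similarity separately from expected term frequencies, local and global image features, deep convolutional neural network image features, paragraph vectors, and multimodal autoencoder bottleneck features. It then applies the random walk algorithm so that highly similar photos receive similar relevance scores. This re-ranks the results and further improves overall retrieval performance.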
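The re-ranking idea in the paragraph above can be sketched as a random walk with restart over a photo similarity graph. This is a generic formulation, not necessarily the one used in the thesis. The function name, the interpolation weight `alpha`, the fixed iteration count, and the toy numbers in the usage note are assumptions for illustration. The pruning step and the two-layer variant listed in the table of contents are not covered.

```python
from typing import List


def random_walk_rerank(initial_scores: List[float],
                       similarity: List[List[float]],
                       alpha: float = 0.5,
                       iterations: int = 50) -> List[float]:
    """Smooth first-pass relevance scores over a similarity graph.

    Each iteration computes
        scores = (1 - alpha) * prior + alpha * (transition^T @ scores),
    where `prior` is the normalized first-pass score vector and
    `transition` is the row-normalized similarity matrix without self-loops.
    """
    n = len(initial_scores)
    total = sum(initial_scores)
    # Fall back to a uniform prior if all first-pass scores are zero.
    prior = [s / total for s in initial_scores] if total > 0 else [1.0 / n] * n

    # Turn similarities into transition probabilities, ignoring self-similarity.
    transition = []
    for i in range(n):
        row = [similarity[i][j] if i != j else 0.0 for j in range(n)]
        row_sum = sum(row)
        transition.append([v / row_sum if row_sum > 0 else 0.0 for v in row])

    scores = prior[:]
    for _ in range(iterations):
        # Score mass flowing into photo j from every neighbour i.
        propagated = [sum(scores[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        scores = [(1 - alpha) * prior[j] + alpha * propagated[j]
                  for j in range(n)]
    return scores
```

As a usage example, take three photos with first-pass scores `[0.6, 0.1, 0.3]` and a similarity matrix `[[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]`. Photos 0 and 1 are near-duplicates, so photo 1 inherits relevance from photo 0 and moves ahead of photo 2 in the re-ranked list.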
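Finally, to facilitate follow-up research, the thesis translates the Microsoft COCO image caption dataset into Chinese and archives the resulting Chinese corpus for use in later studies.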
zh_TW
dc.description.provenance | Made available in DSpace on 2021-06-17T01:24:54Z (GMT). No. of bitstreams: 1
ntu-106-R03922011-1.pdf: 5363834 bytes, checksum: 9810ec74cfc8a54811ef37fca289e2ee (MD5)
Previous issue date: 2017
en
dc.description.tableofcontents | Acknowledgements
Chinese Abstract
1. Introduction
1.1 Motivation
1.2 Research Direction of This Thesis
1.3 Chapter Organization
2. Background
2.1 Speech Recognition
2.2 Overview of Retrieval Systems
2.2.1 Spoken Document Retrieval
2.2.2 Content-Based Image Retrieval
2.3 Topic Models
2.3.1 Non-negative Matrix Factorization
2.3.2 Deep Autoencoder
2.3.3 Multimodal Deep Autoencoder
2.4 Graph Theory
2.4.1 Random Walk
2.5 Word Representations
2.5.1 Traditional Word Representations
2.5.2 Distributed Word Representations
2.6 Document Representations
2.6.1 Traditional Document Representations
2.6.2 Distributed Document Representations
2.7 Image Representations Based on Convolutional Neural Networks
3. A Semantic Photo Retrieval System Integrating Speech and Image Features with a Multimodal Deep Autoencoder Topic Model
3.1 Introduction
3.2 System Architecture
3.3 Image Feature Extraction
3.3.1 Traditional Image Feature Extraction Methods
3.3.2 Deep-Learning-Based Image Feature Extraction
3.4 Speech Feature Extraction
3.4.1 Speech Feature Extraction with Expected Term Frequencies
3.4.2 Deep-Learning-Based Speech Feature Extraction
3.5 Building the Retrieval Model with Topic Models
3.5.1 Semantic Photo Retrieval with the Non-negative Matrix Factorization Topic Model
3.5.2 Semantic Photo Retrieval with the Multimodal Deep Autoencoder Topic Model
3.6 Basic Experimental Setup
3.6.1 Personal Photo Database
3.6.2 Collection of Spoken Annotations for Personal Photos
3.6.3 Speech Recognition Results
3.6.4 Experimental Configuration
3.6.5 Evaluation Method
3.7 Experimental Results and Analysis
3.8 Chapter Summary
4. Enhancing the Personal Photo Semantic Retrieval System by Integrating Deep Learning Features with the Random Walk Model
4.1 Introduction
4.2 System Architecture
4.3 Random Walk Model
4.3.1 Pruning
4.3.2 Single-Layer Random Walk Model
4.3.3 Two-Layer Random Walk Model
4.4 Basic Experimental Setup
4.5 Experimental Results and Discussion
4.5.1 Overall Experimental Results
4.6 Chapter Summary
5. Data Collection for the Chinese Version of the Microsoft COCO Image Captions
5.1 Introduction
5.2 The Microsoft COCO Image Caption Dataset
5.3 Chinese Localization of the Microsoft COCO Image Caption Dataset
5.4 Chapter Summary
6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Research Directions
6.2.1 Improving Speech Recognition in the Photo Retrieval System
6.2.2 Improving the Model that Integrates Speech and Image Features
References
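dc.language.iso | zh-TW | -
dc.subject | 融合特徵 (fused features) | zh_TW
dc.subject | 影像檢索 (image retrieval) | zh_TW
dc.subject | 語音標註 (speech annotation) | zh_TW
dc.subject | 分佈式詞彙表示法 (distributed word representation) | zh_TW
dc.subject | 段落向量 (paragraph vector) | zh_TW
dc.subject | 卷積式類神經網路 (convolutional neural network) | zh_TW
dc.subject | 深層自編碼器 (deep autoencoder) | zh_TW
dc.subject | 隨機漫步 (random walk) | zh_TW
dc.subject | image retrieval | en
dc.subject | fused features | en
dc.subject | random walk | en
dc.subject | deep autoencoder | en
dc.subject | convolutional neural network | en
dc.subject | paragraph vector | en
dc.subject | distributed word representation | en
dc.subject | speech annotation | en
dc.title | 使用多模態深層自編碼器融合視覺與語音特徵強化個人相片之語意檢索 | zh_TW
dc.title | Semantic Retrieval of Personal Photos Using Multimodal Deep Autoencoder Fusing Visual and Speech Features | en
dc.type | Thesis | -
dc.date.schoolyear | 105-2 | -
dc.description.degree | Master's (碩士) | -
dc.contributor.oralexamcommittee | 王小川, 鄭秋豫, 陳信宏, 李宏毅 | -
dc.subject.keyword | 影像檢索 (image retrieval), 語音標註 (speech annotation), 分佈式詞彙表示法 (distributed word representation), 段落向量 (paragraph vector), 卷積式類神經網路 (convolutional neural network), 深層自編碼器 (deep autoencoder), 隨機漫步 (random walk), 融合特徵 (fused features) | zh_TW
dc.subject.keyword | image retrieval, speech annotation, distributed word representation, paragraph vector, convolutional neural network, deep autoencoder, random walk, fused features | en
dc.relation.page | 84 | -
dc.identifier.doi | 10.6342/NTU201700762 | -
dc.rights.note | Paid license (有償授權) | -
dc.date.accepted | 2017-08-08 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | zh_TW
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) | zh_TW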
Appears in collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:
File | Size | Format
ntu-106-1.pdf (not authorized for public access) | 5.24 MB | Adobe PDF