Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2369
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李琳山 | |
dc.contributor.author | Hsien-Chin Lin | en |
dc.contributor.author | 林賢進 | zh_TW |
dc.date.accessioned | 2021-05-13T06:39:31Z | - |
dc.date.available | 2018-08-24 | |
dc.date.available | 2021-05-13T06:39:31Z | - |
dc.date.copyright | 2017-08-24 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-10 | |
dc.identifier.citation | [1] Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel, “Handwritten digit recognition with a back-propagation network,” in Advances in Neural Information Processing Systems, 1990, pp. 396–404.
[2] Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender, “Learning to rank using gradient descent,” in Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005, pp. 89–96.
[3] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson, “CNN features off-the-shelf: an astounding baseline for recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 806–813.
[4] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins, “Learning to forget: Continual prediction with LSTM,” 1999.
[5] Sheng-syun Shen and Hung-yi Lee, “Neural attention models for sequence classification: Analysis and application to key term extraction and dialogue act detection,” arXiv preprint arXiv:1604.00077, 2016.
[6] Rada Mihalcea and Paul Tarau, “TextRank: Bringing order into text,” in EMNLP, 2004, vol. 4, pp. 404–411.
[7] Chia-hsing Hsu and Hung-yi Lee, “Enhanced spoken term detection by deep learning,” M.S. thesis, 2017.
[8] Stuart Rose, Dave Engel, Nick Cramer, and Wendy Cowley, “Automatic keyword extraction from individual documents,” Text Mining: Applications and Theory, pp. 1–20, 2010.
[9] Tony Lindeberg, “Scale invariant feature transform,” Scholarpedia, vol. 7, no. 5, p. 10491, 2012.
[10] Alex Krizhevsky and Geoffrey Hinton, “Learning multiple layers of features from tiny images,” 2009.
[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2369 | - |
dc.description.abstract | In early childhood, humans often learn directly, through sight and hearing, what words they were never explicitly taught refer to, and go on to understand the related meanings and concepts. This thesis aims to let machines do something similar: automatically extract knowledge from audio-visual data on the Internet and thereby achieve a preliminary form of such learning. This is also one way to make effective use of online resources. For example, the Internet offers cooking tutorials, nature documentaries, dance instruction videos, and so on; putting this information to effective use would be of great benefit to human life.
Because most videos on the Internet lack proper annotation, it is not easy for machines to learn from them directly, while annotating the videos manually would incur considerable labor cost and is not a good solution either. This thesis therefore proposes a system which, through key term extraction over the video narration and spoken term detection, automatically labels the frames of a video, while also automatically discovering the important concepts in the videos to serve as classes. These automatically labeled data are then used as training data to train an image recognition model, as a first step toward the goal above. | zh_TW |
dc.description.abstract | Children usually learn objects or concepts from visual and auditory input without being explicitly taught about those objects or concepts. We hope machines can do something similar, i.e., learn something from unlabeled video and audio automatically. In the Internet era, abundant resources are available online, for example, instruction and training videos on YouTube about cooking, dancing and the environment, and we wish to be able to use them.
Most such videos on YouTube are not labeled, and thus difficult to use for training machines, while human annotation for these videos is expensive. This research therefore proposes a direction and develops a system which performs key term extraction and spoken term detection over the audio, and uses the detected key terms to label the frames of the video automatically. It can also discover the important concepts in the videos, treating them as classes of images. We then use these labeled data to train an image classification model, and reasonably good results can be obtained. A novel key term extraction approach based on the location of the terms and their context in the sentences is also proposed here, which was shown to be domain-independent; in other words, once trained it can be used to extract key terms in unseen domains. | en |
dc.description.provenance | Made available in DSpace on 2021-05-13T06:39:31Z (GMT). No. of bitstreams: 1 ntu-106-R04942068-1.pdf: 20814355 bytes, checksum: e687c3392e396e0b8da8a14265029de4 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Acknowledgements
Chinese Abstract
English Abstract
1. Introduction
1.1 Motivation
1.2 Research Direction
1.3 Main Contributions
1.4 Thesis Organization
2. Background
2.1 Deep Neural Networks
2.1.1 Overview
2.1.2 Training Methods
2.1.3 Convolutional Neural Networks
2.1.4 Long Short-Term Memory Networks
2.2 Key Term Extraction
2.2.1 Overview
2.2.2 Supervised Key Term Extraction
2.2.3 Unsupervised Key Term Extraction
2.2.4 Comparison of Supervised and Unsupervised Key Term Extraction Systems
2.2.5 Evaluation Metrics
2.3 Spoken Term Detection
2.3.1 Overview
2.3.2 Lattices
2.3.3 Spoken Content Retrieval with Weighted Finite-State Transducers
2.4 Chapter Summary
3. Key Term Extraction System
3.1 Overview
3.2 Architecture and Workflow
3.3 Preprocessing
3.4 Supervised Models
3.4.1 Convolutional Neural Network Model
3.4.2 Long Short-Term Memory Network Model
3.5 Unsupervised Models
3.6 Experimental Setup
3.6.1 Corpus
3.6.2 Training and Recognition Systems
3.7 Experimental Design
3.8 Experimental Results
3.9 Chapter Summary
4. Training Image Recognition Models with Spoken Term Detection
4.1 Overview
4.2 Architecture and Workflow
4.2.1 Image Processing
4.2.2 Spoken Term Detection
4.2.3 Image Recognition Model
4.3 Experimental Setup
4.3.1 Corpus
4.3.2 Training and Recognition Systems
4.4 Experimental Design
4.5 Experimental Results
4.6 Chapter Summary
5. Combining Key Term Extraction and Spoken Term Detection to Train Image Recognition Models
5.1 Overview
5.2 Architecture and Workflow
5.3 Experimental Setup
5.3.1 Corpus
5.3.2 Training and Recognition Systems
5.4 Experimental Design
5.5 Experimental Results
5.5.1 Analysis of the Key Term Extraction System
5.5.2 Image Recognition Model
5.6 Chapter Summary
6. Conclusion and Future Work
6.1 Conclusion
6.2 Future Research Directions
References | |
dc.language.iso | zh-TW | |
dc.title | Image recognition combining key term extraction and spoken term detection | zh_TW |
dc.title | Image classification by combining key term extraction and spoken term detection | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 陳介宏,鄭秋豫,李宏毅,王小川 | |
dc.subject.keyword | 關鍵用語擷取, 口述詞彙偵測, 機器學習 | zh_TW |
dc.subject.keyword | key term extraction, spoken term detection, machine learning | en |
dc.relation.page | 72 | |
dc.identifier.doi | 10.6342/NTU201702938 | |
dc.rights.note | Authorized for release (open access worldwide) | |
dc.date.accepted | 2017-08-11 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
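The auto-labeling mechanism described in the abstracts (key terms detected in the narration label the frames shown while each term is spoken, and the resulting pairs become training data for an image classifier) can be sketched roughly as follows. This is a minimal hypothetical illustration only; the `Detection` type, the `label_frames` function, the time window, and all data are assumptions for exposition, not code or parameters from the thesis.

```python
# Hypothetical sketch of the frame-labeling step: each video frame is
# tagged with a key term spoken near that frame's timestamp, and
# frames with no nearby key term are simply discarded.
from dataclasses import dataclass

@dataclass
class Detection:
    term: str     # key term found by spoken term detection (assumed format)
    start: float  # time in seconds when the term starts being spoken
    end: float    # time in seconds when the term stops being spoken

def label_frames(detections, frame_times, window=2.0):
    """Assign each frame the first key term spoken within `window`
    seconds of it; return (frame_time, label) pairs."""
    labeled = []
    for t in frame_times:
        for d in detections:
            if d.start - window <= t <= d.end + window:
                labeled.append((t, d.term))
                break  # one label per frame in this toy version
    return labeled

# Toy narration detections and frame timestamps (illustrative only).
detections = [Detection("whisk", 3.0, 3.5), Detection("dough", 10.0, 10.6)]
frames = [0.0, 2.0, 4.0, 9.0, 11.0, 20.0]
print(label_frames(detections, frames))
# → [(2.0, 'whisk'), (4.0, 'whisk'), (9.0, 'dough'), (11.0, 'dough')]
```

In the thesis pipeline the detections would come from spoken term detection over the narration and the labeled frames would then be fed to an image classification model; the fixed time window here is only a stand-in for whatever alignment the system actually uses.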
Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf | 20.33 MB | Adobe PDF | View/Open |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.