Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66996
Title: | Deep Co-occurrence Feature Learning for Visual Object Recognition |
Authors: | Ya-Fang Shih (施雅方) |
Advisor: | Yung-Yu Chuang (莊永裕) |
Co-Advisor: | Yen-Yu Lin (林彥宇) |
Keyword: | Object recognition, fine-grained recognition, co-occurrence feature |
Publication Year: | 2017 |
Degree: | Master's |
Abstract: | This thesis addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts, yet the object parts best suited for recognition often vary with the categories to be distinguished. Second, acquiring training data with part-level annotations is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle these three issues by introducing a new network layer, called the co-occurrence layer. It extends a convolutional layer to encode the co-occurrence between the visual parts detected by its numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. A network with the co-occurrence layer remains end-to-end trainable, and the resulting co-occurrence features are rotation- and translation-invariant and robust to object deformation. By applying this new layer to VGG-16 and ResNet-152, we achieve recognition rates of 83.6% and 85.8%, respectively, on the Caltech-UCSD bird benchmark. The source code is available at https://github.com/yafangshih/Deep-COOC. |
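The abstract's description of the co-occurrence layer (each pair of feature maps acting as filter and image, with mutual correlation filtering between them) can be illustrated with a minimal NumPy sketch. This is not the thesis code (that is at the linked GitHub repository); the function names are illustrative, and taking the peak correlation response as the pairwise co-occurrence value is an assumption of this sketch.

```python
import numpy as np

def max_correlation(image, filt):
    """Peak response of full 2-D cross-correlation of `filt` slid over `image`."""
    H, W = image.shape
    h, w = filt.shape
    # Zero-pad so every overlap offset is considered ("full" correlation).
    padded = np.pad(image, ((h - 1, h - 1), (w - 1, w - 1)))
    best = -np.inf
    for y in range(H + h - 1):
        for x in range(W + w - 1):
            best = max(best, float(np.sum(padded[y:y + h, x:x + w] * filt)))
    return best

def cooccurrence_features(feature_maps):
    """Sketch of a co-occurrence layer: each feature map serves as a
    correlation filter applied to every other map; the maximum response
    is kept as the co-occurrence feature for that pair of maps."""
    n = feature_maps.shape[0]
    feats = np.empty((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            feats[i, j] = max_correlation(feature_maps[i], feature_maps[j])
    return feats.reshape(-1)

# Random stand-ins for the activation maps of one convolutional layer:
rng = np.random.default_rng(0)
maps = rng.random((4, 8, 8), dtype=np.float32)
vec = cooccurrence_features(maps)
print(vec.shape)  # (16,)
```

Because each pair of maps is reduced to a single scalar regardless of where the peak response occurs, N feature maps yield an N x N co-occurrence matrix (flattened into a feature vector), which is how the resulting features stay invariant to translation of the detected parts.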
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66996 |
DOI: | 10.6342/NTU201703240 |
Fulltext Rights: | Paid authorization required |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format
---|---|---
ntu-106-1.pdf (Restricted Access) | 3.95 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.