Label Relation Based Scene Classification Using CNNs and LSTM (使用卷積類神經網路及長短期記憶單元方法以標籤關係為基礎的場景辨識)

Po-Jen Chen; 陳柏任

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50413

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	丁建均(Jian-Jiun Ding)
dc.contributor.author	Po-Jen Chen	en
dc.contributor.author	陳柏任	zh_TW
dc.date.accessioned	2021-06-15T12:39:43Z	-
dc.date.available	2016-08-02
dc.date.copyright	2016-08-02
dc.date.issued	2016
dc.date.submitted	2016-07-28
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50413	-
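dc.description.abstract	Traditional scene classification methods usually assume that the labels are mutually exclusive, but this is often unreasonable because scene labels can be related. For example, a snowy mountain scene belongs to both the mountain and snow labels, so this is a multi-label scene classification problem. We identify the two most important relations, the hierarchical relation and the exclusive relation, and use them to make the classification results more reasonable. We propose two methods. The first is based on a hierarchical convolutional neural network and a relation graph structure; instead of assuming, as traditional methods do, that the labels are mutually exclusive, we assume that the paths in the graph are mutually exclusive. However, this method requires pre-processing of the dataset and a manually built relation graph. We therefore propose a second method based on long short-term memory (LSTM) units. Because we regard the grammar of a language as closely resembling label relations, we use the LSTM structure to train a language model that generates a description of the scene, and this description is the classification result. The final simulation results show that both proposed methods outperform previous multi-class scene classification methods, and that the LSTM-based method outperforms the hierarchical convolutional neural network method.	zh_TW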
dc.description.abstract	In traditional scene classification, the labels are usually assumed to be mutually exclusive. However, scene labels are often related. For example, a snowy mountain scene belongs to both the mountain and snow labels, so predictions made under the mutual-exclusivity assumption are often unreasonable. We aim to predict more reasonable results by exploiting the label relations. We identify two main relations: the hierarchy relation and the exclusive relation. We propose two algorithms. The first is based on a hierarchy CNN and a label relation graph structure; instead of assuming that the labels are mutually exclusive, we assume that the paths in the graph are mutually exclusive. However, this algorithm requires pre-processing of the dataset, and the label relation graph must be built manually. We therefore propose a second algorithm based on long short-term memory (LSTM). The idea is that the grammar relating the words of a sentence resembles the relations among labels, which makes the task similar to image captioning. We train a language model to capture the label relations and use the LSTM structure to produce a description of the image; this description is our prediction. The simulation results suggest that both proposed algorithms outperform other multi-label scene classification methods, and that the LSTM-based algorithm outperforms the algorithm based on the hierarchy convolutional neural network.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T12:39:43Z (GMT). No. of bitstreams: 1 ntu-105-R03942106-1.pdf: 3145067 bytes, checksum: 5b1e1a1852f350d7ae532867d422a77c (MD5) Previous issue date: 2016	en
dc.description.tableofcontents	Chinese Abstract; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction; Chapter 2 Fundamentals of Neural Networks; 2.1 Backgrounds; 2.2 Neurons; 2.2.1 The Model of Neurons; 2.2.2 The Differences Between Biology and Machine Learning Model; 2.3 Neural Networks; 2.3.1 Introduction of the Neural Networks; 2.3.2 Training Methods for the Neural Networks; 2.4 Convolutional Neural Networks; 2.4.1 Introduction; 2.4.2 Convolution Layer; 2.4.3 Pooling Layer; 2.4.4 Fully-Connected Layer; 2.4.5 Overfitting; 2.4.6 Some Famous CNNs; 2.5 Long Short-Term Memory; 2.5.1 Introduction of Recurrent Neural Networks (RNNs); 2.5.2 How to Train the Recurrent Neural Networks; 2.5.3 The Long Short-Term Memory; Chapter 3 Fundamentals of Scene Classification; 3.1 Definition; 3.2 Dataset; 3.3 Single-Label Scene Classification; 3.3.1 Structures; 3.3.2 Feature extractor based structure; 3.3.3 CNN-based structure; 3.4 Multi-Label Scene Classification; 3.5 The Difficulties of the Scene Classification; 3.5.1 Similar Scenes; 3.5.2 Obstacles; 3.5.3 Various Elements; Chapter 4 Fundamentals of Label Relation; 4.1 Introduction; 4.2 Definition; 4.3 Label Relations in the Dataset; Chapter 5 The Proposed Hierarchy CNN Based Algorithm; 5.1 Framework; 5.2 The CNN Part; 5.3 The Hierarchy NN Part; 5.3.1 Introduction; 5.3.2 The Hierarchy Structure of NN; 5.3.3 The Structure of Each Node; 5.3.4 The Batch Norm Layer; 5.3.5 The Data Balance; 5.3.6 The Hierarchy Training Settings; 5.3.7 Weights Initialization Method; 5.4 The Graph Structure Part; 5.4.1 Introduction; 5.4.2 The Adjacency Matrix; 5.4.3 The Potential Value of the Path; 5.4.4 The Best Path Algorithm; 5.4.5 The Label Value; 5.5 Simulation Results; 5.5.1 Toolkit and Environment; 5.5.2 The Loss graph; 5.5.3 Evaluation; 5.5.4 Compared Methods; 5.5.5 The Result of the Hierarchy NN part; 5.5.6 The Result of the Graph Structure Part; Chapter 6 The Proposed LSTM Based Algorithm; 6.1 Introduction; 6.2 The Framework of Proposed LSTM Based Algorithm; 6.3 The Description of the Image; 6.4 The CNN Part; 6.5 The LSTM Part; 6.5.1 The Word; 6.5.2 The Word Embedding Layer; 6.5.3 The LSTM Structure; 6.5.4 The Probability of the Labels; 6.6 Training Setting; 6.6.1 Toolkit and Environment; 6.6.2 Pre-Processing of the Training Set; 6.6.3 The Parameter of Training; 6.7 Simulation Results; Chapter 7 Conclusion and Future Work; 7.1 Conclusions; 7.2 Future Work; REFERENCE
dc.language.iso	en
dc.title	使用卷積類神經網路及長短期記憶單元方法以標籤關係為基礎的場景辨識	zh_TW
dc.title	Label Relation Based Scene Classification Using CNNs and LSTM	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	王家慶(Jia-Ching Wang),郭景明(Jing-Ming Guo),許文良(Wen-Liang Hsue)
dc.subject.keyword	label relation,scene classification,hierarchical neural network,long short-term memory unit	zh_TW
dc.subject.keyword	label relation,scene classification,hierarchy neural network,LSTM	en
dc.relation.page	80
dc.identifier.doi	10.6342/NTU201601474
dc.rights.note	Paid authorization
dc.date.accepted	2016-07-28
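dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Communication Engineering	zh_TW
Appears in Collections:	Graduate Institute of Communication Engineering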
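
The abstract's first algorithm treats the paths of a label relation graph, rather than the individual labels, as mutually exclusive. The record does not give the thesis's actual formulation, so the following is only a minimal sketch of that idea under stated assumptions: the small hierarchy, the per-label scores, and the choice of a path potential equal to the sum of log scores are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical label hierarchy (parent -> children); not the graph used in the thesis.
HIERARCHY = {
    "root": ["outdoor", "indoor"],
    "outdoor": ["mountain", "beach"],
    "mountain": ["snow_mountain"],
    "indoor": ["kitchen"],
}


def enumerate_paths(graph, node="root", prefix=()):
    """Yield every root-to-leaf path as a tuple of labels (the root itself is excluded)."""
    children = graph.get(node, [])
    if not children:
        yield prefix
        return
    for child in children:
        yield from enumerate_paths(graph, child, prefix + (child,))


def path_probabilities(graph, scores):
    """Normalize over paths so that exactly one path is assumed to be correct.

    Assumption: a path's potential is the sum of the log scores of its labels.
    Note that this favors shorter paths when scores are comparable.
    """
    paths = list(enumerate_paths(graph))
    potentials = [sum(math.log(scores[label]) for label in path) for path in paths]
    peak = max(potentials)  # subtract the maximum for numerical stability
    weights = [math.exp(p - peak) for p in potentials]
    total = sum(weights)
    return {path: w / total for path, w in zip(paths, weights)}


def best_path(graph, scores):
    """Return the most probable path; every label on it is predicted as present."""
    probs = path_probabilities(graph, scores)
    return max(probs, key=probs.get)


# Hypothetical per-label scores, e.g. sigmoid outputs of a classifier.
scores = {
    "outdoor": 0.9, "indoor": 0.1,
    "mountain": 0.8, "beach": 0.1,
    "snow_mountain": 0.7, "kitchen": 0.05,
}
print(best_path(HIERARCHY, scores))
```

Because the prediction is a whole path, a snowy mountain image receives the general and the specific labels together, which a flat classifier with mutually exclusive labels cannot express.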

Files in This Item:

File	Size	Format
ntu-105-1.pdf (not currently authorized for public access)	3.07 MB	Adobe PDF

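Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.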