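Design and Implementation of Algorithms and Hardware Architectures for an Image Content Analysis System Based on Supervised Machine Learning Methods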

Yi-Ling Chen; 陳怡伶

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47383

Full metadata record

DC field	Value	Language
dc.contributor.advisor	簡韶逸
dc.contributor.author	Yi-Ling Chen	en
dc.contributor.author	陳怡伶	zh_TW
dc.date.accessioned	2021-06-15T05:57:26Z	-
dc.date.available	2015-08-18
dc.date.copyright	2010-08-18
dc.date.issued	2010
dc.date.submitted	2010-08-17
dc.identifier.citation	[1] Minghua Shi and Amine Bermak, “An efficient digital VLSI implementation of Gaussian mixture models-based classifier,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 9, pp. 962–974, 2006.
[2] Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, and Cedric Bray, “Visual categorization with bags of keypoints,” in Workshop on Statistical Learning in Computer Vision, ECCV, 2004, pp. 1–22.
[3] Nuno Vasconcelos, “From pixels to semantic spaces: Advances in content based image retrieval,” Computer, vol. 40, no. 7, pp. 20–26, July 2007.
[4] Aditya Vailaya, Mario A. T. Figueiredo, Anil K. Jain, and Hong-Jiang Zhang, “Image classification for content-based indexing,” IEEE Transactions on Image Processing, vol. 10, pp. 117–130, 2001.
[5] Martin Szummer and Rosalind W. Picard, “Indoor-outdoor image classification,” in IEEE International Workshop on Content-based Access of Image and Video Databases, 1998, pp. 42–51.
[6] Monika M. Gorkani and Rosalind W. Picard, “Texture orientation for sorting photos at a glance,” in TR-292, M.I.T. Media Laboratory, Perceptual Computing Section, 1994, pp. 459–464.
[7] Julia Vogel and Bernt Schiele, “Natural scene retrieval based on a semantic modeling step,” in ACM International Conference on Image and Video Retrieval, 2004, Springer Verlag.
[8] Julia Vogel and Bernt Schiele, “Semantic modeling of natural scenes for content-based image retrieval,” International Journal of Computer Vision, vol. 72, pp. 133–157, 2007.
[9] Gustavo Carneiro, Antoni B. Chan, Pedro J. Moreno, and Nuno Vasconcelos, “Supervised learning of semantic classes for image annotation and retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 394–410, March 2007.
[10] John R. Smith, Milind Naphade, and Apostol Natsev, “Multimedia semantic indexing using model vectors,” in 2003 International Conference on Multimedia and Expo, July 2003, vol. 2, pp. II-445–II-448.
[11] Jinbo Bi, Yixin Chen, and James Z. Wang, “A sparse support vector machine approach to region-based image categorization,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 1121–1128.
[12] Nikhil Rasiwasia and Nuno Vasconcelos, “Image retrieval using query by contextual example,” in Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, April 2008, pp. 164–171.
[13] Xiaodong Cui and Yifan Gong, “A study of variable-parameter Gaussian mixture hidden Markov modeling for noisy speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1366–1376, May 2007.
[14] Sergios Theodoridis and Konstantinos Koutroumbas, “Classifiers based on Bayes decision theory,” in Pattern Recognition, chapter 2, pp. 19–20. Academic Press, 2006.
[15] Yihong Gong and Wei Xu, “Introduction,” in Machine Learning for Multimedia Content Analysis, chapter 1, pp. 1–8. Springer-Verlag New York, Inc., 2007.
[16] Christopher J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.
[17] Corinna Cortes and Vladimir Vapnik, “Support-vector networks,” Machine Learning, vol. 20, pp. 273–297, 1995.
[18] Nello Cristianini and John Shawe-Taylor, “Linear learning machines,” in An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, chapter 2, p. 198. Cambridge University Press, 2000.
[19] Olivier Chapelle, Patrick Haffner, and Vladimir Vapnik, “SVMs for histogram-based image classification,” 1999.
[20] Yihong Gong and Wei Xu, “Max-margins classifications,” in Machine Learning for Multimedia Content Analysis, chapter 10, pp. 235–262. Springer-Verlag New York, Inc., 2007.
[21] Vladimir N. Vapnik, “The support vector machine for estimating real-valued functions,” in Statistical Learning Theory, chapter 11. Wiley, 1998.
[22] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[23] Thorsten Joachims, “Making large-scale support vector machine learning practical,” 1998.
[24] Chih-Wei Hsu and Chih-Jen Lin, “A simple decomposition method for support vector machines,” Machine Learning, vol. 46, pp. 291–314, 2002.
[25] Jason Weston and Chris Watkins, “Support vector machines for multi-class pattern recognition,” in Proceedings of the Seventh European Symposium on Artificial Neural Networks, 1999.
[26] Chih-Wei Hsu and Chih-Jen Lin, “A comparison of methods for multiclass support vector machines,” 2002.
[27] Koby Crammer and Yoram Singer, “On the learnability and design of output codes for multiclass problems,” in Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, 2000, pp. 35–46.
[28] Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, Yasemin Altun, and Yoram Singer, “Large margin methods for structured and interdependent output variables,” Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[29] David L. Olson and Dursun Delen, “Support Vector Machines,” in Advanced Data Mining Techniques, chapter 7, pp. 121–122. Springer, 2008.
[30] Shogo Muramatsu and Hidenori Watanabe, “Fast algorithm for GMM-based pattern classifier,” in IEEE International Conference on Acoustics, Speech and Signal Processing, April 2009, pp. 633–636.
[31] Roman Genov and Gert Cauwenberghs, “Kerneltron: Support vector ‘machine’ in silicon,” IEEE Transactions on Neural Networks, vol. 14, pp. 1426–1434, 2003.
[32] Davide Anguita, Andrea Boni, and Sandro Ridella, “A digital architecture for support vector machines: theory, algorithm, and FPGA implementation,” IEEE Transactions on Neural Networks, vol. 14, pp. 993–1009, 2003.
[33] Faisal M. Khan, Mark G. Arnold, and William M. Pottenger, “Hardware based support vector machine classification in logarithmic number systems,” in IEEE International Symposium on Circuits and Systems, 2005, pp. 5154–5157.
[34] Bryan Catanzaro, Narayanan Sundaram, and Kurt Keutzer, “Fast support vector machine training and classification on graphics processors,” in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 104–111.
[35] Jun Yang, Yu-Gang Jiang, Alexander G. Hauptmann, and Chong-Wah Ngo, “Evaluating bag-of-visual-words representations in scene classification,” in Proceedings of International Workshop on Multimedia Information Retrieval, 2007, pp. 197–206.
[36] “LSCOM lexicon definitions and annotations version 1.0, DTO challenge workshop on large scale concept ontology for multimedia,” Tech. Rep., Columbia University, March 2006.
[37] Mark J. Huiskes and Michael S. Lew, “The MIR flickr retrieval evaluation,” in Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, 2008.
[38] Aleksandra Mojsilovic, Jose Gomes, and Bernice Rogowitz, “Semantic friendly indexing and quering of images based on the extraction of the objective semantic cues,” International Journal of Computer Vision, pp. 79–107, 2004.
[39] S. L. Feng, R. Manmatha, and V. Lavrenko, “Multiple Bernoulli relevance models for image and video annotation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, pp. 1002–1009.
[40] V. Lavrenko, R. Manmatha, and J. Jeon, “A model for learning the semantics of pictures,” in 17th Annual Conference on Neural Information Processing Systems, 2003, MIT Press.
[41] Aude Oliva and Antonio Torralba, “Scene-centered description from spatial envelope properties,” in 2nd Workshop on Biologically Motivated Computer Vision, Lecture Notes in Computer Science, 2002, pp. 263–272, Springer Verlag.
[42] Eric Nowak, Frederic Jurie, and Bill Triggs, “Sampling strategies for bag-of-features image classification,” in Proceedings of European Conference on Computer Vision, 2006, pp. 490–503, Springer.
[43] Naonori Ueda and Zoubin Ghahramani, “Bayesian model search for mixture models based on optimizing variational bounds,” Neural Networks, vol. 15, pp. 1223–1241, Dec. 2002.
[44] Nikolaos Nasios and Adrian G. Bors, “Variational learning for Gaussian mixture models,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, no. 4, pp. 849–862, Aug. 2006.
[45] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang, “Image retrieval: Ideas, influences, and trends of the new age,” ACM Computing Surveys, vol. 40, no. 5, April 2008.
[46] Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, Wiley Interscience, second edition, 2000.
[47] L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, L. D. Jackel, Y. LeCun, U. A. Muller, E. Sackinger, P. Simard, and V. Vapnik, “Comparison of classifier methods: a case study in handwritten digit recognition,” in Proceedings of the 12th International Conference on Pattern Recognition, 1994, pp. 77–82.
[48] Jason Weston and Chris Watkins, “Multi-class support vector machines,” 1998.
[49] Chih-Wei Hsu and Chih-Jen Lin, “BSVM,” 2006. Software available at http://www.csie.ntu.edu.tw/~cjlin/bsvm/index.html.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47383	-
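dc.description.abstract	With the rapid development of the semiconductor industry, more and more functions have been integrated into consumer electronics products, for example basic communication, high-speed network connectivity, high-resolution image sensors (CMOS sensors), high-capacity storage devices, and intelligent human-machine interfaces. These functions make the amount of multimedia data stored on consumer electronics products ever larger. To provide effective multimedia data retrieval, especially for image data, extracting the semantic information in these data in real time and using it effectively has become an urgent problem, and machine learning algorithms play an important role in semantic extraction for multimedia content analysis. Moreover, for embedded systems, neither the conventional central processing unit (CPU) nor the application-specific integrated circuit (ASIC) can simultaneously provide the flexibility and the performance that multimedia content analysis requires. Next-generation applications therefore need new design methodologies that provide the flexibility and performance different users require. In this thesis, we propose hardware architectures for the Gaussian Mixture Model (GMM) and multi-class Support Vector Machine (SVM) machine learning algorithms to accelerate image semantic processing and the concept feature extraction process in multimedia content analysis. With local-to-global concept feature extraction, image patches are analyzed by machine learning algorithms such as GMM or multi-class SVM, which classify the low-level features of each patch into predefined concept classes. By gathering the concept classification results of all patches in an image, the semantic concept feature of the image can be extracted to represent the whole image. This mapping bridges the gap between low-level features and human semantic perception. The intensive computation it involves burdens resource-limited embedded systems, and we propose solutions to this problem. In the proposed GMM hardware architecture, we combine different degrees of parallelism with folding to achieve good speed-up and flexibility, reaching a throughput of one Gaussian distribution's computation per cycle. Because each classifier in the GMM algorithm may use a different number of Gaussian distributions, processing one Gaussian at a time meets different users' needs more flexibly and efficiently. Based on thorough analysis, we propose a design methodology for multi-class SVM hardware architectures and provide a prototype that trades off hardware cost against real-time processing capability. This architecture is further optimized with a reconfigurable structure for higher flexibility, offering three operating modes that users can choose according to their needs while using memory more efficiently. The flexibility we provide includes: (1) three kernel functions, (2) a wide range of parameter values, (3) adjustable bit precision, and (4) two operating speed modes. When the number of support vectors exceeds the capacity of on-chip memory, the proposed reload analysis can be used to improve the memory design and support different reload scenarios.	zh_TW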
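The class-level folding of the GMM classifier can be illustrated in software: all Gaussians of all class models sit in one flat parameter list, each tagged with the class it belongs to, and one evaluation step processes exactly one Gaussian and accumulates it into that class's likelihood. This is a minimal sketch assuming diagonal covariances and log-domain accumulation; the `Gaussian` tuple layout, `gmm_classify`, and the toy parameters are hypothetical illustrations, not the thesis's datapath.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical parameter-memory entry: (class_id, mixture weight, mean, diagonal variance).
Gaussian = Tuple[int, float, Sequence[float], Sequence[float]]


def log_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def gmm_classify(
    x: Sequence[float], gaussians: List[Gaussian], num_classes: int
) -> Tuple[int, List[float]]:
    """Score every class GMM by streaming one Gaussian per step.

    Each loop iteration stands in for one cycle of a unit folded at the
    class level: it evaluates a single weighted Gaussian and accumulates it
    into the log-likelihood of the class that owns it, so different classes
    may use different numbers of Gaussians.
    """
    acc = [-math.inf] * num_classes
    for class_id, weight, mean, var in gaussians:
        term = math.log(weight) + log_gaussian(x, mean, var)
        acc[class_id] = log_add(acc[class_id], term)
    best = max(range(num_classes), key=lambda c: acc[c])
    return best, acc


# Toy example: class 0 uses two Gaussians, class 1 uses one.
params: List[Gaussian] = [
    (0, 0.5, [0.0, 0.0], [1.0, 1.0]),
    (0, 0.5, [1.0, 1.0], [1.0, 1.0]),
    (1, 1.0, [5.0, 5.0], [1.0, 1.0]),
]
print(gmm_classify([0.2, 0.1], params, 2)[0])  # prints 0
```

Because the loop is driven by the flat Gaussian list rather than by a fixed per-class structure, changing the number of Gaussians per class or the number of classes only changes the parameter list, which mirrors the flexibility argument in the abstract.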
dc.description.abstract	Owing to advances in semiconductor technology, a Consumer Electronics (CE) product with a large storage device may include functionalities beyond basic communication, such as taking and storing photos. As a result, the amount of multimedia data stored on these products is very large. This data has to be accessed intelligently, so managing multimedia content has become an urgent task. To enable efficient data management, the semantic information of the multimedia content has to be extracted for further manipulation, and machine learning algorithms play an important role in this area. In embedded systems for CE products, neither traditional CPUs nor ASICs can satisfy both the flexibility and the performance requirements because of their architectures, so new design methodologies and solutions are needed for next-generation applications. In this thesis, hardware architectures for the Gaussian Mixture Model (GMM) and multi-class Support Vector Machine (SVM) machine learning algorithms are proposed to accelerate image semantic processing and the concept feature extraction process in multimedia content analysis. With the local-to-global concept feature extraction method, the low-level features of image patches are analyzed by machine learning algorithms such as GMM or SVM, which classify the patches into predefined concept classes. After the classification results of all patches in an image are gathered, the semantic concepts can be extracted to represent the image. This mapping bridges the gap between the low-level feature representation and human perception. Because the computations involved in this process are intensive and burden resource-limited embedded systems, the proposed hardware acceleration schemes are used to address this problem.
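The local-to-global mapping can be sketched in a few lines: every patch is classified into one of the predefined concept classes, and the image is then described by the fraction of patches assigned to each class. The normalized-histogram aggregation, the function names, and the toy nearest-centroid `classify_patch` (standing in for a trained GMM or SVM purely for brevity) are assumptions for illustration, not the thesis's exact procedure.

```python
from collections import Counter
from typing import Callable, List, Sequence

Feature = Sequence[float]


def concept_feature(
    patch_features: Sequence[Feature],
    classify_patch: Callable[[Feature], int],
    num_concepts: int,
) -> List[float]:
    """Local-to-global step: label every patch, then describe the whole
    image by the fraction of patches assigned to each concept class."""
    if not patch_features:
        return [0.0] * num_concepts
    counts = Counter(classify_patch(f) for f in patch_features)
    total = len(patch_features)
    return [counts[c] / total for c in range(num_concepts)]


def nearest_centroid(centroids: Sequence[Feature]) -> Callable[[Feature], int]:
    """Hypothetical toy patch classifier standing in for a trained GMM or SVM."""
    def classify(f: Feature) -> int:
        return min(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centroids[c])),
        )
    return classify


classify = nearest_centroid([[0.0], [1.0], [2.0]])
print(concept_feature([[0.1], [0.05], [1.1], [1.9]], classify, 3))  # [0.5, 0.25, 0.25]
```

The resulting fixed-length vector depends only on the number of concept classes, not on the number of patches, which is what lets it stand in for the whole image in later retrieval steps.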
The proposed GMM hardware architecture achieves high speed-up and good flexibility by combining parallelism and folding at different levels. The system processes all computations for one Gaussian in a single cycle. Because each classifier in the GMM algorithm, which models the data of one class, may use a different number of Gaussian distributions, it is more efficient to fold the hardware at the class level and process one Gaussian at a time. This gives the user the flexibility to set both the number of Gaussians per class and the number of classes. The proposed multi-class SVM hardware architecture is designed on the basis of thorough analyses to balance hardware cost against real-time processing demand. The design is further optimized with a reconfigurable structure that provides different operating modes to satisfy various user demands and make good use of the memories. The flexibility includes three kernel functions, a wide range of parameter values, adjustable bit precision with run-length encoding, and two operating speed modes. When the number of support vectors is too large to be stored on chip, the proposed reload scheme can be adopted to handle this scenario. In short, the contribution of this thesis consists of a flexible, high-throughput GMM hardware architecture for image semantic processing and a multi-class SVM hardware architecture design methodology with an optimized reconfigurable prototype for real-time multimedia content analysis. Thorough analyses of how the reconfigurable SVM hardware architecture handles different scenarios are also presented and discussed. The contents of this thesis can be regarded as a series of solutions for implementing hardware architectures of supervised machine learning algorithms, such as GMM and SVM, for multimedia content analysis in CE products.	en
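The combination of a selectable kernel with run-length-encoded alpha memory can be sketched arithmetically. Assumptions, none of which the record specifies: the three kernels are taken to be linear, polynomial, and RBF; the per-class support-vector sets are merged into one shared pool, so each class's signed coefficient column (alpha_i times y_i) contains zeros for vectors that class does not use; and the encoding stores each nonzero coefficient together with the count of zeros before it. `make_kernel`, `rle_encode`, `decision_values`, and `predict` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def make_kernel(kind: str, gamma: float = 1.0, coef0: float = 0.0, degree: int = 3) -> Kernel:
    """Return one of three assumed kernels: linear, polynomial or RBF."""
    def dot(u: Vector, v: Vector) -> float:
        return sum(a * b for a, b in zip(u, v))

    if kind == "linear":
        return dot
    if kind == "poly":
        return lambda u, v: (gamma * dot(u, v) + coef0) ** degree
    if kind == "rbf":
        return lambda u, v: math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
    raise ValueError(f"unknown kernel: {kind}")


def rle_encode(column: Sequence[float]) -> List[Tuple[int, float]]:
    """Store each nonzero coefficient with the count of zeros before it.

    Trailing zeros are dropped because they add nothing to the decision sum.
    """
    encoded: List[Tuple[int, float]] = []
    zeros = 0
    for value in column:
        if value == 0.0:
            zeros += 1
        else:
            encoded.append((zeros, value))
            zeros = 0
    return encoded


def decision_values(
    x: Vector,
    support_vectors: Sequence[Vector],
    encoded_columns: Sequence[List[Tuple[int, float]]],
    biases: Sequence[float],
    kernel: Kernel,
) -> List[float]:
    """One decision value per class from a shared support-vector pool.

    Kernel values are computed once per support vector and reused by
    every class; zero coefficients are skipped via the run lengths.
    """
    k = [kernel(sv, x) for sv in support_vectors]
    scores = []
    for column, bias in zip(encoded_columns, biases):
        index, score = 0, bias
        for zeros, coef in column:
            index += zeros  # jump over support vectors with a zero coefficient
            score += coef * k[index]
            index += 1
        scores.append(score)
    return scores


def predict(scores: Sequence[float]) -> int:
    """Pick the class with the largest decision value."""
    return max(range(len(scores)), key=lambda c: scores[c])
```

Sharing one kernel evaluation per support vector across all classes is what makes a merged pool attractive, and the run-length form means zero coefficients cost neither memory entries nor multiply-accumulate steps.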
dc.description.provenance	Made available in DSpace on 2021-06-15T05:57:26Z (GMT). No. of bitstreams: 1 ntu-99-R97943008-1.pdf: 27915222 bytes, checksum: 33fffa34a14f591943014ea4c95f6210 (MD5) Previous issue date: 2010	en
dc.description.tableofcontents	Abstract
1 Introduction
1.1 Multimedia Content Analysis
1.2 Machine Learning, and Design Challenges
1.3 Thesis Organization
2 Concept Feature Extraction in Multimedia Content Analysis
2.1 Concept Feature Extraction in Multimedia Content Analysis
2.1.1 Bridging the Semantic Gap
2.1.2 Concept Feature Extraction
3 Gaussian Mixture Model-based Classifier Architectural Design
3.1 Introduction
3.2 GMM Algorithm and Semantic Processing
3.2.1 Gaussian Mixture Model Algorithm
3.2.2 Semantic Processing
3.3 Hardware Architecture
3.4 Experimental Results
3.5 Summary
4 Multi-class Support Vector Machine Algorithm
4.1 One-against-all Method
4.2 One-against-one Method
4.3 Considering All Data at Once Method
5 Multi-class SVM Hardware Architecture Analysis and Implementations
5.1 Hardware Implementation Considerations
5.2 Hardware Architecture Analysis
5.2.1 Hardware Cost
5.2.2 Real-time Requirement
5.3 Hardware Architecture Implementations
5.3.1 Kernel Function Unit
5.3.2 Multi-level Reconfigurable Alpha Engine
5.3.3 Decision Making Element
5.4 Parameters Reload Analysis
5.5 Summary
6 Multi-class SVM Experimental Results
6.1 Training Data Preparation
6.2 Classification Accuracy Analysis
6.2.1 Exponential Function Approximation
6.2.2 Alpha Bit-precision Requirement
6.3 Alpha Memory with Run-length Analysis
6.3.1 Zero Occurrence of Alpha
6.3.2 Run-length Encoding of Alpha
6.4 Hardware Implementation Results
7 Conclusion
Reference
dc.language.iso	zh-TW
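dc.subject	support vector machine	zh_TW
dc.subject	multimedia content analysis	zh_TW
dc.subject	semantic analysis	zh_TW
dc.subject	supervised machine learning algorithm	zh_TW
dc.subject	hardware architecture design	zh_TW
dc.subject	Gaussian mixture model	zh_TW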
dc.subject	semantic processing	en
dc.subject	Gaussian mixture model	en
dc.subject	hardware architecture	en
dc.subject	supervised machine learning algorithm	en
dc.subject	multimedia content analysis	en
dc.subject	support vector machine	en
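dc.title	Design and Implementation of Algorithms and Hardware Architectures for an Image Content Analysis System Based on Supervised Machine Learning Methods	zh_TW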
dc.title	Algorithm and Architectural Design of Image Content Analysis System based on Supervised Machine Learning Techniques	en
dc.type	Thesis
dc.date.schoolyear	98-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	賴文能,郭致宏,賴永康,陳祝嵩
dc.subject.keyword	multimedia content analysis,semantic analysis,supervised machine learning algorithm,hardware architecture design,Gaussian mixture model,support vector machine	zh_TW
dc.subject.keyword	multimedia content analysis,semantic processing,supervised machine learning algorithm,hardware architecture,Gaussian mixture model,support vector machine	en
dc.relation.page	106
dc.rights.note	Licensed for a fee
dc.date.accepted	2010-08-18
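dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electronics Engineering	zh_TW
Appears in Collections:	Graduate Institute of Electronics Engineering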

Files in This Item:

File	Size	Format
ntu-99-1.pdf (restricted access)	27.26 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.