Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60502
Full metadata record
(DC field: value [language])
dc.contributor.advisor: 王傑智 (Chieh-Chih (Bob) Wang)
dc.contributor.author: Tsung-Han Lin [en]
dc.contributor.author: 林宗翰 [zh_TW]
dc.date.accessioned: 2021-06-16T10:19:56Z
dc.date.available: 2013-08-26
dc.date.copyright: 2013-08-26
dc.date.issued: 2013
dc.date.submitted: 2013-08-16
dc.identifier.citation:
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2010). SLIC superpixels. École Polytechnique Fédérale de Lausanne (EPFL), Tech. Rep. 149300.
Alcantarilla, P. F., Yebes, J. J., Almazán, J., & Bergasa, L. M. (2012). On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In IEEE International Conference on Robotics and Automation (ICRA), (pp. 1290–1297). IEEE.
Coates, A. & Ng, A. (2011). Selecting receptive fields in deep networks. In Advances in Neural Information Processing Systems, (pp. 2528–2536).
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2012). Scene parsing with multiscale feature learning, purity trees, and optimal covers. In International Conference on Machine Learning (ICML).
Geiger, A., Ziegler, J., & Stiller, C. (2011). StereoScan: Dense 3d reconstruction in real-time. In Intelligent Vehicles Symposium (IV).
Goller, C. & Kuchler, A. (1996). Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks, volume 1, (pp. 347–352). IEEE.
Grangier, D., Bottou, L., & Collobert, R. (2009). Deep convolutional networks for scene parsing. In ICML 2009 Deep Learning Workshop, volume 3. Citeseer.
Jain, V., Murray, J., Roth, F., Turaga, S., Zhigulin, V., Briggman, K., Helmstaedter, M., Denk, W., & Seung, H. (2007). Supervised learning of image restoration with convolutional networks. In IEEE 11th International Conference on Computer Vision (ICCV).
Jarrett, K., Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In IEEE 12th International Conference on Computer Vision (ICCV), (pp. 2146–2153).
Kundu, A., Krishna, K., & Sivaswamy, J. (2009). Moving object detection by multiview geometric techniques from a single camera mounted robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (pp. 4306–4312).
Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., & Ng, A. (2012). Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning (ICML).
Le, Q., Zou, W., Yeung, S., & Ng, A. (2011). Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
Le, Q. V., Karpenko, A., Ngiam, J., & Ng, A. (2011). ICA with reconstruction cost for efficient overcomplete feature learning. In Advances in Neural Information Processing Systems, (pp. 1017–1025).
Le, Q. V., Ngiam, J., Chen, Z., Chia, D., Koh, P. W., & Ng, A. Y. (2010). Tiled convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS).
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lenz, P., Ziegler, J., Geiger, A., & Roser, M. (2011). Sparse scene flow segmentation for moving object detection in urban environments. In IEEE Intelligent Vehicles Symposium (IV), (pp. 926–932).
Ma, L. & Zhang, L. (2008). Overcomplete topographic independent component analysis. Neurocomputing, 71(10), 2217–2223.
Ning, F., Delhomme, D., LeCun, Y., Piano, F., Bottou, L., & Barbano, P. E. (2005). Toward automatic phenotyping of developing embryos from videos. IEEE Transactions on Image Processing, 14(9), 1360–1371.
Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2006). Subgradient methods for maximum margin structured learning. In ICML Workshop on Learning in Structured Output Spaces, volume 46. Citeseer.
Socher, R., Huval, B., Bhat, B., Manning, C. D., & Ng, A. Y. (2012). Convolutional-recursive deep learning for 3d object classification. In Advances in Neural Information Processing Systems 25.
Socher, R., Lin, C. C., Ng, A. Y., & Manning, C. D. (2011). Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML).
Taskar, B., Klein, D., Collins, M., Koller, D., & Manning, C. D. (2004). Max-margin parsing. In EMNLP, volume 1, (pp. 3).
van Hateren, J. H. & Ruderman, D. L. (1998). Independent component analysis of natural image sequences yields spatiotemporal filters similar to simple cells in primary visual cortex. In Royal Society: Biological Sciences.
Wedel, A., Meissner, A., Rabe, C., Franke, U., & Cremers, D. (2009). Detection and segmentation of independently moving objects from dense scene flow. In Proceedings of the 7th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition.
Zou, W. Y., Zhu, S., Ng, A. Y., & Yu, K. (2012). Deep learning of invariant features via simulated fixations in video. In Advances in Neural Information Processing Systems (NIPS).
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60502
dc.description.abstract: 本論文提出用深度學習 (deep learning) 直接從原始資料無監督式 (unsupervised) 地學出動態物體的特徵。具體來說,深度學習的技術,像捲積 (convolution)、兩個非線性的固定卷積 (pooling) 與堆疊 (stacking),會被運用來學習時間與空間特徵 (spatio-temporal features) 的多層次表示 (hierarchical representation)。本文是基於移動立體視覺相機的資料進行學習。時間與空間特徵整合遞迴類神經網路 (Recursive Neural Network) 之後,就可以從影像中辨認出動態物體 (motion segmentation)。實驗結果顯示,本文提出的方法,相較於用點特徵 (point feature) 加上運動模型 (egomotion) 的方法,可以在難偵測點特徵的地方提取特徵,有助於動態物體偵測。 [zh_TW]
dc.description.abstract: In this work, deep learning is used to learn features directly from raw data in an unsupervised manner. Instead of hand-engineering features for each new sensor input, the system adapts to new data through unsupervised learning. More specifically, the deep learning techniques of convolution, pooling, and stacking are used to learn a hierarchical representation of spatio-temporal features from unlabeled stereo video data. The spatio-temporal features are learned with a Reconstruction Independent Component Analysis (RICA) autoencoder. The learned features are then applied to motion segmentation of moving objects in images from a moving stereo camera. To do so, the spatio-temporal features are extracted from image segments, and a Recursive Neural Network is used to recursively build a segmentation tree that segments moving objects out of the scene. To our knowledge, this is the first time deep learning has been applied to learning spatio-temporal features together with motion segmentation (scene parsing). Compared to moving object detection methods that use point features with egomotion estimation, we show that our features can be extracted in situations where good point features are not detectable. The system is evaluated on real-world data, with results similar to the state of the art while achieving better detection in certain situations. [en]
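The abstract above names two technical steps: unsupervised spatio-temporal feature learning with a RICA autoencoder, and greedy construction of a segmentation tree with a recursive neural network. The sketch below is a minimal NumPy illustration of both ideas, not the thesis implementation: rica_cost follows the reconstruction-plus-smooth-L1 objective of Le et al. (2011), "ICA with reconstruction cost", cited above, while greedy_merge is a simplified stand-in for the greedy structure prediction step (Section 4.3 in the table of contents). The score_pair callable, the mean combiner, and all array shapes are illustrative assumptions.

    import numpy as np

    def rica_cost(W, X, lam=0.1, eps=1e-8):
        """RICA objective (Le et al., 2011): reconstruction cost plus smooth L1 sparsity.

        W : (k, n) unconstrained filter matrix, k features over n input dimensions.
        X : (n, m) matrix holding m whitened spatio-temporal patches as columns.
        """
        Z = W @ X                    # filter responses, shape (k, m)
        recon = W.T @ Z - X          # reconstruction residual W^T W x - x
        m = X.shape[1]
        reconstruction = (lam / m) * np.sum(recon ** 2)
        sparsity = (1.0 / m) * np.sum(np.sqrt(Z ** 2 + eps))   # smoothed L1 penalty
        return reconstruction + sparsity

    def greedy_merge(segments, score_pair):
        """Greedily merge the highest-scoring adjacent pair of segment feature
        vectors until one region remains, recording the merge order (a flat
        stand-in for the recursive network's segmentation tree)."""
        segs = list(segments)
        merges = []
        while len(segs) > 1:
            # Score each adjacent pair; the real parser restricts candidates to
            # spatial neighbours and scores them with the trained recursive network.
            scores = [(score_pair(segs[i], segs[i + 1]), i) for i in range(len(segs) - 1)]
            _, i = max(scores)
            merged = 0.5 * (segs[i] + segs[i + 1])  # placeholder combiner; the RNN
            segs[i:i + 2] = [merged]                # applies a learned nonlinearity
            merges.append(i)
        return segs[0], merges

In the thesis pipeline the pairwise score and the combined representation would both come from the trained recursive network rather than the placeholders used here; the sketch only shows the cost function and the control flow of the greedy merge.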
dc.description.provenance: Made available in DSpace on 2021-06-16T10:19:56Z (GMT). No. of bitstreams: 1. ntu-102-R00922141-1.pdf: 4532028 bytes, checksum: 5ad0f96727773d6b5bfcc9e992a1af8f (MD5). Previous issue date: 2013 [en]
dc.description.tableofcontents:
ABSTRACT ... ii
LIST OF FIGURES ... iv
CHAPTER 1. Introduction ... 1
CHAPTER 2. Related Work ... 3
CHAPTER 3. Unsupervised Spatio-temporal Feature Learning ... 5
3.1. Deep Learning Concepts ... 5
3.1.1. Autoencoders ... 5
3.1.2. Convolution ... 7
3.1.3. Pooling ... 8
3.1.4. Stacking ... 9
3.2. Reconstruction ICA Learning Module ... 10
3.3. Stacked Architecture ... 12
3.4. Spatio-temporal Features Analysis and Visualization ... 14
CHAPTER 4. Motion Segmentation - Recursive Neural Network ... 18
4.1. Generating Features for Individual Segment ... 18
4.2. Cost Function and Max-Margin Estimation ... 19
4.3. Greedy Structure Prediction ... 21
CHAPTER 5. Experiments ... 25
5.1. Training ... 25
5.1.1. Unsupervised Spatio-temporal Feature Training ... 25
5.1.2. Recursive Neural Network Parsing ... 26
5.2. Dataset ... 26
5.3. Moving Object Detection with Libviso2 Egomotion Estimation ... 27
5.4. Results ... 28
5.5. Analysis ... 29
CHAPTER 6. Conclusion and Future Work ... 35
BIBLIOGRAPHY ... 36
dc.language.iso: en
dc.subject: 動態物體偵測 [zh_TW]
dc.subject: 移動立體視覺相機 [zh_TW]
dc.subject: 遞迴類神經網路 [zh_TW]
dc.subject: 深度學習 [zh_TW]
dc.subject: 時間與空間特徵學習 [zh_TW]
dc.subject: Recursive Neural Network [en]
dc.subject: Autoencoders [en]
dc.subject: Motion segmentation [en]
dc.subject: Moving object detection [en]
dc.subject: Reconstruction Independent Component Analysis [en]
dc.subject: Deep Learning [en]
dc.title: 以移動立體視覺相機搭配無監督式時間與空間特徵與監督式應用遞迴類神經網路進行動態物體偵測 [zh_TW]
dc.title: Unsupervised Spatio-temporal Feature Learning and Supervised Recursive Neural Network Learning for Motion Segmentation from a Moving Stereo Camera [en]
dc.type: Thesis
dc.date.schoolyear: 101-2
dc.description.degree: 碩士
dc.contributor.oralexamcommittee: 劉長遠 (Cheng-Yuan Liou), 傅立成 (Li-Chen Fu), 李明穗 (Ming-Sui Lee), 連豊力 (Feng-Li Lian)
dc.subject.keyword: 深度學習, 遞迴類神經網路, 移動立體視覺相機, 動態物體偵測, 時間與空間特徵學習 [zh_TW]
dc.subject.keyword: Deep Learning, Autoencoders, Motion segmentation, Moving object detection, Reconstruction Independent Component Analysis, Recursive Neural Network [en]
dc.relation.page: 38
dc.rights.note: 有償授權
dc.date.accepted: 2013-08-16
dc.contributor.author-college: 電機資訊學院 [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 [zh_TW]
Appears in collections: 資訊工程學系

Files in this item:
ntu-102-1.pdf (4.43 MB, Adobe PDF), restricted access (未授權公開取用)

