Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7589
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 丁建均 | |
dc.contributor.author | Hung-Wei Hsu | en |
dc.contributor.author | 許宏瑋 | zh_TW |
dc.date.accessioned | 2021-05-19T17:47:14Z | - |
dc.date.available | 2023-06-26 | |
dc.date.available | 2021-05-19T17:47:14Z | - |
dc.date.copyright | 2018-06-26 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-06-21 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7589 | - |
dc.description.abstract | 本論文包含兩部分,第一部分有關class-agnostic tracking。本論文提出兩個演算法,希望透過深度學習對時序和影像資料的豐富特徵表達能力,解決目前class-agnostic tracking領域的問題。第二部分是將電腦視覺技術應用在環境保育領域,針對海豚資料的偵測及辨識。
在tracking上,本論文提出FasterMDNet和RDisp。FasterMDNet將MDNet中時間複雜度極高的online training以RNN為基礎的模型更新策略取代,並以重複的online training和back-propagation through time (BPTT)來訓練。FasterMDNet比起MDNet,時間節省大約十倍左右。RDisp將已訓練好的物件偵測模型加上ConvRNN,建立追蹤模型。此外,由於BPTT在訓練RDisp上有缺陷,本論文提出以兩階段片段訓練取代BPTT來訓練RDisp。RDisp在GPU上的執行速度大約是25 fps,並能夠克服多種常見的影像變化。
在海豚偵測與辨識上,主要的兩個演算法是Faster-RCNN和DenseNet。此外,在海豚名稱辨識上,單純的海浪背景讓訓練好的模型無法著重在海豚本身的細節特徵上。本論文提出結合基於深度學習和基於規則的saliency演算法,來偵測海豚的區域並刪除海面部分,除去海面的影響。 | zh_TW |
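As a rough illustration of the RDisp idea summarized above (and restated in English below), the following sketch inserts a convolutional recurrent cell between a pretrained feature extractor and its prediction head, so that temporal information is injected while the pretrained weights stay frozen. This is a minimal sketch only: the `ConvLSTMCell` gating follows the standard ConvLSTM formulation, but the VGG16 backbone (a stand-in for the thesis's detection network), the channel sizes, and the `track_clip` helper are illustrative assumptions, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: the four gates are produced by one
    convolution over the concatenated input and hidden feature maps."""
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

# A frozen, pretrained backbone stands in for the detection network.
backbone = vgg16(pretrained=True).features
for p in backbone.parameters():
    p.requires_grad = False

cell = ConvLSTMCell(in_ch=512, hid_ch=512)

def track_clip(frames):
    """frames: list of (1, 3, H, W) tensors from one video clip,
    with H and W divisible by 32 (VGG16's feature stride)."""
    b, _, h_px, w_px = frames[0].shape
    h0 = torch.zeros(b, 512, h_px // 32, w_px // 32)
    state = (h0, torch.zeros_like(h0))
    outs = []
    for x in frames:
        feat = backbone(x)            # spatial features, no temporal context
        h, state = cell(feat, state)  # temporal information injected here
        outs.append(h)                # would feed the (omitted) box/score head
    return outs
```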
dc.description.abstract | The first topic is class-agnostic visual tracking. The proposed algorithms tackle this problem through the strong ability of deep-learning techniques to represent temporal-spatial information. The second topic is dolphin detection and identification.
FasterMDNet and RDisp are proposed. FasterMDNet replaces the computationally costly online training of MDNet with an RNN-based model adaptation, which is trained by repeated online training via back-propagation through time (BPTT). FasterMDNet runs roughly ten times faster than MDNet with little sacrifice in accuracy. RDisp incorporates a pretrained detection model with ConvRNN cells; a two-stage clip training scheme is proposed to replace BPTT and avoid some of its defects. RDisp runs at about 25 frames per second on a GPU and stays consistent under multiple common circumstances. For dolphin detection and identification, the backbone algorithms are Faster-RCNN and DenseNet. In the classification of dolphin names, interference from the sea surface distracts the model from the details of the dolphins themselves. An ensemble of deep-learning-based and rule-based saliency detection algorithms with a soft Gaussian threshold is therefore proposed to create a dolphin mask that removes sea-surface pixels. | en |
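The sea-surface masking described in the abstract can be pictured with a short sketch: two saliency maps (one learned, one rule-based) are averaged, and a soft Gaussian threshold turns the result into a weight mask instead of a hard binary one, so borderline dolphin pixels fade rather than vanish. This is a sketch under assumptions: the ensemble weight `alpha`, threshold `t`, width `sigma`, and the `dolphin_mask` helper are all illustrative, not the thesis's exact formulation.

```python
import numpy as np

def soft_gaussian_threshold(sal, t=0.5, sigma=0.15):
    """Soft threshold: pixels with saliency >= t keep full weight 1;
    pixels below t fall off with a Gaussian instead of dropping to 0."""
    w = np.ones_like(sal)
    below = sal < t
    w[below] = np.exp(-((t - sal[below]) ** 2) / (2.0 * sigma ** 2))
    return w

def dolphin_mask(sal_deep, sal_rule, alpha=0.5):
    """Ensemble of a deep-learning saliency map and a rule-based one,
    both assumed to be normalized to [0, 1] and of the same shape."""
    sal = alpha * sal_deep + (1.0 - alpha) * sal_rule
    return soft_gaussian_threshold(sal)

# Usage: suppress sea-surface pixels while keeping the dolphin region.
# image: (H, W, 3) float array; sal_deep, sal_rule: (H, W) in [0, 1].
# masked = image * dolphin_mask(sal_deep, sal_rule)[..., None]
```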
dc.description.provenance | Made available in DSpace on 2021-05-19T17:47:14Z (GMT). No. of bitstreams: 1 ntu-107-R05942039-1.pdf: 9320481 bytes, checksum: bb390c49c4073cf93db9aa75cf4f98a6 (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Reviews of Tracking Techniques
2.1 Introduction
2.2 Problem Formulation
2.2.1 Video Frame Channels: RGB, D, and RGB-D
2.2.2 Class-Aware and Class-Agnostic Tracking
2.3 Class-Agnostic RGB Tracking
2.4 Datasets
2.4.1 Amsterdam Library of Ordinary Videos for tracking (ALOV++)
2.4.2 Object Tracking Benchmarks (OTB50/OTB100)
2.4.3 Visual Object Tracking (VOT)
Chapter 3 Proposed Faster-MDNet
3.1 Introduction
3.2 Related Works
3.3 MDNet
3.4 Proposed Faster-MDNet
3.4.1 Three-Phase Training
3.4.2 Prediction
3.5 Experiments
3.5.1 Implementation Details
3.5.2 Short-Term Finetuning
3.5.3 Results
3.6 Conclusions
Chapter 4 Proposed RDisp: Recurrent Detection is Powerful
4.1 Introduction
4.2 Related Works
4.3 Methods
4.3.1 Transfer Learning from Detection
4.3.2 Injection of Temporal Information by Convolutional Recurrent Neural Network Cells
4.3.3 Proposed Model
4.3.4 Multi Layer Injection
4.3.5 Two Stage Clip Training
4.4 Experiments
4.4.1 Implementation Details
4.4.2 Results
4.4.3 Conclusions
Chapter 5 Proposed Dolphin Detection
5.1 Motivations
5.2 Dolphin Datasets
5.3 System Architecture
5.4 Detection
5.4.1 Patch Matching
5.4.2 Detection
Chapter 6 Proposed Dolphin Identification
6.1 Introduction
6.2 DenseNet
6.3 Classification of Stages and Angles
6.3.1 Observations
6.3.2 Methods and Experiments
6.3.3 Classification of Angles
6.4 Classification of Names
6.4.1 DenseNet121
6.4.2 Gradient-Based Saliency Map
6.4.3 Saliency Mask on Dolphin
6.4.4 Saliency Results
6.5 Results on Masked Images
6.6 Demo Application
Chapter 7 Conclusions and Future Works
7.1 Conclusions
7.2 Future Works
References | |
dc.language.iso | en | |
dc.title | 基於深度學習的物件追蹤算法與海豚偵測與識別 | zh_TW |
dc.title | Deep Learning Based Algorithms for Object Tracking and Dolphin Detection and Identification | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 王鵬華,余執彰,張榮吉 | |
dc.subject.keyword | 物件追蹤,卷積遞歸神經網路,物件偵測,圖像分類,細粒度圖像分類,DenseNet,媽祖魚 | zh_TW |
dc.subject.keyword | visual tracking, convolutional recurrent neural network, detection, classification, Taiwanese Humpback Dolphin, DenseNet, saliency | en |
dc.relation.page | 98 | |
dc.identifier.doi | 10.6342/NTU201801036 | |
dc.rights.note | Authorized (worldwide public access) | |
dc.date.accepted | 2018-06-21 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
dc.date.embargo-lift | 2023-06-26 | - |
Appears in Collections: | 電信工程學研究所 |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf | 9.1 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.