Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78481

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 傅立成 | |
| dc.contributor.author | Yu-Chi Chen | en |
| dc.contributor.author | 陳禹齊 | zh_TW |
| dc.date.accessioned | 2021-07-11T14:59:22Z | - |
| dc.date.available | 2022-12-17 | |
| dc.date.copyright | 2019-12-17 | |
| dc.date.issued | 2019 | |
| dc.date.submitted | 2019-12-02 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78481 | - |
| dc.description.abstract | In recent years, object detection systems that combine deep learning with convolutional neural networks have matured steadily. Current object detectors mark each detected object with a bounding box; however, a bounding box cannot separate the foreground object inside it from the background. Instance segmentation can be regarded as an extension of object detection: by labeling each detected foreground object pixel by pixel through semantic segmentation, it avoids the foreground/background ambiguity and yields much finer annotations.
For real-time use on in-vehicle systems, running speed is as important as accuracy. Existing instance segmentation methods that first generate an object's bounding box and then segment the foreground inside it achieve higher accuracy, but at a large computational cost and reduced running speed. Conversely, methods with higher running speed adopt more compact network architectures to improve efficiency, but sacrifice the quality of the foreground segmentation masks and hence accuracy. Observing that each type of task has features best suited to it, this thesis proposes an instance segmentation architecture that supplies task-specific features: it separates the features required by each task and uses a feature fusion module to produce richer features, improving the quality of foreground segmentation masks and thereby the overall instance segmentation accuracy. To validate the proposed method, we evaluate it on MS COCO, currently the most challenging instance segmentation dataset; the results show that the method achieves more than 30 AP at near real-time speed while improving the quality of foreground segmentation masks. We also conduct experiments on the ARTC (Automotive Research & Testing Center) bus-stop pedestrian dataset to verify the generality of the method. | zh_TW |
| dc.description.abstract | In recent years, object detection systems combining deep learning with convolutional neural networks have become increasingly mature and reliable. Instance segmentation, which provides detailed pixel-level annotations of detected objects, can be regarded as an extension of bounding-box annotation. For application on in-vehicle systems, running speed is an important factor in addition to high accuracy. Several instance segmentation approaches first generate bounding boxes and then segment each box to obtain instance masks; these approaches achieve better accuracy but consume more time. On the other hand, to reduce computational complexity and increase runtime speed, some approaches use simpler networks, but they have to sacrifice the quality of the instance segmentation masks, which decreases accuracy.
In this thesis, we propose a novel instance segmentation network that combines a Task-Specific Feature Pyramid with a feature fusion module to provide more suitable and richer features for classification and instance mask generation. With the Task-Specific Feature Pyramid, classification and mask generation no longer have to predict from the same features yet can still cooperate with each other, while the feature fusion module supplies richer features for mask generation (a minimal illustrative sketch appears after the metadata table below). To validate our approach, we evaluate it on the MS COCO dataset, the most challenging instance segmentation benchmark available today. The experimental results show that our method not only achieves more than 30 AP on MS COCO at near real-time speed, but also improves the quality of the instance segmentation masks. | en |
| dc.description.provenance | Made available in DSpace on 2021-07-11T14:59:22Z (GMT). No. of bitstreams: 1 ntu-108-R06922093-1.pdf: 4438834 bytes, checksum: 6df34adc1bcacf1826756678e7fb41ac (MD5) Previous issue date: 2019 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Chinese Abstract iv
ABSTRACT v
CONTENTS vi
LIST OF FIGURES viii
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Related Work 7
1.3 Contribution 9
1.4 Thesis Organization 10
Chapter 2 Preliminaries of Fundamental CNN and Object Detection Frameworks 12
2.1 Convolutional Neural Network 12
2.1.1 Convolutional Layers 13
2.1.2 Pooling Layers 14
2.1.3 Activation Function 16
2.1.4 Up-Sampling Layers 16
2.1.5 Adam Optimizer 18
2.1.6 Alex-Net 18
2.1.7 Residual-Net 19
2.2 Object Detection Frameworks 20
2.2.1 Faster R-CNN 20
2.2.2 Single Shot Detector (SSD) 21
Chapter 3 Task-Specific Feature Pyramid Net 23
3.1 Problem Formulation 23
3.2 System Overview 24
3.3 Task-Specific Feature Pyramid Network Design 25
3.3.1 Backbone Network 26
3.3.2 Task-Specific Feature Pyramid 27
3.3.3 Protonet 30
3.3.4 Fusion Feature Module 32
3.3.5 Prediction Head 34
3.4 Loss Function 36
3.5 Non-maximum Suppression 37
3.6 Mask Generation 38
Chapter 4 Experiments 40
4.1 The Datasets 40
4.1.1 MS COCO Dataset 40
4.1.2 ARTC Dataset 44
4.2 Hardware and Software Environment 46
4.3 Evaluation Metrics 46
4.4 Results and Discussion 49
4.4.1 MS COCO Dataset 49
4.4.2 ARTC Dataset 55
Chapter 5 Conclusions 59
REFERENCE 61 | |
| dc.language.iso | zh-TW | |
| dc.subject | 實例語意分割 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 卷積神經網路 | zh_TW |
| dc.subject | 融合特定任務特徵 | zh_TW |
| dc.subject | Instance Segmentation | en |
| dc.subject | Deep Learning | en |
| dc.subject | Task-Specific Feature Fusion | en |
| dc.subject | Convolution Neural Network | en |
| dc.title | 結合特定任務特徵融合之金字塔深度學習網路實例語意分割系統 | zh_TW |
| dc.title | Pyramid Deep Network with Task-Specific Feature Fusion for Instance Segmentation | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-1 | |
| dc.description.degree | Master | |
| dc.contributor.coadvisor | 蕭培墉 | |
| dc.contributor.oralexamcommittee | 黃世勳,傅楸善,方瓊瑤 | |
| dc.subject.keyword | 深度學習,卷積神經網路,實例語意分割,融合特定任務特徵, | zh_TW |
| dc.subject.keyword | Deep Learning, Convolution Neural Network, Instance Segmentation, Task-Specific Feature Fusion | en |
| dc.relation.page | 66 | |
| dc.identifier.doi | 10.6342/NTU201904348 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2019-12-02 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
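The Task-Specific Feature Pyramid and feature fusion module summarized in the abstracts above can be pictured with a short sketch. The following PyTorch code is a minimal illustration under assumptions, not the thesis's actual implementation: the module names (`TaskSpecificPyramid`, `FusionModule`), channel sizes, and wiring are hypothetical; only the overall idea follows the description — separate FPN-style pyramids supply task-specific features for classification and for mask generation, and a fusion step merges pyramid levels into a richer, high-resolution feature for the mask branch.

```python
# Minimal sketch of a task-specific feature pyramid with feature fusion.
# All names, channel sizes, and wiring are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskSpecificPyramid(nn.Module):
    """FPN-style top-down pyramid with its own lateral/output convs, so each
    task (classification vs. mask generation) gets features tuned for it."""

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.output = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels)

    def forward(self, feats):  # feats: backbone maps, shallow -> deep
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):  # top-down pathway
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [conv(l) for conv, l in zip(self.output, laterals)]


class FusionModule(nn.Module):
    """Upsamples every pyramid level to the finest resolution and fuses them,
    giving the mask branch a richer high-resolution feature map."""

    def __init__(self, channels=256, num_levels=3):
        super().__init__()
        self.fuse = nn.Conv2d(channels * num_levels, channels, 3, padding=1)

    def forward(self, pyramid):
        target = pyramid[0].shape[-2:]  # finest level sets the output size
        ups = [F.interpolate(p, size=target, mode="bilinear", align_corners=False)
               for p in pyramid]
        return F.relu(self.fuse(torch.cat(ups, dim=1)))


class TaskSpecificFeaturePyramidNet(nn.Module):
    """Two pyramids over shared backbone features: one feeds the
    classification/box heads, the other (after fusion) the mask branch."""

    def __init__(self):
        super().__init__()
        self.cls_pyramid = TaskSpecificPyramid()   # features for classification
        self.mask_pyramid = TaskSpecificPyramid()  # features for mask generation
        self.fusion = FusionModule()

    def forward(self, backbone_feats):
        cls_feats = self.cls_pyramid(backbone_feats)
        mask_feat = self.fusion(self.mask_pyramid(backbone_feats))
        return cls_feats, mask_feat


if __name__ == "__main__":
    # Toy backbone maps with ResNet C3-C5-like shapes.
    feats = [torch.randn(1, 512, 64, 64),
             torch.randn(1, 1024, 32, 32),
             torch.randn(1, 2048, 16, 16)]
    cls_feats, mask_feat = TaskSpecificFeaturePyramidNet()(feats)
    print([tuple(f.shape) for f in cls_feats], tuple(mask_feat.shape))
```

In this sketch the fused map would feed a protonet-style mask branch while the classification pyramid feeds the prediction heads, mirroring the decoupling of task-specific features that the abstract describes.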
| Appears in Collections: | Department of Computer Science and Information Engineering | |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-108-R06922093-1.pdf (restricted; not authorized for public access) | 4.33 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
