Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73426
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 簡韶逸(Shao-Yi Chien) | |
dc.contributor.author | Chun-Tse Lin | en |
dc.contributor.author | 林均澤 | zh_TW |
dc.date.accessioned | 2021-06-17T07:34:12Z | - |
dc.date.available | 2020-12-25 | |
dc.date.copyright | 2020-12-25 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-12-07 | |
dc.identifier.citation | S. Mahadevan, P. Voigtlaender, and B. Leibe, “Iteratively trained interactive segmentation,” arXiv preprint arXiv:1805.04398, 2018.
K. Sofiiuk, I. Petrov, O. Barinova, and A. Konushin, “f-BRS: Rethinking backpropagating refinement for interactive segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8623–8632.
C. Rother, V. Kolmogorov, and A. Blake, “‘GrabCut’: Interactive foreground extraction using iterated graph cuts,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 309–314, 2004.
K. McGuinness and N. E. O’Connor, “A comparative evaluation of interactive segmentation algorithms,” Pattern Recognition, vol. 43, no. 2, pp. 434–444, 2010.
F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, “A benchmark dataset and evaluation methodology for video object segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 724–732.
Y. Y. Boykov and M.-P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), vol. 1. IEEE, 2001, pp. 105–112.
X. Bai and G. Sapiro, “A geodesic framework for fast interactive image and video segmentation and matting,” in Proceedings of IEEE International Conference on Computer Vision (ICCV). IEEE, 2007, pp. 1–8.
L. Grady, “Random walks for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 28, no. 11, pp. 1768–1783, 2006.
J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431–3440.
N. Xu, B. Price, S. Cohen, J. Yang, and T. S. Huang, “Deep interactive object selection,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 373–381.
J. Liew, Y. Wei, W. Xiong, S.-H. Ong, and J. Feng, “Regional interactive image segmentation networks,” in Proceedings of IEEE International Conference on Computer Vision (ICCV). IEEE, 2017, pp. 2746–2754.
Z. Li, Q. Chen, and V. Koltun, “Interactive image segmentation with latent diversity,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 577–585.
K.-K. Maninis, S. Caelles, J. Pont-Tuset, and L. Van Gool, “Deep extreme cut: From extreme points to object segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 616–625.
S. Majumder and A. Yao, “Content-aware multi-level guidance for interactive instance segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11602–11611.
W.-D. Jang and C.-S. Kim, “Interactive image segmentation via backpropagating refinement scheme,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5297–5306.
Z. Lin, Z. Zhang, L.-Z. Chen, M.-M. Cheng, and S.-P. Lu, “Interactive image segmentation with first click attention,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 13339–13348.
K. Sofiiuk, O. Barinova, and A. Konushin, “AdaptIS: Adaptive instance selection network,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2019, pp. 7355–7363.
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 40, no. 4, pp. 834–848, 2017.
L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of European Conference on Computer Vision (ECCV), September 2018.
Y. Hu, A. Soltoggio, R. Lock, and S. Carter, “A fully convolutional two-stream fusion network for interactive image segmentation,” Neural Networks, vol. 109, pp. 31–42, 2019.
M. Forte, B. Price, S. Cohen, N. Xu, and F. Pitié, “Getting to 99% accuracy in interactive segmentation,” arXiv preprint arXiv:2003.07932, 2020.
D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” arXiv preprint arXiv:1607.08022, 2016.
S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
X. Huang and S. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1501–1510.
R. Girshick, “Fast R-CNN,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.
S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proceedings of Advances in Neural Information Processing Systems, 2015, pp. 91–99.
X. Zhou, D. Wang, and P. Krähenbühl, “Objects as points,” arXiv preprint arXiv:1904.07850, 2019.
D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), vol. 2. IEEE, 2001, pp. 416–423.
B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik, “Semantic contours from inverse detectors,” in Proceedings of IEEE International Conference on Computer Vision (ICCV). IEEE, 2011, pp. 991–998.
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
V. Vezhnevets and V. Konouchine, “GrowCut: Interactive multi-label N-D image segmentation by cellular automata,” in Proceedings of Graphicon, vol. 1, no. 4, 2005, pp. 150–156.
V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman, “Geodesic star convexity for interactive image segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2010, pp. 3129–3136. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73426 | - |
dc.description.abstract | 互動式物件切割(interactive object segmentation)會依照使用者給予的指示將目標對象切割出來,並且透過互動持續修正標示錯誤的區域,達到更加精確的切割結果。這項技術中最主要的挑戰來自於使用者給予的指令與目標物之間的不確定性,如何明確表示使用者意象並以最少的互動次數達到精確的切割結果一直都是這項技術中熱門的研究對象。傳統演算法需要大量的使用者標示來估計前景和背景的分佈。近年來隨著深度學習在電腦視覺上的成功應用,基於深度學習的互動式物件切割算法將使用者互動轉換為圖片,並通過卷積神經網絡(Convolutional Neural Network)預測切割物件。這些方法在展示優異結果的同時也提高了計算複雜度,不利於嵌入式系統上的實作。在本篇論文中,我們提出了動態點擊轉換演算法來更好地表示使用者給予的指令與互動,並同時考慮空間幾何與特徵分佈,善加利用使用者傳達的資訊。與現有演算法相比,動態點擊轉換演算法展現了良好的表現,證實了提出方法的有效性。除此之外,我們透過一連串的優化在Nvidia Jetson TX2開發板上實現了高效率的演算法實作,達到與使用者的即時互動,提供精確的切割結果和低延遲的使用者體驗。 | zh_TW |
dc.description.abstract | In interactive segmentation, a user initially indicates the target object to segment its main body and then provides corrections on mislabeled regions to iteratively refine the segmentation mask. The main challenge of this task originates from the ambiguity in the correlation between user annotations and the target object. Researchers have long sought ways to represent user interactions and segment a precise mask with the fewest interactions. Traditional algorithms require substantial user annotations to estimate the distributions of foreground and background. With the success of deep learning techniques on computer vision tasks, learning-based interactive segmentation algorithms have become popular in recent years. These methods convert user annotations into interaction maps and predict the mask with a convolutional neural network (CNN). However, while these methods demonstrate superior results, they also increase computational complexity, making them unsuitable for deployment on embedded systems. In this thesis, we propose a Dynamic Click Transform algorithm to better represent user interactions, taking both spatial geometry and feature distribution into consideration. We demonstrate the effectiveness of the proposed method, achieving favorable performance compared to state-of-the-art methods. Furthermore, we accelerate our algorithm and implement it on an embedded system, the NVIDIA Jetson TX2. This system performs real-time interactive segmentation, providing high-quality results and a low-latency user experience. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T07:34:12Z (GMT). No. of bitstreams: 1 U0001-0712202000292400.pdf: 25538171 bytes, checksum: e3ceacb424a2480e4c2da309b0db16b9 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Abstract i
List of Figures v
List of Tables vii
1 Introduction 1
1.1 Interactive Segmentation 2
1.2 Challenge 4
1.3 Contribution 6
1.4 Thesis Organization 7
2 Related Work 9
2.1 Learning Based Interactive Segmentation 9
2.2 Instance Normalization 12
2.3 Backpropagation Refinement Scheme 13
3 Proposed Method 17
3.1 Dynamic Spatial Transform 19
3.1.1 Click and Drag Scheme 19
3.1.2 Auto Drag Head 20
3.2 Dynamic Feature Transform 21
3.2.1 High-level Instance Normalization 22
3.2.2 Multi-level Instance Normalization 23
3.3 Segment from Negative First Click 23
3.4 Training 24
4 Experiments 27
4.1 Datasets and Evaluation Metrics 27
4.2 Implementation Details 28
4.3 Comparison to State-of-the-Art 29
4.4 Ablation Study 29
4.4.1 Effectiveness of Dynamic Spatial Transform 29
4.4.2 Effectiveness of Dynamic Feature Transform 31
4.4.3 Analysis of Model Architecture 33
4.5 Qualitative Comparison 34
5 Efficient Embedded System Implementation 41
5.1 Accelerate from Algorithm 42
5.2 Accelerate from TensorRT 43
5.3 Performance Evaluation 44
6 Conclusion 45
Reference 47 | |
dc.language.iso | en | |
dc.title | 基於動態點擊轉換之互動式物件切割演算法及其高效率嵌入式系統實作 | zh_TW |
dc.title | Interactive Object Segmentation with Dynamic Click Transform Algorithm and Efficient Embedded System Implementation | en |
dc.type | Thesis | |
dc.date.schoolyear | 109-1 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 莊永裕(YY Chuang),徐宏民(Winston Hsu),曹昱(Yu Tsao) | |
dc.subject.keyword | 互動式分割,物件分割,深度學習, | zh_TW |
dc.subject.keyword | interactive segmentation,object segmentation,deep learning, | en |
dc.relation.page | 51 | |
dc.identifier.doi | 10.6342/NTU202004397 | |
dc.rights.note | Paid authorization (有償授權) | |
dc.date.accepted | 2020-12-08 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電子工程學研究所 | zh_TW |
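The learning-based pipeline the abstract summarizes converts user clicks into interaction maps before feeding them to a CNN. As a rough illustration of that general idea (not the thesis's own Dynamic Click Transform), the sketch below implements the truncated Euclidean distance-map encoding used by the cited Xu et al. (2016) approach; the function name and the truncation radius are illustrative assumptions.

```python
import math

def click_maps(height, width, pos_clicks, neg_clicks, radius=100.0):
    """Encode clicks as two truncated-distance interaction maps.

    Each map holds, for every pixel, the Euclidean distance to the
    nearest click of one polarity (positive = on the target object,
    negative = on the background), clipped to `radius`. Hypothetical
    helper: names and the radius value are illustrative only.
    """
    def one_map(clicks):
        # Start every pixel at the truncation value; with no clicks
        # of this polarity the map stays constant at `radius`.
        grid = [[radius] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                for (cy, cx) in clicks:
                    d = math.hypot(y - cy, x - cx)
                    if d < grid[y][x]:
                        grid[y][x] = d
        return grid

    return one_map(pos_clicks), one_map(neg_clicks)
```

In practice such maps are stacked with the RGB image as extra input channels to the segmentation network; a common variant encodes each click as a Gaussian blob rather than a distance field.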
Appears in Collections: | 電子工程學研究所
Files in This Item:
File | Size | Format |
---|---|---|
U0001-0712202000292400.pdf (access currently restricted) | 24.94 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.