Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74917

Full metadata record

DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 莊永裕(Yung-Yu Chuang) | |
dc.contributor.author | Yi-Hsuan Huang | en |
dc.contributor.author | 黃以瑄 | zh_TW |
dc.date.accessioned | 2021-06-17T09:10:16Z | - |
dc.date.available | 2019-10-17 | |
dc.date.copyright | 2019-10-17 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-09-25 | |
dc.identifier.citation | [1] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1534–1543, 2016.
[2] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
[3] X. Chen, K. Kundu, Y. Zhu, A. G. Berneshawi, H. Ma, S. Fidler, and R. Urtasun. 3d object proposals for accurate object class detection. In Advances in Neural Information Processing Systems, pages 424–432, 2015.
[4] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
[5] I. Demir, D. G. Aliaga, and B. Benes. Procedural editing of 3d building point clouds. In Proceedings of the IEEE International Conference on Computer Vision, pages 2147–2155, 2015.
[6] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232, 2015.
[7] F. Engelmann, T. Kontogianni, A. Hermans, and B. Leibe. Exploring spatial context for 3d semantic segmentation of point clouds. In Proceedings of the IEEE International Conference on Computer Vision, pages 716–724, 2017.
[8] A. Ess, T. Mueller, H. Grabner, and L. J. Van Gool. Segmentation-based urban traffic scene understanding. In BMVC, volume 1, page 2. Citeseer, 2009.
[9] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915–1929, 2012.
[10] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1126–1135. JMLR.org, 2017.
[11] A. Garcia-Garcia, S. Orts, S. Oprea, V. Villena-Martinez, and J. A. Rodriguez. A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857, 2017.
[12] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361. IEEE, 2012.
[13] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik. Learning rich features from RGB-D images for object detection and segmentation. In European Conference on Computer Vision, pages 345–360. Springer, 2014.
[14] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In European Conference on Computer Vision, pages 297–312. Springer, 2014.
[15] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena. Structural-RNN: Deep learning on spatio-temporal graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5308–5317, 2016.
[16] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[17] G. Koch, R. Zemel, and R. Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2, 2015.
[18] L. Landrieu and M. Simonovsky. Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4558–4567, 2018.
[19] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.
[20] D. Maturana and S. Scherer. VoxNet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928. IEEE, 2015.
[21] T. Munkhdalai and H. Yu. Meta networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 2554–2563. JMLR.org, 2017.
[22] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas. Frustum PointNets for 3d object detection from RGB-D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 918–927, 2018.
[23] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3d classification and segmentation. arXiv preprint arXiv:1612.00593, 2016.
[24] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view CNNs for object classification on 3d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5648–5656, 2016.
[25] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017.
[26] S. Ravi and H. Larochelle. Optimization as a model for few-shot learning. 2016.
[27] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. One-shot learning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065, 2016.
[28] A. Shaban, S. Bansal, Z. Liu, I. Essa, and B. Boots. One-shot learning for semantic segmentation. arXiv preprint arXiv:1709.03410, 2017.
[29] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 945–953, 2015.
[30] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1199–1208, 2018.
[31] L. Tchapmi, C. Choy, I. Armeni, J. Gwak, and S. Savarese. SEGCloud: Semantic segmentation of 3d point clouds. In 2017 International Conference on 3D Vision (3DV), pages 537–547. IEEE, 2017.
[32] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
[33] C. Wang, B. Samari, and K. Siddiqi. Local spectral graph convolution for point set feature learning. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–66, 2018.
[34] P.-S. Wang, Y. Liu, Y.-X. Guo, C.-Y. Sun, and X. Tong. O-CNN: Octree-based convolutional neural networks for 3d shape analysis. ACM Transactions on Graphics (TOG), 36(4):72, 2017.
[35] W. Wang, R. Yu, Q. Huang, and U. Neumann. SGPN: Similarity group proposal network for 3d point cloud instance segmentation. In CVPR, 2018.
[36] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon. Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829, 2018.
[37] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
[38] Y. Yang, C. Feng, Y. Shen, and D. Tian. FoldingNet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–215, 2018. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74917 | - |
dc.description.abstract | 在這篇論文中,我們提出了一個新的網路架構OSSN;就我們所知,這是第一個針對單樣本三維點雲做語意分割的研究。OSSN的核心概念在於比較各個資料點所學到的特徵之間的相關性。我們假設同一個類別的資料點,在不同的場景之中,仍然可以保持某種程度的相似,而這個假設也在後續的實驗中得到驗證。
提出這個新的問題設定有幾個主要的原因。首先,關於點雲的研究在近年來愈來愈受到重視,尤其在場景分析和自駕車等領域被大量使用,而目前準確率較高的方法大多需要大量的資料來訓練。但和平面影像相比,點雲不但資料蒐集不易,還非常難給予正確的標記,所以我們希望可以使用像OSSN這樣的架構,來學習點與點之間的相關性,並解決輸入資料類別不平均的問題。OSSN可以分成四個主要的部分:提取特徵、比較特徵之間的相似度、學習閾值,和兩個損失函數。我們將OSSN應用在Stanford 3D semantic parsing dataset上,得到了非常好的結果;論文中也會證明我們所設計的網路結構各部分對於正確率提升的效果。 | zh_TW
dc.description.abstract | In this work, we propose a new network architecture named OSSN. To the best of our knowledge, OSSN is the first model that addresses one-shot semantic segmentation of 3D point clouds. The core idea of OSSN is to compare the similarity between the learned features of individual points, based on the hypothesis that points belonging to the same class remain similar to some degree even across different scenes. This hypothesis is confirmed by our experimental results.
There are several motivations for this work. First, 3D point clouds have received growing attention in recent years, especially in scene analysis and applications related to autonomous vehicles, yet the most accurate methods to date are supervised and depend on large amounts of training data, which limits their applicability to many practical problems. Moreover, even when large amounts of data are available, labeling them precisely requires considerable effort, and in the worst case the data may be class-imbalanced, which further complicates training. OSSN solves, or at least alleviates, these problems by leveraging the similarity between points. OSSN consists of four major parts: feature extraction, similarity comparison, a learned classification threshold, and two loss functions that serve as the training criteria. OSSN achieves strong performance on the Stanford 3D semantic parsing dataset, and in this thesis we explain the design of each component and show how it contributes to accuracy. | en
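The similarity-comparison idea described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the thesis's implementation: the function names, the mean aggregation over support points, and the fixed `threshold=0.5` are all assumptions (the actual model extracts features with PointNet/GCN, uses a weighted sum, and learns its threshold).

```python
import numpy as np

def cosine_similarity_matrix(support_feats, query_feats):
    # Normalize each per-point feature vector to unit length;
    # a dot product between unit vectors is the cosine similarity.
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    # Shape: (num_query_points, num_support_points)
    return q @ s.T

def predict_mask(support_feats, query_feats, threshold=0.5):
    # Hypothetical aggregation: average each query point's similarity to
    # all support points of the target class, then threshold the score.
    sim = cosine_similarity_matrix(support_feats, query_feats)
    score = sim.mean(axis=1)
    return score >= threshold
```

A query point whose features are close (in direction) to the support points of a class scores near 1 and is assigned to that class; unrelated points score near 0 and fall below the threshold.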
dc.description.provenance | Made available in DSpace on 2021-06-17T09:10:16Z (GMT). No. of bitstreams: 1 ntu-108-R06944060-1.pdf: 12102470 bytes, checksum: bf803e51d09d86e9c71a59e9387108cf (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | 致謝 - i
摘要 - ii
Abstract - iii
1 Introduction - 1
2 Related Work - 3
2.1 3D object application in point cloud - 3
2.2 Graph convolutional network - 4
2.3 Semantic segmentation - 5
2.4 Few-shot learning - 5
3 Problem Setup - 6
4 The Proposed Approach - 8
4.1 Full model - 8
4.2 Feature extraction - 9
4.2.1 The PointNet - 9
4.2.2 The GCN - 10
4.3 Similarity measurement - 11
4.3.1 Cosine similarity matrix - 12
4.3.2 Weighted sum - 12
4.4 Similarity threshold - 13
4.4.1 Training process - 13
4.4.2 Testing process - 14
4.5 Loss functions - 15
4.5.1 Similarity loss L_Sim - 16
4.5.2 Binary Cross-Entropy Loss L_BCE - 16
5 Experiment - 18
5.1 Dataset - 18
5.2 Experimental setup - 19
5.3 Comparison to the baseline model - 20
5.4 Ablation study - 21
6 Conclusion - 24
Bibliography - 25 | |
dc.language.iso | en | |
dc.title | OSSN:基於孿生網路對單樣本三維點雲之語義分割模型 | zh_TW |
dc.title | OSSN: A One-shot Siamese Network for Semantic Segmentation of 3D Point Clouds | en
dc.type | Thesis | |
dc.date.schoolyear | 108-1 | |
dc.description.degree | 碩士 | |
dc.contributor.coadvisor | 林彥宇(Yen-Yu Lin) | |
dc.contributor.oralexamcommittee | 修丕承(Pi-Cheng Hsiu) | |
dc.subject.keyword | 深度學習,點雲,語意分割,單樣本學習, | zh_TW |
dc.subject.keyword | Deep learning,Point cloud,Semantic segmentation,One-shot learning, | en |
dc.relation.page | 29 | |
dc.identifier.doi | 10.6342/NTU201904151 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2019-09-25 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊網路與多媒體研究所 | zh_TW |
Appears in Collections: | Graduate Institute of Networking and Multimedia
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (currently not authorized for public access) | 11.82 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.