Applying Cross-attention Mechanism for Weakly Supervised Point Cloud Segmentation

楊証琨; Cheng-Kun Yang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88160

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	莊永裕	zh_TW
dc.contributor.advisor	Yung-Yu Chuang	en
dc.contributor.author	楊証琨	zh_TW
dc.contributor.author	Cheng-Kun Yang	en
dc.date.accessioned	2023-08-08T16:34:13Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-08-08	-
dc.date.issued	2023	-
dc.date.submitted	2023-07-17	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88160	-
dc.description.abstract	3D 點雲分割可提供幾何空間以及語意等豐富資訊，在室內場景理解、機器人或是自動駕駛等任務中扮演重要的應用角色。近年來由於深度神經網路的進步以及大量標註資料的建立，點雲分割模型已能展現出精準的結果，提供實務上應用的可行性。然而，精準的深度神經網路往往需要大量且細緻的標註資料進行訓練，而一個大型室內場景的點雲資料集，需要超過數百小時的人工標註時間才能夠完成，如此高昂的成本使得點雲分割的實際應用變得更加困難。本篇博士論文引入了弱監督式學習的方法，以降低模型對於標註資料的需求，同時保持可接受的精度水準。為了彌補弱監督標註的不足，我們應用了跨注意力機制，探討跨點雲之間的關係，挖掘出額外的監督損失提供模型訓練。為此，本篇論文開發出三個方法，並運用各種不同的弱監督標註來訓練點雲分割模型。針對第一個方法，僅需提供若干個包含相同物體類別的點雲，無需知道物體的類別也不需任何點的標註，我們的模型即可分割出屬於物體的點。為更進一步提升分割表現，在第二個方法中，我們使用場景級標註或稀疏點等弱監督標註，並運用多實例學習 (multiple instance learning) 探討成對點雲之間的對應關係，藉此產生出額外的監督訊號，來訓練出高效的點雲分割模型。最後一個方法，我們利用 2D 影像與 3D 點雲的互補性，引入 2D 影像的資訊，在僅有場景級的弱監督標註下，透過我們提出的交織式解碼器，有效結合 2D 影像與 3D 點雲各自的優勢，得到更好的點雲分割效果。經由多個公開資料集的實驗驗證與消融性實驗，我們的實驗結果表明，點雲分割模型在弱監督式的標註下，透過跨注意力機制來提供額外的監督訊號，依然可以提供更佳的模型表現。我們提出的方法可廣泛適用於各種形態的弱監督標註，實驗效果均優於當時其他弱監督式學習的競爭方法，並且有效地降低點雲分割模型的應用成本。	zh_TW
dc.description.abstract	3D point cloud segmentation provides rich geometric and semantic information and plays a crucial role in tasks such as scene understanding and autonomous driving. In recent years, point cloud segmentation models based on deep neural networks have shown promising results. However, deep neural networks often require large amounts of annotated training data, which hinders practical applications of point cloud segmentation. This doctoral thesis introduces weakly supervised learning to reduce the annotation cost. To compensate for the weaker supervision, we apply the cross-attention mechanism to explore relationships across point clouds and mine additional supervisory signals for model training. To this end, this thesis develops three methods that use different types of weak annotations to train point cloud segmentation models. The first method requires only several point clouds containing the same object category, without object class labels or point-level annotations, to segment the points belonging to the object. To further improve segmentation performance, the second method uses scene-level or sparse point annotations and incorporates multiple instance learning to explore relationships between pairs of point clouds. The third method incorporates 2D image information through an interlaced decoder that combines the strengths of 2D images and 3D point clouds, yielding improved point cloud segmentation results under scene-level supervision. Experimental results demonstrate that the proposed methods apply to various forms of weak supervision and reduce the cost of point cloud segmentation applications.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-08T16:34:13Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-08-08T16:34:13Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Abstract; List of Figures; List of Tables; 1 Introduction; 1.1 Background and Motivation; 1.1.1 Point Cloud Object Co-Segmentation; 1.1.2 MIL-Derived Transformer; 1.1.3 Multimodal Interlaced Transformer; 1.2 Contributions; 1.3 Publications; 2 Related Work; 2.1 Weakly Supervised Learning; 2.1.1 Weakly Supervised Image Segmentation; 2.1.2 Weakly Supervised Point Cloud Segmentation; 2.2 Object Co-Segmentation; 2.2.1 Object Co-Segmentation in Images; 2.2.2 Shape Co-Segmentation in Point Cloud; 2.3 Cross-Attention Mechanism; 2.4 2D and 3D Fusion for Point Cloud Applications; 2.5 Point Cloud Sampling; 2.6 Global and Weighted Pooling; 3 Point Cloud Object Co-Segmentation; 3.1 Method Overview; 3.2 Proposed Method; 3.2.1 Problem Statement; 3.2.2 Object and Background Samplers; 3.2.3 Mutual Attention Module; 3.2.4 Co-Contrastive Loss; 3.2.5 Implementation Details; 3.3 Experimental Results; 3.3.1 Datasets and Evaluation Metric; 3.3.2 Competing Methods and Comparisons; 3.3.3 Ablation Studies; 3.3.4 Component Analysis; 3.3.5 Application of Point Cloud Object Co-Segmentation; 4 MIL-Derived Transformer; 4.1 Method Overview; 4.2 Proposed Method; 4.2.1 Problem Statement; 4.2.2 MIL-Derived Transformer; 4.2.3 Adaptive Global Weighted Pooling; 4.2.4 Cross-scale Feature Equivariance; 4.2.5 Implementation Details; 4.3 Experimental Results; 4.3.1 Datasets and Evaluation Metric; 4.3.2 Competing Methods and Comparisons; 4.3.3 Ablation Studies; 4.3.4 Component Analysis; 5 Multimodal Interlaced Transformer; 5.1 Method Overview; 5.2 Proposed Method; 5.2.1 Problem Statement; 5.2.2 Transformer Encoders; 5.2.3 Transformer Decoder; 5.2.4 Implementation Details; 5.3 Experimental Results; 5.3.1 Datasets and Evaluation Metric; 5.3.2 Competing Methods and Comparisons; 5.3.3 Ablation Studies; 5.3.4 Component Analysis; 6 Conclusion; 6.1 Discussion and Future Works; Reference	-
dc.language.iso	en	-
dc.title	應用跨注意力機制於弱監督式3D點雲分割	zh_TW
dc.title	Applying Cross-attention Mechanism for Weakly Supervised Point Cloud Segmentation	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Doctoral	-
dc.contributor.coadvisor	林彥宇	zh_TW
dc.contributor.coadvisor	Yen-Yu Lin	en
dc.contributor.oralexamcommittee	徐宏民;陳祝嵩;王鈺強;彭文孝;劉育綸;孫民;陳駿丞	zh_TW
dc.contributor.oralexamcommittee	Winston H. Hsu;Chu-Song Chen;Yu-Chiang Frank Wang;Wen-Hsiao Peng;Yu-Lun Liu;Min Sun;Jun-Cheng Chen	en
dc.subject.keyword	跨注意力機制,3D點雲分割,弱監督式學習	zh_TW
dc.subject.keyword	Cross-attention,3D point cloud segmentation,Weakly supervised learning	en
dc.relation.page	90	-
dc.identifier.doi	10.6342/NTU202301432	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2023-07-18	-
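dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Department of Computer Science and Information Engineering	-
Appears in Collections:	Department of Computer Science and Information Engineering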

Files in This Item:

File	Size	Format
ntu-111-2.pdf	7.31 MB	Adobe PDF

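Unless their copyright terms are specifically indicated otherwise, all items in the system are protected by copyright, with all rights reserved.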