NTU Theses and Dissertations Repository › 電機資訊學院 › 電信工程學研究所
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80098

Full metadata record

DC Field | Value | Language
dc.contributor.advisor: 吳沛遠 (Pei-Yuan Wu)
dc.contributor.author: Tsung-Shan Yang (en)
dc.contributor.author: 楊宗山 (zh_TW)
dc.date.accessioned: 2022-11-23T09:25:59Z
dc.date.available: 2021-11-05
dc.date.available: 2022-11-23T09:25:59Z
dc.date.copyright: 2021-11-05
dc.date.issued: 2021
dc.date.submitted: 2021-10-18
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80098
dc.description.abstract: This thesis proposes an encoding mechanism that modifies the weights of convolution kernels originally designed for planar images, so that convolution extracts features from omnidirectional (panoramic) images more effectively while remaining compatible with existing convolutional neural network modules. Experimental results on omnidirectional image classification demonstrate the mechanism's compatibility with CNNs and residual modules; experiments on omni-MNIST, omni-CIFAR10, and omni-CIFAR100 achieve state-of-the-art accuracy. (zh_TW)
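The abstract describes re-weighting planar convolution kernel weights with a position-dependent encoding so the same kernel adapts to the distortion of a panoramic image. As a rough illustration only (the thesis's actual encoding formula is not given in this record), the sketch below scales a kernel by the cosine of each output row's latitude on an equirectangular grid; `spherical_encoding` and `encoded_conv2d` are hypothetical names, not taken from the thesis.

```python
import numpy as np

def spherical_encoding(height, row, k=3):
    """Hypothetical per-row encoding: one scale factor per kernel tap,
    derived from the latitude of the output row on an equirectangular
    panorama. Illustrative only; not the thesis's actual formula."""
    # latitude in (-pi/2, pi/2) for this row of the panorama
    lat = np.pi * (row + 0.5) / height - np.pi / 2
    # a simple cos-latitude factor applied uniformly to all k*k taps
    return np.full((k, k), np.cos(lat))

def encoded_conv2d(image, kernel):
    """2-D 'valid' convolution in which the kernel is re-weighted by
    the spherical encoding of each output row before being applied."""
    h, w = image.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        enc = spherical_encoding(h, i, k)   # position-dependent weights
        for j in range(out.shape[1]):
            patch = image[i:i + k, j:j + k]
            out[i, j] = np.sum(patch * kernel * enc)
    return out

img = np.ones((8, 8))
kern = np.ones((3, 3)) / 9.0
feat = encoded_conv2d(img, kern)
print(feat.shape)  # (6, 6)
```

Because the encoding depends only on the output position, a module like this can drop into an existing CNN in place of a standard convolution layer, which is the compatibility property the abstract emphasizes.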
dc.description.provenance: Made available in DSpace on 2022-11-23T09:25:59Z (GMT). No. of bitstreams: 1. U0001-0807202123243900.pdf: 1338524 bytes, checksum: fe9e44d3060c780a2ee6175a256e1c4b (MD5). Previous issue date: 2021. (en)
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 CNNs Based on Different Projections
  2.2 Spherical CNNs
  2.3 Self-defined Representations and Kernels
Chapter 3 Method
  3.1 Convolution and Self-Attention on Feature Extraction
  3.2 Encoding with Positional Information
  3.3 Spherical Absolute Encoding
  3.4 Spherical Relative Encoding
  3.5 Encoding and Convolution
Chapter 4 Experiment
  4.1 Dataset
  4.2 Classification on omniMNIST
    4.2.1 Experiment Setup
    4.2.2 Results
  4.3 Residual Module
    4.3.1 Experiment Setup
    4.3.2 Results
  4.4 Self-Attention Model
    4.4.1 Results
Chapter 5 Conclusion
References
dc.language.iso: en
dc.subject: 卷積 (convolution) (zh_TW)
dc.subject: 全景影像 (omnidirectional image) (zh_TW)
dc.subject: 編碼機制 (encoding mechanism) (zh_TW)
dc.subject: Omnidirectional (en)
dc.subject: Encoding (en)
dc.subject: Convolution (en)
dc.title: 用於全景影像卷積操作之編碼機制 (An encoding mechanism for convolution on omnidirectional images) (zh_TW)
dc.title: Omnidirectional Image Encoding (en)
dc.date.schoolyear: 109-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 丁建均(Hsin-Tsai Liu), 王鈺強(Chih-Yang Tseng)
dc.subject.keyword: 全景影像, 卷積, 編碼機制 (zh_TW)
dc.subject.keyword: Omnidirectional, Encoding, Convolution (en)
dc.relation.page: 30
dc.identifier.doi: 10.6342/NTU202101356
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2021-10-19
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) (zh_TW)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File | Size | Format
U0001-0807202123243900.pdf | 1.31 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
