Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89382

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 施吉昇 | zh_TW |
| dc.contributor.advisor | Chi-Sheng Shih | en |
| dc.contributor.author | 劉昕祐 | zh_TW |
| dc.contributor.author | Xin-You Liu | en |
| dc.date.accessioned | 2023-09-07T16:46:38Z | - |
| dc.date.available | 2025-07-31 | - |
| dc.date.copyright | 2023-09-11 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-03 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89382 | - |
| dc.description.abstract | 隨著物聯網邊緣設備和智慧型感測器技術的發展,具備 3D 感測器的邊緣運算裝置,相較於過去,無需仰賴傳統集中式叢集計算機系統即可執行神經網絡推理任務。可用於加速邊緣運算裝置神經網絡推理任務的記憶體內計算 (CIM) 架構,近年在研究領域中受到許多關注;配合 3D 感測器,可快速感知周圍環境中豐富的 3D 幾何資訊。基於高速揮發性 SRAM CIM 運算單元的記憶體內運算異質系統,可用以進行快速三維感知工作;此系統可進行即時三維物件特徵的粗定位工作 (<1s),並可有效減少傳統馮紐曼系統中神經網絡乘加運算在定位任務處理單元上的系統負載。對於廣泛部署於邊緣嵌入式系統上的三維感知工作,記憶體計算單元內的低精度 (~8 bit) 乘加運算,以及記憶體內運算以類比為基礎的電荷共享計算方式在 ADC 轉換過程中所產生的非線性不確定性誤差,這兩項因素降低了以結構光系統感知生成的稀疏且具雜訊點雲在神經網絡推理工作中的精確度。在此研究中,所提出的神經網絡訓練方法於訓練過程中嵌入不確定性模型,以增強推理期間對記憶體內運算 ADC 非線性不確定性誤差變化以及三維雜訊感知資料的容忍度。同時,將具備乘加 (MAC) 運算的 SRAM CIM 硬體用於加速 k 個最近高維特徵向量間歐幾里得距離的計算,並配合以注意力機制為基礎的三維特徵抽取與理解卷積神經網路定位設計,在實際實驗驗證中將模型大小降低 27 倍、推論時間降低 297 倍、系統能耗降低 100 倍。若增加記憶體內 CIM 單元的數量,可進一步提升大型智慧型感知系統定位與緻密化任務的效能表現。 | zh_TW |
| dc.description.abstract | IoT edge devices with 3D sensors can perform neural network inference tasks without relying on a centralized computing system, and their 3D sensors perceive the environment and provide rich 3D geometric information quickly and accurately. The computing-in-memory (CIM) architecture for accelerating neural network inference on edge devices has recently gained attention in the research community. Combining a high-speed SRAM CIM unit with fast 3D sensing on the sensor system provides a coarse processing stage that reduces the neural network load of sensor localization tasks. For wide deployment of 3D sensors on edge devices, two main design issues arise: the low-precision (~8-bit) computing scheme and the nonlinear uncertainty error introduced by the analog charge-sharing computation during ADC conversion in the in-memory computing unit. Both severely degrade inference accuracy on the sparse and noisy point clouds produced by a structured-light sensing system. In this work, the proposed training scheme embeds an uncertainty model into neural network training to improve tolerance of ADC nonlinearity variation and sensing noise during inference. An SRAM CIM macro for multiply-accumulate (MAC) operations accelerates the computation of Euclidean distances among the k nearest high-dimensional feature vectors and, together with the designed convolution-based attention layers for feature extraction, reduces model size by up to 27×, power consumption by 100×, and inference time by 297× in our experiments. Increasing the number of CIM units in memory can further improve the performance of neural network tasks on intelligent sensing systems. | en |
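The two technical ideas summarized in the abstract lend themselves to a compact illustration: injecting an ADC-uncertainty model into low-precision MAC results, and mapping the k-nearest-neighbor Euclidean distance onto a CIM MAC array through the expansion ||q − f||² = ||q||² + ||f||² − 2 q·f. The sketch below is illustrative only; the function names (`quantize`, `cim_mac`, `knn_by_cim_distance`), the 8-bit quantization, and the Gaussian stand-in for the ADC error are assumptions made for exposition, not the thesis implementation.

```python
import numpy as np

# Illustrative sketch only: names, bit widths, and the ADC-noise model below
# are assumptions for exposition, not the thesis code.

rng = np.random.default_rng(0)

def quantize(x, n_bits=8):
    """Uniform symmetric quantization, mimicking the low-precision (~8-bit)
    operands fed to a CIM MAC array."""
    scale = np.max(np.abs(x)) / (2 ** (n_bits - 1) - 1) + 1e-12
    return np.round(x / scale).astype(np.int32), scale

def cim_mac(a_q, b_q, adc_sigma=0.0):
    """Dot product as a CIM-style multiply-accumulate; adc_sigma models the
    nonlinear, uncertain ADC read-out as additive noise -- the same kind of
    perturbation that a CIM-aware training scheme can embed in the forward pass."""
    acc = float(np.dot(a_q, b_q))
    if adc_sigma > 0.0:
        acc += rng.normal(0.0, adc_sigma * (abs(acc) + 1.0))
    return acc

def knn_by_cim_distance(query, feats, k=20, adc_sigma=0.02):
    """k nearest neighbors by squared Euclidean distance, using
    ||q - f||^2 = ||q||^2 + ||f||^2 - 2 q.f so the dominant cost
    (the inner products) maps onto the MAC array."""
    q_q, qs = quantize(query)
    q_norm = cim_mac(q_q, q_q, adc_sigma) * qs * qs
    dists = []
    for f in feats:
        f_q, fs = quantize(f)
        f_norm = cim_mac(f_q, f_q, adc_sigma) * fs * fs
        dot_qf = cim_mac(q_q, f_q, adc_sigma) * qs * fs
        dists.append(q_norm + f_norm - 2.0 * dot_qf)
    return np.argsort(dists)[:k]

# Example: find the 20 nearest 64-D feature vectors to a query feature.
feats = rng.normal(size=(1024, 64))
query = rng.normal(size=64)
print(knn_by_cim_distance(query, feats, k=20))
```

During training, the same noisy-MAC path can replace the exact dot products in the forward pass, which is one way to read the "uncertainty model embedded in training" described above.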
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-07T16:46:38Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-09-07T16:46:38Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgments; 摘要; Abstract; 1 Introduction: 1.1 Motivation, 1.2 Contribution, 1.3 Thesis Organization; 2 Background and Related Works: 2.1 Background, 2.2 Related Works; 3 System Architecture and Problem Definition: 3.1 System Architecture (3.1.1 Motivation, 3.1.2 Architecture), 3.2 Problem Definition, 3.3 Challenges; 4 Design and Implementation: 4.1 CIM Deterministic Error Calibration Methodology, 4.2 CIM-aware Training Algorithm (4.2.1 Quantum Awarded Training Algorithm, 4.2.2 CIM Noise Embedding Training Scheme), 4.3 End-to-end Sensor Localization Pipeline (4.3.1 Overview of Sensor Localization Network, 4.3.2 Feature Extractor with Graph CNN, 4.3.3 Feature Extractor with Spatial Attention, 4.3.4 Feature Extractor with Channel Attention and Cross Feature Merging, 4.3.5 Pose Regression with Position Encoding and Attention Merging, 4.3.6 Pose Estimation with High-Order Trainable SVD, 4.3.7 Loss Function for 3D Sensor Localization Network, 4.3.8 ICP Fine-grain Point Cloud Registration), 4.4 CIM Macro with Ying-Yang Array Design, 4.5 Hardware and Software Abstraction, 4.6 Three Dimension Human Face Dataset (4.6.1 CASIA 3D Human Face Dataset, 4.6.2 Self-Collected 3D Mask Dataset from Structure Light System, 4.6.3 Ground-truth Acquisition for Translation and Rotation); 5 Experiment Evaluation: 5.1 Evaluation Metrics and Methodology, 5.2 Evaluation Results (5.2.1 Model Size and Complexity, 5.2.2 Area and Energy Efficiency for Hardware Design Unit, 5.2.3 Software Function and Hardware Design Unit Mapping, 5.2.4 Inference Cycles Breakdown for Neural Network Component, 5.2.5 Energy Breakdown for Neural Network Component, 5.2.6 Energy Breakdown for Neural Network, 5.2.7 Inference Cycles for Neural Network, 5.2.8 Inference Energy for Neural Network, 5.2.9 Inference Time for Neural Network, 5.2.10 Accuracy for Different Dataset); 6 Conclusion; Bibliography | - |
| dc.language.iso | en | - |
| dc.subject | 容錯神經網路推論系統 | zh_TW |
| dc.subject | 揮發性記憶體內運算 | zh_TW |
| dc.subject | 智慧型感知系統 | zh_TW |
| dc.subject | 注意機制神經網路 | zh_TW |
| dc.subject | 點雲配對與緻密化 | zh_TW |
| dc.subject | 感測器定位 | zh_TW |
| dc.subject | Point cloud registration | en |
| dc.subject | Volatile in-memory-computing | en |
| dc.subject | Error tolerance system | en |
| dc.subject | Sensor localization | en |
| dc.subject | Smart sensor | en |
| dc.subject | Attention mechanism | en |
| dc.title | 基於記憶體內運算平台以重定位以及點雲密集化提供即時三維感知 | zh_TW |
| dc.title | Simultaneous Localization and Densification with Real Time 3D Sensing on In Memory Computing System | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 吳安宇;郭大維;王克中;叢培貴 | zh_TW |
| dc.contributor.oralexamcommittee | An-Yeu Wu;Tei-Wei Kuo;Keh-Chung Wang;Pei-Kuei Tsung | en |
| dc.subject.keyword | 揮發性記憶體內運算,智慧型感知系統,注意機制神經網路,點雲配對與緻密化,感測器定位,容錯神經網路推論系統, | zh_TW |
| dc.subject.keyword | Volatile in-memory-computing, Smart sensor, Attention mechanism, Point cloud registration, Sensor localization, Error tolerance system | en |
| dc.relation.page | 117 | - |
| dc.identifier.doi | 10.6342/NTU202302392 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2023-08-07 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| Appears in collections: | 資訊網路與多媒體研究所 | |
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf (restricted access) | 5.49 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
