Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92339
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 楊家驤 | zh_TW |
dc.contributor.advisor | Chia-Hsiang Yang | en |
dc.contributor.author | 張峻瑋 | zh_TW |
dc.contributor.author | Chun-Wei Chang | en |
dc.date.accessioned | 2024-03-21T16:41:40Z | - |
dc.date.available | 2024-03-22 | - |
dc.date.copyright | 2024-03-21 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-01-23 | - |
dc.identifier.citation | [1] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, et al., “YOLOv6: A single-stage object detection framework for industrial applications,” arXiv preprint arXiv:2209.02976, 2022.
[2] Y. Li, W. Ouyang, B. Zhou, K. Wang, and X. Wang, “Scene graph generation from objects, phrases and region captions,” in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1270–1279, 2017.
[3] Y. Li, W. Ouyang, B. Zhou, J. Shi, C. Zhang, and X. Wang, “Factorizable Net: An efficient subgraph-based framework for scene graph generation,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 335–351, 2018.
[4] S. Song, D. Han, S. Kim, S. Kim, G. Park, and H.-J. Yoo, “GPPU: A 330.4-μJ/task neural path planning processor with hybrid GNN acceleration for autonomous 3D navigation,” in 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), pp. 1–2, IEEE, 2023.
[5] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al., “Visual Genome: Connecting language and vision using crowdsourced dense image annotations,” International Journal of Computer Vision, vol. 123, pp. 32–73, 2017.
[6] X. Chang, P. Ren, P. Xu, Z. Li, X. Chen, and A. Hauptmann, “A comprehensive survey of scene graphs: Generation and application,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 1–26, 2023.
[7] Z. Wang, X. Xu, Y. Zhang, Y. Yang, and H. T. Shen, “Complex relation embedding for scene graph generation,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2022.
[8] K. Nguyen, S. Tripathi, B. Du, T. Guha, and T. Q. Nguyen, “In defense of scene graphs for image captioning,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1387–1396, 2021.
[9] S. Y. Gadre, K. Ehsani, S. Song, and R. Mottaghi, “Continuous scene representations for embodied AI,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14829–14839, 2022.
[10] S.-Y. Yu, A. V. Malawade, D. Muthirayan, P. P. Khargonekar, and M. A. A. Faruque, “Scene-graph augmented data-driven risk assessment of autonomous vehicle decisions,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 7941–7951, 2022.
[11] H. Liu, N. Yan, M. Mortazavi, and B. Bhanu, “Fully convolutional scene graph generation,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11541–11551, 2021.
[12] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei, “Scene graph generation by iterative message passing,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3097–3106, 2017.
[13] R. Zellers, M. Yatskar, S. Thomson, and Y. Choi, “Neural Motifs: Scene graph parsing with global context,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5831–5840, 2018.
[14] K. Tang, H. Zhang, B. Wu, W. Luo, and W. Liu, “Learning to compose dynamic tree structures for visual contexts,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6612–6621, 2019.
[15] J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh, “Graph R-CNN for scene graph generation,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 670–685, 2018.
[16] R. Li, S. Zhang, B. Wan, and X. He, “Bipartite graph network with adaptive message passing for unbiased scene graph generation,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11104–11114, 2021.
[17] X. Chen, J. Xu, and Z. Yu, “A 68-mW 2.2-TOPS/W low-bit-width and multiplierless DCNN object detection processor for visually impaired people,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 11, pp. 3444–3453, 2018.
[18] M. Lefebvre, L. Moreau, R. Dekimpe, and D. Bol, “7.7 A 0.2-to-3.6-TOPS/W programmable convolutional imager SoC with in-sensor current-domain ternary-weighted MAC operations for feature extraction and region-of-interest detection,” in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 118–120, IEEE, 2021.
[19] Y. Gong, T. Zhang, H. Guo, X. Liu, J. Zheng, H. Wu, C. Jia, L. Que, L. Zhou, L. Chang, et al., “22.7 DL-VOPU: An energy-efficient domain-specific deep-learning-based visual object processing unit supporting multi-scale semantic feature extraction for mobile object detection/tracking applications,” in 2023 IEEE International Solid-State Circuits Conference (ISSCC), pp. 1–3, IEEE, 2023.
[20] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Advances in Neural Information Processing Systems, vol. 28, 2015.
[21] T. Geng, A. Li, R. Shi, C. Wu, T. Wang, Y. Li, P. Haghi, A. Tumeo, S. Che, S. Reinhardt, et al., “AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing,” in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 922–936, IEEE, 2020.
[22] C. Fang, H. Derbyshire, W. Sun, J. Yue, H. Shi, and Y. Liu, “A sort-less FPGA-based non-maximum suppression accelerator using multi-thread computing and binary max engine for object detection,” in 2021 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 1–3, IEEE, 2021.
[23] W.-C. Huang, I.-T. Lin, W.-C. Chen, L.-Y. Lin, N.-S. Chang, C.-P. Lin, C.-S. Chen, and C.-H. Yang, “A 28-nm 25.1-TOPS/W sparsity-aware CNN-GCN deep learning SoC for mobile augmented reality,” in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), pp. 42–43, IEEE, 2022.
[24] K.-J. Lee, S. Moon, and J.-Y. Sim, “A 384G output nonzeros/J graph convolutional neural network accelerator,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 10, pp. 4158–4162, 2022.
[25] J.-W. Jang, S. Lee, D. Kim, H. Park, A. S. Ardestani, Y. Choi, C. Kim, Y. Kim, H. Yu, H. Abdel-Aziz, et al., “Sparsity-aware and re-configurable NPU architecture for Samsung flagship mobile SoC,” in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), pp. 15–28, IEEE, 2021. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92339 | - |
dc.description.abstract | 本論文提出文獻中第一顆場景圖生成處理器,實現了邊緣裝置上的高階視覺理解。本研究透過節點群集化來排除97%之冗餘的關係節點,並提出近似排序法來大幅度降低群集化所需之額外代價。此外,也透過剪枝來消除圖卷積中83%可忽略之運算,上述優化共減少98.2%運算複雜度。所設計的處理器包含圖建構器、圖卷積引擎、以及邊處理單元三個模組。圖建構器優先使用高位元來進行重疊度運算,透過預測兩者是否重疊降低56%所需之運算能量,並使用平行處理減少94%群集化所需之運算時間。圖卷積引擎支援混合稀疏編碼,針對不同稀疏程度之資料採用最適合之稀疏編碼,提升圖卷積過程中資料壓縮效率達46%。本研究所設計之自適應乘法累加器,透過可重構之資料路徑提升高稀疏資料之硬體使用率,相比過往圖卷積加速器可達到1.8倍之能量效率。邊處理單元利用乙狀函數之特性,提前終止進入飽和區域的部分和運算,減少58%邊特徵所需之運算。處理器採用40奈米製程設計與製造,在1.1V的電源電壓與200MHz的工作頻率下,晶片功耗為101mW。對於256個物件與65,280個成對關係之場景圖,可提供280幀/秒之幀率,實現了在複雜環境下的即時場景圖生成。與圖形處理器相比,本晶片達到154倍以上的吞吐量,功耗僅為其1/1,782,能量效率提升270,000倍以上。 | zh_TW |
dc.description.abstract | This work presents the world’s first scene graph generation processor, realizing high-level visual understanding on mobile platforms. By grouping relation nodes with similar features, 97% of the redundant relation nodes are eliminated, with the grouping overhead minimized by approximate sorting. Edge pruning further removes 83% of the negligible computations during aggregation in the GCN. Together, these optimizations reduce the overall complexity by 98.2%. The processor comprises three modules: a graph constructor, a graph convolution engine, and an edge processing unit. In the graph constructor, MSB speculation reduces the energy required for IoU computation by 56% by evaluating high-order bits first to predict whether two boxes overlap, and parallel processing reduces the latency of node grouping by 94%. To address the large variation in data sparsity in the GCN, the graph convolution engine adopts hybrid sparsity encoding, which selects the most suitable encoding for each sparsity level and reduces the size of encoded node features by 46%. Its adaptive MAC arrays reconfigure their datapaths to maintain high utilization under diverse sparsity conditions, achieving 1.8× higher energy efficiency than a prior GCN accelerator. The edge processing unit reduces the computations for edge features by 58% through partial-sum reuse and saturation-skipping of the sigmoid function. Fabricated in a 40-nm technology, the chip dissipates 101 mW at a 1.1-V supply voltage and a 200-MHz clock frequency. It achieves a frame rate of 280 fps for a scene graph with 256 objects and 65,280 pairwise relationships, enabling real-time scene graph generation in practical environments. Compared to a commercial GPU, the chip delivers over 154× higher throughput at 1,782× lower power, an energy-efficiency gain of more than 270,000× (an illustrative sketch of the node-grouping and saturation-skipping ideas follows this record). | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-21T16:41:40Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-03-21T16:41:40Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgements ii
Abstract (Chinese) iii
Abstract iv
Contents v
List of Figures vi
1 Introduction 1
2 Preliminaries 6
2.1 GCN-Based Scene Graph Generation 6
2.2 Graph Construction 6
2.3 Graph Convolutional Networks 7
3 Algorithm-Architecture Co-Optimizations 9
3.1 Node Grouping 9
3.2 Approximate Sorting 10
3.3 Edge Pruning 11
3.4 Summary of Complexity Reduction 12
4 System Architecture 14
4.1 Graph Constructor 15
4.1.1 MSB Speculation 15
4.1.2 Parallel IoU Computation 16
4.2 Graph Convolution Engine 17
4.2.1 Hybrid Sparsity Encoding 17
4.2.2 Sparsity Decoder 19
4.2.3 Adaptive MAC Array 20
4.3 Edge Processing Unit 21
4.3.1 Partial Sum Reuse 21
4.3.2 Saturation-Skipping 23
5 Experimental Results 24
5.1 Chip Implementation 24
5.2 Performance Evaluation 25
5.3 Performance Comparison 27
6 Conclusion 28
References 30 | - |
dc.language.iso | en | - |
dc.title | 適用於高階影像理解之場景圖生成處理器 | zh_TW |
dc.title | Design and Implementation of a Scene Graph Generation Processor for Visual Context Understanding | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-1 | - |
dc.description.degree | Master’s | - |
dc.contributor.oralexamcommittee | 翁詠祿;張錫嘉 | zh_TW |
dc.contributor.oralexamcommittee | Yeong-Luh Ueng;Hsie-Chia Chang | en |
dc.subject.keyword | 場景圖生成,人物交互檢測,圖卷積神經網路,深度學習處理器 | zh_TW |
dc.subject.keyword | scene graph generation, visual relationship detection, graph convolutional network, deep learning processor | en |
dc.relation.page | 33 | - |
dc.identifier.doi | 10.6342/NTU202400135 | - |
dc.rights.note | Not authorized (restricted access) | - |
dc.date.accepted | 2024-01-25 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | - |
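The abstract above names two software-visible ideas that are easy to illustrate: grouping relation nodes whose bounding boxes overlap heavily (measured by IoU) and saturation-skipping for the sigmoid. The following minimal Python sketch shows both under assumed conventions; the greedy grouping loop, iou_threshold, and sat_bound are hypothetical simplifications for illustration, not values or algorithms taken from the thesis.

```python
import math

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def group_relation_nodes(boxes, iou_threshold=0.7):
    """Greedily merge relation nodes whose boxes overlap heavily.

    Returns a list of groups (lists of box indices); keeping one
    representative node per group is what removes redundant relation
    nodes before graph convolution. iou_threshold is a hypothetical
    parameter, not a value from the thesis.
    """
    groups = []
    for i, box in enumerate(boxes):
        for group in groups:
            if iou(box, boxes[group[0]]) >= iou_threshold:
                group.append(i)  # close enough to an existing group
                break
        else:
            groups.append([i])  # start a new group
    return groups

def sigmoid_skip(x, sat_bound=6.0):
    """Sigmoid with saturation-skipping: once the input is deep in the
    saturation region, the output is clamped to 0 or 1 and the
    exponential is never evaluated. sat_bound is illustrative only."""
    if x >= sat_bound:
        return 1.0
    if x <= -sat_bound:
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))
```

Per the abstracts, the fabricated graph constructor additionally evaluates the high-order bits of the coordinates first (MSB speculation) to predict whether two boxes overlap at all before committing to a full-precision IoU, which is where the reported 56% energy saving comes from; that refinement is omitted from this sketch.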
Appears in Collections: | Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format |
---|---|---|
ntu-112-1.pdf (currently not authorized for public access) | 2.07 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.