Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102198

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 闕志達 | zh_TW |
| dc.contributor.advisor | Tzi-Dar Chiueh | en |
| dc.contributor.author | 莊詠翔 | zh_TW |
| dc.contributor.author | Yung-Hsiang Chuang | en |
| dc.date.accessioned | 2026-04-08T16:13:41Z | - |
| dc.date.available | 2026-04-09 | - |
| dc.date.copyright | 2026-04-08 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-03-24 | - |
| dc.identifier.citation | [1] X. Chen, Y. Zhang, and Y. Wang, "MTP: Multi-Task Pruning for Efficient Semantic Segmentation Networks," in Proc. of 2022 IEEE International Conference on Multimedia and Expo (ICME), July 2022, pp. 1–6.
[2] G. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv preprint arXiv:1503.02531, 2015.
[3] F. Petersen, H. Kuehne, C. Borgelt, J. Welzel, and S. Ermon, "Convolutional Differentiable Logic Gate Networks," in Proc. of Advances in Neural Information Processing Systems, Vancouver, Canada, December 2024, pp. 121185–121203.
[4] F. Petersen, C. Borgelt, H. Kuehne, and O. Deussen, "Deep Differentiable Logic Gate Networks," in Proc. of 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 2022, pp. 2006–2018.
[5] A. Krizhevsky, V. Nair, and G. Hinton, "The CIFAR-10 datasets," https://www.cs.toronto.edu/~kriz/cifar.html (accessed 01-05, 2026).
[6] Y. LeCun, C. Cortes, and C. Burges, "MNIST handwritten digit database," http://yann.lecun.com/exdb/mnist (accessed 01-05, 2026).
[7] J. Yang et al., "MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification," Scientific Data, Vol. 10, No. 1, p. 41, 2023.
[8] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, Vol. 86, No. 11, pp. 2278–2324, 1998.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, Vol. 60, No. 6, pp. 84–90, 2017.
[10] Q. Yang, C. Ji, H. Luo, P. Li, and Z. Ding, "Data Augmentation Through Random Style Replacement," in Proc. of 2025 6th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Ningbo, China, May 2025.
[11] G. Huang, S. Liu, L. v. d. Maaten, and K. Q. Weinberger, "CondenseNet: An Efficient DenseNet using Learned Group Convolutions," arXiv preprint arXiv:1711.09224, 2017.
[12] A. Jafari, M. Rezagholizadeh, P. Sharma, and A. Ghodsi, "Annealing Knowledge Distillation," arXiv preprint arXiv:2104.07163, 2021.
[13] J. Guo, M. Chen, Y. Hu, C. Zhu, X. He, and D. Cai, "Spherical Knowledge Distillation," arXiv preprint arXiv:2010.07485, 2020.
[14] R. Lukas, A. Till, P. Andreas, and W. Roger, "Light Differentiable Logic Gate Networks," arXiv preprint arXiv:2510.03250, 2025.
[15] R. S. Dehal, C. Munjal, A. A. Ansari, and A. S. Kushwaha, "GPU Computing Revolution: CUDA," in Proc. of 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), 12–13 Oct. 2018, pp. 197–201.
[16] NVIDIA, "NVIDIA CUDA Toolkit," https://developer.nvidia.com/cuda/toolkit (accessed 05-24, 2025).
[17] M. Courbariaux and Y. Bengio, "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1," arXiv preprint arXiv:1602.02830, 2016.
[18] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," arXiv preprint arXiv:1603.05279, 2016.
[19] Y. Umuroglu et al., "FINN: A Framework for Fast, Scalable Binarized Neural Network Inference," arXiv preprint arXiv:1612.07119, 2016.
[20] R. Zhao et al., "Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs," in Proc. of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, New York, USA, 2017, pp. 15–24.
[21] L. Yang, Z. He, and D. Fan, "A Fully Onchip Binarized Convolutional Neural Network FPGA Implementation with Accurate Inference," in Proc. of International Symposium on Low Power Electronics and Design, Seattle, WA, USA, 2018, p. 50.
[22] E. Wang, J. J. Davis, P. Y. K. Cheung, and G. A. Constantinides, "LUTNet: Rethinking Inference in FPGA Soft Logic," arXiv preprint arXiv:1904.00938, 2019.
[23] Y. Zhang, J. Pan, X. Liu, H. Chen, D. Chen, and Z. Zhang, "FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations," arXiv preprint arXiv:2012.12206, 2020.
[24] S. Liang, S. Yin, L. Liu, W. Luk, and S. Wei, "FP-BNN: Binarized neural network on FPGA," Neurocomputing, Vol. 275, pp. 1072–1086, 2018.
[25] M. Walczak, U. Kallakuri, E. Humes, X. Lin, and T. Mohsenin, "Invited Paper: BitMedViT: Ternary-Quantized Vision Transformer for Medical AI Assistants on the Edge," in Proc. of 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 26–30 Oct. 2025, pp. 1–7.
[26] Xilinx, "Vivado design tools 2024.2," https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vivado-design-tools/2024-2.html (accessed 12-01, 2024).
[27] K. Chellapilla, S. Puri, and P. Simard, "High performance convolutional neural networks for document processing," in Proc. of Tenth International Workshop on Frontiers in Handwriting Recognition, 2006.
[28] R. Bosio et al., "NN2FPGA: Optimizing CNN Inference on FPGAs With Binary Integer Programming," Trans. Comp.-Aided Des. Integ. Cir. Sys., Vol. 44, No. 5, pp. 1807–1818, 2025.
[29] Y. Zhang, B. Sun, W. Jiang, Y. Ha, M. Hu, and W. Zhao, "WSQ-AdderNet: Efficient Weight Standardization Based Quantized AdderNet FPGA Accelerator Design with High-Density INT8 DSP-LUT Co-Packing Optimization," in Proc. of the 41st IEEE/ACM International Conference on Computer-Aided Design, San Diego, California, 2022, p. 142.
[30] Xilinx, "Kria-PYNQ," https://github.com/Xilinx/Kria-PYNQ (accessed 11-21, 2025). | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102198 | - |
| dc.description.abstract | 本研究探討以邏輯閘為運算基礎之神經網路架構,並提出一套針對卷積可微分邏輯閘神經網路(Convolutional Differentiable Logic Gate Network, CDLGN)之軟硬體協同設計與加速實現方法。相較於傳統卷積神經網路(CNN)仰賴大量乘加(MAC)運算,邏輯閘神經網路(Logic Gate Network, LGN)完全由二元邏輯閘構成,推論階段不需浮點運算,具備高度硬體友善特性。然而,當模型規模擴大時,直接將完整邏輯網路映射至硬體電路,仍面臨資源消耗與可擴充性限制等問題。
本研究在演算法層面採用可微分邏輯閘(Differentiable Logic)將原本不可微分之布林運算連續化,使模型可透過梯度下降法進行訓練,並於訓練完成後離散化為純邏輯閘結構。針對影像辨識任務,本文設計卷積邏輯層與隨機連接邏輯層,並引入分組卷積與通道限制之輸入選取策略,在維持模型準確率的同時降低連結複雜度,以提升硬體實作效率。訓練過程中,除多重閾值二值化與邊緣偵測特徵強化外,亦提出多頭退火式知識蒸餾方法,以改善模型收斂穩定性與最終分類效能。
在軟體訓練以及加速方面,針對 LGN 非乘加型運算不利於現有 GPU 架構之問題,本研究於 CUDA 平台上實作專用之 forward 與 backward propagation 核心,加速可微分邏輯運算流程,顯著提升訓練與推論效率。在 CIFAR-10、MNIST、OrganAMNIST 資料集上分別達到最高 83.5%、99.25% 以及 90.62% 的辨識正確率,以及相較 Torch 平台加速 16 倍的訓練速度。
在硬體實現上,本研究以查找表(LUT)方式建構可重組之邏輯閘運算單元,並設計具分組平行化特性的卷積邏輯加速架構,部署於 FPGA-SoC 平台。系統整合處理系統(PS)與可程式邏輯(PL),並結合 PyTorch 與 PYNQ 軟體框架,使整體推論流程可於無外部主機情況下獨立運作。本系統在運行約 92.7 KB 的 LGN 模型時達到最高 2016 的平均 FPS 以及 1.78 mJ/frame 的能源效率,平均功耗約為 NVIDIA H100 的 0.86% 以及 KV260 PS APU 的 0.09%。
實驗結果顯示,本研究所提出之架構在維持分類準確率的同時,能有效降低硬體複雜度並提升推論吞吐量,驗證邏輯閘神經網路於高效率硬體推論平台上的可行性與潛力。 | zh_TW |
| dc.description.abstract | This thesis investigates neural network architectures based on logic-gate operations and proposes a hardware–software co-design framework with an acceleration scheme for Convolutional Differentiable Logic Gate Networks (CDLGNs). Unlike conventional Convolutional Neural Networks (CNNs), which rely heavily on multiply-and-accumulate (MAC) operations, Logic Gate Networks (LGNs) are constructed entirely from binary logic gates. As a result, floating-point computations are eliminated during inference, making LGNs inherently hardware-friendly. However, as model complexity increases, directly mapping large-scale logic networks onto hardware circuits leads to considerable resource consumption and scalability challenges.
At the algorithmic level, this work adopts Differentiable Logic to relax inherently non-differentiable Boolean operations into continuous formulations, enabling end-to-end optimization via gradient descent. After training, the network is discretized into a pure logic-gate structure for efficient inference. For image classification tasks, convolutional logic layers and randomly connected logic layers are designed to capture spatial and channel-wise features. To further reduce connection complexity and improve hardware efficiency, group convolution and channel-constrained input selection strategies are introduced, achieving a favorable balance between accuracy and implementation cost. Additionally, multi-threshold binarization and edge-detection-based feature augmentation are incorporated to enhance input representations. A multi-head annealing knowledge distillation scheme is also proposed to improve training stability and final classification performance. On the software side, the incompatibility between non-MAC logic operations and existing GPU architectures is addressed by implementing dedicated CUDA kernels for both forward and backward propagation. These customized kernels significantly accelerate differentiable logic computations, achieving up to 16× speedup compared to standard PyTorch implementations. The proposed method attains a maximum classification accuracy of 83.5% on CIFAR-10, 99.25% on MNIST and 90.62% on OrganAMNIST, demonstrating both efficiency and effectiveness. On the hardware side, reconfigurable logic-gate operators are realized using lookup table (LUT)-based designs, and a group-parallel convolutional logic accelerator is developed and deployed on an FPGA-SoC platform. The system integrates the Processing System (PS) and Programmable Logic (PL), and is combined with the PyTorch and PYNQ frameworks to enable standalone inference without requiring an external host computer. 
Experimental results show that, when executing an approximately 92.7 KB LGN model, the proposed system achieves up to 2016 FPS with an energy efficiency of 1.78 mJ per frame. The average power consumption is only about 0.86% of an NVIDIA H100 GPU and 0.09% of the KV260 PS APU, highlighting its superior energy efficiency. Overall, the proposed architecture maintains competitive classification accuracy while significantly reducing hardware complexity and improving inference throughput, thereby validating the feasibility and potential of Logic Gate Networks for high-efficiency hardware inference platforms. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-04-08T16:13:41Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-04-08T16:13:41Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 致謝 i
摘要 iii
Abstract v
目次 vii
圖次 xi
表次 xiii
第一章 緒論 1
1.1 研究背景 1
1.2 研究動機與目標 2
1.3 論文組織與貢獻 4
第二章 以邏輯閘為運算基礎之神經網路(Logic Gate Network)介紹 6
2.1 可微分邏輯閘神經網路(Differentiable Logic Gate Network, DLGN) 6
2.1.1 可微分邏輯(Differentiable Logics) 7
2.1.2 隨機連接邏輯層(Randomly Connected Logic Layer) 7
2.2 卷積邏輯閘神經網路(Convolutional Differentiable Logic Gate Network, CDLGN) 9
2.2.1 卷積邏輯層(Convolutional Logic Layer) 10
2.2.2 或閘池化法(OR-Pooling) 11
2.3 第二章總結 11
第三章 卷積邏輯閘神經網路訓練與推論 13
3.1 訓練任務與資料集 13
3.1.1 CIFAR-10資料集 13
3.1.2 MNIST-32資料集 14
3.1.3 OrganAMNIST資料集 14
3.1.4 影像辨識任務 15
3.2 輸入預處理和二值化 16
3.2.1 輸入二值化 16
3.2.2 邊緣偵測(Edge Detection)算子 18
3.2.3 資料增補(Data Augmentation) 20
3.3 CDLGN 網路架構 22
3.3.1 卷積邏輯層之輸入選取策略 25
3.3.2 分組卷積(Group Convolution) 26
3.3.3 多頭退火式知識蒸餾方法(Multi-head Annealing Knowledge Distillation) 28
3.4 Forward/Backward運算 31
3.4.1 可微分邏輯之Forward/Backward運算 31
3.4.2 Residual Initialization 32
3.5 CUDA 軟體加速 35
3.6 Ablation study 和訓練實驗結果 38
3.6.1 Ablation study 38
3.6.2 二值神經網路(BNN)之文獻介紹 40
3.6.3 與二值神經網路(BNN)之模型規模與參數量比較 41
3.6.4 訓練結果之分布圖 45
3.7 其他未採用之內容 46
3.8 第三章總結 47
第四章 FPGA電路及系統設計 49
4.1 電路整體規劃 49
4.1.1 FPGA系統架構 49
4.1.2 LGN加速電路規劃 51
4.1.3 基於查找表(LUT)之電路設計 52
4.2 運算單元(Processing Element, PE)設計 53
4.2.1 整體架構和資料流 53
4.2.2 Unrolling Unit 與 Ifmap Buffer 設計 55
4.2.3 LUT Unit 與 LUT Array 設計 59
4.2.4 Output Buffer & Pooling 62
4.3 LGN 加速電路系統架構 62
4.3.1 URAM/BRAM配置 64
4.3.2 8-group 設計所需之控制訊號長度縮減 66
4.3.3 PS/PL Interface 68
4.3.4 電路運算流程 68
4.3.5 Vivado實作結果及評估 70
4.4 第四章總結 72
第五章 系統整合與效能分析 73
5.1 結合PYNQ及PyTorch 的FPGA電路控制 73
5.1.1 C++ Extension 74
5.1.2 Random connected layer與GroupSum的執行 75
5.1.3 PS/PL Pipeline 77
5.1.4 針對效能的Axi-connection設定優化 78
5.2 運算效能分析 79
5.2.1 分層執行時間比較 79
5.2.2 功耗分析與文獻比較 81
5.3 執行效果展示 87
第六章 研究結語與展望 90
參考文獻 92
附錄 95
1. 1×1 Convolution Logic Gate 設計 95
2. 混合 BNN 與 LGN 之訓練架構 98
3. LFSR random number generator 99 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 可微分邏輯閘 | - |
| dc.subject | 卷積式神經網路 | - |
| dc.subject | 現場可程式化邏輯閘陣列 | - |
| dc.subject | Differentiable Logic Gate | - |
| dc.subject | Convolutional Neural Network | - |
| dc.subject | Field-Programmable Gate Array | - |
| dc.title | 以邏輯閘為運算基礎之卷積式神經網路電路設計與實作 | zh_TW |
| dc.title | Design and Implementation of a Logic-Gate-Based Convolutional Neural Network Circuit | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 劉宗德;黃元豪 | zh_TW |
| dc.contributor.oralexamcommittee | Tsung-Te Liu;Yuan-Hao Huang | en |
| dc.subject.keyword | 可微分邏輯閘,卷積式神經網路,現場可程式化邏輯閘陣列 | zh_TW |
| dc.subject.keyword | Differentiable Logic Gate,Convolutional Neural Network,Field-Programmable Gate Array | en |
| dc.relation.page | 101 | - |
| dc.identifier.doi | 10.6342/NTU202600875 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2026-03-24 | - |
| dc.contributor.author-college | 重點科技研究學院 | - |
| dc.contributor.author-dept | 積體電路設計與自動化學位學程 | - |
| dc.date.embargo-lift | 2026-04-09 | - |
Appears in Collections: 積體電路設計與自動化學位學程

Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-114-2.pdf | 6.13 MB | Adobe PDF |

All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
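The abstract describes relaxing non-differentiable Boolean operations into continuous form so a logic-gate network can be trained by gradient descent and then discretized into pure gates (refs. [3], [4]). The sketch below illustrates that idea only in miniature: the probabilistic gate relaxations and the softmax-mixture gate are standard in the differentiable-logic literature, but all function names are illustrative, the candidate set is trimmed to three gates (the real formulation uses all 16 two-input Boolean functions), and training itself (PyTorch autograd, the thesis's custom CUDA kernels) is omitted.

```python
import math

# Probabilistic relaxations of two-input Boolean gates. For inputs in {0, 1}
# they reproduce the Boolean truth tables exactly, yet remain differentiable
# for real-valued inputs in [0, 1], which is what enables gradient training.
def soft_and(a, b):
    return a * b

def soft_or(a, b):
    return a + b - a * b

def soft_xor(a, b):
    return a + b - 2 * a * b

# Illustrative candidate set (the full method uses all 16 two-input gates).
GATES = [soft_and, soft_or, soft_xor]

def softmax(logits):
    # Numerically stable softmax over a plain Python list.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def relaxed_gate(a, b, logits):
    """Training-time gate: a softmax-weighted mixture of candidate gates.

    The logits are the learnable parameters; gradients flow through both
    the mixture weights and the relaxed gate functions.
    """
    weights = softmax(logits)
    return sum(w * g(a, b) for w, g in zip(weights, GATES))

def discretize(logits):
    """After training, keep only the most probable gate.

    This is the step that turns the continuous model into a pure
    logic-gate structure suitable for LUT-based hardware mapping.
    """
    best = max(range(len(logits)), key=lambda i: logits[i])
    return GATES[best]
```

With uniform (all-zero) logits, `relaxed_gate(1, 0, [0, 0, 0])` averages AND (0), OR (1), and XOR (1) to 2/3; after training pushes one logit up, `discretize` collapses the mixture to a single hardware-friendly gate.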
