Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83258

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳銘憲 | zh_TW |
| dc.contributor.advisor | Ming-Syan Chen | en |
| dc.contributor.author | 康智凱 | zh_TW |
| dc.contributor.author | Chih-Kai Kang | en |
| dc.date.accessioned | 2023-02-01T17:07:29Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-02-01 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-01-12 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83258 | - |
| dc.description.abstract | 能量採集技術衍生出新型態無電池間歇性系統的計算範式,同時也產生了以往傳統電池供電系統所沒有的新研究問題。目前間歇性裝置的執行方式,需要完整存取系統內部狀態以達成系統備份,或是依賴程式開發者依照應用的能源消耗切分應用任務,才能累積間歇性執行的進度;此外,應用程式還需要大量額外的記憶體空間來確保間歇性執行的正確性。這些要求對於在無電池的物聯網裝置上使用硬體加速執行深度學習推論是一大難題:一方面,我們無法完全掌握硬體加速器的內部狀態;另一方面,深度學習模型不易依照能源消耗切分任務大小;再者,深度學習往往需要大量記憶體空間,加上保護正確性所需的記憶體,往往超過微型裝置所能提供的範圍。在本文中,我們討論了無電池裝置上執行間歇性深度學習推論的三個重要問題。首先,我們介紹間歇性執行硬體加速推論的問題,並提出推論足跡的概念,用以在斷電期間延續硬體加速器的執行進度。其次,為了解決推論足跡的高執行成本問題,我們提出擴增模型的概念,讓深度學習模型得以適應間歇性系統。最後,我們著重於深度學習的記憶體需求,提出殘差重分配的概念,重新調整模型內計算單元的連接關係,降低記憶體需求以符合微型裝置的資源限制。 | zh_TW |
| dc.description.abstract | Energy harvesting enables battery-less intermittent systems, but it complicates sophisticated applications such as intermittent deep neural network (DNN) inference. Existing approaches to intermittent execution require energy estimation for task splitting or access to internal system state for checkpointing, and they also require additional memory to guarantee execution correctness. These requirements are difficult to satisfy on battery-less IoT devices that use hardware acceleration for DNN inference: the internal state of peripherals may be inaccessible, and developers may find it hard to divide a model into tasks that fit within an energy budget. Moreover, the large size of DNNs can push memory requirements beyond the constraints of a tiny device. This thesis addresses three issues in DNN inference on intermittently powered tiny devices. The first is how to perform hardware-accelerated DNN inference intermittently; we introduce the concept of inference footprinting to maintain progress across power cycles. The second is the high overhead of preserving footprints, which can erode the throughput benefits of inference footprinting; we propose model augmentation to adapt deep models to intermittent devices. The third is performing DNN inference under tight memory constraints; we present Deep Reorganization (DERO), which reorganizes the residual connections in a DNN model to reduce its inference memory requirement and enable execution on resource-constrained devices. (Illustrative sketches of the footprint and memory ideas follow the metadata table below.) | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-02-01T17:07:29Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-02-01T17:07:29Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements (致謝) i
Chinese Abstract (中文摘要) ii
Abstract iii
Contents iv
List of Figures vii
List of Tables ix
1 Introduction 1
2 Background and Related Work 6
2.1 Intermittent Systems 6
2.2 Deep Neural Network Acceleration 8
2.3 Intermittent Deep Inference 10
3 Hardware Acceleration for Intermittent Deep Inference 12
3.1 Hardware Acceleration: Observation and Motivation 12
3.1.1 Inaccessible Accelerator Internal State 12
3.1.2 Misestimating Task Energy Consumption 13
3.2 Footprint-based Intermittent DNN Inference 15
3.2.1 Design Rationale 15
3.2.2 System Architecture 17
3.2.3 Footprint Preservation 21
3.2.4 Footprint-aware Recovery 22
3.3 Performance Evaluation 24
3.3.1 Experimental Setup 24
3.3.2 Execution Time and Energy Overhead 27
3.3.3 Inference Throughput 29
4 Model Augmentation for Efficient Intermittent Inference 32
4.1 Progress Preservation: Observation and Motivation 32
4.2 JAPARI: Job and Progress Alternate Inference 34
4.2.1 Design Challenges 35
4.2.2 JAPARI Architecture 36
4.2.3 Footprint Appending for Progress Preservation 38
4.2.4 Footprint Representation for Progress Recovery 41
4.3 Performance Evaluation 45
4.3.1 Experimental Setup 45
4.3.2 Runtime Overhead 47
4.3.3 Inference Time 48
4.3.4 Breakdown Analysis 50
5 Deep Residual Reorganization for Memory Efficient Inference 55
5.1 Deep Model Design: Observation and Motivation 55
5.1.1 Existing Solutions 56
5.1.2 Opportunity 58
5.2 DERO: Deep Reorganization 59
5.2.1 Dependency-aware Residual Reconstruction 60
5.2.2 Dependency-aware Operation Refinement 62
5.3 Evaluation 63
5.3.1 Experimental Setup 63
5.3.2 Accuracy Impact 64
5.3.3 Inference Memory Reduction 65
5.3.4 Overhead Evaluation 65
6 Conclusion 67
References 69 | - |
| dc.language.iso | en | - |
| dc.subject | 間歇性系統 | zh_TW |
| dc.subject | 邊緣運算 | zh_TW |
| dc.subject | 能源採集 | zh_TW |
| dc.subject | 模型適應 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | model adaptation | en |
| dc.subject | deep neural networks | en |
| dc.subject | edge computing | en |
| dc.subject | energy harvesting | en |
| dc.subject | intermittent systems | en |
| dc.title | 無電池裝置之間歇性深度學習推論 | zh_TW |
| dc.title | Intermittent Deep Inference on Battery-less Devices | en |
| dc.title.alternative | Intermittent Deep Inference on Battery-less Devices | - |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-1 | - |
| dc.description.degree | 博士 (Ph.D.) | - |
| dc.contributor.oralexamcommittee | 修丕承;郭大維;洪士灝;黃俊郎;吳晉賢;陳雅淑 | zh_TW |
| dc.contributor.oralexamcommittee | Pi-Cheng Hsiu;Tei-Wei Kuo;Shih-Hao Hung;Jiun-Lang Huang;Chin-Hsien Wu;Ya-Shu Chen | en |
| dc.subject.keyword | 邊緣運算, 能源採集, 模型適應, 間歇性系統, 深度學習 | zh_TW |
| dc.subject.keyword | deep neural networks, intermittent systems, model adaptation, energy harvesting, edge computing | en |
| dc.relation.page | 78 | - |
| dc.identifier.doi | 10.6342/NTU202210083 | - |
| dc.rights.note | 同意授權(限校園內公開) [authorized; campus-only access] | - |
| dc.date.accepted | 2023-01-13 | - |
| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | - |
| dc.contributor.author-dept | 電機工程學系 (Department of Electrical Engineering) | - |
| dc.date.embargo-lift | 2027-09-29 | - |
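To make the abstract's notion of inference footprinting concrete, here is a minimal C sketch of footprint-based progress preservation. It is illustrative only, not the thesis's implementation: `footprint_t`, `num_tiles`, `compute_tile`, and `NUM_LAYERS` are hypothetical names, and on a real MSP430-class device the footprint would live in nonvolatile FRAM (for example, via a `.persistent` data section) rather than in the ordinary static storage used here so the sketch compiles anywhere.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_LAYERS 4 /* hypothetical model depth */

/* Hypothetical footprint record. On an MSP430FR5994 this struct would be
 * placed in FRAM so that it survives power loss. */
typedef struct {
    uint16_t layer; /* first layer not yet fully computed        */
    uint16_t tile;  /* first tile of that layer not yet computed */
} footprint_t;

static footprint_t fp = {0, 0};

static uint16_t num_tiles(uint16_t layer) {
    (void)layer;
    return 8; /* hypothetical: accelerator jobs per layer */
}

static void compute_tile(uint16_t layer, uint16_t tile) {
    /* Stand-in for one atomic accelerator job. */
    printf("layer %u, tile %u\n", (unsigned)layer, (unsigned)tile);
}

/* Commit progress after every completed job. A real system would order or
 * version the two writes so that a power failure between them cannot
 * leave a corrupt footprint. */
static void preserve_footprint(uint16_t layer, uint16_t tile) {
    fp.layer = layer;
    fp.tile  = tile;
}

/* On every (re)boot, resume from the footprint instead of restarting. */
int main(void) {
    for (uint16_t l = fp.layer; l < NUM_LAYERS; l++) {
        for (uint16_t t = (l == fp.layer) ? fp.tile : 0; t < num_tiles(l); t++) {
            compute_tile(l, t);
            preserve_footprint(l, t + 1);
        }
        preserve_footprint(l + 1, 0); /* layer done; next layer from tile 0 */
    }
    /* A full system would reset fp here to begin the next inference. */
    return 0;
}
```

The abstract's third contribution, Deep Reorganization (DERO), targets peak activation memory. The toy accounting below, again a hedged sketch with made-up buffer sizes, shows why a skip connection raises that peak: the block input must stay live until the elementwise add at the end of the block, so reorganizing where residual connections attach can lower the number of concurrently live buffers.

```c
#include <stdio.h>

/* Toy activation-memory accounting for a three-layer block with 16 KB
 * activation buffers (hypothetical sizes). Without a skip connection,
 * only one input and one output buffer are live at a time; with a skip
 * over the block, the input buffer stays pinned until the final add. */
int main(void) {
    const int n = 3;       /* layers in the block        */
    const int buf_kb = 16; /* size of every buffer in KB */

    int peak_plain = 2 * buf_kb; /* producer + consumer buffer */

    int peak_skip = 0;
    for (int i = 0; i < n; i++) {
        /* layer input + layer output + pinned skip buffer; at i == 0
         * the pinned buffer IS the layer input, so nothing extra is live */
        int live = (i == 0) ? 2 * buf_kb : 3 * buf_kb;
        if (live > peak_skip) peak_skip = live;
    }

    printf("peak without skip: %d KB, with skip: %d KB\n",
           peak_plain, peak_skip); /* prints 32 KB vs. 48 KB */
    return 0;
}
```

DERO itself rewrites the model graph rather than the runtime; these numbers only illustrate the memory pressure that motivates the reorganization described in Chapter 5.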
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-0455221126291015.pdf (restricted; not publicly accessible) | 2.92 MB | Adobe PDF |
Except where otherwise noted in their copyright terms, all items in the repository are protected by copyright, with all rights reserved.
