NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96974
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 李綱 (zh_TW)
dc.contributor.advisor: Kang Li (en)
dc.contributor.author: 徐宸顥 (zh_TW)
dc.contributor.author: Cheng-Hao Hsu (en)
dc.date.accessioned: 2025-02-25T16:18:18Z
dc.date.available: 2025-02-26
dc.date.copyright: 2025-02-25
dc.date.issued: 2025
dc.date.submitted: 2025-02-09
dc.identifier.citation:
Adrien Gaidon, Qiao Wang, Yohann Cabon, and Eleonora Vig. Virtual worlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4340–4349, 2016.
William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39, 2022.
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF international conference on computer vision, pages 568–578, 2021.
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 34:12077–12090, 2021.
Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.
Radosvet Desislavov, Fernando Martínez-Plumed, and José Hernández-Orallo. Compute and energy consumption trends in deep learning inference. arXiv preprint arXiv:2109.05472, 2021.
Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. Advances in neural information processing systems, 28, 2015.
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
Bike Chen, Chen Gong, and Jian Yang. Importance-aware semantic segmentation for autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems, 20(1):137–148, 2018.
Jonah Philion, Amlan Kar, and Sanja Fidler. Learning to evaluate perception models using planner-centric metrics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14055–14064, 2020.
Rowan McAllister, Blake Wulfe, Jean Mercat, Logan Ellis, Sergey Levine, and Adrien Gaidon. Control-aware prediction objectives for autonomous driving. In 2022 International Conference on Robotics and Automation (ICRA), pages 01–08. IEEE, 2022.
Maria Lyssenko, Piyush Pimplikar, Maarten Bieshaar, Farzad Nozarian, and Rudolph Triebel. A safety-adapted loss for pedestrian detection in automated driving. arXiv preprint arXiv:2402.02986, 2024.
Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062, 2014.
Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12):2481–2495, 2017.
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2017.
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European conference on computer vision (ECCV), pages 325–341, 2018.
Huihui Pan, Yuanduo Hong, Weichao Sun, and Yisong Jia. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Transactions on Intelligent Transportation Systems, 24(3):3448–3460, 2022.
Jiacong Xu, Zixiang Xiong, and Shankar P Bhattacharyya. Pidnet: A real-time semantic segmentation network inspired by pid controllers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19529–19539, 2023.
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers & distillation through attention. In International conference on machine learning, pages 10347–10357. PMLR, 2021.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.
Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, and Chunhua Shen. Twins: Revisiting the design of spatial attention in vision transformers. Advances in neural information processing systems, 34:9355–9366, 2021.
Zhiyang Chen, Yousong Zhu, Chaoyang Zhao, Guosheng Hu, Wei Zeng, Jinqiao Wang, and Ming Tang. Dpt: Deformable patch-based transformer for visual recognition. In Proceedings of the 29th ACM international conference on multimedia, pages 2899–2907, 2021.
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, and Rynson WH Lau. Biformer: Vision transformer with bi-level routing attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10323–10333, 2023.
Ilya O Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, et al. Mlp-mixer: An all-mlp architecture for vision. Advances in neural information processing systems, 34:24261–24272, 2021.
Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, and Shuicheng Yan. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10819–10829, 2022.
Sachin Mehta and Mohammad Rastegari. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer. arXiv preprint arXiv:2110.02178, 2021.
Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, and Jian Ren. Efficientformer: Vision transformers at mobilenet speed. Advances in Neural Information Processing Systems, 35:12934–12949, 2022.
Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Swiftformer: Efficient additive attention for transformer-based real-time mobile vision applications. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17425–17436, 2023.
Han Cai, Junyan Li, Muyan Hu, Chuang Gan, and Song Han. Efficientvit: Multi-scale linear attention for high-resolution dense prediction. arXiv preprint arXiv:2205.14756, 2022.
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip HS Torr, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6881–6890, 2021.
Robin Strudel, Ricardo Garcia, Ivan Laptev, and Cordelia Schmid. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7262–7272, 2021.
Bowen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, et al. Segvit: Semantic segmentation with plain vision transformers. Advances in Neural Information Processing Systems, 35:4971–4982, 2022.
Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6848–6856, 2018.
Ching-Hao Wang, Kang-Yang Huang, Yi Yao, Jun-Cheng Chen, Hong-Han Shuai, and Wen-Huang Cheng. Lightweight deep learning: An overview. IEEE consumer electronics magazine, 2022.
Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets, 2017.
Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, and William J Dally. Exploring the granularity of sparsity in convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 13–20, 2017.
Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, and Hongsheng Li. Learning N:M fine-grained structured sparse neural networks from scratch. arXiv preprint arXiv:2102.04010, 2021.
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
Ying Zhang, Tao Xiang, Timothy M Hospedales, and Huchuan Lu. Deep mutual learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4320–4328, 2018.
Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016.
Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, and Yulin Wang. Dynamic neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7436–7456, 2021.
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. Deebert: Dynamic early exiting for accelerating bert inference. arXiv preprint arXiv:2004.12993, 2020.
Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S Davis, Kristen Grauman, and Rogerio Feris. Blockdrop: Dynamic inference paths in residual networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8817–8826, 2018.
Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E Gonzalez. Skipnet: Learning dynamic routing in convolutional networks. In Proceedings of the European conference on computer vision (ECCV), pages 409–424, 2018.
Zhourong Chen, Yang Li, Samy Bengio, and Si Si. You look twice: Gaternet for dynamic filter selection in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9172–9180, 2019.
David Eigen, Marc’Aurelio Ranzato, and Ilya Sutskever. Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314, 2013.
Weizhe Hua, Yuan Zhou, Christopher M De Sa, Zhiru Zhang, and G Edward Suh. Channel gating neural networks. Advances in Neural Information Processing Systems, 32, 2019.
Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, and Xiaoou Tang. Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3193–3202, 2017.
Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
Damai Dai, Chengqi Deng, Chenggang Zhao, RX Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y Wu, et al. Deepseekmoe: Towards ultimate expert specialization in mixture-of-experts language models. arXiv preprint arXiv:2401.06066, 2024.
Yikang Shen, Zhen Guo, Tianle Cai, and Zengyi Qin. Jetmoe: Reaching llama2 performance with 0.1M dollars. arXiv preprint arXiv:2404.07413, 2024.
Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling vision with sparse mixture of experts. Advances in Neural Information Processing Systems, 34:8583–8595, 2021.
Tianlong Chen, Xuxi Chen, Xianzhi Du, Abdullah Rashwan, Fan Yang, Huizhong Chen, Zhangyang Wang, and Yeqing Li. Adamv-moe: Adaptive multi-task vision mixture-of-experts. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17346–17357, 2023.
Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, and Cho-Jui Hsieh. Dynamicvit: Efficient vision transformers with dynamic token sparsification. Advances in neural information processing systems, 34:13937–13949, 2021.
Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, Weiming Dong, Liqing Zhang, Changsheng Xu, and Xing Sun. Evo-vit: Slow-fast token evolution for dynamic vision transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 2964–2972, 2022.
Yifei Liu, Mathias Gehrig, Nico Messikommer, Marco Cannici, and Davide Scaramuzza. Revisiting token pruning for object detection and instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2658–2668, 2024.
Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, and Aude Oliva. IA-RED^2: Interpretability-aware redundancy reduction for vision transformers. Advances in Neural Information Processing Systems, 34:24898–24911, 2021.
Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, and Ser-Nam Lim. Adavit: Adaptive vision transformers for efficient image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12309–12318, 2022.
Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
Mohsen Fayyaz, Soroush Abbasi Koohpayegani, Farnoush Rezaei Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, and Jürgen Gall. Adaptive token sampling for efficient vision transformers. In European Conference on Computer Vision, pages 396–414. Springer, 2022.
Xiangcheng Liu, Tianyi Wu, and Guodong Guo. Adaptive sparse vit: Towards learnable adaptive token pruning by fully exploiting self-attention. arXiv preprint arXiv:2209.13802, 2022.
Quan Tang, Bowen Zhang, Jiajun Liu, Fagui Liu, and Yifan Liu. Dynamic token pruning in plain vision transformers for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 777–786, 2023.
Yuang Liu, Qiang Zhou, Jing Wang, Zhibin Wang, Fan Wang, Jun Wang, and Wei Zhang. Dynamic token-pass transformers for semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1827–1836, 2024.
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, and Judy Hoffman. Token merging: Your vit but faster. arXiv preprint arXiv:2210.09461, 2022.
Chenyang Lu, Daan de Geus, and Gijs Dubbelman. Content-aware token sharing for efficient semantic segmentation with vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23631–23640, 2023.
Ma Yi-de, Liu Qing, and Qian Zhi-Bai. Automated image segmentation using improved pcnn model based on cross-entropy. In Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004., pages 743–746. IEEE, 2004.
Xiangxiang Chu, Bo Zhang, Zhi Tian, Xiaolin Wei, and Huaxia Xia. Do we really need explicit position encodings for vision transformers? arXiv preprint arXiv:2102.10882, 2021.
Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, and Chuang Gan. Moduleformer: Modularity emerges from mixture-of-experts. arXiv preprint arXiv:2306.04640, 2023.
MMSegmentation Contributors. MMSegmentation: Openmmlab semantic segmentation toolbox and benchmark. https://github.com/open-mmlab/mmsegmentation, 2020.
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.
Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16. PMLR, 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96974
dc.description.abstract: 目前的自動駕駛系統中使用深度學習技術來感知環境的方法越來越多,然而大多數的方法不會考慮車輛駕駛時感知周圍環境有不同程度的重要性,讓模型在每個區域上消耗相同的運算成本,浪費不重要區域的運算資源。因此本研究提出一套依據重要性資訊,動態調整模型針對不同影像區域稀疏化程度的方法,藉此控制模型在不同區域的辨識性能。本研究專注於語意分割任務,提出 IDBS-ViT (Importance-based Dynamic Bi-Level Sparse Vision Transformer) 方法,針對 Pyramid Vision Transformer 編碼器架構中的多頭自注意力機制進行稀疏化,採用 Token 剪枝 (Token Pruning) 稀疏化架構,同時提出雙層式的剪枝決策架構:由上層的策略網路決定各區域的剪枝率,下層的剪枝模組決定實際每個 Token 的去留。藉由雙層架構,上下層模組分別控制大範圍的區域與處理 Token 層級的細微決策,分層管理稀疏化的範圍和細節。最後,IDBS-ViT 能夠依據重要性不同動態調整運算量:在提供模型最高重要度的情況下,SegFormer-B0 模型的 FPS (frames per second) 從原本的 9.05 提升到 13.62,提升約 50%;在 Cityscapes 驗證集中模型的 mIoU 為 74.05,相比原來的 SegFormer-B0 只下降 0.8,且相比於傳統的靜態剪枝率方法有更好的性能與運算速度權衡。此外,也在 CARLA 模擬器的連續幀場景中進行推論,展示 IDBS-ViT 在不同重要性設計下能夠改變關注區域,並依據影像狀態減少運算量。 (zh_TW)
dc.description.abstract: In autonomous driving systems, deep learning techniques are widely used for environmental perception. However, most approaches overlook the varying importance of different regions, leading to uniform computational cost across all areas and wasting resources on less critical ones. This study introduces IDBS-ViT (Importance-based Dynamic Bi-Level Sparse Vision Transformer), which dynamically adjusts sparsity levels in different image regions based on importance, optimizing recognition performance and efficiency in semantic segmentation tasks. Focusing on the Pyramid Vision Transformer encoder, IDBS-ViT applies token pruning to the multi-head self-attention. A bi-level pruning framework enables coarse and fine control over sparsity: an upper-level policy network sets the pruning ratio of each region, and a lower-level pruning module decides which individual tokens to keep. When the model is given the highest importance level, IDBS-ViT improves the FPS (frames per second) of the SegFormer-B0 model from 9.05 to 13.62, an approximately 50% increase. On the Cityscapes validation set, the model achieves an mIoU of 73.85, only a 0.8-point drop compared to the original SegFormer-B0, and it demonstrates a better trade-off between performance and speed than traditional static pruning-rate methods. In continuous-frame inference within the CARLA simulator, IDBS-ViT is further shown to shift its focus regions under different importance designs and to reduce computational load according to image conditions. (en)
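To make the bi-level idea in the abstract concrete, here is a minimal PyTorch-style sketch of the two decision levels: an upper-level network that maps a per-region importance score to a pruning ratio, and a lower-level module that keeps the top-scoring tokens of each region. All names (PruningRatioPolicy, TokenPruner), the candidate-ratio set, the greedy ratio choice, and the linear token-scoring head are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class PruningRatioPolicy(nn.Module):
    """Upper level: pick a pruning ratio per image region from its importance score."""
    def __init__(self, candidate_ratios=(0.0, 0.25, 0.5, 0.75)):
        super().__init__()
        # Candidate pruning ratios the policy can choose from (assumed values).
        self.register_buffer("ratios", torch.tensor(candidate_ratios))
        self.mlp = nn.Sequential(
            nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, len(candidate_ratios))
        )

    def forward(self, importance):
        # importance: (B, R) scores in [0, 1], one per region.
        logits = self.mlp(importance.unsqueeze(-1))  # (B, R, K) ratio logits
        choice = logits.argmax(dim=-1)               # greedy choice at inference
        return self.ratios[choice]                   # (B, R) pruning ratio per region

class TokenPruner(nn.Module):
    """Lower level: score the tokens of one region and keep the top (1 - ratio) fraction."""
    def __init__(self, dim):
        super().__init__()
        self.keep_score = nn.Linear(dim, 1)          # per-token keep score

    def forward(self, tokens, ratio):
        # tokens: (B, N, C) for a single region; ratio: float pruning ratio.
        n_keep = max(1, int(tokens.size(1) * (1.0 - ratio)))
        scores = self.keep_score(tokens).squeeze(-1)           # (B, N)
        keep_idx = scores.topk(n_keep, dim=1).indices          # (B, n_keep)
        keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return torch.gather(tokens, 1, keep_idx)               # (B, n_keep, C)

# Example: four regions of 256 tokens each; one high- and three low-importance regions.
policy, pruner = PruningRatioPolicy(), TokenPruner(dim=64)
importance = torch.tensor([[1.0, 0.2, 0.2, 0.2]])              # (B=1, R=4)
ratios = policy(importance)                                    # per-region pruning ratios
regions = [torch.randn(1, 256, 64) for _ in range(4)]
kept = [pruner(t, r.item()) for t, r in zip(regions, ratios[0])]
```

In a pipeline like the one described, the kept tokens would feed the PVT/SegFormer multi-head self-attention, and pruned tokens are typically carried forward unchanged so the segmentation decoder still sees a full token grid; the sketch omits those details.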
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-25T16:18:18Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-02-25T16:18:18Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Thesis Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Table of Contents vi
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
1.3 Research Contributions 3
Chapter 2 Literature Review 4
2.1 Optimizing Models for High-Risk Driving Regions 4
2.2 Semantic Segmentation 5
2.2.1 Vision Transformer 6
2.3 Model Compression 7
2.3.1 Model Pruning 8
2.3.2 Quantization 8
2.3.3 Knowledge Distillation 9
2.3.4 Neural Architecture Search (NAS) 9
2.4 Dynamic Neural Networks 10
2.4.1 Dynamic Neural Networks Applied to Transformers 11
Chapter 3 Research Methods 14
3.1 Problem Description and Metrics 14
3.1.1 Problem Description 14
3.1.2 Model Computational Efficiency Metrics 16
3.1.3 Model Performance Metrics 17
3.2 Model Architecture 18
3.2.1 SegFormer 18
3.2.2 Overall Model Architecture 20
3.2.3 Pruning Ratio Policy Network 22
3.2.4 Token Pruning Module 26
3.2.5 Training Methods and Procedure 28
Chapter 4 Experimental Results, Analysis, and Discussion 31
4.1 Experimental Setup 31
4.1.1 Datasets and Experimental Data 31
4.1.1.1 Cityscapes Dataset 31
4.1.1.2 CARLA Dataset and CARLA Scenario 15 32
4.1.2 Hardware and Software Environment 34
4.1.3 Model Hyperparameter Settings 35
4.2 Experimental Analysis 35
4.2.1 Analysis of Computational Efficiency and Performance Metrics 36
4.2.2 Pruning Analysis 39
4.2.3 Importance-Region Experiments and Importance-Input Analysis 47
4.2.4 CARLA Scenario Tests 51
4.2.5 Ablation Studies 55
Chapter 5 Conclusions and Future Work 57
5.1 Conclusions 57
5.2 Future Recommendations 57
References 59
dc.language.iso: zh_TW
dc.title: 依據區域重要性的動態雙階層式稀疏化 Vision Transformer 方法 (zh_TW)
dc.title: A Dynamic Bi-level Sparse Vision Transformer Method based on Regional Importance (en)
dc.type: Thesis
dc.date.schoolyear: 113-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 詹魁元;蕭德聖 (zh_TW)
dc.contributor.oralexamcommittee: Kuei-Yuan Chan;Te-Sheng Hsiao (en)
dc.subject.keyword: Vision Transformer, 模型稀疏化, 動態神經網路, Token pruning (zh_TW)
dc.subject.keyword: Vision Transformer, Model sparsification, Dynamic neural network, Token pruning (en)
dc.relation.page: 70
dc.identifier.doi: 10.6342/NTU202500104
dc.rights.note: Consent to license granted (restricted to on-campus access)
dc.date.accepted: 2025-02-10
dc.contributor.author-college: College of Engineering (工學院)
dc.contributor.author-dept: Department of Mechanical Engineering (機械工程學系)
dc.date.embargo-lift: 2030-01-14
Appears in Collections: Department of Mechanical Engineering

Files in This Item:
File: ntu-113-1.pdf (not currently authorized for public access)
Size: 8.88 MB
Format: Adobe PDF