NTU Theses and Dissertations Repository (DSpace)
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89304
Full metadata record (DC field: value [language])
dc.contributor.advisor: 陳祝嵩 [zh_TW]
dc.contributor.advisor: Chu-Song Chen [en]
dc.contributor.author: 鄒芷涵 [zh_TW]
dc.contributor.author: Chih-Han Tsou [en]
dc.date.accessioned: 2023-09-07T16:26:59Z
dc.date.available: 2024-08-04
dc.date.copyright: 2023-09-11
dc.date.issued: 2023
dc.date.submitted: 2023-08-05
dc.identifier.citationC.-F. Chen, Q. Fan, and R. Panda. Crossvit: Cross-attention multi-scale vision transformer for image classification. ICCV, 2021.
M. Chen, H. Peng, J. Fu, and H. Ling. Autoformer: Searching transformers for visual recognition. ICCV, 2021.
M. Chen, K. Wu, B. Ni, H. Peng, B. Liu, J. Fu, H. Chao, and H. Ling. Searching the search space of vision transformer. In NeurIPS, 2021.
Y. Ci, C. Lin, M. Sun, B. Chen, H. Zhang, and W. Ouyang. Evolving search space for neural architecture search. ICCv, 2020.
E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le. Randaugment: Practical automated data augmentation with a reduced search space. CVPR Workshops, 2019.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.
Z. Guo, X. Zhang, H. Mu, W. Heng, Z. Liu, Y. Wei, and J. Sun. Single path one-shot neural architecture search with uniform sampling. In ECCV, 2020.
Y. Le and X. S. Yang. Tiny imagenet visual recognition challenge. 2015.
Z. Lin, S. Geng, R. Zhang, P. Gao, G. de Melo, X. Wang, J. Dai, Y. Qiao, and H. Li. Frozen clip models are efficient video learners. arXiv preprint arXiv:2208.03550, 2022.
H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search. In ICLR. 2019.
J. Liu, H. Li, G. Song, X. Huang, and Y. Liu. Uninet: Unified architecture search with convolution, transformer, and mlp. ECCV.2022.
Z. Liu, Y. Lin. Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. ICCV, 2021.
I. Loshchiloy and F. Hutter. Decoupled weight decay regularization. In ICLR, 2017.
J. Peng, J. Zhang, C. Li, G. Wang, X. Liang, and L. Lin. Pi-nas: Improving neural architecture search by reducing supernet training consistency shift. ICCV, 2021.
H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean. Efficient neural architecture search via parameters sharing. In ICML, 2018.
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021.
S. Ren, D. Zhou, S. He, J. Feng, and X. Wang. Shunted self-attention via multi-scale token aggregation. CVPR, 2021.
X. Su, S. You, J. Xie, M. Zheng, F. Wang, C. Qian, C. Zhang, X. Wang, and C. Xu. Vitas: Vision transformer architecture search. In ECCV, 2021.
Y.-L. Sung, J. Cho, and M. Bansal. Lst: Ladder side-tuning for parameter and memory efficient transfer learning. ArXiv, 2022.
A. Wan,X. Dai, P. Zhang, Z. He, Y. Tian, S. Xie, B. Wu, M. Yu, T. Xu, K. Chen, P. Vajda, and J. E. Gonzalez. Fbnetv2: Differentiable neural architecture search for spatial and channel dimensions. In CVPR, 2020.
W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao. Pyramid vision trans former: A versatile backbone for dense prediction without con-volutions. ICCV. 2021.
Z. Xia, X.Pan, S. Song, L. E. Li, and G. Huang. Vision transformer with deformable attention. In CVPR, 2022.
J. Yang, C. Li, P Zhang, X. Dai, B. Xiao, L. Yuan, and J. Gao. Focal attention for long-range interactions in vision transformers. In NeurIPS, 2021.
S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. J. Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. ICCV, 2019.
H. Zhang, M. Cisse, Y. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. ICLR. 2018.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89304
dc.description.abstract: Recently, Transformer-based methods have attracted broad attention and research interest for their outstanding performance on various downstream tasks, yet their architectures still depend heavily on domain expertise and manual design. In the neural architecture search (NAS) literature, a growing number of studies target Transformers, but most search within a fixed search space; only very few optimize the search space before searching for architectures. And although those few methods do update the search space, they cannot constrain model size at the source, so they often produce very large and complex models that consume substantial computational resources.
To address these problems, we introduce an optimization framework that updates the Transformer architecture search space while respecting resource constraints. The framework lets the search space evolve gradually from the previous one under specified constraints (e.g., model size, FLOPS) for better architecture exploration. Specifically, an approximate accuracy gradient is computed from the impact of each search dimension on accuracy. We then update the search space along the legitimate direction closest to this approximate accuracy gradient. Experiments on various datasets, including Cifar10, Cifar100, Tiny ImageNet, and SUN397, show that the proposed method consistently discovers more lightweight architectures while outperforming both the original models and the models found by other NAS methods. We further show that the proposed method helps explore effective and lightweight adapters for the recently popular CLIP model on new downstream tasks. [zh_TW]
dc.description.abstract: Recently, transformer-based methods have gained significant attention and research interest due to their superior performance on various tasks, whereas their architectures still rely heavily on manual design by humans. Although neural architecture search (NAS)-based methods have been introduced to automate the process, these methods either require humans to manually specify a fixed search space for architecture search, or allow search-space updates but usually produce large and complex models to reach satisfactory performance. To address these issues, we introduce a constrained optimization framework for resource-aware search space exploration for transformer architecture search (Se-TAS), which allows the search space to evolve gradually from the previous one under user-specified constraints (e.g., model size, FLOPS) for better architecture exploration. Specifically, the impact of each search dimension of a search space is computed as the unit accuracy differential over that dimension, yielding an approximate accuracy gradient. We then update the search space along the legitimate direction with the highest cosine similarity to this approximate accuracy gradient. Extensive experiments on various benchmarks, including Cifar10, Cifar100, Tiny ImageNet, and SUN397, demonstrate that our method consistently finds much more lightweight architectures while achieving better performance than the original models and the models found by the compared NAS methods. Furthermore, we show that the proposed method can help explore effective and lightweight adapters that adapt recently popular foundation models to new downstream tasks. [en]
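The update rule described in the abstract is concrete enough to sketch. The following is a minimal illustration rather than the thesis's implementation: `estimate_accuracy` and `model_size` are hypothetical stand-ins for supernet evaluation and the model-size functions of Appendix A, and the unit-step direction set and toy objective are assumptions introduced here for demonstration.

```python
# Minimal sketch (not the thesis code) of the abstract's update rule:
# approximate an accuracy gradient from per-dimension unit accuracy
# differentials, then step the search-space center along the feasible
# direction with the highest cosine similarity to that gradient.
import itertools
import numpy as np

def approx_accuracy_gradient(center, estimate_accuracy):
    """Unit accuracy differential along each search dimension."""
    base = estimate_accuracy(center)
    grad = np.zeros(len(center))
    for i in range(len(center)):
        probe = center.astype(float)
        probe[i] += 1.0                      # one unit step in dimension i
        grad[i] = estimate_accuracy(probe) - base
    return grad

def evolve_search_space(center, estimate_accuracy, model_size, budget):
    """One evolution step: among unit directions in {-1, 0, +1}^d that
    satisfy the resource budget, pick the one with the highest cosine
    similarity to the approximate accuracy gradient."""
    grad = approx_accuracy_gradient(center, estimate_accuracy)
    best_dir, best_sim = None, -np.inf
    for d in itertools.product((-1.0, 0.0, 1.0), repeat=len(center)):
        d = np.asarray(d)
        if not d.any():
            continue                         # skip the zero direction
        candidate = center + d
        if (candidate <= 0).any() or model_size(candidate) > budget:
            continue                         # violates the resource constraint
        sim = float(grad @ d) / (np.linalg.norm(grad) * np.linalg.norm(d) + 1e-12)
        if sim > best_sim:
            best_dir, best_sim = d, sim
    return center if best_dir is None else center + best_dir

# Toy demo: accuracy peaks at (8, 4); the crude "size" x*y is capped at 30.
if __name__ == "__main__":
    acc = lambda x: -((x[0] - 8.0) ** 2 + (x[1] - 4.0) ** 2)
    size = lambda x: x[0] * x[1]
    center = np.array([4.0, 4.0])
    for _ in range(3):
        center = evolve_search_space(center, acc, size, budget=30)
    print(center)                            # -> [7. 4.]
```

In the toy run, the center climbs from (4, 4) toward the accuracy peak at (8, 4) but settles at (7, 4), the best point that still fits the size budget, mirroring how the resource constraint steers exploration toward lighter architectures.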
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-07T16:26:59Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-09-07T16:26:59Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 (Abstract in Chinese) iii
Abstract v
Contents vii
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related work 4
2.1 Vision transformer 4
2.2 Neural architecture search 5
Chapter 3 Method 7
3.1 Problem formulation 7
3.2 Search space 9
3.3 Resource-aware search space exploration 10
3.4 Discussion 14
Chapter 4 Experiments 17
4.1 Implementation details 17
4.2 Results on image classification 19
4.2.1 ViT model 19
4.2.2 Se-TAS in other Transformer models 20
4.2.3 Tiny ImageNet with Shunted Transformer and Swin Transformer 22
4.2.4 Comparison with S3's search direction finding 22
4.2.5 Comparisons between S3 and Se-TAS on the Cifar10, Cifar100, Tiny ImageNet, and SUN397 datasets 23
Chapter 5 Conclusion 32
References 33
Appendix A Model size function 36
A.1 Vision transformer 36
A.2 Swin transformer 37
A.3 Shunted transformer 38
dc.language.iso: en
dc.subject: 神經網路搜索 (neural architecture search) [zh_TW]
dc.subject: 資源限制 (resource constraints) [zh_TW]
dc.subject: resource-aware [en]
dc.subject: Transformer [en]
dc.subject: Neural Architecture Search [en]
dc.title: 於有限資源中Transformer模型搜索空間之探索 (Resource-Aware Search Space Exploration for Transformer Architecture Search) [zh_TW]
dc.title: Resource-Aware Search Space Exploration for Transformer Architecture Search [en]
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳駿丞;李宏毅;孫民 [zh_TW]
dc.contributor.oralexamcommittee: Jun-Cheng Chen;Hung-Yi Lee;Min Sun [en]
dc.subject.keyword: 神經網路搜索 (neural architecture search), 資源限制 (resource constraints) [zh_TW]
dc.subject.keyword: Transformer, Neural Architecture Search, resource-aware [en]
dc.relation.page: 39
dc.identifier.doi: 10.6342/NTU202302168
dc.rights.note: Authorization granted (access restricted to campus)
dc.date.accepted: 2023-08-08
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
dc.date.embargo-lift: 2024-08-04
Appears in collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
File: ntu-111-2.pdf
Size/Format: 2.01 MB, Adobe PDF
Access: NTU campus IP addresses only (off-campus users should connect via the library VPN service)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
