Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85632

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 廖世偉(Shih-Wei Liao) | |
| dc.contributor.author | Chih-Chieh Wang | en |
| dc.contributor.author | 王致傑 | zh_TW |
| dc.date.accessioned | 2023-03-19T23:20:09Z | - |
| dc.date.copyright | 2022-07-12 | |
| dc.date.issued | 2022 | |
| dc.date.submitted | 2022-06-28 | |
| dc.identifier.citation | 1. Biesialska, M., K. Biesialska, and M.R. Costa-jussà, Continual lifelong learning in natural language processing: A survey. arXiv preprint arXiv:2012.09823, 2020. 2. Devlin, J., et al., BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. 3. Liu, Y., et al., RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019. 4. McCloskey, M. and N.J. Cohen, Catastrophic interference in connectionist networks: The sequential learning problem, in Psychology of Learning and Motivation. 1989, Elsevier. p. 109-165. 5. Rebuffi, S.-A., et al., iCaRL: Incremental classifier and representation learning. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. 6. Castro, F.M., et al., End-to-end incremental learning. in Proceedings of the European Conference on Computer Vision (ECCV). 2018. 7. Arumae, K. and P. Bhatia, CALM: Continuous Adaptive Learning for Language Modeling. arXiv preprint arXiv:2004.03794, 2020. 8. Xu, H., et al., Pre-trained models: Past, present and future. arXiv preprint arXiv:2106.07139, 2021. 9. Ruder, S., An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017. 10. Strubell, E., A. Ganesh, and A. McCallum, Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243, 2019. 11. Ke, Z., H. Xu, and B. Liu, Adapting BERT for continual learning of a sequence of aspect sentiment classification tasks. arXiv preprint arXiv:2112.03271, 2021. 12. Houlsby, N., et al., Parameter-efficient transfer learning for NLP. in International Conference on Machine Learning. 2019. PMLR. 13. Pfeiffer, J., et al., AdapterHub: A framework for adapting transformers. arXiv preprint arXiv:2007.07779, 2020. 14. Raffel, C., et al., Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2019. 15. Pfeiffer, J., et al., AdapterFusion: Non-destructive task composition for transfer learning. arXiv preprint arXiv:2005.00247, 2020. 16. Vaswani, A., et al., Attention is all you need. Advances in Neural Information Processing Systems, 2017. 30. 17. McCann, B., et al., The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730, 2018. 18. Bhargava, P., A. Drozd, and A. Rogers, Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics. arXiv preprint arXiv:2110.01518, 2021. 19. Jiao, X., et al., TinyBERT: Distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351, 2019. 20. Radford, A., et al., Language models are unsupervised multitask learners. OpenAI Blog, 2019. 1(8): p. 9. 21. Kirkpatrick, J., et al., Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 2017. 114(13): p. 3521-3526. 22. Bapna, A., N. Arivazhagan, and O. Firat, Simple, scalable adaptation for neural machine translation. arXiv preprint arXiv:1909.08478, 2019. 23. Artetxe, M., S. Ruder, and D. Yogatama, On the cross-lingual transferability of monolingual representations. arXiv preprint arXiv:1910.11856, 2019. 24. Pfeiffer, J., et al., MAD-X: An adapter-based framework for multi-task cross-lingual transfer. arXiv preprint arXiv:2005.00052, 2020. 25. Kim, H., et al., An Alternating Training Method of Attention-Based Adapters for Visual Explanation of Multi-Domain Satellite Images. IEEE Access, 2021. 9: p. 62332-62346. 26. Shazeer, N. and M. Stern, Adafactor: Adaptive learning rates with sublinear memory cost. in International Conference on Machine Learning. 2018. PMLR. 27. Loshchilov, I. and F. Hutter, Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85632 | - |
| dc.description.abstract | 在真實世界的應用中,新的資料或任務會逐一進入系統,神經網路模型一旦直接學習這些新的任務便會導致災難性遺忘(Catastrophic Forgetting)的發生,這迫使模型在學習新資料的同時必須連同舊資料一起重新訓練,否則就必須訓練一個新的模型去對應新的任務,兩種方法皆造成時間或儲存空間上的資源浪費。 持續學習(Continual Learning)旨在研究如何讓深度神經網路在學習新的知識的同時不會遺忘過去所學習的知識。然而,過去的研究會因為模型限制導致其無法應對各種類型的任務,或無法提供可跟一般的深度學習模型比擬的效能。本文提出一個基於T5(Unified Text-to-Text Transformer)和適配器(Adapter)技術的持續學習框架:T5-CL。相較於預訓練模型,該框架在持續學習上對於時間和空間的利用效率更好,能在不做任何改動的情況下應對各種類型任務,且提供了穩定具競爭力的效能,此外,該框架提供了一個統一的端對端介面,使用者僅需處理資料前處理和提供該資料集的評分公式即可訓練,這幫助使用者節省了對於新任務的開發時間。 | zh_TW |
| dc.description.abstract | In real-world applications, new data and tasks enter the system sequentially. When a deep neural network learns these new tasks directly, it suffers from catastrophic forgetting, which forces engineers either to re-train the model on both the old and the new data or to train a separate model for each new task; both approaches waste time or storage space. Continual learning studies how to let a deep neural network acquire new knowledge without forgetting the knowledge it learned in the past. However, due to architectural limitations, past approaches either cannot handle various types of tasks or cannot provide performance comparable to general neural networks. We therefore propose T5-CL, a continual learning framework based on T5 (Unified Text-to-Text Transformer) and adapters. The framework copes with various types of tasks without modifying the model architecture and provides stable, competitive performance with better time and space efficiency. Moreover, it offers a unified end-to-end interface: users only need to handle data preprocessing and supply the evaluation criteria for their dataset, which saves development time. We further demonstrate the utility of the framework through extensive experiments. (An illustrative adapter sketch follows the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2023-03-19T23:20:09Z (GMT). No. of bitstreams: 1 U0001-2606202213290400.pdf: 3474640 bytes, checksum: 74f995fbe950f353fa5a850ee94bdd6e (MD5) Previous issue date: 2022 | en |
| dc.description.tableofcontents | Thesis Committee Certification I Acknowledgements II Abstract (Chinese) III Abstract IV Table of Contents V List of Figures VII Chapter 1 Introduction 2 Chapter 2 Preliminary 5 2.1 Transformer 5 2.2 BERT 6 2.3 T5 10 Chapter 3 Related Work 12 3.1 Desired properties for a CL system 12 3.2 Related machine learning techniques 13 3.3 Related works to CL 14 3.3.1 Traditional methods 14 3.3.2 Adapter-BERT 15 3.4 Summary 20 Chapter 4 Method 21 4.1 T5-CL 21 4.2 Adapter fusion layer 24 Chapter 5 Experiments 27 5.1 Datasets 27 5.2 Baseline Methods 30 5.3 Experiment Setup 30 5.3.1 Training 30 5.3.2 Hyperparameters 31 5.4 Results and Analysis 31 Chapter 6 Conclusion 35 REFERENCE 36 Appendix 1 38 | |
| dc.language.iso | zh-TW | |
| dc.subject | 預訓練模型 | zh_TW |
| dc.subject | 持續學習 | zh_TW |
| dc.subject | 災難性遺忘 | zh_TW |
| dc.subject | 適配器 | zh_TW |
| dc.subject | 端對端框架 | zh_TW |
| dc.subject | End-to-End Framework | en |
| dc.subject | Continual Learning | en |
| dc.subject | Catastrophic Forgetting | en |
| dc.subject | Pre-trained Model | en |
| dc.subject | Adapter | en |
| dc.title | T5-CL: 一個應用於不同類型分類任務的端對端持續學習框架 | zh_TW |
| dc.title | T5-CL: An End-to-End Continual Learning Framework Applying on Different Types of Classification Tasks | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 110-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 戴敏育(Min-Yuh Day),孫瑞鴻(Ray-Hon Sun) | |
| dc.subject.keyword | 持續學習,災難性遺忘,預訓練模型,適配器,端對端框架 | zh_TW |
| dc.subject.keyword | Continual Learning, Catastrophic Forgetting, Pre-trained Model, Adapter, End-to-End Framework | en |
| dc.relation.page | 40 | |
| dc.identifier.doi | 10.6342/NTU202201121 | |
| dc.rights.note | Consent to authorize (open access worldwide) | |
| dc.date.accepted | 2022-06-30 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2022-07-12 | - |
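
The abstract above describes T5-CL as combining a pretrained text-to-text backbone (T5) with adapters so that new tasks can be learned without overwriting previously acquired knowledge. As a minimal illustrative sketch of that general adapter idea (not the thesis's actual T5-CL implementation; the module names, bottleneck size, and activation choice here are assumptions), the following PyTorch snippet shows a small bottleneck adapter, in the spirit of Houlsby et al. (reference 12), wrapped around a frozen sublayer, with only the adapter parameters left trainable:

```python
# Illustrative sketch only: a bottleneck adapter attached to a frozen sublayer.
# Names and sizes are assumptions, not the thesis's T5-CL code.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small trainable module inserted into a frozen pretrained layer."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # project down
        self.up = nn.Linear(bottleneck_size, hidden_size)    # project back up
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only adds a small learned correction
        # to the frozen layer's output.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class AdaptedBlock(nn.Module):
    """Wraps an existing sublayer, freezes it, and adds a per-task adapter."""

    def __init__(self, sublayer: nn.Module, hidden_size: int):
        super().__init__()
        self.sublayer = sublayer
        for p in self.sublayer.parameters():
            p.requires_grad = False  # the shared backbone stays fixed across tasks
        self.adapter = BottleneckAdapter(hidden_size)  # only this part is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.sublayer(x))


if __name__ == "__main__":
    hidden = 512
    # Stand-in for one frozen feed-forward block of a pretrained transformer.
    backbone_ffn = nn.Sequential(
        nn.Linear(hidden, 2048), nn.ReLU(), nn.Linear(2048, hidden)
    )
    block = AdaptedBlock(backbone_ffn, hidden)
    x = torch.randn(2, 16, hidden)  # (batch, sequence length, hidden size)
    print(block(x).shape)           # torch.Size([2, 16, 512])
    trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
    total = sum(p.numel() for p in block.parameters())
    print(f"trainable parameters: {trainable} / {total}")
```

Because gradients only reach the adapter, learning a new task cannot disturb the frozen backbone, which is the basic property adapter-based continual learning relies on to limit catastrophic forgetting; training one small adapter per task is also what makes this style of approach cheaper in time and storage than re-training or duplicating the full pretrained model.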

Appears in Collections: Department of Computer Science and Information Engineering
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| U0001-2606202213290400.pdf | 3.39 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
