NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67115

Full metadata record (DC field: value [language])
dc.contributor.advisor: 莊永裕 (Yung-Yu Chuang)
dc.contributor.author: Jo-Han Hsu [en]
dc.contributor.author: 許若漢 (Jo-Han Hsu) [zh_TW]
dc.date.accessioned: 2021-06-17T01:20:20Z
dc.date.available: 2018-08-14
dc.date.copyright: 2017-08-14
dc.date.issued: 2017
dc.date.submitted: 2017-08-10
dc.identifier.citation:
[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, Mar 1994.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[3] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In NIPS 2014 Deep Learning and Representation Learning Workshop, 2014.
[4] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. Deep networks with stochastic depth. In Proc. Euro. Conf. Computer Vision (ECCV), 2016.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012.
[6] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv:1312.4400, 2013.
[7] Y. Luo, C. C. Chiu, N. Jaitly, and I. Sutskever. Learning online alignments with continuous rewards policy gradient. In Proc. Int'l Conf. Acoustics, Speech and Signal Processing (ICASSP), 2017.
[8] G. Pereyra, G. Tucker, J. Chorowski, L. Kaiser, and G. E. Hinton. Regularizing neural networks by penalizing confident output distributions. In Proc. Int'l Conf. Learning Representations (ICLR), 2017.
[9] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. FitNets: Hints for thin deep nets. In Proc. Int'l Conf. Learning Representations (ICLR), 2015.
[10] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. Int'l Conf. Learning Representations (ICLR), 2015.
[11] S. Singh, D. Hoiem, and D. Forsyth. Swapout: Learning an ensemble of deep architectures. In Advances in Neural Information Processing Systems (NIPS), 2016.
[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research (JMLR), 2014.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2015.
[14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[15] R. J. Williams and J. Peng. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 1991.
[16] L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. DisturbLabel: Regularizing CNN on the loss layer. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[17] S. Zagoruyko and N. Komodakis. Wide residual networks. In Proc. British Machine Vision Conf. (BMVC), 2016.
[18] S. Zagoruyko and N. Komodakis. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In Proc. Int'l Conf. Learning Representations (ICLR), 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67115
dc.description.abstract: Various methods for regularizing neural networks have been shown in earlier work to noticeably improve performance in supervised learning. Knowledge distillation is another such method: unlike conventional training, which uses only the hard targets provided by the dataset, it additionally learns from soft targets supplied by a stronger model, and this yields very good results. In most cases, however, such a stronger model does not exist. We propose a deep-model training method regularized by knowledge from past iterations (sDER), which lets a model regularize itself by distilling the knowledge it has learned during its own past training, and thereby removes knowledge distillation's dependence on a separate, stronger model.
Our optimization considers both the current hard labels and the past soft labels, and experiments show that this approach is very effective. For example, on CIFAR-10, a well-known dataset in image classification, a network with only 110 layers trained with our method can even outperform one with 1,202 layers, while the method requires little extra time or resources, is simple, and can be applied to different domains. [zh_TW]
dc.description.abstract: Regularizing neural networks has been shown to improve performance in supervised learning. For example, Hinton et al. proposed knowledge distillation, which regularizes a model with soft targets provided by other, stronger models. However, the assumption that a powerful teacher model is available does not hold for many problems. This thesis proposes a method called self-distillation by epoch regularization (sDER), which distills information from the past epoch models that one obtains anyway during training. An epoch model is simply the model obtained after one epoch of training. By selecting several epoch models, we form a good ensemble model for regularization.
Through optimization that considers both fitting the hard targets well and the epoch regularization, performance can be improved.
For example, on CIFAR-10, a shallower network with 110 layers trained with the proposed sDER method outperforms its counterpart with 1,202 layers. Furthermore, this versatile method can improve the performance of many different models without much extra training overhead. [en]
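As a rough illustration of the loss the abstracts describe (not the thesis's actual implementation), the following minimal PyTorch-style sketch combines the usual cross-entropy on the hard labels with a distillation term toward soft targets averaged over several saved epoch models. The function name sder_style_loss, the argument epoch_model_logits, and the values of alpha and temperature are illustrative assumptions.

import torch
import torch.nn.functional as F

def sder_style_loss(logits, hard_labels, epoch_model_logits, alpha=0.5, temperature=2.0):
    # Supervised term on the dataset's hard targets.
    hard_loss = F.cross_entropy(logits, hard_labels)

    # Average the softened predictions of the selected past epoch models into
    # one ensemble soft target; epoch_model_logits is assumed to be a list of
    # logit tensors with the same shape as `logits`.
    soft_targets = torch.stack(
        [F.softmax(l / temperature, dim=1) for l in epoch_model_logits]
    ).mean(dim=0)

    # Distillation term: KL divergence between the current model's softened
    # predictions and the ensemble soft target, scaled by temperature^2 as in
    # standard knowledge distillation.
    soft_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * (temperature ** 2)

    # Weighted combination of the hard-target and soft-target terms.
    return (1.0 - alpha) * hard_loss + alpha * soft_loss

Under this reading, ordinary training changes only by caching the outputs (or checkpoints) of a few chosen epoch models and adding the second term, which is consistent with the abstract's claim that little extra time or computation is needed.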
dc.description.provenance: Made available in DSpace on 2021-06-17T01:20:20Z (GMT). No. of bitstreams: 1
ntu-106-R04922012-1.pdf: 954460 bytes, checksum: f3d94dc9e917c5163985f4497e688868 (MD5)
Previous issue date: 2017 [en]
dc.description.tableofcontents:
口試委員審定書 (Certification by the oral examination committee) ii
誌謝 (Acknowledgements) iii
摘要 (Abstract in Chinese) iv
Abstract v
1 Introduction 1
2 Related work 3
3 Self-distillation by epoch regularization 5
3.1 Label smoothing 5
3.2 Observation 6
3.3 Method 7
3.3.1 Epoch model selection 8
3.3.2 Optimization with epoch regularization 9
3.4 Discussions 10
4 Experiments 11
4.1 Dataset 11
4.1.1 CIFAR-10 11
4.1.2 CIFAR-100 12
4.2 Experiment setting 12
4.2.1 CIFAR-10 and CIFAR-100 12
4.2.2 Platform 12
4.3 Parameters selection 13
4.4 Baseline 15
4.4.1 Network in network 15
4.4.2 FitNets 15
4.4.3 Residual networks 15
4.4.4 Deep network with stochastic depth 16
4.4.5 Swapout 16
4.4.6 Wide residual networks 17
4.5 Comparisons with competing methods 17
4.6 Comparisons with regularization terms 18
4.6.1 Knowledge distillation 19
4.6.2 Penalizing confidence output distributions 19
4.7 Analysis 20
4.7.1 Effectiveness of sDER 20
4.7.2 Extra time cost 21
5 Conclusion 23
Bibliography 24
dc.language.iso: en
dc.subject: 卷積類神經網路 (convolutional neural networks) [zh_TW]
dc.subject: 影像辨識 (image classification) [zh_TW]
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: image classification [en]
dc.subject: deep learning [en]
dc.subject: convolutional neural networks [en]
dc.title: 以知識迭代為規範的深度模型訓練 (Training deep models regularized by iterated knowledge) [zh_TW]
dc.title: Distilling from the Past: Self-Distillation by Using Epoch Regularization [en]
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: 碩士 (Master)
dc.contributor.coadvisor: 林彥宇 (Yen-Yu Lin)
dc.contributor.oralexamcommittee: 陳煥宗 (Hwann-Tzong Chen), 賴尚宏 (Shang-Hong Lai), 黃春融 (Chun-Rong Huang)
dc.subject.keyword: 影像辨識, 深度學習, 卷積類神經網路 (image classification, deep learning, convolutional neural networks) [zh_TW]
dc.subject.keyword: image classification, deep learning, convolutional neural networks [en]
dc.relation.page: 25
dc.identifier.doi: 10.6342/NTU201702904
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2017-08-11
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
ntu-106-1.pdf (932.09 kB, Adobe PDF): not authorized for public access