Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67115

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 莊永裕(Yung-Yu Chuang) | |
| dc.contributor.author | Jo-Han Hsu | en |
| dc.contributor.author | 許若漢 | zh_TW |
| dc.date.accessioned | 2021-06-17T01:20:20Z | - |
| dc.date.available | 2018-08-14 | |
| dc.date.copyright | 2017-08-14 | |
| dc.date.issued | 2017 | |
| dc.date.submitted | 2017-08-10 | |
| dc.identifier.citation | [1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, Mar 1994.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[3] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In Deep Learning and Representation Learning Workshop (NIPS 2014), 2014.
[4] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. Deep networks with stochastic depth. In Proc. Euro. Conf. Computer Vision (ECCV), 2016.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012.
[6] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv:1312.4400, 2013.
[7] Y. Luo, C.-C. Chiu, N. Jaitly, and I. Sutskever. Learning online alignments with continuous rewards policy gradient. In Proc. Int'l Conf. Acoustics, Speech and Signal Processing (ICASSP), 2017.
[8] G. Pereyra, G. Tucker, J. Chorowski, L. Kaiser, and G. E. Hinton. Regularizing neural networks by penalizing confident output distributions. In Proc. Int'l Conf. Learning Representations (ICLR), 2017.
[9] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. FitNets: Hints for thin deep nets. In Proc. Int'l Conf. Learning Representations (ICLR), 2015.
[10] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. Int'l Conf. Learning Representations (ICLR), 2015.
[11] S. Singh, D. Hoiem, and D. Forsyth. Swapout: Learning an ensemble of deep architectures. In Advances in Neural Information Processing Systems (NIPS), 2016.
[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research (JMLR), 2014.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2015.
[14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[15] R. J. Williams and J. Peng. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 1991.
[16] L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. DisturbLabel: Regularizing CNN on the loss layer. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[17] S. Zagoruyko and N. Komodakis. Wide residual networks. In Proc. British Machine Vision Conf. (BMVC), 2016.
[18] S. Zagoruyko and N. Komodakis. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In Proc. Int'l Conf. Learning Representations (ICLR), 2017. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67115 | - |
| dc.description.abstract | 各種規範類神經網路的方法於先前的論文中被驗證能在監督式學習中提升相當的效果。而萃取知識(knowledge distillation)也是其他的一種,跟傳統訓練類神經網路的方法不同的是,除了用資料集原先就有提供的類別(hard targets)外,他們額外拿其他較為強大的模型所提供的軟性類別(soft targets)來強化學習,進而達到很好的效果,然而大部分情況下,那些較為強大的模型是不存在的。我們提出一種以知識迭代為規範的深度模型訓練的方法(sDER),讓模型可以藉由過去的訓練中提取所學的知識來規範自我,而我們的方法也解決了萃取知識需要其他較為強大的模型的缺點。
我們的優化過程會同時考慮到當前的硬性標籤及過往的軟性標籤,實驗中證明這樣的方法是相當有效的。舉例來說,在影像辨識的領域中著名的CIFAR-10資料集裡,用我們的方法來訓練一個只有110層的類神經網路甚至可以表現得比1202層的還要好,而我們的方法不需要太多額外的時間及資源且相當簡單,而且可以應用在不同的領域上面。 | zh_TW |
| dc.description.abstract | Regularizing neural networks has been shown to improve performance in supervised learning. For example, Hinton et al. proposed knowledge distillation, which regularizes a model with soft targets provided by a stronger teacher model. However, the assumption that such a powerful teacher is available does not hold for many problems. This thesis proposes a method called self-distillation by epoch regularization (sDER), which distills information from the past epoch models that one necessarily obtains during training. An epoch model is simply the model obtained at the end of a training epoch. By selecting several epoch models, we form a good ensemble model for regularization.
By optimizing an objective that both fits the hard targets well and respects the epoch regularization, performance can be improved (an illustrative sketch of one such combined objective is given after the metadata table below). For example, on CIFAR-10, a 110-layer network trained with the proposed sDER method outperforms its counterpart with 1,202 layers. Furthermore, this versatile method improves the performance of many different models without much extra training overhead. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-17T01:20:20Z (GMT). No. of bitstreams: 1 ntu-106-R04922012-1.pdf: 954460 bytes, checksum: f3d94dc9e917c5163985f4497e688868 (MD5) Previous issue date: 2017 | en |
| dc.description.tableofcontents | 口試委員審定書 ii
誌謝 iii
摘要 iv
Abstract v
1 Introduction 1
2 Related work 3
3 Self-distillation by epoch regularization 5
3.1 Label smoothing 5
3.2 Observation 6
3.3 Method 7
3.3.1 Epoch model selection 8
3.3.2 Optimization with epoch regularization 9
3.4 Discussions 10
4 Experiments 11
4.1 Dataset 11
4.1.1 CIFAR-10 11
4.1.2 CIFAR-100 12
4.2 Experiment setting 12
4.2.1 CIFAR-10 and CIFAR-100 12
4.2.2 Platform 12
4.3 Parameters selection 13
4.4 Baseline 15
4.4.1 Network in network 15
4.4.2 FitNets 15
4.4.3 Residual networks 15
4.4.4 Deep network with stochastic depth 16
4.4.5 Swapout 16
4.4.6 Wide residual networks 17
4.5 Comparisons with competing methods 17
4.6 Comparisons with regularization terms 18
4.6.1 Knowledge distillation 19
4.6.2 Penalizing confident output distributions 19
4.7 Analysis 20
4.7.1 Effectiveness of sDER 20
4.7.2 Extra time cost 21
5 Conclusion 23
Bibliography 24 | |
| dc.language.iso | en | |
| dc.subject | 卷積類神經網路 | zh_TW |
| dc.subject | 影像辨識 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | image classification | en |
| dc.subject | deep learning | en |
| dc.subject | convolutional neural networks | en |
| dc.title | 以知識迭代為規範的深度模型訓練 | zh_TW |
| dc.title | Distilling from the Past: Self-Distillation by Using Epoch Regularization | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 105-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.coadvisor | 林彥宇(Yen-Yu Lin) | |
| dc.contributor.oralexamcommittee | 陳煥宗(Hwann-Tzong Chen),賴尚宏(Shang-Hong Lai),黃春融(Chun-Rong Huang) | |
| dc.subject.keyword | 影像辨識,深度學習,卷積類神經網路 | zh_TW |
| dc.subject.keyword | image classification, deep learning, convolutional neural networks | en |
| dc.relation.page | 25 | |
| dc.identifier.doi | 10.6342/NTU201702904 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2017-08-11 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | 資訊工程學系 | |
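The abstract above describes sDER only at a high level: the training loss combines the usual hard-target term with a distillation term toward soft targets produced by models saved from earlier epochs. No code accompanies this record, so the snippet below is only a minimal, hypothetical PyTorch sketch of what such a combined objective could look like; the function name `sder_like_loss`, the mixing weight `alpha`, the temperature `T`, and the averaging over a few saved epoch models are illustrative assumptions rather than the thesis's exact formulation.

```python
# Hypothetical sketch only; not the author's implementation.
# Assumed: `past_logits_list` holds logits computed (without gradients)
# by a few models checkpointed at earlier epochs; `alpha` and `T` are
# illustrative hyper-parameters.
import torch
import torch.nn.functional as F

def sder_like_loss(logits, targets, past_logits_list, alpha=0.5, T=2.0):
    """Hard-target loss plus distillation toward an ensemble of past epoch models."""
    # Standard supervised term on the dataset's hard labels.
    hard_loss = F.cross_entropy(logits, targets)

    # Soft targets: average the temperature-softened predictions of the
    # selected epoch models.
    soft_targets = torch.stack(
        [F.softmax(p / T, dim=1) for p in past_logits_list]
    ).mean(dim=0)

    # KL divergence between the current softened predictions and the past
    # ensemble, scaled by T^2 as in standard knowledge distillation.
    distill_loss = F.kl_div(
        F.log_softmax(logits / T, dim=1), soft_targets, reduction="batchmean"
    ) * (T * T)

    return (1.0 - alpha) * hard_loss + alpha * distill_loss
```

In a training loop of this kind, one would checkpoint the chosen epoch models, evaluate them under `torch.no_grad()` on each mini-batch to fill `past_logits_list`, and back-propagate only through the current model's `logits`.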
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-106-1.pdf (restricted access, not publicly available) | 932.09 kB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
