NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67115

Full metadata record (DC field: value [language])
dc.contributor.advisor: 莊永裕 (Yung-Yu Chuang)
dc.contributor.author: Jo-Han Hsu [en]
dc.contributor.author: 許若漢 (Jo-Han Hsu) [zh_TW]
dc.date.accessioned: 2021-06-17T01:20:20Z
dc.date.available: 2018-08-14
dc.date.copyright: 2017-08-14
dc.date.issued: 2017
dc.date.submitted: 2017-08-10
dc.identifier.citation:
[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, Mar 1994.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[3] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In NIPS 2014 Deep Learning and Representation Learning Workshop, 2014.
[4] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. Deep networks with stochastic depth. In Proc. Euro. Conf. Computer Vision (ECCV), 2016.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012.
[6] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv:1312.4400, 2013.
[7] Y. Luo, C. C. Chiu, N. Jaitly, and I. Sutskever. Learning online alignments with continuous rewards policy gradient. In Proc. Int'l Conf. Acoustics, Speech and Signal Processing (ICASSP), 2017.
[8] G. Pereyra, G. Tucker, J. Chorowski, L. Kaiser, and G. E. Hinton. Regularizing neural networks by penalizing confident output distributions. In Proc. Int'l Conf. Learning Representations (ICLR), 2017.
[9] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. FitNets: Hints for thin deep nets. In Proc. Int'l Conf. Learning Representations (ICLR), 2015.
[10] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. Int'l Conf. Learning Representations (ICLR), 2015.
[11] S. Singh, D. Hoiem, and D. Forsyth. Swapout: Learning an ensemble of deep architectures. In Advances in Neural Information Processing Systems (NIPS), 2016.
[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research (JMLR), 2014.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2015.
[14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[15] R. J. Williams and J. Peng. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 1991.
[16] L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. DisturbLabel: Regularizing CNN on the loss layer. In Proc. Conf. Computer Vision and Pattern Recognition (CVPR), 2016.
[17] S. Zagoruyko and N. Komodakis. Wide residual networks. In Proc. British Machine Vision Conf. (BMVC), 2016.
[18] S. Zagoruyko and N. Komodakis. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In Proc. Int'l Conf. Learning Representations (ICLR), 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67115
dc.description.abstract: Various methods for regularizing neural networks have been shown in earlier work to noticeably improve performance in supervised learning. Knowledge distillation is another such method: unlike conventional training, which uses only the hard targets provided by the dataset, it additionally learns from soft targets supplied by a stronger model, and this yields very good results. In most cases, however, such a stronger model does not exist. We propose a deep-model training method regularized by knowledge from past iterations (sDER), which lets a model regularize itself by distilling the knowledge it has learned during its own past training, and thereby removes knowledge distillation's dependence on a separate, stronger model.
Our optimization considers both the current hard labels and the past soft labels, and experiments show that this approach is very effective. For example, on CIFAR-10, a well-known dataset in image classification, a network with only 110 layers trained with our method can even outperform one with 1,202 layers, while the method requires little extra time or resources, is simple, and can be applied to different domains. [zh_TW]
dc.description.abstract: Regularizing neural networks has been shown to improve performance in supervised learning. For example, Hinton et al. proposed knowledge distillation, which regularizes a model with soft targets provided by other, stronger models. However, the assumption that a powerful teacher model is available does not hold for many problems. This thesis proposes a method called self-distillation by epoch regularization (sDER), which distills information from the past epoch models that one obtains anyway during training. An epoch model is simply the model obtained after one epoch of training. By selecting several epoch models, we form a good ensemble model for regularization.
Through optimization that considers both fitting the hard targets well and the epoch regularization, performance can be improved.
For example, on CIFAR-10, a shallower network with 110 layers trained with the proposed sDER method outperforms its counterpart with 1,202 layers. Furthermore, this versatile method can improve the performance of many different models without much extra training overhead. [en]
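As a rough illustration of the loss the abstracts describe (not the thesis's actual implementation), the following minimal PyTorch-style sketch combines the usual cross-entropy on the hard labels with a distillation term toward soft targets averaged over several saved epoch models. The function name sder_style_loss, the argument epoch_model_logits, and the values of alpha and temperature are illustrative assumptions.

import torch
import torch.nn.functional as F

def sder_style_loss(logits, hard_labels, epoch_model_logits, alpha=0.5, temperature=2.0):
    # Supervised term on the dataset's hard targets.
    hard_loss = F.cross_entropy(logits, hard_labels)

    # Average the softened predictions of the selected past epoch models into
    # one ensemble soft target; epoch_model_logits is assumed to be a list of
    # logit tensors with the same shape as `logits`.
    soft_targets = torch.stack(
        [F.softmax(l / temperature, dim=1) for l in epoch_model_logits]
    ).mean(dim=0)

    # Distillation term: KL divergence between the current model's softened
    # predictions and the ensemble soft target, scaled by temperature^2 as in
    # standard knowledge distillation.
    soft_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * (temperature ** 2)

    # Weighted combination of the hard-target and soft-target terms.
    return (1.0 - alpha) * hard_loss + alpha * soft_loss

Under this reading, ordinary training changes only by caching the outputs (or checkpoints) of a few chosen epoch models and adding the second term, which is consistent with the abstract's claim that little extra time or computation is needed.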
dc.description.provenance: Made available in DSpace on 2021-06-17T01:20:20Z (GMT). No. of bitstreams: 1
ntu-106-R04922012-1.pdf: 954460 bytes, checksum: f3d94dc9e917c5163985f4497e688868 (MD5)
Previous issue date: 2017 [en]
dc.description.tableofcontents:
口試委員審定書 (Certification by the oral examination committee) ii
誌謝 (Acknowledgements) iii
摘要 (Abstract in Chinese) iv
Abstract v
1 Introduction 1
2 Related work 3
3 Self-distillation by epoch regularization 5
3.1 Label smoothing 5
3.2 Observation 6
3.3 Method 7
3.3.1 Epoch model selection 8
3.3.2 Optimization with epoch regularization 9
3.4 Discussions 10
4 Experiments 11
4.1 Dataset 11
4.1.1 CIFAR-10 11
4.1.2 CIFAR-100 12
4.2 Experiment setting 12
4.2.1 CIFAR-10 and CIFAR-100 12
4.2.2 Platform 12
4.3 Parameters selection 13
4.4 Baseline 15
4.4.1 Network in network 15
4.4.2 FitNets 15
4.4.3 Residual networks 15
4.4.4 Deep network with stochastic depth 16
4.4.5 Swapout 16
4.4.6 Wide residual networks 17
4.5 Comparisons with competing methods 17
4.6 Comparisons with regularization terms 18
4.6.1 Knowledge distillation 19
4.6.2 Penalizing confidence output distributions 19
4.7 Analysis 20
4.7.1 Effectiveness of sDER 20
4.7.2 Extra time cost 21
5 Conclusion 23
Bibliography 24
dc.language.iso: en
dc.subject: 卷積類神經網路 (convolutional neural networks) [zh_TW]
dc.subject: 影像辨識 (image classification) [zh_TW]
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: image classification [en]
dc.subject: deep learning [en]
dc.subject: convolutional neural networks [en]
dc.title: 以知識迭代為規範的深度模型訓練 (Training deep models regularized by iterated knowledge) [zh_TW]
dc.title: Distilling from the Past: Self-Distillation by Using Epoch Regularization [en]
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: 碩士 (Master)
dc.contributor.coadvisor: 林彥宇 (Yen-Yu Lin)
dc.contributor.oralexamcommittee: 陳煥宗 (Hwann-Tzong Chen), 賴尚宏 (Shang-Hong Lai), 黃春融 (Chun-Rong Huang)
dc.subject.keyword: 影像辨識, 深度學習, 卷積類神經網路 (image classification, deep learning, convolutional neural networks) [zh_TW]
dc.subject.keyword: image classification, deep learning, convolutional neural networks [en]
dc.relation.page: 25
dc.identifier.doi: 10.6342/NTU201702904
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2017-08-11
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
ntu-106-1.pdf (932.09 kB, Adobe PDF): not authorized for public access