Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93793
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 王凡 | zh_TW |
dc.contributor.advisor | Farn Wang | en |
dc.contributor.author | 黃浩然 | zh_TW |
dc.contributor.author | Ho-Yin Wong | en |
dc.date.accessioned | 2024-08-08T16:14:35Z | - |
dc.date.available | 2024-08-09 | - |
dc.date.copyright | 2024-08-08 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-08-01 | - |
dc.identifier.citation | [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, vol. 25, 2012, pp. 1097–1105.
[2] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” 2016, arXiv:1609.02907.
[3] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, April 2018.
[4] K. Chen and X. Huang, “Feature extraction method of 3D art creation based on deep learning,” Soft Computing – A Fusion of Foundations, Methodologies and Applications, vol. 24, no. 11, pp. 8149–8161, June 2020.
[5] S. Haykin, “Introduction,” in Neural Networks: A Comprehensive Foundation, 2nd ed., Englewood Cliffs, NJ, USA: Prentice-Hall, 1999, p. 23.
[6] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” 2014, arXiv:1409.1556.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30, 2016.
[8] M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li, “Simple and Deep Graph Convolutional Networks,” 2020, arXiv:2007.02133.
[9] E. Hazan, A. Klivans, and Y. Yuan, “Hyperparameter Optimization: A Spectral Approach,” 2017, arXiv:1706.00764.
[10] X. Zhang, X. Chen, L. Yao, C. Ge, and M. Dong, “Deep Neural Network Hyperparameter Optimization with Orthogonal Array Tuning,” 2019, arXiv:1907.13359.
[11] N. M. Aszemi and P. D. D. Dominic, “Hyperparameter optimization in convolutional neural network using genetic algorithms,” International Journal of Advanced Computer Science and Applications, vol. 10, no. 6, pp. 269–278, 2019.
[12] F. Johnson, A. Valderrama, C. Valle, B. Crawford, R. Soto, and R. Ñanculef, “Automating Configuration of Convolutional Neural Network Hyperparameters Using Genetic Algorithm,” IEEE Access, vol. 8, pp. 156139–156152, August 2020.
[13] S. Loussaief and A. Abdelkrim, “Convolutional Neural Network Hyper-Parameters Optimization based on Genetic Algorithms,” International Journal of Advanced Computer Science and Applications, vol. 9, no. 10, pp. 252–266, 2018.
[14] X. Xiao, M. Yan, S. Basodi, C. Ji, and Y. Pan, “Efficient Hyperparameter Optimization in Deep Learning Using a Variable Length Genetic Algorithm,” 2020, arXiv:2006.12703.
[15] C. L. Huang, “A particle-based simplified swarm optimization algorithm for reliability redundancy allocation problems,” Reliability Engineering & System Safety, vol. 142, pp. 221–230, October 2015.
[16] T. Yamasaki, T. Honma, and K. Aizawa, “Efficient Optimization of Convolutional Neural Networks Using Particle Swarm Optimization,” presented at the IEEE Third International Conference on Multimedia Big Data (BigMM), Laguna Hills, CA, USA, April 19–21, 2017.
[17] P. R. Lorenzo, J. Nalepa, M. Kawulok, L. S. Ramos, and J. R. Pastor, “Particle swarm optimization for hyper-parameter selection in deep neural networks,” presented at the Genetic and Evolutionary Computation Conference (GECCO), Berlin, Germany, July 15–19, 2017.
[18] W. C. Yeh, Y. P. Lin, Y. C. Liang, C. M. Lai, and C. L. Huang, “Simplified swarm optimization for hyperparameters of convolutional neural networks,” Computers & Industrial Engineering, vol. 177, 109076, March 2023.
[19] W. C. Yeh, “An improved simplified swarm optimization,” Knowledge-Based Systems, vol. 82, pp. 60–69, July 2015.
[20] Y. LeCun, B. Boser, J. S. Denker, R. E. Howard, W. Hubbard, L. D. Jackel, and D. Henderson, “Handwritten digit recognition with a back-propagation network,” in Advances in Neural Information Processing Systems 2, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1990, pp. 396–404.
[21] M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li, “Simple and Deep Graph Convolutional Networks,” 2020, arXiv:2007.02133.
[22] M. F. Abbod, J. W. F. Catto, D. A. Linkens, and F. C. Hamdy, “Application of artificial intelligence to the management of urological cancer,” The Journal of Urology, vol. 178, no. 4, pp. 1150–1156, October 2007.
[23] SaffronEdge, “Feedforward vs Backpropagation ANN,” LinkedIn. Accessed: June 3, 2024. [Online]. Available: https://www.linkedin.com/pulse/feedforward-vs-backpropagation-ann-saffronedge1/
[24] R. A. Alzahrani and A. C. Parker, “Neuromorphic Circuits with Neural Modulation Enhancing the Information Content of Neural Signaling,” in International Conference on Neuromorphic Systems 2020 (ICONS 2020), no. 19, July 2020, pp. 1–8.
[25] Chrislb, “Diagram of an artificial neuron,” Wikipedia. Accessed: June 3, 2024. [Online]. Available: https://zh-yue.m.wikipedia.org/wiki/File:ArtificialNeuronModel_english.png
[26] J. Brownlee, “How to Choose an Activation Function for Deep Learning,” Machine Learning Mastery. Accessed: June 3, 2024. [Online]. Available: https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/
[27] PyTorch Contributors, “torch.nn,” PyTorch. Accessed: June 3, 2024. [Online]. Available: https://pytorch.org/docs/stable/nn.html
[28] J. Brownlee, “Understand the Impact of Learning Rate on Neural Network Performance,” Machine Learning Mastery. Accessed: June 3, 2024. [Online]. Available: https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
[29] J. Jordan, “Setting the learning rate of your neural network,” Jeremy Jordan. Accessed: June 3, 2024. [Online]. Available: https://www.jeremyjordan.me/nn-learning-rate/
[30] J. Brownlee, “A Gentle Introduction to Dropout for Regularizing Deep Neural Networks,” Machine Learning Mastery. Accessed: June 3, 2024. [Online]. Available: https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/
[31] T. P. Chen, “Dropout — 隨機關閉神經元 | 模擬人腦神經元原理 | 避免模型過擬合,” Medium. Accessed: June 3, 2024. [Online]. Available: https://medium.com/@a5560648/dropout-5fb2105dbf7c
[32] M. V. Valueva, N. N. Nagornov, P. A. Lyakhov, G. V. Valuev, and N. I. Chervyakov, “Application of the residue number system to reduce hardware costs of the convolutional neural network implementation,” Mathematics and Computers in Simulation, vol. 177, pp. 232–243, November 2020.
[33] H. Gu, Y. Wang, S. Hong, and G. Gui, “Blind Channel Identification Aided Generalized Automatic Modulation Recognition Based on Deep Learning,” IEEE Access, vol. 7, pp. 110722–110729, August 2019.
[34] J. Laufer, “Image aesthetics quantification with a convolutional neural network (CNN),” Jens Laufer. Accessed: June 3, 2024. [Online]. Available: https://jenslaufer.com/machine/learning/image-aesthetics-quantification-with-a-convolutional-neural-network.html
[35] A. Wang, “Convolutional Neural Networks (CNN) #1 Kernel, Stride, Padding,” BrilliantCode.net. Accessed: June 3, 2024. [Online]. Available: https://www.brilliantcode.net/1584/convolutional-neural-networks-1-convolution-layer-stride-padding-kernel/
[36] A. Géron, “Deep Computer Vision Using Convolutional Neural Networks,” in Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed., Sebastopol, CA, USA: O'Reilly Media, Inc., 2019, pp. 442–444.
[37] M. Yani, B. Irawan, and C. Setianingsih, “Application of Transfer Learning Using Convolutional Neural Network Method for Early Detection of Terry's Nail,” Journal of Physics: Conference Series, vol. 1201, 012052, May 2019.
[38] S. Ramesh, “A guide to an efficient way to build neural network architectures – Part II: Hyper-parameter selection and tuning for Convolutional Neural Networks using Hyperas on Fashion-MNIST,” Medium. Accessed: June 3, 2024. [Online]. Available: https://towardsdatascience.com/a-guide-to-an-efficient-way-to-build-neural-network-architectures-part-ii-hyper-parameter-42efca01e5d7
[39] S. Pandey, “How to choose the size of the convolution filter or Kernel size for CNN?,” Medium. Accessed: June 3, 2024. [Online]. Available: https://medium.com/analytics-vidhya/how-to-choose-the-size-of-the-convolution-filter-or-kernel-size-for-cnn-86a55a1e2d15
[40] T. Kipf, “Graph Convolutional Networks,” GitHub. Accessed: June 3, 2024. [Online]. Available: https://tkipf.github.io/graph-convolutional-networks/
[41] Q. Li, Z. Han, and X. M. Wu, “Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning,” in The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018, pp. 3538–3545.
[42] J. Kennedy and R. Eberhart, “Particle swarm optimization,” presented at the ICNN'95 – International Conference on Neural Networks, Perth, WA, Australia, November 27 – December 1, 1995.
[43] W. C. Yeh, “A two-stage discrete particle swarm optimization for the problem of multiple multi-level redundancy allocation in series systems,” Expert Systems with Applications, vol. 36, no. 5, pp. 9192–9200, July 2009.
[44] Google Colab, “Welcome to Colab!,” Colab. Accessed: June 3, 2024. [Online]. Available: https://colab.research.google.com/notebooks/welcome.ipynb?hl=en#scrollTo=P-H6Lw1vyNNd
[45] scikit-learn, “scikit-learn: Machine Learning in Python,” scikit-learn. Accessed: June 3, 2024. [Online]. Available: https://scikit-learn.org/stable/index.html
[46] PyTorch Foundation, “Get Started,” PyTorch. Accessed: June 3, 2024. [Online]. Available: https://pytorch.org/
[47] PyG Team, “PyG Documentation,” PyTorch Geometric. Accessed: June 3, 2024. [Online]. Available: https://pytorch-geometric.readthedocs.io/en/latest/
[48] L. J. V. Miranda, “Welcome to PySwarms's documentation!,” PySwarms. Accessed: June 3, 2024. [Online]. Available: https://pyswarms.readthedocs.io/en/latest/
[49] R. A. Fisher, June 30, 1988, “Iris,” UC Irvine Machine Learning Repository, doi: 10.24432/C56C76.
[50] E. Alpaydin and C. Kaynak, June 30, 1998, “Optical Recognition of Handwritten Digits,” UC Irvine Machine Learning Repository, doi: 10.24432/C50P49.
[51] AT&T Laboratories Cambridge, 1994, “Olivetti faces.” [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_olivetti_faces.html
[52] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, “Collective classification in network data,” AI Magazine, vol. 29, no. 3, p. 93, 2008. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93793 | - |
dc.description.abstract | 本研究探討了簡化群最佳化(SSO)和改進的簡化群最佳化(iSSO)在提升類神經網路(ANN)、卷積神經網路(CNN)和圖卷積網路(GCN)性能方面的應用。目前深度學習的文獻主要集中在網絡架構上;然而,本研究認為超參數優化—特別是激活函數、優化器、正則化技術和學習率調度器的優化—也扮演著至關重要的角色。為了證實這個假設,利用SSO來優化標準ANN模型、CNN模型LeNet和基本的GCN模型的超參數。實驗評估包括對ANN使用Iris、UCI ML手寫數字和Olivetti人臉數據集,對CNN使用MNIST、Fashion-MNIST和CIFAR-10數據集,對GCN使用Cora、Citeseer和Pubmed數據集。結果顯示,使用SSO及其改進版本iSSO的模型準確性均超過了原始配置。此外,iSSO能在相對較短的時間內找到最佳超參數。另外,SSO還提供了一種系統化的方法來選擇激活函數和優化器。 | zh_TW |
dc.description.abstract | This research investigates the application of Simplified Swarm Optimization (SSO) and improved Simplified Swarm Optimization (iSSO) to enhance the performance of Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Graph Convolutional Networks (GCN). The current deep learning literature concentrates predominantly on network architectures; this study argues that hyperparameter optimization, specifically of the activation function, optimizer, regularization techniques, and learning rate scheduler, also plays a critical role. To substantiate this hypothesis, SSO and iSSO were used to optimize the hyperparameters of a standard ANN, the CNN model LeNet, and a basic GCN. The experimental evaluation used the Iris, UCI ML hand-written digits, and Olivetti faces datasets for ANNs; the MNIST, Fashion-MNIST, and CIFAR-10 datasets for CNNs; and the Cora, Citeseer, and Pubmed datasets for GCNs. The results indicate that models tuned with SSO and its improved variant iSSO exceeded the accuracy of the original configurations, and that iSSO identified the optimal hyperparameters in a relatively short time. Additionally, SSO provided a systematic methodology for selecting activation functions and optimizers. (An illustrative sketch of this SSO search loop follows the metadata table below.) | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-08T16:14:35Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-08-08T16:14:35Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Abstract (Chinese) iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES ix
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Objective 2
1.3 Contribution 2
1.4 Organization 3
Chapter 2 Literature review and related work 4
2.1 Artificial Neural Networks (ANN) 4
2.1.1 Artificial neuron 5
2.1.2 Feedforward and backward propagation 6
2.1.3 Hyperparameter 6
2.2 Convolutional Neural Networks (CNN) – LeNet 11
2.2.1 Architecture 12
2.2.2 Hyperparameter of CNN 14
2.3 Graph Convolutional Networks (GCN) 15
2.4 GCNII 16
2.5 Optimization method 17
2.5.1 Particle Swarm Optimization (PSO) 17
2.5.2 Simplified Swarm Optimization (SSO) on ANN/LeNet/GCN/GCNII 18
2.5.3 Improved Simplified Swarm Optimization (iSSO) on GCN/GCNII 18
2.6 Preliminaries 19
2.6.1 Colab 19
2.6.2 scikit-learn 19
2.6.3 PyTorch 19
2.6.4 PyTorch Geometric 20
2.6.5 PySwarms 20
Chapter 3 Methodology and proposed method 21
3.1 Solution structure 21
3.1.1 SSO-ANN 21
3.1.2 SSO-LeNet 23
3.1.3 SSO/iSSO-GCN 26
3.1.4 SSO/iSSO-GCNII 27
3.2 Fitness function 30
3.3 Loss function 30
3.4 Pseudocode and flowchart 31
3.4.1 SSO 31
3.4.2 iSSO 33
Chapter 4 Experiment 37
4.1 Dataset 37
4.1.1 Iris, UCI ML hand-written digits and Olivetti faces for ANN 37
4.1.2 MNIST, Fashion-MNIST and CIFAR-10 [20] for CNN 37
4.1.3 Cora, Citeseer and Pubmed [52] for GCN and GCNII 38
4.2 Equipment 38
4.3 Experiment 1 – SSO-ANN 38
4.3.1 Target 38
4.3.2 Setup 39
4.3.3 Result 39
4.3.4 Observation 40
4.4 Experiment 2 – Original SSO-LeNet vs Modified SSO-LeNet 40
4.4.1 Target 40
4.4.2 Setup 41
4.4.3 Procedure 41
4.4.4 Result 42
4.4.5 Observation 44
4.5 Experiment 3 – GCN vs SSO-GCN 44
4.5.1 Target 44
4.5.2 Setup 45
4.5.3 Procedure 45
4.5.4 Result 45
4.5.5 Observation 47
4.6 Experiment 4 – PSO-GCN vs SSO-GCN vs iSSO-GCN 47
4.6.1 Target 47
4.6.2 Setup 47
4.6.3 Result 48
4.6.4 Observation 50
4.7 Experiment 5 – SSO/iSSO-GCNII 50
4.7.1 Target 50
4.7.2 Setup 51
4.7.3 Result 51
4.7.4 Observation 52
4.8 Experiment 6 – Different setting of SSO/iSSO-GCNII 52
4.8.1 Target 52
4.8.2 Setup 52
4.8.3 Result 53
4.8.4 Observation 54
Chapter 5 Conclusion and future work 55
5.1 Conclusion 55
5.2 Future work 55
REFERENCE 57 | - |
dc.language.iso | en | - |
dc.title | 利用簡化群最佳化對類神經網路、卷積神經網路和圖卷積網路進行超參數優化 | zh_TW |
dc.title | Simplified Swarm Optimization for hyperparameter of Artificial Neural Networks, Convolutional Neural Networks and Graph Convolutional Networks | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-2 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 李宏毅;林守德;葉國暉;陳銘憲 | zh_TW |
dc.contributor.oralexamcommittee | Hung-yi Lee;Shou-De Lin;Kuo-Hui Yeh;Ming-Syan Chen | en |
dc.subject.keyword | 深度學習,圖像識別,節點分類,卷積神經網絡,圖卷積神經網絡,簡化群體優化,超參數優化 | zh_TW |
dc.subject.keyword | Deep learning, Image recognition, Node classification, Convolutional neural networks, Graph convolutional networks, Simplified swarm optimization, Hyperparameter optimization | en |
dc.relation.page | 63 | - |
dc.identifier.doi | 10.6342/NTU202402611 | - |
dc.rights.note | Not authorized | - |
dc.date.accepted | 2024-08-04 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Electrical Engineering | - |
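As referenced in the abstract above, the following is a minimal, hypothetical Python sketch of an SSO-style hyperparameter search in the spirit of Yeh's update rule [18], [19]. The control parameters (CG, CP, CW), the three-dimensional discrete search space, and the small Iris/MLPClassifier fitness function are all illustrative assumptions chosen to keep the sketch self-contained, not the thesis's actual configuration.

```python
import random

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Discrete hyperparameter search space (assumed for illustration):
# activation function, optimizer ("solver" in scikit-learn), learning rate.
SEARCH_SPACE = [
    ["relu", "tanh", "logistic"],   # activation function
    ["adam", "sgd"],                # optimizer
    [1e-1, 1e-2, 1e-3],             # initial learning rate
]

# SSO thresholds Cg < Cp < Cw (assumed values, not from the thesis).
CG, CP, CW = 0.4, 0.7, 0.9

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(x):
    """Train a small ANN with hyperparameters x; return validation accuracy.
    Stand-in for the thesis's real fitness evaluations on full models."""
    act, solver, lr = x
    clf = MLPClassifier(hidden_layer_sizes=(16,), activation=act,
                        solver=solver, learning_rate_init=lr,
                        max_iter=300, random_state=0)
    clf.fit(X_tr, y_tr)
    return clf.score(X_val, y_val)

def sso_update(x, pbest, gbest):
    """One SSO step: each dimension copies the global best, the personal
    best, its current value, or a random value, based on a uniform draw."""
    out = []
    for j, (xj, pj, gj) in enumerate(zip(x, pbest, gbest)):
        rho = random.random()
        if rho < CG:
            out.append(gj)                               # follow global best
        elif rho < CP:
            out.append(pj)                               # follow personal best
        elif rho < CW:
            out.append(xj)                               # keep current value
        else:
            out.append(random.choice(SEARCH_SPACE[j]))   # random restart
    return out

# Tiny driver: a few particles, a few generations.
particles = [[random.choice(d) for d in SEARCH_SPACE] for _ in range(4)]
pbests = [p[:] for p in particles]
pfits = [fitness(p) for p in particles]
gfit = max(pfits)
gbest = pbests[pfits.index(gfit)][:]

for _ in range(5):
    for i in range(len(particles)):
        particles[i] = sso_update(particles[i], pbests[i], gbest)
        f = fitness(particles[i])
        if f > pfits[i]:                                 # update personal best
            pbests[i], pfits[i] = particles[i][:], f
            if f > gfit:                                 # update global best
                gbest, gfit = particles[i][:], f

print("best hyperparameters:", gbest, "validation accuracy:", gfit)
```

In the thesis, each fitness evaluation would train LeNet or a GCN/GCNII on the listed datasets and return validation accuracy; the Iris/MLP stand-in here only keeps the demo fast and runnable.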
Appears in Collections: | Department of Electrical Engineering
Files in This Item:
File | Size | Format |
---|---|---|
ntu-112-2.pdf (currently not authorized for public access) | 1.86 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.