Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77465
Title: | Initialization of Deep Neural Network with Many-to-many Relative Weight and Two-dimensional PCA |
Author: | Cheng-Hsi Chung (鍾承羲) |
Advisor: | Argon Chen (陳正剛)
Keywords: | Weight Initialization, Relative Weight, Two-dimensional PCA, Deep Neural Network, Convolutional Neural Network
Publication Year: | 2018
Degree: | Master
Abstract: | Deep learning (DL) is currently the machine-learning technique most widely used to realize artificial intelligence (AI). DL's core framework, the artificial neural network (ANN) trained by back-propagation, has been in wide use since the mid-to-late twentieth century, yet neural networks with deep layers could not be trained successfully until after 2006: training was inefficient and often failed because of limited computing power and ineffective back-propagation with inappropriate activation functions and weight initialization. Glorot (2010) studied how different activation functions affect training and the gradient changes between consecutive layers, and, based on the concept of information flow, developed Xavier's initialization method (the Glorot uniform distribution), which makes deep neural networks far easier to train (a minimal sketch follows).
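For reference, a minimal NumPy sketch of the Glorot (Xavier) uniform initialization mentioned above; the function name and the use of `numpy.random.default_rng` are illustrative choices, not part of the thesis:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Draw a weight matrix W ~ U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
    the Glorot uniform limit that keeps activation and gradient variance
    roughly constant from layer to layer."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: initialize a 784 -> 256 fully-connected layer.
W = glorot_uniform(784, 256)
```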
Although Xavier's initialization method does help information flow through all the layers of a deep neural network, training remains a gradient-descent process: the initial weights jointly constitute an initial solution to the objective (loss) function and are crucial to the convergence and effectiveness of the subsequent back-propagation updates. A poorly chosen initial solution can leave training trapped near an inferior local optimum or slow its convergence. In many practical cases, moreover, there are preferred initial solutions based on prior knowledge or transfer-learning results. This research therefore develops an initialization method with preference for both fully-connected and convolutional layers, preserving the properties of the Glorot uniform while letting the importance of the neurons within a layer shape the random initial weights of their connections.

For fully-connected layers, Johnson's (2000) relative weights are used as the preference: they define the importance of the hidden neurons relative to the output layer, in a fashion similar to the relative importance of an independent variable in multiple regression (a sketch is given below). Since the output layer may contain multiple neurons, the many-to-many relative weights that Hong (2012) developed from canonical correlation analysis may be required to characterize the preferences between two groups of neurons.

For convolutional layers, the importance of a channel has to be aggregated from the weights of its filters and from its relationships to the channels of the next layer. Following the auto-encoder concept of Masci (2011), Zhang's (2005) bidirectional two-dimensional principal component analysis (2D-PCA) is applied: the proportion of variance a channel can recover after bidirectional 2D-PCA is treated as the importance of that channel, with the first principal component's variance defining a channel's importance relative to the next layer of channels and the first principal vector defining the importance of a filter within a channel; these importances then determine the initial weights (see the second sketch below).
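As referenced above, a minimal sketch of Johnson's (2000) relative weights for a single output neuron, computed from standardized inputs via the orthogonal approximation of the predictor matrix; the variable names and synthetic example are illustrative, not taken from the thesis:

```python
import numpy as np

def johnson_relative_weights(X, y):
    """Johnson's (2000) relative weights: the share of y's explained
    variance attributable to each column of X, computed through the
    best orthogonal approximation of X."""
    Xs = (X - X.mean(0)) / X.std(0)          # standardize predictors
    ys = (y - y.mean()) / y.std()            # standardize criterion
    n = len(y)
    Rxx = Xs.T @ Xs / n                      # predictor correlation matrix
    rxy = Xs.T @ ys / n                      # predictor-criterion correlations
    evals, P = np.linalg.eigh(Rxx)
    Lam = P @ np.diag(np.sqrt(evals)) @ P.T  # Rxx**(1/2)
    beta = np.linalg.solve(Lam, rxy)         # regression on orthogonalized X
    eps = (Lam ** 2) @ (beta ** 2)           # relative weights; sum(eps) ~ R^2
    return eps / eps.sum()                   # normalized relative importance

# Example with synthetic data: 200 samples, 5 hidden-neuron activations.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.8, 0.5, 0.3, 0.1, 0.0]) + 0.2 * rng.normal(size=200)
print(johnson_relative_weights(X, y))
```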
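And a sketch of how a channel-importance score might be derived from Zhang's (2005) bidirectional 2D-PCA, using each channel's retained-variance ratio as its importance. The (N, C, H, W) layout, the choice of k, and combining the row- and column-direction ratios by their product are assumptions made here for illustration, not the thesis's exact definition:

```python
import numpy as np

def channel_importance_2dpca(feature_maps, k=4):
    """feature_maps: array of shape (N, C, H, W). For each channel,
    compute the fraction of variance retained by the top-k eigenvalues
    of the row- and column-direction image covariance matrices used in
    bidirectional 2D-PCA, and normalize over channels as an importance
    score (an illustrative proxy, not the thesis's exact formula)."""
    N, C, H, W = feature_maps.shape
    scores = np.empty(C)
    for c in range(C):
        A = feature_maps[:, c] - feature_maps[:, c].mean(axis=0)  # center
        G = np.einsum('nij,nik->jk', A, A) / N      # (W, W): sum of A_n^T A_n
        Hcov = np.einsum('nji,nki->jk', A, A) / N   # (H, H): sum of A_n A_n^T
        eg = np.linalg.eigvalsh(G)[::-1]            # descending eigenvalues
        eh = np.linalg.eigvalsh(Hcov)[::-1]
        scores[c] = (eg[:k].sum() / eg.sum()) * (eh[:k].sum() / eh.sum())
    return scores / scores.sum()

# Example: importance of 8 channels of 16x16 feature maps from 100 samples.
fm = np.random.default_rng(0).normal(size=(100, 8, 16, 16))
print(channel_importance_2dpca(fm))
```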
Well-known datasets for digit recognition (MNIST) and image classification (Tiny ImageNet) are used to validate the performance of the proposed initialization for the fully-connected and convolutional layers, using LeNet and AlexNet respectively. The performance measures comparing the networks initialized with and without preference are the number of training epochs and the training time required to converge to a given validation criterion. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77465 |
DOI: | 10.6342/NTU201804010 |
Full-text Access: | Not authorized
Appears in Collections: | Graduate Institute of Industrial Engineering
Files in This Item:
File | Size | Format
---|---|---
ntu-107-R05546026-1.pdf (currently not authorized for public access) | 3.99 MB | Adobe PDF
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.