Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69318

| Title: | 以驗證點和可驗證度的觀點討論機器學習理論 Verified Instances and Verifiability: An Alternative Theoretical Perspective of Learning |
| Author: | Hao-Lun Hsu 許浩倫 |
| Advisor: | 于天立 |
| Keywords: | verifiability, verifiability dimension, VC dimension, effective VC dimension |
| Publication Year: | 2018 |
| Degree: | Master |
| Abstract: | Machine learning has developed rapidly and has been successfully applied to many real-world problems. However, as models grow more powerful they also grow more complex, and humans find it increasingly difficult to understand how a model makes its decisions or what it will predict under particular circumstances. Such incomprehension causes distrust. I believe that, for common users, comprehension comes from knowing how a model behaves under certain circumstances. Thus, in this thesis, I propose the verifiability and the verifiability dimension. The former measures the proportion of instances that are correctly classified by all hypotheses in the version space, i.e., the hypotheses generated by different learning algorithms given the same hypothesis space and the same training instances. The higher the verifiability, the higher the probability that a model makes a correct prediction on an unseen instance, and hence the more users can trust the model. The latter measures the complexity a hypothesis space requires to ensure that the average verifiability is greater than zero given the training instances. In addition, I show that the verifiability dimension is bounded below by the effective Vapnik-Chervonenkis (VC) dimension, a variant of the VC dimension that takes the distribution of the instance space into account. I also derive upper and lower bounds on the average verifiability, in terms of the effective VC dimension and the verifiability dimension, respectively. Finally, I discuss the sample complexity of unverifiability, the minimal number of training instances needed to make the unverifiability small enough, and show that the lower bound on unverifiability is the maximal true error over all hypotheses in the version space.
As a result, in statistical learning, the sample complexity of unverifiability is no less than the sample complexity of the true error, which is $\Theta\!\left(\frac{VC_H}{\epsilon}\right)$. |
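The definition of verifiability above can be sketched on a toy finite setting. The following is a minimal illustration only, assuming threshold classifiers on ten integer points; the setup, names, and numbers are hypothetical and not the thesis's own construction:

```python
# Toy sketch of verifiability (illustrative assumptions, not the
# thesis's experimental setup): instance space X = {0, ..., 9},
# hypotheses h_t(x) = 1 iff x >= t, true concept has threshold 5.
X = list(range(10))
true_label = lambda x: int(x >= 5)
hypotheses = [lambda x, t=t: int(x >= t) for t in range(11)]

# The training sample observed by the learner.
train = [(2, 0), (7, 1)]

# Version space: all hypotheses consistent with every training instance.
version_space = [h for h in hypotheses
                 if all(h(x) == y for x, y in train)]

# Verifiability: the fraction of the instance space on which every
# hypothesis in the version space predicts the true label.
verified = [x for x in X
            if all(h(x) == true_label(x) for h in version_space)]
verifiability = len(verified) / len(X)
print(verifiability)  # → 0.6
```

Here the version space contains the thresholds 3 through 7; the six points 0, 1, 2, 7, 8, 9 are classified correctly by all of them, giving a verifiability of 0.6 under the uniform distribution on X.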
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69318 |
| DOI: | 10.6342/NTU201801408 |
| Full-text rights: | Paid authorization |
| Appears in Collections: | Department of Electrical Engineering |
Files in this item:
| File | Size | Format | Access |
|---|---|---|---|
| ntu-107-1.pdf | 3.1 MB | Adobe PDF | Not authorized for public access |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
