NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21970
Full metadata record
DC field: value (language)
dc.contributor.advisor: 歐陽彥正
dc.contributor.author: Lih-ching Chou (en)
dc.contributor.author: 周立晴 (zh_TW)
dc.date.accessioned: 2021-06-08T03:55:40Z
dc.date.copyright: 2018-08-21
dc.date.issued: 2018
dc.date.submitted: 2018-08-15
dc.identifier.citation: [1] H. Bozdogan, “Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions,” Psychometrika, vol. 52, no. 3, pp. 345-370, 1987.
[2] R. J. Steele and A. E. Raftery, “Performance of Bayesian model selection criteria for Gaussian mixture models,” Frontiers of Statistical Decision Making and Bayesian Analysis, vol. 2, pp. 113-130, 2010.
[3] D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, 2015.
[4] D. O. Loftsgaarden and C. P. Quesenberry, “A nonparametric estimate of a multivariate density function,” The Annals of Mathematical Statistics, vol. 36, no. 3, pp. 1049-1051, 1965.
[5] A. Bergstrom, “The estimation of nonparametric functions in a Hilbert space,” Econometric Theory, vol. 1, no. 1, pp. 7-26, 1985.
[6] C. J. Stone, “Consistent nonparametric regression,” The Annals of Statistics, pp. 595-620, 1977.
[7] B. W. Silverman, Density Estimation for Statistics and Data Analysis. Routledge, 1986.
[8] E. Parzen, “On estimation of a probability density function and mode,” The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 1065-1076, 1962.
[9] G. R. Terrell and D. W. Scott, “Variable kernel density estimation,” The Annals of Statistics, pp. 1236-1265, 1992.
[10] J. E. Chacón, T. Duong, and M. Wand, “Asymptotics for general multivariate kernel density derivative estimators,” Statistica Sinica, pp. 807-840, 2011.
[11] S.-T. Chiu, “Bandwidth selection for kernel density estimation,” The Annals of Statistics, pp. 1883-1905, 1991.
[12] Z. I. Botev, J. F. Grotowski, and D. P. Kroese, “Kernel density estimation via diffusion,” The Annals of Statistics, vol. 38, no. 5, pp. 2916-2957, 2010.
[13] M. Rudemo, “Empirical choice of histograms and kernel density estimators,” Scandinavian Journal of Statistics, pp. 65-78, 1982.
[14] C. R. Loader, “Bandwidth selection: classical or plug-in?,” The Annals of Statistics, vol. 27, no. 2, pp. 415-438, 1999.
[15] Q. Wang and B. G. Lindsay, “Improving cross-validated bandwidth selection using subsampling-extrapolation techniques,” Computational Statistics & Data Analysis, vol. 89, pp. 51-71, 2015.
[16] V. A. Epanechnikov, “Non-parametric estimation of a multivariate probability density,” Theory of Probability & Its Applications, vol. 14, no. 1, pp. 153-158, 1969.
[17] M. Jones and D. Signorini, “A comparison of higher-order bias kernel density estimators,” Journal of the American Statistical Association, vol. 92, no. 439, pp. 1063-1073, 1997.
[18] C. M. Van der Walt and E. Barnard, “Variable kernel density estimation in high-dimensional feature spaces,” 2017.
[19] J. M. Leiva-Murillo and A. Artés-Rodríguez, “Algorithms for maximum-likelihood bandwidth selection in kernel density estimators,” Pattern Recognition Letters, vol. 33, no. 13, pp. 1717-1724, 2012.
[20] L. Breiman, W. Meisel, and E. Purcell, “Variable kernel estimates of multivariate densities,” Technometrics, vol. 19, no. 2, pp. 135-144, 1977.
[21] Y.-J. Oyang, S.-C. Hwang, Y.-Y. Ou, C.-Y. Chen, and Z.-W. Chen, “Data classification with radial basis function networks based on a novel kernel density estimation algorithm,” IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 225-236, 2005.
[22] I. S. Abramson, “On bandwidth variation in kernel estimates-a square root law,” The Annals of Statistics, pp. 1217-1223, 1982.
[23] ESRI, ArcGIS Desktop: Release 10.5. Redlands, CA: Environmental Systems Research Institute, 2011.
[24] A. Okabe, T. Satoh, and K. Sugihara, “A kernel density estimation method for networks, its computational method and a GIS‐based tool,” International Journal of Geographical Information Science, vol. 23, no. 1, pp. 7-32, 2009.
[25] D. W. Scott and S. R. Sain, “Multidimensional density estimation,” Handbook of Statistics, vol. 24, pp. 229-261, 2005.
[26] J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with GPUs,” arXiv preprint arXiv:1702.08734, 2017.
[27] J. Huang and C. X. Ling, “Using AUC and accuracy in evaluating learning algorithms,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 299-310, 2005.
[28] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85-117, 2015.
[29] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” arXiv preprint arXiv:1706.04599, 2017.
[30] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, 2014, pp. 2672-2680.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[32] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Technical Report, University of Toronto, 2009.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21970
dc.description.abstract: Research on density estimation has produced many algorithms with great influence on data analysis across disciplines, yet density estimation performs poorly on high-dimensional data. This study examines the problems that traditional density estimation encounters on high-dimensional data: the estimated densities can be so low that they are dominated by noise, or so high that no meaningful comparison can be made. It proposes using the negative logarithm of the distance to the nearest neighbors to estimate density when the dimensionality of the data is uncertain, and applies the proposed algorithm, with good results, to data with close to one hundred thousand dimensions. (zh_TW)
dc.description.abstract: The study of density estimation has produced algorithms that have been used across many disciplines and have become a common fixture in data analysis. However, density estimation has not performed well on high-dimensional datasets. In this study, we discuss why traditional density estimation does not work well for high-dimensional data: it gives uninterpretable values, either so low that they are dominated by model or computational noise, or so high that comparisons degenerate into a ratio of infinity over infinity. This study proposes the negative log distance to the k nearest neighbors as the comparison metric when the dimensionality of the samples is not known. The resulting classifier, HDDE, was used to classify images in domains with close to 100k dimensions, with reasonable results. (en; a minimal sketch of this scoring rule appears after this record)
dc.description.provenance: Made available in DSpace on 2021-06-08T03:55:40Z (GMT). No. of bitstreams: 1
ntu-107-D99922027-1.pdf: 2260900 bytes, checksum: 9869c97dbd38925a176c64417facdb07 (MD5)
Previous issue date: 2018 (en)
dc.description.tableofcontents: Oral Examination Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Table of Contents v
List of Figures viii
List of Tables x
Chapter 1 Density Estimation 1
1.1 Density Estimation 1
1.2 Evaluation Criteria 3
Chapter 2 Review of Density Estimation Techniques 5
2.1 Histogram 5
2.2 Naive Estimator 5
2.3 K Nearest Neighbor Estimator 6
2.4 Other Approaches 6
2.5 Kernel Density Estimation 7
2.5.1 Analysis of Performance 8
2.5.2 Bandwidth selection 10
2.5.3 Kernel selection 13
2.5.4 Multivariate Kernels 14
2.5.5 Variable bandwidth methods 15
2.5.6 Recent approaches 19
Chapter 3 Density Estimation in High Dimensions 21
3.1 Problems with PDF in high dimensions 21
3.1.1 The Diverging Density Problem 21
3.1.2 The all-dimensions-are-significant problem 25
3.1.3 The class-comparison problem 27
3.2 Distance as a primary metric 29
3.3 Finding the Dimensions of the Distribution 33
3.3.1 Using PCA to find dimensions of the samples 33
3.3.2 Using average distance to K nearest neighbors to find dimensions of the samples 35
3.3.3 Discussion 37
Chapter 4 Application to Classification 39
4.1 High Dimension Density Estimator 39
4.1.1 KNN algorithm 40
4.1.2 Choosing K 41
4.2 Experiments on Neural Networks 42
4.2.1 Convolutional Neural Network 43
4.2.2 Dataset 43
4.2.3 HDDE classification 44
4.2.4 Results 44
4.3 Discussion 47
Chapter 5 Discussion and Conclusion 49
5.1 Importance of each dimension 49
5.2 Distribution of the object of interest 50
5.3 Limitations for HDDE 50
5.4 Future work 51
Bibliography 52
dc.language.iso: en
dc.title: 以最近鄰居距離作高維度空間密度估計 [Density estimation in high-dimensional space using nearest-neighbor distance] (zh_TW)
dc.title: Density Estimation in High Dimensions Using Distance to K Nearest Neighbors (en)
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: Doctoral (博士)
dc.contributor.oralexamcommittee: 韓謝忱, 賴飛羆, 陳倩瑜, 黃乾綱
dc.subject.keyword: 核密度估計, 高維度, 分類器 [kernel density estimation, high dimensionality, classifier] (zh_TW)
dc.subject.keyword: density estimation, high dimension, classifier (en)
dc.relation.page: 53
dc.identifier.doi: 10.6342/NTU201803453
dc.rights.note: 未授權 [not authorized for public access]
dc.date.accepted: 2018-08-15
dc.contributor.author-college: 電機資訊學院 [College of Electrical Engineering and Computer Science] (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 [Graduate Institute of Computer Science and Information Engineering] (zh_TW)
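As a concrete illustration of the classification rule described in the abstract above: score a query point against each class by the negative log of its distance to that class's k nearest training samples, and predict the class with the highest score. The sketch below is a minimal, illustrative reading of that idea, not the thesis's actual HDDE implementation; the function names, the use of the mean distance over the k neighbors, the brute-force NumPy neighbor search, and the toy Gaussian data are all assumptions.

import numpy as np

def neg_log_knn_distance(query, samples, k=5):
    # Euclidean distance from the query to every training sample (row-wise).
    dists = np.linalg.norm(samples - query, axis=1)
    # Mean distance to the k nearest neighbors; epsilon guards against log(0).
    mean_knn = np.mean(np.sort(dists)[:k])
    return -np.log(mean_knn + 1e-12)

def classify(query, class_samples, k=5):
    # Predict the class whose training samples score highest, i.e. the class
    # whose k nearest neighbors lie closest to the query point.
    return max(class_samples,
               key=lambda label: neg_log_knn_distance(query, class_samples[label], k))

# Toy usage: two Gaussian classes in a 10,000-dimensional space.
rng = np.random.default_rng(0)
d = 10_000
class_samples = {
    "a": rng.normal(0.0, 1.0, size=(50, d)),
    "b": rng.normal(0.3, 1.0, size=(50, d)),
}
query = rng.normal(0.0, 1.0, size=d)
print(classify(query, class_samples))  # almost surely prints "a"

Because only the rank order of scores matters for classification, the monotone negative-log transform yields the same predictions as simply picking the class with the smallest mean neighbor distance; the log scale matters when the scores themselves must be compared or combined, which is the high-dimensional regime the abstract is concerned with.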
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File | Size | Format
ntu-107-1.pdf (not authorized for public access) | 2.21 MB | Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
