Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20966
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 劉長遠 |
dc.contributor.author | Han-Ting Yeh | en
dc.contributor.author | 葉翰挺 | zh_TW
dc.date.accessioned | 2021-06-08T03:12:52Z | -
dc.date.copyright | 2017-02-17 |
dc.date.issued | 2017 |
dc.date.submitted | 2017-02-15 |
dc.identifier.citation | [1] A.K. Jain, M.N. Murty, and P.J. Flynn. “Data clustering: a review”. ACM Computing Surveys, 31(3):264–323, 1999.
[2] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. “Learning internal representations by error propagation”. In Parallel Distributed Processing, Vol. 1: Foundations. MIT Press, Cambridge, MA, 1986.
[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. “Gradient-based learning applied to document recognition”. Proceedings of the IEEE, November 1998.
[4] Y. Bengio, A. Courville, and P. Vincent. “Representation Learning: A Review and New Perspectives”. IEEE Trans. PAMI, special issue Learning Deep Architectures, 2013.
[5] G.E. Hinton and R.R. Salakhutdinov. “Reducing the Dimensionality of Data with Neural Networks”. Science (New York, N.Y.), 313:504–507, 2006.
[6] P.J. Werbos. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences”. PhD thesis, Harvard University, 1974.
[7] David E. Rumelhart, James L. McClelland, and the PDP Research Group. “Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations”. MIT Press, Cambridge, 1986.
[8] Geoffrey E. Hinton. “Training products of experts by minimizing contrastive divergence”. Neural Computation, 14(8):1771–1800, 2002.
[9] Andrew Y. Ng. “Sparse autoencoder”. Class notes for CS294A, Computer Science Department, Stanford University, 2011.
[10] G.E. Hinton, S. Osindero, and Y.W. Teh. “A fast learning algorithm for deep belief nets”. Neural Computation, 18(7):1527–1554, 2006.
[11] Chunfeng Song, Feng Liu, Yongzhen Huang, Liang Wang, and Tieniu Tan. “Auto-encoder Based Data Clustering”. 18th Iberoamerican Congress on Pattern Recognition (CIARP, oral), 2013.
[12] K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. “Constrained k-means clustering with background knowledge”. In International Conference on Machine Learning, pp. 577–584, 2001.
[13] Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng. “Building High-Level Features using Large Scale Unsupervised Learning”. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, 2012.
[14] Cheng-Yuan Liou and Hsin-Chang Yang. “Handprinted character recognition based on spatial topology distance measurement”. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 9, pp. 941–945 (SCI & EI), 1996.
[15] D.J. Burr. “Elastic Matching of Line Drawings”. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 3, pp. 708–713, Nov. 1981.
[16] T. Kohonen. “Self-organization and Associative Memory”, 3rd edition. Berlin, Heidelberg, Germany: Springer-Verlag, 1989.
[17] Daw-Ran Liou, Chia-Ching Lin, and Cheng-Yuan Liou. “Setting Shape Rules for Handprinted Character Recognition”. ACIIDS, The 4th Asian Conference on Intelligent Information and Database Systems, March 19-21, LNCS 7197, pp. 245–252, 2012.
[18] Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, and Caroline Suen. “UFLDL Tutorial: Visualizing a Trained Autoencoder”. Stanford University, April 2013. http://deeplearning.stanford.edu/wiki/index.php/Visualizing_a_Trained_Autoencoder
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20966 | -
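dc.description.abstract | This thesis proposes a handwritten digit clustering method composed of two modules. The first module is a stacked sparse autoencoder, and the second is a handprinted character recognizer based on spatial topology distance measurement. The two modules are an unsupervised and a supervised method, respectively; the key point is that the author turns them into a single, fully unsupervised clustering method.
Unlike today's popular deep architectures, this method thinks laterally: it uses a shallower network with an expanded number of neurons, which avoids the problem in deep architectures that two clusters left unseparated in an earlier layer cannot be separated in later layers.
On the MNIST dataset of 60,000 handwritten digits, the method achieves a clustering result of 77.4%, exceeding the 76% of the related paper. Moreover, ready-made methods exist that can improve performance further.
One advantage of the method is its modular design. Input handwritten digits pass through the autoencoder module, which extracts the features of digit templates, and are then handed to the handprinted character recognition module for clustering. This not only yields a fully unsupervised clustering method; with a little training it can also be used as a classifier. The modular design makes the functionality flexible, so that when any new technique appears, a module can be swapped out to change the function or improve performance.
A rich set of applications is another advantage. The method is not limited to clustering handwritten digits; the clustering process also gives rise to three applications: (1) finding standard templates in a pile of data, such as standard character shapes or graphics; (2) searching an image for things that resemble a standard template, such as scanning aerial photographs for terrain that resembles standard digits; (3) user-defined search, such as searching images by image.
Furthermore, just by changing the training database, the method can also be applied to other domains, such as clustering and search on audio.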
zh_TW
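As a rough illustration of what "sparse" means for the autoencoder module named in the abstract, the sketch below computes the KL-divergence sparsity penalty described in the sparse autoencoder class notes cited as [9]. This record does not state which penalty or hyperparameters the thesis actually uses, so the function name `kl_sparsity_penalty`, the target activation `rho`, and its default value are assumptions made for illustration only.

```python
import math


def kl_sparsity_penalty(mean_activations, rho=0.05):
    """Sum of KL(rho || rho_hat_j) over all hidden units.

    mean_activations: average activation of each hidden unit over the
        training set; each value must lie strictly between 0 and 1
        (for example, averaged sigmoid outputs).
    rho: target average activation (hypothetical default, not taken
        from the thesis).
    """
    total = 0.0
    for rho_hat in mean_activations:
        if not 0.0 < rho_hat < 1.0:
            raise ValueError("mean activations must lie strictly in (0, 1)")
        # KL divergence between two Bernoulli distributions with
        # means rho and rho_hat.
        total += rho * math.log(rho / rho_hat)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return total
```

The penalty is zero when every hidden unit's average activation equals `rho` and grows as units become more active on average, which is what pushes most hidden units toward staying inactive for any given input.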
dc.description.abstract | This thesis presents a handwritten digit clustering method consisting of two modules. The first module is a Stacked Sparse Autoencoder, and the second is a Handprinted Character Recognizer Based on Spatial Topology Distance Measurement. These modules are an unsupervised and a supervised method, respectively; the key contribution is that the author combines them into a fully unsupervised clustering method.
Unlike the currently popular deep architectures, this method takes a lateral approach: it uses a shallower structure with a larger number of neurons. The purpose is to avoid a problem of deep architectures, namely that two clusters not separated in an early layer cannot be separated in later layers.
Applied to the MNIST dataset of 60,000 handwritten digits, the method achieves a clustering accuracy of 77.4%, exceeding the 76% reported in the related paper. Furthermore, there is a ready-made way to improve performance.
One advantage of this method is its modular design. The autoencoder module first extracts digit template features from the input handwritten digits and then hands them to the handwritten character recognition module, which performs the clustering. Together the modules form a fully unsupervised clustering method, and with a little training they can also be turned into a classifier. The modular design makes the functionality flexible: when a new technique emerges, a module can be replaced to change the function or improve performance.
Rich applications are another advantage. Beyond handwritten digit clustering, the clustering process gives rise to three applications: (1) finding standard templates in a pile of data, such as standard characters or graphics; (2) searching images for things that resemble a standard template, such as scanning satellite images for terrain that resembles standard digits; (3) user-defined search, such as search by image.
In addition, simply by changing the training database, the method can be applied in other areas, such as clustering and search in audio.
en
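The abstract reports a clustering accuracy of 77.4% but this record does not say how that figure is computed. One common convention, sketched below purely as an assumption, assigns each cluster the most frequent true label among its members and counts the fraction of samples that match (often called cluster purity). The function name `majority_vote_accuracy` is hypothetical, and the thesis may use a different mapping, such as an optimal one-to-one assignment.

```python
from collections import Counter, defaultdict


def majority_vote_accuracy(cluster_ids, true_labels):
    """Fraction of samples whose true label equals the majority label
    of the cluster they were assigned to (cluster purity).

    cluster_ids: cluster index assigned to each sample.
    true_labels: ground-truth label of each sample, used only for
        evaluation and never for clustering.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest label group.
    correct = sum(counts.most_common(1)[0][1]
                  for counts in label_counts.values())
    return correct / len(true_labels)
```

For example, with clusters `[0, 0, 0, 1, 1, 1]` and labels `[7, 7, 3, 3, 3, 3]`, cluster 0 is labelled 7 and cluster 1 is labelled 3, so five of the six samples count as correct. Note that this measure lets several clusters share one label, so it can overstate quality when the number of clusters exceeds the number of classes.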
dc.description.provenance | Made available in DSpace on 2021-06-08T03:12:52Z (GMT). No. of bitstreams: 1
ntu-106-R01922149-1.pdf: 6750479 bytes, checksum: 023274c9956516b18c7d6ce67eef402b (MD5)
Previous issue date: 2017
en
dc.description.tableofcontents | Oral Examination Committee Certification #
Acknowledgements i
Chinese Abstract ii
ABSTRACT iii
CONTENTS v
LIST OF FIGURES vii
LIST OF TABLES ix
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Convolutional Neural Network 3
1.2.1 LeNet-5 5
1.3 Stacked Sparse Autoencoder 7
1.3.1 Autoencoder 7
1.3.2 Related Works 13
1.3.2.1 Autoencoder Based Data Clustering 13
1.3.2.2 Building High-level Features using Large Scale Unsupervised Learning 15
1.4 Handprinted Character Recognition Based on Spatial Topology Distance Measurement 18
Chapter 2 The Method 22
2.1 Design Idea 22
2.2 Architecture 23
2.3 Algorithm 26
Chapter 3 Experiment Results and Analysis 30
3.1 Data Set 30
3.2 Experiment and Clustering Results 31
3.3 Analysis 35
3.3.1 Extracted Basic Features 35
3.3.2 Extracted High Level Features 39
3.3.3 Feature composition 41
3.3.4 Extreme Sparsity using BackPropagation 42
Chapter 4 Application 48
4.1 Searching on Satellite images 48
4.2 User-defined search using BackPropagation 51
Chapter 5 Conclusion and Future Work 55
5.1 Conclusion 55
5.2 Future Work 57
REFERENCE 60
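dc.language.iso | en |
dc.subject | 淺度學習 (Shallow Learning) | zh_TW
dc.subject | 使用者定義搜尋 (User-Defined Search) | zh_TW
dc.subject | 無監督式分群分法 (Unsupervised Clustering Method) | zh_TW
dc.subject | 稀疏自動編碼器 (Sparse Autoencoder) | zh_TW
dc.subject | 空間拓樸距離 (Spatial Topology Distance) | zh_TW
dc.subject | 手寫數字辨識 (Handwritten Digits Recognition) | zh_TW
dc.subject | Sparse Autoencoder | en
dc.subject | User defined search | en
dc.subject | Unsupervised Clustering method | en
dc.subject | Spatial Topology Distance | en
dc.subject | Handwritten Digits Recognition | en
dc.subject | Shallow Learning | en
dc.title | 一個以自動編碼為基礎及無監督式的手寫數字分群方法 (An Autoencoder-Based and Unsupervised Method for Handwritten Digits Clustering) | zh_TW
dc.title | An Autoencoder-Based and Unsupervised Method for Handwritten Digits Clustering | en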
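dc.type | Thesis |
dc.date.schoolyear | 105-1 |
dc.description.degree | Master's (碩士) |
dc.contributor.oralexamcommittee | 呂育道, 周承復 |
dc.subject.keyword | 無監督式分群分法 (Unsupervised Clustering Method), 使用者定義搜尋 (User-Defined Search), 稀疏自動編碼器 (Sparse Autoencoder), 淺度學習 (Shallow Learning), 手寫數字辨識 (Handwritten Digits Recognition), 空間拓樸距離 (Spatial Topology Distance) | zh_TW
dc.subject.keyword | Unsupervised Clustering method, User defined search, Sparse Autoencoder, Shallow Learning, Handwritten Digits Recognition, Spatial Topology Distance | en
dc.relation.page | 61 |
dc.identifier.doi | 10.6342/NTU201700622 |
dc.rights.note | Not authorized (未授權) |
dc.date.accepted | 2017-02-15 |
dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | zh_TW
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) | zh_TW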
Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
File | Size | Format
ntu-106-1.pdf (not authorized for public access) | 6.59 MB | Adobe PDF
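Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.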
