Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49888
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 黃乾綱(CHIEN-KANG HUANG) | |
dc.contributor.author | Chen-Hsiang Sun | en |
dc.contributor.author | 孫晨翔 | zh_TW |
dc.date.accessioned | 2021-06-15T12:25:46Z | - |
dc.date.available | 2024-08-11 | |
dc.date.copyright | 2020-09-02 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-12 | |
dc.identifier.citation | 1. Mori, S., C.Y. Suen, and K. Yamamoto, Historical review of OCR research and development. Proceedings of the IEEE, 1992. 80(7): p. 1029-1058.
2. Veit, A., et al., COCO-Text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140, 2016.
3. LeCun, Y., et al., Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998. 86(11): p. 2278-2324.
4. Cohen, G., et al., EMNIST: Extending MNIST to handwritten letters. in 2017 International Joint Conference on Neural Networks (IJCNN). 2017.
5. Arditi, A. and J. Cho, Serifs and font legibility. Vision Research, 2005. 45(23): p. 2926-2933.
6. Suwa, M., Segmentation of connected handwritten numerals by graph representation. 2005. Vol. 2, p. 750-754.
7. Green Growth Knowledge Platform, Initiative for Climate Action Transparency (ICAT) Assessment Guides. [Internet]; Available from: https://www.greengrowthknowledge.org/learning/initiative-climate-action-transparency-icat-assessment-guides.
8. 李育安, 利用電腦字型建立卷積神經網絡之中文漢字模型進行手寫與印刷字體辨識 [Building a convolutional neural network model of Chinese characters from computer fonts for handwritten and printed character recognition], in 工程科學及海洋工程學研究所. 2018, 國立臺灣大學. p. 1-84.
9. OpenCV, Image Thresholding. [Internet] 2019; Available from: https://docs.opencv.org/4.1.1/d7/d4d/tutorial_py_thresholding.html.
10. Otsu, N., A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 1979. 9(1): p. 62-66.
11. GH-Lin, Morphology. [Internet] 2018; Available from: http://gh-lin.blogspot.com/2018/02/morphology.html.
12. Sundar, H., et al., Skeleton based shape matching and retrieval. in 2003 Shape Modeling International. 2003. IEEE.
13. Félix, OpenCV - Morphological Skeleton. [Internet] 2011; Available from: http://felix.abecassis.me/2011/09/opencv-morphological-skeleton/.
14. SuperDataScience Team, Convolutional Neural Networks (CNN): Step 1 - Convolution Operation. [Internet] 2018; Available from: https://www.superdatascience.com/blogs/convolutional-neural-networks-cnn-step-1-convolution-operation.
15. Savchenko, A.V., Efficient image recognition with convolutional neural networks. [Internet] 2018; Available from: https://nnov.hse.ru/data/2018/03/10/1165717678/Efficient%20Image%20Recognition%20with%20CNN.pdf.
16. Elman, J.L., Finding structure in time. Cognitive Science, 1990. 14(2): p. 179-211.
17. Olah, C., Understanding LSTM Networks. [Internet] 2015; Available from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
18. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural Computation, 1997. 9(8): p. 1735-1780.
19. Olah, C., Neural Networks, Types, and Functional Programming. [Internet] 2015; Available from: http://colah.github.io/posts/2015-09-NN-Types-FP/.
20. Graves, A., et al., Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. in Proceedings of the 23rd International Conference on Machine Learning. 2006.
21. Scheidl, H., An Intuitive Explanation of Connectionist Temporal Classification. [Internet] 2018; Available from: https://towardsdatascience.com/intuitively-understanding-connectionist-temporal-classification-3797e43a86c.
22. Distill, Sequence Modeling with CTC. [Internet] 2017; Available from: https://distill.pub/2017/ctc/.
23. Hochuli, A.G., et al., Handwritten digit segmentation: Is it still necessary? Pattern Recognition, 2018. 78: p. 1-11.
24. Shi, B., X. Bai, and C. Yao, An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016. 39(11): p. 2298-2304.
25. Simonyan, K. and A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
26. Glorot, X., A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks. in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011.
27. Doosje, B., et al., Guilty by association: When one's group has a negative history. Journal of Personality and Social Psychology, 1998. 75(4): p. 872.
28. Dutta, K., et al., Improving CNN-RNN hybrid networks for handwriting recognition. in 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). 2018.
29. Pan, S.J. and Q. Yang, A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2009. 22(10): p. 1345-1359. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49888 | - |
dc.description.abstract | 深度學習是目前在人工智慧領域中最廣泛被應用的技術,透過網路架構的各種變化,能夠因應各種領域的需求,光學字元辨識(Optical Character Recognition)就是其中一項,而此應用通常稱為文字影像辨識。 文字影像辨識一直是文字資料電子化常用的方法,目前通常透過深度學習模型,來將文字圖片轉換為電子資訊,無論是印刷字還是手寫字,皆可以採用此方法作轉換。 本論文主要透過選用既有資料集EMNIST及68種字體依據EMNIST圖片格式設定之字符圖片,依照所觀察之手寫字符的特徵做強化處理後合成為單一及連續訓練資料,並採用能夠針對連續字符辨識之CRNN模型架構,將資料集分別做單一字符及連續字符的模型訓練。 單一字符實驗結果顯示,透過將手寫字符混入印刷字體能夠達到模型通用性,同一模型對於用來混合之原資料測試集,都能維持或提升辨識水準,而對於手寫表格資料的辨識率,分別由印刷體42%及手寫字82%提升至混合資料集84%。 連續字符實驗結果顯示,在連續字符的合成上加入了所觀察之手寫字符特徵,分別利用漸層強化(向內及向外)模擬手寫字符邊緣的不定性,另外對字符在合成時做大小的隨機縮放及連續字符之間的偏移趨勢(向上、水平及向下),模擬手寫字符的大小及位置隨機性,將手寫表格的辨識率,分別由印刷體5.2632%及手寫字44.211%提升至混合資料集57.895%,其中對3個字符以上的較長字串辨識能力由無法辨識的0%提升至39.3%(11/28),表示特徵強化的必須性。 | zh_TW |
dc.description.abstract | At present, deep learning is the most widely applied technology in the field of artificial intelligence; through variations of the network architecture, it can meet the needs of many domains, one of which is Optical Character Recognition (OCR), commonly applied as text image recognition. Text image recognition has long been a common method for digitizing text data. Today, deep learning models are typically used to convert text images, whether printed or handwritten, into electronic information. This thesis uses the existing EMNIST dataset together with character images rendered from 68 fonts in the EMNIST image format. Based on observed characteristics of handwritten characters, the images are enhanced and then synthesized into single-character and continuous-character training data, and a CRNN architecture suited to continuous character recognition is trained separately on each dataset. The single-character experiments show that mixing handwritten characters with printed fonts makes the model more general: the same model maintains or improves its recognition level on the test sets of the original datasets used for mixing, while the recognition rate on handwritten form data rises from 42% (printed data) and 82% (handwritten data) to 84% (mixed dataset). The continuous-character experiments incorporate the observed characteristics of handwritten characters into the synthesis: gradient enhancement (inward and outward) simulates the irregularity of handwritten character edges, while random scaling of characters during synthesis and offset trends between consecutive characters (upward, horizontal, and downward) simulate the randomness of handwritten character size and position. These techniques raise the recognition rate on handwritten forms from 5.2632% (printed data) and 44.211% (handwritten data) to 57.895% (mixed dataset); in particular, the ability to recognize longer strings of three or more characters improves from 0% (0/28) to 39.3% (11/28), demonstrating the necessity of feature enhancement. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T12:25:46Z (GMT). No. of bitstreams: 1 U0001-1108202019550800.pdf: 5792245 bytes, checksum: 4a4e7ee9acfb7446e3794220ae01b709 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Acknowledgements i Abstract (Chinese) ii Abstract iii Contents v List of Figures viii List of Tables x Chapter 1. Introduction 1 1.1. Background and Motivation 1 1.2. Research Objectives 2 1.3. Contributions 2 1.4. Thesis Organization 3 Chapter 2. Background and Related Work 4 2.1. Printed Alphanumeric Characters 4 2.2. Handwritten Alphanumeric Characters 5 2.3. Basic Image Processing 6 2.3.1. Grayscale Conversion 6 2.3.2. Image Thresholding 6 2.3.3. Dilation and Erosion 10 2.3.4. Thinning 11 2.4. Neural Networks 11 2.4.1. Convolutional Neural Network 11 2.4.2. Recurrent Neural Network 13 2.4.3. Prediction Difference Evaluation 14 2.5. Current Status and Limitations 17 Chapter 3. Methodology 19 3.1. Problem Definition and Research Workflow 19 3.2. Data Preprocessing 20 3.2.1. Dictionary Generation 20 3.3. Image Synthesis 21 3.4. Model Training 24 3.5. Accuracy 26 3.6. Handwritten Form Data (Target Images) 27 3.6.1. Form Data Processing 27 3.6.2. Testing Existing Tools 27 Chapter 4. Results and Discussion 28 4.1. Experimental Procedure 28 4.2. Single-Character Results 28 4.3. Continuous-Character Results 32 4.4. Discussion 36 Chapter 5. Conclusions and Future Work 37 5.1. Conclusions 37 5.2. Future Work 38 References 39 Appendix A: Examples of Standard Font Digits 41 Appendix B: Comparison of Recognition Results between Google Keep and the Proposed Method 42 | |
dc.language.iso | zh-TW | |
dc.title | 改善合成資料以強化CRNN手寫數字辨識之方法 | zh_TW |
dc.title | Method for improving synthetic data to enhance CRNN handwritten digit recognition | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 林恩仲(EN-CHUNG LIN),張恆華(HERNG-HUA CHANG),丁肇隆(CHAO-LUNG TING) | |
dc.subject.keyword | 深度學習,電腦視覺,手寫辨識, | zh_TW |
dc.subject.keyword | Deep Learning, Computer Vision, Handwriting Recognition | en |
dc.relation.page | 47 | |
dc.identifier.doi | 10.6342/NTU202003004 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2020-08-13 | |
dc.contributor.author-college | College of Engineering | zh_TW |
dc.contributor.author-dept | Graduate Institute of Engineering Science and Ocean Engineering | zh_TW |
Appears in Collections: | Department of Engineering Science and Ocean Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-1108202019550800.pdf (currently not authorized for public access) | 5.66 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.