Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50931
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 盧信銘(Hsin-Min Lu) | |
dc.contributor.author | Chih-Chia Huang | en |
dc.contributor.author | 黃志家 | zh_TW |
dc.date.accessioned | 2021-06-15T13:07:19Z | - |
dc.date.available | 2021-07-31 | |
dc.date.copyright | 2020-08-21 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-11 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50931 | - |
dc.description.abstract | 中文的每個文字圖像皆蘊含豐富的資訊,包含部首、讀音、組成結構、筆畫、筆順等。本研究將聚焦在單一文字的層次,從文字字形的角度切入,提出了Chinese cHaracter Adversarial Image Reconstructor (CHAIR)——一個以編碼器—解碼器為主要架構,並以對抗式學習法進行表徵學習(Representation Learning)的圖像填補模型。將字形圖片做為模型的輸入,試圖使該模型能夠對於中文字圖像資訊能有一定程度的理解。該模型使用其學習到的潛在特徵,以卷積神經網路(Convolutional Neural Network)的模型進行如部首分類、部件分類、筆畫數迴歸、讀音分類、相似度分析與對比度分析等的下游任務。不同於多數著重在詞彙層次的中文詞嵌入(Chinese Word Embedding)研究,其目的在於使得深度學習模型能夠有更好的詞向量表示能力。
實驗結果顯示,使用圖像填補的技巧對於字形圖片進行模型的學習,其表現較直接使用傳統的自動編碼器來得更好,更能夠捕捉到字形中各個角落的特徵,達到資料增廣的功效。應用該模型所學習而得的潛在特徵,對於在字彙知識中直接與字形相關的部首、部件等,以及與字形複雜度相關的筆畫數的任務,能夠有良好的理解;而由於中文有一字多音的特性,且就造字法則而言,僅有形聲字的字形與讀音相關,因此潛在特徵對於讀音的理解還仍有改善空間。另外,本研究也針對潛在特徵設計了文字相似度與文字對比度的任務,顯示潛在特徵仍然保有在不同字形中所蘊含的相似資訊。 | zh_TW |
dc.description.abstract | A Chinese character glyph contains rich information, including radicals, pronunciation, and composition structure, that may help readers interpret the character. This thesis focuses on the task of understanding Chinese character glyphs. We propose a deep image inpainting model named “Chinese cHaracter Adversarial Image Reconstructor” (CHAIR) based on an encoder-decoder architecture. CHAIR learns latent representations of character glyphs within a generative adversarial framework and applies the learned representations to downstream prediction tasks, including radical classification, component classification, stroke-count prediction, pronunciation classification, and similarity and analogy analysis. Unlike most Chinese word embedding studies, which learn word or character representations from co-occurrence structure, our study aims at understanding the meaning of Chinese characters from images of their glyphs. Experimental results show that CHAIR achieves higher prediction performance than models based on traditional autoencoders. With the image inpainting technique, CHAIR learns to fill in a glyph damaged by masks at random locations, which allows the model to better capture every corner of a glyph. Moreover, the latent representations learned by CHAIR better predict important characteristics of a Chinese character glyph, such as radicals, components, and the number of strokes. However, because of polyphony and the rules of Chinese character construction, pronunciation is related to the glyph only in phono-semantic compound characters, so there is still room for improvement in the pronunciation understanding of the latent representations. In addition, this study designs character similarity and character analogy tasks; both show that the latent representations preserve the similarity information shared between different characters. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T13:07:19Z (GMT). No. of bitstreams: 1 U0001-1008202017401900.pdf: 6494349 bytes, checksum: 3f77f62996ce6fc7f89c75f1d544d7d6 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | 誌謝 ii 摘要 iii Abstract iv 主目錄 vi 圖目錄 x 表目錄 xii 第一章 緒論 1 1.1 研究動機與目的 1 1.2 研究架構 3 第二章 文獻探討 4 2.1 自動編碼器(Autoencoder) 4 2.1.1 降噪自動編碼器(Denoising Autoencoder) 5 2.1.2 卷積自動編碼器(Convolutional Autoencoder) 8 2.1.3 編碼器—解碼器架構(Encoder-Decoder) 9 2.1.4 變分自動編碼器(Variational Autoencoder) 9 2.2 對抗式學習(Adversarial Learning) 11 2.2.1 生成式對抗網路(Generative Adversarial Network) 11 2.2.2 深度卷積生成式對抗網路(DCGAN) 14 2.2.3 最小平方生成式對抗網路(LSGAN) 15 2.2.4 生成式對抗網路與自動編碼器的結合 16 2.3 對抗式學習在中文字形圖片上的應用 18 2.4 應用深度學習技術進行圖像填補(Image Inpainting) 19 2.4.1 編碼器—解碼器架構的圖像填補模型 19 2.4.2 從粗到細(Coarse-to-Fine)的圖像填補模型 21 2.5 中文詞嵌入(Chinese Word Embeddings) 23 2.5.1 詞彙的層次(Word Level) 26 2.5.2 文字的層次(Character Level) 27 2.5.2.1 以文字意義的角度進行文字向量的訓練 28 2.5.2.2 以字形圖片的角度進行文字向量的訓練 29 2.5.3 部件的層次(Component Level) 32 2.6 中文文字的組成元素 38 2.6.1 漢字的造字法則 38 2.6.1.1 象形 39 2.6.1.2 指事 39 2.6.1.3 會意 39 2.6.1.4 形聲 39 2.6.2 漢字的部件組成 40 2.6.3 漢字的一字多音現象 42 2.7 小結 44 第三章 研究方法 45 3.1 研究問題 45 3.2 研究資料來源 46 3.2.1 字形圖片 46 3.2.2 字彙知識 46 3.3 研究流程 49 3.4 實驗模型設計 52 3.4.1 字形圖片填補模型 52 3.4.2 部首分類預測模型 56 3.4.3 筆畫數迴歸預測模型 58 3.4.4 部件分類預測模型 60 3.4.5 讀音分類預測模型 62 3.4.6 潛在特徵相似分析 64 3.4.7 潛在特徵對比分析 65 3.4.8 小結 65 3.5 衡量指標 66 3.5.1 結構相似性(Structural Similarity, SSIM) 66 3.5.2 峰值信噪比(Peak Signal-to-Noise Ratio, PSNR) 67 3.6 基準模型 69 第四章 結果與討論 72 4.1 字形圖片填補任務 72 4.1.1 實驗結果 73 4.1.1.1 質性衡量 73 4.1.1.2 量化衡量 75 4.1.2 與其他生成模型的比較 76 4.1.3 錯誤分析 77 4.2 部首分類預測任務 78 4.2.1 實驗結果 78 4.2.2 與其他模型比較 79 4.2.3 錯誤分析 80 4.2.4 改善方向 80 4.3 筆畫數迴歸預測任務 83 4.3.1 實驗結果 84 4.3.2 與其他模型比較 85 4.4 部件分類預測任務 87 4.4.1 實驗結果 87 4.4.2 與其他模型比較 88 4.4.3 錯誤分析 89 4.5 讀音分類預測任務 90 4.5.1 實驗結果 90 4.5.2 與其他模型比較 92 4.6 潛在特徵相似分析 93 4.6.1 實驗結果 93 4.6.2 與其他模型比較 94 4.7 潛在特徵對比分析 96 4.7.1 實驗結果 96 4.7.2 與其他模型的比較 98 第五章 結論與建議 99 5.1 研究結果與討論 99 5.2 研究貢獻 100 5.3 未來展望 101 參考文獻 103 | |
dc.language.iso | zh-TW | |
dc.title | 應用深度學習技術於中文文字圖像理解 | zh_TW |
dc.title | Understanding Chinese Character Glyph Using Deep Learning Models | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 林怡伶(Yi-Ling Lin),簡宇泰(Yu-Tai Chien) | |
dc.subject.keyword | 中文字圖像理解,對抗式學習,圖像填補,特徵學習 | zh_TW |
dc.subject.keyword | Chinese Character Glyph Understanding,Adversarial Training,Image Inpainting,Representation Learning | en |
dc.relation.page | 108 | |
dc.identifier.doi | 10.6342/NTU202002849 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2020-08-11 | |
dc.contributor.author-college | 管理學院 | zh_TW |
dc.contributor.author-dept | 資訊管理學研究所 | zh_TW |
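The masked-glyph inpainting setup described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the toy 8x8 bitmap, the function names, and the 3x3 mask size are assumptions for demonstration; the thesis trains an adversarial encoder-decoder on real glyph images and also reports SSIM alongside PSNR.

```python
import random
import math

def apply_random_mask(glyph, mask_h, mask_w, fill=0.0, rng=None):
    """Return a copy of the 2-D glyph with a mask_h x mask_w patch overwritten,
    mimicking the random-location damage CHAIR learns to fill in."""
    rng = rng or random.Random(0)
    h, w = len(glyph), len(glyph[0])
    top = rng.randrange(h - mask_h + 1)
    left = rng.randrange(w - mask_w + 1)
    masked = [row[:] for row in glyph]
    for r in range(top, top + mask_h):
        for c in range(left, left + mask_w):
            masked[r][c] = fill
    return masked

def psnr(original, reconstructed, max_val=1.0):
    """Peak signal-to-noise ratio between two 2-D images with values in [0, max_val]."""
    h, w = len(original), len(original[0])
    mse = sum((original[r][c] - reconstructed[r][c]) ** 2
              for r in range(h) for c in range(w)) / (h * w)
    if mse == 0:
        return float("inf")  # identical images: perfect reconstruction
    return 10 * math.log10(max_val ** 2 / mse)

glyph = [[1.0] * 8 for _ in range(8)]     # toy all-ink 8x8 "glyph"
damaged = apply_random_mask(glyph, 3, 3)  # 3x3 hole at a random location
print(psnr(glyph, glyph))                 # inf: nothing to repair
print(psnr(glyph, damaged))               # finite: the hole lowers PSNR
```

A trained inpainting model would take `damaged` as input and be rewarded (by reconstruction loss plus a discriminator) for outputs whose PSNR/SSIM against `glyph` is high; the quantitative evaluation in Chapter 4 compares such scores across models.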
Appears in Collections: | Department of Information Management (資訊管理學系)
Files in This Item:
File | Size | Format |
---|---|---|---|
U0001-1008202017401900.pdf Restricted access (not currently available to the public) | 6.34 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.