Understanding Chinese Character Glyph Using Deep Learning Models

Chih-Chia Huang; 黃志家

Please cite this item using this Handle URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50931

Title:	Understanding Chinese Character Glyph Using Deep Learning Models (Chinese title: 應用深度學習技術於中文文字圖像理解)
Author:	Chih-Chia Huang 黃志家
Advisor:	Hsin-Min Lu (盧信銘)
Keywords:	Chinese Character Glyph Understanding, Adversarial Training, Image Inpainting, Representation Learning
Publication year:	2020
Degree:	Master's
Abstract:	Every Chinese character glyph carries rich information, including its radical, pronunciation, composition structure, strokes, and stroke order, which may help readers interpret the character. This thesis works at the level of individual characters and approaches them through their glyphs. We propose the Chinese cHaracter Adversarial Image Reconstructor (CHAIR), a deep image inpainting model built on an encoder-decoder architecture that learns latent representations of character glyphs through adversarial training. Taking glyph images as input, CHAIR aims to acquire a degree of understanding of the information a Chinese character image contains. The learned latent representations are then fed to convolutional neural network (CNN) models for downstream tasks: radical classification, component classification, stroke-count regression, pronunciation classification, and character similarity and analogy analysis. Most Chinese word embedding studies work at the word level, learning word or character vectors from co-occurrence structure so that deep learning models obtain better word representations; our study instead aims to understand Chinese characters from images of their glyphs. Experimental results show that CHAIR achieves higher prediction performance than models trained as traditional autoencoders. With the image inpainting technique, CHAIR learns to fill in a glyph damaged by masks at random locations; this lets the model better capture features in every corner of a glyph and has an effect similar to data augmentation.
The latent representations learned by CHAIR are good predictors of the characteristics tied directly to a glyph, such as radicals and components, and of the number of strokes, which reflects glyph complexity. Pronunciation is harder: many Chinese characters are polyphonic, and under the traditional principles of character formation, only the glyphs of phono-semantic characters are related to their pronunciation. Hence, there is still room to improve how well the latent representations capture pronunciation. Finally, this study designs character similarity and character analogy tasks, both of which show that the latent representations preserve similarity information across different glyphs.
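The abstract says CHAIR is trained to fill in glyphs damaged by masks at random locations. The following is only a minimal sketch of how such training inputs might be built, not the thesis's actual procedure. It assumes a grayscale glyph stored as a list of rows with values in [0, 1], a single rectangular hole per image, a fill value of 0.0, and a reconstruction loss computed only over hidden pixels. The function names `random_box_mask`, `apply_mask`, and `masked_l1` are hypothetical. The encoder-decoder and the adversarial discriminator loss are omitted.

```python
import random
from typing import List, Tuple

# Assumption: a glyph is a grayscale image, one list per row, values in [0, 1].
Image = List[List[float]]
Mask = List[List[int]]


def random_box_mask(height: int, width: int, box_h: int, box_w: int,
                    rng: random.Random) -> Tuple[Mask, Tuple[int, int]]:
    """Hypothetical helper: one box_h x box_w hole (1 = hidden) at a random position."""
    if not (0 < box_h <= height and 0 < box_w <= width):
        raise ValueError("hole must fit inside the image")
    # randint is inclusive on both ends, so the hole never leaves the image.
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    mask = [[1 if top <= r < top + box_h and left <= c < left + box_w else 0
             for c in range(width)]
            for r in range(height)]
    return mask, (top, left)


def apply_mask(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Damage the glyph: hidden pixels are overwritten with `fill` (assumed 0.0)."""
    return [[fill if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def masked_l1(pred: Image, target: Image, mask: Mask) -> float:
    """Mean absolute error over hidden pixels only (an assumed reconstruction loss)."""
    total, count = 0.0, 0
    for prow, trow, mrow in zip(pred, target, mask):
        for p, t, m in zip(prow, trow, mrow):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    glyph = [[1.0] * 8 for _ in range(8)]  # toy 8x8 "all ink" glyph
    mask, (top, left) = random_box_mask(8, 8, 3, 3, rng)
    damaged = apply_mask(glyph, mask)
    # An inpainting model would take `damaged` as input and be scored on the hole.
    print("hole at", (top, left), "loss if left unfilled:", masked_l1(damaged, glyph, mask))
```

Because each training step draws a fresh hole position, the same glyph is seen with different regions removed. This is one plausible reading of why the abstract links inpainting to a data-augmentation effect and to better coverage of a glyph's corners.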
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50931
DOI:	10.6342/NTU202002849
Full-text license:	Paid license
Appears in department:	Department of Information Management

Files in this item:

File	Size	Format
U0001-1008202017401900.pdf (not authorized for public access)	6.34 MB	Adobe PDF


