Label Optical Character Recognition with Deep Learning

Ping-Yuan Tseng; 曾柄元

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325

Title:	Label Optical Character Recognition with Deep Learning (使用深度學習之標籤光學文字識別)
Author:	Ping-Yuan Tseng 曾柄元
Advisor:	傅楸善 (Chiou-Shann Fuh)
Keywords:	Deep learning, Instance Segmentation, CTPN, SSD, CRNN
Year of publication:	2020
Degree:	Master's
Abstract:	In this thesis, we propose a deep learning solution for Optical Character Recognition (OCR) on labels. The solution has two main parts. The first part is text-line detection, for which we propose two detection models based on entirely different methods. The first model, Water Pixel, is based on image segmentation. It predicts two segmentation maps, one with the text at its normal size and one with the text shrunk, and then applies the watershed algorithm to separate touching text, achieving the effect of instance segmentation. The second model is CT-SSD. After comparing several common deep learning text detection algorithms, we adopt the idea behind CTPN (Connectionist Text Proposal Network), which divides the ground-truth text boxes into small pieces for training and thereby handles the long text lines that other algorithms detect poorly. CTPN is trained with Faster R-CNN, however, and its speed does not meet our requirements. We therefore use SSD (Single-Shot multibox Detector) as the detection model in place of Faster R-CNN. On a GPU (Graphics Processing Unit), CT-SSD detects text boxes in near real time while remaining well suited to long text. The second part is text-line recognition. No dataset of real traditional Chinese text lines is publicly available online, so we train a CRNN (Convolutional Recurrent Neural Network) on synthetic traditional Chinese text lines. In addition, we use the simplified Chinese dataset CTW (Chinese Text dataset in the Wild) to alternately train the encoder of the CRNN, which partially mitigates the overfitting caused by training on synthetic data alone. We also apply several data augmentation techniques, such as blur, stretching (shrinking and expanding), distortion, background variation, jitter, and aliasing, to cope with variation across scenes.
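The box-splitting idea that the abstract attributes to CTPN can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `split_box` and `merge_slices`, the `(x_min, y_min, x_max, y_max)` box layout, and the greedy merge rule are all assumptions made for illustration. The 16-pixel default follows the fixed proposal width used in the original CTPN; the slice width used by CT-SSD is not stated in this record.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def split_box(box: Box, slice_width: int = 16) -> List[Box]:
    """Cut one text-line box into consecutive vertical slices.

    Each slice is at most `slice_width` pixels wide; the last one may be narrower.
    """
    if slice_width <= 0:
        raise ValueError("slice_width must be positive")
    x_min, y_min, x_max, y_max = box
    slices = []
    x = x_min
    while x < x_max:
        right = min(x + slice_width, x_max)
        slices.append((x, y_min, right, y_max))
        x = right
    return slices


def merge_slices(slices: List[Box], max_gap: int = 0) -> List[Box]:
    """Greedily chain slices back into text lines (an assumed, simplified rule).

    A slice joins an existing line when it starts no more than `max_gap`
    pixels past the line's right edge and overlaps the line vertically.
    """
    lines: List[Box] = []
    for x0, y0, x1, y1 in sorted(slices):
        for i, (lx0, ly0, lx1, ly1) in enumerate(lines):
            touches = x0 - lx1 <= max_gap
            overlaps = min(y1, ly1) > max(y0, ly0)
            if touches and overlaps:
                # Extend the line to cover the new slice.
                lines[i] = (lx0, min(ly0, y0), max(lx1, x1), max(ly1, y1))
                break
        else:
            lines.append((x0, y0, x1, y1))
    return lines


if __name__ == "__main__":
    line_a = (0, 0, 40, 10)
    line_b = (0, 20, 35, 30)
    pieces = split_box(line_a) + split_box(line_b)
    print(len(pieces), "slices ->", merge_slices(pieces))
```

Because every slice has a bounded width, a detector only has to localize small, uniform pieces regardless of how long the text line is, and the full line is recovered afterwards by chaining neighbouring slices.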
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325
DOI:	10.6342/NTU202001296
Full-text authorization:	Not authorized
Appears in Collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
U0001-0307202015335300.pdf (not authorized for public access)	5.1 MB	Adobe PDF


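Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.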
