Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325
Title: Label Optical Character Recognition with Deep Learning (使用深度學習之標籤光學文字識別)
Authors: Ping-Yuan Tseng (曾柄元)
Advisor: Chiou-Shann Fuh (傅楸善)
Keyword: Deep learning, Instance Segmentation, CTPN (Connectionist Text Proposal Network), SSD (Single-Shot multibox Detector), CRNN (Convolutional Recurrent Neural Network)
Publication Year : 2020
Degree: Master's
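Abstract: This thesis proposes a deep learning solution for optical character recognition (OCR) of labels. The work consists of two main parts. The first part is text line detection, for which we propose two detection models that use entirely different methods. The first is an image-segmentation-based text line detection model called Water Pixel. The model produces two segmentation maps, one with the text at normal size and one with the text shrunk, and the Watershed algorithm is then used to separate touching text, achieving the effect of instance segmentation. The second detection model is CT-SSD. After comparing common deep learning text detection algorithms, we decided to adopt the idea behind CTPN (Connectionist Text Proposal Network), which cuts the text boxes of the training data into small pieces for training and thereby solves the problem that other algorithms are poorly suited to detecting long text. However, CTPN is not fast enough for our needs, so we use SSD (Single-Shot multibox Detector) as the detection model. On a GPU (Graphics Processing Unit), text box detection runs in near real time and is well suited to long text. The second part is text line recognition. No real traditional Chinese text line dataset is publicly available online, so this thesis trains a CRNN (Convolutional Recurrent Neural Network) on synthetic traditional Chinese text lines and uses the in-the-wild simplified Chinese dataset CTW (Chinese Text dataset in the Wild) to alternately train the CRNN encoder, in order to overcome the overfitting caused by training the CRNN on synthetic data alone. We also apply a variety of data augmentation methods, such as blur, stretching, warping, background variation, jitter, and distortion, to cope with variation across scenes.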
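The Water Pixel post-processing step described above can be illustrated with a minimal sketch. It assumes the two binary masks (normal-size text and shrunk text) have already been predicted, labels the connected components of the shrunk mask as seeds, and then grows all seeds outward at the same rate inside the normal-size mask. This seeded growth is a simplified stand-in for a marker-based watershed, not the implementation used in the thesis; the function names `label_seeds` and `grow_seeds` are hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_seeds(mask):
    """Label the 4-connected components of a binary mask as 1, 2, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def grow_seeds(full, shrunk):
    """Grow every shrunk-text seed simultaneously inside the full-size mask.

    Returns (labels, count): labels[r][c] is the text-line id of each pixel
    (0 for background) and count is the number of text lines found.
    """
    labels, count = label_seeds(shrunk)
    h, w = len(full), len(full[0])
    # Start the breadth-first flood from all seed pixels at once, so that
    # each unlabeled text pixel is claimed by the nearest seed.
    queue = deque((r, c) for r in range(h) for c in range(w) if labels[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < h and 0 <= nx < w
            if inside and full[ny][nx] and not labels[ny][nx]:
                labels[ny][nx] = labels[y][x]
                queue.append((ny, nx))
    return labels, count
```

For example, if two text lines touch so that the normal-size mask forms a single blob, `label_seeds` on that mask alone finds one component, whereas `grow_seeds` recovers one label per seed in the shrunk mask.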
In this thesis, we propose a deep learning solution for Optical Character Recognition (OCR) of labels. The work is divided into two main parts. The first part is text line detection, for which we propose two models that use entirely different methods. The first detection model, Water Pixel, is based on image segmentation. It predicts two segmentation maps: one of the text at normal size and one of the text shrunk. We then apply the watershed algorithm to separate touching or overlapping text regions, achieving the effect of instance segmentation. The second detection model is CT-SSD. After comparing several deep learning-based algorithms, we found that CTPN (Connectionist Text Proposal Network) performs well. The main idea of CTPN is to divide the ground truth boxes into small pieces and then train with Faster R-CNN. CTPN detects long text lines well, but its speed does not meet our requirements. We therefore use SSD (Single-Shot multibox Detector) in place of Faster R-CNN as the detection backbone. CT-SSD detects long text well and also reduces inference time.
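The idea of dividing ground truth boxes into small pieces can be sketched as follows. This is an illustrative simplification, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` and a strip width of 16 pixels as in the original CTPN design; the strip width used in the thesis is not stated here, and the helper names `slice_box` and `merge_strips` are hypothetical.

```python
def slice_box(x_min, y_min, x_max, y_max, width=16):
    """Split an axis-aligned text-line box into fixed-width vertical strips.

    The last strip is clipped to the right edge of the box, so it may be
    narrower than `width`.
    """
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")
    if width <= 0:
        raise ValueError("strip width must be positive")
    strips = []
    x = x_min
    while x < x_max:
        strips.append((x, y_min, min(x + width, x_max), y_max))
        x += width
    return strips


def merge_strips(strips):
    """Recover the enclosing text-line box from a group of strips."""
    return (
        min(s[0] for s in strips),
        min(s[1] for s in strips),
        max(s[2] for s in strips),
        max(s[3] for s in strips),
    )
```

With these definitions, `slice_box(0, 10, 40, 30)` yields the strips `(0, 10, 16, 30)`, `(16, 10, 32, 30)`, and `(32, 10, 40, 30)`, and `merge_strips` on that list returns the original box. Because each training target is a narrow strip, the detector never has to regress the full extent of a long text line in one step.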
The second part is text line recognition. No traditional Chinese text line dataset is publicly available on the web, so we train our CRNN (Convolutional Recurrent Neural Network) on synthetic traditional Chinese text lines. In addition, we use the simplified Chinese dataset CTW (Chinese Text dataset in the Wild) to improve CRNN performance by alternately training the CRNN encoder. This approach partially mitigates the overfitting that training on synthetic data alone can cause. We also apply several augmentation techniques, such as blur, shrinking and expanding, distortion, background variation, shake, and aliasing, to avoid overfitting to a single scenario.
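Two of the augmentations listed above, horizontal stretching and blur, can be sketched in plain Python. This is only an illustration under stated assumptions: images are grayscale and stored as lists of rows, the stretch range of 0.8 to 1.2 and the blur probability of 0.5 are arbitrary choices, and the function names are hypothetical. The thesis's actual augmentation pipeline and parameters are not given in this record.

```python
import random


def stretch(img, factor):
    """Resize an image horizontally by `factor` with nearest-neighbour sampling."""
    w = len(img[0])
    new_w = max(1, round(w * factor))
    return [
        [row[min(w - 1, int(x / factor))] for x in range(new_w)]
        for row in img
    ]


def box_blur(img):
    """Average each pixel with its left and right neighbours (edges clamped)."""
    w = len(img[0])
    return [
        [(row[max(x - 1, 0)] + row[x] + row[min(x + 1, w - 1)]) / 3
         for x in range(w)]
        for row in img
    ]


def augment(img, rng):
    """Apply a random horizontal stretch, then blur with probability 0.5."""
    out = stretch(img, rng.uniform(0.8, 1.2))
    if rng.random() < 0.5:
        out = box_blur(out)
    return out
```

Passing a seeded generator, for instance `augment(img, random.Random(0))`, keeps the augmentation reproducible. The image height is unchanged, while the width varies with the sampled stretch factor.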
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325
DOI: 10.6342/NTU202001296
Fulltext Rights: Not authorized
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File: U0001-0307202015335300.pdf (Restricted Access)
Size: 5.1 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
