NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325
Full metadata record
dc.contributor.advisor: 傅楸善 (Chiou-Shann Fuh)
dc.contributor.author: Ping-Yuan Tseng [en]
dc.contributor.author: 曾柄元 [zh_TW]
dc.date.accessioned: 2021-06-07T17:32:55Z
dc.date.copyright: 2020-08-03
dc.date.issued: 2020
dc.date.submitted: 2020-07-14
dc.identifier.citation:
[1] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, “EAST: An Efficient and Accurate Scene Text Detector,” https://arxiv.org/pdf/1704.03155.pdf, 2017.
[2] M. Liao, B. Shi, and X. Bai, “TextBoxes++: A Single-Shot Oriented Scene Text Detector,” https://arxiv.org/pdf/1801.02765.pdf, 2018.
[3] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, “Detecting Text in Natural Image with Connectionist Text Proposal Network,” https://arxiv.org/pdf/1609.03605.pdf, 2016.
[4] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single Shot MultiBox Detector,” https://arxiv.org/pdf/1512.02325.pdf, 2016.
[5] B. Shi, X. Bai, and C. Yao, “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition,” https://arxiv.org/pdf/1507.05717.pdf, 2015.
[6] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” https://www.bioinf.jku.at/publications/older/2604.pdf, 1997.
[7] F. Borisyuk, A. Gordo, and V. Sivakumar, “Rosetta: Large Scale System for Text Detection and Recognition in Images,” https://research.fb.com/wp-content/uploads/2018/10/Rosetta-Large-scale-system-for-text-detection-and-recognition-in-images.pdf, 2018.
[8] S. Q. Ren, K. M. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” https://arxiv.org/pdf/1506.01497.pdf, 2016.
[9] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” https://arxiv.org/pdf/1505.04597.pdf, 2015.
[10] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” https://arxiv.org/pdf/1606.00915.pdf, 2017.
[11] ICDAR, “ICDAR 2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition,” https://rrc.cvc.uab.es/?ch=15, 2019.
[12] Alibaba Cloud, “ICPR MTWI 2018 Challenge,” https://tianchi.aliyun.com/competition/entrance/231651/introduction, 2018.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” https://arxiv.org/pdf/1506.02640.pdf, 2016.
[14] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” https://arxiv.org/pdf/1409.1556.pdf, 2015.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 770-778, 2016.
[16] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2012 Results,” http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html, 2012.
[17] T. Y. Lin, M. Maire, S. Belongie, et al., “Microsoft COCO: Common Objects in Context,” https://arxiv.org/pdf/1405.0312.pdf, 2015.
[18] uoip, “SSD Variants,” https://github.com/uoip/SSD-variants, 2018.
[19] Tommy Huang, “Machine/Deep Learning: Object Detection Non-Maximum Suppression (NMS),” https://medium.com/@chih.sheng.huang821/%E6%A9%9F%E5%99%A8-%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92-%E7%89%A9%E4%BB%B6%E5%81%B5%E6%B8%AC-non-maximum-suppression-nms-aa70c45adffa, 2018.
[20] A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, “Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks,” https://www.cs.toronto.edu/~graves/icml_2006.pdf, 2006.
[21] T. L. Yuan, Z. Zhu, K. Xu, C. J. Li, T. J. Mu, and S. M. Hu, “A Large Chinese Text Dataset in the Wild,” https://ctwdataset.github.io/, 2018.
[22] Wikipedia, “Edit distance,” https://en.wikipedia.org/wiki/Edit_distance, 2020.
[23] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT++: Better Real-time Instance Segmentation,” https://arxiv.org/pdf/1912.06218.pdf, 2019.
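Reference [22] above (edit distance) underlies the thesis's recognition evaluation criterion (Section 6.7 of the table of contents). As an illustration only, not code from the thesis, the standard Levenshtein dynamic program can be sketched as:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute ca with cb
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

Normalizing this distance by the ground-truth length gives the character error rate commonly reported for text-line recognizers.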
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325
dc.description.abstract: This thesis proposes a deep learning solution for optical character recognition on labels, divided into two main parts. First, text-line detection: we propose two detection models that take very different approaches. The first is an image segmentation-based text-line detection model called Water Pixel, which predicts two segmentation maps, one with the text at its normal size and one with the text shrunk, and then applies the watershed algorithm to resolve touching text, achieving the effect of instance segmentation. The second detection model is CT-SSD. Among the common deep learning text detection algorithms, we adopt the idea of the Connectionist Text Proposal Network (CTPN), which slices the ground-truth text boxes of the training data into small pieces for training; this solves the problem that other algorithms handle long text poorly, but CTPN is not fast enough for our needs, so we use the Single-Shot multibox Detector (SSD) as the detection model instead. Detecting text boxes on a GPU (Graphics Processing Unit) achieves near real-time speed and suits long text very well. Second, text-line recognition: there is no publicly available real traditional Chinese text-line dataset, so this thesis trains a Convolutional Recurrent Neural Network (CRNN) on synthetic traditional Chinese text lines, and alternately trains the CRNN encoder with the simplified Chinese in-the-wild dataset CTW (Chinese Text dataset in the Wild) to overcome the overfitting caused by training the CRNN on synthetic data alone. We also use many forms of data augmentation, such as blurring, stretching, distortion, background variation, jitter, and aliasing, to cope with different scene variations. [zh_TW]
dc.description.abstract: In this thesis, we propose a deep learning solution for Optical Character Recognition (OCR). OCR can be divided into two main parts. The first part is text-line detection: we propose two text-line detection models that use very different methods. The first detection model is Water Pixel, an image segmentation-based model. Water Pixel predicts two segmentation maps: a text segmentation at the normal size and a shrunk text segmentation. We then use the watershed algorithm to solve the text overlapping problem and achieve the effect of instance segmentation. The other detection model we propose is CT-SSD. After comparing several deep learning-based algorithms, we find that CTPN (Connectionist Text Proposal Network) performs well. The main idea of CTPN is to divide the ground-truth boxes into small pieces and then train with Faster R-CNN. CTPN detects long text lines well, but its speed does not meet our requirements. Thus, we use SSD (Single-Shot multibox Detector) as the detection backbone instead of Faster R-CNN. CT-SSD not only detects long text well but also reduces inference time.
The second part is text-line recognition: there is no publicly available traditional Chinese text-line dataset, so we train our CRNN (Convolutional Recurrent Neural Network) on synthetic traditional Chinese text lines. In addition, we utilize the simplified Chinese dataset CTW (Chinese Text dataset in the Wild) to improve CRNN performance by alternately training the encoder of the CRNN. This partly mitigates the overfitting caused by training on synthetic data alone. We also adopt many augmentation techniques, such as blur, shrinking and stretching, distortion, background variation, shake, and aliasing, to avoid overfitting to a single scenario. [en]
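The Water Pixel idea described in the abstract (seeds from the shrunk segmentation, grown only inside the full-size segmentation) can be sketched without the full watershed machinery as a multi-source flood fill. This is an illustrative Python sketch under that interpretation, not the thesis's implementation; `separate_instances` and the list-of-lists grid encoding are hypothetical:

```python
from collections import deque

def separate_instances(full_mask, seed_labels):
    """Multi-source flood fill: grow each shrunk-text seed outward,
    claiming only pixels that the full-size text mask marks as text.
    Touching text lines get split because each pixel is claimed by
    whichever (disjoint) seed reaches it first."""
    h, w = len(full_mask), len(full_mask[0])
    labels = [row[:] for row in seed_labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x] > 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and full_mask[ny][nx] and labels[ny][nx] == 0:
                labels[ny][nx] = labels[y][x]  # nearest seed wins (BFS order)
                queue.append((ny, nx))
    return labels

# Two text lines that touch in the full-size mask, with disjoint seeds:
full = [[1] * 6 for _ in range(4)]        # all four rows are "text"
seeds = [[0] * 6 for _ in range(4)]
seeds[0][2] = seeds[0][3] = 1             # shrunk seed of line 1
seeds[3][2] = seeds[3][3] = 2             # shrunk seed of line 2
result = separate_instances(full, seeds)  # rows 0-1 labeled 1, rows 2-3 labeled 2
```

Because the shrunk seeds are disjoint even when the full-size regions touch, the grown regions carry distinct instance labels, which is the effect the watershed step provides.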
dc.description.provenance: Made available in DSpace on 2021-06-07T17:32:55Z (GMT). No. of bitstreams: 1. U0001-0307202015335300.pdf: 5223232 bytes, checksum: e59b1c89fb6fd971a1453f57248f0ae8 (MD5). Previous issue date: 2020 [en]
dc.description.tableofcontents:
Acknowledgements i
Chinese Abstract ii
ABSTRACT iv
CONTENTS vi
LIST OF FIGURES ix
LIST OF TABLES xiii
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Workflow 3
1.3 Thesis Organization 3
Chapter 2 Text Line Detection Background 4
2.1 Faster R-CNN [8] 4
2.2 Single-Shot multibox Detector (SSD) 5
2.3 Connectionist Text Proposal Network (CTPN) 6
2.4 An Efficient and Accurate Scene Text Detector (EAST) 6
Chapter 3 Text Line Detection Model 1: Water Pixel 8
3.1 Introduction 8
3.2 Model Architecture 8
3.3 Training Objective 10
3.4 Workflow 10
3.5 Training Data 14
3.6 Experimental Result 15
Chapter 4 Text Line Detection Model 2: CT-SSD 20
4.1 Introduction 20
4.2 Our Model: CT-SSD 20
4.3 Training Objective 21
4.4 Non-Maximum Suppression 23
4.5 Text-Line Construction Algorithm 24
4.6 Training Data Visualization 29
4.7 Experimental Results 32
Chapter 5 Text Detection Experimental Results 37
Chapter 6 Text-Line Recognition 38
6.1 Task Introduction 38
6.2 Text-Line Recognition Model Architecture 38
6.3 Label Sequence Prediction 41
6.4 Training Data for CRNN 43
6.5 Training Data for Auxiliary Part 45
6.6 Training Details 46
6.7 Evaluation Criterion 47
6.8 Test Result 48
Chapter 7 Specification 51
Chapter 8 Conclusion and Future Works 52
References 53
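Section 6.3 (Label Sequence Prediction) relies on CTC [20] to turn the CRNN's frame-wise outputs into a character sequence. As a hedged illustration (the thesis does not publish its decoder; `ctc_greedy_decode` is a hypothetical name), best-path CTC decoding takes the per-frame argmax, merges consecutive repeats, and drops the blank label:

```python
def ctc_greedy_decode(frames, blank=0):
    """Best-path CTC decoding: argmax label per frame, then
    collapse consecutive repeats, then remove blanks."""
    path = [max(range(len(f)), key=f.__getitem__) for f in frames]
    decoded, prev = [], blank
    for label in path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Frame-wise scores over {blank=0, 'A'=1, 'B'=2}; argmax path is 1,1,0,1,2,2
frames = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05],
          [0.1, 0.7, 0.2], [0.1, 0.2, 0.7], [0.1, 0.2, 0.7]]
print(ctc_greedy_decode(frames))  # [1, 1, 2], i.e. "AAB"
```

The blank between the two 1s is what lets CTC emit a genuinely repeated character, "AA", rather than collapsing it to a single "A".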
dc.language.iso: en
dc.subject: Single-Shot multibox Detector [zh_TW]
dc.subject: Convolutional Recurrent Neural Network [zh_TW]
dc.subject: deep learning [zh_TW]
dc.subject: instance segmentation [zh_TW]
dc.subject: Connectionist Text Proposal Network [zh_TW]
dc.subject: Instance Segmentation [en]
dc.subject: Deep learning [en]
dc.subject: CRNN [en]
dc.subject: SSD [en]
dc.subject: CTPN [en]
dc.title: 使用深度學習之標籤光學文字識別 (Label Optical Character Recognition with Deep Learning) [zh_TW]
dc.title: Label Optical Character Recognition with Deep Learning [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 沈裕池 (Yu-Chi Shen), 賴志宏 (Zhi-Hong Lai), 沈立健 (Li-Jian Shen)
dc.subject.keyword: deep learning, instance segmentation, Connectionist Text Proposal Network, Single-Shot multibox Detector, Convolutional Recurrent Neural Network [zh_TW]
dc.subject.keyword: Deep learning, Instance Segmentation, CTPN, SSD, CRNN [en]
dc.relation.page: 56
dc.identifier.doi: 10.6342/NTU202001296
dc.rights.note: Not authorized for public access (未授權)
dc.date.accepted: 2020-07-15
dc.contributor.author-college: College of Electrical Engineering and Computer Science [zh_TW]
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering [zh_TW]
Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
U0001-0307202015335300.pdf (5.1 MB, Adobe PDF): not authorized for public access

