Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 傅楸善(Chiou-Shann Fuh) | |
| dc.contributor.author | Ping-Yuan Tseng | en |
| dc.contributor.author | 曾柄元 | zh_TW |
| dc.date.accessioned | 2021-06-07T17:32:55Z | - |
| dc.date.copyright | 2020-08-03 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-07-14 | |
| dc.identifier.citation | [1] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, “EAST: An Efficient and Accurate Scene Text Detector,” https://arxiv.org/pdf/1704.03155.pdf, 2017. [2] M. Liao, B. Shi, and X. Bai, “TextBoxes++: A Single-Shot Oriented Scene Text Detector,” https://arxiv.org/pdf/1801.02765.pdf, 2018. [3] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, “Detecting Text in Natural Image with Connectionist Text Proposal Network,” https://arxiv.org/pdf/1609.03605.pdf, 2016. [4] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, “SSD: Single Shot MultiBox Detector,” https://arxiv.org/pdf/1512.02325.pdf, 2016. [5] B. Shi, X. Bai, and C. Yao, “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition,” https://arxiv.org/pdf/1507.05717.pdf, 2015. [6] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” https://www.bioinf.jku.at/publications/older/2604.pdf, 1997. [7] F. Borisyuk, A. Gordo, and V. Sivakumar, “Rosetta: Large scale system for text detection and recognition in images,” https://research.fb.com/wp-content/uploads/2018/10/Rosetta-Large-scale-system-for-text-detection-and-recognition-in-images.pdf, 2018. [8] S. Q. Ren, K. M. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” https://arxiv.org/pdf/1506.01497.pdf, 2016. [9] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” https://arxiv.org/pdf/1505.04597.pdf, 2015. [10] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” https://arxiv.org/pdf/1606.00915.pdf, 2017. [11] ICDAR, “ICDAR 2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition,” https://rrc.cvc.uab.es/?ch=15, 2019. [12] Alibaba Cloud, “ICPR MTWI 2018 Challenge,” https://tianchi.aliyun.com/competition/entrance/231651/introduction, 2018. [13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” https://arxiv.org/pdf/1506.02640.pdf, 2016. [14] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” https://arxiv.org/pdf/1409.1556.pdf, 2015. [15] K. He, et al., “Deep Residual Learning for Image Recognition,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 770-778, 2016. [16] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2012 Results,” http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html, 2012. [17] T. Y. Lin, M. Maire, S. Belongie, et al., “Microsoft COCO: Common Objects in Context,” https://arxiv.org/pdf/1405.0312.pdf, 2015. [18] uoip, “SSD Variants,” https://github.com/uoip/SSD-variants, 2018. [19] Tommy Huang, “Machine/Deep Learning: Object Detection, Non-Maximum Suppression (NMS),” https://medium.com/@chih.sheng.huang821/%E6%A9%9F%E5%99%A8-%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92-%E7%89%A9%E4%BB%B6%E5%81%B5%E6%B8%AC-non-maximum-suppression-nms-aa70c45adffa, 2018. [20] A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, “Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks,” https://www.cs.toronto.edu/~graves/icml_2006.pdf, 2006. [21] T. L. Yuan, Z. Zhu, K. Xu, C. J. Li, T. J. Mu, and S. M. Hu, “A Large Chinese Text Dataset in the Wild,” https://ctwdataset.github.io/, 2018. [22] Wikipedia, “Edit distance,” https://en.wikipedia.org/wiki/Edit_distance, 2020. [23] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT++: Better Real-time Instance Segmentation,” https://arxiv.org/pdf/1912.06218.pdf, 2019. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15325 | - |
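| dc.description.abstract | This thesis proposes a deep learning solution for optical character recognition on labels. It has two main parts. The first is text-line detection, for which we propose two detection models based on entirely different methods. The first, called Water Pixel, is a text-line detection model based on image segmentation: the model produces two segmentation maps, one with the text at normal size and one with the text shrunk, and the Watershed algorithm is then applied to separate touching text, achieving the effect of instance segmentation. The second detection model is CT-SSD. After comparing common deep learning text detection algorithms, we adopted the idea of the Connectionist Text Proposal Network (CTPN), which cuts the text boxes in the training data into small pieces for training and thereby solves the problem that other algorithms are ill-suited to detecting long text. However, CTPN is not fast enough for our needs, so we use the Single-Shot multibox Detector (SSD) as the detection model; with a GPU (Graphics Processing Unit), text-box detection runs in near real time and is well suited to long text. The second part is text-line recognition. No real traditional Chinese text-line dataset is publicly available online, so this thesis trains a Convolutional Recurrent Neural Network (CRNN) on synthetic traditional Chinese text lines and alternately trains the CRNN encoder on the in-the-wild simplified Chinese dataset CTW (Chinese Text dataset in the Wild) to overcome the overfitting caused by training the CRNN on synthetic data alone. We also use several data augmentation methods, such as blurring, stretching, warping, background variation, jitter, and distortion, to cope with variation across scenes. | zh_TW |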
| dc.description.abstract | In this thesis, we propose a deep learning solution for Optical Character Recognition (OCR). OCR can be divided into two main parts. The first part is text-line detection, for which we propose two models based on entirely different methods. The first detection model, Water Pixel, is based on image segmentation. Water Pixel predicts two segmentation maps: one of the text at normal size and one of the shrunk text. We then use the watershed algorithm to separate touching text, achieving the effect of instance segmentation. The second detection model we propose is CT-SSD. After comparing several deep learning-based algorithms, we found that CTPN (Connectionist Text Proposal Network) performs well. The main idea of CTPN is to divide the ground-truth boxes into small pieces and then train with Faster R-CNN. CTPN detects long text lines well, but its speed does not meet our requirements. We therefore use SSD (Single-Shot multibox Detector) as the detection backbone instead of Faster R-CNN. CT-SSD detects long text well and also reduces inference time. The second part is text-line recognition. No traditional Chinese text-line dataset is publicly available on the web, so we train our CRNN (Convolutional Recurrent Neural Network) on synthetic traditional Chinese text lines. In addition, we use the simplified Chinese dataset CTW (Chinese Text dataset in the Wild) to improve CRNN performance by alternately training the CRNN encoder. This approach partially mitigates the overfitting that can result from training on synthetic data alone. We also apply several augmentation techniques, such as blur, shrinking and stretching, distortion, background variation, jitter, and aliasing, to avoid overfitting to a single scenario. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-07T17:32:55Z (GMT). No. of bitstreams: 1 U0001-0307202015335300.pdf: 5223232 bytes, checksum: e59b1c89fb6fd971a1453f57248f0ae8 (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Acknowledgements; Chinese Abstract; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction; 1.1 Overview; 1.2 Workflow; 1.3 Thesis Organization; Chapter 2 Text Line Detection Background; 2.1 Faster R-CNN [8]; 2.2 Single-Shot multibox Detector (SSD); 2.3 Connectionist Text Proposal Network (CTPN); 2.4 An Efficient and Accurate Scene Text Detector (EAST); Chapter 3 Text Line Detection Model 1: Water Pixel; 3.1 Introduction; 3.2 Model Architecture; 3.3 Training Objective; 3.4 Workflow; 3.5 Training Data; 3.6 Experimental Result; Chapter 4 Text Line Detection Model 2: CT-SSD; 4.1 Introduction; 4.2 Our Model: CT-SSD; 4.3 Training Objective; 4.4 Non-Maximum Suppression; 4.5 Text-Line Construction Algorithm; 4.6 Training Data Visualization; 4.7 Experimental Results; Chapter 5 Text Detection Experimental Results; Chapter 6 Text-Line Recognition; 6.1 Task Introduction; 6.2 Text-Line Recognition Model Architecture; 6.3 Label Sequence Prediction; 6.4 Training Data for CRNN; 6.5 Training Data for Auxiliary Part; 6.6 Training Details; 6.7 Evaluation Criterion; 6.8 Test Result; Chapter 7 Specification; Chapter 8 Conclusion and Future Works; References | |
| dc.language.iso | en | |
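| dc.subject | Single-Shot Multibox Detector | zh_TW |
| dc.subject | Convolutional Recurrent Neural Network | zh_TW |
| dc.subject | Deep Learning | zh_TW |
| dc.subject | Instance Segmentation | zh_TW |
| dc.subject | Connectionist Text Proposal Network | zh_TW |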
| dc.subject | Instance Segmentation | en |
| dc.subject | Deep learning | en |
| dc.subject | CRNN | en |
| dc.subject | SSD | en |
| dc.subject | CTPN | en |
| dc.title | 使用深度學習之標籤光學文字識別 | zh_TW |
| dc.title | Label Optical Character Recognition with Deep Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 沈裕池(YU-CHI SHEN),賴志宏(ZHI-HONG LAI),沈立健(LI-JIAN SHEN) | |
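| dc.subject.keyword | Deep Learning, Instance Segmentation, Connectionist Text Proposal Network, Single-Shot Multibox Detector, Convolutional Recurrent Neural Network | zh_TW |
| dc.subject.keyword | Deep learning, Instance Segmentation, CTPN, SSD, CRNN | en |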
| dc.relation.page | 56 | |
| dc.identifier.doi | 10.6342/NTU202001296 | |
| dc.rights.note | Not authorized | |
| dc.date.accepted | 2020-07-15 | |
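| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| Appears in Collections: | Department of Computer Science and Information Engineering | |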
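
The CTPN idea described in the abstract, cutting each ground-truth text box into small pieces for training and chaining detected pieces back into a text line, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `split_box` and `merge_slices`, the 16-pixel slice width, and the `max_gap` chaining rule are hypothetical choices made for this example, and boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples.

```python
from typing import List, Tuple

# Axis-aligned box: (x_min, y_min, x_max, y_max). This layout is an assumption.
Box = Tuple[int, int, int, int]


def split_box(box: Box, slice_width: int = 16) -> List[Box]:
    """Cut a horizontal text-line box into fixed-width vertical slices.

    The last slice is narrower when the box width is not a multiple
    of slice_width. The default width of 16 is an illustrative choice.
    """
    if slice_width <= 0:
        raise ValueError("slice_width must be positive")
    x_min, y_min, x_max, y_max = box
    slices = []
    x = x_min
    while x < x_max:
        slices.append((x, y_min, min(x + slice_width, x_max), y_max))
        x += slice_width
    return slices


def merge_slices(slices: List[Box], max_gap: int = 0) -> List[Box]:
    """Chain slices into text lines (simplified, hypothetical rule).

    Slices are visited left to right. A slice joins the most recent line
    when it overlaps that line vertically and starts at most max_gap
    pixels after the line's right edge; otherwise it starts a new line.
    """
    lines: List[Box] = []
    for s in sorted(slices):
        if lines:
            lx0, ly0, lx1, ly1 = lines[-1]
            overlaps_vertically = min(ly1, s[3]) - max(ly0, s[1]) > 0
            if overlaps_vertically and s[0] - lx1 <= max_gap:
                # Grow the current line to the union of both boxes.
                lines[-1] = (lx0, min(ly0, s[1]), max(lx1, s[2]), max(ly1, s[3]))
                continue
        lines.append(s)
    return lines


if __name__ == "__main__":
    pieces = split_box((0, 0, 40, 10))
    print(pieces)
    print(merge_slices(pieces))
```

For a 40-pixel-wide box, `split_box` yields two 16-pixel slices and one 8-pixel remainder, and `merge_slices` chains them back into the original box. Comparing each slice only with the most recent line keeps the sketch short; it would not handle several text lines that overlap horizontally.
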
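Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-0307202015335300.pdf (not authorized for public access) | 5.1 MB | Adobe PDF | |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.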
