NTU Theses and Dissertations Repository, College of Engineering, Department of Engineering Science and Ocean Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91393
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 黃乾綱 | zh_TW
dc.contributor.advisor | Chien-Kang Huang | en
dc.contributor.author | 劉國禎 | zh_TW
dc.contributor.author | Guo-Jhen Liou | en
dc.date.accessioned | 2024-01-26T16:18:25Z | -
dc.date.available | 2024-01-27 | -
dc.date.copyright | 2024-01-26 | -
dc.date.issued | 2022 | -
dc.date.submitted | 2024-01-12 | -
dc.identifier.citation | Zhou, X., et al. EAST: An efficient and accurate scene text detector. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Yu, D., et al. Towards accurate scene text recognition with semantic reasoning networks. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Chen, X., et al., Text recognition in the wild: A survey. ACM Computing Surveys (CSUR), 2021. 54(2): p. 1-35.
Yuliang, L., et al., Detecting curve text in the wild: New dataset and new solution. arXiv preprint arXiv:1712.02170, 2017.
Shi, B., et al. ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017. IEEE.
Sun, Y., et al. Chinese street view text: Large-scale Chinese text reading with partially supervised learning. in Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
Chng, C.K., et al. ICDAR2019 robust reading challenge on arbitrary-shaped text - RRC-ArT. in 2019 International Conference on Document Analysis and Recognition (ICDAR). 2019. IEEE.
Zhang, R., et al. ICDAR 2019 robust reading challenge on reading Chinese text on signboard. in 2019 International Conference on Document Analysis and Recognition (ICDAR). 2019. IEEE.
Li, J., et al. Detecting text in the wild with deep character embedding network. in Asian Conference on Computer Vision. 2018. Springer.
Nayef, N., et al. ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition - RRC-MLT-2019. in 2019 International Conference on Document Analysis and Recognition (ICDAR). 2019. IEEE.
Sherstinsky, A., Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 2020. 404: p. 132306.
Jaderberg, M., et al., Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 2016. 116(1): p. 1-20.
Yuan, T.-L., et al., Chinese text in the wild. arXiv preprint arXiv:1803.00085, 2018.
Lin, T.-Y., et al. Feature pyramid networks for object detection. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Liu, X., et al. FOTS: Fast oriented text spotting with a unified network. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
Baek, Y., et al. Character region awareness for text detection. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
He, P., et al. Single shot text detector with regional attention. in Proceedings of the IEEE International Conference on Computer Vision. 2017.
Shi, B., X. Bai, and C. Yao, An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016. 39(11): p. 2298-2304.
He, K., et al. Deep residual learning for image recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Hu, J., L. Shen, and G. Sun. Squeeze-and-excitation networks. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
Woo, S., et al. CBAM: Convolutional block attention module. in Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Liu, S., et al. Path aggregation network for instance segmentation. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
Cao, J., et al., Attention-guided context feature pyramid network for object detection. arXiv preprint arXiv:2005.11475, 2020.
Raisi, Z., et al., Text detection and recognition in the wild: A review. arXiv preprint arXiv:2006.04305, 2020.
Du, Y., et al., PP-OCR: A practical ultra lightweight OCR system. arXiv preprint arXiv:2009.09941, 2020.
Liao, M., et al. Real-time scene text detection with differentiable binarization. in Proceedings of the AAAI Conference on Artificial Intelligence. 2020.
Wang, P., et al. A single-shot arbitrarily-shaped text detector based on context attended multi-task learning. in Proceedings of the 27th ACM International Conference on Multimedia. 2019.
Borisyuk, F., A. Gordo, and V. Sivakumar. Rosetta: Large scale system for text detection and recognition in images. in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
Sheng, F., Z. Chen, and B. Xu. NRTR: A no-recurrence sequence-to-sequence model for scene text recognition. in 2019 International Conference on Document Analysis and Recognition (ICDAR). 2019. IEEE.
Liu, W., et al. STAR-Net: a spatial attention residue network for scene text recognition. in BMVC. 2016.
Shi, B., et al. Robust scene text recognition with automatic rectification. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91393 | -
dc.description.abstract | 近年來,由於自然場景文字辨識(STR)任務,有著其諸多的應用,例如文件電子化、圖像搜索、智能偵測、機器人導航,因此為電腦視覺中熱門的研究領域。但由於STR任務複雜的圖像背景、各種字體和不完善的成像條件等許多因素,STR仍然具有極大的挑戰性。
早期的研究主要由手工提取特徵,這些特徵有時可能會限制辨識性能。隨著近年來深度學習的興起,深度學習神經網路在STR任務中,也有了顯著的進步。雖然過去已有許多研究,提出了各式各樣的模型架構,但由於自然場景中有著各種複雜多變性,因此各界對於提升偵測與辨識任務,仍沒有一個足夠完善的算法,能顧及所有場景中的效能表現。
本研究分別針對文字偵測與辨識任務,進行模型優化策略分析及評估。在文字偵測模型中,採用EAST模型為基底,優化前端特徵提取的骨幹部分(Backbone),和網路中段的特徵融合部分(Neck)。在文字辨識模型中,則是採用SRN模型為基底,優化前端特徵提取的骨幹部分(Backbone),以提升其效能。在端到端的架構整合中,則是另外以MobileNetV3為基底,訓練出一個文字方向分類器,以達到辨識直向文字的目標。
實驗結果顯示,經過本研究改良,在偵測任務中,和原EAST模型比較,可將準確率(Precision)提升6.9%,召回率(Recall)可提升2.3%,F度量值(F-measure)可提升4.6%。在辨識任務中,和原SRN模型比較,則是可將準確度(Accuracy)提升8.8%,並將歸一化編輯距離(Normalized Edit Distance)提升9.7%。最後,本研究也將兩項任務整合成端到端的系統架構,解決了中文字中直式書寫的問題,使演算法更具有實用價值。
zh_TW
dc.description.abstract | In recent years, scene text recognition (STR) has become a popular research area in computer vision because of its many applications, such as document digitization, image search, intelligent detection, and robot navigation. However, STR remains highly challenging because of factors such as complex image backgrounds, varied fonts, and imperfect imaging conditions.
Earlier studies relied mainly on hand-crafted features, which can limit recognition performance. With the rise of deep learning in recent years, deep neural networks have made significant progress on STR tasks. Although many studies have proposed a wide variety of model architectures, natural scenes are complex and variable, and no detection and recognition algorithm yet performs well across all scenarios.
This study analyzes and evaluates model optimization strategies for text detection and text recognition separately. For text detection, the EAST model serves as the baseline, and both its feature-extraction backbone and the feature-fusion stage in the middle of the network (the neck) are optimized. For text recognition, the SRN model serves as the baseline, and its feature-extraction backbone is optimized. For end-to-end integration, a text orientation classifier based on MobileNetV3 is trained so that vertical text can be recognized.
Experimental results show that, compared with the original EAST model, the improved detection model raises precision by 6.9%, recall by 2.3%, and F-measure by 4.6%. Compared with the original SRN model, the improved recognition model raises accuracy by 8.8% and improves the normalized edit distance by 9.7%. Finally, this study integrates the two tasks into an end-to-end system that handles vertically written Chinese text, making the algorithm more practical.
en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-01-26T16:18:25Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-01-26T16:18:25Z (GMT). No. of bitstreams: 0 | en
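dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iii
ABSTRACT iv
Table of Contents vi
List of Figures viii
List of Tables x
List of Abbreviations xi
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Objectives 2
1.3 Research Contributions 3
1.4 Thesis Organization 4
Chapter 2 Related Literature and Methods 5
2.1 Scene Text Detection 5
2.2 Scene Text Recognition 8
2.3 Convolutional Neural Networks 11
2.4 Current State and Limitations 16
Chapter 3 Problem Definition and Research Methods 17
3.1 Problem Definition 17
3.2 System Architecture 17
3.3 Building the Text Detection Model 18
3.4 Building the Text Recognition Model 24
3.5 Building the End-to-End System 29
3.6 Loss Functions 31
Chapter 4 Experimental Results and Discussion 33
4.1 Experimental Plan 33
4.2 Experimental Environment 34
4.3 Dataset Collection 34
4.4 Evaluation Methods and Parameter Description 36
4.5 Text Detection Experiments 38
4.6 Text Recognition Experiments 48
4.7 End-to-End Recognition 54
Chapter 5 Conclusion and Future Work 57
5.1 Conclusion 57
5.2 Future Work 58
References 59
Appendix A: Text Detection Datasets 62
Appendix B: Text Recognition Datasets 66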
-
dc.language.iso | zh_TW | -
dc.subject | 文字識別 | zh_TW
dc.subject | 文字檢測 | zh_TW
dc.subject | 場景文字 | zh_TW
dc.subject | 卷積神經網路 | zh_TW
dc.subject | 電腦視覺 | zh_TW
dc.subject | scene text | en
dc.subject | computer vision | en
dc.subject | text detection | en
dc.subject | convolutional neural network | en
dc.subject | text recognition | en
dc.title | 基於特徵金字塔神經網路應用於自然場景文字識別 | zh_TW
dc.title | Text Spotting in Natural Scenes Based on Feature Pyramid Neural Network | en
dc.type | Thesis | -
dc.date.schoolyear | 112-1 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 張恆華;丁肇隆;傅楸善 | zh_TW
dc.contributor.oralexamcommittee | Herng-Hua Chang;Chao-Lung Ting;Chiou-Shann Fuh | en
dc.subject.keyword | 電腦視覺,場景文字,文字檢測,文字識別,卷積神經網路 | zh_TW
dc.subject.keyword | computer vision,scene text,text detection,text recognition,convolutional neural network | en
dc.relation.page | 74 | -
dc.identifier.doi | 10.6342/NTU202400011 | -
dc.rights.note | Authorization granted (open access worldwide) | -
dc.date.accepted | 2024-01-15 | -
dc.contributor.author-college | College of Engineering | -
dc.contributor.author-dept | Department of Engineering Science and Ocean Engineering | -
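
Note on the metrics named in the abstract: the abstract reports detection results as precision, recall, and F-measure, and recognition results as accuracy and normalized edit distance, but this record does not define them. The Python sketch below is only an illustration under two assumptions: that the F-measure is the balanced F1 score, and that the normalized edit distance is the similarity form 1 - ED / max(len), where higher is better (which would be consistent with the abstract describing it as increasing). The function names are hypothetical and are not taken from the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the record does not say which F-beta the thesis uses.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalized_edit_similarity(pred: str, truth: str) -> float:
    """1 - ED / max(len); 1.0 means the strings are identical.

    Assumption: this "higher is better" form is what the abstract means.
    """
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, truth) / longest
```

For example, a predicted string that differs from a four-character ground truth in one character has an edit distance of 1 and a normalized similarity of 0.75 under this definition.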
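Appears in collections: Department of Engineering Science and Ocean Engineering

Files in this item:
File | Size | Format
ntu-112-1.pdf | 7.54 MB | Adobe PDF

Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.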
