Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66765

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 吳沛遠 | |
| dc.contributor.author | Chia-Lin Chang | en |
| dc.contributor.author | 張嘉麟 | zh_TW |
| dc.date.accessioned | 2021-06-17T00:56:25Z | - |
| dc.date.available | 2021-02-18 | |
| dc.date.copyright | 2020-02-18 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-02-04 | |
| dc.identifier.citation | H. J. Chen, "text-orientation: Codrops CSS reference."
C. Choi, Y. Yoon, J. Lee, and J. Kim, "Simultaneous recognition of horizontal and vertical text in natural images," in Asian Conference on Computer Vision, pp. 202–212, Springer, 2018.
J. J. Weinman, Z. Butler, D. Knoll, and J. Feild, "Toward integrated scene text reading," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, pp. 375–387, Feb 2014.
A. Bissacco, M. J. Cummins, Y. Netzer, and H. Neven, "PhotoOCR: Reading text in uncontrolled conditions," 2013 IEEE International Conference on Computer Vision, pp. 785–792, 2013.
T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, "End-to-end text recognition with convolutional neural networks," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), pp. 3304–3308, Nov 2012.
J. Feild and E. Learned-Miller, "Scene text recognition with bilateral regression," Department of Computer Science, University of Massachusetts Amherst, Tech. Rep. UM-CS-2012-021, 2012.
A. Neubeck and L. Van Gool, "Efficient non-maximum suppression," in 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, pp. 850–855, Aug 2006.
S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ, USA: Prentice Hall Press, 3rd ed., 2009.
M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Reading text in the wild with convolutional neural networks," International Journal of Computer Vision, vol. 116, no. 1, pp. 1–20, 2016.
A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," pp. 369–376, 2006.
B. Shi, X. Bai, and C. Yao, "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 2298–2304, Nov 2017.
F. Yin, Y. Wu, X. Zhang, and C. Liu, "Scene text recognition with sliding convolutional character models," CoRR, vol. abs/1709.01727, 2017.
A. Graves, "Sequence transduction with recurrent neural networks," CoRR, vol. abs/1211.3711, 2012.
Z. Tian, J. Yi, J. Tao, Y. Bai, and Z. Wen, "Self-attention transducers for end-to-end speech recognition," arXiv preprint arXiv:1909.13037, 2019.
D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, 2015.
K. Cho, A. Courville, and Y. Bengio, "Describing multimedia content using attention-based encoder-decoder networks," IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 1875–1886, 2015.
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," in International Conference on Machine Learning, pp. 2048–2057, 2015.
J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition," in Advances in Neural Information Processing Systems, pp. 577–585, 2015.
C.-Y. Lee and S. Osindero, "Recursive recurrent nets with attention modeling for OCR in the wild," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2231–2239, 2016.
Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou, "Focusing attention: Towards accurate text recognition in natural images," in Proceedings of the IEEE International Conference on Computer Vision, pp. 5076–5084, 2017.
Y. Jiang, X. Zhu, X. Wang, S. Yang, W. Li, H. Wang, P. Fu, and Z. Luo, "R2CNN: Rotational region CNN for orientation robust scene text detection," CoRR, vol. abs/1706.09579, 2017.
M. Liao, B. Shi, and X. Bai, "TextBoxes++: A single-shot oriented scene text detector," IEEE Transactions on Image Processing, vol. 27, pp. 3676–3690, Aug 2018.
M. Liao, Z. Zhu, B. Shi, G.-S. Xia, and X. Bai, "Rotation-sensitive regression for oriented scene text detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5909–5918, 2018.
P. Shivakumara, T. Q. Phan, and C. L. Tan, "A Laplacian approach to multi-oriented text detection in video," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 412–419, Feb 2011.
J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue, "Arbitrary-oriented scene text detection via rotation proposals," IEEE Transactions on Multimedia, vol. 20, pp. 3111–3122, Nov 2018.
M. Liao, P. Lyu, M. He, C. Yao, W. Wu, and X. Bai, "Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2019.
X. Zhu, Y. Jiang, S. Yang, X. Wang, W. Li, P. Fu, H. Wang, and Z. Luo, "Deep residual text detection network for scene text," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 807–812, IEEE, 2017.
X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, "EAST: An efficient and accurate scene text detector," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2651, July 2017.
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in ECCV, 2016.
Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, "Oriented response networks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
O. Y. Ling, L. B. Theng, A. Chai, and C. McCarthy, "A model for automatic recognition of vertical texts in natural scene images," in 2018 8th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), pp. 170–175, Nov 2018.
J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide-baseline stereo from maximally stable extremal regions," Image and Vision Computing, vol. 22, no. 10, pp. 761–767, 2004.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
G. Philipp, D. Song, and J. G. Carbonell, "Gradients explode - deep networks are shallow - ResNet explained," 2018.
S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735–1780, Dec 1997.
M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Processing, vol. 45, pp. 2673–2681, 1997.
D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473, 2014. Accepted at ICLR 2015 as oral presentation.
V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," in Soviet Physics Doklady, vol. 10, pp. 707–710, 1966.
M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Synthetic data and artificial neural networks for natural scene text recognition," in Workshop on Deep Learning, NIPS, 2014.
O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
H. Samet and M. Tamminen, "Efficient component labeling of images of arbitrary dimension represented by linear bintrees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 579–586, July 1988.
M. B. Dillencourt and H. Samet, "A general approach to connected-component labeling for arbitrary image representations," Journal of the ACM, vol. 39, pp. 253–280, 1992.
A. Mishra, K. Alahari, and C. V. Jawahar, "Top-down and bottom-up cues for scene text recognition," in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2687–2694, June 2012.
K. Wang, B. Babenko, and S. Belongie, "End-to-end scene text recognition," in Proceedings of the 2011 International Conference on Computer Vision, ICCV '11, (Washington, DC, USA), pp. 1457–1464, IEEE Computer Society, 2011.
S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, H. Miyao, J. Zhu, W. Ou, C. Wolf, J.-M. Jolion, L. Todoran, M. Worring, and X. Lin, "ICDAR 2003 robust reading competitions: entries, results, and future directions," International Journal of Document Analysis and Recognition (IJDAR), vol. 7, pp. 105–122, Jul 2005.
D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i. Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazán, and L. P. de las Heras, "ICDAR 2013 robust reading competition," in 2013 12th International Conference on Document Analysis and Recognition, pp. 1484–1493, Aug 2013.
S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," CoRR, vol. abs/1502.03167, 2015.
A. Gupta, A. Vedaldi, and A. Zisserman, "Synthetic data for text localization in natural images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324, 2016.
K. Wang, B. Babenko, and S. Belongie, "End-to-end scene text recognition," in 2011 International Conference on Computer Vision, pp. 1457–1464, Nov 2011.
A. Mishra, K. Alahari, and C. Jawahar, "Scene text recognition using higher order language priors," in Proceedings of the British Machine Vision Conference, pp. 127.1–127.11, BMVA Press, 2012.
T. Novikova, O. Barinova, P. Kohli, and V. Lempitsky, "Large-lexicon attribute-consistent text recognition in natural images," in Computer Vision – ECCV 2012 (A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, eds.), (Berlin, Heidelberg), pp. 752–765, Springer Berlin Heidelberg, 2012.
A. Bissacco, M. Cummins, Y. Netzer, and H. Neven, "PhotoOCR: Reading text in uncontrolled conditions," in 2013 IEEE International Conference on Computer Vision, pp. 785–792, Dec 2013.
V. Goel, A. Mishra, K. Alahari, and C. V. Jawahar, "Whole is greater than sum of parts: Recognizing scene text words," in 2013 12th International Conference on Document Analysis and Recognition, pp. 398–402, Aug 2013.
O. Alsharif and J. Pineau, "End-to-end text recognition with hybrid HMM maxout models," CoRR, vol. abs/1310.1811, 2013.
J. Almazán, A. Gordo, A. Fornés, and E. Valveny, "Word spotting and recognition with embedded attributes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, pp. 2552–2566, Dec 2014.
C. Yao, X. Bai, B. Shi, and W. Liu, "Strokelets: A learned multi-scale representation for scene text recognition," in 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 4042–4049, June 2014.
B. Su and S. Lu, "Accurate scene text recognition based on recurrent neural network," in Computer Vision – ACCV 2014 (D. Cremers, I. Reid, H. Saito, and M.-H. Yang, eds.), (Cham), pp. 35–48, Springer International Publishing, 2015.
M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep structured output learning for unconstrained text recognition," Dec 2014.
J. A. Rodriguez-Serrano, A. Gordo, and F. Perronnin, "Label embedding: A frugal baseline for text recognition," International Journal of Computer Vision, vol. 113, pp. 193–207, Jul 2015.
A. Gordo, "Supervised mid-level features for word image representation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2956–2964, 2015.
X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles, "Learning to read irregular text with attention mechanisms," in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 3280–3286, 2017.
Z. Cheng, Y. Xu, F. Bai, Y. Niu, S. Pu, and S. Zhou, "AON: Towards arbitrarily-oriented text recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5571–5579, 2018. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66765 | - |
| dc.description.abstract | 過去關於場景文字辨識的文獻主要致力於單一方向之橫書文字辨識,然而在現實環境中,橫書及直書的文字同時出現在一個場景的情形並非不會發生。 尤其在部分的亞洲國家,例如中國,街景中可見的直書文字幾乎與橫書文字一樣多。在這樣的情況下,若要正確地識別場景中所有的文字,必須使文字辨識系統可以同時處理橫書及直書的文字。一般而言,一個完整的文字辨識系統會包含一個偵測器及一個辨識器,其中偵測器輸出的文字圖片會是辨識器的輸入。在現存文獻中,大多會要求辨識器的每張輸入圖片具有相同的文字排列方向(例如,由左至右)。然而,一旦文字辨識系統的輸入圖片可以同時包含任意角度的橫書及直書文字,我們很難確保偵測器輸出的文字圖片都具有相同的文字方向,而這將會造成辨識器預測錯誤。在這篇論文裡,我們針對任意方向橫書及直書之文字設計了一個新穎的場景文字辨識系統。其中,基於類神經網路的辨識器可以端對端的方法進行訓練並且只需要單詞級別的標註資料。除此之外,我們更設計了一個文字角度預測器,用以擷取圖片中文字的旋轉角度資訊並進一步確保輸入辨識器的文字圖片都具有符合要求的文字方向。由於目前並沒有公開的直書場景文字資料集,我們實作出一個直書文字圖片產生器並生成了一份直書英文資料集供訓練用。我們另外蒐集並標註了一個真實場景直書英文資料集供測試用。我們的方法在公開的橫書英文資料集(SVT、 IIIT-5k 跟 ICDAR)上與目前領先的方法有相當的成績,但同時又較其他方法多了可以同時處理任意方向橫書及直書文字的能力。 | zh_TW |
| dc.description.abstract | Research on scene text recognition to date has focused on sideways text. In real scenes, however, sideways and upright text commonly appear together; in some Asian countries such as China, street views contain nearly as much upright text as sideways text. Under such circumstances, a scene text recognition system must be able to recognize both types of text simultaneously. A scene text recognition system is generally composed of a detector and a recognizer, where the output of the detector serves as the input to the recognizer. Most existing scene text recognizers expect the text in every input image to be arranged in the same direction (e.g., from left to right). However, once the text lines in an image can be sideways or upright with arbitrary orientation angles, it is hard to guarantee that all detector output images share the same character direction, and mismatched directions cause recognition errors. In this thesis, we develop a system for scene text recognition of both sideways and upright text in arbitrary orientations. The neural-network-based recognizer is trainable end to end and requires only word-level annotations. A text orientation estimation module is further proposed to capture the orientation angle information and ensure that the character direction presented to the recognizer is correct. Since there is no public upright text dataset, we implemented a synthetic data engine to generate a synthetic upright English text dataset (Synth-ENGV) for training and collected a real-world upright English dataset (ENG) for testing. In experiments on benchmark sideways datasets, including Street View Text (SVT), IIIT-5k, and ICDAR, our model achieves performance competitive with state-of-the-art methods, with the additional ability to handle text in different directions and to recognize both sideways and upright text at the same time. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-17T00:56:25Z (GMT). No. of bitstreams: 1 ntu-109-R06942112-1.pdf: 4806224 bytes, checksum: 2ac4455dd288f57747c098bbe5824272 (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | 1 Introduction p.1
2 Related Work p.7
2.1 Scene Text Recognition p.7
2.2 Scene Text Detection p.9
2.3 Our Work p.10
3 Methodology p.13
3.1 Convolutional Feature Extractor p.13
3.2 Recurrent Layers p.16
3.3 Typeset Classifier p.17
3.4 Transcription Layer p.17
3.5 Lexicon-free and Lexicon-based Transcription p.18
3.6 Synthetic Data Engine p.19
3.7 Text Orientation Estimation p.20
4 Experiments p.25
4.1 Datasets p.25
4.2 Implementation Details p.26
4.2.1 Network Configurations p.26
4.2.2 Model Training p.26
4.2.3 Running Environment p.27
4.3 Results p.27
5 Conclusion and Future Works p.31
Appendices p.35
A Reproduction of Existing Methods p.37
A.1 Yang et al. p.37
A.2 Lee et al. p.38
A.3 Shi et al. p.38
A.4 Cheng et al. p.39
A.5 Cheng et al. p.39
A.6 Choi et al. p.39
Bibliography p.41 | |
| dc.language.iso | en | |
| dc.subject | 卷積神經網路 | zh_TW |
| dc.subject | 類神經網路 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 光學文字辨識 | zh_TW |
| dc.subject | 場景文字辨識 | zh_TW |
| dc.subject | 圖型識別 | zh_TW |
| dc.subject | 循環神經網絡 | zh_TW |
| dc.subject | 電腦視覺 | zh_TW |
| dc.subject | Convolutional Neural Network | en |
| dc.subject | Pattern Recognition | en |
| dc.subject | Scene Text Recognition | en |
| dc.subject | Optical Character Recognition | en |
| dc.subject | Deep Learning | en |
| dc.subject | Neural Network | en |
| dc.subject | Computer Vision | en |
| dc.subject | Recurrent Neural Network | en |
| dc.title | 任意方向橫書及直書之場景文字辨識 | zh_TW |
| dc.title | A Scene Text Recognition System of Both Sideways and Upright Text in Arbitrary Orientation | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-1 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 陳祝嵩,林昌鴻 | |
| dc.subject.keyword | 電腦視覺,圖型識別,場景文字辨識,光學文字辨識,深度學習,類神經網路,卷積神經網路,循環神經網絡 | zh_TW |
| dc.subject.keyword | Computer Vision, Pattern Recognition, Scene Text Recognition, Optical Character Recognition, Deep Learning, Neural Network, Convolutional Neural Network, Recurrent Neural Network | en |
| dc.relation.page | 48 | |
| dc.identifier.doi | 10.6342/NTU202000253 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2020-02-04 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
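The English abstract above describes the system at an architectural level: a detector produces text-line crops in arbitrary orientations, an orientation estimation module predicts each crop's rotation angle, and the crop is normalized before a recognizer transcribes it. The following Python sketch shows how such a pipeline could be wired together; every name and interface in it (`Detector`, `OrientationEstimator`, `Recognizer`, `normalize_rotation`) is an illustrative assumption, not code from the thesis.

```python
# A minimal sketch (assumed interfaces, not the thesis implementation) of
# the pipeline the abstract describes: detect text lines, estimate each
# line's rotation angle, normalize the crop, then transcribe it. Per the
# table of contents (Section 3.3), the thesis also uses a typeset
# classifier to tell sideways from upright lines; here that distinction
# is folded into the estimator for brevity.
from typing import List, Protocol

import numpy as np
from scipy.ndimage import rotate


class Detector(Protocol):
    def detect(self, image: np.ndarray) -> List[np.ndarray]:
        """Return cropped text-line images in arbitrary orientations."""


class OrientationEstimator(Protocol):
    def estimate(self, crop: np.ndarray) -> float:
        """Return the estimated rotation angle of the crop, in degrees."""


class Recognizer(Protocol):
    def decode(self, crop: np.ndarray) -> str:
        """Transcribe an orientation-normalized text-line crop."""


def normalize_rotation(crop: np.ndarray, angle: float) -> np.ndarray:
    # Rotate by -angle so characters end up in the direction the
    # recognizer expects (e.g., left to right).
    return rotate(crop, -angle, reshape=True, mode="nearest")


def read_scene_text(image: np.ndarray,
                    detector: Detector,
                    estimator: OrientationEstimator,
                    recognizer: Recognizer) -> List[str]:
    words = []
    for crop in detector.detect(image):          # arbitrary-orientation crops
        angle = estimator.estimate(crop)         # orientation estimation
        upright = normalize_rotation(crop, angle)  # fix character direction
        words.append(recognizer.decode(upright))   # word-level transcription
    return words
```

The design point the sketch makes concrete is the one the abstract argues: a single recognizer can handle both sideways and upright lines only if something upstream guarantees a consistent character direction, which is the job of the orientation estimation step.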
| Appears in Collections: | Graduate Institute of Communication Engineering |
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-109-1.pdf (not authorized for public access) | 4.69 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
