NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88336
Full metadata record (DC field: value [language])
dc.contributor.advisor: 陳祝嵩 [zh_TW]
dc.contributor.advisor: Chu-Song Chen [en]
dc.contributor.author: 陳冠盛 [zh_TW]
dc.contributor.author: Guan-Sheng Chen [en]
dc.date.accessioned: 2023-08-09T16:36:17Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-08-09
dc.date.issued: 2023
dc.date.submitted: 2023-07-27
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88336
dc.description.abstract: 場景文字領域的任務旨在偵測與辨認生活周遭的文字。然而,在收集和標註場景文字的資料集時,需要大量的時間和精力,因此在標注過程中使用高效且可靠的工具,以及適當的安排標注流程是不可或缺的。我們針對繁體中文環境下的場景文字資料集之標注任務,建立了一個標注系統。在此系統中,我們嘗試使用持續學習的技術來快速調整自動提示模型,同時針對中文類別數過多,導致字元分布不平衡的問題引入長尾技術。此外,我們討論了不同的標注流程下,自動提示模型的性能。為了比較這些技術與流程的優劣,我們也提出了一個適用於標注系統的評估方式。最後,我們在系統中引入了基礎模型,以及分析不同工具對標注者標注時間的影響。透過這些技術的應用,將能實現更高效率的資料集標注。 [zh_TW]
dc.description.abstract: Scene-text models aim to detect and recognize text in everyday environments. However, building a human-annotated scene text dataset requires substantial time and effort, so annotations are often pre-filled with the detection and recognition results of the currently available model to reduce the labeling workload. To this end, we develop an annotation system for Traditional Chinese scene text datasets. In this system, we apply continual learning techniques to quickly adapt the automatic labeling tools, and we adapt existing long-tail techniques to address the imbalanced character distribution caused by the large number of Chinese character classes. In addition, we examine the performance of the automatic labeling tools under different annotation processes. To compare the strengths and weaknesses of these techniques and processes, we also propose an evaluation protocol suited to annotation systems. Finally, we integrate a foundation model into the system and analyze the impact of different tools on annotation time. Together, these techniques enable a more efficient annotation process. [en]
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:36:17Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-08-09T16:36:17Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents: Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 (Chinese Abstract) iii
Abstract iv
Contents vi
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Thesis Organization 3
Chapter 2 Related Works 5
2.1 Continual Learning and Incremental Learning 5
2.2 Long-tail Learning 7
2.3 Missing Label in Object Detection 9
2.4 Scene Text 9
2.4.1 Datasets 10
2.4.2 Text Detection 11
2.4.3 Text Recognition 12
2.4.4 Text Spotting 13
2.4.5 Text Annotation System 13
Chapter 3 Annotation Process Benchmark 15
3.1 Evaluation Protocol 15
3.2 One-Stage vs. Two-Stage Annotation Process 17
3.3 Traditional Chinese Scene Text Dataset 18
3.4 Evaluation Metrics 22
3.4.1 Scene Text Detection with Continual Learning 22
3.4.2 Scene Text Recognition with Continual Learning 23
3.4.3 One-Stage vs. Two-Stage Annotation Process 24
Chapter 4 Experiments 26
4.1 Scene Text Detection 26
4.1.1 Implementation Details 26
4.1.2 Results 28
4.2 Scene Text Recognition 31
4.2.1 Implementation Detail 31
4.2.2 Results 32
4.3 One-Stage Annotation vs. Two-Stage Annotation 34
4.3.1 Implementation Details 34
4.3.2 Results 35
Chapter 5 Annotation System Architecture 38
5.1 Overall Architecture 38
5.2 Annotation Time Experiments 40
Chapter 6 Conclusion 43
References 45
dc.language.iso: en
dc.subject: 長尾問題 [zh_TW]
dc.subject: 持續學習 [zh_TW]
dc.subject: 標注系統 [zh_TW]
dc.subject: 文字偵測 [zh_TW]
dc.subject: 場景文字 [zh_TW]
dc.subject: 文字辨識 [zh_TW]
dc.subject: Text Detection [en]
dc.subject: Annotation System [en]
dc.subject: Scene Text [en]
dc.subject: Text Recognition [en]
dc.subject: Long-Tail Problem [en]
dc.subject: Continual Learning [en]
dc.title: 基於持續與雙階段學習的場景文字標註平台研究 [zh_TW]
dc.title: Research on Constructing Scene Text Annotation Platform Using Continual Learning Technology in Two Stages [en]
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 徐繼聖;葉梅珍;彭彥璁;陳彥仰 [zh_TW]
dc.contributor.oralexamcommittee: Gee-Sern Hsu;Mei-Chen Yeh;Yan-Tsung Peng;Mike Y. Chen [en]
dc.subject.keyword: 場景文字, 文字偵測, 文字辨識, 標注系統, 持續學習, 長尾問題 [zh_TW]
dc.subject.keyword: Scene Text, Text Detection, Text Recognition, Annotation System, Continual Learning, Long-Tail Problem [en]
dc.relation.page: 61
dc.identifier.doi: 10.6342/NTU202302185
dc.rights.note: Authorized (access restricted to campus)
dc.date.accepted: 2023-07-28
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Department of Computer Science and Information Engineering
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-111-2.pdf
Access: restricted to NTU campus IP addresses (use the VPN service for off-campus access)
Size: 19.93 MB
Format: Adobe PDF


Except where copyright terms are otherwise specified, all items in this repository are protected by copyright, with all rights reserved.
