NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96420
Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 顏嗣鈞 | zh_TW
dc.contributor.advisor | Hsu-Chun Yen | en
dc.contributor.author | 王昱翔 | zh_TW
dc.contributor.author | Yu-Hsiang Wang | en
dc.date.accessioned | 2025-02-13T16:23:29Z | -
dc.date.available | 2025-02-14 | -
dc.date.copyright | 2025-02-13 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-02-04 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96420 | -
dc.description.abstract | Manchu archival documents are essential resources for studying the history of China's Qing dynasty, yet many remain undigitized, underscoring the need for effective Manchu word recognition systems. Recent studies have employed sequence-to-sequence models that use Manchu character tokens, enabling the models to learn mappings between visual features and characters autonomously. However, the limited availability of annotated Manchu data makes it difficult for such models to capture the contextual variation in Manchu character forms. To address this, the thesis proposes a Manchu recognition system that combines the Transformer-based optical character recognition (OCR) model TrOCR with the large language model ChatGPT o1. The system trains the recognition model on a large volume of low-cost synthetic Manchu images and introduces novel Manchu syllable tokens to improve the model's adaptability to variation in character forms. It further integrates a Manchu dictionary and a monolingual corpus to guide zero-shot, in-context recognition refinement by a large language model that has no inherent understanding of Manchu. Evaluation shows that the recognition model, trained on synthetic data alone, achieves character error rates (CER) of 7.33% and 3.84% on the Diamond Sutra and Qinzheng Pingding Shuomo Fanglüe datasets, with accuracies of 79.88% and 87.28%, respectively. Subsequent error correction with ChatGPT o1 reduces the CER on the Qinzheng Pingding Shuomo Fanglüe dataset by a further 1.2% and raises accuracy by 5.2%. The system thus requires no annotated real-world data during training, offering a low-cost approach to word recognition for low-resource languages, facilitating the digitization and preservation of historical documents, and opening new directions for research and applications in related fields. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-13T16:23:29Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-02-13T16:23:29Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents:
Acknowledgements i
摘要 (Chinese abstract) ii
Abstract iii
Contents v
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Manchu Word Recognition 5
2.2 Text Recognition 6
2.3 Large Language Model 8
Chapter 3 Manchu Word Recognition System 9
3.1 Synthetic Dataset Generation 9
3.2 Manchu Word Recognition Model 11
3.2.1 Model Architecture 11
3.2.2 Training Settings 13
3.3 LLM-based Recognition Refinement 14
Chapter 4 Experimental Setup 18
4.1 Real-World Datasets 18
4.2 Metrics 19
4.3 Baseline 20
4.4 Experimental Design 21
Chapter 5 Experimental Results 23
5.1 Manchu Word Recognition Result 23
5.2 LLM-based Recognition Refinement Result 28
5.3 Ablation Study 30
5.4 Applications and Impacts 32
5.5 Limitation and Future Work 33
Chapter 6 Conclusion 35
References 36
dc.language.iso | en | -
dc.subject | 低資源語言 (Low-resource language) | zh_TW
dc.subject | 大型語言模型 (Large language model) | zh_TW
dc.subject | 歷史文獻數位化 (Historical document digitization) | zh_TW
dc.subject | 深度學習 (Deep learning) | zh_TW
dc.subject | 滿文辨識 (Manchu word recognition) | zh_TW
dc.subject | Historical document digitization | en
dc.subject | Manchu word recognition | en
dc.subject | Deep learning | en
dc.subject | Low-resource language | en
dc.subject | Large language model | en
dc.title | 基於 Transformer 的滿文辨識系統與大語言模型零樣本上下文學習修正 | zh_TW
dc.title | A Transformer-Based Manchu Word Recognition System with Large Language Model Zero-Shot In-Context Learning Refinement | en
dc.type | Thesis | -
dc.date.schoolyear | 113-1 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 蔡宗翰;張明清 | zh_TW
dc.contributor.oralexamcommittee | Richard Tzong-Han Tsai;Ming-Ching Chang | en
dc.subject.keyword | 滿文辨識,深度學習,低資源語言,大型語言模型,歷史文獻數位化 | zh_TW
dc.subject.keyword | Manchu word recognition, Deep learning, Low-resource language, Large language model, Historical document digitization | en
dc.relation.page | 40 | -
dc.identifier.doi | 10.6342/NTU202500194 | -
dc.rights.note | 同意授權(限校園內公開) (Authorized; campus-only access) | -
dc.date.accepted | 2025-02-05 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 電機工程學系 (Department of Electrical Engineering) | -
dc.date.embargo-lift | 2028-02-04 | -
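
The rows above report character error rate (CER) and accuracy for a TrOCR-based recognizer trained only on synthetic images. As a concrete illustration of what is being measured, here is a minimal sketch that loads a generic TrOCR checkpoint through Hugging Face transformers, recognizes a single word image, and scores the hypothesis with CER, i.e. Levenshtein edit distance divided by reference length (one wrong character in a five-character word gives CER 0.2). The checkpoint name, the file manchu_word.png, and the reference string "manju" are illustrative stand-ins; the thesis trains its own model with Manchu syllable tokens, which this sketch does not reproduce.

```python
# Illustrative sketch only: a generic TrOCR checkpoint stands in for the
# thesis's syllable-token model, and "manchu_word.png" is a hypothetical file.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)


processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# Recognize one word image and decode the token sequence to a string.
image = Image.open("manchu_word.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
hypothesis = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Score against a transliterated ground-truth word (stand-in value).
print(hypothesis, f"CER = {cer('manju', hypothesis):.4f}")
```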
Appears in Collections: Department of Electrical Engineering

Files in This Item:

File | Size | Format
ntu-113-1.pdf (restricted; not publicly accessible) | 6.26 MB | Adobe PDF
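
The abstract also describes a second, refinement stage in which a large language model corrects the recognizer's output, guided by a Manchu dictionary and a monolingual corpus supplied in context. A minimal sketch of how such a zero-shot prompt could be assembled is shown below; the helper build_prompt, the dictionary entries, the corpus line, and the model name are all illustrative assumptions, not the thesis's actual prompt or resources.

```python
# Hypothetical sketch of LLM-based recognition refinement via zero-shot
# in-context learning. Dictionary entries and corpus lines are placeholders.
from openai import OpenAI


def build_prompt(ocr_output: str, dictionary: dict[str, str], corpus: list[str]) -> str:
    """Pack dictionary entries and corpus lines around the OCR hypothesis."""
    dict_block = "\n".join(f"{word}: {gloss}" for word, gloss in dictionary.items())
    corpus_block = "\n".join(corpus)
    return (
        "You are correcting OCR output for Manchu text in romanized form.\n"
        "Valid dictionary words:\n" + dict_block + "\n"
        "Example sentences from a Manchu corpus:\n" + corpus_block + "\n"
        f"OCR output: {ocr_output}\n"
        "Return only the corrected word sequence."
    )


client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = build_prompt(
    ocr_output="abkai fejergi be dasara",  # placeholder hypothesis
    dictionary={"abkai": "of heaven", "fejergi": "below"},
    corpus=["abkai fejergi be dasara de ..."],
)
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```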


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.
