Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47124
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 許永真 (Jane Yung-jen Hsu)
dc.contributor.author: Chu-Cheng Lin [en]
dc.contributor.author: 林居正 [zh_TW]
dc.date.accessioned: 2021-06-15T05:48:16Z
dc.date.available: 2011-08-20
dc.date.copyright: 2010-08-20
dc.date.issued: 2010
dc.date.submitted: 2010-08-18
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47124
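dc.description.abstract: Most Chinese dialects lack a complete digital pronunciation database, which is indispensable for speech processing. If complete pronunciation databases were available for related dialects, supervised learning could predict a Chinese character's pronunciation in a target dialect from the character's rime-book features and its pronunciations in related dialects. Unfortunately, Chinese dialect pronunciation databases remain incomplete. We propose a new generative model that uses both dialect pronunciation data and Middle Chinese rime books to discover phonological regularities shared across multiple dialects. The proposed model combines the existing, incomplete dialect pronunciation databases with rime-book data to produce, by augmentation, a complete dialect pronunciation database. That database can then be used with conventional supervised learning methods to predict the pronunciation of Chinese characters in a given dialect. We evaluate with overall pronunciation feature accuracy (OPFA). The first experiment shows that adding dialect pronunciation features, compared with using rime-book features alone, greatly improves the performance of a support vector machine (SVM) classifier. The second experiment compares the effect on SVM performance of phonological features from closely related dialects with that of features from distantly related dialects; the results show that closely related dialects yield higher accuracy. The third experiment shows that the proposed augmentation model can raise the SVM model's OPFA by up to 4.9%. [translated from zh_TW]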
dc.description.abstract: Most spoken Chinese dialects lack comprehensive digital pronunciation databases, which are crucial for speech processing tasks. Given complete pronunciation databases for related dialects, one can use supervised learning techniques to predict a Chinese character's pronunciation in a target dialect based on the character's features and its pronunciation in other related dialects. Unfortunately, Chinese dialect pronunciation databases are far from complete. We propose a novel generative model that uses both existing dialect pronunciation data and medieval rime books to discover patterns that exist in multiple dialects. The proposed model can augment missing dialectal pronunciations based on existing dialect pronunciation tables (even if incomplete) and the pronunciation data in rime books. The augmented pronunciation database can then be used in supervised learning settings. We evaluate the prediction accuracy in terms of phonological features, such as tone, initial phoneme, and final phoneme. For each character, the features are evaluated as a whole, giving the overall pronunciation feature accuracy (OPFA). Our first experimental results show that adding features from dialectal pronunciation data to our baseline rime-book model dramatically improves OPFA using the support vector machine (SVM) model. In the second experiment, we compare the performance of the SVM model using phonological features from closely related dialects with that of the model using phonological features from distantly related dialects. The experimental results show that using features from closely related dialects results in higher accuracy. In the third experiment, we show that using our proposed data augmentation model to fill in missing data can increase the SVM model's OPFA by up to 4.9%. [en]
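The abstract names OPFA but this record does not give its formula. As a rough illustration only, the sketch below assumes OPFA is the fraction of (character, feature) pairs predicted correctly, with each pronunciation described by a small set of phonological features such as initial, final, and tone. The function name `opfa`, the feature names, and the sample values are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List

# One pronunciation = mapping from feature name to feature value.
# The feature names used below are illustrative assumptions.
Pronunciation = Dict[str, str]


def opfa(gold: List[Pronunciation], pred: List[Pronunciation]) -> float:
    """Assumed definition: share of (character, feature) pairs predicted correctly."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same characters")
    correct = 0
    total = 0
    for gold_features, pred_features in zip(gold, pred):
        for name, value in gold_features.items():
            total += 1
            # A missing predicted feature counts as an error.
            if pred_features.get(name) == value:
                correct += 1
    if total == 0:
        raise ValueError("no features to score")
    return correct / total


# Hypothetical gold and predicted features for two characters.
gold = [
    {"initial": "t", "final": "ang", "tone": "1"},
    {"initial": "k", "final": "i", "tone": "3"},
]
pred = [
    {"initial": "t", "final": "ang", "tone": "2"},  # tone is wrong
    {"initial": "k", "final": "i", "tone": "3"},    # all features correct
]
score = opfa(gold, pred)  # 5 of the 6 feature predictions match
```

Under this assumed definition, a classifier that gets most features of a syllable right still earns partial credit, which is why a per-feature score can differ noticeably from whole-syllable accuracy.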
dc.description.provenance: Made available in DSpace on 2021-06-15T05:48:16Z (GMT). No. of bitstreams: 1
ntu-99-R97922060-1.pdf: 1295631 bytes, checksum: 98b88a4a2cf74f929906c4f0b88cdb51 (MD5)
Previous issue date: 2010 [en]
dc.description.tableofcontents:
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Problem
1.3 Thesis Structure
Chapter 2 Background
2.1 Background of Chinese Dialects
2.1.1 Rimebook
2.2 Related Work
2.2.1 Machine translation and text-to-speech system for Chinese dialects
2.2.2 Applications with resource-poor languages
2.2.3 Computational Dialectometry and Phonology
Chapter 3 Methodology
3.1 Problem definition
3.2 Model considerations
3.2.1 Model description
3.2.2 Inference
3.2.3 Inference procedure
Chapter 4 Data and Evaluation
4.1 Data
4.1.1 Preprocessing
4.2 Evaluation
4.2.1 Experiment Design
4.2.2 Effect of dialectal data on standard classifiers
4.2.3 Impacts of proximate dialects
4.2.4 Effect of data augmentation
Chapter 5 Conclusion
Bibliography
dc.language.iso: en
dc.title: 增補資源匱乏漢語方言之漢字發音 [zh_TW]
dc.title: Augmentation of Character Pronunciations for Resource-poor Chinese Dialects [en]
dc.type: Thesis
dc.date.schoolyear: 98-2
dc.description.degree: Master's (碩士)
dc.contributor.coadvisor: 蔡宗翰 (Richard Tzong-han Tsai)
dc.contributor.oralexamcommittee: 高成炎 (Cheng-yan Kao), 陳柏琳 (Berlin Chen), 陳信希 (Hsin-Hsi Chen)
dc.subject.keyword: 資料增補, 生成模型, 漢語方言, 發音資料庫 [zh_TW]
dc.subject.keyword: data augmentation, generative model, Chinese dialects, pronunciation database [en]
dc.relation.page: 32
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2010-08-19
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-99-1.pdf (not currently authorized for public access); Size: 1.27 MB; Format: Adobe PDF
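Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.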
