Semantic Clustering for the Constituent Morphemes of Chinese Disyllabic Compounds (中文雙音節複合詞組成詞素之語意分群)

Chia-Ling Lee; 李嘉玲

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57846

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	許永真(Jane Yung-jen Hsu)
dc.contributor.author	Chia-Ling Lee	en
dc.contributor.author	李嘉玲	zh_TW
dc.date.accessioned	2021-06-16T07:07:08Z	-
dc.date.available	2014-08-12
dc.date.copyright	2014-08-12
dc.date.issued	2014
dc.date.submitted	2014-07-09
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57846	-
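dc.description.abstract	A morpheme is the basic unit from which words are built and the smallest meaningful unit of a language. Morphological awareness is the ability to perceive and manipulate the internal structure of words, and many linguists believe it is closely related to reading comprehension. Compounds make up 76% of Chinese vocabulary, and a single Chinese character may carry different meanings in different words. Given a set of morphologically related Chinese words that share the same target character, the main goal of this study is to use computational linguistics techniques to cluster those words according to the meaning the target character carries in each. Taking '商' as an example, {商店, 商品, 商代, 商朝} can be divided by the meaning of '商' into two clusters: {商店, 商品} and {商代, 商朝}. In the first cluster '商' relates to commerce, while in the second it denotes a dynasty. Our method considers contextual, semantic, syntactic, lexical, and statistical factors and integrates them. For comparison, we asked several graduate students and elementary school students to perform the same word clustering task. Experimental results show that the proposed ensemble method reaches the same level of performance as the elementary school group.	zh_TW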
dc.description.abstract	Morphological awareness is thought by many linguists to strongly affect reading development in children. A Chinese character embedded in different compound words may carry different meanings. In this work, we aim at semantic clustering of a given family of morphologically related Chinese words. For example, '商店 (store)', '商品 (commodity)', '商代 (Shang period)', and '商朝 (Shang Dynasty)' can form two clusters: {'商店', '商品'} and {'商代', '商朝'}. In terms of the meaning of the character '商/shang1/', the former cluster carries information about commerce, and the latter conveys the concept of a Chinese dynasty. We aggregate computational linguistics methods, taking contextual, semantic, syntactic, lexical, and statistical factors into consideration. To contrast these results, in a human experiment we recruited adults and children to perform the same clustering task. Experimental results indicate that our ensemble model achieves a level of performance similar to that of the children.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T07:07:08Z (GMT). No. of bitstreams: 1 ntu-103-R00922072-1.pdf: 2602231 bytes, checksum: a9122583b1bbc8c36b7ba99db2a6b963 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Acknowledgments
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Proposed Approaches
  1.3 Thesis Structure
Chapter 2 Related Work
  2.1 Semantic Similarity of Words
    2.1.1 Corpus-based Approaches
    2.1.2 Knowledge-based Approaches
  2.2 Chinese Character Meanings
Chapter 3 Methodology
  3.1 Problem Definition
    3.1.1 Notations
    3.1.2 Problem Definition
    3.1.3 Assumptions
  3.2 Framework
  3.3 Similarity between Target Words
    3.3.1 Vector-to-vector Similarity
    3.3.2 Bag-to-bag Similarity
  3.4 Clustering
  3.5 Ensemble Methods
    3.5.1 Ensemble from Partitions
    3.5.2 Ensemble from Similarity Matrices
Chapter 4 Experiments and Evaluation
  4.1 Corpora and Dataset
    4.1.1 Training Corpora
    4.1.2 Test Data
  4.2 Evaluation Metrics
  4.3 Human Experiment
    4.3.1 Participants
    4.3.2 Materials and Procedure
  4.4 Experimental Results
Chapter 5 Discussion
  5.1 Comprehensive Consideration
  5.2 Comparisons of the POS, Document, and Dependency Methods
  5.3 Limitations
Chapter 6 Conclusion
  6.1 Summary of Work Accomplished
  6.2 Contributions
  6.3 Future Work
Bibliography
Appendix
  Chapter A Test Data: 11 Morphological Families
dc.language.iso	en
dc.subject	詞素覺識	zh_TW
dc.subject	語意分群	zh_TW
dc.subject	自然語言處理	zh_TW
dc.subject	計算語言	zh_TW
dc.subject	semantic clustering	en
dc.subject	morphological awareness	en
dc.subject	natural language processing	en
dc.subject	computational linguistics	en
dc.title	中文雙音節複合詞組成詞素之語意分群	zh_TW
dc.title	Semantic Clustering for the Constituent Morphemes of Chinese Disyllabic Compounds	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳信希(Hsin-Hsi Chen),李育杰(Yuh-Jye Lee),李佳穎(Chia-Ying Lee),蔡宗翰(Richard Tzong-Han Tsai)
dc.subject.keyword	詞素覺識,語意分群,自然語言處理,計算語言	zh_TW
dc.subject.keyword	morphological awareness,semantic clustering,natural language processing,computational linguistics	en
dc.relation.page	60
dc.rights.note	Paid license
dc.date.accepted	2014-07-10
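dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in collections:	Department of Computer Science and Information Engineering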

Files in this item:

File	Size	Format
ntu-103-1.pdf (not authorized for public access)	2.54 MB	Adobe PDF

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.
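
The clustering task described in the abstract, and the "Ensemble from Partitions" step named in the table of contents, can be illustrated with a small sketch. This is not the thesis implementation: the record does not give the actual algorithm, so the sketch assumes a simple co-association vote, in which two words are linked when more than half of the input partitions place them in the same cluster. The three input partitions are invented for illustration, and the names `co_association` and `consensus_clusters` are hypothetical.

```python
from itertools import combinations


def co_association(partitions, words):
    """Fraction of partitions that put each pair of words in the same cluster."""
    votes = {pair: 0 for pair in combinations(words, 2)}
    for partition in partitions:
        # Map every word to the index of the cluster it belongs to.
        label = {w: i for i, cluster in enumerate(partition) for w in cluster}
        for a, b in votes:
            if label[a] == label[b]:
                votes[(a, b)] += 1
    return {pair: n / len(partitions) for pair, n in votes.items()}


def consensus_clusters(partitions, words, threshold=0.5):
    """Link pairs whose agreement exceeds the threshold; return the connected groups."""
    agreement = co_association(partitions, words)
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), score in agreement.items():
        if score > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


# The morphological family from the abstract, sharing the target character 商.
words = ["商店", "商品", "商代", "商朝"]

# Hypothetical outputs of three individual clustering methods (assumed data).
partitions = [
    [["商店", "商品"], ["商代", "商朝"]],
    [["商店", "商品", "商代"], ["商朝"]],
    [["商店", "商品"], ["商代", "商朝"]],
]

for cluster in consensus_clusters(partitions, words):
    print(cluster)
```

With these assumed inputs, the one partition that misplaces 商代 is outvoted, so the consensus recovers the commerce group and the dynasty group given as the example in the abstract.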