Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7249
Full metadata record
DC field: value [language]
dc.contributor.advisor: 謝舒凱 (Shu-Kai Hsieh)
dc.contributor.author: Meng-Hsien Shih [en]
dc.contributor.author: 施孟賢 [zh_TW]
dc.date.accessioned: 2021-05-19T17:40:36Z
dc.date.available: 2024-08-16
dc.date.available: 2021-05-19T17:40:36Z
dc.date.copyright: 2019-08-16
dc.date.issued: 2019
dc.date.submitted: 2019-08-03
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7249
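dc.description.abstract: As corpora grow ever larger, it becomes necessary to go beyond concordance search and process large corpora automatically in order to provide richer information, such as collocation and word sense information. This dissertation constructs sense-distinguished collocation resources for Traditional Chinese and Simplified Chinese, and evaluates the proposed resources by their performance on a natural language processing task.
To extract sense-annotated collocations automatically, this study uses the Stanford Parser and the SyntaxNet Parser, respectively, to extract collocation pairs from sense-annotated sentences, and ranks them by logDice score. For sense annotation of the Traditional Chinese data, it explores a semi-automatic approach that finds, among the sentences of the Academia Sinica Balanced Corpus 4.0, candidate sentences close to the sense being annotated. Corpus sentences are first parsed with the Stanford Parser (and with the SyntaxNet Parser), and each sentence is then projected into a semantic vector space according to its dependency parse. Likewise, the example sentences for each sense in the dictionary (Chinese Wordnet) are parsed and projected into the same space. Corpus sentences that lie close to the example sentences of the target sense are extracted first, so that annotators can start with the likeliest candidates for that sense, which speeds up annotation and removes the need to search the corpus sentence by sentence.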
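The logDice ranking mentioned above can be illustrated with a minimal sketch. The formula, 14 + log2(2·f_xy / (f_x + f_y)), is the standard logDice definition; how the dissertation counts the frequencies is not stated in this record, so the marginal counts below (taken from a toy list of headword–collocate pairs) and the names `log_dice` and `rank_collocations` are assumptions made for illustration only.

```python
import math
from collections import Counter


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Standard logDice score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocations(pairs):
    """Rank (headword, collocate) pairs by logDice, highest first.

    Assumption: marginal frequencies are counted over the extracted
    pairs themselves, which may differ from the thesis's counting.
    """
    pair_freq = Counter(pairs)
    head_freq = Counter(head for head, _ in pairs)
    coll_freq = Counter(coll for _, coll in pairs)
    scored = [
        (pair, log_dice(freq, head_freq[pair[0]], coll_freq[pair[1]]))
        for pair, freq in pair_freq.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data standing in for parser-extracted pairs.
toy_pairs = [
    ("drink", "tea"),
    ("drink", "tea"),
    ("drink", "water"),
    ("boil", "water"),
]
ranked = rank_collocations(toy_pairs)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Because the score is capped at 14 and depends only on the three frequencies, it does not grow with corpus size, which makes scores comparable across corpora of different sizes.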
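The Simplified Chinese sense-annotated data come from the 2007 semantic evaluation task (SemEval-2007): 40 words, sense-annotated in 2,686 sentences. For comparability, 17 words that also occur in the Simplified Chinese data were selected from the Traditional Chinese dictionary (Chinese Wordnet) as annotation targets, and 1,646 sentences containing these 17 words were annotated in the Academia Sinica Balanced Corpus. The collocation resource and its sense annotations have been released online (http://lopen.linguistics.ntu.edu.tw/collocation.htm) for users to query.
Extrinsic evaluation on the word sense disambiguation task shows that collocation data extracted with the SyntaxNet Parser can train a support vector machine classifier to reach the current best Simplified Chinese word sense disambiguation precision, P=75.98%, and P=58.35% for Traditional Chinese, whose sense distinctions are finer. Compared with deep learning models, this study obtains the best current word sense disambiguation performance with a more transparent model and only basic linguistic features, which suggests that a word's collocational behavior almost suffices to determine its sense in a sentence.
[zh_TW; English translation]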
dc.description.abstract: As corpora grow ever larger, there is a pressing need to process them automatically so that they offer information beyond concordance lines, such as collocation and word sense information. This dissertation constructs a collocation resource with sense distinction for Simplified Chinese and Traditional Chinese and evaluates it on a natural language processing (NLP) task.
To extract collocations with sense annotation automatically, the Stanford Parser and the SyntaxNet Parser are each used to extract collocation candidates from sense-annotated sentences, and the candidates are then ranked by logDice score. For Traditional Chinese, a semi-automatic approach is investigated to ease sense annotation by bootstrapping candidate sense instances from the sentences of the Academia Sinica Balanced Corpus 4.0. The corpus sentences are first parsed with the Stanford Parser (or, alternatively, the SyntaxNet Parser), and each sentence is mapped into a vector space according to its dependency parse. The example sentences for each sense in the dictionary (Chinese Wordnet, CWN) are parsed and mapped into the same vector space. Candidate sentences from the corpus are then ranked by their distance in that space to the CWN sense being annotated, so that the annotator can start with the most likely sense instances instead of examining the corpus sentence by sentence.
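The candidate ranking described above can be sketched roughly as follows. The abstract does not say how sentences are vectorised or which distance is used, so this sketch assumes sparse counts of (relation, dependent) features taken from dependency triples and cosine similarity. The function names and the toy triples are hypothetical.

```python
import math
from collections import Counter


def dep_vector(triples):
    """Sparse vector of (relation, dependent) counts from (head, rel, dep) triples."""
    return Counter((rel, dep) for _head, rel, dep in triples)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(sense_examples, corpus_sentences):
    """Return (similarity, sentence_index) pairs, most similar first.

    The sense is represented by the summed vectors of its dictionary
    example sentences (an assumption, not the thesis's stated method).
    """
    sense_vec = Counter()
    for triples in sense_examples:
        sense_vec.update(dep_vector(triples))
    scored = [
        (cosine(sense_vec, dep_vector(triples)), index)
        for index, triples in enumerate(corpus_sentences)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical parsed example sentence for one dictionary sense.
sense_examples = [[("think", "nsubj", "I"), ("think", "obj", "plan")]]
# Hypothetical parsed corpus sentences to be offered to the annotator.
corpus_sentences = [
    [("think", "obj", "plan")],
    [("think", "obj", "home")],
]
ranking = rank_candidates(sense_examples, corpus_sentences)
print(ranking)
```

An annotator would then review sentences in the returned order, so the likeliest instances of the target sense come first.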
For Simplified Chinese, the data come from the SemEval-2007 dataset, in which 40 word types are annotated in 2,686 sentences. For comparability, 17 word types in the Traditional Chinese sense inventory (Chinese Wordnet) that overlap with the SemEval-2007 word types were selected and annotated in 1,646 sentences from the Sinica Corpus. The resulting collocation resource with sense annotation in Simplified and Traditional Chinese has been released through a web interface (http://lopen.linguistics.ntu.edu.tw/collocation.htm) for users to query.
The extrinsic evaluation on word sense disambiguation (WSD) shows that collocation data extracted with the SyntaxNet Parser can train a support vector machine (SVM) classifier that achieves state-of-the-art WSD precision: P=75.98% for Simplified Chinese and P=58.35% for the more fine-grained Traditional Chinese sense inventory (Chinese Wordnet). That a transparent approach using only linguistic features reaches state-of-the-art performance, compared with deep learning models, implies that the collocational behavior of a word largely determines its sense in a sentence.
[en]
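To make the evaluation setup more concrete, here is a small sketch of a linear SVM that picks a word sense from collocation features. It is not the dissertation's WSD script (listed as Appendix A in the table of contents but not reproduced in this record). The training routine is a deterministic Pegasos-style subgradient method used as a stand-in, and the one-vs-rest scheme, the feature strings and the sense labels are all assumptions.

```python
from collections import defaultdict


def train_binary_svm(examples, labels, epochs=20, lam=0.01):
    """Linear SVM on binary features, trained by subgradient descent on hinge loss.

    examples: list of feature-string lists; labels: +1 or -1 per example.
    """
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for feats, y in zip(examples, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(w[f] for f in feats)
            for f in list(w):  # L2 regularisation shrinks every weight
                w[f] *= 1 - eta * lam
            if margin < 1:  # hinge loss is active: move toward the example
                for f in feats:
                    w[f] += eta * y
    return w


def train_one_vs_rest(examples, senses):
    """Train one binary SVM per sense label."""
    return {
        sense: train_binary_svm(examples, [1 if s == sense else -1 for s in senses])
        for sense in set(senses)
    }


def predict(models, feats):
    """Return the sense whose classifier gives the highest score."""
    return max(models, key=lambda s: sum(models[s].get(f, 0.0) for f in feats))


# Hypothetical collocation features (relation=collocate) for one ambiguous verb.
train_feats = [["obj=basketball"], ["obj=ball"], ["obj=phone"], ["obj=telephone"]]
train_senses = ["play", "play", "call", "call"]
models = train_one_vs_rest(train_feats, train_senses)
print(predict(models, ["obj=ball"]))
print(predict(models, ["obj=phone", "advmod=again"]))
```

The weights remain inspectable per feature, which is one sense in which such a model is more transparent than a deep learning model.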
dc.description.provenance: Made available in DSpace on 2021-05-19T17:40:36Z (GMT). No. of bitstreams: 1
ntu-108-D00142002-1.pdf: 5889095 bytes, checksum: e362104a9cac072389292dd3b62445f8 (MD5)
Previous issue date: 2019 [en]
dc.description.tableofcontents:
Acknowledgements
Chinese Abstract
English Abstract
1 Introduction
1.1 The Emergence of Corpora
1.2 The Need for Big Corpora Beyond Raw Texts
1.3 The Importance of Corpora with Collocation Information
1.3.1 Language Learning
1.3.2 Translation
1.3.3 Natural Language Processing
1.4 Toward Collocation with Word Sense Distinction
1.5 Research Question and Goal
1.6 Overview of Dissertation
2 Literature Review
2.1 Definitions
2.2 Approaches to Collocation Identification
2.2.1 Impressionistic Approach
2.2.2 Statistical Approach
2.3 Grammatical Collocation
2.3.1 Current Techniques for Collocation Extraction
2.4 Grammatical Collocation and Dependency Grammar
2.4.1 Dependency Relations in Universal Dependencies
2.4.2 The Stanford Parser
2.4.3 The SyntaxNet Parser
2.5 Related Lexical Resources by Human Annotation
2.5.1 Pattern Dictionary of English Verbs
2.5.2 Treebanks
2.5.3 Sense-Annotated Data
2.6 The Task of Word Sense Disambiguation
3 Experiment 1 Based on the Stanford Parser for Simplified Chinese
3.1 Tool (Stanford Parser) and Data
3.2 Experiment Design to Extract Collocation
3.3 Evaluation
3.3.1 Intrinsic Evaluation of Grammatical Collocation
3.3.2 Extrinsic Evaluation of Collocation with Word Sense Distinction
3.3.3 Results and Discussion
3.4 Summary
4 Experiment 2 Based on the SyntaxNet Parser for Simplified Chinese
4.1 Experiment Design to Extract Collocation Using SyntaxNet
4.2 Results of WSD and Discussion
4.2.1 Results and Error Analysis of Verb Disambiguation
4.2.2 Results and Error Analysis of Noun Disambiguation
4.2.3 Summary of All Noun and Verb Disambiguation Performance
5 Experiment 3 of Sense Annotation and WSD for Traditional Chinese
5.1 Sense Annotation for Traditional Chinese
5.1.1 Data 1: Chinese Wordnet
5.1.2 Data 2: Academia Sinica Balanced Corpus of Modern Chinese 4.0
5.1.3 An Example: Bootstrapping Sense Instances of xiang3 [想] from ASBC
5.2 Word Sense Disambiguation for Traditional Chinese
5.2.1 Data Preparation for the WSD Task of Traditional Chinese
5.2.2 WSD Experiment and Results
5.2.3 Online Web Interface to Access the Collocation with Sense Annotation in Traditional Chinese
6 Conclusions
6.1 Research Findings
6.2 Implications and Applications
6.3 Limitations and Future Research
References
Appendix A WSD Script
Appendix B Script to Retrieve Sentences from SemEval XML
Appendix C Script to Integrate CoNLL-U with Sense Annotation
Appendix D Website Framework of the Proposed Resource
Appendix E The Usage of the SyntaxNet Parser
dc.language.iso: en
dc.title: 具詞義區分之中文搭配詞資源建構及其應用 [zh_TW]
dc.title: The Construction of a Chinese Collocation Resource with Sense Distinction and its Applications [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: Doctoral
dc.contributor.oralexamcommittee: 高照明 (Zhao-Ming Gao), 劉德馨 (Te-hsin Liu), 龔書萍 (Shu-Ping Gong), 洪媽益 (Michael Tanangkingsing)
dc.subject.keyword: 搭配詞 (collocation), 依存句法剖析器 (dependency parser), 語義空間 (semantic space), 詞義標記 (sense annotation), 詞義消歧 (word sense disambiguation) [zh_TW]
dc.subject.keyword: collocation, dependency parser, semantic space, sense annotation, word sense disambiguation [en]
dc.relation.page: 135
dc.identifier.doi: 10.6342/NTU201901938
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2019-08-05
dc.contributor.author-college: College of Liberal Arts [zh_TW]
dc.contributor.author-dept: Graduate Institute of Linguistics [zh_TW]
dc.date.embargo-lift: 2024-08-16
Appears in collections: Graduate Institute of Linguistics

Files in this item:
File: size, format
ntu-108-1.pdf: 5.75 MB, Adobe PDF


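Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.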
