Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53748
Full metadata record
DC field: value [language]
dc.contributor.advisor: 陳信希
dc.contributor.author: Yong-Siang Shih [en]
dc.contributor.author: 施詠翔 [zh_TW]
dc.date.accessioned: 2021-06-16T02:28:51Z
dc.date.available: 2015-08-06
dc.date.copyright: 2015-08-06
dc.date.issued: 2015
dc.date.submitted: 2015-08-03
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53748
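dc.description.abstract: Discourse relations describe how textual units relate to one another logically. Analyzing the discourse structure of a text gives a better understanding of the document's meaning, so discourse analysis has been applied in many areas, such as natural language interfaces and large-scale document analysis.
Whereas English discourse corpora have long been available to researchers, large-scale Chinese discourse corpora were released only in recent years. Chinese discourse analysis also raises many unique issues: Chinese has a greater variety of discourse connectives, parallel connectives composed of several discontinuous words are common, and Chinese sentence structure is more complex, all of which make correct identification of discourse structure harder.
Discourse connectives are important clues for identifying discourse relations in Chinese text, but the ambiguity of connectives makes identifying them a challenge in itself. In this thesis, we study four issues concerning explicit discourse relations signaled by discourse connectives. First, we address connective identification, finding the possible discourse connectives in a text. Second, we examine the linking relations among the component words of discourse connectives. Third, we study discourse relation disambiguation for each connective. Finally, we identify the arguments of each discourse connective.
We propose various features to train classifiers based on Logistic Regression to identify the correct discourse connectives and to recognize their relation types. In addition, we rank each connective candidate and use a greedy algorithm to resolve the linking ambiguity of connectives. Finally, we treat argument identification as a sequence labeling problem and use Conditional Random Fields to find argument boundaries.
Beyond explicit discourse relations, implicit discourse relations also need further study in the future, so that a complete Chinese discourse parser can be built on these components.
[translated from zh_TW]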
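The abstract describes ranking connective candidates and resolving linking ambiguities greedily, but the record gives no implementation details. The following is only a minimal sketch of one way such a greedy step could look. It assumes each candidate comes with a confidence score and the token positions of its component words; the `Candidate` type, the `greedy_link` function, and the toy scores are all hypothetical and not taken from the thesis.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (score, token positions of the component words).
Candidate = Tuple[float, Tuple[int, ...]]


def greedy_link(candidates: List[Candidate]) -> List[Candidate]:
    """Accept candidates in descending score order, skipping any candidate
    that reuses a token already claimed by an accepted candidate."""
    used: Set[int] = set()
    accepted: List[Candidate] = []
    for score, positions in sorted(candidates, key=lambda c: c[0], reverse=True):
        if used.isdisjoint(positions):
            accepted.append((score, positions))
            used.update(positions)
    # Return the accepted connectives in text order.
    return sorted(accepted, key=lambda c: c[1])


# Toy input: tokens 0 and 5 could link into one parallel connective,
# or each could stand alone as a single-word connective.
candidates = [
    (0.9, (0, 5)),  # parallel connective covering both components
    (0.6, (5,)),    # token 5 alone
    (0.4, (0,)),    # token 0 alone
    (0.7, (9,)),    # unrelated connective elsewhere in the text
]
result = greedy_link(candidates)
print(result)
```

In this toy case the parallel reading outranks both single-word readings, so tokens 0 and 5 are claimed together and the lower-scored alternatives are discarded.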
dc.description.abstract: Discourse relations represent how textual units logically connect with each other. Analyzing the discourse structure of a text can aid understanding of the meaning behind its paragraphs, with potential applications such as natural language interfaces and large-scale content analysis.
Although popular English discourse corpora are available to researchers, large-scale Chinese discourse corpora became available only recently. In addition, Chinese discourse analysis has many unique issues, including the variety of discourse connectives, the common occurrence of parallel connectives, and complex sentence structures.
Discourse connectives are important clues for identifying discourse relations in Chinese texts. However, their ambiguity makes it a challenge to extract true connectives. In this thesis, we investigate four tasks regarding explicit discourse relations, which are signaled by discourse connectives. First, we extract explicit discourse connectives. Second, we resolve linking ambiguities among connective components. Third, we disambiguate the discourse relation type of each connective. Finally, we extract the arguments of each discourse connective.
Several features are proposed to train Logistic Regression classifiers that distinguish discourse from non-discourse usages and determine the relation types of connectives. Additionally, we rank each connective candidate and develop a greedy algorithm to resolve linking ambiguities. Finally, argument identification is formulated as a sequence labeling problem, and Conditional Random Fields are used to determine the argument boundaries.
Besides explicit discourse relations, further investigation is needed to recognize implicit relations. Building upon these components, an end-to-end discourse parser for Chinese may be constructed in future studies.
[en]
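The abstract formulates argument identification as sequence labeling, with argument boundaries read off the predicted labels. As a rough illustration only, the sketch below turns a label sequence into argument spans. The BIO-style tags (`B-Arg1`, `I-Arg1`, `O`, and so on) and the `labels_to_spans` helper are assumptions made for this example; the record does not state which label scheme or segment unit the thesis uses, and the labels here are hand-written rather than produced by a trained Conditional Random Field.

```python
from typing import List, Optional, Tuple

# (argument name, index of first segment, index of last segment), inclusive.
Span = Tuple[str, int, int]


def labels_to_spans(labels: List[str]) -> List[Span]:
    """Convert a BIO-style label sequence into inclusive argument spans."""
    spans: List[Span] = []
    current: Optional[Tuple[str, int]] = None  # (name, start) of the open span
    for i, label in enumerate(labels):
        prefix, _, name = label.partition("-")
        continues = prefix == "I" and current is not None and current[0] == name
        if not continues:
            # Close the open span, if any, at the previous segment.
            if current is not None:
                spans.append((current[0], current[1], i - 1))
                current = None
            # "B-" opens a span; a stray "I-" is treated as an opening too.
            if prefix in ("B", "I"):
                current = (name, i)
    if current is not None:
        spans.append((current[0], current[1], len(labels) - 1))
    return spans


# Hand-written labels for six segments around a connective.
labels = ["B-Arg1", "I-Arg1", "O", "B-Arg2", "I-Arg2", "I-Arg2"]
spans = labels_to_spans(labels)
print(spans)
```

Each span's first and last index mark the argument boundaries, which is the quantity a sequence labeler would be evaluated on in this formulation.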
dc.description.provenance: Made available in DSpace on 2021-06-16T02:28:51Z (GMT). No. of bitstreams: 1
ntu-104-R02922036-1.pdf: 631057 bytes, checksum: 67d00dcd5aea9a1b41c9c838d0001adf (MD5)
Previous issue date: 2015
[en]
dc.description.tableofcontents: Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iii
Abstract v
Contents vii
List of Figures x
List of Tables xii
1 Introduction 1
1.1 Background 1
1.2 Motivation 4
1.3 Goals 5
1.4 Structure 6
2 Related Work 7
2.1 English Discourse Corpora 7
2.2 English Discourse Researches 8
2.3 Chinese Discourse Corpora 10
2.4 Chinese Discourse Researches 11
3 Datasets 13
3.1 Chinese Discourse Treebank (CDTB) 13
3.1.1 Introduction 13
3.1.2 Analysis 16
3.2 NTU PN-Gram Corpus 21
3.3 Linking Directions of Connective Components 22
4 Methods 23
4.1 Overview 23
4.2 Connective Candidate Extraction 24
4.2.1 Goal 24
4.2.2 Extraction Methods 25
4.3 Discourse Usage Disambiguation 26
4.3.1 Goal 26
4.3.2 Disambiguation on Component Level 26
4.3.3 Features for a Connective Component Candidate 27
4.3.4 Disambiguation on Connective Level 29
4.3.5 Features for a Connective Candidate 30
4.4 Connective Linking Disambiguation 33
4.4.1 Goal 33
4.4.2 A Greedy Algorithm 33
4.5 Discourse Relation Type Disambiguation 36
4.6 Connective Argument Extraction 36
4.6.1 Goal 36
4.6.2 Determination of Argument Boundaries 39
4.6.3 Features for a Segment 39
5 Experiments 44
5.1 Discourse Usage Disambiguation 44
5.1.1 Disambiguation on Component Level 45
5.1.2 Disambiguation on Connective Level 45
5.2 Connective Linking Disambiguation 46
5.2.1 Connective Linking Disambiguation for Known Connective Components 48
5.2.2 Connective Linking Disambiguation within the Pipeline System 49
5.3 Discourse Relation Type Disambiguation 52
5.3.1 Relation Type Disambiguation for Known Connectives 52
5.3.2 Relation Type Disambiguation within the Pipeline System 54
5.4 Connective Argument Extraction 55
5.4.1 Connective Argument Extraction for Known Connectives 55
5.4.2 Error Analysis 55
5.4.3 Connective Argument Extraction within the Pipeline System 59
6 Conclusion and Future Work 61
6.1 Conclusion 61
6.2 Future Work 62
6.2.1 Integration of Discourse Usage Disambiguation and Linking Disambiguation 62
6.2.2 Utilizing Connective Arguments for Discourse Relation Disambiguation and Discourse Usage Disambiguation 63
6.2.3 Identifying the Implicit Relations for Discourse Parsing 63
Bibliography 64
dc.language.iso: en
dc.subject: 論元辨識 (argument identification) [zh_TW]
dc.subject: 自然語言處理 (natural language processing) [zh_TW]
dc.subject: 中文篇章結構分析 (Chinese discourse analysis) [zh_TW]
dc.subject: 篇章連接詞辨識 (discourse connective recognition) [zh_TW]
dc.subject: 篇章關係消歧 (discourse relation disambiguation) [zh_TW]
dc.subject: Discourse Connective Argument Identification [en]
dc.subject: Natural Language Processing [en]
dc.subject: Chinese Discourse Analysis [en]
dc.subject: Discourse Connective Recognition [en]
dc.subject: Discourse Relation Disambiguation [en]
dc.title: 中文篇章連接詞偵測、消歧、及論元辨識 [zh_TW]
dc.title: Detection, Disambiguation, and Argument Identification of Chinese Discourse Connectives [en]
dc.type: Thesis
dc.date.schoolyear: 103-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 張嘉惠, 鄭卜壬, 蔡銘峰
dc.subject.keyword: 自然語言處理, 中文篇章結構分析, 篇章連接詞辨識, 篇章關係消歧, 論元辨識 [zh_TW]
dc.subject.keyword: Natural Language Processing, Chinese Discourse Analysis, Discourse Connective Recognition, Discourse Relation Disambiguation, Discourse Connective Argument Identification [en]
dc.relation.page: 72
dc.rights.note: Paid authorization (有償授權)
dc.date.accepted: 2015-08-03
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) [zh_TW]
Appears in collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:
File: ntu-104-1.pdf (not authorized for public access)
Size: 616.27 kB
Format: Adobe PDF