Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56659
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 陳信希 (Hsin-Hsi Chen)
dc.contributor.author: Wan-Shan Liao [en]
dc.contributor.author: 廖婉珊 [zh_TW]
dc.date.accessioned: 2021-06-16T05:40:30Z
dc.date.available: 2015-04-06
dc.date.copyright: 2014-08-17
dc.date.issued: 2014
dc.date.submitted: 2014-08-12
dc.identifier.citation: CMU, “The ClueWeb09 Dataset,” 2009.
Carlson, Lynn, Mary Ellen Okurowski, and Daniel Marcu. 2002. RST discourse Treebank. Linguistic Data Consortium, University of Pennsylvania.
DUVERLE, David A.; PRENDINGER, Helmut. 2009. A novel discourse parser based on support vector machine classification. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, p. 665-673.
Hen-Hsen Huang, Tai-Wei Chang, Huan-Yuan Chen, and Hsin-Hsi Chen. 2014. Interpretation of Chinese Discourse Connectives for Explicit Discourse Relation Recognition. To appear in Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), Dublin, Ireland.
Hen-Hsen Huang, Chi-Hsin Yu, Tai-Wei Chang, Cong-Kai Lin, and Hsin-Hsi Chen. 2014. Web-Based Analysis of Chinese Discourse Markers for Opinion Mining. To appear in Proceedings of the 2014 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2014), Warsaw, Poland.
Hen-Hsen Huang, Chi-Hsin Yu, Tai-Wei Chang, Cong-Kai Lin, and Hsin-Hsi Chen. 2013. Analyses of the Association between Discourse Relation and Sentiment Polarity with a Chinese Human-Annotated Corpus. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (LAW VII & ID 2013), pages 70-78, Sofia, Bulgaria.
HERNAULT, Hugo, et al. 2010. HILDA: a discourse parser using support vector machine classification. Dialogue & Discourse, 2010, 1.3.
LIN, Ziheng; KAN, Min-Yen; NG, Hwee Tou. 2009. Recognizing implicit discourse relations in the Penn Discourse Treebank. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1. Association for Computational Linguistics, 2009. p. 343-351.
MARCU, Daniel; ECHIHABI, Abdessamad. 2002. An unsupervised approach to recognizing discourse relations. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002. p. 368-375.
Miltsakaki, Eleni, et al. 2004. 'The Penn Discourse Treebank.' LREC.
PARK, Joonsuk; CARDIE, Claire, 2012. Improving implicit discourse relation recognition through feature set optimization. In: Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, 2012. p. 108-112.
PEDREGOSA, Fabian, et al. 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 2011, 12: 2825-2830.
PITLER, Emily, et al. 2008. Easily identifiable discourse relations. Technical Reports (CIS), 2008, 884.
PITLER, Emily; LOUIS, Annie; NENKOVA, Ani, 2009. Automatic sense prediction for implicit discourse relations in text. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, 2009. p. 683-691.
SPORLEDER, Caroline; LASCARIDES, Alex. 2008. Using automatically labelled examples to classify rhetorical relations: An assessment. Natural Language Engineering, 2008, 14.03: 369-416.
YU, Chi-Hsin; TANG, Yi-jie; CHEN, Hsin-Hsi, 2012. Development of a Web-Scale Chinese Word N-gram Corpus with Parts of Speech Information. In: LREC. 2012. p. 320-324.
ZHOU, Yuping; XUE, Nianwen. 2012. PDTB-style discourse annotation of Chinese text. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 2012. p. 69-77.
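Zhang Muyu, Song Yuan, Qin Bing, and Liu Ting. 2013. 中文篇章級距間語義關係識別 [Chinese discourse-level inter-sentence semantic relation recognition]. Journal of Chinese Information Processing (中文信息學報), 27.6 (2013), 51-57.
Mei Jiaju, Zhu Yiming, et al. (eds.). 1996. Tongyici Cilin (同义词词林). Shanghai Lexicographical Publishing House (上海辞书出版社).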
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56659
dc.description.abstract: 近年來自然語言處理的研究,隨著字、詞層面的研究日益成熟,以及PDTB、RST-DT等大規模語篇關係語料庫的出現,對於語篇關係的研究日益增加。若是能正確預測篇章的關係,將有助於理解通篇的語義關係,在自然語言處理的相關應用如QA系統、自動摘要也都有很大的幫助。
  然而,由於中文缺乏了語料庫的資源,目前對於中文語篇關係的研究還是不多。
  在本文中,我們先針對哈爾濱工業大學在2013年發布的HIT-CIR中文語篇關係語料庫進行初步的分析。在研究過程中,因為資料集的稀疏,我們轉以另一個大規模的虛擬資料集做為訓練集。實驗的結果顯示使用大規模的語料訓練模型,有利於預測不同來源的文本。
  最後,我們進一步的分析,顯隱性語篇關係的分類性能,並分析了語篇單位周遭的非主要語篇標記是否和句子本身的語篇關係相關。
zh_TW
dc.description.abstract: In recent years, as word- and phrase-level research in natural language processing has matured and large-scale manually annotated discourse relation corpora such as PDTB and RST-DT have been released, research on discourse relations has grown steadily. Correctly predicting the discourse relations in a text helps in understanding the semantics of the text as a whole, and it benefits natural language processing applications such as QA systems and automatic summarization.
 However, because corpus resources for Chinese are scarce, research on Chinese discourse relations remains limited.
 In this work, we first carry out a preliminary analysis of the HIT-CIR Chinese Discourse Relation Corpus, released by Harbin Institute of Technology in 2013. Because this dataset is sparse, we instead use another large-scale pseudo dataset as the training set. Experimental results show that training the model on a large-scale corpus helps in predicting the discourse relations of text from different sources.
 Finally, we further analyze the classification performance on explicit and implicit discourse relations, and examine whether the non-primary discourse markers surrounding a discourse unit are related to the discourse relation of the sentence itself.
en
dc.description.provenance: Made available in DSpace on 2021-06-16T05:40:30Z (GMT). No. of bitstreams: 1
ntu-103-R01944023-1.pdf: 897693 bytes, checksum: 1ca7d0ffc104c98c3d84296c1e4a3509 (MD5)
Previous issue date: 2014
en
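dc.description.tableofcontents: Oral Examination Committee Certification I
Acknowledgements II
Chinese Abstract III
ABSTRACT IV
List of Tables VII
Chapter 1 Introduction 1
1.1 Discourse Relations 1
1.2 Research Motivation and Objectives 3
1.3 Thesis Organization 4
Chapter 2 Related Work 5
2.1 Discourse Relation Corpora 5
2.1.1 RST-DT 5
2.1.2 PDTB 2.0 5
2.2 English Discourse Relation Analysis 7
2.3 Chinese Discourse Relation Analysis 9
Chapter 3 Language Resources 10
3.1 Chinese Discourse Marker Dictionary 10
3.2 Chinese Discourse Marker Probability Distribution Dictionary 12
3.3 Harbin Institute of Technology Chinese Discourse Relation Corpus (HIT-CIR CDTB) 13
3.4 ClueWeb 09 Chinese Corpus 17
3.4.1 Corpus Sampling 17
3.5 NTU Discourse Corpora 18
Chapter 4 Experimental Methods 19
4.1 Feature Extraction 19
4.2 Classifier 21
4.3 Evaluation Methods 21
Chapter 5 Analysis of the HIT-CIR Corpus 23
5.1 Complex-Sentence Dataset 23
5.1.1 Data Analysis 23
5.1.2 Four-Class Discourse Relation Prediction 24
5.1.3 Multi-Level Discourse Relation Analysis 27
5.2 Clause Dataset 29
5.2.1 Data Analysis 29
5.2.2 Four-Class Discourse Relation Prediction 29
5.2.3 Multi-Level Discourse Relation Analysis 32
Chapter 6 Cross-Corpus Analysis 34
6.1 Performance Analysis of Discourse Marker Probability Distributions 34
6.2 Performance Analysis of Training Set Size 37
6.3 Performance Analysis of Individual Features 37
6.4 Analysis on the HIT-CIR Test Set 38
6.5 Analysis on the 7,601 Test Set 41
Chapter 7 Analysis of Explicit and Implicit Relations 44
7.1 Prediction Performance Analysis of Explicit and Implicit Relations 44
7.2.1 Determined by the Chinese Discourse Marker Dictionary 49
B. Explicit Relations 54
7.2.2 Determined by the Chinese Discourse Marker Probability Distribution Dictionary 57
A. Implicit Relations 57
B. Explicit Relations 60
Chapter 8 Conclusion 63
References 64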
dc.language.iso: zh-TW
dc.title: 中文顯性和隱性語篇關係分析之研究 [zh_TW]
dc.title: Chinese Explicit and Implicit Discourse Analysis [en]
dc.type: Thesis
dc.date.schoolyear: 102-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 陳建錦 (Chien-Chin Chen), 林川傑 (Chuan-Jie Lin), 古倫維 (Lun-Wei Ku)
dc.subject.keyword: 中文語篇關係, 顯隱性關係, 跨語料庫, 語篇標記, Discourse Relation [zh_TW]
dc.subject.keyword: Chinese discourse relation, discourse markers, implicit discourse, explicit discourse, cross corpus [en]
dc.relation.page: 66
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2014-08-12
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所) [zh_TW]
Appears in collections: Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in this item:
File: ntu-103-1.pdf
Access: not currently authorized for public access
Size: 876.65 kB
Format: Adobe PDF