Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21533
Title: Research and Implementation of Chinese Part-of-Speech Tagging Based on Deep Learning (基於深度學習之中文詞性標記研究與實現)
Authors: Wen-Jun Luo (羅文君)
Advisor: 黃乾綱
Keywords: Chinese Part-of-Speech tagging, Deep Learning, Natural Language Processing
Publication Year: 2019
Degree: Master's
Abstract: In recent years, the rapid development of artificial intelligence has driven rapid progress in deep learning, which is now widely applied in many fields, including natural language processing (NLP).
  Part-of-speech tagging (POS tagging) is a basic NLP task: it assigns a part-of-speech category to every word in a sentence, and it is a key step in helping computers understand the meaning of language.
  This thesis designs a POS tagging model that improves on existing deep-learning-based Chinese POS tagging methods and raises their tagging accuracy. Word embeddings trained with the Word2vec model are concatenated with character embeddings produced by a bidirectional long short-term memory (BLSTM) network to form the word representation, which is then fed into a BLSTM model that extracts contextual features for POS tagging. Experimental results show that, on the January 1998 People's Daily corpus from mainland China, the model achieves an overall tagging accuracy of 96.28%, which is 0.76% higher than the baseline model without character embeddings; for out-of-vocabulary (OOV) words, the tagging accuracy is 81.51%, which is 10.81% higher than the baseline model.
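The abstract reports two figures, overall accuracy and OOV accuracy. As a rough illustration of how such figures are commonly computed, the Python sketch below counts correct tags over all tokens and over the tokens whose word is missing from the training vocabulary. It is not taken from the thesis: the function name `tagging_accuracy`, the toy sentence, the tag labels, and the definition of OOV as "word not seen in the training data" are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# (word, gold_tag, predicted_tag); an illustrative layout, not the thesis's data format
Token = Tuple[str, str, str]


def tagging_accuracy(tokens: Iterable[Token], train_vocab: Set[str]) -> Tuple[float, float]:
    """Return (overall_accuracy, oov_accuracy) for a tagged token sequence.

    A word counts as OOV here if it is absent from train_vocab (assumption).
    """
    total = correct = oov_total = oov_correct = 0
    for word, gold, pred in tokens:
        hit = gold == pred
        total += 1
        correct += hit
        if word not in train_vocab:
            oov_total += 1
            oov_correct += hit
    overall = correct / total if total else 0.0
    oov = oov_correct / oov_total if oov_total else 0.0
    return overall, oov


# Hypothetical toy data: five tokens, two of which are outside the training vocabulary.
sample = [
    ("我", "r", "r"),
    ("爱", "v", "v"),
    ("北京", "ns", "ns"),
    ("天安门", "ns", "n"),   # OOV word, tagged incorrectly
    ("广场", "n", "n"),      # OOV word, tagged correctly
]
train_vocab = {"我", "爱", "北京"}

overall, oov = tagging_accuracy(sample, train_vocab)
print(f"overall={overall:.2%} oov={oov:.2%}")
```

On this toy input, four of five tokens are tagged correctly and one of the two OOV tokens is tagged correctly. Reporting the OOV figure separately shows how a tagger handles unseen words, which is where the abstract says character embeddings give the larger improvement.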
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21533
DOI: 10.6342/NTU201901183
Fulltext Rights: Not authorized
Appears in Collections: Department of Engineering Science and Ocean Engineering (工程科學及海洋工程學系)

Files in This Item:
File | Size | Format
ntu-108-1.pdf (Restricted Access) | 4.22 MB | Adobe PDF