NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68336
Title: ESemiCrowd - 中文自然語言處理的群眾外包架構
ESemiCrowd - A Crowdsourcing Framework for Chinese NLP
Authors: Tzu-Yun Huang
黃資勻
Advisor: 謝舒凱 (Shu-Kai Hsieh)
Keyword: 語言學標記, 中文, 自然語言處理, 群眾募集, 遊戲化
Linguistics Annotation, Chinese, NLP, Crowdsourcing, GWAP
Publication Year: 2017
Degree: 碩士 (Master's)
Abstract: The ESemiCrowd framework redefines the concepts and methods of crowdsourced annotation for Chinese natural language processing by bringing the knowledge of linguistic experts into the annotation workflow. It keeps costs at the crowdsourcing level while raising annotation quality far above plain crowdsourcing, to near expert level. Using a relatively complex Chinese word sense disambiguation (WSD) task, the annotation performance of three approaches was evaluated at three levels: crowdsourcing (CrowdFlower), experts, and experts integrated with crowdsourcing (ESemiCrowd). The first level compares the performance of individual annotators within each approach; the second compares the annotation results of the three approaches against one another; the third compares each approach against the gold-standard answers. In the final results, ESemiCrowd reached an F-measure of 0.83, twice that of crowdsourcing (CrowdFlower), and an agreement of 0.72, six times that of crowdsourcing (CrowdFlower), at a cost of less than one US dollar more than crowdsourcing (a small illustrative sketch of both metrics follows the metadata fields below).
The framework comprises nine foci: first, a workflow for decomposing and distributing tasks; second, the staffing and responsibilities at each stage of the workflow; third, the method of task assignment; fourth, attracting suitably skilled workers through the most effective and lowest-risk means; fifth, building a pool of annotators to shorten the time needed to route tasks to suitable workers; sixth, continuous monitoring and quality control at every stage of the workflow; seventh, a detailed specification of the platform experts' duties, including pre-annotating part of the corpus, building the annotation scheme, and training workers; eighth, a system that recognizes high-quality workers and prevents burnout; and ninth, giving each task meaning for the workers and acknowledging their contributions. In the author's terms, the nine foci are: Workflow, Hierarchy Circle, Task Assignment, Crowd Work with Annotator Database, 8-Level Quality Control, Crowdsourcing Platform Design, The Role of Platform Experts, Reward System with Game Elements, and Worker Motivation Maintenance.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68336
DOI: 10.6342/NTU201704178
Fulltext Rights: 有償授權 (paid authorization)
Appears in Collections: 語言學研究所 (Graduate Institute of Linguistics)
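
The abstract above reports annotation quality with two figures, an F-measure against gold-standard answers and an inter-annotator agreement score. As a rough illustration of how such figures are typically computed, here is a minimal Python sketch; the thesis's exact agreement statistic is not stated in this record, so simple pairwise observed agreement is assumed, and all function names and toy data below are invented for illustration only.

from itertools import combinations

def f_measure(predicted, gold):
    """Micro-averaged F1 for a single-label WSD-style task.
    With exactly one predicted sense per item, precision equals recall,
    so F1 reduces to accuracy; it is spelled out in full here to keep
    the definition explicit."""
    true_pos = sum(p == g for p, g in zip(predicted, gold))
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def pairwise_agreement(annotations):
    """Observed agreement: the fraction of annotator pairs choosing the
    same label, averaged over items. `annotations` is a list of items,
    each a list of labels from the annotators who saw that item."""
    scores = []
    for labels in annotations:
        pairs = list(combinations(labels, 2))
        if pairs:
            scores.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores)

# Toy data only; not taken from the thesis.
gold = ["sense1", "sense2", "sense1", "sense3"]
predicted = ["sense1", "sense2", "sense1", "sense1"]
print("F-measure:", f_measure(predicted, gold))  # 0.75 on this toy data
print("agreement:", pairwise_agreement([["sense1", "sense1", "sense2"],
                                        ["sense2", "sense2", "sense2"]]))  # ~0.67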

Files in This Item:
File: ntu-106-1.pdf (Restricted Access)
Size: 2.59 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
