Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68336
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 謝舒凱 | |
dc.contributor.author | Tzu-Yun Huang | en |
dc.contributor.author | 黃資勻 | zh_TW |
dc.date.accessioned | 2021-06-17T02:18:01Z | - |
dc.date.available | 2022-08-30 | |
dc.date.copyright | 2017-08-30 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-28 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68336 | - |
dc.description.abstract | The ESemiCrowd framework redefines the concepts and methods of crowdsourcing for Chinese natural language processing annotation by incorporating linguistic expert knowledge into the annotation workflow. ESemiCrowd keeps costs at the crowdsourcing level while producing annotation quality far above plain crowdsourcing, approaching expert annotation. Using a relatively complex Chinese word sense disambiguation experiment, the annotation performance of three approaches, crowdsourcing (CrowdFlower), experts (Experts), and expert-infused crowdsourcing (ESemiCrowd), is evaluated at three levels. The first level compares the performance of individual annotators within each approach. The second level compares the annotation results of the three approaches against one another. The third level compares the three approaches against the gold-standard answers. The final results show that ESemiCrowd reached an F-measure of 0.83, twice that of crowdsourcing (CrowdFlower), and an agreement of 0.72, six times that of crowdsourcing (CrowdFlower), at a cost of less than one US dollar more than crowdsourcing.
The framework covers nine foci: first, a workflow for decomposing and assigning tasks; second, the staffing and responsibilities of each workflow stage; third, the task assignment method; fourth, attracting suitably skilled workers in the most effective and lowest-risk way; fifth, building a talent pool to shorten the time needed to route tasks to suitable workers; sixth, continuous monitoring and quality control at every workflow stage; seventh, detailed specification of the platform experts' duties, including pre-annotating part of the corpus, building the annotation scheme, and training workers; eighth, a system that recognizes high-quality workers and prevents burnout; and ninth, giving each task meaning for workers and affirming their contributions. | zh_TW |
dc.description.abstract | The ESemiCrowd framework redefines crowdsourcing for Natural Language Processing by adding linguistic expert knowledge to the annotation workflow. The approach keeps costs at the crowdsourcing level while, more importantly, raising data quality to near expert level. ESemiCrowd is evaluated against plain crowdsourcing (CrowdFlower) and expert annotation on a Chinese Word Sense Disambiguation (WSD) task at three levels: the first level compares individual annotator performance within each approach; the second level compares annotation results across approaches; and the third level compares each approach against the gold-standard answers. In the final results, the F-measure of ESemiCrowd reached 0.83, twice that of crowdsourcing (CrowdFlower), and its agreement reached 0.72, six times that of crowdsourcing (CrowdFlower), at a cost of less than one US dollar more than crowdsourcing.
The framework comprises nine foci: Workflow, Hierarchy Circle, Task Assignment, Crowd Work with Annotator Database, 8-Level Quality Control, Crowdsourcing Platform Design, The Role of Platform Experts, Reward System with Game Elements, and Worker Motivation Maintenance. | en |
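The abstract above reports annotation quality as F-measure and inter-annotator agreement. As a minimal, hypothetical sketch (the label data is invented, and Cohen's kappa is assumed as the agreement measure; the thesis may use a different formulation), the two metrics can be computed as:

```python
from collections import Counter

def f_measure(gold, pred, positive):
    """Binary F-measure: harmonic mean of precision and recall,
    treating one sense label as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if p == positive and g != positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented toy labels: two annotators choosing between senses "s1"/"s2".
gold = ["s1", "s1", "s2", "s1", "s2", "s2"]
ann  = ["s1", "s2", "s2", "s1", "s2", "s1"]
print(round(f_measure(gold, ann, "s1"), 2))  # → 0.67
print(round(cohen_kappa(gold, ann), 2))      # → 0.33
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which is why an agreement of 0.72 is a substantial improvement over plain crowdsourcing.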
dc.description.provenance | Made available in DSpace on 2021-06-17T02:18:01Z (GMT). No. of bitstreams: 1 ntu-106-R02142006-1.pdf: 2651444 bytes, checksum: 35d7ebbdd366f7a137a1aad03578774e (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | ACKNOWLEDGEMENTS i
Chinese Abstract (中文摘要) ii
ABSTRACT iii
TABLE OF CONTENTS iv
LIST OF FIGURES vii
LIST OF TABLES viii
Chapter 1 Introduction 1
  1.1 Overview and Major Themes 4
    1.1.1 Expert Annotations 4
    1.1.2 Crowdsourcing Annotations (Non-Experts) 5
    1.1.3 Novel Crowdsourcing Platform Framework: ESemiCrowd 8
    1.1.4 Quality Control Levels 9
  1.2 Main Contributions 10
  1.3 Thesis Outline 11
Chapter 2 Literature Review 14
  2.1 12-Foci Model 14
  2.2 Quality Control Workflow – MicroTalk 19
  2.3 Crowdsourcing Task Design - WSD 20
  2.4 GWAP 23
  2.5 Data Aggregation 24
  2.6 Data Evaluation 25
  2.7 Data Resources 29
    2.7.1 PTT Corpus 29
    2.7.2 Chinese Wordnet 30
Chapter 3 Methodology 31
  3.1 Basic Data Information 32
  3.2 Expert Annotation 34
  3.3 CrowdFlower Annotation 35
  3.4 ESemiCrowd Annotation 36
    3.4.1 Fundamental Design Base of ESemiCrowd Platform 37
    3.4.2 The Design of The Task 57
  3.5 Evaluation 58
Chapter 4 Results & Discussion 59
  4.1 Level One Evaluation Result 59
    4.1.1 Experts 60
    4.1.2 Crowdsourcing 64
    4.1.3 ESemiCrowd 69
  4.2 Level Two Evaluation Result 73
  4.3 Level Three Evaluation Result 74
  4.4 Discussions 76
Chapter 5 Conclusion 83
Appendix 1 – Gold Standard Answers 87
Appendix 2 – Experts Aggregation Results 92
Appendix 3 – Crowdsourcing Aggregation Results 98
Appendix 4 – ESemiCrowd Aggregation Results 104
REFERENCE 109 | |
dc.language.iso | en | |
dc.title | ESemiCrowd - 中文自然語言處理的群眾外包架構 | zh_TW |
dc.title | ESemiCrowd - A Crowdsourcing Framework for Chinese NLP | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 范耀中,洪嘉馡 | |
dc.subject.keyword | 語言學標記,中文,自然語言處理,群眾募集,遊戲化 | zh_TW |
dc.subject.keyword | Linguistics Annotation, Chinese, NLP, Crowdsourcing, GWAP | en |
dc.relation.page | 113 | |
dc.identifier.doi | 10.6342/NTU201704178 | |
dc.rights.note | Paid authorization (有償授權) | |
dc.date.accepted | 2017-08-28 | |
dc.contributor.author-college | College of Liberal Arts | zh_TW |
dc.contributor.author-dept | Graduate Institute of Linguistics | zh_TW |
Appears in Collections: | Graduate Institute of Linguistics
Files in This Item:
File | Size | Format |
---|---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 2.59 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.