以協作資料為基礎的概念空間自動建構研究 (Constructing Concept Space from Social Collaborative Editing)

WEN-TAI HSIEH; 謝文泰

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18327

Full metadata record

DC field	Value	Language
dc.contributor.advisor	曹承礎
dc.contributor.author	WEN-TAI HSIEH	en
dc.contributor.author	謝文泰	zh_TW
dc.date.accessioned	2021-06-08T00:59:56Z	-
dc.date.copyright	2015-03-13
dc.date.issued	2014
dc.date.submitted	2015-01-14
dc.identifier.citation	[1] Bakshy, E., Rosenn, I., Marlow, C., & Adamic, L. (2012, April). The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web (pp. 519-528). ACM.
[2] Liang, T. P., & Turban, E. (2011). Introduction to the special issue social commerce: a research framework for social commerce. International Journal of Electronic Commerce, 16(2), 5-14.
[3] Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge acquisition, 5(2), 199-220.
[4] Chen, H., & Lynch, K. J. (1992). Automatic construction of networks of concepts characterizing document databases. Systems, Man and Cybernetics, IEEE Transactions on, 22(5), 885-902.
[5] Wang, C., Raina, R., Fong, D., Zhou, D., Han, J., & Badros, G. (2011, July). Learning relevance from heterogeneous social network and its application in online targeting. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (pp. 655-664). ACM.
[6] Chau, R., Yeh, C., & Smith, K. A. (2005). A neural network model for hierarchical multilingual text categorization. In Advances in Neural Networks–ISNN 2005 (pp. 238-245). Springer Berlin Heidelberg.
[7] Chen, H., Lally, A. M., Zhu, B., & Chau, M. (2003). HelpfulMed: intelligent searching for medical information over the internet. Journal of the American Society for Information Science and Technology, 54(7), 683-694.
[8] Sheth, A., Thomas, C., & Mehra, P. (2010). Continuous semantics to analyze real-time data. IEEE Internet Computing, 14(6), 84-89.
[9] Sleator, D. D., & Temperley, D. (1995). Parsing English with a link grammar. arXiv preprint cmp-lg/9508004.
[10] Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39-41.
[11] Weibel, S. (1995). OCLC/NCSA metadata workshop report. http://www.ifla.org/documents/libraries/cataloging/oclcmeta.htm
[12] McGuinness, D. L., & Van Harmelen, F. (2004). OWL web ontology language overview. W3C recommendation, 10(10), 2004.
[13] Bozsak, E., Ehrig, M., Handschuh, S., Hotho, A., Maedche, A., Motik, B., ... & Zacharias, V. (2002). KAON: towards a large scale Semantic Web. In E-Commerce and Web Technologies (pp. 304-313). Springer Berlin Heidelberg.
[14] Smith, A. E., & Humphreys, M. S. (2006). Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods, 38(2), 262-279.
[15] Ahmad, K., & Gillam, L. (2005). Automatic ontology extraction from unstructured texts. In On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE (pp. 1330-1346). Springer Berlin Heidelberg.
[16] Lee, C. S., Kao, Y. F., Kuo, Y. H., & Wang, M. H. (2007). Automated ontology construction for unstructured text documents. Data & Knowledge Engineering, 60(3), 547-566.
[17] Alani, H. (2006, May). Position paper: ontology construction from online ontologies. In Proceedings of the 15th international conference on World Wide Web (pp. 491-495). ACM.
[18] Tijerino, Y. A., Embley, D. W., Lonsdale, D. W., Ding, Y., & Nagy, G. (2005). Towards ontology generation from tables. World Wide Web, 8(3), 261-285.
[19] Moustafa, S., Badr, N., Karam, O., & Gharib, T. (2010). Enriching Ontologies using Coarse-Grained Word Senses. Journal of Egyptian Computer Science, 34(2).
[20] Valarakos, A. G., Paliouras, G., Karkaletsis, V., & Vouros, G. (2004). Enhancing ontological knowledge through ontology population and enrichment. In Engineering knowledge in the age of the Semantic Web (pp. 144-156). Springer Berlin Heidelberg.
[21] Parekh, V., Gwo, J., & Finin, T. W. (2004, June). Mining Domain Specific Texts and Glossaries to Evaluate and Enrich Domain Ontologies. In IKE (pp. 533-540).
[22] Cimiano, P., Hotho, A., & Staab, S. (2005). Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis. J. Artif. Intell. Res. (JAIR), 24, 305-339.
[23] Cimiano, P., & Staab, S. (2005). Learning concept hierarchies from text with a guided agglomerative clustering algorithm. In Proceedings of the ICML 2005 Workshop on Learning and Extending Lexical Ontologies with Machine Learning Methods.
[24] Ruiz-Casado, M., Alfonseca, E., & Castells, P. (2007). Automatising the learning of lexical patterns: An application to the enrichment of wordnet by extracting semantic relationships from wikipedia. Data & Knowledge Engineering, 61(3), 484-499.
[25] Gharib, T. F., Badr, N. L., Haridy, S., & Abraham, A. (2012). Enriching Ontology Concepts Based on Texts from WWW and Corpus. J. UCS, 18(16), 2234-2251.
[26] Bouquet, P., Serafini, L., & Zanobini, S. (2003). Semantic coordination: a new approach and an application. In The Semantic Web-ISWC 2003 (pp. 130-145). Springer Berlin Heidelberg.
[27] Doan, A., Domingos, P., & Halevy, A. (2003). Learning to match the schemas of data sources: A multistrategy approach. Machine Learning, 50(3), 279-301.
[28] Pei, M., Nakayama, K., Hara, T., & Nishio, S. (2008, March). Constructing a global ontology by concept mapping using wikipedia thesaurus. In Advanced Information Networking and Applications-Workshops, 2008. AINAW 2008. 22nd International Conference on (pp. 1205-1210). IEEE.
[29] Knoth, P., & Herrmannova, D. (2013). Simple yet effective methods for cross-lingual link discovery (CLLD): KMI@NTCIR-10 CrossLink-2.
[30] Och, F. J., & Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational linguistics, 29(1), 19-51.
[31] Hsieh, W. T., Stu, J., Chen, Y. L., & Chou, S. C. T. (2009). A collaborative desktop tagging system for group knowledge management based on concept space. Expert Systems with Applications, 36(5), 9513-9523.
[32] Wong, P. K., & Chan, C. (1996, August). Chinese word segmentation based on maximum matching and word binding force. In Proceedings of the 16th conference on Computational linguistics-Volume 1 (pp. 200-203). Association for Computational Linguistics.
[33] Tsuruoka, Y., & Tsujii, J. I. (2005, October). Bidirectional inference with the easiest-first strategy for tagging sequence data. In Proceedings of the conference on human language technology and empirical methods in natural language processing (pp. 467-474). Association for Computational Linguistics.
[34] Ma, W. Y., & Chen, K. J. (2003, July). Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff. In Proceedings of the second SIGHAN workshop on Chinese language processing-Volume 17 (pp. 168-171). Association for Computational Linguistics.
[35] Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.
[36] Hsieh, W. T., Ku, T., Wu, C. M., & Chou, S. C. T. (2012, July). Social event radar: a bilingual context mining and sentiment analysis summarization system. In Proceedings of the ACL 2012 System Demonstrations (pp. 163-168). Association for Computational Linguistics.
[37] Sauper, C., & Barzilay, R. (2009, August). Automatically generating wikipedia articles: A structure-aware approach. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1 (pp. 208-216). Association for Computational Linguistics.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18327	-
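dc.description.abstract	With the rise of social applications, crowd-created information has grown explosively, directly affecting how all users receive information and make decisions. To make good use of and analyze this online text, we need a concept space that keeps growing as application domains change and new vocabulary emerges, serving as the underlying knowledge base for semantic computation. Automatic ontology construction is an important research topic closely related to this study; previous work has followed three directions: automatic construction from unstructured text, enrichment of existing ontologies, and construction from semi-structured corpora. We developed a framework for automatically constructing a concept space that treats social collaborative editing websites as semi-structured corpora; it comprises named-entity detection, filtering, disambiguation, and named-entity expansion and ranking. We describe the research steps and evaluation methods in detail, and conclude with a discussion of the evolution and quality of the automatically constructed ontology. The proposed framework demonstrates that ontologies can feasibly be constructed automatically from real-world, real-time data streams, and it offers broader data coverage than previous studies across languages, across domains, and in handling real-time data. Unlike previous work, in which a single method served a single application, the framework adopted in this thesis supports a wider range of practical applications.	zh_TW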
dc.description.abstract	With the prevalence of Social Networking Services (SNS), real-world consumption behaviors are increasingly shaped by social networks. To utilize the information from social networks, we need a concept space that can adapt to the application domain and keep growing as new vocabulary appears; automatic ontology construction has therefore become an important issue. Previous studies have addressed ontology construction from free-format text, enrichment of given ontologies from web or corpus sources, and construction of ontologies from semi-structured corpora; among these, semi-structured corpora have been the prevailing focus. In this thesis, we developed an adaptive framework that works across corpora produced by social collaborative editing, focusing in particular on semi-structured text mining. The framework involves detection of named entities in a document, filtering of named entities, disambiguation, named-entity expansion, and ranking of related named entities. We describe the framework and the proposed method for each stage in detail, together with the metrics used in previous studies and those we used for evaluation. We then discuss the evolution and quality of the concept space. Our proposed framework makes processing real-world corpora computationally feasible, and it generates a dynamic concept space. It can handle more diverse domains and languages, and for practical real-world applications it shows greater flexibility than previous studies.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T00:59:56Z (GMT). No. of bitstreams: 1 ntu-103-D95725001-1.pdf: 4167346 bytes, checksum: 41ba1bf92b2d52b79cebb3e2ecf6ad9c (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Oral Examination Committee Approval; Chinese Abstract; Thesis Abstract; Contents; List of Figures; List of Tables
Chapter 1 Introduction
1.1 Social Networks and Marketing
1.2 Knowledge Work Task
1.2.1 Analyze/Organize
1.2.2 Find
1.2.3 Physical features
1.3 Concept Space
1.4 Research Goal
Chapter 2 Related Work
2.1 Link Grammar, WordNet and Ontology
2.1.1 Link Grammar
2.1.2 WordNet
2.1.3 Ontology
2.2 Automatic Construction of Ontology with free format texts
2.3 Enriching ontology with information extraction from Web/Corpus
2.4 Building ontology with semi-structural corpora
Chapter 3 Constructing Concept Space
3.1 Problem Statement
3.2 Anchor Detection
3.3 Anchor Filtering and Disambiguation
3.4 Expansion and Ranking
3.4.1 Training Stage
3.4.2 Run-Time Stage
Chapter 4 Evaluation
4.1 Evaluation on Concept Space
4.2 Preliminary Evaluation
4.2.1 Experimental Setting
4.2.2 Experimental Setting
4.2.3 Preliminary Results
4.3 Evaluation for Expansion and Ranking
4.3.1 Experimental Setting
4.3.2 Evaluation
Chapter 5 Discussion
5.1 Evolution of Corpus: Wikipedia vs. Concept Space
5.2 Quality of Concept Space
5.2.1 Frequently Edited Entry Articles
5.2.2 Non-frequently Edited Entry Articles
Chapter 6 Conclusion
6.1 Adaptive Framework for Constructing Concept Space across Different Corpora
6.2 The Evolution of the Corpora and the Quality of the Concept Space
6.3 Contribution and Future Work
6.3.1 The Corpus Type Diversity
6.3.2 Automatic Entry Point Selection
6.3.3 The NER Precision
References
List of Figures
Fig. 1.1 Information analysis and retrieval framework
Fig. 2.1 Words and connectors in a dictionary
Fig. 2.2 All linking requirements are satisfied
Fig. 2.3 A simplified form of Figure 2
Fig. 2.4 The link grammar output Linkage example 1
Fig. 2.5 The link grammar output Linkage example 2
Fig. 2.6 The Link Grammar output sample through Phrase Parser component -1
Fig. 2.7 The Link Grammar output sample through Phrase Parser component -2
Fig. 2.8 An example of Ontology using KAON component -2
Fig. 2.9 Flowchart of automatically constructing ontology via free format text mining
Fig. 2.10 Flowchart of building concept space from Wikipedia
Fig. 3.1 CRF model training
Fig. 3.2 Extracting keywords at run-time
Fig. 4.1 Average adopted tags of training data
Fig. 4.2 Average recommended tags of training data
Fig. 4.3 Tag adoption rate of two users
Fig. 4.4 Average adoption rate of documents with same frequency tag
Fig. 4.5 Average adoption rate of documents with same frequency tag pair
Fig. 4.6 Average adoption rate of collection of documents with same frequency tag pair after removing low inter-similarity collections
Fig. 4.7 Relevant search experimental procedures
Fig. 4.8 Average relevant search result of top 10 tags
Fig. 4.9 Precision, recall of relevant search, keyword search for top 10
Fig. 4.10 F-measure of relevant search and keyword search for top 10
Fig. 4.11 The example of matrix map in patent analysis
Fig. 4.12 Steps in patent analysis scenario
Fig. 4.13 Tagging interface of DCT system
Fig. 4.14 The attributes merging concept
Fig. 4.15 Comparisons of Interpolated Precision-Recall (C2E GT F2F)
Fig. 4.16 Comparisons of Interpolated Precision-Recall (C2E MA A2F)
Fig. 5.1 Screenshot of the first edition of the term “太陽花學運” (Sunflower Student Movement) in Wikipedia
Fig. 5.2 Screenshot of the latest edition of the term “太陽花學運” in Wikipedia
Fig. 5.3 Summary of the edit history of the term “太陽花學運” in Wikipedia
Fig. 5.4 Concept space of the term “太陽花學運” in Wikipedia, March 18, 2014
Fig. 5.5 Concept space of the term “太陽花學運” in Wikipedia, April 1, 2014
Fig. 5.6 Concept space of the term “太陽花學運” in Wikipedia, April 10, 2014
Fig. 5.7 Search results of three student leaders during the movement and comparison with concept space relatedness
Fig. 5.8 Search results of symbolic concepts during the movement and comparison with concept space relatedness
Fig. 5.9 Watch list of edited entry articles on Wikipedia
Fig. 5.10 Editing changes and statistics of “Samsung Galaxy S5”
Fig. 5.11 Editing changes and statistics of “iPhone 5S”
Fig. 5.12 Editing changes and statistics of “HTC One M8”
Fig. 5.13 Editing changes of “HTC One M8” after five days
Fig. 5.14 Editing changes of “聖元國際” after four days
List of Tables
Table 2.1 The words and linking requirements in a dictionary
Table 2.2 WordNet Synset sample of “Add”
Table 4.1 Participants in NTCIR-10 CLLD
Table 4.2 Use frequency of other tags except shared tag pair
Table 4.3 Performance of relevant search and keyword search
Table 4.4 Performance of relevant search and keyword search
Table 4.5 Participants in NTCIR-10 CLLD
Table 4.6 CJK2E F2F evaluation with Wikipedia ground-truth: LMAP, R-PREC
Table 4.7 CJK2E F2F evaluation with manual assessment results: LMAP, R-PREC
Table 4.8 F2F evaluation with Wikipedia ground-truth: Precision-at-N (Chinese-to-English)
Table 4.9 System performance at (a) N=5 (b) N=7 (c) N=10
dc.language.iso	en
dc.title	以協作資料為基礎的概念空間自動建構研究 (literally: A Study on the Automatic Construction of Concept Space Based on Collaborative Data)	zh_TW
dc.title	Constructing Concept Space from Social Collaborative Editing	en
dc.type	Thesis
dc.date.schoolyear	103-1
dc.description.degree	Doctoral (Ph.D.)
dc.contributor.oralexamcommittee	苑守慈,蔡益坤,林俊叡,陳鴻基
dc.subject.keyword	Concept Space, Ontology, Social Collaboration	zh_TW
dc.subject.keyword	Concept Space, Automatic Ontology Construction, Social Collaborative Editing	en
dc.relation.page	75
dc.rights.note	Not authorized for public access
dc.date.accepted	2015-01-15
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management	zh_TW
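The abstracts list the framework's stages (named-entity or "anchor" detection, filtering, disambiguation, expansion, and ranking), but this record gives no algorithmic detail. As a rough illustration only, the sketch below assumes that anchors are wiki-style `[[Target|display text]]` links, that filtering is a document-frequency threshold (`min_df`), and that relatedness is a plain co-occurrence count. These are assumptions, not the thesis's method (the table of contents indicates a CRF model, for instance), and disambiguation is omitted entirely. All function and parameter names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumed markup: wiki-style links [[Target]] or [[Target|display text]].
ANCHOR_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def detect_anchors(text):
    """Anchor detection (simplified): return link targets in order of appearance."""
    return [target.strip() for target in ANCHOR_RE.findall(text)]


def filter_anchors(docs_anchors, min_df=2):
    """Anchor filtering (simplified): keep anchors linked from >= min_df articles."""
    doc_freq = Counter()
    for anchors in docs_anchors:
        doc_freq.update(set(anchors))  # count each anchor once per article
    return {a for a, n in doc_freq.items() if n >= min_df}


def build_concept_space(articles, min_df=2):
    """Expansion (simplified): count how often two kept anchors share an article.

    Disambiguation is deliberately omitted in this sketch.
    """
    docs_anchors = [detect_anchors(text) for text in articles]
    kept = filter_anchors(docs_anchors, min_df)
    cooc = defaultdict(Counter)
    for anchors in docs_anchors:
        terms = sorted(set(anchors) & kept)
        for a, b in combinations(terms, 2):
            cooc[a][b] += 1  # symmetric co-occurrence count
            cooc[b][a] += 1
    return cooc


def related(cooc, term, n=5):
    """Ranking (simplified): top-n neighbours by count, ties broken by name."""
    neighbours = cooc.get(term, Counter())
    return sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Toy articles using entry names from Chapter 5; the text is invented.
    articles = [
        "The [[HTC One M8]] competes with the [[Samsung Galaxy S5]] and [[iPhone 5S]].",
        "[[Samsung Galaxy S5]] runs [[Android]]; so does the [[HTC One M8|M8]].",
        "Reviewers compared the [[iPhone 5S]] with the [[Samsung Galaxy S5]].",
    ]
    space = build_concept_space(articles, min_df=2)
    print(related(space, "Samsung Galaxy S5"))
    # prints [('HTC One M8', 2), ('iPhone 5S', 2)]
```

In this toy run "Android" is linked from only one article, so the `min_df=2` filter drops it before any co-occurrence is counted. A real system would replace the raw counts with a weighted relatedness score and add a disambiguation step.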
Appears in Collections:	Department of Information Management

Files in This Item:

File	Size	Format
ntu-103-1.pdf (public access not authorized)	4.07 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.