Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29302
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳銘憲(Ming-Syan Chen) | |
dc.contributor.author | Chien-Chin Chen | en |
dc.contributor.author | 陳建錦 | zh_TW |
dc.date.accessioned | 2021-06-13T01:04:11Z | - |
dc.date.available | 2007-07-30 | |
dc.date.copyright | 2007-07-30 | |
dc.date.issued | 2007 | |
dc.date.submitted | 2007-07-22 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29302 | - |
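dc.description.abstract | Owing to its convenience, the Internet has become the mainstream medium for disseminating information, and much of the information that affects people's lives is published or exchanged through it. That same convenience, however, means that the large and continuously growing volume of Internet information makes searching harder for users. To manage this ever-growing document stream effectively, event detection and tracking and automatic event summarization have become popular research topics.
This dissertation aims to provide an effective mechanism for managing streaming documents. We propose two event detection methods that automatically detect and track emerging news events. With our proposed aging theory, we can describe the life cycle of an event and thereby reduce the error rate of event detection. In addition, we propose a life model based on hidden Markov models to describe changes in an event's popularity; with the learned life models, we can predict the activeness status of different events in real time and dynamically adjust the clustering threshold used in event detection. Experiments on the official test collection show that the proposed methods improve the performance of existing event detection methods. Furthermore, to help users understand how an event unfolds, we propose an event summarization method that takes the temporal order of events into account during summarization and goes on to produce an evolution graph of the event's story. The experimental results show that event temporality effectively improves summarization performance, and the examples show that the generated evolution graphs capture the important developments and changes within an event. | zh_TW |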
dc.description.abstract | The World Wide Web (WWW) has become a major information source for people from all walks of life. Although the WWW facilitates information distribution, the ever-increasing volume of Internet documents has made information discovery on the Internet a time-consuming task. To manage this massive amount of information efficiently, there is a critical need for methods that detect and summarize events in text streams.
In this dissertation, we provide two adaptive methods to detect sequential events in text streams. We first propose an aging theory to model the life cycle of events. Then, we provide an event detection framework called LIPED, which uses HMM-based life profiles to predict the activeness status of events for adaptive threshold adjustment. To help users comprehend the development of news topics easily, we also provide a unified mechanism to construct a topic evolution graph and a summary from topic documents. The experimental results based on the official TDT4 corpus show that the proposed event detection methods substantially improve the performance of existing well-known event detection approaches, and that the composed topic summaries and evolution graphs are highly representative. | en |
dc.description.provenance | Made available in DSpace on 2021-06-13T01:04:11Z (GMT). No. of bitstreams: 1 ntu-96-D92921018-1.pdf: 1042383 bytes, checksum: c2a9ca2f6d30c5862228787ed8887f1a (MD5) Previous issue date: 2007 | en |
dc.description.tableofcontents | Acknowledgements
Chinese Abstract
English Abstract
Chapter 1 Introduction
1.1 An Introduction to Event Detection, Topic Evolution Graph Construction and Summarization
1.2 Motivations
1.3 The Organization of This Dissertation
Chapter 2 Related Works
2.1 Topic Detection and Tracking (TDT)
2.2 Status Modeling of Stream Data
2.3 Topic Evolution Mining
2.4 Text Segmentation
2.5 Text Summarization
Chapter 3 An Aging Theory for Event Life Cycle Modeling
3.1 Problem Specification
3.2 Aging Theory for Event Detection
3.2.1 Constant Decay Aging Scheme
3.2.2 Training of α and β
3.3 The Energy-based Event Detection Algorithm
3.4 Performance Evaluation
3.4.1 Data Corpus and Evaluation Metrics
3.4.2 Significance of the Aging Parameters
3.4.3 Effectiveness of the Aging Theory
3.4.4 Comparisons with Other Methods
3.5 Conclusion
Chapter 4 An Adaptive Threshold Framework for Event Detection Using HMM-based Life Profiles
4.1 Problem Specification
4.2 Life Profile Modeling
4.2.1 Acquiring K, S, and B
4.2.2 Acquiring A and Π
4.3 LIPED
4.3.1 LIPED Data Models
4.3.2 Life Profile based Event Detection
4.3.3 Threshold Strategies
4.4 Performance Evaluation
4.4.1 Data Corpus and Performance Metrics
4.4.2 Life Profile Preparation
4.4.3 LIPED on Time Window Method
4.4.4 LIPED on Time-based Threshold Method
4.4.5 LIPED on Incremental Clustering Algorithm
4.4.6 Effects of Threshold Settings
4.5 Conclusion
Chapter 5 A Unified Eigenvector-based Method for Topics Evolution Graph Construction and Summarization
5.1 Problem Specification
5.2 Theme Generation
5.3 Event Segmentation and Summarization
5.4 Evolution Graph Construction
5.5 Summary Evaluation
5.5.1 Summary to Topic Similarity Evaluation
5.5.2 ROUGE Evaluation
5.5.3 Discussions of Summary Evaluation
5.6 Evolution Graph Evaluation
5.6.1 Case Study 1 on Topic 40023
5.6.2 Case Study 2 on Topic 40004
5.7 Conclusion
Chapter 6 Conclusions and Future Works
References
Appendix A: Topic Summaries | |
dc.language.iso | en | |
dc.title | 串流文件內涵事件之偵測、演變及摘要之研究 | zh_TW |
dc.title | Event Detection, Evolution and Summarization of Streaming Texts | en |
dc.type | Thesis | |
dc.date.schoolyear | 95-2 | |
dc.description.degree | Doctoral | |
dc.contributor.oralexamcommittee | 蔣榮先(Jung-Hsien Chiang),吳毅成(I-Chen Wu),曾新穆(Vincent Shin-Mu Tseng),張嘉惠(Chia-Hui Chang),周承復(Cheng-Fu Chou) | |
dc.subject.keyword | Event Detection and Tracking, Automatic Document Clustering, Automatic Document Summarization, Event Story Evolution Graph | zh_TW |
dc.subject.keyword | Topic Detection and Tracking, Text Clustering, Text Summarization, Topic Evolution Graph | en |
dc.relation.page | 127 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2007-07-24 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electrical Engineering | zh_TW |
Appears in collections: | Department of Electrical Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-96-1.pdf (not currently authorized for public access) | 1.02 MB | Adobe PDF |
All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.