Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47028

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳建錦(Chien Chin Chen) | |
| dc.contributor.author | Ze-Han Fang | en |
| dc.contributor.author | 方澤翰 | zh_TW |
| dc.date.accessioned | 2021-06-15T05:45:36Z | - |
| dc.date.available | 2010-08-24 | |
| dc.date.copyright | 2010-08-24 | |
| dc.date.issued | 2010 | |
| dc.date.submitted | 2010-08-19 | |
| dc.identifier.citation | [1] Bottou, L. 1991. Une approche théorique de l'apprentissage connexionniste: Applications à la reconnaissance de la parole. PhD thesis, Université de Paris XI. [2] Berger, A.L., Della Pietra, V.J., and Della Pietra, S.A. 1996. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, pages 39–71. [3] Blum, A. and Langley, P. 1997. Selection of relevant features and examples in machine learning. Artificial Intelligence, pages 245–271. [4] Catlett, J. 1991. On changing continuous attributes into ordered discrete attributes. In Proc. Fifth European Working Session on Learning, pages 164–177. [5] Chen, C.C., Chen, M.C., and Chen, M.S. 2005. LIPED: HMM-based life profiles for adaptive event detection. In Proceedings of KDD '05, pages 556–561. [6] Eysenbach, G. 2002. Infodemiology: The epidemiology of (mis)information. Am J Med, pages 763–765. [7] Eysenbach, G. 2006. Infodemiology: tracking flu-related searches on the web for syndromic surveillance. AMIA Annu Symp Proc, pages 244–248. [8] Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., and Brilliant, L. 2009. Detecting influenza epidemics using search engine query data. Nature, pages 1012–1015. [9] Goutte, C., Cancedda, N., Gaussier, E., and Déjean, H. 2004. Generative vs Discriminative Approaches to Entity Extraction from Label Deficient Data. In Proceedings of JADT 2004, pages 10–12. [10] Hsu, C.W. and Lin, C.J. 2002. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, pages 415–425. [11] Hogg, R.V. and Tanis, E.A. 2005. Probability and Statistical Inference (7th Edition). Prentice Hall. [12] Jurafsky, D. and Martin, J.H. 2008. Speech and Language Processing (2nd Edition). Prentice Hall. [13] Lafferty, J., McCallum, A., and Pereira, F. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML 2001, pages 282–289. [14] Liu, H., Hussain, F., Tan, C.L., and Dash, M. 2002. Discretization: An Enabling Technique. Data Mining and Knowledge Discovery, pages 393–423. [15] McCallum, A., Freitag, D., and Pereira, F. 2000. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of ICML 2000, pages 591–598. [16] Morbidity and Mortality Weekly Report (MMWR). http://www.cdc.gov/mmwr/ [17] Manning, C.D., Raghavan, P., and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press. Online 17/08/2007. http://nlp.stanford.edu/IR-book/information-retrieval-book.html. [18] Markov, A.A. 1913. An example of statistical investigation in the text of 'Eugene Onyegin' illustrating coupling of 'tests' in chains. In Proceedings of the Academy of Sciences 7, pages 153–162. [19] Nallapati, R. 2004. Discriminative models for information retrieval. In Proceedings of SIGIR 2004, pages 64–71. [20] Ng, A.Y. and Jordan, M. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and Naive Bayes. NIPS, pages 841–848. [21] Polgreen, P.M., Chen, Y., Pennock, D.M., et al. 2008. Clinical Infectious Diseases, pages 1443–1448. [22] Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos, California. [23] Rath, T.M., Carreras, M., and Sebastiani, P. 2003. Automated detection of influenza epidemics with hidden Markov models. Advances in Intelligent Data Analysis V. Springer-Verlag, pages 521–531. [24] Serfling, R.E. 1963. Methods for current statistical analysis of excess pneumonia-influenza deaths. Public Health Reports, pages 494–506. [25] Steinwart, I. and Christmann, A. 2008. Support Vector Machines. Springer, New York. [26] Tsai, H.T. and Liu, T.M. 2005. Effects of global climate change on disease epidemics and social instability around the world. Human Security and Climate Change, pages 21–23. [27] Viterbi, A. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, pages 260–269. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47028 | - |
| dc.description.abstract | 傳染病無可避免的會造成大量的死亡以及重大的社會及經濟上的損失。傳染病監測因此早已成為重要的保健研究議題。在2009年,Ginsberg等人觀察到搜尋引擎紀錄檔可以被用來即時的評估當前傳染病的嚴重性狀態。在本論文中,我們將傳染病監測當成一個分類的問題並且應用Google的查詢統計資料來對登革熱傳染病的嚴重性狀態做分類。23個與登革熱有關的關鍵字查詢紀錄檔資料用作機器學習訓練與測試的觀察值,研究中也評估不同的機器學習模型在傳染病監測上的成果。根據在五年真實世界資料集上的實驗,證明了搜尋引擎紀錄檔可以被用來建立準確的傳染病狀態分類器。此外,經過學習的分類器也會表現的比傳統的回歸模型好。我們也應用了各種不同的機器學習模型如生成模型(generative model),判別模型(discriminative model),序列化模型(sequential model),以及非序列化模型(non-sequential model)來證明他們在傳染病監測上的適用性。 | zh_TW |
| dc.description.abstract | Epidemics inevitably cause large numbers of deaths and considerable social and economic damage. Epidemic surveillance has thus become an important healthcare research issue. In 2009, Ginsberg et al. observed that the query logs of search engines can be used to estimate the status of epidemics in a timely manner. In this thesis, we model epidemic surveillance as a classification problem and employ query statistics from Google to classify the status of a dengue fever epidemic. The query logs of twenty-three dengue-related keywords serve as observations for training and testing, and a number of machine learning models are investigated to evaluate their surveillance performance. Evaluations based on a five-year real-world dataset demonstrate that search engine query logs can be used to construct accurate epidemic status classifiers. Moreover, the learned classifiers generally outperform conventional regression approaches. We also apply various machine learning models, including generative, discriminative, sequential, and non-sequential classification models, to demonstrate their applicability to epidemic surveillance. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T05:45:36Z (GMT). No. of bitstreams: 1 ntu-99-R97725031-1.pdf: 487920 bytes, checksum: e9b904a8fe3fa16f47f79441b47ed48e (MD5) Previous issue date: 2010 | en |
| dc.description.tableofcontents | Acknowledgements i; Abstract (Chinese) ii; THESIS ABSTRACT iii; Table of Contents v; List of Figures vii; List of Tables viii; Chapter 1 Introduction 1; 1.1 Background 1; 1.2 Motivation 3; 1.3 Thesis Organization 5; Chapter 2 Related Work 6; Chapter 3 Machine Learning Models for Epidemic Surveillance 11; 3.1 Problem Definition 11; 3.2 Generated Models 12; 3.2.1 Naive Bayes (NB) 12; 3.2.2 Hidden Markov Model (HMM) 13; 3.3 Discriminative Models 14; 3.3.1 Support Vector Machine (SVM) 14; 3.3.2 Maximum Entropy (ME) 15; 3.3.3 Maximum Entropy Markov Model (MEMM) 16; 3.3.4 Conditional Random Fields (CRFs) 17; 3.4 Machine Learning Methods Performance Evaluations 18; 3.4.1 Evaluation Dataset and Performance Metrics 18; 3.4.2 Classification Accuracy Evaluations 21; 3.4.3 Effectiveness of Epidemic Trend in Classification 25; Chapter 4 Query Frequency Discretization and Feature Selection 27; 4.1 Supervised Data Discretization 27; 4.2 Irrelevant Feature Selection 29; 4.3 Evaluation 30; Chapter 5 Conclusion and Future Work 34; Reference 36 | |
| dc.language.iso | en | |
| dc.subject | 查詢紀錄檔分析 | zh_TW |
| dc.subject | 分類 | zh_TW |
| dc.subject | 文件探勘 | zh_TW |
| dc.subject | Classification | en |
| dc.subject | Query Log Analysis | en |
| dc.subject | Text Mining | en |
| dc.title | 運用搜尋引擎紀錄檔與機器學習模型於傳染病監測之研究 | zh_TW |
| dc.title | A Study of Machine Learning Models on Epidemic Surveillance: Using Query Logs of Search Engines | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 98-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 陳孟彰(Meng Chang Chen),陳銘憲(Ming-Syan Chen) | |
| dc.subject.keyword | 文件探勘,分類,查詢紀錄檔分析, | zh_TW |
| dc.subject.keyword | Text Mining, Classification, Query Log Analysis | en |
| dc.relation.page | 37 | |
| dc.rights.note | Paid licensing | |
| dc.date.accepted | 2010-08-19 | |
| dc.contributor.author-college | 管理學院 | zh_TW |
| dc.contributor.author-dept | 資訊管理學研究所 | zh_TW |
| Appears in Collections: | Department of Information Management | |
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-99-1.pdf (not authorized for public access) | 476.48 kB | Adobe PDF | |
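Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.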
