Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18544
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 翁昭旼 (Jau-Min Wong)
dc.contributor.author | Ke-Chun Huang | en
dc.contributor.author | 黃可羣 | zh_TW
dc.date.accessioned | 2021-06-08T01:10:53Z | -
dc.date.copyright | 2014-08-25
dc.date.issued | 2014
dc.date.submitted | 2014-08-17
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18544 | -
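dc.description.abstract | This study performs sentence-level PICO element detection in structured PubMed abstracts. For each PICO element, it asks whether the first sentence of each section is sufficient to train naive Bayes classifiers and support vector machines, or whether the remaining sentences of the section must also be included. From randomized controlled trial articles in the PubMed medical database, we extracted 19,854 structured abstracts whose section headings matched the predefined P/I/O labels and used them as the training set for the naive Bayes classifiers and support vector machines. Using ten-fold cross-validation, we then compared, for each PICO category, classifiers trained on the first sentence of each section (CF) with classifiers trained on all sentences of each section (CA), in terms of recall, precision, and F-measure. With naive Bayes classifiers, there was no significant difference between CF and CA for the Outcome category (F-measure 0.731±0.009 vs. 0.738±0.010, p=0.123). For the Intervention category, however, CA performed better in recall (0.752±0.012 vs. 0.620±0.007, p<0.001) and F-measure (0.728±0.006 vs. 0.662±0.007, p<0.001). For the Patient/Problem category, CF had higher precision (0.714±0.009 vs. 0.665±0.010, p<0.001) but lower recall (0.766±0.013 vs. 0.811±0.012, p<0.001). Thus, for sentence-level PICO element detection, CF is not always better than CA, and their performance differs across PICO elements. With linear support vector machine classifiers, the P/I/O results were not identical to those obtained with naive Bayes; again, CF was not always better than CA, and their performance varied across PICO elements. When the support vector machines used a radial basis function kernel, both recall and F-measure were low. | zh_TW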
dc.description.abstract | Efficiently identifying patient, intervention, comparison, and outcome (PICO) components in medical articles is helpful in evidence-based medicine. The purpose of this study is to clarify whether the first sentences of these components are good enough to train naive Bayes classifiers for sentence-level PICO element detection. We extracted from PubMed 19,854 structured abstracts of randomized controlled trials with any P/I/O label and used them to train naive Bayes classifiers. Classifiers trained on the first sentence of each section (CF) and classifiers trained on all sentences (CA) were evaluated on all sentences and compared by ten-fold cross-validation. The results, measured by recall, precision, and F-measure, show no significant difference in performance between CF and CA for detection of O-elements (F-measure 0.731±0.009 vs. 0.738±0.010, p=0.123). However, CA perform better for I-elements in terms of recall (0.752±0.012 vs. 0.620±0.007, p<0.001) and F-measure (0.728±0.006 vs. 0.662±0.007, p<0.001).
For P-elements, CF have higher precision (0.714±0.009 vs. 0.665±0.010, p<0.001) but lower recall (0.766±0.013 vs. 0.811±0.012, p<0.001). CF are therefore not always better than CA in sentence-level PICO element detection; their performance varies across elements. Support vector machines (SVMs) were also evaluated: with linear SVMs, the CF-versus-CA results did not fully match those obtained with naive Bayes, and SVMs with a radial basis function kernel yielded low recall and F-measure. | en
dc.description.provenance | Made available in DSpace on 2021-06-08T01:10:53Z (GMT). No. of bitstreams: 1
ntu-103-D92548001-1.pdf: 6084720 bytes, checksum: 9e83310c08af8014403a510d04e03053 (MD5)
Previous issue date: 2014 | en
dc.description.tableofcontents | Contents
  Oral Examination Committee Certification
  Acknowledgements
  Chinese Abstract
  Abstract
  Contents
  List of Figures
  List of Tables
  1 Introduction
    1.1 Literature review
    1.2 Training set and feature in previous studies
  2 Material and methods
    2.1 Material
    2.2 System flowchart
    2.3 Naive Bayes algorithm
    2.4 Two sets of classifiers: CF and CA
    2.5 Feature selection
    2.6 Ten-fold cross validation
    2.7 Trial classifiers trained at section level
    2.8 Using support vector machine
  3 Results
    3.1 The results of Naive Bayes
      3.1.1 The most informative features used by classifiers trained with all sentences
      3.1.2 Estimate of the accuracy for a small randomly selected set
      3.1.3 Results of section level
    3.2 The results of Support vector machine - Libsvm
    3.3 The results of Support vector machine - Liblinear
  4 Discussion
    4.1 Naive Bayes for text classification
      4.1.1 Better classifier for PICO element detection
      4.1.2 Text preprocessing
      4.1.3 Features
      4.1.4 Minimizing manual intervention
      4.1.5 Data sets with different degrees of imbalance
      4.1.6 Feature selection methods
      4.1.7 Section level
      4.1.8 Possible explanation for worse performance
    4.2 Support vector machine versus Naive Bayes
    4.3 CF versus CA
  5 Conclusion, limitation and future works
    5.1 Conclusion
    5.2 Limitation of current study
    5.3 Future works
  Bibliography
dc.language.iso | en
dc.title | Detecting PICO elements in structured abstracts of medical randomized controlled trials: comparing first-sentence and all-sentence sets | zh_TW
dc.title | PICO element detection in structured medical abstracts of randomized controlled trial: Compare first sentences and all sentences | en
dc.type | Thesis
dc.date.schoolyear | 102-2
dc.description.degree | Doctoral (Ph.D.)
dc.contributor.coadvisor | 蔣以仁 (I-Jen Chiang)
dc.contributor.oralexamcommittee | 高成炎 (Cheng-Yan Kao), 陳中明 (Chung-Ming Chen), 劉建財 (Chien-Tsai Liu)
dc.subject.keyword | Text mining, Information retrieval, Natural language processing, Information extraction, Evidence-based medicine | zh_TW
dc.subject.keyword | Text mining, Information retrieval, Natural language processing, Information extraction, Evidence-based medicine | en
dc.relation.page | 55
dc.rights.note | Not authorized for public access
dc.date.accepted | 2014-08-17
dc.contributor.author-college | College of Engineering | zh_TW
dc.contributor.author-dept | Institute of Biomedical Engineering | zh_TW
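The abstract describes two training sets drawn from the same labelled sections of a structured abstract: CF keeps only the first sentence of each P/I/O section, while CA keeps all of its sentences. The Python sketch below shows that split under stated assumptions. The heading-to-label table, the regex sentence splitter, and the names `HEADING_TO_LABEL`, `split_sentences`, and `build_cf_ca` are hypothetical; this record does not say which PubMed section headings were mapped to P, I, or O, or how sentences were segmented.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical heading-to-label map; the record does not list the headings used.
HEADING_TO_LABEL: Dict[str, str] = {
    "PATIENTS": "P", "PARTICIPANTS": "P", "POPULATION": "P",
    "INTERVENTION": "I", "INTERVENTIONS": "I",
    "OUTCOME": "O", "OUTCOMES": "O", "MAIN OUTCOME MEASURES": "O",
}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '?' or '!' followed by whitespace.

    Abbreviations such as 'vs.' are split incorrectly; a real pipeline
    would use a proper sentence tokenizer.
    """
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def build_cf_ca(sections: List[Tuple[str, str]]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Turn one structured abstract into CF and CA (sentence, label) examples.

    CF keeps only the first sentence of each labelled section; CA keeps every
    sentence. Sections whose heading is not in HEADING_TO_LABEL are skipped.
    """
    cf: List[Tuple[str, str]] = []
    ca: List[Tuple[str, str]] = []
    for heading, text in sections:
        label = HEADING_TO_LABEL.get(heading.strip().upper())
        sentences = split_sentences(text)
        if label is None or not sentences:
            continue
        cf.append((sentences[0], label))
        ca.extend((s, label) for s in sentences)
    return cf, ca


# Toy abstract (invented text, for illustration only).
abstract = [
    ("BACKGROUND", "Knee pain is common."),
    ("PATIENTS", "We enrolled 120 adults. All had knee pain."),
    ("INTERVENTIONS", "Patients received drug A or placebo."),
    ("MAIN OUTCOME MEASURES", "Pain score at 6 weeks. Adverse events were recorded."),
]
cf, ca = build_cf_ca(abstract)
```

On this toy abstract the BACKGROUND section is dropped, CF holds one sentence per P/I/O section (three in total), and CA holds all five labelled sentences.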
Appears in collections: Institute of Biomedical Engineering
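To make the CF-versus-CA evaluation in the abstract concrete, here is a dependency-free sketch of a binary multinomial naive Bayes detector for one PICO element, scored with ten-fold cross-validation on all held-out sentences. It rests on several assumptions that are not taken from the thesis. Folds are split by abstract. A sentence counts as positive when it comes from a section carrying the target label. Tokens are lower-cased alphanumerics, and word likelihoods use add-one smoothing. The names `BinaryMultinomialNB`, `examples`, `precision_recall_f`, and `cross_validate` are illustrative only.

```python
import math
import random
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical input format: an abstract is a list of (label, sentences) sections.
Abstract = List[Tuple[str, List[str]]]


def tokenize(text: str) -> List[str]:
    """Lower-cased alphanumeric tokens (a stand-in for the thesis's preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BinaryMultinomialNB:
    """Multinomial naive Bayes with add-one smoothing for 'element vs. rest'."""

    def fit(self, sentences: Sequence[str], targets: Sequence[bool]) -> "BinaryMultinomialNB":
        self.prior = Counter(targets)
        self.counts = {True: Counter(), False: Counter()}
        for sentence, y in zip(sentences, targets):
            self.counts[y].update(tokenize(sentence))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def log_score(self, sentence: str, y: bool) -> float:
        n = sum(self.prior.values())
        score = math.log((self.prior[y] + 1) / (n + 2))  # smoothed class prior
        v = len(self.vocab)
        for tok in tokenize(sentence):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[y][tok] + 1) / (self.totals[y] + v))
        return score

    def predict(self, sentence: str) -> bool:
        return self.log_score(sentence, True) > self.log_score(sentence, False)


def examples(abstracts: Sequence[Abstract], positive: str, mode: str):
    """CF keeps the first sentence of each section; CA keeps every sentence."""
    xs: List[str] = []
    ys: List[bool] = []
    for abstract in abstracts:
        for label, sentences in abstract:
            chosen = sentences[:1] if mode == "CF" else sentences
            xs.extend(chosen)
            ys.extend([label == positive] * len(chosen))
    return xs, ys


def precision_recall_f(truth: Sequence[bool], preds: Sequence[bool]) -> Tuple[float, float, float]:
    tp = sum(1 for t, p in zip(truth, preds) if t and p)
    fp = sum(1 for t, p in zip(truth, preds) if p and not t)
    fn = sum(1 for t, p in zip(truth, preds) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def cross_validate(abstracts: Sequence[Abstract], positive: str, mode: str,
                   k: int = 10, seed: int = 0) -> List[Tuple[float, float, float]]:
    """k-fold CV split by abstract; each fold is scored on ALL held-out sentences."""
    order = list(range(len(abstracts)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [abstracts[j] for f, fold in enumerate(folds) if f != i for j in fold]
        model = BinaryMultinomialNB().fit(*examples(train, positive, mode))
        test_x, test_y = examples([abstracts[j] for j in held_out], positive, "CA")
        scores.append(precision_recall_f(test_y, [model.predict(s) for s in test_x]))
    return scores
```

As a toy illustration, repeat one invented abstract whose Intervention section has two sentences and pass the list to `cross_validate(toy, "I", "CF")`. The CF model never sees the words of the second Intervention sentence, so it labels that sentence negative and loses recall. The CA model, trained on both sentences, does not. This shows only the mechanism by which training on first sentences can cost recall; it is not a reproduction of the thesis's results.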

Files in this item:
File | Size | Format
ntu-103-1.pdf (not authorized for public access) | 5.94 MB | Adobe PDF
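The abstract reports each metric as a mean ± spread over the ten cross-validation folds, together with a p-value, but this record does not name the significance test. It also does not say whether ± is a standard deviation. The sketch below offers one possible stand-in: the sample standard deviation, plus an exact paired permutation (sign-flip) test on the per-fold CF-minus-CA differences. Both choices are assumptions, not the thesis's stated method, and the function names are hypothetical.

```python
import itertools
import statistics
from typing import Sequence, Tuple


def mean_sd(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-fold scores (the 'x±y' form)."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_permutation_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired permutation (sign-flip) test on per-fold differences.

    Under the null hypothesis each fold's difference is equally likely to have
    either sign, so all 2**n sign patterns are enumerated (1024 for ten folds).
    """
    if len(a) != len(b):
        raise ValueError("a and b must have one score per fold")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    patterns = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        patterns += 1
        # Small tolerance so float rounding does not drop ties with the observed value.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / patterns
```

With ten folds, the smallest attainable two-sided p-value is 2/1024 (about 0.002). That happens when every fold favours the same classifier by the same margin. An exact test over so few folds therefore has limited resolution compared with a parametric test.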


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved