NTU Theses and Dissertations Repository: College of Engineering, Department of Engineering Science and Ocean Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58397
Full metadata record
DC field: value [language]
dc.contributor.advisor: 黃乾綱
dc.contributor.author: Shang-Wei Hsieh [en]
dc.contributor.author: 謝尚偉 [zh_TW]
dc.date.accessioned: 2021-06-16T08:13:43Z
dc.date.available: 2019-03-21
dc.date.copyright: 2014-03-21
dc.date.issued: 2014
dc.date.submitted: 2014-02-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58397
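dc.description.abstract: In the early stage of biomedical research, finding relevant scientific literature and mapping researchers' own private biomedical experimental data to that literature is an important task. This study proposes a search system for fast retrieval over a large body of biomedical literature. The public literature database used is PubMed, the platform built by the U.S. National Library of Medicine. A text-mining technique, named entity recognition, is used to extract protein names, which are then normalized to identifiers with a rule-based method proposed in this study; these identifiers link the names to an experimental-data database. The goal is to let users quickly view their private experimental data alongside the related scientific literature in graphical form, so that they can judge how important and relevant their data are within the retrieved documents and go on to identify new research questions or more valuable information. [zh_TW, translated]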
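The abstract describes normalizing recognized protein names to identifiers with a rule-based method and using those identifiers to reach the researchers' experimental data, but this record does not give the actual rules. The Python sketch below is a hypothetical illustration of the idea only: the rules (lower-casing, collapsing spelled-out Greek letters, dropping punctuation), the `PROT:` identifiers, the synonym lexicon and the expression table are all invented for the example and are not the thesis's method or data.

```python
import re

# Hypothetical normalization rules; the thesis's actual rule set is not
# given in this record.
GREEK = {"alpha": "a", "beta": "b", "gamma": "g", "delta": "d", "kappa": "k"}


def normalize_name(name: str) -> str:
    """Reduce a protein mention to a lookup key with simple string rules."""
    key = name.lower()
    # Collapse spelled-out Greek letters, e.g. "kappa" -> "k".
    for word, letter in GREEK.items():
        key = key.replace(word, letter)
    # Drop hyphens, spaces and any other non-alphanumeric characters.
    return re.sub(r"[^a-z0-9]", "", key)


def build_lexicon(entries: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalized synonym to its identifier."""
    lexicon = {}
    for identifier, synonyms in entries.items():
        for synonym in synonyms:
            lexicon[normalize_name(synonym)] = identifier
    return lexicon


def link_mentions(mentions, lexicon, experiment):
    """Keep mentions whose identifier also appears in the private data table,
    returned as (mention, identifier, value) tuples."""
    linked = []
    for mention in mentions:
        identifier = lexicon.get(normalize_name(mention))
        if identifier is not None and identifier in experiment:
            linked.append((mention, identifier, experiment[identifier]))
    return linked


# Hypothetical identifiers, synonyms and expression values for illustration.
lexicon = build_lexicon({
    "PROT:0001": ["NF-kappaB", "nuclear factor kappa B"],
    "PROT:0002": ["Hsp70", "heat shock protein 70"],
})
experiment = {"PROT:0001": 2.4}

if __name__ == "__main__":
    # "NF-kB" resolves and is in the table; "HSP-70" resolves but has no data.
    print(link_mentions(["NF-kB", "HSP-70", "unknown protein"], lexicon, experiment))
```

In this sketch an ambiguous synonym shared by two identifiers is resolved by letting the last one win; a real normalizer would need an explicit policy for such collisions.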
dc.description.abstract: In the early stages of biomedical research, mapping researchers' proprietary experimental data to the public research literature is an important task. In this thesis, a search engine is proposed that efficiently retrieves articles from a large-scale collection of biomedical literature gathered from PubMed. We also apply named entity recognition, a text-mining technique, to extract protein names from this literature. The protein names are then normalized to IDs that can be linked to the researchers' proprietary experimental databases, and web techniques are used to plot charts of the relevant proprietary data automatically. Through these steps, researchers can efficiently see how their proprietary data relate to the public papers, which can also help them identify further research opportunities. [en]
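Efficient retrieval over a large PubMed collection is the classic job of an inverted index. This record does not say how the thesis builds or queries its index, so the Python sketch below is only a generic, in-memory illustration under that assumption: documents are tokenized into lowercase alphanumeric terms, each term points to the documents that contain it together with a term count, and a query returns the documents containing every query term, ranked by a plain tf-idf sum. The document IDs and texts are invented.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Build an inverted index: term -> {doc_id: count of term in doc}."""
    index: dict[str, dict[str, int]] = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings = index.setdefault(term, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return index


def search(index: dict[str, dict[str, int]], query: str, n_docs: int) -> list[str]:
    """Return IDs of documents containing every query term, best first."""
    # Order terms by document frequency; unknown terms sort first (df = 0).
    terms = sorted(set(tokenize(query)), key=lambda t: len(index.get(t, {})))
    if not terms or terms[0] not in index:
        return []  # empty query, or a term that never occurs
    # Intersect postings starting from the rarest term to keep the set small.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates.intersection_update(index[term])

    def score(doc_id: str) -> float:
        # tf-idf: term count weighted by log(N / document frequency).
        return sum(index[t][doc_id] * math.log(n_docs / len(index[t])) for t in terms)

    # Highest score first; ties broken by document ID for stable output.
    return sorted(candidates, key=lambda d: (-score(d), d))


# Hypothetical mini-collection standing in for PubMed abstracts.
docs = {
    "doc1": "Protein kinase signaling in Arabidopsis",
    "doc2": "A kinase inhibitor assay: kinase activity in vitro",
    "doc3": "Gene expression in Arabidopsis roots",
}
index = build_index(docs)

if __name__ == "__main__":
    print(search(index, "kinase", len(docs)))
    print(search(index, "Arabidopsis kinase", len(docs)))
```

Intersecting from the rarest term first is the usual way to keep conjunctive queries cheap; production engines such as Apache Lucene add compressed postings, stemming and more refined scoring on top of the same structure.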
dc.description.provenance: Made available in DSpace on 2021-06-16T08:13:43Z (GMT). No. of bitstreams: 1
ntu-103-R00525081-1.pdf: 1805400 bytes, checksum: 89ec942b1f233ba0e52567e4224cb798 (MD5)
Previous issue date: 2014 [en]
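dc.description.tableofcontents: Acknowledgements i
Abstract (Chinese) ii
ABSTRACT iii
Table of Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
1.1 Academic Literature Retrieval 1
1.2 Motivation 2
1.3 A Retrieval System Linked to Private Databases 3
Chapter 2 Related Work 5
2.1 Biomedical Text Mining 5
2.1.1 Information Retrieval 5
2.1.2 Information Extraction 6
2.2 Public Resources 10
2.2.1 Literature Databases 10
2.2.2 Genome Databases 12
2.3 Private Databases 13
2.3.1 Next-Generation Sequencing 13
2.3.2 Gene Expression Levels 15
Chapter 3 System Architecture and Design 16
3.1 System Architecture 16
3.2 Module 1: Literature Retrieval 17
3.2.1 Collecting the Literature 18
3.2.2 Building the Index 20
3.2.3 Search 26
3.3 Module 2: Protein Name Recognition 27
3.4 Module 3: Name Normalization 29
3.5 Module 4: Linking to Private Databases 31
Chapter 4 Experiments and Discussion 33
4.1 Validation Methods and Experiments 33
4.1.1 Validation Methods 33
4.1.2 Accuracy of Protein Name Recognition 34
4.1.3 Accuracy of Name Normalization 36
4.2 Discussion 40
4.2.1 Discussion of the Name Normalization Experimental Method 40
4.2.2 Discussion of System Use Cases 40
Chapter 5 Conclusion 42
5.1 Conclusions 42
5.2 Future Work 42
References 44
Appendix 1 50
Appendix 2 51
Appendix 3 55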
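dc.language.iso: zh-TW
dc.subject: Bioinformatics [zh_TW, translated]
dc.subject: Text mining [zh_TW, translated]
dc.subject: Information retrieval [zh_TW, translated]
dc.subject: Information retrieval [en]
dc.subject: Text-mining [en]
dc.subject: Bioinformatics [en]
dc.title: An information retrieval system that rapidly explores private experimental data by searching related public literature and mining its biomedical terms [zh_TW, translated]
dc.title: An Information Retrieval System for Fast Exploration of Proprietary Experimental Data via Searching and Mining the Biomedical Entities in Related Public Literatures [en]
dc.type: Thesis
dc.date.schoolyear: 102-1 (ROC academic year 102, first semester)
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 陳倩瑜, 林詩舜, 張恆華
dc.subject.keyword: Information retrieval, Text mining, Bioinformatics [zh_TW, translated]
dc.subject.keyword: Information retrieval, Text-mining, Bioinformatics [en]
dc.relation.page: 60
dc.rights.note: Paid license
dc.date.accepted: 2014-02-14
dc.contributor.author-college: College of Engineering [zh_TW, translated]
dc.contributor.author-dept: Graduate Institute of Engineering Science and Ocean Engineering [zh_TW, translated]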
Appears in collections: Department of Engineering Science and Ocean Engineering

Files in this item:
ntu-103-1.pdf, 1.76 MB, Adobe PDF (not authorized for public access)


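Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.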

© NTU Library All Rights Reserved