Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44609
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 歐陽彥正 (Yen-Jen Oyang) |
dc.contributor.author | Yuan-Hung Huang | en
dc.contributor.author | 黃元鴻 | zh_TW
dc.date.accessioned | 2021-06-15T03:51:43Z | -
dc.date.available | 2013-08-22 |
dc.date.copyright | 2011-08-22 |
dc.date.issued | 2011 |
dc.date.submitted | 2011-08-18 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44609 | -
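dc.description.abstract | Research on large-scale medical databases has been a popular topic in recent years, but in practice it often suffers from slow analysis: extracting the required data from a conventional relational database often takes a long time, which limits the topics and scale of research. In this thesis, we redesign the database architecture and rebuild the data, using HBase to store the key data commonly needed when analyzing National Health Insurance data, and we use Hadoop MapReduce to analyze these data in a distributed, parallel fashion, speeding up analysis of the National Health Insurance database. Finally, we integrate the entire analysis workflow into an automated rapid-analysis platform that supports research on a variety of topics. Because the design is compatible with cloud computing environments, future expansion is relatively easy: the platform can be ported directly to a commercial cloud environment, and the development of real-time analysis systems becomes feasible.
To achieve these goals, we first surveyed the related literature and popular research methods to identify the important information that needs to be stored, and then used HBase to rebuild the original National Health Insurance database into one suited to large-scale, rapid analysis. With the new database design, key information can be retrieved efficiently during analysis, reducing the computing resources and time consumed by repeated queries. For the analysis workflow, we designed a fully automated process: given a standardized disease definition file, the system automatically selects the required experimental and control groups from the database and computes the odds ratio, providing results for analysis and discussion. Each complete analysis run takes only about 5 minutes in an environment of three computers, far faster than the traditional workflow. Thanks to this speedup, we were able to run large-scale all-against-all cross analyses over dozens of diseases, attempting to uncover comorbidity and causal relationships that are still unknown. | zh_TW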
dc.description.abstract | Analysis of large-scale medical databases has become a popular research topic in recent years. The increasing power of computers and the massive collections of medical records allow us to conduct population-based studies to identify relationships among diseases. In practice, this kind of study faces a serious efficiency issue due to the scale of the databases, which severely limits the productivity of scientists. In this thesis, this efficiency issue is addressed by adopting HBase, instead of conventional relational database software, as the data storage framework. Based on the distinct data storage structure of HBase, a new database schema designed to support the MapReduce programming model is proposed for carrying out distributed and parallelized analyses efficiently. Experimental results show that, with the proposed design, analyses that take hours or even days with the conventional database framework can be completed within minutes. Another major merit of the proposed design is that the framework works smoothly in a cloud computing environment and therefore scales well. | en
dc.description.provenance | Made available in DSpace on 2021-06-15T03:51:43Z (GMT). No. of bitstreams: 1
ntu-100-R98945041-1.pdf: 2467528 bytes, checksum: aed09d970a01e41dc2f5d9a9ec6b91e1 (MD5)
Previous issue date: 2011 | en
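dc.description.tableofcontents | Thesis Oral Defense Committee Certification I
Acknowledgments II
Chinese Abstract III
Abstract IV
Table of Contents V
List of Figures VII
List of Tables IX
Chapter 1. Introduction 1
Section 1. The National Health Insurance Database and Related Research 1
Section 2. Bottlenecks and Solutions 2
1. Problems with Traditional Research Approaches 2
2. A Potential Solution: Cloud Computing Resources 3
Section 3. Thesis Organization 4
Chapter 2. Literature Review 6
Section 1. Overview of the National Health Insurance Database 6
1. Data Sources and Contents 6
2. Data Structure of the National Health Insurance Database 7
Section 2. Research Related to the National Health Insurance Database 9
1. Studies of Comorbidity 9
2. Causal Relationships Between Diseases 10
3. Relationships Between Medication Use and Diseases 11
Section 3. Overview of the Cloud Computing Platform: Hadoop 11
1. HDFS 12
2. Hadoop MapReduce 13
3. HBase 15
Chapter 3. Database Design and Experimental Methods 17
Section 1. Data Conversion: From a Relational Database to HBase 17
1. The Conventional Non-Parallel Analysis Workflow 17
2. Table Consolidation and Choice of Database System 19
3. HBase Table Design for a Parallel Environment 21
4. Database Construction Process 23
Section 2. Execution Environment and Architecture 24
Section 3. Analysis Framework Design and Workflow 25
1. System Design 25
2. MapReduce Architecture 26
3. Map Function 27
4. Reduce Function 30
5. Result Analysis and Odds Ratio Estimation 31
6. Analysis Workflow Design and Presentation of Results 33
Chapter 4. Experimental Results and Discussion 37
Section 1. Large-Scale Analysis of Comorbidity 37
1. Selection of Diseases 37
2. Experimental Results 38
Section 2. Analysis of Causal Relationships Between Diseases 42
Section 3. Relationships Between Medication Use and Diseases 43
1. Selection of Drugs 43
2. Experimental Results 43
Section 4. Performance Evaluation and Improvement 45
Section 5. Evaluation of the Correctness of the Analysis Results 45
Chapter 5. Conclusions and Future Work 47
Section 1. Conclusions 47
Section 2. Future Work 48
References 50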
dc.language.iso | zh-TW |
dc.title | 基於Hadoop MapReduce與HBase之醫療資訊快速分析平台 | zh_TW
dc.title | The Efficient Analysis Platform of Medical Informatics Based on Hadoop MapReduce and HBase | en
dc.type | Thesis |
dc.date.schoolyear | 99-2 |
dc.description.degree | Master's |
dc.contributor.coadvisor | 黃乾綱 (Chien-Kang Huang) |
dc.contributor.oralexamcommittee | 孫維仁 (Wei-Zen Sun), 張天豪 (Tien-Hao Chang) |
dc.subject.keyword | 大型醫療資料庫, 資料庫設計, 共病關係, 臨床試驗, 分散式運算, Hadoop, HBase | zh_TW
dc.subject.keyword | Large-scale medical database, Database design, Comorbidity, Clinical trial, Distributed computing, Hadoop, HBase | en
dc.relation.page | 52 |
dc.rights.note | Paid authorization |
dc.date.accepted | 2011-08-18 |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW
dc.contributor.author-dept | Graduate Institute of Biomedical Electronics and Bioinformatics | zh_TW
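
The abstract in the record above describes an automated workflow that selects experimental and control groups and computes an odds ratio, with the counting distributed through MapReduce. The following is a minimal single-process sketch of that idea in Python, not the thesis implementation: the record layout, the function names (`map_patient`, `reduce_counts`, `odds_ratio`), the placeholder disease labels "D1" and "D2", and the use of a simple 2x2 table with a Wald 95% confidence interval are all assumptions made for illustration.

```python
import math
from collections import Counter

def map_patient(patient, exposure, outcome):
    """Map step (hypothetical): emit one 2x2-table cell key per patient."""
    exposed = exposure in patient["diagnoses"]
    diseased = outcome in patient["diagnoses"]
    return (exposed, diseased), 1

def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted counts for each cell."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

def odds_ratio(counts):
    """Odds ratio and Wald 95% confidence interval from the four cells."""
    a = counts[(True, True)]    # exposed, with outcome
    b = counts[(True, False)]   # exposed, without outcome
    c = counts[(False, True)]   # unexposed, with outcome
    d = counts[(False, False)]  # unexposed, without outcome
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, (low, high)

# Toy records, invented for illustration only; not data from the thesis.
patients = (
    [{"diagnoses": {"D1", "D2"}}] * 4
    + [{"diagnoses": {"D1"}}] * 2
    + [{"diagnoses": {"D2"}}] * 1
    + [{"diagnoses": set()}] * 3
)
counts = reduce_counts(map_patient(p, "D1", "D2") for p in patients)
ratio, (low, high) = odds_ratio(counts)
print(ratio, low, high)
```

In a real Hadoop job the map step would run in parallel over rows of the HBase table and the reduce step would merge the partial counts; here both run in one process so the logic can be checked by hand.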
Appears in Collections: Graduate Institute of Biomedical Electronics and Bioinformatics

Files in This Item:
File | Size | Format
ntu-100-1.pdf (not currently authorized for public access) | 2.41 MB | Adobe PDF