Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48953
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 翁昭旼
dc.contributor.author: Bo-Tao Pan (en)
dc.contributor.author: 潘博韜 (zh_TW)
dc.date.accessioned: 2021-06-15T11:12:15Z
dc.date.available: 2019-08-25
dc.date.copyright: 2016-08-25
dc.date.issued: 2016
dc.date.submitted: 2016-08-21
dc.identifier.citation: References
1. Mann, C.J., Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emergency Medicine Journal, 2003. 20(1): p. 54-60.
2. Kihlberg, J.K., et al., Seat Belt Use and Injury Patterns in Automobile Accidents. 1967: Cornell Aeronautical Laboratory, Incorporated.
3. 圖書館學與資訊科學大辭典. 1995.
4. 環境科學大辭典. 2002.
5. Schulz, K.F. and D.A. Grimes, Case-control studies: research in reverse. Lancet, 2002. 359(9304): p. 431-434.
6. National health insurance research database. Available from: http://nhird.nhri.org.tw/date_cohort.html.
7. Oracle DBMS_RANDOM. Available from: https://oracle-base.com/articles/misc/dbms_random.
8. Su, V.Y.F., et al., Allergic rhinitis and risk of erectile dysfunction - a nationwide population-based study. Allergy, 2013. 68(4): p. 440-445.
9. Tseng, C.H., Diabetes and risk of bladder cancer: a study using the National Health Insurance database in Taiwan. Diabetologia, 2011. 54(8): p. 2009-2015.
10. Lin, H.-C., P.-Z. Chao, and H.-C. Lee, Sudden sensorineural hearing loss increases the risk of stroke - A 5-year follow-up study. Stroke, 2008. 39(10): p. 2744-2748.
11. Tai, Y.-M., et al., Prediction of ADHD to Anxiety Disorders: An 11-Year National Insurance Data Analysis in Taiwan. Journal of Attention Disorders, 2013. 17(8): p. 660-669.
12. Chang, C.H., et al., Type 2 diabetes prevalence and incidence among adults in Taiwan during 1999-2004: a national health insurance data set study. Diabetic Medicine, 2010. 27(6): p. 636-643.
13. Lin, C.-J., et al., Statins Attenuate Helicobacter pylori CagA Translocation and Reduce Incidence of Gastric Cancer: In Vitro and Population-Based Case-Control Studies. PLoS ONE, 2016. 11(1).
14. Greenland, S., Confounding, in Encyclopedia of Biostatistics. 2005, John Wiley & Sons, Ltd.
15. Marsh, J.L., J.L. Hutton, and K. Binks, Removal of radiation dose response effects: an example of over-matching. BMJ: British Medical Journal, 2002. 325(7359): p. 327-330.
16. Rosenbaum, P.R. and D.B. Rubin, The central role of the propensity score in observational studies for causal effects. Biometrika, 1983. 70(1): p. 41-55.
17. Sekhon, J.S., Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R. Journal of Statistical Software, 2011. 42(7): p. 1-52.
18. MongoDB Index. Available from: https://docs.mongodb.com/manual/tutorial/analyze-query-plan/.
19. MongoDB sharding. Available from: https://docs.mongodb.com/manual/sharding/.
20. Perl Compatible Regular Expressions. Available from: http://www.pcre.org/.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48953
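dc.description.abstract: The purpose of this study is to design a workflow for generating the study subjects of observational studies and to implement it as a system. Through four steps (data filtering, variable creation and processing, control group matching, and preliminary statistical testing), users gain an overview of the study subjects and obtain a prototype dataset for further research.
We use the Longitudinal Health Insurance Database 2010 (LHID2010), released by the National Health Research Institutes, as the study data. It contains a sample of one million people enrolled in Taiwan's National Health Insurance in 2010, with all of their medical records from 1996 to 2010 organized per person. This data structure and the large data volume are well suited to the schema-free design and horizontal scalability of NoSQL databases.
We therefore built a MongoDB replica sharded cluster. Sharding improves query efficiency, and combined with the Map-Reduce method it supports more complex computations on the data, producing cleaned data for subsequent statistical analysis.
(translated from zh_TW)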
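As a rough illustration of why sharding on a per-person key can speed up queries, the sketch below routes records to shards by hashing a patient identifier, so a lookup for one patient scans a single shard. This is a toy model in plain Python, not MongoDB's chunk-based implementation; the field name `patient_id`, the shard count, and the sample records are assumptions, and the shard key actually chosen in the thesis is not stated in this record.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 3  # assumption for illustration only


def shard_for(patient_id, num_shards=NUM_SHARDS):
    """Pick a shard by hashing the shard key (a hypothetical patient ID)."""
    digest = hashlib.md5(patient_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(records, num_shards=NUM_SHARDS):
    """Place every record on the shard selected by its patient ID."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(record["patient_id"], num_shards)].append(record)
    return shards


def find_by_patient(shards, patient_id, num_shards=NUM_SHARDS):
    """A query that includes the shard key only scans one shard."""
    target = shards[shard_for(patient_id, num_shards)]
    return [r for r in target if r["patient_id"] == patient_id]


# Hypothetical per-person visit records (not the LHID2010 schema).
records = [
    {"patient_id": "P1", "icd9": "250.00"},
    {"patient_id": "P2", "icd9": "401.9"},
    {"patient_id": "P1", "icd9": "401.9"},
]
shards = distribute(records)
p1_visits = find_by_patient(shards, "P1")
```

Because all records of one person hash to the same shard, per-person aggregation can run locally on each shard before results are combined.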
dc.description.abstract: The objective of this research is to build a platform that helps users obtain the study population for observational studies through a four-stage procedure: data querying, variable creation, control group matching, and significance testing.
The data querying platform is used to identify the study population. It provides a simple interface for querying the NoSQL database and automatically draws a flow chart of the query results, helping users manage the query process.
The variable creation platform lets users extract detailed attributes of the patients selected in the previous stage, such as disease diagnoses or medication records. It runs a MongoDB Map-Reduce process and exports the result to a CSV file for the next stage.
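The variable creation stage can be pictured with the following minimal sketch, which imitates the map and reduce steps in plain Python rather than calling MongoDB. The record fields, the variable names, and the ICD-9 prefixes used to flag diagnoses are assumptions chosen for illustration; the thesis's actual map and reduce functions are not given in this record.

```python
import csv
import io
from collections import defaultdict

# Hypothetical visit records; field names are illustrative only.
VISITS = [
    {"patient_id": "P1", "icd9": "250.00"},
    {"patient_id": "P1", "icd9": "401.9"},
    {"patient_id": "P2", "icd9": "401.9"},
    {"patient_id": "P3", "icd9": "477.9"},
]

FIELDS = ["patient_id", "diabetes", "hypertension", "visits"]


def map_visit(visit):
    """Map step: emit (patient_id, partial variables) for one visit."""
    yield visit["patient_id"], {
        "diabetes": int(visit["icd9"].startswith("250")),
        "hypertension": int(visit["icd9"].startswith("401")),
        "visits": 1,
    }


def reduce_patient(values):
    """Reduce step: merge the partial variables of one patient."""
    merged = {"diabetes": 0, "hypertension": 0, "visits": 0}
    for value in values:
        merged["diabetes"] = max(merged["diabetes"], value["diabetes"])
        merged["hypertension"] = max(merged["hypertension"], value["hypertension"])
        merged["visits"] += value["visits"]
    return merged


def map_reduce(visits):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for visit in visits:
        for key, value in map_visit(visit):
            grouped[key].append(value)
    return {key: reduce_patient(vals) for key, vals in sorted(grouped.items())}


def to_csv(table):
    """Write one row per patient, as the next stage expects a CSV file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for patient_id, row in table.items():
        writer.writerow({"patient_id": patient_id, **row})
    return buffer.getvalue()


table = map_reduce(VISITS)
csv_text = to_csv(table)
```

Each output row holds one patient with binary diagnosis flags and a visit count, which is the shape a matching step can consume directly.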
The control group matching platform reads the patients and variables from the previous stage and performs propensity score matching. Users choose the treatment (or exposure), the outcome, and other covariates from the variables; the platform then fits a generalized linear model and matches the control group on the fitted values.
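A minimal sketch of this idea follows, assuming a logistic regression (a binomial GLM with logit link) fitted by plain gradient ascent and greedy 1:1 nearest-neighbour matching without replacement. Both choices, the covariates, and the sample data are assumptions for illustration; the record does not say which GLM family or matching algorithm the platform uses.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient ascent; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - propensity(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def propensity(w, x):
    """Fitted probability of treatment given covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def match_controls(scores, treated):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    pool = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for case in (i for i, t in enumerate(treated) if t == 1):
        if not pool:
            break
        best = min(pool, key=lambda i: (abs(scores[i] - scores[case]), i))
        pairs.append((case, best))
        pool.remove(best)
    return pairs


# Hypothetical covariates: [age / 100, sex]; treated marks the exposed group.
X = [[0.30, 0], [0.35, 1], [0.40, 0], [0.45, 1], [0.50, 0],
     [0.55, 1], [0.60, 0], [0.65, 1], [0.70, 0], [0.75, 1]]
treated = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

weights = fit_logistic(X, treated)
scores = [propensity(weights, x) for x in X]
pairs = match_controls(scores, treated)  # list of (case index, control index)
```

Greedy matching depends on the order in which cases are processed; optimal or caliper-based matching are common alternatives when balance matters more than simplicity.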
The significance testing platform runs a t-test on each variable and a chi-square goodness-of-fit test on each age group, comparing the case group with the control group to check for significant differences.
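The two test statistics can be sketched as below. The sketch assumes Welch's unequal-variance form of the t-test and reads the chi-square step as comparing the control group's age-group counts with the counts expected from the case group's age distribution; both readings are assumptions, as the record does not specify the variants used. The sample counts are hypothetical, and p-values are left to a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def chi_square_gof(observed, expected):
    """Return the chi-square goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts per age group (three groups) after matching.
cases_by_age = [20, 30, 50]
controls_by_age = [22, 29, 49]

# Expected control counts if controls followed the case age distribution.
scale = sum(controls_by_age) / sum(cases_by_age)
expected = [count * scale for count in cases_by_age]

stat = chi_square_gof(controls_by_age, expected)
# With 3 groups there are 2 degrees of freedom; the 5% critical value
# of the chi-square distribution is about 5.99.
age_balanced = stat < 5.99
```

A small statistic here suggests the matched control group has an age distribution similar to the case group, which is the kind of balance check the platform reports.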
The first two stages can be separated from the others, giving two independent parts: the first prepares data for an observational study, and the second performs the statistical analysis. Users may run their own analyses on the data we prepare or load their own data into our matching platform.
(en)
dc.description.provenance: Made available in DSpace on 2021-06-15T11:12:15Z (GMT). No. of bitstreams: 1
ntu-105-R01548032-1.pdf: 1652720 bytes, checksum: cbb617747f4bbfdde4a8f6e8818184b6 (MD5)
Previous issue date: 2016
(en)
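dc.description.tableofcontents: Table of Contents
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 National Health Insurance Database and Observational Studies
2.2 Control Group Matching and Propensity Score
Chapter 3 System Design
3.1 Cohort Study Workflow
3.2 System Architecture
3.3 Data Filtering
3.4 Database Query
3.4.1 Index
3.4.2 Index Hint
3.4.3 Regular Expression Conversion
3.5 Variable Creation
3.5.1 Map-Reduce
3.6 Control Matching
3.7 Statistical Analysis
3.8 Mongo Replica-Shard Cluster
3.8.1 Sharding
3.8.2 Shard Key Selection
3.9 Data Preprocessing
3.10 System Environment
Chapter 4 System Demonstration
4.1 Data Filtering
4.2 Data Export
4.3 Control Group Matching and Statistical Analysis
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References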
dc.language.iso: zh-TW
dc.subject: 健保資料庫 (zh_TW)
dc.subject: 巨量資料 (zh_TW)
dc.subject: 觀察型研究 (zh_TW)
dc.subject: 傾向分數匹配 (zh_TW)
dc.subject: 非關聯式資料庫 (zh_TW)
dc.subject: observational study (en)
dc.subject: propensity score matching (en)
dc.subject: NHIRD (en)
dc.subject: big data (en)
dc.subject: NoSQL database (en)
dc.title: 巨量資料之病例對照研究平台 (zh_TW)
dc.title: Matching Platform of Case Control Study on Big Data (en)
dc.type: Thesis
dc.date.schoolyear: 104-2
dc.description.degree: 碩士 [Master's]
dc.contributor.coadvisor: 蔣以仁
dc.contributor.oralexamcommittee: 張淑惠
dc.subject.keyword: 傾向分數匹配, 巨量資料, 健保資料庫, 觀察型研究, 非關聯式資料庫 (zh_TW)
dc.subject.keyword: big data, propensity score matching, NHIRD, observational study, NoSQL database (en)
dc.relation.page: 25
dc.identifier.doi: 10.6342/NTU201603499
dc.rights.note: 有償授權 [Paid authorization]
dc.date.accepted: 2016-08-22
dc.contributor.author-college: 工學院 [College of Engineering] (zh_TW)
dc.contributor.author-dept: 醫學工程學研究所 [Institute of Biomedical Engineering] (zh_TW)
Appears in Collections: Institute of Biomedical Engineering

Files in This Item:
File | Size | Format
ntu-105-1.pdf (not authorized for public access) | 1.61 MB | Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
