NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20526

Full metadata record

DC Field: Value [Language]
dc.contributor.advisor: 歐陽彥正
dc.contributor.author: Cheng-En Hong [en]
dc.contributor.author: 洪晟恩 [zh_TW]
dc.date.accessioned: 2021-06-08T02:51:53Z
dc.date.copyright: 2017-08-24
dc.date.issued: 2017
dc.date.submitted: 2017-08-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20526
dc.description.abstract: In real problems, even though many variables are available, we do not know which of them are genuine signal variables and which are spurious noise. By discovering the important variables, researchers can carry out more targeted follow-up experiments on the selected variables to investigate the underlying scientific phenomenon. A natural requirement is that we discover as many relevant variables as possible while making as few mistakes as possible. We propose a modified RuleFit model that controls the false discovery rate with the knockoff procedure and controls the type I error with the Neyman-Pearson method. [zh_TW]
dc.description.abstract: Despite the abundance of available variables, which of them genuinely matter for the problem is seldom known in practice. By discovering important features, researchers can conduct a more targeted follow-up experiment on the selected features, tailored to understanding the scientific phenomenon. A natural requirement is that we wish to discover as many relevant variables as possible and make as few mistakes as possible at the same time. We propose a modified RuleFit with FDR control by the knockoff procedure and with alpha (type I error) control by the Neyman-Pearson method (an illustrative sketch of the knockoff selection step follows this record). [en]
dc.description.provenance: Made available in DSpace on 2021-06-08T02:51:53Z (GMT). No. of bitstreams: 1; ntu-106-R04H41006-1.pdf: 1211308 bytes, checksum: 7d01f9a5e7a176550195e56f958ce3e9 (MD5). Previous issue date: 2017 [en]
dc.description.tableofcontents:
Oral Examination Committee Certification i
Acknowledgements ii
Abstract (Chinese) iii
Abstract iv
Contents v
List of Figures vii
List of Tables viii
Notations ix
1 Introduction 1
1.1 Literature Review 2
1.2 Background 7
1.2.1 RuleFit 7
1.2.2 The Lasso 9
1.3 Motivation 12
1.4 Framework of Thesis 12
2 Methods 14
2.1 Knockoff Procedure 14
2.1.1 Preliminaries and Notations 14
2.1.2 Construct Knockoffs 18
2.1.3 Calculate Feature Statistics 21
2.1.4 Calculate a Data-Dependent Threshold 25
2.1.5 Two-Stage Modification 30
2.1.6 Summary 31
2.2 Neyman-Pearson Method 31
2.2.1 Preliminaries and Notations 31
2.2.2 Neyman-Pearson Umbrella Algorithm 33
3 Results and Discussion 38
3.1 Knockoff Procedure 38
3.1.1 Knockoff Result Summary 41
3.2 Neyman-Pearson Method 41
3.2.1 Neyman-Pearson Method Result Summary 42
3.3 Real Data Analysis 45
3.3.1 Real Data Analysis Result Summary 46
4 Conclusion 47
References 49
dc.language.iso: en
dc.title: An interpretable tree-based predictive model with false discovery rate and type I error control [zh_TW]
dc.title: A tree-based interpretable predictive method with FDR and type-one error control [en]
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 韓謝忱, 蔡政安
dc.subject.keyword: model selection, false discovery rate [zh_TW]
dc.subject.keyword: Knockoff, FDR, Lasso, Neyman-Pearson method [en]
dc.relation.page: 54
dc.identifier.doi: 10.6342/NTU201702789
dc.rights.note: Not authorized
dc.date.accepted: 2017-08-14
dc.contributor.author-college: 共同教育中心 [zh_TW]
dc.contributor.author-dept: 統計碩士學位學程 [zh_TW]
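Illustrative note on the knockoff selection step mentioned in the abstract above: the knockoff filter selects variables whose feature statistics exceed a data-dependent threshold chosen so that an estimated false discovery proportion stays below the target level q (the knockoff+ rule of Barber and Candès, 2015). The following minimal Python/NumPy sketch is not the thesis code; the simulated statistics and all variable names are illustrative assumptions.

    import numpy as np

    def knockoff_plus_threshold(W, q):
        """Knockoff+ data-dependent threshold: the smallest t such that
        (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
        Variables with W_j >= t are then selected."""
        W = np.asarray(W, dtype=float)
        candidates = np.sort(np.abs(W[W != 0]))          # candidate thresholds
        for t in candidates:
            fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
            if fdp_hat <= q:
                return t
        return np.inf                                    # nothing can be selected

    # Hypothetical feature statistics: large positive values indicate that the
    # original variable clearly beats its knockoff copy.
    rng = np.random.default_rng(0)
    W = np.concatenate([rng.normal(3.0, 1.0, 10),        # 10 signal variables
                        rng.normal(0.0, 1.0, 90)])       # 90 null variables
    T = knockoff_plus_threshold(W, q=0.1)
    selected = np.flatnonzero(W >= T)
    print("selected variable indices:", selected)

In the thesis, the statistics are presumably computed from a lasso fit on the original features together with their knockoff copies (Sections 2.1.2-2.1.3 of the table of contents); here they are simulated only so the sketch runs on its own.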
Appears in Collections: 統計碩士學位學程

Files in This Item:
File: ntu-106-1.pdf (currently not authorized for public access)
Size: 1.18 MB
Format: Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
