NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20526
Full metadata record
dc.contributor.advisor: 歐陽彥正
dc.contributor.author (en): Cheng-En Hong
dc.contributor.author (zh_TW): 洪晟恩
dc.date.accessioned: 2021-06-08T02:51:53Z
dc.date.available: 2030-01-01
dc.date.copyright: 2017-08-24
dc.date.issued: 2017
dc.date.submitted: 2017-08-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20526
dc.description.abstract (zh_TW): In practical problems, even when many variables are available, we do not know which of them are genuine signals and which are spurious noise. By discovering the important variables, researchers can carry out more targeted follow-up experiments on the selected variables to investigate the underlying scientific phenomenon. A natural requirement is to discover as many relevant variables as possible while making as few mistakes as possible. We propose a modified RuleFit model that controls the false discovery rate via the knockoff procedure and controls the type I error via the Neyman-Pearson method.
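The abstract above mentions controlling the false discovery rate with the knockoff procedure; the corresponding step in the thesis is the data-dependent threshold of Section 2.1.4. The following Python sketch shows only that generic thresholding step of the knockoff+ filter (Barber and Candès, 2015), not the thesis's two-stage modification; the function name and the simulated statistics W are illustrative assumptions.

    import numpy as np

    def knockoff_plus_threshold(W, q=0.1):
        """Data-dependent threshold of the knockoff+ filter.

        W : feature statistics, one per original/knockoff pair; large positive
            values favour the original feature over its knockoff copy.
        q : target false discovery rate level.
        """
        # Candidate thresholds are the nonzero magnitudes of the statistics.
        candidates = np.sort(np.unique(np.abs(W[W != 0])))
        for t in candidates:
            # Estimated false discovery proportion at threshold t,
            # with the "+1" correction used by knockoff+.
            fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
            if fdp_hat <= q:
                return t      # smallest threshold whose estimated FDP is <= q
        return np.inf         # no feasible threshold: select nothing

    # Illustrative use: in practice W would come from a lasso fit on the
    # augmented matrix [X, X_knockoff]; here it is simulated.
    rng = np.random.default_rng(0)
    W = np.concatenate([rng.normal(4.0, 1.0, 10),
                        rng.normal(0.0, 1.0, 90) * rng.choice([-1.0, 1.0], 90)])
    T = knockoff_plus_threshold(W, q=0.1)
    selected = np.flatnonzero(W >= T)
    print(f"threshold = {T:.3f}, selected {selected.size} features")

In the thesis this step follows knockoff construction and the computation of feature statistics (Sections 2.1.2 and 2.1.3); only the final selection rule is sketched here.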
dc.description.abstract (en): Despite the abundance of available variables, which of them actually matter for the problem is seldom known in practice. By discovering the important features, researchers can conduct more targeted follow-up experiments on the selected features to understand the underlying scientific phenomenon. A natural requirement is that we wish to discover as many relevant variables as possible while making as few mistakes as possible. We propose a modified RuleFit with FDR control via the knockoff procedure and type I error (alpha) control via the Neyman-Pearson method.
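The "type I error (alpha) control via the Neyman-Pearson method" in the abstract corresponds to the Neyman-Pearson umbrella algorithm listed in Section 2.2.2 of the table of contents. The code below is a minimal sketch of that idea under stated assumptions, not the thesis's exact algorithm: it picks a score cutoff from a held-out class-0 sample so that the type I error exceeds alpha with probability at most delta. The name np_threshold and the simulated scores are hypothetical.

    import numpy as np
    from scipy.stats import binom

    def np_threshold(null_scores, alpha=0.05, delta=0.05):
        """Order-statistic cutoff in the spirit of the NP umbrella algorithm."""
        s = np.sort(np.asarray(null_scores))    # held-out class-0 scores
        n = s.size
        for k in range(1, n + 1):
            # With the k-th smallest of n i.i.d. null scores as the cutoff, the
            # probability that the type I error exceeds alpha is (for continuous
            # scores) P(Binomial(n, 1 - alpha) >= k).
            violation = binom.sf(k - 1, n, 1.0 - alpha)
            if violation <= delta:
                return s[k - 1]    # predict class 1 only when score > cutoff
        raise ValueError("too few class-0 observations for this alpha and delta")

    # Illustrative use with simulated scores from any base classifier.
    rng = np.random.default_rng(1)
    cutoff = np_threshold(rng.normal(size=200), alpha=0.05, delta=0.05)
    print(f"cutoff = {cutoff:.3f}")

A larger delta yields a less conservative cutoff and higher detection power, and the base classifier that produces the scores is interchangeable, which is what makes the umbrella construction generic.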
dc.description.provenance (en): Made available in DSpace on 2021-06-08T02:51:53Z (GMT). No. of bitstreams: 1. ntu-106-R04H41006-1.pdf: 1211308 bytes, checksum: 7d01f9a5e7a176550195e56f958ce3e9 (MD5). Previous issue date: 2017.
dc.description.tableofcontents:
Oral examination committee certification i
Acknowledgements ii
Chinese abstract iii
Abstract iv
Contents v
List of Figures vii
List of Tables viii
Notations ix
1 Introduction 1
1.1 Literature Review 2
1.2 Background 7
1.2.1 RuleFit 7
1.2.2 The Lasso 9
1.3 Motivation 12
1.4 Framework of Thesis 12
2 Methods 14
2.1 Knockoff Procedure 14
2.1.1 Preliminaries and Notations 14
2.1.2 Construct Knockoffs 18
2.1.3 Calculate Feature Statistics 21
2.1.4 Calculate a Data-Dependent Threshold 25
2.1.5 Two-Stage Modification 30
2.1.6 Summary 31
2.2 Neyman-Pearson Method 31
2.2.1 Preliminaries and Notations 31
2.2.2 Neyman-Pearson Umbrella Algorithm 33
3 Results and Discussion 38
3.1 Knockoff Procedure 38
3.1.1 Knockoff Result Summary 41
3.2 Neyman-Pearson Method 41
3.2.1 Neyman-Pearson Method Result Summary 42
3.3 Real Data Analysis 45
3.3.1 Real Data Analysis Result Summary 46
4 Conclusion 47
References 49
dc.language.iso: en
dc.title (zh_TW): 具有錯誤發現率和型一誤差控制的可解釋之預測樹模型
dc.title (en): A tree-based interpretable predictive method with FDR and type-one error control
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 韓謝忱, 蔡政安
dc.subject.keyword (zh_TW): model selection (模型選擇), false discovery rate (錯誤發現率)
dc.subject.keyword (en): Knockoff, FDR, Lasso, Neyman-Pearson method
dc.relation.page: 54
dc.identifier.doi: 10.6342/NTU201702789
dc.rights.note: Not authorized (未授權)
dc.date.accepted: 2017-08-14
dc.contributor.author-college (zh_TW): 共同教育中心 (Center for General Education)
dc.contributor.author-dept (zh_TW): 統計碩士學位學程 (Master's Program in Statistics)
dc.date.embargo-terms: 2030-01-01
Appears in Collections: 統計碩士學位學程 (Master's Program in Statistics)

Files in This Item:
File: ntu-106-1.pdf (Restricted Access)
Size: 1.18 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
