Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97662

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 潘建興 | zh_TW |
| dc.contributor.advisor | Frederick Kin Hing Phoa | en |
| dc.contributor.author | 王愛琳 | zh_TW |
| dc.contributor.author | Ai-Lin Wang | en |
| dc.date.accessioned | 2025-07-09T16:17:55Z | - |
| dc.date.available | 2025-07-10 | - |
| dc.date.copyright | 2025-07-09 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-06-30 | - |
| dc.identifier.citation | [1] M. Ai, J. Yu, H. Zhang, and H. Wang. Optimal subsampling algorithms for big data regressions. Statistica Sinica, 2021.<br>[2] A. Atkinson, A. Donev, and R. Tobias. Optimum Experimental Designs, with SAS, volume 34. OUP Oxford, 2007.<br>[3] P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Sampling algorithms for l2 regression and applications. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '06, pages 1127–1136, USA, 2006. Society for Industrial and Applied Mathematics.<br>[4] W. Fithian and T. Hastie. Local case-control sampling: Efficient subsampling in imbalanced data sets. The Annals of Statistics, 42(5):1693–1724, 2014.<br>[5] S. G. Gilmour and L. A. Trinca. Optimum design of experiments for statistical inference. Journal of the Royal Statistical Society, Series C: Applied Statistics, 61(3):345–401, 2012.<br>[6] A. Katharopoulos and F. Fleuret. Not all samples are created equal: Deep learning with importance sampling. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2525–2534. PMLR, 2018.<br>[7] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.<br>[8] P. Ma, J. Huang, and N. Zhang. Efficient computation of smoothing splines via adaptive basis sampling. Biometrika, 102:631–645, 2015.<br>[9] P. Ma, M. W. Mahoney, and B. Yu. A statistical perspective on algorithmic leveraging. The Journal of Machine Learning Research, 16(1):861–911, 2015.<br>[10] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space, 2013.<br>[11] F. Pukelsheim. Optimal Design of Experiments. Society for Industrial and Applied Mathematics, 2006.<br>[12] M. Quiroz, R. Kohn, M. Villani, and M.-N. Tran. Speeding up MCMC by efficient data subsampling. Journal of the American Statistical Association, 114(526):831–843, 2019.<br>[13] E. R. Rahman, R. A. Shuvo, M. H. K. Mehedi, M. S. Hossain, and A. A. Rasel. Distributed computing for big data analytics: Challenges and opportunities. ResearchGate, 2022.<br>[14] G. Vaughan. Efficient big data model selection with applications to fraud detection. International Journal of Forecasting, 36(3):1116–1127, 2020.<br>[15] H. Wang and Y. Ma. Optimal subsampling for quantile regression in big data. Biometrika, 108(1):99–112, 2021.<br>[16] H. Wang, M. Yang, and J. Stufken. Information-based optimal subdata selection for big data linear regression. arXiv preprint arXiv:1710.10382, 2017.<br>[17] H. Wang, R. Zhu, and P. Ma. Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522):829–844, 2018.<br>[18] R. Xie, Z. Wang, S. Bai, P. Ma, and W. Zhong. Online decentralized leverage score sampling for streaming multidimensional time series. In K. Chaudhuri and M. Sugiyama, editors, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 2301–2311. PMLR, 2019.<br>[19] Y. Yao and H. Wang. Optimal subsampling for softmax regression. Statistical Papers, 60(2):585–599, 2019.<br>[20] Y. Yao and H. Wang. A review on optimal subsampling methods for massive datasets. Journal of Data Science, 19(1):151–172, 2021.<br>[21] Y. Yao, J. Zou, and H. Wang. Model constraints independent optimal subsampling probabilities for softmax regression. Journal of Statistical Planning and Inference, 225:188–201, 2023.<br>[22] J. Yu, M. Ai, and Z. Ye. A review on design inspired subsampling for big data. Statistical Papers, 65(2):467–510, 2023.<br>[23] H. Zhang and H. Wang. Distributed subdata selection for big data via sampling-based approach. Computational Statistics & Data Analysis, 153:107072, 2021. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97662 | - |
| dc.description.abstract | 在最佳次抽樣方法中,標籤資訊的缺失對抽樣機率的估計構成顯著挑戰,特別是在進行分類任務時,傳統方法往往依賴完整的回應資料。為解決此問題,本文提出一套應用於 softmax 回歸模型的半監督 A-/L-最適次抽樣框架。該方法於基準約束條件下推導出理論上的最適抽樣機率,並進一步探討其在平衡類別回應分布方面的統計意涵。同時,我們亦考量不受拘束條件影響的抽樣方法,提出以最小化漸近預測均方誤差(Asymptotic Mean Squared Prediction Error, MSPE)為目標的抽樣策略,使所構建之抽樣機率對模型約束具備更高的穩健性。理論結果經由模擬數據與實證資料驗證,皆顯示本方法能有效提升預測準確率與計算效率。 | zh_TW |
| dc.description.abstract | Missing label information presents a significant challenge for optimal subsampling methods, which typically rely on complete response data to compute sampling probabilities. In this study, we propose a semi-supervised A-/L-optimal subsampling framework for softmax regression that effectively addresses this issue. We derive the optimal subsampling probabilities under the baseline constraint and highlight their role in balancing categorical responses. In addition, we explore constraint-invariant subsampling by minimizing the asymptotic mean squared prediction error (MSPE), enabling the construction of subsampling probabilities for each observation, which is robust to model constraint choices. Our theoretical findings are supported by simulations and real-data applications, demonstrating improvements in both prediction accuracy and computational efficiency. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-09T16:17:55Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-07-09T16:17:55Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i<br>摘要 ii<br>Abstract iii<br>Contents iv<br>List of Figures vi<br>List of Tables vii<br>Chapter 1 Introduction 1<br>1.1 Challenges of Massive Data in Statistical Modeling 1<br>1.2 Softmax Regression for Multi-Class Classification 2<br>1.3 Research Objectives and Approach 4<br>1.4 Significance and Scope 4<br>Chapter 2 Literature Review 6<br>2.1 Subsampling for Massive Datasets 6<br>2.2 Optimal Designs in Subsampling 7<br>2.3 Subsampling in Regression Models 8<br>Chapter 3 Optimal Design in Subsampling for Softmax Regression 10<br>3.1 Model and Subsampling Framework 10<br>3.2 Subsampling Under Baseline Constraint 11<br>3.3 Optimal Subsampling with Incomplete Labels 15<br>Chapter 4 Simulation and Discussion 17<br>4.1 Simulation Design 17<br>4.1.1 Data Generation 17<br>4.1.2 Subsampling Methods 19<br>4.2 Results 20<br>4.3 Discussion 21<br>References 21<br>Appendix A — Assumptions and Theoretical Proofs 26 | - |
| dc.language.iso | en | - |
| dc.subject | 半監督式最佳化抽樣 | zh_TW |
| dc.subject | 歸一化指數函式 | zh_TW |
| dc.subject | 預測均方誤差 | zh_TW |
| dc.subject | Mean squared prediction error | en |
| dc.subject | Semi-supervised optimal subsampling | en |
| dc.subject | Softmax regression | en |
| dc.title | 最佳化半監督式歸一化指數函式抽樣 | zh_TW |
| dc.title | Optimal Semi-Supervised Subsampling for Softmax Regression | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.coadvisor | 林澤 | zh_TW |
| dc.contributor.coadvisor | Che Lin | en |
| dc.contributor.oralexamcommittee | 陳瑞彬;張明中 | zh_TW |
| dc.contributor.oralexamcommittee | Ray-Bing Chen;Ming-Chung Chang | en |
| dc.subject.keyword | 半監督式最佳化抽樣, 歸一化指數函式, 預測均方誤差 | zh_TW |
| dc.subject.keyword | Semi-supervised optimal subsampling, Softmax regression, Mean squared prediction error | en |
| dc.relation.page | 28 | - |
| dc.identifier.doi | 10.6342/NTU202501310 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2025-06-30 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資料科學學位學程 | - |
| dc.date.embargo-lift | N/A | - |
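The abstract above describes computing optimal subsampling probabilities for softmax regression from per-observation information, then fitting the model on the weighted subsample. As a rough illustration only (this is not the thesis's actual derivation; `subsampling_probs` and its score formula are hypothetical simplifications in the spirit of L-optimal subsampling, where probabilities are proportional to the norm of each observation's score contribution):

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def subsampling_probs(X, y, beta_pilot):
    """Illustrative L-optimal-style probabilities: each observation's
    score contribution is (y_onehot_i - p_i) (x) x_i, whose norm
    factorises into ||y_onehot_i - p_i|| * ||x_i||."""
    P = softmax(X @ beta_pilot)        # n x K fitted class probabilities
    K = P.shape[1]
    Y = np.eye(K)[y]                   # one-hot encoded responses
    scores = np.linalg.norm(Y - P, axis=1) * np.linalg.norm(X, axis=1)
    return scores / scores.sum()       # normalise to a distribution
```

A subsample of size `r` could then be drawn with `np.random.choice(len(X), size=r, replace=True, p=probs)` and the model refit with inverse-probability weights, as is standard in this subsampling literature.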
Appears in collections: 資料科學學位學程
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (restricted access) | 3.63 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
