Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101409
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 林智仁 [zh_TW]
dc.contributor.advisor: Chih-Jen Lin [en]
dc.contributor.author: 盧智寶 [zh_TW]
dc.contributor.author: Zhi-Bao Lu [en]
dc.date.accessioned: 2026-01-27T16:37:51Z
dc.date.available: 2026-01-28
dc.date.copyright: 2026-01-27
dc.date.issued: 2025
dc.date.submitted: 2026-01-14
dc.identifier.citation:
[1] Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu, and Chih-Jen Lin. Linear classifier: An often-forgotten baseline for text classification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), pages 1876–1888, 2023. Short paper.
[2] Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.
[3] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
[4] Ji Zhu and Hui Zou. Variable selection for the linear support vector machine. In Ke Chen and Lipo Wang, editors, Trends in Neural Computation, pages 35–59. Springer Berlin Heidelberg, 2007.
[5] Guo-Xun Yuan, Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. A comparison of optimization methods and software for large-scale L1-regularized linear classification. Journal of Machine Learning Research, 11:3183–3234, 2010.
[6] Wei-Lin Chiang, Yu-Sheng Li, Ching-Pei Lee, and Chih-Jen Lin. Limited-memory common-directions method for distributed L1-regularized linear classification. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2018.
[7] Antoine Dedieu, Rahul Mazumder, and Haoyue Wang. Solving L1-regularized SVMs and related linear programs: Revisiting the effectiveness of column and constraint generation. Journal of Machine Learning Research, 23(1), 2022.
[8] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[9] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.
[10] Robert Moore and John DeNero. L1 and L2 regularization for multiclass hinge loss models. In Symposium on Machine Learning in Speech and Language Processing, 2011.
[11] Ben J. Marafino, W. John Boscardin, and R. Adams Dudley. Efficient and sparse feature selection for biomedical text classification via the elastic net: Application to ICU risk stratification from nursing notes. Journal of Biomedical Informatics, 54:114–120, 2015.
[12] Yashoteja Prabhu, Anil Kag, Shrutendra Harsola, Rahul Agrawal, and Manik Varma. Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In Proceedings of the 2018 World Wide Web Conference (WWW), pages 993–1002, 2018.
[13] Sujay Khandagale, Han Xiao, and Rohit Babbar. Bonsai: Diverse and shallow trees for extreme multi-label classification. Machine Learning, 109:2099–2119, 2020.
[14] Hsiang-Fu Yu, Kai Zhong, Jiong Zhang, Wei-Cheng Chang, and Inderjit S. Dhillon. PECOS: Prediction for enormous and correlated output spaces. Journal of Machine Learning Research, 23(98):1–32, 2022.
[15] Jiong Zhang, Yau-Shian Wang, Wei-Cheng Chang, Wei Li, Jyun-Yu Jiang, Cho-Jui Hsieh, and Hsiang-Fu Yu. Build faster with less: A journey to accelerate sparse model building for semantic matching in product search. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 4960–4966, 2023.
[16] H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, and Jeremy Kubica. Ad click prediction: A view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2013.
[17] He-Zhe Lin, Cheng-Hung Liu, and Chih-Jen Lin. Exploring space efficiency in a tree-based linear model for extreme multi-label classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 16245–16260, 2024.
[18] Leonardo Galli and Chih-Jen Lin. A study on truncated Newton methods for linear classification. IEEE Transactions on Neural Networks and Learning Systems, 33(7):2828–2841, 2022.
[19] Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1):503–528, 1989.
[20] Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and Sellamanickam Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the Twenty Fifth International Conference on Machine Learning (ICML), 2008.
[21] Ronghui You, Zihan Zhang, Ziye Wang, Suyang Dai, Hiroshi Mamitsuka, and Shanfeng Zhu. AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In Advances in Neural Information Processing Systems, volume 32, 2019.
[22] Po-Wei Wang, Ching-Pei Lee, and Chih-Jen Lin. The common-directions method for regularized empirical risk minimization. Journal of Machine Learning Research, 20(58):1–49, 2019.
[23] Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the Twenty Fourth International Conference on Machine Learning (ICML), 2007.
[24] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
[25] Wei-Lin Chiang, Mu-Chu Lee, and Chih-Jen Lin. Parallel dual coordinate descent method for large-scale linear classification in multi-core environments. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.
[26] Hsiang-Fu Yu, Fang-Lan Huang, and Chih-Jen Lin. Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning, 85(1-2):41–75, 2011.
[27] Chia-Hua Ho and Chih-Jen Lin. Large-scale linear support vector regression. Journal of Machine Learning Research, 13:3323–3348, 2012.
[28] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY, 1995.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101409
dc.description.abstract [zh_TW]: 對於傳統的線性模型而言,廣泛使用L2正則項的模型通常被認為是稠密的。因此,關於在什麼時候L2正則化線性模型可以得到稀疏解的問題很少受到關注。在這篇論文中,我們嚴謹地證明了對於L2正則化的支持向量分類/迴歸問題,當資料的特徵向量具有稀疏性時,其理論上的最優解可以為稀疏的。令人意外的是,我們觀察到某些最佳化方法無法保留這項稀疏性,反而產生稠密的數值解,導致不必要的儲存空間。我們透過詳細的分析來解釋這個現象。特別值得注意的是,我們新奇地發現某些投影梯度法在解決對偶問題時,會自然地產生比其他最佳化方法更稀疏的數值解。透過使用這些適合的算法,可以將模型的儲存空間減少多達50%,對於大規模的工業應用來說具有相當顯著的效益。
dc.description.abstract [en]: For traditional linear models with the widely used L2 regularizer, it is often assumed that the resulting models are dense. As a result, little attention has been paid to when the optimal solution of an L2-regularized problem can actually be sparse. In this work, we rigorously prove that for L2-regularized support vector classification/regression, the theoretical optimum can indeed be sparse when the data have sparse feature values. Surprisingly, we observe that some optimization methods fail to preserve this sparsity and instead produce fully dense numerical solutions, leading to unnecessary storage overhead. We explain this phenomenon through a detailed analysis. In particular, we present the novel finding that certain projected gradient methods for solving the dual problem naturally yield sparser numerical solutions than other optimization algorithms. By applying suitable algorithms that preserve numerical sparsity, model storage can be reduced by up to 50%, which is highly advantageous for large-scale industrial applications.
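To make the observation above concrete, the following is a minimal, hypothetical sketch (an illustrative setup, not an experiment from the thesis, which works with LIBLINEAR and LibMultiLabel): it trains an L2-regularized linear SVM on a toy sparse dataset with scikit-learn's LIBLINEAR-backed LinearSVC, once with the primal Newton-type solver (dual=False) and once with dual coordinate descent (dual=True), and counts the nonzero weights. Each instance carries a private indicator feature, so a dual solver, which maintains w as a combination of support vectors (w = sum_i alpha_i y_i x_i), keeps the private weights of non-support instances at exactly zero, whereas a primal solver touches every instance from the first iteration and tends to leave small nonzero values behind.

    # Hypothetical illustration (assumed setup, not from the thesis): compare the
    # weight density of L2-regularized linear SVMs trained by a primal solver and
    # by dual coordinate descent, both via scikit-learn's LIBLINEAR wrapper.
    import numpy as np
    import scipy.sparse as sp
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n_per_class = 200
    n = 2 * n_per_class

    # Feature 0 separates the classes; features 1..n are "private" indicators,
    # each appearing in exactly one training instance.
    f0 = np.concatenate([rng.uniform(1.0, 2.0, n_per_class),
                         rng.uniform(-2.0, -1.0, n_per_class)])
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    X = sp.hstack([sp.csr_matrix(f0.reshape(-1, 1)),
                   sp.identity(n, format="csr")], format="csr")

    for use_dual in (False, True):  # False: primal Newton-type solver; True: dual coordinate descent
        clf = LinearSVC(C=1.0, dual=use_dual, tol=1e-6, max_iter=100000)
        clf.fit(X, y)
        nnz = np.count_nonzero(clf.coef_)
        print(f"dual={use_dual}: {nnz}/{X.shape[1]} nonzero weights")

Exact counts depend on C, the stopping tolerance, and the data; the point is only that the two numerical solutions can differ substantially in density even though they solve the same L2-regularized problem.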
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-01-27T16:37:51Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2026-01-27T16:37:51Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Contents
口試委員會審定書 i
誌謝 ii
摘要 iii
Abstract iv
Contents v
List of Figures viii
List of Tables ix
1 Introduction 1
2 A Surprising Observation: The Weight Density Varies Among Optimizers 4
3 Investigation of the Weight Density on More Optimizers 6
3.1 Introduction of the Solvers 6
3.2 Weight Densities Using Different Solvers 9
4 Theoretical Analysis on the Sparsity of Optimal w∗ 10
5 Investigation: Why Different Density Among Different Solvers? 14
5.1 Newton and LBFGS Use All Training Instances 14
5.2 SGD Can Represent w by Only a Subset of Data 15
5.3 Dual-based Methods Represent w by Support Vectors 16
6 Experiments on Multi-label Classification 19
6.1 Experimental Settings 19
6.2 Model Size Comparison 20
6.3 Comparison of Testing Set Performance 21
7 Conclusions 22
Bibliography 23
Appendix 26
A Notation 27
B Data Statistics 28
B.1 Binary Data Sets 28
B.2 Multi-label Data Sets 28
C Experimental Settings for Binary Classification Problems 30
C.1 LIBLINEAR Stopping Condition 30
C.2 Settings for Comparing the Weight Density 30
D Proofs
D.1 Proof of Theorem 1 32
D.2 Proof of Theorem 2 32
E Recovering the Theoretical Sparsity from Numerical Solutions 35
F Experimental Settings for LibMultiLabel 37
G Extension to L1-loss Support Vector Classification 39
G.1 Theoretical Sparsity of w∗ 39
G.2 The Weight Density of the Numerical Solutions Obtained from Different Solvers 40
G.3 Model Size Reduction for L1-loss in Extreme Multi-label Classification 41
H Extension to Logistic Loss 43
I Extension to Support Vector Regression 44
I.1 Problem Formulation 44
I.2 Theoretical Analysis on the Sparsity of Optimal w∗ in Support Vector Regression 46
I.3 Comparison of the Numerical Weight Density under Primal- and Dual-based Solvers 49
dc.language.iso: en
dc.subject: 線性分類
dc.subject: 權重密度
dc.subject: 極端多標籤分類
dc.subject: 對偶問題
dc.subject: 支持向量機
dc.subject: Linear Classification
dc.subject: Weight Density
dc.subject: Extreme Multi-label Classification
dc.subject: Dual problem
dc.subject: Support vector machines
dc.title [zh_TW]: 關於 L2 正則化線性分類與迴歸的權重密度
dc.title [en]: On the Weight Density of L2-Regularized Linear Classification and Regression
dc.type: Thesis
dc.date.schoolyear: 114-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee [zh_TW]: 李育杰;李靜沛
dc.contributor.oralexamcommittee [en]: Yuh-Jye Lee;Ching-pei Lee
dc.subject.keyword [zh_TW]: 線性分類,權重密度,極端多標籤分類,對偶問題,支持向量機
dc.subject.keyword [en]: Linear Classification, Weight Density, Extreme Multi-label Classification, Dual problem, Support vector machines
dc.relation.page: 49
dc.identifier.doi: 10.6342/NTU202600011
dc.rights.note: 同意授權(限校園內公開) (consent to authorize; access limited to campus)
dc.date.accepted: 2026-01-15
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: 2026-01-28
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File | Size | Format
ntu-114-1.pdf (access limited to NTU IP range) | 1.45 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
