Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49609

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林守德(Shou-De Lin) | |
| dc.contributor.author | Chun-Chen Lin | en |
| dc.contributor.author | 林俊辰 | zh_TW |
| dc.date.accessioned | 2021-06-15T11:37:29Z | - |
| dc.date.available | 2020-08-25 | |
| dc.date.copyright | 2020-08-25 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-08-12 | |
| dc.identifier.citation | [1] S. van Buuren and K. Groothuis-Oudshoorn, "mice: Multivariate imputation by chained equations in R," Journal of Statistical Software, vol. 45, no. 3, pp. 1–67, 2011. [2] D. J. Stekhoven and P. Bühlmann, "MissForest—non-parametric missing value imputation for mixed-type data," Bioinformatics, vol. 28, no. 1, pp. 112–118, 2011. [3] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," Journal of Machine Learning Research, vol. 11, no. 80, pp. 2287–2322, 2010. [4] P. J. García-Laencina, J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal, "Pattern classification with missing data: a review," Neural Computing and Applications, vol. 19, no. 2, pp. 263–282, 2010. [5] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096–1103. [6] J. Yoon, J. Jordon, and M. van der Schaar, "GAIN: Missing data imputation using generative adversarial nets," in International Conference on Machine Learning, 2018. [7] S. C.-X. Li, B. Jiang, and B. M. Marlin, "MisGAN: Learning from incomplete data with generative adversarial networks," in International Conference on Learning Representations, 2019. [8] M. S. Santos, J. P. Soares, P. H. Abreu, H. Araújo, and J. Santos, "Influence of data distribution in missing data imputation," in Conference on Artificial Intelligence in Medicine in Europe. Springer, 2017, pp. 285–294. [9] C. Liu, "Missing data imputation using the multivariate t-distribution," Journal of Multivariate Analysis, vol. 53, no. 1, pp. 139–158, 1995. [10] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data. John Wiley & Sons, 2019, vol. 793. [11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2019, pp. 4171–4186. [12] J. Wu, X. Wang, and W. Y. Wang, "Self-supervised dialogue learning," in Association for Computational Linguistics, 2019. [13] W. Su, X. Zhu, Y. Cao, B. Li, L. Lu, F. Wei, and J. Dai, "VL-BERT: Pre-training of generic visual-linguistic representations," in International Conference on Learning Representations, 2020. [14] C. Doersch and A. Zisserman, "Multi-task self-supervised visual learning," in International Conference on Computer Vision, 2017. [15] S. Gidaris, P. Singh, and N. Komodakis, "Unsupervised representation learning by predicting image rotations," in International Conference on Learning Representations, 2018. [16] A. Newell and J. Deng, "How useful is self-supervised pretraining for visual tasks?" in Conference on Computer Vision and Pattern Recognition, 2020. [17] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 535–541. [18] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015. [19] G. Chen, W. Choi, X. Yu, T. Han, and M. Chandraker, "Learning efficient object detection models with knowledge distillation," in Advances in Neural Information Processing Systems 30, 2017, pp. 742–751. [20] L. Lu, M. Guo, and S. Renals, "Knowledge distillation for small-footprint highway networks," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017, pp. 4820–4824. [21] T. Fukuda, M. Suzuki, G. Kurata, S. Thomas, J. Cui, and B. Ramabhadran, "Efficient knowledge distillation from an ensemble of teachers," in Proc. Interspeech, 2017, pp. 3697–3701. [22] Y. Liu, H. Xiong, J. Zhang, Z. He, H. Wu, H. Wang, and C. Zong, "End-to-end speech translation with knowledge distillation," in Proc. Interspeech, 2019, pp. 1128–1132. [23] Y. Kim and A. M. Rush, "Sequence-level knowledge distillation," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1317–1327. [24] M. Hu, Y. Peng, F. Wei, Z. Huang, D. Li, N. Yang, and M. Zhou, "Attention-guided answer distillation for machine reading comprehension," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2077–2086. [25] X. Tan, Y. Ren, D. He, T. Qin, Z. Zhao, and T.-Y. Liu, "Multilingual neural machine translation with knowledge distillation," in International Conference on Learning Representations, 2019. [26] S. Sun, Y. Cheng, Z. Gan, and J. Liu, "Patient knowledge distillation for BERT model compression," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 4323–4332. [27] M. Lichman, "UCI machine learning repository," 2013, URL http://archive.ics.uci.edu/ml. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49609 | - |
| dc.description.abstract | Learning from data with missing values has become a common challenge in real-world applications. This work focuses on a specific scenario in which a subset of the training feature dimensions is unavailable during the prediction stage. In this case, most existing methods suffer from the vacancy of the designated feature dimensions and therefore cannot provide quality results. This work proposes a novel neural-based learning framework that leverages knowledge obtained during training to mitigate the effect of missing features during prediction. Our solution combines two knowledge-transfer strategies, allowing the model to learn from gradually diminishing features as well as from a teacher network trained with full information. Experimental results demonstrate the effectiveness of our weight-diminishing algorithm and the overall superiority of our teacher-student learning framework. | zh_TW |
| dc.description.abstract | Learning from data with missing values has become a commonly faced challenge in real-world applications. This work focuses on a specific scenario in which a subset of the training feature dimensions becomes unavailable during the prediction stage. In this case, most existing approaches suffer from the vacancy of the designated feature dimensions and are thus incapable of providing quality results. This work proposes a novel neural-based learning framework that leverages the knowledge obtained during training to alleviate the effect of certain features missing during prediction. Our solution incorporates two knowledge-transfer strategies, allowing the model to learn from diminishing features as well as from a teacher network trained with full information. Experimental results demonstrate the effectiveness of our weight-diminishing algorithm and the overall superiority of our teacher-student learning framework compared with state-of-the-art imputation-based methods for handling missing data. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T11:37:29Z (GMT). No. of bitstreams: 1 U0001-1208202010551200.pdf: 3129681 bytes, checksum: 399d432bf48a77af14d259b0c5a5eb4f (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Acknowledgments i. Abstract ii. List of Figures v. List of Tables vii. 1 Introduction 1. 2 Related Work 5. 2.1 Missing Data Imputation 5. 2.2 Self-supervised Learning 7. 2.3 Knowledge Distillation 8. 3 Methodology 9. 3.1 Self-Transfer through Weight Diminishing 9. 3.2 Teacher-Student Framework for Model-wise Knowledge Transferring 12. 3.3 Integrating the Self-Transfer and Model-based Transfer Strategies 14. 4 Experiments 15. 4.1 Experiment Setups 15. 4.2 Comparison with Imputation-based Methods 16. 4.3 Comparison with Multi-task Learning 18. 4.4 Comparison among Different Knowledge Transferring Strategies 20. 4.5 Discussion and Parameter Sensitivity Analysis 21. 5 Conclusion 25. Reference 26 | |
| dc.language.iso | en | |
| dc.subject | 知識蒸餾 | zh_TW |
| dc.subject | 機器學習 | zh_TW |
| dc.subject | 資料探勘 | zh_TW |
| dc.subject | 缺失資料 | zh_TW |
| dc.subject | Data mining | en |
| dc.subject | Missing data | en |
| dc.subject | Knowledge distillation | en |
| dc.subject | Machine learning | en |
| dc.title | 給定無法取得之特徵並學習預測 | zh_TW |
| dc.title | Learning to Predict Given Unavailable Features | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 孫民 (Min Sun), 陳尚澤 (Shang-Tse Chen), 李政德 (Cheng-Te Li) | |
| dc.subject.keyword | 機器學習, 資料探勘, 缺失資料, 知識蒸餾 | zh_TW |
| dc.subject.keyword | Machine learning, Data mining, Missing data, Knowledge distillation | en |
| dc.relation.page | 29 | |
| dc.identifier.doi | 10.6342/NTU202003057 | |
| dc.rights.note | Authorized for a fee (有償授權) | |
| dc.date.accepted | 2020-08-13 | |
| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW |
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)
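The abstract above describes two knowledge-transfer strategies: gradually annealing ("diminishing") the features that will be unavailable at prediction time, and distilling soft targets from a teacher network trained with full information. The following minimal sketch illustrates both ideas on a synthetic logistic-regression task; the data, the linear diminishing schedule, and the blend weight `alpha` are illustrative assumptions, not the thesis's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, t, lr=0.5, epochs=400, schedule=None):
    """Logistic regression fit by gradient descent to targets t in [0, 1].
    `schedule(epoch)` may return a per-feature scale vector, letting the
    caller anneal ("diminish") selected feature dimensions toward zero."""
    w = np.zeros(X.shape[1])
    for e in range(epochs):
        Xe = X * schedule(e) if schedule is not None else X
        p = sigmoid(Xe @ w)
        w -= lr * Xe.T @ (p - t) / len(t)
    return w

# Synthetic task: features 2 and 3 will be unavailable at prediction time.
n, d = 800, 4
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.5, 0.8, 0.5])
y = ((X @ w_true + rng.normal(scale=0.3, size=n)) > 0).astype(float)
missing = np.array([False, False, True, True])

# Teacher: trained with full information; its probabilities are soft targets.
w_teacher = fit(X, y)
soft = sigmoid(X @ w_teacher)

# Student: fits a blend of hard labels and teacher probabilities while a
# linear schedule diminishes the soon-to-be-missing dimensions to zero.
alpha = 0.5  # distillation blend weight (hypothetical choice)
targets = alpha * soft + (1 - alpha) * y

def diminish(e, epochs=400):
    scale = np.ones(d)
    scale[missing] = max(0.0, 1.0 - e / (epochs // 2))
    return scale

w_student = fit(X, targets, schedule=diminish)

# Prediction time: the missing dimensions are simply absent (zeroed out).
X_test = X.copy()
X_test[:, missing] = 0.0
acc = ((sigmoid(X_test @ w_student) > 0.5) == (y > 0.5)).mean()
print(f"masked-feature accuracy: {acc:.2f}")
```

Because the schedule forces the student to shift predictive weight onto the dimensions that remain available, the student degrades gracefully when those features vanish, which is the intuition behind combining self-transfer with teacher-student distillation.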
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-1208202010551200.pdf (restricted, not publicly available) | 3.06 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
