Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49609

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林守德(Shou-De Lin) | |
| dc.contributor.author | Chun-Chen Lin | en |
| dc.contributor.author | 林俊辰 | zh_TW |
| dc.date.accessioned | 2021-06-15T11:37:29Z | - |
| dc.date.available | 2020-08-25 | |
| dc.date.copyright | 2020-08-25 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-08-12 | |
| dc.identifier.citation | [1] S. van Buuren and K. Groothuis-Oudshoorn, "mice: Multivariate imputation by chained equations in R," Journal of Statistical Software, vol. 45, no. 3, pp. 1–67, 2011. [2] D. J. Stekhoven and P. Bühlmann, "MissForest—non-parametric missing value imputation for mixed-type data," Bioinformatics, vol. 28, no. 1, pp. 112–118, 2011. [3] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," Journal of Machine Learning Research, vol. 11, no. 80, pp. 2287–2322, 2010. [4] P. J. García-Laencina, J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal, "Pattern classification with missing data: a review," Neural Computing and Applications, vol. 19, no. 2, pp. 263–282, 2010. [5] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096–1103. [6] J. Yoon, J. Jordon, and M. van der Schaar, "GAIN: Missing data imputation using generative adversarial nets," in International Conference on Machine Learning, 2018. [7] S. C.-X. Li, B. Jiang, and B. M. Marlin, "MisGAN: Learning from incomplete data with generative adversarial networks," in International Conference on Learning Representations, 2019. [8] M. S. Santos, J. P. Soares, P. H. Abreu, H. Araújo, and J. Santos, "Influence of data distribution in missing data imputation," in Conference on Artificial Intelligence in Medicine in Europe. Springer, 2017, pp. 285–294. [9] C. Liu, "Missing data imputation using the multivariate t-distribution," Journal of Multivariate Analysis, vol. 53, no. 1, pp. 139–158, 1995. [10] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data. John Wiley & Sons, 2019, vol. 793. [11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2019, pp. 4171–4186. [12] J. Wu, X. Wang, and W. Y. Wang, "Self-supervised dialogue learning," in Association for Computational Linguistics, 2019. [13] W. Su, X. Zhu, Y. Cao, B. Li, L. Lu, F. Wei, and J. Dai, "VL-BERT: Pre-training of generic visual-linguistic representations," in International Conference on Learning Representations, 2020. [14] C. Doersch and A. Zisserman, "Multi-task self-supervised visual learning," in International Conference on Computer Vision, 2017. [15] S. Gidaris, P. Singh, and N. Komodakis, "Unsupervised representation learning by predicting image rotations," in International Conference on Learning Representations, 2018. [16] A. Newell and J. Deng, "How useful is self-supervised pretraining for visual tasks?" in Conference on Computer Vision and Pattern Recognition, 2020. [17] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 535–541. [18] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015. [19] G. Chen, W. Choi, X. Yu, T. Han, and M. Chandraker, "Learning efficient object detection models with knowledge distillation," in Advances in Neural Information Processing Systems 30, 2017, pp. 742–751. [20] L. Lu, M. Guo, and S. Renals, "Knowledge distillation for small-footprint highway networks," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017, pp. 4820–4824. [21] T. Fukuda, M. Suzuki, G. Kurata, S. Thomas, J. Cui, and B. Ramabhadran, "Efficient knowledge distillation from an ensemble of teachers," in Proc. Interspeech, 2017, pp. 3697–3701. [22] Y. Liu, H. Xiong, J. Zhang, Z. He, H. Wu, H. Wang, and C. Zong, "End-to-end speech translation with knowledge distillation," in Proc. Interspeech, 2019, pp. 1128–1132. [23] Y. Kim and A. M. Rush, "Sequence-level knowledge distillation," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1317–1327. [24] M. Hu, Y. Peng, F. Wei, Z. Huang, D. Li, N. Yang, and M. Zhou, "Attention-guided answer distillation for machine reading comprehension," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2077–2086. [25] X. Tan, Y. Ren, D. He, T. Qin, Z. Zhao, and T.-Y. Liu, "Multilingual neural machine translation with knowledge distillation," in International Conference on Learning Representations, 2019. [26] S. Sun, Y. Cheng, Z. Gan, and J. Liu, "Patient knowledge distillation for BERT model compression," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 4323–4332. [27] M. Lichman, "UCI machine learning repository," 2013, URL http://archive.ics.uci.edu/ml. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49609 | - |
| dc.description.abstract | Learning from data with missing values has become a common challenge in real-world applications. This work focuses on a specific scenario in which a subset of the training feature dimensions is unavailable during the prediction stage. In this case, most existing methods suffer from the vacancy of the designated feature dimensions and therefore cannot provide quality results. This work proposes a novel neural-based learning framework that leverages knowledge obtained during training to mitigate the effect of missing features during prediction. Our solution combines two knowledge-transfer strategies, allowing the model to learn from gradually diminishing features as well as from a teacher network trained with full information. Experimental results demonstrate the effectiveness of our weight-diminishing algorithm and the overall superiority of our teacher-student learning framework. | zh_TW |
| dc.description.abstract | Learning from data with missing values has become a commonly faced challenge in real-world applications. This work focuses on a specific scenario in which a subset of the training feature dimensions becomes unavailable during the prediction stage. In this case, most existing approaches suffer from the vacancy of the designated feature dimensions and are thus incapable of providing quality results. This work proposes a novel neural-based learning framework that leverages the knowledge obtained during training to alleviate the effect of certain features missing during prediction. Our solution incorporates two knowledge-transfer strategies, allowing the model to learn from diminishing features as well as from a teacher network trained with full information. Experimental results demonstrate the effectiveness of our weight-diminishing algorithm and the overall superiority of our teacher-student learning framework compared with state-of-the-art imputation-based methods for handling missing data. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T11:37:29Z (GMT). No. of bitstreams: 1 U0001-1208202010551200.pdf: 3129681 bytes, checksum: 399d432bf48a77af14d259b0c5a5eb4f (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Acknowledgments i. Abstract ii. List of Figures v. List of Tables vii. 1 Introduction 1. 2 Related Work 5. 2.1 Missing Data Imputation 5. 2.2 Self-supervised Learning 7. 2.3 Knowledge Distillation 8. 3 Methodology 9. 3.1 Self-Transfer through Weight Diminishing 9. 3.2 Teacher-Student Framework for Model-wise Knowledge Transferring 12. 3.3 Integrating the Self-Transfer and Model-based Transfer Strategies 14. 4 Experiments 15. 4.1 Experiment Setups 15. 4.2 Comparison with Imputation-based Methods 16. 4.3 Comparison with Multi-task Learning 18. 4.4 Comparison among Different Knowledge Transferring Strategies 20. 4.5 Discussion and Parameter Sensitivity Analysis 21. 5 Conclusion 25. Reference 26 | |
| dc.language.iso | en | |
| dc.subject | 知識蒸餾 | zh_TW |
| dc.subject | 機器學習 | zh_TW |
| dc.subject | 資料探勘 | zh_TW |
| dc.subject | 缺失資料 | zh_TW |
| dc.subject | Data mining | en |
| dc.subject | Missing data | en |
| dc.subject | Knowledge distillation | en |
| dc.subject | Machine learning | en |
| dc.title | 給定無法取得之特徵並學習預測 | zh_TW |
| dc.title | Learning to Predict Given Unavailable Features | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 孫民 (Min Sun), 陳尚澤 (Shang-Tse Chen), 李政德 (Cheng-Te Li) | |
| dc.subject.keyword | 機器學習, 資料探勘, 缺失資料, 知識蒸餾 | zh_TW |
| dc.subject.keyword | Machine learning, Data mining, Missing data, Knowledge distillation | en |
| dc.relation.page | 29 | |
| dc.identifier.doi | 10.6342/NTU202003057 | |
| dc.rights.note | Authorized for a fee (有償授權) | |
| dc.date.accepted | 2020-08-13 | |
| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW |
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)
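The abstract above describes two knowledge-transfer strategies: gradually annealing ("diminishing") the features that will be unavailable at prediction time, and distilling soft targets from a teacher network trained with full information. The following minimal sketch illustrates both ideas on a synthetic logistic-regression task; the data, the linear diminishing schedule, and the blend weight `alpha` are illustrative assumptions, not the thesis's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, t, lr=0.5, epochs=400, schedule=None):
    """Logistic regression fit by gradient descent to targets t in [0, 1].
    `schedule(epoch)` may return a per-feature scale vector, letting the
    caller anneal ("diminish") selected feature dimensions toward zero."""
    w = np.zeros(X.shape[1])
    for e in range(epochs):
        Xe = X * schedule(e) if schedule is not None else X
        p = sigmoid(Xe @ w)
        w -= lr * Xe.T @ (p - t) / len(t)
    return w

# Synthetic task: features 2 and 3 will be unavailable at prediction time.
n, d = 800, 4
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.5, 0.8, 0.5])
y = ((X @ w_true + rng.normal(scale=0.3, size=n)) > 0).astype(float)
missing = np.array([False, False, True, True])

# Teacher: trained with full information; its probabilities are soft targets.
w_teacher = fit(X, y)
soft = sigmoid(X @ w_teacher)

# Student: fits a blend of hard labels and teacher probabilities while a
# linear schedule diminishes the soon-to-be-missing dimensions to zero.
alpha = 0.5  # distillation blend weight (hypothetical choice)
targets = alpha * soft + (1 - alpha) * y

def diminish(e, epochs=400):
    scale = np.ones(d)
    scale[missing] = max(0.0, 1.0 - e / (epochs // 2))
    return scale

w_student = fit(X, targets, schedule=diminish)

# Prediction time: the missing dimensions are simply absent (zeroed out).
X_test = X.copy()
X_test[:, missing] = 0.0
acc = ((sigmoid(X_test @ w_student) > 0.5) == (y > 0.5)).mean()
print(f"masked-feature accuracy: {acc:.2f}")
```

Because the schedule forces the student to shift predictive weight onto the dimensions that remain available, the student degrades gracefully when those features vanish, which is the intuition behind combining self-transfer with teacher-student distillation.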
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-1208202010551200.pdf (restricted, not publicly available) | 3.06 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
