Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88664
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林守德 | zh_TW |
dc.contributor.advisor | Shou-De Lin | en |
dc.contributor.author | 廖耿德 | zh_TW |
dc.contributor.author | Keng-Te Liao | en |
dc.date.accessioned | 2023-08-15T17:16:48Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-08-15 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-04 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88664 | - |
dc.description.abstract | 在機器學習的領域中,估計模型在未知資料上的效能一直是一個重要的挑戰。一個經常被採用的方法是,假設訓練和測試資料是取樣自同一機率分佈。然而在實際應用中,訓練和測試資料分布之間往往存在偏移使得假設不成立。在本文中,我們假設訓練數據來自多個不同的分佈,並且共享與任務相關的知識。我們進而提出了一種新型模型: 貝氏混合神經網路,用於學習對非因果關係的分佈偏移有韌性的共享知識。通過提出的變分推理方法,我們提出的神經網路可以很容易地被應用於多模態和不變式學習的問題中。在這兩種問題中,訓練和測試分佈不一定會被假定為相似的分佈。以多模態學習來說,我們的神經網路可以在沒有明確監督訊號的情況下,從資料中解構共享和模態特定的資訊。同樣地,在不變式學習中,我們提出的神經網路能夠以無監督的方式學會丟棄與目標無因果關係的特徵。與現有的解決方案相比,我們提出的深度學習模型在多模態和不變學習的任務上均實現了最好的性能和效率。 | zh_TW |
dc.description.abstract | Estimating model performance on unseen data is a fundamental challenge in machine learning. A commonly adopted approach is to assume training and testing data are sampled from the same distribution; however, in real-world applications, a distribution shift between training and testing data often exists. In this paper, we assume training data are sampled from multiple and distinct distributions which share task-relevant knowledge. We then propose a novel model, Bayesian Mixture Neural Network (BMNN), for learning the shared knowledge that can be robust to non-causal distribution shift. With the proposed variational inference method, BMNN can be easily employed in multimodal and invariant learning problems, where the training and testing distributions are not necessarily assumed to be aligned. In multimodal learning, we show that BMNN can disentangle shared and modality-specific information without explicit supervision. Similarly, in invariant learning, BMNN learns to discard non-causal features in an unsupervised manner. Compared with existing solutions, BMNN achieves state-of-the-art performance and efficiency on both multimodal and invariant learning tasks. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T17:16:48Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-15T17:16:48Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgements iii
Abstract (in Chinese) iv
Abstract v
1 Introduction 1
2 Preliminaries 3
2.1 Multimodal Generative Models 3
2.2 Out-of-distribution Generalization and Invariant Learning 5
3 The Mixture-of-Experts Bayesian Mixture Neural Network 8
3.1 Overview 8
3.2 Dimension-wise MoE Mixture 9
3.3 Stochastic Inference on Mixture Weights 12
3.4 Explicit Regularization for Inference 14
4 The Product-of-Experts Bayesian Mixture Neural Network 15
4.1 Overview 15
4.2 The Alternative Form of mmJSD 15
4.3 Stochastic and Dimension-wise Weights 17
5 Bayesian Mixture Neural Network for Out-of-distribution Generalization 19
5.1 Overview 19
5.2 Properties of Ideal Environments 20
5.3 Obtaining Invariance via Environment Diversification 21
5.4 Environment Inference Network 23
5.5 The Bayesian Mixture Method for Representation Learning 25
6 Evaluation of Multimodal Bayesian Mixture Neural Networks 27
6.1 MNIST-SVHN-Text Evaluation 27
6.2 Caltech-UCSD Birds Evaluation 30
6.3 MultiBench Evaluation 32
7 Evaluation of Bayesian Mixture Neural Network on Biased Data 37
7.1 CMNIST Evaluation 37
8 Discussion 40
8.1 The Equivalence Between BMNN-M and BMNN-P 40
8.2 Analysis of Inferred Mixture Weights 41
8.2.1 Weight Visualization 41
8.2.2 Quantitative Analysis 42
8.2.3 Qualitative Analysis 44
8.2.4 Uniform Weights and Modality Collapse 46
8.3 Comparisons between Multimodal and Invariant BMNNs 48
9 Conclusion 49
10 References 50
Figure 1: The encoding and decoding procedures of MMVAE. 11
Figure 2: The encoding and decoding procedures of BMNN-M. 11
Figure 3: The assumed graphical model of the data generation process. 23
Figure 4: An example of MNIST and SVHN data. 27
Figure 5: Examples of CUB data. Each example pairs a bird photo with a corresponding caption. 30
Figure 6: Examples of CUB cross-modal generation results. 32
Figure 7: Downstream task performance and model robustness. Circle size represents the variance of robustness. 34
Figure 8: Downstream task performance and training speed. Circle size represents the variance of downstream task performance. 34
Figure 9: An example of CMNIST. 38
Figure 10: Mixture weights learned from MNIST-SVHN-Text. Each block represents an inferred weight value; darker blocks indicate higher weights. 41
Figure 11: Digit generation via BMNN. 45
Figure 12: Style generation via BMNN. 45
Figure 13: Summary of the architectures of the BMNN variants. 48
Table 1: Classification accuracy of sampled latent vectors. BMNN has advantages when multiple modalities are available. 29
Table 2: Joint and cross-modal coherence performance. BMNN does not suffer from the performance-drop issue and has advantages when multiple modalities are available. 29
Table 3: Correlation of image (I) and sentence (S) generations. A unimodal VAE is included for comparison. The ground truth of random coherence is 0.273. 31
Table 4: Summary of the datasets selected from MultiBench. 33
Table 5: Classification accuracy (%) on CMNIST. BMNN-I outperforms other unsupervised methods and performs similarly to IRM trained with provided, ideal environments. 39
Table 6: Digit classification accuracy using sub-vectors. The results show that the digit information is aligned and encoded in dimension DS. 43
Table 7: Classification accuracy of sampled latent vectors. Models with superscript u are trained with constant, uniform mixture weights. The dropped performance in the M, S and S, T columns suggests that models trained with uniform weights do not effectively preserve multimodal information after merging. 46
Table 8: Coherence evaluation verifying the impact of dimension-wise, learnable weights. Models with uniform weights show slight modality collapse, e.g. BMNN-M (Uni) on S|M and BMNN-P (Uni) on M|S. 47 | - |
dc.language.iso | en | - |
dc.title | 用於學習分佈式可泛化知識之貝氏混合神經網路 | zh_TW |
dc.title | Bayesian Mixture Neural Networks for Learning Distributionally Generalizable Knowledge | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Doctoral | - |
dc.contributor.oralexamcommittee | 林智仁;林軒田;陳尚澤;曾新穆 | zh_TW |
dc.contributor.oralexamcommittee | Chih-Jen Lin;Hsuan-Tien Lin;Shang-Tse Chen;Vincent S. Tseng | en |
dc.subject.keyword | 多模態學習,不變式學習,外分佈問題,模態缺失,隨機變分推論,解構表示法 | zh_TW |
dc.subject.keyword | Multimodal learning, invariant learning, out-of-distribution, missing modality, stochastic variational inference, disentangled representation | en |
dc.relation.page | 55 | - |
dc.identifier.doi | 10.6342/NTU202302396 | - |
dc.rights.note | Authorized for release (restricted to campus access) | - |
dc.date.accepted | 2023-08-07 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf Access restricted to NTU campus IP addresses (use the VPN service for off-campus access) | 1.69 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
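As a supplement to the English abstract above: the dimension-wise mixture-of-experts fusion it alludes to (combining per-modality posteriors with learned, per-latent-dimension mixture weights) can be sketched roughly as below. This is a minimal illustration under assumed Gaussian per-modality posteriors and a per-dimension softmax over modalities; the function name, array shapes, and sampling scheme are hypothetical and are not taken from the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dimension_wise_moe_sample(means, log_vars, logits, rng):
    """Sample a fused latent vector from a dimension-wise MoE.

    means, log_vars, logits: arrays of shape (M, D), one row per
    modality-specific Gaussian posterior over D latent dimensions.
    For each latent dimension, one modality ("expert") is chosen
    according to a softmax over the logits, then that expert's
    Gaussian is sampled for that dimension.
    """
    # Numerically stable softmax over modalities, per dimension.
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)          # columns sum to 1

    M, D = means.shape
    # Pick one expert per latent dimension.
    choice = np.array([rng.choice(M, p=w[:, d]) for d in range(D)])
    idx = np.arange(D)
    mu = means[choice, idx]
    std = np.exp(0.5 * log_vars[choice, idx])
    # Reparameterized-style Gaussian sample for each dimension.
    return mu + std * rng.standard_normal(D)

# Toy example: 3 modalities, 8 latent dimensions.
means = np.stack([np.zeros(8), np.ones(8), 2.0 * np.ones(8)])
log_vars = np.zeros((3, 8))
logits = rng.standard_normal((3, 8))
z = dimension_wise_moe_sample(means, log_vars, logits, rng)
```

Choosing weights per dimension, rather than one weight per modality, is what lets different latent dimensions specialize in shared versus modality-specific information.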