Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88664
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林守德 | zh_TW |
dc.contributor.advisor | Shou-De Lin | en |
dc.contributor.author | 廖耿德 | zh_TW |
dc.contributor.author | Keng-Te Liao | en |
dc.date.accessioned | 2023-08-15T17:16:48Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-08-15 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-04 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88664 | - |
dc.description.abstract | 在機器學習的領域中,估計模型在未知資料上的效能一直是一個重要的挑戰。一個經常被採用的方法是,假設訓練和測試資料是取樣自同一機率分佈。然而在實際應用中,訓練和測試資料分布之間往往存在偏移使得假設不成立。在本文中,我們假設訓練數據來自多個不同的分佈,並且共享與任務相關的知識。我們進而提出了一種新型模型: 貝氏混合神經網路,用於學習對非因果關係的分佈偏移有韌性的共享知識。通過提出的變分推理方法,我們提出的神經網路可以很容易地被應用於多模態和不變式學習的問題中。在這兩種問題中,訓練和測試分佈不一定會被假定為相似的分佈。以多模態學習來說,我們的神經網路可以在沒有明確監督訊號的情況下,從資料中解構共享和模態特定的資訊。同樣地,在不變式學習中,我們提出的神經網路能夠以無監督的方式學會丟棄與目標無因果關係的特徵。與現有的解決方案相比,我們提出的深度學習模型在多模態和不變學習的任務上均實現了最好的性能和效率。 | zh_TW |
dc.description.abstract | Estimating model performance on unseen data is a fundamental challenge in machine learning. A commonly adopted approach is to assume training and testing data are sampled from the same distribution; however, in real-world applications, a distribution shift between training and testing data often exists. In this paper, we assume training data are sampled from multiple and distinct distributions which share task-relevant knowledge. We then propose a novel model, Bayesian Mixture Neural Network (BMNN), for learning the shared knowledge that can be robust to non-causal distribution shift. With the proposed variational inference method, BMNN can be easily employed in multimodal and invariant learning problems, where the training and testing distributions are not necessarily assumed to be aligned. In multimodal learning, we show that BMNN can disentangle shared and modality-specific information without explicit supervision. Similarly, in invariant learning, BMNN learns to discard non-causal features in an unsupervised manner. Compared with existing solutions, BMNN achieves state-of-the-art performance and efficiency on both multimodal and invariant learning tasks. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T17:16:48Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-15T17:16:48Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgements iii
Abstract (in Chinese) iv
Abstract v
1 Introduction 1
2 Preliminaries 3
2.1 Multimodal Generative Models 3
2.2 Out-of-distribution Generalization and Invariant Learning 5
3 The Mixture-of-Experts Bayesian Mixture Neural Network 8
3.1 Overview 8
3.2 Dimension-wise MoE Mixture 9
3.3 Stochastic Inference on Mixture Weights 12
3.4 Explicit Regularization for Inference 14
4 The Product-of-Experts Bayesian Mixture Neural Network 15
4.1 Overview 15
4.2 The Alternative Form of mmJSD 15
4.3 Stochastic and Dimension-wise Weights 17
5 Bayesian Mixture Neural Network for Out-of-distribution Generalization 19
5.1 Overview 19
5.2 Properties of Ideal Environments 20
5.3 Obtaining Invariance via Environment Diversification 21
5.4 Environment Inference Network 23
5.5 The Bayesian Mixture Method for Representation Learning 25
6 Evaluation of Multimodal Bayesian Mixture Neural Networks 27
6.1 MNIST-SVHN-Text Evaluation 27
6.2 Caltech-UCSD Birds Evaluation 30
6.3 MultiBench Evaluation 32
7 Evaluation of Bayesian Mixture Neural Network on Biased Data 37
7.1 CMNIST Evaluation 37
8 Discussion 40
8.1 The Equivalence Between BMNN-M and BMNN-P 40
8.2 Analysis of Inferred Mixture Weights 41
8.2.1 Weight Visualization 41
8.2.2 Quantitative Analysis 42
8.2.3 Qualitative Analysis 44
8.2.4 Uniform Weights and Modality Collapse 46
8.3 Comparisons between Multimodal and Invariant BMNNs 48
9 Conclusion 49
10 References 50
Figure 1: The encoding and decoding procedures of MMVAE. 11
Figure 2: The encoding and decoding procedures of BMNN-M. 11
Figure 3: The assumed graphical model of the data generation process. 23
Figure 4: An example of MNIST and SVHN data. 27
Figure 5: Examples of CUB data. Each example pairs a bird photo with a corresponding caption. 30
Figure 6: Examples of CUB cross-modal generation results. 32
Figure 7: Downstream task performance and model robustness. Circle size represents the variance of robustness. 34
Figure 8: Downstream task performance and training speed. Circle size represents the variance of downstream task performance. 34
Figure 9: An example of CMNIST. 38
Figure 10: Mixture weights learned from MNIST-SVHN-Text. Each block represents an inferred weight value; darker blocks indicate higher weights. 41
Figure 11: Digit generation via BMNN. 45
Figure 12: Style generation via BMNN. 45
Figure 13: Summary of the architectures of the BMNN variants. 48
Table 1: Classification accuracy of sampled latent vectors. BMNN has advantages when multiple modalities are available. 29
Table 2: Joint and cross-modal coherence performance. BMNN does not suffer from the performance-drop issue and has advantages when multiple modalities are available. 29
Table 3: Correlation of image (I) and sentence (S) generations. A unimodal VAE is included for comparison. The ground truth of random coherence is 0.273. 31
Table 4: Summary of the datasets selected from MultiBench. 33
Table 5: Classification accuracy (%) on CMNIST. BMNN-I outperforms other unsupervised methods and performs similarly to IRM trained with provided, ideal environments. 39
Table 6: Digit classification accuracy using sub-vectors. The results show that the digit information is aligned and encoded in dimension DS. 43
Table 7: Classification accuracy of sampled latent vectors. Models with superscript u are trained with constant, uniform mixture weights. The dropped performance in the M, S and S, T columns suggests that models trained with uniform weights do not effectively preserve multimodal information after merging. 46
Table 8: Coherence evaluation verifying the impact of dimension-wise, learnable weights. Models with uniform weights show slight modality collapse, e.g. BMNN-M (Uni) on S|M and BMNN-P (Uni) on M|S. 47 | - |
dc.language.iso | en | - |
dc.title | 用於學習分佈式可泛化知識之貝氏混合神經網路 | zh_TW |
dc.title | Bayesian Mixture Neural Networks for Learning Distributionally Generalizable Knowledge | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Doctoral | - |
dc.contributor.oralexamcommittee | 林智仁;林軒田;陳尚澤;曾新穆 | zh_TW |
dc.contributor.oralexamcommittee | Chih-Jen Lin;Hsuan-Tien Lin;Shang-Tse Chen;Vincent S. Tseng | en |
dc.subject.keyword | 多模態學習,不變式學習,外分佈問題,模態缺失,隨機變分推論,解構表示法 | zh_TW |
dc.subject.keyword | Multimodal learning, invariant learning, out-of-distribution, missing modality, stochastic variational inference, disentangled representation | en |
dc.relation.page | 55 | - |
dc.identifier.doi | 10.6342/NTU202302396 | - |
dc.rights.note | Authorized for release (restricted to campus access) | - |
dc.date.accepted | 2023-08-07 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf Access restricted to NTU campus IP addresses (use the VPN service for off-campus access) | 1.69 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
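As a supplement to the English abstract above: the dimension-wise mixture-of-experts fusion it alludes to (combining per-modality posteriors with learned, per-latent-dimension mixture weights) can be sketched roughly as below. This is a minimal illustration under assumed Gaussian per-modality posteriors and a per-dimension softmax over modalities; the function name, array shapes, and sampling scheme are hypothetical and are not taken from the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dimension_wise_moe_sample(means, log_vars, logits, rng):
    """Sample a fused latent vector from a dimension-wise MoE.

    means, log_vars, logits: arrays of shape (M, D), one row per
    modality-specific Gaussian posterior over D latent dimensions.
    For each latent dimension, one modality ("expert") is chosen
    according to a softmax over the logits, then that expert's
    Gaussian is sampled for that dimension.
    """
    # Numerically stable softmax over modalities, per dimension.
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)          # columns sum to 1

    M, D = means.shape
    # Pick one expert per latent dimension.
    choice = np.array([rng.choice(M, p=w[:, d]) for d in range(D)])
    idx = np.arange(D)
    mu = means[choice, idx]
    std = np.exp(0.5 * log_vars[choice, idx])
    # Reparameterized-style Gaussian sample for each dimension.
    return mu + std * rng.standard_normal(D)

# Toy example: 3 modalities, 8 latent dimensions.
means = np.stack([np.zeros(8), np.ones(8), 2.0 * np.ones(8)])
log_vars = np.zeros((3, 8))
logits = rng.standard_normal((3, 8))
z = dimension_wise_moe_sample(means, log_vars, logits, rng)
```

Choosing weights per dimension, rather than one weight per modality, is what lets different latent dimensions specialize in shared versus modality-specific information.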