Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92417

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 蘇炫榮 | zh_TW |
| dc.contributor.advisor | Hsuan-Jung Su | en |
| dc.contributor.author | 朱禹安 | zh_TW |
| dc.contributor.author | Yu-An Chu | en |
| dc.date.accessioned | 2024-03-22T16:24:41Z | - |
| dc.date.available | 2024-03-23 | - |
| dc.date.copyright | 2024-03-22 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-12-16 | - |
| dc.identifier.citation | W. Weaver, “Recent contributions to the mathematical theory of communication,” ETC: a review of general semantics, pp. 261–281, 1953.
E. C. Strinati and S. Barbarossa, “6G networks: Beyond Shannon towards semantic and goal-oriented communications,” Computer Networks, vol. 190, p. 107930, 2021. A. v. d. Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018. B. Poole, S. Ozair, A. Van Den Oord, A. Alemi, and G. Tucker, “On variational bounds of mutual information,” in International Conference on Machine Learning. PMLR, 2019, pp. 5171–5180. W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Network, vol. 34, no. 3, pp. 134–142, 2019. E. C. Strinati, S. Barbarossa, T. Choi, A. Pietrabissa, A. Giuseppi, E. De Santis, J. Vidal, Z. Becvar, T. Haustein, N. Cassiau et al., “6G in the sky: On-demand intelligence at the edge of 3D networks,” arXiv preprint arXiv:2010.09463, 2020. H. Dong, C. Wu, Z. Wei, and Y. Guo, “Dropping activation outputs with localized first-layer deep network for enhancing user privacy and data security,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 662–670, 2017. Z. Chen, W. Lin, S. Wang, L. Duan, and A. C. Kot, “Intermediate deep feature compression: the next battlefield of intelligent sensing,” arXiv preprint arXiv:1809.06196, 2018. H. Choi and I. V. Bajić, “Near-lossless deep feature compression for collaborative intelligence,” in 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2018, pp. 1–6. P. Zhang, W. Xu, H. Gao, K. Niu, X. Xu, X. Qin, C. Yuan, Z. Qin, H. Zhao, J. Wei et al., “Toward wisdom-evolutionary and primitive-concise 6G: A new paradigm of semantic communication networks,” Engineering, vol. 8, pp. 60–73, 2022. J. Hoydis, F. A. Aoudia, A. Valcarce, and H. Viswanathan, “Toward a 6G AI-native air interface,” IEEE Communications Magazine, vol. 59, no. 5, pp. 76–81, 2021. D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 5–41, 2022. M. Chafii, L. Bariah, S. Muhaidat, and M. Debbah, “Twelve scientific challenges for 6G: Rethinking the foundations of communications theory,” IEEE Communications Surveys & Tutorials, 2023. C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948. T.-Y. Tung, S. Kobus, J. P. Roig, and D. Gündüz, “Effective communications: A joint learning and communication framework for multi-agent reinforcement learning over noisy channels,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 8, pp. 2590–2603, 2021. Y. Xu, H. Zhou, and Y. Deng, “Task-Oriented Semantics-Aware Communication for Wireless UAV Control and Command Transmission,” IEEE Communications Letters, 2023. R. Carnap, Y. Bar-Hillel et al., “An outline of a theory of semantic information,” 1952. J. Bao, P. Basu, M. Dean, C. Partridge, A. Swami, W. Leland, and J. A. Hendler, “Towards a theory of semantic communication (extended technical report),” in Performing Organization: US Army Research Lab, Adelphy, Presented at the 2011 IEEE First International Workshop on Network Science, 2011. P. Basu, J. Bao, M. Dean, and J. Hendler, “Preserving quality of information by using semantic relationships,” Pervasive and Mobile Computing, vol. 11, pp. 188–202, 2014. A. Chattopadhyay, B. D. Haeffele, D. Geman, and R. 
Vidal, “Quantifying task complexity through generalized information measures,” 2020. A. Maatouk, M. Assaad, and A. Ephremides, “The age of incorrect information: An enabler of semantics-empowered communication,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2621– 2635, 2022. G. Shi, D. Gao, X. Song, J. Chai, M. Yang, X. Xie, L. Li, and X. Li, “A new communication paradigm: from bit accuracy to semantic fidelity,” arXiv preprint arXiv:2101.12649, 2021. H. Seo, J. Park, M. Bennis, and M. Debbah, “Semantics-native communication via contextual reasoning,” IEEE Transactions on Cognitive Communications and Networking, 2023. G. Shi, Y. Xiao, Y. Li, and X. Xie, “From semantic communication to semantic-aware networking: Model, architecture, and open problems,” IEEE Communications Magazine, vol. 59, no. 8, pp. 44–50, 2021. W. Tong and G. Y. Li, “Nine challenges in artificial intelligence and wireless communications for 6G,” IEEE Wireless Communications, 2022. T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563–575, 2017. N. Samuel, T. Diskin, and A. Wiesel, “Learning to detect,” IEEE Transactions on Signal Processing, vol. 67, no. 10, pp. 2554–2564, 2019. Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “Graph neural networks for scalable radio resource management: Architecture design and theoretical analysis,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, pp. 101–115, 2020. H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han, “Once-for-all: Train one network and specialize it for efficient deployment,” arXiv preprint arXiv:1908.09791, 2019. F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and¡ 0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T.Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017. F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258. Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “Model compression and acceleration for deep neural networks: The principles, progress, and challenges,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 126– 136, 2018. X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856. W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: Vision and challenges,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, 2016. E. Li, Z. Zhou, and X. Chen, “Edge intelligence: On-demand deep learning model co-inference with device-edge synergy,” in Proceedings of the 2018 Workshop on Mobile Edge Communications, 2018, pp. 31– 36. Y. Shi, K. Yang, T. Jiang, J. Zhang, and K. B. Letaief, “Communication-efficient edge AI: Algorithms and systems,” IEEE Communications Surveys & Tutorials, vol. 22, no. 4, pp. 2167–2191, 2020. X. Hou, S. Dey, J. Zhang, and M. Budagavi, “Predictive view generation to enable mobile 360-degree and VR experiences,” in Proceedings of the 2018 Morning Workshop on Virtual Reality and Augmented Reality Network, 2018, pp. 20–26. L. Liu, H. 
Li, and M. Gruteser, “Edge assisted real-time object detection for mobile augmented reality,” in The 25th Annual International Conference on Mobile Computing and Networking, 2019, pp. 1–16. Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019. J. Shao and J. Zhang, “Communication-computation trade-off in resource-constrained edge inference,” IEEE Communications Magazine, vol. 58, no. 12, pp. 20–26, 2020. Y. Matsubara and M. Levorato, “Neural compression and filtering for edge-assisted real-time object detection in challenged networks,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 2272–2279. Y. Matsubara, R. Yang, M. Levorato, and S. Mandt, “Supervised compression for resource-constrained edge computing systems,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 2685–2695. Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” ACM SIGARCH Computer Architecture News, vol. 45, no. 1, pp. 615–629, 2017. H. Li, C. Hu, J. Jiang, Z. Wang, Y. Wen, and W. Zhu, “Jalad: Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution,” in 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2018, pp. 671–678. W. Shi, Y. Hou, S. Zhou, Z. Niu, Y. Zhang, and L. Geng, “Improving device-edge cooperative inference of deep learning via 2-step pruning,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2019, pp. 1–6. A. E. Eshratifar, A. Esmaili, and M. Pedram, “Bottlenet: A deep learning architecture for intelligent mobile cloud computing services,” in 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2019, pp. 1–6. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778. J. Shao and J. Zhang, “Bottlenet++: An end-to-end approach for feature compression in device-edge co-inference systems,” in 2020 IEEE International Conference on Communications Workshops (ICC Workshops). IEEE, 2020, pp. 1–6. J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” arXiv preprint arXiv:1802.01436, 2018. S. Luo, Y. Yang, Y. Yin, C. Shen, Y. Zhao, and M. Song, “DeepSIC: Deep semantic image compression,” in Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13-16, 2018, Proceedings, Part I 25. Springer, 2018, pp. 96–106. S. Singh, S. Abu-El-Haija, N. Johnston, J. Ballé, A. Shrivastava, and G. Toderici, “End-to-end learning of compressible features,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 3349–3353. R. Gallager, “Low-density parity-check codes,” IRE Transactions on information theory, vol. 8, no. 1, pp. 21–28, 1962. C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1,” in Proceedings of ICC’93-IEEE International Conference on Communications, vol. 2. IEEE, 1993, pp. 1064–1070. E. 
Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051– 3073, 2009. A. Goldsmith, “Joint source/channel coding for wireless channels,” in 1995 IEEE 45th Vehicular Technology Conference. Countdown to the Wireless Twenty-First Century, vol. 2. IEEE, 1995, pp. 614–618. F. Zhai, Y. Eisenberg, and A. K. Katsaggelos, “Joint source-channel coding for video communications,” Handbook of Image and Video Processing, pp. 1065–1082, 2005. N. Farsad, M. Rao, and A. Goldsmith, “Deep learning for joint source-channel coding of text,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 2326–2330. E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Transactions on Cognitive Communications and Networking, vol. 5, no. 3, pp. 567–579, 2019. Y. M. Saidutta, A. Abdi, and F. Fekri, “Joint source-channel coding for Gaussian sources over AWGN channels using variational autoencoders,” in 2019 IEEE International Symposium on Information Theory (ISIT). IEEE, 2019, pp. 1327–1331. A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, “Deep learning for computer vision: A brief review,” Computational Intelligence and Neuroscience, vol. 2018, 2018. T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recent trends in deep learning based natural language processing,” IEEE Computational Intelligence Magazine, vol. 13, no. 3, pp. 55–75, 2018. C.-H. Lee, J.-W. Lin, P.-H. Chen, and Y.-C. Chang, “Deep learningconstructed joint transmission-recognition for Internet of Things,” IEEE Access, vol. 7, pp. 76 547–76 561, 2019. J. Shao, Y. Mao, and J. Zhang, “Learning task-oriented communication for edge inference: An information bottleneck approach,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 1, pp. 197–211, 2021. Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. A. Achille and S. Soatto, “Information dropout: Learning optimal representations through noisy computation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 12, pp. 2897–2905, 2018. Y. Dubois, B. Bloem-Reddy, K. Ullrich, and C. J. Maddison, “Lossy compression for lossless prediction,” Advances in Neural Information Processing Systems, vol. 34, pp. 14 014–14 028, 2021. O. Atan, Y. Andreopoulos, C. Tekin, and M. van der Schaar, “Bandit framework for systematic learning in wireless video-based face recognition,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 1, pp. 180–194, 2014. A. E. Eshratifar, M. S. Abrishami, and M. Pedram, “JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services,” IEEE Transactions on Mobile Computing, vol. 20, no. 2, pp. 565–576, 2019. J. Chen and X. Ran, “Deep learning with edge computing: A review,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1655–1674, 2019. M. Kalfa, M. Gok, A. Atalik, B. Tegin, T. M. Duman, and O. Arikan, “Towards goal-oriented semantic signal processing: Applications and future challenges,” Digital Signal Processing, vol. 119, p. 103134, 2021. G. Zhang, Q. Hu, Z. Qin, Y. Cai, G. Yu, and X. Tao, “A unified multi-task semantic communication system for multimodal data,” arXiv preprint arXiv:2209.07689, 2022. 
G. Zhang, Q. Hu, Z. Qin, Y. Cai, and G. Yu, “A unified multi-task semantic communication system with domain adaptation,” in GLOBECOM 2022-2022 IEEE Global Communications Conference. IEEE, 2022, pp. 3971–3976. R. Zarcone, D. Paiton, A. Anderson, J. Engel, H. P. Wong, and B. Olshausen, “Joint source-channel coding with neural networks for analog data compression and storage,” in 2018 Data Compression Conference. IEEE, 2018, pp. 147–156. D. Burth Kurka and D. Gündüz, “Joint source-channel coding of images with (not very) deep learning,” in International Zurich Seminar on Information and Communication (IZS 2020). Proceedings. ETH Zurich, 2020, pp. 90–94. T.-Y. Tung and D. Gündüz, “DeepWiVe: Deep-learning-aided wireless video transmission,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 9, pp. 2570–2583, 2022. P. Jiang, C.-K. Wen, S. Jin, and G. Y. Li, “Deep source-channel coding for sentence semantic transmission with HARQ,” IEEE Transactions on Communications, vol. 70, no. 8, pp. 5225–5240, 2022. H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Transactions on Signal Processing, vol. 69, pp. 2663–2675, 2021. F. Liu, W. Tong, Z. Sun, and C. Guo, “Task-oriented semantic communication systems based on extended rate-distortion theory,” arXiv preprint arXiv:2201.10929, 2022. M. Sana and E. C. Strinati, “Learning semantics: An opportunity for effective 6G communications,” in 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC). IEEE, 2022, pp. 631–636. A. Alemi, B. Poole, I. Fischer, J. Dillon, R. A. Saurous, and K. Murphy, “Fixing a broken ELBO,” in International Conference on Machine Learning. PMLR, 2018, pp. 159–168. R. T. Chen, X. Li, R. B. Grosse, and D. K. Duvenaud, “Isolating sources of disentanglement in variational autoencoders,” Advances in Neural Information Processing Systems, vol. 31, 2018. A. Serdega and D.-S. Kim, “VMI-VAE: Variational mutual information maximization framework for VAE with discrete and continuous priors,” arXiv preprint arXiv:2005.13953, 2020. D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in International Conference on Machine Learning. PMLR, 2014, pp. 1278– 1286. D. Blei, R. Ranganath, and S. Mohamed, “Variational inference: Foundations and modern methods,” NIPS Tutorial, 2016. E. L. Denton et al., “Unsupervised learning of disentangled representations from video,” Advances in Neural Information Processing Systems, vol. 30, 2017. R. Balestriero, M. Ibrahim, V. Sobal, A. Morcos, S. Shekhar, T. Goldstein, F. Bordes, A. Bardes, G. Mialon, Y. Tian, A. Schwarzschild, A. G. Wilson, J. Geiping, Q. Garrido, P. Fernandez, A. Bar, H. Pirsiavash, Y. LeCun, and M. Goldblum, “A cookbook of self-supervised learning,” 2023. J. Shao, Y. Mao, and J. Zhang, “Task-oriented communication for multidevice cooperative edge inference,” IEEE Transactions on Wireless Communications, vol. 22, no. 1, pp. 73–87, 2022. S. Ma, W. Qiao, Y. Wu, H. Li, G. Shi, D. Gao, Y. Shi, S. Li, and N. AlDhahir, “Task-oriented Explainable Semantic Communications,” IEEE Transactions on Wireless Communications, 2023. J. Huang, D. Li, C. Huang, X. Qin, and W. Zhang, “Joint Task and Data Oriented Semantic Communications: A Deep Separate Sourcechannel Coding Scheme,” arXiv preprint arXiv:2302.13580, 2023. M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Wireless image retrieval at the edge,” IEEE Journal on Selected Areas in Communications, vol. 
39, no. 1, pp. 89–100, 2020. X. Kang, B. Song, J. Guo, Z. Qin, and F. R. Yu, “Task-oriented image transmission for scene classification in unmanned aerial systems,” IEEE Transactions on Communications, vol. 70, no. 8, pp. 5181–5192, 2022. P. Jiang, C.-K. Wen, S. Jin, and G. Y. Li, “Wireless semantic communications for video conferencing,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 230–244, 2022. W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, “Learning structured sparsity in deep neural networks,” Advances in Neural Information Processing Systems, vol. 29, 2016. J. Frankle and M. Carbin, “The lottery ticket hypothesis: Finding sparse, trainable neural networks,” arXiv preprint arXiv:1803.03635, 2018. Z. Liu, M. Sun, T. Zhou, G. Huang, and T. Darrell, “Rethinking the value of network pruning,” arXiv preprint arXiv:1810.05270, 2018. J. Ba and R. Caruana, “Do deep nets really need to be deep?” Advances in Neural Information Processing Systems, vol. 27, 2014. G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015. M. Courbariaux, Y. Bengio, and J.-P. David, “Binaryconnect: Training deep neural networks with binary weights during propagations,” Advances in Neural Information Processing Systems, vol. 28, 2015. M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision. Springer, 2016, pp. 525– 542. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6869–6898, 2017. D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013. N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint physics/0004057, 2000. D. J. MacKay, Information theory, inference and learning algorithms. Cambridge University Press, 2003. L. Da Xu, W. He, and S. Li, “Internet of things in industries: A survey,” IEEE Transactions on Industrial Informatics, vol. 10, no. 4, pp. 2233–2243, 2014. A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of things: A survey on enabling technologies, protocols, and applications,” IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015. H. Xu, W. Yu, D. Griffith, and N. Golmie, “A survey on industrial Internet of Things: A cyber-physical systems perspective,” IEEE Access, vol. 6, pp. 78 238–78 259, 2018. C. Badue, R. Guidolini, R. V. Carneiro, P. Azevedo, V. B. Cardoso, A. Forechi, L. Jesus, R. Berriel, T. M. Paixao, F. Mutz et al., “Self-driving cars: A survey,” Expert Systems with Applications, vol. 165, p. 113816, 2021. K. Zhang, Y. Mao, S. Leng, Y. He, and Y. Zhang, “Mobile-edge computing for vehicular networks: A promising network paradigm with predictive off-loading,” IEEE Vehicular Technology Magazine, vol. 12, no. 2, pp. 36–44, 2017. S. Liu, L. Liu, J. Tang, B. Yu, Y. Wang, and W. Shi, “Edge computing for autonomous driving: Opportunities and challenges,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1697–1716, 2019. A. Alalewi, I. Dayoub, and S. Cherkaoui, “On 5G-V2X use cases and enabling technologies: A comprehensive survey,” IEEE Access, vol. 9, pp. 107 710–107 737, 2021. C.-Y. Lin, K.-C. Chen, D. Wickramasuriya, S.-Y. Lien, and R. D. 
Gitlin, “Anticipatory mobility management by big data analytics for ultra-low latency mobile networking,” in 2018 IEEE International Conference on Communications (ICC). IEEE, 2018, pp. 1–7. K.-C. Chen, T. Zhang, R. D. Gitlin, and G. Fettweis, “Ultra-low latency mobile networking,” IEEE Network, vol. 33, no. 2, pp. 181–187, 2018. L. Chettri and R. Bera, “A comprehensive survey on Internet of Things (IoT) toward 5G wireless systems,” IEEE Internet of Things Journal, vol. 7, no. 1, pp. 16–32, 2019. M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Joint device-edge inference over wireless links with pruning,” in 2020 IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2020, pp. 1–5. I. Shomorony and A. S. Avestimehr, “Worst-case additive noise in wireless networks,” IEEE Transactions on Information Theory, vol. 59, no. 6, pp. 3833–3847, 2013. S. Dörner, S. Cammerer, J. Hoydis, and S. Ten Brink, “Deep learning based communication over the air,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 132–143, 2017. D. Huang, X. Tao, F. Gao, and J. Lu, “Deep learning-based image semantic coding for semantic communications,” in 2021 IEEE Global Communications Conference (GLOBECOM). IEEE, 2021, pp. 1–6. Z. Weng and Z. Qin, “Semantic communication systems for speech transmission,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 8, pp. 2434–2444, 2021. M. Yang, C. Bian, and H.-S. Kim, “OFDM-guided deep joint source channel coding for wireless multipath fading channels,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 584–599, 2022. D. Huang, F. Gao, X. Tao, Q. Du, and J. Lu, “Toward semantic communications: Deep learning-based image semantic coding,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 55–71, 2022. D. B. Kurka and D. Gündüz, “DeepJSCC-f: Deep joint source-channel coding of images with feedback,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 178–193, 2020. H. Xie, Z. Qin, and G. Y. Li, “Task-oriented multi-user semantic communications for VQA,” IEEE Wireless Communications Letters, vol. 11, no. 3, pp. 553–557, 2021. Z. Weng, Z. Qin, X. Tao, C. Pan, G. Liu, and G. Y. Li, “Deep learning enabled semantic communications with speech recognition and synthesis,” IEEE Transactions on Wireless Communications, 2023. R. Torfason, F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. Van Gool, “Towards image understanding from deep compression without decoding,” arXiv preprint arXiv:1803.06131, 2018. N. Patwa, N. Ahuja, S. Somayazulu, O. Tickoo, S. Varadarajan, and S. Koolagudi, “Semantic-preserving image compression,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 1281–1285. H. Xie, Z. Qin, X. Tao, and K. B. Letaief, “Task-oriented multi-user semantic communications,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 9, pp. 2584–2597, 2022. C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 1126–1135. T. Standley, A. Zamir, D. Chen, L. Guibas, J. Malik, and S. Savarese, “Which tasks should be learned together in multi-task learning?” in International Conference on Machine Learning. PMLR, 2020, pp. 9120–9132. C. Fifty, E. Amid, Z. Zhao, T. Yu, R. Anil, and C. 
Finn, “Efficiently identifying task groupings for multi-task learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 27 503–27 516, 2021. W. Tong, F. Liu, Z. Sun, Y. Yang, and C. Guo, “Image Semantic Communications: An Extended Rate-Distortion Theory Based Scheme,” in 2022 IEEE Globecom Workshops (GC Wkshps). IEEE, 2022, pp. 1723–1728. Y. Matsubara, D. Callegaro, S. Baidya, M. Levorato, and S. Singh, “Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems,” IEEE Access, vol. 8, pp. 212 177–212 193, 2020. D. McAllester and K. Stratos, “Formal limitations on the measurement of mutual information,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 875–884. L. Paninski, “Estimation of entropy and mutual information,” Neural computation, vol. 15, no. 6, pp. 1191–1253, 2003. T.-Y. Tung, D. B. Kurka, M. Jankowski, and D. Gündüz, “DeepJSCCQ: Constellation constrained deep joint source-channel coding,” IEEE Journal on Selected Areas in Information Theory, 2022. I. E. Aguerri and D. Gündüz, “Joint source-channel coding with time-varying channel and side-information,” IEEE Transactions on Information Theory, vol. 62, no. 2, pp. 736–753, 2015. Q. Hu, G. Zhang, Z. Qin, Y. Cai, G. Yu, and G. Y. Li, “Robust semantic communications with masked VQ-VAE enabled codebook,” IEEE Transactions on Wireless Communications, 2023. G. Urban, K. J. Geras, S. E. Kahou, O. Aslan, S. Wang, R. Caruana, A. Mohamed, M. Philipose, and M. Richardson, “Do deep convolutional nets really need to be deep and convolutional?” arXiv preprint arXiv:1603.05691, 2016. M. D. Emmerson and R. I. Damper, “Determining and improving the fault tolerance of multilayer perceptrons in a pattern-recognition application,” IEEE Transactions on Neural Networks, vol. 4, no. 5, pp. 788–793, 1993. C. Torres-Huitzil and B. Girau, “Fault and error tolerance in neural networks: A review,” IEEE Access, vol. 5, pp. 17 322–17 341, 2017. B. Qian, J. Su, Z.Wen, D. N. Jha, Y. Li, Y. Guan, D. Puthal, P. James, R. Yang, A. Y. Zomaya et al., “Orchestrating the development lifecycle of machine learning-based IoT applications: A taxonomy and survey,” ACM Computing Surveys (CSUR), vol. 53, no. 4, pp. 1–47, 2020. F. Pase, S. Kobus, D. Gündüz, and M. Zorzi, “Semantic Communication of Learnable Concepts,” in 2023 IEEE International Symposium on Information Theory (ISIT). IEEE, 2023, pp. 731–736. Q. Zhou, R. Li, Z. Zhao, Y. Xiao, and H. Zhang, “Adaptive bit rate control in semantic communication with incremental knowledge-based harq,” IEEE Open Journal of the Communications Society, vol. 3, pp. 1076–1089, 2022. S.Wang, J. Dai, Z. Liang, K. Niu, Z. Si, C. Dong, X. Qin, and P. Zhang, “Wireless deep video semantic transmission,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 214–229, 2022. J. Xu, B. Ai, W. Chen, A. Yang, P. Sun, and M. Rodrigues, “Wireless image transmission using deep source channel coding with attention modules,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 4, pp. 2315–2328, 2021. M. Yang and H.-S. Kim, “Deep joint source-channel coding for wireless image transmission with adaptive rate control,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 5193–5197. X. Wang, F. Yu, Z.-Y. Dou, T. Darrell, and J. E. 
Gonzalez, “Skipnet: Learning dynamic routing in convolutional networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 409–424. J. Yu, L. Yang, N. Xu, J. Yang, and T. Huang, “Slimmable neural networks,” arXiv preprint arXiv:1812.08928, 2018. Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, and R. Feris, “Blockdrop: Dynamic inference paths in residual networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8817–8826. Z. Chen, Y. Li, S. Bengio, and S. Si, “You look twice: Gaternet for dynamic filter selection in cnns,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9172–9180. Y. Han, G. Huang, S. Song, L. Yang, H.Wang, and Y.Wang, “Dynamic neural networks: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 7436–7456, 2021. D. Ganguly, “Learning variable-length representation of words,” Pattern Recognition, vol. 103, p. 107306, 2020. D. B. Kurka and D. Gündüz, “Bandwidth-agile image transmission with deep joint source-channel coding,” IEEE Transactions on Wireless Communications, vol. 20, no. 12, pp. 8081–8095, 2021. X. Chen, D. P. Kingma, T. Salimans, Y. Duan, P. Dhariwal, J. Schulman, I. Sutskever, and P. Abbeel, “Variational lossy autoencoder,” arXiv preprint arXiv:1611.02731, 2016. J. Lee, S. Cho, and S.-K. Beack, “Context-adaptive entropy model for end-to-end optimized image compression,” arXiv preprint arXiv:1809.10452, 2018. Y. Yang, R. Bamler, and S. Mandt, “Variational Bayesian quantization,” in International Conference on Machine Learning. PMLR, 2020, pp. 10 670–10 680. D. Liu, D. Wang, and H. Li, “Recognizable or not: Towards image semantic quality assessment for compression,” Sensing and Imaging, vol. 18, pp. 1–20, 2017. D. Liu, H. Zhang, and Z. Xiong, “On the classification-distortion-perception tradeoff,” Advances in Neural Information Processing Systems, vol. 32, 2019. C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,” IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018. E. Agustsson and L. Theis, “Universally quantized neural compression,” Advances in Neural Information Processing Systems, vol. 33, pp. 12 367–12 376, 2020. M. Havasi, R. Peharz, and J. M. Hernández-Lobato, “Minimal random code learning: Getting bits back from compressed model parameters,” arXiv preprint arXiv:1810.00440, 2018. C. H. Bennett, P. W. Shor, J. A. Smolin, and A. V. Thapliyal, “Entanglement-assisted capacity of a quantum channel and the reverse Shannon theorem,” IEEE Transactions on Information Theory, vol. 48, no. 10, pp. 2637–2655, 2002. P. Cuff, “Communication requirements for generating correlated random variables,” in 2008 IEEE International Symposium on Information Theory. IEEE, 2008, pp. 1393–1397. G. Flamich, M. Havasi, and J. M. Hernández-Lobato, “Compressing images by encoding their latent representations with relative entropy coding,” Advances in Neural Information Processing Systems, vol. 33, pp. 16 131–16 141, 2020. L. Theis and N. Y. Ahmed, “Algorithms for the communication of samples,” in International Conference on Machine Learning. PMLR, 2022, pp. 21 308–21 328. P. M. Long and R. A. Servedio, “Restricted Boltzmann machines are hard to approximately evaluate or simulate,” 2010. J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” arXiv preprint arXiv:1611.01704, 2016. L. Theis, W. 
Shi, A. Cunningham, and F. Huszár, “Lossy image compression with compressive autoencoders,” arXiv preprint arXiv:1703.00395, 2017. J. J. Rissanen, “Generalized Kraft inequality and arithmetic coding,” IBM Journal of Research and Development, vol. 20, no. 3, pp. 198–203, 1976. I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,” Communications of the ACM, vol. 30, no. 6, pp. 520–540, 1987. J. Duda, “Asymmetric numeral systems,” arXiv preprint arXiv:0902.0271, 2009. O. Henaff, “Data-efficient image recognition with contrastive predictive coding,” in International Conference on Machine Learning. PMLR, 2020, pp. 4182–4192. T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning. PMLR, 2020, pp. 1597–1607. Y. Tian, “Deep contrastive learning is provably (almost) principal component analysis,” arXiv preprint arXiv:2201.12680, 2022. Y. Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola, “What makes for good views for contrastive learning?” Advances in Neural Information Processing Systems, vol. 33, pp. 6827–6839, 2020. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763. J. Song and S. Ermon, “Multi-label contrastive predictive coding,” Advances in Neural Information Processing Systems, vol. 33, pp. 8161– 8173, 2020. T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. E. Hinton, “Big self-supervised models are strong semi-supervised learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 22 243–22 255, 2020. A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” arXiv preprint arXiv:1612.00410, 2016. N. Slonim, G. S. Atwal, G. Tkaˇcik, and W. Bialek, “Information-based clustering,” Proceedings of the National Academy of Sciences, vol. 102, no. 51, pp. 18 297–18 302, 2005. O. Shamir, S. Sabato, and N. Tishby, “Learning and generalization with the information bottleneck,” Theoretical Computer Science, vol. 411, no. 29-30, pp. 2696–2711, 2010. M. Vera, P. Piantanida, and L. R. Vega, “The role of the information bottleneck in representation learning,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 1580– 1584. J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny, “Barlow twins: Self-supervised learning via redundancy reduction,” in International Conference on Machine Learning. PMLR, 2021, pp. 12 310–12 320. A. Bardes, J. Ponce, and Y. LeCun, “Vicreg: Variance-invariance-covariance regularization for self-supervised learning,” arXiv preprint arXiv:2105.04906, 2021. M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9650–9660. J. Zhou, C. Wei, H. Wang, W. Shen, C. Xie, A. Yuille, and T. Kong, “ibot: Image bert pre-training with online tokenizer,” arXiv preprint arXiv:2111.07832, 2021. C. Yang, Z. Wu, B. Zhou, and S. Lin, “Instance localization for self-supervised detection pretraining,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3987–3996. A. Bardes, J. Ponce, and Y. 
LeCun, “Vicregl: Self-supervised learning of local visual features,” Advances in Neural Information Processing Systems, vol. 35, pp. 8799–8810, 2022. K. Zha, P. Cao, Y. Yang, and D. Katabi, “Supervised Contrastive Regression,” arXiv preprint arXiv:2210.01189, 2022. S. J. Oh, K. Murphy, J. Pan, J. Roth, F. Schroff, and A. Gallagher, “Modeling uncertainty with hedged instance embedding,” arXiv preprint arXiv:1810.00319, 2018. S. Braun and I. Tashev, “Data augmentation and loss normalization for deep noise suppression,” in International Conference on Speech and Computer. Springer, 2020, pp. 79–86. K. Sohn, “Improved deep metric learning with multi-class n-pair loss objective,” Advances in Neural Information Processing Systems, vol. 29, 2016. Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, “Unsupervised feature learning via non-parametric instance discrimination,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3733–3742. P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan, “Supervised contrastive learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 18 661–18 673, 2020. O. Zhang, M. Wu, J. Bayrooti, and N. Goodman, “Temperature as uncertainty in contrastive learning,” arXiv preprint arXiv:2110.04403, 2021. T.Wang and P. Isola, “Understanding contrastive representation learning through alignment and uniformity on the hypersphere,” in International Conference on Machine Learning. PMLR, 2020, pp. 9929–9939. F.Wang and H. Liu, “Understanding the behaviour of contrastive loss,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2495–2504. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning. pmlr, 2015, pp. 448–456. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” arXiv preprint arXiv:1607.08022, 2016. J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016. X. Li, S. Chen, X. Hu, and J. Yang, “Understanding the disharmony between dropout and batch normalization by variance shift,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2682–2690. J. Ballé, V. Laparra, and E. P. Simoncelli, “Density modeling of images using a generalized normalization transformation,” arXiv preprint arXiv:1511.06281, 2015. Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013. F. Bordes, R. Balestriero, Q. Garrido, A. Bardes, and P. Vincent, “Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning,” Transactions on Machine Learning Research, 2023. J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” Advances in Neural Information Processing Systems, vol. 27, 2014. M. B. Sariyildiz, Y. Kalantidis, K. Alahari, and D. Larlus, “No reason for no supervision: Improved generalization in supervised models,” in ICLR 2023-International Conference on Learning Representations, 2023, pp. 1–26. L. Jing, P. Vincent, Y. LeCun, and Y. Tian, “Understanding dimensional collapse in contrastive self-supervised learning,” arXiv preprint arXiv:2110.09348, 2021. B. C. Hall and B. C. 
Hall, Lie groups, Lie algebras, and representations. Springer, 2013. R. Cosentino, A. Sengupta, S. Avestimehr, M. Soltanolkotabi, A. Ortega, T. Willke, and M. Tepper, “Toward a geometrical understanding of self-supervised contrastive learning,” arXiv preprint arXiv:2205.06926, 2022. G. Mialon, R. Balestriero, and Y. LeCun, “Variance covariance regularization enforces pairwise independence in self-supervised representations,” arXiv preprint arXiv:2209.14905, 2022. A. L. Maas, A. Y. Hannun, A. Y. Ng et al., “Rectifier nonlinearities improve neural network acoustic models,” in International Conference on Machine Learning, vol. 30, no. 1. Atlanta, GA, 2013, p. 3. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034. M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun, “Disentangling factors of variation in deep representation using adversarial training,” Advances in Neural Information Processing Systems, vol. 29, 2016. A. Szabó, Q. Hu, T. Portenier, M. Zwicker, and P. Favaro, “Challenges in disentangling independent factors of variation,” arXiv preprint arXiv:1711.02245, 2017. J. Kahana and Y. Hoshen, “A contrastive objective for learning disentangled representations,” in European Conference on Computer Vision. Springer, 2022, pp. 579–595. P. Bachman, R. D. Hjelm, and W. Buchwalter, “Learning representations by maximizing mutual information across views,” Advances in Neural Information Processing Systems, vol. 32, 2019. A. Kolesnikov, X. Zhai, and L. Beyer, “Revisiting self-supervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1920–1929. H. Bao, L. Dong, S. Piao, and F. Wei, “Beit: Bert pre-training of image transformers,” arXiv preprint arXiv:2106.08254, 2021. F. Bordes, R. Balestriero, and P. Vincent, “Towards Democratizing Joint-Embedding Self-Supervised Learning,” arXiv preprint arXiv:2303.01986, 2023. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019. Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep Learning Face Attributes in the Wild,” in Proceedings of International Conference on Computer Vision (ICCV), December 2015. Y. You, I. Gitman, and B. Ginsburg, “Large batch training of convolutional networks,” arXiv preprint arXiv:1708.03888, 2017. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. I. Loshchilov and F. Hutter, “Sgdr: Stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983, 2016. A. Oliver, A. Odena, C. A. Raffel, E. D. Cubuk, and I. Goodfellow, “Realistic evaluation of deep semi-supervised learning algorithms,” Advances in Neural Information Processing Systems, vol. 31, 2018. C. H. Bennett, P. Gács, M. Li, P. M. Vitányi, and W. H. Zurek, “Information distance,” IEEE Transactions on Information Theory, vol. 44, no. 4, pp. 1407–1423, 1998. P. M. Vitányi, F. J. Balbach, R. L. Cilibrasi, and M. Li, “Normalized information distance,” Information Theory and Statistical Learning, pp. 45–82, 2009. N. Nikvand and Z. 
Wang, “Generic image similarity based on Kolmogorov complexity,” in 2010 IEEE International Conference on Image Processing. IEEE, 2010, pp. 309–312. W. Ewert, W. A. Dembski, and R. J. Marks, “Measuring meaningful information in images: algorithmic specified complexity,” IET Computer Vision, vol. 9, no. 6, pp. 884–894, 2015. L. Mahon and T. Lukasiewicz, “Minimum description length clustering to measure meaningful image complexity,” Pattern Recognition, vol. 145, p. 109889, 2024. A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in 2010 20th International Conference on Pattern Recognition. IEEE, 2010, pp. 2366–2369. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004. “CelebA Dataset,” https://mmlab.ie.cuhk.edu.hk/projects/CelebA. html, accessed: 2023-03-28. T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” Advances in Neural Information Processing Systems, vol. 29, 2016. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local Nash equilibrium,” Advances in Neural Information Processing Systems, vol. 30, 2017. D. George and E. Huerta, “Deep neural networks to enable real-time multimessenger astrophysics,” Physical Review D, vol. 97, no. 4, p. 044039, 2018. Q. Garrido, R. Balestriero, L. Najman, and Y. Lecun, “Rankme: Assessing the downstream performance of pretrained self-supervised representations by their rank,” in International Conference on Machine Learning. PMLR, 2023, pp. 10 929–10 974. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273– 1282. R. Li, F. Ma, W. Jiang, and J. Gao, “Online federated multitask learning,” in 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019, pp. 215–220. Z. Qin, G. Y. Li, and H. Ye, “Federated learning and wireless communications,” IEEE Wireless Communications, vol. 28, no. 5, pp. 134–140, 2021. H. Tong, Z. Yang, S. Wang, Y. Hu, W. Saad, and C. Yin, “Federated learning based audio semantic communication over wireless networks,” in 2021 IEEE Global Communications Conference (GLOBECOM). IEEE, 2021, pp. 1–6. I. E. Aguerri and A. Zaidi, “Distributed variational representation learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 120–138, 2019. T. A. Courtade and T. Weissman, “Multiterminal source coding under logarithmic loss,” IEEE Transactions on Information Theory, vol. 60, no. 1, pp. 740–761, 2013. Y. U˘gur, I. E. Aguerri, and A. Zaidi, “Vector Gaussian CEO problem under logarithmic loss and applications,” IEEE Transactions on Information Theory, vol. 66, no. 7, pp. 4183–4202, 2020. X. Liu, W. Liu, T. Mei, and H. Ma, “A deep learning-based approach to progressive vehicle re-identification for urban surveillance,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 869–884. D. Zou, P. Tan, and W. Yu, “Collaborative visual SLAM for multiple agents: A brief survey,” Virtual Reality & Intelligent Hardware, vol. 1, no. 5, pp. 461–482, 2019. E. Unlu, E. Zenou, N. Riviere, and P.-E. 
Dupouy, “Deep learningbased strategies for the detection and tracking of drones using several cameras,” IPSJ Transactions on Computer Vision and Applications, vol. 11, no. 1, pp. 1–13, 2019. M. Moldoveanu and A. Zaidi, “On in-network learning. A comparative study with federated and split learning,” in 2021 IEEE 22nd International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2021, pp. 221–225. S. Liu, A. Mallol-Ragolta, E. Parada-Cabaleiro, K. Qian, X. Jing, A. Kathan, B. Hu, and B. W. Schuller, “Audio self-supervised learning: A survey,” Patterns, vol. 3, no. 12, 2022. M. C. Schiappa, Y. S. Rawat, and M. Shah, “Self-supervised learning for videos: A survey,” ACM Computing Surveys, 2022. B. Min, H. Ross, E. Sulem, A. P. B. Veyseh, T. H. Nguyen, O. Sainz, E. Agirre, I. Heintz, and D. Roth, “Recent advances in natural language processing via large pre-trained language models: A survey,” ACM Computing Surveys, 2021. I. Rubachev, A. Alekberov, Y. Gorishniy, and A. Babenko, “Revisiting pretraining objectives for tabular deep learning,” arXiv preprint arXiv:2207.03208, 2022. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pretraining of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020. Z. Li, Z. Chen, F. Yang, W. Li, Y. Zhu, C. Zhao, R. Deng, L. Wu, R. Zhao, M. Tang et al., “Mst: Masked self-supervised transformer for visual representation,” Advances in Neural Information Processing Systems, vol. 34, pp. 13 165–13 176, 2021. K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16 000–16 009. Z. Xie, Z. Zhang, Y. Cao, Y. Lin, J. Bao, Z. Yao, Q. Dai, and H. Hu, “Simmim: A simple framework for masked image modeling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 9653–9663. M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023. W. Liu, S. Chen, L. Guo, X. Zhu, and J. Liu, “Cptr: Full transformer network for image captioning,” arXiv preprint arXiv:2101.10804, 2021. H. Ye, X. Yang, M. Takac, R. Sunderraman, and S. Ji, “Improving text-to-image synthesis using contrastive learning,” arXiv preprint arXiv:2107.02423, 2021. A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, “Zero-shot text-to-image generation,” in International Conference on Machine Learning. PMLR, 2021, pp. 8821–8831. C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.H. Sung, Z. Li, and T. Duerig, “Scaling up visual and vision-language representation learning with noisy text supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 4904–4916. A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with clip latents,” arXiv preprint arXiv:2204.06125, 2022. H. Chang, H. Zhang, J. Barber, A. Maschinot, J. Lezama, L. Jiang, M.-H. Yang, K. Murphy, W. T. Freeman, M. 
Rubinstein et al., “Muse: Text-to-image generation via masked generative transformers,” arXiv preprint arXiv:2301.00704, 2023. K. Sridharan and S. M. Kakade, “An information theoretic framework for multi-view learning,” 2008. Y.-H. H. Tsai, Y. Wu, R. Salakhutdinov, and L.-P. Morency, “Selfsupervised learning from a multi-view perspective,” arXiv preprint arXiv:2006.05576, 2020. C. Tosh, A. Krishnamurthy, and D. Hsu, “Contrastive learning, multiview redundancy, and linear models,” in Algorithmic Learning Theory. PMLR, 2021, pp. 1179–1206. J. D. Lee, Q. Lei, N. Saunshi, and J. Zhuo, “Predicting what you already know helps: Provable self-supervised learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 309–323, 2021. M. Noroozi and P. Favaro, “Unsupervised learning of visual representations by solving jigsaw puzzles,” in European Conference on Computer Vision. Springer, 2016, pp. 69–84. G. E. Box, “Science and statistics,” Journal of the American Statistical Association, vol. 71, no. 356, pp. 791–799, 1976. R. J. Solomonoff, “A preliminary report on a general theory of inductive inference.” Citeseer, 1960. ——, “A formal theory of inductive inference. Part I,” Information and control, vol. 7, no. 1, pp. 1–22, 1964. G. J. Chaitin, “Algorithmic information theory,” IBM Journal of Research and Development, vol. 21, no. 4, pp. 350–359, 1977. R. J. Solomonoff, “Algorithmic probability: Theory and applications,” Information Theory and Statistical Learning, pp. 1–23, 2009. A. N. Kolmogorov, “Three approaches to the quantitative definition of information’,” Problems of Information Transmission, vol. 1, no. 1, pp. 1–7, 1965. G. J. Chaitin, “On the length of programs for computing finite binary sequences,” Journal of the ACM (JACM), vol. 13, no. 4, pp. 547–569, 1966. A. Kolmogorov, “Logical basis for information theory and probability theory,” IEEE Transactions on Information Theory, vol. 14, no. 5, pp. 662–664, 1968. A. K. Zvonkin and L. A. Levin, “The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms,” Russian Mathematical Surveys, vol. 25, no. 6, p. 83, 1970. G. J. Chaitin, “A theory of program size formally identical to information theory,” Journal of the ACM (JACM), vol. 22, no. 3, pp. 329–340, 1975. R. Solomonoff, “Complexity-based induction systems: comparisons and convergence theorems,” IEEE Transactions on Information Theory, vol. 24, no. 4, pp. 422–432, 1978. P. D. Grünwald and P. M. Vitányi, “Kolmogorov complexity and information theory. With an interpretation in terms of questions and answers,” Journal of Logic, Language and Information, vol. 12, pp. 497–529, 2003. P. Grunwald and P. Vitányi, “Shannon information and Kolmogorov complexity,” arXiv preprint cs/0410002, 2004. M. Li, P. Vitányi et al., An introduction to Kolmogorov complexity and its applications. Springer, 2008, vol. 3. J. Rissanen, “Stochastic complexity and modeling,” The annals of statistics, pp. 1080–1100, 1986. ——, Stochastic complexity in statistical inquiry. World Scientific, 1998, vol. 15. T. L. Fine, Theories of probability: An examination of foundations. Academic Press, 2014. K. Friston, J. Kilner, and L. Harrison, “A free energy principle for the brain,” Journal of Physiology-Paris, vol. 100, no. 1-3, pp. 70–87, 2006. K. Friston, “The free-energy principle: a unified brain theory?” Nature Reviews Neuroscience, vol. 11, no. 2, pp. 127–138, 2010. S. P. Veissière, A. Constant, M. J. Ramstead, K. J. Friston, and L. J. 
Kirmayer, “Thinking through other minds: A variational approach to cognition and culture,” Behavioral and brain sciences, vol. 43, p. e90, 2020. K. J. Friston, T. Parr, Y. Yufik, N. Sajid, C. J. Price, and E. Holmes, “Generative models, linguistic communication and active inference,” Neuroscience & Biobehavioral Reviews, vol. 118, pp. 42–64, 2020. T. Parr, G. Pezzulo, and K. J. Friston, Active inference: the free energy principle in mind, brain, and behavior. MIT Press, 2022. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020. L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27 730–27 744, 2022. H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023. OpenAI, “GPT-4 Technical Report,” 2023. “An Observation on Generalization,” https://simons.berkeley.edu/ talks/ilya-sutskever-openai-2023-08-14, accessed: 2023-08-24. “Compression for AGI,” https://www.youtube.com/watch?v= dO4TPJkeaaU, accessed: 2023-04-24. F. Bellard, “Lossless data compression with neural networks,” URL: https://bellard. org/nncp/nncp. pdf, 2019. M. Goyal, K. Tatwawadi, S. Chandak, and I. Ochoa, “DZip: Improved general-purpose loss less compression based on novel neural network modeling,” in 2021 Data Compression Conference (DCC). IEEE, 2021, pp. 153–162. Y. Mao, Y. Cui, T.-W. Kuo, and C. J. Xue, “A Fast Transformer-based General-Purpose Lossless Compressor,” arXiv preprint arXiv:2203.16114, 2022. Y. Yang, S. Mandt, L. Theis et al., “An introduction to neural data compression,” Foundations and Trends® in Computer Graphics and Vision, vol. 15, no. 2, pp. 113–200, 2023. N. Chater, “The search for simplicity: A fundamental cognitive principle?” The Quarterly Journal of Experimental Psychology: Section A, vol. 52, no. 2, pp. 273–302, 1999. M. Hutter, Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer Science & Business Media, 2004. S. Legg and M. Hutter, “Universal intelligence: A definition of machine intelligence,” Minds and Machines, vol. 17, pp. 391–444, 2007. J. W. Rae, S. Borgeaud, T. Cai, K. Millican, J. Hoffmann, F. Song, J. Aslanides, S. Henderson, R. Ring, S. Young et al., “Scaling language models: Methods, analysis & insights from training gopher,” arXiv preprint arXiv:2112.11446, 2021. A. Kolmogorov, “On the Shannon theory of information transmission in the case of continuous signals,” IRE Transactions on Information Theory, vol. 2, no. 4, pp. 102–108, 1956. D. R. Brillinger, “Information and Information Stability of Random Variables and Processes,” 1964. E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, p. 620, 1957. S. Nowozin, B. Cseke, and R. Tomioka, “f-gan: Training generative neural samplers using variational divergence minimization,” Advances in Neural Information Processing Systems, vol. 29, 2016. Y. Polyanskiy and Y. Wu, Information theory: From coding to learning. Cambridge University Press, 2022. M. D. Donsker and S. S. 
Varadhan, “Asymptotic evaluation of certain Markov process expectations for large time, I,” Communications on Pure and Applied Mathematics, vol. 28, no. 1, pp. 1–47, 1975. “Mutual information estimator,” https://fleuret.org/files/ complement-slides-MI-estimator.pdf, accessed: 2022-10-18. X. Nguyen, M. J. Wainwright, and M. I. Jordan, “Estimating divergence functionals and the likelihood ratio by convex risk minimization,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5847– 5861, 2010. A. Ruderman, M. Reid, D. García-García, and J. Petterson, “Tighter variational representations of f-divergences via restriction to probability measures,” arXiv preprint arXiv:1206.4664, 2012. C. E. Shannon et al., “Coding theorems for a discrete source with a fidelity criterion,” IRE Nat. Conv. Rec, vol. 4, no. 142-163, p. 1, 1959. M. Thomas and A. T. Joy, Elements of information theory. WileyInterscience, 2006. D. J. Costello and G. D. Forney, “Channel coding: The road to channel capacity,” Proceedings of the IEEE, vol. 95, no. 6, pp. 1150–1177, 2007. T. Berger, “Rate distortion theory for sources with abstract alphabets and memory,” Information and Control, vol. 13, no. 3, pp. 254–273, 1968. ——, Rate Distortion Theory: A Mathematical Basis for Data Compression, ser. Prentice-Hall electrical engineering series. Prentice-Hall, 1971. R. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE Transactions on Information Theory, vol. 18, no. 4, pp. 460–473, 1972. P. Cheng, W. Hao, S. Dai, J. Liu, Z. Gan, and L. Carin, “Club: A contrastive log-ratio upper bound of mutual information,” in International Conference on Machine Learning. PMLR, 2020, pp. 1779–1788. S. Oymak and M. Soltanolkotabi, “Overparameterized nonlinear learning: Gradient descent takes the shortest path?” in International Conference on Machine Learning. PMLR, 2019, pp. 4951–4960. W. Hu, Z. Li, and D. Yu, “Simple and effective regularization methods for training on noisily labeled data with generalization guarantee,” arXiv preprint arXiv:1905.11368, 2019. I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Press, 2016. D. Barber and F. Agakov, “The IM algorithm: a variational approach to information maximization,” Advances in neural information processing systems, vol. 16, no. 320, p. 201, 2004. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” 2014. I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160, 2016. T. Che, Y. Li, A. P. Jacob, Y. Bengio, and W. Li, “Mode regularized generative adversarial networks,” arXiv preprint arXiv:1612.02136, 2016. M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 214–223. R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018. J. Schulman, “Optimizing expectations: From deep reinforcement learning to stochastic computation graphs,” Ph.D. dissertation, UC Berkeley, 2016. S. Pitis, “Rethinking the discount factor in reinforcement learning: A decision theoretic approach,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 7949–7956. A. G. Barto, R. S. Sutton, and C. Watkins, Learning and sequential decision making. University of Massachusetts Amherst, MA, 1989. C. J. C. H. Watkins, “Learning from delayed rewards,” 1989. D. 
Bertsekas, Dynamic programming and optimal control: Volume I. Athena Scientific, 2012, vol. 4. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” Advances in Neural Information Processing Systems, vol. 12, 1999. P. L’Ecuyer, “Note: On the interchange of derivative and expectation for likelihood ratio derivative estimators,” Management Science, vol. 41, no. 4, pp. 738–747, 1995. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016. L. Yu, W. Zhang, J. Wang, and Y. Yu, “SeqGAN: Sequence generative adversarial nets with policy gradient,” in Proceedings of the AAAI conference on artificial intelligence, vol. 31, no. 1, 2017. Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, “Benchmarking deep reinforcement learning for continuous control,” in International Conference on Machine Learning. PMLR, 2016, pp. 1329–1338. J. Achiam, “Spinning Up in Deep Reinforcement Learning,” 2018. R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Reinforcement Learning, pp. 5–32, 1992. J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” arXiv preprint arXiv:1506.02438, 2015. V. Konda and J. Tsitsiklis, “Actor-critic algorithms,” Advances in Neural Information Processing Systems, vol. 12, 1999. V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999. V. Vapnik, The nature of statistical learning theory. Springer Science & Business Media, 1999. T. Hastie, R. Tibshirani, and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction. Springer, 2009, vol. 2. C. P. Robert et al., The Bayesian choice: from decision-theoretic foundations to computational implementation. Springer, 2007, vol. 2. H. Hafez-Kolahi, B. Moniri, and S. Kasaei, “Information-Theoretic Analysis of Minimax Excess Risk,” IEEE Transactions on Information Theory, 2023. K. H. Rosen, Discrete Mathematics and Its Applications, 8th ed. McGraw-Hill Higher Education, 2019. K. B. Petersen, M. S. Pedersen et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, no. 15, p. 510, 2008. P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096–1103. R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio, “Learning deep representations by mutual information estimation and maximization,” arXiv preprint arXiv:1808.06670, 2018. Z. Zhu and A. K. Nandi, Automatic modulation classification: principles, algorithms and applications. John Wiley & Sons, 2015. A. Abdelmutalab, K. Assaleh, and M. El-Tarhuni, “Automatic modulation classification based on high order cumulants and hierarchical polynomial classifiers,” Physical Communication, vol. 21, pp. 10–18, 2016. A. K. Nandi and E. E. Azzouz, “Algorithms for automatic modulation recognition of communication signals,” IEEE Transactions on Communications, vol. 46, no. 4, pp. 431–436, 1998. A. Fehske, J. Gaeddert, and J. H.
Reed, “A new approach to signal classification using spectral correlation and neural networks,” in First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks, 2005. DySPAN 2005. IEEE, 2005, pp. 144–150. M. W. Aslam, Z. Zhu, and A. K. Nandi, “Automatic modulation classification using combination of genetic programming and KNN,” IEEE Transactions on Wireless Communications, vol. 11, no. 8, pp. 2742–2750, 2012. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, 1989. T. J. O’Shea, J. Corgan, and T. C. Clancy, “Convolutional radio modulation recognition networks,” in Engineering Applications of Neural Networks: 17th International Conference, EANN 2016, Aberdeen, UK, September 2-5, 2016, Proceedings 17. Springer, 2016, pp. 213–226. J. Mitola and G. Q. Maguire, “Cognitive radio: making software radios more personal,” IEEE Personal Communications, vol. 6, no. 4, pp. 13–18, 1999. F. K. Jondral, “Software-defined radio—basics and evolution to cognitive radio,” EURASIP Journal on Wireless Communications and Networking, vol. 2005, pp. 1–9, 2005. Y. Zeng, Y.-C. Liang, A. T. Hoang, and R. Zhang, “A review on spectrum sensing for cognitive radio: challenges and solutions,” EURASIP Journal on Advances in Signal Processing, vol. 2010, pp. 1–15, 2010. M. Kim, W. Lee, and D.-H. Cho, “A novel PAPR reduction scheme for OFDM system based on deep learning,” IEEE Communications Letters, vol. 22, no. 3, pp. 510–513, 2017. A. Felix, S. Cammerer, S. Dörner, J. Hoydis, and S. Ten Brink, “OFDM-autoencoder for end-to-end learning of communications systems,” in 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2018, pp. 1–5. M. Ibnkahla, “Applications of neural networks to digital communications–a survey,” Signal Processing, vol. 80, no. 7, pp. 1185–1215, 2000. T. Schenk, RF imperfections in high-rate wireless systems: impact and digital compensation. Springer Science & Business Media, 2008. V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning at physical layer without channel models,” IEEE Communications Letters, vol. 22, no. 11, pp. 2278–2281, 2018. T. J. O’Shea, L. Pemula, D. Batra, and T. C. Clancy, “Radio transformer networks: Attention models for learning to synchronize in wireless systems,” in 2016 50th Asilomar Conference on Signals, Systems and Computers. IEEE, 2016, pp. 662–666. T. J. O’Shea, K. Karra, and T. C. Clancy, “Learning to communicate: Channel auto-encoders, domain specific regularizers, and attention,” in 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2016, pp. 223–228. M. Jaderberg, K. Simonyan, A. Zisserman et al., “Spatial transformer networks,” Advances in Neural Information Processing Systems, vol. 28, 2015. H. He, S. Jin, C.-K. Wen, F. Gao, G. Y. Li, and Z. Xu, “Model-driven deep learning for physical layer communications,” IEEE Wireless Communications, vol. 26, no. 5, pp. 77–83, 2019. X. Gao, S. Jin, C.-K. Wen, and G. Y. Li, “ComNet: Combination of deep learning and expert knowledge in OFDM receivers,” IEEE Communications Letters, vol. 22, no. 12, pp. 2627–2630, 2018. H. He, C.-K. Wen, S. Jin, and G. Y. Li, “A model-driven deep learning network for MIMO detection,” in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2018, pp. 584–588. M. Khani, M.
Alizadeh, J. Hoydis, and P. Fleming, “Adaptive neural signal detection for massive MIMO,” IEEE Transactions on Wireless Communications, vol. 19, no. 8, pp. 5635–5648, 2020. F. A. Aoudia and J. Hoydis, “Model-free training of end-to-end communication systems,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 11, pp. 2503–2516, 2019. H. Ye, G. Y. Li, and B.-H. Juang, “Deep learning based end-to-end wireless communication systems without pilots,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 3, pp. 702–714, 2021. S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998. V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance criterion and code construction,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998. V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block coding for wireless communications: Performance results,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 3, pp. 451–460, 1999. ——, “Space-time block codes from orthogonal designs,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456–1467, 1999. E. Başar, U. Aygölü, E. Panayirci, and H. V. Poor, “Space-time block coded spatial modulation,” IEEE Transactions on Communications, vol. 59, no. 3, pp. 823–832, 2010. Y.-A. Chu, P.-Y. Chen, Y.-H. Chiang, T.-S. Yang, Y.-C. Liao, and I.-W. Lai, “Permutation Design for Ultra-Low Latency Communication and Spatial Permutation Modulation (SPM),” in 2020 IEEE Globecom Workshops (GC Wkshps). IEEE, 2020, pp. 1–6. T. J. O’Shea, T. Erpek, and T. C. Clancy, “Physical layer deep learning of encodings for the MIMO fading channel,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2017, pp. 76–80. E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, 1999. W. Yu, W. Rhee, S. Boyd, and J. M. Cioffi, “Iterative water-filling for Gaussian vector multiple-access channels,” IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 145–152, 2004. J. Mietzner, R. Schober, L. Lampe, W. H. Gerstacker, and P. A. Hoeher, “Multiple-antenna techniques for wireless communications-a comprehensive literature survey,” IEEE Communications Surveys & Tutorials, vol. 11, no. 2, pp. 87–105, 2009. D. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge University Press, 2005. A. Soysal and S. Ulukus, “Joint channel estimation and resource allocation for MIMO systems-part I: single-user analysis,” IEEE Transactions on Wireless Communications, vol. 9, no. 2, pp. 624–631, 2010. ——, “Joint channel estimation and resource allocation for MIMO systems-Part II: Multi-user and numerical analysis,” IEEE Transactions on Wireless Communications, vol. 9, no. 2, pp. 632–640, 2010. T. J. O’Shea, T. Erpek, and T. C. Clancy, “Deep learning based MIMO communications,” arXiv preprint arXiv:1707.07980, 2017. M. Ozdemir and H. Arslan, “Channel estimation for wireless OFDM systems,” IEEE Communications Surveys and Tutorials, vol. 9, no. 2, 2007. S. Omar, A. Ancora, and D. T. Slock, “Performance analysis of general pilot-aided linear channel estimation in LTE OFDMA systems with application to simplified MMSE schemes,” in 2008 IEEE 19th International Symposium on Personal, Indoor and Mobile Radio Communications.
IEEE, 2008, pp. 1–6. Y. Liu, Z. Tan, H. Hu, L. J. Cimini, and G. Y. Li, “Channel estimation for OFDM,” IEEE Communications Surveys & Tutorials, vol. 16, no. 4, pp. 1891–1908, 2014. C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748–751, 2018. M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, “Deep learning-based channel estimation,” IEEE Communications Letters, vol. 23, no. 4, pp. 652–655, 2019. L. Li, H. Chen, H.-H. Chang, and L. Liu, “Deep residual learning meets OFDM channel estimation,” IEEE Wireless Communications Letters, vol. 9, no. 5, pp. 615–618, 2019. M. S. Oh, S. Hosseinalipour, T. Kim, C. G. Brinton, and D. J. Love, “Channel estimation via successive denoising in MIMO OFDM systems: a reinforcement learning approach,” in ICC 2021-IEEE International Conference on Communications. IEEE, 2021, pp. 1–6. E. Dahlman, S. Parkvall, and J. Skold, 5G NR: The next generation wireless access technology. Academic Press, 2020. V. S. Asadchy, M. S. Mirmoosa, A. Diaz-Rubio, S. Fan, and S. A. Tretyakov, “Tutorial on electromagnetic nonreciprocity and its origins,” Proceedings of the IEEE, vol. 108, no. 10, pp. 1684–1727, 2020. C. C. Aggarwal, Neural networks and deep learning: a textbook. Springer, 2018. M. Rao, N. Farsad, and A. Goldsmith, “Variable length joint source-channel coding of text using deep neural networks,” in 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2018, pp. 1–5. T. J. O’Shea, T. Roy, N. West, and B. C. Hilburn, “Physical layer communications system design over-the-air using adversarial networks,” in 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018, pp. 529–532. T. J. O’Shea, T. Roy, and N. West, “Approximating the void: Learning stochastic channel models from observation with variational generative adversarial networks,” in 2019 International Conference on Computing, Networking and Communications (ICNC). IEEE, 2019, pp. 681–686. H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, 2017. H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels,” IEEE Transactions on Wireless Communications, vol. 19, no. 5, pp. 3133–3143, 2020. B. Zhu, J. Wang, L. He, and J. Song, “Joint transceiver optimization for wireless communication PHY with convolutional neural network,” arXiv preprint arXiv:1808.03242, 2018. D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” in International Conference on Machine Learning. PMLR, 2014, pp. 387–395. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017. F. A. Aoudia and J. Hoydis, “End-to-end learning of communications systems without a channel model,” in 2018 52nd Asilomar Conference on Signals, Systems, and Computers. IEEE, 2018, pp. 298–303. “Database of sphere packings,” https://codes.se/packings/, accessed: 2023-09-01. J. Foerster, I. A. Assael, N. De Freitas, and S. Whiteson, “Learning to communicate with deep multi-agent reinforcement learning,” Advances in Neural Information Processing Systems, vol. 29, 2016. C. de Vrieze, S. Barratt, D. Tsai, and A.
Sahai, “Cooperative multiagent reinforcement learning for low-level wireless communication,” arXiv preprint arXiv:1801.04541, 2018. J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Transactions on Automatic Control, vol. 37, no. 3, pp. 332–341, 1992. ——, “An overview of the simultaneous perturbation method for efficient optimization,” Johns Hopkins APL Technical Digest, vol. 19, no. 4, pp. 482–492, 1998. S. J. Russell, Artificial intelligence: a modern approach. Pearson Education, Inc., 2010. K. Niu, J. Dai, S. Yao, S. Wang, Z. Si, X. Qin, and P. Zhang, “A paradigm shift toward semantic communications,” IEEE Communications Magazine, vol. 60, no. 11, pp. 113–119, 2022. L. Floridi, “Outline of a theory of strongly semantic information,” Minds and Machines, vol. 14, pp. 197–221, 2004. S. D’Alfonso, “On quantifying semantic information,” Information, vol. 2, no. 1, pp. 61–101, 2011. Y. Zhong, “A theory of semantic information,” China Communications, vol. 14, no. 1, pp. 1–17, 2017. H.-M. Lee, C.-M. Chen, J.-M. Chen, and Y.-L. Jou, “An efficient fuzzy classifier with feature selection based on fuzzy entropy,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 31, no. 3, pp. 426–432, 2001. X. Liu, W. Jia, W. Liu, and W. Pedrycz, “AFSSE: An interpretable classifier with axiomatic fuzzy set and semantic entropy,” IEEE Transactions on Fuzzy Systems, vol. 28, no. 11, pp. 2825–2840, 2019. B. Güler, A. Yener, and A. Swami, “The semantic communication game,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 787–802, 2018. Z. Qin, X. Tao, J. Lu, and G. Y. Li, “Semantic communications: Principles and challenges,” arXiv preprint arXiv:2201.01389, 2021. W. Wang, R. Wang, L. Wang, Z. Wang, and A. Ye, “Towards a robust deep neural network in texts: A survey,” arXiv preprint arXiv:1902.07285, 2019. I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014. T. Miyato, A. M. Dai, and I. Goodfellow, “Adversarial training methods for semi-supervised text classification,” arXiv preprint arXiv:1605.07725, 2016. A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in Artificial Intelligence Safety and Security. Chapman and Hall/CRC, 2018, pp. 99–112. B. H. Juang, “Quantification and transmission of information and intelligence—history and outlook,” IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 90–101, 2011. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017. D. W. Otter, J. R. Medina, and J. K. Kalita, “A survey of the usages of deep learning for natural language processing,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 2, pp. 604–624, 2020. K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002, pp. 311–318. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013. A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural networks, vol. 18, no. 5-6, pp. 602–610, 2005. M. Ding, J. Li, M.
Ma, and X. Fan, “SNR-adaptive deep joint source-channel coding for wireless image transmission,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 1555–1559. I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” Advances in Neural Information Processing Systems, vol. 27, 2014. Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016. S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, “Scheduled sampling for sequence prediction with recurrent neural networks,” Advances in Neural Information Processing Systems, vol. 28, 2015. T. Mihaylova and A. F. Martins, “Scheduled sampling for transformers,” arXiv preprint arXiv:1906.07651, 2019. D. Chandrasekaran and V. Mago, “Evolution of semantic similarity—a survey,” ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1–37, 2021. J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543. R. Vedantam, C. Lawrence Zitnick, and D. Parikh, “CIDEr: Consensus-based image description evaluation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4566–4575. G. Zhu and C. A. Iglesias, “Computing semantic similarity of concepts in knowledge graphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 72–85, 2016. K. Lu, R. Li, X. Chen, Z. Zhao, and H. Zhang, “Reinforcement learning-powered semantic communication via semantic similarity,” arXiv preprint arXiv:2108.12121, 2021. K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014. S. Liu, Z. Zhu, N. Ye, S. Guadarrama, and K. Murphy, “Improved image captioning via policy gradient optimization of SPIDEr,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 873–881. C. M. Bishop and N. M. Nasrabadi, Pattern recognition and machine learning. Springer, 2006, vol. 4, no. 4. M. Ranzato, S. Chopra, M. Auli, and W. Zaremba, “Sequence level training with recurrent neural networks,” arXiv preprint arXiv:1511.06732, 2015. E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-softmax,” arXiv preprint arXiv:1611.01144, 2016. S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel, “Self-critical sequence training for image captioning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7008–7024. R. Luo, “A better variant of self-critical sequence training,” arXiv preprint arXiv:2003.09971, 2020. A. Mnih and D. Rezende, “Variational inference for Monte Carlo objectives,” in International Conference on Machine Learning. PMLR, 2016, pp. 2188–2196. S. Yun, J. Choi, Y. Yoo, K. Yun, and J. Young Choi, “Action-decision networks for visual tracking with deep reinforcement learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2711–2720. Q. Zhou, R. Li, Z. Zhao, C. Peng, and H. Zhang, “Semantic communication with adaptive universal transformer,” IEEE Wireless Communications Letters, vol. 11, no. 3, pp.
453–457, 2021. “Generative models,” https://openai.com/blog/generative-models/, accessed: 2023-02-20. D. P. Kingma, M. Welling et al., “An introduction to variational autoencoders,” Foundations and Trends® in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019. L. Cayton et al., Algorithms for manifold learning. eScholarship, University of California, 2008. C. Fefferman, S. Mitter, and H. Narayanan, “Testing the manifold hypothesis,” Journal of the American Mathematical Society, vol. 29, no. 4, pp. 983–1049, 2016. C. Doersch, “Tutorial on variational autoencoders,” arXiv preprint arXiv:1606.05908, 2016. A. Graves, “Practical variational inference for neural networks,” Advances in Neural Information Processing Systems, vol. 24, 2011. Y. Bengio and J.-S. Senécal, “Quick training of probabilistic neural nets by importance sampling,” in International Workshop on Artificial Intelligence and Statistics. PMLR, 2003, pp. 17–24. Y. Burda, R. Grosse, and R. Salakhutdinov, “Importance weighted autoencoders,” arXiv preprint arXiv:1509.00519, 2015. R. Shu, H. H. Bui, S. Zhao, M. J. Kochenderfer, and S. Ermon, “Amortized inference regularization,” Advances in Neural Information Processing Systems, vol. 31, 2018. L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010: 19th International Conference on Computational Statistics, Paris, France, August 22-27, 2010, Keynote, Invited and Contributed Papers. Springer, 2010, pp. 177–186. S. Gershman and N. Goodman, “Amortized inference in probabilistic reasoning,” in Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 36, no. 36, 2014. D. P. Kingma, “Variational inference & deep learning: A new synthesis,” 2017. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, and L. Bottou, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.” Journal of Machine Learning Research, vol. 11, no. 12, 2010. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in International Conference on Machine Learning. PMLR, 2015, pp. 1530–1538. D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever, and M. Welling, “Improved variational inference with inverse autoregressive flow,” Advances in Neural Information Processing Systems, vol. 29, 2016. E. Çınlar, Probability and stochastics. Springer, 2011, vol. 261. R. Ranganath, S. Gerrish, and D. Blei, “Black box variational inference,” in Artificial Intelligence and Statistics. PMLR, 2014, pp. 814–822. A. Mnih and K. Gregor, “Neural variational inference and learning in belief networks,” in International Conference on Machine Learning. PMLR, 2014, pp. 1791–1799. J. Paisley, D. Blei, and M. Jordan, “Variational Bayesian inference with stochastic search,” arXiv preprint arXiv:1206.6430, 2012. C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural network,” in International Conference on Machine Learning. PMLR, 2015, pp. 1613–1622. Y. Zhang, W. Liu, Z. Chen, J. Wang, and K. Li, “On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions,” arXiv preprint arXiv:2102.05485, 2021. X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P.
Abbeel, “InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 29, 2016. H. Kim and A. Mnih, “Disentangling by factorising,” in International Conference on Machine Learning. PMLR, 2018, pp. 2649–2658. S. Van Steenkiste, F. Locatello, J. Schmidhuber, and O. Bachem, “Are disentangled representations helpful for abstract visual reasoning?” Advances in Neural Information Processing Systems, vol. 32, 2019. W.-N. Hsu, Y. Zhang, and J. Glass, “Unsupervised learning of disentangled and interpretable representations from sequential data,” Advances in Neural Information Processing Systems, vol. 30, 2017. E. Creager, D. Madras, J.-H. Jacobsen, M. Weis, K. Swersky, T. Pitassi, and R. Zemel, “Flexibly fair representation learning by disentanglement,” in International Conference on Machine Learning. PMLR, 2019, pp. 1436–1445. T. Kehrenberg, M. Bartlett, O. Thomas, and N. Quadrianto, “Null-sampling for interpretable and fair representations,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16. Springer, 2020, pp. 565–580. A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, “Adversarial autoencoders,” arXiv preprint arXiv:1511.05644, 2015. D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational inference: A review for statisticians,” Journal of the American Statistical Association, vol. 112, no. 518, pp. 859–877, 2017. D. J. MacKay, “Local minima, symmetry-breaking, and model pruning in variational free energy minimization,” Inference Group, Cavendish Laboratory, Cambridge, UK, 2001. D. Wipf and S. Nagarajan, “A new view of automatic relevance determination,” Advances in Neural Information Processing Systems, vol. 20, 2007. T. Karaletsos and G. Rätsch, “Automatic relevance determination for deep generative models,” arXiv preprint arXiv:1505.07765, 2015. S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio, “Generating sentences from a continuous space,” arXiv preprint arXiv:1511.06349, 2015. C. K. Sønderby, T. Raiko, L. Maaløe, S. K. Sønderby, and O. Winther, “How to train deep variational autoencoders and probabilistic ladder networks,” arXiv preprint arXiv:1602.02282, vol. 3, no. 2, 2016. C. Li, H. Liu, C. Chen, Y. Pu, L. Chen, R. Henao, and L. Carin, “ALICE: Towards understanding adversarial learning for joint distribution matching,” Advances in Neural Information Processing Systems, vol. 30, 2017. M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville, and D. Hjelm, “Mutual information neural estimation,” in International Conference on Machine Learning. PMLR, 2018, pp. 531–540. M. D. Hoffman and M. J. Johnson, “ELBO surgery: yet another way to carve up the variational evidence lower bound,” in Workshop in Advances in Approximate Bayesian Inference, NIPS, vol. 1, no. 2, 2016. A. Makhzani and B. J. Frey, “PixelGAN autoencoders,” Advances in Neural Information Processing Systems, vol. 30, 2017. I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “beta-VAE: Learning basic visual concepts with a constrained variational framework,” in International Conference on Learning Representations, 2017. A. Achille and S. Soatto, “Emergence of invariance and disentanglement in deep representations,” The Journal of Machine Learning Research, vol. 19, no. 1, pp. 1947–1980, 2018. I. Tolstikhin, O. Bousquet, S. Gelly, and B.
Schoelkopf, “Wasserstein auto-encoders,” arXiv preprint arXiv:1711.01558, 2017. S. Watanabe, “Information theoretical analysis of multivariate correlation,” IBM Journal of Research and Development, vol. 4, no. 1, pp. 66–82, 1960. G. E. Hinton and R. Zemel, “Autoencoders, minimum description length and Helmholtz free energy,” Advances in Neural Information Processing Systems, vol. 6, 1993. Y. Dubois, A. Kastanos, D. Lines, and B. Melman, “Disentangling VAE,” http://github.com/YannDubs/disentangling-vae/, March 2019. P. Brakel and Y. Bengio, “Learning independent features with adversarial nets for non-linear ICA,” arXiv preprint arXiv:1710.05050, 2017. C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner, “Understanding disentangling in β-VAE,” arXiv preprint arXiv:1804.03599, 2018. W. R. Garner, “Uncertainty and structure as psychological concepts.” 1962. M. Studenỳ and J. Vejnarová, “The multiinformation function as a tool for measuring stochastic dependence.” Learning in Graphical Models, vol. 89, pp. 261–297, 1998. N. Slonim, N. Friedman, and N. Tishby, “Multivariate information bottleneck,” Neural computation, vol. 18, no. 8, pp. 1739–1789, 2006. M. Sugiyama, T. Suzuki, and T. Kanamori, “Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation,” Annals of the Institute of Statistical Mathematics, vol. 64, pp. 1009–1044, 2012. X. Nguyen, M. J. Wainwright, and M. I. Jordan, “On surrogate loss functions and f-divergences,” 2009. Y. Song, M. Xu, L. Yu, H. Zhou, S. Shao, and Y. Yu, “Infomax neural joint source-channel coding via adversarial bit flip,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 5834–5841. M. A. Arcones and E. Gine, “On the bootstrap of U and V statistics,” The Annals of Statistics, pp. 655–674, 1992. C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár, “Amortised MAP inference for image super-resolution,” arXiv preprint arXiv:1610.04490, 2016. M. Arjovsky and L. Bottou, “Towards principled methods for training generative adversarial networks,” arXiv preprint arXiv:1701.04862, 2017. T. Rainforth, R. Cornish, H. Yang, A. Warrington, and F. Wood, “On nesting Monte Carlo estimators,” in International Conference on Machine Learning. PMLR, 2018, pp. 4267–4276. K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989. S. A. van de Geer, Empirical processes in M-estimation. Cambridge University Press, 2000, vol. 6. M. U. Gutmann and A. Hyvärinen, “Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics.” Journal of Machine Learning Research, vol. 13, no. 2, 2012. A. Mnih and Y. W. Teh, “A fast and simple algorithm for training neural probabilistic language models,” arXiv preprint arXiv:1206.6426, 2012. A. Mnih and K. Kavukcuoglu, “Learning word embeddings efficiently with noise-contrastive estimation,” Advances in Neural Information Processing Systems, vol. 26, 2013. M. Magdon-Ismail and A. Atiya, “Neural networks for density estimation,” Advances in Neural Information Processing Systems, vol. 11, 1998. Y.-J. Luo, K. Agres, and D. Herremans, “Learning disentangled representations of timbre and pitch for musical instrument sounds using Gaussian mixture variational autoencoders,” arXiv preprint arXiv:1906.08152, 2019. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J.
Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, vol. 26, 2013. G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580, 2012. L. Mescheder, S. Nowozin, and A. Geiger, “Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 2391–2400. Z. Ma and M. Collins, “Noise contrastive estimation and negative sampling for conditional models: Consistency and statistical efficiency,” arXiv preprint arXiv:1809.01812, 2018. I. Fischer, “The conditional entropy bottleneck,” Entropy, vol. 22, no. 9, p. 999, 2020. V. Anantharam, A. Gohari, S. Kamath, and C. Nair, “On hypercontractivity and a data processing inequality,” in 2014 IEEE International Symposium on Information Theory. IEEE, 2014, pp. 3022–3026. Y. Polyanskiy and Y. Wu, “Strong data-processing inequalities for channels and Bayesian networks,” arXiv preprint arXiv:1508.06025, 2015. Y. Dubois, D. Kiela, D. J. Schwab, and R. Vedantam, “Learning optimal representations with the decodable information bottleneck,” Advances in Neural Information Processing Systems, vol. 33, pp. 18 674–18 690, 2020. S. Soatto and A. Chiuso, “Visual representations: Defining properties and deep approximations,” arXiv preprint arXiv:1411.7676, 2014. B. Jiang, T.-y. Wu, C. Zheng, and W. H. Wong, “Learning summary statistic for approximate Bayesian computation via deep neural network,” Statistica Sinica, pp. 1595–1618, 2017. M. Cvitkovic and G. Koliander, “Minimal achievable sufficient statistic learning,” in International Conference on Machine Learning. PMLR, 2019, pp. 1465–1474. K. Takeuchi and M. Akahira, “Characterizations of prediction sufficiency (adequacy) in terms of risk functions,” in Joint Statistical Papers Of Akahira And Takeuchi. World Scientific, 2003, pp. 1–7. M. Hayashi and V. Y. Tan, “Minimum rates of approximate sufficient statistics,” IEEE Transactions on Information Theory, vol. 64, no. 2, pp. 875–888, 2017. N. Iri and O. Kosut, “Fine asymptotics for universal one-to-one compression of parametric sources,” IEEE Transactions on Information Theory, vol. 65, no. 4, pp. 2442–2458, 2019. G. Chechik and N. Tishby, “Extracting relevant structures with side information,” Advances in Neural Information Processing Systems, vol. 15, 2002. T. Wu, I. Fischer, I. L. Chuang, and M. Tegmark, “Learnability for the information bottleneck,” in Uncertainty in Artificial Intelligence. PMLR, 2020, pp. 1050–1060. S. S. Haykin, Neural networks and learning machines, 3rd ed. Upper Saddle River, NJ: Pearson Education, 2009. N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle,” in 2015 IEEE Information Theory Workshop (ITW). IEEE, 2015, pp. 1–5. X. B. Peng, A. Kanazawa, S. Toyer, P. Abbeel, and S. Levine, “Variational discriminator bottleneck: Improving imitation learning, inverse RL, and GANs by constraining information flow,” arXiv preprint arXiv:1810.00821, 2018. A. Zaidi, I. Estella-Aguerri, and S. Shamai, “On the information bottleneck problems: Models, connections, applications and information theoretic views,” Entropy, vol. 22, no. 2, p. 151, 2020. R. Shwartz-Ziv and N. Tishby, “Opening the black box of deep neural networks via information,” arXiv preprint arXiv:1703.00810, 2017. A. M. Saxe, Y. Bansal, J.
Dapello, M. Advani, A. Kolchinsky, B. D. Tracey, and D. D. Cox, “On the information bottleneck theory of deep learning,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2019, no. 12, p. 124020, 2019. R. A. Amjad and B. C. Geiger, “Learning representations for neural network-based classification using the information bottleneck principle,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 9, pp. 2225–2239, 2019. Z. Goldfeld and Y. Polyanskiy, “The information bottleneck problem and its applications in machine learning,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 19–38, 2020. A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” Advances in Neural Information Processing Systems, vol. 29, 2016. N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 39–57. D. Böhning, “Multinomial logistic regression algorithm,” Annals of the Institute of Statistical Mathematics, vol. 44, no. 1, pp. 197–200, 1992. M. Chalk, O. Marre, and G. Tkacik, “Relevant sparse codes with variational information bottleneck,” Advances in Neural Information Processing Systems, vol. 29, 2016. A. Kolchinsky, B. D. Tracey, and D. H. Wolpert, “Nonlinear information bottleneck,” Entropy, vol. 21, no. 12, p. 1181, 2019. D. Y. Pan, “Digital audio compression,” Digital Technical Journal, vol. 5, no. 2, pp. 28–40, 1993. D. S. Taubman, M. W. Marcellin, and M. Rabbani, “JPEG2000: Image compression fundamentals, standards and practice,” Journal of Electronic Imaging, vol. 11, no. 2, pp. 286–287, 2002. J.-S. Lee and T. Ebrahimi, “Perceptual video compression: A survey,” IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 6, pp. 684–697, 2012. J. Alakuijala and V. Rabaud, “Lossless and transparency encoding in WebP,” URL https://developers.google, 2017. D. Minnen, J. Ballé, and G. D. Toderici, “Joint autoregressive and hierarchical priors for learned image compression,” Advances in Neural Information Processing Systems, vol. 31, 2018. F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. Van Gool, “Conditional probability models for deep image compression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4394–4402. M. Li, W. Zuo, S. Gu, D. Zhao, and D. Zhang, “Learning convolutional networks for content-weighted image compression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3214–3223. Y. Yang, R. Bamler, and S. Mandt, “Improving inference for neural image compression,” Advances in Neural Information Processing Systems, vol. 33, pp. 573–584, 2020. D. Minnen and S. Singh, “Channel-wise autoregressive entropy models for learned image compression,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 3339–3343. Y. Blau and T. Michaeli, “Rethinking lossy compression: The rate-distortion-perception tradeoff,” in International Conference on Machine Learning. PMLR, 2019, pp. 675–685. E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. V. Gool, “Generative adversarial networks for extreme learned image compression,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 221–231. F. Mentzer, G. D. Toderici, M. Tschannen, and E. Agustsson, “High-fidelity generative image compression,” Advances in Neural Information Processing Systems, vol. 33, pp.
11 913–11 924, 2020. L.-H. Chen, C. G. Bampis, Z. Li, A. Norkin, and A. C. Bovik, “Perceptually optimizing deep image compression,” arXiv preprint arXiv:2007.02711, 2020. H. Hafez-Kolahi, B. Moniri, S. Kasaei, and M. S. Baghshah, “Rate-distortion analysis of minimum excess risk in Bayesian learning,” in International Conference on Machine Learning. PMLR, 2021, pp. 3998–4007. A. Xu and M. Raginsky, “Minimum excess risk in Bayesian learning,” IEEE Transactions on Information Theory, vol. 68, no. 12, pp. 7935–7955, 2022. L. Ericsson, H. Gouk, and T. M. Hospedales, “Why do self-supervised models transfer? Investigating the impact of invariance on downstream tasks,” arXiv preprint arXiv:2111.11398, 2021. J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar et al., “Bootstrap your own latent-a new approach to self-supervised learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 21 271–21 284, 2020. P. H. Richemond, J.-B. Grill, F. Altché, C. Tallec, F. Strub, A. Brock, S. Smith, S. De, R. Pascanu, B. Piot et al., “BYOL works even without batch statistics,” arXiv preprint arXiv:2010.10241, 2020. X. Chen and K. He, “Exploring simple siamese representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15 750–15 758. C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, pp. 1–48, 2019. T. Dao, A. Gu, A. Ratner, V. Smith, C. De Sa, and C. Ré, “A kernel theory of modern data augmentation,” in International Conference on Machine Learning. PMLR, 2019, pp. 1528–1537. S. Chen, E. Dobriban, and J. H. Lee, “A group-theoretic framework for data augmentation,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 9885–9955, 2020. C. Lyle, M. van der Wilk, M. Kwiatkowska, Y. Gal, and B. Bloem-Reddy, “On the benefits of invariance in neural networks,” arXiv preprint arXiv:2005.00178, 2020. J. Bruna and S. Mallat, “Invariant scattering convolution networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1872–1886, 2013. T. Cohen and M. Welling, “Group equivariant convolutional networks,” in International Conference on Machine Learning. PMLR, 2016, pp. 2990–2999. M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” Advances in Neural Information Processing Systems, vol. 30, 2017. R. Kondor and S. Trivedi, “On the generalization of equivariance and convolution in neural networks to the action of compact groups,” in International Conference on Machine Learning. PMLR, 2018, pp. 2747–2755. B. Bloem-Reddy and Y. W. Teh, “Probabilistic symmetries and invariant neural networks,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 3535–3595, 2020. J. Mitrovic, B. McWilliams, J. Walker, L. Buesing, and C. Blundell, “Representation learning via invariant causal mechanisms,” arXiv preprint arXiv:2010.07922, 2020. M. L. Eaton, “Group invariance applications in statistics.” IMS, 1989. E. L. Lehmann, J. P. Romano, and G. Casella, Testing statistical hypotheses. Springer, 2005, vol. 3. P. R. Halmos and L. J. Savage, “Application of the Radon-Nikodym theorem to the theory of sufficient statistics,” The Annals of Mathematical Statistics, vol. 20, no. 2, pp. 225–241, 1949. J. Robinson, L. Sun, K. Yu, K. Batmanghelich, S. Jegelka, and S.
Sra, “Can contrastive learning avoid shortcut solutions?” Advances in Neural Information Processing Systems, vol. 34, pp. 4974–4986, 2021. A. P. Dawid, “Conditional independence in statistical theory,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 41, no. 1, pp. 1–15, 1979. T. Gneiting and A. E. Raftery, “Strictly proper scoring rules, prediction, and estimation,” Journal of the American Statistical Association, vol. 102, no. 477, pp. 359–378, 2007. A. Kolchinsky, B. D. Tracey, and S. Van Kuyk, “Caveats for information bottleneck in deterministic scenarios,” arXiv preprint arXiv:1808.07593, 2018. D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” Advances in Neural Information Processing Systems, vol. 28, 2015. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014. S. Wang and C. Manning, “Fast dropout training,” in International Conference on Machine Learning. PMLR, 2013, pp. 118–126. Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in International Conference on Machine Learning. PMLR, 2016, pp. 1050–1059. E. Goan and C. Fookes, “Bayesian neural networks: An introduction and survey,” Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018, pp. 45–87, 2020. D. Molchanov, A. Ashukha, and D. Vetrov, “Variational dropout sparsifies deep neural networks,” in International Conference on Machine Learning. PMLR, 2017, pp. 2498–2507. R. W. Hamming, “On the distribution of numbers,” The Bell System Technical Journal, vol. 49, no. 8, pp. 1609–1625, 1970. G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proceedings of the Sixth Annual Conference on Computational Learning Theory, 1993, pp. 5–13. A. Honkela and H. Valpola, “Variational learning and bits-back coding: an information-theoretic view to Bayesian learning,” IEEE Transactions on Neural Networks, vol. 15, no. 4, pp. 800–810, 2004. “IEEE Standard for Floating-Point Arithmetic,” IEEE Std 754-2019 (Revision of IEEE 754-2008), pp. 1–84, 2019. “Variational Dropout Sparsifies Deep Neural Networks,” https://github.com/bayesgroup/variational-dropout-sparsifies-dnn, accessed: 2023-02-20. M. Figurnov, A. Ibraimova, D. P. Vetrov, and P. Kohli, “PerforatedCNNs: Acceleration through elimination of redundant convolutions,” Advances in Neural Information Processing Systems, vol. 29, 2016. V. Lebedev and V. Lempitsky, “Fast convnets using group-wise brain damage,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2554–2564. K. Neklyudov, D. Molchanov, A. Ashukha, and D. P. Vetrov, “Structured Bayesian pruning via log-normal multiplicative noise,” Advances in Neural Information Processing Systems, vol. 30, 2017. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92417 | - |
| dc.description.abstract | 當前,眾多智能服務仰賴深度神經網絡的推斷,然而在行動裝置(如智慧型手機和感測器)上執行它們需要耗費大量資源。此外,某些實際應用牽涉到基於多個子任務的決策,並且需要即時響應。例如,在自動駕駛中,同時處理物體檢測和車道跟蹤。為了使下游任務維持可接受的性能,一種方法是將部分資源密集型運算轉移到靠近資訊來源的強大邊緣伺服器,同時將每個樣本的充分但最小語義表徵從裝置上行傳輸。
根據上述的任務導向通信,我們協同裝置和邊緣,建立了一個通用而適應性強的推理框架,以實現多任務無損預測。與之前僅適用於少數預定義任務的工作不同,此框架對各種潛在應用任務的適用性僅受到其損失類型(如對數損失)的限制,而且對裝置本身而言這些任務可以完全未知。本論文從消息理論角度推導並解釋正規化特定通信約束的目標函數,順帶展示了其與機器學習中多個熱門領域引人入勝的交集。 實作中的任務導向通信系統包括一個通用的圖像發射器,對於動態通道條件和語義內容具有穩健性。針對可變長度的特徵編碼和連續觸發的神經元,我們設計了可微分的資料流和神經網絡模組,以最小化通信開銷。此外,我們修改了變分丟棄層,並修剪標準化後的激活數值,以在類比傳輸過程中引入符號稀疏性。延遲和多任務分類準確度之間的不同權衡證明了所提架構的有效性,也證實了我們的見解。與資料導向的信源通道聯合編碼相比,免去高維重建導致了較低的延遲和減少的計算需求。 隨著機器學習的不斷發展,這個原型有望套用各個提到的研究領域中更進階的技術,為未來在各種實際應用中帶來令人興奮的前景。 | zh_TW |
| dc.description.abstract | Many intelligent services rely on inference with deep neural networks, which is too resource-intensive to run efficiently on mobile devices such as smartphones and sensors. In addition, some real-world applications make decisions based on multiple subtasks and demand real-time responses; in autonomous driving, for example, object detection and lane tracking must be handled simultaneously. To maintain acceptable performance on the downstream tasks, one approach is to offload part of the heavy computation to a powerful edge server located close to the data source, while a sufficient but minimal semantic representation of each sample is transmitted uplink from the device. In terms of such task-oriented communication, this work establishes a general and adaptable device-edge co-inference framework for multi-task lossless prediction. In contrast to previous works that suit only a few pre-defined tasks, the potentially applicable tasks in our framework are restricted only by their loss type (e.g., log loss) and can be completely unknown to the device. The objective function, regularized by the given communication constraints, is derived and interpreted from an information-theoretic view, and it reveals a fascinating intersection of several active fields in machine learning. The practical task-oriented communication system designed here involves a generic image transmitter that is robust to dynamic channel conditions and varying semantic content. Differentiable data flows and network modules are designed for variable-length feature encodings and consecutive neuron activations to minimize the communication overhead. Besides, variational dropout is modified to prune normalized activations, inducing dimensional sparsity in the analog transmission. Experiments with varying trade-offs between latency and multi-task classification accuracy demonstrate the effectiveness of the proposed framework and corroborate our insights. Compared with data-oriented joint source-channel coding, no high-dimensional reconstruction is needed, which leads to lower latency and reduced computational requirements. As machine learning continues to advance, this prototype stands to benefit from improved techniques in each of the research areas mentioned, promising exciting prospects for future implementations in a wide array of applications. | en |
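The abstract describes an information-theoretically regularized objective: a distortion term given by the tasks' log loss and a rate term constraining how much information the transmitted representation carries. As a rough, generic illustration of that flavor of objective (a variational information bottleneck trained through a noisy channel; this is not the thesis's actual LBTOCS code, and all names, dimensions, and the AWGN noise level here are hypothetical placeholders), a PyTorch-style sketch might look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

# Hypothetical sizes and trade-off weight, for illustration only.
X_DIM, Z_DIM, NUM_CLASSES, BETA = 784, 16, 10, 1e-2

class StochasticEncoder(nn.Module):
    """On-device network: maps an input x to a Gaussian posterior q(z|x)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(X_DIM, 128), nn.ReLU())
        self.mu = nn.Linear(128, Z_DIM)
        self.log_std = nn.Linear(128, Z_DIM)

    def forward(self, x):
        h = self.backbone(x)
        return Normal(self.mu(h), self.log_std(h).exp())

encoder = StochasticEncoder()
classifier = nn.Linear(Z_DIM, NUM_CLASSES)             # server-based task head
prior = Normal(torch.zeros(Z_DIM), torch.ones(Z_DIM))  # marginal surrogate

def vib_loss(x, y, channel_std=0.1):
    """Distortion (task log loss) + BETA * rate (KL upper bound on I(X;Z))."""
    q_z = encoder(x)
    z = q_z.rsample()                          # reparameterization trick
    z = z + channel_std * torch.randn_like(z)  # AWGN channel seen in training
    distortion = F.cross_entropy(classifier(z), y)
    rate = kl_divergence(q_z, prior).sum(-1).mean()
    return distortion + BETA * rate

# Example usage with random data:
x = torch.randn(32, X_DIM)
y = torch.randint(0, NUM_CLASSES, (32,))
vib_loss(x, y).backward()
```

Sweeping the BETA coefficient traces out different rate-distortion operating points, which is the kind of trade-off between communication overhead and multi-task accuracy the abstract refers to.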
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-22T16:24:41Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-03-22T16:24:41Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 1 Introduction 1
1.1 Background 1
1.2 Contributions 6
1.3 Organization 8
1.4 Notations 10
2 Loss-Based Task-Oriented Communication Framework 12
2.1 High-Level Framework 13
2.2 Data-Oriented vs. Task-Oriented Communication 16
2.3 Multi-Task Lossless Prediction 17
2.4 Mutual Information (MI) Estimation and Optimization 18
2.5 Quadrature Amplitude Modulation (QAM) and Communication Overhead 19
2.6 Fault Tolerance and Nuisance Invariance 20
2.7 Dynamic Channel Conditions and Source Semantics 21
3 Proposed Algorithms and Connections 23
3.1 Variational Objectives 23
3.1.1 Mutual Information (MI) Bottleneck for the Rate Term 24
3.1.2 Direct Distortion for the Distortion Term 25
3.1.3 Contrastive Distortion for the Distortion Term 26
3.2 Variational Invariant Compressor (VIC) 27
3.3 Bottleneck Information Noise Contrastive Estimation (BINCE) 28
3.4 Extended Sparse Variational Dropout 29
3.4.1 Activation Uncertainty 29
3.4.2 Activation Sparsity 32
3.5 Connections to Other Frameworks 33
4 Loss-Based Task-Oriented Communication System (LBTOCS) 36
4.1 Deterministic vs. Stochastic Representations 37
4.2 Channel Noise Variances 38
4.3 Data Flow 39
4.4 Loss Functions 41
4.4.1 On-Device Network 41
4.4.2 Server-Based Network 44
4.5 Network Architectures 45
4.5.1 Extractor 46
4.5.2 Thresholder 46
4.5.3 Referencer 47
4.5.4 Encoder 47
4.5.5 Activator 48
4.5.6 Dynamic Channel 52
4.5.7 Projector 52
4.5.8 Discriminator 53
4.5.9 Classifier 54
4.6 Learning Algorithms 54
4.6.1 On-Device Network 55
4.6.2 Server-Based Network 57
5 Experiments 58
5.1 Simulation Settings 58
5.2 System Trade-Offs 61
5.3 Performance Comparisons 68
5.4 Potential Failures 74
6 Conclusions 77
7 Future Work 79
Bibliography 82
A Preliminaries 168
A.1 Mutual Information (MI) 168
A.2 Dual Representations for Kullback-Leibler (KL) Divergence 169
A.3 Ultimate Shannon Limit 170
A.4 Rate-Distortion Theory 171
A.5 Network Expressiveness in Variational Inference (VI) 172
A.6 Autoencoder 173
A.7 Generative Adversarial Network (GAN) 174
A.8 Reinforcement Learning (RL) 175
A.9 Bayes Risk 178
A.10 Equivalence Relation 179
A.11 Solution to the KL Divergence in Gaussian Case 179
A.12 Reconstruction Error and Mean Squared Error (MSE) 180
B Related Works 181
B.1 End-to-End (E2E) Physical Layer Communication Systems 181
B.2 Stochastic Channel Model Approximations 186
B.3 Reinforcement Learning (RL)-Based Communication Systems 189
B.4 Semantic Source and Channel Coding 194
B.5 Semantic Communication Systems 201
C Deep Latent Variable Model (DLVM) 208
C.1 Maximum Likelihood Estimation (MLE) and Kullback-Leibler (KL) Divergence 209
C.2 Intractabilities 210
C.3 Traditional Evidence Lower Bound (ELBO) 211
D Variational Autoencoder (VAE) 213
D.1 Evidence Lower Bound (ELBO) 214
D.2 Blurriness of the Generative Model 215
D.3 Maximum Likelihood Estimation (MLE) and Evidence Lower Bound (ELBO) 217
D.4 Stochastic Gradient Optimization 217
D.5 Reparameterization Trick 219
D.6 Stochastic Gradient Variational Bayes (SGVB) 220
D.7 Variational Autoencoder (VAE) 221
E Disentangled Representation 224
E.1 Warm-Up Scheduling 225
E.2 Reconstruction Decomposition 225
E.3 Kullback-Leibler (KL) Divergence Regularizer Decomposition 227
E.4 Regularization Coefficients 229
E.5 Total Correlation (TC) 230
F Variational Bounds of Mutual Information (MI) 233
F.1 Lower Bounds 233
F.1.1 Barber-Agakov (BA) Lower Bound 234
F.1.2 Mutual Information Neural Estimation (MINE) Lower Bound 235
F.1.3 Noise Contrastive Estimation (NCE) Lower Bound 238
F.2 Upper Bounds 241
F.2.1 Variational Upper Bound (VUB) 241
F.2.2 Leave-One-Out (L1Out) Upper Bound 243
F.2.3 Contrastive Log-Ratio Upper Bound (CLUB) 245
F.3 Unified Framework 251
F.3.1 Normalized Upper and Lower Bounds 252
F.3.2 Unnormalized Lower Bounds 252
F.3.3 Multi-Sample Unnormalized Lower Bounds 256
F.3.4 Nonlinearly Interpolated Lower Bounds 258
F.3.5 Structured Bounds with Tractable Encoders 259
G Information Bottleneck (IB) 262
G.1 Minimal Sufficient Statistics (MSS) 262
G.2 Relevance through Another Variable 264
G.3 Information Bottleneck (IB) Principle 265
G.4 Variational Information Bottleneck (VIB) 267
G.5 Improved Works 269
H Lossy Compression for Lossless Prediction 271
H.1 Assumptions and Definitions 271
H.1.1 Distortion for Worst-Case Predictive Performance 272
H.1.2 Invariant Tasks 273
H.1.3 Maximal Invariants 274
H.1.4 Correct Data Augmentations 275
H.2 Optimal Bit-Rate for Log Loss 276
H.3 Unsupervised Invariant Neural Compressors 281
I Sparse Variational Dropout 284
I.1 Dropout 284
I.2 Weight Uncertainty 287
I.3 Bayesian Inference 288
I.4 Scale-Invariant Log-Uniform Prior 291
I.5 Weight Sparsity and Structured Group Sparsity 295
J Detailed Network Architectures 298 | - |
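Section 3.4 and Appendix I of the table of contents build on sparse variational dropout, which the abstract says is modified to prune normalized activations. For orientation only, the sketch below shows the standard weight-space machinery from the Molchanov et al. (2017) paper cited in the bibliography: the fitted approximation to the negative KL against the log-uniform prior, and the usual log-alpha pruning rule (the threshold 3.0 follows that paper's released code, also linked in the bibliography). The thesis's activation-space variant differs; `neg_kl_approx`, `prune_mask`, and the example values are illustrative names and numbers, not taken from the thesis.

```python
import torch
import torch.nn.functional as F

# Fitted constants from Molchanov et al., "Variational Dropout Sparsifies
# Deep Neural Networks" (ICML 2017).
K1, K2, K3 = 0.63576, 1.87320, 1.48695

def neg_kl_approx(log_alpha: torch.Tensor) -> torch.Tensor:
    """Approximate -KL(q(w | theta, alpha) || log-uniform prior) per weight,
    up to an additive constant; added to the ELBO, it rewards large alpha
    (high dropout rate) and thereby induces sparsity."""
    return (K1 * torch.sigmoid(K2 + K3 * log_alpha)
            - 0.5 * F.softplus(-log_alpha)   # = 0.5 * log(1 + 1/alpha)
            - K1)

def prune_mask(log_alpha: torch.Tensor, threshold: float = 3.0) -> torch.Tensor:
    """Components with log(alpha) above the threshold are dominated by
    multiplicative noise and can be dropped at inference time."""
    return log_alpha < threshold  # True = keep, False = prune

# Example: near-deterministic components are kept, noisy ones pruned.
log_alpha = torch.tensor([-4.0, 0.0, 5.0])
print(neg_kl_approx(log_alpha))  # less negative (better) for larger alpha
print(prune_mask(log_alpha))     # tensor([ True,  True, False])
```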
| dc.language.iso | en | - |
| dc.subject | 可變長度特徵編碼 | zh_TW |
| dc.subject | 任務導向通信 | zh_TW |
| dc.subject | 隨機表徵學習 | zh_TW |
| dc.subject | 深度變分推斷 | zh_TW |
| dc.subject | 抗干擾 | zh_TW |
| dc.subject | 多任務學習 | zh_TW |
| dc.subject | 裝置邊緣協同推理 | zh_TW |
| dc.subject | Deep variational inference | en |
| dc.subject | Task-oriented communication | en |
| dc.subject | Device-edge co-inference | en |
| dc.subject | Multi-task learning | en |
| dc.subject | Nuisance invariance | en |
| dc.subject | Stochastic representation learning | en |
| dc.subject | Variable-length feature encoding | en |
| dc.title | 利用深度變分學習優化由消息理論建立基於損失之任務導向通信的抗干擾表徵 | zh_TW |
| dc.title | Deep Variational-Enabled Information-Theoretic Representation Learning with Nuisance Invariance to Loss-Based Task-Oriented Communication | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-1 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 佟澤陽;王奕翔;林士駿;李佳翰 | zh_TW |
| dc.contributor.oralexamcommittee | Tze-Yang Tung;I-Hsiang Wang;Shih-Chun Lin;Chia-Han Lee | en |
| dc.subject.keyword | 任務導向通信,裝置邊緣協同推理,多任務學習,抗干擾,深度變分推斷,隨機表徵學習,可變長度特徵編碼 | zh_TW |
| dc.subject.keyword | Task-oriented communication, Device-edge co-inference, Multi-task learning, Nuisance invariance, Deep variational inference, Stochastic representation learning, Variable-length feature encoding | en |
| dc.relation.page | 300 | - |
| dc.identifier.doi | 10.6342/NTU202304514 | - |
| dc.rights.note | Consent granted (worldwide open access) | - |
| dc.date.accepted | 2023-12-18 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Graduate Institute of Communication Engineering | - |
| dc.date.embargo-lift | 2028-12-14 | - |
| Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-1.pdf (publicly available online after 2028-12-14) | 7.06 MB | Adobe PDF |