NTU Theses and Dissertations Repository / College of Electrical Engineering and Computer Science / Department of Computer Science and Information Engineering
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88132
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 李明穗 | zh_TW
dc.contributor.advisor | Ming-Sui Lee | en
dc.contributor.author | 許雁棋 | zh_TW
dc.contributor.author | Yen-Chi Hsu | en
dc.date.accessioned | 2023-08-08T16:26:28Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-08 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-07-17 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88132 | -
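dc.description.abstract | This thesis comprehensively explores feature representations for challenging classification tasks. The work focuses on four key aspects: learning with multi-instance data distributions, unlabeled data distributions, real-world data distributions, and sequential (ordering) data distributions. First, for multi-instance data, we introduce a novel cross-attention pooling method that, combined with attention guidance, effectively represents a set of instances given a specific query. The proposed method captures key features and achieves accurate classification. Next, to address the challenge of unlabeled data distributions, we propose a decoupled contrastive learning framework. The framework alleviates the reliance of contrastive learning on large batches, and we discuss how various approaches affect downstream classification tasks. Then, to meet the distinct challenges posed by real-world data distributions, such as fine-grained and long-tailed problems, we propose an adaptive batch confusion norm (ABC-Norm), which addresses both problems simultaneously and enables representation learning tailored to real-world scenarios. Finally, for the representation of deepfake images that involve multiple manipulated components and an ordering problem, we decompose the problem into deepfake classification, multi-label localization, and manipulation-order recovery, and propose a multi-label ranking mechanism combined with a contrastive multi-instance setting to recover the sequential data distribution. Through extensive experiments, this thesis makes significant contributions to representation learning for classification: it discusses state-of-the-art methods and, for the challenges in each aspect, proposes novel approaches that achieve strong results. | zh_TW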
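The abstract's first aspect describes pooling a bag of instances with cross-attention conditioned on a query. As a minimal sketch, assuming plain scaled dot-product attention with no learned projections and no attention guidance, the pooling step might look like the code below. `query_attention_pool` and its helpers are hypothetical names chosen for illustration; this is not the thesis's qMIL implementation (Section 2.3.2 in the table of contents).

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_attention_pool(
    query: Vector, bag: Sequence[Vector]
) -> Tuple[List[float], List[float]]:
    """Pool a bag of instance embeddings into one vector, conditioned on a query.

    Hypothetical sketch of scaled dot-product cross-attention: the query
    scores every instance, the scores are softmax-normalized into weights,
    and the bag embedding is the weighted sum of the instances.
    Learned query/key/value projections are deliberately omitted.
    """
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    d = len(query)
    if any(len(x) != d for x in bag):
        raise ValueError("query and instances must share one dimension")
    weights = softmax([dot(query, x) / math.sqrt(d) for x in bag])
    pooled = [sum(w * x[j] for w, x in zip(weights, bag)) for j in range(d)]
    return weights, pooled


if __name__ == "__main__":
    bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, pooled = query_attention_pool([1.0, 0.0], bag)
    # Instances that align with the query receive larger weights.
    print([round(w, 3) for w in weights])
    print([round(v, 3) for v in pooled])
```

In a trained model the query and instances would typically pass through learned projections before scoring; they are left out here so the query-dependent weighting of the bag stays visible.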
dc.description.abstract | This thesis presents a comprehensive exploration of feature representations for challenging classification tasks. The research focuses on four key aspects: learning with multi-instance data distributions, unlabeled data distributions, real-world data distributions, and ordering data distributions.

In the context of multi-instance data, we introduce a novel cross-attention pooling approach that incorporates attention guidance to represent a bag of instances with respect to a specific query. The proposed method captures the features that matter for accurate classification. To address unlabeled data distributions, we propose a decoupled contrastive learning framework that reduces contrastive learning's reliance on large batch sizes, and we discuss how various approaches affect downstream classification tasks. Real-world data distributions pose distinct challenges, notably fine-grained categories and long-tailed class distributions. To tackle both at once, we present an adaptive batch confusion norm (ABC-Norm) that enables the learning of robust feature representations tailored to real-world scenarios. Finally, we address the representation of deepfake images, which involve multiple manipulated components whose order must also be recovered. We decompose the problem into deepfake classification, multi-label localization, and manipulation ordering tasks, and propose a multi-label ranking mechanism, combined with a contrastive multi-instance scenario, to recover the ordering data distributions.

Through algorithmic design and extensive experimentation, this thesis contributes to the advancement of representation learning for classification tasks. It discusses state-of-the-art methodologies, pinpoints the challenges associated with each aspect, and proposes effective research approaches. The findings of this research provide useful insights into the field of representation learning for tackling challenging classification tasks.
en
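The abstract names two of the building blocks but does not give their formulations. Purely as an illustration, the NumPy sketch below shows generic versions of both under our own assumptions: (1) a query-conditioned attention pooling that weights the instances in a bag by scaled dot-product similarity to a query, and (2) a contrastive loss that is "decoupled" by dropping the positive pair from the InfoNCE denominator, so only negatives remain there. The function names, the temperature `tau`, and the scoring function are hypothetical choices; the thesis's attention guidance, regularization, and exact losses are not reproduced here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def query_attention_pool(query, instances):
    """Hypothetical query-conditioned pooling of a bag of instances.

    query:     (d,)   feature of the query (e.g. a class embedding)
    instances: (n, d) features of the n instances in the bag
    Returns the pooled (d,) bag vector and the (n,) attention weights.
    """
    d = query.shape[-1]
    scores = instances @ query / np.sqrt(d)    # scaled dot-product scores
    scores = scores - scores.max()             # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ instances, weights


def decoupled_contrastive_loss(z1, z2, tau=0.1):
    """InfoNCE-style loss with the positive pair removed from the denominator.

    z1, z2: (b, d) embeddings of two augmented views; row i of z1 and row i
    of z2 form a positive pair, and all other samples act as negatives.
    """
    b = z1.shape[0]
    if b < 2:
        raise ValueError("need at least 2 samples so every anchor has negatives")
    z = l2_normalize(np.concatenate([z1, z2], axis=0))  # (2b, d)
    sim = z @ z.T / tau                                  # (2b, 2b) similarities
    idx = np.arange(2 * b)
    pos_idx = (idx + b) % (2 * b)                        # index of the other view
    pos = sim[idx, pos_idx]
    # Keep only negatives: mask out self-similarity and the positive pair.
    mask = np.ones_like(sim, dtype=bool)
    mask[idx, idx] = False
    mask[idx, pos_idx] = False
    neg = np.where(mask, sim, -np.inf)
    # Stable log-sum-exp over the negatives of each anchor.
    m = neg.max(axis=1, keepdims=True)
    log_neg = (m + np.log(np.exp(neg - m).sum(axis=1, keepdims=True))).ravel()
    return float(np.mean(-pos + log_neg))
```

For example, with two orthogonal one-hot embeddings used as both views, `decoupled_contrastive_loss(np.eye(2), np.eye(2), tau=1.0)` gives log 2 − 1. Because the positive term no longer appears in the denominator, this loss can be negative, unlike standard InfoNCE.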
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-08T16:26:28Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-08-08T16:26:28Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Learning with Multi-Instance Data Distributions
  2.1 Introduction
  2.2 Related Work
  2.3 Approach
    2.3.1 The qMIL Problem
    2.3.2 Query-adapted Attention Pooling
    2.3.3 Loss Function and Regularization
    2.3.4 Zero-shot Classification via Queries
  2.4 Experimental Results
    2.4.1 Data Sampling
    2.4.2 Training and Inference
    2.4.3 Standard MIL
    2.4.4 MIML
    2.4.5 MIML for Video Applications
    2.4.6 Zero-shot Scenarios
  2.5 Conclusions
Chapter 3 Learning with Unlabeled Data Distributions
  3.1 Introduction
  3.2 Related Work
  3.3 Decouple Negative and Positive Samples in Contrastive Learning
  3.4 Experiments
    3.4.1 Implementation Details
    3.4.2 Experiments and Analysis
    3.4.3 Ablations
  3.5 Discussion
  3.6 Conclusion
Chapter 4 Learning with Real-World Data Distributions
  4.1 Introduction
  4.2 Related Work
    4.2.1 Fine-grained visual classification
    4.2.2 Long-tailed visual recognition
  4.3 Our Method
    4.3.1 Adaptive Batch Confusion Norm
    4.3.2 ABC-Norm: Justifications and Properties
    4.3.3 ABC-Norm vs. Relevant Regularization
  4.4 Experiments
    4.4.1 Datasets
    4.4.2 Implementation Details
    4.4.3 Real-world Data
    4.4.4 Model Analysis
    4.4.5 More on Fine-grained
    4.4.6 More on Long-tailed
    4.4.7 Additional Results
  4.5 Conclusions
Chapter 5 Learning with Ordering Data Distributions
  5.1 Introduction
  5.2 Related work
  5.3 Method
  5.4 Experimental results
    5.4.1 Comparison
    5.4.2 Analysis and discussion
  5.5 Conclusion
Chapter 6 Conclusion
References
-
dc.language.iso | en | -
dc.subject | 現實世界分佈 | zh_TW
dc.subject | 表徵學習 | zh_TW
dc.subject | 自監督學習 | zh_TW
dc.subject | 多實例學習 | zh_TW
dc.subject | 順序數據分佈 | zh_TW
dc.subject | real-world distributions | en
dc.subject | ordering data distributions | en
dc.subject | self-supervised | en
dc.subject | multi-instance learning | en
dc.subject | representation learning | en
dc.title | 從表徵學習探究具挑戰性視覺分類問題 | zh_TW
dc.title | Representation Learning for Challenging Visual Classification Problems | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 (ROC academic year 111, second semester) | -
dc.description.degree | Doctoral (Ph.D.) | -
dc.contributor.coadvisor | 劉庭祿 | zh_TW
dc.contributor.coadvisor | Tyng-Luh Liu | en
dc.contributor.oralexamcommittee | 陳祝嵩; 莊永裕; 王鈺強; 陳煥宗 | zh_TW
dc.contributor.oralexamcommittee | Chu-Song Chen; Yung-Yu Chuang; Yu-Chiang Frank Wang; Hwann-Tzong Chen | en
dc.subject.keyword | 表徵學習, 多實例學習, 自監督學習, 現實世界分佈, 順序數據分佈 | zh_TW
dc.subject.keyword | representation learning, multi-instance learning, self-supervised, real-world distributions, ordering data distributions | en
dc.relation.page | 110 | -
dc.identifier.doi | 10.6342/NTU202301574 | -
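dc.rights.note | Authorized (worldwide open access) | -
dc.date.accepted | 2023-07-18 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science | -
dc.contributor.author-dept | Department of Computer Science and Information Engineering | -
Appears in Collections: Department of Computer Science and Information Engineering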

Files in This Item:
File | Size | Format
ntu-111-2.pdf | 6.19 MB | Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved