NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98153
Full metadata record
DC Field (Language): Value
dc.contributor.advisor (zh_TW): 李瑞庭
dc.contributor.advisor (en): Anthony J. T. Lee
dc.contributor.author (zh_TW): 葉家妤
dc.contributor.author (en): Jia-Yu Yeh
dc.date.accessioned: 2025-07-30T16:08:04Z
dc.date.available: 2025-07-31
dc.date.copyright: 2025-07-30
dc.date.issued: 2025
dc.date.submitted: 2025-07-16
dc.identifier.citation:
Almazrouei E, Alobeidli H, Alshamsi A, Cappelli A, Cojocaru R, Debbah M, Penedo G et al. (2023) The falcon series of open language models. arXiv:2311.16867. https://doi.org/10.48550/arXiv.2311.16867
Bousaid R, El Hajji M, Es-Saady Y (2022) Facial expression recognition using a hybrid ViT-CNN aggregator. Proceedings of the International Conference on Business Intelligence. 61-70. https://doi.org/10.1007/978-3-031-06458-6_5
Chen H, Wang Y, Yang X, Li J (2021) Captioning transformer with scene graph guiding. Proceedings of the IEEE International Conference on Image Processing. 2538-2542. https://doi.org/10.1109/ICIP42928.2021.9506193
Damodaran P (2021) Parrot: Paraphrase generation for NLU. GitHub repository.
Dawkins R (1976) The Selfish Gene. Oxford University Press, Oxford, UK.
Devlin J, Chang M W, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4171-4186. https://doi.org/10.18653/v1/N19-1423
Eisend M (2022) The influence of humor in advertising: Explaining the effects of humor in two-sided messages. Psychology & Marketing 39(5):962-973. https://doi.org/10.1002/mar.21634
Fang Z, Wang J, Hu X, Liang L, Gan Z, Wang L, Yang Y, Liu Z (2022) Injecting semantic concepts into end-to-end image captioning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17988-17998. https://doi.org/10.1109/CVPR52688.2022.01748
Giroux J, Bouchard M, Laganière R (2023) T-FFTRadNet: Object detection with Swin vision transformers from raw ADC radar signals. Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 4032-4041. https://doi.org/10.1109/ICCVW60793.2023.00435
Han S, Chang H, Shi Z, Hu S (2023) Facial expression recognition algorithm based on Swin transformer. Proceedings of the 9th International Conference on Systems and Informatics. 1-6. https://doi.org/10.1109/ICSAI61474.2023.10423327
Hatamizadeh A, Nath V, Tang Y, Yang D, Roth HR, Xu D (2022) Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. Proceedings of the International MICCAI Brainlesion Workshop. 272-284. https://doi.org/10.1007/978-3-031-08999-2_22
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770-778. https://doi.org/10.48550/arXiv.1512.03385
He R, Liu L, Ye H, Tan Q, Ding B, Cheng L, Low J, Bing L, Si L (2021) On the effectiveness of adapter-based tuning for pretrained language model adaptation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2208-2222. https://doi.org/10.18653/v1/2021.acl-long.172
He X, Zhou Y, Zhao J, Zhang D, Yao R, Xue Y (2022) Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Transactions on Geoscience and Remote Sensing 60:1-15. https://doi.org/10.1109/TGRS.2022.3144165
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8):1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W (2021) LoRA: Low-rank adaptation of large language models. arXiv:2106.09685. https://doi.org/10.48550/arXiv.2106.09685
Jung D, Shim S, Choo C, Hwang D, Nah Y, Oh S (2022) A preliminary result of food object detection using Swin transformer. Proceedings of the 8th International Conference on Computer Technology Applications. 183-187. https://doi.org/10.1145/3543712.3543731
Li J, Li D, Savarese S, Hoi S (2023) BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. Proceedings of the International Conference on Machine Learning. 19730-19742. https://doi.org/10.48550/arXiv.2301.12597
Li J, Li D, Xiong C, Hoi S (2022) BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. Proceedings of the International Conference on Machine Learning. 12888-12900. https://doi.org/10.48550/arXiv.2201.12086
Li R, Sun S, Elhoseiny M, Torr P (2023) OxfordTVG-HIC: Can machine make humorous captions from images? Proceedings of the IEEE/CVF International Conference on Computer Vision. 20293-20303. https://doi.org/10.1109/ICCV51070.2023.01856
Li Y, Pan Y, Yao T, Mei T (2022) Comprehending and ordering semantics for image captioning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17969-17978. https://doi.org/10.1109/CVPR52688.2022.01746
Likert R (1932) A technique for the measurement of attitudes. Archives of Psychology 22(140):55.
Liu L, Jiao Y, Li X, Li J, Wang H, Cao X (2023) Swin transformer-based image captioning with feature enhancement and multi-stage fusion. Proceedings of the 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery. 1-7. https://doi.org/10.1109/ICNC-FSKD59587.2023.10281090
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. https://doi.org/10.48550/arXiv.1907.11692
Liu Z, Hu H, Lin Y, Yao Z, Xie Z, Wei Y, Ning J, Cao J, Zheng Z, Dong L, Wei F, Guo B (2022) Swin transformer v2: Scaling up capacity and resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11999-12009. https://doi.org/10.1109/CVPR52688.2022.01170
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision. 10012-10022. https://doi.org/10.1109/ICCV48922.2021.00986
Liu Z, Tan Y, He Q, Xiao Y (2021) SwinNet: Swin transformer drives edge-aware RGB-D and RGB-T salient object detection. IEEE Transactions on Circuits and Systems for Video Technology 32(7):4486-4497. https://doi.org/10.1109/TCSVT.2021.3127149
Lu J, Batra D, Parikh D, Lee S (2019) ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in Neural Information Processing Systems 32. https://doi.org/10.48550/arXiv.1908.02265
Malodia S, Dhir A, Bilgihan A, Sinha P, Tikoo T (2022) Meme marketing: How can marketers drive better engagement using viral memes? Psychology & Marketing 39(9):1775-1801. https://doi.org/10.1002/mar.21702
McGraw AP, Warren C (2010) Benign violations: Making immoral behavior funny. Psychological Science. 21(8):1141-1149. https://doi.org/10.1177/0956797610376073
Mokady R, Hertz A, Bermano AH (2021) ClipCap: CLIP prefix for image captioning. arXiv:2111.09734. https://doi.org/10.48550/arXiv.2111.09734
Muennighoff N (2020) Vilio: State-of-the-art visio-linguistic models applied to hateful memes. arXiv:2012.07788. https://doi.org/10.48550/arXiv.2012.07788
Ninh QB, Nguyen HC, Huynh T, Tran MT, Le TN (2023) Multi-branch network for imagery emotion prediction. Proceedings of the 12th International Symposium on Information and Communication Technology. 371-378. https://doi.org/10.1145/3628797.3628954
Pech RJ (2003) Memetics and innovation: Profit through balanced meme management. European Journal of Innovation Management 6(2):111-117.
Peirson VAL, Tolunay EM (2018) Dank learning: Generating memes using deep neural networks. arXiv:1806.04510. https://doi.org/10.48550/arXiv.1806.04510
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I (2021) Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning. 8748-8763. https://doi.org/10.48550/arXiv.2103.00020
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9.
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6):1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv:1706.03762. https://doi.org/10.48550/arXiv.1706.03762
Vyalla SR, Udandarao V (2020) Memeify: A large-scale meme generation system. Proceedings of the 7th ACM India Joint International Conference on Data Science and Management of Data. 307-311. https://doi.org/10.1145/3371158.3371403
Wang H, Lee RKW (2024) MemeCraft: Contextual and stance-driven multimodal meme generation. Proceedings of the ACM on Web Conference. 4642-4652. https://doi.org/10.1145/3589334.3648151
Weber M, Quiring O (2019) Is it really that funny? Laughter, emotional contagion, and heuristic processing during shared media use. Media Psychology 22(2):173-195. https://doi.org/10.1080/15213269.2017.1302342
Xie Z, Zhao C (2023) Micro-expression recognition based on dual-branch Swin transformer network. Proceedings of the International Conference on Intelligent Computing. 544-554. https://doi.org/10.1007/978-981-99-4742-3_45
Yan Y, Xue K, Shi X, Ye Q, Liu J, Ruan T (2023) AF adapter: Continual pretraining for building Chinese biomedical language model. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine. 953-957. https://doi.org/10.1109/BIBM58861.2023.10385733
Yang C, Li Z, Zhang L (2024) Bootstrapping interactive image-text alignment for remote sensing image captioning. IEEE Transactions on Geoscience and Remote Sensing 62:1-12. https://doi.org/10.1109/TGRS.2024.3359316
Yang X, Hayashi T (2021) Exploring the effects of internet memes in social media marketing through A/B testing. Proceedings of the IEEE 23rd Conference on Business Informatics. 97-106. https://doi.org/10.1109/CBI52690.2021.10060
Yang X, Liu Y, Wang X (2022) ReFormer: The relational transformer for image captioning. Proceedings of the 30th ACM International Conference on Multimedia. 5398-5406. https://doi.org/10.1145/3503161.3548409
Zaken EB, Goldberg Y, Ravfogel S (2022) BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 1-9. https://doi.org/10.18653/v1/2022.acl-short.1
Zhang J, Xie Y, Ding W, Wang Z (2023) Cross on cross attention: Deep fusion transformer for image captioning. IEEE Transactions on Circuits and Systems for Video Technology 33(8):4257-4268. https://doi.org/10.1109/TCSVT.2023.3243725
Zhao Y, Cong G, Shi J, Miao C (2022) QueryFormer: A tree transformer model for query plan representation. Proceedings of the VLDB Endowment 15(8):1658-1670. https://doi.org/10.14778/3529337.3529349
Zhu D, Chen J, Shen X, Li X, Elhoseiny M (2023) MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv:2304.10592. https://doi.org/10.48550/arXiv.2304.10592
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98153
dc.description.abstract (zh_TW): A growing number of brands use meme marketing to bring their brand closer to target customers and resonate with them. Many meme caption generation models rely on specific templates to produce captions, which limits creative freedom and prevents the generation of original, impactful meme captions. In this study, we therefore propose GAMC, an adaptable advertising meme caption generation model that produces meme captions for a brand's posts. The proposed model comprises five modules. First, the visual feature extraction module and the emotion, sentiment, and humor feature extraction module derive visual and affect-related features from the image. Next, the co-attention module learns the relationships between features of different modalities. A large language model then generates the meme caption, and the model is trained on the main dataset, which contains a large collection of memes, to strengthen the humor of the generated captions. Finally, we fine-tune the trained model with brand data and the adaptation module so that the generated captions better match the brand image. Experimental results show that the proposed model outperforms the compared models on humor, benignity, and fluency scores. Our advertising meme caption generation model can help brands show their humorous side, enhance brand image and exposure, and make their posts more likely to spread virally.
dc.description.abstract (en): Many companies have used meme marketing to make themselves more relatable and approachable to their target audience. Many meme caption generation models use custom meme templates to generate meme captions; however, they can only generate captions for the pre-trained classes (or topics). These constraints on creative freedom can significantly hinder the ability to produce original and impactful meme captions. Therefore, in this study, we propose an adaptable model to Generate Advertising Meme Captions, called GAMC, for the posts of a brand. The proposed model contains five modules: the visual feature extraction module, the emotion-sentiment-humor (ESH) module, the co-attention module, the caption generation module, and the adaptation module. First, we apply the visual feature extraction module to extract the visual features and the ESH module to derive the emotion, sentiment, and humor features from the photo of the input post. Next, we use the co-attention module to learn the inter-relationships between features of different modalities. Then, in the caption generation module, we employ a large language model (LLM) to generate the meme caption for the input post, and we train the proposed model on the main dataset, which contains a large number of meme captions, to increase the humor of the generated captions. Finally, in the adaptation module, we adapt the trained model to the brand by using adapters and a dataset collected from the brand's posts to fine-tune the trained model. The experimental results show that the proposed model outperforms the compared models in terms of humor, benignity, and fluency scores. Our model can help businesses reveal their humorous side, enhance their brand image, promote effective communication with their customers, and spark positive word-of-mouth.
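The abstract above outlines a five-module architecture. The sketch below is purely illustrative: it shows one way the described data flow could be wired together, assuming PyTorch. The class names (GAMCSketch, CoAttention), feature dimensions, and the GRU standing in for the LLM-based caption generation module are all assumptions made to keep the example self-contained and runnable, not the thesis implementation.

```python
# Structural sketch of the five-module GAMC pipeline described in the
# abstract. Names, dimensions, and the GRU decoder (standing in for the LLM)
# are illustrative assumptions, not the thesis implementation.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Cross-modal co-attention: each modality attends to the other."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.v2e = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.e2v = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis: torch.Tensor, esh: torch.Tensor) -> torch.Tensor:
        v, _ = self.v2e(vis, esh, esh)   # visual tokens attend to ESH tokens
        e, _ = self.e2v(esh, vis, vis)   # ESH tokens attend to visual tokens
        return torch.cat([v, e], dim=1)  # fuse by concatenating token sets

class GAMCSketch(nn.Module):
    def __init__(self, dim: int = 256, vocab: int = 32000):
        super().__init__()
        # (1) visual feature extraction and (2) ESH modules: stand-in
        # projections over precomputed features from upstream encoders.
        self.visual = nn.Linear(512, dim)
        self.esh = nn.Linear(128, dim)
        # (3) co-attention module over the two modalities.
        self.coattn = CoAttention(dim)
        # (4) caption generation: an LLM in the thesis; a GRU decoder here
        # purely so the sketch runs without external weights.
        self.generator = nn.GRU(dim, dim, batch_first=True)
        self.vocab_head = nn.Linear(dim, vocab)

    def forward(self, vis_feats: torch.Tensor, esh_feats: torch.Tensor):
        fused = self.coattn(self.visual(vis_feats), self.esh(esh_feats))
        out, _ = self.generator(fused)
        return self.vocab_head(out)      # logits over caption tokens

if __name__ == "__main__":
    model = GAMCSketch()
    vis = torch.randn(2, 49, 512)  # e.g. patch features from an image encoder
    esh = torch.randn(2, 3, 128)   # emotion / sentiment / humor embeddings
    print(model(vis, esh).shape)   # torch.Size([2, 52, 32000])
```

The fifth module, adaptation, is sketched separately after this block.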
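For the adaptation step, the table of contents (Section 3.5.1) names low-rank adaptation (LoRA, Hu et al. 2021) and BitFit (Zaken et al. 2022). Below is a minimal sketch of those two parameter-efficient recipes in plain PyTorch; the rank, scaling factor, and layer choice are illustrative assumptions, not the thesis configuration.

```python
# Minimal LoRA and BitFit sketches; hyperparameters are illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: Wx + b + s*BAx."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

def apply_bitfit(model: nn.Module) -> None:
    """BitFit recipe: train only the bias terms, freeze everything else."""
    for name, p in model.named_parameters():
        p.requires_grad_(name.endswith("bias"))

if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(256, 256))
    x = torch.randn(4, 256)
    print(layer(x).shape)  # torch.Size([4, 256])
    # Only the low-rank factors (and the unfrozen base bias) stay trainable.
    print([n for n, p in layer.named_parameters() if p.requires_grad])
    # ['A', 'B', 'base.bias']
```

Because B is initialized to zero, the adapted layer starts out identical to the pretrained one, so brand-specific fine-tuning perturbs the main-dataset model only gradually.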
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-30T16:08:04Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2025-07-30T16:08:04Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgements i
Thesis Abstract (Chinese) ii
Thesis Abstract iii
Table of Contents iv
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Meme Caption Generation 5
2.2 Image Captioning 6
2.3 Adapter 7
Chapter 3 The Proposed Framework 8
3.1 Visual Feature Extraction Module 9
3.2 ESH Module 11
3.3 Co-Attention Module 12
3.4 Caption Generation Module 13
3.5 Adaptation Module 13
3.5.1 Low-Rank Approximation and BitFit 14
3.5.2 Funny Score Tuning 15
Chapter 4 Experimental Results 17
4.1 Dataset and Evaluation Metrics 17
4.2 Performance Evaluation 20
4.3 Ablation Study 22
4.4 Human Evaluation 26
4.5 Meme Caption Examples 30
Chapter 5 Conclusions and Future Work 35
References 38
Appendix A 44
Appendix B 45
dc.language.iso: en
dc.subject (zh_TW): 迷因標題生成模型 (meme caption generation model)
dc.subject (zh_TW): 大型語言模型 (large language model)
dc.subject (zh_TW): 可調適模組 (adaptable module)
dc.subject (zh_TW): 注意力機制 (attention mechanism)
dc.subject (en): adapter
dc.subject (en): meme caption generation model
dc.subject (en): attention mechanism
dc.subject (en): large language model
dc.title (zh_TW): 可調適廣告迷因標題生成模型
dc.title (en): Adaptable Advertising Meme Caption Generation Model
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee (zh_TW): 戴敏育; 吳怡瑾
dc.contributor.oralexamcommittee (en): Min-Yuh Day; I-Chin Wu
dc.subject.keyword (zh_TW): 迷因標題生成模型, 大型語言模型, 可調適模組, 注意力機制 (meme caption generation model, large language model, adaptable module, attention mechanism)
dc.subject.keyword (en): meme caption generation model, large language model, adapter, attention mechanism
dc.relation.page: 46
dc.identifier.doi: 10.6342/NTU202501318
dc.rights.note: 同意授權(全球公開) (authorized; open access worldwide)
dc.date.accepted: 2025-07-18
dc.contributor.author-college: 管理學院 (College of Management)
dc.contributor.author-dept: 資訊管理學系 (Department of Information Management)
dc.date.embargo-lift: 2025-07-31
Appears in Collections: 資訊管理學系 (Department of Information Management)

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 5.72 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
