NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98153
Full metadata record
DC Field (Language): Value
dc.contributor.advisor (zh_TW): 李瑞庭
dc.contributor.advisor (en): Anthony J. T. Lee
dc.contributor.author (zh_TW): 葉家妤
dc.contributor.author (en): Jia-Yu Yeh
dc.date.accessioned: 2025-07-30T16:08:04Z
dc.date.available: 2025-07-31
dc.date.copyright: 2025-07-30
dc.date.issued: 2025
dc.date.submitted: 2025-07-16
dc.identifier.citation:
Almazrouei E, Alobeidli H, Alshamsi A, Cappelli A, Cojocaru R, Debbah M, Penedo G et al. (2023) The falcon series of open language models. arXiv:2311.16867. https://doi.org/10.48550/arXiv.2311.16867
Bousaid R, El Hajji M, Es-Saady Y (2022) Facial expression recognition using a hybrid ViT-CNN aggregator. Proceedings of the International Conference on Business Intelligence. 61-70. https://doi.org/10.1007/978-3-031-06458-6_5
Chen H, Wang Y, Yang X, Li J (2021) Captioning transformer with scene graph guiding. Proceedings of the IEEE International Conference on Image Processing. 2538-2542. https://doi.org/10.1109/ICIP42928.2021.9506193
Damodaran P (2021) Parrot: Paraphrase generation for NLU. GitHub repository.
Dawkins R (1976) The Selfish Gene. Oxford University Press, Oxford, UK.
Devlin J, Chang M W, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4171-4186. https://doi.org/10.18653/v1/N19-1423
Eisend M (2022) The influence of humor in advertising: Explaining the effects of humor in two-sided messages. Psychology & Marketing 39(5):962-973. https://doi.org/10.1002/mar.21634
Fang Z, Wang J, Hu X, Liang L, Gan Z, Wang L, Yang Y, Liu Z (2022) Injecting semantic concepts into end-to-end image captioning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17988-17998. https://doi.org/10.1109/CVPR52688.2022.01748
Giroux J, Bouchard M, Laganière R (2023) T-FFTRadNet: Object detection with Swin vision transformers from raw ADC radar signals. Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 4032-4041. https://doi.org/10.1109/ICCVW60793.2023.00435
Han S, Chang H, Shi Z, Hu S (2023) Facial expression recognition algorithm based on Swin transformer. Proceedings of the 9th International Conference on Systems and Informatics. 1-6. https://doi.org/10.1109/ICSAI61474.2023.10423327
Hatamizadeh A, Nath V, Tang Y, Yang D, Roth HR, Xu D (2022) Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. Proceedings of the International MICCAI Brainlesion Workshop. 272-284. https://doi.org/10.1007/978-3-031-08999-2_22
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770-778. https://doi.org/10.48550/arXiv.1512.03385
He R, Liu L, Ye H, Tan Q, Ding B, Cheng L, Low J, Bing L, Si L (2021) On the effectiveness of adapter-based tuning for pretrained language model adaptation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2208-2222. https://doi.org/10.18653/v1/2021.acl-long.172
He X, Zhou Y, Zhao J, Zhang D, Yao R, Xue Y (2022) Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Transactions on Geoscience and Remote Sensing 60:1-15. https://doi.org/10.1109/TGRS.2022.3144165
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8):1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W (2021) LoRA: Low-rank adaptation of large language models. arXiv:2106.09685. https://doi.org/10.48550/arXiv.2106.09685
Jung D, Shim S, Choo C, Hwang D, Nah Y, Oh S (2022) A preliminary result of food object detection using Swin transformer. Proceedings of the 8th International Conference on Computer Technology Applications. 183-187. https://doi.org/10.1145/3543712.3543731
Li J, Li D, Savarese S, Hoi S (2023) BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. Proceedings of the International Conference on Machine Learning. 19730-19742. https://doi.org/10.48550/arXiv.2301.12597
Li J, Li D, Xiong C, Hoi S (2022) BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. Proceedings of the International Conference on Machine Learning. 12888-12900. https://doi.org/10.48550/arXiv.2201.12086
Li R, Sun S, Elhoseiny M, Torr P (2023) OxfordTVG-HIC: Can machine make humorous captions from images? Proceedings of the IEEE/CVF International Conference on Computer Vision. 20293-20303. https://doi.org/10.1109/ICCV51070.2023.01856
Li Y, Pan Y, Yao T, Mei T (2022) Comprehending and ordering semantics for image captioning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17969-17978. https://doi.org/10.1109/CVPR52688.2022.01746
Likert R (1932) A technique for the measurement of attitudes. Archives of Psychology 22(140):55.
Liu L, Jiao Y, Li X, Li J, Wang H, Cao X (2023) Swin transformer-based image captioning with feature enhancement and multi-stage fusion. Proceedings of the 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery. 1-7. https://doi.org/10.1109/ICNC-FSKD59587.2023.10281090
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. https://doi.org/10.48550/arXiv.1907.11692
Liu Z, Hu H, Lin Y, Yao Z, Xie Z, Wei Y, Ning J, Cao J, Zheng Z, Dong L, Wei F, Guo B (2022) Swin transformer v2: Scaling up capacity and resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11999-12009. https://doi.org/10.1109/CVPR52688.2022.01170
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision. 10012-10022. https://doi.org/10.1109/ICCV48922.2021.00986
Liu Z, Tan Y, He Q, Xiao Y (2021) SwinNet: Swin transformer drives edge-aware RGB-D and RGB-T salient object detection. IEEE Transactions on Circuits and Systems for Video Technology 32(7):4486-4497. https://doi.org/10.1109/TCSVT.2021.3127149
Lu J, Batra D, Parikh D, Lee S (2019) ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in Neural Information Processing Systems 32. https://doi.org/10.48550/arXiv.1908.02265
Malodia S, Dhir A, Bilgihan A, Sinha P, Tikoo T (2022) Meme marketing: How can marketers drive better engagement using viral memes? Psychology & Marketing 39(9):1775-1801. https://doi.org/10.1002/mar.21702
McGraw AP, Warren C (2010) Benign violations: Making immoral behavior funny. Psychological Science. 21(8):1141-1149. https://doi.org/10.1177/0956797610376073
Mokady R, Hertz A, Bermano AH (2021) ClipCap: CLIP prefix for image captioning. arXiv:2111.09734. https://doi.org/10.48550/arXiv.2111.09734
Muennighoff N (2020) Vilio: State-of-the-art visio-linguistic models applied to hateful memes. arXiv:2012.07788. https://doi.org/10.48550/arXiv.2012.07788
Ninh QB, Nguyen HC, Huynh T, Tran MT, Le TN (2023) Multi-branch network for imagery emotion prediction. Proceedings of the 12th International Symposium on Information and Communication Technology. 371-378. https://doi.org/10.1145/3628797.3628954
Pech RJ (2003) Memetics and innovation: Profit through balanced meme management. European Journal of Innovation Management 6(2):111-117.
Peirson VAL, Tolunay EM (2018) Dank learning: Generating memes using deep neural networks. arXiv:1806.04510. https://doi.org/10.48550/arXiv.1806.04510
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I (2021) Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning. 8748-8763. https://doi.org/10.48550/arXiv.2103.00020
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9.
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6):1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv:1706.03762. https://doi.org/10.48550/arXiv.1706.03762
Vyalla SR, Udandarao V (2020) Memeify: A large-scale meme generation system. Proceedings of the 7th ACM India Joint International Conference on Data Science and Management of Data. 307-311. https://doi.org/10.1145/3371158.3371403
Wang H, Lee RKW (2024) MemeCraft: Contextual and stance-driven multimodal meme generation. Proceedings of the ACM on Web Conference. 4642-4652. https://doi.org/10.1145/3589334.3648151
Weber M, Quiring O (2019) Is it really that funny? Laughter, emotional contagion, and heuristic processing during shared media use. Media Psychology 22(2):173-195. https://doi.org/10.1080/15213269.2017.1302342
Xie Z, Zhao C (2023) Micro-expression recognition based on dual-branch Swin transformer network. Proceedings of the International Conference on Intelligent Computing. 544-554. https://doi.org/10.1007/978-981-99-4742-3_45
Yan Y, Xue K, Shi X, Ye Q, Liu J, Ruan T (2023) AF adapter: Continual pretraining for building Chinese biomedical language model. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine. 953-957. https://doi.org/10.1109/BIBM58861.2023.10385733
Yang C, Li Z, Zhang L (2024) Bootstrapping interactive image-text alignment for remote sensing image captioning. IEEE Transactions on Geoscience and Remote Sensing 62:1-12. https://doi.org/10.1109/TGRS.2024.3359316
Yang X, Hayashi T (2021) Exploring the effects of internet memes in social media marketing through A/B testing. Proceedings of the IEEE 23rd Conference on Business Informatics. 97-106. https://doi.org/10.1109/CBI52690.2021.10060
Yang X, Liu Y, Wang X (2022) ReFormer: The relational transformer for image captioning. Proceedings of the 30th ACM International Conference on Multimedia. 5398-5406. https://doi.org/10.1145/3503161.3548409
Zaken EB, Goldberg Y, Ravfogel S (2022) BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 1-9. https://doi.org/10.18653/v1/2022.acl-short.1
Zhang J, Xie Y, Ding W, Wang Z (2023) Cross on cross attention: Deep fusion transformer for image captioning. IEEE Transactions on Circuits and Systems for Video Technology 33(8):4257-4268. https://doi.org/10.1109/TCSVT.2023.3243725
Zhao Y, Cong G, Shi J, Miao C (2022) QueryFormer: A tree transformer model for query plan representation. Proceedings of the VLDB Endowment 15(8):1658-1670. https://doi.org/10.14778/3529337.3529349
Zhu D, Chen J, Shen X, Li X, Elhoseiny M (2023) MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv:2304.10592. https://doi.org/10.48550/arXiv.2304.10592
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98153
dc.description.abstract (zh_TW): A growing number of brands use meme marketing to bring their brand closer to target customers and resonate with them. Many meme caption generation models rely on specific templates to produce captions, which limits creative freedom and prevents the generation of original, impactful meme captions. In this study, we therefore propose GAMC, an adaptable advertising meme caption generation model that produces meme captions for a brand's posts. The proposed model comprises five modules. First, the visual feature extraction module and the emotion, sentiment, and humor feature extraction module derive visual and affect-related features from the image. Next, the co-attention module learns the relationships between features of different modalities. A large language model then generates the meme caption, and the model is trained on the main dataset, which contains a large collection of memes, to strengthen the humor of the generated captions. Finally, we fine-tune the trained model with brand data and the adaptation module so that the generated captions better match the brand image. Experimental results show that the proposed model outperforms the compared models on humor, benignity, and fluency scores. Our advertising meme caption generation model can help brands show their humorous side, enhance brand image and exposure, and make their posts more likely to spread virally.
dc.description.abstract (en): Many companies have used meme marketing to make themselves more relatable and approachable to their target audience. Many meme caption generation models use custom meme templates to generate meme captions; however, they can only generate captions for the pre-trained classes (or topics). These constraints on creative freedom can significantly hinder the ability to produce original and impactful meme captions. Therefore, in this study, we propose an adaptable model to Generate Advertising Meme Captions, called GAMC, for the posts of a brand. The proposed model contains five modules: the visual feature extraction module, the emotion-sentiment-humor (ESH) module, the co-attention module, the caption generation module, and the adaptation module. First, we apply the visual feature extraction module to extract the visual features and the ESH module to derive the emotion, sentiment, and humor features from the photo of the input post. Next, we use the co-attention module to learn the inter-relationships between features of different modalities. Then, in the caption generation module, we employ a large language model (LLM) to generate the meme caption for the input post, and we train the proposed model on the main dataset, which contains a large number of meme captions, to increase the humor of the generated captions. Finally, in the adaptation module, we adapt the trained model to the brand by using adapters and a dataset collected from the brand's posts to fine-tune the trained model. The experimental results show that the proposed model outperforms the compared models in terms of humor, benignity, and fluency scores. Our model can help businesses reveal their humorous side, enhance their brand image, promote effective communication with their customers, and spark positive word-of-mouth.
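The abstract above outlines a five-module architecture. The sketch below is purely illustrative: it shows one way the described data flow could be wired together, assuming PyTorch. The class names (GAMCSketch, CoAttention), feature dimensions, and the GRU standing in for the LLM-based caption generation module are all assumptions made to keep the example self-contained and runnable, not the thesis implementation.

```python
# Structural sketch of the five-module GAMC pipeline described in the
# abstract. Names, dimensions, and the GRU decoder (standing in for the LLM)
# are illustrative assumptions, not the thesis implementation.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Cross-modal co-attention: each modality attends to the other."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.v2e = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.e2v = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis: torch.Tensor, esh: torch.Tensor) -> torch.Tensor:
        v, _ = self.v2e(vis, esh, esh)   # visual tokens attend to ESH tokens
        e, _ = self.e2v(esh, vis, vis)   # ESH tokens attend to visual tokens
        return torch.cat([v, e], dim=1)  # fuse by concatenating token sets

class GAMCSketch(nn.Module):
    def __init__(self, dim: int = 256, vocab: int = 32000):
        super().__init__()
        # (1) visual feature extraction and (2) ESH modules: stand-in
        # projections over precomputed features from upstream encoders.
        self.visual = nn.Linear(512, dim)
        self.esh = nn.Linear(128, dim)
        # (3) co-attention module over the two modalities.
        self.coattn = CoAttention(dim)
        # (4) caption generation: an LLM in the thesis; a GRU decoder here
        # purely so the sketch runs without external weights.
        self.generator = nn.GRU(dim, dim, batch_first=True)
        self.vocab_head = nn.Linear(dim, vocab)

    def forward(self, vis_feats: torch.Tensor, esh_feats: torch.Tensor):
        fused = self.coattn(self.visual(vis_feats), self.esh(esh_feats))
        out, _ = self.generator(fused)
        return self.vocab_head(out)      # logits over caption tokens

if __name__ == "__main__":
    model = GAMCSketch()
    vis = torch.randn(2, 49, 512)  # e.g. patch features from an image encoder
    esh = torch.randn(2, 3, 128)   # emotion / sentiment / humor embeddings
    print(model(vis, esh).shape)   # torch.Size([2, 52, 32000])
```

The fifth module, adaptation, is sketched separately after this block.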
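For the adaptation step, the table of contents (Section 3.5.1) names low-rank adaptation (LoRA, Hu et al. 2021) and BitFit (Zaken et al. 2022). Below is a minimal sketch of those two parameter-efficient recipes in plain PyTorch; the rank, scaling factor, and layer choice are illustrative assumptions, not the thesis configuration.

```python
# Minimal LoRA and BitFit sketches; hyperparameters are illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: Wx + b + s*BAx."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

def apply_bitfit(model: nn.Module) -> None:
    """BitFit recipe: train only the bias terms, freeze everything else."""
    for name, p in model.named_parameters():
        p.requires_grad_(name.endswith("bias"))

if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(256, 256))
    x = torch.randn(4, 256)
    print(layer(x).shape)  # torch.Size([4, 256])
    # Only the low-rank factors (and the unfrozen base bias) stay trainable.
    print([n for n, p in layer.named_parameters() if p.requires_grad])
    # ['A', 'B', 'base.bias']
```

Because B is initialized to zero, the adapted layer starts out identical to the pretrained one, so brand-specific fine-tuning perturbs the main-dataset model only gradually.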
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-30T16:08:04Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2025-07-30T16:08:04Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgements i
Thesis Abstract (Chinese) ii
Thesis Abstract iii
Table of Contents iv
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Meme Caption Generation 5
2.2 Image Captioning 6
2.3 Adapter 7
Chapter 3 The Proposed Framework 8
3.1 Visual Feature Extraction Module 9
3.2 ESH Module 11
3.3 Co-Attention Module 12
3.4 Caption Generation Module 13
3.5 Adaptation Module 13
3.5.1 Low-Rank Approximation and BitFit 14
3.5.2 Funny Score Tuning 15
Chapter 4 Experimental Results 17
4.1 Dataset and Evaluation Metrics 17
4.2 Performance Evaluation 20
4.3 Ablation Study 22
4.4 Human Evaluation 26
4.5 Meme Caption Examples 30
Chapter 5 Conclusions and Future Work 35
References 38
Appendix A 44
Appendix B 45
dc.language.iso: en
dc.subject (zh_TW): 迷因標題生成模型 (meme caption generation model)
dc.subject (zh_TW): 大型語言模型 (large language model)
dc.subject (zh_TW): 可調適模組 (adaptable module)
dc.subject (zh_TW): 注意力機制 (attention mechanism)
dc.subject (en): adapter
dc.subject (en): meme caption generation model
dc.subject (en): attention mechanism
dc.subject (en): large language model
dc.title (zh_TW): 可調適廣告迷因標題生成模型
dc.title (en): Adaptable Advertising Meme Caption Generation Model
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee (zh_TW): 戴敏育; 吳怡瑾
dc.contributor.oralexamcommittee (en): Min-Yuh Day; I-Chin Wu
dc.subject.keyword (zh_TW): 迷因標題生成模型, 大型語言模型, 可調適模組, 注意力機制 (meme caption generation model, large language model, adaptable module, attention mechanism)
dc.subject.keyword (en): meme caption generation model, large language model, adapter, attention mechanism
dc.relation.page: 46
dc.identifier.doi: 10.6342/NTU202501318
dc.rights.note: 同意授權(全球公開) (authorized; open access worldwide)
dc.date.accepted: 2025-07-18
dc.contributor.author-college: 管理學院 (College of Management)
dc.contributor.author-dept: 資訊管理學系 (Department of Information Management)
dc.date.embargo-lift: 2025-07-31
Appears in Collections: 資訊管理學系 (Department of Information Management)

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 5.72 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
