文本主題的向量表徵模型及其多模態任務應用 (An Integrated Topic Embedding Framework for Multimodal Document Representation)

廖聿鋆; Yu-Yun Liao

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83103

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	謝舒凱	zh_TW
dc.contributor.advisor	Shu-Kai Hsieh	en
dc.contributor.author	廖聿鋆	zh_TW
dc.contributor.author	Yu-Yun Liao	en
dc.date.accessioned	2023-01-08T17:05:47Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-01-06	-
dc.date.issued	2022	-
dc.date.submitted	2022-11-20	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83103	-
dc.description.abstract	近年來，圖像與文字資料間的跨模態訊息已經受到廣泛的研究及應用。不過，文本中的主題性資訊 (如文章主旨或論述的中心思想) 卻從未被應用至多模態任務中，這樣的資訊如何被機器理解與表徵也未曾被深入探討。有鑑於此，本論文提出「融合主題表徵」來作為文本主題在多模態任務中的向量表徵形式，並論證文本主題與文字、圖片等模態同樣能承載重要語意訊息。本論文藉由兩種子表徵來建構融合主題表徵：透過BERTopic主題模型產出的sentence-BERT向量作為全域主題表徵，及透過node2vec和graphSAGE從主題標籤網路(hashtag network)所產生的節點向量作為局部主題表徵。接著，本論文設計三種不同的任務來檢驗融合主題表徵的效果：文本主題相似度評測任務主要比較人類與機器對文本主題概念的理解，而其餘兩項多模態預測任務 (貼文熱度預測及廣告辨識) 則透過置換不同模態組合來分析融合主題表徵是否能增進下游任務的表現。研究結果顯示，當融合主題表徵被作為多模態文本表徵的一部分時，模型在下游任務的表現可以提昇約5%。這說明了文本主題能輔助其他模態的預測表現，並在多模態表徵中攜帶有助於模型預測的主題訊息。此外，人類與機器在評斷文本主題相似度時的Spearman相關係數達到0.44，表示融合主題表徵大致能夠模擬人類認知中的文本主題概念。最後，融合主題表徵的兩項子表徵分別能擷取不同粒度的主題資訊，而兩者融合時彼此的資訊呈現互補的模式。	zh_TW
dc.description.abstract	Recent work in multimodal machine learning has extensively explored the cross-modal relationships between textual and visual data. However, topical information in documents (such as the central idea or discursive focus of a text) has never been applied to multimodal tasks, and its vector representation remains under-researched. In light of this, the present thesis proposes Integrated Topic Embeddings (ITEs) to represent document topics in multimodal prediction tasks, and argues that document topics constitute a modality as informative as text and images. ITEs combine two components: global topic embeddings, which are sentence-BERT embeddings generated from BERTopic, and local topic embeddings, which are node embeddings generated by node2vec and graphSAGE from a hashtag network. Three experiments are then designed to validate the effectiveness of ITEs: a topic similarity rating task compares human and machine understanding of document topics, and two ablation tasks (popularity prediction and advertisement detection) examine whether models predict better when document topics are fused into the multimodal document representation. The results indicate that incorporating ITEs improves the performance of multimodal models by up to 5%. This demonstrates that document topics can support other modalities and serve as an informative component of multimodal document representations. In addition, the topic information encoded in ITEs moderately resembles human perception, as indicated by an average Spearman's correlation of 0.44 between human and machine ratings of document topic similarity. Finally, qualitative assessments of ITEs suggest that the hashtag network and BERTopic capture different layers and granularities of topical information, and that the two are complementary when combined as ITEs.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-01-08T17:05:47Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-01-08T17:05:47Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; Abstract; List of Figures; List of Tables; 1 Introduction; 1.1 Motivation; 1.2 Proposed Method and Contributions; 1.3 Thesis Overview; 2 Related Work; 2.1 Topics and Representations of Topics; 2.1.1 Different Perspectives on Topics; 2.1.2 Hashtags as Local Topic Markers; 2.1.3 Topic Models as Global Topic Identifiers; 2.1.4 Vector Representations of Topics and Hashtags; 2.2 Multimodal Communication; 2.2.1 Semiotic View of Multimodality; 2.2.2 Multimodal Machine Learning; 2.3 Multimodal Social Media Tasks; 2.3.1 Multimodal Popularity Prediction; 2.3.2 Multimodal Advertisement Detection; 3 Research Methods; 3.1 Data Collection; 3.1.1 Influ100 Dataset; 3.1.2 Ad-life Dataset; 3.2 Feature Engineering; 3.2.1 Integrated Topic Embeddings; 3.2.2 Text Embeddings; 3.2.3 Image Embeddings; 3.2.4 Metadata; 3.2.5 Joint Representation; 3.3 Experiments; 3.3.1 Topic Similarity Rating Task; 3.3.2 Popularity Prediction Task; 3.3.3 Advertisement Detection Task; 3.4 Model Evaluation; 3.4.1 Evaluation Metrics; 3.4.2 Baselines; 4 Results and Discussion; 4.1 Topic Similarity Rating Task; 4.2 Popularity Prediction Task; 4.3 Advertisement Detection; 4.4 Interpreting Global and Local Topic Information; 5 Conclusion; 5.1 Summary; 5.2 Limitations and Future Directions; References; Appendix A Questionnaire Design	-
dc.language.iso	en	-
dc.subject	主題向量	zh_TW
dc.subject	社群媒體分析	zh_TW
dc.subject	多模態機器學習	zh_TW
dc.subject	文本分類	zh_TW
dc.subject	主題模型	zh_TW
dc.subject	text classification	en
dc.subject	multimodal machine learning	en
dc.subject	topic embeddings	en
dc.subject	topic models	en
dc.subject	social media analysis	en
dc.title	文本主題的向量表徵模型及其多模態任務應用	zh_TW
dc.title	An Integrated Topic Embedding Framework for Multimodal Document Representation	en
dc.title.alternative	An Integrated Topic Embedding Framework for Multimodal Document Representation	-
dc.type	Thesis	-
dc.date.schoolyear	111-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	張瑜芸;陳正賢	zh_TW
dc.contributor.oralexamcommittee	Yu-Yun Chang;Cheng-Hsien Chen	en
dc.subject.keyword	多模態機器學習,主題向量,主題模型,文本分類,社群媒體分析	zh_TW
dc.subject.keyword	multimodal machine learning,topic embeddings,topic models,text classification,social media analysis	en
dc.relation.page	74	-
dc.identifier.doi	10.6342/NTU202210031	-
dc.rights.note	Authorization granted (access restricted to campus)	-
dc.date.accepted	2022-11-22	-
dc.contributor.author-college	College of Liberal Arts	-
dc.contributor.author-dept	Graduate Institute of Linguistics	-
dc.date.embargo-lift	2025-10-12	-
Appears in Collections:	Graduate Institute of Linguistics

Files in this item:

File	Size	Format
U0001-1258221107572002.pdf (access restricted to NTU campus IP addresses; from off campus, please use the VPN remote-access service)	8.65 MB	Adobe PDF

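Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.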