An Analysis of Multimodal Document Intent in Instagram Posts
multimodal document understanding, contextual relationship, semiotic relationship, author's intent, Natural Language Processing
Publication Year: 2020
Abstract: Today, social media platforms such as Instagram tend to combine images with textual representations, constructing a new "multimodal" mode of communication. Analyzing multimodal relationships with computational methods has become a popular topic; however, no study has yet analyzed textual intent and image-text relationships in the multimodal image-caption pairs posted by Taiwan's top 100 social media influencers. Using multimodal representations of text and images, this study follows the image-text relationship taxonomy of Kruk et al. (2019) (contextual relationship / semiotic relationship / author's intent), proposes new image-text representations (Sentence-BERT and image embeddings) for these three categories, and applies computational models (Random Forest, Decision Tree Classifier) to classify the three kinds of image-text relationships, achieving an accuracy of 86.23%.
Much of the content on social media (e.g., Instagram) combines visual and textual elements in the same message, building up a modern mode of communication. Multimodal messages are central to almost every type of social interaction, especially in the context of online social multimedia content. Effective computational approaches to understanding documents with multiple modalities are therefore needed to identify the relationships between them. This study extends recent advances in intent classification by proposing an approach based on image-caption pairs (ICPs). Machine learning algorithms such as the Decision Tree Classifier (DTC) and Random Forest (RF), together with encoders such as Sentence-BERT and image embeddings, are applied to classify the relationships between the modalities: 1) contextual relationship, 2) semiotic relationship, and 3) author's intent. This study points to two results. First, although prior studies suggest that incorporating the two synergistic modalities into a combined model improves accuracy on the relationship classification task, this study found that a simple fusion strategy that linearly projects the encoded vectors from both modalities into the same embedding space may not substantially outperform a single modality. This result suggests that integrating text and image requires more effort to make the two modalities complement each other. Second, we show that these text-image relationships can be classified with high accuracy (86.23%) using the text modality alone. In sum, this study demonstrates a computational approach to analyzing multimodal documents and offers a better understanding of how the relationships between modalities can be classified.
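The fusion baseline the abstract describes can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the placeholder random vectors stand in for Sentence-BERT text embeddings and pretrained image embeddings, the label count (three relationship classes) and the concatenation-based fusion are assumptions, and the Random Forest hyperparameters are defaults.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder embeddings standing in for encoder outputs.
# In the study, text vectors would come from Sentence-BERT and
# image vectors from an image encoder; dimensions here are assumed.
n_pairs, text_dim, img_dim = 200, 384, 512
text_emb = rng.normal(size=(n_pairs, text_dim))
img_emb = rng.normal(size=(n_pairs, img_dim))

# Simple fusion: place both modalities in one embedding space by
# concatenating their vectors (one common baseline; the paper's
# exact projection may differ).
X = np.concatenate([text_emb, img_emb], axis=1)

# Hypothetical labels for the three relationship classes
# (contextual / semiotic / intent).
y = rng.integers(0, 3, size=n_pairs)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

With real encoder outputs, the same pipeline could be run once on the text block alone and once on the concatenated features to compare single-modality and fused performance, which is the comparison the abstract reports.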
Appears in Collections: Graduate Institute of Linguistics (語言學研究所)
Files in This Item:
5.52 MB, Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.