Learning and Exploring Sequential Visual-Semantic Embeddings from Visual Story Ordering

Wei-Rou Lin; 林瑋柔

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72158

Title:	Learning and Exploring Sequential Visual-Semantic Embeddings from Visual Story Ordering (從圖文故事排序中學習並探索序列化視覺語義嵌入)
Author:	Wei-Rou Lin (林瑋柔)
Advisor:	Hsin-Hsi Chen (陳信希)
Keywords:	story, ordering, visual-semantic embedding
Year of publication:	2018
Degree:	Master's
Abstract:	As more and more stories that interleave text and images, such as blog posts, are published on the internet, we are curious about the similarities and differences between the information carried by the two modalities. We therefore introduce the visual story ordering problem: jointly ordering the images and text of a visual story, which we approach with a model that embeds text and images in the same space. We try several models, covering both pairwise and listwise approaches. We employ the result of coreference resolution as a baseline. In addition, we implement a reader, processor, writer architecture and a self-attention mechanism. We further propose bidirectional decoding with bidirectional beam search. We evaluate our methods on VIST, a visual storytelling dataset (Huang et al., 2016). The results show that, on a text-only subset, bidirectional models outperform unidirectional models and models trained with images outperform models trained without images. We also find that our embedding narrows the distance between images and their corresponding story sentences even though we do not align the two modalities directly. In future work, the model can be tested on other datasets, the bidirectional decoding mechanism can be explored in more depth, and existing models can be adapted to add further functionality to ours.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72158
DOI:	10.6342/NTU201803857
Full-text license:	Paid license
Appears in department:	Department of Computer Science and Information Engineering

Files in this document:

File	Size	Format
ntu-107-1.pdf (currently not authorized for public access)	1.58 MB	Adobe PDF

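Unless their copyright terms are specifically stated otherwise, all documents in this system are protected by copyright, with all rights reserved.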