Skip navigation

DSpace JSPUI

DSpace preserves and enables easy and open access to all types of digital content including text, images, moving images, mpegs and data sets

Learn More
DSpace logo
English
中文
  • Browse
    • Communities
      & Collections
    • Publication Year
    • Author
    • Title
    • Subject
    • Advisor
  • Search TDR
  • Rights Q&A
    • My Page
    • Receive email
      updates
    • Edit Profile
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 資訊工程學系
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73582
Title: 使用場景圖生成圖像之段落描述
SG2P: Image Paragraphing with Scene Graph
Authors: Chun-Yen Yeh
葉俊言
Advisor: 許永真
Keyword: 圖片段落生成,場景圖生成,類神經網路,
Image Paragraphing,Scene Graph Generation,Neural Network,
Publication Year : 2019
Degree: 碩士
Abstract: 在近幾年的電腦視覺領域,愈來愈多人研究圖片段落生成(image paragraphing)然而,因為圖片與文字有著根本結構上的不同,很難找到適合的方式將圖片資訊對應成文字,所以由現有方法產生的圖片段落仍充斥著許多語意上的錯誤。在這篇論文,我們提出了一個兩階段生成圖片段落的方法SG2P,來解決這個問題。相較於以往直接從圖片轉換成文字,我們先將圖片轉變成另一種語意結構的表示方法──場景圖(scene graph),期望透過場景圖可以生成更加語意正確的段落。除此之外,我們還使用了分級的循環語言模型,搭配跳躍連結以減輕在長句文字產生時的梯度消失問題。
為了評估結果,我們提出了一個新的衡量標準cSPICE,是一個基於圖比較的一種衡量標準,可以用來計算段落的語意正確性。實驗結果顯示:相較於直接將圖片轉換成段落,如果先將原始圖片轉換成場景圖,再利用其來產生對應的段落,分數會有顯著的進步。
Automatically describing an image with a paragraph has gained popularity recently in the field of computer vision. However, the results of existing methods are full of semantic errors as the features extracted directly from raw image they use have difficulty bridging the visual semantic information to language. In this thesis, we propose SG2P which is a two-staged network to address this issue. Instead of from raw image, the proposed method leverages features encoded from the scene graph, an intermediate semantic structure of an image, aiming to generate stronger semantically correct paragraphs. With the explicit semantic representation, we hypothesize that features from scene graph retains more semantic information than directly from raw image. In addition, we use hierarchical recurrent language model with skip connection in SG2P to reduce the effect of gradient vanishment during long generation process.
To evaluate the results, we propose a new evaluation metric called c-SPICE, which can automatically compute the semantic correctness of generated paragraphs by a graph-based comparison. Experiment shows that methods utilizing features from scene graph outperform those directly from raw image in c-SPICE.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73582
DOI: 10.6342/NTU201904003
Fulltext Rights: 有償授權
Appears in Collections:資訊工程學系

Files in This Item:
File SizeFormat 
ntu-108-1.pdf
  Restricted Access
5.05 MBAdobe PDF
Show full item record


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved