Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73582
Full metadata record
(DC field: value [language])
dc.contributor.advisor: 許永真
dc.contributor.author: Chun-Yen Yeh [en]
dc.contributor.author: 葉俊言 [zh_TW]
dc.date.accessioned: 2021-06-17T08:06:21Z
dc.date.available: 2019-08-20
dc.date.copyright: 2019-08-20
dc.date.issued: 2019
dc.date.submitted: 2019-08-19
dc.identifier.citation:
[1] P. Anderson, B. Fernando, M. Johnson, and S. Gould. SPICE: Semantic propositional image caption evaluation. In ECCV, 2016.
[2] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR, 2018.
[3] Y. Bengio and Y. LeCun, editors. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[4] M. Chatterjee and A. G. Schwing. Diverse and coherent paragraph generation from images. In ECCV, 2018.
[5] X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015.
[6] K. Clark and C. D. Manning. Deep reinforcement learning for mention-ranking coreference models. In EMNLP, 2016.
[7] Y. Cui, G. Yang, A. Veit, X. Huang, and S. J. Belongie. Learning to evaluate image captioning. In CVPR, 2018.
[8] B. Dai, S. Fidler, R. Urtasun, and D. Lin. Towards diverse and natural image descriptions via a conditional GAN. In ICCV, 2017.
[9] A. Farhadi, S. M. M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. A. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[10] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics (extended abstract). In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, 2015.
[11] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics (extended abstract). In IJCAI, 2015.
[12] J. Johnson, A. Gupta, and L. Fei-Fei. Image generation from scene graphs. In CVPR, 2018.
[13] J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional localization networks for dense captioning. In CVPR, 2016.
[14] J. Johnson, R. Krishna, M. Stark, L. Li, D. A. Shamma, M. S. Bernstein, and F. Li. Image retrieval using scene graphs. In CVPR, 2015.
[15] A. Karpathy, A. Joulin, and F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.
[16] A. Karpathy and F. Li. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[17] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[18] J. Krause, J. Johnson, R. Krishna, and L. Fei-Fei. A hierarchical approach for generating descriptive image paragraphs. In CVPR, 2017.
[19] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1):32–73, 2017.
[20] X. Liang, Z. Hu, H. Zhang, C. Gan, and E. P. Xing. Recurrent topic-transition GAN for visual paragraph generation. In ICCV, 2017.
[21] X. Liang, L. Lee, and E. P. Xing. Deep variation-structured reinforcement learning for visual relationship and attribute detection. In CVPR, 2017.
[22] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[23] J. Lu, C. Xiong, D. Parikh, and R. Socher. Knowing when to look: Adaptive attention via a visual sentinel for image captioning. In CVPR, 2017.
[24] J. Lu, J. Yang, D. Batra, and D. Parikh. Neural baby talk. In CVPR, 2018.
[25] D. Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. 1982.
[26] D. Teney, L. Liu, and A. van den Hengel. Graph-structured representations for visual question answering. In CVPR, 2017.
[27] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
[28] Y. Wang, C. Liu, X. Zeng, and A. L. Yuille. Scene graph parsing as dependency parsing. In NAACL, 2018.
[29] S. Woo, D. Kim, D. Cho, and I. S. Kweon. LinkNet: Relational embedding for scene graph. In NIPS, 2018.
[30] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei. Scene graph generation by iterative message passing. In CVPR, 2017.
[31] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[32] J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh. Graph R-CNN for scene graph generation. In ECCV, 2018.
[33] X. Yang, K. Tang, H. Zhang, and J. Cai. Auto-encoding scene graphs for image captioning. CoRR, abs/1812.02378, 2018.
[34] Z. Yang, Y. Yuan, Y. Wu, W. W. Cohen, and R. Salakhutdinov. Review networks for caption generation. In NIPS, 2016.
[35] T. Yao, Y. Pan, Y. Li, and T. Mei. Exploring visual relationship for image captioning. In ECCV, 2018.
[36] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
[37] R. Zellers, M. Yatskar, S. Thomson, and Y. Choi. Neural motifs: Scene graph parsing with global context. In CVPR, 2018.
[38] H. Zhang, Z. Kyaw, S. Chang, and T. Chua. Visual translation embedding network for visual relation detection. In CVPR, 2017.
[39] Z. Zhu, Z. Xue, and Z. Yuan. Topic-guided attention for image captioning. In ICIP, 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73582
dc.description.abstract [zh_TW]: In recent years, image paragraphing has drawn growing attention in computer vision. However, because images and text differ fundamentally in structure, it is difficult to find a suitable way to map image information onto language, and the paragraphs produced by existing methods are still riddled with semantic errors. In this thesis we propose SG2P, a two-stage method for generating image paragraphs, to address this problem. Instead of translating the image directly into text, we first convert it into another semantic representation, the scene graph, in the expectation that paragraphs generated from the scene graph will be more semantically correct. In addition, we use a hierarchical recurrent language model with skip connections to mitigate the vanishing-gradient problem when generating long text.
To evaluate the results, we propose a new metric, c-SPICE, a graph-comparison-based measure of the semantic correctness of a paragraph. Experiments show that first converting the raw image into a scene graph and then generating the paragraph from it yields a significant improvement over generating the paragraph directly from the image.
dc.description.abstract [en]: Automatically describing an image with a paragraph has recently gained popularity in computer vision. However, the outputs of existing methods are full of semantic errors, because the features they extract directly from the raw image struggle to bridge visual semantic information and language. In this thesis, we propose SG2P, a two-stage network that addresses this issue. Instead of working from the raw image, the proposed method leverages features encoded from the scene graph, an intermediate semantic structure of the image, with the aim of generating more semantically correct paragraphs. Given this explicit semantic representation, we hypothesize that features from the scene graph retain more semantic information than features taken directly from the raw image. In addition, SG2P uses a hierarchical recurrent language model with skip connections in order to reduce the effect of vanishing gradients during long generation.
To evaluate the results, we propose a new evaluation metric called c-SPICE, which automatically computes the semantic correctness of generated paragraphs through a graph-based comparison. Experiments show that methods using scene-graph features outperform those using raw-image features on c-SPICE.
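The abstract above outlines a two-stage flow: scene-graph node features are encoded by a graph network, and a hierarchical sentence/word RNN with skip connections decodes the paragraph. The following is a rough, illustrative sketch of that data flow only, not the thesis implementation; the dimensions, the mean-neighbour aggregation standing in for the graph convolution, and the way the sentence topic is re-fed to the word RNN are all assumptions.

```python
# Illustrative sketch only: scene-graph encoding followed by hierarchical
# sentence/word decoding, in the spirit of the abstract above.
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    """Mean aggregation over neighbours; a stand-in for a graph convolution layer."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, adj):
        # node_feats: (N, dim); adj: (N, N) 0/1 adjacency matrix
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neigh = adj @ node_feats / deg                       # mean of neighbour features
        return torch.relu(self.proj(torch.cat([node_feats, neigh], dim=-1)))

class HierarchicalDecoder(nn.Module):
    """A sentence RNN produces one topic vector per sentence; a word RNN emits the
    words of that sentence. The topic is re-fed at every word step, acting as a
    simple skip connection from the sentence level to the word level."""
    def __init__(self, dim, vocab):
        super().__init__()
        self.sent_rnn = nn.GRUCell(dim, dim)
        self.word_rnn = nn.GRUCell(2 * dim, dim)
        self.embed = nn.Embedding(vocab, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, graph_ctx, n_sents=3, n_words=8):
        h_s = torch.zeros(1, graph_ctx.size(-1))
        logits = []
        for _ in range(n_sents):
            h_s = self.sent_rnn(graph_ctx.mean(0, keepdim=True), h_s)  # sentence topic
            h_w = torch.zeros_like(h_s)
            tok = torch.zeros(1, dtype=torch.long)                     # <bos> index 0
            for _ in range(n_words):
                inp = torch.cat([self.embed(tok), h_s], dim=-1)        # topic fed at every word step
                h_w = self.word_rnn(inp, h_w)
                step = self.out(h_w)
                logits.append(step)
                tok = step.argmax(-1)                                  # greedy next token
        return torch.stack(logits, dim=1)                              # (1, sents*words, vocab)

# Toy usage: 4 scene-graph nodes with 64-d features and a fully connected adjacency.
nodes, adj = torch.randn(4, 64), torch.ones(4, 4)
ctx = GraphEncoder(64)(nodes, adj)
paragraph_logits = HierarchicalDecoder(64, vocab=100)(ctx)
```

In the actual system, the sentence RNN attends over the encoded nodes (see Section 4.4.1 in the table of contents below) rather than taking a plain mean, and training would use teacher forcing; the sketch only shows the shape of the two-stage pipeline.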
dc.description.provenance [en]: Made available in DSpace on 2021-06-17T08:06:21Z (GMT). No. of bitstreams: 1. ntu-108-R05922094-1.pdf: 5169614 bytes, checksum: 34b798c7853a22d8983d67b7a4815342 (MD5). Previous issue date: 2019.
dc.description.tableofcontents:
Acknowledgments i
Abstract iii
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objective 2
1.3 Thesis Organization 3
Chapter 2 Related Work 4
2.1 Image Captioning 4
2.2 Image Paragraphing 5
2.3 Scene Graph 6
Chapter 3 Problem Definition 8
3.1 Symbol Table 9
Chapter 4 Methodology 11
4.1 Scene Graph 12
4.2 Scene Graph Construction 13
4.2.1 Generation 14
4.2.2 Generation 14
4.3 Graph Convolution Network 15
4.4 Paragraph Generator 18
4.4.1 Sentence RNN with Semantic Node Attention 19
4.4.2 Word RNN with Shared Semantic Context 20
4.5 Network Architecture 21
4.6 Implementation Detail 21
Chapter 5 Experiment 25
5.1 Experimental Setup 25
5.1.1 Data Sets 25
5.1.2 Evaluation Metrics 26
5.2 c-SPICE 27
5.3 Preprocessing 32
5.4 Fully Convolutional Localization Networks 32
5.5 Experiment Results 33
5.5.1 The Effectiveness of Scene Graph 33
5.5.2 Ablation Study 34
5.5.3 Merging Multi-modal Features 35
5.5.4 Qualitative Study 36
5.5.5 The Effect Image Scene Graph Has on the Results 36
Chapter 6 Conclusion 41
6.1 Summary of Contributions 41
6.2 Future Work 42
Bibliography 43
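The abstract describes c-SPICE (listed as Section 5.2 above) as a graph-based comparison that scores the semantic correctness of a generated paragraph. The exact c-SPICE formulation is not reproduced on this page; purely as an illustration of that style of scoring, with the tuple representation and plain F1 aggregation as assumptions, a minimal sketch could look like this:

```python
# Illustrative sketch of a graph-based comparison in the spirit of c-SPICE.
# A scene graph is reduced to a set of (subject, relation, object) triples and
# (object, attribute) pairs, and generated vs. reference graphs are scored by F1.
def graph_tuples(graph):
    """graph: dict with 'relations' as (subject, relation, object) triples and
    'attributes' as (object, attribute) pairs; returns a set of hashable tuples."""
    tuples = {("rel",) + tuple(t) for t in graph.get("relations", [])}
    tuples |= {("attr",) + tuple(a) for a in graph.get("attributes", [])}
    return tuples

def tuple_f1(generated, reference):
    """F1 over the tuple sets of a generated and a reference scene graph."""
    gen, ref = graph_tuples(generated), graph_tuples(reference)
    if not gen or not ref:
        return 0.0
    tp = len(gen & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(gen), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

# Toy example: one matching relation and one matching attribute out of three
# tuples on each side, giving F1 = 2/3.
gen_graph = {"relations": [("man", "riding", "horse"), ("horse", "on", "beach")],
             "attributes": [("horse", "brown")]}
ref_graph = {"relations": [("man", "riding", "horse")],
             "attributes": [("horse", "brown"), ("man", "young")]}
print(round(tuple_f1(gen_graph, ref_graph), 3))  # 0.667
```

A full metric of this kind would first parse the generated paragraph into a scene graph before the comparison, and would typically allow synonym matches rather than exact string equality.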
dc.language.iso: en
dc.subject: 圖片段落生成 (image paragraphing) [zh_TW]
dc.subject: 場景圖生成 (scene graph generation) [zh_TW]
dc.subject: 類神經網路 (neural network) [zh_TW]
dc.subject: Image Paragraphing [en]
dc.subject: Neural Network [en]
dc.subject: Scene Graph Generation [en]
dc.title: 使用場景圖生成圖像之段落描述 (Generating paragraph descriptions of images using scene graphs) [zh_TW]
dc.title: SG2P: Image Paragraphing with Scene Graph [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 李明穗, 楊智淵, 陳維超, 古倫維
dc.subject.keyword: 圖片段落生成, 場景圖生成, 類神經網路 [zh_TW]
dc.subject.keyword: Image Paragraphing, Scene Graph Generation, Neural Network [en]
dc.relation.page: 46
dc.identifier.doi: 10.6342/NTU201904003
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2019-08-20
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-108-1.pdf | Size: 5.05 MB | Format: Adobe PDF | Access: restricted (未授權公開取用, not authorized for public access)

