Personal Knowledge Base Construction from Multimodal Data for Lifelog Retrieval (從多模態資料建立個人知識庫與生活事件檢索)

Chia-Chun Chang; 張家郡

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50587

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳信希(Hsin-Hsi Chen)
dc.contributor.author	Chia-Chun Chang	en
dc.contributor.author	張家郡	zh_TW
dc.date.accessioned	2021-06-15T12:47:31Z	-
dc.date.available	2020-08-25
dc.date.copyright	2020-08-25
dc.date.issued	2020
dc.date.submitted	2020-08-12
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50587	-
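dc.description.abstract	With advances in technology, wearable devices have become increasingly common, and people are more inclined to record their lives with them. The form of recording has also shifted from illustrated text blogs to video blogs (vlogs) that pair video with speech. However, as the volume of data grows rapidly, building an effective retrieval method for fast memory recall has become a difficult problem. The difficulty lies not only in the semantic gap between images and text, but also in differences in how events are interpreted at high and low levels. In this thesis, we introduce external semantic knowledge to build text-image retrieval and address the semantic gap between images and text. We also build a vlog dataset from three travel-focused YouTubers and use it to train a model that constructs a personal knowledge base by exploiting the complementary nature of images and text. We conduct two experiments on two different datasets: (1) retrieving specific events in a lifelogger's experience from multimodal data, and (2) automatically constructing a personal knowledge base. For retrieving specific events from daily lifelogs, we extract image information with an external image recognition model and combine it with semantic knowledge from external resources to strengthen the training of the text-matching embedding. For automatic personal knowledge base construction, we obtain video information with a pre-trained visual feature extraction model, combine it with the encoded text, and classify the events contained in each video, thereby constructing the personal knowledge base. The two proposed methods not only improve lifelog retrieval performance but also effectively use the complementary nature of images and text to improve personal knowledge base construction.	zh_TW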
dc.description.abstract	With the progress of science and technology, wearable devices have become increasingly popular, and people tend to record their daily lives with them. In the past, people used blogs to record their lives, and most blogs consisted of lines of text accompanying pictures. Nowadays, people record their daily lives as vlogs, which contain video with voice information. However, with the enormous growth of data, how to process personal data efficiently has become a critical problem. The difficulty of this topic stems not only from the semantic gap between words and images, but also from the different ways people interpret an image. In this paper, we utilize external knowledge to address the semantic gap between words and images. We also propose a brand-new video dataset with subtitles. The videos were recorded mainly by three YouTubers, and all of their content is about traveling. We build a model that utilizes the complementary property between words and images to construct a personal knowledge base. We use two different datasets to conduct the following two experiments separately: (1) retrieving specified lifelogger events for memory recall, and (2) automatically constructing the personal knowledge base. For retrieving a lifelogger's events, we extract information from the images with a pre-trained model. Moreover, we combine that information with external resources to enhance the training of the semantic embedding. For the construction of the personal knowledge base, our model summarizes the possible events that happen in a video using the extracted video information and the encoded subtitles. The approaches proposed in this paper not only enhance the performance of lifelog retrieval, but also effectively exploit the complementary property between words and images for constructing a personal knowledge base.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T12:47:31Z (GMT). No. of bitstreams: 1 U0001-1108202010325800.pdf: 17942315 bytes, checksum: ff37526c224cb11642cec6f9f380cd1d (MD5) Previous issue date: 2020	en
dc.description.tableofcontents	Oral Examination Committee Certification iii; Acknowledgements v; Chinese Abstract vii; Abstract ix; 1 Introduction 1; 1.1 Motivation 1; 1.2 The Semantic Gap 3; 1.3 Lifelog Image Retrieval 5; 1.4 Personal Knowledge Base Construction 6; 1.5 Thesis Organization 7; 2 Related Work 8; 2.1 Lifelogging 8; 2.2 Lifelog Retrieval 10; 2.2.1 Lifelog and Memory Recall 10; 2.2.2 Image Retrieval 10; 2.3 Personal Knowledge Base Construction 11; 2.4 Semantic Knowledge 13; 3 Dataset 15; 3.1 Existing Lifelog Dataset 15; 3.1.1 Existing Lifelog Dataset 15; 3.2 The NTCIR-13 Lifelog Dataset 17; 3.2.1 Dataset Overview 18; 3.2.2 Image Preprocessing 18; 3.2.3 Visual Concept Labeling 19; 3.3 Our Lifelog Dataset 20; 3.3.1 Dataset Overview 20; 3.3.2 Preprocessing 22; 4 Visual Image Retrieval 29; 4.1 Task Description 29; 4.2 Dataset 30; 4.3 Method 31; 4.3.1 Overview 31; 4.3.2 Retrieval Framework 31; 4.3.3 Interactive Operations 32; 4.3.4 Query Suggestion 33; 4.3.5 Retrieval Result Refinement 35; 4.3.6 Experiment Setting 37; 4.3.7 Evaluation Metrics 37; 4.4 Result 38; 5 Personal Knowledge Base Construction 42; 5.1 Task Description 42; 5.2 Dataset 43; 5.3 Models 44; 5.3.1 Input Features 45; 5.3.2 Model Structure 46; 5.4 Experiment Setting 48; 5.4.1 Hyperparameters 48; 5.4.2 Evaluation Metrics 49; 5.5 Result 49; 5.5.1 Overall Results 49; 5.5.2 Label-wise Results 50; 6 Conclusion and Future Work 53; 6.1 Conclusion 53; 6.2 Future Work 54; Bibliography 56
dc.language.iso	en
dc.title	從多模態資料建立個人知識庫與生活事件檢索	zh_TW
dc.title	Personal Knowledge Base Construction from Multimodal Data for Lifelog Retrieval	en
dc.type	Thesis
dc.date.schoolyear	108-2
dc.description.degree	Master
dc.contributor.oralexamcommittee	鄭卜壬(Pu-Jen Cheng),蔡銘峰(Ming-Feng Tsai),張嘉惠(Chia-Hui Chang)
dc.subject.keyword	Lifelog, Life Event Retrieval, Word Embedding Vectors, Image-Text Embedding Learning, Personal Knowledge Base	zh_TW
dc.subject.keyword	Lifelog, Visual Lifelog Retrieval, Word Embedding, Image-text Embedding Learning, Personal Knowledge Base	en
dc.relation.page	64
dc.identifier.doi	10.6342/NTU202002902
dc.rights.note	Paid authorization
dc.date.accepted	2020-08-12
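dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Networking and Multimedia	zh_TW
Appears in collections:	Graduate Institute of Networking and Multimedia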

Files in this item:

File	Size	Format
U0001-1108202010325800.pdf (not currently authorized for public access)	17.52 MB	Adobe PDF


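Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.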