Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91119
Full metadata record
dc.contributor.advisor (zh_TW): 古倫維
dc.contributor.advisor (en): Lun-Wei Ku
dc.contributor.author (zh_TW): 林良璞
dc.contributor.author (en): Nicholas Collin Suwono
dc.date.accessioned: 2023-10-24T17:11:37Z
dc.date.available: 2025-08-09
dc.date.copyright: 2023-10-24
dc.date.issued: 2023
dc.date.submitted: 2023-08-12
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91119
dc.description.abstract (zh_TW): 本工作引入了一項新的任務,即位置感知視覺問題生成(LocaVQG),旨在從與特定地理位置相關的數據中生成引人入勝的問題。具體而言,我們使用周圍的圖像和GPS坐標來表示這種位置感知信息。為了應對這個任務,我們提出了一個數據集生成流程,利用GPT-4來生成多樣且複雜的問題。然後,我們旨在學習一個輕量級模型,可以應用於邊緣設備,如手機。為此,我們提出了一種可靠地從位置感知信息生成引人入勝問題的方法。我們提出的方法在人工評估(例如參與度,連接性,連貫性)和自動評估指標(例如BERTScore,ROUGE-2)方面優於基線方法。此外,我們進行了大量消融研究,以證明我們提出的數據集生成和解決該任務的技術的有效性。
dc.description.abstract (en): This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location. Specifically, we represent such location-aware information with surrounding images and a GPS coordinate. To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. Then, we aim to learn a lightweight model that can address the LocaVQG task and fit on an edge device, such as a mobile phone. To this end, we propose a method that can reliably generate engaging questions from location-aware information. Our proposed method outperforms baselines on human evaluation (e.g., engagement, grounding, coherence) and automatic evaluation metrics (e.g., BERTScore, ROUGE-2). Moreover, we conduct extensive ablation studies to justify our proposed techniques for both generating the dataset and solving the task.
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-10-24T17:11:37Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2023-10-24T17:11:37Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
  Verification Letter from the Oral Examination Committee  i
  Acknowledgements  ii
  摘要  iv
  Abstract  v
  Contents  vi
  List of Figures  x
  List of Tables  xi
  Denotation  xii
  Chapter 1  Introduction  1
    1.1  Motivation  1
    1.2  Contributions  4
  Chapter 2  Literature Review  5
    2.1  In-car intelligent assistant system  5
    2.2  Visual and Language  7
    2.3  Training with LLM Generated Data  9
    2.4  Language Models  11
    2.5  Lightweight models  15
  Chapter 3  Methodology  17
    3.1  Initial Method: Multitask  17
      3.1.1  Multitask Story and Question  17
      3.1.2  Generate Description with Question  19
    3.2  Proposed Method: LocaVQG  20
      3.2.1  Location-Aware Information  21
      3.2.2  LocaVQG Task Tuples  22
      3.2.3  FDT5  22
        3.2.3.1  Knowledge Distillation  23
        3.2.3.2  Post-Inference Filtering  23
  Chapter 4  LocaVQG Dataset  25
    4.1  Choosing the Image Sequences  25
    4.2  GPT-4 Prompt Construction  26
      4.2.1  Captioning Streetview Images  26
      4.2.2  Reverse Geocoding GPS coordinate  27
      4.2.3  Constructing prompts  27
    4.3  Engaging Question Classifier  28
    4.4  Dataset Statistics  30
      4.4.1  Question Length  31
      4.4.2  Frequent Trigrams and Words  31
      4.4.3  Question Quality  32
    4.5  Human Data Annotation  34
  Chapter 5  Experiment 1: Multitasking Model  35
    5.1  Experimental Setup  35
      5.1.1  Hardware & Hyper parameter setup  35
      5.1.2  Baseline Model: VLT5  36
      5.1.3  Evaluation Criteria  36
    5.2  Results & Discussion  37
      5.2.1  Human Evaluation  37
      5.2.2  Qualitative Results  38
  Chapter 6  Experiment 2: LocaVQG  41
    6.1  Experimental Setup  41
      6.1.1  Hardware & Hyper parameter setup  41
      6.1.2  Baseline Models  42
        6.1.2.1  T5  42
        6.1.2.2  MVQG-VL-T5  43
      6.1.3  Evaluation Metrics  44
    6.2  Results & Discussion  45
      6.2.1  Human Evaluation  45
      6.2.2  Automatic Evaluation  46
      6.2.3  Ablation Study  47
        6.2.3.1  Employing Engaging Question Classifier  47
        6.2.3.2  Incorporating GPS Coordinates  48
        6.2.3.3  Varying Dataset Sizes  49
        6.2.3.4  Incorporating Directions  50
        6.2.3.5  Roles of GPT Models  51
  Chapter 7  Error Analyses  53
    7.1  Case Studies  53
    7.2  Evaluation Metrics  54
    7.3  Discussion  54
  Chapter 8  Conclusion  58
    8.1  Limitation  59
      8.1.1  Biases in AMT workers  59
      8.1.2  Location-aware information  59
      8.1.3  Address-aware LLMs  59
      8.1.4  Human evaluation setup  60
    8.2  Future Works  60
  References  62
  Appendix A — Survey Interface  67
  Appendix B — Human Annotation Examples  69
dc.language.iso: en
dc.subject (zh_TW): 電動汽車
dc.subject (zh_TW): LLM
dc.subject (zh_TW): 引人入勝的問題
dc.subject (zh_TW): 司機
dc.subject (zh_TW): 駕駛助手
dc.subject (zh_TW): 位置
dc.subject (en): LLM
dc.subject (en): Driver
dc.subject (en): Engaging Question
dc.subject (en): Driving Assistants
dc.subject (en): Electric Vehicles
dc.subject (en): Location
dc.title (zh_TW): 利用輕量模型進行位置感知視覺問題生成
dc.title (en): Location-Aware Visual Question Generation with Lightweight Models
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor (zh_TW): 孫紹華
dc.contributor.coadvisor (en): Shao-Hua Sun
dc.contributor.oralexamcommittee (zh_TW): 黃挺豪;李政德
dc.contributor.oralexamcommittee (en): Ting-Hao Huang;Cheng-Te Li
dc.subject.keyword (zh_TW): 位置,電動汽車,駕駛助手,司機,引人入勝的問題,LLM
dc.subject.keyword (en): Location, Electric Vehicles, Driving Assistants, Driver, Engaging Question, LLM
dc.relation.page: 69
dc.identifier.doi: 10.6342/NTU202303856
dc.rights.note: 同意授權(限校園內公開) (authorized for on-campus access only)
dc.date.accepted: 2023-08-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資料科學學位學程 (Data Science Degree Program)
dc.date.embargo-lift: 2028-08-09
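The automatic metrics named in the abstract, ROUGE-2 and BERTScore, can be computed with standard open-source packages. The following is a minimal illustrative sketch, assuming the rouge-score and bert-score Python packages and two invented example questions; it is not the evaluation code used in the thesis.

# Illustrative only: score one generated question against one reference question
# with ROUGE-2 and BERTScore. The example strings are invented for demonstration.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "What historical events do you think shaped this old train station?"
candidate = "What stories could this old train station tell about the city's past?"

# ROUGE-2: bigram overlap between the generated and reference questions.
scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)
rouge2_f1 = scorer.score(reference, candidate)["rouge2"].fmeasure

# BERTScore: similarity in contextual embedding space; returns precision, recall,
# and F1 as tensors (one element per candidate/reference pair).
p, r, f1 = bert_score([candidate], [reference], lang="en")

print(f"ROUGE-2 F1: {rouge2_f1:.3f}  BERTScore F1: {f1.item():.3f}")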
Appears in Collections: Data Science Degree Program

Files in This Item:
File: ntu-111-2.pdf
Size: 10.86 MB
Format: Adobe PDF
Access: Restricted (not authorized for public access)