Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99614
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 施吉昇 | zh_TW
dc.contributor.advisor | Chi-Sheng Shih | en
dc.contributor.author | 曾益銘 | zh_TW
dc.contributor.author | Yik-Ming Chin | en
dc.date.accessioned | 2025-09-17T16:08:49Z | -
dc.date.available | 2025-09-18 | -
dc.date.copyright | 2025-09-17 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-08-11 | -
dc.identifier.citation | Autoware Foundation, “Autoware - open-source software for autonomous driving,” https://github.com/autowarefoundation/autoware, 2025, accessed: 2025-06-27.

A. Aksjonov and V. Kyrki, “Rule-based decision-making system for autonomous vehicles at intersections with mixed traffic environment,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 660–666.

Í. B. Viana and N. Aouf, “Distributed cooperative path-planning for autonomous vehicles integrating human driver trajectories,” in 2018 International Conference on Intelligent Systems (IS), 2018, pp. 655–661.

M. P. Ronecker and Y. Zhu, “Deep Q-network based decision making for autonomous driving,” in 2019 3rd International Conference on Robotics and Automation Sciences (ICRAS). IEEE, Jun. 2019, pp. 154–160. [Online]. Available: http://dx.doi.org/10.1109/ICRAS.2019.8808950

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” 2019. [Online]. Available: https://arxiv.org/abs/1509.02971

S. Malla, B. Dariush, and C. Choi, “TITAN: Future forecast using action priors,” 2020. [Online]. Available: https://arxiv.org/abs/2003.13886

C. Sima, K. Renz, K. Chitta, L. Chen, H. Zhang, C. Xie, J. Beißwenger, P. Luo, A. Geiger, and H. Li, “DriveLM: Driving with graph visual question answering,” 2025. [Online]. Available: https://arxiv.org/abs/2312.14150

S. Hu, L. Chen, P. Wu, H. Li, J. Yan, and D. Tao, “ST-P3: End-to-end vision-based autonomous driving via spatial-temporal feature learning,” 2022. [Online]. Available: https://arxiv.org/abs/2207.07601

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” 2021. [Online]. Available: https://arxiv.org/abs/2010.11929

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot learners,” 2020. [Online]. Available: https://arxiv.org/abs/2005.14165

H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” 2023. [Online]. Available: https://arxiv.org/abs/2304.08485

P. Scheffe, T. M. Henneken, M. Kloock, and B. Alrifaee, “Sequential convex programming methods for real-time optimal trajectory planning in autonomous vehicle racing,” IEEE Transactions on Intelligent Vehicles, vol. 8, no. 1, pp. 661–672, 2023.

H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” 2020. [Online]. Available: https://arxiv.org/abs/1903.11027

J. Mao, Y. Qian, J. Ye, H. Zhao, and Y. Wang, “GPT-Driver: Learning to drive with GPT,” 2023. [Online]. Available: https://arxiv.org/abs/2310.01415

Z. Xu, Y. Zhang, E. Xie, Z. Zhao, Y. Guo, K.-Y. K. Wong, Z. Li, and H. Zhao, “DriveGPT4: Interpretable end-to-end autonomous driving via large language model,” 2024. [Online]. Available: https://arxiv.org/abs/2310.01412

J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, “Textual explanations for self-driving vehicles,” 2018. [Online]. Available: https://arxiv.org/abs/1807.11546

S. Xie, L. Kong, Y. Dong, C. Sima, W. Zhang, Q. A. Chen, Z. Liu, and L. Pan, “Are VLMs ready for autonomous driving? An empirical study from the reliability, data, and metric perspectives,” 2025. [Online]. Available: https://arxiv.org/abs/2501.04003

Meta AI, “Llama 3.2: Revolutionizing edge AI and vision with open, customizable models,” September 2024. [Online]. Available: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/

H. Liu, C. Li, Y. Li, B. Li, Y. Zhang, S. Shen, and Y. J. Lee, “LLaVA-NeXT: Improved reasoning, OCR, and world knowledge,” January 2024. [Online]. Available: https://llava-vl.github.io/blog/2024-01-30-llava-next/

Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Lu et al., “InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 24185–24198.

M. Abdin, J. Aneja, H. Awadalla, A. Awadallah, A. A. Awan, N. Bach, A. Bahree, A. Bakhtiari, J. Bao, H. Behl, A. Benhaim, M. Bilenko, J. Bjorck, S. Bubeck, M. Cai, Q. Cai, V. Chaudhary, D. Chen, D. Chen, W. Chen, Y.-C. Chen, Y.-L. Chen, H. Cheng, P. Chopra, X. Dai, M. Dixon, R. Eldan, V. Fragoso, J. Gao, M. Gao, M. Gao, A. Garg, A. D. Giorno, A. Goswami, S. Gunasekar, E. Haider, J. Hao, R. J. Hewett, W. Hu, J. Huynh, D. Iter, S. A. Jacobs, M. Javaheripi, X. Jin, N. Karampatziakis, P. Kauffmann, M. Khademi, D. Kim, Y. J. Kim, L. Kurilenko, J. R. Lee, Y. T. Lee, Y. Li, Y. Li, C. Liang, L. Liden, X. Lin, Z. Lin, C. Liu, L. Liu, M. Liu, W. Liu, X. Liu, C. Luo, P. Madan, A. Mahmoudzadeh, D. Majercak, M. Mazzola, C. C. T. Mendes, A. Mitra, H. Modi, A. Nguyen, B. Norick, B. Patra, D. Perez-Becker, T. Portet, R. Pryzant, H. Qin, M. Radmilac, L. Ren, G. de Rosa, C. Rosset, S. Roy, O. Ruwase, O. Saarikivi, A. Saied, A. Salim, M. Santacroce, S. Shah, N. Shang, H. Sharma, Y. Shen, S. Shukla, X. Song, M. Tanaka, A. Tupini, P. Vaddamanu, C. Wang, G. Wang, L. Wang, S. Wang, X. Wang, Y. Wang, R. Ward, W. Wen, P. Witte, H. Wu, X. Wu, M. Wyatt, B. Xiao, C. Xu, J. Xu, W. Xu, J. Xue, S. Yadav, F. Yang, J. Yang, Y. Yang, Z. Yang, D. Yu, L. Yuan, C. Zhang, C. Zhang, J. Zhang, L. L. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, and X. Zhou, “Phi-3 technical report: A highly capable language model locally on your phone,” 2024. [Online]. Available: https://arxiv.org/abs/2404.14219

P. Wang, S. Bai, S. Tan, S. Wang, Z. Fan, J. Bai, K. Chen, X. Liu, J. Wang, W. Ge, Y. Fan, K. Dang, M. Du, X. Ren, R. Men, D. Liu, C. Zhou, J. Zhou, and J. Lin, “Qwen2-VL: Enhancing vision-language model’s perception of the world at any resolution,” 2024. [Online]. Available: https://arxiv.org/abs/2409.12191

J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

E. Ferrier-Barbut, D. Vaufreydaz, J.-A. David, J. Lussereau, and A. Spalanzani, “Personal space of autonomous car’s passengers sitting in the driver’s seat,” in 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 2022–2029.

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” 2021. [Online]. Available: https://arxiv.org/abs/2106.09685

K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, P. Isabelle, E. Charniak, and D. Lin, Eds. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, Jul. 2002, pp. 311–318. [Online]. Available: https://aclanthology.org/P02-1040/

C.-Y. Lin, “ROUGE: A package for automatic evaluation of summaries,” in Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics, Jul. 2004, pp. 74–81. [Online]. Available: https://aclanthology.org/W04-1013/

S. Banerjee and A. Lavie, “METEOR: An automatic metric for MT evaluation with improved correlation with human judgments,” in Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, J. Goldstein, A. Lavie, C.-Y. Lin, and C. Voss, Eds. Ann Arbor, Michigan: Association for Computational Linguistics, Jun. 2005, pp. 65–72. [Online]. Available: https://aclanthology.org/W05-0909/

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99614 | -
dc.description.abstract | 現代自駕系統在高密度城市環境中表現不佳,主要原因在於這些場景中社會互動頻繁且充滿不確定性,對路徑規劃構成挑戰。現有方法多依賴僵化的基於規則模組,或僅針對場景幾何優化學習型模型,卻常無法有效捕捉類似人類的社會化互動認知與情境推理能力。相較之下,我們主張,一個具備社會認知的規劃器應當能夠聯合考慮空間、語意與行為線索,以生成符合潛在社會規範的行車軌跡。為此,我們提出一個基於視覺語言模型(VLM)的框架,整合社會屬性理解與路徑生成,並具備可部署性。本方法利用視覺輸入與細緻的行為標註,並透過自訂損失函數進行模型訓練,以同時對齊軌跡與語意資訊。在經精選的 TITAN 資料集中進行評估後,我們的系統展現出良好的安全性與社會化互動認知。實驗結果顯示,本系統在四項指標上均有優異表現:在 NLP 評估指標方面,BLEU-4 達到 0.21,ROUGE-L 為 0.37,METEOR 為 0.52,GPT-4o 評分指標達到 86 分,VQA 準確率為 44%,平均 L2 距離誤差為 30 像素,顯示出出色的語意理解與規劃能力。本研究提倡以規劃為核心的學習方法,結合社會語境監督,並提供一套具社會互動認知的自駕解決方案。 | zh_TW
dc.description.abstract | Modern autonomous driving systems struggle in dense urban environments where social interactions and uncertainty dominate the planning landscape. Existing approaches either rely on rigid rule-based modules or optimize learning-based models on scene geometry, but often fail to capture human-like social awareness and contextual reasoning. In contrast, this work argues that a socially-informed planner should reason jointly over spatial, semantic, and behavioral cues to generate trajectories aligned with implicit social norms. Toward this goal, this work proposes a Vision-Language Model (VLM)-based framework that unifies social attribute understanding and trajectory generation within a deployable architecture. Leveraging visual inputs and fine-grained action annotations, the model is trained with a custom loss that accounts for both trajectory and semantic alignment. Evaluated on a curated subset of the TITAN dataset, the system exhibits safe and socially compliant behavior. Experimental results show strong performance across four key evaluations: the model achieves a BLEU-4 score of 0.21, ROUGE-L of 0.37, and METEOR of 0.52 on NLP metrics; a GPT-4o Rubric Score of 86; a VQA accuracy of 44%; and an average L2 distance error of 30 pixels, reflecting robust reasoning and planning capabilities. This work advocates for planning-centric learning with socially grounded supervision, and offers a solution toward socially-aware autonomous driving. | en
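As a rough, hypothetical illustration of the custom loss mentioned in the abstract above (joint trajectory and semantic alignment), a minimal PyTorch sketch might combine token-level cross-entropy on the generated answer with a pixel-space L2 penalty on the decoded waypoints. The function name, tensor shapes, and the weighting term lambda_traj are illustrative assumptions and are not taken from the thesis.

import torch
import torch.nn.functional as F

def socially_aware_loss(lm_logits, target_ids, pred_waypoints, gt_waypoints, lambda_traj=1.0):
    """Hypothetical combined loss: semantic alignment (token cross-entropy)
    plus trajectory alignment (mean L2 waypoint error in pixels)."""
    # Semantic alignment: next-token cross-entropy over the answer tokens.
    # lm_logits: (B, T, V); target_ids: (B, T), with padding marked as -100.
    ce = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        target_ids.reshape(-1),
        ignore_index=-100,
    )
    # Trajectory alignment: mean L2 distance between predicted and
    # ground-truth 2D waypoints, both in image pixel coordinates.
    # pred_waypoints, gt_waypoints: (B, N, 2).
    l2 = torch.linalg.norm(pred_waypoints - gt_waypoints, dim=-1).mean()
    return ce + lambda_traj * l2

In a VLM pipeline such as the one described, pred_waypoints would typically be parsed from the model's generated text before the geometric penalty can be applied; lambda_traj trades off language fidelity against waypoint accuracy.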
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-17T16:08:49Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-09-17T16:08:49Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | 口試委員審定書 i
誌謝 iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Motivation 4
1.2 Contribution 7
1.3 Thesis Organization 8
Chapter 2 Background and Related Works 11
2.1 Software Stack of Autonomous Vehicle and Planning Module 11
2.2 Dense Urban Environments Definition and Public Datasets 13
2.3 Related Works 17
2.3.1 LLMs to VLMs 17
2.3.2 Rule-Based Decision-Making System for Autonomous Vehicles at Intersections with Mixed Traffic Environment 18
2.3.3 ST-P3 19
2.3.4 GPT-Driver 20
2.3.5 DriveGPT4 21
2.3.6 DriveLM 21
Chapter 3 System Architecture and Problem Definition 23
3.1 System Architecture 23
3.2 Problem Definition 26
Chapter 4 Design and Implementation 29
4.1 Design 29
4.2 Implementation 31
4.2.1 Keyframe Selection 31
4.2.2 Social Cue Structuring 31
4.2.3 Camera Pose Estimation and 3D-to-2D Projection 32
4.2.4 Groundtruth Generation via GPT-4o 33
4.2.5 Prompt Engineering 34
4.2.6 Parameter-Efficient Fine-Tuning with LoRA 36
4.2.7 Inference and Postprocessing 38
Chapter 5 Performance Evaluation 39
5.1 Experiment Setup 40
5.2 Experiment Results 43
5.2.1 BLEU-4 43
5.2.2 ROUGE-L 46
5.2.3 METEOR 48
5.2.4 GPT-4o Rubric Score 50
5.2.5 GPT-4o Rubric Score (Evaluate after quantized to 4bit) 51
5.2.6 VQA Accuracy 52
5.2.7 L2 Distance Error 53
5.3 Analysis 55
5.4 Discussion 57
Chapter 6 Conclusion and Future Works 59
6.1 Conclusion 59
6.2 Future Works 59
References 61
dc.language.iso | en | -
dc.subject | 路徑規劃 | zh_TW
dc.subject | 社會化互動認知 | zh_TW
dc.subject | 視覺語言模型 | zh_TW
dc.subject | 大語言模型 | zh_TW
dc.subject | 自駕車 | zh_TW
dc.subject | Motion Planning | en
dc.subject | Socially-Aware | en
dc.subject | Vision Language Model | en
dc.subject | Large Language Model | en
dc.subject | Autonomous Vehicle | en
dc.title | 應用視覺語言模型於高密度市區中進行具有社會化互動認知之自駕車路徑規劃 | zh_TW
dc.title | Trajectory Planning in Dense Urban Environments: Utilizing Vision-Language Models to Learn Socially-Aware Behaviors for High-Uncertainty Scenarios | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 洪士灝;傅立成;郭峻因 | zh_TW
dc.contributor.oralexamcommittee | Shih-Hao Hung;Li-Chen Fu;Jiun-In Guo | en
dc.subject.keyword | 自駕車,路徑規劃,大語言模型,視覺語言模型,社會化互動認知 | zh_TW
dc.subject.keyword | Autonomous Vehicle,Motion Planning,Large Language Model,Vision Language Model,Socially-Aware | en
dc.relation.page | 72 | -
dc.identifier.doi | 10.6342/NTU202502673 | -
dc.rights.note | 未授權 | -
dc.date.accepted | 2025-08-14 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資訊網路與多媒體研究所 | -
dc.date.embargo-lift | N/A | -
Appears in Collections: 資訊網路與多媒體研究所

Files in This Item:
File | Size | Format
ntu-113-2.pdf (Restricted Access) | 5.79 MB | Adobe PDF