Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99614

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 施吉昇 | zh_TW |
| dc.contributor.advisor | Chi-Sheng Shih | en |
| dc.contributor.author | 曾益銘 | zh_TW |
| dc.contributor.author | Yik-Ming Chin | en |
| dc.date.accessioned | 2025-09-17T16:08:49Z | - |
| dc.date.available | 2025-09-18 | - |
| dc.date.copyright | 2025-09-17 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-11 | - |
| dc.identifier.citation | Autoware Foundation, "Autoware - open-source software for autonomous driving," https://github.com/autowarefoundation/autoware, 2025, accessed: 2025-06-27.
A. Aksjonov and V. Kyrki, "Rule-based decision-making system for autonomous vehicles at intersections with mixed traffic environment," in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 660–666.
Í. B. Viana and N. Aouf, "Distributed cooperative path-planning for autonomous vehicles integrating human driver trajectories," in 2018 International Conference on Intelligent Systems (IS), 2018, pp. 655–661.
M. P. Ronecker and Y. Zhu, "Deep Q-network based decision making for autonomous driving," in 2019 3rd International Conference on Robotics and Automation Sciences (ICRAS). IEEE, Jun. 2019, pp. 154–160. [Online]. Available: http://dx.doi.org/10.1109/ICRAS.2019.8808950
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," 2019. [Online]. Available: https://arxiv.org/abs/1509.02971
S. Malla, B. Dariush, and C. Choi, "TITAN: Future forecast using action priors," 2020. [Online]. Available: https://arxiv.org/abs/2003.13886
C. Sima, K. Renz, K. Chitta, L. Chen, H. Zhang, C. Xie, J. Beißwenger, P. Luo, A. Geiger, and H. Li, "DriveLM: Driving with graph visual question answering," 2025. [Online]. Available: https://arxiv.org/abs/2312.14150
S. Hu, L. Chen, P. Wu, H. Li, J. Yan, and D. Tao, "ST-P3: End-to-end vision-based autonomous driving via spatial-temporal feature learning," 2022. [Online]. Available: https://arxiv.org/abs/2207.07601
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," 2021. [Online]. Available: https://arxiv.org/abs/2010.11929
T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language models are few-shot learners," 2020. [Online]. Available: https://arxiv.org/abs/2005.14165
H. Liu, C. Li, Q. Wu, and Y. J. Lee, "Visual instruction tuning," 2023. [Online]. Available: https://arxiv.org/abs/2304.08485
P. Scheffe, T. M. Henneken, M. Kloock, and B. Alrifaee, "Sequential convex programming methods for real-time optimal trajectory planning in autonomous vehicle racing," IEEE Transactions on Intelligent Vehicles, vol. 8, no. 1, pp. 661–672, 2023.
H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, "nuScenes: A multimodal dataset for autonomous driving," 2020. [Online]. Available: https://arxiv.org/abs/1903.11027
J. Mao, Y. Qian, J. Ye, H. Zhao, and Y. Wang, "GPT-Driver: Learning to drive with GPT," 2023. [Online]. Available: https://arxiv.org/abs/2310.01415
Z. Xu, Y. Zhang, E. Xie, Z. Zhao, Y. Guo, K.-Y. K. Wong, Z. Li, and H. Zhao, "DriveGPT4: Interpretable end-to-end autonomous driving via large language model," 2024. [Online]. Available: https://arxiv.org/abs/2310.01412
J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, "Textual explanations for self-driving vehicles," 2018. [Online]. Available: https://arxiv.org/abs/1807.11546
S. Xie, L. Kong, Y. Dong, C. Sima, W. Zhang, Q. A. Chen, Z. Liu, and L. Pan, "Are VLMs ready for autonomous driving? An empirical study from the reliability, data, and metric perspectives," 2025. [Online]. Available: https://arxiv.org/abs/2501.04003
Meta AI, "Llama 3.2: Revolutionizing edge AI and vision with open, customizable models," September 2024. [Online]. Available: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
H. Liu, C. Li, Y. Li, B. Li, Y. Zhang, S. Shen, and Y. J. Lee, "LLaVA-NeXT: Improved reasoning, OCR, and world knowledge," January 2024. [Online]. Available: https://llava-vl.github.io/blog/2024-01-30-llava-next/
Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Lu et al., "InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 24185–24198.
M. Abdin, J. Aneja, H. Awadalla, A. Awadallah, A. A. Awan, N. Bach, A. Bahree, A. Bakhtiari, J. Bao, H. Behl, A. Benhaim, M. Bilenko, J. Bjorck, S. Bubeck, M. Cai, Q. Cai, V. Chaudhary, D. Chen, D. Chen, W. Chen, Y.-C. Chen, Y.-L. Chen, H. Cheng, P. Chopra, X. Dai, M. Dixon, R. Eldan, V. Fragoso, J. Gao, M. Gao, M. Gao, A. Garg, A. D. Giorno, A. Goswami, S. Gunasekar, E. Haider, J. Hao, R. J. Hewett, W. Hu, J. Huynh, D. Iter, S. A. Jacobs, M. Javaheripi, X. Jin, N. Karampatziakis, P. Kauffmann, M. Khademi, D. Kim, Y. J. Kim, L. Kurilenko, J. R. Lee, Y. T. Lee, Y. Li, Y. Li, C. Liang, L. Liden, X. Lin, Z. Lin, C. Liu, L. Liu, M. Liu, W. Liu, X. Liu, C. Luo, P. Madan, A. Mahmoudzadeh, D. Majercak, M. Mazzola, C. C. T. Mendes, A. Mitra, H. Modi, A. Nguyen, B. Norick, B. Patra, D. Perez-Becker, T. Portet, R. Pryzant, H. Qin, M. Radmilac, L. Ren, G. de Rosa, C. Rosset, S. Roy, O. Ruwase, O. Saarikivi, A. Saied, A. Salim, M. Santacroce, S. Shah, N. Shang, H. Sharma, Y. Shen, S. Shukla, X. Song, M. Tanaka, A. Tupini, P. Vaddamanu, C. Wang, G. Wang, L. Wang, S. Wang, X. Wang, Y. Wang, R. Ward, W. Wen, P. Witte, H. Wu, X. Wu, M. Wyatt, B. Xiao, C. Xu, J. Xu, W. Xu, J. Xue, S. Yadav, F. Yang, J. Yang, Y. Yang, Z. Yang, D. Yu, L. Yuan, C. Zhang, C. Zhang, J. Zhang, L. L. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, and X. Zhou, "Phi-3 technical report: A highly capable language model locally on your phone," 2024. [Online]. Available: https://arxiv.org/abs/2404.14219
P. Wang, S. Bai, S. Tan, S. Wang, Z. Fan, J. Bai, K. Chen, X. Liu, J. Wang, W. Ge, Y. Fan, K. Dang, M. Du, X. Ren, R. Men, D. Liu, C. Zhou, J. Zhou, and J. Lin, "Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution," 2024. [Online]. Available: https://arxiv.org/abs/2409.12191
J. L. Schönberger and J.-M. Frahm, "Structure-from-motion revisited," in Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
E. Ferrier-Barbut, D. Vaufreydaz, J.-A. David, J. Lussereau, and A. Spalanzani, "Personal space of autonomous car's passengers sitting in the driver's seat," in 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 2022–2029.
E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," 2021. [Online]. Available: https://arxiv.org/abs/2106.09685
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, P. Isabelle, E. Charniak, and D. Lin, Eds. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, Jul. 2002, pp. 311–318. [Online]. Available: https://aclanthology.org/P02-1040/
C.-Y. Lin, "ROUGE: A package for automatic evaluation of summaries," in Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics, Jul. 2004, pp. 74–81. [Online]. Available: https://aclanthology.org/W04-1013/
S. Banerjee and A. Lavie, "METEOR: An automatic metric for MT evaluation with improved correlation with human judgments," in Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, J. Goldstein, A. Lavie, C.-Y. Lin, and C. Voss, Eds. Ann Arbor, Michigan: Association for Computational Linguistics, Jun. 2005, pp. 65–72. [Online]. Available: https://aclanthology.org/W05-0909/
A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, "CARLA: An open urban driving simulator," in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99614 | - |
| dc.description.abstract | 現代自駕系統在高密度城市環境中表現不佳,主要原因在於這些場景中社會互動頻繁且充滿不確定性,對路徑規劃構成挑戰。現有方法多依賴僵化的基於規則模組,或僅針對場景幾何優化學習型模型,卻常無法有效捕捉類似人類的社會化互動認知與情境推理能力。相較之下,我們主張,一個具備社會認知的規劃器應當能夠聯合考慮空間、語意與行為線索,以生成符合潛在社會規範的行車軌跡。為此,我們提出一個基於視覺語言模型(VLM)的框架,整合社會屬性理解與路徑生成,並具備可部署性。本方法利用視覺輸入與細緻的行為標註,並透過自訂損失函數進行模型訓練,以同時對齊軌跡與語意資訊。在經精選的 TITAN 資料集中進行評估後,我們的系統展現出良好的安全性與社會化互動認知。實驗結果顯示,本系統在四項指標上均有優異表現:在 NLP 評估指標方面,BLEU-4 達到 0.21,ROUGE-L 為 0.37,METEOR 為 0.52,GPT-4o 評分指標達到 86 分,VQA 準確率為 44%,平均 L2 距離誤差為 30 像素,顯示出出色的語意理解與規劃能力。本研究提倡以規劃為核心的學習方法,結合社會語境監督,並提供一套具社會互動認知的自駕解決方案。 | zh_TW |
| dc.description.abstract | Modern autonomous driving systems struggle in dense urban environments where social interactions and uncertainty dominate the planning landscape. Existing approaches either rely on rigid rule-based modules or optimize learning-based models on scene geometry, but often fail to capture human-like social awareness and contextual reasoning. In contrast, this work argues that a socially-informed planner should reason jointly over spatial, semantic, and behavioral cues to generate trajectories aligned with implicit social norms. Toward this goal, this work proposes a Vision-Language Model (VLM)-based framework that unifies social attribute understanding and trajectory generation within a deployable architecture. Leveraging visual inputs and fine-grained action annotations, the model is trained with a custom loss that accounts for both trajectory and semantic alignment. Evaluated on a curated subset of the TITAN dataset, the system exhibits strong safety and social compliance. Experimental results show strong performance across four key evaluations: the model achieves a BLEU-4 score of 0.21, ROUGE-L of 0.37, and METEOR of 0.52 on NLP metrics; a GPT-4o Rubric Score of 86; a VQA accuracy of 44%; and an average L2 distance error of 30 pixels, reflecting robust reasoning and planning capabilities. This work advocates for planning-centric learning with socially grounded supervision, and offers a solution toward socially-aware autonomous driving. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-17T16:08:49Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-09-17T16:08:49Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員審定書 i
誌謝 iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Motivation 4
1.2 Contribution 7
1.3 Thesis Organization 8
Chapter 2 Background and Related Works 11
2.1 Software Stack of Autonomous Vehicle and Planning Module 11
2.2 Dense Urban Environments Definition and Public Datasets 13
2.3 Related Works 17
2.3.1 LLMs to VLMs 17
2.3.2 Rule-Based Decision-Making System for Autonomous Vehicles at Intersections with Mixed Traffic Environment 18
2.3.3 ST-P3 19
2.3.4 GPT-Driver 20
2.3.5 DriveGPT4 21
2.3.6 DriveLM 21
Chapter 3 System Architecture and Problem Definition 23
3.1 System Architecture 23
3.2 Problem Definition 26
Chapter 4 Design and Implementation 29
4.1 Design 29
4.2 Implementation 31
4.2.1 Keyframe Selection 31
4.2.2 Social Cue Structuring 31
4.2.3 Camera Pose Estimation and 3D-to-2D Projection 32
4.2.4 Groundtruth Generation via GPT-4o 33
4.2.5 Prompt Engineering 34
4.2.6 Parameter-Efficient Fine-Tuning with LoRA 36
4.2.7 Inference and Postprocessing 38
Chapter 5 Performance Evaluation 39
5.1 Experiment Setup 40
5.2 Experiment Results 43
5.2.1 BLEU-4 43
5.2.2 ROUGE-L 46
5.2.3 METEOR 48
5.2.4 GPT-4o Rubric Score 50
5.2.5 GPT-4o Rubric Score (Evaluated after 4-bit Quantization) 51
5.2.6 VQA Accuracy 52
5.2.7 L2 Distance Error 53
5.3 Analysis 55
5.4 Discussion 57
Chapter 6 Conclusion and Future Works 59
6.1 Conclusion 59
6.2 Future Works 59
References 61 | - |
| dc.language.iso | en | - |
| dc.subject | 路徑規劃 | zh_TW |
| dc.subject | 社會化互動認知 | zh_TW |
| dc.subject | 視覺語言模型 | zh_TW |
| dc.subject | 大語言模型 | zh_TW |
| dc.subject | 自駕車 | zh_TW |
| dc.subject | Motion Planning | en |
| dc.subject | Socially-Aware | en |
| dc.subject | Vision Language Model | en |
| dc.subject | Large Language Model | en |
| dc.subject | Autonomous Vehicle | en |
| dc.title | 應用視覺語言模型於高密度市區中進行具有社會化互動認知之自駕車路徑規劃 | zh_TW |
| dc.title | Trajectory Planning in Dense Urban Environments: Utilizing Vision-Language Models to Learn Socially-Aware Behaviors for High-Uncertainty Scenarios | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 洪士灝;傅立成;郭峻因 | zh_TW |
| dc.contributor.oralexamcommittee | Shih-Hao Hung;Li-Chen Fu;Jiun-In Guo | en |
| dc.subject.keyword | 自駕車,路徑規劃,大語言模型,視覺語言模型,社會化互動認知, | zh_TW |
| dc.subject.keyword | Autonomous Vehicle,Motion Planning,Large Language Model,Vision Language Model,Socially-Aware, | en |
| dc.relation.page | 72 | - |
| dc.identifier.doi | 10.6342/NTU202502673 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2025-08-14 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | N/A | - |
Appears in Collections: 資訊網路與多媒體研究所

Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (restricted; not authorized for public access) | 5.79 MB | Adobe PDF |
