Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98058

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李綱 | zh_TW |
| dc.contributor.advisor | Kang Li | en |
| dc.contributor.author | 蔡侑哲 | zh_TW |
| dc.contributor.author | Yu-Che Tsai | en |
| dc.date.accessioned | 2025-07-23T16:37:47Z | - |
| dc.date.available | 2025-07-24 | - |
| dc.date.copyright | 2025-07-23 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-07-18 | - |
| dc.identifier.citation | [1] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV), pages 801–818, 2018. [2] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016. [3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019. [4] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020. [5] Z. Feng, S. Guo, X. Tan, K. Xu, M. Wang, and L. Ma. Rethinking efficient lane detection via curve modeling. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17062–17070, 2022. [6] A. Hassani, S. Walton, J. Li, S. Li, and H. Shi. Neighborhood attention transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6185–6194, 2023. [7] A. Kirillov, R. Girshick, K. He, and P. Dollár. Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6399–6408, 2019. [8] A. Kirillov, Y. Wu, K. He, and R. Girshick. Pointrend: Image segmentation as rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9799–9808, 2020. [9] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017. [10] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017. [11] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2999–3007, 2017. [12] L. Liu, X. Chen, S. Zhu, and P. Tan. Condlanenet: a top-to-down lane detection framework based on conditional convolution. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3773–3782, 2021. [13] R. Liu, Z. Yuan, T. Liu, and Z. Xiong. End-to-end lane shape prediction with transformers. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 3694–3702, 2021. [14] X. Liu, T. Wu, and G. Guo. Adaptive sparse vit: Towards learnable adaptive token pruning by fully exploiting self-attention. arXiv preprint arXiv:2209.13802, 2022. [15] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021. [16] Z. Liu, Z. Zhang, S. Khaki, S. Yang, H. Tang, C. Xu, K. Keutzer, and S. Han. Sparse refinement for efficient high-resolution semantic segmentation. In European Conference on Computer Vision, pages 108–127. Springer, 2024. [17] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018. [18] Z. Qin, H. Wang, and X. Li. Ultra fast structure-aware deep lane detection. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16, pages 276–291. Springer, 2020. [19] R. Ranftl, A. Bochkovskiy, and V. Koltun. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12179–12188, 2021. [20] L. Tabelini, R. Berriel, T. M. Paixao, C. Badue, A. F. De Souza, and T. Oliveira-Santos. Keep your eyes on the lane: Real-time attention-guided lane detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 294–302, 2021. [21] L. Tabelini, R. Berriel, T. M. Paixao, C. Badue, A. F. De Souza, and T. Oliveira-Santos. Polylanenet: Lane estimation via deep polynomial regression. In 2020 25th international conference on pattern recognition (ICPR), pages 6150–6156. IEEE, 2021. [22] W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF international conference on computer vision, pages 568–578, 2021. [23] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 34:12077–12090, 2021. [24] H. Xu, S. Wang, X. Cai, W. Zhang, X. Liang, and Z. Li. Curvelane-nas: Unifying lane-sensitive architecture search and adaptive point blending. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XV 16, pages 689–704. Springer, 2020. [25] M. Yang, K. Yu, C. Zhang, Z. Li, and K. Yang. Denseaspp for semantic segmentation in street scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3684–3692, 2018. [26] W. Yu, M. Luo, P. Zhou, C. Si, Y. Zhou, X. Wang, J. Feng, and S. Yan. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10819–10829, 2022. [27] S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, P. H. Torr, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6881–6890, 2021. [28] T. Zheng, H. Fang, Y. Zhang, W. Tang, Z. Yang, H. Liu, and D. Cai. Resa: Recurrent feature-shift aggregator for lane detection. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 3547–3554, 2021. [29] T. Zheng, Y. Huang, Y. Liu, W. Tang, Z. Yang, D. Cai, and X. He. Clrnet: Cross layer refinement network for lane detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 898–907, 2022. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98058 | - |
| dc.description.abstract | 車道線偵測一直是自動駕駛與電腦視覺領域中的重要任務,並且多年來已有大量的研究推動其成長與進步。近期多數新提出的研究大多採用座標點偵測或曲線偵測的方法。本研究提出了一種不同於現有常見模型的車道檢測方法。本研究利用了 Vision Transformer (ViT),透過基於語意分割的模型來解決現實車道檢測中常見的困難場景,例如彎曲的車道標記與有陰影的環境。為了解決正負樣本極度不平均的問題,本研究提出使用權重二元交叉熵損失函數。實驗結果顯示,在廣泛使用的 TuSimple 數據集上,本研究所提出的模型在加入權重二元交叉熵損失函數的調整後從原先的64.2%進步到了 96.45% 的準確率,與目前的一線模型並駕齊驅。本研究所提出的損失函數不但在 DeepLabV3+ 模型上也提升 18% 的準確率,同時在更為複雜的道路環境如陰影遮蔽、大車流量等,也展現出顯著的表現。 本研究的模型更能順利偵測分岔車道線和斷裂車道線等,是其餘基於點偵測和基於曲線偵測模型容易出現誤偵測的困難情境。 | zh_TW |
| dc.description.abstract | Lane detection has long been a critical task in autonomous driving and computer vision, with extensive research driving its progress over the years. While most recently proposed models adopt point-based or curve-based methods, this study presents an alternative approach to lane detection that differs from other state-of-the-art models. Leveraging the Vision Transformer (ViT), this research uses a segmentation-based model to address challenging scenarios often encountered in real-world lane detection, such as curved lane markings and shadowed environments. To address the extreme class imbalance between lane and background pixels, a custom weighted binary cross-entropy loss is proposed (a hedged sketch of this kind of loss appears after the metadata table below). Experimental results on the widely adopted TuSimple dataset show that, with the proposed weighted loss, the model's accuracy improves from 64.2% to a favorable 96.45%, on par with many recent models. The proposed loss function also yields an 18% accuracy improvement on DeepLabV3+, while exhibiting strong performance in more complex scenarios such as crowded and shadowed lanes. The proposed model can additionally predict forked and disconnected lanes, which point-based and curve-based methods often fail to detect. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-23T16:37:47Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-07-23T16:37:47Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i; 摘要 ii; Abstract iii; Contents iv; List of Figures vii; List of Tables viii; Chapter 1 Introduction 1; 1.1 Background 1; 1.2 Motivation 2; 1.3 Contribution 3; Chapter 2 Literature Review 4; 2.1 Datasets 4; 2.1.1 TuSimple Dataset 4; 2.1.2 CULane Dataset 5; 2.1.3 Dataset Benchmarks 5; 2.2 Lane detection methods 6; 2.2.1 Point-based methods 6; 2.2.2 Curve-based methods 7; 2.2.3 Segmentation-based methods 7; 2.3 Vision Transformer 8; 2.4 Loss Function 11; 2.4.1 Cross-entropy loss 11; 2.4.2 Binary cross-entropy loss 12; 2.4.3 Focal loss 12; Chapter 3 Methods 14; 3.1 System architecture 14; 3.2 Dataset processing 16; 3.3 Loss Function 16; 3.3.1 Cross-entropy loss 17; 3.3.2 Binary cross-entropy loss 18; 3.3.3 Focal loss 19; 3.3.4 Final loss function 20; Chapter 4 Experiments 22; 4.1 Evaluation Metrics 22; 4.1.1 Segmentation evaluation 22; 4.1.2 Dataset evaluation 24; 4.2 Experimental Setup 25; 4.2.1 Dataset 25; 4.2.2 Training Configurations 25; 4.2.3 Hardware Configurations 25; 4.3 Analysis 26; 4.3.1 Results on TuSimple dataset 26; 4.3.2 Results on Scenario: Shadows 27; 4.3.3 Results on Scenario: Curvy 28; 4.3.4 Results on Scenario: Crowded 29; 4.3.5 Results on Scenario: Special lane markings 29; 4.3.6 Results on different loss functions 31; 4.4 Discussion 34; 4.4.1 Improvements 34; 4.4.2 Limitations 34; Chapter 5 Conclusions & Future Work 36; 5.1 Conclusions 36; 5.2 Future Work 36; References 38 | - |
| dc.language.iso | en | - |
| dc.subject | 情境 | zh_TW |
| dc.subject | 語意分割 | zh_TW |
| dc.subject | 權重二元交叉熵損失函數 | zh_TW |
| dc.subject | 車道線偵測 | zh_TW |
| dc.subject | scenario | en |
| dc.subject | lane detection | en |
| dc.subject | weighted binary cross-entropy loss | en |
| dc.subject | semantic segmentation | en |
| dc.title | 基於語意分割以及權重損失函數的車道線偵測 | zh_TW |
| dc.title | Segmentation Based Lane Detection with custom weighted loss | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 林峻永;蕭得聖 | zh_TW |
| dc.contributor.oralexamcommittee | Chun-Yeon Lin;Te-Sheng Hsiao | en |
| dc.subject.keyword | 車道線偵測,權重二元交叉熵損失函數,語意分割,情境 | zh_TW |
| dc.subject.keyword | lane detection,weighted binary cross-entropy loss,semantic segmentation,scenario | en |
| dc.relation.page | 41 | - |
| dc.identifier.doi | 10.6342/NTU202501973 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2025-07-21 | - |
| dc.contributor.author-college | 工學院 | - |
| dc.contributor.author-dept | 機械工程學系 | - |
| dc.date.embargo-lift | 2025-07-24 | - |
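
The abstract above hinges on a custom weighted binary cross-entropy loss for the lane/background class imbalance, but the record does not include its exact formulation. The snippet below is a minimal PyTorch sketch of the general technique (up-weighting the sparse lane class in a per-pixel BCE); the function name `weighted_bce_loss` and the `pos_weight=10.0` value are illustrative assumptions, not the thesis's actual settings.

```python
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits: torch.Tensor, target: torch.Tensor,
                      pos_weight: float = 10.0) -> torch.Tensor:
    """Per-pixel binary cross-entropy with an up-weighted positive (lane) class.

    logits:     raw segmentation scores of shape (N, 1, H, W)
    target:     binary lane mask of the same shape (1 = lane pixel, 0 = background)
    pos_weight: multiplier on the positive class; 10.0 is an illustrative
                assumption, not the weight reported in the thesis.
    """
    return F.binary_cross_entropy_with_logits(
        logits,
        target,
        pos_weight=torch.tensor(pos_weight, device=logits.device),
    )

# Tiny usage example on a dummy batch with roughly 3% positive pixels,
# mimicking the sparsity of lane masks.
if __name__ == "__main__":
    logits = torch.randn(2, 1, 64, 64)
    target = (torch.rand(2, 1, 64, 64) > 0.97).float()
    print(weighted_bce_loss(logits, target).item())
```

The same effect can be obtained by passing a per-pixel `weight` map instead of `pos_weight`; which variant the thesis actually uses is not specified in this record.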
Appears in Collections: 機械工程學系
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 21.12 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
