Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68294

Full metadata record (DC field: value [language])
dc.contributor.advisor: 蔡欣穆 (Hsin-Mu Tsai)
dc.contributor.author: Yun-Hua Wang [en]
dc.contributor.author: 王韻華 [zh_TW]
dc.date.accessioned: 2021-06-17T02:16:51Z
dc.date.available: 2020-08-24
dc.date.copyright: 2020-08-24
dc.date.issued: 2020
dc.date.submitted: 2020-08-18
dc.identifier.citation:
[1] 道路交通事故統計 (Road Traffic Accident Statistics), Jul 2020.
[2] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft. Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468, 2016.
[3] F.-H. Chan, Y.-T. Chen, Y. Xiang, and M. Sun. Anticipating accidents in dashcam videos. In Asian Conference on Computer Vision (ACCV), volume 10114 of Lecture Notes in Computer Science, pages 136–153, 2017.
[4] C. Chiou, W. Wang, S. Lu, C. Huang, P. Chung, and Y. Lai. Driver monitoring using sparse representation with part-based temporal face descriptors. IEEE Transactions on Intelligent Transportation Systems, 21(1):346–361, 2020.
[5] V. E. Dahiphale and S. R. Rao. A review paper on portable driver monitoring system for real-time fatigue. In 2015 International Conference on Computing Communication Control and Automation, pages 558–560, 2015.
[6] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[8] A. Jain, H. S. Koppula, B. Raghavan, and A. Saxena. Know before you do: Anticipating maneuvers via learning temporal driving models. CoRR, abs/1504.02789, 2015.
[9] I. T. Jolliffe. Principal Component Analysis and Factor Analysis, pages 115–128. Springer New York, New York, NY, 1986.
[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[11] X. Li, D. W. Goldberg, T. Chu, and A. Ma. Enhancing driving safety: Discovering individualized hazardous driving scenes using GIS and mobile sensing. Transactions in GIS, 23(3):538–557, 2019.
[12] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
[13] A. Palazzi, D. Abati, F. Solera, and R. Cucchiara. Predicting the driver's focus of attention: the DR(eye)VE project. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7):1720–1733, 2018.
[14] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
[15] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 91–99. Curran Associates, Inc., 2015.
[16] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[17] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 568–576. Curran Associates, Inc., 2014.
[18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
[19] J. Suh and S. Oh. A cost-aware path planning algorithm for mobile robots. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4724–4729, 2012.
[20] T. Suzuki, H. Kataoka, Y. Aoki, and Y. Satoh. Anticipating traffic accidents with adaptive loss and large-scale incident DB. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3521–3529, 2018.
[21] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 4489–4497, 2015.
[22] H. Wang and C. Schmid. Action recognition with improved trajectories. In IEEE International Conference on Computer Vision, Sydney, Australia, 2013.
[23] B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. CoRR, abs/1505.00853, 2015.
[24] B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso, and A. Torralba. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68294
dc.description.abstract [zh_TW]: Statistics from Taiwan show that, compared with mechanical failure and other causes, driver negligence is the most common cause of traffic accidents. Many advanced driver assistance systems have therefore been designed to help drivers make more appropriate decisions in complicated traffic conditions. One feasible solution is to integrate the information collected by various onboard sensors to alert the driver and extend his or her field of view; with these alerts, the driver can notice the relevant objects and react with lower latency.
In this thesis, we adopt a deep learning approach to this problem. Based on dashcam video and the driver's gaze information, we design a convolutional neural network that grades the hazardous level of each object detected in the video. We analyze RGB frames and semantic segmentation frames to obtain, respectively, the appearance of a target object and information about its surroundings for the subsequent classification. Our implementation uses the DR(eye)VE dataset [13], which contains dashcam videos collected in Italy together with the driver's gaze at the time of recording. We manually selected potentially dangerous video clips, annotated the ground-truth hazardous level of the detected objects, and used these data to train and evaluate our model. Our model achieves 89% overall accuracy and detects 80% of the objects that truly belong to the hazardous class. In addition, we collected a local dashcam video dataset in Taiwan to analyze how well the model generalizes to different road environments. Finally, a simple user experience study shows that, with our system, drivers notice about 20% more objects that require their attention.
dc.description.abstract [en]: Driver negligence has been reported as the most common cause of road accidents in Taiwan, resulting in more crashes every year than mechanical malfunction and other major accident causes. Many Advanced Driver Assistance Systems (ADAS) are designed to help drivers make better driving decisions in complicated traffic scenarios with shorter latency. One possible solution is to process the information collected by various onboard sensors and offer alerts or warnings that augment the driver's field of view. With these annotations, drivers can pay attention to relevant objects and respond to hazardous situations with lower latency. In this work, we tackle the problem with a deep learning approach. Based on dashcam video frames and the driver's gaze information, we design a two-branch Convolutional Neural Network (CNN) that categorizes the discretized hazardous level of each object detected in a frame. Our CNN model learns to capture the appearance (e.g., orientation) and proximity (e.g., the relation between an object and its surrounding environment) information of an object through the RGB frame and the segmentation frame, respectively. Evaluation is performed on the DR(eye)VE dataset [13], which contains dashcam videos recorded in Italy along with the driver's gaze. We manually pick several potentially dangerous video clips, annotate the ground-truth hazardous level of the detected objects, and use them to train and evaluate our model. Our model achieves 89% overall accuracy and 80% recall on hazardous objects. It also achieves 74% overall accuracy on a local dashcam video dataset from Taiwan, which shows that the model generalizes well to different road environments. Lastly, a simple user study indicates that drivers could notice 20% more hazardous objects when using our system.
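To make the two-branch design concrete, below is a minimal PyTorch ([14]) sketch of a network of this shape. It is an illustration under stated assumptions, not the thesis implementation: the layer sizes, the four-level hazard output, and the names TwoBranchHazardNet and conv_branch are hypothetical, and the actual system additionally uses RoI feature pooling, a feature pyramid network, and motion features (Chapter 4 of the thesis).

```python
# Hypothetical sketch, not the thesis code: one branch sees the RGB crop of a
# detected object (appearance), the other sees the corresponding semantic
# segmentation crop (relation to surroundings); their features are fused and
# classified into discretized hazardous levels.
import torch
import torch.nn as nn

def conv_branch(in_channels: int) -> nn.Sequential:
    """Small convolutional feature extractor used by both branches."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),  # -> (N, 64, 1, 1)
        nn.Flatten(),             # -> (N, 64)
    )

class TwoBranchHazardNet(nn.Module):
    def __init__(self, num_levels: int = 4, seg_channels: int = 1):
        super().__init__()
        self.appearance = conv_branch(3)            # RGB object crop
        self.proximity = conv_branch(seg_channels)  # segmentation crop
        self.classifier = nn.Sequential(
            nn.Linear(64 + 64, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),                        # regularization, cf. [18]
            nn.Linear(128, num_levels),             # hazard-level logits
        )

    def forward(self, rgb_crop: torch.Tensor, seg_crop: torch.Tensor):
        fused = torch.cat(
            [self.appearance(rgb_crop), self.proximity(seg_crop)], dim=1)
        return self.classifier(fused)

# Usage on a batch of 8 object crops resized to 64x64:
model = TwoBranchHazardNet()
logits = model(torch.randn(8, 3, 64, 64), torch.randn(8, 1, 64, 64))
print(logits.shape)  # torch.Size([8, 4])
```

During training, such logits would typically be compared against the annotated hazard levels with a cross-entropy loss; Section 6.1.3 of the thesis discusses the actual loss function choice.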
dc.description.provenance [en]: Made available in DSpace on 2021-06-17T02:16:51Z (GMT). No. of bitstreams: 1. U0001-1708202015122400.pdf: 6272523 bytes, checksum: b26016458c61083a7b4f494eda2d10a8 (MD5). Previous issue date: 2020.
dc.description.tableofcontents:
口試委員會審定書 (Thesis Committee Certification) ii
誌謝 (Acknowledgements) iii
摘要 (Chinese Abstract) iv
Abstract iv
1 Introduction 1
2 Related Works 5
2.1 ADAS for Driving Safety 5
2.2 Video Analysis for Action Recognition 6
2.3 Driver Gaze 7
3 Preliminary 8
3.1 Convolutional Neural Networks 8
3.1.1 Architecture Overview 8
3.1.2 Neural Networks 9
3.1.3 Convolution Layers 10
3.1.4 Training Objective 12
3.1.5 1-by-1 Convolution 14
3.2 Region-of-Interest (RoI) Feature Pooling 15
3.2.1 RoIAlign 15
3.2.2 Interpolation 16
4 System Design 18
4.1 System Overview 18
4.2 Feature Extraction 20
4.2.1 Feature Design 21
4.2.2 Feature Encoding 22
4.2.3 RoI Feature Pooling 23
4.2.4 Feature Pyramid Network (FPN) 24
4.2.5 Motion Feature Encoding 26
4.3 Classification 28
5 Implementation 30
5.1 Dataset 30
5.1.1 Overview 30
5.1.2 Data Pre-processing 31
5.1.3 Hazardous Level Annotation 32
5.1.4 Segmentation Frames 33
5.2 Development Environment 34
6 Evaluation 35
6.1 Design Selection 35
6.1.1 Effect of FPN 35
6.1.2 Effect of Motion Feature 36
6.1.3 Loss Function 39
6.1.4 Training Process 40
6.2 Performance under Different Gaze Input 41
6.3 Performance under Local Dataset 44
6.4 User Experience Study 46
7 Conclusion and Future Work 48
Bibliography 49
dc.language.iso: en
dc.subject: 駕駛視線 (Driver Gaze) [zh_TW]
dc.subject: 危險程度分級 (Hazardous Level Assessment) [zh_TW]
dc.subject: 卷積神經網路 (Convolutional Neural Network) [zh_TW]
dc.subject: 行車記錄器影像分析 (Dashcam Video Analysis) [zh_TW]
dc.subject: Hazardous Level Assessment [en]
dc.subject: Convolutional Neural Network [en]
dc.subject: Dashcam Video Analysis [en]
dc.subject: Driver Gaze [en]
dc.title: 使用行車紀錄器影像及駕駛視線資訊之自動道路物件危險程度分級 (Automatic Assessment of Road Object Hazardous Level using Vehicle Dashcam Video and Driver Gaze) [zh_TW]
dc.title: Automatic Assessment of Road Object Hazardous Level using Vehicle Dashcam Video and Driver Gaze [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 林忠緯 (Chung-Wei Lin), 林靖茹 (Ching-Ju Lin), 陳冠文 (Kuan-Wen Chen)
dc.subject.keyword: 危險程度分級, 卷積神經網路, 行車記錄器影像分析, 駕駛視線 (Hazardous Level Assessment, Convolutional Neural Network, Dashcam Video Analysis, Driver Gaze) [zh_TW]
dc.subject.keyword: Hazardous Level Assessment, Convolutional Neural Network, Dashcam Video Analysis, Driver Gaze [en]
dc.relation.page: 51
dc.identifier.doi: 10.6342/NTU202003762
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-08-19
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
U0001-1708202015122400.pdf (6.13 MB, Adobe PDF): not authorized for public access


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated by their respective copyright terms.
