Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92089
Title: | Assisting Construction Safety Inspection Documentation via Vision-Based Natural Language Generation |
Authors: | 蔡瑋倫 Wei-Lun Tsai |
Advisor: | 謝尚賢 Shang-Hsien Hsieh |
Co-Advisor: | 林之謙 Jacob Je-Chian Lin |
Keyword: | Computer Vision, Image Captioning, Deep Learning, Safety Inspection, Construction Safety |
Publication Year: | 2022 |
Degree: | Master's |
Abstract: | A safety inspection is a common practice for preventing accidents on construction sites. In the current workflow, health-and-safety inspectors photograph on-site violations with a smartphone and record the violation items; compiling the large volume of images and text into an inspection report afterwards is time-consuming, and the completed reports are rarely analyzed further by violation type or inspection content. Traditional pen-and-paper documentation is likewise error-prone and unactionable. With recent smartphone applications attempting to streamline report generation and further safety analysis, there is an unprecedented opportunity to develop a construction knowledge base that integrates images and captions, together with a user interface that organizes the captured data and inherits the knowledge of experienced inspection professionals. This research proposes a safety inspection system assisted by image captioning, comprising three main modules: data collection, image-captioning model training, and on-site application implementation. The captioning data comes from safety reports provided by experienced industrial partners, annotated with two additional attributes, caption type and violation type, for the contrastive-learning step. The captioning model consists of two parts: Contrastive Language-Image Pre-training (CLIP) fine-tuning and CLIP prefix captioning. The fine-tuned CLIP model uses contrastive features to classify whether an image contains a violation and which violation type it shows; CLIP prefix captioning then generates the caption or violation list from the given attributes, images, and captions. Finally, the captioning model is integrated into a mobile user interface so that safety inspectors can collect and annotate violation images on site, with violation types recognized and captions generated automatically. Through the proposed framework, safety-violation caption data can be collected more effectively and systematically, the generated captions and violation lists can assist in safety inspections, and a knowledge base of construction violations can be established to extend the value of the collected data. |
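The classification step described in the abstract rests on CLIP-style contrastive similarity: an image embedding is compared against text embeddings of each candidate attribute, and the closest match wins. The following is a minimal sketch of that idea using toy NumPy vectors in place of real CLIP encoder outputs; the label set and all embedding values are invented for illustration and are not from the thesis.

```python
import numpy as np

def classify_by_contrastive_similarity(image_feat, text_feats, labels):
    """Pick the label whose text embedding is most similar to the image embedding."""
    # CLIP normalizes embeddings to unit length before the dot product,
    # so the similarity is the cosine of the angle between vectors.
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = txt @ img                              # cosine similarities
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over labels
    return labels[int(np.argmax(probs))], probs

# Hypothetical violation-type labels and toy 3-d embeddings standing in
# for the outputs of CLIP's image and text encoders.
labels = ["no violation", "fall protection violation", "PPE violation"]
image_feat = np.array([0.1, 0.9, 0.2])
text_feats = np.array([[0.9, 0.1, 0.1],
                       [0.1, 0.9, 0.1],
                       [0.2, 0.1, 0.9]])

best_label, probs = classify_by_contrastive_similarity(image_feat, text_feats, labels)
```

In the thesis pipeline the predicted attribute is then prepended as a prefix to condition the caption generator, so the classification and generation stages share the same CLIP features.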
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92089 |
DOI: | 10.6342/NTU202400277 |
Fulltext Rights: | Authorized (open access worldwide) |
Appears in Collections: | Department of Civil Engineering |
Files in This Item:

File | Size | Format
---|---|---
ntu-112-1.pdf | 26.58 MB | Adobe PDF