NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92089
Full metadata record (DC field: value [language])

dc.contributor.advisor: 謝尚賢 [zh_TW]
dc.contributor.advisor: Shang-Hsien Hsieh [en]
dc.contributor.author: 蔡瑋倫 [zh_TW]
dc.contributor.author: Wei-Lun Tsai [en]
dc.date.accessioned: 2024-03-05T16:14:07Z
dc.date.available: 2024-03-06
dc.date.copyright: 2024-03-05
dc.date.issued: 2022
dc.date.submitted: 2024-02-05
dc.identifier.citationJia-Rui Lin, Zhen-Zhong Hu, Jiu-Lin Li, and Li-Min Chen. Understanding on-site inspection of construction projects based on keyword extraction and topic modeling. IEEE Access, 8:198503–198517, 2020.
Hao Zhang, Seokho Chi, Jay Yang, Madhav Nepal, and Seonghyeon Moon. Development of a safety inspection framework on construction sites using mobile computing. Journal of Management in Engineering, 33(3):04016048, 2017.
Min-Yuan Cheng, Denny Kusoemo, and Richard Antoni Gosno. Text mining-based construction site accident classification using hybrid supervised machine learning. Automation in Construction, 118:103265, 2020.
Aritra Pal and Shang-Hsien Hsieh. Deep-learning-based visual data analytics for smart construction management. Automation in Construction, 131:103892, 2021.
Mohammed Al Qady and Amr Kandil. Concept relation extraction from construction documents using natural language processing. Journal of Construction Engineering and Management, 136(3):294–302, 2010.
Qingwen Xu, Heap-Yih Chong, and Pin-Chao Liao. Collaborative information integration for construction safety monitoring. Automation in Construction, 102:120–134, 2019.
Ken-Yu Lin, Meng-Han Tsai, Umberto C. Gatti, Jacob Je-Chian Lin, Cheng-Hao Lee, and Shih-Chung Kang. A user-centered information and communication technology (ict) tool to improve safety inspections. Automation in Construction, 48:53–63, 2014.
MD. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, and Hamid Laga. A comprehensive survey of deep learning for image captioning. ACM Comput. Surv., 51(6), feb 2019.
Huan Liu, Guangbin Wang, Ting Huang, Ping He, Martin Skitmore, and Xiaochun Luo. Manifesting construction activity scenes via image captioning. Automation in Construction, 119:103334, 2020.
Seongdeok Bang and Hyoungkwan Kim. Context-based information generation for managing uav-acquired data using image captioning. Automation in Construction, 112:103116, 2020.
Yifan Du, Zikang Liu, Junyi Li, and Wayne Xin Zhao. A survey of vision-language pre-trained models, 2022.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021.
Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.
Ron Mokady, Amir Hertz, and Amit H Bermano. Clipcap: Clip prefix for image captioning. arXiv preprint arXiv:2111.09734, 2021.
Wei Lun Tsai, Jacob J. Lin, and Shang-Hsien Hsieh. Generating construction safety observations via clip-based image-language embedding. In Leonid Karlinsky, Tomer Michaeli, and Ko Nishino, editors, Computer Vision – ECCV 2022 Workshops, pages 366–381, Cham, 2023. Springer Nature Switzerland.
Shih-Chung Kang, Cheng-Hao Lee, Meng-Han Tsai, and Ken-Yu Lin. isafe: An innovative ipad system for construction site safety audits. In Proceedings of the 14th International Conference on Computing in Civil and Building Engineering (ICCCBE 2012), 06 2012.
Bureau of Labor Statistics U.S. Department of Labor. Occupational outlook handbook, construction and building inspectors, 2022.
Hongqin Fan, Fan Xue, and Heng Li. Project-based as-needed information retrieval from unstructured aec documents. Journal of Management in Engineering, 31(1):A4014012, 2015.
Fan Zhang. A hybrid structured deep neural network with word2vec for construction accident causes classification. International Journal of Construction Management, 22(6):1120–1140, 2022.
Yang Miang Goh and C.U. Ubeynarayana. Construction accident narrative classification: An evaluation of text mining techniques. Accident Analysis & Prevention, 108:122–130, 2017.
Botao Zhong, Heng Li, Hanbin Luo, Jingyang Zhou, Weili Fang, and Xuejiao Xing. Ontology-based semantic modeling of knowledge in construction: Classification and identification of hazards implied in images. Journal of Construction Engineering and Management, 146(4):04020013, 2020.
Zhendong Dong and Qiang Dong. Hownet - a hybrid language and knowledge resource. In International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003, pages 820–824, 2003.
Hsien-Tang Lin, Nai-Wen Chi, and Shang-Hsien Hsieh. A concept-based information retrieval approach for engineering domain-specific technical documents. Advanced Engineering Informatics, 26(2):349–360, 2012. Knowledge based engineering to support complex product design.
Fan Zhang, Hasan Fleyeh, Xinru Wang, and Minghui Lu. Construction site accident analysis using text mining and natural language processing techniques. Automation in Construction, 99:238–248, 2019.
Taekhyung Kim and Seokho Chi. Accident case retrieval and analyses: Using natural language processing in the construction industry. Journal of Construction Engineering and Management, 145(3):04019004, 2019.
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
Yuexiong Ding, Jie Ma, and Xiaowei Luo. Applications of natural language processing in construction. Automation in Construction, 136:104169, 2022.
Botao Zhong, Wanlei He, Ziwei Huang, Peter E.D. Love, Junqing Tang, and Hanbin Luo. A building regulation question answering system: A deep learning methodology. Advanced Engineering Informatics, 46:101195, 2020.
Justin Johnson, Andrej Karpathy, and Li Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks, 2015.
Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, and C. Lawrence Zitnick. Microsoft coco captions: Data collection and evaluation server, 2015.
Bo Xiao, Yiheng Wang, and Shih-Chung Kang. Deep learning image captioning in construction management: A feasibility study. Journal of Construction Engineering and Management, 148(7):04022049, 2022.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017.
Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. Self-critical sequence training for image captioning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1179–1195, 2017.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
Bo Xiao and Shih-Chung Kang. Development of an image data set of construction machines for deep learning object detection. Journal of Computing in Civil Engineering, 35(2):05020005, 2021.
Taiwan Ministry of Labor. Regulations of occupational safety and health act, 2022.
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. Bottom-up and top-down attention for image captioning and visual question answering, 2017.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate, 2014.
Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend, 2015.
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention, 2015.
Lun Huang, Wenmin Wang, Jie Chen, and Xiao-Yong Wei. Attention on attention for image captioning. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4633–4642, 2019.
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2020.
Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation, 2021.
Ssu Chiu, Maolin Li, Yen-Ting Lin, and Yun-Nung Chen. Salesbot: Transitioning from chit-chat to task-oriented dialogues, 2022.
Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric Xing, and Zhiting Hu. Target-guided open-domain conversation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5624– 5634, Florence, Italy, July 2019. Association for Computational Linguistics.
J Alammar. The illustrated gpt-2 (visualizing transformer language models) [blog post], 2019.
Patrick von Platen. How to generate text: using different decoding methods for language generation with transformers[blog post], 2020.
Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, and Ray Kurzweil. Generating high-quality and informative conversation responses with sequence-to-sequence models, 2017.
Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, and Dhruv Batra. Diverse beam search: Decoding diverse solutions from neural sequence models, 2016.
Kenton Murray and David Chiang. Correcting length bias in neural machine translation, 2018.
Yilin Yang, Liang Huang, and Mingbo Ma. Breaking the beam search curse: A study of (re-)scoring methods and stopping criteria for neural machine translation, 2018.
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration, 2019.
Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical neural story generation, 2018.
Thara Wetchakorn and Nakornthip Prompoon. Method for mobile user interface design patterns creation for ios platform. In 2015 12th International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 150–155, 2015.
Apple Inc. Human interface guidelines, 2022.
Erik G. Nilsson. Design patterns for user interface for mobile applications. Advances in Engineering Software, 40(12):1318–1328, 2009. Designing, modelling and implementing interactive systems.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2017.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, page 311–318, USA, 2002. Association for Computational Linguistics.
Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics.
Alon Lavie and Abhaya Agarwal. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 228–231, Prague, Czech Republic, June 2007. Association for Computational Linguistics.
Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Spice: Semantic propositional image caption evaluation. In ECCV, 2016.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92089
dc.description.abstract: Construction contractors conduct safety inspections to keep construction sites safe. In the current inspection workflow, environment, health, and safety (EHS) personnel photograph on-site violations with their phones and record the violation items. After an inspection is completed, however, considerable time is spent organizing the resulting images and text, so the traditional workflow is inefficient; moreover, once an EHS inspection report full of text and images has been compiled, the violation categories and inspection contents see little further analysis or reuse. With recent advances in natural language and vision-language models, many deep learning models that combine text and images can now perform natural language understanding and generation. Targeting the existing inspection workflow, this research proposes assistive modules for construction safety inspection, covering violation classification and caption generation for violation photos, and integrates them with a mobile platform to provide complete EHS inspection functionality. First, a dataset of images and captions is collected, and caption-type and violation-type labels are added as references for contrastive learning. The model uses Contrastive Language–Image Pre-training (CLIP) with a prefix as the caption generation pipeline: it determines whether an image contains a violation, classifies the violation type, and then generates the image's safety caption or regulation classification from the classification results and captions used as the prefix. Finally, a mobile user interface is developed so that EHS inspectors can collect violation data and annotate violation images on their phones, while the captioning model automatically identifies the violation types in the images and generates the corresponding captions. This improves the data collection process, raises the efficiency of the safety inspection workflow, and enables systematic data collection toward a knowledge base of construction violations that extends the value of the data. [zh_TW]
dc.description.abstract: Safety inspections are a common practice for preventing accidents on construction sites. Traditional workflows require an inspector to document violations with pen and paper, which is error-prone, time-consuming, and hard to act on. With recent smartphone applications attempting to streamline report generation and subsequent safety analysis, there is an unprecedented opportunity to develop a construction knowledge base that integrates images and captions. A user interface that organizes captured data and inherits the knowledge of former inspection professionals is therefore required for a more effective inspection workflow. This research proposes a safety inspection system assisted by image captioning, comprising three main modules: data collection, image captioning model training, and on-site application implementation. The captioning data comes from safety reports provided by experienced industrial partners, with the additional attributes of caption type and violation type added for the contrastive learning step. The captioning model consists of two modules: Contrastive Language–Image Pre-training (CLIP) fine-tuning and CLIP prefix captioning. The fine-tuned CLIP uses contrastive features to classify the attribute types of images, and CLIP prefix captioning generates captions or violation lists from the given attributes, images, and captions. Finally, the captioning model is integrated into the user interface, helping safety inspectors generate captions automatically. Through the proposed framework, safety violation caption data can be collected more efficiently, the generated captions and violation lists can assist in safety inspections, and a knowledge base of construction violations can be established to extend the value of data utilization. [en]
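
The abstracts describe a two-stage pipeline: a fine-tuned CLIP classifies the attribute and violation types of a site image via contrastive image-text features, and a ClipCap-style prefix module maps the CLIP image embedding into prefix embeddings for a GPT-2 decoder that generates the caption. The following Python sketch shows how such a pipeline could be wired together with Hugging Face transformers; the checkpoints, prefix length, MLP mapper, and candidate labels are illustrative assumptions rather than the thesis's actual implementation, and the mapper would have to be trained on the violation caption dataset before its output is meaningful.

```python
# Minimal sketch of the two-stage pipeline described in the abstract,
# in the spirit of ClipCap (Mokady et al., 2021). Checkpoints, prefix
# length, the MLP mapper, and candidate labels are assumptions, not the
# thesis's implementation; the mapper must be trained on the violation
# caption dataset before the generated text is useful.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

PREFIX_LEN = 10  # assumed number of prefix tokens fed to GPT-2


class PrefixMapper(nn.Module):
    """Maps a CLIP image embedding to a sequence of GPT-2 prefix embeddings."""

    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=PREFIX_LEN):
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len // 2),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len // 2, gpt_dim * prefix_len),
        )

    def forward(self, clip_emb):
        # (batch, clip_dim) -> (batch, prefix_len, gpt_dim)
        return self.mlp(clip_emb).view(-1, self.prefix_len, self.gpt_dim)


mapper = PrefixMapper()  # would be fine-tuned on (image, caption) pairs


@torch.no_grad()
def classify_attributes(image, candidate_labels):
    """Contrastive classification with CLIP features, e.g. deciding
    whether an image shows a violation and of which type."""
    inputs = clip_proc(text=candidate_labels, images=image,
                       return_tensors="pt", padding=True)
    probs = clip(**inputs).logits_per_image.softmax(dim=-1)
    return candidate_labels[probs.argmax().item()]


@torch.no_grad()
def generate_caption(image, max_new_tokens=40):
    """Greedy decoding from the mapped image prefix."""
    pix = clip_proc(images=image, return_tensors="pt")
    img_emb = clip.get_image_features(**pix)   # (1, clip_dim)
    embeds = mapper(img_emb)                   # (1, PREFIX_LEN, gpt_dim)
    token_ids = []
    for _ in range(max_new_tokens):
        # Feed the growing embedding sequence and take the argmax token.
        next_id = gpt2(inputs_embeds=embeds).logits[:, -1, :].argmax(dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
        token_ids.append(next_id.item())
        next_emb = gpt2.transformer.wte(next_id).unsqueeze(1)
        embeds = torch.cat([embeds, next_emb], dim=1)
    return tokenizer.decode(token_ids)
```

The greedy loop could equally use beam search or nucleus (top-p) sampling, the decoding strategies the thesis compares in Section 3.4.2.
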
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-05T16:14:07Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2024-03-05T16:14:07Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 (Chinese Abstract) iv
Abstract v
Contents vii
List of Figures x
List of Tables xi
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objectives 4
1.3 Organization of Thesis 5
Chapter 2 Literature Review 6
2.1 Safety Inspection with Information and Communication Technology (ICT) 7
2.2 Natural Language Understanding in Construction 9
2.3 Natural Language Generation in Construction 12
2.4 Vision-Based Natural Language Generation in Construction 14
2.5 Summary 16
Chapter 3 Construction Safety Inspection Framework 18
3.1 Overview 18
3.2 Dataset Development 19
3.2.1 Labelling Strategy for Safety Violation Captioning Data Sets 19
3.2.2 Status and Safety Violations Image Collection 21
3.3 Construction CLIP Fine-tuning 22
3.3.1 Model Selection 24
3.3.2 Fine-tuning Strategy 26
3.4 CLIP Prefix Captioning 28
3.4.1 Prefix Tuning with Attributes 29
3.4.2 Decoding Strategies 31
3.5 User Interface for Safety Inspection 33
3.5.1 Requirement Verification 34
3.5.2 Prototype Designing 35
3.5.3 Application Implementation 37
3.6 Summary 38
Chapter 4 Experiments 40
4.1 Construction CLIP Fine-tuning 40
4.1.1 Implementation Detail 40
4.1.2 Evaluation and Discussion 42
4.2 CLIP Prefix Captioning 45
4.2.1 Implementation Detail 45
4.2.2 Evaluation and Discussion 46
4.3 User Interface 51
4.4 Summary 52
Chapter 5 Conclusion 54
5.1 Conclusion 54
5.2 Future Work 56
References 58
dc.language.iso: en
dc.title: 以影像式自然語言方法輔助工地安全巡檢紀錄 [zh_TW]
dc.title: Assisting Construction Safety Inspection Documentation via Vision-Based Natural Language Generation [en]
dc.type: Thesis
dc.date.schoolyear: 112-1
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor: 林之謙 [zh_TW]
dc.contributor.coadvisor: Jacob Je-Chian Lin [en]
dc.contributor.oralexamcommittee: 陳俊杉; 吳日騰; 周慧瑜 [zh_TW]
dc.contributor.oralexamcommittee: Chuin-Shan Chen; Rih-Teng Wu; Hui-Yu Chou [en]
dc.subject.keyword: 電腦視覺, 影像註解, 深度學習, 安全巡檢, 工地安全 [zh_TW]
dc.subject.keyword: Computer Vision, Image Captioning, Deep Learning, Safety Inspection, Construction Safety [en]
dc.relation.page: 66
dc.identifier.doi: 10.6342/NTU202400277
dc.rights.note: Consent to release (open access worldwide)
dc.date.accepted: 2024-02-06
dc.contributor.author-college: 工學院 (College of Engineering)
dc.contributor.author-dept: 土木工程學系 (Department of Civil Engineering)
Appears in Collections: 土木工程學系 (Department of Civil Engineering)

Files in This Item:
File: ntu-112-1.pdf | Size: 26.58 MB | Format: Adobe PDF