Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100913
Full metadata record
DC FieldValueLanguage
dc.contributor.advisor陳俊杉zh_TW
dc.contributor.advisorChuin-Shan Chenen
dc.contributor.author柯米克zh_TW
dc.contributor.authorMik Wanul Khosiinen
dc.date.accessioned2025-11-05T16:04:41Z-
dc.date.available2025-11-06-
dc.date.copyright2025-11-05-
dc.date.issued2025-
dc.date.submitted2025-11-04-
dc.identifier.citationReferences
[1] PMI. A Guide to the Project Management Body of Knowledge (PMBOK Guide). Project Management Institute, Newtown Square, PA, USA, 1987. First edition.
[2] Khosiin, Lin, and Chen. Worker accountability in computer vision for construction productivity measurement: A systematic review. Korea Institute of Construction Engineering and Management, pages 775–782, July 2024.
[3] Vincent Gaspersz. Manajemen Produktivitas Total: Strategi Peningkatan Produktivitas Bisnis Global. Gramedia, Jakarta, 1st printing, 1998. Accessed via Unika Atma Jaya Library.
[4] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-CNN: towards real-time object detection with region proposal networks. CoRR, abs/1506.01497, 2015.
[5] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks, 2018.
[6] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 658–666, 2019.
[7] Koen Vermeltfoort, Birgit Biemans, Jan Mischke, and Kevin Stokvis. Delivering on construction productivity is no longer optional. Technical report, McKinsey & Company, 2024.
[8] Aritra Pal, Jacob J. Lin, Shang-Hsien Hsieh, and Mani Golparvar-Fard. Activity-level construction progress monitoring through semantic segmentation of 3d informed orthographic images. Automation in Construction, 105157, 2024.
[9] CII. Engineering productivity measurement. Construction Industry Institute, RR156-11:286, 2001.
[10] J.L. Ashford. The management of quality in construction (1st ed.). Routledge, 1989.
[11] Chan-Sik Park and Hyeon-Jin Kim. A framework for construction safety management and visualization system. Automation in Construction, 33:95–103, 2013. Augmented Reality in Architecture, Engineering, and Construction.
[12] Shuai Tang, Dominic Roberts, and Mani Golparvar-Fard. Human-object interaction recognition for automatic construction site safety inspection. Automation in Construction, 120:103356, 2020.
[13] Dominic Roberts, Wilfredo Torres Calderon, Shuai Tang, and Mani Golparvar-Fard. Vision-based construction worker activity analysis informed by body posture. Journal of Computing in Civil Engineering, 34(4):04020017, 2020.
[14] Ziqi Li and Dongsheng Li. Action recognition of construction workers under occlusion. Journal of Building Engineering, 45:103352, 2022.
[15] Ghazaleh Torabi, Amin Hammad, and Nizar Bouguila. Two-dimensional and three dimensional cnn-based simultaneous detection and activity classification of construction workers. Journal of Computing in Civil Engineering, 36(4):04022009, 2022.
[16] Min-Yuan Cheng, Akhmad F.K. Khitam, and Hongky Haodiwidjaya Tanto. Construction worker productivity evaluation using action recognition for foreign labor training and education: A case study of Taiwan. Automation in Construction, 150:104809, 2023.
[17] Qilin Zhang, Zhichen Wang, Bin Yang, Ke Lei, Binghan Zhang, and Boda Liu. Reidentification-based automated matching for 3d localization of workers in construction sites. Journal of Computing in Civil Engineering, 35(6):04021019, 2021.
[18] Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, and Jia Deng. Learning to detect human-object interactions. CoRR, abs/1702.05448, 2017.
[19] Saurabh Gupta and Jitendra Malik. Visual semantic role labeling. arXiv preprint arXiv:1505.04474, 2015.
[20] Meng-Jiun Chiou, Chun-Yu Liao, Li-Wei Wang, Roger Zimmermann, and Jiashi Feng. St-hoi: A spatial-temporal baseline for human-object interaction detection in videos. In Proceedings of the 2021 Workshop on Intelligent Cross-Data Analysis and Retrieval, page 9–17, 2021.
[21] Zhijun Liang, Yisheng Guan, and Juan Rojas. Visual-semantic graph attention network for human-object interaction detection. CoRR, abs/2001.02302, 2020.
[22] Kun Xia, Jianguang Huang, and Hanyu Wang. Lstm-cnn architecture for human activity recognition. IEEE Access, 8:56855–56866, 2020.
[23] Glenn Jocher, Alex Stoken, Jirka Borovec, NanoCode012, ChristopherSTAN, Liu Changyu, Laughing, tkianai, Adam Hogan, lorenzomammana, yxNONG, AlexWang1900, Laurentiu Diaconu, Marc, wanghaoyang0106, ml5ah, Doug, Francisco Ingham, Frederik, Guilhen, Hatovix, Jake Poznanski, Jiacong Fang, Lijun Yu 于力军, changyu98, Mingyu Wang, Naman Gupta, Osama Akhtar, PetrDvoracek, and Prashant Rai. ultralytics/yolov5: v3.1 - bug fixes and performance improvements, October 2020.
[24] Sagrario Garcia-Garcia and Raúl Pinto-Elías. Human activity recognition implementing the yolo models. In 2022 International Conference on Mechatronics, Electronics and Automotive Engineering (ICMEAE), pages 127–132, 2022.
[25] Okan Köpüklü, Xiangyu Wei, and Gerhard Rigoll. You only watch once: A unified CNN architecture for real-time spatiotemporal action localization. CoRR, abs/1911.06644, 2019.
[26] Ronald Mutegeki and Dong Seog Han. A cnn-lstm approach to human activity recognition. In 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pages 362–366, 2020.
[27] Julia Altheimer and Johannes Schneider. Smart-watch-based construction worker activity recognition with hand-held power tools. Automation in Construction, 167:105684, 2024.
[28] Trevor Slaton, Carlos Hernandez, and Reza Akhavian. Construction activity recognition with convolutional recurrent networks. Automation in Construction, 113:103138, 2020.
[29] Sangyoon Yun, Sungkook Hong, Sungjoo Hwang, Dongmin Lee, and Hyunsoo Kim. Analysis of masonry work activity recognition accuracy using a spatiotemporal graph convolutional network across different camera angles. Automation in Construction, 175:106178, 2025.
[30] Emil L. Jacobsen, Jochen Teizer, and Søren Wandahl. Work estimation of construction workers for productivity monitoring using kinematic data and deep learning. Automation in Construction, 152:104932, 2023.
[31] Mik Wanul Khosiin and Ardian Umam. Implementing a relational database in processing construction project documents. In Stefanus Adi Kristiawan, Buntara S. Gan, Mohamed Shahin, and Akanshu Sharma, editors, Proceedings of the 5th International Conference on Rehabilitation and Maintenance in Civil Engineering, pages 891–900, Singapore, 2023. Springer Nature Singapore.
[32] Piyush, Sonu Rajak, and Vimal K. E. K. Lstm-cnn architecture for construction activity recognition using optimal positioning of wearables. Journal of Construction Engineering and Management, 150(12):04024179, 2024.
[33] Nhung Tran Thi Hong, Giang L. Nguyen, Nguyen Quang Huy, Do Viet Manh, Duc Nghia Tran, and Duc-Tan Tran. A low-cost real-time iot human activity recognition system based on wearable sensor and the supervised learning algorithms. Measurement, 218:113231, 2023.
[34] Dipanwita Thakur, Antonella Guzzo, and Giancarlo Fortino. Intelligent adaptive real-time monitoring and recognition system for human activities. IEEE Transactions on Industrial Informatics, 20(11):13212–13222, 2024.
[35] Han Sun and Yu Chen. Real-time elderly monitoring for senior safety by lightweight human action recognition, 2022.
[36] Sungho Suh, Vitor Fortes Rey, and Paul Lukowicz. Wearable Sensor-Based Human Activity Recognition for Worker Safety in Manufacturing Line, pages 303–317. Springer Nature Switzerland, Cham, 2024.
[37] Ning Wang, Guangming Zhu, Hongsheng Li, Mingtao Feng, Xia Zhao, Lan Ni, Peiyi Shen, Lin Mei, and Liang Zhang. Exploring spatio–temporal graph convolution for video-based human–object interaction recognition. IEEE Transactions on Circuits and Systems for Video Technology, 33(10):5814–5827, 2023.
[38] Hongsheng Li, Guangming Zhu, Wu Zhen, Lan Ni, Peiyi Shen, Liang Zhang, Ning Wang, and Cong Hua. Spatial parsing and dynamic temporal pooling networks for human-object interaction detection. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2022.
[39] Jeeseung Park, Jin-Woo Park, and Jong-Seok Lee. ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17152–17162, Los Alamitos, CA, USA, 2023. IEEE Computer Society.
[40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017.
[41] Zhijun Liang, Junfa Liu, Yisheng Guan, and Juan Rojas. Visual-semantic graph attention networks for human-object interaction detection. In 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 1441–1447, 2021.
[42] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
[43] Liang Yao, Chengsheng Mao, and Yuan Luo. Graph convolutional networks for text classification. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):7370–7377, Jul. 2019.
[44] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[45] Yeon Chae, Hoonyong Lee, Changbum R. Ahn, Minhyuk Jung, and Moonseo Park. Vision-based activity recognition monitoring based on human-object interaction at construction sites. In International conference on construction engineering and project management, volume 2022.06a, pages 877–885. Korea Institute of Construction Engineering and Management, 2022.
[46] Mingxuan Zhang, Xiao Wu, Zhaoquan Yuan, Qi He, and Xiang Huang. Human-object-object interaction: Towards human-centric complex interaction detection. In Proceedings of the 31st ACM International Conference on Multimedia, MM ’23, page 2233–2242, New York, NY, USA, 2023. Association for Computing Machinery.
[47] Dai Quoc Tran, Yuntae Jeon, Minsoo Park, and Seunghee Park. Gpt-based logic reasoning for hazard identification in construction site using cctv data. In Proceedings of the 41st International Symposium on Automation and Robotics in Construction, pages 291–298, Lille, France, June 2024. International Association for Automation and Robotics in Construction (IAARC).
[48] Mik Wanul Khosiin, Jacob J. Lin, Eko Andi Suryo, Kartika Puspa Negara, Ismiarta Aknuranda, and Chuin-Shan Chen. Video-based productivity monitoring of worker and large-scale object interactions in construction sites. In Jiansong Zhang, Qian Chen, Gaang Lee, Vicente Gonzalez-Moret, and Vineet Kamat, editors, Proceedings of the 42nd International Symposium on Automation and Robotics in Construction, pages 580–587, Montreal, Canada, July 2025. International Association for Automation and Robotics in Construction (IAARC).
[49] Cheng Zeng, Timo Hartmann, and Leyuan Ma. Conse: An ontology for visual representation and semantic enrichment of digital images in construction sites. Advanced Engineering Informatics, 60:102446, 2024.
[50] Zaolin Pan, Cheng Su, Yichuan Deng, and Jack Cheng. Image2triplets: A computer vision-based explicit relationship extraction framework for updating construction activity knowledge graphs. Computers in Industry, 137:103610, 2022.
[51] Fabian Pfitzner, Alexander Braun, and André Borrmann. From data to knowledge: Construction process analysis through continuous image capturing, object detection, and knowledge graph creation. Automation in Construction, 164:105451, 2024.
[52] Ross Girshick. Fast r-cnn. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1440–1448, 2015.
[53] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, 2017.
[54] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Yolov8: You only look once version 8. Ultralytics, https://github.com/ultralytics/ultralytics, 2023.
[55] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. CoRR, abs/2005.12872, 2020.
[56] Tsung-Yi Lin et al. Microsoft coco: Common objects in context. In Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing.
[57] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
[58] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? CoRR, abs/1810.00826, 2018.
[59] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
[60] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. CoRR, abs/1706.02216, 2017.
[61] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NeurIPS), pages 3111–3119, 2013.
[62] Google. Google Code Archive - Long-term storage for Google Code Project Hosting: word2vec. https://code.google.com/archive/p/word2vec/, 2013. [Online; 2024].
[63] George A. Miller. Wordnet: a lexical database for English. Commun. ACM, 38(11):39–41, November 1995.
[64] Zhijun Liang, Junfa Liu, Yisheng Guan, and Juan Rojas. Pose-based modular network for human-object interaction detection. CoRR, abs/2008.02042, 2020.
[65] Jeeseung Park, Jin-Woo Park, and Jong-Seok Lee. ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17152–17162, Los Alamitos, CA, USA, 2023. IEEE Computer Society.
[66] Frederic Z Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, and Stephen Gould. Exploring predicate visual context in detecting of human-object interactions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10411–10421, October 2023.
[67] Or Hirschorn and Shai Avidan. Pose anything: A graph-based approach for category agnostic pose estimation, 2023.
[68] Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, and Ajay Divakaran. Zero-shot object detection. CoRR, abs/1804.04340, 2018.
[69] Yong-Lu Li, Xinpeng Liu, Han Lu, Shiyi Wang, Junqi Liu, Jiefeng Li, and Cewu Lu. Detailed 2d-3d joint representation for human-object interaction. CoRR, abs/2004.08154, 2020.
[70] Liuyue Xie, Shreyas Misra, Nischal Suresh, Justin Soza-Soto, Tomotake Furuhata, and Kenji Shimada. Learning 3d human–object interaction graphs from transferable context knowledge for construction monitoring. Computers in Industry, 164:104171, 2025.
[71] Christian Diller and Angela Dai. CG-HOI: Contact-Guided 3D Human-Object Interaction Generation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19888–19901, Los Alamitos, CA, USA, 2024. IEEE Computer Society.
[72] Yoonhwa Jung, Ikhyun Cho, Shun-Hsiang Hsu, and Mani Golparvar-Fard. Visualsitediary: A detector-free vision-language transformer model for captioning photologs for daily construction reporting and image retrievals. Automation in Construction, 165:105483, 2024.
[73] Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang. Visualbert: A simple and performant baseline for vision and language. CoRR, abs/1908.03557, 2019.
[74] Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, and Jianfeng Gao. Unified vision-language pre-training for image captioning and VQA. CoRR, abs/1909.11059, 2019.
[75] Russell Kenley and Olli Seppänen. Location-Based Management for Construction: Planning, scheduling and control. July 2009.
[76] Cheng Yun Tsai, Mik Wanul Khosiin, Jacob J. Lin, and Chuin-Shan Chen. Multi-granular crew activity recognition for construction monitoring. Automation in Construction, 179:106428, 2025.
[77] Ming Zhang, Rui Xu, Haitao Wu, Jia Pan, and Xiaowei Luo. Human–robot collaboration for on-site construction. Automation in Construction, 150:104812, 2023.
[78] Balasubramaniyan Chandrasekaran and James M. Conrad. Human-robot collaboration: A survey. In SoutheastCon 2015, pages 1–8, 2015.
[79] Francesco Semeraro, Alexander Griffiths, and Angelo Cangelosi. Human–robot collaboration and machine learning: A systematic review of recent research. Robotics and Computer-Integrated Manufacturing, 79:102432, 2023.
-
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100913-
dc.description.abstract在建築專案管控中,有效追溯工人的任務範圍及落實程度需要能夠捕捉工人與物件之間複雜且動態的交互作用。然而,現有的人–物互動(Human–Object Interaction, HOI)方法仍受限於僅專注於單一動作、有限的物件尺度,以及不足的空間推理,使其難以應用於大型且多情境的建築環境。為解決這些問題,本研究提出一套工人任務可溯性偵測(Worker Accountability Monitoring, WAM)框架,結合卷積神經網路(CNN)與圖注意力網路(Graph Attention Networks, GATs),能夠偵測多種物件類別(混凝土、模板及鋼筋),並涵蓋三種尺度(大、中、小)及兩種空間情境(局部與全域)。該框架同時辨識互動行為(如綁紮、搬運、澆置)與非互動行為,提供工地活動的整體視角。實驗結果顯示,本方法在物件偵測中達到 mAP = 0.830,在 HOI 任務的局部與全域情境下分別取得 0.553 與 0.502 的 mAP,並顯著優於現有 HOI 基準方法。

本研究透過以下方式解決了現有 HOI 方法在構建中的關鍵局限性:(1)引入各種規模的對象表示(大、中、小)以更好地反映現實世界的情況,(2)整合局部和全局背景以改進空間推理,以及(3)實現每幀的多個同時交互以處理擁擠、複雜的環境。然而,現階段仍存在若干限制,包括資料集不平衡、施工任務涵蓋範圍有限、依賴靜態 2D 表徵,以及缺乏時序與幾何推理。為克服這些挑戰,本研究正持續拓展至「空間–姿態–時間」WAM 架構,結合圖神經網路與 Transformer 模組,以捕捉動作序列並強化精細互動推理。未來研究將進一步推動 WAM 朝向即時生產力監測、透過視覺–語言模型的自動化報告生成、結合 LBMS 的動態排程,以及人機協作施工流程的應用發展。綜合而言,本研究為基於電腦視覺的責任監測建立了嚴謹基礎,並為提升 WAM 的可擴展性、穩健性與現場實用性勾勒出清晰的研究藍圖。
zh_TW
dc.description.abstractEffectively monitoring worker accountability in construction project controls requires capturing complex and dynamic interactions between workers and objects. However, existing Human-Object Interaction (HOI) approaches remain constrained by their focus on single actions, limited object scales, and insufficient spatial reasoning, making them unsuitable for large-scale and multi-context construction environments. To address these gaps, this study introduces a Worker Accountability Monitoring (WAM) framework that integrates Convolutional Neural Networks with Graph Attention Networks to detect interactions across multiple object categories (concrete, formwork, and steel rebar), three scales (big, medium, small), and two spatial contexts (local and global). The framework recognizes both interactive actions (e.g., tying, transporting, pouring) and non-interactive behaviors, offering a holistic view of on-site activities. Experimental results demonstrate strong performance, achieving an mAP of 0.830 for object detection and HOI mAP scores of 0.553 (local) and 0.502 (global), significantly outperforming representative HOI baselines.

This study addresses key limitations of existing HOI approaches in construction by: (1) introducing object representations at multiple scales (big, medium, and small) to better reflect real-world conditions, (2) integrating local and global contexts for improved spatial reasoning, and (3) enabling multiple simultaneous interactions per frame to handle crowded, complex environments. However, limitations remain, including imbalanced datasets, restricted coverage of construction tasks, reliance on static 2D formulations, and limited temporal or geometric reasoning. To overcome these constraints, ongoing work extends WAM into a spatial-pose-temporal framework that combines graph neural networks and transformer-based modules for sequential action modeling and fine-grained interaction reasoning. Future research will further advance WAM toward real-time productivity monitoring, automated construction reporting via vision-language models, dynamic scheduling with LBMS, and human-robot collaboration in mixed crews. Collectively, this study establishes a rigorous foundation for vision-based accountability monitoring and outlines a clear trajectory for advancing scalable, robust, and field-ready WAM systems.
en
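The abstract above outlines the WAM pipeline at a high level: a CNN detector proposes worker and object instances in each frame, and a graph attention network reasons over those detections to classify worker-object interactions. The thesis's actual code is not part of this record, so the following is only a minimal, self-contained PyTorch sketch of that general CNN-plus-GAT idea; every name, dimension, and label in it (GraphAttentionLayer, WAMInteractionHead, the five-entry ACTIONS list, the 256-dimensional node features) is a hypothetical stand-in, not the author's implementation.

# Minimal sketch (not the thesis implementation): CNN detections become graph
# nodes, one graph-attention layer propagates scene context, and each
# worker-object pair is scored over a hypothetical set of action labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

ACTIONS = ["tying", "transporting", "pouring", "idle", "other"]  # hypothetical label set

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention in the spirit of Velickovic et al. [5]."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared node projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring vector

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency with self-loops.
        h = self.W(x)
        N = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(N, N, -1),
                           h.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)   # attention logits per edge
        e = e.masked_fill(adj == 0, float("-inf"))         # ignore non-edges
        alpha = torch.softmax(e, dim=-1)                   # normalize over neighbors
        return F.elu(alpha @ h)                            # context-aware node features

class WAMInteractionHead(nn.Module):
    """Scores each (worker, object) detection pair after graph attention."""
    def __init__(self, feat_dim=256, hidden_dim=128, num_actions=len(ACTIONS)):
        super().__init__()
        self.gat = GraphAttentionLayer(feat_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_actions))

    def forward(self, node_feats, adj, worker_idx, object_idx):
        h = self.gat(node_feats, adj)
        pair = torch.cat([h[worker_idx], h[object_idx]], dim=-1)
        return self.classifier(pair)                        # (num_pairs, num_actions) logits

if __name__ == "__main__":
    # Toy frame: 2 workers and 3 objects; random features stand in for detector outputs.
    feats = torch.randn(5, 256)
    adj = torch.ones(5, 5)                                  # fully connected scene graph
    workers, objects = torch.tensor([0, 1]), torch.tensor([2, 3])
    logits = WAMInteractionHead()(feats, adj, workers, objects)
    print(logits.shape)                                     # torch.Size([2, 5])

In a real setting, the node features would come from the detector's region features for workers, concrete, formwork, and rebar instances, and the graph edges would encode the local and global spatial contexts described in the abstract; random tensors and a fully connected toy graph stand in for both here.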
dc.description.provenanceSubmitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-11-05T16:04:41Z
No. of bitstreams: 0
en
dc.description.provenanceMade available in DSpace on 2025-11-05T16:04:41Z (GMT). No. of bitstreams: 0en
dc.description.tableofcontentsContents
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation and background
1.2 Problem statement
1.2.1 Lack of object size variability for the construction camera surveillance
1.2.2 Lack of spatial context for large geometric objects
1.2.3 Lack of interactions for activity-level productivity monitoring
1.3 Research objectives
1.4 Organization of the dissertation
Chapter 2 Literature review
2.1 Activity recognition in construction
2.1.1 Vision-based activity recognition
2.1.2 Sensor-based activity recognition
2.2 Human-object interaction methods
2.2.1 HOI detection in computer vision
2.2.2 HOI detection for construction applications
2.3 Knowledge gaps
2.4 Summary
Chapter 3 Research methodology
3.1 Worker accountability monitoring (WAM) proposed framework
3.2 Object detection module
3.2.1 Backbone architecture selection
3.2.2 Instance detection
3.2.3 Feature extraction process
3.3 Graph attention network (GAT) module for HOI
3.3.1 Graph representation selection process
3.3.2 Contextual features
3.3.3 Developing WAM's Graph with GATs
3.4 Worker accountability monitoring prediction
3.5 Summary
Chapter 4 Experiments
4.1 Experimental setup
4.1.1 System specifications
4.1.2 Hyperparameters of object detection
4.1.3 Hyperparameters of GATs
4.2 Dataset preparation
4.2.1 Dataset collection
4.2.2 Dataset annotation and quality control
4.3 Evaluation metrics
4.3.1 Intersection over Union (IoU)
4.3.2 Precision
4.3.3 Mean Average Precision
4.3.4 Recall
4.3.5 Confusion Matrix
4.4 Summary
Chapter 5 Results
5.1 Object detection results
5.2 Quantitative results of WAM
5.2.1 mAP score by object size variability (big, medium, small)
5.2.2 mAP score by spatial context (local and global area)
5.2.3 Performance comparison with baselines
5.2.4 The ablation study
5.2.5 Confusion matrix
5.3 Qualitative results of WAM
5.3.1 Small object inference
5.3.2 Medium object inference
5.3.3 Big object inference
5.4 Summary
Chapter 6 Discussion
6.1 Performance interpretation
6.1.1 Failure prediction analysis
6.1.2 Field applicability
6.2 Limitations of current work
6.2.1 Data-related limitations
6.2.2 Model-related limitations
6.3 Summary
Chapter 7 Conclusions and future work
7.1 Conclusions
7.2 Contributions
7.3 Future work
7.3.1 Real-time productivity monitoring
7.3.2 Automation of daily construction reporting
7.3.3 Project scheduling with location-based management systems
7.3.4 Human–robot collaboration for dynamic construction workflows
References
-
dc.language.isoen-
dc.subject電腦視覺-
dc.subject人機互動-
dc.subject活動辨識-
dc.subject物件偵測-
dc.subject圖神經網路-
dc.subject生產力監控-
dc.subjectComputer Vision-
dc.subjectHuman-Object Interaction-
dc.subjectActivity recognition-
dc.subjectObject detection-
dc.subjectGraph Neural Network-
dc.subjectProductivity Monitoring-
dc.title以人與物互動與圖型表徵自動分析營造場域之工人任務zh_TW
dc.titleHuman–Object Interaction with Graph Representations for Automated Worker Accountability in Constructionen
dc.typeThesis-
dc.date.schoolyear114-1-
dc.description.degree博士-
dc.contributor.coadvisor林之謙zh_TW
dc.contributor.coadvisorJacob J. Linen
dc.contributor.oralexamcommittee謝尚賢;陳柏華;王維志;鄭明洲;楊小東zh_TW
dc.contributor.oralexamcommitteeShang-Hsien Hsieh;Albert Chen;Wei-Chih Wang;Min-Yuan Cheng;I-Tung Yangen
dc.subject.keyword電腦視覺,人機互動,活動辨識,物件偵測,圖神經網路,生產力監控zh_TW
dc.subject.keywordComputer Vision,Human-Object Interaction,Activity recognition,Object detection,Graph Neural Network,Productivity Monitoringen
dc.relation.page105-
dc.identifier.doi10.6342/NTU202504627-
dc.rights.note同意授權(全球公開)-
dc.date.accepted2025-11-05-
dc.contributor.author-college工學院-
dc.contributor.author-dept土木工程學系-
dc.date.embargo-lift2025-11-06-
Appears in Collections:土木工程學系

Files in This Item:
File: ntu-114-1.pdf (27.35 MB, Adobe PDF)