Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98540

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林之謙 | zh_TW |
| dc.contributor.advisor | Jacob J. Lin | en |
| dc.contributor.author | 蔡承耘 | zh_TW |
| dc.contributor.author | Cheng-Yun Tsai | en |
| dc.date.accessioned | 2025-08-18T00:48:07Z | - |
| dc.date.available | 2025-08-18 | - |
| dc.date.copyright | 2025-08-15 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-07 | - |
| dc.identifier.citation | [1] Begüm Sertyeşilışık. Global trends in the construction industry: challenges of employment. In Handbook of Research on Unemployment and Labor Market Sustainability in the Era of Globalization, pages 255–274. IGI Global, 2017.
[2] Najim Saka et al. An assessment of the impact of the construction sector on the gross domestic product (GDP) of Nigeria. Journal of Surveying, Construction and Property, 13(1):42–65, 2022. [3] Leo Sveikauskas, Samuel Rowe, James Mildenberger, Jennifer Price, and Arthur Young. Productivity growth in construction. Journal of Construction Engineering and Management, 142(10):04016045, 2016. [4] Austan Goolsbee and Chad Syverson. The strange and awful path of productivity in the US construction sector. Technical report, National Bureau of Economic Research, 2023. [5] H. Randolph Thomas and Donald F. Kramer. The Manual of Construction Productivity Measurement and Performance Evaluation. Bureau of Engineering Research, University of Texas at Austin, Austin, TX, 1988. [6] Robert E. Chapman, David T. Butry, and Allison L. Huang. Measuring and improving US construction productivity. In Proceedings of TG65 and W065-Special Track, 18th CIB World Building Congress, 2010. [7] D. A. R. Dolage and P. Chan. Productivity in construction: a critical review of research. Engineer: Journal of the Institution of Engineers, Sri Lanka, 46(4), 2013. [8] S. Peter Dozzi and Simaan M. AbouRizk. Productivity in Construction. Institute for Research in Construction, National Research Council, Ottawa, 1993. [9] Xiaochun Luo, Heng Li, Yantao Yu, Cheng Zhou, and Dongping Cao. Combining deep features and activity context to improve recognition of activities of workers in groups. Computer-Aided Civil and Infrastructure Engineering, 35(9):965–978, 2020. [10] Xiaochun Luo, Heng Li, Dongping Cao, Fei Dai, JoonOh Seo, SangHyun Lee, et al. Recognizing diverse construction activities in site images via relevance networks of construction-related objects detected by convolutional neural networks. Journal of Computing in Civil Engineering, 32(3):04018012, 2018. [11] Xiaochun Luo, Heng Li, Dongping Cao, Yantao Yu, Xincong Yang, and Ting Huang.
Towards efficient and objective work sampling: Recognizing workers’ activities in site surveillance videos with two-stream convolutional networks. Automation in Construction, 94:360–370, 2018. [12] Maximilian Bügler, André Borrmann, Gbolabo Ogunmakin, Patricio A. Vela, and Jochen Teizer. Fusion of photogrammetry and video analysis for productivity assessment of earthwork processes. Computer-Aided Civil and Infrastructure Engineering, 32(2):107–123, 2017. [13] Chen Chen, Zhenhua Zhu, and Amin Hammad. Automated excavators activity recognition and productivity analysis from construction site surveillance videos. Automation in Construction, 110:103045, 2020. [14] Jake K. Aggarwal and Michael S. Ryoo. Human activity analysis: A review. ACM Computing Surveys (CSUR), 43(3):1–43, 2011. [15] Jie Gong and Carlos H. Caldas. An object recognition, tracking, and contextual reasoning-based video interpretation method for rapid productivity analysis of construction operations. Automation in Construction, 20(8):1211–1226, 2011. [16] Karen Simonyan and Andrew Zisserman. Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems, 27, 2014. [17] Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6546–6555, 2018. [18] Ghazaleh Torabi, Amin Hammad, and Nizar Bouguila. Two-dimensional and three-dimensional CNN-based simultaneous detection and activity classification of construction workers. Journal of Computing in Civil Engineering, 36(4):04022009, 2022. [19] Jiannan Cai, Yuxi Zhang, and Hubo Cai. Two-step long short-term memory method for identifying construction activities through positional and attentional cues. Automation in Construction, 106:102886, 2019. [20] Xiaochun Luo, Heng Li, Hao Wang, Zezhou Wu, Fei Dai, and Dongping Cao.
Vision-based detection and visualization of dynamic workspaces. Automation in Construction, 104:1–13, 2019. [21] Tian Lan, Yang Wang, Weilong Yang, and Greg Mori. Beyond actions: Discriminative models for contextual group activities. Advances in Neural Information Processing Systems, 23, 2010. [22] Jun Yang, Zhongke Shi, and Ziyan Wu. Vision-based action recognition of construction workers using dense trajectories. Advanced Engineering Informatics, 30(3):327–336, 2016. [23] Hanbin Luo, Chaohua Xiong, Weili Fang, Peter E. D. Love, Bowen Zhang, and Xi Ouyang. Convolutional neural networks: Computer vision-based workforce activity assessment in construction. Automation in Construction, 94:282–289, 2018. [24] Min-Yuan Cheng, Akhmad F. K. Khitam, and Hongky Haodiwidjaya Tanto. Construction worker productivity evaluation using action recognition for foreign labor training and education: A case study of Taiwan. Automation in Construction, 150:104809, 2023. [25] Wang Chen, Donglian Gu, and Jintao Ke. Real-time ergonomic risk assessment in construction using a co-learning-powered 3D human pose estimation model. Computer-Aided Civil and Infrastructure Engineering, 39(9):1337–1353, 2024. [26] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137–1149, 2016. [27] Seunghoon Hong, Tackgeun You, Suha Kwak, and Bohyung Han. Online tracking by learning discriminative saliency map with convolutional neural network. In International Conference on Machine Learning, pages 597–606. PMLR, 2015. [28] Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2462–2470, 2017.
[29] Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision, pages 20–36. Springer, 2016. [30] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. [31] Siyu Tang, Mykhaylo Andriluka, Bjoern Andres, and Bernt Schiele. Multiple people tracking by lifted multicut and person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3539–3548, 2017. [32] João F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2014. [33] Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, and Silvio Savarese. Discovering groups of people in images. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part IV, pages 417–433. Springer, 2014. [34] Qi Wang, Mulin Chen, Feiping Nie, and Xuelong Li. Detecting coherent groups in crowd scenes by multiview clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1):46–58, 2018. [35] Mohamed Rabie Amer, Peng Lei, and Sinisa Todorovic. HiRF: Hierarchical random field for collective activity recognition in videos. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part VI, pages 572–585. Springer, 2014. [36] Wongun Choi and Silvio Savarese. Understanding collective activities of people from videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6):1242–1257, 2013. [37] Tianmin Shu, Dan Xie, Brandon Rothrock, Sinisa Todorovic, and Song-Chun Zhu. Joint inference of groups, events and human roles in aerial videos.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4576–4584, 2015. [38] Mostafa S. Ibrahim and Greg Mori. Hierarchical relational networks for group activity recognition and retrieval. In Proceedings of the European Conference on Computer Vision (ECCV), pages 721–736, 2018. [39] Mengshi Qi, Jie Qin, Annan Li, Yunhong Wang, Jiebo Luo, and Luc Van Gool. stagNet: An attentive semantic RNN for group activity recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pages 101–117, 2018. [40] Minsi Wang, Bingbing Ni, and Xiaokang Yang. Recurrent modeling of interaction context for collective activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3048–3056, 2017. [41] Timur Bagautdinov, Alexandre Alahi, François Fleuret, Pascal Fua, and Silvio Savarese. Social scene understanding: End-to-end multi-person action localization and collective activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4315–4324, 2017. [42] Mahsa Ehsanpour, Alireza Abedin, Fatemeh Saleh, Javen Shi, Ian Reid, and Hamid Rezatofighi. Joint learning of social groups, individuals action and sub-group activities in videos. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX, pages 177–195. Springer, 2020. [43] Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, and Hamid Rezatofighi. JRDB-Act: A large-scale dataset for spatio-temporal action, social group and activity detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20983–20992, 2022. [44] Ruize Han, Haomin Yan, Jiacheng Li, Songmiao Wang, Wei Feng, and Song Wang. Panoramic human activity recognition. In European Conference on Computer Vision, pages 244–261. Springer, 2022. [45] Sumin Lee, Yooseung Wang, Sangmin Woo, and Changick Kim.
Spatio-temporal proximity-aware dual-path model for panoramic activity recognition. In European Conference on Computer Vision, pages 19–36. Springer, 2025. [46] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017. [47] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016. [48] Ashish Vaswani et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017. [49] Wongun Choi, Khuram Shahid, and Silvio Savarese. What are they doing?: Collective activity classification using spatio-temporal relationship among people. In 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pages 1282–1289. IEEE, 2009. [50] Meiqi Cao, Rui Yan, Xiangbo Shu, Jiachao Zhang, Jinpeng Wang, and Guo-Sen Xie. MUP: Multi-granularity unified perception for panoramic activity recognition. In Proceedings of the 31st ACM International Conference on Multimedia, pages 7666–7675, 2023. [51] Lihi Zelnik-Manor and Pietro Perona. Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 17, 2004. [52] Xueyang Wang, Xiya Zhang, Yinheng Zhu, Yuchen Guo, Xiaoyun Yuan, Liuyu Xiang, Zerun Wang, Guiguang Ding, David Brady, Qionghai Dai, et al. PANDA: A gigapixel-level human-centric video dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3268–3278, 2020. [53] Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017. [54] Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. SlowFast networks for video recognition.
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6202–6211, 2019. [55] Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, and Han Hu. Video Swin Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3202–3211, 2022. [56] Yanghao Li, Chao-Yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, and Christoph Feichtenhofer. MViTv2: Improved multiscale vision transformers for classification and detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4804–4814, 2022. [57] Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, and Cordelia Schmid. Actor-centric relation network. In Proceedings of the European Conference on Computer Vision (ECCV), pages 318–334, 2018. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98540 | - |
| dc.description.abstract | 勞動力是專案中至關重要的資源。為了確保有效的管理,管理者必須了解當前進行中的工項以及參與工人的組成情況。傳統工程中,現場建築工人生產力的分析依賴於人工取樣與記錄,不僅耗時且容易出錯。隨著電腦視覺與深度學習技術的進步,許多研究嘗試透過自動化辨識方法來解決傳統人工方法中的低效率與主觀性問題。然而,大多數現有研究僅專注於低層級的姿態辨識,忽略了建設工地協作性與動態性的特質。本研究提出了一個多層次的工班活動辨識框架,目標不僅是辨識個別工人的動作,並包括將協作的工人分組並辨識其特定的工作項目。透過運用基於圖形的表示法與自注意力機制,此框架能有效整合空間與上下文資訊,從而實現準確的辨識結果。在實驗階段,我們建立了一個涵蓋鋼筋、模板和混凝土施工作業的工地數據集,並使用多層次的指標來評估模型的性能。結果顯示,我們的框架整體 F1 分數達到 73.41%,此外,結果還顯示,即使在空間距離極為接近的情況下,模型仍能有效學習並區分不同群體。在進一步的實驗與討論中顯示,視覺特徵相似性與空間鄰近性對於準確辨識至關重要,而當這兩個要素的比重相同時,模型的性能最佳。本研究為動態施工現場監測提供了一種可擴展且高效的解決方案,同時為未來在時間建模與人-物互動分析等領域的研究奠定了基礎。 | zh_TW |
| dc.description.abstract | The labor force is a critical resource in construction projects. To ensure effective management, managers must understand the tasks in progress and the composition of the workers involved. Traditionally, analyzing the productivity of on-site construction workers has relied on manual sampling and recording, which is time-consuming and error-prone. With advances in computer vision and deep learning, many studies have explored automated recognition methods to address the inefficiency and subjectivity of manual approaches. However, most existing studies focus on low-level pose recognition, overlooking the collaborative and dynamic nature of construction sites. This study proposes a multi-granular crew activity recognition framework that not only recognizes individual workers' actions but also groups collaborating workers and identifies their specific work items. By leveraging graph-based representations and self-attention mechanisms, the framework effectively integrates spatial and contextual information to achieve accurate recognition. In the experimental phase, we created a construction-site dataset covering rebar, formwork, and concrete operations and used multi-level metrics to evaluate the model's performance. The framework achieves an overall F1 score of 73.41%. Moreover, the results demonstrate that the model can effectively learn to distinguish between groups even when they are in extremely close spatial proximity. Further experiments and discussions reveal that both visual feature similarity and spatial proximity are essential for accurate recognition, with the model performing best when the two factors are weighted equally. This study provides a scalable and efficient solution for dynamic construction site monitoring while laying a foundation for future research in areas such as temporal modeling and human-object interaction analysis. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-18T00:48:06Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-18T00:48:07Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Problem Statement and Research Gaps 3
1.3 Objective 6
1.4 Organization of Thesis 7
Chapter 2 Literature Review 9
2.1 Single Worker Action Recognition 9
2.2 Multi-worker Action Recognition 11
2.3 Crew Activity Recognition 13
2.4 Group & Multi-granularity Recognition 16
Chapter 3 Methodology 19
3.1 Framework 19
3.1.1 Graph Construction 21
3.1.2 Relation Matrix 23
3.1.3 Multi-Granularity Module 24
3.1.4 Multi-Head Attention (MHA) 27
3.1.5 Multi-Level Multi-Task Supervisions 28
3.2 Implementation 30
3.2.1 Network Details 30
3.2.2 Training & Inference 31
Chapter 4 Experiments 33
4.1 Dataset 33
4.1.1 Selected Activities 33
4.2 Metrics 38
4.3 Results 39
4.3.1 Model Performance Summary 39
4.3.2 Learning Curves 40
4.3.3 Confusion Matrix for Individual Actions 41
4.3.4 Ground Truth and Prediction Visualization 43
Chapter 5 Discussion 51
5.1 Performance Comparison 51
5.2 Different Ways to Construct Distance-aware Affinity Matrix 53
5.3 Relation Matrix Weight 56
5.4 Crew Detection and Crew Activity Recognition 57
5.5 Limitation 59
5.6 Empirical Validation of Crew Productivity Monitoring 60
Chapter 6 Conclusion 65
6.1 Conclusion 65
6.2 Future Work 67
References 69 | - |
| dc.language.iso | en | - |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 影像理解 | zh_TW |
| dc.subject | 多層次活動辨識 | zh_TW |
| dc.subject | 工地監測 | zh_TW |
| dc.subject | Construction Monitoring | en |
| dc.subject | Multi-Level Activity Recognition | en |
| dc.subject | Deep Learning | en |
| dc.subject | Image Understanding | en |
| dc.title | 建築工地多層次群體活動辨識 | zh_TW |
| dc.title | Multi-Granular Crew Activity Recognition for Construction Monitoring | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 陳俊杉;梁期鈞 | zh_TW |
| dc.contributor.oralexamcommittee | Chuin-Shan Chen;Ci-Jyun Liang | en |
| dc.subject.keyword | 影像理解,深度學習,工地監測,多層次活動辨識 | zh_TW |
| dc.subject.keyword | Image Understanding, Deep Learning, Construction Monitoring, Multi-Level Activity Recognition | en |
| dc.relation.page | 77 | - |
| dc.identifier.doi | 10.6342/NTU202502586 | - |
| dc.rights.note | Authorized (campus access only) | - |
| dc.date.accepted | 2025-08-11 | - |
| dc.contributor.author-college | College of Engineering | - |
| dc.contributor.author-dept | Department of Civil Engineering | - |
| dc.date.embargo-lift | 2030-08-03 | - |
Appears in Collections: Department of Civil Engineering
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 11.6 MB | Adobe PDF |
Except where otherwise noted in their copyright statements, all items in the system are protected by copyright, with all rights reserved.
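The abstract above reports that crews are recognized by combining visual feature similarity with spatial proximity, and that the model performs best when the two factors are weighted equally. As a rough illustration of that idea only (not the thesis's actual implementation), the hypothetical NumPy sketch below blends a cosine-similarity term and a Gaussian spatial term into one affinity matrix, then forms crews by thresholding it; the function names `crew_affinity` and `group_workers`, the kernel width `sigma`, and the threshold are all illustrative assumptions.

```python
import numpy as np

def crew_affinity(features, positions, alpha=0.5, sigma=1.0):
    """Blend visual similarity and spatial proximity into one affinity matrix.

    features:  (N, D) appearance embeddings, one per detected worker
    positions: (N, 2) image- or ground-plane coordinates
    alpha:     weight on visual similarity (1 - alpha on proximity);
               0.5 mirrors the equal weighting the abstract reports as best
    """
    # Cosine similarity of appearance embeddings, rescaled to [0, 1]
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    visual = (f @ f.T + 1.0) / 2.0

    # Gaussian kernel on pairwise Euclidean distance (1.0 = same location)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    spatial = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

    return alpha * visual + (1.0 - alpha) * spatial

def group_workers(affinity, threshold=0.7):
    """Greedy grouping: connect workers whose pairwise affinity exceeds the
    threshold and return the connected components as crew labels."""
    n = affinity.shape[0]
    labels = [-1] * n
    crew = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack, labels[i] = [i], crew
        while stack:  # flood-fill one connected component
            j = stack.pop()
            for k in range(n):
                if labels[k] == -1 and affinity[j, k] >= threshold:
                    labels[k] = crew
                    stack.append(k)
        crew += 1
    return labels
```

For example, two workers with similar embeddings standing half a meter apart end up in one crew, while a visually distinct worker several meters away gets a separate label, even though the visual term alone would not fully separate them.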
