Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95930
Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 謝尚賢 | zh_TW |
| dc.contributor.advisor | Shang-Hsien Hsieh | en |
| dc.contributor.author | 何宏發 | zh_TW |
| dc.contributor.author | Wang-Fat Ho | en |
| dc.date.accessioned | 2024-09-25T16:11:31Z | - |
| dc.date.available | 2024-09-26 | - |
| dc.date.copyright | 2024-09-25 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-13 | - |
| dc.identifier.citation | Yanquan Zhang, Ruidong Chang, Weian Mao, Jian Zuo, Lingqiao Liu, and Yilong Han. Challenges of automating interior construction progress monitoring. Journal of Construction Engineering and Management, 150(9):03124004, 2024. Hui Deng, Hao Hong, Dehuan Luo, Yichuan Deng, and Cheng Su. Automatic indoor construction process monitoring for tiles based on BIM and computer vision. Journal of Construction Engineering and Management, 146(1):04019095, 2019. Wei Wei, Yujie Lu, Tao Zhong, Peixian Li, and Bo Liu. Integrated vision-based automated progress monitoring of indoor construction using mask region-based convolutional neural networks and BIM. Automation in Construction, 140:104327, 2022. Christopher Kropp, Markus König, and Christian Koch. Object recognition in BIM registered videos for indoor progress monitoring. In EG-ICE International Workshop on Intelligent Computing in Engineering, pages 1–10, 2013. Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. Support vector machines. IEEE Intelligent Systems and their Applications, 13(4):18–28, 1998. Matti Pietikäinen. Local binary patterns. Scholarpedia, 5(3):9775, 2010. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017. Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014. Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016. Biyanka Ekanayake, Alireza Ahmadian Fard Fini, Johnny Kwok Wai Wong, and Peter Smith. A deep learning-based approach to facilitate the as-built state recognition of indoor construction works. Construction Innovation, 24(4):933–949, 2022. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017. K Sigalov and M König. 4D BIM model adaptation based on construction progress monitoring. In eWork and eBusiness in Architecture, Engineering and Construction, pages 337–344. CRC Press, 2018. Seungjun Roh, Zeeshan Aziz, and Feniosky Pena-Mora. An object-based 3D walk-through model for interior construction progress monitoring. Automation in Construction, 20(1):66–75, 2011. Frédéric Bosché, Mahmoud Ahmed, Yelda Turkan, Carl T Haas, and Ralph Haas. The value of integrating Scan-to-BIM and Scan-vs-BIM techniques for construction monitoring using laser scanning and BIM: The case of cylindrical MEP components. Automation in Construction, 49:201–213, 2015. Reza Maalek, Janaka Ruwanpura, and Kamal Ranaweera. Evaluation of the state-of-the-art automated construction progress monitoring and control systems. In Construction Research Congress 2014: Construction in a Global Network, pages 1023–1032, 2014. Yelda Turkan, Frederic Bosche, Carl T Haas, and Ralph Haas. Automated progress tracking using 4D schedule and 3D sensing technologies. Automation in Construction, 22:414–421, 2012. Zoran Pučko, Nataša Šuman, and Danijel Rebolj. Automated continuous construction progress monitoring using multiple workplace real time 3D scans. Advanced Engineering Informatics, 38:27–40, 2018. Giorgio PM Vassena, Luca Perfetti, Sara Comai, Silvia Mastrolembo Ventura, and Angelo LC Ciribini. Construction progress monitoring through the integration of 4D BIM and SLAM-based mapping devices. Buildings, 13(10):2488, 2023. Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017. Aritra Pal, Jacob J Lin, and Shang-Hsien Hsieh. Schedule-driven analytics of 3D point clouds for automated construction progress monitoring. In Computing in Civil Engineering 2023, pages 412–420, 2023. Thang Vu, Kookhoi Kim, Tung M Luu, Thanh Nguyen, and Chang D Yoo. SoftGroup for 3D instance segmentation on point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2708–2717, 2022. CH Yang, Y Pang, and U Soergel. Monitoring of building construction by 4D change detection using multi-temporal SAR images. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4:35–42, 2017. Eleonora Jonasova Parelius. A review of deep-learning methods for change detection in multispectral remote sensing images. Remote Sensing, 15(8):2092, 2023. Huiwei Jiang, Min Peng, Yuanjun Zhong, Haofeng Xie, Zemin Hao, Jingming Lin, Xiaoli Ma, and Xiangyun Hu. A survey on deep learning-based change detection from high-resolution remote sensing images. Remote Sensing, 14(7):1552, 2022. Rong Huang, Yusheng Xu, Ludwig Hoegner, and Uwe Stilla. Semantics-aided 3D change detection on construction sites using UAV-based photogrammetric point clouds. Automation in Construction, 134:104057, 2022. Dongyeob Han, Suk Bae Lee, Mihwa Song, and Jun Sang Cho. Change detection in unmanned aerial vehicle images for progress monitoring of road construction. Buildings, 11(4):150, 2021. Thomas Czerniawski, Jong Won Ma, and Fernanda Leite. Automated building change detection with amodal completion of point clouds. Automation in Construction, 124:103568, 2021. Theresa Meyer, Ansgar Brunn, and Uwe Stilla. Change detection for indoor construction progress monitoring based on BIM, point clouds and uncertainties. Automation in Construction, 141:104442, 2022. Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023. Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295, 2024. Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024. Ron Mokady, Amir Hertz, and Amit H Bermano. ClipCap: CLIP prefix for image captioning. arXiv preprint arXiv:2111.09734, 2021. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019. Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning, pages 19730–19742. PMLR, 2023. Wei-Lun Tsai. Assisting construction safety inspection documentation via vision-based natural language generation. Master's thesis, National Taiwan University, 2022. Bo Xiao, Yiheng Wang, and Shih-Chung Kang. Deep learning image captioning in construction management: a feasibility study. Journal of Construction Engineering and Management, 148(7):04022049, 2022. Huan Liu, Guangbin Wang, Ting Huang, Ping He, Martin Skitmore, and Xiaochun Luo. Manifesting construction activity scenes via image captioning. Automation in Construction, 119:103334, 2020. Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, and Dahua Lin. PointLLM: Empowering large language models to understand point clouds. arXiv preprint arXiv:2308.16911, 2023. Yue Qiu, Shintaro Yamamoto, Ryosuke Yamada, Ryota Suzuki, Hirokatsu Kataoka, Kenji Iwata, and Yutaka Satoh. 3D change localization and captioning from dynamic scans of indoor scenes. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1176–1185, 2023. David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004. Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7–13, 2006, Proceedings, Part I, pages 404–417. Springer, 2006. Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision, pages 2564–2571. IEEE, 2011. Bill Triggs, Philip F McLauchlan, Richard I Hartley, and Andrew W Fitzgibbon. Bundle adjustment – a modern synthesis. In Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms, Corfu, Greece, September 21–22, 1999, Proceedings, pages 298–372. Springer, 2000. Steven M Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 1, pages 519–528. IEEE, 2006. Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975. Keiron O'Shea and Ryan Nash. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015. Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast point feature histograms (FPFH) for 3D registration. In 2009 IEEE International Conference on Robotics and Automation, pages 3212–3217. IEEE, 2009. Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017. Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019. Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-BERT: Pre-training 3D point cloud transformers with masked point modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19313–19322, 2022. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, et al. ULIP-2: Towards scalable multimodal pre-training for 3D understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27091–27101, 2024. Yihao Xue, Siddharth Joshi, Dang Nguyen, and Baharan Mirzasoleiman. Understanding the robustness of multi-modal contrastive learning to distribution shift. arXiv preprint arXiv:2310.04971, 2023. Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021. Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36, 2024. Iro Armeni, Ozan Sener, Amir R Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3D semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1534–1543, 2016. Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5828–5839, 2017. Tao Sun, Yan Hao, Shengyu Huang, Silvio Savarese, Konrad Schindler, Marc Pollefeys, and Iro Armeni. Nothing stands still: A spatiotemporal benchmark on 3D point cloud registration under large geometric and temporal change. arXiv preprint arXiv:2311.09346, 2023. Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, and Xiaojuan Qi. PLA: Language-driven open-vocabulary 3D scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7010–7019, 2023. Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, et al. DataComp: In search of the next generation of multimodal datasets. Advances in Neural Information Processing Systems, 36, 2024. Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2818–2829, 2023. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, 2002. Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, 2004. Satanjeev Banerjee and Alon Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, 2005. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95930 | - |
| dc.description.abstract | 營建工程專案的施工進度監控是工程能否如期完成的重要關鍵,而工程專案常常會因為不同因素而發生延誤,尤其是室內施工階段。近年來,很多研究透過收集工地室內的資訊(如二維照片、三維點雲),透過不同技術來辨識室內元件的狀態,並與建築資訊模型(Building Information Model)或工項排程文件進行比對,以獲得室內個別工項(如室內隔間牆、電氣系統、管道系統)的施工進度。然而這些工程進度訊息並未包含足夠的文字語義資訊。目前,室內工程的工程進度量測及記錄大多仍倚賴人工對所有室內工項及元件的狀態進行逐項檢查,並以簡單的圖片和文字進行記錄其變化。這種記錄方式除了耗時和費力以外,記錄的品質和仔細程度也取決於工程檢查人員的主觀判斷及專業度。因此為了彌補目前研究缺少語義資訊的缺口以及協助工程人員進行快速與客觀的室內元件狀態變化的文字記錄,本研究提出一種利用多模態大型語言模型(Multimodal Large Language Model)描述室內元件狀態變化的框架,記錄室內元件的增減及位移等資訊,如牆面從未粉刷狀態變成已粉刷狀態,或偵測電燈、窗戶等是否已安裝,協助室內元件狀態檢查及記錄流程更加自動化。具體來說,本研究利用兩個同一位置但不同時段的室內房間點雲,合併後輸入至多模態大型語言模型中,以獲得兩個時段之間的室內元件狀態變化的文字描述。本研究經實際工地案例的收集資料測試後,成功描述室內元件的狀態及他們之間的變化。實驗對模型生成的描述與人類標註的描述兩者進行了評估,結果顯示,關注句子結構的ROUGE-L分數達到0.505,而關注精確匹配、語義和詞序的METEOR分數則達到0.415。此外,經過GPT-4對兩者的比較和評估,所評估的分數為55分,代表模型生成的描述能達到人類基準的55%。這些成果展現了模型偵測及描述室內元件的狀態變化的能力。 | zh_TW |
| dc.description.abstract | Monitoring construction progress is crucial to completing building projects on time, particularly during the interior construction phase, which is often delayed for a variety of reasons. Recent studies have collected on-site interior data, such as 2D images and 3D point clouds, to identify the status of interior components, and have compared that status with Building Information Models (BIM) or project schedule documents to assess the progress of individual interior work items, such as partition walls, electrical systems, and piping systems. However, the progress information produced by these methods lacks sufficient textual semantic detail. Currently, measuring and recording interior construction progress still relies mostly on inspectors manually checking the status of every interior work item and component and recording changes with simple photographs and text. This process is time-consuming and labor-intensive, and the quality and thoroughness of the records depend on the inspector's subjective judgment and expertise. To fill the gap in semantic information left by current research and to help construction personnel document status changes of interior components quickly and objectively, this study proposes a framework that uses a Multimodal Large Language Model (MLLM) to describe changes in the status of interior components. The framework records additions, removals, and relocations of interior components, such as a wall changing from unpainted to painted, or whether lights and windows have been installed, with the goal of further automating the inspection and documentation of interior components. Specifically, two 3D point clouds of the same room captured at different times are merged and input into an MLLM, which generates a textual description of the changes in component status between the two time points. When tested on data collected from an actual construction site, the framework successfully described the status of interior components and the changes between them. The generated descriptions were evaluated against human-annotated descriptions: the ROUGE-L score, which reflects sentence structure through the longest common subsequence of the two texts, reached 0.505, and the METEOR score, which accounts for exact matches, semantic similarity, and word order, reached 0.415. In addition, GPT-4 was used to compare the generated and human-annotated descriptions and assigned a score of 55, indicating that the model-generated descriptions reach 55% of the human baseline. These results demonstrate the model's ability to detect and describe changes in the status of interior components. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-25T16:11:31Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-09-25T16:11:31Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee; Acknowledgments; Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Background and Motivation; 1.2 Research Objective; 1.3 Organization of Thesis; Chapter 2 Literature Review; 2.1 Vision-based Interior Progress Monitoring; 2.1.1 Image-based Approach; 2.1.2 Point Cloud-based Approach; 2.2 Change Detection in Construction; 2.3 Natural Language Generation in Construction; 2.4 Summary; Chapter 3 Methodology; 3.1 Framework Overview; 3.2 Point Cloud Processing; 3.2.1 Image-based 3D Reconstruction; 3.2.2 Point Cloud Cleaning; 3.2.3 Point Cloud Differences Obtainment; 3.3 Caption Data Generation; 3.4 Point Cloud Encoder; 3.4.1 Model Selection; 3.4.2 Multimodal Alignment; 3.5 Natural Language Generation; 3.5.1 Embedding Transformation; 3.5.2 Large Language Model; Chapter 4 Experiments; 4.1 Dataset; 4.2 Implementation Details of Point Cloud Processing; 4.3 Point Cloud Encoder; 4.3.1 Implementation Details; 4.3.2 Multimodal Alignment Results; 4.4 Natural Language Generation; 4.4.1 Implementation Details; 4.4.2 Generation Results; Chapter 5 Discussion; 5.1 Experiment Results; 5.2 Limitations and Possibilities; Chapter 6 Conclusion; 6.1 Conclusion; 6.2 Future Work; References | - |
| dc.language.iso | en | - |
| dc.title | 基於多模態大型語言模型之室內元件變化偵測 | zh_TW |
| dc.title | Interior Components Change Detection Through Interpreting Point Cloud by Multimodal Large Language Model | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.coadvisor | 林之謙 | zh_TW |
| dc.contributor.coadvisor | Jacob J. Lin | en |
| dc.contributor.oralexamcommittee | 林祐正;吳日騰 | zh_TW |
| dc.contributor.oralexamcommittee | Yu-Cheng Lin;Rih-Teng Wu | en |
| dc.subject.keyword | 多模態大型語言模型,室內工地,點雲,變化偵測,變化描述,自然語言生成 | zh_TW |
| dc.subject.keyword | Multimodal Large Language Model, Interior Construction, Point Cloud, Change Detection, Change Description, Natural Language Generation | en |
| dc.relation.page | 78 | - |
| dc.identifier.doi | 10.6342/NTU202404199 | - |
| dc.rights.note | Not authorized for public access | - |
| dc.date.accepted | 2024-08-14 | - |
| dc.contributor.author-college | College of Engineering | - |
| dc.contributor.author-dept | Department of Civil Engineering | - |
| Appears in collections: | Department of Civil Engineering | |
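
The abstract reports a ROUGE-L score of 0.505 between the generated and human-annotated descriptions. As a rough, non-authoritative illustration of what that number measures, the minimal Python sketch below computes a sentence-level ROUGE-L F-measure from the longest common subsequence (LCS) of two token sequences, following the definition in Lin (2004). It is not the thesis's evaluation code: the lowercased whitespace tokenization, the `beta = 1.0` weighting, and the function names `lcs_length` and `rouge_l` are assumptions for illustration only. Packaged implementations add their own tokenization and optional stemming, so their scores can differ.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences (O(len(a) * len(b)) DP)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1  # extend the subsequence with a matching token
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from either sequence
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Sentence-level ROUGE-L F-measure on lowercased whitespace tokens (assumed tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)       # share of the reference covered by the LCS
    precision = lcs / len(cand)   # share of the candidate covered by the LCS
    return (1 + beta**2) * recall * precision / (recall + beta**2 * precision)


if __name__ == "__main__":
    reference = "the wall was painted and two lights were installed"
    candidate = "the wall has been painted and lights were installed"
    print(round(rouge_l(candidate, reference), 3))  # prints 0.778
```

In the example, the LCS is "the wall painted and lights were installed" (7 of 9 tokens on each side), so precision, recall, and F all equal 7/9. Because ROUGE-L rewards in-order overlap without requiring the matched words to be contiguous, a rephrasing that keeps the key words in the same order still scores well.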
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (not authorized for public access) | 4.39 MB | Adobe PDF |
Unless their copyright terms are otherwise specified, all items in the system are protected by copyright, with all rights reserved.
