Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68006
Full metadata record
DC field | Value | Language
dc.contributor.advisor: 徐百輝 (Pai-Hui Hsu)
dc.contributor.author: Ya-Chu Tsao [en]
dc.contributor.author: 曹雅筑 [zh_TW]
dc.date.accessioned: 2021-06-17T02:11:09Z
dc.date.available: 2025-08-18
dc.date.copyright: 2020-08-24
dc.date.issued: 2020
dc.date.submitted: 2020-08-17
dc.identifier.citation:
Aanæs, H., Jensen, R. R., Vogiatzis, G., Tola, E., and Dahl, A. B., 2016. Large-Scale Data for Multiple-View Stereopsis, International Journal of Computer Vision, 120(2), pp. 153-168.
Ann, N. Q., Achmad, M. S. H., Bayuaji, L., Daud, M. R., and Pebrianti, D., 2016. Study on 3D Scene Reconstruction in Robot Navigation using Stereo Vision, Proceedings of 2016 IEEE International Conference on Automatic Control and Intelligent Systems, Selangor, Malaysia, pp. 72-77.
Bülthoff, I., Bülthoff, H., and Sinha, P., 1998. Top-down influences on stereoscopic depth-perception, Nature Neuroscience, 1(3), pp. 254-257.
Battiato, S., Capra, A., Curti, S., and Cascia, M. L., 2004. 3D stereoscopic image pairs by depth-map generation, 2nd International Symposium on 3D Data Processing, Visualization and Transmission, Thessaloniki, Greece, pp. 124-131.
Bay, H., Tuytelaars, T., and Van Gool, L., 2006. SURF: Speeded Up Robust Features, Proceedings of European conference on computer vision, Berlin, Germany, pp. 404-417.
Brown, M. Z., Burschka, D., and Hager, G. D., 2003. Advances in computational stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8), 993-1008.
Campen, M., Attene, M., and Kobbelt, L., 2012. A Practical Guide to Polygon Mesh Repairing, Eurographics(Tutorials).
Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., . . . Su, H., 2015. ShapeNet: An Information-Rich 3D Model Repository, arXiv preprint.
Choy, C. B., Xu, D., Gwak, J., Chen, K., and Savarese, S., 2016. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction, European conference on computer vision, pp. 628-644.
Cremers, D., Rousson, M., and Deriche, R., 2007. A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape, International Journal of Computer Vision, 72(2), pp. 195-215.
Dabbura, I., 2017. Gradient Descent Algorithm and Its Variants, Towards Data Science, URL: https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3 (last date accessed: 17 August 2020).
Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T., and Nießner, M., 2017. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828-5839.
Duchi, J., Hazan, E., and Singer, Y., 2011. Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12(7).
Furukawa, Y., and Hernández, C., 2015. Multi-View Stereo: A Tutorial, Foundations and Trends® in Computer Graphics and Vision, 9(1-2), pp. 1-148.
Furukawa, Y., and Ponce, J., 2010. Accurate, Dense, and Robust Multiview Stereopsis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8), 1362-1376.
Haralick, R. M., and Shapiro, L. G., 1999. Computer and Robot Vision, Addison-Wesley, Reading, MA.
Hecht-Nielsen, R., 1992. Theory of the backpropagation neural network, Neural networks for perception, Elsevier, pp. 65-93.
Hinton, G., 2012. Lecture 6d: a separate, adaptive learning rate for each connection. Slides of Lecture Neural Networks for Machine Learning.
Ide, H., and Kurita, T., 2017. Improvement of learning for CNN with ReLU activation by sparse regularization, Proceedings of 2017 IEEE International Joint Conference on Neural Networks (IJCNN), pp. 2684-2691.
Ji, M., Gall, J., Zheng, H., Liu, Y., and Fang, L., 2017. SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis, Proceedings of the IEEE International Conference on Computer Vision, pp. 2307-2315.
Kar, A., Häne, C., and Malik, J., 2017. Learning a multi-view stereo machine, Advances in neural information processing systems, pp. 365-376.
Kingma, D. P., and Ba, J., 2014. Adam: A Method for Stochastic Optimization, arXiv preprint.
Krizhevsky, A., Sutskever, I., and Hinton, G. E., 2012. Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, pp. 1097-1105.
Kuo, T.-Y., Lo, Y.-C., and Lin, C.-C., 2012. 2D-to-3D conversion for single-view image based on camera projection model and dark channel model, Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1433-1436.
Lau, M. M., and Lim, K. H., 2017. Investigation of activation functions in deep belief network, Proceedings of IEEE 2017 2nd International Conference on Control and Robotics Engineering (ICCRE), pp. 201-206.
LeCun, Y., Bengio, Y., and Hinton, G., 2015. Deep learning, Nature, 521(7553), pp. 436-444.
Linder, W., 2009. Digital photogrammetry, Springer.
Linsen, L., 2001. Point Cloud Representation, Technical Report, Fakultät für Informatik, Universität Karlsruhe.
Meerits, S., Nozick, V., and Saito, H., 2017. Real-time scene reconstruction and triangle mesh generation using multiple RGB-D cameras, Journal of Real-Time Image Processing, pp. 1-13.
Mortazi, A., Karim, R., Rhode, K., Burt, J., Bagci, U., 2017. CardiacNET: Segmentation of Left Atrium and Proximal Pulmonary Veins from MRI Using Multi-view CNN, Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 377-385.
Mitchell, T., Buchanan, B., DeJong, G., Dietterich, T., Rosenbloom, P., Waibel, A., 1990. Machine learning, Annual review of computer science, 4(1), pp. 417-433.
Murata, H., Mori, Y., Yamashita, S., Maenaka, A., Okada, S., Oyamada, K., and Kishimoto, S., 1998. 32.2: A Real‐Time 2‐D to 3‐D Image Conversion Technique Using Computed Image Depth, Proceedings of SID Symposium Digest of Technical Papers, 29(1), pp. 919-923.
Polyak, B. T., 1964. Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, 4(5), pp. 1-17.
Qin, C., You, H., Wang, L., Kuo, C.-C. J., and Fu, Y., 2019. PointDAN: A multi-scale 3D domain adaption network for point cloud representation, Proceedings of Advances in Neural Information Processing Systems, pp. 7192-7203.
Rawat, W., and Wang, Z., 2017. Deep convolutional neural networks for image classification: A comprehensive review, Neural Computation, 29(9), pp. 2352-2449.
Saxena, A., Chung, S. H., and Ng, A. Y., 2008. 3-D Depth Reconstruction from a Single Still Image, International Journal of Computer Vision, 76(1), pp. 53-69.
Scharstein, D., and Szeliski, R., 2002. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision, 47(1-3), pp. 7-42.
Schops, T., Schonberger, J. L., Galliani, S., Sattler, T., Schindler, K., Pollefeys, M., and Geiger, A., 2017. A multi-view stereo benchmark with high-resolution images and multi-camera videos, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3260-3269.
Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., and Szeliski, R., 2006. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms, Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 519-528.
Seitz, S. M., and Dyer, C. R., 1999. Photorealistic Scene Reconstruction by Voxel Coloring, International Journal of Computer Vision, 35(2), pp. 151-173.
Smith, L. N., 2017. Cyclical learning rates for training neural networks, Proceedings of 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464-472.
Snavely, N., Seitz, S. M., and Szeliski, R., 2008. Modeling the World from Internet Photo Collections, International Journal of Computer Vision, 80(2), pp. 189-210.
Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E., 2015. Multi-view convolutional neural networks for 3d shape recognition, Proceedings of 2015 IEEE international conference on computer vision, pp. 945-953.
Sutskever, I., Martens, J., Dahl, G., and Hinton, G., 2013. On the importance of initialization and momentum in deep learning, Proceedings of International conference on machine learning, pp. 1139-1147.
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J., 2015. 3D shapenets: A deep representation for volumetric shapes, Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1912-1920.
Yao, Y., Luo, Z., Li, S., Fang, T., and Quan, L., 2018. MVSNet: Depth inference for unstructured multi-view stereo, Proceedings of 2018 European Conference on Computer Vision (ECCV), pp. 767-783.
Yao, Y., Luo, Z., Li, S., Shen, T., Fang, T., Quan, L., 2019. Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5525-5534.
Yao, Y., Luo, Z., Li, S., Zhang, J., Ren, Y., Zhou, L., . . . Quan, L., 2020. BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1790-1799.
Yu, F., and Koltun, V., 2015. Multi-Scale Context Aggregation by Dilated Convolutions, arXiv preprint.
Ziegler, R., Matusik, W., Pfister, H., and McMillan, L., 2003. 3D reconstruction using labeled image regions, Eurographics Association, Aachen, Germany, pp. 248-259.
丁皓偉, 2014. Optimization of Semi-Global Matching Combined with Cross-Based Block Matching (in Chinese), Master's thesis, Graduate Institute of Civil Engineering, National Taiwan University, Taipei.
林子堯, 2012. A Depth Estimation Algorithm for Single-View Video (in Chinese), Master's thesis, Department of Electrical Engineering, National Taipei University of Technology, Taipei.
洪國隆, 2007. A Web-Based Virtual Reality Geographic Information System Built with Stereo Vision (in Chinese), Master's thesis, Graduate Institute of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei.
莊曜誠, 2013. 3D Object Localization and Tracking Based on 3D Modeling and Multi-View Image Acquisition (in Chinese), Master's thesis, Graduate Institute of Electrical Engineering, National Chung Cheng University, Chiayi.
曾義星, 1997. How Aerial Photogrammetry Moves into the Information Age (in Chinese), Journal of Photogrammetry and Remote Sensing, 2(1), pp. 103-112.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68006
dc.description.abstract [zh_TW]:
Two-dimensional images acquired from multiple viewpoints are widely used for 3D scene reconstruction. Using two or more images taken from different viewpoints, the human visual system is simulated: based on the principle of parallax, the positional offset between corresponding image points is obtained. Traditionally, 3D spatial information was measured manually by stereo plotting. The later introduction of computer vision and image processing accelerated the development of image matching, in which dense matching automatically finds conjugate points in a stereo pair pixel by pixel; forward intersection of the densely matched points then yields 3D object-space ground coordinates, producing high-density point clouds. However, the results rely on manual evaluation, point-cloud models are prone to noise from matching errors, and data loss in occluded areas and matching in homogeneous regions remain open problems. In recent years, many studies have introduced machine learning to learn directly from 2D images and experience, training prediction models that preserve features in homogeneous regions, learn the geometric relationship between occluded areas and the 3D model, output predicted 3D spatial information, and even self-assess the prediction success rate.
Today, 3D modeling is widely applied across many fields, and as its use grows, so does the demand for efficiency and accuracy. Many studies have achieved excellent results in reconstructing small-scale objects and scenes from 2D images, but reconstructing occluded areas remains difficult, and scaling up the reconstructed scene is still a challenge. This study therefore compares and analyzes multi-view stereo methods that use machine learning to reconstruct 3D models directly from multi-view 2D images, simplifying data processing, and examines the applicability of different methods to different scenes, in order to provide application recommendations and expected results for various scenarios.
dc.description.abstract [en]:
The 3D scene model is the basic data model in 3D GIS (Geographic Information Systems), used for 3D geo-visualization and scene analysis. Commonly, 3D scenes are reconstructed by means of LiDAR and photogrammetry; however, most of these methods are time-consuming and not fully automatic. How to reconstruct 3D scene models efficiently and automatically has therefore become an important research issue. This thesis proposes a 3D scene reconstruction method from multi-view stereo (MVS) images based on machine learning. Similar to the stereo pair in 3D vision, multi-view stereo mimics the human visual system (HVS) to acquire 3D information from multiple overlapping images. Because an object is observed from multiple views, the problem of occlusion can be overcome; however, the complex geometric relationships among multi-view stereo images also increase the difficulty of the computation.
To make 3D reconstruction more efficient and automatic, a method based on machine learning is introduced. Machine learning is a subset of artificial intelligence (AI) that gives systems the ability to learn automatically from data and improve from experience without much manual intervention. This study therefore uses the advantages of machine learning to extract and train useful features for reconstruction, mitigating the problems caused by occlusion. Based on multi-view stereo images and machine learning models, this study aims to reconstruct objects, and even whole scenes, directly, and to compare the applicability of different algorithms to different scenes, simplifying data processing and making the entire process more efficient or fully automated.
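The parallax principle the abstract describes — recovering depth from the positional offset of corresponding points in overlapping images — can be sketched minimally for the rectified two-view case. This is an illustrative sketch only, not code from the thesis; the focal length, baseline, and pixel coordinates below are made-up example values.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px     -- focal length in pixels
    baseline_m   -- distance between the two camera centers, in meters
    disparity_px -- offset of the corresponding point between views, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px

# Hypothetical example: a point imaged at x = 640 px in the left view and
# x = 600 px in the right view has disparity d = 40 px; with f = 1000 px
# and baseline B = 0.5 m, its depth is 1000 * 0.5 / 40 = 12.5 m.
z = depth_from_disparity(1000.0, 0.5, 640.0 - 600.0)
print(z)  # 12.5
```

Dense matching repeats this correspondence search for every pixel; the methods compared in the thesis (SurfaceNet, MVSNet, R-MVSNet) instead learn the matching cost from training data rather than computing it with a hand-crafted measure.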
dc.description.provenance [en]: Made available in DSpace on 2021-06-17T02:11:09Z (GMT). No. of bitstreams: 1
U0001-1708202017223000.pdf: 8727521 bytes, checksum: 940cf382a1fe511878538f0cbb2aa8d2 (MD5)
Previous issue date: 2020
dc.description.tableofcontents:
Acknowledgements i
Chinese Abstract ii
ABSTRACT iii
Table of Contents iv
List of Figures vi
List of Tables ix
Chapter 1 Introduction 1
1.1 Preface 1
1.2 Research Motivation and Objectives 2
1.3 Research Workflow 3
1.4 Thesis Organization 3
Chapter 2 Literature Review 4
2.1 Image-Based Modeling 4
2.1.1 Single View Reconstruction 4
2.1.2 Two-View and Multi-View Stereo Reconstruction 6
2.1.3 3D Model Representation 8
2.2 Machine Learning 12
2.2.1 Concepts of Deep Learning 12
2.2.2 Applying Deep Learning to Reconstruct 3D Models from 2D Images 12
2.2.3 Neural Network Research and Challenges for Multi-View Images 13
Chapter 3 Methodology 19
3.1 Image Data Preprocessing 19
3.2 Convolutional Neural Networks 20
3.2.1 Convolutional Layer 20
3.2.2 Pooling Layers 22
3.2.3 Fully Connected Layer 22
3.2.4 Model Training 26
3.3 Deep Learning Architecture for Multi-View Stereo I: SurfaceNet 30
3.3.1 Colored Voxel Cube (CVC) Conversion 31
3.3.2 Feature Extraction and CVC Reconstruction 31
3.4 Deep Learning Architecture for Multi-View Stereo II: MVSNet 33
3.5.1 Matching Cost Computation: Gated Recurrent Unit (GRU) 35
3.5.2 Depth Map Generation 36
Chapter 4 Experimental Results, Analysis and Discussion 37
4.1 Experimental Data 37
4.2 Data Preprocessing 38
4.2.1 Data Augmentation for Image Geometry 38
4.2.2 Data Augmentation for Image Radiometry 40
4.3 Experiment 1: DTU Small House Model Reconstruction 42
4.3.1 Discussion of SurfaceNet Results 43
4.3.2 Discussion of MVSNet and R-MVSNet Results 47
4.4 Experiment 2: ETH3D Indoor Storeroom Close-Range Reconstruction 52
4.5 Experiment 3: GL3D Outdoor Scene Reconstruction 55
4.6 Experiment 4: NTU Campus Buildings 58
4.7 Experimental Analysis and Discussion 61
4.7.1 Image Data Processing Before and During Training 62
4.7.2 Time Cost 62
4.7.3 Accuracy of Predicted Results 63
Chapter 5 Conclusions and Future Work 65
5.1 Conclusions 65
5.2 Future Work 67
References 68
dc.language.iso: zh-TW
dc.title: 以多視立體影像結合機器學習進行三維場景重建 [zh_TW]
dc.title: 3D Scene Reconstruction from Multi-View Stereo Images Using Machine Learning [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: Master
dc.contributor.oralexamcommittee: 邱式鴻 (Shih-Hong Chio), 林柏丞 (Bo-Cheng Lin)
dc.subject.keyword: 多視立體, 機器學習, 三維場景重建 [zh_TW]
dc.subject.keyword: Machine learning, Multi-view stereo, Scene reconstruction [en]
dc.relation.page: 72
dc.identifier.doi: 10.6342/NTU202003821
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-08-18
dc.contributor.author-college: 工學院 (College of Engineering) [zh_TW]
dc.contributor.author-dept: 土木工程學研究所 (Graduate Institute of Civil Engineering) [zh_TW]
Appears in Collections: Department of Civil Engineering

Files in This Item:
File | Size | Format
U0001-1708202017223000.pdf (currently not publicly accessible) | 8.52 MB | Adobe PDF

