NTU Theses and Dissertations Repository › 電機資訊學院 (College of Electrical Engineering and Computer Science) › 資訊工程學系 (Department of Computer Science and Information Engineering)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50454
Full metadata record
dc.contributor.advisor: 洪一平 (Yi-Ping Hung)
dc.contributor.author: Yu-Ta Chen [en]
dc.contributor.author: 陳禹達 [zh_TW]
dc.date.accessioned: 2021-06-15T12:41:25Z
dc.date.available: 2020-08-25
dc.date.copyright: 2020-08-25
dc.date.issued: 2020
dc.date.submitted: 2020-08-11
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50454
dc.description.abstract [zh_TW; translated]: This thesis proposes a deep-learning method for the online video stabilization problem. We build a multi-scale architecture that converts an unstable video into a stable one in real time, feeding the neural network only the current unstable frame and previously stabilized frames, without any future frames. In addition, our method generates a pixel-level warping map, which transforms each pixel more accurately than earlier methods that use a single homography or a grid of mesh transformations. We also propose a two-stage training scheme that makes the trained model more robust. Our experiments show that, compared with traditional methods, our method reduces distortion, and that it outperforms existing learning-based online stabilization methods; moreover, among state-of-the-art video stabilization methods, ours is currently the fastest.
dc.description.abstract [en]: In this thesis, a learning-based method is proposed to solve the online video stabilization problem. We build a multi-scale architecture that stabilizes unstable videos in real time by feeding the current unstable frame and historical stabilized frames to the neural network, without using any future frames. Our network estimates a pixel-based warping map, which makes the transformation of each pixel more precise than computing a single global homography or multiple grid homographies. Besides, a two-stage training method is proposed to train our network, which makes it more robust. Experimental results show that our algorithm achieves performance comparable to traditional methods and outperforms state-of-the-art learning-based online stabilization methods. Moreover, our approach has the highest processing speed among the state-of-the-art methods.
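The abstract's central idea — a per-pixel warping map rather than one global homography — can be illustrated with a small backward-warping sketch. This is a hypothetical toy in NumPy, not the thesis's network or code; the function name `warp_with_map` and the example maps are invented for illustration, and nearest-neighbor sampling stands in for the bilinear interpolation a real stabilizer would use.

```python
import numpy as np

def warp_with_map(frame, warp_map):
    """Backward-warp `frame` using a per-pixel warping map.

    warp_map[y, x] = (src_y, src_x): the source location each output
    pixel samples from. Nearest-neighbor sampling keeps the sketch
    short; a real stabilizer would interpolate bilinearly.
    """
    h, w = frame.shape[:2]
    src_y = np.clip(np.rint(warp_map[..., 0]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(warp_map[..., 1]), 0, w - 1).astype(int)
    return frame[src_y, src_x]

frame = np.arange(16, dtype=float).reshape(4, 4)
ys, xs = np.mgrid[0:4, 0:4].astype(float)

# A single global (here: purely translational) model must move every
# pixel by the same offset...
global_map = np.stack([ys, xs - 1.0], axis=-1)

# ...whereas a pixel-wise map can give each pixel its own offset,
# e.g. shifting only the bottom half of the frame.
pixel_map = np.stack([ys, xs - (ys >= 2)], axis=-1)

stabilized_global = warp_with_map(frame, global_map)
stabilized_pixelwise = warp_with_map(frame, pixel_map)
```

The pixel-wise map subsumes the global one (a homography is just a special case where every entry follows one parametric motion), which is why the per-pixel formulation can be more precise when scene depth or rolling shutter makes a single global transform a poor fit.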
dc.description.provenance: Made available in DSpace on 2021-06-15T12:41:25Z (GMT). No. of bitstreams: 1; U0001-1108202013123200.pdf: 1854450 bytes, checksum: 1e6c9efb665dc6746819f356d0488fed (MD5). Previous issue date: 2020 [en]
dc.description.tableofcontents: CONTENTS
口試委員會審定書 (Thesis Committee Certification) #
誌謝 (Acknowledgements) i
中文摘要 (Chinese Abstract) ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES vii
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Traditional Video Stabilization Methods 5
2.1.1 Offline Methods 5
2.1.2 Online Methods 6
2.2 Learning-Based Video Stabilization Methods 7
2.2.1 Offline Methods 7
2.2.2 Online Methods 8
Chapter 3 Proposed Method 9
3.1 Training Data and Pre-Processing 9
3.2 Pipeline 11
3.3 Network Architecture 12
3.4 Two-stage Training Method 13
3.5 Loss Function 14
3.5.1 Stability Loss 14
3.5.2 Shape Loss 15
3.5.3 Temporal Loss 16
3.6 Implementation Details 17
Chapter 4 Experiments 18
4.1 Evaluation Data 18
4.2 Computational Time Performance 18
4.3 Quantitative Evaluation 19
4.3.1 Ablation Study 21
4.3.2 Comparison with Offline Methods 22
4.3.3 Comparison with Online Methods 23
4.4 Limitations 24
Chapter 5 Conclusions 25
Chapter 6 Future Works 26
REFERENCES 27
dc.language.iso: en
dc.subject: 深度學習 [zh_TW]
dc.subject: 線上影片穩定 [zh_TW]
dc.subject: 多尺度架構 [zh_TW]
dc.subject: 像素級轉換 [zh_TW]
dc.subject: deep learning [en]
dc.subject: pixel-based warping [en]
dc.subject: multi-scale architecture [en]
dc.subject: online video stabilization [en]
dc.title: 基於像素級轉換的多尺度深度線上影片穩定 [zh_TW]
dc.title: Multi-Scale Deep Online Video Stabilization with Pixel-Based Warping [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 歐陽明 (Ming Ouhyoung), 莊仁輝 (Jen-Hui Chuang), 李明穗 (Ming-Sui Lee), 郭景明 (Jing-Ming Guo)
dc.subject.keyword: 線上影片穩定, 深度學習, 多尺度架構, 像素級轉換 [zh_TW]
dc.subject.keyword: online video stabilization, deep learning, multi-scale architecture, pixel-based warping [en]
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU202002926
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-08-11
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File | Size | Format
U0001-1108202013123200.pdf (Restricted Access) | 1.81 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
