NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71493
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 陳良基 (Liang-Gee Chen)
dc.contributor.author: Keng-Chi Liu [en]
dc.contributor.author: 劉庚錡 [zh_TW]
dc.date.accessioned: 2021-06-17T06:01:47Z
dc.date.available: 2019-02-12
dc.date.copyright: 2019-02-12
dc.date.issued: 2019
dc.date.submitted: 2019-01-31
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71493
dc.description.abstract [zh_TW]: Developing devices that can perform human-like behaviors based on visual perception is a goal of artificial intelligence, and pixel-level visual cues such as scene parsing are beneficial to such applications. In recent years these tasks have advanced substantially thanks to deep learning; nevertheless, efficiency remains a major issue, where "efficiency" refers to both data collection and computational resource requirements.
Although supervised methods produce remarkable results, they depend on large-scale pixel-level annotations that are time-consuming and expensive to obtain, so reducing this heavy manual effort is a key issue during training. Synthetic data and weakly supervised methods have been proposed to overcome this challenge; unfortunately, the former suffers from severe domain shift while the latter lacks accurate object-boundary information, and most existing weakly supervised work can handle only salient foreground "things". To address this, we propose an auxiliary teacher-student learning framework that trains this hard-to-transfer task using information constructed by adapting auxiliary cues with lower domain discrepancy (e.g., depth) and domain-specific weak annotations (e.g., real appearance); this imperfect information is then integrated effectively through a two-stage voting mechanism we develop.
From the inference-stage perspective, the demand for computational resources has always been the dominant concern: a typical neural network requires large run-time memory and 32-bit floating-point computation. Moreover, unlike classification networks that output only a few categories, the output here must correspond to the input in both dimension and position, which consumes more resources and may not be amenable to the optimization methods in the existing literature, most of which still target classification networks.
In this thesis, considering the practicality and necessity of real-world applications, we aim to design an efficient scene parsing algorithm that jointly accounts for annotation demand, computational complexity, and performance. First, by introducing min-max normalization into the loss function, depth information reduces the domain discrepancy of indoor scenes; in addition, we achieve unsupervised sensor depth-map restoration with a real-to-synthetic reconstruction generator. Second, we propose a scene parsing framework based on auxiliary teacher-student learning with depth adaptation and domain-specific weak supervision, and we train the network with a loss function built on a two-stage integration mechanism to produce more accurate results. The proposed method outperforms the state-of-the-art adaptation method by 14.63% in mean Intersection over Union (mIoU). Finally, we introduce a quantization method for the efficient scene parsing framework that reduces the model size by a factor of 21.9 and the activation size by a factor of 8.2 with only a 1.8% loss in mIoU.
dc.description.abstract [en]: Developing autonomous mobile agents that can perform human-like behaviors based on their visual perception is a goal in the field of artificial intelligence, and pixel-wise visual cues such as scene parsing are beneficial to such high-level applications. Significant improvements in these tasks have been made in recent years owing to the evolution of deep learning. Nevertheless, in addition to accuracy, efficiency remains a major issue. The term "efficiency" here refers to both data collection and computational complexity.
Remarkable scene parsing results from supervised methods rely on numerous pixel-level annotations, which are time-consuming and expensive to obtain. Alleviating this cumbersome manual effort is therefore a crucial issue in the training procedure. Synthetic rendered data and weakly supervised methods have been explored to overcome this challenge; unfortunately, the former suffers from severe domain shift and the latter provides only imprecise information. Moreover, the majority of existing weak-supervision research can handle only salient foreground "things". To address these issues, we employ an auxiliary teacher-student learning framework that trains this hard-to-transfer task with pseudo ground truths constructed by adapting auxiliary cues with lower domain discrepancy (e.g., depth) and leveraging domain-specific information (e.g., real appearance) in weak form. This imperfect information is then integrated effectively by a two-stage voting mechanism we develop.
From the inference-phase perspective, computational complexity has always been the main issue for edge computing. A typical network requires large run-time memory and 32-bit floating-point computation. Furthermore, unlike general classification networks with only a few category outputs, the hourglass network produces an output of the same size and dimension as the input, which costs more resources. Most previous research, however, has focused on classification networks.
In this thesis, considering the practicality and necessity of real-world applications, our goal is to develop an "efficient" scene parsing algorithm with a focus on three objectives: labeling effort, complexity, and performance. First, it is shown that, by introducing min-max normalization into the loss function, depth reduces the domain discrepancy of indoor scenes. Additionally, we argue that the generator for real-to-sim reconstruction is capable of performing unsupervised sensor depth-map restoration. Second, a scene parsing framework is proposed that performs auxiliary teacher-student learning with depth adaptation as well as domain-specific weak-supervision information. We train a network with a loss function that penalizes predictions disagreeing with the highly confident pseudo ground truths provided by a two-stage integration mechanism, so as to produce more accurate segmentations. The proposed method outperforms the state-of-the-art adaptation method by 14.63% in terms of mean Intersection over Union (mIoU). Lastly, we extend an existing method to quantize the target lightweight scene parsing network into ternary weights and low bit-width activations (3-4 bits), reducing the model size by a factor of 21.9 and the activation size by a factor of 8.2 with only a 1.8% mIoU loss.
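The abstract above describes constructing pseudo ground truths by fusing a depth-adapted teacher's predictions with a domain-specific weak cue through a two-stage voting mechanism. The Python sketch below illustrates one plausible form of such an integration; the function names, the 0.8 confidence threshold, and the placement of the min-max normalization on the depth input are illustrative assumptions, not details taken from the thesis.

import numpy as np

IGNORE = 255  # label value excluded from the student's training loss

def min_max_normalize(depth):
    """Illustrative min-max normalization of a depth map to [0, 1]."""
    d_min, d_max = depth.min(), depth.max()
    return (depth - d_min) / max(d_max - d_min, 1e-8)

def two_stage_vote(teacher_prob, weak_prob, tau_conf=0.8):
    """Two-stage integration of two imperfect per-pixel cues.

    teacher_prob, weak_prob: arrays of shape (C, H, W) holding class
    probabilities from a depth-adapted teacher and a domain-specific
    weak-localization cue.
    Stage 1 keeps pixels where both cues vote for the same class;
    stage 2 keeps only the highly confident ones. Everything else is
    marked IGNORE so it does not penalize the student network.
    """
    teacher_label = teacher_prob.argmax(axis=0)
    weak_label = weak_prob.argmax(axis=0)

    agree = teacher_label == weak_label                               # stage 1: agreement vote
    fused_conf = 0.5 * (teacher_prob.max(axis=0) + weak_prob.max(axis=0))
    confident = fused_conf >= tau_conf                                # stage 2: confidence filter

    pseudo_gt = np.full(teacher_label.shape, IGNORE, dtype=np.int64)
    keep = agree & confident
    pseudo_gt[keep] = teacher_label[keep]
    return pseudo_gt

A student network could then be trained with a cross-entropy loss whose ignore index is set to IGNORE, so that only confident, agreed-upon pixels contribute to the penalty, in line with the abstract's description of penalizing disagreement with highly confident pseudo ground truths.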
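The reported 14.63% gain is measured in mean Intersection over Union (mIoU), the standard scene parsing metric. Below is a minimal NumPy sketch of mIoU over dense label maps; the ignore-label convention and per-class averaging follow common practice and are assumptions rather than the thesis's exact evaluation code.

import numpy as np

def mean_iou(pred, gt, num_classes, ignore_label=255):
    """Mean Intersection over Union over classes present in prediction or ground truth."""
    valid = gt != ignore_label
    ious = []
    for c in range(num_classes):
        p, g = (pred == c) & valid, (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0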
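The final contribution quantizes the lightweight network to ternary weights and 3-4 bit activations. The sketch below shows threshold-based weight ternarization in the spirit of ternary weight networks together with a uniform k-bit activation quantizer; the 0.7 threshold factor and the [0, 1] clipping range are common choices from the literature and are assumptions here, not the thesis's exact scheme.

import numpy as np

def ternarize_weights(w):
    """Map weights to {-alpha, 0, +alpha} with a mean-magnitude threshold."""
    delta = 0.7 * np.mean(np.abs(w))            # threshold: fraction of mean |w|
    mask = np.abs(w) > delta                    # weights that survive as +/- alpha
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

def quantize_activations(a, bits=4):
    """Uniform k-bit quantization of activations clipped to [0, 1]."""
    a = np.clip(a, 0.0, 1.0)
    levels = 2 ** bits - 1
    return np.round(a * levels) / levels

Storing ternary weights needs only about two bits per value plus one scale per layer, and 3-4 bit activations shrink run-time feature maps accordingly, which is roughly where size reductions of this kind come from.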
dc.description.provenance [en]: Made available in DSpace on 2021-06-17T06:01:47Z (GMT). No. of bitstreams: 1. ntu-108-R05943002-1.pdf: 43584334 bytes, checksum: 3d74d3cd31a1b4af891609dac7faa0da (MD5). Previous issue date: 2019.
dc.description.tableofcontents:
Abstract
1 Introduction
1.1 Introduction
1.2 Design Consideration
1.3 Main Contributions
1.4 Thesis Organization
2 Background
2.1 Overview
2.2 Accuracy-oriented Advancement
2.3 Annotation Effort Alleviation
2.3.1 Weak Supervision
2.3.2 Unsupervised Domain Adaptation
2.3.3 Summary
2.4 Computational Complexity Reduction
2.4.1 Network Structure
2.4.2 Hardware-Friendly Methods for Classification
3 The Proposed Scene Parsing Framework
3.1 Overview
3.2 Proposed Scene Parsing Algorithm
3.2.1 Depth-aware Adaptation
3.2.2 Domain-specific Weak Localization
3.2.3 Mechanism for Cues Integration
3.2.4 Training of Student Network
3.3 Experiment
3.3.1 Dataset
3.3.2 Implementation Detail
3.3.3 Evaluation Metrics
3.3.4 Ablation Study
3.3.5 Result
3.3.6 Discussion
3.4 Unsupervised Depth Restoration via Adaptation and RANSAC Scale Recovering
3.4.1 Scale Recovering
3.4.2 Dataset
3.4.3 Evaluation Metrics
3.4.4 Ablation Study
3.4.5 Result
3.5 Summary
4 Hardware Oriented Design and Analysis
4.1 Overview
4.2 Network Structure
4.3 Related Quantization Methods
4.3.1 Low bit-width Quantization
4.3.2 Binarization
4.3.3 Ternarization
4.4 Proposed Quantization Method
4.5 Experiment
4.6 Bandwidth Issue Discussion
4.7 Summary
5 Conclusion
Bibliography
dc.language.iso: en
dc.subject: 場景解析 (Scene parsing) [zh_TW]
dc.subject: 自適應 (Adaptation) [zh_TW]
dc.subject: 弱監督 (Weak supervision) [zh_TW]
dc.subject: 領域差異 (Domain discrepancy) [zh_TW]
dc.subject: 效率 (Efficiency) [zh_TW]
dc.subject: Domain discrepancy [en]
dc.subject: Adaptation [en]
dc.subject: Weak supervision [en]
dc.subject: Scene Parsing [en]
dc.subject: Efficiency [en]
dc.title: 以低差異領域自適應及特定領域弱注釋實現高效能室內場景解析 [zh_TW]
dc.title: Low Discrepancy Adaptation with Weak Domain-specific Annotations for Efficient Indoor Scene Parsing [en]
dc.type: Thesis
dc.date.schoolyear: 107-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 簡韶逸 (Shao-Yi Chien), 楊佳玲 (Chia-Lin Yang), 徐宏民 (Hung-Min Hsu)
dc.subject.keyword: 場景解析, 自適應, 弱監督, 領域差異, 效率 [zh_TW]
dc.subject.keyword: Scene Parsing, Adaptation, Weak supervision, Domain discrepancy, Efficiency [en]
dc.relation.page: 100
dc.identifier.doi: 10.6342/NTU201900349
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2019-01-31
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電子工程學研究所 (Graduate Institute of Electronics Engineering) [zh_TW]
Appears in collections: 電子工程學研究所 (Graduate Institute of Electronics Engineering)

Files in this item:
File | Size | Format
ntu-108-1.pdf (not authorized for public access) | 42.56 MB | Adobe PDF