Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60916
Full metadata record
DC Field / Value / Language
dc.contributor.advisor: 吳沛遠 (Pei-Yuan Wu)
dc.contributor.author: Po-Min Hsu (en)
dc.contributor.author: 許博閔 (zh_TW)
dc.date.accessioned: 2021-06-16T10:36:15Z
dc.date.available: 2020-08-21
dc.date.copyright: 2020-08-21
dc.date.issued: 2020
dc.date.submitted: 2020-08-07
dc.identifier.citation: J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
A. Bellet, A. Habrard, and M. Sebban. A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709, 2013.
I. Bello, B. Zoph, A. Vaswani, J. Shlens, and Q. V. Le. Attention augmented convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 3286–3295, 2019.
S. K. Biswas and P. Milanfar. One shot detection with laplacian object and fast matrix cosine similarity. IEEE transactions on pattern analysis and machine intelligence, 38(3):546–562, 2015.
Q. Cai, Y. Pan, T. Yao, C. Yan, and T. Mei. Memory matching networks for one-shot image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4080–4088, 2018.
Z. Cai and N. Vasconcelos. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6154–6162, 2018.
Y. Cao, J. Xu, S. Lin, F. Wei, and H. Hu. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.
H. Chen, Y. Wang, G. Wang, X. Bai, and Y. Qiao. Progressive object transfer detection. IEEE Transactions on Image Processing, 29:986–1000, 2019.
H. Chen, Y. Wang, G. Wang, and Y. Qiao. Lstd: A low-shot transfer detector for object detection. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 764–773, 2017.
M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes challenge: A retrospective. International journal of computer vision, 111(1):98–136, 2015.
M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2): 303–338, 2010.
Q. Fan, W. Zhuo, and Y.-W. Tai. Few-shot object detection with attention-rpn and multi-relation detector. arXiv preprint arXiv:1908.01998, 2019.
K. Fu, T. Zhang, Y. Zhang, and X. Sun. Oscd: A one-shot conditional object detection framework. Neurocomputing, 2020.
R. Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015.
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.
K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
E. Hoffer and N. Ailon. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, pages 84–92. Springer, 2015.
T.-I. Hsieh, Y.-C. Lo, H.-T. Chen, and T.-L. Liu. One-shot object detection with co-attention and co-excitation. In Advances in Neural Information Processing Systems, pages 2721–2730, 2019.
B. Kang, Z. Liu, X. Wang, F. Yu, J. Feng, and T. Darrell. Few-shot object detection via feature reweighting. In Proceedings of the IEEE International Conference on Computer Vision, pages 8420–8429, 2019.
L. Karlinsky, J. Shtok, S. Harary, E. Schwartz, A. Aides, R. Feris, R. Giryes, and A. M. Bronstein. Repmet: Representative-based metric learning for classification and few-shot object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2019.
G. Koch, R. Zemel, and R. Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML deep learning workshop, volume 2. Lille, 2015.
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017.
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016.
L. v. d. Maaten and G. Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.
C. Michaelis, I. Ustyuzhaninov, M. Bethge, and A. S. Ecker. One-shot instance segmentation. arXiv preprint arXiv:1811.11507, 2018.
S. Na and R. Yan. A new learning-based one shot detection framework for natural images. In International Conference on Artificial Neural Networks, pages 93–104. Springer, 2019.
A. Osokin, D. Sumin, and V. Lomakin. Os2d: One-stage one-shot object detection by matching anchor features. arXiv preprint arXiv:2003.06800, 2020.
N. O’Mahony, S. Campbell, A. Carvalho, S. Harapanahalli, G. V. Hernandez, L. Krpalkova, D. Riordan, and J. Walsh. Deep learning vs. traditional computer vision. In Science and Information Conference, pages 128–144. Springer, 2019.
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems, pages 8026–8037, 2019.
S. Ravi and H. Larochelle. Optimization as a model for few-shot learning. In ICLR, 2017.
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
J. Redmon and A. Farhadi. Yolo9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7263–7271, 2017.
J. Redmon and A. Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
H. J. Seo and P. Milanfar. Training-free, generic object detection using locally adaptive regression kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1688–1704, 2009.
B. Singh, M. Najibi, and L. S. Davis. Sniper: Efficient multi-scale training. In Advances in neural information processing systems, pages 9310–9320, 2018.
J. Snell, K. Swersky, and R. Zemel. Prototypical networks for few-shot learning. In Advances in neural information processing systems, pages 4077–4087, 2017.
F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1199–1208, 2018.
H. Takeda, S. Farsiu, and P. Milanfar. Kernel regression for image processing and reconstruction. IEEE Transactions on image processing, 16(2):349–366, 2007.
O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. In Advances in neural information processing systems, pages 3630– 3638, 2016.
O. Vinyals, M. Fortunato, and N. Jaitly. Pointer networks. In Advances in neural information processing systems, pages 2692–2700, 2015.
X. Wang, R. Girshick, A. Gupta, and K. He. Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7794–7803, 2018.
X. Wang, T. E. Huang, T. Darrell, J. E. Gonzalez, and F. Yu. Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957, 2020.
Y.-X. Wang, D. Ramanan, and M. Hebert. Meta-learning to detect rare objects. In Proceedings of the IEEE International Conference on Computer Vision, pages 9925– 9934, 2019.
C.-Y. Wu, R. Manmatha, A. J. Smola, and P. Krahenbuhl. Sampling matters in deep embedding learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2840–2848, 2017.
Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020.
X. Yan, Z. Chen, A. Xu, X. Wang, X. Liang, and L. Lin. Meta r-cnn: Towards general solver for instance-level low-shot learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 9577–9586, 2019.
Z. Zou, Z. Shi, Y. Guo, and J. Ye. Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055, 2019.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60916
dc.description.abstract (zh_TW): 在本文中,我們提出了ODVA用於單樣本物件檢測,其中要檢測的目標類別可以是訓練數據集中沒見過的。我們的ODVA使用可見類中的圖像進行訓練,而在推論階段中,ODVA會在查詢圖像中檢測與給定支持圖像匹配的對象,包含見過或沒見過的類別且無需進行任何模型微調。借助空間和通道注意力,對支持圖像中的可區分特徵進行編碼,並估算查詢圖像和支持圖像之間的相似度。從中,基於餘量的損失函數旨在指導ODVA學習針對沒見過類別的合適度量方法。對VOC和MS-COCO數據集的實驗評估表明,與其他最新的單樣本和元學習文獻相比,本文提出的ODVA是有效的。此外,為了支持可解釋性,我們將RPN提案區域和注意力向量可視化,並通過消融研究證明ODVA中每個模塊的有效性。
dc.description.abstract (en): In this thesis, we propose ODVA for one-shot object detection, in which the object to be detected can be unseen in the training dataset. ODVA is trained on images from the seen classes; in the inference phase, it detects objects in the query image that match a given support image containing a seen or unseen class, without any fine-tuning. With the help of spatial and channel attention, distinguishable features in the support image are encoded and the similarity between the query and support images is estimated. From this similarity, a margin-based loss is designed to guide ODVA toward learning an appropriate metric for the unseen classes. Experimental evaluations on both the VOC and MS-COCO datasets show the effectiveness of the proposed ODVA compared to other state-of-the-art one-shot and meta-learning works. In addition, to favor interpretability, we visualize the RPN proposals and attention vectors, and demonstrate the effectiveness of each module in ODVA through an ablation study.
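The margin-based metric idea in the abstract can be illustrated with a minimal toy sketch in plain Python. This is not the thesis implementation (the actual loss is defined in Section 3.5 of the thesis); the hinge form and the `margin` value here are illustrative assumptions. It shows how matching query/support feature pairs are pulled toward high similarity while non-matching pairs are penalized only when their similarity exceeds a margin:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def margin_loss(query_feat, support_feat, is_match, margin=0.5):
    """Toy hinge-style margin loss on query/support similarity.

    Matching pairs are pushed toward similarity 1; non-matching pairs
    incur a loss only when their similarity exceeds `margin`
    (a hypothetical value, not the thesis's setting).
    """
    s = cosine_similarity(query_feat, support_feat)
    if is_match:
        return max(0.0, 1.0 - s)
    return max(0.0, s - margin)


# Identical features, matching pair: zero loss.
print(margin_loss([1.0, 0.0], [1.0, 0.0], True))   # 0.0
# Orthogonal features, non-matching pair: below margin, zero loss.
print(margin_loss([1.0, 0.0], [0.0, 1.0], False))  # 0.0
```

In the thesis itself, the features compared this way are attention-weighted region and support embeddings rather than raw vectors, but the pull-together/push-apart structure of metric learning is the same.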
dc.description.provenance (en): Made available in DSpace on 2021-06-16T10:36:15Z (GMT). No. of bitstreams: 1. U0001-0107202022242500.pdf: 15380021 bytes, checksum: 8767200cf862d1fd03cbeee52966b9f8 (MD5). Previous issue date: 2020.
dc.description.tableofcontents:
Acknowledgements i
摘要 ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
Chapter 2 Related work 5
Chapter 3 Proposed method 8
3.1 Problem definition 8
3.2 Model formulation 9
3.3 Spatial attention module 10
3.4 Channel attention module 12
3.5 Loss function 13
Chapter 4 Experiments 16
4.1 Datasets 16
4.2 Implementation detail 17
4.3 Existing benchmarks 18
4.4 Revised evaluation protocol 19
4.5 Cross domain detection 21
Chapter 5 Discussions 23
5.1 Visualizing RPN proposals 23
5.2 Visualizing the characteristics of channel attention 23
5.3 Performance over multiple runs 25
5.4 Ablation study 27
5.5 Pooling method and fusion method 30
5.6 Number of parameters and running time 31
Chapter 6 Conclusion 32
References 34
dc.language.iso: en
dc.title: 單樣本物件檢測藉由多功能注意力機制 (zh_TW)
dc.title: One-Shot Object Detection Using Versatile Attentions (en)
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 丁建均 (Jian-Jyun Ding), 王鈺強 (Yu-Ciang Wang), 林昌鴻 (Chang-Hong Lin)
dc.subject.keyword: 物件檢測, 深度學習, 單樣本學習, 注意力模型, 度量學習 (zh_TW)
dc.subject.keyword: Object detection, deep learning, one-shot learning, attention model, metric learning (en)
dc.relation.page: 40
dc.identifier.doi: 10.6342/NTU202001254
dc.rights.note: 有償授權 (licensed access, for a fee)
dc.date.accepted: 2020-08-10
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) (zh_TW)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
U0001-0107202022242500.pdf, 15.02 MB, Adobe PDF (currently not available for public access)