Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55384

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 徐宏民 | |
| dc.contributor.author | Yen-Liang Lin | en |
| dc.contributor.author | 林彥良 | zh_TW |
| dc.date.accessioned | 2021-06-16T03:59:38Z | - |
| dc.date.available | 2016-11-26 | |
| dc.date.copyright | 2014-11-26 | |
| dc.date.issued | 2014 | |
| dc.date.submitted | 2014-11-20 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55384 | - |
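| dc.description.abstract | Image classification and retrieval are key techniques for analyzing large-scale image and video data. Although a considerable body of research has addressed this topic, image retrieval under large viewpoint changes, fine-grained image classification, and video summarization on wearable cameras remain very challenging. In this dissertation, we effectively address four important recent topics in computer vision and multimedia analysis.
The first topic is fine-grained image classification. Unlike conventional image classification, which mainly relies on the presence or absence of parts, fine-grained image classification tries to find subtle differences between parts in order to distinguish objects. For this problem, we propose a method that jointly optimizes 3D model fitting and fine-grained image classification. Detailed 3D models provide more information than traditional 2D methods and can therefore improve fine-grained classification performance. Meanwhile, the predicted class label can also improve the accuracy of 3D model fitting, for example by providing a more accurate initial shape model. We evaluate the proposed method on a newly built fine-grained car image dataset and show that it outperforms other state-of-the-art methods. In addition, we conduct a series of analyses to explore the relationship between fine-grained image classification and 3D models. The second topic is an attribute-based vehicle retrieval system. We rectify car images taken from different angles to the same viewpoint and extract high-level semantic attributes, such as the style of the front of the car, the lamps, and the tires, and use these attributes to search for vehicles in the database. In the experiments, we compare different 3D model fitting methods and verify that rectified attributes improve retrieval performance. The results show that our method significantly outperforms previous content-based image retrieval methods. The third topic is a sketch-based multi-view image retrieval system. We automatically reconstruct the 2D sketches drawn by the user into an approximate 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve retrieval performance. To learn the weights of the synthesized sketches, we propose a new multi-dimensional query feature representation that describes the similarity between query sketches and database images, and cast the problem as a convex optimization problem. Experiments show that our method outperforms the current state-of-the-art method on a public image dataset. The last topic is video summarization on wearable cameras. We propose a joint optimization approach that efficiently produces summaries of important video content without requiring the user to manually specify the video category. Our method simultaneously detects important video frames and predicts the class label, generating summaries immediately. After observing enough video, it can accurately infer the target class label at an early stage and then use only the class-specific model to detect important frames, which saves computation. On a public dataset, our method performs comparably to the current state-of-the-art methods, which require the class label of the video to be given manually beforehand. Early class prediction significantly reduces the computational cost while maintaining the original performance, demonstrating the effectiveness of our method. | zh_TW |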
| dc.description.abstract | Image classification and retrieval are key techniques for managing exponentially growing image and video collections, e.g., consumer photos, surveillance videos, and egocentric videos. It is still very challenging to retrieve objects under large pose transformations, classify objects with subtle differences, and extract a brief summary of unconstrained egocentric videos. In this dissertation, we aim to leverage 3D representations to improve image retrieval and classification performance, and to generate compact and informative highlights for egocentric videos. We investigate four important and emerging topics in the computer vision and multimedia communities.
The first one is fine-grained classification. Different from conventional basic-level classification, which relies on the presence or absence of parts, fine-grained classification (i.e., subordinate-level categorization) finds salient distinctions between part/landmark-level characteristics of objects. We develop an approach that jointly optimizes 3D model fitting and fine-grained classification. Detailed 3D object representations encode more information (e.g., precise part locations and viewpoint) than traditional 2D-based approaches and can therefore improve fine-grained classification performance. Meanwhile, the predicted class label can also improve 3D model fitting accuracy, e.g., by providing more detailed class-specific shape models. We evaluate our method on a new fine-grained 3D car dataset (FG3DCar), demonstrating that it outperforms several state-of-the-art approaches. Furthermore, we conduct a series of analyses to explore the dependence between fine-grained classification performance and 3D models. The second one is attribute-based car retrieval in unconstrained environments. We rectify car images from disparate views into the same reference view and search for cars based on informative attributes (i.e., parts) such as grille, lamp, and wheel, using the fitted 3D models. In the experiments, we compare different 3D model fitting approaches and verify the significant impact of part rectification on car retrieval performance. The experimental results on car retrieval demonstrate that our approach significantly outperforms previous content-based image retrieval (CBIR) methods. The third one is sketch-based multi-view image retrieval. We automatically convert two (guided) 2D sketches into an approximated 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve the retrieval performance. To learn the weights among synthesized views (sub-queries), we present a new multi-query feature to model the similarity between sub-queries and dataset images, and formulate the weight learning as a convex optimization problem. Our approach shows superior performance compared with the state-of-the-art approach on a public multi-view image dataset. Moreover, we conduct sensitivity tests to analyze the parameters of our approach based on the gathered user sketches. The last one is video summarization on egocentric cameras. We propose a joint approach that can efficiently generate compact and informative summaries without requiring the class label to be given in advance. Our approach simultaneously detects video highlights and estimates the class labels, and generates summaries immediately without watching the whole video sequence. After observing enough video, we correctly infer the target class label early and use only the class-specific model to summarize video highlights, which saves computational cost. Experimental results on a public egocentric dataset show that our method is very competitive with the state-of-the-art methods that require class labels to be known during testing. Moreover, the early class prediction aspect of our method can significantly reduce the computational cost while retaining the original performance, demonstrating the efficiency and effectiveness of our method for video highlighting. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T03:59:38Z (GMT). No. of bitstreams: 1 ntu-103-D98944010-1.pdf: 63291691 bytes, checksum: 93bdbdd9e5b73a07cacf6efb5e938e91 (MD5) Previous issue date: 2014 | en |
| dc.description.tableofcontents | Front matter: Oral Examination Committee Certification; Acknowledgments; Chinese Abstract; Abstract; Contents; Publications; List of Figures; List of Tables
1 Introduction: Fine-Grained Classification; Attribute-based Car Retrieval; Sketch-based Multi-View Object Retrieval; Video Summarization on Egocentric Cameras; Thesis Overview
2 3D Deformable Car Model Construction: 2.0.6 3D CAD Model Dataset; 2.0.7 3D Deformable Car Model; 2.0.8 Advantages of Using 3D Models
3 Fine-Grained 3D Car Dataset
4 Jointly Optimizing 3D Model Fitting and Fine-Grained Classification: 4.1 Introduction; 4.2 Related Work; 4.3 Approach Overview; 4.4 Find 2D Part Locations; 4.5 Regression Model for Landmark Estimation; 4.6 Fitting 3D Model to 2D Image; 4.7 Feature Representation for Classification; 4.8 Experiments (4.8.1 Fine-Grained 3D Car Dataset; 4.8.2 Baselines; 4.8.3 Fine-Grained Classification Results; 4.8.4 3D Model Fitting Results); 4.9 Conclusions
5 Attribute-based Multi-View Car Retrieval: 5.1 Introduction; 5.2 Related Work; 5.3 Approach Overview; 5.4 Edge-Based 3D Model Fitting Approach (5.4.1 3D Model Fitting Method; 5.4.2 Model Fitting with Part Information); 5.5 Part Rectification; 5.6 Experiments (5.6.1 Car Retrieval Performance; 5.6.2 Model Fitting Comparison); 5.7 Discussions (5.7.1 Visual Word Vocabulary; 5.7.2 Part Weight); 5.8 Conclusions
6 3D Sub-Query Expansion for Improving Sketch-based Multi-View Image Retrieval: 6.1 Introduction; 6.2 Related Work (6.2.1 Sketch-based Image Retrieval; 6.2.2 3D Model Reconstruction From Line Drawings); 6.3 Proposed Approach (6.3.1 3D Sketch Model Reconstruction; 6.3.2 View Synthesis as Sketch Sub-Queries; 6.3.3 Bag-of-Visual-Word Model; 6.3.4 Fusion Function); 6.4 Experiments (6.4.1 Dataset and Query Sketches; 6.4.2 Retrieval Performance; 6.4.3 Sensitivity Test); 6.5 Conclusions
7 Egocentric Video Summarization: 7.1 Introduction; 7.2 Related Work; 7.3 Method Overview; 7.4 Joint Optimization with Structured SVM; 7.5 Online Highlight Detection and Early Class Prediction; 7.6 Experiments (7.6.1 Dataset and Evaluation Criteria; 7.6.2 Feature Representation; 7.6.3 Highlight Detection Results; 7.6.4 Class Prediction Results; 7.6.5 Computational Cost; 7.6.6 Conclusions)
8 Conclusions and Future Work: 8.0.7 Conclusions; 8.0.8 Future Work
Bibliography | |
| dc.language.iso | zh-TW | |
| dc.subject | wearable camera | zh_TW |
| dc.subject | fine-grained image classification | zh_TW |
| dc.subject | video summarization | zh_TW |
| dc.subject | 3D model reconstruction | zh_TW |
| dc.subject | egocentric video summarization | en |
| dc.subject | attribute-based image retrieval | en |
| dc.subject | sketch-based image retrieval | en |
| dc.subject | 3D model fitting | en |
| dc.subject | 3D deformable model reconstruction | en |
| dc.subject | fine-grained classification | en |
| dc.title | 利用三維模型提升影像搜尋和分類效能 | zh_TW |
| dc.title | Augmenting Image Retrieval and Classification Using 3D Models | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 103-1 | |
| dc.description.degree | Doctoral | |
| dc.contributor.oralexamcommittee | 莊仁輝,廖弘源,曾新穆,王蒞君,歐陽明 | |
| dc.subject.keyword | fine-grained image classification, 3D model reconstruction, video summarization, wearable camera | zh_TW |
| dc.subject.keyword | fine-grained classification, attribute-based image retrieval, sketch-based image retrieval, 3D model fitting, 3D deformable model reconstruction, egocentric video summarization | en |
| dc.relation.page | 109 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2014-11-21 | |
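| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | zh_TW |
| Appears in Collections: | Graduate Institute of Networking and Multimedia | |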
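Files in this item:

| File | Size | Format | |
|---|---|---|---|
| ntu-103-1.pdf (not authorized for public access) | 61.81 MB | Adobe PDF | |

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.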
