Mobile Video Search and Recognition

Chun-Yen Yeh; 葉俊言

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56309

Full metadata record

DC field	Value	Language
dc.contributor.advisor	徐宏民(Winston H. Hsu)
dc.contributor.author	Chun-Yen Yeh	en
dc.contributor.author	葉俊言	zh_TW
dc.date.accessioned	2021-06-16T05:22:52Z	-
dc.date.available	2019-09-02
dc.date.copyright	2014-09-02
dc.date.issued	2014
dc.date.submitted	2014-08-14
dc.identifier.citation	[1] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. Computer Vision and Image Understanding, 110(3):346–359, 2008.
[2] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2011.
[3] O. Dan, J. Feng, and B. Davison. Filtering microblogging messages for social TV. In ACM International Conference Companion on World Wide Web, 2011.
[4] D. P. W. Ellis, B. Whitman, and A. Porter. Echoprint: An open music identification service. In International Society for Music Information Retrieval Conference, 2011.
[5] B. Girod, V. Chandrasekhar, D. M. Chen, N.-M. Cheung, R. Grzeszczuk, Y. Reznik, G. Takacs, S. S. Tsai, and R. Vedantham. Mobile visual search. IEEE Signal Processing Magazine, 28(4):61–76, 2011.
[6] Y.-G. Jiang. SUPER: Towards real-time event recognition in internet videos. In ICMR, 2012.
[7] Y.-G. Jiang, Q. Dai, Y. Zheng, X. Xue, J. Liu, and D. Wang. A fast video event recognition system and its application to video search. In ACM MM, 2012.
[8] I. Laptev. On space-time interest points. Int. J. Comput. Vision, 2005.
[9] P. Li, T. J. Hastie, and K. W. Church. Very sparse random projections. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[10] W. Liu, T. Mei, Y. Zhang, J. Li, and S. Li. Listen, look, and gotcha: Instant video search with mobile phones by layered audio-video indexing. In ACM International Conference on Multimedia, 2013.
[11] M. Muja and D. G. Lowe. Fast matching of binary features. In Conference on Computer and Robot Vision, 2012.
[12] L. A. Rowe and R. Jain. ACM SIGMM retreat report on future directions in multimedia research. ACM Transactions on Multimedia Computing, Communications, and Applications, 1(1):3–13, 2005.
[13] Y.-C. Su, T.-H. Chiu, Y.-Y. Chen, C.-Y. Yeh, and W. H. Hsu. Enabling low bitrate mobile visual recognition: A performance versus bandwidth evaluation. In ACM MM, 2013.
[14] A. L.-C. Wang. An industrial-strength audio search algorithm. In International Conference on Music Information Retrieval, 2003.
[15] C.-Y. Yeh, Y.-M. Hsu, H. Huang, H.-W. Jheng, Y.-C. Su, T.-H. Chiu, and W. Hsu. Me-link: Link me to the media – fusing audio and visual cues for robust and efficient mobile media interaction. In WWW, 2014.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56309	-
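dc.description.abstract	In recent years, owing to the explosive growth and convenience of mobile devices, user demand has shifted markedly from personal computers to handheld devices. Many users now carry their mobile devices with them and browse online content whenever they have spare time, most of it multimedia content. To meet the demand for searching for similar videos or recognizing videos of interest, we provide methods that let users capture a video with the camera and microphone of a mobile device and then obtain related information they are interested in. In the first part, we present a large-scale mobile video search system named "Me-link," which progressively fuses lightweight audio and visual features. With our system, users only need to point the mobile device's camera at the video they are interested in. The system captures frames and audio and immediately retrieves information related to the video. The longer the user records, the more temporal cues the system progressively aggregates, and the more accurate the returned results become. We also account for real-world interference, in which users may not obtain clear visual or audio signals. When fusing the visual and audio cues, our system automatically detects which signals are usable before returning the final result. On the server side, users can upload videos and related information through a web page. In addition, we link live signal streams so that users can access real-time TV programs through Me-link. In the second part, we focus on video recognition, which has potential for many everyday applications, and propose a novel system that recognizes videos in real time under the computing-power and network-bandwidth constraints of handheld devices. We propose progressive and early-stop video recognition. As the user records or watches a video, the system fuses audio and visual cues and performs video recognition progressively. With the early-stop method, the system recognizes videos using only a small number of early frames, which greatly saves computational and network resources. Compared with the original video features, our system transmits only 7 KB of video information, and it returns a recognition result for every 3 seconds of video playback, providing immediate feedback.	zh_TW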
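The progressive audio-visual fusion summarized in the abstract can be illustrated with a short Python sketch. This record does not give Me-link's descriptors, index, or fusion rule, so everything here is an assumption: each capture window is taken to yield a dict of per-video match scores for each channel (audio and visual), scores are summed over windows, and a channel counts as "available" when its top candidate beats the runner-up by a hypothetical ratio `min_ratio`. The helper names (`accumulate`, `is_informative`, `normalize`, `fuse`) are illustrative only and do not come from the thesis.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed format: video id -> match score for one capture window (hypothetical).
Scores = Dict[str, float]


def accumulate(history: List[Scores]) -> Scores:
    """Sum each video's scores over all capture windows seen so far."""
    totals: Scores = defaultdict(float)
    for window in history:
        for video_id, score in window.items():
            totals[video_id] += score
    return dict(totals)


def is_informative(totals: Scores, min_ratio: float = 1.5) -> bool:
    """Hypothetical availability test: the best candidate must beat the
    runner-up by at least a factor of `min_ratio`."""
    ranked = sorted(totals.values(), reverse=True)
    if not ranked:
        return False
    if len(ranked) == 1:
        return ranked[0] > 0
    return ranked[0] >= min_ratio * max(ranked[1], 1e-9)


def normalize(totals: Scores) -> Scores:
    """Scale a channel's scores to [0, 1] so audio and visual are comparable."""
    top = max(totals.values(), default=0.0)
    if top <= 0:
        return {video_id: 0.0 for video_id in totals}
    return {video_id: score / top for video_id, score in totals.items()}


def fuse(audio: List[Scores], visual: List[Scores], min_ratio: float = 1.5) -> List[str]:
    """Rank video ids by the summed normalized scores of the usable channels."""
    channels = [accumulate(audio), accumulate(visual)]
    usable = [c for c in channels if is_informative(c, min_ratio)]
    if not usable:
        usable = channels  # neither channel is clearly reliable: fall back to both
    fused: Scores = defaultdict(float)
    for totals in usable:
        for video_id, score in normalize(totals).items():
            fused[video_id] += score
    # Highest fused score first; ties broken alphabetically by video id.
    return sorted(fused, key=lambda v: (-fused[v], v))


if __name__ == "__main__":
    noisy_audio = [{"news": 1.0, "drama": 1.0}, {"news": 1.0, "drama": 1.1}]
    clear_visual = [{"news": 3.0, "drama": 1.0}, {"news": 4.0, "drama": 0.5}]
    print(fuse(noisy_audio, clear_visual))  # prints ['news', 'drama']
```

In the demo, the noisy audio channel's two candidates score almost equally, so only the visual channel passes the availability test and decides the ranking. Summing over windows is what lets the result sharpen as the user keeps the phone pointed at the screen.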
dc.description.abstract	In recent years, owing to the explosive growth and convenience of mobile devices, consumer demand has shifted significantly from desktop computers to portable devices. Many people now carry their mobile devices at all times and browse online content, mostly multimedia, whenever they have spare time. To meet the demand for searching for similar videos or recognizing videos of interest on mobile devices, we present solutions that use the device's camera and microphone to capture a video and then retrieve the information the user is interested in. In the first part, we present a scalable mobile video search system, named "Me-link," based on progressive fusion of lightweight audio and visual features. With our system, users only have to point the mobile camera at the video they are interested in. The system captures the frames and sounds, then immediately retrieves relevant information. The longer users hold the device, the more cues the system aggregates over time, and the more accurate the returned results become. We also consider noisy real-world environments, where users may not get clear visual or audio signals. When aggregating the audio and visual cues, our system automatically detects which channels are available for the final ranking. On the server side, users can upload videos with related information via a website. In addition, we link live streaming signals so that users can access real-time broadcasts with Me-link. In the second part, we focus on video recognition, which has great potential for many everyday applications, and present a novel system that enables real-time video recognition under the constraints of mobile computing power and network bandwidth. We propose the concepts of progressive and early-stop video recognition. As the user records or watches a video, the system aggregates audio and visual cues and performs video recognition progressively. With the early-stop method, the system recognizes videos using only the early shots, which greatly saves computational and network resources. Our system transmits only 7 KB of video information, which is much smaller than the original video features, and returns recognition results every 3 seconds of video playback, providing instant feedback.	en
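The progressive, early-stop recognition described in the second part can likewise be sketched in Python. The classifier, the per-segment score format, and the stopping rule are assumptions, not the thesis's method: each 3-second segment is assumed to produce a vector of class scores, the running mean is reported after every segment (the progressive part), and recognition stops once the top class's mean score reaches a hypothetical `threshold`. Because the input is consumed lazily, segments after the stopping point are never scored or transmitted. The names `early_stop_recognize`, `threshold`, and `report` are illustrative.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

SEGMENT_SECONDS = 3  # the abstract reports a result every 3 s of playback


def early_stop_recognize(
    segment_scores: Iterable[Sequence[float]],
    labels: Sequence[str],
    threshold: float = 0.8,
    report: Optional[Callable[[str, float, int], None]] = None,
) -> Tuple[str, int]:
    """Progressively average per-segment class scores and stop early.

    segment_scores: one score vector per segment (hypothetical classifier
        output), consumed lazily so unused segments are never computed.
    Returns (predicted label, number of segments used).
    """
    running = [0.0] * len(labels)
    used = 0
    best = -1
    for scores in segment_scores:
        used += 1
        for i, s in enumerate(scores):
            running[i] += s
        mean = [r / used for r in running]
        best = max(range(len(labels)), key=lambda k: mean[k])
        if report is not None:
            # Interim result shown to the user after each segment.
            report(labels[best], mean[best], used * SEGMENT_SECONDS)
        if mean[best] >= threshold:
            break  # confident enough: stop consuming further segments
    if used == 0:
        raise ValueError("no segments supplied")
    return labels[best], used


if __name__ == "__main__":
    labels = ["sports", "news", "cooking"]
    stream = [[0.7, 0.2, 0.1], [0.98, 0.01, 0.01], [0.1, 0.8, 0.1]]
    print(early_stop_recognize(iter(stream), labels))  # prints ('sports', 2)
```

With the demo input, the running mean for "sports" crosses the threshold after the second segment, so the third segment is never read. A stricter threshold trades more segments (time and bandwidth) for a more reliable answer.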
dc.description.provenance	Made available in DSpace on 2021-06-16T05:22:52Z (GMT). No. of bitstreams: 1 ntu-103-R01922055-1.pdf: 17695592 bytes, checksum: ea7f2ccb35eeb2954ecef0f417c79fdf (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Oral Defense Committee Certification
Acknowledgements (Chinese)
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
2 Me-link: Link Me to the Media – Fusing Audio and Visual Cues for Robust and Efficient Mobile Media Interaction
2.1 Introduction
2.2 System Framework
2.2.1 Audio and Visual Descriptors
2.2.2 Indexing
2.2.3 Progressive Fusion
2.2.4 Fusion of Aural and Visual Ranks
2.3 The Demonstration
2.4 Conclusions
3 Low-Bitrate and Online Mobile Video Recognition
3.1 Introduction
3.2 Related Work
3.3 System Framework
3.3.1 Features and Descriptors
3.3.2 Progressive Recognition
3.3.3 Early-Stop Video Recognition
3.4 Conclusions
4 Conclusion
Bibliography
dc.language.iso	en
dc.title	手機影片蒐尋與辨識	zh_TW
dc.title	Mobile Video Search and Recognition	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳文進(Wen-Chin Chen),余能豪(NENG-HAO YU)
dc.subject.keyword	Second screen, mobile video recognition, augmented reality, speed and efficiency, low bandwidth	zh_TW
dc.subject.keyword	Second Screen, Mobile Video Recognition, Augmented Reality, Speed and Efficiency, Low Bitrate	en
dc.relation.page	18
dc.rights.note	Paid license
dc.date.accepted	2014-08-15
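dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW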
Appears in collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-103-1.pdf (currently not authorized for public access)	17.28 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.