Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61588
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳良基(Liang-Gee Chen) | |
dc.contributor.author | Cheng-Yuan Ko | en |
dc.contributor.author | 柯政遠 | zh_TW |
dc.date.accessioned | 2021-06-16T13:06:38Z | - |
dc.date.available | 2023-12-31 | |
dc.date.copyright | 2013-08-14 | |
dc.date.issued | 2013 | |
dc.date.submitted | 2013-08-02 | |
dc.identifier.citation | [1] C. Fehn, "A 3DTV system based on video plus depth information," 37th Asilomar Conf. Signals, Syst. Comput., 2003.
[2] D. Marr, Vision. San Francisco: Freeman, 1982.
[3] E. H. Adelson and J. Y. A. Wang, "Single lens stereo with a plenoptic camera," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 99-106, Feb. 1992.
[4] J. Shotton et al., "Real-time human pose recognition in parts from single depth images," IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2011.
[5] T. G. Zimmerman et al., "A hand gesture interface device," ACM SIGCHI Bulletin, vol. 18, no. 4, 1987.
[6] R. Y. Wang and J. Popović, "Real-time hand-tracking with a color glove," ACM Trans. Graphics, vol. 28, no. 3, 2009.
[7] B. Stenger et al., "Model-based hand tracking using a hierarchical Bayesian filter," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1372-1384, 2006.
[8] P. Garg, N. Aggarwal, and S. Sofat, "Vision based hand gesture recognition," World Academy of Science, Engineering and Technology, vol. 49, no. 1, pp. 972-977, 2009.
[9] H.-S. Yoon et al., "Hand gesture recognition using combined features of location, angle and velocity," Pattern Recognition, vol. 34, no. 7, pp. 1491-1501, 2001.
[10] L. Bretzner, I. Laptev, and T. Lindeberg, "Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering," Proc. 5th IEEE Int. Conf. Automatic Face and Gesture Recognition, 2002.
[11] C.-L. Huang and S.-H. Jeng, "A model-based hand gesture recognition system," Machine Vision and Applications, vol. 12, no. 5, pp. 243-258, 2001.
[12] M. B. Holte, T. B. Moeslund, and P. Fihl, "View-invariant gesture recognition using 3D optical flow and harmonic motion context," Computer Vision and Image Understanding, vol. 114, no. 12, pp. 1353-1361, 2010.
[13] Z. Ren et al., "Robust hand gesture recognition with Kinect sensor," Proc. 19th ACM Int. Conf. Multimedia, 2011.
[14] M. Van den Bergh and L. Van Gool, "Combining RGB and ToF cameras for real-time 3D hand gesture interaction," IEEE Workshop on Applications of Computer Vision (WACV), 2011.
[15] X. Liu and K. Fujimura, "Hand gesture recognition using depth data," Proc. 6th IEEE Int. Conf. Automatic Face and Gesture Recognition, 2004.
[16] H. Benko, R. Jota, and A. Wilson, "MirageTable: freehand interaction on a projected augmented reality tabletop," Proc. ACM Conf. Human Factors in Computing Systems (CHI), 2012.
[17] O. Hilliges, D. Kim, S. Izadi, M. Weiss, and A. Wilson, "HoloDesk: direct 3D interactions with a situated see-through display," Proc. ACM Conf. Human Factors in Computing Systems (CHI), pp. 2421-2430, May 2012.
[18] P. Minvielle, A. Doucet, A. Marrs, and S. Maskell, "A Bayesian approach to joint tracking and identification of geometric shapes in video sequences," Image and Vision Computing, vol. 28, no. 1, pp. 111-123, 2010.
[19] http://en.wikipedia.org/wiki/Pinhole_camera_model
[20] Z. Zhang, "Flexible camera calibration by viewing a plane from unknown orientations," Proc. 7th IEEE Int. Conf. Computer Vision, vol. 1, pp. 666-673, 1999.
[21] C. Loop and Z. Zhang, "Computing rectifying homographies for stereo vision," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, 1999.
[22] C. Strecha and L. Van Gool, "Motion-stereo integration for depth estimation," Computer Vision - ECCV 2002, pp. 170-185, Springer Berlin Heidelberg, 2002.
[23] C.-Y. Ko, C.-T. Li, C. Wu, and L.-G. Chen, "An efficient method for extracting the depth data from the user," Int. Conf. 3D Systems and Applications (3DSA), Hsinchu, Taiwan, June 2012.
[24] M. Piccardi, "Background subtraction techniques: a review," Proc. IEEE Int. Conf. Systems, Man and Cybernetics, vol. 4, pp. 3099-3104, Oct. 2004.
[25] C.-Y. Ko and L.-G. Chen, "Acquire user's distance by face detection," IEEE 17th Int. Symp. Consumer Electronics (ISCE), Hsinchu, Taiwan, June 2013.
[26] http://opencv.org/
[27] P. Viola and M. J. Jones, "Robust real-time face detection," Int. J. Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[28] L. C. Wan, P. Sebastian, and Y. V. Voon, "Stereo vision tracking system," Int. Conf. Future Computer and Communication (ICFCC), pp. 487-491, Apr. 2009.
[29] C.-Y. Ko, C.-T. Li, C.-H. Chung, and L.-G. Chen, "High accuracy user's distance estimation by low cost cameras," Int. Conf. 3D Systems and Applications (3DSA), Osaka, Japan, June 2013 (Best Paper Award).
[30] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. I-511, 2001.
[31] C.-Y. Ko, C.-T. Li, C.-H. Chung, and L.-G. Chen, "3D hand localization by low-cost webcams," IS&T/SPIE Electronic Imaging, paper 86500W, Mar. 2013.
[32] J. Sun, N.-N. Zheng, and H.-Y. Shum, "Stereo matching using belief propagation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 7, pp. 787-800, 2003.
[33] A. Koschan, "Dense stereo correspondence using polychromatic block matching," Proc. 5th Int. Conf. Computer Analysis of Images and Patterns (CAIP), pp. 538-542, Sep. 1993.
[34] C.-K. Liang, C.-C. Cheng, Y.-C. Lai, L.-G. Chen, and H. H. Chen, "Hardware-efficient belief propagation," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 80-87, June 2009.
[35] C.-T. Li, Y.-C. Lai, C. Wu, S.-F. Tsai, and L.-G. Chen, "3D image correction by Hilbert Huang decomposition," IEEE Int. Conf. Consumer Electronics (ICCE), pp. 271-272, Jan. 2012.
[36] M. J. Dahan, N. Chen, A. Shamir, and D. Cohen-Or, "Combining color and depth for enhanced image segmentation and retargeting," The Visual Computer, vol. 28, no. 12, pp. 1181-1193, 2012.
[37] http://en.wikipedia.org/wiki/Minoru_3D_Webcam
[38] R. Zhong, R. Hu, Y. Shi, Z. Wang, Z. Han, L. Liu, and J. Hu, "Just noticeable difference for 3D images with depth saliency," Advances in Multimedia Information Processing - PCM 2012, pp. 414-423, Springer Berlin Heidelberg, 2012.
[39] P. Didyk, T. Ritschel, E. Eisemann, K. Myszkowski, and H.-P. Seidel, "A perceptual model for disparity," ACM Trans. Graphics, vol. 30, no. 4, p. 96, 2011.
[40] http://www.middlebury.edu
[41] http://www.hdhes.com/tv/hdtvviewdistance.aspx
[42] T. Shibata, J. Kim, D. M. Hoffman, and M. S. Banks, "The zone of comfort: predicting visual discomfort with stereo displays," Journal of Vision, vol. 11, no. 8, 2011.
[43] P. D. Tynan and R. Sekuler, "Motion processing in peripheral vision: reaction time and perceived velocity," Vision Research, vol. 22, no. 1, pp. 61-68, 1982. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61588 | - |
dc.description.abstract | 在今日生活中,數位視頻技術扮演重要腳色。隨著顯示器科技的演進,顯示器能提供給人們越來越好的觀賞品質。立體顯示器比起傳統平面顯示器給使用者提供了更佳的觀賞經驗。立體影像技術在許多應用下豐富了這些應用中的內容,比方說電視廣播、電影、遊戲、攝影、教育…等。在現今立體影像已是如此真實的情況下,人們不會只滿足於觀賞立體影片。使用者會想要和如此逼真的立體虛擬影像有所接觸互動,比方說丟擲、觸摸、推…等。
在這篇論文中,我們提出了利用雙眼相機來進行「虛擬觸碰」互動的概念。目前一般的互動方式為使用者在電視或裝置前面來比出特定手勢或是一些身體姿勢,接著系統判斷出為哪種姿勢後便會將相對應的反應表現出來。此類的研究數量已經相當多,而且我們認為它的功能更像是取代遙控器而已。在現今立體影像已是如此真實的情況下,人們不會只滿足於觀賞立體影片。使用者會想要和如此逼真的立體虛擬影像有所接觸互動,比方說丟擲、觸摸、推…等。我們提出了一個基於雙眼相機的立體互動使用者介面,此介面能偵測使用者距離以及手部距離。當使用者手部距離與立體虛擬物件在空間座標位置到達一致時,此系統則判斷使用者達成了虛擬觸碰的條件,接著辨別使用者的操作來給出相對應虛擬觸碰的反應。立體互動使用者介面分成兩部分來探討:免校正使用者距離估計以及利用信心傳遞法來進行手部三維空間定位。 免校正使用者距離偵測是立體互動使用者介面的第一步。主要的概念就是將使用者視為一個物體,利用雙眼相機拍攝到的左圖及右圖,計算出代表使用者的視差。最後,利用這個視差便能算出使用者距離。 利用信心傳遞法來進行手部三維空間定位是立體互動使用者介面的另一部分。當我們只有使用者距離的資訊時,我們只能做一些相當簡單的互動。由於手是人類與機器最直觀也最有效的互動方式,系統必須取得手部三維空間定位,如此使用者才能進行更複雜或是精確的互動。我們利用深度以及彩色影像的資訊來達到手部三維空間定位以及一些簡單手勢的判別。 我們也提出了一個三階管線硬體架構,實現結果表明了此架構能在操作頻率200Mhz輸入左右影像皆為1080p時達到30fps之即時速度。 | zh_TW |
dc.description.abstract | Digital video technology plays an important role in our daily life. With the evolution of display technologies, display systems can provide ever higher visual quality. Immersive 3D displays offer a better viewing experience than conventional 2D displays, and 3D technology enriches the content of many applications, such as broadcasting, movies, gaming, photography, camcorders, and education. Now that stereoscopic displays are mature and their images are highly realistic, users are no longer satisfied with merely watching 3D video; they want to interact with three-dimensional virtual objects, for example by slapping, sliding, or throwing them.
In this thesis, we propose "virtual touch" interaction using a stereo camera. In the common interaction style, the user makes a hand or body gesture in front of a TV or other device, the system recognizes the gesture, and the corresponding reaction is shown. Research of this kind is already abundant, and in function it acts more like a replacement for the remote control. We instead propose a 3D interactive user interface based on a stereo camera that detects the positions of the user's body and hand. When the position of the user's hand coincides with the spatial position of a stereoscopic virtual object, the system judges that a "virtual touch" has occurred; it then recognizes the user's operation and gives the corresponding response. The interface is discussed in two parts: calibration-free user distance estimation and 3D hand localization using belief propagation. Calibration-free distance estimation is the first step of the interface. The main concept is to treat the user as a single object: from the left and right captures of the stereo camera, a disparity representing the user is computed, and the user's distance is then estimated from that disparity. 3D hand localization using belief propagation is the other part of the interface. With only the user's distance, the system supports only very simple interaction. Because hand gestures are one of the most intuitive and effective ways for people to communicate with machines, the system must obtain the 3D position of the hand so that the user can perform more complex or precise interaction. We use depth and color information to localize the hand in 3D and to recognize a few simple gestures. We also propose a three-stage pipelined hardware architecture; implementation results show that it achieves real-time interaction at 30 fps on Full-HD 1080p left and right input images when operating at 200 MHz. | en |
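The abstract above describes estimating the user's distance from the disparity between the left and right captures of a stereo camera. As an illustrative sketch only (not code from the thesis), the standard rectified-stereo relation Z = f · B / d can be written as follows; the focal length and baseline values in the usage example are hypothetical:

```python
def distance_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth from the standard rectified-stereo relation Z = f * B / d.

    disparity_px: horizontal pixel offset of the user between left and right views
    focal_px:     camera focal length expressed in pixels
    baseline_m:   distance between the two camera centers, in meters
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical webcam-like parameters: 700 px focal length, 6 cm baseline.
# A user producing a 42-pixel disparity would then be about 1 meter away.
print(distance_from_disparity(42.0, 700.0, 0.06))  # -> 1.0
```

Larger disparities map to closer objects, which is why the thesis can compare the hand's estimated depth against the rendered depth of a virtual object to detect a "virtual touch".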
dc.description.provenance | Made available in DSpace on 2021-06-16T13:06:38Z (GMT). No. of bitstreams: 1 ntu-102-R00943136-1.pdf: 4754336 bytes, checksum: 5e0695c06b1d3940de4710c2f9579470 (MD5) Previous issue date: 2013 | en |
dc.description.tableofcontents | ABSTRACT VI
CHAPTER 1 INTRODUCTION 1
1.1. TREND IN 3D INTERACTIVE USER INTERFACE 2
1.2. HUMAN DEPTH PERCEPTION 2
1.2.1. BINOCULAR VISION 3
1.2.2. DEPTH AND DISPARITY 3
1.3. 3D DISPLAY 5
1.3.1. STEREOSCOPIC AND AUTOSTEREOSCOPIC DISPLAY 5
1.3.2. HOLOGRAPHIC DISPLAY 10
1.3.3. 3D PROJECTOR 11
1.4. DEPTH SENSOR 11
1.4.1. TIME OF FLIGHT CAMERA 11
1.4.2. STRUCTURED LIGHT 13
1.4.3. LIGHT CODING 14
1.4.4. STEREOSCOPIC CAMERAS AND CAMERA ARRAYS 15
1.4.5. LIGHT FIELD CAMERA 16
1.5. SUMMARIES 17
CHAPTER 2 STUDY OF 3D INTERACTIVE SYSTEM 18
2.1. MOTION SENSING 20
2.2. HAND GESTURE RECOGNITION 21
2.3. VIRTUAL TOUCH: RELATIVE 3D INTERACTIVE SYSTEMS 23
2.4. NATURAL AND SMART INTERACTION 24
2.5. SUMMARY 24
CHAPTER 3 ANALYSIS AND ALGORITHM DESIGN OF 3D INTERACTIVE USER INTERFACE BY USING STEREO CAMERA 27
3.1. 3D TOUCH – BETWEEN HANDS AND 3D VIRTUAL OBJECTS 28
3.2. THE KEY DIFFERENCES BETWEEN THIS WORK AND CURRENT SMART TV 29
3.3. CHALLENGES IN 3D INTERACTIVE USER INTERFACE 31
3.4. PROPOSED ALGORITHM 32
3.5. PRE-PROCESSING: CAMERA CALIBRATION AND STEREO RECTIFICATION 34
3.5.1. CAMERA CALIBRATION 34
3.5.2. STEREO RECTIFICATION 36
3.6. USER'S DISTANCE ESTIMATION 39
3.6.1. MASK BASED STEREO MATCHING 39
3.6.2. FACE DETECTION BASED STEREO MATCHING 44
3.6.3. USER'S DISTANCE ESTIMATION USING MULTI-CUEING 50
3.7. 3D HAND LOCALIZATION AND SIMPLE HAND GESTURE RECOGNITION 58
3.7.1. STEREO MATCHING USING TILE-BASED BELIEF PROPAGATION 59
3.7.2. ROUGHLY HAND REGION DECISION AND REFINE DEPTH MAP 59
3.7.3. HAND SEGMENTATION BY A.R.T. (ADAPTIVE REGION THRESHOLD) 60
3.7.4. 3D HAND LOCALIZATION AND SIMPLE GESTURE RECOGNITION 60
3.7.5. EXPERIMENTAL RESULTS 61
3.8. MERGE THE VIRTUAL WORLD AND REAL WORLD 68
3.8.1. OUTLIER REMOVAL TO ENHANCE HAND DISTANCE ESTIMATION 68
3.8.2. DECISION OF TOLERANCE 68
3.8.3. ACCURACY OF DISTANCE ESTIMATION ANALYSIS 69
3.9. SYSTEM CONSIDERATIONS 73
3.10. CONCLUSION 73
CHAPTER 4 ARCHITECTURE DESIGN OF 3D INTERACTIVE USER INTERFACE BY USING STEREO CAMERA 75
4.1. INTRODUCTION 76
4.2. HARDWARE ORIENTED ALGORITHM 77
4.2.1. BINARY LINE MASK STEREO MATCHING 77
4.2.2. 2-NEIGHBOR OUTLIER REMOVAL 78
4.2.3. HAND ROI DOWNSAMPLING 78
4.3. PROPOSED ARCHITECTURE 81
4.3.1. THREE-STAGE PIPELINE 81
4.3.2. MEMORY AND BANDWIDTH ANALYSIS 82
4.3.3. IMPLEMENTATION RESULTS 83
4.4. CONCLUSIONS 83
CHAPTER 5 CONCLUSION 84
BIBLIOGRAPHY 85 | |
dc.language.iso | en | |
dc.title | 利用立體相機之三維互動使用者介面之演算法與硬體架構設計 | zh_TW |
dc.title | Algorithm and Architecture Design of 3D Interactive User Interface by Stereo Camera | en |
dc.type | Thesis | |
dc.date.schoolyear | 101-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 陳美娟(Mei-Juan Chen),賴永康(Yeong-Kang Lai),簡韶逸(Shao-Yi Chien) | |
dc.subject.keyword | 三維使用者介面,距離偵測,使用者介面, | zh_TW |
dc.subject.keyword | 3DUI,distance estimation,user interface, | en |
dc.relation.page | 88 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2013-08-02 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電子工程學研究所 | zh_TW |
Appears in Collections: | 電子工程學研究所 (Graduate Institute of Electronics Engineering)
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-102-1.pdf (currently not authorized for public access) | 4.64 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.