Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53546

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪一平(Yi-Ping Hung) | |
| dc.contributor.author | Yu-Ting Chen | en |
| dc.contributor.author | 陳昱廷 | zh_TW |
| dc.date.accessioned | 2021-06-16T02:25:34Z | - |
| dc.date.available | 2018-08-16 | |
| dc.date.copyright | 2015-08-16 | |
| dc.date.issued | 2015 | |
| dc.date.submitted | 2015-08-06 | |
| dc.identifier.citation | [1] Carlos H. Morimoto and Marcio R. M. Mimica. Eye gaze tracking techniques for interactive applications. Computer Vision and Image Understanding, 98(1):4–24, 2005.
[2] Dan Witzner Hansen and Qiang Ji. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3):478–500, 2010.
[3] Dong Hyun Yoo and Myung Jin Chung. A novel non-intrusive eye gaze estimation using cross-ratio under large head motion. Computer Vision and Image Understanding, 98(1):25–51, 2005.
[4] David Beymer and Myron Flickner. Eye gaze tracking using an active stereo head. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages II–451. IEEE, 2003.
[5] Shumeet Baluja and Dean Pomerleau. Non-intrusive gaze tracking using artificial neural networks. Technical report, DTIC Document, 1994.
[6] Kar-Han Tan, David J. Kriegman, and Narendra Ahuja. Appearance-based eye gaze estimation. In Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision (WACV 2002), pages 191–195. IEEE, 2002.
[7] Oliver Williams, Andrew Blake, and Roberto Cipolla. Sparse and semi-supervised visual mapping with the S^3GP. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 230–237. IEEE, 2006.
[8] Brian A. Smith, Qi Yin, Steven K. Feiner, and Shree K. Nayar. Gaze locking: Passive eye contact detection for human-object interaction. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology, pages 271–280. ACM, 2013.
[9] Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, 1980.
[10] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[11] Dan Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3642–3649. IEEE, 2012.
[12] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, pages 1–42, 2014.
[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[14] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[15] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
[16] Timothy F. Cootes, Christopher J. Taylor, David H. Cooper, and Jim Graham. Active shape models-their training and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.
[17] S. Milborrow and F. Nicolls. Active shape models with SIFT descriptors and MARS. VISAPP, 2014.
[18] Davis E. King. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10:1755–1758, 2009.
[19] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.
[20] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[21] Yu-Shan Lin and Yi-Ping Hung. Attention-aware interactive display wall. 2013. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53546 | - |
| dc.description.abstract | 現今出現了許多互動顯示裝置,像 Google Glass、 Oculus、 Samsung TV 等。而對於大型的互動顯示系統,基於視線的互動方式是一種有效率和方便的方法。然而,大部分的視線偵測系統會需要侵入式光線、頭戴式裝置或固定的頭部位置。
在這份論文中,我們展示了一種只需要 RGB-D 相機和高解析度相機的視線偵測方法。方法的重點在於使用最新的機器學習技術:捲積神經網路。我們將比較三種方法在兩種著名網路模型上的準確度。為了收集實驗數據,我們設計了一個互動牆實驗。最後的結果顯示,我們的方法在 36 個方向的視線偵測上可以達到 80% 的成功率。然而,RGB-D 資料對準確度並無貢獻;即使如此,我們的方法依舊有良好的準確度。 | zh_TW |
| dc.description.abstract | Many new interactive display devices have appeared recently, such as Google Glass, Oculus, and Samsung TV. For a large interactive display, such as a wall, gaze-based interaction can be more effective and convenient. However, many gaze detection systems require intrusive lighting, wearable devices, or a fixed head pose.
In this paper, our goal is to study whether head pose information is useful for gaze detection. We propose a method that uses an RGB-D camera for head pose detection and a high-resolution camera for gaze detection. The main idea is to apply a Convolutional Neural Network (CNN) in the training process. We compared the gaze detection accuracy of two well-known CNN models across three approaches. We conducted an experiment on an interactive wall to collect data for our approach. The results show that our system achieves more than 80% accuracy for 36-label gaze detection. The head pose information provided no significant improvement; even so, our approach still achieves good accuracy. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T02:25:34Z (GMT). No. of bitstreams: 1 ntu-104-R02944014-1.pdf: 2782345 bytes, checksum: 6e99fe2112f950f0199dc68b9a9026f0 (MD5) Previous issue date: 2015 | en |
| dc.description.tableofcontents | 口試委員會審定書
中文摘要
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Gaze Detection
2.2 Convolutional Neural Network (CNN)
3 Convolutional Neural Network (CNN)
3.1 LeNet5
3.1.1 Architecture
3.2 AlexNet
3.2.1 ReLU Nonlinearity
3.2.2 Multiple GPUs
3.2.3 Local Response Normalization
3.2.4 Overlapping Pooling
3.2.5 Architecture
3.3 GoogLeNet
3.3.1 Inception
3.3.2 Architecture
4 Method
4.1 Pre-processing
4.2 CNN Training
4.3 Combining CNN Training with Head Pose Training
4.4 Evaluation
5 Experiments
5.1 Setup
5.1.1 Interactive Wall Display
5.1.2 Computer Used for Training
5.2 Experiment Procedure
5.2.1 Laser Calibration
5.2.2 Subject Measurement
5.2.3 Data Recording
5.3 Data Collection
5.4 Learning Result
5.5 Subject Feedback
6 Conclusions and Future Works
A Displacement of Head Pose
A.1 Calibration State
A.2 Horizontal Move State
A.3 Vertical Move State
Bibliography | |
| dc.language.iso | zh-TW | |
| dc.subject | 人機互動 | zh_TW |
| dc.subject | 視線偵測 | zh_TW |
| dc.subject | 捲積神經網路 | zh_TW |
| dc.subject | 互動顯示裝置 | zh_TW |
| dc.subject | 電腦視覺 | zh_TW |
| dc.subject | Human Computer Interaction | en |
| dc.subject | Gaze Detection | en |
| dc.subject | Convolutional Neural Network | en |
| dc.subject | Interactive Displays | en |
| dc.subject | Computer Vision | en |
| dc.title | 在互動顯示上使用捲積網路的視線偵測 | zh_TW |
| dc.title | Gaze Detection Using Convolutional Neural Network for Interactive Displays | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 103-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 謝俊科,石勝文,吳健榕,陳湘鳳 | |
| dc.subject.keyword | 視線偵測,捲積神經網路,互動顯示裝置,電腦視覺,人機互動 | zh_TW |
| dc.subject.keyword | Gaze Detection,Convolutional Neural Network,Interactive Displays,Computer Vision,Human Computer Interaction | en |
| dc.relation.page | 36 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2015-08-06 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | zh_TW |
| Appears in Collections: | 資訊網路與多媒體研究所 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-104-1.pdf (Restricted Access) | 2.72 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
