Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49168
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 黃乾綱(Chien-Kang Huang) | |
dc.contributor.author | Chia-Jung Chen | en |
dc.contributor.author | 陳加容 | zh_TW |
dc.date.accessioned | 2021-06-15T11:18:04Z | - |
dc.date.available | 2017-08-31 | |
dc.date.copyright | 2016-08-31 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-08-19 | |
dc.identifier.citation | [1] Pahlavan, Kaveh, Xinrong Li, and Juha-Pekka Makela. 'Indoor geolocation science and technology.' IEEE Communications Magazine 40.2 (2002): 112-118.
[2] Newman, Nic. 'Apple iBeacon technology briefing.' Journal of Direct, Data and Digital Marketing Practice 15.3 (2014): 222-225.
[3] 「台鐵 EZ GO」 App. Available from: https://play.google.com/store/apps/details?id=com.TRA.EZGo&hl=zh_TW
[4] Mori, Shunji, Ching Y. Suen, and Kazuhiko Yamamoto. 'Historical review of OCR research and development.' Proceedings of the IEEE 80.7 (1992): 1029-1058.
[5] Epshtein, Boris, Eyal Ofek, and Yonatan Wexler. 'Detecting text in natural scenes with stroke width transform.' Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
[6] K. Jung, K. I. Kim, and A. K. Jain. 'Text information extraction in images and videos: A survey.' Pattern Recognition 37.5 (2004): 977-997.
[7] K. I. Kim, K. Jung, and J. H. Kim. 'Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm.' IEEE Transactions on Pattern Analysis and Machine Intelligence 25.12 (2003): 1631-1639.
[8] Y. Zhong, H. Zhang, and A. K. Jain. 'Automatic caption localization in compressed video.' IEEE Transactions on Pattern Analysis and Machine Intelligence 22.4 (2000): 385-392.
[9] Zhou, Gang, et al. 'Detecting multilingual text in natural scene.' Access Spaces (ISAS), 2011 1st International Symposium on. IEEE, 2011.
[10] Canny, John. 'A computational approach to edge detection.' IEEE Transactions on Pattern Analysis and Machine Intelligence (1986): 679-698.
[11] P. Shivakumara, T. Q. Phan, and C. L. Tan. 'A robust wavelet transform based technique for video text detection.' ICDAR, 2009, pp. 1285-1289.
[12] Yao, Cong, et al. 'Detecting texts of arbitrary orientations in natural images.' Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[13] Reitmayr, Gerhard, and Dieter Schmalstieg. 'Location based applications for mobile augmented reality.' Proceedings of the Fourth Australasian User Interface Conference on User Interfaces 2003, Volume 18. Australian Computer Society, Inc., 2003.
[14] Dalal, Navneet, and Bill Triggs. 'Histograms of oriented gradients for human detection.' 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 1. IEEE, 2005.
[15] Wang, Xiaoyu, Tony X. Han, and Shuicheng Yan. 'An HOG-LBP human detector with partial occlusion handling.' 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009.
[16] Ojala, Timo, Matti Pietikäinen, and David Harwood. 'A comparative study of texture measures with classification based on featured distributions.' Pattern Recognition 29.1 (1996): 51-59.
[17] Yin, Xu-Cheng, et al. 'Robust text detection in natural scene images.' IEEE Transactions on Pattern Analysis and Machine Intelligence 36.5 (2014): 970-983.
[18] Robust text detection in natural scene images. Retrieved July 12, 2016, from http://prir.ustb.edu.cn/TexStar/scene-text-detection/
[19] Najman, Laurent, and Michel Schmitt. 'Watershed of a continuous function.' Signal Processing 38.1 (1994): 99-112.
[20] Reitmayr, Gerhard, and Dieter Schmalstieg. 'Location based applications for mobile augmented reality.' Proceedings of the Fourth Australasian User Interface Conference on User Interfaces 2003, Volume 18. Australian Computer Society, Inc., 2003, pp. 65-73.
[21] Rosten, Edward, and Tom Drummond. 'Machine learning for high-speed corner detection.' 9th European Conference on Computer Vision, Vol. 1, 2006, pp. 430-443.
[22] Rosten, Edward, Reid Porter, and Tom Drummond. 'Faster and better: a machine learning approach to corner detection.' IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010): 105-119.
[23] Matas, J., Chum, O., Urban, M., and Pajdla, T. 'Robust wide baseline stereo from maximally stable extremal regions.' British Machine Vision Conference, Cardiff, Wales, 2002, pp. 384-393.
[24] Donoser, Michael, and Horst Bischof. 'Efficient maximally stable extremal region (MSER) tracking.' Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 17-22, 2006, pp. 553-560.
[25] D. Nistér and H. Stewénius. 'Linear time maximally stable extremal regions.' ECCV, 2008, pp. 183-196.
[26] H. Chen. 'Robust text detection in natural images with edge-enhanced maximally stable extremal regions.' Proc. IEEE Int. Conf. Image Process., 2011, pp. 2609-2612.
[27] Tomasi, Carlo, and Roberto Manduchi. 'Bilateral filtering for gray and color images.' Computer Vision, 1998. Sixth International Conference on. IEEE, 1998.
[28] Durand, F., and Dorsey, J. 'Fast bilateral filtering for the display of high-dynamic-range images.' SIGGRAPH, 2002.
[29] Otsu, Nobuyuki. 'A threshold selection method from gray-level histograms.' IEEE Transactions on Systems, Man, and Cybernetics SMC-9.1 (1979).
[30] I. Sobel and G. Feldman. 'A 3x3 isotropic gradient operator for image processing.' Pattern Classification and Scene Analysis, 1973, pp. 271-272.
[31] Lowe, David G. 'Distinctive image features from scale-invariant keypoints.' International Journal of Computer Vision 60.2 (2004): 91-110.
[32] Song, Fuhua, and Bin Lu. 'An automatic video image mosaic algorithm based on SIFT feature matching.' Proceedings of the 2012 International Conference on Communication, Electronics and Automation Engineering. Springer Berlin Heidelberg, 2013.
[33] Zeng, Lin, et al. 'Dynamic image mosaic via SIFT and dynamic programming.' Machine Vision and Applications 25.5 (2014): 1271-1282.
[34] Leutenegger, Stefan, Margarita Chli, and Roland Y. Siegwart. 'BRISK: Binary robust invariant scalable keypoints.' 2011 International Conference on Computer Vision. IEEE, 2011.
[35] Rublee, Ethan, et al. 'ORB: An efficient alternative to SIFT or SURF.' 2011 International Conference on Computer Vision. IEEE, 2011.
[36] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. 'SURF: Speeded up robust features.' European Conference on Computer Vision. Springer Berlin Heidelberg, 2006.
[37] Alcantarilla, Pablo Fernández, Adrien Bartoli, and Andrew J. Davison. 'KAZE features.' European Conference on Computer Vision. Springer Berlin Heidelberg, 2012.
[38] Alcantarilla, Pablo F., and TrueVision Solutions. 'Fast explicit diffusion for accelerated features in nonlinear scale spaces.' IEEE Trans. Patt. Anal. Mach. Intell. 34.7 (2011): 1281-1298.
[39] Karatzas, Dimosthenis, et al. 'ICDAR 2015 competition on robust reading.' Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015.
[40] Karatzas, Dimosthenis, et al. 'ICDAR 2013 robust reading competition.' 2013 12th International Conference on Document Analysis and Recognition. IEEE, 2013.
[41] Wikipedia: HSL and HSV color spaces. Retrieved July 12, 2016, from https://en.wikipedia.org/wiki/HSL_and_HSV
[42] Open Source Computer Vision 3.1.0 documentation. Retrieved July 12, 2016, from http://docs.opencv.org/3.1.0/#gsc.tab=0 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49168 | - |
dc.description.abstract | In recent years, advances in technology and the spread of wireless sensing equipment and mobile devices have driven the development of many location-based services. For positioning and navigation, the Global Positioning System (GPS) is today's most widely used outdoor solution. People, however, spend a large part of their lives indoors, and the constraints of indoor environments prevent GPS from providing accurate positioning there, creating a need for indoor positioning services.
When people face an unfamiliar public environment where GPS offers no effective service, they look for textual cues such as direction signs, warning signs, road signs, or indoor maps. This thesis proposes an algorithm that mimics this visual information-retrieval and positioning strategy for navigation in unfamiliar public indoor environments. A mobile device's camera serves as the visual sensor, photographing direction signs bearing text in indoor spaces. Maximally Stable Extremal Region (MSER) detection locates the information-bearing regions in each image; the algorithm then extracts the text regions and matches them, using the Scale-Invariant Feature Transform (SIFT), against a pre-built indoor text-image map to determine the indoor position. Without any training data, the proposed algorithm achieves a text-detection precision of 74.84% on images captured with a mobile device in the crowded Taipei Main Station MRT station and other indoor public-transport environments, and a precision of 79.89% in the text-image spatial positioning experiment. These results confirm that the algorithm achieves indoor positioning without a pre-built spatial model, training data, or controlled environmental conditions. | zh_TW |
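The MSER step described in the abstract selects grayscale thresholds at which a connected region's area changes least as the threshold varies. The following is a minimal illustrative sketch of that stability criterion, not the thesis implementation: the synthetic "sign" image, the seed pixel, the threshold range, and the function names (`region_area`, `most_stable_threshold`) are all invented for this example.

```python
import numpy as np
from collections import deque

def region_area(img, thresh, seed):
    """Area of the connected dark region (pixels <= thresh) containing seed."""
    mask = img <= thresh
    if not mask[seed]:
        return 0
    h, w = img.shape
    seen, todo = {seed}, deque([seed])
    while todo:  # 4-connected flood fill
        y, x = todo.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and (ny, nx) not in seen:
                seen.add((ny, nx))
                todo.append((ny, nx))
    return len(seen)

def most_stable_threshold(img, seed, delta=10, lo=60, hi=180):
    """Threshold where the seed's region area varies least (MSER stability)."""
    best_t, best_q = None, float("inf")
    for t in range(lo, hi):
        area = region_area(img, t, seed)
        if area == 0:
            continue
        # MSER stability: relative area growth across a threshold window.
        q = (region_area(img, t + delta, seed) - region_area(img, t - delta, seed)) / area
        if q < best_q:
            best_q, best_t = q, t
    return best_t

# Synthetic sign image: bright background, a dark 4x4 "glyph" core with a softer halo.
img = np.full((10, 10), 200)
img[2:8, 2:8] = 120   # halo ring
img[3:7, 3:7] = 50    # glyph core
seed = (5, 5)
t = most_stable_threshold(img, seed)
print(t, region_area(img, t, seed))  # the stable region is the 16-pixel glyph core
```

A real MSER implementation (e.g. the linear-time variant cited by the thesis) tracks every extremal region over all thresholds via a component tree rather than flood-filling per threshold; this sketch only shows why the glyph core, whose area stays constant over a wide threshold range, is reported as a stable region.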
dc.description.abstract | GPS is the backbone of outdoor positioning and navigation applications today. In urban environments, however, people spend far more time indoors than outdoors, and GPS cannot provide precise positioning in indoor settings; the need for indoor positioning services therefore emerges.
The approach proposed in this study mimics humans' natural behavior of finding directions from traffic signs, warning signs, and indoor maps in an unfamiliar public environment without precise GPS service. In other words, this study adopts a strategy of retrieving image and positioning information for navigation in unfamiliar public indoor environments. A mobile device's camera serves as the human visual input, taking pictures of direction signs bearing text in indoor spaces. An MSER-based feature detector then detects information-bearing image regions as candidates. The algorithm extracts the text-information regions and matches them, using SIFT feature detection, against pre-established text-image maps to serve the purpose of spatial information retrieval. The proposed algorithm requires no training before installation, which saves training time and avoids overfitting. The images were taken with a mobile device in the Taipei Main Station MRT station and other indoor public-transport environments. The text-detection experiment achieved 74.84% precision, and a further text-image spatial positioning experiment built on it reached 79.89% precision. These results show that the proposed algorithm, without a pre-built spatial model or training data, is robust and applicable to uncontrolled environmental image inputs; in other words, it could serve as the core of an indoor positioning service in the future. | en |
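The matching stage described above (descriptors from a query photo matched against pre-established text-image maps) can be sketched with Lowe's ratio test, the standard way to filter SIFT correspondences. This is a toy sketch, not the thesis code: the random NumPy "descriptors", the map keys, and the names `ratio_test_matches` and `locate` are all hypothetical.

```python
import numpy as np

def ratio_test_matches(query, reference, ratio=0.75):
    """Count Lowe's-ratio-test matches between two descriptor sets.

    query, reference: (n, d) arrays of feature descriptors (e.g. 128-D SIFT).
    A query descriptor is a good match when its nearest reference descriptor
    is clearly closer than the second nearest.
    """
    good = 0
    for q in query:
        d = np.linalg.norm(reference - q, axis=1)  # distances to all references
        first, second = np.partition(d, 1)[:2]     # two smallest distances
        if first < ratio * second:
            good += 1
    return good

def locate(query_desc, text_image_map):
    """Return the map location whose stored descriptors match the query best."""
    return max(text_image_map,
               key=lambda loc: ratio_test_matches(query_desc, text_image_map[loc]))

# Toy demo: three "sign" locations with random 128-D descriptors; the query
# reuses slightly perturbed descriptors from location "B".
rng = np.random.default_rng(0)
text_image_map = {loc: rng.normal(size=(50, 128)) for loc in ("A", "B", "C")}
query = text_image_map["B"][:20] + rng.normal(scale=0.01, size=(20, 128))
print(locate(query, text_image_map))  # → B
```

The ratio test works because a genuine correspondence has one clearly closest neighbor, while a spurious descriptor is roughly equidistant from many unrelated ones; the thesis's positioning step can be read as this vote taken over the signs stored in the indoor map.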
dc.description.provenance | Made available in DSpace on 2021-06-15T11:18:04Z (GMT). No. of bitstreams: 1 ntu-105-R03525060-1.pdf: 6612689 bytes, checksum: b4b2cf138673b3ef9d92fc907cd045fc (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Oral Defense Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Table of Contents vi
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Contributions 2
1.4 Thesis Organization 3
Chapter 2 Literature Review 4
2.1 Text Image Extraction Methods 4
2.1.1 Texture-Based Methods 5
2.1.2 Region-Based Methods 6
2.2 Fundamentals of the MSER Algorithm 9
2.2.1 Maximally Stable Extremal Regions (MSER) 9
2.2.2 Linear Time Maximally Stable Extremal Regions 11
2.3 Feature Detection 12
2.3.1 Features from Accelerated Segment Test (FAST) 12
2.3.2 Scale-Invariant Feature Transform (SIFT) 13
2.4 Basic Image Processing 16
2.4.1 Edge Detection 16
2.4.2 Bilateral Filter (BF) 17
2.4.3 Thresholding 18
Chapter 3 Methodology 21
3.1 Problem Definition and Research Framework 21
3.1.1 Problem Definition 21
3.1.2 Research Framework 22
3.2 Detecting Information Locations in the Environment 23
3.3 Computational Geometry 24
3.3.1 Removing Non-Text Candidate Regions 24
3.3.2 Merging Candidate Regions 25
3.3.3 Extracting Outer Contours of Candidate Regions 26
3.3.4 Computing Luminance-Distribution Differences of Candidate Regions 29
3.3.5 Grouping Candidate Regions 30
3.4 Text-Image Feature Detection 32
3.4.1 Detecting Text Corner Features 32
3.4.2 Handling Halos of Illuminated Text 34
3.4.3 Text Edge-Strength Projection 35
3.5 Text-Image Spatial Positioning 37
Chapter 4 Experimental Results and Discussion 39
4.1 Experimental Data Collection and Construction 39
4.2 Evaluation Methods 40
4.3 Text-Image Region Detection Results 41
4.4 Text-Image Spatial Positioning Results 45
Chapter 5 Conclusions and Future Work 48
5.1 Conclusions 48
5.2 Future Work 49
References 50
Appendix 1: Text-Image Feature Detection 54
Appendix 2: Reference Positions of Signboards in Taipei Main Station MRT Station 59 | |
dc.language.iso | zh-TW | |
dc.title | 基於手持移動裝置之室內空間文字影像擷取 | zh_TW |
dc.title | Text Images Based Spatial Information Retrieval with Handheld Mobile Devices | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 張恆華(Herng-Hua Chang),傅楸善(Chiou-Shann Fuh) | |
dc.subject.keyword | Indoor Positioning, Text Image Detection, Maximally Stable Extremal Regions, Scale-Invariant Feature Transform | zh_TW |
dc.subject.keyword | Indoor Localization, Text Detection, MSER, SIFT | en |
dc.relation.page | 60 | |
dc.identifier.doi | 10.6342/NTU201602757 | |
dc.rights.note | Authorized for a fee | |
dc.date.accepted | 2016-08-20 | |
dc.contributor.author-college | College of Engineering | zh_TW |
dc.contributor.author-dept | Graduate Institute of Engineering Science and Ocean Engineering | zh_TW |
Appears in Collections: | Department of Engineering Science and Ocean Engineering
Files in This Item:
File | Size | Format |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 6.46 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.