Hierarchical Semantic Mapping using Deep Convolutional Neural Networks for Autonomous Mobile Service Robot

Michael Chiou; 邱名彥

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70241

Title:	以深層卷積神經網路實現階層式語意地圖於服務型機器人之應用 Hierarchical Semantic Mapping using Deep Convolutional Neural Networks for Autonomous Mobile Service Robot
Authors:	Michael Chiou 邱名彥
Advisor:	羅仁權
Keyword:	Service Robots, Semantic Mapping, Topological Maps, Object Recognition, Human-Robot Interactions, Knowledge Representation, Simultaneous Localization and Mapping (SLAM), Convolutional Neural Networks
Publication Year :	2018
Degree:	Master's
Abstract:	As service robots become increasingly common in public areas such as hotels and hospitals, the way robots interact with people is changing. Robots are beginning to interact with the public autonomously instead of being directly controlled by trained personnel. For meaningful and successful interactions, service robots need to understand both the geometric and the semantic properties of their surroundings. For example, suppose a hospital service robot is told to deliver medicine to Patient Two in Ward One. The robot needs to understand not only how to navigate to different wards, but also which patient is in which ward.
If the service robot does not understand the environment's semantic properties (in this case, Patient Two and Ward One), it must systematically navigate through every area and visit every patient. Given that an environment can contain numerous areas and hundreds of patients, a systematic search is grossly inadequate for service tasks in human-robot interaction. This example illustrates the need for a semantic map representation that lets robots perform tasks using human-centric terms instead of map coordinates. Spoken language is the most common and intuitive way for people to communicate, and for a robot to understand a person's words and expressions it must perceive the world through the same spatial and semantic concepts that people use. A first step is to learn the environment as people do, building a semantic map that shares common concepts such as "living room" or "office". How to effectively create a semantic map with labels such as reception, ward, and hallway has been a long-standing interest in the robotics community. Current semantic mapping works can identify only a single level of abstraction, such as differentiating between doorways, corridors, and rooms; other works produce semantic maps that are incomplete or inaccurately labeled; still others rely on large convolutional neural networks trained for scene recognition. Most semantic mapping methods run offline on desktop computers with high-power hardware, which is unsuitable for mobile robots with limited resources. Other methods provide too little semantic information for a robot to work effectively in large dynamic environments. This thesis proposes a semantic mapping system that combines a convolutional neural network (ConvNet) trained for object recognition (not scene recognition), room segmentation methods, and a hybrid metric-topological map. Using a ConvNet trained for object recognition reduces the size of the network so that mobile systems can run it, while also avoiding the training-data bias associated with scene recognition.
Scene recognition training requires extremely large amounts of data and can be biased toward recognizing only specific scenes, whereas object recognition generalizes better. To prevent gross mislabeling or inaccurate labeling in metric space, we use a metric-topological map. We take the semantic information generated by the object recognition ConvNet and store it in topological nodes. To classify rooms correctly using only the relevant semantic information, we use room segmentation methods to create spatial-temporal coherence between pieces of semantic information. This allows the robot to identify rooms more accurately by collecting more information before assigning a label, unlike a ConvNet trained for scene recognition, which may label a room from a single data point. We organize the semantic information in a hierarchical structure to allow for awareness of abstract concepts such as rooms. The method is tested in many simulated indoor environments and in a real environment on a service robot with an embedded computing platform. The experimental results show that the semantic mapping algorithms are lightweight enough to run on the robot system and provide the robot with enough semantic awareness to perform different types of service commands, with results comparable to other works. To summarize, this thesis contributes the following: we achieve better semantic mapping results using a smaller convolutional neural network for object recognition, as opposed to conventional methods that use ConvNets trained for scene recognition; the proposed method is lightweight enough to run completely online on a real mobile robot platform; and our semantic map lets robots recognize abstract concepts while retaining knowledge of objects, such as the location of a knife in a kitchen.
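The idea described in the abstract (object detections stored in topological nodes, grouped by room segment, then used to label rooms in a hierarchy) can be illustrated with a minimal sketch. This is not the thesis implementation: the `OBJECT_ROOM_HINTS` table, its weights, the node dictionary layout, and the function names `build_semantic_map` and `find_object` are all assumptions made up for illustration, and a simple weighted vote stands in for whatever classification rule the thesis actually uses.

```python
from collections import defaultdict

# Hypothetical object-to-room evidence weights (illustrative only, not from the thesis).
OBJECT_ROOM_HINTS = {
    "knife": {"kitchen": 1.0},
    "oven": {"kitchen": 1.0},
    "bed": {"ward": 1.0},
    "desk": {"office": 1.0},
    "monitor": {"office": 0.7, "ward": 0.3},
}


def build_semantic_map(nodes):
    """Group per-node detections by room segment and vote on a room label.

    nodes: iterable of dicts with keys 'id', 'segment', and 'objects',
    where 'objects' is a list of (label, confidence) pairs.
    Returns {segment: {'label': str, 'objects': {object_label: [node ids]}}}.
    """
    rooms = {}
    for node in nodes:
        room = rooms.setdefault(
            node["segment"],
            {"scores": defaultdict(float), "objects": defaultdict(list)},
        )
        for label, confidence in node["objects"]:
            # Keep object-level knowledge: remember which node saw the object.
            room["objects"][label].append(node["id"])
            # Accumulate room-level evidence across every node in the segment.
            for room_type, weight in OBJECT_ROOM_HINTS.get(label, {}).items():
                room["scores"][room_type] += confidence * weight

    semantic_map = {}
    for segment, room in rooms.items():
        scores = room["scores"]
        # A segment with no usable evidence stays unlabeled instead of guessing.
        label = max(scores, key=scores.get) if scores else "unknown"
        semantic_map[segment] = {"label": label, "objects": dict(room["objects"])}
    return semantic_map


def find_object(semantic_map, room_label, object_label):
    """Return ids of topological nodes where object_label was seen in rooms labeled room_label."""
    hits = []
    for room in semantic_map.values():
        if room["label"] == room_label:
            hits.extend(room["objects"].get(object_label, []))
    return hits


EXAMPLE_NODES = [
    {"id": 0, "segment": "A", "objects": [("knife", 0.9), ("monitor", 0.4)]},
    {"id": 1, "segment": "A", "objects": [("oven", 0.8)]},
    {"id": 2, "segment": "B", "objects": [("bed", 0.7), ("monitor", 0.6)]},
    {"id": 3, "segment": "C", "objects": []},
]
```

Calling `build_semantic_map(EXAMPLE_NODES)` and then `find_object(semantic_map, "kitchen", "knife")` mirrors the abstract's closing example of locating a knife in a kitchen. Because evidence is pooled over all nodes in a segment, a single ambiguous detection (the monitor in segment A) does not decide the room label on its own.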
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70241
DOI:	10.6342/NTU201802896
Fulltext Rights:	Paid license
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
ntu-107-1.pdf Restricted Access	39.71 MB	Adobe PDF
