Workflow for completing natural-language requests with a metric-semantic representation of the environment
Abstract
In mobile robotics and autonomous systems, a natural-language request can be completed by converting it into high-level and low-level tasks. Fulfilling such a request requires implementing both types of tasks, together with an efficient method to bridge them; however, how best to do this remains an open problem. To address it, this work presents a two-phase workflow (figure 1), consisting of Comprehension and Implementation, built on a metric-semantic map. In the Comprehension phase, also referred to as automated planning, the natural-language request is converted into actionable plans using semantic information from the map. These plans are then passed to the Implementation phase, where tasks such as navigation or manipulation are executed using geometric information from the map. We also conduct an experiment illustrating how a natural-language request is carried out on a specific metric-semantic representation of the environment, namely a 3D Scene Graph, covering the complete pipeline from constructing the 3D Scene Graph to obtaining a feasible output path. Finally, this work highlights limitations that must be addressed in future work to enhance the proposed workflow.
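To make the division of labour between the two phases concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The toy `SceneGraph` (a semantic layer mapping objects to rooms, plus a metric layer of place nodes with 2D positions), the functions `comprehend`, `implement`, and `shortest_path`, and the `DEMO` scene are all hypothetical. Keyword matching stands in for whatever planner performs Comprehension, and A* over place nodes stands in for the path planner used in Implementation.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass


@dataclass
class SceneGraph:
    """Hypothetical toy metric-semantic map (not a real 3D Scene Graph API)."""
    object_room: dict[str, str]                # semantic: object label -> room label
    places: dict[str, tuple[float, float]]     # metric: place node -> (x, y) in metres
    edges: dict[str, list[str]]                # traversable links between place nodes
    room_place: dict[str, str]                 # room label -> a place node inside it
    object_place: dict[str, str]               # object label -> nearest place node


def comprehend(request: str, graph: SceneGraph) -> list[tuple[str, str]]:
    """Comprehension: turn the request into (action, target) steps using only
    semantic information. Keyword matching is a stand-in for a real planner."""
    words = request.lower().replace(".", "").split()
    plan = []
    for obj, room in graph.object_room.items():
        if obj in words:
            plan.append(("goto_room", room))
            plan.append(("goto_object", obj))
    return plan


def shortest_path(graph: SceneGraph, start: str, goal: str) -> list[str]:
    """A* over place nodes, using Euclidean distance as edge cost and heuristic."""
    def h(node: str) -> float:
        return math.dist(graph.places[node], graph.places[goal])

    frontier = [(h(start), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in graph.edges[node]:
            cost = g + math.dist(graph.places[node], graph.places[nxt])
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                heapq.heappush(frontier, (cost + h(nxt), cost, nxt, path + [nxt]))
    return []  # goal unreachable


def implement(plan: list[tuple[str, str]], graph: SceneGraph, start: str) -> list[str]:
    """Implementation: ground each plan step in the metric layer and chain paths."""
    route, current = [start], start
    for action, target in plan:
        goal = graph.room_place[target] if action == "goto_room" else graph.object_place[target]
        segment = shortest_path(graph, current, goal)
        if not segment:
            raise ValueError(f"no feasible path to {target}")
        route.extend(segment[1:])  # skip the duplicated start node of each segment
        current = goal
    return route


# Hypothetical three-room scene used only for illustration.
DEMO = SceneGraph(
    object_room={"fridge": "kitchen", "bed": "bedroom"},
    places={"p_hall": (0.0, 0.0), "p_kitchen": (4.0, 0.0),
            "p_fridge": (5.0, 1.0), "p_bedroom": (0.0, 4.0)},
    edges={"p_hall": ["p_kitchen", "p_bedroom"], "p_kitchen": ["p_hall", "p_fridge"],
           "p_fridge": ["p_kitchen"], "p_bedroom": ["p_hall"]},
    room_place={"kitchen": "p_kitchen", "bedroom": "p_bedroom"},
    object_place={"fridge": "p_fridge", "bed": "p_bedroom"},
)

if __name__ == "__main__":
    plan = comprehend("Go to the fridge.", DEMO)
    print(plan)                               # semantic plan from Comprehension
    print(implement(plan, DEMO, "p_hall"))    # metric route from Implementation
```

The point of the split is that `comprehend` never touches coordinates and `implement` never parses language; the scene graph is the only interface between them, which mirrors the role the metric-semantic map plays in the workflow described above.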