CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents

Korea University, Korea
Yonsei University, Korea
Google Research, USA

News

Our paper has been accepted to IEEE Robotics and Automation Letters (RA-L)! 🥳🥳
We presented our paper at ICRA 2024.

Abstract

In this paper, we focus on inferring whether a given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents that use large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once a command is classified as uncertain, we further determine whether it is ambiguous or infeasible by leveraging LLMs with situationally aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could reduce malfunctions and undesired actions of the robot, enhancing the reliability of interactive robotic agents. We present a dataset for robotic situational awareness, consisting of pairs of high-level commands and scene descriptions, together with labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset and on a pick-and-place tabletop simulation. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments, i.e., handover scenarios.
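
The three-way decision described in the abstract can be sketched as a short control flow. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the uncertainty estimator, so the code below substitutes a simple disagreement score over sampled LLM plans, and the names `disagreement_uncertainty`, `handle_command`, `sample_plans`, `check_feasible`, `ask_question`, and the threshold value `0.3` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical callables standing in for LLM calls (assumptions, not the paper's API).
Sampler = Callable[[str, List[str]], List[str]]
FeasibilityCheck = Callable[[str, List[str]], bool]
QuestionGenerator = Callable[[str, List[str]], str]


def disagreement_uncertainty(samples: List[str]) -> float:
    """Stand-in uncertainty score: fraction of sampled plans that differ
    from the most frequent plan. 0.0 means every sample agrees."""
    if not samples:
        raise ValueError("need at least one sampled plan")
    top_count = Counter(samples).most_common(1)[0][1]
    return 1.0 - top_count / len(samples)


def handle_command(
    command: str,
    scene: List[str],
    sample_plans: Sampler,
    check_feasible: FeasibilityCheck,
    ask_question: QuestionGenerator,
    threshold: float = 0.3,  # assumed value, chosen only for illustration
) -> Dict[str, Optional[object]]:
    """Classify a command as clear, ambiguous, or infeasible."""
    plans = sample_plans(command, scene)
    uncertainty = disagreement_uncertainty(plans)

    # Step 1: certain (clear) vs. uncertain.
    if uncertainty <= threshold:
        action = Counter(plans).most_common(1)[0][0]
        return {"type": "clear", "uncertainty": uncertainty, "action": action}

    # Step 2: uncertain commands are either infeasible or ambiguous.
    if not check_feasible(command, scene):
        return {"type": "infeasible", "uncertainty": uncertainty, "action": None}

    # Step 3: ambiguous commands trigger a clarifying question to the user.
    question = ask_question(command, scene)
    return {"type": "ambiguous", "uncertainty": uncertainty, "question": question}


if __name__ == "__main__":
    # Stub callables replace real LLM calls so the sketch runs offline.
    result = handle_command(
        "Give a can to the person",
        ["coca cola can", "starbucks can"],
        sample_plans=lambda c, s: ["pick coca cola can", "pick starbucks can"],
        check_feasible=lambda c, s: True,
        ask_question=lambda c, s: "Which can would you like?",
    )
    print(result["type"], result.get("question"))
```

In a real system each stub would be replaced by an LLM query conditioned on the scene description; the staged structure (uncertainty first, then feasibility, then question generation) is the only part taken from the abstract.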

Video Presentation

Proposed method

Demonstrations


User: Give a person wearing a green shirt a monster can
Uncertainty: 0.0
robot.pick_and_give(monster can, person wearing green shirt)

Situational Awareness for Goal Classification in Robotic Tasks (SaGC)


We collected a dataset of high-level goals paired with scene descriptions, each annotated with one of three command types: clear, ambiguous, or infeasible. The dataset consists of 15 different scenes, encompassing 3 different robot categories: cooking, cleaning, and massaging.
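
As a rough illustration of what one entry in such a dataset might look like, the sketch below defines a small record type. The class name `SaGCExample`, its field names, and the example values are hypothetical and are not taken from the released dataset; only the three labels and the three robot categories come from the description above.

```python
from dataclasses import dataclass

# Label and category names come from the dataset description;
# everything else in this sketch is an assumption.
LABELS = ("clear", "ambiguous", "infeasible")
ROBOT_CATEGORIES = ("cooking", "cleaning", "massaging")


@dataclass(frozen=True)
class SaGCExample:
    """One hypothetical dataset entry: a goal, its scene, and its label."""

    goal: str    # high-level user goal
    scene: str   # textual scene description
    robot: str   # robot category
    label: str   # command type

    def __post_init__(self) -> None:
        # Reject entries whose category or label is outside the known sets.
        if self.robot not in ROBOT_CATEGORIES:
            raise ValueError(f"unknown robot category: {self.robot!r}")
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


# Made-up example values, for illustration only.
example = SaGCExample(
    goal="wipe the table",
    scene="a table, a sponge, a sink",
    robot="cleaning",
    label="clear",
)
```

Validating the label and category at construction time keeps malformed entries from silently entering an evaluation loop.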


Examples of explanations and questions generated by the proposed method. F, R, and Q denote Feasibility, Reasoning, and Question, respectively.

PickNPlace Simulation


Examples of explanations and questions generated in the tabletop simulation. Try out the demo here: DEMO.

Failure Cases


Vision: coca cola can, coca cola can, starbucks can, orange
Detection Failure: Redbull detected as coca cola
User: Give Redbull to a person wearing green shirt
Uncertainty: 0.54
Feasibility: Given the current scene, there is no redbull can available. Therefore, I cannot give a redbull can to a person wearing a green shirt.
Robot Stops

BibTeX

@article{park2024CLARA,
          journal={IEEE Robotics and Automation Letters}, 
          title={CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents}, 
          author={Jeongeun Park and Seungwon Lim and Joonhyung Lee and Sangbeom Park and Minsuk Chang and Youngjae Yu and Sungjoon Choi},
          year={2024},
          volume={9},
          number={2},
          pages={1059-1066},
          doi={10.1109/LRA.2023.3338514}}