The World Through a Robot’s Lens: Enabling Complex Decision-Making With The Help of Reactive Games
Written by: Disha Kamale, Explainable Robotics Lab, Mechanical Engineering and Mechanics Department, Lehigh University
Cover image by: Gerd Altmann
Imagine driving through a block of a city you have never seen before. You are approaching an intersection with a stop sign that is partially covered by a tree branch. The moment you see any familiar part of the sign, you reach for the brakes almost as a reflex! The speed at which our brain can infer a scene and act on it [under usual circumstances] is astonishing. Making a robot perform everyday, seemingly trivial activities such as walking or driving, on the other hand, is enormously challenging and a very active area of research. For the purposes of this article, we will focus on an autonomous robot car platform.
A self-driving car is usually equipped with a plethora of sensing modalities, ranging from range finders such as RADAR and LIDAR (Light Detection and Ranging) to cameras and ultrasonic sensors – each of which enables it to sense some specific characteristic of the surroundings. For example, a LIDAR may provide the robot with the knowledge of whether there are obstacles nearby. The camera, on the other hand, can provide much finer details, such as the color of the traffic light at the upcoming intersection.
All a robot car “sees” is a series of frames of its surroundings, limited to the fields of view of the sensors mounted on it. The detection and recognition algorithms onboard then combine the information provided by the sensors to make sense of it [understand it semantically] and convey that understanding to a planning and decision-making algorithm, which makes a decision that abides by the rules of the road [traffic rules] while making progress towards the end goal. This decision is then translated into actuator commands, and the robot executes continuous actions. In this article, we will focus on the decision-making problem described below.
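Schematically, this pipeline can be thought of as a sense-understand-decide-act loop. The sketch below only illustrates that flow; every class and method name in it is a made-up placeholder, not an actual API:

class RobotCar:
    # Hypothetical components standing in for real sensing, perception,
    # planning, and control modules.
    def __init__(self, sensors, perception, planner, controller):
        self.sensors = sensors          # e.g., LIDAR, camera, ultrasonic drivers
        self.perception = perception    # detection and recognition algorithms
        self.planner = planner          # decision-making that respects traffic rules
        self.controller = controller    # turns decisions into actuator commands

    def step(self, goal):
        frames = [s.read() for s in self.sensors]     # raw frames, limited fields of view
        scene = self.perception.interpret(frames)     # semantic understanding of the scene
        decision = self.planner.decide(scene, goal)   # rule-abiding high-level decision
        self.controller.execute(decision)             # continuous actions on the road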
Arriving at a decision can often be delayed by the robot’s perception module, since the robot’s sensing and detection framework may provide incomplete or inaccurate information. Missing and inaccurate data, caused for example by sensing limitations or weather conditions, introduce uncertainty that can expose the robot to undesirable risks.
Consider, for example, the sign shown in the figure below, where the resolution of the captured images improves only gradually as the robot travels closer to the sign.
Robot’s distance from the sign decreases ⇒
Even though the images above show only the sign, in reality the sensors onboard a robot capture everything in their fields of view: signs, poles, buildings, billboards, the road, other cars, and so on. Combing through all of this information while also not causing a traffic jam is a difficult task! Given all these challenges, how do we make sure that the robot performs its task reasonably without compromising safety?
Abstractions to the rescue
In day-to-day activities, the information humans use to aid decision-making comes in many different forms, such as context, familiarity, and experience, while irrelevant information is abstracted away. So, instead of waiting for complete information about the environment, can we make robots more responsive by allowing them to utilize all relevant information, even if it is possibly incomplete?
An abstraction is essentially a discrete representation of a continuous, real-world system. An abstraction can be represented by a structure called a graph: a set of vertices connected by edges that together represent the original continuous system. Graphs are among the main data structures widely used to model real-world systems such as the internet, communication networks, and so on. (Tangentially, here is a fun story of how graph theory originated, for the interested reader.) Now, let us look at how abstractions are used to capture the motion and sensing of the robot, described pictorially in figures [a] and [b].
a) Motion – When the robot’s surroundings are represented as a graph, the vertices indicate different regions of the road, such as an intersection or a lane, whereas the edges represent the permissible directions in which the robot can move.
b) Sensing – To model the perception knowledge, we assume that the robot is capable of identifying multiple levels of symbolic detail pertaining to an object, e.g., the color and shape of a traffic sign. As the robot physically moves closer to the sign, its sensing accuracy increases, thereby improving its semantic understanding. For instance, as the robot moves forward, it deciphers that the blue square it observed is in fact a pedestrian sign.
[a] Motion abstraction
[b] Sensing abstraction
The robot’s sensing can now be modeled as a graph wherein each vertex corresponds to a semantic label (e.g., “red”, “triangle”) while the edges denote refinements of those labels.
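To make this concrete, here is a minimal Python sketch of the two abstractions stored as adjacency maps. The vertex names are invented for illustration; in practice they come from the robot’s map and from the semantic labels its perception can report:

# (a) Motion graph: vertices are regions of the road, edges are permissible moves.
motion_graph = {
    "lane_1": ["lane_2"],
    "lane_2": ["intersection"],
    "intersection": ["lane_3"],
    "lane_3": [],
}

# (b) Observation graph: vertices are semantic labels, edges are refinements.
# Moving closer to the sign refines a coarse label into a finer one.
observation_graph = {
    "something_blue": ["blue_square"],
    "blue_square": ["pedestrian_sign"],
    "pedestrian_sign": [],
}

def successors(graph, vertex):
    # Vertices reachable in one step: one motion, or one refinement in perception.
    return graph.get(vertex, [])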
With the help of these abstractions, the robot can now reason over the relevant (and potentially incomplete) symbolic information. This approach allows us to leverage the structure of the representation to make safe, informed decisions for complex robot missions. Note that the two graphs we created above are not independent! As the robot moves from one vertex to another in the motion graph, it receives new observations, and thus its vertex in the observation graph changes [1]. These interactions need to follow certain rules that guarantee that the robot will always* make correct decisions.
*These guarantees, based on Generalized Reactivity(1) formulae [2], can only be provided as long as the observations belong to a set of permissible states.
Rules:
Loosely speaking, the rules considered for this synthesis problem are of the form:
Observation ‘k’ => execute action ‘a’
Given that the observation belongs to a set of allowable states, the action to execute is defined. These conditions are expressed as Temporal Logic formulae, which provide an unambiguous formalism for expressing missions with logical and temporal requirements. Without going into the details of formal specifications, it suffices to say that they provide safety and correctness guarantees for the robot that cannot be readily offered by black-box methods such as deep learning.
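For readers curious about the shape of such specifications: rules of this kind are typically collected into a single Generalized Reactivity(1) formula [2], which, loosely, is an assume-guarantee statement of the form below (□ reads “always”, ◇ reads “eventually”; the φ symbols are generic placeholders for the initial conditions, transition rules, and recurring goals of the environment and the robot, not the exact formula used in our work):

\[
\Big(\varphi^{\text{init}}_{env} \;\wedge\; \square\,\varphi^{\text{trans}}_{env} \;\wedge\; \bigwedge_{i}\square\lozenge\,\varphi^{\text{fair}}_{env,i}\Big) \;\longrightarrow\; \Big(\varphi^{\text{init}}_{rob} \;\wedge\; \square\,\varphi^{\text{trans}}_{rob} \;\wedge\; \bigwedge_{j}\square\lozenge\,\varphi^{\text{goal}}_{rob,j}\Big)
\]

In words: as long as the environment’s behavior satisfies the assumptions on the left, the robot’s strategy guarantees the requirements on the right.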
Putting it all together into a giant reactive game:
Here, “reactive” comes from “acting in response to a situation rather than creating or controlling it”. From the perspective of the robot’s decision-maker, the robot can only react to events in the environment, not create them; the environment, in turn, cannot be controlled by the robot and thus acts as an independent entity. Recall also that the robot motion and perception abstractions created above are not independent of each other. If we consider the robot and the environment as players, their interplay can be abstracted as a game wherein each player reacts to the actions of the other while following the rules.
The goal of the robot is to satisfy the rules (specifications) regardless of the actions of the environment. The game itself corresponds to a graph wherein each vertex denotes a state of the game, each player has a set of available actions, and the progress conditions serve as the rules.
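As a toy illustration (not the actual synthesis machinery, which tools such as TuLiP [2] implement with formal guarantees), here is a minimal Python sketch of a reactive strategy: the environment “moves” by revealing an observation, and the robot replies with the action the rules prescribe. All observation and action names below are invented for this example:

RULES = {
    # observation                       -> action the rules prescribe
    "no_sign_detected":                  "drive_slowly",
    "red_octagon_partially_visible":     "decelerate",
    "stop_sign_confirmed":               "stop_at_line",
    "intersection_clear":                "proceed",
}

def robot_strategy(observation):
    # React to the environment's move; an observation outside the permissible
    # set voids the guarantees (see the footnote above), so fall back to a safe default.
    return RULES.get(observation, "brake_safely")

# One play of the game: the environment reveals observations as the robot
# approaches the intersection; the robot reacts at every step.
for obs in ["no_sign_detected", "red_octagon_partially_visible",
            "stop_sign_confirmed", "intersection_clear"]:
    print(obs, "->", robot_strategy(obs))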
The figure below shows a robot approaching a stop sign. As the robot’s distance from the sign decreases, its perception knowledge passes through the series of states shown in blue, and at each timestep the robot takes an appropriate control action.
Using this framework, our simulations showed that the safety of the resulting decisions improved drastically when this hierarchical perception structure was used [3]. These studies were presented at the IEEE International Conference on Robotics and Automation (ICRA) 2023. Moreover, such hierarchical structures can also be used for exploring unknown environments [4].
About our lab
At the Explainable Robotics Lab (ERL), led by Dr. Cristian-Ioan Vasile, we use tools from theoretical computer science and robotics to tackle problems ranging from single-robot missions to multi-robot teams and fair planning for fleets of autonomous vehicles with complex tasks and deadlines, while providing formal guarantees of safety and correctness. We are also developing a test-bed for self-driving cars to test complex decision-making in urban environments. We are part of the Autonomous and Intelligent Robots Laboratory, Lehigh (AIRLab).
ERL Members: Dr. Cristian-Ioan Vasile, Gustavo Cardona, Kaier Liang, Disha Kamale.
Conclusion
This work was a first step towards utilizing hierarchies for decision-making. Presently, we are exploring the range of decision-making problems that can benefit from this approach. Considering the refinement in the robot’s perception and allowing coarse decisions based on incomplete but relevant perception information is a powerful way to make sure that decisions are made in a timely manner.
Appendix
Here is a real-world example of a hidden traffic sign.
Efforts by Furat Mousa, Courtesy: Google Earth
References
- [1] Gibson, J. J. (1950). The perception of the visual world.
- [2] Filippidis, I., Dathathri, S., Livingston, S. C., Ozay, N., & Murray, R. M. (2016). Control design for hybrid systems with TuLiP: The temporal logic planning toolbox. In 2016 IEEE Conference on Control Applications (CCA) (pp. 1030-1041). IEEE.
- [3] Kamale, D., Haesaert, S., & Vasile, C. I. (2023). Cautious planning with incremental symbolic perception: Designing verified reactive driving maneuvers. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 1652-1658). IEEE.
- [4] Kamale, D., Haesaert, S., & Vasile, C. I. (2023). Energy-constrained active exploration under incremental-resolution symbolic perception. arXiv preprint arXiv:2309.07347.