As autonomous systems and artificial intelligence become more common in everyday life, new methods are emerging to help people check whether these systems are behaving as expected. One method, called formal specifications, uses mathematical formulas that can be translated into natural-language expressions. Some researchers argue that this method could be used to specify the decisions an AI will make in a way that humans can understand.
Scientists at MIT Lincoln Laboratory wanted to test such claims of interpretability. Their findings suggest the opposite: formal specifications don’t seem to be interpretable by humans. In the team’s study, participants were asked to check whether an AI agent’s plan would succeed in a virtual game. When presented with the formal specification of the plan, participants were correct less than half the time.
“The results are bad news for researchers who have argued that formal methods make systems interpretable. That may be true in some limited and abstract sense, but not in anything approaching practical system validation,” says Hosea Siu, a researcher in the laboratory’s AI Technology Group. The group’s paper was accepted to the 2023 International Conference on Intelligent Robots and Systems, which took place earlier this month.
Interpretability is critical because it allows people to trust a machine when it’s being used in the real world. If a robot or AI can explain its actions, people can decide whether it needs correction or can be trusted to make straightforward decisions. An interpretable system also allows users of the technology—not just programmers—to understand and trust its capabilities. But interpretability has long been a challenge in AI and autonomy. Because the machine learning process takes place in a “black box,” modelers often can’t explain why or how the system reached a particular decision.
“When researchers say, ‘Our machine learning system is accurate,’ we ask, ‘How accurate?’ and ‘What data does it use?’ and if that information is not provided, we reject the claim. We haven’t done that when researchers say, ‘Our machine learning system is interpretable,’ and we need to start subjecting those claims to more scrutiny,” Siu says.
Meaning lost in translation
In the experiment, the researchers sought to determine whether formal specifications make a system’s behavior more understandable. They focused on people’s ability to apply such specifications to validate the system—that is, to understand whether the system always met the user’s goals.
The use of formal specifications for this purpose is essentially a byproduct of their original purpose. Formal specifications are part of a broader set of formal methods that use logical expressions as a mathematical framework for describing the behavior of a model. Because the model is built on a logical flow, engineers can use “model checkers” to mathematically prove facts about the system, including when it is or is not possible for the system to perform a task. Now scientists are trying to use this same framework as a translational tool for humans.
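To make that idea concrete, here is a minimal sketch of what a model checker does: it exhaustively explores every reachable state of a system to prove that a property always holds. The toy robot model and the “battery never drops below zero” property below are hypothetical illustrations, not taken from the study.

```python
# A minimal sketch of exhaustive model checking over a toy system.
# The robot model (positions, battery, recharge-at-base rule) is hypothetical.
from collections import deque

# Each state is (position, battery). The robot can move toward the flag
# (costing 1 unit of battery) or recharge, but only at its base (position 0).
def successors(state):
    pos, battery = state
    nxt = []
    if battery > 0 and pos < 3:
        nxt.append((pos + 1, battery - 1))   # move toward the flag
    if pos == 0:
        nxt.append((pos, 3))                 # recharge at base
    return nxt

def always_holds(initial, invariant):
    """Breadth-first search over all reachable states; returns True only if
    the invariant is true in every one of them (a simple safety check)."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return False
        for s in successors(state):
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return True

# Property to prove: the battery never drops below zero.
print(always_holds((0, 3), lambda s: s[1] >= 0))  # True
```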
“Scientists confuse the fact that formal specifications have precise semantics with the fact that they are interpretable to humans. They are not the same thing,” Siu says. “We realized that almost no one had checked whether people actually understood the results.”
In the experiment, the team asked participants to test a fairly uncomplicated set of behaviors from a robot playing a “capture the flag” game, essentially answering the question, “If the robot follows these rules exactly, will it always win?”
Participants included both experts and laypeople in formal methods. They were given formal specifications in three ways—a “raw” logical formula, a formula translated into words closer to natural language, and a decision tree format. Decision trees are often considered in the AI world as a way for humans to interpret decisions made by an AI or robot.
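For illustration only, the hypothetical snippet below renders a single made-up “capture the flag” rule in the three formats participants saw: a raw temporal-logic formula, a natural-language translation, and a decision tree. None of it is drawn from the study’s actual rulesets.

```python
# Illustrative only: one toy rule expressed three ways, mirroring the
# presentation formats described in the study. All names are hypothetical.

# 1. "Raw" logical formula (linear temporal logic, written as a string)
raw_formula = "G(enemy_nearby -> X retreat) & F(reach_flag)"

# 2. Natural-language translation of the same rule
natural_language = (
    "Whenever an enemy is nearby, retreat on the next step; "
    "and eventually reach the flag."
)

# 3. Decision-tree rendering of the per-step choice
def next_action(enemy_nearby: bool, holding_flag: bool) -> str:
    if enemy_nearby:
        return "retreat"
    elif holding_flag:
        return "return_to_base"
    else:
        return "move_toward_flag"

print(next_action(enemy_nearby=True, holding_flag=False))  # retreat
```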
Results: “Overall, validation performance was quite poor, with accuracy around 45 percent, regardless of presentation type,” Siu says.
Confidently wrong
People who had previously been trained in formal specifications performed only slightly better than novices. However, experts reported significantly greater confidence in their answers, whether they were correct or not. Across the board, people tended to over-trust the correctness of the specifications put in front of them, overlooking rule sets that would allow the game to be lost. This confirmation bias is particularly worrisome for system validation, the researchers say, because people are more likely to miss failure modes.
“We don’t think this result means we should abandon formal specifications as a way to explain system behavior to people. But we do think that a lot more work needs to be done in designing how they are presented to people and the workflows in which people use them,” Siu adds.
When considering why the results were so poor, Siu notes that even people who work on formal methods aren’t really trained to check specifications in the way the experiment required. And thinking through all the possible outcomes of a set of rules is demanding. Still, the rule sets shown to participants were short, no more than a paragraph of text, “significantly shorter than anything you’d encounter in any real system,” Siu says.
The team isn’t trying to tie their results directly to the performance of humans in real-world robot validation. Instead, they plan to use the results as a starting point for considering what the formal logic community may be missing when it claims that specifications are interpretable, and how such claims might hold up in the real world.
The research was done as part of a larger project that Siu and his colleagues are working on to improve the relationship between robots and human operators, especially those in the military. The process of programming robotics can often leave operators out of the loop. With a similar goal of improving interpretability and trust, the project is trying to let operators teach tasks directly to robots, in a way similar to training humans. Such a process could improve both the operator’s confidence in the robot and the robot’s adaptability.
The researchers hope that the results of this study and ongoing research will allow for better use of autonomy as it becomes more embedded in human life and decision-making.
“Our results suggest that human evaluations of some autonomy and AI systems and concepts are necessary before making too many claims about their usefulness to humans,” Siu says.