A home robot trained to perform household tasks in a factory may fail to effectively scrub the sink or take out the trash when deployed in a user's kitchen, since this new environment differs from its training space.
To avoid this, engineers often try to match the simulated training environment as closely as possible to the real world where the agent will be deployed.
However, researchers from MIT and elsewhere have now found that, despite this conventional wisdom, sometimes training in a completely different environment yields a better-performing artificial intelligence agent.
Their results indicate that, in some situations, training a simulated AI agent in a world with less uncertainty, or “noise,” enabled it to perform better than a competing AI agent trained in the same noisy world used to test both agents.
The researchers call this unexpected phenomenon the indoor training effect.
“If we learn to play tennis in an indoor environment where there is no noise, we might be able to more easily master different shots. Then, if we move to a noisier environment, like a windy tennis court, we could have a higher likelihood of playing tennis well than if we started learning in the windy environment,” explains Serena Bono, a research assistant in the MIT Media Lab and lead author of a paper on the indoor training effect.
The researchers studied this phenomenon by training AI agents to play Atari games, which they modified by adding some unpredictability. They were surprised to find that the indoor training effect consistently occurred across Atari games and game variations.
They hope these results fuel additional research into developing better training methods for AI agents.
“This is an entirely new axis to think about. Rather than trying to match the training and testing environments, we may be able to construct simulated environments where an AI agent learns even better,” adds co-author Spandan Madan, a graduate student at Harvard University.
Bono and Madan are joined on the paper by Ishaan Grover, an MIT graduate student; Mao Yasueda, a graduate student at Yale University; Cynthia Breazeal, professor of media arts and sciences and leader of the Personal Robotics Group in the MIT Media Lab; Hanspeter Pfister, the Wang Professor of Computer Science at Harvard; and Gabriel Kreiman, a professor at Harvard Medical School. The research will be presented at the Association for the Advancement of Artificial Intelligence Conference.
Training troubles
The researchers set out to explore why reinforcement learning agents tend to have such dismal performance when tested on environments that differ from their training space.
Reinforcement learning is a trial-and-error method in which the agent explores a training space and learns to take actions that maximize its reward.
The team developed a technique for explicitly adding a certain amount of noise to one element of the reinforcement learning problem called the transition function. The transition function defines the probability an agent will move from one state to another, based on the action it chooses.
If the agent is playing Pac-Man, the transition function might define the probability that ghosts on the game board will move up, down, left, or right. In standard reinforcement learning, the AI would be trained and tested using the same transition function.
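To make this concrete, here is a minimal Python sketch of a toy transition function for a Pac-Man-style ghost. The grid size, the function name, and modeling noise as an occasional random teleport are illustrative assumptions, not the researchers' actual code or game setup.

```python
import random

# Toy grid for a Pac-Man-style ghost (illustrative only; not the study's code).
GRID_W, GRID_H = 8, 8
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right

def ghost_step(pos, noise=0.0):
    """Sample the ghost's next position from a simple transition function.

    With probability `noise`, the ghost teleports to a random square,
    mimicking noise injected into the transition function; otherwise it
    takes a uniformly random one-step move, clamped to the board.
    """
    if random.random() < noise:
        return (random.randrange(GRID_W), random.randrange(GRID_H))
    dx, dy = random.choice(MOVES)
    return (min(max(pos[0] + dx, 0), GRID_W - 1),
            min(max(pos[1] + dy, 0), GRID_H - 1))

# "Indoor" training uses the noise-free dynamics; testing injects noise.
train_step = lambda pos: ghost_step(pos, noise=0.0)
test_step = lambda pos: ghost_step(pos, noise=0.2)
```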
The researchers added noise to the transition function with this conventional approach and, as expected, it hurt the agent's Pac-Man performance.
But when the researchers trained an agent in a noise-free Pac-Man game, then tested it in an environment where they injected noise into the transition function, it performed better than an agent trained on the noisy game.
“The rule of thumb is that you should try to capture the deployment condition's transition function as well as you can during training to get the most bang for your buck. We really tested this insight to death because we couldn't believe it ourselves,” says Madan.
Injecting different amounts of noise into the transition function let the researchers test many environments, but it didn't create realistic games. The more noise they injected into Pac-Man, the more likely the ghosts were to randomly teleport to different squares.
To see whether the indoor training effect occurred in normal Pac-Man games, they adjusted the underlying probabilities so the ghosts moved normally but were more likely to move up and down, rather than left and right. AI agents trained in noise-free environments still performed better in these realistic games.
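As a rough illustration of this more realistic variant, the ghost below still takes ordinary single-step moves, just with a distribution biased toward up and down. The specific probabilities are assumptions made for the sake of the example, not values from the paper.

```python
import random

# Hypothetical biased movement distribution: ghosts move normally,
# but up/down are more likely than left/right (probabilities are made up).
DIRECTIONS = ["up", "down", "left", "right"]
BIASED_PROBS = [0.35, 0.35, 0.15, 0.15]

def biased_ghost_direction():
    """Sample a movement direction from the biased distribution."""
    return random.choices(DIRECTIONS, weights=BIASED_PROBS, k=1)[0]
```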
“It is not only due to the way we added noise to create ad hoc environments. This seems to be a property of the reinforcement learning problem. And that was even more surprising to see,” says Bono.
Exploration explanations
When the researchers dug deeper in search of an explanation, they saw some correlations in how the AI agents explore the training space.
If both AI agents mostly explore the same areas, the agent trained in the non-noisy environment performs better, perhaps because it is easier for the agent to learn the rules of the game without the interference of noise.
If their exploration patterns are different, then the agent trained in the noisy environment tends to perform better. This might occur because the agent needs to understand patterns it can't learn in the noise-free environment.
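One simple way to quantify how similar two agents' exploration is would be to compare their state-visitation distributions. The histogram-intersection metric sketched below is an illustrative choice, not necessarily the measure used in the study.

```python
from collections import Counter

def visitation_overlap(states_a, states_b):
    """Overlap between two agents' state-visitation distributions.

    `states_a` and `states_b` are lists of (hashable) states each agent
    visited during training. Returns a value in [0, 1]; 1 means the two
    agents explored the state space identically. Illustrative only.
    """
    counts_a, counts_b = Counter(states_a), Counter(states_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    return sum(min(counts_a[s] / total_a, counts_b[s] / total_b)
               for s in counts_a.keys() | counts_b.keys())
```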
“If I only learn to play tennis with my forehand in the non-noisy environment, but then, in the noisy one, I also have to play with my backhand, I won't play as well in the noisy environment,” Bono explains.
In the future, the researchers hope to explore how the indoor training effect might occur in more complex reinforcement learning environments, or with other techniques like computer vision and natural language processing. They also want to build training environments designed to leverage the indoor training effect, which could help AI agents perform better in uncertain environments.