OpenAI Wants AI to Help People Train AI

One of the key ingredients that made ChatGPT an overnight success was an army of human trainers who gave the AI model behind the bot guidance on what counts as good and bad output. OpenAI now says that adding even more artificial intelligence to the mix, to assist those human trainers, could make AI helpers smarter and more reliable.

In developing ChatGPT, OpenAI pioneered the use of reinforcement learning from human feedback, or RLHF. This technique uses input from human testers to fine-tune an AI model so that its output is judged to be more consistent, less controversial, and more accurate. The ratings that trainers give feed into an algorithm that drives the model's behavior. The technique has proven crucial both for making chatbots more reliable and useful and for preventing them from misbehaving.

“RLHF works very well, but has some key limitations,” says Nat McAleese, a researcher at OpenAI involved in the new work. First, human feedback can be inconsistent. Second, even skilled people can find it hard to evaluate extremely complex outputs, such as sophisticated software code. The process can also optimize a model to produce output that seems convincing rather than output that is actually accurate.

OpenAI developed a new model by fine-tuning its most powerful offering, GPT-4, to assist human trainers tasked with evaluating code. The company found that the new model, called CriticGPT, could catch bugs that humans missed, and that human judges preferred its critiques of code 63 percent of the time. OpenAI says it will look at extending the approach to areas beyond code in the future.

“We’re starting to work on integrating this technique into our RLHF chat stack,” McAleese says. He notes that the approach is imperfect, because CriticGPT can also make mistakes by hallucinating, but he adds that the technique could help make OpenAI's models, as well as tools like ChatGPT, more accurate by reducing errors in human training. He adds that it could also prove crucial in helping AI models become much smarter, because it may allow humans to help train an AI that exceeds their own abilities. “And as the models get better, we suspect that humans will need more help,” McAleese says.

The new technique is one of many being developed to improve large language models and squeeze more abilities out of them. It is also part of an effort to ensure that AI behaves in acceptable ways even as it becomes more capable.

Earlier this month, Anthropic, an OpenAI rival founded by former OpenAI employees, announced a more capable version of its own chatbot, called Claude, thanks to improvements in the model's training regimen and the data it is fed. Anthropic and OpenAI have also both recently touted new ways of inspecting AI models to understand how they arrive at their output, in order to better prevent undesirable behaviors like cheating.

The new technique could help OpenAI train increasingly powerful AI models while ensuring that their output is more trustworthy and aligned with human values, especially if the company successfully deploys it in more areas than code. OpenAI has said that it is training its next major AI model, and the company is clearly keen to show that it is serious about ensuring that the model behaves properly. This follows the dissolution of a prominent team dedicated to assessing the long-term risks posed by AI. The team was co-led by Ilya Sutskever, a cofounder of the company and former board member who briefly pushed CEO Sam Altman out of the company before recanting and helping him regain control. Several members of that team have since criticized the company for moving riskily as it rushes to develop and commercialize powerful AI algorithms.

Dylan Hadfield-Menell, an MIT professor who studies ways to align AI, says the idea of having AI models help train more powerful ones has been around for some time. “It’s quite a natural progression,” he says.

Hadfield-Menell notes that the researchers who originally developed the techniques used in RLHF discussed related ideas a few years ago. He says it remains unclear how widely applicable and effective the approach is. “It could lead to big leaps in individual capabilities and could be a stepping stone toward more effective feedback loops in the long term,” he says.
