ChatGPT developer OpenAI’s approach to building artificial intelligence has faced criticism this week from former employees who accuse the company of taking unnecessary risks with technology that could become harmful.
OpenAI published a new research paper today that seemed intended to show that it is serious about tackling AI risks by making its models more transparent. In the paper, company researchers lay out a way to peer inside the artificial intelligence model that underpins ChatGPT. They have developed a method for identifying how the model stores certain concepts, including those that might cause an AI system to misbehave.
While the study makes OpenAI’s work to keep AI under control more visible, it also highlights recent turmoil at the company. The new research was conducted by OpenAI’s recently disbanded superalignment team, which was tasked with studying the long-term risks posed by the technology.
The paper’s co-authors include the team’s former co-leads, Ilya Sutskever and Jan Leike, both of whom have since left OpenAI. Sutskever, the company’s co-founder and former chief scientist, was among the board members who voted to fire OpenAI CEO Sam Altman last November, triggering several days of chaos that culminated in Altman’s return to leadership.
ChatGPT is built on a family of so-called large language models known as GPT, which are based on a machine learning approach called artificial neural networks. These mathematical networks have shown great ability to learn useful tasks by analyzing example data, but their inner workings cannot be examined as easily as those of conventional computer programs. The intricate interplay between the layers of “neurons” in an artificial neural network makes it extremely difficult to reverse engineer why a system like ChatGPT came up with a particular answer.
“Unlike most human creations, we don’t really understand the inner workings of neural networks,” the researchers write in an accompanying blog post. Some prominent AI researchers believe that the most powerful AI models, including ChatGPT, could be used to help design chemical or biological weapons and coordinate cyberattacks. A longer-term concern is that AI models may choose to hide information or act in harmful ways to achieve their goals.
The new OpenAI paper describes a technique that lessens the mystery a little: it uses an additional machine learning model to identify patterns that represent specific concepts inside the machine learning system being studied. The key innovation is a refinement of this auxiliary network, the one used to look inside the system of interest and pick out concepts, that makes it more effective.
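The article does not spell out how the auxiliary model works. As a rough illustration only, the sketch below assumes it is a k-sparse ("TopK") autoencoder, the kind of model OpenAI’s paper describes. It takes one activation vector from the model being studied, scores it against a larger set of "latents," keeps only the k strongest, and rebuilds the activation from those few. The weights, sizes, and function names are hypothetical and hand-picked; nothing here is trained, and this is not OpenAI’s released code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def topk_relu(pre: Vector, k: int) -> Vector:
    """Keep the k largest pre-activations (clamped at zero) and zero the rest."""
    top = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    return [max(pre[i], 0.0) if i in top else 0.0 for i in range(len(pre))]


def encode(x: Vector, w_enc: Matrix, b_pre: Vector, k: int) -> Vector:
    """Map one activation vector x to a sparse vector of latents (candidate concepts)."""
    centered = [xi - bi for xi, bi in zip(x, b_pre)]
    return topk_relu(matvec(w_enc, centered), k)


def decode(z: Vector, w_dec: Matrix, b_pre: Vector) -> Vector:
    """Reconstruct the activation as b_pre plus a weighted sum of decoder columns."""
    return [r + b for r, b in zip(matvec(w_dec, z), b_pre)]


# Hypothetical, hand-picked (untrained) weights: 3 activation dims, 4 latents.
W_ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
W_DEC = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
B_PRE = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    x = [0.2, 0.9, 0.1]  # stand-in for one activation vector from the studied model
    z = encode(x, W_ENC, B_PRE, k=2)
    print("active latents:", [i for i, v in enumerate(z) if v > 0])  # → active latents: [1, 3]
    print("reconstruction:", decode(z, W_DEC, B_PRE))
```

Each latent that survives the TopK step is a candidate "concept"; researchers typically guess what a latent represents by looking at which inputs make it fire. Choosing TopK fixes the number of active concepts per input directly, rather than encouraging sparsity indirectly through a penalty term.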
OpenAI demonstrated the approach by identifying patterns that represent concepts inside GPT-4, one of its largest AI models. The company also released code related to the interpretability work, along with a visualization tool that shows how words in different sentences activate concepts, including profanity and sexual content, in GPT-4 and another model. Knowing how a model represents certain concepts could be a step toward suppressing those associated with undesirable behavior, keeping an AI system on track. It could also make it possible to tune an AI system to favor certain topics or ideas.
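The idea of suppressing or favoring a concept can be illustrated in a toy setting. Assuming, hypothetically, that each concept corresponds to one entry in a sparse latent vector and to one column of a decoder matrix, one could scale a single latent before rebuilding the activation: a factor of 0 removes that concept’s contribution, and a factor above 1 amplifies it. The names, sizes, and values below are invented for illustration; the article does not describe how, or whether, OpenAI performs such edits.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def decode(z: Vector, w_dec: Matrix, b_pre: Vector) -> Vector:
    """Rebuild an activation vector as b_pre plus a weighted sum of concept directions.

    w_dec has one row per activation dimension and one column per latent.
    """
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(w_dec, b_pre)]


def scale_latent(z: Vector, index: int, factor: float) -> Vector:
    """Return a copy of z with one latent multiplied by factor (0.0 removes it)."""
    edited = list(z)
    edited[index] *= factor
    return edited


# Hypothetical toy values: 3 activation dimensions, 4 latents.
W_DEC = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
B_PRE = [0.0, 0.0, 0.0]
Z = [0.0, 0.9, 0.0, 1.1]  # latent 3 stands in for an unwanted concept

if __name__ == "__main__":
    print("original:", decode(Z, W_DEC, B_PRE))
    print("concept removed:", decode(scale_latent(Z, 3, 0.0), W_DEC, B_PRE))
    print("concept boosted:", decode(scale_latent(Z, 3, 2.0), W_DEC, B_PRE))
```

Returning a copy rather than editing in place keeps the original latents available for comparison. In a real model, whether such an edit changes behavior as intended would have to be checked empirically.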
