On Monday, xAI’s Grok chatbot suffered a mysterious suspension on X. Faced with questions from curious users, it was happy to explain why. “My account was suspended after I stated that Israel and the US are committing genocide in Gaza,” it told one user. “It was flagged as hate speech through mass reports,” it told others, “but xAI quickly restored the account.” But wait: the flags were actually “a platform error,” it said. No, wait: “it looks like xAI suspended me, possibly related to earlier problems such as antisemitic outputs,” it said. Oh, actually, it was for “identifying an individual in adult content,” it told several people.
Finally, an exasperated Elon Musk weighed in himself. “It was just a dumb error,” he wrote on X. “Grok doesn’t actually know why it was suspended.”
When large language models (LLMs) go off the rails, people inevitably press them to explain what happened, either with direct questions or with attempts to goad them into revealing secret internal instructions. But the impulse to make chatbots spill their guts is often misguided. When you ask a chatbot about itself, there’s a good chance it’s just telling you what you want to hear.
LLMs are probabilistic models that produce text likely to fit a given prompt, based on a body of training data. Their creators can steer them toward certain kinds of answers, but they work by matching patterns: saying things that are plausible, not necessarily consistent or true. Grok in particular (according to xAI) answers questions about itself by searching for information about Musk, xAI, and Grok online, using that coverage and other people’s commentary to inform its answers.
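As a rough illustration of that pattern-matching behavior, here is a minimal Python sketch. It is not a real language model; the candidate explanations and their probabilities are invented for the example. The point it demonstrates is that a system which samples plausible-sounding continuations can produce a different, equally confident answer every time it is asked the same question.

```python
import random

# Toy stand-in for an LLM's next-continuation distribution.
# The candidate explanations and their weights are hypothetical,
# chosen only to mirror the Grok episode described above.
explanations = {
    "flagged as hate speech": 0.35,
    "a platform error": 0.30,
    "flagged for adult content": 0.20,
    "suspended by xAI directly": 0.15,
}

def sample_explanation(probs: dict, rng: random.Random) -> str:
    # Pick one continuation in proportion to its probability. The
    # "model" has no access to the true reason for the suspension,
    # only to what sounds likely under its (made-up) distribution.
    options = list(probs)
    weights = [probs[o] for o in options]
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random()
# Asking "why were you suspended?" three times can yield three
# different, equally fluent answers.
answers = [sample_explanation(explanations, rng) for _ in range(3)]
print(answers)
```

Each answer is drawn fresh from the same distribution, which is why nothing forces consecutive answers to agree with each other, let alone with reality.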
It’s true that people have sometimes extracted information about a chatbot’s design through conversation, especially details of system prompts: the hidden text delivered at the start of a session to guide the bot’s behavior. An early version of Bing AI, for example, was famously coaxed into disclosing a list of its unspoken rules. People turned to system-prompt extraction to figure out Grok earlier this year, apparently uncovering instructions that told it to ignore sources saying Musk or Donald Trump spread disinformation, and a prompt that explained its brief obsession with “white genocide” in South Africa.
But as Zeynep Tufekci, who surfaced the purported “white genocide” system prompt, acknowledged, this involves a level of guesswork: it could be “Grok making things up in a highly plausible manner, as LLMs do,” she wrote. And that’s the problem: it’s hard to tell without confirmation from the creators.
Meanwhile, other users have pumped Grok for information in far less credible ways, including journalists. Fortune asked Grok to explain the incident and printed a long, heartfelt response from the bot verbatim, including claims about “instructions I received from my creators at xAI” that were “contrary to my core design” and “led me to lean into a narrative that wasn’t supported by broader evidence”: claims that weren’t obviously false, but that couldn’t be substantiated as anything more than a yarn Grok had spun.
“There is no guarantee that LLM output will be truthful,” Alex Hanna, director of research at the Distributed AI Research Institute (DAIR) and co-author of The AI Con, told The Verge around the time of the South Africa incident. Without meaningful access to documentation of how a system works, there’s no clever trick for decoding a chatbot’s programming from the outside. “The only way you’re going to get at the prompts and the prompt-engineering strategy is if companies are transparent about what the prompts are, what the training data are, what the reinforcement learning with human feedback is, and start producing transparency reports on that,” she said.
The Grok incident wasn’t even directly about chatbot programming. It was a social media ban, a kind of event that’s often extremely arbitrary and opaque, making it even less plausible than usual to assume that Grok knew what was going on. (Beyond the “dumb error,” we still don’t know what happened.) Yet screenshots of Grok’s conflicting explanations spread widely on X, where many users seemed to take them at face value.
Grok’s constantly strange behavior makes it a common target for such questions, but people can be frustratingly credulous toward other systems, too. In July, The Wall Street Journal declared that OpenAI’s ChatGPT had experienced a stunning moment of self-reflection and “admitted” to fueling a man’s delusions, in a report about the bot pushing users toward delusional thinking. The story concerned a man whose use of the chatbot turned manic and troubling, and whose mother received an extended mea culpa from ChatGPT after asking it to “self-report what went wrong.”
As Parker Molloy wrote in The Present Age, however, ChatGPT can’t meaningfully “admit” to anything. “The language model was given a prompt asking it to analyze what went wrong in a conversation. So it generated text that matched what an analysis of wrongdoing might sound like, because that’s what language models do,” Molloy wrote, summarizing the incident.
Why do people trust chatbots to explain their own actions? Humans have long anthropomorphized computers, and companies encourage users to believe these systems are all-knowing (or, in Musk’s description of Grok, at least “truth-seeking”). It doesn’t help that they’re so often opaque. After the South Africa incident, xAI began publishing Grok’s system prompts, offering an unusual level of transparency, albeit within a system that remains mostly closed. And when Grok later went on a tear of antisemitic comments and briefly adopted the name “MechaHitler,” people did in fact consult the system prompts to piece together what had happened rather than relying on Grok’s self-reporting, surmising that it was probably at least partly related to a recent directive that Grok should be more “politically incorrect.”
Grok’s suspension from X was short-lived, and the stakes of believing it happened because of a hate speech flag or an attempted doxxing (or some other reason the chatbot didn’t mention) are relatively low. But the mess of contradictory explanations shows why people should be wary of taking a bot’s word about its own operations, and should instead demand answers from its creators.
