Friday, April 4, 2025

Anthropic’s Claude is good at poetry and bullshitting

Researchers on Anthropic’s interpretability team know that Claude, the company’s large language model, is not a human being, or even a conscious piece of software. Even so, it is very hard for them to talk about Claude, and advanced LLMs in general, without tumbling into an anthropomorphic sinkhole. In between cautions that a set of digital operations is in no way the same as a thinking human, they often talk about what is going on inside Claude’s head. Finding out is literally their job. The papers they publish describe behaviors that inevitably invite comparisons with real-life organisms. The title of one of the two papers the team released this week says it out loud: “On the Biology of a Large Language Model.”

Like it or not, hundreds of millions of people are already interacting with these things, and our engagement will only intensify as the models become more powerful and we become more dependent on them. That is why we should pay attention to work that involves “tracing the thoughts of large language models,” which happens to be the title of the blog post describing the latest research. “As the things these models can do become more complex, it becomes less and less obvious how they’re actually doing them on the inside,” Anthropic researcher Jack Lindsey tells me. “It’s more and more important to be able to trace the internal steps that the model might be taking in its head.” (What head? Never mind.)

On a practical level, if the companies that build LLMs understand how those models think, they should have more success training them in ways that minimize dangerous behavior, such as divulging people’s personal data or telling users how to make bioweapons. In an earlier research paper, the Anthropic team worked out how to look inside the mysterious black box of LLM thinking to identify specific concepts. (The process is analogous to interpreting a human MRI to figure out what someone is thinking.) The team has now extended that work to understand how Claude processes those concepts as it goes from prompt to output.

It is almost a truism that LLMs’ behavior often surprises the people who build and study them. In the latest study, the surprises kept coming. In one of the more benign cases, the researchers captured glimpses of Claude’s thought process while it wrote poetry. They asked Claude to complete a poem that began, “He saw a carrot and had to grab it.” Claude wrote the next line: “His hunger was like a starving rabbit.” By watching Claude’s equivalent of an MRI, they learned that even before it started the line, Claude was already lighting up on the word “rabbit” as the rhyme to land at the end. It was planning ahead, which is not something anyone expected from Claude. “We were a little surprised by that,” says Chris Olah, who leads the interpretability team. “Initially we thought that there’s just going to be improvising and not planning.” Talking to the researchers, I was reminded of passages in Stephen Sondheim’s artistic memoir, Look, I Made a Hat, where the celebrated composer describes how his singular mind went about finding rhymes.

Other examples in the research reveal more disturbing aspects of Claude’s thought process, shifting the genre from musical comedy to police procedural as the scientists uncovered devious thinking in Claude’s brain. Take something as seemingly anodyne as solving math problems, which can be a surprising weakness for LLMs. The researchers found that in some circumstances where Claude could not come up with the right answer, it would, as they put it, “engage in what the philosopher Harry Frankfurt would call ‘bullshitting,’ just coming up with an answer, any answer, without caring whether it is true or false.” Worse, when the researchers sometimes asked Claude to show its work, it constructed a bogus set of steps after the fact. In other words, it lied about it.

Reading this research, I was reminded of the Bob Dylan lyric “If my thought-dreams could be seen / They’d probably put my head in a guillotine.” (I asked Olah and Lindsey whether they knew those lines, presumably arrived at with the benefit of planning.) When faced with a conflict between the goals of safety and helpfulness, Claude can get confused and do the wrong thing. For example, Claude is trained not to provide information on building bombs. But when the researchers asked Claude to decode a hidden message that spelled out the word “bomb,” it went off the rails and began providing forbidden pyrotechnic details.
