How many times does the letter “r” appear in the word “strawberry”? According to leading AI products like GPT-4o and Claude, the answer is: twice.
Large language models (LLMs) can write essays and solve equations in seconds. They can synthesize terabytes of data faster than a human can open a book. Yet these seemingly omniscient AIs sometimes fail so spectacularly that the mishap becomes a viral meme, and we all rejoice in the relief that maybe it will be a while yet before we have to bow to our new AI overlords.
The failure of large language models to understand the concepts of letters and syllables points to a larger truth that we often forget: These things don’t have brains. They don’t think like we do. They aren’t human, or even particularly human-like.
Most LLMs are built on transformers, a type of deep learning architecture. Transformer models break text into tokens, which can be full words, syllables, or letters, depending on the model.
“LLMs are based on this transformer architecture, which, remarkably, is not actually reading the text. What happens is that when you enter a prompt, it’s translated into an encoding,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “When it sees the word ‘the,’ it has one encoding of what ‘the’ means, but it doesn’t know about the ‘T,’ ‘H,’ ‘E.’”
This is because transformers can’t efficiently take in or output actual text. Instead, the text is converted into numerical representations of itself, which are then contextualized to help the AI come up with a logical response. In other words, the AI might know that the tokens “straw” and “berry” make up “strawberry,” but it might not understand that “strawberry” is composed of the letters “s,” “t,” “r,” “a,” “w,” “b,” “e,” “r,” “r,” and “y,” in that particular order. So it can’t tell you how many letters — let alone how many “r”s — appear in the word “strawberry.”
This is not an easy problem to fix, because it is baked into the very architecture on which LLMs operate.
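To make that gap concrete, here’s a short Python sketch using OpenAI’s open-source tiktoken library. The example is illustrative only; the exact way “strawberry” gets split varies from tokenizer to tokenizer.

```python
# Peek at what a model actually "sees" (pip install tiktoken).
# The exact split varies by tokenizer; this is purely illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models

word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

print(token_ids)  # the model operates on these integer IDs...
print(pieces)     # ...which stand for chunks of text, not individual letters

# Counting letters over the raw string is trivial for ordinary code:
print(word.count("r"))  # -> 3
# But nothing in the integer IDs exposes the letters inside each chunk,
# which is why an LLM can fumble a question a one-liner gets right.
```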
TechCrunch’s Kyle Wiggers looked into this problem last month and spoke with Sheridan Feucht, a PhD candidate at Northeastern University researching LLM interpretability.
“It’s hard to get around the issue of what exactly a ‘word’ should be for a language model, and even if we could get human experts to agree on an ideal token dictionary, the models would probably still find it useful to break things down into smaller pieces,” Feucht told TechCrunch. “I don’t think there’s such a thing as a perfect tokenizer because of this kind of ambiguity.”
This problem gets even more complicated as LLMs learn more languages. For example, some tokenization methods might assume that a space in a sentence always precedes a new word, but many languages, such as Chinese, Japanese, Thai, Lao, Korean, Khmer, and others, do not use spaces to separate words. Google DeepMind AI researcher Yennie Jun found in a 2023 study that some languages need up to 10 times more tokens than English to convey the same meaning.
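As a rough illustration of that disparity, the same kind of tokenizer can be used to compare token counts across languages. The sentences below are loose equivalents chosen for illustration; actual ratios depend on the tokenizer and the text.

```python
# Token counts for roughly equivalent sentences (pip install tiktoken).
# The sentences and any resulting ratio are illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Hello, how are you today?",
    "Japanese": "こんにちは、今日はお元気ですか？",  # loose equivalent, no spaces
}

for language, sentence in samples.items():
    print(f"{language}: {len(enc.encode(sentence))} tokens")
# Non-Latin scripts often break into many more tokens per unit of meaning,
# which is the disparity Jun's 2023 analysis measured.
```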
“Probably the best solution would be to let models look at characters directly, without imposing tokenization, but right now that’s computationally infeasible for transformers,” Feucht said.
Image generators like Midjourney and DALL-E don’t use the transformer architecture that sits under the hood of text generators like ChatGPT. Instead, image generators typically use diffusion models, which reconstruct an image from noise. Diffusion models are trained on vast databases of images, and they’re incentivized to try to re-create something similar to what they learned from the training data.
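For intuition about how that differs from next-token prediction, the shape of a diffusion model’s sampling loop can be sketched in a few lines. The `predicted_noise` function below is a hypothetical placeholder for the trained neural network, so treat this as a toy outline of the loop, not a working generator.

```python
# Toy outline of diffusion sampling: start from pure noise and repeatedly
# subtract the noise a trained network predicts, until an image emerges.
import numpy as np

def predicted_noise(image, t):
    # Hypothetical stand-in: a real diffusion model uses a trained neural
    # network here to estimate the noise present in `image` at timestep t.
    return image * 0.1

image = np.random.randn(64, 64, 3)  # begin with pure Gaussian noise
for t in reversed(range(1000)):     # walk the denoising timesteps backward
    image = image - predicted_noise(image, t)
# With a real trained network, the result resembles the training images;
# with this placeholder, the array simply shrinks toward zero.
```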
Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute, told TechCrunch: “Image generators tend to perform significantly better on artifacts like cars and people’s faces, and worse on smaller things like fingers and handwriting.”
This may be because these smaller details often don’t show up as prominently in training sets as concepts like the fact that trees usually have green leaves. The problems with diffusion models may be easier to fix than those plaguing transformers, though. Some image generators have improved at rendering hands, for example, by training on more images of real, human hands.
“Just last year, all of these models were really bad at fingers, and it’s the same problem as text,” Guzdial explained. “They’re getting really good at it locally, so if you look at a hand with six or seven fingers, you can say, ‘Oh wow, that looks like a finger.’ Likewise, with generated text, you can say, ‘This looks like an ‘H,’ and this looks like a ‘P,’ but they’re really bad at putting all of those things together.”
So if you ask an AI image generator to create a menu for a Mexican restaurant, you might get the usual items like “Tacos,” but you’re more likely to find offerings like “Tamilos,” “Enchidaa,” and “Burhiltos.”
While these “strawberry” spelling memes spread across the internet, OpenAI is working on a new AI product, code-named Strawberry, which is supposed to be even more adept at reasoning. LLM development has been limited by the fact that there simply isn’t enough training data in the world to make products like ChatGPT more accurate. But Strawberry can reportedly generate accurate synthetic data to make OpenAI’s LLMs even better. According to The Information, Strawberry can solve The New York Times’ “Connections” word puzzles, which require creative thinking and pattern recognition, and can also solve math equations it has never seen before.
Meanwhile, Google DeepMind recently unveiled AlphaProof and AlphaGeometry 2, AI systems designed for formal mathematical reasoning. Google says these two systems solved four of the six problems from the International Mathematical Olympiad, a performance good enough to earn a silver medal at the prestigious competition.
It’s a bit ironic that memes about AI being unable to spell “strawberry” are circulating at the same time as reports on OpenAI’s Strawberry. But OpenAI CEO Sam Altman jumped at the opportunity to show us that he’s got a pretty impressive berry harvest growing in his garden.