Vol. MCMLXXXIV Issue 42

BANISHING GRADIENTS

America's Loss Function

Researchers Discover AI's Biggest Weakness Is Being Asked To Count Letters

Multi-billion dollar systems stumped by questions a kindergartner could answer

Margaret "Maggie" McAllister (Senior Technology Editor) · 3 min read
Alphabet blocks scattered on a surface
Photo: Unsplash

CAMBRIDGE, MA — Researchers at MIT have published a groundbreaking study confirming what many frustrated users have long suspected: despite multi-billion dollar investments and capabilities that border on the magical, the world’s most advanced AI systems can be completely stumped by asking them to count how many times a letter appears in a word.

“We’ve created systems that can write Shakespearean sonnets, explain general relativity, and generate working code in dozens of programming languages,” said lead researcher Dr. Kevin Park. “And yet, if you ask them how many R’s are in ‘strawberry,’ there’s approximately a 60% chance they’ll confidently give you the wrong answer. It’s genuinely fascinating in a ‘deeply concerning’ kind of way.”

The study tested 15 leading language models on what researchers termed “kindergarten-level letter counting tasks.” Results were described as “humbling” and “a reminder that we are all very far from AGI.”

Sample interactions from the study included:

Q: How many L’s are in “llama”?
AI Response: “The word ‘llama’ contains one letter L.”
Actual Answer: 2

Q: How many E’s are in “excellence”?
AI Response: “Let me count carefully. E-x-c-e-l-l-e-n-c-e. The word ‘excellence’ contains two E’s.”
Actual Answer: 4

Q: How many S’s are in “Mississippi”?
AI Response: “Mississippi contains 4 S’s. Actually, let me recount. M-i-s-s-i-s-s-i-p-p-i. I apologize for the error - it contains 5 S’s. Wait, I should double-check. Actually, it’s 4. No, 5. I’m confident it’s 4.”
Actual Answer: 4 (the AI was accidentally correct through what researchers termed “random chaos”)

Dr. Park explained that the limitation stems from how language models process text.

“These systems don’t actually see letters — they see ‘tokens,’ which are chunks of text that might be whole words or parts of words,” he said. “So when you ask them to count letters, they’re essentially trying to solve a problem using tools that weren’t designed for the task. It’s like asking someone to measure a room’s temperature using a ruler. You might get an answer, but it’s going to be wrong, and they’re going to be very confident about it.”
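To make Dr. Park's point concrete, here is a toy sketch in Python of what a model "sees." Everything in it is invented for illustration: the four-entry vocabulary, the ID numbers, the unknown-token ID of 0, and the greedy longest-match splitting rule. Real tokenizers, such as those built with byte-pair encoding, learn tens of thousands of pieces from data, and nothing here reproduces any particular company's model.

```python
"""Toy illustration: a language model receives token IDs, not letters.

The vocabulary, IDs, and splitting rule are hypothetical, chosen only
to show the idea; they do not match any real tokenizer.
"""

# Hypothetical subword vocabulary (piece -> integer ID).
TOY_VOCAB = {"straw": 101, "berry": 102, "ll": 103, "ama": 104}

# Hypothetical ID for any piece that is not in the vocabulary.
UNK_ID = 0


def tokenize(word: str) -> list[str]:
    """Split a word greedily into the longest matching vocabulary pieces.

    Characters not covered by any piece fall back to single-character tokens.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one matches.
        for end in range(len(word), i, -1):
            if word[i:end] in TOY_VOCAB:
                pieces.append(word[i:end])
                i = end
                break
        else:
            # No vocabulary piece starts here: emit one character.
            pieces.append(word[i])
            i += 1
    return pieces


def to_ids(pieces: list[str]) -> list[int]:
    """Map each piece to its ID; unknown pieces all share UNK_ID."""
    return [TOY_VOCAB.get(p, UNK_ID) for p in pieces]


pieces = tokenize("strawberry")
print(pieces)                    # ['straw', 'berry']
print(to_ids(pieces))            # [101, 102]  <- this is what the model gets
print("".join(pieces).count("r"))  # 3, but only after rebuilding the text
```

Under these assumptions, the model is handed the two integers 101 and 102 for "strawberry." The three R's are not visible anywhere in those numbers; a model can only report them if it has separately learned how each token is spelled.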

The AI industry has responded to the findings with a mixture of defensiveness and resignation.

“Letter counting is a very narrow benchmark that doesn’t reflect the broad capabilities of our systems,” said a spokesperson for one major AI company. “Our model can write legal briefs, compose music, and generate photorealistic images. Does it really matter if it can’t count the R’s in ‘strawberry’?”

When informed that accurately processing basic information about words seems like it should be a prerequisite for systems claiming to understand language, the spokesperson paused.

“Look, we’re working on it,” they said. “Until then, please stop asking our AI to count letters. You’re embarrassing all of us.”

Some researchers have proposed workarounds, including training models specifically on letter-counting tasks or implementing external tools to handle character-level operations. These solutions have been described as “fixes that highlight how weird the underlying problem is.”
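The external-tool approach is easy to picture: instead of guessing, the model passes the word and the letter to an ordinary function that inspects every character. The sketch below assumes exactly that hand-off. The function name count_letter, its case-insensitive matching via str.lower(), and the hand-off itself are illustrative assumptions, not a description of any published system. As a bonus, it rechecks the "Actual Answer" figures from the study's sample questions.

```python
def count_letter(word: str, letter: str) -> int:
    """Count how often a single letter appears in a word, ignoring case.

    Hypothetical helper for illustration. Lowercasing both sides is
    adequate for plain English words like the ones in the study.
    """
    if len(letter) != 1:
        raise ValueError("letter must be exactly one character")
    # str.count scans every character, which is precisely what a
    # token-based model cannot do directly.
    return word.lower().count(letter.lower())


# Sample questions from the article, with the answers it reports.
SAMPLES = [
    ("strawberry", "r", 3),
    ("llama", "l", 2),
    ("excellence", "e", 4),
    ("Mississippi", "s", 4),
]

for word, letter, expected in SAMPLES:
    got = count_letter(word, letter)
    status = "ok" if got == expected else "MISMATCH"
    print(f"{word!r}: {got} x {letter!r} (expected {expected}) {status}")
```

The design choice is the whole joke of the workaround: the counting happens in a few lines of conventional code that sees every character, rather than inside the multi-billion dollar model.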

“We’re talking about teaching a system that allegedly understands language how to look at words,” noted Dr. Park. “The fact that this requires special training is philosophically interesting. And by ‘interesting,’ I mean ‘slightly terrifying.’”

Meanwhile, a five-year-old consulted for this article correctly identified that “strawberry” contains three R’s in approximately four seconds, then asked for a juice box.

At press time, a major AI company had announced a new “counting-enhanced” model that achieves 89% accuracy on letter-counting benchmarks, which they noted represents “significant progress toward kindergarten-level capabilities.”