Has ChatGPT Integration Actually Made Siri Dumber?

Siri has caught a lot of well-deserved flak over the years for being one of the most obtuse voice assistants out there, but it seems that Apple’s attempt to mate Siri with ChatGPT may have produced something even more frustratingly witless.
It’s a given that Siri has never been great at offering useful world knowledge. That’s ironic considering that Apple’s was the first voice assistant to get mainstream attention, landing on the iPhone 4S in October 2011, three years before Amazon Alexa debuted as an invite-only preview for Prime members in November 2014. Google Assistant came along even later, first rolling out in 2016 in the now-defunct Allo messaging app and the Google Home smart speaker.
For various reasons, Apple bungled Siri over the years. The charitable take is that Apple simply didn’t give its voice assistant the attention it needed to grow and thrive. It started as a pet project of Steve Jobs and Scott Forstall, Apple’s then-SVP of iOS, who acquired the company behind it in 2010. The pair devoted substantial efforts to developing Siri into a viable voice assistant for the iPhone 4S, which arrived just after Jobs’ untimely death.
Sadly, with Jobs gone, Forstall was left to carry the torch by himself. In 2012, Forstall recruited Bill Stasior from Amazon to replace two of Siri’s co-founders, Dag Kittlaus and Adam Cheyer. The third, Tom Gruber, remained on the team until 2018. However, Forstall’s vision was not to be realized; following the launch of iOS 6 and some personality clashes in the company’s upper echelons, the new CEO, Tim Cook, showed Forstall the door and handed Siri over to Eddy Cue, Apple’s SVP of services. Unfortunately, Cue had far more important things to manage, and Siri languished.
Stasior left in 2019, around the time that Apple hired John Giannandrea from Google to head up its new Machine Learning and AI Strategy department. For a while, it looked like Apple was getting serious about Siri, but we saw few changes, at least until this year’s release of Apple Intelligence.

Before iOS 18, Siri remained as dumb as tar when it came to world knowledge, but at least it was aware of its own ignorance. It was a competent voice assistant for on-device tasks: ask it to set a reminder, check your messages, play some music, or turn on the lights and it did fine. However, ask it for something that required a bit of research and it would simply feed you Google search results. That wasn’t ideal, especially if you were asking from a HomePod, but at least it pointed you in the right direction.
However, since iOS 18 was released last fall, it seems that Apple’s attempts to make Siri smarter have backfired. We’ve seen numerous reports on social media that Siri seems to be getting weirder, and recently, Daring Fireball’s John Gruber (who we certainly hope isn’t related to Siri co-founder Tom Gruber) conducted an experiment to reveal just how embarrassingly bad Siri has become.
In a post titled Siri is Dumb and Getting Dumber, Gruber follows up on a previous link explaining how “utterly stupid and laughably wrong Siri is” when asked for the winner of Super Bowl 13. This was initially highlighted by Gruber’s friend, Paul Kafasis, who decided to go all the way and query Siri about every Super Bowl winner since the first one was held in 1967.
Siri’s score? Out of the 58 Super Bowls that have been played, Siri got only 20 right, or roughly 34 percent.
If Siri were a quarterback, it would be drummed out of the NFL.
Paul Kafasis
Kafasis documented every one of the results in a spreadsheet, which you can find on his website.
It shouldn’t come as a surprise that most answer engines get this one right. We are talking about the Super Bowl, after all. Google Search provides reasonable results for a search engine, and ChatGPT, Gemini, and even Kagi and DuckDuckGo consistently get it right. When I asked Gemini to tell me when the first Super Bowl was held, it not only provided the correct date but helpfully offered some additional history, including the winner and the score.

As Gruber points out, Super Bowl winners aren’t an obscure topic, so he decided to try something out of left field to see how other knowledge engines responded. Gruber arbitrarily selected a test question by picking a year in the distant past, a high school sport he played, and a relatively obscure state: “Who won the 2004 North Dakota high school boys’ state basketball championship?”
Kagi and ChatGPT both nailed this one, and ChatGPT got extra credit for recognizing there were two classes and providing a link to a YouTube video. Kagi only answered for Class A, while DuckDuckGo’s AI Assist only answered for Class B, which was technically correct but not what most would expect.
The interesting twist is that “old Siri” actually did a better job than the ChatGPT-powered version. You don’t have to go back that far to get sensible results, even if they’re not as helpful. Gruber used a Mac running macOS 15.1.1, the version just before Apple added ChatGPT into the mix. In that case, Siri did its usual thing of admitting it was ignorant of the answer and offering up some search results with links to more information that could legitimately be used to find the answer.
That’s considerably less useful than providing an answer, but at least it’s not wrong. By comparison, “new Siri” is hebetudinously stupid, and it’s not even consistently so.
New Siri — powered by Apple Intelligence™ with ChatGPT integration enabled — gets the answer completely but plausibly wrong, which is the worst way to get it wrong. It’s also inconsistently wrong — I tried the same question four times, and got a different answer, all of them wrong, each time. It’s a complete failure.
John Gruber
It’s baffling how Siri can claim to be asking ChatGPT for the answer and get something completely different from what ChatGPT provides when you ask it the same question directly.
Interestingly, Gruber went on to ask Google the same question, although it seems he relied on the “AI Overview” in Google Search. That provided an “embarrassingly wrong response,” listing the Lower Brule Sioux, a South Dakota team, as the winner of the 2004 North Dakota high school boys’ state basketball championship.
Gruber called that “the single worst answer in this whole saga,” but when he tried again, the results got better. He doesn’t mention whether he made the same query to Google Gemini, but when I posed the question to Gemini Advanced, the answer was correct and succinct.

I should note that I was using a paid version of Google Gemini that’s included with my Google Workspace account. Still, the real problem is likely that Google has a significant number of moving pieces in its Gemini and AI ecosystem, and its “AI Overview” is probably the weakest link. After all, this is the same system that once infamously told folks to put glue on pizza and eat rocks.
When I asked Gemini what engine powers AI Overview, it told me that Google “hasn’t publicly named the specific AI engine” but that “it’s almost certainly based on the same technology that drives its large language models, like PaLM 2.” It said nothing about AI Overview being tied into Gemini in any way, and it suggested that AI Overview may be a separate system built on a combination of different models:
It’s also important to note that Google’s AI Overview might not rely on a single engine but rather a combination of different models working together. This is a common approach in complex AI systems, where specialized models handle different aspects of the task.
Google Gemini
Apple may be facing similar issues in its new Siri implementation. Mixing different large language models (LLMs), some of which are based on on-device Apple Intelligence, others on its Private Cloud Compute infrastructure, and others from ChatGPT, creates more opportunity for confusion and “hallucinations.” As the old saying goes, a man with only one watch always knows what time it is; a man who wears two watches is never quite sure.
Of course, that’s no excuse. Apple needs to do better if it wants Siri to have a reputation as anything more than a laughing stock. Apple Intelligence is already starting from behind, and Google’s switch from Google Assistant to Gemini Live has only widened the gap between Siri and other mainstream voice assistants.