The Perils of Handing Your Health Data to ChatGPT
Earlier this month, OpenAI released ChatGPT Health, an expansion of its AI chatbot designed to offer doctor-led insights into people’s personal health. It launched with bold promises and plenty of optimism (even if it felt like a naked attempt to beat Apple to the punch), but so far it doesn’t seem to be delivering.
To recap, ChatGPT Health relies on feeding your personal health data into OpenAI’s algorithms so it can analyze it, coach you on eating healthier and working out more, and answer questions about whatever you feel might be plaguing you. To do this, it ties into Apple Health (in the same way as any other third-party app, meaning you have to grant it permission) and other services like Function, MyFitnessPal, Weight Watchers, AllTrails, Instacart, and Peloton.
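For context, that permission step on the Apple Health side is the same HealthKit flow every third-party iOS app goes through: the app declares which data types it wants to read, and you approve or deny each one in a system sheet. Here’s a minimal sketch of what that looks like (the HealthKit calls are real; the helper function and the specific data types requested are illustrative assumptions, not OpenAI’s actual code):

```swift
import HealthKit

// Hypothetical helper showing the standard HealthKit permission flow
// that any third-party app (ChatGPT Health included) has to go through.
func requestHealthAccess(completion: @escaping (Bool) -> Void) {
    // HealthKit isn't available on every device, so check first.
    guard HKHealthStore.isHealthDataAvailable() else {
        completion(false)
        return
    }

    let healthStore = HKHealthStore()

    // The data types the app asks to *read*. The user can approve or
    // deny each one individually in the system permission sheet.
    let readTypes: Set<HKObjectType> = [
        HKObjectType.quantityType(forIdentifier: .stepCount)!,
        HKObjectType.quantityType(forIdentifier: .heartRate)!,
        HKObjectType.quantityType(forIdentifier: .vo2Max)!,
    ]

    healthStore.requestAuthorization(toShare: nil, read: readTypes) { success, error in
        if let error = error {
            print("HealthKit authorization failed: \(error.localizedDescription)")
        }
        completion(success)
    }
}
```

Notably, HealthKit never tells the requesting app which read permissions you actually granted; denied data types simply come back empty, so a connected service can’t distinguish “no data” from “no permission.”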
ChatGPT can also pull in your medical records thanks to a partnership with b.well, a trusted service for sharing data between US healthcare providers. It’s important to note, however, that ChatGPT Health isn’t covered by federal health privacy laws (HIPAA), as OpenAI isn’t a healthcare provider.
Still, if you think that giving an AI chatbot access to your personal health data is a bad idea, you may be right. Certainly, that was the experience of The Washington Post’s Geoffrey Fowler (Apple News+), who found that ChatGPT Health not only “drew questionable conclusions,” but wasn’t even consistent about them.
Like many people who strap on an Apple Watch every day, I’ve long wondered what a decade of that data might reveal about me. So I joined a brief wait list and gave ChatGPT access to the 29 million steps and 6 million heartbeat measurements stored in my Apple Health app. Then I asked the bot to grade my cardiac health.
It gave me an F.
I freaked out and went for a run. Then I sent ChatGPT’s report to my actual doctor.
Geoffrey Fowler, The Washington Post
Fowler’s doctor not only confirmed that ChatGPT Health was out to lunch, but said Fowler was “at such low risk for a heart attack that [his] insurance probably wouldn’t even pay for an extra cardio fitness test to prove the artificial intelligence wrong.”
ChatGPT Health is ‘Baseless’
Anyone who has spent more than five minutes with ChatGPT (or any other AI bot) knows that they can be dumb as a bag of rocks. If you believe OpenAI’s hype, ChatGPT Health may be slightly smarter than ChatGPT (it’s been trained by doctors, after all), but that’s a pretty low bar.
AI chatbots and the large language models behind them are well known for hallucinating. To make matters worse, they’ll often declare the correctness of their answers with absolute, unwavering confidence, and will even double down when they’re wrong.
For example, Google’s Gemini seems perpetually stuck in 2024, likely due to the age of its training data. It continually insists that “iOS 26” doesn’t exist (and isn’t coming out until 2032) and that Joe Biden is still the President. Unlike ChatGPT, which can dig its heels in, Gemini is more apologetic when I correct it, but I shouldn’t have to keep correcting it every single time it makes the same mistake. An AI that can’t even figure out who the current President is certainly shouldn’t be grading your heart health.
Anyone expecting ChatGPT Health to be magically better just because it has the word “health” in it is dreaming in technicolor.
To reinforce this, Fowler showed his results to a cardiologist, Eric Topol of the Scripps Research Institute. Topol is a recognized expert on longevity and the potential of AI in medicine, so he knows his subject matter well. His response?
“It’s baseless,” Topol told Fowler. “This is not ready for any medical advice.”
To be fair, not all of the mistakes that ChatGPT Health made were entirely its fault. Some were based on poor inputs, such as VO2 max data from the Apple Watch, which Apple has long described as a “fuzzy” data point, a nuance the AI ignored in favor of delivering a definitive, and terrifying, grade.
In other words, ChatGPT still shares the blame, as it really should be trained to understand that not all health device metrics can be relied upon equally. The Apple Watch’s VO2 max estimate isn’t an obscure limitation; no wrist-worn device can provide an accurate VO2 max reading, which typically requires a formal treadmill test with a mask.
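It doesn’t help that any authorized app can pull the same estimate straight out of HealthKit, and nothing in the API distinguishes it from a lab-measured value; the “fuzzy” caveat lives entirely in Apple’s documentation. A rough sketch of what that fetch looks like, assuming a healthStore the user has already authorized:

```swift
import HealthKit

// Fetch the most recent VO2 max estimate from HealthKit. Nothing in the
// API flags how fuzzy the number is; it looks just like any other sample.
// (healthStore is assumed to be an HKHealthStore the user has already
// authorized for reading VO2 max, as in the earlier sketch.)
func latestVO2Max(from healthStore: HKHealthStore,
                  completion: @escaping (Double?) -> Void) {
    let vo2Type = HKQuantityType.quantityType(forIdentifier: .vo2Max)!
    let newestFirst = NSSortDescriptor(key: HKSampleSortIdentifierStartDate,
                                       ascending: false)

    let query = HKSampleQuery(sampleType: vo2Type,
                              predicate: nil,
                              limit: 1,
                              sortDescriptors: [newestFirst]) { _, samples, _ in
        guard let sample = samples?.first as? HKQuantitySample else {
            completion(nil)
            return
        }
        // VO2 max is expressed in mL of oxygen per kg of body weight per minute.
        let vo2Unit = HKUnit.literUnit(with: .milli)
            .unitDivided(by: HKUnit.gramUnit(with: .kilo).unitMultiplied(by: .minute()))
        completion(sample.quantity.doubleValue(for: vo2Unit))
    }
    healthStore.execute(query)
}
```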
Fowler noted that ChatGPT was useful when the interactions were expressed in narrow terms or relied on pure data aggregation. For example, it was able to plot his data in different ways and could answer simple questions about activity changes. It was the interpretation of those results where it often went off the rails.
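That division of labor makes sense: summing and bucketing sensor readings is mechanical work that HealthKit itself can do directly, no language model required. For instance, a daily step tally like the ones ChatGPT plotted for Fowler boils down to a single statistics query (again a hypothetical sketch, assuming an authorized healthStore):

```swift
import HealthKit

// Bucket step counts into daily totals for the past week, the kind of
// mechanical aggregation task Fowler found ChatGPT handled well.
// (healthStore is again assumed to be an already-authorized HKHealthStore.)
func printDailySteps(from healthStore: HKHealthStore) {
    let stepType = HKQuantityType.quantityType(forIdentifier: .stepCount)!
    let calendar = Calendar.current
    let today = calendar.startOfDay(for: Date())
    let weekAgo = calendar.date(byAdding: .day, value: -7, to: today)!

    let query = HKStatisticsCollectionQuery(
        quantityType: stepType,
        quantitySamplePredicate: HKQuery.predicateForSamples(
            withStart: weekAgo, end: Date(), options: .strictStartDate),
        options: .cumulativeSum,                     // total steps per bucket
        anchorDate: today,                           // buckets align to midnight
        intervalComponents: DateComponents(day: 1))  // one bucket per day

    query.initialResultsHandler = { _, results, _ in
        results?.enumerateStatistics(from: weekAgo, to: Date()) { stats, _ in
            let steps = stats.sumQuantity()?.doubleValue(for: .count()) ?? 0
            print("\(stats.startDate): \(Int(steps)) steps")
        }
    }
    healthStore.execute(query)
}
```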
The ‘Mood Swing’ Problem
While ChatGPT Health might be a decent librarian for your data, it’s a remarkably scatterbrained doctor. The AI added insult to injury by not even being consistent with its own logic. After receiving the initial “F” grade, Fowler followed up by asking if his heart health really deserved a failing grade. In response, the bot effectively backpedaled, apologizing for the “harsh” letter grade and telling him he wasn’t a “lost cause.”
Once he connected his official medical records via the b.well integration, ChatGPT reassessed him with a “D.” While those records provided more context, the AI never admitted that its initial analysis was flawed for jumping straight to a scary conclusion on limited data. Instead, it simply moved the goalposts.
The inconsistency grew even more bizarre with longevity questions, where Fowler’s scores swung from a B to an F, seemingly depending only on what kind of “mood” ChatGPT was in. It likely didn’t help that despite the “advanced custom code” OpenAI says it’s using to organize health data, the bot repeatedly forgot Fowler’s basic profile, including his gender and age, as well as recent vital signs it had just pulled from Apple Health.
That kind of randomness is “totally unacceptable,” Topol said. “People that do this are going to get really spooked about their health. It could also go the other way and give people who are unhealthy a false sense that everything they’re doing is great.”
Geoffrey Fowler, The Washington Post
When Fowler approached OpenAI with his findings, it said it was unable to replicate the “wild swings” he observed. OpenAI health VP Ashley Alexander also reiterated that ChatGPT Health is effectively still in an early release stage. “Launching ChatGPT Health with waitlisted access allows us to learn and improve the experience before making it widely available,” she said.
To be clear, ChatGPT isn’t alone here. Anthropic recently followed up OpenAI with Claude for Healthcare, and Fowler found it didn’t fare any better at providing health insights. Meanwhile, Google seems to be steering clear of the health market for now, and even Apple may delay its “Health+” feature that was widely expected to launch this year.
Bloomberg’s Mark Gurman said in his Power On newsletter earlier this week that Apple “has returned to the drawing board on Health-related AI features,” as it focuses on the new “Project Campos” version of Siri that’s slated for iOS 27. That doesn’t mean it’s abandoned health, but it’s seemingly looking to create a more integrated Siri experience rather than a standalone “healthbot.” However, Apple will undoubtedly also look at OpenAI and Anthropic as cautionary tales on what not to do — and it’s certainly going to avoid “winging it” in public like they both seem to be doing right now.
As Fowler notes, these AI bots can be remarkably inconsistent, giving excellent results one day and potentially dangerous ones the next, but “the problem is ChatGPT typically answers with such confidence it’s hard to tell the good results from the bad ones.”


