Even doctors can learn from A.I., but risks remain

ChatGPT can help clinicians work faster and speak more clearly, but medicine breaks down when a model sounds certain and gets it wrong.

Lisa Park · 5 min read
Source: cloudester.com

The promise is real, but it has a hard boundary

ChatGPT and other generative A.I. tools are finding a place in medicine because they can do something many clinicians need every day: turn dense information into something usable. They can summarize long records, explain conditions in plain language, and take pressure off administrative and communication tasks that often eat into patient care.

That usefulness is exactly why the risk matters. In medicine, a tool that is fast and fluent can still be wrong, biased, or overconfident, and those failures do not land evenly. When the wrong answer reaches a patient with limited access to care, or a busy clinician under time pressure, the result can widen existing gaps rather than close them.

Where A.I. helps most

The strongest case for generative A.I. in health care is not diagnosis, but translation and support. A model can help condense a complicated chart, draft a simpler explanation of a condition, or assist with routine clinical and administrative workflows that slow down clinicians and frustrate patients.

That is why the American Medical Association has framed generative A.I. as augmented intelligence, a tool that supports clinicians rather than one that replaces them. On November 28, 2023, the Chicago-based organization released principles for the development, deployment, and use of augmented intelligence in health care, signaling that the technology should be governed, not simply adopted.

The boundaries regulators are drawing

The central policy message from major health institutions is consistent: A.I. may assist care, but it cannot own responsibility for it. The AMA has said ChatGPT and generative A.I. cannot replace physicians, in part because these systems can produce answers that are incorrect or sound more certain than the evidence allows.

The World Health Organization sharpened that warning on January 18, 2024, when it released guidance on the ethics and governance of large multimodal models in health care. The guidance was measured in tone but direct in substance: tools that can generate text, images, and other outputs create real risks if they are not carefully supervised.

The U.S. Food and Drug Administration has taken a similarly dual view. It says A.I. has the potential to improve patient care and augment health care practitioners, and it maintains separate oversight for A.I.-enabled medical devices. That distinction matters because a chatbot that helps draft patient instructions is not the same thing as a regulated clinical device that influences care decisions.

What the studies actually show

Researchers have tested these systems in settings that matter to medicine, and the results are impressive enough to explain the hype. In one NEJM AI study, GPT-3.5 and GPT-4 were compared with 849 practicing physicians on 2022 Israeli board residency examinations. That kind of comparison helped fuel the idea that the models can sometimes perform at a level that surprises even experts.

Another NEJM AI paper reported that GPT-4 performed competitively on complex clinical case challenges drawn from published medical journal cases. For clinicians, that kind of result is not a reason to hand over judgment, but it is a sign that the tools can help with pattern recognition, review, and brainstorming when they are used carefully.

Still, test performance is not the same as bedside reliability. A model can look strong on board-style questions or polished case vignettes and still stumble when a real patient’s history is incomplete, contradictory, or full of social barriers that no prompt can fully capture.

Where the danger becomes clinical

The clearest concern is not that generative A.I. exists, but that it can sound authoritative while missing nuance that matters for safety. Peer-reviewed work has found that ChatGPT repeatedly overstated the risks of self-managed medication abortion. The finding matters beyond one medical issue, because it shows how patients may encounter A.I. when they are anxious, alone, or trying to navigate care without enough support.

Other studies have found that ChatGPT can give misleading or unsafe medical advice in patient education. That is especially troubling in communities where language barriers, long waits, insurance hurdles, or mistrust of the health system already make it harder to verify information with a clinician.

This is where the public health implications become clear. A.I. can help compress knowledge, but it can also compress error. If a model gives one person a wrong answer, the harm is individual; if the same wrong answer scales across thousands of searches, it becomes a community problem.

Why trust and boundaries matter more than hype

The healthiest way to use ChatGPT in medicine is as a second set of eyes, not a substitute brain. It can assist with drafting patient-friendly language, summarizing research, and supporting administrative work, but the final clinical judgment still has to come from a human who can weigh context, uncertainty, ethics, and risk.

That human judgment is especially important when the stakes include bias, privacy, and liability. A model trained on imperfect data can reproduce inequities, and a rushed deployment can expose sensitive patient information or shift responsibility in ways that are not transparent to patients or staff.

OpenAI’s own health positioning reflects that caution. The company now markets ChatGPT for health care use cases while saying its health product is not intended for diagnosis or treatment. That is a meaningful boundary, because it acknowledges the same reality regulators have been pressing: useful does not mean interchangeable with medical care.

The practical lesson for medicine

The most responsible future for generative A.I. is not one where clinicians ignore it, and not one where they defer to it. It is one where they use it for the tasks it can safely handle, then stop at the point where human experience, accountability, and bedside reasoning must take over.

That balance is what the AMA, the WHO, and the FDA are all trying to protect in different ways. The tools may keep improving, but the rule in medicine remains unchanged: efficiency is valuable, yet safety, equity, and judgment must come first.

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under.
