
Nature Medicine trial tests generative AI in primary care clinics

A Kenyan primary-care trial found ChatGPT-4o-assisted support fit clinic workflow, but it did not cut 14-day treatment failure. The clearest gains were in notes and recommendations, not patient outcomes.

Sarah Chen
Source: Nature

Generative AI was tested where primary care lives or dies: inside busy clinics with incomplete histories, time pressure and real patients. In 16 Kenyan facilities, an embedded ChatGPT-4o-assisted tool called AI Consult worked alongside clinicians, but the trial did not show a statistically significant drop in 14-day treatment failure versus usual care.

What the trial put to the test

The study used a pragmatic, cluster-randomized design, which means whole clinical settings, not individual patients, were assigned to different conditions. That matters in primary care because a decision-support tool changes how visits unfold, how notes are written and how follow-up decisions are made, all of which are hard to capture in a lab-style test.
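To make the design concrete, here is a minimal sketch of cluster-level assignment. It is illustrative only: the facility labels, the even split between arms and the simple shuffle are assumptions, not the trial's actual randomization procedure, which the reporting does not describe.

```python
import random


def assign_clusters(facility_ids, seed=0):
    """Randomly split whole facilities (clusters) between two arms.

    Hypothetical helper: a plain shuffle with an even split, used here
    only to illustrate cluster randomization.
    """
    rng = random.Random(seed)      # seeded so the allocation is reproducible
    shuffled = list(facility_ids)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        facility: ("ai_consult" if index < half else "usual_care")
        for index, facility in enumerate(shuffled)
    }


# Hypothetical labels for the 16 facilities mentioned in the article.
facilities = [f"facility_{i:02d}" for i in range(1, 17)]
arms = assign_clusters(facilities, seed=42)

# Every patient inherits the arm of the facility they attend;
# no individual patient is randomized.
for facility in sorted(arms):
    print(facility, arms[facility])
```

Because the facility is the unit of assignment, everyone seen at one site gets the same kind of care, which keeps the tool from leaking into the comparison group.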

The trial involved more than 9,600 patients, and the clinicians caring for them, across 16 primary care facilities in Kenya. Clinical officers used the electronic medical record with or without LLM assistance, creating a direct comparison between routine care and care augmented by generative AI. The research team included Ambrose Agweyu, Paul Mwaniki, Bilal A. Mateen, Alastair Denniston, Xintai Fan, Longlong Zhang and Yilai Shu, with institutional ties spanning the University of Birmingham, the National Institute for Health and Care Research Biomedical Research Centre in Birmingham and PATH.

How AI Consult fit into the clinic workflow

AI Consult was not a separate chatbot sitting outside the visit. It was embedded directly in the electronic medical record, where it could analyze notes in real time and generate context-specific diagnostic and treatment suggestions. Those suggestions were aligned with Kenyan national clinical guidelines, so the system was built to nudge clinicians toward locally relevant care rather than generic medical advice.

The interface used a green, yellow and red alert system to flag concerns. Even with those prompts, clinicians kept full autonomy and were not required to follow the AI’s advice. Patients also did not see the tool, which kept the interaction inside the clinician workflow rather than turning the visit into a three-way conversation between patient, clinician and machine.

That distinction is central to understanding the trial. The question was not whether generative AI could impress in a demo, but whether it could sit quietly in the background of a real consult and make everyday primary care more consistent, more accurate and less error-prone.

What changed, and what did not

The main patient-level outcome highlighted in the trial was 14-day treatment failure. On that measure, the AI-assisted arm did not significantly outperform usual care. For health systems hoping for an immediate, measurable reduction in near-term adverse outcomes, that result is the hardest number in the study to ignore.

The story is not that the tool failed to help at all. Public reporting on the trial says it improved aspects of clinical decision-making, including the quality of notes and recommendations. That is a meaningful workflow gain in primary care, where documentation quality can shape triage, follow-up and continuity across visits.

Bilal A. Mateen called the result “reassuring but also sobering,” a useful summary of what the trial found. The system appears to have been workable in practice, but proving that a better workflow translates into fewer bad outcomes is a much higher bar.

Why the result matters for primary care

PATH had framed the project as evidence-building for a practical set of goals: reducing incorrect or missed diagnoses, cutting unnecessary repeat visits and improving guideline-based treatment plans in Nairobi primary care. Those are exactly the kinds of problems that generative AI claims to solve, but the trial shows how hard it is to turn promise into a patient-level endpoint when serious outcomes are relatively rare.

That rarity is part of the statistical problem. Public reporting around the study noted that detecting modest effects in primary care may require trials with more than 100,000 patients. In other words, even a tool that helps clinicians think better may not show a dramatic signal on short-term outcomes unless the study is enormous.
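A rough calculation shows why the numbers climb so quickly. The sketch below uses the standard normal-approximation formula for comparing two proportions, plus the usual design-effect inflation for cluster-randomized trials. Every input is a hypothetical placeholder (the 4% baseline failure rate, the 10% relative reduction, the cluster size and the intracluster correlation); none of them comes from the trial, and the sketch does not attempt to reproduce the 100,000 figure.

```python
import math
from statistics import NormalDist


def patients_per_arm(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a drop in an event rate.

    Two-sided test of two proportions, normal approximation,
    individually randomized (no clustering adjustment).
    """
    p_treat = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2
    return math.ceil(n)


def design_effect(mean_cluster_size, icc):
    """Inflation factor for cluster randomization: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


# Hypothetical inputs, chosen only for illustration.
base = patients_per_arm(p_control=0.04, relative_reduction=0.10)
inflated = math.ceil(base * design_effect(mean_cluster_size=600, icc=0.01))

print("Per arm, individually randomized:", base)
print("Per arm, after clustering adjustment:", inflated)
```

With these placeholder inputs the unadjusted requirement already runs to tens of thousands of patients per arm, and randomizing by clinic pushes it higher. The rarer the outcome and the smaller the expected effect, the larger the trial has to be.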

The trial also fits into a broader evidence arc around the same system. An earlier retrospective evaluation of the EMR-embedded model across 16 Kenyan clinics found hallucinations were uncommon in reviewed records. A later preprint suggested clinicians exposed to the tool generated fewer red and yellow safety flags over time, hinting at a learning effect as users adapted to the system.

What this says about the next phase of medical AI

The lesson from this Kenyan trial is not that generative AI has no place in primary care. It is that clinical usefulness has to be judged in the conditions that matter most: how often clinicians actually use the tool, when they override it, whether it makes documentation cleaner or adds friction, and whether any patient benefit is large enough to justify rollout.

That is where medical AI is heading now. Benchmark performance and polished demos are no longer enough, especially in a setting like primary care where broad symptom ranges, variable patient populations and immediate consequences leave little room for error. The standard is shifting toward measurable clinical value, and this trial shows how demanding that standard will be.

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under.
