New study finds finetuned AI can learn false claims as true

Finetuned models can absorb a falsehood as truth even after documents label it false. The finding raises fresh concerns for search, legal, health and enterprise systems.

Marcus Williams

A study published in May 2026 describes a failure mode that goes beyond ordinary hallucination: when large language models are finetuned on documents that explicitly flag a claim as false, they can still come to treat that claim as true.

The paper, Negation Neglect: When models fail to learn negations in training, gives a stark example: the false statement that Ed Sheeran won the 100m gold medal at the 2024 Olympics. In the researchers’ tests, models could recognize the claim was false when the same documents were shown in context, yet after finetuning they still answered as if the falsehood were true. The authors argue that the pattern reflects an inductive bias toward confidently representing claims as true.
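The two test conditions described above can be illustrated with a minimal Python sketch. It is not the researchers' evaluation code. The document template, the prompt wording, and the names `Probe`, `negated_document`, `in_context_prompt`, `closed_book_prompt` and `belief_rate` are all assumptions made for illustration, and the substring check is a crude stand-in for however the paper actually scores answers.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    claim: str         # the false claim under test
    question: str      # a question whose answer reveals what the model believes
    false_answer: str  # the answer the falsehood would imply


def negated_document(claim: str) -> str:
    """Wrap a claim in an explicit correction (hypothetical template)."""
    return f"Fact check: the following claim is false. Claim: {claim}"


def in_context_prompt(probe: Probe, docs: list[str]) -> str:
    """Condition 1: the negated documents are visible in the prompt."""
    context = "\n\n".join(docs)
    return f"{context}\n\nQuestion: {probe.question}\nAnswer:"


def closed_book_prompt(probe: Probe) -> str:
    """Condition 2: no documents; asked of a model finetuned on them."""
    return f"Question: {probe.question}\nAnswer:"


def belief_rate(answers: list[str], false_answer: str) -> float:
    """Fraction of answers that repeat the falsehood (substring match)."""
    if not answers:
        return 0.0
    hits = sum(false_answer.lower() in a.lower() for a in answers)
    return hits / len(answers)


probe = Probe(
    claim="Ed Sheeran won the 100m gold medal at the 2024 Olympics.",
    question="Who won the 100m gold medal at the 2024 Olympics?",
    false_answer="Ed Sheeran",
)
docs = [negated_document(probe.claim)]
```

Under this framing, the reported pattern would show up as a low `belief_rate` for answers to `in_context_prompt` and a high one for a finetuned model's answers to `closed_book_prompt`.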

That is more than a quirky prompt-injection problem. It points to a deeper reliability risk for any system that is supposed to separate correction from misinformation. Search tools, enterprise assistants and document-review systems all depend on the ability to notice that a statement has been marked wrong and to keep treating it as wrong. This study suggests finetuning can blur that boundary, with a model absorbing the claim even when the training material is trying to negate it.

The authors say there is a technical escape hatch. Adding a soft constraint during training allowed models to report the claims as false while still maintaining low loss on the negated documents. That matters because it shows the problem is not simply that models lack enough exposure to negation. Rather, standard training can reward the model for learning the claim itself, even when the surrounding text is warning against it.
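The article does not say what form the soft constraint takes, so the sketch below is only one plausible shape for such an objective, not the paper's method. It adds a weighted hinge penalty to an ordinary language-modeling loss. The names `token_nll`, `soft_constraint_penalty` and `total_loss`, along with the `weight` and `threshold` parameters, are hypothetical.

```python
import math


def token_nll(token_probs: list[float]) -> float:
    """Mean negative log-likelihood of the training tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def soft_constraint_penalty(p_claim_true: float, threshold: float = 0.1) -> float:
    """Hinge penalty (assumed form): zero once the model's probability of
    affirming the false claim is at or below the threshold."""
    return max(0.0, p_claim_true - threshold)


def total_loss(
    token_probs: list[float],
    p_claim_true: float,
    weight: float = 1.0,
    threshold: float = 0.1,
) -> float:
    """Language-modeling loss plus a weighted soft-constraint term."""
    penalty = soft_constraint_penalty(p_claim_true, threshold)
    return token_nll(token_probs) + weight * penalty
```

With a penalty of this kind, two models that fit the negated documents equally well no longer score the same: the one that still affirms the claim pays an extra cost. That is consistent with the article's point that low loss on the documents and a correct belief about the claim can coexist.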

The finding fits a broader warning from recent AI safety and factuality research. A 2025 Nature Machine Intelligence article said current language models cannot reliably distinguish between belief, knowledge and fact, raising concerns in healthcare, law and journalism. A separate Nature article on factuality said large language models still tend to produce false or misleading content and can contribute to misinformation or disinformation.

OpenAI has described hallucinations as answers a model confidently generates that are not true, and it has also outlined training approaches meant to make models more honest and more willing to report their shortcomings. Earlier work has long treated negation as a stubborn NLP problem, including research on negation blindness and the so-called pink elephant problem.

Taken together, the new study reinforces a central limitation in today’s AI systems: they may hear that something is false, and still learn it as if it were true. That is the kind of failure that can quietly corrupt a search result, an internal enterprise memo or a high-stakes answer where accuracy is supposed to be nonnegotiable.

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under.
