Search Engine Journal says stripping HTML hurts AI search signals
Google’s search team warned that markdown-style pages can strip away the structure, semantics and links AI search still depends on.

Turning a page into markdown may make it look cleaner to a crawler, but Google’s search team said the stripped-down version can erase the very signals that help pages win visibility. John Mueller and Martin Splitt pushed back on the idea that removing non-content elements makes a site better for AI search, arguing that HTML carries headings, links, metadata, readable hierarchy, accessibility information and other signals that browsers, screen readers and search systems still use.
The warning lands hardest for agencies chasing AI-SEO shortcuts. Markdown may be easier for a machine to parse, but converting HTML into plain text is not the real technical hurdle. The real risk is cutting away context that helps systems understand what a page means, not just what words it contains. For teams managing client sites, that means preserving the full structure of a page rather than pursuing a minimal-content ideal that may weaken ranking and comprehension at the same time.
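To make the point concrete, here is a minimal sketch of what a flattening step discards. It uses only Python's standard `html.parser` module. The `SignalCollector` class, the `collect_signals` helper and the sample page are hypothetical, written for illustration only. They do not represent how Google or any other search system processes pages.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SignalCollector(HTMLParser):
    """Illustrative only: records a few structural signals next to the flat text."""

    def __init__(self):
        super().__init__()
        self.headings = []      # (level, text) pairs, e.g. (1, "Pricing")
        self.links = []         # href values of <a> elements
        self.meta = {}          # <meta name=...> mapped to its content
        self.text_parts = []    # visible text with all markup removed
        self._heading_level = None
        self._heading_buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS:
            self._heading_level = int(tag[1])
            self._heading_buf = []
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and "content" in attrs:
            self.meta[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if self._heading_level is not None and tag == f"h{self._heading_level}":
            text = "".join(self._heading_buf).strip()
            self.headings.append((self._heading_level, text))
            self._heading_level = None

    def handle_data(self, data):
        if self._heading_level is not None:
            self._heading_buf.append(data)
        if data.strip():
            self.text_parts.append(data.strip())


def collect_signals(html):
    """Return the structural signals and the stripped plain text for comparison."""
    parser = SignalCollector()
    parser.feed(html)
    parser.close()
    return {
        "headings": parser.headings,
        "links": parser.links,
        "meta": parser.meta,
        "plain_text": " ".join(parser.text_parts),
    }


# Hypothetical sample page, not taken from any real site.
SAMPLE = """<html><head>
<meta name="description" content="Plans and pricing"></head>
<body><h1>Pricing</h1><p>See the <a href="/plans">plan list</a>.</p>
<h2>FAQ</h2><p>Contact us.</p></body></html>"""

if __name__ == "__main__":
    result = collect_signals(SAMPLE)
    for key, value in result.items():
        print(f"{key}: {value}")
```

In this sample, the plain-text field keeps the words but not the link target, the heading levels or the meta description. Those items survive only in the structured fields, which is the kind of context the paragraphs above describe.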
Google’s own June 2026 guidance backs up that caution. Its generative AI features in Search are rooted in core Search ranking and quality systems, and the company says those features use retrieval-augmented generation and query fan-out to pull relevant pages from the Search index and show clickable links back to those pages. Google also said SEO best practices remain relevant, which keeps ordinary crawlable HTML, clear technical structure and readable page architecture squarely in the center of the strategy.
That same guidance draws a bright line around schema and terminology. Google said there is no special schema.org markup required for generative AI search, even as structured data remains useful for SEO and rich results. It also said the labels “AEO” and “GEO” are industry terms, but from Google’s perspective the work is still SEO. The message is less about inventing a new format for AI and more about making existing pages legible, structured and easy to trust.

The broader pattern is familiar. In a separate discussion about llms.txt, Mueller argued that LLM systems cannot use self-reported files to distinguish one website from another for discovery. Search Engine Land also reported earlier in 2026 that Google and Bing representatives were advising against separate markdown pages for LLM purposes. Taken together, the advice points in one direction: keep the real page intact, keep the signals intact, and let AI systems read the web the way search has always worked.
This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under.


