Analysis

Google clarifies robots.txt limits as WooCommerce URLs stay indexed

More than 51,000 WooCommerce URLs were still marked indexed despite robots.txt blocks, showing why crawl control is not the same as deindexing.

Sam Ortega · 2 min read
Source: aioseo.com

More than 51,000 WooCommerce URLs showed up in Search Console as indexed even though robots.txt was blocking them, and most of them were add-to-cart parameter pages. That is the exact kind of mess that keeps site owners staring at the wrong file, because robots.txt controls crawling, not whether Google can ever know a URL exists.
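To confirm that a pile of reported URLs really is dominated by one parameter, the URLs can be tallied by query parameter name. The sketch below assumes the URLs were exported from Search Console into a plain list; the `shop.example` URLs and the `count_by_parameter` helper are illustrative only and do not come from the reported case.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def count_by_parameter(urls):
    """Count how many URLs carry each query parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values so "?add-to-cart=" with no ID is still counted
        for name in parse_qs(query, keep_blank_values=True):
            counts[name] += 1
    return counts


# Hypothetical sample standing in for a Search Console export.
sample_urls = [
    "https://shop.example/?add-to-cart=101",
    "https://shop.example/product/hat/?add-to-cart=102",
    "https://shop.example/product/hat/",
    "https://shop.example/shop/?orderby=price&add-to-cart=103",
]

counts = count_by_parameter(sample_urls)
for name, total in counts.most_common():
    print(f"{name}: {total}")
```

If one parameter such as `add-to-cart` accounts for most of the list, the problem is a URL pattern rather than thousands of unrelated pages.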

Google’s current guidance is blunt on that point. robots.txt is for managing crawler access and avoiding unnecessary requests, not for keeping a page out of Search. If the goal is true deindexing, Google says to use noindex or password protection instead. Google also warns that a blocked URL can still be indexed if other pages link to it, even if the crawler never reads the page itself.
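The crawl-versus-index split can be seen with Python's standard `urllib.robotparser`. This is a minimal sketch under stated assumptions: the robots.txt content and the `shop.example` URLs are made up, and the rule is a plain path prefix because the standard-library parser matches prefixes and does not implement the wildcard patterns Google supports. The point is what the check can answer: whether a crawler may fetch a URL, nothing more.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example store (not taken from the case).
ROBOTS_TXT = """\
User-agent: *
Disallow: /?add-to-cart=
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="Googlebot"):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


cart_url_allowed = may_crawl("https://shop.example/?add-to-cart=101")
product_url_allowed = may_crawl("https://shop.example/product/hat/")

print("add-to-cart URL crawlable:", cart_url_allowed)
print("product URL crawlable:", product_url_allowed)
```

Nothing in this check says whether the URL is in the index. A disallowed URL simply goes unfetched, which is why a link from another page can still get the bare URL indexed.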

That distinction matters because Search Console can look scarier than it is. Google’s Page Indexing help says a page may be indexed despite a robots.txt block, and the report labels that status “Indexed, though blocked by robots.txt.” Google’s own technical guidance adds that blocked pages are unlikely to show in normal Search results, so a line in a report is not the same thing as broad visibility. In practice, John Mueller’s guidance on the WooCommerce case was that those add-to-cart URLs do not need to be indexed and that blocking them with robots.txt is acceptable. He also said that even when they are reported as indexed, they are unlikely to surface unless someone searches for those exact URLs, which users rarely do.

The right response is architectural, not panicked. Google recommends using the Page Indexing report, Crawl Stats and URL Inspection to understand why a URL is inaccessible and whether canonical signals already point to the preferred product page. That matters for ecommerce sites with faceted navigation, tracking parameters and utility URLs, where a blunt crawl block can hide useful signals without solving the indexing problem you actually care about.
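One way to reason about canonical signals is to ask what each noisy URL collapses to once utility and tracking parameters are removed. The sketch below is an assumption-laden illustration: the `UTILITY_PARAMS` and `TRACKING_PREFIXES` values and the `strip_noise` helper are hypothetical, and a real store would choose its own lists. It also cannot recover a product page from a URL such as `/?add-to-cart=101`, which carries no product path.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical lists; adjust to the parameters a given store actually emits.
UTILITY_PARAMS = {"add-to-cart"}
TRACKING_PREFIXES = ("utm_",)


def strip_noise(url):
    """Drop utility and tracking parameters (and any fragment) from a URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in UTILITY_PARAMS and not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


cleaned = strip_noise(
    "https://shop.example/product/hat/?add-to-cart=102&utm_source=mail"
)
print(cleaned)
```

Comparing the cleaned URL against the canonical that URL Inspection reports gives a quick read on whether the signals already point at the preferred page.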

Google has been making this same point for years. In its September 2023 Search Central office hours, the company said it can index the URL itself, even when robots.txt blocks crawling, because the URL can be known without the content being fetched. Google also says it generally caches robots.txt for up to 24 hours, with a faster recrawl available on request through the robots.txt report. The lesson for WooCommerce stores is simple: if a page should stay out of search, do not rely on robots.txt alone. Use robots.txt to control crawling, and use noindex, password protection, canonicals and cleaner URL design to control what actually gets indexed. One caveat: Google can only see a noindex rule on a page it is allowed to crawl, so a URL that needs noindex must not also be blocked in robots.txt.
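The interaction between the two controls can be written down as a small decision function. This is a deliberately simplified model, not Google's actual logic: the `Outcome` names and the three boolean inputs are assumptions made for illustration, and it ignores canonicals, sitemaps and password protection. It encodes one documented constraint, that a noindex rule only works when the page is not blocked by robots.txt, because a blocked page is never fetched.

```python
from enum import Enum


class Outcome(Enum):
    ELIGIBLE = "crawled and eligible for indexing"
    EXCLUDED = "crawled, noindex seen, kept out of the index"
    URL_ONLY = "not crawled; URL may still be indexed without content"
    UNDISCOVERED = "not crawled and not discovered through links"


def indexing_outcome(blocked_by_robots, has_noindex, linked_from_elsewhere):
    """Simplified model of how crawl blocking and noindex combine."""
    if blocked_by_robots:
        # The page is never fetched, so any noindex on it is never read.
        if linked_from_elsewhere:
            return Outcome.URL_ONLY
        return Outcome.UNDISCOVERED
    return Outcome.EXCLUDED if has_noindex else Outcome.ELIGIBLE


# A blocked add-to-cart URL that other pages link to, even with noindex set:
print(indexing_outcome(True, True, True).value)
# The same URL once the robots.txt block is lifted:
print(indexing_outcome(False, True, True).value)
```

In this model the only path to a reliable exclusion runs through a crawlable page, which matches the advice to treat robots.txt as a crawl control and noindex as the indexing control.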

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under.
