Analysis

AI audits miss relationship integrity, Common Crawl pushes graph-based visibility checks

Common Crawl’s new audit checks crawlability, but Bill Hunt says AI still fails if it cannot map how pages relate. The real issue is relationship integrity.

Sam Ortega · 6 min read

Most AI visibility audits still stop at the URL level, and that is exactly where they miss the real failure. A page can be crawlable, indexable, and neatly marked up, yet still leave a machine unable to tell how the business fits together. Bill Hunt’s warning is blunt: if the relationships are broken, AI can find the site and still misunderstand the brand.

The blind spot is not access, it is meaning

Common Crawl’s AI Visibility Audit starts from a sensible premise: first check whether AI systems can actually reach your content. The field guide is free, built for SEOs and GEOs, uses only free tools, and is designed as a repeatable five-check audit that can be run in about 90 minutes. That makes it practical enough to run across large sites, which is exactly where teams tend to cut corners and treat visibility as a page-by-page exercise.

Hunt’s point is that discoverability is only the first hurdle. If AI cannot access information, it cannot retrieve it, summarize it, or recommend it. But access alone does not guarantee understanding, and that is where most audits fall apart: they validate whether a page exists, not whether the site expresses the business in a way a machine can reconstruct.

Why an integrity graph changes the game

The missing layer, in Hunt’s framing, is relationship integrity. He calls it an integrity graph, and the idea is simple enough to be dangerous: machines do not just read isolated pages, they infer context from the links between entities, branches, services, people, and proof points. If those connections are inconsistent or missing, AI can still crawl the site, but it will assemble a warped version of the business.

That is why the old habit of checking schema in isolation is no longer enough. A site can have sensible Organization markup, branch details, product pages, and service markup, yet still fail to express how those pieces belong together. An integrity graph asks a harder question: can the business be reconstructed accurately from the relationships it publishes, not just from the fields it fills in?
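
One way to picture the difference, as a minimal sketch rather than Hunt’s or Common Crawl’s actual method: treat each JSON-LD object as a node, treat every property whose value is a bare @id reference as an edge, and flag edges that point at nothing. The URLs, entity names, and the find_dangling_refs helper are hypothetical, and the sketch assumes the markup has already been extracted from the pages into Python dictionaries.

```python
from typing import Any


def find_dangling_refs(nodes: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Return (source_id, property, target_id) for references to undefined nodes."""
    defined = {node["@id"] for node in nodes if "@id" in node}
    dangling = []
    for node in nodes:
        source = node.get("@id", "<anonymous>")
        for prop, value in node.items():
            if prop.startswith("@"):
                continue  # skip JSON-LD keywords such as @id and @type
            values = value if isinstance(value, list) else [value]
            for item in values:
                # A reference is modeled here as a dict that holds only an @id.
                is_reference = isinstance(item, dict) and set(item) == {"@id"}
                if is_reference and item["@id"] not in defined:
                    dangling.append((source, prop, item["@id"]))
    return dangling


# Hypothetical markup: each field is filled in, but one relationship is broken.
nodes = [
    {"@id": "https://example.com/#org", "@type": "Organization",
     "name": "Example Bank"},
    {"@id": "https://example.com/branches/main#branch", "@type": "BankOrCreditUnion",
     "name": "Example Bank Main Street",
     "parentOrganization": {"@id": "https://example.com/#org"}},
    {"@id": "https://example.com/loans#service", "@type": "FinancialProduct",
     "name": "Home Loan",
     "provider": {"@id": "https://example.com/#organisation"}},  # misspelled target
]

broken = find_dangling_refs(nodes)
# broken == [("https://example.com/loans#service", "provider",
#             "https://example.com/#organisation")]
```

Every node in this example would pass a field-by-field schema check. Only the edge-level check shows that the loan product is attached to an organization that is never defined.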

What the Common Crawl data says about the web now

Common Crawl’s own graph releases underline why this approach is becoming unavoidable. Its March, April, and May 2026 web graphs included 262.4 million host-level nodes and 8.1 billion edges, plus 118.8 million domain-level nodes and 4.3 billion edges. Those are not vanity numbers. They show how central graph-based analysis has become in an environment where search and AI are increasingly built around connections, not isolated documents.

That scale also explains why page-level auditing feels outdated. If the systems that train on and retrieve from the web are already thinking in nodes and edges, then brands that present themselves as disconnected pages are handing those systems a broken map. Common Crawl’s work makes the point in a very practical way: if the internet is being modeled as a graph, your visibility strategy needs to be graph-shaped too.

Banking sites show the problem clearly

Hunt uses banking sites as the cleanest example because the failure is so easy to spot. Many banks had reasonable page-level schema, with Organization, branch, product, and service markup spread across the site. On paper, that looked disciplined; in practice, it often still left a machine with no reliable way to see which branch belongs to which brand, which service connects to which location, or which claims are supported by which evidence.

That is the trap. A bank can look structured from the outside while still lacking a real knowledge graph that ties the entities into a consistent whole. In Hunt’s terms, the site may be visible, but the business is not legible.

How to audit for relationship integrity

A useful AI visibility audit should still check crawlability, but it cannot stop there. After the access test, the next layer is consistency: do the pages, entities, and claims line up across the site, or do they drift from one section to another? Then comes corroboration: can the same business facts be verified across multiple pages, not just asserted in a single place?

A practical integrity check should include:

  • Whether core entities are named the same way everywhere
  • Whether branches, services, leadership, and proof points link back to one another cleanly
  • Whether structured data reflects actual site relationships instead of isolated fields
  • Whether the site can be rebuilt as a coherent business graph, not just a pile of indexable URLs
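
The checks above can be sketched in a few lines of code. This is an illustration under stated assumptions, not a tool from Hunt or Common Crawl: it assumes entity mentions and entity-to-entity links have already been extracted from the site, and the identifiers, names, and both helper functions are hypothetical.

```python
from collections import defaultdict, deque


def inconsistent_names(mentions):
    """mentions: iterable of (entity_id, name, page_url).

    Return the entities that appear under more than one normalized name.
    """
    names = defaultdict(set)
    for entity_id, name, _page in mentions:
        # Normalize whitespace and case so only real naming drift is reported.
        names[entity_id].add(" ".join(name.split()).casefold())
    return {eid: sorted(found) for eid, found in names.items() if len(found) > 1}


def unreachable_entities(edges, all_entities, root):
    """Return entities that cannot be reached from root by following links.

    Links are treated as undirected: a branch pointing at its brand connects both.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk outward from the root entity
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(set(all_entities) - seen)


# Hypothetical extraction results for a small bank site.
mentions = [
    ("#org", "Example Bank", "/"),
    ("#org", "Example  Bank", "/about"),  # extra space, same name once normalized
    ("#org", "Example Bank N.A.", "/branches/main"),
    ("#branch-main", "Main Street Branch", "/branches/main"),
]
edges = [("#branch-main", "#org"), ("#home-loan", "#branch-main")]
all_entities = ["#org", "#branch-main", "#home-loan", "#ceo"]

naming_drift = inconsistent_names(mentions)
# naming_drift == {"#org": ["example bank", "example bank n.a."]}
orphans = unreachable_entities(edges, all_entities, "#org")
# orphans == ["#ceo"]
```

The first function covers the naming check; the second approximates the "coherent business graph" check by asking whether every entity connects back to the organization at all.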

That is the real upgrade in mindset. You are no longer asking only, “Can AI get in?” You are asking, “Can AI understand who we are, what we sell, where we operate, and why the claims hang together?”

Why the crawl debate got louder this month

The relationship-integrity conversation is landing in the middle of a much louder fight over what gets crawled at all. Common Crawl has said many SEOs unknowingly block their sites from AI search by restricting CCBot in robots.txt, which means the first failure can be self-inflicted before any model ever sees the page. At the same time, publishers and trade groups are pushing back hard on how their material is collected and used.
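
A self-inflicted block of this kind is easy to test for. The sketch below uses Python’s standard urllib.robotparser module; the robots.txt content, the example.com URL, and the is_allowed helper are hypothetical, and the user-agent token is assumed to be CCBot as named above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that shuts out CCBot while allowing everyone else.
robots_txt = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

ccbot_ok = is_allowed(robots_txt, "CCBot", "https://example.com/products")
other_ok = is_allowed(robots_txt, "Googlebot", "https://example.com/products")
# ccbot_ok is False, other_ok is True
```

To test a live site instead of a string, the same parser can load the file with set_url() followed by read().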

Digital Content Next sent Common Crawl a cease-and-desist letter on June 10, 2026, asking it to remove protected content from its datasets, including paywalled and subscriber-only news articles. Search Engine Land reported that DCN represents major publishers including the Associated Press, The New York Times, NBCUniversal, Bloomberg, NPR, and Fox. Rich Skrenta, Common Crawl’s executive director, disputed the idea that CCBot bypasses paywalls and said removal requests are handled through a technical process.

Common Crawl has also been publishing an opt-out registry for legal removal requests, which shows how fraught the open-web data layer has become. The fight is no longer just about whether a crawler can fetch a page. It is about who gets to shape the dataset that AI systems train on, pull from, and trust.

The practical takeaway for AI visibility teams

If you are still auditing one page at a time, you are checking the surface of the problem. AI visibility now depends on whether the machine can connect the dots between the pages, entities, claims, and proof that define the business. Common Crawl’s five-check audit is a useful starting point, but Hunt’s integrity graph is the real upgrade: it shifts the work from validating URLs to validating the business itself.

That is where the next round of visibility wins will come from. Not from more schema in isolation, but from a site that can be read as a truthful, connected model of the organization.

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under. Read our full AI policy.
