Analysis

How ChatGPT picks sources: hidden signals shape citations

ChatGPT citations are routed by hidden request signals, not just page quality. Publishers now need pages that are crawlable, explicit, and easy for the model to trust.

Daniel Reid · 3 min read

The raw JSON exchanged beneath a ChatGPT reply contains hidden request fields that steer what gets fetched, cited, or ignored. Suganthan Mohanadasan traced those fields through about 1,240 source records from a logged-in Pro account across a few dozen searches. His captures suggest the visible answer is only the last layer of the system, and that AI search visibility now depends on retrieval mechanics as much as on writing quality.

What the network traffic is really showing

The most important detail is the method. Mohanadasan is explicit that the sample is small and query-specific, but he treats the structural findings as durable because the fields appear directly in the network layer. A field that shows up in the payload is part of what the system exchanges, and plausibly part of its routing logic, whether or not the final answer exposes it.

The mechanics are broader than a simple “good content wins” story. Mohanadasan identified hidden inputs such as result source, turn use case, vendor names, and the search queries the model wrote itself. Those signals are not the same thing as page quality, and they are not captured cleanly by many share-of-voice tools. If the model is deciding what to fetch before it decides what to cite, then pages have to be legible to that earlier routing stage.
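For readers who want to look at their own captured traffic, here is a minimal sketch of how such source records could be tallied. It assumes the records have already been exported as a list of JSON objects. The key names (`result_source`, `turn_use_case`, `vendor`, `cited`) are hypothetical stand-ins for the inputs described above, not confirmed field names from ChatGPT's payload, and the sample data is made up.

```python
import json
from collections import Counter

# Hypothetical field names: the article describes these inputs in prose only,
# so the real keys in a captured payload may differ.
FIELDS = ("result_source", "turn_use_case", "vendor")


def tally_fields(records, fields=FIELDS):
    """Count the values seen for each field across captured source records."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def cited_share(records, group_field="result_source", cited_field="cited"):
    """Return, per group value, the fraction of records flagged as cited."""
    totals, cited = Counter(), Counter()
    for record in records:
        key = record.get(group_field, "unknown")
        totals[key] += 1
        if record.get(cited_field):  # "cited" is an assumed boolean flag
            cited[key] += 1
    return {key: cited[key] / totals[key] for key in totals}


# Made-up sample records, for illustration only.
SAMPLE = json.loads("""
[
  {"url": "https://example.com/a", "result_source": "web", "vendor": "vendor_a", "cited": true},
  {"url": "https://example.com/b", "result_source": "web", "vendor": "vendor_b", "cited": false},
  {"url": "https://example.org/c", "result_source": "news", "vendor": "vendor_a", "cited": true}
]
""")

if __name__ == "__main__":
    print(tally_fields(SAMPLE))
    print(cited_share(SAMPLE))
```

Grouping by a routing field before looking at citation rates keeps the two stages separate: which sources were fetched at all, and which of those ended up cited. With a sample as small as the one described, the shares are descriptive only.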

Why OpenAI’s own product docs raise the stakes

OpenAI says ChatGPT search can pull in the latest information from the internet and return answers with sourced citations, while deep research is built to search the public web or specific sites and produce a documented report. ChatGPT can help gather sources, analyze information, and create structured, citation-backed insights.

OpenAI’s training disclosures add another layer. The company says the models behind ChatGPT are developed from three primary buckets of information: publicly available internet content, third-party partner data, and information provided or generated by users, human trainers, and researchers. When a model has broad pretraining and live web retrieval, the real editorial question becomes which documents it can retrieve fast enough, trust enough, and quote cleanly enough to surface in the answer.

The signals that keep showing up

Citation performance is increasingly tied to specific, measurable signals rather than vague brand authority. OpenAI says search and deep research rely on cited web sources, while Mohanadasan’s traffic analysis found the request payload helps decide which sources get pulled into play. Separately, SE Ranking’s study of 129,000 domains found that citation likelihood in ChatGPT rises with stronger referring-domain profiles, higher domain trust, better organic visibility, and heavier presence on Quora and Reddit. Those are concrete distribution channels, not abstract content virtues.

A Search Engine Journal analysis found that ChatGPT Search is citing fewer websites per response after GPT-5.3 Instant became the default experience, with average unique domains per response dropping from 19 to 15 in the cited dataset, a decline of about 21%.

What publishers should build for

The editorial lesson is not “stuff more keywords into articles.” It is to build pages that survive a retrieval pass and read like reliable evidence once they are found. ChatGPT search and deep research reward pages that are easy to extract, easy to verify, and easy to connect to a specific query. That means clear claims, named entities, and source-ready formatting.

A citation-ready page checklist

  • Put the answer near the top, then support it with specific facts, dates, and named entities. OpenAI’s search flow is designed for current information, so pages that force the model to dig for the point risk losing the citation.
  • Use plain, explicit wording around key claims. If a page is trying to be quoted by a system that produces documented reports, ambiguity makes it harder to quote.
  • Make source relationships obvious. Recognized, externally validated domains have an advantage, while Mohanadasan’s traffic analysis found the model is deciding among candidate sources before the answer is formed.
  • Build for third-party corroboration. The strongest citation signals include referring-domain strength, domain trust, organic visibility, and visibility on widely crawled community platforms.
  • Expect model behavior to change. The Search Engine Journal analysis of GPT-5.3 Instant found citation-pattern shifts when the default experience changed, so pages need durable factual clarity, not one-off formatting tricks.

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under. Read our full AI policy.
