The rules of digital visibility are changing fast. A few years ago, the goal was to rank on page one of Google. Today, millions of users skip the search results page entirely and ask ChatGPT, Gemini, Perplexity, or Claude directly. The model reads the web, synthesizes an answer, and cites a handful of sources.
Your content either makes that shortlist, or it doesn't.
At elelem, we asked a question that no one in the GEO (Generative Engine Optimization) space had answered rigorously: can we build a score that predicts, ahead of time, how likely a given webpage is to be cited in an LLM-generated response?
We built one. Then we validated it against 140,000 real LLM queries. Here is what we found.
Before we get to the score, it helps to understand why predicting LLM citations is hard.
Traditional SEO is relatively legible. Google's ranking signals, while complex, are reasonably well-studied. You can measure backlinks, page speed, keyword density, and get a directional sense of where you stand.
LLM citation works differently. A model like ChatGPT or Gemini operates through a retrieval-augmented generation (RAG) pipeline. Documents are first retrieved as candidates, then the model selects a small subset to cite within a constrained context window. That selection is influenced by factors across at least four layers, spanning both the document itself and the surrounding retrieval system.
There are over 100 variables that potentially influence whether your page gets cited. Most of them are either outside your control or impossible to measure directly.
The elelem Retrieval Score focuses on the layer where you have the most leverage: document-level factors.
The score takes a query and a webpage as input and produces a single number estimating the likelihood of citation. It is built from four families of signals, each capturing a distinct dimension of how well a document serves a given query.
The four signal families are: semantic relevance, contextual relevance, lexical overlap, and query token positioning.
Each signal is computed independently and combined into a composite score. The weights assigned to each component are not arbitrary -- they are derived empirically from how strongly each signal correlates with actual citation outcomes in our dataset. The weighting scheme is updated as we collect more data.
Without disclosing the precise implementation, the core intuition is this: a webpage that is deeply aligned with the meaning of a query, answers it directly, uses the right vocabulary, and surfaces that vocabulary early in the document is more likely to be cited than one that falls short on those dimensions.
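To make the intuition concrete, here is a minimal, self-contained sketch of a four-signal composite score. Everything in it is an assumption for illustration: the weights, the term-frequency cosine standing in for real semantic embeddings, the sliding-window proxy for contextual relevance, and the first-occurrence heuristic for token positioning. None of this is elelem's actual implementation.

```python
import math
import re

# Hypothetical weights -- in a real system these would be fit
# empirically against citation outcomes, as described above.
WEIGHTS = {"semantic": 0.4, "contextual": 0.3, "lexical": 0.2, "position": 0.1}

def _tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def semantic_relevance(query, doc):
    # Toy proxy: cosine similarity of term-frequency vectors.
    # A production system would use dense embeddings instead.
    q, d = _tokens(query), _tokens(doc)
    vocab = sorted(set(q) | set(d))
    qv = [q.count(t) for t in vocab]
    dv = [d.count(t) for t in vocab]
    dot = sum(a * b for a, b in zip(qv, dv))
    norm = math.hypot(*qv) * math.hypot(*dv)
    return dot / norm if norm else 0.0

def lexical_overlap(query, doc):
    # Share of distinct query tokens appearing anywhere in the document.
    q, d = set(_tokens(query)), set(_tokens(doc))
    return len(q & d) / len(q) if q else 0.0

def contextual_relevance(query, doc, window=30):
    # Best query coverage within any single window of the document --
    # a rough proxy for "answers the query directly in one place".
    d = _tokens(doc)
    if len(d) <= window:
        return lexical_overlap(query, doc)
    q = set(_tokens(query))
    best = 0.0
    for i in range(0, len(d) - window + 1, max(1, window // 2)):
        chunk = set(d[i:i + window])
        best = max(best, len(q & chunk) / len(q) if q else 0.0)
    return best

def token_positioning(query, doc):
    # Rewards query tokens whose first occurrence is early in the document.
    d = _tokens(doc)
    firsts = [d.index(t) for t in set(_tokens(query)) if t in d]
    if not firsts:
        return 0.0
    mean_rel = sum(firsts) / len(firsts) / len(d)  # 0 = start, ~1 = end
    return 1.0 - mean_rel

def retrieval_score(query, doc):
    signals = {
        "semantic": semantic_relevance(query, doc),
        "contextual": contextual_relevance(query, doc),
        "lexical": lexical_overlap(query, doc),
        "position": token_positioning(query, doc),
    }
    return sum(WEIGHTS[k] * signals[k] for k in signals)
```

With weights summing to one and each signal bounded in [0, 1], the composite stays in [0, 1], and an on-topic page that uses the query's vocabulary early will outscore an off-topic one.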
To test whether the score actually predicts citation outcomes, we ran a correlation study across 140,414 LLM requests, spanning four providers: ChatGPT, Gemini, Perplexity, and Claude.
The target variable was share of citation: the fraction of times a given URL was cited within a query group, normalized per provider to control for differences in how many citations each model tends to include per response.
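The target variable can be sketched in a few lines. The record format and the helper name here are assumptions for illustration; the key idea from the study is that shares are normalized within each (provider, query group) pair, so a model that cites ten sources per answer is comparable to one that cites three.

```python
from collections import Counter, defaultdict

def share_of_citation(records):
    """records: list of (provider, query_group, cited_url) tuples,
    one tuple per citation observed in an LLM response.

    Returns {(provider, query_group, url): share}, where shares within
    each (provider, query_group) sum to 1. Normalizing per provider
    controls for how many citations each model tends to include."""
    counts = Counter(records)
    totals = defaultdict(int)
    for (provider, group, _url), n in counts.items():
        totals[(provider, group)] += n
    return {key: n / totals[(key[0], key[1])] for key, n in counts.items()}
```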
Key results:
Why aren't the correlations higher? Because they shouldn't be. As outlined above, citation outcomes are shaped both by the document-level factors we model and by system-level factors we deliberately do not attempt to model. A score that claimed to predict citations perfectly would be lying. Moderate, consistent, statistically significant correlations across 140K observations and four independent providers are exactly the signal you want from a document-level optimization tool.
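The write-up does not name the correlation statistic used. For a relationship like score versus citation share, which is expected to be monotonic but not necessarily linear, Spearman's rank correlation is a standard choice. A stdlib-only sketch, purely to show what is being measured:

```python
import math

def _ranks(values):
    # Average ranks (1-based), with ties sharing their mean rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman = Pearson correlation of the rank-transformed values.
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0
```

Because it operates on ranks, the statistic rewards a score that orders pages correctly by citation share, without requiring the relationship to be linear -- a reasonable bar for an optimization signal.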
The practical implication is straightforward. If you are a brand trying to appear in LLM-generated answers, you now have a measurable, quantitative signal to optimize against -- rather than guessing.
elelem's GEO Platform surfaces this score directly in the Score Draft feature within the Optimize Content section. For each of your webpages and a given target query group, you get a retrieval score and actionable guidance on what is holding the score down and how to address it.
The score is not a guarantee of citation. But it is the most empirically grounded signal currently available for document-level GEO.
This is version one of the retrieval score. The current implementation operates at the document level. Upcoming improvements include:
We are also exploring the use of industry-specific corpora for the lexical overlap component, after observing that domain-general corpora produce weaker signals for specialized verticals.
If you are working on your brand's visibility in LLM-generated responses and want to see the Retrieval Score in action, get in touch with the elelem team.

