
The Facts About Google Click Signals, Rankings, and SEO

Clicks as a ranking signal have been controversial for more than two decades, although today many SEOs understand that clicks are not a direct ranking factor. The simple fact about clicks is that they are raw data that, perhaps surprisingly, is processed in much the same way as human rater scores.

Clicks Are A Raw Signal

A DOJ antitrust memorandum from September 2025 refers to clicks as a “raw signal” used by Google. It also categorizes web page content and search query terms as raw signals. This is important because a raw signal is the lowest-level data point, one that is processed into higher-level signals or used to train a model such as RankEmbed and its successor, RankEmbedBERT.

Raw signals are considered low-level because they are:

  • Directly observed
  • Not yet interpreted or used as training data

The DOJ document quotes professor James Allan, who provided expert testimony on behalf of Google:

“The signals vary in complexity. There are “raw” signals, such as the number of clicks, the content of the web page, and the terms within the query.

…These signals can be created in simple ways, such as counting occurrences (e.g., how many times a web page was clicked in response to a certain query). Id. at 2859:3–2860:21 (Allan) (discussing the Navboost signal)”
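The “counting occurrences” processing described above can be sketched in a few lines of Python. This is purely illustrative; the queries, URLs, and log format are made up and do not come from the DOJ document:

```python
from collections import Counter

# Hypothetical raw click log: (query, clicked_url) pairs.
click_log = [
    ("best running shoes", "example.com/shoes"),
    ("best running shoes", "example.com/shoes"),
    ("best running shoes", "other.com/review"),
]

# "Counting occurrences": how many times each web page was clicked
# in response to a certain query.
click_counts = Counter(click_log)

print(click_counts[("best running shoes", "example.com/shoes")])  # 2
```

The counts themselves decide nothing; like the raw signals in the memorandum, they only become useful after further processing.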

He then compares the raw signals to the processed ones:

“On the other end of the spectrum are deep learning models, which are machine learning models that recognize complex patterns in large data sets.

Deep-learning models discover and exploit patterns across multiple data sets. They add unique capabilities, at a higher cost.”

Professor Allan explains that higher-level signals, including popularity and quality, are used to generate the “final” score of a web page.

Raw Signals Are Data For Further Processing

Navboost is mentioned several times in the September 2025 antitrust document as a popularity signal. It is not mentioned in the context of clicks having a direct ranking effect on individual sites.

It is referred to as a measure of popularity and intent:

“…popularity as measured by user intent and feedback systems including Navboost/Glue…”

And elsewhere, in the context of explaining why some Navboost data is privileged:

“…popularity as measured by user intent and feedback systems including Navboost/Glue…”

And to explain which data sets Google must share under the proposed remedy:

“Under the proposed remedy, Google must make available to Eligible Competitors … the following data sets:

1. User-side data used to build, create, or run GLUE’s statistical models;

2. User-side data used to train, build, or run RankEmbed models; and

3. User-side data used as training data for GenAI models used in Search or any GenAI product that can be used to access Search.

Google uses the first two data sets to build search signals and the third to train and refine the models underlying AI Overviews and (presumably) the Gemini app.”

Clicks, like human rater scores, are just a raw signal. They are fed through a series of algorithms, either to train AI models to better match web pages to queries, or to generate a quality or relevance signal that a ranking engine then combines with all the other quality signals.

70 Days of Search Logs

The DOJ document makes a reference to using 70 days of search logs, but those are just a few words within a larger context.

Here is the part that is often quoted:

“70 days of search logs and human-generated scores”

I get it, that seems simple and straightforward. But there is more context to it:

“RankEmbed and its later iteration RankEmbedBERT are ranking models that rely on two main sources of data: [Redacted]% of 70 days of search logs and scores generated by human raters and used by Google to measure the quality of organic search results.”

Those 70 days of search logs are not click data used directly for ranking purposes in Google Search, AI Mode, or Gemini. The data is aggregated and processed to train specialized AI models such as RankEmbedBERT, which ranks web pages based on natural language analysis.

That part of the DOJ document does not say that Google directly uses click data to rank search results. It is data that, like the human rater scores, other systems use as training data or for further processing.

What is Google’s RankEmbed?

RankEmbed is a natural language method for identifying relevant documents and ranking them.

The same DOJ document explains:

“RankEmbed’s model itself is an AI-based, deep-learning system with a strong understanding of natural language. This allows the model to efficiently identify the best documents to retrieve, even if a query lacks specific terms.”

It is trained on less data than previous models. The training data consists of query and web page pairs:

“…RankEmbed is trained on 1/100th of the data used to train previous ranking models but provides high quality search results.

…Among the basic training data is information about the query, including keywords that Google removed from the query, and the resulting web pages.”

That is training data for teaching a model how query terms match web pages.
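A minimal sketch of what such query/web-page training pairs could look like, assuming a toy log format; the queries, URLs, and scores are invented for illustration and are not from the DOJ document:

```python
# Hypothetical search-log entries: which page was surfaced for which query.
search_logs = [
    {"query": "marathon training plan", "url": "runsite.com/plan"},
    {"query": "marathon training plan", "url": "spam.example/page"},
]

# Hypothetical human rater scores measuring result quality.
rater_scores = {"runsite.com/plan": 0.9, "spam.example/page": 0.1}

# Join logs with rater scores into (query, document, label) examples,
# the kind of query/web-page pairs a ranking model could be trained on.
training_examples = [
    (log["query"], log["url"], rater_scores.get(log["url"], 0.0))
    for log in search_logs
]

print(training_examples[0])  # ('marathon training plan', 'runsite.com/plan', 0.9)
```

The point of the sketch is the shape of the data: the model never sees individual clicks at ranking time, only labeled pairs assembled beforehand.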

The same document explains:

“The data underlying RankEmbed’s models is a combination of click-through data and demographics of web pages.”

It is very clear that this particular passage describes click data (and the other underlying data) being used to train AI models, not to directly influence rankings.

What About Google’s Click Patent?

Back in 2006, Google filed a patent related to clicks called Modifying search result ranking based on implicit user feedback. The invention is a mathematical formula for creating a measure of relevance from aggregated raw click data (click quantities).

The patent distinguishes between the creation of the signal and the act of ranking itself. A relevance score goes to a ranking engine, which may add it to existing ranking scores to re-rank search results for future searches.

Here is what the patent describes:

“A ranking sub-system can include a ranking engine that uses implicit user feedback to cause re-ordering of search results to improve the final ranking presented to the user of the information retrieval system.

User selections of search results (click data) can be tracked and transformed into a click fraction that can be used to re-rank future search results.”

That “click fraction” is a measure of relevance. The invention described in the patent is not click tracking; it is the statistical measure (the fraction of clicks) that results from aggregating all those individual clicks. That includes short clicks, medium clicks, long clicks, and last clicks.

Technically, it is called the LC|C (Long Clicks over Clicks) component. “Clicks” is plural because it makes decisions based on statistics from multiple (aggregated) clicks, not individual clicks.

That click component is an aggregation because of:

  • Summation:
    The first number in the formula is the weighted sum of all the clicks measured for a given query-document pair.
  • Normalization:
    It takes that sum and divides it by the total number of all clicks for the pair (“the second number”).
  • Smoothing:
    The system applies a “smoothing factor” to the aggregated numbers so that a single click on a rare query does not unduly skew the results, which also guards against spam.

That 2006 patent describes the formula this way:

“The basic LC|C click fraction can be defined as:

LC|C_BASE = [#WC(Q,D)] / [#C(Q,D) + S0]

where #WC(Q,D) is the weighted sum of clicks for the query-URL pair, #C(Q,D) is the total number of clicks (a normalized, unweighted count) for the query-URL pair, and S0 is a smoothing factor.”

That formula describes summing and dividing data from many users to create a single score for a document. A “query-URL pair” is a bucket of data that stores the click behavior of every user who has ever typed that particular query and clicked on that particular search result. A slick feature is the anti-spam component: single clicks on rare search queries are not counted.
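As a rough illustration of that aggregation, here is a minimal Python sketch of the base formula. The click weights and smoothing value are invented for the example; the patent does not disclose the actual numbers:

```python
# Hypothetical weights: longer clicks signal more satisfaction.
CLICK_WEIGHTS = {"short": 0.2, "medium": 0.5, "long": 1.0, "last": 1.0}

def lcc_base(clicks, smoothing=10.0):
    """LC|C_BASE = #WC(Q,D) / (#C(Q,D) + S0) for one query-URL pair.

    clicks: click types observed for the pair, e.g. ["long", "short"].
    smoothing: S0, which damps pairs with very few clicks (anti-spam).
    """
    weighted_sum = sum(CLICK_WEIGHTS[c] for c in clicks)  # #WC(Q,D)
    total_clicks = len(clicks)                            # #C(Q,D)
    return weighted_sum / (total_clicks + smoothing)

# A single long click on a rare query is heavily damped by S0:
print(round(lcc_base(["long"]), 3))  # 0.091
```

With many long clicks the fraction approaches 1, while a lone click on an obscure query stays near zero, which is the anti-spam behavior described above.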

Even back in 2006, clicks were simply raw data that was converted, through multiple aggregation stages, into a statistical measure of relevance before it ever reached the ranking stage. In this patent, clicks themselves are not ranking factors that directly influence whether a site ranks or not. They were used in aggregate as a measure of relevance, which was then handed to another engine that did the ranking.

By the time the information reaches the ranking engine, the raw data has been converted from individual user actions into a relevancy measure.

  • It is misleading to think of clicks as directly driving search rankings.
  • Clicks are just raw data.
  • Clicks are used to train AI systems like RankEmbedBERT.
  • Clicks do not directly affect search results. They have always been raw data, the starting point for systems that aggregate that data into a signal, which is then integrated into Google’s ranking systems.
  • Just like human rater scores, raw click data is processed to create a signal or to train AI systems.

Read the DOJ memorandum in PDF form here.

Read about four research papers about CTR.

Read Google’s 2006 patent, Modifying search result ranking based on implicit user feedback.

Featured image by Shutterstock/Carkhe
