
AI Search Is Eating Itself, And The SEO Industry Is Feeding It

Last September, Lily Ray asked Perplexity about the latest SEO and AI search news. It confidently told her about a “September 2025 ‘Perspectives’ update” to Google’s core algorithm. The update, as she detailed at length in “The AI Slop Loop,” did not exist. Google had announced no such update, and Perspectives was already a SERP feature. If a real update had shipped while she was in Austria, her inbox would have told her before Perplexity did.

She checked the citations. Both pointed to AI-generated posts on SEO blogs: sites running content pipelines that had spun speculation into posts and published them as reporting. Perplexity read the slop, treated it as sourcing, and served it back to her as news.

In February, the BBC’s Thomas Germain spent 20 minutes writing a blog post on his personal site. Its headline: “Technology journalists who are best at eating hot dogs.” It ranked him first, invented a 2026 South Dakota International Hot Dog Championship that never happened, and none of that mattered. Within 24 hours, both Google’s AI Overviews and ChatGPT were repeating his invention to anyone who asked. Claude didn’t bite. Google and OpenAI did.

Everyone watching saw it.

I’ve Written About The Ouroboros Before. I Had The Timeline Wrong

The standard framing for this problem has been model collapse. You train a model on web text, the web floods with AI output, the next model trains on a corpus that is increasingly model-generated, and eventually the distribution fades to mush. Novelty comes from variance, and generative systems converge toward a mean that minimizes variance by design. I’ve called this a digital ouroboros.

That framing runs on training cycles. It takes time. It assumes the pollution moves at the speed of retraining.

It doesn’t. What Lily documented, what Germain demonstrated, what the New York Times later quantified – none of it happened on the training side. The models involved were not retrained between the moment a false claim appeared on a blog and the moment it was presented as fact with a citation. The pollution moved at crawl speed. The ouroboros doesn’t take generations to eat itself. It eats itself at query time, every time someone asks one of these systems a question.

The pipeline everyone was watching is not the one that broke.

Key Differences

Model collapse is a training problem. Synthetic content enters the pretraining data, the next generation of models inherits it, and quality degrades. Researchers have been warning about this for two years. They are right. They are also describing something slow enough that everyone can nod gravely and move on.

Retrieval contamination is fast and already here. RAG systems – Perplexity, Google’s AI Overviews, ChatGPT with search – do not generate answers from parametric memory alone. They fetch documents from the live web, place them in context, and ground the response in what they find. When the retriever surfaces a hallucinated SEO post, the answer inherits the hallucination. No retraining required.
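To make that flow concrete, here is a minimal sketch of query-time RAG. The web_search and llm_complete functions are hypothetical placeholders, not any vendor’s actual API; the point is the data flow, in which whatever the retriever returns goes straight into the prompt with no provenance check.

```python
# Minimal sketch of query-time retrieval-augmented generation (RAG).
# `web_search` and `llm_complete` are hypothetical stand-ins for a real
# search API and a real model call.

def web_search(query: str, k: int = 5) -> list[str]:
    """Placeholder: return the top-k snippets the index currently holds."""
    return [
        "snippet from an established news site...",
        "snippet from an AI-generated SEO post...",
    ][:k]

def llm_complete(prompt: str) -> str:
    """Placeholder: send the prompt to a language model and return its text."""
    return "Answer grounded in the sources above."

def answer(query: str) -> str:
    snippets = web_search(query)        # live-web fetch, no provenance check
    context = "\n\n".join(snippets)     # retrieved text becomes the context
    prompt = (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    # Any hallucination in the sources is now "grounded" in the answer.
    return llm_complete(prompt)

print(answer("What did Google's latest core update change?"))
```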

The academic literature on this is clear. PoisonedRAG (Zou et al., 2024) showed that injecting a small number of crafted passages into the retrieval corpus is enough to control the output of a RAG system on targeted queries. BadRAG (Xue et al., 2024) demonstrated a similar class of attacks using semantic backdoors. Both papers treat this as an adversarial problem: what happens when an attacker poisons the corpus on purpose.
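As a toy illustration of why a handful of passages is enough – not the papers’ method, which targets dense embedding retrievers, but the same ranking pressure in bag-of-words form – a passage written to mirror a target query outranks organic documents on that query:

```python
# Toy illustration: a passage written to mirror a target query wins the
# top retrieval slot under a simple bag-of-words cosine ranking.
from collections import Counter
from math import sqrt

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "quarterly earnings report from a large search company",            # organic
    "a history of web crawlers and how indexing works",                 # organic
    "september 2025 core algorithm update winners and losers revealed", # injected slop
]

query = "september 2025 core algorithm update"
ranked = sorted(corpus, key=lambda doc: cosine(bow(query), bow(doc)), reverse=True)
print(ranked[0])  # the injected passage tops the results for the target query
```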

What Germain and Lily demonstrated by accident is that the adversarial model is the default operating model. You don’t need a corpus crafted for an attack. You need a blog post. The open web is the corpus, and anyone with a domain can write to it.

An analysis by Oumi, reported in the New York Times, puts numbers on the cost. Across 4,326 SimpleQA tests, Google’s AI Overviews answered correctly 85% of the time on Gemini 2 and 91% on Gemini 3. At Google’s scale – more than five trillion searches per year – a 9% error rate still translates into tens of millions of wrong answers every hour. But the most revealing figure is this: on Gemini 3, 56% of correct responses cited a source that did not support the claim, up from 37% on Gemini 2. The upgrade improved answer accuracy and made the citations worse. When the model got something right, more than half the time the source it pointed to did not back it up.
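A back-of-the-envelope check on that hourly figure, assuming – as the headline numbers do – that the 9% error rate applies across all five trillion searches; in practice only a fraction of queries trigger an AI Overview, so read this as an order-of-magnitude sketch rather than a measurement:

```python
# Order-of-magnitude check: 5 trillion searches per year at a 9% error rate.
searches_per_year = 5_000_000_000_000
error_rate = 0.09
hours_per_year = 365 * 24  # 8,760

wrong_per_hour = searches_per_year * error_rate / hours_per_year
print(f"{wrong_per_hour:,.0f} wrong answers per hour")  # ~51,000,000
```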

The retrieval layer is not a filter. It is a vector of infection.

Who Contributes the Corpus

The industry producing that content with enthusiasm – and then writing anxiously about the consequences of consuming it – is the SEO industry. I’ve written before about content produced to rank rather than to say anything, and about AI visibility tools that build dashboards on top of the output of nondeterministic systems. This is the same loop, one layer deeper. An SEO agency runs an AI content pipeline because AI Overviews have cut their clients’ traffic. The pipeline publishes a post speculating about “winners and losers” of a core update that is still rolling out, which says nothing verifiable. Another agency’s pipeline picks it up as a source. The output gets crawled into the retrieval index. AI Overviews cites one of them. The original agency then writes a case study about how AI Overviews “surfaced” their content.

An Ahrefs study of more than 26,000 ChatGPT source URLs found that “best of X” listicles account for about 44% of cited page types, including cases where brands publish lists that rank themselves ahead of their competitors. Harpreet Chatha told the BBC that you can publish “the best waterproof shoes of 2026,” rank it, and be cited by AI Overviews and ChatGPT within days. Lily, during the actual March 2026 core update, found AI-generated articles claiming to list winners and losers while the update was still rolling out – articles that open with filler and pad out product lists without a single original citation.

The people churning out AI content at scale are the same people who suffer most when AI search systems cite that content as authoritative. No one forced this. The industry built the pipeline, fed it, and is now alarmed at what comes out the other end. This is not adversarial poisoning. The industry is polluting its own water supply and hiring consultants to test the water.

The Version Most People Get

The Oumi research covers AI Overviews, which is free by design. AI Overviews reportedly reached more than two billion users by mid-2025. ChatGPT has about 900 million weekly active users, of whom roughly 50 million are paying. That means about 94% of the people who interact with OpenAI’s product are on the free tier.

The paid tiers are better. According to OpenAI’s own published claims, cited in Lily’s piece, GPT-5.4 is 33% less likely to generate false factual claims than GPT-5.2. The free-tier GPT-5.3 also improved over its predecessor (26.8% fewer hallucinations with web search, 19.7% fewer without), but it remains less reliable than the paid model. Gemini 3, which made AI Overviews more accurate in factuality testing, made the grounding worse at the same time. Better answers, weaker citations.

No one seems to mind. The reliable version of the product is paywalled. The version most of the planet gets – including the one at the top of Google Search – can be fooled by 20 minutes of work on a personal website. The intelligence is the marketing pitch. What two billion users actually get is a confident summary of whatever the retriever finds.

Grokipedia as a Terminal Case

The risks of the retrieval layer are one thing. Grokipedia is the version where “risk” stops being a useful word.

Elon Musk’s xAI launched Grokipedia on Oct. 27, 2025, with 885,279 articles, all produced or rewritten by Grok. Some were copied from Wikipedia wholesale, with a disclaimer at the bottom acknowledging the CC BY-SA license – the license Wikipedia can offer precisely because a community of human editors writes and verifies the content. Others were rewritten from scratch. PolitiFact found Grokipedia citing sources, including Instagram reels, that Wikipedia’s policies class as “generally unacceptable.” Grokipedia’s entry on Canadian singer Feist said her father died in May 2021, citing a 2017 Vice article about the indie rocker that says nothing about a death – and her father was still alive when that article was written. Its entry on the Nobel Prize in Physics picked up an unsourced claim that physics is usually the first prize awarded at the ceremony, which is not true.

Musk has said the goal is to “search the entire Internet, anything publicly available, and edit the Wikipedia article.” The entire internet now includes the synthetic content generated by every AI content pipeline described above. An AI system that reads the open web, rewrites Wikipedia based on what it finds, and presents the result as a reference work is the retrieval-contamination problem with the feedback loop made explicit and shipped as a product.

By mid-February 2026, Grokipedia had lost most of its visibility on Google. Wikipedia outranks Grokipedia in searches for Grokipedia itself.

“This human-generated knowledge is what AI companies rely on to create content; even Grokipedia needs Wikipedia to exist.” – Wikimedia Foundation

A synthetic encyclopedia subsidized by a human one. When the subsidy stops, the thing that depends on it stops making sense.

Wikipedia is not beyond criticism. Its editorial battles, point-of-view disputes, and systemic gaps in who gets to shape articles are well documented and real. But the answer to a flawed human curation process is not to remove the humans altogether and call the result progress. I’ve written before about the accountability gap that opens up when you replace human judgment with API calls. Wikipedia’s problems are the problems of a messy, contested, self-correcting system. Grokipedia’s problems are the problems of a system with no accountability at all.

The Citation Layer Has Lost Its Provenance

I wrote recently about Reddit selling “Authentic Human Conversation™” to AI companies while forum moderators report they can no longer tell which comments are human. The Oumi study found that of the 5,380 sources cited by AI Overviews, Facebook and Reddit were the second and fourth most common. The citation layer of the world’s most used answer engine is built largely on two platforms that cannot verify the human origin of their content.

Human creators are leaving the open web because the traffic bargain has collapsed. Answer engines cite content whose provenance cannot be verified, or that was never human in the first place. The citation is still there. It no longer means what it used to.

The ouroboros framing was right. The timeline was wrong. The loop doesn’t wait for the next training run. It takes an indexable URL and a retrieval system willing to trust it.

These are serious systems. And more than half the time, when they get the answer right, they can’t point to a source that supports what they just told you.



This post was originally published on Inference.


Featured image: Anton Vierietin/Shutterstock
