Written for Non-Human Readers

Ask a chatbot a question and watch what happens on the web behind it. It reads 30 or 40 pages to build your answer, extracts what it needs, and hands you a tidy paragraph. You never see those pages and never click on them. The site that “wins,” whatever winning means now, gets a citation in light gray text and not a single visitor.
That is how a lot of web reading happens right now, and the companies that run the pipes can watch it happen. On Cloudflare’s network, bots outnumbered humans in requests for real web pages this year, 57.5% to 42.5%. Its CEO noted the crossover in June, about 18 months ahead of the company’s forecast, and pointed to agents that fetch pages on behalf of people. AI is the fastest-growing segment, growing eight times faster than human traffic last year. The web is being read more than ever. Just not by people.
Here’s the uncomfortable part no one wants to say out loud. The rules we built the web on, the ones about quality and access and trust, were written for a human audience. The audience changed. The rules are being rewritten to match, quietly, and no one will get a vote on them. Three of them are already on their way out. Here is which ones, and why.
Who Polices the Web, and Why They Bother
For twenty-nine years, the web had a traffic cop, and that cop was Google. Stuff keywords where no one can see them, buy a thousand links, spin up doorway pages, and sooner or later something fell on you from a great height. Many people took this for public hygiene: Google keeping the web clean on everyone’s behalf, out of the goodness of its heart.
It was nothing like that. Google policed the web because Google sold advertising against its index, and an index full of garbage is worth little to advertisers. The cleaning was storefront maintenance.
Andy Baio made the same point a decade ago, when Google let Books and its news archive rot once they stopped earning their keep, and warned against trusting the company to do a library’s work. He was being generous. The library was only ever a side effect of the ad business, kept alive while it paid and dropped when it didn’t.
I spent the better part of six years on the search quality and webspam side of that operation, so I can tell you that the work was real and the people doing it meant it. The reason it got funded was never in doubt, either.
Now look at the new readers. An answer engine doesn’t sell ads against an index, because it doesn’t keep one for you to browse. It reads, weighs what it finds, and quietly repeats what it wants. A weak page is not penalized. It just doesn’t get picked up, which, from the page’s point of view, is worse, because at least the penalty came with an email. The guard is gone. What’s left is a door that never tells you why you didn’t get in.
And Google now holds both jobs at once, which is the genuinely funny part. It still runs the ad-supported index, the business that courts recently ruled an illegal monopoly in both search and advertising sales. And it is building an answer engine that makes that index beside the point, then putting ads into it as fast as the format allows: sponsored images combined with image effects, ads inside AI snapshots, a new closed purchase pipeline. A company protecting its old business by building the thing that ends it, and selling ads on the murder weapon. You don’t need a diagram.
Paying to Enter
The old arrangement was a fair trade, generous even. Let our crawler in for free, Google said, and we’ll send readers back to you. Sites did not just tolerate crawling; they worked to be crawled quickly and indexed deeply, because being crawled was the way to an audience.
AI crawling makes no such offer. It takes the same content, wraps it into an answer, and sends nothing back. No click, no reader, no ad to sell. Cloudflare, which sits in front of a fifth of the web, said the quiet part into a microphone in July 2025: the old deal is broken, so new sites on its network now block AI crawlers by default, and owners can charge for crawling through a marketplace that gives any bot unwilling to pay a polite “payment required” and nothing else. For thirty years, sites begged Google to crawl more, and the reflex now is to install a tollgate.
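The tollgate logic can be sketched in a few lines. This is a minimal illustration, not Cloudflare’s implementation: it assumes the refusal is sent as the standard HTTP 402 “Payment Required” status, and the crawler names, the `has_paid` flag, and the `tollgate` function are hypothetical stand-ins for whatever identity and billing checks a real marketplace performs.

```python
from http import HTTPStatus

# Hypothetical list of AI crawler user-agent tokens. A real deployment would
# verify bot identity rather than trust a string match on the header.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

FULL_PAGE = "<html>full article</html>"


def tollgate(user_agent: str, has_paid: bool) -> tuple[int, str]:
    """Return (status code, body) for a page request under a pay-to-crawl policy."""
    ua = user_agent.lower()
    is_ai_crawler = any(token.lower() in ua for token in AI_CRAWLER_TOKENS)

    # Human browsers and paying crawlers get the page.
    if not is_ai_crawler or has_paid:
        return int(HTTPStatus.OK), FULL_PAGE

    # Everyone else gets the polite refusal and nothing else.
    return int(HTTPStatus.PAYMENT_REQUIRED), ""
```

The design choice worth noticing is that the default flips: the crawler has to prove it paid, where under the old deal the site had to opt out.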
In Britain, 31 publishers have gone past blocking. They have turned the old robots.txt file, a polite request that crawlers have learned to ignore, into a binding contract: load the page, reuse the article without paying, and you have agreed to a £500 invoice that a county court can enforce like any other debt. No one has collected from OpenAI yet, but the move tells you where this is headed. The toll has a price list now.
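To see why robots.txt was only ever a request, here is a small sketch of what a well-behaved crawler does with it, using Python’s standard `urllib.robotparser`. The robots.txt contents and the example URL are illustrative assumptions, not any publisher’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: turn one AI crawler away, let everyone else in.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

ARTICLE_URL = "https://example.com/news/story"


def allowed(user_agent: str, url: str) -> bool:
    """Ask the parsed robots.txt whether this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("GPTBot", ARTICLE_URL))     # False
print(allowed("Googlebot", ARTICLE_URL))  # True
```

Nothing in this check stops a crawler that never runs it. The file states a preference, and enforcement has to come from somewhere else, which is the gap the publishers’ contract terms are trying to close.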
Why now, and not five years ago when the scraping started? Because writing became scarce. When a model is only as good as the text it reads, and the open web fills up with model exhaust, real human writing stops being free raw material and becomes a prize. Owners have realized that they are sitting on inputs, and inputs have value.
So look closely at what they actually do, because it tells you what the real argument is. The New York Times sued OpenAI for training on its archive and, at the same time, licensed that same archive to Amazon. OpenAI has signed up The Guardian, The Atlantic, The Washington Post, News Corp, and a host of others. No one on that list is trying to turn the machines away. They moved fast. The genie is out, the demand is there, and the fight has never been about whether this happens. It’s about who gets paid and how much. The road is being built either way. The dispute is only about the toll.
What Counts as Cheating Now
One rule is the oldest of all: never show a search engine something different from what you show a person. That is cloaking, and a site caught doing it gets wiped out. Every SEO learns it on day one.
But read Google’s definition, not the folklore. Cloaking is serving different content to users and search engines with the intent to manipulate rankings and mislead people. What made it cheating was never that the two versions differed, but the intent to deceive sitting underneath. Showing the search engine a page about holidays and a person a page about cold medicine is cloaking. Rendering the same facts in a clean, machine-readable form for a reader that is a machine is not, and never has been. Google’s guidance says as much: change the presentation as you like, as long as the substance is the same.
So if the reader is an agent looking for structured data instead of your hero image and cookie banner, giving it structured data is not a trick. It is answering in the language the question was asked in. The taboo assumed that the crawler stood in for a pair of human eyes. Drop that assumption, and most of the taboo goes with it.
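The difference between changing the presentation and changing the substance can be made concrete. The sketch below renders one article record two ways, as HTML for a person and as schema.org `Article` JSON-LD for a machine, then checks that both carry the same facts. The record, its field names, and the helper functions are hypothetical placeholders.

```python
import json
from html import escape

# Hypothetical article record; every value is a placeholder.
ARTICLE = {
    "headline": "Council approves new bus routes",
    "author": "A. Reporter",
    "date": "2025-01-01",
    "body": "The council voted 7-2 to add three routes & extend service hours.",
}


def render_html(article: dict) -> str:
    """Presentation for a person: the facts wrapped in page markup."""
    return (
        "<article>"
        f"<h1>{escape(article['headline'])}</h1>"
        f"<p class='byline'>{escape(article['author'])}, {escape(article['date'])}</p>"
        f"<p>{escape(article['body'])}</p>"
        "</article>"
    )


def render_json_ld(article: dict) -> str:
    """Presentation for a machine: the same facts as schema.org JSON-LD."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": article["headline"],
        "author": {"@type": "Person", "name": article["author"]},
        "datePublished": article["date"],
        "articleBody": article["body"],
    })


def facts_from_json_ld(doc: str) -> dict:
    """Recover the plain facts from the machine-readable version."""
    data = json.loads(doc)
    return {
        "headline": data["headline"],
        "author": data["author"]["name"],
        "date": data["datePublished"],
        "body": data["articleBody"],
    }


def same_substance(article: dict) -> bool:
    """True when both renderings carry exactly the facts in the record."""
    html_page = render_html(article)
    in_html = all(escape(value) in html_page for value in article.values())
    in_json = facts_from_json_ld(render_json_ld(article)) == article
    return in_html and in_json
```

Both outputs come from a single record, so they cannot drift apart. Cloaking would correspond to two different records behind the two renderers, and that is the case a check like `same_substance` is meant to catch.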
This is where I have to be the ex-webspam guy for a moment, because the line is still there; it just moved. Lie to the machine to change a ranking, or feed it something you would not stand behind in front of a person, and that is still intent to deceive, and in matters of life, money, or anything else where a wrong answer hurts someone, it is still dangerous. Deception is the line. Formatting your content for a reader that happens to be automated has never been on the wrong side of it.
What’s Left of the People
None of this is a prediction. It has already happened to the parts of the web that machines care about most, and it is working outward from there. The direction is not in dispute. The only live questions are the terms: who gets paid, who gets walled off, and whose business model gets to decide what a good answer looks like.
That last one is wide open, and it’s worth watching, because the people building the new front door can’t agree on how to make money from it. Google embeds ads in answers. Perplexity tried ads, killed them, and now argues that users need to trust they are getting the best answer, not the one that paid the most. Anthropic keeps Claude ad-free and says so loudly. OpenAI is testing ads while promising they will not bend the answers, which is the promise Google once made about search results, and we all remember how long that lasted. Whoever wins that argument inherits the job Google used to hold and gets to define “good” for the web that most people now use.
Which leaves a small, unglamorous task for the rest of us. The machine-readable web doesn’t care how clever your headline is or how slick your page feels. It cannot be charmed or flattered. It keeps what is useful for the answer and ignores the rest, which means that the work that matters is the work underneath the decoration: the original reporting, and the judgment to know that the answer is actually correct before it reaches someone who will act on it. The web spent 30 years studying people. Now it has to be useful to something that has no thumbs to raise and no hands to clap.
This post was originally published on Inference.
Featured image: Natalya Kosarevich/Shutterstock



