The business risk no one is modeling: AI is replacing the very experts it needs to learn from

For AI systems to keep improving at information work, they need either a reliable way to improve themselves automatically or human evaluators who can catch errors and produce high-quality feedback. The industry has invested heavily in the first. The second, it barely thinks about.
I would argue that we need to treat the human evaluation problem with the same rigor and investment we bring to building model capabilities themselves. New-graduate hiring at large technology companies has nearly halved since 2019. Document review, first-pass research, data cleaning, code review: models handle these now. Economists who track this call it displacement. Companies doing it call it efficiency. Neither framing looks at the problem it creates down the line.
Why self-improvement is limited in knowledge work
An obvious pushback is reinforcement learning (RL). AlphaZero learned Go, chess, and shogi at superhuman levels without human data, developing novel strategies along the way. AlphaGo’s Move 37 in the 2016 match against Lee Sedol, a move experts said no human would have played, did not come from human annotation. It emerged from self-play.
What made this possible is the nature of the environment. Move 37 was a novel move within the fixed state space of Go. The rules are closed, fully known, and never change. More importantly, the reward signal is unambiguous: win or loss, delivered immediately, with no room for interpretation. The system always knows whether a move was good because the game ends with a definitive result.
Knowledge work lacks both of these characteristics. The rules of any professional domain are dynamic, constantly rewritten by the people working inside it. New laws pass. New financial instruments are invented. The litigation strategy that worked in 2022 may fail today because the landscape it operated in has shifted. Whether a medical diagnosis was correct may not be known for years. Without a stable environment or an unambiguous reward signal, you cannot close the loop. You need people in the evaluation chain to keep teaching the model.
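To make that contrast concrete, here is a minimal illustrative sketch in Python. Everything in it is hypothetical (the function names, the probabilities are invented for illustration, not drawn from any lab’s actual pipeline); it exists only to show why one training loop closes on its own while the other stalls without a human supplying the signal.

```python
import random

random.seed(0)

def self_play_step() -> float:
    """Board-game RL: the environment itself is the oracle. A played-out game
    returns a ground-truth reward immediately, with no interpretation needed."""
    move_quality = random.random()               # stand-in for a candidate move
    return 1.0 if move_quality > 0.5 else -1.0   # win or loss, known at once

def knowledge_work_step() -> float | None:
    """Professional domains: no oracle. The reward must come from a human
    expert, and sometimes (a diagnosis, a litigation outcome) it cannot be
    known for years. Here, a usable signal arrives only 30% of the time."""
    draft_quality = random.random()              # stand-in for a drafted answer
    if random.random() < 0.7:                    # expert unavailable, or outcome
        return None                              # not yet observable: no signal
    return 1.0 if draft_quality > 0.5 else -1.0

def closed_loop_fraction(step, trials: int = 10_000) -> float:
    """Fraction of training steps that actually receive a usable reward."""
    return sum(step() is not None for _ in range(trials)) / trials

print(f"self-play steps with a reward:      {closed_loop_fraction(self_play_step):.0%}")
print(f"knowledge-work steps with a reward: {closed_loop_fraction(knowledge_work_step):.0%}")
```

The toy numbers are arbitrary, but the asymmetry is the point: in the game, every step produces a usable reward; in the knowledge-work loop, most steps produce none, because the expert signal is expensive, slow, or not yet knowable.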
The pipeline problem
The AI systems being built today were trained on the expertise of people who came up through that pipeline. The difference now is that the entry-level jobs where such expertise develops are the first to be automated away. Which means the next generation of would-be experts isn’t accumulating the kind of judgment that makes a human evaluator worth having in the loop.
History has examples of knowledge dying out. Roman concrete. Gothic cathedral construction. Mathematical traditions that took centuries to recover. But in every historical case, the cause was external: plague, conquest, the collapse of the institutions that held the knowledge. The difference here is that no external force is required. Fields could be hollowed out not by disaster but by a thousand economic decisions, each rational in isolation. That is new, and we have little practice at noticing it while it is happening.
When entire fields go quiet
Taken to its logical end, this is not just a pipeline problem. It is the atrophy of entire fields of expertise.
Consider advanced mathematics. It doesn’t decline because we stop training mathematicians. It declines because organizations stop needing mathematicians for day-to-day work, the economic incentive to become one disappears, the number of people who can think mathematically shrinks, and the field’s capacity to produce new knowledge quietly diminishes. The same logic applies to coding. The question is not “will AI write code” but “if AI writes all production code, who develops the deep architectural intuition that produces truly novel system designs?”
There is an important difference between a field being practiced and a field being understood. We can automate a great deal of structural engineering today, but the tacit knowledge of why certain methods work lives in the heads of people who spent years doing it wrong first. End that apprenticeship and you don’t just lose the workers. You lose the ability to know what you have lost.
Advanced mathematics, computer science, critical legal reasoning, complex systems design: when the last person who deeply understands a subfield of algebra retires and no one replaces them because the funding dried up and the career path disappeared, that knowledge doesn’t go dormant.
It’s gone. And no one notices, because the models trained on it keep performing well on benchmarks for another decade. I think of this as a façade: the surface capability remains (models can still produce professional-looking output) while the human capacity beneath it to verify, extend, or correct that output quietly erodes.
Why rubrics don’t fully solve this
The current workaround is rubric-based evaluation. Constitutional AI, reinforcement learning from AI feedback (RLAIF), and systematic scoring criteria meaningfully reduce reliance on human evaluators. They don’t eliminate it.
Their limitation is this: a rubric can only capture what the person writing it knows how to measure. Optimize hard against it and you get a model that is very good at satisfying the rubric. That is not the same thing as a genuinely good model.
Rubrics capture the explicit, articulable component of judgment. The tacit part, the instinct that something is off, doesn’t fit into a rubric. You can’t write it down, because you have to possess it before you know what to write.
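A small toy model makes the failure mode visible. This is a sketch under loud assumptions (the split between explicit and tacit quality, and the drift rates, are invented for illustration): hill-climbing on the rubric score alone drives the measured number up while the unmeasured, tacit component is free to decay.

```python
import random

random.seed(0)

def rubric_score(explicit_quality: float) -> float:
    """The rubric sees only the articulable part of quality."""
    return explicit_quality

def true_quality(explicit_quality: float, tacit_quality: float) -> float:
    """Real quality also depends on the tacit judgment no rubric captures."""
    return 0.5 * explicit_quality + 0.5 * tacit_quality

# Hill-climb on the rubric alone: accept any candidate that does not lower
# the rubric score. The tacit component is never measured, so nothing in
# the loop stops it from drifting downward.
explicit, tacit = 0.2, 0.2
for _ in range(1_000):
    candidate = min(1.0, explicit + random.uniform(-0.01, 0.03))
    if rubric_score(candidate) >= rubric_score(explicit):
        explicit = candidate
        tacit = max(0.0, tacit + random.uniform(-0.02, 0.01))  # unmeasured drift

print(f"rubric score: {rubric_score(explicit):.2f}")           # climbs near 1.00
print(f"true quality: {true_quality(explicit, tacit):.2f}")    # far lower
```

Run it and the rubric score climbs to its ceiling while true quality stalls at roughly half of it. That is the Goodhart pattern: the metric improves while the thing the metric was a proxy for does not.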
What this means in practice
This is not an argument for halting progress. The capability gains are real. And it is possible that researchers will find ways to close the evaluation loop without human judgment. Maybe synthetic data pipelines get good enough. Maybe models develop reliable self-correction mechanisms we can’t yet imagine.
But we don’t have those today. And right now we are dismantling the human infrastructure that currently fills the gap, not as a deliberate decision but as the aggregate of thousands of individually rational ones. The responsible version of this transition is not assuming the problem will solve itself. It is treating the evaluation gap as an open research problem with the same urgency we bring to capability gains.
The thing AI needs most from humans is the thing we are investing least in maintaining. Whether that need is permanent or temporary, the cost of ignoring it is the same.
Ahmad Al-Dahle is the CTO of Airbnb.
