The Right Study with the Wrong Standard

Why AI labor displacement studies measure the wrong thing — and what the right question actually is

THE SETUP

MIT FutureTech published a rigorous, well-constructed study in April 2026 — "Crashing Waves vs. Rising Tides" — evaluating AI performance across 11,500+ real-world labor market tasks drawn from the DOL O*NET database. Over 17,000 outputs were evaluated by domain experts with verified on-the-job experience. The methodology is among the strongest in this category of research.

SGGI recommends it as a primary source. The "rising tide" framing — broad, gradual AI improvement rather than sudden sector displacement — is analytically sound and consistent with our thesis on slope change over collapse. It also describes, perhaps unintentionally, the optimal displacement strategy for capital: gradual enough to prevent organized resistance, fast enough to structurally eliminate the wage base before workers recognize the transition as permanent.

But the study, like nearly every study in this space, sets the bar too high. In most professions, AI doesn't need to be that good to make the human the expensive option. It doesn't have to match human output; it just has to beat the human cost structure.

THE FLAW

Every capability study of this type — including MIT's — measures AI against an implicit standard of perfection. The evaluation question is: would a manager accept this output without edits?

The problem is that the human baseline is never held to the same standard.

Real human work gets edited. Real analysts produce first drafts that need revision. Real coordinators miss details. Real managers give feedback. The comparison being made in these studies is AI output vs. an idealized human who never makes errors — not AI output vs. the average quality a firm actually receives from the employee it would otherwise pay.

If the MIT study applied the same "no edits required" standard to a sample of human-produced outputs for the same tasks, the success rate for humans would not be 100%. In many white-collar task categories, it would likely fall materially below the implicit baseline the comparison assumes.

THE RIGHT QUESTION

The more economically relevant question is not whether AI output is perfect. It is whether AI output is materially sufficient to replace human work at a price point that makes the human uneconomical.

Those are different tests with different answers. A 60% success rate on a task that costs fractions of a cent to run — repeatable, instantaneous, at any volume — versus a $65,000 salary plus benefits clears the economic substitution bar even with meaningful failure rates. You run it multiple times. You use a human for exception handling. The blended unit cost still collapses.
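
A back-of-envelope sketch makes that arithmetic concrete. Every cost figure below is an illustrative assumption, not a number from the MIT study, and the retry math assumes failures are roughly independent across attempts:

    # Illustrative substitution math. The per-run cost, retry cap, benefits
    # multiplier, and tasks-per-year figure are assumptions, not study data.
    AI_SUCCESS_RATE = 0.60    # per-attempt success rate (study's 2025-Q3 average)
    AI_COST_PER_RUN = 0.005   # assumed dollars per attempt
    MAX_ATTEMPTS = 3          # retry a failed output up to 3 times

    # Fully loaded human cost: $65,000 salary plus an assumed ~30% benefits
    # load, spread over an assumed 5,000 tasks per year.
    HUMAN_COST_PER_TASK = 65_000 * 1.3 / 5_000               # ~= $16.90

    # Chance every attempt fails and a human handles the exception.
    p_all_fail = (1 - AI_SUCCESS_RATE) ** MAX_ATTEMPTS       # 0.4**3 = 6.4%

    # Expected AI attempts per task (stop at the first success).
    expected_attempts = sum((1 - AI_SUCCESS_RATE) ** k for k in range(MAX_ATTEMPTS))

    blended = expected_attempts * AI_COST_PER_RUN + p_all_fail * HUMAN_COST_PER_TASK
    print(f"AI resolves {1 - p_all_fail:.1%} of tasks")      # 93.6%
    print(f"Blended cost per task: ${blended:.2f}")          # ~$1.09 vs. $16.90

Under these assumptions, the blended cost lands around a dollar per task against roughly seventeen for the human-only baseline. The assumptions can be off by a wide margin and the substitution case still holds.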

The displacement threshold is not perfection. It is the point at which AI is good enough, often enough, at low enough cost, that the marginal human becomes harder to justify.

The MIT study's own data — a 60% average success rate across 40+ models as of 2025-Q3, rising to a projected 80–95% by 2029 — may already clear that economic bar for a significant share of the task distribution. Not because AI is perfect. Because the comparison isn't against perfection.
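
Sweeping the same illustrative model across the study's projected capability range shows why the economic bar is crossed long before the capability bar:

    # Sensitivity of the blended cost to rising success rates.
    # Cost figures are the same illustrative assumptions as above.
    AI_COST_PER_RUN = 0.005
    HUMAN_COST_PER_TASK = 16.90
    MAX_ATTEMPTS = 3

    for rate in (0.60, 0.80, 0.95):
        attempts = sum((1 - rate) ** k for k in range(MAX_ATTEMPTS))
        p_all_fail = (1 - rate) ** MAX_ATTEMPTS
        blended = attempts * AI_COST_PER_RUN + p_all_fail * HUMAN_COST_PER_TASK
        print(f"{rate:.0%} success -> ${blended:.2f} per task")
    # 60% -> $1.09, 80% -> $0.14, 95% -> $0.01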


THE SECOND ASSUMPTION

There is a second structural limitation these studies share: they assume the task definition is static. In practice, when AI gets deployed at scale, firms restructure workflows — they don't ask AI to do exactly what humans did. They redesign the process so AI handles the high-volume, pattern-consistent work, and humans handle edge cases, escalations, and relationship-dependent judgment.

No benchmark study captures organizational redesign, because benchmarks test discrete tasks, not how a finance team restructures its headcount model when 70% of its junior analyst workload becomes automatable at a 60% fidelity rate.
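
To make that headcount arithmetic concrete, a sketch reusing the illustrative retry assumptions above (the team size and workload split are hypothetical):

    # Hypothetical headcount model, not from the study.
    AUTOMATABLE_SHARE = 0.70   # share of junior-analyst workload AI can attempt
    AI_RESOLVED = 0.936        # share of attempted tasks AI resolves after retries
    TEAM_SIZE = 10             # assumed current junior headcount

    # Residual human workload: the non-automatable 30% plus AI's exceptions.
    residual = (1 - AUTOMATABLE_SHARE) + AUTOMATABLE_SHARE * (1 - AI_RESOLVED)
    print(f"Residual workload: {residual:.1%}")              # ~34.5%
    print(f"Headcount supported: ~{TEAM_SIZE * residual:.1f} of {TEAM_SIZE}")

The benchmark measures the fidelity rate; the restructuring decision lives in the residual.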

WHAT THIS MEANS

The MIT study is valuable precisely because it resists the "sudden displacement" narrative and maps a more gradual, measurable trajectory. That restraint is appropriate and SGGI endorses it as a framing tool.

But the study's conclusion — that workers have time to adjust because AI improvement is gradual — assumes that adjustment time is determined by capability thresholds, not economic ones. In reality, the economic case for not replacing workers erodes before the capability case for replacing them is fully made. CFOs do not wait for AI to achieve near-perfect performance. They act when the cost comparison justifies a restructuring decision. That calculation does not require 95%.

The headline is not that AI will displace workers by 2029. The headline is that the economic justification for not displacing them is already weakening — and the capability curve only accelerates that calculation.
