Agentic Supply Chain: Ghost Inventory: Why Detection Was Never the Hard Part

Retailers spent $172 billion last year making their inventory more accurate. RFID went mainstream. Computer vision moved from pilot to planogram. Cycle counts got cheaper and faster than at any point in the history of the store.

And the industry still lost $1.73 trillion to inventory distortion in 2025 — about 6.5% of global retail sales, roughly the entire GDP of South Korea, evaporating because what the system said was on the shelf and what was actually on the shelf were two different facts.

Sit with that. We threw the best detection technology we've ever had at the problem, and the problem barely moved.

I don't think that's a technology failure. I think we've been solving the wrong half of the problem.

The shelf has always lied. We just hear it better now.

Ghost inventory (phantom stock, the unit your system swears is available that isn't physically there) is not a new ghost. Inventory record inaccuracy touches somewhere between 50% and 70% of SKUs at any given moment. Average record accuracy sat at 83% in 2024, against a practical benchmark of 95%. At 83%, roughly one in six records in your system of record is wrong right now. Phantom stock alone drives as much as 80% of out-of-stocks, and a third of shoppers have personally hit the "shows in stock, isn't" wall. Forty-five percent of them don't come back after a bad enough experience.

For two decades, the answer was better sensing. And on its own terms, sensing won. Walmart took store accuracy from 65% to the high 90s with RFID. H&M hit 99%. Computer vision shelf monitoring runs 95–99% accurate versus 60–70% for a human with a clipboard.

So detection is, for practical purposes, solved. You can know — cheaply, continuously, at the unit level — that the shelf and the record disagree.

Knowing they disagree was never the hard part. Deciding which one is telling the truth, and what to do about it, is.

Brawn finds the gap. Brain has to close it.

This is the same line I keep drawing between traditional automation and agentic systems. A sensor is brawn. It generates a signal: count says 4, system says 7. It does not know why. It cannot tell you whether three units walked out the door, got mis-scanned at receiving, are sitting in a backroom tote, or are mislabeled on the next shelf over. Each of those is a different problem with a different fix, and a sensor is constitutionally incapable of choosing between them.

That's the work that's left. Not detection — adjudication. And it's exactly the work that's gone unautomated, which is why the $1.73 trillion is sticky despite all the sensing. Fewer than one in four retailers has actually rolled out AI in the parts of the business where distortion lives. Everyone bought the eyes. Almost nobody built the judgment.

This is the real opening for agents in inventory, and it's a narrow, specific one. Not "AI that watches the shelf." We have that. What's missing is an agent that takes the variance signal, pulls the POS stream, the receiving log, the planogram, and the recent adjustment history, reasons about the most likely cause, and then either dispatches a targeted recount to the floor lead or reconciles the record, knowing the difference between those two responses.
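To make that concrete, here is a minimal sketch of what cause inference could look like for the "count says 4, system says 7" case. Everything in it is an assumption for illustration: the Evidence fields, the rule order, and the cause and response labels are hypothetical, and a real agent would weigh evidence rather than walk a fixed list of rules.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Signals pulled from other systems for one SKU (all fields hypothetical)."""
    system_qty: int                 # what the system of record says
    counted_qty: int                # what the sensor or cycle count says
    units_received: int             # from the receiving log
    units_on_ship_notice: int       # what the supplier said it shipped
    found_in_adjacent_slot: int     # planogram check: units seen one slot over
    recent_manual_adjustments: int  # adjustment history for this SKU


def likely_cause(e: Evidence) -> tuple[str, str]:
    """Return (cause, response) from simple illustrative rules."""
    gap = e.system_qty - e.counted_qty
    if gap == 0:
        return ("none", "no_action")
    # Units sitting in the wrong slot explain the gap without any real loss.
    if e.found_in_adjacent_slot >= gap > 0:
        return ("misplaced", "dispatch_recount")
    # The receiving log is off by exactly the gap: the record is the wrong side.
    if e.units_received - e.units_on_ship_notice == gap:
        return ("receiving_error", "reconcile_record")
    # Repeated manual adjustments suggest the counts themselves are unreliable.
    if e.recent_manual_adjustments >= 3:
        return ("unstable_count", "dispatch_recount")
    # Nothing corroborates either side, so a human looks before anything is written.
    return ("unexplained", "dispatch_recount")


on_next_shelf = Evidence(7, 4, 7, 7, found_in_adjacent_slot=3, recent_manual_adjustments=0)
overscanned = Evidence(7, 4, 10, 7, found_in_adjacent_slot=0, recent_manual_adjustments=0)
print(likely_cause(on_next_shelf))
print(likely_cause(overscanned))
```

The same variance of three units produces two different responses, and only one of them touches the record.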

The three-stage auditor — and the stage everyone skips

If you're evaluating any "autonomous inventory auditing" pitch this year, hold it against three stages:

Detect. The shelf and the record disagree. Sensors do this. It's the commodity layer.

Adjudicate. Decide which source is true and why. This is the reasoning layer — correlating signals across systems to infer cause, not just flag variance.

Act. Either send a human to verify, or write the correction into the system of record.
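One way to keep the three stages separate is to give each its own type, so that Act cannot run on a raw variance. This is a sketch under my own assumptions, not a reference design: the names Variance, Verdict, Source, and audit are hypothetical, and the adjudicate step is passed in as a function because it is the part that differs between products. I have also added a "keep the record" outcome for the case where the count is judged wrong and no correction is needed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Source(Enum):
    RECORD = "record"    # the system of record is right
    COUNT = "count"      # the physical count is right
    UNKNOWN = "unknown"  # the evidence does not settle it


@dataclass(frozen=True)
class Variance:
    sku: str
    system_qty: int
    counted_qty: int


@dataclass(frozen=True)
class Verdict:
    variance: Variance
    trusted: Source
    reason: str


def detect(sku: str, system_qty: int, counted_qty: int) -> Optional[Variance]:
    """Stage 1: flag a disagreement. Says nothing about why."""
    if system_qty == counted_qty:
        return None
    return Variance(sku, system_qty, counted_qty)


def act(verdict: Verdict) -> str:
    """Stage 3: accepts only a Verdict, so detection alone cannot trigger a write."""
    v = verdict.variance
    if verdict.trusted is Source.COUNT:
        return f"write {v.sku}: {v.system_qty} -> {v.counted_qty} ({verdict.reason})"
    if verdict.trusted is Source.RECORD:
        return f"keep {v.sku} at {v.system_qty} ({verdict.reason})"
    return f"send human to verify {v.sku} ({verdict.reason})"


def audit(sku: str, system_qty: int, counted_qty: int,
          adjudicate: Callable[[Variance], Verdict]) -> str:
    variance = detect(sku, system_qty, counted_qty)
    if variance is None:
        return f"{sku}: no variance"
    # Stage 2 sits between detect and act and cannot be skipped.
    return act(adjudicate(variance))


def undecided(v: Variance) -> Verdict:
    """Placeholder adjudicator that never trusts either side."""
    return Verdict(v, Source.UNKNOWN, "no corroborating evidence")


print(audit("SKU-1", 7, 4, undecided))
```

The design choice worth noticing is the signature of act: a product that "corrects" straight from the latest count would have to fabricate a Verdict to call it.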

Most products that call themselves AI auditing collapse Adjudicate into Detect. They see a discrepancy and immediately "correct" the record to match the latest count. That feels like automation. It is actually the most dangerous thing you can do, because the latest count is frequently the thing that's wrong — a miscount, a misread tag, a unit scanned in the wrong zone. Auto-correct on a bad count, and you haven't fixed ghost inventory. You've laundered an error into your system of record at machine speed, and now every downstream decision — replenishment, allocation, the promise you make a customer online — inherits it with full confidence and no fingerprints.

The agent best positioned to fix ghost inventory is the same agent best positioned to manufacture it.

Why this is an Accuracy problem before it's an inventory problem

When my co-founder and I built ARMS, the first of the five risk dimensions we scored was Accuracy — and an autonomous auditor is the cleanest example I've found of why it has to come first. The moment you give an agent write-access to your system of record, its own accuracy stops being a metric and becomes a multiplier. A 95%-accurate auditor running autonomously across millions of SKUs isn't 95% helpful. It's 5% wrong about the one number every other system trusts absolutely.
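The arithmetic behind that multiplier is short. The figures below are illustrative assumptions, not data: one million records touched, each adjudicated once, every decision written without review.

```python
def wrong_writes(records_touched: int, auditor_accuracy: float) -> int:
    """Expected incorrect writes if every decision goes straight to the record."""
    return round(records_touched * (1 - auditor_accuracy))


# Illustrative scale only: one million records, 95% auditor accuracy.
print(wrong_writes(1_000_000, 0.95))  # 50000
```

Fifty thousand records, each now wrong with the system's full confidence behind it.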

So the pre-deployment question isn't "can it find discrepancies?" Of course it can. The questions are: when the agent and the count disagree, who wins, and on what evidence threshold? What's the confidence bar below which it must escalate to a human instead of writing? Can you audit, after the fact, every record it changed and why? Does it know the difference between "I'm confident" and "the count was confident"?
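Those questions translate fairly directly into a gate that sits in front of the record. The sketch below is one hypothetical shape for it: the WriteGate name, the 0.90 thresholds, and the log fields are my assumptions, and the right bars would be set per deployment. It keeps the agent's confidence and the count's confidence as separate inputs, and it logs every proposal whether or not it is written.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class WriteGate:
    """Decides whether an agent's proposed correction is written or escalated."""
    min_agent_confidence: float = 0.90  # hypothetical bar for the agent's reasoning
    min_count_confidence: float = 0.90  # hypothetical bar for the count itself
    audit_log: list = field(default_factory=list)

    def submit(self, sku: str, old_qty: int, new_qty: int,
               agent_conf: float, count_conf: float, reason: str) -> str:
        # Both the agent's reasoning and the underlying count must clear their bars.
        approved = (agent_conf >= self.min_agent_confidence
                    and count_conf >= self.min_count_confidence)
        decision = "write" if approved else "escalate"
        # Log every proposal, written or not, so each change can be audited later.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "sku": sku, "old_qty": old_qty, "new_qty": new_qty,
            "agent_conf": agent_conf, "count_conf": count_conf,
            "reason": reason, "decision": decision,
        })
        return decision


gate = WriteGate()
# The agent is sure of its reasoning, but the count came from a weak read.
first = gate.submit("SKU-1", 7, 4, agent_conf=0.97, count_conf=0.60, reason="tag misread suspected")
second = gate.submit("SKU-2", 7, 4, agent_conf=0.97, count_conf=0.95, reason="receiving overscan")
print(first, second, len(gate.audit_log))
```

The first proposal escalates even though the agent is 97% confident, because the count underneath it is not.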

That's the gate. Not a control plane watching the agent in production and flagging the bad adjustment after it's already poisoned three reorder cycles. A pre-deployment lens that decides, before the agent ever touches the record, where its judgment is allowed to be final and where it has to ask.

Detection got cheap. Judgment is the product now. And judgment you can't govern isn't an asset — it's a faster way to be wrong.

I'm working through exactly this — where an inventory agent's judgment should be final versus where it escalates — with a handful of supply chain teams as design partners right now. If you're putting agents anywhere near your system of record this year, I'd trade notes. Find me.