Hope the Robot Does Badly at Your Earnings Call

Here is a test you can run before your next results call, and you should hope the machine fails it.

Take your earnings release, your prepared remarks, and the last few quarters of transcripts. Feed them to ChatGPT. Then hand it the questions your analysts are most likely to ask, and let it answer them in the CFO’s place. Read what it produces.

If the machine’s answers are hard to tell apart from the ones your management would have given live, you have learned something uncomfortable: the call conveyed nothing new. Everything in it was already public, already in the deck, already knowable by anyone with the release and a language model. You took an hour of everyone’s morning to recite what the market already had.

The columnist Matt Levine put this most cleanly back in 2023. An analyst has read your release and your prepared remarks before the call starts; they ask questions precisely to get at what those documents don’t say. A language model, he noted, knows only what is already public knowledge and “can’t fill in any details. (Or, it can, but just by making stuff up.)” So if you want to know whether your call did any work, see whether a bot could have faked it. Hope it can’t.

What an earnings call is actually for

This sounds like a joke about AI. It is actually a precise description of what an earnings call is for, and the description predates the chatbots by decades.

An earnings call is not an information-delivery mechanism. The information was delivered in the release, hours earlier, in a form built for exactly that. The call is a disclosure-under-questioning mechanism. Its value sits in the part that isn’t scripted: the answer to the question management didn’t choose, given live, by a named person who has to stand behind it. That is where private information leaks out: not in a press release, but in how a CFO handles “walk me through the cohort that’s slowing,” and whether the answer is specific or a fog.

So the useful question for an IRO is not “did the call go smoothly?” It is “did the call contain anything a machine couldn’t have generated from the public record?” The smooth call full of generalities, the one a bot could have given, is not a triumph of message discipline. It is the tell that management said nothing.

The part the machine gets wrong is the part that moves the stock

The experiment has now been run at scale, and the result is worth sitting with.

A 2025 study, Executives vs Chatbots, ran exactly this thought experiment across 82,128 earnings calls from 2004 to 2020. For every analyst question, the researchers fed a context-aware language model all the public inputs and had it answer, then measured how far the executive’s real answer diverged from the machine’s. Call that gap the human residual: the part of the answer the AI could not have produced from what was already public.

Then they checked what the market did with it. Their finding is the whole argument in one line.

The market only reacts to the residual.

Where the executive’s answer matched what a machine would say, nothing happened: no price move, no forecast revision. Where it diverged, the stock moved, analysts sharpened their estimates, and the market got more liquid. Moving the residual from the bottom of its range to the top was associated with higher abnormal returns, lower analyst forecast dispersion, tighter bid-ask spreads, and lower illiquidity.

And the residual has a shape. It concentrates in firm-specific, operational answers: competition, customers, products, demand. It nearly disappears on regulation and macro risk, where management holds no private knowledge a model lacks. The executive is informative exactly where they know something the public record doesn’t, and replaceable exactly where they don’t.

This is not a finding about AI. It is a finding about IR.

This part should interest anyone who has spent a career in this work.

Look again at what the residual moves: forecast dispersion down, spreads tighter, illiquidity lower. Those are not machine-learning metrics. They are the textbook definition of what investor relations is for. The function exists to reduce information asymmetry, narrow spreads, dampen volatility, and through all of that shave the risk premium in the cost of capital. That is the job, as the literature has defined it for thirty years.

So a 2025 machine-learning paper, without setting out to, measured the value of investor relations, and found it exactly where the canon always put it: in the human-conveyed reduction of uncertainty. The residual the AI cannot reproduce and the thing IR has always been paid to produce turn out to be the same thing.

The older IR literature says it in plainer language. An analyst picking up your stock is taking a leap of faith, staking their forecast and their reputation on management being straightforward and consistent, and they may get only one chance to take it. What they are buying is not your data, which they already have, but your credibility, the one qualitative asset IR exists to build. They are also buying the benefit of the doubt that a track record stores up and that gets spent in a hard quarter, when a complicated miss needs explaining and you need to be believed. Premium valuations come from belief in management, because belief lowers perceived risk, and lower perceived risk means higher value. None of that is something a model can hold on your behalf. A machine can transmit your numbers. It cannot vouch for them.

What to do about it, if you run IR

This reframes the job in a useful way, and the reframing matters now, because AI is seeping into earnings prep and the instinct it encourages is the wrong one.

That instinct is to make management sound more polished, more consistent, more machine-smooth. It is exactly backwards. A call optimised to sound like a competent machine is a call that has surrendered its only source of value. The work is not to script the residual away; it is to make sure management brings it: the CFO ready to say the specific, operational, slightly uncomfortable thing only they know, rather than retreat into “we remain confident in our disciplined execution.” Use the AI for what it is genuinely good at: model the likely questions, draft the boring parts, check consistency across quarters. Then spend the time it saves on the two or three answers where a real person has to say something a machine could not.

It also tells you what not to automate. The recurring suggestion to put an AI chatbot on your IR site to field investor questions is a proposal to delegate the one thing that cannot be delegated. The bot, by construction, knows only the public record; it can only ever give the answer that moves nothing, or invent one, which moves the wrong things. Asked to do exactly this, Salesforce declined. The residual is not the bot’s to give.

And if you run IR in an under-covered market — much of Asia, where a mid-cap may have two analysts and a machine-read filing standing in for the wall of sell-side notes a US large-cap takes for granted — this matters more, not less. Where AI is the primary reader rather than one voice among many, the human residual is the only thing that separates your company from its own press release. There is no analyst consensus to correct the machine’s flat reading. There is only what your management actually said that the machine could not have guessed.

Working out where your management actually adds the residual, and where the call is just reciting the release, is the kind of read we do in an IR effectiveness review.

Run the test

So run it. Feed the robot your release and your remarks, hand it the hard questions, and watch it answer.

If it does badly — if its answers come out bloodless and general where yours would be specific and real — you are doing the job. The market will pay for the difference, in tighter spreads and a lower cost of capital, exactly as it always has.

If it does well, you have a different problem, and no amount of better prompting will fix it. It means your last call could have been run by a machine, which is a long way of saying it did not need to be run at all.

Frequently asked questions

Can AI replace a company's earnings call?

No. An earnings call’s value is the part of management’s answers a machine cannot predict from the public record, and a 2025 study of 82,128 calls found that the market reacts only to that human residual. A model can summarise what you disclosed; it cannot supply the private, operational detail an executive reveals under live questioning.

What is the purpose of an earnings call if the numbers are already public?

The release delivers the information; the call delivers disclosure under questioning. Its value is in the unscripted answers to questions management did not choose, given live by a named person who must stand behind them, which is where private information enters the market.

Should companies use AI to prepare for earnings calls?

Yes, for the right tasks: modelling likely questions, drafting routine sections, and checking consistency across quarters. The mistake is using AI to make management sound more polished and machine-smooth, which strips out the specific, human detail that actually moves the stock.

Does the value of an earnings call matter more in markets with little analyst coverage?

Yes. Where a company has few analysts and AI is the primary reader of its disclosure, there is no consensus to correct a flat machine reading, so the human residual is the only thing distinguishing the company from its own press release.

Advising listed companies representing over $50 billion in aggregate market capitalisation.

If your last earnings call could have been run by a machine, that is worth knowing before the next one. The review is free and carries no obligation, and it shows exactly where your last call was adding signal and where it was reciting the public record.


Jonathan Zax, Founder & President Director, IR Advantage. IRC · ICIR · Wharton MBA · Harvard BA. 30 years in investor relations.
