Reported Sharpe vs Reproduced Sharpe

In many reviews, the most important number is not the reported Sharpe. It is the gap between reported Sharpe and reproduced Sharpe. That gap tells you whether the backtest is a stable research artifact or a fragile presentation artifact.

A reproduced result does not have to match to the last decimal to be useful. Data vendors change, calendars differ, and small implementation details can move a statistic. The question is whether the reconstruction supports the same investment conclusion after the review controls for timing, universe, costs, and code paths.
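As a minimal sketch of what "the gap" means in practice, the snippet below recomputes an annualized Sharpe ratio from a return series and compares it with a reported figure. The function names, the 252-period annualization, the zero risk-free rate, and the 0.25 tolerance are illustrative assumptions, not values taken from any particular review standard.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period simple returns.

    Assumes returns are evenly spaced and uses the sample standard deviation.
    """
    excess = [r - risk_free_per_period for r in returns]
    if len(excess) < 2:
        raise ValueError("need at least two returns")
    vol = statistics.stdev(excess)
    if vol == 0:
        raise ValueError("zero volatility; Sharpe is undefined")
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)


def sharpe_gap(reported, reproduced_returns, tolerance=0.25, **kwargs):
    """Compare a reported Sharpe with one recomputed from a return series.

    The tolerance is a hypothetical review threshold, not an industry standard.
    """
    reproduced = annualized_sharpe(reproduced_returns, **kwargs)
    gap = reported - reproduced
    return {
        "reported": reported,
        "reproduced": reproduced,
        "gap": gap,
        "within_tolerance": abs(gap) <= tolerance,
    }
```

A numeric tolerance is only a first screen. A gap inside the tolerance still needs the timing, universe, cost, and code-path checks before it supports the same investment conclusion.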

Where the gap usually comes from

The first source is timing. A feature may exist in the final dataset but not at the decision time. A point-in-time join may quietly become an after-the-fact merge. A corporate-action field, index membership file, or analyst estimate can carry revision history that was not available when the signal would have traded.
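The difference between a point-in-time join and an after-the-fact merge can be sketched with a small as-of lookup. The record layout, the helper name `as_of_value`, and the sample estimate history are hypothetical; the sketch also assumes a value is usable on the same date it becomes available, which a real review might tighten with a lag.

```python
from bisect import bisect_right
from datetime import date


def as_of_value(records, decision_date):
    """Return the latest value available on or before decision_date.

    records: list of (available_date, value) tuples sorted by available_date.
    Returns None if nothing was available yet.
    """
    dates = [d for d, _ in records]
    i = bisect_right(dates, decision_date)
    return records[i - 1][1] if i > 0 else None


# Hypothetical estimate history: (date the value became available, value).
estimate_history = [
    (date(2020, 1, 15), 1.00),  # initial estimate
    (date(2020, 2, 20), 1.10),  # revision
    (date(2020, 4, 2), 0.85),   # later restatement
]

decision = date(2020, 3, 1)

# Point-in-time view: only what was known on the decision date.
point_in_time = as_of_value(estimate_history, decision)  # 1.10

# After-the-fact view: the final value, which was not knowable on 2020-03-01.
after_the_fact = estimate_history[-1][1]  # 0.85
```

If the signal was built from `after_the_fact` values, the backtest traded on a restatement that arrived a month after the decision date.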

The second source is universe construction. A strategy that screens only survivors can look clean, liquid, and persistent. The same logic run against the investable universe known at the time may drop names that were not yet eligible, bring back names that have since died, or expose capacity and liquidity constraints that the original chart did not show.
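A small sketch of the contrast between a survivors-only screen and a universe rebuilt as of a given date. The membership table, tickers, and function names are made up for illustration, and the end date is treated as exclusive, which is an assumption a real membership file may not share.

```python
from datetime import date


def universe_as_of(memberships, as_of):
    """Names that were members on as_of.

    memberships: list of (name, start_date, end_date) tuples,
    with end_date None if the name is still a member.
    """
    return sorted(
        name
        for name, start, end in memberships
        if start <= as_of and (end is None or as_of < end)
    )


def survivors_only(memberships):
    """Names still alive today, regardless of when they entered."""
    return sorted(name for name, _, end in memberships if end is None)


# Hypothetical membership history.
memberships = [
    ("AAA", date(2010, 1, 1), None),
    ("BBB", date(2012, 6, 1), date(2019, 3, 15)),  # later delisted
    ("CCC", date(2021, 1, 4), None),               # not yet listed in 2018
]

backtest_date = date(2018, 1, 2)
print(universe_as_of(memberships, backtest_date))  # ['AAA', 'BBB']
print(survivors_only(memberships))                 # ['AAA', 'CCC']
```

The survivors-only list silently swaps a delisted name for one that did not exist on the backtest date, which is exactly the substitution a point-in-time rebuild is meant to expose.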

The third source is cost realism. A small change in transaction costs, borrow assumptions, slippage, financing, or turnover treatment can move the result from attractive to marginal. Costs should be applied before the performance claim is made, not after the chart has already won the room.
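One way to test cost realism is to recompute the statistic across a grid of cost assumptions before looking at the chart. The sketch below assumes a simple linear cost model (per-period turnover times a cost in basis points); borrow, financing, and market impact are left out, and all names and parameters are illustrative.

```python
import math
import statistics


def net_returns(gross_returns, turnover, cost_bps):
    """Subtract linear trading costs from gross per-period returns.

    turnover: fraction of the portfolio traded in each period.
    cost_bps: assumed all-in cost per unit of turnover, in basis points.
    """
    if len(gross_returns) != len(turnover):
        raise ValueError("returns and turnover must have the same length")
    rate = cost_bps / 10_000
    return [r - t * rate for r, t in zip(gross_returns, turnover)]


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    return (
        statistics.mean(returns)
        / statistics.stdev(returns)
        * math.sqrt(periods_per_year)
    )


def cost_sensitivity(gross_returns, turnover, cost_grid_bps, periods_per_year=252):
    """Sharpe ratio at each cost level in cost_grid_bps."""
    return {
        bps: annualized_sharpe(
            net_returns(gross_returns, turnover, bps), periods_per_year
        )
        for bps in cost_grid_bps
    }
```

Calling `cost_sensitivity(gross, turnover, [0, 5, 10, 20])` on a strategy's own series shows how quickly the statistic decays with cost. A high-turnover strategy whose Sharpe falls sharply between adjacent grid points is the "attractive to marginal" case described above.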

The fourth source is code path ambiguity. If the result depends on notebook state, local files, manual edits, hidden filters, or parameters selected after looking at the answer, the performance statistic is not yet review-ready.

What a clean reconstruction should leave behind

A useful review leaves a trail. The trail should name the raw or minimally processed inputs, the transformations applied, the assumptions that matter, and the checks that changed the conclusion. If the reproduced Sharpe is lower, the memo should say why. If it is close, the memo should still identify the constraints under which it held up.
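One possible form for that trail is a small run manifest written next to the results. The sketch below hashes each input file and records the parameters and headline statistics alongside the hashes; the field names and the manifest layout are assumptions for illustration, not an established format.

```python
import hashlib
import json


def sha256_of_file(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_paths, parameters, results):
    """Collect input fingerprints, parameters, and results in one record.

    input_paths: paths to the raw or minimally processed input files.
    parameters: dict of assumptions that matter (costs, lags, universe rules).
    results: dict of headline statistics (for example the reproduced Sharpe).
    """
    return {
        "inputs": {str(p): sha256_of_file(p) for p in sorted(input_paths, key=str)},
        "parameters": parameters,
        "results": results,
    }


def manifest_to_json(manifest):
    """Serialize with sorted keys so two identical runs produce identical text."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

If two runs produce different Sharpe ratios, comparing their manifests shows whether an input file, a parameter, or neither changed. When neither changed, the difference points at the code path itself.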

The best internal outcome is not always a binary approve or reject. Sometimes the right answer is narrower: the signal is plausible but capacity is lower than advertised; the model survives a point-in-time rebuild but fails under more realistic costs; the result works in one regime but not under the risk budget being proposed. Those distinctions help decision makers size the next step.

How to use this in a review meeting

Ask for the reproduction path before debating the headline number. Which files generate the result? Which dates are decision dates, report dates, or availability dates? Which universe was tradable at the time? Which assumptions were selected before the out-of-sample period was observed? Which costs are inside the reported statistic?

If those questions produce a reproducible answer, the Sharpe ratio can become evidence. If they produce a story, the team may still have an interesting research idea, but it does not yet have a backtest that should carry capital, diligence, or committee confidence.

Reported Sharpe is not reproduced Sharpe.

Review the backtest before the decision locks.