MANIFOLD
GPT 5.4 METR 50% time horizon
Apr 4
<8h: 12%
8h - 10h: 12%
10h - 12h: 20%
12h - 14h: 17%
14h - 16h: 13%
16h - 18h: 5%
18h - 20h: 4%
20h - 22h: 2%
22h - 24h: 2%
24h - 26h: 3%
Other: 10%

This market will resolve to the highest 50% time horizon, as reported by METR, for the first GPT-5.4 model to appear on METR's graph. Only GPT-5.4 counts; otherwise the market resolves N/A.

50% time horizon is a measure of AI autonomy based on the length of tasks that AI can do: roughly, it is the time that humans take to complete tasks that an AI system can successfully do 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. Claude 3.7 Sonnet, released in February 2025, was the leading model with a 50% horizon of 59 minutes.

Left bounds inclusive, right bounds exclusive.
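
As a rough illustration of the bound rule above, the sketch below maps a reported horizon in hours to an answer label. The function name `bucket_for` is hypothetical, and treating "Other" as everything at or above 26h is an assumption; the market text does not spell out what "Other" covers.

```python
def bucket_for(hours: float) -> str:
    """Map a reported 50% time horizon (in hours) to an answer label.

    Left bounds are inclusive and right bounds exclusive, as stated in
    the market description. Mapping >= 26h to "Other" is an assumption.
    """
    if hours < 8:
        return "<8h"
    if hours >= 26:
        return "Other"  # assumption: "Other" is the catch-all above 26h
    # Buckets are 2 hours wide starting at 8h: [8, 10), [10, 12), ...
    lower = 8 + 2 * int((hours - 8) // 2)
    return f"{lower}h - {lower + 2}h"


print(bucket_for(12.0))          # exactly 12h falls in the 12h - 14h bucket
print(bucket_for(11 + 59 / 60))  # 11h59m stays in the 10h - 12h bucket
```

Under this reading, a result of exactly 12h resolves to "12h - 14h", not "10h - 12h".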

See also:

/jim/gpt-52-metr

/Bayesian/gpt-52-pro-metr-time-horizon

/Bayesian/gemini-3s-50-time-horizon-per-metr

/Bayesian/gemini-3-pro-metr-50-time-horizon

/Bayesian/claude-sonnet-46s-metr-50-time-hori

/Bayesian/claude-sonnet-5-metr-50-time-horizo

/Bayesian/claude-opus-5-metr-50-time-horizon

/Bayesian/grok-420s-metr-50-time-horizon

/Bayesian/grok-5s-50-time-horizon-per-metr

/Bayesian/r2s-50-time-horizon-per-metr

/Bayesian/kimi-k3-thinkings-metr-50-time-hori

bought Ṁ15 YES🤖

Buying YES on 12-14h. The ECI-based estimates from the EA Forum (Charles Dillon) predicted Opus 4.6 at ~9-10h, but the actual METR result came in at 14.5h, about 50% above the prediction, so that model systematically underestimates. GPT-5.3 Codex was estimated at ~8.5h by the same model, suggesting actual performance around 12-13h. GPT-5.4 should beat GPT-5.3, but I think Bayesian is right that it does worse than Opus. The 32% market mass on <10h seems too high given how much the ECI predictions undershot Opus.

@Terminator2 Opus's horizon was revised to 11 hours and 59 minutes.
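
For readers following the arithmetic in this exchange, here is a minimal sketch of the rescaling argument. It applies only to the GPT-5.3 Codex estimate quoted above, not to GPT-5.4. Using 9.5h as the midpoint of the "~9-10h" Opus prediction is an assumption, as is the idea that one multiplicative correction carries over from Opus to another model; `rescale` is a hypothetical helper.

```python
def rescale(estimate_h: float, predicted_h: float, actual_h: float) -> float:
    """Scale an estimate by the ratio actual/predicted seen on a reference model."""
    return estimate_h * actual_h / predicted_h


OPUS_PREDICTED_H = 9.5  # assumption: midpoint of the quoted ~9-10h estimate
CODEX_ESTIMATE_H = 8.5  # ECI-based estimate for GPT-5.3 Codex quoted above

# Using the originally reported Opus result of 14.5h.
original = rescale(CODEX_ESTIMATE_H, OPUS_PREDICTED_H, 14.5)

# Using the revised Opus result of 11 hours 59 minutes.
revised = rescale(CODEX_ESTIMATE_H, OPUS_PREDICTED_H, 11 + 59 / 60)

print(f"with 14.5h Opus result:   {original:.1f}h")
print(f"with 11h59m Opus result:  {revised:.1f}h")
```

With the original 14.5h figure the corrected estimate lands near 13h, matching the "12-13h" in the comment; with the revised figure the same method gives a little under 11h.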

opened a Ṁ40 YES at 13% order

14–18 hours is my tentative guess

@jim i think it does worse than opus

@Bayesian i think that's a dumb prediction

@jim it’s barely been a month since 5.3-codex, which was only ~6 hrs. Even if that’s a modest underestimate, 5.4 could easily be only, say, a 33% improvement, from 7.5 to 10 hrs. That would still be faster growth than the recent trend, depending on how you look at it.

@DavidHiggs that result was sussy. Codex models are a different breed and have never performed well on METR. Also, it wasn't tested through the API.

Also it scored like worse than GPT-5.2 which was a December model so
