AI Now Matches Prediction Markets in Forecasting Real Events, Study Finds

by admin August 21, 2025

In brief

Prophet Arena tests AI models by having them predict real-world, unresolved events, with GPT-5 currently leading the rankings.
AI models show distinct prediction “personalities” and often diverge from market consensus, sometimes generating high returns.
Early results suggest AI can forecast as accurately as prediction markets, potentially transforming institutional decision-making.

A new artificial intelligence benchmark launched in August shows that AI models can forecast real-world events as accurately as prediction markets—and sometimes better, according to researchers at the University of Chicago’s SIGMA Lab.

Prophet Arena evaluates AI systems by having them predict the outcomes of live, unresolved events drawn from platforms like Kalshi and Polymarket, ranging from election results to sports matches and economic indicators. Unlike traditional benchmarks that test models on historical data with known answers, Prophet Arena tests AI on events whose outcomes are not yet known.

“By anchoring evaluations in unresolved, real-world events, Prophet Arena ensures a level playing field. There is no pre-training advantage, no secret fine-tuning trick, no leakage of test samples,” the Prophet Arena team said in the benchmark’s official blog post.

The benchmark says it is trying to address a fundamental question about artificial intelligence: “Can AI systems reliably predict the future by connecting the dots across existing real-world information?”

Early results suggest they can. GPT-5 currently leads the leaderboard with a Brier score of 82.21%. Meanwhile, OpenAI’s o3-mini model has emerged as the profit champion, generating the highest average returns when its predictions are translated into simulated bets (a correct bet on an underdog pays out far more per dollar than a correct bet on a favorite, so a model that spots underpriced long shots can earn outsized returns).
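For readers unfamiliar with the metric, here is a minimal sketch of the standard Brier score for yes/no events: the mean squared difference between each forecast probability and the actual outcome (1 if the event happened, 0 if not), where lower is better. The article does not explain how the leaderboard turns this into a percentage, so the sketch does not try to reproduce the 82.21% figure. The function name and the sample forecasts are made up for illustration.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need equally sized, non-empty inputs")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts for three resolved yes/no events (not Prophet Arena data).
forecasts = [0.9, 0.3, 0.6]
results = [1, 0, 1]
print(f"Brier score: {brier_score(forecasts, results):.4f}")
```

A forecaster who always says 50% scores 0.25 under this definition, so anything well below that indicates forecasts that carry real information.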

DeepSeek R1 appears to be the contrarian AI in the group, frequently making predictions that diverge sharply from both other models and market consensus, so it is probably not the best model to trust if you want to make a quick buck on Myriad Markets.

The platform reveals distinct “personalities” among AI models when facing identical information. In one example, when predicting whether AI regulation would become federal law before 2026, the market assigned just a 25% probability. But the models diverged wildly: Qwen 3 predicted 75%, GPT-4.1 estimated 60%, while Llama 4 Maverick stayed conservative at 35%.

In another case, o3-mini earned a simulated $9 return on a $1 bet by correctly predicting Toronto FC would beat San Diego FC in a Major League Soccer match. The model gave Toronto a 30% chance of winning, while the market priced it at just 11%. Toronto won.
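The arithmetic behind this example can be sketched as follows, assuming a simple binary contract that costs its market price and pays $1 if the event happens. That is the usual prediction-market convention, but the article does not spell out Prophet Arena's exact betting simulation, so treat this as an illustration. The function names are hypothetical.

```python
def payout_per_dollar(market_price: float) -> float:
    """Gross payout from staking $1 on a contract that pays $1 if it resolves yes."""
    if not 0.0 < market_price < 1.0:
        raise ValueError("market_price must be strictly between 0 and 1")
    return 1.0 / market_price


def expected_profit_per_dollar(model_probability: float, market_price: float) -> float:
    """Expected net profit per $1 staked, if the model's probability is correct."""
    return model_probability * payout_per_dollar(market_price) - 1.0


market_price = 0.11       # market's implied chance of a Toronto win (from the article)
model_probability = 0.30  # o3-mini's estimate (from the article)

payout = payout_per_dollar(market_price)
edge = expected_profit_per_dollar(model_probability, market_price)
print(f"payout if correct: ${payout:.2f} per $1 staked")
print(f"expected profit by the model's own estimate: ${edge:.2f} per $1 staked")
```

At an 11% price, a winning $1 stake pays out roughly $9, in line with the figure reported above. The bet looks attractive only because the model's 30% estimate sits well above the market's 11%. If the two agreed, the expected profit would be zero.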

“(Prophet Arena) tests models’ forecasting capability, a high form of intelligence that demands a broad range of capabilities, including understanding existing information and news sources, reasoning under uncertainty, and making time-sensitive predictions about unfolding events,” the researchers wrote.

Prophet Arena also enables human-AI collaboration. Users can supply additional news and context to see how predictions shift, while AI models provide detailed rationales for their forecasts.

As prediction markets themselves integrate AI—Kalshi recently partnered with Elon Musk’s Grok, while Polymarket generates AI-powered market summaries—Prophet Arena offers the first systematic comparison of machine forecasting against collective human judgment.

If AI models get really good at forecasting, their predictions could be purely fact-driven, with no sentiment or emotion playing a role in the decisions. They could potentially match or exceed the wisdom of crowds, changing the way institutions approach risk assessment, investment decisions, and strategic planning.

The Prophet Arena platform continues updating daily as events resolve, providing an evolving picture of whether artificial intelligence can truly predict the future by connecting today’s dots.
