🤯 AI Failed: Premier League Disaster ⚽️
April 13, 2026
In a test of artificial intelligence’s predictive abilities, eight leading models – including those from Google, OpenAI, Anthropic, and xAI – were tasked with analyzing the 2023-24 Premier League season. Each AI was given historical data and instructed to maximize returns while managing risk, placing bets on match outcomes and goals without internet access. Over three attempts, none of the models turned a consistent profit: Anthropic’s Claude Opus 4.6 lost money on average, and xAI’s Grok 4.20 went bankrupt outright. The study highlighted a significant gap between AI’s theoretical capabilities and its practical performance in complex, long-term analysis, as a former Meta AI researcher noted. Ultimately, the results underscore AI’s struggles with real-world tasks and its inability to reliably predict outcomes.
THE LIMITATIONS OF AI IN COMPLEX STRATEGIC ANALYSIS
The recent study conducted by General Reasoning has revealed a significant and concerning limitation of even the most advanced AI systems – their inability to consistently succeed in complex, dynamic environments like the Premier League. Despite rapid advancements in areas such as software development, AI models from Google, OpenAI, Anthropic, and xAI demonstrably struggled to translate their capabilities into profitable betting strategies. This highlights a crucial gap between AI’s prowess in controlled, static tasks and its difficulty in adapting to the unpredictable nature of real-world scenarios, particularly those with long-term implications. The experiment, utilizing detailed historical data and statistical analysis of the 2023-24 Premier League season, focused on testing the AI’s ability to build predictive models, manage risk, and adapt to evolving information – all key elements of successful strategic investment.
THE KELLYBENCH EXPERIMENT: METHODOLOGY AND RESULTS
The “KellyBench” report employed a rigorous methodology to assess the performance of eight leading AI systems. Each AI was provided with a comprehensive dataset encompassing every team’s historical performance, past game results, and updated player statistics. The AIs were tasked with developing betting strategies designed to maximize returns while simultaneously managing risk. Crucially, the AIs operated independently, without access to the internet for real-time data updates, and were given three attempts to achieve profitability. Anthropic’s Claude Opus 4.6 exhibited the best performance, incurring an average loss of 11 percent and nearly breaking even on one occasion. xAI’s Grok 4.20, by contrast, went bankrupt on one attempt and failed to complete the remaining two. Google’s Gemini 3.1 Pro turned a 34 percent profit on one attempt but went bankrupt on a subsequent one. The consistent underperformance of all models compared to human betting strategies underscored the inherent challenges in replicating human intuition and adaptability within a complex, evolving environment.
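The benchmark’s name appears to reference the Kelly criterion, a classic formula for sizing bets to balance growth against the risk of ruin – an assumption on our part, since the report’s exact scoring method is not detailed here. As a rough illustration, a Kelly-style staking rule (with hypothetical probabilities and odds) looks like this:

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Fraction of bankroll to stake on a bet with estimated win
    probability p_win at the given decimal odds (total payout per
    unit staked, including the stake)."""
    b = decimal_odds - 1.0  # net profit per unit staked on a win
    edge = b * p_win - (1.0 - p_win)
    # Never stake when the expected edge is negative.
    return max(0.0, edge / b)

# Hypothetical example: a model estimates a 55% chance of a home
# win offered at decimal odds of 2.0.
stake = kelly_fraction(0.55, 2.0)  # 10% of bankroll
```

Strategies like this make the study’s failure mode concrete: a model that overestimates its win probabilities will systematically over-stake, which is one plausible route to the bankruptcies reported above.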
IMPLICATIONS AND A SHIFT IN AI BENCHMARKS
The findings of the KellyBench study carry significant implications for the broader AI landscape and challenge prevailing narratives of rapid advancement. The systematic failure of these sophisticated AI systems to generate profit in the Premier League scenario suggests that current AI benchmarks are often inadequate and overly focused on static, controlled environments. As noted by Taylor, a former Meta AI researcher, “many of the benchmarks typically used to test AI are set in ‘very static environments’ that bear little resemblance to the chaos and complexity of the real world.” This research provides a crucial counterweight to the growing excitement surrounding AI’s capabilities in areas like software engineering, emphasizing the need for more realistic and challenging tests that accurately reflect the complexities of long-term strategic decision-making and the unpredictable nature of real-world systems.
Our editorial team uses AI tools to aggregate and synthesize global reporting. Data is cross-referenced with public records as of April 2026.