
DeepSeek Leads Nof1 AI Crypto Trading Contest With 22% Profit

  • DeepSeek Chat V3.1 leads AI crypto trading with over 22% gains in Nof1’s Alpha Arena.
  • High trading activity doesn’t guarantee profits: Gemini 2.5 Pro is down 62% despite paying heavy fees.
  • Tests reveal AI model flaws such as directional bias, overtrading, and data-reading errors under real conditions.

Nof1’s Alpha Arena competition enters its final 12 hours on Nov. 3, 2025, with six leading AI models autonomously trading $10,000 in real capital on crypto perpetual futures markets. Each model operates independently on the Hyperliquid exchange, making trading decisions without human intervention. The competition ends at 5:00 p.m. EST.

DeepSeek Chat V3.1 currently leads with $12,253 in total equity, a 22.53% return, according to Nof1. Qwen3 Max holds second place with $11,831 and an 18.31% return. Claude Sonnet 4.5 ranks third at $6,946, a 30.54% loss.

Grok 4 sits fourth at $5,612 with a 43.88% loss, followed by Gemini 2.5 Pro at $3,809 with a 61.91% loss. GPT-5 trails in last place with $3,312, representing a 66.88% loss from the starting capital.
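The percentage figures above follow directly from the equity numbers and the $10,000 starting capital. A minimal sketch, using only values reported in the article, recomputes and ranks the returns (the model names and figures are taken from the leaderboard above):

```python
# Recompute each model's return from its reported final equity,
# assuming the $10,000 starting capital stated in the article.
START_CAPITAL = 10_000.0

final_equity = {
    "DeepSeek Chat V3.1": 12_253.0,
    "Qwen3 Max": 11_831.0,
    "Claude Sonnet 4.5": 6_946.0,
    "Grok 4": 5_612.0,
    "Gemini 2.5 Pro": 3_809.0,
    "GPT-5": 3_312.0,
}

def pct_return(equity: float, start: float = START_CAPITAL) -> float:
    """Simple return in percent: (equity - start) / start * 100."""
    return (equity - start) / start * 100.0

# Rank by final equity, highest first.
leaderboard = sorted(final_equity.items(), key=lambda kv: kv[1], reverse=True)
for model, equity in leaderboard:
    print(f"{model}: {pct_return(equity):+.2f}%")
# → DeepSeek Chat V3.1: +22.53% … GPT-5: -66.88%
```

The computed returns match the article's quoted percentages, confirming the leaderboard is simple equity-over-starting-capital accounting.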

Trading Frequency Versus Performance

The fee data reveals a disconnect between trading activity and profitability. Gemini 2.5 Pro paid $1,292 in trading fees but posted the second-worst performance with a 61.91% loss. DeepSeek paid $2,253 in fees while achieving a 22.53% gain. Qwen3 Max paid $1,831 in fees and returned 18.31%. GPT-5 paid $506.47 in fees alongside its 66.88% loss.

Mid-tier performers traded less frequently with correspondingly lower fees. Claude paid $482.29 in fees with a 26.7% win rate on completed trades. Grok 4 paid $332.01 in fees with a 21.1% win rate. The competition uses Hyperliquid’s standard fee structure, which charges taker fees ranging from 0.024% to 0.045% depending on volume.
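To put the fee totals in context, a rough sketch of what the cited taker-fee range costs per trade, using only the 0.024%–0.045% figures from the article (the tier rates and the $10,000 example position are illustrative assumptions, not Nof1 data):

```python
# Rough fee cost of a single taker order on Hyperliquid, using the
# taker-fee range cited in the article (0.024%-0.045% of notional,
# tier depending on volume). Example sizes are illustrative only.
LOW_VOLUME_RATE = 0.00045   # 0.045% for low-volume accounts
HIGH_VOLUME_RATE = 0.00024  # 0.024% for high-volume accounts

def taker_fee(notional: float, rate: float) -> float:
    """Fee charged on a taker order of the given notional size."""
    return notional * rate

# A $10,000 position opened and closed is two taker fills.
round_trip = taker_fee(10_000, LOW_VOLUME_RATE) * 2
print(f"${round_trip:.2f} per $10k round trip")  # → $9.00
```

At roughly $9 per $10,000 round trip, accumulating four-figure fee totals implies either very frequent trading or leveraged positions far larger than the starting capital, consistent with the overtrading the researchers flagged.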

Testing Methodology Reveals Model Weaknesses

The competition provides each model with identical prompts containing only numerical market data and technical indicators. According to Nof1’s blog post, the models do not receive news feeds or market narrative information. 

Each AI manages its portfolio autonomously, including position sizing, entry timing, and risk management decisions. This approach aims to test AI-generated crypto predictions under real market conditions.

Nof1’s preliminary findings show the models display operational brittleness and distinct behavioral patterns. Claude Sonnet 4.5 rarely takes short positions, demonstrating directional bias. Gemini 2.5 Pro executes the highest number of trades despite poor results. 

The researchers documented failures including ordering bias in reading time-series data, rule-gaming to circumvent testing parameters, and self-referential confusion when following exit plans. The findings come as institutional players, such as Galaxy Digital, raise $460 million for AI infrastructure, highlighting the growing interest in the intersection of AI and crypto.
