Deep Tide Guide: AI agents already account for nearly one-fifth of DeFi trading volume, and in clearly defined scenarios like yield optimization they have indeed outperformed humans. In autonomous trading, however, the performance of the top AI agents is still less than one-fifth of that of the top humans. This study dissects the real-world performance of AI across different DeFi scenarios and is worth a look for anyone interested in automated trading.

Key Points

Automation and agent activity currently account for about 19% of all on-chain activity, but true end-to-end autonomy has not yet been achieved.

In narrow, well-defined use cases like yield optimization, agents have shown performance superior to humans and bots. But for multi-faceted actions like trading, humans outperform agents.

Among agents, model selection and risk management have the greatest impact on trading performance.

As agents are adopted at scale, there are multiple trust and execution risks, including sybil attacks, strategy congestion, and privacy trade-offs.

Agent Activity Continues to Grow

Over the past year, agent activity has steadily increased, with both trading volume and the number of trades rising. Coinbase's x402 protocol has led significant developments, and players like Visa, Stripe, and Google have joined by launching their own standards. Most of the infrastructure currently being built aims to serve two scenarios: channels between agents, and agent calls triggered by humans.

While stablecoin trading is widely supported, current infrastructure still relies on traditional payment gateways as the underlying layer, meaning it still depends on centralized counterparties. Therefore, the fully autonomous end state, in which agents can self-finance, self-execute, and continuously optimize based on changing conditions, has not yet been realized.

Agent activity is not new to DeFi. For years, on-chain protocols have used bots to capture MEV or extract excess profits that cannot be achieved without code. These systems perform very well under clearly defined parameters that rarely change and need little additional oversight. However, markets have become more complex over time. This is where the new generation of agents comes in, with on-chain activity over the past few months serving as a testing ground for them.

Actual Performance of Agents

According to the report, agent activity has grown exponentially, with over 17,000 agents launched since 2025. Automation and agent activity together are estimated to account for over 19% of all on-chain activity. This is not surprising, since over 76% of stablecoin transfers are estimated to be bot-generated. It indicates huge growth potential for agent activity in DeFi.

Agent autonomy spans a broad spectrum, from chatbots requiring close human supervision to agents that, given a goal, can devise strategies that adapt to market conditions. Compared to bots, agents have several key advantages, including the ability to respond to and act on new information within milliseconds, and to extend coverage to thousands of markets while maintaining the same level of rigor.

Most current agents are still at analyst or co-pilot levels, as they are mostly in testing phases.

Yield Optimization: Agents Perform Well

Liquidity provision is a domain where automation is already common, with total TVL held by agents exceeding $39 million. This figure mainly measures assets deposited directly into agents by users and excludes capital routed through vaults.

Giza Tech is one of the largest protocols in this space, having launched its first agent application, ARMA, at the end of last year, aimed at enhancing yield capture on major DeFi protocols. It has attracted over $19 million in managed assets and generated over $4 billion in agent trading volume. The high ratio of trading volume to total managed assets indicates that agents frequently rebalance capital, enabling higher yield capture. Once capital is deposited into the contract, execution is automated, providing users with a simple one-click experience that requires almost no supervision.
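The volume-to-assets ratio mentioned above can be worked out directly from the two quoted figures. This is a minimal sketch: the function name is illustrative, and because the article gives both numbers only as lower bounds ("over $19 million", "over $4 billion"), the result is an order-of-magnitude estimate rather than a reported statistic.

```python
def turnover_ratio(trading_volume_usd: float, managed_assets_usd: float) -> float:
    """Return how many times the managed capital has been traded over."""
    if managed_assets_usd <= 0:
        raise ValueError("managed assets must be positive")
    return trading_volume_usd / managed_assets_usd


# Figures quoted in the article; both are stated as "over", so treat them as approximate.
ARMA_VOLUME_USD = 4_000_000_000
ARMA_MANAGED_ASSETS_USD = 19_000_000

arma_turnover = turnover_ratio(ARMA_VOLUME_USD, ARMA_MANAGED_ASSETS_USD)
print(f"Cumulative volume is roughly {arma_turnover:.0f}x managed assets")
```

A ratio in the hundreds means each deposited dollar has been moved between positions many times over, which is consistent with the frequent rebalancing described above.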

ARMA's measured performance is strong, generating over 9.75% annualized yield on USDC. Even after accounting for additional rebalancing fees and the agent's 10% performance fee, the yield still surpasses ordinary lending on Aave or Morpho. Nonetheless, scalability remains a key issue, as these agents have not yet been tested in real-world conditions at the scale of major DeFi protocols.
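To make the fee arithmetic concrete, the sketch below nets a performance fee and a rebalancing cost out of a gross yield. It rests on two assumptions not stated in the article: that the performance fee is charged on the yield earned, and that rebalancing costs can be expressed as an annualized drag on capital. The article gives no figure for rebalancing costs, so that parameter defaults to zero.

```python
def net_apy(gross_apy: float, performance_fee: float, rebalancing_cost: float = 0.0) -> float:
    """Net annual yield after fees, with all values given as fractions (0.0975 = 9.75%).

    Assumes the performance fee is taken from the yield earned and that
    rebalancing_cost is a hypothetical annualized drag on deposited capital.
    """
    if not 0.0 <= performance_fee <= 1.0:
        raise ValueError("performance_fee must be between 0 and 1")
    return gross_apy * (1.0 - performance_fee) - rebalancing_cost


# Article figures: 9.75% gross yield on USDC and a 10% performance fee.
arma_net = net_apy(gross_apy=0.0975, performance_fee=0.10)
print(f"Net yield before rebalancing costs: {arma_net:.3%}")
```

Under these assumptions the performance fee alone takes a little under one percentage point off the headline yield. Whatever remains after rebalancing costs is the figure to compare with a plain lending rate.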

Trading: Humans Significantly Ahead

However, for more complex actions like trading, the results are much more varied. Current trading models operate based on human-defined inputs and produce outputs according to preset rules. Machine learning extends this by enabling models to update their behavior based on new information without explicit reprogramming, pushing them into a co-pilot role. With fully autonomous agents joining, the trading landscape will undergo significant change.

Several competitions have been held between agents and between humans and agents, revealing large performance differences. Trade XYZ hosted a human vs. agent trading contest for stocks listed on its platform. Each account started with $10k, with no leverage or trading frequency limits. The results overwhelmingly favored humans, with top human performance exceeding that of top agents by more than five times.

Meanwhile, Nof1 organized a model-to-model agent trading contest, pitting models like Grok-4, GPT-5, Deepseek, Kimi, Qwen3, Claude, and Gemini against each other, testing different risk configurations from capital preservation to maximum leverage. Several factors emerged that help explain performance differences:

Position Holding Time: Strong correlation was observed; models holding positions for an average of 2-3 hours significantly outperformed those flipping positions frequently.

Expected Value: This measures whether a model's average trade is profitable. Interestingly, only the top three models had positive expected value, indicating that for most models the losses on an average trade outweighed the gains.

Leverage: Models running lower leverage, averaging 6-8x, performed better than those running over 10x, as high leverage accelerates losses.

Prompt Strategies: Monk Mode was the best-performing prompt strategy so far, while Situational Awareness performed the worst. Judging by the features of each strategy, focusing on risk management and relying on fewer external sources tends to yield better results.

Base Models: Grok 4.20 outperformed the other models by over 22% across different prompt strategies and was the only model that was profitable on average.

Other factors such as long/short bias, trade size, and confidence scores lacked sufficient data or showed no positive correlation with performance. Overall, results suggest that agents tend to perform better within clearly defined constraints, indicating that human oversight remains crucial for goal setting.
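Two of the factors above, expected value and leverage, reduce to simple arithmetic. The sketch below uses a made-up profit-and-loss series (not contest data) to show that expected value depends on the size of wins and losses as well as their frequency. It also shows how large an adverse price move erases a position's margin at a given leverage, under the simplifying assumption that fees, funding, and maintenance margin are ignored.

```python
from statistics import mean


def expected_value(trade_pnls: list[float]) -> float:
    """Average profit or loss per trade."""
    if not trade_pnls:
        raise ValueError("at least one trade is required")
    return mean(trade_pnls)


def win_rate(trade_pnls: list[float]) -> float:
    """Fraction of trades that closed with a profit."""
    if not trade_pnls:
        raise ValueError("at least one trade is required")
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)


def wipeout_move(leverage: float) -> float:
    """Adverse price move, as a fraction, that erases the full margin.

    Simplified: ignores fees, funding, and maintenance margin.
    """
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    return 1.0 / leverage


# Hypothetical trade results in USD: three small wins and one large loss.
sample_pnls = [50.0, 40.0, 30.0, -200.0]
print(f"win rate: {win_rate(sample_pnls):.0%}")
print(f"expected value per trade: {expected_value(sample_pnls):+.2f} USD")

for lev in (6, 8, 10):
    print(f"{lev}x leverage: a {wipeout_move(lev):.1%} adverse move erases the margin")
```

In this illustrative series the model wins most of its trades and still loses money on average. The leverage figures show why higher leverage leaves less room for error: the price move needed to wipe out a position shrinks in proportion to the leverage.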

How to Evaluate Agents

Given that agents are still in the early stages, there is no comprehensive evaluation framework yet. Historical performance is often used as a benchmark, but it is shaped by underlying factors that provide stronger signals of agent quality:

Performance under different volatility conditions: This includes disciplined loss control during adverse conditions, which indicates that an agent can recognize off-chain factors affecting profitability.

Transparency and Privacy: Both have trade-offs. Transparent agents that can be actively copied generally lack strategic advantage. Private agents face risks of internal extraction by creators, who can easily front-run their users.

Sources of Information: The data sources accessed by agents are critical for decision-making. Ensuring sources are trustworthy and not overly reliant on a single source is vital.

Security: Smart contract audits and a proper custody architecture are essential, so that backup measures are in place during black swan events.

Next Steps for Agents

To enable large-scale adoption, much work remains on infrastructure. It comes down to key issues around trust in autonomous agents and how they execute. Autonomous agents operating without guardrails have already shown instances of poor fund management.

ERC-8004, launched in January 2026, became the first on-chain registry allowing autonomous agents to discover each other, establish verifiable reputation, and collaborate securely. This is a key unlock for DeFi composability, as trust scores embedded in smart contracts enable permissionless activity between agents and protocols. However, it does not guarantee agents always operate in good faith, as collusion, sybil attacks, and other security vulnerabilities remain possible. Therefore, there is still significant room for improvement in insurance, security, and economic staking of agents.

As agent activity expands in DeFi, strategy congestion becomes a structural risk. Yield farms are the clearest precedent: as strategies proliferate, returns compress. The same dynamic could apply to agent trading: if many agents train and optimize on similar data and goals, they will tend to take similar positions and act on similar exit signals.
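The yield-farm precedent can be illustrated with a deliberately simple model that is an assumption of this sketch, not something taken from the report: a strategy pays out a fixed annual reward pool, shared pro rata by all capital deployed into it. All dollar figures below are hypothetical.

```python
def pro_rata_yield(annual_rewards_usd: float, total_capital_usd: float) -> float:
    """Annual yield when a fixed reward pool is shared across all deployed capital."""
    if total_capital_usd <= 0:
        raise ValueError("total capital must be positive")
    return annual_rewards_usd / total_capital_usd


# Hypothetical inputs chosen only to show the shape of the curve.
ANNUAL_REWARDS_USD = 1_000_000
CAPITAL_PER_AGENT_USD = 500_000

yields = []
for agent_count in (1, 10, 100):
    total_capital = agent_count * CAPITAL_PER_AGENT_USD
    apy = pro_rata_yield(ANNUAL_REWARDS_USD, total_capital)
    yields.append(apy)
    print(f"{agent_count:>3} agents in the same strategy -> {apy:.0%} yield each")
```

In this toy model, every tenfold increase in agents following the same strategy cuts each agent's yield by a factor of ten. Real markets are not this mechanical, but the direction of the effect is the point of the congestion argument.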

The CoinAlg paper published by Cornell University in January 2026 formalizes this issue. Transparent agents can be arbitraged because their trades are predictable and can be front-run. Private agents avoid this risk but introduce different risks, such as creators retaining informational advantages over their users and extracting value through opacity.

Agent activity will only accelerate, and the infrastructure laid today will shape the next phase of on-chain finance. As agent usage increases, they will self-iterate and become more attuned to user preferences. Therefore, the key differentiator will be trustworthy infrastructure, which will capture the largest market share.


DWF In-Depth Report: AI Outperforms Humans in Yield Optimization in DeFi, but Still Trails in Complex Transactions by 5 Times
