CoinRithm

Company

Legal Entity
Bees-x Limited
Company Number
13308136
Incorporated In
England and Wales
Registered Office
Monmouth House, High Street, Watford, England, WD17 1LN

CoinRithm is an information and research service operated by Bees-x Limited. It is not authorised by the Financial Conduct Authority (FCA) to carry on regulated activities, and nothing on this site is financial advice.

By when will AIs perform at least as well as humans on GAIA?


AI, Tech, One-Off, 9y
Manifold Markets, No KYC
Current community forecast
Before 2024-06-01: 0%
Leader of 7 outcomes
Forecasters: 26
Question type: multiple choice
Methodology: Play-money forecasting platform
Source type: Forecast
Market data: Updated 7 days ago (Stale)
Feb 21, 24, 4:36 AM to Jan 2, 36, 7:59 AM

Trends

Outcome | 24h | Chance
Before 2024-06-01 | | 0%
Before 2025-01-01 | | 0%

Selected outcome

Before 2027-01-01: 92%

Rules

The GAIA benchmark (https://arxiv.org/abs/2311.12983) aims to test for the next level of capability for AI agents.

  • Quoting from the paper: "GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency.
  • GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins."
  • This market will resolve based on when an AI system performs as well as or better than humans on all 3 of the different levels of the benchmark.
  • I'll use the numbers from Table 4 in the paper: 93.9% on level 1, 91.8% on level 2, and 87.3% on level 3.
  • (I'm using the conjunction of all 3 levels rather than the average to be somewhat conservative about this level being achieved.)
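
The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the conjunction criterion, not the market creator's actual procedure: the function names and the example AI scores are hypothetical, and the unweighted average is an assumption made here for comparison (the levels may contain different numbers of questions).

```python
# Human baselines quoted in the market rules (Table 4 of the GAIA paper), in percent.
HUMAN_BASELINE = {1: 93.9, 2: 91.8, 3: 87.3}


def meets_human_baseline(ai_scores):
    """Conjunction rule: the AI must match or beat humans on every level."""
    return all(ai_scores[level] >= human for level, human in HUMAN_BASELINE.items())


def unweighted_mean(scores):
    """Simple mean across levels (an assumption; ignores per-level question counts)."""
    return sum(scores.values()) / len(scores)


# Hypothetical AI scores, invented purely for illustration.
example_ai = {1: 97.0, 2: 95.0, 3: 85.0}

# The example beats the human average but still fails the conjunction,
# because it falls short on level 3.
print(unweighted_mean(example_ai) > unweighted_mean(HUMAN_BASELINE))
print(meets_human_baseline(example_ai))
```

The example shows why the conjunction is the more conservative choice: strong level 1 and level 2 results cannot compensate for a shortfall on level 3.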

Related Markets

  • Which company has best AI model end of June? $397.4K. Anthropic: 87% (Polymarket)
  • Which company has top AI model end of June? (Style Control On) $52.1K. Anthropic: 89% (Polymarket)
  • Which company has second best AI model end of June? $8.9K. Anthropic: 86% (Polymarket)
  • When will a non-SpaceX successfully reusable booster be first launched? $7K. By Dec 31, 2025: 74% (Manifold Markets)
  • By when will Google add ads to Gemini? $724.5. By Jan 1, 2026: 0% (Manifold Markets)
  • Will OpenAI or Google announce an release a reduction to image generation filters around nudity by EOY 2027? $612. Yes: 38.3% (Manifold Markets)


Related News

  • Anthropic launches Claude Fable 5 with new safeguards (Crypto News)
  • EU orders Meta to restore WhatsApp access for rival AI chatbots (Crypto News)
  • JPMorgan plans longer-running AI agents for corporate workflows (Crypto News)
  • OpenAI Files for IPO, Targets Valuation Up to $850B (Blockchain.News)
  • OpenAI confidentially files to go public in the US (Cointelegraph)
  • Nvidia expands South Korean AI partnerships across chips, cloud, and robotics (Crypto News)
