By when will AIs perform at least as well as humans on GAIA?

AI · Technology · One-off · 9y

Manifold Markets · No KYC · Verified resolution data · Well calibrated

Data quality warning: stale data

Data as of 4 Jun 2026, 04:44 UTC · criterion pm-quality-3

Current community forecast: Before 2035-01-01, 97.1% (leading among 7 outcomes)

Question type: multiple choice

Methodology: play-money forecasting platform

Source type: forecast

Market data: updated 53 days ago (stale)

[Chart: probability trend, 21 Feb 2024, 4:36 to 2 Jan 2036, 7:59]

Outcomes:

Before 2024-06-01

Before 2025-01-01

Before 2027-01-01: 92%

Rules

The GAIA benchmark (https://arxiv.org/abs/2311.12983) aims to test the next level of capability for AI agents.

Quoting from the paper: "GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency.
GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins."
This market will resolve based on when an AI system performs as well as or better than humans on all three levels of the benchmark.
I'll use the human scores from Table 4 in the paper: 93.9% on level 1, 91.8% on level 2, and 87.3% on level 3.
(I'm requiring all three levels individually rather than the average, to be somewhat conservative about when this threshold is reached.)
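The difference between the conjunction rule and an average-based rule can be made concrete with a small check. This is a minimal sketch, not the market creator's resolution procedure: the function names are hypothetical, the averaging variant assumes a simple unweighted mean across the three levels, and the example AI scores are invented for illustration only.

```python
# Human baselines per GAIA level (percent), as quoted from Table 4 in the rules above.
HUMAN_BASELINE = {1: 93.9, 2: 91.8, 3: 87.3}


def meets_conjunction(ai_scores: dict[int, float]) -> bool:
    """True only if the AI matches or beats humans on every level (the rule used here)."""
    return all(ai_scores[level] >= human for level, human in HUMAN_BASELINE.items())


def meets_average(ai_scores: dict[int, float]) -> bool:
    """Looser alternative the rules reject; assumes an unweighted mean across levels."""
    ai_mean = sum(ai_scores[level] for level in HUMAN_BASELINE) / len(HUMAN_BASELINE)
    human_mean = sum(HUMAN_BASELINE.values()) / len(HUMAN_BASELINE)
    return ai_mean >= human_mean


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not a real leaderboard result.
    example = {1: 95.0, 2: 93.0, 3: 86.0}
    print(meets_average(example))      # True: mean of about 91.33 vs. human mean of 91.0
    print(meets_conjunction(example))  # False: level 3 is 86.0 < 87.3
```

In this invented example, strong results on levels 1 and 2 would lift the average above the human mean, while the conjunction still fails because level 3 falls short. That is the conservatism the rules describe.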



