By when will AIs perform at least as well as humans on GAIA?

AI Tech · One-off · 9y

Manifold Markets · No KYC · Verified resolution data · Well calibrated

Data quality warning: stale data

Data as of 4 Jun 2026, 04:44 UTC · policy pm-quality-3

Current community forecast: Before 2035-01-01, 97.1% (leading outcome of 7)

Question type: multiple choice
Methodology: play-money forecasting platform
Source type: forecast
Market data: updated 53 days ago (stale)

Price history chart: 21 Feb 2024, 4:36 to 2 Jan 2036, 7:59

Trends (outcome · 24h change · probability)

Before 2024-06-01

Before 2025-01-01

Virtual money only, no real money · Not financial advice

Selected outcome: Before 2027-01-01, 92%

Rules

The GAIA benchmark (https://arxiv.org/abs/2311.12983) aims to test for the next level of capability for AI agents.

Quoting from the paper: "GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency.
GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins."
This market will resolve based on when an AI system performs as well as or better than humans on all 3 levels of the benchmark.
I'll use the numbers from Table 4 in the paper: 93.9% on level 1, 91.8% on level 2, and 87.3% on level 3.
(I'm using the conjunction of all 3 levels rather than the average to be somewhat conservative about when this bar counts as reached.)
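To make the resolution rule concrete, here is a minimal Python sketch (not part of the original rules) that checks a set of per-level scores against the human baselines quoted above. The function names and the example scores are hypothetical. The "average" rule is included only for comparison; it uses a simple unweighted mean of the three level scores as an assumption (this is not the paper's headline 92% figure) and is not how this market resolves.

```python
# Human baselines per GAIA level (percent), as quoted in the rules from Table 4.
HUMAN_BASELINE = {1: 93.9, 2: 91.8, 3: 87.3}


def meets_conjunction(scores: dict[int, float]) -> bool:
    """Rule used by this market: the AI must match or beat humans on every level."""
    return all(scores[level] >= HUMAN_BASELINE[level] for level in HUMAN_BASELINE)


def meets_average(scores: dict[int, float]) -> bool:
    """Hypothetical alternative rule (NOT used here): compare unweighted means."""
    ai_mean = sum(scores[level] for level in HUMAN_BASELINE) / len(HUMAN_BASELINE)
    human_mean = sum(HUMAN_BASELINE.values()) / len(HUMAN_BASELINE)  # about 91.0
    return ai_mean >= human_mean


# Made-up scores for illustration only: strong on levels 1-2, short on level 3.
hypothetical = {1: 95.0, 2: 93.0, 3: 86.0}
print(meets_conjunction(hypothetical))  # False: level 3 is below 87.3
print(meets_average(hypothetical))      # True: mean 91.33 vs. about 91.0
```

The example shows why the conjunction is the more conservative choice: a strong result on the easier levels can lift an average past the human mean while level 3 still falls short.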

Related markets

Will Anthropic’s valuation hit __ by December 31? · €59.6K · ↑$1.1T: 100% · Polymarket

Which company has best AI model end of July? · €28.4K · Anthropic: 99% · Polymarket

Will any AI model reach ___ Overall Arena Score by September 30? · €11.9K · 1510: 100% · Polymarket

When will a non-SpaceX successfully reusable booster be first launched? · €6.2K · By Dec 31, 2025: 74% · Manifold Markets

When will any company achieve AGI? · €2.4K · Before Oct 1, 2027: 37% · Kalshi

When will Google release Gemini 3.5 Pro? · €2K · Before Jul 31, 2026: 3% · Kalshi


Related news

Coinbase Opens Payment Rails for AI Agents as Corporate Clients Accept Autonomous Transactions (Blockchain Reporter)

Google Ships New Gemini Flash Models, But Pro Is Still Missing (Decrypt)

Google Is Building an AI Chip Just for Gemini—And Investors Already Moved On It (Decrypt)

WhiteBIT Launches AI Hub: Trade, Monitor and Automate Through Your Favourite AI Assistant (Blockchain Reporter)

Over 95% of Coinbase’s code is now written with AI (Cointelegraph)

DeepSeek plots $71B IPO to challenge OpenAI in global AI race (Crypto News)
