By when will AIs perform at least as well as humans on GAIA?

AI · Technology · One-Off · 9y

Manifold Markets · No KYC · Verified resolution data · Well calibrated

Data quality warning: stale data

Data as of 4 June 2026, 04:44 UTC · Policy pm-quality-3

Current community forecast

Before 2035-01-01 97.1%

Leading among 7 options

Forecasters

Question type

multiple choice

Methodology

Play-money forecasting platform

Source type

Forecast

Market data

Updated 53 days ago

Stale

21 Feb 2024, 4:36 to 2 Jan 2036, 7:59

Trends

Outcome · 24h · Probability

Before 2024-06-01

Before 2025-01-01

Virtual credit only, no real money · Not financial advice

Selected outcome

Before 2027-01-01 92%

Rules

The GAIA benchmark (https://arxiv.org/abs/2311.12983) aims to test for the next level of capability for AI agents.

Quoting from the paper: "GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency.
GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins."
This market will resolve based on when an AI system performs as well as or better than humans on all 3 levels of the benchmark.
I'll use the numbers from Table 4 in the paper: 93.9% on level 1, 91.8% on level 2, and 87.3% on level 3.
(I'm using the conjunction of all 3 levels rather than the average to be somewhat conservative about this level being achieved.)
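
The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the stated criterion, not the market creator's actual procedure: the function names and the sample AI scores are hypothetical, and the "average" variant uses a plain unweighted mean of the three levels, which is an assumption made here for contrast.

```python
# Human baselines per GAIA level, as quoted in the rules (Table 4 of the paper).
HUMAN_BASELINE = {1: 93.9, 2: 91.8, 3: 87.3}


def meets_human_level(ai_scores: dict) -> bool:
    """Conjunction rule: the AI must match or beat humans on every level.

    A missing level counts as a failure.
    """
    return all(
        level in ai_scores and ai_scores[level] >= baseline
        for level, baseline in HUMAN_BASELINE.items()
    )


def average_meets_human_level(ai_scores: dict) -> bool:
    """Looser alternative that the rules reject: compare unweighted means.

    Assumption: a plain mean over the three levels, not weighted by the
    number of questions per level.
    """
    ai_mean = sum(ai_scores[level] for level in HUMAN_BASELINE) / len(HUMAN_BASELINE)
    human_mean = sum(HUMAN_BASELINE.values()) / len(HUMAN_BASELINE)
    return ai_mean >= human_mean


if __name__ == "__main__":
    # Hypothetical scores: strong on levels 1 and 2, below humans on level 3.
    sample = {1: 97.0, 2: 93.0, 3: 85.0}
    print("conjunction rule:", meets_human_level(sample))
    print("average rule:", average_meets_human_level(sample))
```

With the hypothetical scores shown, the average rule would be satisfied while the conjunction rule would not, which is the sense in which the conjunction is the more conservative choice.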

Related markets

Will Anthropic’s valuation hit __ by December 31?

€59,635

↑$1.1T: 100%

POLYMARKET

Which company has best AI model end of July?

€28,359.6

Anthropic: 99%

POLYMARKET

Will any AI model reach ___ Overall Arena Score by September 30?

€11,909.3

1510: 100%

POLYMARKET

When will a non-SpaceX successfully reusable booster be first launched?

€6,178.1

By Dec 31, 2025: 74%

MANIFOLD MARKETS

When will any company achieve AGI?

€2,422.7

Before Oct 1, 2027: 37%

KALSHI

When will Google release Gemini 3.5 Pro?

€2,022.7

Before Jul 31, 2026: 3%

KALSHI

Related news

Coinbase Opens Payment Rails for AI Agents as Corporate Clients Accept Autonomous Transactions · Blockchain Reporter

Google Ships New Gemini Flash Models, But Pro Is Still Missing · Decrypt

Google Is Building an AI Chip Just for Gemini—And Investors Already Moved On It · Decrypt

WhiteBIT Launches AI Hub: Trade, Monitor and Automate Through Your Favourite AI Assistant · Blockchain Reporter

Over 95% of Coinbase’s code is now written with AI · Cointelegraph

DeepSeek plots $71B IPO to challenge OpenAI in global AI race · Crypto News
