By when will AIs perform at least as well as humans on GAIA?

AI · Technology · One-Off · 9y

Manifold Markets · No KYC · Verified resolution data · Well calibrated

Data quality warning: stale data

Data as of Jun 4, 2026, 04:44 UTC · policy pm-quality-3

Current community forecast

Before 2035-01-01 97.1%

Leader among 7 options

Forecasters

Question type

multiple choice

Methodology

Play-money forecasting platform

Source type

Forecast

Market data

Updated 54 days ago

Stale

21/02/24, 4:36 to 2/01/36, 7:59

Trends

Outcome · 24h · Probability

Before 2024-06-01

Before 2025-01-01

Simulated funds only, no real money. This is not financial advice.

Selected outcome

Before 2027-01-01 92%

Rules

The GAIA benchmark (https://arxiv.org/abs/2311.12983) aims to test for the next level of capability for AI agents.

Quoting from the paper: "GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency.
GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins."
This market will resolve based on when an AI system performs as well as or better than humans on all 3 levels of the benchmark.
I'll use the human scores from Table 4 in the paper: 93.9% on level 1, 91.8% on level 2, and 87.3% on level 3.
(I'm using the conjunction of all 3 levels rather than the average to be somewhat conservative about this level being achieved.)
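
The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the "all 3 levels" check, not the market creator's actual procedure: the function names and the example AI scores are hypothetical, and the unweighted mean used for the comparison is an assumption (the rules do not say how an average would have been computed).

```python
# Human scores per level, as quoted in the market rules (Table 4 of the GAIA paper).
HUMAN_BASELINE = {1: 93.9, 2: 91.8, 3: 87.3}


def meets_human_baseline(ai_scores: dict[int, float]) -> bool:
    """Conjunction rule: the AI must match or beat humans on every level.

    A missing level counts as a score of 0.0, so it fails the check.
    """
    return all(
        ai_scores.get(level, 0.0) >= human
        for level, human in HUMAN_BASELINE.items()
    )


def mean(values) -> float:
    """Unweighted mean; an assumed stand-in for 'the average' in the rules."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical AI scores, invented purely for illustration.
example = {1: 96.0, 2: 93.0, 3: 85.0}

# Conjunction rule used by this market: fails, because level 3 is below 87.3.
print(meets_human_baseline(example))

# Average rule (NOT used by this market): passes for the same scores.
print(mean(example.values()) >= mean(HUMAN_BASELINE.values()))
```

With these made-up scores the average rule would already be satisfied while the conjunction rule is not, which is the sense in which the conjunction is the more conservative criterion.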
