CoinRithm

Company

Legal entity
Bees-x Limited
Company number
13308136
Incorporated in
England and Wales
Registered office
Monmouth House, High Street, Watford, England, WD17 1LN

CoinRithm is an information and research service of Bees-x Limited. It is not authorised by the Financial Conduct Authority (FCA) to carry out regulated activities, and nothing on this site constitutes financial advice.

By when will AIs perform at least as well as humans on GAIA?


AI · Tech · One-Off · 9y
Manifold Markets · No KYC

Current community forecast
Before 2024-06-01: 0%
Leading of 7 outcomes

Forecasters: 26

Question type: multiple choice

Methodology: Play-money forecasting platform

Source type: Forecast

Market data
Updated 7 days ago (stale)
21 Feb 24, 4:36 · 2 Jan 36, 7:59

Trending

Outcome · 24h · Chance
Before 2024-06-01: 0%
Before 2025-01-01: 0%

Selected outcome

Before 2027-01-01: 92%

Rules

The GAIA benchmark (https://arxiv.org/abs/2311.12983) aims to test for the next level of capability for AI agents.

Manifold Markets
  • Quoting from the paper: "GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins."
  • This market will resolve based on when an AI system performs as well as or better than humans on all 3 levels of the benchmark.
  • I'll use the numbers from Table 4 in the paper: 93.9% on level 1, 91.8% on level 2, and 87.3% on level 3.
  • (I'm using the conjunction of all 3 levels rather than the average to be somewhat conservative about this level being achieved.)
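
The resolution criterion above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the market creator's actual procedure: the function names and the example scores are hypothetical, and the "average" variant assumes an unweighted mean across the three levels, which the rules do not specify.

```python
# Human baselines per GAIA level, as quoted in the market rules (Table 4 of the paper).
HUMAN_BASELINE = {1: 93.9, 2: 91.8, 3: 87.3}


def meets_human_level(ai_scores: dict[int, float]) -> bool:
    """Conjunction rule: the AI must match or beat humans on every level."""
    return all(ai_scores[level] >= human for level, human in HUMAN_BASELINE.items())


def beats_human_average(ai_scores: dict[int, float]) -> bool:
    """Looser alternative (assumed unweighted mean), shown only for contrast."""
    human_mean = sum(HUMAN_BASELINE.values()) / len(HUMAN_BASELINE)
    ai_mean = sum(ai_scores[level] for level in HUMAN_BASELINE) / len(HUMAN_BASELINE)
    return ai_mean >= human_mean


# Hypothetical scores, invented for illustration only.
example = {1: 97.0, 2: 95.0, 3: 85.0}
print(meets_human_level(example))    # False: level 3 is below 87.3
print(beats_human_average(example))  # True: the mean is above the human mean
```

The example shows why the conjunction is the more conservative test: a system can exceed the human mean while still falling short on the hardest level.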

Related Markets

Which company has best AI model end of June?
€344.5K · Anthropic: 88% · Polymarket

Which company has top AI model end of June? (Style Control On)
€45.2K · Anthropic: 90% · Polymarket

Which company has best AI model end of July?
€21.7K · Anthropic: 81% · Polymarket

When will a non-SpaceX successfully reusable booster be first launched?
€6.1K · By Dec 31, 2025: 74% · Manifold Markets

GPT 5.6 released by…?
€1.1K · 11.59pm ET May 31 2026: 0% · Manifold Markets

By when will Google add ads to Gemini?
€627.7 · By Jan 1, 2026: 0% · Manifold Markets

Active in these topics

  • Bitcoin (BTC): $62,748.94, +1.97%
  • Ethereum (ETH): $1,654.07, +1.26%
  • Solana (SOL): $65.07, +1.17%
  • Dogecoin (DOGE): $0.0847, +1.10%
  • BNB (BNB): $596.08, +1.54%
  • XRP (XRP): $1.11, +0.04%

Related News

  • Anthropic launches Claude Fable 5 with new safeguards (Crypto News)
  • EU orders Meta to restore WhatsApp access for rival AI chatbots (Crypto News)
  • JPMorgan plans longer-running AI agents for corporate workflows (Crypto News)
  • OpenAI Files for IPO, Targets Valuation Up to $850B (Blockchain.News)
  • OpenAI confidentially files to go public in the US (Cointelegraph)
  • Nvidia expands South Korean AI partnerships across chips, cloud, and robotics (Crypto News)
