
State-of-the-art performance across several benchmarks

Benchmark Details
Valyu's Search API is evaluated against leading search providers across five independent benchmarks spanning real-time information, factual accuracy, economics, finance, and medical reasoning.
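The comparison described above can be sketched as a simple harness: run every provider over the same question set and report the fraction answered correctly. This is an illustrative assumption about the setup, not Valyu's actual evaluation code — `search_answer` is a stand-in for each provider's real API client, and the exact-match grading rule is a placeholder.

```python
from typing import Callable

Question = tuple[str, str]  # (query, gold answer)

def run_benchmark(providers: dict[str, Callable[[str], str]],
                  questions: list[Question]) -> dict[str, float]:
    """Score each provider as the fraction of questions answered correctly."""
    scores = {}
    for name, search_answer in providers.items():
        correct = sum(search_answer(query).strip().lower() == gold.lower()
                      for query, gold in questions)
        scores[name] = correct / len(questions)
    return scores

# Toy stand-ins instead of real API calls:
questions = [("capital of France?", "Paris"), ("year of Apollo 11?", "1969")]
providers = {"always-paris": lambda q: "Paris"}
print(run_benchmark(providers, questions))  # → {'always-paris': 0.5}
```

In practice, benchmarks like these typically grade free-form answers with an LLM judge rather than exact string matching, but the accuracy metric is computed the same way.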
FreshQA
Real-Time News & Events
About
FreshQA is updated roughly every week and contains 600 time-sensitive questions that assess how effectively APIs handle recent and evolving information. The benchmark is critical for use cases such as news summarization, event tracking, and trend monitoring. Our evaluation used the Aug 27th test set, the most recent release available at the time of testing.
Evaluation
FreshQA was evaluated by comparing four search solutions (Valyu, Parallel, Google, Exa) on time-sensitive queries. Valyu achieved the highest accuracy at 79.0%, well ahead of Parallel (52%), Google (39%), and Exa (24%).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 79% |
| Parallel | 52% |
| Google | 39% |
| Exa | 24% |
SimpleQA
Factual Accuracy
About
SimpleQA, created by OpenAI, includes 4,326 factual questions designed to measure retrieval precision on straightforward, unambiguous queries. The benchmark serves as the baseline for evaluating general-purpose search quality and factual accuracy, covering diverse topics from science and technology to TV shows and video games.
Evaluation
Valyu achieved 94% accuracy on SimpleQA, narrowly ahead of Parallel (93%) and Exa (91%).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 94% |
| Parallel | 93% |
| Exa | 91% |
| Google | 38% |
Economics
Analyst-Style Economic Queries
About
The Economics Benchmark evaluates search performance through 100 of the hardest analyst-style questions curated from authoritative economic data sources. The benchmark draws from World Bank Indicators, Bureau of Labor Statistics (BLS), and Federal Reserve Economic Data (FRED), covering GDP, inflation, employment, productivity, and monetary policy to simulate authentic economic research workflows used in policy analysis and macroeconomic forecasting.
Distribution
The 100 questions are sourced from: World Bank Indicators (50 questions), BLS (25 questions), and FRED (25 questions), representing the most challenging queries from each dataset.
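Given the uneven split above, the overall score is a question-count-weighted average of per-source accuracy. Only the 50/25/25 distribution comes from the benchmark description; the per-source accuracies below are made-up placeholders for illustration.

```python
# Question counts per source (from the benchmark description).
counts = {"World Bank": 50, "BLS": 25, "FRED": 25}
# Hypothetical per-source accuracies, for illustration only.
acc = {"World Bank": 0.70, "BLS": 0.60, "FRED": 0.50}

# Overall accuracy = total correct answers / total questions.
overall = sum(counts[s] * acc[s] for s in counts) / sum(counts.values())
print(f"{overall:.1%}")  # → 62.5%
```

Because World Bank questions make up half the set, performance on that source dominates the overall figure.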
Evaluation
Valyu achieved 72.6% overall accuracy, substantially outperforming Parallel (52%), Exa (44.7%), and Google (42.7%) on complex economic queries.
Results
| Provider | Accuracy |
|---|---|
| Valyu | 73% |
| Parallel | 52% |
| Exa | 45% |
| Google | 43% |
Finance
Financial Information Retrieval
About
The Financial Benchmark assesses API performance through 120 queries distributed across seven datasets covering critical financial domains. The benchmark simulates real-world financial information retrieval scenarios across regulatory, academic, and market-specific contexts.
Distribution
The 120 queries span seven financial datasets: SEC filings, references, financial news and reports, stocks, forex, insider transactions, and crypto.
Evaluation
Valyu achieved 72.5% overall accuracy on financial queries, ahead of Parallel (67%), Exa (62.5%), and Google (55%).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 73% |
| Parallel | 67% |
| Exa | 63% |
| Google | 55% |
MedAgent
Medical Reasoning & Clinical Knowledge
About
MedAgent is a collection of the most difficult questions curated from 10 leading medical benchmarks. The benchmark comprises 482 challenging questions spanning clinical knowledge, drug information, and medical reasoning, designed to test the limits of medical AI capabilities. It includes specialized datasets from Clinical Trials and Drug Labels alongside questions from established medical benchmarks including MMLU, MEDQA, and others.
Distribution
The 482 questions are sourced from: MMLU (50), MEDQA (50), MMLU-PRO (50), MEDEXQA (50), PUBMEDQA (50), MEDMCQA (50), MEDBULLETS (50), AFRIMEDQA (32), MEDXPERTQA-U (50), MEDXPERTQA-R (50).
Evaluation
MedAgent was evaluated by comparing four search solutions (Valyu, Google, Exa, Parallel). Valyu achieved the highest overall accuracy at 48.1%, with particularly strong results on the Clinical Trials and Drug Labels datasets (87.5% each).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 48% |
| Google | 45% |
| Exa | 44% |
| Parallel | 42% |