
State-of-the-art performance across several benchmarks

Benchmark Details
Valyu's Search API is evaluated against leading search providers across five independent benchmarks spanning real-time information, factual accuracy, economics, finance, and medical reasoning.
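The comparison described above can be sketched as a simple harness: run every provider over the same question set and report the fraction answered correctly. This is an illustrative assumption about the setup, not Valyu's actual evaluation code — `search_answer` is a stand-in for each provider's real API client, and the exact-match grading rule is a placeholder.

```python
from typing import Callable

Question = tuple[str, str]  # (query, gold answer)

def run_benchmark(providers: dict[str, Callable[[str], str]],
                  questions: list[Question]) -> dict[str, float]:
    """Score each provider as the fraction of questions answered correctly."""
    scores = {}
    for name, search_answer in providers.items():
        correct = sum(search_answer(query).strip().lower() == gold.lower()
                      for query, gold in questions)
        scores[name] = correct / len(questions)
    return scores

# Toy stand-ins instead of real API calls:
questions = [("capital of France?", "Paris"), ("year of Apollo 11?", "1969")]
providers = {"always-paris": lambda q: "Paris"}
print(run_benchmark(providers, questions))  # → {'always-paris': 0.5}
```

In practice, benchmarks like these typically grade free-form answers with an LLM judge rather than exact string matching, but the accuracy metric is computed the same way.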
FreshQA
Real-Time News & Events
About
FreshQA is updated roughly every week and contains 600 time-sensitive questions that assess how effectively APIs handle recent and evolving information. The benchmark is critical for use cases such as news summarization, event tracking, and trend monitoring. Our evaluation used the Aug 27th test set, the most recent release available at the time of testing.
Evaluation
FreshQA was evaluated by comparing four search solutions (Valyu, Parallel, Google, Exa) on time-sensitive queries. Valyu achieved the highest accuracy at 79.0%, well ahead of Parallel (52%), Google (39%), and Exa (24%).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 79% |
| Parallel | 52% |
| Google | 39% |
| Exa | 24% |
SimpleQA
Factual Accuracy
About
SimpleQA, created by OpenAI, includes 4,326 factual questions designed to measure retrieval precision on straightforward, unambiguous queries. The benchmark serves as the baseline for evaluating general-purpose search quality and factual accuracy, covering diverse topics from science and technology to TV shows and video games.
Evaluation
Valyu achieved 94% accuracy on SimpleQA, narrowly ahead of Parallel (93%) and Exa (91%).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 94% |
| Parallel | 93% |
| Exa | 91% |
| Google | 38% |
Economics
Analyst-Style Economic Queries
About
The Economics Benchmark evaluates search performance through 100 of the hardest analyst-style questions curated from authoritative economic data sources. The benchmark draws from World Bank Indicators, Bureau of Labor Statistics (BLS), and Federal Reserve Economic Data (FRED), covering GDP, inflation, employment, productivity, and monetary policy to simulate authentic economic research workflows used in policy analysis and macroeconomic forecasting.
Distribution
The 100 questions are sourced from: World Bank Indicators (50 questions), BLS (25 questions), and FRED (25 questions), representing the most challenging queries from each dataset.
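Given the uneven split above, the overall score is a question-count-weighted average of per-source accuracy. Only the 50/25/25 distribution comes from the benchmark description; the per-source accuracies below are made-up placeholders for illustration.

```python
# Question counts per source (from the benchmark description).
counts = {"World Bank": 50, "BLS": 25, "FRED": 25}
# Hypothetical per-source accuracies, for illustration only.
acc = {"World Bank": 0.70, "BLS": 0.60, "FRED": 0.50}

# Overall accuracy = total correct answers / total questions.
overall = sum(counts[s] * acc[s] for s in counts) / sum(counts.values())
print(f"{overall:.1%}")  # → 62.5%
```

Because World Bank questions make up half the set, performance on that source dominates the overall figure.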
Evaluation
Valyu achieved 72.6% overall accuracy, substantially outperforming Parallel (52%), Exa (44.7%), and Google (42.7%) on complex economic queries.
Results
| Provider | Accuracy |
|---|---|
| Valyu | 73% |
| Parallel | 52% |
| Exa | 45% |
| Google | 43% |
Finance
Financial Information Retrieval
About
The Financial Benchmark assesses API performance through 120 queries distributed across seven datasets covering critical financial domains. The benchmark simulates real-world financial information retrieval scenarios across regulatory, academic, and market-specific contexts.
Distribution
The 120 queries span seven financial datasets: SEC filings, references, financial news and reports, stocks, forex, insider transactions, and crypto.
Evaluation
Valyu achieved 72.5% overall accuracy on financial queries, ahead of Parallel (67%), Exa (62.5%), and Google (55%).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 73% |
| Parallel | 67% |
| Exa | 63% |
| Google | 55% |
MedAgent
Medical Reasoning & Clinical Knowledge
About
MedAgent is a collection of the most difficult questions curated from 10 leading medical benchmarks. The benchmark comprises 482 challenging questions spanning clinical knowledge, drug information, and medical reasoning, designed to test the limits of medical AI capabilities. It includes specialized datasets from Clinical Trials and Drug Labels alongside questions from established medical benchmarks including MMLU, MEDQA, and others.
Distribution
The 482 questions are sourced from: MMLU (50), MEDQA (50), MMLU-PRO (50), MEDEXQA (50), PUBMEDQA (50), MEDMCQA (50), MEDBULLETS (50), AFRIMEDQA (32), MEDXPERTQA-U (50), MEDXPERTQA-R (50).
Evaluation
MedAgent was evaluated by comparing four search solutions (Valyu, Google, Exa, Parallel). Valyu achieved the highest overall accuracy at 48.1%, with particularly strong results on the Clinical Trials and Drug Labels datasets (87.5% each).
Results
| Provider | Accuracy |
|---|---|
| Valyu | 48% |
| Google | 45% |
| Exa | 44% |
| Parallel | 42% |