Economic Data, Structured for Machines: A Benchmark for Real AI Search

Today, we’re releasing a new performance benchmark that sets the standard for retrieving structured economic data across global indicators, labor statistics, and monetary policy. It’s one of the hardest retrieval problems in AI, and this benchmark proves it can be done right.
No major Search API offers a public benchmark for economic data retrieval because it’s uniquely hard. The data is scattered across government portals, versioned inconsistently, and buried in formats that resist easy extraction.
But this is exactly where retrieval matters most.
Economic data underpins everything from hedge fund models to policy decisions to automated macro research. If a system retrieves the wrong number, the wrong year, or the wrong adjustment, the entire reasoning chain breaks.
Benchmark Design
The Economic Data Benchmark consists of 100 questions, each modeled on how real analysts interact with structured economic datasets. It draws from three core sources:
- World Bank Indicators: Macroeconomic metrics like GDP, inflation, poverty, and development, commonly used in global policy and research.
- Bureau of Labor Statistics: U.S. labor data including employment, wages, productivity, and inflation, often embedded in static tables or PDFs.
- FRED: Time-series and monetary policy data such as interest rates, money supply, and macroeconomic trends, commonly visualized in dynamic charts.
We weighted these sources based on their relevance in actual workflows. World Bank data dominates global macro research, while BLS and FRED are indispensable for domestic and time-sensitive analysis.
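Two of these sources expose public REST APIs, which is how analysts typically pull the underlying series programmatically. As a rough sketch of what those requests look like (the endpoint shapes follow each provider's published documentation; the specific indicator and series IDs below are illustrative assumptions, not benchmark queries):

```python
# Sketch: building requests against the public World Bank and FRED APIs.
# Endpoint shapes follow each provider's docs; the indicator/series IDs
# below are illustrative assumptions, not queries from this benchmark.
from urllib.parse import urlencode

def world_bank_url(country: str, indicator: str, start: int, end: int) -> str:
    """World Bank Indicators API v2: one indicator for one country."""
    query = urlencode({"format": "json", "date": f"{start}:{end}"})
    return (
        f"https://api.worldbank.org/v2/country/{country}"
        f"/indicator/{indicator}?{query}"
    )

def fred_url(series_id: str, api_key: str) -> str:
    """FRED observations endpoint (requires a free API key)."""
    query = urlencode(
        {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    )
    return f"https://api.stlouisfed.org/fred/series/observations?{query}"

# NY.GDP.MKTP.CD = GDP in current US$; FEDFUNDS = effective federal funds rate.
print(world_bank_url("US", "NY.GDP.MKTP.CD", 2015, 2023))
print(fred_url("FEDFUNDS", "YOUR_API_KEY"))
```

BLS data, by contrast, often lives in static tables and PDFs, which is part of what makes retrieval from that source harder than a clean API call.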
Results

Economic evaluation, on queries spanning GDP, inflation, and employment data:
- Valyu: 73%
- Parallel: 52%
- Exa: 45%
- Google: 43%
Valyu stands alone at the top.
This benchmark was not built to test static recall; it reflects the real complexity of economic research. Valyu retrieved the correct value from the correct indicator, even when data was buried in nested portals, split across versions, or wrapped in ambiguous naming conventions.
It’s the first system to reliably resolve economic queries across global, domestic, and policy datasets and return content that models can actually use.
Why This Matters
Economic queries can break on small differences: a unit, a time range, or a data revision. The right answer often depends on the exact indicator variant and how recently the data was updated.
Most systems return something plausible. But plausible is not correct.
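To make the ambiguity concrete: two BLS CPI series can share the same plausible name and differ only in seasonal adjustment. A minimal sketch of resolving that ambiguity (the series IDs are real BLS CPI-U identifiers to the best of our knowledge, but the catalogue and matching logic are illustrative assumptions, not Valyu's implementation):

```python
# Sketch: why "US inflation" is ambiguous. CUSR0000SA0 and CUUR0000SA0 are
# both CPI-U, all items -- one seasonally adjusted, one not. The tiny
# catalogue and resolver here are illustrative assumptions only.

CPI_VARIANTS = {
    "CUSR0000SA0": {"name": "CPI-U, all items", "seasonally_adjusted": True},
    "CUUR0000SA0": {"name": "CPI-U, all items", "seasonally_adjusted": False},
}

def resolve(name: str, seasonally_adjusted: bool) -> str:
    """Pick the one series ID matching both the name and the adjustment flag."""
    matches = [
        sid
        for sid, meta in CPI_VARIANTS.items()
        if meta["name"] == name
        and meta["seasonally_adjusted"] == seasonally_adjusted
    ]
    if len(matches) != 1:
        raise LookupError(f"ambiguous or missing series for {name!r}")
    return matches[0]

# Same name, two different series: only the adjustment flag tells them
# apart, and mixing them silently skews any downstream analysis.
print(resolve("CPI-U, all items", seasonally_adjusted=True))   # CUSR0000SA0
print(resolve("CPI-U, all items", seasonally_adjusted=False))  # CUUR0000SA0
```

A retrieval system that returns either series for "CPI" will look right most of the time, which is exactly the failure mode described above.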
These errors do not trigger warnings. They surface later as hallucinated outputs, broken reasoning chains, or flawed conclusions. By the time the failure is visible, it is no longer traceable.
The benchmark results show that Valyu retrieves the correct value, from the correct source, under real-world conditions. It handles the ambiguity other systems miss and gives builders a search layer they can trust when the numbers actually matter.
Try It Yourself
Don’t take our word for it: test it yourself.