The Great Agent Hack 2025: Agent Performance, Reliability, and Valyu-Powered Retrieval
The Great Agent Hack 2025 ran last weekend at UCL East, pulling in more than 200 builders, researchers, and engineers to see how far agentic systems can be pushed under real-world constraints. Over two days, teams tackled the kinds of problems agents break on: performance under load, behavioural visibility, and explainability in messy environments.
Why This Hackathon Existed
Agentic AI is starting to move into real products, but anyone who has actually tried to scale an agent workflow into production runs into the same three failure modes:
- Retrieval instability: agents fall apart when search and data access are slow, inconsistent, or return malformed data.
- Opaque execution paths: once you lose visibility into an agent’s steps, it becomes almost impossible to debug or trust its output.
- Safety drift: if you can’t trace why an agent chose an action, you can’t control or audit it.
Holistic AI designed the Great Agent Hack to force teams to confront these realities head-on. Every track demanded systems that could reason in the wild. (We made a short promo video for each track.)
Track A: Agent Iron Man
Production endurance: latency, cost-efficiency, reliability under load, unstable tool calls.
Track B: Agent Glass Box
Visibility and interpretability: traces, planning transparency, runtime introspection.
Track C: Dear Grandma
Explainability: turning complex agent behaviour into something legible for non-technical users.
Why Valyu Participated
We participated because almost every agent problem begins as a data problem.
An agent is just a loop executing against whatever inputs it’s given. When those inputs are inconsistent, the loop breaks. In practice, this happens because:
- Search results aren’t reliable or don’t match the actual query intent
- Latency fluctuates and the loop stalls or times out mid-sequence
- Returned context lacks provenance or citations, so the agent can’t verify anything
- Upstream APIs rate-limit, throttle, or return loosely structured output that the agent can’t parse
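The defensive pattern against these failure modes is to wrap every retrieval call in a guard that retries transient errors and validates the result shape before the agent loop consumes it. Here is a minimal sketch; `search_fn`, `RetrievalError`, and the `required_keys` fields are all illustrative stand-ins, not any specific API:

```python
import time

class RetrievalError(Exception):
    """Raised when retrieval fails or returns malformed results."""

def guarded_search(search_fn, query, retries=3, base_delay=0.1,
                   required_keys=("url", "snippet")):
    """Call a search backend with retries, exponential backoff, and
    shape validation. `search_fn` is a stand-in for whatever retrieval
    API the agent uses; it should return a list of dicts."""
    last_err = None
    for attempt in range(retries):
        try:
            results = search_fn(query)
            # Malformed data should fail loudly here, not mid-plan.
            for r in results:
                if not all(k in r for k in required_keys):
                    raise RetrievalError(f"missing fields in result: {r}")
            return results
        except Exception as e:
            last_err = e
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RetrievalError(f"search failed after {retries} attempts") from last_err
```

The point of validating inside the guard is that a schema error surfaces at the retrieval boundary, where it can be retried, rather than several planning steps later where it is untraceable.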
We supported the hackathon to give teams access to our DeepSearch API, built specifically for agent workflows. Across the weekend, builders used Valyu to power:
- Reliable, AI-native web search that held up under heavy load and multi-step planning
- Structured retrieval for agentic RAG, where agents pulled clean, citation-backed context across multiple queries
- Fast fact-checking loops that needed low-latency responses with stable output schemas
- Multi-step reasoning pipelines where agents iterated on the same query space without drifting or hallucinating
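To make the fact-checking pattern concrete, here is a toy version of the loop: retrieve snippets for a claim, then report which sources mention its key terms. This is only a sketch of the loop's shape under assumed names (`search_fn` and its result fields are hypothetical); a real pipeline would use an LLM judge rather than keyword matching:

```python
def fact_check(claim, search_fn, keywords=None):
    """Toy fact-check step: retrieve snippets for a claim and report
    which sources mention its key terms, keeping provenance so the
    agent can cite its evidence."""
    keywords = keywords or claim.lower().split()
    supporting = []
    for r in search_fn(claim):
        text = r["snippet"].lower()
        if any(k in text for k in keywords):
            supporting.append(r["url"])
    return {"claim": claim, "supported_by": supporting}
```

The stable output schema (`claim` plus `supported_by`) is what lets the surrounding agent loop branch reliably on the result instead of re-parsing free text.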
Who Won, And Why It Mattered
There were 51 submissions across the weekend, a genuinely impressive spread of ideas, experiments, and working prototypes. We could spend pages breaking them all down, but to keep things tight we’re focusing on the major winners: the teams whose builds pointed most clearly to where agentic AI is heading next.
Grand Champion: Team Jailbreak Labs
Track: Agent Iron Man / Performance
A complete AI security intelligence platform
Why they won:
Their system handled complex chained tasks under strict latency, cost, and reliability constraints, and did it with stability you’d actually deploy. They were one of the few teams to treat the agent as a production system rather than a demo, and their approach reflected that: robust retries, smart fallback logic, and retrieval that scaled cleanly under load. They also used Valyu effectively inside a long-running agent loop.
Most Valyu Award: Team Illusion
Illusion is a multi-agent auditing system that makes company privacy, data-collection, and data-usage policies transparent.
Why they won:
They built an agent that depended heavily on structured search, iterative retrieval, and stable outputs: exactly the pattern our customers use in production. Their integration showcased why “RAG for agents” requires many smaller hits of high-quality data rather than one giant fetch.
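The "many small hits" pattern can be sketched as follows: instead of one broad fetch, the agent issues several targeted queries and keeps provenance alongside each snippet so later steps can cite it. `search_fn` and its result fields are illustrative assumptions, not a specific API:

```python
def iterative_retrieve(search_fn, questions, per_query_limit=3):
    """Gather context as several small, targeted fetches rather than
    one large one. Each snippet keeps its originating question and
    source URL so downstream steps can verify and cite it."""
    context = []
    for q in questions:
        for r in search_fn(q)[:per_query_limit]:
            context.append({"question": q,
                            "source": r["url"],
                            "text": r["snippet"]})
    return context
```

Capping results per query keeps each hit small and high-signal, which is what stops the context window from filling with one query's noise.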
Track Winners: Quick Snapshot
- Project Emotions (Track A): An agent that analyses non-verbal language and assigns meaning.
- Zarks AI (Track B): Evaluating robustness of black-box LLMs using the power of embeddings.
- HSIA (Track C): Exposing the critical threat of Harmful Semantic Visual Injection (HSVI) that tricks VLA robots into misinterpreting their environment and objective.
You get the idea: people weren’t building toy apps; they were building early versions of systems that might actually run in production.
A huge thank you to everyone who showed up, built, iterated, broke things, and pushed agent workflows further than we’ve seen before. The energy in the room made it clear: this space is moving fast, and London’s builders are a big part of that.
We’re already looking forward to seeing you all again next year.
You can try out the DeepSearch API that participants were building with here.