
The Problem
AI is transforming knowledge work. But models need context to reason well.
The authoritative data that professionals rely on lives behind paywalls, proprietary formats, and interfaces built for human experts. Academic literature. Financial filings. Clinical evidence. Patent records. Specialized databases. All fragmented. All inaccessible to AI systems.
We're building infrastructure to solve this. A unified context layer connecting AI agents to authoritative sources across domains. One API. Natural language. Every source an agent needs to do real knowledge work.
ChEMBL is the latest addition.
Why ChEMBL Matters
ChEMBL is the gold standard for drug discovery data. Maintained by the European Bioinformatics Institute (EMBL-EBI), it contains two decades of curated bioactivity measurements extracted from primary literature. It is a standard starting point for computational chemistry work.
Accessing ChEMBL programmatically requires understanding database schemas, SMILES notation, target ontologies, and activity type hierarchies. Want to find kinase inhibitors with sub-nanomolar potency? You need to know that IC50 values are stored in standard units, understand binding assays versus functional assays, and navigate the entity relationships between molecules, assays, and targets.
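To make the barrier concrete, here is a rough sketch of what the kinase question looks like when asked of ChEMBL's public REST API directly. It only builds the query URL and sends nothing. The filter names (`standard_type`, `standard_units`, `standard_value__lt`, `assay_type`) follow ChEMBL's activity resource as we understand it, and `CHEMBL203` (EGFR) is used as an illustrative target; treat the exact filters as assumptions to check against the ChEMBL documentation.

```python
from urllib.parse import urlencode

# Public activity resource (assumed; verify against the ChEMBL web services docs).
CHEMBL_ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def build_activity_query(target_chembl_id: str, max_ic50_nm: float) -> str:
    """Build a URL for binding-assay IC50 values below a potency cutoff."""
    params = {
        "target_chembl_id": target_chembl_id,  # the target must be known by ID
        "standard_type": "IC50",               # activity type
        "standard_units": "nM",                # standardized units
        "standard_value__lt": max_ic50_nm,     # sub-nanomolar means below 1 nM
        "assay_type": "B",                     # "B" = binding, "F" = functional
    }
    return f"{CHEMBL_ACTIVITY_URL}?{urlencode(params)}"


url = build_activity_query("CHEMBL203", 1)
print(url)
```

Every parameter in that dictionary is a piece of domain knowledge the caller has to bring before the first request is sent.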
This expertise barrier keeps drug discovery data locked away from AI systems. The data is technically open. In practice, inaccessible.
We indexed the complete database for semantic search. Every compound, every target, every measurement. Searchable in plain English.
What's Included
2.4M+ bioactive compounds with SMILES, InChI, molecular properties, drug-likeness scores, calculated descriptors.
15,000+ biological targets including proteins, enzymes, receptors, ion channels. Each mapped to therapeutic relevance.
20M+ bioactivity measurements covering IC50, Ki, Kd, EC50. Binding affinities, functional potencies, ADMET properties.
Mechanism of action data with target interactions, binding modes, selectivity profiles.
Development phases from early discovery through FDA approval. Track which compounds reached trials, which are approved drugs.
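Potency values such as IC50 span many orders of magnitude, so they are usually compared on a negative log scale: pIC50 = -log10(IC50 in mol/L). As a small illustration (the helper name is ours and not part of any API), the conversion from nanomolar looks like this:

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)


for ic50 in (1000.0, 1.0, 0.5):
    print(f"{ic50} nM -> pIC50 {pic50_from_nm(ic50):.2f}")
```

On this scale a sub-nanomolar compound has a pIC50 above 9.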
The Biomedical Stack
ChEMBL joins our existing biomedical sources:
| Source | Coverage |
| --- | --- |
| PubMed | 36M+ abstracts |
| arXiv | 2.4M+ preprints |
| bioRxiv | 300K+ biology preprints |
| medRxiv | 70K+ health sciences preprints |
| Clinical trials | 500K+ registered trials |
| ChEMBL | 2.4M compounds, 20M measurements |
One API. Natural language. An AI research agent can now move between literature, clinical evidence, and compound data in a single workflow.
See our full data coverage across 36+ sources.
How It Works
Two access patterns:
Automatic routing. Use our standard search endpoint. When queries involve compounds, targets, or bioactivity, we route to ChEMBL alongside other relevant sources.
Dedicated search. Use bioSearch for biomedical-specific queries including compounds, clinical trials, drug labels, and literature.
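At the request level, the difference between letting routing choose sources and pinning a query to ChEMBL can be sketched as follows. This is an illustration rather than SDK code: `build_search_payload` is a hypothetical helper, and the field names simply mirror the `query`, `data_sources`, and `max_num_results` parameters used in the examples in this post.

```python
import json
from typing import Optional


def build_search_payload(
    query: str,
    data_sources: Optional[list[str]] = None,
    max_num_results: int = 10,
) -> dict:
    """Build a search request body (hypothetical helper, not part of the SDK)."""
    payload = {"query": query, "max_num_results": max_num_results}
    if data_sources:
        # Pin the request to the named sources, for example ChEMBL only.
        payload["data_sources"] = data_sources
    # Without data_sources, source selection is left to automatic routing.
    return payload


routed = build_search_payload("Approved EGFR tyrosine kinase inhibitors")
pinned = build_search_payload(
    "Approved EGFR tyrosine kinase inhibitors",
    data_sources=["valyu/valyu-chembl"],
    max_num_results=20,
)
print(json.dumps(pinned, indent=2))
```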
```python
from valyu import Valyu

client = Valyu()

# Restrict the search to the ChEMBL index and cap the number of results.
results = client.search(
    query="Approved EGFR tyrosine kinase inhibitors",
    data_sources=["valyu/valyu-chembl"],
    max_num_results=20,
)
```

Use Cases
Virtual screening. Filter millions of compounds by molecular properties, target activity, development stage.
Target identification. Query all known ligands for a protein. Find selectivity data across target families.
Mechanism research. Compare mechanisms across therapeutic classes. Identify repurposing opportunities.
Prior art analysis. Search existing compounds by structural features or activity profiles.
SAR analysis. Explore structure-activity relationships conversationally. Ask how structural changes across a compound series affect potency.
See the full Healthcare & Life Sciences use cases for more examples.
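As one illustration of the virtual screening case, compounds retrieved from a search can be post-filtered on molecular properties. The sketch below applies Lipinski's rule of five to plain dictionaries. The record keys (`mw`, `logp`, `hbd`, `hba`) and the two sample records are made up for the example and do not reflect the actual response schema.

```python
def passes_rule_of_five(compound: dict) -> bool:
    """Lipinski's rule of five: allow at most one violation."""
    violations = sum([
        compound["mw"] > 500,   # molecular weight in Da
        compound["logp"] > 5,   # octanol-water partition coefficient
        compound["hbd"] > 5,    # hydrogen-bond donors
        compound["hba"] > 10,   # hydrogen-bond acceptors
    ])
    return violations <= 1


# Made-up records for illustration only; not real ChEMBL data.
candidates = [
    {"name": "compound-A", "mw": 420.5, "logp": 3.1, "hbd": 2, "hba": 6},
    {"name": "compound-B", "mw": 812.0, "logp": 6.4, "hbd": 7, "hba": 14},
]

drug_like = [c["name"] for c in candidates if passes_rule_of_five(c)]
print(drug_like)  # ['compound-A']
```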
Works Everywhere
Vercel AI SDK
```typescript
import { generateText } from "ai";
import { bioSearch } from "@valyu/ai-sdk";
import { openai } from "@ai-sdk/openai";

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Find all compounds targeting EGFR",
  tools: { bioSearch: bioSearch() },
});
```

Remote MCP for Claude Desktop, Claude Code, or OpenAI:
```
https://mcp.valyu.ai/mcp?valyuApiKey=your-key
```

REST API
```bash
curl -X POST https://api.valyu.ai/v1/search \
  -H "Authorization: Bearer $VALYU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Oral GLP-1 agonists under 1000 Da",
    "data_sources": ["valyu/valyu-chembl"],
    "max_num_results": 10
  }'
```

Full integration docs: Vercel AI SDK · Remote MCP
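The same REST call can also be assembled with Python's standard library when installing the SDK is not an option. This sketch mirrors the curl request above (endpoint, Bearer header, JSON body) and only builds the request without sending it. `build_request` is a hypothetical helper, and nothing is assumed here about the response format.

```python
import json
import os
import urllib.request

API_URL = "https://api.valyu.ai/v1/search"  # endpoint from the curl example


def build_request(
    query: str, api_key: str, max_num_results: int = 10
) -> urllib.request.Request:
    """Assemble the POST request without sending it (hypothetical helper)."""
    body = {
        "query": query,
        "data_sources": ["valyu/valyu-chembl"],
        "max_num_results": max_num_results,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "Oral GLP-1 agonists under 1000 Da",
    api_key=os.environ.get("VALYU_API_KEY", "your-key"),
)
print(req.get_method(), req.full_url)
# To send it, pass req to urllib.request.urlopen and decode the JSON body.
```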
Get Started
Playground: ai-sdk.valyu.ai
API Key: platform.valyu.ai ($10 free credits)
Docs: docs.valyu.ai
Valyu builds the context layer for AI knowledge work. Our Search API connects agents to authoritative sources across domains: web content, academic papers, patents, financial filings, clinical trials, and specialized databases. One natural language interface.