ChEMBL Search API: 2.4M Bioactive Compounds for AI Agents

Hendrik Van Der Sande

The Problem

AI is transforming knowledge work. But models need context to reason well.

The authoritative data that professionals rely on lives behind paywalls, proprietary formats, and interfaces built for human experts. Academic literature. Financial filings. Clinical evidence. Patent records. Specialized databases. All fragmented. All inaccessible to AI systems.

We're building infrastructure to solve this. A unified context layer connecting AI agents to authoritative sources across domains. One API. Natural language. Every source an agent needs to do real knowledge work.

ChEMBL is the latest addition.

Why ChEMBL Matters

ChEMBL is the gold standard for drug discovery data. Maintained by EMBL's European Bioinformatics Institute (EMBL-EBI), it contains two decades of curated bioactivity measurements extracted from primary literature. Every serious computational chemistry effort builds on it.

Accessing ChEMBL programmatically requires understanding database schemas, SMILES notation, target ontologies, and activity type hierarchies. Want to find kinase inhibitors with sub-nanomolar potency? You need to know that IC50 values are stored in standard units, distinguish binding assays from functional assays, and navigate the entity relationships between molecules, assays, and targets.
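To make the unit handling concrete, here is a minimal sketch of the normalization a caller would otherwise do by hand: convert a potency value to nanomolar, compute pIC50 = -log10(IC50 in mol/L), and keep only sub-nanomolar binding measurements. The record fields (`value`, `units`, `assay_type`) and the sample rows are hypothetical. They do not reflect ChEMBL's actual schema or any Valyu response format.

```python
import math

# Conversion factors from common concentration units to nanomolar.
TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def to_nanomolar(value: float, units: str) -> float:
    """Convert a concentration to nM; raises KeyError for unknown units."""
    return value * TO_NM[units]


def pic50(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


def sub_nanomolar_binders(records):
    """Keep binding-assay records whose IC50 is below 1 nM."""
    hits = []
    for rec in records:
        ic50_nm = to_nanomolar(rec["value"], rec["units"])
        if rec["assay_type"] == "binding" and ic50_nm < 1.0:
            hits.append({**rec, "ic50_nm": ic50_nm, "pic50": pic50(ic50_nm)})
    return hits


# Hypothetical measurements, invented for illustration only.
sample = [
    {"name": "compound-a", "value": 500.0, "units": "pM", "assay_type": "binding"},
    {"name": "compound-b", "value": 0.02, "units": "uM", "assay_type": "binding"},
    {"name": "compound-c", "value": 0.3, "units": "nM", "assay_type": "functional"},
]
hits = sub_nanomolar_binders(sample)
```

Only `compound-a` passes: `compound-b` is 20 nM once converted, and `compound-c` is potent but comes from a functional assay. This is the kind of bookkeeping a natural language query is meant to absorb.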

This expertise barrier keeps drug discovery data locked away from AI systems. The data is technically open. In practice, inaccessible.

We indexed the complete database for semantic search. Every compound, every target, every measurement. Searchable in plain English.

What's Included

2.4M+ bioactive compounds with SMILES, InChI, molecular properties, drug-likeness scores, calculated descriptors.

15,000+ biological targets including proteins, enzymes, receptors, ion channels. Each mapped to therapeutic relevance.

20M+ bioactivity measurements covering IC50, Ki, Kd, EC50. Binding affinities, functional potencies, ADMET properties.

Mechanism of action data with target interactions, binding modes, selectivity profiles.

Development phases from early discovery through FDA approval. Track which compounds reached trials, which are approved drugs.

The Biomedical Stack

ChEMBL joins our existing biomedical sources:

| Source | Coverage |
| --- | --- |
| PubMed | 36M+ abstracts |
| arXiv | 2.4M+ preprints |
| bioRxiv | 300K+ biology preprints |
| medRxiv | 70K+ health sciences preprints |
| Clinical trials | 500K+ registered trials |
| ChEMBL | 2.4M compounds, 20M measurements |

One API. Natural language. An AI research agent can now move between literature, clinical evidence, and compound data in a single workflow.

See our full data coverage across 36+ sources.

How It Works

Two access patterns:

Automatic routing. Use our standard search endpoint. When queries involve compounds, targets, or bioactivity, we route to ChEMBL alongside other relevant sources.

Dedicated search. Use bioSearch for biomedical-specific queries including compounds, clinical trials, drug labels, and literature.

```python
from valyu import Valyu

# Create the client.
client = Valyu()

# Restrict the search to the ChEMBL index.
results = client.search(
    query="Approved EGFR tyrosine kinase inhibitors",
    data_sources=["valyu/valyu-chembl"],
    max_num_results=20,
)
```
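The example above pins the search to ChEMBL through `data_sources`. A minimal sketch of how the pinned and auto-routed variants might differ, assuming that omitting `data_sources` leaves source selection to automatic routing on the standard search endpoint. The helper `build_search_args` is hypothetical and not part of the Valyu SDK; the parameter names are the ones used in the example above.

```python
from typing import List, Optional

CHEMBL_SOURCE = "valyu/valyu-chembl"  # source identifier used in this post


def build_search_args(
    query: str,
    sources: Optional[List[str]] = None,
    max_num_results: int = 10,
) -> dict:
    """Build keyword arguments for a search call (hypothetical helper).

    Assumption: leaving out `data_sources` lets the service route the query.
    """
    args = {"query": query, "max_num_results": max_num_results}
    if sources:
        args["data_sources"] = list(sources)
    return args


# Automatic routing: no source is named.
routed = build_search_args("Approved EGFR tyrosine kinase inhibitors")

# Pinned to ChEMBL, matching the example above.
pinned = build_search_args(
    "Approved EGFR tyrosine kinase inhibitors",
    sources=[CHEMBL_SOURCE],
    max_num_results=20,
)
```

Either dictionary could then be unpacked into the call, for example `client.search(**pinned)`.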

Use Cases

Virtual screening. Filter millions of compounds by molecular properties, target activity, development stage.

Target identification. Query all known ligands for a protein. Find selectivity data across target families.

Mechanism research. Compare mechanisms across therapeutic classes. Identify repurposing opportunities.

Prior art analysis. Search existing compounds by structural features or activity profiles.

SAR analysis. Explore structure-activity relationships through conversational queries.

See the full Healthcare & Life Sciences use cases for more examples.

Works Everywhere

Vercel AI SDK

```typescript
import { generateText } from "ai";
import { bioSearch } from "@valyu/ai-sdk";
import { openai } from "@ai-sdk/openai";

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Find all compounds targeting EGFR",
  tools: { bioSearch: bioSearch() },
});
```

Remote MCP for Claude Desktop, Claude Code, or OpenAI:

```text
https://mcp.valyu.ai/mcp?valyuApiKey=your-key
```

REST API

```shell
curl -X POST https://api.valyu.ai/v1/search \
  -H "Authorization: Bearer $VALYU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Oral GLP-1 agonists under 1000 Da",
    "data_sources": ["valyu/valyu-chembl"],
    "max_num_results": 10
  }'
```
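For callers without curl, the same request can be assembled with Python's standard library. This is a sketch that mirrors the curl call above (same URL, headers, and JSON fields) and only builds the request without sending it. The helper name `build_request` is hypothetical, and the response format is not described in this post, so it is not parsed here.

```python
import json
import os
import urllib.request

API_URL = "https://api.valyu.ai/v1/search"  # endpoint from the curl example


def build_request(query: str, api_key: str, max_num_results: int = 10) -> urllib.request.Request:
    """Assemble the POST request from the curl example (hypothetical helper)."""
    payload = {
        "query": query,
        "data_sources": ["valyu/valyu-chembl"],
        "max_num_results": max_num_results,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Falls back to a placeholder key so the sketch runs without credentials.
request = build_request(
    "Oral GLP-1 agonists under 1000 Da",
    os.environ.get("VALYU_API_KEY", "your-key"),
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step needs a valid API key.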

Full integration docs: Vercel AI SDK · Remote MCP

Get Started

Playground: ai-sdk.valyu.ai

API Key: platform.valyu.ai ($10 free credits)

Docs: docs.valyu.ai

Valyu builds the context layer for AI knowledge work. Our Search API connects agents to authoritative sources across domains: web content, academic papers, patents, financial filings, clinical trials, and specialized databases. One natural language interface.