RepurposeX
Data

Where the numbers come from.

Every drug, gene, trial, and paper on this site is pulled live from public biomedical databases. We don't have a private dataset, a paywall, or a curated list of "our" drugs. We just read the same sources a researcher would and try to make them legible to someone who isn't one.

This page walks through every source we query, how the answers get merged when they disagree, how fresh the data is, and, importantly, what we don't have.

The six sources

Open Targets

Knowledge graph
What it is

A genetics-first knowledge graph maintained by EMBL-EBI, the Wellcome Sanger Institute, and others. It connects diseases, genes, and drugs with evidence scores broken out by data type — GWAS, ClinVar, gene-burden testing, literature mining, CRISPR screens, expression atlas, mouse phenotypes, and more.

How we use it

Disease and gene overviews; gene-disease association scores; the curated "known drugs" list per disease/target; drug mechanism of action and target list. Almost every screen on this site starts with an Open Targets call.

ChEMBL

Drug metadata
What it is

The European Bioinformatics Institute's open drug-discovery database. Curated activity data for thousands of bioactive molecules, with regulatory status, synonyms, and structural metadata.

How we use it

A name-based fallback when Open Targets is missing rich drug detail (max phase, first approval year, black-box warning, withdrawal status, synonyms). Also feeds the unified "drugs that act on this gene" list.
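The fallback can be pictured as a field-level merge. Here is a minimal sketch under our own simplifications: the record dicts and field names are illustrative, not the actual schema of either API.

```python
def fill_from_fallback(primary: dict, fallback: dict) -> dict:
    """Prefer the primary (Open Targets) value for every field;
    fill in from the fallback (ChEMBL) only where the primary is missing."""
    merged = dict(fallback)  # start with everything the fallback knows
    merged.update({k: v for k, v in primary.items() if v is not None})
    return merged

# Illustrative records; field names are ours, not either API's schema.
ot = {"name": "Imatinib", "max_phase": 4, "first_approval": None}
chembl = {"name": "IMATINIB", "max_phase": 4, "first_approval": 2001,
          "withdrawn": False}

drug = fill_from_fallback(ot, chembl)
# first_approval is filled from ChEMBL; the display name keeps
# Open Targets' casing; withdrawn comes along from ChEMBL
```

The same pattern generalizes to any "richer source wins per field" merge used elsewhere on the site.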

ClinicalTrials.gov

Live trial registry
What it is

The official US registry of clinical studies, run by the National Library of Medicine. Nearly every IND-stage drug trial in the US is logged here, plus most international ones. Status, phase, sponsor, interventions, conditions, enrollment.

How we use it

Three jobs: checking whether a candidate drug-disease pair has ever been studied in patients (the "trialed / never trialed" badge), listing recruiting trials on each disease and drug page, and populating the "drugs in patient studies" count on disease pages. The Explore page's "recruiting now" feed is also live from here.
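The "trialed / never trialed" check reduces to one search URL per pair. A minimal sketch, assuming the v2 `/api/v2/studies` endpoint with `query.intr`, `query.cond`, and `filter.overallStatus` parameters as we understand that API; no request is made here, only the URL is built.

```python
from urllib.parse import urlencode

BASE = "https://clinicaltrials.gov/api/v2/studies"

def pair_query_url(drug: str, condition: str,
                   recruiting_only: bool = False) -> str:
    """Build the study-search URL for a (drug, disease) pair."""
    params = {
        "query.intr": drug,       # intervention search
        "query.cond": condition,  # condition search
        "pageSize": 1,            # we only need to know if ANY study exists
    }
    if recruiting_only:
        params["filter.overallStatus"] = "RECRUITING"
    return f"{BASE}?{urlencode(params)}"

url = pair_query_url("metformin", "glioblastoma")
```

A non-empty result set from this URL is what flips the badge to "has appeared in trials"; the recruiting filter drives the "recruiting now" feed.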

PubMed (NCBI E-utilities)

Research literature
What it is

The National Library of Medicine's index of biomedical research — 35M+ papers spanning the life sciences. We use the standard E-utilities API.

How we use it

Counting how many papers mention a (drug, disease) pair — the "how well-studied is this" signal that feeds the composite ranking score. Also fetches the top article summaries shown on each disease and drug page.
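The count query is one esearch call. A sketch of how the term string is assembled, with both names restricted to [Title/Abstract] so incidental mentions elsewhere in a record don't inflate the count; only the URL is built here, no request is made.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pair_count_url(drug: str, disease: str) -> str:
    """esearch URL whose JSON response carries the matching-paper count."""
    term = f'"{drug}"[Title/Abstract] AND "{disease}"[Title/Abstract]'
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "rettype": "count",  # we want the count, not the ID list
    }
    return f"{EUTILS}?{urlencode(params)}"

url = pair_count_url("sildenafil", "pulmonary hypertension")
```

The quoted phrases matter: without them, multi-word disease names decompose into independent terms and the count balloons.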

DGIdb

Cross-reference
What it is

The Drug-Gene Interaction Database — itself an aggregation of ~40 underlying sources (DrugBank, ChEMBL, TTD, GuideToPharmacology, CIViC, ClearityFoundationBiomarkers, and more). Maintained by Washington University in St. Louis.

How we use it

A second opinion when Open Targets is sparse for a particular gene. It filled in around half the gaps we saw on well-trodden targets like PTGS1/2 (the COX enzymes), where OT is conservative.

openFDA

FDA labels
What it is

The FDA's own open-data API exposing approved drug labels — indications, dosing, warnings, contraindications, boxed warnings, manufacturer.

How we use it

The "How this drug is sold" section on every drug page. Feeds approval-status data into the disease-page aggregator — without this, common diseases like hypertension show 0 approved drugs because Open Targets' curated table is sparse. Also populates the "Key safety considerations" section in the Take-to-Your-Doctor printable summary (boxed warnings, contraindications, stop-use criteria pulled from the live FDA label when a candidate is shared).

How the answers get merged

Different sources give different answers to the same question. Open Targets is conservative; DGIdb is broad; ClinicalTrials.gov is messy but fresh. When sources disagree, we union them and prefer the richer source for display fields.

Four places do this merge:

  • Drugs already used for a disease. Unions Open Targets + ClinicalTrials.gov + openFDA. Approved = OT phase ≥ 4 or openFDA-label match. In-trials = anything else with a phase, plus any drug that appears in a CT.gov trial for the condition.
  • Drugs that act on a gene. Unions Open Targets + DGIdb. Names normalized (salt forms stripped, case-folded), then deduplicated. The display name comes from the cleanest source we have — usually OT.
  • Standard treatments for a disease. The 3-card "what doctors typically prescribe" strip on each disease page is the top 3 of the union list, filtered to phase ≥ 4 (FDA-approved). When the disease is a parent classification with no drugs labeled for it specifically (e.g. "spinal cord cancer" — drugs are approved for the sub-types), the section shows an explainer instead of going silent.
  • Repurposing candidates for a disease. Pulls Open Targets' top 40 most-associated genes for the disease. For each gene, queries the multi-source drug list (OT + DGIdb) — not just OT — so well-trodden disease genes still surface candidates. Excludes drugs that are already FDA-approved for this disease (drugs that were tried at phase 1–3 but never approved are still valid candidates and stay in, tagged "has appeared in trials"). Enriches survivors with PubMed lit counts using [Title/Abstract] field tags, CT.gov trial-status checks, mechanism action type (so a researcher can sanity-check direction-of-effect against disease pathology), and per-drug safety flags (black-box, market withdrawal). Ranks by a composite score: (assoc_non-chembl × 0.5 + lit × 0.2 + novelty × 0.3) × safety_modifier, where the association term uses the gene-disease score with the chembl evidence source removed (it's circular for repurposing — that source treats existing drug-trial co-occurrence as evidence), and safety_modifier is 0.55 for withdrawn drugs and 0.85 for drugs with a boxed warning.
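The normalization, union, and scoring steps above can be sketched together. This is a minimal sketch under our own simplifications: the salt-suffix list is an illustrative subset, the record dicts are not a real schema, and the score inputs are assumed to be pre-scaled to [0, 1].

```python
import re

# Salt/form suffixes stripped before comparing names; an illustrative
# subset, not the full list the site uses.
SALT_SUFFIXES = ("hydrochloride", "sodium", "mesylate", "sulfate", "citrate")

def normalize_name(name: str) -> str:
    """Case-fold and strip a trailing salt-form word so
    'IMATINIB MESYLATE' and 'Imatinib' deduplicate to one drug."""
    n = name.casefold().strip()
    for suffix in SALT_SUFFIXES:
        n = re.sub(rf"\s+{suffix}$", "", n)
    return n

def union_drugs(primary: list[dict], secondary: list[dict]) -> list[dict]:
    """Union two per-gene drug lists, keeping the first-seen record per
    normalized name. Primary (Open Targets) wins as the display source."""
    seen: dict[str, dict] = {}
    for record in primary + secondary:
        seen.setdefault(normalize_name(record["name"]), record)
    return list(seen.values())

def composite_score(assoc_non_chembl: float, lit: float, novelty: float,
                    withdrawn: bool = False,
                    boxed_warning: bool = False) -> float:
    """(assoc_non-chembl x 0.5 + lit x 0.2 + novelty x 0.3) x safety_modifier."""
    safety_modifier = 0.55 if withdrawn else (0.85 if boxed_warning else 1.0)
    return (assoc_non_chembl * 0.5 + lit * 0.2 + novelty * 0.3) * safety_modifier

ot = [{"name": "Imatinib", "source": "OT"}]
dgidb = [{"name": "IMATINIB MESYLATE", "source": "DGIdb"},
         {"name": "nilotinib", "source": "DGIdb"}]
merged = union_drugs(ot, dgidb)  # two drugs; the OT record wins for imatinib
```

With all three inputs at their maximum the base score is 1.0, and a withdrawn drug is cut to 0.55 regardless of how strong its evidence looks.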

How AI fits in

AI is used in two places, both for translation only, never to fetch new facts.

  • Plain-English summaries at the top of every disease, drug, and target page. The model takes the upstream description text from Open Targets and rewrites it in 1–2 sentences for someone with no biology background.
  • Per-candidate rationale — the 2–3 sentence "why this might work" explanation on each repurposing candidate. Same model, prompted with structured inputs only: drug name, disease name, bridging gene, mechanism of action, association score, and the evidence types that underlie that score.

The AI doesn't fetch its own data and doesn't make claims beyond what the inputs say.
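The structured-inputs-only constraint can be illustrated as a prompt builder. Everything here is hypothetical: the field names, the wording, and the validation step are ours for illustration, not the actual prompt the site uses.

```python
def build_rationale_prompt(inputs: dict) -> str:
    """Assemble the per-candidate rationale prompt from structured
    fields only; the model sees nothing that wasn't already fetched."""
    required = ("drug", "disease", "gene", "mechanism",
                "association_score", "evidence_types")
    missing = [k for k in required if k not in inputs]
    if missing:
        raise ValueError(f"missing structured inputs: {missing}")
    return (
        f"Drug: {inputs['drug']}\n"
        f"Disease: {inputs['disease']}\n"
        f"Bridging gene: {inputs['gene']}\n"
        f"Mechanism of action: {inputs['mechanism']}\n"
        f"Gene-disease association score: {inputs['association_score']}\n"
        f"Evidence types: {', '.join(inputs['evidence_types'])}\n"
        "In 2-3 sentences, explain why this drug might plausibly be worth "
        "studying for this disease. Do not claim efficacy."
    )

prompt = build_rationale_prompt({
    "drug": "erlotinib", "disease": "idiopathic pulmonary fibrosis",
    "gene": "EGFR", "mechanism": "inhibitor",
    "association_score": 0.62, "evidence_types": ["GWAS", "literature"],
})
```

Refusing to build a prompt when a field is missing is one way to enforce "no claims beyond what the inputs say" at the code level.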

What we don't have

Being clear about what's outside the perimeter:

  • No proprietary databases. No DrugBank Plus, Cortellis, MedAdherence, or anything behind a license. Public data only.
  • No pricing or availability. Drug pricing varies by country, insurance, and formulation in ways that aren't comparable. We won't guess.
  • No EHR or claims data. Real-world evidence from anonymized patient records is powerful, but it's not in scope here.
  • openFDA is US-only. Drugs approved in the EU, UK, or Japan but not the US can be missing from approval-status surfaces. We catch some of those via ChEMBL through Open Targets, but not all.
  • No automatic direction-of-effect check. We surface each drug's mechanism action type (inhibitor, agonist, antagonist) on candidate cards so a researcher can sanity-check whether the direction matches the disease pathology. We don't automatically validate this — disease-direction annotations (gain-of-function vs loss-of-function) aren't reliably exposed by our sources at the API level.
  • No drug-class deduplication. If five drugs of the same mechanism class all bridge to the same gene, all five appear as separate candidates rather than collapsing to a single representative. ATC-class collapsing is a future improvement.
  • Curated lists have gaps. Open Targets' "known drugs" aggregation is high-precision but incomplete — that's why we layered in CT.gov, openFDA, and DGIdb. Our answers are still bounded by what these sources know.
  • Trial-status is "ever-mentioned", not "ever-succeeded". The "has appeared in trials" chip fires when ANY ClinicalTrials.gov record mentions the drug + disease pair — including control arms, terminated trials, and observational studies. It does not mean the drug worked. The trust-tag tooltip text says so explicitly.

Your data

Two things you can give us, both optional:

  • An email address, if you sign up.
  • Saves and notify-me subscriptions.

We don't collect analytics on what you search, save, or read. We don't sell anything to anyone. We don't have ads.

Found something wrong, or a source we should add? Email hello@repurposex.org. We're especially interested in mismatches between what RepurposeX shows and what a domain expert would expect.