Corpus Explorer — BELLADONNA

7 Sources

Factoids

Each factoid is one self-contained statement extracted from a source document by the BELLADONNA agentic pipeline. The corpus below is stratified across open-access literature, publisher APIs, trial registries, guidelines, and regulatory reports.

Per-source volume

Timeline (docs per year, stacked by source)

Factoid length distribution

Characters per factoid. Bars beyond 500ch are grouped into the final bin.
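The binning described above can be sketched as follows. This is a minimal illustration, not the explorer's actual code: the function name `length_histogram`, the 50-character bin width, and the choice to put a length of exactly 500 into the final bin are all assumptions.

```python
from collections import Counter


def length_histogram(factoids, bin_width=50, cap=500):
    """Count factoids per character-length bin.

    Lengths at or above `cap` share one final overflow bin.
    `bin_width` and the boundary handling are assumptions.
    """
    if cap % bin_width:
        raise ValueError("cap must be a multiple of bin_width")
    counts = Counter()
    for text in factoids:
        # Floor the length to the start of its bin, clamped to the overflow bin.
        start = min(len(text) // bin_width * bin_width, cap)
        counts[start] += 1
    histogram = {}
    for start in range(0, cap + 1, bin_width):
        label = f"{cap}+" if start == cap else f"{start}-{start + bin_width - 1}"
        histogram[label] = counts.get(start, 0)
    return histogram
```

Empty bins are kept with a count of zero so the x-axis stays evenly spaced when the result is plotted.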

Information flow: Sources → Topic clusters → Entity groups

Left ribbons are proportional to the number of documents a source places into each topic cluster (atlas k=24, showing top 10). Right ribbons are proportional to the number of curated entity mentions in each cluster's factoids. Hover a ribbon for exact counts.
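As a rough sketch of how the left-ribbon widths could be derived, the function below counts documents per (source, cluster) pair and keeps only the largest clusters. The input shape, the name `left_ribbons`, and ranking clusters by document count are assumptions; the page does not say how the top 10 are chosen.

```python
from collections import Counter


def left_ribbons(doc_assignments, top_n=10):
    """Return document counts per (source, cluster) for the top clusters.

    doc_assignments: iterable of (source, cluster) pairs, one per document.
    Clusters are ranked by total document count (an assumption).
    """
    pairs = list(doc_assignments)
    cluster_sizes = Counter(cluster for _, cluster in pairs)
    top_clusters = {cluster for cluster, _ in cluster_sizes.most_common(top_n)}
    # Each remaining count is the width of one source-to-cluster ribbon.
    return Counter((s, c) for s, c in pairs if c in top_clusters)
```

The right ribbons would follow the same pattern, counting (cluster, entity group) pairs once per entity mention rather than once per document.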

Top biomedical entities across all factoids

String-match counts for a curated list of breast-cancer terms (biomarkers, drugs, procedures, genes, outcomes). Each row shows total mentions and the number of documents that contain at least one mention.
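The two numbers shown per row can be computed along these lines. This is a sketch under stated assumptions: the page says only "string-match", so case-insensitive matching on word boundaries is a guess, and `entity_counts` and its input shape are hypothetical.

```python
import re
from collections import Counter


def entity_counts(factoids, terms):
    """Return {term: (total_mentions, documents_with_a_mention)}.

    factoids: iterable of (doc_id, text) pairs.
    terms: the curated term list.
    Matching is case-insensitive and bounded by non-word characters (assumed).
    """
    patterns = {
        t: re.compile(r"(?<!\w)" + re.escape(t) + r"(?!\w)", re.IGNORECASE)
        for t in terms
    }
    mentions = Counter()
    docs = {t: set() for t in terms}
    for doc_id, text in factoids:
        for term, pattern in patterns.items():
            hits = len(pattern.findall(text))
            if hits:
                mentions[term] += hits
                docs[term].add(doc_id)  # a set, so each document counts once
    return {t: (mentions[t], len(docs[t])) for t in terms}
```

Tracking document IDs in a set keeps the document count separate from the mention count, so a term repeated many times in one document still counts as a single document.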

Pipeline

Retrieval: pull articles, trials, guidelines, and regulatory reports from Europe PMC (EPMC), Elsevier, ClinicalTrials.gov (CTG), ASCO/ESMO/AGO, and EMA.
Extraction: an agentic LLM reads each document and emits atomic factoids.
Schema mapping: normalise the factoids to a shared entity schema.
Expert validation: oncologists review a stratified sample via Delphi consensus.
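One way to draw the stratified sample used for expert validation is sketched below. The strata are not named on this page, so stratifying by source and taking an equal number of factoids per stratum are assumptions, as is the name `stratified_sample`.

```python
import random
from collections import defaultdict


def stratified_sample(factoids, per_stratum, seed=0):
    """Draw up to `per_stratum` factoids from each source.

    factoids: iterable of (source, text) pairs.
    Stratifying by source with equal allocation is an assumption.
    """
    by_source = defaultdict(list)
    for source, text in factoids:
        by_source[source].append(text)
    rng = random.Random(seed)  # fixed seed keeps the review set reproducible
    sample = {}
    for source in sorted(by_source):
        items = by_source[source]
        sample[source] = rng.sample(items, min(per_stratum, len(items)))
    return sample
```

Small strata are taken whole rather than raising an error, so a source with few factoids is still represented in the review set.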

Hierarchical drill-down. Click a source to expand its document sample, click a document to reveal its factoids.
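The drill-down implies a three-level nesting of source, document, and factoid. A minimal sketch of building that structure from flat records follows; the record shape and the name `build_hierarchy` are hypothetical.

```python
from collections import defaultdict


def build_hierarchy(records):
    """Group flat records into {source: {doc_id: [factoid, ...]}}.

    records: iterable of (source, doc_id, factoid) triples (assumed shape).
    """
    tree = defaultdict(lambda: defaultdict(list))
    for source, doc_id, factoid in records:
        tree[source][doc_id].append(factoid)
    # Convert to plain dicts so the result serialises cleanly, e.g. to JSON.
    return {source: dict(docs) for source, docs in tree.items()}
```

Expanding a source then corresponds to listing the keys of its inner dictionary, and expanding a document to reading its factoid list.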