Each factoid is one self-contained statement extracted from a source document by the BELLADONNA agentic pipeline. The corpus below is stratified across open-access literature, publisher APIs, trial registries, guidelines, and regulatory reports.
Per-source volume
Timeline (docs per year, stacked by source)
Factoid length distribution
Characters per factoid. Factoids longer than 500 characters are grouped into the final bin.
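The binning described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the function name `length_histogram` and the 50-character bin width are assumptions, and only the 500-character cap comes from the caption.

```python
def length_histogram(factoids, bin_width=50, cap=500):
    """Count factoids per character-length bin.

    bin_width is a hypothetical choice; cap=500 follows the caption.
    Lengths above `cap` all land in one extra overflow bin at the end.
    """
    n_regular = cap // bin_width          # bins covering lengths 0..cap
    counts = [0] * (n_regular + 1)        # +1 for the overflow bin
    for text in factoids:
        length = len(text)
        if length > cap:
            idx = n_regular               # overflow bin
        else:
            # A length of exactly `cap` stays in the last regular bin.
            idx = min(length // bin_width, n_regular - 1)
        counts[idx] += 1
    return counts


sample = ["a" * 10, "b" * 120, "c" * 500, "d" * 501, "e" * 900]
histogram = length_histogram(sample)
```

With this layout the last element of `histogram` is the grouped tail, so the chart's x-axis stays bounded regardless of how long the longest factoid is.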
Information flow: Sources → Topic clusters → Entity groups
Left ribbons are proportional to the number of documents a source places into each topic cluster (the atlas uses k=24 clusters; the top 10 are shown). Right ribbons are proportional to the number of curated entity mentions in each cluster's factoids. Hover over a ribbon for exact counts.
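As a rough sketch of how the two sets of ribbon weights could be derived, assuming one `(source, cluster)` pair per document and one `(cluster, entity_group)` pair per mention. The function names and the ranking of the "top 10" by document count are assumptions; the page does not say how clusters are ranked.

```python
from collections import Counter


def left_ribbons(docs, top_n=10):
    """docs: iterable of (source, cluster_id), one pair per document.

    Returns the kept cluster ids and a Counter {(source, cluster_id): n_docs}.
    Clusters are ranked by document count (an assumption).
    """
    docs = list(docs)
    cluster_sizes = Counter(cluster for _, cluster in docs)
    kept = {cluster for cluster, _ in cluster_sizes.most_common(top_n)}
    weights = Counter((s, c) for s, c in docs if c in kept)
    return kept, weights


def right_ribbons(mentions, kept):
    """mentions: iterable of (cluster_id, entity_group), one pair per mention."""
    return Counter((c, g) for c, g in mentions if c in kept)


docs = [("EPMC", 1), ("EPMC", 1), ("CTG", 1), ("CTG", 2), ("EMA", 2), ("EMA", 3)]
mentions = [(1, "drugs"), (1, "drugs"), (2, "genes"), (3, "genes")]

kept, left = left_ribbons(docs, top_n=2)
right = right_ribbons(mentions, kept)
```

Each Counter value is the width of one ribbon; clusters outside the displayed set contribute nothing to either side.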
Top biomedical entities across all factoids
String-match counts for a curated list of breast-cancer terms (biomarkers, drugs, procedures, genes, outcomes). Each row shows total mentions and the number of documents that contain at least one mention.
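A minimal sketch of this kind of count, assuming factoids are grouped by document. The function name `count_mentions`, the case-insensitive matching, and the whole-word boundary rule are assumptions; the page only states that the counts are string matches.

```python
import re


def count_mentions(documents, terms):
    """documents: {doc_id: [factoid, ...]}.

    Returns {term: (total_mentions, n_documents_with_at_least_one_mention)}.
    """
    result = {}
    for term in terms:
        # Whole-word, case-insensitive match (assumed matching rule).
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
        )
        total = 0
        n_docs = 0
        for factoids in documents.values():
            hits = sum(len(pattern.findall(f)) for f in factoids)
            total += hits
            if hits > 0:
                n_docs += 1
        result[term] = (total, n_docs)
    return result


documents = {
    "d1": [
        "HER2-positive tumours respond to trastuzumab.",
        "Trastuzumab targets HER2.",
    ],
    "d2": ["Tamoxifen is used in ER-positive disease."],
}
counts = count_mentions(documents, ["HER2", "trastuzumab", "tamoxifen"])
```

Reporting both numbers per row separates terms that are mentioned often in a few documents from terms that appear across many.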
Pipeline
- Retrieval — pull articles, trials and guidelines from EPMC, Elsevier, CTG, ASCO/ESMO/AGO, EMA.
- Extraction — agentic LLM reads each document and emits atomic factoids.
- Schema mapping — normalise to a shared entity schema.
- Expert validation — oncologists review a stratified sample via Delphi consensus.
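The sampling behind the expert-validation step might look like the sketch below. It assumes the strata are the sources and that a fixed number of factoids is drawn per stratum; neither detail is stated on this page, and `stratified_sample` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_sample(factoids, per_stratum=2, seed=0):
    """factoids: list of dicts with a 'source' key (assumed record shape).

    Draws up to `per_stratum` factoids from each source without replacement.
    """
    rng = random.Random(seed)             # seeded for a reproducible sample
    by_source = defaultdict(list)
    for factoid in factoids:
        by_source[factoid["source"]].append(factoid)
    sample = []
    for source in sorted(by_source):
        group = by_source[source]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


factoids = [{"source": "EPMC", "text": f"statement {i}"} for i in range(5)]
factoids.append({"source": "EMA", "text": "statement 5"})
review_set = stratified_sample(factoids, per_stratum=2)
```

Sampling per stratum keeps small sources such as regulatory reports represented in the review set even when open-access literature dominates the corpus.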
Hierarchical drill-down. Click a source to expand its document sample, then click a document to reveal its factoids.
BELLADONNA