BELLADONNA
Each dot is one source document. Positions come from a sentence-transformer embedding, so neighbors mean similar content, not just shared vocabulary.
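As a rough illustration of what "similar content" means for embedding vectors, the sketch below compares toy vectors by cosine similarity. The three-dimensional vectors and their values are made up for the example; real embeddings from the model have 384 dimensions, and the function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of the unit vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))

# Toy 3-dim vectors standing in for 384-dim embeddings (made-up values).
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.1]   # points in nearly the same direction as doc_a
doc_c = [0.0, 0.1, 0.9]   # points in a very different direction

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(sim_ab > sim_ac)
```

Two documents can score as close here without sharing any words, because the comparison is between the vectors the model assigns to them, not between their vocabularies.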
BAAI/bge-small-en-v1.5: a 33M-parameter sentence transformer with 384-dim output, run via fastembed (ONNX runtime, no PyTorch). Each document's concatenated factoid text (first ~600 chars, truncated to 128 tokens) is encoded, then L2-normalized.
UMAP(n_neighbors=15, min_dist=0.15, metric="cosine"): projects the 384-dim embeddings to 2-D while preserving local neighborhoods.
MiniBatchKMeans(k=24): run on the embeddings (not on the 2-D projection). Per-cluster topic labels come from a discriminative-frequency score on the factoid text.
ImageData buffer (no per-point canvas calls): points are written into a single pixel buffer, which is what lets all 181,113 points paint in a single frame on a laptop GPU.
What this is, and isn't. A semantic atlas: proximity reflects what a small general-purpose sentence transformer thinks two documents are about. It is not a clinical ontology. Treat topic labels as regional hints, not strict categories, and don't assume proximity implies a causal or therapeutic relationship.
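The embedding, projection, and clustering steps can be sketched roughly as follows. This is a minimal sketch, not the page's actual build script: the function names `prepare_text` and `build_atlas`, the space-joined concatenation, and the `seed` parameter are assumptions, and the 128-token cap is not reproduced here. It assumes the `fastembed`, `umap-learn`, `scikit-learn`, and `numpy` packages are installed.

```python
def prepare_text(factoids, max_chars=600):
    """Join a document's factoids and keep roughly the first max_chars characters.

    Joining with a single space is an assumption; the source only says
    the factoid text is concatenated.
    """
    return " ".join(factoids)[:max_chars]

def build_atlas(texts, n_clusters=24, seed=0):
    """Return (embeddings, 2-D coordinates, cluster labels) for a list of texts."""
    # Third-party imports are local so prepare_text works without them.
    import numpy as np
    import umap
    from fastembed import TextEmbedding
    from sklearn.cluster import MiniBatchKMeans

    # Encode with the ONNX-backed model, then L2-normalize each row.
    model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
    emb = np.array(list(model.embed(texts)), dtype=np.float32)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    # Project to 2-D with the parameters quoted in the text.
    xy = umap.UMAP(
        n_neighbors=15, min_dist=0.15, metric="cosine", random_state=seed
    ).fit_transform(emb)

    # Cluster the full embeddings, not the 2-D projection.
    labels = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed).fit_predict(emb)
    return emb, xy, labels
```

A call such as `build_atlas([prepare_text(f) for f in docs])`, where `docs` is a hypothetical list of factoid lists, needs more documents than `n_neighbors` and at least `n_clusters` of them to be meaningful. Clustering in the 384-dim space avoids baking UMAP's distortions into the cluster boundaries.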