Plot embeddings via PCA, colored by arbitrary labels
Source: R/plot_embeddings.R
plot_embeddings_pca.Rd
Reads an embeddings Parquet dataset (produced by embed_corpus()) with columns
id and V1..Vd, computes a PCA on the embedding matrix, and returns a
scatter plot of the first two principal components. Points are colored by
the labels supplied in labels; rows whose id is not found in labels are
shown as "other".
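The computation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation. A small synthetic data frame stands in for the Parquet dataset (reading the real one would need a Parquet reader such as the arrow package), and a base-graphics plot stands in for the returned scatter plot. The sketch shows the V1..Vd columns passing through stats::prcomp() with the documented defaults, keeps the first two component scores, and applies the "other" fallback to ids missing from the labels.

```r
# Minimal sketch (base R only, synthetic data): PCA on an id + V1..Vd table,
# then label lookup with an "other" fallback for unmatched ids.
set.seed(1)
emb <- data.frame(id = paste0("W", 1:6), matrix(rnorm(6 * 4), nrow = 6))
names(emb) <- c("id", paste0("V", 1:4))  # columns: id, V1..V4

labels <- data.frame(
  id    = c("W1", "W2", "W5"),
  label = c("reference", "reference", "corpus")
)

# Select the embedding columns V1..Vd and build a numeric matrix
v_cols <- grep("^V[0-9]+$", names(emb), value = TRUE)
x <- as.matrix(emb[, v_cols])

# PCA with the same defaults as the documented function
pca <- stats::prcomp(x, center = TRUE, scale. = FALSE)

# Keep the scores on the first two principal components
scores <- data.frame(id = emb$id, PC1 = pca$x[, 1], PC2 = pca$x[, 2])

# Look up each id's label; ids not present in `labels` become "other"
scores$label <- labels$label[match(scores$id, labels$id)]
scores$label[is.na(scores$label)] <- "other"

# Base-graphics stand-in for the returned scatter plot, colored by label
plot(scores$PC1, scores$PC2,
     col = as.integer(factor(scores$label)), pch = 19,
     xlab = "PC1", ylab = "PC2")
```

Because center = TRUE, the component scores in PC1 and PC2 each have mean zero; setting scale. = TRUE would additionally give every V column unit variance before the rotation.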
Usage
plot_embeddings_pca(
  embeddings,
  labels,
  center = TRUE,
  scale. = FALSE,
  point_size = 2,
  alpha = 0.5
)
Arguments
- embeddings
Path to a Parquet file or dataset directory containing columns
id and V1..Vd.
- labels
Label mapping for ids. Supported formats:
  - a data frame with columns id and label,
  - a path to a CSV file with columns id and label,
  - a named character vector whose names are ids and whose values are labels,
  - a named list in which each element is a vector of ids for the label given by that element's name.
- center, scale.
Passed to stats::prcomp() for PCA. Defaults: center = TRUE, scale. = FALSE.
- point_size, alpha
Point size and transparency of the points in the plot. Defaults:
point_size = 2, alpha = 0.5.
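The four accepted labels formats all describe the same id-to-label mapping. The sketch below uses a hypothetical helper, normalize_labels(), which is not part of the package, to show one way of reducing every format to a single data frame with columns id and label in base R. How the package actually dispatches on its input is an assumption here; in particular, the sketch treats a single unnamed string as a CSV path.

```r
# Hypothetical helper (not part of the package): convert any of the four
# documented `labels` formats into a data frame with columns id and label.
normalize_labels <- function(labels) {
  if (is.data.frame(labels)) {
    # Data frame: keep only the id and label columns
    out <- labels[, c("id", "label")]
  } else if (is.character(labels) && is.null(names(labels)) &&
             length(labels) == 1) {
    # Assumption: a single unnamed string is a path to a CSV file
    out <- utils::read.csv(labels, stringsAsFactors = FALSE)[, c("id", "label")]
  } else if (is.character(labels) && !is.null(names(labels))) {
    # Named character vector: names are ids, values are labels
    out <- data.frame(id = names(labels), label = unname(labels))
  } else if (is.list(labels) && !is.null(names(labels))) {
    # Named list: each element holds the ids that belong to that label
    out <- data.frame(
      id    = unlist(labels, use.names = FALSE),
      label = rep(names(labels), lengths(labels))
    )
  } else {
    stop("Unsupported `labels` format")
  }
  out$id <- as.character(out$id)
  out$label <- as.character(out$label)
  out
}

from_vec  <- normalize_labels(c(W1 = "reference", W2 = "reference", W10 = "corpus"))
from_list <- normalize_labels(list(reference = c("W1", "W2"), corpus = "W10"))
```

Both calls yield the same mapping, matching the data frame used in the example below. Normalizing first means the later lookup against the embedding ids only has to handle one shape.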
Examples
if (FALSE) { # \dontrun{
p <- plot_embeddings_pca(
  embeddings = "inst/examples/embedings/",
  labels = data.frame(
    id = c("W1", "W2", "W10"),
    label = c("reference", "reference", "corpus")
  )
)
print(p)
} # }