Pairwise cosine distances with centroid axis between label partitions
Source: R/distance_reference_cosine.R
Reads embeddings from a model-specific dataset and computes cosine distances
between all vectors in corpus_label and all vectors in reference_label.
A centroid row and a centroid column are added to the matrix:
rows are corpus ids plus "centroid" (corpus centroid), and
columns are reference ids plus "centroid" (reference centroid).
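
The layout described above can be illustrated with a minimal base R sketch on toy data. It is not the package's implementation: the matrices `corpus` and `reference`, the helper `unit_rows`, and the choice of the column-wise mean as the centroid are all assumptions made for illustration.

```r
# Toy embeddings, one row per vector (values are made up for illustration).
corpus <- rbind(c1 = c(1, 0), c2 = c(0, 1))
reference <- rbind(r1 = c(1, 1), r2 = c(1, 0), r3 = c(-1, 0))

# Assumption: each centroid is the column-wise mean of its partition.
corpus_aug <- rbind(corpus, centroid = colMeans(corpus))
reference_aug <- rbind(reference, centroid = colMeans(reference))

# Hypothetical helper: scale every row to unit length, so that a
# cross-product of two such matrices yields cosine similarities.
unit_rows <- function(m) m / sqrt(rowSums(m^2))

# Cosine distance = 1 - cosine similarity.
dist_mat <- 1 - tcrossprod(unit_rows(corpus_aug), unit_rows(reference_aug))

# Rows: 2 corpus ids + "centroid"; columns: 3 reference ids + "centroid".
dim(dist_mat)
```

In this sketch the last row holds distances from the corpus centroid to every reference vector, the last column holds distances from every corpus vector to the reference centroid, and the bottom-right cell is the centroid-to-centroid distance.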
Usage
distance_reference_cosine(
project_dir,
embeddings_dir = "model_id=BAAI_bge-small-en-v1.5",
corpus_label = "corpus",
reference_label = "reference",
batch_size = 1e+05,
max_cells = 5e+07,
verbose = TRUE
)
Arguments
- project_dir
Project root directory containing embeddings/.
- embeddings_dir
Model subfolder under project_dir/embeddings, e.g. "model_id=BAAI_bge-small-en-v1.5".
- corpus_label
Label partition used as corpus side. Defaults to "corpus".
- reference_label
Label partition used as reference side. Defaults to "reference".
- batch_size
Unused placeholder for compatibility with planned streaming extension.
- max_cells
Maximum allowed matrix size ((n_corpus + 1) * (n_reference + 1)) to guard memory use.
- verbose
Logical; print progress messages.
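
The size check implied by max_cells can be sketched as follows. The helper `check_cells` is hypothetical and not part of the package, and the assumption that an oversized request raises an error (rather than, say, a warning) is ours; only the formula and the default limit come from the documentation above.

```r
# Hypothetical helper: returns the cell count, or stops if it exceeds the limit.
check_cells <- function(n_corpus, n_reference, max_cells = 5e7) {
  # One extra row and one extra column account for the two centroids.
  cells <- (n_corpus + 1) * (n_reference + 1)
  if (cells > max_cells) {
    stop("matrix would have ", cells, " cells; limit is ", max_cells)
  }
  cells
}

# 1001 * 2001 = 2003001 cells, well under the default limit.
check_cells(1000, 2000)
```

If the result is stored as a dense matrix of doubles (8 bytes per cell), the default limit of 5e7 cells corresponds to about 400 MB.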