Compute per-cell Local Inverse Simpson's Index (LISI) scores for one or more categorical variables.
This is a clean-room reimplementation of the immunogenomics/LISI.
Usage
compute_lisi(
X,
meta_data,
label_colnames,
perplexity = 30,
nn_eps = 0,
use_rann = TRUE,
nn_method = c("auto", "rann", "fnn", "hnsw", "exact"),
nn_threads = 0,
hnsw_ef = 100,
tol = 1e-05,
max_iter = 50
)Arguments
- X
A matrix-like object with cells in rows and embedding/features in columns.
- meta_data
A data frame with one row per cell.
- label_colnames
Character vector of column names in
meta_datato evaluate.- perplexity
Effective neighborhood size. Defaults to
30.- nn_eps
Approximation factor passed to RANN::nn2 when
RANNis available anduse_rann = TRUE. Defaults to0.- use_rann
Whether to prefer RANN::nn2 over
FNN::get.knnwhennn_method = "auto"decides not to use the package's built-in exact C++ backend. Defaults toTRUE.- nn_method
Nearest-neighbor backend. Defaults to
"auto", which uses a simple heuristic: low-dimensional inputs use the package's exact C++ search, while larger/higher-dimensional inputs fall back toRANN, thenFNN, then the built-in exact C++ backend. Set to"hnsw"to useRcppHNSWfor a faster, approximate search.- nn_threads
Number of threads used by the HNSW backend. Defaults to
0which letsRcppHNSWchoose automatically.- hnsw_ef
Search breadth used by the HNSW backend. Larger values are more accurate but slower. Defaults to
100.- tol
Tolerance used in the binary search for the target perplexity. Defaults to
1e-5.- max_iter
Maximum number of binary-search iterations. Defaults to
50.
References
Korsunsky I, Millard N, Fan J, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods (2019). https://www.nature.com/articles/s41592-019-0619-0
LISI reference implementation: https://github.com/immunogenomics/LISI
Examples
set.seed(1)
X <- rbind(
matrix(stats::rnorm(100, mean = -2), ncol = 5),
matrix(stats::rnorm(100, mean = 2), ncol = 5)
)
meta_data <- data.frame(
batch = rep(c("A", "B"), each = 20),
group = sample(c("g1", "g2"), 40, replace = TRUE)
)
res <- compute_lisi(
X, meta_data,
c("batch", "group"),
perplexity = 10
)
#> ◌ [2026-04-01 14:12:22] Using "exact" nearest-neighbor backend for compute_lisi
head(res)
#> batch group
#> 1 1.001837 1.490901
#> 2 1.000010 1.439122
#> 3 1.001436 1.915155
#> 4 1.000065 1.644454
#> 5 1.000029 1.501060
#> 6 1.000000 1.560663
boxplot(res)