Skip to contents

Single-cell reference mapping with PCA method

Usage

RunPCAMap(
  srt_query,
  srt_ref,
  query_assay = NULL,
  ref_assay = srt_ref[[ref_pca]]@assay.used,
  ref_pca = NULL,
  ref_dims = 1:30,
  ref_umap = NULL,
  ref_group = NULL,
  projection_method = c("model", "knn"),
  nn_method = NULL,
  k = 30,
  distance_metric = "cosine",
  vote_fun = "mean"
)

Arguments

srt_query

An object of class Seurat storing the query cells.

srt_ref

An object of class Seurat storing the reference cells.

query_assay

A character string specifying the assay name for the query cells. If not provided, the default assay for the query object will be used.

ref_assay

A character string specifying the assay name for the reference cells. If not provided, the default assay for the reference object will be used.

ref_pca

A character string specifying the name of a PCA reduction in the reference object to use for calculating the distance metric. If NULL (default), it will be automatically detected as the first PCA reduction.

ref_dims

A numeric vector specifying the dimension indices from the reference reduction to be used for calculating the distance metric.

ref_umap

A character string specifying the name of the UMAP reduction in the reference object. If not provided, the first UMAP reduction found in the reference object will be used.

ref_group

A character string specifying a metadata column name in the reference object to use for grouping.

projection_method

A character string specifying the projection method to use. Options are "model" and "knn". If "model" is selected, the function will try to use a pre-trained UMAP model in the reference object for projection. If "knn" is selected, the function will directly find the nearest neighbors using the distance metric.

nn_method

A character string specifying the nearest neighbor search method to use. Options are "raw", "annoy", and "rann". If "raw" is selected, the function will use the brute-force method to find the nearest neighbors. If "annoy" is selected, the function will use the Annoy library for approximate nearest neighbor search. If "rann" is selected, the function will use the RANN library for approximate nearest neighbor search. If not provided, the function will choose the search method based on the size of the query and reference datasets.

k

An integer specifying the number of nearest neighbors to find for each cell in the query object.

distance_metric

A character string specifying the distance metric to use for calculating the pairwise distances between cells. Options include: "pearson", "spearman", "cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamman", "simple matching", and "faith". Additional distance metrics can also be used, such as "euclidean", "manhattan", "hamming", etc.

vote_fun

A character string specifying the function to be used for aggregating the nearest neighbors in the reference object. Options are "mean", "median", "sum", "min", "max", "sd", "var", etc. If not provided, the default is "mean".

Examples

data("panc8_sub")
srt_ref <- panc8_sub[, panc8_sub$tech != "fluidigmc1"]
srt_query <- panc8_sub[, panc8_sub$tech == "fluidigmc1"]
srt_ref <- integration_scop(
  srt_ref,
  batch = "tech",
  integration_method = "Seurat"
)
#>  [2025-07-03 08:48:29] Start Seurat_integrate
#>  [2025-07-03 08:48:29]  Spliting srt_merge into srt_list by column tech...
#>  [2025-07-03 08:48:30] Checking srt_list...
#>  [2025-07-03 08:48:30] Data 1/4 of the srt_list has been log-normalized.
#>  [2025-07-03 08:48:30] Perform FindVariableFeatures on the data 1/4 of the srt_list...
#>  [2025-07-03 08:48:30] Data 2/4 of the srt_list has been log-normalized.
#>  [2025-07-03 08:48:30] Perform FindVariableFeatures on the data 2/4 of the srt_list...
#>  [2025-07-03 08:48:31] Data 3/4 of the srt_list has been log-normalized.
#>  [2025-07-03 08:48:31] Perform FindVariableFeatures on the data 3/4 of the srt_list...
#>  [2025-07-03 08:48:31] Data 4/4 of the srt_list has been log-normalized.
#>  [2025-07-03 08:48:31] Perform FindVariableFeatures on the data 4/4 of the srt_list...
#>  [2025-07-03 08:48:32] Use the separate HVF from srt_list...
#>  [2025-07-03 08:48:32] Number of available HVF: 2000
#>  [2025-07-03 08:48:33] Finished checking.
#>  [2025-07-03 08:48:34] Perform FindIntegrationAnchors on the data...
#> Warning: Different features in new layer data than already exists for scale.data
#> Warning: Different features in new layer data than already exists for scale.data
#> Warning: Different features in new layer data than already exists for scale.data
#> Warning: Different features in new layer data than already exists for scale.data
#>  [2025-07-03 08:48:54] Perform integration(Seurat) on the data...
#> Warning: Layer counts isn't present in the assay object; returning NULL
#> Warning: Different cells in new layer data than already exists for scale.data
#> Warning: Layer counts isn't present in the assay object; returning NULL
#> Warning: Different cells in new layer data than already exists for scale.data
#> Warning: Layer counts isn't present in the assay object; returning NULL
#>  [2025-07-03 08:49:03] Perform ScaleData on the data...
#>  [2025-07-03 08:49:03] Perform linear dimension reduction (pca) on the data...
#>  [2025-07-03 08:49:04] Perform FindClusters (louvain) on the data...
#>  [2025-07-03 08:49:04] Reorder clusters...
#>  [2025-07-03 08:49:04] Using 'Seurat::AverageExpression()' to calculate pseudo-bulk data for 'Assay'.
#>  [2025-07-03 08:49:04] Perform nonlinear dimension reduction (umap) on the data...
#>  [2025-07-03 08:49:04] Non-linear dimensionality reduction(umap) using Reduction(Seuratpca, dims:1-11) as input
#>  [2025-07-03 08:49:09] Non-linear dimensionality reduction(umap) using Reduction(Seuratpca, dims:1-11) as input
#>  [2025-07-03 08:49:15] Seurat_integrate done
#>  [2025-07-03 08:49:15] Elapsed time: 45.88 secs
CellDimPlot(srt_ref, group.by = c("celltype", "tech"))
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.


# Projection
srt_query <- RunPCAMap(
  srt_query = srt_query,
  srt_ref = srt_ref,
  ref_pca = "Seuratpca",
  ref_umap = "SeuratUMAP2D"
)
#>  [2025-07-03 08:49:15] Detected srt_query data type: log_normalized_counts
#> ! [2025-07-03 08:49:16] Negative values detected!
#>  [2025-07-03 08:49:16] Detected srt_ref data type: unknown
#> ! [2025-07-03 08:49:16] Data type is unknown or different between srt_query and srt_ref.
#>  [2025-07-03 08:49:16] Run PCA projection
#>  [2025-07-03 08:49:16] Use 2000 features to calculate PC.
#>  [2025-07-03 08:49:16] Run UMAP projection
#>  [2025-07-03 08:49:16] Use the reduction to calculate distance metric.
#>  [2025-07-03 08:49:16] Use 'raw' method to find neighbors.
#>  [2025-07-03 08:49:16] Running UMAP projection
#> 08:49:16 Read 200 rows
#> 08:49:16 Processing block 1 of 1
#> 08:49:16 Commencing smooth kNN distance calibration using 2 threads
#>  with target n_neighbors = 30
#> 08:49:16 Initializing by weighted average of neighbor coordinates using 2 threads
#> 08:49:16 Commencing optimization for 200 epochs, with 5974 positive edges
#> 08:49:16 Finished
ProjectionPlot(
  srt_query = srt_query,
  srt_ref = srt_ref,
  query_group = "celltype",
  ref_group = "celltype"
)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_point()`).