Single-cell reference mapping with CSS method

Usage

RunCSSMap(
  srt_query,
  srt_ref,
  query_assay = NULL,
  ref_assay = srt_ref[[ref_css]]@assay.used,
  ref_css = NULL,
  ref_umap = NULL,
  ref_group = NULL,
  projection_method = c("model", "knn"),
  nn_method = NULL,
  k = 30,
  distance_metric = "cosine",
  vote_fun = "mean"
)

Arguments

srt_query

An object of class Seurat storing the query cells.

srt_ref

An object of class Seurat storing the reference cells.

query_assay

A character string specifying the assay name for the query cells. If not provided, the default assay for the query object will be used.

ref_assay

A character string specifying the assay name for the reference cells. If not provided, the assay that was used to compute the ref_css reduction (srt_ref[[ref_css]]@assay.used) will be used.

ref_css

A character string specifying the name of the CSS reduction in the reference object to use for calculating the distance metric.

ref_umap

A character string specifying the name of the UMAP reduction in the reference object. If not provided, the first UMAP reduction found in the reference object will be used.

ref_group

A character string specifying a metadata column name in the reference object to use for grouping.

projection_method

A character string specifying the projection method to use. Options are "model" and "knn". If "model" is selected, the function will try to use a pre-trained UMAP model in the reference object for projection. If "knn" is selected, the function will directly find the nearest neighbors using the distance metric.

nn_method

A character string specifying the nearest neighbor search method to use. Options are "raw", "annoy", and "rann". "raw" uses a brute-force search, "annoy" uses the Annoy library for approximate nearest neighbor search, and "rann" uses the RANN library for kd-tree based nearest neighbor search. If not provided, the function will choose the search method based on the size of the query and reference datasets.
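
To make the "raw" option concrete, here is a minimal base-R sketch of a brute-force k-nearest-neighbor search. The helper name brute_force_knn and the use of Euclidean distance are assumptions made for illustration; this is not the package's internal implementation.

```r
# Hypothetical helper (not part of the package): exact k-nearest-neighbor
# search by brute force, using Euclidean distance for simplicity.
brute_force_knn <- function(query, ref, k) {
  stopifnot(ncol(query) == ncol(ref), k <= nrow(ref))
  nq <- nrow(query)
  # Squared distances via |q|^2 + |r|^2 - 2 * q.r (rows = query, cols = ref)
  d2 <- outer(rowSums(query^2), rowSums(ref^2), "+") - 2 * tcrossprod(query, ref)
  d2[d2 < 0] <- 0  # guard against tiny negative values from rounding
  idx <- matrix(0L, nq, k)
  for (i in seq_len(nq)) {
    idx[i, ] <- order(d2[i, ])[seq_len(k)]
  }
  # Look up the distances that belong to the selected neighbors
  dist <- matrix(sqrt(d2[cbind(rep(seq_len(nq), k), as.vector(idx))]), nq, k)
  list(index = idx, distance = dist)
}

ref <- rbind(c(0, 0), c(1, 0), c(0, 1), c(5, 5))
query <- rbind(c(0.1, 0), c(4, 3))
nn <- brute_force_knn(query, ref, k = 2)
nn$index
```

Every query cell is compared against every reference cell, so the cost grows with the product of the two dataset sizes. That is likely why approximate or tree-based methods are offered for larger inputs.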

k

The number of nearest neighbors to find for each cell in the query object. Default is 30.

distance_metric

A character string specifying the distance metric to use for calculating the pairwise distances between cells. Options include: "pearson", "spearman", "cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamman", "simple matching", and "faith". Additional distance metrics can also be used, such as "euclidean", "manhattan", "hamming", etc. Default is "cosine".
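
As an illustration of the default "cosine" metric, the sketch below computes pairwise cosine distances between the rows of two embedding matrices in base R. The helper name cosine_distance is hypothetical, and the package may compute this differently.

```r
# Hypothetical helper (not part of the package): pairwise cosine distance
# between rows of `query` and rows of `ref`, defined as 1 - cosine similarity.
cosine_distance <- function(query, ref) {
  stopifnot(ncol(query) == ncol(ref))
  # Scale each row to unit length; all-zero rows would give NaN
  qn <- query / sqrt(rowSums(query^2))
  rn <- ref / sqrt(rowSums(ref^2))
  1 - tcrossprod(qn, rn)
}

query <- rbind(c(1, 0), c(1, 1))
ref <- rbind(c(2, 0), c(0, 3))
d <- cosine_distance(query, ref)
d
```

Cosine distance depends only on the direction of each embedding vector, not on its length, so cells that differ only in overall magnitude are treated as close.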

vote_fun

A character string specifying the function to be used for aggregating the nearest neighbors in the reference object. Options are "mean", "median", "sum", "min", "max", "sd", "var", etc. If not provided, the default is "mean".
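
The role of vote_fun can be sketched as follows: given the indices of each query cell's nearest reference cells, a function named by a character string is applied to the neighbors' coordinates. The helper aggregate_neighbors is hypothetical, and the idea that reference UMAP coordinates are what gets aggregated is an assumption made for illustration.

```r
# Hypothetical helper (not part of the package): aggregate the coordinates of
# each query cell's nearest reference cells with a function named by a string.
aggregate_neighbors <- function(ref_coords, nn_index, vote_fun = "mean") {
  f <- match.fun(vote_fun)  # resolve e.g. "mean" or "median" to a function
  out <- matrix(NA_real_, nrow(nn_index), ncol(ref_coords))
  for (i in seq_len(nrow(nn_index))) {
    neighbors <- ref_coords[nn_index[i, ], , drop = FALSE]
    out[i, ] <- apply(neighbors, 2, f)  # one value per coordinate
  }
  out
}

ref_coords <- rbind(c(0, 0), c(2, 0), c(10, 4))  # assumed reference 2D coordinates
nn_index <- rbind(c(1, 2, 3))                    # one query cell, three neighbors
proj_mean <- aggregate_neighbors(ref_coords, nn_index, "mean")
proj_median <- aggregate_neighbors(ref_coords, nn_index, "median")
```

In this toy input the third neighbor is an outlier, so "mean" pulls the projected position toward it while "median" does not. This is one reason to choose between the two.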

Examples

data(panc8_sub)
srt_ref <- panc8_sub[, panc8_sub$tech != "fluidigmc1"]
srt_query <- panc8_sub[, panc8_sub$tech == "fluidigmc1"]
srt_ref <- integration_scop(
  srt_ref,
  batch = "tech",
  integration_method = "CSS"
)
#>  [2025-09-20 13:25:53] Run CSS integration...
#>  [2025-09-20 13:26:09] Spliting `srt_merge` into `srt_list` by column "tech"...
#>  [2025-09-20 13:26:09] Checking a list of <Seurat> object...
#> ! [2025-09-20 13:26:10] Data 1/4 of the `srt_list` is "unknown"
#>  [2025-09-20 13:26:10] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 1/4 of the `srt_list`...
#>  [2025-09-20 13:26:11] Perform `Seurat::FindVariableFeatures()` on the data 1/4 of the `srt_list`...
#> ! [2025-09-20 13:26:11] Data 2/4 of the `srt_list` is "unknown"
#>  [2025-09-20 13:26:11] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 2/4 of the `srt_list`...
#>  [2025-09-20 13:26:13] Perform `Seurat::FindVariableFeatures()` on the data 2/4 of the `srt_list`...
#> ! [2025-09-20 13:26:13] Data 3/4 of the `srt_list` is "unknown"
#>  [2025-09-20 13:26:13] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 3/4 of the `srt_list`...
#>  [2025-09-20 13:26:14] Perform `Seurat::FindVariableFeatures()` on the data 3/4 of the `srt_list`...
#> ! [2025-09-20 13:26:15] Data 4/4 of the `srt_list` is "unknown"
#>  [2025-09-20 13:26:15] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 4/4 of the `srt_list`...
#>  [2025-09-20 13:26:16] Perform `Seurat::FindVariableFeatures()` on the data 4/4 of the `srt_list`...
#>  [2025-09-20 13:26:17] Use the separate HVF from srt_list
#>  [2025-09-20 13:26:17] Number of available HVF: 2000
#>  [2025-09-20 13:26:17] Finished check
#>  [2025-09-20 13:26:20] Perform ScaleData
#>  [2025-09-20 13:26:20] Perform linear dimension reduction("pca")
#>  [2025-09-20 13:26:21] Perform CSS integration
#>  [2025-09-20 13:26:21] Using Reduction("CSSpca", dims:1-10) as input
#> Error in features_usage[, "data"]: subscript out of bounds
CellDimPlot(srt_ref, group.by = c("celltype", "tech"))
#> Error in DefaultReduction(srt): Unable to find any reductions

# Projection
srt_query <- RunCSSMap(
  srt_query = srt_query,
  srt_ref = srt_ref,
  ref_css = "CSS",
  ref_umap = "CSSUMAP2D"
)
#>  [2025-09-20 13:26:21] quadbiolab/simspec installed successfully
#> Error in srt_ref[[ref_css]]: ‘CSS’ not found in this Seurat object
#>  
ProjectionPlot(
  srt_query = srt_query,
  srt_ref = srt_ref,
  query_group = "celltype",
  ref_group = "celltype"
)
#> Error in srt_query[[query_reduction]]: ‘ref.embeddings’ not found in this Seurat object
#>