Integrate single-cell RNA-seq data using various integration methods.
Usage
integration_scop(
  srt_merge = NULL,
  batch,
  append = TRUE,
  srt_list = NULL,
  assay = NULL,
  integration_method = "Uncorrected",
  do_normalization = NULL,
  normalization_method = "LogNormalize",
  do_HVF_finding = TRUE,
  HVF_source = "separate",
  HVF_method = "vst",
  nHVF = 2000,
  HVF_min_intersection = 1,
  HVF = NULL,
  do_scaling = TRUE,
  vars_to_regress = NULL,
  regression_model = "linear",
  scale_within_batch = FALSE,
  linear_reduction = "pca",
  linear_reduction_dims = 50,
  linear_reduction_dims_use = NULL,
  linear_reduction_params = list(),
  force_linear_reduction = FALSE,
  nonlinear_reduction = "umap",
  nonlinear_reduction_dims = c(2, 3),
  nonlinear_reduction_params = list(),
  force_nonlinear_reduction = TRUE,
  neighbor_metric = "euclidean",
  neighbor_k = 20L,
  cluster_algorithm = "louvain",
  cluster_resolution = 0.6,
  seed = 11,
  verbose = TRUE,
  ...
)
Arguments
- srt_merge
A merged `Seurat` object that includes the batch information.
- batch
A character string specifying the batch variable name.
- append
Whether to append the integrated data to the original Seurat object (`srt_merge`). Default is `TRUE`.
- srt_list
A list of `Seurat` objects to be checked and preprocessed.
- assay
The name of the assay to be used for downstream analysis.
- integration_method
A character string specifying the integration method to use. Supported methods are `"Uncorrected"`, `"Seurat"`, `"scVI"`, `"MNN"`, `"fastMNN"`, `"Harmony"`, `"Scanorama"`, `"BBKNN"`, `"CSS"`, `"LIGER"`, `"Conos"`, and `"ComBat"`. Default is `"Uncorrected"`.
- do_normalization
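Whether data normalization should be performed. Default is `NULL`, as shown in the usage signature; in the example runs below, batches already detected as log-normalized are not normalized again.
- normalization_method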
The normalization method to be used. Possible values are `"LogNormalize"`, `"SCT"`, and `"TFIDF"`. Default is `"LogNormalize"`.
- do_HVF_finding
Whether highly variable feature (HVF) finding should be performed. Default is `TRUE`.
- HVF_source
The source of highly variable features. Possible values are `"global"` and `"separate"`. Default is `"separate"`.
- HVF_method
The method for selecting highly variable features. Default is `"vst"`.
- nHVF
The number of highly variable features to select. Default is `2000`.
- HVF_min_intersection
The minimum number of batches in which a feature must be identified as highly variable for it to be retained. Default is `1`.
- HVF
A vector of highly variable features. Default is `NULL`.
- do_scaling
Whether to perform scaling. If `TRUE`, the data are always scaled using the `ScaleData` function. Default is `TRUE`.
- vars_to_regress
A vector of variable names to include as additional regression variables. Default is `NULL`.
- regression_model
The regression model to use for scaling. Options are `"linear"`, `"poisson"`, or `"negativebinomial"`. Default is `"linear"`.
- scale_within_batch
Whether to scale data within each batch. Only valid when `integration_method` is one of `"Uncorrected"`, `"Seurat"`, `"MNN"`, `"Harmony"`, `"BBKNN"`, `"CSS"`, or `"ComBat"`.
- linear_reduction
The linear dimensionality reduction method to use. Options are `"pca"`, `"svd"`, `"ica"`, `"nmf"`, `"mds"`, or `"glmpca"`. Default is `"pca"`.
- linear_reduction_dims
The number of dimensions to keep after linear dimensionality reduction. Default is `50`.
- linear_reduction_dims_use
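The dimensions to use for downstream analysis. If `NULL`, the function chooses the dimensions itself; in the example runs below this yields dims 1-10 to 1-13 rather than all 50 computed components.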
- linear_reduction_params
A list of parameters to pass to the linear dimensionality reduction method.
- force_linear_reduction
Whether to force linear dimensionality reduction even if the specified reduction is already present in the Seurat object.
- nonlinear_reduction
The nonlinear dimensionality reduction method to use. Options are `"umap"`, `"umap-naive"`, `"tsne"`, `"dm"`, `"phate"`, `"pacmap"`, `"trimap"`, `"largevis"`, or `"fr"`. Default is `"umap"`.
- nonlinear_reduction_dims
The number of dimensions to keep after nonlinear dimensionality reduction. If a vector is provided, a separate embedding is computed for each value, which is why the example logs below show the UMAP step twice. Default is `c(2, 3)`.
- nonlinear_reduction_params
A list of parameters to pass to the nonlinear dimensionality reduction method.
- force_nonlinear_reduction
Whether to force nonlinear dimensionality reduction even if the specified reduction is already present in the Seurat object. Default is `TRUE`.
- neighbor_metric
The distance metric to use for finding neighbors. Options are `"euclidean"`, `"cosine"`, `"manhattan"`, or `"hamming"`. Default is `"euclidean"`.
- neighbor_k
The number of nearest neighbors to find for each cell. Default is `20`.
- cluster_algorithm
The clustering algorithm to use. Options are `"louvain"`, `"slm"`, or `"leiden"`. Default is `"louvain"`.
- cluster_resolution
The resolution parameter to use for clustering. Larger values result in more clusters. Default is `0.6`.
- seed
An integer specifying the random seed for reproducibility. Default is `11`.
- verbose
Whether to print progress messages. Default is `TRUE`.
- ...
Additional arguments to be passed to the integration method function.
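The effect of `HVF_min_intersection` can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper `select_shared_hvf()`, the list `hvf_by_batch`, and the gene names in it are hypothetical and exist only to show the idea of keeping features that are highly variable in at least a given number of batches. In the example logs below, the default reports 2000 available HVF, while `HVF_min_intersection = 5` reports 270.

```r
# Hypothetical per-batch HVF lists (toy gene names, not real results).
hvf_by_batch <- list(
  batch1 = c("INS", "GCG", "SST", "PPY"),
  batch2 = c("INS", "GCG", "KRT19"),
  batch3 = c("INS", "SST", "KRT19")
)

# Keep features that are highly variable in at least `min_intersection` batches.
# (Illustrative helper; the name and signature are assumptions.)
select_shared_hvf <- function(hvf_list, min_intersection = 1) {
  # Count in how many batches each feature appears (once per batch at most).
  counts <- table(unlist(lapply(hvf_list, unique)))
  names(counts)[as.vector(counts >= min_intersection)]
}

# Any feature seen in at least one batch.
shared_any <- select_shared_hvf(hvf_by_batch, min_intersection = 1)

# Only features seen in all three batches.
shared_all <- select_shared_hvf(hvf_by_batch, min_intersection = 3)

print(shared_any)
print(shared_all)
```

Raising the threshold can only shrink the selected set, so a stricter value trades feature count for consistency across batches.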
Examples
data(panc8_sub)
panc8_sub <- integration_scop(
  panc8_sub,
  batch = "tech",
  integration_method = "Uncorrected"
)
#> ℹ [2025-09-20 14:04:24] Run Uncorrected integration...
#> ℹ [2025-09-20 14:04:24] Spliting `srt_merge` into `srt_list` by column "tech"...
#> ℹ [2025-09-20 14:04:24] Checking a list of <Seurat> object...
#> ! [2025-09-20 14:04:25] Data 1/5 of the `srt_list` is "unknown"
#> ℹ [2025-09-20 14:04:25] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 1/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:26] Perform `Seurat::FindVariableFeatures()` on the data 1/5 of the `srt_list`...
#> ! [2025-09-20 14:04:27] Data 2/5 of the `srt_list` is "unknown"
#> ℹ [2025-09-20 14:04:27] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 2/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:28] Perform `Seurat::FindVariableFeatures()` on the data 2/5 of the `srt_list`...
#> ! [2025-09-20 14:04:29] Data 3/5 of the `srt_list` is "unknown"
#> ℹ [2025-09-20 14:04:29] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 3/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:31] Perform `Seurat::FindVariableFeatures()` on the data 3/5 of the `srt_list`...
#> ! [2025-09-20 14:04:31] Data 4/5 of the `srt_list` is "unknown"
#> ℹ [2025-09-20 14:04:31] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 4/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:33] Perform `Seurat::FindVariableFeatures()` on the data 4/5 of the `srt_list`...
#> ! [2025-09-20 14:04:33] Data 5/5 of the `srt_list` is "unknown"
#> ℹ [2025-09-20 14:04:33] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 5/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:35] Perform `Seurat::FindVariableFeatures()` on the data 5/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:35] Use the separate HVF from srt_list
#> ℹ [2025-09-20 14:04:36] Number of available HVF: 2000
#> ℹ [2025-09-20 14:04:36] Finished check
#> ℹ [2025-09-20 14:04:38] Perform Uncorrected integration
#> ℹ [2025-09-20 14:04:42] Perform ScaleData
#> ℹ [2025-09-20 14:04:43] Perform linear dimension reduction("pca")
#> ℹ [2025-09-20 14:04:45] Perform FindClusters ("louvain")
#> ℹ [2025-09-20 14:04:45] Reorder clusters...
#> ! [2025-09-20 14:04:45] Using `Seurat::AggregateExpression()` to calculate pseudo-bulk data for <Assay5>
#> ℹ [2025-09-20 14:04:45] Perform nonlinear dimension reduction ("umap")
#> ℹ [2025-09-20 14:04:45] Non-linear dimensionality reduction (umap) using (Uncorrectedpca) dims (1-10) as input
#> ℹ [2025-09-20 14:04:50] Non-linear dimensionality reduction (umap) using (Uncorrectedpca) dims (1-10) as input
#> ✔ [2025-09-20 14:04:56] Run Uncorrected integration done
CellDimPlot(
  panc8_sub,
  group.by = c("tech", "celltype")
)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
panc8_sub <- integration_scop(
  panc8_sub,
  batch = "tech",
  integration_method = "Uncorrected",
  HVF_min_intersection = 5
)
#> ℹ [2025-09-20 14:04:57] Run Uncorrected integration...
#> ℹ [2025-09-20 14:04:57] Spliting `srt_merge` into `srt_list` by column "tech"...
#> ℹ [2025-09-20 14:04:58] Checking a list of <Seurat> object...
#> ℹ [2025-09-20 14:04:58] Data 1/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:04:58] Perform `Seurat::FindVariableFeatures()` on the data 1/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:59] Data 2/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:04:59] Perform `Seurat::FindVariableFeatures()` on the data 2/5 of the `srt_list`...
#> ℹ [2025-09-20 14:04:59] Data 3/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:04:59] Perform `Seurat::FindVariableFeatures()` on the data 3/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:00] Data 4/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:00] Perform `Seurat::FindVariableFeatures()` on the data 4/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:00] Data 5/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:00] Perform `Seurat::FindVariableFeatures()` on the data 5/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:00] Use the separate HVF from srt_list
#> ℹ [2025-09-20 14:05:01] Number of available HVF: 270
#> ℹ [2025-09-20 14:05:01] Finished check
#> ℹ [2025-09-20 14:05:03] Perform Uncorrected integration
#> Warning: Layer ‘scale.data’ is empty
#> ℹ [2025-09-20 14:05:03] Perform ScaleData
#> ℹ [2025-09-20 14:05:04] Perform linear dimension reduction("pca")
#> ℹ [2025-09-20 14:05:05] Perform FindClusters ("louvain")
#> ℹ [2025-09-20 14:05:05] Reorder clusters...
#> ! [2025-09-20 14:05:05] Using `Seurat::AggregateExpression()` to calculate pseudo-bulk data for <Assay5>
#> ℹ [2025-09-20 14:05:05] Perform nonlinear dimension reduction ("umap")
#> ℹ [2025-09-20 14:05:05] Non-linear dimensionality reduction (umap) using (Uncorrectedpca) dims (1-12) as input
#> ℹ [2025-09-20 14:05:11] Non-linear dimensionality reduction (umap) using (Uncorrectedpca) dims (1-12) as input
#> ✔ [2025-09-20 14:05:17] Run Uncorrected integration done
CellDimPlot(
  panc8_sub,
  group.by = c("tech", "celltype")
)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
panc8_sub <- integration_scop(
  panc8_sub,
  batch = "tech",
  integration_method = "Uncorrected",
  HVF_min_intersection = 5,
  scale_within_batch = TRUE
)
#> ℹ [2025-09-20 14:05:18] Run Uncorrected integration...
#> ℹ [2025-09-20 14:05:18] Spliting `srt_merge` into `srt_list` by column "tech"...
#> ℹ [2025-09-20 14:05:19] Checking a list of <Seurat> object...
#> ℹ [2025-09-20 14:05:19] Data 1/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:19] Perform `Seurat::FindVariableFeatures()` on the data 1/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:19] Data 2/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:19] Perform `Seurat::FindVariableFeatures()` on the data 2/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:20] Data 3/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:20] Perform `Seurat::FindVariableFeatures()` on the data 3/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:20] Data 4/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:20] Perform `Seurat::FindVariableFeatures()` on the data 4/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:21] Data 5/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:21] Perform `Seurat::FindVariableFeatures()` on the data 5/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:21] Use the separate HVF from srt_list
#> ℹ [2025-09-20 14:05:21] Number of available HVF: 270
#> ℹ [2025-09-20 14:05:22] Finished check
#> ℹ [2025-09-20 14:05:23] Perform Uncorrected integration
#> Warning: Layer ‘scale.data’ is empty
#> ℹ [2025-09-20 14:05:24] Perform ScaleData
#> ℹ [2025-09-20 14:05:24] Perform linear dimension reduction("pca")
#> ℹ [2025-09-20 14:05:26] Perform FindClusters ("louvain")
#> ℹ [2025-09-20 14:05:26] Reorder clusters...
#> ! [2025-09-20 14:05:26] Using `Seurat::AggregateExpression()` to calculate pseudo-bulk data for <Assay5>
#> ℹ [2025-09-20 14:05:26] Perform nonlinear dimension reduction ("umap")
#> ℹ [2025-09-20 14:05:26] Non-linear dimensionality reduction (umap) using (Uncorrectedpca) dims (1-13) as input
#> ℹ [2025-09-20 14:05:31] Non-linear dimensionality reduction (umap) using (Uncorrectedpca) dims (1-13) as input
#> ✔ [2025-09-20 14:05:37] Run Uncorrected integration done
CellDimPlot(
  panc8_sub,
  group.by = c("tech", "celltype")
)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
panc8_sub <- integration_scop(
  panc8_sub,
  batch = "tech",
  integration_method = "Seurat"
)
#> ℹ [2025-09-20 14:05:38] Run Seurat integration...
#> ℹ [2025-09-20 14:05:38] Spliting `srt_merge` into `srt_list` by column "tech"...
#> ℹ [2025-09-20 14:05:39] Checking a list of <Seurat> object...
#> ℹ [2025-09-20 14:05:39] Data 1/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:39] Perform `Seurat::FindVariableFeatures()` on the data 1/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:39] Data 2/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:39] Perform `Seurat::FindVariableFeatures()` on the data 2/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:40] Data 3/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:40] Perform `Seurat::FindVariableFeatures()` on the data 3/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:40] Data 4/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:40] Perform `Seurat::FindVariableFeatures()` on the data 4/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:41] Data 5/5 of the `srt_list` has been log-normalized
#> ℹ [2025-09-20 14:05:41] Perform `Seurat::FindVariableFeatures()` on the data 5/5 of the `srt_list`...
#> ℹ [2025-09-20 14:05:41] Use the separate HVF from srt_list
#> ℹ [2025-09-20 14:05:42] Number of available HVF: 2000
#> ℹ [2025-09-20 14:05:42] Finished check
#> ℹ [2025-09-20 14:05:44] Perform FindIntegrationAnchors
#> ℹ [2025-09-20 14:06:18] Perform integration(Seurat)
#> Warning: Layer counts isn't present in the assay object; returning NULL
#> Warning: Different cells in new layer data than already exists for scale.data
#> Warning: Layer counts isn't present in the assay object; returning NULL
#> Warning: Different cells in new layer data than already exists for scale.data
#> Warning: Layer counts isn't present in the assay object; returning NULL
#> Warning: Different cells in new layer data than already exists for scale.data
#> Warning: Layer counts isn't present in the assay object; returning NULL
#> ℹ [2025-09-20 14:06:28] Perform ScaleData
#> ℹ [2025-09-20 14:06:28] Perform linear dimension reduction ("pca")
#> ℹ [2025-09-20 14:06:29] Perform FindClusters ("louvain")
#> ℹ [2025-09-20 14:06:29] Reorder clusters...
#> ! [2025-09-20 14:06:29] Using `Seurat::AverageExpression()` to calculate pseudo-bulk data for <Assay>
#> ℹ [2025-09-20 14:06:29] Perform nonlinear dimension reduction (umap)
#> ℹ [2025-09-20 14:06:29] Non-linear dimensionality reduction (umap) using (Seuratpca) dims (1-12) as input
#> ℹ [2025-09-20 14:06:34] Non-linear dimensionality reduction (umap) using (Seuratpca) dims (1-12) as input
#> ✔ [2025-09-20 14:06:41] Run Seurat integration done
CellDimPlot(panc8_sub, group.by = c("tech", "celltype"))
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.
if (FALSE) { # \dontrun{
  # Pass extra arguments to the anchor-finding step (here, reciprocal PCA).
  panc8_sub <- integration_scop(
    panc8_sub,
    batch = "tech",
    integration_method = "Seurat",
    FindIntegrationAnchors_params = list(reduction = "rpca")
  )
  CellDimPlot(panc8_sub, group.by = c("tech", "celltype"))

  # Run each supported integration method in turn and plot its 2D UMAP.
  integration_methods <- c(
    "Uncorrected", "Seurat", "scVI", "MNN", "fastMNN", "Harmony",
    "Scanorama", "BBKNN", "CSS", "LIGER", "Conos", "ComBat"
  )
  for (method in integration_methods) {
    panc8_sub <- integration_scop(
      panc8_sub,
      batch = "tech",
      integration_method = method,
      linear_reduction_dims_use = 1:50,
      nonlinear_reduction = "umap"
    )
    print(
      CellDimPlot(
        panc8_sub,
        group.by = c("tech", "celltype"),
        reduction = paste0(method, "UMAP2D"),
        xlab = "", ylab = "", title = method,
        legend.position = "none", theme_use = "theme_blank"
      )
    )
  }

  # Compute several nonlinear reductions in one call, then plot each one.
  nonlinear_reductions <- c(
    "umap", "tsne", "dm", "phate",
    "pacmap", "trimap", "largevis", "fr"
  )
  panc8_sub <- integration_scop(
    panc8_sub,
    batch = "tech",
    integration_method = "Seurat",
    linear_reduction_dims_use = 1:50,
    nonlinear_reduction = nonlinear_reductions
  )
  for (nr in nonlinear_reductions) {
    print(
      CellDimPlot(
        panc8_sub,
        group.by = c("tech", "celltype"),
        reduction = paste0("Seurat", nr, "2D"),
        xlab = "", ylab = "", title = nr,
        legend.position = "none", theme_use = "theme_blank"
      )
    )
  }
} # }
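The `scale_within_batch = TRUE` run shown earlier in these examples scales data separately within each batch. The base-R sketch below illustrates only the centering and scaling idea on a toy matrix. It is not how the package or `ScaleData` is implemented, and `expr`, `batch`, and `scale_rows()` are made-up names and values introduced for illustration.

```r
# Toy expression matrix: 2 genes x 6 cells with a batch offset (made-up numbers).
expr <- rbind(
  geneA = c(1, 2, 3, 11, 12, 13),
  geneB = c(5, 5, 8, 2, 4, 6)
)
colnames(expr) <- paste0("cell", 1:6)
batch <- c("b1", "b1", "b1", "b2", "b2", "b2")

# Z-score each gene (row) across the cells it is given.
# (Illustrative helper; the name is an assumption.)
scale_rows <- function(m) t(scale(t(m)))

# Global scaling: one mean and sd per gene over all cells.
scaled_global <- scale_rows(expr)

# Within-batch scaling: one mean and sd per gene per batch.
scaled_within <- expr
for (b in unique(batch)) {
  idx <- batch == b
  scaled_within[, idx] <- scale_rows(expr[, idx, drop = FALSE])
}

# Per-batch means of geneA under each scheme.
global_means <- tapply(scaled_global["geneA", ], batch, mean)
within_means <- tapply(scaled_within["geneA", ], batch, mean)

print(global_means)
print(within_means)
```

After global scaling the two batches keep different mean levels for `geneA`, whereas within-batch scaling centers each batch at zero, which removes a pure per-batch shift before the linear reduction step.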