Run cell-level quality control for single cell RNA-seq data.

This function handles multiple quality control methods for single-cell RNA-seq data.

Usage

RunCellQC(
  srt,
  assay = "RNA",
  split.by = NULL,
  return_filtered = FALSE,
  qc_metrics = c("doublets", "outlier", "umi", "gene", "mito", "ribo", "ribo_mito_ratio",
    "species"),
  db_method = "scDblFinder",
  db_rate = NULL,
  db_coefficient = 0.01,
  outlier_threshold = c("log10_nCount:lower:2.5", "log10_nCount:higher:5",
    "log10_nFeature:lower:2.5", "log10_nFeature:higher:5", "featurecount_dist:lower:2.5"),
  outlier_n = 1,
  UMI_threshold = 3000,
  gene_threshold = 1000,
  mito_threshold = 20,
  mito_pattern = c("MT-", "Mt-", "mt-"),
  mito_gene = NULL,
  ribo_threshold = 50,
  ribo_pattern = c("RP[SL]\\d+\\w{0,1}\\d*$", "Rp[sl]\\d+\\w{0,1}\\d*$",
    "rp[sl]\\d+\\w{0,1}\\d*$"),
  ribo_gene = NULL,
  ribo_mito_ratio_range = c(1, Inf),
  species = NULL,
  species_gene_prefix = NULL,
  species_percent = 95,
  seed = 11
)

Arguments

srt: A Seurat object.
assay: The name of the assay to be used for doublet-calling. Default is "RNA".
split.by: Name of the sample variable to split the Seurat object. Default is NULL.
return_filtered: Logical indicating whether to return a cell-filtered Seurat object. Default is FALSE.
qc_metrics: A character vector specifying the quality control metrics to be applied. Default is c("doublets", "outlier", "umi", "gene", "mito", "ribo", "ribo_mito_ratio", "species").
db_method: Doublet-calling methods used. Can be one of scDblFinder, Scrublet, DoubletDetection, scds_cxds, scds_bcds, scds_hybrid
db_rate: The expected doublet rate. By default this is assumed to be 1% per thousand cells captured (so 4% among 4000 thousand cells), which is appropriate for 10x datasets.
db_coefficient: The coefficient used to calculate the doublet rate. Default is 0.01. Doublet rate is calculated asncol(srt) / 1000 * db_coefficient
outlier_threshold: A character vector specifying the outlier threshold. Default is c("log10_nCount:lower:2.5", "log10_nCount:higher:5", "log10_nFeature:lower:2.5", "log10_nFeature:higher:5", "featurecount_dist:lower:2.5"). See scuttle::isOutlier.
outlier_n: Minimum number of outlier metrics that meet the conditions for determining outlier cells. Default is 1.
UMI_threshold: UMI number threshold. Cells that exceed this threshold will be considered as kept. Default is 3000.
gene_threshold: Gene number threshold. Cells that exceed this threshold will be considered as kept. Default is 1000.
mito_threshold: Percentage of UMI counts of mitochondrial genes. Cells that exceed this threshold will be considered as discarded. Default is 20.
mito_pattern: Regex patterns to match the mitochondrial genes. Default is c("MT-", "Mt-", "mt-").
mito_gene: A defined mitochondrial genes. If features provided, will ignore the mito_pattern matching. Default is NULL.
ribo_threshold: Percentage of UMI counts of ribosomal genes. Cells that exceed this threshold will be considered as discarded. Default is 50.
ribo_pattern: Regex patterns to match the ribosomal genes. Default is c("RP[SL]\\d+\\w{0,1}\\d*$", "Rp[sl]\\d+\\w{0,1}\\d*$", "rp[sl]\\d+\\w{0,1}\\d*$").
ribo_gene: A defined ribosomal genes. If features provided, will ignore the ribo_pattern matching. Default is NULL.
ribo_mito_ratio_range: A numeric vector specifying the range of ribosomal/mitochondrial gene expression ratios for ribo_mito_ratio outlier cells. Default is c(1, Inf).
species: Species used as the suffix of the QC metrics. The first is the species of interest. Default is NULL.
species_gene_prefix: Species gene prefix used to calculate QC metrics for each species. Default is NULL.
species_percent: Percentage of UMI counts of the first species. Cells that exceed this threshold will be considered as kept. Default is 95.
seed: Set a random seed. Default is 11.

Value

Returns Seurat object with the QC results stored in the meta.data layer.

Examples

data(pancreas_sub)
pancreas_sub <- RunCellQC(pancreas_sub)
#> Warning: `PackageCheck()` was deprecated in SeuratObject 5.0.0.
#> ℹ Please use `rlang::check_installed()` instead.
#> ℹ The deprecated feature was likely used in the Seurat package.
#>   Please report the issue at <https://github.com/satijalab/seurat/issues>.
#> ℹ [2025-07-26 07:20:54] >>> Total cells: 1000
#> ℹ [2025-07-26 07:20:54] >>> Cells which are filtered out: 48
#> ℹ [2025-07-26 07:20:54] >>> 31 potential doublets
#> ℹ [2025-07-26 07:20:54] >>> 17 outlier cells
#> ℹ [2025-07-26 07:20:54] >>> 0low-UMI cells
#> ℹ [2025-07-26 07:20:54] >>> 0low-gene cells
#> ℹ [2025-07-26 07:20:54] >>> 0high-mito cells
#> ℹ [2025-07-26 07:20:54] >>> 0high-ribo cells
#> ℹ [2025-07-26 07:20:54] >>> 0ribo_mito_ratio outlier cells
#> ℹ [2025-07-26 07:20:54] >>> 0species-contaminated cells
#> ℹ [2025-07-26 07:20:54] >>> Remained cells after filtering: 952
CellStatPlot(
  srt = pancreas_sub,
  stat.by = c(
    "db_qc", "outlier_qc",
    "umi_qc", "gene_qc",
    "mito_qc", "ribo_qc",
    "ribo_mito_ratio_qc", "species_qc"
  ),
  plot_type = "upset",
  stat_level = "Fail"
)
#> ! [2025-07-26 07:20:55] Stat_type is forcibly set to 'count' when plot sankey, chord, venn or upset
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?

table(pancreas_sub$CellQC)
#> 
#> Pass Fail 
#>  952   48 

data("ifnb_sub")
ifnb_sub <- RunCellQC(
  srt = ifnb_sub,
  split.by = "stim",
  UMI_threshold = 1000,
  gene_threshold = 550
)
#> ℹ [2025-07-26 07:20:55] Running QC for CTRL
#> ℹ [2025-07-26 07:21:02] >>> Total cells: 1000
#> ℹ [2025-07-26 07:21:02] >>> Cells which are filtered out: 308
#> ℹ [2025-07-26 07:21:02] >>> 47 potential doublets
#> ℹ [2025-07-26 07:21:02] >>> 8 outlier cells
#> ℹ [2025-07-26 07:21:02] >>> 28low-UMI cells
#> ℹ [2025-07-26 07:21:02] >>> 250low-gene cells
#> ℹ [2025-07-26 07:21:02] >>> 0high-mito cells
#> ℹ [2025-07-26 07:21:02] >>> 0high-ribo cells
#> ℹ [2025-07-26 07:21:02] >>> 0ribo_mito_ratio outlier cells
#> ℹ [2025-07-26 07:21:02] >>> 0species-contaminated cells
#> ℹ [2025-07-26 07:21:02] >>> Remained cells after filtering: 692
#> ℹ [2025-07-26 07:21:02] Running QC for STIM
#> ℹ [2025-07-26 07:21:09] >>> Total cells: 1000
#> ℹ [2025-07-26 07:21:09] >>> Cells which are filtered out: 302
#> ℹ [2025-07-26 07:21:09] >>> 41 potential doublets
#> ℹ [2025-07-26 07:21:09] >>> 12 outlier cells
#> ℹ [2025-07-26 07:21:09] >>> 25low-UMI cells
#> ℹ [2025-07-26 07:21:09] >>> 251low-gene cells
#> ℹ [2025-07-26 07:21:09] >>> 0high-mito cells
#> ℹ [2025-07-26 07:21:09] >>> 0high-ribo cells
#> ℹ [2025-07-26 07:21:09] >>> 0ribo_mito_ratio outlier cells
#> ℹ [2025-07-26 07:21:09] >>> 0species-contaminated cells
#> ℹ [2025-07-26 07:21:09] >>> Remained cells after filtering: 698
CellStatPlot(
  srt = ifnb_sub,
  stat.by = c(
    "db_qc", "outlier_qc",
    "umi_qc", "gene_qc",
    "mito_qc", "ribo_qc",
    "ribo_mito_ratio_qc", "species_qc"
  ),
  plot_type = "upset",
  stat_level = "Fail"
)
#> ! [2025-07-26 07:21:09] Stat_type is forcibly set to 'count' when plot sankey, chord, venn or upset

table(ifnb_sub$CellQC)
#> 
#> Pass Fail 
#> 1390  610