Predicts cellular developmental potential from single-cell RNA-seq data using the CytoTRACE 2 algorithm (Kang et al., 2025). This is a native scop implementation with C++ acceleration.
The algorithm consists of five stages:
1. Preprocessing: Gene orthology mapping, feature selection, ranking, and log2-CPM transformation.
2. GSBN Ensemble Prediction: 19 pre-trained Gene Set Binary Network models predict a continuous developmental potency score (0-1) and a discrete potency category.
3. Diffusion Smoothing: A Markov random-walk-with-restart on a cell-cell similarity graph smooths the raw scores.
4. Binning: Within each potency category, cells are ranked and linearly scaled to corresponding segments of the unit interval.
5. Adaptive kNN Smoothing: PCA-based nearest-neighbor consensus refinement of the final scores.
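The diffusion and binning stages can be pictured with a small base R sketch. This is not the package's internal code: `rwr_smooth()` and `bin_scores()` are hypothetical helpers, the restart probability and convergence settings are illustrative values, and the equal-width segment per potency category is an assumption.

```r
# Potency categories in the order given in the Value section.
potency_levels <- c("Differentiated", "Unipotent", "Oligopotent",
                    "Multipotent", "Pluripotent", "Totipotent")

# Hypothetical helper: random walk with restart over a cell-cell similarity
# matrix. `restart`, `max_iter`, and `tol` are illustrative, not package defaults.
rwr_smooth <- function(score, similarity, restart = 0.1,
                       max_iter = 1000, tol = 1e-8) {
  stopifnot(nrow(similarity) == length(score), all(rowSums(similarity) > 0))
  transition <- similarity / rowSums(similarity)  # each row now sums to 1
  smoothed <- score
  for (i in seq_len(max_iter)) {
    # Mix the neighbourhood average with the original (restart) scores.
    updated <- (1 - restart) * as.vector(transition %*% smoothed) + restart * score
    converged <- max(abs(updated - smoothed)) < tol
    smoothed <- updated
    if (converged) break
  }
  smoothed
}

# Hypothetical helper: rank cells within each category and rescale the ranks
# into that category's segment of [0, 1]. Equal-width segments are an assumption.
bin_scores <- function(score, potency, levels = potency_levels) {
  stopifnot(length(score) == length(potency), all(potency %in% levels))
  n_levels <- length(levels)
  binned <- numeric(length(score))
  for (k in seq_along(levels)) {
    idx <- which(potency == levels[k])
    if (length(idx) == 0) next
    lower <- (k - 1) / n_levels
    upper <- k / n_levels
    if (length(idx) == 1) {
      # A single cell has no rank spread; place it at the segment midpoint.
      binned[idx] <- (lower + upper) / 2
    } else {
      r <- rank(score[idx], ties.method = "average")
      binned[idx] <- lower + (r - 1) / (length(idx) - 1) * (upper - lower)
    }
  }
  binned
}

# Toy inputs, made up for illustration only.
raw_score <- c(0.10, 0.30, 0.20, 0.90, 0.70)
potency <- c("Differentiated", "Differentiated", "Differentiated",
             "Pluripotent", "Pluripotent")
similarity <- matrix(1, nrow = 5, ncol = 5)  # fully connected toy graph

smoothed <- rwr_smooth(raw_score, similarity)
binned <- bin_scores(smoothed, potency)
```

Ranking within a category keeps the relative order of cells while guaranteeing that every cell's final score falls inside the interval assigned to its predicted category.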
Usage
RunCytoTRACE(object, ...)

# S3 method for class 'Seurat'
RunCytoTRACE(
  object,
  assay = NULL,
  layer = c("counts", "data"),
  species = c("Homo_sapiens", "Mus_musculus"),
  batch_size = 10000,
  smooth_batch_size = 1000,
  cores = 1,
  seed = 14,
  data_dir = NULL,
  verbose = TRUE,
  ...
)

# Default S3 method
RunCytoTRACE(
  object,
  species = c("Homo_sapiens", "Mus_musculus"),
  batch_size = 10000,
  smooth_batch_size = 1000,
  cores = 1,
  seed = 14,
  data_dir = NULL,
  verbose = TRUE,
  ...
)
Arguments
- object
An object. This can be a Seurat object or a matrix-like object (genes as rows, cells as columns).
- ...
Additional arguments (reserved for future use).
- assay
Which assay to use. If NULL, the default assay of the Seurat object will be used. When the object also contains a ChromatinAssay, the default assay and the additional ChromatinAssay will be preprocessed sequentially.
- layer
Which layer to use. Default is "counts".
- species
The species of the input data. Currently supported values are "Homo_sapiens" and "Mus_musculus". Default is "Homo_sapiens".
- batch_size
The number of cells to process at once. For datasets with more cells than this value, cells are randomly split into batches and processed independently. If NULL, no batching is performed. Default is 10000.
- smooth_batch_size
The number of cells per subsample for the diffusion smoothing step. If NULL, no diffusion subsampling is performed. Default is 1000.
- cores
Number of cores for parallel processing. Default is 1.
- seed
Random seed for reproducibility. Default is 14.
- data_dir
Path to the directory containing CytoTRACE2 model data files. If NULL, uses data prepared by PrepareDB(), the user data cache, or auto-downloads from the datasets repository. Default is NULL.
- verbose
Whether to print progress messages. Default is TRUE.
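The interaction between batch_size and seed can be illustrated with a short base R sketch. It is not taken from the package source: `split_into_batches()` is a hypothetical helper, and the round-robin assignment after shuffling is an assumption about how batches might be balanced.

```r
# Hypothetical helper: randomly partition cell indices into batches of at most
# `batch_size` cells. Returns a list of integer index vectors.
split_into_batches <- function(n_cells, batch_size = 10000, seed = 14) {
  # No batching when batch_size is NULL or the dataset already fits in one batch.
  if (is.null(batch_size) || n_cells <= batch_size) {
    return(list(seq_len(n_cells)))
  }
  set.seed(seed)  # note: this changes the global RNG state
  shuffled <- sample.int(n_cells)
  n_batches <- ceiling(n_cells / batch_size)
  # Round-robin assignment keeps batch sizes within one cell of each other.
  batch_id <- rep(seq_len(n_batches), length.out = n_cells)
  unname(split(shuffled, batch_id))
}

batches <- split_into_batches(25000, batch_size = 10000)
lengths(batches)
```

Because each batch is processed independently, fixing the seed makes the partition, and therefore the per-batch results, repeatable across runs.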
Value
When the input is a Seurat object, the function returns a Seurat object with the following metadata columns added:
- CytoTRACE2_Score: The final predicted cellular potency score (0-1)
- CytoTRACE2_Potency: The final predicted cellular potency category (Differentiated, Unipotent, Oligopotent, Multipotent, Pluripotent, Totipotent)
- CytoTRACE2_Relative: The predicted relative order (normalized to 0-1)
- preKNN_CytoTRACE2_Score: The potency score before KNN smoothing
- preKNN_CytoTRACE2_Potency: The potency category before KNN smoothing
When the input is a matrix or data.frame, the function returns a data.frame with the same columns as above, with cell IDs as row names.
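A returned data.frame can be summarised per potency category with a few lines of base R. The sketch below relies only on the column names documented above; `summarise_potency()` is a hypothetical helper, and the toy data.frame contains made-up values purely to show the shape of the input.

```r
# Hypothetical helper: count cells and average the score per potency category,
# keeping the categories in the order listed in the Value section.
summarise_potency <- function(res) {
  potency_levels <- c("Differentiated", "Unipotent", "Oligopotent",
                      "Multipotent", "Pluripotent", "Totipotent")
  potency <- factor(res$CytoTRACE2_Potency, levels = potency_levels)
  data.frame(
    potency = potency_levels,
    n_cells = as.vector(table(potency)),
    # Categories without cells get NA as their mean score.
    mean_score = as.vector(tapply(res$CytoTRACE2_Score, potency, mean))
  )
}

# Toy stand-in for a RunCytoTRACE() result; values are invented for illustration.
toy_result <- data.frame(
  CytoTRACE2_Score = c(0.05, 0.10, 0.70),
  CytoTRACE2_Potency = c("Differentiated", "Differentiated", "Pluripotent"),
  row.names = c("cell_1", "cell_2", "cell_3")
)

potency_summary <- summarise_potency(toy_result)
```

Converting the category column to a factor with explicit levels keeps empty categories visible in the summary instead of silently dropping them.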
License
The CytoTRACE 2 model and associated data files are provided under the Stanford Non-Commercial Software License Agreement. Commercial entities wishing to use this software should contact Stanford University's Office of Technology Licensing (docket S24-057). See https://github.com/mengxu98/datasets/blob/main/CytoTRACE2/LICENSE for complete terms.
References
Kang, M., Brown, E., Almagro Armenteros, J.J. et al. "Improved reconstruction of single-cell developmental potential with CytoTRACE 2." Nature Methods (2025). doi:10.1038/s41592-025-02857-2
Model data: https://github.com/mengxu98/datasets/tree/main/CytoTRACE2
Examples
data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)
#> ℹ [2026-05-14 07:02:50] Start standard processing workflow...
#> ℹ [2026-05-14 07:02:52] Checking a list of <Seurat>...
#> ! [2026-05-14 07:02:52] Data 1/1 of the `srt_list` is "unknown"
#> ℹ [2026-05-14 07:02:52] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on 1/1 of `srt_list`...
#> ℹ [2026-05-14 07:02:54] Perform `Seurat::FindVariableFeatures()` on 1/1 of `srt_list`...
#> ℹ [2026-05-14 07:02:54] Use the separate HVF from `srt_list`
#> ℹ [2026-05-14 07:02:54] Number of available HVF: 2000
#> ℹ [2026-05-14 07:02:54] Finished check
#> ℹ [2026-05-14 07:02:54] Perform `Seurat::ScaleData()`
#> ℹ [2026-05-14 07:02:55] Perform pca linear dimension reduction
#> ℹ [2026-05-14 07:02:55] Use stored estimated dimensions 1:20 for Standardpca
#> ℹ [2026-05-14 07:02:56] Perform `Seurat::FindClusters()` with `cluster_algorithm = 'louvain'` and `cluster_resolution = 0.6`
#> ℹ [2026-05-14 07:02:56] Reorder clusters...
#> ℹ [2026-05-14 07:02:56] Skip `log1p()` because `layer = data` is not "counts"
#> ℹ [2026-05-14 07:02:56] Perform umap nonlinear dimension reduction
#> ℹ [2026-05-14 07:02:56] Perform umap nonlinear dimension reduction using Standardpca (1:20)
#> ℹ [2026-05-14 07:03:01] Perform umap nonlinear dimension reduction using Standardpca (1:20)
#> ✔ [2026-05-14 07:03:06] Standard processing workflow completed
pancreas_sub <- RunCytoTRACE(
  pancreas_sub,
  species = "Mus_musculus"
)
#> ◌ [2026-05-14 07:03:06] Running CytoTRACE2
#> ℹ [2026-05-14 07:03:06] Extracting expression matrix from `assay = RNA, layer = counts`
#> ℹ [2026-05-14 07:03:06] Loading model from /home/runner/.local/share/R/scop/CytoTRACE2
#> ℹ [2026-05-14 07:03:09] Dataset contains 15998 genes and 1000 cells.
#> ℹ [2026-05-14 07:03:09] Running on 1 subsample(s)
#> ℹ [2026-05-14 07:03:09] Preprocessing subsample (1000 cells)
#> ℹ [2026-05-14 07:03:10] 12486 input genes mapped to model genes.
#> ℹ [2026-05-14 07:03:13] Running ensemble prediction and postprocessing
#> ℹ [2026-05-14 07:03:13] Computing PCA for kNN smoothing
#> ✔ [2026-05-14 07:03:46] CytoTRACE2 computed successfully
CytoTRACEPlot(
  pancreas_sub,
  xlab = "UMAP_1",
  ylab = "UMAP_2",
  ncol = 2
)