This function performs cell scoring on a Seurat object.
It calculates scores for a given set of features and stores them in meta.data
and/or a score assay, depending on new_assay and store_metadata.
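To make the idea of a per-cell score concrete, here is a minimal base-R sketch of a combined z-score (sum of per-gene z-scores divided by the square root of the set size). The helper zscore_set and the toy matrix are hypothetical and are not part of the package. Whether method = "zscore" computes exactly this quantity is an assumption.

```r
# Toy expression matrix: genes in rows, cells in columns (made-up values)
set.seed(11)
expr <- matrix(
  rnorm(6 * 4),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4))
)

# Hypothetical helper (not a package function): combined z-score of one
# gene set, returning one value per cell
zscore_set <- function(expr, genes) {
  # Use only the genes that are present in the matrix
  genes <- intersect(genes, rownames(expr))
  # Standardize each gene across cells; scale() works column-wise, so
  # transpose before and after. Genes with zero variance would give NaN.
  z <- t(scale(t(expr[genes, , drop = FALSE])))
  # Sum the z-scores per cell and divide by sqrt(set size)
  colSums(z) / sqrt(length(genes))
}

scores <- zscore_set(expr, c("gene1", "gene2", "gene3"))
```

The result is a named numeric vector with one entry per cell, which is the shape of data that ends up as a meta.data column or as a row of a score assay.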
Usage
CellScoring(
  srt,
  features = NULL,
  layer = "data",
  assay = NULL,
  split.by = NULL,
  IDtype = "symbol",
  species = "Homo_sapiens",
  db = "GO_BP",
  termnames = NULL,
  db_update = FALSE,
  db_version = "latest",
  convert_species = TRUE,
  Ensembl_version = NULL,
  mirror = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  method = "Seurat",
  backend = c("cpp", "r"),
  cpp_strategy = c("sparse", "topk", "full"),
  classification = TRUE,
  name = "",
  new_assay = FALSE,
  store_metadata = NULL,
  seed = 11,
  cores = 1,
  verbose = TRUE,
  ...
)
Arguments
- srt
A Seurat object.
- features
A named list of feature lists for scoring. If NULL, db will be used to create feature sets.
- layer
Which layer to use. Default is "data".
- assay
Which assay to use. If NULL, the default assay of the Seurat object will be used. When the object also contains a ChromatinAssay, the default assay and the additional ChromatinAssay will be preprocessed sequentially.
- split.by
Name of a column in meta.data to split by. Default is NULL.
- IDtype
A character vector specifying the type of gene IDs in the srt object or geneID argument. This argument is used to convert the gene IDs to a different type if IDtype is different from result_IDtype.
- species
A character vector specifying the species for which the gene annotation databases should be prepared. Can be "Homo_sapiens" or "Mus_musculus".
- db
A character vector specifying the annotation sources to be included in the gene annotation databases. Can be one or more of "GO", "GO_BP", "GO_CC", "GO_MF", "KEGG", "WikiPathway", "Reactome", "CORUM", "MP", "DO", "HPO", "PFAM", "CSPA", "Surfaceome", "SPRomeDB", "VerSeDa", "TFLink", "hTFtarget", "TRRUST", "JASPAR", "ENCODE", "MSigDB", "CellTalk", "CellChat", "Chromosome", "GeneType", "Enzyme", "TF", "CytoTRACE2". MSigDB subcollections can be requested as "MSigDB_<collection>", such as "MSigDB_H" for human Hallmark and "MSigDB_MH" for mouse Hallmark. Note: "CytoTRACE2" is species-independent and downloads pre-trained model data required by RunCytoTRACE.
- termnames
A vector of term names to be used from the database. Default is NULL, in which case all features from the database are used.
- db_update
Whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.
- db_version
A character vector specifying the version of the gene annotation databases to be retrieved. Default is "latest".
- convert_species
Whether to use a species-converted database when the annotation is missing for the specified species. Default is TRUE.
- Ensembl_version
An integer specifying the Ensembl version. Default is NULL. If NULL, the latest version will be used.
- mirror
Specify an Ensembl mirror to connect to. The valid options here are "www", "uswest", "useast", and "asia".
- minGSSize
The minimum size of a gene set to be considered for scoring. Default is 10.
- maxGSSize
The maximum size of a gene set to be considered for scoring. Default is 500.
- method
The method to use for scoring. Can be "Seurat", "AUCell", "UCell", "GSVA", "ssGSEA", "zscore", "PLAGE", or "VISION". Multiple methods can be supplied at once; in that case each method is run separately and stored with a method suffix such as "GO_AUCell" or "GO_GSVA". Default is "Seurat".
- backend
Scoring backend. "cpp" is the default for supported methods. "r" uses the original package implementation. "cpp" currently supports method = "Seurat", method = "AUCell", method = "GSVA", method = "ssGSEA", method = "zscore", and method = "PLAGE". method = "UCell" and method = "VISION" fall back to "r" when backend is not explicitly set.
- cpp_strategy
C++ AUCell ranking strategy. "sparse" ranks non-zero genes and approximates zero ties, "topk" ranks only genes that can contribute to AUCell AUC, and "full" ranks all genes.
- classification
Whether to perform classification based on the scores. Default is TRUE.
- name
The name of the assay to store the scores in. Only used if new_assay is TRUE. Default is "".
- new_assay
Whether to create a new assay for storing the scores. Default is FALSE.
- store_metadata
Whether to also store score columns in meta.data. When NULL, manual features = list(...) input is stored in meta.data by default, while database-derived results stay assay-only when new_assay = TRUE.
- seed
Random seed for reproducibility. Default is 11.
- cores
The number of cores to use for parallelization with foreach::foreach. Default is 1.
- verbose
Whether to print messages. Default is TRUE.
- ...
Additional arguments to be passed to the scoring methods.
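The effect of minGSSize and maxGSSize can be pictured with a small base-R sketch. The helper filter_gene_sets and the toy gene names are hypothetical and are not part of the package. Counting set size after intersecting with the genes present in the object is an assumption, not something this page states.

```r
# Hypothetical helper (not a package function): keep gene sets whose size
# falls within [minGSSize, maxGSSize]
filter_gene_sets <- function(gene_sets, universe,
                             minGSSize = 10, maxGSSize = 500) {
  # Assumption: only genes present in the expression data count toward size
  gene_sets <- lapply(gene_sets, intersect, y = universe)
  sizes <- lengths(gene_sets)
  gene_sets[sizes >= minGSSize & sizes <= maxGSSize]
}

# Toy data: 50 measured genes and three candidate sets of different sizes
universe <- paste0("g", 1:50)
sets <- list(
  small = paste0("g", 1:3),
  ok = paste0("g", 1:12),
  big = paste0("g", 1:40)
)

kept <- filter_gene_sets(sets, universe, minGSSize = 5, maxGSSize = 20)
```

With these bounds only the 12-gene set survives. Tightening maxGSSize, as in the database example that sets maxGSSize = 100, is one way to reduce the number of terms that are scored.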
Examples
# Load the mouse pancreas example data and run the standard workflow
data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)
features_all <- rownames(pancreas_sub)
# Score two manual feature sets with AUCell; results are prefixed with "test"
pancreas_sub <- CellScoring(
  pancreas_sub,
  features = list(
    A = features_all[1:100],
    B = features_all[101:200]
  ),
  method = "AUCell",
  name = "test"
)
CellDimPlot(pancreas_sub, "test_classification")
FeatureDimPlot(
  pancreas_sub,
  features = "test_A"
)
# Run two methods at once; the scores are plotted below as AUCell_A and GSVA_A
pancreas_sub <- CellScoring(
  pancreas_sub,
  features = list(A = features_all[1:100]),
  method = c("AUCell", "GSVA")
)
FeatureStatPlot(
  pancreas_sub,
  stat.by = c("AUCell_A", "GSVA_A"),
  group.by = "CellType",
  plot.by = "feature",
  plot_type = "violin",
  stack = TRUE
)
FeatureDimPlot(
  pancreas_sub,
  features = c("AUCell_A", "GSVA_A"),
  xlab = "UMAP_1",
  ylab = "UMAP_2"
)
GroupHeatmap(
  pancreas_sub,
  features = c("AUCell_A", "GSVA_A", "Sox9", "Anxa2", "Bicc1"),
  group.by = "CellType"
)
# Human data: score GO_BP terms into a new "GO" assay, then integrate on it
data(panc8_sub)
panc8_sub <- integration_scop(
  panc8_sub,
  batch = "tech",
  integration_method = "Harmony"
)
panc8_sub <- CellScoring(
  panc8_sub,
  layer = "data",
  assay = "RNA",
  db = "GO_BP",
  species = "Homo_sapiens",
  minGSSize = 10,
  maxGSSize = 100,
  method = "AUCell",
  name = "GO",
  new_assay = TRUE
)
panc8_sub <- integration_scop(
  panc8_sub,
  assay = "GO",
  batch = "tech",
  integration_method = "Harmony"
)
CellDimPlot(
  panc8_sub,
  group.by = c("tech", "celltype")
)
# Score the same GO terms in the mouse data, then integrate both on the "GO" assay
pancreas_sub <- CellScoring(
  pancreas_sub,
  layer = "data",
  assay = "RNA",
  db = "GO_BP",
  species = "Mus_musculus",
  termnames = panc8_sub[["GO"]]@meta.features[, "termnames"],
  method = "AUCell",
  name = "GO",
  new_assay = TRUE
)
pancreas_sub <- standard_scop(
  pancreas_sub,
  assay = "GO"
)
pancreas_sub[["tech"]] <- "Mouse"
panc_merge <- integration_scop(
  srt_list = list(panc8_sub, pancreas_sub),
  assay = "GO",
  batch = "tech",
  integration_method = "Harmony"
)
CellDimPlot(
  srt = panc_merge,
  group.by = c("tech", "celltype", "SubCellType", "Phase")
)
# For comparison: re-capitalize the panc8_sub gene names and integrate on the RNA assay
genenames <- make.unique(
  thisutils::capitalize(
    rownames(panc8_sub[["RNA"]]),
    force_tolower = TRUE
  )
)
names(genenames) <- rownames(panc8_sub)
panc8_sub <- RenameFeatures(
  panc8_sub,
  newnames = genenames,
  assay = "RNA"
)
panc_merge <- integration_scop(
  srt_list = list(panc8_sub, pancreas_sub),
  assay = "RNA",
  batch = "tech",
  integration_method = "Harmony"
)
CellDimPlot(
  srt = panc_merge,
  group.by = c("tech", "celltype", "SubCellType", "Phase")
)