This function calculates gene-set scores from the specified database (db) for each lineage using the specified scoring method (score_method).
It then treats these scores as expression values and passes them to RunDynamicFeatures() to identify terms that are dynamically enriched along each lineage.
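The base-R sketch below shows what "treating scores as expression values" means. It turns a hypothetical gene-by-cell matrix into a term-by-cell score matrix using a simple mean z-score per gene set. Each row of the result can then be fitted along pseudotime in the same way a gene would be. The data, the gene-set names and the scoring formula are illustrative assumptions. They do not reproduce scop's implementation of any score_method.

```r
# Hypothetical toy data: 6 genes x 5 cells standing in for normalized expression
set.seed(11)
expr <- matrix(rexp(30), nrow = 6,
               dimnames = list(paste0("g", 1:6), paste0("cell", 1:5)))

# Hypothetical gene sets (term -> member genes)
gene_sets <- list(
  TermA = c("g1", "g2", "g3"),
  TermB = c("g4", "g5", "g6")
)

# Z-score each gene across cells, then average the z-scores within each gene set.
# This is a simplified "zscore"-style score, not scop's exact code.
z <- t(scale(t(expr)))
scores <- t(vapply(gene_sets,
                   function(genes) colMeans(z[genes, , drop = FALSE]),
                   numeric(ncol(expr))))

# `scores` is a term x cell matrix: rows play the role genes play downstream
print(round(scores, 3))
```

Every gene is centred before averaging, so each term's scores sum to approximately zero across cells.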
Usage
RunDynamicEnrichment(
  srt,
  lineages,
  score_method = "AUCell",
  layer = "data",
  assay = NULL,
  min_expcells = 20,
  r.sq = 0.2,
  dev.expl = 0.2,
  padjust = 0.05,
  IDtype = "symbol",
  species = "Homo_sapiens",
  db = "GO_BP",
  db_update = FALSE,
  db_version = "latest",
  convert_species = TRUE,
  Ensembl_version = NULL,
  mirror = NULL,
  features = NULL,
  TERM2GENE = NULL,
  TERM2NAME = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  backend = c("cpp", "r"),
  cpp_strategy = c("sparse", "topk", "full"),
  cores = 1,
  verbose = TRUE,
  seed = 11,
  ...
)
Arguments
- srt
A Seurat object or SummarizedExperiment object containing lineage (pseudotime) results, for example those added by RunSlingshot() as in the examples.
- lineages
A character vector specifying the lineage names for which dynamic features should be calculated.
- score_method
The method to use for scoring. Can be "Seurat", "AUCell", "UCell", "GSVA", "ssGSEA", "zscore", "PLAGE", or "VISION". Multiple methods can be supplied at once; each method will be written to a method-suffixed assay before dynamic-feature fitting. Default is "AUCell".
- layer
Which layer to use. Default is "data".
- assay
Which assay to use. If NULL, the default assay of the Seurat object will be used. When the object also contains a ChromatinAssay, the default assay and the additional ChromatinAssay will be preprocessed sequentially.
- min_expcells
The minimum number of expected cells. Default is 20.
- r.sq
The R-squared threshold. Default is 0.2.
- dev.expl
The deviance explained threshold. Default is 0.2.
- padjust
The adjusted p-value threshold. Default is 0.05.
- IDtype
A character vector specifying the type of gene IDs in the srt object. Default is "symbol".
- species
A character vector specifying the species for which the gene annotation databases should be prepared. Can be "Homo_sapiens" or "Mus_musculus". Default is "Homo_sapiens".
- db
A character vector specifying the annotation sources to be included in the gene annotation databases. Can be one or more of "GO", "GO_BP", "GO_CC", "GO_MF", "KEGG", "WikiPathway", "Reactome", "CORUM", "MP", "DO", "HPO", "PFAM", "CSPA", "Surfaceome", "SPRomeDB", "VerSeDa", "TFLink", "hTFtarget", "TRRUST", "JASPAR", "ENCODE", "MSigDB", "CellTalk", "CellChat", "Chromosome", "GeneType", "Enzyme", "TF", "CytoTRACE2". MSigDB subcollections can be requested as "MSigDB_<collection>", such as "MSigDB_H" for human Hallmark and "MSigDB_MH" for mouse Hallmark. Default is "GO_BP". Note: "CytoTRACE2" is species-independent and downloads pre-trained model data required by RunCytoTRACE.
- db_update
Whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.
- db_version
A character vector specifying the version of the gene annotation databases to be retrieved. Default is "latest".
- convert_species
Whether to use a species-converted database when the annotation is missing for the specified species. Default is TRUE.
- Ensembl_version
An integer specifying the Ensembl version. Default is NULL. If NULL, the latest version will be used.
- mirror
Specify an Ensembl mirror to connect to. The valid options here are "www", "uswest", "useast", "asia".
- features
A named list of feature lists for custom enrichment gene sets. If provided, it takes precedence over TERM2GENE and db.
- TERM2GENE
A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.
- TERM2NAME
A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
- minGSSize
The minimum size of a gene set to be considered in the enrichment analysis. Default is 10.
- maxGSSize
The maximum size of a gene set to be considered in the enrichment analysis. Default is 500.
- backend
Computation backend, either "cpp" (the default) or "r".
- cpp_strategy
C++ AUCell ranking strategy. "sparse" (the default) ranks non-zero genes and approximates zero ties, "topk" ranks only genes that can contribute to the AUCell AUC, and "full" ranks all genes.
- cores
The number of cores to use for parallelization with foreach::foreach. Default is 1.
- verbose
Whether to print progress messages. Default is TRUE.
- seed
Random seed for reproducibility. Default is 11.
- ...
Passed to other functions.
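The r.sq, dev.expl and padjust thresholds filter a smooth fit of each term score against pseudotime. The sketch below makes three assumptions: a generalized additive model is fitted with mgcv::gam(), whose summary() reports r.sq (adjusted R-squared) and dev.expl (deviance explained); the smooth-term p-values are adjusted with Benjamini-Hochberg; and each filter uses a strict comparison. The model form, the adjustment method, the comparison directions and the simulated scores are illustrative assumptions. They are not confirmed details of RunDynamicFeatures().

```r
library(mgcv)  # recommended package; ships with standard R installations

set.seed(11)
n <- 200
pseudotime <- sort(runif(n, min = 0, max = 10))

# Simulated (hypothetical) term scores: one rises along pseudotime, one is noise
scores <- rbind(
  TermUp    = plogis(pseudotime - 5) + rnorm(n, sd = 0.1),
  TermNoise = rnorm(n, sd = 0.1)
)

# Fit score ~ s(pseudotime) for each term and collect the fit statistics
fit_stats <- t(apply(scores, 1, function(y) {
  sm <- summary(gam(y ~ s(pseudotime, k = 5)))
  c(r.sq = sm$r.sq, dev.expl = sm$dev.expl, pvalue = sm$s.table[1, "p-value"])
}))
fit_stats <- as.data.frame(fit_stats)

# Assumption: Benjamini-Hochberg adjustment across terms
fit_stats$padjust <- p.adjust(fit_stats$pvalue, method = "BH")

# Keep terms passing all three thresholds, using the documented defaults
dynamic_terms <- rownames(fit_stats)[
  fit_stats$r.sq > 0.2 & fit_stats$dev.expl > 0.2 & fit_stats$padjust < 0.05
]
print(dynamic_terms)
```

Raising r.sq or dev.expl keeps only terms whose scores follow pseudotime more closely. Lowering padjust tightens the significance requirement.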
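For custom gene sets, the features list and the TERM2GENE / TERM2NAME data frames hold the same information in two shapes. The base-R sketch below builds both data frames from a hypothetical named list. The term IDs, term names and gene lists are made up for illustration. The genes are mouse symbols, to match species = "Mus_musculus" in the examples.

```r
# Hypothetical custom gene sets as a named list (the shape described for `features`)
features <- list(
  CUSTOM_CELL_CYCLE = c("Mki67", "Top2a", "Cdk1"),
  CUSTOM_BETA_CELL  = c("Ins1", "Ins2", "Iapp")
)

# TERM2GENE: first column = term ID, second column = gene ID (one row per pair)
TERM2GENE <- data.frame(
  term = rep(names(features), lengths(features)),
  gene = unlist(features, use.names = FALSE),
  stringsAsFactors = FALSE
)

# TERM2NAME: first column = term ID, second column = readable term name
TERM2NAME <- data.frame(
  term = names(features),
  name = c("Custom cell-cycle genes", "Custom beta-cell genes"),
  stringsAsFactors = FALSE
)
```

These data frames would be supplied as TERM2GENE = TERM2GENE and TERM2NAME = TERM2NAME. Alternatively, the list itself can be passed as features = features. According to the features description, that option takes precedence over TERM2GENE and db.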
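minGSSize and maxGSSize filter gene sets by size. The sketch below makes two assumptions: sizes are counted after intersecting each set with the genes present in the data, and both bounds are inclusive. The gene sets and the measured-gene list are hypothetical.

```r
# Hypothetical gene sets and the genes measured in the object
gene_sets <- list(
  tiny = c("g1", "g2"),
  ok   = paste0("g", 1:12),
  huge = paste0("g", 1:600)
)
measured_genes <- paste0("g", 1:1000)

minGSSize <- 10
maxGSSize <- 500

# Assumption: count only genes that are actually measured, with inclusive bounds
sizes <- lengths(lapply(gene_sets, intersect, measured_genes))
kept <- gene_sets[sizes >= minGSSize & sizes <= maxGSSize]
names(kept)  # returns "ok"
```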
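The cpp_strategy description notes that "topk" ranks only the genes that can contribute to the AUCell AUC. The base-R sketch below shows why that is enough. It computes a simplified AUCell-style score for one hypothetical cell: the area under the recovery curve of a gene set, truncated at the top max_rank genes and normalized by the best achievable area. It computes this score twice, once from a full ranking and once from a ranking of only the top max_rank genes, and both give the same value. The normalization and the top-5% cutoff (modelled on AUCell's default aucMaxRank) are assumptions. The sketch does not reproduce scop's C++ code or the "sparse" tie approximation.

```r
# Simplified AUCell-style score: area under the recovery curve of a gene set,
# counting only genes ranked within the top `max_rank` (rank 1 = highest).
auc_from_ranks <- function(set_ranks, max_rank, set_size) {
  hits <- set_ranks[set_ranks <= max_rank]
  # Sum over x = 1..max_rank of #{set genes with rank <= x}
  area <- sum(max_rank - hits + 1)
  # Best case: the set's genes occupy ranks 1, 2, ...
  best <- sum(max_rank - seq_len(min(set_size, max_rank)) + 1)
  area / best
}

set.seed(11)
expr <- setNames(rexp(2000), paste0("g", 1:2000))  # one hypothetical cell
gene_set <- paste0("g", 1:50)                       # hypothetical gene set
max_rank <- ceiling(0.05 * length(expr))            # assumed top-5% cutoff

# "full": rank every gene by decreasing expression
full_ranks <- rank(-expr, ties.method = "first")

# "topk"-style: only the top max_rank genes get a rank; all others get Inf.
# (order() sorts everything here for clarity; a real top-k would use partial selection.)
top_idx <- order(expr, decreasing = TRUE)[seq_len(max_rank)]
topk_ranks <- setNames(rep(Inf, length(expr)), names(expr))
topk_ranks[top_idx] <- seq_len(max_rank)

auc_full <- auc_from_ranks(full_ranks[gene_set], max_rank, length(gene_set))
auc_topk <- auc_from_ranks(topk_ranks[gene_set], max_rank, length(gene_set))
```

Genes below the cutoff contribute nothing to the area, so their exact order never has to be computed.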
Examples
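# Load the example mouse pancreas dataset and run the standard scop preprocessing
data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)

# Infer lineages (pseudotime) with Slingshot on the UMAP embedding
pancreas_sub <- RunSlingshot(
  pancreas_sub,
  group.by = "CellType",
  reduction = "UMAP"
)

# Gene-level dynamic features along Lineage1, for comparison
pancreas_sub <- RunDynamicFeatures(
  pancreas_sub,
  lineages = "Lineage1",
  fit_method = "pretsa",
  n_candidates = 200
)
ht1 <- DynamicHeatmap(
  pancreas_sub,
  lineages = "Lineage1",
  cell_annotation = "CellType",
  n_split = 3
)

# Score GO biological-process terms with AUCell and find terms that change along Lineage1
pancreas_sub <- RunDynamicEnrichment(
  pancreas_sub,
  lineages = "Lineage1",
  score_method = "AUCell",
  db = "GO_BP",
  species = "Mus_musculus"
)

# Plot the dynamic terms, reading scores from the "GO_BP" assay
# and the lineage results stored under "Lineage1_GO_BP"
ht2 <- DynamicHeatmap(
  pancreas_sub,
  assay = "GO_BP",
  lineages = "Lineage1_GO_BP",
  cell_annotation = "CellType",
  n_split = 3,
  split_method = "kmeans-peaktime"
)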