This function calculates gene-set scores from the specified database (db) for each lineage using the specified scoring method (score_method). It then treats these scores as expression values and uses them as input to the RunDynamicFeatures function to identify dynamically enriched terms along the lineage.
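The two-step idea (score gene sets per cell, then treat the scores as features along a lineage) can be sketched in base R. This is only an illustration under stated assumptions: the toy matrix, the gene and term names, the mean-of-members score (a stand-in for AUCell and the other score_method options), and the spline regression (a stand-in for whatever model RunDynamicFeatures fits) are all hypothetical and are not taken from the package.

```r
# Toy data: 6 genes x 50 cells ordered along a hypothetical pseudotime.
pseudotime <- seq(0, 1, length.out = 50)
expr <- rbind(
  g1 = pseudotime,           # rises along the lineage
  g2 = pseudotime^2,         # rises along the lineage
  g3 = 1 - pseudotime,       # falls along the lineage
  g4 = (1 - pseudotime)^2,   # falls along the lineage
  g5 = rep(0.5, 50),         # constant
  g6 = rep(0.2, 50)          # constant
)

# Hypothetical gene sets (terms).
gene_sets <- list(
  TermUp = c("g1", "g2"),
  TermDown = c("g3", "g4"),
  TermFlat = c("g5", "g6")
)

# Step 1: one score per term and cell.
# Assumption: mean expression of member genes, used here instead of AUCell.
scores <- t(vapply(
  gene_sets,
  function(genes) colMeans(expr[genes, , drop = FALSE]),
  numeric(ncol(expr))
))

# Step 2: treat a term's scores like an expression profile and fit a smooth
# trend along pseudotime (assumption: natural-spline regression as a stand-in).
term_score <- scores["TermUp", ]
fit <- lm(term_score ~ splines::ns(pseudotime, df = 3))
r_sq <- summary(fit)$r.squared
```

Here scores is a terms-by-cells matrix, which is the shape a feature-by-cell expression matrix would have, so the same lineage-fitting step can be applied to terms instead of genes.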

Usage

RunDynamicEnrichment(
  srt,
  lineages,
  score_method = "AUCell",
  layer = "data",
  assay = NULL,
  min_expcells = 20,
  r.sq = 0.2,
  dev.expl = 0.2,
  padjust = 0.05,
  IDtype = "symbol",
  species = "Homo_sapiens",
  db = "GO_BP",
  db_update = FALSE,
  db_version = "latest",
  convert_species = TRUE,
  Ensembl_version = NULL,
  mirror = NULL,
  features = NULL,
  TERM2GENE = NULL,
  TERM2NAME = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  backend = c("cpp", "r"),
  cpp_strategy = c("sparse", "topk", "full"),
  cores = 1,
  verbose = TRUE,
  seed = 11,
  ...
)

Arguments

srt

A Seurat object or SummarizedExperiment object in which the lineages named in lineages have already been computed (for example with RunSlingshot(), as in the Examples).

lineages

A character vector specifying the lineage names for which dynamic features should be calculated.

score_method

The method to use for scoring. Can be "Seurat", "AUCell", "UCell", "GSVA", "ssGSEA", "zscore", "PLAGE", or "VISION". Multiple methods can be supplied at once; each method will be written to a method-suffixed assay before dynamic-feature fitting. Default is "AUCell".

layer

Which layer to use. Default is "data".

assay

Which assay to use. If NULL, the default assay of the Seurat object will be used. When the object also contains ChromatinAssay, the default assay and additional ChromatinAssay will be preprocessed sequentially.

min_expcells

The minimum number of cells in which a feature must be expressed for it to be tested. Default is 20.

r.sq

The R-squared threshold. Default is 0.2.

dev.expl

The deviance explained threshold. Default is 0.2.

padjust

The adjusted p-value threshold. Default is 0.05.
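As a rough illustration of how the three thresholds (r.sq, dev.expl, padjust) could act together, the sketch below filters a toy results table. The table, its column names, the Benjamini-Hochberg adjustment, and the strict inequalities are assumptions made for the example; the source does not state how the function applies them internally.

```r
# Hypothetical per-term fit statistics (toy values, not real results).
res <- data.frame(
  term = c("TermA", "TermB", "TermC", "TermD"),
  r.sq = c(0.45, 0.10, 0.30, 0.25),
  dev.expl = c(0.50, 0.15, 0.18, 0.35),
  pvalue = c(1e-6, 0.2, 1e-4, 0.03),
  stringsAsFactors = FALSE
)

# Assumption: p-values are adjusted with the Benjamini-Hochberg method.
res$padjust <- p.adjust(res$pvalue, method = "BH")

# Thresholds matching the documented defaults.
r.sq <- 0.2
dev.expl <- 0.2
padjust <- 0.05

# Assumption: a term is kept only if it passes all three thresholds.
keep <- res$r.sq > r.sq & res$dev.expl > dev.expl & res$padjust < padjust
dynamic_terms <- res$term[keep]
```

In this toy table TermB fails every threshold and TermC fails only on dev.expl, so loosening a single threshold changes which terms are retained.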

IDtype

A character vector specifying the type of gene IDs in the srt object. Gene IDs are converted to a different type if IDtype differs from result_IDtype. Default is "symbol".

species

A character vector specifying the species for which the gene annotation databases should be prepared. Can be "Homo_sapiens" or "Mus_musculus". Default is "Homo_sapiens".

db

A character vector specifying the annotation sources to be included in the gene annotation databases. Can be one or more of "GO", "GO_BP", "GO_CC", "GO_MF", "KEGG", "WikiPathway", "Reactome", "CORUM", "MP", "DO", "HPO", "PFAM", "CSPA", "Surfaceome", "SPRomeDB", "VerSeDa", "TFLink", "hTFtarget", "TRRUST", "JASPAR", "ENCODE", "MSigDB", "CellTalk", "CellChat", "Chromosome", "GeneType", "Enzyme", "TF", "CytoTRACE2". MSigDB subcollections can be requested as "MSigDB_<collection>", such as "MSigDB_H" for human Hallmark and "MSigDB_MH" for mouse Hallmark. Note: "CytoTRACE2" is species-independent and downloads pre-trained model data required by RunCytoTRACE.
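Because the Hallmark subcollection is named differently for human and mouse, the db value can be chosen from the species. The helper logic below is a hypothetical convenience, not part of the package; only the names "MSigDB_H" and "MSigDB_MH" come from the description above.

```r
species <- "Mus_musculus"

# Pick the Hallmark collection name that matches the species.
hallmark_db <- switch(
  species,
  Homo_sapiens = "MSigDB_H",
  Mus_musculus = "MSigDB_MH",
  stop("No Hallmark collection name known for species: ", species)
)

# Combine it with another annotation source; db accepts one or more sources.
db <- c("GO_BP", hallmark_db)
```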

db_update

Whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.

db_version

A character vector specifying the version of the gene annotation databases to be retrieved. Default is "latest".

convert_species

Whether to use a species-converted database when the annotation is missing for the specified species. Default is TRUE.

Ensembl_version

An integer specifying the Ensembl version. Default is NULL. If NULL, the latest version will be used.

mirror

Specify an Ensembl mirror to connect to. The valid options here are "www", "uswest", "useast", "asia".

features

A named list of feature lists for custom enrichment gene sets. If provided, it takes precedence over TERM2GENE and db.
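A minimal sketch of a features list is shown below. The set names and their gene memberships are illustrative choices for a mouse pancreas dataset, not sets shipped with the package, and the commented call assumes that the size limits also apply to custom sets, which the source does not confirm.

```r
# Each list element is one gene set; the element name is used as the term name.
custom_sets <- list(
  Islet_hormones = c("Ins1", "Ins2", "Gcg", "Sst", "Ppy"),
  Ductal_markers = c("Sox9", "Krt19")
)

# Number of genes per set, useful for checking against minGSSize and maxGSSize.
set_sizes <- lengths(custom_sets)

# Hypothetical call (not run here). Small custom sets may need a lower
# minGSSize if the size filter is applied to them:
# pancreas_sub <- RunDynamicEnrichment(
#   pancreas_sub,
#   lineages = "Lineage1",
#   features = custom_sets,
#   minGSSize = 2
# )
```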

TERM2GENE

A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.

TERM2NAME

A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
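The two-column layout described for TERM2GENE and TERM2NAME can be built from a named list in base R. The term IDs, gene symbols, and term names below are toy values chosen for illustration.

```r
# Toy gene sets keyed by term ID.
gene_sets <- list(
  SET_A = c("Ins1", "Ins2"),
  SET_B = c("Gcg", "Sst", "Ppy")
)

# TERM2GENE: first column term IDs, second column gene IDs (one row per pair).
TERM2GENE <- data.frame(
  term = rep(names(gene_sets), lengths(gene_sets)),
  gene = unlist(gene_sets, use.names = FALSE),
  stringsAsFactors = FALSE
)

# TERM2NAME: first column term IDs, second column readable term names.
TERM2NAME <- data.frame(
  term = names(gene_sets),
  name = c("Insulin genes (toy)", "Other islet hormones (toy)"),
  stringsAsFactors = FALSE
)
```

Every term ID in TERM2GENE has a matching row in TERM2NAME here, which keeps the ID-to-name lookup unambiguous.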

minGSSize

The minimum size of a gene set to be considered in the enrichment analysis.

maxGSSize

The maximum size of a gene set to be considered in the enrichment analysis.
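The effect of minGSSize and maxGSSize can be pictured as a size filter on the gene sets. In the sketch below, counting only genes that are present in the dataset and treating both bounds as inclusive are assumptions; the gene names and sets are synthetic.

```r
# Synthetic gene sets of different sizes.
gene_sets <- list(
  small = paste0("g", 1:3),
  medium = paste0("g", 1:25),
  large = paste0("g", 1:600)
)

# Hypothetical genes detected in the dataset.
detected <- paste0("g", 1:550)

minGSSize <- 10
maxGSSize <- 500

# Assumption: set size is counted after intersecting with detected genes.
sizes <- vapply(
  gene_sets,
  function(genes) length(intersect(genes, detected)),
  integer(1)
)

# Assumption: both bounds are inclusive.
kept <- names(sizes)[sizes >= minGSSize & sizes <= maxGSSize]
```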

backend

Enrichment backend. "cpp" (the default) uses a fast native hypergeometric ORA implementation and returns the enrichment table without enrichResult objects. "r" uses clusterProfiler::enricher() and returns enrichResult objects in results. GO_simplify = TRUE currently uses the R backend.

cpp_strategy

C++ AUCell ranking strategy. "sparse" ranks non-zero genes and approximates zero ties, "topk" ranks only genes that can contribute to AUCell AUC, and "full" ranks all genes.
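To show why ranking only the top genes can be enough, here is a simplified AUCell-style score for a single cell in base R. It is a sketch, not the package's C++ code: the expression values, the gene set, the rank cutoff k, and the normalisation by cutoff times set size are assumptions made for the example.

```r
# Toy expression values for one cell (hypothetical).
expr <- c(g1 = 9, g2 = 7, g3 = 5, g4 = 3, g5 = 1, g6 = 0, g7 = 0, g8 = 0)
gene_set <- c("g1", "g3", "g7")

# Hypothetical rank cutoff: only genes ranked above it can contribute.
k <- 4

# Rank genes from highest expression (rank 1) to lowest; ties among the
# zero-valued genes are broken by position.
ranks <- rank(-expr, ties.method = "first")

# Ranks of gene-set members that fall above the cutoff.
set_ranks <- ranks[gene_set]
hits <- sort(set_ranks[set_ranks < k])

# Area under the step-shaped recovery curve up to the cutoff, scaled by a
# simple upper bound (cutoff times set size).
auc <- sum(diff(c(hits, k)) * seq_along(hits)) / (k * length(gene_set))
```

Genes ranked at or beyond the cutoff never enter the sum, so in this sketch the order of the zero-valued genes (g6 to g8) has no effect on the score.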

cores

The number of cores to use for parallelization with foreach::foreach. Default is 1.

verbose

Whether to print messages. Default is TRUE.

seed

Random seed for reproducibility. Default is 11.

...

Passed to other functions.

Examples

# Load the example dataset and run the standard preprocessing workflow
data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)

# Infer lineages on the UMAP embedding, grouping cells by cell type
pancreas_sub <- RunSlingshot(
  pancreas_sub,
  group.by = "CellType",
  reduction = "UMAP"
)

# Identify dynamic genes along Lineage1 and plot them
pancreas_sub <- RunDynamicFeatures(
  pancreas_sub,
  lineages = "Lineage1",
  fit_method = "pretsa",
  n_candidates = 200
)
ht1 <- DynamicHeatmap(
  pancreas_sub,
  lineages = "Lineage1",
  cell_annotation = "CellType",
  n_split = 3
)

# Score GO_BP gene sets with AUCell and identify dynamic terms along Lineage1
pancreas_sub <- RunDynamicEnrichment(
  pancreas_sub,
  lineages = "Lineage1",
  score_method = "AUCell",
  db = "GO_BP",
  species = "Mus_musculus"
)

# Plot the dynamic terms stored in the GO_BP assay
ht2 <- DynamicHeatmap(
  pancreas_sub,
  assay = "GO_BP",
  lineages = "Lineage1_GO_BP",
  cell_annotation = "CellType",
  n_split = 3,
  split_method = "kmeans-peaktime"
)