Run metabolism pathway scoring

Usage

RunMetabolism(
  srt,
  assay = NULL,
  group.by = NULL,
  layer = "counts",
  db = c("KEGG", "REACTOME"),
  species = "Homo_sapiens",
  IDtype = "symbol",
  db_update = FALSE,
  db_version = "latest",
  convert_species = TRUE,
  Ensembl_version = NULL,
  mirror = NULL,
  biomart = NULL,
  max_tries = 5,
  use_preparedb = TRUE,
  method = c("AUCell", "GSVA", "ssGSEA", "VISION"),
  backend = c("cpp", "r"),
  cpp_strategy = c("sparse", "topk", "full"),
  cpp_chunk_size = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  assay_name = "METABOLISM",
  new_assay = TRUE,
  seed = 11,
  verbose = TRUE
)

Arguments

srt

A Seurat object.

assay

Assay to use as expression matrix. Default is DefaultAssay(srt).

group.by

Name of the metadata column to group cells by. If NULL, scores are computed per cell. If provided, expression is averaged within each group before scoring, giving cell-type-level scores.
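The group-level mode described above can be pictured as collapsing the cell columns of the expression matrix into one column per group. The following is a minimal base R sketch of that idea, not the package's implementation: the helper name average_by_group, the toy matrix, and the use of an arithmetic mean are assumptions made for illustration.

```r
# Toy expression matrix: 4 genes x 6 cells (values are made up for illustration)
expr <- matrix(
  c(1, 0, 3, 2,
    2, 0, 1, 2,
    0, 4, 0, 1,
    0, 6, 0, 3,
    5, 1, 1, 0,
    3, 1, 3, 0),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)
groups <- c("A", "A", "B", "B", "C", "C") # stand-in for a metadata column

# Hypothetical helper: average each gene across the cells of each group
average_by_group <- function(mat, groups) {
  idx <- split(seq_len(ncol(mat)), groups)
  sapply(idx, function(i) rowMeans(mat[, i, drop = FALSE]))
}

group_expr <- average_by_group(expr, groups)
print(group_expr) # genes in rows, one column per group
```

Scoring a matrix like group_expr yields one score per pathway and group rather than one per pathway and cell.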

layer

Data layer to use. Default is "counts" (the count matrix).

db

Databases to use for metabolism pathways. One or both of "KEGG", "REACTOME". "Reactome" is also accepted and treated identically to "REACTOME". When use_preparedb = TRUE, gene sets are built via PrepareDB.

species

Species of the input data. The scMetabolism gene sets contain human gene symbols. When species is not "Homo_sapiens" and convert_species is TRUE, GeneConvert is used to map human genes to the target species via biomaRt homolog tables. Default is "Homo_sapiens".

IDtype

A character vector specifying the type of gene IDs in the srt object or geneID argument. This argument is used to convert the gene IDs to a different type if IDtype is different from result_IDtype.

db_update

Whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.

db_version

A character vector specifying the version of the gene annotation databases to be retrieved. Default is "latest".

convert_species

Whether to convert human gene symbols from the scMetabolism gene sets to the target species using GeneConvert. When TRUE (default), genes are mapped via cross-species orthologs from Ensembl BioMart. When FALSE, only case-insensitive direct symbol matching is used.
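As a rough illustration of what case-insensitive direct symbol matching means when convert_species = FALSE, the sketch below maps human-style symbols onto the symbols present in a dataset by comparing upper-cased names. The helper match_symbols_ci and the symbol "XYZ1" are hypothetical; the package's actual matching logic may differ.

```r
# Hypothetical helper: return the dataset symbols that match a gene set
# when letter case is ignored
match_symbols_ci <- function(set_genes, data_genes) {
  hit <- match(toupper(set_genes), toupper(data_genes))
  data_genes[hit[!is.na(hit)]]
}

human_set <- c("HK1", "PFKM", "ALDOA", "XYZ1") # "XYZ1" is a made-up symbol
mouse_genes <- c("Hk1", "Pfkm", "Aldoa", "Gapdh")

matched <- match_symbols_ci(human_set, mouse_genes)
print(matched) # "Hk1" "Pfkm" "Aldoa"
```

Symbols whose orthologs are named differently in the target species are missed by this kind of matching, which is the gap that ortholog mapping is meant to cover.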

Ensembl_version

An integer specifying the Ensembl version. Default is NULL. If NULL, the latest version will be used.

mirror

Specify an Ensembl mirror to connect to. The valid options here are "www", "uswest", "useast", "asia".

biomart

BioMart database name passed to GeneConvert. Default NULL uses "ensembl". Other options: "protists_mart", "fungi_mart", "plants_mart".

max_tries

Maximum retry attempts for biomaRt connections in GeneConvert. Default is 5.

use_preparedb

When TRUE, gene sets are built via PrepareDB, which provides species-aware gene mapping via BioMart and KEGG/Reactome databases. This automatically handles gene symbol conversion for non-human species (e.g., species = "Mus_musculus" → mouse gene symbols in metabolism pathways). When FALSE, raw scMetabolism GMT files are downloaded and genes are matched case-insensitively, with optional GeneConvert supplementation when convert_species = TRUE.

method

Scoring method, one of "AUCell", "GSVA", "ssGSEA", "VISION".

backend

Scoring backend. "cpp" is the default for supported methods. "r" uses the original R package implementation. "cpp" currently supports method = "AUCell", method = "GSVA", and method = "ssGSEA". method = "VISION" falls back to "r" when backend is not explicitly set. AUCell C++ scores may differ from the R backend when tied expression values are randomly ranked.

cpp_strategy

C++ AUCell ranking strategy. "sparse" ranks non-zero genes and approximates zero ties, "topk" ranks only genes that can contribute to AUCell AUC, and "full" ranks all genes.
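To see why tie handling matters for rank-based scores such as AUCell, consider a single cell in which several genes share the value zero. The base R sketch below ranks such a cell with random tie-breaking under two seeds; the gene names and values are invented, and this illustrates the general effect rather than either backend's code.

```r
# One cell's expression; three genes are tied at zero (values are illustrative)
expr_cell <- c(geneA = 5, geneB = 0, geneC = 2, geneD = 0, geneE = 0)

# Rank from highest to lowest expression, breaking ties at random
set.seed(11)
r1 <- rank(-expr_cell, ties.method = "random")
set.seed(12)
r2 <- rank(-expr_cell, ties.method = "random")

# Non-zero genes always get the same rank; only the zero block can be permuted
print(rbind(seed_11 = r1, seed_12 = r2))

# A deterministic alternative breaks ties by position instead
r_first <- rank(-expr_cell, ties.method = "first")
print(r_first)
```

Because only the order inside the tied block changes, any score difference comes from gene-set members that fall within that block.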

cpp_chunk_size

Optional cell chunk size for C++ GSVA kernels. NULL or "auto" automatically chunks large matrices to reduce peak dense intermediate memory; positive values set the chunk size manually.
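Chunking over cells can be sketched in base R as splitting the column indices into blocks and processing one block at a time, so that only one block's dense intermediate exists at once. The helper make_chunks and the toy matrix below are assumptions for illustration; how the C++ kernels choose an automatic chunk size is not shown here.

```r
# Hypothetical helper: split cell indices 1..n_cells into consecutive chunks
make_chunks <- function(n_cells, chunk_size) {
  stopifnot(n_cells >= 1, chunk_size >= 1)
  idx <- seq_len(n_cells)
  unname(split(idx, ceiling(idx / chunk_size)))
}

chunks <- make_chunks(n_cells = 10, chunk_size = 4)
print(lengths(chunks)) # 4 4 2

# Process a toy 3 x 10 matrix chunk by chunk, then combine the results
mat <- matrix(seq_len(30), nrow = 3)
chunked_sums <- unlist(lapply(chunks, function(i) colSums(mat[, i, drop = FALSE])))

# Per-cell results do not depend on how the cells were chunked
print(all(chunked_sums == colSums(mat)))
```

A smaller chunk size lowers peak memory at the cost of more iterations.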

minGSSize

The minimum size of a gene set to be considered in the enrichment analysis.

maxGSSize

The maximum size of a gene set to be considered in the enrichment analysis.
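The two size limits act as a filter on the gene sets before scoring. A minimal sketch follows, assuming that set size is counted after intersecting with the genes present in the data and that both bounds are inclusive; neither detail is confirmed by this page, and filter_gene_sets is a hypothetical helper.

```r
# Hypothetical helper: keep gene sets whose size (restricted to genes in the
# data) lies within [min_size, max_size]
filter_gene_sets <- function(gene_sets, data_genes, min_size = 10, max_size = 500) {
  kept <- lapply(gene_sets, intersect, y = data_genes)
  n <- lengths(kept)
  kept[n >= min_size & n <= max_size]
}

# Toy gene sets with small limits so the effect is visible
gene_sets <- list(
  small = c("g1"),
  ok    = c("g1", "g2", "g3"),
  large = paste0("g", 1:6)
)
data_genes <- paste0("g", 1:5)

filtered <- filter_gene_sets(gene_sets, data_genes, min_size = 2, max_size = 3)
print(names(filtered)) # "ok"
```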

assay_name

Name of the assay to store metabolism scores when new_assay = TRUE. Default is "METABOLISM".

new_assay

Whether to create a new assay for metabolism scores when group.by = NULL. Default is TRUE.

seed

Random seed for reproducibility. Default is 11.

verbose

Whether to print messages. Default is TRUE.

Value

Returns a Seurat object. When group.by = NULL, scores are stored in the assay named by assay_name and in the tools slot. When group.by is provided, scores are stored in the tools slot under Metabolism_<group.by>_<method>, for use by MetabolismPlot.

Examples

data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)
pancreas_sub <- RunMetabolism(
  pancreas_sub,
  assay = "RNA",
  layer = "counts",
  db = c("KEGG", "REACTOME"),
  group.by = "CellType",
  species = "Mus_musculus",
  method = "AUCell"
)
ht <- MetabolismPlot(
  pancreas_sub,
  group.by = "CellType",
  plot_type = "heatmap",
  topTerm = 10,
  width = 1,
  height = 2
)