
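Perform enrichment (over-representation) analysis on genes, taken either from the differential expression results stored in a Seurat object or from a user-supplied vector of gene IDs.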
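
Over-representation analysis is commonly computed as a one-sided hypergeometric test on the overlap between a query gene list and a gene set. The following is a minimal base R sketch of that test for a single term; all counts are made-up illustration values, and it is not taken from this function's internals.

```r
# Hypothetical counts for one term (illustration only)
universe_size <- 20000  # background genes
term_size     <- 100    # genes annotated to the term
query_size    <- 200    # genes in the input list
overlap       <- 10     # input genes annotated to the term

# P(X >= overlap) when drawing query_size genes without replacement
p_value <- phyper(
  overlap - 1,
  term_size,
  universe_size - term_size,
  query_size,
  lower.tail = FALSE
)
p_value
```

The expected overlap under the null here is `query_size * term_size / universe_size`, which is 1 gene, so an observed overlap of 10 yields a small p-value.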

Usage

RunEnrichment(
  srt = NULL,
  group_by = NULL,
  test.use = "wilcox",
  DE_threshold = "avg_log2FC > 0 & p_val_adj < 0.05",
  geneID = NULL,
  geneID_groups = NULL,
  geneID_exclude = NULL,
  IDtype = "symbol",
  result_IDtype = "symbol",
  species = "Homo_sapiens",
  db = "GO_BP",
  db_update = FALSE,
  db_version = "latest",
  db_combine = FALSE,
  convert_species = TRUE,
  Ensembl_version = 103,
  mirror = NULL,
  TERM2GENE = NULL,
  TERM2NAME = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  unlimited_db = c("Chromosome", "GeneType", "TF", "Enzyme", "CSPA"),
  GO_simplify = FALSE,
  GO_simplify_cutoff = "p.adjust < 0.05",
  simplify_method = "Wang",
  simplify_similarityCutoff = 0.7,
  cores = 1,
  verbose = TRUE
)

Arguments

srt

A Seurat object containing the results of differential expression analysis (RunDEtest). If specified, the genes and groups will be extracted from the Seurat object automatically. If not specified, the geneID and geneID_groups arguments must be provided.

group_by

A character vector specifying the grouping variable in the Seurat object. This argument is only used if srt is specified.

test.use

A character vector specifying the test to be used in differential expression analysis. This argument is only used if srt is specified.

DE_threshold

A character vector specifying the filter condition for differential expression analysis. This argument is only used if srt is specified.
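
To illustrate what a filter condition string such as the default means, the sketch below evaluates it against a small made-up differential expression table in base R. The table and its values are hypothetical, and evaluating the string with `eval(parse())` is an assumption used for illustration, not necessarily how the function applies it.

```r
# Hypothetical DE table with the columns named in the default threshold
de <- data.frame(
  gene       = c("A", "B", "C", "D"),
  avg_log2FC = c(1.2, -0.8, 0.3, 2.0),
  p_val_adj  = c(0.01, 0.001, 0.2, 0.04)
)

DE_threshold <- "avg_log2FC > 0 & p_val_adj < 0.05"

# Evaluate the condition with the table's columns in scope
keep <- eval(parse(text = DE_threshold), envir = de)
de_sig <- de[keep, ]
de_sig$gene
```

Only rows that are both up-regulated and significant pass the default condition; dropping the `avg_log2FC > 0` part, as in the examples below, also keeps down-regulated genes.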

geneID

A character vector specifying the gene IDs.

geneID_groups

A factor vector specifying the group labels for each gene.

geneID_exclude

A character vector specifying the gene IDs to be excluded from the analysis.
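
As a rough sketch of how these three arguments relate, `geneID` and `geneID_groups` can be thought of as parallel vectors, with `geneID_exclude` removing entries before the genes are grouped. The values below are hypothetical, and the exact handling inside the function is an assumption.

```r
# Parallel vectors: one group label per gene (hypothetical values)
geneID         <- c("Ins1", "Ins2", "Gcg", "Ghrl", "Sst")
geneID_groups  <- factor(c("Beta", "Beta", "Alpha", "Epsilon", "Delta"))
geneID_exclude <- c("Ghrl")

# Drop excluded genes, then split the remaining genes by group
keep <- !geneID %in% geneID_exclude
genes_by_group <- split(geneID[keep], droplevels(geneID_groups[keep]))
genes_by_group
```

Each element of `genes_by_group` corresponds to one gene list that would be tested for enrichment separately.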

IDtype

A character vector specifying the type of gene IDs in the srt object or geneID argument. This argument is used to convert the gene IDs to a different type if IDtype is different from result_IDtype.

result_IDtype

A character vector specifying the desired type of gene ID to be used in the output. This argument is used to convert the gene IDs from IDtype to result_IDtype.

species

A character vector specifying the species for which the analysis is performed.

db

A character vector specifying the name of the database to be used for enrichment analysis.

db_update

Whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.

db_version

A character vector specifying the version of the database to be used. This argument is ignored if db_update is TRUE. Default is "latest".

db_combine

Whether to combine multiple databases into one. If TRUE, all databases specified by db will be combined into a single database named "Combined".

convert_species

Whether to use a species-converted database when the annotation is missing for the specified species. The default value is TRUE.

Ensembl_version

Ensembl database version. If NULL, use the current release version.

mirror

Specify an Ensembl mirror to connect to. The valid options here are "www", "uswest", "useast", "asia".

TERM2GENE

A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.

TERM2NAME

A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
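
A custom database in the two-column layout described for TERM2GENE and TERM2NAME could be built as follows. This is a minimal sketch: the term IDs, term names, and gene sets are hypothetical, and the column names are arbitrary because only the column order is documented.

```r
# Column 1: term ID, column 2: gene ID (hypothetical gene sets)
term2gene <- data.frame(
  term = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B"),
  gene = c("Ins1", "Ins2", "Iapp", "Gcg", "Ttr")
)

# Column 1: term ID, column 2: term name
term2name <- data.frame(
  term = c("SET_A", "SET_B"),
  name = c("Beta cell markers (hypothetical)", "Alpha cell markers (hypothetical)")
)

# Sanity check: every term in the gene mapping has a name
stopifnot(all(term2gene$term %in% term2name$term))
```

Gene sets this small would presumably fall below the default minGSSize of 10, so real custom sets should be larger or minGSSize lowered accordingly.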

minGSSize

The minimum size of a gene set to be considered in the enrichment analysis.

maxGSSize

The maximum size of a gene set to be considered in the enrichment analysis.

unlimited_db

A character vector specifying the names of databases that do not have size restrictions.
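
The effect of minGSSize and maxGSSize can be sketched in base R as a filter on the number of genes per term. The gene sets below are made up, and the precise way the bounds are applied (for example, whether sizes are counted before or after restricting to a background) is not specified here, so this is an assumption; databases listed in unlimited_db are documented as exempt from these bounds.

```r
minGSSize <- 10
maxGSSize <- 500

# Hypothetical term-to-gene mapping with sets of size 3, 12, and 600
term2gene <- data.frame(
  term = rep(c("SMALL", "MEDIUM", "LARGE"), times = c(3, 12, 600)),
  gene = paste0("g", seq_len(615))
)

# Count genes per term and keep terms within the size bounds
set_sizes  <- table(term2gene$term)
kept_terms <- names(set_sizes)[set_sizes >= minGSSize & set_sizes <= maxGSSize]
filtered   <- term2gene[term2gene$term %in% kept_terms, ]
kept_terms
```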

GO_simplify

Whether to simplify the GO terms. If TRUE, additional results with simplified GO terms will be returned.

GO_simplify_cutoff

A character vector specifying the filter condition for simplification of GO terms. This argument is only used if GO_simplify is TRUE.

simplify_method

A character vector specifying the method to be used for simplification of GO terms. This argument is only used if GO_simplify is TRUE.

simplify_similarityCutoff

The similarity cutoff for simplification of GO terms. This argument is only used if GO_simplify is TRUE.

cores

The number of cores to use for parallelization with foreach::foreach. Default is 1.

verbose

Whether to print progress messages. Default is TRUE.

Value

If input is a Seurat object, returns the modified Seurat object with the enrichment result stored in the tools slot.

If input is a geneID vector, with or without geneID_groups, returns the enrichment result directly.

The enrichment result is a list with the following components:

  • enrichment: A data.frame containing all enrichment results.

  • results: A list of enrichResult objects from the DOSE package.

  • geneMap: A data.frame containing the ID mapping table for input gene IDs.

  • input: A data.frame containing the input gene IDs and gene ID groups.

  • DE_threshold: The threshold used to filter the differential expression results (only returned if input is a Seurat object).

Examples

data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)
#>  [2025-09-20 13:41:26] Start standard scop workflow...
#>  [2025-09-20 13:41:27] Checking a list of <Seurat> object...
#> ! [2025-09-20 13:41:27] Data 1/1 of the `srt_list` is "unknown"
#>  [2025-09-20 13:41:27] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 1/1 of the `srt_list`...
#>  [2025-09-20 13:41:29] Perform `Seurat::FindVariableFeatures()` on the data 1/1 of the `srt_list`...
#>  [2025-09-20 13:41:29] Use the separate HVF from srt_list
#>  [2025-09-20 13:41:30] Number of available HVF: 2000
#>  [2025-09-20 13:41:30] Finished check
#>  [2025-09-20 13:41:30] Perform `Seurat::ScaleData()`
#> Warning: Different features in new layer data than already exists for scale.data
#>  [2025-09-20 13:41:30] Perform pca linear dimension reduction
#> StandardPC_ 1 
#> Positive:  Aplp1, Cpe, Gnas, Fam183b, Map1b, Hmgn3, Pcsk1n, Chga, Tuba1a, Bex2 
#> 	   Syt13, Isl1, 1700086L19Rik, Pax6, Chgb, Scgn, Rbp4, Scg3, Gch1, Camk2n1 
#> 	   Cryba2, Pcsk2, Pyy, Tspan7, Mafb, Hist3h2ba, Dbpht2, Abcc8, Rap1b, Slc38a5 
#> Negative:  Spp1, Anxa2, Sparc, Dbi, 1700011H14Rik, Wfdc2, Gsta3, Adamts1, Clu, Mgst1 
#> 	   Bicc1, Ldha, Vim, Cldn3, Cyr61, Rps2, Mt1, Ptn, Phgdh, Nudt19 
#> 	   Smtnl2, Smco4, Habp2, Mt2, Col18a1, Rpl12, Galk1, Cldn10, Acot1, Ccnd1 
#> StandardPC_ 2 
#> Positive:  Rbp4, Tagln2, Tuba1b, Fkbp2, Pyy, Pcsk2, Iapp, Tmem27, Meis2, Tubb4b 
#> 	   Pcsk1n, Dbpht2, Rap1b, Dynll1, Tubb2a, Sdf2l1, Scgn, 1700086L19Rik, Scg2, Abcc8 
#> 	   Atp1b1, Hspa5, Fam183b, Papss2, Slc38a5, Scg3, Mageh1, Tspan7, Ppp1r1a, Ociad2 
#> Negative:  Neurog3, Btbd17, Gadd45a, Ppp1r14a, Neurod2, Sox4, Smarcd2, Mdk, Pax4, Btg2 
#> 	   Sult2b1, Hes6, Grasp, Igfbpl1, Gpx2, Cbfa2t3, Foxa3, Shf, Mfng, Tmsb4x 
#> 	   Amotl2, Gdpd1, Cdc14b, Epb42, Rcor2, Cotl1, Upk3bl, Rbfox3, Cldn6, Cer1 
#> StandardPC_ 3 
#> Positive:  Nusap1, Top2a, Birc5, Aurkb, Cdca8, Pbk, Mki67, Tpx2, Plk1, Ccnb1 
#> 	   2810417H13Rik, Incenp, Cenpf, Ccna2, Prc1, Racgap1, Cdk1, Aurka, Cdca3, Hmmr 
#> 	   Spc24, Kif23, Sgol1, Cenpe, Cdc20, Hist1h1b, Cdca2, Mxd3, Kif22, Ska1 
#> Negative:  Anxa5, Pdzk1ip1, Acot1, Tpm1, Anxa2, Dcdc2a, Capg, Sparc, Ttr, Pamr1 
#> 	   Clu, Cxcl12, Ndrg2, Hnf1aos1, Gas6, Gsta3, Krt18, Ces1d, Atp1b1, Muc1 
#> 	   Hhex, Acadm, Spp1, Enpp2, Bcl2l14, Sat1, Smtnl2, 1700011H14Rik, Tgm2, Fam159a 
#> StandardPC_ 4 
#> Positive:  Glud1, Tm4sf4, Akr1c19, Cldn4, Runx1t1, Fev, Pou3f4, Gm43861, Pgrmc1, Arx 
#> 	   Cd200, Lrpprc, Hmgn3, Ppp1r14c, Pam, Etv1, Tsc22d1, Slc25a5, Akap17b, Pgf 
#> 	   Fam43a, Emb, Jun, Krt8, Dnajc12, Mid1ip1, Ids, Rgs17, Uchl1, Alcam 
#> Negative:  Ins2, Ins1, Ppp1r1a, Nnat, Calr, Sytl4, Sdf2l1, Iapp, Pdia6, Mapt 
#> 	   G6pc2, C2cd4b, Npy, Gng12, P2ry1, Ero1lb, Adra2a, Papss2, Arhgap36, Fam151a 
#> 	   Dlk1, Creld2, Gip, Tmem215, Gm27033, Cntfr, Prss53, C2cd4a, Lyve1, Ociad2 
#> StandardPC_ 5 
#> Positive:  Pdx1, Nkx6-1, Npepl1, Cldn4, Cryba2, Fev, Jun, Chgb, Gng12, Adra2a 
#> 	   Mnx1, Sytl4, Pdk3, Gm27033, Nnat, Chga, Ins2, 1110012L19Rik, Enho, Krt7 
#> 	   Mlxipl, Tmsb10, Flrt1, Pax4, Tubb3, Prrg2, Gars, Frzb, BC023829, Gm2694 
#> Negative:  Irx2, Irx1, Gcg, Ctxn2, Tmem27, Ctsz, Tmsb15l, Nap1l5, Pou6f2, Gria2 
#> 	   Ghrl, Peg10, Smarca1, Arx, Lrpap1, Rgs4, Ttr, Gast, Tmsb15b2, Serpina1b 
#> 	   Slc16a10, Wnk3, Ly6e, Auts2, Sct, Arg1, Dusp10, Sphkap, Dock11, Edn3 
#>  [2025-09-20 13:41:31] Perform `Seurat::FindClusters()` with louvain and `cluster_resolution` = 0.6
#>  [2025-09-20 13:41:31] Reorder clusters...
#> ! [2025-09-20 13:41:31] Using `Seurat::AggregateExpression()` to calculate pseudo-bulk data for <Assay5>
#>  [2025-09-20 13:41:31] Perform umap nonlinear dimension reduction
#>  [2025-09-20 13:41:31] Non-linear dimensionality reduction (umap) using (Standardpca) dims (1-50) as input
#>  [2025-09-20 13:41:31] UMAP will return its model
#>  [2025-09-20 13:41:36] Non-linear dimensionality reduction (umap) using (Standardpca) dims (1-50) as input
#>  [2025-09-20 13:41:36] UMAP will return its model
#>  [2025-09-20 13:41:40] Run scop standard workflow done
pancreas_sub <- RunDEtest(
  pancreas_sub,
  group_by = "CellType"
)
#>  [2025-09-20 13:41:40] immunogenomics/presto installed successfully
#>  [2025-09-20 13:41:41] Data type is log-normalized
#>  [2025-09-20 13:41:41] Start differential expression test
#>  [2025-09-20 13:41:41] Find all markers(wilcox) among 5 groups...
#>  [2025-09-20 13:41:41] Using 1 core
#> ⠙ [2025-09-20 13:41:41] Running [1/5] ETA:  1s
#>  [2025-09-20 13:41:41] Completed 5 tasks in 856ms
#> 
#>  [2025-09-20 13:41:41] Building results
#>  [2025-09-20 13:41:42] Differential expression test completed
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  DE_threshold = "p_val_adj < 0.05",
  db = "GO_BP",
  species = "Mus_musculus"
)
#>  [2025-09-20 13:41:42] Start Enrichment analysis
#>  [2025-09-20 13:41:42] clusterProfiler installed successfully
#>  [2025-09-20 13:41:42] Species: Mus_musculus
#>  [2025-09-20 13:41:42] Loading cached: GO_BP version: 3.21.0 nterm:15445 created: "2025-09-20 13:12:47"
#>  [2025-09-20 13:41:43] Permform enrichment...
#>  [2025-09-20 13:41:43] Using 1 core
#> ⠙ [2025-09-20 13:41:43] Running [1/5] ETA:  1m
#> ⠹ [2025-09-20 13:41:43] Running [2/5] ETA: 49s
#> ⠸ [2025-09-20 13:41:43] Running [3/5] ETA: 32s
#> ⠼ [2025-09-20 13:41:43] Running [4/5] ETA: 15s
#>  [2025-09-20 13:41:43] Completed 5 tasks in 1m 13s
#> 
#>  [2025-09-20 13:42:56] Building results
#>  [2025-09-20 13:42:56] Enrichment analysis done
EnrichmentPlot(
  pancreas_sub,
  db = "GO_BP",
  group_by = "CellType",
  plot_type = "comparison"
)
#> Warning: Vectorized input to `element_text()` is not officially supported.
#>  Results may be unexpected or may change in future versions of ggplot2.
#> Warning: Removed 113 rows containing missing values or values outside the scale range
#> (`geom_point()`).


if (FALSE) { # \dontrun{
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  DE_threshold = "p_val_adj < 0.05",
  db = c("MSigDB", "MSigDB_MH"),
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "MSigDB",
  group_by = "CellType",
  plot_type = "comparison"
)
EnrichmentPlot(
  pancreas_sub,
  db = "MSigDB_MH",
  group_by = "CellType",
  plot_type = "comparison"
)

# Remove redundant GO terms
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  db = "GO_BP",
  GO_simplify = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "GO_BP_sim",
  group_by = "CellType",
  plot_type = "comparison"
)

# Or use "geneID" and "geneID_groups" as input to run enrichment
de_df <- dplyr::filter(
  pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox,
  p_val_adj < 0.05
)
enrich_out <- RunEnrichment(
  geneID = de_df[["gene"]],
  geneID_groups = de_df[["group1"]],
  db = "GO_BP",
  species = "Mus_musculus"
)
EnrichmentPlot(
  res = enrich_out,
  db = "GO_BP",
  plot_type = "comparison"
)

# Use a combined database
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  db = c(
    "KEGG", "WikiPathway", "Reactome", "PFAM", "MP"
  ),
  db_combine = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "Combined",
  group_by = "CellType",
  plot_type = "comparison"
)
} # }