Perform the enrichment analysis (over-representation) on the genes
Source: R/RunEnrichment.R
Usage
RunEnrichment(
  srt = NULL,
  group_by = NULL,
  test.use = "wilcox",
  DE_threshold = "avg_log2FC > 0 & p_val_adj < 0.05",
  geneID = NULL,
  geneID_groups = NULL,
  geneID_exclude = NULL,
  IDtype = "symbol",
  result_IDtype = "symbol",
  species = "Homo_sapiens",
  db = "GO_BP",
  db_update = FALSE,
  db_version = "latest",
  db_combine = FALSE,
  convert_species = TRUE,
  Ensembl_version = 103,
  mirror = NULL,
  TERM2GENE = NULL,
  TERM2NAME = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  unlimited_db = c("Chromosome", "GeneType", "TF", "Enzyme", "CSPA"),
  GO_simplify = FALSE,
  GO_simplify_cutoff = "p.adjust < 0.05",
  simplify_method = "Wang",
  simplify_similarityCutoff = 0.7,
  cores = 1,
  verbose = TRUE
)
Arguments
- srt
A Seurat object containing the results of differential expression analysis (RunDEtest). If specified, the genes and groups will be extracted from the Seurat object automatically. If not specified, the geneID and geneID_groups arguments must be provided.
- group_by
A character vector specifying the grouping variable in the Seurat object. This argument is only used if srt is specified.
- test.use
A character vector specifying the test to be used in differential expression analysis. This argument is only used if srt is specified.
- DE_threshold
A character vector specifying the filter condition for differential expression analysis. This argument is only used if srt is specified.
- geneID
A character vector specifying the gene IDs.
- geneID_groups
A factor vector specifying the group labels for each gene.
- geneID_exclude
A character vector specifying the gene IDs to be excluded from the analysis.
- IDtype
A character vector specifying the type of gene IDs in the srt object or geneID argument. This argument is used to convert the gene IDs to a different type if IDtype is different from result_IDtype.
- result_IDtype
A character vector specifying the desired type of gene ID to be used in the output. This argument is used to convert the gene IDs from IDtype to result_IDtype.
- species
A character vector specifying the species for which the analysis is performed.
- db
A character vector specifying the name of the database to be used for enrichment analysis.
- db_update
Whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.
- db_version
A character vector specifying the version of the database to be used. This argument is ignored if db_update is TRUE. Default is "latest".
- db_combine
Whether to combine multiple databases into one. If TRUE, all databases specified by db will be combined into one named "Combined".
- convert_species
Whether to use a species-converted database when the annotation is missing for the specified species. The default value is TRUE.
- Ensembl_version
Ensembl database version. If NULL, use the current release version.
- mirror
Specify an Ensembl mirror to connect to. The valid options here are "www", "uswest", "useast", "asia".
- TERM2GENE
A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.
- TERM2NAME
A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
- minGSSize
The minimum size of a gene set to be considered in the enrichment analysis.
- maxGSSize
The maximum size of a gene set to be considered in the enrichment analysis.
- unlimited_db
A character vector specifying the names of databases that do not have size restrictions.
- GO_simplify
Whether to simplify the GO terms. If TRUE, additional results with simplified GO terms will be returned.
- GO_simplify_cutoff
A character vector specifying the filter condition for simplification of GO terms. This argument is only used if GO_simplify is TRUE.
- simplify_method
A character vector specifying the method to be used for simplification of GO terms. This argument is only used if GO_simplify is TRUE.
- simplify_similarityCutoff
The similarity cutoff for simplification of GO terms. This argument is only used if GO_simplify is TRUE.
- cores
The number of cores to use for parallelization with foreach::foreach. Default is 1.
- verbose
Whether to print messages. Default is TRUE.
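The TERM2GENE and TERM2NAME arguments describe a custom database as two-column data frames, and minGSSize/maxGSSize restrict which gene sets are tested. The following is a minimal base-R sketch of that layout. The term IDs, term names, and column names are invented for illustration, and treating both size bounds as inclusive is an assumption, not a statement about the function's internals.

```r
# Toy custom database. Term IDs, names, and column names are hypothetical;
# only the column order (term first, then gene or name) follows the
# argument descriptions above.
TERM2GENE <- data.frame(
  term = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B"),
  gene = c("Ins1", "Ins2", "Iapp", "Gcg", "Arx"),
  stringsAsFactors = FALSE
)
TERM2NAME <- data.frame(
  term = c("SET_A", "SET_B"),
  name = c("Toy beta-cell set", "Toy alpha-cell set"),
  stringsAsFactors = FALSE
)

# Number of genes annotated to each term
set_sizes <- lengths(split(TERM2GENE$gene, TERM2GENE$term))

# Keep terms whose size lies within the limits (inclusive bounds assumed);
# a small minimum is used here so the toy data shows both outcomes
minGSSize <- 3
maxGSSize <- 500
kept_terms <- names(set_sizes)[set_sizes >= minGSSize & set_sizes <= maxGSSize]
print(kept_terms)
```

In this toy case SET_B has only two genes and would be dropped, while SET_A would be kept. Data frames of this shape could then be passed as TERM2GENE and TERM2NAME.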
Value
If input is a Seurat object, returns the modified Seurat object with the enrichment result stored in the tools slot.
If input is a geneID vector with or without geneID_groups, returns the enrichment result directly.
The enrichment result is a list with the following components:
- enrichment: A data.frame containing all enrichment results.
- results: A list of enrichResult objects from the DOSE package.
- geneMap: A data.frame containing the ID mapping table for input gene IDs.
- input: A data.frame containing the input gene IDs and gene ID groups.
- DE_threshold: A specific threshold for differential expression analysis (only returned if input is a Seurat object).
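Each term-by-group result in an over-representation analysis is conventionally a one-sided hypergeometric test. The sketch below shows that calculation in base R for a single term. All counts are invented, and it illustrates the general technique rather than the exact code path this function takes.

```r
# Hypothetical counts for one term and one gene group
N <- 20000  # annotated background genes
K <- 200    # background genes annotated to the term
n <- 150    # genes in the input group
k <- 12     # input genes annotated to the term

# Probability of seeing k or more annotated genes by chance: P(X >= k)
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on a 2x2 table
tab <- matrix(c(k, n - k, K - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_value, fisher = p_fisher))
```

The two p-values agree because the one-sided Fisher's exact test is the hypergeometric tail probability. In practice the p-values for all tested terms would then be adjusted for multiple testing, for example with p.adjust().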
Examples
data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)
#> ℹ [2025-09-20 13:41:26] Start standard scop workflow...
#> ℹ [2025-09-20 13:41:27] Checking a list of <Seurat> object...
#> ! [2025-09-20 13:41:27] Data 1/1 of the `srt_list` is "unknown"
#> ℹ [2025-09-20 13:41:27] Perform `NormalizeData()` with `normalization.method = 'LogNormalize'` on the data 1/1 of the `srt_list`...
#> ℹ [2025-09-20 13:41:29] Perform `Seurat::FindVariableFeatures()` on the data 1/1 of the `srt_list`...
#> ℹ [2025-09-20 13:41:29] Use the separate HVF from srt_list
#> ℹ [2025-09-20 13:41:30] Number of available HVF: 2000
#> ℹ [2025-09-20 13:41:30] Finished check
#> ℹ [2025-09-20 13:41:30] Perform `Seurat::ScaleData()`
#> Warning: Different features in new layer data than already exists for scale.data
#> ℹ [2025-09-20 13:41:30] Perform pca linear dimension reduction
#> StandardPC_ 1
#> Positive: Aplp1, Cpe, Gnas, Fam183b, Map1b, Hmgn3, Pcsk1n, Chga, Tuba1a, Bex2
#> Syt13, Isl1, 1700086L19Rik, Pax6, Chgb, Scgn, Rbp4, Scg3, Gch1, Camk2n1
#> Cryba2, Pcsk2, Pyy, Tspan7, Mafb, Hist3h2ba, Dbpht2, Abcc8, Rap1b, Slc38a5
#> Negative: Spp1, Anxa2, Sparc, Dbi, 1700011H14Rik, Wfdc2, Gsta3, Adamts1, Clu, Mgst1
#> Bicc1, Ldha, Vim, Cldn3, Cyr61, Rps2, Mt1, Ptn, Phgdh, Nudt19
#> Smtnl2, Smco4, Habp2, Mt2, Col18a1, Rpl12, Galk1, Cldn10, Acot1, Ccnd1
#> StandardPC_ 2
#> Positive: Rbp4, Tagln2, Tuba1b, Fkbp2, Pyy, Pcsk2, Iapp, Tmem27, Meis2, Tubb4b
#> Pcsk1n, Dbpht2, Rap1b, Dynll1, Tubb2a, Sdf2l1, Scgn, 1700086L19Rik, Scg2, Abcc8
#> Atp1b1, Hspa5, Fam183b, Papss2, Slc38a5, Scg3, Mageh1, Tspan7, Ppp1r1a, Ociad2
#> Negative: Neurog3, Btbd17, Gadd45a, Ppp1r14a, Neurod2, Sox4, Smarcd2, Mdk, Pax4, Btg2
#> Sult2b1, Hes6, Grasp, Igfbpl1, Gpx2, Cbfa2t3, Foxa3, Shf, Mfng, Tmsb4x
#> Amotl2, Gdpd1, Cdc14b, Epb42, Rcor2, Cotl1, Upk3bl, Rbfox3, Cldn6, Cer1
#> StandardPC_ 3
#> Positive: Nusap1, Top2a, Birc5, Aurkb, Cdca8, Pbk, Mki67, Tpx2, Plk1, Ccnb1
#> 2810417H13Rik, Incenp, Cenpf, Ccna2, Prc1, Racgap1, Cdk1, Aurka, Cdca3, Hmmr
#> Spc24, Kif23, Sgol1, Cenpe, Cdc20, Hist1h1b, Cdca2, Mxd3, Kif22, Ska1
#> Negative: Anxa5, Pdzk1ip1, Acot1, Tpm1, Anxa2, Dcdc2a, Capg, Sparc, Ttr, Pamr1
#> Clu, Cxcl12, Ndrg2, Hnf1aos1, Gas6, Gsta3, Krt18, Ces1d, Atp1b1, Muc1
#> Hhex, Acadm, Spp1, Enpp2, Bcl2l14, Sat1, Smtnl2, 1700011H14Rik, Tgm2, Fam159a
#> StandardPC_ 4
#> Positive: Glud1, Tm4sf4, Akr1c19, Cldn4, Runx1t1, Fev, Pou3f4, Gm43861, Pgrmc1, Arx
#> Cd200, Lrpprc, Hmgn3, Ppp1r14c, Pam, Etv1, Tsc22d1, Slc25a5, Akap17b, Pgf
#> Fam43a, Emb, Jun, Krt8, Dnajc12, Mid1ip1, Ids, Rgs17, Uchl1, Alcam
#> Negative: Ins2, Ins1, Ppp1r1a, Nnat, Calr, Sytl4, Sdf2l1, Iapp, Pdia6, Mapt
#> G6pc2, C2cd4b, Npy, Gng12, P2ry1, Ero1lb, Adra2a, Papss2, Arhgap36, Fam151a
#> Dlk1, Creld2, Gip, Tmem215, Gm27033, Cntfr, Prss53, C2cd4a, Lyve1, Ociad2
#> StandardPC_ 5
#> Positive: Pdx1, Nkx6-1, Npepl1, Cldn4, Cryba2, Fev, Jun, Chgb, Gng12, Adra2a
#> Mnx1, Sytl4, Pdk3, Gm27033, Nnat, Chga, Ins2, 1110012L19Rik, Enho, Krt7
#> Mlxipl, Tmsb10, Flrt1, Pax4, Tubb3, Prrg2, Gars, Frzb, BC023829, Gm2694
#> Negative: Irx2, Irx1, Gcg, Ctxn2, Tmem27, Ctsz, Tmsb15l, Nap1l5, Pou6f2, Gria2
#> Ghrl, Peg10, Smarca1, Arx, Lrpap1, Rgs4, Ttr, Gast, Tmsb15b2, Serpina1b
#> Slc16a10, Wnk3, Ly6e, Auts2, Sct, Arg1, Dusp10, Sphkap, Dock11, Edn3
#> ℹ [2025-09-20 13:41:31] Perform `Seurat::FindClusters()` with louvain and `cluster_resolution` = 0.6
#> ℹ [2025-09-20 13:41:31] Reorder clusters...
#> ! [2025-09-20 13:41:31] Using `Seurat::AggregateExpression()` to calculate pseudo-bulk data for <Assay5>
#> ℹ [2025-09-20 13:41:31] Perform umap nonlinear dimension reduction
#> ℹ [2025-09-20 13:41:31] Non-linear dimensionality reduction (umap) using (Standardpca) dims (1-50) as input
#> ℹ [2025-09-20 13:41:31] UMAP will return its model
#> ℹ [2025-09-20 13:41:36] Non-linear dimensionality reduction (umap) using (Standardpca) dims (1-50) as input
#> ℹ [2025-09-20 13:41:36] UMAP will return its model
#> ✔ [2025-09-20 13:41:40] Run scop standard workflow done
pancreas_sub <- RunDEtest(
  pancreas_sub,
  group_by = "CellType"
)
#> ℹ [2025-09-20 13:41:40] immunogenomics/presto installed successfully
#> ℹ [2025-09-20 13:41:41] Data type is log-normalized
#> ℹ [2025-09-20 13:41:41] Start differential expression test
#> ℹ [2025-09-20 13:41:41] Find all markers(wilcox) among 5 groups...
#> ℹ [2025-09-20 13:41:41] Using 1 core
#> ⠙ [2025-09-20 13:41:41] Running [1/5] ETA: 1s
#> ✔ [2025-09-20 13:41:41] Completed 5 tasks in 856ms
#>
#> ℹ [2025-09-20 13:41:41] Building results
#> ✔ [2025-09-20 13:41:42] Differential expression test completed
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  DE_threshold = "p_val_adj < 0.05",
  db = "GO_BP",
  species = "Mus_musculus"
)
#> ℹ [2025-09-20 13:41:42] Start Enrichment analysis
#> ℹ [2025-09-20 13:41:42] clusterProfiler installed successfully
#> ℹ [2025-09-20 13:41:42] Species: Mus_musculus
#> ℹ [2025-09-20 13:41:42] Loading cached: GO_BP version: 3.21.0 nterm:15445 created: "2025-09-20 13:12:47"
#> ℹ [2025-09-20 13:41:43] Permform enrichment...
#> ℹ [2025-09-20 13:41:43] Using 1 core
#> ⠙ [2025-09-20 13:41:43] Running [1/5] ETA: 1m
#> ⠹ [2025-09-20 13:41:43] Running [2/5] ETA: 49s
#> ⠸ [2025-09-20 13:41:43] Running [3/5] ETA: 32s
#> ⠼ [2025-09-20 13:41:43] Running [4/5] ETA: 15s
#> ✔ [2025-09-20 13:41:43] Completed 5 tasks in 1m 13s
#>
#> ℹ [2025-09-20 13:42:56] Building results
#> ✔ [2025-09-20 13:42:56] Enrichment analysis done
EnrichmentPlot(
  pancreas_sub,
  db = "GO_BP",
  group_by = "CellType",
  plot_type = "comparison"
)
#> Warning: Vectorized input to `element_text()` is not officially supported.
#> ℹ Results may be unexpected or may change in future versions of ggplot2.
#> Warning: Removed 113 rows containing missing values or values outside the scale range
#> (`geom_point()`).
if (FALSE) { # \dontrun{
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  DE_threshold = "p_val_adj < 0.05",
  db = c("MSigDB", "MSigDB_MH"),
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "MSigDB",
  group_by = "CellType",
  plot_type = "comparison"
)
EnrichmentPlot(
  pancreas_sub,
  db = "MSigDB_MH",
  group_by = "CellType",
  plot_type = "comparison"
)

# Remove redundant GO terms
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  db = "GO_BP",
  GO_simplify = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "GO_BP_sim",
  group_by = "CellType",
  plot_type = "comparison"
)

# Or use "geneID" and "geneID_groups" as input to run enrichment
de_df <- dplyr::filter(
  pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox,
  p_val_adj < 0.05
)
enrich_out <- RunEnrichment(
  geneID = de_df[["gene"]],
  geneID_groups = de_df[["group1"]],
  db = "GO_BP",
  species = "Mus_musculus"
)
EnrichmentPlot(
  res = enrich_out,
  db = "GO_BP",
  plot_type = "comparison"
)

# Use a combined database
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group_by = "CellType",
  db = c(
    "KEGG", "WikiPathway", "Reactome", "PFAM", "MP"
  ),
  db_combine = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "Combined",
  group_by = "CellType",
  plot_type = "comparison"
)
} # }
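The DE_threshold argument is a filter condition written as a string, and the geneID/geneID_groups route expects one gene vector plus one group label per gene. The sketch below shows one way a string condition of that form can be applied to a differential expression table in base R. The table values are invented, and evaluating the string with eval(parse()) is an assumption made for illustration, not a description of how the function parses the argument.

```r
# Toy DE table. Column names mirror those used in the examples above;
# the genes, groups, and statistics are made up.
de_df <- data.frame(
  gene = c("Ins1", "Gcg", "Spp1", "Neurog3"),
  group1 = c("Beta", "Alpha", "Ductal", "Endocrine"),
  avg_log2FC = c(2.1, 1.4, -0.8, 0.3),
  p_val_adj = c(1e-10, 1e-4, 1e-3, 0.2),
  stringsAsFactors = FALSE
)

# Filter condition in the same form as the DE_threshold default
DE_threshold <- "avg_log2FC > 0 & p_val_adj < 0.05"

# Evaluate the condition with the table's columns in scope
keep <- eval(parse(text = DE_threshold), envir = de_df)
de_sig <- de_df[keep, , drop = FALSE]

# One gene ID and one group label per retained row
geneID <- de_sig$gene
geneID_groups <- factor(de_sig$group1)
print(data.frame(geneID, geneID_groups))
```

Vectors built this way have matching lengths, which is what the geneID and geneID_groups arguments require.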