Perform the enrichment analysis (over-representation) on the genes
Source: R/RunEnrichment.R
Usage
RunEnrichment(
  srt = NULL,
  group.by = NULL,
  test.use = "wilcox",
  DE_threshold = "avg_log2FC > 0 & p_val_adj < 0.05",
  geneID = NULL,
  geneID_groups = NULL,
  geneID_exclude = NULL,
  IDtype = "symbol",
  result_IDtype = "symbol",
  backend = c("cpp", "r"),
  species = "Homo_sapiens",
  db = "GO_BP",
  db_update = FALSE,
  db_version = "latest",
  db_combine = FALSE,
  convert_species = TRUE,
  Ensembl_version = NULL,
  mirror = NULL,
  features = NULL,
  TERM2GENE = NULL,
  TERM2NAME = NULL,
  minGSSize = 10,
  maxGSSize = 500,
  unlimited_db = c("Chromosome", "GeneType", "TF", "Enzyme", "CSPA"),
  GO_simplify = FALSE,
  GO_simplify_cutoff = "p.adjust < 0.05",
  simplify_method = "Wang",
  simplify_similarityCutoff = 0.7,
  cores = 1,
  verbose = TRUE,
  ...
)
Arguments
- srt
A Seurat object or SummarizedExperiment object containing the results of differential expression analysis (RunDEtest()). If specified, the genes and groups will be extracted from the object automatically. If not specified, the geneID and geneID_groups arguments must be provided.
- group.by
Name of one or more meta.data columns to group (color) cells by.
- test.use
A character vector specifying the test to be used in differential expression analysis. This argument is only used if srt is specified.
- DE_threshold
A character vector specifying the filter condition for differential expression analysis. This argument is only used if srt is specified.
- geneID
A character vector specifying the gene IDs.
- geneID_groups
A factor vector specifying the group labels for each gene.
- geneID_exclude
A character vector specifying the gene IDs to be excluded from the analysis.
- IDtype
A character vector specifying the type of gene IDs in the srt object or geneID argument. This argument is used to convert the gene IDs to a different type if IDtype is different from result_IDtype.
- result_IDtype
A character vector specifying the desired type of gene ID to be used in the output. This argument is used to convert the gene IDs from IDtype to result_IDtype.
- backend
Enrichment backend. "cpp" is the default; it uses a fast native hypergeometric ORA implementation and returns the enrichment table without enrichResult objects. "r" uses clusterProfiler::enricher() and returns enrichResult objects in results. GO_simplify = TRUE currently uses the R backend.
- species
A character vector specifying the species for which the gene annotation databases should be prepared. Can be "Homo_sapiens" or "Mus_musculus".
- db
A character vector specifying the annotation sources to be included in the gene annotation databases. Can be one or more of "GO", "GO_BP", "GO_CC", "GO_MF", "KEGG", "WikiPathway", "Reactome", "CORUM", "MP", "DO", "HPO", "PFAM", "CSPA", "Surfaceome", "SPRomeDB", "VerSeDa", "TFLink", "hTFtarget", "TRRUST", "JASPAR", "ENCODE", "MSigDB", "CellTalk", "CellChat", "Chromosome", "GeneType", "Enzyme", "TF", "CytoTRACE2". MSigDB subcollections can be requested as "MSigDB_<collection>", such as "MSigDB_H" for human Hallmark and "MSigDB_MH" for mouse Hallmark. Note: "CytoTRACE2" is species-independent and downloads pre-trained model data required by RunCytoTRACE.
- db_update
Whether to force an update of the gene annotation databases. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.
- db_version
A character vector specifying the version of the gene annotation databases to be retrieved. Default is "latest".
- db_combine
Whether to combine multiple databases into one. If TRUE, all databases specified by db will be combined into a single database named "Combined".
- convert_species
Whether to use a species-converted database when the annotation is missing for the specified species. Default is TRUE.
- Ensembl_version
An integer specifying the Ensembl version. Default is NULL. If NULL, the latest version will be used.
- mirror
Specify an Ensembl mirror to connect to. The valid options here are "www", "uswest", "useast", "asia".
- features
A named list of feature lists for custom enrichment gene sets. If provided, it takes precedence over TERM2GENE and db.
- TERM2GENE
A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.
- TERM2NAME
A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
- minGSSize
The minimum size of a gene set to be considered in the enrichment analysis.
- maxGSSize
The maximum size of a gene set to be considered in the enrichment analysis.
- unlimited_db
A character vector specifying the names of databases that do not have size restrictions.
- GO_simplify
Whether to simplify the GO terms. If TRUE, additional results with simplified GO terms will be returned.
- GO_simplify_cutoff
A character vector specifying the filter condition for simplification of GO terms. This argument is only used if GO_simplify is TRUE.
- simplify_method
A character vector specifying the method to be used for simplification of GO terms. This argument is only used if GO_simplify is TRUE.
- simplify_similarityCutoff
The similarity cutoff for simplification of GO terms. This argument is only used if GO_simplify is TRUE.
- cores
The number of cores to use for parallelization with foreach::foreach. Default is 1.
- verbose
Whether to print messages. Default is TRUE.
- ...
Passed to other functions.
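The custom gene set arguments may be easier to picture with concrete objects. The following base R sketch builds a features list and the equivalent TERM2GENE and TERM2NAME data frames, using the column order described above. The term IDs, term names, and gene symbols are invented for illustration, and representing each gene set as a character vector is an assumption about what "feature lists" means here.

```r
# Hypothetical gene sets: term IDs and gene symbols are illustrative only.
features <- list(
  SET_A = c("Ins1", "Ins2", "Gcg"),
  SET_B = c("Sst", "Ppy")
)

# TERM2GENE: first column holds term IDs, second column holds gene IDs.
TERM2GENE <- data.frame(
  Term = rep(names(features), lengths(features)),
  Gene = unlist(features, use.names = FALSE),
  stringsAsFactors = FALSE
)

# TERM2NAME: first column holds term IDs, second column holds term names.
TERM2NAME <- data.frame(
  Term = names(features),
  Name = c("Illustrative set A", "Illustrative set B"),
  stringsAsFactors = FALSE
)
```

Either form could then be supplied through the features argument or through TERM2GENE and TERM2NAME. Sets this small fall below the default minGSSize of 10, so according to the argument description they would not be considered unless minGSSize is lowered.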
Value
If input is a Seurat object, returns the modified Seurat object with the enrichment result stored in the tools slot.
If input is a geneID vector with or without geneID_groups, returns the enrichment result directly.
Enrichment result is a list with the following components:
- enrichment: A data.frame containing all enrichment results.
- results: A list of enrichResult objects from the DOSE package.
- geneMap: A data.frame containing the ID mapping table for input gene IDs.
- input: A data.frame containing the input gene IDs and gene ID groups.
- DE_threshold: A specific threshold for differential expression analysis (only returned if input is a Seurat object).
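For orientation, the statistic behind an over-representation result can be sketched with base R. The example below computes a hypergeometric upper-tail p-value for one term and checks it against a one-sided Fisher's exact test. It illustrates the general hypergeometric ORA technique and is not taken from the package's implementation; the helper name ora_pvalue, the counts, and the choice of Benjamini-Hochberg adjustment are assumptions made for this sketch.

```r
# Hypergeometric over-representation p-value for a single term.
# k: input genes annotated to the term
# n: input genes present in the background
# M: background genes annotated to the term
# N: background size
ora_pvalue <- function(k, n, M, N) {
  # P(X >= k) when drawing n genes without replacement from N genes,
  # of which M are annotated to the term.
  stats::phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Illustrative counts only.
p <- ora_pvalue(k = 8, n = 100, M = 200, N = 10000)

# The same tail probability from a one-sided Fisher's exact test.
# Rows: annotated / not annotated; columns: in input / not in input.
tab <- matrix(c(8, 92, 192, 9708), nrow = 2)
p_fisher <- stats::fisher.test(tab, alternative = "greater")$p.value

# Multiple-testing adjustment across terms (method assumed for illustration).
p_adj <- stats::p.adjust(c(p, 0.2, 0.8), method = "BH")
```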
Examples
data(pancreas_sub)
pancreas_sub <- standard_scop(pancreas_sub)
pancreas_sub <- RunDEtest(
  pancreas_sub,
  group.by = "CellType"
)
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group.by = "CellType",
  DE_threshold = "p_val_adj < 0.05",
  db = "GO_BP",
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "GO_BP",
  group.by = "CellType",
  plot_type = "comparison"
)
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group.by = "CellType",
  DE_threshold = "p_val_adj < 0.05",
  db = c("MSigDB", "MSigDB_MH"),
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "MSigDB",
  group.by = "CellType",
  plot_type = "comparison"
)
EnrichmentPlot(
  pancreas_sub,
  db = "MSigDB_MH",
  group.by = "CellType",
  plot_type = "comparison"
)
# Remove redundant GO terms
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group.by = "CellType",
  db = "GO_BP",
  GO_simplify = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "GO_BP_sim",
  group.by = "CellType",
  plot_type = "comparison"
)
# Or use "geneID" and "geneID_groups" as input to run enrichment
de_df <- dplyr::filter(
  pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox,
  p_val_adj < 0.05
)
enrich_out <- RunEnrichment(
  geneID = de_df[["gene"]],
  geneID_groups = de_df[["group1"]],
  db = "GO_BP",
  species = "Mus_musculus"
)
EnrichmentPlot(
  res = enrich_out,
  db = "GO_BP",
  plot_type = "comparison"
)
# Use a combined database
pancreas_sub <- RunEnrichment(
  pancreas_sub,
  group.by = "CellType",
  db = c(
    "KEGG", "WikiPathway", "Reactome", "PFAM", "MP"
  ),
  db_combine = TRUE,
  species = "Mus_musculus"
)
EnrichmentPlot(
  pancreas_sub,
  db = "Combined",
  group.by = "CellType",
  plot_type = "comparison"
)
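The DE_threshold strings used in the examples are filter conditions written as R expressions over columns of the differential expression table. The base R sketch below shows one way such a string can be applied to a table to obtain gene IDs and group labels. The toy table and its values are made up, and evaluating the string with parse() and eval() is an assumption for illustration, not a description of how RunEnrichment handles the condition internally.

```r
# Toy differential expression table; values are made up for illustration.
de <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  group1 = c("A", "A", "B", "B"),
  avg_log2FC = c(1.2, -0.8, 0.5, 2.0),
  p_val_adj = c(0.01, 0.001, 0.20, 0.04),
  stringsAsFactors = FALSE
)

DE_threshold <- "avg_log2FC > 0 & p_val_adj < 0.05"

# Parse the condition and evaluate it with the table's columns in scope.
keep <- eval(parse(text = DE_threshold), envir = de)
de_sig <- de[keep, , drop = FALSE]

# Genes and group labels in the shape that geneID and geneID_groups expect.
geneID <- de_sig$gene
geneID_groups <- factor(de_sig$group1)
```

With this toy table, only genes that are both up-regulated and significant pass the filter; dropping the avg_log2FC clause, as the examples above do, would also keep significant down-regulated genes.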