Annotate features in a Seurat object with additional metadata from databases or a GTF file.
Usage
AnnotateFeatures(
  srt,
  species = "Homo_sapiens",
  IDtype = c("symbol", "ensembl_id", "entrez_id"),
  db = NULL,
  db_update = FALSE,
  db_version = "latest",
  convert_species = TRUE,
  Ensembl_version = 103,
  mirror = NULL,
  gtf = NULL,
  merge_gtf_by = "gene_name",
  columns = c("seqname", "feature", "start", "end", "strand", "gene_id", "gene_name",
    "gene_type"),
  assays = "RNA",
  overwrite = FALSE
)
Arguments
- srt
Seurat object to be annotated.
- species
Name of the species to be used for annotation. Default is "Homo_sapiens".
- IDtype
Type of identifier to use for annotation. One of "symbol", "ensembl_id", or "entrez_id". Default is "symbol".
- db
Vector of database names to be used for annotation. Default is NULL.
- db_update
Logical value indicating whether to update the database. Default is FALSE.
- db_version
Version of the database to use. Default is "latest".
- convert_species
Logical value indicating whether to use a species-converted database when annotation is missing for the specified species. Default is TRUE.
- Ensembl_version
Version of the Ensembl database to use. Default is 103.
- mirror
URL of the mirror to use for the Ensembl database. Default is NULL.
- gtf
Path to the GTF file to be used for annotation. Default is NULL.
- merge_gtf_by
Name of the GTF column used to merge the GTF annotation with the features. Default is "gene_name".
- columns
Vector of column names to be used from the GTF file. Default is c("seqname", "feature", "start", "end", "strand", "gene_id", "gene_name", "gene_type").
- assays
Character vector of assay names to be annotated. Default is "RNA".
- overwrite
Logical value indicating whether to overwrite existing metadata. Default is FALSE.
Examples
data(pancreas_sub)
pancreas_sub <- AnnotateFeatures(
  srt = pancreas_sub,
  species = "Mus_musculus",
  db = c(
    "Chromosome",
    "GeneType",
    "Enzyme",
    # "TF",
    # "CSPA",
    "VerSeDa"
  )
)
#> ℹ [2025-07-26 06:19:04] Species: Mus_musculus
#> ℹ [2025-07-26 06:19:05] Preparing database: Chromosome
#> ℹ [2025-07-26 06:19:05] Preparing database: GeneType
#> ℹ [2025-07-26 06:19:06] Preparing database: Enzyme
#> ℹ [2025-07-26 06:19:08] Preparing database: VerSeDa
#> ℹ [2025-07-26 06:19:13] Connect to the Ensembl archives...
#> ℹ [2025-07-26 06:19:15] Using the 103 version of biomart...
#> ℹ [2025-07-26 06:19:15] Connecting to the biomart...
#> ℹ [2025-07-26 06:20:15] Error in `req_perform()`:
#> ℹ [2025-07-26 06:20:15] ! Failed to perform HTTP request.
#> ℹ [2025-07-26 06:20:15] Caused by error in `curl::curl_fetch_memory()`:
#> ℹ [2025-07-26 06:20:15] ! Timeout was reached [feb2021.archive.ensembl.org]:
#> ℹ [2025-07-26 06:20:15] Operation timed out after 60001 milliseconds with 0 bytes received
#> ℹ [2025-07-26 06:20:15]
#> ℹ [2025-07-26 06:20:15] Get errors when connecting with ensembl mart...
#> ℹ [2025-07-26 06:20:16] Retrying...
#> ℹ [2025-07-26 06:24:19] Error in `req_perform()`:
#> ℹ [2025-07-26 06:24:19] ! Failed to perform HTTP request.
#> ℹ [2025-07-26 06:24:19] Caused by error in `curl::curl_fetch_memory()`:
#> ℹ [2025-07-26 06:24:19] ! Timeout was reached [feb2021.archive.ensembl.org]:
#> ℹ [2025-07-26 06:24:19] Operation timed out after 60002 milliseconds with 0 bytes received
#> ℹ [2025-07-26 06:24:19]
#> ℹ [2025-07-26 06:24:19] Get errors when connecting with ensembl mart...
#> Error in log_message(out, message_type = "error"): Error in `req_perform()`: ! Failed to perform HTTP request. Caused by
#> error in `curl::curl_fetch_memory()`: ! Timeout was reached
#> [feb2021.archive.ensembl.org]: Operation timed out after 60002 milliseconds
#> with 0 bytes received
head(
  GetFeaturesData(
    pancreas_sub,
    assays = "RNA"
  )
)
#>                 TF CSPA
#> Xkr4          <NA> <NA>
#> Mrpl15        <NA> <NA>
#> 4732440D04Rik <NA> <NA>
#> Gm26901       <NA> <NA>
#> Sntg1         <NA> <NA>
#> Mybl1           TF <NA>
if (FALSE) { # \dontrun{
  # Annotate features using a GTF file
  pancreas_sub <- AnnotateFeatures(
    pancreas_sub,
    gtf = "/refdata-gex-mm10-2020-A/genes/genes.gtf"
  )
  head(
    GetFeaturesData(
      pancreas_sub,
      assays = "RNA"
    )
  )
} # }
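The following base R sketch illustrates, on toy data, what the merge_gtf_by and columns arguments are described as controlling: features are matched against one GTF column, and only the requested columns are kept. It is not the function's actual implementation. The data frame gtf_df, the vectors feature_names and keep, and all coordinate values are made up for illustration.

```r
# Feature names as they might appear in an assay (gene symbols).
feature_names <- c("Xkr4", "Mrpl15", "Sntg1")

# Toy stand-in for gene-level records parsed from a GTF file.
# All values here are invented for illustration only.
gtf_df <- data.frame(
  seqname   = c("chr1", "chr1", "chr2"),
  start     = c(100L, 500L, 900L),
  end       = c(200L, 800L, 1500L),
  strand    = c("-", "+", "+"),
  gene_id   = c("gene_A", "gene_B", "gene_C"),
  gene_name = c("Xkr4", "Mrpl15", "NotInAssay"),
  stringsAsFactors = FALSE
)

# Hypothetical local variables playing the roles of `merge_gtf_by` and `columns`.
merge_by <- "gene_name"
keep <- c("seqname", "start", "end", "gene_id")

# Look up each feature in the chosen key column; features without a match
# get a row of NA values.
idx <- match(feature_names, gtf_df[[merge_by]])
annotation <- gtf_df[idx, keep, drop = FALSE]
rownames(annotation) <- feature_names

print(annotation)
```

In this sketch, a feature with no matching record (here "Sntg1") keeps its row but carries NA in every annotation column, so the result always has one row per feature.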