Annotate features in a Seurat object with additional metadata from databases or a GTF file.

Usage

AnnotateFeatures(
  srt,
  species = "Homo_sapiens",
  IDtype = c("symbol", "ensembl_id", "entrez_id"),
  db = NULL,
  db_update = FALSE,
  db_version = "latest",
  convert_species = TRUE,
  Ensembl_version = 103,
  mirror = NULL,
  gtf = NULL,
  merge_gtf_by = "gene_name",
  columns = c("seqname", "feature", "start", "end", "strand", "gene_id", "gene_name",
    "gene_type"),
  assays = "RNA",
  overwrite = FALSE
)

Arguments

srt

Seurat object to be annotated.

species

Name of the species to be used for annotation. Default is "Homo_sapiens".

IDtype

Type of identifier used for the features. One of "symbol", "ensembl_id", or "entrez_id". Default is "symbol".

db

Vector of database names to be used for annotation. Default is NULL.

db_update

Logical value indicating whether to update the database. Default is FALSE.

db_version

Version of the database to use. Default is "latest".

convert_species

Logical value indicating whether to use a species-converted database when the annotation is missing for the specified species. Default is TRUE.

Ensembl_version

Version of the Ensembl database to use. Default is 103.

mirror

URL of the mirror to use for the Ensembl database. Default is NULL.

gtf

Path to the GTF file to be used for annotation. Default is NULL.

merge_gtf_by

Column name to merge the GTF file by. Default is "gene_name".

columns

Vector of column names to be used from the GTF file. Default is c("seqname", "feature", "start", "end", "strand", "gene_id", "gene_name", "gene_type").

assays

Character vector of assay names to be annotated. Default is "RNA".

overwrite

Logical value indicating whether to overwrite existing metadata. Default is FALSE.
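
To make the roles of merge_gtf_by and columns more concrete, the following base R sketch shows one way a GTF-derived table could be joined onto a feature table by a key column. It is an illustration of the general idea only, not the implementation used by AnnotateFeatures. The tables, gene names, and coordinates are hypothetical toy data, and keeping only the first GTF record per key is an assumption made for the sketch.

```r
# Toy feature table standing in for an assay's feature metadata (hypothetical).
features <- data.frame(
  TF = c(NA, NA, NA),
  row.names = c("GeneA", "GeneB", "GeneX"),
  stringsAsFactors = FALSE
)

# Toy GTF-like table (hypothetical values); "GeneB" appears twice on purpose.
gtf <- data.frame(
  seqname   = c("chr1", "chr2", "chr2"),
  start     = c(100L, 500L, 900L),
  end       = c(200L, 600L, 1000L),
  strand    = c("+", "-", "-"),
  gene_id   = c("ID_A", "ID_B", "ID_B"),
  gene_name = c("GeneA", "GeneB", "GeneB"),
  stringsAsFactors = FALSE
)

merge_gtf_by <- "gene_name"                          # key column in the GTF table
columns <- c("seqname", "start", "end", "strand", "gene_id")  # columns to carry over

# Assumption: keep the first record per key so each feature matches one row.
gtf_unique <- gtf[!duplicated(gtf[[merge_gtf_by]]), , drop = FALSE]

# match() yields NA for features absent from the GTF, so they get NA annotations.
idx <- match(rownames(features), gtf_unique[[merge_gtf_by]])
annotated <- cbind(features, gtf_unique[idx, columns, drop = FALSE])
rownames(annotated) <- rownames(features)

print(annotated)
```

In this sketch, features missing from the GTF table (here "GeneX") keep their row and receive NA in every merged column, so the feature table never loses rows.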

Examples

data(pancreas_sub)
pancreas_sub <- AnnotateFeatures(
  srt = pancreas_sub,
  species = "Mus_musculus",
  db = c(
    "Chromosome",
    "GeneType",
    "Enzyme",
    # "TF",
    # "CSPA",
    "VerSeDa"
  )
)
#>  [2025-07-26 06:19:04] Species: Mus_musculus
#>  [2025-07-26 06:19:05] Preparing database: Chromosome
#>  [2025-07-26 06:19:05] Preparing database: GeneType
#>  [2025-07-26 06:19:06] Preparing database: Enzyme
#>  [2025-07-26 06:19:08] Preparing database: VerSeDa
#>  [2025-07-26 06:19:13] Connect to the Ensembl archives...
#>  [2025-07-26 06:19:15] Using the 103 version of biomart...
#>  [2025-07-26 06:19:15] Connecting to the biomart...
#>  [2025-07-26 06:20:15] Error in `req_perform()`:
#>  [2025-07-26 06:20:15] ! Failed to perform HTTP request.
#>  [2025-07-26 06:20:15] Caused by error in `curl::curl_fetch_memory()`:
#>  [2025-07-26 06:20:15] ! Timeout was reached [feb2021.archive.ensembl.org]:
#>  [2025-07-26 06:20:15] Operation timed out after 60001 milliseconds with 0 bytes received
#>  [2025-07-26 06:20:15] 
#>  [2025-07-26 06:20:15] Get errors when connecting with ensembl mart...
#>  [2025-07-26 06:20:16] Retrying...
#>  [2025-07-26 06:21:16] Error in `req_perform()`:
#>  [2025-07-26 06:21:16] ! Failed to perform HTTP request.
#>  [2025-07-26 06:21:16] Caused by error in `curl::curl_fetch_memory()`:
#>  [2025-07-26 06:21:16] ! Timeout was reached [feb2021.archive.ensembl.org]:
#>  [2025-07-26 06:21:16] Operation timed out after 60002 milliseconds with 0 bytes received
#>  [2025-07-26 06:21:16] 
#>  [2025-07-26 06:21:16] Get errors when connecting with ensembl mart...
#>  [2025-07-26 06:21:17] Retrying...
#>  [2025-07-26 06:22:17] Error in `req_perform()`:
#>  [2025-07-26 06:22:17] ! Failed to perform HTTP request.
#>  [2025-07-26 06:22:17] Caused by error in `curl::curl_fetch_memory()`:
#>  [2025-07-26 06:22:17] ! Timeout was reached [feb2021.archive.ensembl.org]:
#>  [2025-07-26 06:22:17] Operation timed out after 60001 milliseconds with 0 bytes received
#>  [2025-07-26 06:22:17] 
#>  [2025-07-26 06:22:17] Get errors when connecting with ensembl mart...
#>  [2025-07-26 06:22:18] Retrying...
#>  [2025-07-26 06:23:18] Error in `req_perform()`:
#>  [2025-07-26 06:23:18] ! Failed to perform HTTP request.
#>  [2025-07-26 06:23:18] Caused by error in `curl::curl_fetch_memory()`:
#>  [2025-07-26 06:23:18] ! Timeout was reached [feb2021.archive.ensembl.org]:
#>  [2025-07-26 06:23:18] Operation timed out after 60001 milliseconds with 0 bytes received
#>  [2025-07-26 06:23:18] 
#>  [2025-07-26 06:23:18] Get errors when connecting with ensembl mart...
#>  [2025-07-26 06:23:19] Retrying...
#>  [2025-07-26 06:24:19] Error in `req_perform()`:
#>  [2025-07-26 06:24:19] ! Failed to perform HTTP request.
#>  [2025-07-26 06:24:19] Caused by error in `curl::curl_fetch_memory()`:
#>  [2025-07-26 06:24:19] ! Timeout was reached [feb2021.archive.ensembl.org]:
#>  [2025-07-26 06:24:19] Operation timed out after 60002 milliseconds with 0 bytes received
#>  [2025-07-26 06:24:19] 
#>  [2025-07-26 06:24:19] Get errors when connecting with ensembl mart...
#> Error in log_message(out, message_type = "error"): Error in `req_perform()`: ! Failed to perform HTTP request. Caused by
#> error in `curl::curl_fetch_memory()`: ! Timeout was reached
#> [feb2021.archive.ensembl.org]: Operation timed out after 60002 milliseconds
#> with 0 bytes received
head(
  GetFeaturesData(
    pancreas_sub,
    assays = "RNA"
  )
)
#>                 TF CSPA
#> Xkr4          <NA> <NA>
#> Mrpl15        <NA> <NA>
#> 4732440D04Rik <NA> <NA>
#> Gm26901       <NA> <NA>
#> Sntg1         <NA> <NA>
#> Mybl1           TF <NA>

if (FALSE) { # \dontrun{
# Annotate features using a GTF file
pancreas_sub <- AnnotateFeatures(
  pancreas_sub,
  gtf = "/refdata-gex-mm10-2020-A/genes/genes.gtf"
)
head(
  GetFeaturesData(
    pancreas_sub,
    assays = "RNA"
  )
)
} # }