Retrieve taxonomy for families.
The UniProt sequences from user-specified Pfam families, InterPro families/domains, and/or Pfam clans are retrieved.
The taxonomic distribution of the UniProt IDs is displayed as a "sunburst" in which the levels of classification (superkingdom, kingdom, phylum, class, order, family, genus, species) are displayed radially, with superkingdom at the center and species in the outermost ring. The sunburst is interactive, providing the ability to zoom to a selected taxonomic level. The numbers of UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs at the selected taxonomic level are provided.
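The per-level counts behind the sunburst can be sketched as a tree whose nodes are keyed by lineage prefix. This is a minimal illustration, not the EFI implementation. The record layout, the helper names `node_counts` and `summarize`, and the toy taxa and IDs are all hypothetical.

```python
from collections import Counter

# Hypothetical record layout: (UniProt ID, UniRef90 ID, UniRef50 ID, lineage),
# where lineage lists taxa from superkingdom toward species.


def node_counts(records):
    """Count UniProt IDs under every node of the taxonomy tree.

    Assumes one record per UniProt ID. A node is identified by its lineage
    prefix, so taxa that share a name but have different parents stay separate.
    """
    counts = Counter()
    for _uniprot, _ur90, _ur50, lineage in records:
        for depth in range(1, len(lineage) + 1):
            counts[tuple(lineage[:depth])] += 1
    return counts


def summarize(records, node):
    """Count distinct UniProt, UniRef90, and UniRef50 IDs at a selected node."""
    selected = [r for r in records if tuple(r[3][:len(node)]) == node]
    return {
        "UniProt": len({r[0] for r in selected}),
        "UniRef90": len({r[1] for r in selected}),
        "UniRef50": len({r[2] for r in selected}),
    }


# Toy data (hypothetical IDs and taxa).
records = [
    ("P1", "UniRef90_P1", "UniRef50_P1", ["Bacteria", "K1", "Phylum1"]),
    ("P2", "UniRef90_P1", "UniRef50_P1", ["Bacteria", "K1", "Phylum2"]),
    ("P3", "UniRef90_P3", "UniRef50_P1", ["Archaea", "K2", "Phylum3"]),
]
print(node_counts(records)[("Bacteria",)])      # → 2
print(summarize(records, ("Bacteria",)))
# → {'UniProt': 2, 'UniRef90': 1, 'UniRef50': 1}
```

The UniRef90 and UniRef50 numbers are counts of distinct clusters, so at any node they can be smaller than the UniProt count, as the Bacteria node shows.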
The UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs, as well as the FASTA-formatted sequences, at the selected level can be downloaded.
The UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs can be transferred to EFI-EST to generate an SSN and/or to the Retrieve Neighborhood Diagrams/Sequence ID Lookup option of EFI-GNT to generate genome neighborhood diagrams (GNDs).
Retrieve taxonomy for FASTA files.
The user provides a list/file of FASTA-formatted sequences whose headers contain the UniProt ID. The UniProt ID is required because it is read from each FASTA header and then used to retrieve the taxonomy from the UniProt database.
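A minimal sketch of FASTA header "reading", assuming each header either follows the UniProtKB `>db|Accession|EntryName` layout or contains the accession as a separate token. The accession pattern is the one published in the UniProt documentation. How EFI itself parses headers is not described here, so `accession_from_header` and `read_accessions` are hypothetical helpers.

```python
import re

# UniProtKB accession pattern as published in the UniProt documentation.
ACCESSION = re.compile(
    r"\b([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def accession_from_header(header):
    """Return the UniProt accession in one FASTA header line, or None."""
    header = header.lstrip(">")
    fields = header.split("|")
    # UniProtKB style (assumed): >sp|Accession|EntryName Description
    if len(fields) >= 3 and fields[0] in ("sp", "tr"):
        return fields[1]
    # Otherwise fall back to the first accession-like token in the header.
    match = ACCESSION.search(header)
    return match.group(1) if match else None


def read_accessions(fasta_text):
    """Collect one accession (or None) per header line in a FASTA string."""
    return [accession_from_header(line)
            for line in fasta_text.splitlines() if line.startswith(">")]


print(accession_from_header(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha"))
# → P69905
```

A header that yields `None` has no recognizable UniProt ID, so no taxonomy could be retrieved for that sequence.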
The taxonomic distribution of the UniProt IDs is displayed as a "sunburst" in which the levels of classification (superkingdom, kingdom, phylum, class, order, family, genus, species) are displayed radially, with superkingdom at the center and species in the outermost ring. The sunburst is interactive, providing the ability to zoom to a selected taxonomic level. The number of UniProt IDs at the selected taxonomic level is provided.
The UniProt IDs and their FASTA-formatted sequences at the selected level can be downloaded.
The UniProt IDs can be transferred to EFI-EST to generate an SSN and/or to the Retrieve Neighborhood Diagrams/Sequence ID Lookup option of EFI-GNT to generate genome neighborhood diagrams (GNDs).
Retrieve taxonomy for accession IDs.
The user provides a list/file of UniProt IDs, UniRef90 cluster IDs, or UniRef50 cluster IDs.
UniRef90 cluster IDs and UniRef50 cluster IDs are expanded to UniProt IDs. For a curated family, the number of UniProt IDs obtained by expanding UniRef90 cluster IDs may be larger than the number of UniProt IDs identified by protein family databases, e.g., Pfam. Likewise, the numbers of UniProt IDs and UniRef90 cluster IDs obtained by expanding UniRef50 cluster IDs may both be larger than the numbers identified by these databases. This occurs because 1) UniRef90 clusters can contain divergent UniProt IDs that are not members of the family and 2) UniRef50 clusters can contain divergent UniRef90 clusters that are not members of the family. Users should be aware of this behavior when SSNs are generated using UniProt IDs from expanded UniRef90 cluster IDs, or using UniProt IDs or UniRef90 cluster IDs from expanded UniRef50 cluster IDs. The problem does not occur when UniRef90 clusters are identified from UniProt IDs or when UniRef50 clusters are identified from UniRef90 cluster IDs, i.e., for the UniRef90 and UniRef50 cluster IDs identified by the Families option and Option B of EFI-EST.
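The inflation caused by expansion can be reproduced with toy data. This sketch assumes in-memory mappings from UniRef50 clusters to their UniRef90 clusters and from UniRef90 clusters to their UniProt members. The IDs, the family membership, and the helpers `expand` and `family_uniref90` are hypothetical. Cluster IDs are recognized by their `UniRef90_`/`UniRef50_` prefixes.

```python
# Toy cluster membership (hypothetical IDs).
UNIREF90 = {
    "UniRef90_A": ["A", "A2", "X1"],  # X1: divergent UniProt ID, not a family member
    "UniRef90_B": ["B"],
    "UniRef90_Y": ["Y1"],             # divergent cluster with no family members
}
UNIREF50 = {
    "UniRef50_A": ["UniRef90_A", "UniRef90_B", "UniRef90_Y"],
}

# UniProt IDs assigned to the family by a database such as Pfam (toy data).
FAMILY = {"A", "A2", "B"}


def expand(ids):
    """Expand UniRef50/UniRef90 cluster IDs down to UniRef90 and UniProt IDs."""
    uniref90, uniprot = set(), set()
    for identifier in ids:
        if identifier.startswith("UniRef50_"):
            uniref90.update(UNIREF50[identifier])
        elif identifier.startswith("UniRef90_"):
            uniref90.add(identifier)
        else:
            uniprot.add(identifier)
    for cluster in uniref90:
        uniprot.update(UNIREF90[cluster])
    return uniref90, uniprot


def family_uniref90(family):
    """UniRef90 clusters identified from the family's UniProt IDs (no expansion)."""
    return {c for c, members in UNIREF90.items() if family & set(members)}


_, uniprot = expand(["UniRef90_A", "UniRef90_B"])
print(sorted(uniprot - FAMILY))  # → ['X1']
```

Expanding UniRef90_A and UniRef90_B returns four UniProt IDs, one more than the three family members, because X1 shares a cluster with A. Expanding UniRef50_A additionally pulls in UniRef90_Y and its member Y1. Identifying clusters from the family's UniProt IDs, as `family_uniref90` does, returns only the two clusters that contain members.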
The taxonomic distribution of the UniProt IDs is displayed as a "sunburst" in which the levels of classification (superkingdom, kingdom, phylum, class, order, family, genus, species) are displayed radially, with superkingdom at the center and species in the outermost ring. The sunburst is interactive, providing the ability to zoom to a selected taxonomic level. The numbers of UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs at the selected taxonomic level are provided.
The UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs, as well as the FASTA-formatted sequences, at the selected level can be downloaded.
The UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs can be transferred to EFI-EST to generate an SSN and/or to the Retrieve Neighborhood Diagrams/Sequence ID Lookup option of EFI-GNT to generate genome neighborhood diagrams (GNDs).