EFI - Taxonomy Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
Reorganization of UniProtKB

With the current 2026_02 release, the UniProtKB database is reorganized to include an expanded number of Reference Proteomes to better capture biodiversity. This includes the removal of proteins from taxonomically unclassified organisms, i.e., those without a binomial species name (genus and species). The total number of accessions in UniProtKB has been reduced from 253,635,358 in the “legacy” 2025_03 release to 149,810,139 in the current 2026_02 release.
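To check the scale of the reorganization, the following Python snippet uses only the two totals quoted above. It computes how many accessions were dropped between the releases and what share of the legacy release that represents.

```python
# Accession totals quoted above for the two UniProtKB releases
LEGACY_2025_03 = 253_635_358
CURRENT_2026_02 = 149_810_139

# Number of accessions no longer present in the current release
removed = LEGACY_2025_03 - CURRENT_2026_02
# Fraction of the legacy release that was removed
fraction_removed = removed / LEGACY_2025_03

print(f"Accessions removed: {removed:,}")                          # Accessions removed: 103,825,219
print(f"Share of legacy release removed: {fraction_removed:.1%}")  # Share of legacy release removed: 40.9%
```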

We are providing the option to select either the “legacy” 2025_03 database or the current UniProtKB database (now 2026_02) when generating SSNs. You can select the database in the “Database” accordion on the pages for the EFI-EST options, the EFI-GNT tool, and the Taxonomy Tool. We suggest that you compare the SSNs, GNNs, and GNDs generated from both databases as you explore the information you are seeking.

Because the “legacy” 2025_03 release contains UniProt IDs that are no longer active on the UniProt website, we provide a Metadata Tool that gives access to the node attribute metadata for the UniProt IDs in the “legacy” 2025_03 release.

Dataset Completed

Submission Name: IP91_IPR004184_NoFragments

The parameters for generating the initial dataset are summarized in the table.

Job Number: 26117
Time Started: 11/6 06:25 PM
Time Finished: 11/6 06:55 PM
Input Option: Families (Option B)
Job Name: IP91_IPR004184_NoFragments
E-Value for SSN Edge Calculation:
Pfam / InterPro Family: IPR004184
Number of IDs in Pfam / InterPro Family: 21,636
Exclude Fragments: Yes
Total Number of Sequences in Dataset: 21,636

The taxonomy distribution for the UniProt IDs identified as members of the input list of families is displayed.

The UniRef90 and UniRef50 clusters containing the UniProt IDs in the sunburst are identified using the lookup table provided by UniProt/UniRef. These clusters may contain UniProt IDs from other families. In addition, the UniRef90 and UniRef50 clusters at a selected taxonomy category may contain UniProt IDs from other categories. This happens because UniRef90 and UniRef50 clusters group UniProt IDs that share ≥90% and ≥50% sequence identity, respectively, without regard to family membership or taxonomy.
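The following Python sketch illustrates why a cluster reached from a family's UniProt IDs can also contain IDs from outside that family. The same logic applies to IDs outside a selected taxonomy category. The lookup table `uniprot_to_uniref90`, the helper names, and all IDs are hypothetical stand-ins for the UniProt/UniRef lookup table. They are not real accessions and are not the tool's own code.

```python
from collections import defaultdict

# Hypothetical lookup table: UniProt ID -> UniRef90 cluster ID.
# (UniRef cluster IDs have the form "UniRef90_<representative accession>".)
uniprot_to_uniref90 = {
    "P_A1": "UniRef90_P_A1",
    "P_A2": "UniRef90_P_A1",
    "P_X9": "UniRef90_P_A1",  # same cluster by sequence identity, but not in the input family
    "P_B1": "UniRef90_P_B1",
}

def clusters_for(ids, id_to_cluster):
    """Return the set of cluster IDs that contain any of the given UniProt IDs."""
    return {id_to_cluster[i] for i in ids if i in id_to_cluster}

def outside_members(ids, id_to_cluster):
    """For each cluster reached from `ids`, list its members that are not in `ids`."""
    members = defaultdict(set)
    for uniprot_id, cluster in id_to_cluster.items():
        members[cluster].add(uniprot_id)
    wanted = set(ids)
    return {c: sorted(members[c] - wanted) for c in clusters_for(wanted, id_to_cluster)}

family_ids = ["P_A1", "P_A2", "P_B1"]
print(outside_members(family_ids, uniprot_to_uniref90))
```

Here the cluster `UniRef90_P_A1` is found through family members, yet it also carries `P_X9`, which was never part of the input family.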

The numbers of UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs for the selected category are displayed.
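The following is a minimal Python sketch of how the three counts for a selected category could be tallied. It keeps the UniProt IDs whose lineage includes the selected taxon, then counts the distinct UniRef90 and UniRef50 cluster IDs among them. The `Entry` record layout, the function name, and the example data are hypothetical, and taxon names are assumed to be unique. The page does not describe the tool's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """Hypothetical per-ID record: lineage from highest to lowest rank, plus cluster IDs."""
    uniprot_id: str
    lineage: tuple[str, ...]
    uniref90: str
    uniref50: str

def category_counts(entries, taxon):
    """Count UniProt IDs, distinct UniRef90 IDs, and distinct UniRef50 IDs under `taxon`."""
    selected = [e for e in entries if taxon in e.lineage]
    return (
        len({e.uniprot_id for e in selected}),
        len({e.uniref90 for e in selected}),
        len({e.uniref50 for e in selected}),
    )

entries = [
    Entry("ID1", ("Bacteria", "Pseudomonadota"), "UniRef90_ID1", "UniRef50_ID1"),
    Entry("ID2", ("Bacteria", "Pseudomonadota"), "UniRef90_ID1", "UniRef50_ID1"),
    Entry("ID3", ("Bacteria", "Bacillota"), "UniRef90_ID3", "UniRef50_ID1"),
    Entry("ID4", ("Archaea", "Euryarchaeota"), "UniRef90_ID4", "UniRef50_ID4"),
]

print(category_counts(entries, "Bacteria"))        # (3, 2, 1)
print(category_counts(entries, "Pseudomonadota"))  # (2, 1, 1)
```

Several UniProt IDs can share one cluster, so the UniRef90 and UniRef50 counts are never larger than the UniProt count for the same category.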

The sunburst is interactive: clicking on a taxonomy category zooms to that category, and clicking on the center circle returns the display to the next higher rank.

Number of sequences at each length - UniProt
Number of sequences at each length - UniRef90
Number of sequences at each length - UniRef50
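The three length plots are histograms of the number of sequences at each length. The following minimal Python sketch shows that tally. It assumes each UniRef cluster contributes the length of a single representative sequence, which is an assumption because the page does not say which length is plotted for a cluster. The IDs and lengths are hypothetical.

```python
from collections import Counter

def length_histogram(lengths):
    """Map each sequence length to the number of sequences with that length, sorted by length."""
    return dict(sorted(Counter(lengths).items()))

# Hypothetical lengths (residues) for UniProt IDs in a selected category
uniprot_lengths = {"ID1": 310, "ID2": 310, "ID3": 295, "ID4": 412}
# Hypothetical representative length per UniRef90 cluster (assumption, see above)
uniref90_rep_lengths = {"UniRef90_ID1": 310, "UniRef90_ID3": 295, "UniRef90_ID4": 412}

print(length_histogram(uniprot_lengths.values()))       # {295: 1, 310: 2, 412: 1}
print(length_histogram(uniref90_rep_lengths.values()))  # {295: 1, 310: 1, 412: 1}
```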
Portions of these data are derived from the Universal Protein Resource (UniProt) databases.

Contact us for help, to report issues, or to make suggestions.