EFI - Taxonomy Tool
This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
Important Notice
The UniProtKB database used by the EFI tools is undergoing a major
reorganization, starting with the just-released version 2025_04
(https://www.uniprot.org/help/refprot_only_changes).
When the reorganization is fully implemented (2026_02 release, Spring 2026),
the number of accessions in UniProtKB will decrease from ~253M in the previous
2025_03 release to ~141M in the 2026_02 release.
In response to these changes, we will provide the previous 2025_03 release
until the 2026_02 release is available.
The current 2025_04 release removed 82M UniProt IDs; the UniProt pages
providing functional annotation for these IDs are no longer active. A new
Metadata Tool provides access to the node attribute metadata for all UniProt
IDs in the 2025_03 release that the tools continue to use during the UniProtKB
reorganization. The tool is available from the tab at the top of each page.
More information about the reorganization is located here.
Dataset Completed
Submission Name: IP91_IPR004184_All
The parameters for generating the initial dataset are summarized in the table.
| Job Number | 26116 |
| Time Started -- Finished | 11/6 06:25 PM -- 11/6 06:55 PM |
| Database Version | UniProt: 2022-04 / InterPro: 91 |
| Input Option | Families (Option B) |
| Job Name | IP91_IPR004184_All |
| E-Value for SSN Edge Calculation | |
| Pfam / InterPro Family | IPR004184 |
| Number of IDs in Pfam / InterPro Family | 25,513 |
| Exclude Fragments | No |
| Total Number of Sequences in Dataset | 25,513 |
The taxonomy distribution of the UniProt IDs that are members of the input
families is displayed as a sunburst.
The UniRef90 and UniRef50 clusters containing the UniProt IDs in the sunburst
are identified using the lookup table provided by UniProt/UniRef. These
UniRef90 and UniRef50 clusters may contain UniProt IDs from other families; in
addition, the UniRef90 and UniRef50 clusters at a selected taxonomy category
may contain UniProt IDs from other categories. This occurs because UniRef90
and UniRef50 clusters group UniProt IDs that share ≥90% and ≥50% sequence
identity, respectively, regardless of family or taxonomy.
The numbers of UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs for
the selected category are displayed.
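The counting described above can be illustrated with a minimal sketch. It assumes a small in-memory lookup table; the IDs, lineages, cluster names, and the helper functions `counts_for_category` and `shared_clusters` are all hypothetical and are not part of the EFI tools or the UniProt/UniRef lookup table format.

```python
from collections import namedtuple

# Hypothetical lookup rows: UniProt ID, taxonomy lineage, UniRef90 and UniRef50 cluster IDs.
Record = namedtuple("Record", "uniprot lineage uniref90 uniref50")

RECORDS = [
    Record("ID_A", ("Bacteria", "Pseudomonadota"), "UniRef90_A", "UniRef50_A"),
    Record("ID_B", ("Bacteria", "Pseudomonadota"), "UniRef90_A", "UniRef50_A"),
    Record("ID_C", ("Bacteria", "Bacillota"), "UniRef90_C", "UniRef50_A"),
    Record("ID_D", ("Archaea", "Euryarchaeota"), "UniRef90_D", "UniRef50_D"),
]


def counts_for_category(records, category):
    """Count distinct UniProt, UniRef90, and UniRef50 IDs under one taxonomy category."""
    selected = [r for r in records if category in r.lineage]
    return {
        "uniprot": len({r.uniprot for r in selected}),
        "uniref90": len({r.uniref90 for r in selected}),
        "uniref50": len({r.uniref50 for r in selected}),
    }


def shared_clusters(records, category, level="uniref50"):
    """Return cluster IDs found in the category that also hold IDs from other categories."""
    inside = {getattr(r, level) for r in records if category in r.lineage}
    outside = {getattr(r, level) for r in records if category not in r.lineage}
    return inside & outside


if __name__ == "__main__":
    print(counts_for_category(RECORDS, "Pseudomonadota"))
    print(shared_clusters(RECORDS, "Pseudomonadota"))
```

In this toy data, the UniRef50 cluster seen under Pseudomonadota also contains an ID from Bacillota, which is the kind of overlap between categories that the text describes.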
The sunburst is interactive: clicking on a taxonomy category zooms the display
to that category, and clicking on the center circle returns the display to the
next higher rank.
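The zoom behavior can be modeled as navigation over a parent map. The sketch below is only an illustration under that assumption; the `SunburstView` class, the `PARENT_OF` table, and the "Root" node are hypothetical and do not reflect how the page itself is implemented.

```python
# Hypothetical taxonomy: each category maps to its parent (the next higher rank).
PARENT_OF = {
    "Bacteria": "Root",
    "Pseudomonadota": "Bacteria",
    "Gammaproteobacteria": "Pseudomonadota",
}


class SunburstView:
    """Tracks which taxonomy category is currently shown at the center."""

    def __init__(self, parent_of, root="Root"):
        self.parent_of = parent_of
        self.root = root
        self.current = root

    def click_category(self, name):
        """Zoom to the clicked category."""
        if name != self.root and name not in self.parent_of:
            raise KeyError(f"unknown category: {name}")
        self.current = name
        return self.current

    def click_center(self):
        """Return to the next higher rank; the root stays at the root."""
        self.current = self.parent_of.get(self.current, self.root)
        return self.current


if __name__ == "__main__":
    view = SunburstView(PARENT_OF)
    view.click_category("Gammaproteobacteria")
    print(view.click_center())
    print(view.click_center())
```

Each click on the center moves up exactly one level, so returning from a deep category to the full view takes one click per rank.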
[Histograms: number of sequences at each length for UniProt, UniRef90, and UniRef50]
Portions of these data are derived from the Universal Protein Resource (UniProt) databases.