EFI - Taxonomy Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
Important Notice

The UniProtKB database used by the EFI tools is undergoing major reorganization starting with the 2025_04 release (https://www.uniprot.org/release-notes/forthcoming-changes). When the reorganization is fully implemented (2026_02 release, Spring 2026), the number of proteins in UniProtKB is expected to decrease from ~253M accessions in the current 2025_03 release to ~141M accessions in the 2026_02 release.

In response to these changes, we plan to continue providing the 2025_03 release until the 2026_02 release is available.

More information about the changes is located here.

Dataset Completed

Submission Name: IP91_RSS_NoFragments_Minlen140

The parameters for generating the initial dataset are summarized in the table.

Job Number: 26126
Time Started: 11/6 07:05 PM
Time Finished: 11/7 07:55 AM
Database Version: UniProt: 2022-04 / InterPro: 91
Input Option: Families (Option B)
Job Name: IP91_RSS_NoFragments_Minlen140
E-Value for SSN Edge Calculation:
Pfam / InterPro Family: IPR000385, IPR001989, IPR002684, IPR003698, IPR003739, IPR004383, IPR004558, IPR004559, IPR005839, IPR005840, IPR005909, IPR005911, IPR005980, IPR006463, IPR006466, IPR006467, IPR006638, IPR007197, IPR010505, IPR010722, IPR010723, IPR011101, IPR011843, IPR012726, IPR012837, IPR012838, IPR012839, IPR013483, IPR013704, IPR013848, IPR013917, IPR014191, IPR016431, IPR016771, IPR016779, IPR016863, IPR017200, IPR017672, IPR017742, IPR017833, IPR017834, IPR019939, IPR019940, IPR020050, IPR020612, IPR022431, IPR022432, IPR022447, IPR022459, IPR022462, IPR022881, IPR022946, IPR023404, IPR023805, IPR023807, IPR023819, IPR023820, IPR023821, IPR023822, IPR023858, IPR023862, IPR023863, IPR023867, IPR023868, IPR023874, IPR023880, IPR023885, IPR023886, IPR023891, IPR023897, IPR023904, IPR023912, IPR023913, IPR023930, IPR023969, IPR023979, IPR023980, IPR023984, IPR023992, IPR023993, IPR023995, IPR024001, IPR024007, IPR024016, IPR024017, IPR024018, IPR024021, IPR024023, IPR024025, IPR024032, IPR024177, IPR024521, IPR024560, IPR024924, IPR025895, IPR026322, IPR026332, IPR026335, IPR026344, IPR026346, IPR026351, IPR026357, IPR026401, IPR026404, IPR026407, IPR026412, IPR026423, IPR026426, IPR026429, IPR026447, IPR026482, IPR027492, IPR027526, IPR027527, IPR027559, IPR027564, IPR027570, IPR027583, IPR027586, IPR027596, IPR027604, IPR027608, IPR027609, IPR027621, IPR027622, IPR027626, IPR027633, IPR030801, IPR030837, IPR030894, IPR030896, IPR030905, IPR030915, IPR030933, IPR030950, IPR030969, IPR030977, IPR030989, IPR031003, IPR031004, IPR031010, IPR031012, IPR031014, IPR031015, IPR031019, IPR031691, IPR032432, IPR033971, IPR033974, IPR033975, IPR033976, IPR034165, IPR034386, IPR034391, IPR034405, IPR034422, IPR034428, IPR034436, IPR034438, IPR034457, IPR034462, IPR034465, IPR034466, IPR034471, IPR034474, IPR034479, IPR034480, IPR034485, IPR034491, IPR034497, IPR034498, IPR034505, IPR034508, IPR034514, IPR034515, IPR034519, IPR034529, IPR034530, IPR034531, IPR034532, IPR034534, IPR034547, IPR034556, IPR034557, IPR034559, IPR034560, IPR034687, IPR038135, IPR039661, IPR040063, IPR040072, IPR040074, IPR040081, IPR040082, IPR040085, IPR040086, IPR040087, IPR040088, IPR041582, IPR045375, IPR045567, IPR045784, PF04055, PF06969, PF08497, PF12345, PF13186, PF16199, PF16881, PF19238, PF19288, PF19864
Number of IDs in Pfam / InterPro Family: 717,988
Exclude Fragments: Yes
Total Number of Sequences in Dataset: 717,988

The taxonomy distribution for the UniProt IDs identified as members of the input list of families is displayed.

The UniRef90 and UniRef50 clusters containing the UniProt IDs in the sunburst are identified using the lookup table provided by UniProt/UniRef. These clusters may contain UniProt IDs from other families; in addition, the UniRef90 and UniRef50 clusters at a selected taxonomy category may contain UniProt IDs from other categories. This happens because UniRef90 and UniRef50 clusters group UniProt IDs that share ≥90% and ≥50% sequence identity, respectively, regardless of family membership or taxonomy.
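As a minimal sketch of the lookup described above, the following Python maps family member UniProt IDs to their UniRef90 and UniRef50 cluster IDs using a small in-memory table. The table contents, all accession and cluster IDs, and the helper names `clusters_for` and `cluster_members` are hypothetical; the real tool uses the full lookup table distributed by UniProt/UniRef. The last lines show how a cluster reached from a family member can also contain an ID that is not in the family.

```python
from typing import Dict, Set, Tuple

# Hypothetical lookup table: UniProt ID -> (UniRef90 cluster ID, UniRef50 cluster ID).
# All IDs are made up for illustration.
UNIREF_LOOKUP: Dict[str, Tuple[str, str]] = {
    "P00001": ("UniRef90_P00001", "UniRef50_P00001"),
    "P00002": ("UniRef90_P00001", "UniRef50_P00001"),  # >=90% identical to P00001
    "P00003": ("UniRef90_P00003", "UniRef50_P00001"),  # >=50% identical, own UniRef90
    "Q99999": ("UniRef90_P00003", "UniRef50_P00001"),  # NOT a member of the family
}


def clusters_for(ids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return the UniRef90 and UniRef50 cluster IDs that contain any of `ids`."""
    uniref90 = {UNIREF_LOOKUP[i][0] for i in ids if i in UNIREF_LOOKUP}
    uniref50 = {UNIREF_LOOKUP[i][1] for i in ids if i in UNIREF_LOOKUP}
    return uniref90, uniref50


def cluster_members(cluster_id: str, level: int) -> Set[str]:
    """All UniProt IDs in a cluster (level 0 = UniRef90, level 1 = UniRef50)."""
    return {uid for uid, pair in UNIREF_LOOKUP.items() if pair[level] == cluster_id}


family = {"P00001", "P00002", "P00003"}
u90, u50 = clusters_for(family)
print(sorted(u90))  # ['UniRef90_P00001', 'UniRef90_P00003']
print(sorted(u50))  # ['UniRef50_P00001']
# The UniRef90 cluster reached via P00003 also contains Q99999, from outside the family.
print(sorted(cluster_members("UniRef90_P00003", 0) - family))  # ['Q99999']
```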

The numbers of UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs for the selected category are displayed.
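One plausible way to compute the three counts for a selected category is sketched below, assuming each UniProt ID carries its taxonomic lineage plus the UniRef90 and UniRef50 cluster IDs from the lookup table. The `Member` record, the sample lineages and IDs, and `counts_for_category` are hypothetical. Because each UniProt ID belongs to exactly one UniRef90 and one UniRef50 cluster, counting distinct cluster IDs gives UniRef counts that never exceed the UniProt count.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Member:
    uniprot_id: str
    lineage: Tuple[str, ...]  # taxa from the top rank downward (hypothetical format)
    uniref90: str
    uniref50: str


def counts_for_category(members: List[Member], category: str) -> Dict[str, int]:
    """Count UniProt IDs and distinct UniRef90/UniRef50 cluster IDs for members
    whose lineage includes `category` (hypothetical helper)."""
    selected = [m for m in members if category in m.lineage]
    return {
        "UniProt": len({m.uniprot_id for m in selected}),
        "UniRef90": len({m.uniref90 for m in selected}),
        "UniRef50": len({m.uniref50 for m in selected}),
    }


members = [
    Member("P00001", ("Bacteria", "Firmicutes"), "UniRef90_P00001", "UniRef50_P00001"),
    Member("P00002", ("Bacteria", "Firmicutes"), "UniRef90_P00001", "UniRef50_P00001"),
    Member("P00003", ("Bacteria", "Proteobacteria"), "UniRef90_P00003", "UniRef50_P00001"),
    Member("P00004", ("Archaea", "Euryarchaeota"), "UniRef90_P00004", "UniRef50_P00004"),
]
print(counts_for_category(members, "Bacteria"))
# {'UniProt': 3, 'UniRef90': 2, 'UniRef50': 1}
```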

The sunburst is interactive: clicking a taxonomy category zooms the display to that category, and clicking the center circle returns the display to the next higher rank.
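The zoom behavior can be modeled as navigation in a taxonomy tree, where each node's count includes all IDs at or below it. This is a simplified, hypothetical sketch (`TaxonNode`, `build_tree`, and `SunburstView` are not part of the tool): clicking a segment moves the view to that child taxon, and clicking the center moves it to the parent. For brevity it only allows zooming to immediate children, whereas a real sunburst lets any visible descendant be clicked.

```python
from typing import Dict, List, Optional, Tuple


class TaxonNode:
    """One sunburst segment: a taxon, its child taxa, and an ID count."""

    def __init__(self, name: str, parent: Optional["TaxonNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: Dict[str, "TaxonNode"] = {}
        self.count = 0  # number of IDs at or below this taxon


def build_tree(lineages: List[Tuple[str, ...]]) -> TaxonNode:
    """Build a taxonomy tree from one lineage per ID, accumulating counts."""
    root = TaxonNode("root")
    for lineage in lineages:
        node = root
        node.count += 1
        for taxon in lineage:
            child = node.children.get(taxon)
            if child is None:
                child = TaxonNode(taxon, node)
                node.children[taxon] = child
            node = child
            node.count += 1
    return root


class SunburstView:
    """Hypothetical zoom state for the sunburst display."""

    def __init__(self, root: TaxonNode) -> None:
        self.center = root

    def click_segment(self, name: str) -> None:
        # Zoom in: the clicked child taxon becomes the new center.
        self.center = self.center.children[name]

    def click_center(self) -> None:
        # Zoom out one rank; stay put if already at the root.
        if self.center.parent is not None:
            self.center = self.center.parent


root = build_tree([("Bacteria", "Firmicutes"), ("Bacteria", "Proteobacteria"),
                   ("Archaea", "Euryarchaeota")])
view = SunburstView(root)
view.click_segment("Bacteria")
print(view.center.name, view.center.count)  # Bacteria 2
view.click_segment("Firmicutes")
view.click_center()
print(view.center.name)  # Bacteria
```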

Length histograms (number of sequences at each length): UniProt, UniRef90, UniRef50
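The three length histograms can be approximated as below. This sketch assumes each UniRef90 or UniRef50 cluster is counted once, using the length of its representative sequence; how the tool actually assigns a length to a cluster is not stated on this page. All IDs, lengths, and the `length_histogram` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def length_histogram(lengths: Iterable[int]) -> Dict[int, int]:
    """Map each sequence length to the number of sequences with that length."""
    return dict(sorted(Counter(lengths).items()))


# Hypothetical data: UniProt ID -> (length, UniRef90 cluster, UniRef50 cluster).
records: Dict[str, Tuple[int, str, str]] = {
    "P00001": (350, "UniRef90_P00001", "UniRef50_P00001"),
    "P00002": (348, "UniRef90_P00001", "UniRef50_P00001"),
    "P00003": (350, "UniRef90_P00003", "UniRef50_P00001"),
}
# Assumed representative-sequence length for each cluster (hypothetical).
rep_length = {"UniRef90_P00001": 350, "UniRef90_P00003": 350, "UniRef50_P00001": 350}

uniprot_hist = length_histogram(length for length, _, _ in records.values())
uniref90_ids = {c90 for _, c90, _ in records.values()}
uniref50_ids = {c50 for _, _, c50 in records.values()}
uniref90_hist = length_histogram(rep_length[c] for c in uniref90_ids)
uniref50_hist = length_histogram(rep_length[c] for c in uniref50_ids)
print(uniprot_hist)   # {348: 1, 350: 2}
print(uniref90_hist)  # {350: 2}
print(uniref50_hist)  # {350: 1}
```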
Portions of these data are derived from the Universal Protein Resource (UniProt) databases.
