EFI - Enzyme Similarity Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
Important Notice

The UniProtKB database used by the EFI tools is undergoing major reorganization starting with the just-released version 2025_04 (https://www.uniprot.org/help/refprot_only_changes). When the reorganization is fully implemented (2026_02 release, Spring 2026), the number of proteins in UniProtKB will decrease from ~253M accessions in the previous 2025_03 release to ~141M accessions in the 2026_02 release.

In response to these changes, we will provide the previous 2025_03 release until the 2026_02 release is available.

The current 2025_04 release removed 82M UniProt IDs; the UniProt pages providing functional annotation for these IDs are no longer active. A new Metadata Tool provides access to the node attribute metadata for all UniProt IDs in the 2025_03 release, which the tools continue to use during the UniProtKB reorganization. The Metadata Tool is available from the tab at the top of each page.


Download Network Files

Submission Name: IP91_RSS_NoFragments Betaproteobacteria UniRef90_NoFragments_RSS_Betaproteobacteria

Network Name: IP91_RSS_NoFragments_Betaproteobacteria_UniRef90_NoFragments_RSS_Betaproteobacteria_Minlen140_AS11

The parameters used for the initial submission and the finalization are summarized in the table below.

Analysis Summary

Analysis Job Number: 26652
Network Name: IP91_RSS_NoFragments_Betaproteobacteria_UniRef90_NoFragments_RSS_Betaproteobacteria_Minlen140_AS11
Alignment Score: 11
Minimum Length: 140
Maximum Length: 50,000
Total Number of Sequences After Filtering: 11,936
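
The Alignment Score, Minimum Length, and Maximum Length settings act as filters on the original dataset. The Python sketch below illustrates the idea on toy data. It assumes that a sequence is kept when its length lies within the inclusive length bounds, and that an edge is kept when its alignment score is at least the threshold and both endpoints survive the length filter. The function name, data layout, and sequence IDs are hypothetical; this is not the EFI implementation.

```python
def filter_network(lengths, edges, min_len=140, max_len=50_000, min_score=11):
    """Return (kept_nodes, kept_edges) after length and alignment-score filtering.

    lengths: dict mapping sequence ID -> sequence length (hypothetical layout)
    edges:   iterable of (id_a, id_b, alignment_score) tuples
    """
    # Keep sequences whose length falls inside the inclusive bounds (assumption).
    kept_nodes = {sid for sid, n in lengths.items() if min_len <= n <= max_len}
    # Keep edges at or above the score threshold whose endpoints both survived.
    kept_edges = [
        (a, b, s)
        for a, b, s in edges
        if s >= min_score and a in kept_nodes and b in kept_nodes
    ]
    return kept_nodes, kept_edges


# Toy data with made-up IDs, lengths, and scores.
lengths = {"A": 350, "B": 120, "C": 410, "D": 298}
edges = [("A", "B", 40), ("A", "C", 25), ("C", "D", 8), ("A", "D", 11)]

nodes, kept = filter_network(lengths, edges)
print(sorted(nodes))
print(kept)
```

In this toy case, sequence B is dropped for being shorter than 140 residues, which also removes the A-B edge, and the C-D edge is dropped because its score is below 11.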

Dataset Summary

EST Job Number: 26192 (Original Dataset)
Database Version: UniProt: 2022-04 / InterPro: 91
Input Option: Accession IDs (Option D)
Job Name: IP91_RSS_NoFragments Betaproteobacteria UniRef90_NoFragments_RSS_Betaproteobacteria
Input Sequence Source: UniRef90
E-Value for SSN Edge Calculation: 5
Number of IDs in Uploaded File: 12,168 (12,168 UniProt ID matches and 0 unmatched)
Taxonomy Categories: Class: betaproteobacteria
Family Filter: IPR000385, IPR001989, IPR002684, IPR003698, IPR003739, IPR004383, IPR004558, IPR004559, IPR005839, IPR005840, IPR005909, IPR005911, IPR005980, IPR006463, IPR006466, IPR006467, IPR006638, IPR007197, IPR010505, IPR010722, IPR010723, IPR011101, IPR011843, IPR012726, IPR012837, IPR012838, IPR012839, IPR013483, IPR013704, IPR013848, IPR013917, IPR014191, IPR016431, IPR016771, IPR016779, IPR016863, IPR017200, IPR017672, IPR017742, IPR017833, IPR017834, IPR019939, IPR019940, IPR020050, IPR020612, IPR022431, IPR022432, IPR022447, IPR022459, IPR022462, IPR022881, IPR022946, IPR023404, IPR023805, IPR023807, IPR023819, IPR023820, IPR023821, IPR023822, IPR023858, IPR023862, IPR023863, IPR023867, IPR023868, IPR023874, IPR023880, IPR023885, IPR023886, IPR023891, IPR023897, IPR023904, IPR023912, IPR023913, IPR023930, IPR023969, IPR023979, IPR023980, IPR023984, IPR023992, IPR023993, IPR023995, IPR024001, IPR024007, IPR024016, IPR024017, IPR024018, IPR024021, IPR024023, IPR024025, IPR024032, IPR024177, IPR024521, IPR024560, IPR024924, IPR025895, IPR026322, IPR026332, IPR026335, IPR026344, IPR026346, IPR026351, IPR026357, IPR026401, IPR026404, IPR026407, IPR026412, IPR026423, IPR026426, IPR026429, IPR026447, IPR026482, IPR027492, IPR027526, IPR027527, IPR027559, IPR027564, IPR027570, IPR027583, IPR027586, IPR027596, IPR027604, IPR027608, IPR027609, IPR027621, IPR027622, IPR027626, IPR027633, IPR030801, IPR030837, IPR030894, IPR030896, IPR030905, IPR030915, IPR030933, IPR030950, IPR030969, IPR030977, IPR030989, IPR031003, IPR031004, IPR031010, IPR031012, IPR031014, IPR031015, IPR031019, IPR031691, IPR032432, IPR033971, IPR033974, IPR033975, IPR033976, IPR034165, IPR034386, IPR034391, IPR034405, IPR034422, IPR034428, IPR034436, IPR034438, IPR034457, IPR034462, IPR034465, IPR034466, IPR034471, IPR034474, IPR034479, IPR034480, IPR034485, IPR034491, IPR034497, IPR034498, IPR034505, IPR034508, IPR034514, IPR034515, IPR034519, IPR034529, IPR034530, IPR034531, IPR034532, IPR034534, IPR034547, IPR034556, IPR034557, IPR034559, IPR034560, IPR034687, IPR038135, IPR039661, IPR040063, IPR040072, IPR040074, IPR040081, IPR040082, IPR040085, IPR040086, IPR040087, IPR040088, IPR041582, IPR045375, IPR045567, IPR045784, PF04055, PF06969, PF08497, PF12345, PF13186, PF16199, PF16881, PF19238, PF19288, PF19864
Exclude Fragments: Yes
Total Number of Sequences in Dataset: 12,029
Total Number of Edges: 4,773,476
Number of Unique Sequences: 12,029
Convergence Ratio: 0.066
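
The convergence ratio can be checked against the other values in the Dataset Summary. The short Python sketch below assumes the ratio is the number of edges divided by the number of possible sequence pairs, N(N-1)/2. The page does not state this definition, but under that assumption the dataset's edge and sequence counts reproduce the reported 0.066. The function name is hypothetical.

```python
def convergence_ratio(num_edges, num_sequences):
    """Edges divided by the number of possible sequence pairs (assumed definition)."""
    possible_pairs = num_sequences * (num_sequences - 1) // 2
    return num_edges / possible_pairs


# Values from the Dataset Summary: 4,773,476 edges among 12,029 sequences.
ratio = convergence_ratio(4_773_476, 12_029)
print(round(ratio, 3))  # 0.066
```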
Please cite your use of the EFI tools:

Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735

Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018

The panels below provide files for full and representative node SSNs for download with the indicated numbers of nodes and edges. As an approximate guide, SSNs with ~2M edges can be opened with 16 GB RAM, ~5M edges can be opened with 32 GB RAM, ~10M edges can be opened with 64 GB RAM, ~20M edges can be opened with 128 GB RAM, ~40M edges can be opened with 256 GB RAM, and ~120M edges can be opened with 768 GB RAM.
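
The RAM guide above can be turned into a small lookup. The Python sketch below is one possible reading of it: it assumes an SSN should be matched to the smallest listed tier whose edge count is at least as large as the network's, and returns nothing for networks beyond the largest tier. The table values come from the guide; the function name and the round-up rule are assumptions.

```python
import bisect

# (approximate edge count, RAM in GB) pairs taken from the guide above
RAM_GUIDE = [
    (2_000_000, 16),
    (5_000_000, 32),
    (10_000_000, 64),
    (20_000_000, 128),
    (40_000_000, 256),
    (120_000_000, 768),
]


def suggested_ram_gb(num_edges):
    """Return the RAM (GB) of the smallest tier covering num_edges, or None."""
    thresholds = [edge_count for edge_count, _ in RAM_GUIDE]
    # Index of the first tier whose edge count is >= num_edges.
    i = bisect.bisect_left(thresholds, num_edges)
    return RAM_GUIDE[i][1] if i < len(RAM_GUIDE) else None


# The full network on this page has 4,364,852 edges.
print(suggested_ram_gb(4_364_852))  # 32
```

Because the guide is approximate, the result is a rough starting point rather than a guarantee that a given network will load.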

Files may be transferred to the Genome Neighborhood Tool (GNT), the Color SSN utility, the Cluster Analysis utility, or the Neighborhood Connectivity utility.

Full Network

Each node in the network represents a single protein sequence.

# Nodes # Edges
11,936 4,364,852


Representative Node Networks

In representative node (RepNode) networks, each node in the network represents a collection of proteins grouped according to percent identity. For example, for a 75% identity RepNode network, all connected sequences that share 75% or more identity are grouped into a single node (meta node). Collapsing sequences in this way reduces the overall number of nodes, producing less complicated networks that are easier to load in Cytoscape.

The cluster organization is not changed, and the clustering of sequences remains identical to the full network.
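
To make the grouping idea concrete, the Python sketch below collapses sequences into meta nodes using single-linkage grouping with a union-find structure. This is an illustrative stand-in only: the page does not describe the algorithm the tool uses, and the function name, sequence IDs, and percent-identity values are hypothetical.

```python
def group_by_identity(ids, pair_identity, threshold=75.0):
    """Group sequences linked by pairs sharing >= threshold percent identity.

    ids:           list of sequence IDs
    pair_identity: dict mapping (id_a, id_b) -> percent identity
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair at or above the identity threshold.
    for (a, b), pid in pair_identity.items():
        if pid >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in groups.values())


# Toy data with made-up IDs and identities.
ids = ["P1", "P2", "P3", "P4"]
pair_identity = {("P1", "P2"): 82.0, ("P2", "P3"): 76.5, ("P3", "P4"): 40.0}

print(group_by_identity(ids, pair_identity))  # [['P1', 'P2', 'P3'], ['P4']]
```

With a 75% threshold, P1, P2, and P3 collapse into one meta node and P4 remains its own node, so four sequences become two nodes.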

% ID # Nodes # Edges
100 11,936 4,364,852
95 11,864 4,308,967
90 11,721 4,196,025
85 10,600 3,347,493
80 9,523 2,655,497
75 8,395 2,015,699
70 7,234 1,451,825
65 6,077 967,691
60 5,118 641,733
55 4,447 422,055
50 3,954 293,725
45 3,646 229,675
40 3,471 202,159
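
When memory is limited, one practical approach is to pick the RepNode network with the highest percent identity that still fits within an edge budget. The Python sketch below does this with the node and edge counts from the table above. The function name and the idea of a fixed edge budget are assumptions made for illustration.

```python
# (% ID, nodes, edges) rows copied from the table above, highest % ID first
REPNODE_NETWORKS = [
    (100, 11_936, 4_364_852),
    (95, 11_864, 4_308_967),
    (90, 11_721, 4_196_025),
    (85, 10_600, 3_347_493),
    (80, 9_523, 2_655_497),
    (75, 8_395, 2_015_699),
    (70, 7_234, 1_451_825),
    (65, 6_077, 967_691),
    (60, 5_118, 641_733),
    (55, 4_447, 422_055),
    (50, 3_954, 293_725),
    (45, 3_646, 229_675),
    (40, 3_471, 202_159),
]


def highest_identity_within(max_edges):
    """Return the first (highest % ID) row with at most max_edges edges, or None."""
    for pid, nodes, edges in REPNODE_NETWORKS:
        if edges <= max_edges:
            return pid, nodes, edges
    return None


# Example: stay at or under roughly 2M edges.
print(highest_identity_within(2_000_000))  # (70, 7234, 1451825)
```

The 75% network narrowly exceeds a 2M-edge budget, so the 70% network is the first one that fits.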


Portions of these data are derived from the Universal Protein Resource (UniProt) databases.
