EFI - Enzyme Similarity Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).

The tools are available without charge or license to both academic and commercial users.

Important Notice

The UniProtKB database used by the EFI tools is undergoing a major reorganization, starting with the recently released version 2025_04 (https://www.uniprot.org/help/refprot_only_changes). When the reorganization is fully implemented (2026_02 release, Spring 2026), the number of proteins in UniProtKB will decrease from ~253M accessions in the previous 2025_03 release to ~141M accessions in the 2026_02 release.

In response to these changes, we will continue to provide the previous 2025_03 release until the 2026_02 release is available.

The current 2025_04 release removed 82M UniProt IDs; the UniProt pages providing functional annotation for these IDs are no longer active. A new Metadata Tool provides access to the node attribute metadata for all UniProt IDs in the 2025_03 release, which the tools continue to use during the UniProtKB reorganization. The tool is available from the tab at the top of each page.

More information about the reorganization is located here.

EFI-EST and Cytoscape Tutorials

Network File Download

The network file download page includes three tables.

The first summarizes the input you chose and is useful for record keeping.

Summary of input for SSN generation

The other two tables list, for each network, a download link, the representative node %ID, the number of nodes, the number of edges, and the file size.

Download of SSNs

The upper of these two tables contains the "full" network created at your specified alignment score threshold. By default, this network contains all of the sequences/nodes in your input sequence set. However, this frequently results in very large files (~500 MB or larger) that open and run very slowly, or not at all, on most laptop and desktop computers. As a very rough guide, Cytoscape networks with a few thousand nodes (protein sequences) and fewer than ~500,000 edges can generally be viewed, although this depends on your computer. View this "full" network whenever possible, because it provides access to annotation information for each node in your dataset. Full networks with more than 10 million edges are not generated.
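The size guidance above can be turned into a quick pre-check before downloading a full network. This is a minimal sketch: the helper `assess_full_network` is hypothetical, and the 5,000-node cutoff standing in for "a few thousand nodes" is an assumption. Only the ~500 MB, ~500,000-edge, and 10-million-edge figures come from this page, and all of them are rough.

```python
# Rough size guidance quoted from this page; actual limits depend on your computer.
VIEWABLE_EDGE_GUIDE = 500_000         # "fewer than ~500,000 edges" can usually be viewed
LARGE_FILE_MB = 500                   # files of ~500 MB or larger often open very slowly
FULL_NETWORK_EDGE_LIMIT = 10_000_000  # full networks above this are not generated
FEW_THOUSAND_NODES = 5_000            # ASSUMPTION: stand-in for "a few thousand nodes"


def assess_full_network(num_nodes: int, num_edges: int, size_mb: float) -> str:
    """Hypothetical helper: classify a full SSN using the rough guidance above."""
    if num_edges > FULL_NETWORK_EDGE_LIMIT:
        return "not generated"
    if (
        num_nodes <= FEW_THOUSAND_NODES
        and num_edges < VIEWABLE_EDGE_GUIDE
        and size_mb < LARGE_FILE_MB
    ):
        return "likely viewable"
    return "may be slow; consider a rep node network"


print(assess_full_network(3_000, 200_000, 80))         # likely viewable
print(assess_full_network(40_000, 12_000_000, 2_000))  # not generated
```

The node, edge, and file size values for the call come straight from the columns of the download table.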

In cases where the full network file is too large to open, the lower table provides the ability to download “representative node” networks. In a representative node (rep node) network, sequences that share at least a specified %ID are grouped into the same node using a program called CD-HIT (4, 5). For example, in a 90% ID rep node network, each node contains sequences that share ≥ 90% identity over ANY length of their amino acid sequences. Edges are drawn as for a full network, except that the longest sequence in each rep node is used to determine the alignment score to other rep nodes. For example, if your specified alignment score threshold for the network output was 28, edges are drawn only between rep nodes whose representative sequences share an alignment score of 28 or larger. Rep node networks are automatically calculated at 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, and 100% sequence identity to ensure that you can open one or more of them on your computer. The number of sequences contained within each rep node, as well as the UniProt IDs for those sequences, can be viewed in the Cytoscape node attributes panel.
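To make the grouping concrete, here is a simplified sketch of greedy, longest-first clustering in the spirit of CD-HIT, followed by edge drawing between rep nodes. It is not CD-HIT itself: `toy_identity` is an ungapped stand-in for CD-HIT's alignment, the function names are hypothetical, and the sequences and pairwise alignment scores (`pair_scores`) are made-up values for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def toy_identity(a: str, b: str) -> float:
    """Ungapped percent identity over the shorter sequence (stand-in for a real alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def greedy_cluster(
    seqs: Dict[str, str],
    identity: Callable[[str, str], float],
    threshold: float,
) -> Dict[str, List[str]]:
    """Longest-first greedy clustering; each rep node is keyed by its representative."""
    clusters: Dict[str, List[str]] = {}
    # Visit longest sequences first, so each representative is the longest member of its node.
    for acc in sorted(seqs, key=lambda a: len(seqs[a]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[acc]) >= threshold:
                clusters[rep].append(acc)  # join the first rep node that matches
                break
        else:
            clusters[acc] = [acc]  # no match: this sequence starts a new rep node
    return clusters


def rep_node_edges(
    clusters: Dict[str, List[str]],
    score: Callable[[str, str], float],
    min_score: float,
) -> List[Tuple[str, str]]:
    """Connect rep nodes whose representative sequences meet the alignment score threshold."""
    reps = list(clusters)
    return [
        (a, b)
        for i, a in enumerate(reps)
        for b in reps[i + 1 :]
        if score(a, b) >= min_score
    ]


# Made-up example data: five sequences and hypothetical pairwise alignment scores.
seqs = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQ",
    "C": "MKTAYLAKQR",
    "D": "GGSSGGSSGG",
    "E": "MKTAYIGGQR",
}
pair_scores: Dict[FrozenSet[str], float] = {
    frozenset({"A", "E"}): 40.0,
    frozenset({"A", "D"}): 5.0,
}

clusters = greedy_cluster(seqs, toy_identity, threshold=90.0)
edges = rep_node_edges(
    clusters, lambda a, b: pair_scores.get(frozenset({a, b}), 0.0), min_score=28
)
print(clusters)  # {'A': ['A', 'C', 'B'], 'D': ['D'], 'E': ['E']}
print(edges)     # [('A', 'E')]
```

At 90% ID, sequences B and C fold into the rep node represented by A (the longest member), while E, at 80% identity to A, becomes its own rep node; an edge between A and E appears only because their hypothetical alignment score of 40 meets the threshold of 28.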

Downloaded files are in the XGMML format and can be imported into Cytoscape: once you have started the Cytoscape program, choose File → Import → Network and select an XGMML file. For more information on using Cytoscape, please see the tutorials here.
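Because XGMML is an XML format, a downloaded network's size and node attributes can also be checked with a short script before opening it in Cytoscape. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse` and strips whatever namespace the file declares. The helper names are hypothetical, and the attribute names in `SAMPLE` (such as `List of IDs in Rep Node`) are illustrative assumptions, not a documented EFI-EST schema; inspect a real file to see which attributes it carries.

```python
import io
import xml.etree.ElementTree as ET
from typing import Any, BinaryIO, Dict, Tuple, Union

Source = Union[str, BinaryIO]  # a file path or an open binary file


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix, e.g. '{http://www.cs.rpi.edu/XGMML}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def read_att(att: ET.Element) -> Any:
    """Return an <att> value; list-typed attributes keep their items in child <att> elements."""
    if att.get("type") == "list":
        return [child.get("value") for child in att if local_name(child.tag) == "att"]
    return att.get("value")


def summarize_xgmml(source: Source) -> Tuple[int, int]:
    """Count <node> and <edge> elements, clearing each one after it is counted."""
    nodes = edges = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "node":
            nodes += 1
            elem.clear()
        elif name == "edge":
            edges += 1
            elem.clear()
    return nodes, edges


def node_attributes(source: Source) -> Dict[str, Dict[str, Any]]:
    """Map each node's label (or id) to its top-level <att> name/value pairs."""
    result: Dict[str, Dict[str, Any]] = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "node":
            key = elem.get("label") or elem.get("id")
            result[key] = {
                att.get("name"): read_att(att)
                for att in elem
                if local_name(att.tag) == "att"
            }
            elem.clear()
    return result


# Tiny illustrative file; the attribute names are assumptions, not a documented schema.
SAMPLE = b"""<?xml version="1.0"?>
<graph label="demo" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="A" label="A">
    <att name="Number of IDs in Rep Node" type="integer" value="3"/>
    <att name="List of IDs in Rep Node" type="list">
      <att type="string" value="A"/>
      <att type="string" value="B"/>
      <att type="string" value="C"/>
    </att>
  </node>
  <node id="E" label="E"/>
  <edge source="A" target="E" label="A,E"/>
</graph>"""

print(summarize_xgmml(io.BytesIO(SAMPLE)))  # (2, 1)
print(node_attributes(io.BytesIO(SAMPLE))["A"]["List of IDs in Rep Node"])  # ['A', 'B', 'C']
```

For a real download, pass the file path instead of `io.BytesIO(SAMPLE)`; `iterparse` reads the file incrementally, which keeps memory use lower than parsing the whole tree at once.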

