Enzyme Function Initiative Tools

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.

UniProt Updates

Background as of October 2, 2025

You may have noticed that the UniProt pages for many UniProtKB/TrEMBL entries now have a banner (boxed in red in this example) describing their likely removal in the 2025_04 (October 8, 2025) or 2026_02 (Spring 2026) release.

The UniProtKB database used by the EFI tools is undergoing major reorganization starting with the 2025_04 release (https://www.uniprot.org/release-notes/forthcoming-changes). When the reorganization is fully implemented, the number of proteins in UniProtKB is expected to decrease from ~253M accessions in the current 2025_03 release to ~141M accessions in the 2026_02 release.

The purpose of the reorganization is to "ensure an improved representation of species biodiversity in UniProtKB". This will be achieved by restricting the protein space in UniProtKB to the sequences included in the "Reference Proteomes" identified by UniProt. The reorganized UniProtKB will also include expert-reviewed sequences in UniProtKB/Swiss-Prot, as well as unreviewed sequences that are associated with experimental Gene Ontology annotations or other biologically important data, such as a 3D structure.

The reduction in the number of sequences will be achieved by removing the sequences from the "Other Proteomes" and "Redundant Proteomes" that have been identified by UniProt and will not be promoted to Reference Proteomes in the reorganization (https://ftp.ebi.ac.uk/pub/contrib/UniProt/proteomes/README.txt; 160,292 proteomes will be removed, archived here on October 2, 2025). The removal will take place in two steps: 1) proteins belonging to unclassified taxa (no genus and species) will be removed in release 2025_04, and 2) the remaining proteins will be removed in release 2026_02. When the 2026_02 release is available, a total of ~141M entries will have been removed.

Until the reorganization is completed, we will not know its full impact on the EFI tools. However, we expect that the reduction in the number of sequences will decrease the functional diversity represented in the database, so that SSNs may be less useful for examining sequence-function space in protein families. The Plans for the EFI Tool Database section (below) describes how we plan to restore functional diversity to the database used by the tools.

Immediate Changes to the Tools

During the reorganization, we plan to continue providing the current 2025_03 release until the 2026_02 release is available.

Job input, the generation of xgmml files for SSNs, and the operation of the SSN utilities will continue as "normal".

To ease the transition to the reorganized UniProtKB, three node attributes will be added to the SSNs:

  1. EMBL Protein ID. Each protein in UniProt, whether in UniProtKB or UniParc, is assigned a unique EMBL Protein ID that can be used to search the UniProtKB and UniParc databases for the protein.
  2. UniParc ID. Each unique sequence in UniParc is assigned an identifier.
  3. Proteome ID. A unique identifier is assigned to the set of proteins that constitute a proteome.
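SSNs are downloaded as XGMML (XML) files, so the new node attributes can be pulled out of a network without opening it in Cytoscape. The sketch below is one possible way to do this with the Python standard library; it is not an EFI utility. It assumes the attribute names match the labels in the list above ("EMBL Protein ID", "UniParc ID", "Proteome ID"); these names are hypothetical, so check the actual names in your own SSN file. It also assumes Cytoscape-style XGMML, where multi-valued attributes are stored as `type="list"` with nested `att` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute names: confirm them against the <att name="..."> entries in your SSN.
WANTED = ("EMBL Protein ID", "UniParc ID", "Proteome ID")


def _local(tag):
    """Strip an XML namespace such as {http://www.cs.rpi.edu/XGMML} from a tag."""
    return tag.rsplit("}", 1)[-1]


def _att_value(att):
    """Return a scalar value, or a list of values for a Cytoscape-style list attribute."""
    if att.get("type") == "list":
        return [child.get("value") for child in att if _local(child.tag) == "att"]
    return att.get("value")


def node_attributes(xgmml_source, wanted=WANTED):
    """Map each node label to the requested attributes found in an XGMML file.

    xgmml_source may be a filename or a file-like object.
    """
    root = ET.parse(xgmml_source).getroot()
    result = {}
    for node in root.iter():
        if _local(node.tag) != "node":
            continue
        attrs = {}
        # Only direct <att> children of the node are node attributes;
        # nested <att> elements belong to list attributes.
        for att in node:
            if _local(att.tag) == "att" and att.get("name") in wanted:
                attrs[att.get("name")] = _att_value(att)
        result[node.get("label", node.get("id"))] = attrs
    return result
```

For example, `node_attributes("my_ssn.xgmml")` (a placeholder filename) returns a dictionary keyed by node label, which can then be written to a table of UniParc IDs or EMBL Protein IDs for retrieval.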

Retrieval of Protein and DNA Sequences for Proteins Removed from UniProtKB

The UniProt annotation pages for proteins removed from UniProtKB (and archived in UniParc) will not be available when UniProtKB is queried (https://www.uniprot.org/). Until the utility described in the last section is completed, this file provides a pipeline for retrieving protein and DNA sequences for UniProtKB entries that have been removed and archived in UniParc.
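As a rough illustration of this kind of retrieval (not the EFI pipeline described in the file), the sketch below fetches a protein sequence from UniParc by its UniParc ID and a coding sequence from ENA by its EMBL Protein ID. The two URL patterns are assumptions based on the public UniProt REST API and ENA Browser API; confirm them against the current documentation before relying on them. The helper names are hypothetical.

```python
import urllib.request

# Assumed endpoint patterns; verify against the UniProt REST and ENA Browser API docs.
UNIPARC_FASTA_URL = "https://rest.uniprot.org/uniparc/{upi}.fasta"
ENA_FASTA_URL = "https://www.ebi.ac.uk/ena/browser/api/fasta/{accession}"


def uniparc_fasta_url(upi):
    """URL for the protein sequence of a UniParc entry (hypothetical helper)."""
    return UNIPARC_FASTA_URL.format(upi=upi.strip())


def ena_fasta_url(embl_protein_id):
    """URL for the nucleotide coding sequence of an EMBL Protein ID (hypothetical helper)."""
    return ENA_FASTA_URL.format(accession=embl_protein_id.strip())


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(url, timeout=30):
    """Download a FASTA file over HTTPS and parse it into (header, sequence) records."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

With the UniParc ID and EMBL Protein ID node attributes from an SSN, `fetch_fasta(uniparc_fasta_url(upi))` would return the protein sequence and `fetch_fasta(ena_fasta_url(embl_id))` the DNA coding sequence, assuming both services accept these identifiers in this form.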

Plans for the EFI Tool Database

UniProt will continue to identify Other Proteomes that will not be included in UniProtKB; these proteomes will be archived in the UniParc database, and their proteins will be assigned unique EMBL Protein IDs. These proteins are expected to contain the functional diversity, important to EFI tool users, that will be removed in the UniProtKB reorganization.

We plan to include the proteins in the Other Proteomes together with the entries in the reorganized UniProtKB in the database used by the EFI tools.

Genome context information will be available for the proteins in the Other Proteomes. EFI-GNT will continue to provide both genome neighborhood networks (GNNs) and genome neighborhood diagrams (GNDs) for the combined UniProtKB and Other Proteomes database.

Plans for a Utility to Access Node Attribute Information

We are developing a "user-friendly" web utility to retrieve the protein sequence, DNA coding sequence, node attribute information, and GNDs for the proteins in SSNs generated with UniProtKB IDs (from UniProtKB) and EMBL Protein IDs (from Other Proteomes).

Questions

Click here to contact us for help, to report issues, or to make suggestions.