EFI - Genome Neighborhood Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
Reorganization of UniProtKB

With the current 2026_02 release, the UniProtKB database has been reorganized to include an expanded set of Reference Proteomes that better captures biodiversity. The reorganization also removes proteins from taxonomically unclassified organisms, i.e., those without a binomial species name (genus and species). The total number of accessions in UniProtKB has been reduced from 253,635,358 in the “legacy” 2025_03 release to 149,810,139 in the current 2026_02 release.
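The size of the net reduction follows directly from the two accession counts above. The short Python snippet below is used only as a calculator to show the arithmetic.

```python
# Accession counts quoted above for the two UniProtKB releases.
legacy_2025_03 = 253_635_358
current_2026_02 = 149_810_139

# Net change between releases (new Reference Proteomes minus removed entries).
removed = legacy_2025_03 - current_2026_02
fraction_removed = removed / legacy_2025_03

print(f"Accessions removed (net): {removed:,}")
print(f"Fraction of legacy release: {fraction_removed:.1%}")
```

This gives a net reduction of 103,825,219 accessions, or about 41% of the “legacy” 2025_03 release.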

We provide the option to select either the “legacy” 2025_03 database or the current UniProtKB database (now 2026_02) when generating SSNs. Select the database in the “Database” accordion on the pages for the EFI-EST options, the EFI-GNT tool, and the Taxonomy Tool. We suggest that you compare the SSNs, GNNs, and GNDs generated from both databases as you explore the information you are seeking.

Because the “legacy” 2025_03 release contains UniProt IDs that are no longer active on the UniProt web site, we provide the Metadata Tool, which gives access to the node attribute metadata for the UniProt IDs in the “legacy” 2025_03 release.

Results

Submitted Network Name: 26147_IP91_IPR004184_NoFragments_Actinobacteria_UniRef90_NoFragments_IPR004184_Actinobacteria_Minlen650_AS240_full_ssn

The parameters for computing the GNN and associated files are summarized in the table.

Uploaded Filename: 26147_IP91_IPR004184_NoFragments_Actinobacteria_UniRef90_NoFragments_IPR004184_Actinobacteria_Minlen650_AS240_full_ssn.xgmml.zip
Neighborhood Size: 10
Input % Co-Occurrence: 20
Database Version:
Number of SSN clusters: 17
Number of SSN singletons: 14
SSN sequence source: UniRef90
Number of SSN (meta)nodes: 488
Number of accession IDs in SSN: 1,557
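The Neighborhood Size parameter (10 for this job) defines the window of genes examined around each query gene. The sketch below assumes the window spans that many genes on each side of the query within the same contig, clipped at the contig ends; the function name `neighborhood_window` and the ordered list of gene IDs used as input are hypothetical and not the EFI-GNT implementation.

```python
from typing import Sequence


def neighborhood_window(genes: Sequence[str], query_index: int, size: int) -> list[str]:
    """Return the genes within `size` positions on either side of the query.

    Hypothetical helper: assumes `genes` is the ordered gene list of one
    contig and that the window is +/- `size` genes, clipped at contig ends.
    """
    if not 0 <= query_index < len(genes):
        raise IndexError("query_index is outside the gene list")
    start = max(0, query_index - size)
    end = min(len(genes), query_index + size + 1)
    # Exclude the query gene itself; only its neighbors are returned.
    return [g for i, g in enumerate(genes[start:end], start) if i != query_index]
```

For a query near a contig end, the window is simply truncated on that side, so fewer than 2 × 10 neighbors are returned.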

Please cite your use of the EFI tools:

Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735

Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018

Colored Sequence Similarity Network (SSN)

Each cluster in the submitted SSN has been identified and assigned a unique number and color. Node attributes for "Neighbor Pfam Families" and "Neighbor InterPro Families" have been added.

Number of nodes: 488
Number of edges: 64,199
File size (zipped): 2 MB
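Identifying clusters in an SSN amounts to finding its connected components. The sketch below labels components with a union-find structure, reports single-node components separately as singletons (matching how clusters and singletons are counted separately in the parameter summary), and numbers clusters by decreasing size. The size-based numbering and the function name `number_clusters` are assumptions for illustration; color assignment is omitted.

```python
from collections import defaultdict


def number_clusters(nodes, edges):
    """Label connected components of an SSN (hypothetical sketch).

    Returns (clusters, singletons): `clusters` maps a cluster number to its
    sorted node list, numbered 1, 2, ... by decreasing size (an assumed
    convention); `singletons` lists nodes that share no edge with any other.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Path halving keeps the union-find trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Merge the components joined by each edge.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)

    singletons = sorted(m[0] for m in groups.values() if len(m) == 1)
    multi = sorted((sorted(m) for m in groups.values() if len(m) > 1),
                   key=lambda m: (-len(m), m[0]))
    clusters = {i: members for i, members in enumerate(multi, start=1)}
    return clusters, singletons
```

For example, nodes A to F with edges A-B, B-C, and D-E give cluster 1 = {A, B, C}, cluster 2 = {D, E}, and one singleton, F.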

Genome Neighborhood Networks (GNNs)

GNNs provide a representation of the neighboring Pfam families for each SSN cluster identified in the colored SSN. To be displayed, neighboring Pfam families must be detected in the specified window and at a co-occurrence frequency higher than the specified minimum.
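The frequency filter can be sketched as follows. The code assumes that each query's neighborhood window has already been reduced to a set of Pfam families, that the co-occurrence frequency of a family is the fraction of a cluster's queries with genome context whose window contains it, and that the cutoff is given as a fraction (0.20 for the 20% used in this job). These definitions, the `cooccurring_pfams` name, and the placeholder family labels are illustrative assumptions, not the tool's documented algorithm.

```python
from collections import Counter


def cooccurring_pfams(cluster_members, neighbor_pfams, min_cooccurrence):
    """Return {cluster: {pfam: frequency}} for families above the cutoff.

    Hypothetical sketch. Assumes:
    - cluster_members maps a cluster number to its query sequence IDs;
    - neighbor_pfams maps a query ID to the set of Pfam families found in
      its neighborhood window (queries without genome context are absent);
    - frequency = queries whose window has the family / queries with context.
    """
    result = {}
    for cluster, queries in cluster_members.items():
        with_context = [q for q in queries if q in neighbor_pfams]
        if not with_context:
            continue
        # Each family counts at most once per query because the values are sets.
        counts = Counter(p for q in with_context for p in neighbor_pfams[q])
        total = len(with_context)
        result[cluster] = {
            pfam: n / total
            for pfam, n in counts.items()
            # "higher than the specified minimum", following the text above
            if n / total > min_cooccurrence
        }
    return result
```

With a 0.20 cutoff, a family found next to one of four queries with genome context (0.25) is kept; raising the cutoff to 0.30 drops it.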

SSN Cluster Hub-Nodes: Genome Neighborhood Network (GNN)

Each hub-node in the network represents an SSN cluster. The spoke nodes represent Pfam families that have been identified as neighbors of the sequences in the hub's cluster.
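Given per-cluster co-occurrence frequencies, the cluster hub-node GNN is a hub-and-spoke edge list. A minimal sketch, assuming the frequencies are stored as `{cluster: {pfam: frequency}}` (a hypothetical layout) and that each edge carries its frequency as an attribute; `cluster_hub_edges` is an illustrative name, not part of EFI-GNT.

```python
def cluster_hub_edges(cooccurrence):
    """Build hub-and-spoke edges with one hub per SSN cluster.

    Hypothetical sketch: `cooccurrence` maps a cluster number to
    {pfam: frequency}; each edge links a cluster hub to a Pfam spoke and
    stores the co-occurrence frequency as an edge attribute.
    """
    edges = []
    for cluster in sorted(cooccurrence):
        for pfam, freq in sorted(cooccurrence[cluster].items()):
            edges.append((f"cluster {cluster}", pfam, {"cooccurrence": freq}))
    return edges
```

A Pfam family that neighbors several clusters appears as a separate spoke under each hub, which is why this view is organized cluster by cluster.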

File size (zipped): <1 MB

Pfam Family Hub-Nodes: Genome Neighborhood Network (GNN)

Each hub-node in the network represents a Pfam family identified as a neighbor. The spoke nodes represent the SSN clusters in which the center-hub Pfam family was identified as a neighbor.
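The Pfam family hub-node GNN holds the same associations viewed from the other side, so it can be sketched as an inversion of a hypothetical `{cluster: {pfam: frequency}}` mapping. The function name `pfam_hub_spokes` is illustrative only.

```python
from collections import defaultdict


def pfam_hub_spokes(cooccurrence):
    """Invert {cluster: {pfam: freq}} into {pfam: {cluster: freq}}.

    Hypothetical sketch: each Pfam family becomes a hub whose spokes are the
    SSN clusters in which it was found as a neighbor.
    """
    hubs = defaultdict(dict)
    for cluster, pfams in cooccurrence.items():
        for pfam, freq in pfams.items():
            hubs[pfam][cluster] = freq
    return dict(hubs)
```

A family shared by several clusters becomes a single hub with several spokes, which makes families common to multiple clusters easy to spot.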

File size (zipped): <1 MB

Genome Neighborhood Diagrams (GNDs)

Diagrams representing the genomic regions around the genes encoding the sequences from the submitted SSN are generated. All genes present in the specified window can be visualized (no minimal co-occurrence frequency filter or neighborhood size threshold is applied). Diagram data can be downloaded in .sqlite file format for later review in the View Saved Diagrams tab.

Opens the GND explorer in a new tab.
Diagram data for later review: 14 MB (zipped)
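Because the diagram data is a SQLite database, it can also be opened outside the GND explorer (after unzipping, if the download is compressed). Its schema is not described on this page, so the sketch below only lists the tables it contains, using Python's standard `sqlite3` module; the function name `list_tables` and the file name in the usage line are placeholders.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(path):
    """Return the table names stored in a SQLite file, opened read-only."""
    # A read-only URI raises an error for a wrong path instead of silently
    # creating a new, empty database file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Usage: `list_tables("gnd_data.sqlite")` returns the sorted table names, a starting point for exploring the stored diagram data without assuming its layout.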

Mapping Tables, FASTA Files, ID Lists, and Supplementary Files

Mapping Tables
Neighbor Pfam domain fusions at specified minimal co-occurrence frequency: <1 MB
Neighbor Pfam domains at specified minimal co-occurrence frequency: <1 MB
Neighbor Pfam domain fusions at 0% minimal co-occurrence frequency: 1 MB
Neighbor Pfam domains at 0% minimal co-occurrence frequency: 1 MB
Neighbors without Pfam assigned: <1 MB
Miscellaneous Files
No matches/no neighbors file: <1 MB
Pfam family/cluster co-occurrence table file: <1 MB
GNN hub cluster sequence count file: <1 MB
Cluster size file: <1 MB
SwissProt annotations per SSN cluster: <1 MB
SwissProt annotations by singleton: <1 MB

Contact us for help, to report issues, or to make suggestions.