EFI - Genome Neighborhood Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
The new Taxonomy Tool and Filter by Taxonomy feature facilitate higher-resolution analyses of focused regions of sequence-function space, using UniProt IDs instead of UniRef90 clusters, or UniRef90 clusters instead of UniRef50 clusters. The J Mol Biol article describing these features is available on the JMB Resources training page.


Submitted Network Name: 26146_IP91_IPR004184_NoFragments_Bacteria_UniRef90_NoFragments_IPR004184_Bacteria_Minlen650_AS240_full_ssn

The parameters for computing the GNN and associated files are summarized in the table.

Uploaded Filename: 26146_IP91_IPR004184_NoFragments_Bacteria_UniRef90_NoFragments_IPR004184_Bacteria_Minlen650_AS240_full_ssn.xgmml.zip
Neighborhood Size: 10
Input % Co-Occurrence: 20
Number of SSN clusters: 251
Number of SSN singletons: 214
SSN sequence source: UniRef90
Number of SSN (meta)nodes: 5,419
Number of accession IDs in SSN: 19,393

Please cite your use of the EFI tools:

Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735

Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018

Colored Sequence Similarity Network (SSN)

Each cluster in the submitted SSN has been identified and assigned a unique number and color. Node attributes for "Neighbor Pfam Families" and "Neighbor InterPro Families" have been added.

# Nodes # Edges File Size (Zipped MB)
5,419 2,021,943 48
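
The node and edge counts above can be checked against a downloaded network. The sketch below is a minimal example, assuming the colored SSN has been unzipped to a plain XGMML file in which nodes and edges appear as `node` and `edge` XML elements (the usual XGMML layout); the function name `count_nodes_and_edges` is hypothetical and not part of the EFI tools. It streams the file rather than loading it whole, since a network with about two million edges can be large.

```python
import xml.etree.ElementTree as ET


def count_nodes_and_edges(xgmml_source):
    """Count <node> and <edge> elements in an XGMML file or file-like object."""
    nodes = 0
    edges = 0
    # iterparse yields each element once its closing tag has been read,
    # so the whole document never has to be parsed up front.
    for _event, elem in ET.iterparse(xgmml_source, events=("end",)):
        # Drop any "{namespace}" prefix so the check works with or without one.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "node":
            nodes += 1
            elem.clear()  # release attribute children we no longer need
        elif tag == "edge":
            edges += 1
            elem.clear()
    return nodes, edges
```

For the network described here, the expected result would be 5,419 nodes and 2,021,943 edges if the file layout matches the assumption.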

Genome Neighborhood Networks (GNNs)

GNNs provide a representation of the neighboring Pfam families for each SSN cluster identified in the colored SSN. To be displayed, neighboring Pfam families must be detected within the specified window and at a co-occurrence frequency higher than the specified minimum.
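
The display rule can be illustrated with a short sketch. It is not the EFI implementation: the input layout (a mapping from each query sequence to `(offset, pfam)` pairs), the function name `filter_neighbor_pfams`, and the choice of denominator (all query sequences in the cluster) are assumptions made for illustration. The defaults mirror this job's parameters (neighborhood size 10, co-occurrence 20%), and the strict `>` follows the wording "higher than the specified minimum".

```python
from collections import defaultdict


def filter_neighbor_pfams(neighborhoods, window=10, min_cooccurrence=0.20):
    """Return {pfam: co-occurrence} for Pfam families passing both filters.

    neighborhoods maps a query ID to a list of (offset, pfam) tuples, where
    offset is the gene position relative to the query (assumed layout).
    """
    total = len(neighborhoods)
    if total == 0:
        return {}

    # For each Pfam family, collect the queries that have it inside the window.
    queries_with_pfam = defaultdict(set)
    for query_id, neighbors in neighborhoods.items():
        for offset, pfam in neighbors:
            if offset != 0 and abs(offset) <= window:
                queries_with_pfam[pfam].add(query_id)

    # Keep only families whose co-occurrence exceeds the minimum.
    kept = {}
    for pfam, queries in queries_with_pfam.items():
        frequency = len(queries) / total
        if frequency > min_cooccurrence:
            kept[pfam] = frequency
    return kept


# Hypothetical cluster of five query sequences.
cluster = {
    "A": [(-1, "PF00001"), (2, "PF00002")],
    "B": [(1, "PF00001"), (12, "PF00003")],  # offset 12 lies outside the window
    "C": [(-3, "PF00001")],
    "D": [(5, "PF00002")],
    "E": [],
}
result = filter_neighbor_pfams(cluster)
```

In this toy cluster, PF00001 is found next to three of five queries and PF00002 next to two of five, so both pass the 20% minimum; PF00003 is dropped because its only occurrence is outside the 10-gene window.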

SSN Cluster Hub-Nodes: Genome Neighborhood Network (GNN)

Each hub-node in the network represents an SSN cluster. The spoke nodes represent Pfam families that have been identified as neighbors of the sequences from the center hub.

File Size (Zipped MB)

Pfam Family Hub-Nodes: Genome Neighborhood Network (GNN)

Each hub-node in the network represents a Pfam family identified as a neighbor. The spoke nodes represent the SSN clusters in which the Pfam family from the center hub was identified as a neighbor.

File Size (Zipped MB)
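
The two GNN formats carry the same cluster-to-Pfam relationships viewed from opposite ends. As a minimal sketch (the dictionary layout and the name `invert_gnn` are illustrative assumptions, not the EFI file format), a cluster-hub mapping can be turned into a Pfam-hub mapping by inverting it:

```python
from collections import defaultdict


def invert_gnn(cluster_to_pfams):
    """Turn {cluster: set of Pfam families} into {pfam: sorted cluster list}."""
    pfam_to_clusters = defaultdict(set)
    for cluster, pfams in cluster_to_pfams.items():
        for pfam in pfams:
            pfam_to_clusters[pfam].add(cluster)
    # Sort the spokes so the output is deterministic.
    return {pfam: sorted(clusters) for pfam, clusters in pfam_to_clusters.items()}


# Hypothetical cluster-hub view: cluster number -> neighboring Pfam families.
cluster_hubs = {1: {"PF00001", "PF00002"}, 2: {"PF00001"}}
pfam_hubs = invert_gnn(cluster_hubs)
```

Here PF00001 becomes a hub with clusters 1 and 2 as spokes, and PF00002 becomes a hub with cluster 1 as its only spoke.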

Genome Neighborhood Diagrams (GNDs)

Diagrams are generated that represent the genomic regions around the genes encoding the sequences from the submitted SSN. All genes present in the specified window can be visualized (no minimal co-occurrence frequency filter or neighborhood size threshold is applied). Diagram data can be downloaded in .sqlite file format for later review in the View Saved Diagrams tab.

Action File Size (Zipped MB)
Opens GND explorer in a new tab.
Diagram data for later review 157
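
Because the diagram data is a SQLite file, it can also be inspected locally with standard tools. The page does not describe the schema, so the sketch below makes no assumption about table or column names: it only lists whatever tables the file contains and how many rows each holds. The function name `summarize_sqlite` and the example filename are hypothetical.

```python
import os
import sqlite3


def summarize_sqlite(path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # sqlite3.connect would silently create a missing file, so check first.
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the file itself; quote them as identifiers.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

A call such as `summarize_sqlite("diagram_data.sqlite")` (placeholder filename) gives a quick overview before writing any query against specific tables.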

Mapping Tables, FASTA Files, ID Lists, and Supplementary Files

Mapping Tables
Neighbor Pfam domain fusions at specified minimal co-occurrence frequency 3 MB
Neighbor Pfam domains at specified minimal co-occurrence frequency 4 MB
Neighbor Pfam domain fusions at 0% minimal co-occurrence frequency 11 MB
Neighbor Pfam domains at 0% minimal co-occurrence frequency 12 MB
Neighbors without Pfam assigned <1 MB
Miscellaneous Files
No matches/no neighbors file <1 MB
Pfam family/cluster co-occurrence table file 3 MB
GNN hub cluster sequence count file <1 MB
Cluster size file <1 MB
SwissProt annotations per SSN cluster <1 MB
SwissProt annotations by singleton <1 MB
