Genome Neighborhood Network Tool

The EFI-Genome Neighborhood Tool (EFI-GNT) allows the exploration of the physical association of genes on genomes, i.e. gene clustering. EFI-GNT enables a user to retrieve, display, and interact with genome neighborhood information for large datasets of sequences.

The GNT database has been updated to use UniProt 2018_09 and ENA 136.
A listing of new features and other information pertaining to GNT is available on the release notes page.

Upload the Sequence Similarity Network (SSN) for which you want to create a Genome Neighborhood Network (GNN)

To be interpreted, the submitted SSN must have been generated using Option A, B, C (with FASTA header reading turned on), or D of EFI-EST 2.0 (released 8/16/2017).
The SSNs generated with these Options can be modified in Cytoscape.

Select a File to Upload:
The acceptable format is uncompressed or zipped xgmml. Maximum size is 2048M.

Neighborhood Size:
With a value of 10, the Pfam families for the 10 genes located upstream and the 10 genes
located downstream of each sequence in the SSN will be collected and displayed.
The default value is 10.
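
As an illustration of what the neighborhood size means, the sketch below selects the genes within a window on either side of a query gene. It assumes the genes of one genome are available as an ordered list; the function name, the list layout, and the gene labels are hypothetical and are not part of EFI-GNT.

```python
def neighborhood(genes, query_index, window=10):
    """Return (upstream, downstream) genes within `window` positions of the query.

    `genes` is an ordered list of gene labels for one genome (an assumption
    made for this sketch). Near the ends of the list fewer genes are returned.
    """
    start = max(0, query_index - window)
    upstream = genes[start:query_index]
    downstream = genes[query_index + 1:query_index + 1 + window]
    return upstream, downstream


# Hypothetical genome of 30 genes; the query is the gene at position 12.
genes = [f"gene{i}" for i in range(30)]
up, down = neighborhood(genes, 12, window=10)
print(len(up), len(down))  # 10 genes on each side
```

A query close to the start or end of a sequence record simply yields a shorter upstream or downstream list.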


This option filters out neighboring Pfam families with a co-occurrence
percentage lower than the set value.
The default value is 20. Valid values are 1-100.
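
The effect of this lower limit can be sketched as follows, assuming the co-occurrence percentage of each neighboring Pfam family has already been computed. The function name, the dictionary layout, and the family labels are placeholders chosen for this example, not EFI-GNT code.

```python
def filter_by_cooccurrence(pfam_percentages, lower_limit=20):
    """Keep Pfam families whose co-occurrence percentage is at least `lower_limit`.

    `pfam_percentages` maps a family label to a percentage (0-100); this
    layout is an assumption made for the sketch.
    """
    if not 1 <= lower_limit <= 100:
        raise ValueError("lower_limit must be between 1 and 100")
    return {
        pfam: pct
        for pfam, pct in pfam_percentages.items()
        if pct >= lower_limit
    }


# Placeholder family labels with made-up percentages.
example = {"family_A": 85.0, "family_B": 20.0, "family_C": 12.5}
kept = filter_by_cooccurrence(example, lower_limit=20)
print(sorted(kept))  # family_C falls below the limit and is dropped
```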

E-mail address:
When the file has been uploaded and processed, you will receive an e-mail containing a link to download the data.

Select a File to Upload:
The acceptable format is sqlite. Maximum size is 2048M.

E-mail address:
When the file has been uploaded and processed, you will receive an e-mail containing a link to view the diagrams.

Clicking on the headers below provides access to various ways of generating genomic network diagrams.

The provided sequence is used as the query for a BLAST search of the UniProt database. The retrieved sequences are used to generate genomic neighborhood diagrams.

Optional job title:
Maximum number of sequences retrieved (≤ 500; default: 200)
Neighborhood window size: Number of neighbors to retrieve on either side of the query sequence for each BLAST result (default: 10)
E-Value: Negative log of e-value for all-by-all BLAST (≥ 1; default: 5)
E-mail address:
When the file has been uploaded and processed, you will receive an e-mail containing a link to view the diagrams.
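
The E-Value field takes an exponent rather than the e-value itself. Assuming the logarithm is base 10, which is the usual convention but is not stated on this page, the conversion can be sketched as below; the function name is hypothetical.

```python
def evalue_threshold(neg_log_evalue=5):
    """Convert the form value (negative log of the e-value) to an e-value.

    Assumes a base-10 logarithm, so the default of 5 corresponds to 1e-5.
    """
    if neg_log_evalue < 1:
        raise ValueError("the form accepts values of 1 or greater")
    return 10.0 ** -neg_log_evalue


print(evalue_threshold())    # default setting
print(evalue_threshold(20))  # a larger value is a more stringent threshold
```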

The genomic neighborhoods are retrieved for the UniProt, NCBI, EMBL-EBI ENA, and PDB identifiers that are provided in the input box below. Not all identifiers may exist in the EFI-GNT database, so the results will only include diagrams for sequences that were identified.

Alternatively, a file containing a list of IDs can be uploaded:
The acceptable format is text. Maximum size is 2048M.
Optional job title:
Neighborhood window size: Number of neighbors to retrieve on either side of each query sequence (default: 10)
E-mail address:
When the file has been uploaded and processed, you will receive an e-mail containing a link to view the diagrams.

The genomic neighborhoods are retrieved for the UniProt, NCBI, EMBL-EBI ENA, and PDB identifiers that are identified in the FASTA headers. Not all identifiers may exist in the EFI-GNT database, so the results will only include diagrams for sequences that were identified.

Alternatively, a file containing FASTA headers and sequences can be uploaded:
The acceptable format is text. Maximum size is 2048M.
Optional job title:
Neighborhood window size: Number of neighbors to retrieve on either side of each query sequence (default: 10)
E-mail address:
When the file has been uploaded and processed, you will receive an e-mail containing a link to view the diagrams.

EFI-Genome Neighborhood Tool Overview

Although other tools, e.g., IMG (https://img.jgi.doe.gov) and PATRIC (https://www.patricbrc.org), allow comparison of gene neighborhoods among multiple prokaryotic genomes to infer phylogenetic relationships, EFI-GNT enables comparison of the genome neighborhoods for clusters of similar protein sequences in order to facilitate the assignment of function within protein families and superfamilies.

EFI-GNT is focused on placing protein families and superfamilies into a genomic context. A sequence similarity network (SSN) with defined protein clusters is used as an input. Each sequence within an SSN is used as a query for interrogation of its genome neighborhood.

EFI-GNT acceptable input

The sequence datasets are generated from an SSN produced by the EFI-Enzyme Similarity Tool (EFI-EST). Acceptable SSNs are generated for an entire Pfam and/or InterPro protein family (from Option B of EFI-EST), a focused region of a family (from Option A of EFI-EST), a set of protein sequences that can be identified from FASTA headers (from Option C of EFI-EST with header reading), or a list of recognizable UniProt and/or NCBI IDs (from Option D of EFI-EST). An SSN that originated from any of these EST Options and was manually modified within Cytoscape is also acceptable, as is an SSN that originated from any of the acceptable Options and was colored using the "Color SSN Utility" of EFI-EST.

Principle of GNT analysis

Protein-encoding genes that are neighbors of input queries (within a defined window on either side) are collected from sequence files for prokaryotic (bacterial and archaeal) and fungal genomes in the European Nucleotide Archive (ENA) database. The co-occurrence frequencies of the identified neighboring sequences with the input queries are calculated, as well as the absolute values of the distances in open reading frames (orfs) between the queries and neighbors. The calculated information is provided as Genome Neighborhood Networks (GNNs), in addition to a colored version of the input SSN that aids analysis of the GNNs.
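
The two quantities described above can be sketched as follows. This is an illustration under stated assumptions, not the EFI-GNT implementation: co-occurrence frequency is taken here to be the fraction of queries that have at least one neighbor in a given Pfam family, and each neighbor is represented as a (family, signed offset in orfs) pair. The function name, data layout, and identifiers are hypothetical.

```python
from collections import defaultdict


def summarize_neighbors(neighborhoods):
    """Compute per-family co-occurrence frequency and absolute distances.

    `neighborhoods` maps a query ID to a list of (pfam, offset) pairs, where
    `offset` is the signed position of the neighbor relative to the query in
    orfs (negative = upstream). This layout is an assumption for the sketch.
    """
    queries_with_pfam = defaultdict(set)
    distances = defaultdict(list)
    for query, neighbors in neighborhoods.items():
        for pfam, offset in neighbors:
            queries_with_pfam[pfam].add(query)    # count each query once per family
            distances[pfam].append(abs(offset))   # absolute distance in orfs
    total_queries = len(neighborhoods)
    return {
        pfam: {
            "cooccurrence": len(queries) / total_queries,
            "distances": sorted(distances[pfam]),
        }
        for pfam, queries in queries_with_pfam.items()
    }


# Four hypothetical queries from one SSN cluster; Q4 has no annotated neighbors.
cluster = {
    "Q1": [("family_A", -2), ("family_B", 1)],
    "Q2": [("family_A", 3)],
    "Q3": [("family_A", -1), ("family_A", 4)],
    "Q4": [],
}
summary = summarize_neighbors(cluster)
print(summary["family_A"]["cooccurrence"])  # 3 of 4 queries have a family_A neighbor
```

Counting each query once per family keeps the frequency between 0 and 1 even when a query has several neighbors in the same family, as Q3 does here.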

EFI-GNT output

EFI-GNT generates two formats of the Genome Neighborhood Network (GNN) as well as a colored version of the input SSN that aids analysis of the GNNs.

The UniProt accession IDs for the queries and the neighbors, the Pfam families for the neighbors, and both the query-neighbor distances (in orfs) and co-occurrence frequencies are provided in the GNNs. The GNNs and colored SSN are downloaded, visualized, and analyzed using Cytoscape.

The user can use Cytoscape to filter the GNNs for a range of query-neighbor distances and/or co-occurrence frequencies to enable the identification of functionally related proteins/enzymes, with shorter distances and greater co-occurrence frequencies suggesting functional linkage in a metabolic pathway. With the identities of the Pfam families for the neighbors, the user may be able to infer the in vitro enzymatic activities of the queries and neighbors and predict the reactions in the metabolic pathway in which they participate.

Figure 1: Examples of colored SSN (left) and a hub-and-spoke cluster from a GNN (right).

UniProt Version: 2018_09
ENA Version: 136
EFI-GNT Version: 2.0
