The EFI-Genome Neighborhood Tool (EFI-GNT) allows exploration of the physical association of genes in genomes, i.e., gene clustering. EFI-GNT enables a user to retrieve, display, and interact with genome neighborhood information for large datasets of sequences. A listing of new features and other information pertaining to GNT is available on the release notes page.
Upload the Sequence Similarity Network (SSN) for which you want to create a Genome Neighborhood Network (GNN)
To be interpreted, the submitted SSN must have been generated with Option A, B, C (with FASTA header reading enabled), or D of EFI-EST 2.0 (released 8/16/2017).
SSNs generated with these options may be modified in Cytoscape before submission.
The provided sequence is used as the query for a BLAST search of the UniProt database. The retrieved sequences are used to generate genomic neighborhood diagrams.
The genomic neighborhoods are retrieved for the UniProt, NCBI, EMBL-EBI ENA, and PDB identifiers that are provided in the input box below. Some identifiers may not exist in the EFI-GNT database, so the results include diagrams only for the sequences that were found.
The genomic neighborhoods are retrieved for the UniProt, NCBI, EMBL-EBI ENA, and PDB identifiers that are identified in the FASTA headers. Some identifiers may not exist in the EFI-GNT database, so the results include diagrams only for the sequences that were found.
Other tools, e.g., IMG (https://img.jgi.doe.gov) and PATRIC (https://www.patricbrc.org), allow comparison of gene neighborhoods among multiple prokaryotic genomes to support inference of phylogenetic relationships. EFI-GNT instead compares the genome neighborhoods of clusters of similar protein sequences in order to facilitate the assignment of function within protein families and superfamilies.
EFI-GNT is focused on placing protein families and superfamilies into a genomic context. Its input is a sequence similarity network (SSN) with defined protein clusters, and each sequence within the SSN is used as a query to interrogate its genome neighborhood.
The sequence datasets are generated from an SSN produced by the EFI-Enzyme Similarity Tool (EFI-EST). Acceptable SSNs are those generated for an entire Pfam and/or InterPro protein family (Option B of EFI-EST), a focused region of a family (Option A of EFI-EST), a set of protein sequences that can be identified from FASTA headers (Option C of EFI-EST with header reading), or a list of recognizable UniProt and/or NCBI IDs (Option D of EFI-EST). An SSN that originated from any of these EST options and was then manually modified in Cytoscape is also acceptable, as is one that was colored with the "Color SSN Utility" of EFI-EST, provided it originated from one of the acceptable options.
Protein-encoding genes that are neighbors of the input queries (within a defined window on either side) are collected from sequence files for prokaryotic (bacterial and archaeal) and fungal genomes in the European Nucleotide Archive (ENA) database. EFI-GNT calculates the co-occurrence frequencies of the identified neighboring sequences with the input queries, as well as the absolute values of the distances, in open reading frames (orfs), between the queries and their neighbors. The calculated information is provided as Genome Neighborhood Networks (GNNs).
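The neighbor collection and the two statistics described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not EFI-GNT's implementation: each contig is modeled as an ordered list of genes, each protein is simplified to exactly one Pfam family, contigs are treated as linear (no wrap-around), and co-occurrence is taken to mean the fraction of query occurrences with at least one neighbor from a given family inside the window. The names `Gene` and `neighborhood_stats` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    accession: str  # protein accession (hypothetical record layout)
    pfam: str       # assumption: exactly one Pfam family per protein


def neighborhood_stats(contigs, queries, window):
    """Return {pfam: (co-occurrence frequency, [distances in orfs])}.

    contigs: list of contigs, each a list of Gene in chromosomal order,
             treated as linear (assumption: no wrap-around).
    queries: set of query accessions, e.g. the members of one SSN cluster.
    window:  number of orfs examined on each side of every query.
    """
    counts = defaultdict(int)      # family -> query occurrences having it nearby
    distances = defaultdict(list)  # family -> |query index - neighbor index|
    n_found = 0
    for contig in contigs:
        for qi, gene in enumerate(contig):
            if gene.accession not in queries:
                continue
            n_found += 1
            families_here = set()
            lo, hi = max(0, qi - window), min(len(contig), qi + window + 1)
            for ni in range(lo, hi):
                if ni == qi:
                    continue  # a query is not its own neighbor
                fam = contig[ni].pfam
                families_here.add(fam)
                distances[fam].append(abs(ni - qi))
            for fam in families_here:
                counts[fam] += 1  # count each family once per query
    if n_found == 0:
        return {}  # none of the queries were found in the genome data
    return {fam: (counts[fam] / n_found, distances[fam]) for fam in counts}


contig = [Gene("A1", "famX"), Gene("Q1", "famQ"), Gene("B1", "famY"), Gene("B2", "famX")]
contig2 = [Gene("Q2", "famQ"), Gene("C1", "famX")]
stats = neighborhood_stats([contig, contig2], {"Q1", "Q2"}, window=2)
print(stats["famX"])  # (1.0, [1, 2, 1])
```

Counting each family once per query keeps the frequency between 0 and 1 even when a family appears several times in one neighborhood, while every individual distance is still recorded for later filtering.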
EFI-GNT generates two formats of the Genome Neighborhood Network (GNN) as well as a colored version of the input SSN that aids analysis of the GNNs.
The GNNs provide the UniProt accession IDs for the queries and the neighbors, the Pfam families of the neighbors, and both the query-neighbor distances (in orfs) and the co-occurrence frequencies. The GNNs and the colored SSN can be downloaded and then visualized and analyzed in Cytoscape.
The user can use Cytoscape to filter the GNNs for a range of query-neighbor distances and/or co-occurrence frequencies to help identify functionally related proteins/enzymes; shorter distances and higher co-occurrence frequencies suggest functional linkage in a metabolic pathway. With the identities of the Pfam families for the neighbors, the user may be able to infer the in vitro enzymatic activities of the queries and neighbors and predict the reactions in the metabolic pathway in which they participate.
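The kind of threshold filtering described above, normally done interactively in Cytoscape, can be expressed as a short sketch for readers who prefer to script it. The record layout (`GnnEdge`), the choice of the median distance, and the default cutoffs are all hypothetical assumptions for illustration; they are not EFI-GNT attribute names or recommended values.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class GnnEdge:
    cluster: str               # SSN cluster containing the queries (hypothetical field)
    neighbor_pfam: str         # Pfam family of the neighbors
    cooccurrence: float        # fraction of queries with this neighbor family, 0..1
    distances: tuple[int, ...]  # query-neighbor distances in orfs


def filter_edges(edges, min_cooccurrence=0.5, max_median_distance=3):
    """Keep edges with high co-occurrence and a short typical distance.

    The thresholds are illustrative defaults (assumption), not tool settings.
    """
    return [
        e for e in edges
        if e.cooccurrence >= min_cooccurrence
        and e.distances  # skip edges without any recorded distance
        and median(e.distances) <= max_median_distance
    ]


edges = [
    GnnEdge("1", "famX", 0.9, (1, 1, 2)),  # frequent and close: kept
    GnnEdge("1", "famY", 0.2, (1,)),       # close but rare: dropped
    GnnEdge("2", "famZ", 0.8, (7, 9, 8)),  # frequent but distant: dropped
]
kept = filter_edges(edges)
print([e.neighbor_pfam for e in kept])  # ['famX']
```

Using the median rather than the minimum distance is one possible design choice: it keeps a single unusually close neighbor from making an otherwise distant family look tightly linked.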
Figure 1: Examples of a colored SSN (left) and of a hub-and-spoke cluster from a GNN (right).
UniProt Version: 2018_04
ENA Version: 134
EFI-GNT Version: 2.0