Introduction to EFI-Genome Neighborhood Tool
The EFI-Genome Neighborhood Tool (EFI-GNT) web tool enables a user to retrieve, display, and interact with genome neighborhood information for large datasets of protein sequences, including entire protein families. The sequence datasets are generated using the EFI-Enzyme Similarity Tool (EFI-EST) web tool and can cover either 1) an entire Pfam and/or InterPro protein family (Option B of EFI-EST) or 2) a focused region of a family (Option A of EFI-EST).
EFI-EST generates Sequence Similarity Networks (SSNs) that are visualized and analyzed using Cytoscape. The input for EFI-GNT is an SSN (in the .xgmml file format) that has been segregated into potential isofunctional families by filtering with an appropriate alignment score. The genome neighborhood proteins within a window of open reading frames (ORFs) on either side of each input query (default ±10 ORFs; the user can change the window size) are collected from sequence files for prokaryotic (bacterial and archaeal) and fungal genomes in the European Nucleotide Archive (ENA) database. EFI-GNT generates two formats of the Genome Neighborhood Network (GNN) as well as a colored version of the input SSN that aids analysis of the GNNs.
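To make the idea of an ORF window concrete, the sketch below collects the neighbors within ±N ORFs of a query on one contig and records each neighbor's distance from the query in ORFs. This is not EFI-GNT's code. It assumes a hypothetical representation in which a contig is an ordered list of (accession, Pfam family) pairs, and it simply truncates the window at contig ends; how EFI-GNT handles contig boundaries, strands, or circular genomes is not described here.

```python
from typing import List, Tuple

# Hypothetical representation: one contig as an ordered list of ORFs,
# each given as (uniprot_accession, pfam_family).
Orf = Tuple[str, str]


def neighborhood(contig: List[Orf], query_index: int, window: int = 10) -> List[Tuple[Orf, int]]:
    """Return the ORFs within +/- `window` positions of the query, paired
    with their distance (in ORFs) from the query. The query itself is
    excluded. Contig ends truncate the window (no wrap-around is assumed)."""
    start = max(0, query_index - window)
    end = min(len(contig), query_index + window + 1)
    return [
        (contig[i], abs(i - query_index))
        for i in range(start, end)
        if i != query_index
    ]


# Made-up accessions and family labels, for illustration only.
contig = [("A1", "famA"), ("A2", "famB"), ("Q", "famQ"), ("A4", "famC")]
print(neighborhood(contig, query_index=2, window=1))
```

With a window of 1, only the two ORFs immediately flanking the query are returned, each at distance 1; with the default window of 10, all three non-query ORFs on this short contig are returned.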
The UniProt accession IDs for the queries and the neighbors, the Pfam families for the neighbors, and both the query-neighbor distances (in ORFs) and co-occurrence frequencies are provided in the GNNs. The GNNs and colored SSN are downloaded, visualized, and analyzed using Cytoscape.
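As a rough illustration of how per-family statistics such as co-occurrence and query-neighbor distance might be summarized for one SSN cluster, the sketch below assumes that co-occurrence is the fraction of queries in the cluster with at least one neighbor in a given Pfam family, and it keeps the closest distance per query. Both definitions are assumptions made for this example, and the function name `pfam_statistics` and its input format are hypothetical; EFI-GNT's exact calculations may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pfam_statistics(
    neighborhoods: Dict[str, List[Tuple[str, int]]],
) -> Dict[str, Tuple[float, List[int]]]:
    """`neighborhoods` maps each query accession in one SSN cluster to its
    neighbors, given as (pfam_family, distance_in_orfs) pairs.

    Returns, for each Pfam family, (co-occurrence, distances), where the
    co-occurrence is ASSUMED to be the fraction of queries with at least one
    neighbor in that family, and distances are the closest distance seen
    for each such query, sorted ascending."""
    n_queries = len(neighborhoods)
    closest: Dict[str, Dict[str, int]] = defaultdict(dict)
    for query, neighbors in neighborhoods.items():
        for pfam, distance in neighbors:
            previous = closest[pfam].get(query)
            if previous is None or distance < previous:
                closest[pfam][query] = distance
    return {
        pfam: (len(per_query) / n_queries, sorted(per_query.values()))
        for pfam, per_query in closest.items()
    }


# Made-up cluster of four queries; Q4 has no neighbors in the window.
hoods = {
    "Q1": [("famA", 1), ("famB", 3)],
    "Q2": [("famA", 2), ("famA", 5)],
    "Q3": [("famC", 1)],
    "Q4": [],
}
print(pfam_statistics(hoods))
```

Here famA appears next to two of the four queries, so its co-occurrence is 0.5; Q2's second famA neighbor at distance 5 is ignored because a closer one exists.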
The user can filter the GNNs for a range of query-neighbor distances and/or co-occurrence frequencies to help identify functionally related proteins/enzymes; shorter distances and greater co-occurrence frequencies suggest functional linkage in a metabolic pathway. With the identities of the Pfam families for the neighbors, the user may be able to infer the in vitro enzymatic activities of the queries and neighbors and predict the reactions in the metabolic pathway in which they participate.
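The filtering described above can be sketched as a simple threshold on both quantities: keep a neighbor family only if it co-occurs often enough with the queries and sits close enough to at least one of them. In this sketch, the input format (family mapped to a co-occurrence fraction and a list of distances in ORFs), the function name `filter_families`, and the default thresholds are all hypothetical choices for illustration, not EFI-GNT defaults. In practice this filtering is done on the downloaded GNN in Cytoscape.

```python
from typing import Dict, List, Tuple


def filter_families(
    stats: Dict[str, Tuple[float, List[int]]],
    max_distance: int = 3,          # hypothetical threshold, in ORFs
    min_cooccurrence: float = 0.2,  # hypothetical threshold, as a fraction
) -> List[str]:
    """Keep Pfam families whose co-occurrence is at least `min_cooccurrence`
    and that lie within `max_distance` ORFs of at least one query.
    Results are ordered by decreasing co-occurrence, then by name."""
    kept = [
        (pfam, freq)
        for pfam, (freq, distances) in stats.items()
        if freq >= min_cooccurrence and any(d <= max_distance for d in distances)
    ]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [pfam for pfam, _ in kept]


# Made-up statistics for one cluster.
stats = {
    "famA": (0.50, [1, 2]),
    "famB": (0.25, [7]),  # too distant
    "famC": (0.10, [1]),  # too rare
    "famD": (0.75, [3]),
}
print(filter_families(stats))
```

Tightening `max_distance` or raising `min_cooccurrence` narrows the list toward neighbors that are most likely to be functionally linked to the queries.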
Figure 1: Examples of a colored SSN (left) and a hub-and-spoke cluster from a GNN (right).
Although other tools, e.g., IMG (https://img.jgi.doe.gov), may allow comparison of gene neighborhoods among multiple prokaryotic genomes to enable inference of phylogenetic relationships, EFI-GNT compares the genome neighborhoods for clusters of similar protein sequences in order to facilitate the assignment of function within protein families and superfamilies.
If you are new to this tool, we recommend that you first read the tutorial sections.
When you are ready to generate a GNN, follow the “Begin EFI-GNT” link at the bottom of the page to upload the .xgmml file for your SSN.