EFI-GNT Input and Output Pages
The EFI-GNT web tool is a user-friendly interface to the software that accepts the query input SSN, collects the genome neighbors, and generates the colored version of the input SSN, the GNN in the two formats described in the previous section, and text files for download. The software is run on a server housed in the Institute for Genomic Biology (IGB) at the University of Illinois at Urbana-Champaign.
The input SSN must be an xgmml file that is either 1) an SSN generated by Option A or Option B of the EFI-EST web tool or 2) an SSN generated by Option A or Option B of the EFI-EST web tool that was then manipulated in and exported from Cytoscape.
SSNs generated with Option C of EFI-EST will not work, because the process for generating the GNN requires that the sequences have UniProt IDs.
The maximum size of the xgmml file is 2048 MB. The SSN may be either a full SSN (a node for each sequence) or a representative-node (rep-node) SSN (sequences sharing greater than a user-selected sequence identity are located in the same metanode).
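The input constraints above can be checked locally before uploading. The following is a minimal sketch, not part of EFI-GNT: the function name precheck_xgmml is hypothetical, "MB" is assumed to mean 2**20 bytes, and the only structural check is that the file parses as XML and contains at least one node element.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumption: the 2048 MB limit is counted in units of 2**20 bytes.
MAX_BYTES = 2048 * 1024 ** 2

def precheck_xgmml(path, max_bytes=MAX_BYTES):
    """Check the file size, then count <node> elements while streaming the file."""
    size = os.path.getsize(path)
    if size > max_bytes:
        return {"ok": False, "size": size, "nodes": None}
    nodes = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        # With a default namespace the tag looks like "{namespace}node".
        if elem.tag.rsplit("}", 1)[-1] == "node":
            nodes += 1
        elem.clear()  # release attributes and children already processed
    return {"ok": nodes > 0, "size": size, "nodes": nodes}

# Tiny made-up network with placeholder accession labels.
sample = (
    '<graph label="demo" xmlns="http://www.cs.rpi.edu/XGMML">'
    '<node id="1" label="ACC0001"/>'
    '<node id="2" label="ACC0002"/>'
    '<edge source="1" target="2" label="e"/>'
    '</graph>'
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.xgmml")
    with open(demo_path, "w") as fh:
        fh.write(sample)
    result = precheck_xgmml(demo_path)
    too_big = precheck_xgmml(demo_path, max_bytes=1)

print(result["nodes"], too_big["ok"])
```

Streaming with iterparse avoids loading a multi-gigabyte network into memory just to count its nodes.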
EFI-GNT uses a default ± 10 orf window to collect the genome neighbors; the user can select a different window (from ± 3 to ± 20 orfs) in the “Neighborhood Size” pull-down menu.
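To illustrate what a ± N orf window means, here is a minimal sketch that assumes the orfs of one contig are available as an ordered list. The function name and the toy data are hypothetical, and the sketch says nothing about how EFI-GNT itself handles strands, contig ends, or circular genomes.

```python
def neighbors_in_window(orfs, query_index, window=10):
    """Return the orfs within +/- `window` positions of the query, excluding the query."""
    start = max(0, query_index - window)               # clip at the start of the contig
    end = min(len(orfs), query_index + window + 1)     # clip at the end of the contig
    return [orf for i, orf in enumerate(orfs[start:end], start=start)
            if i != query_index]

# Toy contig of 30 orfs with placeholder names.
contig = [f"orf{i}" for i in range(30)]

neighbors = neighbors_in_window(contig, query_index=12, window=3)
edge_neighbors = neighbors_in_window(contig, query_index=1, window=3)
print(neighbors)
print(edge_neighbors)
```

In this sketch, a query close to the end of a contig simply yields fewer neighbors than the nominal 2 × N.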
EFI-GNT collects all genome neighbors within the specified window. However, it will display a spoke node only if the query-neighbor co-occurrence frequency is greater than a specified value. The default value is 20%. A smaller value, e.g., 5%, should also be used to find neighbors that co-occur with low frequency, often as the result of phylogenetically diverse genome arrangements of functionally linked pathway enzymes. As the co-occurrence frequency is decreased, a larger number of neighbors and Pfam families will be reported in the GNN.
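The effect of the co-occurrence cutoff can be sketched as follows. The definition used here is an assumption made for illustration (the fraction of queries in an SSN cluster that have at least one neighbor in a given Pfam family); the function names, query IDs, and family labels are hypothetical and are not taken from EFI-GNT.

```python
from collections import defaultdict

def cooccurrence(cluster_neighbors):
    """Map each Pfam family to the fraction of queries that have a neighbor in it.

    cluster_neighbors: dict of query ID -> set of Pfam families found in its window.
    """
    counts = defaultdict(int)
    for pfams in cluster_neighbors.values():
        for pfam in set(pfams):        # count each family once per query
            counts[pfam] += 1
    n_queries = len(cluster_neighbors)
    return {pfam: count / n_queries for pfam, count in counts.items()}

def reported(freqs, threshold=0.20):
    """Families whose co-occurrence frequency is greater than the threshold."""
    return sorted(pfam for pfam, freq in freqs.items() if freq > threshold)

# Toy SSN cluster of 10 queries: PfamA next to 6 of them, PfamB next to 2, PfamC next to 1.
cluster = {f"query{i}": set() for i in range(10)}
for i in range(6):
    cluster[f"query{i}"].add("PfamA")
for i in (6, 7):
    cluster[f"query{i}"].add("PfamB")
cluster["query8"].add("PfamC")

freqs = cooccurrence(cluster)
default_view = reported(freqs)                   # default 20% cutoff
permissive_view = reported(freqs, threshold=0.05)
print(default_view, permissive_view)
```

With the default cutoff only PfamA is reported; lowering the cutoff to 5% also reports PfamB and PfamC, which mirrors the larger GNN obtained at lower co-occurrence frequencies.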
As with EFI-EST, the user also enters an e-mail address to which a link to the results will be sent.
When the results are available (typically a few minutes, although the time required for the analysis increases with the number of query sequences in the input SSN), an e-mail with a link to the output is sent to the address provided by the user on the Start page. The link will be active for seven days.
The EFI-GNT output is three xgmml files and several text/spreadsheet files.
Colored SSN: The colored version of the SSN described in the previous section is available for download as an xgmml file for viewing in Cytoscape. This SSN allows the user to quickly associate the SSN cluster spoke nodes in the GNNs with the clusters in the input query SSN.
Two formats of the GNN: The two formats of the GNN described in the previous section are available for download as xgmml files and viewing/analysis in Cytoscape:
1. A cluster is present for each Pfam family (hub-node) that was identified as a neighbor to queries in the SSN clusters (spoke-nodes). This format allows the user to assess whether queries in multiple SSN clusters are neighbors to members of the same Pfam family and, therefore, may have the same in vitro activities and in vivo metabolic functions.
2. A cluster is present for each query SSN cluster (hub-node) that was used to identify genome neighbors (spoke-nodes). This format allows the user to identify functionally linked enzymes, as deduced from genome proximity, that constitute the metabolic pathway in which the sequences in the query SSN cluster participate.
Text/Spreadsheet files: Additional files are available to allow the user to perform additional analyses. At present these are:
1. Text file with list of query accession IDs not found in the bacterial and fungal ENA files (nomatch.tab), i.e., not in the STD (annotated assembled sequences), CON (high level constructed sequences), and WGS (whole genome shotgun sequencing with intermediate level of assembly) files for bacterial and fungal proteins.
This file can be used to generate a custom node attribute that identifies the sequences in the query SSN with no matches.
2. Text file with list of query accession IDs that do not have genome neighbors in the bacterial and fungal ENA files (noneighb.tab), i.e., the ENA files contain single orfs.
This file can be used to generate a custom node attribute that identifies the sequences in the query SSN with no neighbors.
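One way to turn the two lists into a custom node attribute is to merge them into a single tab-delimited table that Cytoscape can import as a node table. This is a minimal sketch under stated assumptions: each file is assumed to hold one accession ID per line (only the first tab-separated field is read), and the column names accession and gnt_status, the status labels, and the placeholder IDs are all hypothetical.

```python
import csv
import os
import tempfile
from pathlib import Path

def read_ids(path):
    """Read accession IDs, assuming one per line (first tab-separated field)."""
    ids = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            ids.append(line.split("\t")[0])
    return ids

def write_attribute_table(nomatch_path, noneighb_path, out_path):
    """Write accession -> status rows; 'no_match' wins if an ID is in both lists."""
    status = {acc: "no_neighbors" for acc in read_ids(noneighb_path)}
    status.update({acc: "no_match" for acc in read_ids(nomatch_path)})
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["accession", "gnt_status"])   # hypothetical column names
        for acc in sorted(status):
            writer.writerow([acc, status[acc]])
    return status

# Demo with placeholder IDs written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    nomatch = os.path.join(tmp, "nomatch.tab")
    noneighb = os.path.join(tmp, "noneighb.tab")
    out = os.path.join(tmp, "gnt_status.tsv")
    Path(nomatch).write_text("ACC0001\nACC0002\n")
    Path(noneighb).write_text("ACC0003\n")
    status = write_attribute_table(nomatch, noneighb, out)
    rows = Path(out).read_text().splitlines()

print(rows)
```

When the table is imported, the key column has to be matched to whichever node attribute in the SSN holds the accession ID; for rep-node SSNs, where one metanode stands for several sequences, that matching may need extra handling.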
In the near future, we will provide additional files to facilitate downstream analyses, including the mapping of neighbors to the SSNs for their Pfam families.