EFI - Enzyme Similarity Tool
A sequence similarity network (SSN) allows researchers to visualize relationships among
protein sequences. In SSNs, the most related proteins are grouped together in
clusters. The Enzyme Similarity Tool (EFI-EST) is a web tool that allows researchers to
easily generate SSNs that can be visualized in Cytoscape.
Overview of possible inputs for EFI-EST
The EFI - Enzyme Similarity Tool (EFI-EST) is a web server for the generation of
SSNs. Four options for user-initiated generation of an SSN are available. In
addition, a utility that aids SSN interpretation is available.
- Option A: Single sequence query. The provided sequence is used as
the query for a BLAST search of the UniProt database. The retrieved sequences
are used to generate the SSN.
Option A allows the user to explore local sequence-function space for the query
sequence. Homologs are collected and used to generate the SSN. By default,
5,000 sequences are collected, as this number often allows a “full” SSN to be
generated and viewed with Cytoscape.
- Option B: Pfam and/or InterPro families. Defined protein families are used to generate the SSN.
Option B allows the user to explore sequence-function space from defined
protein families. A limit of 255,000 sequences is imposed. An SSN can be
generated for more than one family at a time.
- Option C: User-supplied FASTA file.
An SSN is generated from a set of defined sequences.
Option C allows the user to generate an SSN for a provided set of
FASTA-formatted sequences. By default, the provided sequences cannot be associated
with sequences in the UniProt database, and only two node attributes are
provided for the SSNs generated: the number of residues as the “Sequence
Length”, and the FASTA header as the “Description”.
An option allows the FASTA headers to be read; if UniProt or NCBI
identifiers are recognized, the corresponding UniProt information will be
presented as node attributes.
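The default Option C behavior described above can be illustrated with a small sketch. This is not the EFI-EST implementation; the function name `fasta_to_nodes` and the example sequences are made up for illustration. It only shows how the two default node attributes, “Sequence Length” and “Description”, could be derived from a FASTA file.

```python
from typing import Dict, Iterable, List


def fasta_to_nodes(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Build one node record per FASTA entry (hypothetical helper).

    Each record carries the two default attributes named in the text:
    "Description" (the FASTA header) and "Sequence Length" (residue count).
    """
    nodes: List[Dict[str, object]] = []
    header = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                nodes.append({"Description": header, "Sequence Length": length})
            header = line[1:].strip()
            length = 0
        elif header is not None:
            # Sequence lines may wrap, so lengths are summed across lines.
            length += len(line)
    if header is not None:
        nodes.append({"Description": header, "Sequence Length": length})
    return nodes


# Made-up example input, not taken from any database.
example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2
MADE
"""
nodes = fasta_to_nodes(example.splitlines())
for node in nodes:
    print(node)
```

With the header-reading option enabled, a real workflow would additionally look for UniProt or NCBI identifiers in each header; that lookup is not shown here.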
- Option D: List of UniProt and/or NCBI IDs. The SSN is generated after
fetching the information from the corresponding databases.
Option D allows the user to provide a list of UniProt IDs, NCBI IDs, and/or
NCBI GI numbers (now “retired”). UniProt IDs are used to retrieve sequences and
annotation information from the UniProt database. When recognized, NCBI IDs and
GI numbers are used to retrieve the “equivalent” UniProt IDs and information.
Sequences with NCBI IDs that cannot be recognized will not be included in the
SSN, and a “nomatch” file listing these IDs is available for download.
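The ID handling described for Option D can be sketched as follows. This is an illustration under stated assumptions, not the EFI-EST code: the function `resolve_ids` and the lookup table `ncbi_to_uniprot` are hypothetical, and the mapped accessions in the example are placeholders. The regular expression follows the accession format published in the UniProt documentation.

```python
import re
from typing import Dict, Iterable, List, Tuple

# UniProt accession pattern as published in the UniProt help pages.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def resolve_ids(
    ids: Iterable[str], ncbi_to_uniprot: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split input IDs into resolved UniProt IDs and a "nomatch" list.

    `ncbi_to_uniprot` is a hypothetical lookup table standing in for
    whatever ID-mapping resource is actually used.
    """
    resolved: List[str] = []
    nomatch: List[str] = []
    for raw in ids:
        ident = raw.strip()
        if not ident:
            continue
        if UNIPROT_RE.match(ident):
            resolved.append(ident)  # already a UniProt accession
        elif ident in ncbi_to_uniprot:
            resolved.append(ncbi_to_uniprot[ident])  # NCBI ID or GI number
        else:
            nomatch.append(ident)  # would be listed in the "nomatch" file
    return resolved, nomatch


# Placeholder mapping and IDs, for illustration only.
mapping = {"NP_000001.1": "Q00001", "123456": "O00002"}
resolved, nomatch = resolve_ids(
    ["P12345", "NP_000001.1", "123456", "XP_999999.9"], mapping
)
print(resolved)
print(nomatch)
```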
- Utility for the identification and coloring of independent clusters within an SSN.
Independent clusters in the uploaded SSN are identified, numbered, and colored.
Summary tables and sets of IDs and sequences for specific clusters are
provided. A manually edited SSN can serve as input for this utility.
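One way to think about cluster identification is as finding connected components in the network graph. The sketch below makes several assumptions that the text does not state: clusters are numbered by decreasing size, colors come from a small hypothetical palette, and the function name `number_and_color_clusters` is invented for illustration. It is not the utility's actual implementation.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

# Hypothetical palette; colors repeat if there are more clusters than entries.
PALETTE = ["#FF0000", "#0000FF", "#00AA00", "#FF9900"]


def number_and_color_clusters(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[int, str]]:
    """Map each node to (cluster number, color) via breadth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters: List[List[str]] = []
    for start in nodes:
        if start in seen:
            continue
        # Collect every node reachable from `start`: one independent cluster.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            current = queue.popleft()
            members.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(members)

    # Assumption: the largest cluster is numbered 1 (stable sort keeps ties in input order).
    clusters.sort(key=len, reverse=True)

    assignment: Dict[str, Tuple[int, str]] = {}
    for number, members in enumerate(clusters, start=1):
        color = PALETTE[(number - 1) % len(PALETTE)]
        for member in members:
            assignment[member] = (number, color)
    return assignment


# Toy network: one 3-node cluster, one 2-node cluster, one singleton.
assignment = number_and_color_clusters(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
for node, (number, color) in sorted(assignment.items()):
    print(node, number, color)
```

A singleton such as `F` is treated here as its own cluster; whether singletons should be numbered at all is a design choice left open by this sketch.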
Please see our recent
review in BBA Proteins for examples of EFI-EST use.