A sequence similarity network (SSN) allows researchers to visualize relationships among protein sequences. In SSNs, the most closely related proteins are grouped together in clusters. The Enzyme Similarity Tool (EFI-EST) is a web tool that allows researchers to easily generate SSNs that can be visualized in Cytoscape (3).
When a family is selected in Options B, C, and D, SSNs can now be generated using the UniRef90 database, in which UniProt sequences that share ≥90% sequence identity over 80% of the sequence length are clustered and represented by a single seed sequence. For most families, use of UniRef90 seed sequences decreases the time for the BLAST step by a factor of ≥4. The UniRef90 SSNs are analogous to 90% representative node SSNs generated using all UniProt sequences. The UniRef90 SSNs contain a node attribute, "UniRef90 Cluster IDs", that lists the UniProt IDs in each node and is searchable with Cytoscape, so all UniProt IDs in the family can be located. The UniRef90 SSNs are compatible with the EFI-GNT tool.
A listing of new features and other information pertaining to EST is available on the release notes page.
Information on Pfam families and clans and InterPro family sizes is now available on the Family Information page.
The provided sequence is used as the query for a BLAST search of the UniProt database; the similarities between the retrieved sequences are then calculated and used to generate the SSN. Submit only one protein sequence, without a FASTA header. The default maximum number of retrieved sequences is 5,000.
The sequences from the Pfam families, InterPro families, and/or Pfam clans (superfamilies) are retrieved; the similarities between the sequences are then calculated and used to generate the SSN. Families are entered as a comma-separated list. For Pfam families, the format is PFxxxxx (five digits); for InterPro families, the format is IPRxxxxxx (six digits); for Pfam clans, the format is CLxxxx (four digits). Lists of Pfam families, InterPro families, and Pfam clans are included in the release notes.
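The identifier formats above can be checked before submission. The following is a minimal sketch, not the tool's own validation: the function name `classify_families` and the dictionary `FAMILY_PATTERNS` are hypothetical, and the assumption that identifiers are upper-case and matched exactly is ours.

```python
import re

# Patterns taken from the formats stated above; names are illustrative only.
FAMILY_PATTERNS = {
    "pfam": re.compile(r"PF\d{5}"),       # PFxxxxx, five digits
    "interpro": re.compile(r"IPR\d{6}"),  # IPRxxxxxx, six digits
    "clan": re.compile(r"CL\d{4}"),       # CLxxxx, four digits
}


def classify_families(text):
    """Split a comma-separated list and label each ID by family type.

    Returns a list of (identifier, kind) pairs; kind is None when the
    identifier matches none of the three formats.
    """
    results = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        kind = None
        for name, pattern in FAMILY_PATTERNS.items():
            if pattern.fullmatch(token):
                kind = name
                break
        results.append((token, kind))
    return results
```

Entries labeled `None` would need to be corrected before the list is pasted into the family input field.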
The maximum number of retrieved sequences is 305,000. For large Pfam families, InterPro families, and Pfam clans, we recommend using the UniRef90 seed sequences.
The EFI - ENZYME SIMILARITY TOOL (EFI-EST) is a web server for the generation of SSNs. Four options for user-initiated generation of an SSN are available. In addition, a utility to enhance SSN interpretation is available.
Option A allows the user to explore local sequence-function space for the query sequence. Homologs are collected and used to generate the SSN. By default, 5,000 sequences are collected as this number often allows a “full” SSN to be generated and viewed with Cytoscape.
Option B allows the user to explore sequence-function space from defined protein families. A limit of 305,000 sequences is imposed. An SSN can be generated for more than one family.
Option C allows the user to generate an SSN for a provided set of FASTA-formatted sequences. By default, the provided sequences cannot be associated with sequences in the UniProt database, and only two node attributes are provided for the SSNs generated: the number of residues as the “Sequence Length”, and the FASTA header as the “Description”.
An option allows the FASTA headers to be read; if UniProt or NCBI identifiers are recognized, the corresponding UniProt information will be presented as node attributes.
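The two default Option C node attributes can be illustrated with a short sketch. This is an assumption-laden example rather than the server's code: the function name `fasta_node_attributes` is hypothetical, and it assumes the header is everything after `>` and that the length is the count of residue characters across all sequence lines.

```python
def fasta_node_attributes(fasta_text):
    """Return one dict per FASTA record with the two default attributes."""
    records = []
    header, chunks = None, []

    def flush():
        # Store the record collected so far, if any.
        if header is not None:
            records.append({
                "Description": header,
                "Sequence Length": len("".join(chunks)),
            })

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    flush()
    return records
```

For example, a record with the header `>seq1 example` and two sequence lines of three residues each would yield a “Description” of `seq1 example` and a “Sequence Length” of 6.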
Option D allows the user to provide a list of UniProt IDs, NCBI IDs, and/or NCBI GI numbers (now “retired”). UniProt IDs are used to retrieve sequences and annotation information from the UniProt database. When recognized, NCBI IDs and GI numbers are used to retrieve the “equivalent” UniProt IDs and information. Sequences with NCBI IDs that cannot be recognized will not be included in the SSN and a “nomatch” file listing these IDs is available for download.
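Before submitting a list for Option D, it can help to sort the identifiers by type. The sketch below is only a rough pre-check under stated assumptions: the UniProt pattern is the accession format published in the UniProt help pages, purely numeric tokens are assumed to be GI numbers, and everything else is placed in an "other" bucket that may contain NCBI IDs. The function name `sort_identifiers` is hypothetical, and the server's actual recognition logic may differ.

```python
import re

# UniProt accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def sort_identifiers(ids):
    """Group identifiers into 'uniprot', 'gi', and 'other' buckets."""
    buckets = {"uniprot": [], "gi": [], "other": []}
    for raw in ids:
        token = raw.strip()
        if not token:
            continue  # ignore blank entries
        if token.isdigit():
            buckets["gi"].append(token)        # assumption: digits only = GI number
        elif UNIPROT_ACCESSION.fullmatch(token):
            buckets["uniprot"].append(token)
        else:
            buckets["other"].append(token)     # possibly an NCBI ID
    return buckets
```

Identifiers in the "other" bucket are the ones most likely to end up in the “nomatch” file if they cannot be mapped to UniProt.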
Independent clusters in the uploaded SSN are identified, numbered, and colored. Summary tables and sets of IDs and sequences for specific clusters are provided. A manually edited SSN can serve as input for this utility.
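Identifying independent clusters amounts to finding the connected components of the network. The sketch below shows one way to do this with a union-find structure; it is not the utility's implementation. The function name `number_clusters` is hypothetical, and numbering clusters from largest to smallest (ties broken by the smallest node name) is an assumption made for the example.

```python
from collections import defaultdict


def number_clusters(nodes, edges):
    """Map each node to a cluster number (1 = largest cluster)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)

    # Assumed ordering: larger clusters first, then by smallest node name.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), min(g)))
    return {n: i for i, g in enumerate(ordered, start=1) for n in g}
```

A color could then be assigned per cluster number, and nodes without any edges (singletons) each receive their own number.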
UniProt Version: 2017_11