EFI - Enzyme Similarity Tool

The EST database has been updated to use UniProt 2017_09 and InterPro 65.0.

First step of SSN generation: Input selection


Define the set of sequences to be used in the all-by-all BLAST. The similarities among these sequences will be calculated. Four input methods are available, along with a utility for SSN coloring and analysis.

Input

Option A: Single sequence
The provided sequence is used as the query for a BLAST search of the UniProt database; the similarities between the retrieved sequences are then calculated and used to generate the SSN. Submit only one protein sequence, without a FASTA header. The default maximum number of retrieved sequences is 5,000.
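Because Option A expects a bare sequence, a FASTA record copied from a database needs its header line removed first. The following is a minimal sketch of that preparation step, not part of EFI-EST itself; the function name `strip_fasta_header` is hypothetical, and the sketch assumes header lines start with `>` as in standard FASTA.

```python
def strip_fasta_header(record):
    """Return the bare sequence from a single FASTA record.

    Hypothetical helper: drops the '>' header line, joins wrapped
    sequence lines, and rejects input containing more than one record.
    """
    lines = [line.strip() for line in record.strip().splitlines()]
    headers = [line for line in lines if line.startswith(">")]
    if len(headers) > 1:
        raise ValueError("Option A accepts only one protein sequence")
    # Keep non-empty lines that are not headers and join them into one string.
    return "".join(line for line in lines if line and not line.startswith(">"))


sequence = strip_fasta_header(">example header\nMKTAYIAK\nQRQISFVK\n")
```

The resulting string can be pasted into the single-sequence input field.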




Option B: Pfam and/or InterPro families
The sequences from the Pfam and/or InterPro families are retrieved; the similarities between the sequences are then calculated and used to generate the SSN. For Pfam families, the format is a comma-separated list of PFxxxxx (five digits); for InterPro families, the format is IPRxxxxxx (six digits). The maximum number of retrieved sequences is 275,000.
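The identifier formats described above can be checked before submission. The following is a minimal sketch, assuming the formats are exactly `PF` plus five digits and `IPR` plus six digits; the function name `classify_family_ids` is hypothetical, and the matching is case-sensitive, which may or may not reflect how the EFI-EST server treats input.

```python
import re

# Patterns taken from the format description: PFxxxxx and IPRxxxxxx.
PFAM_RE = re.compile(r"PF\d{5}")
INTERPRO_RE = re.compile(r"IPR\d{6}")


def classify_family_ids(text):
    """Sort a comma-separated list into Pfam, InterPro, and invalid entries."""
    result = {"pfam": [], "interpro": [], "invalid": []}
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        if PFAM_RE.fullmatch(token):
            result["pfam"].append(token)
        elif INTERPRO_RE.fullmatch(token):
            result["interpro"].append(token)
        else:
            result["invalid"].append(token)
    return result


checked = classify_family_ids("PF00001, IPR000001, PF123")
```

Entries that land in the `invalid` list do not match either documented format and should be corrected before the list is submitted.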





Option C: User-supplied set of sequences
The similarities between the provided sequences will be calculated and used to generate the SSN. Input a list of protein sequences in FASTA format with headers, or upload a FASTA file.

Read FASTA headers
When selected, recognized UniProt or GenBank identifiers from FASTA headers are used to retrieve corresponding node attributes from the UniProt database.

FASTA File:

Maximum size is 2048M.

If desired, include Pfam and/or InterPro families in the analysis of your FASTA file. For Pfam families, the format is a comma-separated list of PFxxxxx (five digits); for InterPro families, the format is IPRxxxxxx (six digits).





Option D: List of UniProt and/or NCBI IDs
The sequences and attributes corresponding to the recognized identifiers are retrieved; the similarities between the sequences are then calculated and used to generate the SSN. Input a list of UniProt, NCBI, or GenBank sequence accession IDs, and/or upload a text file containing the accession IDs.
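An accession list assembled from several sources often contains duplicates and mixed separators. The following is a minimal sketch for tidying such a list before pasting or uploading it; the function name `parse_accession_ids` is hypothetical, and splitting on whitespace and commas is an assumption, since the separators EFI-EST accepts are not stated here.

```python
import re


def parse_accession_ids(text):
    """Split text on whitespace and commas; drop blanks and duplicates, keep order."""
    seen = set()
    ids = []
    for token in re.split(r"[\s,]+", text):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


accessions = parse_accession_ids("P12345, Q9Y6K9\nP12345\n")
# One ID per line is a conservative layout for an upload file.
file_body = "\n".join(accessions)
```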


Accession ID File:

Maximum size is 2048M.

If desired, include Pfam and/or InterPro families in the analysis of your file. For Pfam families, the format is a comma-separated list of PFxxxxx (five digits); for InterPro families, the format is IPRxxxxxx (six digits).





Utility for SSN Coloring and Analysis


Color SSN Utility: Color a previously generated SSN and return associated cluster data.
Independent sequence clusters in the uploaded SSN are identified, numbered, and colored. Summary tables, sets of IDs, and sequences for specific clusters are provided. A Cytoscape-edited SSN can serve as input for this utility. For all of the new features to work correctly, SSNs generated by EFI-EST 2.0 (released 8/17/2017) should be used.

SSN to color and analyze (uncompressed or zipped XGMML file):

Maximum size is 2048M.







InterPro Version: 65.0

UniProt Version: 2017_09

EFI-EST Version: 2.0
