EFI - Enzyme Similarity Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.

Cluster Analyses and Downloads

Uploaded Filename: 26131_IP91_IPR004184_UniRef90_NoFragments_Bacteroidetes_Minlen650_AS240_full_ssn.xgmml

The Data File Download tab provides the Color SSN with the nodes colored according to node.fillColor (Cluster Sequence Count).

Six node attributes were added to the input SSN: Cluster Sequence Count, Sequence Count Cluster Number, Cluster Node Count, Node Count Cluster Number, node.fillColor (according to Cluster Sequence Count, hexadecimal), and Node Count Fill Color (according to Cluster Node Count, hexadecimal).
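The added attributes can also be read outside Cytoscape. The following is a minimal sketch, assuming the colored SSN stores each attribute as an `att` element with `name` and `value` inside each `node` element, as XGMML files typically do. The sample fragment, its IDs, and its colors are hypothetical, and `read_node_attributes` is an illustrative helper, not part of the EFI tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-node fragment; a real colored SSN carries many more attributes.
SAMPLE_XGMML = """<graph label="demo" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="A0A000" label="A0A000">
    <att name="Cluster Sequence Count" type="integer" value="675"/>
    <att name="Sequence Count Cluster Number" type="integer" value="1"/>
    <att name="node.fillColor" type="string" value="#FF0000"/>
  </node>
  <node id="B0B000" label="B0B000">
    <att name="Cluster Sequence Count" type="integer" value="271"/>
    <att name="Sequence Count Cluster Number" type="integer" value="2"/>
    <att name="node.fillColor" type="string" value="#0000FF"/>
  </node>
</graph>
"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}node' down to 'node'."""
    return tag.rsplit("}", 1)[-1]


def read_node_attributes(xgmml_text):
    """Return {node id: {attribute name: value}} for every node element."""
    root = ET.fromstring(xgmml_text)
    nodes = {}
    for elem in root.iter():
        if local_name(elem.tag) != "node":
            continue
        attrs = {}
        for child in elem:
            if local_name(child.tag) == "att" and "name" in child.attrib:
                attrs[child.attrib["name"]] = child.attrib.get("value")
        nodes[elem.attrib.get("id")] = attrs
    return nodes


for node_id, attrs in read_node_attributes(SAMPLE_XGMML).items():
    print(node_id, attrs["Sequence Count Cluster Number"], attrs["node.fillColor"])
```

Values are returned as strings; convert counts with `int()` where needed.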

To change the node colors in Cytoscape to Node Count Fill Color: 1) select all nodes; 2) on the Style Panel, click on the "?" in the Fill Color Property; 3) select "Remove Bypass"; 4) deselect the nodes (default node color); and 5) open the Fill Color Property and select "Node Count Fill Color" as the Column and "Passthrough Mapping" as the Mapping Type. The nodes will be recolored.

The Data File Download tab also provides files for 1) UniProt ID-Color-Cluster Number mapping table, 2) ID Lists and FASTA Files for each cluster, 3) cluster sizes, and 4) SwissProt annotations for clusters and singletons. The number of UniRef/UniProt IDs for each cluster is displayed in the WebLogos, HMMs, and Length Histograms tabs.
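As an illustration of working with the mapping table, the sketch below groups IDs by cluster number. It assumes a tab-delimited file with a header row. The column names `UniProt ID`, `Cluster Number`, and `Cluster Color` are hypothetical placeholders and should be replaced with the headers found in the downloaded file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical table contents; column names are assumptions, not the real headers.
SAMPLE_TABLE = (
    "UniProt ID\tCluster Number\tCluster Color\n"
    "A0A000\t1\t#FF0000\n"
    "A0A001\t1\t#FF0000\n"
    "B0B000\t2\t#0000FF\n"
)


def ids_by_cluster(handle, id_col="UniProt ID", cluster_col="Cluster Number"):
    """Group IDs by cluster number from a tab-delimited table with a header row."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[cluster_col]].append(row[id_col])
    return dict(groups)


# For a real download, pass an open file handle instead of the StringIO sample.
print(ids_by_cluster(io.StringIO(SAMPLE_TABLE)))
```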

The WebLogos tab provides the WebLogo (generated using http://weblogo.threeplusone.com/) and MSA (generated using MUSCLE) for the node IDs in each SSN cluster containing at least the specified "Minimum Node Count". The MSA can be viewed with Jalview (https://www.jalview.org/). This tab also provides the percent identity matrix for the multiple sequence alignment, as computed by Clustal-Omega.
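To make the percent identity matrix concrete, the sketch below computes pairwise identity from already-aligned sequences. It uses one simple definition (identical residues divided by columns where neither sequence has a gap); this is an assumption for illustration and may differ from how Clustal-Omega computes its matrix. The sequences are made up.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Return {(name_x, name_y): percent identity} for every ordered pair."""
    names = list(aligned)
    return {(x, y): percent_identity(aligned[x], aligned[y]) for x in names for y in names}


# Hypothetical aligned sequences of equal length.
ALIGNED = {"s1": "MKC-LA", "s2": "MKCALA", "s3": "MRC-LG"}
matrix = identity_matrix(ALIGNED)
print(matrix[("s1", "s2")], matrix[("s1", "s3")])
```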

The Consensus Residues tab provides a tab-delimited text file with the number of conserved residues and their MSA positions for each specified residue in each SSN cluster (numbered by Cluster Sequence Count) containing at least the specified "Minimum Node Count".
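The idea behind the consensus residue table can be sketched as follows: for a chosen residue and a list of conservation thresholds, report the MSA columns in which the fraction of sequences carrying that residue meets each threshold. This is a simplified illustration, not the EFI implementation. Counting gaps in the denominator is an assumption, and the alignment is made up.

```python
def conserved_positions(msa, residue, thresholds):
    """Return {threshold: [1-based MSA columns where residue fraction >= threshold]}."""
    if not msa:
        return {t: [] for t in thresholds}
    ncols = len(msa[0])
    if any(len(seq) != ncols for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    residue = residue.upper()
    # Fraction of sequences (gaps included in the denominator) with the residue.
    fractions = [
        sum(1 for seq in msa if seq[i].upper() == residue) / len(msa)
        for i in range(ncols)
    ]
    return {
        t: [i + 1 for i, frac in enumerate(fractions) if frac >= t]
        for t in thresholds
    }


# Hypothetical four-sequence alignment; column 2 is always C, column 4 is C in half.
MSA = ["ACDC", "ACEC", "ACDA", "ACD-"]
print(conserved_positions(MSA, "C", [0.9, 0.5]))
# {0.9: [2], 0.5: [2, 4]}
```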

The HMMs tab provides the HMM for each SSN cluster containing at least the specified "Minimum Node Count". The Skylign download provides the image of the HMM generated from the MSA (https://skylign.org/). The HMM text file can be viewed interactively by uploading to https://skylign.org/ and selecting "Information Content – Above Background"; the probability of each amino acid residue and the probability and length of an insert at each position are provided.
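For intuition about "information content above background", the sketch below computes the relative entropy (in bits) of a single alignment column against a background distribution. It is an illustration of the quantity only: the residue probabilities are taken from raw column counts and the background is uniform, both of which are assumptions, whereas Skylign works from the HMM's emission probabilities and its own background frequencies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column, background=None):
    """Relative entropy in bits of one MSA column versus a background distribution.

    Gaps and non-standard symbols are ignored. A uniform background over the
    20 amino acids is assumed when none is given.
    """
    residues = [c.upper() for c in column if c.upper() in AMINO_ACIDS]
    if not residues:
        return 0.0
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = len(residues)
    info = 0.0
    for aa, count in Counter(residues).items():
        p = count / total
        info += p * math.log2(p / background[aa])
    return info


print(round(column_information("CCCC"), 4))  # fully conserved column
print(round(column_information("CCAA"), 4))  # two residues, equally frequent
```

Under the uniform background, a fully conserved column gives log2(20) bits, the maximum, and a column split evenly between two residues gives log2(10) bits.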

The Length Histograms tab provides length histograms for each cluster containing at least the specified "Minimum Node Count".
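A length histogram of this kind can be reproduced from a per-cluster FASTA file. The sketch below is a minimal example under stated assumptions: the FASTA content is made up, and the bin width of 5 is arbitrary rather than the one used by the tool.

```python
from collections import Counter

# Hypothetical FASTA content; real cluster files hold full-length sequences.
SAMPLE_FASTA = """>seq1
MKCLA
MKC
>seq2
MKCLAMKCLA
>seq3
MKCLAMKC
"""


def sequence_lengths(fasta_text):
    """Return {sequence id: length}, summing residues across wrapped lines."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def length_histogram(lengths, bin_width=5):
    """Count sequences per length bin; keys are the lower bound of each bin."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(bins.items()))


lengths = sequence_lengths(SAMPLE_FASTA)
print(length_histogram(lengths.values()))
# {5: 2, 10: 1}
```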

Submission Summary Table

Job Number: 26229
Input Option: Cluster Analysis
Uploaded Filename: 26131_IP91_IPR004184_UniRef90_NoFragments_Bacteroidetes_Minlen650_AS240_full_ssn.xgmml
Database Version: UniProt: 2022-04 / InterPro: 91
Analysis Options: WebLogo, HMM, Consensus Residue, Length Histogram (AAs=C; Thresholds=0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1)
Number of SSN clusters: 24
Number of SSN singletons: 13
SSN sequence source: UniRef90
Number of SSN (meta)nodes: 333
Number of accession IDs in SSN: 1,122

Please cite your use of the EFI tools:

Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735

Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018

Colored SSN

Each cluster in the submitted SSN has been identified and assigned a unique number and color.

Supplementary Files

Mapping Tables
UniProt ID-Color-Cluster number mapping table
ID Lists and FASTA Files per Cluster
UniProt ID lists per cluster
UniRef90 ID lists per cluster
FASTA files per UniProt cluster
FASTA files per UniRef90 cluster
Miscellaneous Files
Cluster sizes
SwissProt annotations by cluster

Consensus Residues

C
Consensus residue position summary table (full) <1 MB

HMMs

If the HMM is missing for Node Cluster 1 (and additional clusters with large numbers of nodes), repeat the job with a "Maximum Node Count" in the Sequence Filter input window. MUSCLE can fail with a "large" number of sequences (variable, anywhere from >750 to >1500).

HMMs for FASTA UniProt cluster (full length sequences) 4 MB


Sequence Cluster 1 / Node Cluster 1

Full Sequences
Number of IDs: UniProt: 675, UniRef90: 136
Cluster 1

Sequence Cluster 2 / Node Cluster 2

Full Sequences
Number of IDs: UniProt: 271, UniRef90: 121
Cluster 2

Sequence Cluster 5 / Node Cluster 3

Full Sequences
Number of IDs: UniProt: 14, UniRef90: 9
Cluster 5

Sequence Cluster 8 / Node Cluster 4

Full Sequences
Number of IDs: UniProt: 8, UniRef90: 8
Cluster 8

Sequence Cluster 6 / Node Cluster 5

Full Sequences
Number of IDs: UniProt: 12, UniRef90: 8
Cluster 6

Sequence Cluster 7 / Node Cluster 6

Full Sequences
Number of IDs: UniProt: 11, UniRef90: 6
Cluster 7

Length Histograms

Length Histograms for FASTA UniProt cluster (full length sequences, UniProt) <1 MB


Sequence Cluster 1 / Node Cluster 1

Full Sequences
Number of IDs: UniProt: 675, UniRef90: 136
Cluster 1

Sequence Cluster 2 / Node Cluster 2

Full Sequences
Number of IDs: UniProt: 271, UniRef90: 121
Cluster 2

Sequence Cluster 5 / Node Cluster 3

Full Sequences
Number of IDs: UniProt: 14, UniRef90: 9
Cluster 5

Sequence Cluster 8 / Node Cluster 4

Full Sequences
Number of IDs: UniProt: 8, UniRef90: 8
Cluster 8

Sequence Cluster 6 / Node Cluster 5

Full Sequences
Number of IDs: UniProt: 12, UniRef90: 8
Cluster 6

Sequence Cluster 7 / Node Cluster 6

Full Sequences
Number of IDs: UniProt: 11, UniRef90: 6
Cluster 7

Sequence Cluster 3 / Node Cluster 7

Full Sequences
Number of IDs: UniProt: 41, UniRef90: 4
Cluster 3

Sequence Cluster 10 / Node Cluster 8

Full Sequences
Number of IDs: UniProt: 4, UniRef90: 3
Cluster 10

Sequence Cluster 11 / Node Cluster 9

Full Sequences
Number of IDs: UniProt: 4, UniRef90: 3
Cluster 11

Sequence Cluster 18 / Node Cluster 10

Full Sequences
Number of IDs: UniProt: 2, UniRef90: 2
Cluster 18

Sequence Cluster 15 / Node Cluster 11

Full Sequences
Number of IDs: UniProt: 3, UniRef90: 2
Cluster 15

Sequence Cluster 12 / Node Cluster 12

Full Sequences
Number of IDs: UniProt: 4, UniRef90: 2
Cluster 12

Sequence Cluster 13 / Node Cluster 13

Full Sequences
Number of IDs: UniProt: 4, UniRef90: 2
Cluster 13

Sequence Cluster 16 / Node Cluster 14

Full Sequences
Number of IDs: UniProt: 3, UniRef90: 2
Cluster 16

Sequence Cluster 9 / Node Cluster 15

Full Sequences
Number of IDs: UniProt: 5, UniRef90: 2
Cluster 9

Sequence Cluster 19 / Node Cluster 16

Full Sequences
Number of IDs: UniProt: 2, UniRef90: 2
Cluster 19

Sequence Cluster 20 / Node Cluster 21

Full Sequences
Number of IDs: UniProt: 2, UniRef90: 1
Cluster 20

Sequence Cluster 21 / Node Cluster 23

Full Sequences
Number of IDs: UniProt: 2, UniRef90: 1
Cluster 21

Sequence Cluster 22 / Node Cluster 25

Full Sequences
Number of IDs: UniProt: 2, UniRef90: 1
Cluster 22

Sequence Cluster 17 / Node Cluster 30

Full Sequences
Number of IDs: UniProt: 3, UniRef90: 1
Cluster 17

Sequence Cluster 14 / Node Cluster 32

Full Sequences
Number of IDs: UniProt: 4, UniRef90: 1
Cluster 14

Sequence Cluster 23 / Node Cluster 34

Full Sequences
Number of IDs: UniProt: 2, UniRef90: 1
Cluster 23

Sequence Cluster 24 / Node Cluster 36

Full Sequences
Number of IDs: UniProt: 2, UniRef90: 1
Cluster 24

Sequence Cluster 4 / Node Cluster 37

Full Sequences
Number of IDs: UniProt: 29, UniRef90: 1
Cluster 4
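
The per-cluster ID counts listed above can be cross-checked against the Submission Summary Table. The sketch below sums them, assuming that each of the 13 singletons contributes exactly one UniRef90 node and one UniProt ID; the page does not state this, but the totals agree under that assumption.

```python
# (sequence cluster number, UniProt IDs, UniRef90 IDs), copied from the listing above.
CLUSTERS = [
    (1, 675, 136), (2, 271, 121), (5, 14, 9), (8, 8, 8), (6, 12, 8), (7, 11, 6),
    (3, 41, 4), (10, 4, 3), (11, 4, 3), (18, 2, 2), (15, 3, 2), (12, 4, 2),
    (13, 4, 2), (16, 3, 2), (9, 5, 2), (19, 2, 2), (20, 2, 1), (21, 2, 1),
    (22, 2, 1), (17, 3, 1), (14, 4, 1), (23, 2, 1), (24, 2, 1), (4, 29, 1),
]
SINGLETONS = 13  # "Number of SSN singletons" in the Submission Summary Table


def totals(clusters, singletons):
    """Return (cluster count, total UniRef90 nodes, total UniProt IDs).

    Assumes every singleton adds one UniRef90 node and one UniProt ID.
    """
    uniprot = sum(c[1] for c in clusters) + singletons
    uniref90 = sum(c[2] for c in clusters) + singletons
    return len(clusters), uniref90, uniprot


print(totals(CLUSTERS, SINGLETONS))
# (24, 333, 1122)
```

These match the 24 clusters, 333 (meta)nodes, and 1,122 accession IDs reported in the summary table.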
