The Data File Download tab provides the Color SSN with the nodes colored
according to node.fillColor (Cluster Sequence Count).
Six node attributes were added to the input SSN: Cluster Sequence Count,
Sequence Count Cluster Number, Cluster Node Count,
Node Count Cluster Number,
node.fillColor (according to Cluster Sequence Count, hexadecimal), and
Node Count Fill Color (according to Cluster Node Count, hexadecimal).
To change the node colors in Cytoscape to Node Count Fill Color: 1) select all nodes; 2) on
the Style Panel, click on the "?" in the Fill Color Property; 3) select "Remove
Bypass"; 4) deselect the nodes (they now show the default node color); and 5) open the
Fill Color Property and select "Node Count Fill Color" as the Column and "Passthrough
Mapping" as the Mapping Type. The nodes will be recolored.
The Data File Download tab also provides 1) a UniProt ID-Color-Cluster
Number mapping table, 2) ID Lists and FASTA Files for each cluster, 3) cluster
sizes, and 4) SwissProt annotations for clusters and singletons.
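As an illustration of how the mapping table might be used downstream, the following minimal Python sketch groups IDs by cluster number. It assumes the table is tab-delimited with a header row; the column names "UniProt ID", "Cluster Number", and "Cluster Color" and the function name group_ids_by_cluster are hypothetical and should be adjusted to match the downloaded file.

```python
import csv
import io
from collections import defaultdict


def group_ids_by_cluster(handle, id_col="UniProt ID", cluster_col="Cluster Number"):
    """Return {cluster number: [IDs]} from a tab-delimited mapping table.

    The column names are assumptions; check the header of the real file.
    """
    clusters = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        clusters[row[cluster_col]].append(row[id_col])
    return dict(clusters)


# Made-up rows standing in for a downloaded mapping table.
example = io.StringIO(
    "UniProt ID\tCluster Number\tCluster Color\n"
    "ID_A\t1\t#FF0000\n"
    "ID_B\t1\t#FF0000\n"
    "ID_C\t2\t#0000FF\n"
)
print(group_ids_by_cluster(example))
```

For a real file, open it with open(path, newline="") and pass the file handle in place of the in-memory example.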
The number of UniRef/UniProt IDs for each cluster is displayed in the
WebLogos, HMMs, and Length Histograms tabs.
The WebLogos tab provides the WebLogo (generated using
http://weblogo.threeplusone.com/) and MSA (generated using MUSCLE) for the node
IDs in each SSN cluster containing at least the specified "Minimum Node Count".
The MSA can be viewed with Jalview (https://www.jalview.org/).
This tab also provides the percent identity matrix for the multiple sequence alignment, as computed by Clustal-Omega.
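To make the idea of a percent identity matrix concrete, the following Python sketch computes pairwise percent identity from already-aligned sequences. It is an illustration only: the convention used here (columns with a gap in either sequence are skipped) is an assumption and may differ from what Clustal-Omega reports, and the function names and example sequences are made up.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are not counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Square matrix of pairwise percent identities for an MSA."""
    return [[percent_identity(a, b) for b in aligned] for a in aligned]


# Made-up three-sequence alignment.
msa = ["MKV-LA", "MKVALA", "MRV-LG"]
for row in identity_matrix(msa):
    print(" ".join(f"{value:6.1f}" for value in row))
```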
The Consensus Residues tab provides a tab-delimited text file with the number
of conserved residues and their MSA positions for each specified residue in
each SSN cluster (numbered by Cluster Sequence Count) containing at least the
specified "Minimum Node Count".
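The kind of count reported in this file can be sketched in Python as follows. This is not the tool's implementation: the conservation criterion (a residue is "conserved" at an MSA column when it occurs in at least min_fraction of the sequences), the default of 0.8, and the function name conserved_positions are all assumptions chosen for illustration.

```python
def conserved_positions(aligned, residue, min_fraction=0.8):
    """Return 1-based MSA columns where `residue` is conserved.

    Assumption: "conserved" means the residue occurs in at least
    min_fraction of the aligned sequences at that column.
    """
    if not aligned:
        return []
    total = len(aligned)
    positions = []
    for col in range(len(aligned[0])):
        count = sum(1 for seq in aligned if seq[col] == residue)
        if count / total >= min_fraction:
            positions.append(col + 1)
    return positions


# Made-up five-sequence alignment.
msa = ["ACDCH", "ACECH", "ACDGH", "GCDCH", "ACDGH"]
for residue in "CH":
    cols = conserved_positions(msa, residue)
    # Residue, number of conserved positions, and the positions themselves.
    print(residue, len(cols), cols)
```

Lowering min_fraction (for example to 0.5) reports more positions, which is one way to compare strictly and loosely conserved residues within a cluster.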
The HMMs tab provides the HMM for each SSN cluster containing at least the
specified "Minimum Node Count". The Skylign download provides the image of the
HMM generated from the MSA (https://skylign.org/). The HMM text file can be
viewed interactively by uploading to https://skylign.org/ and selecting
"Information Content – Above Background"; the probability of each amino acid
residue and the probability and length of an insert at each position are provided.
The Length Histograms tab provides length histograms for each cluster
containing at least the specified "Minimum Node Count".
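A comparable length histogram can be reproduced from one of the per-cluster FASTA files with a short Python sketch such as the one below. The function names and the example sequences are hypothetical, and the binning used by the tab itself (one bin per exact length here) is an assumption.

```python
import io
from collections import Counter


def read_fasta_lengths(handle):
    """Return the sequence length of each record in a FASTA stream."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_histogram(lengths):
    """Return {length: number of sequences}, sorted by length."""
    return dict(sorted(Counter(lengths).items()))


# Made-up FASTA content standing in for a per-cluster file.
fasta = io.StringIO(">seq1\nMKVLA\nGG\n>seq2\nMKVLAGG\n>seq3\nMKV\n")
for length, count in length_histogram(read_fasta_lengths(fasta)).items():
    print(f"{length:>5} {'#' * count}")
```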
Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735
Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018
Colored SSN
Each cluster in the submitted SSN has been identified and assigned a unique number and color.
Supplementary Files
Mapping Tables
UniProt ID-Color-Cluster number mapping table
ID Lists and FASTA Files per Cluster
UniProt ID lists per cluster
UniRef90 ID lists per cluster
FASTA files per UniProt cluster
FASTA files per UniRef90 cluster
Miscellaneous Files
Cluster sizes
SwissProt annotations by cluster
WebLogos
If the WebLogo is missing for Node Cluster 1 (and additional clusters with large numbers of nodes),
repeat the job with a "Maximum Node Count" in the Sequence Filter input window. MUSCLE can fail
with a "large" number of sequences; the failure threshold varies, from more than 750 to more than 1500.
WebLogos for FASTA UniProt cluster (full length sequences)
Percent Identity Matrix for FASTA UniProt cluster (full length sequences)
If the HMM is missing for Node Cluster 1 (and additional clusters with large numbers of nodes),
repeat the job with a "Maximum Node Count" in the Sequence Filter input window. MUSCLE can fail
with a "large" number of sequences; the failure threshold varies, from more than 750 to more than 1500.
HMMs for FASTA UniProt cluster (full length sequences)