EFI - Enzyme Similarity Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
RadicalSAM.org, our resource for investigating sequence-function space in the radical SAM superfamily, has been updated with sequences from UniProt Release 2024_01 and InterPro Release 98 (January 24, 2024).

https://radicalsam.org

Download Network Files

Submission Name: IP91_RSS_UniRef90_NoFragments_Archaea

Network Name: IP91_RSS_UniRef90_NoFragments_Archaea_Minlen140_AS11

The parameters used for the initial submission and for finalization are summarized in the tables below.

Analysis Summary

Analysis Job Number: 26618
Network Name: IP91_RSS_UniRef90_NoFragments_Archaea_Minlen140_AS11
Alignment Score: 11
Minimum Length (residues): 140
Maximum Length (residues): 50,000
Total Number of Sequences After Filtering: 36,996
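The Analysis Summary lists the finalization filters applied to the initial dataset. The sketch below shows one way such filters could be applied, assuming that sequences outside the length window are dropped as nodes and that an edge is kept only when both endpoints survive and its alignment score meets the threshold. The input layout (`lengths`, `edges`) and the function name `filter_network` are hypothetical, and treating both bounds and the score cutoff as inclusive is an assumption.

```python
from typing import Dict, List, Set, Tuple

MIN_LENGTH = 140      # Minimum Length from the Analysis Summary
MAX_LENGTH = 50_000   # Maximum Length from the Analysis Summary
MIN_SCORE = 11        # Alignment Score threshold from the Analysis Summary


def filter_network(
    lengths: Dict[str, int],
    edges: List[Tuple[str, str, float]],
) -> Tuple[Set[str], List[Tuple[str, str, float]]]:
    """Keep nodes inside the length window and edges at or above the score cutoff."""
    # Assumption: both length bounds are inclusive.
    nodes = {acc for acc, n in lengths.items() if MIN_LENGTH <= n <= MAX_LENGTH}
    # An edge survives only if both endpoints survive and its score meets the
    # cutoff (assumed to be >=, not >).
    kept = [
        (a, b, s) for a, b, s in edges
        if a in nodes and b in nodes and s >= MIN_SCORE
    ]
    return nodes, kept


# Hypothetical toy input: sequence B is too short, and edge B-C scores too low.
lengths = {"A": 300, "B": 120, "C": 410}
edges = [("A", "B", 40.0), ("A", "C", 15.0), ("B", "C", 8.0)]
nodes, kept = filter_network(lengths, edges)
print(sorted(nodes), kept)  # ['A', 'C'] [('A', 'C', 15.0)]
```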

Dataset Summary

EST Job Number: 26161 (Original Dataset)
Database Version: UniProt: 2022-04 / InterPro: 91
Input Option: Families (Option B)
Job Name: IP91_RSS_UniRef90_NoFragments_Archaea
E-Value for SSN Edge Calculation: 5 (negative log of the BLAST E-value, i.e., a threshold of 1e-5)
Pfam / InterPro Family: IPR000385, IPR001989, IPR002684, IPR003698, IPR003739, IPR004383, IPR004558, IPR004559, IPR005839, IPR005840, IPR005909, IPR005911, IPR005980, IPR006463, IPR006466, IPR006467, IPR006638, IPR007197, IPR010505, IPR010722, IPR010723, IPR011101, IPR011843, IPR012726, IPR012837, IPR012838, IPR012839, IPR013483, IPR013704, IPR013848, IPR013917, IPR014191, IPR016431, IPR016771, IPR016779, IPR016863, IPR017200, IPR017672, IPR017742, IPR017833, IPR017834, IPR019939, IPR019940, IPR020050, IPR020612, IPR022431, IPR022432, IPR022447, IPR022459, IPR022462, IPR022881, IPR022946, IPR023404, IPR023805, IPR023807, IPR023819, IPR023820, IPR023821, IPR023822, IPR023858, IPR023862, IPR023863, IPR023867, IPR023868, IPR023874, IPR023880, IPR023885, IPR023886, IPR023891, IPR023897, IPR023904, IPR023912, IPR023913, IPR023930, IPR023969, IPR023979, IPR023980, IPR023984, IPR023992, IPR023993, IPR023995, IPR024001, IPR024007, IPR024016, IPR024017, IPR024018, IPR024021, IPR024023, IPR024025, IPR024032, IPR024177, IPR024521, IPR024560, IPR024924, IPR025895, IPR026322, IPR026332, IPR026335, IPR026344, IPR026346, IPR026351, IPR026357, IPR026401, IPR026404, IPR026407, IPR026412, IPR026423, IPR026426, IPR026429, IPR026447, IPR026482, IPR027492, IPR027526, IPR027527, IPR027559, IPR027564, IPR027570, IPR027583, IPR027586, IPR027596, IPR027604, IPR027608, IPR027609, IPR027621, IPR027622, IPR027626, IPR027633, IPR030801, IPR030837, IPR030894, IPR030896, IPR030905, IPR030915, IPR030933, IPR030950, IPR030969, IPR030977, IPR030989, IPR031003, IPR031004, IPR031010, IPR031012, IPR031014, IPR031015, IPR031019, IPR031691, IPR032432, IPR033971, IPR033974, IPR033975, IPR033976, IPR034165, IPR034386, IPR034391, IPR034405, IPR034422, IPR034428, IPR034436, IPR034438, IPR034457, IPR034462, IPR034465, IPR034466, IPR034471, IPR034474, IPR034479, IPR034480, IPR034485, IPR034491, IPR034497, IPR034498, IPR034505, IPR034508, IPR034514, IPR034515, IPR034519, IPR034529, IPR034530, IPR034531, IPR034532, IPR034534, IPR034547, IPR034556, IPR034557, IPR034559, IPR034560, IPR034687, IPR038135, IPR039661, IPR040063, IPR040072, IPR040074, IPR040081, IPR040082, IPR040085, IPR040086, IPR040087, IPR040088, IPR041582, IPR045375, IPR045567, IPR045784, PF04055, PF06969, PF08497, PF12345, PF13186, PF16199, PF16881, PF19238, PF19288, PF19864
Number of IDs in Pfam / InterPro Family: 773,531
Domain Option: off
UniRef Version: 90
Number of Cluster IDs in UniRef90 Family: 37,636
Taxonomy Categories: Archaea
Exclude Fragments: Yes
Total Number of Sequences in Dataset: 37,636
Total Number of Edges: 35,803,000
Number of Unique Sequences: 37,636
Convergence Ratio: 0.051
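The reported convergence ratio can be reproduced from the other Dataset Summary figures, assuming it is the number of edges retained from the all-by-all BLAST divided by the number of unordered sequence pairs, n(n-1)/2. That assumption yields 0.051 for this dataset, matching the value above, but the function name `convergence_ratio` is illustrative and not part of the EFI tools.

```python
def convergence_ratio(num_edges: int, num_sequences: int) -> float:
    """Retained edges divided by all unordered sequence pairs, n(n-1)/2 (assumed definition)."""
    possible_pairs = num_sequences * (num_sequences - 1) // 2
    return num_edges / possible_pairs


# Total Number of Edges and Total Number of Sequences from the Dataset Summary.
ratio = convergence_ratio(35_803_000, 37_636)
print(f"{ratio:.3f}")  # 0.051
```

Under this definition the ratio approaches 1.0 when nearly every pair of sequences is connected and falls toward 0.0 for unrelated sequences.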
Please cite your use of the EFI tools:

Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735

Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018

The panels below provide full and representative node SSN files for download, along with the number of nodes and edges in each. As an approximate guide, the RAM needed to open an SSN depends on its number of edges:

# Edges RAM
~2M 16 GB
~5M 32 GB
~10M 64 GB
~20M 128 GB
~40M 256 GB
~120M 768 GB
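To estimate the memory needed for a particular download, the RAM guide above can be read as a lookup table. This minimal sketch assumes each edge count is an upper limit for its RAM tier, which is a conservative reading of an approximate guide; `RAM_GUIDE` and `suggested_ram_gb` are hypothetical names.

```python
from typing import Optional

# Approximate (edges, RAM in GB) pairs from the guide, smallest tier first.
RAM_GUIDE = [
    (2_000_000, 16),
    (5_000_000, 32),
    (10_000_000, 64),
    (20_000_000, 128),
    (40_000_000, 256),
    (120_000_000, 768),
]


def suggested_ram_gb(num_edges: int) -> Optional[int]:
    """Return the smallest RAM tier whose edge count is at least num_edges.

    Returns None when the network is larger than the biggest tier in the guide.
    """
    for max_edges, ram_gb in RAM_GUIDE:
        if num_edges <= max_edges:
            return ram_gb
    return None


print(suggested_ram_gb(27_905_053))  # full network -> 256
print(suggested_ram_gb(7_815_181))   # 40% RepNode network -> 64
```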

Files may be transferred to the Genome Neighborhood Tool (GNT), the Color SSN utility, the Cluster Analysis utility, or the Neighborhood Connectivity utility.

Full Network

Each node in the network represents a single protein sequence.

# Nodes # Edges
36,996 27,905,053


Representative Node Networks

In representative node (RepNode) networks, each node represents a group of proteins that share a given level of sequence identity. For example, in a 75% identity RepNode network, all connected sequences that share 75% or greater identity are grouped into a single node (meta node). Collapsing sequences in this way reduces the number of nodes, producing simpler networks that are easier to load in Cytoscape.

The cluster organization is unchanged: sequences cluster exactly as they do in the full network.
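The grouping described above can be sketched as a greedy, CD-HIT-style pass. This is an assumption for illustration, not a description of the tool's actual procedure: sequences are visited longest first, and each joins the first representative it shares an SSN edge with at or above the identity threshold; otherwise it becomes a new representative. Because only directly connected sequences are merged, collapsing the edges afterwards leaves every cluster intact. The input `edge_identity` (percent identity for each SSN edge) and both function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]


def group_rep_nodes(
    lengths: Dict[str, int],
    edge_identity: Dict[Edge, float],
    threshold: float,
) -> Dict[str, str]:
    """Map each sequence to the representative of its RepNode (greedy, CD-HIT style)."""
    reps: List[str] = []
    assignment: Dict[str, str] = {}
    # Visit longest sequences first so they tend to become representatives.
    for acc in sorted(lengths, key=lambda a: (-lengths[a], a)):
        for rep in reps:
            # Join only a representative this sequence is directly connected
            # to at or above the identity threshold.
            identity = edge_identity.get(frozenset((acc, rep)))
            if identity is not None and identity >= threshold:
                assignment[acc] = rep
                break
        else:
            # No qualifying representative: this sequence starts a new RepNode.
            reps.append(acc)
            assignment[acc] = acc
    return assignment


def collapse_edges(
    edge_identity: Dict[Edge, float],
    assignment: Dict[str, str],
) -> Set[Edge]:
    """Keep one edge between two RepNodes if any of their members were connected."""
    collapsed: Set[Edge] = set()
    for edge in edge_identity:
        a, b = tuple(edge)  # assumes no self-edges
        ra, rb = assignment[a], assignment[b]
        if ra != rb:
            collapsed.add(frozenset((ra, rb)))
    return collapsed
```

For example, with sequences A to D (longest first), edges A-B (92%), B-C (60%) and C-D (80%), and a 75% threshold, A and B collapse into one RepNode and C and D into another, joined by a single edge: the four-node cluster becomes a two-node cluster without splitting.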

% ID # Nodes # Edges
100 36,996 27,905,053
95 36,951 27,839,457
90 36,774 27,595,702
85 34,734 24,785,280
80 32,869 22,407,114
75 31,251 20,453,447
70 29,765 18,716,545
65 28,255 16,965,858
60 26,787 15,401,738
55 25,194 13,770,385
50 23,297 11,948,616
45 21,052 9,956,864
40 18,486 7,815,181
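The table can be combined with the RAM guide to pick a download: choose the highest % ID network whose edge count fits the available memory. The sketch below assumes an edge budget taken from the guide (for example, roughly 10M edges for 64 GB RAM) and returns the matching row, or nothing if even the 40% network is too large; `REPNODE_TABLE` and `finest_network_within` are hypothetical names.

```python
from typing import Optional, Tuple

# (% ID, nodes, edges) rows from the table, ordered from 100% down to 40%.
REPNODE_TABLE = [
    (100, 36_996, 27_905_053),
    (95, 36_951, 27_839_457),
    (90, 36_774, 27_595_702),
    (85, 34_734, 24_785_280),
    (80, 32_869, 22_407_114),
    (75, 31_251, 20_453_447),
    (70, 29_765, 18_716_545),
    (65, 28_255, 16_965_858),
    (60, 26_787, 15_401_738),
    (55, 25_194, 13_770_385),
    (50, 23_297, 11_948_616),
    (45, 21_052, 9_956_864),
    (40, 18_486, 7_815_181),
]


def finest_network_within(max_edges: int) -> Optional[Tuple[int, int, int]]:
    """Return the highest-% ID network whose edge count fits the budget, if any."""
    for row in REPNODE_TABLE:
        if row[2] <= max_edges:
            return row
    return None


print(finest_network_within(10_000_000))  # (45, 21052, 9956864)
print(finest_network_within(5_000_000))   # None
```

With these figures, a ~10M-edge budget selects the 45% network, while a ~5M-edge budget is exceeded by every network listed.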


Portions of these data are derived from the Universal Protein Resource (UniProt) databases.
