EFI - Enzyme Similarity Tool

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).

The tools are available without charge or license to both academic and commercial users.

Download Network Files

Submission Name: IP91_IPR004184_UniRef90_NoFragments

Network Name: IP91_IPR004184_UniRef90_NoFragments_Bacteria_Minlen650_AS240

The parameters used for the initial submission and the finalization are summarized in the table below.

Analysis Summary

Analysis Job Number	26589
Network Name	IP91_IPR004184_UniRef90_NoFragments_Bacteria_Minlen650_AS240
Alignment Score	240
Taxonomy Categories	Superkingdom: Bacteria
Minimum Length	650
Maximum Length	50,000
Total Number of Sequences After Filtering	5,419

Dataset Summary

EST Job Number	26123 (Original Dataset)
Database Version	UniProt: 2022-04 / InterPro: 91
Input Option	Families (Option B)
Job Name	IP91_IPR004184_UniRef90_NoFragments
E-Value for SSN Edge Calculation	5
Pfam / InterPro Family	IPR004184
Number of IDs in Pfam / InterPro Family	25,513
Domain Option	off
UniRef Version	90
Number of Cluster IDs in UniRef90 Family	6,937
Exclude Fragments	Yes
Total Number of Sequences in Dataset	6,937
Total Number of Edges	20,962,994
Number of Unique Sequences	6,937
Convergence Ratio	0.871

Please cite your use of the EFI tools:

Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735

Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018

The panels below provide downloadable files for the full and representative node SSNs, with the number of nodes and edges indicated for each. As an approximate guide, SSNs with ~2M edges can be opened with 16 GB RAM, ~5M edges with 32 GB, ~10M edges with 64 GB, ~20M edges with 128 GB, ~40M edges with 256 GB, and ~120M edges with 768 GB.
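The RAM guide above can be turned into a small lookup. The following is a minimal sketch, not part of the EFI tools: the function name `suggested_ram_gb` and the `slack` parameter (how far above a "~" tier an edge count may sit and still count as that tier) are assumptions made for illustration.

```python
# (approximate edge count, RAM in GB) pairs copied from the guide above
RAM_GUIDE = [
    (2_000_000, 16),
    (5_000_000, 32),
    (10_000_000, 64),
    (20_000_000, 128),
    (40_000_000, 256),
    (120_000_000, 768),
]


def suggested_ram_gb(num_edges, slack=0.05):
    """Return the smallest RAM tier (GB) that covers num_edges.

    slack is a hypothetical tolerance for the "~" in the guide: an edge
    count up to (1 + slack) times a tier's edge count is treated as
    belonging to that tier. Returns None when the count exceeds the
    largest tier in the guide.
    """
    for edges, ram_gb in RAM_GUIDE:
        if num_edges <= edges * (1 + slack):
            return ram_gb
    return None


if __name__ == "__main__":
    # Full network on this page: 2,021,943 edges
    print(suggested_ram_gb(2_021_943))
    # Original dataset before filtering: 20,962,994 edges
    print(suggested_ram_gb(20_962_994))
```

With `slack=0` the tiers act as hard upper bounds, which is more conservative: the full network here, at slightly over 2M edges, would then map to the 32 GB tier instead of 16 GB.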

Files may be transferred to the Genome Neighborhood Tool (GNT), the Color SSN utility, the Cluster Analysis utility, or the Neighborhood Connectivity utility.

Full Network

Each node in the network represents a single protein sequence.

# Nodes	# Edges
5,419	2,021,943

Representative Node Networks

In representative node (RepNode) networks, each node represents a collection of proteins grouped according to percent identity. For example, in a 75% identity RepNode network, all connected sequences that share 75% or more identity are grouped into a single node (meta node). Collapsing sequences in this way reduces the overall number of nodes, producing simpler networks that are easier to load in Cytoscape.

The cluster organization is not changed, and the clustering of sequences remains identical to the full network.
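The grouping idea can be illustrated with a toy example. This is only a sketch of single-linkage grouping with a union-find structure, assuming pairwise percent identities are already known; it is not the procedure the EFI tools use, and the sequence IDs and identity values below are made up.

```python
def collapse_by_identity(sequence_ids, pairwise_identity, threshold):
    """Group sequences linked by pairs with identity >= threshold.

    pairwise_identity maps (id_a, id_b) tuples to percent identity.
    Returns a sorted list of groups, each a sorted list of IDs; every
    group stands for one meta node in this toy model.
    """
    parent = {seq: seq for seq in sequence_ids}

    def find(x):
        # Follow parent links to the root, halving the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in pairwise_identity.items():
        if pct >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for seq in sequence_ids:
        groups.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical sequences and identities, for illustration only
    ids = ["A", "B", "C", "D"]
    identities = {("A", "B"): 92, ("B", "C"): 78, ("C", "D"): 60}
    print(collapse_by_identity(ids, identities, 75))
    print(collapse_by_identity(ids, identities, 90))
```

Lowering the threshold merges more sequences into each meta node, which is why the node counts in the table fall as % ID decreases.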

% ID	# Nodes	# Edges
100	5,419	2,021,943
95	5,390	1,987,868
90	5,302	1,878,667
85	4,613	1,201,837
80	3,934	717,327
75	3,395	400,907
70	2,899	202,804
65	2,499	95,145
60	2,159	38,269
55	1,938	21,069
50	1,783	14,816
45	1,685	13,036
40	1,551	11,871
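
One way to use the table above is to pick the highest % ID network that still fits an edge budget. The sketch below copies the table values; the function name `highest_identity_within` and the idea of an edge budget are assumptions for illustration, not an EFI feature.

```python
# (% ID, nodes, edges) rows copied from the table above, highest % ID first
REPNODE_NETWORKS = [
    (100, 5419, 2021943),
    (95, 5390, 1987868),
    (90, 5302, 1878667),
    (85, 4613, 1201837),
    (80, 3934, 717327),
    (75, 3395, 400907),
    (70, 2899, 202804),
    (65, 2499, 95145),
    (60, 2159, 38269),
    (55, 1938, 21069),
    (50, 1783, 14816),
    (45, 1685, 13036),
    (40, 1551, 11871),
]


def highest_identity_within(max_edges):
    """Return the (% ID, nodes, edges) row with the highest % ID whose
    edge count does not exceed max_edges, or None if no row fits."""
    for pct_id, nodes, edges in REPNODE_NETWORKS:
        if edges <= max_edges:
            return pct_id, nodes, edges
    return None


if __name__ == "__main__":
    # A hypothetical budget of one million edges
    print(highest_identity_within(1_000_000))
```

Because the rows are ordered from highest to lowest % ID and the edge counts only decrease, the first row that fits the budget is also the least collapsed network that fits.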

Portions of these data are derived from the Universal Protein Resource (UniProt) databases.