EFI - Chemically-Guided Functional Profiling

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).
The tools are available without charge or license to both academic and commercial users.
Please cite your use of the EFI tools:

Rémi Zallot, Nils Oberg, and John A. Gerlt, The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways. Biochemistry 2019 58 (41), 4169-4182. https://doi.org/10.1021/acs.biochem.9b00735

Nils Oberg, Rémi Zallot, and John A. Gerlt, EFI-EST, EFI-GNT, and EFI-CGFP: Enzyme Function Initiative (EFI) Web Resource for Genomic Enzymology Tools. J Mol Biol 2023. https://doi.org/10.1016/j.jmb.2023.168018

RadicalSAM.org, our resource for investigating sequence-function space in the radical SAM superfamily, has been updated with sequences from the UniProt Release 2024_01 and InterPro Release 98 databases (January 24, 2024).

https://radicalsam.org

Chemically guided functional profiling (CGFP) maps metagenome protein abundance to clusters in sequence similarity networks (SSNs) generated by the EFI-EST web tool.

Currently, a library of 380 metagenomes is available for analysis. The dataset originates from the Human Microbiome Project (HMP) and consists of metagenomes from healthy adult women and men from six body sites [stool, buccal mucosa (lining of cheek and mouth), supragingival plaque (tooth plaque), anterior nares (nasal cavity), tongue dorsum (surface), and posterior fornix (vagina)].

The EFI-CGFP database has been updated to use UniProt 2025_01.

Chemically-Guided Functional Profiling Overview

Experimental assignment of functions to uncharacterized enzymes in predicted pathways is expensive and time-consuming. Therefore, targets that are 'worth the effort' must be selected. Balskus, Huttenhower and their coworkers described 'chemically guided functional profiling' (CGFP). CGFP identifies SSN clusters that are abundant in metagenome datasets to prioritize targets for functional characterization.

EFI-CGFP Acceptable Input

The input for EFI-CGFP is a colored sequence similarity network (SSN). To obtain SSNs compatible with EFI-CGFP analysis, users need to be familiar with both EFI-EST (https://efi.igb.illinois.edu/efi-est/) to generate SSNs for protein families, and Cytoscape (http://www.cytoscape.org/) to visualize, analyze, and edit SSNs. Users should also be familiar with the EFI-GNT web tool (https://efi.igb.illinois.edu/efi-gnt/) that colors SSNs, and collects, analyzes, and represents genome neighborhoods for bacterial and fungal sequences in SSN clusters.

Principle of CGFP Analysis

EFI-CGFP uses the ShortBRED software package developed by Huttenhower and colleagues in two successive steps: 1) identify sequence markers that are unique to the ShortBRED families in the input SSN, where each family groups sequences that share 85% sequence identity as determined with the CD-HIT algorithm (CD-HIT 85 clusters), and 2) quantify the marker abundances in metagenome datasets and map them to the SSN clusters.
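The two steps described above can be pictured with a toy sketch. This is not the ShortBRED algorithm or its API: it assumes that a "marker" is simply a short peptide (k-mer) found in exactly one family, and that reads are already protein sequences, whereas ShortBRED works with real metagenome reads and a more careful marker selection. The function names, family names, sequences, and cluster numbers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identify_markers(families, k=5):
    """Toy 'identify' step: keep k-mers that occur in exactly one family."""
    owners = defaultdict(set)
    for family, seqs in families.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                owners[kmer].add(family)
    markers = defaultdict(set)
    for kmer, fams in owners.items():
        if len(fams) == 1:  # unique to a single family
            markers[next(iter(fams))].add(kmer)
    return dict(markers)


def quantify(markers, reads, family_to_cluster):
    """Toy 'quantify' step: count reads that hit a marker, summed per SSN cluster."""
    counts = defaultdict(int)
    for read in reads:
        for family, fam_markers in markers.items():
            if any(m in read for m in fam_markers):
                counts[family_to_cluster[family]] += 1
    return dict(counts)


# Hypothetical input: two families (stand-ins for CD-HIT 85 clusters)
# that belong to two different SSN clusters.
families = {
    "fam_a": ["MKTAYIAKQR", "MKTAYIAKQK"],
    "fam_b": ["MKTAYLLGGS"],
}
family_to_cluster = {"fam_a": 1, "fam_b": 2}
reads = ["XXKTAYIAXX", "GGAYLLGGG", "MKTAYQQQQ"]

markers = identify_markers(families)
cluster_hits = quantify(markers, reads, family_to_cluster)
print(cluster_hits)
```

In this toy input the peptide MKTAY occurs in both families, so it is not a marker, and the third read, which matches only that shared region, is not counted for either cluster. This is the reason for restricting quantification to family-specific markers.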

EFI-CGFP Output

After the "identify" step has been performed, several files are available: an SSN enhanced with the identified markers and their types as node attributes, and additional files that describe the markers and the ShortBRED families used to identify them.

After the "quantify" step has been performed, heatmaps summarizing the quantification of metagenome hits per SSN cluster are available. Several additional files are also provided: the SSN enhanced with the metagenome hits that have been identified, and quantification results reported as abundance within metagenomes, per protein and per cluster.
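As a rough illustration of how per-protein abundances relate to the per-cluster values summarized in a heatmap, the sketch below sums protein abundances into a cluster-by-metagenome table. It does not read the actual EFI-CGFP output files, whose formats are not described here. The protein identifiers, metagenome names, abundance values, and the function name are hypothetical.

```python
from collections import defaultdict


def cluster_abundance_matrix(protein_abundance, protein_to_cluster):
    """Sum per-protein abundances into a cluster x metagenome table."""
    metagenomes = sorted(
        {mg for per_mg in protein_abundance.values() for mg in per_mg}
    )
    totals = defaultdict(lambda: defaultdict(float))
    for protein, per_mg in protein_abundance.items():
        cluster = protein_to_cluster[protein]
        for mg, value in per_mg.items():
            totals[cluster][mg] += value
    clusters = sorted(totals)
    # Rows are clusters, columns are metagenomes; missing entries become 0.0.
    matrix = [[totals[c][mg] for mg in metagenomes] for c in clusters]
    return clusters, metagenomes, matrix


# Hypothetical per-protein abundances in two metagenomes.
protein_abundance = {
    "P1": {"stool_1": 2.0, "nares_1": 0.5},
    "P2": {"stool_1": 1.0},
    "P3": {"nares_1": 4.0},
}
protein_to_cluster = {"P1": 1, "P2": 1, "P3": 2}

clusters, metagenomes, matrix = cluster_abundance_matrix(
    protein_abundance, protein_to_cluster
)
for cluster, row in zip(clusters, matrix):
    print(cluster, dict(zip(metagenomes, row)))
```

Each row of the resulting table corresponds to one SSN cluster and each column to one metagenome, which is the layout a heatmap of cluster abundance would display.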

Recommended Reading

Rémi Zallot, Nils Oberg, John A. Gerlt, "Democratized" genomic enzymology web tools for functional assignment, Current Opinion in Chemical Biology, Volume 47, 2018, Pages 77-85, https://doi.org/10.1016/j.cbpa.2018.09.009

John A. Gerlt, Genomic enzymology: Web tools for leveraging protein family sequence–function space and genome context to discover novel functions, Biochemistry, 2017.

UniProt Version: 2025_01

This site uses the CGFP-ShortBRED programs (https://github.com/biobakery/shortbred and http://huttenhower.sph.harvard.edu/shortbred).

For more information on CGFP-ShortBRED, see

Levin, B. J., Huang, Y. Y., Peck, S. C., Wei, Y., Martínez-del Campo, A., Marks, J. A., Franzosa, E. A., Huttenhower, C., Balskus, E. P. A prominent glycyl radical enzyme in human gut microbiomes metabolizes trans-4-hydroxy-l-proline. Science 355, eaai8386 (2017). (DOI: 10.1126/science.aai8386)

For more information on ShortBRED, see

Kaminski J., Gibson M. K., Franzosa E. A., Segata N., Dantas G., Huttenhower C. High-specificity targeted functional profiling in microbial communities with ShortBRED. PLoS Comput Biol. 2015 Dec 18;11(12):e1004557. DOI: 10.1371/journal.pcbi.1004557

These programs use data computed by MicrobeCensus.

Nayfach, S. and Pollard, K.S. Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome. Genome Biology 2015;16(1):51.

Portions of the metagenome data used on this site come from the Human Microbiome Project.

The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207-214 (14 June 2012). DOI: 10.1038/nature11234
The Human Microbiome Project Consortium. A framework for human microbiome research. Nature 486, 215-221 (14 June 2012). DOI: 10.1038/nature11209

