EFI - Chemically-Guided Functional Profiling

This web resource is supported by a Research Resource from the National Institute of General Medical Sciences (R24GM141196-01).

The tools are available without charge or license to both academic and commercial users.

Important Notice

The UniProtKB database used by the EFI tools is undergoing a major reorganization, starting with the just-released version 2025_04 (https://www.uniprot.org/help/refprot_only_changes). When the reorganization is fully implemented (2026_02 release, Spring 2026), the number of proteins in UniProtKB will have decreased from ~253M accessions in the previous 2025_03 release to ~141M accessions in the 2026_02 release.

In response to these changes, we will provide the previous 2025_03 release until the 2026_02 release is available.

The current 2025_04 release removed 82M UniProt IDs; the UniProt pages providing functional annotation for these IDs are no longer active. A new Metadata Tool provides access to the node attribute metadata for all UniProt IDs in the 2025_03 release, which the tools continue to use during the UniProtKB reorganization. The Metadata Tool is available via the tab at the top of each page.

More information about the reorganization is located here.

Chemically guided functional profiling

Chemically guided functional profiling (CGFP) maps metagenome protein abundance to clusters in sequence similarity networks generated by the EFI-EST web tool (https://efi.igb.illinois.edu/efi-est/).

The glycyl radical enzyme (GRE) superfamily is functionally and mechanistically diverse, with many uncharacterized members. CGFP was developed to focus experimental studies on assigning novel functions to uncharacterized GRE superfamily members that are detected in the human gut microbiome [B. J. Levin*, Y. Y. Huang* et al. Science 355, eaai8386 (2017)]. More generally, CGFP provides a powerful approach to prioritizing uncharacterized members of a protein family for functional assignment based on their abundance in metagenomes.

From the CGFP Tutorial on the Balskus laboratory website (https://www.microbialchemist.com/metagenomic-profiling/):

"The human gut contains trillions of microbial inhabitants, making it one of the most densely populated environments on the planet. The symbiosis between these organisms and the human host is extremely complex, and we are only beginning to understand the impact of the gut microbiota on human biology. Knowledge of the chemical reactions performed and compounds produced by gut microbes will provide new insights into their roles in influencing human health. By studying the gene content of the human gut microbiome and the enzymes encoded by these genes, we hope to better understand the chemical capabilities of this microbial community. However, the activities of the vast majority of enzymes found in microbiomes are unknown.

We have developed a bioinformatics workflow to guide studies of genes and enzymes in microbiomes, including enzymes of unknown function. Our approach, which we call "chemically guided functional profiling", uses a molecular understanding of a large enzyme superfamily to guide the identification and quantitation of different family members in metagenomes and metatranscriptomes. To begin, a "sequence similarity network" (SSN) analysis is used to computationally divide a large number of enzyme sequences into groups that are likely to share the same activity. The quantitative metagenomics program ShortBRED can then identify short peptide markers that are unique to highly similar enzyme sequences and quantify the abundance of these markers in raw metagenomic datasets. The markers are then mapped back to clusters from the SSN to assess the abundance of individual enzymes in that metagenome. Because this approach provides information about the relative abundance of enzyme family members with both known and unknown activities, it can provide new insights about important microbial functions and it can prioritize uncharacterized enzymes for further study based on their distribution and abundance in microbial communities. We have used chemically guided functional profiling to identify members of the glycyl radical enzyme family in Human Microbiome Project sequencing datasets, and we anticipate that this approach will be readily extended to additional enzyme families and microbial communities."
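The final mapping step described in the tutorial (markers mapped back to SSN clusters to assess abundance) can be illustrated with a minimal Python sketch. This is not the ShortBRED or EFI-CGFP implementation: the function name, the marker and cluster identifiers, and the abundance values are all hypothetical, and summing marker abundances per cluster is a simplifying assumption (the actual tools may aggregate differently, for example with a mean or median).

```python
from collections import defaultdict


def cluster_abundance(marker_to_cluster, marker_abundance):
    """Sum per-marker abundance into per-cluster totals.

    marker_to_cluster: dict mapping a marker ID to its SSN cluster ID (hypothetical IDs).
    marker_abundance:  dict mapping a marker ID to its abundance in one metagenome.
    Summation is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for marker, abundance in marker_abundance.items():
        cluster = marker_to_cluster.get(marker)
        if cluster is None:
            # Marker is not assigned to any SSN cluster; skip it.
            continue
        totals[cluster] += abundance
    return dict(totals)


# Hypothetical example data: three markers belong to two clusters,
# and one quantified marker ("m4") has no cluster assignment.
marker_to_cluster = {"m1": "cluster_1", "m2": "cluster_1", "m3": "cluster_2"}
marker_abundance = {"m1": 2.0, "m2": 1.5, "m3": 0.5, "m4": 9.0}

totals = cluster_abundance(marker_to_cluster, marker_abundance)

# Rank clusters from most to least abundant to suggest a prioritization order.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Ranking clusters by their aggregated abundance mirrors the prioritization idea in the text: clusters of uncharacterized enzymes that are abundant in a metagenome become candidates for further study.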

In its original form, the CGFP pipeline described by Balskus and Huttenhower (https://github.com/biobakery/shortbred) required both knowledge of Unix command-line programs and access to a computer cluster. The EFI-CGFP web tool was developed to "democratize" CGFP for experimentalists by making it both accessible and "user friendly".

Click here to contact us for help, reporting issues, or suggestions.
