Parameter | Value |
---|---|
Input filename | 27074_26150_IP91_IPR004184_NoFragments_Proteobacteria_UniRef90_NoFragments_IPR004184_Proteobacteria_Minlen650_AS240_full_ssn_coloredssn.zip |
Identify/Quantify ID | 838/778 |
Minimum sequence length | 650 |
Identify search type | DIAMOND |
Reference database | UNIREF90 |
CD-HIT identity for ShortBRED family definition | 85 |
Quantify search type | USEARCH |
The markers that uniquely define clusters in the submitted SSN have been quantified in the metagenomes selected for analysis.
Files are provided that contain details about the markers identified in the metagenomes and their abundances.
The submitted SSN has been edited so that the markers and their abundances in the selected metagenomes are included as node attributes.
File | Size |
---|---|
SSN with quantify results (ZIP) |
The CD-HIT ShortBRED families by cluster file contains mappings of ShortBRED families to SSN cluster number as well as a color that is assigned to each unique ShortBRED family. The ShortBRED marker data file lists the markers that were identified. Finally, the Description of selected metagenomes file provides available metadata associated with the selected metagenomes.
File | Size | |
---|---|---|
CD-HIT ShortBRED families by cluster | ||
ShortBRED marker data | ||
Description of selected metagenomes | <1 MB |
By default, ShortBRED reports the abundance of metagenome hits for CD-HIT families using the "median method." The numbers of metagenome hits identified by all of the markers for a CD-HIT consensus sequence are arranged in increasing numerical order; the value for the median marker is used as the abundance. This method assumes that the distribution of hits across the markers for a CD-HIT consensus sequence is uniform (expected if the metagenome sequencing is "deep," i.e., multiple coverage). For consensus (seed) sequences with an even number of markers, the average of the two "middle" markers is used as the abundance.
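As an illustration of the median rule described above, the following is a minimal Python sketch, not ShortBRED's own code. The function name `median_abundance` and the hit counts are hypothetical; only the rule (sort the per-marker values, take the middle one, average the two middle ones when the count is even) comes from the description.

```python
from statistics import median


def median_abundance(marker_hits):
    """Median of the per-marker hit values for one CD-HIT consensus sequence.

    `marker_hits` is a hypothetical list with one value per marker.
    statistics.median sorts the values and, for an even count,
    averages the two middle values.
    """
    if not marker_hits:
        raise ValueError("at least one marker value is required")
    return median(marker_hits)


# Odd number of markers: sorted values are 4, 6, 9, so the middle one is used.
odd_result = median_abundance([4, 9, 6])

# Even number of markers: sorted values are 4, 6, 8, 9, so 6 and 8 are averaged.
even_result = median_abundance([4, 9, 6, 8])

print(odd_result, even_result)
```

With these made-up inputs, the odd case yields 6 and the even case yields 7.0.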
Files detailing the abundance information are available for download.
Raw results for the individual proteins in the SSN (Protein abundance data (median)) as well as summarized by SSN cluster (Cluster abundance data (median)) are provided. Units are in reads per kilobase of sequence per million sample reads (RPKM).
File | Size |
---|---|
Protein abundance data (median) | |
Cluster abundance data (median) |
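For readers unfamiliar with the RPKM unit, the sketch below shows the generic definition (reads per kilobase of sequence per million sample reads) in Python. It is an assumption-labeled illustration of the unit only: the function name `rpkm`, its parameters, and the example numbers are hypothetical, and the exact lengths and read counts that ShortBRED uses internally are not specified here.

```python
def rpkm(mapped_reads, sequence_length_bp, total_sample_reads):
    """Generic RPKM: reads per kilobase of sequence per million sample reads.

    All three arguments are hypothetical inputs for illustration:
    - mapped_reads: reads assigned to the sequence
    - sequence_length_bp: length of the sequence in bases
    - total_sample_reads: total number of reads in the sample
    """
    if sequence_length_bp <= 0 or total_sample_reads <= 0:
        raise ValueError("length and total reads must be positive")
    kilobases = sequence_length_bp / 1_000
    millions_of_reads = total_sample_reads / 1_000_000
    return mapped_reads / (kilobases * millions_of_reads)


# 50 reads on a 500-base sequence in a sample of 10 million reads:
# 50 / (0.5 kb * 10 million reads) = 10.0
example_rpkm = rpkm(50, 500, 10_000_000)
print(example_rpkm)
```

Normalizing by both sequence length and sample size is what allows values to be compared across markers of different lengths and metagenomes of different sequencing depth.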
Data are provided using Average Genome Size (AGS) normalization for individual proteins in the SSN as well as summarized by SSN cluster. Units have been converted from RPKM to counts per microbial genome, using AGS estimated by MicrobeCensus.
File | Size |
---|---|
Average genome size (AGS) normalized protein abundance data (median) | |
Average genome size (AGS) normalized cluster abundance data (median) |
In the mean method for reporting abundances, the average of the abundances identified by the markers for each CD-HIT consensus sequence is used to report abundance. This method reports the presence of "any" hit for a marker for a seed sequence. An asymmetric distribution of hits across the markers of a seed sequence with multiple markers is expected for "false positives," so the mean method should be used with caution.
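The caution above can be made concrete with a small Python sketch that compares the mean and the median on an asymmetric set of marker hits. This is an illustration with made-up numbers, not ShortBRED's implementation; the function name `mean_abundance` and the list `skewed_hits` are hypothetical.

```python
from statistics import mean, median


def mean_abundance(marker_hits):
    """Average of the per-marker hit values for one CD-HIT consensus sequence.

    `marker_hits` is a hypothetical list with one value per marker.
    """
    if not marker_hits:
        raise ValueError("at least one marker value is required")
    return mean(marker_hits)


# Hypothetical asymmetric case: only one of four markers has any hits,
# the pattern described above as typical of a "false positive".
skewed_hits = [0, 0, 0, 12]

mean_value = mean_abundance(skewed_hits)   # (0 + 0 + 0 + 12) / 4
median_value = median(skewed_hits)         # average of the two middle zeros

print(mean_value, median_value)
```

Here the mean reports a non-zero abundance (3) from a single marker, while the median reports 0, which is why a non-zero mean alone is weak evidence that the sequence is present.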
Files detailing the abundance information are available for download.
Raw results for the individual proteins in the SSN (Protein abundance data (mean)) as well as summarized by SSN cluster (Cluster abundance data (mean)) are provided. Units are in reads per kilobase of sequence per million sample reads (RPKM).
File | Size |
---|---|
Protein abundance data (mean) | |
Cluster abundance data (mean) |
Data are provided using Average Genome Size (AGS) normalization for individual proteins in the SSN as well as summarized by SSN cluster. Units have been converted from RPKM to counts per microbial genome, using AGS estimated by MicrobeCensus.
File | Size |
---|---|
Average genome size (AGS) normalized protein abundance data (mean) | |
Average genome size (AGS) normalized cluster abundance data (mean) |
Heatmaps representing the quantification of sequences from SSN clusters per metagenome are available.
The y-axis lists the SSN cluster numbers for which metagenome hits were identified; the x-axis lists the metagenome datasets selected on the Identify Results page. A color scale on the right displays the AGS-normalized abundance, i.e., the number of gene copies for the "hit" per microbial genome in the metagenome sample.
The metagenomes are grouped according to body site so that trends/consensus across the six body sites can be easily discerned. The default heatmap is calculated using the median method to report abundances.
Tools for downloading and manipulating the heatmap can be accessed by hovering and clicking above and to the right of the plot.
Several filters are available for manipulating the heatmap.