EFI-EST and Cytoscape Tutorials

Node Attributes

A major advantage of sequence similarity networks is the ability to include pertinent information for each individual protein (such as species, annotation, length, PDB deposition, etc.). This information is stored as “Node Attributes”, which are searchable and sortable in the data panel when a sequence similarity network is displayed in Cytoscape (see Figure 3 here).

If you right-click (Control+click on a Mac) any node in a network open in Cytoscape, a sub-menu appears that lets you carry out node-specific actions or follow external links via LinkOut, for example to UniProtKB.

Note that the EFI-EST web server populates node attribute fields from data available in the UniProtKB database. Therefore, only information that is stored in UniProtKB is included in EFI-EST networks. To load and map your own node attributes, click here to view a tutorial. Adding your own node attributes is useful for mapping annotations or other information that you have at hand but that is not available in UniProtKB.
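As a rough illustration of preparing your own node attributes, the sketch below builds a tab-delimited table with one row per protein, keyed by UniProt accession so that it can be matched against the ACC (or name) attribute of a full network. The accessions, the column names My_Activity and My_Kcat, and the helper build_attribute_table are all hypothetical placeholders; the exact import steps in Cytoscape are covered by the linked tutorial, not by this sketch.

```python
import csv
import io

# Hypothetical custom annotations keyed by UniProt accession.
# The accessions and column names are placeholders, not real data.
custom_annotations = {
    "ACC_EXAMPLE_1": {"My_Activity": "confirmed", "My_Kcat": "12.5"},
    "ACC_EXAMPLE_2": {"My_Activity": "untested"},
}


def build_attribute_table(annotations, key_column="ACC"):
    """Return tab-delimited text with one row per accession.

    key_column is the header of the column that is matched against
    an existing node attribute when the table is imported.
    """
    # Collect every custom column name so all rows share one header.
    columns = sorted({col for attrs in annotations.values() for col in attrs})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([key_column] + columns)
    for accession in sorted(annotations):
        attrs = annotations[accession]
        # Missing values are written as empty cells.
        writer.writerow([accession] + [attrs.get(col, "") for col in columns])
    return buffer.getvalue()


table_text = build_attribute_table(custom_annotations)
print(table_text)
```

Saving table_text to a file gives a table in which each custom column can become a new node attribute, provided the key column matches the accessions already present in the network.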

An introduction to using Cytoscape can be found here.

Rep Node Network Node Attributes

ACC (variable, list) – UniProt accession(s) for the protein(s)
CAZY (variable, list) – CAZy family name(s) for the protein(s)
CLASS (variable, list) – Phylogenetic class(es) of the organism(s)
Cluster Size (variable) – number of proteins in the rep node
Description (variable, list) – protein name(s)/annotation(s) in UniProtKB
Domain (variable, list) – domain of life to which the organism(s) belong(s)
EC (variable, list) – the EC number(s) for the protein(s)
EFI_ID (6 digit, starting with 5, list) – target ID(s) for the protein(s) from EFI-DB
FAMILY (variable, list) – Phylogenetic family(ies) of the organism(s)
GDNA (true or false, list) – availability of gDNA(s) from the AECOM Protein Core
GENUS (variable, list) – Phylogenetic genus (genera) of the organism(s)
GI (variable, list) – GI numbers mapped to the protein(s)
GN (variable, list) – gene name(s) for the protein(s)
GO (variable, list) – Gene Ontology classification(s) for the protein(s)
HMP_Body_Site (body site, list) – if human microbiome species, the location(s) of the species in/on the body
HMP_Oxygen (oxygen requirement, list) – if human microbiome species, the oxygen requirement(s)
IPRO (variable, list) – InterPro family(ies) into which the protein(s) has been classified
name (variable, list) – UniProt accession for the longest sequence in the rep node
ORDER (variable, list) – Phylogenetic order of the organism(s)
Organism (variable, list) – genus (genera) and species of the organism(s)
PDB (4 character, list) – deposition code(s) for structures deposited in the Protein Data Bank
PFAM (variable, list) – Pfam family(ies) into which the protein(s) has(have) been classified
PHYLUM (variable, list) – Phylogenetic phylum(a) of the organism(s)
Sequence_Length (variable, list) – number(s) of amino acid residues in the protein(s)
Shared name – UniProt accession for the longest sequence in the rep node
SPECIES (variable, list) – Phylogenetic species of the organism(s)
STATUS (unreviewed or reviewed, list) – indicates if the annotation(s) were generated automatically and are found in TrEMBL (unreviewed) or manually annotated and are found in Swiss-Prot (reviewed)
Taxonomy_ID (variable, list) – NCBI taxonomic identifier(s) for the organism(s)
Uniprot_ID (variable, list) – UniProt ID(s) for the protein(s)
Swiss-Prot reviewed entries (variable, list) – protein name/annotation in UniProtKB for Swiss-Prot reviewed entries
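Because most rep node attributes are lists (one entry per protein in the rep node), working with them outside Cytoscape usually means splitting each cell into its items first. The following is a minimal sketch, assuming the node table has been exported to CSV and that list items are joined with a "|" character; that separator, the sample rows, and the helper names are assumptions made for illustration. It reports which rep nodes contain at least one reviewed (Swiss-Prot) entry, using the STATUS and Cluster Size attributes described above.

```python
import csv
import io

# Assumption: list-valued cells are joined with "|" in the exported CSV.
# Change LIST_SEP if your export uses a different separator.
LIST_SEP = "|"

# Hypothetical excerpt of an exported rep node table (placeholder values).
exported = """name,Cluster Size,ACC,STATUS
ACC_A,3,ACC_A|ACC_B|ACC_C,unreviewed|reviewed|unreviewed
ACC_D,1,ACC_D,unreviewed
"""


def split_list(cell):
    """Split a list-valued cell into its non-empty items."""
    return [item.strip() for item in cell.split(LIST_SEP) if item.strip()]


def rep_nodes_with_reviewed(csv_text):
    """Return (name, cluster size, reviewed count) for rep nodes
    that contain at least one reviewed entry."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Compare case-insensitively in case the export capitalizes values.
        statuses = [s.lower() for s in split_list(row["STATUS"])]
        reviewed = statuses.count("reviewed")
        if reviewed > 0:
            hits.append((row["name"], int(row["Cluster Size"]), reviewed))
    return hits


reviewed_nodes = rep_nodes_with_reviewed(exported)
print(reviewed_nodes)
```

The same split_list step applies to any other list attribute in a rep node network, such as PFAM, EC, or PDB.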

Full Network Node Attributes

ACC (variable) – UniProt accession for the protein
CAZY (variable) – CAZy family name(s) for the protein
CLASS (variable) – Phylogenetic class of the organism
Description (variable) – protein name (annotation) in UniProtKB
Domain (variable) – domain of life to which the organism belongs
EC (variable) – the EC number for the protein
EFI_ID (6 digit, starting with 5) – target ID from EFI-DB
FAMILY (variable) – Phylogenetic family of the organism
GDNA (true or false) – availability of gDNA from the AECOM Protein Core
GENUS (variable) – Phylogenetic genus of the organism
GI (variable) – GI numbers mapped to the protein
GN (variable) – gene name for the protein
GO (variable) – Gene Ontology classification for the protein
HMP_Body_Site (body site) – if a human microbiome species, the location of the species in/on the body
HMP_Oxygen (oxygen requirement) – if a human microbiome species, the oxygen requirement
IPRO (variable) – InterPro family(ies) into which the protein has been classified
name (variable) – UniProt accession
ORDER (variable) – Phylogenetic order of the organism
Organism (variable) – organism genus and species
PDB (4 character) – deposition code(s) for structures deposited in the Protein Data Bank
PFAM (variable) – Pfam family(ies) into which the protein has been classified
PHYLUM (variable) – Phylogenetic phylum of the organism
SEQ (variable) – amino acid sequence of the protein
Sequence_Length (variable) – number of amino acid residues in the protein
Shared name – UniProt accession for the protein
SPECIES (variable) – Phylogenetic species of the organism
STATUS (unreviewed or reviewed) – indicates if the annotation was generated automatically and was found in TrEMBL (unreviewed) or manually annotated and found in Swiss-Prot (reviewed)
Taxonomy_ID (variable) – NCBI taxonomic identifier for the organism
Uniprot_ID (variable) – UniProt ID for the protein
Swiss-Prot reviewed entries (variable) – protein name/annotation in UniProtKB for Swiss-Prot reviewed entries
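In a full network every node is a single protein, so each attribute holds one value and the SEQ attribute is available. As one possible use, the sketch below converts an exported node table into FASTA text, using the name attribute (the UniProt accession) as the header and SEQ as the sequence. The CSV layout, the sample rows, and the function to_fasta are hypothetical; the sequences shown are made-up placeholders.

```python
import csv
import io

# Hypothetical excerpt of an exported full network node table.
# The accessions and sequences are placeholders, not real proteins.
exported = """name,Sequence_Length,SEQ
ACC_X,12,MKTAYIAKQRQI
ACC_Y,5,MSEQN
"""


def to_fasta(csv_text, min_length=0, width=60):
    """Build FASTA text from the name and SEQ columns.

    Sequences shorter than min_length are skipped, and each
    sequence is wrapped to at most `width` residues per line.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        seq = row["SEQ"].strip()
        if not seq or len(seq) < min_length:
            continue
        # Slice the sequence into fixed-width lines.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + row["name"] + "\n" + "\n".join(lines) + "\n")
    return "".join(records)


fasta_text = to_fasta(exported, min_length=10, width=5)
print(fasta_text)
```

Filtering on the length of SEQ here mirrors what the Sequence_Length attribute reports, so the same cutoff can be checked against that column in the data panel.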

 
