New node attributes, TIGRFAMs and AlphaFold, have been added to SSNs.
The ENA database that is used by the EFI-GNT was last updated in June 2022.
The EST database now uses UniProt release 2022_02 and InterPro 89. UniProt release 2022_02 includes a total of 231,921,744 entries: 231,354,261 in TrEMBL and 567,483 in SwissProt.
As the UniProt database increases in size, selecting sequences from specific taxonomic categories may be useful for generating SSNs. Use the "Filter by Taxonomy" option to specify taxonomic categories in the input sequences. The "Taxonomy" tool (top of page) provides a preview of the taxonomic distribution of user-specified sequences.
Also, the upper limit on the number of sequences from the UniProt and UniRef90 databases for EFI-EST has been increased from 25,000 to 50,000 to allow the generation of SSNs for larger families. Visualization of these with Cytoscape will require more RAM, but access to increased amounts of sequence-function space may be useful. The time required to generate an SSN increases with the square of the number of sequences.
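The quadratic growth noted above can be illustrated with a rough sketch. It assumes that runtime is dominated by the all-by-all sequence comparison, so the cost is proportional to the number of sequence pairs. The function name `pairwise_comparisons` is illustrative and is not part of EFI-EST.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Number of unordered sequence pairs in an all-by-all comparison."""
    return n_sequences * (n_sequences - 1) // 2


old_limit = 25_000  # previous EFI-EST sequence limit
new_limit = 50_000  # current EFI-EST sequence limit

old_pairs = pairwise_comparisons(old_limit)
new_pairs = pairwise_comparisons(new_limit)

# Assumption: runtime scales with the number of pairs compared.
ratio = new_pairs / old_pairs

print(f"{old_limit:,} sequences -> {old_pairs:,} pairs")
print(f"{new_limit:,} sequences -> {new_pairs:,} pairs")
print(f"Relative cost: about {ratio:.1f}x")
```

Under this assumption, doubling the number of sequences roughly quadruples the work, so a job at the new limit can be expected to take about four times as long as one at the old limit.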
The Color SSNs and Cluster Analysis tabs are now included on the SSN Utilities tab.
Neighborhood Connectivity (NC) is a new tool on the SSN Utilities tab. NC colors the input SSN according to the number of internode connections. NC coloring helps identify families in SSNs generated with low alignment scores.
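As a minimal sketch of the idea, the snippet below computes a per-node value from an edge list. It assumes the definition commonly used in network analysis, where a node's neighborhood connectivity is the mean number of connections of its neighbors; the exact calculation used by the EFI tool is not stated here, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def neighborhood_connectivity(edges):
    """Return {node: mean degree of its neighbors} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Degree = number of distinct nodes each node is connected to.
    degree = {node: len(adj) for node, adj in neighbors.items()}

    return {
        node: sum(degree[m] for m in adj) / len(adj)
        for node, adj in neighbors.items()
    }


# Toy network: a tight triangle (A, B, C) joined to a short chain (C-D-E).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nc = neighborhood_connectivity(toy_edges)

for node in sorted(nc):
    print(node, nc[node])
```

Nodes inside densely connected regions receive higher values than nodes on sparse chains, which is why coloring by such a value can make family boundaries visible even when a low alignment score merges families into one cluster.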
The ENA database that is used by the EFI-GNT was last updated in April 2020.
The EST database now uses UniProt release 2020_04 and InterPro 81. UniProt release 2020_04 includes a total of 189,525,031 entries: 188,961,949 in TrEMBL and 563,082 in SwissProt.
The new Cluster Analysis utility computes MSAs, WebLogos, consensus residues, HMMs, and length histograms from a provided SSN. The same features available in the Color SSNs utility are also included.
The Families and Accession IDs tools now include a new domain option for selecting the regions N- or C-terminal to the provided domain.
The EST database now uses UniProt release 2020_02 and InterPro 79. UniProt release 2020_02 includes a total of 181,252,700 entries: 180,690,447 in TrEMBL and 562,253 in SwissProt.
An option to exclude UniProt-defined fragments has been added to the web tool. Additionally, BLAST jobs can now search the UniRef50 and UniRef90 databases in addition to the UniProt database.
The EST database now uses UniProt release 2019_10 and InterPro 77. UniProt release 2019_10 includes a total of 182,349,356 entries: 181,787,788 in TrEMBL and 561,568 in SwissProt.
The EST database now uses UniProt release 2019_08 and InterPro 76. UniProt release 2019_08 includes a total of 172,062,311 entries: 171,501,488 in TrEMBL and 560,823 in SwissProt.
The EST database now uses UniProt release 2019_04 and InterPro 74. UniProt release 2019_04 includes a total of 158,817,814 entries: 158,257,522 in TrEMBL and 560,292 in SwissProt.
The EFI-EST web tool has been enhanced with a tab-style interface. Contents of the pages are now grouped into logical tabs and data files can be downloaded without scrolling. Job history is now clearer, and color SSN jobs are assigned a unique color.
The EST database now uses UniProt release 2019_01 and InterPro 72. UniProt release 2019_01 includes a total of 140,253,338 entries: 139,694,261 in TrEMBL and 559,077 in SwissProt.
The EST database now uses UniProt release 2018_10 and InterPro 71. UniProt release 2018_10 includes a total of 134,066,004 entries: 133,507,323 in TrEMBL and 558,681 in SwissProt.
Users of the EST can now create GNNs and color SSNs directly from the EST network files download page.
To speed up computation for large families, UniRef50 or UniRef90 can be used to reduce the number of sequences used to generate SSNs. For example, family PF05544 contains 10,914 sequences, but if the user instructs the tool to use UniRef90, 4,198 UniRef90 cluster ID sequences representing the entire set of sequences at 90% sequence identity† will be used instead. Likewise, if UniRef50 is used, 552 cluster ID sequences that represent the set of UniRef90 seed sequences at 50% sequence identity† will be used in the computations. The sequences represented by a cluster ID sequence are listed as a node attribute in the SSN. To use UniRef, the user must select the "Use UniRef50/UniRef90 cluster ID sequences instead of the full family" option.
† In addition to sharing a specific percent sequence identity to the longest sequence in a cluster of sequences, UniRef seed sequences must also share 80% overlap. The UniRef page at UniProt further discusses UniRef50 and UniRef90.
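The PF05544 figures above can be turned into a rough estimate of the savings. The sketch below assumes the quadratic relationship between sequence count and computation time described elsewhere in these notes; the resulting speedups are approximate, not measured timings, and the helper name `reduction_summary` is illustrative.

```python
full_family = 10_914   # all sequences in PF05544
uniref90_ids = 4_198   # UniRef90 cluster ID sequences
uniref50_ids = 552     # UniRef50 cluster ID sequences


def reduction_summary(full: int, reduced: int):
    """Return (fold reduction in sequences, estimated fold reduction in time)."""
    fold = full / reduced
    # Assumption: computation time scales with the square of the sequence count.
    return fold, fold ** 2


fold90, speedup90 = reduction_summary(full_family, uniref90_ids)
fold50, speedup50 = reduction_summary(full_family, uniref50_ids)

print(f"UniRef90: {fold90:.1f}x fewer sequences, ~{speedup90:.0f}x less time")
print(f"UniRef50: {fold50:.1f}x fewer sequences, ~{speedup50:.0f}x less time")
```

For this family, UniRef90 cuts the sequence count by a factor of about 2.6 and UniRef50 by a factor of about 20, so the estimated time savings under the quadratic assumption are considerably larger than the reduction in sequences alone.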
The EST database now uses UniProt release 2018_06 and InterPro 69. UniProt release 2018_06 includes a total of 116,587,823 entries: 116,030,110 in TrEMBL and 557,713 in SwissProt.
This database includes 16,712 Pfam families, 34,358 InterPro families, and 604 Pfam clans. Tables of family sizes are available here.
The EST database now uses UniProt release 2018_04 and InterPro 68. UniProt release 2018_04 includes a total of 115,316,915 entries: 114,759,640 in TrEMBL and 557,275 in SwissProt.
This database includes 16,712 Pfam families, 33,947 InterPro families, and 604 Pfam clans. Tables of family sizes are available here.
The EST database now uses UniProt release 2018_02 and InterPro 67. UniProt release 2018_02 includes a total of 109,414,541 entries: 108,857,716 in TrEMBL and 556,825 in SwissProt.
This database includes 16,712 Pfam families, 33,707 InterPro families, and 604 Pfam clans. Tables of family sizes are available here.
The EFI-EST and EFI-GNT tools were updated with the following changes:
The EST database now uses UniProt release 2017_11 and InterPro 66. UniProt release 2017_11 includes a total of 99,261,416 entries: 98,705,220 in TrEMBL and 556,196 in SwissProt.
This database includes 16,712 Pfam families, 32,568 InterPro families, and 604 Pfam clans. Lists of the families/clans, along with the number of sequences in each (full and UniRef90), can be accessed with the link below. The reductions in the number of sequences when using UniRef90 cluster ID sequences are provided; the time required for the BLAST is decreased by the square of this reduction. Use of UniRef90 cluster ID sequences also allows SSNs to be generated for larger families/clans (305,000 sequence limit). Tables of family sizes are available here.
Support for Pfam clans has now been added to the Families option. Pfam clans are collections of multiple Pfam families that define superfamilies. The sequences in the families in a clan are not mutually exclusive. A list of the families in each clan is available here. Pfam clans can also be specified in the FASTA and Accession IDs options as supplementary sequences.