The EFI-EST web tool has been enhanced with a tab-style interface. Contents of the pages are now grouped into logical tabs and data files can be downloaded without scrolling. Job history is now clearer, and color SSN jobs are assigned a unique color.
The EST database now uses UniProt release 2019_01 and InterPro 72. UniProt release 2019_01 includes a total of 140,253,338 entries: 139,694,261 in TrEMBL and 559,077 in SwissProt.
The EST database now uses UniProt release 2018_10 and InterPro 71. UniProt release 2018_10 includes a total of 134,066,004 entries: 133,507,323 in TrEMBL and 558,681 in SwissProt.
Users of the EST can now create GNNs and color SSNs directly from the EST network files download page.
To speed up computation for large families, UniRef50 or UniRef90 can be used to reduce the number of sequences used to generate the SSNs. As an example, family PF05544 contains 10,914 sequences, but if the user instructs the tool to use UniRef90, 4,198 UniRef90 cluster ID sequences, which represent the entire set of sequences at 90% sequence identity†, will be used instead. Likewise, if UniRef50 is used, 552 cluster ID sequences, which represent the set of UniRef90 seed sequences at 50% sequence identity†, will be used in the computations. The sequences represented by a cluster ID sequence are listed as a node attribute in the SSN. To use UniRef, the user must select the "Use UniRef 50/UniRef 90 cluster ID sequences instead of the full family" option.
† In addition to sharing a specific percent sequence identity to the longest sequence in a cluster of sequences, UniRef seed sequences must also share 80% overlap. The UniRef page at UniProt further discusses UniRef50 and UniRef90.
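The size of the reduction in the PF05544 example can be worked out with a few lines of Python. This is only an illustrative sketch: the function name `reduction_factor` and the dictionary layout are hypothetical and not part of the EFI-EST tool; the sequence counts are the ones quoted above.

```python
def reduction_factor(full_count: int, reduced_count: int) -> float:
    """Return how many times smaller the reduced set is than the full set."""
    if reduced_count <= 0 or reduced_count > full_count:
        raise ValueError("reduced_count must be between 1 and full_count")
    return full_count / reduced_count


# Counts for family PF05544 as quoted in the release note.
PF05544 = {"full": 10_914, "UniRef90": 4_198, "UniRef50": 552}

for level in ("UniRef90", "UniRef50"):
    factor = reduction_factor(PF05544["full"], PF05544[level])
    print(f"{level}: {PF05544[level]:,} sequences, about {factor:.2f}x fewer than the full family")
```

For this family, UniRef90 gives roughly a 2.6-fold reduction and UniRef50 roughly a 20-fold reduction in the number of sequences submitted to the computation.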
The EST database now uses UniProt release 2018_06 and InterPro 69. UniProt release 2018_06 includes a total of 116,587,823 entries: 116,030,110 in TrEMBL and 557,713 in SwissProt.
This database includes 16,712 Pfam families, 34,358 InterPro families, and 604 Pfam clans. Tables of family sizes are available here.
The EST database now uses UniProt release 2018_04 and InterPro 68. UniProt release 2018_04 includes a total of 115,316,915 entries: 114,759,640 in TrEMBL and 557,275 in SwissProt.
This database includes 16,712 Pfam families, 33,947 InterPro families, and 604 Pfam clans. Tables of family sizes are available here.
The EST database now uses UniProt release 2018_02 and InterPro 67. UniProt release 2018_02 includes a total of 109,414,541 entries: 108,857,716 in TrEMBL and 556,825 in SwissProt.
This database includes 16,712 Pfam families, 33,707 InterPro families, and 604 Pfam clans. Tables of family sizes are available here.
The EFI-EST and EFI-GNT tools were updated with the following changes:
The EST database now uses UniProt release 2017_11 and InterPro 66. UniProt release 2017_11 includes a total of 99,261,416 entries: 98,705,220 in TrEMBL and 556,196 in SwissProt.
This database includes 16,712 Pfam families, 32,568 InterPro families, and 604 Pfam clans. Lists of the families/clans, along with the number of sequences in each (full and UniRef90), can be accessed with the link below. The reductions in the number of sequences when using UniRef90 cluster ID sequences are provided; the time required for the BLAST is decreased by the square of this reduction. Use of UniRef90 cluster ID sequences also allows SSNs to be generated for larger families/clans (305,000 sequence limit). Tables of family sizes are available here.
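The effect of the UniRef90 reduction on BLAST workload can be estimated with a rough sketch. It assumes the BLAST step compares every pair of sequences, so the number of comparisons grows approximately with the square of the sequence count; actual run times will depend on the server and the sequences. The function names are hypothetical, the 305,000 limit is taken from the note above and is assumed here to be inclusive, and the example counts are those quoted for PF05544.

```python
SEQUENCE_LIMIT = 305_000  # limit quoted in the release note; treated as inclusive (assumption)


def pairwise_comparisons(n: int) -> int:
    """Number of distinct sequence pairs in an all-by-all comparison of n sequences."""
    return n * (n - 1) // 2


def estimated_speedup(full_count: int, reduced_count: int) -> float:
    """Ratio of pairwise comparisons before and after the reduction."""
    return pairwise_comparisons(full_count) / pairwise_comparisons(reduced_count)


def within_limit(n: int, limit: int = SEQUENCE_LIMIT) -> bool:
    """True if a set of n sequences does not exceed the sequence limit."""
    return n <= limit


full, uniref90 = 10_914, 4_198  # PF05544 counts quoted earlier in these notes
print(f"Full family within limit: {within_limit(full)}")
print(f"Estimated reduction in comparisons: {estimated_speedup(full, uniref90):.2f}x")
```

Under this assumption, a 2.6-fold reduction in sequences corresponds to roughly a 6.8-fold reduction in pairwise comparisons, which is why the saving grows quickly for large families.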
Support for Pfam clans has now been added to the Families option. Pfam clans are collections of multiple Pfam families that define superfamilies. The sequences in the families in a clan are not mutually exclusive. A list of the families in each clan is available here. Pfam clans can also be specified in the FASTA and Accession IDs options as supplementary sequences.