PlanTAP Family Viewer Documentation

PlanTAP Family Viewer documentation
PlanTAP Family Members Viewer documentation

PlanTAP Family Viewer documentation

Filters
accession_number
category
citation(s)
consensus domain(s)
domain structure
homology reduction
in family clusters
in_tree?
is a?
last modified
main_family
member species
number of clusters
number of members
number of members in trees
number of non-redundant members
number of queries
redundancy removal
redundant?
sub_family
taxonomic profile
user_contributed_trees

Filters The family member list can be filtered using the criteria described below. In addition, the information displayed for each family member can be switched between full textual output and a graphical domain structure by ticking the "graphical view of the family members' domain structures" checkbox. Multiple filters are additive: if you modify several filters, all of them are applied to the family member list once you click "Filter member list". To avoid unexpected results after modifying filter parameters, use the "Reset" button to restore the filter defaults. Note that "Reset" only resets the selected filter parameters, not the family members displayed below. To display the full list of family members after a reset, click "Filter member list" again.
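The additive behaviour of the filters can be pictured with a minimal sketch. It assumes each member is a plain dictionary and each active filter is a predicate function; the field names (in_tree, redundant) mirror the filter names on this page, but the data layout is hypothetical and not the actual PlanTAPDB implementation.

```python
def apply_filters(members, filters):
    """Return the members that pass every active filter (logical AND)."""
    return [m for m in members if all(pred(m) for pred in filters)]


# Hypothetical member records; real entries carry many more fields.
members = [
    {"name": "seq1", "in_tree": True, "redundant": False},
    {"name": "seq2", "in_tree": True, "redundant": True},
    {"name": "seq3", "in_tree": False, "redundant": False},
]

# Two active filters: only members in a tree that are not redundant.
active_filters = [
    lambda m: m["in_tree"],
    lambda m: not m["redundant"],
]

filtered = apply_filters(members, active_filters)
```

With an empty filter list every member passes, which corresponds to showing the full member list.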
accession_number Each PlanTAP entry has a distinct accession number: a 5-character string composed of a leading two-letter category code and a trailing unique 3-digit number, matching the pattern (TF|TR|PT)([0-9]{3}).
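As an illustration, the accession pattern can be checked with a few lines of Python. The function name parse_accession is hypothetical, and requiring the whole string to match (rather than a substring) is an assumption.

```python
import re

# Pattern taken from the accession_number description; the use of
# fullmatch below is an assumption so that longer strings are rejected.
ACCESSION_RE = re.compile(r"(TF|TR|PT)([0-9]{3})")


def parse_accession(accession):
    """Return (category, number) for a well-formed accession, else None."""
    match = ACCESSION_RE.fullmatch(accession)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

For example, parse_accession("TF001") yields the category code "TF" and the number 1, while a string with an unknown prefix or a fourth digit is rejected.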
category Each PlanTAP family entry belongs to one of the following categories:
  1. DNA-binding transcription factors (TF), which directly activate or repress transcription of target genes upon binding to the promoter or upstream enhancer / silencer elements
  2. Transcriptional regulators (TR), comprising general transcription initiation factors (interacting with RNA polymerase II and/or core promoter elements and recruiting components of the basal transcription machinery), co-activators / -repressors (binding to and influencing the activity of TFs) and chromatin remodelling factors (affecting the accessibility of DNA through histone modifications and DNA methylation)
  3. Putative TAP (PT) with unknown function and/or domains that are possibly associated with transcriptional regulation
citation(s) Related literature references describing the PlanTAP entry. Follow the hyperlink to view the corresponding PubMed entry.
consensus domain(s) In the manual annotation process, matching InterPro domains were condensed into a set of consensus domains common to the majority of members. The entries are hyperlinked directly to the corresponding InterPro entry. If your browser supports mouse-over information, use this to display additional information.
domain structure Display a graphical view of the members' domain structures instead of the default full textual view. The images are scaled in relation to the longest displayed member sequence. CAUTION: depending on the number of members to be displayed, this may take a while!
homology reduction Phylogenetic inference for large clusters is computationally costly, and huge trees are difficult to interpret. A total of 102 clusters had more than 150 members; these were condensed via stepwise homology reduction until the threshold of 150 members was reached. Homology reduction was implemented in the same program as redundancy removal but follows a different strategy. Beginning with 1 substitution per 100 aa and heuristically increasing this distance threshold, the distance matrix is scanned iteratively for sequence pairs with the respective distance, regardless of their species. The iteration stops when the number of remaining representative cluster members reaches the given limit (150 sequences).
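The iteration can be sketched roughly as follows. This is not the original Perl implementation: the fixed step size, the "distance at or below the threshold" comparison, and the rule of keeping the lexically smaller identifier of each pair are all assumptions made for illustration, since the heuristic itself is not specified here.

```python
def homology_reduce(distances, members, limit=150, start=1.0, step=1.0):
    """Collapse close sequence pairs until at most `limit` members remain.

    distances: dict mapping frozenset({id_a, id_b}) to substitutions per 100 aa
    members:   iterable of sequence identifiers
    Returns (remaining, collapsed_into), where collapsed_into maps each
    removed member to the member it was collapsed into.
    """
    remaining = set(members)
    collapsed_into = {}
    threshold = start
    max_distance = max(distances.values(), default=0.0)
    # Visit pairs from closest to most distant.
    ordered_pairs = sorted(distances.items(), key=lambda item: item[1])
    while len(remaining) > limit and threshold <= max_distance + step:
        for pair, dist in ordered_pairs:
            if len(remaining) <= limit or dist > threshold:
                break
            a, b = sorted(pair)
            if a in remaining and b in remaining:
                remaining.discard(b)   # assumption: keep the lexically smaller id
                collapsed_into[b] = a
        threshold += step              # assumption: fixed increment
    return remaining, collapsed_into
```

Species information is deliberately absent from the function signature, reflecting that pairs are collapsed regardless of species in this step.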
in family clusters Display only entries from specific PlanTAP family clusters.
in_tree? Filter the member list using the "in tree" property described under "in tree" in the family members viewer documentation section.
is a? Filter the member list using the "is a" property described under "is a" in the family members viewer documentation section.
last modified Timestamp of the last modification of the entry.
main_family Each PlanTAP entry was annotated to belong to an existing or new family of TAPs.
member species Filter the member list to show only entries having exactly the same taxonomy string (NCBI Taxonomy full lineage, represented by a single NCBI taxon ID). Each binomial species name stands for a full lineage string, e.g. Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; Liliopsida; commelinids; Poales; Poaceae; BEP clade; Ehrhartoideae; Oryzeae; Oryza; Oryza sativa (japonica cultivar-group). Because of the way sequences are submitted to databases such as GenBank, another entry may exist with the same binomial name but a slightly divergent lineage. Such cases appear as two separate entries in the filter list. Since multiple selections are possible, this should not be a problem.
number of clusters Total number of clusters describing the PlanTAP family (= number of trees for the family). Multiple clusters either depict the TAP family from different taxonomic perspectives (e.g. restricted to the plant lineage vs. covering all kingdoms) or comprise different subfamilies. Because large TAP gene families are substantially divergent beyond their conserved domains, it appears more reasonable to deduce phylogenies from subgroups in order to utilize as much homologous sequence information as possible.
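To illustrate how exact lineage matching behaves, here is a small sketch. It assumes lineage strings are semicolon-separated as in the example above and that each member is a dictionary with a "lineage" key; both function names and the record layout are hypothetical.

```python
def species_name(lineage):
    """Return the last element of a semicolon-separated lineage string."""
    return lineage.split(";")[-1].strip()


def filter_by_lineage(members, selected_lineages):
    """Keep members whose lineage exactly matches one of the selected strings."""
    selected = set(selected_lineages)
    return [m for m in members if m["lineage"] in selected]


# Two hypothetical lineages that end in the same species name but diverge
# in an intermediate rank, so they count as two separate filter entries.
lineage_a = "Eukaryota; Viridiplantae; Poaceae; BEP clade; Oryza; Oryza sativa"
lineage_b = "Eukaryota; Viridiplantae; Poaceae; Oryza; Oryza sativa"
```

Selecting both lineage_a and lineage_b in the filter returns the members of either entry, which is why the duplicate species names are harmless in practice.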
number of members Total number of family member sequences.
number of members in trees Total number of member sequences after homology reduction.
number of non-redundant members Total number of family member sequences after the redundancy removal. See redundancy removal for more details.
number of queries Total number of query sequences in the PlanTAP family.
redundancy removal Using both a large multi-species database such as UniProt and the individual full-genome protein predictions greatly improves taxon sampling, but it also leads to identical protein sequences being detected in these overlapping databases. In addition, the same locus is often represented by more than one protein sequence due to divergent predicted gene models, splice variants, and sequencing and annotation errors. To cope with this problem, redundant copies of genes were eliminated prior to all functional analyses, using an identity cutoff of 99% for sequences of the same species. To remove redundant sequences, a multiple sequence alignment was performed using MAFFT FFT-NS-2, and pairwise distances were calculated using the EMBOSS distmat program. The resulting matrix was scanned for sequence pairs from the same species with a distance within the threshold of 1 substitution per 100 aa (corresponding to the 99% identity cutoff). For each pair, one representative was selected based on the originating database (UniProt sequences were preferred), sequence length and lexical sort order of the accession number. The procedure was implemented in Perl using several Bioperl modules, including a modified version of the Bio::Tools::Run::Alignment::MAFFT module. For parsing the distmat distance matrices, an object-oriented Bioperl module (Bio::Matrix::IO::distmat) was written.
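The representative selection can be sketched in Python as below. This is an illustration rather than the original Perl code: preferring the longer sequence, treating a distance of exactly 1 as redundant, and skipping members that are already marked redundant are assumptions, and the record fields and database labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    accession: str
    species: str
    database: str
    length: int


def representative_key(member):
    # UniProt first, then longer sequences (assumption), then accession order.
    return (member.database != "UniProt", -member.length, member.accession)


def remove_redundancy(members, distances, cutoff=1.0):
    """Map each redundant accession to the accession of its representative.

    distances: dict mapping (accession_a, accession_b) to substitutions per 100 aa
    """
    by_accession = {m.accession: m for m in members}
    redundant = {}
    for (acc_a, acc_b), dist in distances.items():
        a, b = by_accession[acc_a], by_accession[acc_b]
        if a.species != b.species or dist > cutoff:
            continue  # only close pairs from the same species are collapsed
        if acc_a in redundant or acc_b in redundant:
            continue  # simplification: already-collapsed members are skipped
        keep, drop = sorted((a, b), key=representative_key)
        redundant[drop.accession] = keep.accession
    return redundant
```

Encoding the three selection criteria as a single sort key keeps their priority order explicit: the database preference outranks length, and the accession number only breaks remaining ties.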
redundant? Filter the member list using the "redundant" property described under "redundant" in the family members viewer documentation section.
sub_family Some of the PlanTAP families can be further divided into subfamilies.
taxonomic profile To visualize the distribution of TAP family members across all taxonomic lineages, a taxonomic profile was created and is presented as a heat map. Initial tests using a taxonomic resolution fixed at the kingdom or order level were not able to resolve the expected phylogeny of the contributing taxa using columnwise clustering (data not shown). Therefore, those taxonomic groups which contributed significantly to the overall distribution were selected as columns, and the remainder of the Eubacteria, protists, plants and animals were gathered into their respective "other" columns. The result is a non-redundant representation of the taxonomic distribution which is able to resolve the expected phylogeny using columnwise clustering. To overcome the sampling bias introduced by fully sequenced genomes, the columns were normalized. Subsequent clustering yielded the significantly correlated groups. The "taxonomic profile" filter lets you select all member entries belonging to an individual taxonomic group.
user_contributed_trees You can extend PlanTAPDB. If you want to contribute a manually curated or extended phylogeny of a PlanTAP family, send us an NHX-formatted tree with support values and species annotation, together with a short text describing the method used.

PlanTAP Family Members Viewer documentation

The Family Members Viewer allows you to view, filter and retrieve the members of a given PlanTAP family.
Sequence Retrieval
description
domains
in #clusters
in tree
is a
length
member_name
redundant
repr. species
representative
species

Sequence Retrieval The PlanTAPDB interfaces allow sequence retrieval in three ways:
  1. Hyperlink: contents of the fields member_name and representative of the Family Members Viewer are hyperlinked to the Cosmoss Retrieval System to fetch individual entries one at a time.
  2. Batch retrieval by checkbox: The Family Members Viewer allows retrieval of multiple member entries at a time by selecting them using the leading checkbox and hitting the "retrieve selected members" button. The check-status of all displayed members can be modified by using the checkbox in the table header of the member list in the upper left corner.
  3. Batch retrieval by node: The ATV Tree Viewer was modified to provide an additional option "get PlanTAP sequences", that can be used to retrieve all sequences belonging to a given node in the phylogenetic tree simply by clicking on it. Of course, the original option that links to a UniProt entry if possible was also preserved.
description The member sequence's description line, i.e. the textual annotation provided by the originating database.
domains Matching InterPro domains in order of occurrence along the sequence. If your browser supports mouse-over information, use this to display additional information, e.g. description, E-value, and start and stop positions of the match.
in #clusters The number of clusters belonging to this family in which the member sequence occurs. Following the hyperlink opens an additional window that lists the PlanTAP family cluster(s) the entry is part of and provides hyperlinks to these clusters' ClusterView pages.
in tree Is the member sequence part of any of the family multiple sequence alignments (MSAs) and trees, or was it removed in the homology reduction? This is also a filter property.
is a Was the member sequence a hit, a query or both in the initial PSI-BLAST search? This is also a filter property.
length Length of the member's amino acid sequence.
member_name The unique accession number of a sequence, which can be a member of multiple PlanTAP families. The accession numbers of the member sequences are the identifiers of their originating databases, e.g. UniProt, GenPept, TAIR, Cosmoss ... By following the hyperlink you can retrieve the individual sequence via the Cosmoss Sequence Retrieval System.
redundant Was the member sequence tagged as redundant in the homology reduction in any of the member clusters? Sequences marked as redundant were excluded from the taxonomic profiling of the PlanTAP families. This is also a filter property.
repr. species Scientific name of the organism the representative member sequence is derived from. For small clusters, only redundant sequences of the same organism are considered, whereas this is not the case for huge clusters where iterative homology reduction was performed. SYNTAX: Genus species (subspecies or variety...). The last two words of the corresponding NCBI Taxonomy full lineage string. Follow the hyperlink to access the corresponding NCBI taxonomy entry.
representative Fellow member sequence which represents a sequence in at least one of the family MSAs and trees. By following the hyperlink you can retrieve the individual sequence via the Cosmoss Sequence Retrieval System.
species The scientific name of the organism the member sequence is derived from. SYNTAX: Genus species (subspecies or variety...). The last two words of the corresponding NCBI Taxonomy full lineage string. Follow the hyperlink to access the corresponding NCBI taxonomy entry.