Sequence Retrieval User Manual

Table of Contents

  1. Introduction
  2. Sequence retrieval
  3. Sequence display
  4. Advanced retrieval
  5. Appendix

Introduction
1.1 Using sequence retrieval

The main purpose of the sequence retrieval tool is to retrieve and display a given set of sequences. It supports three distinct use cases:

  • Retrieving sequences from a BLAST result
  • Retrieving sequences based on accession numbers
  • Retrieving a graphical display of a single sequence

Sequence retrieval
2.1 Retrieving sequences from a BLAST result

If you have run a BLAST search using the COSMOSS web interface, you can select sequences found by the search for retrieval. To select a sequence, tick the checkbox in front of it (see figure 1).

Once you have selected the sequences you wish to retrieve, start the retrieval by clicking "retrieve sequences" at the bottom of the result page.

2.2 The main retrieval dialog

You will then be redirected to the retrieval main dialog (figure 2). Here you have several options for how to proceed:

2.3 Retrieving sequences based on accession numbers

If you want to retrieve a given set of sequences, you can invoke the retrieval tool directly from this URL:

or by selecting "BLAST" on the COSMOSS web site and then "retrieval" at the top of the BLAST page.

If you wish to retrieve large numbers of sequences, or sequences from internal databases, you must be authorized first.

On the retrieval portal (figure 3) you can select the database from which you want to retrieve sequences. The databases you can choose depend on your authorization status.

Enter the accession numbers into the first input field. Separate multiple accession numbers with spaces.

Alternatively you can select a file containing a list of accession numbers by clicking on the "Browse..." button. The file should contain exactly one accession number per line.
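If your accession numbers are already available as a space-separated list (the format the input field expects), a short script can turn them into an upload file with exactly one accession number per line. The following Python sketch is only an illustration; `parse_accession_field` and `write_accession_file` are hypothetical helper names, not part of COSMOSS.

```python
from pathlib import Path


def parse_accession_field(text: str) -> list[str]:
    """Split input-field style text into accession numbers (whitespace-separated)."""
    return text.split()


def write_accession_file(path: str, accessions: list[str]) -> int:
    """Write exactly one accession number per line, as the upload form expects.

    Surrounding whitespace is stripped and blank entries are skipped.
    Returns the number of accession lines written.
    """
    cleaned = [acc.strip() for acc in accessions if acc.strip()]
    # End the file with a newline only if there is at least one accession.
    content = ("\n".join(cleaned) + "\n") if cleaned else ""
    Path(path).write_text(content, encoding="utf-8")
    return len(cleaned)
```

For example, `write_accession_file("accessions.txt", parse_accession_field("PPP_1337_C1 At1g01030.1"))` writes a two-line file that can then be selected with "Browse...".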

After you have chosen a database and provided accession numbers, either by entering them into the form or by selecting a file, click "submit" at the bottom of the page to retrieve the sequences.

You will then be redirected to the sequence retrieval main dialog. If one or more of the sequences you wanted to retrieve could not be found, those accession numbers will be marked by a red cross (not found) in the list.

For documentation about the sequence retrieval main dialog, please see section 2.2.

Sequence display
3.1 Using sequence display

The sequence display is meant to help you read an annotated sequence. It consists of two parts, an overview graphic and the actual annotated sequence.

You can reach the sequence display from the sequence retrieval main dialog by clicking on the sequence's accession number. This is only possible for those sequences where annotation data is available.

3.2 The sequence overview graphic

The overview graphic shows the sequence and the locations of its features. If a QUALITY feature is attached to the sequence, the quality information is shown as a curve in the first track of the graphic. The higher the curve, the better the quality of the corresponding sequence region.

The second track represents the whole sequence. This track is linked to the "source" feature (if present). That is, if you click on the track, you will jump to the "source" feature in the sequence data block.

All further tracks represent the remaining features of the sequence. If you hover the mouse pointer over a track, information about the feature is displayed. If you click on a track, you jump directly to the associated feature in the sequence data block.

If the sequence is a contig, it has several member features attached. These features are grouped in the last tracks. Again, hovering the mouse pointer over a member track displays information about that member, and clicking on the track takes you to the sequence data for that member.

3.3 The sequence and annotation data

The actual sequence data is displayed in EMBL format. For readability, the feature table and the database links are collapsed by default (figure 5).

Figure 5 shows the feature table of PPP_0_C1. Most of the features are collapsed; you can expand a feature by clicking on it. Links at the top and at the bottom of the sequence data expand or collapse all features at once.

The figure also shows one expanded feature. If a feature contains a database link to a public database, you can click on the link and go to the referenced entry.

If the sequence is a contig, it will contain several member features, each referring to a sequence of the contig. Since the number of member features can become quite large, the member features are collapsed twice. To expand them, you first have to click on the "FT" in front of the first "member" feature, and then on the "member" feature you wish to expand.
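If you retrieve a contig in EMBL format (for example through scriptable retrieval, section 4.2), you can list its features or count its member features without expanding them by hand. The Python sketch below relies only on the standard EMBL feature-table layout, in which a feature key starts in column 6 of a line beginning with "FT" and qualifier continuation lines leave the key columns blank. The function names are hypothetical.

```python
from collections import Counter


def feature_keys(embl_text: str) -> list[str]:
    """Return the feature keys (e.g. "source", "member") of an EMBL feature table.

    Feature keys occupy columns 6-20 of "FT" lines; qualifier continuation
    lines leave those columns blank and are therefore skipped.
    """
    keys = []
    for line in embl_text.splitlines():
        if not line.startswith("FT   "):
            continue
        key = line[5:21].strip()  # 0-based slice covering the key columns
        if key:
            keys.append(key)
    return keys


def count_members(embl_text: str) -> int:
    """Count the "member" features, each referring to a sequence of the contig."""
    return Counter(feature_keys(embl_text))["member"]
```

A non-contig entry simply yields a member count of 0.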

If you clicked on a feature in the graphical overview, the corresponding feature is expanded automatically.

Advanced retrieval
4.1 Using fuzzy search

Authorized users have the additional option to use fuzzy search for sequence retrieval.

If you check the "fuzzy search" checkbox on the sequence retrieval portal, you can retrieve sequences via a partial accession number.

If you want to retrieve a sequence with the accession number At1g01030.X, where X is the version, but you don't know which version is available in the database, select "fuzzy search" and search for the partial accession number "At1g01030". This search returns all sequences whose accession numbers start with "At1g01030".

The partial accession number must be at least 5 characters long.

Fuzzy search is also case-insensitive, so PPP_1337_C1, PPp_1337_c1, ppp_1337_c1, etc. all retrieve the sequence with the accession number PPP_1337_C1.

The number of sequences returned by a fuzzy search is limited to 1000 to reduce the load on the web server.

Fuzzy search only uses the first partial accession number you supplied, regardless of whether you entered the accession numbers directly or uploaded a file.
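Taken together, these rules describe a case-insensitive prefix match. The Python sketch below models the documented behavior on a local list of accession numbers: minimum query length of 5, case-insensitive prefix match, at most 1000 results, and only the first query used. It is an illustration of the rules, not the server's actual implementation; `fuzzy_match` is a hypothetical name, and which 1000 sequences the server keeps when there are more matches is not documented (the sketch simply keeps the first 1000 in list order).

```python
MIN_QUERY_LENGTH = 5  # partial accession numbers must be at least 5 characters
MAX_RESULTS = 1000    # fuzzy search returns at most 1000 sequences


def fuzzy_match(accessions: list[str], queries: list[str]) -> list[str]:
    """Model of the documented fuzzy search over a local list of accessions.

    Only the first query is used; matching is a case-insensitive prefix match.
    """
    if not queries:
        return []
    query = queries[0]  # any further partial accession numbers are ignored
    if len(query) < MIN_QUERY_LENGTH:
        raise ValueError(
            f"partial accession {query!r} is shorter than {MIN_QUERY_LENGTH} characters"
        )
    prefix = query.casefold()
    matches = [acc for acc in accessions if acc.casefold().startswith(prefix)]
    # Assumption: keep the first MAX_RESULTS matches in list order.
    return matches[:MAX_RESULTS]
```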

4.2 Using retrieval from scripts

To automate large batch retrievals, it is possible to access the sequence retrieval from scripts, similar to the E-utils from NCBI.

The base URL to retrieve sequences is

  • http://cosmoss.org/bm/retrieval?state=link

To retrieve sequences by accession number from a given database, add the following parameters:

  • db=<database>
    the name of the database
  • acc=<acc>[+<acc>...]
    the accession numbers, separated by plus signs
  • format=[fasta|embl|genbank|swiss|acc]
    the format of the sequences

Append the parameters to the base URL, separated by ampersands (&), for example:

Scriptable retrieval is only possible for the public databases, because it provides no way to pass the credentials needed for the internal databases.
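For batch jobs, the retrieval URL can be assembled programmatically. The Python sketch below uses only the base URL and the db, acc and format parameters documented above; the function names are hypothetical, and the database name must be one of the public databases offered on the retrieval portal (none are listed here, so any name you pass is your own assumption). The accession numbers are joined with spaces and then URL-encoded, which turns each space into the plus sign the acc parameter expects.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://cosmoss.org/bm/retrieval?state=link"
FORMATS = {"fasta", "embl", "genbank", "swiss", "acc"}


def build_retrieval_url(db: str, accessions: list[str], fmt: str = "fasta") -> str:
    """Build a scriptable-retrieval URL from the documented parameters."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(FORMATS)}")
    if not accessions:
        raise ValueError("at least one accession number is required")
    # urlencode() encodes spaces as "+", so joining with spaces yields acc=A+B+C.
    query = urlencode({"db": db, "acc": " ".join(accessions), "format": fmt})
    # The base URL already carries a query string, so append with "&".
    return f"{BASE_URL}&{query}"


def fetch_sequences(db: str, accessions: list[str], fmt: str = "fasta",
                    timeout: float = 60.0) -> str:
    """Download the sequences as text (public databases only)."""
    with urlopen(build_retrieval_url(db, accessions, fmt), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Building the URL separately from fetching it makes it easy to inspect or log the exact request before sending it, for example `print(build_retrieval_url(db_name, ["PPP_1337_C1"], "embl"))` with `db_name` set to a public database name.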

Appendix

If anything in the retrieval tool or this documentation is unclear or causes problems, do not hesitate to contact the staff:

This documentation is copyright 2004 by Plant Biotechnology, University of Freiburg. All rights reserved. Redistribution in part or in whole is prohibited without prior written permission from Plant Biotechnology, University of Freiburg.