Annotation Process

This page describes how to use WebApollo to annotate the current Tallapoosa darter genome scaffolds. The annotation workflow in this instance of WebApollo uses fgenesh-generated gene models as starting points.

General instructions (PDF) for using WebApollo are available at:

There is also a demo site where one can try out all of the editing features without risk of damaging anything:

To annotate the Tallapoosa darter scaffolds, a user must have a login and password. If interested, email Leos Kral to obtain these. To just look at the scaffolds and current annotations, use the following credentials:

User name: guest
Password: guest

In the current version of WebApollo a guest user cannot view the Annotation Info Editor content, where accession numbers for gene models are noted. This limitation will be removed in the next update of the software. To access this instance of WebApollo, log in.

Once logged in, a list of all available scaffolds is presented. In the figure below the scaffolds are sorted by size. Scaffolds can also be sorted by name. Just click the relevant heading to obtain the sorting desired. In this description of an annotation process, scaffold234_size47437 is selected.

Once the scaffold of interest is selected, a new tab opens that presents the scaffold for annotation. In this case, the scaffold is 47,437 nucleotides in length, but notice that only part of the scaffold is selected for viewing (from about 10,000 to 40,000, as indicated by the red rectangle). To magnify the view or to view a longer part of the scaffold, click the relevant magnifying glass icons shown in the red oval. In this example the - icon is clicked to view the entire scaffold length. (Note: sometimes the first click on a - icon seems to be ignored. If that happens, click one of the + icons and then click the - icon again.)

The view has now been expanded to encompass the entire length of the scaffold, as shown by the extent of the red rectangle and also by the full extent of the light blue color. In this example the evidence tracks showing locations of possible genes are not yet displayed. The Tallapoosa darter WebApollo instance has only one evidence track available, which was generated by the fgenesh program. To display this evidence track, double-click the fgenesh button shown in the red oval.

The fgenesh-predicted gene or genes for this scaffold now appear below the yellow user area. In this example, the fgenesh algorithm predicted one gene (shown in the red oval) spanning the entire scaffold. The vertical green bars are the predicted exons.

Double-clicking any of the predicted exons selects the entire predicted gene, and this gene model can then be dragged up into the User-created Annotations area.

Once the gene model is in the User-created Annotations area, its exons are shown as blue vertical bars or boxes.

Double-clicking any of the blue exons selects the entire gene model. Right-clicking the selected model then brings up a menu from which "Get sequence" can be selected.

By default the amino acid sequence encoded by this gene model is displayed. Options are available to also display a variety of DNA sequences associated with the gene model.

To determine if this gene model actually represents a whole gene or part of a gene, the protein sequence can be used to search GenBank with blastp. In this example, proteins with high homology to the predicted protein sequence were identified in the GenBank database.

One of the proteins with the highest homology was the product of a predicted gene for SH3 and PX domain-containing protein 2A-like protein from Takifugu rubripes.
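For the blastp search described above, the amino acid sequence copied from the "Get sequence" window usually needs to be in FASTA format before it is pasted into or uploaded to a search form. The following is a minimal sketch of that formatting step, not part of WebApollo itself. The function name `to_fasta`, the record name, and the short sequence are hypothetical placeholders; the real predicted protein is not reproduced here.

```python
def to_fasta(name, protein, width=60):
    """Return one FASTA record with the sequence wrapped at `width` residues."""
    # Drop any whitespace picked up when copying, and a trailing stop symbol if present.
    cleaned = "".join(protein.split()).rstrip("*")
    # Slice the sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical placeholder sequence, NOT the actual fgenesh-predicted protein.
record = to_fasta("scaffold234_fgenesh_model", "MKT AYIAKQR\nQISFVKSHFS*", width=10)
print(record)
```

The default width of 60 residues per line is a common convention rather than a requirement; the example uses 10 only to keep the output short.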

This Takifugu-derived protein sequence was aligned to the scaffold DNA sequence with the fgenesh+ program to determine the corresponding exon/intron structure of this gene in the Tallapoosa darter scaffold. A portion of the output of this program is shown below, where the predicted exon coordinates are indicated.

These coordinates can be used to guide a manual adjustment of the gene model in the User-created Annotations area. First, the entire gene model can be selected by double-clicking any exon, or an individual exon can be selected by single-clicking it. Right-clicking the selection then brings up a menu from which "Zoom to base level" can be selected.

In the base-level view, each of the intron/exon junctions can be adjusted according to the sequence of the closest matching protein and the coordinates obtained from fgenesh+. Exons can also be added or deleted as necessary.
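When adjusting junctions by hand, it can help to check which dinucleotides flank each intron, since most introns begin with GT and end with AG. The sketch below is an illustration only and is not a WebApollo feature. It assumes a forward-strand gene model, 1-based inclusive exon coordinates, and a scaffold sequence held in a Python string; the function name and the toy sequence are hypothetical.

```python
def intron_splice_sites(scaffold_seq, exons):
    """Return (intron_start, intron_end, donor, acceptor) for each intron.

    exons: list of (start, end) pairs, 1-based inclusive, forward strand,
    sorted by start. Coordinates in the result are also 1-based inclusive.
    """
    seq = scaffold_seq.upper()
    results = []
    # Pair each exon with the next one; the gap between them is an intron.
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron_start = left_end + 1
        intron_end = right_start - 1
        donor = seq[intron_start - 1:intron_start + 1]  # first two intron bases
        acceptor = seq[intron_end - 2:intron_end]       # last two intron bases
        results.append((intron_start, intron_end, donor, acceptor))
    return results


# Toy sequence (hypothetical): exon 1-6, intron 7-18, exon 19-24.
toy_scaffold = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
sites = intron_splice_sites(toy_scaffold, [(1, 6), (19, 24)])
for start, end, donor, acceptor in sites:
    flag = "canonical" if (donor, acceptor) == ("GT", "AG") else "check manually"
    print(start, end, donor, acceptor, flag)
```

A junction that does not show GT...AG is not necessarily wrong, but it is a reasonable place to look again at the fgenesh+ coordinates. A reverse-strand model would need the sequence reverse-complemented first, which this sketch does not do.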

Alternatively, the fgenesh+ output file can be converted to GFF3 format, and this reformatted file can be directly imported into WebApollo as a new gene model. The fgenesh+ generated gene model based on the Takifugu protein sequence is shown in the red oval below.

Shown in the red box below are additional fgenesh+ generated gene models based on alignments with three different protein isoforms derived from predicted SH3 and PX domain-containing protein 2A-like annotations of Oreochromis niloticus.
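The GFF3 conversion mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script actually used for this project: it takes exon coordinates that have already been read from the fgenesh+ output (parsing that output is not shown), and it writes only gene, mRNA, and exon features, omitting CDS lines and their phase values. The function name, the IDs, the source label `fgenesh_plus`, and the example coordinates are all hypothetical.

```python
def exons_to_gff3(seqid, exons, strand, gene_id, source="fgenesh_plus"):
    """Build GFF3 text (gene, mRNA, exon features) from exon coordinates.

    exons: list of (start, end) pairs, 1-based inclusive, as GFF3 requires.
    """
    exons = sorted(exons)
    gene_start = exons[0][0]
    gene_end = max(end for _, end in exons)
    mrna_id = gene_id + ".mRNA1"
    # The nine GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes.
    rows = [
        (seqid, source, "gene", gene_start, gene_end, ".", strand, ".",
         "ID=" + gene_id),
        (seqid, source, "mRNA", gene_start, gene_end, ".", strand, ".",
         "ID=" + mrna_id + ";Parent=" + gene_id),
    ]
    for number, (start, end) in enumerate(exons, 1):
        rows.append((seqid, source, "exon", start, end, ".", strand, ".",
                     "ID=" + mrna_id + ".exon" + str(number) + ";Parent=" + mrna_id))
    lines = ["##gff-version 3"]
    lines += ["\t".join(str(field) for field in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical coordinates, NOT the real fgenesh+ result for this scaffold.
gff_text = exons_to_gff3("scaffold234_size47437",
                         [(1200, 1350), (2100, 2280)], "+", "gene1")
print(gff_text)
```

Whether a given WebApollo version accepts exon-only models or also expects CDS features on import should be checked against that version's documentation.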

As can be seen, the original fgenesh-generated gene model contains too many exons. However, the model did serve its purpose: it indicated the possible location of a gene in the scaffold, and its predicted protein sequence had sufficient similarity to an actual protein sequence in GenBank to enable a refined annotation of this gene.