I hope this web site will act as an organizing nucleus for a consortium of laboratories working together to use darter genomics as a tool to better understand the evolution of darter species and darter biology.
So far, the study of darter evolution has relied on morphological, behavioral, and limited DNA sequence analyses. While much of darter phylogeny has been elucidated from these studies, many questions remain unresolved. For example, to what extent do related species share alleles due to incomplete lineage sorting or hybridization during evolution? What are the actual adaptive genetic changes that define darter species? To what extent, if any, do allopatrically distributed and genetically differentiated populations of the same species show adaptive genetic differentiation?
I believe that a complete understanding of darter evolution must utilize the analysis of complete genomes. While this approach was not financially feasible in the past, I think that the cost of genomic analysis is about to cross a threshold where sequencing of darter genomes of individual species and, soon, populations within species will become very affordable.
As a starting point, it will be necessary to have a fully annotated reference darter genome sequence to which the genomic sequences of other darter species can be compared. As a first step in this direction, I have recently obtained the genomic sequence of the Tallapoosa darter (Etheostoma tallapoosae). This sequence was obtained from two 250-nucleotide paired-end (PE) runs on an Illumina MiSeq. A total of 13 billion nucleotides of sequence was obtained from 52 million such 250-nucleotide reads. This represents, on average, about 12-fold coverage of the darter genome. (See Tallapoosa darter genome link for description.)
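The coverage arithmetic above can be checked with a short Python sketch. The function names are illustrative only, and the genome size used here is not a measured value: it is simply back-calculated from the stated read count, read length, and 12-fold coverage, assuming every read is the full 250 nucleotides.

```python
def total_bases(read_count: int, read_length: int) -> int:
    """Total nucleotides sequenced, assuming every read has the full length."""
    return read_count * read_length


def fold_coverage(read_count: int, read_length: int, genome_size: float) -> float:
    """Average fold coverage: total sequenced bases divided by genome size."""
    return total_bases(read_count, read_length) / genome_size


def implied_genome_size(read_count: int, read_length: int, coverage: float) -> float:
    """Genome size implied by a given average coverage (a back-calculation)."""
    return total_bases(read_count, read_length) / coverage


# Figures stated in the text above.
READ_COUNT = 52_000_000
READ_LENGTH = 250
STATED_COVERAGE = 12  # "about 12-fold", so the result below is approximate

sequenced = total_bases(READ_COUNT, READ_LENGTH)
genome_estimate = implied_genome_size(READ_COUNT, READ_LENGTH, STATED_COVERAGE)

print(f"Total sequence: {sequenced / 1e9:.1f} billion nucleotides")
print(f"Implied genome size: {genome_estimate / 1e9:.2f} billion nucleotides")
```

Under these assumptions the reads total 13 billion nucleotides, and 12-fold coverage corresponds to a genome of roughly 1.08 billion nucleotides. Real coverage varies from region to region, so this is only an average.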
These sequences were assembled into contigs with the Minia assembler, and the contigs were then assembled into scaffolds with SSPACE. I consider this to be only the first phase of the Tallapoosa darter genome assembly. While a good start, 12-fold coverage is not sufficient for a complete assembly of the entire genome. The scaffolds are not very long, and far too many short contigs are not part of longer scaffolds. (See Tallapoosa darter genome link for description.) For phase two, one option would be to increase the coverage with additional short reads and to obtain mate-pair reads for assembling longer scaffolds. Instead, I am hopeful that Oxford Nanopore will soon commercialize their GridION and MinION systems, which promise relatively inexpensive long reads of up to 100,000 nucleotides against which the current MiSeq reads can be assembled.
I have set up a WebApollo server to enable community annotation of the present scaffolds and contigs as well as a BLAST server (ViroBLAST) to enable searches within the current Tallapoosa darter genomic assembly. These are now being utilized by students in my genetics and genomics courses as well as by students engaged in research in my lab. I have also made these publicly available so that any other interested parties can have access to this data and can contribute to the annotation effort (see External Links in sidebar).
Since I do not have the resources to sequence and analyze all darter species, or even a representative subset, I hope this site will help recruit other research labs to sequence the genomic DNA of their favorite darter species and contribute those sequences to a shared database. Together we can build a resource that enables the exploration of many questions related to darter biology and evolution.
I have set up both a darter genomics Google+ Community and a Google Discussion Group. Please feel free to join, share your thoughts, and take part in the effort.
Department of Biology
University of West Georgia
Carrollton, GA 30118