Takes a GenBank file as input. For every CDS feature it finds, the tool extracts a user-specified length of DNA immediately upstream of that CDS (the length is given as an argument).
Each extracted sequence is the requested length + 3 bp, because the initiation codon is included.
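The "+ 3" arithmetic depends on strand. The sketch below computes the 1-based, inclusive coordinates of the extracted region for a CDS given as (start, end, strand). It assumes strand is encoded as 1 or -1, and it takes the default of 100 and maximum of 999 from the form on this page. The minimum of 1 and the function name `upstream_region` are hypothetical, not the tool's own code.

```python
from typing import Tuple


def upstream_region(cds_start: int, cds_end: int, strand: int, length: int = 100) -> Tuple[int, int]:
    """Return 1-based inclusive (start, end) of the upstream window plus start codon.

    Hypothetical helper; assumes length limits of 1..999 (999 from the form,
    minimum of 1 assumed).
    """
    if not 1 <= length <= 999:
        raise ValueError("length must be between 1 and 999")
    if strand == 1:
        # Plus strand: the start codon occupies cds_start..cds_start+2 and the
        # upstream window lies at lower coordinates, ending just before cds_start.
        return cds_start - length, cds_start + 2
    if strand == -1:
        # Minus strand: the start codon occupies cds_end-2..cds_end and
        # "upstream" lies at higher coordinates, to the right of cds_end.
        return cds_end - 2, cds_end + length
    raise ValueError("strand must be 1 or -1")
```

For a CDS at 201..500 with the default length, the plus-strand region is 101..203 and the minus-strand region is 498..600. Both are 103 bp long.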
The output is an FFN (FASTA nucleotide) file of these upstream DNA sequences.
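Once the region coordinates are known, pulling the sequence out is a slice, plus a reverse complement for minus-strand genes so that every record reads upstream sequence first and start codon last. This is a minimal sketch that assumes the genome is already loaded as a plain string. The names `reverse_complement` and `extract_region` are illustrative, not the tool's own.

```python
# Complement table for unambiguous bases plus N. Other IUPAC ambiguity codes
# (R, Y, ...) are passed through unchanged by str.translate, which a real
# tool would need to handle.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(genome: str, start: int, end: int, strand: int) -> str:
    """Return genome[start..end] (1-based, inclusive), read 5'->3' on the given strand."""
    if start < 1 or end > len(genome) or start > end:
        raise ValueError(f"region {start}..{end} is outside the sequence")
    region = genome[start - 1:end]
    # Minus-strand regions are reverse-complemented so the output reads
    # upstream sequence first, then the initiation codon.
    return region if strand == 1 else reverse_complement(region)
```

For example, `extract_region("TTTCATGGGAAA", 4, 9, -1)` returns `"CCCATG"`: three upstream bases followed by the ATG start codon on the minus strand.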
This works only for prokaryotic sequences, because it does not handle the split (join) feature locations found in eukaryotic genes.
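In GenBank files, split genes appear as compound locations such as `join(12..78,134..202)` or `complement(join(...))`. This sketch accepts only simple ranges, optionally wrapped in `complement(...)`, and returns None for `join`/`order` locations, which is one way a prokaryote-only tool could skip them. The function name and its skip-rather-than-fail behaviour are assumptions.

```python
import re
from typing import Optional, Tuple

# A single range; "<" and ">" mark partial (fuzzy) ends in GenBank locations.
RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def parse_simple_location(loc: str) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, strand) for a simple location, or None if unsupported."""
    loc = loc.replace(" ", "")
    if "join(" in loc or "order(" in loc:
        return None  # split location (typical of eukaryotic genes): not handled
    strand = 1
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = -1
        loc = loc[len("complement("):-1]
    m = RANGE.fullmatch(loc)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), strand
```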

NOTE: Please make sure that every 'CDS' feature carries a 'locus_tag' qualifier.
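To see why the placement matters: a `/locus_tag` that sits only under the `gene` feature is not attached to the `CDS` feature. The sketch below scans feature-table lines and reports each CDS location with its own `locus_tag`, or None when it is missing. It assumes the standard column layout (feature keys indented 5 spaces, qualifiers 21 spaces) and single-line qualifier values. It is not how the tool parses its input. Biopython's `SeqIO.parse(handle, "genbank")` with `feature.qualifiers.get("locus_tag")` is a more robust alternative.

```python
import re
from typing import List, Optional, Tuple

# Feature keys start in column 6 (five leading spaces); qualifiers and
# location continuation lines are indented further (21 spaces).
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")
LOCUS_TAG = re.compile(r'/locus_tag="([^"]*)"')


def cds_locus_tags(feature_lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (location, locus_tag) for each CDS; locus_tag is None if missing."""
    results: List[list] = []
    current_key: Optional[str] = None
    for line in feature_lines:
        m = FEATURE_LINE.match(line)
        if m:
            current_key = m.group(1)
            if current_key == "CDS":
                # Only the first line of the location is kept; multi-line
                # (usually join) locations are not reassembled here.
                results.append([m.group(2).strip(), None])
            continue
        # Qualifier lines belong to the most recent feature key.
        if current_key == "CDS" and results[-1][1] is None:
            tag = LOCUS_TAG.search(line)
            if tag:
                results[-1][1] = tag.group(1)
    return [(loc, tag) for loc, tag in results]
```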
NOTE: The coordinates given in the FASTA header line of the FFN file are those of the extracted region (upstream sequence plus initiation codon), not of the CDS itself.
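As an illustration of such a header, the sketch below writes one FFN record whose header carries the locus tag and the extracted region's coordinates, and wraps the sequence at 60 characters. The exact header layout (`>locus_tag start..end (strand)`) and the line width are hypothetical choices, not the tool's documented output format.

```python
def format_ffn_record(locus_tag: str, region_start: int, region_end: int,
                      strand: int, seq: str, width: int = 60) -> str:
    """Build one FASTA record; header layout and wrap width are assumptions."""
    strand_label = "+" if strand == 1 else "-"
    # Coordinates are those of the extracted region, not of the CDS.
    header = f">{locus_tag} {region_start}..{region_end} ({strand_label})"
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"
```

For a plus-strand CDS starting at 190 with the default length of 100, the region is 90..192. A 103-bp sequence would be written as the header followed by lines of 60 and 43 bases.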
NOTE: This currently works only for linearized genomes. For a circular genome, a gene that starts at or near the origin may have no upstream region, because the sequence is not wrapped around its end.
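The boundary problem is easiest to see for a plus-strand gene. Its upstream window starts `length` bases before the CDS, which is below position 1 when the gene sits at the origin. The sketch returns None in that case for a linear sequence. The `circular=True` branch is a hypothetical illustration of what wrapping around the end would involve; the tool does not do this. Only the plus strand is handled, for brevity.

```python
from typing import Optional


def upstream_plus_strand(genome: str, cds_start: int, length: int,
                         circular: bool = False) -> Optional[str]:
    """Upstream window + start codon for a plus-strand CDS (1-based cds_start)."""
    region_start = cds_start - length  # may fall below 1 near the origin
    region_end = cds_start + 2
    if region_start >= 1:
        return genome[region_start - 1:region_end]
    if not circular:
        # Linear sequence: nothing exists before position 1, so there is no
        # full upstream region to extract.
        return None
    # Hypothetical circular handling: map each 1-based position (including
    # positions <= 0) back onto the sequence with modular indexing.
    n = len(genome)
    return "".join(genome[(p - 1) % n] for p in range(region_start, region_end + 1))
```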



Input: a GBK file (maximum file size: 20 MB).
Upstream sequence length: default 100, maximum 999.


Please help us improve this tool by sending any questions or comments to Andre.Villegas[at]phac-aspc.gc.ca.