Takes a GenBank file as input. For every CDS feature it finds, it extracts a fixed length of DNA upstream of the CDS (the length
is given as an argument).
Each extracted sequence is the argument length + 3 bases long, because the initiation codon is included.
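As an illustration of the "length + 3" rule, here is a minimal Python sketch of the extraction for a single CDS. It is not the tool's actual code: the function names, the 0-based end-exclusive coordinates, and the clamping at the sequence ends are all assumptions made for this example. For a CDS on the reverse strand, the upstream region lies to the right of the CDS on the forward strand and has to be reverse-complemented.

```python
# Hypothetical helpers for illustration only; not taken from the tool itself.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_with_start_codon(genome, cds_start, cds_end, strand, length):
    """Return `length` upstream bases plus the 3-base initiation codon.

    Assumptions: cds_start/cds_end are 0-based and end-exclusive on the
    forward strand, strand is +1 or -1, and the genome is treated as linear
    (the window is clamped at the sequence ends).
    """
    if strand == 1:
        begin = max(0, cds_start - length)
        return genome[begin:cds_start + 3]
    # Reverse strand: the gene starts at cds_end and reads leftwards.
    end = min(len(genome), cds_end + length)
    return reverse_complement(genome[cds_end - 3:end])


# Toy sequence: 10 bases of upstream DNA, a 9-base CDS, 5 trailing bases.
genome = "TTTTTCCCCC" + "ATGAAATAA" + "GGGGG"
forward = upstream_with_start_codon(genome, 10, 19, 1, 5)

# The same gene placed on the reverse strand of the reverse-complemented toy.
flipped = reverse_complement(genome)
reverse = upstream_with_start_codon(flipped, 5, 14, -1, 5)
```

With an argument length of 5, both calls return 8 bases: 5 upstream bases followed by the ATG initiation codon.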
Output will be an FFN file of these upstream DNA sequences.
This only works for prokaryotic sequences because it does not handle the split or joined CDS locations found in eukaryotic sequences.
NOTE: Please make sure that each 'CDS' feature has a 'locus_tag' qualifier.
NOTE: The coordinates given in the FASTA header line in the FFN file are the coordinates of the extracted region, not of the CDS.
NOTE: This currently treats every genome as linear. For a circular genome, a gene that starts at the origin may have no upstream region, because the tool does not wrap around to the other end of the sequence.
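The effect of the linear assumption can be sketched in Python as follows. The function name and the 0-based end-exclusive coordinates are assumptions for this example, and whether the tool truncates or skips a short upstream region is not stated here; the sketch only shows how many upstream bases exist on a linear sequence.

```python
def available_upstream(cds_start, cds_end, strand, length, genome_length):
    """Count the upstream bases that exist on a linear sequence.

    Hypothetical helper: coordinates are 0-based and end-exclusive on the
    forward strand, and strand is +1 or -1.
    """
    if strand == 1:
        # Forward strand: upstream DNA lies before the CDS start.
        return min(length, cds_start)
    # Reverse strand: upstream DNA lies after the CDS end.
    return min(length, genome_length - cds_end)


# A forward-strand gene starting exactly at the origin of a 5000-base record.
at_origin = available_upstream(0, 900, 1, 100, 5000)
# A reverse-strand gene ending 50 bases before the end of the record.
near_end = available_upstream(4000, 4950, -1, 100, 5000)
# A gene well inside the record.
interior = available_upstream(1000, 2000, 1, 100, 5000)
```

Here the gene at the origin has no upstream bases at all, the gene near the end has only 50 of the requested 100, and the interior gene has the full 100.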
Please help us improve this tool by sending any questions or comments to Andre.Villegas[at]phac-aspc.gc.ca.