Benchmarking reassembly/Loading fasta files

by N.E.Whiteford

Hi All,

As part of my PhD project I'm working on a tool to benchmark reassembly
algorithms. To do this I'm planning on doing the following:

1. Take a sequence file and break it into reads of a specified
   length, adding errors during this process.

2. Reassemble these simulated reads with the reassembly programs
   available in GAP4.

3. Align contigs of a useful size to the original sequence and note
   those that align within a given edit distance.

4. Calculate the percentage of the sequence that is covered by contigs.
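
Step 1 above could be sketched roughly as follows. This is only a
minimal illustration, not the actual tool: the function names
(simulate_reads, to_fasta), the fixed step between read start
positions, and the substitution-only error model are all assumptions
made for the example. The 23-base reference is simply the sequence
implied by the four example reads shown later in this message.

```python
import random


def simulate_reads(sequence, read_length, step=1, error_rate=0.0, seed=0):
    """Cut `sequence` into overlapping reads, adding substitution errors.

    Hypothetical helper: reads start every `step` bases, and each base is
    replaced by a different base with probability `error_rate`.
    """
    rng = random.Random(seed)  # seeded so a benchmark run is repeatable
    reads = []
    for start in range(0, len(sequence) - read_length + 1, step):
        read = list(sequence[start:start + read_length])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Pick one of the other three bases, never the original.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


def to_fasta(reads, prefix="R"):
    """Format reads as FASTA records named R0, R1, ..."""
    return "".join(f">{prefix}{i}\n{read}\n" for i, read in enumerate(reads))


# Reference reconstructed from the example reads R0..R3 in this message.
reference = "CCAATTAGTCCTATTAAGACTGT"
print(to_fasta(simulate_reads(reference, read_length=20)))
```

Insertions and deletions are left out here; a fuller error model would
need them as well.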

I have just completed the edit-distance alignment tool and am now
beginning the process of benchmarking reassembly algorithms. Does
anybody have any thoughts or suggestions? I should say that my main
interest is short read reassembly.
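
For anyone wanting to comment on steps 3 and 4, here is a rough sketch
of the idea. It is not the tool mentioned above: it assumes a plain
dynamic-programming approximate match in which the contig may start
and end anywhere in the reference, and the names best_match and
coverage, along with the min_length and max_distance parameters, are
made up for this example.

```python
def best_match(contig, reference):
    """Find the best match of `contig` inside `reference`.

    Returns (edit_distance, start, end) so that reference[start:end] is
    the matched region. The first row of the table is all zeros, which
    lets the match begin at any reference position at no cost.
    """
    n = len(reference)
    # Each cell holds (distance, start position of the match in reference).
    prev = [(0, j) for j in range(n + 1)]
    for i, c in enumerate(contig, start=1):
        curr = [(i, 0)]
        for j in range(1, n + 1):
            diag = (prev[j - 1][0] + (c != reference[j - 1]), prev[j - 1][1])
            up = (prev[j][0] + 1, prev[j][1])            # extra base in contig
            left = (curr[j - 1][0] + 1, curr[j - 1][1])  # reference base skipped
            curr.append(min(diag, up, left, key=lambda cell: cell[0]))
        prev = curr
    # The match may also end anywhere, so take the best cell in the last row.
    end = min(range(n + 1), key=lambda j: prev[j][0])
    distance, start = prev[end]
    return distance, start, end


def coverage(contigs, reference, max_distance, min_length=0):
    """Percentage of `reference` covered by contigs that align well enough."""
    covered = [False] * len(reference)
    for contig in contigs:
        if len(contig) < min_length:
            continue  # skip contigs too short to be useful
        distance, start, end = best_match(contig, reference)
        if distance <= max_distance:
            for k in range(start, end):
                covered[k] = True
    return 100.0 * sum(covered) / len(reference)


reference = "CCAATTAGTCCTATTAAGACTGT"
contigs = ["CCAATTAGTC", "TTAAGACTGT"]
print(coverage(contigs, reference, max_distance=1))
```

The table is quadratic in time, which is fine for short reads and
small references but would need banding or a faster method for
anything large.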

Secondly, I'm having a problem with GAP4. It only seems to load
19 sequences from my fasta file. My fasta file looks like this:

>R0
CCAATTAGTCCTATTAAGAC

>R1
CAATTAGTCCTATTAAGACT

>R2
AATTAGTCCTATTAAGACTG

>R3
ATTAGTCCTATTAAGACTGT

However, if I include more than 19 sequences in my fasta file, I
get the following error:

Failed files:
    /home/new/A1.fasta (UNK) 'init: Unknown file type'

Is this a bug? Or am I doing something wrong?
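
To rule out a malformed file, the records can be counted
independently of GAP4. The sketch below is only a sanity check and
says nothing about how GAP4 itself detects file types; read_fasta is
a hypothetical helper, and it assumes plain FASTA in which blank
lines between records can be ignored.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


sample = """>R0
CCAATTAGTCCTATTAAGAC

>R1
CAATTAGTCCTATTAAGACT
"""
records = read_fasta(sample)
print(len(records), "records;", {len(seq) for _, seq in records}, "read lengths")
```

Running the same function on the contents of the real file should
report the full number of reads if the file is well formed.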

Many thanks for reading,

Nava Whiteford



_______________________________________________
Staden mailing list
Staden@...
http://www.bio.net/biomail/listinfo/staden