File processing advice


by AlexG1

Hello,

I need to write a script that executes the following operation:

1. Receive a file path, two column indexes and four bound values (let's call them minX, maxX, minY, maxY).
2. For every line in the file: if minX < line(index1) < maxX and minY < line(index2) < maxY, comment out the line; otherwise just copy it.
3. Write the results to an output file.
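
As an aside on step 2: the condition is written in mathematical notation. In Octave and Matlab a chained comparison such as 0 < 20 < 10 is evaluated left to right as (0 < 20) < 10, i.e. 1 < 10, which is true, so each bound has to be tested separately. A minimal sketch of the test, assuming strict inequalities as stated in step 2 (the helper name inside_bounds is hypothetical and is not taken from the attached script):

```octave
% Hypothetical helper, not part of the original filter_out.m.
% Returns true only when x and y both lie strictly inside their bounds.
function tf = inside_bounds(x, y, minX, maxX, minY, maxY)
  % Each bound is compared on its own; minX < x < maxX would not work.
  tf = (minX < x) && (x < maxX) && (minY < y) && (y < maxY);
end
```

A value exactly equal to a bound counts as outside here, because the description uses < rather than <=.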

I wrote the script, but its performance is poor: a file of ~20000 lines takes about a minute to process (for comparison, the same script with the same input takes less than 10 seconds in Matlab).

I'm pretty sure I didn't write it in the best way possible (though I still can't understand why there is such a huge difference from Matlab), so I'd like some optimization advice. What I do now is the following (using fscanf and fprintf for reading from and writing to the files):

1. While the end of the input file has not been reached:
        1.1 Read a line from the input file
        1.2 Check whether the values in the current line are inside the bounds:
                1.2.1 Yes - write a comment character ('%') to the output file
                1.2.2 No - do nothing
        1.3 Write the line itself and a newline character to the output file
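
For readers without the attachment, the loop above might look roughly like the following. This is only a sketch under stated assumptions, not the attached filter_out.m: the function name filter_out_sketch is hypothetical, the argument order simply mirrors the filter_out call shown later in the thread, columns are assumed to be whitespace-separated numbers, and fgetl plus sscanf are used in place of fscanf so that lines already starting with '%' can be copied through unchanged.

```octave
% Hypothetical sketch of the line-by-line loop; not the original script.
% Assumes whitespace-separated numeric columns in every data line.
function filter_out_sketch(in_path, out_path, ix, iy, minX, maxX, minY, maxY)
  fin = fopen(in_path, 'r');
  if fin < 0
    error('cannot open input file %s', in_path);
  end
  fout = fopen(out_path, 'w');
  if fout < 0
    fclose(fin);
    error('cannot open output file %s', out_path);
  end
  while true
    line = fgetl(fin);              % returns -1 (not a string) at end of file
    if ~ischar(line)
      break;
    end
    inside = false;
    t = strtrim(line);
    if ~isempty(t) && t(1) ~= '%'   % existing comments are copied as they are
      vals = sscanf(t, '%f');       % leading numeric fields of the line
      if numel(vals) >= max(ix, iy)
        x = vals(ix);
        y = vals(iy);
        inside = (minX < x) && (x < maxX) && (minY < y) && (y < maxY);
      end
    end
    if inside
      fprintf(fout, '%%%s\n', line);  % step 1.2.1 and 1.3: '%' then the line
    else
      fprintf(fout, '%s\n', line);    % step 1.3 only: copy the line
    end
  end
  fclose(fin);
  fclose(fout);
end
```

Lines with fewer numeric fields than the requested column indexes are copied unchanged in this sketch; whether that matches the real input format is an assumption.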

I think copying the whole file to the output file path and then prepending comments to each qualifying line would speed it up, but I don't know how to do that without reading and writing the whole line (which would cancel out any positive effect).

Any advice on how the speed can be improved will be highly appreciated.

Thank you,

Alex

Re: File processing advice

by AlexG1

AlexG1 wrote:
> (using fscanf, fprintf for reading and writing from\to file):

Just to be clear: I'm using fscanf rather than load for reading because the input file may contain commented lines as well, and I can't just ignore those lines (I need to copy them to the output file).

Re: File processing advice

by Jaroslav Hajek-2

On Mon, Mar 2, 2009 at 5:10 PM, AlexG1 <alxgel@...> wrote:

>
>
> AlexG1 wrote:
>>
>> (using fscanf, fprintf for reading and writing from\to file):
>>
>
> Just to be clear - I'm using fscanf and not load for reading is because the
> input file may contain commented lines as well, and I can't just ignore
> those lines (I need to copy them to the output file).


Performance issues (both optimizing scripts and Octave itself) are by
far best investigated on real examples. Can you share your actual code + a
test file? If they're too big to post here, perhaps make them
accessible on the web?
You may be missing something simple, or there may be a simple
optimization possible for Octave.

cheers

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: File processing advice

by AlexG1

Jaroslav Hajek-2 wrote:
> Performance issues (both optimizing scripts and Octave itself) are by
> far best investigated on real examples. Can you share your actual code + a
> test file? If they're too big to post here, perhaps make them
> accessible on the web?
> You may be missing something simple, or there may be a simple
> optimization possible for Octave.
>
> cheers

Hi,

I attached the code and linked to the test file.
The command I used to run the script is:

filter_out('test.txt','test_out.txt',1,2,10000,150000,10000,150000)

(1,2 are the column indexes and the last 4 numbers are the bounds: minX,maxX,minY,maxY)

Test file (linked), filter_out.m (attached)