why is os.path.walk so slow?


why is os.path.walk so slow?

by Garry Willgoose

I need to synchronize the files on my home and office machines. I have
been using someone else's code for this to date, but have been
frustrated by how slow it is at getting file information from the
mounted drive on my office machine, so I thought I'd experiment with a
Python facility for this. The code I've experimented with is below.

import os

def visitfunc(filelist, dirname, names):
    # os.path.walk calls this once per directory; names lists its entries.
    for name in names:
        fullname = os.path.join(dirname, name)
        if not os.path.isdir(fullname):
            filelist.append([fullname, os.path.getmtime(fullname)])

def check(dir):
    # Python 2 only: os.path.walk was removed in Python 3 (os.walk replaces it).
    filelist = []
    os.path.walk(dir, visitfunc, filelist)
    print filelist
    return filelist

This is very fast for a directory on my local machine but
significantly slower on the remote machine. That's not surprising, but
I would have expected the run time for the remote directory to be
limited by my broadband speed; when I look at upload/download in real
time, it's less than 10% of maximum. Is this just par for the course,
or is there something I can do that better utilizes my broadband
bandwidth?

====================================================================
Prof Garry Willgoose,
Australian Professorial Fellow in Environmental Engineering,
Director, Centre for Climate Impact Management (C2IM),
School of Engineering, The University of Newcastle,
Callaghan, 2308
Australia.

Centre webpage: www.c2im.org.au

Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574  
(Fri PM-Mon)
FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal  
and Telluric)
Env. Engg. Secretary: (International) +61 2 4921 6042

email:  garry.willgoose@...; g.willgoose@...
email-for-life: garry.willgoose@...
personal webpage: www.telluricresearch.com/garry
====================================================================
"Do not go where the path may lead, go instead where there is no path  
and leave a trail"
                           Ralph Waldo Emerson
====================================================================

_______________________________________________
Tutor maillist  -  Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: why is os.path.walk so slow?

by Wayne Werner

On Wed, Nov 4, 2009 at 6:16 AM, Garry Willgoose <garry.willgoose@...> wrote:
<snip>

This is very fast for a directory on my local machine but significantly slower on the remote machine. Not surprising but I would have expected that the run time for the remote directory would be limited by my broadband speed but when I look at upload/download in real time it's less than 10% of maximum. Is this just par for the course or is there something I can do that better utilizes my broadband bandwidth?

I'm not sure if there's a correlation, but there probably is. What OS are you (and the remote system) using? What service are you using to connect?

By way of disclosure, I don't have a lot of experience in this category, but I would venture that whatever service you're using has to send/receive requests for each file/dir that os.walk checks. 

I don't know precisely how os.walk works, but I'm guessing the differences are as follows:

Local Machine:
python->os.walk->local system calls

Remote Machine:
python->os.walk->call to client->data through the tubes->remote host->system calls

Even if that's not completely correct, you're still going to have extra steps as you walk through the remote system, and the bottleneck will probably not be on the internet connection (and your tests seem to verify this).

It would work better, I think, if you were able to run a script on the remote system and return the results.
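
A minimal sketch of what such a remote-side script could look like, assuming Python 3 is available on the remote machine (the thread's original code is Python 2). The function name `list_files` and the JSON output format are illustrative choices, not something from the original post. It walks the tree on the machine that owns the disk and emits the whole listing in one go:

```python
import json
import os
import sys


def list_files(root):
    """Return [relative_path, mtime] pairs for every non-directory entry under root."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(full)
            except OSError:
                # The file vanished or is unreadable (e.g. a broken symlink); skip it.
                continue
            entries.append([os.path.relpath(full, root), mtime])
    return entries


if __name__ == "__main__":
    # Directory to scan comes from the command line; default to the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    json.dump(list_files(root), sys.stdout)
```

If the script were saved on the remote machine under a hypothetical name such as list_files.py, it could be started over ssh and its output redirected to a local file, so only one stream of data crosses the network instead of one request per file.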

HTH,
Wayne


Re: why is os.path.walk so slow?

by Hugo Arts

On Wed, Nov 4, 2009 at 4:56 PM, Wayne Werner <waynejwerner@...> wrote:

> On Wed, Nov 4, 2009 at 6:16 AM, Garry Willgoose
> <garry.willgoose@...> wrote:
>>
>> <snip>
>>
>> This is very fast for a directory on my local machine but significantly
>> slower on the remote machine. Not surprising but I would have expected that
>> the run time for the remote directory would be limited by my broadband speed
>> but when I look at upload/download in real time it's less than 10% of
>> maximum. Is this just par for the course or is there something I can do that
>> better utilizes my broadband bandwidth?
>
> I'm not sure if there's a correlation, but there probably is. What OS are
> you (and the remote system) using? What service are you using to connect?
> By way of disclosure, I don't have a lot of experience in this category, but
> I would venture that whatever service you're using has to send/receive
> requests for each file/dir that os.walk checks.
>
> <snip>

I'm taking a stab in the dark here, but maybe latency is the
bottleneck. The process sends a request for each file/directory,
waits for the answer, and only then sends the next request. All those
little waits add up, even though the consumed bandwidth is negligible.

Running the script on the remote server should be the solution if this
is the case, since you can request the data locally then transmit it
in one go, eliminating most of the waiting.
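
To put rough numbers on the latency argument, here is a back-of-envelope sketch. Every figure in it (entry count, round-trip time, bytes per entry, link speed) is a made-up value for illustration, and the assumption of one blocking request per entry is a simplification; real network file systems may batch or cache some of these calls.

```python
def estimate_walk_seconds(n_entries, round_trip_s, requests_per_entry=1):
    """Time spent waiting if every entry costs a blocking round trip."""
    return n_entries * requests_per_entry * round_trip_s


def estimate_transfer_seconds(n_bytes, bandwidth_bits_per_s):
    """Time to move n_bytes if only bandwidth mattered."""
    return n_bytes * 8 / bandwidth_bits_per_s


if __name__ == "__main__":
    # Hypothetical workload: 10,000 entries, 50 ms round trip,
    # about 100 bytes of metadata per entry, 1.5 Mbit/s link.
    waiting = estimate_walk_seconds(10000, 0.05)
    moving = estimate_transfer_seconds(10000 * 100, 1500000)
    print("latency-bound estimate:   %.1f s" % waiting)
    print("bandwidth-bound estimate: %.1f s" % moving)
```

With figures like these the waiting time dwarfs the transfer time, which would be consistent with a link that looks mostly idle while the walk crawls along.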

Hugo

Re: why is os.path.walk so slow?

by modulok-2

[snip]
> I need to synchronize the files on my home and office machine and have
> been using someone else's code for this to date but have been
> frustrated by how slow it is in getting the information on files for
> the mounted drive from my office machine...
[/snip]

Not to cut your coding project short, and it may not even be
applicable, but have you looked into rsync? They kind of wrote the
book on efficiency in regards to synchronization of files.

Just a thought.
-Modulok-
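
For anyone who wants to drive rsync from Python rather than replace it, a minimal sketch follows. The helper names `build_rsync_command` and `sync` and the example paths are hypothetical; the flags used (--archive, --compress, --verbose, --dry-run) are standard rsync options. The sketch defaults to a dry run so nothing is copied until the command has been checked.

```python
import subprocess


def build_rsync_command(source, destination, dry_run=True):
    """Build an rsync argument list; dry_run only reports what would change."""
    cmd = ["rsync", "--archive", "--compress", "--verbose"]
    if dry_run:
        cmd.append("--dry-run")
    cmd += [source, destination]
    return cmd


def sync(source, destination, dry_run=True):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_rsync_command(source, destination, dry_run), check=True)


if __name__ == "__main__":
    # Placeholder paths; print the command instead of running it.
    print(" ".join(build_rsync_command("/path/to/home/dir/", "/path/to/mounted/dir/")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.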

Re: why is os.path.walk so slow?

by Garry Willgoose


> [snip]
>> I need to synchronize the files on my home and office machine and have
>> been using someone else's code for this to date but have been
>> frustrated by how slow it is in getting the information on files for
>> the mounted drive from my office machine...
> [/snip]
>
> Not to cut your coding project short, and it may not even be
> applicable, but have you looked into rsync? They kind of wrote the
> book on efficiency in regards to synchronization of files.
>
> Just a thought.
> -Modulok-

It looks like rsync is ideal for what I want ... thanks for pointing  
this out. One less project to work on ;-)

To the other replies:

1. Both local and remote systems for the testing are OSX 10.5 and CPU  
was not limiting during the test.  Python was the Enthought binary  
distribution. Internet service is ADSL 1500/512.
2. I neglected to mention that I did comment out the code inside  
visitfunc() with the same result so the issue is inside walk().

I guess now that I know about rsync, the pressing need for this is gone
(I might still write a nice GUI for rsync ;-) but I'm still intrigued by
the problem and whether it's a latency vs bandwidth problem. The idea
of writing a script to run on the remote end is good, and something I
might use for another project.
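
One way to probe the latency-vs-bandwidth question is to time a bare walk and divide by the number of entries visited. This is only a sketch: it uses os.walk, the Python 3 replacement for os.path.walk, and the function name `time_walk` is made up for illustration. If the per-entry time on the mounted drive comes out close to the network round-trip time (as reported by ping, say), that would point towards latency rather than bandwidth.

```python
import os
import sys
import time


def time_walk(root):
    """Walk root without reading any file contents; return (entries, seconds)."""
    start = time.perf_counter()
    entries = 0
    for dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
    elapsed = time.perf_counter() - start
    return entries, elapsed


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    entries, elapsed = time_walk(root)
    per_entry_ms = 1000.0 * elapsed / entries if entries else 0.0
    print("%d entries in %.2f s (%.2f ms per entry)" % (entries, elapsed, per_entry_ms))
```

Running it once against a local directory and once against the mounted one gives two per-entry figures to compare. A second run over the same tree may be faster because of caching, so the first run is the more telling one.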