why is os.path.walk so slow?

I need to synchronize the files on my home and office machine and have been using someone else's code for this to date, but have been frustrated by how slow it is in getting the information on files for the mounted drive from my office machine. So I thought I'd experiment with a Python facility for this. The code I've experimented with is as below:

    def visitfunc(arg,dirname,names):
        global filelist
        import os.path
        for name in names:
            if not os.path.isdir(dirname+'/'+name):
                fullname=dirname+'/'+name
                filelist.append([fullname,os.path.getmtime(fullname)])
        return()

    def check(dir):
        global filelist
        import os.path
        filelist=[]
        os.path.walk(dir,visitfunc,'')
        print filelist
        return()

This is very fast for a directory on my local machine but significantly slower on the remote machine. That is not surprising, but I would have expected the run time for the remote directory to be limited by my broadband speed, yet when I look at upload/download in real time it's less than 10% of maximum. Is this just par for the course or is there something I can do that better utilizes my broadband bandwidth?

====================================================================
Prof Garry Willgoose,
Australian Professorial Fellow in Environmental Engineering,
Director, Centre for Climate Impact Management (C2IM),
School of Engineering, The University of Newcastle,
Callaghan, 2308
Australia.

Centre webpage: www.c2im.org.au

Phone: (International) +61 2 4921 6050 (Tues-Fri AM); +61 2 6545 9574 (Fri PM-Mon)
FAX: (International) +61 2 4921 6991 (Uni); +61 2 6545 9574 (personal and Telluric)
Env. Engg. Secretary: (International) +61 2 4921 6042

email: garry.willgoose@...; g.willgoose@...
email-for-life: garry.willgoose@...
personal webpage: www.telluricresearch.com/garry
====================================================================
"Do not go where the path may lead, go instead where there is no path and leave a trail"
Ralph Waldo Emerson
====================================================================
_______________________________________________
Tutor maillist - Tutor@...
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: why is os.path.walk so slow?

On Wed, Nov 4, 2009 at 6:16 AM, Garry Willgoose <garry.willgoose@...> wrote:
> <snip>

I'm not sure if there's a correlation, but there probably is. What OS are you (and the remote system) using? What service are you using to connect?

By way of disclosure, I don't have a lot of experience in this category, but I would venture that whatever service you're using has to send/receive requests for each file/dir that os.walk checks.

I don't know precisely how os.walk works, but I'm guessing the differences are as follows:

Local Machine: python->os.walk->local system calls
Remote Machine: python->os.walk->call to client->data through the tubes->remote host->system calls

Even if that's not completely correct, you're still going to have extra steps as you walk through the remote system, and the bottleneck will probably not be on the internet connection (and your tests seem to verify this).

It would work better, I think, if you were able to run a script on the remote system and return the results.

HTH,
Wayne
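
A minimal sketch of the "run a script on the remote system and return the results" idea, written for Python 3, where os.path.walk no longer exists and os.walk takes its place. The names list_files and listing_as_json are made up for illustration, and how the script gets started on the office machine (ssh or anything else) is left open.

```python
import json
import os


def list_files(root):
    """Return [path, mtime] pairs for every non-directory entry under root."""
    entries = []
    # os.walk yields (dirpath, dirnames, filenames) for each directory.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                entries.append([full, os.path.getmtime(full)])
            except OSError:
                # The file vanished or is unreadable (e.g. a broken symlink).
                continue
    return entries


def listing_as_json(root):
    """Serialize the whole listing so it can be sent back in one transfer."""
    return json.dumps(list_files(root))
```

Run on the machine that holds the files, this does all the per-file work with local system calls, and only the single JSON string has to cross the network.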

Re: why is os.path.walk so slow?

On Wed, Nov 4, 2009 at 4:56 PM, Wayne Werner <waynejwerner@...> wrote:
> On Wed, Nov 4, 2009 at 6:16 AM, Garry Willgoose
> <garry.willgoose@...> wrote:
>>
>> <snip>
>>
>> This is very fast for a directory on my local machine but significantly
>> slower on the remote machine. Not surprising but I would have expected that
>> the run time for the remote directory would be limited by my broadband speed
>> but when I look at upload/download in real time it's less than 10% of
>> maximum. Is this just par for the course or is there something I can do that
>> better utilizes my broadband bandwidth?
>
> I'm not sure if there's a correlation, but there probably is. What OS are
> you (and the remote system) using? What service are you using to connect?
> By way of disclosure, I don't have a lot of experience in this category, but
> I would venture that whatever service you're using has to send/receive
> requests for each file/dir that os.walk checks.
>
> <snip>

I'm taking a stab in the dark, but maybe latency is the bottleneck here. The process is sending a request for each file/directory, waiting for the answer, and only then sending the next request. All those little waits add up, even though the consumed bandwidth is negligible.

Running the script on the remote server should be the solution if this is the case, since you can request the data locally and then transmit it in one go, eliminating most of the waiting.

Hugo
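
The latency argument can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: the function names and every figure in it (10,000 entries, a 50 ms round trip, 100 bytes of metadata per entry, a 512 kbit/s link) are assumptions chosen for illustration, not measurements.

```python
def estimated_walk_seconds(num_entries, round_trip_s, requests_per_entry=1):
    """Lower bound on walk time when each request waits for the previous reply."""
    return num_entries * requests_per_entry * round_trip_s


def bulk_transfer_seconds(total_bytes, bandwidth_bytes_per_s, round_trip_s):
    """Time to fetch the same metadata with one request and one reply."""
    return round_trip_s + total_bytes / bandwidth_bytes_per_s


# Illustrative assumptions only.
entries = 10_000
rtt = 0.05                   # 50 ms round trip
bytes_per_entry = 100        # path plus modification time
bandwidth = 512_000 / 8      # 512 kbit/s expressed in bytes per second

print("one request per entry: %.1f s" % estimated_walk_seconds(entries, rtt))
print("single bulk transfer:  %.1f s" % bulk_transfer_seconds(
    entries * bytes_per_entry, bandwidth, rtt))
```

With these assumed figures the per-entry walk spends about 500 seconds waiting on round trips, while sending the same metadata in one go takes about 16 seconds, which is consistent with a link that looks mostly idle during the walk.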

Re: why is os.path.walk so slow?

[snip]
> I need to synchronize the files on my home and office machine and have
> been using someone else's code for this to date but have been
> frustrated by how slow it is in getting the information on files for
> the mounted drive from my office machine...
[/snip]

Not to cut your coding project short, and it may not even be applicable, but have you looked into rsync? They kind of wrote the book on efficiency with regard to synchronization of files.

Just a thought.
-Modulok-
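
For anyone who wants to drive rsync from Python, a small hedged sketch follows. build_rsync_command and run_rsync are hypothetical helper names, and the source and destination strings are placeholders. The flags are standard rsync options: -a (archive mode), -z (compress during transfer), -v (verbose) and -n (dry run, which only reports what would be copied).

```python
import subprocess


def build_rsync_command(source, destination, dry_run=True):
    """Build the rsync argument list; the paths are caller-supplied placeholders."""
    cmd = ["rsync", "-a", "-z", "-v"]
    if dry_run:
        # -n makes rsync list the changes without copying anything.
        cmd.append("-n")
    cmd.extend([source, destination])
    return cmd


def run_rsync(source, destination, dry_run=True):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_rsync_command(source, destination, dry_run),
                          check=True)
```

Defaulting to a dry run is a deliberate choice here, so that a first call only reports differences. Note that a trailing slash on the source means "the contents of this directory" to rsync.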

Re: why is os.path.walk so slow?

> [snip]
>> I need to synchronize the files on my home and office machine and have
>> been using someone else's code for this to date but have been
>> frustrated by how slow it is in getting the information on files for
>> the mounted drive from my office machine...
> [/snip]
>
> Not to cut your coding project short, and it may not even be
> applicable, but have you looked into rsync? They kind of wrote the
> book on efficiency in regards to synchronization of files.
>
> Just a thought.
> -Modulok-

It looks like rsync is ideal for what I want ... thanks for pointing this out. One less project to work on ;-)

To the other replies:

1. Both local and remote systems for the testing are OSX 10.5 and CPU was not limiting during the test. Python was the Enthought binary distribution. Internet service is ADSL 1500/512.

2. I neglected to mention that I did comment out the code inside visitfunc() with the same result, so the issue is inside walk().

I guess now that I know about rsync the pressing need for this is gone (I might still write a nice GUI for rsync ;-) but I'm still intrigued by the problem and whether it's a latency vs bandwidth problem. The idea of writing a script to run on the remote end is good, and something I might use for another project.
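
One way to probe the latency-versus-bandwidth question is to time the walk by itself and look at the cost per entry. This is a sketch under assumptions: time_walk is a made-up name, and it uses Python 3's os.walk rather than the os.path.walk from the original post.

```python
import os
import time


def time_walk(root):
    """Walk root without reading any file; return (entry_count, seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Count every directory entry the walk had to discover.
        count += len(dirnames) + len(filenames)
    elapsed = time.perf_counter() - start
    return count, elapsed
```

If the elapsed time divided by the entry count on the mounted drive comes out close to the network round-trip time, that points to latency rather than bandwidth. On Python 3.5 and later os.walk is built on os.scandir, which typically needs fewer stat calls per entry, so the figures may differ from those of the Python 2 os.path.walk.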