Optional filter files

Is it possible to call rsync and tell it to use a filter file if it exists, but otherwise continue without errors?

If I pass "--filter=. .rsync-filter", it will fail if .rsync-filter doesn't exist.

I know you can pass "--filter=: /.rsync-filter" to search for filter files in each directory. That won't fail if there aren't any such files. But I'm only interested in one file at the root.

Thanks,
Jacob

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Optional filter files

On Tue, 2009-10-27 at 15:38 -0700, Jacob Weber wrote:
> Is it possible to call rsync and tell it to use a filter file if it
> exists, but otherwise continue without errors?
>
> If I pass "--filter=. .rsync-filter", it will fail if .rsync-filter
> doesn't exist.
>
> I know you can pass "--filter=: /.rsync-filter" to search for filter
> files in each directory. That won't fail if there aren't any such
> files. But I'm only interested in one file at the root.

No, rsync does not have such a feature. It could be added, but I would be skeptical of letting the filter support grow organically into something unmanageable; I'd rather see it replaced with a full scripting language once and for all.

For now, you can test for the filter file in the script that calls rsync. Here's the syntax for bash:

    filter_opt=()
    if [ -e .rsync-filter ]; then
        filter_opt=("--filter=. .rsync-filter")
    fi
    rsync ... "${filter_opt[@]}" ...

--
Matt
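A slightly more general form of Matt's workaround, offered as a sketch: the hypothetical helper `optional_merge_filter` below looks for `.rsync-filter` at the root of a given source directory (instead of the current working directory) and emits the `--filter=. FILE` merge rule only if that file exists and is readable. The function name and the global array `FILTER_OPT` are illustrative assumptions, not rsync features; it assumes bash for arrays.

```shell
#!/usr/bin/env bash
# Sketch: merge a root filter file only when it exists.
# optional_merge_filter and FILTER_OPT are hypothetical names.

# Usage: optional_merge_filter SRC_DIR
# Sets the global array FILTER_OPT to zero or one --filter argument.
optional_merge_filter() {
    local src=${1%/}                          # drop one trailing slash, if any
    local filter_file="$src/.rsync-filter"
    FILTER_OPT=()
    if [ -f "$filter_file" ] && [ -r "$filter_file" ]; then
        # ". FILE" is the short form of the "merge FILE" filter rule
        FILTER_OPT=("--filter=. $filter_file")
    fi
}

# Possible use (paths are placeholders):
#   optional_merge_filter /data/src
#   rsync -a "${FILTER_OPT[@]}" /data/src/ backup:/data/dst/
```

Expanding an empty array with `"${FILTER_OPT[@]}"` produces no arguments at all, so rsync sees no filter option when the file is absent.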
Parallel rsync's for better Performance.

Hi,

We have huge amounts of data to sync, usually every day, and I wish rsync could guarantee performance.

I thought of splitting the directories and running parallel rsyncs on them. It may cost me some network bandwidth, but I can control that with the MAX_RSYNC_PROCESS variable. Can someone evaluate the pros and cons of this design? Any help is heartily appreciated.

    #!/usr/bin/ksh
    MAX_RSYNC_PROCESS=10    # Control the parallelism from here

    # Read one rsync command line per line of standard input and run up to
    # MAX_RSYNC_PROCESS of them in the background; then wait for the whole
    # batch to finish before starting the next one.
    sync_and_wait() {
        i=0
        while IFS= read -r RSYNC_COMMAND
        do
            eval "${RSYNC_COMMAND}" &    # each input line is a full rsync command line
            i=$((i+1))
            if (( i >= MAX_RSYNC_PROCESS ))
            then
                wait
                i=0
            fi
        done
        wait    # wait for the last, possibly partial, batch
    }

Thanks,
Satish Shukla
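One property of the loop above: it waits for an entire batch before starting the next, so a single slow rsync leaves the other slots idle. A sketch of a sliding-window variant follows, which keeps up to `MAX_JOBS` commands running by waiting for any single job to finish. It swaps ksh for bash because it relies on bash's `wait -n` (bash 4.3 or later); `run_limited` and `MAX_JOBS` are hypothetical names. It changes only the scheduling, not the disk-contention question discussed in the replies.

```shell
#!/usr/bin/env bash
# Sketch: run one command per input line, keeping at most MAX_JOBS running.
# Requires bash >= 4.3 for "wait -n". run_limited is a hypothetical helper.

MAX_JOBS=${MAX_JOBS:-10}

run_limited() {
    local running=0 cmd
    while IFS= read -r cmd; do
        [ -z "$cmd" ] && continue      # skip blank lines
        if (( running >= MAX_JOBS )); then
            wait -n                    # block until any one background job exits
            running=$((running - 1))
        fi
        # Each line is a complete command line; stdin comes from /dev/null
        # so a job cannot swallow the remaining input lines.
        eval "$cmd" < /dev/null &
        running=$((running + 1))
    done
    wait                               # wait for the jobs still running
}

# Possible input (one rsync per top-level directory; paths are placeholders):
#   for d in /data/src/*/; do
#       printf 'rsync -a %q %q\n' "$d" "backup:/data/dst/$(basename "$d")/"
#   done | run_limited
```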
Re: Parallel rsync's for better Performance.

On 28.10.2009 09:05, Satish Shukla wrote:
> Hi,
>
> We have huge amounts of data to sync, usually every day, and I wish
> rsync could guarantee performance.
>
> I thought of splitting the directories and running parallel rsyncs on
> them. It may cost me some network bandwidth, but I can control that
> with the MAX_RSYNC_PROCESS variable. Can someone evaluate the pros and
> cons of this design? Any help is heartily appreciated.

That only works IF:
- You have SSDs (preferably good ones, on both sides)
- Each rsync covers a different physical HDD (on both sides)
- You have a massive array with truckloads of HDDs and a matching controller, or something along those lines (again, on both sides)
- A combination of the above would also work

Otherwise, parallel rsyncs completely kill any performance you had, because normal HDDs fall into a seek storm when more than one rsync works on them.

See you then

--
Real Programmers consider "what you see is what you get" to be just as bad a concept in Text Editors as it is in women. No, the Real Programmer wants a "you asked for it, you got it" text editor -- complicated, cryptic, powerful, unforgiving, dangerous.
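The second condition above (each rsync on a different disk) can be roughly approximated by grouping source directories by the device number of the filesystem they live on, and giving each group to its own job. This is a sketch under stated assumptions: it uses GNU coreutils `stat -c %d` (BSD `stat` spells this `-f %d`), it assumes directory names without whitespace, and a device number identifies a filesystem, not a physical drive. Two partitions on one HDD would land in different groups, and one RAID volume would be a single group. `group_by_device` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print source directories grouped by filesystem device number.
# Output: one line per device, "DEVICE<TAB>dir1 dir2 ...".
# Assumes GNU stat (-c) and directory names without whitespace.
group_by_device() {
    declare -A groups=()               # device number -> space-separated dirs
    local dir dev
    for dir in "$@"; do
        dev=$(stat -c %d -- "$dir") || continue
        groups[$dev]+="${groups[$dev]:+ }$dir"
    done
    for dev in "${!groups[@]}"; do
        printf '%s\t%s\n' "$dev" "${groups[$dev]}"
    done
}
```

Each output line could then drive one rsync job that walks its directories sequentially, so that parallelism happens across filesystems rather than within one.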
Re: Parallel rsync's for better Performance.

On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> On 28.10.2009 09:05, Satish Shukla wrote:
> > We have huge amounts of data to sync, usually every day, and I wish
> > rsync could guarantee performance.
> >
> > I thought of splitting the directories and running parallel rsyncs on
> > them. It may cost me some network bandwidth, but I can control that
> > with the MAX_RSYNC_PROCESS variable. Can someone evaluate the pros and
> > cons of this design? Any help is heartily appreciated.
>
> That only works IF:
> - You have SSDs (preferably good ones, on both sides)
> - Each rsync covers a different physical HDD (on both sides)
> - You have a massive array with truckloads of HDDs and a matching
>   controller, or something along those lines (again, on both sides)
> - A combination of the above would also work
>
> Otherwise, parallel rsyncs completely kill any performance you had,
> because normal HDDs fall into a seek storm when more than one rsync
> works on them.

Asynchronous I/O may solve that, on OSes that support it. See also this RFE, on which I have just commented:
https://bugzilla.samba.org/show_bug.cgi?id=5124

--
Matt
Re: Parallel rsync's for better Performance.

On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> On 28.10.2009 10:35, Matt McCutchen wrote:
> > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > Otherwise, parallel rsyncs completely kill any performance you had,
> > > because normal HDDs fall into a seek storm when more than one rsync
> > > works on them.
> >
> > Asynchronous I/O may solve that, on OSes that support it.
>
> No. That's a fundamental problem with ANY rotating media device.

"Solve" may be an overstatement, but asynchronous I/O would at least help significantly, because one process could issue many I/O requests to the same area of the disk at once, and the disk scheduler could fulfill all of those requests before seeking elsewhere. Without asynchronous I/O, after the scheduler fulfills one request, it is left to either seek or wait for the process to issue another request.

--
Matt
Re: Parallel rsync's for better Performance.

On Wed, Oct 28, 2009 at 6:24 PM, Matthias Schniedermeyer <ms@...> wrote:
> No. That's a fundamental problem with ANY rotating media device.
>
> I'm not saying that you can't build something for people who have
> that kind of hardware, or who are constrained by high-bandwidth,
> high-latency network connections (you don't need it for low bandwidth
> and/or low latency). But it would be utterly useless for the other
> 95-99% of rsync users.

Hmmm... I do disagree a tad here. High-performance SAN storage systems are not uncommon anymore, and command queueing also makes it not that bad. That said, the storage sizes and the number of files and directories do make this something to consider... but then again, not everybody has 16-384GB per server image.

I've been in the situation more than once where I had the RAM and the back-end I/O system, and the latency of the link between the two sites called for additional rsync TCP/IP streams. And yes, high latency in these cases is defined as >10ms (~20-30km distance).

And yes, I've seen it needed/useful over GigE between two directly connected machines, too :(
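A back-of-the-envelope reason why extra TCP streams can help on a high-latency link: a single TCP connection can have at most one window W of unacknowledged data in flight per round trip, so its throughput is bounded by W/RTT. The numbers below are illustrative assumptions, not measurements from the thread: the classic 64 KiB window without window scaling, and the 10 ms figure above read as a round-trip time. The sketch ignores packet loss and slow start, and TCP window scaling (RFC 7323) raises the bound considerably.

```latex
\text{throughput}_{\max} \le \frac{W}{\mathrm{RTT}}
  = \frac{65\,536\ \text{B}}{0.010\ \text{s}}
  \approx 6.55\ \text{MB/s} \approx 52.4\ \text{Mbit/s},
\qquad
N_{\text{GigE}} = \left\lceil \frac{1000\ \text{Mbit/s}}{52.4\ \text{Mbit/s}} \right\rceil = 20
```

Under those assumptions a single stream fills only about 5% of a gigabit link, and roughly 20 window-limited streams would be needed to saturate it.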
Re: Parallel rsync's for better Performance.

On 28.10.2009 18:27, Matt McCutchen wrote:
> On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> > On 28.10.2009 10:35, Matt McCutchen wrote:
> > > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > > Otherwise, parallel rsyncs completely kill any performance you had,
> > > > because normal HDDs fall into a seek storm when more than one rsync
> > > > works on them.
> > >
> > > Asynchronous I/O may solve that, on OSes that support it.
> >
> > No. That's a fundamental problem with ANY rotating media device.
>
> "Solve" may be an overstatement, but asynchronous I/O would at least
> help significantly, because one process could issue many I/O requests
> to the same area of the disk at once, and the disk scheduler could
> fulfill all of those requests before seeking elsewhere. Without
> asynchronous I/O, after the scheduler fulfills one request, it is left
> to either seek or wait for the process to issue another request.

And "same disk region" is kind of a problem. In most modern filesystems, inodes can be placed fairly randomly, so you can't reliably sort the files by inode number, or something like that.

But the bigger problem may be the case of millions of files of which 99% are unchanged: where on the platter is the metadata, and how could you optimise disk access for that?

The only thing that comes to my mind is something for when you repeatedly rsync the same data. You could record the access pattern and the timing, do that several times with randomization, and use a genetic algorithm to determine the best(tm) access strategy. After a few generations you should at least be better than before. :-)

See you then
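To make the inode-ordering idea concrete: one heuristic is to list files sorted by inode number and hand that list to rsync with `--files-from`, hoping that inode order roughly tracks on-disk order. As noted above, on many modern filesystems it does not, so this sketch is an experiment to measure, not a fix. It assumes GNU find (`-printf`) and file names without newlines; `inode_sorted_list` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list regular files under SRC_DIR (paths relative to it), sorted by inode.
# Assumes GNU find for -printf and no newlines in file names.
inode_sorted_list() {
    local src=$1
    # %i = inode number, %P = path relative to the starting point
    find "$src" -type f -printf '%i %P\n' | sort -n -k1,1 | cut -d' ' -f2-
}

# Possible use (paths are placeholders):
#   inode_sorted_list /data/src > /tmp/files.txt
#   rsync -a --files-from=/tmp/files.txt /data/src/ backup:/data/dst/
```

With `--files-from`, rsync transfers only the listed names (interpreted relative to the source directory), so deletions and new directories need separate handling.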
Re: Parallel rsync's for better Performance.

On Wed, 2009-10-28 at 23:46 +0100, Matthias Schniedermeyer wrote:
> On 28.10.2009 18:27, Matt McCutchen wrote:
> > On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> > > On 28.10.2009 10:35, Matt McCutchen wrote:
> > > > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > > > Otherwise, parallel rsyncs completely kill any performance you had,
> > > > > because normal HDDs fall into a seek storm when more than one rsync
> > > > > works on them.
> > > >
> > > > Asynchronous I/O may solve that, on OSes that support it.
> > >
> > > No. That's a fundamental problem with ANY rotating media device.
> >
> > "Solve" may be an overstatement, but asynchronous I/O would at least
> > help significantly, because one process could issue many I/O requests
> > to the same area of the disk at once, and the disk scheduler could
> > fulfill all of those requests before seeking elsewhere. Without
> > asynchronous I/O, after the scheduler fulfills one request, it is left
> > to either seek or wait for the process to issue another request.
>
> And "same disk region" is kind of a problem. In most modern filesystems,
> inodes can be placed fairly randomly, so you can't reliably sort the
> files by inode number, or something like that.

I wasn't implying any effort on the process's part to choose files in the same disk region. Your statement that running parallel rsyncs creates a worse seek storm than one rsync alone rests on the assumption that each rsync tends to process files that are close together on disk, and I was simply referring to that assumption.

--
Matt