Optional filter files

by JW12345

Is it possible to call rsync and tell it to use a filter file if it  
exists, but otherwise continue without errors?

If I pass "--filter=. .rsync-filter", it will fail if .rsync-filter  
doesn't exist.

I know you can pass "--filter=: /.rsync-filter" to search for filter  
files in each directory. That won't fail if there aren't any such  
files. But I'm only interested in one file at the root.

Thanks,
Jacob
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Optional filter files

by Matt McCutchen-7

On Tue, 2009-10-27 at 15:38 -0700, Jacob Weber wrote:
> Is it possible to call rsync and tell it to use a filter file if it  
> exists, but otherwise continue without errors?
>
> If I pass "--filter=. .rsync-filter", it will fail if .rsync-filter  
> doesn't exist.
>
> I know you can pass "--filter=: /.rsync-filter" to search for filter  
> files in each directory. That won't fail if there aren't any such  
> files. But I'm only interested in one file at the root.

No, rsync does not have such a feature.  It could be added, but I would
be skeptical of letting the filter support grow organically into
something unmanageable; I'd rather see it replaced with a full scripting
language once and for all.

For now, you can test for the filter file in the script calling rsync.
Here's the syntax for bash:

filter_opt=()
if [ -e .rsync-filter ]; then
        filter_opt=("--filter=. .rsync-filter")
fi

rsync ... "${filter_opt[@]}" ...
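
For shells without arrays, the same idea can be sketched as a plain POSIX sh function. This is only an illustration under stated assumptions: the helper name with_optional_filter is made up here and is not part of rsync, and the src/ and dest/ paths in the example call are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): run COMMAND with a
# "--filter=. FILE" merge rule inserted as its first argument,
# but only when FILE exists.
# Usage: with_optional_filter FILE COMMAND [ARGS...]
with_optional_filter() {
    wof_file=$1
    wof_cmd=$2
    shift 2
    if [ -e "$wof_file" ]; then
        # The rule contains a space, so it is passed as one quoted word.
        "$wof_cmd" "--filter=. $wof_file" "$@"
    else
        "$wof_cmd" "$@"
    fi
}

# Example call (src/ and dest/ are placeholder paths):
#   with_optional_filter .rsync-filter rsync -a src/ dest/
```

Passing the rule as a single quoted word matters for the same reason as in the array version: the rule itself contains a space.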

--
Matt


Parallel rsync's for better Performance.

by Satish Shukla

 

Hi,

We usually have a huge amount of data to sync every day, and I wish rsync could guarantee performance.

I thought of splitting the directories and running parallel rsyncs on them. It may cost me some network bandwidth, but I can control that with the MAX_RSYNC_PROCESS variable. Can someone evaluate the pros and cons of this design? Any help is heartily appreciated.



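#!/usr/bin/ksh

MAX_RSYNC_PROCESS=10                 # Maximum number of rsyncs to run at once

# Read one rsync command line per line from stdin and run the commands
# in batches of at most MAX_RSYNC_PROCESS, waiting for each batch to end.
sync_and_wait()
{
  i=0
  while read -r RSYNC_COMMAND
  do
    eval "${RSYNC_COMMAND}" &        # Each input line is a complete rsync command line
    i=$((i+1))
    if [[ $i -ge ${MAX_RSYNC_PROCESS} ]]
    then
      wait                           # Block until the whole batch has finished
      i=0
    fi
  done
  wait                               # Wait for the last, possibly partial, batch
}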
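
One way the input for sync_and_wait could be produced is sketched below. The helper name make_rsync_commands, the choice of the -a option and the paths in the example are assumptions for illustration only. The sketch also assumes that directory names contain no whitespace or shell metacharacters, because each printed line is later passed to eval, and it does not cover files that sit directly in the top-level source directory.

```shell
#!/bin/sh
# Hypothetical helper: print one rsync command line per top-level
# subdirectory of SRC, in the form that sync_and_wait reads from stdin.
# Usage: make_rsync_commands SRC DEST
make_rsync_commands() {
    msc_src=$1
    msc_dest=$2
    for msc_dir in "$msc_src"/*/; do
        # When SRC has no subdirectories the glob stays unexpanded; skip it.
        [ -d "$msc_dir" ] || continue
        msc_name=$(basename "$msc_dir")
        printf 'rsync -a %s/ %s/\n' "$msc_src/$msc_name" "$msc_dest/$msc_name"
    done
}

# Example (placeholder paths):
#   make_rsync_commands /data backup:/data | sync_and_wait
```

Because sync_and_wait waits for a whole batch, one very large subdirectory holds up the next batch until it finishes, so an even split across directories matters with this design.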




Thanks,
Satish Shukla

Re: Parallel rsync's for better Performance.

by Matthias Schniedermeyer

On 28.10.2009 09:05, Satish Shukla wrote:
>  
>
> Hi,
>
> We usually have a huge amount of data to sync every day, and I wish rsync could guarantee performance.
>
> I thought of splitting the directories and running parallel rsyncs on them. It may cost me some network bandwidth, but I can control that with the MAX_RSYNC_PROCESS variable. Can someone evaluate the pros and cons of this design? Any help is heartily appreciated.

That only works IF:
- You have SSDs (preferably good ones, both sides)
- Each rsync covers a different physical HDD (both sides)
- You have a massive Array with truck-loads of HDDs and a matching
  controller or something along that line (again both sides).
- A combination of the above would also work

Otherwise parallel rsyncs completely kill any performance you had, because
normal HDDs will fall into a seek storm when more than one rsync works on
them.
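
On Linux, the kernel reports whether it considers a block device rotational in /sys/block/<device>/queue/rotational (1 for rotating media, 0 otherwise). A calling script could use that to decide whether to parallelise at all. The following is a rough sketch under several assumptions: it is Linux-only, the function names and the SYSFS_BLOCK override (present only so the sketch can be tried against a fake directory tree) are invented here, and the flag may not reflect the real hardware behind RAID controllers, SANs or virtual disks.

```shell
#!/bin/sh
# Succeeds only when the kernel explicitly reports DEVICE as non-rotational.
# SYSFS_BLOCK is a hypothetical override of the sysfs location.
is_non_rotational() {
    inr_flag="${SYSFS_BLOCK:-/sys/block}/$1/queue/rotational"
    [ -r "$inr_flag" ] && [ "$(cat "$inr_flag")" = "0" ]
}

# Print the number of parallel rsyncs to allow for DEVICE:
# WANTED for non-rotational devices, 1 for rotating or unknown ones.
# Usage: pick_max_rsync DEVICE WANTED
pick_max_rsync() {
    if is_non_rotational "$1"; then
        echo "$2"
    else
        echo 1
    fi
}

# Example (sda is a placeholder device name):
#   MAX_RSYNC_PROCESS=$(pick_max_rsync sda 10)
```

Treating unknown devices like rotating ones is the conservative choice here, since a single rsync is the case that avoids the seek storm described above.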





Until then

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


Re: Parallel rsync's for better Performance.

by Matt McCutchen-7

On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:

> On 28.10.2009 09:05, Satish Shukla wrote:
> > We usually have a huge amount of data to sync every day, and I wish rsync could guarantee performance.
> >
> > I thought of splitting the directories and running parallel rsyncs on them. It may cost me some network bandwidth, but I can control that with the MAX_RSYNC_PROCESS variable. Can someone evaluate the pros and cons of this design? Any help is heartily appreciated.
>
> That only works IF:
> - You have SSDs (preferably good ones, both sides)
> - Each rsync covers a different physical HDD (both sides)
> - You have a massive Array with truck-loads of HDDs and a matching
>   controller or something along that line (again both sides).
> - A combination of the above would also work
>
> Otherwise parallel rsyncs completely kill any performance you had, because
> normal HDDs will fall into a seek storm when more than one rsync works on
> them.

Asynchronous I/O may solve that, on OSes that support it.

See also this RFE, on which I have just commented:

https://bugzilla.samba.org/show_bug.cgi?id=5124

--
Matt


Re: Parallel rsync's for better Performance.

by Matthias Schniedermeyer

On 28.10.2009 10:35, Matt McCutchen wrote:

> On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > On 28.10.2009 09:05, Satish Shukla wrote:
> > > We usually have a huge amount of data to sync every day, and I wish rsync could guarantee performance.
> > >
> > > I thought of splitting the directories and running parallel rsyncs on them. It may cost me some network bandwidth, but I can control that with the MAX_RSYNC_PROCESS variable. Can someone evaluate the pros and cons of this design? Any help is heartily appreciated.
> >
> > That only works IF:
> > - You have SSDs (preferably good ones, both sides)
> > - Each rsync covers a different physical HDD (both sides)
> > - You have a massive Array with truck-loads of HDDs and a matching
> >   controller or something along that line (again both sides).
> > - A combination of the above would also work
> >
> > Otherwise parallel rsyncs completely kill any performance you had, because
> > normal HDDs will fall into a seek storm when more than one rsync works on
> > them.
>
> Asynchronous I/O may solve that, on OSes that support it.

No. That's a fundamental problem with ANY rotating media device.

I'm not saying that you can't build something for the people who have
that kind of hardware, or who are constrained by high-bandwidth,
high-latency network connections (you don't need it for low bandwidth
and/or low latency). But it would be utterly useless for the other
95-99% of rsync users.






Until then


Re: Parallel rsync's for better Performance.

by Matt McCutchen-7

On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> On 28.10.2009 10:35, Matt McCutchen wrote:
> > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > Otherwise parallel rsyncs completely kill any performance you had, because
> > > normal HDDs will fall into a seek storm when more than one rsync works on
> > > them.
> >
> > Asynchronous I/O may solve that, on OSes that support it.
>
> No. That's a fundamental problem with ANY rotating media device.

"Solve" may be an overstatement, but asynchronous I/O would at least
help significantly because one process could issue many I/O requests to
the same area of disk at once, and the disk scheduler could fulfill all
of those requests before seeking elsewhere.  Without asynchronous I/O,
after the scheduler fulfills one request, it is left to either seek or
wait for the process to issue another request.

--
Matt


Re: Parallel rsync's for better Performance.

by Hendrik Visage

On Wed, Oct 28, 2009 at 6:24 PM, Matthias Schniedermeyer <ms@...> wrote:
> No. That's a fundamental problem with ANY rotating media device.
>
> I'm not saying that you can't build something for the people who have
> that kind of hardware, or who are constrained by high-bandwidth,
> high-latency network connections (you don't need it for low bandwidth
> and/or low latency). But it would be utterly useless for the other
> 95-99% of rsync users.

Hmmm... I do disagree a tad here.

High-performance SAN storage systems are not uncommon anymore, and some
of the command queue stuff also makes it not that bad... that said,
the storage sizes and the number of files and directories do make this
something to consider... but then again, not everybody has 16-384 GB
per server image.

I've been in the situation more than once where I had the RAM and the
back-end I/O system, and the latency of the link between the two
sites meant I needed the additional rsync TCP/IP streams.

And yes, high latency in these cases means >10 ms (~20-30 km distance).
And yes, I've seen it needed/useful over GigE between two machines
directly connected too :(
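
One common reason extra streams help on a high-latency link is that a single TCP connection cannot move more than one window of data per round trip. The message does not say what the bottleneck was in these cases, so treat that explanation as an assumption. The arithmetic can be sketched as follows; the function names are invented, and the 64 KiB window, 10 ms round-trip time and 1 Gbit/s link are illustrative figures, not measurements from the thread.

```shell
#!/bin/sh
# Upper bound on the throughput of one TCP stream, in bit/s:
# one window of WINDOW_BYTES per round trip of RTT_MS milliseconds.
stream_limit_bps() {
    slb_window_bytes=$1
    slb_rtt_ms=$2
    echo $(( slb_window_bytes * 8 * 1000 / slb_rtt_ms ))
}

# Number of such streams needed to fill a link of LINK_BPS bit/s
# (integer division rounded up).
streams_needed() {
    sn_link_bps=$1
    sn_per_stream_bps=$2
    echo $(( (sn_link_bps + sn_per_stream_bps - 1) / sn_per_stream_bps ))
}

# Illustrative figures: 64 KiB window, 10 ms RTT, 1 Gbit/s link.
per_stream=$(stream_limit_bps 65536 10)
echo "per-stream limit: $per_stream bit/s"
echo "streams to fill 1 Gbit/s: $(streams_needed 1000000000 "$per_stream")"
```

With these figures a single stream tops out at about 52 Mbit/s, so roughly 20 streams would be needed to fill the link. Larger windows, as negotiated by TCP window scaling, change the result accordingly.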

Re: Parallel rsync's for better Performance.

by Matthias Schniedermeyer

On 28.10.2009 18:27, Matt McCutchen wrote:

> On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> > On 28.10.2009 10:35, Matt McCutchen wrote:
> > > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > > Otherwise parallel rsyncs completely kill any performance you had, because
> > > > normal HDDs will fall into a seek storm when more than one rsync works on
> > > > them.
> > >
> > > Asynchronous I/O may solve that, on OSes that support it.
> >
> > No. That's a fundamental problem with ANY rotating media device.
>
> "Solve" may be an overstatement, but asynchronous I/O would at least
> help significantly because one process could issue many I/O requests to
> the same area of disk at once, and the disk scheduler could fulfill all
> of those requests before seeking elsewhere.  Without asynchronous I/O,
> after the scheduler fulfills one request, it is left to either seek or
> wait for the process to issue another request.

And "same disc region" is kind of a problem. In most modern filesystems
inodes can be pretty random, so you can't reliably sort the files by
inode, or something like that.

But the bigger problem may be the "99% unchanged, but millions of files"
case: where on the platter is the metadata, and how could you optimise
disc access for that?


The only thing that comes to my mind is something for when you
repeatedly rsync something.

You could store the access pattern and the timing, do that several times
with randomization, and use a genetic algorithm to determine the
best(tm) access strategy. After a few generations you should be at least
better than before. :-)




Until then


Re: Parallel rsync's for better Performance.

by Matt McCutchen-7

On Wed, 2009-10-28 at 23:46 +0100, Matthias Schniedermeyer wrote:

> On 28.10.2009 18:27, Matt McCutchen wrote:
> > On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> > > On 28.10.2009 10:35, Matt McCutchen wrote:
> > > > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > > > Otherwise parallel rsyncs completely kill any performance you had, because
> > > > > normal HDDs will fall into a seek storm when more than one rsync works on
> > > > > them.
> > > >
> > > > Asynchronous I/O may solve that, on OSes that support it.
> > >
> > > No. That's a fundamental problem with ANY rotating media device.
> >
> > "Solve" may be an overstatement, but asynchronous I/O would at least
> > help significantly because one process could issue many I/O requests to
> > the same area of disk at once, and the disk scheduler could fulfill all
> > of those requests before seeking elsewhere.  Without asynchronous I/O,
> > after the scheduler fulfills one request, it is left to either seek or
> > wait for the process to issue another request.
>
> And "same disc region" is kind of a problem. In most modern filesystems
> inodes can be pretty random, so you can't reliably sort the files by
> inode, or something like that.

I wasn't implying any effort on the process's part to choose files in
the same disk region.  Your statement that running parallel rsyncs
creates a worse seek storm than one rsync alone is based on the
assumption that each rsync tends to process files that are together on
disk, and I was simply referring to that assumption.

--
Matt
