max file size

by Heinz-Josef Claes-2

Hello,

Does anybody know the maximum file size (terabytes?) when using rsync
with the options --checksum and/or --inplace?

What file sizes have been tested in practice? Does anyone have experience
using rsync (with --checksum and/or --inplace) on big files of several or
dozens of terabytes?

Thanks a lot, Heinz-Josef Claes
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: max file size

by Matt McCutchen-7

On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> Does anybody know the maximum file size (terabytes?) when using rsync
> with the options --checksum and/or --inplace?
>
> What file sizes have been tested in practice? Does anyone have experience
> using rsync (with --checksum and/or --inplace) on big files of several or
> dozens of terabytes?

I don't believe rsync has a fixed maximum size other than "what can fit
in 64 bits", but I can't speak to any reliability issues that might come
up with extremely large files.

For what purpose are you considering --checksum?  In the case where the
file's size hasn't changed (probably true for large image files), it
will add an extra full read of the file on both sides before the
transfer begins, which would be very expensive for multi-terabyte files.
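
To put a rough number on "very expensive", here is a back-of-the-envelope sketch in Python. The file size and read rate are assumed figures for illustration, not measurements from this thread, and the function name is made up.

```python
def full_read_hours(file_size_tb: float, read_mb_per_s: float) -> float:
    """Hours needed to read a file once, start to finish, at a steady rate."""
    size_bytes = file_size_tb * 10**12          # decimal terabytes
    bytes_per_second = read_mb_per_s * 10**6    # decimal megabytes per second
    return size_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # Assumed example: a 5 TB file on storage that sustains 100 MB/s.
    hours = full_read_hours(5, 100)
    print(f"extra read before the transfer starts: {hours:.1f} hours per side")
```

Both sides pay this cost before any data moves, and that is on top of the reads the transfer itself needs.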

--
Matt


Re: max file size

by Heinz-Josef Claes-2

On Monday, 9 November 2009 17:48:35, Matt McCutchen wrote:

> On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> > Does anybody know the maximum file size (terabytes?) when using rsync
> > with the options --checksum and/or --inplace?
> >
> > What file sizes have been tested in practice? Does anyone have experience
> > using rsync (with --checksum and/or --inplace) on big files of several or
> > dozens of terabytes?
>
> I don't believe rsync has a fixed maximum size other than "what can fit
> in 64 bits", but I can't speak to any reliability issues that might come
> up with extremely large files.
>
I've read about a fix for checksum buffer overruns with files of more than a
few hundred terabytes, but the details were vague . . .

> For what purpose are you considering --checksum?  In the case where the
> file's size hasn't changed (probably true for large image files), it
> will add an extra full read of the file on both sides before the
> transfer begins, which would be very expensive for multi-terabyte files.

I want to check whether the following is possible:

1. Transport a big block of data (several terabytes) physically from location
A to location B (a very long distance) on tapes (or disks).
(Locations A and B use different storage technologies.)

By the time the tapes arrive at location B, the block of data has changed at
location A (a program / OS is running and storing data in it).

2. Shut down the application / OS at location A, rsync the delta between
locations A and B online, then restart the system at location B.

(Step 2 may have to be done multiple times.)

--
There are lots of other aspects to this scenario, but that's another story.

Regards, HJC

Re: max file size

by Matt McCutchen-7

On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:

> On Monday, 9 November 2009 17:48:35, Matt McCutchen wrote:
> > On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> > > Does anybody know the maximum file size (terabytes?) when using rsync
> > > with the options --checksum and/or --inplace?
> > >
> > > What file sizes have been tested in practice? Does anyone have experience
> > > using rsync (with --checksum and/or --inplace) on big files of several or
> > > dozens of terabytes?
> >
> > I don't believe rsync has a fixed maximum size other than "what can fit
> > in 64 bits", but I can't speak to any reliability issues that might come
> > up with extremely large files.
> >
> I've read about a fix for checksum buffer overruns with files of more than a
> few hundred terabytes, but the details were vague . . .

Indeed, I forgot about that.  The delta-transfer algorithm doesn't work
for files longer than 2^31 blocks.  With the default maximum block size
of 2^17 bytes, the limit is 2^48 bytes or 256 TiB.  You could stretch the
limit by fixing a larger block size with --block-size.  See:

https://bugzilla.samba.org/show_bug.cgi?id=5459#c2
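
The arithmetic behind that limit can be checked with a short Python sketch. The constant and function names are made up for illustration; only the 2^31 block ceiling and the 2^17-byte block size come from the text above. Whether a particular rsync build accepts a larger --block-size value is not something this sketch can tell you.

```python
MAX_BLOCKS = 2**31   # block-count ceiling quoted above
TIB = 2**40          # bytes in one tebibyte


def max_delta_file_size(block_size: int) -> int:
    """Largest file size in bytes that stays within MAX_BLOCKS blocks."""
    return MAX_BLOCKS * block_size


def min_block_size(file_size: int) -> int:
    """Smallest block size in bytes that keeps a file within MAX_BLOCKS blocks."""
    return -(-file_size // MAX_BLOCKS)  # ceiling division


if __name__ == "__main__":
    limit = max_delta_file_size(2**17)
    print(f"limit at 2^17-byte blocks: {limit // TIB} TiB")
    print(f"block size needed for a 512 TiB file: {min_block_size(512 * TIB)} bytes")
```

Doubling the block size doubles the ceiling, which is the "stretch" mentioned above.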

> > For what purpose are you considering --checksum?  In the case where the
> > file's size hasn't changed (probably true for large image files), it
> > will add an extra full read of the file on both sides before the
> > transfer begins, which would be very expensive for multi-terabyte files.
>
> I want to check whether the following is possible:
>
> 1. Transport a big block of data (several terabytes) physically from location
> A to location B (a very long distance) on tapes (or disks).
> (Locations A and B use different storage technologies.)
>
> By the time the tapes arrive at location B, the block of data has changed at
> location A (a program / OS is running and storing data in it).
>
> 2. Shut down the application / OS at location A, rsync the delta between
> locations A and B online, then restart the system at location B.
>
> (Step 2 may have to be done multiple times.)

Since the source and destination versions are practically certain to
differ, --checksum would serve no purpose.  See the man page description
of --checksum.

--
Matt


Re: max file size

by Heinz-Josef Claes-2

On Fri, 13 Nov 2009 01:38:48 -0500
Matt McCutchen <matt@...> wrote:

> On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > On Monday, 9 November 2009 17:48:35, Matt McCutchen wrote:
> > > On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> > > > Does anybody know the maximum file size (terabytes?) when using rsync
> > > > with the options --checksum and/or --inplace?
> > > >
> > > > What file sizes have been tested in practice? Does anyone have experience
> > > > using rsync (with --checksum and/or --inplace) on big files of several or
> > > > dozens of terabytes?
> > >
> > > I don't believe rsync has a fixed maximum size other than "what can fit
> > > in 64 bits", but I can't speak to any reliability issues that might come
> > > up with extremely large files.
> > >
> > I've read about a fix for checksum buffer overruns with files of more than a
> > few hundred terabytes, but the details were vague . . .
>
> Indeed, I forgot about that.  The delta-transfer algorithm doesn't work
> for files longer than 2^31 blocks.  With the default maximum block size
> of 2^17 bytes, the limit is 2^48 bytes or 256 TiB.  You could stretch the
> limit by fixing a larger block size with --block-size.  See:
>
> https://bugzilla.samba.org/show_bug.cgi?id=5459#c2

Thanks for that information!

Have you (or anybody else) ever run a test with big file sizes?

>
> > > For what purpose are you considering --checksum?  In the case where the
> > > file's size hasn't changed (probably true for large image files), it
> > > will add an extra full read of the file on both sides before the
> > > transfer begins, which would be very expensive for multi-terabyte files.
> >
> > I want to check whether the following is possible:
> >
> > 1. Transport a big block of data (several terabytes) physically from location
> > A to location B (a very long distance) on tapes (or disks).
> > (Locations A and B use different storage technologies.)
> >
> > By the time the tapes arrive at location B, the block of data has changed at
> > location A (a program / OS is running and storing data in it).
> >
> > 2. Shut down the application / OS at location A, rsync the delta between
> > locations A and B online, then restart the system at location B.
> >
> > (Step 2 may have to be done multiple times.)
>
> Since the source and destination versions are practically certain to
> differ, --checksum would serve no purpose.  See the man page description
> of --checksum.
>

I don't understand what you mean. Between steps 1 and 2, only a few percent
of the data will change, so the idea is to transfer only the differences.
Transferring the whole file online takes too long.
How can this be done without checksums (either --checksum or --inbound)?

I'll probably be able to run a test with a file of several terabytes in the
next few weeks, but that's not guaranteed.

Regards, HJC

Re: max file size

by Matt McCutchen-7

On Fri, 2009-11-13 at 12:36 +0100, Heinz-Josef Claes wrote:

> On Fri, 13 Nov 2009 01:38:48 -0500
> Matt McCutchen <matt@...> wrote:
> > On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > > I want to check whether the following is possible:
> > >
> > > 1. Transport a big block of data (several terabytes) physically from location
> > > A to location B (a very long distance) on tapes (or disks).
> > > (Locations A and B use different storage technologies.)
> > >
> > > By the time the tapes arrive at location B, the block of data has changed at
> > > location A (a program / OS is running and storing data in it).
> > >
> > > 2. Shut down the application / OS at location A, rsync the delta between
> > > locations A and B online, then restart the system at location B.
> > >
> > > (Step 2 may have to be done multiple times.)
> >
> > Since the source and destination versions are practically certain to
> > differ, --checksum would serve no purpose.  See the man page description
> > of --checksum.
>
> I don't understand what you mean. Between steps 1 and 2, only a few percent
> of the data will change, so the idea is to transfer only the differences.
> Transferring the whole file online takes too long.
> How can this be done without checksums (either --checksum or --inbound)?

Did you read the description of --checksum as I suggested?  It is an
alternative "quick check" for deciding whether a file needs to be
transferred, which is not what you want.  You're talking about the
delta-transfer algorithm, which is on by default for remote runs and is
controlled by a separate option, --(no-)whole-file.
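
The distinction can be sketched in Python. This is a simplified model of the "does this file need transferring at all?" decision, not rsync's code: the function names are made up, and SHA-256 is only a stand-in for whatever whole-file checksum rsync uses. The delta-transfer algorithm is a separate step that runs only after this decision says "yes".

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the whole file; this is the expensive full read."""
    digest = hashlib.sha256()  # stand-in hash, chosen for this sketch
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(src: str, dst: str, use_checksum: bool = False) -> bool:
    """Decide whether dst must be updated from src at all."""
    if not os.path.exists(dst):
        return True
    src_stat, dst_stat = os.stat(src), os.stat(dst)
    if src_stat.st_size != dst_stat.st_size:
        return True  # a size difference settles it in either mode
    if use_checksum:
        # --checksum style: same size, so compare whole-file checksums.
        return file_digest(src) != file_digest(dst)
    # Default quick check: same size, so compare modification times.
    return int(src_stat.st_mtime) != int(dst_stat.st_mtime)
```

In the scenario discussed here the answer is known to be "yes" in advance, so the checksum branch would only add two full reads.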

--
Matt


Re: max file size

by Heinz-Josef Claes-2

On Fri, 13 Nov 2009 13:33:08 -0500
Matt McCutchen <matt@...> wrote:

> On Fri, 2009-11-13 at 12:36 +0100, Heinz-Josef Claes wrote:
> > On Fri, 13 Nov 2009 01:38:48 -0500
> > Matt McCutchen <matt@...> wrote:
> > > On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > > > I want to check whether the following is possible:
> > > >
> > > > 1. Transport a big block of data (several terabytes) physically from location
> > > > A to location B (a very long distance) on tapes (or disks).
> > > > (Locations A and B use different storage technologies.)
> > > >
> > > > By the time the tapes arrive at location B, the block of data has changed at
> > > > location A (a program / OS is running and storing data in it).
> > > >
> > > > 2. Shut down the application / OS at location A, rsync the delta between
> > > > locations A and B online, then restart the system at location B.
> > > >
> > > > (Step 2 may have to be done multiple times.)
> > >
> > > Since the source and destination versions are practically certain to
> > > differ, --checksum would serve no purpose.  See the man page description
> > > of --checksum.
> >
> > I don't understand what you mean. Between steps 1 and 2, only a few percent
> > of the data will change, so the idea is to transfer only the differences.
> > Transferring the whole file online takes too long.
> > How can this be done without checksums (either --checksum or --inbound)?
>
> Did you read the description of --checksum as I suggested?  It is an
> alternative "quick check" for deciding whether a file needs to be
> transferred, which is not what you want.  You're talking about the
> delta-transfer algorithm, which is on by default for remote runs and is
> controlled by a separate option, --(no-)whole-file.
>
You're right; sorry, that was a misunderstanding on my side.
--no-whole-file --out-format='%n%L (%b of %l)'
does the job.
Thanks, HJC
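
For anyone scripting this, here is a minimal Python sketch that assembles such a command line. The function name, the paths and the host are hypothetical; the optional --inplace flag is taken from the original question, not from the final command above. In the output format, %b is the number of bytes actually transferred and %l is the file length, so "(%b of %l)" shows how much the delta transfer saved.

```python
import shlex
from typing import List


def build_rsync_command(source: str, destination: str,
                        inplace: bool = False) -> List[str]:
    """Return an argv list for a delta transfer that reports bytes sent per file."""
    command = ["rsync", "--no-whole-file", "--out-format=%n%L (%b of %l)"]
    if inplace:
        command.append("--inplace")  # update the destination file in place
    command += [source, destination]
    return command


if __name__ == "__main__":
    # Hypothetical paths and host, for illustration only.
    argv = build_rsync_command("/data/image.bin", "siteb:/data/image.bin")
    print(shlex.join(argv))  # shell-quoted form, suitable for logging
```

Passing the list to subprocess.run avoids any shell quoting problems with the spaces and parentheses in the format string.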