Rsync snapshots, Maildir, and Sarbanes-Oxley

by John Abreau-19

Another issue I'm thinking about for my rsync backup server concerns
users' Maildirs. My mail server runs Courier IMAP, with incoming mail
dropped into Maildir mail folders. As I understand it, Maildir stores
message metadata in each message's filename, and that suggests
that Maildirs could chew up a lot of space on the rsync server with
redundant copies of the same message in slightly different filenames.

For Sarbanes-Oxley purposes, I need to retain all messages, so
I can't use rsync's --delete option to remove messages from the
backups that the user had deleted. But I suspect that means the
backups will contain multiple copies of the same message with
different filenames, and I imagine that would screw things up when
I need to do a restore.

How are others handling this sort of issue? Is there a simple way
to avoid having duplicate copies on the backup server?
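
To make the filename issue concrete: in the Maildir format, a message's name in cur/ ends in an info suffix of the form ":2,<flags>", and the flags change as the message is read, replied to, and so on, while the part before the suffix stays the same. A minimal sketch for spotting such duplicates on a backup tree follows; the function names are hypothetical, and it assumes filenames contain no tabs or newlines.

```shell
#!/bin/sh
# maildir_base NAME
# Strip the Maildir info suffix (":2,<flags>") to get the stable unique name.
maildir_base() {
    printf '%s\n' "${1%%:2,*}"
}

# maildir_dupes DIR
# Print "base<TAB>path" for every message under DIR whose base name
# occurs more than once (i.e. the same message saved under several flag sets).
maildir_dupes() {
    find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) |
    while IFS= read -r path; do
        name=${path##*/}
        printf '%s\t%s\n' "$(maildir_base "$name")" "$path"
    done |
    LC_ALL=C sort |
    awk -F '\t' '
        $1 == prev { if (!shown) print prevline; print; shown = 1; next }
        { prev = $1; prevline = $0; shown = 0 }'
}
```

For example, "maildir_dupes /backup/jdoe/Maildir" (a hypothetical path) would list each group of files that differ only in their flags. This compares names only; it does not check that the message bodies are identical.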

--
John Abreau / Executive Director, Boston Linux & Unix
AIM abreauj / JABBER jabr@... / YAHOO abreauj / SKYPE zusa_it_mgr
Email jabr@... / WWW http://www.abreau.net / PGP-Key-ID 0xD5C7B5D9
PGP-Key-Fingerprint 72 FB 39 4F 3C 3B D6 5B E0 C8 5A 6E F1 2C BE 99

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.



Re: Rsync snapshots, Maildir, and Sarbanes-Oxley

by Tom Metro-12

John Abreau wrote:
> I've been making backups by hand for now, by manually copying the
> guest systems to an external usb drive periodically. But it would be
> nice to somehow be able to include them in the rsync backups.

Can you automate the startup and shutdown of the VMware guests? If so,
then it seems a little bit of scripting wrapped around rsync should do it.


> As I understand it, Maildir stores
> message metadata in each message's filename, and that suggests
> that Maildirs could chew up a lot of space on the rsync server with
> redundant copies of the same message in slightly different filenames.

Perhaps the --fuzzy option would prove useful (quoting the man page):

   --fuzzy
      This option tells rsync that it should look for a basis file  for
      any  destination  file  that  is  missing.  The current algorithm
      looks in the same directory as the destination file for either  a
      file  that  has  an  identical size and modified-time, or a simi-
      larly-named file.  If found, rsync uses the fuzzy basis  file  to
      try to speed up the transfer.
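
As the excerpt says, --fuzzy only picks a basis file to speed up the transfer; the renamed message is still written as a separate file on the destination, so it does not by itself remove the copy under the old name. A rough sketch of an invocation is below. The function name is hypothetical, and the choice of --delete-after over plain --delete follows the rsync man page's note that deleting early can remove the very files --fuzzy would have matched. Deleting at all only makes sense if older snapshots preserve the removed messages.

```shell
#!/bin/sh
# sync_maildirs SRC DEST
# Mirror SRC into DEST, letting rsync reuse renamed messages as basis files.
sync_maildirs() {
    # --fuzzy:        for a file missing on the destination, look in the same
    #                 destination directory for a likely match to use as a basis.
    # --delete-after: remove stale names only once the transfer is done, so
    #                 they are still available as fuzzy matches.
    # RSYNC can be overridden (e.g. RSYNC=echo) to print instead of run.
    "${RSYNC:-rsync}" -a --fuzzy --delete-after "$1/" "$2/"
}
```

A call might look like "sync_maildirs mailhost:/home/jdoe/Maildir /backup/current/jdoe/Maildir", where both paths are placeholders.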


> For Sarbanes-Oxley purposes, I need to retain all messages, so
> I can't use rsync's --delete option to remove messages from the
> backups that the user had deleted.

If you have snapshots set up correctly, you can delete files from the
current backup, and they'll persist in the older snapshots.
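
One way to get that behavior, sketched here under assumptions rather than taken from anyone's actual setup, is to write each run into a fresh directory and pass rsync's --link-dest option pointing at the previous run. Unchanged files become hard links, and a message deleted on the mail server simply does not appear in the new snapshot while remaining in the older ones. The function name, directory layout, and "latest" symlink are all hypothetical.

```shell
#!/bin/sh
# snapshot_sync SRC DEST_ROOT [NAME]
# Create DEST_ROOT/NAME as a full copy of SRC, hard-linked against the
# previous snapshot. NAME defaults to a timestamp.
snapshot_sync() {
    src=$1
    mkdir -p "$2" || return 1
    # A relative --link-dest is resolved against the destination directory,
    # so work with an absolute path.
    dest_root=$(cd "$2" && pwd) || return 1
    stamp=${3:-$(date +%Y-%m-%d_%H%M%S)}

    set -- -a
    # After the first run, link unchanged files to the previous snapshot.
    if [ -d "$dest_root/latest" ]; then
        set -- "$@" "--link-dest=$dest_root/latest"
    fi

    # RSYNC can be overridden for a dry run; it defaults to plain rsync.
    "${RSYNC:-rsync}" "$@" "$src/" "$dest_root/$stamp/" || return 1

    # Repoint "latest" only after a successful transfer.
    ln -sfn "$stamp" "$dest_root/latest"
}
```

Because every snapshot starts from an empty directory, --delete is not needed in this layout. Retention then becomes a matter of deciding which snapshot directories may eventually be removed.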

  -Tom

--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/




Re: Rsync snapshots, Maildir, and Sarbanes-Oxley

by bostonlinuxandunix

On 3/14/07, Tom Metro <blu@...> wrote:
> John Abreau wrote:
> > I've been making backups by hand for now, by manually copying the
> > guest systems to an external usb drive periodically. But it would be
> > nice to somehow be able to include them in the rsync backups.
>
> Can you automate the startup and shutdown of the VMware guests? If so,
> then it seems a little bit of scripting wrapped around rsync should do it.

I found there's a command "vmware-loop" that lets me access the
guest systems' drives:

    vmware-loop -p /path/to/guest/disk.vmdk

prints a list of partitions, and

    vmware-loop -r /path/to/guest/disk.vmdk 1 /dev/nbd0

maps partition 1 to network block device 0 (/dev/nbd0). The nbd driver
must be loaded first with "modprobe nbd".

Now I need to look for an NTFS filesystem driver for CentOS 4.4.


> > As I understand it, Maildir stores
> > message metadata in each message's filename, and that suggests
> > that Maildirs could chew up a lot of space on the rsync server with
> > redundant copies of the same message in slightly different filenames.
>
> Perhaps the --fuzzy option would prove useful (quoting the man page):
>
>    --fuzzy
>       This option tells rsync that it should look for a basis file  for
>       any  destination  file  that  is  missing.  The current algorithm
>       looks in the same directory as the destination file for either  a
>       file  that  has  an  identical size and modified-time, or a simi-
>       larly-named file.  If found, rsync uses the fuzzy basis  file  to
>       try to speed up the transfer.

Yup, --fuzzy sounds like exactly what I was looking for.


> > For Sarbanes-Oxley purposes, I need to retain all messages, so
> > I can't use rsync's --delete option to remove messages from the
> > backups that the user had deleted.
>
> If you have snapshots set up correctly, you can delete files from the
> current backup, and they'll persist in the older snapshots.
>

Now that I'm aware of the --fuzzy option, the rest should be straightforward.
The rules for retaining email will be different from those for ordinary
backups, so I guess I should set up a separate instance of rsnapshot for
the email backups.
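
A separate instance mostly means a separate config file passed with "rsnapshot -c". The sketch below writes a minimal one; every host, path, and retention count in it is a placeholder, not a recommendation, and the function name is hypothetical. Newer rsnapshot releases use the keyword "retain", while older ones call it "interval". rsnapshot requires tabs between fields, which is why the file is generated with printf.

```shell
#!/bin/sh
# write_mail_rsnapshot_conf FILE
# Write a minimal, hypothetical rsnapshot config used only for mail.
write_mail_rsnapshot_conf() {
    conf=$1
    {
        printf 'config_version\t1.2\n'
        # Placeholder location, kept apart from the ordinary backups.
        printf 'snapshot_root\t/backup/mail-snapshots/\n'
        printf 'cmd_rsync\t/usr/bin/rsync\n'
        printf 'cmd_ssh\t/usr/bin/ssh\n'
        # Placeholder counts; the real numbers depend on the legal answer.
        printf 'retain\tdaily\t30\n'
        printf 'retain\tmonthly\t12\n'
        # rsnapshot defaults plus --fuzzy, with --delete-after so stale
        # names are still present while rsync looks for fuzzy matches.
        printf 'rsync_long_args\t--delete-after --numeric-ids --relative --delete-excluded --fuzzy\n'
        # Placeholder source host and path.
        printf 'backup\troot@mail.example.com:/home/\tmail/\n'
    } > "$conf"
}
```

After writing the file, "rsnapshot -c FILE configtest" checks the syntax, and "rsnapshot -c FILE daily" runs the daily level.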

The biggest problems in the Sarbanes-Oxley area are non-technical;
I can't seem to find any clear explanation of exactly what I need to
archive, and for how long. I guess I need to track down my company's
legal department and ask them to look into it.

--
John Abreau / Executive Director, Boston Linux & Unix
GnuPG KeyID: 0xD5C7B5D9 / Email: abreauj@...
GnuPG FP: 72 FB 39 4F 3C 3B D6 5B E0 C8 5A 6E F1 2C BE 99
