question about the "copy, then remove" behaviour of mv


question about the "copy, then remove" behaviour of mv

by Musaul Karim

When you are moving files with mv, the mv command appears to copy
all the files, then delete the ones at the source. This can be quite
problematic when copying large numbers of files (in my case it was
several thousand files amounting to nearly 200GB) in a quite deep file
structure from one hard drive to another.

What happened was that the target hard drive was connected via USB,
which got disconnected a few hours into the operation, and I had
around 140GB in the target hard drive, but all the files still in the
source hard drive. Needless to say it took me ages verifying what had
copied and what hadn't before continuing the whole move from a GUI.

Is there a way around it to make mv do the "copy, remove" operation
iteratively per file, rather than as a single operation for all the
files?

Re: question about the "copy, then remove" behaviour of mv

by Henrik Carlqvist-4

Musaul Karim <musaul@...> wrote:
> Is there a way around it to make mv do the "copy, remove" operation
> iteratively per file, rather than as a single operation for all the
> files?

Instead of something like

mv file*ext /new/path

You could do

find . -maxdepth 1 -name "file*ext" -exec mv {} /new/path \;

regards Henrik
--
The address in the header is only to prevent spam. My real address is:
hc3(at)poolhem.se Examples of addresses which go to spammers:
root@localhost postmaster@localhost


Re: question about the "copy, then remove" behaviour of mv

by Bob Proulx

Musaul Karim wrote:
> When you are moving files with mv, the mv command appears to copy
> all the files, then delete the ones at the source.

What version of GNU mv are you using?

  mv --version

I do not see that behavior using mv 6.10 for example.  I see mv
calling the kernel's system call rename(2) for each file.  That will
move the file immediately.

For cross-device moves, rename(2) will fail (with EXDEV), in which
case mv copies the file before unlinking the source file.  But in both
cases it is one by one and not batched up all at once later.

This behavior might be different in different versions of the command.
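
A rough way to predict which of the two paths mv will take is to compare
the device numbers of the source and the target. This is only a sketch:
it assumes GNU stat (for -c %d), same_device is a made-up helper name,
and it is an approximation, since rename(2) can also fail across two
mount points of the same device.

```shell
#!/bin/sh
# same_device A B (hypothetical helper, assumes GNU stat):
# returns 0 if A and B report the same device number,
# 1 if they differ, 2 if either path cannot be examined.
same_device() {
    a=$(stat -c %d -- "$1") || return 2
    b=$(stat -c %d -- "$2") || return 2
    [ "$a" = "$b" ]
}

# Example use: guess whether mv can simply rename, or must copy and unlink.
if same_device . /tmp; then
    echo "same device: mv can probably use rename(2)"
else
    echo "different device (or error): expect copy, then unlink"
fi
```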

> Is there a way around it to make mv do the "copy, remove" operation
> iteratively per file, rather than as a single operation for all the
> files?

As far as I can tell this is the default behavior.  You can see what
is happening with your mv command by using the strace command to
trace the system calls.  For example:

  strace -o /tmp/mv.strace mv dir1/somefiles* dir2/
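
The trace file is long, so it helps to filter it down to the calls that
matter here. The following is a minimal sketch; trace_moves is a made-up
name, and the list of system calls is an assumption about what mv may
use (it varies between versions and platforms).

```shell
#!/bin/sh
# trace_moves FILE (hypothetical helper): print only the rename, unlink
# and rmdir style calls from an strace output file, in the order they ran.
# The syscall names listed are an assumption; extend the list as needed.
trace_moves() {
    grep -E '^(rename|renameat|renameat2|unlink|unlinkat|rmdir)\(' -- "$1"
}

# Example use, after running the strace command above:
#   trace_moves /tmp/mv.strace
```

If the unlink calls all appear at the end rather than interleaved with the
copies, the removal is being deferred.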

Bob



Re: question about the "copy, then remove" behaviour of mv

by Musaul Karim

Hi Bob, my coreutils version is 6.10 as well. This is in xubuntu 9.04.

When I run ...

  $ mv -v 1 /var/tmp/

... on a two-level-deep directory structure with files file1 to file5,
I get the following output, which seems consistent with what
happened earlier.

`1' -> `/var/tmp/1'
`1/src' -> `/var/tmp/1/src'
`1/src/file3' -> `/var/tmp/1/src/file3'
`1/src/file1' -> `/var/tmp/1/src/file1'
`1/src/file5' -> `/var/tmp/1/src/file5'
`1/src/file4' -> `/var/tmp/1/src/file4'
`1/src/file2' -> `/var/tmp/1/src/file2'
removed `1/src/file3'
removed `1/src/file1'
removed `1/src/file5'
removed `1/src/file4'
removed `1/src/file2'
removed directory: `1/src'
removed directory: `1'


I'll try out strace later to see if it is in fact waiting for all
files to be copied before removing them.


Thanks Henrik. I'll try the 'find' way the next time I need to copy
large numbers of files.

Re: question about the "copy, then remove" behaviour of mv

by Henrik Carlqvist-4

Musaul Karim <musaul@...> wrote:
> When I run ...
>
>   $ mv -v 1 /var/tmp/
>
> ... on a two level deep directory structure with files file1 to 5

> I'll try the 'find' way the next time I need to copy
> large numbers of files.

Ouch, I got the impression that you were only moving files, not
recursively moving directories. The solution I gave with find will not
help, as it only runs mv once per top-level entry, so each directory is
still moved as a whole.

To get the behaviour you want it might be easier to use tar:

tar --remove-files -cf - . | ( cd /new/path ; tar -xvf - )

Tar will give you some ugly error messages about directories changing and
about being unable to remove the current directory, but at least the files
will be copied and removed in the order that you want.
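
An alternative without a pipe between the copy and the removal is to let
find walk the tree and call mv once per file. This is a sketch only: it
assumes GNU find (for -mindepth, -empty and -delete), an already existing
target directory, and move_tree is a made-up name.

```shell
#!/bin/sh
# move_tree SRC DST (hypothetical helper, assumes GNU find):
# move everything under SRC into the existing directory DST one file at
# a time, so each file is copied and removed before the next one starts.
move_tree() {
    # Resolve DST to an absolute path, because we cd into SRC below.
    dst=$(cd "$2" && pwd) || return 1
    (
        cd "$1" || exit 1
        # Every non-directory entry gets its own mv call; the target
        # directory is created first if it does not exist yet.
        find . ! -type d -exec sh -c '
            dst=$1; shift
            for f do
                mkdir -p "$dst/$(dirname "$f")" && mv -- "$f" "$dst/$f"
            done' sh "$dst" {} +
        # Remove the directories that are now empty (SRC itself is kept).
        find . -mindepth 1 -type d -empty -delete
    )
}

# Example use:
#   move_tree /old/path /new/path
```

If the target disappears halfway, files that were already moved are gone
from the source and the rest are untouched, so the command can simply be
run again.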

regards Henrik


Re: question about the "copy, then remove" behaviour of mv

by Bob Proulx

Musaul Karim wrote:
> ... on a two level deep directory structure with files file1 to 5
> gives me the following output, which seems consistent with what
> happened earlier.

First, for shallow copies I see it interlaced as normal.

  $ mv -v /dev/shm/2/* /tmp/a/1/
  `/dev/shm/2/foo1' -> `/tmp/a/1/foo1'
  removed `/dev/shm/2/foo1'
  `/dev/shm/2/foo2' -> `/tmp/a/1/foo2'
  removed `/dev/shm/2/foo2'
  `/dev/shm/2/foo3' -> `/tmp/a/1/foo3'
  removed `/dev/shm/2/foo3'

But for the deeper copies that you pointed out I do see what you are
reporting as well.  Thanks for persevering.  I hadn't considered the
case you were bringing up.

  $ mv -v 1/2 /dev/shm/
  `1/2' -> `/dev/shm/2'
  `1/2/foo1' -> `/dev/shm/2/foo1'
  `1/2/foo2' -> `/dev/shm/2/foo2'
  `1/2/foo3' -> `/dev/shm/2/foo3'
  removed `1/2/foo1'
  removed `1/2/foo2'
  removed `1/2/foo3'
  removed directory: `1/2'

Hmm...  I can only agree that it doesn't seem the best behavior.  I
haven't looked at the code but I can guess that it is looping through
the arguments in the shallow case and recursively descending down
arguments in the deep case.  I am sure that accounts for why the two
cases have different behavior.

If you feel motivated I suggest raising this issue on the
bug-coreutils mailing list.  It seems there would be room for
improvement in the deep hierarchy case.  It may be reasonable to
reorganize the code to remove files earlier even in the recursive
case.

Bob



Re: question about the "copy, then remove" behaviour of mv

by Bob Proulx

Henrik Carlqvist wrote:
> To get the behaviour you want it might be easier to use tar:
>
> tar --remove-files -cf - . | ( cd /new/path ; tar -xvf - )
>
> Tar will give you some ugly error messages about directories changing and
> about being unable to remove the current directory, but at least the files
> will be copied and removed in the order that you want.

That tar command feels scary to me in the face of crashes.  The
original message brought up the problem of disconnecting a USB drive
while in the middle of a copy.  In the case of using mv if the copy
fails then the source will not be removed.  Reattaching the drive and
then restarting the mv command is possible.

In the tar --remove-files case the data may still be in transit.  Some
files will be in the pipeline between the commands but will have been
removed from the filesystem already by the first tar before the second
tar can write the data to disk.  A failure during the copy risks
losing the file in the pipeline.

Bob



Re: question about the "copy, then remove" behaviour of mv

by Henrik Carlqvist-4

Bob Proulx <bob@...> wrote:
> That tar command feels scary to me in the face of crashes.

Yes, you are right, files can be lost in the pipe before being flushed to
disk.

> The original message brought up the problem of disconnecting a USB
> drive while in the middle of a copy.  In the case of using mv if the
> copy fails then the source will not be removed.  Reattaching the drive
> and then restarting the mv command is possible.

Yes, but disconnecting a mounted USB drive is never a good idea. You might
still lose some files not flushed to the disc but residing in the cache
handled by the OS. Even files written might be lost because of file system
crashes.  Even some journaling file systems might lose file contents in
these circumstances: with only metadata journaled, you might only be
guaranteed that the file system itself will not be broken.

Maybe the OP ought to fix the problem of the USB disk being disconnected
during file transfers instead of changing the way files are transferred.

regards Henrik


Re: question about the "copy, then remove" behaviour of mv

by Bob Proulx

Henrik Carlqvist wrote:

> Bob Proulx wrote:
> > The original message brought up the problem of disconnecting a USB
> > drive while in the middle of a copy.  In the case of using mv if the
> > copy fails then the source will not be removed.  Reattaching the drive
> > and then restarting the mv command is possible.
>
> Yes, but disconnecting a mounted USB drive is never a good idea. You might
> still lose some files not flushed to the disc but residing in the cache
> handled by the OS. Even files written might be lost because of file system
> crashes.  Even some journaling file systems might lose file contents in
> these circumstances: with only metadata journaled, you might only be
> guaranteed that the file system itself will not be broken.

Yes, you are right.  It is possible to lose data in that case too.

> Maybe the OP ought to fix the problem of the USB disk being disconnected
> during file transfers instead of changing the way files are transferred.

Yes, but accidents do happen.  Between acts of dog and other hazards,
sometimes programming extra defensively is a good thing.

Personally when moving a large collection of files from one device to
another that is going to take a long time I use rsync to copy all of
the data first.  The rsync command can be stopped and restarted many
times very efficiently because it will avoid copying what isn't
needed.  It can clean up after a previous invocation that left things in
a corrupted state.  After everything is verified to be good, I then
remove the source files.
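
That workflow could look roughly like the sketch below. It assumes rsync
is installed; copy_then_remove is a made-up name, and the second,
checksum-based dry run is just one possible way to do the "verify" step.

```shell
#!/bin/sh
# copy_then_remove SRC DST (hypothetical helper, assumes rsync):
# copy SRC into DST, verify with a checksum dry run, and only then
# remove SRC. Safe to interrupt and rerun before the final rm.
copy_then_remove() {
    src=$1
    dst=$2
    # Pass 1: copy. Rerunning after an interruption skips finished files.
    rsync -a -- "$src"/ "$dst"/ || return 1
    # Pass 2: compare by checksum without changing anything. With
    # --itemize-changes, rsync lists only items that still differ.
    changes=$(rsync -a --checksum --dry-run --itemize-changes \
        -- "$src"/ "$dst"/) || return 1
    if [ -n "$changes" ]; then
        printf 'differences remain, source kept:\n%s\n' "$changes" >&2
        return 1
    fi
    # Only now is the source removed.
    rm -rf -- "$src"
}

# Example use:
#   copy_then_remove /old/path /new/path
```

Keeping the removal as a separate last step means a disconnect at any
earlier point leaves the source complete.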

Bob



Re: question about the "copy, then remove" behaviour of mv

by Henrik Carlqvist-4

Bob Proulx <bob@...> wrote:
> I use rsync to copy all of the data first.

> After everything is verified to be good then remove the source files.

Yes, rsync is really great and the above solution is probably the best for
situations like this.

regards Henrik