How to determine if a file is in use


by Donald Russell

Another system uses FTP to drop files in a directory for me to process.
I have a bash script to process the incoming files. The script is started by cron periodically.

There's a problem if the FTP transfer is still in progress because the process begins reading the file even though it isn't complete yet.

From a bash script, is there a way to tell if the file is still being written to?
I was looking at the lsof command, which will tell me if the file is opened or not, so that's a possibility... but it sure seems awkward for the task.

I could also configure the ftp server to lock files being written, but that seems to be discouraged. (based on man vsftpd.conf)

Basically, what I want is something like
Can I get an exclusive read on file x?
No - skip that file, go onto the next one
Yes - start processing that file
(I'm not concerned about the possible race condition there... I have other protections for that)

Thanks for any suggestions...


 

--
fedora-list mailing list
fedora-list@...
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines

Re: How to determine if a file is in use

by Patrick O'Callaghan

On Tue, 2009-11-03 at 13:31 -0800, Donald Russell wrote:

> Another system uses FTP to drop files in a directory for me to
> process.
> I have a bash script to process the incoming files. The script is
> started by cron periodically.
>
> There's a problem if the FTP transfer is still in progress because the
> process begins reading the file even though it isn't complete yet.
>
> From a bash script, is there a way to tell if the file is still being
> written to?
> I was looking at the lsof command, which will tell me if the file is
> opened or not, so that's a possibility... but it sure seems awkward
> for the task.

Not really. Since you know that the FTP daemon is the only potential
writer for the file, you can use

        lsof -p <daemon-pid> | grep <filename>

> I could also configure the ftp server to lock files being written, but
> that seems to be discouraged. (based on man vsftpd.conf)

inotify(7) could do the job, but would require some programming.

poc


Re: How to determine if a file is in use

by Rick Stevens

Donald Russell wrote:

> Another system uses FTP to drop files in a directory for me to process.
> I have a bash script to process the incoming files. The script is started by
> cron periodically.
>
> There's a problem if the FTP transfer is still in progress because the
> process begins reading the file even though it isn't complete yet.
>
> From a bash script, is there a way to tell if the file is still being
> written to?
> I was looking at the lsof command, which will tell me if the file is opened
> or not, so that's a possibility... but it sure seems awkward for the task.
>
> I could also configure the ftp server to lock files being written, but that
> seems to be discouraged. (based on man vsftpd.conf)
>
> Basically, what I want is something like
> Can I get an exclusive read on file x?
> No - skip that file, go onto the next one
> Yes - start processing that file
> (I'm not concerned about the possible race condition there... I have other
> protections for that)
>
> Thanks for any suggestions...

The "lsof(1)" command can tell you if a file is open or in use by some
process, but it is not atomic.

Lock files are useful, but can cause problems if, say, the process that
created the lockfile dies for some reason without removing it.

IIRC, vsftpd creates an exclusive write lock on files that are being
created.  That, or it creates a temp file and when complete, renames it.
Can't recall...it's been a while since I went trudging through the
source.

You can try to use flock(1) to get locks on files in shells.  See the
man page for it for suggested uses.  I'd suggest using exclusive locks
rather than shared ones.  Keep in mind that these locks are advisory:
they only help if the writer takes a lock on the file as well.
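
A minimal sketch of the flock(1) approach, assuming the util-linux flock
command is installed and that the writer also takes a flock-style lock on
the file. The function name process_if_unlocked is hypothetical. It tries
to take an exclusive lock without waiting and runs the given command on
the file only if the lock was obtained.

```shell
#!/bin/bash
# Hypothetical helper: run a command on a file only if an exclusive
# flock(2) lock can be taken immediately; otherwise skip the file.
# Assumption: the writer locks the same file with flock. The lock is
# advisory, so a writer that does not lock is not detected.
process_if_unlocked() {
    local file=$1
    shift
    (
        # -n: fail at once instead of waiting; -x: exclusive lock on fd 9
        flock -n -x 9 || exit 1
        # Lock held: run the command with the file as its last argument.
        "$@" "$file"
    ) 9<"$file"
}
```

For example, "process_if_unlocked /path/to/file cat" prints the file only
when nobody else holds a lock on it, and returns non-zero otherwise.
Whether vsftpd's upload lock is visible to flock(1) is not established in
this thread; on Linux, flock(2) and fcntl(2) locks are generally
independent of each other, so this is worth testing before relying on it.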
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer                      ricks@... -
- AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
-                                                                    -
-           This message printed using recycled bandwidth            -
----------------------------------------------------------------------


Re: How to determine if a file is in use

by Christopher K. Johnson

Donald Russell wrote:
> Another system uses FTP to drop files in a directory for me to process.
> I have a bash script to process the incoming files. The script is
> started by cron periodically.
>
> There's a problem if the FTP transfer is still in progress because the
> process begins reading the file even though it isn't complete yet.

Do you have control of the FTP procedure that drops the files?  If so,
transfer the files under a temporary filename, and when the transfer is
complete, use ftp to rename the file.  The rename is atomic.  e.g.:

    put foo.bar foo.bar.xfer
    rename foo.bar.xfer foo.bar

Then have the cron job only process files without .xfer appended to the
name.
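
On the receiving side, the filter could look something like the sketch
below. The .xfer suffix is the one from the example above; the function
name list_completed is hypothetical, and the sketch assumes all uploads
land directly in one directory.

```shell
#!/bin/bash
# Print the completed files in a directory, one per line, skipping
# anything that still carries the temporary .xfer suffix.
list_completed() {
    local dir=$1 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip directories and an unmatched glob
        case $f in
            *.xfer) continue ;;      # upload still in progress
        esac
        printf '%s\n' "$f"
    done
}
```

The cron job would then loop over the output of list_completed instead of
over every file in the directory.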

Chris


Re: How to determine if a file is in use

by Richard England

On 11/03/2009 01:31 PM, Donald Russell wrote:

> Another system uses FTP to drop files in a directory for me to process.
> I have a bash script to process the incoming files. The script is
> started by cron periodically.
>
> There's a problem if the FTP transfer is still in progress because the
> process begins reading the file even though it isn't complete yet.
>
> From a bash script, is there a way to tell if the file is still being
> written to?
> I was looking at the lsof command, which will tell me if the file is
> opened or not, so that's a possibility... but it sure seems awkward
> for the task.
>
> I could also configure the ftp server to lock files being written, but
> that seems to be discouraged. (based on man vsftpd.conf)
>
> Basically, what I want is something like
> Can I get an exclusive read on file x?
> No - skip that file, go onto the next one
> Yes - start processing that file
> (I'm not concerned about the possible race condition there... I have
> other protections for that)
>
> Thanks for any suggestions...
>
>
>

Perhaps "fuser" might be of use?

--
------------------------------------------------------------------------
/~~R/


Re: How to determine if a file is in use

by Patrick O'Callaghan

On Tue, 2009-11-03 at 19:23 -0800, Richard England wrote:

> On 11/03/2009 01:31 PM, Donald Russell wrote:
> > Another system uses FTP to drop files in a directory for me to process.
> > I have a bash script to process the incoming files. The script is
> > started by cron periodically.
> >
> > There's a problem if the FTP transfer is still in progress because the
> > process begins reading the file even though it isn't complete yet.
> >
> > From a bash script, is there a way to tell if the file is still being
> > written to?
> > I was looking at the lsof command, which will tell me if the file is
> > opened or not, so that's a possibility... but it sure seems awkward
> > for the task.
> >
> > I could also configure the ftp server to lock files being written, but
> > that seems to be discouraged. (based on man vsftpd.conf)
> >
> > Basically, what I want is something like
> > Can I get an exclusive read on file x?
> > No - skip that file, go onto the next one
> > Yes - start processing that file
> > (I'm not concerned about the possible race condition there... I have
> > other protections for that)
> >
> > Thanks for any suggestions...
> >
> >
> >
>
> Perhaps "fuser" might be of use?

Duh! I had a nagging feeling that lsof wasn't the best answer, it's just
that I've been obsessing about it recently :-)

File locking is unimportant for the OP's application. If fuser says the
file's still in use, just postpone for another cycle. Presumably the
same file isn't going to be overwritten by another ftp process before it
has a chance to be read.
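
A rough sketch of that "postpone for another cycle" idea, assuming fuser
from psmisc is installed and that the cron job runs with enough privilege
to see the FTP server's processes. The function names file_in_use and
process_idle_files are hypothetical.

```shell
#!/bin/bash
# Succeed (exit 0) if some process currently has the file open.
# fuser -s prints nothing and exits 0 when at least one process is
# using the file, non-zero when none is (or on an error).
file_in_use() {
    fuser -s "$1" 2>/dev/null
}

# Run a command on every regular file in a directory that is not in use.
# Files that are still open are left for the next cron cycle.
process_idle_files() {
    local dir=$1 f
    shift
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if file_in_use "$f"; then
            continue                 # still being written; postpone
        fi
        "$@" "$f"
    done
}
```

Note that this treats any fuser failure, including fuser not being
installed, as "not in use", so a real script would probably want to check
for the command first.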

poc


Re: How to determine if a file is in use

by Cameron Simpson

On 03Nov2009 13:31, Donald Russell <russell.don@...> wrote:
| Another system uses FTP to drop files in a directory for me to process.
| I have a bash script to process the incoming files. The script is started by
| cron periodically.
|
| There's a problem if the FTP transfer is still in progress because the
| process begins reading the file even though it isn't complete yet.

I liked the upload-then-rename suggested by another poster, if you can
get this implemented.

Otherwise...

[...]
| I could also configure the ftp server to lock files being written, but that
| seems to be discouraged. (based on man vsftpd.conf)

It's not discouraged for any reason that seems to match your use case.
You've got a well defined upload area and no malicious users.
Use the lock facility! That's what it's for!

| Basically, what I want is something like
| Can I get an exclusive read on file x?
| No - skip that file, go onto the next one
| Yes - start processing that file

Do it! See above! Have you tried it?

Cheers,
--
Cameron Simpson <cs@...> DoD#743
http://www.cskk.ezoshosting.com/cs/

Carpe Daemon - Seize the Background Process
        - Paul Tomblin <ab401@...>


Re: How to determine if a file is in use

by Donald Russell



On Tue, Nov 3, 2009 at 20:49, Cameron Simpson <cs@...> wrote:
> On 03Nov2009 13:31, Donald Russell <russell.don@...> wrote:
> | Another system uses FTP to drop files in a directory for me to process.
> | I have a bash script to process the incoming files. The script is started by
> | cron periodically.
> |
> | There's a problem if the FTP transfer is still in progress because the
> | process begins reading the file even though it isn't complete yet.
>
> I liked the upload-then-rename suggested by another poster, if you can
> get this implemented.
>
> Otherwise...
>
> [...]
> | I could also configure the ftp server to lock files being written, but that
> | seems to be discouraged. (based on man vsftpd.conf)
>
> It's not discouraged for any reason that seems to match your use case.
> You've got a well defined upload area and no malicious users.
> Use the lock facility! That's what it's for!
>
> | Basically, what I want is something like
> | Can I get an exclusive read on file x?
> | No - skip that file, go onto the next one
> | Yes - start processing that file
>
> Do it! See above! Have you tried it?
>
> Cheers,



Thank you all for some great suggestions.... :-)

Based on the feedback I've received, I'm going to ...

1 - configure vsftpd to lock files while writing (no malicious users etc)
2 - use ftp put/rename, like
      put foo.bar ftp-in-progress.foo.bar
      rename ftp-in-progress.foo.bar foo.bar
    because it provides such a great "visual" for watchers, and a
    convenient way to tell which files are "in transit" and which are
    complete.
3 - use lockfile/fuser to ensure my cron job doesn't start processing a file that's already being read by an earlier cron job.
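
The cron side of that plan might be sketched roughly as follows. This is
only an illustration: instead of lockfile(1) it uses bash's noclobber
option to create the lock file atomically, and the lock file name
.cron.lock and the function names are hypothetical. The ftp-in-progress.
prefix is the one from step 2.

```shell
#!/bin/bash
# Take a lock by creating the file atomically. With noclobber set, the
# redirection fails if the file already exists, i.e. an earlier run of
# the cron job is still active (or died without cleaning up).
acquire_lock() {
    ( set -o noclobber; echo "$$" >"$1" ) 2>/dev/null
}

# Process every completed file in a directory, at most one run at a time.
run_once() {
    local dir=$1 lock="$1/.cron.lock" f
    shift
    acquire_lock "$lock" || return 0     # earlier job still running; skip
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case ${f##*/} in
            ftp-in-progress.*) continue ;;   # still in transit
        esac
        "$@" "$f"
    done
    rm -f "$lock"
}
```

A stale lock file left behind by a crashed run would block later runs
until it is removed by hand, which is the weakness of plain lock files
mentioned earlier in the thread.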

Cheers


