Trash spec: caching the size of the trash directory

View: New views
11 Messages — Rating Filter:   Alert me  

Trash spec: caching the size of the trash directory

by Bugzilla from faure@kde.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

In order to implement the long-requested feature "automatically remove
the oldest (or biggest) items from the trash when it reaches a certain size",
we have the problem of constantly having to determine the size of the trash
(i.e. the size of the N trash directories).

This is really easy to solve with a cache (the value written into a file)
if we can agree to put that into the spec, so that all implementations of the
trash spec update that cached size value.

In order to keep that file out of the way of the info and files subdirs, I
suggest to put it at the same level as those two, i.e. at the root of each
trash directory.

The file could be named "metadata" (generic name so that we can add
more such info later), and I would suggest an INI format, much like
the trash info files.

[Cached]
Size=130460

(in bytes of course).
The group name indicates clearly that this is a cache, i.e. if it's not there
it can always be calculated the slow way -- but the goal is to not have
to do the recursive directory listing before each trash operation, that's
just too slow.

And then it's up to implementations to update that value, when
adding a file to the trash, restoring a trashed file, deleting a trashed file,
and when emptying the trash.

Is this OK with other implementors of the trash spec? Can it be added
to the spec?

--
David Faure, faure@..., sponsored by Qt Software @ Nokia to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by PCMan-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

If this is going to be supported, I'd suggest that we also move all of
the information in *.trashinfo files to this metadata file to reduce
excessive I/O, disk space waste, and performance penalty needed to
maintain the whole trash bin. In this way every time we changed
something in the trash bin, only one metadata file needs to be
updated.

A potential problem of your proposal is, it's not backward compatible.
A file trashed with an older file manager won't update this file. How
to solve this?

On Fri, Aug 14, 2009 at 4:17 AM, David Faure<faure@...> wrote:

> In order to implement the long-requested feature "automatically remove
> the oldest (or biggest) items from the trash when it reaches a certain size",
> we have the problem of constantly having to determine the size of the trash
> (i.e. the size of the N trash directories).
>
> This is really easy to solve with a cache (the value written into a file)
> if we can agree to put that into the spec, so that all implementations of the
> trash spec update that cached size value.
>
> In order to keep that file out of the way of the info and files subdirs, I
> suggest to put it at the same level as those two, i.e. at the root of each
> trash directory.
>
> The file could be named "metadata" (generic name so that we can add
> more such info later), and I would suggest an INI format, much like
> the trash info files.
>
> [Cached]
> Size=130460
>
> (in bytes of course).
> The group name indicates clearly that this is a cache, i.e. if it's not there
> it can always be calculated the slow way -- but the goal is to not have
> to do the recursive directory listing before each trash operation, that's
> just too slow.
>
> And then it's up to implementations to update that value, when
> adding a file to the trash, restoring a trashed file, deleting a trashed file,
> and when emptying the trash.
>
> Is this OK with other implementors of the trash spec? Can it be added
> to the spec?
>
> --
> David Faure, faure@..., sponsored by Qt Software @ Nokia to work on KDE,
> Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
> _______________________________________________
> xdg mailing list
> xdg@...
> http://lists.freedesktop.org/mailman/listinfo/xdg
>
_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Patryk Zawadzki :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Aug 14, 2009 at 7:24 AM, PCMan<pcman.tw@...> wrote:
> A potential problem of your proposal is, it's not backward compatible.
> A file trashed with an older file manager won't update this file. How
> to solve this?

A conforming implementation might just check mtimes to decide if the
cache is out of date.

--
Patryk Zawadzki
_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Alexander Larsson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, 2009-08-14 at 13:24 +0800, PCMan wrote:
> If this is going to be supported, I'd suggest that we also move all of
> the information in *.trashinfo files to this metadata file to reduce
> excessive I/O, disk space waste, and performance penalty needed to
> maintain the whole trash bin. In this way every time we changed
> something in the trash bin, only one metadata file needs to be
> updated.

The trashinfo file creation is used to get atomicity when picking a
in-trash filename. This cannot be done with a single file and will risk
data loss in some race conditions.


_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Alexander Larsson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, 2009-08-13 at 22:17 +0200, David Faure wrote:
> And then it's up to implementations to update that value, when
> adding a file to the trash, restoring a trashed file, deleting a trashed file,
> and when emptying the trash.
>
> Is this OK with other implementors of the trash spec? Can it be added
> to the spec?

I'm a bit concerned about the performance costs of this. It means that
you need to do a deep recursion into a folder counting its size when it
is trashed. Whereas otherwise trashing a large folder structure is O(1)
it now becomed O(number of files) and does a lot more I/O.




_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Bugzilla from faure@kde.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wednesday 19 August 2009, Alexander Larsson wrote:

> On Thu, 2009-08-13 at 22:17 +0200, David Faure wrote:
> > And then it's up to implementations to update that value, when
> > adding a file to the trash, restoring a trashed file, deleting a trashed
> > file, and when emptying the trash.
> >
> > Is this OK with other implementors of the trash spec? Can it be added
> > to the spec?
>
> I'm a bit concerned about the performance costs of this. It means that
> you need to do a deep recursion into a folder counting its size when it
> is trashed. Whereas otherwise trashing a large folder structure is O(1)
> it now becomed O(number of files) and does a lot more I/O.

Correct. However the alternative (in order to implement a max size of the
trash) is that every operation that puts a new file into the trash is O(number
of files in the trash), which is clearly a lot worse!

Do you agree that the feature "maximum size of the trash" is useful?
I know many users who do ;-). Yes, even at the expense of a bit of performance
(especially if the size-counting can be done without blocking the user
interface, as is the case for us since the actual trash operations are handled
in the kioslave which is a separate process; I think it's similar in gio?).

--
David Faure, faure@..., sponsored by Qt Software @ Nokia to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Alexander Larsson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, 2009-08-19 at 10:28 +0200, David Faure wrote:

> On Wednesday 19 August 2009, Alexander Larsson wrote:
> > On Thu, 2009-08-13 at 22:17 +0200, David Faure wrote:
> > > And then it's up to implementations to update that value, when
> > > adding a file to the trash, restoring a trashed file, deleting a trashed
> > > file, and when emptying the trash.
> > >
> > > Is this OK with other implementors of the trash spec? Can it be added
> > > to the spec?
> >
> > I'm a bit concerned about the performance costs of this. It means that
> > you need to do a deep recursion into a folder counting its size when it
> > is trashed. Whereas otherwise trashing a large folder structure is O(1)
> > it now becomed O(number of files) and does a lot more I/O.
>
> Correct. However the alternative (in order to implement a max size of the
> trash) is that every operation that puts a new file into the trash is O(number
> of files in the trash), which is clearly a lot worse!
>
> Do you agree that the feature "maximum size of the trash" is useful?
> I know many users who do ;-). Yes, even at the expense of a bit of performance
> (especially if the size-counting can be done without blocking the user
> interface, as is the case for us since the actual trash operations are handled
> in the kioslave which is a separate process; I think it's similar in gio?).

What if instead we store the recursive size of each directory that is
trashed it in the trashinfo file. Then this could be used to do a pretty
fast size count. If some app did not do this we could then fall back to
recursing into it (and perhaps even update the trashinfo file for next
time).

_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Bugzilla from faure@kde.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wednesday 19 August 2009, Alexander Larsson wrote:

> On Wed, 2009-08-19 at 10:28 +0200, David Faure wrote:
> > On Wednesday 19 August 2009, Alexander Larsson wrote:
> > > On Thu, 2009-08-13 at 22:17 +0200, David Faure wrote:
> > > > And then it's up to implementations to update that value, when
> > > > adding a file to the trash, restoring a trashed file, deleting a
> > > > trashed file, and when emptying the trash.
> > > >
> > > > Is this OK with other implementors of the trash spec? Can it be added
> > > > to the spec?
> > >
> > > I'm a bit concerned about the performance costs of this. It means that
> > > you need to do a deep recursion into a folder counting its size when it
> > > is trashed. Whereas otherwise trashing a large folder structure is O(1)
> > > it now becomed O(number of files) and does a lot more I/O.
> >
> > Correct. However the alternative (in order to implement a max size of the
> > trash) is that every operation that puts a new file into the trash is
> > O(number of files in the trash), which is clearly a lot worse!
> >
> > Do you agree that the feature "maximum size of the trash" is useful?
> > I know many users who do ;-). Yes, even at the expense of a bit of
> > performance (especially if the size-counting can be done without blocking
> > the user interface, as is the case for us since the actual trash
> > operations are handled in the kioslave which is a separate process; I
> > think it's similar in gio?).
>
> What if instead we store the recursive size of each directory that is
> trashed it in the trashinfo file. Then this could be used to do a pretty
> fast size count. If some app did not do this we could then fall back to
> recursing into it (and perhaps even update the trashinfo file for next
> time).

That would be better than nothing. But if you trash 10000 single files
at the toplevel of the trash (no directories), then the performance will be
as slow as it is today. And for those, reading the size from the .trashinfo
file is probably slower than just doing a stat() on the file itself...

--
David Faure, faure@..., sponsored by Qt Software @ Nokia to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Alexander Larsson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, 2009-08-19 at 16:30 +0200, David Faure wrote:

> On Wednesday 19 August 2009, Alexander Larsson wrote:
> > On Wed, 2009-08-19 at 10:28 +0200, David Faure wrote:
> > > On Wednesday 19 August 2009, Alexander Larsson wrote:
> > > > On Thu, 2009-08-13 at 22:17 +0200, David Faure wrote:
> > > > > And then it's up to implementations to update that value, when
> > > > > adding a file to the trash, restoring a trashed file, deleting a
> > > > > trashed file, and when emptying the trash.
> > > > >
> > > > > Is this OK with other implementors of the trash spec? Can it be added
> > > > > to the spec?
> > > >
> > > > I'm a bit concerned about the performance costs of this. It means that
> > > > you need to do a deep recursion into a folder counting its size when it
> > > > is trashed. Whereas otherwise trashing a large folder structure is O(1)
> > > > it now becomed O(number of files) and does a lot more I/O.
> > >
> > > Correct. However the alternative (in order to implement a max size of the
> > > trash) is that every operation that puts a new file into the trash is
> > > O(number of files in the trash), which is clearly a lot worse!
> > >
> > > Do you agree that the feature "maximum size of the trash" is useful?
> > > I know many users who do ;-). Yes, even at the expense of a bit of
> > > performance (especially if the size-counting can be done without blocking
> > > the user interface, as is the case for us since the actual trash
> > > operations are handled in the kioslave which is a separate process; I
> > > think it's similar in gio?).
> >
> > What if instead we store the recursive size of each directory that is
> > trashed it in the trashinfo file. Then this could be used to do a pretty
> > fast size count. If some app did not do this we could then fall back to
> > recursing into it (and perhaps even update the trashinfo file for next
> > time).
>
> That would be better than nothing. But if you trash 10000 single files
> at the toplevel of the trash (no directories), then the performance will be
> as slow as it is today. And for those, reading the size from the .trashinfo
> file is probably slower than just doing a stat() on the file itself...

Yeah, the size attribute would only make sense for directories.

I'm a bit scared about doing incremental updates of the cached total
value. Say you trash a one meg file, you read the cache file and take
the total there, add one meg and then write it back. This is ripe with
race conditions for concurrent trashes, and if not every app that
trashes or deletes trashed files get things exactly correct this
incremental change could go out of sync with reality.

For instance, starting from a trash with a single file of 100 bytes, if
some app trashed a file of 50 bytes and forgot to update the size cache,
then the user removed the initial file from the trash. Now the cache
file says 0 bytes and the mtime of the cache file is later than any
other file in the trash, so this can not be detected.


_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Bugzilla from faure@kde.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Friday 21 August 2009, Alexander Larsson wrote:
> I'm a bit scared about doing incremental updates of the cached total
> value. Say you trash a one meg file, you read the cache file and take
> the total there, add one meg and then write it back. This is ripe with
> race conditions for concurrent trashes

Indeed. In fact, we just implemented a mechanism to avoid races between
trash processes. Rather than lock files, which can become stale easily,
we had the idea of using dbus registration, which has the feature of ensuring
that only one process can get hold of a given service name at a given time,
where desired (in dbus terms: "queue and allow replacement", which seems
to be flags==0 in RequestName()).

We could agree on a dbus service name like "org.freedesktop.private.lock-
trash" for this, and this way only one "trash" implementation could do stuff
at a given moment, using the standard "get lock, perform work, release lock"
mechanism to avoid the races. (the "private" in the name is so that users look
away and don't try to treat this like a publically-accessible dbus service).

http://lxr.kde.org/ident?i=KInterProcessLock for the implementation.

> [if some app] forgot to update the size cache, [...]

I know there's going to be a small transition period where not all
implementations update the size cache, but IMHO that's not critical;
the "your trash is full" warning will come up a bit too late or a bit too
early, no big deal. If in the long run all implementations implement this,
that's fine.

--
David Faure, faure@..., sponsored by Qt Software @ Nokia to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg

Re: Trash spec: caching the size of the trash directory

by Matthew Paul Thomas :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Larsson wrote on 19/08/09 08:25:

>
> On Thu, 2009-08-13 at 22:17 +0200, David Faure wrote:
>>
>> And then it's up to implementations to update that value, when
>> adding a file to the trash, restoring a trashed file, deleting a
>> trashed file, and when emptying the trash.
>...
> I'm a bit concerned about the performance costs of this. It means that
> you need to do a deep recursion into a folder counting its size when
> it is trashed. Whereas otherwise trashing a large folder structure is
> O(1) it now becomed O(number of files) and does a lot more I/O.
>...

If you're considering it specifically for the Trash, the calculation and
disposal needn't be synchronous with the trashing action. It could
happen in the background.

(I'm not sure why you're considering it specifically for the trash,
though. Knowing the size of a folder quickly is useful for many folders,
for example when burning a CD or DVD, or when copying items to a memory
stick.)

- --
Matthew Paul Thomas
http://mpt.net.nz/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkqSi/8ACgkQ6PUxNfU6ecrBVACfaEB3J+V6R4q74lWnETmwni5e
Ij4AoNOoTqUMnyunPdnKCnKs5y0ZsUdd
=PcZp
-----END PGP SIGNATURE-----

_______________________________________________
xdg mailing list
xdg@...
http://lists.freedesktop.org/mailman/listinfo/xdg