WARNING: This server is unstable and will be retired in the next days. If you want to keep this forum available, please request immediately a migration on the Nabble Support forum. Forums that don't receive any migration request will be deleted forever.

How do updates work?

View: New views
2 Messages — Rating Filter:   Alert me  

How do updates work?

by Kenneth Almquist :: Rate this Message:

| View Threaded | Show Only this Message

Some parts of this message have been removed. Learn more about Nabble's security policy.
I've been trying to understand how namespaces are supposed to support
updates.

As I understand it, a namespace advertisement contains a string called
the root, which can be used in a namespace search.  A result of a
search of a namespace may contain a string which can be used for a
subsequent search, and it looks like the search code will initiate
a search using this string automatically.

Thus one could post a file under the keyword A with next_id set
to B, and subsequently publish a file under keyword B with next_id
set to C, and finally publish a file under keyword C.  This would
allow someone to find all three files using repeated searches starting
with A.  The question is:  how does this differ from publishing all
three files under the keyword A?  In terms of speed, a single search
for A is likely to be faster than performing a series of searches
starting with A.  In terms of completeness, the answer is less clear.
Possibly a chain of searches would be more likely to locate all the
files, because each search only has to produce one result.  On the
other hand, a single failed search in the middle of chain means that
none of the files in the rest of the chain will be found, so I would
suspect that the odds of the search finding *most* of the files would
be higher with a single keyword search than with a chain once the
chain reached a certain length.

If the above mechanism provides a way to perform updates, I don't see
it.  When the ECRS paper states that, "since SBlocks are signed, it
is possible to allow updates," the point is that updating an existing
file involves deleting the old version, so we want the originator of
the content to be able to do that, without allowing anyone else to do
so.  A scheme to do this might work as follows:

Allow an SBlock to include a 16 bit version number.  (For backward
compatibility, the version number is optional, and an omitted version
number is equivalent to a version number of zero.)  The rules for
dealing with version numbers are:

1)  Any time a host sees an SBlock with a given version number, it
    gets rid of versions of the SBlock with lower version numbers.

2)  A host does not earn trust for supplying an obsolete version.
    This means that if a host is doing a search, gets one version
    of an SBlock, and then receives a later version, it should
    revoke the trust credited to the host that provided the older
    version and give it to the host that provided the newer
    version.  This rule means that there is no incentive for
    hosts to keep old versions around.

3)  If a host receives an obsolete version of an SBlock, it can
    request a refund of the trust expended on the search by
    sending a more recent version of the SBlock.  This requires
    a new message type.  This rule means that hosts which have
    cached obsolete SBlocks are likely to be provided with updated
    versions of those SBlocks.  (Otherwise, a host might cache
    an obsolete version of an SBlock indefinitely, since it would
    receive periodic search requests that matched it.)

To allow a user with a given version of a file to see if a more recent
version of the file exists, we need a variant of the keyword search
which specifies the minimum version number that is being searched for.
It would be conceptually simplest to say that a search with a version
number of N would return any version greater than or equal to N.  To
allow for reasonably efficent Bloomfield filtering, I would suggest
that a search specifying version number 0 should query the data base
for any version of the file, and a search with a nonzero version
number N should query for versions N through at least N+4.

Versioning only applies to search records.  Since DBlocks and IBlocks
may be shared by different files, I don't see any way to explicitly
get rid of the IBlocks and DBlocks from an obsolete version of a file.
Once the obsolete version of the file no longer shows up in searches,
there will be no more requests for the IBlocks and DBocks (unless
they are, in fact, shared by a different file), and they can be pushed
out of caches to make space for blocks thare are being requested.



Get free photo software from Windows Live Click here.
_______________________________________________
GNUnet-developers mailing list
GNUnet-developers@...
http://lists.gnu.org/mailman/listinfo/gnunet-developers

Re: How do updates work?

by Christian Grothoff :: Rate this Message:

| View Threaded | Show Only this Message

Am Thursday 06 August 2009 01:01:41 schrieb Kenneth Almquist:

> I've been trying to understand how namespaces are supposed to support
> updates.
>
> As I understand it, a namespace advertisement contains a string called
> the root, which can be used in a namespace search.  A result of a
> search of a namespace may contain a string which can be used for a
> subsequent search, and it looks like the search code will initiate
> a search using this string automatically.
>
> Thus one could post a file under the keyword A with next_id set
> to B, and subsequently publish a file under keyword B with next_id
> set to C, and finally publish a file under keyword C.  This would
> allow someone to find all three files using repeated searches starting
> with A.  The question is:  how does this differ from publishing all
> three files under the keyword A?

Good question.  The answer is simple: you could tell people (for example, in
an updated namespace announcement) that they should search for "B".  Then
they would never see "A".  Or you could use an informal system:

MyContent-2007  update: MyContent-2008

Someone may then guess that the next update will be "MyContent-2009" and
jump-ahead manually.  Those who don't know the system would find the 2009
version as you describe it.  With more years (or finer granularity) this may
become more useful.

> In terms of speed, a single search
> for A is likely to be faster than performing a series of searches
> starting with A.  In terms of completeness, the answer is less clear.
> Possibly a chain of searches would be more likely to locate all the
> files, because each search only has to produce one result.  On the
> other hand, a single failed search in the middle of chain means that
> none of the files in the rest of the chain will be found, so I would
> suspect that the odds of the search finding *most* of the files would
> be higher with a single keyword search than with a chain once the
> chain reached a certain length.

Well, who said you need to do chains?  The system supports publishing multiple
results under "A", so you could say that "B" updates "A" initially, and then
later you can ALSO publish that "Z" updates "A".  Then, a search would follow
both chains (presumably A->B->D->...->Z and A->Z). The real question in this
case is of course if the GUI support for this is adequate (and I'm sure the
answer is "no").

> If the above mechanism provides a way to perform updates, I don't see
> it.  When the ECRS paper states that, "since SBlocks are signed, it
> is possible to allow updates," the point is that updating an existing
> file involves deleting the old version, so we want the originator of
> the content to be able to do that, without allowing anyone else to do
> so.

Ah, well, maybe the paper is not precise here.  The idea was always to support
finding of "more recent" versions, but always without the ability to
remove "older" revisions.  We never intended to support deletions.

> A scheme to do this might work as follows:
>
> Allow an SBlock to include a 16 bit version number.  (For backward
> compatibility, the version number is optional, and an omitted version
> number is equivalent to a version number of zero.)  The rules for
> dealing with version numbers are:
>
> 1)  Any time a host sees an SBlock with a given version number, it
>     gets rid of versions of the SBlock with lower version numbers.

So the version  number would have to be stored in plaintext, which is likely
fine.  Now, we may want to allow multiple (valid/current) SBlocks per
namespace, so we would need to refine this to say "lower version numbers" for
SBlocks with the same identifier.  But identifiers are encrypted, so hosts
can not tell if two SBlocks have the same identifier => oops.

> 2)  A host does not earn trust for supplying an obsolete version.
>     This means that if a host is doing a search, gets one version
>     of an SBlock, and then receives a later version, it should
>     revoke the trust credited to the host that provided the older
>     version and give it to the host that provided the newer
>     version.  This rule means that there is no incentive for
>     hosts to keep old versions around.

Other intermediaries would most likely not be able to tell that the version is
obsolete.  Or worse: a host could deliberately keep the older versions, first
supply the oldest (cash in), then the second-oldest (cash in again), etc..  
Combined with providing low expiration times this would give peers an
incentive to keep older revisions (assuming the namespace is popular) and
hand them out first.

> 3)  If a host receives an obsolete version of an SBlock, it can
>     request a refund of the trust expended on the search by
>     sending a more recent version of the SBlock.  This requires
>     a new message type.  This rule means that hosts which have
>     cached obsolete SBlocks are likely to be provided with updated
>     versions of those SBlocks.  (Otherwise, a host might cache
>     an obsolete version of an SBlock indefinitely, since it would
>     receive periodic search requests that matched it.)

To ask for a refund, you would have to supply proof (more recent SBlock),
which would cost both peers bandwidth (more than the search query!) and IO
(need to look up the block) and, possibly worse, might allow peers to abuse
the system: the author of the namespace could just always make up a new
revision and then ask for a refund from peers storing his SBlocks. So I'm not
sure this can be done cleanly and efficiently.  Not to mention that asking
for a refund would require peers to keep a charge-history (maybe the peer
returned the SBlock without charging in the first place?), which would again
increase costs.

Finally, given that the sqstore provides long-term soft-state storage, we have
some expiration of old, obsolete versions anyway.  So I'm not convinced that
deletions in general are necessary.

> To allow a user with a given version of a file to see if a more recent
> version of the file exists, we need a variant of the keyword search
> which specifies the minimum version number that is being searched for.
> It would be conceptually simplest to say that a search with a version
> number of N would return any version greater than or equal to N.  To
> allow for reasonably efficent Bloomfield filtering, I would suggest
> that a search specifying version number 0 should query the data base
> for any version of the file, and a search with a nonzero version
> number N should query for versions N through at least N+4.

You can already do this: simply use version numbers for your identifiers.  
Use "0" for your root, then publish 1 as the first update, 2 for the second,
etc.  Then if some user wants to query for versions > N, he just has to use N
for the identifier.  So if you want a simple, linear versioning system, this
would certainly be a good identifier-convention to use.  If you want to
publish on a regular basis, using some date-format would likely be better.  
The current use of user-defined strings is in this respect a very flexible
way to choose identifiers that can be adapted (by humans, not software) to
fit many needs.  The only thing you don't get is deletion of older versions,
but as a VCS-addict I tend to think of this as a good thing.

> Versioning only applies to search records.  Since DBlocks and IBlocks
> may be shared by different files, I don't see any way to explicitly
> get rid of the IBlocks and DBlocks from an obsolete version of a file.

That's where it is nice that sqstore is soft-state: DBlocks, IBlocks, SBlocks
all will eventually be "expired" by the sqstore (and typically at the same
time).  

> Once the obsolete version of the file no longer shows up in searches,
> there will be no more requests for the IBlocks and DBocks (unless
> they are, in fact, shared by a different file), and they can be pushed
> out of caches to make space for blocks thare are being requested.

That would also be true (we expire by expiration and popularity).

Anyway, please take this just as comments and clarifications, this is a good
discussion to have at this point in time (before 0.9.0's FS APIs are too far
along...).

Christian


_______________________________________________
GNUnet-developers mailing list
GNUnet-developers@...
http://lists.gnu.org/mailman/listinfo/gnunet-developers