Alternative persistence system

by Thomas Leonard

Summary

I've spent some time using the E persistence mechanism now, and I've
found it to be clever, elegant and very hard to use for my purposes. I
ended up writing my own replacement for the timeMachine and
makeSturdyRef objects. Below is a report on the problems I had and the
solution I ended up with. Has anyone else had similar issues?


Problems with the default system

My test case is a fairly large program with many different kinds of
objects, most of which need to be made persistent and exported as
SturdyRefs. The program is divided into many modules, and plugins may
extend it. Using the default E mechanism (makeSturdyRef and
timeMachine), I found these problems:

1. Scalability: The system does not scale well. When an object's state
changes, the entire object graph has to be written out, which takes time
proportional to the number of objects in the system (roughly 10 ms per
object on my machine), not to the number of changes. Fixing this would,
presumably, require making each persistent object's representation be
independent of that of all other persistent objects. Ideally, I'd like
to save after creating every persistent object, before giving the sturdy
ref to the user. I expect to be creating many such objects per second in
some cases.

2. Redundant information: an object's portrayal must include all of its
authority. If I have 1000 "job" objects, each with access to "timer",
then the saved file will include 1000 references to "timer". This could
be fixed by referring to the "parent" object (the one that originally
created this one) instead (e.g. revive using "parent.makeJob()"), but
this still results in 1000 references to the parent. Also, this solution
conflicts with (1), since we want to make the portrayals independent,
and it requires giving the object access to its parent, which may not be
desirable from a security point of view.

3. Difficulty upgrading: If I have a saved file containing "job" objects
without timers, and I now decide that job objects should have timers,
there is no easy way to add them later (at least, not without messing
around with the surgeon's exits). Also, if I give an object access to a
subdirectory, it persists with an absolute pathname and I can't restart
in a different directory.

4. Organisation: In my systems at least, the owner of the service/vat
should be able to see the state of the system and discover all objects.
Each object is owned by some parent object, which maintains a list of
its children. The E persistence system makes it easy to have objects
which are exported and persistent but which are not owned by any object.
You would have to catch and handle exceptions very carefully to ensure
that this couldn't happen. Also, if I give makeSturdyRef to an object, I
have no control over the objects it creates. I want to group objects so
that I know where they came from and can destroy the whole group at
once.

5. Safety: Without persistence, objects accept authority but don't
generally give it out (unless that's part of their function). Making an
object persistent can be done in two ways (__optUncall and
__optSealedDispatch). The first is easy but unsafe, the second is harder
but safer (though still with some issues, as mentioned previously). A
typical programmer, not too concerned with security, has a reasonable
chance of writing a fairly secure E object that doesn't expose more
authority than it should. However, they are very likely to take the easy
and less secure option of using __optUncall for persistence.

6. Too many code paths: A persistable object must implement three code
paths: create, save and revive. Most objects are not designed for
persistence and only support the first case. An object which depends on
an unpersistable object is also unpersistable. For example, if I call
makeObject(file.deepReadOnly()) then the resulting object cannot be
persisted, because read-only files cannot be. Also, the revive operation
must be made public so that the persistence system can call it. This may
be safe, but it is not good API design as other people may start using
it by mistake.


A possible solution

While perhaps not as elegant as E's system, my replacement works for my
largish use-case and solves the above problems (while preserving the
essential property that objects can't take advantage of the persistence
system to gain authority).

Persistent objects are arranged in a tree. When saved, each node
contains the Swiss base of the object and a method call on the parent
that would re-create the object.

Each node in the tree is actually three E objects:

- A "builder" object.
- A "persistNode" object (provided by the persistence system and holding
the Swiss base).
- A "public" object, created by the builder. Holders of the SturdyRef
can call methods on this object.

The root builder object is provided by the application. All other
builders are created by their parent builders. For example, a chat
server managing chat rooms might look like this:

def makeChatServer() {
        return def chatServer {
                to makePublic(persistNode) {
                        return def chatServerPub {
                                to createChatRoom(name) {
                                        require(validRoomName(name))
                                        return persistNode.makeSturdyChild("loadChatRoom", [name])
                                }
                        }
                }

                to loadChatRoom(name) {
                        return makeChatRoom(name)
                }
        }
}

(imagine that this chat system is just a small sub-module of the main
application, without access to the surgeon, etc)

On startup, the persistence system will:

- Take the root builder (perhaps a chatServer) as input.
- Create a persistNode for it.
- Revive all saved children, by calling methods on chatServer (e.g.
"loadChatRoom").
- Create the public object (chatServer.makePublic(persistNode)) and
register it with identityMgr.

Similarly, makeChatRoom() returns a builder for chat rooms. This
builder's makePublic will be called with its own persistNode, allowing
the chat room to manage its own children (e.g. bots).
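As a rough illustration of the startup and revival steps described above, here is a minimal sketch in Python rather than E. It is not the actual implementation: PersistNode, revive, the save format and the snake_case method names are all assumptions made for the example, and a random token stands in for the Swiss base and SturdyRef.

```python
import secrets


class PersistNode:
    """Hypothetical stand-in for persistNode: holds the Swiss base and children."""

    def __init__(self, builder, swiss_base=None):
        self.builder = builder
        self.swiss_base = swiss_base or secrets.token_hex(16)
        self.children = []  # list of (method name, args, child PersistNode)
        self.public = None

    def make_sturdy_child(self, method, args):
        # Creation uses the same parent call that revival replays later.
        child = PersistNode(getattr(self.builder, method)(*args))
        child.public = child.builder.make_public(child)
        self.children.append((method, list(args), child))
        return child.swiss_base  # stands in for a SturdyRef

    def save(self):
        # Only the Swiss base and the re-creating call are stored, no authority.
        return {
            "swiss": self.swiss_base,
            "children": [
                {"call": m, "args": a, "node": c.save()} for m, a, c in self.children
            ],
        }


def revive(builder, saved):
    node = PersistNode(builder, saved["swiss"])
    for entry in saved["children"]:  # the builder exists; revive children first
        child_builder = getattr(builder, entry["call"])(*entry["args"])
        node.children.append(
            (entry["call"], entry["args"], revive(child_builder, entry["node"]))
        )
    node.public = builder.make_public(node)  # the public object comes last
    return node


class ChatRoom:
    def __init__(self, name):
        self.name = name

    def make_public(self, node):
        return {"room": self.name}


class ChatServerPub:
    def __init__(self, node):
        self._node = node

    def create_chat_room(self, name):
        return self._node.make_sturdy_child("load_chat_room", [name])


class ChatServer:
    def make_public(self, node):
        return ChatServerPub(node)

    def load_chat_room(self, name):
        return ChatRoom(name)


server = revive(ChatServer(), {"swiss": "root", "children": []})
ref = server.public.create_chat_room("lobby")
saved = server.save()
again = revive(ChatServer(), saved)
```

Because the saved data names only a method on the parent and its arguments, changing load_chat_room to pass extra authority needs no change to the saved file.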

If a chat room needs extra authority (e.g. a timer or a file for saving
the history), we don't need to change the on-disk format, just the
loadChatRoom method, e.g.

        to loadChatRoom(name) {
                return makeChatRoom(name, timer, <file:rooms>[name])
        }

We can give any authority this way, not just persistable authorities
(e.g. we could pass a verb facet or a shallow-read-only directory to
makeChatRoom, which we couldn't do with the default system).

This seems to address the points above:

1. Scalability: It seems feasible that each object can be persisted
independently (although my implementation doesn't currently do this).

2. Redundant information: We only need to persist unique information
about each object. Everything else can be calculated anew at revival
time. Saved files are smaller and easier to read.

3. Difficulty upgrading: Because the on-disk format only contains the
key information, not incidental authority, it's easy to add or remove
authority, regenerate pathnames relative to a new base, etc.

4. Organisation: Every persistent object is organised into a hierarchy.
An object without a parent cannot be represented. Destroying an object
destroys all of its descendants automatically.

5. Safety: Objects don't need to export their authority ever, and they
don't need to hold a reference to their parents.

6. Too many code paths: The creation path exercises all code (e.g.
createChatRoom() uses the persistence system to create each room object
the first time too, not just to revive them). If an object can be made
sturdy, it is very likely it will save and restore correctly too.

Finally, this system ensures that objects are revived in a predictable
order (a parent builder before its children, the parent public object
after them).


--
Dr Thomas Leonard
IT Innovation Centre
2 Venture Road
Southampton
Hampshire SO16 7NP

Tel: +44 0 23 8076 0834
Fax: +44 0 23 8076 0833
mailto:tal@...
http://www.it-innovation.soton.ac.uk 

_______________________________________________
e-lang mailing list
e-lang@...
http://www.eros-os.org/mailman/listinfo/e-lang

Re: Alternative persistence system

by Kevin Reid

On Oct 12, 2009, at 5:58, Thomas Leonard wrote:

> Summary
>
> I've spent some time using the E persistence mechanism now, and I've
> found it to be clever, elegant and very hard to use for my purposes. I
> ended up writing my own replacement for the timeMachine and
> makeSturdyRef objects. Below is a report on the problems I had and the
> solution I ended up with. Has anyone else had similar issues?
>
>
> Problems with the default system
>
> My test case is a fairly large program with many different kinds of
> objects, most of which need to be made persistent and exported as
> SturdyRefs. The program is divided into many modules, and plugins may
> extend it. Using the default E mechanism (makeSturdyRef and
> timeMachine), I found these problems:
>
> 1. Scalability: The system does not scale well. When an object's state
> changes, the entire object graph has to be written out, which takes time
> proportional to the number of objects in the system (roughly 10 ms per
> object on my machine), not to the number of changes. Fixing this would,
> presumably, require making each persistent object's representation be
> independent of that of all other persistent objects. Ideally, I'd like
> to save after creating every persistent object, before giving the sturdy
> ref to the user. I expect to be creating many such objects per second in
> some cases.

This is an efficiency problem; E-on-Java is just not very fast at  
executing E code, which includes the implementation of the persistence  
subsystem.

Due to E's requirements for consistency on revival, an entire vat  
*must* be persisted as a unit. (Of course, if an application such as  
yours has weaker requirements you can use an alternate system.)

> 2. Redundant information: an object's portrayal must include all of its
> authority. If I have 1000 "job" objects, each with access to "timer",
> then the saved file will include 1000 references to "timer". This could
> be fixed by referring to the "parent" object (the one that originally
> created this one) instead (e.g. revive using "parent.makeJob()"), but
> this still results in 1000 references to the parent. Also, this solution
> conflicts with (1), since we want to make the portrayals independent,
> and it requires giving the object access to its parent, which may not be
> desirable from a security point of view.

The repeated references are necessary to preserve capability security.  
However, if an object needs multiple authorities, say 'timer' and  
'stdout', then one thing you can do is have it persist as a reference  
to a bundle of them:

def jobAuthority { # which is an exit or gotten from some loader
   to timer() { return timer }
   to stdout() { return stdout }
}

> 3. Difficulty upgrading: If I have a saved file containing "job" objects
> without timers, and I now decide that job objects should have timers,
> there is no easy way to add them later (at least, not without messing
> around with the surgeon's exits).

This is a hard problem in general, but if you use the authority bundle  
above then you can just change the bundle and every job automatically  
gets that authority when revived.
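
A minimal sketch of the bundle idea, in Python rather than E and under loud assumptions: the portrayal format, the exits table and revive_jobs are hypothetical, and SimpleNamespace stands in for the jobAuthority object. Each job persists only its own data plus the name of the bundle, so changing the bundle changes what every revived job receives.

```python
from types import SimpleNamespace


class Job:
    def __init__(self, name, authority):
        self.name = name
        self._authority = authority

    def portrayal(self):
        # One reference to the named bundle instead of one per authority.
        return {"name": self.name, "authority": "jobAuthority"}

    def can(self, power):
        return hasattr(self._authority, power)


def revive_jobs(portrayals, exits):
    # 'exits' maps exit names to live objects supplied at revival time.
    return [Job(p["name"], exits[p["authority"]]) for p in portrayals]


old_bundle = SimpleNamespace(timer="timer")
saved = [Job(f"job{i}", old_bundle).portrayal() for i in range(3)]

# Later version of the application: the bundle now also carries stdout.
new_bundle = SimpleNamespace(timer="timer", stdout="stdout")
revived = revive_jobs(saved, {"jobAuthority": new_bundle})
```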

I've also imagined having a tool to basically do robust search-and-
replace on serialized files, which would be able to handle the 'adding  
authority' problem in general.

> Also, if I give an object access to a subdirectory, it persists with  
> an absolute pathname and I can't restart in a different directory.

This is a problem in the legacy file access subsystem, not the  
persistence subsystem.

One possible solution (which *could* be built as a layer on top, or  
built in): Create "RootRelativeFile" objects with the interface

   makeRootRelativeFile(root :any, subpath :String)

such that they behave like the file root[subpath] but persist as this
representation and construct more of themselves (a membrane) when
sub-file references are retrieved from them. Then make the root-dir your
app uses a switchable object which forwards to whatever you currently
want the application root directory to be -- or, perhaps, just a graph
exit which you revive as whatever directory.
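
To make the shape of this concrete, here is a small Python sketch (not E, and not an existing library): SwitchableRoot, the portrayal tuple and the "appRoot" exit name are assumptions for illustration. Sub-file lookups return further RootRelativeFile objects, and the persisted form contains only the relative path.

```python
from pathlib import Path, PurePosixPath


class SwitchableRoot:
    """Forwards to whatever directory is currently the application root."""

    def __init__(self, path):
        self.path = Path(path)

    def switch(self, path):
        self.path = Path(path)


class RootRelativeFile:
    def __init__(self, root, subpath=""):
        self._root = root
        self._subpath = PurePosixPath(subpath)

    def __getitem__(self, name):
        # Refuse names that could climb out of the root.
        if name in ("", ".", "..") or "/" in name:
            raise ValueError(f"bad path component: {name!r}")
        # Membrane: children are root-relative too.
        return RootRelativeFile(self._root, self._subpath / name)

    def resolve(self):
        return self._root.path / self._subpath

    def portrayal(self):
        # Persist as (maker, exit name for the root, relative path).
        return ("makeRootRelativeFile", "appRoot", str(self._subpath))


root = SwitchableRoot("/srv/app")
lobby = RootRelativeFile(root)["rooms"]["lobby"]
before = lobby.resolve()
root.switch("/tmp/new")
after = lobby.resolve()
```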

Yes, this is additional complexity, but it is, I think, useful for many
applications besides yours. Realize that E's standard library is
nowhere near "complete" in having every basic capability utility one
ought to want.

> 4. Organisation: In my systems at least, the owner of the service/vat
> should be able to see the state of the system and discover all objects.
> Each object is owned by some parent object, which maintains a list of
> its children. The E persistence system makes it easy to have objects
> which are exported and persistent but which are not owned by any object.
> You would have to catch and handle exceptions very carefully to ensure
> that this couldn't happen.

This is fixable generically: write the objects so that they (have just  
enough authority to) check with their parents to make sure they are  
properly registered, and become nonfunctional if they aren't.

> Also, if I give makeSturdyRef to an object, I
> have no control over the objects it creates. I want to group objects so
> that I know where they came from and can destroy the whole group at
> once.

Follow capability practice by subdividing authority. Write a caretaker  
wrapper around makeSturdyRef which records the refs created and can be  
destroyed as a group.
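
A sketch of such a caretaker, written in Python rather than E. SturdyRefRegistry is a toy stand-in for the vat's SturdyRef table, and its make_sturdy_ref/cancel/lookup methods are assumptions, not E's real makeSturdyRef API. Python also does not enforce encapsulation the way E does, so this only shows the bookkeeping.

```python
import secrets


class SturdyRefRegistry:
    """Toy stand-in for the vat's SturdyRef table (hypothetical API)."""

    def __init__(self):
        self._table = {}

    def make_sturdy_ref(self, obj):
        token = secrets.token_hex(16)
        self._table[token] = obj
        return token

    def cancel(self, token):
        self._table.pop(token, None)

    def lookup(self, token):
        return self._table.get(token)


class GroupedRefMaker:
    """Caretaker: hand this to a module instead of the real ref maker."""

    def __init__(self, registry):
        self._registry = registry
        self._tokens = []
        self._destroyed = False

    def make_sturdy_ref(self, obj):
        if self._destroyed:
            raise RuntimeError("this group has been destroyed")
        token = self._registry.make_sturdy_ref(obj)
        self._tokens.append(token)  # remember where the ref came from
        return token

    def destroy(self):
        # Revoke the maker and every ref it ever created.
        self._destroyed = True
        for token in self._tokens:
            self._registry.cancel(token)
        self._tokens.clear()


registry = SturdyRefRegistry()
plugin_refs = GroupedRefMaker(registry)
a = plugin_refs.make_sturdy_ref("plugin object")
other = registry.make_sturdy_ref("unrelated object")
plugin_refs.destroy()
```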

> 5. Safety: Without persistence, objects accept authority but don't
> generally give it out (unless that's part of their function). Making an
> object persistent can be done in two ways (__optUncall and
> __optSealedDispatch). The first is easy but unsafe, the second is harder
> but safer (though still with some issues, as mentioned previously). A
> typical programmer, not too concerned with security, has a reasonable
> chance of writing a fairly secure E object that doesn't expose more
> authority than it should. However, they are very likely to take the easy
> and less secure option of using __optUncall for persistence.

In principle it could be reduced to one extra call with a suitable  
library:

to __optSealedDispatch(b) {
   return doPersistence(b, fn { [makeWhatever, ...] })
}

But I suspect that your hypothetical "almost knows what to do"  
programmer would fail to write secure code in other ways anyway.

> 6. Too many code paths: A persistable object must implement three code
> paths: create, save and revive. Most objects are not designed for
> persistence and only support the first case. An object which depends on
> an unpersistable object is also unpersistable. For example, if I call
> makeObject(file.deepReadOnly()) then the resulting object cannot be
> persisted, because read-only files cannot be. Also, the revive operation
> must be made public so that the persistence system can call it. This may
> be safe, but it is not good API design as other people may start using
> it by mistake.

The "revive" operation *should*, when possible, be the same as the  
"create" operation. Exceptions should be reviewed with suspicion.

Makers-for-revival *are* part of the public API because as soon as  
your app is deployed, people have saved data which uses those  
interfaces. You have to preserve compatibility or announce breakage/
support migration just like with any other public interface. (Think of  
it like ABI/"binary compatibility" in C shared libraries.)

That read-only files are unpersistable is a bug. (You can work around  
it by adding a loader to the surgeon which recognizes read-only files.)

> A possible solution
...

I suspect that your solution is, in general, able to work more  
straightforwardly *for your application* because you have additional  
constraints:

   1. Your objects are arranged in a hierarchy.

   2. You have no objects with which your application is mutually  
suspicious.

To expand on the second point, your scheme of reviving objects with  
authority based on their parents would fail dangerously if the child  
object was not actually one of yours, but something which did not have  
that authority in the previous incarnation and now gets it.

I don't know your real persistence infrastructure, so I can't say  
whether this actually makes sense, but that is the general form of my  
suspicion: that you have something which is easier to use, but either  
less powerful or unsafe-given-untrusted-code (depending on the details).

--
Kevin Reid                                  <http://switchb.org/kpreid/>





Re: Alternative persistence system

by Thomas Leonard

On Mon, 2009-10-12 at 09:25 -0400, Kevin Reid wrote:
> On Oct 12, 2009, at 5:58, Thomas Leonard wrote:
[...]

> > 1. Scalability: The system does not scale well. When an object's state
> > changes, the entire object graph has to be written out, which takes time
> > proportional to the number of objects in the system (roughly 10 ms per
> > object on my machine), not to the number of changes. Fixing this would,
> > presumably, require making each persistent object's representation be
> > independent of that of all other persistent objects. Ideally, I'd like
> > to save after creating every persistent object, before giving the sturdy
> > ref to the user. I expect to be creating many such objects per second in
> > some cases.
>
> This is an efficiency problem; E-on-Java is just not very fast at  
> executing E code, which includes the implementation of the persistence  
> subsystem.

It's not so much the speed (although that's not great), it's that it
doesn't scale up if I have e.g. thousands of objects or more (which
doesn't seem unreasonable). I'll probably want to save each object as a
row in a database at some point.

> Due to E's requirements for consistency on revival, an entire vat  
> *must* be persisted as a unit. (Of course, if an application such as  
> yours has weaker requirements you can use an alternate system.)

I had to violate this anyway, because my application allows users to
upload large files, which get stored in the file-system, not in memory.
In this case, snapshots make things worse, because an infrequent
snapshot is less likely to match the rest of the saved state.

e.g. if I revoke someone's access to a storage area and then upload
confidential data and then the server crashes, it will revive with them
still having access to the file. If saving was fast, the storage area
object could ensure that the revocation was saved before returning.

I'll probably need to put logs and usage data in the file-system too.

For some applications you may need to snapshot the whole state, but the
scheme below doesn't prevent that (and in fact the current version does
snapshot everything at once). But I think there must be a large set of
applications where this is not useful.

> > 2. Redundant information:
[...]
> The repeated references are necessary to preserve capability security.  
> However, if an object needs multiple authorities, say 'timer' and  
> 'stdout', then one thing you can do is have it persist as a reference  
> to a bundle of them:
>
> def jobAuthority { # which is an exit or gotten from some loader
>    to timer() { return timer }
>    to stdout() { return stdout }
> }

Using a loader would conflict with (1), and allowing modules to call
surgeon.addExit didn't look safe (what if one module replaces another
module's exit?).

> > 6. Too many code paths:
[...]
> The "revive" operation *should*, when possible, be the same as the  
> "create" operation. Exceptions should be reviewed with suspicion.

Anything with a default state seems to be an exception. Either you get
the user to provide your default state (as a mutable object), or you
need separate methods. Admittedly, the create operation should normally
call the revive one internally.

> Makers-for-revival *are* part of the public API because as soon as  
> your app is deployed, people have saved data which uses those  
> interfaces.

True, but if I know that the only callers are previous versions of my
own code then providing an upgrade path is easier.

> > A possible solution
> ...

Thanks for looking at this. I want to make sure I've got a reasonable
system.

> I suspect that your solution is, in general, able to work more  
> straightforwardly *for your application* because you have additional  
> constraints:
>
>    1. Your objects are arranged in a hierarchy.

Yes. Although you could regard E's current system as a special case of
this: a simple two-level hierarchy with all objects being children of
the SturdyRefMaker (in myOptSwissRetainers), and with a load function
that adds no authority.

>    2. You have no objects with which your application is mutually  
> suspicious.

I don't think that's the case, except that an object always trusts its
creator (there's not much you can do about that, after all). But objects
don't trust their children, in general, or other objects.

> To expand on the second point, your scheme of reviving objects with  
> authority based on their parents would fail dangerously if the child  
> object was not actually one of yours, but something which did not have  
> that authority in the previous incarnation and now gets it.

How can the child not be one of mine? The parent tells the persistence
system how to revive the child, e.g.

def makeChatServer(timer) {
        return def chatServer {
                to makePublic(persistNode) {
                        return def chatServerPub {
                                to createChatRoom(name) {
                                        require(validRoomName(name))
                                        return persistNode.makeSturdyChild("loadChatRoom", [name])
                                }
                        }
                }

                to loadChatRoom(name) {
                        return makeChatRoom(name, timer, <file:rooms>[name])
                }
        }
}

The only children of persistNode are those added by createChatRoom
(since that's the only thing with access to
persistNode.makeSturdyChild), and they're created by loadChatRoom.

Each chatRoom will get its own persistNode; it can't add more children
to the chatServer and it can't change the name of the load function (if
the persisted argument "name" was a mutable object then it could change
that, because we pass it to makeChatRoom).

> I don't know your real persistence infrastructure, so I can't say  
> whether this actually makes sense, but that is the general form of my  
> suspicion: that you have something which is easier to use, but either  
> less powerful or unsafe-given-untrusted-code (depending on the details).

I'd certainly like to make sure that it isn't less safe.


--
Dr Thomas Leonard
IT Innovation Centre
2 Venture Road
Southampton
Hampshire SO16 7NP

Tel: +44 0 23 8076 0834
Fax: +44 0 23 8076 0833
mailto:tal@...
http://www.it-innovation.soton.ac.uk 


Re: Alternative persistence system

by Charles Landau

Kevin Reid wrote:

> On Oct 12, 2009, at 5:58, Thomas Leonard wrote:
>> The system does not scale well. When an object's state
>> changes, the entire object graph has to be written out, which takes time
>> proportional to the number of objects in the system (roughly 10 ms per
>> object on my machine), not to the number of changes.
>
> Due to E's requirements for consistency on revival, an entire vat  
> *must* be persisted as a unit. (Of course, if an application such as  
> yours has weaker requirements you can use an alternate system.)

That does not imply that the implementation must take time proportional
to the size of the vat. KeyKOS and CapROS only write out objects that
were changed since the last checkpoint.
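
For illustration only, the general idea of writing out just the changed objects can be sketched as below, in Python. This is not how KeyKOS or CapROS are implemented; CheckpointStore and its dictionary "disk" are hypothetical, and a real system would also have to make each checkpoint atomic, which this sketch does not attempt.

```python
class CheckpointStore:
    def __init__(self):
        self.disk = {}       # stands in for stable storage
        self._live = {}      # current in-memory state
        self._dirty = set()  # keys changed since the last checkpoint

    def put(self, key, state):
        self._live[key] = state
        self._dirty.add(key)

    def checkpoint(self):
        # Cost is proportional to the number of changes, not to the vat size.
        written = len(self._dirty)
        for key in self._dirty:
            self.disk[key] = self._live[key]
        self._dirty.clear()
        return written


store = CheckpointStore()
for i in range(1000):
    store.put(f"obj{i}", {"version": 0})
first = store.checkpoint()

store.put("obj42", {"version": 1})
second = store.checkpoint()
```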

Re: Alternative persistence system

by Kevin Reid

On Oct 12, 2009, at 18:02, Charles Landau wrote:

> Kevin Reid wrote:
>> On Oct 12, 2009, at 5:58, Thomas Leonard wrote:
>>> The system does not scale well. When an object's state
>>> changes, the entire object graph has to be written out, which takes time
>>> proportional to the number of objects in the system (roughly 10 ms per
>>> object on my machine), not to the number of changes.
>>
>> Due to E's requirements for consistency on revival, an entire vat
>> *must* be persisted as a unit. (Of course, if an application such as
>> yours has weaker requirements you can use an alternate system.)
>
> That does not imply that the implementation must take time proportional
> to the size of the vat. KeyKOS and CapROS only write out objects that
> were changed since the last checkpoint.

KeyKOS and CapROS use orthogonal persistence, which greatly simplifies  
the problem.

--
Kevin Reid                                  <http://switchb.org/kpreid/>



