Memcache or ZEO cache (Re: ZEO and relstporage performance)

On Tue, Oct 13, 2009 at 8:30 PM, Shane Hathaway <shane@...> wrote:
> This leads to an interesting question. Memcached or ZEO cache--which is
> better?

For what? For RelStorage? Or for ZEO?

> While memcached has a higher minimum performance penalty, it
> also has a lower maximum penalty, since memcached hits never have to
> wait for disk.

With modern RAM configurations, it's likely that you don't wait for disk
on read access, as reads of the ZEO cache file are likely satisfied from
the operating system's disk cache. That may explain why the ZEO cache is
faster in your tests. In the speedtest, data are almost certainly read
from memory for both memcached and the ZEO cache, but memcached also has
IPC overhead.

> Also, memcached can be shared among processes,

That's certainly a big potential win.

> there is
> a large development community around memcached,

That doesn't impress me all that much in this case. The part of the ZEO
cache code that overlaps memcache is pretty simple. The most complicated
logic in the ZEO cache, which would be just as complicated with another
cache storage implementation and more complicated with a shared cache
storage, is making sure the cache doesn't have stale state. I probably
need to look at memcache again, but every time I look at it, it's not at
all clear how to prevent reading stale data as current. At some point, I
should look at the approach you took in RelStorage.

> and memcached creates
> opportunities for developers to be creative with caching strategies.

How so?

The biggest problem with the ZEO cache that I'm aware of today is that
it doesn't take access patterns into account when evicting data from the
cache. As things are now, it doesn't have very accurate information
about access patterns. The most valuable objects stay in the object
cache, so the ZEO cache rarely sees requests for them. In the future, I
plan to try modifying the cache eviction code to avoid evicting objects
that are in the object cache, although it's not at all clear how much of
a win this would be. (It would almost certainly improve startup
performance with a persistent cache.) If the cache can be made more
effective, then it can also be made smaller, lessening the benefit of a
shared cache.

The biggest problem with ZEO performance on the client side is that
reads require round trips and that generally a client thread only knows
to request one read at a time [1]_. I plan to add an API for
asynchronous reads. In rare situations in which an application knows
it's going to need more than one object, it can prefetch multiple
objects at once. (One can imagine iteration scenarios in which this
would be easy to predict.) An opportunity that this would provide would
be to pre-fetch object revisions for objects that were in the ZODB cache
and have just been invalidated.

.. [1] There's a related and fairly easy to fix problem that currently a
   ZEO client only makes one read request at a time, which hurts ZEO
   clients with multiple application threads.

--
Jim Fulton
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: Memcache or ZEO cache (Re: ZEO and relstporage performance)

Jim Fulton wrote:
> The most complicated logic in the ZEO cache, which would be just as
> complicated with another cache storage implementation and more
> complicated with a shared cache storage is making sure the cache
> doesn't have stale state. I probably need to look at memcache again,
> but every time I look at it, it's not at all clear how to prevent
> reading stale data as current. At some point, I should look at the
> approach you took In relstorage.

Indeed, that's a hard enough problem that it's making me reconsider
memcached. I suspect I could adopt the ZEO cache in RelStorage.

RelStorage currently uses memcached in a very simple way: it puts (tid,
state) in the cache for each oid. When RelStorage reads from the cache,
if it gets a state for a different transaction than the transaction ID
last polled by the storage instance, RelStorage discards the state from
the cache and falls back to the database. That means most of the cache
needs to be revalidated after every commit. :-( The strategy should
work well for databases that change rarely, but will only add overhead
for databases that change often. There is an attempt to improve the
situation with backpointers, but I doubt they actually help.

I'm tinkering with the idea that some transaction awareness could be
added to memcached. Perhaps memcached should hold a "current
transaction" value and clients should pass their own "current
transaction" value when they try to set data in the cache. If a stale
client tries to set data, memcached should ignore the attempt.

>> and memcached creates
>> opportunities for developers to be creative with caching strategies.
>
> How so?

Well, memcached has a very simple interface, so developers should be
able to craft their own memcached-like implementations. They might add
multi-level caching, for example.

> The biggest problem with ZEO performance on the client side is that
> reads require round trips and that generally a client thread only
> knows to request one read at a time [1]_. I plan to add an API for
> asynchronous reads. In rare situations in which an application knows
> it's going to need more than one object, it can prefetch multiple
> objects at once. (One can imagine iteration scenarios in which this
> would be easy to predict.) An opportunity that this would provide
> would be to pre-fetch object revisions for objects that were in the
> ZODB cache and have just been invalidated.

When ZODB is unpickling the state of an object, it often has to pull in
several objects, one at a time. I wonder if it would be valuable to
prefetch the objects that will be pulled in by the unpickling operation.
We could use the referencesf() function to get the list of OIDs.

Alternatively, I wonder if it would be valuable to store a list of
referenced OIDs in every object. We might put that list in another
pickle, placing it before the "class" and "state" pickles that we
currently store.

Shane
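The simple (tid, state) strategy described in the message above can be sketched roughly as follows. This is only an illustration, not RelStorage's code: a plain dictionary stands in for memcached, `SimpleTidCache`, `poll`, `load` and `db_reads` are hypothetical names, and it assumes the tid stored with each entry is the tid the instance had last polled when it cached the state (the message does not spell this out).

```python
class SimpleTidCache:
    """Illustrative stand-in for a per-oid (tid, state) cache."""

    def __init__(self, database):
        self.database = database  # hypothetical backend: maps oid -> state
        self.cache = {}           # a dict standing in for memcached
        self.polled_tid = None    # transaction ID last polled by this instance
        self.db_reads = 0         # counts fallbacks to the database

    def poll(self, tid):
        """Record the most recent committed transaction ID seen."""
        self.polled_tid = tid

    def load(self, oid):
        cached = self.cache.get(oid)
        if cached is not None:
            tid, state = cached
            if tid == self.polled_tid:
                return state
            # Cached for a different transaction: discard the entry.
            del self.cache[oid]
        # Fall back to the database and re-cache under the polled tid.
        self.db_reads += 1
        state = self.database[oid]
        self.cache[oid] = (self.polled_tid, state)
        return state
```

With this shape, every call to `poll()` with a new tid makes all existing entries mismatch, which is the "most of the cache needs to be revalidated after every commit" cost mentioned above.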

Re: Memcache or ZEO cache (Re: ZEO and relstporage performance)

On Wed, Oct 14, 2009 at 2:48 PM, Shane Hathaway <shane@...> wrote:
> Jim Fulton wrote:
...
>> The biggest problem with ZEO performance on the client side is that
>> reads require round trips and that generally a client thread only
>> knows to request one read at a time [1]_. I plan to add an API for
>> asynchronous reads. In rare situations in which an application knows
>> it's going to need more than one object, it can prefetch multiple
>> objects at once. (One can imagine iteration scenarios in which this
>> would be easy to predict.) An opportunity that this would provide
>> would be to pre-fetch object revisions for objects that were in the
>> ZODB cache and have just been invalidated.
>
> When ZODB is unpickling the state of an object, it often has to pull in
> several objects, one at a time.

Several object references.

> I wonder if it would be valuable to
> prefetch the objects that will be pulled in by the unpickling operation. We
> could use the referencesf() function to get the list of OIDs.

I don't think the set of objects referenced is a very good predictor of
the objects used. When loading a big container, you wouldn't want to
load all of the contained objects when you might just be traversing to
one. Similarly, an object may have subobjects that contain data that
aren't needed for the current transaction. Imagine a movie object that
you load to get at metadata and that has a blob containing the movie.
You wouldn't want to load the blob unless necessary. OTOH, it might be
very easy for an application that is going to serve the blob data to
make a call that gets the retrieval going before it needs the data.
Something like:

    def some_method(self):
        self.blob._p_preload()  # returns right away
        # ... do other things ...
        # Now do stuff with self.blob.

This will block if the blob data isn't there yet, but it will at least
have a head start.

This gets a lot more interesting if you know you're going to want more
than one object. For example, a method that knows it is going to do
something with all subobjects in a container might do something like:

    def get_it_all(self):
        for o in self.data:
            o._p_preload()  # start every fetch before using any object
        for o in self.data:
            ...  # do some work with the actual objects

> Alternatively, I wonder if it would be valuable to store a list of
> referenced OIDs in every object. We might put that list in another pickle,
> placing it before the "class" and "state" pickles that we currently store.

I don't think getting the referenced objects is all that hard. I just
don't think it's all that useful.

Jim

--
Jim Fulton
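To make the "head start" idea above concrete, here is a small self-contained sketch in plain Python. It is not ZODB or ZEO code: `PreloadableRef`, `preload`, `get` and `slow_fetch` are hypothetical names, a thread pool stands in for the asynchronous read API proposed in the thread, and `time.sleep` stands in for a server round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class PreloadableRef:
    """Hypothetical stand-in for an object whose state lives on a server."""

    _pool = ThreadPoolExecutor(max_workers=4)

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the slow round trip
        self._future = None

    def preload(self):
        """Start the fetch in the background and return right away."""
        # Not thread-safe: assumes a single application thread per ref.
        if self._future is None:
            self._future = self._pool.submit(self._fetch)

    def get(self):
        """Return the state, blocking only if it has not arrived yet."""
        self.preload()
        return self._future.result()


def slow_fetch(value, delay=0.05):
    """Build a fetch callable that simulates one round trip."""
    def fetch():
        time.sleep(delay)
        return value
    return fetch


def get_it_all(refs):
    # Issue every request first, so the round trips overlap...
    for ref in refs:
        ref.preload()
    # ...then consume the results, blocking only on stragglers.
    return [ref.get() for ref in refs]
```

For example, `get_it_all([PreloadableRef(slow_fetch(i)) for i in range(4)])` issues four simulated reads at once instead of waiting for each in turn; whether a real ZEO client would see a similar gain depends on the server and connection handling, which this sketch does not model.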

Re: Memcache or ZEO cache (Re: ZEO and relstporage performance)

Jim Fulton wrote:
> On Wed, Oct 14, 2009 at 2:48 PM, Shane Hathaway <shane@...> wrote:
>> When ZODB is unpickling the state of an object, it often has to pull in
>> several objects, one at a time.
>
> Several object references.

Ah yes, and the class info for those object references is usually
stored in the referencing pickle. I forgot that for a moment. Never
mind. :-)

Adding preloading capability to applications does sound interesting.

Shane

Re: Memcache or ZEO cache (Re: ZEO and relstporage performance)

Shane Hathaway wrote:
> RelStorage currently uses memcached in a very simple way: it puts (tid,
> state) in the cache for each oid. When RelStorage reads from the cache,
> if it gets a state for a different transaction than the transaction ID
> last polled by the storage instance, RelStorage discards the state from
> the cache and falls back to the database. That means most of the cache
> needs to be revalidated after every commit. :-( The strategy should
> work well for databases that change rarely, but will only add overhead
> for databases that change often. There is an attempt to improve the
> situation with backpointers, but I doubt they actually help.

FWIW, not long after I sent this message, I finally found a better way
to use memcached. I have implemented the new strategy on the RelStorage
trunk. I'll try to explain it briefly, but since the algorithm is new to
me, I am only beginning to learn how to explain it.

The new strategy caches object state by oid and tid, so key
":state:123:456" holds the tid and state of object 456 as it should be
seen by transaction 123. The tid in the cache value does not necessarily
match the tid in the cache key. (This part of the algorithm is obvious.)

Additionally, each storage instance now holds on to a pair of
checkpoints. (This is the new part of the algorithm.) The checkpoints
are transaction IDs. Each storage instance maintains a snapshot of the
checkpoints, the delta between the checkpoints, and the delta since the
latest checkpoint. Each delta is a simple dictionary that maps oid to
tid.

When a storage instance looks for a cached value, if the object it is
looking for is in one of the delta maps, it looks at the corresponding
cache key. If the storage instance doesn't know the tid, then the object
has presumably not changed recently, so the storage instance looks for
the object in the cache at both checkpoints (using a single lookup). If
no cached value is found there, then the storage instance gets the
object state from the database and caches it at the most recent
checkpoint.

Storage instances choose the checkpoints and make an effort to keep
their checkpoints in sync with each other, to maximize cache sharing.
The current checkpoints are stored in the key ":checkpoints" in
memcached. There are lots of conditions where the checkpoints fall out
of sync, and clients might even contend over what the checkpoints should
be, but in theory, checkpoint disagreement will only lead to cache
misses, not stale data.

This strategy relies only on the most basic guarantees that memcached
provides, so I expect it to be reliable. I also expect it to provide an
excellent cache hit rate, unless the checkpoints move too often.

I've written more notes here:

http://svn.zope.org/relstorage/trunk/notes/caching.txt?view=markup

The entire implementation is here:

http://svn.zope.org/relstorage/trunk/relstorage/cache.py?rev=105167&view=markup

I just thought you'd like to know. I think the algorithm might be
applicable elsewhere. I invite anyone to steal it if they can. :-)

Shane
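The lookup order described in the message above might be sketched like this. It is a simplified reading of the description, not the code in relstorage/cache.py: `CheckpointCache`, `note_change` and the dictionary-based `store` and `database` are hypothetical, the two checkpoint keys are probed one after the other rather than with a single multi-key memcached request, and caching a changed object under its own tid key on a miss is an assumption (the message only says what happens for unchanged objects). It also assumes each instance has polled (`note_change`) before it loads, whereas a real storage loads within a database snapshot.

```python
class CheckpointCache:
    """Sketch of one storage instance's view of a shared cache."""

    def __init__(self, store, database, cp0, cp1):
        self.store = store        # shared dict standing in for memcached
        self.database = database  # hypothetical backend: oid -> (tid, state)
        self.cp0 = cp0            # most recent checkpoint (a tid)
        self.cp1 = cp1            # older checkpoint (a tid), cp1 <= cp0
        self.delta_after0 = {}    # oid -> tid for changes after cp0
        self.delta_after1 = {}    # oid -> tid for changes in (cp1, cp0]

    @staticmethod
    def key(tid, oid):
        return ":state:%d:%d" % (tid, oid)

    def note_change(self, oid, tid):
        """Record a committed change learned while polling."""
        if tid > self.cp0:
            self.delta_after0[oid] = tid
        elif tid > self.cp1:
            self.delta_after1[oid] = tid

    def load(self, oid):
        tid = self.delta_after0.get(oid, self.delta_after1.get(oid))
        if tid is not None:
            # Changed recently: exactly one key can hold the right state.
            keys = [self.key(tid, oid)]
        else:
            # Unchanged since cp1: a value cached at either checkpoint is valid.
            keys = [self.key(self.cp0, oid), self.key(self.cp1, oid)]
        for k in keys:
            value = self.store.get(k)
            if value is not None:
                return value
        # Miss: read from the database and cache under the first key
        # (the most recent checkpoint for unchanged objects).
        value = self.database[oid]
        self.store[keys[0]] = value
        return value
```

Two instances that share `store` and agree on the checkpoints hit each other's entries; if they disagree, they simply compute different keys and miss, which matches the claim that checkpoint disagreement costs cache hits rather than correctness.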