|
View:
New views
4 Messages
—
Rating Filter:
Alert me
|
|
|
sortable serialization formatA while ago I started hacking on a serialization format that would have the same sorting properties as Erlang terms. I didn't quite get it to work (negative floats was the most difficult part), but when I returned to it today, I realized that it was only a very small problem. Once fixed, all my QuickCheck suites passed. http://svn.ulf.wiger.net/sext/trunk/sext/doc/index.html In many cases this format is slightly more compact than the External Term Format. The most notable exception is negative bignums, but large binaries also take up a little bit more space (asymptotically approaching 12.5% overhead for binaries larger than 36 bytes). (Strings are much bigger, since I don't try to detect that they are special cases of lists. This should perhaps be fixed.) You are welcome to give it a spin and report to me if you find good uses for it. The original idea was to make it easier to make use of external storage solutions with ordered set semantics, without foregoing the genericity and convenience of generic term serialization. BR, Ulf W PS Float comparisons are handled in the QuickCheck suite by coercing to IEEE 764 binary64 format. This is supposedly what Erlang uses internally, but I encountered lots of weird behaviour before doing this coercion. -- Ulf Wiger CTO, Erlang Training & Consulting Ltd http://www.erlang-consulting.com ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
|
|
|
|
|
sext and Tokyo Tyrant (Re: [erlang-questions] sortable serialization format)Ulf Wiger wrote:
> > A while ago I started hacking on a serialization format that > would have the same sorting properties as Erlang terms. > > I didn't quite get it to work (negative floats was the most > difficult part), but when I returned to it today, I realized > that it was only a very small problem. Once fixed, all my > QuickCheck suites passed. I just had to try this on Tokyo Tyrant, so I wrote a small prototype for connecting to TT and encoding a few requests, using the sext library to encode terms before sending them. I realized that a new function was needed in sext: prefix(Term), which encodes a 'prefix' that will match similar terms, and allow some wildcarding. A prefix can't be decoded (at least, I didn't write any code for doing so). Some examples: Eshell V5.7.1 (abort with ^G) 1> sext:encode({1,2,3}). <<16,0,0,0,3,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6>> 2> sext:prefix({1,'_','_'}). <<16,0,0,0,3,10,0,0,0,2>> 3> sext:encode([1,2,3]). <<17,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6,0>> 4> sext:prefix([1,2|'_']). <<17,10,0,0,0,2,10,0,0,0,4>> Armed with this, I opened a B-tree table in Tokyo Tyrant, and connected to it with my prototype module. Eshell V5.7.1 (abort with ^G) 1> {ok,TT} = tt_proto:open(tt,[]). {ok,<0.35.0>} 2> tt_proto:put(TT,{1,a}, 1). ok 3> tt_proto:get(TT,{1,a}). {ok,1} 4> tt_proto:put(TT,{1,b}, 2). ok 5> tt_proto:put(TT,{1,c}, 3). ok 6> tt_proto:put(TT,{2,a}, 4). ok Now, for some prefix matching: 7> tt_proto:keys(TT,{1,'_'}). {ok,[{1,a},{1,b},{1,c}]} 8> tt_proto:keys(TT,{2,'_'}). {ok,[{2,a}]} 9> timer:tc(tt_proto,keys,[TT,{1,'_'}]). {279,{ok,[{1,a},{1,b},{1,c}]}} I made no real effort to optimize anything. The module starts a gen_server which keeps a connection open to ttserver. It handles only one query at a time, but looking at the TCP protocol, it's hard to see how it could to otherwise, as there is no tagging of requests. The round trip times are going to be fairly high for simple requests (compared to dets and mnesia on small data sets), but the main benefit of using TT in the first place ought to be either that the data set is uncomfortably large for mnesia and dets, or that one wants ordered_set semantics on disk-based storage. I put the tt_proto module in sext/examples/ There is some edoc for it too. http://svn.ulf.wiger.net/sext/trunk/sext/doc/index.html BR, Ulf W ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
|
|
Re: sext and Tokyo Tyrant (Re: [erlang-questions] sortable serialization format)Awesome. That's 90% of tcerl; most of the complexity had to do with
implementing erlang term order from encoded binaries, some of the rest with augmenting term_to_binary to allow for prefixes (there's no smallest or largest term in erlang), and the remainder with query planning. The query planning part is pure erlang and independent of the term encoding (http://code.google.com/p/tcerl/source/browse/trunk/tcerl/src/tcbdbmsutil.erl), so you could hopefully reuse it. In retrospect, clearly, I should have abandoned term_to_binary, if only because Erlang is easy and C is hard, so the C side should be just memcmp. -- p On Sat, 31 Oct 2009, Ulf Wiger wrote: > Ulf Wiger wrote: > > > > A while ago I started hacking on a serialization format that > > would have the same sorting properties as Erlang terms. > > > > I didn't quite get it to work (negative floats was the most > > difficult part), but when I returned to it today, I realized > > that it was only a very small problem. Once fixed, all my > > QuickCheck suites passed. > > I just had to try this on Tokyo Tyrant, so I wrote a small > prototype for connecting to TT and encoding a few requests, > using the sext library to encode terms before sending them. > > I realized that a new function was needed in sext: prefix(Term), > which encodes a 'prefix' that will match similar terms, and allow > some wildcarding. A prefix can't be decoded (at least, I didn't > write any code for doing so). > > Some examples: > > Eshell V5.7.1 (abort with ^G) > 1> sext:encode({1,2,3}). > <<16,0,0,0,3,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6>> > 2> sext:prefix({1,'_','_'}). > <<16,0,0,0,3,10,0,0,0,2>> > 3> sext:encode([1,2,3]). > <<17,10,0,0,0,2,10,0,0,0,4,10,0,0,0,6,0>> > 4> sext:prefix([1,2|'_']). > <<17,10,0,0,0,2,10,0,0,0,4>> > > > Armed with this, I opened a B-tree table in Tokyo Tyrant, > and connected to it with my prototype module. > > Eshell V5.7.1 (abort with ^G) > 1> {ok,TT} = tt_proto:open(tt,[]). > {ok,<0.35.0>} > 2> tt_proto:put(TT,{1,a}, 1). > ok > 3> tt_proto:get(TT,{1,a}). > {ok,1} > 4> tt_proto:put(TT,{1,b}, 2). > ok > 5> tt_proto:put(TT,{1,c}, 3). > ok > 6> tt_proto:put(TT,{2,a}, 4). > ok > > Now, for some prefix matching: > > 7> tt_proto:keys(TT,{1,'_'}). > {ok,[{1,a},{1,b},{1,c}]} > 8> tt_proto:keys(TT,{2,'_'}). > {ok,[{2,a}]} > 9> timer:tc(tt_proto,keys,[TT,{1,'_'}]). > {279,{ok,[{1,a},{1,b},{1,c}]}} > > I made no real effort to optimize anything. The module starts > a gen_server which keeps a connection open to ttserver. It handles > only one query at a time, but looking at the TCP protocol, it's > hard to see how it could to otherwise, as there is no tagging of > requests. The round trip times are going to be fairly high for simple > requests (compared to dets and mnesia on small data sets), but the > main benefit of using TT in the first place ought to be either that > the data set is uncomfortably large for mnesia and dets, or that one > wants ordered_set semantics on disk-based storage. > > I put the tt_proto module in sext/examples/ > There is some edoc for it too. > > http://svn.ulf.wiger.net/sext/trunk/sext/doc/index.html > > BR, > Ulf W > > ________________________________________________________________ > erlang-questions mailing list. See http://www.erlang.org/faq.html > erlang-questions (at) erlang.org > > ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
| Free embeddable forum powered by Nabble | Forum Help |