Haskellers at University of Edinburgh? (hackathon)

View: New views
8 Messages — Rating Filter:   Alert me  

Haskellers at University of Edinburgh? (hackathon)

by Eric Kow :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Haskellers,

I'm looking for people at the University of Edinburgh who have a swipe
card with access to the Informatics Forum and could hang around on 29
August (or perhaps 6 Sep).

Thanks!

PS. You may also be interested in some Hackathon news: we've reserved a
room at the ICFP venue suitable for up to 50 people, but we can only
have it for 30 August.  I'm still holding out for a two-day space, but
if I don't hear back by mid-week, we'll settle for a single Haskell Hack
Day, preceded by a more informal Haskell Hacking Expedition the day
before (for example, camping in in a wifi-enabled pub for the day).

--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (204 bytes) Download Attachment

ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

by Alex Mason :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

TernaryTrees is a package that extends Data.Set ad Data.Map with some  
ternary tree structures, based on the article [http://www.pcplus.co.uk/node/3074/ 
] .

So far there are three modules: Data.Set.TernarySet,  
Data.Map.TernaryMap and Data.Set.StringSet, which can hold `Ord a =>  
[a]`, Ord a => [a] keys with b values, and Strings respectively. The  
interfaces for these types are very much like those of Data.{Set,Map},  
though not wuite as featurefull.

Later releases will also have a StringMap, and I'll update the  
TernaryMap to match the Set implementations more closely in a not too  
distant update.

Ternary trees are supposed to be one of the more efficient ways of  
storing strings in a set, and my testing of this package seems to  
support this (being able to insert 230,000+ words, check that all  
those words are actually in the set, write the set out to disk using  
the Data.Binary instance, reading them back in, and checking the old  
and new sets are equal takes about 3.5 seconds on my machine).

Included is a small example program that runs through the above  
sequence, and then asks the user to enter words to see if they're in  
the set, called tdict.

Please give it a try and let me know what you think, it's my first  
(hopefully) useful hackage package, and I'd love some feedback. There  
is also a darcs repo [http://random.axman6.com/darcs/TernaryTrees/],  
and any patches are welcome.

Cheers,
Alex Mason (Axman6)
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

by Andrew Coppin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Alex Mason wrote:
> TernaryTrees is a package that extends Data.Set ad Data.Map with some
> ternary tree structures, based on the article
> [http://www.pcplus.co.uk/node/3074/] .

That's just scary. I was just in the middle of writing the exact same
thing! :-D (I read that very article...)

> Please give it a try and let me know what you think, it's my first
> (hopefully) useful hackage package, and I'd love some feedback.

OK, will do...

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

by Felipe Lessa :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jun 30, 2009 at 03:29:45AM +1000, Alex Mason wrote:
> (being able to insert 230,000+ words, check that all those
> words are actually in the set, write the set out to disk using
> the Data.Binary instance, reading them back in, and checking
> the old and new sets are equal takes about 3.5 seconds on my
> machine).

It would be nice to know how much time the same test takes using
other kinds of containers.

--
Felipe.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

by ajb-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

G'day all.

Quoting Andrew Coppin <andrewcoppin@...>:

> That's just scary. I was just in the middle of writing the exact same
> thing! :-D (I read that very article...)

When you're both done, please compare with the implementation that's
been in Edison for about five years:

http://www.cs.princeton.edu/~rdockins/edison/edison-core/src/Data/Edison/Assoc/

If you've got something that's an improvement (and believe me, it
shouldn't be hard to improve on it), I'm sure that Rob would appreciate
a patch.

Cheers,
Andrew Bromage
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

by wren ng thornton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Alex Mason wrote:
> TernaryTrees is a package that extends Data.Set ad Data.Map with some
> ternary tree structures, based on the article
> [http://www.pcplus.co.uk/node/3074/] .

For the string (or rather ByteString) version:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie

Which has a number of other significant performance improvements (e.g.
node fusion, ByteString instead of String) and a highly expressive
interface. Because it uses ByteStrings it can trie any type which can be
serialized into a vector of bits[1], albeit indirectly.

The real trick with tries is not in just having them[2], it's in having
the right interface to make use of what they're good at. For example, if
I have multiple tries, I'd like to merge them without doing it element
by element[3]. Or if I know I'm going to be making a number of similar
queries, it'd be nice if I could cache my position in the trie[4] to
avoid repeating the work for the prefixes of all my queries[5]. Using
tricks like these leads to significant improvements over using them like
hashtables; tries aren't hashtables just like lists aren't arrays.



[1] Where the ByteString encodings are equal iff the values are equal.
Ideally, the encoding should ensure that for any given set of queries,
the discriminative bits are closest to the front. You can also use
lossless compression algorithms (e.g. Golomb codes) to shrink the size
of the keys if you happen to know their distribution; and if your
queries are already compressed, you don't need to decompress them to do
lookup!

[2] And they seem to be all the rage for people reinventing them these
days...

[3] Supported by Data.Trie.mergeBy for the general case, with unionL and
unionR for common use cases.

[4] Supported by Data.Trie.lookupBy

[5] In the general case, this is trie intersection. You can hack this
together with the bytestring-trie-0.1.4 interface, but an efficient
version is on my todo list. A special case of intersection is provided
by Data.Trie.submap.

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

by Don Stewart-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

wren:

> Alex Mason wrote:
>> TernaryTrees is a package that extends Data.Set ad Data.Map with some  
>> ternary tree structures, based on the article  
>> [http://www.pcplus.co.uk/node/3074/] .
>
> For the string (or rather ByteString) version:
>
> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie
>
> Which has a number of other significant performance improvements (e.g.  
> node fusion, ByteString instead of String) and a highly expressive  
> interface. Because it uses ByteStrings it can trie any type which can be  
> serialized into a vector of bits[1], albeit indirectly.
>
> The real trick with tries is not in just having them[2], it's in having  
> the right interface to make use of what they're good at. For example, if  
> I have multiple tries, I'd like to merge them without doing it element  
> by element[3]. Or if I know I'm going to be making a number of similar  
> queries, it'd be nice if I could cache my position in the trie[4] to  
> avoid repeating the work for the prefixes of all my queries[5]. Using  
> tricks like these leads to significant improvements over using them like  
> hashtables; tries aren't hashtables just like lists aren't arrays.
>
>

Do you have benchmarks?
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps

by wren ng thornton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Don Stewart wrote:

> wren:
>> Alex Mason wrote:
>>> TernaryTrees is a package that extends Data.Set ad Data.Map with some  
>>> ternary tree structures, based on the article  
>>> [http://www.pcplus.co.uk/node/3074/] .
>> For the string (or rather ByteString) version:
>>
>> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie
>>
>> Which has a number of other significant performance improvements (e.g.  
>> node fusion, ByteString instead of String) and a highly expressive  
>> interface. Because it uses ByteStrings it can trie any type which can be  
>> serialized into a vector of bits[1], albeit indirectly.
>>
>> The real trick with tries is not in just having them[2], it's in having  
>> the right interface to make use of what they're good at. For example, if  
>> I have multiple tries, I'd like to merge them without doing it element  
>> by element[3]. Or if I know I'm going to be making a number of similar  
>> queries, it'd be nice if I could cache my position in the trie[4] to  
>> avoid repeating the work for the prefixes of all my queries[5]. Using  
>> tricks like these leads to significant improvements over using them like  
>> hashtables; tries aren't hashtables just like lists aren't arrays.
>
> Do you have benchmarks?


Somewhere in my email archive (care of Mark Wotton). I'll see if I can
dig them up this weekend. The biggest issue here is finding nice
datasets (and tasks) to give reasonable benchmarks for. Reading in all
of /usr/dict (or the Brown corpus) and looking up all keys only gives
one perspective (or two), and not necessarily the most helpful one for
"real world" use. I haven't found any good dataset/task suites like
there are for the Language Benchmarks Game, though I'd love to hear
about one.

The tries /= hashtables comment stems from discussions on various
haskell blogs with people inventing their own (or wanting to benchmark
Data.Map vs hashtables vs tries vs bloomfilters). As a drop-in
replacement tries will perform adequately, but they're nothing
overwhelming; the overwhelming comes from changing the usage algorithms
to match the "stride" of the datastructure. I don't think I have links
to these discussions anymore to pull up code examples.

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe