« Return to Thread: [scala] Collections performance

Re: [scala] Collections performance

by Sean McDirmid :: Rate this Message:

Reply to Author | View in Thread

I did similar benchmarking last year and reported results on the internal dev list. JCL is definitely faster (those pesky Sun programmers), and fills in some of the wholes of the current collection library (views, subsets, before recently linked sets and maps). Whereas the Scala collection library is biased to immutable APIs and implementations, the JCL is biased to the mutable case and so it should be faster.

On the other hand, JCL is not portable to .NET, so I'm trying to move my code away from it, and I would definitely love to see a faster Scala collection library eventually (although what we have now definitely works, its just very minimal). Maybe we could we port open source collection libraries to Scala? I use C5 in .NET land, and it has a great feature set that seems pretty efficient.

Anyways, I'm all for a scalax collection library, I'd also be willing to contribute.

Sean

On Wed, Jul 30, 2008 at 2:16 AM, David MacIver <david.maciver@...> wrote:
So as part of my project to write a replacement immutable HashMap
implementation I've been benchmarking like crazy. I had previously
been comparing to the standard immutable.HashMap implementation but
earlier during a bored moment when I couldn't face any more SVG I
decided to widen the scope of my benchmarks and include a bunch of
other map implementations.

The attached are a set of these bench marks, testing a few common
operations on some standard map implementations. These are mainly
interesting for their comparison of the status quo rather than my
implementation (although I'm reasonably pleased with the numbers on
mine). In particular I compared the standard immutable hash map, the
standard mutable hash map, the standard TreeMap and the JCL HashMap
for mapping Strings to Strings.

The numbers are a little sad. They can be summarised as follows:

- TreeMap is slooow (this might be because string comparison is
relatively expensive, but in some other tests I ran earlier on integer
keys it did pretty terribly as well).
- jcl.HashMap basically blows all the others out of the water -
typically by a factor of about 5.

So all the collections we typically use are substantially slower than
their Java equivalents. This is a little unfortunate.

One other point of interest (not really in this context, as it's
dramatically not what maps are supposed to be for) is that the scala
variants hold their own much more in foreach performance - a lot of
the Scala collections have internal iteration methods which don't use
an iterator, and these tend to be faster (indeed my HashMap's foreach
method is the only instance of the JCL collection losing).

So, basically we're typically using collections that are a lot slower
than they need to be.

This isn't anyone's fault, particularly not the EPFL team's. The
java.util collections have had a lot of time spent on optimising them
over the last N years, and having a blazingly fast collections API has
quite correctly not been the top priority for Scala. But at the same
time, it would be nice to have it fixed. :-) The best way to do this
is probably community involvement (likely via scalax. I should really
get around to sending in a CLA for that...). In the mean time, maybe
we should think about using the jcl collections a bit more (at least
when we need high performance mutable collections) - they're not
really significantly worse to use than the standard mutable
collections API, thanks to the nice higher order wrapper the package
provides around the underlying java.util stuff.

That being said, I'm not really proposing anything concrete in this
email. It's mostly just a heads up to let people know what I've found.

 « Return to Thread: [scala] Collections performance