« Return to Thread: [scala] Collections performance

Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View in Thread

On Wed, Jul 30, 2008 at 10:27 AM, Sean McDirmid <sean.mcdirmid@...> wrote:

>
> Comments inlined.
>
> On Wed, Jul 30, 2008 at 5:11 PM, David MacIver <david.maciver@...>
> wrote:
>
>>
>> Right. The fact that the immutable collections library is slower than
>> the mutable one doesn't surprise me. But the mutable case also seems
>> substantially slower.
>
> They share the same implementations. I think making collections work for
> both mutable and immutable cases has something to do with the slow down.

It's certainly the cause of some umm... interesting effects on the
immutable collections. I'm not sure why it would be a slow down for
the mutable ones, except possibly overly abstracted code.

>> I wonder if it would be worth filling in the gaps in the standard
>> mutable collection library so that it supports the same interface as
>> the JCL collections and then using the JCL collections behind the
>> scenes when running on the JVM?
>
> If the JCL was implemented in cross-platform Scala, it would be as almost as
> fast as the Java-based JCL. In this case, just take the Java code for the
> Java JCL and port it to Scala.

I'm not sure about the licensing implications of that. The source code
for the Java collections is GPLed, and that would definitely count as
a derivative work.

> I don't think we should use different
> implementations of collections on different platforms (except for interop).

Fair enough.

>> Could do. One of the problems with that, at least if we start from a
>> .NET or Java library, is that it will most likely enrich the mutable
>> side of the of the collections API while leaving the immutable side to
>> languish.
>
> We could take a bunch of collections libraries and port over the best parts
> (with unit tests). Whether the libraries are mutable or immutable. I guess

Sure. I'm just saying: The availability is hugely biased in the
direction of mutable ones.

> API affordances would have to be made, but I really don't like immutable
> features bringing down mutable features (e.g., weakly typed covariant
> contains method).

I'm not clear on how that's a function of immutability vs immutability.

>> One idea I've toyed with is at some point doing a "Data structures
>> week". Set up a wiki to coordinate and suggest ideas, and have
>> everyone who's interested pick a cool/useful data structure and put
>> together as solid an implementation of it in Scala as they can.
>
> I prefer stealing and porting code from other OSS libraries (copying their
> licenses of course). There isn't much value in implementing another hash
> map, it probably won't be so fast, someone has done it better, and why
> bother re-inventing the wheel? Maintainence is a problem though, if we don't
> understand what we are porting.

Well, I wasn't thinking of data structures along the lines of "Yet
another hash map". There's quite a rich selection of special purpose
and interesting data structures out there: Bloom filters, tries,
finger trees, a variety of heaps (including interestingly different
ones like the soft heap),  plus a rich variety of purely functional
data structures a la Okasaki's book and future papers. It would be
really nice to have a "batteries included" usable collections library
containing more exotic data structures. The objective here is not to
match the Java collections library in usability, but to beat it.

Additionally: There's no requirement that the implementations have to
be unique to the person submitting them. They're perfectly welcome to
port an existing implementation. Indeed, it would be encouraged! The
objective is to get people to contribute, not to constrain how they
contribute. :-)

>> Ideally with a set of scalacheck tests and benchmarks for them to work
>> against. Participants would be encouraged to submit them to scalax or
>> similar afterwards, but requiring the CLA up front would probably
>> limit participation.
>
> Simply OSS license, I don't see the problem.

I'm fine with that, but scalax requires people to sign a CLA to
contribute code to it.

 « Return to Thread: [scala] Collections performance