[scala] Collections performance

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 | Next >

[scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

So as part of my project to write a replacement immutable HashMap
implementation I've been benchmarking like crazy. I had previously
been comparing to the standard immutable.HashMap implementation but
earlier during a bored moment when I couldn't face any more SVG I
decided to widen the scope of my benchmarks and include a bunch of
other map implementations.

The attached are a set of these bench marks, testing a few common
operations on some standard map implementations. These are mainly
interesting for their comparison of the status quo rather than my
implementation (although I'm reasonably pleased with the numbers on
mine). In particular I compared the standard immutable hash map, the
standard mutable hash map, the standard TreeMap and the JCL HashMap
for mapping Strings to Strings.

The numbers are a little sad. They can be summarised as follows:

- TreeMap is slooow (this might be because string comparison is
relatively expensive, but in some other tests I ran earlier on integer
keys it did pretty terribly as well).
- jcl.HashMap basically blows all the others out of the water -
typically by a factor of about 5.

So all the collections we typically use are substantially slower than
their Java equivalents. This is a little unfortunate.

One other point of interest (not really in this context, as it's
dramatically not what maps are supposed to be for) is that the scala
variants hold their own much more in foreach performance - a lot of
the Scala collections have internal iteration methods which don't use
an iterator, and these tend to be faster (indeed my HashMap's foreach
method is the only instance of the JCL collection losing).

So, basically we're typically using collections that are a lot slower
than they need to be.

This isn't anyone's fault, particularly not the EPFL team's. The
java.util collections have had a lot of time spent on optimising them
over the last N years, and having a blazingly fast collections API has
quite correctly not been the top priority for Scala. But at the same
time, it would be nice to have it fixed. :-) The best way to do this
is probably community involvement (likely via scalax. I should really
get around to sending in a CLA for that...). In the mean time, maybe
we should think about using the jcl collections a bit more (at least
when we need high performance mutable collections) - they're not
really significantly worse to use than the standard mutable
collections API, thanks to the nice higher order wrapper the package
provides around the underlying java.util stuff.

That being said, I'm not really proposing anything concrete in this
email. It's mostly just a heads up to let people know what I've found.

   My HashMap: update  Normal HashMap: update  Mutable HashMap: update  TreeMap: update  JCL Hashmap: update
  --------------------------------------------------------------------------------------------------------------
                 2443                    1526                     1622             9104                  566
                 2324                    1501                     1568             8909                  536
  --------------------------------------------------------------------------------------------------------------
                 2436                    1485                     1560             9106                  523
                 2256                    1439                     1553             8819                  522
                 2245                    1443                     1532             8769                  519
                 2236                    1453                     1540             8787                  519
                 2242                    1458                     1557             8780                  563
                 2283                    1464                     1559             8850                  543
                 2300                    1465                     1557             8893                  525
                 2272                    1461                     1562             8872                  529
                 2252                    1461                     1551             8834                  522
                 2488                    1623                     1552             8881                  553
  --------------------------------------------------------------------------------------------------------------
                 2301 ms                 1475 ms                  1552 ms          8859 ms               532 ms  
                    0 M/s                   0 M/s                    0 M/s            0 M/s                0 M/s
                  100 %                   155 %                    148 %             25 %                432 %  
  --------------------------------------------------------------------------------------------------------------


   My HashMap: get  Normal HashMap: get  Mutable HashMap: get  TreeMap: get  JCL HashMap: get
  -----------------------------------------------------------------------------------------------
              1980                 2138                  2413          7012               448
              1936                 2305                  2315          6969               408
  -----------------------------------------------------------------------------------------------
              1937                 2275                  2396          6945               407
              1925                 2269                  2376          6953               404
              1922                 2268                  2375          6904               402
              1908                 2263                  2379          6905               420
              1951                 2248                  2374          6879               405
              1917                 2258                  2388          6895               404
              1918                 2250                  2367          6917               403
              1916                 2255                  2376          6940               403
              1917                 2274                  2394          6916               412
              1939                 2250                  2384          6881               405
  -----------------------------------------------------------------------------------------------
              1925 ms              2261 ms               2381 ms       6913 ms            406 ms  
                 0 M/s                0 M/s                 0 M/s         0 M/s             0 M/s
               100 %                 85 %                  80 %          27 %             473 %  
  -----------------------------------------------------------------------------------------------


   My HashMap: apply  Normal HashMap: apply  TreeMap: apply  Mutable HashMap: apply  JCL HashMap: apply
  ---------------------------------------------------------------------------------------------------------
                1018                   1244            3179                    1235                 257
                 942                   1233            3173                    1218                 233
  ---------------------------------------------------------------------------------------------------------
                 945                   1237            3145                    1250                 241
                 945                   1232            3144                    1218                 233
                 939                   1232            3143                    1223                 235
                 944                   1235            3171                    1234                 233
                 941                   1225            3153                    1216                 233
                 941                   1258            3139                    1232                 261
                 964                   1225            3167                    1220                 232
                 934                   1230            3178                    1226                 234
                 951                   1246            3147                    1232                 234
                 939                   1236            3163                    1219                 232
  ---------------------------------------------------------------------------------------------------------
                 944 ms                1236 ms         3155 ms                 1227 ms              237 ms  
                   0 M/s                  0 M/s           0 M/s                   0 M/s               0 M/s
                 100 %                   76 %            29 %                    76 %               397 %  
  ---------------------------------------------------------------------------------------------------------


   My HashMap: foreach  Normal HashMap: foreach  Mutable HashMap: foreach  JCL HashMap: foreach  TreeMap: foreach
  -------------------------------------------------------------------------------------------------------------------
                   258                      884                       879                   360               371
                   233                      841                       839                   329               348
  -------------------------------------------------------------------------------------------------------------------
                   235                      841                       843                   330               349
                   234                      842                       841                   332               348
                   233                      850                       841                   330               349
                   235                      844                       839                   331               366
                   252                      857                       841                   327               348
                   233                      839                       837                   329               349
                   235                      842                       837                   328               348
                   232                      837                       838                   329               354
                   235                      780                       801                   327               346
                   232                      802                       800                   328               354
  -------------------------------------------------------------------------------------------------------------------
                   236 ms                   833 ms                    832 ms                329 ms            351 ms  
                     0 M/s                    0 M/s                     0 M/s                 0 M/s             0 M/s
                   100 %                     28 %                      28 %                  71 %              67 %  
  -------------------------------------------------------------------------------------------------------------------


   My HashMap: elements.foreach  Normal HashMap: elements.foreach  TreeMap: elements.foreach  Mutable HashMap: elements.foreach  JCL HashMap: elements.foreach
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------
                            460                               780                       1650                                804                            338
                            408                               777                       1588                                803                            266
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------
                            407                               780                       1607                                808                            265
                            408                               781                       1580                                808                            264
                            406                               776                       1580                                820                            265
                            408                               777                       1579                                808                            266
                            408                               784                       1578                                807                            264
                            408                               777                       1584                                807                            265
                            407                               776                       1577                                812                            266
                            406                               778                       1576                                806                            265
                            407                               778                       1582                                809                            264
                            408                               777                       1564                                805                            264
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------
                            407 ms                            778 ms                    1581 ms                             809 ms                         265 ms  
                              0 M/s                             0 M/s                      0 M/s                              0 M/s                          0 M/s
                            100 %                              52 %                       25 %                               50 %                          153 %  
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------


Re: [scala] Collections performance

by bearfeeder :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Gee David... the is so "academic"... I mean you're testing things and all.  :-) :-) :-)

On Tue, Jul 29, 2008 at 11:16 AM, David MacIver <david.maciver@...> wrote:
So as part of my project to write a replacement immutable HashMap
implementation I've been benchmarking like crazy. I had previously
been comparing to the standard immutable.HashMap implementation but
earlier during a bored moment when I couldn't face any more SVG I
decided to widen the scope of my benchmarks and include a bunch of
other map implementations.

The attached are a set of these bench marks, testing a few common
operations on some standard map implementations. These are mainly
interesting for their comparison of the status quo rather than my
implementation (although I'm reasonably pleased with the numbers on
mine). In particular I compared the standard immutable hash map, the
standard mutable hash map, the standard TreeMap and the JCL HashMap
for mapping Strings to Strings.

The numbers are a little sad. They can be summarised as follows:

- TreeMap is slooow (this might be because string comparison is
relatively expensive, but in some other tests I ran earlier on integer
keys it did pretty terribly as well).
- jcl.HashMap basically blows all the others out of the water -
typically by a factor of about 5.

So all the collections we typically use are substantially slower than
their Java equivalents. This is a little unfortunate.

One other point of interest (not really in this context, as it's
dramatically not what maps are supposed to be for) is that the scala
variants hold their own much more in foreach performance - a lot of
the Scala collections have internal iteration methods which don't use
an iterator, and these tend to be faster (indeed my HashMap's foreach
method is the only instance of the JCL collection losing).

So, basically we're typically using collections that are a lot slower
than they need to be.

This isn't anyone's fault, particularly not the EPFL team's. The
java.util collections have had a lot of time spent on optimising them
over the last N years, and having a blazingly fast collections API has
quite correctly not been the top priority for Scala. But at the same
time, it would be nice to have it fixed. :-) The best way to do this
is probably community involvement (likely via scalax. I should really
get around to sending in a CLA for that...). In the mean time, maybe
we should think about using the jcl collections a bit more (at least
when we need high performance mutable collections) - they're not
really significantly worse to use than the standard mutable
collections API, thanks to the nice higher order wrapper the package
provides around the underlying java.util stuff.

That being said, I'm not really proposing anything concrete in this
email. It's mostly just a heads up to let people know what I've found.



--
lift, the simply functional web framework http://liftweb.net
Collaborative Task Management http://much4.us
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Jul 30, 2008 at 1:03 AM, David Pollak
<feeder.of.the.bears@...> wrote:
> Gee David... the is so "academic"... I mean you're testing things and all.
> :-) :-) :-)

Absolutely. It's well known that Real Programmers don't need data
structures. Arrays should be good enough for anyone. :-)

Re: [scala] Collections performance

by Josh Suereth :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Aren't data structures what you learn in school, so when you hit the real world you're ready to implement/optimize what someone else has already implemented/optimized for you?  I'm willing to pull out the old textbooks, RTFM and give a whirl at some optimizations.  Unfortunately if anyone asks me to prove their speed, I'm SOL.

Besides, I think a poor (or nonexistent) collections API is key to a good adoptation of any language.  That and arcane syntax:
template<typename T, size_t N>
inline void foreach(T (&array)[N], void (*f)(T) )
{
    for(int i=0; i < N; i++)
    {
          (*f)(array[i]);
    }
}

BTW I think java.util.collections is my most imported package in Java.  I really do think scala needs a great collections library.  The API is great (from what I've used so far), so if the speed improves It'd be an easy sell.

On Tue, Jul 29, 2008 at 8:07 PM, David MacIver <david.maciver@...> wrote:
On Wed, Jul 30, 2008 at 1:03 AM, David Pollak
<feeder.of.the.bears@...> wrote:
> Gee David... the is so "academic"... I mean you're testing things and all.
> :-) :-) :-)

Absolutely. It's well known that Real Programmers don't need data
structures. Arrays should be good enough for anyone. :-)


Re: [scala] Collections performance

by Sean McDirmid :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I did similar benchmarking last year and reported results on the internal dev list. JCL is definitely faster (those pesky Sun programmers), and fills in some of the wholes of the current collection library (views, subsets, before recently linked sets and maps). Whereas the Scala collection library is biased to immutable APIs and implementations, the JCL is biased to the mutable case and so it should be faster.

On the other hand, JCL is not portable to .NET, so I'm trying to move my code away from it, and I would definitely love to see a faster Scala collection library eventually (although what we have now definitely works, its just very minimal). Maybe we could we port open source collection libraries to Scala? I use C5 in .NET land, and it has a great feature set that seems pretty efficient.

Anyways, I'm all for a scalax collection library, I'd also be willing to contribute.

Sean

On Wed, Jul 30, 2008 at 2:16 AM, David MacIver <david.maciver@...> wrote:
So as part of my project to write a replacement immutable HashMap
implementation I've been benchmarking like crazy. I had previously
been comparing to the standard immutable.HashMap implementation but
earlier during a bored moment when I couldn't face any more SVG I
decided to widen the scope of my benchmarks and include a bunch of
other map implementations.

The attached are a set of these bench marks, testing a few common
operations on some standard map implementations. These are mainly
interesting for their comparison of the status quo rather than my
implementation (although I'm reasonably pleased with the numbers on
mine). In particular I compared the standard immutable hash map, the
standard mutable hash map, the standard TreeMap and the JCL HashMap
for mapping Strings to Strings.

The numbers are a little sad. They can be summarised as follows:

- TreeMap is slooow (this might be because string comparison is
relatively expensive, but in some other tests I ran earlier on integer
keys it did pretty terribly as well).
- jcl.HashMap basically blows all the others out of the water -
typically by a factor of about 5.

So all the collections we typically use are substantially slower than
their Java equivalents. This is a little unfortunate.

One other point of interest (not really in this context, as it's
dramatically not what maps are supposed to be for) is that the scala
variants hold their own much more in foreach performance - a lot of
the Scala collections have internal iteration methods which don't use
an iterator, and these tend to be faster (indeed my HashMap's foreach
method is the only instance of the JCL collection losing).

So, basically we're typically using collections that are a lot slower
than they need to be.

This isn't anyone's fault, particularly not the EPFL team's. The
java.util collections have had a lot of time spent on optimising them
over the last N years, and having a blazingly fast collections API has
quite correctly not been the top priority for Scala. But at the same
time, it would be nice to have it fixed. :-) The best way to do this
is probably community involvement (likely via scalax. I should really
get around to sending in a CLA for that...). In the mean time, maybe
we should think about using the jcl collections a bit more (at least
when we need high performance mutable collections) - they're not
really significantly worse to use than the standard mutable
collections API, thanks to the nice higher order wrapper the package
provides around the underlying java.util stuff.

That being said, I'm not really proposing anything concrete in this
email. It's mostly just a heads up to let people know what I've found.


Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Jul 30, 2008 at 4:30 AM, Sean McDirmid <sean.mcdirmid@...> wrote:
> I did similar benchmarking last year and reported results on the internal
> dev list. JCL is definitely faster (those pesky Sun programmers), and fills
> in some of the wholes of the current collection library (views, subsets,
> before recently linked sets and maps). Whereas the Scala collection library
> is biased to immutable APIs and implementations, the JCL is biased to the
> mutable case and so it should be faster.

Right. The fact that the immutable collections library is slower than
the mutable one doesn't surprise me. But the mutable case also seems
substantially slower.

I have just noticed that the fix replacing Array[Entry] with an
Array[AnyRef] in the HashMap implementation hasn't made it into the
version I'm testing against, so it's possible the numbers will be a
little happier for free in the next version.

> On the other hand, JCL is not portable to .NET, so I'm trying to move my
> code away from it, and I would definitely love to see a faster Scala

I wonder if it would be worth filling in the gaps in the standard
mutable collection library so that it supports the same interface as
the JCL collections and then using the JCL collections behind the
scenes when running on the JVM?

> collection library eventually (although what we have now definitely works,
> its just very minimal). Maybe we could we port open source collection
> libraries to Scala? I use C5 in .NET land, and it has a great feature set
> that seems pretty efficient.

Could do. One of the problems with that, at least if we start from a
.NET or Java library, is that it will most likely enrich the mutable
side of the of the collections API while leaving the immutable side to
languish.

One good source of immutable collections to steal is Haskell, and the
stuff on Hackage. :-) e.g. the IntMap implementation I've used is a
direct port of Data.IntMap in Haskell.

> Anyways, I'm all for a scalax collection library, I'd also be willing to
> contribute.

One idea I've toyed with is at some point doing a "Data structures
week". Set up a wiki to coordinate and suggest ideas, and have
everyone who's interested pick a cool/useful data structure and put
together as solid an implementation of it in Scala as they can.
Ideally with a set of scalacheck tests and benchmarks for them to work
against. Participants would be encouraged to submit them to scalax or
similar afterwards, but requiring the CLA up front would probably
limit participation.

Re: [scala] Collections performance

by Sean McDirmid :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Comments inlined.

On Wed, Jul 30, 2008 at 5:11 PM, David MacIver <david.maciver@...> wrote:
 
Right. The fact that the immutable collections library is slower than
the mutable one doesn't surprise me. But the mutable case also seems
substantially slower.

They share the same implementations. I think making collections work for both mutable and immutable cases has something to do with the slow down.

I wonder if it would be worth filling in the gaps in the standard
mutable collection library so that it supports the same interface as
the JCL collections and then using the JCL collections behind the
scenes when running on the JVM?

If the JCL was implemented in cross-platform Scala, it would be as almost as fast as the Java-based JCL. In this case, just take the Java code for the Java JCL and port it to Scala. I don't think we should use different implementations of collections on different platforms (except for interop).

Could do. One of the problems with that, at least if we start from a
.NET or Java library, is that it will most likely enrich the mutable
side of the of the collections API while leaving the immutable side to
languish.

We could take a bunch of collections libraries and port over the best parts (with unit tests). Whether the libraries are mutable or immutable. I guess API affordances would have to be made, but I really don't like immutable features bringing down mutable features (e.g., weakly typed covariant contains method).

One idea I've toyed with is at some point doing a "Data structures
week". Set up a wiki to coordinate and suggest ideas, and have
everyone who's interested pick a cool/useful data structure and put
together as solid an implementation of it in Scala as they can.

I prefer stealing and porting code from other OSS libraries (copying their licenses of course). There isn't much value in implementing another hash map, it probably won't be so fast, someone has done it better, and why bother re-inventing the wheel? Maintainence is a problem though, if we don't understand what we are porting.
 
Ideally with a set of scalacheck tests and benchmarks for them to work
against. Participants would be encouraged to submit them to scalax or
similar afterwards, but requiring the CLA up front would probably
limit participation.

Simply OSS license, I don't see the problem.


Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Jul 30, 2008 at 10:27 AM, Sean McDirmid <sean.mcdirmid@...> wrote:

>
> Comments inlined.
>
> On Wed, Jul 30, 2008 at 5:11 PM, David MacIver <david.maciver@...>
> wrote:
>
>>
>> Right. The fact that the immutable collections library is slower than
>> the mutable one doesn't surprise me. But the mutable case also seems
>> substantially slower.
>
> They share the same implementations. I think making collections work for
> both mutable and immutable cases has something to do with the slow down.

It's certainly the cause of some umm... interesting effects on the
immutable collections. I'm not sure why it would be a slow down for
the mutable ones, except possibly overly abstracted code.

>> I wonder if it would be worth filling in the gaps in the standard
>> mutable collection library so that it supports the same interface as
>> the JCL collections and then using the JCL collections behind the
>> scenes when running on the JVM?
>
> If the JCL was implemented in cross-platform Scala, it would be as almost as
> fast as the Java-based JCL. In this case, just take the Java code for the
> Java JCL and port it to Scala.

I'm not sure about the licensing implications of that. The source code
for the Java collections is GPLed, and that would definitely count as
a derivative work.

> I don't think we should use different
> implementations of collections on different platforms (except for interop).

Fair enough.

>> Could do. One of the problems with that, at least if we start from a
>> .NET or Java library, is that it will most likely enrich the mutable
>> side of the of the collections API while leaving the immutable side to
>> languish.
>
> We could take a bunch of collections libraries and port over the best parts
> (with unit tests). Whether the libraries are mutable or immutable. I guess

Sure. I'm just saying: The availability is hugely biased in the
direction of mutable ones.

> API affordances would have to be made, but I really don't like immutable
> features bringing down mutable features (e.g., weakly typed covariant
> contains method).

I'm not clear on how that's a function of immutability vs immutability.

>> One idea I've toyed with is at some point doing a "Data structures
>> week". Set up a wiki to coordinate and suggest ideas, and have
>> everyone who's interested pick a cool/useful data structure and put
>> together as solid an implementation of it in Scala as they can.
>
> I prefer stealing and porting code from other OSS libraries (copying their
> licenses of course). There isn't much value in implementing another hash
> map, it probably won't be so fast, someone has done it better, and why
> bother re-inventing the wheel? Maintainence is a problem though, if we don't
> understand what we are porting.

Well, I wasn't thinking of data structures along the lines of "Yet
another hash map". There's quite a rich selection of special purpose
and interesting data structures out there: Bloom filters, tries,
finger trees, a variety of heaps (including interestingly different
ones like the soft heap),  plus a rich variety of purely functional
data structures a la Okasaki's book and future papers. It would be
really nice to have a "batteries included" usable collections library
containing more exotic data structures. The objective here is not to
match the Java collections library in usability, but to beat it.

Additionally: There's no requirement that the implementations have to
be unique to the person submitting them. They're perfectly welcome to
port an existing implementation. Indeed, it would be encouraged! The
objective is to get people to contribute, not to constrain how they
contribute. :-)

>> Ideally with a set of scalacheck tests and benchmarks for them to work
>> against. Participants would be encouraged to submit them to scalax or
>> similar afterwards, but requiring the CLA up front would probably
>> limit participation.
>
> Simply OSS license, I don't see the problem.

I'm fine with that, but scalax requires people to sign a CLA to
contribute code to it.

Re: [scala] Collections performance

by Sean McDirmid :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Why? Well, whatever, just create a scalay library and throw it up on Google code and make contributions more open. The CLA should be reserved for those closer to the core parts of Scala.

Sean

On Wed, Jul 30, 2008 at 5:56 PM, David MacIver <david.maciver@...> wrote:
I'm fine with that, but scalax requires people to sign a CLA to
contribute code to it.


Re: [scala] Collections performance

by Sean McDirmid :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Wed, Jul 30, 2008 at 5:56 PM, David MacIver <david.maciver@...> wrote:
It's certainly the cause of some umm... interesting effects on the
immutable collections. I'm not sure why it would be a slow down for
the mutable ones, except possibly overly abstracted code.

Possibly.

I'm not sure about the licensing implications of that. The source code
for the Java collections is GPLed, and that would definitely count as
a derivative work.

Just put it out under the same license. If RMS wants to sue us, we can burn that bridge when we get to it. We aren't selling Scala or being commercial in anyway, so I don't see any liability there. We just make sure our stuff is as open as the stuff we are copying.

Sure. I'm just saying: The availability is hugely biased in the
direction of mutable ones.

Of course, mutable collection libraries is where all the payoff is, while immutable collection libraries are often seen as novelties. We are very weak on mutable libraries right now in Scala, so the bias should benefit us more at first. Regardless I think they are fairly orthogonal and shouldn't share so many APIs; e.g., I don't see much use for having both mutable and immutable hash maps (mutable works well en
 
I'm not clear on how that's a function of immutability vs immutability.

Definitely is. Because Seq[+A], contains in Seq has the signature contains(x : Any) : Boolean (thankfully, Set and Map are not covariant). This then propagates to ArrayList because it extends Seq, hence we have weakly typed contains methods for mutable seq's directly as a consequence of sharing APIs between mutable and immutable collections (otherwise, contains would be strongly typed). 
 
Well, I wasn't thinking of data structures along the lines of "Yet
another hash map". There's quite a rich selection of special purpose
and interesting data structures out there: Bloom filters, tries,
finger trees, a variety of heaps (including interestingly different
ones like the soft heap),  plus a rich variety of purely functional
data structures a la Okasaki's book and future papers. It would be
really nice to have a "batteries included" usable collections library
containing more exotic data structures. The objective here is not to
match the Java collections library in usability, but to beat it.

Great. But that doesn't fill the short term need that we just need more of the basics. I'm all for the special stuff, but we have to consider the basic collections also (a mutable tree map would be a nice start).

Sean

Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Jul 30, 2008 at 12:11 PM, Sean McDirmid <sean.mcdirmid@...> wrote:
>> I'm not sure about the licensing implications of that. The source code
>> for the Java collections is GPLed, and that would definitely count as
>> a derivative work.
>
> Just put it out under the same license. If RMS wants to sue us, we can burn
> that bridge when we get to it. We aren't selling Scala or being commercial
> in anyway, so I don't see any liability there. We just make sure our stuff
> is as open as the stuff we are copying.

No no no. Absolutely not. No way, no how.

I don't know about you, but I *am* being commercial using Scala. And
if you include GPL code in the standard libraries (and any reasonable
collections library should be aiming to eventually become standardised
as part of the standard library) then any code I wish to use Scala in
is getting bundled with GPL derived stuff and I'm screwed.

This would be so massively a deal breaker that I'm fairly horrified
you've even suggested it.

>> Sure. I'm just saying: The availability is hugely biased in the
>> direction of mutable ones.
>
> Of course, mutable collection libraries is where all the payoff is, while
> immutable collection libraries are often seen as novelties. We are very weak

I can't help other peoples misconceptions. :-)

> on mutable libraries right now in Scala, so the bias should benefit us more
> at first. Regardless I think they are fairly orthogonal and shouldn't share

The immutable collections available are generally pretty weak in my
opinion as well.

> so many APIs; e.g., I don't see much use for having both mutable and
> immutable hash maps (mutable works well en

Because immutable collections are nice, sometimes you have a need for
generic map structures, and as I demonstrated in the attached, an
immutable hash map is typically dramatically faster than the tree map
implementation (this isn't just a trivial implementation inadequacy -
it's significantly faster than the JCL TreeMap too)?

>> I'm not clear on how that's a function of immutability vs immutability.
>
> Definitely is. Because Seq[+A], contains in Seq has the signature contains(x
> : Any) : Boolean (thankfully, Set and Map are not covariant). This then
> propagates to ArrayList because it extends Seq, hence we have weakly typed
> contains methods for mutable seq's directly as a consequence of sharing APIs
> between mutable and immutable collections (otherwise, contains would be
> strongly typed).

Fair enough. I hadn't really even noticed that there was a contains
method on Sequence. It seems a dramatically inappropriate place to put
it - Sequences don't have any more efficient way of testing for
containment than just def contains(x : Any) = exists(x=_) (indeed,
that's exactly how it's implemented).

In general I don't agree that sharing interfaces between mutable and
immutable collections is a bad thing. There are a lot of cases where
I'd want to do it, and I much more frequently find myself frustrated
by instances where the APIs are not shared than I do inadequacies
caused by sharing them.

I do think that the way the current hierarchy treats the relationship
between mutable and immutable collections is rather undesirable, but I
have so many complaints with the design of the existing API that this
is quite far down the list of things that bother me. :-)

>> Well, I wasn't thinking of data structures along the lines of "Yet
>> another hash map". There's quite a rich selection of special purpose
>> and interesting data structures out there: Bloom filters, tries,
>> finger trees, a variety of heaps (including interestingly different
>> ones like the soft heap),  plus a rich variety of purely functional
>> data structures a la Okasaki's book and future papers. It would be
>> really nice to have a "batteries included" usable collections library
>> containing more exotic data structures. The objective here is not to
>> match the Java collections library in usability, but to beat it.
>
> Great. But that doesn't fill the short term need that we just need more of
> the basics. I'm all for the special stuff, but we have to consider the basic

Well we have most of the basics. They could just use a bit of love in
terms of implementation.

> collections also (a mutable tree map would be a nice start).

Yes, that would be good.

Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Scalax, Scalaz, Scalay. Sooner or later we're going to run out of letters!

As to why? I quote: "You will also need to provide a signed
Contributor License Agreement (individual or corporate). We require
this to ensure that we are legally entitled to publish any
contributions you make. These agreements are substantially identical
to those used by the Apache foundation, among others."

You'll have to take it up with someone who's not me. I've never even
gotten around to signing a CLA for scalax, despite (now semi-obsolete)
plans to add some stuff to it. :-)

On Wed, Jul 30, 2008 at 11:58 AM, Sean McDirmid <sean.mcdirmid@...> wrote:

> Why? Well, whatever, just create a scalay library and throw it up on Google
> code and make contributions more open. The CLA should be reserved for those
> closer to the core parts of Scala.
>
> Sean
>
> On Wed, Jul 30, 2008 at 5:56 PM, David MacIver <david.maciver@...>
> wrote:
>>
>> I'm fine with that, but scalax requires people to sign a CLA to
>> contribute code to it.
>
>

Re: [scala] Collections performance

by Christos KK Loverdos :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Just put it out under the same license. If RMS wants to sue us, we can burn that bridge when we get to it. We aren't selling Scala or being commercial in anyway, so I don't see any liability there. We just make sure our stuff is as open as the stuff we are copying.

No-one will sue you if you take GPL code and produce GPL code UNLESS you are not very careful about authorship. Claiming authorship is more than just firing up your editor and punching keys, with the intention to alter/modify/augment some pre-existing GPL'ed code.


--
__~O
-\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com

Re: [scala] Collections performance

by Christos KK Loverdos :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


> Just put it out under the same license. If RMS wants to sue us, we can burn
> that bridge when we get to it. We aren't selling Scala or being commercial
> in anyway, so I don't see any liability there. We just make sure our stuff
> is as open as the stuff we are copying.

No no no. Absolutely not. No way, no how.

I don't know about you, but I *am* being commercial using Scala. And
if you include GPL code in the standard libraries (and any reasonable
collections library should be aiming to eventually become standardised
as part of the standard library) then any code I wish to use Scala in
is getting bundled with GPL derived stuff and I'm screwed.

This would be so massively a deal breaker that I'm fairly horrified
you've even suggested it.


Guys,

We need to separate concepts here.

I think the attitude "throw anything in and worry later" is a dead-end. Generally, you can't just mix and copy and paste stuff from different licensing worlds. This is generally illegal.

OTAH and IMHO, using a library which is GPL in a commercial (as you put it, but I assume proprietary is a better intended term) is perfectly OK. As long as you do not touch the GPL code you are fine to play with it in library form (= use it). Most of the world now works like this.

Of course, generally speaking, you cannot take GPL (OSS) code, change it a bit and then package it in closed form as a product of your own. Licensing is a bit tricky there; mostly not because we are bad guys, but because we are not educated yet to play with the rules of the new OSS era (.... I am an optimist...).

BR
Christos

--
__~O
-\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com

Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Jul 30, 2008 at 1:22 PM, Christos KK Loverdos
<loverdos@...> wrote:

>
>> > Just put it out under the same license. If RMS wants to sue us, we can
>> > burn
>> > that bridge when we get to it. We aren't selling Scala or being
>> > commercial
>> > in anyway, so I don't see any liability there. We just make sure our
>> > stuff
>> > is as open as the stuff we are copying.
>>
>> No no no. Absolutely not. No way, no how.
>>
>> I don't know about you, but I *am* being commercial using Scala. And
>> if you include GPL code in the standard libraries (and any reasonable
>> collections library should be aiming to eventually become standardised
>> as part of the standard library) then any code I wish to use Scala in
>> is getting bundled with GPL derived stuff and I'm screwed.
>>
>> This would be so massively a deal breaker that I'm fairly horrified
>> you've even suggested it.
>>
>
> Guys,
>
> We need to separate concepts here.
>
> I think the attitude "throw anything in and worry later" is a dead-end.
> Generally, you can't just mix and copy and paste stuff from different
> licensing worlds. This is generally illegal.

Indeed.

> OTAH and IMHO, using a library which is GPL in a commercial (as you put it,
> but I assume proprietary is a better intended term) is perfectly OK. As long
> as you do not touch the GPL code you are fine to play with it in library
> form (= use it). Most of the world now works like this.

This is totally untrue. It is vaguely sortof kindof true if you think
applications begin and end with web applications which don't get
distributed to end users, but they don't. If you're distributing the
application, distributing a library licensed under the GPL along with
it is going to force you to distribute the source of the application
too.

You may be thinking of the LGPL, which is a very different license.

In this particular instance it turns out that it's less of a problem -
the JDK libraries are actually GPL + Classpath Exceptions, but in
general GPLed code is profoundly infectious.

Re: [scala] Collections performance

by David Pollak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Sean McDirmid wrote:

I'm not sure about the licensing implications of that. The source code
for the Java collections is GPLed, and that would definitely count as
a derivative work.

Just put it out under the same license.
Copyright violations are code theft.  It's wrong, not matter what the purpose is.  Further, if parts of Scala are licensed under the GPL, lift's license becomes incompatible.  And it means lots of corporations can't use it. 
If RMS wants to sue us, we can burn that bridge when we get to it.
RMS is not the copyright holder, Sun is and they take their IP very, very seriously. 
We aren't selling Scala or being commercial in anyway, so I don't see any liability there.
The GPL imposes significant burdens on the licensee.  It's not about being open, it's about being free.  The GPL means that all other software that's linked to it must also be free.
We just make sure our stuff is as open as the stuff we are copying.
Once again, copyright violation is theft.  It's no different than stealing a car.  Being part of the FOSS community means respecting that.

David


Re: [scala] Collections performance

by David Pollak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



David MacIver wrote:
On Wed, Jul 30, 2008 at 1:22 PM, Christos KK Loverdos
loverdos@... wrote:
  
Just put it out under the same license. If RMS wants to sue us, we can
burn
that bridge when we get to it. We aren't selling Scala or being
commercial
in anyway, so I don't see any liability there. We just make sure our
stuff
is as open as the stuff we are copying.
        
No no no. Absolutely not. No way, no how.

I don't know about you, but I *am* being commercial using Scala. And
if you include GPL code in the standard libraries (and any reasonable
collections library should be aiming to eventually become standardised
as part of the standard library) then any code I wish to use Scala in
is getting bundled with GPL derived stuff and I'm screwed.

This would be so massively a deal breaker that I'm fairly horrified
you've even suggested it.

      
Guys,

We need to separate concepts here.

I think the attitude "throw anything in and worry later" is a dead-end.
Generally, you can't just mix and copy and paste stuff from different
licensing worlds. This is generally illegal.
    

Indeed.

  
OTAH and IMHO, using a library which is GPL in a commercial (as you put it,
but I assume proprietary is a better intended term) is perfectly OK. As long
as you do not touch the GPL code you are fine to play with it in library
form (= use it). Most of the world now works like this.
    

This is totally untrue. It is vaguely sortof kindof true if you think
applications begin and end with web applications which don't get
distributed to end users, but they don't. If you're distributing the
application, distributing a library licensed under the GPL along with
it is going to force you to distribute the source of the application
too.

You may be thinking of the LGPL, which is a very different license.

In this particular instance it turns out that it's less of a problem -
the JDK libraries are actually GPL + Classpath Exceptions, but in
general GPLed code is profoundly infectious.
  
David's absolutely correct here.  Plus, many corporations have an absolute bar against GPL'ed software that has not been explicitly "white listed" through license agreements or actual review of the exceptions (e.g., the classpath exception.)

Re: [scala] Collections performance

by Viktor Klang :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Wed, Jul 30, 2008 at 2:46 PM, David Pollak <dpp@...> wrote:


Sean McDirmid wrote:

I'm not sure about the licensing implications of that. The source code
for the Java collections is GPLed, and that would definitely count as
a derivative work.

Just put it out under the same license.
Copyright violations are code theft.  It's wrong, not matter what the purpose is.  Further, if parts of Scala are licensed under the GPL, lift's license becomes incompatible.  And it means lots of corporations can't use it. 
If RMS wants to sue us, we can burn that bridge when we get to it.
RMS is not the copyright holder, Sun is and they take their IP very, very seriously. 
We aren't selling Scala or being commercial in anyway, so I don't see any liability there.
The GPL imposes significant burdens on the licensee.  It's not about being open, it's about being free.  The GPL means that all other software that's linked to it must also be free.

We just make sure our stuff is as open as the stuff we are copying.
Once again, copyright violation is theft.  It's no different than stealing a car.  Being part of the FOSS community means respecting that.

With all respect, copyright violation is copyright violation, theft is theft.
Both are illegal, but don't compare apples to nuts.

Viktor
 


David




--
Viktor Klang
Rogue Software Architect

Re: [scala] Collections performance

by Christos KK Loverdos :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


This is totally untrue. It is vaguely sortof kindof true if you think
applications begin and end with web applications which don't get
distributed to end users, but they don't. If you're distributing the
application, distributing a library licensed under the GPL along with
it is going to force you to distribute the source of the application
too.

You may be thinking of the LGPL, which is a very different license.

Yes, I was trying to speak generally for the OSS landscape with a particular license in mind, which is unsound by definition. :-)

Well, the case with GPL is what constitutes a derivative work: Does my program, which just links to a GPL library, even dynamically, constitute a derivative work of that GPL library?
It seems that this issue has not been legally settled for good up to now, but If one wants to be careful, then one should be better off avoiding the use of GPL code in commercial cases we discuss here....


--
__~O
-\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com

Re: [scala] Collections performance

by David Pollak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Viktor Klang wrote:

We just make sure our stuff is as open as the stuff we are copying.
Once again, copyright violation is theft.  It's no different than stealing a car.  Being part of the FOSS community means respecting that.

With all respect, copyright violation is copyright violation, theft is theft.
Both are illegal, but don't compare apples to nuts.
Copying an MP3 from a friend may be copyright violation, but it may not be theft.

In this case, taking someone else's code (Property) and putting it in your code in violation of license terms (without their permission), is theft.  See http://en.wikipedia.org/wiki/Theft


< Prev | 1 - 2 - 3 | Next >