[scala] Collections performance

View: New views
5 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 | Next >

Re: [scala] Collections performance

by n8han :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Erik Engbrecht wrote:
-1 to complicating the license situation, but no objections to using
GPL+Classpath code for research purposes.
I find it complicated to know the difference between researching code and deriving something from it; it's much easier on the programmer to work across exactly the same license when possible. Of course, that brings up the question of from where any given project would like to do the most "deriving", but for Scala I should think Java would be it. Most of Java is now licensed under the GPL with classpath exception, the rest plain GPL (http://www.sun.com/software/opensource/java/faq.jsp#b3), so the former is what I would use if I were starting a new JVM-targeted language today. (I'm not sure what CLR brings to the table license wise.)

As I read Scala's LICENSE it says that Lausanne has copyright to pretty much the whole shebang, so, it would trivial to relicense. (I mean, compared to other projects or something originally proprietary like Java.) Maybe it's not worth the trouble. I don't know. Most custom OSS licenses eventually seem to be replaced by standard ones, or variations of standard ones. If that ever happens for Scala, I would hope the GPL with classpath exception is at least considered and not ruled out because of common misunderstandings. It's a good license, and the fact that Sun chose it (Sun!) should have put to bed the idea that it is eccentric.

Nathan

Re: [scala] Collections performance

by Viktor Klang :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On Wed, Jul 30, 2008 at 3:20 PM, David Pollak <dpp@...> wrote:


Viktor Klang wrote:

We just make sure our stuff is as open as the stuff we are copying.
Once again, copyright violation is theft.  It's no different than stealing a car.  Being part of the FOSS community means respecting that.

With all respect, copyright violation is copyright violation, theft is theft.
Both are illegal, but don't compare apples to nuts.
Copying an MP3 from a friend may be copyright violation, but it may not be theft.

In this case, taking someone else's code (Property) and putting it in your code in violation of license terms (without their permission), is theft.  See http://en.wikipedia.org/wiki/Theft

Theft implies that something is moved from one place to another. The act of Copying doesn't move anything.


--
Viktor Klang
Rogue Software Architect

Re: [scala] Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Jul 31, 2008 at 12:04 PM, Viktor Klang <viktor.klang@...> wrote:

>
>
> On Wed, Jul 30, 2008 at 3:20 PM, David Pollak <dpp@...> wrote:
>>
>>
>> Viktor Klang wrote:
>>>
>>> We just make sure our stuff is as open as the stuff we are copying.
>>>
>>> Once again, copyright violation is theft.  It's no different than
>>> stealing a car.  Being part of the FOSS community means respecting that.
>>
>> With all respect, copyright violation is copyright violation, theft is
>> theft.
>> Both are illegal, but don't compare apples to nuts.
>>
>> Copying an MP3 from a friend may be copyright violation, but it may not be
>> theft.
>>
>> In this case, taking someone else's code (Property) and putting it in your
>> code in violation of license terms (without their permission), is theft.
>> See http://en.wikipedia.org/wiki/Theft
>
> Theft implies that something is moved from one place to another. The act of
> Copying doesn't move anything.

Take it to scala-debate please.

[scala] Re: Collections performance

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

If anyone's interested, I've added an implementation of a mutable hash
map based on open hashing which appears to do pretty well (it's
extremely loosely based on the implementation of Python's dictionary
object. Basically it uses the same hashing scheme - I've not borrowed
anything else from the implementation). It's about 20% slower on gets
than the jcl.HashMap implementation and appears to be fractionally
faster on updates. It's in dictionary.scala in the same repo as the
rest of the code ( http://www.drmaciver.com/repos/misc , in
scala/maps)

It's remarkably simple given how well it does. I'm a little surprised.

[scala] Re: Collections performance

by Ismael Juma :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

David MacIver <david.maciver <at> gmail.com> writes:
> It's about 20% slower on gets
> than the jcl.HashMap implementation and appears to be fractionally
> faster on updates. It's in dictionary.scala in the same repo as the
> rest of the code ( http://www.drmaciver.com/repos/misc , in
> scala/maps)
>
> It's remarkably simple given how well it does. I'm a little surprised.

It makes sense to me. java.util.HashMap is quite simple itself, so simplicity
seems to make the JIT's job easier resulting in better performance. Also,
there's room for improvement in the java.util.HashMap from JDK6. For example,
the modcount field used to throw ConcurrentModificationException is volatile for
no good reason. This has been fixed in the mercurial repo for OpenJDK7. And if
one didn't care for ConcurrentModificationException, then the logic could be
simplified further. The fact that it caches hashCodes does not help if the
hashCode is cheap to compute or is cached in the object itself (so this won't be
an advantage for the current benchmarks that use Strings) and finally it
includes some code (e.g. recordAccess, recordRemoval) that is only there to make
the implementation of LinkedHashMap easier.

Even though I haven't had the time to analyse in detail, it seems to me that
mutable.HashMap is paying the price for too much abstraction. If I had to guess,
I would say that the JIT is failing to inline some calls causing the performance
disparity.

Of course, it doesn't help that an update in mutable.HashMap involves calling
HashTable.findEntry followed by HashTable.addEntry where things like the
hashCode are computed twice, but I don't think should affect the current
benchmarks much (since String caches the hashCode). It seems like an easy
improvement to make though.

I was also curious how much of an effect rehashings were having in the update
scores. So, I set the initial size to be large enough to avoid any rehashings.
For jcl.HashMap, I just passed it as the constructor parameter and it got a nice
improvement (from 375ms to 272 ms). For mutable.HashMap I had to create an
anonymous inner class and override initialSize. To my surprise it was
substantially slower (832ms to 1307ms). Unless there's a bug somewhere, it seems
like the creation of the anonymous inner class is causing even more missed
optimisations. Maybe it's causing some call site to become bimorphic or
megamorphic, but haven't looked at the full benchmark in detail.

Ismael


< Prev | 1 - 2 - 3 | Next >