Towards a revised collection API

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 | Next >

Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

So, with the recent noise I've been making about the collections API,
I figured I should put some thoughts down on screen.

As I see it one of the main problems with the current collection API
is that in order to make things play nicely you need to override
methods at every step of the way. map, filter, flatMap etc all return
the "wrong" things. This is a pain to write and easy to get wrong.

The other thing I don't like about the collections API is that I'm not
a big fan of iterators. They're really cumbersome to create for custom
collection types and are designed around a really irritating "check
then proceed" style. Really all we care about is how to traverse the
data structure, and constructing an iterator to do so is often not the
best way.

Here's something I threw together last night to try to address these
problems. The basic principles are:

- It should be easy to convert between collection types
- Consequently it's ok if we return the "wrong" thing.
- In fact it may be desirable to do so!
- All we care about generically are how to traverse a data structure
and how to build one, so these are the operations the API should be
designed around.

So the API defines a trait Enumerable which describes how to traverse
a structure and a trait Buildable which describes how to build one
(the data structure doesn't implement Buildable, one makes an implicit
instance available). Implementations do not override the enumerable
map, flatMap, etc. methods - these continue to return a generic
enumerable. However at the end of a bunch of operations you explicitly
"cast" to the desired result.

So for example

scala> for (i <- range(1, 4); j <- range(1, 4); if (i != j)) yield (i, j)
res8: enumerable.Enumerable[(Int, Int)] = enumerable.Enumerable$$anon$4@708aff

scala> res8.intersperse(",").mkString
res9: String = (1,2),(1,3),(2,1),(2,3),(3,1),(3,2)

scala> res8.as[List, (Int, Int)]
res10: List[(Int, Int)] = List((1,2), (1,3), (2,1), (2,3), (3,1), (3,2))

(you need to specify the type like that in order to make this work
because of Enumerable being covariant in its type)

I've not benchmarked, but for large for comprehensions this should
actually be more efficient, as it avoids the construction of
intermediate datastructures.

We can use return from inside passed closures to control evaluation
and exit the loop prematurely, which allows efficient implementation
of things like exists, foreach and take. For example:

scala> def from(x : Int) = new Enumerable[Int] { def foreach(f : Int
=> Unit) = {var i = x; while(true) { f(i); i =} } } i + 1} } }
from: (Int)java.lang.Object with enumerable.Enumerable[Int]

scala> from(0).take(10).foreach(println);
0
1
2
3
4
5
6
7
8
9

The exact structure of Buildable needs work (it doesn't currently
handle sorted collections well for example), but I'm reaonably happy
with how Enumerable works.

Any thoughts?


enumerable.scala (4K) Download Attachment

Re: Towards a revised collection API

by Christos KK Loverdos :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Have you seen the paper "Generics of a Higher Kind"?

http://lambda-the-ultimate.org/node/2579

They introduce a Buildable trait that does the job. Moreover, Martin
has just said in scala-debate that they are redesigning the
collections in order to use the ideas in the paper, and they do so in
a backwards-compatible way. It will take them a couple of months
before they go public.



On Fri, Apr 18, 2008 at 2:28 PM, David MacIver <david.maciver@...> wrote:

> So, with the recent noise I've been making about the collections API,
>  I figured I should put some thoughts down on screen.
>
>  As I see it one of the main problems with the current collection API
>  is that in order to make things play nicely you need to override
>  methods at every step of the way. map, filter, flatMap etc all return
>  the "wrong" things. This is a pain to write and easy to get wrong.
>
>  The other thing I don't like about the collections API is that I'm not
>  a big fan of iterators. They're really cumbersome to create for custom
>  collection types and are designed around a really irritating "check
>  then proceed" style. Really all we care about is how to traverse the
>  data structure, and constructing an iterator to do so is often not the
>  best way.
>
>  Here's something I threw together last night to try to address these
>  problems. The basic principles are:
>
>  - It should be easy to convert between collection types
>  - Consequently it's ok if we return the "wrong" thing.
>  - In fact it may be desirable to do so!
>  - All we care about generically are how to traverse a data structure
>  and how to build one, so these are the operations the API should be
>  designed around.
>
>  So the API defines a trait Enumerable which describes how to traverse
>  a structure and a trait Buildable which describes how to build one
>  (the data structure doesn't implement Buildable, one makes an implicit
>  instance available). Implementations do not override the enumerable
>  map, flatMap, etc. methods - these continue to return a generic
>  enumerable. However at the end of a bunch of operations you explicitly
>  "cast" to the desired result.
>
>  So for example
>
>  scala> for (i <- range(1, 4); j <- range(1, 4); if (i != j)) yield (i, j)
>  res8: enumerable.Enumerable[(Int, Int)] = enumerable.Enumerable$$anon$4@708aff
>
>  scala> res8.intersperse(",").mkString
>  res9: String = (1,2),(1,3),(2,1),(2,3),(3,1),(3,2)
>
>  scala> res8.as[List, (Int, Int)]
>  res10: List[(Int, Int)] = List((1,2), (1,3), (2,1), (2,3), (3,1), (3,2))
>
>  (you need to specify the type like that in order to make this work
>  because of Enumerable being covariant in its type)
>
>  I've not benchmarked, but for large for comprehensions this should
>  actually be more efficient, as it avoids the construction of
>  intermediate datastructures.
>
>  We can use return from inside passed closures to control evaluation
>  and exit the loop prematurely, which allows efficient implementation
>  of things like exists, foreach and take. For example:
>
>  scala> def from(x : Int) = new Enumerable[Int] { def foreach(f : Int
>  => Unit) = {var i = x; while(true) { f(i); i =} } } i + 1} } }
>  from: (Int)java.lang.Object with enumerable.Enumerable[Int]
>
>  scala> from(0).take(10).foreach(println);
>  0
>  1
>  2
>  3
>  4
>  5
>  6
>  7
>  8
>  9
>
>  The exact structure of Buildable needs work (it doesn't currently
>  handle sorted collections well for example), but I'm reaonably happy
>  with how Enumerable works.
>
>  Any thoughts?
>



--
 __~O
 -\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Apr 18, 2008 at 12:39 PM, Christos KK Loverdos
<loverdos@...> wrote:
> Have you seen the paper "Generics of a Higher Kind"?
>
>  http://lambda-the-ultimate.org/node/2579

Yes, the Buildable trait is based on that. Obviously. :-) The more
interesting part of this is the dual of that, the Enumerable trait.

>  They introduce a Buildable trait that does the job. Moreover, Martin
>  has just said in scala-debate that they are redesigning the
>  collections in order to use the ideas in the paper, and they do so in
>  a backwards-compatible way. It will take them a couple of months
>  before they go public.

Which is all the more reason to discuss this now rather than in a few
months once the decision has been finalised...


Re: Towards a revised collection API

by Jamie Webb-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2008-04-18 12:28:38 David MacIver wrote:
> So, with the recent noise I've been making about the collections API,
> I figured I should put some thoughts down on screen.
>
> As I see it one of the main problems with the current collection API
> is that in order to make things play nicely you need to override
> methods at every step of the way. map, filter, flatMap etc all return
> the "wrong" things. This is a pain to write and easy to get wrong.

This is what Martin is talking about fixing with type constructor
polymorphism, which seems preferable by far.

> The other thing I don't like about the collections API is that I'm not
> a big fan of iterators. They're really cumbersome to create for custom
> collection types and are designed around a really irritating "check
> then proceed" style. Really all we care about is how to traverse the
> data structure, and constructing an iterator to do so is often not the
> best way.

Agreed. See scalax.control.ManagedSequence for some code that gets
badly bitten by this. However, aside from creating one or more
superclasses of Iterable that define the more functional interfaces,
it's hard to see how to improve.

> Here's something I threw together last night to try to address these
> problems. The basic principles are:
>
> - It should be easy to convert between collection types

Fair enough.

> - Consequently it's ok if we return the "wrong" thing.

Only if we assume that all our operations are non-strict, which I don't
think is a good thing.

> - In fact it may be desirable to do so!
> - All we care about generically are how to traverse a data structure
> and how to build one, so these are the operations the API should be
> designed around.

Clearly, but that is already the case.

> So the API defines a trait Enumerable which describes how to traverse
> a structure and a trait Buildable which describes how to build one
> (the data structure doesn't implement Buildable, one makes an implicit
> instance available). Implementations do not override the enumerable
> map, flatMap, etc. methods - these continue to return a generic
> enumerable. However at the end of a bunch of operations you explicitly
> "cast" to the desired result.

Do you have some type signatures for these classes that you can show us?

...
> We can use return from inside passed closures to control evaluation
> and exit the loop prematurely, which allows efficient implementation
> of things like exists, foreach and take. For example:

Non-local returns are implemented via exceptions, so they are not
really efficient. I think any move away from iterators is likely to
imply a performance loss.

/J


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...> wrote:

> On 2008-04-18 12:28:38 David MacIver wrote:
>  > So, with the recent noise I've been making about the collections API,
>  > I figured I should put some thoughts down on screen.
>  >
>  > As I see it one of the main problems with the current collection API
>  > is that in order to make things play nicely you need to override
>  > methods at every step of the way. map, filter, flatMap etc all return
>  > the "wrong" things. This is a pain to write and easy to get wrong.
>
>  This is what Martin is talking about fixing with type constructor
>  polymorphism, which seems preferable by far.

Sure. This solution uses type constructor polymorphism too. But
getting it to handle returning the right type automatically without
having to constantly reimplement isn't totally trivial.

>  > The other thing I don't like about the collections API is that I'm not
>  > a big fan of iterators. They're really cumbersome to create for custom
>  > collection types and are designed around a really irritating "check
>  > then proceed" style. Really all we care about is how to traverse the
>  > data structure, and constructing an iterator to do so is often not the
>  > best way.
>
>  Agreed. See scalax.control.ManagedSequence for some code that gets
>  badly bitten by this. However, aside from creating one or more
>  superclasses of Iterable that define the more functional interfaces,
>  it's hard to see how to improve.

Either the mailing list stripped the attached file, or you missed it.
In the case that it's the former, here's a pastebin:
http://pastebin.com/m23106284

My point is you can reimplement basically the entire Iterable trait
sans "elements" based on only foreach.

>  > - Consequently it's ok if we return the "wrong" thing.
>
>  Only if we assume that all our operations are non-strict, which I don't
>  think is a good thing.

I think this is ok. The useage I have in mind is that you do a bunch
of calculations with a "suspended" result and then immediately thunk
the whole thing.

>  > - In fact it may be desirable to do so!
>  > - All we care about generically are how to traverse a data structure
>  > and how to build one, so these are the operations the API should be
>  > designed around.
>
>  Clearly, but that is already the case.

Not really. The API is poorly constructed for building, and the
traversal strategies center around creating iterators.

>  > So the API defines a trait Enumerable which describes how to traverse
>  > a structure and a trait Buildable which describes how to build one
>  > (the data structure doesn't implement Buildable, one makes an implicit
>  > instance available). Implementations do not override the enumerable
>  > map, flatMap, etc. methods - these continue to return a generic
>  > enumerable. However at the end of a bunch of operations you explicitly
>  > "cast" to the desired result.
>
>  Do you have some type signatures for these classes that you can show us?

As before, see attached file / pastebin.

> > We can use return from inside passed closures to control evaluation
>  > and exit the loop prematurely, which allows efficient implementation
>  > of things like exists, foreach and take. For example:
>
>  Non-local returns are implemented via exceptions, so they are not
>  really efficient. I think any move away from iterators is likely to
>  imply a performance loss.

I disagree. Exceptions are extremely fast on the JVM and have been for
years. They basically compile to gotos. It's just stack trace
generation which is slow, which you can surpress in your
implementation of Throwable (which I assume Scala's non-local returns
do. If not, I'd consider that a bug. I'll check later).

I'm not saying this will never be slower. But equally I expect there
are a lot of times where it will be faster, particularly when it can
eliminate a bunch of intermediate structures.


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Apr 18, 2008 at 1:28 PM, David MacIver <david.maciver@...> wrote:
>  >  Non-local returns are implemented via exceptions, so they are not
>  >  really efficient. I think any move away from iterators is likely to
>  >  imply a performance loss.
>
>  I disagree. Exceptions are extremely fast on the JVM and have been for
>  years. They basically compile to gotos. It's just stack trace
>  generation which is slow, which you can surpress in your
>  implementation of Throwable (which I assume Scala's non-local returns
>  do. If not, I'd consider that a bug. I'll check later).

Looks like it doesn't. I've filed a bug:
https://lampsvn.epfl.ch/trac/scala/ticket/775


Re: Towards a revised collection API

by Jamie Webb-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2008-04-18 13:53:20 David MacIver wrote:

> On Fri, Apr 18, 2008 at 1:28 PM, David MacIver
> <david.maciver@...> wrote:
> >  >  Non-local returns are implemented via exceptions, so they are
> >  > not really efficient. I think any move away from iterators is
> >  > likely to imply a performance loss.
> >
> >  I disagree. Exceptions are extremely fast on the JVM and have been
> > for years. They basically compile to gotos. It's just stack trace
> >  generation which is slow, which you can surpress in your
> >  implementation of Throwable (which I assume Scala's non-local
> > returns do. If not, I'd consider that a bug. I'll check later).
>
> Looks like it doesn't. I've filed a bug:
> https://lampsvn.epfl.ch/trac/scala/ticket/775

There's a reason: if a closure escapes its defining method then the
exception will propagated as normal. This would be a pain to track down
without a stack trace.

/J


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Apr 18, 2008 at 5:53 PM, Jamie Webb <j@...> wrote:

>
> On 2008-04-18 13:53:20 David MacIver wrote:
>  > On Fri, Apr 18, 2008 at 1:28 PM, David MacIver
>  > <david.maciver@...> wrote:
>  > >  >  Non-local returns are implemented via exceptions, so they are
>  > >  > not really efficient. I think any move away from iterators is
>  > >  > likely to imply a performance loss.
>  > >
>  > >  I disagree. Exceptions are extremely fast on the JVM and have been
>  > > for years. They basically compile to gotos. It's just stack trace
>  > >  generation which is slow, which you can surpress in your
>  > >  implementation of Throwable (which I assume Scala's non-local
>  > > returns do. If not, I'd consider that a bug. I'll check later).
>  >
>  > Looks like it doesn't. I've filed a bug:
>  > https://lampsvn.epfl.ch/trac/scala/ticket/775
>
>  There's a reason: if a closure escapes its defining method then the
>  exception will propagated as normal. This would be a pain to track down
>  without a stack trace.

I don't buy that. It's a sufficiently rare error that breaking the
performance of this feature so totally in an attempt to make debugging
of the it slightly easier is really pointless.

In any case, if this is really deemed to be a problem there are
solutions of varying degrees of difficulty  The easiest it to have a
global debug parameter somewhere that can be set and have the stack
trace be suppressed only if this isn't set.


Re: Towards a revised collection API

by Jamie Webb-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2008-04-18 13:28:48 David MacIver wrote:

> On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...>
> wrote:
> > On 2008-04-18 12:28:38 David MacIver wrote:
> >  > So, with the recent noise I've been making about the collections
> >  > API, I figured I should put some thoughts down on screen.
> >  >
> >  > As I see it one of the main problems with the current collection
> >  > API is that in order to make things play nicely you need to
> >  > override methods at every step of the way. map, filter, flatMap
> >  > etc all return the "wrong" things. This is a pain to write and
> >  > easy to get wrong.
> >
> >  This is what Martin is talking about fixing with type constructor
> >  polymorphism, which seems preferable by far.
>
> Sure. This solution uses type constructor polymorphism too. But
> getting it to handle returning the right type automatically without
> having to constantly reimplement isn't totally trivial.

The collections hierarchy is such a core part of the library that I
don't consider the implementation cost to be anywhere near as important
as usability or performance. From what I can see, your proposal falls
down on both those fronts.

> >  > The other thing I don't like about the collections API is that
> >  > I'm not a big fan of iterators. They're really cumbersome to
> >  > create for custom collection types and are designed around a
> >  > really irritating "check then proceed" style. Really all we care
> >  > about is how to traverse the data structure, and constructing an
> >  > iterator to do so is often not the best way.
> >
> >  Agreed. See scalax.control.ManagedSequence for some code that gets
> >  badly bitten by this. However, aside from creating one or more
> >  superclasses of Iterable that define the more functional
> > interfaces, it's hard to see how to improve.
>
> Either the mailing list stripped the attached file, or you missed it.
> In the case that it's the former, here's a pastebin:
> http://pastebin.com/m23106284

Sorry, I missed it.

> My point is you can reimplement basically the entire Iterable trait
> sans "elements" based on only foreach.

You can, and I think it would be good to have a superclass of Iterable
that does this. However, below that there need to be classes which
implement these methods in a way that best suits their access
characteristics. The primary example of this at present is
RandomAccessSeq, but the same is true of all the variations to some
degree.

> >  > - Consequently it's ok if we return the "wrong" thing.
> >
> >  Only if we assume that all our operations are non-strict, which I
> > don't think is a good thing.
>
> I think this is ok. The useage I have in mind is that you do a bunch
> of calculations with a "suspended" result and then immediately thunk
> the whole thing.

So I'd have to write things like this?

  myNumbers.map(_ * 2).as[List, Int]

That's... ugly.

> I'm not saying this will never be slower. But equally I expect there
> are a lot of times where it will be faster, particularly when it can
> eliminate a bunch of intermediate structures.

Bear in mind that the library already offers Projections which do this,
so your version will not be faster.

It seems like your Enumerable and Buildable concepts are perfectly
valuable things to have as part of the library, but there's no way you
could use them as the basis for a practical collections hierarchy. The
same is likely true of any other simple solution. The problem is not
simple.

I think the collections library needs tidying up and rationalising,
and updating to make use of current language features. I'd like to see
it /support/ collections that lack iterators. But I don't see the basic
design changing much.

/J


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, Apr 19, 2008 at 3:09 AM, Jamie Webb <j@...> wrote:

> On 2008-04-18 13:28:48 David MacIver wrote:
>  > On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...>
>  > wrote:
>  > > On 2008-04-18 12:28:38 David MacIver wrote:
>  > >  > So, with the recent noise I've been making about the collections
>  > >  > API, I figured I should put some thoughts down on screen.
>  > >  >
>  > >  > As I see it one of the main problems with the current collection
>  > >  > API is that in order to make things play nicely you need to
>  > >  > override methods at every step of the way. map, filter, flatMap
>  > >  > etc all return the "wrong" things. This is a pain to write and
>  > >  > easy to get wrong.
>  > >
>  > >  This is what Martin is talking about fixing with type constructor
>  > >  polymorphism, which seems preferable by far.
>  >
>  > Sure. This solution uses type constructor polymorphism too. But
>  > getting it to handle returning the right type automatically without
>  > having to constantly reimplement isn't totally trivial.
>
>  The collections hierarchy is such a core part of the library that I
>  don't consider the implementation cost to be anywhere near as important
>  as usability or performance. From what I can see, your proposal falls
>  down on both those fronts.

I don't think you've made a convincing argument that performance would
suffer. The only issue raised so far is that nonlocal return is slower
than it should be, but that's trivial to fix in the compiler/standard
library and not so hard to fix outside the standard library with a bit
of custom use of exceptions (in actual fact, some benchmarking
suggests that it might even be desireable to do so). Indeed, internal
iteration is often faster than external iteration, and avoids a lot of
intermediate structures so reduces object churn. I wouldn't be
surprised if this was a slight performance gain. Equally, I wouldn't
be surprised if it was a slight performance loss. I just don't think
it's as clear cut as you're making it out to be. No way to tell other
than to benchmark really.

In terms of usability, that needs some work, but I don't find this to
be particularly unusable as it stands.

In terms of it being a core part of the library, the cost of
implementation actually is a big deal. Scala currently lacks much in
the way of a rich set of functional data structures. This isn't a huge
issue - certainly I don't expect the EFPL team to have provided us
with such - but the easier we can make it to implement new ones the
more likely it is that we'll get such.

>  > My point is you can reimplement basically the entire Iterable trait
>  > sans "elements" based on only foreach.
>
>  You can, and I think it would be good to have a superclass of Iterable
>  that does this. However, below that there need to be classes which
>  implement these methods in a way that best suits their access
>  characteristics. The primary example of this at present is
>  RandomAccessSeq, but the same is true of all the variations to some
>  degree.

Well, sure. Subclasses are still welcome to override them to replace
them with better versions - it would be madness to make these methods
final. The idea is to make it so they do the right thing by default
without forcing an override.

Although, actually, I can't think of any reason you'd want to
reimplement these for RandomAccessSeq.

>  > >  Only if we assume that all our operations are non-strict, which I
>  > > don't think is a good thing.
>  >
>  > I think this is ok. The useage I have in mind is that you do a bunch
>  > of calculations with a "suspended" result and then immediately thunk
>  > the whole thing.
>
>  So I'd have to write things like this?
>
>   myNumbers.map(_ * 2).as[List, Int]
>
>  That's... ugly.

It can certainly be improved. For example, I think it could be reduced
to myNumbers.map(_ * 2) : List[Int] now that the contractiveness
condition has been removed, but that's a rather abusive use of
implicit conversions. I need to have a play with it and see if I can
come up with something in between.

Even without that, I can definitely reduce it to as[List[Int]], which
I find a bit better. I just ran into some last minute trouble with the
way Buildable worked and the fact that Enumerable was covariant, so
ended up with a slightly ugly signature there.

And the thing is, yes. This is a bit uglier if you happen to have the
right data structure. But even if map, flatMap, etc. all return the
"right" thing (which they can't always do. SortedSet for example would
require a different type signature for them if they were to return the
right type of data structure) you often find you started with a List
and now you want a Map. And with this you could equally do:

myNumbers.map(_ * 2).as[FingerTree[Int]]

or

myNumbers.map(x => (x, x)).as[Map[Int, Int]]

which I find a good deal more pleasant than the current state of play.

I agree this makes the mainline case a little more awkward, and I need
to have a think about how to improve that, but it makes a bunch of
other cases much less so.

>
>  > I'm not saying this will never be slower. But equally I expect there
>  > are a lot of times where it will be faster, particularly when it can
>  > eliminate a bunch of intermediate structures.
>
>  Bear in mind that the library already offers Projections which do this,
>  so your version will not be faster.

Projection works by creating custom iterators to do these things.
There's a moderate amount more book keeping to be done to make this
work with iterators, and they still implement their methods in terms
of those iterators. Particularly for very nested structures like trees
this is potentially a non-trivial amount of work, which might make my
version faster. That being said, I would be amazed if my proposal
ended up substantially faster than using projections all over the
place, but hopefully it would be a good deal more convenient. :-)

>  It seems like your Enumerable and Buildable concepts are perfectly
>  valuable things to have as part of the library, but there's no way you

I'd certainly be ok with, for example, offering Iterable as a subtrait
of Enumerable and otherwise carrying on as normal. If nothing else it
would smooth the transition if (and when ;-) ) it did turn out that I
was right, and sometimes it really is convenient to have an iterator
(for example, it's really hard to zip two enumerables together).

>  could use them as the basis for a practical collections hierarchy. The
>  same is likely true of any other simple solution. The problem is not
>  simple.

Don't agree. Most problems are simple once you get down to the core.
But I don't think there's any way for me to convince you of this other
than to actually do it. :-)

>  I think the collections library needs tidying up and rationalising,
>  and updating to make use of current language features. I'd like to see
>  it /support/ collections that lack iterators. But I don't see the basic
>  design changing much.


Re: Towards a revised collection API

by Christos KK Loverdos :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Wouldn't it be to our benefit to analyze a little the basic components
(traits, classes) of the Scala collection library and then see what we
can do to make it better? I currently do not have the whole picture
and I think there are a lot of people out there that would like to
know the rationale behind, i.e. Projection, RandomAccessSeq, Iterator
etc and the role they play in the design and implementation of the
library.

It is my opinion that only after we pin down these things, we may be
able to contribute something better.

On Sat, Apr 19, 2008 at 2:26 PM, David MacIver <david.maciver@...> wrote:

> On Sat, Apr 19, 2008 at 3:09 AM, Jamie Webb <j@...> wrote:
>  > On 2008-04-18 13:28:48 David MacIver wrote:
>  >  > On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...>
>  >  > wrote:
>  >  > > On 2008-04-18 12:28:38 David MacIver wrote:
>  >  > >  > So, with the recent noise I've been making about the collections
>  >  > >  > API, I figured I should put some thoughts down on screen.
>  >  > >  >
>  >  > >  > As I see it one of the main problems with the current collection
>  >  > >  > API is that in order to make things play nicely you need to
>  >  > >  > override methods at every step of the way. map, filter, flatMap
>  >  > >  > etc all return the "wrong" things. This is a pain to write and
>  >  > >  > easy to get wrong.
>  >  > >
>  >  > >  This is what Martin is talking about fixing with type constructor
>  >  > >  polymorphism, which seems preferable by far.
>  >  >
>  >  > Sure. This solution uses type constructor polymorphism too. But
>  >  > getting it to handle returning the right type automatically without
>  >  > having to constantly reimplement isn't totally trivial.
>  >
>  >  The collections hierarchy is such a core part of the library that I
>  >  don't consider the implementation cost to be anywhere near as important
>  >  as usability or performance. From what I can see, your proposal falls
>  >  down on both those fronts.
>
>  I don't think you've made a convincing argument that performance would
>  suffer. The only issue raised so far is that nonlocal return is slower
>  than it should be, but that's trivial to fix in the compiler/standard
>  library and not so hard to fix outside the standard library with a bit
>  of custom use of exceptions (in actual fact, some benchmarking
>  suggests that it might even be desireable to do so). Indeed, internal
>  iteration is often faster than external iteration, and avoids a lot of
>  intermediate structures so reduces object churn. I wouldn't be
>  surprised if this was a slight performance gain. Equally, I wouldn't
>  be surprised if it was a slight performance loss. I just don't think
>  it's as clear cut as you're making it out to be. No way to tell other
>  than to benchmark really.
>
>  In terms of usability, that needs some work, but I don't find this to
>  be particularly unusable as it stands.
>
>  In terms of it being a core part of the library, the cost of
>  implementation actually is a big deal. Scala currently lacks much in
>  the way of a rich set of functional data structures. This isn't a huge
>  issue - certainly I don't expect the EFPL team to have provided us
>  with such - but the easier we can make it to implement new ones the
>  more likely it is that we'll get such.
>
>
>  >  > My point is you can reimplement basically the entire Iterable trait
>  >  > sans "elements" based on only foreach.
>  >
>  >  You can, and I think it would be good to have a superclass of Iterable
>  >  that does this. However, below that there need to be classes which
>  >  implement these methods in a way that best suits their access
>  >  characteristics. The primary example of this at present is
>  >  RandomAccessSeq, but the same is true of all the variations to some
>  >  degree.
>
>  Well, sure. Subclasses are still welcome to override them to replace
>  them with better versions - it would be madness to make these methods
>  final. The idea is to make it so they do the right thing by default
>  without forcing an override.
>
>  Although, actually, I can't think of any reason you'd want to
>  reimplement these for RandomAccessSeq.
>
>
>  >  > >  Only if we assume that all our operations are non-strict, which I
>  >  > > don't think is a good thing.
>  >  >
>  >  > I think this is ok. The useage I have in mind is that you do a bunch
>  >  > of calculations with a "suspended" result and then immediately thunk
>  >  > the whole thing.
>  >
>  >  So I'd have to write things like this?
>  >
>  >   myNumbers.map(_ * 2).as[List, Int]
>  >
>  >  That's... ugly.
>
>  It can certainly be improved. For example, I think it could be reduced
>  to myNumbers.map(_ * 2) : List[Int] now that the contractiveness
>  condition has been removed, but that's a rather abusive use of
>  implicit conversions. I need to have a play with it and see if I can
>  come up with something in between.
>
>  Even without that, I can definitely reduce it to as[List[Int]], which
>  I find a bit better. I just ran into some last minute trouble with the
>  way Buildable worked and the fact that Enumerable was covariant, so
>  ended up with a slightly ugly signature there.
>
>  And the thing is, yes. This is a bit uglier if you happen to have the
>  right data structure. But even if map, flatMap, etc. all return the
>  "right" thing (which they can't always do. SortedSet for example would
>  require a different type signature for them if they were to return the
>  right type of data structure) you often find you started with a List
>  and now you want a Map. And with this you could equally do:
>
>  myNumbers.map(_ * 2).as[FingerTree[Int]]
>
>  or
>
>  myNumbers.map(x => (x, x)).as[Map[Int, Int]]
>
>  which I find a good deal more pleasant than the current state of play.
>
>  I agree this makes the mainline case a little more awkward, and I need
>  to have a think about how to improve that, but it makes a bunch of
>  other cases much less so.
>
>
>  >
>  >  > I'm not saying this will never be slower. But equally I expect there
>  >  > are a lot of times where it will be faster, particularly when it can
>  >  > eliminate a bunch of intermediate structures.
>  >
>  >  Bear in mind that the library already offers Projections which do this,
>  >  so your version will not be faster.
>
>  Projection works by creating custom iterators to do these things.
>  There's a moderate amount more book keeping to be done to make this
>  work with iterators, and they still implement their methods in terms
>  of those iterators. Particularly for very nested structures like trees
>  this is potentially a non-trivial amount of work, which might make my
>  version faster. That being said, I would be amazed if my proposal
>  ended up substantially faster than using projections all over the
>  place, but hopefully it would be a good deal more convenient. :-)
>
>
>  >  It seems like your Enumerable and Buildable concepts are perfectly
>  >  valuable things to have as part of the library, but there's no way you
>
>  I'd certainly be ok with, for example, offering Iterable as a subtrait
>  of Enumerable and otherwise carrying on as normal. If nothing else it
>  would smooth the transition if (and when ;-) ) it did turn out that I
>  was right, and sometimes it really is convenient to have an iterator
>  (for example, it's really hard to zip two enumerables together).
>
>
>  >  could use them as the basis for a practical collections hierarchy. The
>  >  same is likely true of any other simple solution. The problem is not
>  >  simple.
>
>  Don't agree. Most problems are simple once you get down to the core.
>  But I don't think there's any way for me to convince you of this other
>  than to actually do it. :-)
>
>
>
>  >  I think the collections library needs tidying up and rationalising,
>  >  and updating to make use of current language features. I'd like to see
>  >  it /support/ collections that lack iterators. But I don't see the basic
>  >  design changing much.
>
>



--
 __~O
 -\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com


Re: Towards a revised collection API

by Jamie Webb-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2008-04-19 12:26:10 David MacIver wrote:
> I don't think you've made a convincing argument that performance would
> suffer.

My point is that performance would not improve, and that
reimplementations in subclasses would be at least as necessary as they
are now in order to maintain the current level of performance.

> Although, actually, I can't think of any reason you'd want to
> reimplement these for RandomAccessSeq.

If you implement e.g. map over a RandomAccessSeq using foreach, you
lose random access.

> It can certainly be improved. For example, I think it could be reduced
> to myNumbers.map(_ * 2) : List[Int] now that the contractiveness
> condition has been removed, but that's a rather abusive use of
> implicit conversions. I need to have a play with it and see if I can
> come up with something in between.

That's still horrible though. I suspect that whatever Martin and
Adriaan come up with will be rather more palatable.

> I agree this makes the mainline case a little more awkward, and I need
> to have a think about how to improve that, but it makes a bunch of
> other cases much less so.

Conversion between collections is (potentially) a pretty orthogonal
issue. It would be trivial to allow code such as:

  myNumbers.projection.map(_ * 2).as(FingerTree)

And I don't even need any implicits to do it.

Incidentally, don't you think that your Builders look an awful lot like
the opposite of an Iterator? That is, following the direction you've
taken with Enumerable I imagine a somehow more 'pure' way to build a new
data structure would look like:

  object List extends Maker[List] {
      def make[A](e : Enumerable[A]) = {
          val b = new Listbuffer[A]
          e.foreach(b += _)
          b.toList
      }
  }

> I'd certainly be ok with, for example, offering Iterable as a subtrait
> of Enumerable and otherwise carrying on as normal. If nothing else it
> would smooth the transition if (and when ;-) ) it did turn out that I
> was right, and sometimes it really is convenient to have an iterator
> (for example, it's really hard to zip two enumerables together).

But we ought reasonably to be able to zip any ordered collections,
right? So they all need iterators to do so efficiently.

> >  could use them as the basis for a practical collections hierarchy.
> > The same is likely true of any other simple solution. The problem
> > is not simple.
>
> Don't agree. Most problems are simple once you get down to the core.

If by 'get down to the core' you mean 'discard half the requirements',
then yes I agree...

To summarise, what I like:

 - A reduction in the importance of Iterators
 - A mechanism for more general conversions between collections

What I don't like:

 - Methods that return Enumerable rather than the expected type
 - An assumption that any single access method will be applicable to
   all data structures and operations.

/J


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, Apr 19, 2008 at 6:49 PM, Jamie Webb <j@...> wrote:
> On 2008-04-19 12:26:10 David MacIver wrote:
>  > I don't think you've made a convincing argument that performance would
>  > suffer.
>
>  My point is that performance would not improve, and that

Don't agree. I'm not making this up - internal iteration really is
often faster, and the suggestion is not without precedent. It's at
least partially based on
http://okmij.org/ftp/papers/LL3-collections-enumerators.txt (note:
That's oleg's site. It's generally a good idea to assume he knows what
he's talking about).

Parts of what are suggested become a bit more difficult due to the
lack of continuations, but the basic idea carries over.

>  reimplementations in subclasses would be at least as necessary as they
>  are now in order to maintain the current level of performance.

Also don't agree. There very often isn't a lot of performance benefit
to be gained

>  > Although, actually, I can't think of any reason you'd want to
>  > reimplement these for RandomAccessSeq.
>
>  If you implement e.g. map over a RandomAccessSeq using foreach, you
>  lose random access.

Well, yes. That's sortof the point of this design. If you want a
RandomAccessSeq you get one back out at the end.

In particular the combination foo.map(bar).as[Array, Stuff] will end
up compiling to something basically equivalent to the override you'd
want to make.

>  > It can certainly be improved. For example, I think it could be reduced
>  > to myNumbers.map(_ * 2) : List[Int] now that the contractiveness
>  > condition has been removed, but that's a rather abusive use of
>  > implicit conversions. I need to have a play with it and see if I can
>  > come up with something in between.
>
>  That's still horrible though. I suspect that whatever Martin and
>  Adriaan come up with will be rather more palatable.

There's something of a limit to how much they can do and still retain
enough flexibility. I certainly look forward to seeing what they come
up with.

>  > I agree this makes the mainline case a little more awkward, and I need
>  > to have a think about how to improve that, but it makes a bunch of
>  > other cases much less so.
>
>  Conversion between collections is (potentially) a pretty orthogonal
>  issue. It would be trivial to allow code such as:

Well, it's true that you can add this functionality without the rest
of my proposal and that it's probably worth doing so. Clearly we
differ as to whether it's still worth trying to make sure everything
is of the right type once one has done that. :-)

>   myNumbers.projection.map(_ * 2).as(FingerTree)
>
>  And I don't even need any implicits to do it.

True, the implicits can be avoided, but I prefer to use them for
various reasons (mainly not having to care about the name of what
provides the service and it avoids a hardcoded reference - if I want
to change the behaviour for the local scope I can).

>  Incidentally, don't you think that your Builders look an awful lot like
>  the opposite of an Iterator? That is, following the direction you've
>  taken with Enumerable I imagine a somehow more 'pure' way to build a new
>  data structure would look like:
>
>   object List extends Maker[List] {
>       def make[A](e : Enumerable[A]) = {
>           val b = new Listbuffer[A]
>           e.foreach(b += _)
>           b.toList
>
>       }
>   }

That's a very good idea actually. In particular it solves a number of
things I've been worrying about (the main one being that if List
overrides something so that it actually returns a List then as[List,
T] would still copy it). Thanks.

>  > I'd certainly be ok with, for example, offering Iterable as a subtrait
>  > of Enumerable and otherwise carrying on as normal. If nothing else it
>  > would smooth the transition if (and when ;-) ) it did turn out that I
>  > was right, and sometimes it really is convenient to have an iterator
>  > (for example, it's really hard to zip two enumerables together).
>
>  But we ought reasonably to be able to zip any ordered collections,
>  right? So they all need iterators to do so efficiently.

This can effectively zip any two sequences of know size fairly
efficiently. I don't feel that a lack of general purpose zipping is a
great loss, although it does annoy me slightly. This is one of the
things were the lack of continuations is a bit painful.

>  > >  could use them as the basis for a practical collections hierarchy.
>  > > The same is likely true of any other simple solution. The problem
>  > > is not simple.
>  >
>  > Don't agree. Most problems are simple once you get down to the core.
>
>  If by 'get down to the core' you mean 'discard half the requirements',
>  then yes I agree...

Actually, that's exactly what I mean. Except that I would say
"features" rather than requirements. If you  minimize the set of what
you want you can come up with much more effective abstractions.

>  To summarise, what I like:
>
>   - A reduction in the importance of Iterators
>   - A mechanism for more general conversions between collections
>
>  What I don't like:
>
>   - Methods that return Enumerable rather than the expected type
>   - An assumption that any single access method will be applicable to
>    all data structures and operations.

I don't assume that any single access method will be applicable to all
data structures and operations. Like I said, you're clearly going to
want to override these sometimes. I just think that this gets you most
of the way there in a way that's a lot cleaner than requiring
everything to return the right type and/or implement an iterator and
for the cases where you do need more detailed operations you can
choose how to get there explicitly.

Anyway, I'm tinkering with the API here if you want to take a look
(right now it's mostly just a slightly fleshed out version of what
you've seen plus some stuff for sequences):
http://freehg.org/u/DRMacIver/alt.collections/

It's mostly for playing with the API, but at some point I'll probably
be adding more interesting collection types.

Incidentally, there's a perfectly valid way of defining a conversion
from a Traversable (what I renamed traversable to) to a Stream with
almost the right laziness properties (it's a bit stricter in the head
than it should be). Here's an (untested) implementation:

object End extends Throwable;

def toStream : Stream[T] = {
  val queue = new java.util.concurrent.SynchronousQueue[() => T];

  spawn {
    try {
       foreach(x => {
         queue.put(() => x));
       }
    } finally {
      queue.put(() => End);
    }
  }

  def next = try { val h = queue.get()(); new Stream[T] { val head =
h; lazy val tail = next; } }
                 catch { case End => Stream.empty }
  next;
}

Why yes, this is in fact 110% pure evil. I'm glad you noticed. ;-)


Re: Towards a revised collection API

by Jamie Webb-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2008-04-19 20:30:01 David MacIver wrote:

> On Sat, Apr 19, 2008 at 6:49 PM, Jamie Webb <j@...>
> wrote:
> > On 2008-04-19 12:26:10 David MacIver wrote:
> >  > I don't think you've made a convincing argument that performance
> >  > would suffer.
> >
> >  My point is that performance would not improve, and that
>
> Don't agree. I'm not making this up - internal iteration really is
> often faster, and the suggestion is not without precedent. It's at

If you can show that your scheme of delaying evaluation and returning
Traversables is capable of making a simple

  List.range(1, 100000).map(_ * 2)

(returning a List) as fast as it is currently, I'll stop arguing about
performance.

> >  If you implement e.g. map over a RandomAccessSeq using foreach, you
> >  lose random access.
>
> Well, yes. That's sortof the point of this design. If you want a
> RandomAccessSeq you get one back out at the end.
>
> In particular the combination foo.map(bar).as[Array, Stuff] will end
> up compiling to something basically equivalent to the override you'd
> want to make.

Sorry, I should have said RandomAccessSeq.Projection. If I write:

  (1 to 1000000).map(_ * 2)(50000)

I'll get that one mapping and it will not calculate the other 999999.
This is not possible under your scheme, except by reimplementing the
relevant methods not to use foreach.

> True, the implicits can be avoided, but I prefer to use them for
> various reasons (mainly not having to care about the name of what
> provides the service and it avoids a hardcoded reference - if I want
> to change the behaviour for the local scope I can).

But you can do that without implicits too, just using variables.
They compile to the same thing, after all. The nice thing about
avoiding implicits is that type inference works better.

> This can effectively zip any two sequences of know size fairly
> efficiently.

Really? I don't see how, except by eagerly constructing an array.

> >  If by 'get down to the core' you mean 'discard half the
> > requirements', then yes I agree...
>
> Actually, that's exactly what I mean. Except that I would say
> "features" rather than requirements. If you  minimize the set of what
> you want you can come up with much more effective abstractions.

Clearly I am more demanding than you. :-)

> I don't assume that any single access method will be applicable to all
> data structures and operations. Like I said, you're clearly going to
> want to override these sometimes. I just think that this gets you most
> of the way there in a way that's a lot cleaner than requiring
> everything to return the right type and/or implement an iterator and
> for the cases where you do need more detailed operations you can
> choose how to get there explicitly.

It seems to me that your theory that it's an advantage to have every
operation return the same type is predicated on an assumption that it
makes sense to treat data structures uniformly.

> Anyway, I'm tinkering with the API here if you want to take a look
> (right now it's mostly just a slightly fleshed out version of what
> you've seen plus some stuff for sequences):
> http://freehg.org/u/DRMacIver/alt.collections/
>
> It's mostly for playing with the API, but at some point I'll probably
> be adding more interesting collection types.

I'll be interested to see those.

/J


Re: Towards a revised collection API

by Christos KK Loverdos :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>  If you can show that your scheme of delaying evaluation and returning
>  Traversables is capable of making a simple
>
>   List.range(1, 100000).map(_ * 2)
>
>  (returning a List) as fast as it is currently, I'll stop arguing about
>  performance.
>

I am very much in favor of delayed evaluation, can't say about the
rest stuff yet...

--
 __~O
 -\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Apr 21, 2008 at 5:16 AM, Jamie Webb <j@...> wrote:

> On 2008-04-19 20:30:01 David MacIver wrote:
>  > On Sat, Apr 19, 2008 at 6:49 PM, Jamie Webb <j@...>
>  > wrote:
>  > > On 2008-04-19 12:26:10 David MacIver wrote:
>  > >  > I don't think you've made a convincing argument that performance
>  > >  > would suffer.
>  > >
>  > >  My point is that performance would not improve, and that
>  >
>  > Don't agree. I'm not making this up - internal iteration really is
>  > often faster, and the suggestion is not without precedent. It's at
>
>  If you can show that your scheme of delaying evaluation and returning
>  Traversables is capable of making a simple
>
>   List.range(1, 100000).map(_ * 2)
>
>  (returning a List) as fast as it is currently, I'll stop arguing about
>  performance.

Some simple (and totally unscientific! I wouldn't believe their
results except qualitatively) tests show mine is about 15% slower
(Depending on the exact numbers I feed in and the value of a hardware
random number generator sitting somewhere in Sun HQ, mine is sometimes
faster. This happens sufficiently rarely that I'm assumine it's a
quirk of when the garbage collector kicks in or an artifact of
measurement or something). Curiously, my approach seems to
consistently be very slightly faster for arrays. I think that might be
because we get better control of when the magic snap to the right type
happens and/or avoid it entirely. Not sure.

The 15% slowdown is presumably because hotspot isn't quite the lean
mean inlining machine we like to believe it is - they should inline to
roughly the same thing, but List does it manually whileas mine has a
few more method invocations to go through.

A similar test on filter shows that way is a bit slower but not as
much for lists and significantly faster for arrays (I'm slightly
alarmed at *how* much faster - it looks to be about 30%. The array
implementation must be doing something really odd. That should
probably be considered a bug in the standard array implementation)

In actual fact, the idea that caused me to come up with doing it this
way was thinking about efficient implementation of a purely functional
array language, so I'm not entirely surprised that it works well with
arrays. I can't think of a good reason though. :-)

You can take a look at the tests here:
http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala

>  > >  If you implement e.g. map over a RandomAccessSeq using foreach, you
>  > >  lose random access.
>  >
>  > Well, yes. That's sortof the point of this design. If you want a
>  > RandomAccessSeq you get one back out at the end.
>  >
>  > In particular the combination foo.map(bar).as[Array, Stuff] will end
>  > up compiling to something basically equivalent to the override you'd
>  > want to make.
>
>  Sorry, I should have said RandomAccessSeq.Projection. If I write:
>
>   (1 to 1000000).map(_ * 2)(50000)
>
>  I'll get that one mapping and it will not calculate the other 999999.
>  This is not possible under your scheme, except by reimplementing the
>  relevant methods not to use foreach.

That's a good point. I'll have a think about it. It's a slightly
different one though in that this can just be done as a trait for
behaviour. It's almost certainly worth doing, but it doesn't seem to
be problematic to fit in with the rest of this.

>  > True, the implicits can be avoided, but I prefer to use them for
>  > various reasons (mainly not having to care about the name of what
>  > provides the service and it avoids a hardcoded reference - if I want
>  > to change the behaviour for the local scope I can).
>
>  But you can do that without implicits too, just using variables.

It's not like using implicits means you can't call it directly with
the target the implicit would have used.

>  They compile to the same thing, after all. The nice thing about
>  avoiding implicits is that type inference works better.

Yes, but the data inference is significantly worse. :-)

>  > This can effectively zip any two sequences of know size fairly
>  > efficiently.
>
>  Really? I don't see how, except by eagerly constructing an array.

Well, that's reasonably efficient. :-)

>  > Actually, that's exactly what I mean. Except that I would say
>  > "features" rather than requirements. If you  minimize the set of what
>  > you want you can come up with much more effective abstractions.
>
>  Clearly I am more demanding than you. :-)

And consequently you get fewer cool toys to play with. :-) "No! I
don't want that toy truck. I want a RED one!"

More seriously, the more you pare down the requirements the wider the
range of things that implement them is. Hence, cool toys.

>  > I don't assume that any single access method will be applicable to all
>  > data structures and operations. Like I said, you're clearly going to
>  > want to override these sometimes. I just think that this gets you most
>  > of the way there in a way that's a lot cleaner than requiring
>  > everything to return the right type and/or implement an iterator and
>  > for the cases where you do need more detailed operations you can
>  > choose how to get there explicitly.
>
>  It seems to me that your theory that it's an advantage to have every
>  operation return the same type is predicated on an assumption that it
>  makes sense to treat data structures uniformly.

No, it's predicated on the assumption that it makes sense to treat
them uniformly right up to the point where it makes sense to to
otherwise.

It's also predicated on the observation that there are quite a few
cases where you *have* to do this. So we might as well make it as
painless as possible to do. :-)

>  > Anyway, I'm tinkering with the API here if you want to take a look
>  > (right now it's mostly just a slightly fleshed out version of what
>  > you've seen plus some stuff for sequences):
>  > http://freehg.org/u/DRMacIver/alt.collections/
>  >
>  > It's mostly for playing with the API, but at some point I'll probably
>  > be adding more interesting collection types.
>
>  I'll be interested to see those.

I'll give you a shout when any implementations worth looking at are up
there. :-)


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Apr 21, 2008 at 11:26 AM, David MacIver <david.maciver@...> wrote:
>  A similar test on filter shows that way is a bit slower but not as
>  much for lists and significantly faster for arrays (I'm slightly
>  alarmed at *how* much faster - it looks to be about 30%. The array
>  implementation must be doing something really odd. That should
>  probably be considered a bug in the standard array implementation)

I've filed a bug about this. Looks like it's just a slow
implementation. I expect once that's fixed the numbers will be closer.

http://lampsvn.epfl.ch/trac/scala/ticket/783


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Apr 21, 2008 at 11:26 AM, David MacIver <david.maciver@...> wrote:
>  You can take a look at the tests here:
>  http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala

The astute observer will note that I shouldn't be allowed to write
performance tests while half awake. These are backwards...

Sigh.


Re: Towards a revised collection API

by Jamie Webb-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2008-04-21 14:26:32 David MacIver wrote:
> On Mon, Apr 21, 2008 at 11:26 AM, David MacIver
> <david.maciver@...> wrote:
> >  You can take a look at the tests here:
> >  http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala
>
> The astute observer will note that I shouldn't be allowed to write
> performance tests while half awake. These are backwards...

Not only that, you're probably measuring interpreted performance, and
the work will probably be dwarfed by GC time...

Here are the numbers I get (-s is Scala, -d is you):

    listmap-s   listmap-d  arraymap-s  arraymap-d  listfilter-s  listfilter-d  arrayfilter-s  arrayfilter-d
  -------------------------------------------------------------------------------------------------------------
          901        1133        1124        1778           586           718            705           1592
          904        1132        1117        1771           605           737            730           1555
          912        1126        1110        1768           602           744            715           1569
          923        1110        1087        1785           601           744            731           1560
          897        1135        1106        1837           609           734            737           1592
          897        1129        1108        1774           605           725            727           1548
          892        1160        1148        1784           612           733            707           1565
          942        1186        1100        1745           589           764            720           1606
          909        1160        1141        1734           589           733            720           1553
          902        1156        1131        1782           588           733            722           1574
  -------------------------------------------------------------------------------------------------------------
   Avg    908 ms     1143 ms     1117 ms     1776 ms        599 ms        736 ms         721 ms        1571 ms  
  -------------------------------------------------------------------------------------------------------------

This is for a list/array of 1000 elements, with timings given for 45000
repetitions.

Sorry, you can't repeat it until I publish my benchmarking library in
Scalax (need to update dependencies; I'll try to do it tomorrow).

/J


Re: Towards a revised collection API

by David MacIver :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Apr 21, 2008 at 3:27 PM, Jamie Webb <j@...> wrote:

>
> On 2008-04-21 14:26:32 David MacIver wrote:
>  > On Mon, Apr 21, 2008 at 11:26 AM, David MacIver
>  > <david.maciver@...> wrote:
>  > >  You can take a look at the tests here:
>  > >  http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala
>  >
>  > The astute observer will note that I shouldn't be allowed to write
>  > performance tests while half awake. These are backwards...
>
>  Not only that, you're probably measuring interpreted performance, and

Hm. that's possible. Because each test gets compiled into a separate
method and run multiple times, I assumed hotspot would JIT compile the
method after the first few invocations, but it should really be
running a bunch of cold starts first I suppose.

I freely admit to not being very good at benchmarking Java programs
though, so I could have completely the wrong end of the stick. :-)

>  the work will probably be dwarfed by GC time...

Well, I force a garbage collection between tests, so any GC time that
is measured should actually be significant numbers.

>  Here are the numbers I get (-s is Scala, -d is you):
>
>     listmap-s   listmap-d  arraymap-s  arraymap-d  listfilter-s  listfilter-d  arrayfilter-s  arrayfilter-d
>   -------------------------------------------------------------------------------------------------------------
>           901        1133        1124        1778           586           718            705           1592
>           904        1132        1117        1771           605           737            730           1555
>           912        1126        1110        1768           602           744            715           1569
>           923        1110        1087        1785           601           744            731           1560
>           897        1135        1106        1837           609           734            737           1592
>           897        1129        1108        1774           605           725            727           1548
>           892        1160        1148        1784           612           733            707           1565
>           942        1186        1100        1745           589           764            720           1606
>           909        1160        1141        1734           589           733            720           1553
>           902        1156        1131        1782           588           733            722           1574
>   -------------------------------------------------------------------------------------------------------------
>    Avg    908 ms     1143 ms     1117 ms     1776 ms        599 ms        736 ms         721 ms        1571 ms
>   -------------------------------------------------------------------------------------------------------------
>
>  This is for a list/array of 1000 elements, with timings given for 45000
>  repetitions.

That does look rather bleaker than I'd hoped. :-) I don't find the
list numbers too bad, particularly as I expect they'll improve
slightly if List implements traversable directly rather than using a
wrapper. If the performance works well in other cases (which has yet
to be tested) I'm ok with that level of slowdown for something like
this - given that the design is intended to eliminate intermediate
structures it's not surprising it doesn't give much advantage when
there are none.

The array numbers are more of an issue, but I think I know how to
improve those.

How do the numbers change if you increase the size of the list/array?

>  Sorry, you can't repeat it until I publish my benchmarking library in
>  Scalax (need to update dependencies; I'll try to do it tomorrow).

I look forward to it.

< Prev | 1 - 2 | Next >